16

I have a lot of MATLAB code that needs to get ported to C (execution speed is critical for this work) as part of a back-end process for a web application. When I attempt to outsource this code to a C developer, I assume (correct me if I'm wrong) few C developers also understand MATLAB code (things like indexing and memory management are different, etc.). I wonder if there are any C developers out there that can recommend a procedure for me to follow to best communicate what the code does?

For example, should I provide the MATLAB code and explain what it's doing line by line? Or, should I just provide the math/algorithm, explain it in plain English, and let the C developer implement it with this understanding in his/her own way (e.g. can I assume the developer understands how to work with complex math (i.e. imaginary numbers), how to generate histograms, perform an FFT, etc.)?

Or is there a better method? I expect I'm not the first to need to do this, so I wonder if any C developers out there have run into this situation and can share any conventional wisdom on how they'd like this task to be handed over.

Thanks in advance for any comments.

gkdsp
  • Have you tried just using mcc to convert the MATLAB code to C? I think it's `mcc -c matlabfile.m` – Will Tate Jan 30 '11 at 03:40
  • When you hire the programmer, specify that s/he must understand both C and Matlab code. Given your emphasis on speed, you should probably use C++ instead of C though (with some care, it's never slower, and often faster). – Jerry Coffin Jan 30 '11 at 03:41
  • @willytate: This requires that you have the Matlab compiler, it doesn't produce particularly efficient code, and there are some limitations on what code you can compile. – Jonas Jan 30 '11 at 03:55
  • The Matlab compiler is a $5000 option (USD), and Jonas's comments here are well stated. –  Jan 30 '11 at 04:07
  • Hi Jerry, my understanding was the opposite: that C runs faster than C++. I understand C++ makes garbage collection, etc., easier to program, but as long as object-oriented code is avoided, wouldn't C++ be the same speed as C (i.e. not faster)? When would C++ be faster than C, for example? Very interesting info. –  Jan 30 '11 at 04:10
  • If you do object-oriented kind of programming in C, not only do you deprive yourself of the opportunity to have the compiler do extra type-safety checks, you deprive the compiler of the chance to do certain kinds of optimizations that cannot be expressed easily in C. Also, using C++ increases the chances that you can use functionality from a library that was written by somebody smarter than you. –  Jan 30 '11 at 07:01
  • Here are a couple of questions on Matlab-to-C porting/conversion: [Matlab to C or C++](http://stackoverflow.com/q/4166755/404469) and [convert matlab code to c code](http://stackoverflow.com/q/4727932/404469). – gary Jan 30 '11 at 16:53
  • Is most of the time spent in the actual algorithm, or in fiddling with files, IO, and parsing? The average C coder is not going to be able to write FFT and matrix code that comes anywhere near competing with the MATLAB implementation. If parsing, IO, and file fiddling are where your program is spending its time, then your C programmer should focus on that, and write glue code that calls functions from the MATLAB libraries. If the FFT/matrix decomposition/ODE solving is the bottleneck, then you have to find a C programmer with a LOT of numerical methods experience. – Charles E. Grant Jan 31 '11 at 02:06

7 Answers

15

I am in a similar situation: I also have people port my Matlab code to C++.

A lot depends on the complexity of your code, on the skill level of the C/C++ developer, and on their understanding of what they're supposed to implement. The better they are and the better they understand your problem, the more independently they can work.

Since direct translation of Matlab code to C/C++ may not be the most efficient way to handle a problem, I suggest that you communicate clearly what the input is, what the code should do, and what it is supposed to return as output. You should also provide ways to test the code to ensure that it works correctly, both as a debugging aid and as a means of quality control. On top of that, you should provide and explain the Matlab code as a rough guideline of how the result can be achieved.

You should be able to assume that the developer knows how to structure a program and how to use debugging tools. However, you may not be able to assume that the developer has specific knowledge of, say, statistics or (mathematical) optimization. Thus, debugging these parts will be much faster with your input.

It may help to schedule regular meetings with the developer, so that "little things" that feel a bit weird to the developer, but that signal important issues to you, can be communicated before they escalate into big issues.
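
As an illustration of spelling out input, behaviour, and output for one routine, here is a minimal sketch of a documented C interface. The function name `histogram`, its parameters, and its binning rule are all assumptions made up for this example; in particular it does not claim to reproduce what MATLAB's own histogram functions do, which is exactly the kind of detail the contract has to pin down.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical contract for one ported routine (illustrative only; this is
 * not a reimplementation of any MATLAB function).
 *
 * Input:  x[0..n-1]   samples
 *         lo, hi      range covered by the bins, lo < hi
 *         nbins       number of equal-width bins, nbins > 0
 * Output: counts[0..nbins-1], where bin k covers
 *         [lo + k*w, lo + (k+1)*w) with w = (hi - lo) / nbins;
 *         a sample equal to hi goes into the last bin.
 * Return: number of samples outside [lo, hi]; these are not counted.
 */
size_t histogram(const double *x, size_t n, double lo, double hi,
                 size_t nbins, size_t *counts)
{
    assert(x != NULL && counts != NULL);
    assert(nbins > 0 && lo < hi);

    for (size_t k = 0; k < nbins; ++k)
        counts[k] = 0;

    const double width = (hi - lo) / (double)nbins;
    size_t skipped = 0;

    for (size_t i = 0; i < n; ++i) {
        if (!(x[i] >= lo && x[i] <= hi)) {   /* also rejects NaN */
            ++skipped;
            continue;
        }
        size_t k = (size_t)((x[i] - lo) / width);
        if (k >= nbins)                      /* x[i] == hi, or rounding */
            k = nbins - 1;
        ++counts[k];
    }
    return skipped;
}
```

The comment block is the part worth agreeing on with the developer: edge cases such as "what happens to a sample exactly on the upper edge" are easy to leave implicit in Matlab code and easy to get subtly different in a port.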

Jonas
  • +1 for having valid testing results for the software. Porting code from Matlab to C/C++ is hard enough without knowing what the results should look like. – rjzii Jan 31 '11 at 01:50
12

I'm not saying this applies to you, but: Most people who only code in MATLAB write bad code. Very bad, with poor formatting, structure, and documentation.

When this is the case, the only easy way to make use of the MATLAB code is to run it to verify the results from the ported code. Trying to reverse-engineer the MATLAB code without additional documentation is something that should only be undertaken if the original author of the MATLAB code is actually dead or comatose. On the other hand, a well-written mathematical paper on an algorithm is usually much more helpful than the author's graduate student's own implementation.

To make it easier for the person porting your code:

  1. Refactor your code to make sure that operations are broken down into different functions. MATLAB's one-function-per-file style encourages functions to be too long and encompass too many operations. Also make sure that duplicated code is pulled out to helper functions, even if this results in more files than you would normally want to work with for a MATLAB project.

  2. Explain any magic numbers or constants used in your code, and the conditions under which they are valid.

  3. Document the data structures of your code. MATLAB's "everything is a matrix" style is very different from most languages, and it often means that your data structures are defined implicitly by how you use the matrices. A C programmer will need to figure out how to set up the various structures and allocate the necessary arrays, so make sure it is clear what the meanings and internal structures are of your variables.

  4. Document the algorithms used by your code. In particular, make sure that it is clear what happens when you use complicated whole-array functions and operators, and make sure that the C programmer has access to references about the algorithms used by any toolbox functions or standard library functions that are more complicated than BLAS functions.

  5. Document anything you've done to make the code robust, such as input validation and error handling. The way you've implemented it is probably very different from how it will have to be done in C. Academics writing MATLAB code rarely bother to learn about things like exception handling. If you haven't done anything to make your code robust, then at least document what could be done about invalid input or flawed or partial data.

  6. Make sure that the person porting the code is able to compare output with the original MATLAB code, and if at all possible, provide a thorough test suite of input and correct output.

  7. If the person doing the porting doesn't know numerical analysis, you will need to supervise the porting process and make sure that you review and understand the C code. It will be very educational for both of you.
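
To make point 3 concrete, here is a minimal sketch of how an implicit MATLAB matrix might be written down as an explicit C structure. The `Matrix` type and the `matrix_init`, `matrix_free`, and `matrix_at` helpers are hypothetical names invented for this example. The two facts it encodes are that MATLAB stores matrices column by column and indexes from 1, whereas C arrays index from 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Dense matrix stored column by column, as MATLAB does. */
typedef struct {
    size_t rows;
    size_t cols;
    double *data;   /* rows * cols values, column-major */
} Matrix;

/* Allocate a zero-filled matrix; returns 0 on success, -1 on failure. */
int matrix_init(Matrix *m, size_t rows, size_t cols)
{
    assert(m != NULL && rows > 0 && cols > 0);
    m->rows = rows;
    m->cols = cols;
    m->data = calloc(rows * cols, sizeof *m->data);
    return m->data ? 0 : -1;
}

/* Release the storage and mark the matrix as empty. */
void matrix_free(Matrix *m)
{
    free(m->data);
    m->data = NULL;
}

/* Pointer to the element MATLAB calls A(i, j), with 1-based i and j. */
double *matrix_at(const Matrix *m, size_t i, size_t j)
{
    assert(i >= 1 && i <= m->rows && j >= 1 && j <= m->cols);
    return &m->data[(j - 1) * m->rows + (i - 1)];
}
```

For a 2-by-3 matrix, `matrix_at(&a, 2, 1)` points at `a.data[1]` and `matrix_at(&a, 1, 3)` at `a.data[4]`. A port does not have to keep 1-based accessors or column-major storage, but whichever convention is chosen should be written down once rather than rediscovered in every loop.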

user23748
  • I agree with your technical assessment and you make good points (1 - 7), but I think the ad hominem at the top about "most people who only code in MATLAB" was unnecessary and untrue. –  Jan 30 '11 at 08:39
  • Based on my experience, most of the people who use MATLAB as their only programming language are applied math people, for whom programming is generally nothing more than a means to an end. They have no incentive to care about the practice of programming, and they care only about writing correct code, not good or beautiful code. When they share their code, it is typically meant to be used, but not read - there's always a paper to do the job of explaining the code. If you think this is offensive (and it isn't meant to be), then you are simply missing the perspective of the people who don't care. –  Jan 30 '11 at 09:43
  • In my experience, bad code is mainly written by people who write one-shot code, i.e. code that will be used (read) by a single person for a single problem. Programming language doesn't matter. And to me as a person mostly programming in Matlab, the *ad hominem* does come across as offensive. – Jonas Jan 30 '11 at 14:13
  • As an EE I'm mainly expected to write MATLAB, and I agree with you. It encourages a very horrible coding style with no scoping, namespacing, organizing into data-structures, or good variable naming. Once vectorization is understood, one doesn't really bother to comment on any of the resultant clever one-line-wonders. Just a giant garbage heap of ugly, inefficient code. – Milind R Jan 08 '15 at 07:50
4

Programming languages are much easier to read than write. Most C programmers with a modicum of experience should be able to read your Matlab code just fine with access to a reference, and especially with access to a Matlab programmer to answer their questions. Code of any sort is much less ambiguous than most requirements we have to work off of.

If they have a bachelor's degree in computer science or computer engineering, they will likely have taken calculus, trigonometry, and linear algebra, but it may be rusty. Most C programmers will know what an FFT is, but unless they do a lot of scientific/math programming they will rarely or never have had to implement one. Your ideal candidate will have all that fresh in mind, but anyone with a degree should be able to handle the math with some refresher study. In either case, you want someone who looks for existing libraries for common operations like that whenever possible rather than rolling their own.

Talent for optimizing algorithm execution time varies widely even among experienced programmers. I would recommend you have an interview problem to discover that. Show candidates a simple but intentionally inefficient algorithm and ask them what it does. See if they bring up its inefficiency on their own. Ask them what the asymptotic complexity is and what it should be. Ask them how they would rewrite it to improve efficiency.
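
A minimal sketch of what such an interview problem could look like, assuming a moving-window sum as the task (my choice for illustration; any small numerical loop would do). The function names are hypothetical. You would show the candidate only the first function and see whether they arrive at something like the second.

```c
#include <assert.h>
#include <stddef.h>

/* Intentionally slow: re-adds the whole window for every output, O(n * w).
 * Writes n - w + 1 sums to out. */
void moving_sum_slow(const double *x, size_t n, size_t w, double *out)
{
    assert(w > 0 && w <= n);
    for (size_t i = 0; i + w <= n; ++i) {
        double s = 0.0;
        for (size_t k = 0; k < w; ++k)
            s += x[i + k];
        out[i] = s;
    }
}

/* O(n) version: slide the window, adding one value and dropping one. */
void moving_sum_fast(const double *x, size_t n, size_t w, double *out)
{
    assert(w > 0 && w <= n);
    double s = 0.0;
    for (size_t k = 0; k < w; ++k)
        s += x[k];
    out[0] = s;
    for (size_t i = 1; i + w <= n; ++i) {
        s += x[i + w - 1] - x[i - 1];
        out[i] = s;
    }
}
```

A strong candidate will also point out that the two versions can differ in the last few bits for floating-point input, because the running sum accumulates rounding error differently. That is the kind of remark you want to hear from someone porting numerical code.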

Karl Bielefeldt
3

The financial reason for not using the Matlab compiler is understandable. However, you can use the free Scilab-to-C converter. The procedure would be:

  • Convert your code from Matlab to Scilab with M2SCI tools,
  • Convert the Scilab code to C using "Scilab 2 C",
  • Cross test the codes,
  • Use a profiler to search bottlenecks that need a human eye.

Ideally no knowledge of Scilab is needed in the process, and it's easy enough that it is worth taking some time to try this solution (in practice, it may not be as simple ...).

Note: I haven't tried this, but it's a solution I am considering myself for similar reasons.

Clement J.
2

Develop a good test set you can run through both applications and then take a look at the metrics.

This will greatly help your developer test their code, and ensure that the quality is at a reasonable level.
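
A minimal sketch of the comparison step, assuming the reference values have been exported from Matlab and loaded into an array by some other means. The function name `first_mismatch` and the tolerance scheme are my own choices for illustration. Bit-for-bit equality is usually too strict, since the C port may sum in a different order or use a different library, so the check allows a small absolute and relative error.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without needing the math library. */
static double magnitude(double v)
{
    return v < 0.0 ? -v : v;
}

/*
 * Return the index of the first element where `got` differs from the
 * reference `want` by more than abs_tol + rel_tol * |want|, or n if all
 * elements agree. A NaN in either array counts as a mismatch.
 * Assumes the reference values are finite.
 */
size_t first_mismatch(const double *got, const double *want, size_t n,
                      double abs_tol, double rel_tol)
{
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    for (size_t i = 0; i < n; ++i) {
        const double err = magnitude(got[i] - want[i]);
        const double limit = abs_tol + rel_tol * magnitude(want[i]);
        if (!(err <= limit))    /* written this way so NaN fails */
            return i;
    }
    return n;
}
```

Returning the index rather than a yes/no answer makes it easier for the developer to find where the two implementations start to diverge.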

2

Great post by Jonas, especially the point on providing a way to test the code. Here are some additional suggestions:

  • Sharing Code. Consider providing the MATLAB source but be prepared to explain its structure or other details (from syntax to your personal style). The C developer will hopefully recognize the high-level concepts, algorithms and math (and hopefully you commented your code).

  • Documentation. It will be crucial that you have clear documentation that defines the project; after all, if the person is not fluent in MATLAB, the code may not be a very useful reference.

  • Exercise People Skills. This may be obvious; however, it's good to keep in mind when collaborating, especially at this sort of micro level. Try to remove as much ambiguity from your code/documentation as possible. Depending on your level of leadership in the project, you may find that you are striking a balance between guiding development and letting the person make their own individual contribution.

gary
1

Unless your C coders use the right libraries, Matlab is much better even at things as seemingly trivial as inverting a matrix; a naive C implementation is not numerically stable enough. Hiring C coders would also be expensive. I would try porting the Matlab code to SciPy and comparing the speed, try the Matlab compiler, or just throw more hardware at it: that could be much cheaper, simpler, safer, and faster.
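
To show what "not stable enough" can mean in practice, here is a minimal sketch using a linear solve rather than a full inverse (my substitution, to keep it short). The `solve` function is a hypothetical textbook Gaussian elimination with an optional partial-pivoting switch. It is not what Matlab or any particular library does internally, and for real work you would call an established library instead.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without needing the math library. */
static double magnitude(double v)
{
    return v < 0.0 ? -v : v;
}

/*
 * Solve A x = b by Gaussian elimination. A is n x n, row-major; both A and
 * b are overwritten, and the solution is written to x.
 * With pivot == 0 the rows are used in the given order (the naive version);
 * otherwise the row with the largest entry in the current column is
 * swapped into place first (partial pivoting).
 * Returns 0 on success, -1 if a zero pivot is encountered.
 */
int solve(double *a, double *b, double *x, size_t n, int pivot)
{
    assert(a != NULL && b != NULL && x != NULL && n > 0);

    for (size_t c = 0; c < n; ++c) {
        if (pivot) {
            size_t best = c;
            for (size_t r = c + 1; r < n; ++r)
                if (magnitude(a[r * n + c]) > magnitude(a[best * n + c]))
                    best = r;
            if (best != c) {
                for (size_t k = 0; k < n; ++k) {
                    const double t = a[c * n + k];
                    a[c * n + k] = a[best * n + k];
                    a[best * n + k] = t;
                }
                const double t = b[c];
                b[c] = b[best];
                b[best] = t;
            }
        }
        if (a[c * n + c] == 0.0)
            return -1;
        for (size_t r = c + 1; r < n; ++r) {
            const double m = a[r * n + c] / a[c * n + c];
            for (size_t k = c; k < n; ++k)
                a[r * n + k] -= m * a[c * n + k];
            b[r] -= m * b[c];
        }
    }

    for (size_t i = n; i-- > 0; ) {     /* back substitution */
        double s = b[i];
        for (size_t k = i + 1; k < n; ++k)
            s -= a[i * n + k] * x[k];
        x[i] = s / a[i * n + i];
    }
    return 0;
}
```

Take the 2-by-2 system with rows `{1e-20, 1}` and `{1, 1}` and right-hand side `{1, 2}`, whose true solution is very close to `{1, 1}`. In double precision the naive variant divides by the tiny first entry and loses the first unknown entirely, while the pivoted variant recovers both. Nothing in the naive code looks wrong on the page, which is why someone without numerical background can ship it.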

Job