577

The Mars Curiosity rover has landed successfully, and one of the promo videos, "7 Minutes of Terror", brags about there being 500,000 lines of code. It's a complicated problem, no doubt, but that is a lot of code, so surely there was a pretty big programming effort behind it. Does anyone know anything about this project? I can only imagine it's some kind of embedded C.

gnat
InfinitiesLoop
  • 97
    Why would one assume there is only one language involved in the project? – Rig Aug 06 '12 at 04:11
  • 5
    Good point, sure, it's probably got a breadth of technology associated with it. I want to know more about all of that :) – InfinitiesLoop Aug 06 '12 at 04:12
  • 3
    Which part? The spacecraft? The rover? Instruments? The ground system? As other comments indicate, there are probably several languages used in the different components. It's not out of the question that assembler was used for some of the time critical components. – GreenMatt Aug 06 '12 at 04:27
  • 1
    Since it's a government project I am guessing Forth, MUMPS 2011, and RPG V, with management interfaces built in Object COBOL, and motor control in Postscript. – joshp Aug 06 '12 at 05:44
  • I think Roverbasic, a new language purportedly designed by the JPL, but it will turn out to be actually written by Microsoft. – Mr Lister Aug 06 '12 at 05:46
  • 71
    To be honest, when I saw the 500 kLOC figure I caught myself thinking "Only?" It could have been realistic had it been Haskell, but having read a bit about previous projects and their low-level languages, this seemed way too low. The 2.5 million lines of C cited below are more believable. – Philip Kamenarsky Aug 06 '12 at 06:02
  • @PhilipK - I think the 500kloc figure might only cover the EDL software. – JohannesD Aug 06 '12 at 14:43
  • Some of the sub-questions you asked were not answered in the other question before. That has been fixed :) – Nate Parsons Aug 06 '12 at 15:01
  • 1
    @Philip K It might be that the 500 kLOC is for the descent software only. The keynote in drhorrible's answer divides the MSL into 3 different stages running different software: 1. the flight (Earth to Mars), 2. the descent and landing, 3. the rover itself, roving around. – nos Aug 06 '12 at 16:38
  • 1
    @PhilipK I'm thinking 500k LOC is with the comments and extra blank lines stripped out - so, 500k _functional_ LOC, but 2.5m lines total in the codebase. ;) – Izkata Aug 06 '12 at 16:54
  • 20
    A more interesting question than *"in what language?"* is *"with what process?"*. It's the process that makes the difference, and NASA has been using a rigorous one for decades now. – dmckee --- ex-moderator kitten Aug 06 '12 at 17:07
  • It was all written in LISP. NASA is trusting LISP's backtracking to infer all the correct decisions to make. – JustinDanielson Aug 06 '12 at 21:16
  • 2
    @dmckee: I agree. Why don't you ask that question? – Jim G. Aug 06 '12 at 22:29
  • This overview was really interesting, talking about the tech behind its software and instruments: http://www.extremetech.com/extreme/134041-inside-nasas-curiosity-its-an-apple-airport-extreme-with-wheels – Matt Aug 08 '12 at 13:50
  • I asked a question on ITSec.se regarding the security on the rover: – pasawaya Oct 09 '12 at 07:27
  • For an insight into NASA software engineering culture and practices, there's a great article from 1996: http://www.fastcompany.com/28121/they-write-right-stuff – akatkinson Oct 05 '13 at 21:35

2 Answers

538

It's running 2.5 million lines of C on a RAD750 processor manufactured by BAE. The JPL has a bit more information but I do suspect many of the details are not publicized. It does appear that the testing scripts were written in Python.

The underlying operating system is Wind River's VxWorks RTOS. The RTOS in question can be programmed in C, C++, Ada or Java. However, only C and C++ are standard to the OS; Ada and Java are supported by extensions. Wind River supplies a tremendous amount of detail as to the hows and whys of VxWorks.

The underlying chipset is almost absurdly robust. Its specs may not seem like much at first but it is allowed to have one and only one "bluescreen" every 15 years. Bear in mind, this is under bombardment from radiation that would kill a human many times over. In space, robustness wins out over speed. Of course, robustness like that comes at a cost. In this case, it's a cool $200,000 to $500,000.

An Erlang programmer talks about the features of the computers and codebase on Curiosity.

ittays
  • 5
  • Hmmm... it still is a surprise to me that such an important mission isn't running something closer to machine code, e.g. assembler... – Dynamic Aug 06 '12 at 04:37
  • 48
    JPL C language coding standards, specifically for embedded environments instead of "ground software" as they call it. http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf – Patrick Hughes Aug 06 '12 at 06:21
  • 81
    @Dynamic: It's such an important mission that NASA wouldn't risk it. Humans writing assembly make more errors, that's a measured fact. – MSalters Aug 06 '12 at 08:16
  • 8
    It's really all done in C? I thought NASA tended to avoid C on the grounds that its performance comes at the cost of being far too easy to shoot yourself in the foot with it, and preferred higher level languages with more robust error detection. – GordonM Aug 06 '12 at 08:27
  • 18
    @GordonM: I guess NASA makes a lot of re-use of existing, mature code, developed in the last decades for previous missions. So it is more amazing the code is not written in Fortran. – Doc Brown Aug 06 '12 at 09:37
  • 23
    Compiled C code is machine code, assembly language is machine code, I don't see the difference. There isn't a huge performance difference when you get down to it. – Ramhound Aug 06 '12 at 11:43
  • 25
    NASA are extremely careful with their code. Everything (EVERYTHING) is done in the spec first and is repeatedly reviewed, checked and refined. When it is put into the live code stream it is almost a cut and paste of the spec's reference. The test scripts are given at least as much attention as the code is and no 'flashy' or clever code tricks are allowed unless they are critically needed. – Stefan Aug 06 '12 at 13:02
  • 17
    Whoa, C. Colour me surprised. I would have assumed some strictly checked language without such things as pointers or undefined behaviour. – Konrad Rudolph Aug 06 '12 at 13:50
  • 11
    One shouldn't judge a language only by what a minimal compiler allows. Have you *seen* any of today's "static" code analyzers? (They aren't exactly static anymore.) – Jeanne Pindar Aug 06 '12 at 14:06
  • 4
    As for why they need to largely stick with C: if you look at the RAD750's datasheet, the fastest version on offer is only a 200 MHz chip. Since it's not mentioned, I assume it's also only single core. That's not much hardware to give room for higher-level languages' overhead. – Dan Is Fiddling By Firelight Aug 06 '12 at 14:56
  • 256MB of DRAM.. Seriously? My phone has double that! – Amarghosh Aug 06 '12 at 14:57
  • 103
    @Amarghosh: yeah, and see how well your cell phone works when it goes through a high-radiation environment such as outer space :) – whatsisname Aug 06 '12 at 15:02
  • 18
    @TomO'Connor: [I'm not](http://www.leshatton.org/wp-content/uploads/2012/01/Ariane5_STQE499.pdf). – TMN Aug 06 '12 at 15:16
  • 12
    @KonradRudolph check the JPL Coding standard. No dynamic allocation is one of the rules. – Ólafur Waage Aug 06 '12 at 18:49
  • 7
    @Dan That’s a fallacy. Higher-level languages don’t necessarily come with an overhead. Consider OCaml. More importantly, the coding standards used here (JPL …) mandate many redundant checks which take a lot out of the speed advantage that C otherwise has. As a result, C’s performance advantage vanishes in a puff of smoke. The real reason for using C is most probably the potential of fine-grained control over allocation but again I’d have expected specialised higher-level languages to offer this. – Konrad Rudolph Aug 06 '12 at 19:22
  • 7
    @Stefan No, not everything at NASA has the same set of code standards. But every project involving software engineering has the same set of process standards. See NPR7150. http://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPR&c=7150&s=2 and even then this depends on the class of the software. Class A software usually involves keeping humans alive in space. But class H software is general purpose desktop software. Class H software does not require verification and validation, but class A does. – Sean McCauliff Aug 06 '12 at 22:58
  • 1
    @SeanMcCauliff, thanks. I read a doc about their software standards. I guess it only referred to a certain class, but I assumed it was all of their software. – Stefan Aug 07 '12 at 08:15
  • 5
    Looks like much of that C code (at least all the FSM parts) had been generated from a high level DSL. – SK-logic Aug 07 '12 at 08:51
  • 2
    @Dynamic _especially_ for an important mission you want all the extra help that higher languages can give you. –  Aug 07 '12 at 09:37
  • 6
    @TMN If you had any experience with Ada you'd find it is a very safe language and leaves very little to chance, encouraging engineers to actually think about the code they write; hence it being used often in safety critical systems along with formal notation (Z is quite popular). To bypass the design intentions of Ada is not easy, yet the developers went out of their way and did just this. – R4D4 Aug 07 '12 at 10:29
  • 14
    Why all the implicit assumption that performance is paramount? Consider the speed at which the rover travels and the speed of its various servos: it doesn't have to be lightning, real-time, nanosecond fast. Nor is concurrency a major issue; it can pretty much do most of its tasks serially (run motor 1 second, turn camera, turn wheel, run motor 0.5 seconds etc). Stability and precision, I would think, are far more important. – pap Aug 07 '12 at 14:06
  • 11
    @pap:nano-second speed isn't necessarily the issue, but real-time is. Stuff has to happen pretty much exactly when it is supposed to happen. This is why VxWorks is a popular choice for embedded real-time systems. VxWorks has great support for C, and ok support for C++. I never used it with Java, but suspect that to make that real-time they'd have to make it non-standard. Anyways, my point is that VxWorks probably drove the language decision. – Dunk Aug 07 '12 at 14:45
  • 5
    @KonradRudolph: if they ban dynamic allocation in C (for various reasons) then you'll never get it written in .NET or Java, as those systems use dynamic allocation almost exclusively. Java for example has licence restrictions for using it to write critical systems. The point of C is that you can guarantee exactly what is happening at any given point in execution, something expensive to do, but necessary if sending a field engineer to debug is impractical. – gbjbaanb Aug 08 '12 at 17:26
  • @gbjbaanb And you may notice that I suggested neither of those languages. What makes you think I did? – Konrad Rudolph Aug 08 '12 at 18:09
  • @ThorbjørnRavnAndersen: Sorry, but high level languages suck for anything performance and safety oriented. They abstract away many things, but all of them have problems, every solution adds new layers of problems, and getting critical system to work is to remove problems, so any high language is an exact opposite of what you want to do. Good read - http://www.joelonsoftware.com/articles/LeakyAbstractions.html – Coder Aug 09 '12 at 02:14
  • 4
    @Coder So you essentially say that you need to write in machine code to have anything reliable in a critical system? Or is your conclusion something else? –  Aug 09 '12 at 07:45
  • 3
    @Coder False. You've ignored decades of progress in programming languages. There is actually the opposite argument: that if mission-critical software started getting written in higher level, functional languages, there would be fewer failures. – Andres F. Aug 10 '12 at 01:41
  • 1
    @Coder: The latest Air Traffic Control systems are written in Java, for one example. Don't mix up system reliability with software reliability: the most reliable systems are made up of unreliable parts. Because the most reliable parts (hardware and software) can fail, the system is designed to work when they do. Because it works when they fail, they no longer need to be reliable. Today's highly reliable systems are made out of consumer grade components. This does not apply to spacecraft where you tend to only have 1 of each system, and only duplicate the most critical. – mattnz Aug 10 '12 at 05:12
  • 1
    @ThorbjørnRavnAndersen: Almost, you need nop slides and stuff like that to fail safely when a RAM error or cosmic ray wreaks havoc in the memory. You really don't want to do any garbage collection in mission critical systems. Everything has to be super simple and easily checkable. Thinking that some high-level language gives you better reliability is very wrong. There are a few things that might help when used sparingly, say templates, and things that downright suck, like exceptions. – Coder Aug 12 '12 at 23:58
  • 2
    @Coder I do not know if you program space crafts for a living. I don't so I found this small historic piece about Lisp on space crafts - _Debugging a program running on a $100M piece of hardware that is 100 million miles away is an interesting experience_ - http://www.flownet.com/gat/jpl-lisp.html –  Aug 13 '12 at 00:44
  • 2
    @ThorbjørnRavnAndersen: If you need to debug an app 100 million miles away, it's already a problem. And don't tell me you want to debug Java program 100 million miles away. Once under certain circumstances some weird JRE bug causes some weird behavior in garbage collector which causes chain reaction in all dynamic memory accounting, addressing and deletion. You won't even be able to re-flash the thing. And the article had one thing right - "one thing you would do - get rid of lisp". Shame that the author is a fanboy and doesn't get the core of the problem. – Coder Aug 13 '12 at 14:27
  • @Coder read the article. They found the bug and fixed it - and Lisp is not exactly assembly language. –  Aug 13 '12 at 14:55
  • @Ramhound "There isn't a huge performance difference when you get down to it.": I admit it is a while ago, but the last time I wrote the same program in C and assembly to compare for speed, the assembly program turned out to be twice as fast. – Giorgio Sep 15 '12 at 08:57
  • Can we add a link to [this talk](https://vimeo.com/84991949) into the answer? It's an hour-long look into the coding process for the Curiosity rover. Really fascinating. – Matt Jun 19 '15 at 21:04
190

The code is based on that of MER (Spirit and Opportunity), which was in turn based on that of their first lander, MPF (Sojourner). It's 3.5 million lines of C (much of it autogenerated), running on a RAD750 processor manufactured by BAE and the VxWorks operating system. Over a million lines were hand coded.

The code is implemented as 150 separate modules, each performing a different function. Highly coupled modules are organized into components that abstract the modules they contain, and "specify either a specific function, activity, or behavior." These components are further organized into layers, and there are "no more than 10 top-level components."

Source: Keynote talk by Benjamin Cichy at 2010 Workshop on Spacecraft Flight Software (FSW-10), slides, audio, and video (starts with mission overview, architecture discussion at slide 80).


Someone on Hacker News asked "Not sure what means that most of the C code is auto generated. From what?"

I'm not 100% sure, although there probably is a separate presentation in that year or a different year that describes their auto-generation process. I know that it was a popular topic in general at the FSW-11 conference.

Simulink is a possibility. It's a MATLAB component popular among mechanical engineers, and therefore most navigation & control engineers, and allows them to 'code' and simulate things without thinking they're coding.

Model-based programming is definitely a thing that the industry is slowly becoming aware of, but I don't know how well it's catching on at JPL or if they would have chosen to use it when the project started.

The third and most likely possibility is the communication code. With all space systems, you need to send commands to the flight software from the ground software, and receive telemetry from the flight software and process it with the ground software. Each command/telemetry packet is a heterogeneous data structure, and it is necessary that both sides work from the exact same packet definition, so that a packet is formatted correctly on one side and parsed correctly on the other. This involves getting a whole lot of things right, including data type, size, and endianness (although the latter is usually a global thing, you could have multiple processors onboard with different endianness).
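To make the "same packet definition on both sides" point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the field names, the `BUS_STATUS_FIELDS` layout, and the choice of big-endian byte order are assumptions, not anything taken from the MSL flight software.

```python
import struct

# Hypothetical packet definition shared by the "flight" and "ground" sides.
# Each entry is (field name, struct format code); the layout is invented.
BUS_STATUS_FIELDS = [
    ("packet_id", "H"),       # unsigned 16-bit
    ("sequence", "I"),        # unsigned 32-bit
    ("bus_current_ma", "h"),  # signed 16-bit, milliamps
    ("heater_on", "B"),       # unsigned 8-bit flag
]

# Byte order is fixed once for the whole link (assumed big-endian here).
# With an explicit byte order, struct uses standard sizes and no padding.
BYTE_ORDER = ">"


def packet_format(fields):
    """Build the struct format string from the shared definition."""
    return BYTE_ORDER + "".join(code for _, code in fields)


def encode(fields, values):
    """Sender side: pack a dict of values in definition order."""
    ordered = [values[name] for name, _ in fields]
    return struct.pack(packet_format(fields), *ordered)


def decode(fields, payload):
    """Receiver side: unpack bytes back into a dict using the same definition."""
    unpacked = struct.unpack(packet_format(fields), payload)
    return {name: value for (name, _), value in zip(fields, unpacked)}
```

Because `encode` and `decode` both derive their format from the one field list, a change to a field's type or size cannot be applied to one side and forgotten on the other.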

But that's just the surface. You need lots of repetitive code on both sides to handle things like logging, command/telemetry validation, limit checking, and error handling. And then you can do more sophisticated things. Say you have a command to set a hardware register value, and that value is sent back in telemetry in a particular packet. You could generate ground software that monitors that telemetry point to ensure that when this register value is set, eventually the telemetry changes to reflect the change. And of course, some telemetry points are more important than others (e.g. the main bus current) and are designated to come down in multiple packets, which involves extra copying on the flight side and data de-duplication on the ground side.
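As a rough illustration of the repetitive ground-side checks described above, here is a small Python sketch of limit checking and of confirming that a commanded register value eventually shows up in telemetry. The limit table, point names, and both function names are hypothetical and only meant to show the shape of such code.

```python
# Hypothetical limit table: telemetry point -> (low, high) in engineering units.
LIMITS = {
    "bus_current_ma": (0, 4000),
    "battery_temp_c": (-20, 45),
}


def check_limits(sample, limits=LIMITS):
    """Return (name, value, low, high) for every point in the sample that is out of limits."""
    violations = []
    for name, (low, high) in limits.items():
        if name in sample and not (low <= sample[name] <= high):
            violations.append((name, sample[name], low, high))
    return violations


def command_confirmed(expected_value, telemetry_samples, point):
    """True if any telemetry sample received after the command reports the commanded value."""
    return any(sample.get(point) == expected_value for sample in telemetry_samples)
```

Checks like these are nearly identical for every telemetry point, which is why they are attractive candidates for generation from a single definition rather than hand coding.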

With all that, it's much easier (in my opinion) to write one collection of static text files (in XML, CSV, or some DSL/what-have-you), run them through a Perl/Python script, and presto! Code!
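A toy version of that workflow, assuming a CSV definition with `name` and `c_type` columns, might look like the sketch below. The column names, the `bus_status_t` struct, and the `generate_c_struct` helper are all invented for illustration; I don't know what format or scripts any real project uses.

```python
import csv
import io

# Hypothetical packet definition. In practice this would live in a
# version-controlled text file rather than a string literal.
DEFINITION_CSV = """name,c_type
packet_id,uint16_t
sequence,uint32_t
bus_current_ma,int16_t
heater_on,uint8_t
"""


def generate_c_struct(struct_name, definition_text):
    """Turn a CSV field list into the source text of a C struct typedef."""
    rows = list(csv.DictReader(io.StringIO(definition_text)))
    lines = [
        "/* AUTOGENERATED: edit the definition file, not this code. */",
        "#include <stdint.h>",
        "",
        "typedef struct {",
    ]
    for row in rows:
        lines.append(f"    {row['c_type']} {row['name']};")
    lines.append(f"}} {struct_name};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_c_struct("bus_status_t", DEFINITION_CSV))
```

The same definition file could feed a second generator for the ground-side parser, so both sides stay in step when a field is added or resized.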

I do not work at JPL, so I cannot provide any detail that is not in the video, with one exception. I've heard that the autogenerated C code is written by Python scripts, and the amount of autocoding in a project varies greatly depending on who the FSW lead is.

Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
Nate Parsons
  • 1,481
  • 1
  • 9
  • 7
  • 10
    This might shed some light on Wind River, the contractor who makes VxWorks: http://www.windriver.com/news/press/pr.html?ID=10901 I've read that NASA has a team of people whose job is to find as many bugs as they can in the control system code written by another team. The bug-finding team is rewarded for bugs they find and they are really quite good in finding arcane bugs. When a bug is found, a 5Y-type analysis is done to find out how the software dev process could be improved to eliminate the possibility of similar bugs in the future. A very painstaking and expensive process. – Jim Raden Aug 06 '12 at 17:47
  • 17
    @JimRaden When the direct cost of failure for a probe runs from several hundred million to several billion dollars and several years (if at all) for a redo attempt extreme paranoia in QA is justified. The indirect costs in the form of dozens/hundreds of grad students losing years of work and having to restart on their phd work and various new professors who were counting on data from it to supply their tenure track research is another major hit but much harder to quantify than the line items in the NASA budget. – Dan Is Fiddling By Firelight Aug 06 '12 at 18:41
  • 1
    What was the C auto-generated from? Please tell me that it was not Simulink. :-) – William Payne Aug 06 '12 at 20:54
  • William: It's possible some navigation/control code was generated from Simulink, because it's fairly common in this industry, but I can't say for sure on this project. I believe most of it is generated from some sort of Interface Control Document spec and has to do with parsing + formatting of commands and telemetry. DRY, and all that. – Nate Parsons Aug 06 '12 at 21:30
  • 2
    @William Payne The keynote states that some of it is autogenerated protocol encoding/decoding routines (for communication with Earth), generated by Python programs from XML descriptions. – nos Aug 07 '12 at 09:36
  • 1
    Automagically generating code from ICDs is kinda cool. I like the idea! I would have used YAML rather than XML, though. :-) – William Payne Aug 07 '12 at 13:41
  • 1
    William: I also prefer YAML (or even JSON) over XML, and if I ever lead a project, it will be my choice. The project I'm on now uses .xls "because non-programmers have to edit the files" :( – Nate Parsons Aug 07 '12 at 15:04
  • 1
    Code generation is nothing new. http://en.wikipedia.org/wiki/Model-driven_software_development http://en.wikipedia.org/wiki/Model-driven_architecture http://en.wikipedia.org/wiki/Model-driven_engineering -- the goal is to express big parts of the system in a formal modelling language that lets you mathematically prove certain properties of your models. This should remind you of state machines and petri nets. These models are then transformed into code. – sleeplessnerd Jan 26 '13 at 18:01
  • @NateParsons: Ever heard of proper schema validation and xml binding? - And the .xls argument is valid to some degree considering the existing very good APIs (http://poi.apache.org/spreadsheet/index.html) – sleeplessnerd Jan 29 '13 at 16:03
  • @sleeplessnerd My apologies if I ever gave the impression that I thought code generation was anything new. Re: .xls, my beef is not with accessing them programmatically, but their QMS and change-tracking. The simple act of opening a spreadsheet in excel changes the file, and even 'real' changes have essentially unpredictable effects. On multi-engineer teams, conflicts are inevitable, and sorting them out in xls files is a pain. I think an ideal solution would basically allow well-formed YAML files to be edited via excel – Nate Parsons Feb 05 '13 at 03:35
  • @NateParsons: No apologies needed, I was the one with the rough tone :) -- You are completely right though with the excel thing, I can feel your pain, I was assuming it was for something more non critical. -- The discussion of different markup dialects threw me off though, kind of the least important detail in relation to the post - Like choosing the type of wallpaper to start your house construction planning. (sounding condescending again) Seriously though, Schemas are the best. – sleeplessnerd Feb 06 '13 at 04:39
  • I would argue that model-based programming isn't always the best choice especially when developing a system. – Andrew Larsson May 25 '14 at 15:02
  • 1
    Unfortunately, Caltech is (apparently) no longer hosting/archiving any of the material from that conference, so your links are all broken. That's a real shame. If anyone knows of an alternate source for this material, it would be very much appreciated! – kmote Oct 22 '15 at 20:57