For example, the old SysInternals tool "FileMon" has a kernel-mode driver whose source code is entirely in one 4,000-line file. The same goes for the first ping program ever written (~2,000 LOC).
6 Answers
Using multiple files always requires additional administrative overhead. One has to set up a build script and/or makefile with separate compiling and linking stages, make sure the dependencies between the different files are managed correctly, write a "zip" script for easier distribution of the source code by email or download, and so on. Modern IDEs typically take over a lot of that burden, but I am pretty sure that at the time the first ping program was written, no such IDE was available. And for files as small as ~4,000 LOC, without such an IDE managing multiple files for you, the trade-off between the overhead mentioned above and the benefits of using multiple files might lead people to decide in favor of the single-file approach.

-
12"And for files that small as ~4000 LOC..." I'm working as a JS dev right now. When I have a file just 400 lines of code long, I get nervous about how large it's become! (But we have dozens and dozens of files in our project.) – Kevin Mar 03 '17 at 16:06
-
37@Kevin: one hair on my head is too few, one hair in my soup is too many ;-) AFAIK in JS multiple files do not cause that much administrative overhead as in "C without a modern IDE". – Doc Brown Mar 03 '17 at 16:20
-
I agree. I just find it striking how different C is from other languages with respect to how long files often get! – Kevin Mar 03 '17 at 16:40
-
4@Kevin JS is a fairly different beast though. JS is transmitted to an end user every time a user loads a website and does not have it already cached by their browser. C only has to have the code transmitted once, then the person at the other end compiles it and it stays compiled (obviously there are exceptions, but that's the general expected use-case). Also C stuff tends to be legacy code, as are many of the '4000 lines is normal' projects people are describing in the comments. – Pharap Mar 04 '17 at 01:04
-
5@Kevin Now go and see how underscore.js (1700 loc, one file) and a myriad of other libraries that are distributed are written. Javascript is actually almost as bad as C with regard to modularization and deployment. – Voo Mar 04 '17 at 11:47
-
@Pharap / Voo Modern JS is compiled just like C is, and gets concatenated (or, recently, served over http/2) where those concerns aren't such an issue. 'old' js, sure, but any js written in the past 3-5 years is probably going to use these technologies. JS even has official syntax for modules now (though implementation details are still being worked out) – Dan Mar 04 '17 at 16:36
-
A kernel mode driver wasn't built with a standard makefile, but with a special build tool named [build.exe](http://blogs.msmvps.com/kernelmustard/2005/11/04/building-win32-apps-with-build-exe-and-the-ddk/) ... which did support multiple source files, though. You could use Visual Studio as an IDE (which invoked `build.exe` for building), but that wasn't officially supported. – ChrisW Mar 04 '17 at 18:38
-
1@DanPantry "and does not have it already cached by their browser". I'm aware that *some* browsers compile JS, but not all of them, and not necessarily to machine code. Aside from which, the point of mentioning compiling is that unless a user is actively editing the source, a C file would tend to be downloaded and compiled just once per version, whereas JS source would be resubmitted and recompiled every time the browser's cache of the file was invalidated. Not to mention every visitor of a website will have the code implicitly downloaded (unless their browser says otherwise), unlike C. – Pharap Mar 05 '17 at 02:06
-
3@Pharap I think he meant using something like [Webpack](https://webpack.js.org) before deploying the code. With Webpack, you can work on multiple files and then compile them into one bundle. – Brian McCutchon Mar 05 '17 at 03:20
-
2Couldn't this (the LOC measure) be of no interest? I mean, a good IDE would be able to present any source code in a kind of modularized way, for example presenting such a huge source file as a collection of functions and not as a huge text file. Source file size (LOC) is almost of no interest; what matters is the way you manage it. I know that for compiling/link editing that is another question. – Jean-Baptiste Yunès Mar 05 '17 at 09:20
-
@BrianMcCutchon yeah, that's exactly what I meant, thanks. – Dan Mar 05 '17 at 14:48
-
@Dan Well yeah obviously. But that's not the point. The point is that if you only have minimal tool support JavaScript is just as bad as C (worse really) with modularization. If you rely on modern tooling you won't have those problems, but neither will you in C. It's all workarounds for historic reasons (in C's case the fact that systems in the 70s were very limited memory wise and in JavaScript's case just a horrible rushed design) when compared to modern languages – Voo Mar 05 '17 at 17:02
-
@Jean-BaptisteYunès In an OO language, LOC is not a concern in itself, but it's an indicator for other concerns. When a file hits 300 lines, you should start to question whether you're violating the Single Responsibility Principle. – Kevin Krumwiede Mar 05 '17 at 17:45
-
2@KevinKrumwiede: you surely meant "in a class", not "in a file". Not every language forces you to implement each class in a different file. – Doc Brown Mar 05 '17 at 17:52
-
This explanation gains force when you develop your code for multiple platforms with very different build environments: I work on a system in MSVS, UNIX make and VMS MMS, so any change has to be applied in three places! I also find a few quite large files just as easy to manage as many small files. – PJTraill Mar 05 '17 at 17:59
-
@Voo The point was that modern tools are an explanation for why JS libraries are served as one file, not that modern tools make it easier to write JS in multiple files. It's reasonably easy to write JavaScript in multiple files without modern tools (just use several script tags and namespacing/IIFEs to avoid polluting the global scope, though it's better to distribute a library as a single file), though tools like Webpack certainly make it better. Webpack was provided as an explanation differing from your conclusion that the authors of JS libraries find it easier to write JS in one file. – Brian McCutchon Mar 05 '17 at 19:00
-
1@Brian Are you claiming that writing JavaScript in multiple files without modern tools is easier than doing the same for C? That seems pretty unlikely. If you're trying to convince people that you can work around the limitations of JavaScript with modern tools - I really don't see anyone arguing against that (but the same is true for C). – Voo Mar 05 '17 at 19:10
-
@DocBrown The only legitimate case for putting more than one class in a file is identical to the case for putting large C programs in a single file: you want the whole thing to be easy to distribute and build without an IDE. – Kevin Krumwiede Mar 05 '17 at 20:57
-
1@Voo JavaScript is almost certainly easier for that. You don't have to compile each file or mess around with headers/extern/etc. Most people seem to agree, considering that Doc Brown's comment has 23 upvotes. – Brian McCutchon Mar 06 '17 at 00:04
-
@BrianMcCutchon: maybe those people upvoted my comment because of the first sentence, who knows ;-) – Doc Brown Mar 06 '17 at 09:04
-
@BrianMcCutchon & Dan Pantry In that case, my original comment was not about how many files the code was separated into, but that 4000 is not that shocking for C, but would be for JS because of the use cases. 4000 lines of JS being transmitted every time someone loads a webpage (without cache) is scarier than 4000 lines of C being transmitted for one-off compilation. – Pharap Mar 06 '17 at 21:42
Because C isn't good at modularization. It gets messy (header files and #includes, extern functions, link-time errors, etc.), and the more modules you bring in, the trickier it gets.
More modern languages have better modularization capabilities in part because they learned from C's mistakes, and they make it easier to break down your codebase into smaller, simpler units. But with C, it can be beneficial to avoid or minimize all that trouble, even if it means lumping what would otherwise be considered too much code into a single file.
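To make that concrete, here is a minimal sketch (three hypothetical files, names made up for illustration) of the boilerplate that splitting even one small helper out of a single-file C program involves: a header with include guards, a separate translation unit, and a symbol the linker must resolve.

```c
/* ---- util.h ---- a header is now needed so other files can see the declaration */
#ifndef UTIL_H
#define UTIL_H

int add_numbers(int a, int b);   /* declaration only; the definition lives in util.c */

#endif /* UTIL_H */

/* ---- util.c ---- a separate translation unit, compiled on its own */
#include "util.h"

int add_numbers(int a, int b)
{
    return a + b;
}

/* ---- main.c ---- forget to link util.o and you get an "undefined reference"
   error at link time rather than a compile-time error */
#include <stdio.h>
#include "util.h"

int main(void)
{
    printf("%d\n", add_numbers(2, 3));
    return 0;
}
```

None of this is hard on its own, but every extra module means another header to keep in sync, another object file to hand to the linker, and another opportunity for a link-time error instead of a compile-time one.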

-
42I think it is unfair to describe the C approach as 'mistakes'; they were perfectly sensible and reasonable decisions at the time they were made. – Jack Aidley Mar 03 '17 at 14:43
-
15None of that modularisation stuff is particularly complicated. It can be *made* complicated by bad coding style, but it's not hard to understand or implement, and none of it could be classed as "mistakes". The real reason, as per Snowman's answer, is that optimisation over multiple source files was not so good in the past, and that FileMon driver requires high performance. Also, contrary to the OP's opinion, those aren't particularly large files. – Graham Mar 03 '17 at 14:47
-
8@Graham Any file larger than 1000 lines of code should be treated as a code smell. – Mason Wheeler Mar 03 '17 at 14:54
-
1@MasonWheeler Sure, it's larger than ideal. But in the real world, it's not uncommon that the natural dividing lines don't let us partition our functionality more neatly. It's certainly something you'd question at code review, but it's not unambiguously bad. – Graham Mar 03 '17 at 15:01
-
12@JackAidley it's not unfair *at all*; having something be a mistake is not mutually exclusive with saying it was a reasonable decision at the time. Mistakes are inevitable given imperfect information and limited time and should be learned from, not shamefully hidden or reclassified to save face. – Jared Smith Mar 03 '17 at 15:49
-
5K&R days maybe not, but even the original ANSI C supports reasonably good modularity. I think the issue is that most C programmers simply don't know how to use the features, hence they write smelly code in big files. Perhaps something to do with the fact that it was a lot of assembly language programmers' first high-level language? Recompiling 4000 lines of code every time one line in one function is tweaked was not a good idea, especially in the old days. That was a lot of floppy disk accesses... it'd shake a Kaypro II right off the desk. (modern 320KB disks, not those crappy 160KB ones! :p) – Dan Haynes Mar 03 '17 at 17:00
-
4@JaredSmith If someone took the train to travel cross-country in 1895, would you call it a "mistake" that they didn't take a plane? Of course not. Planes didn't exist, and the option was impossible. Therefore it cannot have been a mistake at that time. – barbecue Mar 03 '17 at 17:20
-
1@JaredSmith: Sure, but this is not a case of that. C's approach was not a mistake, it was a good approach for the time it was designed. The better approaches available now require resources and tools unavailable or inappropriate for computers of the time. – Jack Aidley Mar 03 '17 at 18:27
-
6I think C is really hampered by the (still) strict adherence to the "single-pass" compilation process - it made sense when computers had sequential-only storage and didn't have enough RAM to load entire programs into memory, but a 2-pass compiler (which eliminates the need for forward-declaration) would save everyone a lot of trouble - and would enable compilation of multiple files simply by concatenating `.c` files together. – Dai Mar 03 '17 at 18:31
-
10Anybody who claims that C's approach is not a mistake fails to understand how a seemingly ten-liner C file can actually be ten-thousand-liner file with all headers #include:d. This means every single file in your project is effectively at least ten thousand lines, no matter how much is the line count given by "wc -l". Better support for modularity would easily cut parsing and compilation times into a tiny fraction. – juhist Mar 03 '17 at 18:33
-
6@juhist: Okay, now implement a better method on a PDP-11. Claiming it was a mistake ignores the machines it was originally designed to work with. It was a perfectly good design for the period it was designed in. Better solutions are available with modern computers but remain unsuited to the computers of the era. – Jack Aidley Mar 03 '17 at 19:58
-
@Dai: I find it curious that even though conventions favored passing data pointers before the size of the data identified thereby, the authors of C99 decided to require that functions that include variable-length arrays in the prototype must accept the length parameter before the pointer, rather than specifying that a compiler should parse parameter lists in two steps (first identify the names of all "simple" parameters and then compute VLA sizes). A compiler won't need to *do* anything with VLA size expressions until after all parameters are parsed, so why not parse first? – supercat Mar 03 '17 at 22:14
-
6@JackAidley It can be both a mistake and a historically sensible decision, the two things are not mutually exclusive. – Pharap Mar 04 '17 at 01:06
-
4Pharap, the "mistake", then, would be continuing to use the language without redesign -- I don't see how it could possibly have been a mistake at the time to design something that was actually usable for its immediate use cases, including with respect to hardware support. – Charles Duffy Mar 04 '17 at 18:44
-
3@Pharap, if by 'historically sensible' you mean a non-mistake _for that era_, it sounds like you're saying something may be a "mistake" merely when it's found unsuitable for use _outside of its intended scope_. That's a bit like calling your lawnmower a "mistake" merely because it doesn't make smoothies. It was not _designed_ to make smoothies, so it cannot fairly be called a mistake due to that (imo). – Mar 04 '17 at 21:19
-
3The people waving the "mistake" flag fail to understand that C, like a chainsaw, does not protect the programmer from his mistakes. If you wield it without due care, you're going to get hurt. – Blrfl Mar 04 '17 at 22:03
-
2@Blrfl A chainsaw has a clearly defined interface where the hands go in one easily-understood spot and it's nowhere near the cutting part. In C, this is not the case; you're *required* to put your hands right up by the moving parts in order to get anything done. If someone built a chainsaw like that and then pointed at them and said "it's their own fault for getting their hands too close" once the inevitable happens, they'd probably be sued out of existence. – Mason Wheeler Mar 05 '17 at 10:54
-
2@MasonWheeler Despite that well-defined interface, it's still possible for the untrained to get a faceful of chain by putting the guide bar where it doesn't belong. You can add a tip guard to prevent kickback, but the trade-off for that safety is having a saw that's less useful because now the last six inches of the bar can't be put into the work. The tip guards in programming languages that have them cannot exist without having been put there by someone capable of working safely in a language that doesn't have them. – Blrfl Mar 05 '17 at 13:15
-
@CharlesDuffy "Pharap, the "mistake", then, would be continuing to use the language without redesign" - well - the problem is that a) C is lingua franca of all system programming because most systems are rooted in 70s and 80s. Even if you need to write a new project all the API of OS is in C headers anyway b) this is a niche and most modern languages went a route which cannot be directly used there (GC and IRQL levels don't mix) c) you need to have it working on strange platforms care about ABI. That makes choice of language narrow (for example pre MSVC 2015 C89 was only version supported... – Maciej Piechotka Mar 05 '17 at 19:20
-
... and C++ has strange ABI problems. So if you wanted to write a system library, you probably should use C89 unless you ported the program to MSVC 2015. Given the inertia of large programs, as a new compiler might mean new bugs and/or problems, most codebases which need to care about such things and need to work on Windows default to C89). Combining it with a), that makes C the modern COBOL/Java of system/embedded programming. You may hate it, but it is the best option. Maybe in 10-20 years it will move to Rust or C++ (if the latter gains reasonable ABI stability), but I don't think it will change anytime soon. – Maciej Piechotka Mar 05 '17 at 19:23
-
@MaciejPiechotka, believe you me, I love C (and at least used to have some code in the Linux kernel -- I've done my share of system programming). "The mistake" was a phrase used with the intended implied context of "if there *was* a mistake", without any intention of conceding same. There's actually a story you might appreciate about exactly how a customer who asked the shop I used to work for (~2001) to build support for kernel modules written in C++ was treated... – Charles Duffy Mar 05 '17 at 19:27
-
1@CharlesDuffy Sorry, I misunderstood. Although I am newbie with regard to system programming I had my share of answers why I don't use C++17 or C++11 on SE why my problem was in left hoof of mule named Deliverance ([link for people who don't know Mickens](https://www.usenix.org/system/files/1311_05-08_mickens.pdf)). – Maciej Piechotka Mar 05 '17 at 19:53
-
**Please avoid extended discussions in comments. If you would like to discuss this answer further then please visit the chat room. Thank you.** – maple_shaft Mar 08 '17 at 17:31
Aside from the historical reasons, there is one reason to use this in modern performance-sensitive software. When all of the code is in one compilation unit, the compiler is able to perform whole-program optimizations. With separate compilation units, the compiler cannot optimize the entire program in certain ways (e.g. inlining certain code).
The linker can certainly perform some optimizations in addition to what the compiler can do, but not all. For example: modern linkers are really good at eliding unreferenced functions, even across multiple object files. They may be able to perform some other optimizations, but nothing like what a compiler can do inside a function.
One well-known example of a single-source code module is SQLite. You can read more about it on The SQLite Amalgamation page.
1. Executive Summary
Over 100 separate source files are concatenated into a single large file of C-code named "sqlite3.c" and called "the amalgamation". The amalgamation contains everything an application needs to embed SQLite. The amalgamation file is more than 180,000 lines long and over 6 megabytes in size.
Combining all the code for SQLite into one big file makes SQLite easier to deploy — there is just one file to keep track of. And because all code is in a single translation unit, compilers can do better inter-procedure optimization resulting in machine code that is between 5% and 10% faster.
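As a rough sketch of the effect being described (the function name is made up for illustration): when the definition is visible in the same translation unit, an optimizing compiler can inline the call and even fold it down to a constant; if the caller only saw a declaration from a header, the compiler would have to emit a real call into the other object file unless link-time optimization is in play.

```c
#include <stdio.h>

/* If square() lived in its own square.c and this file only saw the
 * declaration "int square(int x);" from a header, the compiler working
 * on this translation unit could not inline it, and without LTO the
 * call would survive into the final binary.  With the definition in
 * the same translation unit, as in an amalgamation-style build, a
 * typical optimizing compiler inlines the call and reduces square(21)
 * to the constant 441 at compile time. */
static int square(int x)
{
    return x * x;
}

int main(void)
{
    printf("%d\n", square(21));
    return 0;
}
```

Modern toolchains can recover much of this across separate files with link-time optimization (as several comments below note); a single translation unit simply gets it from any optimizing compiler.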
-
16But note that modern C compilers can do whole-program optimization of multiple source files (although not if you compile them into individual object files first). – Davislor Mar 03 '17 at 03:35
-
10@Davislor Look at the typical build script: compilers are not realistically going to do that. – Mar 03 '17 at 04:11
-
4It’s significantly easier to change a build script to `$(CC) $(CFLAGS) $(LDFLAGS) -o $(TARGET) $(CFILES)` than to move everything to a single source file. You can even do the whole-program compilation as an alternative target to the traditional build script that skips recompiling source files that haven’t changed, similar to how people might turn off profiling and debugging for the production target. You do not have that option if everything is in one big heap o’source. It’s not what people are used to, but there’s nothing cumbersome about it. – Davislor Mar 03 '17 at 06:47
-
9@Davislor whole program optimization / link-time optimization (LTO) also works when you "compile" the code into individual object files (depending on what "compile" means to you). For example, GCC's LTO will add its parsed code representation to the individual object files at compile time, and at link time will use that one instead of the (also present) object code to re-compile and build the whole program. So this works with build setups that compile to individual object files first, though the machine code generated by the initial compilation is ignored. – Dreamer Mar 03 '17 at 07:54
-
8JsonCpp does this nowadays too. The key is that the files are not this way during development. – Lightness Races in Orbit Mar 03 '17 at 11:01
-
1@Dreamer I did not know that. Thanks! In any case, you don’t need to have a single huge source file to do whole-program optimization on many (any?) modern compilers. – Davislor Mar 03 '17 at 15:30
-
I don't know the technical details of the optimization done on whole programs, but saying that compilers cannot (possibly) do whole program optimizations across separate files (I thought that's what optimization is all about anyway ... ) makes the compilers seem pretty silly. They see all the content after all, so why would they not be able to gather information and then make an educated optimization decision? At the very minimum, they could decide to join some files together themselves to check for further optimization. – Zelphir Kaltstahl Mar 03 '17 at 20:49
-
@Davislor, there is no need to use $(CFILES) if all the files are `#include`d in 1 C file. It does put a higher burden on the compiler though. – Dmitry Rubanovich Mar 04 '17 at 10:44
-
1@Zelphir That's not how C compilers (and compilers in general) have traditionally worked. There are very good historical reasons for it (not much memory in your average machine in the 70s), but as has already been stated, modern compilers are a good deal cleverer - although it is much harder to do this in C than in modern languages which don't really have the idea of compiling single files separately. – Voo Mar 04 '17 at 11:51
-
@DmitryRubanovich It is possible to `#include "source.c"`, but that has always been considered terrible style. Most maintainers are also going to be taken by surprise when they declare a global variable or macro in what they think is a separate compilation unit, and another file breaks. – Davislor Mar 05 '17 at 01:14
-
In any case, if you write separate files, you can turn whole-program optimization on or off. If everything is a single monolithic file, you can only compile the whole program. This also precludes other useful workflows, such as compiling one module along with a test driver to do unit tests. – Davislor Mar 05 '17 at 01:18
-
@Davislor, even the phrase "unit test" didn't exist when the first ping code was written. Just to put things in perspective, other things which didn't exist then: Java, Linux, the HTTP protocol. As for whether it is considered a terrible practice today, it is used on large projects because it can *significantly* speed up compilation times when pre-compiled headers are not available. – Dmitry Rubanovich Mar 05 '17 at 03:58
-
@DmitryRubanovich I’m not second-guessing the original designers of `ping`, which is a great little utility. The question was why some projects do this, not whether new projects should, but that was then and this is now. As for compilation speed: normal workflow is to change one part of the program, recompile, and test. Unless you’re much better than I, more than once. If the whole build takes so long that it’s a genuine inconvenience, you really don’t want to have to rebuild the whole codebase every time. Recompiling one object file and relinking is much faster, after the first time. – Davislor Mar 05 '17 at 08:10
-
@Davislor Modern C compilers don't exist on embedded platforms, and until recently the gcc LTO implementation could be... problematic. And if you target RHEL/CentOS 4/5, you are down to the era of gcc 3.4/4.1. I don't think LTO existed in gcc back then, let alone being stable. SQLite was released in 2000, when 2.95.2 was brand new, so it was written with this version in mind. Given that splitting into multiple files would be major refactoring AND would have the possibility of regressions... – Maciej Piechotka Mar 05 '17 at 19:32
-
@MaciejPiechotka I understand that the question is historical, and there are reasons why existing codebases don’t change. Maybe there are platforms with no good modern cross-compiler? Do you have an example? Nevertheless, this answer is incorrect: you do not need to code this way today to get the benefits of whole-program optimization. That hasn’t always been the case. – Davislor Mar 05 '17 at 21:30
-
3Rebuilding a whole program every time anything changes will ensure that the code one runs will always match the source. By contrast, partial build systems create the danger that one might accidentally change the source in a way that will alter the behavior of the next *full* build but not the next incremental one. – supercat Mar 05 '17 at 22:43
-
1@Davislor CentOS/RHEL 5/6 are hardly historical (I wish they had...) and they use gcc 4.4.x or older. I think LTO started being added around 4.6 or 4.7 and wasn't really prime before 5.x. I don't know much about particular platforms but QCC 6.6 is based on 4.7. PIC16/PIC18 doesn't seem to have a good support either. Even if there is a cross compiler then you need to port the source code - in best case it is just recompilation but in worst case it is tracking differences between compilers. That said this is not my area of expertise (but I do work on codebase which should run on CentOS 5/6). – Maciej Piechotka Mar 06 '17 at 04:28
-
@Davislor: I don't know about that. Today's C compilers on today's machines may process more lines per second than a 1986 Pascal compiler (Turbo Pascal 3.0) running on a 1982 computer (original IBM PC), but in many cases the margin isn't huge. Turbo Pascal 3.0 didn't support incremental compilation, but generally had faster build times than other Pascal compilers that did. – supercat Mar 06 '17 at 22:56
-
@supercat I don’t go back that far, but I did run Turbo C 3.0 on an XT that even by that time was old. I strongly suspect (but don’t have the hardware to benchmark) that build time for the same source code would be dominated by disk speed. I bet that the optimizer today does do a lot more, although you can turn it off, but just reading from and writing to DRAM or a SSD instead of floppies is a huge win. – Davislor Mar 06 '17 at 23:28
-
@Davislor: The performance of the PC and XT was equivalent. I don't remember there being a Turbo C 3.0; I thought after 2.11, TC got replaced with TC++, which was a fair bit slower. Turbo C used a fairly normal compile-and-link approach; Turbo Pascal turned Pascal source *directly* into machine code. So directly, in fact, that if one got a message "Runtime error at address 1234", one could invoke the "Find runtime error" option, type 1234, and have the compiler show what location that represented in source code. – supercat Mar 07 '17 at 00:14
-
1@Davislor: Some systems today might be able to use debug-info files to provide such functionality, but TP3.0 used a different approach: compile the program to the equivalent of /dev/null but count how many bytes have been output, and set the compilation-error flag when that count reaches 0x1134 (0x1234 minus the start address of 0x0100). The location of the compilation "error" will be the source location corresponding to address 0x1234. Pretty slick, eh? – supercat Mar 07 '17 at 00:17
-
@supercat Our (fun!) discussions are getting off-topic again, but yes, it supported both C and C++ and shipped as Turbo C++. Thinking back, I had learned C first, but I mostly used the C++ compiler at that job. – Davislor Mar 07 '17 at 00:24
In addition to the simplicity factor the other respondent mentioned, many C programs are written by one individual.
When you have a team of individuals, it becomes desirable to split the application across several source files to avoid gratuitous conflicts in code changes. Especially when there are both advanced and very junior programmers working on the project.
When one person is working by himself, that isn't an issue.
Personally, I use multiple files based on function as a habitual thing. But that's just me.

-
4@OskarSkog But you will never modify a file at the same time as your future self. – Loren Pechtel Mar 05 '17 at 05:13
Because C89 didn't have inline functions.
That meant that breaking your file up into functions caused the overhead of pushing values onto the stack and jumping around. This added quite a bit of cost compared to implementing the code in one large switch statement (event loop). But an event loop is always much more difficult to implement efficiently (or even correctly) than a more modularized solution, so for large projects people would still opt to modularize. But when they had the design thought out in advance and could control the state in one switch statement, they opted for that.
Nowadays, even in C, one does not have to sacrifice performance in order to modularize, because even in C functions can be inlined.
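As a minimal sketch (the helper is hypothetical) of what that looks like in C99 and later: a tiny function can sit behind a readable name, be marked `static inline`, and still compile down to essentially the same code as writing the expression in place, with no call/return or stack traffic.

```c
#include <stdio.h>

/* A small helper that C99 and later let you define as "static inline"
 * (typically in a header shared by several .c files); the compiler is
 * free to expand it at the call site, so factoring it out costs nothing. */
static inline unsigned wrap_index(unsigned i, unsigned size)
{
    return (i + 1u) % size;   /* e.g. advancing a ring-buffer index */
}

int main(void)
{
    unsigned i = 7u;
    i = wrap_index(i, 8u);    /* usually compiled to the same code as (7 + 1) % 8 */
    printf("%u\n", i);
    return 0;
}
```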

-
2C functions could be inlined just as well in '89 as these days; `inline` is something that should almost never be used - the compiler knows better than you in almost all situations. And most of those 4k LOC files are not one gigantic function - that's a horrible coding style which won't have any noticeable performance benefit either. – Voo Mar 04 '17 at 11:55
-
@Voo, I don't know why you mention the coding style. I wasn't advocating it. In fact, I mentioned that in most cases it guarantees a less efficient solution due to a botched implementation. I also mentioned that it's a bad idea because it doesn't scale (to larger projects). Having said that, in very tight loops (which is what happens in close-to-hardware networking code), needlessly pushing and popping values on/off stack (when calling functions) will add to the cost of the running program. This was not a great solution. But it was the best one available at the time. – Dmitry Rubanovich Mar 04 '17 at 12:29
-
2Obligatory note: *inline* keyword has only a little to do with inlining optimization. It is not a special hint for compiler to do that optimization, instead it has to do with linking with duplicate symbols. – hyde Mar 05 '17 at 16:01
-
@Dmitry The point is that the claim that compilers couldn't inline because there was no `inline` keyword in C89 - and that you therefore had to write everything in one giant function - is incorrect. You should pretty much never use `inline` as a performance optimisation - the compiler will generally know better than you anyhow (and can just as well ignore the keyword). – Voo Mar 05 '17 at 16:58
-
@Voo: A programmer and a compiler will generally each know some things the other doesn't. The `inline` keyword has linker-related semantics which are more important than the question of whether or not to perform in-line optimizations, but some implementations have other directives to control in-lining and such things can sometimes be very important. In some cases, a function may look like it's too large to be worth in-lining, but constant folding might reduce the size and execution time to almost nothing. A compiler that isn't given a strong nudge to encourage in-lining might not... – supercat Mar 05 '17 at 22:33
-
...take the time to find out whether it would be worthwhile if heuristics suggest it most likely won't. Having a compiler attempt more complicated analysis every time a program is compiled could, if a program is compiled many times, end up wasting a lot more time than having a program insert a directive once. – supercat Mar 05 '17 at 22:35
-
@Voo: To be fair, 30 years ago compilers were much less capable than they are now... and the average programmer was probably more knowledgeable about how to do low level optimization for 30 year old targets than modern programmers are about doing low level optimization for modern targets. – Mar 06 '17 at 03:48
-
This counts as an example of evolution, which I am surprised has not been mentioned yet.
In the dark days of programming, compilation of a single FILE could take minutes. If a program was modularised, then inclusion of the necessary header files (no precompiled header options) would be a significant additional cause of slowdown. Additionally the compiler might choose/need to keep some information on disk itself, probably without the benefit of an automatic swap file.
The habits that these environmental factors led to carried over into ongoing development practices and have only slowly adapted over time.
At the time, the gain from using a single file would have been similar to the gain we get today from using SSDs instead of HDDs.
