4

Say I wrote a program containing 3 methods. Each method was 100 lines. Method 1 was main(), and Method 2 was called by main().

I then duplicated the program into an identical second program. After duplicating it, I added 4900 lines to Method 3.

Method 3 is never called by either program.

Program 1 has a file size of 3.7kb. Program 2 has a file size of 62.5kb.

Will this change in file size affect the running time of the program, even though Method 3 was never called? What if Method 3 had lines added until Program 2 reached extreme size?

This effect in running time should be considered down to the nanosecond level, since that is a non-negligible effect in some situations. Also, I would like to consider this effect for both compiled and interpreted languages.

(This is easy to test for something as small as 300 v 5200 lines. I'm asking more about the theoretical aspect of file size affecting running time than a specific scenario.)


My question was marked as a duplicate of Is micro-optimisation important when coding?. These questions have no relation, as I'm not asking if micro-optimization is important or unimportant. I'm asking if file size has an effect on program running time. I don't care if the effect it has is important or unimportant, I merely care that the effect does or does not exist. The other question also does not specifically mention file size effect on a program - their question focuses more on a program's logical structure rather than the size of the program file(s).

  • 2
    On what scale are we thinking about? Perhaps on a nanosecond level, yes, but this would be insignificant. Also [dead code elimination](https://en.wikipedia.org/wiki/Dead_code_elimination) would eliminate any code that is never called. – Jesse Good Aug 21 '16 at 02:20
  • 1
    @JesseGood What about interpreted languages, instead of compiled languages? Also, yes, down to the nanosecond level (I've edited the question to reflect this; thank you.) – Logan Hartman Aug 21 '16 at 02:31
  • @gnat Not even remotely related. I'm not asking if micro-optimization is important. I'm asking if the size of a file has any effect on how a program runs. I don't care if it's important or not, I'm curious as to whether the effect exists. That question also has no mention of file size, nor method size as far as I can see. – Logan Hartman Aug 21 '16 at 06:17

4 Answers

11

In theory, yes, file size can affect the running time of a program. However, the effect it has is likely so insignificant that you should never have to worry about it.

There are a few reasons for this:

  1. Compilers/interpreters are very smart and optimize code to be as performant as possible. Your example would be a great candidate for dead code elimination. However, they are not perfect, so unused variables, unreachable code, etc. can still affect the output (CPU instructions) of the compiler/interpreter. Simply put, the way you write code can affect the output CPU instructions. See the blog post linked below for an actual example of this.

  2. As a result of "file size" affecting the CPU instructions, the data in your program could be spread out in a way that hurts performance. Locality of reference explains much of this. In short, poor locality causes things such as cache misses and thrashing.

  3. While not necessarily affecting runtime, a larger file of course takes longer to load from disk, and longer to parse for interpreted languages, affecting the program's startup time.

  4. If your program were big enough, it could cause additional page faults.
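Point 3 is easy to sketch for an interpreted language. The following is a minimal illustration (assuming CPython; the two sources and the `bytecode_bytes` helper are made up for this example): a function that is never called still gets compiled when its module is loaded, so a bigger source costs more at startup even though the call path is unchanged.

```python
# Sketch (assumption: CPython): an uncalled function is still compiled to
# bytecode at module load time, so interpreter startup pays for dead code.
small_src = "def used():\n    return 1\n"
big_src = small_src + "def unused():\n" + "".join(
    f"    x{i} = {i}\n" for i in range(500)) + "    return 0\n"

small_code = compile(small_src, "<small>", "exec")
big_code = compile(big_src, "<big>", "exec")

def bytecode_bytes(code):
    """Total bytecode size, including nested function bodies."""
    total = len(code.co_code)
    for const in code.co_consts:
        if hasattr(const, "co_code"):   # nested code object (a function body)
            total += bytecode_bytes(const)
    return total

print(bytecode_bytes(small_code), bytecode_bytes(big_code))
```

The calls into `used()` run the same bytecode either way; only the one-time compile step grows with the source.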

For more details, this blog post goes into depth on how reducing the binary size helped performance.

Having said all that, "file size" is something you shouldn't worry about in about 99% of situations.

Jesse Good
  • 321
  • 2
  • 8
  • Everything mentioned here is sound and reasonable. So +1 for that. The heart of the issue is something called the [space time tradeoff](https://simple.wikipedia.org/wiki/Space-time_tradeoff). You can save time by using more space. You can save space by taking more time. The converse is: wasting space can cost you the ability to save time. – candied_orange Aug 21 '16 at 07:02
  • Worth mentioning that the experiment conducted by the OP (adding a function that is never called by the program) was a *very* easy target for the optimizer / dead code eliminator. And even if the code stays, the worst it will do is slow down program startup as you've mentioned. It becomes more involved when the bloated code is sprinkled all over the useful parts of your program putting pressure on your instruction cache. Many C++ programmers are worried about this a lot. At the optimizer level, this is the question of when to inline. It is anything but trivial to answer. – 5gon12eder Aug 21 '16 at 20:12
  • @5gon12eder It was intended to be simple, since it was a scenario created for illustrative purposes; however, I didn't know that optimizers targeted out-of-reach code such as uncalled methods. – Logan Hartman Aug 21 '16 at 22:14
  • @JesseGood What kind of effect would file size have for programs written in assembly or even raw machine code? – Logan Hartman Aug 21 '16 at 22:17
  • @LoganHartman: Programs written in assembly would be affected in the same way as shown in my answer, although the likelihood would decrease since "code bloat" in assembly is not as likely since you are closer to the metal. Now, for raw machine code, it is impossible for human beings to program in it (raw machine code is executed by the CPU), so the question seems moot... – Jesse Good Aug 22 '16 at 00:37
  • @JesseGood See the second paragraph about [Machine Code](https://en.wikipedia.org/wiki/Machine_code). It is possible to program in machine code, technically. – Logan Hartman Aug 22 '16 at 01:35
  • @LoganHartman: You are misreading the link a little bit. Note that link says *Numerical* machine code, i.e. yes you can program in a numeric representation of machine code such as hexadecimal. This of course still has to be translated into machine code. The answer doesn't change though. – Jesse Good Aug 22 '16 at 03:55
  • @JesseGood I may have not made myself clear. That was why I included machine code in the same comment as assembly - numeric machine code is simply a bit lower than assembly, and I didn't know if that would make a difference. That was what I was referring to, and my apologies for not making that completely clear. – Logan Hartman Aug 22 '16 at 03:58
  • @LoganHartman: I understand, but I think we also have to consider *practicality*. Otherwise, the scope becomes too broad. – Jesse Good Aug 22 '16 at 04:10
2

Most modern operating systems will use virtual memory capabilities (supported by hardware features) to memory map the executable file into memory, which suggests there will be little to no effect due to sheer size of the executable, if the contents are otherwise largely unused/unreferenced.

Virtual memory, combined with copy-on-write, also handles the read/write data unique to each instantiated process: copy-on-write detects when (initially file-backed) mapped pages are modified, and the virtual memory system pushes dirty (modified) pages out to the paging file as needed when memory is constrained.
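The memory-mapping behavior can be sketched with Python's `mmap` module. The 16 MiB file here is a stand-in for a large, mostly unused executable image: mapping it is cheap because pages are only faulted in when they are actually touched.

```python
import mmap
import os
import tempfile

# Sketch: demand paging is why a large but untouched region of a mapped
# file costs little at runtime. (The file and sizes are illustrative.)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"X")                       # the one byte we actually use
    f.seek(16 * 1024 * 1024 - 1)
    f.write(b"\0")                      # extend to 16 MiB of unused space
    path = f.name

with open(path, "rb") as f:
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = view[0:1]                   # touching one byte faults in one page
    view.close()
os.remove(path)
print(first)
```

The OS never needs to read the untouched 16 MiB from disk; only the page containing the accessed byte is brought into memory.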


OK, factoring out load time, let's assume for starters that the extra, unused file content you're asking about occurs after all the code that does get used, which is grouped together at the beginning. There should be virtually no effect on runtime performance in that case. Very generally speaking, except for the CPU cache, the same instruction sequence executed will take the same time.

There are cache specific behaviors that might give you a hiccup or even some pathological behaviors, but I don't really see any big problems associated with a big contiguous chunk of either code or data that is not referenced in any way.

To be more clear, hardware caches have a notion of associativity. On modern processors that is usually 8-way or more; embedded parts may vary. An N-way associative cache allows up to N addresses that hash to the same value (a CPU-internal hash of addresses) to be cached at once. When you try to cache an (N+1)'th value with the same hash, one of the other elements gets evicted, even if there are still kilobytes left unused in the cache. Hardware caches are designed to work really well with contiguous memory.

So, if you insert other memory that isn't used in between memory that is used, you could create a pathological case where you are not using the cache effectively due to an associativity limit. You would have to try pretty hard: by putting all your actually accessed code (& data) on the same cache line hash, you could exhaust the associativity. Mind that some of this (running out of associativity) happens even under normal circumstances, so I would say that barring really trying to construct a worst case, this should not be a factor for even one or more lumps of code or data loaded but not otherwise used.
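The associativity arithmetic above can be sketched directly. The cache geometry here (32 KiB, 8-way, 64-byte lines) is a typical but hypothetical L1, and simple modulo set indexing stands in for whatever hash the CPU actually uses:

```python
# Hypothetical L1 geometry for illustration: 32 KiB, 8-way, 64-byte lines.
LINE_BYTES = 64
WAYS = 8
CACHE_BYTES = 32 * 1024
SETS = CACHE_BYTES // (LINE_BYTES * WAYS)   # 64 sets

def cache_set(addr):
    # Real CPUs may hash the address; plain modulo indexing is the common case.
    return (addr // LINE_BYTES) % SETS

# Addresses spaced LINE_BYTES * SETS (4096) bytes apart all land in the
# same set, so the (WAYS + 1)'th access must evict one of the first WAYS.
stride = LINE_BYTES * SETS
colliding = {cache_set(i * stride) for i in range(WAYS + 1)}
print(SETS, colliding)
```

Nine hot addresses at that stride compete for eight ways in one set while the other 63 sets sit idle, which is exactly the pathological layout described above.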

Now, there's still another effect: caches are segmented into lines, chunks of, say, 128, 256, or 512 bits (or other sizes). In another pathological case, you can use up the cache in a different way. Since the cache (on almost every CPU) loads a full line even when only a single byte is needed, you could create a scenario where the cache is exhausted more quickly than you'd expect, by using only a very small amount of actual memory per cache line. As with the other case, you would have to work hard to construct this pathology; the idea is to intersperse the code (or data) that is used with code or data that isn't, at certain very regular intervals.

(There is a similar effect on virtual memory paging, that you could use up real memory faster than you'd like by using only a small number of bytes per page.)
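Both of these waste effects are just arithmetic; a sketch with hypothetical numbers (32 KiB cache, 64-byte lines, 4 KiB pages, and made-up "useful bytes per line/page" figures) shows how badly interleaved dead code can shrink effective capacity:

```python
# Hypothetical: interleaved dead code leaves only 4 useful bytes on each
# 64-byte cache line that gets touched.
CACHE_BYTES = 32 * 1024
LINE_BYTES = 64
lines = CACHE_BYTES // LINE_BYTES           # 512 lines total
useful_per_line = 4
effective_cache = lines * useful_per_line
print(effective_cache)                      # useful bytes, out of 32768

# The same arithmetic for 4 KiB virtual-memory pages:
PAGE_BYTES = 4096
useful_per_page = 16
pages_per_mib = (1024 * 1024) // PAGE_BYTES
print(pages_per_mib * useful_per_page)      # useful bytes per resident MiB
```

With those (deliberately pessimistic) numbers, a 32 KiB cache holds only 2 KiB of useful data, and each resident MiB of pages carries only 4 KiB of useful bytes.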

Barring some pathological construction designed to hurt the cache, extra unused code or data should not affect the runtime performance.

Erik Eidt
  • 33,282
  • 5
  • 57
  • 91
  • For embedded systems, would file size have a larger effect due to some not supporting virtual memory (obviously, in an embedded system, file size is going to matter due to a lack of physical memory existing; I mean solely in terms of program running time)? – Logan Hartman Aug 21 '16 at 22:27
  • see addition above – Erik Eidt Aug 22 '16 at 01:33
0

If the code is signed, the OS will need to load the entire file image, hash every single byte, and verify the digital signature. The time required for this operation obviously scales with the size of the .exe, regardless of how much of it is actually executed.
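A hedged sketch of why verification cost scales with image size, using SHA-256 as a stand-in for whatever digest the signature scheme actually uses, and NOP-filled buffers as stand-ins for small and large executables:

```python
import hashlib

# The verifier must hash every byte of the image, so the work is O(n) in
# file size no matter how little of the file is ever executed.
small_image = b"\x90" * 1_000        # ~1 KB "executable"
large_image = b"\x90" * 1_000_000    # ~1 MB: 1000x the hashing work
d_small = hashlib.sha256(small_image).hexdigest()
d_large = hashlib.sha256(large_image).hexdigest()
print(len(d_small), d_small != d_large)
```

The digests are the same fixed length, but producing the second one requires a thousand times as many bytes to be read and hashed before the first instruction runs.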

John Wu
  • 26,032
  • 10
  • 63
  • 84
-1

Yes, running time will be affected in interpreted languages (like JavaScript). In the case of compiled languages like C or Java, running time will also be affected, but by a very small amount (nanoseconds or microseconds), because the compiler has to load all functions into main memory in order to check for errors, if any exist.