Splitting one class into subclasses to save memory

Question

Suppose that you describe programs, which have a lot of AssignmentStatement(target, /*value*/Expression). There are other statements, like if-statement and for-statement and all of them may have optional label. So, our assignment must be coded like AssignmentStatement(name, target, /*value*/Expression) on the blackboard. Now, whenever an instance is created, it will allocate 8 bytes more to store the null reference to the name. You see, the problem is the waste of memory. Name is null for 95% of statements but it still consumes 8 bytes of memory. Advanced programmers may advise Option[Name] to consume even more. But let's focus on plain String names for the sake of the argument.

Because 95% of instances will carry null in their first field, we can separate them into a separate class. In order to save the memory, I feel it is better to split my classes into those which have name and those which don't,

class NamedAssignmentStatement(name, target, /*value*/Expression)
class AnonymousAssignmentStatement(target, /*value*/Expression)

This eliminates duplication (many instances carrying the same attributes) and saves memory. The only problem is the difficulty of handling the duplicated classes.

They are proliferating exponentially actually. The value expression actually may have one optional mode argument, which, likewise the name, is almost never used. It makes sense to save more memory to replace AssignmentStatement(target, mode, /*value*/Expression) with two classes, one which has mode and another which not.

Now, I have 4 classes instead of single AssignmentStatement(name, target, mode, /*value*/Expression). But that is not the end of story. The values can be either simple expressions, evaluated to value but can also be waveforms, evaluated to a list of (value, duration) pairs. To distinguish between value types, I can wrap the value expression into SimpleExpression(expression) or into Waveform(expression) to be used like AssignmentStatement(target, Waveform(Expression)). The problem is that instantiating extra wrapper per every assignment will consume ca 20 extra bytes as far as I know from JVM. Instead of using the wrapper class, I can save the memory by signaling the value type with the assignment type itself, which means that I need to split the assginment class once more

SimpleAssignmentStatement(target, /*waveform_*/expression)
WaveformAssignmentStatement(target, /*simple_*/expression)

You see, I have got 8 classes instead of originally 1. It should be a hell to manage such system. Are there languages that address easy handling of such multiple case classes? To ask more generally, is the static typing and large number of classes vs dynamic memory allocation and abuse issue that I addressed here is a known in programming?

I see that programmers are pretty careless regarding the memory usage. They will answer that 8 bytes for null is a meaningless price, worth to ignore. But,

watch the pennies, and the pounds take care of themselves

The highest mountains are made of tiniest, invisible sub-micron particles, which, in case of computational process has a form of (hundreds of) millions of objects in memory which accumulate to the huge strucutres. The problem is not only the cost of the memory (which you say is dirt-cheap and should not be considered) but the PC performance bottleneck, which is the questoin of how well your data fits the cache size. When your data fits the cache, it operates thouthands times faster and I cannot feel free looking how simple assignment, which needs only couple of byes to encode the statement, target and value reference escalates enormously, to the tens of bytes.

A disclaimer: I do not ask about importance of microoptimization. You can add your considerations regarding it into your answer but I ask how to reclaim that memory, not whether it is worth. Please understand the difference between asked how and not asked if it is worth or not.

Here is some more food for thought that may help to answer. Suppose that expressions can be binary operation, like Add(a,b), Multiply(a,b) or Concatenate(a,b). You would encode them using one general class BinOp(opCode,arg1,arg2). On the other hand, one could make BinOp an interface and make opcode its method rather than a class field and use 3 classes to implement this interface. This again would save memory because all instaces of Add class would share the opcode at the class level instead of carrying "+" String in their opcode, causing huge duplication. Such subclassing would probably improve performance because instead 2-level cases, first determining that we are dealing with BinOp and then taking action depending on the opcode, we can take action directly since every subclassed binary operation now knows what action is expected on it. The action can be evaluate operation, which depends on the binary operation subclass. Probably, there are known techniques to deal with such multiple subclasses, which can encompass the exponential explosion of the classes that I address in my question.

Hint2 Scala makes you less productive claims that splitting one class into many case-subclasses makes you more productive actually, in the case-classes section. Probably, if you can understand that idea, you can probably tell if it already answers the question. He says that class explosion won't occur but I do not quite understand why.

"The problem is [...] the PC performance bottleneck, which is the question of" __did you profile it__? Increasing the complexity of your program tenfold just to save a few kilobytes of memory is not worth it, unless you measured that it actually is. — Vincent Savard, Jan 26 '16 at 13:49
@VincentSavard Profile what? The knowledge informed to me at the computer engeneering courses and advises in textbooks on Computer Architectures? Did you profile the Newton's F=ma laws? Profile the mountain made of tiny particles or the fact that reduced cache size affects the performance? — Valentin Tihomirov, Jan 26 '16 at 13:51
You've included a lot of details, but crucial information is missing: (1) what language are you using? You have different tradeoffs in Java than in C++. (2) Why do some statements have a `name`? Is this like a label you can `go to`? (3) You seem to have two types in the language (scalars and waveforms). Do these have to be separate on the AST level, or might it be better to implement a more general type system? (4) What are you using this syntax tree for? (5) Do you have prior experience with compiler implementation? — amon, Jan 26 '16 at 13:54
@VincentSavard I think in this case he's concerned about perf impact of his architecture, which is different to the usual profiling to locate bottlenecks. Sometimes you have to consider the performance impact of your program up front to save having to rewrite it completely. Besides, its always worth a little thought first before jumping into coding. — gbjbaanb, Jan 26 '16 at 14:01
@amon You are missing couple of things. This is programmers rather than stackoverflow. Our language is `blackboard` here. I am not using a language, therefore. I just ask which languages are good for the class management that I have described and think that both C++ and java will have the 8 byte penalty for null reference on 64 bit machine. The waveform can indeed be incorporated into the type system to make it a kind of `scalar` type but I do not want to mix waveform into the language-specified type system. The label is intended for go to indeed, I think but this is not a question. — Valentin Tihomirov, Jan 26 '16 at 14:04
@ValentinTihomirov C++ will only have the 8 byte penalty for classes that have a vtable - ie those using virtual inheritance. Without that, the C++ compiler will optimise every class down to just its member variables. — gbjbaanb, Jan 26 '16 at 14:09
I believe that @VincentSavard's point was that computers are complex systems with many layers of subsystems, and there are many ways that simple optimizations like yours might be overshadowed by larger concerns. For example, structure padding to align everything on a 64-bit boundary might mean you're only saving (say) three bytes instead of eight. Or your structure might not fit in a single cache line even without the name pointer, so saving eight bytes might not buy you anything anyway. — TMN, Jan 26 '16 at 14:53
Possible duplicate of [Is micro-optimisation important when coding?](http://programmers.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) — gnat, Jan 26 '16 at 15:14
@gbjbaanb How is that possible? Vtable is something about methods whereas an object instance is a collection of fields, which consume memory. You compile classes at compile time and instantiate later at run time. Your compiler cannot know which instance will have its name = null in advance. Compiler will generate the code that will allocate +8 bytes for any object instance, whether it is labeled with name or not, whether it is C++ or java. — Valentin Tihomirov, Jan 26 '16 at 17:07
@ValentinTihomirov a plain C++ class with no virtual methods can be optimised to nothing. Every call to one of its methods can be turned into a simple 'static' function. Its only when you have virtual methods does the compiler not know if a method is overridden or not at runtime. — gbjbaanb, Jan 27 '16 at 08:59
@gbjbaanb I asked what object instance (instance is a collection of the attributes, aka fields) has to do with methods? How do you optimize the collection (Name, Color, Age) to nothing? Where would the data be stored in. Will you store the attributes in outer space? — Valentin Tihomirov, Jan 27 '16 at 12:11
@ValentinTihomirov you can't optimise the member variables away, obviously. But you don't have to pay the overhead of a vtable. The vtable is part of every instantiated object. — gbjbaanb, Jan 27 '16 at 14:12
@gbjbaanb I am talking about `member variable` optimization. I do not understand how this issue is related to methods, which are obviously defined in the classes rather than in instances. When you invoke a method, runtime takes the class of the instance and then inspects the classes to find out the latest override. It does not inspect the instance for that purpose. It is well-known in java and it makes no sense to store a copy of vtable per every instance. It is just stupid. All object instances share their class methods and vtable. What is the point to replicate it? — Valentin Tihomirov, Jan 27 '16 at 15:46
@ValentinTihomirov then you need to be clearer. You talk about saving 8 bytes for 1 member variable because it matters to you, and then ignore 8 bytes overhead for some objects. Doesn't make sense that you refine the question to only include parts you want to consider and ignore everything else. I can't read your mind to understand what you wanted so I tell you what is relevant. Objects that use OO have RTTI and other information contained in them that you do not see. If you need to consider the memory usage of objects you need to take that into account just as much as members. — gbjbaanb, Jan 28 '16 at 08:54
@gbjbaanb I do not see which 8 bytes I ignore `for some objects`. I cannot read your mind to understand that. That is demagogy. The RTTI is included by a reference to the class. I have accounted it saying that Java empty object is roughtly 20 bytes. It includes RTTI. This howerver does not mean that the size of the reference to the class is dependent on the number of methods in the class. I say about 8 byte penalty for optional fields. Then you come and say that I won't have this penalty in C++. Obviously, I started to think that you are talking about C++ optimizing the data fields. — Valentin Tihomirov, Jan 28 '16 at 09:48

score 2 · Answer 1 · answered Jan 26 '16 at 14:04

2

The program you describe is a bit complex, but it occurs to me you can leverage the Flyweight design pattern. With that pattern you save memory by making many objects share the same state.

In your case the shared state would all the null values that take 8 bytes each.

Those classes that will have values other than null sould simply override the extrinsic properties of the flyweight. Extrinsic methods are those that exposed a common state whereas intrinsic ones are those who will vary from one instance to another. Extrinsic state will not eat up memory because it will exists only once.

Image taken from here.

answered Jan 26 '16 at 14:04

Tulains Córdova

39,201
12
97
154

Do you mean that instead of splitting classes at static typing level, I would better refer appropriate common state dynamically, which means that `NamedAssignment` refers the `AnonymousAssignment` instead of extending it? – Valentin Tihomirov Jan 26 '16 at 14:22
1

If we use a flyweight, we need to access that shared flyweight somehow inside the `Statement` class. Therefore we'll need a pointer to the flyweight which as a pointer to the `name` string. How does this improve over a nullable pointer to a string? – amon Jan 26 '16 at 14:24
@amon Probably we can benefit if we pack two attributes, the name and mode into the flyweight, which will be mostly (null, null) in most cases. This way, we replace two references with one and still benefit something. I just do not get why instance-level state decomposition is easier to manage than class level decomposition that I presented asking for language support. – Valentin Tihomirov Jan 26 '16 at 14:37

score 1 · Answer 2 · answered Jan 26 '16 at 14:57

Like others have said this is likely a premature optimization. Have you check exactly how much memory are you wasting for empty optional fields? If you want to have multiple, optimal classes you might want to have single factory method, that select appropriate class by passed arguments, but as you have said it won't scale anyway with many optional parameters.

If you really want to squeeze memory usage I am afraid you will have to give up on type safety and use something similar to network protocols / file formats.

[0]: 0011
[1-4]: field1
[5-9]: field2

Or:

[0]: 0100
[1-4]: field3

Of course it makes more sense for files as hard drive/network are much slower that CPU and wasting few cycles to pack data as tight as possible is a good trade-off. For RAM just using unaligned data access might make you program 10 times slower and with above method you also have to calculate offset for each field for every write&read. But it saves memory, so it is worth it, right?

Regarding `or course it makes more sense for files as hard drive/network are much slower that CPU`, I remember zipping the java object serialization stream written to disk. Zipping made it several times smaller but several times slower. Probably zipping is complex and requires a lot of iterations. I then resorted to Krio because noticed that there is a lot of garbage in the output stream. Krio did serialization twice slower again despite resulting stream was twice shorter. — Valentin Tihomirov, Jan 26 '16 at 17:21
Now, in OOP we use virtual calls. AnonymousStatement.GetName should use as much indirection as LabeledStatement.GetName. You actually get rid of performance penalty! Yes, if you have only class with name=null for anonymous, you have to check if it is null or not on every call. You call the getname directly if you split classes for performance. The only issue is code complexity which blows up when I need to maintain 8 classes instead of 3 properties. — Valentin Tihomirov, Jan 26 '16 at 17:27

score 0 · Answer 3 · answered Jan 26 '16 at 14:07

0

I think the Flyweight pattern might be what you're looking for. This is designed to store the minimal amount of data and apply it to objects on an as-needed basis.

eg. imagine a word processor where each letter was an individual object. Obviously it'd run slower than Word on a 286! But if you stored each letter in memory directly, and loaded the letter's data into an object as the user needed to manipulate it, then you have all the functionality without the overhead.

answered Jan 26 '16 at 14:07

gbjbaanb

48,354
6
102
172

In your example with characters all `a` characters are identical but the duplication that I described is partial: the AssignmentStatements have partially shared state. They are certainly different in target and value and sometimes even in label and mode are different. That is why I do not understand how can I share anything with flyweight here. – Valentin Tihomirov Jan 26 '16 at 14:12
@ValentinTihomirov I don't know, maybe in my example each letter also has state describing if it is bold or italic as well. Then it shows that some letter As might not be the same, and require different handling. – gbjbaanb Jan 26 '16 at 14:19

score 0 · Accepted Answer · answered Jan 28 '16 at 11:06

Have you tried restructuring your AST so that you have a NamedStatement class that wraps some other statement? Then you don't need to differentiate between NamedAssignmentStatement and AnonymousAssignmentStatement.

Similarly, for the mode, you could have an ExpressionWithMode that wraps some other expression and adds the mode.

Finally, for the Waveform/Simple distinction, that sounds like it's an inherent property of the expression (like a type), not a property of the expression use, so it should be a member inside Expression that signals the kind of expression (or a virtual function so that most expressions simply forward the kind of their inner expression(s)). Maybe you also have specific conversion expressions that change this type, but those need to be separate AST nodes anyway.

Whether this change is worth it compared to a nullable field comes down to heuristics. A nullable field consumes 8 bytes in every object, which is wasted when not used. A wrapper node consumes, let's say, 40 bytes (24 bytes overhead, the pointer to the string, plus the pointer to the inner node). The first method wastes memory of unlabeled statements, the second for labeled statements. You say that 95% of statements are unlabeled. That means for 100 nodes, the first method needs 5 * 8 = 40 bytes for the set name fields, and wastes 95 * 8 = 760 bytes for the unset name fields, for a total of 800 bytes. The second method needs 5 * 8 = 40 bytes for the set name fields, and wastes 5 * 32 = 160 bytes for the overhead of label nodes, for a total of 200 bytes. Assuming that your 95% figure is accurate, the second method saves you 600 bytes per 100 nodes. Of course, if you expect the ratio to be different, the calculation changes as well.

What is never worth it is a massive proliferation in distinct classes coming from exponential explosion along multiple change axes. The programming and complexity overhead just makes it such a bad solution that I would most strongly advise against it.

Splitting one class into subclasses to save memory

4 Answers4