Suppose that you describe programs, which have a lot of AssignmentStatement(target, /*value*/Expression)
. There are other statements, like if-statement and for-statement and all of them may have optional label. So, our assignment must be coded like AssignmentStatement(name, target, /*value*/Expression)
on the blackboard. Now, whenever an instance is created, it will allocate 8 bytes more to store the null reference to the name. You see, the problem is the waste of memory. Name is null for 95% of statements but it still consumes 8 bytes of memory. Advanced programmers may advise Option[Name] to consume even more. But let's focus on plain String names for the sake of the argument.
Because 95% of instances will carry null in their first field, we can separate them into a separate class. In order to save the memory, I feel it is better to split my classes into those which have name and those which don't,
class NamedAssignmentStatement(name, target, /*value*/Expression)
class AnonymousAssignmentStatement(target, /*value*/Expression)
This eliminates duplication (many instances carrying the same attributes) and saves memory. The only problem is the difficulty of handling the duplicated classes.
They are proliferating exponentially actually. The value expression actually may have one optional mode
argument, which, likewise the name, is almost never used. It makes sense to save more memory to replace AssignmentStatement(target, mode, /*value*/Expression)
with two classes, one which has mode and another which not.
Now, I have 4 classes instead of single AssignmentStatement(name, target, mode, /*value*/Expression)
. But that is not the end of story. The values can be either simple expressions, evaluated to value but can also be waveforms, evaluated to a list of (value, duration) pairs. To distinguish between value types, I can wrap the value expression into SimpleExpression(expression)
or into Waveform(expression)
to be used like AssignmentStatement(target, Waveform(Expression))
. The problem is that instantiating extra wrapper per every assignment will consume ca 20 extra bytes as far as I know from JVM. Instead of using the wrapper class, I can save the memory by signaling the value type with the assignment type itself, which means that I need to split the assginment class once more
SimpleAssignmentStatement(target, /*waveform_*/expression)
WaveformAssignmentStatement(target, /*simple_*/expression)
You see, I have got 8 classes instead of originally 1. It should be a hell to manage such system. Are there languages that address easy handling of such multiple case classes? To ask more generally, is the static typing and large number of classes vs dynamic memory allocation and abuse issue that I addressed here is a known in programming?
I see that programmers are pretty careless regarding the memory usage. They will answer that 8 bytes for null is a meaningless price, worth to ignore. But,
watch the pennies, and the pounds take care of themselves
The highest mountains are made of tiniest, invisible sub-micron particles, which, in case of computational process has a form of (hundreds of) millions of objects in memory which accumulate to the huge strucutres. The problem is not only the cost of the memory (which you say is dirt-cheap and should not be considered) but the PC performance bottleneck, which is the questoin of how well your data fits the cache size. When your data fits the cache, it operates thouthands times faster and I cannot feel free looking how simple assignment, which needs only couple of byes to encode the statement, target and value reference escalates enormously, to the tens of bytes.
A disclaimer: I do not ask about importance of microoptimization. You can add your considerations regarding it into your answer but I ask how to reclaim that memory, not whether it is worth. Please understand the difference between asked how and not asked if it is worth or not.
Here is some more food for thought that may help to answer. Suppose that expressions can be binary operation, like Add(a,b)
, Multiply(a,b)
or Concatenate(a,b)
. You would encode them using one general class BinOp(opCode,arg1,arg2)
. On the other hand, one could make BinOp an interface and make opcode its method rather than a class field and use 3 classes to implement this interface. This again would save memory because all instaces of Add class would share the opcode at the class level instead of carrying "+" String in their opcode, causing huge duplication. Such subclassing would probably improve performance because instead 2-level cases, first determining that we are dealing with BinOp and then taking action depending on the opcode, we can take action directly since every subclassed binary operation now knows what action is expected on it. The action can be evaluate
operation, which depends on the binary operation subclass. Probably, there are known techniques to deal with such multiple subclasses, which can encompass the exponential explosion of the classes that I address in my question.
Hint2 Scala makes you less productive claims that splitting one class into many case-subclasses makes you more productive actually, in the case-classes section. Probably, if you can understand that idea, you can probably tell if it already answers the question. He says that class explosion won't occur but I do not quite understand why.