5

I was reading this article on MathWorks about improving MATLAB performance and you will notice that one of the first suggestions is to preallocate arrays, which makes sense. But it also says that preallocating Cell arrays (that is arrays which may contain different, unknown datatypes) will improve performance.

But how will doing so improve performance because the datatypes are unknown so it doesn't know how much contiguous memory it will require even if it knows the shape of the cell array, and therefore it can't preallocate the memory surely? So how does this result in any improvement in performance?

I apologise if this question is better suited for StackOverflow than Programmers but it isn't asking about a specific problem so I thought it fit better here, please let me know if I am mistaken though.

Any explanation would be greatly appreciated :)

Alex McMurray
  • 153
  • 1
  • 4
  • My guess without knowing: The Cell Array is allocating memory for some general "data object" which only holds pointers/references to the actual data. The memory to hold the content gets allocated later when it gets in use and the decision about it's type is made. – thorsten müller Jul 10 '12 at 10:35

1 Answers1

4

I don't know details of MATLAB's memory handling, so this is just an educated guess: Cell arrays are implemented so that the array itself contains only references (pointers) to the cells, which actually live in the heap. It definitely can't allocate memory for the actual cells in advance because, as you wrote, their size is unknown. However, it can pre-allocate the pointer array itself, since the size of the pointers is known.

When you think about it, it would be quite difficult to implement an array whose element size wouldn't be constant: how would you know where in the memory X[1234] lives, if the size of each element can be different? Therefore a layer of indirection (store constant-sized pointers pointing to the actual data) is quite useful. An alternative would be some sort of linked list, a different kind of trade-off.

Joonas Pulakka
  • 23,534
  • 9
  • 64
  • 93
  • 1
    This would mean that consecutive cells aren't next to one another in a contiguous part of the memory/heap so I guess this means that cell arrays must be slower than standard arrays of fixed type. I suppose this would give an additional performance incentive to use arrays of fixed types with structure wherever possible instead of cell arrays, aside from the fact it makes the programming easier to think about. – Alex McMurray Jul 10 '12 at 10:43
  • Definitely, that's the way it is in most (if not all) programming languages: fixed, primitive types perform better because they're simpler, at least from the computer's point of view. – Joonas Pulakka Jul 10 '12 at 10:45
  • wrt. the linked list possibility, MATLAB could assign a certain amount of memory per element and then if an element exceeded that amount then use the last part of the memory to store a pointer to a new block of memory of a fixed size which stores the next bit of the element and so on until the element is fully stored, on the assumption that most datatypes will not require large amounts of memory (and if they do it is still only as inefficient as a linked list). Does anyone know enough about MATLAB to say whether cell arrays are an array of pointers or instead the linked list? – Alex McMurray Jul 10 '12 at 11:02
  • I'm 99.9 % sure that MATLAB's cell arrays aren't linked lists, because linked lists' pros are insertion and deletion, which aren't so important in stuff typically done in MATLAB, and downside is expensive random access, which is important in stuff typically done in MATLAB. – Joonas Pulakka Jul 10 '12 at 11:38