-1

I have a collection that I'm putting elements into based on some sort of input. About 90% of the time, there will be only one element in this collection. There may be more elements, but each element past the first has an exponentially decreasing chance of appearing. So it might have 2 elements 9% of the time, 3 elements 0.9%, 4 elements 0.09%, etc.

What data structure is most efficient here? My two ideas are an ArrayList with an initial size of 1, or some other very small number (maybe 2 or 3?), or a LinkedList, which will always have the overhead of the head/tail pointers, but won't waste any time/space allocating additional memory.

I realize that whatever I choose probably isn't terribly important, but I've come across this situation multiple times and I'm interested if there's an established convention.

codebreaker
  • 1,694
  • 1
  • 18
  • 24
  • 2
    Possible duplicate of [Is micro-optimisation important when coding?](https://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) – gnat Nov 25 '17 at 18:32
  • @gnat but I am interested in the solution to this specific problem, too – codebreaker Nov 25 '17 at 18:33
  • 1
    Can the downvoters please explain? I feel like I've asked very similar questions that were upvoted here... – codebreaker Nov 25 '17 at 19:18
  • 3
    I suspect the reason for the downvotes is that the answer is effectively "it really doesn't matter", which you already state in your question that you understand, so there's really nowhere for an answer to actually go. For single-element collections, you are unlikely to ever see any noticeable performance difference between any of the available collection implementations. I mean, maybe HashSet would be a little slower, because it'd need to hash the item before adding it, but other than that you can just use anything. So why worry? Just use whatever seems most reasonable to you. – Jules Nov 26 '17 at 02:36

2 Answers2

6

For collections as short as yours, there's no point in using sophisticated data structures. The only aspect that might matter is memory consumption.

And as others already commented, what you're doing looks like premature micro-optimization. But that's not always bad.

I'd use new ArrayList(2). Why?

  • It doesn't degrade readability of your code later, using the collection.
  • It allocates an array large enough for 99 percent of cases.
  • It doesn't put correctness of your code at risk.
  • It won't be slowing down your application (surely less than a percent, if adding to the collection happens most of the time).
  • It will reduce memory consumption.

Maybe an initial size of 1 or 3 would be better, but you should spend time on that only if you found it necessary.

In a similar part of one memory-tight application, after profiling memory consumption patterns, I came up with the following scheme:

  • I had many single-element collections, often many of them containing the very same element.
  • For single-element collections, I shared singletons created with Collections.singletonList().
  • Only when "adding" the next element, I replaced that with an ArrayList.

That came at a cost:

  • It made the code less readable: instead of simply adding, I needed to support that list exchange.
  • It created the risk of introducing a bug there.

But in my case, it gave a significant memory improvement.

Ralf Kleberhoff
  • 5,891
  • 15
  • 19
0

If you have any concurrency concerns, this is a good use case for a CopyOnWriteArrayList. Their main drawback is they recreate copies whenever you add an element, but that's not happening often in your case! Plus you aren't wasting much memory and GC since your lists are tiny.
They provide great thread safety.

Memory/time wise they are slightly less efficient than the straight ArrayList as suggested by Ralf, but if you are ever locking on them, wrapping in a synchronizedList(), etc. then they would probably be more efficient.

user949300
  • 8,679
  • 2
  • 26
  • 35