I want to design a hash table library that keeps usage statistics and, based on how it is used, switches between different implementations at runtime. For example, it could use one implementation for small hash tables and a different one for large ones (or tweak parameters to maximize cache hits).

For example, let's say I have a SmallDataHashTable optimized for small tables, a BigDataHashTable optimized for large ones, a LookUpHashTable optimized for lookups, an InsertHashTable optimized for insertion and deletion, and so on. Instead of leaving it to the user (a developer) to decide which one works best for their use case, I want to create a BestHashTable that adapts to the best implementation based on how it is being used. For example, if the user mostly works with large tables and lots of insertions, then the corresponding implementation is used under the hood.

To summarize: let's say I keep usage statistics for n parameters, and I want to select/tune the implementation based on these statistics:

1- Can such a design, even in theory, be the "best" general-use hash table?
2- How do I go about designing such a data structure? (I will be implementing it in C++, if that matters)
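
For concreteness, here is a rough sketch of the kind of wrapper I have in mind. All names are illustrative, and std::unordered_map is only a placeholder for the specialized implementations; the point is just to show where the usage statistics would be collected:

```cpp
#include <cstddef>
#include <unordered_map>

// Illustrative only: a wrapper that records usage statistics which a
// real BestHashTable could feed into its implementation-selection logic.
template <typename Key, typename Value>
class BestHashTable {
public:
    void insert(const Key& k, const Value& v) {
        ++inserts_;  // statistic: insertion count
        table_.emplace(k, v);
        if (table_.size() > max_size_) max_size_ = table_.size();
    }

    bool contains(const Key& k) {
        ++lookups_;  // statistic: lookup count
        return table_.find(k) != table_.end();
    }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<Key, Value> table_;  // placeholder back end
    std::size_t inserts_  = 0;  // the "n parameters" being tracked
    std::size_t lookups_  = 0;
    std::size_t max_size_ = 0;  // statistic: peak size
};
```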

pooya13
  • (From someone's comment a few days ago: https://tylerayoung.com/2019/01/29/benchmarks-of-cache-friendly-data-structures-in-c/) It does not require reflection at all. Reflection does not help. Also, C++ is not a JIT language, so usage statistics don't help much. – rwong Mar 24 '19 at 06:46
  • This usage relies heavily on what conditions make a hash table small/large. Also, why couldn't you use abstract classes? You could have a "base" (we can call it something like UsageBase), then you could make separate implementations for small and large. – eparham7861 Mar 24 '19 at 06:52
  • @eparham7861 I don't want the user to decide which subclass to use. I want the hash table to automatically use specific sub-class based on how the hash table is being used. – pooya13 Mar 24 '19 at 07:59
  • @rwong thanks! I took out reflection. I'm not sure what keywords would describe what I have in mind haha. I guess I am looking for a term for describing code that is able to use different runtime behaviour based on usage history. – pooya13 Mar 24 '19 at 08:04
  • Unless you are making this a library for other developers to use, I don't see how the user physically decides what implementation to use. The whole point of having abstract/concrete classes is to be able to have different behavior based on meeting criteria. – eparham7861 Mar 24 '19 at 08:15
  • @eparham7861 Hmmm, not sure if we are on the same page. Let's say I have a class called small_map optimized for small hash tables and one called large_map optimized for large ones. I want to create a class called best_map that, based on how it is being used, chooses to use small_map or large_map. For example, if the user uses best_map in their code with mostly small sizes, then best_map uses the small_map implementation at runtime by detecting that it is being used with small sizes. – pooya13 Mar 24 '19 at 08:21
  • Ok, you are intending the user to be a developer using your library. And it looks like you have the implementation for both small and large. You could go with some kind of NoSQL store that can be used like a local session log that is referenced/checked on startup. – eparham7861 Mar 24 '19 at 08:45
  • When you've done the implementation, make sure to thoroughly profile your hashtable against a plain large-size implementation. I wouldn't be surprised to see the plain one outperforming your switching implementation even with just a handful of entries. – Ralf Kleberhoff Mar 24 '19 at 11:29
  • The only scenario I can imagine this working for is if you have some persisted dataset that is copied into the hash table at startup time. In that case you could just instantiate either the BigDataHashTable or the SmallDataHashTable. If you start with no data, you would have to switch lookup logic when the size exceeds a certain number of items. Not impossible, but pointless; with the collection being a hash table either way, the lookup logic is already set, so you will not have many options. Gradually increasing the hash size is not an option either, since that would change the interface. – Martin Maat Mar 24 '19 at 11:33
  • It is common practice (and not really difficult) to have an implementation that (automatically) increases the number of hash buckets at some level of collisions / density. – Erik Eidt Mar 24 '19 at 14:00
  • Thanks guys for the helpful comments. I have edited/clarified the question to fit the rules. – pooya13 Mar 24 '19 at 22:12

1 Answer


You can have a proxy class that provides the same methods as a hash table and initially delegates them to an instance of the implementation optimized for small sizes.

Once the hash table grows beyond a certain size, the proxy creates an instance of the large implementation, moves all the entries to this instance, and makes it the default. A question then arises: will you revert to using the smaller instance if a lot of entries are deleted?
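
A minimal sketch of that proxy, assuming the two specialized implementations from the question exist (here std::map and std::unordered_map stand in for them, and the threshold value is arbitrary):

```cpp
#include <cstddef>
#include <map>
#include <unordered_map>
#include <utility>

// Sketch of the proxy idea. std::map stands in for the small-size
// implementation and std::unordered_map for the large one; in the real
// library these would be SmallDataHashTable and BigDataHashTable.
template <typename Key, typename Value>
class proxy_hash_table {
    static constexpr std::size_t grow_threshold = 1024;  // arbitrary

public:
    void insert(const Key& k, const Value& v) {
        if (!use_large_ && small_.size() >= grow_threshold)
            migrate_to_large();
        if (use_large_) large_.emplace(k, v);
        else            small_.emplace(k, v);
    }

    const Value* find(const Key& k) const {
        if (use_large_) {
            auto it = large_.find(k);
            return it == large_.end() ? nullptr : &it->second;
        }
        auto it = small_.find(k);
        return it == small_.end() ? nullptr : &it->second;
    }

    std::size_t size() const {
        return use_large_ ? large_.size() : small_.size();
    }

private:
    void migrate_to_large() {
        // One-time move of every entry into the large implementation,
        // after which the proxy delegates there instead.
        for (auto& kv : small_)
            large_.emplace(kv.first, std::move(kv.second));
        small_.clear();
        use_large_ = true;
    }

    std::map<Key, Value> small_;            // placeholder "small" impl
    std::unordered_map<Key, Value> large_;  // placeholder "large" impl
    bool use_large_ = false;
};
```

If you do revert on deletions, consider using two different thresholds (grow at one size, shrink back at a noticeably smaller one) so that a table oscillating around a single cut-off does not trigger repeated migrations.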

D. Jurcau
  • Thank you for your help. I have reworded my question to fit the rules. Would you like to revise your answer? – pooya13 Mar 24 '19 at 17:51