6

I decided to learn about balanced search trees, so I picked 2-3-4 and splay trees. What are the examples of splay trees usage in the real world?

In this Cornell: http://www.cs.cornell.edu/courses/cs3110/2009fa/recitations/rec-splay.html I read that splay trees are 'A good example is a network router'. But from rest of the explanation seams like network routers use hash tables and not splay trees since the lookup time is constant instead of O(log n).

Keith Thompson
  • 6,402
  • 2
  • 29
  • 35
Meena
  • 161
  • 1
  • 2
  • NB hash tables are only *expected* O(1) time. It can be significantly worse (but correct, so you may not notice) if you make an error in the hashing or collision resolution. A decent tree, however, gives you worst-case guarantees. –  Oct 20 '12 at 22:36
  • I suspect splay trees actually perform surprisingly poorly in real life, because they generate writes to memory during ostensibly read-only operations. That makes life harder for compilers and caches, which are things you really want to have on your side for good performance. – Tom Anderson Oct 21 '12 at 23:35
  • 1
    A splay tree is O(n) worst-case guarantee per query -- it is *amortized* O(log n). However, its adaptive nature means that access to *any given working set* is amortized O(log(working set size)). Splay trees are appropriate for use cases that have strong key use locality, benefit from amortized performance guarantees, and are single-threaded (due to the update requirements mentioned by @TomAnderson). – comingstorm Oct 23 '12 at 19:12
  • A kind of trivial usage of splay trees is 'recommended searches' or a 'text field suggestion'. It's easy to store strings in a binary tree and splay a value when it is searched/entered. The suggestion box could pull the top seven values from the tree. –  May 23 '14 at 23:51

1 Answers1

4

It is important to note that hash tables only have average access time of O(1). This means a particular operation could be much worse. Additionally, there are several requirements for properly formed hash trees:

  • Mostly empty - few hash algorithms perform well beyond 70% usage, and most recommend 50% usage.
  • Collision handling is complex - either having to use a list as the "stored value" or accepting a worst case linear search time
  • Hashing cost - you have to perform a hashing operation, which may or may not involve a lot of work

It sounds like the course mentions that in many routers these issues have been solved so they use a hash table for storage. However there are several advantages to splay trees

  • No overhead - if you allocate 10 MB to storage you get 10 MB worth of data without adjusting your run-times
  • Recent access preference - recent data is more easily accessed, and routers access more recent data frequently (new network nodes are likely to talk, and most networks are not used evenly)
  • Dups are fine - can be useful depending on what you are storing
  • Simple implementation - routers are typically run on fairly weak processors and are incredibly difficult to update, so simplicity is nice

Unfortunately this doesn't answer your question directly, but hopefully it sheds some light on why that is the case.

BTW, here are some notes on why routers may have moved away (these line up with advantages):

  • Storage sizes have gone up, due to the cost of RAM going down, ability to keep things in memory is easier
  • Hash tables when properly maintained have constant access to all elements
  • Hash tables can handle dupes if you need them to
  • Router manufacturers are typically long lived and can overcome complexity issues with time
Guvante
  • 636
  • 3
  • 10