
There is quite a bit of information about the time complexity of inserting words into a trie data structure, but not a whole lot about the space complexity.

I believe the space complexity is O(n**m), where:

n: the number of possible characters (alphabet size)

m: the average word length

For example, if the available characters are a and b, then n is 2; if the average word length m is 5, wouldn't the worst-case space usage be 32 (2**5)?

This is my visualisation of this example:

[Image: a sketch of this trie over the alphabet {a, b}, branching in two at each level down to depth 5]
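For concreteness, here is a minimal sketch (a plain dict-based trie; the helper name is made up for illustration) that builds this worst case and counts its nodes:

```python
from itertools import product

def count_trie_nodes(words):
    """Build a dict-based trie, then count every node below the root."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})

    def count(node):
        return sum(1 + count(child) for child in node.values())

    return count(root)

# Every possible 5-letter word over {a, b}: the worst case for n=2, m=5.
words = ["".join(p) for p in product("ab", repeat=5)]
print(len(words))               # 32 distinct words
print(count_trie_nodes(words))  # 2 + 4 + 8 + 16 + 32 = 62 nodes, still O(n**m)
```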

perseverance
    Data structures do not have time complexity. *Algorithms* working on data structures have time complexity. – Timothy Truckle May 05 '17 at 20:09
    A quick [Google Search](https://www.google.com/search?q=space+complexity+of+trie) verifies that the space complexity of a trie is `O(ALPHABET_SIZE * key_length * N)`. See http://www.geeksforgeeks.org/trie-insert-and-search/ – Robert Harvey May 05 '17 at 22:49
    I noticed you put a bounty on this question and selected the reason "a detailed canonical answer is required to address all the concerns." Can you be specific about what concerns you would like us to address? – Robert Harvey May 18 '17 at 14:06

3 Answers


Let w be the number of words in the trie. Then the bound O(w*m) is much more useful, since it simply represents the maximum number of characters in the trie, which is obviously also its space bound.


In a way, yes, O(n**m) is a correct bound too. It's just pretty useless in most cases. For example, w = 200 words with an average length of m = 100 over an alphabet of size n = 50 would result in O(50**100), which doesn't even fit in the universe! ...while the other bound would be O(200*100).
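To make that gap concrete, here is a minimal sketch (a dict-based trie with made-up helper names) demonstrating that the node count can never exceed the total number of inserted characters, w*m:

```python
import random
import string

def trie_node_count(words):
    """Insert words into a dict-based trie, counting nodes as they are created."""
    root, nodes = {}, 0
    for word in words:
        cur = root
        for ch in word:
            if ch not in cur:
                cur[ch] = {}
                nodes += 1
            cur = cur[ch]
    return nodes

# w = 200 random words of length m = 100 over an alphabet of n = 50 symbols.
alphabet = (string.ascii_letters + string.digits)[:50]
words = ["".join(random.choices(alphabet, k=100)) for _ in range(200)]
print(trie_node_count(words))  # at most w*m = 20,000, nowhere near 50**100
```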

dagnelies

A trie itself is a generic term for a data structure that stores keys implicitly as a path. If you google tries, you will see that there are multiple different implementations of searching data structures where the keys are stored in this way, and thus there will be a different space complexity for each. One that comes to mind from my personal studies is the R-way trie, which uses an array of size R (if your keys can be any ASCII character, R would be 256) to store references to additional characters in the key. Each node in this structure thus has to allocate memory for an array of size R, so in terms of space complexity, this trie is O(RN), where N is the number of keys.
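As a rough sketch of that layout (hypothetical names, assuming plain ASCII keys), note how every node allocates all R child slots up front, whether they are used or not; that is exactly where the O(RN) space comes from:

```python
R = 256  # one slot per possible ASCII character

class RWayNode:
    """Every node pays for a full array of R child references up front."""
    def __init__(self):
        self.children = [None] * R
        self.is_key = False

def rway_insert(root, key):
    node = root
    for ch in key:
        i = ord(ch)
        if node.children[i] is None:
            node.children[i] = RWayNode()
        node = node.children[i]
    node.is_key = True  # mark that a key ends at this node
```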

Another trie I studied is the de la Briandais trie, which uses linked lists instead of arrays to store references to additional characters in the key. From a memory perspective this trie is actually better, because it allocates memory as needed for each additional character instead of allocating one giant chunk that will probably be partially empty (assuming a non-uniform distribution of characters in the stored keys). This structure will, however, take longer to search for a key, since you lose the direct access of the reference arrays and may now have to traverse a linked list. Asymptotically, the de la Briandais trie (I think, but might be wrong) is still O(RN), but its memory consumption is practically better in most cases because of the non-uniform distribution of characters in keys that I mentioned previously.
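A minimal sketch of that idea (hypothetical names, assuming non-empty keys): each node stores exactly one character plus two links, so memory grows only with the characters actually present:

```python
class DLBNode:
    """One character per node, plus a child link and a sibling link."""
    def __init__(self, ch):
        self.ch = ch
        self.child = None    # head of the next level's sibling list
        self.sibling = None  # next alternative character at this level
        self.is_key = False

def _find_or_add(head, ch):
    """Scan a sibling list for ch, appending a new node if absent.
    Returns (possibly updated head, matching node)."""
    prev, node = None, head
    while node is not None:
        if node.ch == ch:
            return head, node
        prev, node = node, node.sibling
    new = DLBNode(ch)
    if prev is None:
        return new, new
    prev.sibling = new
    return head, new

class DLBTrie:
    def __init__(self):
        self.root = None  # head of the top-level sibling list

    def insert(self, key):  # assumes a non-empty key
        self.root, node = _find_or_add(self.root, key[0])
        for ch in key[1:]:
            node.child, node = _find_or_add(node.child, ch)
        node.is_key = True
```

Finding a character at one level is now a worst-case O(R) scan through the sibling list instead of an O(1) array lookup, which is the time-for-space trade-off described above.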


Another way of thinking about this is that the space is O(kN), where k is the number of possible characters (assuming we use an array to store each node's mapping) and N is the number of nodes in the trie.

More meaningfully, from the client's perspective, the space complexity is O(mn), where m is the average length of the inserted strings and n is the number of words. This calculation assumes a hash map for each node's mapping, and O(mn) gives an upper bound because it does not account for shared prefixes.
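A minimal sketch of that hash-map variant (the `"$"` end-of-word marker is made up for illustration), showing both points: each node stores only the children that actually exist, and shared prefixes pull the real node count below the m*n bound:

```python
def build_trie(words):
    """Dict-per-node trie: each node's child table is a hash map, not an array."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # hypothetical end-of-word marker
    return root

def node_count(node):
    return sum(1 + node_count(child)
               for key, child in node.items() if key != "$")

words = ["car", "card", "care", "cat"]
print(node_count(build_trie(words)))  # 6 nodes, thanks to the shared "ca" prefix
print(sum(len(w) for w in words))     # 14 characters: the O(mn) upper bound
```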

Jasper Wu