Questions tagged [trie]

A trie is a data-structure used for matching strings in a set of strings using a tree like structure. It can be either a prefix or a suffix trie. This tag shall not be confused with trees.

A trie is a data-structure used for matching strings in a set using a tree like structure. It is constructed as a graph between adjacent letters either in a prefix or a suffix of a string.

This data-struture can be used to implement dictionaries. Other alternatives are hash tables, ordinary tries, and similar structures

This tag shall not be confused with trees.

See also:

22 questions
22
votes
2 answers

What is an Aguri tree?

Going through some old Hacker News items, I came across a post from a user that said Aguri trees, which marry a bounded-size radix trie (like you'd use in a software routing table) to an LRU list, and automatically synthesize aggregates (like,…
phwd
  • 458
  • 3
  • 17
14
votes
2 answers

Efficient Trie implementation for unicode strings

I have been looking for an efficient String trie implementation. Mostly I have found code like this: Referential implementation in Java (per wikipedia) I dislike these implementations for mostly two reasons: They support only 256 ASCII characters.…
RokL
  • 2,421
  • 3
  • 19
  • 14
10
votes
5 answers

Is it bad practice to read large file in constructor?

So, I am trying to create an English language trie data structure implementation in C++. I have created a Trie and TrieNode class. The TrieNode class takes in its constructor a vector that is a list of words to construct the Trie from. My…
Bassinator
  • 714
  • 1
  • 8
  • 22
8
votes
1 answer

Efficient data structure to implement fake file system

I want to implement a data structure that will hold the paths of directories, sort of fake file system. Input:- I have a text configuration file containing the paths as follows ... C:/temp1 C:/temp1/insideTemp1 C:/temp2/ ... I might end up…
user3054204
  • 89
  • 1
  • 2
6
votes
1 answer

Autosuggest at scale - trie sharding

While reading on the design for autosuggest implementation on large scale systems (like google), I'm able to understand the usage of trie and how top "n" terms are stored at each node to quickly retrieve the list. However, I'm not able to get my…
5
votes
3 answers

What is the space complexity for inserting a list of words into a Trie data structure?

There is a quite a bit of information about the time complexity of inserting words into a Trie data structure, but not a whole lot about the space complexity. I believe the space complexity is O(n**m), where: n: possible character count m: average…
perseverance
  • 575
  • 3
  • 7
  • 9
5
votes
2 answers

Fast set indexing data structure for superset retrieval

I am given a set of sets: {{a,b}, {a,b,c}, {a,c}, {a,c,f}} I would like to have a data structure to index those sets such that the following "lookup" is executed fast: find all supersets of a given set. For example, given the set {a,c} the…
Asterios
  • 161
  • 3
4
votes
2 answers

Finding common prefixes for a set of strings

I am trying to find common prefixes for a sorted set of strings. i.e. if the following strings are given: AB123456 AB123457 ABCDEFGH ABCDEFGX1 ABCDEFGY XXXX then my function should return three prefixes and their suffixes: AB12345 6,7 ABCDEFG …
3
votes
1 answer

PHI, NoSQL, and searching

I am working with a system that is using NoSQL (Azure Table Storage) primarily to house its data. Unfortunately, the work also involves billing and medical records - meaning the data itself will need to be protected. That's fine, we can provide user…
Telastyn
  • 108,850
  • 29
  • 239
  • 365
3
votes
1 answer

Data structure for traversing hierarchical hostnames

So, quite often I'll come across a situation where I'd like to process hostnames in a hierarchical manner. For example, given a hostname "foo.bar.baz.example.com", I might want to compare it against a data structure that contains a higher authority…
Siler
  • 421
  • 3
  • 11
3
votes
4 answers

fast n-gram access data structure

TL;DR Is there a data structure that'd quickly let me match words at any point (e.g., 'foo' matches 'foobar' and 'zoofoo'), and, ideally, returns a list of "characters that show up after the needle" (e.g., 'foo' should return ['b', $]). I'm…
Tordek
  • 454
  • 1
  • 6
  • 9
3
votes
3 answers

What is the name of this tree?

It has a single root and each node has 0..N ordered sub-nodes . The keys represent a distinct set of paths. Two trees can only be merged if they share a common root. It needs to support, at minimum: insert, merge, enumerate paths. For this…
Daniel
  • 699
  • 4
  • 13
2
votes
3 answers

Efficient way to detect duplicates in a stream of up to 20M numeric strings

I have a stream of documents, each having an ID which is a 64 bit unsigned long given as a (maximum length is 20) string. Sometimes an ID appears multiple times in the stream. I don't want to process a document if I already processed it, so I'm…
uylmz
  • 1,129
  • 1
  • 8
  • 17
2
votes
1 answer

Explanation of how to resolve Hash conflicts in HAMT or hashtables in general

I am working on trying to understand HAMT and now am uncertain generally what to do when you run into a conflict in a hash. All I understand so far is you create a list to append the keys to, but I don't understand any deeper. I would like to know…
user10869858
  • 249
  • 1
  • 5
2
votes
0 answers

How to build a HAMT with multi-index keys

By that I mean, forming a Hash Array Mapped Trie with 2 or more indexed fields, such as a User model by email and location name, or email + username + last logged in + isActive. Basically any arbitrary number of columns of any type of field…
user10869858
  • 249
  • 1
  • 5
1
2