2

I've been learning about using a hashtable to efficiently check for items in a list without looping through the whole thing, but there's one thing that I don't get:

Why hashed keys?

It seems like:

var wordList = {
   'aa'    : ['aa'],
   'aah'   : ['aah'],
   'ahhed' : ['aahed']
};

Would work just as well as:

var wordList = {
   '/* hashed value of aa*/'    : ['aa'],
   '/* hashed value of aah*/'   : ['aah'],
   '/* hashed value of aahed*/' : ['aahed']
};

What's the performance difference between looking up a hashed key and a simple name key?

  • 2
    Your first version is the way a hash table looks to the programmer. The hashes are dealt with under the hood; you don't manipulate the hash values directly. – Robert Harvey Jul 22 '14 at 21:29
  • @RobertHarvey right, I'm just using the structure in the open to show my meaning. In my case, where the table has no collisions (a very long list of unique words), is there any benefit to hashing the keys? – CuriousWebDeveloper Jul 22 '14 at 21:33
  • 1
    @RobertHarvey explains it well. When you convert the string into a number, the number is just an index into the array. Then you don't have to search at all, or do any string comparisons at all. You just go straight to that index and get your answer. BOOYA! – sea-rob Jul 22 '14 at 21:55
  • uhm, you **do** have to search from the index returned from the hashing function, and you have to do string comparisons until you get a match. the number of entries into the hash table is far less than the number of possible strings. the hashing function is not injective (one-to-one). two different strings might hash to the same starting index. when the hash table is first filled, the string that is first placed gets that starting index and strings with the same hash index (that's a collision) that are later placed get bumped over to higher (and, hopefully nearby) addresses. – robert bristow-johnson Aug 04 '15 at 20:43
  • see also: [I'm trying to understand hash tables - can someone explain it to me - clearly?](http://programmers.stackexchange.com/questions/59344/im-trying-to-understand-hash-tables-can-someone-explain-it-to-me-clearly) – gnat Aug 20 '15 at 11:26

2 Answers2

7

Because you still need a way to search the associative array. The hash table is your search mechanism; it gives you O(1) performance for any given key search, by computing an index into the bucket containing your item.

What is your underlying search mechanism, if it's not a hash table? A Binary Search Tree? That's good for very large tables, but it's not O(1); it's O(log n). If you don't have a search mechanism at all (i.e. you're searching the entire array from top to bottom to find your key), your performance is O(n).

If your keys are already ordered, you can use a Binary Search without maintaining a tree; that is also O(log n) performance.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • Ok, I'm going to add this to my question: What's the performance difference between looking up a hashed key and a simple name key? – CuriousWebDeveloper Jul 22 '14 at 21:38
  • What is your underlying search mechanism for the simple name key? A hashed key is still a "simple name key," as you call it; it just has a hash value associated with it. – Robert Harvey Jul 22 '14 at 21:38
  • I'm not sure. My goal is to learn to reference a long list of words efficiently. I guess by using `indexOf()`, I can take advantage of the keyed pairs, without needing to loop. That's the performance increase, as I understand it. I just don't understand the difference between a hashed key and a simple name key for my situation, where all names would be unique. – CuriousWebDeveloper Jul 22 '14 at 21:41
  • Do you understand how hash tables work? A hash computes the location of your key/value pair, directly. Without a hash, you have to look at each key one at a time, comparing it with your candidate key. Worst case, you have to look at every key in the table to find your matching key/value pair. – Robert Harvey Jul 22 '14 at 21:43
  • Admittedly, I don't understand it well yet. I've read the wikipedia article and a few SO questions, but I guess I haven't quite gotten it yet. – CuriousWebDeveloper Jul 22 '14 at 21:45
  • Is there a question on this site that asks specifically how a hash table works? – CuriousWebDeveloper Jul 22 '14 at 21:46
  • Probably not. Perhaps a concrete example would help. Let's say your Word List has 1024 entries. Without a search mechanism, the worst case result is that you have to look at all 1024 entries to find your match. With a balanced Binary Search Tree, you only have to look at 10. With a hash table, you only have to look at 1. – Robert Harvey Jul 22 '14 at 21:49
  • I've sort of extended my question into a larger, new one. It seems like a near duplicate in title, but it's so different that if I edited this one, it would sort of invalidate your answer. – CuriousWebDeveloper Jul 22 '14 at 22:06
  • Ok, so now that I've read that great answer about how hash tables work, I'm still not sure that storing each word in `list = {}` with it's own name as a key, then calling `list[wordName]` wouldn't have the same lookup speed for searching to see if a word exisits. Am I wrong about that? – CuriousWebDeveloper Jul 22 '14 at 22:36
  • 3
    Apparently you're still not getting that *you need some sort of search mechanism* to make it fast. Otherwise, you have to loop through every key in your set to find the one you're looking for. Lists that retrieve values based on a key, as in `list[wordName]` always have some such search mechanism operating under the hood. – Robert Harvey Jul 22 '14 at 22:37
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/15897/discussion-between-jonathan-todd-and-robert-harvey). – CuriousWebDeveloper Jul 22 '14 at 22:39
  • I think this answer could benefit from a brief explanation of [how O(1) is achieved](https://en.wikipedia.org/wiki/Hash_table): "hash table uses a hash function to compute an _index_ into an array of buckets or slots, from which the desired value can be found..." – gnat Aug 04 '15 at 06:28
  • 1
    @gnat: As you've already adequately demonstrated, that information is easily obtained. – Robert Harvey Aug 04 '15 at 06:29
  • http://meta.stackoverflow.com/questions/284271/so-you-got-banned-no-problem-just-create-a-new-account-or-not/284437#comment147184_284437 – gnat Aug 04 '15 at 06:43
0

Well, if you need to find something in your associative array, you need to iterate through each of the keys, and find equality. This means that in the worst case, you'll need to iterate through your whole collection to find the good key.

Accessing an array with an index (myArray[0] for example) is instant; it doesn't require any searching, because the runtime of the languages knows exactly where to find this value.

When using a hashtable, you compute the hash code of a given key to find the associated value. The hashcode is an index in the underlying array of the hashtable. This means that finding a key in a hashtable is as fast as accessing an array by index.

An hashtable's code is a bit more complex, but this is the general idea.

Jazzwave06
  • 140
  • 5