Which algorithms/data structures should I "recognize" and know by name?

Question

I'd like to consider myself a fairly experienced programmer. I've been programming for over 5 years now. My weak point though is terminology. I'm self-taught, so while I know how to program, I don't know some of the more formal aspects of computer science. So, what are practical algorithms/data structures that I could recognize and know by name?

Note, I'm not asking for a book recommendation about implementing algorithms. I don't care about implementing them, I just want to be able to recognize when an algorithm/data structure would be a good solution to a problem. I'm asking more for a list of algorithms/data structures that I should "recognize". For instance, I know the solution to a problem like this:

You manage a set of lockers labeled 0-999. People come to you to rent the locker and then come back to return the locker key. How would you build a piece of software to manage knowing which lockers are free and which are in used?

The solution, would be a queue or stack.

What I'm looking for are things like "in what situation should a B-Tree be used -- What search algorithm should be used here" etc. And maybe a quick introduction of how the more complex(but commonly used) data structures/algorithms work.

I tried looking at Wikipedia's list of data structures and algorithms but I think that's a bit overkill. So I'm looking more for what are the essential things I should recognize?

Voting to close as "not constructive". Any answer will be entirely subjective - there is no consensus on what one "should" know. — Oded, Jul 04 '12 at 20:16
What part of that locker problem requires input/output ordering? [hint!] — Telastyn, Jul 04 '12 at 20:17
@Oded How exactly should I rephrase it to not be subjective? — Earlz, Jul 04 '12 at 20:19
I don't believe you can. Answers would be backed by facts and numbers, which simply do not exist for this. — Oded, Jul 04 '12 at 20:21
@Oded there is absolutely a list that I think most people will agree upon for which data structures and algorithms a well versed programmer should know. — David Cowden, Jul 04 '12 at 20:30
@Telastyn So you're suggesting an array? How will you remove a locker from the free list or add one back. This was an Amazon interview question and I thought an array at first as well and we ended up at queue/stack as being the most efficient. Of course, a linked-list could be used too, but I think all three will perform the same — Earlz, Jul 04 '12 at 20:33
@Oded: So how would someone get an answer? Even if there is no "correct" or "the best one". Let us suppose the question is rephrased this way: "What is required or expected to recognise ..."? Let's not be nitpicking, please. — Nerevar, Jul 04 '12 at 20:40
@Nerevar - Are Stack Exchange sites supposed to answer all questions now? I, personally, think this is not constructive. You are certainly free to not agree. — Oded, Jul 04 '12 at 20:42
@earlz - I personally wouldn't use an array (in most languages) since jumping to locker N via index is fragile. But that's a usability issue; a simple array will be the fastest random access if that's your usage pattern. Mostly I'm... mildly horrified that of all the defensible data structures to use for that you picked two of the ones that aren't even defensible. — Telastyn, Jul 04 '12 at 20:53
@Telastyn Random access isn't needed though. A person will get a locker out of the free list, and a person will return a locker to the free list. That's it. If you want to discuss further we can carry this onto chat — Earlz, Jul 04 '12 at 21:00
@DavidCowden: Then perhaps you should put that list as an answer? I'm curious as to what would be on it... — FrustratedWithFormsDesigner, Jul 04 '12 at 21:17
@Oded No consensus? What about the syllabus of an introductory course on algorithms and data structures in computer science? Quite well standardized and [peer reviewed](http://en.m.wikipedia.org/wiki/ABET). A good starting point. — MarkJ, Jul 04 '12 at 21:33
Neither a queue or a stack is appropriate. The problem states that the customers are renting the lockers. Therefore you need to determine, at the time the customer returns the key, how much money they owe. If you charge per-time-period, then you need to know how long they rented it for, so you need a data structure in which it is efficient to determine the rent start time for a locker, by locker number. If you're charging by the day, one solution would be to store a 16-bit Julian day number in an array indexed by locker number. — James Youngman, Jul 04 '12 at 22:06
Alternative solution; assume you charge by the day and have a maximum charge. Attach a paper tag to the key when you let the locker and write the Julian day number on it. When the key is returned, look at the tag to calculate the rent due. Missing or defaced tags attract the maximum charge. The unused keys are stored in a bag (since there is no need to select any particular key from the free keys when letting a locker). Total data structure size: zero bits. All parts of the algorithm are O(1). — James Youngman, Jul 04 '12 at 22:08
@JamesYoungman good point there, but that wasn't included as part of the problem. I asked the same thing to the interviewer and he said assume it's a flat fee — Earlz, Jul 05 '12 at 06:55
I can understand this question being closed, but I definitely do not think it should be deleted. It already has 1000 views, so apparently I'm not the only one with this question. If it's not a good fit for SE, keep it closed.. but I don't think this resource should be deleted — Earlz, Sep 26 '12 at 14:38
@JamesYoungman the question is actually about how you manage the bag of keys, but I like your insight. — David Cowden, Apr 08 '15 at 23:07
@Oded I edited the answer and added the most objective thing I'm aware of, the ACM curriculum steering committee's report. Would you consider re-opening given this addition? — David Cowden, Mar 24 '16 at 03:25
It seems that many people's idea of what counts as the "basics" actually shifts over time so that it's always a bit behind them. Perhaps there isn't a single meaningful place to draw the line, and it's better just to ask what's the most useful thing for you to learn at any point in time. — WoodenKitty, Jul 20 '16 at 01:00

score 78 · Accepted Answer · edited May 23 '17 at 12:40

An objective response:

While my initial response to this question was based on my empirical experience as a soon-to-graduate CS student and my projected opinion of the type of people I wanted to work with in the CS field. There is actually an objective (with respect to the subjective opinions of the ACM SIGCSE and IEEE computing societies) answer. Every 10 years the ACM and the IEEE bodies cooperate on a joint publication that details suggestions for undergraduate computer science curriculum based on professional knowledge of the state of the computing industry. More information can be found at cs2013.org. The committee publishes a final report listing their curriculum recommendation.

That said, I still think my list is pretty good.

Original answer below.

What Should I Know?

Minimum

I think an adept programmer should have at least undergraduate level knowledge in Computer Science. Sure, you can be effective at many jobs with only a small subset of Computer Science because of the rock solid community CS sits upon, and the narrowed focus of most professional positions. Also, many people will further specialize after undergraduate study. However, I do not think either are an excuse to not be privy of foundational CS knowledge.

To answer the title question, here is what an undergraduate CS student (the foundation for an adept programmer) should know upon graduation:

Data Structures

Machine Data Representation
- Ones, Two's Complement, and Related Arithmetic
- Words, Pointers, Floating Point
- Bit Access, Shifting, and Manipulation
Linked Lists
Hash Tables (maps or dictionaries)
Arrays
Trees
Stacks
Queues
Graphs
Databases

Algorithms

Sorting:
- Bubble Sort (to know why it's bad)
- Insertion Sort
- Merge Sort
- Quick Sort
- Radix style sorts, Counting Sort and Bucket Sort
- Heap Sort
- Bogo and Quantum Sort (=
Searching:
- Linear Search
- Binary Search
- Depth First Search
- Breadth First Search
String Manipulation
Iteration
Tree Traversal
List Traversal
Hashing Functions
Concrete implementation of a Hash Table, Tree, List, Stack, Queue, Array, and Set or Collection
Scheduling Algorithms
File System Traversal and Manipulation (on the inode or equivalent level).

Design Patterns

Modularization
Factory
Builder
Singleton
Adapter
Decorator
Flyweight
Observer
Iterator
State [Machine]
Model View Controller
Threading and Parallel Programming Patterns

Paradigms

Imperative
Object Oriented
Functional
Declarative
Static and Dynamic Programming
Data Markup

Complexity Theory

Complexity Spaces
Computability
Regular, Context Free, and Universal Turing Machine complete Languages
Regular Expressions
Counting and Basic Combinatorics

Beyond

To get into what you're asking about later in your question, if you are familiar with the above, you should be easily able to identify the appropriate pattern, algorithm, and data structure for a given scenario. However, you should recognize that there is often no best solution. Sometimes you may be required to pick the lesser of two evils or even simply choose between two equally viable solutions. Because of this, you need the general knowledge to be able to defend your choice against your peers.

Here are some tips for algorithms and data structures:

Binary Search can only (and should) be used on sorted data.
Radix style sorts are awesome, but only when you have finite classes of things being sorted.
Trees are good for almost anything as are Hash Tables. The functionality of a Hash Table can be extrapolated and used to solve many problems at the cost of efficiency.
Arrays can be used to back most higher level data structures. Sometimes a "data structure" is no more than some clever math for accessing locations in an array.
The choice of language can be the difference between pulling your hair out over, or sailing through, a problem.
The ASCII table and a 128 element array form an implicit hash table (=
Regular expressions can solve a lot of problems, but they can't be used to parse HTML.
Sometimes the data structure is just as important as the algorithm.

Some of the above might seem like no brainers, and some may seem vague. If you want me to go into more detail, I can. But, my hope is when encountered with a more concrete question such as, "Design a function that counts the number of occurrences of every character in a String", you look to the tip about the ASCII table and 128 element arrays forming neat implicit hash tables for the answer.

Based off these ideas, I will propose an answer the locker problem outlined in your question.

Answer to the problem posed in your question.

This may not be the best answer to your question, but I think it's an interesting one that doesn't require anything too complex. And it will certainly beat the time complexity of using a queue, or stack which require linear time to determine whether a locker is free or not.

You have 0-999 lockers. Now, because you have a fixed number of lockers, you can easily conceive a hashing function with no collisions on the range 0-999. This function is simply h(x) = x mod 1000. Now, [conceptually] construct a hash table with integer keys and the contents of a 1000 element char array as your values. If a customer wants to reserve locker 78 for use, simply put 78 into the hash function (returning 78), and then add that number to the base pointer of the array -- storing a true value at the location pointed to by the offset value. Similarly, if you need to check whether 78 is in use, simply read the value stored at that location and check against true.

This solution operates in constant time for lookups and storage as opposed to a log(n) time storage and lookup in the case of a priority queue backed by a binary tree. The description is intentionally verbose so you can see the higher concepts being boiled down into an efficient algorithm.

Now, you might ask, what if I need to know all of the available lockers, wouldn't a priority queue be better? If there are k available lockers in the priority queue, iterating over all of them will take k steps. Further, depending on your priority queue implementation, you might have to rebuild your priority queue as you look at it all.. which would take k*log(k) : (k < 1000) steps. In the array solution, you only have to iterate of a 1000 element array and check which ones are open. You can also add an available or used list to the implementation to check in k time only.

Great answer! I'd also like to add, that you really should be confident in using the predefined functions/datastructures of the language you are using, for example algorithm and stl data structures in C++, or the Java API for Java. — marktani, Jul 04 '12 at 22:19
Excellent! Especially "Regular expressions can solve a lot of problems, but they can't be used to parse HTML." — FrustratedWithFormsDesigner, Jul 04 '12 at 23:40
The answer was good, until the "problem" appeared. There is no reason, at all, to use a priority queue or hash-table. A simple stack is sufficient. Add iteration to get the full list of free lockers if you wish. — Matthieu M., Jul 05 '12 at 06:19
Good point about the locker problem. The only thing is that wouldn't a queue still be faster if random access wasn't required? (A customer says "I need a locker" not "I need locker #123") I'm unsure of what you'd say the complexity is for adding and removing from a queue. Probably O(n)? — Earlz, Jul 05 '12 at 06:40
should we add relational database + SQL , knowledege of B+ tree , compiler theory, knowledge hardware organization , knowledge of operation system theory, knowledge of TCP/IP networking ? — dan_l, Jul 05 '12 at 08:12
TCP/IP and sockets for sure. I tried to include some topics from OS without saying OS: scheduling algorithms, threading, and file systems -- I should probably add security to that list. Compilers are optional in my experience because they lie in a weird place between computer engineering and computer science. But if others agree, we can add Compilers. I think a general knowledge of them is never bad (= — David Cowden, Jul 05 '12 at 13:58
@MatthieuM. You need to back your stack with something. The most logical would be an array. So why not just use an array? And I said if you need to get the full free list, just keep a list (a stack would be a good way to implement that) of the free or used numbers. Further, the OP suggested priority queue -- I was comparing my version to his. You're getting caught up on the fact that I mention hash table as if it's some big scary thing. If you read carefully, you'll notice that the "hash table" is implicit in the setup of the storage array. This is exactly the point I'm trying to make. — David Cowden, Jul 05 '12 at 15:18
@DavidCowden: I agree that the stack would probably be array based (for efficiency) or simply linked-list based (heck, even a singly link list would work). I don't understand how the "hash table" can be implicit in the setup of the array; unless we are not using the array as a stack any longer ? Note that the OP talks about a *simple* queue, not a priority queue; so I was wondering why you would introduce it to then bash on the complexity of its operations... — Matthieu M., Jul 05 '12 at 16:48
@MatthieuM. I was assuming he was suggesting a solution where you stored all the lockers in a priority queue which returns available lockers first. I think we are understanding his suggestion differently. As far as not understanding the implicit hash table formed by an array and its index as a key into the table, I can explain it in further detail in chat if you want, but I think I've outlined the idea pretty well in the answer. Again, I think we have the same idea -- we're just seeing it differently. — David Cowden, Jul 05 '12 at 17:06
@MatthieuM. Also, a linked list stack would require k*pointerSize extra space. If you know there are 1000 lockers, and there will only be 1000 lockers, you should pre-allocate the space so you don't waste extra space down the road. I guess I'm thinking of 1000 as a small number in modern terms. 10000000000 lockers would be different and you might want a dynamically expanding array based stack implementation. Either way, it's more efficient to simply keep track of the head of the stack and perform math to reach other elements than to follow pointers. — David Cowden, Jul 05 '12 at 17:10
@DavidCowden: Ah thanks... as for the hash table, don't worry, I know them inside out; it is just that your comment that the hashtable was implicit in the setup of the array *does not match* the use of the array as a stack: in this case the array is not used *at all* like a hash table. — Matthieu M., Jul 05 '12 at 17:16
I'm skeptical about Design Patterns. Many are useful in some types of languages while useless and/or unnecessary in others. You might also want to add Heuristics under algorithms, and the trie and skip-list data structures. Traditional algorithms/data structures reach a limit on synchronous access but can be beaten by other non-traditional approaches using multiple threads and concurrency. Heuristics can dramatically decrease the number of lookups needed while structures like a skip-list will allow the data structure to be written to without a global lock. — Evan Plaice, Jul 05 '12 at 22:26
I feel like this answer is a bit overambituous. I guess it depends on what you mean by adept. I agree that most of these things are good to have learned at some point or at least know the foundations of, but I really honestly see no reason whatsoever in memorizing 8 sorting algorithms so you know them by heart. isn't the core idea to know about time/space complexity and knowing how to find the right solution given some requirements, as opposed to memorizing every possible solution? I feel like most of these things can be "forgotten" because you will only use 10 % of them. — sara, Jul 09 '16 at 08:11
@kai I don't think anyone is saying your should know these by heart, rather the suggestion is that you're familiar with the terms and have encountered the concepts before. You can learn something without memorizing it. — David Cowden, Jul 11 '16 at 18:10

score 6 · Answer 2 · answered Jul 04 '12 at 20:37

6

The Algorithm Design Manual by Steven S. Skiena seems like the source you are searching. The second part is a classified list of problems with a review of the related algorithms. There is a web version.

answered Jul 04 '12 at 20:37

AProgrammer

10,404
1
30
45

3

great book, but don't feel you have to master all of it to really be a programmer. I just bought it recently, and I've been paid to program since 1979. (And yes, I bought it believing I could learn something from it.) – Kate Gregory Jul 04 '12 at 23:35
@KateGregory I bought the book and couldn't really get the hang of it because I only know high level languages like Ruby and Javascript (no binary trees, linked lists, etc)... I eventually gave up reading it. – bigpotato Jan 03 '15 at 01:00

score 4 · Answer 3 · edited Jul 05 '12 at 07:42

4

There is no "should". A. Get acquainted with the basic complexity classes (linear, logarithmic, etc.) B. Realize that you can do just about anything with a simple array as you can with a fancy data structure like a B-tree. The trick in choosing the appropriate structure/algorithm is lies in balancing performance, expected input size and implementation complexity.

Then there's abstract but immensely useful stuff (though usefulness is not immediately obvious): state machines, graph theory, convexity theory (linear programming, etc).

edited Jul 05 '12 at 07:42

gnat

21,442
29
112
288