69

I'd like to consider myself a fairly experienced programmer. I've been programming for over 5 years now. My weak point though is terminology. I'm self-taught, so while I know how to program, I don't know some of the more formal aspects of computer science. So, what are practical algorithms/data structures that I could recognize and know by name?

Note, I'm not asking for a book recommendation about implementing algorithms. I don't care about implementing them, I just want to be able to recognize when an algorithm/data structure would be a good solution to a problem. I'm asking more for a list of algorithms/data structures that I should "recognize". For instance, I know the solution to a problem like this:

You manage a set of lockers labeled 0-999. People come to you to rent the locker and then come back to return the locker key. How would you build a piece of software to manage knowing which lockers are free and which are in used?

The solution, would be a queue or stack.

What I'm looking for are things like "in what situation should a B-Tree be used -- What search algorithm should be used here" etc. And maybe a quick introduction of how the more complex(but commonly used) data structures/algorithms work.

I tried looking at Wikipedia's list of data structures and algorithms but I think that's a bit overkill. So I'm looking more for what are the essential things I should recognize?

Earlz
  • 22,658
  • 7
  • 46
  • 60
  • 10
    Voting to close as "not constructive". Any answer will be entirely subjective - there is no consensus on what one "should" know. – Oded Jul 04 '12 at 20:16
  • 2
    What part of that locker problem requires input/output ordering? [hint!] – Telastyn Jul 04 '12 at 20:17
  • @Oded How exactly should I rephrase it to not be subjective? – Earlz Jul 04 '12 at 20:19
  • I don't believe you can. Answers would be backed by facts and numbers, which simply do not exist for this. – Oded Jul 04 '12 at 20:21
  • 5
    @Oded there is absolutely a list that I think most people will agree upon for which data structures and algorithms a well versed programmer should know. – David Cowden Jul 04 '12 at 20:30
  • @Telastyn So you're suggesting an array? How will you remove a locker from the free list or add one back. This was an Amazon interview question and I thought an array at first as well and we ended up at queue/stack as being the most efficient. Of course, a linked-list could be used too, but I think all three will perform the same – Earlz Jul 04 '12 at 20:33
  • @Oded: So how would someone get an answer? Even if there is no "correct" or "the best one". Let us suppose the question is rephrased this way: "What is required or expected to recognise ..."? Let's not be nitpicking, please. – Nerevar Jul 04 '12 at 20:40
  • 2
    @Nerevar - Are Stack Exchange sites supposed to answer all questions now? I, personally, think this is not constructive. You are certainly free to not agree. – Oded Jul 04 '12 at 20:42
  • @earlz - I personally wouldn't use an array (in most languages) since jumping to locker N via index is fragile. But that's a usability issue; a simple array will be the fastest random access if that's your usage pattern. Mostly I'm... mildly horrified that of all the defensible data structures to use for that you picked two of the ones that aren't even defensible. – Telastyn Jul 04 '12 at 20:53
  • 1
    @Telastyn Random access isn't needed though. A person will get a locker out of the free list, and a person will return a locker to the free list. That's it. If you want to discuss further we can carry this onto chat – Earlz Jul 04 '12 at 21:00
  • Ugh, okay. I see your thinking there. – Telastyn Jul 04 '12 at 21:05
  • @DavidCowden: Then perhaps you should put that list as an answer? I'm curious as to what would be on it... – FrustratedWithFormsDesigner Jul 04 '12 at 21:17
  • @FrustratedWithFormsDesigner I'm working on it.. – David Cowden Jul 04 '12 at 21:28
  • 6
    @Oded No consensus? What about the syllabus of an introductory course on algorithms and data structures in computer science? Quite well standardized and [peer reviewed](http://en.m.wikipedia.org/wiki/ABET). A good starting point. – MarkJ Jul 04 '12 at 21:33
  • 1
    Neither a queue or a stack is appropriate. The problem states that the customers are renting the lockers. Therefore you need to determine, at the time the customer returns the key, how much money they owe. If you charge per-time-period, then you need to know how long they rented it for, so you need a data structure in which it is efficient to determine the rent start time for a locker, by locker number. If you're charging by the day, one solution would be to store a 16-bit Julian day number in an array indexed by locker number. – James Youngman Jul 04 '12 at 22:06
  • 3
    Alternative solution; assume you charge by the day and have a maximum charge. Attach a paper tag to the key when you let the locker and write the Julian day number on it. When the key is returned, look at the tag to calculate the rent due. Missing or defaced tags attract the maximum charge. The unused keys are stored in a bag (since there is no need to select any particular key from the free keys when letting a locker). Total data structure size: zero bits. All parts of the algorithm are O(1). – James Youngman Jul 04 '12 at 22:08
  • @JamesYoungman good point there, but that wasn't included as part of the problem. I asked the same thing to the interviewer and he said assume it's a flat fee – Earlz Jul 05 '12 at 06:55
  • I can understand this question being closed, but I definitely do not think it should be deleted. It already has 1000 views, so apparently I'm not the only one with this question. If it's not a good fit for SE, keep it closed.. but I don't think this resource should be deleted – Earlz Sep 26 '12 at 14:38
  • @JamesYoungman the question is actually about how you manage the bag of keys, but I like your insight. – David Cowden Apr 08 '15 at 23:07
  • @Oded I edited the answer and added the most objective thing I'm aware of, the ACM curriculum steering committee's report. Would you consider re-opening given this addition? – David Cowden Mar 24 '16 at 03:25
  • It seems that many people's idea of what counts as the "basics" actually shifts over time so that it's always a bit behind them. Perhaps there isn't a single meaningful place to draw the line, and it's better just to ask what's the most useful thing for you to learn at any point in time. – WoodenKitty Jul 20 '16 at 01:00

4 Answers4

78

An objective response:

While my initial response to this question was based on my empirical experience as a soon-to-graduate CS student and my projected opinion of the type of people I wanted to work with in the CS field. There is actually an objective (with respect to the subjective opinions of the ACM SIGCSE and IEEE computing societies) answer. Every 10 years the ACM and the IEEE bodies cooperate on a joint publication that details suggestions for undergraduate computer science curriculum based on professional knowledge of the state of the computing industry. More information can be found at cs2013.org. The committee publishes a final report listing their curriculum recommendation.

That said, I still think my list is pretty good.

Original answer below.


What Should I Know?

Minimum

I think an adept programmer should have at least undergraduate level knowledge in Computer Science. Sure, you can be effective at many jobs with only a small subset of Computer Science because of the rock solid community CS sits upon, and the narrowed focus of most professional positions. Also, many people will further specialize after undergraduate study. However, I do not think either are an excuse to not be privy of foundational CS knowledge.

To answer the title question, here is what an undergraduate CS student (the foundation for an adept programmer) should know upon graduation:

Data Structures

  • Machine Data Representation
    • Ones, Two's Complement, and Related Arithmetic
    • Words, Pointers, Floating Point
    • Bit Access, Shifting, and Manipulation
  • Linked Lists
  • Hash Tables (maps or dictionaries)
  • Arrays
  • Trees
  • Stacks
  • Queues
  • Graphs
  • Databases

Algorithms

  • Sorting:
    • Bubble Sort (to know why it's bad)
    • Insertion Sort
    • Merge Sort
    • Quick Sort
    • Radix style sorts, Counting Sort and Bucket Sort
    • Heap Sort
    • Bogo and Quantum Sort (=
  • Searching:
    • Linear Search
    • Binary Search
    • Depth First Search
    • Breadth First Search
  • String Manipulation
  • Iteration
  • Tree Traversal
  • List Traversal
  • Hashing Functions
  • Concrete implementation of a Hash Table, Tree, List, Stack, Queue, Array, and Set or Collection
  • Scheduling Algorithms
  • File System Traversal and Manipulation (on the inode or equivalent level).

Design Patterns

  • Modularization
  • Factory
  • Builder
  • Singleton
  • Adapter
  • Decorator
  • Flyweight
  • Observer
  • Iterator
  • State [Machine]
  • Model View Controller
  • Threading and Parallel Programming Patterns

Paradigms

  • Imperative
  • Object Oriented
  • Functional
  • Declarative
  • Static and Dynamic Programming
  • Data Markup

Complexity Theory

  • Complexity Spaces
  • Computability
  • Regular, Context Free, and Universal Turing Machine complete Languages
  • Regular Expressions
  • Counting and Basic Combinatorics

Beyond

To get into what you're asking about later in your question, if you are familiar with the above, you should be easily able to identify the appropriate pattern, algorithm, and data structure for a given scenario. However, you should recognize that there is often no best solution. Sometimes you may be required to pick the lesser of two evils or even simply choose between two equally viable solutions. Because of this, you need the general knowledge to be able to defend your choice against your peers.

Here are some tips for algorithms and data structures:

  • Binary Search can only (and should) be used on sorted data.
  • Radix style sorts are awesome, but only when you have finite classes of things being sorted.
  • Trees are good for almost anything as are Hash Tables. The functionality of a Hash Table can be extrapolated and used to solve many problems at the cost of efficiency.
  • Arrays can be used to back most higher level data structures. Sometimes a "data structure" is no more than some clever math for accessing locations in an array.
  • The choice of language can be the difference between pulling your hair out over, or sailing through, a problem.
  • The ASCII table and a 128 element array form an implicit hash table (=
  • Regular expressions can solve a lot of problems, but they can't be used to parse HTML.
  • Sometimes the data structure is just as important as the algorithm.

Some of the above might seem like no brainers, and some may seem vague. If you want me to go into more detail, I can. But, my hope is when encountered with a more concrete question such as, "Design a function that counts the number of occurrences of every character in a String", you look to the tip about the ASCII table and 128 element arrays forming neat implicit hash tables for the answer.

Based off these ideas, I will propose an answer the locker problem outlined in your question.


Answer to the problem posed in your question.

This may not be the best answer to your question, but I think it's an interesting one that doesn't require anything too complex. And it will certainly beat the time complexity of using a queue, or stack which require linear time to determine whether a locker is free or not.

You have 0-999 lockers. Now, because you have a fixed number of lockers, you can easily conceive a hashing function with no collisions on the range 0-999. This function is simply h(x) = x mod 1000. Now, [conceptually] construct a hash table with integer keys and the contents of a 1000 element char array as your values. If a customer wants to reserve locker 78 for use, simply put 78 into the hash function (returning 78), and then add that number to the base pointer of the array -- storing a true value at the location pointed to by the offset value. Similarly, if you need to check whether 78 is in use, simply read the value stored at that location and check against true.

This solution operates in constant time for lookups and storage as opposed to a log(n) time storage and lookup in the case of a priority queue backed by a binary tree. The description is intentionally verbose so you can see the higher concepts being boiled down into an efficient algorithm.

Now, you might ask, what if I need to know all of the available lockers, wouldn't a priority queue be better? If there are k available lockers in the priority queue, iterating over all of them will take k steps. Further, depending on your priority queue implementation, you might have to rebuild your priority queue as you look at it all.. which would take k*log(k) : (k < 1000) steps. In the array solution, you only have to iterate of a 1000 element array and check which ones are open. You can also add an available or used list to the implementation to check in k time only.

David Cowden
  • 2,903
  • 17
  • 23
  • 1
    Great answer! I'd also like to add, that you really should be confident in using the predefined functions/datastructures of the language you are using, for example algorithm and stl data structures in C++, or the Java API for Java. – marktani Jul 04 '12 at 22:19
  • 1
    Excellent! Especially "Regular expressions can solve a lot of problems, but they can't be used to parse HTML." – FrustratedWithFormsDesigner Jul 04 '12 at 23:40
  • 2
    The answer was good, until the "problem" appeared. There is no reason, at all, to use a priority queue or hash-table. A simple stack is sufficient. Add iteration to get the full list of free lockers if you wish. – Matthieu M. Jul 05 '12 at 06:19
  • Good point about the locker problem. The only thing is that wouldn't a queue still be faster if random access wasn't required? (A customer says "I need a locker" not "I need locker #123") I'm unsure of what you'd say the complexity is for adding and removing from a queue. Probably O(n)? – Earlz Jul 05 '12 at 06:40
  • Also, I have a lot of studying to do :) – Earlz Jul 05 '12 at 06:41
  • 1
    should we add relational database + SQL , knowledege of B+ tree , compiler theory, knowledge hardware organization , knowledge of operation system theory, knowledge of TCP/IP networking ? – dan_l Jul 05 '12 at 08:12
  • TCP/IP and sockets for sure. I tried to include some topics from OS without saying OS: scheduling algorithms, threading, and file systems -- I should probably add security to that list. Compilers are optional in my experience because they lie in a weird place between computer engineering and computer science. But if others agree, we can add Compilers. I think a general knowledge of them is never bad (= – David Cowden Jul 05 '12 at 13:58
  • @MatthieuM. You need to back your stack with something. The most logical would be an array. So why not just use an array? And I said if you need to get the full free list, just keep a list (a stack would be a good way to implement that) of the free or used numbers. Further, the OP suggested priority queue -- I was comparing my version to his. You're getting caught up on the fact that I mention hash table as if it's some big scary thing. If you read carefully, you'll notice that the "hash table" is implicit in the setup of the storage array. This is exactly the point I'm trying to make. – David Cowden Jul 05 '12 at 15:18
  • @DavidCowden: I agree that the stack would probably be array based (for efficiency) or simply linked-list based (heck, even a singly link list would work). I don't understand how the "hash table" can be implicit in the setup of the array; unless we are not using the array as a stack any longer ? Note that the OP talks about a *simple* queue, not a priority queue; so I was wondering why you would introduce it to then bash on the complexity of its operations... – Matthieu M. Jul 05 '12 at 16:48
  • @MatthieuM. I was assuming he was suggesting a solution where you stored all the lockers in a priority queue which returns available lockers first. I think we are understanding his suggestion differently. As far as not understanding the implicit hash table formed by an array and its index as a key into the table, I can explain it in further detail in chat if you want, but I think I've outlined the idea pretty well in the answer. Again, I think we have the same idea -- we're just seeing it differently. – David Cowden Jul 05 '12 at 17:06
  • @MatthieuM. Also, a linked list stack would require k*pointerSize extra space. If you know there are 1000 lockers, and there will only be 1000 lockers, you should pre-allocate the space so you don't waste extra space down the road. I guess I'm thinking of 1000 as a small number in modern terms. 10000000000 lockers would be different and you might want a dynamically expanding array based stack implementation. Either way, it's more efficient to simply keep track of the head of the stack and perform math to reach other elements than to follow pointers. – David Cowden Jul 05 '12 at 17:10
  • @DavidCowden: Ah thanks... as for the hash table, don't worry, I know them inside out; it is just that your comment that the hashtable was implicit in the setup of the array *does not match* the use of the array as a stack: in this case the array is not used *at all* like a hash table. – Matthieu M. Jul 05 '12 at 17:16
  • 1
    I'm skeptical about Design Patterns. Many are useful in some types of languages while useless and/or unnecessary in others. You might also want to add Heuristics under algorithms, and the trie and skip-list data structures. Traditional algorithms/data structures reach a limit on synchronous access but can be beaten by other non-traditional approaches using multiple threads and concurrency. Heuristics can dramatically decrease the number of lookups needed while structures like a skip-list will allow the data structure to be written to without a global lock. – Evan Plaice Jul 05 '12 at 22:26
  • I feel like this answer is a bit overambituous. I guess it depends on what you mean by adept. I agree that most of these things are good to have learned at some point or at least know the foundations of, but I really honestly see no reason whatsoever in memorizing 8 sorting algorithms so you know them by heart. isn't the core idea to know about time/space complexity and knowing how to find the right solution given some requirements, as opposed to memorizing every possible solution? I feel like most of these things can be "forgotten" because you will only use 10 % of them. – sara Jul 09 '16 at 08:11
  • @kai I don't think anyone is saying your should know these by heart, rather the suggestion is that you're familiar with the terms and have encountered the concepts before. You can learn something without memorizing it. – David Cowden Jul 11 '16 at 18:10
6

The Algorithm Design Manual by Steven S. Skiena seems like the source you are searching. The second part is a classified list of problems with a review of the related algorithms. There is a web version.

AProgrammer
  • 10,404
  • 1
  • 30
  • 45
  • 3
    great book, but don't feel you have to master all of it to really be a programmer. I just bought it recently, and I've been paid to program since 1979. (And yes, I bought it believing I could learn something from it.) – Kate Gregory Jul 04 '12 at 23:35
  • @KateGregory I bought the book and couldn't really get the hang of it because I only know high level languages like Ruby and Javascript (no binary trees, linked lists, etc)... I eventually gave up reading it. – bigpotato Jan 03 '15 at 01:00
4

There is no "should". A. Get acquainted with the basic complexity classes (linear, logarithmic, etc.) B. Realize that you can do just about anything with a simple array as you can with a fancy data structure like a B-tree. The trick in choosing the appropriate structure/algorithm is lies in balancing performance, expected input size and implementation complexity.

Then there's abstract but immensely useful stuff (though usefulness is not immediately obvious): state machines, graph theory, convexity theory (linear programming, etc).

gnat
  • 21,442
  • 29
  • 112
  • 288
zvrba
  • 3,470
  • 2
  • 23
  • 22
  • 1
    Don't underestimate the importance about knowing when to use what. Because those problems you solved using a simple array will be coming back and bite you just when you are about to reel in that big client and find out your application that worked fine for years slows down to a crawl just because you used a bubblesort instead of a quicksort. – Pieter B Jul 05 '12 at 07:52
3

MIT publish free lecture notes, videos, assignments and exam material for Introduction to Algorithms. The lecture titles list the algorithms / data structures covered.

This is a peer-reviewed consensus on what you should know. It's probably a great learning resource, too.

MarkJ
  • 2,827
  • 1
  • 18
  • 18