All of the options you've laid out for the first case (iteration) are perfectly Pythonic. If you're going to write a for loop and iterate through an iterable, then it doesn't really matter what that iterable is (unless order matters, in which case use a tuple, a list, or something else that maintains order, and not a set; a dict preserves insertion order only from Python 3.7 on). That's the whole point of having an iterable as an abstraction!
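As a small sketch of that point, the same loop works unchanged over several container types. The function name shout_all and the sample values are made up for illustration:

```python
def shout_all(words):
    """Return upper-cased copies of every item in any iterable."""
    result = []
    for word in words:  # works for a tuple, list, set, generator, ...
        result.append(word.upper())
    return result

# Ordered containers keep their order.
print(shout_all(('one', 'two', 'three')))
print(shout_all(['one', 'two', 'three']))

# A set works too, but its iteration order is not guaranteed,
# so sort the result if a stable order is needed.
print(sorted(shout_all({'one', 'two', 'three'})))
```

The function never asks what kind of container it was given; it only relies on being able to iterate over it.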
For the second case (checking whether a collection contains an item), I'd always recommend using a set. Sure, it barely matters if your list is small. But code changes over time, and usually that change is an addition, not a subtraction. It's common for people to extend the collection to include more elements to test for. Set up future maintainers (possibly including yourself) for success, not a performance headache. Also, a set is just the right tool/abstraction for the job: if you only want to know whether an item is in a collection, then duplicates never matter.
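One way this might look in practice is a named, module-level constant that future maintainers can extend. This is a minimal sketch; ALLOWED_WORDS and is_allowed are hypothetical names, and it assumes every member is hashable (strings are):

```python
# Hypothetical constant; a frozenset signals "not meant to be mutated".
ALLOWED_WORDS = frozenset({'one', 'two', 'three'})

def is_allowed(word):
    """Return True if word is one of the allowed values."""
    return word in ALLOWED_WORDS

# Duplicates collapse, which is fine for a pure membership question.
deduplicated = set(['one', 'one', 'two'])

print(is_allowed('two'))
print(is_allowed('abc'))
print(len(deduplicated))
```

Adding a fourth or four-hundredth word to ALLOWED_WORDS changes nothing about how is_allowed is written.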
If you're curious about performance, never guess! Always measure. It does make a slight difference even with a small iterable:
In [1]: def test_if_in(x):
   ...:     if x in ['one', 'two', 'three']:
   ...:         return 'Yes'
   ...:     else:
   ...:         return 'No'
   ...:
In [2]: def test_if_in_set(x):
   ...:     if x in {'one', 'two', 'three'}:
   ...:         return 'Yes'
   ...:     else:
   ...:         return 'No'
   ...:
In [3]: %timeit test_if_in('abc')
The slowest run took 10.23 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 167 ns per loop
In [4]: %timeit test_if_in_set('abc')
The slowest run took 11.04 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 116 ns per loop
In [5]: %timeit test_if_in_set('one')
The slowest run took 11.25 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 114 ns per loop
In [6]: %timeit test_if_in('one')
The slowest run took 11.73 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 109 ns per loop
Even though both are blindingly fast, the set version is roughly 1.4x faster (116 ns vs. 167 ns) when the element is not in the collection. That makes sense: the list has to be scanned from start to finish, while a set does an O(1) average-case hash lookup. The two are about the same when the item happens to be the first one in the list. The difference only matters if you're calling the function a lot, but it gets worse as the collection grows, because the list check is O(n). The set starts to dominate list/tuple/other non-hashed containers very quickly.
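To see how the gap changes with size on your own machine, something like the following sketch could be used outside IPython. It relies only on the standard timeit module; the helper name time_membership, the sizes, and the repeat count are arbitrary choices for illustration, and the numbers will vary by machine and Python version:

```python
import timeit

def time_membership(container, needle, number=200):
    """Return total seconds taken by `number` membership tests."""
    return timeit.timeit(lambda: needle in container, number=number)

for size in (10, 1_000, 100_000):
    items = list(range(size))
    as_set = set(items)
    missing = -1  # not present, so the list must compare every element

    list_time = time_membership(items, missing)
    set_time = time_membership(as_set, missing)
    print(f"n={size:>7}: list {list_time:.6f}s  set {set_time:.6f}s")
```

The lookup uses a value that is absent from the collection because that is the worst case for the list, matching the 'abc' case measured above.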