27

I am writing a script that does something to a text file (what it does is irrelevant for my question though). So before I do something to the file I want to check if the file exists. I can do this, no problem, but the issue is more that of aesthetics.

Here is my code, implementing the same thing in two different ways.

def modify_file(filename):
    assert os.path.isfile(filename), 'file does NOT exist.'


Traceback (most recent call last):
  File "clean_files.py", line 15, in <module>
    print(clean_file('tes3t.txt'))
  File "clean_files.py", line 8, in clean_file
    assert os.path.isfile(filename), 'file does NOT exist.'
AssertionError: file does NOT exist.

or:

def modify_file(filename):
    if not os.path.isfile(filename):
        return 'file does NOT exist.'


file does NOT exist.

The first method produces an output that is mostly trivial, the only thing I care about is that the file does not exist.

The second method returns a string, it is simple.

My questions is: which method is better for letting the user know that the file does not exist? Using the assert method seems somehow more pythonic.

Vader
  • 395
  • 1
  • 3
  • 7

3 Answers3

44

You'd go with a third option instead: use raise and a specific exception. This can be one of the built-in exceptions, or you can create a custom exception for the job.

In this case, I'd use IOError, but a ValueError might also fit:

def modify_file(filename):
    if not os.path.isfile(filename):
        raise IOError('file does NOT exist.')

Using a specific exception allows you to raise other exceptions for different exceptional circumstances, and lets the caller handle the exception gracefully.

Of course, many file operations (like open()) themselves raise OSError already; explicitly first testing if the file exists may be redundant here.

Don't use assert; if you run python with the -O flag, all assertions are stripped from the code.

Martijn Pieters
  • 14,499
  • 10
  • 57
  • 58
  • interesting point about the redundancy! Would you recommend to avoid it? i.e. "it will fail anyway later on" – Ciprian Tomoiagă Oct 31 '18 at 12:57
  • 1
    @CiprianTomoiagă: why double up on tests? If the next line of `modify_file()` is `with open(filename) as f:`, then `IOError` would also be raised. And more recent Python versions have provided more detail in subclasses of `IOError` (`FileNotFoundError` specifically comes to mind) that could be helpful to a developer using this API. If the code does its own checks and raises `IOError` then that helpful detail would be lost. – Martijn Pieters Oct 31 '18 at 13:21
  • @MartijnPieters would an acceptable answer to "why double up on tests?" be when when the first check is faster than the exception raised when open() fails? e.g. when checking for the existence of a file is faster than trying to open and ultimately failing to do so. – Marcel Wilson Sep 23 '19 at 14:40
  • 3
    @MarcelWilson: No, because you'd be micro-optimizing against a method that does I/O. No amount of tweaking in minute Python semantics will make I/O go faster, and only hurt readability and maintainability. Focus on areas with more impact, I'd say. – Martijn Pieters Sep 23 '19 at 17:07
  • I agree, assertions are reserved for *bugs* (i.e. exceptional *internal* situations, that is implementation related), and exceptions are reserved for *errors* (exceptional *external* situations, that is not implementation related). A correct use of assertions in `modify_file` would be adding `assert isinstance(filename, (str, bytes, os.PathLike))` at the top of its body for checking its precondition (assertion assumed to hold at function entry). – Géry Ogam Apr 20 '22 at 09:10
  • @Maggyero: Exceptions are not only for errors, they can also be used as a signalling mechanism outside the 'normal' flow. E.g. `asyncio.CancelledError` and `GeneratorExit` are used as signals to wind up an active asyncio task or generator. Cancelling a task or generator is not an error, and your implementation of the task or generator can alter behaviour by catching and handling those exceptions. Exceptions are, essentially, a different mode of flow control. – Martijn Pieters Apr 20 '22 at 10:47
  • By ‘error’ I meant exceptional external situation (exceptional meaning that it can violate the software specification). Note that `asyncio.CancelledError` has the word ‘error’ in it. So how do you define error exceptions vs non-error exceptions? – Géry Ogam Apr 20 '22 at 11:30
  • @Maggyero: all but 2 of the concrete exceptions have 'error' in their name, it's a naming convention to contrast with the Warning exceptions, and 'errors' are just a specific use case for these. They are all *out-of-band **signals***. It is not an error when a programmers *know and expect an exception to occur* in particular calls, and use that signal control code flow where others might use an `if` test. Fundamentally, APIs takes specific input and promise specific output, and you should consider using exceptions for communicating anything that doesn't fit those specifics. – Martijn Pieters Apr 20 '22 at 11:54
  • 1
    @Maggyero: 'it can violate the software specification' is just one exceptional situation. When you reach the end of an iterator, a `StopIteration` exception is raised, not because the consumer of the iterator violated the software specification, but because the API promises to yield objects from the iterator until exhausted, and raising an exception is the best fit for signalling there is nothing more to yield. The exception **is part of the software specification**, and is the logical choice when the promise to return *something* can no longer be fulfilled. – Martijn Pieters Apr 20 '22 at 12:00
  • My bad, you are right, *assertions* are not reserved for exceptional situations that are implementation-related; *exceptions* can also be used for those, like the `__next__` method of an iterator raising a `StopIteration` exception when it is exhausted. Indeed, `StopIteration` should not be optimised away like assertions since exhausting an iterator is a *possible* situation. So assertions are actually reserved for *impossible* situations (i.e. specification violations). And exceptions are actually reserved for *exceptional* situations (i.e. abnormal but specification-conformant). – Géry Ogam Apr 23 '22 at 21:50
20

assert is intended for cases where the programmer calling the function made a mistake, as opposed to the user. Using assert under that circumstance allows you to make sure a programmer is using your function correctly during testing, but then strip it out in production.

Its value is somewhat limited since you have to ensure you exercise that path through the code, and you often want to additionally handle the problem with a separate if statement in production. assert is most useful in situations like, "I want to helpfully work around this problem if a user hits it, but if a developer hits it, I want it to crash hard so he will fix the code that calls this function incorrectly."

In your particular case, a missing file is almost certainly a user error, and should be handled by raising an exception.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
  • 1
    +1 for the distinction between *programmers* and *users* at the origin of exceptional situations (i.e. exceptional *internal* situations vs exceptional *external* situations) for choosing between *assertions* and *exceptions*. People often incorrectly think that public and protected functions should use exceptions for their preconditions and that only private functions should use assertions for their preconditions. In reality *all* functions should use assertions for their preconditions because it is the programmer who provides the arguments by calling the function, not the user. – Géry Ogam Apr 20 '22 at 09:44
7

From UsingAssertionsEffectively

Checking isinstance() should not be overused: if it quacks like a duck, there's perhaps no need to enquire too deeply into whether it really is. Sometimes it can be useful to pass values that were not anticipated by the original programmer.

Places to consider putting assertions:

checking parameter types, classes, or values
checking data structure invariants
checking "can't happen" situations (duplicates in a list, contradictory state variables.)
after calling a function, to make sure that its return is reasonable 

The overall point is that if something does go wrong, we want to make it completely obvious as soon as possible.

It's easier to catch incorrect data at the point where it goes in than to work out how it got there later when it causes trouble.

Assertions are not a substitute for unit tests or system tests, but rather a complement. Because assertions are a clean way to examine the internal state of an object or function, they provide "for free" a clear-box assistance to a black-box test that examines the external behaviour.

Assertions should not be used to test for failure cases that can occur because of bad user input or operating system/environment failures, such as a file not being found. Instead, you should raise an exception, or print an error message, or whatever is appropriate. One important reason why assertions should only be used for self-tests of the program is that assertions can be disabled at compile time.

If Python is started with the -O option, then assertions will be stripped out and not evaluated. So if code uses assertions heavily, but is performance-critical, then there is a system for turning them off in release builds. (But don't do this unless it's really necessary. It's been scientifically proven that some bugs only show up when a customer uses the machine and we want assertions to help there too. :-) )

dspjm
  • 71
  • 1
  • 3