How can I avoid chasing my own tail when testing against complicated return values?

Question

Sometimes there are functions that return complicated data and cannot be divided any further e.g. in the area of signal processing or when reading and decoding a bytestring to another format. How am I supposed to create stubs (e.g. the bytestring) to be able to assert equality of expected data with the return data without getting into trouble with complicated stub generation?

In the following example I want to test two functions. One writes my_package-objects to disk and the other reads them from disk into an object dictionary. The open-dependency is mocked away. How can I define the stub_binary()-function?

def read(filename):
    """Read any compatible my_package objects from disk."""
    with open(filename, 'rb') as f:
        return _decode(f.read())


def write(filename, **objs):
    """Write any compatible my_package objects to disk."""
    with open(filename, 'wb') as f:
        f.write(_encode(objs))

import my_package as mp

@patch('my_package.open', new_callable=mock_open)
def test_read(m_open):
    # arrange
    m_enter = m_open.return_value.__enter__.return_value
    m_enter.read.side_effect = lambda: stub_binary()
    # act
    obj_dict = mp.read('./obj_A_and_B')
    # assert
    assert obj_dict == {'A': A(), 'B': B()}


@patch('my_package.open', new_callable=mock_open)
def test_write(m_open):
    # arrange
    m_enter = m_open.return_value.__enter__.return_value
    # act
    mp.write('./obj_A_and_B', A=A(), B=B())
    # assert
    m_enter.write.assert_called_with(stub_binary())

Stub_binary could return a hard-coded bytestring, but that gets easily messy:

def stub_binary():
    return (
        b'\x93NUMPY\x01\x00v\x00{"descr": "<f8", "fortran_order": False, '
        b'"shape": (1, 3), }                                             '
        b'             \n\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00'
        b'\x00\x00@\x00\x00\x00\x00\x00\x00\x08@')

Or reading the above byte string from a file that was generated with mp.write:

def stub_binary(): 
    with open(gen_data, 'rb') as f:
        return f.read()

Or just replace stub_binary() with the following:

def stub_binary(x):
    with io.BytesIO() as buffer:
        mp.write(buffer, A=A(), B=B())
    return buffer

I am tempted to create the data with my production code mp.write(). But this seems not right to me. How would you approach this?

Over all, there's not a lot of concrete information here on what the test expects, how the test arranges itself and asserts itself, what you'd allegedly be doubling up on in doing so, and how you'd be writing "useless tests". You also seem to refer specifically to examples that depend on external data sources (file storage, audio/image/signal processing), which is making me wonder if you're trying to make an integration test in the shape of a unit test. — Flater, Jan 15 '21 at 12:44
Sorry. I am struggeling to precisely ask what I want to ask. I edited the text and I hope it's getting a bit clearer. I want to know how I can verify the complicated return value of a function (read) with a stub. The easiest way to create the stab in this case would be to use the complementary function (write). But this does not feel the right way to do for me. — Agent49, Jan 15 '21 at 15:39
I'm no Python-dev, so I'll leave the concrete answer to someone who can show you the correct code, but it seems you're not quite grasping what a stub is. A stub is a manually configured behavior that specifically **avoids** the need to call other "real" behaviors. It makes no sense for you to need to rely on another element (be it a second stub or real behavior) just to be able to work with the original stub. — Flater, Jan 15 '21 at 15:42
That is exactly what I am asking for. A stub provides canned data to calls made during the test, like in (1). Am I right? I don't want to call other "real" behaviours like in (2) or (3). But how am I supposed to do it correctly? How do I create the data used for verification in the first place? — Agent49, Jan 15 '21 at 16:02
I would recommend not to use the word and tag "unit test", but instead "automated test" or regression test, otherwise you will get only comments and answers telling you that you did not use the term unit test properly, but not addressing what you are really after. — Doc Brown, Jan 15 '21 at 16:49
This is a good advice and I removed it from the title. Do you think I did not use the term unit test properly? If so, why? The external depency `my_package.open` is mocked. If I would persue integration test, i probably would not mock this. — Agent49, Jan 15 '21 at 17:23
I ended up deleting my old answer and posting a new one. Your question title is a little unclear. Your true question is buried deep in a wall of text and code. — Greg Burghardt, Jan 18 '21 at 15:39
I am sorry. I should have formulate my question more clearly. — Agent49, Jan 18 '21 at 17:52

score 1 · Answer 1 · answered Jan 16 '21 at 18:13

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare

One thing to consider is whether your design can be simplified to the point that other measures of evaluating its correctness are more suitable.

def read(filename):
    """Read any compatible my_package objects from disk."""
    with open(filename, 'rb') as f:
        return _decode(f.read())

What stands out to me in this code is that it is already pretty simple. It's correct behavior depends primarily on functions that are provided to you by your run time.

And the kicker: it is unlikely to change - the only piece in this implementation that looks at all volatile is the decode method itself.

About the only reasonable change I could see making here would be to make the decode function configurable

def read(filename, decode):
    """Read any compatible my_package objects from disk."""
    with open(filename, 'rb') as f:
        return decode(f.read())

What I expect is that, having written this code (in either form), it is likely to survive many builds and releases without changing further.

Which is to say, you aren't going to derive much value from subjecting this code to repeated measurements to ensure that the behavior is consistent, because the implementation itself is unchanging (at least so long as the underlying runtime is stable).

More generally - if you have a design that separates the bits that are hard to test from the parts of the design that are unstable or complicated, then you don't also need to invest in automated tests that are specific to the hard to test bits.

It seems to me that in this design, the complicated part that needs testing is making sure that _decode understands the bits written by _encode, and potentially ensuring that _decode is compatible with previously released implementations of _encode.

But, given that your design already has those two functions isolated from the I/O, those test are relatively inexpensive to write and maintain. So if you are expecting to change the design of those methods often, it might make sense to invest in those tests.

You are right, that I test the compatibility between `_decode` and `_encode` through their public interfaces `read()` and `write()` which don't actually change.`_decode` and `_encode` might change, when the objects they handle change.Your proposal would indeed make the testing easier. Howerever, I would still have to provide a proper `stub_binary()` to `test_decode()`. I understand the value of dependency injection. But I would like to provide the user of a simple interface and no variable decoder. Should I really inject the decoder only because its "better design" and easier to test? — Agent49, Jan 18 '21 at 10:54

score 1 · Answer 2 · answered Jan 16 '21 at 19:26

1

Generate a reference file.

Simply shove the result of mp.write on mock values inside the repository in a file then you'll very easily test the encode and decode functions work by testing their result on the same mock values against the file. Your option 2, basically.

Advantages over using mp.write on the fly are multiple, the most important being isolation. You won't break your decode test if you break your encode function. Advantage over raw encoded string, well it just keeps the tests clean and readable.

It's tempting to craft test based on production data but there are several pitfalls: long execution time if the dataset is not minimal, not being covering completely the function scope, and it's more difficult to reverse-engineer.

answered Jan 16 '21 at 19:26

Diane M

2,046
9
14

Strange, I already submitted a comment a few hours ago. I also like option 2 the most, because it delivers clean code, utilizes only the python-builtin `open` in `stub_binary()` which is well-known and tested and it is canned data that can't change over time. But wouldn't be the effort of isolating the test by mocking `open` in the first place be for nothing? Doesn't this reimplement the dependency? – Agent49 Jan 18 '21 at 12:43
1

You could mock and re-implement `open` or you could use existing implementation, I don't think `open` is a crucial part of the implementation either way. My preference goes with using original `open` if that is possible, without any mock, but sometimes it's not possible (if for example you append a directory or absolute path or something in the actual implementation). – Diane M Jan 18 '21 at 13:49

score 1 · Accepted Answer · answered Jan 18 '21 at 15:38

Generating complicated test data does not have a general solution. Sometimes it is useful to return a hard-coded string, if the behavior under test is very targeted. Sometimes it is useful to read an existing file, because a hard-coded string gets messy to maintain.

VoiceOfUnreason recommended splitting out the decoding function into its own parameter so you can isolate that behavior in their own tests. I echo that recommendation, but will focus on how to generate the test data.

Test fixtures are a general way to stub test data. Way too general. For your case I would build a class that contains methods returning the test data. Naming this class is up to you, but I recommend giving it a name that represents the kind of data it returns. You don't give us much to name things, but something like FooSignalFixtures where you replace "Foo" with some sort of noun or phrase representing the kind of signal data your code handles.

Method names in this class should guide test writers (and readers) as to why the test data exists:

class FooSignalFixtures:
    def invalid_header():
        # Open file with invalid header and return contents

    def invalid_foo_column():
        # Open file with invalid value in column "foo" and return file contents

    def empty_file():
        return ''

    def header_with_no_contents():
        return '...' # Headers only as hard coded string

Focus most of your testing effort on the decoder, which should accept a bytestring as a parameter. Then your decoder tests would use the test fixtures:

def given_empty_contents_when_decoding_then_error_is_thrown():
    file_contents = FooSignalFixtures.empty_file()

    try:
        decoder.decode(file_contents)
        # test fails
    except InvalidSyntaxError:
        # test passes
    except RuntimeError:
        # test fails

Now coming back to this test you can see this tests for behavior when the file is empty, further reinforced by the FooSignalFixtures.empty_file() method. Test readers do not need to understand how the test data is generated. If they do need to know, you have a class that specializes in this. The class can mix options 1 and 2, depending on the complexity of the test data.

This also gives you a way to write a test for a very specific production defect found only in one file, and keep that file in your test fixtures to guard against the defect from creeping back in to your code.

Thank you. I will follow your recommendation because it scales perfectly with new bytestrings for new objects to read and write. I probably won't inject the decoder at this moment because even though you are right about testability and I might not strictly follow the single responsibility principle, I still prefer the user to be able to read or write an object in a single line. There probably won't be more than one way to decode/encode a set of objects. — Agent49, Jan 18 '21 at 17:35

How can I avoid chasing my own tail when testing against complicated return values?

3 Answers3