Storytelling with tests #2: setting up the stage

The Roman theatre in Bosra by Alessandra Kocman

This is the second part of the post series dedicated to quality of automated tests. In the previous part I wrote about test names and granularity. Today I am going to focus on test data setup, its readability and how it is related with the evolution of our test suites.

We will start with exploring multiple approaches to implementing unit tests for a dummy domain. After that, I will show you some additional patterns for constructing test object graphs.

About the examples

In order to make examples concise, I had to simplify the implementation. If some of the presented arguments seem to be exaggerated, try to imagine a more complex, real-world domain under test. Moreover, the article contains only some of the code examples. You can see the full picture in the repository on GitHub.

Test data builders

I use test data builders for all the examples. Using this pattern is a very powerful technique and I highly recommend it. You can read more about it on Nat Pryce’s blog. Do not worry if you do not want to get into details right now. My code examples should be still easy to understand. At the end of this post I will give you some additional tips for working with test data builders.

Domain example

Let’s take the following part of the domain under consideration:

A document has an author. New document is considered a draft, and its content may be freely amended (only by the author). At some point a document may be submitted for a review. During the review, the document may be either accepted or rejected by its editors (one or more). Any further amendments result in creating new revision of the document.

I will start implementing unit tests covering these rules and we will see how adding new tests affects readability and development of the test suite.

First tests

Let’s start with these two simple tests:

public class DocumentTest {

    Person homer = new Person("Homer Simpson");
    Person bart = new Person("Bart Simpson");

    @Test
    void newRevisionIsNotCreatedWhenAmendingADraft() {
        Document document = document()
                .authoredBy(homer)
                .build();

        document.amend("new content", homer);

        assertThat(document).hasRevisionNumber(1);
    }

    @Test
    void documentCanBeAmendedOnlyByItsAuthor() {
        Document document = document()
                .authoredBy(homer)
                .build();

        assertThrows(
            IllegalArgumentException.class,
            () -> document.amend("new content", bart));
    }
}

It could be tempting to think about DRY principle. All in all, exactly the same document is set up in both of the tests, right? Let’s remove the duplication then:

public class DocumentTest {

    // (...)

    Document document;

    @BeforeEach
    void setup() {
        document = document()
                .authoredBy(homer)
                .build();
    }

    @Test
    void newRevisionIsNotCreatedWhenAmendingADraft() {
        document.amend("new content", homer);

        assertThat(document).hasRevisionNumber(1);
    }

    @Test
    void documentCanBeAmendedOnlyByItsAuthor() {
        assertThrows(
            IllegalArgumentException.class,
            () -> document.amend("new content", bart));
    }
}

There is no duplication now, but are the tests really better? In my opinion they are not. I believe that each test should be self-descriptive. Now it is not clearly visible from the test itself, that homer is author of the test document. It might not be a problem when both @Before and the test fit on a single screen, but I guess you know how quickly it may no longer be true.

How can we fix it? We could change variable names. One way would be to change document into something like documentAuthoredByHomer. The tests above would become more descriptive then, but what if further tests in the suite required additional details? You would either end up with cumbersome names such as documentAuthoredByHomerRejectedByBart or with many test objects, each one created for a couple of tests.

Another solution would be to explicitly name the variables holding Person objects. So homer would become author and bart would become sbElse. document would still be document in this case. This approach also has its limitations, but let’s keep it for now and try to see what happens.

More test cases

Let’s implement tests for the following part of our domain:

(…) During the review, the document may be either accepted or rejected by its editors (one or more). (…)

If we wanted to follow DRY consequently, we could end up with something like this (I omitted previous tests and their setup for brevity, but it’s still the same class, DocumentTest):

public class DocumentTest {
    
    // ... (setting up Person objects) ...

    Document submittedDocument;

    @BeforeEach
    void setup() {
        // ... (setting up a document for previous tests) ...

        submittedDocument = document()
                    .withStatus(SUBMITTED)
                    .authoredBy(author)
                    .withEditors(editor1, editor2)
                    .build();
    }

    @Test
    void documentIsNotAcceptedUntillAllEditorsAcceptIt() {
        submittedDocument.accept(editor1);

        assertThat(submittedDocument).doesNotHaveStatus(ACCEPTED);
    }

    @Test
    void documentHasStatusAcceptedWhenAllEditorsAcceptIt() {       
        submittedDocument.accept(editor1);
        submittedDocument.accept(editor2);
        
        assertThat(submittedDocument).hasStatus(ACCEPTED);
    }
}

The example is very simplified, but the tendency is clearly visible: our test suite contains groups of tests with common setup. The more functionalities related to Document are covered, the more of such groups in DocumentTest. The @Before section would quickly grow and it would be cumbersome to follow test objects related to a particular test. The tests would eventually be written many lines away from the data setup. This impedes reading as well. It is a smell and we should somehow address it.

One test class per unit?

It is a very popular approach to have one test class per tested unit. I followed it so far, hence we have got DocumentTest corresponding to Document. In many cases it might be OK, but when the number of tests increases (as highlighted above), we should think about Single Responsibility Principle. That’s it, I believe that test code is in no way inferior to production code, and should be always developed with good practices in mind.

What does it really mean? We could have more than one test class for our Document. All in all, each test is related to a particular business rule (domain concept), not just to a particular class (technical concept). For example, we could divide our tests into the following suites:

  • DocumentAmendingTest – tests related to amendments and corresponding revisions
  • DocumentAcceptingTest – tests related to editors accepting submitted documents

And so on, one test class for each functionality.

One could argue that having one-to-one relation between production class and test class makes looking things up much easier. It is true, especially when naming conventions are clear and simple (e.g. test class named as production class name with suffix “Test”).

However, I think that even with separate test classes you can still easily look tests up and maintain them. You can still follow the convention “production class name + suffix” (and just have more suffixes than just a single “Test”).

You can also leverage Junit nested tests and organize a hierarchical test class structure.

Excessive @Before?

I’m still not happy with readability of the tests, though. Let’s take a deeper look:

@Test
void documentIsNotAcceptedUntillAllEditorsAcceptIt() {
    submittedDocument.accept(editor1);

    assertThat(submittedDocument).doesNotHaveStatus(ACCEPTED);
}

Basing on the test name, we can assume that there is at least one more editor other than editor1. But we have to jump to @Before to be sure. In my opinion it is a code smell.

I believe that the best solution is to move document creation from the common @Before method back to the test method itself:

@Test
void documentIsNotAcceptedUntillAllEditorsAcceptIt() {
    Document document = document()
                .withStatus(SUBMITTED)
                .withEditors(editor1, editor2)
                .build();

    document.accept(editor1);

    assertThat(document).doesNotHaveStatus(ACCEPTED);
}

As an additional result, we could opt out from names such as editor1 and editor2, as the relationships between test objects are now clearly visible.

What if other tests require similar setup? My answer may sound radical: do (sic!) repeat yourself. If we use builders, the potential changes of the way the data is set up should not cause any major problems. Extracting data setup to common variables (or helper methods) may be sometimes required, but very often it’s not. I believe that as far as test data setup is concerned, focusing on DRY is not as important as maintaining DAMP (Descriptive and Meaningful Phrases).

Another view on this issue is to compare @Before method to dramatis personæ of a theatrical play. It should not contain spoilers. I hope everyone agrees that Shakespeare’s version:

DUNCAN, King of Scotland
MACBETH, Thane of Glamis and Cawdor, a general in the King’s army
LADY MACBETH, his wife

William Shakespeare

is better for a new reader than the following:

DUNCAN, King of Scotland murdered by Macbeth
MACBETH, Thane of Glamis and Cawdor and eventually King of Scotland after commiting treason and killing Duncan
LADY MACBETH his wife, who successfully persuades him to kill the King

a vicious spoiler beast

I think it is the same with the code: you may setup objects in @Before as long as the setup does not become a part of the test scenario.

Additional patterns for data setup

In the reminder of this article I will highlight some of the patterns related to test data builders.

Avoid shortcuts in test data setup

Let’s look on the following test fragment:

Document doc = document()
        .authoredBy(homer)
        .withEditor(bart)
        .accepted()
        .build();

How can we implement accepted() method of the builder? It might be tempting to use reflection or introduce a setter in order to manipulate status field of the document directly. In some cases Reflection might be a pragmatic approach. But there is risk associated with such approach: it may result in constructing invalid object graphs (i.e. your test may use state, that would never normally occur in your business objects).

The safest (but often not the most convenient) way is to use only the public API of the tested object. Only then the test data is set up in exactly the same way as is in the production code.

For the example above, in order to construct a document in ACCEPTED state, we have to make all the transitions:

  1. New document is created (with status DRAFT)
  2. The author submits it (status switches to SUBMITTED)
  3. All the editors accept it (status switches to ACCEPTED)

You can find the implementation of the builder in the repository on Github.

When the executed logic is more complex, such setup might become fragile, i.e. result in unexpected object state due to changes of the requirements (and implementation). You can use the next pattern to save yourself from some debugging in such situations.

Assertions in data setup

I sometimes find it useful to add an assertion to the data setup. It looks as follows:

public Document build() {
    Document doc = new Document(content, author, editors, reviewer);
    if (submit) {   
        doc.submit(author);
    }
    if (accept) {
        for (Person editor : editors) {
            doc.accept(editor);
        }
        // extra assertion to be sure we achieved the desired state of the object:
        if (ACCEPTED != doc.getStatus()) {
            throw new AssertionError("failed to build an accepted document");
        }
    }
    return doc;
}

If the logic changes and some extra steps are required to change the status to ACCEPTED, all the tests relying on this part of the builder will fail fast. You will be quickly prompted to update your builder and no test will execute with incorrect data (which could cause misleading error messages).

Builders with mocking support

When dealing with a mocked test of code that relies on persistence and repositories (or Data Access Objects), very often you would write a data setup similar to the following (I am using Mockito):

Document doc = document()
        .inStatus(SUBMITTED)
        .withId(7L)
        .build();
when(documentRepo.findById(7L)).thenReturn(doc); 

I found it convenient to introduce buildAndMock method to my builders:

public Document buildAndMock(Repository<Document> mockRepo) {
    Document doc = build();
    when(mockRepo.findById(doc.getId())).thenReturn(doc);
    return doc;
}

With such helper in place, the example above would have the form of:

Document doc = document()
        .inStatus(SUBMITTED)
        .withId(7L)
        .buildAndMock(documentRepo);

Please note, that this is a shortcut. I found it helpful, but I don’t recommend adding any more complex mocking in builders, as it may negatively affect readability of tests.

Builders with persistence support

I usually use a similar pattern for persisting entities in my integration tests:

public Document build(EntityManager em) {
    Document doc = build();
    em.persist(doc);
    return doc;
}

It introduces an additional build(EntityManager em) method to the builder. It calls the regular parameterless build and immediately persist it.

Summary

It has been quite a long post, so thank you for staying with me up to this point. Summing things up, I believe in the following rules in regard to the test data setup:

  • use builder pattern to hide details of constructing object graphs
  • don’t be afraid of putting your test into multiple classes on test-class-per-business-rule-group basis
  • do not follow DRY when it would have a negative impact on the readability
  • use the same way of constructing data as in the production code
  • construct test data in a reliable way

When following this rules, the tests are not only reliable, but also the story they tell is easier to follow for the reader. I hope you enjoyed this part of the series. Until next time!

Sources

You can find the code samples and working tests on Github. All the steps described in the article have been committed separately, so you can easily follow the whole process.

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *