
Why would you run unit tests on a CI server?

Surely, by the time something gets committed to master, a developer has already run all the unit tests before and fixed any errors that might've occurred with their new code. Isn't that the point of unit tests? Otherwise they've just committed broken code.

Paul D. Waite
Calin Leafshade
  • Our developers are not allowed to commit to master. They push to a feature branch, the CI server then merges with master and runs tests. If they succeed, _then_ changes are merged to master. So code with broken tests **cannot** be on master... (a sketch of this flow follows these comments) – Boris the Spider Jan 27 '16 at 16:45
  • @BoristheSpider - very good workflow indeed. `master` should always be sane, and preferably automatically deployed on each merge to a staging environment for internal QA & testing. – Per Lundberg Jan 27 '16 at 18:04
  • @BoristheSpider, I like that workflow, I hate doing the merge and then finding out that someone else has done a commit in the time it takes me to run the tests! – Ian Jan 27 '16 at 19:12
  • Read http://www.javacodegeeks.com/2013/02/breaking-build-is-not-a-crime.html, appropriately titled "Breaking build is not a crime" – emory Jan 27 '16 at 19:58
  • Are you really allowed to commit to master? – njzk2 Jan 27 '16 at 20:43
  • "Surely, by the time something gets committed to master, a developer has already run all the unit tests before and fixed any errors that might've occurred with their new code." What fantasy world do you live in? – jpmc26 Jan 27 '16 at 22:21
  • @BoristheSpider that workflow sounds great, I assume conflicts still are resolved manually, or is it required for the feature branch to pull master first? – Matthew Jan 27 '16 at 22:22
  • @Matthew it's Git, so all conflicts must be resolved before push rather than on push. If Mr. Jenkins gets a conflict, it will throw back an error to the developer - who must rebase his/her branch on master and try again. – Boris the Spider Jan 28 '16 at 10:17
  • In some industries, the important part isn't just to run the tests on the code, it's to run the tests on the *binaries*. Running the tests on the CI output means that you can guarantee that the delivered product works, because the exact binary your client received is the one that passed all your tests. It sounds trivial, but sometimes this can have an effect (one I've seen is obfuscation; on complex projects, or when set up oddly, it can cause problems in the obfuscated build that weren't there in the clean version). – anaximander Jan 28 '16 at 16:53
  • You have a lot of faith in your developers. – Michael Hampton Jan 28 '16 at 17:10
  • "Surely, by the time something gets committed to master, a developer has already run all the unit tests before and fixed any errors that might've occurred with their new code."...not sure if serious – chucksmash Jan 29 '16 at 16:44
  • You could think of the CI as being an extra check. I've run tests on my machine and they've all passed, then done a push and found them failing on the CI server. The code that worked on my machine wasn't the same as the code that I committed. In my case running `git clean` on my code base solved the problem. – Daniel Hollinrake Feb 03 '16 at 09:11
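
For illustration, here is a minimal sketch (Python driving git) of the gated flow Boris the Spider describes above. Everything in it - the branch names, the `ci-merge` staging branch, the pytest invocation - is an assumption for illustration, not anyone's actual setup:

```python
# Hypothetical gated-merge job: the CI server (not the developer) merges the
# feature branch into master, runs the tests, and only pushes if they pass.
import subprocess
import sys

def sh(*cmd):
    subprocess.run(cmd, check=True)

def gate(feature_branch):
    sh("git", "fetch", "origin")
    sh("git", "checkout", "-B", "ci-merge", "origin/master")
    # A conflict here aborts with an error thrown back to the developer,
    # who must rebase on master and try again.
    sh("git", "merge", "--no-ff", "origin/" + feature_branch)
    if subprocess.run([sys.executable, "-m", "pytest", "-q"]).returncode != 0:
        sys.exit("tests failed: merge rejected, master stays green")
    sh("git", "push", "origin", "ci-merge:master")  # only green merges land

if __name__ == "__main__":
    gate(sys.argv[1])
```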

9 Answers


Surely, by the time something gets committed to master, a developer has already run all the unit tests before and fixed any errors that might've occurred with their new code.

Or not. There are many reasons why this can happen:

  • The developer doesn't have the discipline to do that
  • They have forgotten
  • They didn't commit everything and pushed an incomplete commit set (thanks Matthieu M.)
  • They only ran some tests, but not the whole suite (thanks nhgrif)
  • They tested on their branch prior to merging (thanks nhgrif ×2)

But the real point is to run the tests on a machine that is not the developer's machine: one that is configured differently.

This helps catch issues where tests and/or code depend on something specific to a developer's box (configuration, data, timezone, locale, whatever).
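
As a concrete illustration, here is a small, hypothetical pytest-style sketch of such a dependency. The assertion holds only on a machine whose local timezone is UTC, so it can pass on the author's box yet fail on a differently-configured CI server (or the other way around); the function name is invented:

```python
from datetime import datetime

def format_event_time(unix_ts):
    # Bug: converts using the machine's *local* timezone.
    return datetime.fromtimestamp(unix_ts).strftime("%Y-%m-%d %H:%M")

def test_format_event_time():
    # Unix timestamp 0 is 1970-01-01 00:00 UTC, so this assertion only
    # holds where local time happens to equal UTC.
    assert format_event_time(0) == "1970-01-01 00:00"
```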

Other good reasons for CI builds to run tests:

  • Testing on platforms other than the main development platforms, which may be difficult for a developer to do. (thanks TZHX)

  • Acceptance, integration, end-to-end, and really long-running tests may be run on the CI server that would not usually be run on a developer box. (thanks Ixrec)

  • A developer may make a tiny change before pushing/committing (thinking this is a safe change and therefore not running the tests). (thanks Ixrec ×2)

  • The CI server configuration doesn't usually include all the developer tools and configuration and thus is closer to the production system

  • CI systems build the project from scratch every time, meaning builds are repeatable

  • A library change could cause problems downstream - a CI server can be configured to build all dependent codebases, not just the library's own (see the sketch below)

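To make that last point concrete (see also the SortedSet/HashSet example MichaelT gives in the comments below), here is a hypothetical Python sketch. The library's own tests still pass after the change; only a CI server that also rebuilds and tests dependent codebases catches the downstream break. All names are invented:

```python
# Library change: v1 returned a sorted list, v2 "optimizes" it into a set.
def active_user_ids(records):
    return {r["id"] for r in records}  # v1 was: sorted(r["id"] for r in records)

# The library's own contract-level test stays green...
def test_library_contract():
    assert active_user_ids([{"id": "u2"}, {"id": "u1"}]) == {"u1", "u2"}

# ...but downstream code written against the old, sorted behaviour breaks.
def first_user_id(records):
    return active_user_ids(records)[0]  # TypeError: sets don't support indexing

def test_downstream_assumed_ordering():
    assert first_user_id([{"id": "u2"}, {"id": "u1"}]) == "u1"
```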

Qqwy
Oded
  • Other common reasons: 1) The CI server may run high-level integration/acceptance tests that take too long for the developers to always run them. 2) The developer did run them and then made one tiny change before pushing that they were very sure wouldn't break anything, but we want to be certain. – Ixrec Jan 27 '16 at 13:34
  • A change to a dependency often also runs all the downstream builds too. If a change that a developer makes breaks something downstream, it isn't easily seen when modifying a library (say changing an underlying datatype from a SortedSet to a HashSet (only providing the contract of Set) and someone downstream worked on the mistaken assumption that the Set was sorted). Not running the (downstream) tests on the CI server would let that bug fester for awhile. –  Jan 27 '16 at 14:10
  • @MichaelT Good catch. That's actually the cause for >90% of our CI failures these days, not sure how I forgot it... – Ixrec Jan 27 '16 at 15:14
  • Another one: a server can run in parallel a possibly large matrix of different build configurations, making the process much faster. – Davidmh Jan 27 '16 at 15:28
  • Most CI servers are set to email the team when something breaks. So, breaking the build brings on the shame! – jaredready Jan 27 '16 at 17:50
  • We have CI on our development, staging (pre-prod) and production (master) builds, with unit test execution. If someone breaks the build, this needs to happen *long* before the code is promoted to master. – GalacticCowboy Jan 27 '16 at 18:02
  • Running it on a different configuration is also important if the developer hard-coded something that should be configuration-driven. One of our developers committed a suite of unit tests that all referenced a CSV file on his desktop. – GalacticCowboy Jan 27 '16 at 18:03
  • Also, running them on a CI environment usually means you set up your project _from scratch_, ensuring your build is **repeatable**. – mgarciaisaia Jan 27 '16 at 18:54
  • Also, two changes could be committed that tested okay separately, but break together (e.g. one removing an unused API, and the other starting to use it). – Simon Richter Jan 27 '16 at 20:20
  • Real life argument: My computer happens to have parallel installs of Python 2.4, Python 2.7, Python 2.7 (64-bit), Python 2.7 (anaconda), Python 3.5.1. It also has multiple copies of the boost libraries installed. Needless to say, it is a poor substitute for any machine my users may ever have, concerning configuration. =) – Cort Ammon Jan 27 '16 at 21:16
  • In a previous job, I (working tier 2 tech support, not as a programmer) found a bug that was covered up by the fact that the test environment had working DNS, but the overwhelming majority of the customer base did not. The code wrote a URI to the hosts file rather than a domain name, and it was released that way. Had the test environment been set up without DNS (to test a feature designed to make up for the lack thereof in the field) it never would have shipped with that bug. – Monty Harder Jan 27 '16 at 23:10
  • @SimonRichter in that case wouldn't the developer who tried to push second have to merge in the first change on their machine? – bdsl Jan 27 '16 at 23:54
  • @bdsl, no, that would create lots of blockers -- if I have ten developers working on one feature each, and the test suite takes an hour to run, I'd have to delay the person submitting as number ten by nine hours before they can finally merge to the current state, run the tests and push their changes. Normally, their work should not interfere, so the CI would catch the rare cases where it does. – Simon Richter Jan 28 '16 at 01:29
  • A common case is if they have two change lists and they _think_ the changes are independent **but** one change set depends on a change from the other. Then the tests pass locally but fail when committed. – Sled Jan 29 '16 at 14:56
  • The developer may have run the tests on their branch but not rerun the tests after merging. The developer may have only run a partial set of the tests (the ones believed to be affected by the changes). – nhgrif Jan 30 '16 at 13:10
  • Also, importantly, where I work, we have another tool that keeps an eye on the status of all of our projects. It looks at things beyond just whether or not tests are passing, but it does look at whether or not tests are passing, and it relies on pulling that data from the build server rather than from an individual developer. – nhgrif Jan 30 '16 at 13:11
  • Since you have plenty of good reasons already, might I suggest adding the infamous "oops, I forgot to commit that..."? It's easy to push an *incomplete* change, the CI server will catch that. – Matthieu M. Jan 30 '16 at 17:31
  • Most common test failure reason in our company is the developer testing the debug build only on his machine. The CI servers test the release build. – Sebastian Redl Jan 31 '16 at 18:14
  • @Oded another common thing could be that a developer commits a work-in-progress commit using `commit --no-verify`, intending to fix some tests later. Pushing it will send untested code to CI. – Webber Mar 26 '22 at 16:44

As a developer who doesn't run all the integration and unit tests before making a commit to source control, I'll offer up my defense here.

I would have to build, test and verify that an application runs correctly on:

  • Microsoft Windows XP and Vista with Visual Studio 2008 compiler.
  • Microsoft Windows 7 with Visual Studio 2010 compiler.
    • Oh, and the MSI builds for each of those.
  • RHEL 5 and 6 with GCC 4.1 and 4.4 respectively (similarly CentOS)
    • 7 soon. Woop-de-woop.
  • Fedora Workstation with GCC for the three most recent versions.
  • Debian (and derivatives like Ubuntu) for the three most recent versions.
  • Mac OSX for the three most recent versions.
    • And the packages (rpm, dmg, etc)

Add in the Fortran (with both Intel and GNU compilers), Python (and its various versions depending on OS) and bash / bat script components and, well, I think you can see things spiral out of control.

So that's sixteen machines I'd have to have, just to run a few tests a couple of times a day. It would be almost a full time job just to manage the infrastructure for that. I think almost anyone would agree that's unreasonable, especially multiplying it out to the number of people in the project. So we let our CI servers do the work.

Unit tests don't stop you committing broken code; they tell you when they've detected that you've broken something. People can say "unit tests should be fast", and go on about principles and design patterns and methodologies, but in reality sometimes it's just better to let the computers we've designed for repetitive, monotonous tasks do those tasks, and only get involved if they tell us they've found something.
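
To give a flavour of how this plays out in test code, here is a small, hypothetical pytest sketch: each packaging test runs only where its platform tooling exists, so any single developer box exercises one slice of the matrix while the CI servers cover the rest. Test names and bodies are invented:

```python
# Platform-gated tests: a single developer box runs only its own slice;
# the CI farm (Windows, RHEL, Debian, macOS, ...) covers the full matrix.
import sys
import pytest

@pytest.mark.skipif(sys.platform != "win32", reason="MSI packaging is Windows-only")
def test_msi_package():
    ...  # build the MSI and verify it installs (elided)

@pytest.mark.skipif(sys.platform != "darwin", reason="dmg packaging is macOS-only")
def test_dmg_package():
    ...  # build the dmg and verify its contents (elided)

@pytest.mark.skipif(not sys.platform.startswith("linux"), reason="rpm/deb are Linux-only")
def test_rpm_and_deb_packages():
    ...  # build rpm/deb and verify package metadata (elided)
```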

TZHX
  • Unit testing tests *code* not configurations. It would be seriously remiss of you to add a new test and throw it over the wall without even running it locally first... – Robbie Dee Jan 27 '16 at 15:18
  • @RobbieDee I'm afraid I can't see your point? I don't suggest creating new tests without testing them locally, or just blindly committing things to source control without testing them yourself, and I **would** run the tests on my own machine -- but "configuration" does need to be tested for consistent behavior, and it's better to do that relatively quickly when the developer's mind is still in that area than finding an issue when the team who predominantly uses Macs wake up four thousand miles away and update their copies. – TZHX Jan 27 '16 at 15:29
  • Er...what!?! You either *do* believe in unit testing before check-in (your comment) or you don't (your answer). Or is there some grey area I'm not aware of or you haven't explained? – Robbie Dee Jan 27 '16 at 23:06
  • @RobbieDee I'd say TZHX would run *all* the tests locally if they could do so, but they *can't*. Since TZHX can't, they run some tests locally (those that can run on their dev system and short enough or most relevant to the changed code, for example), and let the full battery run on the CI system. Fairly reasonable. – muru Jan 27 '16 at 23:52
  • @RobbieDee: He believes in unit testing. So he tests them on his MacBook Air, they pass, and he checks in. The CI servers running Red Hat, Solaris and Windows then run those tests again. Isn't it nice to know that what you tested also works on production platforms? – slebetman Jan 28 '16 at 07:27
  • @slebetman At last some sense. That is helpful - thank you. So in summary, he *does* run all the unit tests *somewhere* (contrary to his opening remark) - just not on all platforms. – Robbie Dee Jan 28 '16 at 08:44
  • @RobbieDee: I have often written unit tests that were specific to a certain compiler on a certain platform. Consider e.g. a graphics subsystem that makes use of AMD (the Intel competitor) specific CPU instructions which are only available on g++ (the GNU C++ compiler) version 4.5 or newer, but I happen to work on an Atom CPU and ICC (the Intel C++ Compiler). It would be nonsense to run the AMD/g++4.5 tests every time on that machine, yet it is code to be tested before release; plus my own CPU-independent code must be tested for proper interoperability. Sure, there are VMs and emulators, ... – phresnel Jan 28 '16 at 14:41
  • @RobbieDee: ... but booting them before every single commit would be a catastrophe in salary terms. – phresnel Jan 28 '16 at 14:41
  • I'm really curious -- what are you running/testing that uses C, Fortran and Python on all those systems? I'm in a similar boat (plus C++, and tcl) and it tends to be a pretty niche boat at that. – tpg2114 Jan 29 '16 at 00:35
  • @RobbieDee -- I don't think his opening statement is really contradictory in any way. TZHX runs a subset of the tests locally, before committing, then commits his code. Then, the CI servers will automatically detect that commit and will run the remainder of the tests. (Then, presumably, once the full suite of tests are passed, the commit is presumably merged into a master or "release" branch in some way). Or, to put it another way, there's a distinction between running tests _before_ a commit, and _after_ a commit. – Michael0x2a Jan 29 '16 at 04:37
  • @RobbieDee In some popular languages, there's undefined behavior in cases where developers don't expect it, which will behave one way on one compiler, and another way on a different compiler, such as left shift a 32 bit integer by 32 or more. And of course there's the well known Endian issue. And last but not least, compilers and standard libraries have bugs, and these bugs differ on different platforms. – Peter Jan 29 '16 at 13:06

You'd think so, wouldn't you? But developers are human, and they sometimes forget.

Also, developers often fail to pull the latest code. Their tests might run fine at the time, then at the point of check-in someone else commits a breaking change.

Your tests may also rely on a local (unchecked-in) resource - something that a local test run wouldn't flag.
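
A minimal, hypothetical example of that failure mode (paths and filenames invented), together with the usual fix of checking the fixture into the repository so a clean CI checkout has it too:

```python
# The first test passes only on the author's machine, because the CSV lives
# on their desktop and was never checked in. The CI server, working from a
# clean checkout, fails it immediately.
from pathlib import Path

def test_parses_accounts_broken():
    data = Path("C:/Users/alice/Desktop/accounts.csv").read_text()
    assert data.startswith("id,name")

def test_parses_accounts_fixed():
    # Fix: keep the fixture in the repo, addressed relative to the test file.
    fixture = Path(__file__).parent / "fixtures" / "accounts.csv"
    assert fixture.read_text().startswith("id,name")
```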

If you think all the above is fanciful, there is a level above CI (on TFS at least) called gated check-in, where builds with failing tests are shelved and aren't committed to the code base.

Robbie Dee

Apart from Oded's excellent answer:

  • You test the code from the repository. It may work on your machine with your files... that you forgot to commit. It may depend on a new table whose creation script is missing (in Liquibase, for example), or on some configuration data or properties files.
  • You avoid code integration problems. One developer downloads the latest version, creates unit and integration tests, adds code, passes all tests on his machine, commits and pushes. Another developer has just done the same. Both changes are correct on their own, but when merged they cause a bug. This could happen at the repository merge, or simply not be detected as a conflict. E.g. Dev 1 deletes a file that was not used at all, while Dev 2 writes code against this file and tests without Dev 1's changes (see the sketch after this list).
  • You develop a script to deploy automatically from the repository. Having a universal build-and-deploy script solves a lot of issues. Some developer may have added a lib or compile option that is not shared by everybody. Not only does this save you time, but more importantly, it makes the deployment safe and predictable. Furthermore, you can go back in your repository to version 2.3.1 and deploy that version with a script that works with it. It includes database objects like views, stored procedures and triggers that should be versioned. (Or you won't be able to go back to a workable version.)
  • Other tests: like integration, performance and end-to-end tests. These can be slow and might involve testing tools like Selenium. You may need a full set of data with a real database instead of mock objects or HSQL.
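
Here is a hypothetical sketch of that second bullet: no textual conflict, both branches green in isolation, and only the post-merge CI run fails. All names are invented:

```python
# Semantic conflict sketch.
#
# master:    defines slugify(), currently unused.
# Branch A:  Dev 1 deletes slugify() as dead code -- tests pass.
# Branch B:  Dev 2 starts calling slugify() and adds a test -- tests pass.
#
# The merge applies both changes with no textual conflict. The code below is
# what master looks like afterwards: slugify() is gone, but Dev 2's new code
# still calls it. Only the post-merge CI test run catches the NameError.

def article_url(title):
    return "/articles/" + slugify(title)  # NameError: slugify was deleted

def test_article_url():
    assert article_url("Hello World") == "/articles/hello-world"
```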

I once worked at a firm that had a lot of bugs on deployment due to the merging and deployment process. This was caused by a weird proprietary framework that made testing and CI hard. It was not a happy experience to find that code that worked perfectly in development didn't arrive intact in production.

Robbie Dee
Borjab
  • Yeap, simply forgetting to commit some of the changes is very common. I'd say forgetting to "svn add" new files and so forgetting to commit them later is the most popular way to get a failing automatic build. – sharptooth Jan 28 '16 at 07:28

by the time something gets committed to master

I usually set up my CI to run on every single commit. Branches don't get merged into master until the branch has been tested. If you're relying on running tests on master, then that opens a window for the build to be broken.

Running the tests on a CI machine is about reproducible results. Because the CI server has a known clean environment pulled from your VCS, you know that the test results are correct. When running locally, you could forget to commit some code needed for them to pass, or have uncommitted code that makes them pass when they should be failing.

It can also save developers time by running different suites in parallel, especially if some are slow, multi-minute tests that aren't likely to be run locally after each change.

At my current work our production deployment is gated on CI passing all tests. The deploy scripts will prevent deployment unless they're passing. This makes it impossible to accidentally forget to run them.

CI being part of the workflow takes a burden off of developers as well. As a developer, do you usually run a linter, static analyzer, unit tests, code coverage, and integration tests for every single change? CI can, completely automatically and without you needing to think about it, reducing decision fatigue.
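
As a sketch of that hand-off, a CI job might simply run each gate in turn and fail fast. The tool names below (ruff, mypy, pytest) are stand-ins for whatever a project actually uses, and the script runs the gates sequentially for clarity where a real CI server would typically parallelize them:

```python
#!/usr/bin/env python3
"""Minimal sketch of a CI gate runner; substitute your project's actual
linter, type checker, and test suites for the example tools below."""
import subprocess
import sys

STEPS = [
    ["ruff", "check", "."],                  # linter
    ["mypy", "src"],                         # static analysis
    ["pytest", "tests/unit", "-q"],          # fast unit tests
    ["pytest", "tests/integration", "-q"],   # slower integration tests
]

def main() -> int:
    for cmd in STEPS:
        print("+", " ".join(cmd), flush=True)
        if subprocess.run(cmd).returncode != 0:
            return 1  # the first failing gate fails the whole build
    return 0

if __name__ == "__main__":
    sys.exit(main())
```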

Daenyth
  • You shouldn't really have slow unit tests - this violates [FIRST](https://github.com/ghsukumar/SFDC_Best_Practices/wiki/F.I.R.S.T-Principles-of-Unit-Testing) principles. – Robbie Dee Jan 27 '16 at 13:33
  • @RobbieDee: I think that usually the CI server runs all the tests, not just the unit tests. – RemcoGerlich Jan 27 '16 at 13:35
  • @RemcoGerlich It can do, but this isn't usually the case in my experience - YMMV... – Robbie Dee Jan 27 '16 at 13:37
  • @RobbieDee: in theory all unit tests are fast. In practice.... Regardless, CI can and should run *all* the tests - linters, static analysis, unit tests, integration tests. – Daenyth Jan 27 '16 at 13:39
  • @Daenyth That just isn't always practical when the tests can take minutes/hours and multiple builds are queuing. It can work in smaller development teams. There is of course nothing to stop you from having a specific overnight build that includes your integration tests etc. – Robbie Dee Jan 27 '16 at 13:44
  • @RobbieDee Obviously the specifics of configuration will vary from team to team. Even when the builds take multiple minutes, it's often possible to run multiple of those builds in parallel. Given a single monolithic codebase it could be a larger drawback, but IME it's not a barrier. – Daenyth Jan 27 '16 at 13:50
  • Unworkable in the enterprise development environment I work in with 100s of devs. But I do see how this could work in a smaller set up or for a maintenance project. – Robbie Dee Jan 27 '16 at 23:10
  • @RobbieDee I think it depends more on your architecture. I've seen it work first-hand for an engineering team of ~80, but that's with well-defined sub-teams for product areas. – Daenyth Jan 28 '16 at 03:42

By the time something gets committed to master, a developer should have already run all the unit tests ... but what if they haven't? If you don't run the unit tests on the CI server, you'll not know until someone else pulls the changes to their machine and discovers the tests just broke on them.

In addition, the developer may have made a mistake and referenced a local resource specific to their machine. When they check in the code and the CI run fails, the problem is immediately identified and can be corrected.

David Arno

Assuming (contrary to other answers) that developers are quite disciplined and do run unit tests before committing, there can be several reasons:

  • running unit tests can take a long time under some special setups. For example, running them under a memory checker (like Valgrind) takes much longer. Even when all unit tests pass, the memory check can still fail.
  • the result is not that important for some special settings - for example, measuring code coverage requires special compiler flags. For most developers, code coverage is not that important - it matters more to people responsible for keeping the code at a certain quality, like team leads. (A sketch of such a coverage run follows this list.)
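
For the coverage case, here is a minimal sketch using coverage.py's API, assuming a pytest suite under `tests/`. The tests themselves are unchanged; only the slow, specially-configured wrapper differs from a normal developer run:

```python
# Run the unchanged test suite under coverage.py -- the kind of slow,
# specially-configured run CI does so developers don't have to.
import coverage
import pytest

cov = coverage.Coverage(branch=True)
cov.start()
exit_code = pytest.main(["tests", "-q"])  # the same tests a developer runs
cov.stop()
cov.save()
cov.report()                              # for the team lead, not the developer
raise SystemExit(exit_code)
```
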
Robbie Dee
BЈовић

It is possible to imagine cases where change A does not break the tests, and change B does not break the tests, but A and B together do. If A and B are made by different developers, only the CI server will detect the new bug. A and B may even be two parts of the same longer sentence.

Imagine a train pulled by the two locomotives A and B. Maybe one is more than enough, so removing either locomotive is a reasonable fix. However, if the two "fixes" are both applied, removing both locomotives, the train will not move.

Also, not all developers run all unit tests, though most good developers do.

h22

Let's ask an equivalent question:

Why would you build the code on a CI server?

Surely, by the time something gets committed to master, a developer has already built the code before and fixed any errors that might've occurred with their new code. Isn't that the point of building code? Otherwise they've just committed broken code.


There are several reasons for doing CI, but the main point of CI is to get an idea of the state of the code over time. The main benefit (out of several) this provides is that we can find out when the build breaks, figure out what broke it, and then fix it.

If the code is never broken, why do we even use CI? To deliver builds for testing, nightly builds would be good enough.

Peter