9

HSQLDB is great. It (also) has an embedded mode (no dedicated server needed), which allows for quick prototyping of stuff like Proof of Concepts, and it can also be great in production-ready applications, as a quick and simple storage of various data.

However, at least 90% of the projects I've worked on in the past years, if they deal in any way with an SQL database, they will unavoidably have unit tests that use an embedded HSQLDB.

Sure-enough, these projects (which use the standard Maven structure most of the times), have a ton of "unit tests" (or, at least, some kind of tests, but which are located in the "unit-test" area of the project (like "src/test/java")) which use one or more embedded instances of HSQLDB in order to do some CRUD operations and check the results.

My question is: is this an anti-pattern? Does the fact that HSQLDB is easy to integrate and the embedded server is very lightweight make it so that it can be overlooked as "mock-up" of an actual database, when it shouldn't be the case? Shouldn't such tests be treated more like integration-tests (since each of them is composed of "something" AND that "something"'s integration with a database server)? Or am I missing something in the "unit test" definition, and we can infact say that using HSQLDB like this simply adheres to the "mock the dependencies, and test only the unit" part of the "unit test" definition?

Shivan Dragon
  • 4,583
  • 5
  • 24
  • 31
  • 3
    Isn't this just mocking applied to the Database layer? I wouldn't say it was an anti-pattern as long as it's a simple and appropriate way of getting those modules that depend on it unit-tested. – Kilian Foth Feb 11 '14 at 11:47
  • 2
    @KilianFoth yes but is it still mocking when the "mock" (the db) actually takes an active role in the testing process? For example, when you have a class with some dependency (mandatory for the instantiation of the class), if you want to unit test that class, you mock that dependency, but then you simply test the methods of the class, you don't care anymore about the mocked dependency, it doesn't have such a big role in the whole unit-testing of the class thing, as HSQLDB has in these type of unit tests... – Shivan Dragon Feb 11 '14 at 11:58
  • possible duplicate of [Staying OO and Testable while working with a database](http://programmers.stackexchange.com/questions/42792/staying-oo-and-testable-while-working-with-a-database) – gnat Feb 11 '14 at 12:00
  • 1
    @gnat well, i personally see that other question as more like "how to unit test stuff at the above-data-access layer - should we mock the data-access layer or what", whilst mine is "how to i unit test suff at the data-access layer itself, is HSQLD and the like a valid way to mock things?"... – Shivan Dragon Feb 11 '14 at 12:06
  • 1
    Related: [how to test the data access layer](http://programmers.stackexchange.com/questions/219362/how-to-test-the-data-access-layer) –  Feb 11 '14 at 13:49

6 Answers6

8

I consider using something like HSQLDB (or DBUnit) when testing my DAOs as a necessary step to guarantee quality over the code I write when touching a data layer. As already pointed out, it is not a bullet-proof solution but in combination with its dialects it can cover a reasonable part of your cases.

My only point would be: let's be careful when we talk about unit-tests. When I test a class interacting with an external component that is sort of out of my control (i.e. it can produce non-deterministic behaviour) I consider it automatically as an integration test.

Assuming that you've factored out the logic where you interact directly with your datastore (via JDBC drivers, ORM, etc.) in a concrete class hidden by an interface, a proper unit-test for your DAO would be developed by passing to it a concrete mocked implementation of the interface that returns a set of well-known values against which you will test it.

kappolo
  • 281
  • 1
  • 5
5

IMO:

  • If a test requires DB, it is not the unit test. It is either integration or acceptance test.
  • To avoid the need of using DB in a test I should follow inversion of control principle by using dependency injection (or service locator).
  • If I need to test something with DB, I should create a VM with actual database (with fake or real data, depending on the task). Vagrant and Docker are great tools for the job.
  • Mocking the DB is a bad practice. It is the same as doing it wrong (mocking DB) and then redoing it right (making it work with real DB).
  • It is OK for proof-of-concept projects, but still it is then followed by redoing (if concept is proved).
  • It is perfectly fine to use HSQLDB (or whatever) by itself if it is a proper tool for the job.
scriptin
  • 4,432
  • 21
  • 32
4

Use the same database for (unit-)tests as you are using in your production environment.

The reason is simple: Databases behave differently. Each database has their own proprietary functions. Even if you use an ORM layer for database access, your database can behave differently, also performance-wise.

Of course, the advantage of an embedded database for unit tests is that you do not have to configure anything. But installing a database on every developer machine and configure it to be used for unit tests is not that hard.

I had the same situation in a project some time ago. We used JDBC to run SQL-statements on the database. When we decided to write tests for our application, we used an embedded database. That lead to situations where some bugs could not be reproduced with unit tests, and some parts of the application could not be tested because the embedded database did not support the functions of our production database. So every developer installed the same database we also used in the production environment on the PC to run the tests, which worked much better.

Uooo
  • 786
  • 5
  • 15
  • That's a lot of Oracle instances you are suggesting installing in some environments - not cheap at all. Don't forget the build server either. –  Feb 11 '14 at 13:51
  • @MichaelT I know, but what is the alternative? – Uooo Feb 12 '14 at 05:57
  • 2
    One could use HSQLDB working with an oracle dialect (see `sql.syntax_ora` in http://hsqldb.org/doc/guide/dbproperties-chapt.html ) - its not perfect, but its good enough for most things. As you note its not an everything solution, but it does solve the issues of multiple licenses needed and allows for stand alone builds that don't depend on another service running. Another approach would be [dbunit](http://dbunit.sourceforge.net). There are indeed tradeoffs with each approach. –  Feb 12 '14 at 18:08
  • @MichaelT Oracle is free for development/testing. So there's no cost to install another Oracle instance for testing. There is of course the cost of installing more hardware to run all those instances. – jwenting Feb 17 '14 at 09:19
  • yes, you should eventually test the database layer against the database engines to be supported. But having something like HSQLDB in there can help speed things up, especially if your network is congested. – jwenting Feb 17 '14 at 09:21
  • +1 for using the same db. It might be an impossible state to reach, but test should be as close to prod as possible, precisely because software and systems behave differently at different scales e.g. suddenly that query that's so quick in dev kills the database when run against a "proper" amount of data. – Rory Hunter Feb 17 '14 at 10:55
  • @jwenting the other point to consider there is it means that you need the database up and running in order to do the tests against them. And do not forget the additional maintenance of keeping them in sync and administration of the instances. This rapidly adds up to a non-trivial organizational cost. –  Feb 17 '14 at 18:10
  • 2
    I wouldn't consider it a unit test if it was accessing a non-embedded DB – herman Feb 17 '14 at 18:27
  • +1 for the answer. I faced a similar issue where spring/hibernate code ran fine with HSQLDB in embedded mode during tests but failed badly in client-server MySQL. Agreed that the code was the fault and a well written code would have avoided the problem, but we expected the tests would fail if the code was not ok. – Jit B Feb 22 '14 at 09:22
  • The idea is to test the functionality. If you can ensure the availability of a replica of your prod DB while running builds on your CI server, there is no harm running integration tests in place of unit tests. – Jit B Feb 22 '14 at 09:23
4

I've found & fixed some really bad problems and greatly speeded up my development by testing against HSQL database. Often the lowest level of adaptor classes (i.e. your DAOs) can go untested in unit testing because it can be hard to decouple them from an actual round-trip to a live database. I have mocked Connection and JDBC artifacts in the past, but it was actually much harder than testing against an in-memory database.

On one project, I developed & tested against HSQL, ran locally against MySql, and deployed to production against SQL-Server. Amazingly, it worked just fine. The biggest integration issue I encountered was due to my own misunderstanding of the spec (seconds vs. milliseconds, lol).

While the persistence framework can mask the underlying database (like Hibernate does), the persistence framework itself can add enough complicated semantics that exercising it in isolation can shake out a lot of quirks before you get to full-blown integration testing.

Consider the case where you're using Hibernate or JPA to build a complicated result set with a lot of joins. If it's nasty and takes a lot of tries, you could develop and test that against a live database. But you could also develop and test it against an in-memory database with less overhead. You also don't have to worry about setup/teardown scripts for your database fixtures, or coordinating tests and builds with others. And if you test against the in-memory database, you enjoy the benefits of having those tests around for regression, just like other unit tests.

I've heard the term "Component Testing" for tests that are more than unit test, but still done in isolation.

Unit tests, Component Tests (if that's really their name), and Integration Tests all provide value. Teams can dispense with one or another (yes, lots of teams do no unit testing & rely solely on Integration testing), but if you do that, you're losing the benefits that the other testing tiers bring.

In the case of tests that are isolated by virtue of integrating in memory, the benefit is rapid validation of code, and then fast regression -- the benefits of isolated testing and no need for additional installations, licenses or hardware. It allows you to validate more levels of code in isolation than if you just punted on testing your DAOs until the database was provisioned.

So there's practical value. It's not required, but I've been more happy rather than less when I've done it.

sea-rob
  • 6,841
  • 1
  • 24
  • 47
3

I use approaches like that in the past, i'm not sure if this is an "anti-pattern" (some people try to elevate his personal feelings or experiences to universal rules, i'm not), but i prefer another approach:

  • For unit test, real unit test don't use external things like an in-memory database (not really lightweight if you compare to a hashMap or a stub using some stub library). If you use DI and a simple patterns like Repository or DAO for your data-access this is very easy to achieve. My objective its to have a lot of unit test that execute really quick, and with quick i'm thinking in 3k test in less than a minute.

  • For integration test, i prefer to use the real data-storage software in this test, the goal of this test its to prove the integration between my software and the real data-storage, use a different data-storage here breaks the intention of this test. I try to write the less integration test as possible to demonstrate this integration, and usually this test are executed outside of the normal build (for example only in nightly builds)

I use HSQLDB sometimes to build simple demos or proof-of concepts, its a great tool, but i don't see anymore what its the place for this tool in my normal development workflow.

AlfredoCasado
  • 2,159
  • 11
  • 11
0

For unit testing data access products like MyBatis, tests using embedded HSQLDB are perfectly fine in my opinion (and I know at least at MyBatis they are using HSQLDB exactly for this).

For most other projects I would consider this overkill. I would make some simple integration tests (these are tests that potentially have side effects on other tests) that access the real database, ensuring that each query/statement is used at least once. There should be far less integration tests than unit tests in the project.

For the unit tests, I would mock the DAO or whatever accesses the DB (the 'adaptor').

herman
  • 405
  • 4
  • 11