Continuous Integration for Databases

Many different techniques, frameworks and tools are nowadays used to automate and optimize software deployment. Continuous Integration, Continuous Delivery and sometimes even Continuous Deployment are a standard most agile companies use daily.

However, while all those process are oriented to embrace change in a controlled way, there is a gray area where change is still problematic: most applications need some form of persistent storage, a database, which is also compelled to change alongside the logic layer.

Martin Fowler has called the set of practices applying DevOps and Agile approaches to databases “Evolutionary Database Design”, a fitting definition. We will build upon that and demonstrate how to use cloud infrastructure together with tools like DbFit, Flyway and Jenkins, to realize an agile Environment for Continuous Database Development, that is: Agile databases.

Agile Databases with Evolutionary Database Design

Automating Database Creation through Configuration Management

Finding the right versioning methods or tools for databases is already a difficult task, and often a subordinate one. Nevertheless, today it is hard to imagine writing code without git or deploying an application without a versioning system.

The same should be true for databases: in order to ensure different environments are as uniform and reproducible as possible, it’s a good idea to embrace Infrastructure as Code (IaC) to automate deployment and configuration.

Building and deploying databases automatically offers many benefits, the most obvious being that it is less error-prone, much faster, and easier to maintain.

Moreover, every developer can easily build their own reproducible development environment, and when changes have to be made to a database, they can try out different possibilities without disrupting colleagues, as would be the case with a shared database.

Automating database creation is an easy task thanks to infrastructure management and configuration management tools. On Exoscale, a new Postgresql Node intended as a sandbox for the individual developer can be e.g. created with the Exoscale Terraform Provider.

If you are new to Terraform you can have a look at these articles and demos to get started:

Then, using a simple script:

resource "exoscale_compute" "postgresql-new-dev-node" {
  display_name       = "postgresql-new-dev-node"
  template           = "Linux Ubuntu 18.04 LTS 64-bit"
  size               = "Small"
  disk_size          = 10
  zone               = "at-vie-1"
  security_groups    = ["postgresql"]
  key_pair           = "new-dev-keypair"
}

The script creates a new small Ubuntu machine, using a SSH Keypair and Security Group previously created on Exoscale.

With the new machine up and running you can then configure your DB using e.g. Ansible modules, like postgresql_db or postgresql_user). Note that to be able to access your machine, the indicated Security Group must allow ingress traffic on port 22, to enable SSH connections.

Once the automatic deployment has been implemented, a developer can quickly and easily create throw-away databases to test and benchmark modifications, and has an easy way to dispose them once done.

Once modifications are validated and reviewed, the changes in the database structure and configuration can be merged as a normal commit and go through the usual Continuous Integration pipeline.

The same Configuration Management process can then be used to deploy in other environments such as Test, QA, and (finally!) Production.

Testing Databases in a Continuous Integration Pipeline

Tests, be they Unit, Integration or Acceptance Tests, have become an indispensable part in the development process of an application.

Databases, however, are not usually seen as an independent unit to test, although there is an appealing selection of possibilities to do so.

A very interesting tool in this field is DbFit, which builds upon the FitNesse framework to build a complete test suite for your database. It has several advantages:

Easy to install and run
Available on both Linux and Windows environments
File-system based: taking advantage of FitNesse, every test is a simple text file you easily bring under source control. Moreover, it offers the ability to write Acceptance Tests directly in the browser, bridging the gap among the different souls of the team (coders and non-coders).
Extensible: you have the choice between a .NET or Java wrapper depending on the database you are targeting, but either way adding a new fixture is fairly simple.

The installation is easily done by downloading and unpacking (the examples assumes you’ll put DbFit under /opt). All interactions with DbFit are done via startFitnesse.sh (or its Windows equivalent startFitnesse.bat). You can explore the tool easily running

sudo /opt/dbfit/startFitnesse.sh --help

And now we can immediately start writing our first tests: simply execute /opt/dbfit/startFitnesse.sh and go to http://localhost:8085/dbfit to use the FitNesse GUI. Detailed instructions can be found directly at http://localhost:8085/dbfit.AcceptanceTests.JavaTests.PostgresTests after starting DbFit.

You can organize DB tests alongside your regular application tests in your source code. It is recommended to create a MainSuite folder which contains all other suites and tests for the database. Here is an example of how the folder structure might look like:

fitnesse
└── MainSuite
    ├── BasicConnectionSuite
    │   ├── DbConnectionTest
    │   │   ├── content.txt
    │   │   └── properties.xml
    │   ├── SimpleQueryTest
    │   │   ├── content.txt
    │   │   └── properties.xml
    │   ├── content.txt
    │   └── properties.xml
    ├── content.txt
    ├── plugins.properties
    └── properties.xml

Once the tests are written, we can execute them with startFitnesse.sh. With the previously described installation and the created MainSuite folder, the shell command looks like this:

sh /opt/dbfit/startFitnesse.sh -v -o -r "test/fitnesse" -f /opt/dbfit/plugins.properties -d "$(pwd)" -c "MainSuite?suite&format=text"

Finally, you can add your DbFit tests to your Continuous Integration Workflow and execute them in your Continuous Delivery Pipeline.

Database Updates - Migrations - Fixtures

Planning databases is hard, and often the structure we have initially imagined is quickly going to change because of new unforeseen requirements. But databases too should be governed by common patterns of the Agile principles: changes should be able to be rolled out through environments, should be implemented at any time, exactly when they are needed for a new requirement, and all changes should be versioned.

While most modern frameworks like e.g. Django or Laravel to name a few, have integrated versioning and migration tools, what do you do when no convenience is available out of the box? A tool like Flyway can support you significantly.

If you have an empty PostgreSQL database at the beginning of a project, you can use Flyway to roll out versioned SQL scripts on all environments.

As DbFit, Flyway is also very easy to install and to configure. Simply export some environment variables:

    FLYWAY_LOCATIONS='filesystem:./sql'
    FLYWAY_URL='jdbc:postgresql://185.150.10.24:5432/evodb'
    FLYWAY_USER='flyway'
    FLYWAY_PASSWORD='flyways-secure-secret'
    FLYWAY_SCHEMAS='evolutionary-db-design'

Now we can start to make first updates on our database. Let’s create a table with personal data, in two separate files:

# V1__create_person_table.sql
CREATE TABLE person (
    ID              SERIAL PRIMARY KEY,
    NAME            TEXT          NOT NULL,
    AGE             INT           NOT NULL,
    ADDRESS_STREET  VARCHAR(50),
    ADDRESS_CITY    VARCHAR(50),
    ADDRESS_ZIP     INT,
    ADDRESS_COUNTRY VARCHAR(50)
);

# V2__insert_person_data.sql
INSERT INTO person VALUES (NULL,'Bruce Wayne',18,'1007 Mountain Drive','Gotham',53540,'USA');
INSERT INTO person VALUES (NULL,'Donald Duck',84,'Disneyland Resort','Anaheim',92803,'USA');

You can commit these scripts directly in the code repository, to track the database structure on one side and automate the filling of mock data for testing or dev purposes.

A ./flyway migrate will roll out all scripts on the database, updating the DB and filling it as described in your fixtures.

This allows you to roll out versioned changes to databases in situations where the framework you use does not offer you the tools to track, migrate and generate fixtures.

It is recommended to keep the migrations as small as possible, and in the case of schema changes, special attention should be paid to ensure applications can work with the old and new database schema for a certain period of time.

This time can be used by the applications to switch to the new schema. This is called a transition phase. Interested readers are referred to the following reading, in which a migration of a Split Table with a transition phase is carried out.

Continuous Integration of Database Changes: a Jenkins Template

To put in practice Agile principles, developers commonly test changes in the codebase via a CI build server, ensuring the code they produce is always working. With the above building blocks in place we can now integrate the DB tests via DbFit, as well as the database migrations via Flyway, directly into the pipeline.

The best way to do this is to provide your own CI database, which the Build Server uses for Continuous Integration. If you use Jenkins, a definition of the pipeline based on a Jenkinsfile could look as follows:

pipeline {
    agent any

    stages {
        stage('Build Application') {
            ### Build Application Pipeline Part ###
        }
        stage('INT - DB Migration') {
            environment {
                FLYWAY_LOCATIONS = 'filesystem:./sql'
                FLYWAY_URL = 'jdbc:postgresql://185.150.10.24:5432/evodb'
                FLYWAY_USER = 'flyway'
                FLYWAY_PASSWORD = $INT_FLYWAY_PASSWORD
                FLYWAY_SCHEMAS = 'evolutionary-db-design'
            }
            steps {
                echo 'Run Flyway Migration'
                sh '/opt/flyway-5.1.3/flyway migrate'
            }
        }
        stage('INT - Fitnesse Tests') {
            steps {
                sh '/opt/dbfit/startFitnesse.sh -v -o -r "test/fitnesse" -f /opt/dbfit/plugins.properties -d "$(pwd)" -c "MainSuite?suite&format=text"'
            }
        }
        stage('INT - Deployment') {
            ### Application Deployment Pipeline Part ###
        }
        stage('INT - Application Tests') {
            ### Application Tests Pipeline Part ###
        }
    }
}

The pipeline could easily include stages for additional environments such as Test, QA or Prod, which automatically map the deployment and tests.

Evolutionary Database Design in the Cloud is About Disposable Machines

It can be worthwhile to consider databases as a separate unit, so that the quality of the deployments, tests and maintenance of the database itself can be increased.

Above all, the flexibility and scalability of the cloud brings decisive advantages here. Databases can be set up and dismantled together with the virtual machines as required, and Continuos Integration pipelines can be set-up and torn down alongside the build server on demand, with no resource consumption during idle times.