Preparing your relational databases for automatic updates

When an application relies on relational databases for its data storage the need to update the database(s) will arise. This could be as simple as adding a column or table or a more complex reorganization of the schema (and probably also data).

When working with databases I have experienced 3 main ways of handling these kinds of changes.

  1. Manually scripting the changes in every single database (not recommended)
  2. Using some kind of framework or tool that performs differential updates based on a master schema
  3. Incremental updates

I have always been a big fan of differential updates. I created my own tool for doing this around 8 years ago. All updates are performed using a master schema which is the latest version of the database. This schema is then deployed to customers. The master schema and the customer’s schema (which is extracted when needed) are compared at install/update and all differences are automatically resolved. If a new table is added or a new column is added to an existing table this is automatically resolved. This is great, well it is actually more than great, it is brilliant because I never have to write a single line of TSQL code to manage database schema changes.

There is only “one” catch…? How is static data handled… which more often than not is not static… or fixes that requires data to be moved or transformed or deletion of columns or tables,  or <insert your question here>?

I know that there are now commercial tools out there that do this but from my experience none of them handle all scenarios unless you specifically design your database to overcome the shortcomings of these tools. When the system has been released this is often a little to late.

After using my tool with success for some years I came to realize that it might not be the best way of approaching automated SQL database updates (both schema and data). The main reason for this is that there are so many scenarios that require custom scripts to be made that simple cannot be auto generated. A brain is required!

If an application is to be updated automatically without any user interaction then using versioned incremental updates is probably a better approach. The main advantage is that you always know the state of the database and exactly what updates have been performed. It is also very fast to update the database since only required scripts are executed.

I have defined that every set of changes to the database schema or data is a new version.  A set of changes can be any number of T-SQL files. A typical file structure would look like this:

\Version_1.0\1.0.0.0\CreateTables.sql
\Version_1.0\1.0.0.1\InsertDefaultData.sql
\Version_1.0\1.0.0.2\AddColumnToAddress.sql
\Version_1.0\1.0.0.3\AddNewTables.sql
\Version_2.0\2.0.0.0\Upgrade.sql

So if a customer has version 1.0.0.2 all that has to be run when upgrading to 2.0.0.0 is 2 scripts. With this approach automatic roll back is also possible to do. Having the option to do database roll backs might be a requirement if you are releasing frequently. In some case it is not possible to do a roll back and the only way to go is forward but for some scenarios database roll backs are fully doable.

\Version_1.0\1.0.0.0\CreateTables.sql
\Version_1.0\1.0.0.1\InsertDefaultData.sql
\Version_1.0\1.0.0.1\Rollback.sql
\Version_1.0\1.0.0.2\AddColumnToAddress.sql
\Version_1.0\1.0.0.2\Rollback.sql
\Version_1.0\1.0.0.3\AddNewTables.sql
\Version_1.0\1.0.0.3\Rollback.sql
\Version_2.0\2.0.0.0\Upgrade.sql

Given the above structure we can roll back from any version between 1.0.0.3 and 1.0.0.1. Roll back from 2.0.0.0 to 1.0.0.3 is not supported in the above scenario.

The beauty of the above approach is that we know exactly what scripts are run on our customer’s databases and we know that any script is only run once. This makes the database upgrades very fast and also robust since only required scripts are run. Another advantage is that the database will not be touched if the latest version is already installed.

It is also worth mentioning that differential updates and incremental updates can be combined by generating change scripts in-house using diff and then shipping the result as an incremental update.

Changing to incremental updates can be tricky if you have not versioned your database. In most cases there will however be ways to determine a base version and work from there. Maybe the main system version can be used to figure out the state of the database?

Database upgrade/update strategy is very important to consider when introducing automated updates and hopefully this blog can be a help in making the right decisions.