Crossing borders: Rails migrations

Rethinking database schema changes

Ruby on Rails is a progressive Web development framework that implements radical ideas such as convention over configuration, heavy metaprogramming, domain-specific languages, and database wrapping instead of object-relational mapping. This article examines Rails schema migrations, a philosophy of separating each database schema change from the base object model.

Bruce Tate (bruce.tate@j2life.com), President, RapidRed

Bruce TateBruce Tate is a father, mountain biker, and kayaker in Austin, Texas. He's the author of three best-selling Java books, including the Jolt winner Better, Faster, Lighter Java. He recently released From Java to Ruby. He spent 13 years at IBM and is now the founder of the RapidRed consultancy, where he specializes in lightweight development strategies and architectures based on Java technology and Ruby on Rails. His practice now offers a full range of Ruby and Rails education, consulting, and implementation offerings.



15 August 2006

Also available in Chinese Russian Japanese

As an avid cyclist, I'm aware of two serious communities: mountain bikers and road bikers. Conventional wisdom has it that mountain biking is far more dangerous, but I disagree. Road bikers must consider much more serious obstacles than rocks or trees: cars. A similar disagreement is developing between supporters of the two persistence strategies for object-oriented application development.

Currently, persistence frameworks use one of two approaches: mapping or wrapping. Mapping solutions let you create independent database schemas and object models, and then use a layer of software to manage differences between the two. Mapping solutions seek to build an object model that closely resembles the structure of the database schema. In contrast, wrapping solutions use objects as wrappers around database tables and rows to manipulate data in a database. Conventional wisdom is that a mapping solution is often more flexible once the solution has been released into the wild because mapping software can better deal with changes in the schema or object model. But that wisdom ignores the most important part of the equation: data. To manage any application change involving a persistent domain model effectively, you must coordinate changes in the data, schema, and model. Most project teams don't get it right.

Development teams usually handle schema changes by generating a new version of a schema from scratch, with SQL scripts. A script might drop all tables and add all of the tables again. Such a strategy destroys all test data and therefore is worthless for production scenarios. Occasionally, tools might create scripts that generate delta schemas, or schemas that use SQL commands such as alter_table to modify a previous version of a schema. But very few teams bother to create scripts that undo schema changes, and fewer still manage to create automated scripts to deal with changes in data. In short, traditional mapping strategies ignore the cars in the road: reversing bad schema changes and handling data.

About this series

In the Crossing borders series, author Bruce Tate advances the notion that today's Java programmers are well served by learning other approaches and languages. The programming landscape has changed since Java technology was the obvious best choice for all development projects. Other frameworks are shaping the way Java frameworks are built, and the concepts you learn from other languages can inform your Java programming. The Python (or Ruby, or Smalltalk, or ... fill in the blank) code you write can change the way that you approach Java coding.

This series introduces you to programming concepts and techniques that are radically different from, but also directly applicable to, Java development. In some cases, you'll need to integrate the technology to take advantage of it. In others, you'll be able to apply the concepts directly. The individual tool isn't as important as the idea that other languages and frameworks can influence developers, frameworks, and even fundamental approaches in the Java community.

This article takes a deeper look at Ruby on Rails migrations -- Rails's solution for dealing with changes to a production database. Migrations combine power and simplicity to coordinate both schema changes and data changes using a wrapping approach. (If you've not seen Active Record -- the underlying persistence layer in Rails -- before, I recommend that you take a look first at this earlier Rails article in the Crossing borders series.)

Handling schema changes in Java programming

Mapping-based frameworks require a schema, a model, and a map. Such an architecture can be repetitive. Think of how many times you need to specify a given attribute:

  • The getter in the model
  • The setter in the model
  • The instance variable in the model
  • The "to" side of the mapping
  • The "from" side of the mapping
  • The column definition

To be fair, Java frameworks such as Hibernate shield you from much of the repetition by the liberal use of code generation. While object-relational mappers let you deal with legacy schemas, for new database schemas you can generate your schema directly from your model with Hibernate-provided tools and generate your getters and setters with your chosen IDE. You can embed your mapping into your domain model -- partially defeating the purpose of having a map in the first place, in my opinion -- with Java annotations. This code-generation technique also serves another purpose: schema migration. Some of these code-generation tools can detect differences between your new domain model and the old schema and generate SQL scripts to bridge those differences. Keep in mind that these scripts deal with schema, but not data.

For example, consider a migration that merges first_name and last_name database columns to a single column called name. The tools from a typical Java persistence framework won't help the database administrator because those tools deal with only part of the problem: the changes in the schema. When you make this schema change, you also need to be able to deal with the existing data. When it's time to deploy a new release of this hypothetical application, the database administrator usually must manually create SQL scripts to do the following:

  • Create a new column called name.
  • Capture the data from first_name and last_name into the new column.
  • Drop the first_name and last_name columns.

And if the code version causing the schema change is in any way deficient, the change must often be rolled back by hand. Only rare teams have the discipline to integrate and automate changes across the model, schema, and data.


Rails migrations basics

In Rails, all schema changes -- including the schema's initial creation -- occur in a migration. Each change in the database schema has its own migration object, encapsulating a move up and a move down. Listing 1 shows an empty migration:

Listing 1. An empty migration
class EmptyMigration < ActiveRecord::Migration
  def self.up
  end

  def self.down
  end
end

I'll show you how to invoke migrations soon, but for now, look at the structure of the migration in Listing 1. Within this migration's up method, you'd place all of the code necessary to make one logical database change. You'd also capture any change to undo any schema changes. By encapsulating up and down, Rails development and production tools can automate the process of deploying and backing out any change involving persistent object models. These database changes can include:

By encapsulating up and down, Rails development and production tools can automate the process of deploying and backing out any change involving persistent object models.

  • Adding or dropping any new tables.
  • Adding or dropping any new columns.
  • Changing the database in other ways, including adding, dropping, or modifying indexes or other constraints.
  • Modifying database data.

By allowing changes to data, migrations make it much easier to synchronize changes in data and schema that often come together. For example, you might add a new lookup table associating each state and its two-digit ZIP code. Within the migration, you can populate the database table, perhaps by invoking a SQL script or loading a fixture. If your migrations are correct, each migration leaves your database in a consistent state, with no manual intervention.

The filename for each migration begins with a unique number. This convention lets Rails associate a strong ordering of migrations. With this strategy, you can move up or down to any logical database schema state.


Using migrations

To use migrations, you only need a Rails project and a database. If you want to follow along in code, install a relational database manager, Ruby, and Rails version 1.1 or greater. You're ready to get started. Follow these steps to create your database-backed Rails project:

  1. Create a Rails project called blog by typing rails blog.
  2. Create a database called blog_development. Using MySQL, I simply type create database blog_development from the MySQL command line.
  3. Configure the database as necessary by config/database.yml, adding your database login ID and password.

To see how the numbering works, generate a migration:

  1. From the blog directory, type ruby script/generate migration create_blog. (If you're running on Unix, you can omit the ruby. I'll omit it from now on.)
  2. Type script/generate migration create_user. (Take a look at the files in blog/db/migrate. You'll see two sequentially numbered files. The migrations generator is managing the numbers for you. )
  3. Delete the migration called 001_create_blog.rb, and create it again with script/generate migration create_blog. You'll notice that the new migration created as 003_create_blog.rb, as in Listing 2:
Listing 2. Generating migrations
> cd blog
> script/generate migration create_blog
      create  db/migrate
      create  db/migrate/001_create_blog.rb
> script/generate migration create_user
      exists  db/migrate
      create  db/migrate/002_create_user.rb
> ls db/migrate/  
001_create_blog.rb      002_create_user.rb
> rm db/migrate/001_create_blog.rb 
> script/generate migration create_blog
      exists  db/migrate
      create  db/migrate/003_create_blog.rb
> ls db/migrate/
002_create_user.rb      003_create_blog.rb

You can see the numerical prefix for each migration. Any new migration gets the maximum prefix in the directory, plus one. This strategy ensures that migrations are regenerated -- and subsequently executed -- in order. Any migration building on the contents of others, such as a migration adding a column to a table created by another migration, remains consistent. The numbering mechanism is simple, intuitive, and consistent.

To see how the migrations work in the database, delete all of the migrations in the db/migrations directory. Generate a model object for Article and an empty migration like the one in Listing 1 by typing script/generate model Article. Notice that Rails generates your model objects and migrations for each article. Edit db/migrate/001_create_articles.rb to look like Listing 3:

Listing 3. The migration for CreateArticles
class CreateArticles < ActiveRecord::Migration
  def self.up
    create_table :articles do |t|
      t.column :name, :string, :limit => 80
      t.column :author, :string, :limit => 40
      t.column :body, :text
      t.column :created_on, :datetime
    end
  end

  def self.down
    drop_table :articles
  end
end

Migrating up and down

To see exactly what the migration does, just run it and look at the database. From the blog directory, type rake migrate. (rake is the Ruby equivalent of the C platform's make or the Java platform's ant.) migrate is one rake task.

Next, show the tables in your database. With MySQL, just go to the mysql> command prompt, type use blog_development;, and then show tables; to see the result in Listing 4:

Listing 4. The schema_info table created by Rails migrations
mysql> show tables;
+----------------------------+
| Tables_in_blog_development |
+----------------------------+
| articles                   |
| schema_info                |
+----------------------------+
2 rows in set (0.00 sec)

mysql> select * from schema_info;
+---------+
| version |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)

Notice the second table: schema_info. My migration specified the articles table, but the rake migrate command created schema_info automatically. Execute select * from schema_info.

When you run rake migrate with no parameters, you are asking Rails to run all migrations that have yet to be applied. Rails does all of the following:

  • Creates the schema_info table if it doesn't exist.
  • Inserts a row in schema_info with a value of 0 if no row exists.
  • Runs the up methods of all migrations having a number greater than the current migration. rake determines the number of the current migration by reading the value of the version column in the schema_info table. rake runs up migrations from least to greatest, and down migrations in reverse. 

To migrate down, just run rake migrate with a version number. Migrating down can destroy data, so be careful. Some operations, such as dropping tables or columns, also destroy data. Listing 5 shows the results of migrating down and then back up. You can see schema_info faithfully track the current version number. This approach does an excellent job of letting you move smoothly between schemas that represent different stages of development.

Listing 5. Migrating down
> rake migrate VERSION=0
(in /Users/batate/rails/blog)
== CreateArticles: reverting ==================================================
-- drop_table(:articles)
   -> 0.1320s
== CreateArticles: reverted (0.1322s) =========================================

> mysql -u root blog_development;
mysql> show tables;
+----------------------------+
| Tables_in_blog_development |
+----------------------------+
| schema_info                |
+----------------------------+
1 row in set (0.00 sec)

mysql> select * from schema_info;
+---------+
| version |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

mysql> exit
Bye
> rake migrate
(in /Users/batate/rails/blog)
== CreateArticles: migrating ==================================================
-- create_table(:articles)
   -> 0.0879s
== CreateArticles: migrated (0.0881s) =========================================

> mysql -u root blog_development;
mysql> select * from schema_info;
+---------+
| version |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)

It's time to crack open the table itself. Look back at Listing 3 and at the table definition. With MySQL, you can execute the show create table articles; command, which produces the result in Listing 6:

Listing 6. Table definition for articles
mysql> show create table articles;
+----------+...-----------------+
| Table    | Create Table |
+----------+...-----------------+
| articles | CREATE TABLE 'articles' (
  'id' int(11) NOT NULL auto_increment,
  'name' varchar(80) default NULL,
  'author' varchar(40) default NULL,
  'body' text,
  'created_on' datetime default NULL,
  PRIMARY KEY  ('id')
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+----------+...-----------------+
1 row in set (0.00 sec)

You can see that much of this table definition came directly from the migration. One of the core advantages of Rails migrations is that you don't need to use direct SQL syntax to create your tables. You handle each schema modification in Ruby, so the resulting SQL is database independent. But notice the id column. Though you did not specify that column, the Rails migration created it for you anyway, with auto_increment and NOT NULL. The id column with this particular column definition follows Rails conventions for an identifier column. If you wanted to create this column without an id, your migration would simply append the :id option, like the migration in Listing 7:

Listing 7. Creating a table without an id column
  def up
    create_table :articles, :id => false do |t| 
      ...
    end
  end

You've taken a deep dive into a single migration but still haven't created a single change in the schema. It's time to create another table, this time for comments. Generate a model called Comment by typing script/generate model Comment. Edit the resulting migration in db/migrate/002_create_comments.rb to look like Listing 8. You're going to need a new table with a few columns, and you also will take advantage of the Rails capability for adding non-null columns and default values.

Listing 8. A second migration, for comments
class CreateComments < ActiveRecord::Migration
  def self.up
    create_table :comments do |t|
      t.column :name, :string, :limit => 40, :null => false
      t.column :body, :text
      t.column :author, :string, :limit => 40, :default => 'Anonymous coward'
      t.column :article_id, :integer
    end
  end

  def self.down
    drop_table :comments
  end
end

Go ahead and run this migration. If you ever get an error while a migration is in progress, just keep in mind how migrations work. You need to check the value of the row in schema_info and look at the state of the database. You might need to drop some tables manually or change the value of the row in schema_info after you correct your code. Remember, nothing magic is happening. Rails runs the up methods on all migrations that haven't been run yet. If you're adding a table or column that already exists, that operation will fail, so you need to make sure your migration is in a consistent state. For now, run rake migrate. Listing 9 shows the result:

Listing 9. Running a second migration
> rake migrate(in /Users/batate/rails/blog)
== CreateComments: migrating ==================================================
-- create_table(:comments)
   -> 0.0700s
== CreateComments: migrated (0.0702s) =========================================

> mysql -u root blog_development;
mysql> select * from schema_info;
+---------+
| version |
+---------+
|       2 |
+---------+
1 row in set (0.00 sec)

Migrations can handle many different kinds of schema changes. You can add and drop indexes; alter tables by dropping, renaming, or adding columns; and even drop down into SQL as necessary. You can do anything from a migration that you can do from SQL. Rails has wrappers around the most common operations, including:

  • Create a table (create_table)
  • Drop a table (drop_table)
  • Add a column to a table (add_column)
  • Remove a column from a table (remove_column)
  • Rename a column (rename_column)
  • Change a column (change_column)
  • Create an index (create_index)
  • Drop an index (drop_index)

Some migrations change more than one column to consolidate a single logical change in a database. Consider a migration that would add a top-level blog, with articles belonging to the blog. You'd need to create a new table and also add a new column representing the foreign key to each article to point to a blog. Listing 10 shows the full migration. You can run the migration by typing rake migrate.

Listing 10. A migration creating a table and adding a column
class CreateBlogs < ActiveRecord::Migration
  def self.up
    create_table :blogs do |t|
      t.column :name, :string, :limit => 40;
    end
    add_column "articles", "blog_id", :integer
  end

  def self.down
    drop_table :blogs
    remove_column "articles", "blog_id"
  end
end

Data too

You can do anything from a migration that you can do from SQL.

So far, I've focused only on changes in schema, but changes in data are important too. Some database changes require changes in data as well as schema, and some of those data changes require logical changes. For example, say you want to create a new comment on each blog article to indicate the article is open for comments. If you implement the change after your blog has already been open for a while, you'll want to add a new comment only to articles that don't already have comments. You can easily make this change in a migration because the migration has access to model objects and can make logical decisions based on the model's state. Type script/generate migration add_open_for_comments. You need to make changes to the comment to capture the belongs_to relationship and write the new migration. Listing 11 shows the model object and new migration:

Listing 11. Model object and new migration
class AddOpenForComments < ActiveRecord::Migration
  def self.up
    Article.find_all.each do |article|
      if article.comments.size == 0
        Comment.new do |comment|
          comment.name = 'Welcome.'
          comment.body = "Article '#{article.name}' is open for comments."
          article.comments << comment
          comment.save
          article.save
        end
      end
    end
  end

  def self.down
  end
end

For the migration in Listing 11, you make a tactical decision. You decide that your users would not like to see welcome messages disappear after they've been added, so you choose not to delete any records within the down migration. The ability to address changes in data within a migration is a tremendous luxury. You can synchronize changes in data and schema. You can also address data changes that involve logical operations on your model objects.

I've shown you most of what's available in migrations. You've got a few other tools at your disposal. If you want to start using migrations with an existing database, you can snapshot your existing schema with rake schema_dump. This rake task creates a Ruby schema with the correct migrations syntax in db/schema.rb. You can then generate a migration and copy the schema you dumped into a migration. (See Resources for more details.) I've also not talked about test fixtures, which can be instrumental in setting up test data or seeding a database. See my earlier Crossing borders article on unit tests for more details.


A final comparison

The migrations story in Java programming is not nearly as robust. Some products have spot solutions for some of the schema migrations problems, but without a systemic process for coordinating schema changes -- both up and down -- dealing with changes in the data and object model can be a difficult task. The Rails solution has some core advantages:

  • Rails migrations are DRY (don't repeat yourself). With Rails, you specify each column definition exactly once: in the migration. Some other mappers force you to specify a column six times: in the schema, the getter, the setter, the model's instance variable, the "from" mapping, and the "to" mapping.
  • Rails migrations allow data migration as well as schema migration.
  • Rails migrations let you use model logic with your data migration where SQL scripts don't.
  • Rails migrations are database independent, but SQL scripts aren't.
  • Rails migrations allow direct SQL for unsupported extensions (such as stored procedures or constraints), while some ORM mappers don't.

With all of the benefits of migrations, you'd expect a complex piece of code, but they are incredibly simple. Migrations have meaningful names and version numbers. Each migration, has an up and down method. Finally, a rake task coordinates running them in the correct order. This simple strategy is also revolutionary. The idea of expressing each schema change not in the model but as a distinct migration is both elegant and effective. Coordinating both data and schema changes is another paradigm shift, and an effective one. Best of all, these ideas are completely language agnostic. If you're building a new Java wrapping framework, you'd do well to consider migrations.

In the next article in the series, you'll get a sneak peek at a new Ajax and Web services enabled framework that takes full advantage of metaprogramming in Rails. Until then, open your mind and keep crossing borders.

Resources

Learn

  • Beyond Java (Buce Tate, O'Reilly, 2005): The author's book about the Java language's rise and plateau and the technologies that could challenge the Java platform in some niches.
  • ActiveRecord::Migration: The Rails API documentation for ActiveRecord::Migration is a good place to go for the latest capabilities of migrations.
  • "Book review: Agile Web Development with Rails" (Darren Torpey, developerWorks, May 2005): Get the scoop on a book that deepens readers' understanding of Rails and the rationale behind agile development approaches.
  • Understanding Migrations: The Rails Wiki has a good overview of migrations and is the most up-to-date source of information outside of code.
  • From Java To Ruby: Things Every Manager Should Know (Bruce Tate, Pragmatic Bookshelf, 2006): The author's book about when and where it makes sense to make a switch from Java programming to Ruby on Rails, and how to make it.
  • Programming Ruby (Dave Thomas et al., Pragmatic Bookshelf, 2005): A popular book on Ruby programming.
  • The Java technology zone: Hundreds of articles about every aspect of Java programming.

Get products and technologies

  • Ruby on Rails: Download the open source Ruby on Rails Web framework.
  • Ruby: Get Ruby from the project Web site.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Java technology on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology, Web development
ArticleID=154158
ArticleTitle=Crossing borders: Rails migrations
publish-date=08152006