As an avid cyclist, I'm aware of two serious communities: mountain bikers and road bikers. Conventional wisdom has it that mountain biking is far more dangerous, but I disagree. Road bikers must consider much more serious obstacles than rocks or trees: cars. A similar disagreement is developing between supporters of the two persistence strategies for object-oriented application development.
Currently, persistence frameworks use one of two approaches: mapping or wrapping. Mapping solutions let you create independent database schemas and object models, and then use a layer of software to manage differences between the two. Mapping solutions seek to build an object model that closely resembles the structure of the database schema. In contrast, wrapping solutions use objects as wrappers around database tables and rows to manipulate data in a database. Conventional wisdom is that a mapping solution is often more flexible once the solution has been released into the wild because mapping software can better deal with changes in the schema or object model. But that wisdom ignores the most important part of the equation: data. To manage any application change involving a persistent domain model effectively, you must coordinate changes in the data, schema, and model. Most project teams don't get it right.
Development teams usually handle schema changes by generating a new version of a schema from scratch, with SQL scripts. A script might drop all tables and add all of the tables again. Such a strategy destroys all test data and therefore is worthless for production scenarios. Occasionally, tools might create scripts that generate delta schemas, or schemas that use SQL commands such as alter_table to modify a previous version of a schema. But very few teams bother to create scripts that undo schema changes, and fewer still manage to create automated scripts to deal with changes in data. In short, traditional mapping strategies ignore the cars in the road: reversing bad schema changes and handling data.
This article takes a deeper look at Ruby on Rails migrations -- Rails's solution for dealing with changes to a production database. Migrations combine power and simplicity to coordinate both schema changes and data changes using a wrapping approach. (If you've not seen Active Record -- the underlying persistence layer in Rails -- before, I recommend that you take a look first at this earlier Rails article in the Crossing borders series.)
Handling schema changes in Java programming
Mapping-based frameworks require a schema, a model, and a map. Such an architecture can be repetitive. Think of how many times you need to specify a given attribute:
- The getter in the model
- The setter in the model
- The instance variable in the model
- The "to" side of the mapping
- The "from" side of the mapping
- The column definition
To be fair, Java frameworks such as Hibernate shield you from much of the repetition by the liberal use of code generation. While object-relational mappers let you deal with legacy schemas, for new database schemas you can generate your schema directly from your model with Hibernate-provided tools and generate your getters and setters with your chosen IDE. You can embed your mapping into your domain model -- partially defeating the purpose of having a map in the first place, in my opinion -- with Java annotations. This code-generation technique also serves another purpose: schema migration. Some of these code-generation tools can detect differences between your new domain model and the old schema and generate SQL scripts to bridge those differences. Keep in mind that these scripts deal with schema, but not data.
For example, consider a migration that merges first_name and last_name database columns to a single column called name. The tools from a typical Java persistence framework won't help the database administrator because those tools deal with only part of the problem: the changes in the schema. When you make this schema change, you also need to be able to deal with the existing data. When it's time to deploy a new release of this hypothetical application, the database administrator usually must manually create SQL scripts to do the following:
- Create a new column called
name. - Capture the data from
first_nameandlast_nameinto the new column. - Drop the
first_nameandlast_namecolumns.
And if the code version causing the schema change is in any way deficient, the change must often be rolled back by hand. Only rare teams have the discipline to integrate and automate changes across the model, schema, and data.
In Rails, all schema changes -- including the schema's initial creation -- occur in a migration. Each change in the database schema has its own migration object, encapsulating a move up and a move down. Listing 1 shows an empty migration:
Listing 1. An empty migration
class EmptyMigration < ActiveRecord::Migration def self.up end def self.down end end |
I'll show you how to invoke migrations soon, but for now, look at the structure of the migration in Listing 1. Within this migration's up method, you'd place all of the code necessary to make one logical database change. You'd also capture any change to undo any schema changes. By encapsulating up and down, Rails development and production tools can automate the process of deploying and backing out any change involving persistent object models. These database changes can include:
- Adding or dropping any new tables.
- Adding or dropping any new columns.
- Changing the database in other ways, including adding, dropping, or modifying indexes or other constraints.
- Modifying database data.
By allowing changes to data, migrations make it much easier to synchronize changes in data and schema that often come together. For example, you might add a new lookup table associating each state and its two-digit ZIP code. Within the migration, you can populate the database table, perhaps by invoking a SQL script or loading a fixture. If your migrations are correct, each migration leaves your database in a consistent state, with no manual intervention.
The filename for each migration begins with a unique number. This convention lets Rails associate a strong ordering of migrations. With this strategy, you can move up or down to any logical database schema state.
To use migrations, you only need a Rails project and a database. If you want to follow along in code, install a relational database manager, Ruby, and Rails version 1.1 or greater. You're ready to get started. Follow these steps to create your database-backed Rails project:
- Create a Rails project called
blogby typingrails blog. - Create a database called
blog_development. Using MySQL, I simply typecreate database blog_developmentfrom the MySQL command line. - Configure the database as necessary by config/database.yml, adding your database login ID and password.
To see how the numbering works, generate a migration:
- From the blog directory, type
ruby script/generate migration create_blog. (If you're running on Unix, you can omit theruby. I'll omit it from now on.) - Type
script/generate migration create_user. (Take a look at the files in blog/db/migrate. You'll see two sequentially numbered files. The migrations generator is managing the numbers for you. ) - Delete the migration called 001_create_blog.rb, and create it again with
script/generate migration create_blog. You'll notice that the new migration created as 003_create_blog.rb, as in Listing 2:
Listing 2. Generating migrations
> cd blog
> script/generate migration create_blog
create db/migrate
create db/migrate/001_create_blog.rb
> script/generate migration create_user
exists db/migrate
create db/migrate/002_create_user.rb
> ls db/migrate/
001_create_blog.rb 002_create_user.rb
> rm db/migrate/001_create_blog.rb
> script/generate migration create_blog
exists db/migrate
create db/migrate/003_create_blog.rb
> ls db/migrate/
002_create_user.rb 003_create_blog.rb
|
You can see the numerical prefix for each migration. Any new migration gets the maximum prefix in the directory, plus one. This strategy ensures that migrations are regenerated -- and subsequently executed -- in order. Any migration building on the contents of others, such as a migration adding a column to a table created by another migration, remains consistent. The numbering mechanism is simple, intuitive, and consistent.
To see how the migrations work in the database, delete all of the migrations in the db/migrations directory. Generate a model object for Article and an empty migration like the one in Listing 1 by typing script/generate model Article. Notice that Rails generates your model objects and migrations for each article. Edit db/migrate/001_create_articles.rb to look like Listing 3:
Listing 3. The migration for CreateArticles
class CreateArticles < ActiveRecord::Migration
def self.up
create_table :articles do |t|
t.column :name, :string, :limit => 80
t.column :author, :string, :limit => 40
t.column :body, :text
t.column :created_on, :datetime
end
end
def self.down
drop_table :articles
end
end
|
To see exactly what the migration does, just run it and look at the database. From the blog directory, type rake migrate. (rake is the Ruby equivalent of the C platform's make or the Java platform's ant.) migrate is one rake task.
Next, show the tables in your database. With MySQL, just go to the mysql> command prompt, type use blog_development;, and then show tables; to see the result in Listing 4:
Listing 4. The schema_info table created by Rails migrations
mysql> show tables; +----------------------------+ | Tables_in_blog_development | +----------------------------+ | articles | | schema_info | +----------------------------+ 2 rows in set (0.00 sec) mysql> select * from schema_info; +---------+ | version | +---------+ | 1 | +---------+ 1 row in set (0.00 sec) |
Notice the second table: schema_info. My migration specified the articles table, but the rake migrate command created schema_info automatically. Execute select * from schema_info.
When you run rake migrate with no parameters, you are asking Rails to run all migrations that have yet to be applied. Rails does all of the following:
- Creates the
schema_infotable if it doesn't exist. - Inserts a row in
schema_infowith a value of 0 if no row exists. - Runs the
upmethods of all migrations having a number greater than the current migration.rakedetermines the number of the current migration by reading the value of theversioncolumn in theschema_infotable.rakerunsupmigrations from least to greatest, anddownmigrations in reverse.
To migrate down, just run rake migrate with a version number. Migrating down can destroy data, so be careful. Some operations, such as dropping tables or columns, also destroy data. Listing 5 shows the results of migrating down and then back up. You can see schema_info faithfully track the current version number. This approach does an excellent job of letting you move smoothly between schemas that represent different stages of development.
Listing 5. Migrating down
> rake migrate VERSION=0 (in /Users/batate/rails/blog) == CreateArticles: reverting ================================================== -- drop_table(:articles) -> 0.1320s == CreateArticles: reverted (0.1322s) ========================================= > mysql -u root blog_development; mysql> show tables; +----------------------------+ | Tables_in_blog_development | +----------------------------+ | schema_info | +----------------------------+ 1 row in set (0.00 sec) mysql> select * from schema_info; +---------+ | version | +---------+ | 0 | +---------+ 1 row in set (0.00 sec) mysql> exit Bye > rake migrate (in /Users/batate/rails/blog) == CreateArticles: migrating ================================================== -- create_table(:articles) -> 0.0879s == CreateArticles: migrated (0.0881s) ========================================= > mysql -u root blog_development; mysql> select * from schema_info; +---------+ | version | +---------+ | 1 | +---------+ 1 row in set (0.00 sec) |
It's time to crack open the table itself. Look back at Listing 3 and at the table definition. With MySQL, you can execute the show create table articles; command, which produces the result in Listing 6:
Listing 6. Table definition for articles
mysql> show create table articles;
+----------+...-----------------+
| Table | Create Table |
+----------+...-----------------+
| articles | CREATE TABLE 'articles' (
'id' int(11) NOT NULL auto_increment,
'name' varchar(80) default NULL,
'author' varchar(40) default NULL,
'body' text,
'created_on' datetime default NULL,
PRIMARY KEY ('id')
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+----------+...-----------------+
1 row in set (0.00 sec)
|
You can see that much of this table definition came directly from the migration. One of the core advantages of Rails migrations is that you don't need to use direct SQL syntax to create your tables. You handle each schema modification in Ruby, so the resulting SQL is database independent. But notice the id column. Though you did not specify that column, the Rails migration created it for you anyway, with auto_increment and NOT NULL. The id column with this particular column definition follows Rails conventions for an identifier column. If you wanted to create this column without an id, your migration would simply append the :id option, like the migration in Listing 7:
Listing 7. Creating a table without an id column
def up
create_table :articles, :id => false do |t|
...
end
end
|
You've taken a deep dive into a single migration but still haven't created a single change in the schema. It's time to create another table, this time for comments. Generate a model called Comment by typing script/generate model Comment. Edit the resulting migration in db/migrate/002_create_comments.rb to look like Listing 8. You're going to need a new table with a few columns, and you also will take advantage of the Rails capability for adding non-null columns and default values.
Listing 8. A second migration, for comments
class CreateComments < ActiveRecord::Migration
def self.up
create_table :comments do |t|
t.column :name, :string, :limit => 40, :null => false
t.column :body, :text
t.column :author, :string, :limit => 40, :default => 'Anonymous coward'
t.column :article_id, :integer
end
end
def self.down
drop_table :comments
end
end
|
Go ahead and run this migration. If you ever get an error while a migration is in progress, just keep in mind how migrations work. You need to check the value of the row in schema_info and look at the state of the database. You might need to drop some tables manually or change the value of the row in schema_info after you correct your code. Remember, nothing magic is happening. Rails runs the up methods on all migrations that haven't been run yet. If you're adding a table or column that already exists, that operation will fail, so you need to make sure your migration is in a consistent state. For now, run rake migrate. Listing 9 shows the result:
Listing 9. Running a second migration
> rake migrate(in /Users/batate/rails/blog) == CreateComments: migrating ================================================== -- create_table(:comments) -> 0.0700s == CreateComments: migrated (0.0702s) ========================================= > mysql -u root blog_development; mysql> select * from schema_info; +---------+ | version | +---------+ | 2 | +---------+ 1 row in set (0.00 sec) |
Migrations can handle many different kinds of schema changes. You can add and drop indexes; alter tables by dropping, renaming, or adding columns; and even drop down into SQL as necessary. You can do anything from a migration that you can do from SQL. Rails has wrappers around the most common operations, including:
- Create a table (
create_table) - Drop a table (
drop_table) - Add a column to a table (add_column)
- Remove a column from a table (
remove_column) - Rename a column (
rename_column) - Change a column (
change_column) - Create an index (
create_index) - Drop an index (
drop_index)
Some migrations change more than one column to consolidate a single logical change in a database. Consider a migration that would add a top-level blog, with articles belonging to the blog. You'd need to create a new table and also add a new column representing the foreign key to each article to point to a blog. Listing 10 shows the full migration. You can run the migration by typing rake migrate.
Listing 10. A migration creating a table and adding a column
class CreateBlogs < ActiveRecord::Migration
def self.up
create_table :blogs do |t|
t.column :name, :string, :limit => 40;
end
add_column "articles", "blog_id", :integer
end
def self.down
drop_table :blogs
remove_column "articles", "blog_id"
end
end
|
So far, I've focused only on changes in schema, but changes in data are important too. Some database changes require changes in data as well as schema, and some of those data changes require logical changes. For example, say you want to create a new comment on each blog article to indicate the article is open for comments. If you implement the change after your blog has already been open for a while, you'll want to add a new comment only to articles that don't already have comments. You can easily make this change in a migration because the migration has access to model objects and can make logical decisions based on the model's state. Type script/generate migration add_open_for_comments. You need to make changes to the comment to capture the belongs_to relationship and write the new migration. Listing 11 shows the model object and new migration:
Listing 11. Model object and new migration
class AddOpenForComments < ActiveRecord::Migration
def self.up
Article.find_all.each do |article|
if article.comments.size == 0
Comment.new do |comment|
comment.name = 'Welcome.'
comment.body = "Article '#{article.name}' is open for comments."
article.comments << comment
comment.save
article.save
end
end
end
end
def self.down
end
end
|
For the migration in Listing 11, you make a tactical decision. You decide that your users would not like to see welcome messages disappear after they've been added, so you choose not to delete any records within the down migration. The ability to address changes in data within a migration is a tremendous luxury. You can synchronize changes in data and schema. You can also address data changes that involve logical operations on your model objects.
I've shown you most of what's available in migrations. You've got a few other tools at your disposal. If you want to start using migrations with an existing database, you can snapshot your existing schema with rake schema_dump. This rake task creates a Ruby schema with the correct migrations syntax in db/schema.rb. You can then generate a migration and copy the schema you dumped into a migration. (See Resources for more details.) I've also not talked about test fixtures, which can be instrumental in setting up test data or seeding a database. See my earlier Crossing borders article on unit tests for more details.
The migrations story in Java programming is not nearly as robust. Some products have spot solutions for some of the schema migrations problems, but without a systemic process for coordinating schema changes -- both up and down -- dealing with changes in the data and object model can be a difficult task. The Rails solution has some core advantages:
- Rails migrations are DRY (don't repeat yourself). With Rails, you specify each column definition exactly once: in the migration. Some other mappers force you to specify a column six times: in the schema, the getter, the setter, the model's instance variable, the "from" mapping, and the "to" mapping.
- Rails migrations allow data migration as well as schema migration.
- Rails migrations let you use model logic with your data migration where SQL scripts don't.
- Rails migrations are database independent, but SQL scripts aren't.
- Rails migrations allow direct SQL for unsupported extensions (such as stored procedures or constraints), while some ORM mappers don't.
With all of the benefits of migrations, you'd expect a complex piece of code, but they are incredibly simple. Migrations have meaningful names and version numbers. Each migration, has an up and down method. Finally, a rake task coordinates running them in the correct order. This simple strategy is also revolutionary. The idea of expressing each schema change not in the model but as a distinct migration is both elegant and effective. Coordinating both data and schema changes is another paradigm shift, and an effective one. Best of all, these ideas are completely language agnostic. If you're building a new Java wrapping framework, you'd do well to consider migrations.
In the next article in the series, you'll get a sneak peek at a new Ajax and Web services enabled framework that takes full advantage of metaprogramming in Rails. Until then, open your mind and keep crossing borders.
Learn
- Beyond Java (Buce Tate, O'Reilly, 2005): The author's book about the Java language's rise and plateau and the technologies that could challenge the Java platform in some niches.
- ActiveRecord::Migration: The Rails API documentation for ActiveRecord::Migration is a good place to go for the latest capabilities of migrations.
- "Book review: Agile Web Development with Rails" (Darren Torpey, developerWorks, May 2005): Get the scoop on a book that deepens readers' understanding of Rails and the rationale behind agile development approaches.
- Understanding Migrations: The Rails Wiki has a good overview of migrations and is the most up-to-date source of information outside of code.
- From Java To Ruby: Things Every Manager Should Know (Bruce Tate, Pragmatic
Bookshelf, 2006): The author's book about when and where it makes sense to make a switch from Java programming to Ruby on Rails, and how to make it.
- Programming Ruby (Dave Thomas et al., Pragmatic Bookshelf, 2005): A popular book on Ruby programming.
- The Java technology zone: Hundreds of articles about every aspect of Java programming.
Get products and technologies
- Ruby on Rails: Download the open source Ruby on Rails Web framework.
- Ruby: Get Ruby from the project Web site.
Discuss
- developerWorks blogs: Get involved in the developerWorks community.

Bruce Tate is a father, mountain biker, and kayaker in Austin, Texas. He's the author of three best-selling Java books, including the Jolt winner Better, Faster, Lighter Java. He recently released From Java to Ruby. He spent 13 years at IBM and is now the founder of the RapidRed consultancy, where he specializes in lightweight development strategies and architectures based on Java technology and Ruby on Rails. His practice now offers a full range of Ruby and Rails education, consulting, and implementation offerings.
Comments (Undergoing maintenance)





