Contents


Real world Rails, Part 3

Optimizing ActiveRecord

Solving common performance problems

Comments

Content series:

This content is part # of # in the series: Real world Rails, Part 3

Stay tuned for additional content in this series.

This content is part of the series:Real world Rails, Part 3

Stay tuned for additional content in this series.

Ruby on Rails programming will spoil you. The growing framework will spare you from the tedium that's rampant in other frameworks. You'll express your ideas in a fraction of the lines of code that you're used to writing. And you'll get to use ActiveRecord.

As a long-time Java™ programmer, ActiveRecord was somewhat alien to me. With Java frameworks, I would normally build a map between independent models and schemas. Frameworks like this are mapping frameworks. With ActiveRecord, I define only the database schema, either in SQL or in a Ruby class called a migration. Frameworks that base the object model design on the structure of the database are called wrapping frameworks. But unlike most wrapping frameworks, Rails can discover the features of the object model by querying the database table. Instead of building complex queries, I can use the model to traverse relationships in Ruby instead of SQL. I get the simplicity of a wrapping framework with much of the power of a mapping framework. ActiveRecord is easy to use and easy to extend. Sometimes, it's too easy.

Like any database framework, ActiveRecord lets me do many things that get me into trouble. I can fetch too many columns, or leave off important structural database features, like indexes or null constraints. I'm not saying that ActiveRecord is a bad framework. You just need to know how to harden your application if you need it to scale. In this article, I'll walk you through some of the important optimizations that you might need with the Rails unorthodox persistence framework.

Managing the basics

Generating a schema-backed model is as easy as generating a little code, with script/generate model model_name. As you know, that command generates your model, migration, unit test, and even a default fixture. It's tempting to fill in a few data columns in the migration, and enter a little test data, write a few tests, add a few validations, and call it done. But be careful. You should also consider your overall database design. Keep these things in mind:

  • Rails will not insulate you from basic database performance issues. Your database needs information, often in the form of indexes, to perform well.
  • Rails will not insulate you from data integrity problems. Though most Rails developers do not like to keep constraints in the database, you should consider things like nullable columns.
  • Rails has convenient defaults for many elements. Sometimes, default attributes like the length of text fields are too large for most practical applications.
  • Rails will not force you to create an efficient database design.

Before you trudge ahead and dive deep into ActiveRecord, you should make sure that you have a strong foundation. Make sure that your index structure will work for you. If a given table will be large, if you'll be searching on a column other than id, and if an index will help you (see your database manager documentation for details -- different databases use indexes in different ways), make sure you create your index. You need not drop down into SQL to create an index -- you can simply use a migration. You can easily create an index with your create_table migration, or create an additional migration that creates your index. Here's an example of a migration that creates an index that we use for ChangingThePresent.org (see Related topics):

Listing 1. Creating an index in a migration
class AddIndexesToUsers < ActiveRecord::Migration
  def self.up
    add_index :members, :login
    add_index :members, :email
    add_index :members, :first_name
    add_index :members, :last_name
  end

  def self.down
    remove_index :members, :login
    remove_index :members, :email
    remove_index :members, :first_name
    remove_index :members, :last_name
  end
end

ActiveRecord will take care of the index on id, so I explicitly add indexes that I use in various searches, because the table is large, infrequently updated, and frequently searched. Often, we will wait until we measure a problem in a given query before we take action. This strategy keeps us from second-guessing the database engine. But in the case of users, we know that the table will quickly grow into the millions of users, and will be ineffective without an index on frequently searched columns.

Two more common problems also relate to migrations. If you have strings and columns that should not be null, make sure you code your migration appropriately. Most DBAs (database administrators) might think that Rails has the wrong default for null columns: By default, columns can be null. If you want to create a column that cannot be null, you must explicitly add the parameter :null => false. And if you have a string column, make sure that you code an appropriate limit. By default, Rails migrations will encode a string column as a varchar(255). Normally, that's too large. You should do your best to maintain a database structure that reflects your application. Rather than have an unrestricted login, if your application limits logins to 10 characters, you should code your database appropriately, as in Listing 2:

Listing 2. Coding migrations with limits and non-nullable columns
t.column :login, :string, :limit => 10, :null => false

You should also consider default values, and any other information that you can safely provide. With a little up-front work, you can save yourself a whole lot of time chasing data integrity problems later. While you are considering your database basics, also think about which pages are static, and thus easy to cache. Given a choice between optimizing a query and caching a page, caching the page will give you a far greater return, if you can stomach the complexity. Sometimes, pages or fragments are purely static, such as a list of states, or frequently asked questions. In those cases, caching is a slam dunk. Other times, you may decide to limit your complexity, and attack database performance instead. For ChangingThePresent, we've done both, depending on the problem and the circumstance. If you should decide to attack query performance, read on.

N+1 problem

By default, ActiveRecord relationships are lazy. That means the framework will wait to access a relationship until you actually access it. Take, for example, a member that has an address. You can open the console and type the command: member = Member.find 1. You'll see the following appended to your log, as in Listing 3:

Listing 3. Log from Member.find(1)
^[[4;35;1mMember Columns (0.006198)^[[0m   ^[[0mSHOW FIELDS FROM members^[[0m
^[[4;36;1mMember Load (0.002835)^[[0m   ^[[0;1mSELECT * FROM members WHERE
 (members.`id` = 1) ^[[0m

Member has a relationship to an address, that was defined with the macro has_one :address, :as => :addressable, :dependent => :destroy. Notice that you don't see an address field in the log when ActiveRecord loaded Member. But if you type member.address in the console, you'll see the contents of Listing 4 in development.log:

Listing 4. Accessing a relation forces a database access
  ^[[36;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:98:in `find'^[[0m
^[[4;35;1mAddress Load (0.252084)^[[0m   ^[[0mSELECT * FROM addresses WHERE
 (addresses.addressable_id = 1 AND addresses.addressable_type = 'Member') LIMIT 1^[[0m
  ^[[35;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:98:in `find'^[[0m

So ActiveRecord does not execute the query for the address relationship until you actually access member.address. Normally, this lazy design works well, because the persistence framework does not need to move as much data to load a member. But assume you wanted to access a list of members, and all of their addresses, as in Listing 5:

Listing 5. Retrieving multiple members with addresses
Member.find([1,2,3]).each {|member| puts member.address.city}

Since you should see a query for each of the addresses, the results will not be pretty, in terms of performance. Listing 6 tells the story:

Listing 6. Queries for the N+1 problem
^[[4;36;1mMember Load (0.004063)^[[0m   ^[[0;1mSELECT * FROM members WHERE
 (members.`id` IN (1,2,3)) ^[[0m
  ^[[36;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:98:in `find'^[[0m
^[[4;35;1mAddress Load (0.000989)^[[0m   ^[[0mSELECT * FROM addresses WHERE
 (addresses.addressable_id = 1 AND addresses.addressable_type = 'Member') LIMIT 1^[[0m
  ^[[35;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:98:in `find'^[[0m
^[[4;36;1mAddress Columns (0.073840)^[[0m   ^[[0;1mSHOW FIELDS FROM addresses^[[0m
^[[4;35;1mAddress Load (0.002012)^[[0m   ^[[0mSELECT * FROM addresses WHERE
 (addresses.addressable_id = 2 AND addresses.addressable_type = 'Member') LIMIT 1^[[0m
  ^[[35;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:98:in `find'^[[0m
^[[4;36;1mAddress Load (0.000792)^[[0m   ^[[0;1mSELECT * FROM addresses WHERE
 (addresses.addressable_id = 3 AND addresses.addressable_type = 'Member') LIMIT 1^[[0m
  ^[[36;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:98:in `find'^[[0m

The results are as ugly as I promised they would be. You get one query for all of the members, and another for each of the addresses. We retrieved three members, and you got four queries. N members; N+1 queries. This problem is the dreaded N+1 problem. Most persistence frameworks solve this problem with eager associations. Rails is no exception. If you know that you will need to access a relationship, you can opt to include it with your initial query. ActiveRecord uses the :includeoption for this purpose. If you changed the query to Member.find([1,2,3], :include => :address).each {|member| puts member.address.city}, you'll get a much better result:

Listing 7. Solving the N+1 problem
^[[4;35;1mMember Load Including Associations (0.004458)^[[0m   ^[
   [0mSELECT members.`id` AS t0_r0, members.`type` AS t0_r1,
   members.`about_me` AS t0_r2, members.`about_philanthropy`

   ...

   addresses.`id` AS t1_r0, addresses.`address1` AS t1_r1,
   addresses.`address2` AS t1_r2, addresses.`city` AS t1_r3,

   ...

   addresses.`addressable_id` AS t1_r8 FROM members
   LEFT OUTER JOIN addresses ON addresses.addressable_id
   = members.id AND addresses.addressable_type =
   'Member' WHERE (members.`id` IN (1,2,3)) ^[
   [0m
 ^[[35;2m./vendor/plugins/paginating_find/lib/paginating_find.rb:
  98:in `find'^[[0m

That query will be much faster. You see one query that retrieves all of the members and addresses. That's how eager associations work.

With ActiveRecord, you can also nest the :include option, but only one level deep. For example, consider a Member that has many contacts, and a Contact that has one address. If you wanted to show all of the cities for a member's contacts, you could use the code in Listing 8:

Listing 8: Getting the cities for a member's contacts
member = Member.find(1)
member.contacts.each {|contact| puts contact.address.city}

That code would work, but you'd have to query for the member, each contact, and each contact's address. You can improve the performance a little by eagerly including :contacts with :include => :contacts. You can do better by including both, as in Listing 9:

Listing 9: Getting the cities for a member's contacts
member = Member.find(1)
member.contacts.each {|contact| puts contact.address.city}

You can do much better by using a nested include option:

member = Member.find(1, :include => {:contacts => :address})
member.contacts.each {|contact| puts contact.address.city}

That nested include tells Rails to eagerly include both the contacts and address relationships. You can use the eager loading technique whenever you know that you will use relationships in a given query. That technique is the performance optimization technique that we use for ChangingThePresent.org most frequently, but it does have limitations. When you have to join across more than two tables, you're better off dropping into SQL. If you need to do reporting, you're almost always better off simply grabbing the database connection and bypassing ActiveRecord altogether with ActiveRecord::Base.execute("SELECT * FROM..."). Generally, eager associations will be more than enough. Now I'll shift gears and look at another bugaboo for Rails developers: inheritance.

Inheritance and Rails

When most Rails developers encounter Rails for the first time, they are captivated. It's just so easy. You simply create a type column on your database table and inherit any subclass from the parent. Rails will take care of the rest. For example, you might have a table called Customer that inherits from a class called Person. A customer has all of the columns of Person, plus a loyalty number and an order history. Listing 10 shows the beauty in the simplicity of the solution. The master table has all of the columns of the parent and all subclasses.

Listing 10. Implementing inheritance
create_table "people" do |t|
  t.column "type", :string
  t.column "first_name", :string
  t.column "last_name", :string
  t.column "loyalty_number", :string
end

class Person < ActiveRecord::Base
end

class Customer < Person
  has_many :orders
end

Such a solution works well in most ways. The code is simple, and not repetitive. The queries are simple with good performance because you need not do any joins to access multiple subclasses, and ActiveRecord can use the type column to determine which records to return.

In some ways, though, ActiveRecord inheritance is fairly limited. If you have an inheritance hierarchy that is too broad, inheritance will break down. For example, at ChangingThePresent, we have several types of content that each have names, short and long descriptions, some common presentation attributes, and several custom attributes. We would like causes, nonprofits, gifts, members, drives, registries, and many other types of objects to inherit from a common base class so we could treat all types of content in the same way. We can't, because the Rails model would have the meat of our object model in a single table, and that's not a realistic solution.

Exploring alternatives

We experimented with three solutions to this problem. One, we would have each proper class in its own table, and use views to build a common table for content. We threw this solution out early, because Rails does not deal with database views very well.

Our second solution was to use simple polymorphism. With this strategy, each proper subclass has its own table. We push common columns down into each table. For example, say I need a superclass called Content that has only a name property with Gift, Cause, and Nonprofit subclasses. Gift, Nonprofit, and Cause would all have a name property. Since Ruby is dynamically typed, they need not inherit from a common base class. They need only respond to the same set of methods. ChangingThePresent uses polymorphism in several places to provide common behavior, particularly when we are dealing with images.

The third alternative is to provide a common capability, but using associations instead of inheritance. ActiveRecord has a feature called polymorphic associations that is ideal for attaching common behavior to a class without inheritance. You saw an example of a polymorphic association earlier in Address. I can use the same technique to attach my common attributes for content management instead of inheritance. Consider a class called ContentBase. Normally, to associate that class to another class, you would use a has_one relationship and a simple foreign key. But you would probably want your ContentBase to work with more than one class. You need a foreign key, and also a column that defines the type of the target class. That's exactly how ActiveRecord polymorphic associations work. Look at the classes in Listing 11.

Listing 11. Both sides of a relationship for site content
class Cause < ActiveRecord::Base
  has_one :content_base, :as => :displayable, :dependent => :destroy
  ...
end

class Nonprofit < ActiveRecord::Base
  has_one :content_base, :as => :displayable, :dependent => :destroy
  ...
end


class ContentBase < ActiveRecord::Base
  belongs_to :displayable, :polymorphic => true
end

Normally, a belongs_to relationship is with only one class, but the relationship in ContentBase is polymorphic. The foreign key has not only an identifier to identify a record, but also a type to identify the table. Using this technique, I get most of the benefits of inheritance. The common functionality is all in a single class. But I also get a few side benefits. I don't have to have all of the columns in Cause and Nonprofit all in a single table.

Some database administrators do not like polymorphic associations because they do not use true foreign keys, but for ChangingThePresent, we freely use them. In truth, the data model is not as beautiful in theory as it might be. You can't use database features such as referential integrity, and you can't rely on tools to discover the relationships based on the names of your columns. The advantages of a clean and simple object model, to us, outweigh the problems of the approach.

create_table "content_bases", :force => true do |t|
  t.column "short_description",          :string

  ...

  t.column "displayable_type", :string
  t.column "displayable_id",   :integer
end

Wrapping up

ActiveRecord is a perfectly capable persistence framework. You can build scalable, reliable systems with it, but as with any database framework, you have to pay attention to the SQL that your framework generates. When you have the occasional problem, you have to adjust your approach. Keeping up with your indexes, using eager loading with include, and using polymorphic associations in places instead of inheritance are just three ways you can use to improve your code base. Next month, I'll walk you through another example of how we write real world Rails.


Downloadable resources


Related topics


Comments

Sign in or register to add and subscribe to comments.

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=240963
ArticleTitle=Real world Rails, Part 3: Optimizing ActiveRecord
publish-date=07172007