In the good old days, you implemented an application, and it accessed its data from a database. At a really high level, the architecture was simply two layers: the application and the database it used.
[Figure: Typical application database stack]
A more advanced variation was to have multiple applications sharing a single database (the Shared Database pattern).
In this architecture, the application knows what database stores its data, the schema the data is stored in, and is responsible for transforming the data into the format it needs. This creates a lot of work for the application and makes it very dependent on the details of the database. Worse, several applications using the same data may be repeating the same effort, not only writing duplicate code to access the data but also performing the same transformations redundantly.
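To make the coupling concrete, here is a minimal sketch of two applications sitting directly on one database. Everything in it is hypothetical and invented for illustration: the `CUST` table, its column names, the `'A'` status code, and both function names. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical legacy schema: cryptic column names, inconsistent casing,
# and a one-letter status code the application has to interpret.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUST (CUST_ID INTEGER, FNAME TEXT, LNAME TEXT, STAT TEXT)")
conn.execute("INSERT INTO CUST VALUES (1, 'ada', 'LOVELACE', 'A')")


def billing_app_customer(conn, cust_id):
    # The billing application knows the table, the columns, and the status codes.
    row = conn.execute(
        "SELECT FNAME, LNAME, STAT FROM CUST WHERE CUST_ID = ?", (cust_id,)
    ).fetchone()
    return {"name": f"{row[0].title()} {row[1].title()}", "active": row[2] == "A"}


def support_app_customer(conn, cust_id):
    # The support application repeats the same query and the same transformation.
    row = conn.execute(
        "SELECT FNAME, LNAME, STAT FROM CUST WHERE CUST_ID = ?", (cust_id,)
    ).fetchone()
    return {"name": f"{row[0].title()} {row[1].title()}", "active": row[2] == "A"}
```

If the schema changes, say `STAT` becomes a numeric code, both functions have to be found and fixed separately.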
In modern enterprises, this application-on-a-database approach is becoming increasingly quaint. If every application first has to write a lot of code just to access its data, then applications are much more difficult to write and get working correctly. What is needed is a separation of concerns, where:
- The application is able to assume that the information it needs is easy to access in one consistent format that's exactly what it needs
- An information access layer makes the database look the way the application expects, encapsulating the knowledge needed to access the data and transform it into the desired format
In our really high-level architecture, this separation of concerns creates a third layer between the application and the database, a layer we tend to call information.
The information layer doesn't persist the data; the database still does that. The information layer rationalizes whatever is in the database, producing normalized, cleaned-up, customized data for the application.
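One way to picture this layer is as a small class that owns the query and the transformation, so the application only ever sees the cleaned-up result. This is a sketch under assumptions, not a prescribed design: the `Customer` type, the `CustomerInformation` class, and the `CUST` schema are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """The format the application wants (hypothetical)."""
    customer_id: int
    name: str
    active: bool


class CustomerInformation:
    """Hypothetical information layer: it persists nothing itself and only
    hides where the data lives and how it is transformed."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_customer(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT FNAME, LNAME, STAT FROM CUST WHERE CUST_ID = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        first, last, status = row
        # Rationalize: fix the casing and decode the status flag.
        return Customer(customer_id, f"{first.title()} {last.title()}", status == "A")


def make_demo_database() -> sqlite3.Connection:
    # Stand-in for the real database, with an invented legacy schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUST (CUST_ID INTEGER, FNAME TEXT, LNAME TEXT, STAT TEXT)")
    conn.execute("INSERT INTO CUST VALUES (1, 'ada', 'LOVELACE', 'A')")
    return conn


# The application side: no SQL, no column names, no status codes.
info = CustomerInformation(make_demo_database())
customer = info.get_customer(1)
```

The application depends only on `Customer` and `get_customer`; the SQL and the decoding rules can change behind that boundary without touching application code.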
The information layer encapsulates this data rationalization behavior so that it can be developed and maintained separately from the application. It also makes this rationalization behavior reusable by multiple applications. If your app needs certain data gathered and normalized a certain way, and another application already has that, then your app can reuse that. And if another app has already accessed this data, the data may be cached in the format your app needs so that it can just use it.
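The reuse and caching described above might look roughly like the following sketch. The class name, the loader function, and the dictionary cache are illustrative assumptions; a real information layer would also need invalidation and expiry, which are left out here.

```python
from typing import Callable, Dict


class CachingInformationLayer:
    """Hypothetical layer shared by several applications."""

    def __init__(self, load_and_normalize: Callable[[int], dict]) -> None:
        self._load_and_normalize = load_and_normalize
        self._cache: Dict[int, dict] = {}

    def get_customer(self, customer_id: int) -> dict:
        # Go to the data store only the first time a customer is requested.
        if customer_id not in self._cache:
            self._cache[customer_id] = self._load_and_normalize(customer_id)
        return self._cache[customer_id]


store_reads = []  # records every trip to the (simulated) data store


def load_and_normalize(customer_id: int) -> dict:
    # Stand-in for the expensive gather-and-normalize work.
    store_reads.append(customer_id)
    return {"id": customer_id, "name": "Ada Lovelace"}


layer = CachingInformationLayer(load_and_normalize)
billing_view = layer.get_customer(1)  # first application: loads from the store
support_view = layer.get_customer(1)  # second application: served from the cache
```

Both applications get the same normalized record, and the data store is read only once.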
Furthermore, no complex enterprise stores its data in just a single database. An enterprise's data is spread across multiple databases, legacy systems, business partners, old archived data, unstructured data (such as much of the Internet), and so on. What may seem like one Customer record may actually come from multiple data sources. Often the same data is stored in multiple places; sometimes the redundant copies conflict with each other. Often the data that an application needs combined into a single record is stored in many different formats, none of which may be the format the application needs.
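As a rough illustration of assembling one Customer record from several sources, the sketch below merges two hypothetical records field by field. The conflict rule, which takes each field from the first source in a priority list that has a usable value, is an assumption chosen for simplicity; real systems may need per-field rules, timestamps, or manual review.

```python
def merge_customer(records_by_priority):
    """Merge partial records; earlier records win when sources conflict."""
    merged = {}
    for record in records_by_priority:
        for field, value in record.items():
            # Skip missing values, and never overwrite a higher-priority source.
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return merged


# Two hypothetical sources holding overlapping, partly conflicting data.
crm_record = {"name": "Ada Lovelace", "email": "ada@example.com", "phone": None}
billing_record = {"name": "A. Lovelace", "email": "", "phone": "555-0100", "balance": 42.5}

customer = merge_customer([crm_record, billing_record])
```

Here the CRM wins the conflicting `name`, while the billing system fills in the `phone` and `balance` fields that the CRM lacks.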
What this leads to is a three-layer architecture for the enterprise, the same application-information-database layers as before but now for a whole enterprise and not just a single application. The enterprise layers are:
- Applications -- The user applications used to perform various business tasks
- Information -- A cloud of data access that tries to make sense out of the enterprise's collective data
- Data Stores -- All the sources of data that contain the enterprise's collective data
With this layer of integrated information, the question changes from "How will your application access its data?" to two questions: "How will your application use the information layer to access the information it needs?" and "How will the information layer access the data?" Data access in an application is getting a lot more interesting.
08/29/2008 update: Here's an article that discusses this idea in a lot of detail as a pattern: "Inside the Preferred Data Source Pattern."