What are database logs? They are minimal records generated by transactional writes to keep track of database changes. In all senses, they are nothing but side products generated out of write operations within transactions; and are not something served out of, in response to users’ data queries. Moreover, DBMS incurs CPU, memory and storage resources owing to generation and maintenance of these logs. An obvious question next to this would be why someone on earth should care for stuffs that eat system resources and do just nothing? If you are a newbie in the field of system administration, and at this point ready to turn off database logging forever; hold on your decision until you read this post completely. Data management has always been a serious affair regardless of the type and/or size of the business dealing with it. Nobody would rejoice investing on something having no significant value add. So, logs are required and important in much greater extent. How? Let’s understand that.
Transaction writes facilitate applications performing changes to database’s data. By fundamental virtue, they steal database consistency at the beginning, and re-establish it at the end. In other words, during execution, transactions make database inconsistent. In addition, like any other system, failure of a DBMS is not uncommon. Situations like power outage, OS or software crash, disk corruption are unavoidable. However, under any circumstances, businesses cannot afford loosing data, and hence DBMS vendors must ensure database recovery across points of failures. Scenarios, wherein DBMS is brought down disgracefully, database is typically left inconsistent owing to abnormal termination of ongoing transactions. Applications, if allowed to connect to the database at this state, would certainly fetch some inaccurate data, which a business can never accept. Therefore, post such failure, DBMS must bring database consistency back before making it available for use. While doing this DBMS would need to refer to the history of transactional changes happened to the database before failure; but, wait…history? Where is it? Relax…DBMS won’t let your business down. Here comes the utility of transaction logs, whose credibility was being questioned sometime back. Recall that they kept on consuming system resources to keep track of the changes your database was undergoing during its good times. Now, it’s their turn to pay back. DBMS would read individual log records stored within log files sequentially, and perform the following.
- Redo the effects of committed transactions that could not make their way to database disks before the failure.
- Undo those pertaining to uncommitted (i.e. ongoing / in-flight) transactions that somehow got persistent into database storage space.
At the end, your database will get the lost data back and regain its consistency. Henceforth, applications are guaranteed to see correct data whenever connected to the database.
In another kind of failure situation, your database storage may completely or partially go bad, and you end up restoring a backup image of the database taken prior to the disk crash. Definitely, this will rebuild your database, but its state will be taken back to the backup time. Should you ask DBMS to take the state of your database forward, log records following the backup timestamp will be read sequentially from the log files and re-applied to the restored database.
So far we discussed the importance of transaction logs in a DBMS, but have you ever wondered that they could be even more precious than the actual data stored within your database? Well, let’s face the truth straight away. Yes, DBMS always cares more for transaction logs than user data. It makes sure that log records are saved in log disks before the corresponding user data is made persistent to the database. Why so? Let’s explore.
Transaction writes bring data pages from disks to bufferpool area of database memory and modifies them per user request. Each modification generates log record in log buffer (different from bufferpool, but collocated with it within database memory). Now, it is not a rare scenario, wherein bufferpool gets full of changed data pages (a.k.a. dirty pages) owing to long running transactions and DBMS needs to make space for new transactions. This is called dirty page steal, where new data pages as requested by new transaction replace the old dirty pages pertaining to other ongoing transactions. Those old dirty pages are placed into database disk. However, before that, DBMS must write their corresponding log records into the log files within log disk. If that wasn’t the logic (i.e. dirty pages were made persistent to database storage without having corresponding log records flushed to files), and a crash (like power outage) was encountered, DBMS wouldn’t find the necessary log records to undo uncommitted changes (that were written to disk). As a result, database consistency cannot be re-established. In a nutshell, transaction logs must go to disk before corresponding data changes do so. In DBMS world this mechanism is known as ‘Write Ahead Logging’ (WAL).
Externalization of log records from memory to disk is initialized when a transaction commits, and/or log buffer gets full, and/or bufferpool gets full. Writing bufferpool data to database disk on the other hand can run asynchronously based upon whether conditions like bufferpool full and/or deactivation of database etc. are met.
Hope this explains the perspective of DBMS treating transaction logs as golden data and why you cannot afford loosing them if data recovery is one of your priorities. I will talk more about this area of database technology in my upcoming posts. Until then, please stay tuned.
Although I work for IBM, the views expressed are my own and not necessarily those of IBM and its affiliates.