Friday, July 23, 2010

Blogging About Logging - Informix Physical and Logical Logging

What rolls down stairs alone or in pairs
Rolls over your neighbor's dog?

What's great for a snack and fits on your back?
It's Log, Log, Log!

It's Log, Log.  It's big, it's heavy, it's wood.
It's Log, Log.  It's better than bad, it's good!

Everyone wants a Log!  You're gonna love it, Log!
Come on and get your Log!  Everyone needs a Log!

- Ren and Stimpy

Logical Logging

Each time you insert, update or delete a row a logical log record containing information about what was changed and how is written to the logical log (if transaction logging is enabled for your database.  You do have this enabled, right?)  Each time you create or alter a database object (table, index, sequence, etc.) a logical log record is written to the logical log (even if transaction logging is disabled).  Each time you run a backup, a checkpoint happens, you add a chunk or dbspace or a new extent is allocated a logical log record is written to the logical log.

So what are all of these logical log records good for?  Well, they're good for a lot of things.
  • Transaction Rollback
  • Recovery from Engine Failure (Data Restore and Fast Recovery)
  • Deferred Constraint Checking
  • Cascading Deletes
  • Distributed Transactions (Two Phase Commit)
  • Enterprise Replication
  • High Availability Data Replication 
Physical Logging

The Physical Log stores a before-image of any page that Informix is going to modify.  A before-image of a page is how the page looked at the time of the last checkpoint.  Only one before-image for each modified page is written to the Physical Log.  If you were to modify the same page over and over and over again in between checkpoints, only the first modification would cause a before-image to be written to the Physical Log.

The before-images in the Physical Log are used by fast recovery and database restores to give an accurate snapshot of your data at the time of the last checkpoint.

Thatsalotta I/O

Yes, it is and if not configured correctly logical and physical logging can become a bottleneck.  Heck, even if configured the right way logical and physical logging can be your bottleneck.  Don't be sad, something has to be a bottleneck and it will most likely be the slowest part of a database system, the disk.  The goal is to minimize the impact of all this logging.  Whether you like it or not, we need to do all of this logging to keep our data safe.


When the engine writes to the logical or physical log it doesn't write directly to disk, it puts as many of these writes as it can into one of the 3 logical log buffers or one of the 2 physical log buffers.  These buffers are then written to stable storage on disk in a block which is faster than doing them one at a time.

I have never personally seen a performance problem that was a result of poorly configured log buffers nor have I seen much performance gain from dramatically increasing the size of my log buffers.  Start out with a LOGBUFF of 64 KB and a PHYSBUFF of 128 KB (the defaults in onconfig.std) and monitor the pages/io fields in the onstat -l output and shoot for values that equal buffer size / 2 * .75 or 75% full at the time of buffer flushing.

NOTE: If Unbuffered logging is configured (this is what I use for the highest level of data integrity) you will be hard pressed to get near 75% buffer fullness at the time of flush if you are an OLTP environment and have a lot of small transactions.  Unbuffered logging forces the logical log buffer to be flushed at the end of a transaction before the transaction can complete.  I'm currently getting 1.2 pages/io (2.4 KB) for my logical log buffers.

Get Yourself Some More Disks

Look at what can happen when you modify a row
  • Find the page to be modified either through and index or a sequential scan
  • Read the data page containing the row to be modified from disk into memory
  • Write a before-image to the physical log buffer
  • Write a logical log record of the update to the logical log buffer
  • Modify the data page in memory
  • Flush the physical log buffer to disk
  • Flush the logical log buffer to disk
  • Flush the data page to disk 
That's a lot of disk access to different parts of the disk trying to happen at about the same time.  You can really improve performance by putting your physical log, logical logs and data on separate spindles.  Is this a requirement to get good performance with Informix?  Hell no.  All RDBMSs use logging in a similar way and I'll bet that there are plenty of single disk MySQL, SQL Server, DB2, Oracle and Informix production installs out there doing just fine.  But if maximum performance is your goal and you can get multiple disks to spread and parallelize your I/O then do it.

Increasing the size of the Physical Log

Until recently it was kind of taboo to have your physical log live in the rootdbs.  The mantra was "put rootdbs, the physical log and the logical logs on separate disks" and if you didn't your peers would laugh at you and call you names.  Truth is, if you are doing what you're supposed to and do not have any user data living in the rootdbs there really isn't too much I/O to the rootdbs so having the physical log live there doesn't cause much I/O conflict.

On engine initialization the physical log is put into the rootdbs and that is where I like to keep it (provided I move the logical logs somewhere else.)  You should, however, increase the size of the physical log.  There is absolutely no performance penalty for having a big physical log, none.  There are drawbacks to having an undersized physical log.
  • Checkpoints firing too frequently due to small physical log hitting 75% full mark often
  • Checkpoint blocking due to physical log filling during a checkpoint
  • Fast Recovery can take longer if physical log is not large enough 
So make it bigger than the default of 50,000 KB.  Who cares, disk is cheap.  I use a physical log size of 4 GB on my systems and the manual recommends a physical log size of 110% the size of your buffer pool.  Pick a size based on the available disk you have and the workload for your environment, there really isn't a one size fits all answer here.  Start with 1 GB and go from there if you don't know where to start.  Use onparams to change the physical log.

informix> onparams -p -s 1048576

Moving and Increasing the Size of the Logical Logs

Unlike the physical log, there are multiple logical logs.  When the current logical log fills up the engine starts using the next logical log and kicks off a backup for the recently filled logical log.  The logical log backup is initiated through the script defined by ALARMPROGRAM in your ONCONFIG or the engine marks it backed up if LTAPEDEV is set to /dev/null.  This is very important stuff if you're going to be using Informix, but this post is getting long enough so I'll have to save it for another day.

If you don't change anything in your ONCONFIG when you initialize the engine you get 6 logical log files of 10,000 KB each put in your rootdbs.  This simply not enough logical logs, the logical logs are too small and they're living with our physical log so we need to change this.

The amount of logical log space you have is a factor of the number of logical logs you have and the size of each log (obvious?).  If you do not have enough logical log space you will run into problems.  The logical logs are used in a circular fashion, when a new logical log is needed, Informix starts overwriting the data in the logical log that contains the oldest data as long as it is 1) marked as backed up and 2) does not contain any open transactions.  If either 1 or 2 are met, the engine will block until neither 1 or 2 are true.  Having sufficient logical log space won't help with 1, but it will help with 2.

As with the physical log, there is no penalty for having too much logical log space, but you need to size your individual logical logs to be big enough that you're not switching logs every 3 seconds but not so big that they never fill up and therefore are never backed up.  Having super huge logical logs that contain 3 days worth of transactions before they are backed up exposes you a little bit to more data loss.  If your system totally dies and the disk that holds the logical logs is destroyed and you can't perform a log salvage during a cold restore to get the logical log data from the logs that are not backed up, well you just lost 3 days of transactions.  TMI?

In my environment I have 31 261628 KB logical logs for a total logical log space of about 8 GB.  My logical logs switch about every hour during peak times.  Your logical log setup will vary, but for shits and grins lets say we want 2 GB of logical log space divided between 16 logical logs.

We're going to need to create a new dbspace to hold these logical logs.  I want it to be exactly 2 GB (2 GB * 1048576 KB = 2097152 KB).  We do this with the onspaces command.

informix> touch /home/informix/chunks/LLOGDBS01.01
informix> chmod 660 /home/informix/chunks/LLOGDBS01.01
informix> onspaces -c -d llogdbs01 -p /home/informix/chunks/LLOGDBS01.01 -o 0 -s 2097152

If we look at the output from onstat -d we can see that we have a dbspace with a single chunks containing 1048576 pages and 1048523 pages free because 53 pages have been reserved by Informix.

1048523 pages / 16 logical logs = 65532 pages per logical logs.

Like I said before, when you initialize the engine you get 6 logical logs numbered 1 to 6.  These have to go.  They're of a funky size and in a different dbspace and I just can't live with that.  I just can't drop them all at once either, because Informix requires me to have at least 3 logical logs at all times.  I also can't drop the current logical log or a logical log that has the last checkpoint.  Picky, picky.  

I will use the onmode -l to force logical log switches until I'm in logical log 4 (look for the C flag of the onstat -l output) and then I'll force a checkpoint with onmode -c.  I also temporarily set LTAPEDEV to /dev/null and bounced the engine, letting me fake some required logical log backups.  I'm assuming you're working with a new engine and don't care about losing data at this point.

I will use the onparams command to drop the first 3 (leaving me with 3 total) then create 3 logical logs of the new size and in the new dbspace to take their place.  I can then drop the remaining 3 funky logical logs and create the remaining 13 logical logs.

informix> onparams -d -l 1 -y
informix> onparams -d -l 2 -y
informix> onparams -d -l 3 -y
informix> onparams -a -d llogdbs01 -s 131064 # 65532 pages = 131064 KB
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onmode -l
informix> onmode -l
informix> onmode -l
informix> onmode -c
informix> onparams -d -l 4 -y
informix> onparams -d -l 5 -y
informix> onparams -d -l 6 -y
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064
informix> onparams -a -d llogdbs01 -s 131064

Checking onstat -d shows me that I have used all but 11 pages (22 KB) in llogdbs01 and onstat -l shows me my 16 new logical logs.

Take it One Step Further

If you are spreading out the I/O to multiple disks and you have disks to spare, try creating two logical log dbspaces on two different drives and alternate the creation of logical logs between these two dbspaces.  For example, create logical log 1 in llogdbs01, logical log 2 in llogdbs02, logical log 3 in llogdbs01, etc.  When a logical log switch occurs a backup reads all of the data from the old log while you're busy trying to write transactions to the new log.  If these two logical logs are on the same disk then you can see some I/O contention during the backup.  Is this required?  Nah, just something to keep in the back of your head.  There is a good chance in this day and age that you won't even be dealing with individual disks anyway.  Instead you're probably just be given a slice of disk on the SAN and all this I/O separation stuff is meaningless.  Kind of sad really.


  1. very well explain

  2. I disagree with the sentence: "Fast Recovery can take longer if physical log is not large enough"
    I think that the smaller the physical log, the shorter the recovery time


  3. Roger,

    I agree with you a little bit. If you have a small physical log and are having more frequent checkpoints because of the physical log becoming 75% full then recovery time will be shorter because there will be less to recover, but this is going to impact overall performance because you'll be checkpointing more often.

    On the other hand, there is no disadvantage to having a ridiculously large physical log (aside from wasted disk space). Fast Recovery against a database with a 4GB bufferpool should be the same with a 5GB physical log or a 5TB physical log.

    When RTO_SERVER_RESTART is configured the engine will log additional pages to the physical log that may be needed for fast recovery to ensure that no random I/Os are needed during roll forward.

    If the physical log is too small we will get this warning:

    Performance Warning: The physical log size is smaller than the recommended size for a server configured with RTO_SERVER_RESTART.
    Results: Fast recovery performance may not be optimal.
    Action: For best fast recovery performance when RTO_SERVER_RESTART is enabled,
    increase the physical log size to at least %ld Kb. For servers
    configured with a large bufferpool, this may not be necessary. Refer
    to the IBM Informix IDS Administration Guide for more information.

    But now that I think about it more, why would fast recovery time not be optimal? If we are logging additional stuff to make fast recovery faster then wouldn't we just hit the 75% physical log full mark sooner and trigger a checkpoint? I don't know how this would impacts fast recovery time, we are still writing the additional stuff to the physical log to eliminate random I/O during roll forward at the expense of more frequent checkpoints.

    If I find an answer to this question, I will post it here. I still stand by my suggestion of having more physical log than you think you need simply because disk is cheap, it won't make performance worse (it may make performance better because of less frequent checkpoints) and it MAY improve fast recover time.

    Thanks for the comment!

  4. This comment has been removed by the author.

  5. This comment has been removed by the author.

    1. Google/Mozilla ate your previous comment, I don't see you in the comment spam folder.

  6. I have never personally seen a performance problem that was a result of poorly configured log buffers nor have I seen much performance gain from dramatically increasing the size of my log buffers.

    Given that you wrote this nearly 4.5 years ago, I'm wondering if this still holds true. I bring it up because we've been seeing periods of high volume during which we get a large number of user sessions waiting to write to the logical log buffer (onstat -u flags field leads off with G), and am trying to think of ways in which we might try to alleviate that. Increasing LOGBUFF (presently 128) was one idea that sprung to mind.

    Disclosure: There's HDR replication in NEAR_SYNC mode involved here, which complicates matters.

    1. Also, the blockquote tag is not allowed? Tsk!

  7. I recently came across your blog and have been reading along. I thought I would leave my first comment. I don"t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often. High Authority Blog comments | Niche Relevant blog comments | Web 2.0 Backlinks | | Content Writing service