Friday, August 19, 2016

Audience Participation! Benchmark Edition

My company recently purchased new servers with dedicated storage, and any time this happens (and I have the luxury of time to do it), I take the opportunity to re-test my old assumptions concerning what storage configuration(s) work best.  The tests are ongoing, and will be for several days now, but I thought it would be fun to ask you all to guess what you think the results will be.*

First, the storage array details:

It's equipped with 13x 15K RPM spinning drives with 12 Gbit/sec SAS interfaces.  The storage is not shared, meaning the host to which it's connected is the ONLY one that's using it.  One drive is reserved as a hot spare, leaving 12 usable drives.  I'm testing the following configurations for performance:

  1. One Giant RAID-10 ("1x12R10"): Stripe and mirror all 12 drives in a single 12-drive stripeset
  2. One Giant RAID-6 ("1x12R6"): Stripe all the drives in a single 12-drive stripeset with two parity drives
  3. Two big RAID-10s ("2x6R10"): Two sets of six drives, each mirrored and striped as RAID-10
  4. Two big RAID-6s ("2x6R6"): Two sets of six drives, striped with two parity drives per set
  5. Three small RAID-10s ("3x4R10"): Three sets of four drives, each mirrored and striped as RAID-10
  6. Plain RAID-1 ("6xR1"): Drives are mirrored but not striped

In all cases, I've set up 6 equally-sized LUNs. In the 6xR1 case, each LUN corresponds neatly to a mirrored pair; In the 3x4R10 case, there are two LUNs on each stripeset; for the two-stripeset cases, there are three LUNs per set; and in the one-big-stripe case, obviously, all LUNs on the single set.  In all cases, I have disabled the storage array's controller cache, because I want to emulate times of very high activity and test the disk, not the cache.

I'm using two different tests as benchmarks: The first (call it the "OLTP test") is the AGS TPM-C test, which you're familiar with if you've ever participated the Advance Data Tools "Fastest DBA" contest.  For this test, all of the large tables are fragmented by hash across six dbspaces, one on each LUN.  There are six index dbspaces, also one per LUN, across which the indexes have been distributed in round-robin fashion. I did not fragment/partition any of the indexes.  In this test, I have the buffer pools tuned very small, because to repeat, I'm trying to test the disk, not the rest of the system.

The second test (call it the "batch test") is a very large purge job that sequentially scans a table with two indexes and deletes about 2 million rows from that table.  The table itself is fragmented using a hash across three dbspaces, each on a separate LUN; there's a BYTE column that's stored in a multi-chunk blobspace; the chunks for the blobspace are distributed across all six LUNs, but no single chunk spans more than one LUN.  And both of the indexes are fragmented six ways, with one fragment on each LUN.  Three of these batch jobs are run simultaneously, one for each table fragment.

In both cases, there's a rootdbs, a dbspace for the physical log, and two logical log dbspaces.  All four of these dbspaces are on separate LUNs from one another, and the logical logs alternate between the two log dbspaces (so that log #2 is not in the same dbspace as log #1).

So place your bets! Which storage configuration will work best for the OLTP test? Which will work best for the batch test?  I'm curious to hear what you think.  I'll post the complete results when I have them.  Feel free to ask questions if you have any.

* - This will also be a good test to see if anybody's actually bothering to read these posts. :)

Friday, August 12, 2016

Public Service Announcement: Smart BLOBs Still Suck

The last time I tried to mess around with Smart BLOBs was probably in 2007 or 2008.  I found that they were buggy and excruciatingly slow.  But that was years ago, most likely IDS 10.0.  I've got a pretty good use case for them now, and I'm using the latest, greatest IDS 12.10.FC7, so I thought I'd have another look at them.

They're still buggy and excruciatingly slow.

Buggy how? I ran out of space in the main dbspace while trying to populate a test table in a non-logged DB with non-logging Smart BLOBs. The result is that the Smart BLOBs appear to have been created with a ref count of 0, and I was left with records that I couldn't delete, and tables I couldn't drop.  And this is apparently a common enough occurrence that there's an option for onspaces that allegedly deals with it.  (Since I was only testing, I found it easier just to restore the engine to its baseline.)

But maybe that was a fluke; just don't run out of space, or at least not in those circumstances, and it should be fine.  Which brings us to performance.

Excruciatingly slow how?  I tested the same purge process using the BYTE data type and the BLOB data type.  With BLOB, the process took nearly three times as long.  With BYTE, a 3 million row purge took just over 13 minutes, on average.  With BLOB, over 36 minutes.

Perhaps others of you have had better luck, but my experience with Smart BLOBs was bad then, and it's still bad now.

Thursday, July 21, 2016

Major Linux I/O Bug? SPOLER: No

Alternate title: Betteridge's Law Holds!

Last week, I posted on what I thought was an I/O bug in RHEL 7 and CentOS 7. I've now done enough testing to eliminate this possibility.  So, as promised:

So what was happening, then, that led me to get such drastically different results?

First, and most importantly, as I noted in an update to the prior post, one system was using SSD and the other was using spinning disk.  That accounted for the large bulk of the difference.  It still leaves the somewhat unsolved mystery of my production performance problems that coincided roughly in time with when the OS was updated (which is what led me down that rabbit hole in the first place).

But there may also have been other configuration changes made around that time that accounted for this.  In preliminary testing, I've found that I get better performance by turning kernel asynchronous I/O off, contrary to conventional wisdom (and to what I've found on other platforms). I've coupled this with also disabling DIRECT_IO, though I believe the latter may be a bad thing: it may be that the performance is buoyed by caching that's going on at the OS level, which would mean that data hasn't really been committed to disk even though the DB engine believes that it has. By making these two changes, I've gotten the performance on the production system back close to previous (good) levels, at least for the time being.

But anyway, false alarm, and egg on my face.

Friday, July 15, 2016

Major Linux I/O Bug? -- UPDATED 2016-07-19

Update 19 Jul 2016: I still believe there's an issue here, but I've got reason to believe my tests were not as apples-to-apples as I'd originally thought. So I'm re-engineering the tests and trying again to get updated numbers.  I'm also checking to see if I can replicate results with Informix completely removed from the mix, doing some testing with simple dd.

It is also possible that there's no problem at all, and I may have to cue Emily Litella.

Further Update: Not only were apples not apples, they were also not oranges or Buicks. Turns out a big part of the reason I saw the huge discrepancies was because I had SSD on one system and spinning disk on another.  Needless to say, the numbers from the original post are meaningless because of that mistake. But there's still the mystery of the performance degradation in production that coincides with when updates were installed to CentOS.

As a sanity check, I re-ran the same test on various platforms and compared those results to one another and to results I got back in September when I ran a similar battery.  In every case, the results were better today than in September EXCEPT on RHEL/CentOS, where they were considerably worse than they had been in September. AIX on old hardware (with several storage configurations) smoked all combinations on Linux. So I'm thoroughly confused, and still researching.
End Update

I believe I've stumbled upon a major bug in the Linux kernel I/O subsystem.  I've been working on testing newly-acquired hardware to see how it compares against the hardware we're replacing, and in my initial tests, the results I was getting were terrible.  Read performance looked pretty good, but write performance was abysmal.

At first, I was blaming it on the new DAS, thinking that there must be some kind of setting causing it to heavily prioritize reads over writes.  I'd run the same test on AIX machines, and those tests lost on read performance but won easily on write performance.  Then, as a sanity check, I decided to do the same test on a VM, and got the same abysmal results.  I'd run the test on an identically-configured VM before, several months ago, and got much different results, so that tripped my inner, "Hey! Wait a minute!"

This reminded me of a lingering performance issue that we've got on a production Linux VM that seems to have started back in mid-May. So I went onto the VM in question and ran "yum history." That shows that I had done a "yum update" back on May 18, right about the time performance issues started to be reported. "Aha!" I thought.  I've got a culprit now.  In setting up the new hosts, I was doing a "yum update" almost immediately.  So what happens if I re-install the system and don't install updates?

You can guess where this is going: performance looks fantastic.

I'm still trying to narrow it down.  My hope is to figure out which particular package update causes the problem, and I'll report back if/when I find it.  So far, here's what I can tell you:

  • The problem replicates on both RHEL 7 and CentOS 7.
  • If you install cleanly from the latest media (as of about 2 weeks ago), I/O performance looks great. (Kernel release: 3.10.0-327.el7.x86_64)
  • If you run "yum update" then after the updates have installed, I/O performance will go down the crapper: read performance drops by a little more than 10%; write performance drops by nearly 80%! (Kernel releases 3.10.0-327.18.2.el7.x86_64 and 3.10.0-327.22.2.el7.x86_64)
  • On systems, VMs where the updates have been installed, disabling kernel asynchronous I/O (KAIOOFF=1) seems to help, bringing read performance back to what I'd expect, and getting back about half the difference on write performance.

Note: the test I've been running is a large, single-threaded purge, starting from the same baseline and purging exactly the same records.  To isolate read performance, I have a version of the purge that scans the table to find the records to be purged but doesn't do the actual purging.  Here are the results of those tests:

Without updates, read-only: 18182.94 rows/second

With updates, read-only: 16170.75 rows/second
With updates, read-only, KAIOOFF=1: 18,389.99 rows/second

Without updates, full purge: 2010.49 rows/second
With updates, full purge: 438.46 rows/second
With updates, full purge, KAIOOFF=1: 1125.58 rows/second

[Still to test: Without updates, KAIOOFF=1]

Tuesday, May 10, 2016

Is This Thing On?

In a move that should make all of you question Andrew's judgment, he's invited me to also contribute to the Informix DBA blog.  So now you'll get erratic posting from two of the three fastest* Informix DBAs in the world.

Hopefully I'll be able to contribute something of value besides bad humor.

* - Fastest according to one extremely arbitrary contest several years and versions ago, in which only a small fraction of the world's Informix DBAs actually participated.

Thursday, January 29, 2015

Resolving In Place Alters - Part 4, The Slot Table

By the end of Resolving In Place Alters, Part 3 I was able to read the raw data pages that belong to the table with pending IPAs, but I wasn't able to see if a data page had a pending IPA or not. In this installment I will show you how I solved that problem using what I know about the Informix Data Page structure and the structure of my table before and after the in place alter was performed.

Anatomy of the Informix Data Page

When Informix stores data on disk, it doesn't just write your row data. It also writes some "housekeeping" data at the beginning of the page and at the end of the page that help the engine to know what actually is stored in this data page.

The first 24 bytes of a page contain the page header with all kinds of information that you're going to have a hard time finding documentation about. The last N bytes of a data page contain a timestamp and the slot table, which you may have better luck finding some documentation on. In between the page header and page footer (slot table + timestamp) you'll find your actual data. We will need information from each of the 3 parts of the data page.

Friday, January 23, 2015

Resolving In Place Alters - Part 3

In Resolving In Place Alters - Part 2 I decided that I want to try to identify only the pages with a pending IPA and only "fix" those data pages in an attempt to speed up the resolving of IPAs and limit the work load added to the system when doing so.

To do this, I will need to read the actual data pages stored on disk and use information in the page header and slot table to determine if a page needs to be fixed. More on this in the next blog, today I just want to talk about how to read in the raw data pages.

Thursday, January 15, 2015

Resolving In Place Alters - Part 2

In Part 1 I talked about how I knew that the prescribed method of resolving the pending in place alters in my large table by simply issuing a single large dummy update statement to update each row won't work for large tables. There are just too many problems with long transactions, locks and excessive I/O that can trash the engine.

I decided the first thing I would try would be to update every row in the table, but do it in multiple transactions to avoid long transactions and holding on to a lot of locks. Nothing too interesting, pretty standard stuff really. The logic looks like this:

Friday, November 21, 2014

I'm Back! Did You Think I was Gone Forever?

I Sure Did Miss You

Truth be told, I just haven't had much to write about until now. I haven't been doing much of anything interesting with Informix. Sure, I've been doing the day to day Informix DBA stuff, but nothing blog worthy. Mostly I've been working on some software development, data mining and business optimization projects for my company. But that all changed this week.

The Catalyst

I want to upgrade to v12.1, but I'm a big fat chicken and I've got a pending in place alter on my most importantest of important tables. The one that holds account information and balances for over 1 billion accounts and the manual tells me I should probably resolve all of my IPA's before performing an in place upgrade. Could I take a chance and upgrade the engine without following the manual's advice? Sure. Have I done it before without problems? Yeah, but I'm not proud of it and they weren't on our most precious table, so this time I'm going to follow the rules.

Part 1 - What to Expect and What I Knew Wouldn't Work

This will be a 89 part series (give or take a few) on how I accomplished this task without any downtime, without killing performance, without locking up the table and most importantly without updating every gosh darned one of those 1 billion rows like the manuals suggest you do. (Ok, full disclosure, I'm still working on the last little bit, but I think I've got this nut cracked.)

I have submitted a proposal for IIUG2015 on this topic, so if you don't feel like reading you can just wait for the movie. You will have to buy a ticket and they will be on sale soon. If you too would like to present something at IIUG2015 and receive a complimentary pass, there is still time to submit a proposal. This is the 20th anniversary of the first Informix conference in Chicago and it will be fun and educational as always.

What I knew wouldn't work - The manuals seem to assume we are all operating with tables with only a handful of rows and their simple solution to resolving IPA's is to just dummy update every row in the table.

update your_tiny_table set your_field = your_field where 1 = 1;

Well, we all know that will cause havoc on a system if you're updating a table with any substantial size. Locks, long transactions, evil I/O, oh my. There has to be a better way and there is. Next time I'll show you the first thing I tried. (Spoiler alert, it didn't work either.)

Friday, March 21, 2014

Eric Vercelletto Asks and Answers the Question of "Where is Informix?"

You may have already read this article by my friend and "Benchmark Guy", Eric Vercelletto, but it is so well done and contains so many great thoughts and observations that I think it needs to be posted here for you to read again (or for the first time).

Eric's Blog

Where is Informix?