Wednesday, August 11, 2010

Informix HDR Will Save Your Butt



If I had to guess, I would say that most production database engines use RAID to protect against the inevitable disk failure, and the ones that don't probably should. Disk is cheap, and the revenue saved by avoiding an extended outage can pay for disk mirroring many times over.

If I had to guess again, I would say that not nearly enough production database engines use High Availability Data Replication (HDR) to protect against the inevitable server failure. Why is this? Servers fail too. Sure, servers are more expensive than disks, and sure, their MTTF is longer, but the money lost during an extended outage that HDR could have avoided will probably exceed the cost of implementing an HDR solution.

HDR continuously replicates the changes made on a Primary server to a Secondary server that can be quickly converted to a Primary if the original Primary fails. As an added bonus, the Secondary can serve reads (and, optionally, writes), so you can use this hardware to improve performance instead of letting it sit idle. If your Informix edition supports it, you could also add multiple Remote Standalone Secondary (RSS) or Shared Disk Secondary (SDS) servers to create a grid. I'm going to focus on a single HDR Secondary, which is available at no cost in Innovator-C.

As with most Informix features, HDR is incredibly easy to configure and does not require much administration.

Get Yourself Some More Hardware

To enable HDR you will need another server. Ideally it should be identical to the Primary. It doesn't have to match in every way, but if you expect it to take over during a failure you'll want the same amount of memory, the same number of CPUs, etc. so it can handle the load. Here is what HDR requires of the participating servers:
  • Both servers must run the same Informix version
  • Both servers must be able to run the same Informix executable. Ubuntu and Red Hat can run the same Informix executable; HP/UX and Red Hat cannot. Why would you ever want to mix them anyway?
  • Both servers must have network capabilities
  • The Secondary server must have at least as much disk space for dbspaces as the Primary. The dbspace chunk types (cooked or raw) do not have to be identical.
  • Dbspace chunk path names must be identical; symbolic links can help here.
  • Not really a hardware requirement, but any database you want replicated must be logged. Unbuffered logging is preferred.
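If you'd rather check the version requirement from a script than by eye, here is a minimal sketch (my own helper, not an Informix utility). It assumes you feed it the banner line onstat prints, in the format shown in the onstat -m output later in this post; ids_version and same_ids_version are made-up names.

```shell
# Hypothetical helper: print the version token that follows "Version" in an
# onstat banner line such as
# "IBM Informix Dynamic Server Version 11.50.UC7IE -- On-Line -- Up ...".
ids_version() {
    printf '%s\n' "$1" | sed -n 's/.*Version \([^ ]*\).*/\1/p'
}

# Succeed only if both banner lines report the same, non-empty version.
same_ids_version() {
    v1=$(ids_version "$1")
    v2=$(ids_version "$2")
    [ -n "$v1" ] && [ "$v1" = "$v2" ]
}

# Example (banner lines captured on each server):
# p=$(ssh informix@blogsvr01 ". /etc/profile.d/informix.sh; onstat -" | grep Version)
# s=$(onstat - | grep Version)
# same_ids_version "$p" "$s" || echo "Informix versions differ"
```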
Install and Configure Informix on the New Server

Follow the steps from Installing Innovator-C on Linux to install Informix on a new server named blogsvr02. Then create zero-byte files with the touch command to mirror the dbspace chunks on the Primary server.
informix@blogsvr02> mkdir /home/informix/chunks
informix@blogsvr02> touch /home/informix/chunks/ROOTDBS.01
informix@blogsvr02> touch /home/informix/chunks/LLOGDBS01.01
informix@blogsvr02> touch /home/informix/chunks/DATADBS01.01
informix@blogsvr02> touch /home/informix/chunks/DATADBS01.02
informix@blogsvr02> chmod 660 /home/informix/chunks/*
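With more than a handful of chunks, typing touch for each one gets old. A minimal sketch, assuming a hypothetical text file listing one cooked-file chunk path per line (raw devices and symlinked chunks need different handling); mirror_chunks is an illustrative name:

```shell
# Hypothetical helper: create zero-byte chunk files listed in a text file.
# Assumes cooked-file chunks, one absolute path per line; blank lines and
# lines starting with # are skipped.
mirror_chunks() {
    list=$1
    while IFS= read -r path || [ -n "$path" ]; do
        case $path in
            ''|'#'*) continue ;;
        esac
        mkdir -p "$(dirname "$path")" || return 1
        # touch creates a missing file and never truncates an existing one.
        touch "$path" || return 1
        chmod 660 "$path" || return 1
    done < "$list"
}

# Example: mirror_chunks /home/informix/chunks.txt
```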
Copy the /etc/profile.d/informix.sh file from the Primary to the Secondary and change INFORMIXSERVER
root@blogsvr02> scp blogsvr01:/etc/profile.d/informix.sh /etc/profile.d/informix.sh
root@blogsvr02> vi /etc/profile.d/informix.sh

export INFORMIXSERVER=blogsvr02
Add a DBSERVERALIASES entry to the Primary's ONCONFIG that will be used exclusively for HDR, then copy the ONCONFIG file from the Primary to the Secondary and change DBSERVERNAME and DBSERVERALIASES there.
informix@blogsvr01> vi $INFORMIXDIR/etc/$ONCONFIG

DBSERVERALIASES blogsvr01_hdr

informix@blogsvr02> scp blogsvr01:/opt/informix/etc/onconfig.blogsvr01 $INFORMIXDIR/etc/$ONCONFIG
informix@blogsvr02> vi $INFORMIXDIR/etc/$ONCONFIG

DBSERVERNAME blogsvr02
DBSERVERALIASES blogsvr02_hdr
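If you prefer not to edit the ONCONFIG in vi, something like this hypothetical set_onconfig_param helper can make the same change from a script. It's a sketch that assumes one "NAME value" entry per line and replaces the whole value, so include any aliases you already have in DBSERVERALIASES; it also assumes the value contains no backslashes, since awk -v interprets escapes.

```shell
# Hypothetical helper: set NAME to VALUE in an ONCONFIG-style file.
# Replaces every uncommented "NAME ..." line, or appends "NAME VALUE" if absent.
set_onconfig_param() {
    file=$1 name=$2 value=$3
    tmp="$file.tmp.$$"
    awk -v n="$name" -v v="$value" '
        $1 == n { print n " " v; found = 1; next }   # replace existing entry
        { print }                                    # keep all other lines
        END { if (!found) print n " " v }            # append when missing
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat so the original file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example, on blogsvr02 after copying the Primary's ONCONFIG:
# set_onconfig_param "$INFORMIXDIR/etc/$ONCONFIG" DBSERVERNAME blogsvr02
# set_onconfig_param "$INFORMIXDIR/etc/$ONCONFIG" DBSERVERALIASES blogsvr02_hdr
```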
Do we need a dedicated connection for HDR? No, but I feel it gives me two advantages:
  • I can put HDR traffic on a separate network if I want
  • Both HDR servers must trust each other, and if HDR runs on a dedicated port I can use the more secure $INFORMIXDIR/etc/hosts.equiv to establish that trust
If you would like to allow inserts, updates, and deletes on the Secondary server, set the UPDATABLE_SECONDARY ONCONFIG parameter on both servers to a number between 1 and twice the number of CPU VPs. This sets the number of threads used to transmit updates from the Secondary to the Primary.
informix> vi $INFORMIXDIR/etc/$ONCONFIG

UPDATABLE_SECONDARY 2
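To sanity-check the value before bouncing anything, here is a small sketch of the "1 to CPU VPs * 2" rule described above. You supply the CPU VP count yourself (reading it from VPCLASS is left out), and 0, which turns the feature off, is treated as out of range here because the goal is enabling updates. The function name is made up.

```shell
# Hypothetical helper: succeed if VALUE is between 1 and CPUVPS * 2.
valid_updatable_secondary() {
    value=$1 cpuvps=$2
    for n in "$value" "$cpuvps"; do
        # Reject empty strings, non-digits, zero and leading zeros,
        # so VALUE is guaranteed to be at least 1 below.
        case $n in ''|0*|*[!0-9]*) return 1 ;; esac
    done
    [ "$value" -le $((cpuvps * 2)) ]
}

# Example: valid_updatable_secondary 2 4 && echo "UPDATABLE_SECONDARY 2 is in range"
```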
Add a new port to /etc/services on both servers for HDR.
root> vi /etc/services

idshdr01         1528/tcp               # Informix HDR
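Because this edit has to happen on both servers, it's worth making it safe to re-run. A sketch (add_service is a made-up name) that appends the entry only if the service name is missing and refuses to reuse a port/tcp pair that is already assigned; the trailing comment text it writes is my own choice:

```shell
# Hypothetical helper: add "NAME PORT/tcp" to a services file unless NAME exists.
add_service() {
    file=$1 name=$2 port=$3
    # Name already defined: nothing to do.
    if awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    # Port already taken by another service: stop and let a human decide.
    if awk -v p="$port/tcp" '$2 == p { found = 1 } END { exit !found }' "$file"; then
        echo "$port/tcp is already assigned in $file" >&2
        return 1
    fi
    printf '%-17s %-22s # Informix HDR\n' "$name" "$port/tcp" >> "$file"
}

# Example (as root, on both servers): add_service /etc/services idshdr01 1528
```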
Modify the sqlhosts file on both the Primary and the Secondary so each contains connectivity information for both servers. Use the s=6 security option on the HDR entries to indicate that only replication traffic is allowed on those ports; this is what lets us use $INFORMIXDIR/etc/hosts.equiv to establish trust.
informix> vi $INFORMIXSQLHOSTS

# blogsvr01
blogsvr01               onsoctcp        blogsvr01                idstcp01
blogsvr01_hdr           onsoctcp        blogsvr01                idshdr01       s=6

# blogsvr02
blogsvr02               onsoctcp        blogsvr02                idstcp01
blogsvr02_hdr           onsoctcp        blogsvr02                idshdr01       s=6
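Since both sqlhosts files carry the same entries, you could generate them instead of typing them twice. A sketch that follows this post's conventions (idstcp01 for client traffic, a _hdr alias on idshdr01 with s=6) and assumes each server name matches its host name, as it does here; sqlhosts_entries is a hypothetical name:

```shell
# Hypothetical helper: print the client and HDR sqlhosts entries for SERVER.
sqlhosts_entries() {
    server=$1
    printf '# %s\n' "$server"
    printf '%-23s %-15s %-24s %s\n' "$server" onsoctcp "$server" idstcp01
    printf '%-23s %-15s %-24s %-14s %s\n' "${server}_hdr" onsoctcp "$server" idshdr01 's=6'
}

# Example: { sqlhosts_entries blogsvr01; echo; sqlhosts_entries blogsvr02; } > sqlhosts.new
```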
Bounce (restart) the Primary server so the ONCONFIG changes take effect.

Create or Modify the hosts.equiv Files

The hosts.equiv file contains the hostname of each server that is allowed to make a trusted connection. You must also change the file's permissions so only the informix user can write to it.
informix@blogsvr01> vi $INFORMIXDIR/etc/hosts.equiv

blogsvr02

informix@blogsvr01> chmod 640 $INFORMIXDIR/etc/hosts.equiv

informix@blogsvr02> vi $INFORMIXDIR/etc/hosts.equiv

blogsvr01

informix@blogsvr02> chmod 640 $INFORMIXDIR/etc/hosts.equiv
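The same step as a small re-runnable sketch (trust_host is a hypothetical name): it creates the file if needed, adds the host only once, and applies the 640 permissions used above.

```shell
# Hypothetical helper: add HOST to a hosts.equiv file once, then chmod 640.
trust_host() {
    file=$1 host=$2
    touch "$file" || return 1
    # -x: whole-line match, -F: treat the host name literally.
    grep -qxF "$host" "$file" || printf '%s\n' "$host" >> "$file" || return 1
    chmod 640 "$file"
}

# Example, on blogsvr01: trust_host "$INFORMIXDIR/etc/hosts.equiv" blogsvr02
```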
Note: Later, when we start HDR, if you see messages like this in your online.log (onstat -m output):
12:12:16  listener-thread: err = -956: oserr = 0: errstr = informix@blogsvr02.prod.informix-dba.com[blogsvr02]: Client host or user informix@blogsvr02.prod.informix-dba.com[blogsvr02] is not trusted by the server.
then you need to put the full hostname, blogsvr02.prod.informix-dba.com, in hosts.equiv.
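To find every name the server has rejected instead of scanning the log by eye, a sketch like the one below pulls the host out of -956 messages. The pattern is written against the single message quoted above, so other wordings of the error may not match; untrusted_hosts is a made-up name.

```shell
# Hypothetical helper: list unique host names refused with error -956 in LOG.
untrusted_hosts() {
    grep -- '-956' "$1" |
        sed -n 's/.*Client host or user [^@]*@\([A-Za-z0-9.-]*\).*is not trusted.*/\1/p' |
        sort -u
}

# Example: untrusted_hosts "$INFORMIXDIR/tmp/online.log"
```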

Restore Secondary Server Using a Backup from the Primary

The first step in actually starting HDR is to perform a physical restore of a Primary backup onto the Secondary. After this completes we will start HDR, and Informix will automatically sync the Secondary with the Primary by processing the logical log records written since the Primary's backup was taken.

One of my favorite Informix features is ontape to STDIO. You can use it to take a Level 0 backup of your Primary, ship the data over the network, and pipe it directly into a physical restore on the Secondary, all at the same time. This is a lot easier than performing an Imported Restore. Like to see it? Here you go:
informix@blogsvr01> ontape -s -L 0 -F -t STDIO | ssh informix@blogsvr02 ". /etc/profile.d/informix.sh; ontape -p -t STDIO"
While this is running, you can use onstat -D on both servers to watch pages being read on the Primary and written on the Secondary in parallel. After the backup and restore complete, the Secondary server will be in Fast Recovery mode.
informix@blogsvr02> onstat -m

IBM Informix Dynamic Server Version 11.50.UC7IE -- Fast Recovery -- Up 00:00:40 -- 1164976 Kbytes

Message Log File: /opt/informix-ids-11.50.UC7IE/tmp/online.log
13:38:11  Maximum server connections 0
13:38:11  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 0, Llog used 0

13:38:11  Checkpoint Completed:  duration was 0 seconds.
13:38:11  Tue Aug  3 - loguniq 10, logpos 0x1816018, timestamp: 0x4a722 Interval: 721

13:38:11  Maximum server connections 0
13:38:11  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 0, Llog used 0

13:38:11  Checkpoint Completed:  duration was 0 seconds.
13:38:11  Tue Aug  3 - loguniq 10, logpos 0x1816018, timestamp: 0x4a728 Interval: 722

13:38:11  Maximum server connections 0
13:38:11  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 0, Llog used 0

13:38:12  Physical Restore of rootdbs, llogdbs01, datadbs01 Completed.
13:38:12  Checkpoint Completed:  duration was 0 seconds.
13:38:12  Tue Aug  3 - loguniq 10, logpos 0x1816018, timestamp: 0x4a739 Interval: 722

13:38:12  Maximum server connections 0
and you are ready to start HDR.
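If you script this step, you may want to confirm the Secondary is in Fast Recovery before running onmode -d. A minimal sketch that pulls the mode out of an onstat banner line, assuming the " -- " separated layout shown in the outputs in this post (ids_mode is a hypothetical name):

```shell
# Hypothetical helper: print the mode field (2nd " -- " field) of a banner line.
ids_mode() {
    printf '%s\n' "$1" | awk -F ' -- ' '/Version/ { print $2; exit }'
}

# Example:
# [ "$(ids_mode "$(onstat - | grep Version)")" = "Fast Recovery" ] && echo "ready for onmode -d"
```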

Starting HDR

Start HDR on the Primary with the onmode -d primary command. This tells Informix that this is a Primary HDR server and that its Secondary is blogsvr02, reached through its dedicated HDR alias, blogsvr02_hdr.
informix@blogsvr01> onmode -d primary blogsvr02_hdr
Start HDR on the Secondary with the onmode -d secondary command. This tells Informix that this is a Secondary HDR server and that its Primary is blogsvr01, again through its HDR alias.
informix@blogsvr02> onmode -d secondary blogsvr01_hdr
The two servers will connect, and once the Secondary has cleared its logical logs and received all of the logical log records from the Primary, the HDR setup is complete.
informix@blogsvr02> onstat -m

IBM Informix Dynamic Server Version 11.50.UC7IE -- Updatable (Sec) -- Up 00:05:12 -- 1164976 Kbytes

Message Log File: /opt/informix-ids-11.50.UC7IE/tmp/online.log
13:42:09  Updates from secondary allowed
13:42:09  DR: Secondary server needs failure recovery

13:42:10  DR: Failure recovery from disk in progress ...
13:42:10  Logical Recovery Started.
13:42:10  10 recovery worker threads will be started.
13:42:10  Start Logical Recovery - Start Log 10, End Log ?
13:42:10  Starting Log Position - 10 0x1816018
13:42:10  Clearing the physical and logical logs has started
13:42:46  Cleared 3059 MB of the physical and logical logs in 36 seconds
13:42:48  Started processing open transactions on secondary during startup
13:42:48  Finished processing open transactions on secondary during startup.
13:42:48  DR: HDR secondary server operational
13:42:49  B-tree scanners disabled.
13:42:50  Checkpoint Completed:  duration was 0 seconds.
13:42:50  Tue Aug  3 - loguniq 10, logpos 0x181e018, timestamp: 0x4a7af Interval: 723

13:42:50  Maximum server connections 0
13:42:50  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 14, Llog used 0
You really don't have to do anything else from this point forward to administer HDR. Just sit back and relax; your data is safer now.
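If you'd rather have a script wait for HDR to come up than keep checking onstat -m, a polling sketch like this works on the message log. wait_for_log_msg is a made-up name, and as its comment notes, it will also match messages left over from an earlier start-up.

```shell
# Hypothetical helper: wait up to TIMEOUT seconds for MSG to appear in LOG.
# Returns 0 when found, 1 on timeout. Caveat: it searches the whole file, so
# an old message from a previous start-up also counts.
wait_for_log_msg() {
    log=$1 msg=$2 timeout=$3
    waited=0
    while ! grep -qF -- "$msg" "$log" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example:
# wait_for_log_msg "$INFORMIXDIR/tmp/online.log" "DR: HDR secondary server operational" 600
```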

What do I do when the Secondary Server Fails?

If the Secondary fails and the logical log that was current at the time of the failure has not been reused on the Primary (they're circular, remember), you can simply restart the Secondary and it will automatically resync.
informix@blogsvr02> oninit
informix@blogsvr02> tail -40 $INFORMIXDIR/tmp/online.log
14:46:39  DR: ENCRYPT_HDR is 0 (HDR encryption Disabled)
14:46:39  Event notification facility epoll enabled.
14:46:39  IBM Informix Dynamic Server Version 11.50.UC7IE Software Serial Number AAA#B000000
14:46:40  IBM Informix Dynamic Server Initialized -- Shared Memory Initialized.

14:46:40  Started 1 B-tree scanners.
14:46:40  B-tree scanner threshold set at 5000.
14:46:40  B-tree scanner range scan size set to -1.
14:46:40  B-tree scanner ALICE mode set to 6.
14:46:40  B-tree scanner index compression level set to med.
14:46:40  Physical Recovery Started at Page (1:5623).
14:46:40  Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
14:46:40  DR: Trying to connect to primary server = blogsvr01_hdr
14:46:41  Dataskip is now OFF for all dbspaces
14:46:41  Restartable Restore has been ENABLED
14:46:41  Recovery Mode
14:46:45  DR: Secondary server connected
14:46:46  Updates from secondary allowed
14:46:46  Updates from secondary allowed
14:46:46  DR: Using default behavior of failure-recovering Secondary server

14:46:47  DR: Failure recovery from disk in progress ...
14:46:47  Logical Recovery Started.
14:46:47  10 recovery worker threads will be started.
14:46:47  Start Logical Recovery - Start Log 10, End Log ?
14:46:47  Starting Log Position - 10 0x182e018
14:46:48  Started processing open transactions on secondary during startup
14:46:48  Finished processing open transactions on secondary during startup.
14:46:48  DR: HDR secondary server operational
14:46:49  Logical Log 10 Complete, timestamp: 0x4a92d.
14:46:50  Logical Log 11 Complete, timestamp: 0x4a944.
14:46:51  Logical Log 12 Complete, timestamp: 0x4a975.
14:46:52  Logical Log 13 Complete, timestamp: 0x4a987.
14:46:54  B-tree scanners disabled.
14:46:55  Checkpoint Completed:  duration was 0 seconds.
14:46:55  Tue Aug  3 - loguniq 14, logpos 0x9018, timestamp: 0x4a9a4 Interval: 729

14:46:55  Maximum server connections 0
14:46:55  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 15, Llog used 0
If your Secondary has been down for a while and the logical logs have rolled over, there are two ways to recover: the easy way and the hard way.

The easy way is to reinitialize HDR by restoring the Primary to the Secondary again and running onmode -d secondary blogsvr01_hdr on the Secondary.

The hard way is to restart the Secondary and, when you see this message in the online.log:
15:03:21  DR: Start failure recovery from tape ...
you can perform a Logical Restore on the Secondary using the logical log backups from the Primary. If you're backing up to a directory, copy the necessary logical log backups from the Primary to the Secondary, rename each backup so it carries the Secondary's name (blogsvr02) instead of the Primary's (blogsvr01), and use ontape -l -d to perform the Logical Restore.
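The rename step below uses a script that isn't shown in this post. Here is a hypothetical stand-in, assuming the backups follow the NAME_0_LogNNNNNNNNNN pattern in the listing and only the leading server name has to change (the middle field is left untouched):

```shell
# Hypothetical helper: rename FROM_*_Log* backups in DIR to TO_*_Log*.
rename_log_backups() {
    dir=$1 from=$2 to=$3
    for f in "$dir/$from"_*_Log*; do
        [ -e "$f" ] || continue              # glob matched nothing
        suffix=${f#"$dir/$from"}             # e.g. _0_Log0000000014
        mv -- "$f" "$dir/$to$suffix" || return 1
    done
    return 0
}

# Example, on blogsvr02 in the directory the backups were copied to:
# rename_log_backups /home/informix/backup/llog blogsvr01 blogsvr02
```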
informix@blogsvr02> scp blogsvr01:/home/informix/backup/llog/* .
blogsvr01_0_Log0000000008                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000009                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000010                                                              100% 1440KB   1.4MB/s   00:00
blogsvr01_0_Log0000000011                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000012                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000013                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000014                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000015                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000016                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000017                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000018                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000019                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000020                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000021                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000022                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000023                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000024                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000025                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000026                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000027                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000028                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000029                                                              100%   96KB  96.0KB/s   00:00
blogsvr01_0_Log0000000030                                                              100%   96KB  96.0KB/s   00:00

informix@blogsvr02> script_i_made_to_rename_the_files.ksh
informix@blogsvr02> ls -l /home/informix/backup/llog
total 3648
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000008
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000009
-rw-rw---- 1 informix informix 1474560 Aug  3 14:56 blogsvr02_0_Log0000000010
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000011
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000012
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000013
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000014
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000015
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000016
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000017
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000018
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000019
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000020
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000021
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000022
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000023
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000024
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000025
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000026
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000027
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000028
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000029
-rw-rw---- 1 informix informix   98304 Aug  3 14:56 blogsvr02_0_Log0000000030

informix@blogsvr02> ontape -l -d
Roll forward should start with log number 14
Restore is using file /home/informix/backup/llog/blogsvr02_0_Log0000000014 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000014 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000015 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000016 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000017 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000018 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000019 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000020 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000021 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000022 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000023 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000024 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000025 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000026 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000027 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000028 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000029 ...
Using the backup and restore filter /bin/gunzip.
Rollforward log file /home/informix/backup/llog/blogsvr02_0_Log0000000030 ...

Program over.

informix@blogsvr02> tail -46 $INFORMIXDIR/tmp/online.log
15:03:21  DR: Start failure recovery from tape ...
15:03:28  Logical Recovery Started.
15:03:28  10 recovery worker threads will be started.
15:03:28  Start Logical Recovery - Start Log 14, End Log ?
15:03:28  Starting Log Position - 14 0x9018
15:03:29  Started processing open transactions on secondary during startup
15:03:29  Finished processing open transactions on secondary during startup.
15:03:29  DR: HDR secondary server operational
15:03:29  Logical Log 14 Complete, timestamp: 0x4a9e2.
15:03:29  Logical Log 15 Complete, timestamp: 0x4a9f9.
15:03:29  Logical Log 16 Complete, timestamp: 0x4aa0b.
15:03:29  Logical Log 17 Complete, timestamp: 0x4aa0b.
15:03:29  Logical Log 18 Complete, timestamp: 0x4aa33.
15:03:29  Logical Log 19 Complete, timestamp: 0x4aa45.
15:03:29  Logical Log 20 Complete, timestamp: 0x4aa45.
15:03:29  Logical Log 21 Complete, timestamp: 0x4aa6a.
15:03:29  Logical Log 22 Complete, timestamp: 0x4aa6a.
15:03:29  Logical Log 23 Complete, timestamp: 0x4aa94.
15:03:29  Logical Log 24 Complete, timestamp: 0x4aaa6.
15:03:29  Logical Log 25 Complete, timestamp: 0x4aaa6.
15:03:29  Logical Log 26 Complete, timestamp: 0x4aace.
15:03:29  Logical Log 27 Complete, timestamp: 0x4aace.
15:03:29  Logical Log 28 Complete, timestamp: 0x4aaf2.
15:03:29  Checkpoint Completed:  duration was 0 seconds.
15:03:29  Tue Aug  3 - loguniq 29, logpos 0x18, timestamp: 0x4aafc Interval: 730

15:03:29  Maximum server connections 0
15:03:29  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 16, Llog used 0

15:03:30  Logical Log 29 Complete, timestamp: 0x4ab1f.
15:03:33  DR: Failure recovery from disk in progress ...
15:03:33  Logical Log 30 Complete, timestamp: 0x4ae61.
15:03:33  Checkpoint Completed:  duration was 0 seconds.
15:03:33  Tue Aug  3 - loguniq 31, logpos 0x15018, timestamp: 0x4aea4 Interval: 731

15:03:33  Maximum server connections 0
15:03:33  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 13, Llog used 0

15:03:33  Checkpoint Completed:  duration was 0 seconds.
15:03:33  Tue Aug  3 - loguniq 31, logpos 0x17018, timestamp: 0x4aeaa Interval: 732

15:03:33  Maximum server connections 0
15:03:33  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 0, Llog used 0

15:03:35  B-tree scanners disabled.
15:03:36  Checkpoint Completed:  duration was 0 seconds.
15:03:36  Tue Aug  3 - loguniq 31, logpos 0x20018, timestamp: 0x4aed2 Interval: 733
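Notice that ontape started rolling forward at log 14 even though logs 8 through 30 were copied. Once you know the starting log, a sketch like this lists only the backups you actually need to copy; it assumes the same file naming as above with zero-padded log numbers, and logs_from is a made-up name.

```shell
# Hypothetical helper: print log backup files in DIR whose log number >= START.
logs_from() {
    dir=$1 start=$2
    for f in "$dir"/*_Log*; do
        [ -e "$f" ] || continue                      # glob matched nothing
        num=${f##*_Log}                              # e.g. 0000000014
        case $num in ''|*[!0-9]*) continue ;; esac   # skip unexpected names
        num=${num#"${num%%[!0]*}"}                   # strip leading zeros: 14
        [ -n "$num" ] || num=0                       # all zeros means log 0
        if [ "$num" -ge "$start" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example, on blogsvr01: logs_from /home/informix/backup/llog 14
```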

What do I do when the Primary Fails?

When your Primary fails, you can quickly make the Secondary a Standalone (i.e., non-HDR) server. You'll need to do this even if you have configured an Updatable Secondary, since writes on a Secondary are sent to the Primary under the covers.

Make the Secondary a Standalone server with the onmode -d standard command:
informix@blogsvr02> onmode -d standard
informix@blogsvr02> onstat -m

IBM Informix Dynamic Server Version 11.50.UC7IE -- On-Line -- Up 00:30:32 -- 1164976 Kbytes

Message Log File: /opt/informix-ids-11.50.UC7IE/tmp/online.log
15:38:26  Logical Recovery Complete.
15:38:27  Quiescent Mode
15:38:27  Checkpoint Completed:  duration was 0 seconds.
15:38:27  Tue Aug  3 - loguniq 31, logpos 0x4c018, timestamp: 0x4b0a6 Interval: 740

15:38:27  Maximum server connections 0
15:38:27  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 0, Llog used 1

15:38:27  Started 1 B-tree scanners.
15:38:27  B-tree scanner threshold set at 5000.
15:38:27  B-tree scanner range scan size set to -1.
15:38:27  B-tree scanner ALICE mode set to 6.
15:38:27  B-tree scanner index compression level set to med.
15:38:27  DR: Reservation of the last logical log for log backup turned on
15:38:27  SCHAPI: Started dbScheduler thread.
15:38:27  DR: new type = standard
15:38:27  Booting Language  from module <>
15:38:27  Loading Module 
15:38:27  SCHAPI: Started 2 dbWorker threads.
15:38:28  On-Line Mode
When the old Primary is fixed and ready to be brought back online, you have two options.

Option 1 is to reinitialize HDR just like we did when setting it up for the first time, except that now blogsvr02 will be the Primary and blogsvr01 will be the Secondary. I like this option because it doesn't require any downtime.

Option 2 is to make blogsvr01 the Primary again, which is easier if the logs have not rolled over on blogsvr02. This requires some downtime and assumes that the disks on blogsvr01 were not the reason it went down and that all of its data is still intact.

Switch blogsvr02 to Quiescent Mode
informix@blogsvr02> onmode -s
Change the HDR status of blogsvr02 to Secondary
informix@blogsvr02> onmode -d secondary blogsvr01_hdr
Start Informix on the Primary
informix@blogsvr01> oninit
If the logical logs have rolled over on the Secondary (blogsvr02) while it was Standalone, you will need to do what we did before: copy the logical log backups you need from blogsvr02 to blogsvr01, rename them, and run ontape -l -d on blogsvr01.

If everything works as advertised, the Secondary will ship over the logs the Primary needs, they will be applied to the Primary, and HDR will be restored.

Pretty cool stuff that has saved my butt more than a couple of times.

6 comments:

  1. Hi Andrew, I have a simple question for you. Can I use the secondary server to take a backup? And can I use that backup to restore the primary in case of a catastrophic failure (both servers dead, maybe an earthquake or something)?
    Thanks in advance,
    Manuel

    1. The secondary can not be used to take a backup, this can only be performed against the primary.

      If you are concerned about geographical redundancy you should take a look at RSS nodes in a different location if you can afford it.

  2. Hi Andrew,

    Awesomely documented process. Very clear, concise and easy to understand.

    Only one thing is perhaps missing: you could include a simple start and stop procedure (for those who are new to HDR) for when you just need to shut down the servers (say, for hardware maintenance).

    Thanks

  3. Hi Andrew,
    My name is Roberto. I would like to have the commands to check if replication is ok (up & running) on both sides, primary / secondary.
    Can you please help with this?

  4. Hi Roberto, I'm not Andrew, but just type onstat -g dri on either server.

  5. I have a problem. I am new to Informix. I don't have backups of my databases and I ran oninit -ivy. Now all my databases are gone. How can I get them back?
