Recovering ISAM files after production failures

This guide describes what to do when a production server has experienced one of the following events:

Power failure
OS crash
Hardware failure (disk, RAM, CPU)
kill -9 or Task Manager–based process termination
Restoring a file from a backup that was taken while a store, write, or delete was in progress

Minimizing downtime: What steps can I take?

Before one of the above events occurs, you can take the following steps to minimize downtime:

Use Synergy/DE 11+ with REV 6 ISAM files that have been marked as “resilient.”

Using the RESILIENT file option may remove the need to run isutl -r or isutl -v after the events listed above, except in cases of severe disk or memory failure. When a file marked RESILIENT is opened, Synergy checks special indicators inside the file that show whether the index and data are out of sync. Unlike isutl -v, this check is almost instantaneous and is always performed. The ISAM file must be a full REV 6 file, as indicated by the ipar output for the file (ipar ISAMfile.ism, where ISAMfile.ism is the ISAM file). See isutl -p for more information about ISAM patches.

Use isutl -p to upgrade a file to REV 6 ISAM, add RESILIENT, and make it SGRFA:

isutl -p -qfile=resilient,convert,sgrfa filename

Ensure your backups are consistent.

When noncooperative backups occur, the resulting data may contain partial I/O operations or partial transactions. If this happens, the index file will be corrupt, and data may be in a logically inconsistent state even if what’s on disk is technically not corrupt. The synbackup utility can be used to pause Synergy I/O operations for the short time that it takes a backup provider to take a snapshot of the file system.

Additionally, on Windows, if the backup provider supports VSS, the SynergyVSSWriter service can be used to automatically pause Synergy I/O while snapshots are being taken. This won’t prevent logically inconsistent data if a backup is taken in the middle of an operation such as day-end processing, but the index and data files won’t be corrupted.

Always have the latest isutl on hand to perform any necessary recovery operations.

Even if you are not using the latest Synergy runtime, you can download just the DBMS utilities from the Synergy DBMS Utilities page in the Downloads area of the Synergex Resource Center.

Maintain a list of known ISAM files and their locations on disk.

In an outage situation, it is helpful to have a basic description of each file as well as some indication of its importance. If a file is a temporary/scratch file that the system can delete and recreate automatically, note that in the list. The list of files and their descriptions should be kept in a known location that can be accessed by operations staff or anyone on call after hours.
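As an illustration only, such a list could be kept as a small CSV file and loaded by a script. The column names (path, description, importance, scratch), the 1-is-most-critical scale, and the example paths below are assumptions made for this sketch, not a Synergy convention.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IsamFileEntry:
    path: str          # location of the .ism file on disk
    description: str   # short human-readable description
    importance: int    # hypothetical scale: 1 = most critical
    scratch: bool      # True if the system recreates the file automatically


def load_inventory(csv_text):
    """Parse the inventory and return entries ordered by importance."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = [
        IsamFileEntry(
            path=row["path"],
            description=row["description"],
            importance=int(row["importance"]),
            scratch=row["scratch"].strip().lower() == "yes",
        )
        for row in reader
    ]
    return sorted(entries, key=lambda entry: entry.importance)


def files_to_verify(entries):
    """Return paths worth verifying; scratch files are skipped."""
    return [entry.path for entry in entries if not entry.scratch]


if __name__ == "__main__":
    sample = (
        "path,description,importance,scratch\n"
        "/data/work.ism,Report work file,3,yes\n"
        "/data/orders.ism,Open orders,1,no\n"
    )
    for path in files_to_verify(load_inventory(sample)):
        print(path)
```

Ordering by importance means the most critical files are checked first when time is short, and the scratch flag keeps on-call staff from spending time on files that can simply be deleted.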

Detecting and resolving corruption after a crash

If you have files that are REV 6 resilient

The first open of a resilient REV 6 ISAM file after corruption has occurred will attempt to repair the index. We recommend that you run isutl -s or automate file maintenance with xfServer at the startup of your servers to open all of your ISAM files.

If corruption is detected, the Synergy runtime will attempt to connect to xfServer running on localhost, port 2330, and hand off the index repair operation to xfServer. This step is necessary to prevent other processes from attempting the repair at the same time and to allow the repair to finish even if a user terminates the Synergy application that’s currently running. While this repair operation is taking place, OPEN statements in read-only mode may succeed, and READ/READS/FIND operations may also succeed.

See Resynchronization and resilience for more information.

If you have files that are not REV 6 resilient

After a crash you can take one of the following paths:

Run the system and monitor for errors.
Verify/repair all known important ISAM files.

If you take the monitor-for-errors approach, it’s important to know how errors are exposed in your application. Some applications, for example, will loop endlessly, retrying a failed ISAM operation. If this occurs during an xfServerPlus operation, you may exhaust your pool of connections or run out of licenses. In that case, you won’t be able to see which request is causing the failure.

If you are unsure how your application will respond to ISAM read/write failures, we recommend that you verify/repair the known important ISAM files rather than relying on monitoring for errors.

You can run isutl -r to verify an ISAM file’s integrity. (We no longer recommend isutl -v for this purpose. Running isutl -r takes about as long as running isutl -v, but if corruption is detected, you won’t have to run isutl a second time.) The table below shows the difference between running isutl with -r versus -ra:

Option	Description
isutl -r	Operates only on the index. isutl won’t attempt to repair corruption in a data file, but isutl versions 11.1.1i and later do attempt to leave a usable file so that an immediate isutl -ra can be avoided.
isutl -ra	Attempts to repair corruption that has occurred in a data file. The quality of this repair operation depends on indicators that are left in the data file. The best way to get a good result from this sort of repair operation is to enable compression and SGRFA on your files. Both can be turned on with isutl -r -c -qfile=sgrfa. Using -ra is not recommended unless isutl -r tells you to do so.

A good default choice is isutl -r, in which case verify/repair looks something like the following pseudocode:

Set the environment variable ISUTLLOG
Set the environment variable ISLOGMAX
For each known_file in known_files_list
    isutl -r known_file

For additional options that may dramatically improve performance, refer to isutl -r.
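The pseudocode above might be sketched in Python roughly as follows. This is a minimal sketch, not a supported tool: it assumes isutl is on the PATH, the function name verify_files and its parameters are hypothetical, the ISLOGMAX value is passed through exactly as you supply it, and the exit code is recorded without interpretation because the log pointed to by ISUTLLOG is the place to check for reported corruption.

```python
import os
import subprocess


def verify_files(known_files, log_path, log_max=None, run=subprocess.run):
    """Run `isutl -r` on each file with ISUTLLOG (and optionally ISLOGMAX) set.

    Returns a dict mapping each file to the exit code of its isutl run.
    The `run` parameter exists only so the loop can be exercised without isutl.
    """
    env = dict(os.environ)
    env["ISUTLLOG"] = log_path              # where isutl writes its log
    if log_max is not None:
        env["ISLOGMAX"] = str(log_max)      # value format: see the isutl documentation

    results = {}
    for known_file in known_files:
        completed = run(["isutl", "-r", known_file], env=env)
        results[known_file] = completed.returncode
    return results


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python verify_isam.py /var/log/isutl.log file1.ism file2.ism
    log_file, *files = sys.argv[1:]
    for name, code in verify_files(files, log_file).items():
        print(f"{name}: isutl exit code {code}")
```

Running the files one at a time keeps disk contention predictable; review the log after the loop finishes rather than relying on the exit codes alone.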

Depending on the amount of system resources available, you may be able to run a few of these isutl operations concurrently. You should also be aware of file fragmentation. Rebuilding a large file causes additional large files to be created while the original still exists, which may cause significant file system fragmentation. Be sure to defragment file systems after significant isutl -r operations. In worst-case scenarios, fragmentation may prevent additional records from being stored or cause isutl -r to fail even though there is enough free disk space overall.
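One way to run a few isutl operations at a time is a small worker pool. The sketch below is an assumption-laden illustration: the function name verify_concurrently is hypothetical, the default of two workers is arbitrary and should be tuned to your disk and memory, and ISUTLLOG and ISLOGMAX are assumed to be set already in the environment the script inherits.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def verify_concurrently(known_files, max_workers=2, run=subprocess.run):
    """Run `isutl -r` on several files, at most `max_workers` at a time.

    Returns a dict mapping each file to the exit code of its isutl run.
    """
    def verify_one(path):
        # Each worker thread blocks on one isutl process until it exits.
        return path, run(["isutl", "-r", path]).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(verify_one, known_files))


if __name__ == "__main__":
    import sys

    for name, code in verify_concurrently(sys.argv[1:]).items():
        print(f"{name}: isutl exit code {code}")
```

Keeping the pool small limits how many temporary rebuild files exist at once, which matters for both free space and fragmentation.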

The isutl.log file pointed to by ISUTLLOG will tell you whether any files had corruption in either the index or the data file. If data file corruption is indicated, you should do the following:

Wait for an acceptable outage window.

Make a copy of the .ism/.is1 file.

Run isutl -ra on the file to repair the data corruption.

Nonrecoverable data will be logged, and you can use it in conjunction with your application logs to recover any missing information manually.
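The copy-then-repair steps above could be scripted along these lines once the outage window has started. This is a sketch under stated assumptions: the function name backup_and_repair and the backup directory are hypothetical, isutl is assumed to be on the PATH, and the data file is assumed to sit next to the .ism file with the same base name and an .is1 extension.

```python
import shutil
import subprocess
from pathlib import Path


def backup_and_repair(ism_path, backup_dir, run=subprocess.run):
    """Copy the .ism/.is1 pair to backup_dir, then run `isutl -ra` on the file.

    Returns (names of the files copied, isutl exit code).
    """
    ism = Path(ism_path)
    if not ism.exists():
        raise FileNotFoundError(ism)

    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for part in (ism, ism.with_suffix(".is1")):
        if part.exists():
            shutil.copy2(part, target / part.name)  # keeps timestamps
            copied.append(part.name)

    # Repair runs only after the copies have been made.
    completed = run(["isutl", "-ra", str(ism)])
    return copied, completed.returncode


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python repair_isam.py /data/orders.ism /backup/pre-repair
    names, code = backup_and_repair(sys.argv[1], sys.argv[2])
    print(f"copied {names}; isutl -ra exit code {code}")
```

Copying first means the original file is still available for comparison if the repair discards records that you later need to recover manually.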