To Delete or “Not Delete,” That Is the Question
By John Brunett, Senior Software Engineer
I was recently asked the following ISAM performance question: “What is the best and fastest way to delete records from an ISAM file?” Most Synergy applications have files that grow larger day after day (like transaction files) and periodically need to be pruned to a more manageable size. If they are not reduced, maintenance or recovery could become a future issue. Whether this is a monthly, quarterly, or annual task, there are some things to consider in terms of best practices. Waiting too long could require a considerable time investment to complete the task.
A DELETE operation is the most I/O intensive ISAM operation there is, with STORE being the next. The larger the file, the more I/O is necessary to accomplish each individual operation. Many deletes at once can tie up a system for hours, or even days, depending on the size of the file and type of disk.
Let’s assume, for example, that you want to export all transaction records older than two years to an archive ISAM file. But, to do so, you must take down the application system, which will cause an outage. You want the outage to be as short as possible. Is there a better and faster way?
Here’s an example. Let’s say the number of transaction records older than two years is 10% of the total number of records in a 30,000,000-record, 10-key ISAM file. Also, you’ve planned ahead and added a date/time key representing the transaction date that you can quickly read by and then stop upon reaching the next higher date. You know that by using isutl, you can order the key for optimal sequential performance. Once optimized, your delete function now reads sequentially on the date key, writes each record with a matching earlier date to the archive file, and then deletes the record. Upon reaching the first record with a later date, your delete function stops. Finally, you run isutl again to reduce the free space you created from the deleted records. For my test file with the above attributes, the overall time to complete this function on a Windows machine with the file residing on an HDD was over 18 hours.
Approaching the issue from a hardware perspective, you could install a fast solid-state drive. The same operation with my test file residing on an SSD took only about 25 minutes. But maybe you don’t have the option of swapping out your drives. How can you reduce the overall time to something more manageable using the existing hardware?
Let’s consider another method. Instead of deleting the records, let’s “not delete” them. In other words, when you read a matching record and write it to the archive, instead of deleting the record, you’ll do nothing. First, to ensure the fastest sequential read, you should open the target ISAM file in input exclusive mode. This eliminates having to apply locks as well as reread the header and probe the index, since yours will be the only process allowed. When you reach records newer than your target date, you’ll start writing those records to a sequential file. Upon reaching the end of the file, you’ll then use the fconvert utility to create and load a new transaction file that contains all records newer than the archive date. “Why is this faster?” you may ask. Surely, reading and deleting only 10% of the file would be faster than reading and writing 90% of the same file. But it’s not. [See DELETE for more information about writing ISAM records to a sequential file.]
To understand why, you have to understand what a delete operation entails. In a large ISAM file (in this case, 30,000,000 records), the key depth is around 5. That means each time a delete occurs, a minimum of seven reads and seven writes occurs for each key (one for the file header, five for the index blocks, one for the data record). Since the test file contains 10 keys, there would be a minimum of 52 reads and writes for every record deleted. However, when you consider a deletion in a B-tree, tree-level updates are also required to maintain a balanced structure. Combining multiple index blocks may require adjacent blocks to be read and written, doubling or tripling I/O for each key at each level. A single delete in a 10-key (depth 5) file could require upwards of 152 reads and writes. With 3,000,000 deletes, you can start to understand the problem. Without a fast SSD, an HDD drive with a spinning spindle just can’t keep up.
In the “not delete” method, writes to the output sequential file are blocked at 32 KB, which makes best use of the HDD throughput. Similarly, loading the new ISAM file with fconvert makes use of large block output as well as constructing keys on the fly, writing blocks only once. Using this method, I experienced an SSD time of around 10 minutes (instead of 21 minutes) and an HDD time of 1 hour 15 minutes (instead of 18 hours 30 minutes). Also, there’s no reason to rerun isutl to remove free space, as fconvert packs the indices.
You may also find you can use this method in place of STORE, too. A store function that stores a large number of records to a new ISAM file may benefit from writing those records first to a sequential file and running fconvert at the end to generate the ISAM file. Beware, though, that fconvert performs best when creating a new ISAM file or loading to an empty one, as it can use the isutl engine. To append large amounts of data to an ISAM file, it might be faster to unload the existing file to a sequential file first, then load both sequential files by specifying them as input files using fconvert. Hint: Fconvert 11.1.1 has a new “-k nonkey” input option for faster unloads using the new OPEN option described in the next paragraph.
Synergy/DE 11 contains a few new features that improve on these times even more. An OPEN option introduced in version 11.1.1, /nonkey_sequential, reads directly from the data, bypassing the index. The records won’t be in key order, but since you’re reading all the records in the file anyway, you won’t care. For performance, however, you will care. Using my 30,000,000-record test file, SSD reads came in at 2 minutes, 16 seconds versus /nonkey_sequential at 28 seconds, and HDD reads took 4 minutes versus /nonkey_sequential at 1 minute, 37 seconds. Additionally, with version 11, the isutl and fconvert utilities have been optimized for SSD and multi-core CPUs.
So, if your delete operation seems to take too long, consider “not delete.” That is the answer.