Best Practice Guide to IBM i FlashSystem Replication

Quick Best Practice Guide to IBM i FlashSystem Replication

Because the operating system and the application database are uniquely coupled in an IBM i production partition, every disk I/O between host and storage requires equally good performance.

In general, the more volumes that are available to IBM i, the better the performance. The following are the reasons for this:

If more volumes are attached to IBM i, storage management uses more threads, which enables better performance. More volumes also provide higher I/O concurrency, which reduces the likelihood of I/O queuing and therefore the wait-time component of disk response time, resulting in lower latency for disk I/O operations.
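As a rough illustration of why this concurrency matters, the sketch below treats each volume as an independent M/M/1 queue and shows how spreading the same workload over more volumes shrinks the wait-time component. The IOPS total and per-I/O service time are invented figures for the example, not measurements.

```python
# Illustrative only: M/M/1 queueing approximation showing how spreading the
# same IBM i workload over more volumes reduces the wait-time component of
# disk response time. Workload figures are assumptions for the example.

def avg_response_ms(total_iops, volumes, service_ms):
    """Mean response time per I/O for one volume treated as an M/M/1 queue."""
    per_vol_iops = total_iops / volumes
    utilisation = per_vol_iops * (service_ms / 1000.0)  # rho = arrival rate * service time
    if utilisation >= 1.0:
        return float("inf")                             # queue is saturated
    return service_ms / (1.0 - utilisation)             # service time + wait time

total_iops = 12000       # assumed aggregate host I/O rate
service_ms = 0.2         # assumed per-I/O service time at the volume

for n in (4, 8, 16, 32):
    print(f"{n:>2} volumes -> {avg_response_ms(total_iops, n, service_ms):.2f} ms")
```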

IBM’s guidance is to present volumes no larger than about 200 GB to IBM i production partitions, and to present at least 16 volumes if sub-millisecond response times are required.
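A minimal sketch of how that guidance translates into a volume layout; the 6 TB capacity figure is an assumption chosen only for the example.

```python
# Illustrative sizing arithmetic for the guidance above: volumes of at most
# ~200 GB and a floor of 16 volumes. The 6 TB ASP capacity is assumed.
import math

asp_capacity_gb = 6000   # assumed usable capacity needed by the partition
max_volume_gb = 200      # guidance: keep volumes below ~200 GB
min_volumes = 16         # guidance: floor for sub-millisecond response times

volumes = max(min_volumes, math.ceil(asp_capacity_gb / max_volume_gb))
volume_size_gb = asp_capacity_gb / volumes
print(f"{volumes} volumes of {volume_size_gb:.0f} GB each")
# -> 30 volumes of 200 GB each
```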

Unless you use independent auxiliary storage pools (IASPs) to house user data, you must replicate the entire system as a single consistent entity. FlashSystem consistency groups or volume groups ensure data integrity across the multiple volumes presented to the IBM i partition.
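As a sketch of what that looks like in practice, the snippet below drives the Spectrum Virtualize CLI used by FlashSystem to place every volume of the partition into one remote-copy consistency group. The cluster name, volume names, and the ssh wrapper are assumptions; verify the command syntax against your firmware level.

```python
# Sketch: build one remote-copy consistency group spanning every volume of
# the IBM i partition, via the Spectrum Virtualize CLI over ssh.
# Cluster and volume names are placeholders; verify flags on your firmware.
import subprocess

REMOTE = "FS_SITE_B"                                 # assumed partner system
VOLUMES = [f"IBMI_VOL_{i:02d}" for i in range(16)]   # assumed volume names

def cli(cmd):
    subprocess.run(["ssh", "superuser@fs_site_a", cmd], check=True)

cli(f"mkrcconsistgrp -cluster {REMOTE} -name IBMI_PROD_CG")
for vol in VOLUMES:
    # Metro Mirror relationship; add -global for Global Mirror instead.
    cli(f"mkrcrelationship -master {vol} -aux {vol} "
        f"-cluster {REMOTE} -consistgrp IBMI_PROD_CG")
cli("startrcconsistgrp -primary master IBMI_PROD_CG")
```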

IBM FlashSystem Copy Services provide synchronous replication (Metro Mirror) or asynchronous replication (Global Mirror) for IBM i workloads. Under Metro Mirror, a host write is committed to the upper cache at both sites before the write is acknowledged to the host. Under Global Mirror, a host write is sequenced for replication and committed to the local upper cache while also being sent to the remote secondary system, so the acknowledgement is returned to the host without waiting for receipt at the remote site.
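A back-of-envelope model of the host-visible difference between the two write paths; the local commit time and inter-site round-trip time are assumed figures chosen only to show the shape of the difference.

```python
# Illustrative latency model for the write paths described above.
# Both timing figures are assumptions, not measurements.

local_commit_ms = 0.3    # assumed time to commit a write to upper cache
inter_site_rtt_ms = 2.0  # assumed round-trip time between the two sites

# Metro Mirror: acknowledged only after both sites hold the write in cache,
# so the inter-site round trip and the remote commit sit on the host path.
metro_mirror_ms = local_commit_ms + inter_site_rtt_ms + local_commit_ms

# Global Mirror: acknowledged after the local commit; replication continues
# in the background, so the RTT does not appear in host write latency.
global_mirror_ms = local_commit_ms

print(f"Metro Mirror host write  ~{metro_mirror_ms:.1f} ms")
print(f"Global Mirror host write ~{global_mirror_ms:.1f} ms")
```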

The sizing of the required replication link bandwidth for Metro Mirror or Global Mirror must be based on the peak write data rate of the IBM i workload to avoid affecting production performance. The FlashSystem must also have the remote copy partnership bandwidth parameter set to match the actual capability of the network. For Global Mirror, if the capabilities of the system configuration and bandwidth are exceeded, the system becomes backlogged and the hosts experience higher latencies on their write I/O. Such delays are felt acutely in an IBM i partition because all writes require equivalent performance, whether they are for OS functions, temporary storage, or production database writes.
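A minimal sizing sketch: convert the peak write data rate into a link requirement with headroom. The peak rate, protocol overhead, and headroom factors are assumptions to adjust for your own environment.

```python
# Illustrative link sizing from the peak write rate, as described above.
# Peak rate, overhead, and headroom figures are assumptions for the example.

peak_write_mbps = 180.0    # assumed peak host write data rate, MB/s
protocol_overhead = 1.10   # assumed ~10% replication protocol overhead
headroom = 1.20            # assumed 20% headroom above the observed peak

required_megabits = peak_write_mbps * 8 * protocol_overhead * headroom
print(f"Provision at least {required_megabits:.0f} Mbit/s and set the "
      f"partnership link bandwidth parameter to match")
# 180 MB/s * 8 * 1.1 * 1.2 -> ~1901 Mbit/s
```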

For analysis of the impact Global Mirror or Metro Mirror is having on your IBM i workload, use IBM i performance tools and review the service time and overall response time in the system report disk utilisation feature. IBM Storage Insights collects metrics from the FlashSystem and reports on waits attributed to Global Mirror replication.

In particular, look at Secondary Write Lag, the length of time in milliseconds that replication writes remain outstanding from the primary system. Also compare the Write Data Rate metric for the primary and secondary volumes: if the secondary volumes appear to hit a ceiling that is not evident on the primary volumes, it is likely you have reached the limit of the bandwidth you have provisioned.
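A sketch of that comparison, assuming the primary and secondary Write Data Rate series have been exported (for example from IBM Storage Insights) into two lists of MB/s samples; the sample data, export format, and the 5% flatness threshold are all assumptions for the example.

```python
# Sketch: flag a possible bandwidth ceiling by comparing exported Write Data
# Rate samples (MB/s) for primary vs secondary volumes. Sample data and the
# 5% flatness threshold are assumptions for the example.

primary = [120, 250, 410, 520, 480, 300]    # assumed exported samples, MB/s
secondary = [120, 250, 300, 305, 302, 298]  # assumed exported samples, MB/s

peak_secondary = max(secondary)
# Count intervals where the secondary sits within 5% of its own peak while
# the primary pushes well above it: the signature of a link ceiling.
ceiling_hits = sum(
    1 for p, s in zip(primary, secondary)
    if s >= 0.95 * peak_secondary and p > 1.2 * peak_secondary
)
if ceiling_hits:
    print(f"Secondary flat-lined near {peak_secondary} MB/s in "
          f"{ceiling_hits} interval(s): check replication link bandwidth")
```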

For replication of IBM i workloads across an IP partnership between two FlashSystems, it is recommended to use Global Mirror with Change Volumes, which uses a cycling mode to replicate at intervals of 5 minutes or more. This completely decouples the host write from replication, which is good for performance but comes at the cost of a longer RPO.
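A quick sketch of that RPO trade-off: with cycling replication, one cycle's changes are still accumulating locally while the previous cycle's changes transfer, so data-loss exposure can approach two cycle periods (assuming each transfer completes within its cycle period). The cycle periods shown are examples.

```python
# Illustrative worst-case RPO for Global Mirror with Change Volumes: while
# one cycle transfers, the next cycle's changes accumulate locally, so the
# exposure can approach two cycle periods (assuming each transfer finishes
# within its cycle period).

for cycle_minutes in (5, 10, 30):
    worst_case_rpo = 2 * cycle_minutes
    print(f"cycle period {cycle_minutes:>2} min -> "
          f"worst-case RPO ~{worst_case_rpo} min")
```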