We plan to initialize disk mirroring for Unix/Windows LUNs with Hitachi's HORCA between an HDS9960 and an HDS7700E. In the relevant HDS manual (chapter "Consistency Group Options") we found the following remark: "Important: The copy pending timeout value should be less than the I/O timeout value of the host system."
Can you clarify which I/O timeout is meant (AIX, NT, W2K), why the two values depend on each other, and what can happen if we ignore the hint?
Thank you very much and best regards.
First let's talk about what a consistency group is. When you want to remote copy something like a database application, where data may be spread across multiple LUNs and there are dependencies involved (like transaction logs), it is prudent to place all dependent LUNs pertaining to that application in a "consistency group." This means that if any LUN in the group fails to copy its data to the remote site, the whole group is failed together. This is important, since having half of a database is not a good thing. A consistency group is treated as a single entity for remote copy purposes.
You can also specify whether or not you want the application to be aware of any remote copy problems. Say one LUN cannot be updated at the remote site because of a hardware failure, or the remote site is unavailable because of a power outage. You can specify whether to notify the host and suspend the application, or to continue production at the production site regardless of the failure. In either case, any changes to the primary site LUNs are recorded and then replicated once the remote site is available again. This setting is called the "fence level" of the remote copy consistency group.
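To make the two behaviors concrete, here is a minimal sketch in Python. It is a toy model, not Hitachi's implementation or API: the class name, the suspend_host_on_failure flag (a stand-in for the fence level), and the return strings are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistencyGroup:
    """Toy model: a set of dependent LUNs that fail together."""
    luns: list
    suspend_host_on_failure: bool  # hypothetical stand-in for the fence level
    suspended: bool = False
    pending_changes: list = field(default_factory=list)

    def write(self, lun, block, remote_ok=True):
        if lun not in self.luns:
            raise ValueError(f"{lun} is not in this group")
        # One failed remote copy suspends the whole group, not just one LUN.
        if not remote_ok:
            self.suspended = True
        if self.suspended:
            if self.suspend_host_on_failure:
                # Host is told about the failure and production stops.
                return "rejected"
            # Production continues; the change is tracked for later resync.
            self.pending_changes.append((lun, block))
            return "local-only"
        return "mirrored"
```

In this sketch, once a write to the "log" LUN fails remotely, a later write to the "data" LUN is no longer mirrored either, which is the all-or-nothing behavior described above.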
The reason the copy pending timeout value should be less than the I/O timeout value of the host is this: if the fence level is set to suspend the pair on any remote volume error, and you also want to suspend production operations in case of a remote copy failure, then you want the subsystem to notify the server before any server-based timeout occurs. Otherwise the host's own I/O timeout can expire first, before the subsystem has reported the failure. When the subsystem notifies the server first, both sides stop at the same time and your data is consistent on both sides up to the time of the remote write failure.
The default copy pending timeout value is usually less than any O/S I/O timeout. If you have played with your HBA configuration files and changed a value such as "IO_RETRY TIMEOUT" for the HBA driver, then you should change the setting on the consistency groups to be less than that timeout value. Some customers change the HBA value when they use two host bus adapters for HA to the storage and want a shorter timeout on HBA failure, so that the path fails over faster.
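The rule can be written down as a simple check. The following is a minimal sketch in Python, not a Hitachi or HBA vendor tool: the function names are hypothetical, and the timeout values must come from your own array and host configuration.

```python
def copy_pending_timeout_is_safe(copy_pending_timeout_s, host_io_timeout_s):
    """Return True when the array would report a remote copy failure
    before the host's own I/O timeout expires."""
    if copy_pending_timeout_s <= 0 or host_io_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    return copy_pending_timeout_s < host_io_timeout_s


def smallest_host_timeout(host_timeouts_s):
    """When several hosts or HBAs use the same group, the shortest
    host-side timeout is the one the array setting must stay below."""
    return min(host_timeouts_s.values())
```

For example, if you shortened the HBA timeout on one server to speed up path failover, pass the per-host values to smallest_host_timeout() and compare the result against the consistency group setting with copy_pending_timeout_is_safe().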
This was first published in February 2002