This post is for users of EqualLogic PS Series SANs. If you have recently upgraded member firmware to version 6.x, you might have noticed a change in behaviour for failed disks.
Prior to firmware V6.x there was no advance warning of a disk failure. Once a disk failed, your array would move into a rebuilding phase using a hot spare disk. You would then receive a collection of alarms telling you that you had a failed disk, fewer spare drives, and that the array was being rebuilt.
New in firmware version 6.x is an early disk failure detection algorithm. This is a great improvement over previous firmware versions. The EqualLogic member communicates with the HDD firmware, and when it detects a pending failure it initiates a block level copy from the source disk to the hot spare disk. When the block level copy is complete, the hot spare becomes the active disk and the source disk is marked as failed. The benefit of this approach is that no RAID rebuild phase needs to happen, which can take a significant amount of time with lots of disks. A RAID rebuild can also affect the performance of the SAN, an impact that the predictive failure copy mitigates.
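To make the sequence described above concrete, here is a minimal Python sketch of the state changes as I understand them. It is a toy model and not EqualLogic firmware code: the `Disk` class, the `DiskState` names (including the intermediate `COPYING` state), the slot numbers, and the `predictive_copy` function are all hypothetical names I made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class DiskState(Enum):
    # State names are assumptions for this sketch, not firmware terminology.
    ONLINE = "online"
    SPARE = "spare"
    COPYING = "copying"
    FAILED = "failed"


@dataclass
class Disk:
    slot: int
    state: DiskState
    blocks: list = field(default_factory=list)


def predictive_copy(source: Disk, spare: Disk) -> None:
    """Copy a disk flagged as failing onto a hot spare, then swap their roles."""
    if spare.state is not DiskState.SPARE:
        raise ValueError("target disk is not a hot spare")
    spare.state = DiskState.COPYING
    # Straight block level copy from the still-readable source disk.
    # No parity reconstruction from the other RAID members is involved.
    spare.blocks = list(source.blocks)
    spare.state = DiskState.ONLINE   # the hot spare becomes the active disk
    source.state = DiskState.FAILED  # the source disk is marked as failed


# Hypothetical slots: disk 3 is predicted to fail, disk 15 is the hot spare.
source = Disk(slot=3, state=DiskState.ONLINE, blocks=[b"a", b"b", b"c"])
spare = Disk(slot=15, state=DiskState.SPARE)
predictive_copy(source, spare)
```

The point of the sketch is that the data comes from a single disk that is still readable, so the other members of the RAID set are not read to reconstruct it, which is where the time and performance saving comes from.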
The only downside I’ve seen of this is the potential for a higher number of disk failures. In some of the early PS Series SANs, certain types of disks, namely SATA, had very aggressive disk failure algorithms built into the HDD firmware. This meant there was a higher than normal number of false positive disk failures. It was later corrected with an HDD firmware update that was recommended to all customers and can still be found on the EqualLogic support site. While I haven’t seen as many disk failures as in years past, I have noticed a slight increase in failed disks on arrays running firmware version 6.x since the HDD firmware upgrades.
Predictive disk failure starting a copy to a Hot-Spare disk.
Completion of copy results in a failed source disk.