I'm currently evaluating upgrade options for our primary Windows server storage volumes. As part of the upgrade, I want to address how long it takes to run CHKDSK on large, busy Windows volumes. In the days when storage volumes were up to a few hundred MB of DAS RAID, this could usually be done in a few hours.
However, with current systems with multi-TB volumes, especially those with SATA drives and iSCSI interfaces, the time to run a CHKDSK is stretching out into tens of hours or even days. Finding an acceptable time window to take a volume offline is therefore becoming very hard.
The only workaround I know of, from the good old days of DAS RAID, is that with RAID1 you could deliberately break the mirror, remove one set of drives, mount them in another system (which of course needed the same SCSI/RAID controller), run CHKDSK offline, and then re-mirror the drives once it completed. But even with DAS this wouldn't work with RAID5.
Have you faced this issue with the current generation of SAN/NAS solutions? If so, how have you handled it?
Have you deployed a storage solution so bullet-proof that it has never needed a CHKDSK run on a Windows volume? Or do you know of a clever way to use snapshots, replication and/or other SAN/NAS features to avoid having to run a CHKDSK?