I had a failing disk on the 400+ disk array. It’s not my job to fix it, but I wanted to have a look at it in case I ever need to. What follows is the result of my investigation.
I log in to one of the storage servers and run our SMclient. It looks like this:
There is a list of storage arrays, each one with its own name, which I have carefully removed. A Storage Array (SA) is just a bunch of disks (JBOD): one or more trays filled with disks, plus some network connections, InfiniBand or something similar.
Something is going on with the 9th SA, which displays a yellow triangle. If you single-click on “details” to the right, a window pops up with the SA component details. If you click on the connection in the list shown in the new window, the Remove option (greyed out by default) becomes available. What does it mean? If we click on Help, SMclient explains it to us.
Why would I want to remove individual management connections?
In short: when you no longer need it. An example is given: the storage array has two management connections, and the host for one of them is removed from the network permanently. In other words, the JBOD can be connected to two servers, and one of them may no longer be needed. In that case, we can remove its connection. Let’s close this window for now and move on.
Double-clicking on one of the healthy storage arrays gives you some rotating wheels, and after a few seconds a new window appears, apparently called the Array Management window. It looks like this:
Again, I removed the name of the component. Clicking on Performance generates some graphs. The Hardware view is the most interesting one for me: it shows where the spare disks, if any, sit in the drawers of the SA. Also, clicking on each disk icon brings up the disk info. I must say, it all looks very useful. We will see.
Now we double-click on the SA with the yellow triangle. The Array Management window (AMW) that pops up shows a yellow triangle instead of the “Optimal” status icon, and there is another one in the Monitor subwindow. A single click on any of the messages pops up a help page telling us that we have a degraded virtual disk. Here is a summary of what it says.
What Caused the Problem?
One or more physical disks have failed in a disk pool or disk group and the associated virtual disks have become degraded. The data on the virtual disks is still accessible; however, data may be lost if another physical disk in the same disk pool or disk group fails.
Important Notes (my version)
- When you replace a failed physical disk, data from the failed physical
disk is reconstructed on the new unassigned physical disk. This
reconstruction should begin automatically after you insert the new
physical disk.
- Make sure the replacement physical disks have a capacity equal to or
greater than the failed physical disks you will remove.
- You can replace failed physical disks while the affected virtual disks
are receiving I/O only if there are no other operations-in-progress
for those virtual disks.
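The last two notes boil down to a small pre-flight check before pulling a disk. Here is a minimal sketch in Python. Everything in it (the `PhysicalDisk` record, the slot labels, the `replacement_problems` function) is a name I made up for illustration; none of it comes from SMclient. It also takes the conservative reading that the array stays in service, so I/O is assumed to be running.

```python
from dataclasses import dataclass


@dataclass
class PhysicalDisk:
    """Hypothetical record for a disk; not an SMclient data structure."""
    slot: str          # free-form location label, e.g. "tray 0, slot 3"
    capacity_gb: int   # capacity in GB


def replacement_problems(failed, replacement, other_operations_in_progress):
    """Return the reasons not to swap the disk right now (empty list = go ahead)."""
    problems = []
    # Capacity note: the new disk must be at least as large as the failed one.
    if replacement.capacity_gb < failed.capacity_gb:
        problems.append(
            f"replacement ({replacement.capacity_gb} GB) is smaller than "
            f"the failed disk in {failed.slot} ({failed.capacity_gb} GB)"
        )
    # I/O note: assuming the virtual disks are still serving I/O, the swap is
    # only allowed when no other operation is running on them.
    if other_operations_in_progress:
        problems.append(
            "other operations are in progress on the affected virtual disks"
        )
    return problems


if __name__ == "__main__":
    failed = PhysicalDisk(slot="tray 0, slot 3", capacity_gb=4000)
    new_disk = PhysicalDisk(slot="not inserted yet", capacity_gb=4000)
    print(replacement_problems(failed, new_disk, other_operations_in_progress=False))
```

Returning a list of problems rather than a plain yes/no keeps the reason visible, which is what you want when somebody asks why the disk has not been swapped yet.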
Recovery Steps (my version)
1. Check the Recovery Guru Details area to identify the failed physical disk(s).
2. Remove failed physical disks associated with this disk pool or disk group (the fault indicator lights on the failed physical disks should be on).
3. Wait 30 seconds, then insert the new physical disks. The fault indicator light on the new physical disks may become lit for a short time (one minute or less). Data reconstruction should begin on the new physical disk(s).
   - If you are replacing a physical disk in a storage array that contains hot spares, physical disk reconstruction will start on the hot spare before you insert the new physical disk. The data on the replacement physical disk may not be reconstructed until after it has completed the process on the hot spare.
   - If reconstruction does not start within a few minutes, select the new physical disk; then, select the Hardware > Physical Disk > Advanced > Manually Reconstruct menu option to start reconstruction on the physical disk.
   - Replace only one physical disk at a time for each disk pool or disk group. Each physical disk should complete reconstruction before the next physical disk begins reconstruction.
   - Wait until the reconstruction is completed for all virtual disks before continuing.
4. Click the Recheck button to rerun the Recovery Guru.
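The rule in step 3, one physical disk at a time per disk pool or disk group, turns into a simple schedule when several disks have failed at once. A minimal sketch in Python, assuming the failed disks are given as (group, slot) pairs. The function name and the labels are invented for illustration, and treating different groups as independent is my reading of the help text, not something SMclient states.

```python
from collections import defaultdict
from itertools import zip_longest


def replacement_rounds(failed_disks):
    """Split failed disks into rounds with at most one disk per group each.

    failed_disks: iterable of (disk_group, slot) pairs (hypothetical labels).
    Every round must finish reconstruction before the next one starts.
    """
    by_group = defaultdict(list)
    for group, slot in failed_disks:
        by_group[group].append((group, slot))

    rounds = []
    # zip_longest takes the 1st failed disk of every group, then the 2nd, ...
    # and pads groups that have run out of failed disks with None.
    for batch in zip_longest(*by_group.values()):
        rounds.append([disk for disk in batch if disk is not None])
    return rounds


if __name__ == "__main__":
    failed = [
        ("group_a", "tray 0, slot 3"),
        ("group_a", "tray 0, slot 7"),
        ("group_b", "tray 1, slot 2"),
    ]
    for number, batch in enumerate(replacement_rounds(failed), start=1):
        print(f"round {number}: {batch}")
```

With two failed disks in the same group, the second one always lands in a later round, so it is only swapped after the first has finished reconstructing.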
Of course, since these things call for quick action, the disk was replaced long before I finished my research on how to change it. The Array Management window now shows “Operations in Progress”. The array is in Reconstruction. Time to wait…