Tuesday, June 20, 2017

Storage Spaces Direct (S2D) Storage Jobs Suspended and Degraded Disks

Storage spaces direct is great, but every once and a while a S2D storage job will get a stuck and just sit there in a suspended state. This usually happens after a reboot of one of the nodes in the cluster.

What you don't want to do is take a different node out of the cluster while a storage job is stuck and while there are degraded virtual disks.

You should make a habit out of checking the jobs and the virtual disk status before changing node membership. You can do this easily with the Get-StorageJob and Get-VirtualDisk commandlets. Alternatively, you could use the script I wrote to continually update the status of both the S2D storage jobs and the virtual disk status.

So what does one do if a storage job is stuck? There are two commandlets that I've found will fix this. The first is Optimize-StoragePool. The second is Repair-VirtualDisk. Start with Optimize-StoragePool and if that doesn't work then move on to Repair-VirtualDisk. Here is how you use them:

Get-StoragePool <storage pool friendly name> | Optimize-StoragePool

Example: Get-StoragePool s2d* | Optimize-StoragePool

Get-VirtualDisk <virtual disk friendly name> | Repair-VirtualDisk

Example: Get-VirtualDisk vd01 | Repair-VirtualDisk

Usually optimizing the storage pool takes care of the hung storage job and fixed the degraded virtual disk but if not target the disk directly.

If neither of those work, give Repair-ClusterStorageSpacesDirect / Repair-ClusterS2D a try. I haven't tried this one yet but it looks like it could help.

Update: I tried Repair-ClusterS2D. It does not appear to help with this scenario. There is limited documentation on it but it looks like it's something you use if a virtual disk gets disconnected or something.

Update: Run Get-PhysicalDisk. If any of them say they're in maintenance mode, this could be the cause of your degraded disks and your stuck jobs. This seems to happen when you pause and resume a node to close together. To take the disks our of maintenance mode run the following:

Get-PhysicalDisk | Where-Object { $_.OperationalStatus -eq "In Maintenance Mode" } | Disable-StorageMaintenanceMode

1 comment:

  1. I just want to thank you. After installing KB4041688 I had my physical disks in maintenance mode on two of my clusters but had not noticed it. While looking for a reason why my repair jobs where not running I found your post and it pointed me to the correct solution. Thank you

    ReplyDelete