It's possible to move all the physical disks involved in an S2D deployment to different servers and bring the data back online. I tested this by installing S2D on a three-node cluster, shutting down all the nodes, and pulling the disks. I then reinstalled the OS on all three nodes and put the disks back in. I set up the new cluster with different server names and a different cluster name and then ran "Enable-ClusterS2D". The old storage pool and disks could be seen, but the storage pool was in a read-only state and the virtual disks were detached. To get the data online I did the following:
Get-StoragePool *s2d* | Set-StoragePool -IsReadOnly $false
Get-VirtualDisk | Connect-VirtualDisk
Then I went to Failover Cluster Manager, right-clicked Pools and chose "Add Storage Pool", then right-clicked Disks and chose "Add Disk". At that point I added the disks to CSV and was able to access all the data. A rough PowerShell equivalent of the disk and CSV steps is sketched below.
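For reference, here is a minimal sketch of the "Add Disk" and CSV steps using the FailoverClusters cmdlets; I used the GUI for adding the pool itself, and the resource name below is hypothetical (check Get-ClusterResource for the real one).
# Add the recovered virtual disks to the cluster as available storage
Get-ClusterAvailableDisk | Add-ClusterDisk
# Then convert the cluster disk to a Cluster Shared Volume (resource name is hypothetical)
Add-ClusterSharedVolume -Name "Cluster Virtual Disk (vd01)"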
Wednesday, October 18, 2017
Tuesday, September 12, 2017
Storage Spaces Direct (S2D), Degraded Virtual Disks and KB4038782
After applying the Windows Server 2016 September patch (KB4038782) to a node in an S2D cluster, the disks on that node would not come out of maintenance mode after the node rejoined the cluster and the roles failed back (Resume -> Fail Roles Back). The VMs would move back, but the disks would stay in maintenance mode, causing the virtual disks to show a status of Degraded. I had to take the disks out of maintenance mode manually after the node rejoined the cluster and I failed the roles back.
To see if the disks are in maintenance mode run:
Get-StorageFaultDomain
Under the OperationalStatus column it will say "In Maintenance Mode" next to the disks for the node that was just restarted. I don't know if this issue was/is specific to me or if it happens to everyone who applies the patch. To take the disks out of maintenance mode, use the Disable-StorageMaintenanceMode cmdlet.
I have a sample here that gets all the disks in maintenance mode and disables maintenance mode for those disks; a similar one-liner is sketched below.
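A minimal version of that snippet, assuming the disk fault domains report "In Maintenance Mode" under OperationalStatus as described above:
# Find disk fault domains still in maintenance mode and take them out of it
Get-StorageFaultDomain -Type PhysicalDisk | Where-Object { $_.OperationalStatus -eq "In Maintenance Mode" } | Disable-StorageMaintenanceMode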
Update: Official MS KB on the issue is here
Tuesday, June 20, 2017
Storage Spaces Direct (S2D) Storage Jobs Suspended and Degraded Disks
Storage Spaces Direct is great, but every once in a while an S2D storage job will get stuck and just sit there in a suspended state. This usually happens after a reboot of one of the nodes in the cluster.
What you don't want to do is take a different node out of the cluster while a storage job is stuck and while there are degraded virtual disks.
You should make a habit of checking the job and virtual disk status before changing node membership. You can do this easily with the Get-StorageJob and Get-VirtualDisk cmdlets (a quick pre-check is sketched below). Alternatively, you could use the script I wrote to continually update the status of both the S2D storage jobs and the virtual disks.
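For example, a minimal pre-check before draining a node might look like this; the node name is hypothetical:
# Only drain the node if no storage jobs are running and every virtual disk is healthy
if (-not (Get-StorageJob) -and -not (Get-VirtualDisk | Where-Object { $_.HealthStatus -ne "Healthy" })) {
    Suspend-ClusterNode -Name "Node02" -Drain
}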
So what does one do if a storage job is stuck? There are two cmdlets that I've found will fix this: Optimize-StoragePool and Repair-VirtualDisk. Start with Optimize-StoragePool, and if that doesn't work, move on to Repair-VirtualDisk. Here is how you use them:
Get-StoragePool <storage pool friendly name> | Optimize-StoragePool
Example: Get-StoragePool s2d* | Optimize-StoragePool
Get-VirtualDisk <virtual disk friendly name> | Repair-VirtualDisk
Example: Get-VirtualDisk vd01 | Repair-VirtualDisk
Usually optimizing the storage pool takes care of the hung storage job and fixes the degraded virtual disks, but if not, target the virtual disk directly with Repair-VirtualDisk.
If neither of those work, give Repair-ClusterStorageSpacesDirect / Repair-ClusterS2D a try. I haven't tried this one yet but it looks like it could help.
Update: I tried Repair-ClusterS2D. It does not appear to help with this scenario. There is limited documentation on it, but it looks like it's intended for situations where a virtual disk has become disconnected.
Update: Run Get-PhysicalDisk. If any of the disks say they're in maintenance mode, this could be the cause of your degraded disks and stuck jobs. It seems to happen when you pause and resume a node too close together. To take the disks out of maintenance mode, run the following:
Get-PhysicalDisk | Where-Object { $_.OperationalStatus -eq "In Maintenance Mode" } | Disable-StorageMaintenanceMode
Another Update: If a virtual disk becomes detached, try this; a reconnect one-liner along the lines of the disk-move recovery above is sketched below.
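A minimal sketch, assuming the detached state shows up in the virtual disks' OperationalStatus (the same Connect-VirtualDisk approach used in the disk-move post above):
# Reconnect any virtual disks that report a Detached operational status
Get-VirtualDisk | Where-Object { $_.OperationalStatus -eq "Detached" } | Connect-VirtualDisk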
Thursday, February 23, 2017
S2D Continually Refresh Job and Disk Status
In Storage Spaces Direct you can run Get-StorageJob to see the progress of rebuilds/resyncs. The following PowerShell snippet continually refreshes the status of the rebuild operation so you know when things are back to normal.
function RefreshStorageJobStatus () { while($true) { Get-VirtualDisk | ft; Write-Host "-----------"; Get-StorageJob;Start-Sleep -s 1;Clear-Host; } }
Enter the above in PowerShell on one line, then enter "RefreshStorageJobStatus" to run the function. The output should look similar to the following and refreshes every second:
Name IsBackgroundTask ElapsedTime JobState PercentComplete BytesProcessed BytesTotal
---- ---------------- ----------- -------- --------------- -------------- ----------
Repair True 00:00:13 Suspended 0 0 7784628224
Repair True 00:00:06 Suspended 0 0 7784628224
FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach Size
------------ --------------------- ----------------- ------------ -------------- ----
vd01 OK Healthy True 1 TB
vd03 Degraded Warning True 1 TB
vd02 Degraded Warning True 1 TB
vd04 OK Healthy True 1 TB
You can press ctrl-c to stop the execution.
Update 8/16/2018: Here is an updated RefreshStorageJobStatus function that shows the bytes processed and the total size in gigabytes instead of bytes:
function RefreshStorageJobStatus () { while($true) { Get-VirtualDisk | ft; Write-Host "-----------"; Get-StorageJob | Select Name,IsBackgroundTask,ElapsedTime,JobState,PercentComplete,@{label="BytesProcessed (GB)";expression={$_.BytesProcessed/1GB}},@{label="Total Size (GB)";expression={$_.BytesTotal/1GB}} | ft;Start-Sleep -s 1;Clear-Host; } }
Run it the same way as before: enter "RefreshStorageJobStatus" to start it. The output looks like this:
FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach Size
------------ --------------------- ----------------- ------------ -------------- ----
test2dfsb Incomplete Warning True 3.5 TB
vd02b {Degraded, Incomplete} Warning True 1 TB
vd01b {Degraded, Incomplete} Warning True 1 TB
-----------
Name IsBackgroundTask ElapsedTime JobState PercentComplete BytesProcessed (GB) Total Size (GB)
---- ---------------- ----------- -------- --------------- ------------------- ---------------
Repair True 00:00:41 Suspended 0 0 70.25
Repair True 00:00:01 Suspended 0 0 122.25
Monday, February 13, 2017
AD-less S2D cluster bootstrapping
AD-less S2D cluster bootstrapping - Domain Controller VM on Hyper-converged Storage Spaces Direct
Is it a supported scenario to run an AD domain controller in a VM on a hyper-converged S2D cluster? We're looking to deploy a 4-node hyper-converged S2D cluster at a remote site and would like to run the domain controller for the site on the cluster so we don't need to purchase a fifth server. Will the S2D cluster be able to boot if the network links to the site are down (meaning other domain controllers are not accessible)? I know WS2012 allowed for AD-less cluster bootstrapping, but will the underlying mechanisms S2D uses for storage access in WS2016 work without AD? Is AD-less S2D cluster bootstrapping a supported scenario?
I asked this question in the Microsoft forums and did not get a definitive answer from anyone, so I set it up and tested it, and it appears to work. I don't know whether it's officially supported, but the S2D virtual disks and volumes come up without a domain controller, at which point you can start the domain controller VM if it did not start automatically (see the sketch below). I didn't dig into the details, but I have a feeling it's using NTLM authentication and would likely fail if your domain requires Kerberos.
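A minimal sketch of that sequence after a cold boot with no reachable domain controller; the "DC01" role name is hypothetical:
# Confirm the cluster storage came up without AD, then start the domain controller VM role
Get-ClusterSharedVolume
Start-ClusterGroup -Name "DC01"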