Friday, January 29, 2016

3 Node Storage Spaces Direct Cluster Works!!!

I went through the following URL, but instead of creating a 4 node Storage Spaces Direct cluster, I decided to see if a 3 node cluster would work. Microsoft's documentation says they will only support Storage Spaces Direct with 4 servers, but I thought it couldn't hurt to try 3 nodes... and it worked!!

I did this all with virtual machines and CTP4, so I skipped the RDMA part and just set up two virtual switches: one for internal traffic and one for external. I had to add some steps so the guest would think the virtual hard drives were either SSDs or HDDs. I also skipped the multi-resilient disks until after testing straight virtual disks.

So I have 3 nodes. Each node has one 400GB "SSD" and one 1TB "HDD".

  1. Install-WindowsFeature -Name File-Services, Failover-Clustering -IncludeManagementTools
    1. #at this point, I hot added the 3 1tb disks to VMs
  2. Test-Cluster -Node s2dtest01,s2dtest02,s2dtest03 -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"
  3. New-Cluster -Name s2dtest -Node s2dtest01,s2dtest02,s2dtest03 -NoStorage -StaticAddress
    1. #ignore warnings
    2. #if disaggregated deployment, ensure ClusterAndClient access w/ Get-ClusterNetwork & Get-ClusterNetworkInterface. Not needed for hyper-converged deployments.
  4. Enable-ClusterS2D
    1. #this is just for ssd and hdd configs
    2. #optional parameters are required for all flash or NVMe deployments
  5. New-StoragePool -StorageSubSystemName s2dtest.test.local -FriendlyName pool01 -WriteCacheSizeDefault 0 -ProvisioningTypeDefault Fixed -ResiliencySettingNameDefault Mirror -PhysicalDisk (Get-StorageSubSystem  -Name s2dtest.test.local | Get-PhysicalDisk)
    1. Get-StoragePool -FriendlyName pool01 | Get-PhysicalDisk  #should see 3 1tb disks
    2. Get-PhysicalDisk | Where Size -EQ  1097632579584 | Set-PhysicalDisk -MediaType HDD #set the 1tb disks to hdd type
    3. #I hot added the 3 400gb disks to VMs at this point
    4. Get-StoragePool -IsPrimordial $False | Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $True) #add new disks to pool
    5. Get-StoragePool -FriendlyName pool01 | Get-PhysicalDisk  #should see 3 1tb disks and 3 400gb disks, for a total of 6
    6. Get-PhysicalDisk | Where Size -EQ  427617681408 | Set-PhysicalDisk -MediaType SSD #set the 400gb disks to ssd type
  6. Get-StoragePool pool01 | Get-PhysicalDisk |? MediaType -eq SSD | Set-PhysicalDisk -Usage Journal
  7. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd01 -FileSystem CSVFS_ReFS -Size 1000GB -ResiliencySettingName Mirror -PhysicalDiskRedundancy 1 -NumberOfColumns 1
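
The two byte counts used in the Where filters above look arbitrary, so here is a small sketch that converts them back into friendlier units. It only does arithmetic on the numbers copied from the steps; the variable names are mine. Both values come out 1.75 GiB short of the nominal disk size (1022.25 GiB and 398.25 GiB), which suggests that is what the guest reports for a 1 TB and a 400 GB virtual disk, though I have not confirmed where the difference comes from.

```powershell
# Byte sizes copied from the Where-Object filters in the steps above.
$hddBytes = 1097632579584   # the "1tb" disks
$ssdBytes = 427617681408    # the "400gb" disks

# PowerShell's GB suffix is binary (1GB = 1073741824 bytes), so this prints GiB.
foreach ($bytes in $hddBytes, $ssdBytes) {
    '{0} bytes = {1} GiB' -f $bytes, ($bytes / 1GB)
}
```

On a different hypervisor or disk size the byte counts will differ, so it is probably worth reading the real values from `Get-PhysicalDisk | Select FriendlyName, Size` before filtering on them.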

#scale out file server…
  1. New-StorageFileServer  -StorageSubSystemName s2dtest.test.local -FriendlyName sofstest -HostName sofstest -Protocols SMB
  2. New-SmbShare -Name share -Path C:\ClusterStorage\Volume1\share\ -FullAccess s2dtest01$, s2dtest02$,s2dtest03$,test\administrator,s2dtest$,sofstest$
  3. Set-SmbPathAcl -ShareName share

Then I tested. Everything continues to work if any single node dies! I killed each node one at a time, and the virtual disk, the volume and the SOFS share were all still up and accessible. 2-way mirroring worked with a 3 node S2D setup.
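
For anyone repeating the node-kill test, the following is a rough sketch of what could be run from a surviving node to see the state of things. It uses the names from the steps above (sofstest, share) and standard cluster and storage cmdlets; which status values show up while a node is down is not something I recorded, so treat the comments as expectations rather than results.

```powershell
# Node state as the cluster sees it; the killed node should not show as Up.
Get-ClusterNode | Format-Table Name, State

# Shows whether each virtual disk is reported healthy or degraded.
Get-VirtualDisk |
    Format-Table FriendlyName, ResiliencySettingName, OperationalStatus, HealthStatus

# Returns True if the share is still reachable through the SOFS name.
Test-Path \\sofstest\share
```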

Then I created a single parity space and set up an SOFS share with the following:

  1. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd02 -FileSystem CSVFS_ReFS -Size 500GB -ResiliencySettingName Parity -PhysicalDiskRedundancy 1 -NumberOfColumns 3
  2. New-SmbShare -Name share2 -Path C:\ClusterStorage\Volume2\share\ -FullAccess s2dtest01$, s2dtest02$,s2dtest03$,test\administrator,s2dtest$,sofstest$
  3. Set-SmbPathAcl -ShareName share2

It continues to work if any of the nodes die! I tried killing each node one at a time and the virtual disk, the volume and the SOFS share were all still up and still accessible. Single parity worked with a 3 node S2D setup.
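
To get a feel for how much pool capacity these layouts consume, here is a back-of-the-envelope sketch. It assumes an N-way mirror stores PhysicalDiskRedundancy + 1 full copies and that single parity with N columns stores N-1 data strips plus 1 parity strip per stripe. The function name Get-ExpectedFootprint is made up for this post, and the math ignores metadata and allocation rounding, so the real FootprintOnPool value from Get-VirtualDisk will not match exactly.

```powershell
function Get-ExpectedFootprint {
    param(
        [Parameter(Mandatory)] [ValidateSet('Mirror', 'Parity')] [string] $Resiliency,
        [Parameter(Mandatory)] [double] $SizeGB,
        [int] $PhysicalDiskRedundancy = 1,
        [int] $NumberOfColumns = 1
    )

    if ($Resiliency -eq 'Mirror') {
        # Assumption: a mirror keeps (redundancy + 1) full copies of the data.
        return $SizeGB * ($PhysicalDiskRedundancy + 1)
    }

    # Assumption: single parity only, N-1 data strips and 1 parity strip per stripe.
    if ($PhysicalDiskRedundancy -ne 1 -or $NumberOfColumns -lt 3) {
        throw 'This sketch only covers single parity with 3 or more columns.'
    }
    return $SizeGB * $NumberOfColumns / ($NumberOfColumns - 1)
}

# vd01: 1000GB 2-way mirror
Get-ExpectedFootprint -Resiliency Mirror -SizeGB 1000 -PhysicalDiskRedundancy 1
# vd02: 500GB single parity with 3 columns
Get-ExpectedFootprint -Resiliency Parity -SizeGB 500 -PhysicalDiskRedundancy 1 -NumberOfColumns 3
```

Under those assumptions vd01 takes roughly 2000 GB of raw pool space and vd02 roughly 750 GB, which is the main attraction of parity when capacity is tight.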

I then added another 1TB disk to each of the 3 nodes and then tried to create a 2-way mirror with two columns, a 3-way mirror with 1 column, a 3-way mirror with 2 columns and a parity space with 6 columns.

  1. Get-StoragePool -IsPrimordial $False | Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $True)
  2. Get-PhysicalDisk | Where Size -EQ  1097632579584 | Set-PhysicalDisk -MediaType HDD
  3. Optimize-StoragePool
  4. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd03 -FileSystem CSVFS_ReFS -Size 500GB -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2 -NumberOfColumns 1
  5. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd04 -FileSystem CSVFS_ReFS -Size 500GB -ResiliencySettingName Mirror -PhysicalDiskRedundancy 1 -NumberOfColumns 2
  6. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd05 -FileSystem CSVFS_ReFS -Size 500GB -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2 -NumberOfColumns 2
  7. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd06 -FileSystem CSVFS_ReFS -Size 500GB -ResiliencySettingName Parity -PhysicalDiskRedundancy 1 -NumberOfColumns 6
  8. New-SmbShare -Name share3 -Path C:\ClusterStorage\Volume3\share\ -FullAccess s2dtest01$, s2dtest02$,s2dtest03$,test\administrator,s2dtest$,sofstest$
  9. New-SmbShare -Name share4 -Path C:\ClusterStorage\Volume4\share\ -FullAccess s2dtest01$, s2dtest02$,s2dtest03$,test\administrator,s2dtest$,sofstest$
  10. Set-SmbPathAcl -ShareName share3
  11. Set-SmbPathAcl -ShareName share4

Well, the 6 column parity did not work and neither did the 3-way mirror with 2 columns; PowerShell rejected both commands. That was somewhat expected. It appears that the resiliency is dependent on the fault domains. The 2-way mirror and the 3-way mirror with 1 column were created, though, and both continued to work through any single node failure. The 3-way mirror could not withstand a two node failure, though. Perhaps it could withstand two disk failures? Something to try another day. I wanted to see if the multi-resilient disks would work in a 3 node S2D with a single parity space, so I wiped away all the virtual disks and started over.
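
To see which layouts the pool actually accepted, something like the following could be used to read the settings back. This is a sketch, not output I captured: the property names are the ones Get-VirtualDisk exposes in Windows Server 2016, and FaultDomainAwareness in particular may be blank or missing on an older preview build.

```powershell
# Read back the layout of every virtual disk that was created.
Get-VirtualDisk | Sort-Object FriendlyName |
    Format-Table FriendlyName, ResiliencySettingName, NumberOfColumns,
                 NumberOfDataCopies, PhysicalDiskRedundancy, FaultDomainAwareness

# Count the pooled disks per media type, to compare against the column counts requested.
Get-StoragePool -FriendlyName pool01 | Get-PhysicalDisk |
    Group-Object MediaType | Format-Table Name, Count
```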

  1. Remove-SmbShare share
  2. Remove-SmbShare share2
  3. Remove-SmbShare share3
  4. Remove-SmbShare share4
  5. Remove-VirtualDisk vd01
  6. Remove-VirtualDisk vd02
  7. Remove-VirtualDisk vd03
  8. Remove-VirtualDisk vd04
  9. New-StorageTier -StoragePoolFriendlyName pool01 -FriendlyName MT -MediaType HDD -ResiliencySettingName Mirror -NumberOfColumns 2 -PhysicalDiskRedundancy 1
  10. New-StorageTier -StoragePoolFriendlyName pool01 -FriendlyName PT -MediaType HDD -ResiliencySettingName Parity -NumberOfColumns 3 -PhysicalDiskRedundancy 1
  11. $mt = Get-StorageTier MT
  12. $pt = Get-StorageTier PT
  13. New-Volume -StoragePoolFriendlyName pool01 -FriendlyName vd01_multiresil -FileSystem CSVFS_ReFS -StorageTiers $mt,$pt -StorageTierSizes 100GB, 900GB
  14. New-SmbShare -Name share -Path C:\ClusterStorage\Volume1\share\ -FullAccess s2dtest01$, s2dtest02$,s2dtest03$,test\administrator,s2dtest$,sofstest$
  15. Set-SmbPathAcl -ShareName share

It appeared to create it. I tested failing each node individually and it appeared to work. So in conclusion, it looks like you can build a 3 node Storage Spaces Direct cluster and use multi-resilient disks!!! Granted, you can only have one node failure, but that's fine by me.
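
Since I only say it "appeared" to work, here is a sketch of how the tier layout could be confirmed. It assumes the MT and PT tier templates and the vd01_multiresil volume from the steps above, and that this build reports resiliency settings on storage tiers (Windows Server 2016 does).

```powershell
# Tier definitions in the pool, including the MT and PT templates.
Get-StorageTier |
    Format-Table FriendlyName, MediaType, ResiliencySettingName,
                 NumberOfColumns, PhysicalDiskRedundancy, Size

# Tiers that actually back the multi-resilient virtual disk.
Get-VirtualDisk -FriendlyName vd01_multiresil | Get-StorageTier |
    Format-Table FriendlyName, ResiliencySettingName, Size
```

If the second command lists one mirror tier and one parity tier with the 100GB and 900GB sizes, the volume was built the way the New-Volume command asked.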

I emailed Microsoft and asked them about supporting 3 node S2D. They said to stay tuned on support of 3 node deployments… Sounds like and looks like it will be coming!

Storage Spaces and Latent Sector Errors / Unrecoverable Read Errors

I emailed to ask how Storage Spaces Direct handles Latent Sector Errors (LSEs), otherwise known as Unrecoverable Read Errors (UREs). Here is the email I sent:

My company is in the process of evaluating different options for upgrading our production server environment. I’m tasked with finding a solution that meets our needs and is within our budget.

I’m trying to compare and contrast Storage Spaces Direct with Storage Spaces utilizing JBOD enclosures. Data resiliency, integrity and availability are paramount, so I’m primarily looking at both of these technologies from that perspective. Thus, if we go the JBOD route, we’re looking at implementing 3 enclosures and utilizing the enclosure awareness of Storage Spaces. This solution has existed longer than Storage Spaces Direct and I would think has been tested more thoroughly. I like the scalability and elegance of Storage Spaces Direct though. From a conceptual overview and a hardware setup perspective it just seems easier to grasp and it seems like a better solution.

My question is, how do both of these setups handle unrecoverable read errors/latent sector errors? Does one solution handle them better than the other?

There are horror stories about hardware RAID controllers evicting drives because of URE/LSE and then during RAID rebuilds encountering additional UREs/LSEs and bricking the storage. This is more worrisome when SATA disks are used (due to UREs/LSEs occurring more often and sooner with SATA disks compared to SAS disks.) How does storage spaces/S2D differ in this regard? I know one of the selling points of S2D is the use of SATA disks. I’m curious as to how this problem has been addressed since SATA disks are being promoted. What happens if there is a URE/LSE in end user data? What happens if there is a URE/LSE in the metadata used by storage spaces/S2D or the underlying file system?

Here is the response I received:

Both Spaces direct and Shared Spaces (with JBOD)  both rely on the same software raid implementation, difference is in the connectivity.  Software raid implementation does not throw away the entire drive on failure, we trigger activity to move the data out of the drive while keeping the copy till data is moved (if we have copies available).  On Write failure we try to move the impacted range right away while background activity is moving the untouched data out of the disk,  some of the disks fail to write but they can continue to support reads in which case the data on those drives can still be used to serve user requests.  Until the data on the failed drive is rebuilt on spare capacity the drive is not removed, user can still force but not automated.  On URE - we trigger rebuilt to recover lost copy, this is triggered both when reads errors detected while satisfying user error or by back ground scrub process.  Back ground scrub process detects URE by validating sector level checksum across copies and  validating. 

So it would appear that if you use Storage Spaces, you don't have to worry about an LSE/URE taking out a drive and then a subsequent LSE/URE taking out another drive, thus taking down your array.
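
The response does not say how to watch for any of this, so the following is only a sketch of where I would look, using standard storage cmdlets. Virtual disks in a VM lab may report empty reliability counters, and I have not verified that scrub-triggered repairs show up as storage jobs.

```powershell
# Per-disk error counters; uncorrected read errors are the URE/LSE case.
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Format-Table DeviceId, ReadErrorsTotal, ReadErrorsCorrected,
                 ReadErrorsUncorrected, WriteErrorsTotal, WriteErrorsUncorrected

# Disks that Storage Spaces no longer considers healthy.
Get-PhysicalDisk | Where-Object HealthStatus -ne 'Healthy' |
    Format-Table FriendlyName, OperationalStatus, HealthStatus, Usage

# Rebuild or repair activity currently in progress, if any.
Get-StorageJob
```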