Tuesday, October 9, 2018

Insightful WMI Snippets

Get all the namespaces
function Get-CimNamespaces ($ns="root") {
   # Recurse into each child namespace, then emit the namespaces at this level.
   Get-CimInstance -ClassName __NAMESPACE -Namespace $ns |
   foreach {
       Get-CimNamespaces $("$ns\" + $_.Name)
   }
   Get-CimInstance -ClassName __NAMESPACE -Namespace $ns
}

Get-CimNamespaces | select Name, @{n='NameSpace';e={$_.CimSystemProperties.NameSpace}}, CimClass

View all the namespaces in a hierarchy
function View-CimNamespacesHierarchy ($ns="root", $c="|") {
   Get-CimInstance -ClassName __NAMESPACE -Namespace $ns |
   foreach {
       Write-Host $c $_.Name
       # Lengthen the prefix by one "-" for each level of nesting.
       View-CimNamespacesHierarchy $("$ns\" + $_.Name) ($c + "-")
   }
}

Get all the WMI providers on the system
function Get-CimProvider ($ns="root") {
   # Recurse into each child namespace, then emit the providers registered in this one.
   Get-CimInstance -ClassName __NAMESPACE -Namespace $ns |
   foreach {
       Get-CimProvider $("$ns\" + $_.Name)
   }
   Get-CimInstance -Namespace $ns -ClassName __Win32Provider
}

Get-CimProvider | select Name, @{n='NameSpace';e={$_.CimSystemProperties.NameSpace}}, CimClass

Get-CimProvider | Measure-Object
=> 188

Get all the provider registrations. A provider can be registered multiple times with different registration types.
function Get-CimProviderRegistration ($ns="root") {
   # Recurse into each child namespace, then emit the registrations in this one.
   Get-CimInstance -ClassName __NAMESPACE -Namespace $ns |
   foreach {
       Get-CimProviderRegistration $("$ns\" + $_.Name)
   }
   Get-CimInstance -Namespace $ns -ClassName __ProviderRegistration
}

Get-CimProviderRegistration | Measure-Object
=> 303

Get-CimProviderRegistration | Group-Object {$_.CimSystemProperties.ClassName} | select count, name

Count Name                              
----- ----                              
   11 __EventConsumerProviderRegistration
  135 __InstanceProviderRegistration    
    2 __PropertyProviderRegistration    
  115 __MethodProviderRegistration      
   34 __EventProviderRegistration       
    6 __ClassProviderRegistration       

Wednesday, September 19, 2018

WMI root\cimv2 Hierarchy Visualization

Update: I made a mobile-friendly version. You can find it at http://www.kreel.com/wmi. If you add it to your home screen and then open it, it should open full screen and give you more real estate (at least on the iPhone).

While digging through the root\cimv2 WMI namespace I wanted to see all the parent-child relationships visually, so I put together a quick visualization.

You can see it here: http://www.kreel.com/wmi_hierarchy.html

It's just a tree starting with all the classes that do not inherit from any parent. I created a "no parent" node that all the classes without a parent fall under. This just made it simpler/quicker to get the visualization done.

The page is not mobile optimized.

Pan around with the mouse.

Zoom in and out with the mouse's scroll wheel.

Search for a specific WMI class in the root\cimv2 namespace. The exact name is needed for the search to work. Enter the name, click "go" and it will take you to the class in the visualization. The search does not work with partial names. It is case-insensitive though and there is an autocomplete that should get you the class you're looking for. (The auto-complete population does do partial names.)

The code isn't the prettiest; I put it together really quickly. There is much to be improved. I thought about modifying it to show all the associations between the classes...

Tuesday, July 31, 2018

Hierarchical View of S2D Storage Fault Domains

You can view your storage fault domains by running the following command:
Get-StorageFaultDomain

The problem is that this is just a flat view of everything in your S2D cluster.

If you want to view specific fault domains you can use the -Type parameter on Get-StorageFaultDomain.

For example, to view all the nodes you can run:
Get-StorageFaultDomain -Type StorageScaleUnit

The options for the type parameter are as follows:

StorageSite
StorageRack
StorageChassis
StorageScaleUnit
StorageEnclosure
PhysicalDisk

This is in order from broadest to most specific. Most are self-explanatory.

  • Site represents different physical locations/datacenters.
  • Rack represents different racks in the datacenter.
  • Chassis isn't obvious at first. It's only used if you have blade servers and it represents the chassis that all the blade servers go into.
  • ScaleUnit is your node or your server.
  • Enclosure is if you have multiple backplanes or storage daisy chained to your server.
  • Disk is each physical disk.

The default fault domain awareness is the storage scale unit. So data is distributed and made resilient across nodes. If you have multiple blade enclosures, racks or datacenters you can change this so that you can withstand the failure of any one of those things.

I'm not sure whether you can, or would want to, change the fault domain awareness to StorageEnclosure or PhysicalDisk.

These fault domains are hierarchical. A disk belongs to an enclosure, an enclosure belongs to a node (StorageScaleUnit), a node belongs to a chassis, a chassis belongs to a rack and a rack belongs to a site.

Since most people are just using node fault domains I made the following script to show your fault domains in a hierarchical layout beneath StorageScaleUnit. The operational status for each fault domain is included.

$Tab = ""  # indent unit; Write-Host already puts a space between arguments
Get-StorageFaultDomain -Type StorageScaleUnit | %{
    Write-Host $Tab $Tab $_.FriendlyName - $_.OperationalStatus
    $_ | Get-StorageFaultDomain -Type StorageEnclosure | %{
        Write-Host $Tab $Tab $Tab $Tab $_.UniqueID - $_.FriendlyName - $_.OperationalStatus
        $_ | Get-StorageFaultDomain -Type PhysicalDisk | %{
            Write-Host $Tab $Tab $Tab $Tab $Tab $Tab $_.SerialNumber - $_.FriendlyName - $_.PhysicalLocation - $_.OperationalStatus
        }
    }
}

You could easily modify this to add another level if you utilize chassis, rack or site fault domain awareness.
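
As a rough sketch of what adding a level could look like, the function below walks an arbitrary list of fault domain types, broadest first. Get-FaultDomainTree is a hypothetical helper name, and it assumes that piping a rack into Get-StorageFaultDomain returns the scale units beneath it, the same way the scale unit and enclosure levels behave. It emits strings instead of calling Write-Host so the result can be piped or saved.

```powershell
# Hypothetical helper: prints each fault domain level indented under its parent.
# Assumes the cluster actually defines the levels named in -Types.
function Get-FaultDomainTree {
    param(
        [string[]]$Types = @('StorageRack', 'StorageScaleUnit', 'StorageEnclosure', 'PhysicalDisk'),
        $Parent = $null,        # fault domain to look beneath; $null means the top level
        [string]$Indent = ''
    )
    if (-not $Types) { return }         # no levels left to print
    $first, $rest = $Types              # current level, then the remaining levels

    $children = if ($null -eq $Parent) {
        Get-StorageFaultDomain -Type $first
    } else {
        $Parent | Get-StorageFaultDomain -Type $first
    }

    foreach ($child in $children) {
        "$Indent$($child.FriendlyName) - $($child.OperationalStatus)"
        Get-FaultDomainTree -Types $rest -Parent $child -Indent "$Indent    "
    }
}
```

Calling Get-FaultDomainTree with no arguments starts at the rack level. Passing -Types StorageSite, StorageRack, StorageScaleUnit would start at the site level and stop at the nodes.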

Tuesday, July 10, 2018

The Proper Way to Take a Storage Spaces Direct Server Offline for Maintenance?

Way back in September 2017 a Microsoft update changed what happens when you suspend (pause) or resume a node in an S2D cluster. I don't think the article "Taking a Storage Spaces Direct Server offline for maintenance" has been updated to reflect this change.

Prior to the September 2017 update, when you suspended a node, either via the PowerShell Suspend-ClusterNode cmdlet or via the Failover Cluster Manager GUI Pause option, the operation would put all the disks on that node in maintenance mode. Then when you resumed the node, the resume operation would take the disks out of maintenance mode.

The current suspend/resume logic does nothing with the disks. If you suspend a node its disks don't go into maintenance mode, and if you resume the node nothing is done to the disks.

I postulate that after you suspend/drain the node, and prior to shutting it down or restarting it, you need to put the disks for that node into maintenance mode. This can be done with the following PowerShell command:

Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "node1"} | Enable-StorageMaintenanceMode

Be sure to change "node1" in the above PowerShell snippet to the name of the node you've suspended.

When the node is powered on/rebooted, prior to resuming the node, you need to take the disks for that node out of maintenance mode. This can be done with the following command:

Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "node1"} | Disable-StorageMaintenanceMode

The reason I'm thinking this should be done is that if you just reboot the node without putting the disks in maintenance mode then the cluster will behave as if it lost that node. Things will recover eventually, but timeouts may occur, and if your system is extremely busy with IO bad things could happen (VMs rebooting, CSVs moving, etc.). I'm thinking it's better to put the disks for the node in maintenance mode so all the other nodes know what's going on and the recovery logic doesn't need to kick in. Think of it this way: it's better to tell all the nodes what's going on than to make them figure it out for themselves... I need to test this theory some more...

Update: It looks like the May 8th 2018 update (KB4103723) "introduced SMB Resilient Handles for the S2D intra-cluster network to improve resiliency to transient network failures. This had some side effects in increased timeouts when a node is rebooted, which can effect a system under load. Symptoms include event ID 5120’s with a status code of STATUS_IO_TIMEOUT or STATUS_CONNECTION_DISCONNECTED when a node is rebooted." The above procedure is the workaround until a fix is available.

Another symptom, at least for me, was connections to a guest SQL cluster timing out. When a node was rebooted prior to the May update everything was fine. After applying the May update and rebooting a node, SQL timeouts would occur.

Update: Microsoft released an article on this. For the time being, it seems you should put the disks in maintenance mode prior to rebooting. It also appears you might want to disable live dumps.

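Update: Use $Env:ComputerName and run the commands on the computer you want to perform maintenance on instead of specifying the node name.

Before shutting down or restarting the node:
Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq $Env:ComputerName} | Enable-StorageMaintenanceMode

After the node is back up, before resuming it:
Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq $Env:ComputerName} | Disable-StorageMaintenanceMode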
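
Putting the whole order of operations from this post together, a minimal sketch might look like the following. The function names Get-LocalScaleUnit, Enter-NodeMaintenance and Exit-NodeMaintenance are made up for illustration, and the -Failback Immediate choice is an assumption about how you want roles to return. Both functions are meant to be run on the node being serviced.

```powershell
# Hypothetical helper: the StorageScaleUnit fault domain for this node.
function Get-LocalScaleUnit {
    Get-StorageFaultDomain -Type StorageScaleUnit |
        Where-Object { $_.FriendlyName -eq $Env:ComputerName }
}

# Run before shutting down or restarting the node.
function Enter-NodeMaintenance {
    # 1. Pause the node and drain its roles to the other nodes.
    Suspend-ClusterNode -Name $Env:ComputerName -Drain -Wait
    # 2. Only then put the node's disks into storage maintenance mode.
    Get-LocalScaleUnit | Enable-StorageMaintenanceMode
}

# Run after the node is back up.
function Exit-NodeMaintenance {
    # 3. Take the disks out of maintenance mode first.
    Get-LocalScaleUnit | Disable-StorageMaintenanceMode
    # 4. Then resume the node and move its roles back (assumed preference).
    Resume-ClusterNode -Name $Env:ComputerName -Failback Immediate
}
```

The point of splitting it in two is that the reboot happens in between: call Enter-NodeMaintenance, do the maintenance, then call Exit-NodeMaintenance once the node has rejoined.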

Friday, June 29, 2018

View Physical Disks by Node in S2D Cluster

Quick snippets to view physical disks by node in an S2D cluster:

Get-StorageNode |%{$_.Name;$_ | Get-PhysicalDisk -PhysicallyConnected}

The following is useful if you're looking at performance monitor and you're trying to figure out which device number is which:

gwmi -Namespace root\wmi ClusPortDeviceInformation | sort ConnectedNode,ConnectedNodeDeviceNumber,ProductId | ft ConnectedNode,ConnectedNodeDeviceNumber,ProductId,SerialNumber

Thursday, May 17, 2018

Troubleshooting S2D / Clusters

So you have a problem with S2D, or you just fixed a problem and you want to figure out why it happened so it doesn't happen again. Execute the Get-ClusterDiagnosticInfo PowerShell script. It will create a zip file in c:\users\<username>\ that contains the cluster logs and all relevant events and settings. Do this right away so you have the data. You can always analyze it at a later date.

Sunday, May 13, 2018

S2D Replacing PhysicalDisk Quick Reference

List physical disks to find the failed disk. Note its serial number.
Get-PhysicalDisk | Get-StorageReliabilityCounter

List virtual disks that use the drive, remember them for later.
Get-PhysicalDisk -SerialNumber A1B2C3D4 | Get-VirtualDisk

"Retire" the Physical Disk to mark the drive as inactive, so that no further data will be written to it.
$Disk = Get-PhysicalDisk -SerialNumber A1B2C3D4
$Disk | Set-PhysicalDisk -Usage Retired

S2D should start to rebuild the virtual disks that utilized the drive.

To be extra safe, run the following on each of the virtual disks that were listed above.
Repair-VirtualDisk -FriendlyName 'VirtualDiskX'

These storage jobs will likely take some time.

Remove the retired drive from the storage pool.
Get-StoragePool *S2D* | Remove-PhysicalDisk -PhysicalDisks $Disk
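
If you'd rather not keep re-running Get-StorageJob by hand while you wait, a small polling loop along these lines could do it. Wait-StorageJobs is a hypothetical helper name, the 30 second interval is arbitrary, and it assumes a job that is still in progress reports a JobState of Running.

```powershell
# Hypothetical helper: block until no storage job reports JobState 'Running'.
function Wait-StorageJobs {
    param([int]$PollSeconds = 30)   # arbitrary default polling interval
    while ($true) {
        $running = @(Get-StorageJob | Where-Object { $_.JobState -eq 'Running' })
        if ($running.Count -eq 0) { break }
        # Show progress for the jobs that are still going.
        $running | Format-Table Name, JobState, PercentComplete -AutoSize | Out-Host
        Start-Sleep -Seconds $PollSeconds
    }
}
```

Run Wait-StorageJobs after retiring the disk and only move on to removing it from the pool once it returns.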

Physically remove the bad disk.
Physically add a new disk (you could perform this first if you have empty drive bays) and check to see if it was added to the storage pool.
Get-PhysicalDisk | ? CanPool -eq True

If nothing is returned it should have been added to the pool. This is what you want, as S2D should claim all disks.

If it wasn't added to the pool try the following:
$newDisk = Get-PhysicalDisk | ? CanPool -eq True
Get-StoragePool *S2D* | Add-PhysicalDisk -PhysicalDisks $newDisk -Verbose

Find the new disk's serial number and then see if any virtual disks are using it. None should be yet.
Get-PhysicalDisk -SerialNumber NEWSNBR | Get-VirtualDisk

Rebalance storage pool
Get-StoragePool *S2D* | Optimize-StoragePool
Get-VirtualDisk | Repair-VirtualDisk

Now virtual disks should be using it
Get-PhysicalDisk -SerialNumber NEWSNBR | Get-VirtualDisk