Hitchhiker’s Guide to Get-EsxTop – Part 2 – The wrapper

In an earlier post, Hitchhiker’s Guide to Get-EsxTop – Part 1, I described my first experiences with the new Get-EsxTop cmdlet. While using the cmdlet is rather straightforward, the data it returns is not so easy to interpret. Luckily, Carter intercepted a secret cable that allows us to actually use the data returned by the cmdlet.

The following is my first attempt to write a wrapper around the Get-EsxTop cmdlet. The idea is to have a script that produces statistical data similar to what resxtop produces.

The concept

The secret cable shows us how the data returned by Get-EsxTop has to be converted to resemble data we get from esxtop and resxtop.

Unfortunately the cable is a bit dated, and some of the counters and fields from the cable are not present in the output from Get-EsxTop. Conversely, some of the counters and fields returned by Get-EsxTop are not described in the cable.

The CSV file from Carter’s post links each entry to the Get-EsxTop counters and fields through the PerfObjectType and PerfCounterName columns respectively. In addition, the DerivationMethod column shows how the data has to be interpreted.

For example, when the DerivationMethod column specifies “rate” for a specific field, it means that you have to take the current value, subtract the previous value and divide the result by the duration of the Get-EsxTop interval, which is 5 seconds. The value you get is a number/sec value. The NumOfSendPackets field in the NetPort counter group is such a metric. Applying the “rate” formula gives the average number of packets sent per second during the measurement interval.
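The “rate” calculation can be illustrated with a couple of lines of PowerShell. This is only a minimal sketch: the two raw values are made-up numbers, not output captured from Get-EsxTop, and the variable names are my own.

```powershell
# Hypothetical raw values of a cumulative field (for example NumOfSendPackets)
# taken from two consecutive samples.
$previous = 120000
$current  = 120750

# Length of the Get-EsxTop interval in seconds, as stated in the text.
$intervalSeconds = 5

# "rate": (current - previous) / interval
$packetsPerSecond = ($current - $previous) / $intervalSeconds
$packetsPerSecond   # 150
```

With these numbers, 750 packets were sent during the 5 second interval, which averages out to 150 packets per second.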

To use the spreadsheet in my script, I first had to update it with what is actually returned by Get-EsxTop. The following is the reviewed spreadsheet from Carter’s post.

Note that the cells marked in yellow are not present in the Get-EsxTop output, while the cells marked in orange are not in the original spreadsheet.

The key concept for the function is the DerivationMethod column. This column indicates how the value is calculated.

With the name in the DerivationMethod column and a bit of reverse engineering, I was able to derive the following formulas.

In the following table I use the following variables:

  • n: current sample
  • n-1: previous sample
  • ESX-interval: the length of the esxtop interval. This is currently 5 seconds
  • 1E6: the time values are expressed in microseconds. To convert to seconds the values need to be divided by 1E6
  • Tx-n: the number of packets transmitted
  • Rx-n: the number of packets received

DerivationMethod        Formula
Sum                     n – n-1
PercentLimit100         (n – n-1) / ESX-interval / 1E6 * 100
Percent                 (n – n-1) / ESX-interval / 1E6 * 100
PCPUUtil                (n – n-1) / ESX-interval / 1E6 * 100
PCPUUsed                (n – n-1) / ESX-interval / 1E6 * 100
PowerStatePercent       (n – n-1) / ESX-interval / 1E6
Rate                    (n – n-1) / ESX-interval
DiskLatency             (n – n-1) / ESX-interval
DiskAtsFailLatency      (n – n-1) / ESX-interval
DiskCloneFailLatency    (n – n-1) / ESX-interval
DiskCloneSuccLatency    (n – n-1) / ESX-interval
DiskZeroFailLatency     (n – n-1) / ESX-interval
DiskZeroSuccLatency     (n – n-1) / ESX-interval
RxDropRate              (n – n-1)/(Rx-n – Rx-n-1) / ESX-interval
TxDropRate              (n – n-1)/(Tx-n – Tx-n-1) / ESX-interval
RateByteToMb            (n – n-1) / ESX-interval / ESX-interval / 1MB * 8
RateDevide1M            (n – n-1) / ESX-interval / ESX-interval / 1MB
RateDevide1K            (n – n-1) / ESX-interval / ESX-interval / 1KB
PageToMB                n * 4 / 1KB
Devide100               n / 100
Devide1K                n / 1KB

Note that there are quite a number of fields that do not have a DerivationMethod specified. I will try to complete these fields after further investigation.

The script

Annotations

Line 43-48: The accepted counter names

Line 56-70: An internal helper function that is used to update field values on the object

Line 67: As you can determine from the ESXCounters1 list, the properties that are returned by resxtop have different names than the counter properties that are returned by Get-EsxTop. For that reason the function creates an alias property similar to the resxtop names. Note that the Export-Csv cmdlet doesn’t make a distinction between a noteproperty and an aliasproperty; both properties will be exported as two separate columns in the CSV!
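The alias behaviour is easy to see on a small test object. The following is a sketch and not the code from the function itself; the alias name PacketsSent is purely illustrative and not necessarily the name resxtop uses.

```powershell
# Hypothetical sample object with one raw Get-EsxTop field.
$sample = New-Object PSObject -Property @{ NumOfSendPackets = 150 }

# Add an alias that exposes the same value under a second name.
# 'PacketsSent' is an illustrative name only.
$sample | Add-Member -MemberType AliasProperty -Name PacketsSent -Value NumOfSendPackets

# ConvertTo-Csv (like Export-Csv) emits a column for the noteproperty
# and another column for the aliasproperty.
$csv = $sample | ConvertTo-Csv -NoTypeInformation
$csv[0]   # the header line, listing both property names
```

If the duplicate column is not wanted in the CSV, piping the objects through Select-Object with an explicit property list before the export is one way to drop it.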

Line 72-83: An internal helper function that is used to add additional properties to an object

Line 88: To accommodate the Delay parameter, the date/time of entry in the function is captured

Line 90-103: The function accepts one or more ESX(i) servers via the Server parameter. If there is no Server parameter, the function uses the current connection, in case PowerCLI is used in ‘single’ mode, or extracts all ESX(i) servers from the connected servers, in case PowerCLI is used in ‘multi’ mode.

Line 106: An instance of esxcli is opened. This is used to map display names to world IDs later on in the script.

Line 109-633: This is the updated ESXCounters1 spreadsheet, from the cable, provided as inline code. My reason for doing this with inline code, is that the entire function can be handled as 1 file. There are some alternatives available, you can import the ESXCounters1-present.csv file with an Import-Csv cmdlet or you can store the lines in another .ps1 file and dot-source them.

Line 635: The inline CSV data is converted to an array. Note that the Delimiter is specified explicitly, to avoid problems should the script be run in another ‘culture’.
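As an illustration of the inline CSV approach, here is a minimal sketch with only two rows. The column names come from the spreadsheet; the first data row follows the NetPort example given earlier, while the second row (SomeGroup, SomeField) is a made-up placeholder.

```powershell
# Inline CSV data in a here-string. The second data row is a placeholder.
$csvText = @'
PerfObjectType,PerfCounterName,DerivationMethod
NetPort,NumOfSendPackets,Rate
SomeGroup,SomeField,PageToMB
'@

# Name the delimiter explicitly, so the parsing does not rely on any default.
$counters = @($csvText | ConvertFrom-Csv -Delimiter ',')

$counters.Count                  # 2
$counters[0].DerivationMethod    # Rate
```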

Line 638: Since we calculate the actual values from two consecutive samples, we increase the number of requested samples by 1.

Line 643-646: The ESXCounters1 array is converted into a hash table. This will allow us to do key lookups later on in the script.
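The conversion to a hash table could look roughly like the sketch below. The key format (counter name and field name joined by a dot) is an assumption made for this example, and the second row is a placeholder.

```powershell
# Two rows shaped like the ESXCounters1 data. The second row is a placeholder.
$counters = @(
    New-Object PSObject -Property @{ PerfObjectType = 'NetPort'; PerfCounterName = 'NumOfSendPackets'; DerivationMethod = 'Rate' }
    New-Object PSObject -Property @{ PerfObjectType = 'SomeGroup'; PerfCounterName = 'SomeField'; DerivationMethod = 'PageToMB' }
)

# Build the lookup table: "<counter>.<field>" -> derivation method.
$lookup = @{}
foreach ($row in $counters) {
    $key = '{0}.{1}' -f $row.PerfObjectType, $row.PerfCounterName
    $lookup[$key] = $row.DerivationMethod
}

$lookup['NetPort.NumOfSendPackets']   # Rate
```

A lookup by key avoids searching through the complete array for every field of every sample.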

Line 649-691: Some counters produce samples for each of their instances. This hash table maps those counters to the property that distinguishes a specific instance. For example, with the SchedGroup counter, the VMName property identifies specific instances.
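A stripped-down sketch of such a mapping is shown below. Only the SchedGroup entry is taken from the text; the sample object, its SomeField property and the name vm01 are invented for the illustration.

```powershell
# Maps a counter to the property that identifies an instance.
$instanceKey = @{ SchedGroup = 'VMName' }

# Hypothetical sample, as it could be returned for the SchedGroup counter.
$sample = New-Object PSObject -Property @{ VMName = 'vm01'; SomeField = 42 }

# Look up the identifying property, then read it from the sample.
$counterName  = 'SchedGroup'
$propertyName = $instanceKey[$counterName]
$instanceName = $sample.$propertyName

$instanceName   # vm01
```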

Line 694-700: For each of the requested counters, the function collects all fields (properties) and stores them in an array.

Line 703-705: If the function is called with the Delay parameter, these lines will force a sleep until the delay has passed.

Line 707-722: This loop captures the required number of samples.

Line 711-717: For the VMem and VCPU counters the function will add the Display Name to the object that is returned. These lines use the esxcli object to get a list of all running worlds and their corresponding VMXCartelID.

Line 720-721: Sleep what remains of the esxtop interval (5 seconds).
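Sleeping for the remainder of the interval comes down to subtracting the time already spent on the sample from the 5 seconds. The helper below is a hypothetical sketch (Get-RemainingMilliseconds is not part of the function), and it uses fixed timestamps so the result is predictable.

```powershell
# Hypothetical helper: how many milliseconds of the interval are left?
function Get-RemainingMilliseconds {
    param(
        [datetime]$Start,
        [datetime]$Now,
        [int]$IntervalSeconds = 5
    )
    $elapsed   = ($Now - $Start).TotalMilliseconds
    $remaining = ($IntervalSeconds * 1000) - $elapsed
    # Never return a negative value when the sample took longer than the interval.
    if ($remaining -lt 0) { 0 } else { [int]$remaining }
}

# Fixed timestamps for the illustration: the sample took 1.2 seconds.
$start = [datetime]'2024-01-01T12:00:00'
$now   = $start.AddSeconds(1.2)

$wait = Get-RemainingMilliseconds -Start $start -Now $now
$wait   # 3800

# In a real sampling loop, $start and $now would come from Get-Date,
# followed by: Start-Sleep -Milliseconds $wait
```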

Line 725-821: This loop calculates the values from the 2 consecutive samples.

Line 727-728: This test handles the Name parameter when specific instances are requested.

Line 731: Since we need 2 consecutive measurements to calculate the values, the code distinguishes between the first sample and all following samples.

Line 732-742: For the first sample, the values of the counters are stored in specific variables.

Line 738-740, 813-815: When the Counter parameter contains VMem and/or VCPU, the function adds the Displayname to the returned object.

Line 747-807: The switch statement uses the DerivationMethod column from the ESXCounters1 list to calculate the value as it would be returned by resxtop. If there is no derivation method present, the function returns the raw value of the property.
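To give an idea of how such a switch works, here is a condensed sketch that covers only a handful of the derivation methods from the formula table. The function name ConvertTo-EsxTopValue and its parameters are hypothetical; the actual function handles more methods and works on complete sample objects.

```powershell
# Hypothetical helper that applies a few of the formulas from the table.
function ConvertTo-EsxTopValue {
    param(
        [string]$DerivationMethod,
        [double]$Current,
        [double]$Previous = 0,
        [double]$IntervalSeconds = 5
    )
    switch ($DerivationMethod) {
        'Sum'       { $Current - $Previous }
        'Rate'      { ($Current - $Previous) / $IntervalSeconds }
        'Percent'   { ($Current - $Previous) / $IntervalSeconds / 1E6 * 100 }
        'PageToMB'  { $Current * 4 / 1KB }
        'Devide100' { $Current / 100 }
        'Devide1K'  { $Current / 1KB }
        # No (known) derivation method: return the raw value.
        default     { $Current }
    }
}

ConvertTo-EsxTopValue -DerivationMethod 'Rate' -Current 120750 -Previous 120000   # 150
ConvertTo-EsxTopValue -DerivationMethod 'Percent' -Current 2500000 -Previous 0    # 50
ConvertTo-EsxTopValue -DerivationMethod 'PageToMB' -Current 256                   # 1
```

The Percent example reads as follows: 2.5 million microseconds of activity during a 5 second interval corresponds to 50 percent.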

Line 816: The object with the calculated values is stored in the Values array.

Example usage

The simplest way to call the function is like this

This will return one measurement for all available counters for all available instances.

To specify a specific counter use the Counter parameter

The number of intervals is specified with the Intervals parameter

It will take more than 25 seconds before the function returns the objects. The function will take 6 samples and the esxtop interval between each sample is 5 seconds. The calculation of the values takes some time as well.

To specify more counters use the Counter parameter

If you want to capture one or more specific instances, use the Name parameter.

In this case the function will return 20 objects; 2 counters x 2 instances x 5 intervals.

And finally, with the Server parameter, you tell the function to get the samples from one or more specific ESX(i) servers.

The connection to the server(s) must be made before you call the function.

In the next post I will show samples of how the returned data can be used.

3 Comments

    Ken

    Hi Luc,

    As always, excellent work. I need this script specifically to track copy-on-write hints (COWH). My concern is that we run DRS in our environment. I could use the help from the community to vet my thinking here.

    I would think that if the script were running against a cluster of, say, 6 hosts (A-F) and collected get-esxtop stats from host A, then DRS migrates a VM from host A to host E *before* the script executes against host E, then the COWH stats would be skewed because it would have collected stats on the same VM twice, on two different hosts.

    Any thoughts on this issue? Is it a “red-herring?”

    Thanks!

      LucD

      Hi Ken, with the Get-EsxTop cmdlet you can use the Server parameter to pass 1 or more servers against which to run the cmdlet.
      I would try collecting the esxtop data from all servers in 1 call to the cmdlet, and then use PowerShell to split out the results per Server (with the Group-Object cmdlet for example).
