Get-EsxTop – Another Look

One of the lesser-used PowerCLI cmdlets must be the Get-EsxTop cmdlet.

It’s not that the Get-EsxTop cmdlet is not very useful, on the contrary. In my opinion, the main reason for its infrequent use might be the complexity involved in actually using the data it returns. Add to that somewhat lacking documentation, and the Ugly Duckling of the PowerCLI cmdlets is born.

But just like in the story, this cmdlet has the potential to grow up, and transform into a beautiful swan.

Get-EsxTop post

I already did some Get-EsxTop posts in the past, see Hitchhiker’s Guide to Get-EsxTop – Part 1 and Hitchhiker’s Guide to Get-EsxTop – Part 2 – The wrapper. But a recent thread in the VMTN PowerCLI Community made me rethink how the Get-EsxTop cmdlet could be put to better use. The author of the thread wanted to compare the results returned by Get-EsxTop with the data displayed in esxtop. He also compared the calculated Get-EsxTop metrics with those returned by the Get-Stat cmdlet, and there were some serious discrepancies!

The Concept

I decided to write a script, which ultimately became a module, that could:

  • fetch the correct counters from the data returned by Get-EsxTop
  • convert the raw data into meaningful data, as we see it in the esxtop displays
  • (optionally) add Get-Stat counters to the result as a verification

Making the Link

First I needed to tackle how to link the Get-EsxTop counters with their corresponding esxtop display fields, but also with the metrics from PerformanceManager that are used by Get-Stat. I decided to go with a JSON file for this. The file looks like this.

JSON example

The layout is quite straightforward. In (1) we have the esxtop field, in (2) we have the Get-EsxTop counter and property, in (3) we have the corresponding Get-Stat metric, and finally in (4) we have the conversion formula.
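To make the four parts of an entry more concrete, here is a minimal sketch of how such a mapping could be read in PowerShell. The property names (EsxtopField, TopCounter, TopProperty, StatMetric, Formula) are my own placeholders and not necessarily the ones used in the module's JSON file, and the Get-Stat metric shown is only a plausible counterpart chosen for illustration.

```powershell
# Hypothetical mapping entry. The property names are illustrative assumptions,
# not the module's actual schema.
$json = @'
[
  {
    "EsxtopField": "PKTRX/s",
    "TopCounter": "NetPower",
    "TopProperty": "NumberOfRecvPackets",
    "StatMetric": "net.packetsRx.summation",
    "Formula": "($n - $p) / $TopInterval"
  }
]
'@

# Parse the JSON text and look up the entry for one esxtop field
$map = $json | ConvertFrom-Json
$entry = $map | Where-Object { $_.EsxtopField -eq 'PKTRX/s' }

'{0} <- {1}/{2} (Get-Stat: {3})' -f $entry.EsxtopField, $entry.TopCounter, $entry.TopProperty, $entry.StatMetric
```

Keeping the mapping in a data file like this means a new field can be added without touching the collection code.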

Collecting Data

Now that I had linked the Get-EsxTop counters and the Get-Stat metrics, I needed a way to fetch both of these in a transparent way. The data returned by Get-EsxTop and Get-Stat are collected over different intervals. For Get-EsxTop the interval is 5 seconds; for Get-Stat the closest interval is the Realtime interval, which collects data over 20 seconds. See my PowerCLI & vSphere statistics – Part 1 – The basics post if you want to learn more about intervals.

To avoid having to deal with intervals and elapsed time in the main code, I decided to use the Timer class. This class lets you set an interval, and fires an Elapsed event at the end of each interval.

With an event, we can use the Register-ObjectEvent cmdlet to run a code block each time the event is fired. Schematically the collection function, which is called Get-TopStatRaw, looks like this.

Get-TopStatRaw Timer schematic
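The Timer and Register-ObjectEvent combination can be tried out in isolation. The following is a minimal sketch and not the code of Get-TopStatRaw: the interval is shortened to 200 milliseconds, the handler only records a timestamp where the real one would call Get-EsxTop, and the source identifier DemoTopTimer is a made-up name. For brevity a generic list is handed to the handler through MessageData.

```powershell
# A generic list is a reference type, so the handler appends to the
# same instance that the main code reads afterwards.
$samples = [System.Collections.Generic.List[datetime]]::new()

$timer = New-Object -TypeName System.Timers.Timer
$timer.Interval = 200       # milliseconds; 5000 would match the Get-EsxTop interval
$timer.AutoReset = $true    # keep firing until the timer is stopped

$job = Register-ObjectEvent -InputObject $timer -EventName Elapsed `
    -SourceIdentifier 'DemoTopTimer' -MessageData $samples -Action {
        # A real handler would collect data here; this one records the event time.
        $Event.MessageData.Add($Event.TimeGenerated)
    }

$timer.Start()
Start-Sleep -Seconds 1      # event handlers run while the main code sleeps
$timer.Stop()

# Clean up the subscription, the event job and the timer
Unregister-Event -SourceIdentifier 'DemoTopTimer'
Remove-Job -Job $job -Force
$timer.Dispose()

"Timer fired $($samples.Count) times"
```

The main code never has to compute elapsed time itself; it only decides how long to wait before stopping the timer.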

Note that to limit the amount of data to be transferred between the “elapsed” event code and the main function, the “event” code uses Reference variables ([Ref]) to access the arrays defined in the main function.

Formulas

In one of the previous sections, I already mentioned the JSON file in which essential data for the module is defined. One of the fields in there is the formula field. This formula contains the expression that converts the raw Get-EsxTop data into the values that are shown in the esxtop tool. For example, several of the counters returned by Get-EsxTop are cumulative counters. In other words, you will need to subtract the value of the previous 5 second interval from the current value, to get the actual counter value for the interval.

In the formulas I used the convention that $n represents the value from the current interval, and $p represents the value from the previous interval.

To clarify further, let’s have a look at a sample entry.

JSON entry example

The counter in the example, NetPower/NumberOfRecvPackets, will give the number of received packets on a PNIC. Since the counter is a cumulative value, we will need to subtract the value from the previous interval ($p) from the one in the current interval ($n).

The esxtop fields are expressed as packets per second. So we will have to convert the value from the 5 second Get-EsxTop interval to a 1 second interval. The duration of the Get-EsxTop interval is symbolised by the $TopInterval entry.
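As a worked example of the $n, $p and $TopInterval convention, the sketch below evaluates such a formula for two made-up samples. The packet counts are invented, and turning the formula text into a script block is just one possible way to evaluate it; I am not claiming this is how the module does it internally.

```powershell
$TopInterval = 5                        # Get-EsxTop interval in seconds
$formula = '($n - $p) / $TopInterval'   # formula text as it could appear in the JSON file

# Hypothetical cumulative packet counts from two consecutive samples
$p = 120000     # previous interval
$n = 121500     # current interval

# Turn the formula text into a script block and evaluate it.
# The script block reads $n, $p and $TopInterval from the calling scope.
$rate = & ([scriptblock]::Create($formula))

'{0} packets/s' -f $rate    # 1500 packets over 5 seconds gives 300 packets/s
```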

Interpretation

As I also mentioned in my other Get-EsxTop posts, there are a (rather limited) number of places where you can find more information about the fields used in esxtop. The primary one I use all the time is the Interpreting esxtop Statistics document in the VMTN forum.

Some of the fields presented in esxtop are a bit harder to interpret than others. And the Get-EsxTop cmdlet returns “raw” values, not the values you see in an esxtop display. In the thread I mentioned earlier, there were discrepancies between what the user saw in esxtop and what Get-EsxTop returned.

CORE UTIL(%)

With Get-EsxTop we will have to retrieve the value that is stored in the CoreHaltTimeInUsec property under the LCPU counter.

To convert this CoreHaltTimeInUsec value to a CORE UTIL(%) value there are a number of things to note.

  • The CoreHaltTimeInUsec value is expressed in microseconds. So you will have to divide by 1E6 to get to seconds, and then divide by the number of seconds in a Get-EsxTop interval
  • The CORE UTIL(%) shows “utilization”, while the CoreHaltTimeInUsec shows the reverse. The counter shows the time the core is in Halt state. As a result you will have to subtract the Halt time from the interval duration to get the Utilized time. That is why we subtract the calculated value from 1 (100%) in the formula.

JSON CoreHaltTimeInUsec formula

  • The CORE UTIL(%) field is only displayed in esxtop for ESXi nodes on which hyper-threading is active.

esxtop and Hyper-threading

  • From the Interpreting esxtop Statistics document: “A core is utilized, if either or both of the PCPUs on this core are utilized”. As a consequence we only need to look at a core, not at each PCPU separately (in case of hyper-threading). From Get-EsxTop we will get values for each PCPU, but the values for both PCPUs on the same core will be the same. We can safely ignore the duplicated values. The following screenshot displays the raw values returned by Get-EsxTop. The sample was taken on an ESXi node with a quad core processor and with hyper-threading active.

hyper-threading and core
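The CORE UTIL(%) conversion described in the notes above can be sketched as follows. The function name ConvertTo-CoreUtilPercent and the halt times are invented for the illustration. I also assume here that the halt time covers exactly one interval, and that the two PCPUs of a core sit next to each other in the returned list.

```powershell
function ConvertTo-CoreUtilPercent {
    param(
        [double]$HaltTimeInUsec,    # halt time within one interval, in microseconds
        [double]$TopInterval = 5    # interval length in seconds
    )
    # Microseconds to seconds, then to a fraction of the interval
    $haltFraction = ($HaltTimeInUsec / 1E6) / $TopInterval
    # Utilization is the part of the interval that was not spent halted
    [math]::Round((1 - $haltFraction) * 100, 1)
}

# Hypothetical halt times for 8 PCPUs (quad core, hyper-threading active).
# Both PCPUs of a core report the same value.
$haltPerPcpu = 4000000, 4000000, 1250000, 1250000, 5000000, 5000000, 0, 0

# Keep one value per core (assumes sibling PCPUs are adjacent in the list)
$coreUtil = for ($i = 0; $i -lt $haltPerPcpu.Count; $i += 2) {
    ConvertTo-CoreUtilPercent -HaltTimeInUsec $haltPerPcpu[$i]
}

$coreUtil -join ', '    # 20, 75, 0, 100
```

A core that was halted for 4 of the 5 seconds ends up at 20%, and a core that was never halted at 100%.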

The Module

The code to achieve the goals I described earlier in this post ended up in a module. In that module a number of functions are exported; the others are internal functions. The exported functions are:

  • Get-TopMetric: which will display the defined metrics from the JSON file on screen
  • Get-TopStat: retrieves the raw Get-EsxTop data and converts that data into the esxtop fields
  • Format-TopData: a function that will display the retrieved data in a similar way as it is shown in esxtop

The following schematic shows all the functions in the module. Note that only the top row of functions is exported (visible).

EsxTopCollect functions schematic

Code Annotations

This section explains some of the trickier parts of the code in more detail.

Event Handlers

The collection of the data is triggered by the Elapsed event that is fired by each of the Timers.

Line 1-7: Define the arguments that the event handler code will receive

Line 9-13: Define the code of the event handler.

Line 11: In the event handler code, the arguments can be accessed via the $Event.MessageData object. Under that object you will find all the parameters that were defined.

Line 10: Since the array was passed as Reference variable, the code needs to access the content via the Value property.

Line 17: With the Register-ObjectEvent cmdlet you define the code block that will be executed when the event is fired. On the same cmdlet you also specify the parameters that the event handler code block will receive.

Arrays by Reference

To avoid too much data movement, I used Reference variables for the data arrays in which the Get-EsxTop and Get-Stat counters are collected.

Line 14-15: The actual arrays are defined in the function Get-TopStat

Line 18-19: The Reference to these arrays is passed to the Get-TopStatRaw function.
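The Reference variable mechanism itself can be shown with a few lines. This is a minimal sketch with a made-up function name (Add-DemoSample) and made-up sample values; it only illustrates how a function can append to an array owned by its caller, not how Get-TopStatRaw is actually written.

```powershell
function Add-DemoSample {
    param(
        [ref]$Data,         # reference to the caller's array
        [object]$Sample     # the value to append
    )
    # The array itself is reached through the Value property
    $Data.Value += $Sample
}

# The array is defined by the caller, and only a reference is passed along
$topData = @()
Add-DemoSample -Data ([ref]$topData) -Sample 'sample 1'
Add-DemoSample -Data ([ref]$topData) -Sample 'sample 2'

$topData.Count    # 2
```

Without [Ref], the function would append to its own copy and the caller's array would stay empty.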

Sample Usage

The function Get-TopStat collects the esxtop data, and converts it to esxtop field names.

In this example, three esxtop fields are retrieved over a 20 second period. On the console that will look as follows (note that Verbose is switched on).

Get-TopStat sample run verbose

In verbose mode you can follow the timers being fired. The Get-EsxTop timer is fired 4 times before the Get-Stat timer is fired. Looking into the data, you notice that all requested esxtop fields are present as properties in each returned object.

Get-TopStat sample output

The data can be saved to an external medium for further analysis.
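One simple way to do that is a CSV file. The sketch below uses invented objects with a Time property and two esxtop-style properties in place of real Get-TopStat output, so the property names and values are assumptions. The file name and the use of the temporary folder are arbitrary as well.

```powershell
# Hypothetical objects standing in for the output of Get-TopStat
$data = @(
    [pscustomobject]@{ Time = '10:00:05'; 'CORE UTIL(%)' = 20.0; 'PKTRX/s' = 300 }
    [pscustomobject]@{ Time = '10:00:10'; 'CORE UTIL(%)' = 35.5; 'PKTRX/s' = 280 }
)

# Write the objects to a CSV file in the temporary folder
$csvPath = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'topstat-demo.csv'
$data | Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back to check what was stored
$restored = @(Import-Csv -Path $csvPath)
$restored | Format-Table -AutoSize
```

Keep in mind that Import-Csv returns all values as strings, so numeric fields need a cast before doing calculations on them.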

With the Format-TopData function, we can format the data in such a way that it resembles the displays that you get in esxtop.

In this example we ask for the output of the CpuHeader area.

esxtop sample output CPU panel header

The function will produce the following output.

Format-TopData sample output

Future

At the moment the basic concept for presenting Get-EsxTop data in a more user-friendly way is there.

Adding further esxtop displays, or specific areas of such displays, will require a bit more work. The JSON file will need to be updated with the new fields, and with how to obtain and calculate them. Finally, the Format-TopData function will need to be expanded to display the fields.

The module is available on GitHub as EsxTopCollect, and I hope to trigger some community participation to expand the available features.

Enjoy!

2 Comments

    2013kaa

    This is awesome, man!))) I'm thinking of taking your work as a basis and extending it towards monitoring in Zabbix!
    You rock)))!

    Ellwood

    Could you take that and input it into a Grafana dashboard (by curling the data into influx or another db)?
