Windows Server Monitoring using Prometheus and WMI Exporter

If you are a Windows system administrator, or a site reliability engineer, you spend a lot of time monitoring your Windows servers.

Sometimes your servers go down, and you cannot easily tell why.

Is it because of a high CPU usage on one of the processes?

Is the server having memory issues? Is too much RAM in use on your Windows server?

Today, after understanding how to do Linux system monitoring, we are taking a look at how to configure your Windows Server monitoring properly.

For this tutorial, we are going to use Prometheus, a modern time series database and monitoring platform.

If you are not familiar with Prometheus monitoring, you can have a look at one of the guides we crafted for you.

Ready to monitor your Windows servers?

I — What You Will Learn

If you follow this tutorial until the end, here are the key concepts you are going to learn about.

  • How to install and configure Prometheus on your Linux servers;
  • How to download and install the WMI exporter for Windows servers;
  • How to bind Prometheus to your WMI exporter;
  • How to build an awesome Grafana dashboard to visualize your metrics.

Quite a long program, let’s jump into it.

II — Windows Server Monitoring Architecture

Before installing the WMI exporter, let’s have a quick look at what our final architecture looks like.

As a reminder, Prometheus is constantly scraping targets.

Targets are nodes that are exposing metrics on a given URL, accessible by Prometheus.

Such targets are equipped with “exporters”: binaries that run on a target and are responsible for collecting and aggregating metrics about the host itself.
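To make the scraping model more concrete, here is a minimal sketch of what an exporter ultimately hands to Prometheus: plain text in the Prometheus exposition format, one sample per line. The function `render_gauge` and the metric `demo_memory_free_bytes` are hypothetical names chosen for illustration; the real WMI exporter is a separate binary and does not use this code.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Labels are rendered as key="value" pairs inside curly braces.
            label_text = ",".join(
                f'{key}="{val}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric and host name, for illustration only.
    print(
        render_gauge(
            "demo_memory_free_bytes",
            "Free physical memory (hypothetical metric).",
            [({"host": "win01"}, 2147483648)],
        ),
        end="",
    )
```

Prometheus simply fetches this text over HTTP at every scrape interval and stores each line as a time series sample.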

If you were to monitor a Linux system, you would run a “Node Exporter”, which would be responsible for gathering metrics such as the CPU usage or the disk I/O currently in use.

For Windows hosts, you are going to use the WMI exporter.

The WMI exporter will run as a Windows service and it will be responsible for gathering metrics about your system.

In short, here is the final architecture that you are going to build.

III — Installing Prometheus

The complete Prometheus installation for Linux was already covered in one of our previous articles.

Make sure to read it extensively to have your Prometheus instance up and running.

To verify it, head over to http://localhost:9090 (9090 being the default Prometheus port).

You should see a Web Interface similar to this one.

If this is the case, it means that your Prometheus installation was successful.

Great!

Now that your Prometheus is running, let’s install the WMI exporter on your Windows Server.

IV — Installing the WMI Exporter

The WMI exporter is an awesome exporter for Windows Servers.

It will export metrics such as the CPU usage, the memory and the disk I/O usage.

The WMI exporter can also be used to monitor IIS sites and applications, the network interfaces, the services and even the local temperature!

If you want a complete overview of everything that the WMI exporter offers, have a look at the list of available collectors.

In order to install the WMI exporter, head over to the WMI releases page on GitHub.

As of May 2020, the latest version of the WMI exporter is 0.12.0.

On the releases page, download the MSI file corresponding to your CPU architecture.

b — Running the WMI installer

When the download is done, simply click on the MSI file and start running the installer.

This is what you should see on your screen.

Windows should now start configuring your WMI exporter.

You should be prompted with a firewall exception. Make sure to accept it for the WMI exporter to run properly.

The MSI installation should exit without any confirmation box. However, the WMI exporter should now run as a Windows service on your host.

To verify it, head over to the Services panel of Windows (by typing Services in the Windows search menu).

In the Services panel, search for the “WMI exporter” entry in the list. Make sure that your service is running properly.

c — Observing Windows Server metrics

Now that your exporter is running, it should start exposing metrics on http://localhost:9182/metrics (9182 being the default WMI exporter port).

Open your web browser and navigate to the WMI exporter URL. This is what you should see in your web browser.

Some metrics are very general and exported by all the exporters, but others are specific to your Windows host (the wmi_cpu_core_frequency_mhz metric, for example).
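If you would rather inspect those metrics from a script than from the browser, the sketch below parses exposition-format text into Python tuples. It is a simplified parser written for this tutorial, not an official client: the `SAMPLE` text is made up for illustration (only the metric name comes from the exporter), and the regular expressions handle the common `name{labels} value` form only.

```python
import re

# Matches lines such as: wmi_cpu_core_frequency_mhz{core="0,0"} 2400
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative input; the HELP text and the values are invented.
SAMPLE = """\
# HELP wmi_cpu_core_frequency_mhz Core frequency in megahertz
# TYPE wmi_cpu_core_frequency_mhz gauge
wmi_cpu_core_frequency_mhz{core="0,0"} 2400
wmi_cpu_core_frequency_mhz{core="0,1"} 2400
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot parse
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

To run it against a live exporter, you could read the text with `urllib.request.urlopen("http://localhost:9182/metrics")` and pass the decoded body to `parse_metrics`.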

Great!

Windows Server monitoring is now active using the WMI exporter.

If you remember correctly, Prometheus scrapes targets.

As a consequence, we have to configure our Windows Server as a Prometheus target.

This is done in the Prometheus configuration file.

d — Binding Prometheus to the WMI exporter

As you probably saw from your web browser request, the WMI exporter exports a lot of metrics.

As a consequence, there is a chance that the scrape request times out when trying to get the metrics.

This is why we are going to set the scrape timeout explicitly in our configuration file, keeping it just below the scrape interval.

If you want to keep a low scrape timeout, make sure to configure the WMI exporter to export fewer metrics (by enabling just a few collectors, for example).

Head over to your configuration file (mine is located at /etc/prometheus/prometheus.yml) and make the following changes to your file.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Careful, the scrape timeout cannot be higher than the scrape interval.
    scrape_interval: 6s
    scrape_timeout: 5s
    static_configs:
      # 9090 is Prometheus itself, 9182 is the WMI exporter.
      # Replace localhost with the address of your Windows host if it is a remote machine.
      - targets: ['localhost:9090', 'localhost:9182']
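Prometheus refuses to load a configuration whose scrape timeout is greater than the scrape interval, so it can be worth checking the two values before restarting the service. Here is a small sketch of that check. The helper names `to_seconds` and `timeout_is_valid` are our own, and the parser only understands simple single-unit durations such as `6s` or `1m`, not the full Prometheus duration syntax.

```python
import re

# Seconds per supported unit (a deliberately small subset).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def to_seconds(duration):
    """Convert a simple duration such as '6s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def timeout_is_valid(scrape_interval, scrape_timeout):
    """Return True when the timeout does not exceed the interval."""
    return to_seconds(scrape_timeout) <= to_seconds(scrape_interval)


if __name__ == "__main__":
    # Values taken from the configuration used in this tutorial.
    print(timeout_is_valid("6s", "5s"))
```

With the values used in this tutorial (an interval of 6s and a timeout of 5s), the check passes.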

Save your file, and restart your Prometheus service.

$ sudo systemctl restart prometheus 
$ sudo systemctl status prometheus

Head back to the Prometheus UI, and select the “Targets” tab to make sure that Prometheus is correctly connected to the WMI exporter.
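The same information is available from the Prometheus HTTP API at `/api/v1/targets`, which is handy if you want to check target health from a script. The sketch below assumes Prometheus is reachable at http://localhost:9090 as in this tutorial; the function names `fetch_targets` and `unhealthy_targets` are our own.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape URL, last error) for every active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            target.get("labels", {}).get("job", ""),
            target.get("scrapeUrl", ""),
            target.get("lastError", ""),
        )
        for target in targets
        if target.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for job, url, error in unhealthy_targets(fetch_targets()):
        print(f"{job} {url} is down: {error}")
```

An empty output means that every target, including the WMI exporter, was scraped successfully.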

If you are getting the following error, “context deadline exceeded”, make sure that the scrape timeout is set in your configuration file.

Great! Our Windows Server monitoring is almost ready.

Now it is time for us to start building an awesome Grafana dashboard to monitor our Windows Server.

V — Building an Awesome Grafana Dashboard

The Prometheus & Grafana installation was already covered in our previous guides. Make sure to configure your Grafana properly before moving to the next section.

If you are looking to install Grafana on Windows, here is another guide for it.

Prometheus should be configured as a Grafana data source, and accessible through your reverse proxy.

a — Importing a Grafana dashboard

In Grafana, you can either create your own dashboards or you can use pre-existing ones that contributors already crafted for you.

In our case, we are going to use the Windows Node dashboard, available under the dashboard ID 2129.

Head over to the main page of Grafana (located at http://localhost:3000 by default), and click on the Import option in the left menu.

In the next window, simply insert the dashboard ID in the corresponding text field.

From there, Grafana should automatically detect your dashboard as the Windows Node dashboard. This is what you should see.

Select your Prometheus datasource in the “Prometheus” dropdown, and click on “Import” for the dashboard to be imported.

Awesome!

An entire dashboard displaying Windows metrics was created for us in just one click.

As you can see, the dashboard is pretty exhaustive.

You can monitor the current CPU load, but also the number of threads created by the system, and even the number of system exceptions dispatched.
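If you are wondering how a CPU load figure can come out of an exporter that only publishes counters, here is the arithmetic as a small sketch. It assumes the exporter exposes a per-core counter of seconds spent idle (we believe `wmi_cpu_time_total` with `mode="idle"` plays that role in this version of the WMI exporter, but check your own /metrics output); the function name `cpu_busy_percent` is ours.

```python
def cpu_busy_percent(idle_start, idle_end, elapsed_seconds, cores):
    """Estimate CPU usage from an idle-time counter summed over all cores.

    idle_start / idle_end: idle seconds at the two scrapes.
    elapsed_seconds: wall-clock time between the two scrapes.
    cores: number of logical cores contributing to the counter.
    """
    if elapsed_seconds <= 0 or cores <= 0:
        raise ValueError("elapsed_seconds and cores must be positive")
    idle_fraction = (idle_end - idle_start) / (elapsed_seconds * cores)
    return 100.0 * (1.0 - idle_fraction)


if __name__ == "__main__":
    # Two cores were idle for 90 of the 120 available core-seconds.
    print(cpu_busy_percent(1000.0, 1090.0, 60.0, 2))
```

In PromQL, the same idea is usually written along the lines of `100 - avg(irate(wmi_cpu_time_total{mode="idle"}[5m])) * 100`, again assuming that metric name matches your exporter version.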

On the second line, you have access to metrics related to the network monitoring. You can for example have a look at the number of packets sent versus the number of packets received by your network card.

It can be useful to track anomalies on your network, in case of TCP flood attacks on your servers for example.

On the third line, you have metrics related to the disk I/O usage on your computer.

Those metrics can be very useful when you are trying to debug applications (for example ASP.NET applications). Using those metrics, you are able to see if your application consumes too much memory or too much disk.

Finally, one of the greatest panels has to be the memory monitoring. RAM has a very big influence on the overall system performance.

As a consequence, it has to be monitored properly, and this is exactly what the fourth line of the dashboard does.
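You can also retrieve a memory figure without opening Grafana, by sending a PromQL expression to the Prometheus instant query endpoint (`/api/v1/query`). The sketch below only builds the request URL. The expression assumes that your exporter publishes `wmi_os_physical_memory_free_bytes` and `wmi_cs_physical_memory_bytes`, which depends on the collectors you enabled, so verify both names in your /metrics output first. The function name `instant_query_url` is ours.

```python
from urllib.parse import urlencode

# Assumed metric names; check them against your own exporter output.
MEMORY_USED_PERCENT = (
    "100 * (1 - wmi_os_physical_memory_free_bytes / wmi_cs_physical_memory_bytes)"
)


def instant_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL of a Prometheus instant query for the given expression."""
    params = urlencode({"query": expr})
    return f"{base_url}/api/v1/query?{params}"


if __name__ == "__main__":
    print(instant_query_url(MEMORY_USED_PERCENT))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON document containing the current percentage of memory in use.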

That’s an awesome dashboard!!

VI — Conclusion

As you can see, monitoring Windows servers can easily be done using Prometheus and Grafana.

From there, you can create your own visualizations, your own dashboards and your own alerts.
