
Advanced Configuration for Prometheus Alerts

Hands-On Lab

 


Will Boyd

DevOps Team Lead in Content

Length

00:45:00

Difficulty

Intermediate

Prometheus Alertmanager offers several features for managing alerts beyond basic notification routing. These features let you tune your alerts so they are more useful in real-world situations, for example by cutting down on redundant notifications. In this lab, you will practice using three of them: alert grouping, inhibitions, and silences.



Solution

Log in to the Prometheus server using the credentials provided:

ssh cloud_user@<PROMETHEUS_SERVER_PUBLIC_IP>

Combine the Web Server Down Alerts into a Single Group

  1. Check Prometheus in a web browser at http://<PROMETHEUS_SERVER_PUBLIC_IP>:9090.

  2. Click the Alerts tab. We should see the WebBadGateway alert as well as WebServer1Down and WebServer2Down.

  3. In the terminal, open the Alertmanager configuration file:

    sudo vi /etc/alertmanager/alertmanager.yml
  4. Add a new node to the routing tree to combine the alerts whose names match WebServer.*Down into a single group:

    route:
    
      ...
    
      routes:
      - receiver: 'web.hook'
        group_by: ['service']
        match_re:
          alertname: 'WebServer.*Down'
  5. Save and exit the file by pressing Escape followed by :wq.

  6. Load the new configuration:

    sudo killall -HUP alertmanager
  7. Check Alertmanager in a web browser at http://<PROMETHEUS_SERVER_PUBLIC_IP>:9093. We should see the WebServer1Down and WebServer2Down alerts grouped together under the group service="webserver".
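
To make the grouping step more concrete, here is a minimal Python sketch of how a route with match_re and group_by: ['service'] could bucket alerts. It is a simplified model under stated assumptions, not Alertmanager's actual implementation: the regex and the group_by label come from the configuration above, while the helper name group_alerts and the sample label sets (including the service label on WebBadGateway) are hypothetical. Alertmanager anchors its regex matchers, which re.fullmatch approximates here.

```python
import re
from collections import defaultdict

# Values taken from the route added in this lab.
ROUTE_PATTERN = re.compile(r"WebServer.*Down")
GROUP_BY = ("service",)


def group_alerts(alerts):
    """Bucket alerts that match the route by their group_by label values."""
    groups = defaultdict(list)
    for labels in alerts:
        # fullmatch approximates Alertmanager's anchored regex matching.
        if ROUTE_PATTERN.fullmatch(labels.get("alertname", "")):
            key = tuple((name, labels.get(name, "")) for name in GROUP_BY)
            groups[key].append(labels["alertname"])
    return dict(groups)


# Hypothetical label sets resembling the alerts in this lab.
alerts = [
    {"alertname": "WebServer1Down", "service": "webserver"},
    {"alertname": "WebServer2Down", "service": "webserver"},
    {"alertname": "WebBadGateway", "service": "webserver"},
]

for key, names in group_alerts(alerts).items():
    print(dict(key), names)
```

In this model, WebBadGateway falls through to the parent route because its name does not match the pattern, even though it carries the same service label in the sample data.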

Create an Inhibition to Stop the WebBadGateway Alert When a WebServerDown Alert Is Already Firing

  1. In the terminal, edit the Alertmanager configuration file:

    sudo vi /etc/alertmanager/alertmanager.yml
  2. Add a new inhibit rule:

    inhibit_rules:
    
      ...
    
      - source_match_re:
          alertname: 'WebServer.*Down'
        target_match:
          alertname: 'WebBadGateway'
  3. Save and exit the file by pressing Escape followed by :wq.

  4. Load the new configuration:

    sudo killall -HUP alertmanager
  5. Refresh Alertmanager in the browser. The WebBadGateway alert should no longer appear. Check the Inhibited box to display it again.
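
The effect of the inhibit rule can be sketched in a few lines of Python. This is an illustrative model only, assuming alerts are identified by name alone; the function name visible_alerts is hypothetical. Because the rule above has no equal list, any firing alert that matches the source pattern suppresses the target in this model.

```python
import re

# Matchers taken from the inhibit rule added in this lab.
SOURCE_PATTERN = re.compile(r"WebServer.*Down")
TARGET_ALERTNAME = "WebBadGateway"


def visible_alerts(firing):
    """Return the alert names that remain after applying the inhibit rule."""
    # The rule only takes effect while at least one source alert is firing.
    source_firing = any(SOURCE_PATTERN.fullmatch(name) for name in firing)
    return [
        name for name in firing
        if not (source_firing and name == TARGET_ALERTNAME)
    ]


# With a web server down, the gateway alert is suppressed.
print(visible_alerts(["WebServer1Down", "WebServer2Down", "WebBadGateway"]))
# With no source alert firing, the gateway alert is shown as usual.
print(visible_alerts(["WebBadGateway"]))
```

The second call shows why inhibition is safer than deleting the WebBadGateway rule: the alert still surfaces when the gateway fails for some other reason.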

Silence the WebServer1Down Alert

  1. Expand the service="webserver" group.

  2. Locate the alert with alertname="WebServer1Down", and click the Silence button for that alert.

  3. Enter your name for Creator.

  4. Enter "silence WebServerDown" for Comment.

  5. Click Create.

  6. Navigate back to the main Alertmanager page. The WebServer1Down alert should no longer be listed.
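
The same silence could in principle be created without the web UI by posting to Alertmanager's v2 API. The sketch below only builds the request body and does not send it. It assumes the silence matches on alertname alone (the UI pre-fills matchers from the alert's labels, so the silence created above may carry more matchers), that the running Alertmanager version accepts the isEqual field, and that a two-hour duration is acceptable. The function name build_silence and the creator value cloud_user are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(alertname, creator, comment, hours=2, now=None):
    """Build a request body for POST /api/v2/silences."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {
                "name": "alertname",
                "value": alertname,
                "isRegex": False,  # exact match, not a regular expression
                "isEqual": True,   # match alerts where the label equals the value
            }
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": creator,
        "comment": comment,
    }


payload = build_silence("WebServer1Down", "cloud_user", "silence WebServerDown")
print(json.dumps(payload, indent=2))
```

Sending this JSON with a POST request to http://<PROMETHEUS_SERVER_PUBLIC_IP>:9093/api/v2/silences should have an effect similar to clicking Create, though it is worth checking the result in the Silences tab.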

Conclusion

Congratulations on successfully completing this hands-on lab!