AIOps Essentials (Autoscaling Kubernetes with Prometheus Metrics)
This course establishes a baseline for AIOps by utilizing Prometheus for managing time series metrics produced by Node Exporter and cAdvisor. The course guides the student through the fundamental concepts required for AIOps and the use of streaming metrics to influence autoscaling. The culmination of the course is the integration of the Prometheus rules with the Kubernetes APIServer to scale nodes in an active Kubernetes cluster.
Interactive Diagram: https://interactive.linuxacademy.com/diagrams/AIOpsEssentials.html
AIOps Essentials (Autoscaling Kubernetes)
Introduction to Author
This is a video introducing the course author.
Introduction to This Course
This introduction goes over what is and is not covered in this brief five-hour course. Three suggested courses are given for students who want more in-depth coverage of Kubernetes, Prometheus, or Python. The introduction also demonstrates the interactive study guide and shows the proof-of-concept architecture that will be built up in this course.
Autoscaling a Cluster vs. Scaling an Infrastructure
This lesson identifies the differences between Kubernetes Autoscaling techniques and what is needed in a hybrid cloud or multicloud context. A brief explanation of Kubernetes Autoscaling is contrasted with cloud orchestration and the need for AIOps to govern the scaling of multiple cloud environments.
The Case for AIOps
This lesson describes in further detail how Agile and DevOps have created the need for deployment automation. The deployment of workloads to on-premises data centers (clouds) as well as to hybrid and multicloud architectures is discussed.
Machine Learning and Predictive Analytics
This video provides a brief definition of what Machine Learning is and how it is applied in this course for an AIOps use case.
Monitoring and Metrics
This video introduces Prometheus, one of the software modules we will use in this course.
Prometheus Node Exporter
This lesson introduces the Node Exporter component to the student and explains its role in the overall architecture.
This lesson introduces Google's cAdvisor module and explains its role in the architecture for this course.
Prometheus Node Exporter and cAdvisor Demo for Lab Prep
This brief video informs the student of some resources available to them if they are new to Linux Academy's lab environment.
Exporting Metrics For AIOps
This video discusses the need to establish a Data Taxonomy for log and metrics aggregation. Hierarchies of cloud infrastructures, business contexts, and cluster architectures are all covered as potential means of classifying and categorizing diverse input streams.
Relabeling With Prometheus
This video covers the use of Prometheus Relabeling to add metadata to time series data and create the taxonomy required for enterprise aggregation. Two scrape configurations are reviewed: one for EC2 and the other for Kubernetes.
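To make the relabeling idea concrete, the following is a minimal Python sketch of what a `replace` relabel action does to a target's label set. It is a simplified model, not Prometheus code and not the course's configuration: the function `relabel_replace`, the target label names `environment` and `region`, and the sample values are assumptions chosen for illustration. Only the `__meta_ec2_*` source labels come from Prometheus EC2 service discovery.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement,
                    separator=";"):
    """Simplified model of a Prometheus `replace` relabel action (hypothetical helper)."""
    # Join the source label values; a missing label counts as an empty string.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, so fullmatch is the closest analogue.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    result = dict(labels)
    # Python group references are written \1 where Prometheus would use $1.
    result[target_label] = match.expand(replacement)
    return result


# Made-up discovery labels for one EC2 target.
target = {
    "__meta_ec2_tag_Environment": "prod",
    "__meta_ec2_availability_zone": "us-east-1a",
}

# Copy the Environment tag into a taxonomy label.
step1 = relabel_replace(target, ["__meta_ec2_tag_Environment"],
                        r"(.+)", "environment", r"\1")
# Derive a region label by dropping the trailing zone letter.
step2 = relabel_replace(step1, ["__meta_ec2_availability_zone"],
                        r"(.*)[a-z]", "region", r"\1")
print(step2["environment"], step2["region"])
```

Chaining the two calls mirrors how a list of relabel rules is applied in order, each rule seeing the labels produced by the previous one.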
Aggregating Time Series Data
This lesson covers aggregating data with Prometheus. This is known as "Federation" within Prometheus. A sample architecture is reviewed and a sample configuration file is given.
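As a rough illustration of federation, an aggregating Prometheus server scrapes the `/federate` endpoint of another server and selects series with one or more `match[]` parameters. The Python sketch below only builds such a URL. The host name and the two selectors are assumptions for illustration, not the course's sample configuration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def federate_url(base_url, matchers):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical selectors: all node-exporter series plus recorded "job:" aggregates.
matchers = ['{job="node-exporter"}', '{__name__=~"job:.*"}']
url = federate_url("http://prometheus.example.internal:9090", matchers)

# Decoding the query string again shows the selectors survive URL encoding.
decoded = parse_qs(urlsplit(url).query)["match[]"]
print(url)
print(decoded)
```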
Using the Prometheus API
This lesson briefly introduces how a Python client can pull metrics from the Prometheus API. It covers the concepts only; a specific example is then worked through hands-on in the lab.
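The sketch below shows one way a Python client might talk to the Prometheus HTTP API, assuming the instant-query endpoint `/api/v1/query` and its standard JSON response shape. It is not the lab's client: the helper names `instant_query` and `vector_to_dict` and the `SAMPLE` payload are made up for illustration, and the network call is defined but not executed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query(base_url, promql, timeout=10):
    """Run an instant query against a Prometheus server and return the parsed JSON."""
    url = f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def vector_to_dict(payload, label="instance"):
    """Map the chosen label to the sample value for an instant-vector result."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value as a string"].
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


# Made-up response in the documented shape, standing in for a live server.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "10.0.1.15:9100"}, "value": [1700000000, "0.42"]},
            {"metric": {"instance": "10.0.2.27:9100"}, "value": [1700000000, "0.87"]},
        ],
    },
}

cpu_by_instance = vector_to_dict(SAMPLE)
print(cpu_by_instance)
```

Against a real server, `vector_to_dict(instant_query(base_url, promql))` would replace the sample payload.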
Alerts and Triggers
The Problem With Noise
This lesson introduces our discussion of alerts and triggers. It covers the challenges of using alerts in elastic infrastructures and explains why, at enterprise scale, alerts and manual intervention are no longer a feasible way to scale capacity.
Using Rules In Prometheus
This lesson covers Prometheus Recording Rules and shows a sample of their use.
Using Dashboards for Alerting
This lesson covers the use of dashboards and provides Grafana as an example. The architecture that might be employed to further refine Prometheus metrics prior to storage is covered.
Using Linear Regression With Kubernetes
Machine Learning Fundamentals
This video explains the Machine Learning concepts that are relevant to the lab and proof-of-concept that this course covers.
Using Python to Predict Scale
This lesson discusses why Python is a particularly useful language for Machine Learning. The third-party Python libraries used in this course are also discussed.
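As a minimal sketch of predicting scale with linear regression, the example below fits a straight line to a short series of CPU usage samples and extrapolates it to estimate how many nodes would be needed. It uses only the standard library rather than the third-party libraries the course discusses, and the sample data, the 30-minute horizon, and `CORES_PER_NODE` are assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up samples: total CPU cores in use across the cluster, every 5 minutes.
minutes = [0, 5, 10, 15, 20]
cores_used = [4.0, 5.0, 6.0, 7.0, 8.0]

CORES_PER_NODE = 4      # assumed node size
HORIZON_MINUTES = 30    # assumed look-ahead time

slope, intercept = fit_line(minutes, cores_used)
predicted = slope * HORIZON_MINUTES + intercept
# Round up: a partially used node still has to exist.
nodes_needed = math.ceil(predicted / CORES_PER_NODE)
print(f"predicted cores: {predicted:.1f}, nodes needed: {nodes_needed}")
```

A straight-line fit only suits short horizons and steady trends; it stands in here for whatever model the lab actually trains.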
Scaling an Infrastructure
Scaling Nodes in a Kubernetes Cluster
This lesson covers the topic of scaling capacity in a Kubernetes cloud. The use of automated installers and configuration management tooling is introduced.
Scaling a Hybrid Cloud With ML
In this lesson, we review the architecture of our proof-of-concept and explain how it might be expanded to accommodate more complex hybrid cloud architectures.
Conclusion and Next Steps
This brief summation encourages the student to pursue further study and involvement in the open-source AIOps community.
Credits and Resources
This video mentions two books that were used to create the course content and that may prove useful to the student for further study.