Linux Academy’s Elastic Certification Preparation Course
BigData Training Architect II in Content
Elasticsearch has become a favorite technology of administrators, engineers, and developers alike. Whether you are using it with the rest of the Elastic Stack, or on its own, Elasticsearch is a powerful and user-friendly search and analytics engine. Log aggregation, operational analytics, application performance monitoring, NoSQL databases, site search, and ad-hoc data analysis are just a few of the many things Elasticsearch is used for. The Elastic Certified Engineer Certification was created to recognize IT professionals with expertise in Elasticsearch. An Elastic Certified Engineer can design and deploy a complete Elasticsearch solution.
DISCLAIMER: This is not an official Elastic course and was not created or approved by Elastic. Linux Academy is in no way affiliated with Elastic or Elasticsearch BV.
Welcome to this professional-level certification preparation course for becoming an Elastic Certified Engineer! In this video, I will go over who this course is for, what you need to know before taking the course, and what kinds of skills you can expect to learn.
About the Author
Get to know a little bit about me, the training architect and author for this course!
Nomenclature: ELK vs Elastic Stack
You may hear references to both ELK and the Elastic Stack and not know the difference. Let's clear that up with this short video describing the nomenclature of Elastic's product suite.
Elastic Stack Overview
While this course is about Elasticsearch, it wouldn’t be right if we didn't spend at least a few minutes talking about the Elastic Stack as a whole. It’s helpful to know what other services are offered in the Elastic Stack and how Elasticsearch fits into the Elastic Stack ecosystem. In this video, I will briefly describe each Elastic Stack service, including the FOSS services (Beats, Logstash, Elasticsearch, and Kibana) and the premium plugin X-Pack (Security, Alerting, Monitoring, Reporting, Graph, and Machine Learning).
Before we jump into the exam objectives, we first need to go over some basic Elasticsearch terms and concepts. In this video, we will go over all the Elasticsearch terminology you need to know and discuss some of the inner workings of Elasticsearch to help make sense of what we will do later in the course.
Installation and Configuration
Deploy, Configure, and Start an Elasticsearch Cluster That Satisfies a Given Set of Requirements
The first step to using Elasticsearch to store and analyze vast amounts of data is to deploy it. There are many ways you can deploy an Elasticsearch cluster, but in this video we will focus on the one method you need to know for the Elastic Certified Engineer exam. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Prepare a Linux system for Elasticsearch
- Deploy Elasticsearch
- Configure a multi-node cluster
- Add custom node attributes
- Bind Elasticsearch to specific addresses
- Configure node roles
- Name a cluster
- Name each node in a cluster
- Configure node awareness and discovery
- Avoid split-brain scenarios
- Start Elasticsearch
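As a minimal sketch, the elasticsearch.yml for one node of a small pre-7.0 cluster (the versions this course targets, where X-Pack is a plugin) might look like the following. The cluster name, node name, addresses, and the zone attribute are all example values:

```yaml
# elasticsearch.yml — example values only
cluster.name: my-cluster
node.name: node-1
node.master: true                   # node roles
node.data: true
node.attr.zone: us-east-1a          # custom node attribute
network.host: [_local_, _site_]     # bind addresses
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# majority of master-eligible nodes, to avoid split-brain scenarios
discovery.zen.minimum_master_nodes: 2
```

Each node in the cluster gets the same cluster name but a unique node name; the discovery hosts list lets the nodes find each other.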
Secure an Elasticsearch Cluster Using X-Pack Security
Without the X-Pack Security plugin, Elasticsearch is susceptible to unauthorized use by nefarious actors. In order to secure Elasticsearch properly, you must use X-Pack Security to encrypt the various networks and enforce granular, role-based user access control. I highly encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Install X-Pack for Elasticsearch
- Generate a CA with the certutil tool
- Generate node certificates
- Add certificate passwords to the keystore
- Encrypt the transport network
- Set built-in user passwords
- Encrypt the HTTP network
- Install X-Pack for Kibana
Define Role-Based Access Control Using X-Pack Security
With X-Pack Security, you can create highly granular roles to permit or restrict access to specific indexes, documents, and fields. Follow along on the Linux Academy cloud servers as we demonstrate how to:

- Use Kibana's management UI to create roles and users
- Use Elasticsearch APIs to create roles and users
- Use command line curl to create roles and users
- Limit access to specific cluster permissions
- Limit access to specific indexes
- Only allow specific actions on indexes
- Restrict access to documents via a query
- Limit access to specific document fields
- Create a user with a given set of roles
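As a sketch of what such a role definition can look like, the body below combines cluster permissions, index privileges, document-level security (a query), and field-level security. The role, index, and field names are all illustrative; in the 6.x API this body would be PUT to an endpoint such as `/_xpack/security/role/logs_reader`.

```python
import json

# Hypothetical role: read-only access to logs-* indexes, limited to
# documents matching a query and to two granted fields.
role = {
    "cluster": ["monitor"],                        # cluster-level permissions
    "indices": [
        {
            "names": ["logs-*"],                   # indexes this role can touch
            "privileges": ["read"],                # allowed index actions
            "query": {"match": {"team": "dev"}},   # document-level security
            "field_security": {"grant": ["message", "@timestamp"]}
        }
    ]
}
print(json.dumps(role, indent=2))
```

A user is then created with this role name in its `roles` array.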
Define an Index That Satisfies a Given Set of Requirements
Before we can get our hands dirty with indexing, searching, and aggregating, we first need to learn how to create indexes. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Create indexes with specific shard counts
- Create indexes with explicit mappings
- List the indexes in a cluster
- Read the anatomy of any given index
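As a sketch, a create-index request body combines settings (shard counts) with an explicit mapping. The index, type, and field names below are illustrative; this body would be PUT to `/my_index`:

```python
import json

# Example create-index body for a pre-7.0 cluster (single mapping type).
body = {
    "settings": {
        "number_of_shards": 3,      # primary shards, fixed at creation time
        "number_of_replicas": 1     # replicas per primary, changeable later
    },
    "mappings": {
        "_doc": {                   # mapping type
            "properties": {
                "title": {"type": "text"},
                "views": {"type": "integer"}
            }
        }
    }
}
print(json.dumps(body, indent=2))
```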
Perform Index, Create, Read, Update, and Delete Operations on the Documents of an Index
Performing create, read, update, and delete (CRUD) operations in Elasticsearch could be considered Elasticsearch fundamentals. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Create/index documents
- Bulk index documents with the _bulk API
- Read documents
- Perform doc updates
- Perform script updates
- Delete documents
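To illustrate the _bulk format specifically: the request body is newline-delimited JSON, where each action line is followed by its document (for index/create actions) and the body must end with a newline. The index name and documents below are example values:

```python
import json

# Build a hypothetical _bulk body that indexes two documents.
docs = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
lines = []
for doc in docs:
    # Action line, then the document itself on the next line.
    lines.append(json.dumps({"index": {"_index": "my_index", "_type": "_doc", "_id": doc["id"]}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"   # _bulk requires a trailing newline
print(bulk_body)
```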
Define and Use Index Aliases
Oftentimes, for large datasets, you want to spread your data out across many indexes. It can then be tedious to query your data if you don't know exactly which index to query. However, we can avoid this problem by using aliases to reference multiple indexes at the same time. That way, if you have weekly or daily indexes, you can create them with an alias and always reference the entire dataset across all aliased indexes. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Create or remove aliases with the _aliases API
- Add aliases at index-time
- Create filtered aliases
Define and Use an Index Template for a Given Pattern That Satisfies a Given Set of Requirements
Whenever you are indexing time-series data, or just indexing similar data across many different indexes, it can become very tedious to create indexes with the same structure over and over again. By using index templates, we can automatically apply a specific set of settings and mappings to every new index whose name matches a given index pattern. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Create custom index templates
- Create indexes that match an index pattern
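A sketch of a template body (PUT to `/_template/weekly_logs` in the 6.x API; the template name, pattern, and fields are illustrative):

```python
import json

# Any new index whose name matches the wildcard pattern (not a regex)
# automatically picks up these settings and mappings.
template = {
    "index_patterns": ["logs-*"],
    "settings": {"number_of_shards": 2, "number_of_replicas": 1},
    "mappings": {
        "_doc": {"properties": {"@timestamp": {"type": "date"}}}
    }
}
print(json.dumps(template, indent=2))
```

Creating an index named, say, logs-2019.01 would then match the pattern and inherit the template.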
Define and Use a Dynamic Template That Satisfies a Given Set of Requirements
When indexing documents, Elasticsearch tries its best to map your fields to the right datatype in the event that you do not explicitly define your field mappings. By using dynamic templates, we can change the automatic mapping behavior of Elasticsearch to behave more to our liking. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Index documents with default dynamic mapping
- Create dynamic templates to change dynamic mapping behavior
- Create dynamic templates that only target specific field name patterns
- Create explicit field mappings
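As an example of targeting a field name pattern, the dynamic template below (all names illustrative) maps any new string field ending in `_code` as a non-analyzed keyword instead of the default analyzed text field:

```python
import json

# Mapping body with one dynamic template: string fields matching
# "*_code" become keyword fields when first seen.
mapping = {
    "mappings": {
        "_doc": {
            "dynamic_templates": [
                {
                    "codes_as_keywords": {
                        "match_mapping_type": "string",
                        "match": "*_code",
                        "mapping": {"type": "keyword"}
                    }
                }
            ]
        }
    }
}
print(json.dumps(mapping, indent=2))
```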
Use the Reindex API and Update by Query API to Reindex and/or Update Documents
Once you have all of your data indexed, you may find that you want to change it somehow. Rather than deleting it and indexing it all again differently, we can use the _reindex and _update_by_query APIs to either reindex our documents to new indexes or just update them in place. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Reindex all documents to another index
- Reindex specific documents to another index
- Reindex from a remote cluster to a local cluster
- Mutate documents during reindexing
- Update documents in place
- Mutate documents in place with Painless scripts
- Run documents through an ingest node pipeline during reindexing or updating
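A sketch of a `POST /_reindex` body that copies only matching documents into a new index and mutates them with a Painless script along the way (index names, query, and script are all example values):

```python
import json

# Reindex a subset of logs-old into logs-errors, tagging each copied
# document via a Painless script.
reindex = {
    "source": {
        "index": "logs-old",
        "query": {"term": {"level": "error"}}    # only reindex matches
    },
    "dest": {"index": "logs-errors"},
    "script": {
        "lang": "painless",
        "source": "ctx._source.migrated = true"  # mutate during reindex
    }
}
print(json.dumps(reindex, indent=2))
```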
Define and Use an Ingest Pipeline That Satisfies a Given Set of Requirements, Including the Use of Painless to Modify Documents
Logstash has long been the go-to tool for taking data from Elasticsearch, mutating it in some way with Logstash pipelines, and then indexing the data back in. However, Elasticsearch has come a long way in recent years, particularly with respect to this use case. Ingest pipelines have since been added to Elasticsearch, which allow you to define a Logstash-like pipeline from within Elasticsearch itself. You can then use these pipelines in-flight during reindex operations or in-place during update operations. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Define an ingest pipeline
- Remove a field from an index
- Convert a field's datatype
- Add a new field with string concatenation
- Use Painless to update the value of a field depending on its current value
- Modify documents while reindexing them to a new index with a pipeline
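The pipeline definition below is a sketch covering those processor types: it removes a field, converts a datatype, concatenates a new field, and runs a Painless script. The pipeline, field names, and logic are illustrative; the body would be PUT to `/_ingest/pipeline/cleanup`:

```python
import json

# Hypothetical ingest pipeline with remove, convert, set, and script
# processors, applied to each document that passes through it.
pipeline = {
    "description": "example cleanup pipeline",
    "processors": [
        {"remove": {"field": "debug"}},
        {"convert": {"field": "status", "type": "integer"}},
        {"set": {"field": "full_name", "value": "{{first}} {{last}}"}},
        {"script": {
            "lang": "painless",
            "source": "if (ctx.status >= 500) { ctx.severity = 'high' }"
        }}
    ]
}
print(json.dumps(pipeline, indent=2))
```

During a reindex, referencing the pipeline in the `dest` section runs every copied document through these processors.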
Mappings and Text Analysis
Define a Mapping That Satisfies a Given Set of Requirements
We briefly discussed field mappings when we worked with indexes and documents. Now let's really dig into what they are and how we can configure them. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Define index mappings
- Create a mapping type
- Map fields to their appropriate data types
Define and Use a Custom Analyzer That Satisfies a Given Set of Requirements
Text analysis makes it possible to search individual terms within your data. Elasticsearch ships with a variety of built-in analyzers that you can use to tokenize your data. Elasticsearch also enables you to define your own custom analyzers. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Define text analysis
- Determine the data types that support text analysis
- Analyze some text with various built-in analyzers
- Analyze text of various languages with built-in language analyzers
- Customize a built-in analyzer
- Create a custom analyzer with unique character and token filters
- Search for terms with different analyzers to compare text analysis behavior
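As a sketch of a custom analyzer definition (the analyzer name is illustrative; `html_strip`, `standard`, `lowercase`, and `stop` are built-in components): an HTML-stripping character filter runs first, then the standard tokenizer, then lowercase and stopword token filters.

```python
import json

# Index settings defining a custom analyzer from built-in pieces.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],   # runs before tokenizing
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop"]  # runs on each token
                }
            }
        }
    }
}
print(json.dumps(settings, indent=2))
```

A text field can then set `"analyzer": "my_analyzer"` in its mapping, and the `_analyze` API lets you test the analyzer against sample text.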
Define and Use Multi-fields with Different Data Types and/or Analyzers
Different data types in Elasticsearch behave in different ways. Sometimes, you may want to use your source data for multiple use cases (e.g., for both full-text search and aggregating). To help with this, Elasticsearch lets you index your data with multiple data types and analyzers using something called multi-fields. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Define multi-fields
- Map multi-fields
- Search on specific multi-fields
Configure an Index So That It Properly Maintains the Relationships of Nested Arrays of Objects
Field mappings in Elasticsearch support hierarchy. With Elasticsearch object fields, you can have fields within fields within fields and so on. When you start to mix arrays with objects, however, things can get a little weird. In order to store an array of object fields while maintaining field relationships, you must use a special data type called nested. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Create object fields
- Demonstrate flattened arrays of objects
- Configure arrays of objects to preserve their field relationships
- Demonstrate the nested field effect using the search API
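As a sketch, the mapping below (field names illustrative) declares a `comments` array as nested so that each comment's `author` and `message` stay paired; with a plain object field, the arrays would be flattened and a query could cross-match one comment's author with another's message.

```python
import json

# Mapping an array of objects with the nested type to preserve
# per-object field relationships.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "comments": {
                    "type": "nested",
                    "properties": {
                        "author":  {"type": "keyword"},
                        "message": {"type": "text"}
                    }
                }
            }
        }
    }
}
print(json.dumps(mapping, indent=2))
```

Searching nested objects then requires wrapping the query in a `nested` query that names the `comments` path.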
Configure an Index That Implements a Parent/Child Relationship
Unlike traditional databases, Elasticsearch uses flattened data. Traditional databases generally use normalized data and then use joins to merge data sets and provide a flattened view at query time. Elasticsearch can do the same thing with join fields, but with some restrictions. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Map a join field
- Define a parent/child relationship
- Index a parent document
- Index a child document and map it to a specific parent
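A sketch of a join-field mapping and a child document (all names and IDs illustrative). One restriction to note: a child must live on the same shard as its parent, so indexing the child requires a routing value, e.g. `PUT /my_index/_doc/2?routing=1`.

```python
import json

# Join field declaring a question -> answer parent/child relationship.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "my_join": {
                    "type": "join",
                    "relations": {"question": "answer"}  # parent: child
                }
            }
        }
    }
}

# A child document tied to the parent with document ID "1".
child_doc = {
    "text": "an example answer",
    "my_join": {"name": "answer", "parent": "1"}
}
print(json.dumps(mapping), json.dumps(child_doc))
```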
Allocate the Shards of an Index to Specific Nodes Based on a Given Set of Requirements
By default, Elasticsearch allocates 5 primary shards and 1 replica shard to all indexes, but you can customize those numbers if you wish. When allocating shards to nodes, Elasticsearch automatically tries to distribute the load evenly across the cluster. This, too, is configurable: you can specify which nodes an index's shards should be allocated to. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Specify primary and replica shards for an index
- Display shard allocation for an index
- Update shard allocation settings for an index
- Manually allocate an index to a specific node
Configure Shard Allocation Awareness and Forced Awareness for an Index
For mature production clusters that require maximum uptime and fault tolerance, you should segment your cluster into zones (racks, host machines, data centers, etc.). You can also make shard allocation more intelligent by using node attributes to inform Elasticsearch that you have multiple zones. Instead of being allocated to the same zone, primary and replica shards are spread out to improve fault tolerance in the event that an entire zone fails. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Configure shard allocation awareness
- Configure forced shard allocation awareness
- Determine which awareness method to use and why
Configure a Cluster for Use with a Hot/Warm Architecture
Hot/warm architecture has become a popular data storage solution for companies with long data-retention requirements that don't have the budget to buy massive amounts of very fast storage. With hot/warm architecture, we can store our most recent (and typically most relevant) data on faster, more expensive storage and then re-allocate that data to cheaper nodes when it becomes less important. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Configure node attributes
- Create an index allocated to a hot node
- Create an index allocated to a warm node
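A sketch of the settings side of this: nodes are started with a custom attribute (e.g. `node.attr.box_type: hot` in elasticsearch.yml), and the index settings below pin the index to matching nodes. "box_type" is a conventional but arbitrary attribute name.

```python
import json

# Index settings that require allocation to nodes tagged box_type=hot.
hot_index = {
    "settings": {"index.routing.allocation.require.box_type": "hot"}
}

# Later, migrate the index by updating the same setting to "warm";
# Elasticsearch relocates the shards to the warm nodes automatically.
move_to_warm = {"index.routing.allocation.require.box_type": "warm"}

print(json.dumps(hot_index), json.dumps(move_to_warm))
```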
Back Up and Restore a Cluster and/or Specific Indices
Properly configured, Elasticsearch is a highly redundant and fault-tolerant system. However, as with any data infrastructure, you should always back up your data to prevent losses in the event of user error or a natural disaster. Elasticsearch makes this possible with the snapshot and restore APIs. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Deploy a single-node cluster
- Configure the cluster to store backups
- Add a snapshot repository
- Index some data to be backed up
- Back up an index
- Restore an index
- Restore an index with a different name
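Two sketches of the request bodies involved (repository name, path, and index names are example values): registering a shared-filesystem repository, and restoring an index under a different name. The repository path must also be whitelisted via `path.repo` in elasticsearch.yml.

```python
import json

# Repository registration body, e.g. PUT /_snapshot/my_backups
repo = {
    "type": "fs",                                # shared filesystem repository
    "settings": {"location": "/mnt/backups"}
}

# Restore body, e.g. POST /_snapshot/my_backups/snap_1/_restore,
# renaming the index on the way back in.
restore = {
    "indices": "logs",
    "rename_pattern": "logs",
    "rename_replacement": "logs_restored"
}
print(json.dumps(repo), json.dumps(restore))
```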
Configure a Cluster for Cross-Cluster Search and Execute a Search Query Across Multiple Clusters
Depending on your use cases, Elasticsearch deployments may be better split into separate, smaller clusters instead of one massive cluster that you use for everything. For a lot of users, the main deterrent to breaking up a cluster into multiple clusters is losing the ability to search it all from one place. However, using Elasticsearch's cross-cluster search feature, you can configure multiple clusters to be searchable from one. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Configure cross-cluster search
- Configure both clusters with the same X-Pack Security CA
- Search across two separate clusters
Diagnose Shard Issues and Repair a Cluster’s Health
In Elasticsearch, red cluster states are scary. So how do you fix them? Elasticsearch provides a way to force the cluster to explain itself in the event of unassigned shards. I encourage you to follow along on the Linux Academy cloud servers as we demonstrate how to:

- Use the explain API to diagnose missing replica shards
- Use the explain API to diagnose missing primary shards
- Use the explain API to give a reason for an allocation decision
- Use the results of the explain API to restore cluster health
Write and Execute a Search Query for Terms and/or Phrases in One or More Fields of an Index
Once you have your data in Elasticsearch, the next logical step is to, well, search! However, the sheer number of query types available can be a bit overwhelming at first. Which query type should you use? I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Perform full-text searches
- Use the match query
- Use the match_phrase query
- Use the multi_match query
- Perform term-level queries
- Use the term query
- Use the terms query
- Use the range query
Write and Execute a Search Query That Is a Boolean Combination of Multiple Queries and Filters
Individual queries in Elasticsearch are very well defined and easy to use. But what do you do when you want a more advanced query that combines the features of multiple queries into one? By using the bool query, we can create a compound query that allows us to specify one or more queries with conditional logic. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Create a bool query
- Define a must clause
- Define a must_not clause
- Define a should clause and specify how many queries should match
- Create a filter to reduce the dataset without affecting scoring
- Name each query to identify which one(s) contributed to each search hit
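As a sketch, the bool query below combines all four clause types. The field names and values are illustrative; the body would be POSTed to an index's `_search` endpoint:

```python
import json

# Compound bool query: must/must_not/should clauses score the results,
# while the filter clause narrows the dataset without affecting scores.
query = {
    "query": {
        "bool": {
            "must":     [{"match": {"title": "elasticsearch"}}],
            "must_not": [{"match": {"status": "draft"}}],
            "should":   [{"match": {"tags": "search"}},
                         {"match": {"tags": "analytics"}}],
            "minimum_should_match": 1,   # at least one should clause must match
            "filter":   [{"range": {"year": {"gte": 2017}}}]
        }
    }
}
print(json.dumps(query, indent=2))
```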
Highlight the Search Terms in the Response of a Query
User experience expectations around search have certainly evolved over the years. Elasticsearch does a fantastic job of returning results very quickly, but these days that is not enough for most users. Highlighting the search term(s) in the search results is a great way to draw the user's attention to the portion of the results they clearly care about. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Highlight search terms for a specific field
- Define your own pre and post tags to surround the highlighted terms
Sort the Results of a Query by a Given Set of Requirements
Displaying accurate search results to users is all well and good, but if the results are not sorted in a meaningful way, the user experience can be frustrating. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Sort search results on a specific field's value
- Define multi-level sorting on the values of many different fields
Implement Pagination of the Results of a Search Query
When using Elasticsearch to display search results in a UI, like a website or an application, the number of results you can display at once can be a design limitation. By using the size and from parameters with the Elasticsearch _search API, we can implement pagination in order to show a set number of search results at a time. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Set the size parameter
- Set the from parameter
Use the Scroll API to Retrieve Large Numbers of Results
The size and from parameters allow you to paginate your search results, but there is a limit to how many overall results they can fetch. How do you request really big volumes of data from Elasticsearch? Maybe you need to move the data from Elasticsearch to another analysis tool, like a machine learning job. By using the scroll API, we can request an unlimited number of results from Elasticsearch. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Start a scroll search
- Scroll through a dataset
- Delete open search contexts
- Use a sliced scroll to search in parallel
Apply Fuzzy Matching to a Query
The queries we have covered up to this point are not necessarily user-friendly, at least in the sense that they will not account for user mistakes like spelling errors. Using a fuzzy query, we can tune the search API to try and figure out what the user actually meant in the event that they type something incorrectly. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Demonstrate search with a typo
- Use a fuzzy query with search typos
- Refine a fuzzy query for more specific behavior options
Define and Use a Search Template
With a website or an application using Elasticsearch as a backend search engine, how do we take user input from form fields and properly construct a dynamic query? By using search templates, we can create a query that abstracts fields and values using mustache syntax and then supply the user-provided values as parameters. The parameters can be injected directly into a preconfigured query, or mutated in some way to meet a given query's requirements. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Construct a search template
- Execute a search template
- Introduce advanced parameter mutation possibilities with search templates
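As a sketch, an inline search template request (POSTed to `/_search/template`) stores the query as a mustache-templated JSON string, with user input arriving in `params`. The field names and values are illustrative:

```python
import json

# The template source is mustache-laden JSON; Elasticsearch substitutes
# the params into the placeholders before running the query.
request = {
    "source": json.dumps({
        "query": {"match": {"{{field}}": "{{value}}"}},
        "size": "{{size}}"
    }),
    "params": {"field": "title", "value": "elasticsearch", "size": 10}
}
print(json.dumps(request, indent=2))
```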
Write and Execute Metric and Bucket Aggregations
Elasticsearch is more than just a search engine. It is also an analytics engine through the use of aggregations. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Define cardinality, sum, and avg metric aggregations
- Define terms and date_histogram bucket aggregations
Write and Execute Aggregations That Contain Sub-Aggregations
Now that we have some knowledge of aggregations, let's go ahead and get a little more complex. To really unlock the analysis potential of Elasticsearch, we need to combine aggregations in a nested fashion. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Define metric aggregations nested in bucket aggregations
- Sort the resulting buckets by the metric aggregation output
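A sketch of both ideas at once (field names illustrative): bucket documents by month, compute the average of a `bytes` field inside each bucket, and order the buckets by that sub-aggregation's output.

```python
import json

# A metric aggregation (avg) nested inside a bucket aggregation
# (date_histogram), with the buckets sorted by the metric's result.
aggs = {
    "size": 0,   # we only want aggregation results, not search hits
    "aggs": {
        "per_month": {
            "date_histogram": {
                "field": "@timestamp",
                "interval": "month",
                "order": {"avg_bytes": "desc"}   # sort buckets by sub-agg
            },
            "aggs": {
                "avg_bytes": {"avg": {"field": "bytes"}}
            }
        }
    }
}
print(json.dumps(aggs, indent=2))
```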
Write and Execute Pipeline Aggregations
Let's take things a step further. Now that we can compute and nest metric and bucket aggregations, let's use them as the inputs to pipeline aggregations to answer even more complex questions. I encourage you to follow along using the Linux Academy cloud servers as we demonstrate how to:

- Create the sibling pipeline aggregations max_bucket and sum_bucket
- Create the parent pipeline aggregations cumulative_sum and derivative
Preparing for the Exam
In this video, we'll go over everything you need to know about the exam and how to prepare for it. We'll discuss how to register for the exam, how much it costs, and what to expect on test day.
Congratulations, you made it to the end! Wondering what to do next? Let's talk about a few courses that tie in really well with Elasticsearch as well as some courses that are crucial for learning how to use distributed systems in the real world. No matter what you decide to do next, never stop learning!
Learn how to showcase your successful completion of this course with Linux Academy founder and CEO, Anthony James.