
File Processing and Environment Communication with Python

Course

Intro Video


Keith Thompson

DevOps Training Architect II in Content

Length

11:00:00

Difficulty

Advanced

Videos

25

Hands-on Labs

9

Course Details

This course covers "Exam block #1: File Processing and Communicating with a Program’s Environment" of the certification exam: PCPP-32-1: Certified Professional in Python Programming 1 Certification.

Topics as called out in the exam syllabus are:

Processing different kinds of files:

- sqlite3 – interacting with SQLite databases
- xml – creating and processing XML files
- csv – CSV file reading and writing
- logging – basic logging facility for Python
- configparser – configuration file parser

Communicating with a program's environment:

- os – interacting with the operating system
- datetime – manipulating dates and times
- io – working with streams
- time – time access and conversions

Syllabus

Getting Started

Course Introduction

00:00:59

Lesson Description:

Python is one of the most versatile and widely used programming languages that exists today. Whether you're working in server administration, web development, or data science, you've likely interacted with a tool written in Python or been asked to write some Python yourself. This course is designed to teach you how to use Python to work with various file types, and how to interact with the machine's operating system, dates, and times. This course has been designed to cover all of the content in the first exam block for the Certified Professional in Python Programming 1 exam (PCPP-32-1), created by the Python Institute.

About the Training Architect

00:00:48

Lesson Description:

A little about me, Keith Thompson.

Environment Setup

Installing Python 3.8 on a Cloud Playground

00:06:39

Lesson Description:

Learn how to install Python 3 using pyenv on a CentOS 7 server that has code-server pre-installed to provide a full development environment. Note: This course uses Python 3.8, and you will definitely run into issues if you are using Python < 3.6.

Picking the Right Cloud Playground Image

If you plan on following along with the course on your local workstation, you'll want to make sure that you have a good development environment set up. But if you want to follow along exactly with the course, you'll want to create a Cloud Playground server (2 or 3 units) using the "CentOS 7 w/ code-server" image. This image gives us a server with code-server pre-installed (VS Code running on the server and accessible through the browser).

Using code-server to Program on the Server

By using the domain name for the server followed by port 8080, we can access code-server from our browser while having a full development environment and terminal available to us. We'll be redirected to the page being served over HTTPS and, depending on our browser, we'll need to click a few buttons to acknowledge that we know the certificate is self-signed.

Installing pyenv

Installing Python from source can be a great learning experience, but it is a little tedious. For this course, we're going to instead install pyenv, which will allow us to install and switch between multiple Python versions more easily. To get started, we need to make sure that we have some development dependencies installed so that we can pull down the pyenv repository and build Python. We're using the --skip-broken flag because the "CentOS 7 w/ code-server" playground image already has Git installed. But if you're using a different image, you can install Git using the package manager for that system. Note: You can get to the terminal in VS Code by clicking the hamburger icon (3-line menu) in the top left.

sudo yum install -y --skip-broken git gcc zlib-devel bzip2-devel readline-devel sqlite-devel
We also need to clone the pyenv repository:
$ git clone https://github.com/pyenv/pyenv.git ~/.pyenv
For pyenv to be useful, we'll need to set a few environment variables and run a command when our shell is loading. We'll add those to our ~/.bashrc file so that it's set as soon as our shell is initialized:
$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
$ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bashrc
Before we can use pyenv, we'll need to reload our shell:
$ exec $SHELL
Finally, let's install Python 3.8.2:
$ pyenv install 3.8.2
We can check and switch between versions of Python using pyenv. To see the versions available to us, we'll use the pyenv versions command:
$ pyenv versions
* system (set by /home/cloud_user/.pyenv/version)
  3.8.2
To change our active version, we'll use pyenv shell <VERSION>:
$ pyenv shell 3.8.2
$ python --version
Python 3.8.2
We also have python3 and python3.8 executables that we can use. To make it apparent which version is being used throughout the course, the commands will use the python3.8 executable. Since we're going to be using Python 3.8.2 for this whole course, let's set 3.8.2 as our default Python version in our ~/.bashrc file: ~/.bashrc:
# previous lines omitted
if command -v pyenv 1>/dev/null 2>&1; then
  eval "$(pyenv init -)"
  pyenv shell 3.8.2
fi
Don't forget to save before exiting!

Upgrade Pip

The version of pip that we have might already be up-to-date, but it's good practice to try to update it after the installation. Let's do that now:
$ pip3.8 install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/57/36/67f809c135c17ec9b8276466cc57f35b98c240f55c780689ea29fa32f512/pip-20.0.1-py2.py3-none-any.whl (1.5MB)
     |████████████████████████████████| 1.5MB 3.1MB/s
Installing collected packages: pip
  Found existing installation: pip 19.2.3
    Uninstalling pip-19.2.3:
      Successfully uninstalled pip-19.2.3
Successfully installed pip-20.0.1
To use the improved REPL, install bpython:
$ pip3.8 install bpython

SQLite

Interacting with SQLite Using sqlite3 - Part 1

00:16:36

Lesson Description:

SQLite is the most widely deployed database on the planet (largely because of mobile devices) and it can often be used to great benefit for local projects. In this lesson, we'll learn how to perform basic SQL operations with sqlite3 and Python.

Documentation for This Video

The sqlite3 library
PEP 249
Controlling Transactions
Accessing Columns by Name Instead of Index

Creating a SQLite Database

The sqlite3 standard library package implements the interface described in PEP 249 and is a great library for interacting with SQLite. During our environment setup, we ensured that the SQLite development libraries were installed on our system before we installed Python 3, which allows us to use the sqlite3 package. To begin, let's create a file called using_sqlite.py. We'll write a script to create a database and perform some basic actions: a database with a table to store employee information. First, we need to connect to the database (or create it if it doesn't exist). We'll call our database employees.db. ~/using_sqlite.py:

import sqlite3

conn = sqlite3.connect('employees.db')
cur = conn.cursor()

# code goes here

cur.close()
conn.close()
We first open a connection that will either access an existing file at the path provided or create it. Next, we need a cursor so that we can interact with our database, and we create that using the connection's cursor method. Finally, we want to make sure that we're closing our connection to the database at the end of our script by calling the close method on both the cursor and the connection.

Creating a Table and Inserting Rows

To interact with our database, we'll use the cursor, mostly writing SQL statements as strings and using a few different methods to run them. Let's start by creating the table using the execute method, and then we'll insert some rows using the executemany method. ~/using_sqlite.py:
import sqlite3

conn = sqlite3.connect('employees.db')
cur = conn.cursor()

cur.execute("""
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    years_with_company INTEGER DEFAULT 0
)
""")

cur.executemany("""
INSERT INTO employees (first_name, last_name, email, years_with_company) VALUES (?, ?, ?, ?)
""", [
    ('Kevin', 'Bacon', 'kbacon@example.com', 2),
    ('Josh', 'Brolin', 'jbrolin@example.com', 1),
    ('Kim', 'Dickens', 'kdickens@example.com', 0),
])

cur.close()
conn.close()
When we want to run a single command, we can use the execute method. But if we have a command that we want to run with multiple different values, we can write the SQL once and substitute the values in using a list of tuples. Let's run this and then check the contents of our employees.db file. To get the rows out of the file, we'll execute a query and call the fetchall method on a cursor:
$ python3.8 using_sqlite.py
$ python3.8
Python 3.8.2 (default, Apr  6 2020, 14:05:27)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> conn = sqlite3.connect('employees.db')
>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM employees")
<sqlite3.Cursor object at 0x7f879ec62c00>
>>> all = cur.fetchall()
>>> all
[]
Notice that nothing was returned. That's because we never called conn.commit(), so our INSERT statements were never committed: the sqlite3 module implicitly opens transactions around certain types of SQL statements, such as INSERT. To learn more about this, read the controlling transactions documentation. Let's add a commit to ensure that our rows are added. To save us from opening the REPL to check the data, let's also add a print statement that prints the rows before our script ends. Additionally, let's switch to an in-memory database by changing the connection string to ':memory:'. This will let us start with a clean slate each time we run our script. ~/using_sqlite.py:
import sqlite3

#conn = sqlite3.connect('employees.db')
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute("""
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    years_with_company INTEGER DEFAULT 0
)
""")

cur.executemany("""
INSERT INTO employees (first_name, last_name, email, years_with_company) VALUES (?, ?, ?, ?)
""", [
    ('Kevin', 'Bacon', 'kbacon@example.com', 2),
    ('Josh', 'Brolin', 'jbrolin@example.com', 1),
    ('Kim', 'Dickens', 'kdickens@example.com', 0),
])

conn.commit()

cur.execute("SELECT * FROM employees")
print(cur.fetchall())

cur.close()
conn.close()
We execute a SELECT * FROM employees statement so that the cursor can see the new rows that we've inserted. If we didn't do this, the cursor would not point to the rows, since they weren't returned by the previous statement. As alternatives to fetchall, we could use fetchmany with a number of rows to return, or fetchone to get the next row. It's worth noting that the cursor "moves" when we run these commands, so calling fetchone will return one row, and calling fetchall afterward would return only the remaining rows (see the short sketch after the output below).

Working with Rows

An alternative way to read through the data that we've selected with a query is to iterate over the cursor's rows using a for loop. Let's print a phrase for each row returned from a query: ~/using_sqlite.py:
# previous code omitted

for row in cur.execute("SELECT * FROM employees WHERE years_with_company >= 1"):
    print(row[1], "has worked for", row[4], "years")

cur.close()
conn.close()
This query should return only 2 of the 3 rows in the database. When we run this file, we'll see the following:
$ python using_sqlite.py
[(1, 'Kevin', 'Bacon', 'kbacon@example.com', 2), (2, 'Josh', 'Brolin', 'jbrolin@example.com', 1), (3, 'Kim', 'Dickens', 'kdickens@example.com', 0)]
Kevin has worked for 2 years
Josh has worked for 1 years
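To make the cursor movement concrete, here's a short REPL-style sketch (not part of our script, but assuming the same three inserted rows) showing how fetchone and fetchmany consume the same result set:
cur.execute("SELECT * FROM employees")
first = cur.fetchone()       # consumes the first row
next_two = cur.fetchmany(2)  # consumes the next two rows
rest = cur.fetchall()        # only the remaining rows; here, an empty list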
Accessing the columns by the index is a little tedious, but we can make it easier on ourselves by setting the connection's row_factory option to be sqlite3.Row. Once we do that, we'll be able to also access our values by column name. ~/using_sqlite.py:
import sqlite3

#conn = sqlite3.connect('employees.db')
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row

# table creation and inserts omitted

cur.execute("SELECT * FROM employees")
print(cur.fetchall())

for row in cur.execute("SELECT * FROM employees WHERE years_with_company >= 1"):
    print(row["first_name"], "has worked for", row["years_with_company"], "years")

cur.close()
conn.close()
By running this file again, we'll see the following:
$ python using_sqlite.py
[<sqlite3.Row object at 0x7f879ec7a9f0>, <sqlite3.Row object at 0x7f8
79ebcb590>, <sqlite3.Row object at 0x7f879ebcb5b0>]
Kevin has worked for 2 years
Josh has worked for 1 years
Because our rows are no longer just tuples, the printed output isn't quite as useful. But it's now easier to access specific fields from a row without knowing the order in which the columns are defined.
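If we do want readable output from sqlite3.Row objects, the type provides a keys method that returns the column names; here's a small sketch (building on the script above) that turns a row back into something printable:
cur.execute("SELECT * FROM employees")
row = cur.fetchone()
print(row.keys())                  # ['id', 'first_name', 'last_name', 'email', 'years_with_company']
print(dict(zip(row.keys(), row)))  # pair column names with values for a readable dict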

Interacting with SQLite Using `sqlite3` - Part 2

00:06:21

Lesson Description:

SQLite is the most widely deployed database on the planet (largely because of mobile devices) and it can often be used to great benefit for local projects. In this lesson, we'll continue learning how to perform basic SQL operations with sqlite3 and Python.

Documentation for This Video

The sqlite3 library
PEP 249
Controlling Transactions
Accessing Columns by Name Instead of Index

Updating & Deleting Rows

In the last video, we saw how to insert rows and how to query information out of our database. In this lesson, we're going to look at what it takes to update and delete existing rows. To demonstrate, let's change the years_with_company of "Kevin Bacon" to 3 and delete the second employee. ~/using_sqlite.py

# previous code omitted

cur.execute(
    "UPDATE employees SET years_with_company = ? WHERE email = ?",
    (3, "kbacon@example.com")
)
print(f"Updated {cur.rowcount} rows")

cur.execute("DELETE FROM employees WHERE id = 2")
print(f"Deleted {cur.rowcount} rows")

cur.execute("SELECT * FROM employees")
print(f"Remaining rows: {len(cur.fetchall())}")

cur.close()
conn.close()
The rowcount attribute allows us to see how many rows were affected by our statements. Note that this attribute doesn't reflect the number of rows returned by a SELECT query, though.

Using execute without a Cursor

Unless we want to use the methods or attributes that are specific to the Cursor type, most of the operations that we perform can be done using the execute, executemany, or executescript methods on the Connection type itself. Let's change all of the instances of execute or executemany to use conn instead of cur, unless we're calling a fetch method afterward. We'll run into some errors, and they will show us when we can and cannot use conn instead of cur. ~/using_sqlite.py
import sqlite3

#conn = sqlite3.connect('employees.db')
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
cur = conn.cursor()

conn.execute('''
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    years_with_company INTEGER DEFAULT 0
)
''')

conn.executemany('''
INSERT INTO employees (first_name, last_name, email, years_with_company) VALUES (?, ?, ?, ?)
''', [
    ('Kevin', 'Bacon', 'kbacon@example.com', 2),
    ('Josh', 'Brolin', 'jbrolin@example.com', 1),
    ('Kim', 'Dickens', 'kdickens@example.com', 0),
])

conn.commit()

cur.execute("SELECT * FROM employees")
print(cur.fetchall())

for row in conn.execute("SELECT * FROM employees WHERE years_with_company >= 1"):
    print(row["first_name"], "has worked for", row["years_with_company"], "years")

conn.execute(
    "UPDATE employees SET years_with_company = ? WHERE email = ?",
    (3, "kbacon@example.com")
)
print(f"Updated {conn.rowcount} rows")

conn.execute("DELETE FROM employees WHERE id = 2")
print(f"Deleted {conn.rowcount} rows")

cur.execute("SELECT * FROM employees")
print(f"Remaining rows: {len(cur.fetchall())}")

cur.close()
conn.close()
Now, if we run the script we'll see this error:
$ python using_sqlite.py
[<sqlite3.Row object at 0x7f0b5b2dfa10>, <sqlite3.Row object at 0x7f0b5b233590>, <sqlite3.Row object at 0x7f0b5b2335b0>]
Kevin has worked for 2 years
Josh has worked for 1 years
Traceback (most recent call last):
  File "using_sqlite.py", line 38, in <module>
    print(f"Updated {conn.rowcount} rows")
AttributeError: 'sqlite3.Connection' object has no attribute 'rowcount'
This shows that we don't have access to the rowcount attribute on the connection. In general, if we need to see the results of our SQL statements, we'll want to use a cursor instead of executing the statement directly off of the connection. Let's change the statements that use rowcount back to using the cursor, and then verify that the script runs. ~/using_sqlite.py
import sqlite3

#conn = sqlite3.connect('employees.db')
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
cur = conn.cursor()

conn.execute('''
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    years_with_company INTEGER DEFAULT 0
)
''')

conn.executemany('''
INSERT INTO employees (first_name, last_name, email, years_with_company) VALUES (?, ?, ?, ?)
''', [
    ('Kevin', 'Bacon', 'kbacon@example.com', 2),
    ('Josh', 'Brolin', 'jbrolin@example.com', 1),
    ('Kim', 'Dickens', 'kdickens@example.com', 0),
])

conn.commit()

cur.execute("SELECT * FROM employees")
print(cur.fetchall())

for row in conn.execute("SELECT * FROM employees WHERE years_with_company >= 1"):
    print(row["first_name"], "has worked for", row["years_with_company"], "years")

cur.execute(
    "UPDATE employees SET years_with_company = ? WHERE email = ?",
    (3, "kbacon@example.com")
)
print(f"Updated {cur.rowcount} rows")

cur.execute("DELETE FROM employees WHERE id = 2")
print(f"Deleted {cur.rowcount} rows")

cur.execute("SELECT * FROM employees")
print(f"Remaining rows: {len(cur.fetchall())}")

cur.close()
conn.close()
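We mentioned executescript above but never used it. Unlike execute, it accepts a single string containing multiple semicolon-separated statements. Here's a minimal sketch (the statements are only for illustration):
# executescript first commits any pending transaction, then runs the whole script
conn.executescript("""
UPDATE employees SET years_with_company = years_with_company + 1;
DELETE FROM employees WHERE years_with_company > 10;
""")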

Using User Defined Functions with SQLite

00:07:23

Lesson Description:

Now that we know the basic capabilities of the sqlite3 library from the standard library, we're in a good position to start digging into slightly more complicated features. In this lesson, we're going to look at how we can add custom functions to our SQL queries.

Documentation for This Video

The sqlite3 library
PEP 249
The Connection.create_function method
The Connection.create_aggregate method

Creating Custom Functions

SQLite doesn't have a way to define custom functions at the database level, but it does allow binding a function written in an outside language (like Python) to a function identifier that can be used in SQL statements. To do this with the sqlite3 module, we'll define a Python function and then use the Connection.create_function method to map that function to a name that we can use in SQL. Let's add a tenure function that takes the parameters name and years_with_company and returns a short string stating that person's tenure. It's a trivial example, but it will show us the various things that come into play when defining a custom function. ~/using_sqlite.py:

import sqlite3

def tenure(name, years_with_company):
    return f"{name} has worked for {years_with_company} years"

#conn = sqlite3.connect('employees.db')
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row

conn.create_function("tenure", 2, tenure)

# remaining code omitted
We can use this function when we're using a SELECT statement. Let's add another query to the end of the script that will use the first_name and years_with_company as the arguments to this function. ~/using_sqlite.py:
# previous code omitted

for row in conn.execute("SELECT tenure(first_name, years_with_company) FROM employees"):
    print(row[0])

cur.close()
conn.close()
By doing this, each row returned from this query will only contain a single item, a string returned from our custom function using the first_name and years_with_company values from each employee in the database. Running the script will return the following:
$ python3.8 using_sqlite.py
[<sqlite3.Row object at 0x7efe876499d0>, <sqlite3.Row object at 0x7efe8759d5b0>, <sqlite3.Row object at 0x7efe8759d5d0>]
Kevin has worked for 2 years
Josh has worked for 1 years
Updated 1 rows
Deleted 1 rows
Remaining rows: 2
Kevin has worked for 3 years
Kim has worked for 0 years
Creating Custom Aggregates

Custom functions execute on values from individual rows of a database query. But sometimes we want to calculate a result using all of the rows from a query (aggregation). To do this, we need to create a custom aggregate. Unlike a custom function, an aggregate needs to hold onto some state, so we must define a custom class. For this example, let's create a custom aggregate that sums up the total years of all of our employees and returns a statement at the end stating the total number of "man-years" that our employees have been with the company. Here's our ManYears class, and how we can add it for use in our SQL: ~/using_sqlite.py:
import sqlite3

def tenure(name, years_worked):
    return f"{name} has worked for {years_worked} years"

class ManYears:
    def __init__(self):
        self.total = 0

    def step(self, value):
        self.total += value

    def finalize(self):
        return f"Total man years: {self.total}"

#conn = sqlite3.connect('employees.db')
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row

conn.create_function("tenure", 2, tenure)
conn.create_aggregate("man_years", 1, ManYears)

# remaining code omitted.
Our aggregate class needs initial state set up in the __init__ method, a step method that each row's value(s) will be passed into, and a finalize method that returns the final aggregated result. The create_aggregate method works the same way as create_function, but we need to pass the class itself (not an instance of the class) as the third argument. Let's see this in action by using it in a SELECT statement at the bottom of our script: ~/using_sqlite.py:
# previous code omitted

for row in conn.execute("SELECT man_years(years_with_company) from employees"):
    print(row[0])

cur.close()
conn.close()
Finally, when we run this file, we'll see the following:
$ python3.8 using_sqlite.py
[<sqlite3.Row object at 0x7f65b0e5cad0>, <sqlite3.Row object at 0x7f65b0db15d0>, <sqlite3.Row object at 0x7f65b0db15f0>]
Kevin has worked for 2 years
Josh has worked for 1 years
Updated 1 rows
Deleted 1 rows
Remaining rows: 2
Kevin has worked for 3 years
Kim has worked for 0 years
Total man years: 3
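As a sanity check, for this data our man_years aggregate behaves just like SQLite's built-in SUM, so we could compare the two in a single query (a quick sketch using the same script state):
for row in conn.execute("SELECT man_years(years_with_company), SUM(years_with_company) FROM employees"):
    print(row[0])                   # Total man years: 3
    print("Built-in SUM:", row[1])  # 3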

CSV

Using the `csv` Module

00:10:51

Lesson Description:

CSV is a common data format that is useful for anyone using a spreadsheet application, and it is fairly common for us to need to create CSV files while programming. In this lesson, we'll learn how we can read and write CSV files using Python's csv module.

Documentation for This Video

Python's csv Module
csv.Sniffer

Reading a CSV Document

Python's standard library comes with a fairly full-featured csv module that allows us to perform the primary actions that we want when working with CSV:

- Parse CSV data from strings and files
- Write data to CSV from lists and dictionaries
- Determine the format of a CSV file using csv.Sniffer

To begin, let's download an example CSV file and create a file to start learning about the csv module:

$ curl -O https://raw.githubusercontent.com/linuxacademy/content-File-Processing-and-Environment-Communication-with-Python/master/employees.csv
$ touch using_csv.py
The most basic way that we can interact with this file is by using the csv.reader function and iterating over the results. Let's open the file, read it using csv.reader, and print each row: ~/using_csv.py
import csv

file_name = "employees.csv"

with open(file_name, 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
When we run this script, we'll see the following:
$ python3.8 using_csv.py
['id', 'first_name', 'last_name', 'age', 'street', 'zip']
['1', 'Ann', 'Steele', '40', 'Vofu Street', '54678']
['2', 'Hallie', 'Summers', '26', 'Bocdas Avenue', '29367']
...
['99', 'Rosalie', 'Beck', '64', 'Luve Terrace', '05262']
['100', 'Henrietta', 'Myers', '37', 'Tawe Loop', '03303']
Each row in the file is turned into a list, and all of the values are strings because the reader has no way to determine whether a column (like age) should be a different data type. We can also see the header row.

Determining the CSV Dialect Using csv.Sniffer

Although CSV stands for Comma-Separated Values, we can use a delimiter other than a comma to separate our values. Sometimes it's useful to determine the "dialect" of the file before really working with it, and we can do that using a csv.Sniffer object. The csv.Sniffer type provides a few useful methods:

- sniff - Returns a csv.Dialect object that describes the file in terms of the spacing, strictness, and the delimiter used
- has_header - Returns True if the first row of the CSV file is a header row

By using these two methods on the csv.Sniffer class, we can do a better job of ensuring that we're parsing the file correctly. Let's modify our code to create a csv.Sniffer, get the dialect of the CSV data, and skip printing the header row if there is one: ~/using_csv.py
import csv

file_name = "employees.csv"
sniffer = csv.Sniffer()

with open(file_name, 'r', newline='') as f:
    snippet = f.read(2048)
    f.seek(0)

    dialect = sniffer.sniff(snippet)
    print(f"Dialect: {dialect.__dict__}")

    reader = csv.reader(f, dialect)

    if sniffer.has_header(snippet):
        header_row = next(reader)

    for row in reader:
        print(row)
The csv.Sniffer methods all take strings or bytes objects as the argument. First, we read off a portion of the CSV file and then seek back to the beginning before passing the file to the csv.reader function. Next, we use sniffer.sniff to get the dialect based on the snippet that we took from the file. Once we have the dialect, we can pass that object to the csv.reader function as the second argument, and it will use the detected configuration values when parsing the file. Finally, if the sniffer determines that the snippet contains a header row, we force the reader object to skip its first row by calling the next function before we ever start iterating over it. Here's what we see when we run this script now:
$ python3.8 using_csv.py
Dialect: {'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n', 'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',', 'quotechar': '"', 'skipinitialspace': False}
['1', 'Ann', 'Steele', '40', 'Vofu Street', '54678']
['2', 'Hallie', 'Summers', '26', 'Bocdas Avenue', '29367']
...
['99', 'Rosalie', 'Beck', '64', 'Luve Terrace', '05262']
['100', 'Henrietta', 'Myers', '37', 'Tawe Loop', '03303']
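To see the sniffer adapt to a different delimiter without creating another file, here's a small sketch (the pipe-delimited sample string is made up):
import csv

sample = "id|first_name|last_name\n1|Ann|Steele\n2|Hallie|Summers\n"
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)  # '|'
print(list(csv.reader(sample.splitlines(), dialect)))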
As the sketch above shows, because of the sniffing we wouldn't need to change our code to parse this file if it had instead used the pipe character (|) as a delimiter.

Writing CSV

Now that we have a basic grasp of how to read information from a CSV file (row by row), we're ready to learn how to write rows to one. To write, we'll need to open the file with write capabilities. But remember that using the w mode with the open function will truncate the file. Use r+ if you're planning on adding to an existing file. Let's add an additional row or two to our existing file: ~/using_csv.py
import csv

file_name = "employees.csv"
sniffer = csv.Sniffer()

with open(file_name, 'r+', newline='') as f:
    snippet = f.read(2048)
    f.seek(0)

    dialect = sniffer.sniff(snippet)
    print(f"Dialect: {dialect.__dict__}")

    reader = csv.reader(f, dialect)

    if sniffer.has_header(snippet):
        header_row = next(reader)

    for row in reader:
        last_id = int(row[0])
        print(row)

    writer = csv.writer(f, dialect)

    # writer.writerow(
    #     [last_id + 1, 'Kevin', 'Bacon', 61, 'Mulberry St', 90210]
    # )

    writer.writerows([
        [last_id + 1, 'Kevin', 'Bacon', 61, 'Mulberry St', 90210],
        [last_id + 2, 'Kevin', 'Bacon', 61, 'Mulberry St', 90210],
        [last_id + 3, 'Kevin', 'Bacon', 61, 'Mulberry St', 90210],
    ])
We didn't have to change that many things to make this work, but we did need to change the mode of the CSV file that we opened to r+ so that we could read and write. After we've read through all of the rows of the file using the reader, the cursor is at the end of the file, ready to start writing new rows. To do this, we need to first create a writer object using the csv.writer function. The csv.writer object is very simple and only has a few methods available to it, most notably the writerow and writerows methods. Both of these methods do the same thing, except writerows writes multiple rows at one time. If we run this script again, we'll add the additional rows to the file, though they won't be printed to the screen.
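The writer functions also accept the usual format parameters directly, so producing a file with a different delimiter only takes a keyword argument. A minimal sketch (the tab_employees.tsv file name is hypothetical):
import csv

rows = [
    ['id', 'first_name', 'last_name'],
    [1, 'Ann', 'Steele'],
]

with open('tab_employees.tsv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')  # tab-separated instead of comma-separated
    writer.writerows(rows)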

Mapping CSV Rows to Dictionaries

00:07:40

Lesson Description:

Reading data from a CSV file is generally useful, but lists of lists are usually not the ideal way to work with our data. Thankfully, the csv module provides an easy way to read rows into dictionaries, giving us a better experience.

Documentation for This Video

Python's csv Module
csv.DictReader
csv.DictWriter

Reading Rows into Dictionaries

We're going to continue working with the employees.csv file that we used in the previous lessons, but let's create a separate script, dictionary_csv.py, to showcase how we can get dictionaries from a CSV file. To begin, let's simply iterate over the rows to see exactly what we get back from the file using the csv.DictReader class in the simplest way: ~/dictionary_csv.py

import csv

file_name = "employees.csv"

with open(file_name, newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
When we run the script we'll see the following:
$ python3.8 dictionary_csv.py
{'id': '1', 'first_name': 'Ann', 'last_name': 'Steele', 'age': '40', 'street': 'Vofu Street', 'zip': '54678'}
{'id': '2', 'first_name': 'Hallie', 'last_name': 'Summers', 'age': '26', 'street': 'Bocdas Avenue', 'zip': '29367'}
...
{'id': '98', 'first_name': 'Estella', 'last_name': 'Flowers', 'age': '54', 'street': 'Aztus Place', 'zip': '04186'}
{'id': '99', 'first_name': 'Rosalie', 'last_name': 'Beck', 'age': '64', 'street': 'Luve Terrace', 'zip': '05262'}
{'id': '100', 'first_name': 'Henrietta', 'last_name': 'Myers', 'age': '37', 'street': 'Tawe Loop', 'zip': '03303'}
Notice that the header row from the employees.csv file is not printed or returned as a row. Instead, those headers are automatically used as the keys in the returned dictionaries. In Python 3.8 or newer, the rows are of the dict type, but in older versions of Python the type will be slightly different (though it still behaves like a dictionary). Now, when going through this file, we can access the specific keys that we want.

Using Dictionaries to Add Rows to CSV

Using dictionaries to add rows to a CSV file is a little more involved than reading the data from the file. To achieve this, we'll be using the csv.DictWriter class. One thing that we definitely need is a list of field names. If we don't already know these off-hand and we're working with an existing file, then we'll want to store the header row. Let's add a few more rows to the existing file before creating a brand-new CSV file: ~/dictionary_csv.py
import csv

file_name = "employees.csv"

with open(file_name, 'r+', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)

    writer = csv.DictWriter(f, fieldnames=reader.fieldnames)
    writer.writerows([
        {
            'id': 120,
            'first_name': 'Keith',
            'last_name': 'Thompson',
            'street': 'Berry St',
            'zip': '11000'
        },
        {
            'id': 121,
            'first_name': 'Larry',
            'last_name': 'Frittz',
            'street': 'Washington St',
            'zip': '00011'
        }
    ])
Notice that we're using the same writerows method that we used on the standard writer type, but we're passing in dictionaries. The fieldnames parameter takes a list of strings that match the header row from the CSV file itself. Notice too that the order of the keys in our dictionaries doesn't matter; we just need the keys to match up with the fieldnames. We're leaving out the age key to see what happens when a key is missing. It's worth noting that, by default, errors will occur when there are any extra keys on the dictionaries that we're trying to write to the file (see the note after the output below). When we run this script and then look at the employees.csv file, we should see the following:
$ python3.8 dictionary_csv.py
...
$ cat employees.csv
id,first_name,last_name,age,street,zip
1,Ann,Steele,40,Vofu Street,54678
2,Hallie,Summers,26,Bocdas Avenue,29367
3,Allen,Davidson,48,Ribon View,41634
...
100,Henrietta,Myers,37,Tawe Loop,03303
120,Keith,Thompson,,Berry St,11000
121,Larry,Frittz,,Washington St,00011
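Related to the missing age values above: csv.DictWriter also accepts restval (the value written for any missing key, an empty string by default) and extrasaction (set it to "ignore" to silently drop extra keys instead of raising a ValueError). A quick sketch, assuming the same open file and reader as in our script (the nickname key is made up):
writer = csv.DictWriter(
    f,
    fieldnames=reader.fieldnames,
    restval='N/A',           # written in place of the missing age key below
    extrasaction='ignore',   # drop unknown keys instead of raising ValueError
)
writer.writerow({'id': 122, 'first_name': 'Ada', 'last_name': 'Lovelace',
                 'street': 'Main St', 'zip': '12345', 'nickname': 'Countess'})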
Creating a New CSV File Using DictWriter

To see what it's like to create a new file using a DictWriter, let's create a new script called new_csv.py. The important things that we'll need are:

- A file name to write to
- A list of field names
- Dictionaries to add to the file as rows

Let's create a simple CSV file called states.csv that has two fields:

- name - The name of the state
- population - The population of the state

Here's our final script that will fully generate a CSV file using dictionaries: ~/new_csv.py
import csv

states = [
    { "name": "California", "population": "39,512,223" },
    { "name": "Texas", "population": "28,995,881" },
    { "name": "Florida", "population": "21,477,737" },
    { "name": "New York", "population": "19,453,561" },
    { "name": "Pennsylvania", "population": "12,801,989" },
    { "name": "Illinois", "population": "12,671,821" },
    { "name": "Ohio", "population": "11,689,100" },
    { "name": "Georgia", "population": "10,617,423" },
    { "name": "North Carolina", "population": "10,488,084" },
    { "name": "Michigan", "population": "9,986,857" },
    { "name": "New Jersey", "population": "8,882,190" },
    { "name": "Virginia", "population": "8,535,519" },
    { "name": "Washington", "population": "7,614,893" },
    { "name": "Arizona", "population": "7,278,717" },
    { "name": "Massachusetts", "population": "6,949,503" },
    { "name": "Tennessee", "population": "6,833,174" },
    { "name": "Indiana", "population": "6,732,219" },
    { "name": "Missouri", "population": "6,137,428" },
    { "name": "Maryland", "population": "6,045,680" },
    { "name": "Wisconsin", "population": "5,822,434" },
    { "name": "Colorado", "population": "5,758,736" },
    { "name": "Minnesota", "population": "5,639,632" },
    { "name": "South Carolina", "population": "5,148,714" },
    { "name": "Alabama", "population": "4,903,185" },
    { "name": "Louisiana", "population": "4,648,794" },
    { "name": "Kentucky", "population": "4,467,673" },
    { "name": "Oregon", "population": "4,217,737" },
    { "name": "Oklahoma", "population": "3,956,971" },
    { "name": "Connecticut", "population": "3,565,287" },
    { "name": "Utah", "population": "3,205,958" },
    { "name": "Iowa", "population": "3,155,070" },
    { "name": "Puerto Rico", "population": "3,193,694" },
    { "name": "Nevada", "population": "3,080,156" },
    { "name": "Arkansas", "population": "3,017,825" },
    { "name": "Mississippi", "population": "2,976,149" },
    { "name": "Kansas", "population": "2,913,314" },
    { "name": "New Mexico", "population": "2,096,829" },
    { "name": "Nebraska", "population": "1,934,408" },
    { "name": "Idaho", "population": "1,787,065" },
    { "name": "West Virginia", "population": "1,792,147" },
    { "name": "Hawaii", "population": "1,415,872" },
    { "name": "New Hampshire", "population": "1,359,711" },
    { "name": "Maine", "population": "1,344,212" },
    { "name": "Montana", "population": "1,068,778" },
    { "name": "Rhode Island", "population": "1,059,361" },
    { "name": "Delaware", "population": "973,764" },
    { "name": "South Dakota", "population": "884,659" },
    { "name": "North Dakota", "population": "762,062" },
    { "name": "Alaska", "population": "731,545" },
    { "name": "District of Columbia", "population": "705,749" },
    { "name": "Vermont", "population": "623,989" },
    { "name": "Wyoming", "population": "578,759" },
    { "name": "Guam", "population": "165,718" },
    { "name": "U.S. Virgin Islands", "population": "104,914" },
    { "name": "American Samoa", "population": "55,641" },
    { "name": "Northern Mariana Islands", "population": "55,194" },
]

file_name = "states.csv"

with open(file_name, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=["name", "population"])
    writer.writeheader()
    writer.writerows(states)
The fieldnames parameter is required for csv.DictWriter, so we need to provide it along with our file object when we create our writer object. Next, we write our headers using the writeheader method, which simply creates the header row from the fieldnames values. Finally, we pass our large list of dictionaries to the writerows method. Let's run our script and look at the final CSV file:
$ python3.8 new_csv.py
$ cat states.csv
name,population
California,"39,512,223"
Texas,"28,995,881"
Florida,"21,477,737"
...
Wyoming,"578,759"
Guam,"165,718"
U.S. Virgin Islands,"104,914"
American Samoa,"55,641"
Northern Mariana Islands,"55,194"
Notice that the writer automatically quoted the population values since they contain commas. Without the quotation marks, a single value would be split into multiple fields.
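The quoting behavior is configurable too, via the quoting parameter and the csv.QUOTE_* constants. For example, a writer built with csv.QUOTE_ALL quotes every field, not just those containing the delimiter (a small sketch reusing our states data):
writer = csv.DictWriter(f, fieldnames=["name", "population"], quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerows(states)  # every field, including the names, will be wrapped in quotes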

XML

An Overview of Python's XML Submodules

00:02:12

Lesson Description:

XML is a very flexible markup language that is used by many APIs and applications. While it is not necessarily the most popular choice for data transport (JSON has taken that crown), it is still likely that a programmer will need to interface with an application that produces and consumes XML. In this lesson, we'll take a look at the various XML modules that exist in Python's standard library before landing on the one that we'll use to work with XML ourselves.

Documentation For This Video

Python XML Documentation
defusedxml Documentation

Python's Various XML Modules

The Python standard library comes with quite a few different XML modules, and many of them have security vulnerabilities when parsing untrusted input. Python's documentation recommends a third-party package, defusedxml, that wraps the various modules while patching those vulnerabilities. Throughout this section of the course, we'll be using the defusedxml library while relying mostly on the official Python documentation for the original XML modules.

Parsing XML Documents

00:15:51

Lesson Description:

In the event that we receive an XML document that we need to extract data from, we're going to need to parse the XML. In this lesson, we'll learn to use the ElementTree module (through the defusedxml module) to parse XML.

Documentation For This Video

Python XML Documentation
defusedxml Documentation
xml.etree.ElementTree Documentation
ElementTree Documentation
Element Documentation

Installing defusedxml

Before we begin working with XML, we'll want to install defusedxml so we can benefit from the security patches in that library while still using the APIs from the standard library modules. Let's install the library now:

$ pip3.8 install defusedxml
Parsing an XML Document

To get started, we're going to create a directory with a file within it, and download the example XML file from this course's Git repository. Run the following commands to download the sample and create a file to start parsing XML:
$ mkdir learning_xml
$ cd learning_xml
$ touch parsing_xml.py
$ curl -O https://raw.githubusercontent.com/linuxacademy/content-File-Processing-and-Environment-Communication-with-Python/master/products.xml
We're going to be using the ElementTree XML module through defusedxml, so we can parse our XML document into a tree structure. For the most part, we'll be able to use the official Python documentation for this library, even though we'll be accessing it through defusedxml. Let's get started by opening the file and parsing the entire file into an ElementTree object: ~/parsing_xml.py
import sys
import defusedxml.ElementTree as ET

file_name = "products.xml"

try:
    tree = ET.parse(file_name)
except FileNotFoundError:
    print(f"File not found: {file_name}")
    sys.exit(1)
We now have a tree object of type xml.etree.ElementTree.ElementTree, which represents the entire document. From here, we can search through the XML using some useful methods on the top-level tree, or we can access the root of the tree, which is an Element object. With these two types of objects, we're able to access the data that we've parsed from the XML document.

Searching for Elements

When it comes to searching for elements, we have a few different options. The ElementTree and Element types share a lot of the same methods because ElementTree delegates them to the "root" element of the document. Let's search for all of the product tags. We'll eventually print out their names and price values with slight modifications, depending on the currency type:
import sys
import defusedxml.ElementTree as ET

file_name = "products.xml"

try:
    tree = ET.parse(file_name)
except FileNotFoundError:
    print(f"File not found: {file_name}")
    sys.exit(1)

for product in tree.findall("product"):
    print(product)
Running this file will print something like this:
$ python3.8 parsing_xml.py
<Element 'product' at 0x100a04ae0>
<Element 'product' at 0x100a250e0>
<Element 'product' at 0x100a25220>
<Element 'product' at 0x100a25360>
<Element 'product' at 0x100a254a0>
There are many different ways that we could have achieved this, and they all work a little differently. It's important to know that the find, findall, and findtext methods on the Element type (and also on the ElementTree type) will only find matching elements that are direct children. Let's launch our REPL and play with all of the search methods before continuing with our script. If we want to search directly off of the root element, we can get it using the getroot method on the ElementTree class:
$ python3.8
>>> import defusedxml.ElementTree as ET
>>> tree = ET.parse('products.xml')
>>> root = tree.getroot()
Now we're able to search using the root object. Let's see how find, findall, and findtext work:
>>> root.find('product')
<Element 'product' at 0x10e6b72c0>
>>> root.findall('product')
[<Element 'product' at 0x10e6b72c0>, <Element 'product' at 0x10e
6b7810>, <Element 'product' at 0x10e6b7c20>, <Element 'product'
at 0x10e6b7d60>, <Element 'product' at 0x10e7222c0>]
>>> root.findtext('product')
'\n    '
With these searches, we're simply looking for elements with the product tag. What if we instead try to find the price elements that are within the child product elements:
>>> root.findall('price')
[]
This won't work because the price elements aren't direct children of the products tag that our root variable represents. We can still access them from the root element by using XPath notation to dig a little deeper into the tree. Without going too deeply into XPath: a filesystem is also a tree structure, and specifying a relative or absolute file path looks a lot like an XPath search for a nested element. Here's an example where we access the price elements from within the products:
>>> root.find('product/price')
<Element 'price' at 0x10e6b70e0>
>>> root.findall('product/price')
[<Element 'price' at 0x10e6b70e0>, <Element 'price' at 0x10e6b77
20>, <Element 'price' at 0x10e6b7e50>, <Element 'price' at 0x10e
6b7db0>, <Element 'price' at 0x10e7229f0>]
>>> root.findtext('product/price')
'15.00'
A more complicated example would be to only find the price elements where the currency attribute is EUR:
>>> root.findall('./product/price[@currency="EUR"]')
[<Element 'price' at 0x10e7229f0>]
The XPath support provided by the ElementTree module is not complete, but it can still be incredibly powerful. The other way that we can search is by using the iter family of methods (iter, iterfind, itertext). The big difference is that these methods return an iterator object instead of a list. Additionally, iter finds elements by tag name recursively, meaning the tag doesn't need to be a direct child of the current element. Let's use the iter and iterfind methods to see how we would get these price elements:
>>> root.iter('price')
<_elementtree._element_iterator object at 0x10e87c900>
>>> list(root.iter('price'))
[<Element 'price' at 0x10e6b70e0>, <Element 'price' at 0x10e6b77
20>, <Element 'price' at 0x10e6b7e50>, <Element 'price' at 0x10e
6b7db0>, <Element 'price' at 0x10e7229f0>]
>>> root.iterfind('./product/price[@currency="EUR"]')
<generator object prepare_predicate.<locals>.select at 0x10e86f5
10>
>>> list(root.iterfind('./product/price[@currency="EUR"]'))
[<Element 'price' at 0x10e7229f0>]
The last thing to know about the Element type is that it's iterable, and converting an element to a list gives us a list of the element's children.
>>> list(root)
[<Element 'product' at 0x10e6b72c0>, <Element 'product' at 0x10e
6b7810>, <Element 'product' at 0x10e6b7c20>, <Element 'product'
at 0x10e6b7d60>, <Element 'product' at 0x10e7222c0>]
Accessing Element Information

Now that we know how to access elements, let's finish our script by accessing the text from our products' child nodes and printing a slightly different message based on the currency. To do this, we'll use the findtext and find methods to get the child elements, and the get method to access the currency attribute on the price elements: ~/parsing_xml.py
# previous code omitted

for product in tree.findall("product"):
    name = product.findtext("name")
    description = product.findtext("description")
    price = product.find("price")
    currency = price.get("currency") or "USD"

    if currency == "EUR":
        price_text = f"{price.text.replace('.', ',')} \u20ac"
    else:
        price_text = f"${price.text}"

    message = f"""
    {name} - {price_text}
    {description}"""

    print(message)
Here we're doing a few things. We're accessing the name and description values directly by using the findtext method. Displaying the price is a little more complicated. For the price we're getting the entire price element, using the get method to access the currency attribute on it, and setting a default of USD if there isn't one. We create a price_text variable that has the price formatted properly, based on the currency. Notice that we get the text off of the price element by accessing the object's text property. Finally, we create a multi-line format string so that we can present the products properly on the screen. Here's what it looks like when we run this script:
$ python3.8 parsing_xml.py
    Lamp - $15.00
    Light your desk with style with this desk lamp.

    Microphone - $79.00
    Take your remote meeting audio to the next level with this awesome USB microphone.

    Webcam - $39.00
    Show your face with extreme clarity in your next video meeting using this excellent webcam.

    Standing Desk - $1200.00
    Improve your health even with a desk job using this adjustable sitting/standing desk.

    Desk Chair - 350,00 €
    Sit in comfort with this ergonomic desk chair.

Creating an XML Document

00:14:21

Lesson Description:

We've seen how to parse an XML document and work with the ElementTree and Element types; now we're ready to use those same types to create new XML documents.

Documentation For This Video

Python XML Documentation
defusedxml Documentation
xml.etree.ElementTree Documentation
ElementTree Documentation
Element Documentation
xml.etree.ElementTree.SubElement Documentation
xml.dom.minidom Documentation

Building an XML Tree Structure

To create a new XML document with Python, we will primarily work with the Element class from our XML library. Since we're building a tree structure, we're going to have elements with child and parent elements, up to the root element that has no parent. We're going to go back to our employee example, where we have name, age, and years_with_company values. XML is extensible, so we can create tags with names that directly match these values while wrapping them in an employee tag. We can then create a parent tag, called employees or staff, that contains all of our employee tags. To start creating our first XML document, let's create a new script called creating_xml.py:

$ touch creating_xml.py
Since we'll be creating multiple employee tags, let's create a function to build this element based on some parameters. The security vulnerabilities mentioned earlier exist around parsing XML that is passed into our applications; because we're not parsing any XML here, we can work directly with the xml.etree.ElementTree module: ~/creating_xml.py
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, ElementTree

def create_employee(name, age, years_with_company=0):
    employee = Element('employee')
    return employee

ET.dump(create_employee('Kevin', 61, 3))
We're only creating a single element, but let's see what is returned by just these few lines:
$ python3.8 creating_xml.py
<employee />
Because our element doesn't have any children (or "sub-elements"), it is written out as a self-closing tag. Now we're ready to start creating our sub-elements. Let's add name as an attribute on the employee tag itself. To do this, we'll use the set method to set the attribute and the ET.SubElement function to add a sub-element to our existing employee element. Note: The module name for ElementTree can be a little confusing, which is why we import the top-level module under the separate name ET. ~/creating_xml.py
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, ElementTree

def create_employee(name, age, years_with_company=0):
    employee = Element('employee')
    employee.set('name', name)

    age_tag = ET.SubElement(employee, 'age')
    age_tag.text = age

    return employee

ET.dump(create_employee('Kevin', 61, 3))
We're setting the text property on our age_tag because there isn't a way to set it when creating the element itself. ET.SubElement creates an age element for us, adds it as a child of employee, and returns the element to us. Let's run the script again to see what happens:
$ python3.8 creating_xml.py
<employee name="Kevin"><age>Traceback (most recent call last):
  File "/home/cloud_user/.pyenv/versions/3.8.2/lib/python3.8/xml/etree/ElementTree.py", line 1063, in _escape_cdata
    if "&" in text:
TypeError: argument of type 'int' is not iterable
...
$
This issue occurs because we passed an integer (the age value) as the text of an element. We first need to convert all of the values that we want to write into our XML structure to strings. Let's make this modification and run the script again. We'll also add the years_with_company sub-element using the same approach that we used with the age: ~/creating_xml.py
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, ElementTree

def create_employee(name, age, years_with_company=0):
    employee = Element("employee")
    employee.set("name", name)

    age_tag = ET.SubElement(employee, "age")
    age_tag.text = str(age)

    years_tag = ET.SubElement(employee, "years_with_company")
    years_tag.text = str(years_with_company)

    return employee

ET.dump(create_employee("Kevin", 61, 3))
$ python3.8 creating_xml.py
<employee name="Kevin"><age>61</age><years_with_company>3</years_with_company></employee>
We've now laid out a way to create our most complicated element, and we're in a good position to structure the entire document.

Creating an XML Document Using ElementTree

We can create our employee elements using data for an individual employee, and to finish this document we'll need to do the following:

- Create an employees element to be the root of our document.
- Create an employee tag for each employee that we have, and add those tags under the employees tag.
- Create an ElementTree object to hold onto our root element and handle the settings for the overall XML document.
- Write the final XML structure to a file using our ElementTree object.

To start, let's add some data to work with by creating a list of employee dictionaries. Then we'll create our root element, iterate through the employees while building new employee tags, and add them to the root: ~/creating_xml.py
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, ElementTree

def create_employee(name, age, years_with_company=0):
    employee = Element("employee")
    employee.set("name", name)

    age_tag = ET.SubElement(employee, "age")
    age_tag.text = str(age)

    years_tag = ET.SubElement(employee, "years_with_company")
    years_tag.text = str(years_with_company)

    return employee

employees = [
    {"name": "Kevin Bacon", "age": 61, "years_with_company": 3},
    {"name": "Josh Broline", "age": 52, "years_with_company": 1},
    {"name": "Kim Dickens", "age": 54},
]

root = Element("employees")

for employee in employees:
    employee_tag = create_employee(**employee)
    root.append(employee_tag)

ET.dump(root)
Because we're creating the employee tags using the create_employee function, we can't use ET.SubElement to add them to our root. Thankfully, we can add an element that already exists as a sub-element using the Element.append method. We'll use this to add each employee_tag to the root object. Our dictionary keys map perfectly to the parameter names of the create_employee function, so we're unpacking each dictionary as keyword arguments instead of manually passing them in. Let's run the script to see if it worked:
$ python3.8 creating_xml.py
<employees><employee name="Kevin Bacon"><age>61</age><years_with_company>3</years_with_company></employee><employee name="Josh Broline"><age>52</age><years_with_company>1</years_with_company></employee><employee name="Kim Dickens"><age>54</age><years_with_company>0</years_with_company></employee></employees>
It's a little hard to read, since it isn't pretty-printed, but we have multiple employee tags within the employees tag. We're ready to create our ElementTree object, set the root element, and write a complete XML document to a file. ~/creating_xml.py
# previous code omitted

tree = ElementTree(element=root)
tree.write("output.xml", encoding="UTF-8", xml_declaration=True)
When creating our ElementTree, we're able to set the element, which becomes the root. Next, we use the write method to output the tree to a file. The optional parameters of the write method allow us to specify other pieces of information that might be useful when writing out the content. In our case, we want the encoding to show as UTF-8 for whoever consumes this XML. We also want the <?xml version='1.0' ... ?> declaration to be present at the top of the file, regardless of the encoding that is set. Here's what we see if we run our script and read out the file:
$ python3.8 creating_xml.py && cat output.xml
<?xml version='1.0' encoding='UTF-8'?>
<employees><employee name="Kevin Bacon"><age>61</age><years_with_company>3</years_with_company></employee><employee name="Josh Broline"><age>52</age><years_with_company>1</years_with_company></employee><employee name="Kim Dickens"><age>54</age><years_with_company>0</years_with_company></employee></employees>
The XML content is not pretty-printed, so it is a little hard to read, but we've successfully created a valid XML file.

Pretty-Printing XML

Sometimes we want the XML files that we create to also be human-readable. In those cases, we'll have to rely on more than just the ElementTree module. Thankfully, the standard library provides a different module that can pretty-print: xml.dom.minidom. To create pretty-printed XML, we'll create a string from our root element using ET.tostring, parse that using xml.dom.minidom.parseString, and finally call toprettyxml on the Node returned from parseString. Let's add this to our script now: ~/creating_xml.py
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, ElementTree

from xml.dom.minidom import parseString

# previous code omitted

xml_string = ET.tostring(root)
dom = parseString(xml_string)

pretty_xml = dom.toprettyxml(encoding="UTF-8")
with open("pretty.xml", "wb") as f:
    f.write(pretty_xml)
Because we passed an encoding, the toprettyxml function will return a bytes object, and we'll write that to a file that we opened using the wb mode. Running this one last time and printing out the pretty.xml file will yield this:
$ python3.8 creating_xml.py && cat pretty.xml
<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee name="Kevin Bacon">
        <age>61</age>
        <years_with_company>3</years_with_company>
    </employee>
    <employee name="Josh Broline">
        <age>52</age>
        <years_with_company>1</years_with_company>
    </employee>
    <employee name="Kim Dickens">
        <age>54</age>
        <years_with_company>0</years_with_company>
    </employee>
</employees>
We've seen how we can easily create and manipulate XML elements, print them to a file, and even pretty-print them by using more than one XML module from the standard library.
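As a quick sanity check, we can parse the file we just wrote back in with ET.parse. This is a small sketch, assuming output.xml exists from the run above:

import xml.etree.ElementTree as ET

tree = ET.parse("output.xml")  # parse the document we wrote to disk
root = tree.getroot()  # the <employees> element

print(root.tag)  # employees
print(len(root.findall("employee")))  # 3
for employee in root.findall("employee"):
    print(employee.get("name"), employee.find("age").text)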

Logging

Using the `logging` Module

00:06:59

Lesson Description:

As our programs are running, we will often want to log messages to ourselves so we have some insight into what our applications are doing. In this lesson, we'll learn how we can use the logging library to write logs from our scripts and applications.

Documentation for This Video: logging Module Documentation, logging.basicConfig

Using the logging Module

To get started with logging, let's create a script called using_logging.py and import the logging module. For the most part, we'll be using the logging module itself to log messages, and apply configurations directly to the module. For now, we'll have our script simply load the module and print messages using the various log levels. Below are the log levels that we can use, and a brief bit about when we would use them:

DEBUG - Most detailed messages, used for diagnostics
INFO - Messages that are not related to a problem, giving a little insight into what is going on at a specific time
WARNING - Software is still working, but something unexpected happened
ERROR - An actual problem occurred and prevented the application from doing what it was trying to do
CRITICAL - Most serious error; this is more or less something we would use right before our program crashes

Here's how we would use each of these methods: ~/using_logging.py

import logging

logging.debug("Debug message")
logging.info("Info message")
logging.warning("Warning message")
logging.error("Error message")
logging.critical("Critical message")

print("Ran entire script")
When we run this script, we'll see the following:
$ python3.8 using_logging.py
WARNING:root:Warning message
ERROR:root:Error message
CRITICAL:root:Critical message
Ran entire script
By default, INFO and DEBUG messages aren't going to be printed. To adjust this, we'll need to learn how to configure the logging module.

Configuring the logging Module

To configure our logger, we'll use the logging.basicConfig function. This function takes quite a few different options, but the important ones for us to use in this lesson are filename, filemode, and level. It's important that we remember to configure the module pretty early in our application's execution so that we get the expected behavior every time we try to log. Let's configure our logger to log to the example.log file at the DEBUG level, using the w file mode to rewrite the file each time we run the script. By default, the file mode is a, ensuring new log lines are added to the file without replacing old lines. This is usually the behavior we want, but it is good to know that we can adjust this: ~/using_logging.py
import logging

logging.basicConfig(filemode="w", filename="example.log", level=logging.DEBUG)

logging.debug("Debug message")
logging.info("Info message")
logging.warning("Warning message")
logging.error("Error message")
logging.critical("Critical message")

print("Ran entire script")
To set the log level, we'll use the constant logging.DEBUG. Log levels all have a numeric value, and these constants give us easy access to those numeric values (we'll print them in a quick sketch at the end of these notes). By setting the log level to DEBUG, we will print all of the log level messages: any message whose level is equal to or higher than our set level will be printed. Let's run the script again to see what happens:
$ python3.8 using_logging.py
Ran entire script
$ cat example.log
DEBUG:root:Debug message
INFO:root:Info message
WARNING:root:Warning message
ERROR:root:Error message
CRITICAL:root:Critical message
We're successfully logging all of the messages to a file.
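If you're curious about the numeric values behind these constants, here's a quick sketch that prints them out (the values come straight from the logging module):

import logging

# A message is emitted when its level is greater than or equal to the
# logger's configured level.
for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(name, getattr(logging, name))

# DEBUG 10
# INFO 20
# WARNING 30
# ERROR 40
# CRITICAL 50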

Defining Log Formatters

00:04:38

Lesson Description:

Now that we know how to do basic logging, we should work on improving how we log by customizing the formatting. In this lesson, we'll do this by creating custom format strings and formatters.

Documentation for This Video: logging Module Documentation, logging.basicConfig, LogRecord Attributes

Customizing Using a Format String

The simplest way for us to customize the messages that we log is by setting the format argument when we use the logging.basicConfig function. When it comes to formatting, there are a lot of attributes that we have access to, and we'll probably want to sprinkle some of them throughout our logging format. These are detailed in the LogRecord attributes portion of the documentation. For our first custom log format string, let's include the time using asctime, levelname, filename, and message. Let's build a format string from these attributes and pass it to basicConfig: ~/using_logging.py

import logging

format_string = "%(asctime)s [%(levelname)s] - %(filename)s : %(message)s"

logging.basicConfig(
    filemode="w", filename="example.log", format=format_string, level=logging.DEBUG
)

logging.debug("Debug message %s %s", "a", 1)
logging.info("Info message")
logging.warning("Warning message")
logging.error("Error message")
logging.critical("Critical message")

print("Ran entire script")
For each of the attributes that we need to print, we'll be placing them between %(...)s. Another thing that we're demonstrating here is how we can use arguments when we're making a logging call. Any positional argument that is passed in after the initial message needs to be interpolated into the initial message, or we'll receive an error. Let's run this script again and see what happens:
$ python3.8 using_logging.py && cat example.log
2020-04-27 10:30:14,666 [DEBUG] - using_logging.py : Debug message a 1
2020-04-27 10:30:14,666 [INFO] - using_logging.py : Info message
2020-04-27 10:30:14,666 [WARNING] - using_logging.py : Warning message
2020-04-27 10:30:14,666 [ERROR] - using_logging.py : Error message
2020-04-27 10:30:14,666 [CRITICAL] - using_logging.py : Critical message
Creating a Formatter Object

In addition to setting the format string on the overall logging module, we're able to create separate logging.Formatter objects. These won't be useful quite yet, but in the next lesson, we'll learn how to create multiple logging handlers and loggers, where formatter objects are very useful.

Using Multiple Log Handlers

00:06:21

Lesson Description:

We have a basic understanding of how to configure the overall logging module, and how to format the messages that we're logging out. In this lesson, we'll learn how to take that further by logging to multiple destinations, potentially using multiple loggers and logging formats.

Documentation for This Video: logging Module Documentation, logging.basicConfig, logging.Formatter, Useful Handlers, logging.FileHandler, logging.StreamHandler

Creating Multiple Loggers and Handlers

To configure our logging to stream to STDOUT and a file, we'll need to configure multiple logging handlers. The main handler types that we'll use will be FileHandler and StreamHandler, but there are other useful handlers. Let's adjust our script to modify the default logger, the root logger, by adding a StreamHandler for STDOUT, while using a FileHandler to log to our file: ~/using_logging.py

import logging
import sys

format_string = "%(asctime)s [%(levelname)s] - %(filename)s : %(message)s"

# logging.basicConfig(
#     filemode="w", filename="example.log", format=format_string, level=logging.DEBUG
# )

logger = logging.getLogger()  # get the `root` logger

file_handler = logging.FileHandler(filename="example.log", mode="w")
stdout_handler = logging.StreamHandler(sys.stdout)

logger.addHandler(file_handler)
logger.addHandler(stdout_handler)
logger.setLevel(logging.DEBUG)

logging.debug("Debug message %s %s", "a", 1)
logging.info("Info message")
logging.warning("Warning message")
logging.error("Error message")
logging.critical("Critical message")

print("Ran entire script")
The logging.getLogger function will return the various loggers that we might have based on their name. If we don't specify the name, we'll get the default logger which is the root logger. By getting the logger object itself, we can do some more complicated configurations, such as adding additional handlers, but the root logger is still the logger being used when we call the functions directly off of the logging module. After getting the logger, we create a logging.FileHandler that writes to the example.log file with the w mode (just like we were before). We also create a logging.StreamHandler that writes to STDOUT by passing in sys.stdout as the stream. To add these handlers to a logger, we use the addHandler method on the Logger class. Let's run our script again and see what happens:
$ python3.8 using_logging.py && cat example.log
Debug message a 1
Info message
Warning message
Error message
Critical message
Ran entire script
Debug message a 1
Info message
Warning message
Error message
Critical message
We're seeing the output on STDOUT, including the print function's output, but we're also writing to the example.log file.

Using a Formatter Object

In addition to customizing our loggers with handlers, we can also create Formatter objects and use the logging.Handler.setFormatter method. A logger can have multiple handlers, and each handler gets to set its own formatter. This means that we can format messages slightly differently when logging to one location over another. Let's set a custom format string for our FileHandler, but not our StreamHandler. ~/using_logging.py
import logging
import sys

format_string = "%(asctime)s [%(levelname)s] - %(filename)s : %(message)s"

# logging.basicConfig(
#     filemode="w", filename="example.log", format=format_string, level=logging.DEBUG
# )

logger = logging.getLogger()  # get the `root` logger

formatter = logging.Formatter(format_string)

file_handler = logging.FileHandler(filename="example.log", mode="w")
file_handler.setFormatter(formatter)

stdout_handler = logging.StreamHandler(sys.stdout)

logger.addHandler(file_handler)
logger.addHandler(stdout_handler)
logger.setLevel(logging.DEBUG)

logging.debug("Debug message %s %s", "a", 1)
logging.info("Info message")
logging.warning("Warning message")
logging.error("Error message")
logging.critical("Critical message")

print("Ran entire script")
Now we should be able to see a big difference between STDOUT and the formatted logs that we're writing to the file:
$ python3.8 using_logging.py && cat example.log
Debug message a 1
Info message
Warning message
Error message
Critical message
Ran entire script
2020-04-27 13:55:08,519 [DEBUG] - using_logging.py : Debug message a 1
2020-04-27 13:55:08,519 [INFO] - using_logging.py : Info message
2020-04-27 13:55:08,519 [WARNING] - using_logging.py : Warning message
2020-04-27 13:55:08,519 [ERROR] - using_logging.py : Error message
2020-04-27 13:55:08,519 [CRITICAL] - using_logging.py : Critical message
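Since logging.getLogger can also return named loggers, here's a minimal sketch (the myapp name is just an example) showing that a named logger sends its records through the root logger's handlers by default:

import logging
import sys

root = logging.getLogger()
root.addHandler(logging.StreamHandler(sys.stdout))
root.setLevel(logging.DEBUG)

# Named loggers form a dot-separated hierarchy and, by default, propagate
# their records up to the root logger's handlers.
app_logger = logging.getLogger("myapp")
app_logger.info("Hello from the 'myapp' logger")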

Configuration Parsing with `configparser`

Using the `configparser` Module

00:11:48

Lesson Description:

When writing applications and libraries, we sometimes need to expose a lot of configuration options to our users, and allowing for configuration files to be used can make things easier. In this lesson, we'll learn about the configparser module from Python's standard library, which allows us to easily consume INI-style configuration files from within our programs.

Documentation for This Video: configparser, logging.config, logging.config.fileConfig

Parsing Config Files Using configparser

The INI config format is defined pretty loosely, but in general, it allows for sections and key/value pairs. The combination of these things pretty closely maps to a dictionary that can have dictionary values within. The configparser.ConfigParser class essentially does just that. To get started, let's download an example config file for us to parse, and then we'll write a simple script that will utilize the configuration values:

$ curl -O https://raw.githubusercontent.com/linuxacademy/content-File-Processing-and-Environment-Communication-with-Python/master/servers.ini
$ touch using_config.py
In our using_config.py file, we'll import configparser and use the ConfigParser class. We won't pass any arguments when we instantiate our parser, but we'll use the read method to parse the values from the servers.ini file. ~/using_config.py
import configparser

config = configparser.ConfigParser()
config.read('servers.ini')
From this point, our config object can work a lot like a dictionary where the keys are the section names that we had in square brackets, and then each of those sections has keys/values. If we don't know the sections in the file then we can use the sections method, but for most other things we can use the get method or subscript into the object. Let's iterate through the sections and print each of the sections' keys: ~/using_config.py
import configparser

config = configparser.ConfigParser()
config.read("servers.ini")

for section in config:
    for key in config[section]:
        print(f"{section} - {key} : {config[section][key]}")
Let's run this and see what we learn:
$ python3.8 using_config.py
DEFAULT - force_https : false
DEFAULT - use_compression : true
auth.example.com - force_https : true
auth.example.com - use_compression : true
www.example.com - use_compression : false
www.example.com - compression_level : 4
www.example.com - redirect_to_login : yes
www.example.com - login_url : https://auth.example.com
www.example.com - force_https : false
www.example.com_users - tina : Tina Turner
www.example.com_users - ricky : Ricky Bobby
www.example.com_users - force_https : false
www.example.com_users - use_compression : true
Notice that every section has a value for use_compression and force_https because we have a [DEFAULT] section. The ConfigParser will set those values for all of the sections that don't already define that key. Another thing to note is that all of the values getting returned to us will be strings. So, for things like true and false values, we'll need to convert those to booleans or the numbers to integers or floats. Thankfully, we can do this when we're reading the keys by using the getboolean, getint, and getfloat methods. Here's an example:
>>> config['www.example.com_users'].getboolean('use_compression')
True
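The numeric helpers work the same way. For instance, here's a quick REPL sketch using the compression_level key from the same file:

>>> config['www.example.com'].getint('compression_level')
4
>>> config['www.example.com'].getfloat('compression_level')
4.0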
We also have the general get method that will allow us to try to get a value from a section, provide a potential fallback, and avoid errors in the process.
>>> config['www.example.com_users']['fake_key']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    config['www.example.com_users']['fake_key']
  File "/home/cloud_user/.pyenv/versions/3.8.2/lib/python3.8/configparser.py", line 1254, in __getitem__
    raise KeyError(key)
KeyError: 'fake_key'
>>> config['www.example.com_users'].get('fake_key')
>>> config['www.example.com_users'].get('fake_key', False)
False
Writing Config Values

Now that we know how to read a config file, we can take a look at how to modify or write a new one. Let's add a port key to every section, with DEFAULT getting the value of 443 and the other sections getting incrementing ports based on 8080. After we've done that, we'll write the values of our config to a new file called new_servers.ini: ~/using_config.py
import configparser

config = configparser.ConfigParser()
config.read("servers.ini")

port_offset = 0
for section in config:
    for key in config[section]:
        print(f"{section} - {key} : {config[section][key]}")

    if section != "DEFAULT":
        config[section]["port"] = str(8080 + port_offset)
    else:
        config[section]["port"] = str(443)

    port_offset += 1


with open("new_servers.ini", "w") as f:
    config.write(f)
We're just treating config like a dictionary, except that we can only set or modify keys using string values. Lastly, we write the contents of our config out using the write method, which takes a file object. Let's run the file one last time:
$ python3.8 using_config.py
DEFAULT - force_https : false
DEFAULT - use_compression : true
auth.example.com - force_https : true
auth.example.com - use_compression : true
auth.example.com - port : 443
www.example.com - use_compression : false
www.example.com - compression_level : 4
www.example.com - redirect_to_login : yes
www.example.com - login_url : https://auth.example.com
www.example.com - force_https : false
www.example.com - port : 443
www.example.com_users - tina : Tina Turner
www.example.com_users - ricky : Ricky Bobby
www.example.com_users - force_https : false
www.example.com_users - use_compression : true
www.example.com_users - port : 443
Notice that we set the value for port on DEFAULT after printing its values, and that was immediately reflected in every other section. This tells us that the default values are read in lazily when we access a section, and not when the file is read. If we take a look at new_servers.ini, here's what we see:
$ cat new_servers.ini
[DEFAULT]
force_https = false
use_compression = true
port = 443

[auth.example.com]
force_https = true
port = 8081

[www.example.com]
use_compression = false
compression_level = 4
redirect_to_login = yes
login_url = https://auth.example.com
port = 8082

[www.example.com_users]
tina = Tina Turner
ricky = Ricky Bobby
port = 8083
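We aren't limited to modifying a file that we've parsed; we can also build a configuration from scratch by assigning dictionaries to section names. Here's a short sketch (the section names and the generated.ini file are made up for this example):

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {"force_https": "false"}
config["api.example.com"] = {"port": "8084", "force_https": "true"}

# Values must be strings, just like when modifying a parsed config.
with open("generated.ini", "w") as f:
    config.write(f)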

Dates and Times

Using `date` and `datetime` Objects

00:11:29

Lesson Description:

Dates and times are used so often in programming because it's very important to know when something happened, or is going to happen. In Python, there are a few different classes that we need to use from the datetime module. In this lesson, we'll learn when and how to use them.

Documentation for This Video: The datetime Module

Understanding the Date and Time Types

The datetime module provides a handful of types that we'll use to represent different aspects of time:

datetime.date - Represents a simple date according to the Gregorian calendar
datetime.time - Represents a time on a 24-hour clock, independent of a date
datetime.datetime - A combination of a date and a time
datetime.timedelta - The difference between two date, time, or datetime objects, down to microsecond resolution

We also have some classes related to timezones:

datetime.tzinfo - An abstract class to encompass timezone offset information
datetime.timezone - An implementation of the tzinfo abstract class that provides an easy way to access various important timezones like UTC

Creating and Using Dates and Times

To learn about dates and times, we're going to use the REPL once again. As you can imagine, a date consists of three important attributes: a year, a month, and a day. Below is the code that we would use to create a datetime.date object for Valentine's Day 2020:

>>> import datetime
>>> valentines = datetime.date(2020, 2, 14)
>>> valentines.day
14
>>> valentines.weekday()
4
There are a lot of methods on the date class, but most of them are related to converting our date information to another format. There are also useful class methods on the date class that make it easier to create date objects from other formats and time objects (we'll sketch a couple of these after the time example below). The datetime.time objects work a lot like date objects, except that we work with hours, minutes, seconds, etc. Additionally, we can attach tzinfo if we want to represent a time in a specific timezone. Let's demonstrate this more complex example by creating a timezone object to represent "Eastern Daylight Time (EDT)," which is 4 hours behind UTC:
>>> edt = datetime.timezone(datetime.timedelta(hours=-4))
>>> quarter_past_twelve = datetime.time(12, 15, 10, tzinfo=edt)
>>> quarter_past_twelve.utcoffset()
datetime.timedelta(days=-1, seconds=72000)
>>> quarter_past_twelve.tzname()
'UTC-04:00'
>>> quarter_past_twelve.hour
12
>>> quarter_past_twelve.minute
15
>>> quarter_past_twelve.isoformat()
'12:15:10-04:00'
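Speaking of the class methods on the date class, here's a quick REPL sketch of a couple of them (the value of today() depends on when you run it):

>>> datetime.date.today()
datetime.date(2020, 5, 1)
>>> datetime.date.fromisoformat('2020-02-14')
datetime.date(2020, 2, 14)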
Putting It All Together: datetime.datetime

The datetime.datetime class puts together both dates and times. We're going to be covering formatting in the next lesson, so let's just touch on how to create datetime objects for now. The datetime.combine class method can even let us create a datetime from a date and a time:
>>> vday_lunch = datetime.datetime.combine(valentines, quarter_past_twelve)
>>> vday_lunch.isoformat()
'2020-02-14T12:15:10-04:00'
We haven't talked about the timedelta class much, but we can create timedelta objects by subtracting datetime objects. However, there are some rules that we need to follow:
>>> birthday = datetime.datetime(2020, 2, 6, 8, 0)
>>> vday_lunch - birthday
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    vday_lunch - birthday
TypeError: can't subtract offset-naive and offset-aware datetimes
Because one of these datetime objects has a set tzinfo, we cannot subtract a tz-unaware object. The term offset-naive means timezone unaware. Let's fix this by recreating the birthday object using the edt timezone:
>>> birthday = datetime.datetime(2020, 2, 6, 8, 0, tzinfo=edt)
>>> vday_lunch - birthday
datetime.timedelta(days=8, seconds=15310)
Now that we have an idea of how to create these objects, we'll move on to formatting in the next lesson.

Date and Time Formatting

00:06:49

Lesson Description:

One of the main things that we do with dates and times is format them, and create them from common formats. In this lesson, we'll learn all that we can about formatting.

Documentation for This Video: The datetime Module, strftime and strptime Format Codes

Outputting Dates and Times Using strftime

Nearly every programming language has a strftime function or method. This stands for "string format time". The formats usually look a little strange at first, but after writing these strings a few times, they'll feel like second nature. We are looking to achieve this representation of a datetime object:

February 14, 2020 @ 12:14 PM
For this task, we're going to need to look at the format codes. Here's what we need to create this message:
>>> from datetime import datetime, date, time
>>> valentines_lunch = datetime(2020, 2, 14, 12, 14, 10)
>>> valentines_lunch.strftime("%B %d, %Y @ %I:%M %p")
'February 14, 2020 @ 12:14 PM'
All of the %B, %d, etc. look weird, but when working with format strings we just need to remember that anything starting with a % represents some aspect of the datetime. Every other character in the string is a literal character (the spaces, @, and : in this case). If we need a literal % character, then we'll write it as %%. We'll use the strftime function when we want a customized representation of our dates or times. ISO 8601 is another common format that we'll use, and all of our objects come with an easy way to get the ISO format.
>>> valentines_lunch.isoformat()
'2020-02-14T12:14:10'
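And because % starts a format code, here's a quick sketch of how the %% escape plays out with the same object:

>>> valentines_lunch.strftime("%B %d, 100%% real")
'February 14, 100% real'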
Creating Objects from Format Strings

Another thing that we can do with a format string is translate a string of a specific format into a datetime. Because we can't send datetime objects directly over a network connection, it is fairly common to use a string to represent a datetime when making requests to third-party APIs, or when someone is sending a datetime to our applications. To do this translation, we'll use strptime. This stands for "string parse time", and it takes the datetime string and the format string that it maps to:
>>> right_now = datetime.now().isoformat()
>>> right_now
'2020-05-01T14:29:29.211322'
>>> now_as_dt = datetime.strptime(right_now, "%Y-%m-%dT%H:%M:%S.%f")
>>> now_as_dt.isoformat()
'2020-05-01T14:29:29.211322'
>>> now_as_dt.isoformat() == right_now
True

Using the `time` module

00:06:54

Lesson Description:

The last time-related module that we'll speak about is the time module itself. It's a little confusing that there's a datetime.time class and a time module, but we'll learn about the difference now.

Documentation for This Video: The datetime Module, The time Module, strftime and strptime Format Codes

When to Use datetime or time?

Understanding when to use either the datetime or time module is important, because it could potentially save you some time accomplishing specific tasks. The two modules can accomplish many of the same things, but not all. The most important difference is that the datetime module takes a more object-oriented approach, where we work with classes that have methods. The time module thinks about time as a float, representing the distance from the epoch in seconds, though we'll often work with a struct_time object that only has attributes. The primary formatting options that we have from the datetime module are still there, so we won't cover those. Here are some of the situations where we want to use the time module.

System Aware Times

Something that the time module does particularly well is give us easy access to the current GMT time and the current time based on our system's locale. The two functions that we'll use for this are gmtime and localtime. Let's see these in action, knowing that my local timezone is EDT because of daylight saving time:

>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2020, tm_mon=5, tm_mday=1, tm_hour=19, tm_min=40, tm_sec=35, tm_wday=4, tm_yday=122, tm_isdst=0)
>>> time.localtime()
time.struct_time(tm_year=2020, tm_mon=5, tm_mday=1, tm_hour=15, tm_min=40, tm_sec=38, tm_wday=4, tm_yday=122, tm_isdst=1)
This shows the shape of the struct_time. We can see the 4-hour difference between EDT (localtime()) and GMT by comparing the tm_hour field of each. Additionally, we see the difference in daylight saving time from the tm_isdst field. We also have an inverse of sorts in the mktime function, which takes a time tuple or struct_time (interpreted as local time) and returns the time as a float:
>>> time.mktime(time.gmtime())
1588380538.0
Sleeping in Scripts

When writing scripts, it isn't uncommon to need to manually wait a while for something else to happen, such as a web request finishing or a server launching. In these situations, we can use the time.sleep function. This function takes a number of seconds and makes the process block for that many seconds before continuing:
>>> time.sleep(10)
Process Time and Performance Counters

The last things that we want to talk about are the perf_counter and process_time functions. The process_time function gives us the amount of time, in fractional seconds, that the process has been actively running (not including sleep time). The perf_counter function returns a very precise measurement of time that can be used to see how long it took for something to happen. A common use for perf_counter is to get the time before and after running a function, and then look at the difference to see how long the action took. Let's write a small script to see this in action:

$ touch using_time.py

~/using_time.py
import time

start_time = time.perf_counter()

for i in range(10000):
    i + 2

time.sleep(4)

end_time = time.perf_counter()

print(f"Process Time: {time.process_time()}, Total Time: {end_time - start_time}")
Let's run the script to see its results:
$ python using_time.py
Process Time: 0.044035, Total Time: 4.004943551
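If we find ourselves timing blocks like this often, we can wrap perf_counter in a small reusable context manager. This is just a sketch; the timed helper is something we're inventing here, not part of the time module:

import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Capture a high-resolution start time, run the block, and report
    # the elapsed wall-clock time even if the block raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.6f} seconds")

with timed("sleep test"):
    time.sleep(1)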

Operating System Interactions

The `os` Module

00:05:35

Lesson Description:

The os module provides us with a portable way to interact with "operating system resources." This is a broad term that fits this module because it contains so many functions. In this lesson, we'll read through the documentation to get an overview of what this module contains.

Documentation for This Video: The os Module

The Big Picture

Here are the big overview sections:

File Names, Command Line Arguments, and Environment Variables
Process Parameters
File Object Creation
File Descriptor Operations
Files and Directories
Process Management
Interface to the scheduler
Miscellaneous System Information
Random numbers

Interacting with Environment Variables

00:07:45

Lesson Description:

Environment variables provide a great way to pass information into an application, and reading environment variables is a key skill for programming.

Documentation for This Video: The os Module, The environ Attribute, The getenv Function, The putenv Function, The unsetenv Function

Reading from the Environment

When we think about our programming environment, generally, we have a list of environment variable names and values. This collection of variables maps perfectly to a dictionary-like object (called a mapping object), and that object is accessible through the os.environ attribute. Let's write a short script, called using_env.py, to demonstrate how we can interact with the environment: ~/using_env.py

import os

print(f"MY_VAR: {os.environ['MY_VAR']}")
Let's run this script to see both how the information is read, and what happens if the environment variable isn't set:
$ MY_VAR=testing python3.8 using_env.py
MY_VAR: testing
$ python3.8 using_env.py
Traceback (most recent call last):
  File "using_env.py", line 3, in <module>
    print(f"MY_VAR: {os.environ['MY_VAR']}")
  File "/Users/cloud_user/.pyenv/versions/3.8.2/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'MY_VAR'
Just like a dictionary, if we try to access a key that doesn't exist, we get an error. As a workaround, we have the os.getenv function that works just like dict.get. In most situations, we probably don't want our program to crash when we're trying to read from the ENV, so getenv is generally the approach to use. Let's modify our script to use getenv with a default value: ~/using_env.py
import os

print(f"MY_VAR: {os.getenv('MY_VAR', 'default_value')}")
Now, this is what we see running our script both ways:
$ MY_VAR=testing python3.8 using_env.py
MY_VAR: testing
$ python3.8 using_env.py
MY_VAR: default_value
Writing and Deleting Environment Values

We have two ways to write values to the ENV. But, generally, writing values to the ENV is only useful if you're going to fork off subprocesses that utilize these environment variables. We can directly set items on os.environ because it's a mapping object, or we can use the os.putenv function. Let's use both and see what we can learn about setting environment values: ~/using_env.py
import os

print(f"MY_VAR: {os.getenv('MY_VAR', 'default_value')}")

os.putenv("PUT_VAR", "testing putenv")
os.environ["SET_VAR"] = "Direct Assignment"

print(f"PUT_VAR via getenv: {os.getenv('PUT_VAR')}")

try:
    print(f"PUT_VAR via environ: {os.environ['PUT_VAR']}")
except KeyError:
    print("PUT_VAR not in os.environ")

print(f"SET_VAR via getenv: {os.getenv('SET_VAR')}")

try:
    print(f"SET_VAR via environ: {os.environ['SET_VAR']}")
except KeyError:
    print("SET_VAR not in os.environ")
Let's run this to see what output we see:
$ python3.8 using_env.py
MY_VAR: default_value
PUT_VAR via getenv: None
PUT_VAR not in os.environ
SET_VAR via getenv: Direct Assignment
SET_VAR via environ: Direct Assignment
As we can see, putenv doesn't seem to work, but it didn't give us an error. Direct assignment is the approach that we most likely want to use. The putenv function is setting the value, but it is only accessible to subprocesses that we create from within our program. Deleting environment variables works similarly: the counterpart to putenv is unsetenv. Using del os.environ['KEY'] is preferable to unsetenv because deleting the item will also invoke unsetenv, but calling unsetenv won't update os.environ.
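Here's a short sketch of the deletion in action, on a platform where unsetenv is supported:

import os

os.environ["SET_VAR"] = "Direct Assignment"
print(os.getenv("SET_VAR"))  # Direct Assignment

# Deleting the item from the mapping also calls unsetenv under the hood.
del os.environ["SET_VAR"]
print(os.getenv("SET_VAR"))  # None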

Working with Directories and Files

00:10:00

Lesson Description:

The os module also lets us work with files and directories. In this lesson, we'll discover how we can accomplish plenty of the things that we would normally do from the shell with common utilities, using Python.

Documentation for This Video: The os Module, The getcwd Function, The chdir Function, The listdir Function, The makedirs Function, The remove Function, The removedirs Function, The rmdir Function, The rename Function, The chown Function, The chmod Function

Navigating, Creating, Removing, and Moving Directories and Files

Rather than writing a script to test out all of the file and directory capabilities, we're going to experiment with a lot of functions using the REPL. Let's launch the REPL and play around. To check our path, we'll use os.getcwd to get our current directory:

$ python3.8
>>> import os
>>> os.getcwd()
'/home/cloud_user'
Below are examples for many of the useful file and directory functions:
>>> os.chdir('./other_dir') # Will change the current working directory. Relative paths are based on the new current working directory.

>>> os.listdir('.') # List the contents of a directory
['example.txt', 'other.txt']

>>> os.mkdir("sample") # Makes a directory called sample

>>> os.makedirs("sample/foo/bar") # Recursively makes directories `mkdir -p` equivalent

>>> open("sample/foo/bar/baz.txt", 'a').close() # Creates the baz.txt file if it doesn't exist

>>> os.remove("sample/foo/bar/baz.txt") # Remove the baz.txt file

>>> os.rmdir("sample/foo/bar") # Removes the bar directory

>>> os.removedirs("sample/foo") # Recursively removes foo and sample

>>> os.rename("sample.txt", "staple.txt") # Renames the file, equivalent to `mv sample.txt staple.txt`
Changing File Ownership

The chmod and chown utilities are so commonly used on UNIX machines that we definitely want to know how to use their equivalents from Python. Thankfully, each of them is just a function on the os module. The chown function does require knowing the integer values for the user ID (UID) and the group ID (GID). Here's an example of how to change ownership of a file using the UID and GID:
>>> os.chown('path_to_file.txt', 501, 20)
The chmod function gets even more complicated, because there are so many potential values for the mode. Thankfully, most of the time you'll only want to change the owner, group, and other permission values, and we can express that mode change using octal notation.
>>> os.chmod('path_to_file.txt', 0o644)
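To verify changes like these, we can read a file's metadata back with os.stat. A small sketch (path_to_file.txt is a placeholder, as above):

import os
import stat

info = os.stat("path_to_file.txt")
print(info.st_uid, info.st_gid)  # the owner's UID and GID
print(oct(stat.S_IMODE(info.st_mode)))  # the permission bits, e.g. 0o644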

Using Streams

The `io` Module

00:03:10

Lesson Description:

Some of the most common operations that we perform are input/output (IO) operations. The bulk of this functionality is implemented as part of the io module in Python's standard library. In the next series of lessons, we'll review how to perform the various types of IO and cover the common terms and concepts.

Documentation for This Video: The io Module, The io.IOBase Class, The sys.stdout Object, The open Function

File Objects, Streams, and File-Like Objects

When working with IO we need to know a few different terms that commonly pop up: file objects, streams, and file-like objects. These three terms all refer to the same thing: an object that implements the methods, or "interface," of the io.IOBase class. This is handy because it means that we can write methods and applications that work with various types of objects, such as files on disk and objects like sys.stdout (there's a tiny sketch of this at the end of these notes).

Familiarizing Ourselves with io.IOBase

The best way to get a good grasp on streams and IO is to learn all that we can about io.IOBase, because it is the foundation of all of the IO that we'll perform. We can comfortably work with files by knowing all of the methods and attributes of IOBase. This is a great time to read the documentation and play with things in the REPL. The main methods and attributes that we'll need to remember are:

read
readline
readlines
write
writelines
flush
close
seek
tell

Types of I/O

Python distinguishes a few different types of IO: text I/O, binary (buffered) I/O, and raw (unbuffered) I/O. We'll almost always work with text and binary I/O, where the main differentiating factor is whether we use str or bytes objects.
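Here's the tiny sketch mentioned above: in a standard interpreter, both an open file and sys.stdout check out as io.IOBase instances, which is why the same code can work with either:

import io
import sys

print(isinstance(sys.stdout, io.IOBase))  # True

with open("example.txt", "w") as f:
    print(isinstance(f, io.IOBase))  # True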

Working with Text Streams

00:10:41

Lesson Description:

Now that we have an overview of I/O in Python, we're ready to dig into the various types. In this lesson, we'll learn about how we can perform text I/O by interacting with files and using strings as in-memory text streams.

Documentation for This Video: The io Module, The io.IOBase Class, The sys.stdout Object, The sys.stdin Object, The open Function, The io.StringIO Class

Working with Text Streams

We've most definitely worked with text files up to this point, and strings are one of the most common data types that we interact with. This portion of the lesson will most likely be a review, but it is important. We want to cover two things:

Opening and working with text files
Working with text streams like sys.stdout and sys.stdin

Let's create a new file called using_text.py and write a few examples that demonstrate how we can work with text files. ~/using_text.py

try:
    # Make sure that the file exists before trying to
    # read it
    open("text_example.txt", "x").close()
except FileExistsError:
    pass

with open("text_example.txt", "r+t") as my_file:
    print(my_file.read(), end="")
    my_file.seek(0)
    my_file.truncate()
    my_file.write("Test write")
    my_file.writelines(
        ["Line 1\n", "Line 2\n"]
    )
    print(my_file.tell())
This is an odd example, but it allows us to see the functionality of working with streams. We're using the x (exclusive creation) mode in the first try block to ensure that the file exists before we try to open it for reading on our second run. Next, we're leveraging the fact that io.IOBase objects are also context managers, so they can use the with statement. The with statement will call the close method on the subject of the statement once the block has completed. We're opening the file for reading and writing and explicitly setting the mode to text by using the t option, even though this is the default. Within this block, we're using most of the useful methods that we have at our disposal when working with streams. The ones that we might not have seen before are truncate and tell. The truncate method will remove all content after our current position in the file. Since we used seek to go to the 0th position before truncating, we're effectively emptying the file before writing content to it again. The tell method is the opposite of seek: it tells us what our current position within the stream is. Let's run the script to see the results. We'll need to run it twice to fully see what it is doing:
$ python3.8 using_text.py
24
$ python3.8 using_text.py
Test writeLine 1
Line 2
24
Since we're calling read before truncate, we're able to see what the contents of the file are before we clear them out and write new contents. This is why, on the first run, there is no output besides what our last print line shows. Something to take note of is that a newline is not automatically added when using write or writelines.

In-Memory Text Streams Using io.StringIO

Sometimes we want an object that behaves like a file-like object, but only exists in memory. In these situations, we can create an in-memory text stream by using io.StringIO. This stream can then be passed to any function that expects a stream or file object. Let's demonstrate this by extracting our file-writing code into a function that takes a file object. Then, we'll call that function with a real file and an in-memory stream. To do this, we'll need to import the io module. ~/using_text.py
import io

try:
    # Make sure that the file exists before trying to
    # read it
    open("text_example.txt", "x").close()
except FileExistsError:
    pass


def write_to_file(file_obj):
    print(file_obj.read(), end="")
    file_obj.seek(0)
    file_obj.truncate()
    file_obj.write("Test writen")
    file_obj.writelines(
        ["Line 1n", "Line 2n",]
    )
    print(file_obj.tell())


with open("text_example.txt", "r+t") as my_file:
    write_to_file(my_file)

with io.StringIO("Sample Content\n") as txt_stream:
    write_to_file(txt_stream)
    contents = txt_stream.getvalue()
    print("Text Stream Content")
    print(contents, end="")
The only new things in this code are the initialization of the io.StringIO object, which allows us to provide the initial backing string value, and then later using the getvalue method which returns the current string value of the object. For everything else that we do, txt_stream and my_file behave exactly the same way. Let's run this script and see what we see:
$ python3.8 using_text.py
Test write
Line 1
Line 2
25
Sample Content
25
Text Stream Content
Test write
Line 1
Line 2
Using sys.stdout, sys.stdin, and sys.stderr

The most common text streams that we work with as programmers are STDOUT, STDIN, and STDERR. All of these streams can be accessed through the sys module. STDOUT and STDERR are both writeable streams, but they aren't readable. STDIN is readable, but not writeable. This means that these objects will give us errors when calling methods that aren't related to the reading/writing mode of the stream. Additionally, depending on the environment, STDOUT and STDERR may be wrapped in objects that cannot be closed at all, so it's good to avoid writing functions that attempt to close the file objects passed in.
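Here's a minimal sketch of using these streams directly:

import sys

sys.stdout.write("normal output\n")
sys.stderr.write("error output\n")

line = sys.stdin.readline()  # blocks until a line of input arrives
sys.stdout.write(f"received: {line}")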

Working with Binary and RAW I/O Streams

00:05:07

Lesson Description:

Binary I/O and raw I/O work a lot like text I/O, except that we'll be using bytes objects. In this lesson, we'll dig into binary I/O.

Documentation for This Video: The io Module, The io.IOBase Class, The open Function, The io.BytesIO Class

Working with Binary Streams

Binary streams work just like text streams, except that we're using bytes instead of str. To demonstrate this, we're going to make a copy of using_text.py as using_bytes.py and make the appropriate changes:

$ cp using_text.py using_bytes.py
Let's modify the file: ~/using_bytes.py
import io

try:
    # Make sure that the file exists before trying to
    # read it
    open("bytes_example.txt", "x").close()
except FileExistsError:
    pass


def write_to_file(file_obj):
    print(file_obj.read(), end="")
    file_obj.seek(0)
    file_obj.truncate()
    file_obj.write(b"Test Writen")
    file_obj.writelines(
        [b"Line 1n", b"Line 2n",]
    )
    print(file_obj.tell())


with open("bytes_example.txt", "r+b") as my_file:
    write_to_file(my_file)

with io.BytesIO(b"Sample Contentn") as byte_stream:
    write_to_file(byte_stream)
    contents = byte_stream.getvalue()
    print("Byte Stream Contents")
    print(contents)
Let's run the script to see what happens. We'll need to run it twice to see exactly what it is doing:
$ python3.8 using_bytes.py
b''25
b'Sample Content\n'25
Byte Stream Contents
b'Test Write\nLine 1\nLine 2\n'

$ python3.8 using_bytes.py
b'Test Write\nLine 1\nLine 2\n'25
b'Sample Content\n'25
Byte Stream Contents
b'Test Write\nLine 1\nLine 2\n'
The printing doesn't quite have the same effect as it does when working with text I/O, but it easily demonstrates that the functionality is the same.

Working with Buffers

Binary I/O and text I/O use buffering, so only a portion of the file object is loaded into memory at a time. We can adjust this when opening the file object with the open function by setting the buffering option. The default is usually good for the system running the code, but it is possible to stop buffering, or manually set the size of each buffer. Additionally, we can access the buffer underlying an io.BytesIO object and directly manipulate it by using the getbuffer() method. There are quite a few ways for us to run into errors though: ~/using_bytes.py
import io

try:
    # Make sure that the file exists before trying to
    # read it
    open("bytes_example.txt", "x").close()
except FileExistsError:
    pass


def write_to_file(file_obj):
    print(file_obj.read(), end="")
    file_obj.seek(0)
    file_obj.truncate()
    file_obj.write(b"Test Writen")
    file_obj.writelines(
        [b"Line 1n", b"Line 2n",]
    )
    print(file_obj.tell())


with open("bytes_example.txt", "r+b") as my_file:
    write_to_file(my_file)

with io.BytesIO(b"Sample contentn") as byte_stream:
    write_to_file(byte_stream)
    view = byte_stream.getbuffer()
    view[0:4] = b"Byte"
    contents = byte_stream.getvalue()
    print("Byte Stream Contents")
    print(contents)
Here we're getting a buffer to modify, and changing a slice of it. Note that the slice that we change and the value that we set need to be the same size; otherwise, we'll get an error because the buffer can't be resized. Running this script, we'll still see another error that's a little harder to figure out:
$ python3.8 using_bytes.py
b'Test Write\nLine 1\nLine 2\n'25
b'Sample content\n'25
Byte Stream Contents
b'Byte Write\nLine 1\nLine 2\n'
Traceback (most recent call last):
  File "using_bytes.py", line 31, in <module>
    print(contents)
BufferError: Existing exports of data: object cannot be re-sized
Notice that we're actually seeing the "Byte Write" content get written out and the error is on line 31. The error is actually because we're trying to close the stream while the buffer view still exists. To get around this, we actually need to delete the view at some point after we've modified the content: ~/using_bytes.py
import io

try:
    # Make sure that the file exists before trying to
    # read it
    open("bytes_example.txt", "x").close()
except FileExistsError:
    pass


def write_to_file(file_obj):
    print(file_obj.read(), end="")
    file_obj.seek(0)
    file_obj.truncate()
    file_obj.write(b"Test Writen")
    file_obj.writelines(
        [b"Line 1n", b"Line 2n",]
    )
    print(file_obj.tell())


with open("bytes_example.txt", "r+b") as my_file:
    write_to_file(my_file)

with io.BytesIO(b"Sample contentn") as byte_stream:
    write_to_file(byte_stream)
    view = byte_stream.getbuffer()
    view[0:4] = b"Byte"
    del view
    contents = byte_stream.getvalue()
    print("Byte Stream Contents")
    print(contents)
Running this one last time, we can now see that we modify the content and are able to close the stream:
$ python3.8 using_bytes.py
b'Test Write\nLine 1\nLine 2\n'25
b'Sample content\n'25
Byte Stream Contents
b'Byte Write\nLine 1\nLine 2\n'
Raw I/O

It's not something that we often want to do, but occasionally we'll work with an unbuffered byte stream. If we do need to (you'll probably know if you do), then the easiest way is to use the open function again, setting buffering=0. Here's what this would look like in action:
raw_file = open('using_bytes.py', 'rb', buffering=0)
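Reading from this object pulls bytes straight from the operating system, with no buffering layer in between. Continuing the sketch from above:

print(type(raw_file))  # <class '_io.FileIO'>
print(raw_file.read(10))  # the first 10 bytes of the file
raw_file.close()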

Final Steps

What's Next?

00:00:31

Lesson Description:

Thanks for taking the time to go through this course! I hope you learned a lot, and I want to hear about it. Please take a moment to rate the course. It'll help with determining what works and what doesn't. Be sure to share your thoughts in the community. Everyone wants to celebrate your successes with you.
