Skip to main content

Use pandas DataFrames on Excel Data

Hands-On Lab

 

Photo of Larry Fritts

Larry Fritts

Python Development Training Architect II

Length

00:30:00

Difficulty

Beginner

In this lab, we take a .csv file and create an Excel workbook out of it using pandas.

The pdf of the notebook for this lab is here.

What are Hands-On Labs?

Hands-On Labs are scenario-based learning environments where learners can practice without consequences. Don't compromise a system or waste money on expensive downloads. Practice real-world skills without the real-world risk, no assembly required.

Use pandas DataFrames on Excel Data

In this lab, using Jupyter Notebooks, we take a CSV file and create an Excel workbook out of it using pandas.

Before We Begin

To get started, we need to log into our lab server with the credentials provided by the lab. We also need to download the PDF of the notebook for this lab is here.

Setup Jupyter Notebook Server Access

Once we're logged in and have our PDF downloaded, we can begin setting up our Jupyter notebook's server access.

Connecting to the Jupyter Notebook Server

To connect the Jupyter Notebook, complete the following:

  1. Activate the conda virtual environment:
conda activate base
  1. Change our directory to hol using cd hol.
  2. Use ls to list out the information. We will see get_notebook_token.py.
  3. To start the server, run the following to obtain the notebook token:
python get_notebook_token.py
  1. Copy and save the notebook token.

On Your Local Machine

To continue, move to your local terminal and complete the following:

  1. In the terminal window, enter the following:
ssh -N -L localhost:8087:localhost:8086 cloud_user@<the public IP address of the Playground server>

Make sure to replace <the public IP address of the Playground server> with the IP address of the Playground server. You will also use the password we used to log in to the Playground remote server.

Note:Leave this terminal open, it will appear nothing has happened, but it must remain open while you use the Jupyter Notebook server in this session.

  1. In the browser window of your choice, enter the following address: http://localhost:8087. This will open the Jupyter Notebook site.

  2. In the Password or token box, enter the token we copied from the remote server.

  3. Select Log in.

We arrive on our Jupyter lab server.

Create a Dataframe From a File

  1. Select the hol_4_2_1.ipynb notebook.

  2. Using python in our notebook, open the dow_jones_index.data file to determine the data structure:

    # open file for reading
    f = open('dow_jones_index.data')
  3. Print the first two lines and then close the document:

# print the first two lines
print(f.readline())
print(f.readline())

f.close()
  1. It appears the file is CSV. Read the file into a dataframe using the following commands to see it as a table:
import pandas as pd

stock_df = pd.read_csv('dow_jones_index.data')

stock_df.head()

Create the Requested Excel Workbook

With our content ready, we need to create our Excel workbook.

  1. To make the workbook, start by creating a dataframe for each of the requested stocks:
ge_df = stock_df[stock_df.stock=='GE']
ibm_df = stock_df[stock_df.stock=='IBM']
krft_df = stock_df[stock_df.stock=='KRFT']
  1. Write the Excel file for each of them:
with pd.ExcelWriter('stocks.xlsx') as writer:  
    ge_df.to_excel(writer, sheet_name='GE')
    ibm_df.to_excel(writer, sheet_name='IBM')
    krft_df.to_excel(writer, sheet_name='KRFT')

Check That the Excel Workbooks are Correct

  1. Load the file into an ordered dict dataframe:
my_stock_df = pd.read_excel('stocks.xlsx', sheet_name=None)  

my_stock_df.keys()
  1. Check that each worksheet is populated:
my_stock_df['GE']
my_stock_df['IBM']
my_stock_df['KRFT']

Conclusion

Congratulations — you've finished the lab!