
Generate a Complete Report

Hands-On Lab

Larry Fritts

Python Development Training Architect II

Length

01:30:00

Difficulty

Intermediate

In this lab, graphs are created from data sliced from Titanic survivability CSV files.

The PDF of the notebook for this lab is here.

In this lab, we'll be creating graphs from data sliced from Titanic survivability CSV files. We will be using this data to answer three questions:

  1. What part did age play?
  2. What part did gender play?
  3. Did the passenger class make a difference?

Before We Begin

To get started, we need to log in to our virtual environment using the provided information.

Connect to the Jupyter Notebook Server

  1. Activate our conda virtual environment:

    conda activate base

  2. Change to the hol directory:

    cd hol

  3. List the contents of this directory with ls. We'll see the get_notebook_token.py script.
  4. Run the script with Python to get the notebook token:

    python get_notebook_token.py

    This is a simple script that starts the Jupyter notebook server and sets it to continue to run outside of the terminal.

  5. Copy the token that appears and save it to a text file on your local machine.

On Our Local Machine

  1. In a terminal window, enter the following:

    ssh -N -L localhost:8087:localhost:8086 cloud_user@<the public IP address of the Playground server>

    Replace <the public IP address of the Playground server> with the IP address provided by the lab. When prompted for a password, enter the one we used to log in to the Playground server. Leave this terminal open: it will appear that nothing has happened, but the tunnel must stay open while we use the Jupyter Notebook server in this session. (The -N option tells ssh not to run a remote command, and -L forwards port 8087 on our local machine to port 8086 on the server.)

  2. In the browser of our choice, enter the following address:

    http://localhost:8087

    This will open a Jupyter Notebook site.

  3. In the Password or token field, enter the token we copied earlier.

  4. Select Log in once we've entered the token.

Import Required Packages and Create a DataFrame From the File

The following information is what we will be using in this lab to answer the provided questions:

Titanic Data: Factors Affecting Survivability

This data was collected from a web search. It is available from many different organizations. The file provides specific data about passengers on the Titanic and whether they survived the disaster or not.

The columns in the file are defined as:

  • PassengerId - Indexed starting at 1
  • Survived - Survival (0 = No; 1 = Yes)
  • Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
  • Name - Name
  • Sex - Sex
  • Age - Age
  • SibSp - Number of Siblings/Spouses Aboard
  • Parch - Number of Parents/Children Aboard
  • Ticket - Ticket Number
  • Fare - Passenger Fare
  • Cabin - Cabin
  • Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
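Several of these columns store codes rather than readable labels. As a hedged sketch, assuming the column names and codes listed above and using a small hypothetical sample in place of titanic.csv, pandas' `Series.map` can translate the codes for display:

```python
import pandas as pd

# Hypothetical sample rows standing in for titanic.csv (not real passenger data)
sample_df = pd.DataFrame({
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 2],
    'Embarked': ['S', 'C', 'Q'],
})

# Code-to-label lookups taken from the column definitions
survived_labels = {0: 'No', 1: 'Yes'}
class_labels = {1: '1st', 2: '2nd', 3: '3rd'}
port_labels = {'C': 'Cherbourg', 'Q': 'Queenstown', 'S': 'Southampton'}

# map() replaces each code with its label; unknown or missing codes become NaN.
# assign() returns a new DataFrame, so sample_df keeps its numeric codes.
decoded_df = sample_df.assign(
    Survived=sample_df['Survived'].map(survived_labels),
    Pclass=sample_df['Pclass'].map(class_labels),
    Embarked=sample_df['Embarked'].map(port_labels),
)
print(decoded_df)
```

Decoding only a copy for display leaves the numeric comparisons used later in the lab, such as `Survived == 1` and `Pclass == 1`, working unchanged.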

To work with this data, import the required packages and load the CSV file into a pandas DataFrame:

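import matplotlib.pyplot as plt
import pandas as pd

# Display plots inside the notebook
%matplotlib inline

# Load the passenger data; titanic.csv is expected in the notebook's working directory
titanic_df = pd.read_csv('titanic.csv')

# Show the first five rows
titanic_df.head()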

The head() call displays the first five rows as a table, with one column for each of the fields listed above.
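Not every row has a value in every column; in many copies of this dataset, fields such as Age and Cabin are sometimes blank. pandas reads blank fields as NaN, and any comparison with NaN (for example `Age < 13`) evaluates to False, so those passengers silently drop out of every age group. A minimal sketch of how to check this, using a hypothetical sample in place of titanic_df:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the third passenger has no recorded age
sample_df = pd.DataFrame({
    'Age': [4.0, 30.0, np.nan, 62.0],
    'Survived': [1, 0, 1, 0],
})

# Number of missing values in each column
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Comparisons with NaN are False, so the missing-age row is in neither slice
under_13 = sample_df[sample_df.Age < 13]
age_13_and_over = sample_df[sample_df.Age >= 13]
print(len(under_13) + len(age_13_and_over), 'of', len(sample_df), 'rows have an age group')
```

Running `titanic_df.isna().sum()` on the lab's file shows how many passengers have no recorded age and are therefore absent from the age analysis. The gender and class breakdowns do not depend on Age, so they are unaffected.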

Examine the Effect Age Had on Survivability

Our first question is about the effect of age on survivability. The following are our age ranges:

  • Under 13
  • 13 - 24
  • 25 - 49
  • 50 - 74
  • 75 and Older

Each filter below includes the lower bound of its range and excludes the upper bound, so fractional ages such as 12.5 or 74.5 still land in exactly one group.

  1. Look for children under 13:
passengers_under_13 = titanic_df[titanic_df.Age < 13]
passengers_under_13_survived = passengers_under_13[passengers_under_13.Survived == 1]
passengers_under_13_percent_survived = passengers_under_13_survived.Age.count() / passengers_under_13.Age.count()
  2. Enter the following for ages 13 to 24:
passengers_13_to_24 = titanic_df[(titanic_df.Age >= 13) & (titanic_df.Age < 25)]
passengers_13_to_24_survived = passengers_13_to_24[passengers_13_to_24.Survived == 1]
passengers_13_to_24_percent_survived = passengers_13_to_24_survived.Age.count() / passengers_13_to_24.Age.count()
  3. Enter the following for ages 25 to 49:
passengers_25_to_49 = titanic_df[(titanic_df.Age >= 25) & (titanic_df.Age < 50)]
passengers_25_to_49_survived = passengers_25_to_49[passengers_25_to_49.Survived == 1]
passengers_25_to_49_percent_survived = passengers_25_to_49_survived.Age.count() / passengers_25_to_49.Age.count()
  4. Enter the following for ages 50 to 74:
passengers_50_to_74 = titanic_df[(titanic_df.Age >= 50) & (titanic_df.Age < 75)]
passengers_50_to_74_survived = passengers_50_to_74[passengers_50_to_74.Survived == 1]
passengers_50_to_74_percent_survived = passengers_50_to_74_survived.Age.count() / passengers_50_to_74.Age.count()
  5. Enter the following for ages 75 and over:
passengers_75_over = titanic_df[titanic_df.Age >= 75]
passengers_75_over_survived = passengers_75_over[passengers_75_over.Survived == 1]
passengers_75_over_percent_survived = passengers_75_over_survived.Age.count() / passengers_75_over.Age.count()
  6. Now, use print() to display the number of passengers and the proportion who survived in each group:
print(f'Under 13:\t{passengers_under_13.Age.count()} - {passengers_under_13_percent_survived:.2f}')
print(f'13 - 24:\t{passengers_13_to_24.Age.count()} - {passengers_13_to_24_percent_survived:.2f}')
print(f'25 - 49:\t{passengers_25_to_49.Age.count()} - {passengers_25_to_49_percent_survived:.2f}')
print(f'50 - 74:\t{passengers_50_to_74.Age.count()} - {passengers_50_to_74_percent_survived:.2f}')
print(f'75 & Over:\t{passengers_75_over.Age.count()} - {passengers_75_over_percent_survived:.2f}')
  7. Finally, display the results as a bar chart, using the computed values rather than typed-in ones:
groups = ('Under 13', '13 - 24', '25 - 49', '50 - 74', '75 & Over')
percentages = [
    passengers_under_13_percent_survived,
    passengers_13_to_24_percent_survived,
    passengers_25_to_49_percent_survived,
    passengers_50_to_74_percent_survived,
    passengers_75_over_percent_survived,
]
plt.bar(groups, percentages, align='center', alpha=0.5)
plt.ylabel("Percent Survived")
plt.title("Titanic Survivability by Age Group")
plt.show()

Using the bar chart we generated, we can infer that children under 13 may have been given some preferential treatment for lifeboats. However, the data does not record whether a passenger died in the sinking itself or afterward: some children may have been more susceptible to environmental factors, such as the cold, and died in a lifeboat.

Since there was only one passenger in the 75 & Over group, the survivability of that group is not useful and should not be considered.
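The five age slices repeat the same filter, count, and divide pattern. Because Survived is 0 or 1, a group's survival proportion is simply the mean of its Survived column. As an alternative sketch (not the lab's own method), `pd.cut` can assign every passenger to an age group and `groupby` can compute the count and proportion for all groups in one pass. With `right=False`, each bin includes its lower edge and excludes its upper edge. A small hypothetical sample stands in for titanic_df:

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for titanic_df
sample_df = pd.DataFrame({
    'Age': [2.0, 12.5, 18.0, 24.0, 30.0, 49.0, 50.0, 74.0, 80.0, np.nan],
    'Survived': [1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
})

# Bins [0, 13), [13, 25), [25, 50), [50, 75), [75, inf)
bins = [0, 13, 25, 50, 75, np.inf]
labels = ['Under 13', '13 - 24', '25 - 49', '50 - 74', '75 & Over']
age_group = pd.cut(sample_df['Age'], bins=bins, labels=labels, right=False)

# count = passengers in the group; mean of a 0/1 column = proportion who survived.
# Rows with a missing Age get no group and are dropped by groupby.
survival_by_age = (
    sample_df.groupby(age_group, observed=False)['Survived']
    .agg(['count', 'mean'])
)
print(survival_by_age)
```

Plotting `survival_by_age['mean']` would give the same kind of bar chart without copying values by hand; the explicit slices in the lab trade brevity for making each step visible.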

Examine the Effect Gender Had on Survivability

Next, let's answer our second question: what effect did gender have on survivability?

  1. Enter the following code to look at male survivability:
passengers_male = titanic_df[titanic_df.Sex == "male"]
passengers_male_survived = passengers_male[passengers_male.Survived == 1]
passengers_male_percent_survived = passengers_male_survived.Sex.count() / passengers_male.Sex.count()
  2. Enter the following code to look at female survivability:
passengers_female = titanic_df[titanic_df.Sex == "female"]
passengers_female_survived = passengers_female[passengers_female.Survived == 1]
passengers_female_percent_survived = passengers_female_survived.Sex.count() / passengers_female.Sex.count()
  3. Once again, use print() to view the number of passengers and the proportion who survived for each gender:
print(f'Male:\t{passengers_male.Sex.count()} - {passengers_male_percent_survived:.2f}')
print(f'Female:\t{passengers_female.Sex.count()} - {passengers_female_percent_survived:.2f}')
  4. Finally, create a bar chart to review the data:
# Show data as a bar chart, using the computed proportions
groups = ('Male', 'Female')
percentages = [passengers_male_percent_survived, passengers_female_percent_survived]
plt.bar(groups, percentages, align='center', alpha=0.5)
plt.ylabel("Percent Survived")
plt.title("Titanic Survivability by Gender")
plt.show()

Based on this data, female passengers were far more likely to survive than male passengers, which suggests they were given preference for the lifeboats.
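The gender comparison can be condensed with the same groupby pattern. A sketch, assuming the Sex column holds the strings "male" and "female" (as the filters above do), with a hypothetical sample in place of titanic_df:

```python
import pandas as pd

# Hypothetical sample standing in for titanic_df
sample_df = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'female', 'male'],
    'Survived': [0, 1, 1, 1, 0],
})

# One row per value of Sex: number of passengers and proportion who survived
survival_by_sex = sample_df.groupby('Sex')['Survived'].agg(['count', 'mean'])
print(survival_by_sex)
```

groupby sorts its keys, so "female" comes before "male" in the result; reorder with `.loc[['male', 'female']]` if the chart should match the lab's Male, Female order.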

Examine the Effect Passenger Class Had on Survivability

Finally, let's use our information to figure out if a passenger's class had anything to do with their survivability:

  1. Look at the information for 1st class passengers:
passengers_class_1 = titanic_df[titanic_df.Pclass == 1]
passengers_class_1_survived = passengers_class_1[passengers_class_1.Survived == 1]
passengers_class_1_percent_survived = passengers_class_1_survived.Pclass.count() / passengers_class_1.Pclass.count()
  2. Enter the following code to look at 2nd class survivability:
passengers_class_2 = titanic_df[titanic_df.Pclass == 2]
passengers_class_2_survived = passengers_class_2[passengers_class_2.Survived == 1]
passengers_class_2_percent_survived = passengers_class_2_survived.Pclass.count() / passengers_class_2.Pclass.count()
  3. Enter the following code to look at 3rd class survivability:
passengers_class_3 = titanic_df[titanic_df.Pclass == 3]
passengers_class_3_survived = passengers_class_3[passengers_class_3.Survived == 1]
passengers_class_3_percent_survived = passengers_class_3_survived.Pclass.count() / passengers_class_3.Pclass.count()
  4. Print the number of passengers and the proportion who survived in each class:
print(f'Class 1:\t{passengers_class_1.Pclass.count()} - {passengers_class_1_percent_survived:.2f}')
print(f'Class 2:\t{passengers_class_2.Pclass.count()} - {passengers_class_2_percent_survived:.2f}')
print(f'Class 3:\t{passengers_class_3.Pclass.count()} - {passengers_class_3_percent_survived:.2f}')
  5. Using these results, create a bar chart to review the information:
groups = ('Class 1', 'Class 2', 'Class 3')
percentages = [
    passengers_class_1_percent_survived,
    passengers_class_2_percent_survived,
    passengers_class_3_percent_survived,
]
plt.bar(groups, percentages, align='center', alpha=0.5)
plt.ylabel("Percent Survived")
plt.title("Titanic Survivability by Passenger Class")
plt.show()

From the bar graph, we can see that 1st class passengers were more likely to survive. Whether that was because they were closer to the lifeboats or because of a genuine preference cannot be determined from this data. Still, it gives a general idea of who predominantly survived, which answers the question.
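All three questions follow one pattern: group the passengers by a column and take the mean of Survived. A hypothetical helper, `survival_rates` (not part of the lab or of pandas), captures that pattern so it can be reused for Pclass, Sex, or a binned age column. It assumes a 0/1 Survived column, as in titanic.csv; a small hypothetical sample stands in for titanic_df:

```python
import pandas as pd


def survival_rates(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return the passenger count and survival proportion for each value of `column`.

    Assumes `df` has a 0/1 'Survived' column.
    """
    grouped = df.groupby(column)['Survived']
    return pd.DataFrame({
        'passengers': grouped.size(),      # rows in each group
        'survival_rate': grouped.mean(),   # mean of 0/1 values = proportion survived
    })


# Hypothetical sample standing in for titanic_df
sample_df = pd.DataFrame({
    'Pclass': [1, 1, 2, 2, 3, 3, 3, 3],
    'Survived': [1, 1, 1, 0, 0, 0, 1, 0],
})

by_class = survival_rates(sample_df, 'Pclass')
print(by_class)
```

On the real data, `survival_rates(titanic_df, 'Pclass')['survival_rate'].plot.bar()` would draw a chart like the one above directly from computed values.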

Conclusion

Congratulations — we've answered the questions and completed the lab!