Forum Export: Exporting Data as JSON and CSV Using Python

Hands-On Lab

 

Photo of Keith Thompson

Keith Thompson

DevOps Training Architect II in Content

Length

00:45:00

Difficulty

Intermediate

Databases are great for storing and allowing us to query information, but the default shape and SQL are not always the best options for people within our organization. If we know the schema of our database tables, we can create classes to model our data and then massage the data into other shapes that are more helpful for specific use cases. In this hands-on lab, we'll use an existing SQLAlchemy library to query our forum database and export data as both JSON and CSV for other employees in our organization.

What are Hands-On Labs?

Hands-On Labs are scenario-based learning environments where learners can practice without consequences. Don't compromise a system or waste money on expensive downloads. Practice real-world skills without the real-world risk, no assembly required.

Forum Export: Exporting Data as JSON and CSV

Introduction

Databases are great for storing and allowing us to query information, but the default shape and SQL are not always the best options for people within our organization. If we know the schema of our database tables, we can create classes to model our data and then massage the data into other shapes that are more helpful for specific use cases. In this hands-on lab, we'll use an existing SQLAlchemy library to query our forum database and export data as both JSON and CSV for other employees in our organization.

Connect to the Lab

Option 1: Connect with the Visual Studio (VS) Code Editor

  1. Open your terminal application, and run the following command:
    ssh cloud_user@PUBLIC_IP_ADDRESS
  2. Enter yes at the prompt.
  3. Enter your cloud_user password at the prompt.
  4. Run exit to close the connection.
  5. Run the following command:
    ssh-copy-id cloud_user@PUBLIC_IP
  6. Enter your password at the prompt.
  7. Open Visual Studio Code.
  8. In the seach bar at the top, enter cloud_user@PUBLIC_IP.
  9. Once you've connected, click the square Extensions icon in the left sidebar.
  10. Under Local - Installed, scroll down to Python and click Install on SSH.
  11. Click Reload to make the changes take effect.

Option 2: Connect with Your Local Machine

  1. Open your terminal application, and run the following command (remember to replace PUBLIC_IP with the public IP you were provided on the lab instructions page):
    ssh cloud_user@PUBLIC_IP
  2. Type yes at the prompt.
  3. Enter your cloud_user password at the prompt.

Create a Virtualenv and Install the forum Package

  1. Change to the ~/forum directory.
    cd ~/forum
  2. Create a new Python 3.7 virtualenv.
    pipenv install
  3. Activate the virtualenv.
    pipenv shell
  4. Install the forum package from source.
    pip install -e .

Write the Posts Query in the export_csv.py Script

  1. Create an export_csv.py file.
    touch export_csv.py
  2. Open the file, and add the following contents:

    import os
    
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.sql import func
    
    db_url = os.environ["DB_URL"]
    engine = create_engine(db_url)
    Session = sessionmaker(bind=engine)
    session = Session()
    
    from forum.models import Post, Comment
    
    comments = (
      session.query(Comment.post_id, func.count("*").label("comments"))
      .group_by(Comment.post_id)
      .subquery()
    )
    
    negative_comments = (
      session.query(Comment.post_id, func.count("*").label("negative_comments"))
      .filter(Comment.sentiment == "negative")
      .group_by(Comment.post_id)
      .subquery()
    )
    
    positive_comments = (
      session.query(Comment.post_id, func.count("*").label("positive_comments"))
      .filter(Comment.sentiment == "positive")
      .group_by(Comment.post_id)
      .subquery()
    )
    
    final_query = (
      session.query(
          Post,
          comments.c.comments,
          negative_comments.c.negative_comments,
          positive_comments.c.positive_comments,
      )
      .outerjoin(comments, Post.id == comments.c.post_id)
      .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
      .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
    )

Add the CSV Export to export_csv.py

  1. Add the following to the end of the export_csv.py file:

    import csv
    
    csv_file = open("forum_export.csv", mode="w")
    fields = ["id", "body", "author_name", "created_on","comments", "positive_comments", "negative_comments"]
    csv_writer = csv.DictWriter(csv_file, fieldnames=fields)
    csv_writer.writeheader()
    
    for post, comments, negative_comments, positive_comments in final_query:
      csv_writer.writerow({
          "id": post.id,
          "body": post.body,
          "author_name": post.author_name,
          "created_on": post.created_on.date(),
          "comments": comments or 0,
          "positive_comments": positive_comments or 0,
          "negative_comments": negative_comments or 0
      })
    
    csv_file.close()
  2. Save and exit the export_csv.py file.
  3. In your terminal (with the virtualenv active), run the following command:
    DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_csv.py
  4. View the export file to verify that the script is working as expected.
    less forum_export.csv

Create the export_json.py Script

  1. Make a copy of export_csv.py, and name it export_json.py.
    cp export_csv.py export_json.py
  2. Edit the file, and replace the import csv section with the following:

    import json
    
    items = []
    
    for post, comments, negative_comments, positive_comments in final_query:
      items.append({
              "id": post.id,
              "body": post.body,
              "author_name": post.author_name,
              "created_on": str(post.created_on.date()),
              "comments": comments or 0,
              "positive_comments": positive_comments or 0,
              "negative_comments": negative_comments or 0,
          })
    
    with open("forum_export.json", mode="w") as f:
      json.dump(items, f)
  3. Save and exit the export_json.py file.
  4. In your terminal (with the virtualenv active), run the following command:
    DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_json.py
  5. View the export file to verify that the script is working as expected.
    less forum_export.json

Conclusion

Congratulations, you've successfully completed this hands-on lab!