Using AWS Step Functions to Manage a Long-Running Process

Hands-On Lab

Length: 01:00:00

Difficulty: Intermediate

Lambda functions are a great way to create serverless architectures within AWS. But managing and orchestrating them can be difficult when we use many functions within a pipeline. Managing long-running asynchronous processes is also a problem. Lambda can trigger processes to start, but we should avoid having them wait for long-running processes (more than a few minutes) to conclude. AWS Step Functions is a solution to both these problems. In this hands-on lab, we will use AWS Step Functions to manage Lambda functions and monitor a long-running process — Amazon Transcribe — to trigger a subsequent action when a transcription job is complete.

Introduction

In this hands-on lab, we will use AWS Step Functions to manage Lambda functions and monitor a long-running process — Amazon Transcribe — to trigger a subsequent action when a transcription job is complete.

Solution

Log in to the live AWS environment using the cloud_user credentials provided, and make sure you're in the us-east-1 (N. Virginia) region throughout the lab.

All the resources for this lab, including sample audio files to transcribe, can be found on GitHub.

Note: The main headers in this lab guide match up with the video titles, rather than the task titles.

Create an IAM Role for Step Functions

Upload Audio File to S3 Bucket and Check Lambda Function

  1. In a new browser tab, navigate to S3.
  2. Click to open the listed bucket.
  3. Upload the sample audio file provided with the lab (or another audio file on your machine, if you'd prefer).
  4. In a new browser tab, navigate to Lambda.
  5. Click the listed lab-lambda-transcribe function.
  6. Click the Monitoring tab. We should see some CloudWatch metrics populating.

Create an IAM Role for Step Functions

  1. In a new browser tab, navigate to IAM.
  2. Click Roles in the left-hand menu.
  3. Click Create role.
  4. Select Step Functions as the trusted entity.
  5. Click Next: Permissions.
  6. On the permissions policies page, confirm that the AWSLambdaRole policy has been selected automatically, and click Next: Tags.
  7. Add the following tag:
    • Key: creator
    • Value: Enter your name.
  8. Click Next: Review.
  9. Enter a Role name of "lab-role-step-functions".
  10. Click Create role.
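
Selecting Step Functions as the trusted entity amounts to attaching a trust policy that lets the Step Functions service assume the role. As a minimal sketch for readers who prefer to see this as data (the helper name build_trust_policy is hypothetical and not part of the lab, and nothing here calls AWS), the policy document can be built and printed like this:

```python
import json


def build_trust_policy(service_principal):
    """Return a trust policy that lets one AWS service assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Step Functions uses the states.amazonaws.com service principal.
step_functions_trust = build_trust_policy("states.amazonaws.com")
print(json.dumps(step_functions_trust, indent=2))
```

Calling the same helper with "lambda.amazonaws.com" would describe the trust relationship of a role that uses Lambda as the trusted entity.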

Create a Simple Step Functions State Machine

  1. In a new browser tab, navigate to Step Functions.
  2. Click Get started.
  3. Select Author with code snippets.
  4. In the Details section, give it a Name of "lab-step-functions".
  5. Leave the State machine definition as-is for now.
  6. Click Next.
  7. Under IAM role for executions, select Choose an existing IAM role.
  8. Select I will use an existing role.
  9. In the Existing IAM roles dropdown, make sure lab-role-step-functions is selected.
  10. Click Create state machine.
  11. Once it's created, copy the listed ARN and paste it into a text file. We'll need it in a few minutes.

Create an IAM Role and Lambda Function

Create IAM Role

  1. Back in the IAM console tab, click Create role.
  2. Select Lambda as the trusted entity.
  3. Click Next: Permissions.
  4. On the permissions policies page, search for (in the Filter policies box) and select each of the following managed policies:
    • AWSStepFunctionsFullAccess
    • CloudWatchLogsFullAccess
  5. Click Next: Tags.
  6. Add the following tag:
    • Key: creator
    • Value: Enter your name.
  7. Click Next: Review.
  8. Enter a Role name of "lab-role-lambda-step-trigger".
  9. Click Create role.

Create Lambda Function

  1. Back in the Lambda console tab, navigate back to Functions.

  2. Click Create function.

  3. Make sure the Author from scratch option at the top is selected, and then use the following settings:

    • Function name: lab-lambda-step-trigger
    • Runtime: Python 3.6
  4. Expand Choose or create an execution role, and set the following values:

    • Execution role: Use an existing role
    • Existing role: lab-role-lambda-step-trigger
  5. Click Create function.

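  6. In the Function code section, delete the boilerplate code and paste in the following (also found on GitHub):

    import boto3
    import os
    import json
    
    # Create the client once, outside the handler, so warm invocations reuse it.
    stepfunctions = boto3.client('stepfunctions')
    
    def lambda_handler(event, context):
    
        # Pull the bucket name and object key out of the S3 event notification.
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = event['Records'][0]['s3']['object']['key']
    
        # Named execution_input to avoid shadowing the built-in input().
        execution_input = {
            "Bucket": bucket,
            "Key": key
        }
    
        # Start the state machine whose ARN is stored in the STATEMACHINEARN environment variable.
        response = stepfunctions.start_execution(
            stateMachineArn=os.environ['STATEMACHINEARN'],
            input=json.dumps(execution_input, default=str)
        )
    
        # default=str serializes the datetime value (startDate) in the response.
        return json.dumps(response, default=str)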
  7. Click Save.

  8. Scroll down under the function code, and create an environment variable:

    • Key: STATEMACHINEARN
    • Value: Paste in the ARN of the state machine you copied a few minutes ago.
  9. Click Save.
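
To see what the trigger function pulls out of an S3 event, the extraction step can be tried locally without AWS. This is a sketch under stated assumptions: the sample event is trimmed to the two fields the handler reads, the bucket and file names are made up, and build_execution_input is a hypothetical helper. S3 URL-encodes object keys in event notifications, so a file name containing spaces arrives with + signs; the unquote_plus call is an optional addition that the lab's function does not include.

```python
import json
from urllib.parse import unquote_plus


def build_execution_input(event):
    """Extract the bucket and decoded object key from an S3 event notification."""
    record = event["Records"][0]
    return {
        "Bucket": record["s3"]["bucket"]["name"],
        # Keys are URL-encoded in the notification; decode before reuse.
        "Key": unquote_plus(record["s3"]["object"]["key"]),
    }


# Trimmed, made-up event: only the fields read above are present.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-input-bucket"},
                "object": {"key": "my+sample+audio.mp3"},
            }
        }
    ]
}

execution_input = build_execution_input(sample_event)
print(json.dumps(execution_input))
```

The printed JSON is the kind of string the handler passes as the input argument of start_execution.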

Update the S3 Trigger to Call the Lambda Trigger Function

  1. In the S3 console tab, click to open the input-... bucket (if you aren't already in the bucket from earlier).
  2. Click the Properties tab.
  3. Click the Events card.
  4. Select the existing event, and click Delete.
  5. Click Add notification, and set the following properties:
    • Name: trigger-step
    • Events: All object create events
    • Send to: Lambda Function
    • Lambda: lab-lambda-step-trigger
    • Click Save.
  6. Click the Overview tab.
  7. Select the audio file you uploaded earlier.
  8. Click Actions > Rename.
  9. Add "-1" to the end of the file name, and click Save.
  10. In the Step Functions console tab, refresh the Executions section, where we should see the successful execution of our state machine listed.
  11. Click to open the listed execution to see a summary of what happened. We should see our bucket name and audio file name under Input in the Step details section.
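
The notification added in step 5 can also be written down as data. The sketch below builds the dictionary that boto3's put_bucket_notification_configuration accepts as its NotificationConfiguration argument. The account ID in the function ARN is a placeholder and build_notification_configuration is a hypothetical helper; the code only builds and prints the structure and does not call AWS.

```python
import json


def build_notification_configuration(function_arn, name="trigger-step"):
    """Describe an 'All object create events' notification that targets a Lambda function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": name,
                "LambdaFunctionArn": function_arn,
                # The wildcard covers every object-create event type.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


# Placeholder ARN: substitute the ARN of lab-lambda-step-trigger in your account.
notification = build_notification_configuration(
    "arn:aws:lambda:us-east-1:123456789012:function:lab-lambda-step-trigger"
)
print(json.dumps(notification, indent=2))
```

Renaming a file in the console results in an object under a new key, which is why each rename in this lab produces another object-create event and another execution.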

Update the State Machine to Call Our Existing Lambda Function

Update the State Machine Definition

  1. Click Edit state machine.
  2. Using the Generate code snippet menu, select AWS Lambda: Invoke a function.
  3. With Select function from a list selected, select the full ARN for the lab-lambda-transcribe function.
  4. Click Copy to clipboard to copy the definition snippet to your computer's clipboard.
  5. Carefully paste this definition snippet into the main definition, replacing the existing HelloWorld states (everything between the first curly bracket after "States": and the second-to-last curly bracket).

>Note: To double-check what this should look like, see the "Update the State Machine to Call Our Existing Lambda Function" video on the lab page, starting at the 3:25 timestamp.

  6. Fix the validation errors:
    • Change the name (Invoke Lambda function) to transcribe.
    • Update the StartAt value to transcribe.
    • Change the Next key and value to "End" : true.
  7. Click Save > Save anyway.
  8. In the S3 console tab, select the audio file.
  9. Click Actions > Rename.
  10. Add "-2" to the end of the file name, and click Save.
  11. In the Step Functions console tab, navigate back to the state machine.
  12. Refresh the Executions section, where we should see the failed execution of our state machine.
  13. Click to open the new execution to see a summary of what happened. The transcribe box in the Visual workflow section should be red.
  14. Under Step details, expand Exception to see the problem: Our Lambda function is expecting an S3 event, not the new format being sent in by the trigger.
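
After these edits, the definition should have roughly the shape below. This is a sketch written as a Python dictionary so it can be checked locally. Several parts are assumptions: the function ARN is a placeholder, the Parameters block reflects what the console snippet typically generates for a Lambda invoke and may differ in your console, and check_definition is a hypothetical helper that covers only the kind of validation errors fixed in this section, not the full set of rules the console enforces.

```python
import json

definition = {
    "StartAt": "transcribe",
    "States": {
        "transcribe": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                # Placeholder ARN: use the one selected from the function list.
                "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:lab-lambda-transcribe",
                "Payload": {"Input.$": "$"},
            },
            "End": True,
        }
    },
}


def check_definition(defn):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    states = defn.get("States", {})
    if defn.get("StartAt") not in states:
        problems.append("StartAt does not name an existing state")
    for name, state in states.items():
        # Choice, Succeed, and Fail states do not use Next/End directly.
        if state.get("Type") in ("Choice", "Succeed", "Fail"):
            continue
        has_next = "Next" in state
        has_end = state.get("End") is True
        if has_next == has_end:
            problems.append(f"{name}: needs exactly one of Next or End")
        elif has_next and state["Next"] not in states:
            problems.append(f"{name}: Next points at unknown state {state['Next']}")
    return problems


problems = check_definition(definition)
print(json.dumps(definition, indent=2))
print(problems)
```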

Update the Lambda Function

  1. In the Lambda console tab, select the lab-lambda-transcribe function.

  2. In the Function code section, replace the three lines that handle the incoming "event" (starting record =, s3bucket =, and s3object =) with the following two lines:

    s3bucket = event['Input']['Bucket']
    s3object = event['Input']['Key']
  3. Click Save.

  4. In the S3 console tab, select the audio file.

  5. Click Actions > Rename.

  6. Add "-3" to the end of the file name, and click Save.

  7. In the Step Functions console tab, navigate back to the state machine.

  8. Refresh the Executions section, where we should see the successful execution of our state machine.

  9. Click to open the new execution to see a summary of what happened. The transcribe box should now be green.

  10. Under Step details, expand Output. Scroll to see Payload, which includes TranscriptionJobName and our audio file name.
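
The switch from event['Records'] to event['Input'] follows from how the task state passes data to Lambda. Assuming the generated snippet sets the Payload to {"Input.$": "$"}, the entire state input is placed under an Input key before the function is invoked. The sketch below imitates that one substitution locally. It is a simplified illustration with made-up values, not the Step Functions implementation, and apply_payload_template is a hypothetical helper that supports only a top-level "$" reference.

```python
def apply_payload_template(template, state_input):
    """Imitate how a key ending in '.$' is replaced by the referenced input."""
    payload = {}
    for key, value in template.items():
        if key.endswith(".$"):
            if value != "$":
                raise ValueError("this sketch only supports the '$' reference")
            # Drop the '.$' suffix and substitute the whole state input.
            payload[key[:-2]] = state_input
        else:
            payload[key] = value
    return payload


# Made-up state input in the format produced by the trigger function.
state_input = {"Bucket": "example-input-bucket", "Key": "sample-3.mp3"}
event = apply_payload_template({"Input.$": "$"}, state_input)

# These are the two lookups used in lab-lambda-transcribe.
s3bucket = event["Input"]["Bucket"]
s3object = event["Input"]["Key"]
print(s3bucket, s3object)
```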

Make Further Changes to the State Machine to Manage the Long-Running Process

Add a Wait State and Choice State to the Step Function

  1. In the Step Functions console tab, scroll up and click Edit state machine.

  2. Carefully place your cursor after the curly bracket after "End": true. This location is represented here by the X:

            ...
            },
            "End": true
        }X
    ...
  3. Where the X is located, paste in the following (also found on GitHub):

    ,
        "transcribe-wait":{
            "Type":"Wait",
            "Seconds":2,
            "Next":"transcribe-status"
        },
        "transcribe-status": {
            "Type": "Pass",
            "Next": "transcribe-complete"
        },
        "transcribe-complete":{
            "Type":"Choice",
            "Choices":[
                {
                    "Variable":"$.Payload.TranscriptionJobStatus",
                    "StringEquals":"COMPLETED",
                    "Next":"success"
                },
                {
                    "Variable":"$.Payload.TranscriptionJobStatus",
                    "StringEquals":"FAILED",
                    "Next":"error"
                }
            ],
            "Default":"transcribe-wait"  
        },
        "success": {
            "Type": "Pass",
            "End": true
        },
        "error": {
            "Type": "Pass",
            "End": true
        }
  4. Fix the validation error by changing the first "End": true in the transcribe state to "Next": "transcribe-wait".

  5. Save the state machine.
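
The wait, status, and choice states together form a polling loop. As a rough local model of the routing (not how Step Functions evaluates it internally), the sketch below mirrors the two Choices rules and the Default from the definition above. The function names are hypothetical, and the list of statuses passed to run_until_done stands in for what successive status checks might report.

```python
# The two rules and the default copied from the transcribe-complete state.
CHOICES = [
    {"StringEquals": "COMPLETED", "Next": "success"},
    {"StringEquals": "FAILED", "Next": "error"},
]
DEFAULT = "transcribe-wait"


def next_state(state_input):
    """Pick the next state from $.Payload.TranscriptionJobStatus."""
    status = state_input["Payload"]["TranscriptionJobStatus"]
    for rule in CHOICES:
        if status == rule["StringEquals"]:
            return rule["Next"]
    return DEFAULT


def run_until_done(statuses):
    """Walk the loop once per reported status and record the states visited."""
    visited = []
    for status in statuses:
        visited += ["transcribe-wait", "transcribe-status", "transcribe-complete"]
        target = next_state({"Payload": {"TranscriptionJobStatus": status}})
        if target != DEFAULT:
            visited.append(target)
            break
    return visited


path = run_until_done(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
print(path)
```

Any status other than COMPLETED or FAILED, such as QUEUED or IN_PROGRESS, falls through to the default and sends the execution back to the wait state.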

Create a Lambda Function to Get the Status of an Amazon Transcribe Task

  1. In the Lambda console tab, navigate to Functions.

  2. Click Create function.

  3. Make sure the Author from scratch option at the top is selected, and then use the following settings:

    • Function name: lab-lambda-status-checker
    • Runtime: Python 3.6
  4. Expand Choose or create an execution role, and set the following values:

    • Execution role: Use an existing role
    • Existing role: lab-role-lambda-us-east-1
  5. Click Create function.

  6. In the Function code section, delete the boilerplate code and paste in the following (also found on GitHub):

    import boto3
    
    # Create the client once, outside the handler, so warm invocations reuse it.
    transcribe = boto3.client('transcribe')
    
    def lambda_handler(event, context):
    
        # The state machine passes the previous state's output under "Input";
        # the job name sits in that output's "Payload".
        payload = event['Input']['Payload']
        transcriptionJobName = payload['TranscriptionJobName']
    
        response = transcribe.get_transcription_job(
            TranscriptionJobName=transcriptionJobName
        )
    
        transcriptionJob = response['TranscriptionJob']
    
        # Fall back to "none" when the response does not include a transcript URI yet.
        transcriptFileUri = transcriptionJob.get('Transcript', {}).get('TranscriptFileUri', 'none')
    
        return {
            'TranscriptFileUri': transcriptFileUri,
            'TranscriptionJobName': transcriptionJobName,
            'TranscriptionJobStatus': transcriptionJob['TranscriptionJobStatus']
        }
  7. Click Save.
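
The response handling in this function can be exercised locally before wiring it into the state machine. The sketch below is an illustration, not the lab's code: summarize_job is a hypothetical helper, the two responses are made-up and trimmed to the fields the function reads, and the job name is taken from the response rather than from the incoming payload.

```python
def summarize_job(response):
    """Reduce a get_transcription_job-style response to the three returned fields."""
    job = response["TranscriptionJob"]
    return {
        # Fall back to "none" when no transcript URI is present.
        "TranscriptFileUri": job.get("Transcript", {}).get("TranscriptFileUri", "none"),
        "TranscriptionJobName": job["TranscriptionJobName"],
        "TranscriptionJobStatus": job["TranscriptionJobStatus"],
    }


# Made-up responses, trimmed to the fields used above.
in_progress = {
    "TranscriptionJob": {
        "TranscriptionJobName": "example-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
    }
}
completed = {
    "TranscriptionJob": {
        "TranscriptionJobName": "example-job",
        "TranscriptionJobStatus": "COMPLETED",
        "Transcript": {"TranscriptFileUri": "https://example.com/transcript.json"},
    }
}

pending_summary = summarize_job(in_progress)
done_summary = summarize_job(completed)
print(pending_summary)
print(done_summary)
```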

Update State Machine to Call New Lambda Function

  1. In the Step Functions console tab, click Edit state machine.
  2. Using the Generate code snippet menu, select AWS Lambda: Invoke a function.
  3. With Select function from a list selected, select the full ARN for the lab-lambda-status-checker function.
  4. Click Copy to clipboard to copy the definition snippet to your computer's clipboard.
  5. Within the definition, locate the Pass state called transcribe-status. Carefully select the entire transcribe-status step (leaving the last comma there), and paste the snippet from your clipboard over it.
  6. Fix the validation issues:
    • Rename the pasted step to transcribe-status.
    • Change the value of the Next attribute to transcribe-complete.

>Note: To double-check what this should look like, see the "Create a Lambda Function to Get the Status of an Amazon Transcribe Task" video on the lab page, starting at the 3:15 timestamp.

  7. Click Save > Save anyway.

Update State Machine to Call Our Checker Function and Test

  1. In the S3 console tab, select the audio file.
  2. Click Actions > Rename.
  3. Add "-4" to the end of the file name, and click Save.
  4. In the Step Functions console tab, navigate back to the state machine.
  5. Refresh the Executions section, where we should see there's one currently running.
  6. Click to open the new execution to see a summary of what happened. We should see the full flow of what we've created.
  7. Expand Output in the Step details section to see the progress of what's happening.
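
Instead of refreshing the console, the same progress check could be scripted. The sketch below is one possible approach rather than part of the lab: wait_for_execution is a hypothetical helper that polls describe_execution (a real boto3 Step Functions client call) on whatever client it is given. To keep the example runnable without AWS, it is driven here by a small fake client and a placeholder execution ARN; in practice the client would come from boto3.client('stepfunctions').

```python
import time

# Statuses after which an execution will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(client, execution_arn, poll_seconds=2, max_polls=30, sleep=time.sleep):
    """Poll describe_execution until a terminal status or until max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
    return status


class FakeStepFunctionsClient:
    """Stand-in for the boto3 client that replays a fixed list of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_execution(self, executionArn):
        return {"executionArn": executionArn, "status": next(self._statuses)}


fake_client = FakeStepFunctionsClient(["RUNNING", "RUNNING", "SUCCEEDED"])
final_status = wait_for_execution(
    fake_client,
    "arn:aws:states:us-east-1:123456789012:execution:lab-step-functions:example",  # placeholder
    sleep=lambda seconds: None,  # skip real waiting in this local demo
)
print(final_status)
```

Because both the success and error states in this lab are Pass states that end the execution, the execution status alone does not say whether the transcription job completed or failed; the TranscriptionJobStatus in the execution output does.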

Conclusion

You did it! Congratulations on successfully completing this hands-on lab.