
AWS Step Functions

AWS Step Functions is a serverless orchestration service that lets you combine AWS Lambda functions and other AWS services into business-critical workflows.

Order Processing Workflow Example

[Figure: Step Functions order processing workflow]

Quick Start

#r "../src/bin/Release/net8.0/publish/Amazon.JSII.Runtime.dll"
#r "../src/bin/Release/net8.0/publish/Constructs.dll"
#r "../src/bin/Release/net8.0/publish/Amazon.CDK.Lib.dll"
#r "../src/bin/Release/net8.0/publish/FsCDK.dll"

open FsCDK
open Amazon.CDK
open Amazon.CDK.AWS.StepFunctions
open Amazon.CDK.AWS.StepFunctions.Tasks
open Amazon.CDK.AWS.Lambda
open Amazon.CDK.AWS.Logs

Basic State Machine

Create a simple state machine that orchestrates Lambda functions.

stack "BasicStateMachine" {
    // Create Lambda functions
    let processFunc =
        lambda "ProcessData" {
            runtime Runtime.DOTNET_8
            handler "App::Process"
            code "./lambda"
            ()
        }

    let validateFunc =
        lambda "ValidateData" {
            runtime Runtime.DOTNET_8
            handler "App::Validate"
            code "./lambda"
        }

    // Create log group for state machine
    let! logGroup = logGroup "/aws/vendedlogs/states/MyStateMachine" { retention RetentionDays.ONE_MONTH }

    // Create state machine
    // Note: State definitions must be created using CDK Tasks
    // Example:
    //   let validateTask = LambdaInvoke(scope, "ValidateTask", ...)
    //   let processTask = LambdaInvoke(scope, "ProcessTask", ...)
    //   let definition = Chain.Start(validateTask).Next(processTask)

    let stateMachine =
        stepFunction "DataPipeline" {
            comment "Validates and processes data"
            // definition definition
            logDestination logGroup
            timeout (Duration.Hours(2.0))
        }

    ()
}
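The commented-out steps in the example above can be sketched with plain CDK task constructs. This is a minimal sketch, not FsCDK's own API: `buildDefinition` is a hypothetical helper, and it assumes the caller already has a CDK `scope` and two `IFunction` values (how the FsCDK builders expose these is not shown in this page).

```fsharp
open Amazon.CDK.AWS.Lambda
open Amazon.CDK.AWS.StepFunctions
open Amazon.CDK.AWS.StepFunctions.Tasks
open Constructs

/// Hypothetical helper: builds a two-step definition (validate, then process).
/// `scope`, `validateFn` and `processFn` are assumed to be supplied by the caller.
let buildDefinition (scope: Construct) (validateFn: IFunction) (processFn: IFunction) : IChainable =
    let validateTask =
        LambdaInvoke(
            scope,
            "ValidateTask",
            // OutputPath "$.Payload" passes only the Lambda result to the next state
            LambdaInvokeProps(LambdaFunction = validateFn, OutputPath = "$.Payload")
        )

    let processTask =
        LambdaInvoke(
            scope,
            "ProcessTask",
            LambdaInvokeProps(LambdaFunction = processFn, OutputPath = "$.Payload")
        )

    // Run validation first, then processing
    Chain.Start(validateTask).Next(processTask) :> IChainable
```

The resulting chain would then be handed to the state machine, presumably through the builder's `definition` operation that is commented out above, or through `DefinitionBody.FromChainable` when using the CDK `StateMachine` directly.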

Standard vs Express State Machines

Step Functions offers two types of state machines, Standard and Express:

stack "StateMachineTypes" {
    let! logGroup =
        logGroup "Logs" {
            retention RetentionDays.ONE_WEEK
            ()
        }

    // Standard: Long-running, exactly-once execution
    let standardSM =
        stepFunction "StandardWorkflow" {
            stateMachineType StateMachineType.STANDARD
            comment "Long-running workflow with exactly-once semantics"
            logDestination logGroup
            timeout (Duration.Days(1.0))
        }

    // Express: Short-lived, at-least-once execution, cheaper
    let expressSM =
        stepFunction "ExpressWorkflow" {
            stateMachineType StateMachineType.EXPRESS
            comment "High-volume, short-duration workflow"
            logDestination logGroup
            timeout (Duration.Minutes(5.0))
        }

    ()
}

Error Handling and Retry Logic

Implement robust error handling with retries and fallbacks.

Note: Error handling and retry logic must be configured on individual tasks using CDK directly.
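As a rough illustration of configuring retries and a fallback on an individual task with CDK, the sketch below adds a retry policy and a catch-all handler to a Lambda task. `buildResilientTask`, the state names, and the retry values are assumptions chosen for illustration.

```fsharp
open System
open Amazon.CDK
open Amazon.CDK.AWS.Lambda
open Amazon.CDK.AWS.StepFunctions
open Amazon.CDK.AWS.StepFunctions.Tasks
open Constructs

/// Hypothetical helper: a Lambda task with retries and a catch-all fallback.
let buildResilientTask (scope: Construct) (processFn: IFunction) : IChainable =
    let processTask =
        LambdaInvoke(scope, "ProcessWithRetry", LambdaInvokeProps(LambdaFunction = processFn))

    // Retry failed task attempts up to 3 times, doubling the wait each time (2s, 4s, 8s)
    processTask.AddRetry(
        RetryProps(
            Errors = [| "States.TaskFailed" |],
            Interval = Duration.Seconds(2.0),
            MaxAttempts = Nullable(3.0),
            BackoffRate = Nullable(2.0)
        )
    )
    |> ignore

    // Fallback state reached once retries are exhausted or another error occurs
    let failure =
        Fail(scope, "ProcessingFailed", FailProps(Error = "ProcessingError", Cause = "Retries exhausted"))

    // "States.ALL" matches any error; the error details are stored under $.error
    processTask.AddCatch(failure, CatchProps(Errors = [| "States.ALL" |], ResultPath = "$.error"))
    |> ignore

    processTask :> IChainable
```

Retries are evaluated before catchers, so the `Fail` state is only reached when the retry policy no longer applies.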

Parallel Execution

Execute multiple tasks concurrently.

Note: Parallel states must be created using CDK Parallel construct.
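A minimal sketch of the CDK `Parallel` construct follows. `buildParallel` is a hypothetical helper, and the two branches are assumed to be states or chains built elsewhere.

```fsharp
open Amazon.CDK.AWS.StepFunctions
open Constructs

/// Hypothetical helper: runs two independent branches concurrently.
let buildParallel (scope: Construct) (first: IChainable) (second: IChainable) : IChainable =
    // The outputs of both branches are collected as an array under $.results
    let parallelState =
        Parallel(scope, "RunInParallel", ParallelProps(ResultPath = "$.results"))

    parallelState.Branch(first, second) :> IChainable
```

The state completes only when every branch has finished; if any branch fails, the whole `Parallel` state fails unless a catcher is attached.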

Choice States

Implement conditional branching in workflows.

Note: Choice states must be created using CDK Choice construct.
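The following sketch shows one way to branch with the CDK `Choice` construct. The helper name, the `$.total` field, and the threshold are illustrative assumptions. A type abbreviation is used so the CDK class cannot be confused with F#'s built-in `Choice<_,_>` type.

```fsharp
open Amazon.CDK.AWS.StepFunctions
open Constructs

// Abbreviation to avoid any ambiguity with FSharp.Core's Choice<'T1,'T2>
type SfnChoice = Amazon.CDK.AWS.StepFunctions.Choice

/// Hypothetical helper: routes large orders and small orders to different states.
let buildChoice (scope: Construct) (onLarge: IChainable) (onSmall: IChainable) : IChainable =
    let choice = SfnChoice(scope, "IsLargeOrder")

    choice
        // Taken when the numeric field $.total is greater than 1000
        .When(Condition.NumberGreaterThan("$.total", 1000.0), onLarge)
        // Default branch when no condition matches
        .Otherwise(onSmall)
    :> IChainable
```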

Map States

Process arrays of items in parallel.

Note: Map states must be created using CDK Map construct.
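A minimal sketch of the CDK `Map` construct is shown below. `buildMap`, the `$.items` path, and the concurrency limit are assumptions; `ItemProcessor` is assumed to be available, which requires a reasonably recent CDK v2 release (older releases used `Iterator`). The type abbreviation avoids a clash with F#'s own `Map` type.

```fsharp
open System
open Amazon.CDK.AWS.StepFunctions
open Constructs

// Abbreviation to avoid any ambiguity with F#'s Map<'Key,'Value>
type SfnMap = Amazon.CDK.AWS.StepFunctions.Map

/// Hypothetical helper: applies `perItem` to each element of the array at $.items.
let buildMap (scope: Construct) (perItem: IChainable) : IChainable =
    let mapState =
        SfnMap(
            scope,
            "ProcessEachItem",
            // At most 5 items are processed concurrently
            MapProps(ItemsPath = "$.items", MaxConcurrency = Nullable(5.0))
        )

    mapState.ItemProcessor(perItem) :> IChainable
```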

Wait States

Add delays between workflow steps.

Note: Wait states must be created using CDK Wait construct.
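As an illustration, a fixed delay can be expressed with the CDK `Wait` construct roughly as follows. `buildWait` and the 30-second delay are assumptions.

```fsharp
open Amazon.CDK
open Amazon.CDK.AWS.StepFunctions
open Constructs

/// Hypothetical helper: pauses for 30 seconds, then continues with `next`.
let buildWait (scope: Construct) (next: IChainable) : IChainable =
    let wait =
        Wait(scope, "WaitThirtySeconds", WaitProps(Time = WaitTime.Duration(Duration.Seconds(30.0))))

    wait.Next(next) :> IChainable
```

`WaitTime` also offers path-based variants (for example `WaitTime.SecondsPath`) when the delay should come from the execution input.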

Integration with AWS Services

Step Functions integrates with many AWS services beyond Lambda:

DynamoDB Integration

Read/write to DynamoDB tables directly from state machines.

SQS Integration

Send messages to SQS queues.

SNS Integration

Publish messages to SNS topics.
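The SQS and SNS integrations described above can be sketched with the CDK task constructs `SqsSendMessage` and `SnsPublish`. The helper name, state names, and the `$.orderId` field are assumptions, and the queue and topic are assumed to be created elsewhere.

```fsharp
open Amazon.CDK.AWS.SNS
open Amazon.CDK.AWS.SQS
open Amazon.CDK.AWS.StepFunctions
open Amazon.CDK.AWS.StepFunctions.Tasks
open Constructs

/// Hypothetical helper: enqueue the order id, then publish a notification.
let buildMessagingSteps (scope: Construct) (queue: IQueue) (topic: ITopic) : IChainable =
    let enqueue =
        SqsSendMessage(
            scope,
            "EnqueueOrder",
            SqsSendMessageProps(
                Queue = queue,
                // Message body is read from the state input
                MessageBody = TaskInput.FromJsonPathAt("$.orderId"),
                // Discard the SQS response so the original input flows on unchanged
                ResultPath = JsonPath.DISCARD
            )
        )

    let notify =
        SnsPublish(
            scope,
            "NotifyOrderReceived",
            SnsPublishProps(
                Topic = topic,
                Message = TaskInput.FromText("Order received"),
                ResultPath = JsonPath.DISCARD
            )
        )

    Chain.Start(enqueue).Next(notify) :> IChainable
```

Because these are direct service integrations, no Lambda function is needed just to forward a message.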

ECS/Fargate Integration

Run containerized tasks.

Human Approval Workflow

Implement workflows that require human approval.
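One common way to model human approval is the task-token callback pattern: the workflow pauses at a task until an external actor reports success or failure with the token. The sketch below assumes a hypothetical Lambda function that forwards the token to an approver, and a `$.reportId` input field.

```fsharp
open System
open Amazon.CDK.AWS.Lambda
open Amazon.CDK.AWS.StepFunctions
open Amazon.CDK.AWS.StepFunctions.Tasks
open Constructs

/// Hypothetical helper: invokes a Lambda and pauses until the task token is returned.
let buildApprovalStep (scope: Construct) (requestApprovalFn: IFunction) : IChainable =
    LambdaInvoke(
        scope,
        "RequestApproval",
        LambdaInvokeProps(
            LambdaFunction = requestApprovalFn,
            // The state waits until SendTaskSuccess or SendTaskFailure is called with the token
            IntegrationPattern = Nullable(IntegrationPattern.WAIT_FOR_TASK_TOKEN),
            // The token must be passed to the function so it can reach the approver
            Payload =
                TaskInput.FromObject(
                    dict [ "taskToken", box JsonPath.TaskToken
                           "reportId", box (JsonPath.StringAt("$.reportId")) ]
                )
        )
    )
    :> IChainable
```

The approver's tooling would later call the Step Functions `SendTaskSuccess` or `SendTaskFailure` API with the stored token to resume the execution.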

Saga Pattern for Distributed Transactions

Implement compensating transactions for microservices.
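A compensating transaction can be approximated by attaching a catcher that routes to an "undo" step. The sketch below is an assumption-laden illustration, not a complete saga: it charges a payment, tries to ship, and refunds if shipping fails. All function and state names are hypothetical.

```fsharp
open Amazon.CDK.AWS.Lambda
open Amazon.CDK.AWS.StepFunctions
open Amazon.CDK.AWS.StepFunctions.Tasks
open Constructs

/// Hypothetical helper: charge, then ship; refund the charge if shipping fails.
let buildSaga (scope: Construct) (chargeFn: IFunction) (shipFn: IFunction) (refundFn: IFunction) : IChainable =
    let invoke (id: string) (fn: IFunction) (resultPath: string) =
        LambdaInvoke(scope, id, LambdaInvokeProps(LambdaFunction = fn, ResultPath = resultPath))

    let charge = invoke "ChargePayment" chargeFn "$.charge"
    let ship = invoke "ShipOrder" shipFn "$.shipment"
    let refund = invoke "RefundPayment" refundFn "$.refund"
    let failed = Fail(scope, "OrderFailed")

    // Compensation: if shipping fails after the charge succeeded, refund and then fail
    ship.AddCatch(refund.Next(failed), CatchProps(Errors = [| "States.ALL" |], ResultPath = "$.error"))
    |> ignore

    Chain.Start(charge).Next(ship) :> IChainable
```

Each forward step that changes state would get its own compensating step in a fuller implementation, executed in reverse order.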

Monitoring and Observability

Step Functions can log execution events to CloudWatch Logs and trace executions with X-Ray.

stack "MonitoredStateMachine" {
    let! logGroup =
        logGroup "DetailedLogs" {
            retention RetentionDays.THREE_MONTHS
            ()
        }

    let sm =
        stepFunction "MonitoredWorkflow" {
            comment "Workflow with full logging and tracing"
            // Full logging (ALL events)
            loggingLevel LogLevel.ALL
            logDestination logGroup
            // X-Ray tracing enabled by default
            tracingEnabled true
        }

    ()
}

Best Practices

Performance

Security

Cost Optimization

Reliability

Operational Excellence

State Machine Types Comparison

| Feature | Standard | Express |
|---|---|---|
| Max Duration | 1 year | 5 minutes |
| Execution Rate | 2,000/second | 100,000/second |
| Pricing | Per state transition | Per execution + duration |
| Execution Semantics | Exactly-once | At-least-once |
| Execution History | Full history (90 days) | CloudWatch Logs only |
| Best For | Long-running, critical | High-volume, short tasks |

Default Settings

The Step Functions builder applies these production-safe defaults:

Note: Logging requires a CloudWatch Log Group destination.

Logging Levels

Helper Functions

FsCDK provides helper functions for common Step Functions patterns:

open StepFunctionHelpers

// State machine types
let standardType = StateMachineTypes.standard
let expressType = StateMachineTypes.express

// Common timeouts
let fiveMin = Timeouts.fiveMinutes
let thirtyMin = Timeouts.thirtyMinutes
let oneHour = Timeouts.oneHour
let oneDay = Timeouts.oneDay

// Logging levels
let allLogs = LoggingLevels.all
let errorLogs = LoggingLevels.error
let noLogs = LoggingLevels.off

Escape Hatch

For advanced scenarios, access the underlying CDK StateMachine:

let smResource =
    stepFunction "MyWorkflow" {
        comment "My workflow"
        logDestination myLogGroup
    }

// Access the CDK StateMachine for advanced configuration
let cdkSM = smResource.StateMachine.Value

// Grant execution permissions (the returned Grant is not needed here)
cdkSM.GrantStartExecution myRole |> ignore
cdkSM.GrantStartSyncExecution myRole |> ignore

// Get state machine ARN
let arn = cdkSM.StateMachineArn

Use Cases

Order Processing

ETL Pipelines

Machine Learning Workflows

Human Approval Workflows

Microservice Orchestration

📚 Learning Resources for AWS Step Functions

AWS Official Documentation

Getting Started:
- AWS Step Functions Developer Guide - Complete documentation
- Step Functions Tutorials - Hands-on learning
- Amazon States Language (ASL) - JSON-based workflow language
- Step Functions Workflow Studio - Visual workflow builder

Core Concepts:
- State Types - Task, Choice, Parallel, Map, Wait, etc.
- Service Integrations - 220+ AWS service integrations
- Error Handling - Retry and Catch patterns
- Standard vs Express Workflows - When to use each

Best Practices:
- Step Functions Best Practices - Official recommendations
- Design Patterns - Common workflow patterns
- Cost Optimization - Pricing guide and cost strategies

Serverless Orchestration Patterns

Yan Cui (The Burning Monk) - Orchestration Expert:
- The Burning Monk Blog - Yan Cui's Step Functions and orchestration insights, including comprehensive serverless best practices and error handling
- Saga Pattern with Step Functions - Distributed transactions
- Step Functions vs EventBridge - Choreography vs Orchestration

AWS Compute Blog - Essential Reading:
- Event-Driven Orchestration - Modern patterns
- Callback Pattern - Human approval workflows
- Map State Deep Dive - Process arrays efficiently
- Wait for Callback with Task Token - Integrate with external systems

Advanced Patterns & Architectures

Saga Pattern (Distributed Transactions):
- Saga Pattern Implementation - Official AWS guide
- Compensating Transactions - Rollback failed operations
- Event Sourcing with Step Functions - Building audit trails

Parallel & Map State Patterns:
- Distributed Map State - Process millions of items
- Dynamic Parallelism - Fan-out patterns
- Batch Processing - Large-scale data processing

Choice & Branching:
- Choice State Examples - Conditional logic
- InputPath, OutputPath, ResultPath - Data flow management
- JSONPath in Step Functions - Extract and transform data

Wait & Timer Patterns:
- Wait State - Fixed or dynamic waits
- Schedule-Based Workflows - Cron-like execution
- Polling Patterns - Wait for external job completion

Step Functions Service Integrations

Direct SDK Integrations (220+ Services):
- Lambda Integration - Invoke functions sync or async
- DynamoDB Integration - Read/write tables directly
- ECS/Fargate Integration - Run containerized tasks
- SNS/SQS Integration - Message pub/sub
- Glue Integration - ETL workflows
- Athena Integration - Query S3 data
- SageMaker Integration - ML training/inference

Optimized Integrations:
- Lambda Optimized - Automatic payload handling
- Service Integrations Deep Dive - Sync vs async patterns

Error Handling & Resilience

Retry Strategies:
- Retry Configuration - ErrorEquals, IntervalSeconds, MaxAttempts, BackoffRate
- Exponential Backoff - Prevent thundering herd
- Error Handling Best Practices - Official AWS guidance on retries and error handling

Catch & Fallback:
- Catch Errors - Handle specific errors
- Fallback Chains - Multiple catch handlers
- States.ALL - Catch-all error handler

Circuit Breaker Pattern:
- Circuit Breaker with Step Functions - Prevent cascading failures
- Health Checks - Monitor external dependencies

Standard vs Express Workflows

When to Use Standard:
- Long-running workflows (up to 1 year)
- Exactly-once execution semantics required
- Need full execution history and visual debugging
- Audit trail is critical
- A lower execution start rate (under 2,000/second) is acceptable

When to Use Express:
- High-volume, short-duration workflows (up to 5 minutes)
- Can tolerate at-least-once execution
- Cost is the primary concern (roughly 75x cheaper in the example below, before duration charges)
- Need high throughput (100,000/second)
- Streaming data processing

Cost Comparison Example:

Standard: $0.025 per 1,000 state transitions
Express: $1.00 per 1 million executions + $0.0000167 per GB-second

For 100 million executions/month with 3 states each:
Standard: (100M * 3 * $0.025) / 1000 = $7,500/month
Express: (100M * $1.00) / 1M + compute = ~$100/month
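The arithmetic above can be reproduced with a few lines of F#. This is a sketch using only the request-based prices quoted in this section; the function names are illustrative, and Express duration (GB-second) charges are deliberately left out because they depend on memory use and run time.

```fsharp
/// Standard workflows: $0.025 per 1,000 state transitions
let standardMonthlyCost (executions: decimal) (transitionsPerExecution: decimal) : decimal =
    executions * transitionsPerExecution * 0.025m / 1000m

/// Express workflows: $1.00 per 1 million executions (duration charges excluded)
let expressRequestCost (executions: decimal) : decimal =
    executions * 1.00m / 1_000_000m

let executions = 100_000_000m

let standard = standardMonthlyCost executions 3m
let express = expressRequestCost executions

// Ratio of Standard cost to Express request cost for this workload
let ratio = standard / express
```

For 100 million executions with 3 transitions each, `standard` evaluates to 7,500 and `express` to 100, a ratio of 75 before Express duration charges are added.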

Monitoring & Observability

CloudWatch Integration:
- Step Functions Metrics - Execution metrics
- CloudWatch Logs - Detailed execution logs
- CloudWatch Alarms - Alert on failures

X-Ray Tracing:
- Enable X-Ray - End-to-end tracing
- Service Map - Visualize workflow dependencies
- Trace Analysis - Find bottlenecks

EventBridge Integration:
- Execution Events - React to workflow events
- Failed Execution Alerts - Automated notifications

Real-World Use Cases

Order Processing:
1. Validate order (Lambda)
2. Check inventory (DynamoDB)
3. Charge payment (External API with callback)
4. Update inventory (DynamoDB)
5. Ship order (SQS)
6. Send confirmation (SNS)
7. On Error: Refund payment, restore inventory

ETL Pipeline:
1. Trigger Glue job (extract)
2. Wait for completion
3. Parallel transform jobs (Map state)
4. Load to Redshift
5. Run validation queries (Athena)
6. Generate reports (Lambda)

ML Training Workflow:
1. Prepare data (Glue)
2. Train model (SageMaker)
3. Evaluate model (Lambda)
4. If accuracy > 95%: Deploy (SageMaker endpoint)
5. Else: Tune hyperparameters, retry
6. Send notification (SNS)

Human Approval Workflow:
1. Submit expense report (Lambda)
2. Wait for manager approval (callback with task token)
3. If approved: Process payment (Lambda)
4. If rejected: Notify employee (SNS)
5. Archive (S3)

Video Tutorials

Beginner:
- Step Functions Tutorial - AWS official introduction
- Building Workflows with Workflow Studio - Visual builder demo
- Step Functions for Beginners - Complete walkthrough

Advanced:
- AWS re:Invent - Step Functions Deep Dive - Annual advanced sessions
- Distributed Map State - Large-scale processing
- Step Functions Best Practices - AWS Serverless Land

Community Tools & Libraries

Infrastructure as Code:
- CDK Patterns for Step Functions - Reusable patterns
- Serverless Framework Plugin - Define workflows in YAML
- SAM Support - Step Functions in SAM

Testing & Development:
- Step Functions Local - Test workflows locally
- LocalStack - Emulate Step Functions
- ASL Validator - Validate state machine definitions

Visualization:
- Step Functions Graph - VS Code extension
- Render ASL as SVG - Generate diagrams from code

Workshops & Hands-On Labs

Official AWS Workshops:
- Step Functions Workshop - Comprehensive hands-on tutorial
- Serverless Patterns - Step Functions patterns collection
- Build a Saga Pattern - Distributed transaction workshop

Community Resources:
- Serverless Land - Step Functions examples and patterns
- AWS Samples GitHub - Official code samples

Recommended Learning Path

Week 1 - Fundamentals:
1. Read Step Functions Developer Guide - First 5 chapters
2. Watch Step Functions Tutorial Video
3. Build your first workflow with FsCDK (examples above)
4. Explore Workflow Studio

Week 2 - Patterns & Best Practices:
1. Study Step Functions Design Patterns
2. Read Step Functions Best Practices
3. Implement error handling with Retry and Catch
4. Learn Service Integrations

Week 3 - Advanced:
1. Implement Saga Pattern
2. Use Map State for parallel processing
3. Add X-Ray tracing
4. Take Step Functions Workshop

Ongoing - Mastery:
- Build complex orchestration patterns
- Optimize costs (Standard vs Express)
- Implement circuit breakers and resilience patterns
- Follow AWS Compute Blog for new features

AWS Experts to Follow

AWS Heroes and community experts who share serverless workflow patterns

AWS Heroes & Advocates:
- Yan Cui - Serverless orchestration expert
  - Twitter/X: @theburningmonk
- Ben Kehoe - Serverless workflow patterns
  - Twitter/X: @ben11kehoe
  - Mastodon: @ben11kehoe@mastodon.social
- Jeremy Daly - Serverless advocate
  - Twitter/X: @jeremy_daly
- Danilo Poccia - AWS Principal Developer Advocate
  - Twitter/X: @danilop
  - Mastodon: @danilop@mastodon.social

AWS Step Functions Team:
- Follow AWS Compute Blog for official updates

Common Pitfalls & Solutions

❌ DON'T:
1. Use Step Functions for high-frequency loops → Use Lambda or Fargate
2. Pass large payloads between states → Use S3 for data, pass S3 keys
3. Ignore error handling → Always add Retry and Catch
4. Use Standard for high-volume, short tasks → Use Express workflows
5. Forget timeouts → Set realistic TimeoutSeconds for each state

✅ DO:
1. Design for idempotency → Same input = same output
2. Use parallel states → Execute independent tasks concurrently
3. Implement compensating transactions → Saga pattern for rollbacks
4. Monitor execution metrics → Set up CloudWatch alarms
5. Use service integrations → Avoid Lambda for simple AWS API calls

FsCDK Step Functions Features

For implementation details, see src/StepFunctions.fs in the FsCDK repository.
