🎮 Execution Control - Managing Your FaaS Functions

Welcome to the execution control guide! Learn how to terminate long-running executions gracefully, set execution timeouts, monitor status and logs, and retry failed executions.

🛑 Execution Termination

Graceful Termination with Checkpoints

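Need to safely stop a long-running execution? Call self.kill_event() at safe points (checkpoints) in your function so that a pending termination request takes effect there rather than in the middle of an operation: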

import dtlpy as dl


class ServiceRunner(dl.BaseServiceRunner):
    def train_model(self, item: dl.Item, progress: dl.Progress):
        # Stand-in for a real model: a list that records completed epochs
        model = list()

        for epoch in range(100):
            # Checkpoint: check for a termination request before each epoch
            self.kill_event()

            # Train for one epoch (placeholder for real training)
            model.append(epoch)
            train_loss = 1.0 / (epoch + 1)  # placeholder loss value

            # Save checkpoint (placeholder: print the recorded epochs)
            print(model)

            # Check again after the expensive operation
            self.kill_event()

            # Report progress as a percentage (1-100)
            progress.update(progress=epoch + 1, message=f'Epoch {epoch}: loss={train_loss:.4f}')
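
The checkpoint pattern can also be tried locally, without the platform. The following is a minimal sketch, assuming a checkpoint call raises an exception once termination has been requested. `TerminationRequested`, `FakeRunner`, and `request_termination` are hypothetical stand-ins for illustration and are not part of dtlpy:

```python
import threading


class TerminationRequested(Exception):
    """Hypothetical stand-in for the error raised at a checkpoint."""


class FakeRunner:
    """Hypothetical local stand-in for a service runner (not dtlpy)."""

    def __init__(self):
        self._stop = threading.Event()
        self.completed_epochs = []

    def request_termination(self):
        # Another thread or process would normally trigger this.
        self._stop.set()

    def kill_event(self):
        # Checkpoint: raise if termination was requested, otherwise continue.
        if self._stop.is_set():
            raise TerminationRequested()

    def train(self, epochs, stop_after=None):
        for epoch in range(epochs):
            self.kill_event()
            # Work finished before the next checkpoint is kept.
            self.completed_epochs.append(epoch)
            if stop_after is not None and epoch == stop_after:
                # Simulate an external terminate call arriving mid-run.
                self.request_termination()


runner = FakeRunner()
try:
    runner.train(epochs=100, stop_after=2)
except TerminationRequested:
    print(f"Stopped cleanly after {len(runner.completed_epochs)} epochs")
```

Because the request is only honored at a checkpoint, the epoch in progress always finishes before the run stops.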

Triggering Termination

Terminate an execution from another process:

# Get the execution
execution = dl.executions.get(execution_id='execution-id')

# Request termination
execution.terminate()

# Wait for termination to complete
execution = execution.wait()
print(f"Execution status: {execution.latest_status['status']}")

⏲️ Execution Timeout Management

Setting Timeout Duration

Control how long your function can run:

# Get your service
service = dl.services.get(service_name='my-service')

# Set timeout in seconds
service.execution_timeout = 3600  # 1 hour
service.update()

# For longer tasks
service.execution_timeout = 86400  # 24 hours
service.update()
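
Timeouts are given in seconds, so raw literals such as 86400 are easy to mistype. Here is a small sketch that derives the value with the standard library. `timeout_seconds` is a hypothetical helper, not part of dtlpy:

```python
from datetime import timedelta


def timeout_seconds(**duration):
    """Convert a human-readable duration (e.g. hours=1) to whole seconds."""
    return int(timedelta(**duration).total_seconds())


one_hour = timeout_seconds(hours=1)
one_day = timeout_seconds(hours=24)
print(one_hour, one_day)
```

The result can be assigned to service.execution_timeout before calling service.update().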

Configuring Timeout Behavior

Choose what happens when an execution times out:

# Option 1: Mark as failed (default)
service.on_reset = 'failed'
service.update()

# Option 2: Automatically retry
service.on_reset = 'rerun'
service.update()

📊 Execution Monitoring

Basic Status Monitoring

Monitor a single execution:

# Get execution by ID
execution = dl.executions.get(execution_id='execution-id')

# Wait for completion
execution = execution.wait()
print(f"Status: {execution.latest_status['status']}")
print(f"Duration: {execution.duration:.2f} seconds")
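
If you would rather give up after a fixed time than wait indefinitely, a polling loop with a deadline is one option. This is a minimal sketch: `get_status` is a hypothetical callable that you would back with dl.executions.get(...) and latest_status['status'], and the terminal status names used here are assumptions for illustration:

```python
import time


def wait_for_terminal_status(get_status, terminal_statuses, timeout=60.0, interval=1.0):
    """Poll get_status() until it returns a terminal status or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Simulated status sequence standing in for repeated calls to the platform.
_statuses = iter(["created", "in-progress", "in-progress", "success"])
final_status = wait_for_terminal_status(
    get_status=lambda: next(_statuses),
    terminal_statuses={"success", "failed"},  # assumed status names
    timeout=5.0,
    interval=0.01,
)
print(f"Status: {final_status}")
```

Using time.monotonic() keeps the deadline correct even if the system clock is adjusted while polling.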

Execution Logs

Access execution logs for debugging:

# Get execution logs
execution = dl.executions.get(execution_id='execution-id')
logs = execution.logs()
print(logs)

# Stream logs in real-time
for log in execution.logs(follow=True):
    print(f"{log['timestamp']}: {log['message']}")
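
When debugging, it often helps to narrow the log stream to the lines of interest. This sketch assumes each entry is a dict with 'timestamp' and 'message' keys, as in the loop above. `format_log_lines` is a hypothetical helper, and the sample entries are made-up values:

```python
def format_log_lines(entries, contains=None):
    """Format entries as 'timestamp: message', optionally filtering by substring."""
    lines = []
    for entry in entries:
        message = entry.get("message", "")
        # Case-insensitive substring filter; skip entries that do not match.
        if contains is not None and contains.lower() not in message.lower():
            continue
        lines.append(f"{entry.get('timestamp', '?')}: {message}")
    return lines


# Made-up sample entries shaped like the streamed logs.
sample_logs = [
    {"timestamp": "2024-01-01T10:00:00Z", "message": "Epoch 0 started"},
    {"timestamp": "2024-01-01T10:00:05Z", "message": "Error: failed to load item"},
    {"timestamp": "2024-01-01T10:00:06Z", "message": "Epoch 0 finished"},
]

error_lines = format_log_lines(sample_logs, contains="error")
for line in error_lines:
    print(line)
```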

🔄 Execution Retry Management

Manual Retry

Retry failed executions:

# Get failed execution
execution = dl.executions.get(execution_id='failed-execution-id')

# Retry with same parameters
new_execution = execution.rerun()

# Wait for completion
new_execution = new_execution.wait()
print(f"Retry status: {new_execution.latest_status['status']}")
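
Manual retries are usually capped so that a persistently failing execution is not rerun forever. Below is a minimal sketch of a bounded retry loop. `run_once` is a hypothetical callable that you would back with execution.rerun() followed by wait(), and the 'success' status name is an assumption:

```python
def rerun_until_success(run_once, max_attempts=3, success_status="success"):
    """Call run_once() until it returns success_status or attempts run out.

    Returns a (final_status, attempts_used) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        status = run_once()
        if status == success_status:
            break
    return status, attempt


# Simulated outcomes standing in for rerun() followed by wait().
_outcomes = iter(["failed", "failed", "success"])
status, attempts = rerun_until_success(lambda: next(_outcomes), max_attempts=5)
print(f"Retry status: {status} after {attempts} attempts")
```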

💡 Pro Tips & Best Practices

Resource Management

Implement regular checkpoints in long-running tasks
Save intermediate results when possible
Clean up temporary resources in case of termination

Error Handling

Use try/finally blocks for cleanup
Implement proper logging for debugging
Handle both user-requested termination and timeouts gracefully
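
The cleanup advice above can be sketched with a try/finally block around a temporary working directory. `TerminationRequested` and `process_with_cleanup` are hypothetical names for illustration, and the exception stands in for whatever error interrupts your function:

```python
import os
import shutil
import tempfile


class TerminationRequested(Exception):
    """Hypothetical stand-in for the error raised when a run is terminated."""


def process_with_cleanup(work):
    """Run work(tmp_dir) and remove the temporary directory however it exits."""
    tmp_dir = tempfile.mkdtemp(prefix="execution-")
    try:
        return work(tmp_dir)
    finally:
        # Runs on success, on errors, and when termination interrupts the work.
        shutil.rmtree(tmp_dir, ignore_errors=True)


seen = {}


def interrupted_work(tmp_dir):
    # Record the directory so the demo can check it afterwards.
    seen["dir"] = tmp_dir
    with open(os.path.join(tmp_dir, "partial.txt"), "w") as handle:
        handle.write("intermediate result")
    raise TerminationRequested()


try:
    process_with_cleanup(interrupted_work)
except TerminationRequested:
    print("Terminated; temp dir removed:", not os.path.exists(seen["dir"]))
```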

Performance Optimization

Monitor execution duration trends
Adjust timeouts based on actual needs
Use appropriate instance types for your workload
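
One way to base a timeout on actual needs is to derive it from observed durations. This sketch takes the 95th percentile of past durations and adds headroom. `suggest_timeout`, the 1.5 headroom factor, and the sample durations are illustrative assumptions, not platform recommendations:

```python
import math
import statistics


def suggest_timeout(durations, headroom=1.5):
    """Suggest a timeout in seconds: 95th percentile of durations times headroom."""
    if len(durations) < 2:
        raise ValueError("need at least two observed durations")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # The inclusive method keeps the estimate within the observed range.
    p95 = statistics.quantiles(durations, n=20, method="inclusive")[-1]
    return math.ceil(p95 * headroom)


# Made-up durations in seconds, including one slow outlier.
observed = [40, 42, 45, 47, 50, 52, 55, 58, 60, 120]
suggested = suggest_timeout(observed)
print(f"Suggested timeout: {suggested} seconds")
```

The result can be assigned to service.execution_timeout and revisited as duration trends change.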

Monitoring Guidelines

Set up alerts for failed executions
Monitor resource usage patterns
Keep track of execution duration statistics

Need help? Check out our other tutorials or reach out to our support team. Happy coding! ⚡️