
🎮 Execution Control - Managing Your FaaS Functions

Welcome to the execution control guide! Learn how to manage, monitor, and control your FaaS function executions like a pro. Let's explore how to handle long-running tasks, implement graceful termination, and monitor execution status.

🛑 Execution Termination

Graceful Termination with Checkpoints

Need to safely stop a long-running execution? Use checkpoints to implement graceful termination:

import dtlpy as dl


class ServiceRunner(dl.BaseServiceRunner):
    def train_model(self, item: dl.Item, progress: dl.Progress):
        # Initialize training
        model = list()
        
        for epoch in range(100):
            # Check for termination request before each epoch
            self.kill_event()
            
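            # Train for one epoch (placeholder: a real training step goes here)
            train_loss = 1.0 / (epoch + 1)
            model.append(train_loss)
            
            # Save checkpoint (placeholder: persist the model state here)
            print(model)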
            
            # Check again after expensive operation
            self.kill_event()
            
            # Update progress
            progress.update(progress=epoch, message=f'Epoch {epoch}: loss={train_loss}')

Triggering Termination

Terminate an execution from another process:

# Get the execution
execution = dl.executions.get(execution_id='execution-id')

# Request termination
execution.terminate()

# Wait for termination to complete
execution = execution.wait()
print(f"Execution status: {execution.latest_status['status']}")

⏲️ Execution Timeout Management

Setting Timeout Duration

Control how long your function can run:

# Get your service
service = dl.services.get(service_name='my-service')

# Set timeout in seconds
service.execution_timeout = 3600  # 1 hour
service.update()

# For longer tasks
service.execution_timeout = 86400  # 24 hours
service.update()

Configuring Timeout Behavior

Choose what happens when a timeout occurs:

# Option 1: Mark as failed (default)
service.on_reset = 'failed'
service.update()

# Option 2: Automatically retry
service.on_reset = 'rerun'
service.update()

📊 Execution Monitoring

Basic Status Monitoring

Monitor a single execution:

# Get execution by ID
execution = dl.executions.get(execution_id='execution-id')

# Wait for completion
execution = execution.wait()
print(f"Status: {execution.latest_status['status']}")
print(f"Duration: {execution.duration:.2f} seconds")

Execution Logs

Access execution logs for debugging:

# Get execution logs
execution = dl.executions.get(execution_id='execution-id')
logs = execution.logs()
print(logs)

# Stream logs in real-time
for log in execution.logs(follow=True):
    print(f"{log['timestamp']}: {log['message']}")

🔄 Execution Retry Management

Manual Retry

Retry failed executions:

# Get failed execution
execution = dl.executions.get(execution_id='failed-execution-id')

# Retry with same parameters
new_execution = execution.rerun()

# Wait for completion
new_execution = new_execution.wait()
print(f"Retry status: {new_execution.latest_status['status']}")
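
If a single retry is not enough, the manual retry above can be wrapped in a bounded loop. The following is a minimal sketch, not part of the SDK: `rerun_until_success` and `run_once` are hypothetical names, and the `'success'` status string is an assumption. In a real script, `run_once` might call `execution.rerun()`, wait for the result, and return its status.

```python
def rerun_until_success(run_once, max_attempts=3):
    """Call run_once until it reports success or the attempts run out.

    run_once is a hypothetical callable that returns a status string.
    The 'success' value is an assumption of this sketch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    statuses = []
    for _ in range(max_attempts):
        status = run_once()
        statuses.append(status)
        if status == "success":
            break  # stop retrying as soon as one attempt succeeds
    return statuses


if __name__ == "__main__":
    # Simulated outcomes: two failures followed by a success
    outcomes = iter(["failed", "failed", "success"])
    print(rerun_until_success(lambda: next(outcomes), max_attempts=5))
```

Capping the number of attempts keeps a persistently failing function from being retried forever.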

💡 Pro Tips & Best Practices

Resource Management

  • Implement regular checkpoints in long-running tasks
  • Save intermediate results when possible
  • Clean up temporary resources in case of termination
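
The tips above can be sketched as a loop that saves its state after every step and resumes from the last saved step. This is a minimal sketch that assumes a local JSON file is an acceptable checkpoint store. The helper names, the file layout, and the `stop_after` parameter are hypothetical. In a real service, `self.kill_event()` would take the place of the `stop_after` check.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "results": []}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a stop mid-write cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_with_checkpoints(path, total_steps, stop_after=None):
    """Process steps in order, resuming from the last checkpoint.

    stop_after simulates a termination request after that many steps
    in this call (hypothetical stand-in for a real kill signal).
    Returns (state, finished).
    """
    state = load_checkpoint(path)
    done_this_call = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_call >= stop_after:
            return state, False  # stopped early; progress is already on disk
        step = state["next_step"]
        state["results"].append(step * step)  # placeholder for real work
        state["next_step"] = step + 1
        save_checkpoint(path, state)
        done_this_call += 1
    return state, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        ckpt = os.path.join(tmp_dir, "checkpoint.json")
        print(run_with_checkpoints(ckpt, total_steps=5, stop_after=2))
        print(run_with_checkpoints(ckpt, total_steps=5))
```

The second call picks up where the first one stopped, so work completed before a termination is not repeated.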

Error Handling

  • Use try/finally blocks for cleanup
  • Implement proper logging for debugging
  • Handle different types of termination gracefully
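
The try/finally advice above can be sketched as follows. The example assumes that a checkpoint signals termination by raising an exception. `TerminationRequested`, `process_with_cleanup`, and `should_stop` are hypothetical stand-ins for illustration, not SDK names.

```python
import os
import tempfile


class TerminationRequested(Exception):
    """Hypothetical stand-in for the error a checkpoint raises on termination."""


def process_with_cleanup(steps, should_stop):
    """Run steps using a scratch file and always delete it on the way out.

    should_stop is a callable checked before each step; it plays the role
    of a checkpoint and is an assumption of this sketch.
    Returns (status, completed_steps, scratch_path).
    """
    fd, scratch_path = tempfile.mkstemp(suffix=".scratch")
    os.close(fd)
    completed = 0
    try:
        for step in range(steps):
            if should_stop(step):
                raise TerminationRequested(f"stopped before step {step}")
            with open(scratch_path, "a") as f:
                f.write(f"step {step}\n")
            completed += 1
        return "finished", completed, scratch_path
    except TerminationRequested:
        # Graceful stop: report how far the work got
        return "terminated", completed, scratch_path
    finally:
        # Runs on success, termination, and unexpected errors alike
        if os.path.exists(scratch_path):
            os.remove(scratch_path)


if __name__ == "__main__":
    print(process_with_cleanup(5, lambda step: step == 3)[:2])
```

The sketch returns a status only to make the outcome easy to inspect. In a real service you would probably re-raise after cleanup so that the platform still sees the termination.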

Performance Optimization

  • Monitor execution duration trends
  • Adjust timeouts based on actual needs
  • Use appropriate instance types for your workload
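
One way to adjust timeouts based on actual needs is to derive them from observed durations. The sketch below takes a high percentile of past run times and adds headroom. `suggest_timeout` and its default values are hypothetical choices for illustration, not recommendations from the platform.

```python
import math


def suggest_timeout(durations_seconds, percentile=95, safety_factor=1.5, minimum=60):
    """Suggest a timeout in seconds from past execution durations."""
    if not durations_seconds:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations_seconds)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` percent of the observations at or below it
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    observed = ordered[rank - 1]
    # Add headroom and never go below the configured minimum
    return max(minimum, math.ceil(observed * safety_factor))


if __name__ == "__main__":
    history = [100, 120, 110, 900, 130, 125, 115, 105, 140, 135]
    print(suggest_timeout(history))
```

The result is a number of seconds, so it could be assigned to `service.execution_timeout` before calling `service.update()`. Note that a high percentile deliberately includes slow outliers; a lower percentile gives a tighter timeout at the cost of cutting off the slowest runs.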

Monitoring Guidelines

  • Set up alerts for failed executions
  • Monitor resource usage patterns
  • Keep track of execution duration statistics
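
The guidelines above can be turned into a small summary function. This is a sketch that assumes each execution has been reduced to a dictionary with `status` and `duration` keys. That record shape, the `summarize_executions` name, and the 20% alert threshold are all hypothetical.

```python
from statistics import mean, median


def summarize_executions(records, max_failure_rate=0.2):
    """Summarize execution records and flag a high failure rate.

    Each record is assumed to look like {"status": "failed", "duration": 12.5}.
    """
    if not records:
        return {"count": 0, "failure_rate": 0.0, "mean_duration": None,
                "median_duration": None, "alert": False}
    failed = sum(1 for r in records if r["status"] == "failed")
    durations = [r["duration"] for r in records]
    failure_rate = failed / len(records)
    return {
        "count": len(records),
        "failure_rate": failure_rate,
        "mean_duration": mean(durations),
        "median_duration": median(durations),
        # Raise the alert flag only when failures exceed the threshold
        "alert": failure_rate > max_failure_rate,
    }


if __name__ == "__main__":
    sample = [
        {"status": "success", "duration": 10.0},
        {"status": "failed", "duration": 30.0},
        {"status": "success", "duration": 20.0},
        {"status": "success", "duration": 40.0},
    ]
    print(summarize_executions(sample))
```

Running a summary like this on a schedule gives a simple basis for alerts and for spotting duration trends over time.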

Need help? Check out our other tutorials or reach out to our support team. Happy coding! ⚡️