Distributed Compute
Roll out remote workers without moving identity, policy, or governance out of the control plane.
This runbook describes how to roll out remote compute without weakening Duck’s security and governance model.
Architecture Boundaries
- the gateway remains the single policy enforcement point
- workers execute already-rewritten SQL
- gateway-to-worker transport uses internal gRPC
- storage, auth, and governance metadata remain anchored in the control plane
When to Use Remote Compute
- you need worker isolation or a separate execution fleet
- you want lifecycle-style async execution
- you need a staged rollout with local fallback
- query or orchestration load makes local-only execution an operator bottleneck
Admin Checklist
- confirm the gateway feature flags match the intended rollout
- set worker auth and listen addresses explicitly
- start with fallback enabled on assignments
- canary a small set of users or groups before widening traffic
- monitor queue latency and failure reasons before widening scope
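The canary step in the checklist can be sketched as a simple membership test. This is a minimal illustration only: it assumes the canary audience is supplied as a comma-separated list of user names (for example through `REMOTE_CANARY_USERS`), which this runbook does not confirm, and the helper names `parse_canary_users` and `routes_remote` are hypothetical.

```python
from __future__ import annotations


def parse_canary_users(raw: str) -> set[str]:
    """Split a comma-separated value into trimmed, non-empty user names.

    Assumption: the canary audience is a comma-separated string.
    """
    return {name.strip() for name in raw.split(",") if name.strip()}


def routes_remote(user: str, canary_users: set[str]) -> bool:
    """Return True only for users inside the canary audience."""
    return user in canary_users


if __name__ == "__main__":
    canary = parse_canary_users("alice, bob,,carol ")
    for user in ("alice", "dave"):
        target = "remote" if routes_remote(user, canary) else "local"
        print(f"{user}: {target}")
```

Keeping the audience as an explicit set makes it easy to review exactly who is exposed to remote execution before widening traffic.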
Remote Compute Settings
| Setting | Applies To | Why It Matters |
|---|---|---|
| `AGENT_TOKEN` | Worker | Authenticates the worker to the control plane |
| `LISTEN_ADDR` | Worker | Binds the worker’s public listener correctly |
| `GRPC_LISTEN_ADDR` | Worker | Exposes the internal gRPC path for execution traffic |
| `MAX_MEMORY_GB` | Worker | Caps worker memory for safer isolation |
| `QUERY_RESULT_TTL` | Worker | Controls how long async results stay available |
| `QUERY_CLEANUP_INTERVAL` | Worker | Governs lifecycle cleanup pressure |
| `FEATURE_REMOTE_ROUTING` | Gateway | Enables routing work to remote workers |
| `FEATURE_ASYNC_QUEUE` | Gateway | Turns on queued async execution behavior |
| `FEATURE_CURSOR_MODE` | Gateway | Affects cursor-style remote result handling |
| `FEATURE_INTERNAL_GRPC` | Gateway | Enables the internal transport to workers |
| `REMOTE_CANARY_USERS` | Gateway | Limits early rollout to a known audience |
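A preflight check over these settings might look like the sketch below. The setting names are taken from the table, but everything else is an assumption: the accepted flag values (`1`, `true`, `yes`, `on`), the idea that remote routing should not be enabled without the internal gRPC transport, and the helper names `check_worker` and `check_gateway` are illustrative rather than part of the product.

```python
from __future__ import annotations

# Setting names come from the settings table; the truthy values and the
# consistency rules below are assumptions made for this sketch.
REQUIRED_WORKER_SETTINGS = ("AGENT_TOKEN", "LISTEN_ADDR", "GRPC_LISTEN_ADDR")
TRUTHY = {"1", "true", "yes", "on"}


def is_enabled(env: dict[str, str], name: str) -> bool:
    """Treat a flag as on when its value is one of the assumed truthy strings."""
    return env.get(name, "").strip().lower() in TRUTHY


def check_worker(env: dict[str, str]) -> list[str]:
    """Report required worker settings that are missing or blank."""
    return [
        f"{name} is not set"
        for name in REQUIRED_WORKER_SETTINGS
        if not env.get(name, "").strip()
    ]


def check_gateway(env: dict[str, str]) -> list[str]:
    """Report gateway flag combinations that look inconsistent for a rollout."""
    problems = []
    if is_enabled(env, "FEATURE_REMOTE_ROUTING"):
        if not is_enabled(env, "FEATURE_INTERNAL_GRPC"):
            problems.append(
                "FEATURE_REMOTE_ROUTING is on but FEATURE_INTERNAL_GRPC is off"
            )
        if not env.get("REMOTE_CANARY_USERS", "").strip():
            problems.append(
                "REMOTE_CANARY_USERS is empty, so no canary audience is defined"
            )
    return problems


if __name__ == "__main__":
    worker_env = {"AGENT_TOKEN": "example-token", "LISTEN_ADDR": "0.0.0.0:8080"}
    gateway_env = {"FEATURE_REMOTE_ROUTING": "true", "REMOTE_CANARY_USERS": "alice"}
    for problem in check_worker(worker_env) + check_gateway(gateway_env):
        print("preflight:", problem)
```

Returning a list of problems, rather than raising on the first one, lets an operator see every mismatch in a single pass before starting a rollout.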
Health and Failure Handling
- monitor `GET /health` and `GET /metrics`
- expect fallback behavior when worker health degrades and assignments allow local execution
- use retention settings to control in-memory lifecycle result pressure
- document the operator decision for when fallback should be automatic versus disabled
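The fallback decision described above can be written down as a small truth table. The sketch below is one possible way to document it, not the gateway's actual logic: the `Target` enum and `choose_target` function are hypothetical, and rejecting the query when fallback is disabled is an assumption about what "disabled" should mean.

```python
from enum import Enum


class Target(Enum):
    REMOTE = "remote"
    LOCAL = "local"
    REJECT = "reject"


def choose_target(worker_healthy: bool, fallback_allowed: bool) -> Target:
    """Pick where a query runs given worker health and the assignment's fallback flag."""
    if worker_healthy:
        return Target.REMOTE
    if fallback_allowed:
        # Worker health has degraded and the assignment permits local execution.
        return Target.LOCAL
    # Assumption: with fallback disabled, the query is refused rather than run locally.
    return Target.REJECT


if __name__ == "__main__":
    for healthy in (True, False):
        for fallback in (True, False):
            target = choose_target(healthy, fallback)
            print(f"healthy={healthy} fallback={fallback} -> {target.value}")
```

Writing the decision as an explicit function makes the automatic-versus-disabled choice reviewable alongside the rest of the rollout plan.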
Rollout Sequence
- enable remote support with local fallback
- route a limited audience
- observe queue latency and completion behavior
- widen scope only after representative success
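"Representative success" in the last step is easier to apply when it is expressed as explicit thresholds. The sketch below shows one way to gate widening on observed canary results. The `CanaryWindow` type, the `ready_to_widen` function, and every threshold value are hypothetical placeholders that an operator would replace with numbers drawn from their own metrics.

```python
from dataclasses import dataclass


@dataclass
class CanaryWindow:
    """Observed results for the canary audience over one review window."""
    completed: int
    failed: int
    p95_queue_latency_ms: float


def ready_to_widen(
    window: CanaryWindow,
    min_queries: int = 200,          # placeholder: minimum representative sample
    max_failure_rate: float = 0.01,  # placeholder: tolerated failure fraction
    max_p95_ms: float = 500.0,       # placeholder: tolerated queue latency
) -> bool:
    """Return True only when the window is large enough and within both limits."""
    total = window.completed + window.failed
    if total < min_queries:
        return False  # too little traffic to be representative
    failure_rate = window.failed / total
    return (
        failure_rate <= max_failure_rate
        and window.p95_queue_latency_ms <= max_p95_ms
    )


if __name__ == "__main__":
    window = CanaryWindow(completed=995, failed=5, p95_queue_latency_ms=120.0)
    print("widen scope:", ready_to_widen(window))
```

Requiring a minimum sample size keeps a quiet canary period from being mistaken for a successful one.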
Next Steps
- Platform Settings: Configure gateway and workers.
- Security Checklist: Check rollout hardening.
- Observability And Troubleshooting: Debug queue and worker issues.