Distributed Compute

This runbook describes how to roll out remote workers without moving identity, policy, or governance out of the control plane, so that Duck’s security and governance model stays intact.

Architecture Boundaries

the gateway remains the single policy enforcement point
workers execute already-rewritten SQL
gateway-to-worker transport uses internal gRPC
storage, auth, and governance metadata remain anchored in the control plane
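
The boundary above can be pictured with a minimal Python sketch. It is illustrative only and is not Duck's implementation: `Principal`, `Worker`, `ROW_FILTERS`, `rewrite_with_policy`, and `gateway_handle` are hypothetical names, and the string-based rewrite stands in for whatever policy rewriting the gateway actually performs. The point it shows is that the worker receives SQL text only and never sees identity or policy data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """Identity resolved by the gateway (hypothetical shape)."""
    user: str
    groups: tuple


@dataclass
class Worker:
    """Stand-in for a remote worker: it is handed SQL text and nothing else."""
    executed: list = field(default_factory=list)

    def execute(self, rewritten_sql: str) -> str:
        self.executed.append(rewritten_sql)
        return f"ran: {rewritten_sql}"


# Hypothetical policy table kept on the gateway side: group -> row filter.
ROW_FILTERS = {"analysts": "region = 'EU'"}


def rewrite_with_policy(sql: str, principal: Principal) -> str:
    """Apply policy on the gateway; refuse if no policy grants access."""
    filters = [ROW_FILTERS[g] for g in principal.groups if g in ROW_FILTERS]
    if not filters:
        raise PermissionError(f"no policy grants access for {principal.user}")
    predicate = " OR ".join(f"({f})" for f in filters)
    return f"SELECT * FROM ({sql}) AS governed WHERE {predicate}"


def gateway_handle(sql: str, principal: Principal, worker: Worker) -> str:
    """Single enforcement point: rewrite first, then dispatch to the worker."""
    rewritten = rewrite_with_policy(sql, principal)
    return worker.execute(rewritten)
```

In this sketch a denied request never reaches the worker, which mirrors the idea that enforcement stays in the gateway.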

When to Use Remote Compute

you need worker isolation or a separate execution fleet
you want lifecycle-style async execution
you need a staged rollout with local fallback
query or orchestration load makes local-only execution an operator bottleneck

Admin Checklist

confirm the gateway feature flags match the intended rollout
set worker auth and listen addresses explicitly
start with fallback enabled on assignments
canary a small set of users or groups before widening traffic
monitor queue latency and failure reasons before widening scope
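
The canary step in the checklist above can be sketched as a small routing check. This is a hedged illustration: it assumes `REMOTE_CANARY_USERS` holds a comma-separated list of user names, which the settings table does not confirm, and `parse_canary_users` and `route_for` are hypothetical helpers rather than Duck functions.

```python
def parse_canary_users(raw: str) -> frozenset:
    """Split an assumed comma-separated canary list, ignoring blanks."""
    return frozenset(u.strip() for u in raw.split(",") if u.strip())


def route_for(user: str, canary_users: frozenset, remote_routing_enabled: bool) -> str:
    """Send only canary users to remote workers; everyone else stays local."""
    if remote_routing_enabled and user in canary_users:
        return "remote"
    return "local"
```

Keeping the default branch local means a misconfigured or empty canary list routes nobody to remote workers.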

Remote Compute Settings

Setting	Applies To	Why It Matters
`AGENT_TOKEN`	Worker	Authenticates the worker to the control plane
`LISTEN_ADDR`	Worker	Sets the address the worker’s public listener binds to
`GRPC_LISTEN_ADDR`	Worker	Exposes the internal gRPC path for execution traffic
`MAX_MEMORY_GB`	Worker	Caps worker memory for safer isolation
`QUERY_RESULT_TTL`	Worker	Controls how long async results stay available
`QUERY_CLEANUP_INTERVAL`	Worker	Sets how often lifecycle cleanup runs
`FEATURE_REMOTE_ROUTING`	Gateway	Enables routing work to remote workers
`FEATURE_ASYNC_QUEUE`	Gateway	Turns on queued async execution behavior
`FEATURE_CURSOR_MODE`	Gateway	Affects cursor-style remote result handling
`FEATURE_INTERNAL_GRPC`	Gateway	Enables the internal transport to workers
`REMOTE_CANARY_USERS`	Gateway	Limits early rollout to a known audience
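
A pre-flight check over these settings might look like the sketch below. Several things here are assumptions and not documented behavior: that flags are enabled with values such as `true` or `1`, that `FEATURE_REMOTE_ROUTING` depends on `FEATURE_INTERNAL_GRPC`, and that an empty `REMOTE_CANARY_USERS` is worth flagging. `is_on` and `check_settings` are hypothetical helper names.

```python
# Assumed truthy spellings for feature flags (not confirmed by the docs).
TRUTHY = {"1", "true", "yes", "on"}


def is_on(env: dict, name: str) -> bool:
    """Return True when a flag is present with an assumed truthy value."""
    return env.get(name, "").strip().lower() in TRUTHY


def check_settings(worker_env: dict, gateway_env: dict) -> list:
    """Collect human-readable problems instead of failing on the first one."""
    problems = []

    # The checklist asks for worker auth and listen addresses to be explicit.
    for name in ("AGENT_TOKEN", "LISTEN_ADDR", "GRPC_LISTEN_ADDR"):
        if not worker_env.get(name, "").strip():
            problems.append(f"worker: {name} is not set explicitly")

    if is_on(gateway_env, "FEATURE_REMOTE_ROUTING"):
        # Assumption: remote routing needs the internal gRPC transport.
        if not is_on(gateway_env, "FEATURE_INTERNAL_GRPC"):
            problems.append("gateway: remote routing is on but internal gRPC is off")
        # Assumption: an empty canary list deserves a second look early on.
        if not gateway_env.get("REMOTE_CANARY_USERS", "").strip():
            problems.append("gateway: remote routing is on but no canary users are set")

    return problems
```

Passing plain dictionaries (for example `dict(os.environ)`) keeps the check easy to run against a staged configuration before it is deployed.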

Health and Failure Handling

monitor `GET /health` and `GET /metrics`
expect fallback behavior when worker health degrades and assignments allow local execution
use the retention settings (`QUERY_RESULT_TTL`, `QUERY_CLEANUP_INTERVAL`) to limit how much memory retained async results consume
document the operator decision for when fallback should be automatic versus disabled
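
The fallback behavior described above can be written down as an explicit decision, which also helps when documenting the operator choice. The sketch below is an assumption-laden illustration: `Route`, `choose_route`, and `probe_health` are hypothetical names, and treating an HTTP 200 from `GET /health` as "healthy" is a guess about the endpoint's contract.

```python
import urllib.request
from enum import Enum


class Route(Enum):
    REMOTE = "remote"
    LOCAL_FALLBACK = "local_fallback"
    REJECT = "reject"


def choose_route(worker_healthy: bool, fallback_allowed: bool) -> Route:
    """Prefer remote; fall back locally only when the assignment allows it."""
    if worker_healthy:
        return Route.REMOTE
    if fallback_allowed:
        return Route.LOCAL_FALLBACK
    return Route.REJECT


def probe_health(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if GET /health answers 200 (assumed meaning of healthy)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False
```

For example, `choose_route(probe_health("http://worker.internal:8080"), fallback_allowed=True)` would select local fallback when the probe fails; the host name there is a placeholder.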

Rollout Sequence

enable remote support with local fallback
route a limited audience
observe queue latency and completion behavior
widen scope only after representative success
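
The last two steps of the sequence above can be turned into an explicit gate. This is a sketch under stated assumptions: the thresholds (`min_queries`, `max_failure_rate`, `max_p95_ms`) are placeholder values that each operator would choose, and `CanaryWindow` and `ready_to_widen` are hypothetical names, not part of Duck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryWindow:
    """Counts observed for the canary audience over one observation window."""
    completed: int
    failed: int
    p95_queue_latency_ms: float


def ready_to_widen(
    window: CanaryWindow,
    min_queries: int = 500,          # placeholder: what counts as representative
    max_failure_rate: float = 0.01,  # placeholder: tolerated failure share
    max_p95_ms: float = 2000.0,      # placeholder: tolerated queue latency
) -> bool:
    """Widen only after enough traffic, few failures, and acceptable latency."""
    total = window.completed + window.failed
    if total < min_queries:
        return False  # not yet representative
    failure_rate = window.failed / total
    return failure_rate <= max_failure_rate and window.p95_queue_latency_ms <= max_p95_ms
```

Requiring a minimum query count first keeps a quiet canary period from being mistaken for success.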

Next Steps

Platform Settings: Configure gateway and workers.
Security Checklist: Check rollout hardening.
Observability And Troubleshooting: Debug queue and worker issues.
