# Chatons Cloud on Kubernetes
This page documents the initial Kubernetes deployment shape for Chatons Cloud.
Status: proposed and scaffolded. The manifests and services in the repository are bootstrapping assets, not a production-complete platform.
## Services
The cloud stack is split into three primary services:
- `cloud-api`
- `cloud-realtime`
- `runtime-headless`
The desktop login flow opens the user's browser to the Chatons Cloud OIDC authorization endpoint, then returns to the desktop app through the custom `chatons://cloud/auth/callback` protocol.
The repository now also contains an initial browser-side Chatons Cloud portal inside the existing `landing/` app. That portal reuses the landing visual system and provides the first product shell for cloud signup, login, organization onboarding, provider setup, and desktop handoff. It is still a thin browser client over `cloud-api`, not a separate identity service.
After that browser callback, the desktop uses the stored cloud session for two different network paths:
- authenticated HTTP bootstrap against `cloud-api`
- short-lived websocket token minting plus a persistent websocket session against `cloud-realtime`
The realtime token request is now bound to the desktop's persisted cloud-instance id and is verified by `cloud-api` before `cloud-realtime` mints a websocket token. This keeps websocket routing stable across reconnects, avoids inferring the target instance from token string suffixes, and prevents bearer-presence-only websocket access.
This separation is intentional for Kubernetes:
- `cloud-api` scales on HTTP and CRUD load
- `cloud-realtime` scales on concurrent websocket sessions
- `runtime-headless` scales on active runtime execution demand
Each service exposes:
- `/healthz`
- `/readyz`
These are used by liveness and readiness probes in the cluster.
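A minimal sketch of how these endpoints could be wired into a service container's probes. The container port and timing values here are assumptions for illustration, not values taken from the repository manifests:

```yaml
# Illustrative probe wiring for one of the cloud service containers.
# Port 8080 and the delay/period values are assumed, not from the repo.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
```

Keeping `/readyz` separate from `/healthz` lets a pod report "alive but not ready" (for example, while waiting on PostgreSQL or Redis) without being restarted by the kubelet.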
## Kubernetes Layout
The repository includes a base Kustomize package under:
`k8s/base/`
Included resources:
- namespace
- configmap
- example secret manifest
- deployments, statefulsets, services, and PVC-backed storage for the bundled PostgreSQL and Redis instances
- ingress
- horizontal pod autoscalers
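A `kustomization.yaml` tying those resources together might look roughly like this. The individual file names below are hypothetical; the actual file layout under `k8s/base/` may differ:

```yaml
# Hypothetical shape of k8s/base/kustomization.yaml.
# Resource file names are assumptions for illustration.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: chatons-cloud
resources:
  - namespace.yaml
  - configmap.yaml
  - secret.example.yaml
  - postgres-statefulset.yaml
  - redis-statefulset.yaml
  - cloud-api-deployment.yaml
  - cloud-realtime-deployment.yaml
  - runtime-headless-deployment.yaml
  - services.yaml
  - ingress.yaml
  - hpa.yaml
```

With this shape, `kubectl apply -k k8s/base/` applies the whole stack, and overlays or CI image overrides can be layered on top without editing the base files.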
## Deployment Principles
- Stateless services where possible
- Configuration from environment variables
- Kubernetes-native health probes
- Horizontal scaling for API, realtime, and runtime workers
- Separate service identities for future network policies
## Operational Notes
- The current manifests assume an ingress controller such as NGINX.
- Websocket traffic is routed through the dedicated `cloud-realtime` service.
- `cloud-realtime` now supports Redis-backed Pub/Sub fan-out so multiple realtime pods can serve the same logical cloud instance without relying on in-process memory only.
- `cloud-api` now supports a PostgreSQL-backed control plane for users, cloud workspaces, projects, conversations, transcripts, durable plan definitions, and stateful OIDC authorization-code requests, while still keeping an in-memory fallback for local development when `DATABASE_URL` is absent.
- `runtime-headless` now also supports PostgreSQL-backed runtime-session persistence and cross-pod parallel-session accounting when `DATABASE_URL` is configured, with an in-memory fallback for local development.
- The bundled `postgres` and `redis` manifests are now single-replica StatefulSets with persistent volume claims so a one-node bootstrap cluster can keep data across pod restarts.
- The default base profile is now tuned for single-node clusters: `cloud-api`, `cloud-realtime`, and `runtime-headless` start at one replica each, and the HPAs scale from `1` instead of `2`.
- `runtime-headless` now reuses an active session for a given cloud conversation instead of blindly creating a second one, which reduces duplicate runtimes when multiple requests or pods race on the same thread.
- `runtime-headless` now also stamps each active conversation session with an owner id and lease expiration, so a dead pod can eventually be superseded instead of blocking the conversation indefinitely.
- `runtime-headless`, `cloud-realtime`, and `cloud-api` now share an internal service token. `runtime-headless` uses it to ask `cloud-api` for the authoritative user, project, conversation, instance, and quota grant before creating or serving a remote runtime session, and `cloud-realtime` uses it to authorize websocket token minting and event publication.
- `cloud-realtime` no longer accepts anonymous `POST /v1/realtime/events`; only trusted internal callers with the shared service token may publish events. Websocket token maps are also reaped periodically, and per-instance socket counts are capped.
- The base Kubernetes manifests now include topology spread constraints, ingress TLS, ingress rate limits, pod disruption budgets, and a default namespace network policy. These are still baseline controls, not a complete production security posture.
- The desktop main process now treats a missing remote runtime session as recoverable state: if a stored `cloud_runtime_session_id` returns `404`, it clears the stale id locally and reacquires a fresh runtime session from the cloud service.
- `cloud-realtime` now keeps a bounded replay buffer per cloud instance, and the desktop requests replay incrementally after the last seen event sequence, so reconnect no longer needs to reapply the entire recent buffer every time. When Redis is configured, this replay buffer and its per-instance sequence counter survive `cloud-realtime` pod restarts.
- The three cloud services now force-drain lingering HTTP connections during shutdown and bound store/Redis close waits, so a single-replica rolling update is less likely to stall with an old pod stuck in `Terminating`.
- The cloud image build must also carry runtime JavaScript files that live under `packages/` but are imported by the cloud services without TypeScript compilation, such as `packages/memory/index.js`; otherwise the container starts with `ERR_MODULE_NOT_FOUND` and the rollout stalls on the old replica.
- Secrets are represented by an example manifest and should be managed through your real secret workflow.
- The example secret now includes `POSTGRES_PASSWORD`, which must match the password embedded in `DATABASE_URL`, because the in-cluster PostgreSQL bootstrap and the application connection string are configured independently.
- `cloud-api` now exposes a built-in OIDC issuer with discovery, JWKS, authorization, token, and userinfo endpoints. Set `OIDC_ISSUER_URL` to the public Chatons Cloud base URL so the desktop app and any future relying parties resolve the same issuer metadata.
- `cloud-api` should also receive an explicit `CHATONS_CLOUD_PUBLIC_URL` in `chatons-cloud-secrets`. That value becomes the canonical public base URL embedded in cloud-instance bootstrap state, desktop auth redirects, and server-generated links. In the default hosted layout, set it to `https://cloud.chatons.ai`.
- `cloud-api` should also publish explicit client-facing service URLs through `CHATONS_REALTIME_PUBLIC_URL` and `CHATONS_RUNTIME_PUBLIC_URL`. Those values are returned in cloud bootstrap state so the desktop can connect to the websocket and runtime services without guessing sibling ports.
- `cloud-api` now also exposes lightweight web onboarding endpoints for browser signup/login, organization setup, and organization-owned provider configuration, so `cloud.chatons.ai` can drive the same server-owned cloud model as the desktop app.
- `cloud-api` now also supports password-based web auth plus transactional email flows for signup verification and password recovery. To use this in Kubernetes, provide SMTP settings through `chatons-cloud-secrets` and a public `CHATONS_CLOUD_WEB_URL` so verification and reset links point back to the browser portal.
- If SMTP is unreachable or temporarily misconfigured, `cloud-api` now creates the account first and logs the async mail delivery failure instead of leaving browser signup or password-reset requests hanging until the SMTP timeout. Users still need a working SMTP path to receive verification and reset links.
- The base ingress now assumes three public hosts: `cloud.chatons.ai` as the canonical browser/API/OIDC surface for `cloud-api`, `api.chatons.ai` as an optional direct alias to the same `cloud-api` service, and `realtime.chatons.ai` for websocket traffic to `cloud-realtime`.
- The base Kustomize package now also includes cert-manager `ClusterIssuer` manifests for Let's Encrypt HTTP-01. Once cert-manager is installed in the cluster and DNS points the three public hosts at the ingress entrypoint, the `chatons-cloud` ingress can mint and renew `chatons-cloud-tls` automatically through the `letsencrypt-production` issuer.
- The runtime service should later be extended with queue-driven orchestration, worker heartbeats, and per-conversation ownership.
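The cert-manager setup mentioned above can be sketched as a minimal HTTP-01 `ClusterIssuer` in the shape the notes describe. The contact email and the ingress class used by the solver are assumptions; the issuer name matches the `letsencrypt-production` name referenced above:

```yaml
# Minimal Let's Encrypt HTTP-01 ClusterIssuer sketch.
# The email address and ingress class are placeholder assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@chatons.ai  # placeholder ACME account contact
    privateKeySecretRef:
      name: letsencrypt-production-account-key
    solvers:
      - http01:
          ingress:
            class: nginx  # must match the installed ingress controller
```

HTTP-01 validation only succeeds once DNS for the three public hosts resolves to the ingress entrypoint, so DNS must be in place before the first certificate can be issued.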
## GitHub Push Deployment
The repository includes a GitHub Actions workflow for continuous deployment to a Kubernetes cluster on every push to `main`.
Workflow file:
`.github/workflows/deploy-cloud.yml`
Deployment flow:
- build and push `cloud-api`, `cloud-realtime`, and `runtime-headless` images to GHCR
- tag each image with both `latest` and the Git commit SHA
- decode a base64 kubeconfig from GitHub Actions secrets
- print the configured kube contexts and validate cluster connectivity before mutating resources
- apply the Kubernetes base manifests with Kustomize image overrides pinned to the pushed SHA
- wait for the three deployments to finish their rollout
- dump deployment, ReplicaSet, pod, and recent log diagnostics automatically when a rollout times out in GitHub Actions
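The kubeconfig and apply steps of that flow might look roughly like the following GitHub Actions fragment. The step names, the `OWNER` placeholder, and the rollout timeout are illustrative assumptions; only the `KUBE_CONFIG` secret name comes from this document:

```yaml
# Sketch of the deploy steps described above; not the actual workflow file.
- name: Configure kubeconfig
  run: |
    mkdir -p ~/.kube
    echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > ~/.kube/config
    # Validate connectivity before mutating anything.
    kubectl config get-contexts
    kubectl version

- name: Apply base manifests pinned to the pushed SHA
  run: |
    cd k8s/base
    # OWNER is a placeholder for the GitHub org/user name.
    kustomize edit set image \
      ghcr.io/OWNER/chaton/cloud-api=ghcr.io/OWNER/chaton/cloud-api:${GITHUB_SHA}
    kubectl apply -k .
    kubectl -n chatons-cloud rollout status deployment/cloud-api --timeout=300s
```

Pinning the image tag to `${GITHUB_SHA}` (rather than `latest`) is what makes each rollout reproducible and each failed rollout attributable to a specific commit.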
Required GitHub repository secrets:
- `KUBE_CONFIG` -- base64-encoded kubeconfig for the target cluster
- `CHATONS_CLOUD_SECRETS_YAML` -- optional full Kubernetes Secret manifest for `chatons-cloud-secrets`
Recommended bootstrap command for the kubeconfig secret:
```shell
base64 -i /path/to/kubeconfig | pbcopy
```
Then store the clipboard value in the `KUBE_CONFIG` GitHub Actions secret.
If the workflow fails with `Unable to connect to the server: EOF` during an early `kubectl` step, treat that as a kubeconfig or API reachability problem, not a missing namespace. Typical causes are:
- the `KUBE_CONFIG` secret was copied with the wrong base64 variant or extra whitespace
- the kubeconfig points at a private control-plane endpoint that GitHub-hosted runners cannot reach
- the referenced client certificate, token, or exec-based auth command is invalid in CI
For a single-node cluster, the base package now keeps the stateless services leaner by default:
- `cloud-api`, `cloud-realtime`, and `runtime-headless` each start at `1` replica
- the HPAs scale from `1` instead of requiring a two-pod baseline
- stateless-service CPU and memory requests are reduced so more of the node remains available for the database, Redis, image pulls, and Kubernetes system workloads
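An HPA matching that single-node baseline could look like this sketch. The `minReplicas: 1` floor reflects the profile described above; the CPU target and `maxReplicas` ceiling are assumptions:

```yaml
# Illustrative HPA for the single-node profile: scales from 1 replica.
# The 70% CPU target and maxReplicas value are assumed, not from the repo.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cloud-api
  namespace: chatons-cloud
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cloud-api
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

A one-replica floor means a single pod crash briefly takes the service down; that trade-off is acceptable for a bootstrap cluster and is reversed simply by raising `minReplicas` on larger clusters.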
Important operational detail:
- the workflow pushes images to `ghcr.io/<github-owner>/chaton/...`
- the workflow also creates or refreshes a `ghcr-auth` image pull secret in the `chatons-cloud` namespace
- if you prefer public container images, you can make the GHCR package public and remove the pull-secret step later
If you use `CHATONS_CLOUD_SECRETS_YAML`, store the complete manifest as a GitHub secret, for example:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: chatons-cloud-secrets
  namespace: chatons-cloud
type: Opaque
stringData:
  POSTGRES_PASSWORD: "..."
  DATABASE_URL: "postgres://chatons:...@postgres.chatons-cloud.svc.cluster.local:5432/chatons"
  REDIS_URL: "redis://..."
  CHATONS_INTERNAL_SERVICE_TOKEN: "..."
  CHATONS_CLOUD_PUBLIC_URL: "https://cloud.chatons.ai"
  CHATONS_REALTIME_PUBLIC_URL: "wss://realtime.chatons.ai/ws"
  CHATONS_RUNTIME_PUBLIC_URL: "https://cloud.chatons.ai"
  OIDC_CLIENT_ID: "chatons-desktop"
  OIDC_CLIENT_SECRET: ""
  OIDC_ISSUER_URL: "https://cloud.chatons.ai"
  CHATONS_CLOUD_WEB_URL: "https://cloud.chatons.ai"
  JWT_SIGNING_KEY: "..."
  SMTP_HOST: "smtp.example.com"
  SMTP_PORT: "587"
  SMTP_SECURE: "false"
  SMTP_USER: "no-reply@chatons.ai"
  SMTP_PASS: "..."
  SMTP_FROM: "Chatons Cloud <no-reply@chatons.ai>"
```
For a one-node cluster, use a storage class that supports ReadWriteOnce PVCs on that node. The bundled PostgreSQL and Redis manifests will retain data across pod/container restarts as long as the bound persistent volumes remain intact.
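The PVC shape implied here, inside the bundled PostgreSQL StatefulSet, could be sketched as follows. The claim name, storage class, and volume size are assumptions; the `ReadWriteOnce` access mode is the requirement stated above:

```yaml
# Sketch of a StatefulSet volumeClaimTemplates entry for the bundled
# PostgreSQL. Claim name, storage class, and size are assumed values.
volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-path  # e.g. a single-node provisioner; adjust per cluster
      resources:
        requests:
          storage: 10Gi
```

Because the claim is bound to one node, data survives pod and container restarts, but moving the pod to another node (or deleting the PV) loses the database; that is the expected limit of a one-node bootstrap.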
## Next Infrastructure Steps
- Replace the in-cluster PostgreSQL and Redis manifests with managed services or stateful production storage
- Add observability stack integration
- Add memory and connection-count based autoscaling signals in addition to CPU
- Move desktop cloud sessions out of local plaintext SQLite and into an OS-backed secret store
- Add rollout strategy and canary policy