Chatons Cloud on Kubernetes

This page documents the initial Kubernetes deployment shape for Chatons Cloud.

Status: proposed and scaffolded. The manifests and services in the repository are bootstrapping assets, not a production-complete platform.


Services

The cloud stack is split into three primary services:

  • cloud-api
  • cloud-realtime
  • runtime-headless

The desktop login flow opens the user's browser to the Chatons Cloud OIDC authorization endpoint, then returns control to the desktop app through the custom chatons://cloud/auth/callback protocol handler.

The repository now also contains an initial browser-side Chatons Cloud portal inside the existing landing/ app. That portal reuses the landing visual system and provides the first product shell for cloud signup, login, organization onboarding, provider setup, and desktop handoff. It is still a thin browser client over cloud-api, not a separate identity service.

After that browser callback, the desktop uses the stored cloud session for two different network paths:

  • authenticated HTTP bootstrap against cloud-api
  • short-lived websocket token minting plus a persistent websocket session against cloud-realtime

The realtime token request is now bound to the desktop's persisted cloud-instance id and is verified by cloud-api before cloud-realtime mints a websocket token. This keeps websocket routing stable across reconnects, avoids inferring the target instance from token string suffixes, and prevents websocket access based on nothing more than presenting a bearer token.
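
As a concrete illustration of those two paths, the exchange looks roughly like the following. The endpoint paths and JSON field names here are illustrative assumptions, not the verified cloud-api surface:

# Hypothetical request shapes; paths and fields are assumptions.
# 1. Authenticated HTTP bootstrap against cloud-api.
curl -H "Authorization: Bearer $CLOUD_SESSION_TOKEN" \
  https://cloud.chatons.ai/v1/bootstrap

# 2. Mint a short-lived websocket token bound to the persisted
#    cloud-instance id; cloud-api verifies this binding before
#    cloud-realtime issues the token.
curl -X POST \
  -H "Authorization: Bearer $CLOUD_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"instanceId\": \"$CLOUD_INSTANCE_ID\"}" \
  https://cloud.chatons.ai/v1/realtime/token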

This separation is intentional for Kubernetes:

  • cloud-api scales on HTTP and CRUD load
  • cloud-realtime scales on concurrent websocket sessions
  • runtime-headless scales on active runtime execution demand

Each service exposes:

  • /healthz
  • /readyz

These are used by liveness and readiness probes in the cluster.
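A minimal probe sketch against those endpoints, assuming the services listen on port 8080 (the port and timing values are illustrative):

# Container-level probe configuration; port 8080 is an assumption,
# match it to the service's actual listen port.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10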


Kubernetes Layout

The repository includes a base Kustomize package under:

  • k8s/base/

Included resources:

  • namespace
  • configmap
  • example secret manifest
  • deployments, statefulsets, services, and PVC-backed storage for the bundled PostgreSQL and Redis instances
  • ingress
  • horizontal pod autoscalers
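
The entry point for that package is a standard kustomization.yaml. The individual file names below are illustrative, but the resource set mirrors the list above:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: chatons-cloud
resources:
  - namespace.yaml
  - configmap.yaml
  - secret.example.yaml     # replace through your real secret workflow
  - postgres.yaml           # single-replica StatefulSet with PVC
  - redis.yaml              # single-replica StatefulSet with PVC
  - cloud-api.yaml
  - cloud-realtime.yaml
  - runtime-headless.yaml
  - ingress.yaml
  - hpa.yaml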

Deployment Principles

  • Stateless services where possible
  • Configuration from environment variables (see the sketch after this list)
  • Kubernetes-native health probes
  • Horizontal scaling for API, realtime, and runtime workers
  • Separate service identities for future network policies
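
For the environment-variable principle above, each Deployment can pull its configuration from the shared ConfigMap and Secret instead of baking values into images. A minimal container sketch, in which the ConfigMap name and image tag are illustrative:

# Container spec fragment inside a Deployment pod template.
containers:
  - name: cloud-api
    image: ghcr.io/<github-owner>/chaton/cloud-api:latest
    envFrom:
      - configMapRef:
          name: chatons-cloud-config    # illustrative name
      - secretRef:
          name: chatons-cloud-secrets   # manifest shown later on this page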

Operational Notes

  • The current manifests assume an ingress controller such as NGINX.
  • Websocket traffic is routed through the dedicated cloud-realtime service.
  • cloud-realtime now supports Redis-backed Pub/Sub fan-out so multiple realtime pods can serve the same logical cloud instance instead of relying on in-process memory alone.
  • cloud-api now supports a PostgreSQL-backed control plane for users, cloud workspaces, projects, conversations, transcripts, durable plan definitions, and stateful OIDC authorization-code requests, while still keeping an in-memory fallback for local development when DATABASE_URL is absent.
  • runtime-headless now also supports PostgreSQL-backed runtime-session persistence and cross-pod parallel-session accounting when DATABASE_URL is configured, with an in-memory fallback for local development.
  • The bundled PostgreSQL and Redis manifests are now single-replica StatefulSets with persistent volume claims, so a one-node bootstrap cluster can keep data across pod restarts (see the StatefulSet sketch after these notes).
  • The default base profile is now tuned for single-node clusters: cloud-api, cloud-realtime, and runtime-headless start at one replica each, and the HPAs scale from 1 instead of 2.
  • runtime-headless now reuses an active session for a given cloud conversation instead of blindly creating a second one, which reduces duplicate runtimes when multiple requests or pods race on the same conversation.
  • runtime-headless now also stamps each active conversation session with an owner id and lease expiration so a dead pod can eventually be superseded instead of blocking the conversation indefinitely.
  • runtime-headless, cloud-realtime, and cloud-api now share an internal service token. runtime-headless uses it to ask cloud-api for the authoritative user, project, conversation, instance, and quota grant before creating or serving a remote runtime session, and cloud-realtime uses it to authorize websocket token minting and event publication.
  • cloud-realtime no longer accepts anonymous POST /v1/realtime/events; only trusted internal callers with the shared service token may publish events. Websocket token maps are also reaped periodically, and per-instance socket counts are capped.
  • The base Kubernetes manifests now include topology spread constraints, ingress TLS, ingress rate limits, pod disruption budgets, and a default namespace network policy. These are still baseline controls, not a complete production security posture.
  • The desktop main process now treats a missing remote runtime session as recoverable state: if a stored cloud_runtime_session_id returns 404, it clears that stale id locally and reacquires a fresh runtime session from the cloud service.
  • cloud-realtime now keeps a bounded replay buffer per cloud instance and the desktop requests replay incrementally after the last seen event sequence, so reconnect no longer needs to reapply the entire recent buffer every time. When Redis is configured, this replay buffer and its per-instance sequence counter survive cloud-realtime pod restarts.
  • The three cloud services now force-drain lingering HTTP connections during shutdown and put an upper bound on store and Redis close waits, so a single-replica rolling update is less likely to stall with an old pod stuck in Terminating.
  • The cloud image build must also carry runtime JavaScript files that live under packages/ but are imported by the cloud services without TypeScript compilation, such as packages/memory/index.js; otherwise the container starts with ERR_MODULE_NOT_FOUND and the rollout stalls on the old replica.
  • Secrets are represented by an example manifest and should be managed through your real secret workflow.
  • The example secret now includes POSTGRES_PASSWORD, which must match the password embedded in DATABASE_URL, because the in-cluster PostgreSQL bootstrap and the application connection string are configured independently.
  • cloud-api now exposes a built-in OIDC issuer with discovery, JWKS, authorization, token, and userinfo endpoints. Set OIDC_ISSUER_URL to the public Chatons Cloud base URL so the desktop app and any future relying parties resolve the same issuer metadata (see the discovery example after these notes).
  • cloud-api should also receive an explicit CHATONS_CLOUD_PUBLIC_URL in chatons-cloud-secrets. That value becomes the canonical public base URL embedded in cloud-instance bootstrap state, desktop auth redirects, and server-generated links. In the default hosted layout, set it to https://cloud.chatons.ai.
  • cloud-api should also publish explicit client-facing service URLs through CHATONS_REALTIME_PUBLIC_URL and CHATONS_RUNTIME_PUBLIC_URL. Those values are returned in cloud bootstrap state so the desktop can connect to websocket and runtime services without guessing sibling ports.
  • cloud-api now also exposes lightweight web onboarding endpoints for browser signup/login, organization setup, and organization-owned provider configuration so cloud.chatons.ai can drive the same server-owned cloud model as the desktop app.
  • cloud-api now also supports password-based web auth plus transactional email flows for signup verification and password recovery. To use this in Kubernetes, provide SMTP settings through chatons-cloud-secrets and a public CHATONS_CLOUD_WEB_URL so verification and reset links point back to the browser portal.
  • If SMTP is unreachable or temporarily misconfigured, cloud-api now creates the account first and logs the async mail delivery failure instead of leaving browser signup or password-reset requests hanging until the SMTP timeout. Users will still need a working SMTP path to receive verification and reset links.
  • The base ingress now assumes three public hosts: cloud.chatons.ai as the canonical browser/API/OIDC surface for cloud-api, api.chatons.ai as an optional direct alias to the same cloud-api service, and realtime.chatons.ai for websocket traffic to cloud-realtime (sketched after these notes).
  • The base Kustomize package now also includes cert-manager ClusterIssuer manifests for Let’s Encrypt HTTP-01. Once cert-manager is installed in the cluster and DNS points the three public hosts at the ingress entrypoint, the chatons-cloud ingress can mint and renew chatons-cloud-tls automatically through the letsencrypt-production issuer (see the ClusterIssuer sketch after these notes).
  • The runtime service should later be extended with queue-driven orchestration, worker heartbeats, and per-conversation ownership.
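
For the bundled PostgreSQL note above, a trimmed sketch of the single-replica StatefulSet shape; the image tag and storage size are illustrative, not what the repository manifests necessarily pin:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: chatons-cloud
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16                  # illustrative tag
          envFrom:
            - secretRef:
                name: chatons-cloud-secrets   # supplies POSTGRES_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi                     # illustrative size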
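Because the built-in issuer follows standard OIDC discovery, any relying party can resolve its metadata from the spec-defined well-known path under OIDC_ISSUER_URL:

# Returns the issuer's endpoint metadata, including the JWKS,
# authorization, token, and userinfo URLs.
curl https://cloud.chatons.ai/.well-known/openid-configuration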
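A trimmed sketch of the three-host ingress described above; the service ports are assumptions, and the real manifest also carries the rate-limit annotations mentioned earlier:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chatons-cloud
  namespace: chatons-cloud
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - cloud.chatons.ai
        - api.chatons.ai
        - realtime.chatons.ai
      secretName: chatons-cloud-tls
  rules:
    - host: cloud.chatons.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cloud-api
                port:
                  number: 80          # assumed service port
    # api.chatons.ai points at the same cloud-api backend;
    # realtime.chatons.ai routes to cloud-realtime the same way.
    - host: realtime.chatons.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cloud-realtime
                port:
                  number: 80          # assumed service port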
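And the matching cert-manager issuer, assuming an NGINX ingress class; the contact email and private-key secret name are placeholders:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@chatons.ai               # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-production-key  # placeholder secret name
    solvers:
      - http01:
          ingress:
            class: nginx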

GitHub Push Deployment

The repository includes a GitHub Actions workflow for continuous deployment to a Kubernetes cluster on every push to main.

Workflow file:

  • .github/workflows/deploy-cloud.yml

Deployment flow:

  • build and push cloud-api, cloud-realtime, and runtime-headless images to GHCR
  • tag each image with both latest and the Git commit SHA
  • decode a base64 kubeconfig from GitHub Actions secrets
  • print the configured kube contexts and validate cluster connectivity before mutating resources
  • apply the Kubernetes base manifests with Kustomize image overrides pinned to the pushed SHA (see the sketch after this list)
  • wait for the three deployments to finish their rollout
  • dump deployment, ReplicaSet, pod, and recent log diagnostics automatically when a rollout times out in GitHub Actions
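
A condensed sketch of the apply-and-wait steps; the actual workflow file is the source of truth, and the step names below are illustrative:

# Fragment of a GitHub Actions job; only cloud-api is shown,
# the other two images are pinned the same way.
- name: Apply manifests with pinned images
  run: |
    cd k8s/base
    kustomize edit set image ghcr.io/<github-owner>/chaton/cloud-api=ghcr.io/<github-owner>/chaton/cloud-api:${GITHUB_SHA}
    kubectl apply -k .
- name: Wait for rollouts
  run: |
    kubectl rollout status deployment/cloud-api -n chatons-cloud --timeout=300s
    kubectl rollout status deployment/cloud-realtime -n chatons-cloud --timeout=300s
    kubectl rollout status deployment/runtime-headless -n chatons-cloud --timeout=300s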

Required GitHub repository secrets:

  • KUBE_CONFIG -- base64-encoded kubeconfig for the target cluster
  • CHATONS_CLOUD_SECRETS_YAML -- optional full Kubernetes Secret manifest for chatons-cloud-secrets

Recommended bootstrap command for the kubeconfig secret:

base64 -i /path/to/kubeconfig | pbcopy

Then store the clipboard value in the KUBE_CONFIG GitHub Actions secret.
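
On Linux, GNU coreutils base64 wraps its output at 76 characters by default, which introduces exactly the stray whitespace described below; disable wrapping explicitly and store the output in the secret instead:

base64 -w 0 /path/to/kubeconfig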

If the workflow fails with Unable to connect to the server: EOF during an early kubectl step, treat that as a kubeconfig or API reachability problem, not a missing namespace. Typical causes are:

  • the KUBE_CONFIG secret was copied with the wrong base64 variant or extra whitespace
  • the kubeconfig points at a private control-plane endpoint that GitHub-hosted runners cannot reach
  • the referenced client certificate, token, or exec-based auth command is invalid in CI

For a single-node cluster, the base package now keeps the stateless services leaner by default:

  • cloud-api, cloud-realtime, and runtime-headless each start at 1 replica
  • the HPAs scale from 1 instead of requiring a two-pod baseline (see the HPA sketch after this list)
  • stateless-service CPU and memory requests are reduced so more of the node remains available for the database, Redis, image pulls, and Kubernetes system workloads
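
For the HPA baseline above, a sketch of the single-node shape for one of the services; the replica ceiling and CPU target are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cloud-api
  namespace: chatons-cloud
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cloud-api
  minReplicas: 1
  maxReplicas: 4                  # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75  # illustrative target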

Important operational detail:

  • the workflow pushes images to ghcr.io/<github-owner>/chaton/...
  • the workflow also creates or refreshes a ghcr-auth image pull secret in the chatons-cloud namespace
  • if you prefer public container images, you can make the GHCR package public and remove the pull-secret step later

If you use CHATONS_CLOUD_SECRETS_YAML, store the complete manifest as a GitHub secret, for example:

apiVersion: v1
kind: Secret
metadata:
  name: chatons-cloud-secrets
  namespace: chatons-cloud
type: Opaque
stringData:
  POSTGRES_PASSWORD: "..."
  DATABASE_URL: "postgres://chatons:...@postgres.chatons-cloud.svc.cluster.local:5432/chatons"
  REDIS_URL: "redis://..."
  CHATONS_INTERNAL_SERVICE_TOKEN: "..."
  CHATONS_CLOUD_PUBLIC_URL: "https://cloud.chatons.ai"
  CHATONS_REALTIME_PUBLIC_URL: "wss://realtime.chatons.ai/ws"
  CHATONS_RUNTIME_PUBLIC_URL: "https://cloud.chatons.ai"
  OIDC_CLIENT_ID: "chatons-desktop"
  OIDC_CLIENT_SECRET: ""
  OIDC_ISSUER_URL: "https://cloud.chatons.ai"
  CHATONS_CLOUD_WEB_URL: "https://cloud.chatons.ai"
  JWT_SIGNING_KEY: "..."
  SMTP_HOST: "smtp.example.com"
  SMTP_PORT: "587"
  SMTP_SECURE: "false"
  SMTP_USER: "no-reply@chatons.ai"
  SMTP_PASS: "..."
  SMTP_FROM: "Chatons Cloud <no-reply@chatons.ai>"

For a one-node cluster, use a storage class that supports ReadWriteOnce PVCs on that node. The bundled PostgreSQL and Redis manifests will retain data across pod/container restarts as long as the bound persistent volumes remain intact.


Next Infrastructure Steps

  • Replace the in-cluster PostgreSQL and Redis manifests with managed services or stateful production storage
  • Add observability stack integration
  • Add memory and connection-count based autoscaling signals in addition to CPU
  • Move desktop cloud sessions out of local plaintext SQLite and into an OS-backed secret store
  • Add rollout strategy and canary policy
