Operations Runbook

Public verification commands, release boundaries, and current infrastructure blockers.

Use this page to verify the current bootstrap infrastructure before product runtime work starts.

This runbook does not sign off production app readiness. It proves the docs lane, app edge gate, shell Cloud Run probe, and managed infrastructure baseline.

Public Edge Checks

Run:

scripts/verify-edge.sh

Expected result:

  • docs.gorunchat.com/ returns 200.
  • docs.gorunchat.com/docs/deployment-path/ returns 200.
  • docs CSS assets return 200.
  • docs responses include x-gorunchat-edge: docs.
  • app.gorunchat.com/healthz returns 200.
  • api.gorunchat.com/readyz returns 200.
  • app and API product paths return deterministic 503.
  • app and API responses include x-gorunchat-edge: app-gate.

The Edge Verification GitHub workflow runs the same checks without Cloudflare or GCP secrets.
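
For orientation, one of these checks could be sketched as below. This is not the contents of scripts/verify-edge.sh. The helper names http_status, has_edge_header, and check_edge are hypothetical, and the sketch assumes only curl, tr, and grep.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from scripts/verify-edge.sh.

# Print the HTTP status code for a URL (redirects are not followed).
http_status() {
  curl -sS -o /dev/null -w '%{http_code}' "$1"
}

# Read raw response headers on stdin; succeed if x-gorunchat-edge equals $1.
has_edge_header() {
  tr -d '\r' | grep -i -x "x-gorunchat-edge: *$1" >/dev/null
}

# Check one URL against an expected status and an expected edge header value.
check_edge() {
  ce_url=$1; ce_status=$2; ce_edge=$3
  ce_got=$(http_status "$ce_url") || return 1
  if [ "$ce_got" != "$ce_status" ]; then
    echo "FAIL $ce_url: status $ce_got, want $ce_status" >&2
    return 1
  fi
  if ! curl -sS -o /dev/null -D - "$ce_url" | has_edge_header "$ce_edge"; then
    echo "FAIL $ce_url: missing x-gorunchat-edge: $ce_edge" >&2
    return 1
  fi
  echo "ok $ce_url"
}

# Example usage (performs network requests):
#   check_edge https://docs.gorunchat.com/ 200 docs
#   check_edge https://app.gorunchat.com/healthz 200 app-gate
#   check_edge https://api.gorunchat.com/readyz 200 app-gate
```

The sketch needs no credentials, which is consistent with the workflow running without Cloudflare or GCP secrets.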

Cloud Run Shell Checks

Run:

scripts/verify-cloud-run.sh

Expected result:

  • service: gorunchat-api.
  • region: us-central1.
  • latest ready revision: gorunchat-api-00004-mf9 until the next shell release.
  • runtime service account: gorunchat-run.
  • Direct VPC egress uses gorunchat-core and gorunchat-us-central1.
  • unauthenticated /health returns 403.
  • unauthenticated /readyz returns 403.
  • authenticated /health returns 200.
  • authenticated /readyz returns 200 with MongoDB URI, Redis URI, and Redis CA present.
  • authenticated /probes/redis returns 200 with Redis ok.
  • authenticated /probes/mongo returns 200 with MongoDB ok.
  • authenticated /probes/all returns 200 with Redis and MongoDB ok.

The script checks unauthenticated denial first. It then uses gorunchat-deployer impersonation for runtime probes. It does not build, push, or deploy.
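
The denial-first ordering might look roughly like the sketch below. It is an illustration, not the script itself. The function names are hypothetical, and the service URL and the full service-account email are placeholders the caller must supply, because this page does not state them. It assumes gcloud can mint an identity token through impersonation with the --impersonate-service-account and --audiences flags.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; URL and account email are placeholders.

# Succeed when a status code is the expected unauthenticated denial.
is_denied() {
  [ "$1" = "403" ]
}

# Print the status code for a URL, sending a bearer token when $2 is set.
probe_status() {
  if [ -n "${2:-}" ]; then
    curl -sS -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $2" "$1"
  else
    curl -sS -o /dev/null -w '%{http_code}' "$1"
  fi
}

# Check denial first, then run the authenticated probes. Nothing is built or deployed.
verify_service() {
  vs_url=$1; vs_sa=$2
  for vs_path in /health /readyz; do
    if ! is_denied "$(probe_status "$vs_url$vs_path")"; then
      echo "FAIL unauthenticated $vs_path did not return 403" >&2
      return 1
    fi
  done
  vs_token=$(gcloud auth print-identity-token \
    --impersonate-service-account="$vs_sa" --audiences="$vs_url") || return 1
  for vs_path in /health /readyz /probes/redis /probes/mongo /probes/all; do
    if [ "$(probe_status "$vs_url$vs_path" "$vs_token")" != "200" ]; then
      echo "FAIL authenticated $vs_path did not return 200" >&2
      return 1
    fi
  done
  echo "ok"
}

# Example usage (placeholders, performs network requests):
#   verify_service "$SERVICE_URL" "$DEPLOYER_SA_EMAIL"
```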

Infrastructure Drift Check

Run:

GOOGLE_OAUTH_ACCESS_TOKEN="$(gcloud auth print-access-token)" tofu -chdir=infra/terraform plan -input=false -no-color

Expected result:

  • no changes.
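
For unattended runs, the same plan can be made machine-checkable with the -detailed-exitcode flag, which exits 0 for no changes, 1 for an error, and 2 when changes are pending. The wrapper below is a sketch. The function names drift_verdict and check_drift are hypothetical and are not part of the repository.

```shell
#!/bin/sh
# Sketch only: drift_verdict and check_drift are hypothetical helper names.

# Map the exit code of `tofu plan -detailed-exitcode` to a verdict.
drift_verdict() {
  case "$1" in
    0) echo "no-drift" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run the plan from this runbook, send its output to stderr, print the verdict.
check_drift() {
  cd_code=0
  GOOGLE_OAUTH_ACCESS_TOKEN="$(gcloud auth print-access-token)" \
    tofu -chdir=infra/terraform plan -input=false -no-color -detailed-exitcode \
    >&2 || cd_code=$?
  drift_verdict "$cd_code"
}

# Example usage:
#   [ "$(check_drift)" = "no-drift" ] || echo "investigate before continuing" >&2
```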

The Terraform Config GitHub workflow validates format and provider schema. It does not read remote state.

Docs Deploy

Use the Docs Site GitHub workflow.

The docs-production environment belongs to the hey-jj/gorunchat repository. It is not a separate repository.

Manual deploy path:

  1. Open Actions.
  2. Select Docs Site.
  3. Run workflow from main.
  4. Set deploy to true.
  5. Confirm the Deploy job uses environment docs-production.

The deployed Worker is gorunchat-docs.

Current public hostname:

  • https://docs.gorunchat.com

Fallback operator hostname:

  • https://gorunchat-docs.labs-testing.workers.dev

Docs deploy credentials stay in the docs lane. The runtime must not use them.

App Edge Deploy

Use the App Edge GitHub workflow.

Manual deploy path:

  1. Open Actions.
  2. Select App Edge.
  3. Run workflow from main.
  4. Set deploy to true.
  5. Confirm the Deploy job uses environment edge-production.

The deployed Worker is gorunchat-app-gate.

Current public hostnames:

  • https://app.gorunchat.com
  • https://api.gorunchat.com

Product runtime is not connected behind the app edge gate. Product paths must keep returning deterministic 503 until runtime proof exists.

Cloud Run Release Boundary

The Cloud Run Release GitHub workflow is build-only.

It validates:

  • Go tests.
  • Docker image build.
  • local /health smoke check.
  • local /readyz smoke check.
  • runtime script syntax.

It does not push to Artifact Registry. It does not deploy Cloud Run.

Operator shell releases use:

scripts/release-cloud-run.sh

The release script requires a clean repo, gcloud, curl, and either local Docker or Cloud Build. It deploys by impersonating gorunchat-deployer.

Set BUILD_BACKEND=cloudbuild to force the remote build path. The default auto mode uses Cloud Build when local Docker is unavailable.
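
The selection logic can be pictured with the sketch below. It is an assumption about how such a switch might be written, not an excerpt from scripts/release-cloud-run.sh. The helper names are hypothetical, and the docker value is assumed for symmetry, since this page documents only cloudbuild and the default auto mode.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the "docker" value is an assumption.

# Succeed when a local Docker CLI exists and its daemon answers.
docker_available() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

# Print the build backend implied by BUILD_BACKEND (default: auto).
choose_backend() {
  case "${BUILD_BACKEND:-auto}" in
    cloudbuild) echo cloudbuild ;;
    docker) echo docker ;;
    auto)
      if docker_available; then echo docker; else echo cloudbuild; fi
      ;;
    *)
      echo "unknown BUILD_BACKEND: $BUILD_BACKEND" >&2
      return 2
      ;;
  esac
}

# Example usage:
#   BUILD_BACKEND=cloudbuild choose_backend
```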

The build context excludes docs, edge Workers, Terraform state, dependency folders, build output, local token files, and environment files.

Do not add service-account keys. GitHub Cloud Run deploy is not configured.

Cloudflare Country Boundary

Current live enforcement is Worker-level:

  • docs Worker blocks non-US Cloudflare country metadata.
  • docs Worker blocks missing country metadata on public hostnames.
  • docs assets pass through Worker-first routing.
  • app edge Worker blocks non-US Cloudflare country metadata.
  • app edge Worker blocks missing country metadata on public hostnames.

Zone-level WAF country blocking is not managed yet.

The current .cftoken can list zone Rulesets. It cannot read the custom firewall phase entrypoint or create the required custom Ruleset.

As of the latest check, the local Wrangler OAuth login on its own is not valid. Use GitHub deploy tokens for CI deploys. Pass .cftoken as CLOUDFLARE_API_TOKEN for local read checks.

Cloudflare returned:

{"success":false,"errors":[{"message":"request is not authorized"}]}

Needed Cloudflare permission for zone gorunchat.com:

  • Rulesets Read.
  • Rulesets Write.
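
To confirm whether a token can read the custom firewall phase entrypoint, a read-only probe along these lines may help. It is a sketch. The function names are hypothetical, the zone ID is a placeholder, and the token is assumed to sit in .cftoken in the current directory. The endpoint is the Cloudflare Rulesets API entrypoint for the http_request_firewall_custom phase.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the zone ID is a placeholder argument.

# Read a Cloudflare API JSON response on stdin; succeed when "success" is true.
cf_success() {
  tr -d ' \n' | grep '"success":true' >/dev/null
}

# Succeed when the token in .cftoken can read the custom firewall entrypoint.
can_read_firewall_entrypoint() {
  curl -sS -H "Authorization: Bearer $(cat .cftoken)" \
    "https://api.cloudflare.com/client/v4/zones/$1/rulesets/phases/http_request_firewall_custom/entrypoint" \
    | cf_success
}

# Example usage (placeholder zone ID, performs a network request):
#   can_read_firewall_entrypoint "$ZONE_ID" && echo "read ok" || echo "read denied"
```

The probe checks only read access. Write access would still need to be confirmed separately.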

Intended zone-level rule:

(http.host in {"docs.gorunchat.com" "app.gorunchat.com" "api.gorunchat.com"} and ip.src.country ne "US")

Rule action:

  • block.

After the permission is granted, create or update the zone-level rule and keep the Worker-level hard stop in place.

Run:

scripts/apply-cloudflare-us-only-ruleset.sh

The script creates the custom firewall entrypoint when it is missing. If the entrypoint exists, it adds or updates the GoRunChat country rule.
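
The branching described above can be sketched as a small decision helper plus the rule payload. This illustrates the flow and is not the script's source. The function names and the description string are hypothetical, and the sketch assumes the entrypoint lookup answers 404 when no custom firewall entrypoint exists yet.

```shell
#!/bin/sh
# Sketch only: function names and the description string are hypothetical.

# Decide the next step from the entrypoint lookup status ($1) and the id of
# an existing GoRunChat country rule ($2, empty when no such rule exists).
ruleset_action() {
  if [ "$1" = "404" ]; then
    echo create-entrypoint
  elif [ "$1" != "200" ]; then
    echo "unexpected entrypoint status: $1" >&2
    return 1
  elif [ -n "${2:-}" ]; then
    echo update-rule
  else
    echo add-rule
  fi
}

# Print the rule body: the intended expression from this runbook with action block.
country_rule_json() {
  cat <<'EOF'
{
  "description": "GoRunChat US-only (hypothetical description)",
  "action": "block",
  "expression": "(http.host in {\"docs.gorunchat.com\" \"app.gorunchat.com\" \"api.gorunchat.com\"} and ip.src.country ne \"US\")"
}
EOF
}

# Example usage:
#   ruleset_action 404        # prints create-entrypoint
#   ruleset_action 200 ""     # prints add-rule
#   ruleset_action 200 abc123 # prints update-rule
```

Keeping the decision separate from the API calls lets the branching be exercised without a token.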

Production Readiness Boundary

Production app readiness is still No.

Missing proof:

  • product runtime implementation.
  • auth, provider, SSE, tools, MCP, files, messages, and audit routes.
  • product runtime connection behind the app edge gate.
  • product health checks through the chosen ingress.
  • SSE validation through the chosen ingress.
  • full runtime secret matrix.
  • W5A fail-closed audit and discovery proof.
  • rollback target.
  • production app DNS signoff.