Self-Hosted Deployment

This guide walks through deploying the Witan API in your own infrastructure. The API powers the Witan CLI: spreadsheet operations (xlsx calc, xlsx exec, xlsx lint, xlsx render) and document reading (read).

Architecture Overview

┌───────────────────────────────────────────────────────┐
│                 Your Infrastructure                    │
│  ┌─────────────────┐      ┌─────────────────┐        │
│  │   Witan API     │◄────►│   Postgres DB   │        │
│  │   (Docker)      │      │   (16+)         │        │
│  └───┬─────────────┘      └─────────────────┘        │
│      │                                                 │
│      ▼                                                 │
│  ┌─────────────────┐                                  │
│  │   S3 Bucket     │                                  │
│  │   (File Store)  │                                  │
│  └─────────────────┘                                  │
└──────┼─────────────────────────────────────────────────┘
       │ HTTPS

┌──────────────────────┐
│  Witan Infrastructure│
│  ┌────────────────┐  │
│  │ Management API │  │
│  │ (Witan-hosted) │  │
│  └────────────────┘  │
│  - Auth validation   │
│  - Billing           │
└──────────────────────┘

The Witan API runs in your infrastructure and connects to:

  • Witan Management API (Witan-hosted): Handles authentication and billing
  • S3 Storage: Your bucket for file uploads
  • PostgreSQL: Your database for file metadata and state

CLI Endpoints

The Witan CLI uses the following API endpoints, all of which are synchronous request-response:

Stateless Mode (file sent in request body)

| Method | Path | Description |
|--------|------|-------------|
| POST | /v0/orgs/:orgId/xlsx/exec | Execute JavaScript against a workbook |
| POST | /v0/orgs/:orgId/xlsx/calc | Recalculate formulas |
| POST | /v0/orgs/:orgId/xlsx/lint | Run lint diagnostics |
| POST | /v0/orgs/:orgId/xlsx/render | Render range to image |
| POST | /v0/orgs/:orgId/read | Extract text from documents |

Files-Backed Mode (uploaded file tracking)

| Method | Path | Description |
|--------|------|-------------|
| POST | /v0/orgs/:orgId/files | Upload a file |
| PUT | /v0/orgs/:orgId/files/:fileId | Upload a new version |
| GET | /v0/orgs/:orgId/files/:fileId/content | Download file content |
| POST | /v0/orgs/:orgId/files/:fileId/xlsx/exec | Execute JavaScript |
| GET | /v0/orgs/:orgId/files/:fileId/xlsx/calc | Recalculate formulas |
| GET | /v0/orgs/:orgId/files/:fileId/xlsx/lint | Run lint diagnostics |
| GET | /v0/orgs/:orgId/files/:fileId/xlsx/render | Render range to image |
| GET | /v0/orgs/:orgId/files/:fileId/read | Extract text from documents |

All endpoints are synchronous — the CLI sends a request and waits for the complete response.
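
As an illustration of calling a files-backed endpoint directly, the sketch below downloads a stored file with curl. It assumes the Org API Key is passed as a Bearer token (as described under Configuring the CLI) and reuses the CLI's WITAN_API_URL and WITAN_API_KEY variable names; the function name and its arguments are hypothetical.

```shell
# Sketch: download a stored file via GET /v0/orgs/:orgId/files/:fileId/content.
# org_id, file_id, and the output path are placeholders supplied by the caller.
witan_download() {
  local org_id="$1" file_id="$2" out="$3"
  # --fail turns HTTP error statuses into a non-zero exit code.
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${WITAN_API_KEY}" \
    -o "$out" \
    "${WITAN_API_URL%/}/v0/orgs/${org_id}/files/${file_id}/content"
}
```

For example, `witan_download my-org my-file ./workbook.xlsx` would write the file's current content to ./workbook.xlsx, with both IDs standing in for real values.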

Prerequisites

Required

  • Docker: Version 20.10 or later
  • PostgreSQL: Version 16 or later
  • S3 Bucket: With write access (AWS S3 or S3-compatible storage). Versioning is recommended for recoverability — see S3 Versioning Modes.
  • Network Access: Outbound HTTPS to https://management-api.witanlabs.com (authentication and billing)
  • Management API Key: Obtained from Witan support
  • GitHub Account: For pulling the Docker image from GHCR

Resource Requirements

| Configuration | vCPU | RAM | Notes |
|---------------|------|-----|-------|
| Minimum | 1 | 2 GB | Single instance, development/testing |
| Recommended | 2 | 4 GB | Multiple instances for production |

Getting the Docker Image

The Witan API Docker image is available from GitHub Container Registry (GHCR).

Authentication

Witan will invite you to the witanlabs/api repository on GitHub, where Docker images are published with each release. Once you have access, authenticate with GHCR using a GitHub Personal Access Token (classic) with the read:packages scope:

echo YOUR_GITHUB_PAT | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin

Credential Hygiene

The PAT used to pull the image is a long-lived credential, so we recommend a few practices to keep it safe:

  • Scope the token narrowly. The PAT only needs the read:packages scope — nothing else.
  • Set an expiry. Give the token a fixed expiry rather than leaving it non-expiring; this bounds the impact of an unnoticed leak. We suggest an expiry of under 90 days.
  • Rotate on a cadence. Regenerate the token before it expires so pulls keep working uninterrupted.
  • Store it securely. Keep the token in a secrets manager or your CI's secret store — never commit it to source control or share it in plain text.

If a token is ever exposed, revoke it from the GitHub account's settings and generate a replacement.
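
One way to follow these practices is to keep the token out of the command line entirely. The sketch below reads the PAT from a file and passes it to docker login on stdin; the function name and the token file path are hypothetical, and in practice the file would typically be written by your secrets manager.

```shell
# Sketch: log in to GHCR without putting the PAT on the command line.
# pat_file is a hypothetical path to a file containing only the token.
ghcr_login() {
  local username="$1" pat_file="$2"
  if [ ! -r "$pat_file" ]; then
    echo "cannot read token file: $pat_file" >&2
    return 1
  fi
  # --password-stdin reads the token from stdin, so it does not appear in
  # shell history or in the process list.
  docker login ghcr.io -u "$username" --password-stdin < "$pat_file"
}
```

A call such as `ghcr_login YOUR_GITHUB_USERNAME /run/secrets/ghcr_pat` would then replace the echo-based command shown earlier.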

Pulling the Image

# Pull a specific version (recommended for production)
docker pull ghcr.io/witanlabs/api:v1.0.0

# Pull the latest version
docker pull ghcr.io/witanlabs/api:latest

The same image tags support linux/amd64 and linux/arm64. Docker and orchestrators such as ECS or Kubernetes automatically pull the matching variant for the host architecture.

Configuration

Required Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| STAGE | Deployment stage | production |
| AWS_REGION | AWS region for S3. When using S3-compatible storage via AWS_ENDPOINT_URL, any valid value (e.g. us-east-1) can be used. | us-east-1 |
| WITAN_MGMT_API_KEY | Management API key from Witan support | wk_live_... |
| WITAN_MGMT_API_WAF_TOKEN | Shared secret for WAF rate limit tiers | Contact Witan support |
| FILES_S3_BUCKET | S3 bucket name for file uploads | acme-witan-files |
| POSTGRES_DB_URL | PostgreSQL connection string | postgresql://user:pass@host:5432/db |
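
Before starting the container, it can help to confirm that every required variable is set. The following is a minimal sketch of such a check; the function name is hypothetical, and the variable names are the required ones listed above.

```shell
# Sketch: fail fast if a required variable is unset or empty.
check_required_env() {
  local missing=0 name value
  for name in STAGE AWS_REGION WITAN_MGMT_API_KEY WITAN_MGMT_API_WAF_TOKEN \
              FILES_S3_BUCKET POSTGRES_DB_URL; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The function reports every missing name on stderr and returns non-zero if any are absent, so it can gate a deployment script with `check_required_env || exit 1`.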

Optional Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| MIGRATE_DB_ON_STARTUP | Run database migrations on startup | true |
| LOG_LEVEL | Logging verbosity (debug, info, warn, error, silent) | info |
| LOG_STYLE | Log output format (pretty, json, yaml) | json |
| SENTRY_DSN | Sentry error tracking (see below) | |
| AWS_ENDPOINT_URL | S3-compatible storage endpoint (MinIO, etc.) | |
| FILES_S3_VERSIONING | Whether the bucket has versioning (enabled or disabled); see S3 Versioning Modes below | enabled |
| EXEC_RESULT_MAX_BYTES | Max bytes for an xlsx exec script's serialized result | 5242880 (5 MB) |
| EXEC_IMAGE_MAX_COUNT | Max preview images returned from an xlsx exec response; set 0 for no image count cap | 10 |
| XLS_CONVERSION_TIMEOUT_MS | Max wall-clock time for legacy .xls to .xlsx conversion | 120000 (2 min) |
| XLSX_EXEC_SESSION_CACHE_ENABLED | Allow opt-in cache=true on files-backed xlsx endpoints to keep workbook sessions warm | true |
| XLSX_EXEC_SESSION_CACHE_MAX | Max warm files-backed xlsx sessions per API instance | 16 |
| XLSX_EXEC_SESSION_CACHE_TTL_MS | Idle TTL for warm files-backed xlsx sessions | 300000 (5 min) |

Object Storage Setup

The Witan API uses S3-compatible object storage for file uploads.

  1. Create a bucket in your preferred region
  2. Enable versioning (recommended; see S3 Versioning Modes for when this is optional)
  3. Configure credentials with read, write, and delete permissions on the bucket

For S3-compatible storage (MinIO, etc.), set AWS_ENDPOINT_URL to your storage endpoint.

No public access is required.
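
On AWS S3, the steps above could be scripted with the AWS CLI roughly as follows. This is a sketch under assumptions: the function name, bucket name, and region are placeholders, and S3-compatible backends may need --endpoint-url and may not support every call.

```shell
# Sketch: create a private, versioned bucket with the AWS CLI.
create_witan_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 rejects an explicit LocationConstraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi &&
  # Step 2: enable versioning so overwritten objects stay recoverable.
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled &&
  # No public access is required, so block it explicitly.
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
}
```

Credentials for the API itself (step 3) are not covered here; they depend on how your environment issues IAM roles or access keys.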

S3 Versioning Modes

The Witan API supports two bucket configurations, controlled by FILES_S3_VERSIONING. The user-facing CLI surface (xlsx, read, files) behaves the same in both modes for normal single-file workflows; the practical difference is durability and recoverability, not functionality.

| Mode | When to use | Bucket requirement |
|------|-------------|--------------------|
| FILES_S3_VERSIONING=enabled (default) | AWS S3, MinIO, or any S3-compat backend that supports object versioning. | Versioning enabled |
| FILES_S3_VERSIONING=disabled | S3-compat backends that cannot expose versioning (e.g. s3proxy fronting Azure Blob Storage). | Versioning not required |

Why this matters: recoverability. Every xlsx exec --save (or any direct PUT /v0/orgs/:orgId/files/:fileId) overwrites the object at the same S3 key. With versioning enabled, S3 retains the previous bytes as a historical version, recoverable even if the local copy is later lost or corrupted. With versioning disabled, those bytes are permanently overwritten. This safety net matters when:

  • Concurrent writes to the same file path (e.g. overlapping xlsx exec --save invocations from two terminals): last writer wins both on disk and in S3, but versioning preserves the loser's bytes in S3.
  • A local file is overwritten or damaged after upload, and you need to retrieve a previously uploaded revision.

Behavioural differences in disabled mode (mostly visible to clients calling the API directly rather than via the CLI):

  • revision_id values returned by the API are derived from the object's ETag instead of the S3 VersionId.
  • DELETE /v0/orgs/:orgId/files/:fileId is not mounted. Deletion isn't supported when versioning is disabled (no soft-delete via delete markers, and the destructive alternative would be irrecoverable). The CLI does not call this endpoint, but direct API callers will get 404.
  • GET /v0/orgs/:orgId/files/:fileId/revisions is not mounted. Revision listing relies on S3 ListObjectVersions, which requires a versioned bucket. The CLI does not call this endpoint, but direct API callers will get 404.
  • ?revision=X reads strict-match against the current ETag. If they don't match, the API returns a revision_not_found 404 rather than serving stale-but-wrong bytes. In normal CLI usage the cache stays in sync after every write, so this never trips; direct API callers holding older revision IDs will see 404s.

If your backend supports versioning, leave FILES_S3_VERSIONING unset (or set it to enabled). Use disabled only when you genuinely cannot enable versioning on the underlying object store.

Database Setup

PostgreSQL 16 or later is required. Migrations run automatically on startup and are coordinated across instances using a database advisory lock, so it's safe to start multiple instances simultaneously.
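
To check the version requirement before the first start, a sketch like the one below can query the server with psql. The function names are hypothetical; server_version_num is PostgreSQL's integer version setting (for example, 160004 for 16.4).

```shell
# Sketch: confirm the server is PostgreSQL 16 or later.
pg_version_ok() {
  # 160000 corresponds to version 16.0.
  [ "$1" -ge 160000 ]
}

check_postgres() {
  local num
  # -A (unaligned) and -t (tuples only) make psql print just the value.
  num="$(psql "$POSTGRES_DB_URL" -At -c 'SHOW server_version_num')" || return 1
  pg_version_ok "$num"
}
```

check_postgres returns non-zero both when the connection fails and when the server is older than 16, which mirrors what the API verifies for itself at startup.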

Deployment

Running the Container

docker run -d \
  --name witan-api \
  -p 3000:3000 \
  -e STAGE=production \
  -e AWS_REGION=us-east-1 \
  -e WITAN_MGMT_API_KEY=wk_live_xxx \
  -e WITAN_MGMT_API_WAF_TOKEN=xxx \
  -e FILES_S3_BUCKET=my-bucket \
  -e POSTGRES_DB_URL=postgresql://user:pass@host:5432/db \
  ghcr.io/witanlabs/api:v1.0.0
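
Passing secrets with -e leaves them in shell history, so one alternative is Docker's --env-file flag. The sketch below assumes a file of KEY=value lines containing the variables from the Configuration section; the function name, the file path, and the --restart policy are illustrative choices, not requirements.

```shell
# Sketch: run the same container with settings read from an env file.
run_witan_api() {
  local env_file="$1" image_tag="$2"
  docker run -d \
    --name witan-api \
    --restart unless-stopped \
    -p 3000:3000 \
    --env-file "$env_file" \
    "ghcr.io/witanlabs/api:${image_tag}"
}
```

For example, `run_witan_api /etc/witan/api.env v1.0.0`, where /etc/witan/api.env is a hypothetical path readable only by the deploying user.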

Startup Preflight Checks

On startup, the Witan API verifies connectivity to its dependencies. The required preflight checks are:

| Code | Service | Check |
|------|---------|-------|
| 0001 | PostgreSQL | Connection and version (16+) |
| 0002 | S3 Bucket | When FILES_S3_VERSIONING=enabled: accessible and versioning enabled. When disabled: accessible. |
| 0004 | Management API | Connectivity and API key |

You can re-run all checks at any time via GET /health/deps:

curl http://localhost:3000/health/deps
# {"status":"ok","timestamp":"...","checks":[{"code":"0001","severity":"required","status":"pass"}, ...]}
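
In a deployment script, this endpoint can gate later steps until the dependencies pass. The sketch below polls it a bounded number of times; the function name and defaults are hypothetical, and the match assumes the compact JSON shape shown in the sample response, with the top-level status field first.

```shell
# Sketch: poll /health/deps until the top-level status is "ok".
wait_for_deps() {
  local base_url="$1" attempts="${2:-10}" delay="${3:-3}" i=1 body
  while [ "$i" -le "$attempts" ]; do
    # --max-time bounds each probe; a failed request yields an empty body.
    body="$(curl --silent --max-time 3 "${base_url%/}/health/deps")" || body=""
    case "$body" in
      '{"status":"ok"'*) return 0 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  echo "dependencies not ready after $attempts attempts: $body" >&2
  return 1
}
```

For example, `wait_for_deps http://localhost:3000 20 3` retries for up to about a minute and prints the last response body if the checks never pass.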

Health Check

  • Endpoint: GET /health
  • Recommended check interval: 30 seconds
  • Timeout: 3 seconds

curl http://localhost:3000/health
# {"status":"ok","timestamp":"...","meta":{"STAGE":"production","VERSION":"v1.0.0","GIT_SHA":"..."}}

High Availability

The API supports horizontal scaling. Files-backed xlsx endpoints keep a small per-instance cache of warm workbook sessions when clients opt into ?cache=true. Enable session affinity on your load balancer so follow-up requests for the same file reach the same instance, and match the affinity duration to XLSX_EXEC_SESSION_CACHE_TTL_MS (default 5 minutes). Without affinity, cache=true requests still work but typically miss the local cache.

The API emits Set-Cookie: witan_xlsx_session=1 on successful cache=true responses, path-scoped to /v0/orgs/<orgId>/files/<fileId>. Load balancers that key off application-set cookies (e.g. AWS ALB app_cookie stickiness) can pin on it directly. Load balancers that generate their own affinity cookie ignore this header and work as-is; in that case, scope the LB's cookie path similarly where possible so unrelated routes aren't pinned globally.

| Setting | Value |
|---------|-------|
| Frontend port | 443 (HTTPS) |
| Backend port | 3000 (HTTP) |
| Idle timeout | 5 minutes |
| Session affinity | Recommended, duration matched to the cache TTL above |
| Health check path | GET /health |
| Health check interval | 30 seconds |

TLS: Terminate TLS at the load balancer. The container serves plain HTTP on port 3000.

Request body size: The API accepts file uploads up to 25 MB via multipart form data. Ensure your load balancer allows request bodies of at least 25 MB and does not strip or reject multipart/form-data content types.

Headers: The API uses standard HTTP headers. Dropping invalid or malformed headers at the load balancer is recommended for defense in depth.

All CLI operations are synchronous, so a short idle timeout is sufficient. When clients opt into cache=true on files-backed xlsx endpoints, budget memory for up to XLSX_EXEC_SESSION_CACHE_MAX warm workbook processes per API instance.
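
For an AWS Application Load Balancer, the affinity settings described above might be applied as in the sketch below. It uses the AWS CLI's elbv2 target group attributes for application-cookie stickiness; the function names and the target group ARN are placeholders, and other load balancers will need their own equivalent.

```shell
# Sketch: pin cache=true traffic on an ALB target group using the
# witan_xlsx_session cookie set by the API.
ttl_ms_to_seconds() {
  echo $(( $1 / 1000 ))
}

configure_alb_stickiness() {
  # ttl_ms should match XLSX_EXEC_SESSION_CACHE_TTL_MS (default 300000).
  local target_group_arn="$1" ttl_ms="${2:-300000}" seconds
  seconds="$(ttl_ms_to_seconds "$ttl_ms")"
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$target_group_arn" \
    --attributes \
      Key=stickiness.enabled,Value=true \
      Key=stickiness.type,Value=app_cookie \
      Key=stickiness.app_cookie.cookie_name,Value=witan_xlsx_session \
      "Key=stickiness.app_cookie.duration_seconds,Value=${seconds}"
}
```

Deriving the duration from the TTL keeps the two values in step if XLSX_EXEC_SESSION_CACHE_TTL_MS is later changed.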

Configuring the CLI

Authentication

The CLI authenticates to the Witan API using an Org API Key as a Bearer token. Org API Keys are created at app.witanlabs.com.

Pointing the CLI at Your API

Set the WITAN_API_URL environment variable or use the --api-url flag:

# Via environment variable
export WITAN_API_URL=https://your-api.example.com
export WITAN_API_KEY=your-org-api-key

# Or via flags
witan xlsx calc workbook.xlsx --api-url https://your-api.example.com --api-key your-org-api-key

Stateless vs Files-Backed Mode

The CLI supports two modes:

  • Files-backed (default): Uploads the workbook once, then references it by ID for subsequent operations. Faster for repeated operations on the same file.
  • Stateless (--stateless): Sends the full workbook with every request. No data retained on the server between requests.
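
As a small illustration of the two modes, the wrapper below runs the same recalculation either way. The function name and its mode argument are hypothetical; the witan xlsx calc command and the --stateless flag are the ones described in this guide.

```shell
# Sketch: run xlsx calc in files-backed (default) or stateless mode.
calc_workbook() {
  local workbook="$1" mode="${2:-files-backed}"
  case "$mode" in
    files-backed) witan xlsx calc "$workbook" ;;
    stateless)    witan xlsx calc "$workbook" --stateless ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

Stateless mode suits one-off runs where nothing should be retained on the server; files-backed mode suits repeated operations on the same workbook.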

Monitoring & Observability

Logs

By default, the Witan API writes newline-delimited JSON logs to stdout/stderr. Set LOG_STYLE to change the output format.

Key log fields:

  • level: Log severity
  • time: ISO 8601 timestamp
  • message: Log message
  • meta: Contextual data (request details, response status, etc.)

The API avoids logging sensitive data and PII by logging only explicitly selected values. As a backstop, consider configuring data protection during log ingestion. For example, CloudWatch Logs data protection policies can automatically detect and mask sensitive data such as credentials and personal information.

Error Tracking with Sentry

Option 1 (Recommended): Use Witan's Sentry DSN

  • Contact Witan support for the DSN
  • Enables Witan to proactively identify and help resolve issues

Option 2: Use your own Sentry project

  • Set SENTRY_DSN to your project's DSN
  • Full control over error data

Security

For the xlsx execution sandbox, workbook subprocess isolation, resource limits, and recommended container hardening, see the security model.

Witan Management API

The Witan API communicates with the Witan-hosted Management API (https://management-api.witanlabs.com) for the following purposes only:

  • Authentication: Validating API keys and JWTs
  • Authorization: Checking org membership and billing status
  • Billing: Reporting usage metrics (request counts) for billing

No request bodies, response bodies, file content, spreadsheet data, or document content is ever sent to the Management API. All file and document processing happens entirely within your infrastructure.

Network Security

The Witan API can be exposed to the public internet. All API requests require a valid Org API Key as a Bearer token.

Troubleshooting

Verify Health

curl http://localhost:3000/health
# Expected: {"status":"ok", ...}

Verify Dependencies

curl http://localhost:3000/health/deps
# Expected: {"status":"ok", "checks": [...]}

Support

For assistance with self-hosting:

  • Management API Key: Contact Witan support
  • Sentry DSN: Contact Witan support
  • Technical Issues: Contact Witan support