Self-Hosted Deployment
This guide walks through deploying the Witan API in your own infrastructure. It powers the Witan CLI — spreadsheet operations (xlsx calc, xlsx exec, xlsx lint, xlsx render) and document reading (read).
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                   Your Infrastructure                   │
│  ┌─────────────────┐      ┌─────────────────┐           │
│  │    Witan API    │◄────►│   Postgres DB   │           │
│  │    (Docker)     │      │      (16+)      │           │
│  └───┬─────────────┘      └─────────────────┘           │
│      │                                                  │
│      ▼                                                  │
│  ┌─────────────────┐                                    │
│  │    S3 Bucket    │                                    │
│  │  (File Store)   │                                    │
│  └─────────────────┘                                    │
└──────┼──────────────────────────────────────────────────┘
       │ HTTPS
       ▼
┌──────────────────────┐
│ Witan Infrastructure │
│  ┌────────────────┐  │
│  │ Management API │  │
│  │ (Witan-hosted) │  │
│  └────────────────┘  │
│  - Auth validation   │
│  - Billing           │
└──────────────────────┘
The Witan API runs in your infrastructure and connects to:
- Witan Management API (Witan-hosted): Handles authentication and billing
- S3 Storage: Your bucket for file uploads
- PostgreSQL: Your database for file metadata and state
CLI Endpoints
The Witan CLI uses the following API endpoints, all of which are synchronous request-response:
Stateless Mode (file sent in request body)
| Method | Path | Description |
|---|---|---|
| POST | /v0/orgs/:orgId/xlsx/exec | Execute JavaScript against a workbook |
| POST | /v0/orgs/:orgId/xlsx/calc | Recalculate formulas |
| POST | /v0/orgs/:orgId/xlsx/lint | Run lint diagnostics |
| POST | /v0/orgs/:orgId/xlsx/render | Render range to image |
| POST | /v0/orgs/:orgId/read | Extract text from documents |
Files-Backed Mode (uploaded file tracking)
| Method | Path | Description |
|---|---|---|
| POST | /v0/orgs/:orgId/files | Upload a file |
| PUT | /v0/orgs/:orgId/files/:fileId | Upload a new version |
| GET | /v0/orgs/:orgId/files/:fileId/content | Download file content |
| POST | /v0/orgs/:orgId/files/:fileId/xlsx/exec | Execute JavaScript |
| GET | /v0/orgs/:orgId/files/:fileId/xlsx/calc | Recalculate formulas |
| GET | /v0/orgs/:orgId/files/:fileId/xlsx/lint | Run lint diagnostics |
| GET | /v0/orgs/:orgId/files/:fileId/xlsx/render | Render range to image |
| GET | /v0/orgs/:orgId/files/:fileId/read | Extract text from documents |
All endpoints are synchronous — the CLI sends a request and waits for the complete response.
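As a rough illustration of the path shapes in the tables above, the sketch below builds endpoint URLs for both modes and issues a files-backed GET with the Org API Key sent as a Bearer token. The helper names (`witan_stateless_url`, `witan_files_url`, `witan_get`) are hypothetical and not part of the Witan CLI. Request bodies and query parameters are left out because this guide does not specify them.

```shell
# Build a stateless-mode URL: /v0/orgs/:orgId/<operation>
# $1 = API base URL, $2 = org ID, $3 = operation (e.g. xlsx/calc, read)
witan_stateless_url() {
  # ${1%/} drops a single trailing slash from the base URL, if present.
  printf '%s/v0/orgs/%s/%s\n' "${1%/}" "$2" "$3"
}

# Build a files-backed URL: /v0/orgs/:orgId/files/:fileId/<operation>
# $1 = API base URL, $2 = org ID, $3 = file ID, $4 = operation
witan_files_url() {
  printf '%s/v0/orgs/%s/files/%s/%s\n' "${1%/}" "$2" "$3" "$4"
}

# Issue a synchronous GET, authenticating with the Org API Key from
# the WITAN_API_KEY environment variable. Exits non-zero on HTTP errors.
witan_get() {
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${WITAN_API_KEY}" \
    "$1"
}

# Example (org and file IDs are placeholders):
#   witan_get "$(witan_files_url "$WITAN_API_URL" my-org my-file xlsx/lint)"
```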
Prerequisites
Required
- Docker: Version 20.10 or later
- PostgreSQL: Version 16 or later
- S3 Bucket: With write access (AWS S3 or S3-compatible storage). Versioning is recommended for recoverability — see S3 Versioning Modes.
- Network Access: Outbound HTTPS to https://management-api.witanlabs.com (authentication and billing)
- Management API Key: Obtained from Witan support
- GitHub Account: For pulling the Docker image from GHCR
Resource Requirements
| Configuration | vCPU | RAM | Notes |
|---|---|---|---|
| Minimum | 1 | 2 GB | Single instance, development/testing |
| Recommended | 2 | 4 GB | Multiple instances for production |
Getting the Docker Image
The Witan API Docker image is available from GitHub Container Registry (GHCR).
Authentication
Witan will invite you to the witanlabs/api repository on GitHub, where Docker images are published with each release. Once you have access, authenticate with GHCR using a GitHub Personal Access Token (classic) with the read:packages scope:
echo YOUR_GITHUB_PAT | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin
Credential Hygiene
The PAT used to pull the image is a long-lived credential, so we recommend a few practices to keep it safe:
- Scope the token narrowly. The PAT only needs the read:packages scope, nothing else.
- Set an expiry. Give the token a fixed expiry rather than leaving it non-expiring; this bounds the impact of an unnoticed leak. We suggest under 90 days.
- Rotate on a cadence. Regenerate the token before it expires so pulls keep working uninterrupted.
- Store it securely. Keep the token in a secrets manager or your CI's secret store — never commit it to source control or share it in plain text.
If a token is ever exposed, revoke it from the GitHub account's settings and generate a replacement.
Pulling the Image
# Pull a specific version (recommended for production)
docker pull ghcr.io/witanlabs/api:v1.0.0
# Pull the latest version
docker pull ghcr.io/witanlabs/api:latest
The same image tags support linux/amd64 and linux/arm64. Docker and
orchestrators such as ECS or Kubernetes automatically pull the matching
variant for the host architecture.
Configuration
Required Environment Variables
| Variable | Description | Example |
|---|---|---|
| STAGE | Deployment stage | production |
| AWS_REGION | AWS region for S3. When using S3-compatible storage via AWS_ENDPOINT_URL, any valid value (e.g. us-east-1) can be used. | us-east-1 |
| WITAN_MGMT_API_KEY | Management API key from Witan support | wk_live_... |
| WITAN_MGMT_API_WAF_TOKEN | Shared secret for WAF rate limit tiers | Contact Witan support |
| FILES_S3_BUCKET | S3 bucket name for file uploads | acme-witan-files |
| POSTGRES_DB_URL | PostgreSQL connection string | postgresql://user:pass@host:5432/db |
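Before starting the container, it can help to confirm that every required variable is set. The following is a minimal sketch of such a check; the function name `check_required_env` is hypothetical, and the variable list is taken from the table above.

```shell
# Required variables, as listed in the table above.
REQUIRED_VARS="STAGE AWS_REGION WITAN_MGMT_API_KEY WITAN_MGMT_API_WAF_TOKEN FILES_S3_BUCKET POSTGRES_DB_URL"

# Report every required variable that is unset or empty (on stderr).
# Returns 0 when all are present, 1 otherwise.
check_required_env() {
  missing=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup: read the value of the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_required_env || exit 1
```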
Optional Environment Variables
| Variable | Description | Default |
|---|---|---|
| MIGRATE_DB_ON_STARTUP | Run database migrations on startup | true |
| LOG_LEVEL | Logging verbosity (debug, info, warn, error, silent) | info |
| LOG_STYLE | Log output format (pretty, json, yaml) | json |
| SENTRY_DSN | Sentry error tracking (see below) | (none) |
| AWS_ENDPOINT_URL | S3-compatible storage endpoint (MinIO, etc.) | (none) |
| FILES_S3_VERSIONING | Whether the bucket has versioning (enabled or disabled); see S3 Versioning Modes below | enabled |
| EXEC_RESULT_MAX_BYTES | Max bytes for an xlsx exec script's serialized result | 5242880 (5 MB) |
| EXEC_IMAGE_MAX_COUNT | Max preview images returned from an xlsx exec response; set 0 for no image count cap | 10 |
| XLS_CONVERSION_TIMEOUT_MS | Max wall-clock time for legacy .xls to .xlsx conversion | 120000 (2 min) |
| XLSX_EXEC_SESSION_CACHE_ENABLED | Allow opt-in cache=true on files-backed xlsx endpoints to keep workbook sessions warm | true |
| XLSX_EXEC_SESSION_CACHE_MAX | Max warm files-backed xlsx sessions per API instance | 16 |
| XLSX_EXEC_SESSION_CACHE_TTL_MS | Idle TTL for warm files-backed xlsx sessions | 300000 (5 min) |
Object Storage Setup
The Witan API uses S3-compatible object storage for file uploads.
- Create a bucket in your preferred region
- Enable versioning (recommended; see S3 Versioning Modes for when this is optional)
- Configure credentials with read, write, and delete permissions on the bucket
For S3-compatible storage (MinIO, etc.), set AWS_ENDPOINT_URL to your storage endpoint.
No public access is required.
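For AWS S3 specifically, the setup steps above might look like the sketch below, assuming the AWS CLI is installed and configured with sufficient permissions. The `run` and `setup_witan_bucket` helpers are hypothetical; the `aws s3api` subcommands are standard AWS CLI calls. Set `DRY_RUN=1` to print the commands instead of executing them. Other S3-compatible backends will need their own equivalents.

```shell
# Dry-run wrapper: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a private, versioned bucket for Witan file uploads.
# $1 = bucket name, $2 = AWS region
setup_witan_bucket() {
  if [ "$2" = "us-east-1" ]; then
    # us-east-1 is the default location and takes no LocationConstraint.
    run aws s3api create-bucket --bucket "$1" --region "$2" || return 1
  else
    run aws s3api create-bucket --bucket "$1" --region "$2" \
      --create-bucket-configuration "LocationConstraint=$2" || return 1
  fi

  # Enable versioning (matches the default FILES_S3_VERSIONING=enabled).
  run aws s3api put-bucket-versioning --bucket "$1" \
    --versioning-configuration Status=Enabled || return 1

  # Block all public access; the API does not need it.
  run aws s3api put-public-access-block --bucket "$1" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
}

# Example:
#   DRY_RUN=1 setup_witan_bucket acme-witan-files eu-west-1
```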
S3 Versioning Modes
The Witan API supports two bucket configurations, controlled by FILES_S3_VERSIONING. The user-facing CLI surface (xlsx, read, files) behaves the same in both modes for normal single-file workflows; the practical difference is durability and recoverability, not functionality.
| Mode | When to use | Bucket requirement |
|---|---|---|
| FILES_S3_VERSIONING=enabled (default) | AWS S3, MinIO, or any S3-compat backend that supports object versioning. | Versioning enabled |
| FILES_S3_VERSIONING=disabled | S3-compat backends that cannot expose versioning (e.g. s3proxy fronting Azure Blob Storage). | Versioning not required |
Why this matters: recoverability. Every xlsx exec --save (or any direct PUT /v0/orgs/:orgId/files/:fileId) overwrites the object at the same S3 key. With versioning enabled, S3 retains the previous bytes as a historical version, recoverable even if the local copy is later lost or corrupted. With versioning disabled, those bytes are permanently overwritten. This safety net matters when:
- Concurrent writes to the same file path (e.g. overlapping xlsx exec --save invocations from two terminals): last writer wins both on disk and in S3, but versioning preserves the loser's bytes in S3.
- A local file is overwritten or damaged after upload, and you need to retrieve a previously uploaded revision.
Behavioural differences in disabled mode (mostly visible to clients calling the API directly rather than via the CLI):
- revision_id values returned by the API are derived from the object's ETag instead of the S3 VersionId.
- DELETE /v0/orgs/:orgId/files/:fileId is not mounted. Deletion isn't supported when versioning is disabled (no soft-delete via delete markers, and the destructive alternative would be irrecoverable). The CLI does not call this endpoint, but direct API callers will get 404.
- GET /v0/orgs/:orgId/files/:fileId/revisions is not mounted. Revision listing relies on S3 ListObjectVersions, which requires a versioned bucket. The CLI does not call this endpoint, but direct API callers will get 404.
- ?revision=X reads strict-match against the current ETag. If they don't match, the API returns a revision_not_found 404 rather than serving stale-but-wrong bytes. In normal CLI usage the cache stays in sync after every write, so this never trips; direct API callers holding older revision IDs will see 404s.
If your backend supports versioning, leave FILES_S3_VERSIONING unset (or set it to enabled). Use disabled only when you genuinely cannot enable versioning on the underlying object store.
Database Setup
PostgreSQL 16 or later is required. Migrations run automatically on startup and are coordinated across instances using a database advisory lock, so it's safe to start multiple instances simultaneously.
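To confirm the version requirement ahead of the first startup, one option is to query the server's numeric version with psql. This is a sketch only: the function names are hypothetical, and it assumes psql is installed and `POSTGRES_DB_URL` is set as described under Configuration. `server_version_num` is a standard PostgreSQL setting (for example, 160004 for 16.4).

```shell
# Succeeds when a server_version_num value corresponds to PostgreSQL 16+.
# $1 = numeric version, e.g. 160004
pg_version_ok() {
  [ "$1" -ge 160000 ]
}

# Query the server's numeric version and compare it to the minimum.
# -A (unaligned) and -t (tuples only) make psql print just the value.
check_postgres_version() {
  ver=$(psql "$POSTGRES_DB_URL" -At -c 'SHOW server_version_num') || return 1
  pg_version_ok "$ver"
}

# Example:
#   check_postgres_version || echo "PostgreSQL 16 or later is required" >&2
```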
Deployment
Running the Container
docker run -d \
--name witan-api \
-p 3000:3000 \
-e STAGE=production \
-e AWS_REGION=us-east-1 \
-e WITAN_MGMT_API_KEY=wk_live_xxx \
-e WITAN_MGMT_API_WAF_TOKEN=xxx \
-e FILES_S3_BUCKET=my-bucket \
-e POSTGRES_DB_URL=postgresql://user:pass@host:5432/db \
ghcr.io/witanlabs/api:v1.0.0
Startup Preflight Checks
On startup, the Witan API verifies connectivity to its dependencies. The required preflight checks are:
| Code | Service | Check |
|---|---|---|
| 0001 | PostgreSQL | Connection and version (16+) |
| 0002 | S3 Bucket | When FILES_S3_VERSIONING=enabled: accessible and versioning enabled. When disabled: accessible. |
| 0004 | Management API | Connectivity and API key |
You can re-run all checks at any time via GET /health/deps:
curl http://localhost:3000/health/deps
# {"status":"ok","timestamp":"...","checks":[{"code":"0001","severity":"required","status":"pass"}, ...]}
Health Check
- Endpoint: GET /health
- Recommended check interval: 30 seconds
- Timeout: 3 seconds
curl http://localhost:3000/health
# {"status":"ok","timestamp":"...","meta":{"STAGE":"production","VERSION":"v1.0.0","GIT_SHA":"..."}}
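In deployment scripts, a small polling loop can wait for the container to report healthy before traffic is routed to it. The sketch below is one way to do that with curl, using the 3-second timeout suggested above. The function names, attempt count, and delay are illustrative assumptions, and the match on `"status":"ok"` is based only on the example response shown here.

```shell
# Fetch a URL with a 3-second timeout. Kept as a separate function so
# it can be replaced (for example, by a stub in tests).
fetch() {
  curl --fail --silent --max-time 3 "$1"
}

# Poll GET /health until it reports "status":"ok" or attempts run out.
# $1 = base URL, $2 = max attempts (default 10), $3 = delay in seconds (default 2)
wait_for_health() {
  attempts=${2:-10}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if fetch "${1%/}/health" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
#   wait_for_health http://localhost:3000 || echo "API did not become healthy" >&2
```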
High Availability
The API supports horizontal scaling. Files-backed xlsx endpoints keep a small
per-instance cache of warm workbook sessions when clients opt into
?cache=true. Enable session affinity on your load balancer so follow-up
requests for the same file reach the same instance, and match the affinity
duration to XLSX_EXEC_SESSION_CACHE_TTL_MS (default 5 minutes). Without
affinity, cache=true requests still work but typically miss the local cache.
The API emits Set-Cookie: witan_xlsx_session=1 on successful cache=true
responses, path-scoped to /v0/orgs/<orgId>/files/<fileId>. Load balancers
that key off application-set cookies (e.g. AWS ALB app_cookie stickiness)
can pin on it directly. Load balancers that generate their own affinity cookie
ignore this header and work as-is; in that case, scope the LB's cookie path
similarly where possible so unrelated routes aren't pinned globally.
| Setting | Value |
|---|---|
| Frontend port | 443 (HTTPS) |
| Backend port | 3000 (HTTP) |
| Idle timeout | 5 minutes |
| Session affinity | Recommended, duration matched to the cache TTL above |
| Health check path | GET /health |
| Health check interval | 30 seconds |
TLS: Terminate TLS at the load balancer. The container serves plain HTTP on port 3000.
Request body size: The API accepts file uploads up to 25 MB via multipart form data. Ensure your load balancer allows request bodies of at least 25 MB and does not strip or reject multipart/form-data content types.
Headers: The API uses standard HTTP headers. Dropping invalid or malformed headers at the load balancer is recommended for defense in depth.
All CLI operations are synchronous, so a short idle timeout is sufficient. When
clients opt into cache=true on files-backed xlsx endpoints, budget memory for
up to XLSX_EXEC_SESSION_CACHE_MAX warm workbook processes per API instance.
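For an AWS Application Load Balancer, application-cookie stickiness keyed on `witan_xlsx_session` could be configured roughly as sketched below. The helper names are hypothetical; the attribute keys are the standard ALB target group stickiness attributes. The duration is derived from the cache TTL so the two stay matched. Other load balancers will have their own settings.

```shell
# Convert a TTL in milliseconds to whole seconds.
ttl_ms_to_seconds() {
  echo $(( $1 / 1000 ))
}

# Print target group attributes for app-cookie stickiness.
# $1 = XLSX_EXEC_SESSION_CACHE_TTL_MS value (default 300000, i.e. 5 minutes)
alb_stickiness_attributes() {
  secs=$(ttl_ms_to_seconds "${1:-300000}")
  # printf repeats the format for each argument, giving a space-separated list.
  printf '%s ' \
    "Key=stickiness.enabled,Value=true" \
    "Key=stickiness.type,Value=app_cookie" \
    "Key=stickiness.app_cookie.cookie_name,Value=witan_xlsx_session" \
    "Key=stickiness.app_cookie.duration_seconds,Value=$secs"
  echo
}

# Example (TARGET_GROUP_ARN is a placeholder; the substitution is left
# unquoted on purpose so each attribute becomes a separate argument):
#   aws elbv2 modify-target-group-attributes \
#     --target-group-arn "$TARGET_GROUP_ARN" \
#     --attributes $(alb_stickiness_attributes "$XLSX_EXEC_SESSION_CACHE_TTL_MS")
```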
Configuring the CLI
Authentication
The CLI authenticates to the Witan API using an Org API Key as a Bearer token. Org API Keys are created at app.witanlabs.com.
Pointing the CLI at Your API
Set the WITAN_API_URL environment variable or use the --api-url flag:
# Via environment variable
export WITAN_API_URL=https://your-api.example.com
export WITAN_API_KEY=your-org-api-key
# Or via flags
witan xlsx calc workbook.xlsx --api-url https://your-api.example.com --api-key your-org-api-key
Stateless vs Files-Backed Mode
The CLI supports two modes:
- Files-backed (default): Uploads the workbook once, then references it by ID for subsequent operations. Faster for repeated operations on the same file.
- Stateless (--stateless): Sends the full workbook with every request. No data retained on the server between requests.
Monitoring & Observability
Logs
The Witan API writes newline-delimited JSON logs to stdout/stderr (configurable via LOG_STYLE).
Key log fields:
- level: Log severity
- time: ISO 8601 timestamp
- message: Log message
- meta: Contextual data (request details, response status, etc.)
The API avoids logging sensitive data and PII by only logging explicit values. As a backstop, consider configuring data protection during log ingestion — for example, CloudWatch Logs data protection policies can automatically detect and mask sensitive data such as credentials and personal information.
Error Tracking with Sentry
Option 1 (Recommended): Use Witan's Sentry DSN
- Contact Witan support for the DSN
- Enables Witan to proactively identify and help resolve issues
Option 2: Use your own Sentry project
- Set SENTRY_DSN to your project's DSN
- Full control over error data
Security
For the xlsx execution sandbox, workbook subprocess isolation, resource limits, and recommended container hardening, see the security model.
Witan Management API
The Witan API communicates with the Witan-hosted Management API (https://management-api.witanlabs.com) for the following purposes only:
- Authentication: Validating API keys and JWTs
- Authorization: Checking org membership and billing status
- Billing: Reporting usage metrics (request counts) for billing
No request bodies, response bodies, file content, spreadsheet data, or document content is ever sent to the Management API. All file and document processing happens entirely within your infrastructure.
Network Security
The Witan API can be exposed to the public internet. All API requests require a valid Org API Key as a Bearer token.
Troubleshooting
Verify Health
curl http://localhost:3000/health
# Expected: {"status":"ok", ...}
Verify Dependencies
curl http://localhost:3000/health/deps
# Expected: {"status":"ok", "checks": [...]}
Support
For assistance with self-hosting:
- Management API Key: Contact Witan support
- Sentry DSN: Contact Witan support
- Technical Issues: Contact Witan support