Self-Hosted xlsx-serve Security Model

This document describes the security model for self-hosted Witan API deployments that serve Witan CLI spreadsheet operations through xlsx-serve.

It applies to the CLI-facing xlsx endpoints, including stateless /xlsx/exec requests and files-backed /files/:fileId/xlsx/exec requests.

Security Boundary

The Witan API runs in your infrastructure. For xlsx operations, the request path is:

  1. The Witan CLI sends an authenticated HTTPS request to your API.
  2. The API validates the bearer token with the Witan Management API.
  3. Workbook bytes are read from the request body or from your configured S3 bucket.
  4. The API starts or leases an xlsx-serve subprocess and communicates with it over JSON-RPC on stdin/stdout.
  5. xlsx-serve opens the workbook, runs the requested operation, and returns JSON and, when requested, updated workbook bytes.
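
Step 4 can be pictured with a small sketch. This document does not specify the wire framing, so the example assumes newline-delimited JSON-RPC 2.0 messages. It uses a stand-in child process written in Python rather than the real xlsx-serve binary, and the method name workbook.ping is hypothetical.

```python
import json
import subprocess
import sys

# Stand-in child: reads one JSON-RPC request per line and echoes the params.
# This is NOT xlsx-serve; it only mimics the stdin/stdout request/response shape.
CHILD_SOURCE = """
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"echo": req["params"]}}
    sys.stdout.write(json.dumps(resp) + "\\n")
    sys.stdout.flush()
"""


def call_child(method: str, params: dict, wait_timeout_s: float = 10.0) -> dict:
    """Start a dedicated child, send one request, read one response, then close it."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        response = json.loads(proc.stdout.readline())
    finally:
        proc.stdin.close()   # EOF lets the child leave its read loop
        proc.stdout.close()
        proc.wait(timeout=wait_timeout_s)
    if response.get("id") != request["id"]:
        raise RuntimeError("response id does not match request id")
    return response["result"]
```

The point of the shape is that the child receives only what the parent writes to its stdin and can answer only on its stdout. A real deployment would also need a read deadline on the response, which this sketch leaves out.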

The Witan Management API is used for authentication, authorization, and billing metadata only. Request bodies, response bodies, workbook content, document content, and spreadsheet data are not sent to the Witan Management API.

xlsx-serve provides an application-level sandbox and resource limits for workbook operations. It is not a kernel-level sandbox. Production deployments should still run the API container with normal container hardening, resource quotas, network controls, and least-privilege runtime settings.

JavaScript Execution Sandbox

xlsx exec scripts run inside the xlsx-serve subprocess using an embedded JavaScript interpreter. They do not run in Node.js or a browser.

The interpreter is configured with:

  • No host runtime interop from JavaScript.
  • No require, Node.js process, filesystem, shell, or host networking APIs.
  • No browser APIs such as fetch, DOM access, storage, or workers.
  • Rejected JavaScript module imports.
  • No binary-memory, shared-memory, atomics, or realm-construction globals such as ArrayBuffer, typed arrays, DataView, SharedArrayBuffer, Atomics, or ShadowRealm.
  • A fixed host API surface: xlsx, print, console, and JSON-compatible input.
  • JSON-only arguments and results across the JavaScript-to-workbook bridge.
  • Strict-mode execution inside a generated async wrapper.

The workbook API exposed to JavaScript is a narrow wrapper around supported spreadsheet operations. Scripts can inspect and mutate the open workbook through that API, but they cannot directly access arbitrary host files, spawn processes, open sockets, or call host runtime methods.

Excel VBA macros are not executed by xlsx-serve. Macro-enabled workbooks may be opened as workbook files, but the xlsx exec code path runs only the JavaScript provided in the exec request.

Process and Workbook Isolation

Each workbook session is backed by a dedicated xlsx-serve child process.

For stateless /xlsx/exec requests, the API creates a temporary workbook file, starts a dedicated xlsx-serve process for the request, and closes the session after the response. Stateless requests do not reuse workbook process state between calls.

For files-backed xlsx endpoints, the default behavior is also to create a session for the request and close it afterward. Clients may opt into warm session reuse via cache=true on /xlsx/exec, /xlsx/calc, and /xlsx/edit. Read-only file endpoints (/xlsx/meta, /xlsx/view, /xlsx/lint, /xlsx/find, /xlsx/render, /content) opportunistically borrow an already warm session for the same file when one exists. Warm sessions reduce repeated workbook open latency, but do not change the authorization or workbook revision checks.

Cached exec sessions have these isolation properties:

  • The cache is local to each API instance.
  • The cache key includes organization, authenticated principal, file ID, file revision, and locale.
  • A cached session is leased by one request at a time; concurrent requests for the same cache key are serialized.
  • Sessions are closed on eviction, idle TTL expiry, process crash, request errors, and unsaved workbook mutations.
  • A successful save=true mutation rekeys the session to the new file revision.
  • If a script mutates a workbook without save=true, the session is evicted so later requests cannot observe dirty in-memory state.
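
The properties above can be sketched as a small in-memory model. This is an illustration only and does not show the actual implementation: the class and method names are hypothetical, and TTL expiry, capacity eviction, and crash handling are left out.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Optional


class CacheKey(NamedTuple):
    organization: str
    principal: str
    file_id: str
    revision: str
    locale: str


@dataclass
class WarmSession:
    key: CacheKey
    dirty: bool = False    # set by the caller when a script mutates the workbook
    closed: bool = False
    lease: threading.Lock = field(default_factory=threading.Lock)


class SessionCache:
    """Per-instance cache; nothing here is shared between API instances."""

    def __init__(self) -> None:
        self._sessions: Dict[CacheKey, WarmSession] = {}
        self._guard = threading.Lock()

    def acquire(self, key: CacheKey) -> WarmSession:
        while True:
            with self._guard:
                session = self._sessions.setdefault(key, WarmSession(key))
            session.lease.acquire()  # serializes requests that share a key
            if not session.closed and session.key == key:
                return session
            session.lease.release()  # evicted or rekeyed while waiting; retry

    def release(self, session: WarmSession, saved_revision: Optional[str] = None) -> None:
        with self._guard:
            if saved_revision is not None:
                # Successful save: move the session to the new revision's key.
                self._sessions.pop(session.key, None)
                new_key = session.key._replace(revision=saved_revision)
                if new_key in self._sessions:
                    session.closed = True  # slot already taken; drop this one
                else:
                    session.key, session.dirty = new_key, False
                    self._sessions[new_key] = session
            elif session.dirty:
                # Unsaved mutation: evict so nobody observes dirty state.
                self._sessions.pop(session.key, None)
                session.closed = True
        session.lease.release()
```

Because the revision is part of the key, a request for an older revision never receives a session that has since been saved forward.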

Warm session caching is controlled by:

Variable | Default | Purpose
XLSX_EXEC_SESSION_CACHE_ENABLED | true | Allows opt-in cache=true on files-backed xlsx endpoints.
XLSX_EXEC_SESSION_CACHE_MAX | 16 | Maximum warm sessions per API instance.
XLSX_EXEC_SESSION_CACHE_TTL_MS | 300000 | Idle TTL for warm sessions.
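
A deployment check might surface the effective values along these lines. The variable names and defaults come from the table above. The helper names and the rule that only the string "true" enables the cache are assumptions, since this document does not describe how the API parses these variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SessionCacheConfig:
    enabled: bool
    max_sessions: int
    idle_ttl_ms: int


def load_session_cache_config(env: Mapping[str, str] = os.environ) -> SessionCacheConfig:
    """Read the three cache variables, falling back to the documented defaults."""
    raw_enabled = env.get("XLSX_EXEC_SESSION_CACHE_ENABLED", "true")
    return SessionCacheConfig(
        # Assumption: any value other than "true" (case-insensitive) disables the cache.
        enabled=raw_enabled.strip().lower() == "true",
        max_sessions=int(env.get("XLSX_EXEC_SESSION_CACHE_MAX", "16")),
        idle_ttl_ms=int(env.get("XLSX_EXEC_SESSION_CACHE_TTL_MS", "300000")),
    )
```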

Session affinity is not required for correctness but is strongly recommended when clients opt into cache=true, because the cache is per API instance. See the load-balancer guidance.

Exec Request Limits

The API and xlsx-serve both validate exec request sizes and timeouts.

Limit | Default / Maximum | Notes
Authenticated workbook upload size | 25 MB | Multipart stateless uploads and files-backed workbook uploads.
Guest stateless upload size | 5 MB | Applies only when guest access is enabled.
Exec code size | 64 KB | UTF-8 encoded request code.
Exec input size | 64 KB | Serialized JSON input.
Exec timeout | 30s default, 90s max | Whole JavaScript evaluation timeout for authenticated calls.
Guest exec timeout | 10s default, 20s max | Applies only when guest access is enabled.
Serialized result size | 5 MB default | Configurable with EXEC_RESULT_MAX_BYTES.
Stdout byte size | 5 MB | Hard cap enforced in xlsx-serve.
Printed output characters | 50000 max | Optional max_output_chars request setting.
Preview images returned from exec | 10 default | Configurable with EXEC_IMAGE_MAX_COUNT; set 0 for no image count cap. Renderer pixel/range caps still apply.
JavaScript execution stack depth | 256 | Applies to script call and constructor dispatch, including bound/proxy wrappers and built-in constructor paths.
JavaScript same-function recursion depth | 1000 | Interpreter recursion guard; the execution stack guard is the lower effective depth cap for exec scripts.
JavaScript built-in/native recursion depth | 256 | Applies to recursive interpreter built-ins such as JSON serialization and array flattening/string conversion.
JavaScript JSON.parse nesting depth | 64 | Applies to JSON parsed by the interpreter, including exec input hydration.
JavaScript regular expression timeout | 25ms | Applies to regular expressions evaluated inside exec scripts.
API request timeout | 3 minutes | HTTP request guard for xlsx routes.
RPC timeout | 90s default | Guard between API and xlsx-serve; exec uses the requested timeout plus a small grace period.
xlsx-serve idle timeout | 60s | Exits an idle child process if the API does not keep it alive.
Legacy .xls conversion timeout | 120s default | Configurable with XLS_CONVERSION_TIMEOUT_MS.

The exec timeout is enforced inside the JavaScript interpreter. The API also applies RPC and HTTP request timeouts, and closes the workbook session when an exec call times out.
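
A client can check the authenticated size and timeout limits before sending a request. The sketch below is a hypothetical client-side helper, not part of the Witan CLI. It assumes "KB" means 1024 bytes and that a compact JSON encoding approximates the server's serialized input size.

```python
import json
from typing import Any, List, Optional

KB = 1024  # assumption: "KB" in the limits table is read as 1024 bytes
MAX_CODE_BYTES = 64 * KB
MAX_INPUT_BYTES = 64 * KB
DEFAULT_TIMEOUT_S = 30
MAX_TIMEOUT_S = 90
MAX_OUTPUT_CHARS = 50_000


def preflight_exec(
    code: str,
    input_value: Any = None,
    timeout_s: int = DEFAULT_TIMEOUT_S,
    max_output_chars: Optional[int] = None,
) -> List[str]:
    """Return a list of limit violations; an empty list means nothing was flagged."""
    problems: List[str] = []

    # The code limit is counted in UTF-8 bytes, not characters.
    code_bytes = len(code.encode("utf-8"))
    if code_bytes > MAX_CODE_BYTES:
        problems.append(f"code is {code_bytes} bytes; limit is {MAX_CODE_BYTES}")

    # Compact encoding; the server's exact serialization may differ slightly.
    encoded = json.dumps(input_value, separators=(",", ":"), ensure_ascii=False)
    input_bytes = len(encoded.encode("utf-8"))
    if input_bytes > MAX_INPUT_BYTES:
        problems.append(f"input is {input_bytes} bytes; limit is {MAX_INPUT_BYTES}")

    if not 0 < timeout_s <= MAX_TIMEOUT_S:
        problems.append(f"timeout {timeout_s}s is outside 1..{MAX_TIMEOUT_S}s")

    if max_output_chars is not None and not 0 <= max_output_chars <= MAX_OUTPUT_CHARS:
        problems.append(f"max_output_chars must be within 0..{MAX_OUTPUT_CHARS}")

    return problems
```

A preflight like this only saves a round trip. The API and xlsx-serve remain the enforcement points, and guest calls use the lower limits listed in the table.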

Workbook Operation Input Limits

Workbook operations reject inputs that exceed Excel workbook bounds or operation-specific resource budgets before allocating large intermediate data.

Limit | Value | Applies to
Address row bounds | 1 to 1048576 | Cell and range address parsing.
Address column bounds | 1 to 16384 (XFD) | Cell and range address parsing.
Search matcher array entries | 100 | findCells and findRows.
Search matcher pattern length | 4096 chars | String and regex matchers.
Search pagination values | Non-negative integers | limit, offset, and context.
Custom number format length | 255 chars | Excel-style number format parsing.
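
The address bounds can be illustrated with a minimal A1-style cell parser. This is a sketch of the bounds check only; the function names are hypothetical, and the real parser also handles ranges and sheet names, which are not covered here.

```python
import re
from typing import Tuple

MAX_ROW = 1_048_576
MAX_COL = 16_384  # column XFD

# Optional "$" anchors, one to three letters, then up to seven ASCII digits.
_CELL = re.compile(r"\$?([A-Za-z]{1,3})\$?([0-9]{1,7})")


def column_number(letters: str) -> int:
    """Convert column letters to a 1-based index (A=1, Z=26, AA=27)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cell(address: str) -> Tuple[int, int]:
    """Return (row, column) for a single cell address, or raise ValueError."""
    m = _CELL.fullmatch(address)
    if m is None:
        raise ValueError(f"not a cell address: {address!r}")
    col = column_number(m.group(1))
    row = int(m.group(2))
    if not 1 <= col <= MAX_COL:
        raise ValueError(f"column {m.group(1).upper()} is outside A..XFD")
    if not 1 <= row <= MAX_ROW:
        raise ValueError(f"row {row} is outside 1..{MAX_ROW}")
    return row, col
```

Rejecting out-of-range addresses at parse time means later steps never size a buffer from an attacker-chosen row or column number.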

Formula Evaluation Limits

Formula calculation is bounded at several levels:

  • Individual formulas use a default 5s calculation timeout.
  • Public calculation operations that accept timeoutMs reject values greater than 90s.
  • Workbook calculation is also bounded by the RPC timeout and the API request timeout for the calculation operation as a whole.
  • Formula evaluation triggered from xlsx exec is additionally bounded by the whole-script exec timeout.
  • Formula evaluation observes the workbook cancellation token, so closing a timed-out request also cancels in-flight calculation work.
  • Formula text is limited to 8192 characters.
  • Formula parsing and expression-tree construction reject formulas deeper than 512 parser or expression-tree nesting levels before those formulas are accepted for write or calculation paths.
  • Formula parsing enforces Excel-compatible caps for function argument count (255), function nesting depth (64), LAMBDA parameters (253), and LET name/value pairs (126).
  • LET, LAMBDA, and defined-name formula invocation have a recursion depth limit of 32.
  • Calc-engine operations that materialize two-dimensional dynamic arrays reject arrays above 10,000,000 elements or outside Excel's row/column limits.
  • Spill recalculation uses bounded retry passes to avoid unbounded spill cascades.

For public CLI calc endpoints, the RPC timeout and the API request timeout together form the global wall-clock budget for the calculation request.
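
Two of the static limits in the list above, formula text length and dynamic array size, can be checked before any allocation happens. The sketch below uses hypothetical function names and assumes the character limit is counted in Unicode code points, which this document does not state.

```python
MAX_FORMULA_CHARS = 8192
MAX_ARRAY_ELEMENTS = 10_000_000
MAX_ROWS = 1_048_576  # Excel row limit
MAX_COLS = 16_384     # Excel column limit (XFD)


def check_formula_text(formula: str) -> None:
    """Reject over-long formula text before it reaches the parser."""
    # Assumption: length is measured in code points, as len() does in Python.
    if len(formula) > MAX_FORMULA_CHARS:
        raise ValueError(f"formula has {len(formula)} chars; limit is {MAX_FORMULA_CHARS}")


def check_array_shape(rows: int, cols: int) -> int:
    """Return the element count if the shape is allowed; raise before allocating."""
    if rows < 1 or cols < 1:
        raise ValueError("array dimensions must be positive")
    if rows > MAX_ROWS or cols > MAX_COLS:
        raise ValueError("array exceeds Excel's row/column limits")
    elements = rows * cols
    if elements > MAX_ARRAY_ELEMENTS:
        raise ValueError(f"array has {elements} elements; limit is {MAX_ARRAY_ELEMENTS}")
    return elements
```

Both the per-dimension check and the element-count check are needed: a 1,048,576 by 16,384 array fits within Excel's grid but is far above the 10,000,000-element budget.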

Rendering Limits

Rendering for ranges and tiles runs inside the workbook subprocess. Rendering requests are bounded before bitmap allocation.

Limit | Value
Device pixel ratio (dpr) | 1 to 3
Zoom | 0.5 to 2.0
Maximum bitmap width | 65535 px
Maximum bitmap height | 65535 px
Maximum bitmap area | 100,000,000 px
Maximum decoded embedded image size | 100,000,000 px
Maximum embedded image byte size | 10 MB
Maximum rows per tile render request | 8192
Maximum columns per tile render request | 2048
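
Checking bounds before bitmap allocation can be sketched as follows. The limits are taken from the table above. The function names are hypothetical, and the formula that scales a logical size by dpr and zoom is an assumption about how the output bitmap is sized.

```python
import math
from typing import Tuple

MAX_SIDE_PX = 65_535
MAX_AREA_PX = 100_000_000
MAX_TILE_ROWS = 8192
MAX_TILE_COLS = 2048


def check_tile(rows: int, cols: int) -> None:
    """Reject tile requests that span too many rows or columns."""
    if not 1 <= rows <= MAX_TILE_ROWS or not 1 <= cols <= MAX_TILE_COLS:
        raise ValueError("tile request exceeds the row/column caps")


def check_bitmap(
    logical_width: float, logical_height: float, dpr: float = 1.0, zoom: float = 1.0
) -> Tuple[int, int]:
    """Return the bitmap size in pixels, or raise before anything is allocated."""
    if not 1 <= dpr <= 3:
        raise ValueError("dpr must be between 1 and 3")
    if not 0.5 <= zoom <= 2.0:
        raise ValueError("zoom must be between 0.5 and 2.0")
    # Assumption: output pixels = logical size * dpr * zoom, rounded up.
    scale = dpr * zoom
    width = math.ceil(logical_width * scale)
    height = math.ceil(logical_height * scale)
    if width < 1 or height < 1:
        raise ValueError("bitmap must be at least 1x1")
    if width > MAX_SIDE_PX or height > MAX_SIDE_PX:
        raise ValueError("bitmap side exceeds 65535 px")
    if width * height > MAX_AREA_PX:
        raise ValueError("bitmap area exceeds 100,000,000 px")
    return width, height
```

The area cap is the binding limit for most large requests: a bitmap at the maximum side length in both dimensions would be over 4 billion pixels.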

Embedded workbook fonts are used for deterministic rendering. The renderer does not need access to arbitrary host font files.

Regex Safety

User-facing search and replace operations that accept regular expressions use a non-backtracking regex engine and a 25ms regex match timeout. This limits catastrophic backtracking from caller-supplied patterns. Unsupported constructs for the non-backtracking engine are rejected rather than silently falling back to a backtracking engine.

Regex handling in customer-facing xlsx operations uses the non-backtracking engine. Where callers supply the regex pattern itself, such as search, replace, and Excel REGEXTEST, REGEXEXTRACT, and REGEXREPLACE formulas, the implementation also applies the explicit 25ms match timeout described above.

JavaScript regular expressions evaluated inside xlsx exec scripts use the embedded JavaScript interpreter's regular expression engine and the same 25ms execution timeout. Regex timeout errors are returned as exec runtime errors.