Elite deployment guide

Self-hosted aiwonder deployment guide

This guide shows the deployment boundary for firms and departments that need Leapable backend services inside their own environment: PostgreSQL on ZFS, customer-operated OCR, license bootstrap, and an air-gapped path that stays honest about what must be mirrored before network access is removed.

Figure: Leapable local vault and cloud processing data-flow boundary. On-prem architecture diagram: shared services move inside your environment, while user vaults remain local rather than becoming a hosted document silo.
  • Vaults stay local: Each user keeps a local SQLite vault on their workstation.
  • Backend moves inside: Marketplace, workers, Redis, PostgreSQL, TEI, and logs run on customer hosts.
  • OCR needs GPUs: Surya + Marker workers require pinned CUDA-capable hardware; without it, OCR is unavailable.
  • Air-gap is explicit: Artifacts, licenses, model weights, and endpoints must be mirrored first.
Scope

What you are self-hosting

A self-hosted deployment is a customer-run aiwonder stack, not a Docker prerequisite on every end-user machine. Workstations still run the native sidecar. The shared services below move from Leapable-operated infrastructure to customer-operated infrastructure.

Customer aiwonder host

  • Marketplace API, account sessions, billing state, job queues, and support operations.
  • PostgreSQL, PgBouncer, Redis, Caddy, and Cloudflare Tunnel or an equivalent ingress.
  • TEI embedding and reranker endpoints kept warm behind private networking.

Workstation sidecar

  • Native Windows, macOS, or Linux sidecar installed per user.
  • Local SQLite vault, source files, settings, port file, and MCP client configs.
  • Cloud URLs swapped to customer endpoints through managed runtime env.
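
As a rough illustration of the endpoint swap, the sketch below checks that a configured endpoint URL points at a host under the customer's own domain before the sidecar is allowed to use it. The function name, the example URLs, and the domain corp.example.internal are assumptions made for this example; this guide does not specify the actual managed runtime variables.

```shell
# Hypothetical helper: succeed only when the URL's host is the customer
# domain itself or a subdomain of it.
# Usage: endpoint_is_customer_hosted URL CUSTOMER_DOMAIN
endpoint_is_customer_hosted() {
    url=$1
    suffix=$2
    # Strip the scheme, then the path, then the port to isolate the host.
    host=${url#*://}
    host=${host%%/*}
    host=${host%%:*}
    case "$host" in
        "$suffix" | *".$suffix") return 0 ;;
        *) return 1 ;;
    esac
}

# Example call (hypothetical URL and domain):
#   endpoint_is_customer_hosted "https://ocr.corp.example.internal:8443/v1" \
#       "corp.example.internal" || echo "endpoint escapes the customer boundary"
```

An installer wrapper could run this against every configured URL and stop on the first failure, in line with the rule that no call may escape to public hosted infrastructure. The host parsing is deliberately simple and does not handle userinfo or bracketed IPv6 literals.
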
Readiness

System requirements

The customer-operated stack must satisfy these requirements before an on-prem pilot can process private documents. A missing prerequisite must fail the rollout closed rather than fall back silently to public hosted services.

Backend host

  • Linux server with Bun runtime, supervised services, private ingress, and ZFS-backed storage.
  • PostgreSQL plus PgBouncer for central operational state and durable migration ledgers.
  • Redis with append-only persistence for queue state, plus bounded scratch directories for worker jobs.

AI and document services

  • NVIDIA GPU capacity for OCR workers when PDF, Office, and image OCR is in scope.
  • Customer TEI embedding and reranker endpoints reachable only inside approved networks.
  • Native Windows, macOS, or Linux sidecars configured with customer endpoint URLs.
Storage

ZFS + PostgreSQL setup

Keep hot operational state on fast storage, separate archival evidence, and make the backup story explicit before any customer data reaches the stack.

Mount | Purpose | Verification
/zfs/hot/postgres | PostgreSQL data directory for central operational state. | pg_isready, migration ledger rows, WAL archiving enabled.
/zfs/hot/postgres-wal-archive | WAL archive so recovery can prove the exact committed state. | Recent WAL files present and restorable on a scratch host.
/zfs/hot/redis | Redis append-only queue state for workers and events. | AOF path exists, Redis answers locally, no public listener.
/zfs/hot/tmpwork | Bounded scratch for OCR, redaction, and embed jobs. | TTL sweep active and service units restrict writes to this path.
/zfs/archive | Longer-retention ingest, published artifacts, audit exports, and backups. | Snapshot, restic, or equivalent restore drill passes from bytes.
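
A first-pass layout check might look like the sketch below. It only confirms that each path from the table exists as a directory; the function name and the optional prefix argument (useful for dry runs against a staging tree) are assumptions for this example. It does not replace the per-mount verification listed in the table.

```shell
# Mount points taken from the storage table in this guide.
REQUIRED_MOUNTS="/zfs/hot/postgres /zfs/hot/postgres-wal-archive /zfs/hot/redis /zfs/hot/tmpwork /zfs/archive"

# Print every required path that is missing and return non-zero if any is.
# Usage: missing_mounts [PREFIX]
missing_mounts() {
    prefix=${1:-}
    missing=0
    for m in $REQUIRED_MOUNTS; do
        if [ ! -d "$prefix$m" ]; then
            echo "missing: $m"
            missing=1
        fi
    done
    return "$missing"
}

# Example call on the backend host:
#   missing_mounts || echo "storage layout incomplete; do not start services"
```

A directory that exists is not proof of a mounted dataset, so a real rollout would likely pair this with `mountpoint -q` on each path and `pg_isready` against the PostgreSQL instance.
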
Services

Service topology

Deploy the stack as supervised services with fail-closed health checks. A process list is not enough; each service must prove its source of truth after start.

Figure: Leapable provenance graph from file hash through cited answer. The provenance graph is still anchored in the user vault. Backend services accelerate processing and validation.

Minimum supervised units

  • leapable-marketplace on private port 4000.
  • leapable-worker-embed, plus the ingest, redact, and outbox relay units.
  • TEI general, legal, and reranker services on loopback-only ports.
  • Secrets loader writing root-owned env files from the customer vault.
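
One way to read back the loopback-only claim, rather than trusting a process list, is to inspect the listening sockets. The sketch below reads `ss -H -ltn` output on standard input and succeeds only if the given port has at least one listener and every listener is bound to a loopback address. The function name and the port 8081 in the example are assumptions; this guide does not state the TEI port numbers.

```shell
# Succeed only when PORT is listening and every listener is on loopback.
# Expects `ss -H -ltn` lines on stdin (local address is the 4th column).
# Usage: ss -H -ltn | loopback_only PORT
loopback_only() {
    awk -v port="$1" '
        {
            addr = $4
            n = split(addr, parts, ":")
            if (parts[n] != port) next          # listener on another port
            seen = 1
            if (addr !~ /^127\./ && addr !~ /^\[::1\]/) exposed = 1
        }
        END {
            status = (seen && !exposed) ? 0 : 1  # no listener also fails
            exit status
        }'
}

# Example call (hypothetical TEI port):
#   ss -H -ltn | loopback_only 8081 || echo "TEI port missing or exposed"
```

Treating "no listener found" as a failure keeps the check fail-closed: a service that never started is reported the same way as one bound to a public interface.
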
OCR

Local Surya + Marker OCR

Hosted Leapable uses GPU OCR workers for PDFs, Office files, and images. In an on-prem deployment, that path is replaced with a customer-operated GPU worker that runs the same Surya + Marker extraction stack and stamps model revisions into every result.

Fail closed: OCR workers must require CUDA when configured for GPU work. CPU fallback hides capacity failures and changes customer-visible latency.
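
A minimal sketch of that fail-closed rule, assuming the worker is started from a wrapper script: the wrapper probes for a visible GPU and refuses to continue when none is found. `nvidia-smi -L` lists one line per device; the `GPU_PROBE` override and the function name are assumptions added so the check can be exercised on a host without a GPU.

```shell
# Command used to list GPUs. Overridable for dry runs (hypothetical knob).
GPU_PROBE=${GPU_PROBE:-"nvidia-smi -L"}

# Return 0 only when the probe succeeds and reports at least one device.
require_gpu() {
    # $GPU_PROBE is unquoted on purpose so it splits into command and args.
    if gpus=$($GPU_PROBE 2>/dev/null) && [ -n "$gpus" ]; then
        echo "gpu ok"
        return 0
    fi
    echo "no CUDA device visible; refusing to start OCR worker" >&2
    return 1
}

# Example wrapper usage:
#   require_gpu || exit 1
#   exec <worker start command>
```

The point of the design is that there is no CPU branch at all: a missing device stops the worker instead of degrading it.
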

Worker requirements

  • NVIDIA GPU with pinned driver, CUDA, model weights, and image digest.
  • Signed upload/download URLs or an internal object store with equivalent audit.
  • Every output includes worker digest, build date, model revision, and page spans.

Endpoint contract

  • Sidecars call customer OCR endpoints, not public hosted endpoints.
  • Embedding and reranking URLs point at customer TEI endpoints.
  • End-user artifacts still do not ship heavyweight OCR model packages.
Bootstrap

Step-by-step install and license-key bootstrap

The sidecar must prove account binding before it calls customer-hosted services. Do not bypass this step for air-gapped pilots; pre-seed the same state instead.

01 Create the account

Issue the account and license in the customer backend, or import an approved seed.

02 Write local state

The installer writes .license_key, the account email, and the customer endpoint URLs.

03 Bind the runtime

The sidecar validates the account, health, tunnel, and managed runtime kind before use.

04 Read the verdict

Verify the local port file, backend row, service health, and audit trail bytes.
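
The local half of that readback can be sketched as a check that the expected state files exist and are non-empty. Only `.license_key` is named in this guide; the function name, the state directory, and any extra file names passed in (such as a port file) are assumptions for illustration.

```shell
# Succeed only when .license_key and every extra file exist and are non-empty.
# Usage: local_state_ok STATE_DIR [EXTRA_FILE...]
local_state_ok() {
    dir=$1
    shift
    for f in .license_key "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}

# Example call (hypothetical directory and port file name):
#   local_state_ok "$HOME/.leapable" port || echo "bootstrap incomplete"
```

This covers only files on the workstation. The backend row, service health, and audit trail still have to be read from their own sources of truth.
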

Disconnected sites

Air-gapped install path

Air-gapped operation is possible only when every network dependency is mirrored before the cutover. Without a customer GPU OCR path, the disconnected workflow is limited to text-heavy workspaces and locally available AI models.

Pre-seed before disconnect

  • Installer artifacts, update manifests, signatures, and checksums.
  • License records, account bindings, service tokens, and revocation list.
  • TEI weights, OCR weights, Python wheels, and container images.
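
Before the network is removed, the mirror can be checked against its checksum manifest. The sketch below assumes the mirror directory contains a manifest in `sha256sum` format with paths relative to the mirror root; the function name and the manifest name SHA256SUMS are assumptions, and signature verification is a separate step not shown here.

```shell
# Verify every file listed in the manifest; any mismatch or missing file fails.
# Usage: verify_mirror MIRROR_DIR MANIFEST_NAME
verify_mirror() {
    # Run inside the mirror so relative paths in the manifest resolve.
    ( cd "$1" && sha256sum -c "$2" >/dev/null 2>&1 )
}

# Example call (hypothetical mirror path):
#   verify_mirror /zfs/archive/mirror SHA256SUMS || echo "mirror incomplete; do not disconnect"
```

Running the same check again after the cutover gives a byte-level comparison between what was mirrored and what is still on disk.
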

Verify after disconnect

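# Marketplace unit is running under systemd.
systemctl is-active leapable-marketplace
# Marketplace health endpoint answers on its private port.
curl -fsS http://127.0.0.1:4000/health
# PostgreSQL is reachable through PgBouncer.
psql "$AIWONDER_PGBOUNCER_URL" -c "select 1"
# Health endpoint on loopback port 4100 answers locally.
curl -fsS http://127.0.0.1:4100/health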
Controls

Security posture

Self-hosting changes control ownership; it does not weaken Leapable's verification model. The deployment must preserve local vault isolation, transient processing, and source-of-truth readback from persisted state.

Network and secrets

  • Service env files are rendered from the customer secrets vault with root-only permissions.
  • Public hosted endpoints are disabled for air-gapped and customer-hosted modes.

Audit and provenance

  • Source bytes, page spans, worker digests, and chain hashes remain inspectable.
  • Health checks are not accepted without row, file, queue, or artifact readback.
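
The root-only permission rule can be read back from the file system rather than assumed. The sketch below uses GNU `stat` to confirm that an env file is owned by the expected user and has mode 600 or 400; the function name and the example path are assumptions.

```shell
# Succeed only when FILE is owned by OWNER (default root) with mode 600 or 400.
# Usage: env_file_locked_down FILE [EXPECTED_OWNER]
env_file_locked_down() {
    file=$1
    owner=${2:-root}
    # GNU stat: %a is the octal mode, %U the owning user name.
    meta=$(stat -c '%a %U' "$file" 2>/dev/null) || return 1
    case "$meta" in
        "600 $owner" | "400 $owner") return 0 ;;
        *)
            echo "unsafe: $file is '$meta'" >&2
            return 1
            ;;
    esac
}

# Example call (hypothetical path):
#   env_file_locked_down /etc/leapable/marketplace.env || exit 1
```

A missing file fails the check as well, so a secrets loader that never ran is caught by the same readback.
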
Acceptance

Evidence required before rollout

Treat every green command as a claim. The rollout is accepted only after operators read the durable state that proves the deployment did what it said.

Layer | Source of truth | Fail-closed condition
Installer | Signed artifact bytes, manifest hash, sidecar version, local port file. | Manifest points at missing artifact or stale version.
Backend | PostgreSQL rows, migration ledger, Redis queue state, service env files. | Health returns 200 while rows or migrations are absent.
OCR | Worker digest, model revision, page-level OCR result, provenance records. | Worker silently falls back to CPU or unstamped output.
Air-gap | Offline artifact mirror, pre-seeded license, local endpoint health. | Any call escapes to public hosted infrastructure.
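
The backend row of the table can be sketched as a verdict function that accepts only when the health status and the durable state agree. The function name is an assumption, and the table name schema_migrations in the commented usage is hypothetical; this guide does not name the migration ledger table.

```shell
# Accept only when health is 200 AND the ledger row count is a positive integer.
# Usage: backend_verdict HTTP_STATUS MIGRATION_ROW_COUNT
backend_verdict() {
    status=$1
    rows=$2
    if [ "$status" != "200" ]; then
        echo "reject: health status $status"
        return 1
    fi
    case "$rows" in
        '' | *[!0-9]*)
            echo "reject: row count unreadable"
            return 1
            ;;
    esac
    if [ "$rows" -eq 0 ]; then
        echo "reject: healthy but no migration rows"
        return 1
    fi
    echo "accept"
}

# Example wiring (table name is hypothetical):
#   status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:4000/health)
#   rows=$(psql "$AIWONDER_PGBOUNCER_URL" -At -c 'select count(*) from schema_migrations')
#   backend_verdict "$status" "$rows" || exit 1
```

An unreadable row count is rejected rather than treated as zero or skipped, so a failed query cannot pass as a green check.
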
Next step

Use this guide with Security and Elite pricing

The security boundary is documented on the Security page. Enterprise volume and custom deployment fit the Elite tier. Legal, procurement, and infrastructure review should happen before a production cutover.