This guide shows the deployment boundary for firms and departments that need Leapable
backend services inside their own environment: PostgreSQL on ZFS, customer-operated OCR,
license bootstrap, and an air-gapped path that stays honest about what must be mirrored
before network access is removed.
On-prem architecture diagram: shared services move inside your environment, while user
vaults remain local rather than becoming a hosted document silo.
Vaults stay local: Each user keeps a local SQLite vault on their workstation.
Backend moves inside: Marketplace, workers, Redis, PostgreSQL, TEI, and logs run on customer hosts.
OCR needs GPUs: Surya + Marker workers require pinned CUDA-capable hardware; without it, OCR is unavailable.
Air-gap is explicit: Artifacts, licenses, model weights, and endpoints must be mirrored first.
Scope
What you are self-hosting
A self-hosted deployment is a customer-run aiwonder stack, not a Docker prerequisite on
every end-user machine. Workstations still run the native sidecar. The shared services
below move from Leapable-operated infrastructure to customer-operated infrastructure.
Customer aiwonder host
Marketplace API, account sessions, billing state, job queues, and support
operations.
PostgreSQL, PgBouncer, Redis, Caddy, Cloudflare Tunnel or equivalent
ingress.
TEI embedding and reranker endpoints kept warm behind private networking.
Workstation sidecar
Native Windows, macOS, or Linux sidecar installed per user.
Local SQLite vault, source files, settings, port file, and MCP client
configs.
Cloud URLs swapped to customer endpoints through managed runtime env.
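The endpoint swap can be sketched as a resolver that refuses to start when a customer URL is missing or points at a public host. This is a minimal illustration, not the sidecar's actual logic: the key names in `REQUIRED_KEYS`, the function `resolve_endpoints`, and the `blocked_suffixes` list are all assumptions, since this guide does not list the real managed runtime env keys or the hosted domains.

```python
from urllib.parse import urlparse

# Hypothetical key names; the guide does not list the real managed runtime env keys.
REQUIRED_KEYS = ("MARKETPLACE_URL", "OCR_URL", "EMBED_URL", "RERANK_URL")


def resolve_endpoints(env, blocked_suffixes):
    """Return the customer endpoint URLs, or raise instead of using a hosted default."""
    endpoints = {}
    for key in REQUIRED_KEYS:
        url = (env.get(key) or "").strip()
        if not url:
            raise RuntimeError(f"{key} is unset; refusing to fall back to a hosted endpoint")
        host = urlparse(url).hostname
        if not host:
            raise RuntimeError(f"{key} is not a valid URL: {url!r}")
        # Reject any host on the operator-supplied list of public hosted domains.
        if any(host == s or host.endswith("." + s) for s in blocked_suffixes):
            raise RuntimeError(f"{key} points at a blocked public host: {host}")
        endpoints[key] = url
    return endpoints
```

An operator might call `resolve_endpoints(os.environ, ["hosted.example"])` at sidecar start, where `hosted.example` is a placeholder for whichever public domains the site wants to forbid.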
Readiness
System requirements
The customer-operated stack must satisfy these requirements before an on-prem pilot can
process private documents. Missing prerequisites fail the rollout closed rather than
silently falling back to public hosted services.
Backend host
Linux server with Bun runtime, supervised services, private ingress, and
ZFS-backed storage.
PostgreSQL plus PgBouncer for central operational state and durable migration
ledgers.
Redis append-only queue state and bounded scratch directories for worker
jobs.
AI and document services
NVIDIA GPU capacity for OCR workers when PDF, Office, and image OCR is in
scope.
Customer TEI embedding and reranker endpoints reachable only inside approved
networks.
Native Windows, macOS, or Linux sidecars configured with customer endpoint
URLs.
Storage
ZFS + PostgreSQL setup
Keep hot operational state on fast storage, separate archival evidence, and make the
backup story explicit before any customer data reaches the stack.
Mount: /zfs/hot/postgres
Purpose: PostgreSQL data directory for central operational state, plus the WAL archive so recovery can prove the exact committed state.
Verification: Recent WAL files present and restorable on a scratch host.

Mount: /zfs/hot/redis
Purpose: Redis append-only queue state for workers and events.
Verification: AOF path exists, Redis answers locally, no public listener.

Mount: /zfs/hot/tmpwork
Purpose: Bounded scratch for OCR, redaction, and embed jobs.
Verification: TTL sweep active and service units restrict writes to this path.

Mount: /zfs/archive
Purpose: Longer-retention ingest, published artifacts, audit exports, and backups.
Verification: Snapshot, restic, or equivalent restore drill passes from bytes.
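The "recent WAL files present" verification can be approximated with a small recency check. This is a sketch under assumptions: the archive directory path and the acceptable age are site decisions this guide does not specify, and the function names are illustrative. It covers only the presence half of the check; restorability still requires an actual restore on a scratch host.

```python
import os
import time


def newest_file_age(directory, now=None):
    """Return the age in seconds of the most recently modified file in directory."""
    now = time.time() if now is None else now
    mtimes = [entry.stat().st_mtime for entry in os.scandir(directory) if entry.is_file()]
    if not mtimes:
        raise RuntimeError(f"no files in {directory}; the archive is empty")
    return now - max(mtimes)


def check_wal_recency(directory, max_age_seconds, now=None):
    """Fail closed when the archive directory is missing, empty, or stale."""
    if not os.path.isdir(directory):
        raise RuntimeError(f"{directory} does not exist")
    age = newest_file_age(directory, now)
    if age > max_age_seconds:
        raise RuntimeError(f"newest archived file is {age:.0f}s old, limit is {max_age_seconds}s")
    return age
```

Raising instead of returning a boolean keeps the check fail-closed: a caller that forgets to inspect the result still stops.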
Services
Service topology
Deploy the stack as supervised services with fail-closed health checks. A process list
is not enough; each service must prove its source of truth after start.
The provenance graph is still anchored in the user vault. Backend services accelerate
processing and validation.
Minimum supervised units
leapable-marketplace on private port 4000.
leapable-worker-embed, plus the ingest, redact, and outbox relay workers.
TEI general, legal, and reranker services on loopback-only ports.
Secrets loader writing root-owned env files from the customer vault.
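One way to make a service "prove its source of truth" is to pair the health status with a readback of persisted rows. The sketch below assumes a hypothetical `migration_ledger` table with a `name` column and a hypothetical function `backend_verdict`; neither is taken from the Leapable schema. It accepts any Python DB-API 2.0 connection, so a PostgreSQL driver connection would be passed in production.

```python
def backend_verdict(http_status, conn, expected_migrations):
    """Accept a service only when the health status and the persisted ledger agree."""
    if http_status != 200:
        return (False, f"health returned {http_status}")
    try:
        cur = conn.cursor()
        # Hypothetical table name; the point is to read back what was persisted.
        cur.execute("SELECT name FROM migration_ledger")
        applied = {row[0] for row in cur.fetchall()}
    except Exception as exc:
        # A missing table or a dead connection is a failure, not a warning.
        return (False, f"ledger readback failed: {exc}")
    missing = sorted(set(expected_migrations) - applied)
    if missing:
        return (False, f"health is 200 but migrations are absent: {missing}")
    return (True, "health and ledger agree")
```

A 200 response alone never yields a passing verdict here; the ledger rows have to be present as well.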
OCR
Local Surya + Marker OCR
Hosted Leapable uses GPU OCR workers for PDFs, Office files, and images. In an on-prem
deployment, that path is replaced with a customer-operated GPU worker that runs the same
Surya + Marker extraction stack and stamps model revisions into every result.
Fail closed: OCR workers must require CUDA when configured for GPU work.
CPU fallback hides capacity failures and changes customer-visible latency.
Worker requirements
NVIDIA GPU with pinned driver, CUDA, model weights, and image digest.
Signed upload/download URLs or an internal object store with equivalent
audit.
Every output includes worker digest, build date, model revision, and page
spans.
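The fail-closed rule and the stamping requirement can be sketched together. Everything named here is illustrative: `WorkerStamp`, `select_device`, `stamp_result`, and the page dictionary shape are assumptions, not the worker's real interface. The CUDA probe is passed in as a boolean so the sketch stays self-contained; in a PyTorch-based worker it could come from `torch.cuda.is_available()`.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class WorkerStamp:
    worker_digest: str
    build_date: str
    model_revision: str


def select_device(gpu_ocr_configured, cuda_available):
    """Return "cuda", or None when OCR is out of scope; never fall back to CPU."""
    if not gpu_ocr_configured:
        return None  # OCR is disabled for this deployment
    if not cuda_available:
        raise RuntimeError("GPU OCR is configured but CUDA is unavailable; refusing CPU fallback")
    return "cuda"


def stamp_result(pages, stamp):
    """Attach the worker stamp to an OCR result; reject unstamped or span-less output."""
    fields = asdict(stamp)
    empty = [name for name, value in fields.items() if not value]
    if empty:
        raise ValueError(f"unstamped output, missing: {empty}")
    if not pages or any("span" not in page for page in pages):
        raise ValueError("every page needs a page span")
    return {**fields, "pages": list(pages)}
```

Returning `None` when OCR is not configured keeps "no OCR" distinct from "OCR on CPU", which the fail-closed rule rules out.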
Endpoint contract
Sidecars call customer OCR endpoints, not public hosted endpoints.
Embedding and reranking URLs point at customer TEI endpoints.
End-user artifacts still do not ship heavyweight OCR model packages.
Bootstrap
Step-by-step install and license-key bootstrap
The sidecar must prove account binding before it calls customer-hosted services. Do not
bypass this step for air-gapped pilots; pre-seed the same state instead.
01 Create the account: Issue the account and license in the customer backend or import an approved seed.
02 Write local state: Installer writes .license_key, account email, and customer endpoint URLs.
03 Bind the runtime: Sidecar validates account, health, tunnel, and managed runtime kind before use.
04 Read the verdict: Verify the local port file, backend row, service health, and audit trail bytes.
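Steps 02 and 03 can be sketched as a local-state read followed by ordered binding probes. Only the `.license_key` file name comes from this guide; the `bootstrap.json` file, its `account_email` and `endpoints` fields, and both function names are assumptions made for illustration.

```python
import json
from pathlib import Path


def read_local_state(state_dir):
    """Load the installer-written state, failing when any part is missing."""
    root = Path(state_dir)
    key_file = root / ".license_key"
    if not key_file.is_file() or not key_file.read_text().strip():
        raise RuntimeError("missing or empty .license_key")
    # Hypothetical file name and layout for the account email and endpoint URLs.
    cfg_file = root / "bootstrap.json"
    if not cfg_file.is_file():
        raise RuntimeError("missing bootstrap.json")
    cfg = json.loads(cfg_file.read_text())
    for field in ("account_email", "endpoints"):
        if not cfg.get(field):
            raise RuntimeError(f"bootstrap.json lacks {field}")
    return {"license_key": key_file.read_text().strip(), **cfg}


def bind_runtime(probes):
    """Run ordered (name, probe) pairs; stop at the first probe that is not True."""
    passed = []
    for name, probe in probes:
        if probe() is not True:
            raise RuntimeError(f"binding failed at {name}; passed so far: {passed}")
        passed.append(name)
    return passed
```

For an air-gapped pilot the same two functions would run unchanged against pre-seeded files, which matches the instruction to pre-seed state and not bypass the step.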
Disconnected sites
Air-gapped install path
Air-gapped operation is possible only when every network dependency is mirrored before
the cutover. Without a customer GPU OCR path, the disconnected workflow is limited to
text-heavy workspaces and locally available AI models.
Pre-seed before disconnect
Installer artifacts, update manifests, signatures, and checksums.
License records, account bindings, service tokens, and revocation list.
TEI weights, OCR weights, Python wheels, and container images.
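Before disconnecting, the mirror can be checked against a checksum manifest. The sketch below assumes a manifest shaped as a mapping from relative path to SHA-256 hex digest; that format and the function names are hypothetical, and signature verification is deliberately left out because this guide does not name a signing scheme.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model weights do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(mirror_root, manifest):
    """Return a list of problems; an empty list means every listed artifact matches."""
    problems = []
    root = Path(mirror_root)
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(target) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

The cutover would proceed only when `verify_mirror` returns an empty list for installers, weights, wheels, and images alike.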
Self-hosting changes control ownership; it does not weaken Leapable's verification
model. The deployment must preserve local vault isolation, transient processing, and
source-of-truth readback from persisted state.
Network and secrets
Service env files are rendered from the customer secrets vault with root-only
permissions.
Public hosted endpoints are disabled for air-gapped and customer-hosted
modes.
Health checks are not accepted without row, file, queue, or artifact
readback.
Acceptance
Evidence required before rollout
Treat every green command as a claim. The rollout is accepted only after operators read
the durable state that proves the deployment did what it said.
Layer: Installer
Source of truth: Signed artifact bytes, manifest hash, sidecar version, local port file.
Fail-closed condition: Manifest points at missing artifact or stale version.

Layer: Backend
Source of truth: PostgreSQL rows, migration ledger, Redis queue state, service env files.
Fail-closed condition: Health returns 200 while rows or migrations are absent.

Layer: OCR
Source of truth: Worker digest, model revision, page-level OCR result, provenance records.
Fail-closed condition: Worker silently falls back to CPU or unstamped output.

Layer: Air-gap
Source of truth: Offline artifact mirror, pre-seeded license, local endpoint health.
Fail-closed condition: Any call escapes to public hosted infrastructure.
Next step
Use this guide with Security and Elite pricing
The security boundary is documented on the Security page. Enterprise volume and custom
deployment fit the Elite tier.
Legal, procurement, and infrastructure review should happen before a production cutover.