DataSitr — Architecture Whitepaper

Foreword

To our guests, partners, and evaluators:

DataSitr lets Saudi organizations use modern AI systems without sending raw personal data across borders by default. The gateway sits between your apps and the AI providers your tenants choose; it detects personal data, applies the privacy transformation the lane requires, and routes the request by sensitivity, tenant policy, and Saudi residency rules.

DataSitr is not a chatbot wrapper or a generic gateway. It is a Saudi-context privacy boundary built around three commitments: detect personal data thoroughly in both Arabic and English, vault what should not leave the Kingdom, and record every routing decision so that a buyer or a regulator can verify what happened rather than take our word for it.

This is a live-pilot architecture description, not a certification or accreditation claim. What follows is intentionally precise: each section describes what the system does today, the artifacts it produces, and the limits we are honest about. Read this whitepaper alongside the trust, compliance, and resources pages — those carry the dated evidence behind the architecture described here.

Context worth stating up front: PDPL enforcement is operational. SDAIA confirmed 48 enforcement decisions in January 2026. Penalties under the PDPL include administrative fines of up to SAR 5 million (doubled for repeat violations) and up to two years' imprisonment for intentional sensitive-data violations. The technical work this whitepaper describes exists to make compliance demonstrable, not asserted, in that environment.

Sulaiman Husam Mohammad-Ali Abonami, Founder & Architect

Gateway operating model

The gateway intercepts each request before it reaches an AI provider, changes the payload according to privacy risk, and records the decision for later review. The four steps below — intercept, detect, transform, route + record — happen for every request in scope.

01 Intercept

The request enters DataSitr before any model call. Tenant policy, route configuration, and request metadata are loaded at the edge of the pipeline.

02 Detect

Presidio, spaCy, Saudi recognizers, and Arabic NER operate together. Regex is one layer, not the architecture.

03 Transform

Direct identifiers become typed placeholders where reversible protection is allowed. Highest-risk text can remain intact only for in-Kingdom processing.

04 Route + record

The policy engine chooses green, amber, red, or blocked. Compliance metadata is written alongside the processing record.
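The four steps above can be sketched as a single request handler. This is a minimal illustration under stated assumptions, not DataSitr's implementation: the names `handle`, `detect`, `transform`, `Decision`, and the `external_allowed` policy key are hypothetical, and one toy email regex stands in for the full detector stack.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in detector: one toy email pattern instead of the real stack.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Decision:
    lane: str      # "green", "amber", "red", or "blocked"
    payload: str   # text the provider would be allowed to see
    record: dict   # metadata written alongside the processing record


def detect(text):
    # 02 Detect: return unique (type, value) pairs in order of first appearance.
    values = dict.fromkeys(m.group() for m in EMAIL.finditer(text))
    return [("EMAIL", v) for v in values]


def transform(text, entities):
    # 03 Transform: swap each detected value for a typed placeholder.
    for i, (kind, value) in enumerate(entities, start=1):
        text = text.replace(value, f"[[{kind}:{i:02d}]]")
    return text


def handle(request_text, tenant_policy):
    # 01 Intercept: tenant policy is in hand before any model call is made.
    entities = detect(request_text)
    payload = transform(request_text, entities)
    # 04 Route + record: a deliberately simplified two-way choice for illustration.
    lane = "green" if tenant_policy.get("external_allowed") else "amber"
    record = {"lane": lane, "entity_types": sorted({k for k, _ in entities})}
    return Decision(lane, payload, record)
```

A call such as `handle("Contact sara@example.sa today", {"external_allowed": True})` yields a payload with the address replaced by `[[EMAIL:01]]` and a record naming the lane and the entity types found.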

Current production posture: as of 2026-05-20, the live DataSitr deployment runs on Alibaba ACK in Riyadh as primary, with scoped drill-standby evidence for GCP Dammam. The May 4 customer-route cutover to ACK passed a 4-hour soak; the May 16 Dammam drill covers DNS / GKE / TLS routing only. The platform runs Alibaba KMS startup bootstrap, encrypted vault storage, machine-readable compliance records, and active alerting, and it publishes dated continuity evidence. Outside current claims: cross-cloud database replication, auth failover, data-tier failover, HSM-backed custody, and a fully refreshed same-origin browser/session proof pack on this exact baseline.

Arabic detector research

The hard part is Arabic and Saudi context: names, organizations, government identifiers, mixed Arabic-English prompts, and safe Arabic prose that should not be redacted. DataSitr treats this as a measured detector program, not a single model toggle.

Backbone: Wojood-warmstarted Arabic NER

The Arabic model sits behind structural and contextual gates. It improves recall without letting every Arabic noun become a person.
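One way to picture "structural and contextual gates" is a filter applied to model output before a span counts as personal data. The sketch below is purely illustrative: the `Span` type, the thresholds, and the honorific cue list are assumptions chosen for the example, not the gates DataSitr ships.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str       # candidate entity text proposed by the NER model
    label: str      # e.g. "PERSON"
    score: float    # model confidence in [0, 1]
    preceding: str  # token immediately before the span


# Hypothetical context cues: Arabic honorifics that often precede a person name.
PERSON_CUES = {"السيد", "السيدة", "الدكتور", "المهندس"}


def passes_gates(span, min_score=0.85, cue_score=0.60):
    # Structural gate: discard empty or single-character spans.
    if len(span.text.strip()) < 2:
        return False
    # High-confidence spans pass on their own.
    if span.score >= min_score:
        return True
    # Contextual gate: a weaker PERSON span passes only next to a known cue.
    return (
        span.label == "PERSON"
        and span.score >= cue_score
        and span.preceding in PERSON_CUES
    )
```

Under these assumed thresholds, a 0.70-confidence name that follows an honorific is kept, while a 0.70-confidence common noun with no cue is dropped.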

Saudi layer: National IDs, IBANs, phones, CRs, local names

Saudi-specific recognizers cover local document shapes and name signals that generic PII libraries do not prioritize.
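As an example of a structural recognizer, a Saudi IBAN candidate can be checked by shape and then by the ISO 13616 mod-97 checksum, which rejects most digit strings that merely look like an IBAN. This is a sketch, not DataSitr's recognizer: the function name is hypothetical, and the shape pattern (country code `SA`, two check digits, a two-digit bank code, eighteen alphanumeric characters) is stated here as an assumption about the Saudi layout.

```python
import re

# Assumed Saudi IBAN shape: "SA" + 2 check digits + 2-digit bank code + 18 alphanumerics.
SA_IBAN_SHAPE = re.compile(r"^SA\d{2}\d{2}[A-Z0-9]{18}$")


def looks_like_saudi_iban(candidate):
    compact = candidate.replace(" ", "").upper()
    if not SA_IBAN_SHAPE.match(compact):
        return False
    # ISO 13616 check: move the first four characters to the end,
    # map letters to numbers (A=10 ... Z=35), and require remainder 1 mod 97.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

Pairing the pattern with the checksum is a design choice that favors precision: a reference number that happens to start with `SA` and has the right length will almost always fail the mod-97 step.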

Benchmark: +72 pp (percentage points) recall over vanilla Presidio on the public detector benchmark

The trust and benchmark pages publish dated, sanitized detector artifacts so the claim can be inspected without a slide deck.
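For readers checking the published artifacts, a percentage-point comparison is the plain difference between two recall values, not a relative improvement. The helper names below are hypothetical, and the counts in the usage note are invented for illustration; they are not the published benchmark figures.

```python
def recall(true_positives, false_negatives):
    # Share of real entities that the detector found.
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def precision(true_positives, false_positives):
    # Share of detector hits that were real entities.
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def percentage_point_delta(candidate, baseline):
    # Difference of two rates, expressed in percentage points.
    return round((candidate - baseline) * 100, 1)
```

With made-up counts, a baseline that finds 25 of 100 entities and a candidate that finds 75 of 100 differ by `percentage_point_delta(recall(75, 25), recall(25, 75))`, which is 50.0 percentage points.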

Guardrail: False-positive controls for Arabic safe text

Hard negatives, literary Arabic, and support-text cases are part of the detector discipline so privacy protection does not become unusable redaction.

Vault, tokenization, rehydration

For reversible lanes, DataSitr replaces detected entities with typed placeholders and stores originals in a tenant-scoped encrypted vault. Rehydration is allowed only in the requesting tenant and request context.

Implemented behavior

1. Detect entities in the inbound prompt.
2. Generate typed placeholders such as [[PERSON:01]].
3. Encrypt original values with AES-256-GCM under tenant-scoped keys.
4. Re-scan transformed text before external eligibility.
5. Rehydrate approved responses only for the original tenant and request context.

This is not format-preserving encryption and not stateless masking. The encrypted state exists because the product must support auditable, tenant-scoped rehydration for approved responses.
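The placeholder and rehydration steps can be sketched with an in-memory store keyed by tenant and request. This is an illustration under stated assumptions: `TenantVault` and its methods are hypothetical names, and the sketch deliberately omits the AES-256-GCM encryption and key management that the real vault applies, so it shows the scoping logic only.

```python
import re

PLACEHOLDER = re.compile(r"\[\[([A-Z_]+):(\d{2})\]\]")


class TenantVault:
    """In-memory stand-in for the vault; values are NOT encrypted in this sketch."""

    def __init__(self):
        # (tenant_id, request_id, placeholder) -> original value
        self._store = {}

    def tokenize(self, tenant_id, request_id, text, entities):
        counters = {}
        for kind, value in entities:
            counters[kind] = counters.get(kind, 0) + 1
            token = f"[[{kind}:{counters[kind]:02d}]]"
            self._store[(tenant_id, request_id, token)] = value
            text = text.replace(value, token)
        return text

    def residual(self, text, entities):
        # Re-scan stand-in: report any original value still present after transformation.
        return [value for _, value in entities if value in text]

    def rehydrate(self, tenant_id, request_id, text):
        def restore(match):
            key = (tenant_id, request_id, match.group(0))
            # Unknown tenant or request: leave the placeholder untouched.
            return self._store.get(key, match.group(0))

        return PLACEHOLDER.sub(restore, text)
```

Because the lookup key includes both the tenant and the request, a response carrying `[[PERSON:01]]` is restored only for the context that created it; any other tenant sees the placeholder unchanged.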

Three-lane routing

The policy decision is intentionally conservative. Each request becomes green, amber, red, or blocked based on identifiability, sensitivity, tenant policy, and configured provider paths.

Green — tokenized external
Detected direct identifiers become typed placeholders, then the transformed text is rescanned before it can route to eligible global providers.
Amber — pseudonymized in-Kingdom
The text is transformed, but processing stays on operator-configured Saudi-hosted provider paths.
Red — raw in-Kingdom or blocked
Highest-risk categories, including PDPL Article 1(11)-defined sensitive data, stay intact only for configured in-Kingdom execution. If no compliant path exists, the request is blocked.
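The lane rules above reduce to a short decision function. The sketch assumes three hypothetical policy flags (`external_allowed`, `in_kingdom_pseudonymized`, `in_kingdom_raw`) and two precomputed inputs; the real policy engine weighs more signals than this.

```python
from enum import Enum


class Lane(Enum):
    GREEN = "green"      # tokenized, eligible for external providers
    AMBER = "amber"      # pseudonymized, in-Kingdom only
    RED = "red"          # raw, in-Kingdom only
    BLOCKED = "blocked"  # no compliant path


def choose_lane(has_sensitive_data, rescan_clean, policy):
    # Highest-risk content never leaves the Kingdom; without a configured path it is blocked.
    if has_sensitive_data:
        return Lane.RED if policy.get("in_kingdom_raw") else Lane.BLOCKED
    # External routing requires a clean re-scan and tenant permission.
    if rescan_clean and policy.get("external_allowed"):
        return Lane.GREEN
    # Otherwise fall back to Saudi-hosted processing if one is configured.
    if policy.get("in_kingdom_pseudonymized"):
        return Lane.AMBER
    return Lane.BLOCKED
```

The ordering is the conservative part: the sensitive-data check runs first, so no combination of tenant flags can move such a request onto an external path.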

Compliance records

Every architecture decision needs a review surface. DataSitr records machine-readable processing metadata per request: classification, destination, legal or policy basis, and evidence material for export.
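A per-request record of this kind might be serialized as one JSON line. The field names below are assumptions chosen to mirror the four items named above (classification, destination, basis, evidence); they are not DataSitr's published schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProcessingRecord:
    request_id: str
    tenant_id: str
    classification: str  # lane chosen for the request
    destination: str     # provider path the payload was sent to, if any
    basis: str           # legal or policy basis for the processing
    evidence_ref: str    # pointer to exportable evidence material
    recorded_at: str     # ISO 8601 timestamp


def to_json_line(record):
    # Sorted keys keep the output stable, which helps diffing and later signing.
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
```

Freezing the dataclass is a small design choice that makes a record immutable once created, which suits an audit trail.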

RoPA: Records of Processing Activities

Structured processing records connect each routed request to its classification and purpose context.

Transfers: Transfer register entries

External eligibility and in-Kingdom routing decisions become inspectable records rather than hidden provider calls.

Rights: Subject-rights workflows

Export, deletion, consent, and breach-register workflows are part of the same compliance operating surface.

Export: Signed evidence packs

Reviewer exports are designed for verification. The public compliance page explains what is available and what remains outside current claims.
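To show what "designed for verification" can mean in practice, the sketch below attaches a keyed digest to a set of records so that a reviewer can detect any later change. The signing scheme DataSitr uses is not described in this whitepaper; HMAC-SHA256 and the function names here are assumptions picked because they need only the standard library, and a production pack would more plausibly use an asymmetric signature.

```python
import hashlib
import hmac
import json


def _canonical(records):
    # Stable byte representation so signer and verifier hash identical input.
    return json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_pack(records, key):
    tag = hmac.new(key, _canonical(records), hashlib.sha256).hexdigest()
    return {"records": records, "hmac_sha256": tag}


def verify_pack(pack, key):
    expected = hmac.new(key, _canonical(pack["records"]), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, pack["hmac_sha256"])
```

Changing any field in any record after signing causes `verify_pack` to return `False`, which is the property a reviewer relies on.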

Evidence boundary

The whitepaper is intentionally precise about what it claims: a live Saudi-hosted pilot, detector benchmarks, encrypted vaulting, compliance records, customer-route HA evidence, and dated continuity evidence. It does not convert those facts into claims of regulator approval, SOC 2, ISO 27001, external pen-test, HSM custody, full-vault verification, or full-region tolerance.

Buyer verification path

A serious whitepaper should end with things a buyer can open. Start with the public artifacts, then request the signed reviewer bundle when control-level mappings are needed.

Controls: 177-control public matrix summary

Open the JSON or Markdown summary, then request the signed reviewer bundle for control-level references.

Detector: Public precision / recall artifacts

Review dated detector outputs and compare the Arabic/Saudi PII story against the benchmark page.

Runtime: Status, trust, and deployment pages

Use status for current response checks, trust for dated proof, and deployment for topology and lane diagrams.

Control matrix JSON, Control matrix Markdown, Trust report JSON, Reviewer pack brief

Operating model: Gateway first: detect, transform, route, record

DataSitr sits before the model provider. The provider sees only the payload allowed by lane policy.

Detector research: Arabic NER plus Saudi structural PII, benchmarked publicly

The research work is presented here as verifiable engineering evidence instead of waiting on a long journal cycle.

Claim boundary: Every public constraint resolves to /compliance

The whitepaper explains the architecture. Compliance keeps the canonical constraints list.

Current posture: Live Saudi-hosted baseline with Alibaba KMS startup bootstrap.

This is a live-pilot architecture description with customer-route HA now proven on ACK; it is not a regulator-approval, certification, HSM-custody, or full-region tolerance claim.

Research translation: The Arabic NER work is framed for buyers: what it catches, how it is gated, and where to verify.

The page avoids academic theater and instead links the model, rules, benchmarks, and claim boundary together.

Inspection path: Open the benchmark, trust report, resources, and compliance pages from one place.

The verification section at the end gives buyers a direct path from narrative to artifacts.

Evaluate the gateway with the evidence open.
