To our guests, partners, and evaluators:
DataSitr lets Saudi organizations use modern AI systems without sending raw personal data across borders by default. The gateway sits between your apps and the AI providers your tenants choose; it detects personal data, applies the privacy transformation the lane requires, and routes the request by sensitivity, tenant policy, and Saudi residency rules.
DataSitr is not a chatbot wrapper or a generic gateway. It is a Saudi-context privacy boundary built around three commitments: detect personal data thoroughly in both Arabic and English, vault what should not leave the Kingdom, and record every routing decision so a buyer or a regulator can verify what happened — not take our word for it.
This is a live-pilot architecture description, not a certification or accreditation claim. What follows is intentionally precise: each section describes what the system does today, the artifacts it produces, and the limits we are honest about. Read this whitepaper alongside the trust, compliance, and resources pages — those carry the dated evidence behind the architecture described here.
Context worth stating up front: PDPL enforcement is operational. SDAIA confirmed 48 enforcement decisions in January 2026 with administrative fines up to SAR 5 million (doubled for repeat violations) and up to two years' imprisonment for intentional sensitive-data violations. The technical work this whitepaper describes exists to make compliance demonstrable, not asserted, in that environment.
The gateway intercepts each request before it reaches an AI provider, changes the payload according to privacy risk, and records the decision for later review. The four steps below — intercept, detect, transform, route + record — happen for every request in scope.
Intercept: The request enters DataSitr before any model call. Tenant policy, route configuration, and request metadata are loaded at the edge of the pipeline.
Detect: Presidio, spaCy, Saudi recognizers, and Arabic NER operate together. Regex is one layer, not the architecture.
Transform: Direct identifiers become typed placeholders where reversible protection is allowed. Highest-risk text can remain intact only for in-Kingdom processing.
Route + record: The policy engine chooses green, amber, red, or blocked. Compliance metadata is written alongside the processing record.
Current production posture: as of 2026-05-20, the live DataSitr deployment is Alibaba ACK Riyadh primary with scoped GCP Dammam drill-standby evidence. The May 4 customer-route cutover to ACK passed a 4-hour soak; the May 16 Dammam drill covers DNS / GKE / TLS routing only. The platform runs Alibaba KMS startup bootstrap, encrypted vault storage, machine-readable compliance records, active alerting, and dated continuity evidence. What remains separate from current claims: cross-cloud database replication, auth failover, data-tier failover, HSM-backed custody, and a fully refreshed same-origin browser/session proof pack on this exact baseline.
The hard part is Arabic and Saudi context: names, organizations, government identifiers, mixed Arabic-English prompts, and safe Arabic prose that should not be redacted. DataSitr treats this as a measured detector program, not a single model toggle.
The Arabic model sits behind structural and contextual gates. It improves recall without letting every Arabic noun become a person.
Saudi-specific recognizers cover local document shapes and name signals that generic PII libraries do not prioritize.
The trust and benchmark pages publish dated, sanitized detector artifacts so the claim can be inspected without a slide deck.
Hard negatives, literary Arabic, and support-text cases are part of the detector discipline so privacy protection does not become unusable redaction.
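To make the "regex is one layer" point concrete, here is a minimal sketch of what a pattern-only layer for Saudi identifiers might look like. It is an illustration, not DataSitr's recognizer code: the `Finding` type, the `detect` function, and both patterns are hypothetical. It assumes the commonly described shapes of a 10-digit national ID or Iqama number starting with 1 or 2, and a Saudi mobile number starting with `05`, `+9665`, or `009665`. Names, organizations, and context gates would need the NER layers, which this sketch does not attempt.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    entity_type: str
    start: int
    end: int
    text: str

# Map Arabic-Indic digits to ASCII one-for-one so character offsets stay aligned.
_ARABIC_INDIC = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")

# Hypothetical patterns (assumed shapes, not DataSitr's production rules).
# The lookarounds reject matches embedded in longer digit runs, a simple
# structural gate against order numbers and other hard negatives.
_PATTERNS = {
    "SA_NATIONAL_ID": re.compile(r"(?<!\d)[12]\d{9}(?!\d)"),
    "SA_MOBILE": re.compile(r"(?<!\d)(?:\+9665|009665|05)\d{8}(?!\d)"),
}

def detect(text: str) -> list[Finding]:
    """Return pattern matches in document order, quoting the original text."""
    normalized = text.translate(_ARABIC_INDIC)
    findings = []
    for entity_type, pattern in _PATTERNS.items():
        for m in pattern.finditer(normalized):
            findings.append(
                Finding(entity_type, m.start(), m.end(), text[m.start():m.end()])
            )
    return sorted(findings, key=lambda f: f.start)
```

Normalizing digits before matching lets one pattern cover both numeral systems in mixed Arabic-English prompts, while the returned span still quotes the text exactly as the user wrote it.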
For reversible lanes, DataSitr replaces detected entities with typed placeholders and stores the originals in a tenant-scoped encrypted vault. Rehydration is allowed only within the tenant and request context that created the placeholders.
This is not format-preserving encryption and not stateless masking. The encrypted state exists because the product must support auditable, tenant-scoped rehydration for approved responses.
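The placeholder-and-vault flow described above can be sketched as follows. This is an assumption-laden illustration rather than the product's implementation: the `TenantVault` class, its method names, and the placeholder format are hypothetical, and the store is a plain in-memory dictionary with no encryption, which a real vault would add at rest.

```python
import secrets

class TenantVault:
    """Toy vault: maps (tenant, request, placeholder) to the original value.

    Hypothetical sketch. Storage here is unencrypted and in memory; it only
    illustrates the scoping rule, not the encrypted persistence.
    """

    def __init__(self):
        self._store = {}

    def tokenize(self, tenant_id, request_id, text, findings):
        """Replace each (entity_type, start, end) span with a typed placeholder.

        Spans are assumed non-overlapping. Replacing from the end of the
        string backwards keeps the earlier offsets valid.
        """
        out = text
        for entity_type, start, end in sorted(
            findings, key=lambda f: f[1], reverse=True
        ):
            placeholder = f"<{entity_type}_{secrets.token_hex(4)}>"
            self._store[(tenant_id, request_id, placeholder)] = text[start:end]
            out = out[:start] + placeholder + out[end:]
        return out

    def rehydrate(self, tenant_id, request_id, text):
        """Restore originals, but only for the same tenant and request."""
        for (t, r, placeholder), original in self._store.items():
            if t == tenant_id and r == request_id:
                text = text.replace(placeholder, original)
        return text
```

Keying the store on tenant and request together is what makes a placeholder meaningless outside its original context: a lookup from another tenant simply finds nothing to restore.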
The policy decision is intentionally conservative. Each request becomes green, amber, red, or blocked based on identifiability, sensitivity, tenant policy, and configured provider paths.
Green: Detected direct identifiers become typed placeholders, then the transformed text is rescanned before it can route to eligible global providers.
Amber: The text is transformed, but processing stays on operator-configured Saudi-hosted provider paths.
Red: Highest-risk categories, including PDPL Article 1(11)-defined sensitive data, stay intact only for configured in-Kingdom execution. If no compliant path exists, the request is blocked.
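One way to read the lane rules above is as a small, conservative decision function. The sketch below is hypothetical: the `Lane` enum, the `choose_lane` function, and its four inputs are illustrative names, and the exact conditions that separate the lanes in DataSitr's policy engine are not stated in this whitepaper. In particular, blocking an amber-class request when no in-Kingdom path is configured is an assumption made here to keep the sketch fail-closed.

```python
from enum import Enum

class Lane(Enum):
    GREEN = "green"      # transformed text may go to eligible global providers
    AMBER = "amber"      # transformed text stays on Saudi-hosted provider paths
    RED = "red"          # intact text, in-Kingdom execution only
    BLOCKED = "blocked"  # no compliant path exists

def choose_lane(
    *,
    has_sensitive_data: bool,
    residual_identifiers: int,
    tenant_allows_global: bool,
    in_kingdom_path_available: bool,
) -> Lane:
    """Pick a lane; every ambiguous case falls to the more restrictive option."""
    # Highest-risk categories never leave the Kingdom.
    if has_sensitive_data:
        return Lane.RED if in_kingdom_path_available else Lane.BLOCKED
    # Green requires a clean rescan (no identifiers left after the
    # placeholder transformation) and tenant permission for global routing.
    if residual_identifiers == 0 and tenant_allows_global:
        return Lane.GREEN
    # Everything else stays in-Kingdom, or is blocked if that is impossible.
    return Lane.AMBER if in_kingdom_path_available else Lane.BLOCKED
```

The `residual_identifiers` input stands in for the rescan step: a request qualifies for green only when the second pass over the transformed text finds nothing left to protect.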
Every architecture decision needs a review surface. DataSitr records machine-readable processing metadata per request: classification, destination, legal or policy basis, and evidence material for export.
Structured processing records connect each routed request to its classification and purpose context.
External eligibility and in-Kingdom routing decisions become inspectable records rather than hidden provider calls.
Export, deletion, consent, and breach-register workflows are part of the same compliance operating surface.
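A machine-readable processing record of the kind described above might look roughly like the following sketch. The field names, the `ProcessingRecord` type, and the `to_export_line` helper are all hypothetical and do not reflect DataSitr's actual schema. The sketch assumes the record carries entity counts instead of raw values, and that each exported line includes a SHA-256 digest so a reviewer can check that it was not altered after export.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ProcessingRecord:
    # All field names are illustrative assumptions.
    request_id: str
    tenant_id: str
    classification: str   # e.g. "green", "amber", "red", "blocked"
    destination: str      # provider path the request was routed to
    basis: str            # legal or policy basis recorded for the decision
    entity_counts: dict   # counts per entity type; no raw personal data
    recorded_at: str      # ISO 8601 timestamp supplied by the caller

def to_export_line(record: ProcessingRecord) -> str:
    """Serialize a record with a digest over its canonical JSON form."""
    body = json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps(
        {"record": json.loads(body), "sha256": digest},
        sort_keys=True,
        ensure_ascii=False,
    )
```

Sorting keys before hashing gives a canonical form, so a reviewer can re-serialize the `record` object the same way and compare digests without needing access to the gateway.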
Reviewer exports are designed for verification. The public compliance page explains what is available and what remains outside current claims.
The whitepaper is intentionally precise: live Saudi-hosted pilot, detector benchmarks, encrypted vaulting, compliance records, customer-route HA evidence, and dated continuity evidence. It does not convert those facts into regulator approval, SOC 2, ISO 27001, external pen-test, HSM custody, full-vault verification, or full-region tolerance claims.
A serious whitepaper should end with things a buyer can open. Start with the public artifacts, then request the signed reviewer bundle when control-level mappings are needed.
Open the JSON or Markdown summary, then request the signed reviewer bundle for control-level references.
Review dated detector outputs and compare the Arabic/Saudi PII story against the benchmark page.
Use status for current response checks, trust for dated proof, and deployment for topology and lane diagrams.
This is a live-pilot architecture description with customer-route HA now proven on ACK; it is not a regulator-approval, certification, HSM-custody, or full-region tolerance claim.
The page avoids academic theater and instead links the model, rules, benchmarks, and claim boundary together.
The verification section at the end gives buyers a direct path from narrative to artifacts.