Security by design, not as an option
The LMbox architecture is designed first and foremost so that your data never leaves your premises. Here is how, concretely.
100% on-premises
No outbound connection by default. The Box runs inside your LAN and can be air-gapped if needed. Your data NEVER transits through a third-party service.
Encryption in transit and at rest
Internal TLS, LUKS disk encryption with TPM 2.0, Postgres encrypted at rest, encrypted Restic backups. The keys stay with you, never with us.
Enterprise SSO
Native integration with Active Directory, Azure AD, Okta and Google Workspace via OIDC, with Authentik in front for fine-grained, group-based authorization.
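Group-based authorization boils down to mapping the groups carried in the identity token to what each group may use. The sketch below assumes the OIDC ID token has already been validated and decoded into a dict carrying a `groups` claim; the policy table, group names and model names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical policy: OIDC group name -> models that group may query.
POLICY: Mapping[str, frozenset[str]] = {
    "legal-team": frozenset({"gemma-legal", "gemma-general"}),
    "all-staff": frozenset({"gemma-general"}),
}


def allowed_models(claims: Mapping[str, object]) -> set[str]:
    """Union of the models granted to every group listed in the `groups` claim."""
    groups = claims.get("groups", [])
    if not isinstance(groups, (list, tuple)):
        return set()  # malformed claim: grant nothing
    allowed: set[str] = set()
    for group in groups:
        allowed |= POLICY.get(str(group), frozenset())
    return allowed


def authorize(claims: Mapping[str, object], model: str) -> bool:
    """Deny by default: a model is reachable only if some group grants it."""
    return model in allowed_models(claims)


if __name__ == "__main__":
    alice = {"sub": "alice", "groups": ["all-staff", "legal-team"]}
    bob = {"sub": "bob", "groups": ["all-staff"]}
    print(authorize(alice, "gemma-legal"))  # True
    print(authorize(bob, "gemma-legal"))    # False
```

Deny-by-default keeps the failure mode safe: a user with no recognized group, or a token without a usable `groups` claim, reaches nothing.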
Audit logs
Every interaction is traced and signed: who asked what, when, of which model, with which sources. Retention is configurable to match your policy.
One Box, several layers
```
┌──────────────────────────────────────────────────────┐
│               Internet  ❌  no access                │
└──────────────────────────┬───────────────────────────┘
                           │
                  ┌────────┴────────┐
                  │  LAN firewall   │   Inbound port 443 only
                  └────────┬────────┘
                           │
         ┌─────────────────┴─────────────────┐
         │    Reverse proxy + TLS (Caddy)    │   Enterprise CA cert.
         └─────────────────┬─────────────────┘
                           │
         ┌─────────────────┴─────────────────┐
         │    SSO Authentik (OIDC to AD)     │   Auth + authorization
         └─────────────────┬─────────────────┘
                           │
     ┌──────────────┬──────┴──────┬──────────────┐
     │  Open WebUI  │   LiteLLM   │  lmbox-rag   │
     │  (chat UI)   │  (API gw)   │  (RAG, KB)   │
     └──────┬───────┴──────┬──────┴──────┬───────┘
            │              │             │
            └──────────────┼─────────────┘
                           │
                  ┌────────┴────────┐
                  │     Ollama      │   Local models
                  │  (Gemma 4 ...)  │   Ryzen 9 7950X CPU / RTX Pro 6000 GPU
                  └─────────────────┘

Postgres (encrypted at rest, LUKS + TPM 2.0)
Centralized audit logs (configurable retention)
Encrypted Restic backups to an internal NAS
```
LMbox vs cloud generative-AI services
| Criterion | LMbox (on-prem) | ChatGPT / Claude / Gemini cloud |
|---|---|---|
| Where does your data live? | In your datacenter | Vendor datacenters (US/EU) |
| Data in transit | Internal LAN, TLS | Over the Internet to a public cloud |
| Data at rest | Your disks, your keys | Vendor disks, vendor keys |
| Enterprise SSO | Native (OIDC/SAML) | Depends on plan |
| Encryption keys | Under your control | Managed by the vendor |
| Audit logs | Complete, yours | Partial, vendor-defined scope |
| GDPR legal basis | Internal processing (Article 6) | Processor arrangement + DPA |
| Breach notification | Handled by you | Per vendor SLA |
| Vendor lock-in | None, you keep everything | High |
Our compliance roadmap
Four deterministic layers stop the agent from making things up.
"What if the AI just makes things up?" is the question law firms and compliance officers ask. The honest answer isn't "our model is better": all LLMs hallucinate, from Mistral-7B to GPT-4. The LMbox answer is architectural: four independent layers, each able to block a hallucination on its own, wired in parallel on every catalogue agent.
Layer A - Citation Verifier
Every reference emitted by the agent (Cassation ruling, Code article, statute, decree, EU regulation, case-file exhibit) is extracted by 11 regex families and verified live against Légifrance and EUR-Lex. Reference not found → CRITICAL. Malformed reference ("12 jav 2024") → CRITICAL. Seven French codes are mapped, and lookups go through a production-grade LRU cache with rate limiting.
lmbox agent verify
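The extract-then-verify flow can be sketched for a single family (Cour de cassation rulings) out of the 11. The regex, the `lookup` callable standing in for the live Légifrance query, the `make_verifier` helper and the verdict labels other than `external_not_found` are hypothetical; rate limiting is left out to keep the sketch short.

```python
from __future__ import annotations

import re
from functools import lru_cache
from typing import Callable

FRENCH_MONTHS = {
    "janvier", "février", "mars", "avril", "mai", "juin", "juillet",
    "août", "septembre", "octobre", "novembre", "décembre",
}

# One illustrative family, e.g. "Cass. com., 12 janvier 2024, n° 22-15.487".
CASSATION_RE = re.compile(
    r"Cass\.\s*(?P<chamber>[^,]+),\s*"
    r"(?P<day>\d{1,2})\s+(?P<month>\w+)\s+(?P<year>\d{4}),\s*"
    r"n°\s*(?P<number>\d{2}-\d{2}\.\d{3})",
    re.IGNORECASE,
)


def make_verifier(
    lookup: Callable[[str], bool], cache_size: int = 4096
) -> Callable[[str], list[tuple[str, str]]]:
    """Wrap a live lookup (hypothetical Légifrance query) in an LRU cache."""
    cached_lookup = lru_cache(maxsize=cache_size)(lookup)

    def verify(text: str) -> list[tuple[str, str]]:
        findings = []
        for match in CASSATION_RE.finditer(text):
            citation = match.group(0)
            if match.group("month").lower() not in FRENCH_MONTHS:
                # Shape is wrong before any network call: "12 jav 2024".
                findings.append((citation, "CRITICAL: malformed"))
            elif not cached_lookup(match.group("number")):
                findings.append((citation, "CRITICAL: external_not_found"))
            else:
                findings.append((citation, "OK"))
        return findings

    return verify


if __name__ == "__main__":
    known_numbers = {"21-10.000"}  # hypothetical stand-in for Légifrance
    verify = make_verifier(lambda number: number in known_numbers)
    draft = ("Voir Cass. com., 12 janvier 2024, n° 22-15.487 "
             "et Cass. civ., 3 mars 2022, n° 21-10.000.")
    for citation, verdict in verify(draft):
        print(f"{verdict:30} {citation}")
```

Caching on the appeal number means a ruling cited ten times in one brief costs a single external query.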
Layer B - Runtime Guard
A real-time guardrail over the LLM stream: the moment a hallucinated citation is complete in the output, generation is cancelled (strict), annotated inline (annotate), or logged for observability (warn). There is no need to wait for the brief to finish to find the error: the agent is stopped 200 ms after the bad token.
lmbox agent run --guard strict
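A minimal sketch of such a stream guard, assuming the LLM stream is an iterable of text chunks and reusing the rule given for Layer B in the example below (a citation is checked once 40 characters follow it). The simplified regex, the `verify` callable, the annotation text and the function names are hypothetical; the three modes mirror `strict`, `annotate` and `warn`.

```python
from __future__ import annotations

import logging
import re
from typing import Callable, Iterable, Iterator

# Simplified stand-in for the Layer A patterns (Cassation rulings only).
CITATION_RE = re.compile(
    r"Cass\.[^,]+,\s*\d{1,2}\s+\w+\s+\d{4},\s*n°\s*\d{2}-\d{2}\.\d{3}"
)
TRAILING_CHARS = 40  # characters required after a citation before checking it
MODES = {"strict", "annotate", "warn"}


class GenerationCancelled(Exception):
    """Raised in strict mode to stop the stream on an unverified citation."""


def _pending(text: str, seen: set[int], final: bool) -> Iterator[str]:
    """Yield citations that are ready to check and have not been checked yet."""
    for match in CITATION_RE.finditer(text):
        if match.start() in seen:
            continue
        if not final and len(text) - match.end() < TRAILING_CHARS:
            continue  # wait for more context before judging
        seen.add(match.start())
        yield match.group(0)


def _handle(citation: str, verify: Callable[[str], bool], mode: str) -> Iterator[str]:
    if verify(citation):
        return
    if mode == "strict":
        raise GenerationCancelled(citation)
    if mode == "annotate":
        yield f" [unverified citation: {citation}]"
    else:  # "warn": keep streaming, record for observability
        logging.warning("unverified citation: %s", citation)


def guard_stream(
    chunks: Iterable[str], verify: Callable[[str], bool], mode: str = "strict"
) -> Iterator[str]:
    if mode not in MODES:
        raise ValueError(f"unknown guard mode: {mode!r}")
    text = ""
    seen: set[int] = set()
    for chunk in chunks:
        text += chunk
        yield chunk
        # Re-scanning the whole text is O(n^2); acceptable for a sketch.
        for citation in _pending(text, seen, final=False):
            yield from _handle(citation, verify, mode)
    # End of stream: check citations that never got 40 trailing characters.
    for citation in _pending(text, seen, final=True):
        yield from _handle(citation, verify, mode)
```

Waiting for trailing context trades a little latency for stability: once 40 characters follow a match, the citation text can no longer grow, so it is checked exactly once.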
Layer C - Structured Output
Every agent declares a JSON Schema (draft 2020-12) in its manifest. The output is validated and, if invalid, the model is re-prompted (up to 2 retries). A 7-rule linter catches schema design bugs at write time, before deployment. No more shape drift between agents and downstream pipelines.
lmbox agent lint-schema --strict
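The validate-and-re-prompt loop can be sketched without any dependency. Here `generate` stands in for the LLM call and `validate` for a JSON Schema 2020-12 validator returning a list of error messages; both, like the toy `cited_jurisprudence` rule, are hypothetical and not LMbox's actual API. With `max_retries=2`, the model gets at most three attempts.

```python
from __future__ import annotations

import json
from typing import Callable


class SchemaViolation(Exception):
    """Raised when the output is still invalid after every repair attempt."""


def generate_structured(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[object], list[str]],
    max_retries: int = 2,
) -> object:
    current_prompt = prompt
    errors: list[str] = []
    for _attempt in range(max_retries + 1):
        raw = generate(current_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            errors = validate(data)
            if not errors:
                return data
        # Re-prompt with the validator's feedback appended to the original prompt.
        current_prompt = (
            prompt
            + "\n\nYour previous output was rejected:\n- "
            + "\n- ".join(errors)
            + "\nReturn only JSON that satisfies the schema."
        )
    raise SchemaViolation("; ".join(errors))


def require_cited_jurisprudence(data: object) -> list[str]:
    """Toy stand-in for a schema check: every cited item needs a source_id."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    items = data.get("cited_jurisprudence")
    if not isinstance(items, list):
        return ["cited_jurisprudence must be an array"]
    return [
        f"cited_jurisprudence[{i}] lacks source_id"
        for i, item in enumerate(items)
        if not isinstance(item, dict) or "source_id" not in item
    ]
```

Feeding the concrete validator errors back into the prompt gives the model a precise repair target instead of a generic "try again".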
Layer D - Source Grounding
Every identifier cited in the output (document_id, protocol_id, source_id) MUST come from a tool call made during the current turn. If the agent cites interne-2019-453 without having called search_dossiers_internes and received that identifier back, the output is rejected. This is the architectural layer that makes an invented source structurally impossible.
lmbox agent check-grounding
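At its core, the grounding rule is a set-membership test. The sketch assumes each tool call of the current turn is recorded with the identifiers it returned; the `ToolCall` record and the function names are hypothetical, while the tool names `search_dossiers_internes` and `search_jurisprudence` come from the examples on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    returned_ids: set[str] = field(default_factory=set)


def ungrounded_ids(cited: list[str], calls: list[ToolCall]) -> list[str]:
    """Identifiers cited in the output that no tool call of this turn returned."""
    grounded: set[str] = set()
    for call in calls:
        grounded |= call.returned_ids
    return [source_id for source_id in cited if source_id not in grounded]


def check_grounding(cited: list[str], calls: list[ToolCall]) -> None:
    """Reject the output if any cited identifier is not backed by a tool result."""
    missing = ungrounded_ids(cited, calls)
    if missing:
        raise ValueError(f"ungrounded source_id(s): {', '.join(missing)}")


if __name__ == "__main__":
    turn = [ToolCall("search_jurisprudence", {"juris-001"})]
    check_grounding(["juris-001"], turn)  # passes silently
    try:
        check_grounding(["interne-2019-453"], turn)
    except ValueError as err:
        print(err)
```

Because the check only compares identifiers, it needs no knowledge of the documents themselves, which is what makes it deterministic.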
The agent invents "Cass. Com., 12 janvier 2024, n° 22-15.487"
This ruling does not exist, yet its shape is perfect: valid court, valid month, valid appeal (pourvoi) number format. No human reviewer is likely to catch it in 30 pages of written submissions. Here is how each of the 4 layers catches it, independently:
- A: Layer A queries Légifrance live and finds nothing → CRITICAL: external_not_found. That alone is enough to fail the golden suite and block the deployment.
- B: Layer B applies Layer A's check during streaming: as soon as the citation is complete and followed by 40 trailing characters, the verifier queries Légifrance. On NOT_FOUND, generation stops, and the lawyer never reads the tainted brief.
- C: Layer C requires (in the Conclusions Drafter schema) that any case law cited in the markdown also appears in `cited_jurisprudence[]` with a `source_id`. An inline citation without a metadata entry makes the output schema-invalid, which triggers the repair loop and a hard fail if it is not corrected.
- D: Layer D verifies that the claimed `source_id` came from a tool call this turn. Without a call to `search_jurisprudence` that returned that identifier, the `source_id` is structurally false and the output is blocked.
Detailed architecture: the 4 ADRs (002 → 005) published on GitHub describe the design, trade-offs, failure modes and code references.
Every portal action is cryptographically chained.
Every entry in the LMbox audit log is hashed with SHA-256 together with the hash of the previous entry. The CISO can prove to a SOC 2 auditor or to the CNIL that no line has been deleted, modified, or inserted, including by an LMbox administrator.
```
chain_hash[N] = SHA-256(
    chain_hash[N-1]
    || canonical(payload[N])
)

genesis = SHA-256(
    "lmbox.ai/audit-chain/v1"
    || "customer=" + customer.id
    || "created=" + customer.created_at
)
```
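The formula translates almost line for line into Python. Three assumptions, none confirmed by the source: `canonical()` is taken to be JSON with sorted keys and compact separators, `||` and `+` are plain byte concatenation (raw digest bytes, UTF-8 strings), and the genesis hash plays the role of `chain_hash[0]`. The helpers `append` and `verify_chain` are hypothetical names; `verify_chain` re-walks the entries the way the portal check is described.

```python
from __future__ import annotations

import hashlib
import json

Entry = tuple[dict, bytes]  # (payload, stored chain_hash)


def canonical(payload: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def genesis(customer_id: str, created_at: str) -> bytes:
    """Per-tenant starting point: two customers never share a prefix."""
    return hashlib.sha256(
        b"lmbox.ai/audit-chain/v1"
        + b"customer=" + customer_id.encode("utf-8")
        + b"created=" + created_at.encode("utf-8")
    ).digest()


def append(chain: list[Entry], payload: dict, genesis_hash: bytes) -> None:
    prev = chain[-1][1] if chain else genesis_hash
    chain.append((payload, hashlib.sha256(prev + canonical(payload)).digest()))


def verify_chain(chain: list[Entry], genesis_hash: bytes) -> int | None:
    """Return the index of the first invalid entry, or None if the chain is intact."""
    prev = genesis_hash
    for index, (payload, stored) in enumerate(chain):
        expected = hashlib.sha256(prev + canonical(payload)).digest()
        if expected != stored:
            return index
        prev = stored
    return None


if __name__ == "__main__":
    g = genesis("acme", "2024-01-01T00:00:00Z")
    log: list[Entry] = []
    for seq in range(3):
        append(log, {"seq": seq, "action": "read"}, g)
    print(verify_chain(log, g))  # None: intact
    log[1] = ({"seq": 1, "action": "delete"}, log[1][1])
    print(verify_chain(log, g))  # 1: tampered entry detected
```

Editing a payload breaks its own hash; deleting or inserting an entry breaks the link of the entry that follows, which is what the checks below rely on.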
- Insertion and deletion detection: modifying or deleting a line breaks the chain, so every subsequent entry fails the next verify_chain run.
- Per-customer genesis: each tenant has its own chain, and two customers never share the same prefix. No cross-tenant leakage is possible.
- One-click verification: the CISO clicks "Verify the chain" in the portal. LMbox re-walks the N entries in a few seconds and shows a green or red banner that can be presented as evidence.
See the audit chain live: open the public demo, click "Journal" in the portal, then "Verify the chain". 200 chained entries, verified in 3 seconds.
Is connecting SharePoint, Salesforce or Jira to LMbox a sovereignty flaw?
No - but it's a topic where precision matters. LMbox does not move your data. It reads it where you already put it. Storage sovereignty depends on your earlier choice of SaaS vendor, made before LMbox came along. AI-model sovereignty, indexing and audit are what LMbox provides - and no one else can claim that for your existing data.
The 4 layers of an AI system - who controls what
| Layer | Question | Who controls |
|---|---|---|
| 1. Storage | Where are the source documents? | Your earlier choice (M365, Salesforce, on-prem, …) |
| 2. Indexing / RAG | Who reads them and embeds them? | LMbox - local, on-prem, on the box |
| 3. AI inference | Where does the model run? | LMbox - local model, never an external API |
| 4. Audit | Who keeps the trail? | LMbox - verifiable SHA-256 chain |
Three typical scenarios
Pure-SaaS connectors (Salesforce, HubSpot, Notion, Slack, Drive, Teams)
Your data is already in the vendor's cloud - a choice made before LMbox. We read it locally via OAuth, without moving anything. AI-side sovereignty, not storage-side.
Hybrid connectors on cloud (SharePoint Online, Confluence Cloud, Jira Cloud)
The vendor offers a self-hosted edition, but you chose the cloud one. LMbox reads locally. If storage sovereignty becomes critical, migrate to the self-hosted variant: LMbox supports it without changes.
Hybrid connectors self-hosted (SharePoint Server, Confluence DC, GitLab self-managed)
A fully on-premises stack, end to end: no data leaves your datacenter. For SecNumCloud, defence or strict HDS requirements, this setup is defensible to an auditor without caveats.
The 7 technical controls already in place
- Credentials encrypted at rest: OAuth tokens for connectors are stored via Rails 8 attribute encryption (AES-GCM, key kept outside the database). A Postgres dump never reveals a usable token.
- Scrubbing after push: once the credential reaches the box, the cloud-side `credentials` sub-field is wiped. The digest is kept for future rotation; the plaintext value is gone.
- Every access traced in the audit chain: reading a SharePoint document creates a SHA-256-hashed audit entry. The CISO can re-walk the chain at any time and prove that no read was hidden.
- Outbound-only heartbeat: the box accepts no inbound connection. Every cloud-to-box command rides on the outbound heartbeat initiated by the box, so there is no external attack surface.
- Minimum scope on the App Registration: scopes are documented explicitly per use case, e.g. `Sites.Selected` rather than `Sites.Read.All` for the NDA Reviewer. The integrator partner tunes the scope with the client.
- Token rotation: a documented process covers periodic rotation (typically every 90 days) and immediate revocation when a user leaves, both handled on the Azure AD / Google Workspace / etc. side.
- Read-only RAG: no connector writes back to the source, so there is no risk of malicious injection into a shared library. If an agent needs to write (0.5+), explicit human approval is required.
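Two of these controls are easy to picture in code: scrubbing the plaintext once the credential has been pushed while keeping only a digest, and flagging tokens older than the typical 90-day rotation window. Apart from `credentials`, the record layout, field names and helpers below are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import hashlib
import hmac
from datetime import datetime, timedelta, timezone

ROTATION_PERIOD = timedelta(days=90)  # the "90 days typical" rotation window


def scrub_after_push(record: dict) -> dict:
    """Drop the plaintext `credentials` field, keeping only its SHA-256 digest."""
    secret = record.pop("credentials", None)
    if secret is not None:
        record["credentials_digest"] = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return record


def matches_digest(record: dict, candidate: str) -> bool:
    """Compare a candidate token with the stored digest, e.g. during rotation."""
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    return hmac.compare_digest(record.get("credentials_digest", ""), digest)


def rotation_due(issued_at: datetime, now: datetime | None = None) -> bool:
    """True once a token is at least ROTATION_PERIOD old (timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at >= ROTATION_PERIOD
```

Revocation when a user leaves stays on the identity-provider side, as described above; this sketch only flags when a rotation is due and lets the box tell whether a pushed token is new.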
Recommendation for maximum sovereignty: if you're in a regulated sector (defence, strict HDS, ACPR-sensitive banking), pick the self-hosted variants of hybrid connectors: SharePoint Server, Confluence Data Center, GitLab self-managed, Outlook/Exchange on-prem. LMbox supports them natively and you get the full on-prem stack. See the connector catalogue
A security or compliance question?
Our team can answer your IT department's security questionnaire, provide a detailed architecture dossier, or set up a video call with your CISO.
Talk to an expert