Multi-agent clinical-data platform

Turn clinical records into queryable, source-backed data.

Salutera uses specialist agents for NLP, Vision, Speech, Reasoning, and Provenance to read notes, PDFs, scans, pathology reports, imaging text, and transcripts. The platform structures every extraction into clinical ontologies and links each answer back to the exact source document, page, and line.

Request a clinical-data demo · See the provenance workflow

On-prem · VPC · air-gapped · zero data egress

// Clinical query solver · semantic parser v2.4

“Show me NSCLC patients with PD-L1 ≥ 50% on first-line immunotherapy in 2024.”

1,243 patients matched · 6 sites · 0.42 s · live · verified

NLP · reads notes
Vision · scans PDFs
Speech · processes transcripts
Reasoning · verifies cohort
Provenance · cites sources

// Cohort extraction matrix · FORMATS: CSV · FHIR · OMOP

patient | TNM status | biomarker status | regimen | last encounter
P-1042 | T3N1M0 | ALK+ | 1L | 2024-07-30
P-1043 | T1N0M0 | PD-L1 ≥50% | imm | 2024-09-04
P-1044 | T4N2M1 | EGFR+ | 2L | 2024-06-21

1,243 total records matched · showing 3

// Provenance source map · P-1043 · verified block #7492

“Tumor immunohistochemistry showed PD-L1 expression 62% (Tumor Proportion Score), consistent with high-expressor classification. Patient initiated pembrolizumab monotherapy on 04 Sep 2024.”

oncology_consult_240904.pdf · pg. 3 · line 27 · grounded
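A citation like the one above (document, page, line, quoted passage) can be modeled as a small immutable record. This is a minimal sketch only: the `Citation` class and its field names are assumptions made for illustration and are not Salutera's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Where a claim came from. Hypothetical field names, for illustration only."""
    document: str
    page: int
    line: int
    quote: str

    def label(self) -> str:
        # Mirror the "file · pg. N · line M" style shown in the source map.
        return f"{self.document} · pg. {self.page} · line {self.line}"


citation = Citation(
    document="oncology_consult_240904.pdf",
    page=3,
    line=27,
    quote="PD-L1 expression 62% (Tumor Proportion Score)",
)
print(citation.label())
```

Freezing the dataclass means a citation cannot be edited after it is attached to a claim, which is one plausible way to keep a claim and its source from drifting apart.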

The product in action

A patient's full clinical story.

Each event below was extracted from a different source document (biopsy report, pathology consult, oncology note, imaging report) and grounded against SNOMED-CT, LOINC, and RxNorm. One row per encounter; one citation per claim; one audit entry per extraction.

[TIMELINE // SYNC_ACTIVE]

Patient Workspace: P-1043

HASH: SHA-256 // FUSED_CLINICAL_GRAPH_08

56M · NSCLC Stage III · 11 Source Records Fused · 100% Grounded

Nov 2023 · Dx
biopsy_report_231112.pdf · p. 2
NSCLC Adenocarcinoma · Stage T2N0M0
Coords: [142, 280, 410, 298] · 99.4% match

Dec 2023 · Biomarker
pathology_consult_231205.pdf · p. 1
EGFR WT · ALK Neg · PD-L1 TPS 62%
Coords: [85, 120, 310, 138] · 98.9% match

Jan 2024 · Tx-1L
oncology_note_240110.pdf · p. 4
Pembrolizumab (Keytruda) Monotherapy
Coords: [190, 520, 520, 538] · 99.8% match

Live Provenance Inspector

PDF_VIEWER_v1.0

“Biopsy of right lower lobe lung nodule demonstrates poorly differentiated adenocarcinoma, consistent with primary non-small cell lung cancer, stage T2a N0 M0.”

BOX: [142, 280, 410, 298]

// Extraction Telemetry

Document Ref: biopsy_report_231112.pdf
Confidence: 99.4%

// Semantic Ontology Grounding

SNOMED-CT 254637007 · Non-small cell lung cancer
ICD-10-CM C34.90 · Malignant neoplasm of lung
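Grounding of this kind can be pictured as a lookup from normalized free text to ontology codes. The sketch below is illustrative only: the two codes are the ones shown in the inspector panel, while the `GROUNDING` and `SYNONYMS` tables and the `ground` function are hypothetical names, and a real system would query a terminology service rather than a hand-written dictionary.

```python
from typing import Dict, Optional

# Codes copied from the inspector panel; the table itself is a toy stand-in.
GROUNDING: Dict[str, Dict[str, str]] = {
    "non-small cell lung cancer": {
        "SNOMED-CT": "254637007",
        "ICD-10-CM": "C34.90",
    },
}

# Hypothetical synonym list used to normalize abbreviations before lookup.
SYNONYMS: Dict[str, str] = {"nsclc": "non-small cell lung cancer"}


def ground(term: str) -> Optional[Dict[str, str]]:
    """Return ontology codes for a free-text term, or None if it is not grounded."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    key = SYNONYMS.get(key, key)
    return GROUNDING.get(key)


print(ground("NSCLC"))
print(ground("asthma"))  # not in the toy table, so no codes are returned
```

Returning `None` for an unknown term, instead of guessing a nearest match, keeps ungrounded text visibly ungrounded.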

// PROVENANCE_MONITOR

Throughput: 24.8 rec/s

12:04:18.232 · extract · agent.nlp · discharge_summary_240821.pdf · 1.4 MB · 27 variables · grounded
12:04:18.451 · extract · agent.cv · ct_chest_240821.dcm · 12.5 MB · 4 measurements · grounded
12:04:19.108 · fuse · agent.mm · encounter:240821-103 · 4.2 KB · 1 row · ontology-validated
12:04:21.050 · query · user:scarter@… · cohort:nsclc-pdl1-1L · 128 KB · 1,243 matches · cited

[OUTPUT // SCHEMA]

Unified Patient Record

Outputs are instantly available as structured JSON arrays, FHIR resources, or tabular exports. Every variable is cryptographically tied to its source PDF coordinates.

fhir_export_bundle.json · active

{
  "resourceType": "Bundle",
  "id": "fused-patient-p1043",
  "entry": [ { "resource": { "id": "T3N1M0", "confidence": 0.991 } } ]
}
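One way to read "cryptographically tied to its source PDF coordinates" is a digest computed over the extracted value, a hash of the source file, the page, and the bounding box. The sketch below assumes that reading; the `provenance_digest` function, its parameters, and the placeholder PDF bytes are all hypothetical, and SHA-256 is used only because the workspace header above mentions it.

```python
import hashlib
import json
from typing import Sequence


def provenance_digest(value: str, document_bytes: bytes, page: int, box: Sequence[int]) -> str:
    """SHA-256 over the extracted value, the source file's hash, and its coordinates."""
    record = {
        "value": value,
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "page": page,
        "box": list(box),
    }
    # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder bytes standing in for the real file contents.
pdf_bytes = b"%PDF-1.7 placeholder bytes for biopsy_report_231112.pdf"
digest = provenance_digest("T2N0M0", pdf_bytes, page=2, box=[142, 280, 410, 298])
tampered = provenance_digest("T3N0M0", pdf_bytes, page=2, box=[142, 280, 410, 298])
print(digest != tampered)  # a changed value no longer matches the stored digest
```

A plain digest detects changes but does not prove who produced the record; a deployment that needs that guarantee would sign the digest with a managed key.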

2.4 s avg extraction · 99.1% accuracy

The data problem

Information is trapped.

Clinical data is locked in PDFs, handwritten notes, faxed scans, and fragmented systems. Teams waste weeks manually searching, abstracting, validating, and rechecking the exact same records.

Manual extraction fails

Keyword search misses context. Manual abstraction is painfully slow, inconsistent across abstractors, and nearly impossible to scale across large disease cohorts.

AI without citations is dangerous

Salutera reads what those records actually say and cites the exact bounding box in the source PDF for every claim it returns. Hallucinations are structurally impossible when every output must be grounded.

How it works

Structure. Search. Reason. Cite.

Eight pipeline stages. Four specialist extraction agents (NLP · Computer Vision · Speech · Multimodal). An extensible reasoning layer on top. Every cell, every claim, traces back to its source document.

// Extraction · Four specialist agents work every record in parallel [ records in → structured data out ]

// RECORDS IN

Input Documents

PDF · DICOM · HL7 · Audio · Web Scraped

↓

01 · Router

Intelligent Dispatcher

Dispatches segments to optimal downstream specialized AI agents.

↓

// PARALLEL PROCESSING AGENTS

NLP · Reading agent: notes, summaries, pathology
CV · Looking agent: imaging, ECG, scanned forms
SPK · Listening agent: dictations, consultations
MM · Fusing agent: multimodal records
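The router-plus-parallel-agents step can be sketched with a dispatch table and a thread pool. Everything here is an assumption made for illustration: the extension-to-agent mapping, the `route` and `run_agent` functions, and the `dictation_240821.wav` file name are hypothetical, and the agent is a stub that only records which agent would handle the file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath

# Hypothetical extension-to-agent table; a real router would inspect content too.
ROUTES = {".pdf": "nlp", ".txt": "nlp", ".dcm": "cv", ".wav": "speech"}


def route(filename: str) -> str:
    """Pick an agent name from the file extension, or flag the file as unrouted."""
    return ROUTES.get(PurePath(filename).suffix.lower(), "unrouted")


def run_agent(filename: str) -> dict:
    # Stand-in for a real extraction agent: it only reports the routing decision.
    return {"file": filename, "agent": route(filename)}


files = [
    "discharge_summary_240821.pdf",
    "ct_chest_240821.dcm",
    "dictation_240821.wav",  # made-up example file
]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_agent, files))  # map preserves input order

for result in results:
    print(result)
```

Flagging unknown formats as `unrouted` instead of defaulting to one agent keeps unsupported inputs visible for review.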

↓

02 · Stage

Ontology Mapping

LOINC · SNOMED-CT · RxNorm standards mapping.

03 · Stage

Vector Embeddings

Encodes semantic context for multi-agent reasoning.

04 · Stage

Classification

High-precision categorization & diagnostic validation.

↓

// STRUCTURED OUTPUT

Mega-Structured Dataset

Instantly available formats:

✓ Relational Tables & JSON
✓ FHIR R4 Resources
✓ OMOP CDM Mappings

100% Traceable provenance per cell

// Reasoning · Agents that work the structured output [ cited answers · extensible ]

// AGENT_01

Cohort comparison

Apples-to-apples across sites.

// AGENT_02

Eligibility screening

Trial criteria, per patient.

// AGENT_03

Signal detection

Adverse-event scanning.

// AGENT_04

Decision support

Cited answers per question.

// AGENT_05

Formulation & CMC

Pharma R&D precedent.

// AGENT_06

Custom agents

Bring your own rules.

The Processing Core

8-Stage Clinical Pipeline

End-to-end parallel multi-agent processing, clinical ontology mapping, and de-identified mega-structure outputs, designed to scale with the cluster you give it.

Data Anonymization

Local compliance de-identification (HIPAA/GDPR).

High-Perf Storage

Scalable cluster indexing raw multimodal formats.

Intelligent Routing

Dispatches content to optimal downstream agents.

Parallel Processing

NLP, CV, and Speech agents process each record in parallel; their claims are then fused.

Ontology Mapping

Forces terminology to match LOINC, SNOMED, RxNorm.

Vector Embeddings

Encodes semantic relationships for reasoning models.

Variable Extraction

Pulls exact patient properties with absolute traceability.

Mega-Structure Output

Generates typed tables, FHIR bundles, and knowledge graphs.

Semantic Clinical Query

Search across records. Returns cited patient cohorts, not document links.

Longitudinal Timeline

Aligns diagnoses, biomarkers, and visits sequentially per patient.

Registry Extraction

Pre-fills oncology (NCDB) and cardiovascular (STS) fields directly.

Traceable Auditing

Every cell includes a one-click provenance jump to the source offset.

Zero-Trust Local Engine

Strips PII locally to meet HIPAA privacy requirements.

Standardized Ontologies

Grounds messy free-text into LOINC, SNOMED-CT, RxNorm schemas.

Unified Clinical Data Platform

Connect vastly fragmented medical systems and unstructured data formats directly to our secure, high-precision AI reasoning core.

STEP 01

Structures institution-specific clinical data into AI-ready datasets

Structures clinical data from highly fragmented sources with 99% accuracy

STEP 02

Real-time AI decision support and scalable insights

Innovative Salutera Algorithms

STEP 03

Fully secured and easy to use for non-IT professionals

WEB and Hand-held apps

System | Data Type
HIS | EHR Integration: Cerner, Epic, MEDITECH, and more.
EHR/EMR | FHIR Data: FHIR R4 resources.
DUR | Unstructured Data: data in different formats (PDF, imaging, DICOM, etc.).
PACS/RIS | Unstructured Data: data in a particular schema, e.g. tables.
LIS/LABS | Unstructured Data: PDF, .txt, imaging, DICOM, etc.
CRM | Structured Data: data in schema-based forms such as tables.

Multimodal AI/ML cross-references medical variables across billions of data points for precision health tailored to specific populations.

Our platform helps doctors identify the most appropriate approved treatments. It is fully secure and user-friendly.

What users can do

Ask clinical questions. Build cohorts.

MODULE_01 // NL_QUERY

Ask clinical questions

Semantic search across the corpus. Returns matching patients, not just documents.

salutera query engine

› PD-L1 >= 50% AND NSCLC

Matched patients: 1,847

Indexed 112,711 files · verified

MODULE_02 // COHORT

Build precise cohorts

Inclusion + exclusion logic in clinician language with clear audit trails.

Cohort Builder · LIVE

PD-L1 high

1L regimen

1.2K Pts
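Inclusion and exclusion logic with an audit trail can be sketched as named rules applied to each patient, recording which rules decided the outcome. The patient rows, field names, and the `build_cohort` function below are invented for illustration; only the rule labels "PD-L1 high" and "1L regimen" come from the builder shown above.

```python
from typing import Any, Callable, Dict, List, Tuple

Patient = Dict[str, Any]
Rule = Tuple[str, Callable[[Patient], bool]]


def build_cohort(patients: List[Patient], include: List[Rule], exclude: List[Rule]):
    """Return matching patient ids plus one audit row per patient explaining why."""
    cohort, audit = [], []
    for patient in patients:
        failed = [name for name, test in include if not test(patient)]
        hit = [name for name, test in exclude if test(patient)]
        kept = not failed and not hit
        audit.append({
            "patient": patient["id"],
            "kept": kept,
            "failed_inclusion": failed,
            "matched_exclusion": hit,
        })
        if kept:
            cohort.append(patient["id"])
    return cohort, audit


# Toy rows with made-up values; not taken from any real or demo record.
patients = [
    {"id": "pt-a", "pdl1_tps": None, "line": "1L", "alk_positive": True},
    {"id": "pt-b", "pdl1_tps": 62, "line": "1L", "alk_positive": False},
    {"id": "pt-c", "pdl1_tps": 10, "line": "2L", "alk_positive": False},
]
include = [
    ("PD-L1 high", lambda p: p["pdl1_tps"] is not None and p["pdl1_tps"] >= 50),
    ("1L regimen", lambda p: p["line"] == "1L"),
]
exclude = [("ALK positive", lambda p: p["alk_positive"])]

cohort, audit = build_cohort(patients, include, exclude)
print(cohort)
for row in audit:
    print(row)
```

Keeping an audit row for excluded patients as well as included ones lets a reviewer see why each record was left out.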

MODULE_03 // ABSTRACT

Extract variables

Pre-fill NCDB, STS, NSQIP registries directly from underlying records.

Field | Value | Source
Stage_TNM | T2N0M0 | consult.pdf
Gleason | 7 (3+4) | path_rpt.pdf
PSA_ng/mL | 6.8 | labs_q3.pdf

MODULE_04 // TIMELINE

Create timelines

Diagnosis, biomarker, and treatment assembled into one longitudinal view.

Patient Timeline

Dx · Nov '23
Bx · Jan '24
Tx · Mar '24
✓ · Sep '24
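Assembling a longitudinal view amounts to dating each extracted event and sorting. The sketch below reuses the three file names from the P-1043 workspace example and assumes, without confirmation from the source, that the trailing `_YYMMDD` token in each name is the document date; `date_from_filename` is a hypothetical helper.

```python
from datetime import date, datetime


def date_from_filename(name: str) -> date:
    # Assumption: the trailing _YYMMDD token in the file name is the document date.
    stem = name.rsplit(".", 1)[0]
    return datetime.strptime(stem.rsplit("_", 1)[1], "%y%m%d").date()


# Events listed out of order on purpose.
events = [
    ("Tx-1L", "oncology_note_240110.pdf"),
    ("Dx", "biopsy_report_231112.pdf"),
    ("Biomarker", "pathology_consult_231205.pdf"),
]

# Sort by date first; each tuple keeps the event kind and its source file.
timeline = sorted((date_from_filename(src), kind, src) for kind, src in events)
for when, kind, src in timeline:
    print(when.isoformat(), kind, src)
```

In practice the event date would come from the document content, since a file name may reflect the upload date and not the clinical date.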

MODULE_05 // PROVENANCE

Trace every claim

One-click jump to the source document, page, and exact passage.

consult_240904.pdf · pg. 4 · lines 27–31

Citation coverage: 100%

MODULE_06 // EXPORT

Export structured data

Typed tables, FHIR bundles, OMOP CDM mappings for downstream analytics.

{ "patient": "P-1043" }

FHIR R4 · OMOP · CSV
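As a rough picture of a FHIR R4 export, the sketch below wraps one PD-L1 result in an Observation inside a collection Bundle and points back to the source file through `derivedFrom`. It is a hand-built illustration, not Salutera's export format: the `pdl1_observation` helper and the choice to carry provenance in `derivedFrom` are assumptions, and the code is given as text only because no terminology code is stated in the source.

```python
import json


def pdl1_observation(patient_id: str, tps_percent: float, source_file: str) -> dict:
    """Build a minimal FHIR R4 Observation dict for a PD-L1 tumor proportion score."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "PD-L1 tumor proportion score"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": tps_percent,
            "unit": "%",
            "system": "http://unitsofmeasure.org",  # UCUM
            "code": "%",
        },
        # Assumption: the source document is referenced by display name only.
        "derivedFrom": [{"display": source_file}],
    }


bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": pdl1_observation("P-1043", 62, "pathology_consult_231205.pdf")}
    ],
}
print(json.dumps(bundle, indent=2))
```

A production export would validate the bundle against the FHIR R4 profiles in use at the receiving site before handing it downstream.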

Benchmarks & Evidence

Validation & Extraction Accuracy

Evaluated under strict exact-match criteria. Pinned model manifests keep results stable across mid-sized and government air-gapped deployments.

Evaluation set: 112,711 unstructured records

Colorectal Cancer Registry · 99.12% · Histopathology reports, PACS, DICOM, MRI
Prostate Cancer · 99.03% · PSA, Gleason score, margins
COPD (Pulmonary) · 98.89%
Asthma Registry · 98.12%

Cohort Metrics · 16K SYNTHETIC

Overall accuracy: 95.79% ±5.69%

Evidence & validation details

Methodology & Framework

Structuring across 16 diseases · 7 categories on a synthetic patient corpus modeled on CDC- and NIH-sourced statistics.

// Headline benchmark on synthetic cohort

16,000 synthetic patients

Synthea + Qwen-2.5 enrichment. Evaluated strictly under exact-match accuracy metrics across 16 diseases including oncology, respiratory, immunology, and neurology.
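Exact-match accuracy counts an extracted value as correct only when it equals the reference value exactly. The sketch below shows that metric per disease and then a mean and spread across diseases. The values are invented toy data, and treating the reported ± figure as a standard deviation across diseases is an assumption, since the source does not say how it was computed.

```python
from statistics import mean, pstdev
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of positions where the prediction equals the reference exactly."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("predicted and expected must be non-empty and equal length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Toy values, not benchmark data.
per_disease = {
    "disease_a": exact_match_accuracy(
        ["T2N0M0", "7 (3+4)", "6.8"], ["T2N0M0", "7 (3+4)", "6.8"]
    ),
    "disease_b": exact_match_accuracy(["T2N0M0", "T3N1M0"], ["T2N0M0", "T3N1M1"]),
}
scores = list(per_disease.values())
overall = mean(scores)
spread = pstdev(scores)  # population standard deviation across diseases
print(f"{overall:.2%} ± {spread:.2%}")
```

Exact match is deliberately strict: "T3N1M0" against "T3N1M1" scores zero even though two of the three components agree.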

Deployment footprint

Runs on commodity infrastructure. Scales linearly with the cluster you give it.

On-prem · VPC · Air-gapped

Zero data egress

// Grounded ontology domains & categories

Immunizations · Codes · Names · Medications · Symptoms · Conditions · Observations · Care plans · Procedures · Devices

Verticals

One clinical engine. Three regulated surfaces.

vertical 1

Hospitals & clinics

Registry abstraction, chart-review acceleration, quality-measure support, and source-cited clinical search inside your perimeter.

Hospital scenarios →

✓ Registry abstraction (NCDB · STS · NSQIP · GWTG)
✓ Longitudinal patient timelines
✓ Chart-review acceleration
✓ Quality measures from underlying records
✓ Source-cited clinical search
✓ No record egress · on-prem ready

vertical 2

Pharma & biotech R&D

Real-world evidence at the speed of your research questions. Federated by default; records never cross site boundaries.

Pharma R&D scenarios →

✓ RWE cohort discovery
✓ Trial feasibility · eligibility screening
✓ Phenotype & endpoint extraction
✓ Medical review support
✓ Formulation & CMC analytics
✓ Source evidence for every variable

vertical 3

Government & public health

Modernize registries, surveillance, and equity reporting using the records member institutions already keep.

Public-health scenarios →

✓ Disease registry modernization
✓ Cross-region surveillance
✓ Public-health reporting
✓ Quality benchmarking
✓ Equity reporting · SDoH stratification
✓ Air-gapped · sovereignty-aware

Security & deployment

Every claim should survive review.

Records stay in your perimeter

Deploy on-prem, in your VPC, or air-gapped. When Salutera runs in your environment, patient files never touch our infrastructure. Zero egress.

Tenant isolation

Per-customer compute, storage, and agents. No shared inference or cross-training.

BYO KMS + Encryption

AES-256 at rest, TLS 1.3 in transit. Customer-controlled KMS natively supported.

Audit Log Per Claim

Every extraction, query, and export is logged with operator, timestamp, and scope. Easily exportable to your enterprise SIEM.
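A per-claim audit entry with operator, timestamp, and scope can be written as one JSON object per line, a shape most SIEM tools can ingest. The sketch below is an assumption about format, not Salutera's log schema: `audit_entry` and its field names are hypothetical, the operator is a placeholder, and the timestamp is arbitrary and fixed only to make the output reproducible.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_entry(action: str, operator: str, scope: str,
                when: Optional[datetime] = None) -> str:
    """Serialize one audit event as a single JSON line."""
    when = when or datetime.now(timezone.utc)
    entry = {
        "timestamp": when.isoformat(),
        "action": action,
        "operator": operator,
        "scope": scope,
    }
    return json.dumps(entry, sort_keys=True)


line = audit_entry(
    "query",
    "user:example",  # placeholder operator
    "cohort:nsclc-pdl1-1L",
    when=datetime(2024, 8, 21, 12, 4, 21, tzinfo=timezone.utc),  # arbitrary fixed time
)
print(line)
```

Writing timestamps in UTC with an explicit offset avoids ambiguity when logs from several sites are merged.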

Model Governance

Pinned model manifests and staged updates. Customer sign-off is strictly required for tenant-level model changes.

HIPAA BAA available · GDPR DPA available · SOC 2 in progress · Full security posture →

Pilot offer

Bring us your hardest clinical dataset.

Pick the dataset that's been blocking your team. We'll structure 1,000 records inside your perimeter — or ours — in seven business days. You decide if it's good enough.

Request a clinical-data demo →

Talk to security