Parker
Stage 03 · Synthetic truth, openly licensed

Apex Atlas

Open data, at population scale.

A research-grade synthetic patient population for model training, simulation, and benchmarking. Apache 2.0 for open use; commercial agreement for production deployment.

The Problem · V2.0

Healthcare AI is bottlenecked by access to data. PHI lives behind decade-long DUAs, real-world cohorts under-represent edge cases, and academic datasets are too narrow to train on. Atlas removes the bottleneck — a continuously generated synthetic patient population, statistically faithful to the Data Lake, free of any real PHI, and openly licensed for the research community.

Capabilities · V2.0

What Atlas does.

POPULATION
Population-scale synthetic cohorts

A continuously regenerated synthetic patient population spanning U.S. demographics, common comorbidity structures, and longitudinal history. Distributions are statistically anchored to the source Data Lake.

MULTIMODAL
Beyond claims and notes

Synthetic EHR encounters, lab panels, imaging studies (DICOM-compliant), genomic variant calls, wearable streams, and free-text clinical notes — all generated as a single coherent record per patient.

FAITHFUL
Differentially private by construction

Generation pipelines are differentially private (ε ≤ 1.0) with respect to the source Data Lake, so re-identification risk is mathematically bounded. The guarantee is independently audited by MITRE and NIST PSCR.
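To make the ε bound concrete: a mechanism is ε-differentially private if adding or removing any one source record changes the probability of any output by a factor of at most e^ε, which is about 2.72 at ε = 1.0. The sketch below illustrates this with the textbook Laplace mechanism on a counting query. It is a generic illustration only. This page does not describe how the Atlas pipelines achieve their guarantee, and the function names here are hypothetical.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP using the Laplace mechanism.

    A counting query has sensitivity 1 (one record changes the count by
    at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def max_likelihood_ratio(epsilon: float) -> float:
    """Worst-case factor by which one record can shift any output probability."""
    return math.exp(epsilon)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
noisy = private_count(true_count=100, epsilon=1.0, rng=rng)
print(f"noisy count: {noisy:.2f}")
print(f"max likelihood ratio at epsilon=1.0: {max_likelihood_ratio(1.0):.3f}")
```

A smaller ε means a larger noise scale and a tighter bound on what any single record can reveal, at the cost of less accurate released statistics.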

STEERABLE
Cohort-on-demand

Need a synthetic cohort of patients with HFpEF + Stage III CKD on SGLT2 inhibitors? Specify the criteria and generate it through the Atlas API. This surfaces edge cases that are too rare in real-world data to study at scale.
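As a rough sketch of what specifying such criteria could look like, the snippet below builds a JSON request body for the HFpEF example. The Atlas API schema is not documented on this page, so every name here (CohortSpec, build_cohort_request, and the field names conditions, medications, and size) is a hypothetical placeholder, and no network call is made.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortSpec:
    """Hypothetical cohort criteria; field names are illustrative only."""

    conditions: tuple[str, ...]
    medications: tuple[str, ...] = ()
    size: int = 1000

    def __post_init__(self) -> None:
        # Reject specs that could not describe a meaningful cohort.
        if not self.conditions:
            raise ValueError("at least one condition is required")
        if self.size <= 0:
            raise ValueError("size must be positive")


def build_cohort_request(spec: CohortSpec) -> str:
    """Serialize the spec to a JSON body (assumed shape, not the real schema)."""
    payload = {
        "conditions": list(spec.conditions),
        "medications": list(spec.medications),
        "size": spec.size,
    }
    return json.dumps(payload, sort_keys=True)


spec = CohortSpec(
    conditions=("HFpEF", "CKD stage III"),
    medications=("SGLT2 inhibitor",),
    size=500,
)
print(build_cohort_request(spec))
```

Validating the spec before serializing keeps malformed requests from reaching the API at all, whatever its actual schema turns out to be.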

OPEN
Apache 2.0 for the commons

Core dataset, generation models, and benchmark suite are Apache 2.0. Use it in academic work, model evals, FDA submissions, or open-source health AI without a license conversation.

COMMERCIAL
Production rails when you need them

A separate commercial agreement covers production deployment, indemnification, custom cohort generation, and SLA-backed API access. Use the Apache 2.0 release for R&D, then convert to the commercial agreement at deployment.

Integrations · V2.0

Plays well with the stack you already run.

Every integration is first-party and maintained in-house. No fragile middleware, no orphaned connectors.

Formats
  • FHIR R5
  • OMOP CDM v5.4
  • i2b2
  • DICOM
  • VCF/BCF
  • CSV/Parquet bulk export
Distribution
  • Hugging Face Datasets
  • AWS Open Data
  • Google Cloud Public Datasets
  • Snowflake Marketplace
  • Direct Atlas API
Benchmarks
  • MedQA-Atlas
  • Clinical-NER-Atlas
  • Risk-Stratification-Atlas
  • Triage-LLM-Atlas
  • Coding-Accuracy-Atlas

Compliance posture · V2.0

  • Zero real PHI: de-identified by construction under both the HIPAA Safe Harbor and Expert Determination methods
  • Differential privacy guarantee (ε ≤ 1.0), independently audited
  • MITRE re-identification risk assessment published quarterly
  • Apache 2.0 license for the core dataset, generation models, and benchmark suite
  • Commercial Use Agreement with indemnification for production deployment
  • FDA Model-Informed Drug Development (MIDD) reference dataset