# 6-Layer Automated Validation for Synthetic Clinical Data

**Stephen J. Ronan, MD**
RonanLabs | ronanlabs.ai

April 2026

---

## Abstract

Synthetic clinical data promises to accelerate medical research, enable algorithm development, and facilitate regulatory submissions without exposing patient information. Yet the utility of synthetic data hinges entirely on its quality, and the field lacks standardized validation methodology. Most synthetic data vendors rely on a single validation metric---typically statistical similarity or downstream task performance---which fails to capture the multidimensional nature of clinical data fidelity. We present a 6-layer automated validation framework that evaluates every synthetic clinical dataset across statistical fidelity, clinical pathway accuracy, temporal consistency, train-synthetic-test-real (TSTR) utility, NLP coherence, and differential privacy guarantees. Each layer addresses a distinct failure mode that the others cannot detect. The framework produces a structured validation report that ships with every dataset, providing consumers with transparent, reproducible quality evidence. This paper describes the design, metrics, and thresholds for each layer, explains why multi-layer validation is necessary, and details the report format delivered to dataset consumers. Benchmark results against public clinical datasets will be published as production datasets are generated. The framework is implemented and operational at RonanLabs, where it runs as an automated pipeline gating every dataset release.

---

## 1. Introduction

### 1.1 The Promise and the Problem

Synthetic data generation for healthcare has moved from theoretical curiosity to commercial reality. Generative adversarial networks (GANs), variational autoencoders (VAEs), and large language models (LLMs) can now produce tabular patient records, clinical notes, medical images, and longitudinal care sequences that appear indistinguishable from real data at casual inspection. The market opportunity is clear: health systems need data for algorithm development, device manufacturers need data for regulatory submissions, and researchers need data that does not require 18 months of IRB review to access.

But the core problem in synthetic clinical data is not generation. Generation is solved, or at least solvable, by a growing number of architectures. The hard problem is *validation*---proving that the synthetic data is good enough for its intended use, and that it does not leak information about real patients.

### 1.2 The Current State

The synthetic data industry has a validation deficit. A survey of commercial synthetic data vendors reveals a common pattern: generate data using a neural architecture, compute a handful of statistical similarity metrics (typically marginal distributions and a correlation comparison), and ship the dataset with a one-page quality summary. Some vendors add a TSTR benchmark. Very few check clinical plausibility. Almost none provide formal privacy guarantees beyond vague claims of "de-identification."

This is not a criticism of the generation models. It is a criticism of the delivery process. A synthetic dataset that matches real-data distributions perfectly but contains impossible clinical sequences---a patient receiving chemotherapy before a cancer diagnosis, a newborn with a hip replacement, an HbA1c of 2.1%---is worse than useless. It will silently corrupt any downstream analysis trained on it.

### 1.3 The Cost of Bad Synthetic Data

In non-clinical domains, bad synthetic data wastes compute and delays projects. In clinical settings, the consequences compound. An algorithm trained on clinically implausible synthetic data may learn spurious correlations that produce unsafe predictions when deployed on real patients. A regulatory submission backed by synthetic data that fails under scrutiny damages the credibility of the entire synthetic data approach, not just the vendor. The FDA's evolving guidance on synthetic and augmented data for medical devices [1] makes clear that the burden of proof for synthetic data quality falls on the data producer.

RonanLabs was founded on the premise that validation is the product. The synthetic data itself is a commodity---what customers are actually buying is the evidence that the data is trustworthy. This paper describes the 6-layer validation framework that produces that evidence.

---

## 2. The Case for Multi-Layer Validation

### 2.1 Single-Metric Validation Fails

Consider the most common validation approach: computing distributional similarity between real and synthetic data using metrics like Jensen-Shannon divergence (JSD) or Kolmogorov-Smirnov (KS) statistics. A synthetic dataset can achieve excellent distributional similarity while containing clinical nonsense. If the marginal distributions of diagnosis codes, procedure codes, and timestamps all match the real data, but they are combined incorrectly---appendectomy codes paired with Alzheimer's diagnoses, pediatric patients with prostate cancer---the statistical fidelity layer will not catch it.

Now consider TSTR alone. A classifier trained on synthetic data and evaluated on real data may achieve high accuracy because the decision boundary depends on a subset of features that the generator reproduces well. The same dataset may fail catastrophically for a different downstream task that depends on features the generator handles poorly. TSTR validates utility for a specific task, not data quality in general.

### 2.2 Each Layer Catches What Others Miss

The 6-layer framework is designed so that each layer addresses a distinct failure mode:

| Layer | Failure Mode Detected |
|-------|----------------------|
| Statistical Fidelity | Distributional drift, missing modes, correlation collapse |
| Clinical Pathway Accuracy | Impossible care sequences, guideline violations |
| Temporal Consistency | Reversed event ordering, implausible inter-event intervals |
| TSTR Utility | Poor downstream task performance |
| NLP Coherence | Incoherent clinical notes, hallucinated findings |
| Differential Privacy | Patient re-identification, membership inference |

A dataset that passes all six layers has been validated across statistical, clinical, temporal, utility, linguistic, and privacy dimensions. A failure in any single layer blocks release.

### 2.3 Why Six Layers

The number six is not arbitrary, but neither is it canonical. Each layer was added because we identified a class of defect that existing layers could not detect. We stopped at six because, in practice, these layers cover the failure modes we have observed across tabular EHR data, longitudinal claims data, and generated clinical notes. Future data modalities (medical imaging, genomic sequences) may require additional layers. The framework is extensible by design.

We explicitly chose not to pursue ten or fifteen layers. Validation must be computationally tractable and interpretable. Each additional layer adds pipeline complexity, increases runtime, and requires its own threshold calibration. Six layers is the minimum sufficient set for the data types RonanLabs currently produces.

---

## 3. Layer 1: Statistical Fidelity

### 3.1 Metrics and Methodology

Statistical fidelity measures how well the synthetic data reproduces the statistical properties of the real data it was trained on. We evaluate three levels of fidelity:

**Marginal distributions.** For each feature independently, we compare the distribution in synthetic versus real data. For continuous variables, we use the two-sample Kolmogorov-Smirnov test and the Wasserstein-1 (earth mover's) distance [2]. For categorical variables, we use the chi-squared goodness-of-fit test and Jensen-Shannon divergence [3]. We report both the test statistic and the p-value, but thresholds are based on effect size (JSD and Wasserstein), not statistical significance, because significance is a function of sample size and will reject trivial differences in large datasets.

**Joint distributions.** We compute pairwise joint distributions for all feature pairs and measure divergence using the 2D Jensen-Shannon divergence. For high-cardinality categoricals, we use contingency table residuals. We flag feature pairs where the joint distribution divergence exceeds the sum of marginal divergences by more than a calibrated threshold, indicating that the generator failed to capture the dependency structure.

**Correlation matrices.** We compute the Pearson correlation matrix for continuous features and the Cramér's V matrix for categorical features, then measure the Frobenius norm of the difference between real and synthetic matrices. We also compute the mean absolute correlation difference per feature and flag features with correlation errors exceeding 0.1.
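The marginal-level checks can be sketched as follows. This is a minimal illustration using SciPy, not the production pipeline: the function name and the blood-pressure demo values are hypothetical, and the real implementation adds joint distributions, parallelism, and calibrated thresholds.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon

def marginal_fidelity(real, synth, categorical=False):
    """Compare one feature's marginal distribution between real and synthetic data."""
    if categorical:
        # Align category supports, then compare probability vectors.
        cats = sorted(set(real) | set(synth))
        p = np.array([list(real).count(c) for c in cats], dtype=float)
        q = np.array([list(synth).count(c) for c in cats], dtype=float)
        p, q = p / p.sum(), q / q.sum()
        # SciPy returns the JS *distance* (sqrt of the divergence); square it for JSD.
        return {"jsd": jensenshannon(p, q, base=2) ** 2}
    ks_stat, ks_p = stats.ks_2samp(real, synth)
    return {"ks": ks_stat, "p": ks_p,
            "wasserstein": stats.wasserstein_distance(real, synth)}

# Demo on a hypothetical continuous feature (e.g. systolic blood pressure).
rng = np.random.default_rng(0)
real = rng.normal(120, 15, 5000)
synth = rng.normal(121, 15, 5000)
print(marginal_fidelity(real, synth))
```

Note the thresholding rationale from above: the report keys off the effect-size metrics (`jsd`, `wasserstein`), while the p-value is reported for context only.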

### 3.2 What It Catches and What It Misses

Statistical fidelity catches mode collapse (a common GAN failure where entire subpopulations vanish from the synthetic data), distributional shift (the synthetic distribution is systematically biased relative to real), and correlation collapse (features that are correlated in real data become independent in synthetic data).

It does not catch clinical impossibilities. A synthetic patient with perfectly distributed vital signs, lab values, and diagnosis codes can still represent a clinically nonsensical care episode. This is why Layer 2 exists.

### 3.3 Implementation

The statistical fidelity pipeline ingests real and synthetic data as DataFrames, automatically detects column types, selects the appropriate test for each column and pair, computes all metrics in parallel, and produces a structured JSON report. Runtime scales linearly with feature count for marginals and quadratically for pairwise joints. For a typical EHR dataset with 200 features, full statistical fidelity runs in under 90 seconds.

---

## 4. Layer 2: Clinical Pathway Accuracy

### 4.1 Methodology

Clinical pathway validation checks whether the sequences of clinical events in synthetic patient records follow evidence-based care protocols. This is implemented through a combination of rule-based validators and knowledge graph traversal.

**Rule-based validators** encode hard clinical constraints as executable rules. Examples include: a treatment code must be preceded by a corresponding diagnosis code; laboratory result values must fall within physiologically possible ranges (e.g., serum sodium between 100 and 200 mEq/L, even for critically abnormal values); age-restricted procedures must match patient age (no pediatric patients with joint replacements indicated for degenerative disease); and medication dosages must fall within published ranges for the indicated condition.
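A rule of this kind can be encoded as a small executable check. The sketch below is illustrative only: the event schema, lab names, codes, and ranges are simplified stand-ins for the production rule set.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative hard constraints; names, codes, and ranges are hypothetical.
LAB_RANGES = {"serum_sodium_mEq_L": (100.0, 200.0), "hba1c_pct": (3.0, 20.0)}
REQUIRES_PRIOR_DX = {"chemo_admin": {"C50", "C34"}}  # rx code -> acceptable dx prefixes

@dataclass
class Event:
    day: int                       # days from index date
    kind: str                      # "dx", "rx", or "lab"
    code: str
    value: Optional[float] = None  # lab result value, if any

def validate_record(events):
    """Return the list of hard-rule violations for one synthetic patient record."""
    errors, seen_dx = [], set()
    for ev in sorted(events, key=lambda e: e.day):
        if ev.kind == "dx":
            seen_dx.add(ev.code.split(".")[0])
        elif ev.kind == "lab" and ev.value is not None:
            lo, hi = LAB_RANGES.get(ev.code, (float("-inf"), float("inf")))
            if not lo <= ev.value <= hi:
                errors.append(f"impossible lab value: {ev.code}={ev.value}")
        elif ev.kind == "rx" and ev.code in REQUIRES_PRIOR_DX:
            # Treatment must be preceded by a qualifying diagnosis.
            if not (REQUIRES_PRIOR_DX[ev.code] & seen_dx):
                errors.append(f"treatment before diagnosis: {ev.code}")
    return errors
```

A record with chemotherapy on day 0 and the cancer diagnosis on day 5 would fail the prior-diagnosis rule; the same events in the opposite order pass.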

**Knowledge graph validation** uses a clinical ontology that maps diagnosis codes to expected downstream events. For example, a Type 2 diabetes diagnosis (ICD-10 E11.x) should be followed, within a clinically reasonable window, by HbA1c monitoring, lifestyle counseling documentation, and, if HbA1c exceeds threshold, medication initiation following the ADA Standards of Care stepwise protocol [4]. We encode these expected pathways as graph traversals and flag synthetic records that deviate.

### 4.2 Example Pathways Validated

- **Diabetes management:** Diagnosis (E11.x) -> HbA1c lab order -> result -> metformin initiation if HbA1c >= 7.0% -> follow-up HbA1c at 3 months -> escalation if target not met
- **Chest pain workup:** ED presentation -> troponin order -> ECG -> risk stratification -> disposition (admit vs. discharge with follow-up)
- **Surgical pathway:** Pre-operative evaluation -> anesthesia clearance -> procedure -> post-operative orders -> follow-up
- **Sepsis bundle:** Suspected infection -> lactate draw -> blood cultures -> broad-spectrum antibiotics within 1 hour -> fluid resuscitation [5]
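A pathway like the diabetes example above can be encoded as expected-successor edges with time windows. The sketch below assumes hypothetical event names and windows and unique events per timeline; the production ontology is richer and conditional (e.g., escalation only if HbA1c misses target).

```python
# Expected-successor edges for a simplified diabetes pathway.
# (event -> list of (expected next event, max days until it)); illustrative only.
PATHWAY = {
    "dx_E11": [("hba1c_order", 90)],
    "hba1c_order": [("hba1c_result", 14)],
}

def check_pathway(timeline):
    """timeline: list of (event_name, day). Flag missing or late expected successors.
    Assumes each event name appears at most once per timeline."""
    warnings = []
    days = dict(timeline)
    for event, successors in PATHWAY.items():
        if event not in days:
            continue
        for nxt, window in successors:
            if nxt not in days or days[nxt] - days[event] > window:
                warnings.append(f"{event}: expected {nxt} within {window} days")
    return warnings
```

A timeline with a diabetes diagnosis but no HbA1c order within 90 days is flagged as a deviation (a warning, per the calibration policy in Section 4.3, not a hard error).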

### 4.3 Integration with Clinical Practice Guidelines

The rule engine references published clinical practice guidelines (CPGs) from the ADA, ACC/AHA, NCCN, and IDSA, among others. Guidelines are encoded as versioned rule sets so that pathway validation reflects the standard of care at the time the real data was collected, not necessarily the most current guideline. This is critical for historical datasets where practices have evolved.

We do not require that every synthetic patient follow every guideline perfectly---real patients do not. We flag *impossible* sequences (treatment before diagnosis) as errors and *guideline deviations* (delayed antibiotic initiation) as warnings, calibrated against the deviation rate observed in the real training data.

---

## 5. Layer 3: Temporal Consistency

### 5.1 Methodology

Temporal consistency validation ensures that events in synthetic patient timelines occur in logically and clinically plausible order, and that inter-event intervals follow realistic distributions. We check three properties:

**Causal ordering.** Events that have a causal prerequisite must occur after that prerequisite. Diagnosis before treatment. Lab order before lab result. Hospital admission before inpatient procedures. Discharge after all inpatient events. Birth before all other events. Death after all other events.

**Interval plausibility.** The time between causally related events must fall within a plausible range. We model inter-event intervals using the empirical distribution from the real data and flag synthetic intervals that fall outside the 0.5th--99.5th percentile range. For example, the interval between a lab order and its result should typically be minutes to days, not months.

**Temporal density.** The number of events per unit time should be consistent with clinical reality. A synthetic patient with 50 emergency department visits in a single week, or a 10-year gap with zero encounters for a patient with documented chronic disease, is flagged.
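The first two checks can be sketched as follows. The prerequisite map and function names are illustrative; the production validator covers the full prerequisite graph and per-pair interval models.

```python
import numpy as np

# Causal prerequisites: event -> event that must occur first (illustrative subset).
PREREQ = {"lab_result": "lab_order", "discharge": "admission"}

def causal_violations(timeline):
    """timeline: list of (event, day). Return causal-ordering violations."""
    first_seen, violations = {}, []
    for event, day in sorted(timeline, key=lambda e: e[1]):
        first_seen.setdefault(event, day)
        pre = PREREQ.get(event)
        if pre is not None and first_seen.get(pre, float("inf")) > day:
            violations.append(f"{event} at day {day} precedes {pre}")
    return violations

def interval_bounds(real_intervals, lo_pct=0.5, hi_pct=99.5):
    """Plausible-interval band from the real data's empirical distribution;
    synthetic intervals outside [lo, hi] are flagged."""
    return np.percentile(real_intervals, [lo_pct, hi_pct])
```

A lab result dated before its order is reported as a causal violation; interval checks then catch results that follow their order by implausible spans (months rather than minutes to days).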

### 5.2 Common Temporal Violations in Synthetic Data

In our experience, temporal violations are among the most common defects in synthetic clinical data. Generators that model events independently and then assemble timelines frequently produce:

- **Reversed causality:** Lab results that precede lab orders, discharge summaries dated before admission
- **Collapsed timelines:** Events that should span weeks compressed into a single day, often caused by generators that do not model inter-event intervals explicitly
- **Immortal patients:** Patients with events recorded after a documented death date
- **Phantom gaps:** Long periods with no clinical activity for patients with conditions requiring regular monitoring, caused by generators that undersample follow-up events

### 5.3 How Generators Cause Temporal Errors

Most tabular synthetic data generators treat each row (encounter, claim, event) independently, sampling features from learned distributions. Temporal relationships are either ignored entirely or modeled as a single "days since previous event" feature that does not capture the causal structure. Autoregressive generators (including LLM-based approaches) handle ordering more naturally but can still hallucinate impossible intervals, particularly for rare event sequences that appear infrequently in training data.

---

## 6. Layer 4: TSTR Utility

### 6.1 Benchmark Methodology

The Train-Synthetic-Test-Real (TSTR) paradigm, introduced by Esteban et al. [6], is the standard benchmark for synthetic data utility. The protocol is:

1. Split the real dataset into training (70%) and test (30%) partitions.
2. Train a generator on the real training partition.
3. Generate a synthetic dataset of equal size to the real training partition.
4. Train a classifier on the synthetic data (TSTR).
5. Train a separate classifier on the real training data (TRTR, the baseline).
6. Evaluate both classifiers on the real test partition.
7. Report the ratio of TSTR performance to TRTR performance.

A TSTR/TRTR ratio of 1.0 indicates that synthetic data is as useful as real data for the given task. Ratios above 0.9 are generally considered excellent; below 0.8 indicates meaningful utility loss.
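The protocol can be sketched with scikit-learn. This minimal version uses a single random-forest classifier in place of the three-model ensemble and omits the cross-validated tuning described below; the function name is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def tstr_ratio(X_real, y_real, X_synth, y_synth, seed=0):
    """TSTR/TRTR AUROC ratio on a held-out real test partition."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_real, y_real, test_size=0.3, random_state=seed, stratify=y_real)
    trtr = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)    # real-trained baseline
    tstr = RandomForestClassifier(random_state=seed).fit(X_synth, y_synth)  # synthetic-trained
    auc_trtr = roc_auc_score(y_te, trtr.predict_proba(X_te)[:, 1])
    auc_tstr = roc_auc_score(y_te, tstr.predict_proba(X_te)[:, 1])
    return auc_tstr / auc_trtr
```

In the full pipeline, this computation is repeated per classifier and per metric, and the minimum ratio across classifiers is compared against the threshold.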

### 6.2 Classifier Selection and Cross-Validation

We run TSTR with three classifiers: XGBoost [7], logistic regression, and random forest. Using multiple classifiers guards against the case where a single model's inductive bias masks or amplifies synthetic data defects. We report results for all three and flag the dataset if *any* classifier produces a TSTR/TRTR ratio below the threshold.

Each classifier is tuned via 5-fold cross-validation on its respective training set (synthetic for TSTR, real for TRTR) to ensure hyperparameters are optimized for the data distribution, not carried over from a default that may favor one data source.

For classification tasks, we report AUROC, AUPRC, and F1. For regression tasks, we report RMSE and R-squared. We always report metrics on the held-out real test set.

### 6.3 Interpreting TSTR Scores

A high TSTR score does not mean the synthetic data is "good" in general---it means the synthetic data preserves the signal relevant to a specific prediction task. A dataset can achieve TSTR/TRTR > 0.95 for mortality prediction while failing badly for readmission prediction if the generator captures acuity signals but not social determinants of health.

This is precisely why TSTR alone is insufficient. It validates utility, not fidelity. A dataset that passes TSTR but fails clinical pathway validation is useful for one narrow task but unreliable for exploratory analysis, hypothesis generation, or any purpose beyond the benchmarked prediction.

---

## 7. Layer 5: NLP Coherence

### 7.1 Metrics for Clinical Text Quality

When synthetic datasets include generated clinical notes (discharge summaries, progress notes, radiology reports), we evaluate text quality across four dimensions:

**Perplexity.** We measure perplexity using a clinical language model (e.g., BioGPT [8], Clinical-BERT [9]) as a proxy for linguistic coherence. Lower perplexity indicates that the generated text is consistent with the patterns of real clinical writing. We compare synthetic note perplexity against the distribution of perplexities computed over real notes from the same institution and note type.

**Medical entity density.** Using clinical NER (MetaMap [10], SciSpacy [11]), we extract medical entities (diagnoses, medications, procedures, anatomical sites, lab values) from synthetic notes and compare entity density (entities per 100 tokens) against real notes. Synthetic notes that are fluent but clinically vague---lacking specific medications, dosages, lab values---score low on entity density even if their perplexity is acceptable.

**Hallucination rate.** For synthetic notes that are generated conditioned on structured data (e.g., a discharge summary generated from a patient's coded diagnoses, procedures, and lab results), we check every clinical claim in the narrative against the structured record. A note that states "potassium was 3.2 mEq/L" when the structured data shows potassium of 4.8 is a hallucination. We report the hallucination rate as the fraction of verifiable clinical claims in the note that contradict the structured record.

**Readability.** We compute Flesch-Kincaid grade level and the Dale-Chall readability score. Clinical notes have characteristic readability profiles that vary by note type (radiology reports are more formulaic than progress notes). Synthetic notes that deviate significantly from the expected readability distribution for their note type are flagged.
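For concreteness, perplexity is exp of the negative mean token log-probability. The toy sketch below illustrates the formula with an add-one-smoothed unigram model; it is not the production scorer, which uses a clinical language model as described above.

```python
import math
from collections import Counter

def unigram_perplexity(text, reference_corpus):
    """Perplexity of `text` under a unigram model fit on `reference_corpus`.
    Illustrates PPL = exp(-mean log p(token)); toy stand-in for a clinical LM."""
    tokens = reference_corpus.lower().split()
    counts = Counter(tokens)
    vocab = len(counts) + 1  # +1 unseen-token slot for add-one smoothing
    total = len(tokens)
    logps = [math.log((counts[t] + 1) / (total + vocab))
             for t in text.lower().split()]
    return math.exp(-sum(logps) / len(logps))
```

Text drawn from the reference distribution scores lower perplexity than out-of-distribution text, which is the property the layer exploits when comparing synthetic notes against real notes of the same type.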

### 7.2 Hallucination Detection Methodology

Hallucination detection is the most technically demanding component of this layer. Our approach:

1. Extract all clinical assertions from the synthetic note using an assertion detection model.
2. Classify each assertion as verifiable (can be checked against structured data) or unverifiable (subjective assessments, clinical reasoning).
3. For verifiable assertions, attempt to match against the structured record using entity linking and value comparison.
4. Score: hallucination rate = contradicted assertions / total verifiable assertions.

We target a hallucination rate below 2% for conditioned generation. Notes with hallucination rates above 5% trigger a dataset hold.
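Step 3 of the procedure (value comparison for verifiable assertions) can be sketched as below. This is a deliberately simplified regex extractor with a hypothetical function name and tolerance; the production layer uses an assertion-detection model and entity linking rather than pattern matching.

```python
import re

def lab_claim_contradictions(note_text, structured_labs, tol=0.05):
    """Check numeric lab claims like 'potassium was 3.2' against the structured
    record. Returns (contradicted, verifiable) claim counts."""
    contradicted, verifiable = 0, 0
    for name, claimed in re.findall(r"(\w+) (?:was|is|of) (\d+\.?\d*)",
                                    note_text.lower()):
        if name in structured_labs:
            verifiable += 1
            truth = structured_labs[name]
            # Flag claims outside a relative tolerance of the recorded value.
            if abs(float(claimed) - truth) > tol * max(abs(truth), 1.0):
                contradicted += 1
    return contradicted, verifiable
```

For the example from above, a note stating "potassium was 3.2" against a structured value of 4.8 yields one contradicted claim; the hallucination rate is then the ratio of contradicted to verifiable claims.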

### 7.3 Coherence vs. Specificity Tradeoff

There is an inherent tension between coherence and specificity in clinical text generation. A model that generates vague, template-like notes ("Patient was seen, labs were reviewed, plan was discussed") will achieve low perplexity and zero hallucinations but provides no clinical value. A model that generates highly specific notes with exact lab values, medication dosages, and clinical reasoning risks hallucination if the generation is not tightly conditioned on structured data.

We address this tradeoff by requiring that both perplexity *and* entity density fall within acceptable ranges. A note must be both linguistically coherent and clinically specific to pass.

---

## 8. Layer 6: Differential Privacy

### 8.1 Privacy Threat Model

Synthetic data is not inherently private. A generator that memorizes training examples can reproduce real patient records verbatim in its output. Even without memorization, synthetic records that are statistically close to real records may enable re-identification by an adversary with auxiliary knowledge [12].

Our threat model assumes an adversary who:

- Has access to the synthetic dataset
- May have auxiliary information about specific individuals in the training data (e.g., knows that a particular person was in the dataset)
- Attempts membership inference (was a specific individual in the training data?) or attribute inference (given partial knowledge of an individual, can the adversary infer sensitive attributes?)

### 8.2 Re-Identification Risk Assessment

We compute three privacy metrics:

**Nearest-neighbor distance ratio (NNDR).** For each synthetic record, we find the nearest neighbor in the real training data (using Gower distance for mixed data types [13]) and compute the ratio of the distance to the nearest real neighbor versus the distance to the nearest synthetic neighbor. An NNDR close to 1.0 indicates that synthetic records are no closer to real records than they are to other synthetic records. NNDR values below 0.5 suggest potential memorization.

**Membership inference attack (MIA) resistance.** We implement the membership inference attack of Shokri et al. [14], adapted for generative models. We train a binary classifier to distinguish training data from held-out data based on the generator's behavior. We report the attacker's AUROC; a score near 0.5 indicates that the generator leaks no membership information.

**Hitting rate.** The fraction of synthetic records whose nearest real neighbor is closer than the 5th percentile of real-to-real distances. A hitting rate above 5% triggers review, as it suggests the generator is producing records that are suspiciously close to specific real patients.
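The distance-based metrics (NNDR and hitting rate) can be sketched with scikit-learn. The sketch uses Euclidean distance on numeric features and summarizes NNDR with a median; as noted above, the production implementation uses Gower distance for mixed data types, and the function names here are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def privacy_distances(real, synth):
    """Distance from each synthetic row to its nearest real row, and to its
    nearest *other* synthetic row."""
    to_real = NearestNeighbors(n_neighbors=1).fit(real).kneighbors(synth)[0][:, 0]
    to_synth = NearestNeighbors(n_neighbors=2).fit(synth).kneighbors(synth)[0][:, 1]  # index 0 is self
    return to_real, to_synth

def nndr_and_hitting_rate(real, synth):
    to_real, to_synth = privacy_distances(real, synth)
    nndr = np.median(to_real / np.maximum(to_synth, 1e-12))
    # Hitting rate: synthetic rows closer to a real row than the 5th
    # percentile of real-to-real nearest-neighbor distances.
    rr = NearestNeighbors(n_neighbors=2).fit(real).kneighbors(real)[0][:, 1]
    hitting_rate = float(np.mean(to_real < np.percentile(rr, 5)))
    return nndr, hitting_rate
```

A generator that memorizes its training set produces synthetic rows nearly on top of real rows: NNDR collapses toward zero and the hitting rate approaches 100%, both of which trigger review.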

### 8.3 Formal Privacy Guarantees

Where required, we apply differentially private training mechanisms. Differential privacy (DP), as formalized by Dwork et al. [15], provides a mathematical guarantee that the inclusion or exclusion of any single individual in the training data does not significantly change the distribution of the generator's output. We support DP-SGD [16] for neural generators and the exponential mechanism for query-based generation.

We report the privacy budget (epsilon, delta) for each dataset. Following emerging consensus in the privacy community, we target epsilon values below 10 for general-purpose synthetic data and below 1 for datasets containing highly sensitive attributes (mental health, substance use, HIV status) [17].

It is important to note that DP and utility are in tension. Lower epsilon (stronger privacy) generally reduces data utility. The validation report makes this tradeoff explicit by presenting Layer 4 (TSTR) results alongside Layer 6 (privacy) results, allowing consumers to assess whether the privacy-utility tradeoff is acceptable for their use case.

---

## 9. Validation Report Format

### 9.1 What Ships with Every Dataset

Every synthetic dataset released by RonanLabs includes a validation report in both human-readable (PDF) and machine-readable (JSON) formats. The report contains:

- **Dataset metadata:** Source data description, generation method, number of records, feature count, generation date, generator version
- **Layer-by-layer results:** For each of the six layers, the full set of computed metrics, pass/fail determination, and any warnings
- **Overall disposition:** PASS (all layers pass), CONDITIONAL PASS (all layers pass but warnings exist), or FAIL (one or more layers fail)
- **Reproducibility hash:** SHA-256 hash of the synthetic dataset, the validation code version, and all configuration parameters, enabling independent verification
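The reproducibility hash can be computed along these lines (a sketch; the exact field set and serialization in the shipped report may differ, and the names here are assumptions):

```python
import hashlib
import json

def reproducibility_hash(dataset_bytes, code_version, config):
    """SHA-256 over the dataset bytes, validator code version, and a canonical
    JSON rendering of the configuration parameters."""
    h = hashlib.sha256()
    h.update(dataset_bytes)
    h.update(code_version.encode())
    # sort_keys makes the JSON rendering deterministic across runs.
    h.update(json.dumps(config, sort_keys=True).encode())
    return h.hexdigest()
```

Because the hash covers data, code version, and configuration together, a consumer who reruns validation with the same inputs can verify that they reproduced the exact conditions of the shipped report.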

### 9.2 How to Read the Report

Each layer section follows a consistent structure:

1. **Summary verdict:** Green (pass), yellow (pass with warnings), or red (fail)
2. **Aggregate metrics:** The headline numbers for the layer (e.g., mean JSD across all features, TSTR/TRTR ratio, hallucination rate)
3. **Feature-level detail:** Per-feature or per-pathway results, sorted by severity
4. **Flagged items:** Specific features, pathways, or records that triggered warnings or failures, with explanations

### 9.3 Thresholds

Thresholds are calibrated per data type and use case. Default thresholds for tabular EHR data:

| Layer | Metric | Green | Yellow | Red |
|-------|--------|-------|--------|-----|
| Statistical Fidelity | Mean JSD (categorical) | < 0.05 | 0.05--0.15 | > 0.15 |
| Statistical Fidelity | Mean KS stat (continuous) | < 0.08 | 0.08--0.20 | > 0.20 |
| Statistical Fidelity | Correlation matrix Frobenius error | < 0.10 | 0.10--0.25 | > 0.25 |
| Clinical Pathway | Impossible sequence rate | 0% | < 1% | >= 1% |
| Temporal Consistency | Causal violation rate | 0% | < 0.5% | >= 0.5% |
| TSTR Utility | Min TSTR/TRTR ratio (across classifiers) | > 0.90 | 0.80--0.90 | < 0.80 |
| NLP Coherence | Hallucination rate | < 2% | 2--5% | > 5% |
| Differential Privacy | MIA AUROC | < 0.55 | 0.55--0.65 | > 0.65 |
| Differential Privacy | Hitting rate | < 3% | 3--5% | > 5% |

Thresholds are documented in the report and can be adjusted by contract for specific use cases where tighter or looser bounds are appropriate.
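The threshold logic maps metrics to verdicts and verdicts to the overall disposition. The sketch below encodes a subset of the table with simplified boundary conventions (the table mixes inclusive and exclusive band edges); higher is worse for every metric shown.

```python
# (green upper bound, yellow upper bound) per metric; subset of the table above.
THRESHOLDS = {
    "mean_jsd": (0.05, 0.15),
    "impossible_sequence_rate": (0.0, 0.01),
    "hallucination_rate": (0.02, 0.05),
    "mia_auroc": (0.55, 0.65),
}

def verdict(metric, value):
    """Map one metric value to a green/yellow/red verdict."""
    green, yellow = THRESHOLDS[metric]
    if value <= green:
        return "green"
    return "yellow" if value <= yellow else "red"

def dataset_disposition(verdicts):
    """Any red fails the dataset; any yellow downgrades to conditional pass."""
    if "red" in verdicts:
        return "FAIL"
    return "CONDITIONAL PASS" if "yellow" in verdicts else "PASS"
```

This mirrors the release gate described in Section 2.2: a single red verdict in any layer blocks release, while yellows surface as warnings in the report.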

---

## 10. Discussion and Future Work

### 10.1 Limitations

The framework has known limitations. Clinical pathway validation is only as complete as the encoded rule set; rare disease pathways and novel treatment protocols may not be covered. TSTR benchmarks validate utility for specific prediction tasks and do not guarantee utility for all possible downstream uses. Differential privacy bounds assume correct implementation of the DP mechanism; we rely on audited DP libraries (Opacus [18], Google DP [19]) but acknowledge that implementation bugs can invalidate formal guarantees.

### 10.2 Toward Continuous Validation

The current framework validates at the dataset level: a dataset is generated, validated, and released. Future work will extend validation to the record level, enabling streaming validation where individual synthetic records are scored as they are generated. This supports use cases where consumers generate synthetic data on demand via API and need per-record quality scores.

### 10.3 Domain-Specific Extensions

The 6-layer framework was designed for tabular EHR and claims data with optional clinical notes. Extending it to medical imaging (synthetic X-rays, CT scans, pathology slides) will require additional layers for image quality assessment, anatomical plausibility, and diagnostic accuracy. Genomic data will require validation of allele frequencies, linkage disequilibrium, and population structure. These extensions are planned and will be documented in subsequent publications.

### 10.4 Benchmarking and Open Science

We intend to publish benchmark results on public datasets (MIMIC-IV [20], Synthea [21], eICU [22]) to enable independent comparison. Validation code and threshold configurations will be released under open-source license to support reproducibility and community adoption. We believe that standardized validation methodology benefits the entire synthetic data ecosystem, including our competitors, by raising the quality bar and building trust with data consumers.

---

## References

[1] U.S. Food and Drug Administration. "Artificial Intelligence and Machine Learning in Software as a Medical Device: Action Plan." FDA, 2021. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device

[2] Ramdas, A., Garcia Trillos, N., & Cuturi, M. "On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests." *Entropy*, 19(2), 47, 2017.

[3] Lin, J. "Divergence measures based on the Shannon entropy." *IEEE Transactions on Information Theory*, 37(1), 145--151, 1991.

[4] American Diabetes Association Professional Practice Committee. "Standards of Care in Diabetes---2024." *Diabetes Care*, 47(Suppl 1), S1--S321, 2024.

[5] Evans, L., Rhodes, A., Alhazzani, W., et al. "Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock 2021." *Intensive Care Medicine*, 47, 1181--1247, 2021.

[6] Esteban, C., Hyland, S. L., & Rätsch, G. "Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs." arXiv:1706.02633, 2017.

[7] Chen, T. & Guestrin, C. "XGBoost: A Scalable Tree Boosting System." *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 785--794, 2016.

[8] Luo, R., Sun, L., Xia, Y., et al. "BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining." *Briefings in Bioinformatics*, 23(6), bbac409, 2022.

[9] Alsentzer, E., Murphy, J. R., Boag, W., et al. "Publicly Available Clinical BERT Embeddings." *Proceedings of the 2nd Clinical Natural Language Processing Workshop*, 72--78, 2019.

[10] Aronson, A. R. & Lang, F. M. "An overview of MetaMap: historical perspective and recent advances." *Journal of the American Medical Informatics Association*, 17(3), 229--236, 2010.

[11] Neumann, M., King, D., Beltagy, I., & Ammar, W. "ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing." *Proceedings of the 18th BioNLP Workshop and Shared Task*, 319--327, 2019.

[12] Stadler, T., Oprisanu, B., & Troncoso, C. "Synthetic Data -- Anonymisation Groundhog Day." *Proceedings of the 31st USENIX Security Symposium*, 2022.

[13] Gower, J. C. "A General Coefficient of Similarity and Some of Its Properties." *Biometrics*, 27(4), 857--871, 1971.

[14] Shokri, R., Stronati, M., Song, C., & Shmatikov, V. "Membership Inference Attacks Against Machine Learning Models." *2017 IEEE Symposium on Security and Privacy*, 3--18, 2017.

[15] Dwork, C., McSherry, F., Nissim, K., & Smith, A. "Calibrating Noise to Sensitivity in Private Data Analysis." *Theory of Cryptography Conference*, 265--284, 2006.

[16] Abadi, M., Chu, A., Goodfellow, I., et al. "Deep Learning with Differential Privacy." *Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security*, 308--318, 2016.

[17] Abowd, J. M. "The U.S. Census Bureau Adopts Differential Privacy." *Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2867, 2018.

[18] Yousefpour, A., Shilov, I., Sablayrolles, A., et al. "Opacus: User-Friendly Differential Privacy Library in PyTorch." arXiv:2109.12298, 2021.

[19] Wilson, R. J., Zhang, C. Y., Lam, W., et al. "Differentially Private SQL with Bounded User Contribution." *Proceedings on Privacy Enhancing Technologies*, 2020(2), 230--250, 2020.

[20] Johnson, A. E. W., Bulgarelli, L., Shen, L., et al. "MIMIC-IV, a freely accessible electronic health record dataset." *Scientific Data*, 10, 1, 2023.

[21] Walonoski, J., Kramer, M., Nichols, J., et al. "Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record." *Journal of the American Medical Informatics Association*, 25(3), 230--238, 2018.

[22] Pollard, T. J., Johnson, A. E. W., Raffa, J. D., et al. "The eICU Collaborative Research Database, a freely available multi-center database for critical care research." *Scientific Data*, 5, 180178, 2018.

---

*Copyright 2026 RonanLabs. All rights reserved.*
