MIMIC EHR Severity Prediction Pipeline

This project simulates real-time EHR data ingestion using synthetic MIMIC-IV patient data and applies a machine learning pipeline to classify patient severity. It includes an end-to-end system for data streaming, feature engineering, model training, prediction, and exporting results to Snowflake for downstream analytics and clinical decision support.

Features

Real-time simulation of patient telemetry and note data
Feature engineering from clinical text and vitals/labs
Severity classification using Random Forest
Outputs augmented data streams to Snowflake
Modular and extensible pipeline for future model integration
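The real-time simulation listed above can be pictured as replaying stored rows one at a time. The following is a minimal sketch under stated assumptions, not the repository's implementation: the column names (subject_id, heart_rate, wbc), the inline sample data, and the replay_rows helper are all hypothetical.

```python
import csv
import io
import time
from typing import Dict, Iterator

# Hypothetical sample rows; the real column names in data/ may differ.
SAMPLE_CSV = """subject_id,heart_rate,wbc
1,88,7.2
2,131,15.4
3,72,6.1
"""


def replay_rows(csv_text: str, delay_s: float = 0.0) -> Iterator[Dict[str, str]]:
    """Yield CSV rows one by one, pausing after each to mimic a live feed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield dict(row)
        time.sleep(delay_s)  # use a positive delay to slow the replay down


# Collect the simulated events; a real consumer would score each one as it arrives.
events = list(replay_rows(SAMPLE_CSV))
for event in events:
    print(event)
```

Because the generator yields one row at a time, a downstream scoring step can consume it the same way it would consume messages from a streaming service.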

Machine Learning Details

The model is a RandomForestClassifier trained on derived features including:

Text-derived statistics (note length, word count)
Vitals and lab values (e.g., heart rate, WBC count)
Categorical data (gender, admission type)
Engineered features (blood pressure difference, vitals score)

Labels are binary (severe vs. non-severe) based on thresholding heuristics.
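To make the feature list and the labeling rule concrete, here is a small sketch of how such features and a threshold-based label might be derived from one record. The field names, the out-of-range cutoffs, and the score threshold are illustrative assumptions; the actual heuristics live in utils.py and config.py and may differ. Categorical encoding (gender, admission type) is left out for brevity.

```python
from typing import Dict


def engineer_features(record: Dict) -> Dict[str, float]:
    """Derive text, vitals, and engineered features from one patient record."""
    note = record["note"]
    features = {
        "note_length": float(len(note)),          # characters in the clinical note
        "word_count": float(len(note.split())),   # whitespace-separated tokens
        "heart_rate": float(record["heart_rate"]),
        "wbc": float(record["wbc"]),
        # Blood pressure difference: systolic minus diastolic.
        "bp_diff": float(record["sbp"] - record["dbp"]),
    }
    # Vitals score: number of out-of-range values (cutoffs are assumptions).
    features["vitals_score"] = float(
        (record["heart_rate"] > 100)
        + (record["sbp"] < 90)
        + (record["wbc"] > 12.0)
    )
    return features


def label_severity(features: Dict[str, float], score_threshold: float = 2.0) -> int:
    """Return 1 (severe) when the vitals score reaches the assumed threshold."""
    return int(features["vitals_score"] >= score_threshold)


record = {
    "note": "Patient febrile and tachycardic overnight",
    "heart_rate": 124,
    "wbc": 15.4,
    "sbp": 86,
    "dbp": 52,
}
features = engineer_features(record)
label = label_severity(features)
print(features, label)
```

Keeping the label rule in its own function makes it easy to swap the heuristic without touching feature extraction.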

File Structure

mimic-ehr-pipeline/
├── data/                      # Input CSVs (vitals, labs, notes)
├── ml_model/                  # Trained model artifacts (.pkl files)
├── config.py                  # Configuration settings (Snowflake, thresholds)
├── train_model.py             # Training pipeline for feature extraction and model fitting
├── realtime_etl.py            # Simulates real-time ingestion + scoring
├── requirements.txt           # Python dependencies
└── utils.py                   # Helper functions (feature engineering)

Use Case

train_model.py trains a Random Forest model on patient data and serializes it to model.pkl. The project is intended for data scientists and clinical engineers prototyping risk-scoring models on real-time health data feeds.

Start Kafka locally or connect to your streaming service.

pip install -r requirements.txt
python stream_producer.py

Run the ETL + ML scoring pipeline:

python realtime_etl.py

Model Training

python train_model.py
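As a rough picture of what a training run involves, the sketch below fits a RandomForestClassifier on synthetic rows and round-trips it through pickle. It is an illustration only: the three feature columns, the labeling rule, and the hyperparameters are assumptions, and the model is pickled in memory rather than written to a file.

```python
import pickle
import random

from sklearn.ensemble import RandomForestClassifier

random.seed(0)

# Synthetic rows of [heart_rate, wbc, bp_diff]; the label rule is an assumption.
X, y = [], []
for _ in range(200):
    hr = random.uniform(50, 150)
    wbc = random.uniform(3, 20)
    bp_diff = random.uniform(20, 70)
    X.append([hr, wbc, bp_diff])
    y.append(int(hr > 100 and wbc > 12))  # 1 = severe, 0 = non-severe

# Fit the classifier; random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Serialize and restore in memory; writing a .pkl file would use pickle.dump
# with an open binary file handle instead.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

sample = [[130.0, 16.0, 35.0], [70.0, 6.0, 40.0]]
predictions = restored.predict(sample)
print(predictions)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.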

License

This project is licensed under the MIT License.
