The code in this repository reflects the steps used to pre-process data for machine learning and to train models that predict PTSD and create a methylation risk score, as published in BMC Medical Genomics. The final weights and features for each of the three published risk scores are also provided.
Key Instruction: The weights and features for each of the three published risk scores are located in the Data folder. The files are named as follows:
- eMRS_Model1.xlsx
- MoRS_Model2.xlsx
- MoRSAE_Model3.xlsx
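
As a rough illustration of how a file of weights and features is typically applied, the sketch below computes a score as a weighted sum of feature values. This is an assumption about usage, not the authors' published code: the column layout of the `.xlsx` files is not described here, the function name `risk_score` is hypothetical, and the CpG IDs and weights are made up for the example. Whether an intercept or any scaling applies is not covered.

```python
from typing import Mapping


def risk_score(weights: Mapping[str, float], features: Mapping[str, float]) -> float:
    """Return the weighted sum of feature values for one sample.

    Assumption: the score is a plain linear combination of the listed
    features. Raises KeyError if a weighted feature is absent from the sample.
    """
    missing = [name for name in weights if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return sum(weight * features[name] for name, weight in weights.items())


# Hypothetical weights and one hypothetical sample (not taken from the Data folder).
example_weights = {"cg00000001": 0.5, "cg00000002": -0.25}
example_sample = {"cg00000001": 0.8, "cg00000002": 0.4, "cg00000003": 0.1}

example_score = risk_score(example_weights, example_sample)
print(f"risk score: {example_score:.3f}")
```

Features present in the sample but absent from the weights (here `cg00000003`) are ignored, while a missing weighted feature raises an error rather than being silently treated as zero.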

- `install_needed_packages.R`: install required packages.
- `DNHS_more_pheno.R`: get the required variables for DNHS.
- `Smoking_Scores_PGC_cohorts.R`: estimates smoking scores for each individual discovery cohort.
- `MRS_Preprocess.R`: pre-process Marine Resilience Study cohort to include in the training.
- `Armystarrs_and_PRISMO_preprocess.R`: pre-process Army STARRS and PRISMO cohorts pre-post deployment samples to test risk scores.
- `Check_after_updating_pheno.R` and `Check_after_updating_pheno.html`: check the updated phenotype file with the old file.
- `cpgassoc2.R`: helper function to perform association analysis between each CpG and PTSD.
- `Covariate_adjustment_1.R`: example code to show covariate adjustment. paper as we thought to make it an Epic data paper.
- `Compare_Effect_Sizes.Rmd` and `Compare_Effect_Sizes.html`: code to compare the effect sizes of discovery and Boston VA cohort for model1.
- `Demographics.R`: code to get demographic information for the manuscript.
- `Cohort_Information.Rmd` and `Cohort_Information.html`: code to get summary information from different cohorts, e.g., variables in each cohort to check data availability.

- `makedirectory.py`: creates a directory to store the output files from each run.
- `Settings.ipynb`: contains settings for packages and plots.

1. `Preprocess_data_updated_1.ipynb`: pre-processes all cohorts individually for machine learning.
2. `pre_post_trauma_processing_v1.ipynb`: pre-processes the cohorts with pre/post samples and chooses post-trauma samples for machine learning.
3. `Imputation_Covariate_adjustment_2.1.ipyn`: performs imputation and covariate adjustment.
4. `Imputation_Covariate_adjustment_including_Expo_vaiables_2.1.ipynb`: performs imputation and covariate adjustment, including exposure variables.
5. `Feature_Selection_and_training_on_ptsdpm_3.3.ipynb`: feature selection using the covariate-adjusted data (output of step 3).
6. `Feature_Selection_and_training_on_ptsdpm_wd_exp_vars_adjustment_3.3.ipynb`: feature selection using the data covariate-adjusted for exposure variables (input is step 4 output).
7. `model_performance_5.5.ipynb`: runs the model and evaluates its performance (input is step 5 output).
8. `model_performance_wd_exp_vaars_adjustment_5.5.ipynb`: runs the model and evaluates its performance with adjusted exposure variables (input is step 6 output).

- `downstream_analysis_v5.qmd`: estimates risk scores for models 1 and 2 and tests them using the test set in the discovery cohorts. `downstream_analysis_v5.html` is the generated report. In steps 2 and 3, we test various data sets, such as the test set, civilians, military, males, and females, to look at various scenarios.
- `downstream_analysis_adj_for_Exp_Vars_v5.qmd`: estimates and tests risk scores using model 3 on the test data set. `downstream_analysis_adj_for_Exp_Vars_v5.html` is the generated report.
- `Test_RiskScores_with&without_exp_vars_wd_logit_6.Rmd`: a clean version of estimating and testing risk scores. It uses the point-biserial correlation between binary and continuous variables, and a logit model to predict PTSD from the risk scores. `Test_RiskScores_with&without_exp_vars_wd_logit_6.html` is the generated report. This file was used to generate density, distribution, and correlation plots for the discovery cohorts.
- `Pre_Post_Deployment_eMRS.qmd` and `Pre_Post_Deployment_eMRS.html`: test risk scores pre- and post-deployment.
- `Enrichment_analysis_1.qmd`: performs enrichment analysis of the top CpGs from models 1, 2, and 3. Models 1 and 2 have the same set of CpGs.
- `CpGs_in_previous_studies&ML.R`: code to find the overlap between the identified significant CpGs and previous studies.
- `Overlap_between_MRS_CpGs_metaanalysis_CpGs_Freeze3.R`: checks the overlap between the identified significant CpGs and the PGC EWAS meta-analysis and Freeze3 genes.
- `mQTL.qmd` and `mQTL.html`: compare significant CpGs with BIOS QTL browser CpGs.
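
For readers unfamiliar with the point-biserial correlation mentioned above, the following is a minimal sketch of the statistic for a binary variable (for example, PTSD status coded 0/1) and a continuous variable (for example, a risk score). It is not the code from the repository's reports; the function name `point_biserial` and the example values are hypothetical, and only the standard textbook formula is used.

```python
import math
from typing import Sequence


def point_biserial(binary: Sequence[int], values: Sequence[float]) -> float:
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Uses r = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the group
    means, s is the population standard deviation of all values, and p is the
    proportion of observations coded 1. This equals the Pearson correlation.
    """
    if len(binary) != len(values):
        raise ValueError("inputs must have the same length")
    if any(b not in (0, 1) for b in binary):
        raise ValueError("binary variable must be coded 0/1")
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    if not group1 or not group0:
        raise ValueError("both groups need at least one observation")
    n = len(values)
    mean_all = sum(values) / n
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("continuous variable has zero variance")
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    p = len(group1) / n
    return (mean1 - mean0) / sd * math.sqrt(p * (1 - p))


# Hypothetical labels and scores, for illustration only.
example_labels = [0, 0, 1, 1]
example_scores = [1.0, 2.0, 3.0, 4.0]
print(f"point-biserial r = {point_biserial(example_labels, example_scores):.3f}")
```

A positive value means the group coded 1 tends to have higher scores; the sign depends only on which group is coded 1.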

- `Create_sample_data.R` and `Create_sample_data without exp vars.R`: code to create sample data with and without exposure variables as an example for external cohorts.
- `Covariate_Adj_RiskScores_1.R` and `Covariate_Adj_RiskScores_without_exp_vars_1.R`: code to estimate risk scores with and without exposure variables, respectively.
- `Test_RiskScores_with&without_exp_vars_wd_logit_2.Rmd`: code to test risk scores and generate plots.