The Biomarkers Revolution
Enrique F. Schisterman, PhD
Epidemiology Branch – DESPR – NICHD

My Best Work (Nachtailer A. & Schisterman EF. 2012)

Outline
  Background
  Limit of Detection
  Pooling Biomarkers – Hybrid Design
  Calibration Curves
  Lipid Standardization
  Conclusions

Background
  Biomarker: A specific physical trait used to measure or indicate the effects or progress of a disease or condition
  Newly developed laboratory methods expand the number of biomarkers on a daily basis
    Cost
    Measurement Error
    Causal Link to Disease

Motivation
  Preliminary analysis of salivary concentrations of cortisol from the LIFE study
  Shipment 1 (P = 0.04)
    Site       n     Mean   SD
    Michigan   85    0.40   0.19
    Texas      142   0.57   0.79

Cortisol by Site & Plate

Do Common Laboratory Practices Affect our Estimates of Risk?
  Limit of Detection
  Measurement Error
  Calibration Curves
  Lipid Standardization

Outline
  Background
  Limit of Detection
  Pooling Biomarkers – Hybrid Design
  Calibration Curves
  Lipid Standardization
  Conclusions

Reporting of Biomarker Data
  Reporting threshold is equal to 2.2
  Column 1: Z as observed
  Column 2: values < threshold reported as ‘not detected’ (ND)
  Column 3: values < threshold reported as one half the value of the threshold
    Observed   ND      Half threshold
    3.1        3.1     3.1
    1.5        ND      1.1
    8.4        8.4     8.4
    0.8        ND      1.1
    5.4        5.4     5.4
    3.2        3.2     3.2
    2.0        ND      1.1
    5.8        5.8     5.8
    13.4       13.4    13.4
    2.5        2.5     2.5
    1.9        ND      1.1
    6.1        6.1     6.1

Conventional Determination of the Limit of Detection (LOD)
  Blank series: 10.0, 5.0, 8.1, 7.1, 4.0, 11.3, 12.0, 8.0, 7.7, 7.0
  Mean = 8.02
  Std Dev = 2.53

Example of LOD left-censored data
  [Figure labels: Blanks; “True” biomarker; Better LOD?]

Example of LOD left-censored data
  [Figure labels: Blanks; “True” biomarker; Observed biomarker (samples)]

Why is this a problem?
  Comparisons of PCBs in cases and controls
  [Figure labels: Controls: mean PCB; Cases: mean PCB; Effect size; LOD; Blanks]

Approaches for LOD/missing data
  Simplest approach is substitution
    Under certain circumstances it yields minimal bias
  Conventionally, values below the LOD are usually
    1. replaced by zero, LOD, LOD/2, LOD/√2
    2. excluded
    3. retained
  Model-based approaches
    Likelihood models (Perkins et al., AJE 2007)
    Multiple imputation
  Schisterman EF, Vexler A, Whitcomb BW, Liu A. AJE 2006

Why is this a problem?
  Comparisons of PCBs in cases and controls
  [Figure labels: LOD; Impute what? 0, LOD, LOD/2]

LOD Simulation
  Purpose: To evaluate the effect of the handling of values below the LOD on risk estimates
  Simulated data from a normal and a log-normal distribution and varied:
    Effect size
    Variance of PCBs in the exposure group
    LOD level
    Measurement error mean and variance

Effect of Handling of Values < LOD on %Bias
  *LOD “low” indicates 1.6 SDs below the mean of controls, resulting in imputed values for a small number of data points. LOD “high” indicates 1 SD above the mean of the controls, resulting in imputed values for a large number of both controls and cases.

LOD: Conclusions
  The choice of how to handle values below the LOD can result in a loss of accuracy in estimating risk
  Retaining observed values below the LOD produces the least biased estimates
  Substituting LOD/√2 for values below the LOD produces only modestly biased estimates

Outline
  Background
  Limit of Detection
  Pooling Biomarkers – Hybrid Design
  Calibration Curves
  Lipid Standardization
  Conclusions

What is pooling?
  Physically combining several individual specimens to create a single mixed sample
  Pooled samples are the average of the individual specimens
  [Figure: individual specimens 1, 2, …, p combined into one pool]

Random Sample of Biospecimens
  [Figure: FULL DATA, N = 40 individual biospecimens; RANDOM SAMPLE, randomly select 20 samples]

Pooling Biospecimens
  [Figure: FULL DATA, N = 40 individual biospecimens; POOLED DATA, 40 samples in groups of 2]

Effect of Pooling on Markers Affected by an LOD

Efficiency of the Mean and Variance
  [Figure panels: Variance of Estimated Mean; Variance of Estimated Variance. Each panel compares FULL DATA, POOLED and RANDOM]
  [Panel labels: LOD below Mean; LOD above Mean]

Pooling and Random Sampling
  Pooling advantages
    Reduces the number of assays we need to test
    Efficiently estimates the mean
    Cost-effective
  Random sampling advantages
    Reduces the number of assays we need to test
    Efficiently estimates the variance
    Cost-effective & easy to implement

Hybrid Design: Pooled-Unpooled
  Creates a sample of both pooled and unpooled samples
  Takes advantage of the strengths of both the pooling and random sampling designs
    Reduces number of tests to perform
    Cuts overall costs
    Gains efficiency (by using pooling technique)
  Accounts for different types of measurement error without replications
    Pooling error
    Random measurement error
    LOD

Setup of Hybrid Design
  Unpooled: X1,…,X5
  Pooled: Z1,…,Z15
  Hybrid Sample S: X1,…,X5,Z1,…,Z15
  In General
  Unpooled: X1,…,X[αn]
  Pooled: Z1,…,Z[(1-α)n]
  Hybrid Sample S: X1,…,X[αn],Z1,…,Z[(1-α)n]
  α is the proportion of unpooled samples

Maximum Likelihood Estimators
  [Equations: Random Sampling; Pooling]
  In order to estimate the variance, α cannot be zero.
  Schisterman EF et al, Stat Med 2010

Hybrid Design Example: IL-6
  Measured IL-6 on 40 MI cases and 40 controls
  Biological specimens were randomly pooled in groups of 2, for the cases and controls separately, and remeasured
  We want to evaluate the discriminating ability of this biomarker in terms of AUC

Hybrid Design Example: IL-6
    Design                     n    αx     αy     AÛC     Var(AÛC)
    Empirical                  40   1.00   1.00   0.640   0.0036
    Hybrid design: optimal α   20   0.40   0.35   0.621   0.0049
    Random sample: α = 1       20   1.00   1.00   0.641   0.0071
  The hybrid design reduced Var(AÛC) by 32% compared with taking only a random sample

Summary: Hybrid Design
  The hybrid design is a more efficient way to estimate the mean and variance of a population
  Cost-effective
  Yields an estimate of measurement error without requiring repeated measurements
  Here we focus on normally distributed data, but the design can be applied to other distributions as well

Outline
  Background
  Limit of Detection
  Pooling Biomarkers – Hybrid Design
  Calibration Curves
  Lipid Standardization
  Conclusions

Measurement of G-CSF
  Chemiluminescence assays
    96-well plate
    Antibody against the biomarker of interest
    Set of standards of known biomarker concentration included in each batch
    Set of samples (concentration unknown)
    Light-emitting molecule binds to bound biomarker

Measurement of Cytokines
  Cytokines are not measured directly
  Antibodies against analyte(s) coat wells

Measurement of Cytokines
  Samples added, analyte binds to antibodies
  Unbound proteins are washed away

Measurement of Cytokines
  A ‘tag’ that produces color is added to the assay and binds to the protein–antibody complex
  The intensity of the color is measured

ELISA/Multiplex Layout
  Step 1: prepare antibodies mixture and add to plate
  Step 2: prepare calibrators, add to plate
  Step 3: prepare unknowns, add to plate

Use of Chemiluminescence Assays for Measuring Protein Concentrations
  Use calibration to convert relative measures to the desired unit of concentration
    From optical density in relative fluorescence units (RFU) to concentration in pg/mL
  Current practice is per-assay calibration
    Results in potentially large calibration datasets used only minimally in current practice

Calibrating the Assay: The Standard Curve

Calibrating the Assay: The Standard Curve
  The human G-CSF standard curve is provided only for demonstration
  A standard curve must be generated each time an assay is run, utilizing values from the Standard Value Card included in the Base Kit
  Potential variation in the relation between relative fluorescence and concentration
    Chromophore potentially affected by temperature, humidity, etc.

G-CSF and Miscarriage in the CPP
  Case-control study nested in the Collaborative Perinatal Project study cohort
    462 miscarriage cases
    482 non-miscarriage controls
  Serum biospecimens from early pregnancy, prior to miscarriage onset
  For n = 944, 24 assays were used

This estimate is based on the conventional batch-specific approach

Objective
  Question: Is the current practice of standard batch-specific calibration the best use of information?
  To evaluate the effect of different approaches for calibration models on risk estimation
  To assess bias associated with different approaches

Data from the calibration experiments
  24 batches, each with 7 known concentrations measured in replicate
  Batches varied by
    Shape
    Location
    Agreement between replicates
    Presence of outliers

Batch 1 Calibration Curve – G-CSF
  Standard 1 – undiluted (conc = 6000 pg/mL)
  [Figure axes: measured optical density; fixed ‘known’ concentration]
  *All calibration data (in log10)

Batch 1 Calibration Curve – G-CSF
  Standard 2 – 1/3rd dilution (conc = 2000 pg/mL)
  [Figure axes: measured optical density; fixed ‘known’ concentration]

Batch 2, 3, 6, 9, 10, 21, 22 and 24 Calibration Curves – G-CSF
  [One figure per batch; axes: measured optical density; fixed ‘known’ concentration]

All Calibration Curves Collapsed – G-CSF
  [Figure axes: measured optical density; fixed ‘known’ concentration]

Effect of Calibration Method on Logistic Regression Results

Simulation Study
  Generate dataset with:
    True biomarker concentration
    True effect on risk
    Overall relation between concentration and RFU
    Batch variability
    Occasional outliers
  Simulate calibration experiments to estimate the RFU–concentration relation according to each approach
  Assess bias and variance of estimators from risk models

Simulation Study: The Biomarker
  Biomarker: exp(X), X ~ N(5,1)
  Miscarriage risk: OR = 1.05, 1.15 or 1.65, i.e. β = {0.05, 0.14, 0.50}
  Conc. and OD: OD determined through a single function

Summary of simulation study results
  Comparison of shape, model for β = 0.14
  [Figure: estimated β̂ against the true value 0.14. Calibration approaches: Collapsed, Mixed, Batch-specific. Curve shapes: Linear, Curvilinear. Directions: FORWARDS, REVERSE]
  Whitcomb et al, Epidemiology 2010

Conclusions
  Underestimation of effects due to calibration approach has broad implications
  Conventional batch-specific approaches performed poorly
    Greatest bias to estimates in simulations
    Most prone to loss of data for batches with failure of some calibration points

Outline
  Background
  Limit of Detection
  Pooling Biomarkers – Hybrid Design
  Calibration Curves
  Lipid Standardization
  Conclusions

Background
  PCBs are lipophilic xenobiotics
  Serum measures of exposure have practical advantages over adipose measures, but there is a price:
    Serum PCB concentrations are correlated with serum lipid concentration
    Limited understanding of the true relation of serum and adipose tissue PCB concentrations to serum lipids
  What does this imply for statistical models of PCBs’ health effects?

Background: Models of Serum PCBs & Binary Outcomes
  Wet weights (ignores serum lipids)
    e.g., logit[P(y=1)] = α + β1 PCBs
  Normalizing factor (“standardized” model)
    e.g., logit[P(y=1)] = α + β1 (PCBs/SLm)
  Predictor/potential confounder (SL is a confounder)
    e.g., logit[P(y=1)] = α + β1 PCBs + β2 SL
  Which model best reflects underlying causal assumptions? (or are they all hopeless?)
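The three model forms above can be sketched in a few lines. This is an illustration only, not the code behind the talk's simulations: it uses the log-scale versions of the models that the later "Truth vs Statistical Models" slides report, the function names are hypothetical, and the intercept and the PCB and lipid values are made-up inputs (only β1 = 0.6 is borrowed from the simulation settings). It shows one algebraic point: on the log scale, the standardized model is the adjusted model with β2 forced to equal −β1.

```python
import math


def expit(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_unadjusted(pcb, alpha, beta1):
    # logit[P(y=1)] = alpha + beta1 * ln(PCB); serum lipids are ignored
    return expit(alpha + beta1 * math.log(pcb))


def p_standardized(pcb, sl, alpha, beta1):
    # logit[P(y=1)] = alpha + beta1 * ln(PCB / SL); lipids act as a normalizing factor
    return expit(alpha + beta1 * math.log(pcb / sl))


def p_adjusted(pcb, sl, alpha, beta1, beta2):
    # logit[P(y=1)] = alpha + beta1 * ln(PCB) + beta2 * ln(SL); lipids enter as a covariate
    return expit(alpha + beta1 * math.log(pcb) + beta2 * math.log(sl))


# Hypothetical inputs chosen only for illustration.
alpha, beta1 = -1.0, 0.6
pcb, sl = 2.0, 5.0

standardized = p_standardized(pcb, sl, alpha, beta1)
constrained_adjusted = p_adjusted(pcb, sl, alpha, beta1, -beta1)
print(standardized, constrained_adjusted)  # equal up to floating-point rounding
```

Because ln(PCB/SL) = ln(PCB) − ln(SL), standardizing imposes a fixed lipid coefficient, whereas the adjusted model estimates β2 from the data and reduces to the unadjusted model when β2 = 0.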
Study Aim and Methods
  To evaluate the impact of these different ways of using serum lipids in models on risk estimates, we simulated data from a log-normal distribution to determine bias
  We varied:
    The truth (true underlying causal relations)
    The statistical model used for risk estimates
    The relation between PCBs and serum lipids
    Measurement error in serum lipids

The Truth(s), in DAGs
  [Figure: eight DAGs, A through H. A, B, C, D and H use the nodes S-PCB, SL and Y; E and F add an ancestor A; G uses S-PCB/SL, Adipose-PCB and Y]
  Polychlorinated biphenyl (PCB), serum lipids (SL), outcome (Y), ancestor (A)

Each DAG Implied Different Simulated Data
  Allowed the causal structure to dictate how the data were generated
  Assigned lognormal distributions for PCB and serum lipids
  Assumed:
    Outcome Y is binomial, with Pr(Y = 1 | PCB, SL)
    βPCB (relation of ln(PCBs) to logit[P(Y=1)]) = 0.6
    γ (relation of ln(PCB) to ln(SL)) = 0.3
    βSL (relation of ln(SL) to logit[P(Y=1)]) = 0.34
    No interactions
    Linear (or log-linear) relations

Truth vs Statistical Models
  Simple cause and effect: PCBs cause Y. SL is unrelated
  [DAG B; nodes: S-PCB, SL, Y]
    Model          logit[P(y=1)] = …              % bias on βPCB
    Unadjusted     α + β1 ln(PCBs)                -0.8
    Standardized   α + β1 ln(PCBs/SLm)            -75.9
    Adjusted       α + β1 ln(PCBs) + β2 ln(SL)    -0.7
  True βPCB = 0.6
  Measurement error in SL ~ N(0, σe² = 1)
  γ (strength of assoc of ln(PCB) with ln(SL)) = 0
  500 reps, n = 1000

Standardization Bias: SL not a Causal Mediator
  [Figure: bias in βPCB for DAGs A, B, D and F. Legend: standardized model with γ = 2.0, 1.0, 0.3 and 0.01; all other models, all values of γ]

-100% bias!
Truth vs Statistical Models
  Confounding: A causes PCBs and SL, and both cause Y
  [DAG E; nodes: A, S-PCB, SL, Y]
    Model          logit[P(y=1)] = …              % bias on βPCB
    Unadjusted     α + β1 ln(PCBs)                24.0
    Standardized   α + β1 ln(PCBs/SLm)            -128.8
    Adjusted       α + β1 ln(PCBs) + β2 ln(SL)    0.1
  True βPCB = 0.6
  Measurement error in SL ~ N(0, σe² = 1)
  γ (strength of assoc of ln(PCB) with ln(SL)) = 0.3
  True βSL = 0.34
  500 reps, n = 1000

Standardization Bias: SL, Confounder and Intermediate
  [Figure: bias in βPCB for DAGs C, H and E under the unadjusted, standardized and adjusted models, with γ = 0.01, 0.3, 1.0 and 2.0; annotation: -100% bias!]

Truth vs Statistical Models
  Serum PCB per SL as an ascending proxy for adipose PCB
  PCB in adipose tissue causes PCB in serum per SL, and causes Y
  [DAG G; nodes: PCBS/SL, Adipose PCB, Y; effect labelled βPCB-S]

Truth vs Statistical Models
  Serum PCB per SL as an Ascending Proxy for Adipose PCB
  [DAG G; nodes: S-PCB, A-PCB, Y; effect labelled βPCB-S]
    Model          logit[P(y=1)] = …              % bias on βPCB
    Unadjusted     α + β1 ln(PCBs)                -86.3
    Standardized   α + β1 ln(PCBs/SLm)            -1.0
    Adjusted       α + β1 ln(PCBs) + β2 ln(SL)    -1.0
  True βPCB = 0.6
  Measurement error in SL ~ N(0, σe² = 1)
  γ (strength of assoc of ln(PCB) with ln(SL)) = 0.3
  500 reps, n = 1000

Standardization Bias: Serum PCB per SL as an Ascending Proxy
  [Figure: bias in βPCB for DAG G (S-PCB/SL, A-PCB, Y). Series: Unadjusted; Adjusted, standardized. Annotation: -60% bias]

Summary of Model-DAG Agreement: % Bias in βPCB-S
  [DAGs A through H as on the earlier slide. A, B, C, D and H: S-PCB, SL, Y; E and F: with ancestor A; G: S-PCB/SL, Adipose-PCB, Y]
    DAG   Unadjusted   Standardized   Adjusted
    A     1.2          -51.3          1.8
    B     -0.8         -75.9          -0.7
    C     -15.4        -351.3         -99.4
    D     0.4          -79.8          0.8
    E     24.0         -128.8         0.1
    F     -0.4         -85.0          -0.1
    G     -86.3        -1.0           -1.0
    H     -11.2        -128.3         -25.4

Conclusions
  For the 8 underlying “truths”, represented by causal DAGs, the statistical models produced estimates with bias ranging from -351% to 24%
  The standardized model produced large biases for most of the evaluated DAGs
  The adjusted model produced small biases even for the DAG for which standardization is optimal

Outline
  Background
  Limit of Detection
  Pooling Biomarkers – Hybrid Design
  Calibration Curves
  Lipid Standardization
  Conclusions

Do Common Laboratory Practices Affect our Estimates of Risk?
  Limit of Detection
    Request the observed values
    Design away using hybrid methods and overcome cost, LOD and ME
  Calibration Curves
    Study design should include a calibration curve plan
  Standardization
    Don’t do it!
  YES!

Take Home Message
  Treating the biomarker measurement process as a black box leads to biased estimates of effects
  Study design overcomes most of the biases
  Epidemiologists & statisticians need to be more acquainted with every step of the biomarker measurement process
  All the biomarker measurement issues discussed in this talk informed the design and analysis of the BioCycle Study and the EAGeR Trial

Acknowledgments
  Pavillion Long Range Initiative of the American Chemistry Council
  From NICHD: Drs. Perkins N, Whitcomb B, Mumford S, Albert P, Liu A, Louis G.
  From Johns Hopkins: Dr. Louis T.
  From the University at Buffalo: Drs. Browne R & Vexler A.
  From the University of Florida: Dr. Chegini N.

Questions?
Thank you!
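
Backup sketch: the recommendation to request the observed values below the LOD can be made concrete with the numbers already shown in the talk. The snippet below is a minimal illustration, not the simulation behind the reported bias results: it reuses the twelve Z values, the reporting threshold of 2.2 and the blank series from the earlier slides, treats the reporting threshold as the LOD (an assumption), and uses a hypothetical helper named `substitute_below_lod`. It compares the sample mean after each substitution rule with the mean of the retained observed values.

```python
import math
import statistics


def substitute_below_lod(values, lod, fill):
    """Return a copy of `values` with every entry below `lod` replaced by `fill`."""
    return [fill if v < lod else v for v in values]


# Values and reporting threshold from the "Reporting of Biomarker Data" slide.
observed = [3.1, 1.5, 8.4, 0.8, 5.4, 3.2, 2.0, 5.8, 13.4, 2.5, 1.9, 6.1]
threshold = 2.2  # assumption: the reporting threshold plays the role of the LOD

# Blank series from the "Conventional Determination of the LOD" slide.
blanks = [10.0, 5.0, 8.1, 7.1, 4.0, 11.3, 12.0, 8.0, 7.7, 7.0]
blank_mean = statistics.mean(blanks)
blank_sd = statistics.stdev(blanks)  # sample SD (n - 1 denominator)

# Substitution rules listed on the "Approaches for LOD/missing data" slide.
fills = {
    "zero": 0.0,
    "LOD/2": threshold / 2,
    "LOD/sqrt(2)": threshold / math.sqrt(2),
    "LOD": threshold,
}

# Shift in the sample mean relative to retaining the observed values.
mean_retained = statistics.mean(observed)
shift = {
    name: statistics.mean(substitute_below_lod(observed, threshold, fill)) - mean_retained
    for name, fill in fills.items()
}

print(f"blank mean = {blank_mean:.2f}, blank SD = {blank_sd:.2f}")
for name, delta in shift.items():
    print(f"{name:>12}: shift in sample mean = {delta:+.3f}")
```

The blank summary reproduces the slide's mean of 8.02 and SD of 2.53, and the LOD/2 rule reproduces the slide's third reporting column. The shifts describe this one small sample only; how such shifts carry over to bias in risk estimates is what the talk's simulation addresses.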