
An attempt at inferring health risks from vitamin D levels – May 2015

Tightrope Walking: Using Predictors of 25(OH)D Concentration Based on Multivariable Linear Regression to Infer Associations with Health Risks

Ning Ding1,2 (ning.ding at canberra.edu.au), Keith Dear3, Shuyu Guo2, Fan Xiang2, Robyn Lucas2,4
1 Faculty of Health, University of Canberra, Canberra, ACT, 2601, Australia,
2 National Centre for Epidemiology and Population Health, Research School of Population Health, The Australian National University, Canberra, ACT, 2600, Australia,
3 Duke Global Health Institute, Duke Kunshan University, Kunshan, Jiangsu, 215316, China,
4 Telethon Kids Institute, University of Western Australia, Perth, WA, 6009, Australia


The debate on the causal association between vitamin D status, measured as serum concentration of 25-hydroxyvitamin D (25[OH]D), and various health outcomes warrants investigation in large-scale health surveys. Measuring the 25(OH)D concentration for each participant is not always feasible, because of the logistics of blood collection and the costs of vitamin D testing. To address this problem, past research has used predicted 25(OH)D concentration, based on multivariable linear regression, as a proxy for unmeasured vitamin D status. We restate this approach in a mathematical framework, to deduce its possible pitfalls. Monte Carlo simulation and real data from the National Health and Nutrition Examination Survey 2005-06 are used to confirm the deductions. The results indicate that variables that are used in the prediction model (for 25[OH]D concentration) but not in the model for the health outcome (called instrumental variables) play an essential role in the identification of an effect. Such variables should be unrelated to the health outcome other than through vitamin D; otherwise the estimate of interest will be biased. The approach of predicted 25(OH)D concentration derived from multivariable linear regression may be valid. However, careful verification that the instrumental variables are unrelated to the health


VitaminDWiki is exploring the use of Neural Networks to accomplish this task
See also VitaminDWiki
The sun provides more health benefit than vitamin D – Dr. Lucas podcast – May 2015 (Dr. Lucas is a co-author of this study)
Search VitaminDWiki for ("R LUCAS" OR "ROBYN LUCAS") 30 items as of May 2015


Discussion (copied from PDF)

The results indicate that the prediction of 25(OH)D concentration based on multivariable linear regression may be correct, but care needs to be taken when applying this methodology. Even if only one of the instrumental variables used is invalid, the estimates of the association between 25(OH)D concentration and the health outcome will be unreliable. It should be noted that the second requirement of a valid instrumental variable, that it should not be a risk factor for the disease, cannot be tested mathematically or statistically and can only be judged according to biological findings from past research. Thus, the reasons for the choice of instrumental variables should be discussed, and the lack of correlation with the health outcome confirmed. Previous studies using this methodology have not adequately considered the potential biases that could occur. For example, several papers used variables such as physical activity, BMI, smoking status, alcohol intake and race as instrumental variables, despite substantial evidence that these factors are strongly associated with many diseases, including the outcomes of interest [6,15-18]. Vitamin D intake has also been used as an instrumental variable [8,19], but may also be associated with disease risk as a marker of a healthier lifestyle and thus lower disease risk [20].
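The effect of a single invalid instrument can be illustrated with a small Monte Carlo sketch. This is not the simulation from the paper: the one-instrument setup, the function names `ols_slope` and `simulate`, and all parameter values (`alpha`, `beta`, `gamma`, sample size) are assumptions chosen only for illustration. Here `gamma` is the direct effect of the instrument on the outcome, so `gamma = 0` corresponds to a valid instrument.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(gamma, beta=-0.5, alpha=2.0, n=20000, seed=1):
    """Predicted-score estimate of beta (hypothetical parameter values).

    alpha: effect of the instrument z on 25(OH)D (d)
    beta:  true effect of d on the outcome y
    gamma: direct effect of z on y; 0 means z is a valid instrument
    """
    rng = random.Random(seed)

    # Sample 1: d is measured, so the prediction model d ~ z can be fitted.
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    d1 = [alpha * z + rng.gauss(0, 1) for z in z1]
    alpha_hat = ols_slope(z1, d1)

    # Sample 2: d exists but is not observed; only z and y are available.
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    d2 = [alpha * z + rng.gauss(0, 1) for z in z2]
    y2 = [beta * d + gamma * z + rng.gauss(0, 1) for d, z in zip(d2, z2)]

    # Predicted score (the intercept is omitted; it does not affect the slope).
    d_hat = [alpha_hat * z for z in z2]
    return ols_slope(d_hat, y2)


print("valid instrument:  ", round(simulate(gamma=0.0), 2))
print("invalid instrument:", round(simulate(gamma=0.4), 2))
```

Under these assumptions the estimate stays close to the true `beta` when `gamma = 0` and shifts by roughly `gamma / alpha` when the instrument also affects the outcome directly.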

In some studies, stratification by a potential confounder or meta-analysis of findings has been used to indicate a greater likelihood of a “real” finding. However, a stratified analysis cannot demonstrate that the results are “correct” or robust. For example, where BMI is used in the predictive model for 25(OH)D score, the effect estimate of 25(OH)D score on the health outcome, e.g. digestive cancer, may be compared across strata of BMI. Higher BMI is a known risk factor for digestive cancer and is therefore an invalid instrumental variable. In this case, if the effect estimates from the two strata are the same or similar, then the conclusion may be that the association between 25(OH)D score and the health outcome is the same for both strata, or, alternatively, that the bias caused by the invalid instrumental variable plus the real association is the same for both strata. It is not possible to distinguish between these two conclusions. Similarly, meta-analysis does not help, although it is useful for estimating summary effects across a number of previous studies, particularly when the sample size in any single study is insufficient. If all of the individual studies use invalid instrumental variables, all of the effect estimates are biased, and the weighted average of these biased estimates will be similarly biased.
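The meta-analysis point can be made concrete with a minimal sketch of fixed-effect (inverse-variance) pooling. The true effect, the per-study biases and the standard errors below are invented for illustration and do not come from any of the cited studies; `pooled_estimate` is a hypothetical helper name.

```python
def pooled_estimate(estimates, std_errors):
    """Fixed-effect (inverse-variance weighted) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, (1.0 / total) ** 0.5


true_beta = -0.5                 # hypothetical true effect
biases = [0.20, 0.15, 0.25]      # hypothetical biases from invalid instruments
std_errors = [0.10, 0.05, 0.08]  # hypothetical standard errors

# Use each study's expected estimate (truth + bias) so that the result
# isolates the effect of bias from sampling noise.
expected = [true_beta + b for b in biases]
pooled, pooled_se = pooled_estimate(expected, std_errors)

print("pooled estimate:", round(pooled, 3))
print("pooled bias:    ", round(pooled - true_beta, 3))
print("pooled SE:      ", round(pooled_se, 3))
```

Pooling shrinks the standard error below that of any single study, but the pooled bias is simply a weighted average of the individual biases, so a more precise estimate is not a less biased one.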

Most recently, variants of genes that affect 25(OH)D synthesis or substrate availability (e.g. CYP2R1, GC and DHCR7) have been used as instrumental variables, either individually or through creation of a genetic score that acts as a proxy for long-term 25(OH)D levels [21]. This method does not predict 25(OH)D levels per se, but may be more disease-relevant than a single 25(OH)D measurement, for which intraclass correlation coefficients range from 0.42 to 0.72 between 2 direct measures taken 2-14 years apart [6,22-24]. The substrate from which vitamin D is synthesised is 7-dehydrocholesterol (7-DHC), located in epidermal cells of the skin. The DHCR7 gene encodes the enzyme 7-DHC reductase, and both 7-DHC and 7-DHC reductase are part of the cholesterol biosynthesis pathway. Using a genetic synthesis score, a recent meta-analysis showed a modest association between higher genetically instrumented 25(OH)D concentration and lower systolic blood pressure. A valid instrument has an effect on the outcome only through the factor that it is a proxy for, in this case 25(OH)D concentration. In the recent study, the synthesis score was highly correlated with measured 25(OH)D concentration, but also had an overall association with higher serum total cholesterol (p = 0.04), suggesting a possible separate pathway of effect of this genetic score on higher systolic blood pressure. Thus genetic 25(OH)D scores should also be used as instrumental variables with caution, given the pleiotropic effects of some vitamin D pathway genes, e.g. GC and its association with lipid metabolism, inflammation and metabolic feedback loops.

In practice, it is common to generate a dichotomous variable based on the predicted score to categorize participants as suffering from vitamin D deficiency or not, and this further complicates the situation. In this situation, the bias caused by use of an invalid instrumental variable will be further distorted by the distribution of the predicted 25(OH)D score. The direction of the bias cannot be determined theoretically.

The method discussed here is similar to the method of Two-Stage Least Squares (2SLS), which is widely used to estimate causal relationships in economics [25,26]. One difference between the two methods is that here 25(OH)D concentration is available in the main data for Stage 1 but not for Stage 2, while 2SLS usually uses the same dataset in both stages. The 2SLS method aims to remove the bias caused by omitted confounders; an instrumental variable can be used only if it: 1) has a strong association with the variable (exposure) of interest; and 2) is not an independent risk factor for the outcome. These two criteria also apply to the methodology using a predicted 25(OH)D score.
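The two criteria can be read off a simplified worked derivation. The notation below is an assumption made for illustration, not the paper's own framework: a single instrument Z, 25(OH)D concentration D, outcome Y, a linear outcome model, and error terms ε and η that are uncorrelated with Z (they may be correlated with each other, which is how omitted confounders enter).

```latex
\begin{align*}
\text{Stage 1 (25(OH)D measured):}\quad
  & D = \alpha_0 + \alpha Z + \varepsilon,
  \qquad \hat{D} = \hat{\alpha}_0 + \hat{\alpha} Z \\
\text{Stage 2 (25(OH)D not measured):}\quad
  & \text{regress } Y \text{ on } \hat{D} \\
\text{Assumed true outcome model:}\quad
  & Y = \beta_0 + \beta D + \gamma Z + \eta \\[4pt]
\hat{\beta} \;\longrightarrow\;
  \frac{\operatorname{Cov}(Y, \alpha Z)}{\operatorname{Var}(\alpha Z)}
  &= \frac{\alpha(\beta\alpha + \gamma)\operatorname{Var}(Z)}
          {\alpha^{2}\operatorname{Var}(Z)}
   = \beta + \frac{\gamma}{\alpha}
\end{align*}
```

In this sketch the estimate converges to β only when γ = 0, which is criterion 2 (the instrument is not an independent risk factor for the outcome). A small α, meaning a weak association with 25(OH)D, inflates whatever bias a non-zero γ introduces, which is one reason criterion 1 matters.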

Although applying the predicted 25(OH)D score method to identify the association between 25(OH)D concentration and health outcomes is not straightforward, there are clinical applications for predicted data. Recently there have been large increases in vitamin D testing in several countries due to concern about possible widespread vitamin D deficiency and purported links to a wide range of health risks [27], with considerable costs to healthcare systems [28]. One solution to reduce unnecessary tests is to predict those who are at high risk of vitamin D deficiency using available data, and test only these people. However, when predicted levels are used in large-scale epidemiological studies seeking to clarify links between vitamin D status and disease risks, there is considerable risk of bias in the estimates of effect arising from incorrect specification of an instrumental variable. This must be fully considered and discussed in studies using this methodology.


  1. Holick MF (2007) Vitamin D deficiency. N Engl J Med 357: 266-281. doi: 10.1056/NEJMra070553 PMID: 17634462
  2. Schmidt-Gayk H, Bouillon R, Roth HJ (1997) Measurement of vitamin D and its metabolites (calcidiol and calcitriol) and their clinical significance. Scand J Clin Lab Invest Suppl 57: 35-45.
  3. Garland CF, Garland FC (1980) Do Sunlight and Vitamin D Reduce the Likelihood of Colon Cancer? Int J Epidemiol 9: 227-231. doi: 10.1093/ije/9.3.227 PMID: 7440046
  4. Garland FC, Garland CF, Gorham ED, Young JF (1990) Geographic variation in breast cancer mortality in the United States: A hypothesis involving exposure to solar radiation. Prev Med 19: 614-622. http://dx.doi.org/10.1016/0091-7435(90)90058-R. PMID: 2263572
  5. Grant WB, Mohr SB (2009) Ecological studies of ultraviolet B, vitamin D and cancer since 2000. Ann Epidemiol 19: 446-454. http://dx.doi.org/10.1016/j.annepidem.2008.12.014. doi: 10.1016/j.annepidem.2008.12.014 PMID: 19269856
  6. Giovannucci E, Liu Y, Rimm EB, Hollis BW, Fuchs CS, Stampfer MJ, et al. (2006) Prospective study of predictors of vitamin D status and cancer incidence and mortality in men. J Natl Cancer Inst 98: 451-459. 98/7/451 [pii], doi: 10.1093/jnci/djj101 PMID: 16595781
  7. Bertrand KA, Giovannucci E, Liu Y, Malspeis S, Eliassen AH, Wu K, et al. (2012) Determinants of plasma 25-hydroxyvitamin D and development of prediction models in three US cohorts. Br J Nutr 108: 1889-1896. doi: 10.1017/S0007114511007409 PMID: 22264926
  8. Joh HK, Giovannucci EL, Bertrand KA, Lim S, Cho E (2013) Predicted plasma 25-hydroxyvitamin D and risk of renal cell cancer. J Natl Cancer Inst 105: 726-732. doi: 10.1093/jnci/djt082, djt082 [pii]. PMID: 23568327
  9. Ng K, Wolpin BM, Meyerhardt JA, Wu K, Chan AT, Hollis BW, et al. (2009) Prospective study of predictors of vitamin D status and survival in patients with colorectal cancer. Br J Cancer 101: 916-923. doi: 10.1038/sj.bjc.6605262, 6605262 [pii]. PMID: 19690551
  10. Greene W (2002) Econometric Analysis. Englewood Cliffs, NJ: Prentice-Hall.
  11. Nelder J, Wedderburn R (1972) Generalized linear models. J R Stat Soc Ser A Stat Soc 135: 370-384.
  12. McCullagh P, Nelder JA (1989) Generalized linear models. London: Chapman and Hall.
  13. Centers for Disease Control and Prevention (2014) National Center for Health Statistics National Health and Nutrition Examination Survey data. Hyattsville, MD: US Department of Health and Human Services, Centers for Disease Control and Prevention.
  14. Bhaskaran K, Smeeth L (2014) What is the difference between missing completely at random and missing at random? Int J Epidemiol: 1-4. doi: 10.1093/ije/dyu080
  15. Liu E, Meigs JB, Pittas AG, Economos CD, McKeown NM, Booth SL, et al. (2010) Predicted 25-hydroxyvitamin D score and incident type 2 diabetes in the Framingham Offspring Study. Am J Clin Nutr 91: 1627-1633. doi: 10.3945/ajcn.2009.28441, ajcn.2009.28441 [pii]. PMID: 20392893
  16. Liu JJ, Bertrand KA, Karageorgi S, Giovannucci E, Hankinson SE, Rosner B, et al. (2013) Prospective analysis of vitamin D and endometrial cancer risk. Ann Oncol 24: 687-692. doi: 10.1093/annonc/mds509 PMID: 23136228
  17. Harris HR, Chavarro JE, Malspeis S, Willett WC, Missmer SA (2013) Dairy-food, calcium, magnesium, and vitamin D intake and endometriosis: a prospective cohort study. Am J Epidemiol 177: 420-430. doi: 10.1093/aje/kws247, kws247 [pii]. PMID: 23380045
  18. Ananthakrishnan AN, Khalili H, Higuchi LM, Bao Y, Korzenik JR, Giovannucci EL, et al. (2012) Higher predicted vitamin D status is associated with reduced risk of Crohn's disease. Gastroenterology 142: 482-489. doi: 10.1053/j.gastro.2011.11.040, S0016-5085(11)01638-6 [pii]. PMID: 22155183
  19. Jimenez M, Giovannucci E, Krall Kaye E, Joshipura KJ, Dietrich T (2014) Predicted vitamin D status and incidence of tooth loss and periodontitis. Public Health Nutr 17: 844-852. doi: 10.1017/S1368980013000177 PMID: 23469936
  20. Hoggatt KJ (2003) Commentary: Vitamin supplement use and confounding by lifestyle. Int J Epidemiol 32: 553-555. PMID: 12913028
  21. Vimaleswaran KS, Cavadino A, Berry DJ, Jorde R, Dieffenbach AK, Lu C, et al. (2014) Association of vitamin D status with arterial blood pressure and hypertension risk: a mendelian randomisation study. Lancet Diabetes Endocrinol 2: 719-729. S2213-8587(14)70113-5 [pii], doi: 10.1016/S2213-8587(14)70113-5 PMID: 24974252
  22. Hofmann JN, Yu K, Horst RL, Hayes RB, Purdue MP (2010) Long-term Variation in Serum 25-Hydroxyvitamin D Concentration among Participants in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Cancer Epidemiol Biomarkers Prev 19: 927-931. doi: 10.1158/1055-9965.EPI-09-1121 PMID: 20332255
  23. Jorde R, Sneve M, Hutchinson M, Emaus N, Figenschau Y, Grimnes G (2010) Tracking of Serum 25-Hydroxyvitamin D Levels During 14 Years in a Population-based Study and During 12 Months in an Intervention Study. Am J Epidemiol 171: 903-908. doi: 10.1093/aje/kwq005 PMID: 20219763
  24. Kotsopoulos J, Tworoger SS, Campos H, Chung F-L, Clevenger CV, Franke AA, et al. (2010) Reproducibility of plasma and urine biomarkers among premenopausal and postmenopausal women from the Nurses' Health Studies. Cancer Epidemiol Biomarkers Prev 19: 938-946. doi: 10.1158/1055-9965.EPI-09-1318 PMID: 20332276
  25. Imbens GW, Angrist JD (1994) Identification and estimation of local average treatment effects. Econometrica 62: 467-475.
  26. Heckman JJ (2008) Econometric causality. Int Stat Rev 76: 1-27. doi: 10.1111/j.1751-5823.2007.00024.x
  27. Sattar N, Welsh P, Panarelli M, Forouhi NG (2012) Increasing requests for vitamin D measurement: costly, confusing, and without credibility. Lancet 379: 95-96. doi: 10.1016/S0140-6736(11)61816-3, S0140-6736(11)61816-3 [pii]. PMID: 22243814
  28. Bilinski KL, Boyages SC (2012) The rising cost of vitamin D testing in Australia: time to establish guidelines for testing. Med J Aust 197: 90. doi: 10.5694/mja12.10561 [pii]. PMID: 22794050
  29. Gilbert R, Martin RM, Fraser WD, Lewis S, Donovan J, Hamdy F, et al. (2012) Predictors of 25-hydroxyvitamin D and its association with risk factors for prostate cancer: evidence from the prostate testing for cancer and treatment study. Cancer Causes Control 23: 575-588. doi: 10.1007/s10552-012-9919-8 PMID: 22382867
