
Engineering, 2021, Vol. 7, Issue 12. doi: 10.1016/j.eng.2020.05.028

Expanding the Scope of Multivariate Regression Approaches in Cross-Omics Research

a CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
b Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
c University of Chinese Academy of Sciences, Beijing 100049, China
d Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, USA

Received: 2019-10-08 Revised: 2020-03-14 Accepted: 2020-05-25 Published: 2021-05-19


Abstract

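Recent technological advances have produced a rapid increase in high-dimensional data, and with it a growing demand for appropriate and efficient multivariate regression methods. Many traditional multivariate approaches, such as principal component analysis (PCA), are widely used in fields including investment analysis, image recognition, and the analysis of population genetic structure. However, these common methods have two limitations: they ignore the correlations among responses, and they select variables inefficiently. In this paper, we therefore introduce reduced-rank regression (RRR) and two of its extensions, sparse reduced-rank regression (SRRR) and subspace assisted regression with row sparsity (SARRS). These methods are expected to address both limitations and thereby improve the interpretability of regression models. We evaluate their performance through simulation studies and compare them with several other variable selection methods. For different application scenarios, we also recommend which method to choose based on prediction ability and variable selection accuracy. Finally, to demonstrate the practical value of these methods in microbiome research, we apply the selected methods to real population-level microbiome data, and the results confirm their effectiveness. These extended methods offer valuable guidance for future omics research, particularly multivariate regression studies, and lay a foundation for new discoveries in microbiomics and related fields.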
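As a minimal sketch of the basic idea behind reduced-rank regression (not the authors' implementation), the rank-constrained least-squares problem, minimizing ||Y - XB||_F^2 subject to rank(B) <= r with an identity weight matrix, can be solved by projecting the ordinary least squares (OLS) coefficients onto the leading right singular vectors of the OLS fitted values. The function name `reduced_rank_regression` and the simulated data below are hypothetical. The sketch assumes centered data with more samples than predictors (n > p), and it omits the sparsity penalties that SRRR and SARRS add on top of the rank constraint.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Rank-constrained least squares (hypothetical helper, identity weighting).

    Minimizes ||Y - X B||_F^2 subject to rank(B) <= rank.
    Assumes X (n x p) has full column rank and that X and Y are centered,
    so no intercept is fitted. Returns B with shape (p, q).
    """
    # OLS coefficients: one column per response, fitted independently
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Fitted values lie in the column space of X; their top right singular
    # vectors span the best rank-r subspace of the response space
    fitted = X @ B_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T  # q x rank, orthonormal columns
    # Project the OLS coefficients onto that shared low-dimensional subspace
    return B_ols @ V_r @ V_r.T


# Usage on simulated data whose true coefficient matrix has rank 2
rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 8, 2
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))
B_hat = reduced_rank_regression(X, Y, rank=r)
print(np.linalg.matrix_rank(B_hat))
```

Because all responses share the same r-dimensional set of latent predictors, the rank constraint pools information across correlated responses instead of fitting each one separately, which is one way to see how RRR addresses the first limitation named in the abstract. Setting `rank` equal to the number of responses recovers OLS.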


