ScienceNet (科学网)

Tag: LEfSe

Related blog posts

Installing LEfSe on Linux
Popularity 1 xbinbzy 2016-8-17 17:16
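This post records the process of installing LEfSe on Linux.

After downloading LEfSe, the directory contains a requirements.txt file that lists the required packages. The R packages were installed online with install.packages(). Installing mvtnorm reported an error. A Google search turned up http://stackoverflow.com/questions/335928/ld-cannot-find-an-existing-library . Following that approach, I ran locate libgfortran.so and found libgfortran.so.3 and libgfortran.so.3.0.0, so I created a symbolic link named libgfortran.so pointing to libgfortran.so.3 and reran install.packages("mvtnorm"), which then installed successfully.

The Python packages were installed with pip install * (one package at a time). rpy2 reported an error. I used locate to find the corresponding *.h files, created symbolic links to them under /usr/include, and ran pip install rpy2 again, which succeeded.

Testing with run_lefse.py then reported another error. A Google search turned up http://stackoverflow.com/questions/30968865/could-not-install-rpy2-correctly . Assuming the problem lay with R and Python, I reinstalled both, but the same error appeared at run time. I then suspected that rpy2 had not actually installed correctly, so I downloaded the rpy2 source and installed it locally with python setup.py install. The error message was: "R was not built as a library".

A further Google search turned up http://stackoverflow.com/questions/16204246/installing-python-module-rpy2-after-installing-enthought-canopy . I guessed that the error arose because R had not been built as a shared library when it was installed, so I rebuilt R with the extra option "--enable-R-shlib", that is, ./configure --enable-R-shlib.

After that, rpy2 installed normally, and so did the packages LEfSe requires. The test results are as follows.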
Category: python | 11141 views | 4 comments
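The two fixes in the installation post above (the unversioned libgfortran.so link and the check for an R build with --enable-R-shlib) can be sketched in Python. This is a minimal sketch under stated assumptions, not part of LEfSe or rpy2: the function names link_unversioned and r_shared_library are hypothetical, and the location lib/libR.so under the R home directory is assumed to be the usual layout of a Linux build configured with --enable-R-shlib.

```python
import os


def link_unversioned(versioned_lib):
    """Create an unversioned symlink next to a versioned shared library.

    Example: /usr/lib/libgfortran.so.3 -> /usr/lib/libgfortran.so
    Returns the path of the link (hypothetical helper, for illustration).
    """
    directory, filename = os.path.split(versioned_lib)
    # "libgfortran.so.3.0.0" -> "libgfortran.so"
    unversioned = filename.split(".so")[0] + ".so"
    link = os.path.join(directory, unversioned)
    if not os.path.lexists(link):
        os.symlink(versioned_lib, link)
    return link


def r_shared_library(r_home):
    """Return the path to libR.so under r_home, or None if it is missing.

    Assumption: a Linux R build configured with --enable-R-shlib places
    the shared library at <r_home>/lib/libR.so.
    """
    candidate = os.path.join(r_home, "lib", "libR.so")
    return candidate if os.path.isfile(candidate) else None
```

The R home directory can be printed with the command R RHOME. If r_shared_library returns None for that directory, rebuilding R with ./configure --enable-R-shlib, as described in the post, is the likely remedy. Creating links in system library directories normally requires root privileges.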
Introduction to LEfSe
Popularity 1 xbinbzy 2016-7-5 15:26
Reference: Metagenomic biomarker discovery and explanation. Journal: Genome Biology, 2011.

The purpose of this tool is metagenomic biomarker discovery. Its principle is illustrated in a figure from the paper (figure not reproduced). The method has three main steps. The input data consist of m samples, each with n features.

The figure illustrates in detail the format of the input (a matrix with n rows and m columns) and the three steps performed by the computational tool: the KW rank sum test on classes, the pairwise Wilcoxon test between subclasses of different classes, and the LDA on the relevant features.

In detail:

Each of the n features is represented with a positive-valued vector containing its abundances in the m samples, and each sample is associated with values describing its class and, optionally, subclass and/or originating subject. (Each sample has n features and also carries its class and subclass information.)

The factorial KW rank sum test is applied to each feature with respect to the class factor; the subclass and subject information are used as stratifying subgroups when present. Features that, according to the KW rank sum test, do not violate the null hypothesis of identical value distribution among classes (with default P-value, α = 0.05) are not further analyzed. (The KW rank sum test is one kind of rank-sum test. It is applied to each feature to compare the classes; features with a p-value above 0.05 are filtered out, and features with a p-value below 0.05 are kept for further analysis.)

The pairwise Wilcoxon test is applied to retained features belonging to subclasses of different classes. For each feature, the pairwise Wilcoxon test is not satisfied if at least one comparison between subclasses has a P-value higher than the chosen α or if the sign of variation is not equal among all comparisons. For example, if a feature appears in samples from two classes with three subclasses each, all nine comparisons between subclasses in different classes must violate the null hypothesis, and all signs of the differences between medians must be consistent. The features that pass the pairwise Wilcoxon test are considered successful biomarkers. (For the features retained after the first step, the Wilcoxon rank-sum test is applied according to the samples' subclass labels to test whether each feature differs between subclasses. Suppose class A has 3 subclasses and class B has 3 subclasses: each subclass of A must be compared with each of the 3 subclasses of B. After these 9 tests, the features that show a difference in all 9 are selected and designated as biomarkers.)

An LDA model is finally built with the class as dependent variable and the remaining feature values, subclass, and subject values as independent variables. This model is used to estimate their effect sizes, which are obtained by averaging the differences between class means (using unmodified feature values) with the differences between class means along the first linear discriminant axis, which equally weights features' variability and discriminatory power. (Finally, LDA, short for linear discriminant analysis, is applied. The class is the dependent variable, and the retained features, the subclasses, and the samples (subjects) are the independent variables. A linear discriminant model is built in this way, the differences between class means before and after the model are used to compute a value, and that value is log-transformed to give the LDA score.)

For background material on LDA, see: http://blog.csdn.net/sunmenggmail/article/details/8071502

Sample size needs attention when using this method. When few samples are available, non-parametric tests like the Wilcoxon have reduced power to detect differences. This can affect LEfSe when subclasses are very small, preventing the overall test from even rejecting the null hypothesis. For this reason, small subclasses should be avoided when possible, for example, by excluding them from the problem or by grouping together all subclasses with small cardinalities. For cases in which removing or grouping subclasses is not possible or disrupts the biological consistency of the analysis, LEfSe substitutes the Wilcoxon test with a test to compare whether subclass medians differ with the expected sign. The user can choose the subclass cardinality threshold at which this median comparison is substituted for the Wilcoxon test. In short, when the sample size is small, the analysis can either merge subclasses or replace the Wilcoxon test.
Category: research articles | 17542 views | 2 comments
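The first two filtering steps described in the introduction post above (the KW test across classes, then the pairwise Wilcoxon test between subclasses of different classes) can be sketched for a single feature as follows. This is a minimal illustration and not LEfSe's own code: it assumes SciPy is installed, uses scipy.stats.kruskal and scipy.stats.mannwhitneyu (the Mann-Whitney U test, which is equivalent to the Wilcoxon rank-sum test) as stand-ins, and the function names passes_kw and passes_pairwise_wilcoxon are hypothetical. Subject stratification, the small-subclass median substitution, and the LDA step are left out.

```python
from itertools import product
from statistics import median

from scipy.stats import kruskal, mannwhitneyu

ALPHA = 0.05  # default threshold quoted in the text


def passes_kw(values_per_class, alpha=ALPHA):
    """Step 1: keep a feature only if the KW test rejects the null
    hypothesis of identical distributions across classes.

    values_per_class: list of lists, one list of abundances per class.
    """
    _, p_value = kruskal(*values_per_class)
    return p_value < alpha


def passes_pairwise_wilcoxon(subclasses_a, subclasses_b, alpha=ALPHA):
    """Step 2: every subclass of class A is compared with every subclass
    of class B. All comparisons must be significant, and all median
    differences must share the same (non-zero) sign.

    subclasses_a, subclasses_b: dicts mapping subclass name -> abundances.
    """
    signs = set()
    for x, y in product(subclasses_a.values(), subclasses_b.values()):
        _, p_value = mannwhitneyu(x, y, alternative="two-sided")
        if p_value >= alpha:
            return False  # one non-significant comparison rejects the feature
        diff = median(x) - median(y)
        signs.add((diff > 0) - (diff < 0))  # +1, -1, or 0
    return len(signs) == 1 and 0 not in signs
```

With two classes of three subclasses each, the loop runs the nine comparisons mentioned in the text. A feature would be kept as a candidate biomarker only when both functions return True.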