xbinbzy's personal blog http://blog.sciencenet.cn/u/xbinbzy


Multiple testing correction: How does multiple testing correction work?

Posted 2015-8-23 14:33 | Category: Research notes | Keywords: multiple testing correction

Article: How does multiple testing correction work?

Journal: Nature Biotechnology


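   Common criteria for deciding whether a difference is significant are the P-value, the false discovery rate (FDR), and the q-value. The article explains how these three quantities relate to one another.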

   A P-value measures the significance of a single value: "the P-value is only statistically valid when a single score is computed." When many tests are carried out, multiple testing correction is needed.

   The most common multiple testing correction is the Bonferroni adjustment: divide the chosen significance level α by the number of tests n, and call a score significant only if its P-value is at most α/n. The Bonferroni adjustment controls the "family-wise error rate". With α = 0.01, for example, we can be 99% sure that none of the scores called significant is drawn according to the null hypothesis. For many studies this is too strict, so a different criterion was proposed. "Rather than saying that we want to be 99% sure that none of the observed scores is drawn according to the null, it is frequently sufficient to identify a set of scores for which a specified percentage of scores are drawn according to the null. This is the basis of multiple testing correction using false discovery rate (FDR) estimation."
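   The Bonferroni rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example P-values are made up here and do not come from the article.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Flag each P-value that passes the Bonferroni-adjusted cutoff alpha / n.

    Hypothetical helper for illustration; not taken from the article.
    """
    n = len(p_values)
    if n == 0:
        return []
    threshold = alpha / n  # per-test cutoff after dividing alpha by n
    return [p <= threshold for p in p_values]


# Made-up P-values from four tests; the cutoff is 0.01 / 4 = 0.0025.
example_p = [0.00001, 0.0004, 0.003, 0.2]
flags = bonferroni_significant(example_p, alpha=0.01)
print(flags)  # [True, True, False, False]
```

   Note that 0.003 would pass an uncorrected cutoff of 0.01 but fails once the cutoff is divided by the number of tests, which is exactly why the adjustment is considered strict.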



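   The FDR can be estimated in two ways:

       1) s_null/s_obs. For a score threshold t, count the number s_obs of observed scores ≥ t and the number s_null of null scores ≥ t. The estimated FDR is the count expected under the null divided by the count actually observed. Once a target FDR is fixed, one can work backwards to find the score threshold that achieves it.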

       2) From P-values. "It is also possible to compute FDRs from P-values using the Benjamini-Hochberg procedure, which relies on the P-values being uniformly distributed under the null hypothesis." If the P-values are uniformly distributed, then the P-value 5% of the way down the sorted list should be ~0.05. Accordingly, the procedure consists of sorting the P-values in ascending order, and then dividing each observed P-value by its percentile rank to get an estimated FDR. In this way, small P-values that appear far down the sorted list will result in small FDR estimates, and vice versa.
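   Both estimates in the list above can be written down directly. The sketch below is an illustration under stated assumptions, not the article's code: the function names and the data are invented, higher scores are assumed to be more significant, and the null sample is assumed to have the same size as the observed one.

```python
def empirical_fdr(observed, null, t):
    """Estimate the FDR at threshold t as s_null / s_obs.

    Assumes higher scores are more significant and that `null` holds
    as many scores as `observed` (both are assumptions of this sketch).
    """
    s_obs = sum(1 for s in observed if s >= t)
    s_null = sum(1 for s in null if s >= t)
    if s_obs == 0:
        return 0.0  # convention chosen here: nothing is called significant
    return min(1.0, s_null / s_obs)


def bh_fdr_estimates(p_values):
    """Divide each P-value by its percentile rank (rank / n) in the sorted list."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ascending P-values
    fdr = [0.0] * n
    for rank, i in enumerate(order, start=1):
        fdr[i] = min(1.0, p_values[i] * n / rank)  # p / (rank / n), capped at 1
    return fdr


# Made-up scores: four observed scores and one null score reach t = 3.0.
observed = [5.1, 4.2, 3.9, 3.0, 2.2, 1.0]
null = [3.5, 2.5, 2.0, 1.5, 1.0, 0.5]
print(empirical_fdr(observed, null, 3.0))  # 0.25

# Made-up P-values; results are returned in the input order.
print(bh_fdr_estimates([0.01, 0.04, 0.03, 0.5]))
```

   In the second example the P-value 0.03 receives a larger FDR estimate than the P-value 0.04, because 0.04 sits one rank further down the sorted list. The raw estimates are therefore not guaranteed to increase with the P-value.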

   When an empirical null distribution is available, compute the FDR directly from it. When only a theoretical null model is available, compute P-values and then apply the Benjamini-Hochberg procedure. "These simple FDR estimation methods are sufficient for many studies, and the resulting estimates are provably conservative with respect to a specified null hypothesis; that is, if the simple method estimates that the FDR associated with a collection of scores is 5%, then on average the true FDR is ≤5%." Because a conservative estimate is not the most accurate one, "a variety of more sophisticated methods have been developed for achieving more accurate FDR estimates (reviewed in ref. 5). Most of these methods focus on estimating a parameter π0, which represents the percentage of the observed scores that are drawn according to the null distribution."
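   The post does not say how π0 is estimated. As one possible illustration, the sketch below uses a commonly cited estimator attributed to Storey: null P-values are uniform, so the fraction of P-values above a cutoff λ, rescaled by 1/(1 − λ), approximates π0. The function name, the choice λ = 0.5, and the data are assumptions made here, not details from the article.

```python
def estimate_pi0(p_values, lam=0.5):
    """Estimate pi0 as #{p > lam} / (n * (1 - lam)), capped at 1.

    Swapped-in technique for illustration (Storey-style estimator);
    the article itself does not specify an estimator.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    tail = sum(1 for p in p_values if p > lam)  # mostly null P-values
    return min(1.0, tail / (n * (1.0 - lam)))


# Made-up P-values: 4 of 10 exceed 0.5, so pi0 is about 4 / (10 * 0.5).
example_p = [0.001, 0.004, 0.01, 0.02, 0.2, 0.4, 0.55, 0.7, 0.85, 0.95]
pi0 = estimate_pi0(example_p)
print(pi0)  # 0.8
```

   In approaches of this kind, the simple FDR estimate is then multiplied by the estimated π0, which makes it less conservative whenever π0 is below 1.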

   Complementary to the FDR, Storey proposed defining the q-value as an analog of the P-value that incorporates FDR-based multiple testing correction. The motivation is an inherent shortcoming of the FDR: "when considering a ranked list of scores, it is possible for the FDR associated with the first m scores to be higher than the FDR associated with the first m + 1 scores." Storey therefore proposed defining the q-value as the minimum FDR attained at or above a given score. If we use a score threshold of T, then the q-value associated with T is the expected proportion of false positives among all of the scores above the threshold.
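   The "minimum FDR attained at or above a given score" can be computed with a running minimum taken from the least significant end of the ranked list. The sketch below is illustrative only: the function name and the numbers are invented, and the FDR estimates are assumed to have been computed beforehand by any of the methods above.

```python
def q_values_from_fdr(p_values, fdr):
    """Turn per-test FDR estimates into q-values.

    Walks the list from the largest P-value to the smallest and keeps
    the smallest FDR seen so far, so q-values never decrease as the
    P-value grows. Hypothetical helper for illustration.
    """
    if len(p_values) != len(fdr):
        raise ValueError("p_values and fdr must have the same length")
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    q = [0.0] * len(p_values)
    running_min = 1.0
    for i in reversed(order):  # least significant test first
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


# Made-up inputs: the test with P = 0.03 has a higher FDR estimate (0.06)
# than the less significant test with P = 0.04 (about 0.0533).
p = [0.01, 0.04, 0.03, 0.5]
fdr = [0.04, 0.04 * 4 / 3, 0.06, 0.5]
print(q_values_from_fdr(p, fdr))
```

   Here the test with P = 0.03 inherits the smaller FDR of the P = 0.04 test, which removes the non-monotonic behavior that motivated the q-value.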







