# Editor information

Xiong Rongchuan (熊荣川), Minghu Laboratory (明湖实验室)
xiongrongchuan@126.com
http://blog.sciencenet.cn/u/Bearjazz

An approximately unbiased (AU) test that uses a newly devised multiscale bootstrap technique was developed for general hypothesis testing of regions in an attempt to reduce test bias. It was applied to maximum-likelihood tree selection for obtaining the confidence set of trees. The AU test is based on the theory of Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434; 1996), but the new method provides higher-order accuracy yet simpler implementation. The AU test, like the Shimodaira-Hasegawa (SH) test, adjusts the selection bias overlooked in the standard use of the bootstrap probability and Kishino-Hasegawa tests. The selection bias comes from comparing many trees at the same time and often leads to overconfidence in the wrong trees. The SH test, though safe to use, may exhibit another type of bias such that it appears conservative. Here I show that the AU test is less biased than other methods in typical cases of tree selection. These points are illustrated in a simulation study as well as in the analysis of mammalian mitochondrial protein sequences. The theoretical argument provides a simple formula that covers the bootstrap probability test, the Kishino-Hasegawa test, the AU test, and the Zharkikh-Li test. A practical suggestion is provided as to which test should be used under particular circumstances.

Shimodaira H. An Approximately Unbiased Test of Phylogenetic Tree Selection. Systematic Biology, 2002, 51(3):492-508.
I recently came across a journal's Instructions for Authors (Specific requirements) that lists, in some detail, how to write up the results of common statistical tests; it is reproduced below. Journals differ little in how they expect statistical results to be described, so the format below is fairly general, and I hope it helps newcomers to scientific writing.

---------------------------------------------

Give means and standard errors/standard deviations with their associated sample size in the format: X ± SE = 35.09 ± 0.07 km, n = 15. When standard deviation/error is shown in an illustration, n should be given as well.

Statistical tests use the following formats:

(ANOVA, $F_{(1,25)} = 8.56$, P = 0.035)
(Kruskal-Wallis test, $H_{25} = 123.7$, P = 0.001)
(Chi-square test, $\chi^2_2 = 0.23$, P = 0.57)
(Paired t test, $t_{24} = 2.33$, P = 0.09)
(Linear regression, $r^2 = 0.94$, $F_{1,66} = 306.87$, P < 0.001)
(Spearman rank correlation, $r_s = 0.60$, N = 33, P < 0.01)
(Wilcoxon signed-ranks test, T = 7, N = 33, P < 0.05)
(Mann-Whitney U test, U = 44, $N_1 = 7$, $N_2 = 24$, P < 0.02)

Please either give the exact P-value of a statistical test, or state P < 0.0xxx, if this is not possible. P = 0 is not valid.

My reading of the last point: never write P = 0.0000, even if the software output shows nothing but zeros after the decimal point; depending on the case, write P < 0.0001 instead.

---------------------------------------------------------------
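The rule about never reporting P = 0 can be sketched as a small R helper. This is only an illustration: the function name `format_p`, the reporting floor of 0.0001, and the choice of two significant digits are assumptions of this sketch, not part of the journal's instructions.

```r
# Hypothetical helper: format a single P-value for reporting so that it is
# never printed as "P = 0". `floor` and `sig` are illustrative defaults.
format_p <- function(p, floor = 1e-4, sig = 2) {
  stopifnot(is.numeric(p), length(p) == 1, !is.na(p), p >= 0, p <= 1)
  if (p < floor) {
    # Below the reporting floor: give an upper bound instead of a row of zeros.
    paste0("P < ", format(floor, scientific = FALSE))
  } else {
    # Otherwise report the value itself, rounded to `sig` significant digits.
    paste0("P = ", format(signif(p, sig), scientific = FALSE))
  }
}

format_p(0.035)   # "P = 0.035"
format_p(0)       # "P < 0.0001"
```

Rounding to significant digits rather than to a fixed number of decimals keeps small values such as 0.0004 from collapsing to 0.000.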
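Speaking with Data series (4): the minimum sample size per group for independent-samples, paired-samples, and one-sample t tests

Mei Weiping (梅卫平)

Basic knowledge worth spreading!

Setting aside for now that the t test assumes normally distributed data, two points deserve attention:

# Note 1: as a rule of thumb, a hypothesis test should have a statistical power (Statistical Power Level) of at least 80% to be considered adequate.
# Note 2: the effect size (Effect Size, called delta in R) measures the magnitude of the treatment effect, that is, the true difference between the two sample means. With the default sd = 1, delta = 1 is a difference of one standard deviation, and delta = 1 is the typical case used here.
# n : number of observations (per group).

Results: in the typical case (at least 80% power with delta = 1), a two-sided t test on two independent samples needs at least 17 samples per group. For the one-sided test on two independent samples the calculation gives n = 13.09777, which rounds up to 14 samples per group, because 13 per group falls just short of 80% power.

Supplement:

power.t.test(power = 0.8, delta = 1, type = "paired")
# n = 9.937864; two-sided paired t test: at least 10 pairs

power.t.test(power = 0.8, delta = 1, type = "paired", alternative = "one.sided")
# n = 7.727622; one-sided paired t test: at least 8 pairs

power.t.test(power = 0.8, delta = 1, type = "one.sample")
# n = 9.937864; two-sided one-sample t test: at least 10 observations

power.t.test(power = 0.8, delta = 1, type = "one.sample", alternative = "one.sided")
# n = 7.727622; one-sided one-sample t test: at least 8 observations

When delta = 1, power against n for the independent two-sample t test (n is the number of samples per group):

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Power | NA | 0.09131 | 0.1572 | 0.2224 | 0.2859 | 0.3471 | 0.4056 | 0.4611 | 0.5133 | 0.5619 |

| n | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Power | 0.6070 | 0.6486 | 0.6867 | 0.7214 | 0.7529 | 0.7813 | 0.8070 | 0.830 | 0.850 | 0.8689 |

| n | 21 | 22 | 23 | ... | 50 | 100 | 1000 | 10000 | ... |
|---|---|---|---|---|---|---|---|---|---|
| Power | 0.8852 | 0.8997 | 0.9124 | ... | 0.9986 | 0.9999 | 1 | 1 | ... |

Note: two-sided t test.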
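The power-against-n table can be regenerated with a short loop over `power.t.test`. The sketch below assumes the same settings as the post (delta = 1, sd = 1, sig.level = 0.05, two-sided, two independent samples); the variable names `n_values`, `power_by_n`, and `power_table` are illustrative, and the grid starts at n = 2 because a single observation per group leaves no degrees of freedom.

```r
# Power of a two-sided, two-sample t test for a range of per-group sample
# sizes, using delta = 1 and the power.t.test defaults sd = 1, sig.level = 0.05.
n_values <- c(2:23, 50, 100)
power_by_n <- sapply(n_values, function(n) {
  power.t.test(n = n, delta = 1)$power
})

power_table <- data.frame(n = n_values, power = round(power_by_n, 4))
print(power_table, row.names = FALSE)

# Smallest per-group n in the grid that reaches at least 80% power.
min(power_table$n[power_table$power >= 0.8])   # 17
```

Reading the threshold off the table this way gives the same answer as asking `power.t.test` for n directly and rounding up.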
# Calculations (run in R):
#----------------------------------------------------------

power.t.test(n = 4, delta = 1)

     Two-sample t test power calculation

              n = 4
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.2224633   # with 4 samples per group, the power is very bad
    alternative = two.sided

NOTE: n is number in *each* group

power.t.test(n = 20, delta = 1)

     Two-sample t test power calculation

              n = 20
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.8689528   # with 20 samples per group, the power is good
    alternative = two.sided

NOTE: n is number in *each* group

power.t.test(power = 0.80, delta = 1)

     Two-sample t test power calculation

              n = 16.71477    # very important: two-sided two-sample t test, at least 17 samples per group
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

power.t.test(power = 0.80, delta = 1, alternative = "one.sided")

     Two-sample t test power calculation

              n = 13.09777    # very important: one-sided two-sample t test, at least 14 samples per group after rounding up (13 falls just short of 80% power)
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.8
    alternative = one.sided

NOTE: n is number in *each* group

# --------------------------------------------------
# Special case: an effect size (delta) of 2

power.t.test(power = 0.80, delta = 2)

     Two-sample t test power calculation

              n = 5.090008    # with effect size 2, the two-sided test needs at least 6 samples per group after rounding up
          delta = 2
             sd = 1
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

power.t.test(power = 0.80, delta = 2, alternative = "one.sided")

     Two-sample t test power calculation

              n = 3.987012    # with effect size 2, the one-sided test needs at least 4 samples per group
          delta = 2
             sd = 1
      sig.level = 0.05
          power = 0.8
    alternative = one.sided

NOTE: n is number in *each* group

Oversights and errors are hard to avoid; corrections and criticism are very welcome!

Series articles

Speaking with Data series (1): the effect of sample size and data order on the t test
Speaking with Data series (2): the effect of sample size and data order on cluster analysis
Speaking with Data series (3): the effect of sample size and data order on ANOVA
Speaking with Data series (4): the minimum sample size per group for the various t tests
Speaking with Data series (5): nonparametric tests, Steel-Dwass test or Dunn test, which to choose
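The sample-size calculations above can be collected into one table. This is a minimal sketch under the post's settings (80% power, sd = 1, sig.level = 0.05); the helper name `required_n` and the `scenarios` data frame are hypothetical names introduced for illustration. The fractional n returned by `power.t.test` is rounded up with `ceiling()`, since rounding down would leave the power slightly below the target.

```r
# Hypothetical helper: smallest whole sample size that reaches 80% power.
required_n <- function(delta, type = "two.sample", alternative = "two.sided") {
  res <- power.t.test(power = 0.80, delta = delta, sd = 1, sig.level = 0.05,
                      type = type, alternative = alternative)
  ceiling(res$n)  # round up so that the achieved power is at least 80%
}

# The scenarios covered in the post: delta = 1 for every test type,
# and delta = 2 for the two-sample test only.
scenarios <- data.frame(
  delta = c(1, 1, 1, 1, 1, 1, 2, 2),
  type = c("two.sample", "two.sample", "paired", "paired",
           "one.sample", "one.sample", "two.sample", "two.sample"),
  alternative = rep(c("two.sided", "one.sided"), 4),
  stringsAsFactors = FALSE
)

scenarios$n_required <- mapply(required_n, scenarios$delta,
                               scenarios$type, scenarios$alternative)
print(scenarios, row.names = FALSE)
```

For the two-sample rows, `n_required` is the number of samples in each group; for the paired rows it is the number of pairs, and for the one-sample rows the number of observations.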