# Editor information: Xiong Rongchuan (熊荣川), Minghu Laboratory (明湖实验室), xiongrongchuan@126.com, http://blog.sciencenet.cn/u/Bearjazz

Here, we present a major new version of the molecular evolutionary software package Bayesian Evolutionary Analysis by Sampling Trees (BEAST), updated to version 1.7, and representing a significant software advance over that previously described (Drummond and Rambaut 2007). Alongside the primary analysis engine in BEAST, this package also includes a suite of utilities for specifying the analysis design, processing output files, and summarizing and visualizing the results. Taken together, these programs enable Bayesian inference of molecular sequences with an emphasis on time-structured evolutionary models including phylodynamic models, divergence time estimates, multilocus demographic models, gene-tree/species-tree inference, a range of spatial phylogeographic analyses, and discrete and continuous trait evolution. Implementing Markov chain Monte Carlo (MCMC) algorithms to perform these inferences, the package is intended and used for rigorous statistical inference and hypothesis testing of evolutionary models with joint inference of phylogeny. It is also possible to constrain portions of the phylogenetic model space to known values, including the tree topology, and perform conditional inference if required.

Drummond AJ, Suchard MA, Xie D, et al. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 2012, 29(8): 1969-1973.
Molecular sequences, morphological measurements, geographic distributions, and fossil remains all provide a wealth of potential information about the evolutionary history of life on Earth, the dynamics of ancient and modern biological populations, and the emergence and spread of infectious diseases. One of the challenges of modern Evolutionary Biology is the integration of these different data sources to address evolutionary hypotheses over the full range of spatial and temporal scales. The field is witnessing a transition to an increasingly quantitative science. This transformation began with an explosion of molecular sequence data and the parallel development of mathematical and computational tools for their analysis. Increasingly, however, the same transformation can be observed in other areas of Evolutionary Biology, where large global databases of complementary sources of information, such as fossils, geographical distributions, and population history, are being curated and made publicly available.
Computational evolutionary biology, statistical phylogenetics, and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography, and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU Lesser General Public License and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk.
1. European summer temperatures since Roman times (Luterbacher et al., 2016, ERL)

The spatial context is critical when assessing present-day climate anomalies, attributing them to potential forcings and making statements regarding their frequency and severity in a long-term perspective. Recent international initiatives have expanded the number of high-quality proxy records and developed new statistical reconstruction methods. These advances allow more rigorous regional past temperature reconstructions and, in turn, the possibility of evaluating climate models on policy-relevant, spatiotemporal scales. Here we provide a new proxy-based, annually resolved, spatial reconstruction of the European summer (June-August) temperature fields back to 755 CE based on Bayesian hierarchical modelling (BHM), together with estimates of the European mean temperature variation since 138 BCE based on BHM and composite-plus-scaling (CPS). Our reconstructions compare well with independent instrumental and proxy-based temperature estimates, but suggest a larger amplitude in summer temperature variability than previously reported. Both CPS and BHM reconstructions indicate that the mean 20th century European summer temperature was not significantly different from some earlier centuries, including the 1st, 2nd, 8th and 10th centuries CE. The 1st century (in BHM also the 10th century) may even have been slightly warmer than the 20th century, but the difference is not statistically significant. Comparing each 50 yr period with the 1951-2000 period reveals a similar pattern. Recent summers, however, have been unusually warm in the context of the last two millennia and there are no 30 yr periods in either reconstruction that exceed the mean average European summer temperature of the last 3 decades (1986-2015 CE).
A comparison with an ensemble of climate model simulations suggests that the reconstructed European summer temperature variability over the period 850-2000 CE reflects changes in both internal variability and external forcing on multi-decadal time-scales. For pan-European temperatures we find slightly better agreement between the reconstruction and the model simulations with high-end estimates for total solar irradiance. Temperature differences between the medieval period, the recent period and the Little Ice Age are larger in the reconstructions than the simulations. This may indicate inflated variability of the reconstructions, a lack of sensitivity and processes to changes in external forcing on the simulated European climate and/or an underestimation of internal variability on centennial and longer time scales.
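The abstract above names composite-plus-scaling (CPS) as one of its two reconstruction methods. What follows is a rough sketch of the general CPS idea only. It is not Luterbacher et al.'s actual implementation. The steps are:

1. Standardize each proxy series.
2. Average the standardized series into a composite.
3. Linearly rescale the composite so that its mean and standard deviation over a calibration period match the instrumental record.

Several choices here are assumptions made for illustration:

- the function names;
- equal proxy weights;
- standardizing over the full proxy period;
- the toy data.

```python
from statistics import mean, pstdev


def standardize(series):
    """Convert a series to z-scores (zero mean, unit population std)."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]


def composite_plus_scaling(proxies, instrumental, calib):
    """Hypothetical CPS sketch (equal weights, full-period standardization).

    proxies: equal-length proxy series, one value per year, oriented so that
        higher values mean warmer (assumed).
    instrumental: observed temperatures for the calibration years.
    calib: slice selecting the calibration years within the proxy period.
    Returns a temperature reconstruction for every year of the proxy period.
    """
    # Composite: equal-weight mean of the standardized proxies, year by year.
    zscores = [standardize(p) for p in proxies]
    composite = [mean(year_vals) for year_vals in zip(*zscores)]

    # Scaling: match the instrumental mean and std over the calibration window.
    c_cal = composite[calib]
    if len(c_cal) != len(instrumental):
        raise ValueError("calibration window and instrumental record differ in length")
    scale = pstdev(instrumental) / pstdev(c_cal)
    offset = mean(instrumental) - scale * mean(c_cal)
    return [offset + scale * c for c in composite]


# Hypothetical toy data: two proxies over six "years", the last three calibrated.
proxies = [[0.1, 0.4, 0.2, 0.9, 1.1, 1.3],
           [1.0, 1.5, 1.2, 2.0, 2.4, 2.5]]
instrumental = [15.2, 15.6, 15.9]
recon = composite_plus_scaling(proxies, instrumental, slice(3, 6))
```

The scaling step matches variance rather than regressing temperature on the composite. Regression-based calibration tends to damp the reconstructed amplitude. The paper's BHM reconstruction is a different, Bayesian hierarchical approach, and this sketch does not cover it.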
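One of the philosophical implications of Bayes' theorem is this. Besides the sets A and B that we care about, there are other events. They may have no relation to A and B, or they may be related in ways not yet discovered or not yet thought of. We provisionally lump them together as (-A + -B) and leave them to be untangled later.

My previous blog post, on emergency response, did not offend Angola, nor did it cross any of ScienceNet's red lines. Even so, it drew negative reactions I had not anticipated, so I hid it myself. That is somewhat unfair to the readers who left comments; my apologies! I will let it cool off for a while, revise it, and try to restore it. That negative reaction truly belongs to (-A + -B).

Another example: blogger 刘杜鹏 (Liu Dupeng), in the post "Why I stand up for Fang Zhouzi on the GMO issue". The article is good, but what is going on with his blog name, pengduliu? Judging from his posts, it is very unlikely that he is unaware of the conventions for rendering Chinese names in English. It is even less likely that he is unsure what his own surname is. Applying Bayes' theorem, Lao Xie (that is, I) concludes that his parents being surnamed Liu and Du respectively is more probable than any other possibility. Further, his father being the Liu is more probable than his mother being the Liu.

No offense intended; I only hope blogger Liu Dupeng can confirm whether Lao Xie has misapplied Bayes' theorem.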
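Bayesian statistics and regularization (attachment: 贝叶斯统计和正规化.docx; note: the information in this blog post is incomplete, see the attachment)

Regularization can be used to prevent overfitting while still keeping all of the parameters.

1. Maximum likelihood estimation (ML). Its underlying philosophy is that behind the data there is a set of parameters θ that generates x and y. Note that θ really exists and is not a random variable. θ can be thought of as a parameter of a function of x; its value is simply unknown, and estimating it is our task. The ML algorithm looks for the θ that makes the training data most likely:

$$\theta_{\mathrm{ML}} = \arg\max_{\theta} \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}; \theta\big)$$

This is the frequentist view of statistics.

2. The other view is that of the Bayesian school. Bayesians do not know the value of θ in advance either, but they treat θ as a random variable. They assume it follows a prior distribution p(θ) that expresses the uncertainty about θ; for example, θ might have a Gaussian prior or a Beta prior. Given a training set $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, we compute the posterior distribution p(θ|S). This is the distribution of θ after taking into account the information in the training set:

$$p(\theta \mid S) = \frac{p(S \mid \theta)\,p(\theta)}{p(S)} = \frac{\left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta)}{\int_{\theta} \left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta)\, d\theta}$$

The denominator is an integral over θ. Once the prior and the training set are fixed, it is a constant and does not affect the estimate of θ. We can therefore write

$$p(\theta \mid S) \propto \left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta),$$

that is, the shape of the posterior depends only on the numerator.

For example, suppose x in the training set represents the attributes of a house and y its price, and we want parameters θ that use x to estimate y. To predict the price of a new house with attributes x, we use the posterior p(θ|S) obtained above:

$$p(y \mid x, S) = \int_{\theta} p(y \mid x, \theta)\, p(\theta \mid S)\, d\theta$$

Further, the expected value of y given the training set S and the input x is

$$\mathrm{E}[y \mid x, S] = \int_{y} y\, p(y \mid x, S)\, dy$$

Because θ may be very high-dimensional, these integrals are very hard to compute. The full posterior p(θ|S) is therefore usually not computed. Instead it is approximated, and a single point estimate of θ replaces the integral. The most common such method is MAP (maximum a posteriori) estimation:

$$\theta_{\mathrm{MAP}} = \arg\max_{\theta} \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\, p(\theta)$$

Prediction then only needs the function $h_{\theta_{\mathrm{MAP}}}(x)$, for example $h_{\theta_{\mathrm{MAP}}}(x) = \theta_{\mathrm{MAP}}^{T} x$ for linear regression.

The Bayesian approach can effectively reduce the overfitting seen in maximum likelihood estimation. Beyond the training samples, it brings in prior knowledge about θ, and this prior smooths the estimate. Mathematically, maximum likelihood estimation (for example, logistic regression) maximizes the objective $\sum_{i=1}^{m} \log p\big(y^{(i)} \mid x^{(i)}; \theta\big)$. After adding the prior, the Bayesian (MAP) objective becomes $\sum_{i=1}^{m} \log p\big(y^{(i)} \mid x^{(i)}, \theta\big) + \log p(\theta)$. With a Gaussian prior $\theta \sim \mathcal{N}(0, \tau^{2} I)$, the extra term $\log p(\theta)$ equals $-\lVert\theta\rVert^{2}/(2\tau^{2})$ up to a constant. This is exactly an L2 regularization penalty.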
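The ML-versus-MAP contrast above can be shown in a minimal sketch. It assumes:

- a one-dimensional linear model y = θx + noise with no intercept;
- Gaussian noise with known σ;
- a Gaussian prior θ ~ N(0, τ²).

Under these assumptions the MAP estimate has a closed form, and that form is ridge (L2-regularized) least squares with λ = σ²/τ². The function names and toy data are hypothetical. This is the standard Gaussian-prior case, not necessarily what the missing attachment used.

```python
def _sums(xs, ys):
    """Return (sum of x*y, sum of x*x) over paired samples."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def theta_ml(xs, ys):
    """ML slope for y = theta * x + Gaussian noise (no intercept).

    Maximizing sum_i log p(y_i | x_i; theta) reduces to ordinary least squares.
    """
    sxy, sxx = _sums(xs, ys)
    return sxy / sxx


def theta_map(xs, ys, sigma, tau):
    """MAP slope assuming noise ~ N(0, sigma^2) and prior theta ~ N(0, tau^2).

    Maximizing sum_i log p(y_i | x_i, theta) + log p(theta) gives ridge
    regression with penalty lambda = sigma^2 / tau^2.
    """
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + (sigma / tau) ** 2)


def log_posterior(theta, xs, ys, sigma, tau):
    """Unnormalized log p(theta | S): log-likelihood plus log-prior.

    Constants are dropped, including log p(S), the denominator of Bayes' rule.
    """
    loglik = -sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)
    logprior = -theta ** 2 / (2 * tau ** 2)
    return loglik + logprior


# Hypothetical toy data: (house attribute, price) pairs.
xs = [1.0, 2.0, 3.0]
ys = [1.2, 1.9, 3.3]

ml = theta_ml(xs, ys)                             # about 1.064
map_est = theta_map(xs, ys, sigma=1.0, tau=0.5)   # about 0.828, shrunk toward 0
```

A smaller τ means a stronger prior and pulls θ_MAP toward 0, while a very large τ recovers θ_ML. Every parameter is kept; only its magnitude is penalized. This matches the opening remark that regularization prevents overfitting while keeping all the parameters.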
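Overview

Bayesian networks (directed acyclic graphs) and MRFs (undirected graphs) are two important kinds of probabilistic graphical models. In an increasingly networked world they can sensibly model many things and phenomena, including:

- social networks (Facebook, Twitter);
- transportation networks (transportation, travel, network design);
- medical diagnosis;
- and more.

Russell has even used them to build a global crust vibration (seismic) network for analyzing and locating nuclear weapon tests and earthquakes. From the standpoint of Marxist philosophy, the material world in motion is universally interconnected and perpetually developing. Graphical models can simulate exactly these connections, and the development corresponds to inference on the graphical model. Bayesian networks and MRFs thus appear to capture widespread, objectively existing regularities. Within computer vision and machine learning, they have especially wide applications and a large research community.

Equivalence of Joint Distribution and Graph

Fundamentally, solving a problem with a graphical model starts with knowing the independence relations among the variables. This is what makes it possible to build the graph (an I-map). Note that both a Bayesian network and an MRF are defined as a pair (P, G); that is, the probability distribution and the graph are defined together. When the joint distribution is strictly positive, the two characterizations are equivalent: the graph is an I-map of the joint distribution, and the joint distribution factorizes over the graph. For MRFs this is the Hammersley-Clifford theorem; for Bayesian networks the equivalence holds even without positivity.

For a Bayesian network, factorization over the graph has a simple definition: P is the product of each variable's conditional distribution given its parents. For an MRF, factorization means that the scope of every factor in the joint distribution is a complete subgraph of the graph H.

Note that a complete subgraph is not necessarily a maximal clique. Suppose every three variables in our graphical model form a clique. A joint distribution compatible with this graph structure can still be a product of pairwise clique potentials. In practice, however, when we build such a graph we usually go on to build a Gibbs distribution that is a product of third-order clique potentials, rather than a pairwise Markov model.

Drawback

From a machine learning perspective, MRFs and Bayesian networks are both parametric methods. Their biggest drawback is that one must assume a model and then train its parameters. At one extreme, the model matches reality exactly, and the problem is of course solved well. At the other extreme, the prior model is problematic, and we end up deviating from the original problem. William Freeman (MIT) wrote a paper on this point, using MRFs to compute stereo disparity. It showed that a lower energy does not mean the disparity map is closer to the ground truth. In other words, the constructed MRF model is not compatible with the truth. How can this problem be solved? Would introducing a kernel work?

Three recommended papers:
1. Tappen, Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters, 2003.
2. P. Kohli, Robust higher order potentials for enforcing label consistency, 2009.
3. J. Diebel, An application of Markov random fields to range sensing, 2006.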
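The point in "Equivalence of Joint Distribution and Graph" about complete subgraphs versus maximal cliques can be made concrete. Here is a minimal Python sketch over a triangle graph A-B-C of three binary variables. The potential values and the names `pairwise_potential`, `triple_potential` and `gibbs` are hypothetical, chosen only for illustration. In both models every factor scope is a complete subgraph of the same triangle, so both are Gibbs distributions that factorize over it. One is a pairwise Markov model; the other uses a single third-order potential on the maximal clique.

```python
from itertools import product


def pairwise_potential(a, b):
    """Hypothetical pairwise clique potential that favors equal labels."""
    return 2.0 if a == b else 1.0


def triple_potential(a, b, c):
    """Hypothetical third-order clique potential rewarding odd parity of (a, b, c)."""
    return 3.0 if (a + b + c) % 2 == 1 else 1.0


def gibbs(factors, n_vars=3):
    """Normalize a product of factors into a joint distribution over binary variables.

    factors: list of (scope, fn) pairs; scope is a tuple of variable indices and
    fn maps the values on that scope to a strictly positive number.
    """
    unnorm = {}
    for x in product((0, 1), repeat=n_vars):
        p = 1.0
        for scope, fn in factors:
            p *= fn(*(x[i] for i in scope))
        unnorm[x] = p
    z = sum(unnorm.values())  # partition function Z
    return {x: p / z for x, p in unnorm.items()}


# Pairwise Markov model on the triangle: one potential per edge (A-B, B-C, A-C).
pairwise_model = gibbs([((0, 1), pairwise_potential),
                        ((1, 2), pairwise_potential),
                        ((0, 2), pairwise_potential)])

# Gibbs distribution with a single third-order potential on the clique {A, B, C}.
triple_model = gibbs([((0, 1, 2), triple_potential)])
```

The parity potential is a purely three-way interaction: its logarithm contains no pairwise or single-variable terms. As a result, no product of pairwise potentials reproduces `triple_model` exactly. This is the sense in which the third-order model is more expressive than a pairwise model on the same graph.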