# Editor information: Xiong Rongchuan (熊荣川), Minghu Laboratory (明湖实验室), xiongrongchuan@126.com, http://blog.sciencenet.cn/u/Bearjazz

Schematic illustration of the waiting times in a calibrated tree and the numbers of lineages present for each type of diversification process (interspecies diversification and within-species coalescence) during each waiting interval. Branches are categorized as either between-species (thin lines) or within-species (bold lines) according to the procedures described in Material and Methods.

Pons J, Barraclough T G, Gomez-Zurita J, et al. Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology, 2006, 55(4): 595-609.
Analyses of mtDNA Branching Times

A statistical model was developed to test for the predicted change in branching rates at the species boundary. The overall aim of the procedure is to classify the observed branching time intervals, defined by the nodes in a clock-constrained phylogram, as the result of either interspecific ("diversification") or intraspecific ("coalescent") processes of lineage branching (Fig. 2). A full description of the model and its performance on simulated trees will be provided elsewhere (Barraclough, unpublished).

Pons J, Barraclough T G, Gomez-Zurita J, et al. Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology, 2006, 55(4): 595-609.
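To make the idea of "branching time intervals defined by the nodes" concrete, here is a minimal sketch of the bookkeeping only. It is not the statistical model of Pons et al., which is not described in the passage. It assumes a strictly bifurcating, clock-constrained tree whose internal node ages (time before present) are already known, and it pools all lineages together rather than splitting them by process. The function name `waiting_intervals` is hypothetical.

```python
def waiting_intervals(node_ages):
    """Return (duration, n_lineages) for each interval between successive nodes.

    node_ages: ages (time before present) of the internal nodes of a
    strictly bifurcating, ultrametric tree. Assumption: no two nodes are
    merged into a polytomy, so each node adds exactly one lineage.
    """
    ages = sorted(node_ages, reverse=True)  # oldest node (the root) first
    boundaries = ages + [0.0]               # the last interval ends at the present
    intervals = []
    for i in range(len(ages)):
        duration = boundaries[i] - boundaries[i + 1]
        n_lineages = i + 2                  # two lineages exist just after the root split
        intervals.append((duration, n_lineages))
    return intervals


if __name__ == "__main__":
    # Four internal nodes imply five tips.
    for duration, n in waiting_intervals([10.0, 6.0, 1.5, 0.5]):
        print(f"{n} lineages for {duration} time units")
```

A change in branching rate at the species boundary would show up as a shift from long intervals with few lineages to short intervals with many lineages; deciding where that shift happens is the job of the statistical model, which this sketch does not attempt.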
Extant ancestral nodes

In natural populations, most haplotypes in the gene pool exist as sets of multiple, identical copies that originated by DNA replication. When one of these copies mutates to a new haplotype, it is extremely unlikely that other copies of the ancestral haplotype also mutate or that all copies of the ancestral haplotype rapidly become extinct. Thus, the ancestral haplotypes are expected to persist in the population and to be sampled together with their descendants. Traditional phylogenetic methods, based on a bifurcating tree, can detect and artificially represent persistent ancestral haplotypes as occupying a branch of zero length at the basal node of a cluster. However, this approach relies on modifying (e.g. by estimation of branch lengths) an inappropriate model: a bifurcating tree with all haplotypes occupying tips or terminal branches.

Posada D, Crandall K A. Intraspecific gene genealogies: trees grafting into networks. Trends in Ecology and Evolution, 2001, 16(1): 37-45.
Polytomies can represent two different cases. First, they can represent the literal hypothesis that a common ancestral population split through cladogenesis (i.e., speciation) into multiple lineages. Under this interpretation, such an internal node is referred to as a hard polytomy (below left). Alternatively, someone who depicts a polytomy in a cladogram may not really expect that the same ancestor gave rise to all daughter taxa, but may instead be uncertain which resolved pattern is the best hypothesis. Under these circumstances, such a node is referred to as a soft polytomy (below right). This is actually the more common intended meaning of a polytomy.

Introduction to Phylogeny: Hard or Soft Polytomies? http://biology.fullerton.edu/biol404/phylo/polytomies.html
More often than not, when a phylogenetic dataset is divided into smaller partitions, each one gives rise to trees that have different topologies. One can draw two conclusions from this result: (1) one, some, or all of the trees are wrong and the partitions share the same history, or (2) the trees are correct and the different partitions have experienced distinct evolutionary histories. Distinguishing between these options requires statistical testing to determine whether the differences in topology are likely to have been observed simply by chance.

There are many different tests of incongruence available in the field of systematic biology that all use comparable measurements and ideas. However, tests differ, sometimes subtly and sometimes drastically, in their assumptions, implementation, and interpretation. These details can be difficult to discern in the disjointed literature and controversies surrounding these tests.

Incongruence tests may be broadly classified into tests that consider character information (character incongruence) and those that only consider tree shape or topology (topological incongruence). Character congruence analyses are particularly useful and powerful because they take both the tree topology and the underlying support for the tree topology into account. Topological congruence techniques have the advantage of being able to compare trees derived from data that may not be strictly comparable or easy to include in the same analysis.

Planet P J. Tree disagreement: Measuring and testing incongruence in phylogenies. Journal of Biomedical Informatics, 2006, 39(1): 86-102.
Here we present PhySortR, a fast, flexible R package for screening and sorting phylogenetic trees. The command-line package provides the quick and highly flexible sortTrees function, allowing for screening (within a tree) for "Exclusive" clades that contain only the target leaves and/or "Non-Exclusive" clades that include a defined portion of non-target leaves. Using simulated data, we assess the runtime of PhySortR based on the number of trees and the number of leaves within a tree, and demonstrate the potential of PhySortR in the analysis of multiple, large-scale empirical datasets.

Stephens T G, Bhattacharya D, Ragan M A, et al. PhySortR: A fast, flexible tool for sorting phylogenetic trees in R. PeerJ, 2016, 4(5): e2038.
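The distinction between "Exclusive" and "Non-Exclusive" clades can be illustrated with a small sketch. This is not PhySortR's implementation or API (PhySortR is an R package, and its internals are not described in the passage). It is a Python illustration that assumes a tree stored as nested tuples of leaf names, a caller-supplied test for what counts as a target leaf, and a hypothetical threshold `max_non_target` for the tolerated fraction of non-target leaves. All function names and the "rejected" label are invented for the example.

```python
def leaves(node):
    """Return the leaf names under a node (a str leaf or a tuple of children)."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def clades(node):
    """Yield every internal node (clade) of the tree, the root included."""
    if isinstance(node, str):
        return
    yield node
    for child in node:
        yield from clades(child)


def classify_clade(clade, is_target, max_non_target=0.0):
    """Label a clade by how many of its leaves are non-target (hypothetical labels)."""
    names = leaves(clade)
    non_target = sum(1 for name in names if not is_target(name))
    if non_target == 0:
        return "exclusive"          # only target leaves
    if non_target / len(names) <= max_non_target:
        return "non-exclusive"      # tolerated share of non-target leaves
    return "rejected"


if __name__ == "__main__":
    tree = ((("sp1_a", "sp1_b"), "sp1_c"), ("sp2_a", "sp1_d"))
    is_sp1 = lambda name: name.startswith("sp1")
    for clade in clades(tree):
        print(leaves(clade), classify_clade(clade, is_sp1, max_non_target=0.25))
```

With a threshold of 0.25 the whole tree (one non-target leaf out of five) counts as non-exclusive, while the pair containing sp2_a (one out of two) does not.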
Getting trees into R: Trees in R are usually stored in the S3 phylo class (implemented in ape), though the S4 phylo4 class (implemented in phylobase) is also available. ape can read trees from external files in Newick format (sometimes popularly known as PHYLIP format) or NEXUS format. It can also read trees input by hand as a Newick string, e.g. (human,(chimp,bonobo));. phylobase and its lighter-weight sibling rncl can use the Nexus Class Library to read NEXUS, Newick, and other tree formats. treebase can search for and load trees from the online tree repository TreeBASE; rdryad can pull data from the online data repository Dryad. RNeXML can read, write, and process metadata for the NeXML format. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's website). phyext2 can read and write various tree formats, including simmap formats. rotl can pull in a synthetic tree and individual study trees from the Open Tree of Life project. The treeio package can read trees in Newick, NEXUS, New Hampshire eXtended (NHX), jplace, and PHYLIP formats, as well as data output from BEAST, EPA, HyPhy, MrBayes, PAML, PHYLDOG, pplacer, r8s, RAxML, and RevBayes. phylogram can convert Newick files into dendrogram objects. brranching can fetch phylogenies from online repositories, including phylomatic.

https://cran.r-project.org/web/views/Phylogenetics.html
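To show what a Newick string such as (human,(chimp,bonobo)); actually encodes, here is a minimal sketch of a parser. It is written in Python purely for illustration and is not how ape or any of the R packages above read trees. It assumes the simplest form of the format: leaf names only, with no branch lengths, internal node labels, quoting, or comments. The function name `parse_newick` is hypothetical.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ')'
            return tuple(children)
        start = pos
        while text[pos] not in ",();":    # a leaf name runs up to the next delimiter
            pos += 1
        name = text[start:pos].strip()
        if not name:
            raise ValueError(f"missing leaf name at position {start}")
        return name

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


if __name__ == "__main__":
    print(parse_newick("(human,(chimp,bonobo));"))
    # ('human', ('chimp', 'bonobo'))
```

Each pair of parentheses becomes one internal node, so chimp and bonobo are grouped together before joining human. A node with more than two children, as in (A,B,C);, is parsed the same way and comes out as a three-element tuple.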
Adaptive radiations such as the Darwin's finches on the Galapagos archipelago, the Hawaiian silverswords, the Caribbean anole lizards, and the cichlid fish of the East African Great Lakes provide opportunities for studying the processes underlying rapid speciation (Schluter, 2000; Brakefield, 2006; Salzburger, 2009). Yet, resolving the phylogeny of adaptive radiations still remains a challenge. Due to the extreme rapidity of the radiations, lineage sorting often lags behind cladogenesis, complicating phylogenetic inference based on single or a few molecular markers (Pamilo and Nei, 1988; Maddison and Knowles, 2006). Furthermore, the short basal branches typical for rapid radiations can make the phylogenetic reconstructions sensitive to the choice of outgroup taxa. In particular, if the phylogenetic distance between outgroup and ingroup taxa is large, there is a high likelihood of homoplasy between ingroup and outgroup taxa. Consequently, the outgroup may attach randomly to the ingroup and bias the inferred ingroup tree topology (Wheeler, 1990; Huelsenbeck et al., 2002; Rota-Stabelli and Telford, 2008; Rosenfeld et al., 2012).

Kirchberger P C, Sefc K M, Sturmbauer C, et al. Outgroup effects on root position and tree topology in the AFLP phylogeny of a rapidly radiating lineage of cichlid fish. Molecular Phylogenetics and Evolution, 2014, 70(1): 57-62.
Several methods have been developed to construct phylogenetic trees. In general these methods can be divided into two distinct classes: methods that use an optimality criterion to choose a tree or trees, and methods that use an algorithm to choose a tree.

Optimality criterion: Accomplishes the goal of estimating a phylogeny by defining criteria for comparing alternative phylogenies to one another and deciding which tree is better, or that more than one are equally good. Ideally, the simplest approach to finding the best tree(s) would be to evaluate every possible tree and choose the best one based on your criterion. Felsenstein (1978) calculated the number of possible rooted, bifurcating trees for specific numbers of terminal taxa. Based on these calculations it can be seen that the number of possible trees becomes very large even for a fairly low number of taxa.

Source: Tree Building Techniques, http://bio.slu.edu/mayden/systematics/bsc420520lect12.html
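How quickly the number of trees grows can be checked directly. The sketch below uses the standard count of rooted, bifurcating, labelled trees for n terminal taxa, (2n − 3)!!, i.e. the product 1 × 3 × 5 × … × (2n − 3). The formula is not quoted in the passage above and is supplied here as background. The function name `count_rooted_trees` is hypothetical.

```python
def count_rooted_trees(n_taxa):
    """Number of distinct rooted, bifurcating, labelled trees for n_taxa tips."""
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 3 (the loop is empty for n <= 2).
    for factor in range(3, 2 * n_taxa - 2, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 10, 20):
        print(n, count_rooted_trees(n))
```

Ten taxa already give 34,459,425 rooted trees, which is why evaluating every possible tree is only feasible for very small datasets and why heuristic searches are used otherwise.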
1. Cytochrome P450 function and evolution
2. Myosin unrooted tree
3. Phylogenetic tree of env-V1/V2 nucleotide sequences
4. Phylogenetic tree of NAC proteins of potato, Arabidopsis and rice
A basic approach to reconstructing phylogenetic trees from coding and non-coding genes, using the mitochondrial genome as an example

All sequences from the L-strand-encoded genes (ND6 and 8 tRNA genes) were converted into complementary strand sequences. We dealt with the protein-coding gene sequences as follows to construct 3 different data sets (all including rRNA and tRNA gene sequences):

(1) all positions included (designated as 123ₙRTₙ; subscript "n" denotes nucleotides);
(2) third codon positions converted into purine (R) and pyrimidine (Y) (RY-coding; designated as 12ₙ3ᵣRTₙ; subscript "r" denotes RY-coding; Phillips and Penny 2003; Harrison et al. 2004);
(3) third codon positions excluded (designated as 12ₙRTₙ).

Reference: Saitoh K., Sado T., Mayden R.L., Hanzawa N., Nakamura K., Nishida M., Miya M. 2006. Mitogenomic evolution and interrelationships of the Cypriniformes (Actinopterygii: Ostariophysi): The first evidence toward resolution of higher-level relationships of the world's largest freshwater fish clade based on 59 whole mitogenome sequences. Journal of Molecular Evolution, 63: 826-841.
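The three treatments of protein-coding sequences can be sketched in a few lines. This is an illustration, not the authors' pipeline. It assumes in-frame coding sequences given as plain strings of A, C, G, and T, reads "converted into complementary strand sequences" as taking the reverse complement (an interpretation, since the passage does not spell this out), and leaves any other symbol (gaps, N) unchanged. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}  # purines -> R, pyrimidines -> Y


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3' (assumed interpretation)."""
    return seq.translate(COMPLEMENT)[::-1]


def ry_code_third_positions(cds):
    """Recode every third codon position as R or Y; keep positions 1 and 2."""
    recoded = []
    for i, base in enumerate(cds.upper()):
        if i % 3 == 2:                       # third codon position
            recoded.append(RY_CODE.get(base, base))
        else:
            recoded.append(base)
    return "".join(recoded)


def drop_third_positions(cds):
    """Remove every third codon position, keeping positions 1 and 2 only."""
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


if __name__ == "__main__":
    cds = "ATGGCCTAA"                        # codons: ATG GCC TAA
    print(cds)                               # data set (1): all positions
    print(ry_code_third_positions(cds))      # data set (2): ATRGCYTAR
    print(drop_third_positions(cds))         # data set (3): ATGCTA
```

RY-coding keeps the transversion signal at third positions while discarding transitions, which saturate fastest; dropping third positions entirely is the more conservative alternative.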
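Today I was trying out the maximum likelihood method in PhyML to build a phylogenetic tree. Later I was talking with someone who needed to align some sequences, so I recommended MEGA and had him download it from the web. Once he had, I noticed that MEGA 4 had been upgraded to version 5.0. Looking closer, I found that it now supports building phylogenetic trees by maximum likelihood, which version 4.0 did not. I quickly downloaded a copy myself, and after installing and running it I found that it really does work. The one drawback is a notice on the main window that reads: "This is a beta test release. Please do not use results generated in publications". Still, until now everyone has built ML trees with PAUP on a Mac, which is very awkward to operate, and now we have MEGA5. I tried building an ML tree from my own 16S sequence data and the run time was not too long: as I write this post it has been running for 17 minutes and is 69% complete. So it looks as though it will not take as long as running PAUP on a Mac. Anyone who likes trying new things should download it and give it a go.

Apart from this major ML improvement, MEGA5 has other improvements too, and it looks and feels better than MEGA4. I look forward to the official release!

MEGA5 can be downloaded from http://www.megasoftware.net/beta/index.php ; you need to enter your name and email address, then confirm via the email you receive before you can download it.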