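Birds are among the most remarkable creatures on this planet. Leaving aside the birds we see all around us in nature, their appeal is evident from Donald Duck, the cartoon character known around the world, to the macaws in the film Rio. Even the game "Angry Birds" is being turned into a film by its developer, Rovio.

Birds are higher vertebrates that are covered in feathers, winged, warm-blooded and egg-laying. A high metabolic rate and flight are the features that most clearly set birds apart from other animals. Taxonomically they belong to the class Aves, phylum Chordata, kingdom Animalia. More than 9,700 bird species have been recorded worldwide, divided into 3 superorders (Ratitae, Impennes and Carinatae) and about 28 orders; the order Passeriformes alone contains more than 5,000 species. China has 1,332 bird species (2,261 species and subspecies) in 24 orders, 101 families and 429 genera, of which 105 species are endemic. (Figures from an online encyclopedia; not necessarily accurate.)

Behind the beauty of birds lie many questions worth studying. For example:

Paleornithology focuses on three origin questions: the origin of birds, the origin of flight, and the origin of feathers. These questions were once fiercely debated, but key dinosaur fossils unearthed one after another in China have supplied core evidence for resolving the disputes and have placed China among the leaders in this field.

Avian systematics. Early work relied mainly on morphological classification; molecular methods were introduced later, using selected nuclear and/or mitochondrial gene sequences for phylogenetic classification.

Bird behavior, habitat and environmental adaptation. Courtship, breeding, foraging, social behavior and so on are all results of birds adapting to their environment. Take birdsong: the songs of passerines, like an infant's speech, are learned after birth, so the avian song control system has become an important model for studying how the nervous system relates to learning, behavior and development.

Of course, the bird research most closely tied to everyday life is probably the domestication and breeding of poultry, and avian influenza H5N1.

With the rise of whole-genome sequencing, bird research quickly moved into that field as well.

The first bird to have its whole genome sequenced was the red junglefowl (ancestor of the modern chicken), of Galliformes / Phasianidae / Gallus, which offered a new perspective on vertebrate evolution (2004, Nature, Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution).

Six years later, the genome of the zebra finch (Passeriformes / Estrildidae / Taeniopygia) was deciphered (2010, Nature, The genome of a songbird); in the same year, so was the genome of the turkey (Galliformes / Meleagrididae / Meleagris) (2010, PLoS Biology, Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis). In 2012, the genome of a Darwin's finch (Passeriformes / Emberizidae / Geospiza) was released (http://gigadb.org/darwins-finch/), and in the same year the genome of the collared flycatcher (Passeriformes / Muscicapidae / Ficedula) was deciphered (2012, Nature, The genomic landscape of species divergence in Ficedula flycatchers). For now, passerines lead in the number of sequenced genomes. Could it be that their melodious songs win them special favor from researchers?

In avian genomics, Avian Genomes (http://aviangenomes.org/) is a good website. Its aim is to build a platform for avian genomics research, though for some reason it is updated slowly. As more bird genomes are deciphered, especially those of birds at key positions in evolution, the goal of building a phylogenetic tree of all birds from whole-genome sequences comes ever closer. At the same time, the accumulation of more avian genomic data may bring the next "avian explosion", with new discoveries (and theories) on the horizon.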
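In biology, and in genomics especially, one often runs into the problem of multiple hypothesis testing (multiple testing). The raw p-values then have to be corrected before they can be used. Which correction method best suits your own research? How do p-values, false discovery rates (FDR) and q-values differ, and what does each of them mean? For students trained in statistics this is trivial; for those from a pure biology background, the terms alone are enough to cause a headache, never mind the formulas. Fortunately, an expert (William S Noble) understands our plight, and the result is an article in Nature Biotechnology: "How does multiple testing correction work?". The article is short, only 3 pages, and takes little time to read. Better still, it contains not a single daunting formula; with basic statistics, and the concept of the p-value in particular, it is not hard to follow. The author runs one biological example through the whole article, one that most biology students will find easy to understand: finding potential binding sites for CTCF (a highly conserved zinc-finger DNA-binding protein) on human chromosome 21. The author first introduces the null hypothesis and from it the concept of the p-value, then explains why raw p-values cannot be used directly, which leads into p-value correction. In this part the author proceeds step by step, in concise language, through the concepts, origins and meaning of the Bonferroni adjustment, the false discovery rate (FDR), the q-value and the local FDR. Finally, the author gives guidance for practical use and sums up the key points of the article. If your work involves p-value correction, FDR or q-values, this article is well suited to serve as your introduction (and it is useful well beyond that).

Article link: http://www.seq.cn/forum.php?mod=viewthread&tid=3504

When prioritizing hits from a high-throughput experiment, it is important to correct for random events that falsely appear significant. How is this done and what methods should be used?

Imagine that you have just invested a substantial amount of time and money in a shotgun proteomics experiment designed to identify proteins involved in a particular biological process. The experiment successfully identifies most of the proteins that you already know to be involved in the process and implicates a few more.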
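To make the difference between the corrections mentioned above more concrete, here is a minimal sketch in Python. It is not taken from the article: the function names and the four p-values are made up for illustration, and the Benjamini-Hochberg procedure is used here as one common way to control the FDR, which is an assumption about which FDR method a reader would apply. The q-value and local FDR in the stricter sense require additional estimation steps that this sketch does not attempt.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (one common FDR-controlling procedure).

    Each p-value is scaled by m / rank, then a running minimum taken from the
    largest p-value downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

With these made-up inputs, the Bonferroni-adjusted values are never smaller than the Benjamini-Hochberg ones, which reflects the usual trade-off: Bonferroni guards against any false positive at all, while FDR control tolerates a bounded fraction of false positives among the reported hits.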
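Plant Omics (《植物组学》), launched in November 2008, ISSN: 1836-0661, is a bimonthly journal published in Australia (SOUTHERN CROSS PUBL, 8 91-93 MCKENZIE ST, LISMORE, AUSTRALIA, NSW 2480). It was selected for the Science Citation Index Expanded of Web of Science in 2009. At present the SCI database returns 51 papers from the journal, from volume 1, issue 1 (2008) to volume 3, issue 3 (2010).

The 51 articles comprise 44 research papers and 7 reviews.

Main distribution of the 51 articles by country: India 14; Bangladesh and Pakistan 7 each; Iran 6; China (including 1 from Taiwan), Canada, Egypt and South Korea 4 each; Japan, Malaysia and Turkey 3 each; the United States, Saudi Arabia and the United Arab Emirates 2 each; and so on.

Among Chinese institutions, Henan Institute of Science and Technology (Henan Inst Sci Technol) has published 1 paper in Plant Omics as the corresponding author's affiliation.

The 51 articles have been cited 18 times in total (3 times in 2009 and 15 times in 2010), an average of 0.35 citations per article, with an h-index of 2 (2 articles have each been cited at least 2 times).

Plant Omics submission guide:

The journal's subtitle describes it as a journal of plant biology and molecular omics. It is an interdisciplinary, international, peer-reviewed journal covering all areas of plant, crop and agricultural biology, especially basic plant science and applied molecular omics, including genomics, bioinformatics, transcriptomics, proteomics, metabolomics, phenomics, lipidomics, glycomics, cytomics, pharmacogenomics, physiomics and interactomics.

Main areas covered by the journal:

Genomics (study of plant genes, regulatory and non-coding sequences)
Bioinformatics (study of computational algorithms and methods in plant sciences)
Transcriptomics (study of RNA complement of a plant organism, tissue type, or cell with association to gene expression)
Proteomics (study of plant proteins and their expressions)
Metabolomics (study of primary, secondary etc. metabolites in plants)
Phenomics (characterization of plant phenotypes, normal and mutant, via the interaction of the genome with the environment)
Lipidomics (study of non-water-soluble metabolites, particularly lipids, in plant organisms and cells)
Glycomics (study of plant glycomes including genetic, physiologic, pathologic and other aspects)
Cytomics (study of cytomes and cell systems at a single cell level)
Cytogenomics (study of chromosomes and their association with plant characters)
Pharmacogenomics (study of genetic effects to produce plant medicinal drugs)
Physiomics (physiological dynamics and functions of whole plant)
Interactomics (bioinformatical and biological study of interactions among plant molecules such as proteins, lipids etc. within a plant cell or organs)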
The journal is open access (OA); readers can obtain its full text free of charge.

Website: http://www.pomics.com/
Editorial board: http://www.pomics.com/editorial.html
Author guidelines: http://www.pomics.com/guidlines.html
Online submission: http://www.pomics.com/login.html

Most-cited papers in Plant Omics:

1. Title: Efficient in vitro plant regeneration, flowering and fruiting of dwarf Tomato cv. Micro-Msk
Authors: Mamidala P, Nanna RS
Source: PLANT OMICS, Volume: 2, Issue: 3, Pages: 98-102, Published: MAY 2009
Times cited: 3

2. Title: Approaches for enhancing salt tolerance in mulberry (Morus L) - A review
Authors: Vijayan K
Source: PLANT OMICS, Volume: 2, Issue: 1, Pages: 41-59, Published: JAN 2009
Times cited: 3

3. Title: Proteomics profile of pre-harvest sprouting wheat by using MALDI-TOF Mass Spectrometry
Authors: Kamal AHM, Kim KH, Shin DH, et al.
Source: PLANT OMICS, Volume: 2, Issue: 3, Pages: 110-119, Published: MAY 2009
Times cited: 2

4. Title: Alterations in non-enzymatic antioxidant components of Catharanthus roseus exposed to paclobutrazol, gibberellic acid and Pseudomonas fluorescens
Authors: Jaleel CA, Gopi R, Panneerselvam R
Source: PLANT OMICS, Volume: 2, Issue: 1, Pages: 30-40, Published: JAN 2009
Times cited: 2
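The citation figures quoted for the journal (18 citations over 51 articles, an average of 0.35, and an h-index of 2) can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the citation list contains just the four counts given in the most-cited list above (3, 3, 2, 2). Assuming those four really are the most cited papers, no unlisted paper has more than 2 citations, so the unlisted papers cannot raise the h-index.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


# Citation counts of the four most-cited papers listed above.
top_cited = [3, 3, 2, 2]
# Totals reported for the journal in the SCI database.
total_citations, total_papers = 18, 51

print("h-index:", h_index(top_cited))
print("mean citations per article:", round(total_citations / total_papers, 2))
```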
Patrick O Brown
Biochemistry and HHMI, Stanford University, Stanford, USA
Head of Section: Genomics Genetics Genomics

Selected publications and research performance:
http://scholar.google.com/scholar?hl=en&q=Patrick+O+Brown&btnG=Search&as_sdt=2000&as_ylo=&as_vis=0

Cluster analysis and display of genome-wide expression patterns
MB Eisen, PT Spellman, PO Brown, ... - Proceedings of the ..., 1998 - National Acad Sciences
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the ...
Cited by 9913

Quantitative monitoring of gene expression patterns with a complementary DNA microarray
M Schena, D Shalon, RW Davis, PO Brown - Science, 1995 - AAAS
Cited by 6923

Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
..., W Wilson, MR Grever, JC Byrd, D Botstein, PO Brown, ... - Nature, 2000 - nature.com
Diffuse large B-cell lymphoma (DLBCL), the most common subtype of non-Hodgkin's lymphoma, is clinically heterogeneous: 40% of patients respond well to current therapy and have prolonged survival, whereas the remainder succumb to the disease. We proposed that ...
Cited by 5096

Molecular portraits of human breast tumours
..., SX Zhu, PE Lønning, AL Børresen-Dale, PO Brown, D ... - Nature, 2000 - nature.com
Human breast tumours are diverse in their natural history and in their responsiveness to treatments. Variation in transcriptional programs accounts for much of the biological diversity of human cells and tumours. In each cell, signal transduction and regulatory systems transduce ...
Cited by 3713

Exploring the metabolic and genetic control of gene expression on a genomic scale
JL DeRisi, VR Iyer, PO Brown - Science, 1997 - sciencemag.org
DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used to carry out a comprehensive investigation of the temporal program of gene expression accompanying the metabolic shift from fermentation to respiration. The expression profiles observed for ...
Cited by 3645

Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization
..., MB Eisen, PO Brown, D Botstein, B ... - Molecular biology of ..., 1998 - Am Soc Cell Biol
In 1981 Hereford and coworkers discovered that yeast histone mRNAs oscillate in abundance during the cell division cycle (Hereford et al., 1981). To date 104 messages that are cell cycle regulated have been identified using traditional methods, and it was estimated that some ...
Cited by 3383

Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications
... JC Matese, PO Brown, D Botstein, PE ... - Proceedings of the ..., 2001 - National Acad Sciences
The purpose of this study was to classify breast carcinomas based on variations in gene expression patterns derived from cDNA microarrays and to correlate tumor characteristics to clinical outcome. A total of 85 cDNA microarray experiments representing 78 cancers, three ...
Cited by 2902

Genomic expression programs in the response of yeast cells to environmental changes
..., MB Eisen, G Storz, D Botstein, PO Brown - Molecular biology of ..., 2000 - Am Soc Cell Biol
We explored genomic expression patterns in the yeast Saccharomyces cerevisiae responding to diverse environmental transitions. DNA microarrays were used to measure changes in transcript levels over time for almost every yeast gene, as cells responded to temperature shocks, ...
Cited by 2151

Exploring the new world of the genome with DNA microarrays
PO Brown, D Botstein - Nature Genetics, 1999 - ctu.edu.vn
The genome project has revitalized exploration in biological research. Not long ago, it was possible for biologists to imagine that the genes that had been discovered via mutations, selections and cloning schemes represented a good approximation of the total universe of genes, and ...
Cited by 1919

Use of a cDNA microarray to analyse gene expression patterns in human cancer
J DeRisi, L Penland, PO Brown, ML Bittner, PS ... - Nature ..., 1996 - cmgm.stanford.edu
Cited by 1699
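How far are we from artificial life?

By 量子猫

In the past couple of days, Craig Venter's successful transfer of a whole genome into a mycoplasma cell has set off a new wave of discussion about artificial life, from Science to The New York Times to all kinds of newspapers and websites. But as one commenter promptly pointed out: "Scientists have been putting synthetic pieces of DNA into bacteria for decades now - it's at the heart of recombinant DNA technology. This is just the logically linear culmination of that process - introducing an entire genome. To call it synthetic life is not science, it is marketing or PR." Read in context, "this" refers to Craig Venter's work: it merely carries a decades-old process to its logical end by introducing an entire genome.

Clearly, recombinant DNA technology, which mutates, cuts, recombines and otherwise processes parts of the DNA of natural organisms and then introduces the result into a host cell where it can replicate, only reproduces and shortens, to some degree, the natural process of evolution. It produces traits useful to humans in the shortest possible time, for example making insulin and other drugs by genetic engineering, or breeding drought-resistant and pest- and disease-resistant transgenic crops. This is not artificial life in the true sense. Moreover, because organisms are constrained by their own extremely complex system structure and self-consistent internal regulation, this kind of purpose-driven, artificially accelerated evolution has limits: the chance of success will keep falling, the cost in labor and money will keep rising, and in the end the technology will lose both its research significance and its momentum.

Just as any engineering project needs a blueprint, artificial life requires, at the very least, a full understanding of all gene regulation in the simplest single cell and of the corresponding structures and functions of the organism; only then can a wholly new species be rationally designed. Until then, artificial life or synthetic biology, including the making of artificial organs by directed induction of stem cells, remains a concept at an early stage of research. Any claim to have made artificial life or artificial organs, or to be able to make them soon, is just hype, intentional or not.

So how far are we from real artificial life? The answer may disappoint: very, very far. Even more disappointing, we have not yet worked out just how far.

Genes encode the structures of all the key molecular components of a living individual, together with their regulatory and metabolic mechanisms. Sequencing the whole genome of an individual organism, humans included, stopped being a problem 10 years ago. Yet despite steady progress we still know very little about the function of individual genes, that is, about the proteins and other functional components they encode and how these regulate one another. In principle, the techniques for analyzing, separating and purifying all the proteins and other molecular components of an organism or cell, then analyzing their structures and studying their interactions and regulation, have long been mature, and can even be run at large scale, at high throughput and with a high degree of automation. But we still do not know exactly how many basic kinds of protein life consists of, or even how many kinds of protein make up the simplest single cell. When it comes to specific problems, especially low-abundance or rare proteins, we can still hardly find and capture them despite repeated innovations in technique, let alone separate and purify them. It has also been found that some intermediate processes depend on interactions among multiple protein factors; once these components are separated they lose their activity and function, and the research cannot proceed. Furthermore, even when a protein can be purified, not every protein can be crystallized for three-dimensional structure analysis; among the proteins that can already be isolated, those whose structure cannot be determined far outnumber those whose structure can. Without structures, it is hard to work out fully how these components interact, and so to understand the construction and regulation of the whole organism.

These are probably only some of the difficulties. To be precise, we do not yet know how far we are from fully understanding all the gene-regulatory relationships of the simplest cell, that is, the blueprint of the simplest life. As research goes deeper we are likely to meet more difficulties that current technology cannot overcome; in other words, we will find the road even longer. Molecular biology needs not only technical breakthroughs but also new ideas and methods.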
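Of course, there is no need to doubt humanity's capacity for scientific innovation. Only when these difficulties show themselves one by one do new methods and technologies become possible, and only then can the difficulties finally be overcome. Nor is there any need to be alarmed by real artificial life; mature technology can overcome all the negative consequences now being imagined. The technology involved will be extremely complex and will obviously be protected by layers of intellectual property; it is not something an individual or an ordinary institution could carry out. In addition, artificial life can differ from today's transgenic organisms. For example, if artificial life is built from certain specific genes or molecular components, we can kill it at any time by controlling those components, or make it unable to survive outside a specific artificial or natural environment. That is, we can use functional bacteria to clean up environmental pollution and functional plants to turn deserts into oases. We can also build all kinds of biological factories: to produce food efficiently so that farmland can be retired to improve and beautify the environment, to manufacture precisely targeted antibody drugs against various diseases, to make substitutes for coal, oil and other fuels by efficiently using light and atmospheric carbon dioxide while easing warming, and so on, all without worrying that these artificial organisms will harm the environment or people.

Many scientists who work in this field and have made major contributions work very hard, yet remain unknown and dedicated, sometimes in difficult working conditions. By contrast, some who have made no fundamental or major contribution to the field have, through commercial activity, gained both fame and fortune and long since become billionaires, while leading research astray, misleading the public and wasting public resources. This is an unfortunate erosion of the purity of scientific research by commercial or utilitarian aims.

The material basis of human intelligence, the nervous system, has kept evolving through adaptation to the environment. Humans have long since become the masters of the Earth. They keep exploring themselves and their environment in depth and breadth, and through continual trial and error they have developed science. Science takes as its axioms those human experiences that are most widespread and consistent with practice, and on that basis extends our knowledge of ourselves and our environment through strict logical deduction. Once a new phenomenon or problem is found to conflict with the established principles of science, new axioms are created, which gives science a built-in self-correcting mechanism. So when scientists find that some research result may have negative consequences, they promptly design new methods or measures to improve matters. Likewise, when a science-related concept is misrepresented, or someone uses scientific concepts for hype, intentional or not, someone will soon step forward to correct it.

Only when scientific ideals, rather than commercial interests or utilitarian aims, increasingly guide social development can human society become a better place.

(Reposted from the 新语丝 reading forum, May 22, 2010)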
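http://news.ifeng.com/mainland/200912/1213_17_1472585.shtml

December 13, 2009, 05:35, China Youth Online (中青在线), China Youth Daily

Guangzhou, December 12 (reporter Lin Jie): This reporter learned today from South China University of Technology that on December 7 Nature Biotechnology, the biotechnology journal of the internationally renowned Nature family, published the collaborative research paper "Building the sequence map of the human pan-genome", led by BGI-Shenzhen (深圳华大基因研究院) with South China University of Technology as a major participant.

The average age of the research team behind this major result is under 25. The youngest are Luo Ruibang (罗锐邦), a joint first author, and Jin Xin (金鑫), another named author. They are, respectively, third-year and fourth-year undergraduates at South China University of Technology, and both are students in the Genome Science Innovation Class run by the university and BGI-Shenzhen.

In the paper, the authors describe a major advance in human genome research, the discovery that the human genome contains population-specific and even individual-specific DNA sequences and functional genes, and for the first time propose the concept of the human pan-genome.

During anonymous peer review, one scientist praised the paper generously: "This is an exciting, thought-provoking, rigorous and clear article. Besides detecting and classifying the novel sequences, the article uses quite interesting original analyses to deepen our understanding of the population diversity and evolutionary conservation that these novel sequences reveal."

For South China University of Technology, an undergraduate paper in a leading international journal is no accident.

In 2008, two of the university's graduating undergraduates from the 2004 intake took part in and contributed to BGI-Shenzhen's YH (炎黄一号) genome study, becoming authors of a paper published as a cover article in Nature.

This August, another undergraduate in the innovation class, Shao Haojing (邵浩靖), was a named author of a paper in Science titled "Complete Resequencing of 40 Genomes Reveals Domestication Events and Genes in Silkworm (Bombyx)".

Li Yuanyuan, president of South China University of Technology, said the university is actively exploring cooperation with top research institutions to train academic talent, and is also exploring mechanisms by which universities and enterprises can jointly train innovative talent. Joining with BGI to set up the Genome Science Innovation Class is a bold innovation in the university's industry-university-research cooperative education. The university currently invests 1 million yuan a year in dedicated funds to support projects that explore and practice such cooperative education.

http://www.sciencenet.cn/htmlnews/2009/12/225947.shtm

Chinese scientists are first to propose the human pan-genome

The human genome contains population-specific and even individual-specific DNA sequences and functional genes

After persistent research, Chinese researchers have made a major new advance in human genome research: the discovery that the human genome contains population-specific and even individual-specific DNA sequences and functional genes. The researchers also proposed, for the first time, the concept of the human pan-genome.

The research paper "Building the sequence map of the human pan-genome", led by BGI-Shenzhen with the participation of South China University of Technology, was published on December 7 in the internationally renowned journal Nature Biotechnology.

In the study, the researchers used second-generation sequencing and a genome assembly tool developed in-house to carry out further deep sequencing and assembly of the YH genome, the first personal genome of an Asian individual. They found that, in addition to the previously recognized single nucleotide polymorphisms, insertion/deletion polymorphisms and structural variations, the human genome contains population-specific and even individual-specific DNA sequences and functional genes, for example gene sequences found mainly in Asian populations.

The researchers also reassembled the African and Korean genomes published in the past two years and reached similar conclusions. They further proposed, for the first time, the concept of the human pan-genome, namely the sum of the gene sequences of the human population.

The reference genome sequence completed by the international Human Genome Project on the basis of European DNA is the data foundation for the vast majority of human genomics research today. For years, most studies have assumed that every individual's genome resembles this reference genome, differing only by substitutions or rearrangements.

Experts say the study sets a new standard for human genome sequencing, further demonstrates the need to independently build a medical genomics map of the Chinese population and to advance personal genome and personalized medicine research, and is another important contribution by Chinese scientists to human genome research.

During anonymous peer review, one scientist commented: "This is an exciting, thought-provoking, rigorous and clear article. Besides detecting and classifying the novel sequences, the article uses quite interesting and original analytical methods to strengthen our understanding of the population diversity and evolutionary conservation that these novel sequences reveal."

Further reading

Abstract of the Nature Biotechnology paper (in English)
http://www.nature.com/nbt/journal/vaop/ncurrent/abs/nbt.1596.html

Analysis abstract
Nature Biotechnology
Published online: 7 December 2009 | doi:10.1038/nbt.1596

Building the sequence map of the human pan-genome

Ruiqiang Li (1,2,7), Yingrui Li (1,7), Hancheng Zheng (1,3,7), Ruibang Luo (1,3,7), Hongmei Zhu (1), Qibin Li (1), Wubin Qian (1), Yuanyuan Ren (1), Geng Tian (1), Jinxiang Li (1), Guangyu Zhou (1), Xuan Zhu (1), Honglong Wu (1,6), Junjie Qin (1), Xin Jin (1,3), Dongfang Li (1,6), Hongzhi Cao (1,6), Xueda Hu (1), Hélène Blanche (4), Howard Cann (4), Xiuqing Zhang (1), Songgang Li (1), Lars Bolund (1,5), Karsten Kristiansen (1,2), Huanming Yang (1), Jun Wang (1,2) and Jian Wang (1)

Abstract

Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified about 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present in patterns consistent with known human migration paths. Cross-species conservation analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain about 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly.

1. BGI-Shenzhen, Shenzhen 518083, China.
2. Department of Biology, University of Copenhagen, Copenhagen, Denmark.
3. School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China.
4. Fondation Jean Dausset, Centre d'Étude du Polymorphisme Humain (CEPH), Paris, France.
5. Institute of Human Genetics, University of Aarhus, Aarhus, Denmark.
6. Genome Research Institute, Shenzhen University Medical School, Shenzhen, China.
7. These authors contributed equally to this work.

Correspondence to: Jun Wang (1,2), e-mail: wangj@genomics.org.cn
Correspondence to: Jian Wang (1), e-mail: wangjian@genomics.org.cn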
Towards an accurate sequence of the rice genome
Delseny M. Towards an accurate sequence of the rice genome. Curr Opin Plant Biol. 2003 Apr; 6(2): 101-5.
Several more- or less-elaborated rice genome sequences have been produced recently using different strategies. It has become possible to compare them and to unravel the major features of the rice genome in terms of nucleotide composition, repeats, gene content and variability. It has also become possible to compare the rice and Arabidopsis genomes and to evaluate rice as a model genome.

Comparing the whole-genome-shotgun and map-based sequences of the rice genome
Yu J, Ni P, Wong GK. Comparing the whole-genome-shotgun and map-based sequences of the rice genome. Trends Plant Sci. 2006 Aug; 11(8): 387-91. Epub 2006 Jul 13.
The rice genome has now been sequenced using whole-genome-shotgun and map-based methods. The relative merits of the two methods are the subject of debate, as they were in the human genome project. In this Opinion article, we will show that the serious discrepancies between the resultant sequences are mostly found in the large transposable elements such as copia and gypsy that populate the intergenic regions of plant genomes. Differences in published gene counts and polymorphism rates are similarly resolved by considering how transposable elements affect the sequence analysis.

Diversity in the Oryza genus
Vaughan DA, Morishima H, Kadowaki K. Diversity in the Oryza genus. Curr Opin Plant Biol. 2003 Apr; 6(2): 139-46.
The pan-tropical wild relatives of rice grow in a wide variety of habitats: forests, savanna, mountainsides, rivers and lakes. The completion of the sequencing of the rice nuclear and cytoplasmic genomes affords an opportunity to widen our understanding of the genomes of the genus Oryza. Research on the Oryza genus has begun to help to answer questions related to domestication, speciation, polyploidy and ecological adaptation that cannot be answered by studying rice alone. The wild relatives of rice have furnished genes for the hybrid rice revolution, and other genes from Oryza species with major impact on rice yields and sustainable rice production are likely to be found. Care is needed, however, when using wild relatives of rice in experiments and in interpreting the results of these experiments. Careful checking of species identity, maintenance of herbarium specimens and recording of Genbank accession numbers of material used in experiments should be standard procedure when studying wild relatives of rice.

Genome-wide intraspecific DNA-sequence variations in rice
Han B, Xue Y. Genome-wide intraspecific DNA-sequence variations in rice. Curr Opin Plant Biol. 2003 Apr; 6(2): 134-8.
Genome-wide comparative analysis of the DNA sequences of two major cultivated rice subspecies, Oryza sativa L. ssp indica and Oryza sativa L. ssp japonica, have revealed their extensive microcolinearity in gene order and content. However, deviations from colinearity are frequent owing to insertions or deletions. Intraspecific sequence polymorphisms commonly occur in both coding and non-coding regions. These variations often affect gene structures and may contribute to intraspecific phenotypic adaptations.

Sequencing the maize genome
Martienssen RA, Rabinowicz PD, O'Shaughnessy A, McCombie WR. Sequencing the maize genome. Curr Opin Plant Biol. 2004 Apr; 7(2): 102-7.
Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

Genomic diversity in forest trees
Savolainen O, Pyhäjärvi T. Genomic diversity in forest trees. Curr Opin Plant Biol. 2007 Apr; 10(2): 162-7. Epub 2007 Feb 9.
Forest trees in general are out-crossing, long-lived, and at early stages of domestication. Molecular evolution at neutral sites is very slow because of the long generation times. Transferring information between closely related conifer species is facilitated by high sequence similarity. At the nucleotide level, trees have at most intermediate levels of variation relative to other plants. Importantly, in many species linkage disequilibrium within genes declines within less than 1000 bp. In contrast to the slow rate of neutral evolution, large tree populations respond rapidly to natural selection. Detecting traces of selection may be easier in tree populations than in many other species. Association studies between genotypes and phenotypes are proving to be useful tools for functional genomics.

Complex gene families in pine genomes

Jumping genes and maize genomics
Comparison of rice and Arabidopsis annotation
Schoof H, Karlowski WM. Comparison of rice and Arabidopsis annotation. Curr Opin Plant Biol. 2003 Apr; 6(2): 106-12.
Several versions of the rice genome were published in 2002, providing a first overview of the genome content of this model monocot. At the same time, the genome of the model dicot, Arabidopsis thaliana, reached a new level of annotation as thousands of full-length cDNA sequences were integrated with the genome sequence.

The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes
Schranz ME, Lysak MA, Mitchell-Olds T. The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci. 2006 Nov; 11(11): 535-42. Epub 2006 Oct 6.
In this review we summarize recent advances in our understanding of phylogenetics, polyploidization and comparative genomics in the family Brassicaceae. These findings pave the way for a unified comparative genomic framework. We integrate several of these findings into a simple system of 24 conserved chromosomal blocks (labeled A-X). The naming, order, orientation and color-coding of these blocks are based on their positions in a proposed ancestral karyotype (n=8), rather than by their position in the reduced genome of Arabidopsis thaliana (n=5). We show how these crucifer building blocks can be rearranged to model the genome structures of A. thaliana, Arabidopsis lyrata, Capsella rubella and Brassica rapa. A framework for comparison between species is timely because several crucifer genome-sequencing projects are underway.

Comparative biology comes into bloom: genomic and genetic comparison of flowering pathways in rice and Arabidopsis
Izawa T, Takahashi Y, Yano M. Comparative biology comes into bloom: genomic and genetic comparison of flowering pathways in rice and Arabidopsis. Curr Opin Plant Biol. 2003 Apr; 6(2): 113-20.
Huge advances in plant biology are possible now that we have the complete genome sequences of several flowering plants. Now, genomes can be comprehensively compared and map-based cloning can be performed more easily. Association study is emerging as a powerful method for the functional identification of genes and molecular genetics has begun to reveal the basis of plant diversity. Taking the flowering pathways as an example, we discuss the potential of several approaches to comparative biology.

Unveiling the molecular arms race between two conflicting genomes in cytoplasmic male sterility?
Touzet P, Budar F. Unveiling the molecular arms race between two conflicting genomes in cytoplasmic male sterility? Trends Plant Sci. 2004 Dec; 9(12): 568-70.
Cytoplasmic male sterility can be thought of as the product of a genetic conflict between two genomes that have different modes of inheritance. Male sterilizing factors, generally encoded by chimeric mitochondrial genes, can be down-regulated by specific nuclear restorer genes. The recent cloning of a restorer gene in rice and its comparison with restorer genes cloned in petunia and radish could be regarded as the beginning of a general molecular scenario in this peculiar arms race.
The genetic colinearty of rice and other cereals on the basis of genomic sequence analysis Bennetzen JL, Ma J. The genetic colinearity of rice and other cereals on the basis of genomic sequence analysis. Curr Opin Plant Biol. 2003 Apr; 6 (2): 128-33. Small segments of rice genome sequence have been compared with that of the model plant Arabidopsis thaliana and with several closer relatives, including the cereals maize, rice, sorghum, barley and wheat. The rice genome is relatively stable relative to those of other grasses. Nevertheless, comparisons with other cereals have demonstrated that the DNA between cereal genes is highly variable and evolves rapidly. Genic regions have undergone many more small rearrangements than have been revealed by recombinational mapping studies. Tandem gene duplication/deletion is particularly common, but other types of deletions, inversions and translocations also occur. The many thousands of small genic rearrangements within the rice genome complicate but do not negate its use as a model for larger cereal genomes. The genetic colinearty of rice and other cereals on the basis of genomic sequence analysis Updating the crop circle Devos KM. Updating the 'crop circle'. Curr Opin Plant Biol. 2005 Apr; 8 (2): 155-62. Comparative analyses unravel the relationships between genomes of related species. The most comprehensive comparative dataset obtained to date is from the grass family, which contains all of the major cereals. Early studies aimed to identify chromosomal regions that have remained conserved over long evolutionary time periods, but in recent years, researchers have focused more on the extent of colinearity at the DNA-sequence level. The latter studies have uncovered many small rearrangements that disturb colinearity in orthologous chromosome regions. In part, genomes derive their plasticity from genome- and gene-amplification processes. 
Duplicated gene copies are more likely to escape selective constraints and thus move to other regions of the genome, where they might acquire new functions or become deleted. These rearrangements will affect map applications. The most popular applications, especially since the complete rice genomic sequence has been available, are the use of comparative data in the generation of new markers to tag traits in other species and to identify candidate genes for these traits. The isolation of genes underlying orthologous traits is the first step in conducting comparative functional studies. Updating the crop circle Colinearty and gene density in grass genomes Keller B, Feuillet C. Colinearity and gene density in grass genomes. Trends Plant Sci. 2000 Jun; 5 (6): 246-51. Grasses are the single most important plant family in agriculture. In the past years, comparative genetic mapping has revealed conserved gene order (colinearity) among many grass species. Recently, the first studies at gene level have demonstrated that microcolinearity of genes is less conserved: small scale rearrangements and deletions complicate the microcolinearity between closely related species, such as sorghum and maize, but also between rice and other crop plants. In spite of these problems, rice remains the model plant for grasses as there is limited useful colinearity between Arabidopsis and grasses. However, studies in rice have to be complemented by more intensive genetic work on grass species with large genomes (maize, Triticeae). Gene-rich chromosomal regions in species with large genomes, such as wheat, have a high gene density and are ideal targets for partial genome sequencing. Colinearty and gene density in grass genomes Comparison of genes among cereals Ware D, Stein L. Comparison of genes among cereals. Curr Opin Plant Biol. 2003 Apr; 6 (2): 121-7. 
Comparison of partially sequenced cereal genomes suggests a mosaic structure consisting of recombinationally active gene-rich islands that are separated by blocks of high-copy DNA. Annotation of the whole rice genome suggests that most, but not all, cereal genes are present within the rice genome and that the high number of reported genes in this genome is probably due to duplications. Within the cereals, macrocolinearity is conserved but, at the level of individual genes, microcolinearity is frequently disrupted. Preliminary evidence from limited comparative analysis of sequenced orthologous genomic segments suggests that local gene amplification and translocation within a plant genome may be linked in some cases. Comparison of genes among cereals Patterns in grass genome evolution Bennetzen JL. Patterns in grass genome evolution. Curr Opin Plant Biol. 2007 Apr; 10 (2): 176-81. Epub 2007 Feb 8. Increasingly comprehensive, species-rich, and large-scale comparisons of grass genome structure have uncovered an even higher level of genomic rearrangement than originally observed by recombinational mapping or orthologous clone sequence comparisons. Small rearrangements are exceedingly abundant, even in comparisons of closely related species. The mechanisms of these small rearrangements, mostly tiny deletions caused by illegitimate recombination, appear to be active in all of the plant species investigated, but their relative aggressiveness differs dramatically in different plant lineages. Transposable element amplification, including the acquisition and occasional fusion of gene fragments from multiple loci, is also common in all grasses studied, but has been a much more major contributor in some species than in others. The reasons for these quantitative differences are not known, but it is clear that they lead to species that have very different levels of genomic instability. 
Similarly, polyploidy and segmental duplication followed by gene loss are standard phenomena in the history of all flowering plants, including the grasses, but their frequency and final outcomes are very different in different lineages. Now that genomic instability has begun to be characterized in detail across an array of plant species, it is time for comprehensive studies to investigate the relationships between particular changes in genome structure and organismal function or fitness.

The rice genome and comparative genomics of higher plants

Leafing through the genomes of our major crop plants: strategies for capturing unique information
Paterson AH. Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat Rev Genet. 2006 Mar; 7 (3): 174-84.
Crop plants not only have economic significance, but also comprise important botanical models for evolution and development. This is reflected by the recent increase in the percentage of publicly available sequence data that are derived from angiosperms. Further genome sequencing of the major crop plants will offer new learning opportunities, but their large, repetitive, and often polyploid genomes present challenges. Reduced-representation approaches - such as EST sequencing, methyl filtration and Cot-based cloning and sequencing - provide increased efficiency in extracting key information from crop genomes without full-genome sequencing. Combining these methods with phylogenetically stratified sampling to allow comparative genomic approaches has the potential to further accelerate progress in angiosperm genomics.

Genomics tools for QTL analysis and gene discovery
Borevitz JO, Chory J. Genomics tools for QTL analysis and gene discovery. Curr Opin Plant Biol. 2004 Apr; 7 (2): 132-6.
In recent years, several new genomics resources and tools have become available that will greatly assist quantitative trait locus (QTL) mapping and cloning of the corresponding genes. Genome sequences, tens of thousands of molecular markers, microarrays, and knock-out collections are being applied to QTL mapping, facilitating the use of natural accessions for gene discovery.

Tandem gene arrays: a challenge for functional genomics
Jander G, Barth C. Tandem gene arrays: a challenge for functional genomics. Trends Plant Sci. 2007 May; 12 (5): 203-10. Epub 2007 Apr 9.
In sequenced plant genomes, 15% or more of the identified genes are members of tandem-arrayed gene families. Because mutating only one gene in a duplicated pair often produces no measurable phenotype, this poses a particular challenge for functional analysis. To generate phenotypic knockouts, it is necessary to create deletions that affect multiple genes, select for rare meiotic recombination between tightly linked loci, or perform sequential mutant screens in the same plant line. Successfully implemented strategies include PCR-based screening for fast neutron-induced deletions, selection for recombination between herbicide resistance markers, and localized transposon mutagenesis. Here, we review the relative merits of current genetic approaches and discuss the prospect of site-directed mutagenesis for generating elusive knockouts of tandem-arrayed gene families.

Re-evaluating the relevance of ancestral shared synteny as a tool for crop improvement
Delseny M. Re-evaluating the relevance of ancestral shared synteny as a tool for crop improvement. Curr Opin Plant Biol. 2004 Apr; 7 (2): 126-31.
In addition to the Arabidopsis and rice genomic sequences, numerous expressed sequence tags (ESTs) and sequenced tag sites are now available for many species. These tools have made it possible to re-evaluate the extent of synteny and collinearity not only between Arabidopsis and related crops or between rice and other cereals but also between Arabidopsis and rice, between Arabidopsis and other dicots, and between cereals other than rice. Major progress in describing synteny relies on statistical tests. Overall, the data point to the occurrence of ancestral genome fragments in which a framework of common markers can be recognised. Micro-synteny studies reveal numerous rearrangements, which are likely to complicate map-based cloning strategies that use information from a model genome.

Synteny: recent advances and future prospects
Schmidt R. Synteny: recent advances and future prospects. Curr Opin Plant Biol. 2000 Apr; 3 (2): 97-102.
Their small sizes have meant that the Arabidopsis and rice genomes are the best-studied of all plant genomes. Although even closely related plant species can show large variations in genome size, extensive genome colinearity has been established at the genetic level and recently also at the gene level. This allows the transfer of information and resources assembled for rice and Arabidopsis to be used in the genome analysis of many other plants.

Synergy between sequence and size in large-scale genomics
Gregory TR. Synergy between sequence and size in large-scale genomics. Nat Rev Genet. 2005 Sep; 6 (9): 699-708.
Until recently the study of individual DNA sequences and of total DNA content (the C-value) sat at opposite ends of the spectrum in genome biology. For gene sequencers, the vast stretches of non-coding DNA found in eukaryotic genomes were largely considered to be an annoyance, whereas genome-size researchers attributed little relevance to specific nucleotide sequences. However, the dawn of comprehensive genome sequencing has allowed a new synergy between these fields, with sequence data providing novel insights into genome-size evolution, and with genome-size data being of both practical and theoretical significance for large-scale sequence analysis. In combination, these formerly disconnected disciplines are poised to deliver a greatly improved understanding of genome structure and evolution.

Transposable elements and the plant pan-genomes
Morgante M, De Paoli E, Radovic S. Transposable elements and the plant pan-genomes. Curr Opin Plant Biol. 2007 Apr; 10 (2): 149-55. Epub 2007 Feb 14.
The comparative sequencing of several grass genomes has revealed that transposable elements are largely responsible for extensive variation in both intergenic and local genic content, not only between closely related species but also among individuals within a species. These observations indicate that a single genome sequence might not reflect the entire genomic complement of a species, and prompted us to introduce the concept of the plant pan-genome, which includes core genomic features that are common to all individuals and a dispensable genome composed of partially shared and/or non-shared DNA sequence elements. Uncovering the intriguing nature of the dispensable genome, namely its composition, origin and function, represents a step forward towards an understanding of the processes that generate genetic diversity and phenotypic variation. The developing view of transcriptional regulation as a complex and modular system, in which long-range interactions and the involvement of transposable elements are frequently observed, lends support to the possibility of an important functional role for the dispensable genome and could make it less dispensable than previously thought.

Flux an important, but neglected, component of functional genomics
Fernie AR, Geigenberger P, Stitt M. Flux an important, but neglected, component of functional genomics. Curr Opin Plant Biol. 2005 Apr; 8 (2): 174-82.
Genomics approaches aimed at understanding metabolism currently tend to involve mainly expression profiling, although proteomics and steady-state metabolite profiling are increasingly being carried out as alternative strategies. These approaches provide rich information on the inventory of the cell. It is, however, of growing importance that such approaches are augmented by sophisticated integrative analyses and a higher-level understanding of cellular dynamics to provide insights into mechanisms that underlie biological processes.
We argue the need for, and discuss theoretical and practical aspects of, the determination of metabolic flux as a component of functional genomics.

Genomics of sex chromosomes
Ming R, Moore PH. Genomics of sex chromosomes. Curr Opin Plant Biol. 2007 Apr; 10 (2): 123-30. Epub 2007 Feb 14.
Sex chromosomes in plants and animals are distinctive, not only because of their gender-determining role but also for genomic features that reflect their evolutionary history. The genomic sequences in the ancient sex chromosomes of humans and in the incipient sex chromosomes of medaka, stickleback, papaya, and poplar exhibit unusual features as consequences of their evolution. These include the enormous palindrome structure in human MSY, a duplicated genomic fragment that evolved into a Y chromosome in medaka, and a 700 kb extra telomeric sequence of the W chromosome in poplar. Comparative genomic analysis of ancient and incipient sex chromosomes highlights common features that implicate the selection forces that shaped them, even though evolutionary origin, pace, and fate vary widely among individual sex-determining systems.

And then there were many: MADS goes genomic
De Bodt S, Raes J, Van de Peer Y, Theissen G. And then there were many: MADS goes genomic. Trends Plant Sci. 2003 Oct; 8 (10): 475-83.
During the past decade, MADS-box genes have become known as key regulators in both reproductive and vegetative plant development. Traditional genetics and functional genomics tools are now available to elucidate the expression and function of this complex gene family on a much larger scale. Moreover, comparative analysis of the MADS-box genes in diverse flowering and non-flowering plants, boosted by bioinformatics, contributes to our understanding of how this important gene family has expanded during the evolution of land plants.
Therefore, the recent advances in comparative and functional genomics should enable researchers to identify the full range of MADS-box gene functions, which should help us significantly in developing a better understanding of plant development and evolution.

Plant functional genomics: beyond the parts list
Stewart CN Jr. Plant functional genomics: beyond the parts list. Trends Plant Sci. 2005 Dec; 10 (12): 561-2. Epub 2005 Nov 14.

Genomics-deeper and wider in order to understand plant diversity

The consequences of gene and genome duplication in plants