Original content; please do not repost.

Research ideas from knowledge discovery

Research ideas come from research practice, from the literature, and from exchanges with peers. Research ideas and topic selection can also come from knowledge discovery. The "Knowledge Discovery" column of my blog, http://blog.sciencenet.cn/home.php?mod=spaceuid=280034do=blogclassid=115378view=mefrom=space , contains many worked examples and analyses of knowledge discovery in scientific research.

Knowledge discovery

A new genetics study has revealed two new genes associated with severe childhood epilepsy and offers a new strategy for identifying therapeutic targets. The study used an advanced genetic technique called exome sequencing to search for new mutations that are not inherited. The results suggest that this may be an efficient way to find and confirm many disease-causing mutations. The paper was published in Nature on August 11.

Source: "Nature publishes new exome sequencing results", http://www.ebiotrade.com/newsf/2013-8/2013812110114335.htm

Arrowsmith search

A-query: exome sequencing
C-query: epilepsy

The B-list contains title words and phrases (terms) that appeared in both the A and the C literature. 58 articles appeared in both literatures and were not included in the process of computing the B-list. The results of this search are saved under id # 13416.

There are 460 terms on the current B-list (157 gene and protein concepts are predicted to be relevant), ranked according to predicted relevance. The list can be trimmed further with the tool's filters. To assess whether there appears to be a biologically significant relationship between the AB and BC literatures for a specific B-term, the corresponding AB and BC literatures can be viewed for any selected B-term.
http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/edit_b.cgi

job id # 13416 started Tue Aug 13 03:12:23 2013
Max_citations: 50000
Stoplist: /var/www/arrowsmith_uic/data/stopwords_pubmed
Ngram_max: 3
A_query_raw: exome sequencing Tue Aug 13 03:12:30 2013
A query = exome sequencing started Tue Aug 13 03:12:30 2013
A query resulted in 1680 titles
C_query_raw: exome sequencing Tue Aug 13 03:12:32 2013
C: exome sequencing 1680
A: pubmed_query_A 1680
AC: ( exome sequencing ) AND ( exome sequencing ) 1680
C_query_raw: epilepsy Tue Aug 13 03:12:41 2013
C: epilepsy 140710
C_query_raw: epilepsy Tue Aug 13 03:12:41 2013
A: pubmed_query_A 1680
AC: ( exome sequencing ) AND ( epilepsy ) 58
C: epilepsy 140710
A: pubmed_query_A 1680
AC: ( exome sequencing ) AND ( epilepsy ) 58
C_query_raw: epilepsy Tue Aug 13 03:12:43 2013
C: epilepsy 140710
A: pubmed_query_A 1680
AC: ( exome sequencing ) AND ( epilepsy ) 58
C query = epilepsy started Tue Aug 13 03:12:44 2013
C query resulted in 50000 titles
A AND C query resulted in 58 titles
4572 B-terms ready on Tue Aug 13 03:14:17 2013
Sem_filter: Genes Molecular Sequences, and Gene Protein Names
460 B-terms left after filter executed Tue Aug 13 03:14:46 2013

B-list on Tue Aug 13 03:18:21 2013 (rank, term)

1 brca1; 2 genome wide; 3 kcnq2; 4 genome sequencing; 5 single nucleotide polymorphism; 6 joubert syndrome; 7 ubiquitin; 8 gene encoding; 9 transporter gene; 10 stat3
11 potassium channel; 12 opioid receptor; 13 meningioma; 14 dystrophin; 15 genome project; 16 gene autism; 17 spinocerebellar ataxia; 18 wnt; 19 trk; 20 candidate gene
21 scn2a; 22 ryanodine receptor; 23 notch1; 24 chromatin remodeling; 25 swi; 26 quantitative trait locus; 27 moyamoya disease; 28 pi3k; 29 caspase; 30 glutamate receptor
31 helix loop helix; 32 trna; 33 exon; 34 polydactyly; 35 genomic; 36 abcc8; 37 myopia; 38 congenital cataract; 39 gene familial; 40 pten
41 alms1; 42 snp; 43 vhl; 44 nkx2-1; 45 leptin; 46 sod1; 47 hydrocephalus; 48 mody; 49 calcium channel; 50 tumor suppressor gene
51 essential tremor; 52 lamin; 53 intronic; 54 glioma; 55 susceptibility gene; 56 rac1; 57 neuroligin; 58 genome; 59 mitochondrial genome; 60 spastic paraplegia
61 trpv4; 62 tgfbeta; 63 twin; 64 cx3cr1; 65 retinitis pigmentosa; 66 transporter; 67 dystonia; 68 frameshift; 69 whole genome sequencing; 70 novel gene
71 genome array; 72 kcnj11; 73 gene autosomal; 74 multiple endocrine neoplasia; 75 trait; 76 sox2; 77 gene paroxysmal; 78 receptor gene; 79 alu; 80 cystatin
81 cdna; 82 ssri; 83 copy; 84 psen1; 85 chloride channel; 86 hdl; 87 slc19a3; 88 rbp4; 89 autophagy; 90 mpl
91 allelic heterogeneity; 92 tcf4; 93 diabetes; 94 trem2; 95 reading frame; 96 alzheimer disease; 97 inflammatory bowel disease; 98 gene mutated; 99 progressive external ophthalmoplegia; 100 aromatase
101 ret; 102 tumor suppressor; 103 cytokine; 104 apoe; 105 sirt1; 106 enhancer; 107 adam10; 108 hras; 109 helicase; 110 tremor
111 cone rod dystrophy; 112 kinase; 113 il-10; 114 q11; 115 breast cancer; 116 celiac disease; 117 mri; 118 codon; 119 alpha gene; 120 cdc42
121 hypertension; 122 gene patient; 123 related gene; 124 igf i; 125 van; 126 lip; 127 rna; 128 dna methyltransferase; 129 mitochondrial; 130 nf kappab
131 kelch; 132 spitz; 133 chaperone; 134 connexin; 135 dlx5; 136 domain; 137 ion; 138 grin2a; 139 gene associated; 140 fgf
141 lepr; 142 subunit gene; 143 hydroxylase gene; 144 locus; 145 motor neuron disease; 146 imprinted gene; 147 lacking; 148 cadherin; 149 sdha; 150 causative gene
151 scn9a; 152 hla; 153 meta; 154 gtpase; 155 gfap; 156 cyst; 157 map

http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/view_b_txt.cgi?ID=13416

Gene discovery example

A-term: exome sequencing
B-term: kcnq2
C-term: epilepsy

AB literature (exome sequencing, kcnq2)
1: Clinical spectrum of early onset epileptic encephalopathies caused by KCNQ2 mutation. 2013

BC literature (kcnq2, epilepsy)
1: Novel KCNQ2 Mutation in a Large Emirati Family With Benign Familial Neonatal Seizures. 2013
2: Ezogabine (KCNQ2/3 channel opener) prevents delayed activation of meningeal nociceptors if given before but not after the occurrence of cortical spreading depression. 2013
3: Similar early characteristics but variable neurological outcome of patients with a de novo mutation of KCNQ2. 2013
4: Video/EEG findings in a KCNQ2 epileptic encephalopathy: a case report and revision of literature data. 2013
5: KCNQ2 encephalopathy: Emerging phenotype of a neonatal epileptic encephalopathy. 2012
6: KCNQ2 abnormality in BECTS: Benign childhood epilepsy with centrotemporal spikes following benign neonatal seizures resulting from a mutation of KCNQ2. 2012
7: KCNQ2 Potassium Channel Epileptic Encephalopathy Syndrome: Divorce of an Electro-Mechanical Couple? 2012
8: Development and Validation of a Medium-Throughput Electrophysiological Assay for KCNQ2/3 Channel Openers Using QPatch HT. 2012
9: Role of KCNQ2 and KCNQ3 genes in juvenile idiopathic epilepsy in Arabian foals. 2012
10: Activation of KCNQ2/3 Potassium Channels by Novel Pyrazolo pyrimidin-7(4H)-One Derivatives. 2011

http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/show_sentences.cgi
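The core idea behind the B-list (title terms of up to three words that occur in both the A and the C literature, after a stoplist and after setting aside titles found in both) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny stoplist, the tokenizer, and the function names `title_terms` and `b_list` are made up here, and Arrowsmith's relevance ranking and semantic filter are not reproduced.

```python
import re

# Hypothetical mini stoplist; the logged run used a PubMed stopword file instead.
STOPWORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "the", "to", "with"}


def title_terms(title, ngram_max=3):
    """Return all 1..ngram_max word terms of a title.

    Terms that start or end with a stopword are skipped (an assumption
    about how a stoplist might be applied, not Arrowsmith's documented rule).
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    terms = set()
    for n in range(1, ngram_max + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            terms.add(" ".join(gram))
    return terms


def b_list(a_titles, c_titles, ngram_max=3):
    """Terms shared by the A and C literatures, ignoring titles present in both."""
    overlap = set(a_titles) & set(c_titles)  # the "AC" articles are set aside
    a_terms, c_terms = set(), set()
    for title in a_titles:
        if title not in overlap:
            a_terms |= title_terms(title, ngram_max)
    for title in c_titles:
        if title not in overlap:
            c_terms |= title_terms(title, ngram_max)
    return sorted(a_terms & c_terms)


# Invented toy titles, not taken from PubMed.
a_lit = ["Exome sequencing reveals KCNQ2 variants"]
c_lit = ["KCNQ2 potassium channel in neonatal epilepsy"]
print(b_list(a_lit, c_lit))
```

With the two invented titles above, the only shared term is `kcnq2`, which mirrors how a B-term links two literatures that otherwise share no title words.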
Nature, 2012-09-06: An expansive human regulatory lexicon encoded in transcription factor footprints

A research team led by the University of Washington reports DNase I footprints in 41 cell and tissue types, revealing millions of short sequence elements that encode conserved recognition sequences for DNA-binding proteins. (Nature, September 6, 2012)

Title: An expansive human regulatory lexicon encoded in transcription factor footprints

Authors: Shane Neph, Jeff Vierstra, Andrew B. Stergachis, Alex P. Reynolds, Eric Haugen, Benjamin Vernot, Robert E. Thurman, Sam John, Richard Sandstrom, Audra K. Johnson, Matthew T. Maurano, Richard Humbert, Eric Rynes, Hao Wang, Shinny Vong, Kristen Lee, Daniel Bates, Morgan Diegel, Vaughn Roach, Douglas Dunn, Jun Neri, Anthony Schafer, R. Scott Hansen, Tanya Kutyavin, Erika Giste, Molly Weaver, Theresa Canfield, Peter Sabo, Miaohua Zhang, Gayathri Balasundaram, Rachel Byron, Michael J. MacCoss, Joshua M. Akey, M. A. Bender, Mark Groudine, Rajinder Kaul, John A. Stamatoyannopoulos, et al.

Abstract: Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.

Original article: http://www.nature.com/nature/journal/v489/n7414/full/nature11212.html

Tags: NATURE; nature-2012-09-06; short sequence elements; regulatory factors; footprints; transcription factors

http://m.bioku.cn/201210/nature-regulatory-factor-transcription-footprints-short-sequence/

http://www.ncbi.nlm.nih.gov/pubmed/22955618
Nature. 2012 Sep 6;489(7414):83-90. doi: 10.1038/nature11212.
Source: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Comment in: Genomics: users' guide to the human genome.
PMID: 22955618

Data from this publication: Epigenomics experiments by feature type (277 experiments in total): DNA methylation (29), H2AK5ac (2), H2BK120ac (2), H2BK12ac (3), H2BK15ac (3), H2BK20ac (2), H3K14ac (2), H3K18ac (2), H3K23ac (2), H3K27ac (5), H3K27me3 (24), H3K36me3 (26), H3K4ac (2), H3K4me1 (17), H3K4me2 (2), H3K4me3 (28), H3K56ac (2), H3K79me1 (4), H3K79me2 (2), H3K9ac (14), H3K9me1 (1), H3K9me3 (22), H4K20me1 (2), H4K5ac (2), H4K8ac (4), H4K91ac (2), chromatin accessibility (40), gene expression (5), input control (22), small RNA analysis (4).

Publication Types: Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

MeSH Terms: DNA/genetics*; DNA Footprinting*; DNA Methylation; DNA-Binding Proteins/metabolism; Deoxyribonuclease I/metabolism; Encyclopedias as Topic*; Genome, Human/genetics*; Genomic Imprinting; Genomics; Humans; Molecular Sequence Annotation*; Polymorphism, Single Nucleotide/genetics; Regulatory Sequences, Nucleic Acid/genetics*; Transcription Factors/metabolism*; Transcription Initiation Site

Substances: DNA-Binding Proteins; Transcription Factors; DNA; Deoxyribonuclease I

Secondary Source ID: GEO/GSE18927; GEO/GSE26328

September 5, 2012
Millions of DNA switches that power human genome’s operating system are discovered
By Stephanie Seiler and Leila Gray

The locations of millions of DNA ‘switches’ that dictate how, when, and where in the body different genes turn on and off have been identified by a research team led by the University of Washington in Seattle. Genes make up only 2 percent of the human genome and were easy to spot, but the on/off switches controlling those genes were encrypted within the remaining 98 percent of the genome.
Without these switches, called regulatory DNA, genes are inert. Researchers around the world have been focused on identifying regulatory DNA to understand how the genome works. Using a new technology developed with funding from the National Human Genome Research Institute’s ENCODE (ENCyclopedia Of DNA Elements) project, UW researchers created the first detailed maps of where regulatory DNA is located within hundreds of different kinds of living cells. They also compiled a dictionary of the instructions written within regulatory DNA: the genome’s programming language.

[Illustration (Darryl Leja, NHGRI): DNA packed tightly into chromosomes, as well as a DNA molecule unwound to reveal its 3-D structure.]

The findings are reported in two papers appearing in the Sept. 5 online issue of Nature.

“These breakthrough studies provide the first extensive maps of the DNA switches that control human genes,” said Dr. John A. Stamatoyannopoulos, associate professor of genome sciences and medicine at the University of Washington, and senior author on both papers. “This information is vital to understanding how the body makes different kinds of cells, and how normal gene circuitry gets rewired in disease. We are now able to read the living human genome at an unprecedented level of detail, and to begin to make sense of the complex instruction set that ultimately influences a wide range of human biology.”

Here are the key results:

1) The first detailed maps of regulatory DNA switches that make up the genome’s ‘operating system’.

The instructions within regulatory DNA are inscribed in small DNA ‘words’ that function as the docking sites for special proteins involved in gene control. In many cases, these switches are located far away from the genes that they control.
To map the regulatory DNA regions, the researchers harnessed a special molecular probe, an enzyme called DNase I, that snips the genome’s DNA backbone. Under the right conditions, these snips occur precisely where proteins are docked at regulatory DNA. By treating cells with DNase I and analyzing the patterns of snipped DNA sequences using massively parallel sequencing technology and powerful computers, the researchers were able to create comprehensive maps of all the regulatory DNA in hundreds of different cell and tissue types.

They found that of the 2.89 million regulatory DNA regions they mapped, only a small fraction, around 200,000, were active in any given cell type. This fraction is almost totally unique to each type of cell and becomes a sort of molecular bar code of the cell’s identity.

The researchers also developed a method for linking regulatory DNA to the genes it controls. The results of these analyses show that the regulatory ‘program’ of most genes is made up of more than a dozen switches. Together, these findings greatly expand the understanding of how genes are controlled and how that control may differ between normal and diseased cells.

2) The first extensive map of regulatory protein docking sites on the human genome reveals the dictionary of DNA words that make up the genome’s programming language.

The instructions for turning genes on and off are written in DNA switches called regulatory DNA. These switches are scattered throughout the non-gene regions of the human genome. Having mapped the locations of the regulatory DNA switches, UW researchers wanted to know what made them tick.

These regions contain small chains of DNA ‘words’ that make up docking sites for special regulatory proteins involved in gene control. The human genome contains hundreds of genes that make such proteins. However, current technologies only allow such proteins to be studied one at a time.
They also lack the accuracy to resolve the DNA letters to which the proteins dock. As a result, most of the actual DNA words recognized by regulatory proteins in living cells were unknown.

To find them, the researchers employed a simple, powerful trick that enabled them to study all the proteins at once. Instead of trying to see proteins directly, they looked for their shadows or ‘footprints’ on the DNA. To accomplish this, they again turned to the DNase I enzyme that snips the DNA backbone within regulatory DNA. Prior work had shown that DNase I likes to snip DNA next to regulatory protein docking sites, but not within the docking site itself.

By using next-generation DNA sequencing technology, the researchers analyzed hundreds of millions of DNA backbone breaks made when cells were treated with DNase I. They then used a powerful computer to resolve millions of protein footprints. In total, they identified 8.4 million such footprints along the genome, some of which were detected in many cell types.

Next, they compiled all of the short DNA sequences to which the proteins were docked. They analyzed them using a software algorithm that required hundreds of microprocessors working simultaneously. This revealed that more than 90 percent of the protein docking sites were actually slight variants of 683 different DNA words, essentially a dictionary of the genome’s programming language.

“These findings significantly advance the understanding of how the instructions for controlling genes are written and organized throughout the genome, and how combinations of different instruction sets function together to control genes, often at great distance along the genome,” Stamatoyannopoulos said.
“The broad spectrum of cell and tissue types included in these analyses provide an incredibly rich resource that can be mined immediately by researchers around the world to illuminate how the genes they are studying are controlled.” The scientists determined that genes are connected in a complex web. In this web, regulatory DNA regions typically control one or at most a few genes, but genes receive inputs from large numbers of regulatory regions. The researchers also found evidence for a combinatorial code that helps match regulatory DNA with the right genes. Another key finding was that the regulatory DNA controlling genes involved in cancer and other types of ‘immortal’ cells that can keep on growing indefinitely appears to acquire mutations at a different rate than other kinds of regulatory DNA. This result points to a previously unknown link between genome function and patterns of DNA variation in individual human genomes. The finding may have implications for understanding susceptibility to cancer. The findings reported in these papers are expanded upon in two related papers to be published simultaneously in the journals Science and Cell. In the Science paper, UW researchers further expanded the regulatory DNA maps, and compared them with genetic maps of human disease. Their studies revealed that most DNA variants associated with specific human diseases or clinical traits are located in regulatory DNA rather than in gene sequences. In the Cell paper, the researchers describe using the detailed information on regulatory protein docking sites to create a comprehensive map of how those proteins are wired. http://www.washington.edu/news/2012/09/05/millions-of-dna-switches-that-power-human-genomes-operating-system-are-discovered/
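The footprinting idea in the press release (DNase I cuts next to a protein docking site but not within it) can be illustrated with a toy scan over per-base cleavage counts. This is a minimal sketch, not the algorithm used in the Nature paper: the scoring rule (flank mean minus center mean), the window sizes, the threshold, and the names `footprint_score` and `find_footprints` are all assumptions made for illustration.

```python
def footprint_score(cuts, start, width, flank):
    """Mean cleavage in the two flanks minus mean cleavage in the central window."""
    left = cuts[start - flank:start]
    center = cuts[start:start + width]
    right = cuts[start + width:start + width + flank]
    flank_mean = (sum(left) + sum(right)) / (len(left) + len(right))
    center_mean = sum(center) / len(center)
    return flank_mean - center_mean


def find_footprints(cuts, width=8, flank=4, min_score=5.0):
    """Return non-overlapping (start, end, score) windows that look protected.

    Candidate windows are scored everywhere they fit, then picked greedily
    from the highest score down, skipping any window that overlaps one
    already chosen.
    """
    candidates = []
    for start in range(flank, len(cuts) - width - flank + 1):
        score = footprint_score(cuts, start, width, flank)
        if score >= min_score:
            candidates.append((score, start))
    chosen = []
    for score, start in sorted(candidates, reverse=True):
        if all(start + width <= s or s + width <= start for _, s in chosen):
            chosen.append((score, start))
    return sorted((start, start + width, score) for score, start in chosen)


# Invented per-base cut counts: heavy cleavage around an 8-bp protected stretch.
toy_cuts = [10] * 6 + [0] * 8 + [10] * 6
print(find_footprints(toy_cuts))
```

On the invented profile, the protected stretch at positions 6 to 13 is the only window returned. Real data would need normalization for sequencing depth and for DNase I sequence bias, which this sketch ignores.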
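Multiplex sequencing finds recurrent mutations in autism exomes

Published: 2012/11/16. Source: ebiotrade (生物通).

Researchers from the University of Washington School of Medicine, the Howard Hughes Medical Institute and other institutions used a new technique to sequence as many as 2,446 samples and found several recurrent mutations in autism spectrum disorder (ASD). The work offers new ideas for treating ASD and also introduces a new low-cost multiplex gene sequencing method. The results were published in Science.

The study was led by Associate Professor Jay Shendure and Professor Evan E. Eichler of the University of Washington School of Medicine; the first author is Brian O'Roak. The group works on the molecular mechanisms of autism.

ASD is a broad definition extended from the core symptoms of classic autism. It covers classic autism, atypical autism, Asperger syndrome, borderline autism, and suspected autism.

The prevalence of these disorders is worryingly high: according to reports, on average one in every 88 children is affected. Scientists have debated the roles of heredity and environment in autism for decades, whereas discussion of the specific genetic components involved has begun only in recent years.

Three earlier research groups, including Professor Eichler's, found hundreds or even thousands of gene mutations that alter children's brains and lead to social difficulties. However, accurate resequencing in large sample sets to find causal genes remains difficult and expensive.

In the new paper, the researchers improved the molecular inversion probe (MIP) technique and developed a new multiplex targeted sequencing method that is low in cost and high in accuracy.

Compared with linear probe sequences, molecular inversion probes greatly reduce the cross-reactions and dimers caused by linear primer sequences, and they share the advantages of padlock probes. Each probe consists of seven sequence segments: two restriction endonuclease recognition sites (so that the probe can be processed with restriction enzymes), two sequences complementary to the target gene, two universal primer sequences, and one specific tag sequence.

Using this technique, the researchers analyzed more than two thousand patients affected by different types of ASD, sequenced 44 genes, and found 27 de novo mutations in 16 of them. They also identified six recurrently mutated genes: CHD8, DYRK1A, GRIN2B, TBR1, PTEN and TBL1XR1. Mutations in these genes may account for 1% of sporadic ASD.

The study sheds light on the molecular pathology of ASD and presents a low-cost multiplex gene sequencing method that can be used for genetic analysis of diseases whose risk may stem from de novo disruptive mutations.

Opportunities and challenges of exome sequencing

In 2009, the arrival of targeted genome capture tools made exome capture possible. Scientists generally consider exome sequencing to have advantages over whole-genome sequencing, especially for rare monogenic diseases: it costs less, and the data are simpler to interpret. Exome sequencing was accordingly named one of Science's ten breakthroughs of the year last year.

Associate Professor Jay Shendure has published a review of the opportunities and challenges in this field. He notes that exome sequencing has failed to solve a considerable proportion of Mendelian phenotypes, even in model organisms whose genetic architecture is well understood. If we hope to solve all Mendelian disorders, understanding the basis of these failures will be essential. There is also great interest in the contribution of rare variants to common diseases. Many such studies have started from exome sequencing but are still in progress, because large numbers of samples are needed for the results to be convincing.

Exome sequencing identifies about 20,000 variants, whereas whole-genome sequencing identifies about 4 million. Giving priority to protein-altering variants over other variants has proven useful, but it is undoubtedly crude. Moving from the exome to the genome, we accept a roughly 100-fold increase in noise for an unknown increase in signal. We therefore need more refined methods for assigning more appropriate "priors" to coding and non-coding variants.

Even so, Shendure still regards exome sequencing as "high-yield genetics": exome sequencing of a small number of samples, with a modest investment, can unambiguously identify new disease genes. As analysis costs fall further and analytical precision improves, the productivity of this model will increase.

A new method for genome-scale DNA methylation studies

This year Jay Shendure and his group also published another original methods paper, reporting a new bisulfite sequencing method.

Whole-genome bisulfite sequencing provides high-resolution, comprehensive detection of methylation patterns, but it requires a large amount of starting material. Library construction usually needs more than 5 μg of genomic DNA, so the method is unsuitable for samples with limited starting material.

Reduced representation bisulfite sequencing needs less starting DNA, but it sacrifices comprehensiveness: instead of analyzing the whole genome, it focuses on specific regions. For researchers in fields such as cancer and development, starting material is often limited, which restricts the choice of analysis method.

Shendure's method needs only 1 ng of starting DNA, yet still provides a comprehensive analysis of genome-wide DNA methylation patterns. The key is the library construction method, called "tagmentation", which is more efficient than ligation. Tagmentation-based whole-genome bisulfite sequencing uses the Tn5 transposase to fragment the DNA and incorporate adapters at the same time. Compared with ligation, the transposase approach is more efficient and reduces the amount of starting DNA required.

The method gives epigenetics researchers with limited sample amounts a new option, for example in cancer methylation studies. The researchers are also optimizing the method further and trying even smaller sample amounts.

Multiplex Targeted Sequencing Identifies Recurrently Mutated Genes in Autism Spectrum Disorders

Brian J. O’Roak, Laura Vives, Wenqing Fu, Jarrett D. Egertson, Ian B. Stanaway, Ian G. Phelps, Gemma Carvill, Akash Kumar, Choli Lee, Katy Ankenman, Jeff Munson, Joseph B. Hiatt, Emily H. Turner, Roie Levy, Diana R. O’Day, Niklas Krumm, Bradley P. Coe, Beth K. Martin, Elhanan Borenstein, Deborah A. Nickerson, Heather C. Mefford, Dan Doherty, Joshua M. Akey, Raphael Bernier, Evan E. Eichler, Jay Shendure

Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations, but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes (CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1) may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly, DYRK1A-microcephaly) and replicate the importance of a β-catenin/chromatin remodeling network to ASD etiology.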
Reference: Multiplex Targeted Sequencing Identifies Recurrently Mutated Genes in Autism Spectrum Disorders
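The notion of a "recurrently mutated" gene used above boils down to counting, per gene, how many distinct probands carry a de novo event in it. A minimal sketch of that tally follows; the function name `recurrent_genes`, the threshold of two probands, and the placeholder inputs are assumptions for illustration, and the statistical modeling that the study would need to call a gene significant is not shown.

```python
from collections import Counter


def recurrent_genes(events, min_probands=2):
    """Count distinct probands per gene and keep genes seen in at least min_probands.

    events: iterable of (proband_id, gene) pairs, one per de novo event.
    Duplicate pairs are collapsed so that a proband counts once per gene.
    """
    per_gene = Counter(gene for _, gene in set(events))
    return {gene: n for gene, n in per_gene.items() if n >= min_probands}


# Placeholder identifiers only; these are not data from the study.
toy_events = [
    ("proband_1", "GENE_A"),
    ("proband_2", "GENE_A"),
    ("proband_3", "GENE_B"),
    ("proband_1", "GENE_A"),  # repeated record for the same proband
]
print(recurrent_genes(toy_events))
```

In the placeholder input only `GENE_A` is hit in two different probands, so it is the only gene returned.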
Next-generation sequencing keeps pushing the limits of throughput and cost, and targeted sequencing is currently the preferred way to control sequencing cost while still obtaining the key sequence information. NimbleGen is about to release a protocol for pooling multiple samples before sequence capture, together with a new exome capture product that targets 64 Mb of sequence (all exons plus miRNAs) and uses the same 2.1M high-density probes as its other solution-phase capture products. The English announcement follows.

Roche NimbleGen Announces New Pre-capture Multiplexing for Target Enrichment Technology in Sequencing

With the decreasing cost and increasing throughput of sequencing, researchers require a high-performance, cost-effective sample preparation pipeline for targeted sequencing. To enable researchers to more readily match targeted sequencing sample preparation throughput to the ever increasing throughput of next-generation sequencing, Roche NimbleGen (SIX: RO, ROG; OTCQX: RHHBY) announces the imminent launch of a pre-capture multiplex target enrichment protocol. This new pre-capture multiplex protocol enables multiple DNA samples to be barcoded and captured in a single SeqCap EZ Library reaction for exome or custom capture experiments.

“We are extremely excited to provide researchers with a high performance, cost-effective pre-capture multiplex protocol that should allow researchers to increase the size of their studies, and thus, the statistical relevance,” stated Frank Pitzer, CEO of Roche NimbleGen.

The pre-capture multiplex protocol will be launched for an additional, more comprehensive Exome capture product. This new product will employ the same high-density probe technology that ensures high capture efficiency in all of its existing SeqCap EZ products. However, the new Exome product will target 64Mb of coding exons and miRNAs, providing researchers with an efficient target enrichment product with the most comprehensive coverage of coding regions.

“The new extension of our target enrichment portfolio, NimbleGen SeqCap EZ Exome Library v3.0, will provide researchers with the same industry-renowned performance and uniformity that researchers worldwide have proven in numerous recent publications. In one recent study in Nature Biotechnology (1), with 80M reads, ~97% of the target bases are covered by more than 10-fold using NimbleGen SeqCap EZ where only ~90% of the target bases are covered by competitive technologies. Additionally, SeqCap EZ Exome Library v3.0 will target the most comprehensive collection of exons in the market as defined by the RefSeq, CCDS, Vega, and Ensembl databases,” Pitzer noted.

Roche NimbleGen will continue to offer the high-performance SeqCap EZ Exome v2.0 product, as an efficient tool for researchers who want to generate extremely cost-effective sequencing data for RefSeq exons. Roche plans to release further information on both the pre-capture multiplexing protocol and the NimbleGen SeqCap EZ Exome v3.0 at the American Society of Human Genetics (ASHG) annual meeting (for more information visit Roche at ASHG booth number 502) next week in Montreal, Canada.
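Pooling barcoded samples before capture only pays off if the reads can be assigned back to their samples after sequencing. The sketch below shows that demultiplexing step in its simplest form. It assumes, purely for illustration, that each read starts with a fixed-length inline barcode and that barcodes must match exactly; the announcement does not say where the barcode sits or how mismatches are handled, and the function name `demultiplex` and the barcode table are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Assign reads to samples by their leading barcode.

    Returns (by_sample, unassigned): by_sample maps a sample name to the
    list of read sequences with the barcode trimmed off; unassigned holds
    reads whose barcode is not in the table.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical 4-base barcodes and invented reads.
table = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAAAAA", "NNNNGGGGGG"]
assigned, leftover = demultiplex(reads, table, barcode_len=4)
print(assigned)
print(leftover)
```

Exact matching keeps the sketch short; a production pipeline would usually tolerate a sequencing error or two in the barcode, which requires barcodes designed to be far enough apart.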
For more information about Roche NimbleGen, please visit www.nimblegen.com.

Reference
(1) Clark et al., Performance comparison of exome DNA sequencing technologies (2011) Nature Biotechnology. Published online 25 September 2011. doi:10.1038/nbt.1975
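The comparison quoted from the Nature Biotechnology study is a coverage-breadth metric: the share of target bases sequenced to at least some depth. A minimal sketch of that calculation follows. The function name `fraction_covered` is made up here, the depth values are invented, and "more than 10-fold" is read as depth of at least 10, which is an assumption about the cited study's convention.

```python
def fraction_covered(depths, min_depth=10):
    """Fraction of target bases whose read depth is at least min_depth.

    depths: per-base read depth across all targeted positions.
    """
    if not depths:
        raise ValueError("depths must contain at least one target base")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Invented depths for five target bases.
toy_depths = [0, 5, 10, 12, 30]
print(fraction_covered(toy_depths))  # 3 of 5 bases reach depth 10
```

Applied to real data, `depths` would come from a per-base depth report restricted to the capture targets, and the result would be compared across capture products at a matched number of reads, as in the 80M-read comparison quoted above.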