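Author: Liu Yongxin (刘永鑫)  Date: 2017-6-30  Reading time: 10 min

Background (Introduction)

Metagenomics
The main research approaches in metagenomics today are 16S/ITS/18S amplicon sequencing, metagenome sequencing, metatranscriptomics, and metabolomics. Amplicon studies are the most widely used.

Purpose
This series uses recent 16S amplicon papers to explain the chart types commonly found in 16S amplicon articles, the basic information each chart carries, and the result the authors want to convey.

Contents
The series covers box plots, scatter plots, heat maps, Manhattan plots, Venn diagrams, ternary plots, and network plots.

How to learn
1) List the key concepts and get familiar with the terminology. Even a shallow understanding removes the resistance you would otherwise feel while reading.
2) Read the figures of specific papers. After two or three rounds of practice you will read them like a professional.
Once readers can comfortably understand the figures in these papers, those who want to go further into analysis, statistics, and plotting are welcome to reply and leave comments. If this series passes 10,000 reads and more than 100 readers comment that they want to learn the analysis, I will also walk through worked examples and source code for each step of amplicon analysis, statistics, and plotting. Your encouragement and support are appreciated.

Disclaimer: the interpretations here reflect only my personal understanding and views. Where they fall short, please point it out in the comments so that we can learn from each other.

Key concepts (Method)

Manhattan plot
A Manhattan plot is essentially a scatter plot used to display a large number of non-zero values that fluctuate over a wide range. It was first used in genome-wide association studies (GWAS) to display strongly associated loci. It is named for its resemblance to the Manhattan skyline (see the figure).

Manhattan plot is a type of scatter plot, usually used to display data with a large number of data-points - many of non-zero amplitude, and with a distribution of higher-magnitude values, for instance in genome-wide association studies (GWAS). It gains its name from the similarity of such a plot to the Manhattan skyline: a profile of skyscrapers towering above the lower level "buildings" which vary around a lower height.

In recent years the Manhattan plot has become popular in metagenomics, where it works very well for showing differentially abundant OTUs together with their taxonomy.

Advantages of the Manhattan plot
For large datasets it shows the whole picture while letting the reader quickly find the target genes or OTUs, together with their position, taxonomy, and significance. It looks impressive and carries real information.

The axes
Taking the GWAS result in the figure as an example:
1) The X axis is the chromosome number, with each SNP placed along the chromosome sequence. In 16S amplicon or metagenomic studies, the X axis instead holds OTUs sorted by a chosen taxonomic rank.
2) The Y axis is the statistical significance (P value) of each locus. P values range from 0 to 1 and smaller is more significant, so plotting them directly crowds the points near 0 and makes them hard to tell apart. The -log10 transformation solves this: the closer a P value is to zero, the larger the plotted value, so the highly significant points of interest rise well above the bulk and the targets are visible at a glance.
3) The horizontal lines are usually thresholds at different significance levels, which make it easy to read off the significance of each point. Alternatively, a single significance threshold is drawn and points above it are significant.

Tools for drawing Manhattan plots
It is a scatter plot, so R is the natural choice, and ggplot2 can draw it very nicely.

Reading the figures (Result)

Example 1. Twin Manhattan plots showing which bacterial orders contain the OTUs differentially enriched in WT and mutant
Zgadzaj, R., et al., 2016, PNAS
This paper analyzed the microbial composition of Lotus japonicus root nodules and found that, in mutants lacking nodules, both the root and rhizosphere microbiomes differ substantially.

Fig. 5A/B: Manhattan plots showing the OTUs significantly enriched in wild-type and mutant roots relative to the rhizosphere soil.

Elements of the figure
1) The X-axis label "OTU… respect to rhizosphere" means that the rhizosphere soil is the background control against which enriched OTUs are computed.
2) OTUs on the X axis are arranged alphabetically by taxonomic order. There are too many OTUs to label, and leaving out the OTU IDs makes the figure cleaner.
3) The Y axis is -log10(P value), so that more significant OTUs have larger values and are easier to see.
4) Each dot or circle in the main panel is one OTU, and its size represents relative abundance. In orders that contain significantly enriched OTUs, all OTUs are drawn as colored filled dots on a gray background, and the order name is printed at the top of the figure. Orders without significantly enriched OTUs are drawn as hollow gray points on a white background.

Result: the two Manhattan plots show which orders contain the OTUs differentially enriched in WT and mutant. Compared with the wild type, many significantly enriched orders disappear in the mutants.

Experience and tips: a single Manhattan plot of significantly enriched OTUs is already rich in information. Showing the differential OTUs of two groups side by side lets readers make the comparison themselves, which highlights the contrast even more. The taxonomic annotation level chosen here is order. The groups that stand out should be neither too many nor too few, so that the reader sees the work as both comprehensive and detailed.

Original figure legend:
Fig. 5. Manhattan plots showing root-enriched OTUs in WT (A) or in the mutants (B) with respect to rhizosphere and rhizosphere-enriched OTUs in WT (C) or in the mutants (D) with respect to root. OTUs that are significantly enriched (also with respect to soil) are depicted as full circles. The dashed line corresponds to the false discovery rate-corrected P value threshold of significance (α = 0.05). The color of each dot represents the different taxonomic affiliation of the OTUs (order level), and the size corresponds to their RAs in the respective samples. Gray boxes are used to denote the different taxonomic groups (order level).

Example 2
This is a style I drew myself, with some improvements over the figure above. It compares the bacterial community of a gene knockout mutant (KO/mutant) with the wild type (WT).

Elements of the figure
1) The X axis holds the OTUs, sorted alphabetically by phylum.
2) The Y axis is the P value of the two-group comparison, transformed as loge(P), that is, with the natural logarithm.
3) Point size represents the relative abundance of the OTU as log2(CPM), the base-2 logarithm. CPM stands for counts per million; like RPM, it is a parts-per-million measure.
4) Point color represents the phylum, which makes phylum-level patterns easy to spot.
5) Point shape marks the type of change: enriched (filled upward triangle), depleted (hollow downward triangle), or no significant change, nosig (filled dot).

Result: the KO genotype shows clear changes in the bacterial community compared with WT. Actinobacteria are mostly enriched, while Proteobacteria contain many enriched and many depleted OTUs, with the enrichment being more significant.

Experience: look at the overall pattern at the phylum level first, then work down through class, order, family, and genus for the details. Using shapes to separate enriched from depleted OTUs makes the result clearer.

Reference
https://en.wikipedia.org/wiki/Manhattan_plot
Zgadzaj, R., Garrido-Oter, R., Jensen, D.B., Koprivova, A., Schulze-Lefert, P. and Radutoiu, S., 2016. Root nodule symbiosis in Lotus japonicus drives the establishment of distinctive rhizosphere, root, and nodule bacterial communities. Proceedings of the National Academy of Sciences, 113(49), pp.E7996-E8005.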
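The transformations discussed in this post (ordering OTUs by taxonomy on the X axis, -log10(P) on the Y axis, log2(CPM) for point size, and an FDR-corrected significance threshold) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind either figure: the input field names, the pseudocount of 1 added before log2, and the use of the Benjamini-Hochberg procedure for the FDR correction are all assumptions made here. The actual plotting (for example with ggplot2 in R) is left out.

```python
import math

def cpm(counts):
    """Counts per million for the OTU counts of one sample or group."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted P values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def manhattan_points(otus, alpha=0.05):
    """Turn OTU records into plot-ready points.

    otus: list of dicts with the (hypothetical) keys
          'id', 'order', 'pvalue' (must be > 0) and 'count'.
    """
    # X axis: OTUs sorted alphabetically by taxonomic order, then by ID.
    rows = sorted(otus, key=lambda o: (o["order"], o["id"]))
    fdr = bh_adjust([o["pvalue"] for o in rows])
    abundance = cpm([o["count"] for o in rows])
    points = []
    for x, (o, q, a) in enumerate(zip(rows, fdr, abundance)):
        points.append({
            "id": o["id"],
            "order": o["order"],
            "x": x,
            "y": -math.log10(o["pvalue"]),   # smaller P -> higher point
            "size": math.log2(a + 1),        # pseudocount 1 is an assumption
            "significant": q < alpha,        # FDR-corrected threshold
        })
    return points

if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        {"id": "OTU_1", "order": "Rhizobiales", "pvalue": 1e-6, "count": 500},
        {"id": "OTU_2", "order": "Burkholderiales", "pvalue": 0.2, "count": 300},
        {"id": "OTU_3", "order": "Rhizobiales", "pvalue": 0.04, "count": 150},
        {"id": "OTU_4", "order": "Actinomycetales", "pvalue": 0.8, "count": 50},
    ]
    for p in manhattan_points(example):
        print(p)
```

In the made-up example, OTU_3 has a raw P value below 0.05 but is not flagged as significant after correction, which is why figure legends such as the one quoted above specify that the threshold line is FDR-corrected.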
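The pangenome of hexaploid bread wheat

I came across this paper today and was genuinely pleased. The wheat reference genome had only just been released and its paper was not yet published, and suddenly a wheat pangenome paper appeared. One can only marvel at the times. After a quick read, though, I was a little disappointed: the analysis is rather simple, which may be why it only appeared in The Plant Journal, and there is still a lot of information left to mine.

The paper uses 18 wheat cultivars, including Chinese Spring and 16 other Australian cultivars. The analyses are all fairly basic, so I will not go into them here. Instead, a story about the raw data. After reading the paper I could not find where the raw data could be downloaded, so I hurriedly emailed Professor David Edwards to ask. After sending it, I downloaded the supplementary material and read on, and it turned out the download location was given there. I thought I had made a fool of myself internationally: they had provided the data and I had emailed to say they had not. I rushed to check whether the email had actually gone out, and got another surprise: Professor Edwards had already replied. The gist was that the data are provided, but the paper is not formally online yet and the data have not been fully organized. Looking at the time, he had probably just arrived at work and was going through his email. I checked the raw data: part of it dates back to 2011, with much more added over the following years. No wonder; when the paper was written wheat had no decent reference genome, and de novo assembly was not really feasible.

My thanks to Professor Edwards for replying so promptly and readily. In my experience, emails to researchers abroad rarely go unanswered. That reminds me of emailing some PIs in China with questions, most of whom never replied. As I recall, Professor Yan Jianbing (严建兵), who works on maize at Huazhong Agricultural University, replied to me several times, and Professor Ling Hongqing (凌宏清) replied once.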
Article: Microbial diversity in individuals and their household contacts following typical antibiotic courses
Year: 2016
Journal: Microbiome

Aims: (1) discern the effects of the 2 most commonly prescribed antibiotics on the microbiota of the skin, gut, and mouth, (2) characterize the degree of similarity in the microbiota of unrelated household contacts and decipher whether it is significantly affected by antibiotic use, (3) characterize the long-term effects of typical antibiotic prescriptions on microbiota diversity, and (4) discern whether there may be collateral effects to antibiotic use for the diversity of microbiota in household contacts.

Study design: We recruited and sampled the feces, saliva, and skin from a cohort of 56 subjects over a 6-month period from the University of California, San Diego, campus. Of those 56 individuals, there were 24 separate households consisting of 2 individuals and 8 separate controls not enrolled with a housemate.

Results:
1) Comparing household pairs longitudinally with individuals from separate households, we found smaller distances among the household pairs, which was statistically significant (p < 0.05) in the gut, saliva, and skin for all households. The similarity observed in the bacterial biota was not significantly affected by the use of antibiotics, as the same patterns were observed in households that received azithromycin and those that received amoxicillin.
2) Differences between samples increased over time, whether or not antibiotics were used: "a clear trend could be observed for most time points, as those control subjects who received no treatment also demonstrated the same trend."
3) Samples could not be separated by antibiotic use, time, or similar factors; the differences were driven mainly by sampling site.
4) Analysis of differentially abundant genera.
5) Alpha-diversity analysis, plotting the difference between the antibiotic and placebo groups.

Takeaway:
1) In microbiome studies, condensing a large number of taxa into one or a few indicators is the key step in data processing. This paper uses weighted UniFrac distances as a single value that represents the overall state of the community for comparison, an approach worth borrowing in similar studies.
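To make the household comparison in result 1) concrete, the sketch below splits a set of pairwise distances into within-household and between-household groups and tests whether the within-household distances are smaller, using a label-permutation test. It is a minimal illustration and not the paper's actual statistical procedure: the subject names, the distance values, and the choice of a permutation test are all assumptions, and the weighted UniFrac distances are taken as already computed by some other tool.

```python
import random
from itertools import combinations

def mean(values):
    return sum(values) / len(values)

def split_distances(dist, household):
    """Split pairwise distances into within- and between-household lists.

    dist: dict mapping frozenset({a, b}) -> distance between subjects a and b
    household: dict mapping subject -> household label
    """
    within, between = [], []
    for a, b in combinations(sorted(household), 2):
        d = dist[frozenset((a, b))]
        if household[a] == household[b]:
            within.append(d)
        else:
            between.append(d)
    return within, between

def permutation_pvalue(dist, household, n_perm=999, seed=0):
    """One-sided test: are within-household distances smaller on average?

    Household labels are shuffled across subjects, which keeps the
    dependence structure of the pairwise distances intact.
    Returns (observed gap, permutation P value).
    """
    rng = random.Random(seed)
    subjects = sorted(household)
    labels = [household[s] for s in subjects]

    def gap(current_labels):
        w, b = split_distances(dist, dict(zip(subjects, current_labels)))
        return mean(b) - mean(w)

    observed = gap(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gap(labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Hypothetical weighted UniFrac distances for two households of two.
    household = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
    dist = {
        frozenset(("A1", "A2")): 0.10,
        frozenset(("B1", "B2")): 0.20,
        frozenset(("A1", "B1")): 0.60,
        frozenset(("A1", "B2")): 0.70,
        frozenset(("A2", "B1")): 0.65,
        frozenset(("A2", "B2")): 0.70,
    }
    print(permutation_pvalue(dist, household))
```

With only four subjects the P value cannot become small, since there are just three ways to pair them; a cohort of the size described above gives the test far more resolution.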
Article: A Rapid and Economical Method for Efficient DNA Extraction from Diverse Soils Suitable for Metagenomic Applications 2015 ( https://www.ncbi.nlm.nih.gov/pubmed/26167854 )
Article: myPhyloDB: a local web server for the storage and analysis of metagenomic data 2016 ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809264/ )
Article: Integrative workflows for metagenomic analysis 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/2547856 )
Article: The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes 2008 (tool URL: http://metagenomics.anl.gov )
Article: EBI metagenomics in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data 2016 ( https://www.ncbi.nlm.nih.gov/pubmed/26582919 )
Article: HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/26578596 )
Article: Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26734061 )
Article: A robust approach for identifying differentially abundant features in metagenomic samples 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25792553 )
Article: MetaBoot: a machine learning framework of taxonomical biomarker discovery for different microbial communities based on metagenomic data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26213658 )
Article: FCMM: A comparative metagenomic approach for functional characterization of multiple metagenome samples 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26027543 )
Article: Differential abundance analysis for microbial marker-gene surveys 2013 ( http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2658.html )
Article: Metastats: an improved statistical method for analysis of metagenomic data 2011 ( http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439073/ )
Article: STAMP: statistical analysis of taxonomic and functional profiles 2014 ( http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609014/ )
Article: EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/26896844 )
Article: Network construction and structure detection with metagenomic count data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26692900 )
Article: COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26561344 )
Article: SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/26454280 )
Article: GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data 2016.03 ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4777721/ )
Article: TruSPAdes: barcode assembly of TruSeq synthetic long reads 2016 ( http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3737.html )
Article: InteMAP: Integrated metagenomic assembly pipeline for NGS short reads 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26250558 )
Article: Xander: employing a novel method for efficient gene-targeted metagenomic assembly 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26246894 )
Article: IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth 2012 ( http://www.ncbi.nlm.nih.gov/pubmed/22495754 )
Article: Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25790784 )
Article: DIME: a novel framework for de novo metagenomic sequence assembly 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25684202 )
Article: SFA-SPA: a suffix array based short peptide assembler for metagenomic data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25637561 )
Article: An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25586223 )
Article: MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25431440 )
Article: Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs 2014 ( https://www.ncbi.nlm.nih.gov/pubmed/25270300 )
Article: Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes 2016 ( https://www.ncbi.nlm.nih.gov/pubmed/27067514 )
Article: MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/26895947 )
Article: Evaluating the Quantitative Capabilities of Metagenomic Analysis Software 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/26831696 )
Article: MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets 2016 ( http://www.ncbi.nlm.nih.gov/pubmed/26515820 )
Article: Metagenomic Classification Using an Abstraction Augmented Markov Model 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26618474 )
Article: DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26446672 )
Article: MetaPhlAn2 for enhanced metagenomic taxonomic profiling 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26418763 )
Article: Multi-Layer and Recursive Neural Networks for Metagenomic Classification 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26316190 )
Article: deFUME: Dynamic exploration of functional metagenomic sequencing data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26227142 )
Article: Spaced seeds improve k-mer-based metagenomic classification 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26209798 )
Article: Investigating microbial co-occurrence patterns based on metagenomic compositional data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26079350 )
Article: Reconstructing 16S rRNA genes in metagenomic data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26072503 )
Article: Bayesian mixture analysis for metagenomic community profiling 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/26002885 )
Article: MICCA: a complete and accurate software for taxonomic profiling of metagenomic data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25988396 )
Article: Identifying personal microbiomes using metagenomic codes 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25964341 )
Article: CS-SCORE: Rapid identification and removal of human genome contaminants from metagenomic datasets 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25944184 )
Article: TreeSeq, a Fast and Intuitive Tool for Analysis of Whole Genome and Metagenomic Sequence Data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25933115 )
Article: MUSiCC: a marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25885687 )
Article: CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25879410 )
Article: Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25863333 )
Article: METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25732605 )
Article: Exploiting topic modeling to boost metagenomic reads binning 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25859745 )
Article: MBBC: an efficient approach for metagenomic binning based on clustering 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25652152 )
Article: VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25621171 )
Article: Binpairs: utilization of Illumina paired-end information for improving efficiency of taxonomic binning of metagenomic sequences 2015 ( http://www.ncbi.nlm.nih.gov/pubmed/25551450 )
Article: MetaObtainer: A Tool for Obtaining Specified Species from Metagenomic Reads of Next-generation Sequencing 2015 ( https://www.ncbi.nlm.nih.gov/pubmed/26293485 )
Article: MetaBoot: a machine learning framework of taxonomical biomarker discovery for different microbial communities based on metagenomic data 2015 ( https://www.ncbi.nlm.nih.gov/pubmed/26213658 )
Article: BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS 2015 ( https://www.ncbi.nlm.nih.gov/pubmed/26130132 )
Article: FCMM: A comparative metagenomic approach for functional characterization of multiple metagenome samples 2015 ( https://www.ncbi.nlm.nih.gov/pubmed/26027543 )
Article: MetaGeniE: characterizing human clinical samples using deep metagenomic sequencing 2014 ( http://www.ncbi.nlm.nih.gov/pubmed/25365329 )
Article: Binning metagenomic contigs by coverage and composition 2014 ( https://www.ncbi.nlm.nih.gov/pubmed/25218180 )
Article: COVER: a priori estimation of coverage for metagenomic sequencing 2012 ( http://www.ncbi.nlm.nih.gov/pubmed/23760797 )
Comparison of direct boiling method with commercial kits for extracting fecal microbiome DNA by Illumina sequencing of 16S rRNA tags 2013 ( http://www.ncbi.nlm.nih.gov/pubmed/23899773 )
FastQC: Quality control tool for high-throughput sequence data using modular options and giving graphic results of per-base sequence quality, GC content, N content, duplication, and overrepresented sequences ( http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ )
Fastx-Toolkit: Command line tools for short-read quality control. These allow processing, cutting, format conversion, and collapsing by sequence length and identity ( http://hannonlab.cshl.edu/fastx_toolkit/index.html )
PRINSEQ: Quality control tool for sequence trimming based on dinucleotide occurrence and sequence duplication (mainly 5′/3′) ( http://prinseq.sourceforge.net/ )
NGS QC Toolkit: Tool for quality control analysis performed in a parallel environment ( http://www.nipgr.res.in/ngsqctoolkit.html )
Meta-QC-Chain: Parallel environment tool for quality control. This performs a mapping against 18S rRNA databases for removing eukaryotic contaminant sequences ( http://www.computationalbioenergy.org/qc-chain.html )
Mothur: From read quality analysis to taxonomic classification, calculation of diversity estimators, and ribosomal gene metaprofiling comparison ( http://www.mothur.org/ )
QIIME: Quality pre-treatment of raw reads, taxonomic annotation, calculation of diversity estimators, and comparison of metaprofiling or metagenomic data ( http://qiime.org/ )
MEGAN: Taxonomy and functional analysis of metagenomic reads. It is based on BLAST output of short reads and performs comparative metagenomics. Graphical interface ( http://ab.inf.uni-tuebingen.de/software/megan5/ )
CARMA: Phylogenetic classification of reads based on Pfam conserved domains ( http://omictools.com/carma-s1021.html )
PICRUSt: Predictor of metabolic potential from taxonomic information obtained from 16S rRNA metaprofiling projects ( http://picrust.github.io/picrust/ )
Parallel-meta: Taxonomic annotation of ribosomal gene marker sequences obtained by metaprofiling or metagenomic reads. Functional annotation based on BLAST best hits results. Comparative metagenomics ( http://www.computationalbioenergy.org/parallel-meta.html )
MOCAT: Pipeline that includes quality treatment of metagenomic reads, taxonomic annotation based on single copy marker genes classification, and gene-coding prediction ( http://vm-lux.embl.de/~kultima/MOCAT2/index.html )
TETRA: Taxonomic classification by comparison of tetranucleotide patterns. Web service available ( http://omictools.com/tetra-s1030.html )
PhyloPythiaS: Composition-based classifier of sequences based on reference genome signatures ( http://omictools.com/phylopythia-s1455.html )
MetaclusterTA: Taxonomic annotation based on binning of reads and contigs. Dependent on reference genomes ( http://i.cs.hku.hk/~alse/MetaCluster/ )
MaxBin: Unsupervised binning of metagenomic short reads and contigs ( http://sourceforge.net/projects/maxbin/ )
Amphora and Amphora2: Metagenomic phylotyping by single copy phylogenetic marker genes classification ( http://pitgroup.org/amphoranet/ )
BWA: Algorithm for mapping short, low-divergence sequences to large references. Based on the Burrows-Wheeler transform ( http://bio-bwa.sourceforge.net/ )
Bowtie: Fast short read aligner to long reference sequences based on the Burrows-Wheeler transform ( http://bowtie-bio.sourceforge.net/index.shtml )
Genometa: Taxonomic and functional annotation of short-read metagenomic data. Graphical interface ( http://genomics1.mh-hannover.de/genometa/ )
SORT-Items: Taxonomic annotation by alignment-based orthology of metagenomic reads ( http://metagenomics.atc.tcs.com/binning/SOrt-ITEMS )
DiScRIBinATE: Taxonomic assignment by BLASTx best hits classification of reads ( http://metagenomics.atc.tcs.com/binning/DiScRIBinATE )
IDBA-UD: De novo assembler of metagenomic sequences with uneven depth ( http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/ )
MetaVelvet: De novo assembler of metagenomic short reads ( http://metavelvet.dna.bio.keio.ac.jp/ )
Ray Meta: De novo assembler of metagenomic reads and taxonomy profiler by Ray Communities ( http://denovoassembler.sourceforge.net/ )
MetaGeneMark: Gene coding sequences predictor from metagenomic sequences by heuristic model ( http://exon.gatech.edu/index.html )
GlimmerMG: Gene coding sequences predictor from metagenomic sequences by unsupervised clustering ( http://www.cbcb.umd.edu/software/glimmer-mg/ )
FragGeneScan: Gene coding sequences predictor from short reads ( http://sourceforge.net/projects/fraggenescan/ )
CD-HIT: Clustering and comparing sequences of nucleotides or protein ( http://weizhongli-lab.org/cd-hit/ )
HMMER3: Hidden Markov models applied in sequence alignments ( http://hmmer.janelia.org/ )
BLASTX: Basic local alignment of translated sequences ( http://blast.ncbi.nlm.nih.gov/blast/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome )
MetaORFA: Assembly of peptides obtained from predicted ORFs
MinPath: Reconstruction of pathways from protein family predictions ( http://omics.informatics.indiana.edu/MinPath/ )
MetaPath: Identification of metabolic pathways differentially abundant among metagenomic samples ( http://metapath.cbcb.umd.edu/ )
GhostKOALA: KEGG's internal annotator of metagenomes by K number assignment using GHOSTX searches against a non-redundant database of KEGG genes ( http://www.kegg.jp/ghostkoala/ )
RAMMCAP: Metagenomic functional annotation and data clustering ( http://weizhong-lab.ucsd.edu/rammcap/cgi-bin/rammcap.cgi )
ProViDE: Analysis of viral diversity in metagenomic samples ( http://metagenomics.atc.tcs.com/binning/ProViDE/ )
Phyloseq: Toolkit for raw read pre-processing, diversity analysis, and graphics production. R, Bioconductor package ( https://joey711.github.io/phyloseq/ )
MetagenomeSeq: Analysis of differential abundance of 16S rRNA genes in metaprofiling data. R, Bioconductor package ( http://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html )
ShotgunFunctionalizeR: Metagenomic functional comparison at the level of individual genes (COG and EC numbers) and complete pathways. R, Bioconductor package ( http://shotgun.math.chalmers.se/ )
Galaxy portal: Web repository of computational tools that can be run without informatics expertise. Graphical interface and free service ( https://usegalaxy.org/ )
MG-RAST: Taxonomic and functional annotation, comparative metagenomics. Graphical interface, web portal, and free service ( http://metagenomics.anl.gov/ )
IMG/M: Functional annotation, phylogenetic distribution of genes, and comparative metagenomics. Graphical interface, web portal, and free service ( https://img.jgi.doe.gov/cgi-bin/m/main.cgi )

Reference article: The Road to Metagenomics: From Microbiology to DNA Sequencing Technologies and Bioinformatics ( http://journal.frontiersin.org/article/10.3389/fgene.2015.00348/abstract )
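Article: Can we predict future allergies from our infant gut microbiota?

Sensitization to food allergens is common during early life, affecting up to 28% of preschool children. While 66-90% of infants outgrow their sensitization to egg and milk, respectively, prevalence rates drop to around 2% by age 5.

The article "The psychosocial impact of food allergy and food hypersensitivity in children, adolescents and their families: a review" summarizes how food allergy in children and adolescents affects their families. With the aim of predicting allergy early enough to intervene, the article "Skin prick test responses and allergen-specific IgE levels as predictors of peanut, egg, and sesame allergy in infants." summarizes the use of skin prick tests and IgE levels to assess infants' allergy to peanut, egg, and sesame. With the development of next-generation sequencing, the current trend is that a more complete assessment of gut microbial communities during infancy enhances our ability to identify gut microbiome biomarkers which can predict future allergic disease.

Reported findings:
1) In European cohort studies, reduced gut microbial diversity and reduced Bacteroides abundance in 1-month-old infants are markers of atopic dermatitis. If the mother has asthma, the infant's microbial diversity is lower and the risk of allergic sensitization at age 6 is higher.
2) In the KOALA birth cohort study of 1000 infants, colonization of the gut by Clostridium difficile at 1 month was followed by atopic sensitization at age 2.
3) At 18 months, children diagnosed with atopic dermatitis have accumulated Clostridium, while the abundance of Bacteroides spp. has dropped.
4) Food allergy has been confirmed in infants with disturbed gut ecology.
5) In 5-month-old infants with food allergy, Firmicutes abundance is higher and Bacteroidetes abundance is lower, with no observed change in microbial diversity.
6) Infants with cow's milk allergy more often carry Clostridium coccoides, lactobacilli, and other anaerobes, together with lower levels of bifidobacteria and enterobacteria.
7) Food-sensitized infants are twice as likely to experience the "atopic march" to conditions such as atopic dermatitis, allergic rhinitis and asthma. A reduction in Bacteroidaceae at 3 months is followed by food sensitization at 9 months.

Several points in these studies deserve attention:
1) Mode of delivery and breastfeeding practice affect the infant microbiota differently.
2) The results do not generalize, and the situation may differ from country to country. A taxonomic marker that works well in Canada and Finland may not work well in Germany or the USA. A dataset built from the Chinese population itself is therefore needed.
3) Predictive models, whether based on IgE or on the microbiota, still need to improve. Many symptoms probably cannot be judged from a single layer or a single dimension of data; they are a combined effect.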
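Early View full text: 2015-Douglas et al.-A DNA Barcoding system integrating multigene sequence data.pdf

I have followed DNA barcoding ever since I learned about the idea at the University of Toronto, Canada, in 2003. After attending the second international DNA barcoding conference at the Natural History Museum in London in 2005, I was even more eager to do some work in this area to advance insect systematics.

Many methods and systems for taxonomic identification based on DNA sequences have been developed. In eukaryotes, however, most of them use a single predefined gene fragment, such as COI or 16S. The limited information can bias the identification results, and these systems also struggle to recognize and analyze the large volumes of gene data that come from genomes.

Today I received a letter from the editorial office of Methods in Ecology and Evolution: the paper by Dr. Douglas Chesters and colleagues on a DNA barcoding system that integrates multi-gene data has been formally accepted and will appear online shortly.

In this paper we implement DNA barcoding across multiple genes:
1) A reference framework dataset is built from data for frequently sequenced loci.
2) Data from other genes are homology-aligned to the reference sequences and trimmed, and query gene fragments are assigned species-level taxonomic information when they fall within the range of intraspecific variation.
We compared the method with some existing methods such as "bagpipe_phylo", which reassigns taxonomic information to sequences on a phylogenetic tree.

The proposed multi-gene system correctly inferred 78% of species and 94% of genera for arthropods in GenBank. Most importantly, the proportion of species identified is higher than with COI alone. In the test data, 24% of species are represented only by non-COI genes, and the accuracy of species assignment for these non-COI genes is not noticeably lower. In the same way, we used the non-COI data to make additional species assignments for an existing metagenomic dataset. In a test on a dataset of 273 bee gene sequences, changing the method used to calculate genetic distance produced no obvious difference between the species assignment accuracy and the results of phylogeny-based identification.

Standard single-fragment DNA barcoding remains an important tool for species identification from PCR-generated data. For the large "backbone" of species DNA barcodes already established, the method in this paper adds the following:
1) genomic data;
2) fewer errors, through the integration of other independent loci;
3) additional species identification for non-barcode fragments.
With next-generation sequencing platforms, the last two points are especially relevant to community genomic monitoring.

The sea of learning has no shore, and diligence is the boat. Since joining the group, Dr. Chesters has worked hard and made a series of advances.

On sequence-based species identification, Dr. Chesters has published papers in Systematic Biology (2 papers) and Methods in Ecology and Evolution (2 papers), extending the single-gene approach to multiple genes and implementing automatic correction and assignment of species information for genes in large databases.

Hard work pays off: in 2014 he was awarded a one-year Chinese Academy of Sciences President's International Fellowship Initiative (PIFI) grant and a National Natural Science Foundation of China grant, and at the end of the year he was appointed associate professor at the Institute of Zoology.

We plan to continue in the following directions:
1) Extend the method to genomes. This is already partly implemented but still needs practical testing on more omics data.
2) Extend the method to key nodes on the phylogenetic tree. This is the point that interests me most.
3) Combine the method with other disciplines, especially insect diversity and species interaction research.
4) Apply the method more systematically to several species-rich bee genera, to speed up systematics research on Apoidea.

The abstract and full text will be shared once the paper is online:
A DNA Barcoding system integrating multi-gene sequence data
Douglas Chesters, Wei-Min Zheng and Chao-Dong Zhu
Accepted manuscript online: 4 MAR 2015 04:41AM EST | DOI: 10.1111/2041-210X.12366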
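The core idea of assigning a species to a query fragment only when it falls within the range of intraspecific variation can be illustrated with a very small sketch. This is not the published system: it assumes the query is already aligned to the reference sequences, uses a plain p-distance, and applies a single fixed threshold (`max_intraspecific`, a hypothetical parameter with an arbitrary default). The function names and sequences are invented for the example.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def assign_species(query, references, max_intraspecific=0.02):
    """Assign the species of the nearest reference, if it is close enough.

    references: list of (species_name, aligned_sequence) for one locus
    max_intraspecific: hypothetical distance cutoff standing in for the
                       range of intraspecific variation
    Returns (species or None, distance to the nearest reference).
    """
    best_species, best_dist = None, None
    for species, seq in references:
        d = p_distance(query, seq)
        if best_dist is None or d < best_dist:
            best_species, best_dist = species, d
    if best_dist is not None and best_dist <= max_intraspecific:
        return best_species, best_dist
    return None, best_dist

if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration.
    refs = [
        ("Apis mellifera", "ACGTACGTAC"),
        ("Bombus terrestris", "ACGTTTGTAA"),
    ]
    print(assign_species("ACGTAC-TAC", refs))  # close to the first reference
    print(assign_species("ACGAACGTAC", refs))  # too distant, left unassigned
```

A real system would estimate the intraspecific range per locus and per taxon rather than use one global cutoff, which is one reason the threshold is exposed as a parameter here.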