ScienceNet (科学网)

Tag: HMP Project

Related blog entries

HMP-The Intestinal Microbiome in Early Life: Health...
xbinbzy 2015-8-31 17:26
Article: The Intestinal Microbiome in Early Life: Health and Disease. Journal: Front Immunol. Year: 2014.

Until a child reaches about 3 years of age, the gut microbiota is unstable. Using sequential fecal sampling from one infant during the first two and a half years of life, and two large cohort human microbiome studies across North America, Africa, South America, and Europe, it is apparent the gut microbiome is highly unstable during the first 3 years of life.

Children under 3 also have a lower diversity index. Children younger than 3 years of age have a significantly lower diversity index compared to adults, with ~1000 operational taxonomic units (OTUs) detected in the first year of life, compared to almost 2000 OTUs after this.

As a child grows, the microbiota shifts gradually along with changes in diet.

While the infant gut microbiota is stabilizing, it is affected by many factors:

1) Amniotic fluid is sterile; when bacteria are detected in it, conditions such as chorioamnionitis, pre-term delivery, and enterocolitis can occur.

2) Cesarean versus vaginal delivery affects the infant's microbiota. Cesarean born infants harbored less Bifidobacterium and Bacteroides species compared to children born vaginally; the gut microbiota of cesarean delivered infants at 24 months of age is less diverse than those delivered vaginally.

3) Formula feeding versus breastfeeding affects the infant's microbiota. Comparisons between breast-fed and formula-fed infants show that breast-fed infants tend to contain a more uniform population of gut microbes. Bifidobacteria and Lactobacillus tend to dominate the guts of breast-fed infants, whereas formula-fed infants exhibit higher proportions of Bacteroides, Clostridium, Streptococcus, Enterobacteria, and Veillonella spp.

4) Food intake also has a large effect. Shifts in diet can significantly alter the gut microbiota due to the presence of new substrates that promote the survival and proliferation of varied types of microbial species.

5) Geographic region has a large effect, mainly because dietary habits differ.

6) Antibiotics. Antibiotic usage in early life can also significantly impact the growth of otherwise dominant bacterial phyla in the human gut.

7) Patients with Crohn's disease have lower microbial diversity, ...
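The notes above compare a "diversity index" between young children and adults without saying which index the article used. As an illustration only, the sketch below assumes the Shannon index, H = -sum(p_i * ln p_i), computed from per-OTU read counts; the function name and the example counts are hypothetical and are not taken from the article.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with nonzero counts.

    otu_counts maps an OTU identifier to its read count (hypothetical input
    format chosen for this sketch).
    """
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in otu_counts.values()
        if n > 0
    )


# Invented toy samples: one dominated by a single OTU, one evenly spread.
uneven_sample = {"otu1": 97, "otu2": 1, "otu3": 1, "otu4": 1}
even_sample = {"otu1": 25, "otu2": 25, "otu3": 25, "otu4": 25}
print(shannon_index(uneven_sample), shannon_index(even_sample))
```

Both toy samples contain the same number of OTUs, yet the evenly spread one scores higher, which is why a diversity index carries more information than an OTU count alone.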
Category: Research articles | 2 views | 0 comments
HMP-MetaORFA
xbinbzy 2015-8-25 16:18
Article: An ORFome assembly approach to metagenomics sequences analysis. Journal: J Bioinform Comput Biol. Year: 2009.

Challenges of metagenome assembly:

1) Metagenomics projects often apply NGS techniques and produce shorter reads. As a result, many short repeats may increase the complexity of the overlap graph and cause many more mis-assemblies.

2) Unlike conventional genome shotgun sequencing, which handles a single species, metagenomics sequencing reads are collected from a large number of different genomes.

Basic principle: We implemented a tool called MetaORFA in C/C++ under Linux platforms for the ORFome assembly. MetaORFA consists of two programs. One program takes as input a set of reads and predicts a number of putative ORFs; the other program (EULER-ORFA) takes as input the set of putative ORFs and reports a set of peptides corresponding to the assembled ORFs. Before being supplied to MetaORFA, the original reads were first processed by MDUST (a tool for autonomous masking from TIGR, which implements the DUST algorithm) to mask out low-complexity regions, and then processed by Tandem Repeat Finder (TRF V4.0) to mask out short tandem repeats.

The step most critical to assembly quality is ORF prediction, which works as follows: For each read (and its reverse complement), a region from the beginning (i.e., position 1, 2, or 3, depending on the frame) or a start codon to the end of the read or a stop codon is considered as a potential ORF. Only ORFs with more than a threshold K (default K = 25) codons were reported. These ORFs are then transformed into peptide sequences and subsequently assembled using the EULER-ORFA algorithm, modified from the original EULER algorithm designed for DNA fragment assembly.

If ORF-based assembly really is a sound strategy, then ORF prediction is an important place to look for improvements.
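The ORF prediction rule quoted above can be sketched in a few lines. This is one reading of that description, not MetaORFA's actual code: it assumes the standard genetic code, treats ATG as the only start codon, lets a candidate begin at the read start only for the first stop-free stretch of each frame, and requires a start codon after any stop. The names `putative_orfs` and `min_codons` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def putative_orfs(read, min_codons=25):
    """Return peptides for candidate ORFs on both strands, in all three frames.

    A candidate runs from the read start (first stretch of a frame) or from
    the first ATG after a stop codon, up to the next stop codon or the read end.
    Only candidates with more than min_codons codons are kept (K = 25 above).
    """
    read = read.upper()
    peptides = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            segments = translate(strand[frame:]).split("*")
            for index, segment in enumerate(segments):
                if index > 0:
                    # After a stop codon, a new ORF needs a start codon (M).
                    start = segment.find("M")
                    if start == -1:
                        continue
                    segment = segment[start:]
                if len(segment) > min_codons:
                    peptides.append(segment)
    return peptides
```

Because short reads rarely contain both ends of a gene, most candidates are fragments that begin or end at a read boundary, which is why the rule accepts the read start and read end as ORF limits.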
Category: Research articles | 2026 views | 0 comments
HMP-Metagenomic Pyrosequencing and Microbial Identification
xbinbzy 2015-8-25 14:58
Article: Metagenomic Pyrosequencing and Microbial Identification. Journal: Clin Chem. Year: 2009.

80% of bacteria identified by metagenomic sequencing were considered noncultivable.

Pathogen identification in infectious diseases relies mostly on routine cultures and biochemical testing using semi-automated platforms in the clinical laboratory.

Databases: The oldest and most traditional bacterial classification system is based on Bergey's taxonomy, which has attempted to merge phenotypic (e.g. biochemical) and molecular data to create a higher-order taxonomy in recent years. More recently developed taxonomic schemes include systems proposed by Pace, Ludwig, Hugenholtz, and the NCBI. Multiple on-line databases have been developed on the basis of these different taxonomic schemes and provide convenient access to large ribosomal RNA sequence databases for clinical laboratories and research teams. The most prominent databases include the Ribosomal Database Project II (RDP II) (http://rdp.cme.msu.edu/), Greengenes (greengenes.lbl.gov), and ARB-Silva. RDP II is based on Bergey's taxonomy, which contains a relatively small number of phyla. Greengenes includes multiple taxonomic schemes so that query results with this database can be compared using different classification systems. The ARB-Silva database also offers a choice of microbial taxonomies, although it is more limited in its flexibility than Greengenes.

These databases differ considerably, which deserves attention. That raises the reverse questions: for a given research goal, which database is the most suitable? How exactly do the databases differ, and what concrete effect does the choice have on the results?

As a case in point, the Pace and Hugenholtz lineages separately named 12 phylum-level lineages, and RDP II had not named any of these lineages. The taxonomic schemes varied with respect to numbers of phyla, for example, with a maximum of 88 phyla for the Pace and Hugenholtz curations and 31 phyla for the RDP (based on Bergey's) classification system.
Category: Research articles | 1875 views | 0 comments
HMP Project-Metagenomics: Facts and Artifacts, and Computationa
xbinbzy 2015-8-18 11:08
Article: Metagenomics: Facts and Artifacts, and Computational Challenges. Journal: J Comput Sci Technol. Year: 2009.

Built on next-generation sequencing, the main approaches are metagenomics, 16S sequencing, and targeted metagenomics.

1) Assembly and gene prediction

Most tools assemble single genomes and do not account for data from many mixed genomes. Common tools: Velvet (a Eulerian path assembler), ALLPATHS, Euler-SR.

Gene prediction strategy: use 6-frame translation when conducting a similarity search on the short reads. Progress here is still limited; current tools include MetaGene and Orphelia.

2) Tools for characterizing microbial diversity qualitatively and quantitatively

The taxonomic composition of the sample needs to be established. Common tools: MEGAN, MLTreeMap, AMPHORA, CARMA.

How the tools work:

MEGAN applies a simple lowest common ancestor algorithm to assign reads to taxa, based on BLAST similarity search results. In other words, species are identified by comparison against a database.

MLTreeMap and AMPHORA are two phylogeny-based phylotyping tools that use the phylogenetic analysis of marker genes for taxonomic distribution estimation. Phylogenetic analysis of marker genes, including 16S rRNA genes, DNA polymerase genes, and 31 selected marker genes, has also been applied to determining taxonomic distribution. That is, evolutionary relationships among marker genes such as 16S and DNA polymerase are used to determine the taxonomic distribution.

CARMA searches for conserved Pfam domains and protein families in the raw metagenomic sequences and classifies them into a higher-order taxonomy, based on the reconstruction of a phylogenetic tree of each matching Pfam family. That is, taxa are delimited by conservation within Pfam domains and protein families.

A resulting problem: studying the taxa requires binning the sequences, that is, clustering the assembled sequences. Most current tools work from DNA sequence composition. Most existing computational binning tools simply utilize DNA composition. The basis of these approaches is that genome G+C content, dinucleotide frequencies, and synonymous codon usage vary among organisms, and are generally characteristic of evolutionary lineages.

(Looking ahead, deeper research will probably expose many flaws in the current DNA-sequence-based methods. For example, should DNA conformation be taken into account? If so, current results are incomplete. The key is to identify which features represent a DNA sequence and which of them most affect clustering.)

Related tools include TETRA, MetaClust, and CompostBin.

TETRA uses z-scores from tetramer frequencies to classify metagenomic sequences.
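To make composition-based binning concrete, here is a simplified sketch that builds a tetranucleotide frequency profile for each sequence and compares two profiles by Pearson correlation. It is not TETRA's method: TETRA derives z-scores against an expected-frequency model, which is not reproduced here. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so profiles are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetramer_profile(seq):
    """Relative frequency of each tetranucleotide in seq (overlapping windows).

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)
```

Contigs whose profiles correlate strongly are candidates for the same bin. Raw frequencies like these are sensitive to sequence length, which is one reason real tools add a statistical model on top.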
MetaClust uses a combination of k-mer frequency metrics to score metagenomic sequences. CompostBin, a semi-supervised approach, uses a weighted PCA algorithm to project high dimensional DNA composition data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm to classify sequences into taxon-specific bins.

3) Function prediction

Algorithms and tools for this are still scarce. Predictions are mostly of COG families, KEGG families, and FIG families, and the annotation workflow mostly follows the conventional genome pipeline.

Tools: MG-RAST is an automatic server for subsystem annotation for metagenomic datasets, based on an extension of the very successful microbial genome annotation server RAST. The CD-HIT algorithm enables rapid analysis of the sequence diversity for very large metagenomic datasets using a clustering approach.

4) Comparative metagenomics

Mostly based on comparative analysis of sequences. Tools include UniFrac and MEGAN.

UniFrac, a very popular tool for comparing communities based on the lineages, calculates the phylogenetic distances between two communities as the fraction of the branch length of the phylogenetic tree.

MEGAN provides visual and statistical comparison of metagenomes based on the lineages they contain.

Beyond sequence information, microbial communities can also be compared based on other types of information, such as the functions encoded by metagenomes.

5) Statistical tools for metagenomics

Phylogeny-based statistical tools for comparing community structures include integral-LIBSHUFF, TreeClimber, and UniFrac.

AMOVA (analysis of molecular variance) determines whether the genetic diversity within two or more communities is greater than their pooled genetic diversity.

HOMOVA (homogeneity of molecular variance) determines whether the amount of genetic diversity in each community is significantly different.

Metastat was developed for detecting significantly different features (such as taxa, biological pathways, or gene families) between two populations, aiming to study how two populations differ from each other.
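The UniFrac idea mentioned above can be sketched under the standard definition of the unweighted variant: the distance is the branch length leading to members of only one community, divided by the total branch length leading to members of either. The tree encoding (a dict mapping each node to its parent and branch length) and the function names are assumptions made for this example, not the UniFrac tool's interface.

```python
def branches_to_root(leaves, parent):
    """Collect every branch, identified by its child node, on the paths
    from the given leaves up to the root."""
    seen = set()
    for node in leaves:
        # The root has no entry in parent, so the walk stops there.
        while node in parent and node not in seen:
            seen.add(node)
            node = parent[node][0]
    return seen


def unweighted_unifrac(community_a, community_b, parent):
    """Fraction of observed branch length that belongs to only one community."""
    branches_a = branches_to_root(community_a, parent)
    branches_b = branches_to_root(community_b, parent)
    total = sum(parent[n][1] for n in branches_a | branches_b)
    if total == 0:
        return 0.0
    unique = sum(parent[n][1] for n in branches_a ^ branches_b)
    return unique / total


# Hypothetical toy tree: node -> (parent node, branch length).
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("root", 2.0),
}
print(unweighted_unifrac({"A", "B"}, {"B", "C"}, toy_tree))
```

Identical communities share every branch and get distance 0, while communities with no shared branches get distance 1.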
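Each statistical tool has its own applicable conditions; see the article "Evaluating different approaches that test whether microbial communities have the same structure".

6) Modeling interactions between microbes and their environment

Research on this is still limited; proposed approaches include the "metabolic footprint" and network-based studies.

7) Points to watch during analysis

(1) 16S rRNA chimeras could lead to inaccurate estimation of the species diversity of a community. Chimeras can be reduced on two fronts: improvements to the experimental workflow or to emulsion PCR, and filtering at the data-processing stage with tools such as Bellerophon, Pintail, and Mallard.

(2) Artificial replicates may introduce systematic artifacts to the estimation of gene and taxon abundance. Studies found that between 11% and 35% of sequences in a typical metagenome are artificial replicates.

(3) Gene family frequencies derived based on read counts in metagenomic data may be unreliable due to different gene family lengths. The point is to remove the effect of gene length.

(4) Be aware of artificial pathways. MinPath explains all of the annotated functions with the smallest number of pathways.

8) Remaining challenges

(1) Scalability. Data volumes are huge: NGS produces data ever faster and in ever larger amounts, which poses a major challenge for computation and analysis.

(2) Integration of metaproteomic, metatranscriptomic, and metagenomic data sets. This means integrating and studying data from different levels: genome, transcriptome, and proteome.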
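Point (3) under item 7 warns that raw read counts favor long gene families. A minimal sketch of one way to correct for this divides each family's read count by its length before converting to relative abundance. The function name, the input format, and the example numbers are hypothetical, and the paper may use a different correction.

```python
def length_normalized_abundance(read_counts, family_lengths):
    """Relative abundance per gene family after dividing read counts by length.

    read_counts: family -> number of reads assigned to it.
    family_lengths: family -> representative length in bp (assumed known, > 0).
    """
    density = {
        family: read_counts[family] / family_lengths[family]
        for family in read_counts
    }
    total = sum(density.values())
    if total == 0:
        return {family: 0.0 for family in density}
    return {family: value / total for family, value in density.items()}


# Invented example: famA draws three times the reads only because it is
# three times as long, so the normalized abundances come out equal.
counts = {"famA": 300, "famB": 100}
lengths = {"famA": 3000, "famB": 1000}
print(length_normalized_abundance(counts, lengths))
```

Without the division by length, famA would appear three times as abundant as famB even though both are sampled at the same per-base rate.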
3777 views | 0 comments
