xbinbzy's personal blog http://blog.sciencenet.cn/u/xbinbzy


HMP-MetaORFA

Posted 2015-8-25 16:18 | Personal category: Research articles | System category: Research notes | Keywords: scholar, HMP project

Paper: An ORFome assembly approach to metagenomics sequences analysis

Journal: J Bioinform Comput Biol

Year: 2009


Challenges of metagenome assembly:

   1) Metagenomics projects often use NGS techniques, which produce shorter reads. As a result, many short repeats can increase the complexity of the overlap graph and cause many more mis-assemblies.

   2) Unlike conventional genome shotgun sequencing, which handles a single species, metagenomic sequencing collects reads from a large number of different genomes.

Basic principle:


   We implemented a tool called MetaORFA in C/C++ on Linux platforms for ORFome assembly. MetaORFA consists of two programs. One takes a set of reads as input and predicts a number of putative ORFs; the other (EULER-ORFA) takes the set of putative ORFs as input and reports a set of peptides corresponding to the assembled ORFs. Before being supplied to MetaORFA, the original reads were first processed by MDUST (a TIGR tool for autonomous masking that implements the DUST algorithm) to mask low-complexity regions, and then by Tandem Repeat Finder (TRF v4.0) to mask short tandem repeats.
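The post does not describe EULER-ORFA's internals beyond its descent from EULER, which assembles fragments with a de Bruijn graph. As a rough, hypothetical illustration of that idea applied to peptides, the Python sketch below breaks each peptide into k-mers, links overlapping (k-1)-mers, and spells out maximal non-branching paths (unitigs). It is a simplified stand-in, not EULER-ORFA: it performs no error correction and no Eulerian superpath resolution, it ignores isolated cycles, and the names `build_graph`, `assemble_unitigs` and the default `k = 10` are assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(peptides: list[str], k: int):
    """De Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    # Peptides shorter than k contribute no k-mers; duplicate k-mers collapse.
    kmers = {p[i:i + k] for p in peptides for i in range(len(p) - k + 1)}
    out_edges: dict[str, set[str]] = defaultdict(set)
    in_deg: dict[str, int] = defaultdict(int)
    for kmer in kmers:
        out_edges[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
        in_deg[kmer[1:]] += 1
    return out_edges, in_deg


def assemble_unitigs(peptides: list[str], k: int = 10) -> list[str]:
    """Spell out maximal non-branching paths (unitigs) of the graph."""
    out_edges, in_deg = build_graph(peptides, k)
    nodes = set(out_edges) | set(in_deg)

    def one_in_one_out(node: str) -> bool:
        # .get() avoids inserting new keys into the defaultdicts.
        return in_deg.get(node, 0) == 1 and len(out_edges.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: covered by a path from a branch/end point
        for nxt in out_edges.get(node, ()):
            path = node + nxt[-1]
            # Extend while the path cannot branch or merge.
            while one_in_one_out(nxt):
                nxt = next(iter(out_edges[nxt]))
                path += nxt[-1]
            contigs.append(path)
    # Limitation of this sketch: isolated cycles are never reported.
    return sorted(contigs)
```

For example, with `k=5` the overlapping peptides `MKTAYIAKQR` and `AYIAKQRQISFVK` merge into the single unitig `MKTAYIAKQRQISFVK`, while a branch point (two peptides sharing only a prefix) splits the output into separate unitigs. Working on peptides rather than DNA is the point of the ORFome approach: the graph only contains sequence from predicted ORFs.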


The step most critical to assembly quality is ORF prediction, which works as follows:

   For each read (and its reverse complement), any region that begins at the start of the read (i.e., position 1, 2, or 3, depending on the reading frame) or at a start codon, and ends at the end of the read or at a stop codon, is considered a potential ORF. Only ORFs longer than a threshold of K codons (default K = 25) are reported. These ORFs are then translated into peptide sequences and assembled with the EULER-ORFA algorithm, a modification of the original EULER algorithm designed for DNA fragment assembly.
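The ORF-calling rule above can be sketched in Python as follows. This is a minimal sketch, not MetaORFA's code, and it rests on several assumptions: reads are uppercase DNA strings, the standard genetic code applies, ATG is the only start codon, a stop codon is excluded from the ORF it terminates, and start codons inside an already open ORF are ignored. The function names are hypothetical; `k` plays the role of the threshold K.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order
# (assumption: no alternative genetic codes, ATG as the sole start codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
START, STOP = "ATG", "*"


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(codons: list[str]) -> str:
    # Codons containing N or other symbols are translated as "X".
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def call_orfs(read: str, k: int = 25) -> list[str]:
    """Return peptides for putative ORFs with more than k codons."""
    peptides = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            codons = [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]
            start = 0  # the first ORF may begin at the frame's first codon
            for idx, codon in enumerate(codons):
                if CODON_TABLE.get(codon) == STOP:
                    # Close the open ORF (stop codon excluded) if long enough.
                    if start is not None and idx - start > k:
                        peptides.append(translate(codons[start:idx]))
                    start = None  # the next ORF must begin at a start codon
                elif start is None and codon == START:
                    start = idx
            # An ORF still open at the end of the frame runs to the read end.
            if start is not None and len(codons) - start > k:
                peptides.append(translate(codons[start:]))
    return peptides
```

For example, the peptides returned by `call_orfs("ATG" + "GCT" * 30 + "TAA")` include `"M" + "A" * 30` from the first forward frame; the other frames contribute further, likely spurious, candidates that the downstream assembly has to tolerate.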

   If ORF-based assembly is indeed a very good strategy, then ORF prediction is an important place to look for improvements.



https://m.sciencenet.cn/blog-306699-915864.html
