

Tag: catch





Chimera detection in 16S rRNA sequencing
xbinbzy 2015-10-29 10:42
In 16S rRNA analysis, identifying and removing chimeras is an important data-processing step.

Chimeras arise mainly from errors during PCR: "During this PCR amplification, chimeras might be created due to incomplete extension." Chimeras can account for more than 70% of the unique amplicons in a sample: "Likewise, the percentage of chimeric sequences in the unique amplicon pool of PCR-amplified samples might reach values higher than 70%." (When optimizing the experimental protocol, consider ways to reduce chimera formation.)

Strategies for handling chimeras fall into two main groups: reference-based and de novo.

Reference-based methods work as follows: "Reference-based methods basically screen the sequences potentially containing chimeras against a curated reference database with chimera-free sequences." Tools include Pintail and Bellerophon. ChimeraSlayer built on these with substantial changes and performance optimizations. Its basic principle is that it "uses 30% of each end as a seed for searching a reference data set, finding the closest parent (if any), performing alignments, and scoring to the candidate parents." Its weakness is that "it was not able to detect chimeras with a small chimeric range."

Reference-based UCHIME performs better than ChimeraSlayer: "In reference-based UCHIME, query sequences are divided into four nonoverlapping segments and searched against a reference database." Some studies report that ChimeraSlayer and reference-based UCHIME do worse than DECIPHER when long reads contain a short chimeric region: they "were reported to have a lower accuracy than that of DECIPHER in cases where the algorithms were challenged with a data set containing chimeric sequences with a short chimeric range and long sequence lengths." DECIPHER works as follows: "The DECIPHER algorithm is a search-based algorithm that splits the query sequence into different fragments and analyzes whether those fragments are uncommon in the reference phylogenetic group where the query sequence is classified. If a significant amount of fragments is assigned to a phylogenetic group different from the complete query sequence, the sequence is classified as chimeric."

In practice, it is hard to evaluate chimera detection tools in a fair and uniform way; each tool has its own range of applicability.

The de novo strategy rests on this principle: "De novo methodologies are generally based on the fact that parents of any chimeric sequence have gone through at least one more PCR cycle than chimeric sequences."
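As a rough illustration of the de novo principle: a parent that has been through at least one more PCR cycle should be more abundant than any chimera formed from it, so only more abundant sequences need to be considered as possible parents. The sketch below applies that filter to a toy read set. The function name `candidate_parents`, the parameter `min_ratio`, its value of 2.0, and the toy reads are all assumptions made for illustration; this is not the algorithm of any particular tool.

```python
from collections import Counter


def candidate_parents(reads, min_ratio=2.0):
    """Map each unique sequence to the unique sequences abundant enough to be its parents.

    min_ratio is a hypothetical threshold: a sequence qualifies as a candidate
    parent only if it is at least min_ratio times as abundant as the query.
    """
    counts = Counter(reads)
    return {
        query: sorted(
            seq
            for seq, n in counts.items()
            if seq != query and n >= min_ratio * counts[query]
        )
        for query in counts
    }


# Toy data (invented): two abundant sequences and one rare sequence.
reads = ["AAAA"] * 8 + ["CCCC"] * 6 + ["AACC"] * 2
parents = candidate_parents(reads)
print(parents["AACC"])  # ['AAAA', 'CCCC']
print(parents["AAAA"])  # []
```

A real tool would then align the query against pairs of candidate parents and score how well a two-parent model explains it; the filter above only narrows the search.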
De novo tools include Perseus, de novo UCHIME, and de novo ChimeraSlayer, all of which have been integrated into mothur. More recently, "the UPARSE pipeline was released, combining in one step chimera detection with clustering of sequencing reads into operational taxonomic units."

The reference-based and de novo strategies each have strengths and weaknesses:
1) Strength of reference-based methods: "In situations dealing with well-studied environments, the reference-based approaches were found to be very effective in distinguishing between chimeras and chimera-free (parent) sequences."
2) Weakness of reference-based methods: "efficiency is assumed to be lower when dealing with less well-known environments." This is exactly where de novo methods have the advantage.
3) Weakness of de novo methods: "most of the de novo approaches depend on redundancy differences between chimeras and parents, assuming that the number of parent sequences has to be at least one time more redundant than their corresponding chimeric sequences. This requires data abundances to have been reported with high accuracy." (This raises the question of how much data is needed to guarantee good results.)

In 2015, CATCh (Combining Algorithms to Track Chimeras) appeared. It takes the output of other chimera detection tools as input, builds a classification model by supervised learning, validates the model's accuracy on a test set, and uses the final model to identify chimeras. It is a classifier "which is able to discriminate between chimeric and nonchimeric sequences based on a specific set of input data (called features in the context of machine learning)." The input to this tool is not the sequencing reads but the chimera calls made by different tools: "With this tool, we use as input data not the sequence read characteristics but rather the results (e.g., scores) of different individual chimera detection tools mentioned above and integrate them into one prediction. All different tools are run separately, and their output values are combined and processed by the classifier in order to give a prediction of whether a read is a chimera or not."

The tool works in three main steps: (1) "the necessary input features (i.e., output values of the different chimera detection tools) are identified." (2) "the classifier is trained via a supervised learning approach.
In this step, the classifier learns to make a correct prediction based on example input data; in our case, training data consist of the output features of a set of sequence reads obtained from different chimera detection tools, together with their correct classification (i.e., whether this read is a chimeric sequence or not)." (3) "In the third step, the trained classifier can be used to predict chimeric sequences in new, previously unseen data (i.e., data that did not belong to the training data)."

"By feeding the outputs of the different individual chimera detection tools into the classifier, CATCh is able to classify them into chimeric and chimera-free subsets. As two different types of chimera detection tools exist, either reference based or de novo, we also developed two different versions of CATCh. In order to illustrate its performance, CATCh (reference based as well as de novo) was benchmarked against other chimera detection tools using various publicly available benchmark data."

(Since the input consists of other tools' detection results, does disagreement among those tools affect the model's output?)

Reference: Mysara M, Saeys Y, Leys N, et al. CATCh, an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Appl Environ Microbiol. 2015, 81(5):1573-84. doi: 10.1128/AEM.02896-14
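The three steps described in this post (collect the tools' outputs as features, train a classifier on labeled examples, predict on unseen reads) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the three score columns and all numbers are invented, a plain logistic regression stands in for whatever classifier CATCh actually uses, and the names `train_logistic` and `predict_chimera` are hypothetical.

```python
import math


def train_logistic(features, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights (last entry is the bias) by batch gradient descent."""
    n_feat = len(features[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w


def predict_chimera(w, x):
    """Return True if the combined tool scores are classified as chimeric."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return z > 0.0


# Step 1: features = scores from three hypothetical detection tools (invented values).
train_x = [
    [0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],  # known chimeras
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2],  # known non-chimeras
]
train_y = [1, 1, 1, 0, 0, 0]

# Step 2: supervised training on the labeled examples.
weights = train_logistic(train_x, train_y)

# Step 3: prediction on reads that were not part of the training data.
print(predict_chimera(weights, [0.9, 0.9, 0.9]))
print(predict_chimera(weights, [0.1, 0.1, 0.1]))
```

Feeding in a read on which the tools disagree, for example scores of 0.9, 0.2, and 0.8, is one way to explore the question raised above about conflicting inputs.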
Category: Research articles | 7625 reads | 0 comments
laplace 2012-5-19 16:58
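The biggest recent Evernote news is undoubtedly that Evernote is setting up dedicated servers for its Chinese customers. Given how poor network conditions are in China, putting dedicated servers there to improve service for Chinese users makes perfect sense; after all, the broader environment can't be changed. For the related news, see "Evernote to build a data center in China".

However, some people think speed matters most, while others worry about the security of their data. So choose freely between "印象笔记" (Yinxiang Biji) and "international Evernote". As for me, naturally I want the best of both worlds: the security of the international version and the speed of the Chinese version. (If neither you nor anyone around you has run into a security problem, you may not know what the security issue even is. Haha, in that case just ignore it.)

I can well understand the background to Evernote's decision. When I first used Evernote on my phone, I downloaded it from Google's market, hoping to capture flashes of inspiration in fleeting moments such as in the "bathroom" or on the road. Using it was painful, though: the app opened very slowly (partly my phone's fault, of course, since it is a veteran first-generation Milestone), and data sync was slow too. It was completely unbearable. Inspiration won't wait for a sluggish Evernote, just as when I step into an elevator and try to open Douban Books (豆瓣读书) to read for a moment, only to find that it finishes buffering just as I have to step out. So I decisively abandoned Evernote on my phone. After trying a number of similar apps, I settled on Catch and have used it ever since. It fully met my needs on the phone: powerful, quick to open, and born to be an "inspiration recorder", although I never used its other features such as photos, voice recording, and timed reminders. Unfortunately, the web version of Catch is now extremely slow and often fails to load.

Now that Evernote's China edition, "印象笔记", is available, I have dropped Catch completely and returned to Evernote's embrace. My current setup is:

1. My earlier international account stays as it is, still on the California servers; that is, I log in to the international Evernote account. It holds experiment records and summaries, purely for research, and syncs between my two computers in the dorm and the lab. I use it only in the PC client, with no plugins and no phone client login.

2. I created a new "印象笔记" account, which I never log in to from the PC client (I have to complain: the free version of Evernote can't switch freely between two accounts). At a computer I use only the web version; away from a computer I use the Android version. The Clearly extension for Chrome is also bound to this account (I strongly recommend it; it is a killer feature in Evernote's hands, as a comparison with other note-taking software makes immediately obvious). I have also installed Skitch on the phone, to sketch my own designs now and then and sync them to the corresponding Evernote account in the phone client.

Note: partially revised on May 18, 2013.
Catch Notes: https://catch.com/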
Category: Everyday life | 4597 reads | 0 comments
The film Catch and Release
Popularity 1 lightw626 2011-2-28 17:01
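I stayed up late to watch the film Catch and Release. The movie channel's Chinese title is 捉放爱 (roughly "Catch, Release, Love"), a very literal translation. The film shows how people from different cultural backgrounds understand love, loyalty, friendship, and sex. A few takeaways to share:
1. Don't assume you fully understand the people you know well; what you know may be only a small part.
2. One side effect of sexual openness: when you want to be a responsible mother, you may not know who the child's father is.
3. Sometimes overthinking is unnecessary and only wastes time.
4. The course of a life changes completely with the people and events one meets; every second we are walking a path that had alternatives.
5. You may not fully know yourself either: without the right trigger, another side of your energy stays dormant.
6. Being young is great because you can make more choices; being young is also troubling because there are too many choices to face.
7. Westerners also kiss and hug to express friendship, which is uncommon in China.
8. If your partner wants to tell you something bad, don't stop them; let them say it, so that you are not shocked later.
9. Appearances are only a reference when judging a person's true nature.
10. Being kind to others is also being kind to yourself.
11. Fragments you happen to see often give a misleading picture of the whole.
12. Listen to your own heart and face the outside world calmly.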
Category: Film and TV | 3502 reads | 1 comment


Powered by ScienceNet.cn

Copyright © 2007- China Science Daily
