Xiaoke Robot (小柯机器人)

A parameter-free framework for fast sequence alignment
2023-08-16 15:27

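Pavel A. Pevzner's group at the University of California, San Diego, has developed UniAligner, a parameter-free framework for fast sequence alignment. The study was published in Nature Methods on August 14, 2023.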

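The authors present UniAligner, a parameter-free sequence alignment algorithm whose alignment scoring is sequence-dependent and changes automatically for each pair of compared sequences. UniAligner prioritizes matches of rare substrings, which are more likely to be relevant to the evolutionary relationship between the two sequences. They apply UniAligner to estimate mutation rates in human centromeres and to quantify the extremely high rate of large duplications and deletions there. This high rate suggests that centromeres may be among the most rapidly evolving regions of the human genome with respect to their structural organization.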
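To make the idea of prioritizing rare substrings concrete, here is a minimal Python sketch. It is not UniAligner's algorithm or scoring: the function names, the fixed k-mer length `k`, and the weight `1 / (count_in_a * count_in_b)` are all assumptions chosen for illustration. It only shows how shared substrings that occur rarely in both sequences can be ranked above substrings that belong to a repeat.

```python
def kmer_positions(seq, k):
    """Map each k-mer to the list of start positions where it occurs in seq."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def rare_match_anchors(a, b, k=4):
    """Return (weight, pos_in_a, pos_in_b, kmer) for shared k-mers, rarest first.

    The weight 1 / (count_in_a * count_in_b) is a hypothetical choice made
    for this sketch; it is not the scoring scheme used by UniAligner.
    """
    pos_a = kmer_positions(a, k)
    pos_b = kmer_positions(b, k)
    anchors = []
    for kmer in pos_a.keys() & pos_b.keys():
        weight = 1.0 / (len(pos_a[kmer]) * len(pos_b[kmer]))
        for i in pos_a[kmer]:
            for j in pos_b[kmer]:
                anchors.append((weight, i, j, kmer))
    # Highest weight (rarest k-mer) first; ties are broken by position.
    anchors.sort(key=lambda t: (-t[0], t[1], t[2]))
    return anchors


if __name__ == "__main__":
    # Toy sequences: b carries one extra copy of the repeat unit "ACGT".
    a = "ACGTACGTTTGAACGT"
    b = "ACGTACGTACGTTTGAACGT"
    for weight, i, j, kmer in rare_match_anchors(a, b)[:3]:
        print(f"{kmer} a[{i}] b[{j}] weight={weight:.3f}")
```

In this toy pair, the repeat unit `ACGT` occurs several times in each sequence and therefore receives a low weight, while the k-mers around the unique `TTGA` insert occur once in each sequence and rank first. An aligner built on such weights would anchor on the unique region instead of on an arbitrary pairing of repeat copies.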

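According to the authors, although recent advances in "complete genomics" have revealed previously inaccessible genomic regions, the analysis of variation in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge, because there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, classical alignment approaches such as the Smith-Waterman algorithm fail to construct biologically adequate alignments of ETRs.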

Appendix: original English text

Title: UniAligner: a parameter-free framework for fast sequence alignment

Author: Bzikadze, Andrey V., Pevzner, Pavel A.

Issue&Volume: 2023-08-14

Abstract: Even though the recent advances in ‘complete genomics’ revealed the previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith–Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner—the parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.

DOI: 10.1038/s41592-023-01970-4

Source: https://www.nature.com/articles/s41592-023-01970-4

Nature Methods: launched in 2004 and published by Springer Nature. Latest impact factor: 47.99
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex


Articles in this issue: Nature Methods, online publication
