Xiaoke Robot (小柯机器人)

Scientists achieve efficient and accurate genotyping across a wide range of variant types
2022-04-16 14:19

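Tobias Marschall's team at Heinrich Heine University Düsseldorf, Germany, has reported a new advance. They showed that pangenome-based genome inference enables efficient and accurate genotyping across a wide spectrum of variant classes. The paper was published in Nature Genetics on April 11, 2022.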

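They propose a new algorithm, PanGenie, which leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process they call genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordance for almost all variant types and coverages tested. The improvements are especially pronounced for large insertions (≥50 bp) and for variants in repetitive regions, making it possible to include these variant classes in genome-wide association studies. PanGenie makes efficient use of the growing number of haplotype-resolved assemblies to uncover the functional impact of previously inaccessible variants, while being faster than alignment-based workflows.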
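To make the idea of k-mer counts concrete, here is a minimal Python sketch. It is not PanGenie's implementation and it leaves out the haplotype-resolved pangenome reference entirely. It only counts k-mers in a few toy reads and tallies how many of them support each of two alleles at one site. The sequences, the value of `k`, and the helper names `count_kmers` and `allele_support` are all hypothetical and chosen for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer (substring of length k) across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def allele_support(read_counts, allele_kmers):
    """Sum the read k-mer counts over k-mers that are unique to one allele."""
    return sum(read_counts[kmer] for kmer in allele_kmers)


# Toy data (hypothetical): two alleles at one site that differ by one base.
k = 5
ref_allele = "ACGTACGGA"
alt_allele = "ACGTTCGGA"

# k-mers that occur in one allele but not in the other.
ref_kmers = set(count_kmers([ref_allele], k))
alt_kmers = set(count_kmers([alt_allele], k))
ref_only = ref_kmers - alt_kmers
alt_only = alt_kmers - ref_kmers

# Toy short reads (hypothetical): one from the reference allele, two from the alternative.
reads = ["ACGTACGGA", "ACGTTCGGA", "CGTTCGGA"]
read_counts = count_kmers(reads, k)

ref_support = allele_support(read_counts, ref_only)
alt_support = allele_support(read_counts, alt_only)
print("ref support:", ref_support, "alt support:", alt_support)
```

No read is aligned to a reference in this sketch: the evidence for each allele comes purely from counting. A real genotyper would additionally have to account for sequencing errors, coverage, and k-mers shared with other parts of the genome.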

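By way of background, typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference bias and carries a substantial computational burden. In addition, short read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers.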

Appendix: original English text

Title: Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Author: Ebler, Jana, Ebert, Peter, Clarke, Wayne E., Rausch, Tobias, Audano, Peter A., Houwaart, Torsten, Mao, Yafei, Korbel, Jan O., Eichler, Evan E., Zody, Michael C., Dilthey, Alexander T., Marschall, Tobias

Issue&Volume: 2022-04-11

Abstract: Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation—a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.

DOI: 10.1038/s41588-022-01043-w

Source: https://www.nature.com/articles/s41588-022-01043-w

 

Nature Genetics: founded in 1992. Published by Springer Nature. Latest IF: 41.307
Official website: https://www.nature.com/ng/
Submission link: https://mts-ng.nature.com/cgi-bin/main.plex


Article in this issue: Nature Genetics, published online
