woodcorpse's personal blog http://blog.sciencenet.cn/u/woodcorpse

QIIME 2 Tutorial. 20 Utilities (2020.11)

Posted 2021-2-3 21:44 | Personal category: QIIME2 | System category: Research notes

Utilities in QIIME 2

https://docs.qiime2.org/2020.11/tutorials/utilities/

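QIIME 2 provides many utilities that are not plugin-based. This document demonstrates many of them. It is organized by interface and cross-references similar functionality available in other interfaces.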

conda activate qiime2-2020.11

Command line: q2cli

Most of the interesting utilities can be found in the tools subcommand of q2cli:

qiime tools --help

This prints the following:

Usage: qiime tools [OPTIONS] COMMAND [ARGS]...

  Tools for working with QIIME 2 files.

Options:
  --help      Show this message and exit.

Commands:
  citations         Print citations for a QIIME 2 result.
  export            Export data from a QIIME 2 Artifact or a Visualization
  extract           Extract a QIIME 2 Artifact or Visualization archive.
  import            Import data into a new QIIME 2 Artifact.
  inspect-metadata  Inspect columns available in metadata.
  peek              Take a peek at a QIIME 2 Artifact or Visualization.
  validate          Validate data in a QIIME 2 Artifact.
  view              View a QIIME 2 Visualization.

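Let's get our hands on some data so that we can learn more about this functionality! First, we will take a look at the taxonomic bar charts from the PD Mice Tutorial: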

mkdir -p utilities && cd utilities
wget -c https://data.qiime2.org/2020.11/tutorials/utilities/taxa-barplot.qzv

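Retrieving Citations

Now that we have some results, let's learn more about the citations relevant to the creation of this visualization. First, we can check the help text for the qiime tools citations command: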

qiime tools citations --help

Output:

Usage: qiime tools citations [OPTIONS] ARTIFACT/VISUALIZATION

  Print citations as a BibTex file (.bib) for a QIIME 2 result.

Options:
  --help      Show this message and exit.

Now that we know how to use the command, we will run the following:

qiime tools citations taxa-barplot.qzv

The output is:

@article{framework|qiime2:2019.10.0|0,
 author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R. and Bokulich, Nicholas A. and Abnet, Christian C. and Al-Ghalith, Gabriel A. and Alexander, Harriet and Alm, Eric J. and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E. and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J. and Brown, C. Titus and Callahan, Benjamin J. and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily K. and Da Silva, Ricardo and Diener, Christian and Dorrestein, Pieter C. and Douglas, Gavin M. and Durall, Daniel M. and Duvallet, Claire and Edwardson, Christian F. and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M. and Gibbons, Sean M. and Gibson, Deanna L. and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin A. and Janssen, Stefan and Jarmusch, Alan K. and Jiang, Lingjing and Kaehler, Benjamin D. and Kang, Kyo Bin and Keefe, Christopher R. and Keim, Paul and Kelley, Scott T. and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan G. I. and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan D. and McDonald, Daniel and McIver, Lauren J. and Melnik, Alexey V. and Metcalf, Jessica L. and Morgan, Sydney C. and Morton, Jamie T. and Naimey, Ahmad Turan and Navas-Molina, Jose A. and Nothias, Louis Felix and Orchanian, Stephanie B. and Pearson, Talima and Peoples, Samuel L. and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, Michael S. and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R. and Swafford, Austin D. and Thompson, Luke R. and Torres, Pedro J. 
and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J. and Ul-Hasan, Sabah and van der Hooft, Justin J. J. and Vargas, Fernando and Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C. and Williamson, Charles H. D. and Willis, Amy D. and Xu, Zhenjiang Zech and Zaneveld, Jesse R. and Zhang, Yilong and Zhu, Qiyun and Knight, Rob and Caporaso, J. Gregory},
 doi = {10.1038/s41587-019-0209-9},
 issn = {1546-1696},
 journal = {Nature Biotechnology},
 number = {8},
 pages = {852-857},
 title = {Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2},
 url = {https://doi.org/10.1038/s41587-019-0209-9},
 volume = {37},
 year = {2019}
}

@article{view|types:2019.10.0|BIOMV210DirFmt|0,
 author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
 doi = {10.1186/2047-217X-1-7},
 journal = {GigaScience},
 number = {1},
 pages = {7},
 publisher = {BioMed Central},
 title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
 volume = {1},
 year = {2012}
}

@inproceedings{view|types:2019.10.0|pandas.core.frame:DataFrame|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@inproceedings{view|types:2019.10.0|pandas.core.series:Series|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@article{view|types:2019.10.0|biom.table:Table|0,
 author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
 doi = {10.1186/2047-217X-1-7},
 journal = {GigaScience},
 number = {1},
 pages = {7},
 publisher = {BioMed Central},
 title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
 volume = {1},
 year = {2012}
}

@article{framework|qiime2:2019.4.0|0,
 author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R and Bokulich, Nicholas A and Abnet, Christian and Al-Ghalith, Gabriel A and Alexander, Harriet and Alm, Eric J and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J and Brown, C Titus and Callahan, Benjamin J and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily and Da Silva, Ricardo and Dorrestein, Pieter C and Douglas, Gavin M and Durall, Daniel M and Duvallet, Claire and Edwardson, Christian F and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M and Gibson, Deanna L and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin and Janssen, Stefan and Jarmusch, Alan K and Jiang, Lingjing and Kaehler, Benjamin and Kang, Kyo Bin and Keefe, Christopher R and Keim, Paul and Kelley, Scott T and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan GI and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan and McDonald, Daniel and McIver, Lauren J and Melnik, Alexey V and Metcalf, Jessica L and Morgan, Sydney C and Morton, Jamie and Naimey, Ahmad Turan and Navas-Molina, Jose A and Nothias, Louis Felix and Orchanian, Stephanie B and Pearson, Talima and Peoples, Samuel L and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, II, Michael S and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R and Swafford, Austin D and Thompson, Luke R and Torres, Pedro J and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J and Ul-Hasan, Sabah and van der Hooft, Justin JJ and Vargas, Fernando and 
Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C and Williamson, Chase HD and Willis, Amy D and Xu, Zhenjiang Zech and Zaneveld, Jesse R and Zhang, Yilong and Knight, Rob and Caporaso, J Gregory},
 doi = {10.7287/peerj.preprints.27295v1},
 issn = {2167-9843},
 journal = {PeerJ Preprints},
 month = {oct},
 pages = {e27295v1},
 title = {QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science},
 url = {https://doi.org/10.7287/peerj.preprints.27295v1},
 volume = {6},
 year = {2018}
}

@article{action|feature-classifier:2019.4.0|method:fit_classifier_naive_bayes|0,
 author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, édouard},
 journal = {Journal of machine learning research},
 number = {Oct},
 pages = {2825--2830},
 title = {Scikit-learn: Machine learning in Python},
 volume = {12},
 year = {2011}
}

@inproceedings{view|types:2019.4.1|pandas.core.series:Series|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@article{plugin|feature-classifier:2019.4.0|0,
 author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
 doi = {10.1186/s40168-018-0470-z},
 journal = {Microbiome},
 number = {1},
 pages = {90},
 title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
 url = {https://doi.org/10.1186/s40168-018-0470-z},
 volume = {6},
 year = {2018}
}

@article{plugin|dada2:2019.10.0|0,
 author = {Callahan, Benjamin J and McMurdie, Paul J and Rosen, Michael J and Han, Andrew W and Johnson, Amy Jo A and Holmes, Susan P},
 doi = {10.1038/nmeth.3869},
 journal = {Nature methods},
 number = {7},
 pages = {581},
 publisher = {Nature Publishing Group},
 title = {DADA2: high-resolution sample inference from Illumina amplicon data},
 volume = {13},
 year = {2016}
}

@article{action|feature-classifier:2019.10.0|method:classify_sklearn|0,
 author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, édouard},
 journal = {Journal of machine learning research},
 number = {Oct},
 pages = {2825--2830},
 title = {Scikit-learn: Machine learning in Python},
 volume = {12},
 year = {2011}
}

@article{plugin|feature-classifier:2019.10.0|0,
 author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
 doi = {10.1186/s40168-018-0470-z},
 journal = {Microbiome},
 number = {1},
 pages = {90},
 title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
 url = {https://doi.org/10.1186/s40168-018-0470-z},
 volume = {6},
 year = {2018}
}

As you can see, the citations for this particular visualization are presented above in BibTeX format.
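
Because the citations are written to standard output, they can be redirected into a .bib file. The following is a minimal sketch under that assumption. The helper count_bibtex_entries and the file name taxa-barplot.bib are hypothetical names chosen for illustration and are not part of QIIME 2. The helper counts entries and distinct titles, which is useful because the same paper is often listed once per plugin version, as with the BIOM and pandas entries above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): summarize a BibTeX file.
# Prints "<total entries> <distinct title lines>".
count_bibtex_entries() {
    local bib_file=$1
    local total unique
    # Every BibTeX entry starts with '@' at the beginning of a line.
    total=$(grep -c '^@' "$bib_file")
    # Title fields appear as ' title = {...},'; 'booktitle' lines do not match.
    unique=$(grep -E '^[[:space:]]*title[[:space:]]*=' "$bib_file" | sort -u | wc -l)
    # The arithmetic expansion strips padding that some wc implementations add.
    echo "$total $((unique))"
}

# Intended usage, assuming qiime is on PATH and taxa-barplot.qzv is present:
#   qiime tools citations taxa-barplot.qzv > taxa-barplot.bib
#   count_bibtex_entries taxa-barplot.bib
```

Titles are compared as whole lines, so two entries for the same paper only collapse when their title lines are identical.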

We can also see the citations for a specific plugin:

qiime vsearch --citations

This prints:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
 author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
 doi = {10.7717/peerj.2584},
 journal = {PeerJ},
 pages = {e2584},
 publisher = {PeerJ Inc.},
 title = {VSEARCH: a versatile open source tool for metagenomics},
 volume = {4},
 year = {2016}
}

And also for a specific action of a plugin:

qiime vsearch cluster-features-open-reference --citations

This prints:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
 author = {Rideout, Jai Ram and He, Yan and Navas-Molina, Jose A. and Walters, William A. and Ursell, Luke K. and Gibbons, Sean M. and Chase, John and McDonald, Daniel and Gonzalez, Antonio and Robbins-Pianka, Adam and Clemente, Jose C. and Gilbert, Jack A. and Huse, Susan M. and Zhou, Hong-Wei and Knight, Rob and Caporaso, J. Gregory},
 doi = {10.7717/peerj.545},
 journal = {PeerJ},
 pages = {e545},
 publisher = {PeerJ Inc.},
 title = {Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences},
 volume = {2},
 year = {2014}
}

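Viewing Visualizations

What if we want to view our taxa bar plots? One option is to load the visualization at https://view.qiime2.org. Another option is to use qiime tools view to accomplish the job.

Note: Provenance can only be viewed at https://view.qiime2.org.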

qiime tools view taxa-barplot.qzv

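This step requires a graphical environment, for example a Linux or macOS desktop session. On Windows, you can connect to the Linux machine through remote desktop (see the post "Windows 10 remote desktop to Ubuntu"), or configure your terminal for X11 forwarding (for example XShell with Xmanager, or PuTTY with Xming; this is not recommended because it is extremely slow).

This will open a browser window with your visualization loaded in it. When you are done, you can close the browser window and press ctrl-c on the keyboard to terminate the command.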

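Peeking at Results

Oftentimes we need to verify the type and UUID of an Artifact. We can use the qiime tools peek command to view a brief summary report of those facts. First, let's get some data to look at: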

wget -c https://data.qiime2.org/2020.11/tutorials/utilities/faith-pd-vector.qza

Now that we have data, we can learn more about the file:

qiime tools peek faith-pd-vector.qza

This prints:

UUID:        d5186dce-438d-44bb-903c-cb51a7ad4abe
Type:        SampleData[AlphaDiversity] % Properties('phylogenetic')
Data format: AlphaDiversityDirectoryFormat

Here we can see that the type of the Artifact is SampleData[AlphaDiversity] % Properties('phylogenetic'), as well as the Artifact's UUID and format.
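
In scripts it can be handy to pull a single field out of this report, for example to record the UUID in a log. The sketch below assumes the three-line "Name: value" layout shown above. The helper peek_field is a hypothetical name used for illustration and is not a QIIME 2 command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): read `qiime tools peek` output
# on stdin and print the value of one field, e.g. UUID, Type, "Data format".
peek_field() {
    local field=$1
    awk -v key="$field" '
        # Keep only the line that starts with "<field>:".
        index($0, key ":") == 1 {
            # Drop the "<field>:" prefix, then the padding that follows it.
            value = substr($0, length(key) + 2)
            sub(/^[ \t]+/, "", value)
            print value
        }'
}

# Intended usage, assuming qiime is on PATH and the file from above exists:
#   qiime tools peek faith-pd-vector.qza | peek_field UUID
#   qiime tools peek faith-pd-vector.qza | peek_field "Data format"
```

Matching on the literal field name followed by a colon avoids splitting the Type value, which itself contains spaces and a percent sign.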

Validating Results

We can also validate the integrity of the file by running qiime tools validate:

qiime tools validate faith-pd-vector.qza

This prints:

Result faith-pd-vector.qza appears to be valid at level=max.

If there is a problem with the file, this command will usually do a good job of reporting what the problem is (within reason).
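
When a directory holds many results, the same check can be run in a loop. The following is a rough sketch, not an official workflow. It assumes that qiime tools validate exits with a non-zero status when a file is invalid. The helper validate_all and the VALIDATE_CMD variable are hypothetical names introduced here so that the loop can be tried without QIIME 2 installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): validate every file given as an
# argument and print one status line per file.
# VALIDATE_CMD is an assumption of this sketch; when unset, the real
# `qiime tools validate` command is used.
validate_all() {
    local failed=0 path
    for path in "$@"; do
        # Left unquoted on purpose so "qiime tools validate" splits into words.
        if ${VALIDATE_CMD:-qiime tools validate} "$path" >/dev/null 2>&1; then
            echo "ok      $path"
        else
            echo "FAILED  $path"
            failed=$((failed + 1))
        fi
    done
    # Succeed only when every file passed.
    [ "$failed" -eq 0 ]
}

# Intended usage, assuming qiime is on PATH:
#   validate_all *.qza *.qzv
```

For the files that fail, rerunning qiime tools validate on them directly shows the detailed error message that the loop discards.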

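Inspecting Metadata

In the Metadata tutorial, we learned about the metadata tabulate command and the visualization it creates. Oftentimes we don't care so much about the values of the Metadata, but rather just its shape: how many columns are there? What are their names? What are their types? How many rows (or IDs) are in the file?

We can demonstrate this by first downloading some sample metadata: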

wget -c http://210.75.224.110/github/QIIME2ChineseManual/2020.11/utilites/sample-metadata.tsv

Then we run the qiime tools inspect-metadata command:

qiime tools inspect-metadata sample-metadata.tsv

This prints:

              COLUMN NAME  TYPE
=========================  ===========
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric
genotype_and_donor_status  categorical
=========================  ===========
                     IDS:  48
                 COLUMNS:  8

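Question: How many metadata columns are there in sample-metadata.tsv? How many IDs? Identify how many categorical columns are present.

This tool can be very helpful for learning the Metadata column names of files that are viewable as Metadata.

Translator's note: From this output we know the number of rows and columns (48 rows/IDS correspond to 48 samples; 8 columns/COLUMNS correspond to 8 sample attributes), and whether each column is categorical or numeric.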
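
The question above can also be answered mechanically by filtering the report on its TYPE column. The sketch below assumes the table layout printed by qiime tools inspect-metadata in this tutorial. The helper count_columns_of_type is a hypothetical name used for illustration and is not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): read `qiime tools inspect-metadata`
# output on stdin and print how many columns have the requested type.
count_columns_of_type() {
    local wanted=$1
    # Column rows end with their type. The header row, the "====" rulers and
    # the IDS:/COLUMNS: summary lines end with something else, so they are skipped.
    awk -v wanted="$wanted" '$NF == wanted { n++ } END { print n + 0 }'
}

# Intended usage, assuming qiime is on PATH and the file from above exists:
#   qiime tools inspect-metadata sample-metadata.tsv | count_columns_of_type categorical
#   qiime tools inspect-metadata sample-metadata.tsv | count_columns_of_type numeric
```

For the table shown above this gives 7 categorical columns and 1 numeric column, which together match the COLUMNS total of 8. Testing only the last field keeps the helper working for column names that contain spaces, such as the Axis columns of a PCoA result.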

wget -c https://data.qiime2.org/2020.11/tutorials/utilities/jaccard-pcoa.qza

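The file we just downloaded is a Jaccard PCoA (from the PD Mice Tutorial), which can be used in place of a "typical" TSV-formatted Metadata file. We might need to know the column names for a command we wish to run; with inspect-metadata we can find them all: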

qiime tools inspect-metadata jaccard-pcoa.qza

This prints:

COLUMN NAME  TYPE   
===========  =======
     Axis 1  numeric
     Axis 2  numeric
     ...     numeric
    Axis 47  numeric
===========  =======
       IDS:  47
   COLUMNS:  47

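Question: How many IDs are there? How many columns? Are there any categorical columns? Why?

Translator's note: There are 47 IDs and 47 columns, and no categorical columns, because PCoA results are coordinate values, which are numeric.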

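Artifact API

Coming soon, stay tuned!

About the translator

Yong-Xin Liu, PhD, is a member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences and a contributor to the QIIME 2 project. He graduated from Northeast Agricultural University in 2008 with a degree in microbiology and received his PhD in bioinformatics from the University of Chinese Academy of Sciences in 2014. After completing postdoctoral research in genetics in 2016, he stayed on at the institute as an engineer. His main research interest is metagenomic data analysis. He has published more than 30 papers in journals including Science, Nature Biotechnology, Protein & Cell, and Current Opinion in Microbiology, with more than 2,000 citations. In July 2017 he founded the WeChat public account "宏基因组" (Metagenome), which has shared more than 2,400 original articles on metagenomics and amplicon analysis, has more than 110,000 followers, and has accumulated more than 21 million reads. Representative works include "Amplicon trilogy: figure interpretation, analysis pipeline, and statistical plotting (21 articles)", "Microbiome Protocols", and "Microbiome Data Analysis".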

Reference

https://docs.qiime2.org/2020.11

Evan Bolyen, Jai Ram Rideout, Matthew R. Dillon, Nicholas A. Bokulich, Christian C. Abnet, Gabriel A. Al-Ghalith, Harriet Alexander, Eric J. Alm, Manimozhiyan Arumugam, Francesco Asnicar, Yang Bai, Jordan E. Bisanz, Kyle Bittinger, Asker Brejnrod, Colin J. Brislawn, C. Titus Brown, Benjamin J. Callahan, Andrés Mauricio Caraballo-Rodríguez, John Chase, Emily K. Cope, Ricardo Da Silva, Christian Diener, Pieter C. Dorrestein, Gavin M. Douglas, Daniel M. Durall, Claire Duvallet, Christian F. Edwardson, Madeleine Ernst, Mehrbod Estaki, Jennifer Fouquier, Julia M. Gauglitz, Sean M. Gibbons, Deanna L. Gibson, Antonio Gonzalez, Kestrel Gorlick, Jiarong Guo, Benjamin Hillmann, Susan Holmes, Hannes Holste, Curtis Huttenhower, Gavin A. Huttley, Stefan Janssen, Alan K. Jarmusch, Lingjing Jiang, Benjamin D. Kaehler, Kyo Bin Kang, Christopher R. Keefe, Paul Keim, Scott T. Kelley, Dan Knights, Irina Koester, Tomasz Kosciolek, Jorden Kreps, Morgan G. I. Langille, Joslynn Lee, Ruth Ley, Yong-Xin Liu, Erikka Loftfield, Catherine Lozupone, Massoud Maher, Clarisse Marotz, Bryan D. Martin, Daniel McDonald, Lauren J. McIver, Alexey V. Melnik, Jessica L. Metcalf, Sydney C. Morgan, Jamie T. Morton, Ahmad Turan Naimey, Jose A. Navas-Molina, Louis Felix Nothias, Stephanie B. Orchanian, Talima Pearson, Samuel L. Peoples, Daniel Petras, Mary Lai Preuss, Elmar Pruesse, Lasse Buur Rasmussen, Adam Rivers, Michael S. Robeson, Patrick Rosenthal, Nicola Segata, Michael Shaffer, Arron Shiffer, Rashmi Sinha, Se Jin Song, John R. Spear, Austin D. Swafford, Luke R. Thompson, Pedro J. Torres, Pauline Trinh, Anupriya Tripathi, Peter J. Turnbaugh, Sabah Ul-Hasan, Justin J. J. van der Hooft, Fernando Vargas, Yoshiki Vázquez-Baeza, Emily Vogtmann, Max von Hippel, William Walters, Yunhu Wan, Mingxun Wang, Jonathan Warren, Kyle C. Weber, Charles H. D. Williamson, Amy D. Willis, Zhenjiang Zech Xu, Jesse R. Zaneveld, Yilong Zhang, Qiyun Zhu, Rob Knight & J. Gregory Caporaso#. 
Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019, 37: 852-857. doi:10.1038/s41587-019-0209-9


