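Eukaryotic genomes, and plant genomes in particular, usually contain a high proportion of repetitive sequence and are often polyploid and highly heterozygous, all of which make genome assembly much harder. Reads from the earlier, shorter-read Sanger and second-generation platforms rarely span complex repeat regions. As a result, their assemblies are highly fragmented, and high-GC regions and intergenic repeat regions are often missing, even though this information is important for large projects such as ENCODE (the Encyclopedia of DNA Elements).

This study used PacBio third-generation single-molecule real-time (SMRT) sequencing for whole-genome de novo sequencing and assembly of Oropetium thomaeum, an extremely desiccation-tolerant "resurrection" grass. The results were published in Nature on 11 November 2015. The Oropetium thomaeum genome is about 245 Mb. Taking advantage of the long reads of the PacBio RS II platform, the assembly spans 244 Mb (about 99.6% of the estimated genome size) in only 265 contigs, with a contig N50 of 2.4 Mb. Further analysis showed that the assembly is a near-finished sequence map. The gene space is gap-free. Telomeres, centromeres, transposable elements and rRNA clusters, which are rarely recovered in draft genomes, are also assembled without gaps. Oropetium thomaeum has the smallest genome among the sequenced grasses. 43.8% of it is repetitive sequence, with more than 30% compacted into euchromatic regions, and it contains 28,466 protein-coding genes.

I. Genome features
Oropetium thomaeum belongs to the Chloridoideae subfamily and is extremely desiccation tolerant.
Karyotype: 2n = 2x = 18.
Genome size: 1C = 0.25 pg. Flow cytometry estimates about 250 Mb and k-mer analysis estimates about 245 Mb.

II. Plant material
1. Plants were collected in Jodhpur, Rajasthan, India, and propagated.
2. DNA extraction: an optimized high-salt phenol–chloroform purification method.
3. RNA extraction.

III. Sequencing strategy
1. DNA, PacBio RS II: P6-C4 chemistry, 15-20 kb insert libraries, 32 SMRT Cells, about 72× overall genome coverage.
2. DNA, Illumina HiSeq: three libraries with different insert sizes (570 bp, 1 kb and 3 kb), sequenced to about 200×. These data were used to assess the error rate of the PacBio assembly and the heterozygosity of the genome.
3. DNA, BioNano Irys system: a genome map was built to anchor and scaffold the contigs. The Irys system produced 53 Gb of data (molecules >100 kb), about 200× genome coverage, with a molecule N50 of 169 kb.
4. RNA, Illumina HiSeq.

Figure: a, Histogram of length distribution of raw P6C4 chemistry PacBio reads. The mean read length of the raw reads is 12,872 bp, and the N50 is 16,485 bp. b, Genome size estimation using k-mer distribution. K-mer distribution of unassembled Oropetium Illumina WGS reads. K-mer frequency displays a unimodal curve indicating a low rate of heterozygosity (0.087%) in the Oropetium genome. Frequency distribution suggests a genome size of ~245 Mb, consistent with flow-cytometry-based estimations.

IV. Genome de novo assembly
1. De novo assembly: RS_HGAP_Assembly.3 protocol.
Figure: c, SMRT sequencing raw read, preassembly and assembly statistics.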
The distribution of the contig N50 length (d) and scaffold N50 length (e) of all published plant genomes is plotted. The average contig N50 length for published plant genomes is ~50 kb compared to 2.4 Mb for Oropetium.
2. Genome polishing: BLASR v1; Quiver in SMRT Analysis v2.3.0.
3. De novo assembly with other software (Falcon and MHAP). The assemblies from the three assemblers were compared. The Falcon and MHAP assemblies were less contiguous, and the centromeric and telomeric regions they recovered were shorter on average.
Figure: i, Comparison of HGAP, Falcon and MHAP PacBio assemblers.
4. Irys genome map for anchoring and scaffolding the contigs. The Irys data described above were assembled with parameters of different stringency into two map sets. Map set 1 has 402 maps with an N50 length of 725 kb and spans 216 Mb. Map set 2 has 214 maps and an N50 of 1.674 Mb. The two map sets were combined with the PacBio assembly to build a hybrid scaffold assembly of 535 scaffolds, with a scaffold N50 of 7.1 Mb and a final assembly size of 244 Mb.
Figure: Assembly improvement using a BioNano-based genome map from the Irys system. a, Distribution of molecule size for raw single molecule genome mapping data. Size of single molecules in nanochannel arrays is plotted. b, Integration of the genome map with the genome assembly. Overlap between the PacBio-based contigs and the genome map. Each line shows a single PacBio contig in green; genome maps are shown in light blue.
5. Illumina short reads were mapped back to the final assembly to validate its accuracy and estimate heterozygosity:
5.1. Aligner: BWA mem (v. 0.7.12-r1039)
5.2. Duplicate alignments: Picard tools v.1.104 MarkDuplicates (http://broadinstitute.github.io/picard/)
5.3. Variant calling: HaplotypeCaller and IndelRealigner from the Genome Analysis Toolkit (GATK v.3.3.0).
Figure: h, Estimated accuracy of SMRT PacBio assembly and within-genome heterozygosity.
6. Repeat annotation: REPET v.2.2 packages TEdenovo and TEannot.
7. Centromere and telomere sequence analysis: Tandem Repeats Finder (TRF, version 4.07b).
8. Transcriptome assembly and analysis: de novo transcriptome assembly with Trinity (v.r20140717); transcripts aligned to the final assembly with NCBI blastn v.2.2.30+.
9. Gene annotation: MAKER v2.31.8 (http://www.yandell-lab.org/software/maker.html).
10. Synteny and comparative genome analysis: large-scale alignment tool (LAST). Genome data sets from Setaria, Sorghum, rice and Brachypodium were downloaded from Phytozome (version 9.1) and subjected to pairwise genome alignments against the Oropetium genome.
11. Gene interaction network: a network was built from the homology between Oropetium and Arabidopsis genes. It contains 4,421 nodes (gene products) and 36,918 edges (interactions). The network covers most of the metabolic pathways in the Oropetium genome, including photosynthesis, major biosynthetic and catabolic processes, and stress-response pathways.

Author information:
Robert VanBuren*, Donald Danforth Plant Science Center, St Louis, Missouri 63132, USA.
Doug Bryant*, Donald Danforth Plant Science Center, St Louis, Missouri 63132, USA.
Patrick P. Edger, Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California 94720, USA; Department of Horticulture, Michigan State University, East Lansing, Michigan 48823, USA.
Haibao Tang, iPlant Collaborative, School of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA.
Diane Burgess, Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California 94720, USA.
Dinakar Challabathula†, IMBIO, University of Bonn, Kirschallee 1, D-53115 Bonn, Germany.
Kristi Spittle, Pacific Biosciences, Menlo Park, California 94025, USA.
Richard Hall, Pacific Biosciences, Menlo Park, California 94025, USA.
Jenny Gu, Pacific Biosciences, Menlo Park, California 94025, USA.
Eric Lyons, iPlant Collaborative, School of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA.
Michael Freeling, Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California 94720, USA.
Dorothea Bartels, IMBIO, University of Bonn, Kirschallee 1, D-53115 Bonn, Germany.
Boudewijn ten Hallers, BioNano Genomics, San Diego, California 92121, USA.
Alex Hastie, BioNano Genomics, San Diego, California 92121, USA.
Todd P. Michael, Ibis Biosciences, Carlsbad, California, USA.
Todd C. Mockler, Donald Danforth Plant Science Center, St Louis, Missouri 63132, USA.
*These authors contributed equally to this work.
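The k-mer genome-size estimate in the Oropetium summary (figure panel b) rests on a simple relation. Genome size is roughly the total number of k-mer occurrences divided by the depth of the main (homozygous) peak. The Python sketch below illustrates that textbook approximation and is not the exact procedure used in the paper. It assumes a histogram of "depth count" pairs, similar to what k-mer counters such as Jellyfish write. The low-depth cutoff `min_depth` and both function names are hypothetical.

```python
def parse_histogram(lines):
    """Parse 'depth count' lines (two whitespace-separated integers) into a dict."""
    histogram = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            histogram[int(fields[0])] = int(fields[1])
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Approximate genome size as total k-mer occurrences / depth of the main peak.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    min_depth: hypothetical cutoff; lower depths are treated as sequencing-error k-mers.
    Returns (genome_size_estimate, peak_depth).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Unimodal case (low heterozygosity): the main peak is the most populated depth.
    peak_depth = max(solid, key=solid.get)
    # Each distinct k-mer seen at depth d contributes d occurrences.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    toy = parse_histogram(["1 5000", "2 800", "10 100", "19 800",
                           "20 1000", "21 900", "40 50"])
    size, peak = estimate_genome_size(toy)
    print(peak, size)  # 20 2855.0
```

The peak-picking step only works for a unimodal curve like the one reported for Oropetium. In a highly heterozygous genome, a second peak appears at about half the main depth, and the divisor would have to be chosen more carefully.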
Paper: Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.pdf
Supplementary Information: SI-Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.pdf
ttwu@macrogencn.com
Several researchers have asked about third-generation (PacBio) data analysis. This post explains the sequencing data statistics, quality-control (QC) assessment and data summary reported in the basic analysis, with reference to project examples:

General - Filtering Report
* Polymerase Read Bases: The number of bases in the polymerase read, i.e. the total sequencing output, including adapter sequences.
* Polymerase Reads: The number of polymerases generating high quality reads. Polymerase reads are trimmed to the high quality region and include bases from adapters, as well as potentially multiple passes around a circular template, so one polymerase read can contain several subreads.
* Polymerase Read N50: The read length N such that polymerase reads of length N or longer contain 50% of all polymerase read bases. The SMRT Portal help text phrases this as "50% of all polymerase reads are longer than this value", which would describe the median; N50 is base-weighted, as in N50 Contig Length.
* Polymerase Read Length: The mean trimmed read length of all polymerase reads. The value includes bases from adapters as well as multiple passes around a circular template.
* Polymerase Read Quality: The mean single-pass read quality of all polymerase reads.
* Post-Filter Polymerase Read Bases: The number of bases in the polymerase reads after filtering, including adapters and multiple subreads.
* Post-Filter Polymerase Reads: The number of polymerases generating trimmed reads after filtering. Polymerase reads include bases from adapters and multiple passes around a circular template.
* Post-Filter Polymerase Read Length: The mean trimmed read length of all polymerase reads after filtering. The value includes bases from adapters as well as multiple passes around a circular template.
* Post-Filter Polymerase Read Quality: The mean single-pass read quality of all polymerase reads after filtering.

Definitions of terms used in the other output reports:

Diagnostic - Adapters Report
Adapter Dimers (%): The % of pre-filter ZMWs which have observed inserts of 0-10 bp. These are likely adapter dimers.
Short Inserts (%): The % of pre-filter ZMWs which have observed inserts of 11-100 bp. These are likely short fragment contamination.

Diagnostic - Spike-In Control Report
Control Sequence: The name of the control sequence (the control sample).
Control Reads (%): The percent of post-filter polymerase reads that are from the control sample: (total # of control reads) / (total # of post-filter reads).
Control Polymerase Read Length: The mean mapped read length of the polymerase reads from the control sample.
Control Reads: The total number of polymerase reads from the control sample that passed filtering.
Control Subread Accuracy: The mean single-pass accuracy of the mapped polymerase reads from the control sample.
Control Polymerase Read Length 95%: The 95th percentile of mapped read length of the polymerase reads from the control sample.

Diagnostic - Loading Report
SMRT Cell ID: ID number of the SMRT Cell(s) used in this run.
Productive ZMWs: The number of ZMWs for this SMRT Cell that produced results with Productivity = 1.
Productivity 0 (%): Percentage of ZMWs that are empty, with no polymerase.
Productivity 1 (%): Percentage of ZMWs that are productive and sequencing, i.e. loaded with an active polymerase.
Productivity 2 (%): Percentage of ZMWs that are not P0 (empty) or P1 (productive). This may occur for a variety of reasons and the sequence data is not usable.

Resequencing - Coverage Report
Coverage: The mean depth of coverage across the reference sequence.
Missing Bases (%): The percentage of the reference sequence that has zero coverage.

Resequencing - Mapping Report
Post-Filter Reads: The number of reads that passed filtering.
Mapped Reads: The number of post-filter reads that mapped to the reference sequence.
Mapped Subreads: The number of post-filter subreads that mapped to the reference sequence.
Mapped CCS Reads: The number of post-filter CCS reads that mapped to the reference sequence. A CCS (circular consensus sequence) read is the consensus of the aligned subreads from a single ZMW.
Mapped Subread Bases: The number of post-filter bases from all subreads that mapped to the reference sequence. This does not include adapters.
Mapped CCS Read Bases: The number of post-filter CCS read bases that mapped to the reference sequence. This does not include adapters.
Mapped Subread Accuracy: The mean accuracy of post-filter subreads that mapped to the reference sequence.
Mapped CCS Read Accuracy: The mean accuracy of post-filter CCS reads that mapped to the reference sequence.
Mapped Subread Length: The mean read length of post-filter subreads that mapped to the reference sequence. This does not include adapters.
Mapped Read Length of Insert: The mean read length of all insert sequences, which includes only mapped sequences. The read length of insert is approximately the longest subread length per ZMW.
Mapped Polymerase Read Length: The mean read length of post-filter polymerase reads that mapped to the reference sequence. This includes adapters.
Mapped Polymerase Read Length 95%: The 95th percentile of read length of post-filter polymerase reads that mapped to the reference sequence.
Mapped Polymerase Read Length Max: The maximum read length of post-filter polymerase reads that mapped to the reference sequence.
Mapped Full Subread Length: The average of the lengths of full subreads that mapped to the reference sequence. Full subreads are subreads flanked by two adapters.

Analysis - Variants Report
Reference: The name of the reference sequence.
Reference Length: The length of the reference sequence.
Bases Called (%): The percentage of reference sequence that has ≥ 1x coverage. % Bases Called + % Missing Bases should equal 100.
Consensus Accuracy: The accuracy of the consensus sequence compared to the reference.
Base Coverage: The mean depth of coverage across the reference sequence.

Analysis - Top Variants Report
Sequence: The name of the reference sequence.
Position: The position of the variant along the reference sequence.
Variant: The variant position, type, and affected nucleotide.
Type: The variant type: Insertion, Deletion, or Substitution.
Coverage: The coverage at position.
Confidence: The confidence of the variant call.
Genotype: Includes the full number of chromosomes (diploid) or half the number (haploid).

Assembly - Iterations Report
Assembly Iterations: The number of iterations of overlap-layout-consensus performed by the de novo or hybrid assembly algorithm.

Assembly - Draft Assembly Report
Draft Contigs: The number of contigs output by Celera Assembler, which may include singleton and degenerate contigs. After assembly polishing with Quiver, the final number of contigs may be smaller.
N50 Contig Length: The length L such that contigs of length L or longer contain 50% of all bases in the final contigs.
Reads Assembled (%): The fraction of all reads that are assembled into contigs in the final assembly.
Max Contig Length: The length of the longest contig in the final assembly.
Sum of Contig Lengths: The sum of the lengths of all contigs in the final assembly.

Hybrid Assembly - Assembly Iterations Report
Input Contigs: The number of contigs used as input to the AHA algorithm.
Min Align Score: The minimum alignment score between a read and a contig to use the alignment for scaffolding.
Min Link Redundancy: The minimum number of reads that must link two contigs for those contigs to be connected in a scaffold.
Min Subread Length: The minimum length required for a subread to be used by the AHA algorithm.
Min Contig Length: The minimum length required for a contig to be used by the AHA algorithm.
Scaffolds Across Assembly Iterations: The number of scaffolds at a particular iteration of the AHA algorithm.
Linking Reads Across Assembly Iterations: The number of linking reads at a particular iteration of the AHA algorithm.

Hybrid Assembly - Final Assembly Report
Number: The number of scaffolds, contigs, or gaps in the initial or final assembly.
Max Length: The length of the longest scaffold, contig, or gap in the initial or final assembly.
N50 Length: The length L such that scaffolds, contigs, or gaps of length L or longer contain 50% of all bases in the initial or final scaffolds, contigs, or gaps.
Sum Length: The sum of the lengths of all scaffolds, contigs, or gaps in the initial or final assembly.
Initial Scaffolds: The distribution of the lengths of the scaffold sequences before completing the AHA algorithm. Scaffolds are composed of contigs optionally separated by gap sequences.
Final Scaffolds: The distribution of the lengths of the scaffold sequences after completing the AHA algorithm. Scaffolds are composed of contigs optionally separated by gap sequences.
Initial Contigs: The distribution of the lengths of the contig sequences before completing the AHA algorithm. Contigs are stretches of continuous sequence that do not contain gaps.
Final Contigs: The distribution of the lengths of the contig sequences after completing the AHA algorithm. Contigs are stretches of continuous sequence that do not contain gaps.
Initial Gaps: The distribution of the lengths of the gaps between contig sequences before completing the AHA algorithm.
Final Gaps: The distribution of the lengths of the gaps between contig sequences after completing the AHA algorithm.

Base Modifications - Motifs Report
Motif: The nucleotide sequence of the methyltransferase recognition motif, using the standard IUPAC nucleotide alphabet.
Modified Position: The position within the motif that is modified. The first base is 1. Example: The modified adenine in GATC is at position 2.
Modification Type: The type of chemical modification most commonly identified at that motif. These are: 6mA, 4mC, 5mC, or modified_base (modification not recognized by the software).
% Motifs Detected: The percentage of times that this motif was detected as modified across the entire genome.
# Of Motifs Detected: The number of times that this motif was detected as modified across the entire genome.
# Of Motifs In Genome: The number of times this motif occurs in the genome.
Mean Modification QV: The mean modification QV for all instances where this motif was detected as modified.
Mean Motif Coverage: The mean coverage for all instances where this motif was detected as modified.
Partner Motif: For motifs that are not self-palindromic, this is the complementary sequence, i.e. the motif as read on the opposite strand.

Assembly - Pre-Assembly Report
Seed Bases: The number of bases from seed reads.
Pre-Assembled Yield: The percentage of seed read bases that were successfully aligned to generate pre-assembled reads.
Pre-Assembled Read Length: The average length of the pre-assembled reads.
Length Cutoff: Reads with lengths greater than the length cutoff are used as seed reads for pre-assembly.
Pre-Assembled Bases: The number of bases in the pre-assembled reads.
Pre-Assembled Reads: The number of reads output by the pre-assembler. Pre-assembled reads are very long, highly accurate reads that can be used as input to a de novo assembler.
Pre-Assembled N50: The N50 read length of the pre-assembled reads.

(To be continued.)
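The Python sketch below recomputes several of the report metrics defined in this post from toy inputs. It covers the base-weighted N50, a 95th-percentile read length, mean coverage together with Missing Bases (%) and Bases Called (%), and the Adapter Dimers (%) and Short Inserts (%) diagnostics. The function names and inputs are hypothetical. The nearest-rank percentile method is an assumption, since the help text does not say how SMRT Portal computes its percentiles, and its own implementation may differ.

```python
import math


def n50(lengths):
    """Base-weighted N50: largest L such that sequences of length >= L hold >= half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty input")


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed method), e.g. pct=95 for the '95%' length metrics."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def coverage_summary(depths):
    """From per-base reference depths: (mean coverage, Missing Bases %, Bases Called %)."""
    mean_depth = sum(depths) / len(depths)
    missing_pct = 100 * sum(1 for d in depths if d == 0) / len(depths)
    return mean_depth, missing_pct, 100 - missing_pct


def insert_length_diagnostics(insert_lengths):
    """% of pre-filter ZMWs with 0-10 bp inserts (adapter dimers) and 11-100 bp inserts (short inserts)."""
    n = len(insert_lengths)
    dimers = sum(1 for x in insert_lengths if 0 <= x <= 10)
    short = sum(1 for x in insert_lengths if 11 <= x <= 100)
    return 100 * dimers / n, 100 * short / n


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))                                   # 5
    print(percentile_nearest_rank(list(range(1, 21)), 95))         # 19
    print(coverage_summary([0, 0, 5, 10, 5, 10, 0, 10]))          # (5.0, 37.5, 62.5)
    print(insert_length_diagnostics([5, 50, 100, 101, 15000]))    # (20.0, 40.0)
```

For read lengths 2, 3, 4, 5 and 6, the N50 is 5 while the median is 4. This is why N50 should not be read as "half of the reads are longer than this value". The same construction applies to contig, scaffold and pre-assembled read N50.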
SMRT® Portal Help: Pacific Biosciences Terminology

General Terminology
Adapters: Hairpin loops that are ligated to both ends of the double-stranded DNA insert. When adapter sequences are removed, the read is split into multiple subreads. These are the hairpin SMRTbell adapters that are ligated to the blunt ends of the double-stranded DNA template during library construction.
Movie: Real-time observation of a SMRT Cell.
Read: A contiguous sequence generated from a ZMW that includes an insert sequence and may include an adapter sequence. A read is composed of alternating subreads and adapters.
Sequencing ZMW: A ZMW that is expected to be able to produce a sequence if it is populated with a polymerase. ZMWs used for automated SMRT Cell alignment are not considered sequencing ZMWs.
Subread: Sequence generated by splitting the raw sequence from a ZMW by the adapters. This is the post-sequencing version of the "insert DNA" used in sample preparation, i.e. the target sequence.
Zero-Mode Waveguide (ZMW): A nanophotonic device for confining light to a small observation volume that can be, for example, a small hole in a conductive layer whose diameter is too small to permit the propagation of light in the wavelength range used for detection.

Primary Analysis Terminology
Adapter Screening: Annotates adapter read locations. Used to break a read into subreads during secondary analysis mapping and Circular Consensus.
High Quality Region Screening: Annotates the high quality sequencing regions of a read to be used during Raw Read Trimming.
Insert Screening: Annotates insert DNA regions in the raw read.
Quality Value Assignment: A prediction of the error probability of a basecall.
Quality Value (QV): The total probability that the basecall is an insertion or substitution or is preceded by a deletion, expressed on the Phred scale: QV = -10 × log10(p), where p is that error probability.
Insertion QV: The probability that the basecall is an insertion with respect to the true sequence.
Deletion QV: The probability that a deletion error occurred before the current base.
Substitution QV: The probability that the basecall is a substitution.
Raw Read Trimming: Extraction of high quality regions from a raw read. This results in a read.
Read Quality Assignment: A trained prediction of a read's mapped accuracy based on its pulse and base file characteristics (peak signal-to-noise ratio, average base QV, interpulse duration, and so on). This is used during secondary analysis filtering.

Secondary Analysis Terminology
Consensus: Generation of a consensus sequence from multiple-sequence alignment.
De Novo Assembly: Assembly of all subreads without a reference sequence.
Filtering: Removes reads that do not meet the Read Quality and Read Length parameters set by the user. The current default filtering parameters defined by Pacific Biosciences are:
* Read Quality ≥ 0.75 (as of SMRT Analysis v1.3.1)
* Read length ≥ 50 bases
Mapping: Local alignment of a read or subread to a reference sequence.

Accuracy Terminology
Circular Consensus Accuracy: Accuracy of the circular consensus read.
Consensus Accuracy: Accuracy of the consensus sequence compared to the reference.
Read Quality: A trained prediction of a read's mapped accuracy based on its pulse and base file characteristics (peak signal-to-noise ratio, average base QV, interpulse duration, and so on).
Single Molecule Raw Accuracy: Accuracy based on one pass on one single molecule.
Subread Accuracy: The post-mapping accuracy of the basecalls. Formula: accuracy = 1 - (errors / subread length), where errors = number of deletions + insertions + substitutions.
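The quality and accuracy definitions above reduce to a few formulas, sketched below in Python. The Phred conversion QV = -10 × log10(p) is standard. The subread accuracy function assumes the formula 1 - errors / subread length with errors = insertions + deletions + substitutions. The filter uses the default thresholds quoted above (Read Quality ≥ 0.75, read length ≥ 50 bases). The function names are hypothetical, not SMRT Analysis APIs.

```python
import math


def error_prob_to_qv(p):
    """Phred-scaled quality value: QV = -10 * log10(p)."""
    return -10 * math.log10(p)


def qv_to_error_prob(qv):
    """Inverse of the Phred scale: p = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def subread_accuracy(insertions, deletions, substitutions, subread_length):
    """Assumed formula: 1 - errors / subread length, errors = ins + del + sub."""
    errors = insertions + deletions + substitutions
    return 1 - errors / subread_length


def passes_default_filter(read_quality, read_length, min_quality=0.75, min_length=50):
    """Default filter described above: Read Quality >= 0.75 and read length >= 50 bases."""
    return read_quality >= min_quality and read_length >= min_length


if __name__ == "__main__":
    print(error_prob_to_qv(0.001))                # QV 30 means a 1-in-1,000 error probability
    print(qv_to_error_prob(20))                   # QV 20 means a 1-in-100 error probability
    print(subread_accuracy(5, 3, 2, 1000))        # 10 errors in 1,000 bases
    print(passes_default_filter(0.9, 49))         # False: read is too short
```

A single-pass subread with about 15% errors therefore scores only about QV 8, which is why SMRT Analysis relies on consensus across subreads or reads for high accuracy.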
Read Terminology
De Novo Circular Consensus (CCS) Read: The consensus sequence produced from the alignment of subreads taken from a single ZMW. This is not aligned against a reference sequence.
Raw Read: All base calls from a ZMW. Includes insert DNA and adapter sequence.
Single Molecule Variant Detection (SMVD) Read: The consensus sequence produced using all subreads taken from a single ZMW and aligned to a known reference sequence. (This was formerly known as RCCS.)

Read Length Terminology
Mapped Read length: The distance between the first aligned base and the last aligned base in a raw read, inclusive of insert and adapter alignments.
Mapped Subread Read length: The length of the subread alignment to a target reference sequence. This does not include the adapter sequence.
Read length: The total number of bases produced from a ZMW after trimming. This may include the adapter sequence.
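The relationship between a read, its adapters and its subreads described in this glossary can be sketched in Python. The sketch assumes that adapter positions on the read are already known as half-open (start, end) intervals; in practice they come from Adapter Screening. The function names are hypothetical and this is not PacBio's implementation. The sketch also flags full subreads (flanked by two adapters) and approximates the insert length as the longest subread, following the Mapping Report definitions.

```python
def split_subreads(read_length, adapters):
    """Split a read into subreads given adapter intervals.

    read_length: length of the trimmed read.
    adapters: non-overlapping (start, end) half-open adapter intervals on the read.
    Returns (start, end, is_full) per subread; is_full is True when the subread
    is flanked by an adapter on both sides.
    """
    subreads = []
    pos = 0
    left_adapter = False  # is there an adapter immediately left of the current subread?
    for start, end in sorted(adapters):
        if start > pos:
            # Insert sequence ending at this adapter: full only if an adapter is also on its left.
            subreads.append((pos, start, left_adapter))
        pos = end
        left_adapter = True
    if pos < read_length:
        # Trailing subread has no adapter on its right, so it is never a full subread.
        subreads.append((pos, read_length, False))
    return subreads


def insert_length_estimate(subreads):
    """Approximate the insert length as the longest subread in the ZMW."""
    return max(end - start for start, end, _ in subreads)


if __name__ == "__main__":
    parts = split_subreads(100, [(20, 30), (60, 70)])
    print(parts)                          # [(0, 20, False), (30, 60, True), (70, 100, False)]
    print(insert_length_estimate(parts))  # 30
```

Only the middle subread, bounded by two adapters, is a full pass over the insert. This is why Mapped Full Subread Length is reported separately from Mapped Subread Length.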
What is SMRT Portal and how do I use it?
Use SMRT Portal to perform secondary analysis of sequencing data generated by one or more PacBio System runs. You create and submit jobs. Jobs specify the SMRT Cells whose data will be analyzed, as well as which analysis protocols to use. After the job has completed, you then view the secondary analysis data generated.

Working with SMRT Portal
* Create and submit a job.
* View the secondary analysis data generated.
* Create a hybrid assembly using high-confidence contigs.
* Open, monitor, or delete jobs.
* Export metrics and table data.
* Change your password and restore table settings.

Reports generated by SMRT Portal
* SMRT Portal reports

Administrating and Managing SMRT Portal
For the following functions, you must be logged in as a scientist or administrator:
* Managing secondary analysis protocols
* Managing reference sequences
* Importing raw data from SMRT Cells for analysis
* Importing SMRT Pipe jobs
For the following functions, you must be logged in as an administrator:
* Managing application users
* Managing groups
* Specifying site-wide application settings
* Archiving and restoring jobs

Reference
* SMRT Portal hardware/software requirements
* Protocols provided by Pacific Biosciences
* Pacific Biosciences software overview
* Pacific Biosciences terminology

For troubleshooting information, see http://github.com/PacificBiosciences/SMRT-Analysis/wiki/Troubleshooting-the-SMRT-Analysis-Suite. For additional technical support, contact Pacific Biosciences at TechSupport@pacificbiosciences.com or 877-920-7222.
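千年基因 (part of the Macrogen group) was among the first companies worldwide to use PacBio's latest P6-C4 chemistry. Through continual optimization of experimental conditions and strict control of its laboratory workflow, it has upgraded its PacBio RS II third-generation sequencing, with substantial gains in both read length and throughput. After the upgrade, 千年基因's PacBio RS II runs reach an average read length above 11 kb, a read N50 above 16 kb and up to 1 Gb of data per SMRT Cell, well above PacBio's official reference specifications. Longer reads and higher throughput benefit projects such as genome de novo sequencing, metagenome sequencing, full-length transcript sequencing and full-length 16S rDNA sequencing.

Since launching its PacBio RS II service, 千年基因 has worked with many research institutions in China on de novo genome projects for animals, plants and microorganisms. 千年基因 will also, for the first time, use the third-generation platform to complete de novo sequencing of a human genome. It aims to use the platform's long reads to assemble the highest-quality Asian reference genome map, to support in-depth discovery of disease-causing variants in Asian populations.

Source: 千年基因 website.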
At the 2014 ASHG meeting in the United States, Pacific Biosciences announced that Macrogen, the headquarters of 千年基因, would be one of the first companies worldwide to use the latest PacBio RS II P6-C4 chemistry:
"Macrogen will be one of the first service providers to use the latest P6-C4 chemistry which represents 6th generation of polymerase and 4th generation of sequencing reagents, which was recently announced by Pacific Biosciences Inc."
Figure: sequencing read statistics for P6-C4 compared with the P4-C2 and P5-C3 chemistries.
For details, see the Macrogen website: Macrogen Introduces Latest Service Portfolio at 2014 ASHG Annual Meeting.
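On 24 September 2014, the Macrogen group announced that it would use the PacBio single-molecule real-time sequencing platform from Pacific Biosciences for de novo sequencing and assembly of human genomes from Asian populations, to produce an Asian human reference genome. The human reference genome commonly used by biomedical researchers is derived from Caucasian individuals, which limits studies of certain complex diseases across different human populations. Macrogen, a publicly listed Korean sequencing company, will use the PacBio RS II DNA sequencing platform for de novo sequencing and assembly of Asian human genomes to obtain a high-quality East Asian reference genome. Macrogen currently owns two PacBio instruments and will use both for this work. The PacBio RS II platform offers capabilities that second-generation short-read platforms lack. Its long reads are well suited to full-length transcript sequencing, de novo sequencing and assembly of animal, plant and bacterial genomes, metagenome sequencing, human genome sequencing, and targeted sequencing of complex regions, such as the histocompatibility region and full-length HLA genes.

Macrogen CEO Jeong-Sun Seo stated that PacBio sequencing technology has matured enough to be fully applied to human genome sequencing. He added that it is currently the only sequencing technology able to fully resolve complex regions of the human genome and uncover types of genetic variation beyond SNPs. He also expects PacBio sequencing to expand Macrogen's overall sequencing service capacity and to let it offer more comprehensive sequencing services to researchers worldwide.

Related reports: 千年基因 will use the third-generation PacBio RS II sequencer to complete de novo sequencing of an Asian genome (千年基因 news); GenomeWeb: Macrogen to Use PacBio Technology to Create Asian Genome References.

Further PacBio RS II resources:
1. PacBio_RS_II_Brochure.pdf
2. PacBio_Software_and_Analysis_Overview.pdf
3. PacBio Blog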