科学网


Tag: Transcriptome (转录组)


Related blog posts

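[Repost] Quantifying expression in transcriptome data: read count? CPM? RPKM? FPKM?
planttech 2018-1-19 10:08
1. Read count
Definition: the number of reads aligned to a given gene.
Use: the basis for computing CPM, RPKM, FPKM and other downstream metrics. Read counts are also the input to differential expression software such as DESeq and edgeR. In other words, differential expression results are computed from read counts, not from CPM, RPKM, or FPKM; the normalized expression values are used mainly for principal component analysis and hierarchical clustering.
2. CPM: counts per million
Definition: CPM = A / mapped reads * 1000000, where A is the number of reads aligned to the gene (the read count).
Use: in some cases one only wants the relative number of reads covering each gene, without length correction, and this metric is used. CPM normalizes the read count only against the total number of reads. To compare expression between genes, differences in gene length must also be taken into account; normalizing further by length gives RPKM and FPKM below.
3. RPKM: Reads Per Kilobase of exon model per Million mapped reads (reads per kilobase of transcript per million mapped reads)
Definition: RPKM = (1000000 * A) / (mapped reads * gene length / 1000), where A is the number of reads aligned to the gene (the read count). RPKM removes the effect of gene length and sequencing depth on the computed expression level, so the resulting values can be used directly to compare expression of a gene between samples and expression levels between genes.
Use: downstream analyses based on expression level, such as expression trend analysis, WGCNA co-expression network construction, and heat maps.
4. FPKM: Fragments Per Kilobase of exon model per Million mapped fragments (fragments per kilobase of transcript per million mapped fragments)
FPKM means almost the same as RPKM; the only difference is fragment versus read. RPKM was devised for early single-end (SE) sequencing, and FPKM adapts RPKM to paired-end (PE) sequencing. Once the difference between reads and fragments is clear, the two concepts are easy to tell apart. Reads are the individual reads in the FASTQ data that come off the sequencer; a fragment is the nucleic acid fragment that was sequenced (for paired-end data, a pair still counts as one fragment even if one of its two reads is discarded).
More at www.planttech.com.cn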
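The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the normalization routines in established packages; the function names and the example numbers are made up for the sketch. FPKM uses the same arithmetic as RPKM with fragment counts in place of read counts.

```python
def cpm(count: float, mapped_reads: float) -> float:
    """CPM = A / mapped reads * 1,000,000 (no length correction)."""
    return count * 1_000_000 / mapped_reads


def rpkm(count: float, mapped_reads: float, gene_length_bp: float) -> float:
    """RPKM = (1,000,000 * A) / (mapped reads * gene length / 1000)."""
    return (1_000_000 * count) / (mapped_reads * gene_length_bp / 1000)


def fpkm(fragment_count: float, mapped_fragments: float, gene_length_bp: float) -> float:
    """Same formula as RPKM, but counting fragments (read pairs) instead of reads."""
    return rpkm(fragment_count, mapped_fragments, gene_length_bp)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 20 million mapped reads, 2 kb long.
    print(cpm(500, 20_000_000))
    print(rpkm(500, 20_000_000, 2000))
```

Two genes with the same read count but different lengths get the same CPM and different RPKM, which is the length correction described in the text.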
15416 views | 0 comments
Weekly wheat literature picks (12.31)
mashengwei 2017-12-31 10:24
12.31 This issue's author: 麦萌
It's Sunday again; doesn't time fly? Today happens to be the last day of 2017, and the turn of the year is a busy time. Tonight 胖丫 (Pangya) is still holed up in the lab. I asked her why she hadn't gone out to celebrate New Year's Eve, and she answered flatly, "The fun is theirs; I have nothing."
Below is the list of wheat-related papers from December 2017.
Earlier roundups: Weekly literature picks (2017.12.9); Weekly wheat literature picks (12.17); Weekly wheat literature picks (12.23)
1 Interspecific and intergeneric hybridization as a source of variation for wheat grain quality improvement
Wheat quality and its end-uses are mainly based on variation in three traits: grain hardness, gluten quality and starch. In recent times, the importance of nutritional quality and health-related aspects has increased the range of these traits with the inclusion of other grain components such as vitamins, fibre and micronutrients. One option to enlarge the genetic variability in wheat for all these components has been the use of wild relatives, together with underutilised or neglected wheat varieties or species. In the current review, we summarise the role of each grain component in relation to grain quality, their variation in modern wheat and the alternative sources in which wheat breeders have found novel variation.
2 Functional and DNA–protein binding studies of WRKY transcription factors and their expression analysis in response to biotic and abiotic stress in wheat (Triticum aestivum L.)
WRKY, a plant-specific transcription factor family, plays vital roles in pathogen defense, abiotic stress, and phytohormone signalling. Little is known about the roles and function of WRKY transcription factors in response to rust diseases in wheat. In the present study, three TaWRKY genes encoding complete protein sequences were cloned. They belonged to class II and III WRKY based on the number of WRKY domains and the pattern of zinc finger structures. Twenty-two DNA–protein binding docking complexes predicted stable interactions of WRKY domain with W-box.
Quantitative real-time-PCR using wheat near-isogenic lines with or without Lr28 gene revealed differential up- or down-regulation in response to biotic and abiotic stress treatments which could be responsible for their functional divergence in wheat. TaWRKY62 was found to be induced upon treatment with JA, MJ, and SA and reduced after ABA treatments. Maximum induction of six out of seven genes occurred at 48 h post inoculation due to pathogen inoculation. Hence, TaWRKY (49, 50, 52, 55, 57, and 62) can be considered as potential candidate genes for further functional validation as well as for crop improvement programs for stress resistance. The results of the present study will enhance knowledge towards understanding the molecular basis of mode of action of WRKY transcription factor genes in wheat and their role during leaf rust pathogenesis in particular.
3 Durum wheat diversity for heat stress tolerance during inflorescence emergence is correlated to TdHSP101C expression in early developmental stages
The predicted world population increase along with climate changes threatens sustainable agricultural supply in the coming decades. It is therefore vital to understand crops diversity associated to abiotic stress response. Heat stress is considered one of the major constrains on crops productivity thus it is essential to develop new approaches for a precocious and rigorous evaluation of varietal diversity regarding heat tolerance. Plant cell membrane thermostability (CMS) is a widely used method for wheat thermotolerance assessment although its limitations require complementary solutions. In this work we used CMS assay and explored TdHSP101C genes as an additional tool for durum wheat screening. Genomic and transcriptomic analyses of TdHSP101C genes were performed in varieties with contrasting CMS results and further correlated with heat stress tolerance during fertilization and seed development.
Although the durum wheat varieties studied presented a very high homology on TdHSP101C genes (99%) the transcriptomic assessment allowed the discrimination between varieties with good CMS results and its correlation with differential impacts of heat treatment during inflorescence emergence and seed development on grain yield. The evidences here reported indicate that TdHSP101C transcription levels induced by heat stress in fully expanded leaves may be a promising complementary screening tool to discriminate between durum wheat varieties identified as thermotolerant through CMS.
4 Virulence of some Puccinia triticina races to the effective wheat leaf rust resistant genes Lr 9 and Lr 19 under Egyptian field conditions
Leaf rust (Puccinia triticina Eriks.) is the most widespread disease of wheat (Triticum aestivum L.) in Egypt and worldwide. The two leaf rust resistance genes i.e. Lr 9 and Lr 19 were previously highly effective against the predominant Puccinia triticina races in Egypt. In 2015/2016 growing season, susceptible field reaction was recorded on these two genes, where rust severity reached to 40% (S) for Lr 9 and 5% (S) for Lr 19 under Egyptian field conditions at four locations i.e. El-Behira, El-Minufiya, El-Qalubiya and El-Fayom governorates. In this study, 39 leaf rust monogenic lines and 16 commercial wheat cultivars were tested at seedling stage. While, 12 leaf rust monogenic lines and the same 16 wheat cultivars were evaluated at adult plant stage. Eight leaf rust field samples were collected from these governorates (four from each of Lr 9 and Lr 19). Forty single isolates were derived from the collected samples of Lr 9 and Lr 19 (each with 20 isolates). Eight pathotypes were identified from Lr 9, while only two pathotypes were identified from Lr 19. The most frequent pathotype (virulent to Lr 9) was KTSPT (30% frequency), followed by TTTMS (25% frequency). The other pathotypes ranged from only 5%–10% frequency.
Whereas, the most frequent pathotype (virulent to Lr 19) was CTTTT (85% frequency), while the lowest frequent one was PKTST (15% frequency). Pathotypes i.e. PRSTT, NTKTS and TTTMS (identified from Lr 9) were more aggressiveness on the most of the tested leaf rust monogenic lines than others, as they were virulent to 36, 35 and 35 lines from a total of 39 monogenic lines, respectively. Also, the two pathotypes; PKTST and CTTTT (identified from Lr 19) were virulent to 36 and 35 monogenic lines, respectively. Moreover, leaf rust pathotypes i.e. NPTNK and PRSTT (from Lr 9) and PKTST (from Lr 19) were the most aggressive on the tested wheat cultivars at seedling stage. Lr 2a was the most effective leaf rust resistance gene against the tested pathotypes at adult plant stage. On the other hand, the three wheat cultivars Misr 1, Misr 2 and Nubariya 1 proved to be the highly resistant cultivars against all the tested leaf rust pathotypes at adult plant stage.
5 Resistance of Aegilops longissima to the rusts of wheat | Plant Disease
Stem rust (caused by Puccinia graminis f. sp. tritici), leaf rust (P. triticina), and stripe rust (P. striiformis f. sp. tritici) rank among the most important diseases of wheat worldwide. The development of resistant cultivars is the preferred method of controlling rust diseases because it is environmentally benign and also cost-effective. However, new virulence types often arise in pathogen populations, rendering such cultivars vulnerable to losses. The identification of new sources of resistance is key to providing long-lasting disease control against the rapidly evolving rust pathogens. Thus, the objective of this research was to evaluate the wheat wild relative Aegilops longissima for resistance to stem rust, leaf rust, and stripe rust at the seedling stage in the greenhouse.
A diverse collection of 394 accessions of the species, mostly from Israel, was assembled for the study, but the total number included in any one rust evaluation ranged from 308 to 379. With respect to stem rust resistance, 18.2% and 80.8% of accessions were resistant to the widely virulent U.S. and Kenyan P. graminis f. sp. tritici races of TTTTF and TTKSK, respectively. The percentage of accessions exhibiting resistance to the U.S. P. triticina races of THBJ and BBBD was 65.9% and 52.2%, respectively. Over half (50.1%) of the Ae. longissima accessions were resistant to the U.S. P. striiformis f. sp. tritici race PSTv-37. Ten accessions (AEG-683-23, AEG-725-15, AEG-803-49, AEG-1274-20, AEG-1276-22, AEG-1471-15, AEG-1475-19, AEG-2974-0, AEG-4005-20, and AEG-8705-10) were resistant to all races of the three rust pathogens used in this study. Distinct differences in the geographic distribution of resistance and susceptibility were found in Ae. longissima accessions from Israel in response to some rust races. To P. graminis f. sp. tritici race TTKSK, populations with a very high frequency of resistance were concentrated in the central and northern part of Israel, whereas populations with a comparatively higher frequency of susceptibility were concentrated in the southern part of the country. The reverse trend was observed with respect to P. striiformis f. sp. tritici race PSTv-37. The results from this study demonstrate that Ae. longissima is a rich source of rust resistance genes for wheat improvement.
6 Genes WHEAT FRIZZY PANICLE and SHAM RAMIFICATION 2 independently regulate differentiation of floral meristems in wheat
Here we characterized diploid and tetraploid wheat lines of various non-standard spike morphotypes, which allowed for identification of a new mutant allele of the WHEAT FRIZZY PANICLE (WFZP) gene that determines spike branching in diploid wheat Triticum monococcum L. Moreover, we found that the development of SSs and spike branching in wheat T.
durum Desf. was a result of a wfzp-A/TtBH-A1 mutation that originated from spontaneous hybridization with T. turgidum convar. compositum (L.f.) Filat. Detailed characterization of the false-true ramification phenotype controlled by the recessive sham ramification 2 (shr2) gene in tetraploid wheat T. turgidum L. allowed us to suggest putative functions of the SHR2 gene that may be involved in the regulation of spikelet meristem fate and in specification of floret meristems. The results of a gene interaction test suggested that genes WFZP and SHR2 function independently in different processes during spikelet development, whereas another spike ramification gene(s) interact(s) with SHR2 and share(s) common functions.
7 Allelic composition and associated quality traits of the Glu-1 and Glu-3 loci in selected modern Ethiopian durum wheat varieties
Gluten protein determines the processing quality of both durum wheat and bread wheat. The glutenin subunits compositions and associated quality traits of 20 Ethiopian durum wheat varieties were systematically analyzed using SDS-PAGE and Payne numbers. A total of 16 glutenin patterns were identified. At the Glu-A1 locus, all varieties scored the null allele. The predominant glutenin alleles at the Glu-B1 locus were Glu-B1b (7+8) and Glu-B1e (20). In Glu-3, the most abundant glutenin subunits were Glu-A3a and Glu-B3c. Based on the Payne scores, the varieties Yerer, Ginchi, Candate, and Foka were identified to have allelic composition suitable for pasta making. The cluster analysis using agglomerative hierarchical clustering (AHC) method classified the varieties into four similarity classes. Based on the findings of this experiment, suggestions were made for allelic composition improvement through introgression of superior alleles from known Glu-1 and Glu-3 sources.
8 The NB-LRR gene Pm60 confers powdery mildew resistance in wheat
(Readers interested in this paper are welcome to write us a commentary or reading guide.)
Powdery mildew is one of the most devastating diseases of wheat.
To date, few powdery mildew resistance genes have been cloned from wheat due to the size and complexity of the wheat genome. Triticum urartu is the progenitor of the A genome of wheat and is an important source for powdery mildew resistance genes. Using molecular markers designed from scaffolds of the sequenced T. urartu accession and standard map-based cloning, a powdery mildew resistance locus was mapped to a 356-kb region, which contains two nucleotide-binding and leucine-rich repeat domain (NB-LRR) protein-encoding genes. Virus-induced gene silencing, single-cell transient expression, and stable transformation assays demonstrated that one of these two genes, designated Pm60, confers resistance to powdery mildew. Overexpression of full-length Pm60 and two allelic variants in Nicotiana benthamiana leaves induced hypersensitive cell death response, but expression of the coiled-coil domain alone was insufficient to induce hypersensitive response. Yeast two-hybrid, bimolecular fluorescence complementation and luciferase complementation imaging assays showed that Pm60 protein interacts with its neighboring NB-containing protein, suggesting that they might be functionally related. The identification and cloning of this novel wheat powdery mildew resistance gene will facilitate breeding for disease resistance in wheat.
9 Identification of QTL for flag leaf length in common wheat and their pleiotropic effects
Leaf size is an important factor contributing to the photosynthetic capability of wheat plants. It also significantly affects various agronomic traits. In particular, the flag leaves contribute significantly to grain yield in wheat.
A recombinant inbred line (RIL) population developed between varieties with significant differences in flag leaf traits was used to map quantitative trait loci (QTL) of flag leaf length (FLL) and to evaluate its pleiotropic effects on five yield-related traits, including spike length (SL), spikelet number per spike (SPN), kernel number per spike (KN), kernel length (KL), and thousand-kernel weight (TKW). Two additional RIL populations were used to validate the detected QTL and reveal the relationships in different genetic backgrounds. Using the diversity arrays technology (DArT) genetic linkage map, three major QTL for FLL were detected, with single QTL in different environments explaining 8.6–23.3% of the phenotypic variation. All the QTL were detected in at least four environments, and validated in two related populations based on the designed primers. These QTL and the newly developed primers are expected to be valuable for fine mapping and marker-assisted selection in wheat breeding programs.
10 The repetitive landscape of the 5100 Mbp barley genome
Here, we present an analysis of the repetitive fraction of the 5100 Mb barley genome, the largest angiosperm genome to have a near-complete sequence assembly. Genes make only about 2% of the genome, while over 80% is derived from TEs. The TE fraction is composed of at least 350 different families. However, 50% of the genome is comprised of only 15 high-copy TE families, while all other TE families are present in moderate or low copy numbers. We found that the barley genome is highly compartmentalized with different types of TEs occupying different chromosomal “niches”, such as distal, interstitial, or proximal regions of chromosome arms. Furthermore, gene space represents its own distinct genomic compartment that is enriched in small non-autonomous DNA transposons, suggesting that these TEs specifically target promoters and downstream regions.
Furthermore, their presence in gene promoters is associated with decreased methylation levels.
11 TaNTF2, a contributor for wheat resistance to the stripe rust pathogen
Nuclear Transport Factor 2 (NTF2) functions as a critical regulator in balancing the GTP- and GDP-bound forms of Ran, a class of evolutionarily conserved small GTP-binding protein. During the incompatible interaction between wheat-Puccinia striiformis f. sp. tritici (Pst), a cDNA fragment encoding a putative wheat NTF2 gene was found to be significantly induced, suggesting a potential role in wheat resistance to Pst. In this work, the full length of TaNTF2 was obtained, with three copies located on 7A, 7B and 7D chromosomes, respectively. QRT-PCR further verified the up-regulated expression of TaNTF2 in response to avirulent Pst. In addition, TaNTF2 was also induced by exogenous hormone applications, especially JA treatment. Transient expression of TaNTF2 in tobacco cells confirmed its subcellular localization in the cytoplasm, perinuclear area and nucleus. And virus induced gene silencing (VIGS) was used to identify the function of TaNTF2 during an incompatible wheat-Pst interaction. When TaNTF2 was knocked down, resistance of wheat to avirulent Pst was decreased, with a bigger necrotic spots, and higher numbers of hyphal branches and haustorial mother cells. Our results demonstrated that TaNTF2 was a contributor for wheat resistance to the stripe rust pathogen, which will help to comprehensively understand the NTF2/Ran modulating mechanism in wheat-Pst interaction.
12 Loss of AvrSr50 by somatic exchange in stem rust leads to virulence for Sr50 resistance in wheat
13 Variation in the AvrSr35 gene determines Sr35 resistance against wheat stem rust race Ug99
14 ZmCCT9 enhances maize adaptation to higher latitudes
The last paper is about maize. It was published in PNAS and is recommended to anyone doing map-based cloning.
Follow "小麦研究联盟" (Wheat Research Alliance) for the latest progress in wheat research. For submissions, reposts, collaboration, and announcements, contact: wheatgenome
Category: Literature picks | 2353 views | 0 comments
December 14, 2017, Wuhan, Hubei: genomics; transcriptomics; bioinformatics data analysis...
liyongjun304 2017-11-11 15:41
Section: Content

Introduction to bioinformatics
  Introduction to bioinformatics and current developments in cutting-edge techniques

Sequence alignment
  1. Global alignment: ClustalW, MUSCLE, HMMER
  2. Local alignment: BLAST, Sim4, GeneWise
  3. Analysis of sequence alignment algorithms

Genome / gene annotation analysis
  1. Principles of next-generation sequencing and data processing
  2. Genome assembly: de novo genome assembly methods; repeat sequence analysis
  3. RNA analysis: tRNA, rRNA, microRNA, snoRNA; RNA interference, siRNA prediction
  4. Gene prediction: prokaryotes: Glimmer; eukaryotes: Genscan, Augustus
  5. Gene functional annotation and commonly used databases

Overview of genomics research
  1. Structural genomics
  2. Functional genomics
  3. Drug discovery
  4. Personalized (precision) medicine

DNA sequencing technology and the evolution of transcriptome analysis
  1. First generation: principle of Sanger sequencing
  2. Second generation: principles of Illumina, 454, Ion Torrent
  3. Third generation: principles of PacBio, Helicos
  4. Fourth generation: principle of Oxford Nanopore
  5. Other technologies: hybridization-based methods (NabSys)

Experimental procedure for transcriptomic analysis
  Introduction; number of duplications; sequencing coverage
  Transcriptomic analysis using NGS (RNA-Seq)
  Transcriptomic analysis using PacBio (Iso-Seq)
  Iso-Seq experimental design

Data analysis (part 1): data pre-processing and evaluation of data quality
  Data formats: fasta, fastq, quality value, gff3
  Data cleanup: quality filter, trimmer, clipper

Data analysis (part 2): reference-free transcriptome analyses
  Gene discovery: Trinity de novo transcriptome assembly
  Analysis of differentially expressed genes (DEGs): abundance estimation using RSEM; differential expression analysis using edgeR
  Exploring the results (cummeRbund): MA plot, volcano plot, false discovery rate (FDR), hierarchical two-way clustering, pairwise sample distance, gene expression profiles

Bioinformatics analysis with R
  Using R packages for differential expression, enrichment analysis, and similar analyses of high-throughput sequencing data from transcriptomics and other omics
  How to draw standard bioinformatics figures such as KEGG and GO plots, implemented in R

Using the genome visualization software circos
  Drawing circular genome plots with circos

Common bioinformatics tools and plotting methods
  Producing publication figures and converting formats with commonly used bioinformatics tools
  Charts and format conversion with common biology software such as BioEdit and WEGO

Registration and fees: ¥4300 per person (covers registration, materials, training, and examination fees). Accommodation can be arranged centrally at participants' own expense. Relevant departments are asked to organize registration for administrative bodies, enterprises, and institutions in their regions; organizations may also register directly. Please fax the registration form to the conference office.
Location: Wuhan, Hubei
Contact: 李永军
Phone: 18513478760
Email: 3263004853@qq.com
Wishing you success at work and good health!
3666 views | 0 comments
New progress in transcriptome-level research on fragile X syndrome, a common intellectual disability
Popularity 1 sciencepress 2016-10-13 15:22
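On October 11, a collaborative study by the joint group of 薛志刚 (Xue Zhigang) and 范国平 (Fan Guoping) and the group of 江赐忠 (Jiang Cizhong) at Tongji University School of Medicine was published online in SCIENCE CHINA Life Sciences (the English edition of 《中国科学:生命科学》). The study is the first to reveal, at the transcriptome level, the mechanism of abnormal neural development in fragile X syndrome.

Fragile X syndrome (FXS) is a very common form of X-linked intellectual disability (XLID). It is named for the fragile site on patients' X chromosome, which appears as a constriction under the microscope.

The FMR1 gene at the fragile site is widely accepted as the key causative gene of FXS. In unaffected individuals the FMR1 promoter carries only about 30 CGG trinucleotide repeats, whereas in FXS patients there are more than 200. The expanded CGG repeats raise the methylation level of the FMR1 promoter, which alters histone modifications and ultimately silences FMR1. In the cytoplasm, FMRP, the protein encoded by FMR1, selectively binds polyribosomes engaged in mRNA translation and thereby represses translation. Proteins repressed by FMRP include mGluR5, NMDAR, several cytoskeletal proteins, and multiple factors downstream of the ERK and mTORC1 pathways. It should not be overlooked, however, that FMRP also represses the expression of many transcription factors and that 5% of FMRP resides in the nucleus, which suggests that FMRP may participate, directly or indirectly, in transcriptional regulation. High-throughput, high-resolution results on how FMRP participates in transcriptional regulation are still lacking.

To address this question, 陆平 (Lu Ping) and colleagues used induced pluripotent stem cells (iPSC) as a model and compared the transcriptome dynamics of iPSCs from unaffected individuals and from FXS patients (FXS-iPSC) at each stage of directed in vitro differentiation into neurons. To compensate for the limited sample size, they also integrated data from other research groups. Bioinformatic analysis showed that neural differentiation-related transcription factors such as WNT1, BMP4, and POU3F4 were abnormally highly expressed in neurons differentiated from FXS-iPSC; that potassium channel proteins closely tied to neuronal function, such as KCNA1, KCNC3, and KCNG2, were abnormally low; and that the temporal expression of SHANK1 and NNAT, two proteins important for nervous system development, was also disordered. This indicates that neurons differentiated from FXS-iPSC are immature and lack full neuronal function.

Figure: four classes of genes specifically up- or down-regulated in FXS-iPSC-derived neurons

The study confirms that loss of FMRP can unbalance the entire gene regulatory network during nervous system development in FXS patients, and it identifies new candidate target molecules for treating FXS. Follow-up work will further test, in cell and animal models, the links between these newly identified genes and the clinical symptoms of FXS.

The research was funded by the National Natural Science Foundation of China, the Major Scientific Research Program of the Ministry of Science and Technology, the Ministry of Education's research start-up fund for returned overseas scholars, the Shanghai Municipal Commission of Health and Family Planning, and the Jiangsu Province science and technology program.

Paper: Lu, P., Chen, X., Feng, Y., Zeng, Q., Jiang, C., Zhu, X., Fan, G., and Xue, Z. (2016). Integrated transcriptome analysis of human iPS cells derived from a fragile X syndrome patient during neuronal differentiation. Sci China Life Sci. doi: 10.1007/s11427-016-0194-6 http://engine.scichina.com/publisher/scp/journal/SCLS/doi/10.1007/s11427-016-0194-6?slug=full%20text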
Category: SCIENCE CHINA (《中国科学》) papers | 5287 views | 1 comment
Big data in life science research
hsm 2016-8-6 18:25
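Big data has three aspects: volume, dimensionality, and completeness. Volume is already evident in current sequencing data: a single genome runs to several G. Dimensionality here means the number of genetic variants, and that is also in place, since whole-genome sequencing of germplasm populations provides it. The dimensionality of agronomic phenotypes, however, is not sufficient, and only by extending to metabolic and molecular phenotypes can it keep growing. Completeness means the completeness of combinations of different variants, and this is currently the weakest point: an ordinary species has tens of thousands of genes, while the populations we study contain only a few hundred individuals. From the standpoint of completeness, sample sizes above ten thousand are needed to basically meet the requirement. It can therefore be predicted that transcriptome profiling, metabolome profiling, and the creation of new materials by gene editing will underpin big-data research in the life sciences.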
4205 views | 0 comments
[Repost] BMKCloud (百迈客云) shows you how to build a transcriptome paper for a journal with an impact factor of 5
candice880816 2016-8-4 17:14
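Suppose that for a transcriptome study the experimental design is done, the samples have been collected and sequenced according to plan, and the standard analysis report is in hand. Now the results have to be organized and written up, but faced with the report we do not know where to start. Today we show how to put together the main framework of a transcriptome paper aimed at an impact factor of about 5.

Results 1. The first part of the paper introduces the experimental design (purpose, materials, sequencing method, and so on) and gives an overall assessment of the transcriptome sequencing data, including raw data volume, data filtering, and assembly. For a reference-free transcriptome, for example, these results can be presented in a table.

Results 2. Screening differentially expressed genes. Following the experimental design, choose the samples to compare, screen for differentially expressed genes, and compare the numbers of genes that are equally expressed and differentially expressed between groups; the differential genes can be further classified as up- or down-regulated. The numbers of differential genes between samples give a first indication of the biological relationships among samples. This part is usually shown as a Venn diagram of differential genes between samples.

Results 3. Functional enrichment and pathway analysis of differential genes. To work out the mechanism of a biological process, functional enrichment and pathway analysis are run on the genes that differ between samples. Based on these results, this part can discuss in detail the mechanisms behind traits that are shared between samples or specific to some of them. Results are typically presented as a figure.

Results 4. Co-expression trend analysis of differential genes. This analysis applies to time series with more than two samples. It examines how gene expression patterns change over time; for the gene sets following each pattern, one can draw expression pattern plots, GO classification plots, and KEGG pathway analyses to reveal the mechanism of the biological process. Results are typically presented as a figure.

Results 5. Co-expression network module analysis of differential genes. WGCNA is used for gene co-expression network analysis. It suits complex data and is recommended for five or more groups, for example five stages of organ development, or five time points before and after stress or pathogen infection. WGCNA extracts gene modules; one then analyzes the association between modules and phenotypes, characterizes the biological function of each module, analyzes interactions between modules, and finds the key genes of each module. Results are typically presented as a figure.

Results 6. By this point the core key genes of the project have usually been found, and the transcriptome analysis is fairly complete. To make the conclusions more convincing and the paper richer, consider a joint analysis with other methods to take the paper up a level. There are three main options: first, joint analysis with lncRNA, circRNA, small RNA, or methylation data; second, downloading related data from public databases, such as raw data from different materials of the same species under the same treatment, running the same transcriptome analysis, and summarizing the core conserved genes of that species under that treatment; third, validating the function of the key genes with proteomics, metabolomics, and other experimental methods.

In summary, the main framework of a transcriptome paper is now in place, with a clear line of thought and clear results. All of the analyses and plots above can be done on the BMKCloud platform; those interested can log in and try it at https://www.biocloud.net/

References:
Fu Y., Poli M., (2016) Dissection of early transcriptional responses to water stress in Arundo donax L. by unigene-based RNA-seq. Biotechnology for Biofuels. DOI 10.1186
Sun Q., Du X., (2016) To be a flower or fruiting branch: insights revealed by mRNA and Small RNA transcriptomes from different cotton developmental stages. Scientific reports. DOI: 10.1038
Vlasova A., Capella-Gutiérrez S., (2016) Genome and transcriptome analysis of the Mesoamerican common bean and the role of gene duplications in establishing tissue and temporal specialization of genes. Genome Biology. DOI 10.1186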
1670 views | 0 comments
RNA sequencing reveals transcriptome shock in interspecific triploid hybrids of Oryza
WileyChina 2016-6-14 10:14
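Authors: Ying Wu, Yue Sun, Xutong Wang, Xiuyun Lin, Shuai Sun, Kun Shen, Jie Wang, Tingting Jiang, Silin Zhong, Chunming Xu and Bao Liu
Original link: http://onlinelibrary.wiley.com/doi/10.1111/jipb.12357/full
Abstract: Interspecific hybridization is one of the major driving forces of genome evolution and speciation in higher plants. It often causes immediate, genome-wide changes in gene expression, a phenomenon collectively termed "transcriptome shock". Although transcriptome shock has been reported in many plant and animal species, the extent and pattern of the expression changes it induces tend to be highly idiosyncratic, so more studies are needed to find general rules. In this study, we used Asian cultivated rice Oryza sativa ssp. japonica (2n = 2x = 24, genome AA) and a tetraploid cytotype of the wild rice O. punctata (2n = 4x = 48, genome BBCC) as parents to produce a set of interspecific triploid F1 hybrid plants. The transcriptomes of the hybrids and their parents were analyzed by RNA sequencing. We analyzed the transcriptional differences of the hybrids relative to "in silico hybrids" (mechanical mixes of the parents in proportion to genome composition) from two angles: homeolog expression bias and the total expression level of each gene. Of the 16,112 genes expressed in leaf tissue of the F1 hybrids, 16% (2541) showed nonadditive expression, and these genes were specifically enriched in photosynthesis-related pathways. Interestingly, changes in the expression of maternally derived homeologs (including nonstochastic silencing) were the main cause of the altered relative homeolog expression in the F1 hybrids. These results provide new information on the possible relationship between the transcriptome response induced by interspecific hybridization and heterosis.
Authors' affiliation: Laboratory of Plant Molecular Epigenetics, Northeast Normal University; Chunming Xu; Bao Liu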
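Category: Life Science | 2 views | 0 comments
Single-cell sequencing that monitors genome and transcriptome changes at the same time, and some thoughts
Popularity 2 gaoshannankai 2015-8-19 21:55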
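Single-cell and single-molecule sequencing have brought rapid advances to cancer detection. Nature Methods recently published an impressive paper that detects changes at both the genome and the transcriptome level at the same time:

In those cells where chromosomal gains or losses (either reciprocal or nonreciprocal) were seen at the genomic level, we observed concomitant increases and decreases in chromosome-wide relative gene expression levels after G&T-seq analysis, which established for the first time (to our knowledge) that the effects of gene expression dosage can be rapidly established after the acquisition of aneuploidies during a single cell division.

Here are some comments on the paper for the teachers and students in our bioinformatics group. Look at its supplementary figures: the results are so good that I felt I had to post all of the supplementary figures as well.

The paper's biggest problem is that the bioinformatics analysis is not professional enough. Many steps use common software with default parameters, which is not the best choice. The R plots in particular are a mess: several of the colors cannot be told apart, and even with my glasses for more than 6 diopters of myopia I had to zoom in many times.

The analysis is also somewhat crude. The authors should have checked which genes change expression by a fold consistent with the chromosomal change, that is, changes caused by the chromosomal change, and which genes change far more or far less than the chromosomal change would produce, which has to be attributed to other causes such as fusion genes or regulation. Surprisingly, the most valuable information this technique provides was not fully used.

The paper is titled "G&T-seq: parallel sequencing of single-cell genomes and transcriptomes". Attachments: macaulay2015.pdf nmeth.3370-S1.pdf

These thoughts raise a question. If every cancer involves chromosome-level DNA changes, which necessarily alter gene expression, and the former changes are larger and probably the dominant factor, then what is the point of the large amount of differential expression analysis we do now? Conversely, what causes these chromosome-level DNA changes?

But even if all of this were worked out, so what? There are really only two roads ahead. One is to keep looking for carcinogenic mechanisms, which will probably be a long road. The other is to set mechanisms aside, find through testing what the manifestations of a given cancer have in common, then target certain genes and knock them out. The latter is conceptually clearer, though it is not that simple.

This leads to the thought that the many groups in China following this line of work are doing little more than re-validating the technical reliability of foreign instruments. What is needed now is more clinical information, especially research on drug-cancer interactions and other topics that directly help detection and treatment, and this is exactly what nobody is doing. At present most doctors have given up their greatest advantage, access to patients, to compete with medical school and life science faculty in killing lab mice.

The precision medicine being touted for the second half of this year should not be rolled out blindly either. It amounts to a large pile of exome sequencing. For hospitals it brings in money and papers, but for patients, does it improve treatment, or match the most appropriate drug to the detected mutations?

In fact, a few years ago we wanted to build a library of more than 3000 drugs and match them against mutated drug targets. After several failed grant applications I could not be bothered to write more proposals of this kind, which are a waste of paper. When I have time I may work with a company to do something useful, in the hope of helping cancer patients.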
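The comparison suggested above, separating genes whose expression change matches the chromosomal copy-number change from genes that change far more or far less, could be sketched roughly as follows. This is an illustration only and not the method of the G&T-seq paper; the function name, the input dictionaries, and the tolerance of 0.5 log2 units are all assumptions made for the sketch.

```python
import math
from typing import Dict


def classify_by_dosage(
    expr_log2fc: Dict[str, float],    # gene -> log2 fold change in expression
    chrom_of_gene: Dict[str, str],    # gene -> chromosome name
    copy_ratio: Dict[str, float],     # chromosome -> copy-number ratio (e.g. 1.5 for 3 vs 2 copies)
    tol: float = 0.5,                 # hypothetical tolerance in log2 units
) -> Dict[str, str]:
    """Label each gene as consistent with chromosome dosage or not."""
    labels = {}
    for gene, lfc in expr_log2fc.items():
        # Chromosomes without a recorded gain or loss are assumed unchanged (ratio 1.0).
        expected = math.log2(copy_ratio.get(chrom_of_gene[gene], 1.0))
        if abs(lfc - expected) <= tol:
            labels[gene] = "dosage-consistent"
        else:
            labels[gene] = "beyond-dosage"
    return labels


if __name__ == "__main__":
    labels = classify_by_dosage(
        {"geneA": 0.6, "geneB": 3.0, "geneC": 0.1},
        {"geneA": "chr1", "geneB": "chr1", "geneC": "chr2"},
        {"chr1": 1.5},  # hypothetical gain of one copy of chr1
    )
    print(labels)
```

Genes labelled "beyond-dosage" would be the candidates for other explanations such as fusion genes or regulation; in real single-cell data the tolerance would have to reflect measurement noise.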
1409 views | 2 comments
Specimen issues in transcriptome studies of higher-level insect taxa
hypermarket 2015-7-13 22:04
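Transcriptome-based phylogenetic studies of higher-level taxa in insects (and many other invertebrates) pose a challenge for researchers: obtaining suitable samples. The difficulty comes from five sources.

First, a study needs a complete selection of representative taxa, yet some key taxa are hard to collect because of restricted ranges, cryptic habitats, or differences in the seasonality of population fluctuations, even for researchers with ten or more years of collecting experience. Environmental change and chance factors such as weather make collecting harder still. For higher taxa whose members differ greatly in habitat and habits, this makes reasonably complete sampling of representative groups difficult.

Second, in traditional taxonomy the number of individuals also matters, but it is unrelated to body size. For the total RNA that transcriptome work requires, smaller individuals necessarily mean more individuals, and body sizes of 1-2 mm are common among insects.

Third, in traditional taxonomy there is usually no rush to tell apart closely related or morphologically similar species in the field, but for transcriptome work this need is pressing, especially when close relatives occur in the same area. Intraspecific variation such as sexual dimorphism and color morphs makes it harder. Accurate identification requires good familiarity with the taxonomy, morphology, distribution, and habits of the groups concerned. Specimens can be brought back to the laboratory for rearing and sorting, but then live individuals may die because the rearing conditions are unfamiliar.

Fourth, in a reasonably complete study design, each species needs both individuals for RNA extraction and dry-mounted specimens; the latter are used for taxonomic work, that is, species identification after returning to the home institution. Ideally, some ethanol-preserved specimens are kept as well.

Fifth, some key taxa occur in places that are very hard to reach, or abroad. Even if enough live individuals of one species can be collected, preserving the samples properly until RNA quantity and integrity can be checked is a very real challenge, because storage in either RNALater or liquid nitrogen is time-limited.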
3527 views | 0 comments
A rough calculation of family-level taxon coverage in the 1KITE project
hypermarket 2015-6-1 16:28
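In the 1000 insect transcriptome project (1K Insect Transcriptome Evolution, 1KITE), taxon selection ( http://www.1kite.org/downloads/1KITE_species.txt ) concentrates on a few major directions, namely relationships among orders, Polyneoptera, and Holometabola, and on the internal relationships of groups such as Odonata, Dictyoptera, Hymenoptera, and Trichoptera. In Paraneoptera the data are comparatively sparse, especially in Hemiptera. Comparison with other groups shows that among the largest orders of Insecta, Hemiptera and Coleoptera have the lowest coverage in the existing data, only about 1/4 or less, below the overall average for Hexapoda. The number of families in each order varies somewhat between classification systems, but within limits, so the overall picture is roughly as shown in the table.

Taxon | Families in 1KITE | Total families | Percentage
Collembola | 11 | 30 | 37%
Protura | 1 | 7 | 14%
Diplura | 2 | 10 | 20%
Archaeognatha | 2 | 2 | 100%
Zygentoma | 4 | 4 | 100%
Odonata | 21 | 32 | 66%
Ephemeroptera | 5 | 23 | 22%
Plecoptera | 12 | 16 | 75%
Dermaptera | 9 | 11 | 82%
Zoraptera | 1 | 1 | 100%
Mantodea | 13 | 15 | 87%
Blattodea (including termites) | 12 | 17 | 71%
Orthoptera | 20 | 40 | 50%
Grylloblattodea | 1 | 1 | 100%
Mantophasmatodea | 3 | 3 | 100%
Embioptera | 4 | 13 | 31%
Phasmatodea | 8 | 13 | 62%
Psocoptera + Phthiraptera | 21 | 66 | 32%
Thysanoptera | 3 | 9 | 33%
Hemiptera - "Homoptera" | 18 | 85 | 21%
Hemiptera - Heteroptera | 20 | 93 | 22%
Hymenoptera | 63 | 126 | 50%
Strepsiptera | 1 | 10 | 10%
Coleoptera | 44 | 176 | 25%
Megaloptera | 2 | 2 | 100%
Raphidioptera | 2 | 2 | 100%
Neuroptera | 14 | 16 | 88%
Trichoptera | 31 | 49 | 63%
Lepidoptera | 47 | 95 | 49%
Diptera | 69 | 159 | 43%
Mecoptera | 4 | 9 | 44%
Siphonaptera | 3 | 18 | 17%
Total | 471 | 1153 | 41%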
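The arithmetic behind the table can be reproduced with a short script. This is only a sketch that recomputes the figures from the counts listed above; the English taxon labels and the function names are chosen for the example.

```python
# (families represented in 1KITE, total families) per taxon, copied from the table above
FAMILY_COUNTS = {
    "Collembola": (11, 30), "Protura": (1, 7), "Diplura": (2, 10),
    "Archaeognatha": (2, 2), "Zygentoma": (4, 4), "Odonata": (21, 32),
    "Ephemeroptera": (5, 23), "Plecoptera": (12, 16), "Dermaptera": (9, 11),
    "Zoraptera": (1, 1), "Mantodea": (13, 15), "Blattodea": (12, 17),
    "Orthoptera": (20, 40), "Grylloblattodea": (1, 1), "Mantophasmatodea": (3, 3),
    "Embioptera": (4, 13), "Phasmatodea": (8, 13),
    "Psocoptera + Phthiraptera": (21, 66), "Thysanoptera": (3, 9),
    "Hemiptera (Homoptera)": (18, 85), "Hemiptera (Heteroptera)": (20, 93),
    "Hymenoptera": (63, 126), "Strepsiptera": (1, 10), "Coleoptera": (44, 176),
    "Megaloptera": (2, 2), "Raphidioptera": (2, 2), "Neuroptera": (14, 16),
    "Trichoptera": (31, 49), "Lepidoptera": (47, 95), "Diptera": (69, 159),
    "Mecoptera": (4, 9), "Siphonaptera": (3, 18),
}


def coverage_percent(covered: int, total: int) -> int:
    """Family coverage as a whole-number percentage."""
    return round(100 * covered / total)


def overall_coverage(counts):
    """Return (families covered, total families, overall percentage)."""
    covered = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return covered, total, coverage_percent(covered, total)


def below_average(counts):
    """Taxa whose family coverage is below the overall (unrounded) average."""
    covered, total, _ = overall_coverage(counts)
    average = covered / total
    return sorted(name for name, (c, t) in counts.items() if c / t < average)


if __name__ == "__main__":
    print(overall_coverage(FAMILY_COUNTS))
    print(below_average(FAMILY_COUNTS))
```

Listing the below-average taxa makes the point in the text concrete: both Hemiptera rows and Coleoptera fall under the overall average.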
3497 views | 0 comments
Functional genomics notes (2): the transcriptome
hsm 2015-1-10 12:50
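The transcriptome is the link between the genome and gene function. To understand the relationship between the genome and cellular function, scientists generally study the products of the genome (RNA or protein). The proteome is the full set of proteins of a cell or tissue, including quantitative and sequence information, but because protein capture technology is not yet mature, the time for large-scale functional genomics based on proteins has not yet come. Measuring RNA, the intermediate between gene and protein, is by contrast technically mature and can be done at scale.

The transcriptome is a marker of gene activity. In a multicellular organism every cell contains the same genome and genes, but not every gene is expressed in every cell, nor with the same pattern. Studying the transcriptomes of different cells and tissues makes it possible to:
1) understand what characterizes a given cell type;
2) understand in depth how disease changes gene expression;
3) interpret gene expression at different developmental stages;
4) study expression dynamics among ohnologs and learn how genes diverge;
5) study which genes' expression is affected by a gene mutation;
6) study which mutations (in the promoter or the coding region) most affect the expression of the gene itself or of downstream genes.

Only a small part of the genome is transcribed into RNA; 5% of the human genome is transcribed into RNA (Frith et al., 2005). Moreover, most transcribed RNA is non-coding RNA (tRNA, rRNA, microRNA, and so on). mRNA also shows a great deal of variation, including alternative splicing, RNA editing, and variation in transcription start and termination sites.

Original link: http://www.nature.com/scitable/topicpage/transcriptome-connecting-the-genome-to-gene-function-605
Frith, M. C., et al. Genomics: The amazing complexity of the human transcriptome. European Journal of Human Genetics 13, 894–897 (2005) doi:10.1038/sj.ejhg.5201459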
9614 views | 0 comments
Bioinformatics training
hsm 2014-5-17 11:11
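The course covers:
Linux basics
Introduction to Perl
Resequencing data analysis
Transcriptome data analysis
RAD sequencing data analysis
GoldenGate/Infinium array data analysis
Molecular marker development
Genetic map construction
QTL mapping
To discuss, please join group 213176882.
For details, visit: http://sqkj.ke.qq.com/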
2524 views | 0 comments
How to make reference-free transcriptome data more reliable
Popularity 1 Bearjazz 2014-1-8 09:09
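熊荣川 (Xiong Rongchuan), Bioinformatics Laboratory, Liupanshui Normal University
xiongrongchuan@126.com
http://blog.sciencenet.cn/u/Bearjazz

Transcriptome data from species without a reference genome are usually assembled de novo. Because most second-generation sequencing technologies produce short reads, some "artificial" genes are inevitably generated. What methods or principles can refine the results and exclude false positives in this situation?

The following principles come from Genes IX (《基因九》):
1. An open reading frame inferred from sequence alone should preferably be longer than 100 amino acids.
2. A valid gene should have a homolog in closely related species.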
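The first principle can be sketched as a simple filter on assembled transcripts. This is a minimal illustration under stated assumptions: only complete ORFs (an ATG followed in frame by a stop codon) on either strand are counted, ambiguous bases are not handled, and the function names are invented for the example. The second principle needs a homology search against related species and is not covered here.

```python
from typing import Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start included, stop excluded) of the longest
    complete ORF over all six reading frames."""
    best = 0
    for strand in (seq.upper(), reverse_complement(seq)):
        for frame in range(3):
            run = 0  # codons in the ORF currently being read; 0 means not in an ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if run == 0:
                    if codon == "ATG":
                        run = 1
                elif codon in STOP_CODONS:
                    best = max(best, run)
                    run = 0
                else:
                    run += 1
    return best


def filter_by_orf(transcripts: Dict[str, str], min_aa: int = 100) -> Dict[str, str]:
    """Keep transcripts whose longest complete ORF is longer than min_aa amino acids."""
    return {name: seq for name, seq in transcripts.items()
            if longest_orf_codons(seq) > min_aa}


if __name__ == "__main__":
    toy = {
        "long_tx": "ATG" + "GCT" * 100 + "TAA",   # toy ORF of 101 codons
        "short_tx": "ATG" + "GCT" * 10 + "TAA",   # toy ORF of 11 codons
    }
    print(sorted(filter_by_orf(toy)))
```

Requiring a complete ORF is a deliberate simplification: partially assembled transcripts with a truncated ORF would be dropped, so a real pipeline might relax that condition.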
Category: My research | 2758 views | 1 comment
More on some issues in RNA-seq transcriptome data analysis
Popularity 5 gaoshannankai 2013-5-31 04:11
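Last time I wrote "Some lessons from RNA-seq experiments, to help researchers in China avoid detours": http://blog.sciencenet.cn/blog-907017-688359.html

One very important point there was quality control and contamination removal. Before alignment or assembly, contaminating sequences such as adapters, viruses, and rRNA must be removed. After alignment or assembly, the alignment rate and the expression correlation between replicates still need to be checked carefully.

In a Tni transcriptome (an insect) I worked on recently, assembly produced 70458 transcripts. Quality control then looked wrong, so I BLASTed the data against the nt database and found, to my dismay, that 11825 transcripts came from E. coli. That many can only be contamination, and it has to be removed, otherwise it will badly affect downstream analysis, normalization for example.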
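A filtering step like the one described above could be sketched as follows. The input layout is an assumption made for the example: a tab-separated hit table with the query transcript ID in the first column and the species name of the hit in the second, with each query's best hit listed first. The function names are likewise invented; this is not the author's actual pipeline.

```python
from typing import Dict, Set


def contaminant_ids(hits_tsv: str, contaminant: str = "Escherichia coli") -> Set[str]:
    """IDs of transcripts whose best (first-listed) hit names the contaminant species."""
    best_species: Dict[str, str] = {}
    for line in hits_tsv.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Keep only the first hit seen for each query (assumed to be the best one).
        best_species.setdefault(fields[0], fields[1])
    return {qid for qid, species in best_species.items() if contaminant in species}


def remove_contaminants(transcripts: Dict[str, str], bad_ids: Set[str]) -> Dict[str, str]:
    """Drop flagged transcripts before normalization and downstream analysis."""
    return {tid: seq for tid, seq in transcripts.items() if tid not in bad_ids}


def contaminated_fraction(n_contaminated: int, n_total: int) -> float:
    """Share of the assembly that was flagged as contamination."""
    return n_contaminated / n_total


if __name__ == "__main__":
    # Numbers from the post: 11825 of 70458 assembled transcripts matched E. coli.
    print(f"{contaminated_fraction(11825, 70458):.1%}")
```

With the numbers from the post, roughly one transcript in six is bacterial, which shows why leaving them in would distort any normalization that depends on total counts.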
6593 views | 8 comments
[Repost] Some questions about identifying new transcripts
Popularity 1 bioseq 2012-9-18 10:02
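I came across a thread on a forum and have tidied it up for everyone. Note: the content below comes from the China Sequencing Forum (中国测序论坛).

Post 1, A asks:
Colleagues, is anyone here working on novel transcripts? I built a set of novel transcripts with the TopHat-Cufflinks pipeline, and I would like to raise a few questions for discussion.
First, length. Of the ten to twenty thousand new transcripts I obtained, several thousand are only one read long (we used SOLiD 3, 50SE), which is clearly impossible. But how do we decide that a transcript is not artificial? How do we filter out false transcripts introduced by the chemistry and by the computation?
Second, expression level, which is also puzzling and which I expect everyone doing transcriptome work runs into. Many genes have a very low expression value (FPKM), for example 0.00005. At such a low level, when we look at the BAM file in IGV (Integrative Genomics Viewer, www.broadinstitute.org/igv), there are very few reads on the gene, one or two, and they are not contiguous, so it looks just like noise. Is there a suitable cutoff for filtering low-expression transcripts?
The last question concerns existing annotation. Once I have a batch of transcripts, I first need to compare them with known transcripts and screen out the known ones; only what remains is truly "novel". What I do now is use the Reference Annotation Based Transcript (RABT) assembly method in Cufflinks, with the refgene.gtf file downloaded from UCSC as the annotation. This is convenient because it works out of the box, but it ignores the annotation in many other databases. That matters little for ordinary transcriptome analysis, but it is a problem when studying new transcripts: many of the transcripts produced after all that effort may already be annotated in another database. I have heard that some top labs have merged the annotations from RefGene, Ensembl, FANTOM and other sources. Does anyone know where to download the most comprehensive annotation file?

Post 2, B answers:
1. About length: this does tend to happen with SOLiD SE50. The reads are short and not paired-end, so Cufflinks reports many of these single-read transcripts, which may not even cover one exon. The way to avoid it is to sequence longer reads, at least PE100, which is now the standard output of the HiSeq 2000. During analysis, a single-read transcript can be discarded unless its expression is very high.
2. About expression level: how low counts as low expression, that is, which FPKM cutoff is appropriate? The usual approach is to take the same RNA-seq data and look at the expression distribution of known transcripts of the same type. If you are interested in lincRNA, look at the expression distribution of known lincRNAs and take the 3/4 quantile. This is simple and practical. Some papers build more complex statistical models to estimate it; personally I think the simple approach is enough unless the paper is purely bioinformatic.
3. About annotation: we do not use cufflinks for this step; there are dedicated tools, cuffdiff and cuffmerge, that can do it. Of course this still comes down to the underlying problem you mention: you need a good, reasonably complete set of known transcripts. We have our own. Which species are you working on?

Post 3, A follows up:
(quoting B) "About length: this does tend to happen with SOLiD SE50. The reads are short and not paired-end, so Cufflinks reports ..."
Our current way of handling length is to look only at multi-exon transcripts. That seems more reliable, but the loss is still heavy.
About the 3/4 cutoff you mention: do you sort all transcripts by expression and drop the lowest 1/4? Or do you extract the expression column of all transcripts, sort and uniq it, take the value at the position 1/4 from the bottom as the cutoff, and drop every transcript below that cutoff?
I am already using cuffmerge, but the annotation problem is still hard to solve. The species is mouse.

Post 4, B answers:
SOLiD reads are too short; if you filter on multi-exon you will get almost nothing.
The 3/4 cutoff means using prior knowledge to obtain the actual value and then applying that value as the cutoff, for example FPKM = 0.35: a newly assembled transcript that falls below it, especially a single-exon or single-read transcript, is discarded.
I can give you a mouse set.

Post 5, A follows up:
"Using prior knowledge to obtain the actual value"? How is that done in practice?
Thank you. Where can I download the mouse annotation set? Or could you send it to my email? mart555@163.com

Post 6, C joins the discussion:
Better to teach someone to fish than to give them a fish. You can build the complete mouse set with cuffmerge:
cuffmerge -o $outDir $cuffMergeFilesList
where the file cuffMergeFilesList contains:
refseq.gtf
ensembl.gtf
ucsc.gtf

Post 7, A follows up:
But isn't the Ensembl annotation based on GRCm38, while the UCSC annotation is based on mm9? Both are mouse genomes, but they do differ in places. Won't the merge go wrong in that case? They are different:
in refgene.gtf:
chr3 refGene start_codon 34549338 34549340 0.000000 + . gene_id "Sox2"; transcript_id "NM_011443";
in Mus_musculus.GRCm38.68.gtf:
3 protein_coding exon 34650005 34652461 . + . gene_id "ENSMUSG00000074637"; transcript_id "ENSMUST00000099151"; exon_number "1"; gene_name "Sox2"; gene_biotype "protein_coding"; transcript_name "Sox2-001";
So there are differences. Can they be merged directly?

Post 8, C answers:
Of course not! Add chr to every entry in the first column of Mus_musculus.GRCm38.68.gtf.

I found this discussion excellent and decided to repost it here to share with everyone. Colleagues are welcome to join the discussion.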
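The two practical steps discussed in the thread, deriving an FPKM cutoff from known transcripts and adding a chr prefix to the Ensembl GTF, can be sketched roughly as follows. This is a minimal illustration and not the posters' actual scripts: the function names are hypothetical, the choice of quartile is left as a parameter because the thread does not settle which quartile "3/4" refers to, and mapping Ensembl's `MT` to UCSC's `chrM` is an added assumption. Renaming only harmonizes sequence names; it does not convert coordinates between assemblies such as mm9 and GRCm38, and unplaced scaffolds may still be named differently.

```python
import statistics


def expression_cutoff(known_fpkms, quartile=3):
    """Return one quartile (1, 2 or 3) of the FPKM values of known transcripts.

    Which quartile to use is a judgment call; 3 is only a placeholder default.
    """
    if quartile not in (1, 2, 3):
        raise ValueError("quartile must be 1, 2 or 3")
    cuts = statistics.quantiles(known_fpkms, n=4)  # three quartile cut points
    return cuts[quartile - 1]


def keep_novel(novel_fpkms, cutoff):
    """Keep novel transcripts whose FPKM is at or above the cutoff."""
    return {tid: fpkm for tid, fpkm in novel_fpkms.items() if fpkm >= cutoff}


def add_chr_prefix(gtf_line):
    """Prefix the first (sequence name) column of a tab-separated GTF line."""
    if gtf_line.startswith("#") or not gtf_line.strip():
        return gtf_line  # leave comments and blank lines alone
    seqname, sep, rest = gtf_line.partition("\t")
    if seqname.startswith("chr"):
        return gtf_line  # already UCSC-style
    # Assumption: Ensembl names the mitochondrion "MT", UCSC names it "chrM".
    seqname = "chrM" if seqname == "MT" else "chr" + seqname
    return seqname + sep + rest
```

To convert a whole file, `add_chr_prefix` would be applied to each line of the Ensembl GTF before passing the result to cuffmerge.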
5320 views | 1 comment
Some protein-coding genes originated from non-coding RNA
bioseq 2012-9-14 11:31
Two newly published rhesus macaque transcriptome papers report that some protein-coding genes originated from non-coding RNA and establish an expert-curated rhesus macaque database. Links to the articles are attached; discussion and criticism are welcome: http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002942 Hominoid-Specific De Novo Protein-Coding Genes Originating from Long Non-Coding RNAs Chen Xie, Yong E. Zhang, Jia-Yu Chen, ..., Liping Wei and Chuan-Yun Li Abstract Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes.
Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level. http://nar.oxfordjournals.org/content/early/2012/09/08/nar.gks835.long RhesusBase: a knowledgebase for the monkey research community Shi-Jian Zhang, Chujun Liu ....Xiuqin Zhang and Chuan-Yun Li Abstract Although the rhesus macaque is a unique model for the translational study of human diseases, currently its use in biomedical research is still in its infant stage due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community ( http://www.rhesusbase.org ). We performed strand-specific RNA-Seq studies in 10 macaque tissues and generated 1.2 billion 90-bp paired-end reads, covering 97.4% of the putative exon in macaque transcripts annotated by Ensembl. We found that at least 28.7% of the macaque transcripts were previously mis-annotated, mainly due to incorrect exon–intron boundaries, incomplete untranslated regions (UTRs) and missed exons. Compared with the previous gene models, the revised transcripts show clearer sequence motifs near splicing junctions and the end of UTRs, as well as cleaner patterns of exon–intron distribution for expression tags and cross-species conservation scores. Strikingly, 1292 exon–intron boundary revisions between coding exons corrected the previously mis-annotated open reading frames. The revised gene models were experimentally verified in randomly selected cases. We further integrated functional genomics annotations from 60 categories of public and in-house resources and developed an online accessible database. User-friendly interfaces were developed to update, retrieve, visualize and download the RhesusBase meta-data, providing a ‘one-stop’ resource for the monkey research community.
3482 views | 0 comments
Transcriptome data analysis: TopHat
Popularity 2 bioseq 2012-9-7 16:13
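Introduction to TopHat

TopHat is a Bowtie-based tool for analyzing RNA-Seq data. It can quickly identify exon-exon splice junctions. Precompiled TopHat builds are available for Linux and OS X x86_64, and you can also compile the source code for your own operating system. Its upstream tool is Bowtie and its downstream tool is Cufflinks.

In principle, TopHat was designed for Illumina Genome Analyzer data. It can sometimes analyze data from other sources, but success is not guaranteed. It is optimized for short reads of 75 bp or longer.

Before using TopHat, the directory containing the Bowtie executables must be added to the PATH variable, for example:
export PATH=$PATH:/share/sbin/bowtie
This ensures that TopHat can run bowtie, bowtie-inspect and bowtie-build. You also need to download and install samtools.

Installing TopHat: see the thread http://seq.cn/forum.php?mod=viewthreadtid=678

TopHat usage example:
tophat [options]* <ebwt_base> <reads1_1>

Some of TopHat's options, explained:
-o/--output-dir string: output directory. The default is "./tophat_out".
-r/--mate-inner-dist int: the expected inner distance between the two reads of a pair. For example, if the fragments are 300 bp and each read is 50 bp, r should be 200 = 300 - 2*50. There is no default; this value is required for paired-end alignment.
--mate-std-dev int: standard deviation of the inner distance between paired-end reads. The default is 20 bp.
-a/--min-anchor-length int: the anchor length. TopHat reports a junction (splice site) only when the reads spanning it have an anchor of at least this length on each side. It cannot be less than 3; the default is 8.
-m/--splice-mismatches int: number of mismatches allowed within the anchor region. The default is 0.
-i/--min-intron-length int: minimum intron length. The default is 70.
-I/--max-intron-length int: maximum intron length. The default is 500000.
--max-insertion-length int: maximum insertion length in an alignment. The default is 3.
--max-deletion-length int: maximum deletion length in an alignment. The default is 3.

Reference: http://seq.cn/forum.php?mod=viewthreadtid=1650extra=page%3D1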
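The -r value in the example above is the fragment length minus the length of both reads (300 - 2 × 50 = 200). The sketch below computes it and assembles a paired-end command line from the options listed in this post. It is an illustration under assumptions, not an official wrapper: the function names and the file names in the usage note are hypothetical, the second reads file is passed as the positional argument after the first, and the command is only built as a list, never executed.

```python
def mate_inner_dist(fragment_length, read_length):
    """Inner distance between mates: the fragment minus both read lengths."""
    return fragment_length - 2 * read_length


def build_tophat_command(index_base, reads_1, reads_2,
                         fragment_length, read_length,
                         std_dev=20, output_dir="./tophat_out"):
    """Return a paired-end tophat command as an argument list (not executed)."""
    inner = mate_inner_dist(fragment_length, read_length)
    return [
        "tophat",
        "-o", output_dir,               # output directory
        "-r", str(inner),               # expected mate inner distance
        "--mate-std-dev", str(std_dev), # spread of the inner distance
        index_base,                     # Bowtie index base name
        reads_1,                        # first reads of each pair
        reads_2,                        # second reads of each pair
    ]
```

For example, `build_tophat_command("mm9", "s_1.fq", "s_2.fq", 300, 50)` yields a list that begins with `tophat -o ./tophat_out -r 200 --mate-std-dev 20` and can be handed to `subprocess.run` once TopHat and Bowtie are on the PATH.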
7488 views | 3 comments
[Repost] A transcriptome oddity: chimeric RNA
bioseq 2012-9-7 12:16
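A chimeric RNA is made up of exons from two or more genes and has the potential to encode a new protein that changes the cell's phenotype. Chimeric RNA is named after the Chimera, the composite beast of mythology. To date, researchers have used high-throughput RNA sequencing to identify large numbers of candidate chimeric transcripts in many organisms.

Most people believe that chimeric RNAs form through trans-splicing.

Trans-splicing: intron splicing normally takes place within a single gene, where the introns are removed and adjacent exons are joined; this is called cis-splicing. In another situation, exons from different genes are joined after splicing; this is called trans-splicing. Trans-splicing is relatively rare. Typical examples are the trypanosome variable surface glycoprotein (VSG) genes, the nematode actin genes, and the psa gene in the chloroplast DNA of Chlamydomonas.

Chimeric RNAs and the products they encode are often one of the causes of tumorigenesis. However, these molecules are also frequently expressed at low levels in normal cells. It was long thought that they arise from DNA rearrangement; only now has it become clear that RNA trans-splicing occurs frequently in some normal cells. Some chimeric RNAs are therefore also detected in normal cells, and their functions remain to be explored.

Online video with a detailed account of chimeric RNA: http://v.youku.com/v_show/id_XNDQ2NDYyNzcy.html

Introduction to chimera RNA(www.seq.cn).wmv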
2222 views | 0 comments
Genome of the medicinal plant cannabis decoded
bingansuan 2012-3-28 14:10
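Cannabis is a herbaceous plant of the family Cannabaceae, order Urticales. It has been cultivated in China for more than 6,000 years. Its fiber is used for textiles, rope and paper; its seeds can be pressed for oil, used as feed, or used in medicine. Its most notorious feature, of course, is tetrahydrocannabinol (THC), extracted from the dried flowers and trichomes of female plants. THC is hallucinogenic and toxic when consumed, and cannabis is the most widely abused addictive drug in Western societies today. Because cannabis has inhibitory and anesthetic effects on the central nervous system, it is also used clinically: in some European and American countries it is used to treat late-stage cancer, multiple sclerosis and other conditions (http://baike.baidu.com/view/9878.htm).

To date, cannabis comprises three forms: Cannabis sativa, Cannabis indica and Cannabis ruderalis (http://zh.wikipedia.org/wiki/%E5%A4%A7%E9%BA%BB). As a result of natural selection and domestication, strains differ considerably in the content and composition of their cannabinoids, and therefore in their uses. The two main cannabinoids are THC (tetrahydrocannabinol), which is hallucinogenic, and CBD (cannabidiol), which is not. In the plant, their biosynthetic precursors are THCA and CBDA, respectively. Based on the ratio of the two, the chemotype with high THCA and low CBDA is called marijuana, and the opposite chemotype is called hemp.

On August 8, 2011, Medicinal Genomics announced that it had completed whole-genome sequencing of two cannabis strains, Cannabis sativa and Cannabis indica (http://www.medicinalgenomics.com/genome-browsers/). To reduce genome complexity, lines backcrossed three times were sequenced, and the genomic variation between cannabis strains was found to exceed 1%.

On October 20, 2011, the University of Toronto, the University of Saskatchewan and other Canadian institutions published "The draft genome and transcriptome of Cannabis sativa" in Genome Biology (http://www.ncbi.nlm.nih.gov/pubmed?term=The%20draft%20genome%20and%20transcriptome%20of%20Cannabis%20sativa). They sequenced the genome of Purple Kush, a THC-rich marijuana strain, and resequenced the hemp cultivars Finola and USO-31. They also compared transcriptome sequencing results from Purple Kush and Finola.

The findings include the following. The Cannabis sativa genome assembly is 534 Mb (the actual genome size is 843 Mb in male plants and 818 Mb in female plants), with about 30,000 genes. THCA synthase is expressed at a much higher level in Purple Kush than in Finola, and transcripts for the entire cannabinoid biosynthesis pathway were found (18 in total). Resequencing showed almost no difference among the four cannabis strains in the copy number of genes encoding the cannabinoid pathway enzymes. A phylogenetic tree of the strains built from single-nucleotide variants revealed the history of divergence between marijuana and hemp.

On March 2, 2012, Shaanxi Normal University, the University of Ottawa in Canada, the Kunming Institute of Zoology of the Chinese Academy of Sciences and other institutions published "Acute Cannabinoids Impair Working Memory through Astroglial CB1 Receptor Modulation of Hippocampal LTD" in Cell (http://www.ncbi.nlm.nih.gov/pubmed?term=Acute%20Cannabinoids%20Impair%20Working%20Memory%20through%20Astroglial%20CB1%20Receptor%20Modulation%20of%20Hippocampal%20LTD). Using mice, immunohistochemistry, electrophysiology and behavioral tests, the researchers revealed the molecular mechanism by which cannabinoids impair working memory. The work offers a new perspective for research on memory loss and should also help in studying how cannabis relieves pain.

Researchers are lifting the veil on cannabis step by step.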
Category: Chinese herbal medicine | 8168 views | 0 comments

Copyright © 2007- China Science Daily (中国科学报社)
