
Transcriptome Analysis of Kernel Number per Spike in Wheat (Part 1)

2018-2-17 22:12 | Category: Research notes | Keywords: wheat, transcriptome, association analysis, GWAS, RNA_seq

Author of this installment: Neal


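Going home for the Spring Festival, Pangya has to survive two ordeals. The first is the pressure to marry and have a baby: you're not young anymore, so why aren't you married, and why don't you have a child yet? The second is graduation: you're not young anymore, so why haven't you graduated? Pangya can't give either question a definite answer and always dodges them. When relatives have patiently offered example after example and Pangya still won't budge, they always finish with the same line: "You've studied yourself stupid." Pangya doesn't argue, having at times wondered whether that is actually true. When people hear that Pangya is still doing a PhD, they usually look envious and say, "You'll make big money after you graduate." Pangya always answers, "No big money for me. I've studied myself stupid; there's no way I'll get rich."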

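That's enough chit-chat for today; more another time. Back to today's topic.

The groups of Jiao Yuling at the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Wang Xiangfeng at China Agricultural University used a Chinese wheat mini core collection, previously assembled by other researchers, to study the gene expression regulatory network of young spike development. Combining transcriptome association analysis with gene co-expression network analysis, they identified several core co-expression modules associated with kernel number per spike and verified the roles of key factors in regulating this trait. The researchers overexpressed 10 of these genes. Overexpressing TaTFL1 prolonged young spike differentiation and increased the numbers of spikelets, florets, and kernels per spike, whereas overexpressing TaPAP2 or TaVRS1 shortened young spike differentiation and reduced the numbers of spikelets, florets, and kernels per spike. These results provide a theoretical basis for further dissecting the genetic regulation of wheat spike development, and offer preliminary technical validation for exploiting molecular modules associated with kernel number per spike.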
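The core idea of transcriptome association is to test, across many accessions, whether a gene's expression level co-varies with the trait. A minimal sketch, assuming a plain Pearson correlation per gene as a stand-in (the paper's actual statistical model is not described in this post); the gene names and numbers in the demo are hypothetical toy values, not data from the study.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_genes_by_trait(expr: Dict[str, List[float]],
                        trait: List[float]) -> List[Tuple[str, float]]:
    """Return (gene, r) pairs sorted by |r|, strongest association first."""
    scores = [(gene, pearson(values, trait)) for gene, values in expr.items()]
    return sorted(scores, key=lambda gr: abs(gr[1]), reverse=True)


if __name__ == '__main__':
    # Hypothetical toy data: kernel number and expression of three genes in five accessions
    kernel_number = [40.0, 45.0, 50.0, 55.0, 60.0]
    expression = {
        'geneA': [1.0, 2.0, 3.0, 4.0, 5.0],   # rises with the trait
        'geneB': [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with the trait
        'geneC': [2.0, 1.0, 3.0, 1.0, 2.0],   # no clear trend
    }
    for gene, r in rank_genes_by_trait(expression, kernel_number):
        print(gene, round(r, 3))
```

In a real analysis each correlation would also need a p-value and multiple-testing correction across tens of thousands of genes; the sketch only shows the ranking idea.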

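The results were published online in Plant Physiology on August 14, 2017 (DOI: 10.1104/pp.17.00694) under the title "Transcriptome Association Identifies Regulators of Wheat Spike Architecture". Wang Yuange, a postdoc in Jiao Yuling's group, and Yu Haopeng, a former PhD student of the group, are co-first authors. Jiao Yuling and Professor Wang Xiangfeng are co-corresponding authors, and Tong Yiping and Gao Caixia of the Institute of Genetics and Developmental Biology also took part in the study.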

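The above is the press release for the paper, included for background. Today we take a closer look by reading it alongside another paper, published in Front. Plant Sci., titled "A Combined Association Mapping and Linkage Analysis of Kernel Number Per Spike in Common Wheat (Triticum aestivum L.)". The figure below lists its authors; I bet many of you will recognize them.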

[Figure: author information of the Front. Plant Sci. paper]

Kernel number per spike (KNPS) in wheat is a key factor that limits yield improvement. In this study, we genotyped a set of 264 cultivars and a RIL population derived from the cross Yangmai 13/C615 using the 90K wheat iSelect SNP array. We detected 62 significantly associated signals for KNPS at 47 single nucleotide polymorphism (SNP) loci through genome-wide association analysis of data obtained from multiple environments. These loci were on 19 chromosomes, and the phenotypic variation attributable to each one ranged from 1.53 to 39.52%. Twelve (25.53%) of the loci were also significantly associated with KNPS in the RIL population grown in multiple environments. For example, BS00022896_51-2A (TT), BobWhite_c10539_201-2D (AA), Excalibur_c73633_120-3B (GG), and Kukri_c35508_426-7D (TT) were significantly associated with KNPS in all environments. Our findings demonstrate the effective integration of association mapping and linkage analysis for KNPS, and underpin KNPS as a target trait for marker-assisted selection and genetic fine mapping.
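The abstract reports the share of phenotypic variation attributable to each locus (1.53 to 39.52%). One common way to estimate this for a single marker is the R² of a linear regression of the phenotype on allele dosage (0/1/2 copies of one allele). The paper's exact model, which may account for population structure and kinship, is not given here, so treat this as a simplified sketch; the dosages and KNPS values in the demo are hypothetical. It also checks the abstract's arithmetic: 12 of 47 loci is 25.53%.

```python
from typing import List


def r_squared(genotypes: List[int], phenotypes: List[float]) -> float:
    """R^2 of a simple linear regression of phenotype on allele dosage (0/1/2).

    Raises ZeroDivisionError for a monomorphic marker (no genotype variance).
    """
    n = len(genotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotypes) / n
    sxy = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((p - mp) ** 2 for p in phenotypes)
    # With a single predictor, R^2 equals the squared Pearson correlation
    return sxy * sxy / (sxx * syy)


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100.0 * part / whole, 2)


if __name__ == '__main__':
    # Hypothetical dosages and KNPS values for six lines
    dosage = [0, 0, 1, 1, 2, 2]
    knps = [38.0, 41.0, 44.0, 46.0, 50.0, 52.0]
    print('PVE (%):', round(100 * r_squared(dosage, knps), 2))
    print('Loci confirmed in RIL (%):', percent(12, 47))
```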

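Both papers used the Chinese wheat mini core collection. Might combining their results turn up something interesting? In fact, the data from the first paper can also be used for association analysis. Below we go through it step by step; this is also my own lab notebook.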

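The first step is downloading the data from NCBI. I'll skip it here; see our earlier post on downloading SRA data and running online BLAST ("SRA数据的下载以及在线blast--或许与你了解的不一样", roughly "Downloading SRA data and online BLAST: maybe not what you think"). The second step is mapping the downloaded reads to the Chinese Spring genome and obtaining a gvcf file containing the variant information. Because these are transcriptome data, the aligner is STAR; how to build the genome index is not covered here. The pipeline below starts from the sra files and ends with bam files; adapt it as needed for your own project. As I've said before, variant analysis in wheat is new to me too: as the saying goes, I'm a bride getting into the wedding sedan chair for the first time.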
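The post does not show how the STAR genome index was built. A minimal sketch, assuming STAR's standard `--runMode genomeGenerate` with a GTF annotation; the annotation file name `IWGSC_v1.0_part.gtf` is hypothetical, while the FASTA path mirrors the one used in the script. `--sjdbOverhang 100` is chosen to match the value passed at mapping time, since STAR expects the two to agree. (The `_part` in the FASTA name presumably refers to the IWGSC v1.0 "parts" assembly, in which each chromosome is split so that BAM indexing stays under its 512 Mb coordinate limit.) By default the sketch only prints the command.

```python
import shlex
import subprocess
from typing import List

GENOME_DIR = '/data2/Fshare/FastaAndIndex/IWGSC_v1.0_STAR/'
FASTA = GENOME_DIR + 'IWGSC_v1.0_part.fasta'
GTF = GENOME_DIR + 'IWGSC_v1.0_part.gtf'   # hypothetical annotation file name


def star_index_cmd(genome_dir: str, fasta: str, gtf: str,
                   overhang: int = 100, threads: int = 20) -> List[str]:
    """Build a STAR genomeGenerate command; overhang should match the mapping step."""
    return ['STAR', '--runMode', 'genomeGenerate',
            '--genomeDir', genome_dir,
            '--genomeFastaFiles', fasta,
            '--sjdbGTFfile', gtf,
            '--sjdbOverhang', str(overhang),
            '--runThreadN', str(threads)]


def build_index(dry_run: bool = True) -> List[str]:
    """Print the command; run it only when dry_run is False."""
    cmd = star_index_cmd(GENOME_DIR, FASTA, GTF)
    print(' '.join(shlex.quote(c) for c in cmd))
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


if __name__ == '__main__':
    build_index(dry_run=True)
```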

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""RNA-seq preprocessing: SRA -> FASTQ -> trimmed reads -> STAR BAM -> Sentieon realigned BAM.

input.txt holds one sample per line: <SRA accession> <read-group/sample name>
"""
__author__ = 'wheatomics'

import subprocess

# STAR index directory and the reference FASTA used by Sentieon
GENOME_DIR = '/data2/Fshare/FastaAndIndex/IWGSC_v1.0_STAR/'
REF = GENOME_DIR + 'IWGSC_v1.0_part.fasta'
THREADS = '20'


def run(cmd):
    """Run one command without a shell and stop the pipeline if it fails."""
    subprocess.check_call(cmd)


with open('input.txt') as f:
    for line in f:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        sra, rg = fields
        print(sra, rg)

        # 0. SRA -> paired FASTQ, then adapter/quality trimming with fastp
        run(['fastq-dump', '--split-3', '--defline-qual', '+',
             '--defline-seq', '@$ac-$si/$ri', '--helicos', sra + '.sra'])
        run(['fastp', '-w', '10',
             '-i', sra + '_1.fastq', '-o', sra + '_out.1.fastq',
             '-I', sra + '_2.fastq', '-O', sra + '_out.2.fastq'])

        # 1. Mapping reads with STAR (two-pass mode, coordinate-sorted BAM)
        run(['STAR', '--twopassMode', 'Basic', '--genomeDir', GENOME_DIR,
             '--runThreadN', THREADS, '--limitSjdbInsertNsj', '5000000',
             '--outSAMtype', 'BAM', 'SortedByCoordinate', '--twopass1readsN', '-1',
             '--sjdbOverhang', '100', '--outFilterMismatchNmax', '6',
             '--readFilesIn', sra + '_out.1.fastq', sra + '_out.2.fastq',
             '--outSAMattrRGline', 'ID:' + rg, 'SM:' + rg, 'PL:ILLUMINA'])
        run(['mv', 'Aligned.sortedByCoord.out.bam', 'sorted.bam'])
        # Securely delete the raw and trimmed FASTQ files
        for suffix in ('_1.fastq', '_2.fastq', '_out.1.fastq', '_out.2.fastq'):
            run(['shred', '-u', '-z', sra + suffix])
        run(['sentieon', 'util', 'index', 'sorted.bam'])

        # 2. Metrics (without a shell, an empty --adapter_seq is passed as '', not "''")
        run(['sentieon', 'driver', '-r', REF, '-t', THREADS, '-i', 'sorted.bam',
             '--algo', 'MeanQualityByCycle', 'mq_metrics.txt',
             '--algo', 'QualDistribution', 'qd_metrics.txt',
             '--algo', 'GCBias', '--summary', 'gc_summary.txt', 'gc_metrics.txt',
             '--algo', 'AlignmentStat', '--adapter_seq', '', 'aln_metrics.txt',
             '--algo', 'InsertSizeMetricAlgo', 'is_metrics.txt'])
        run(['sentieon', 'plot', 'metrics', '-o', rg + '-metrics-report.pdf',
             'gc=gc_metrics.txt', 'qd=qd_metrics.txt',
             'mq=mq_metrics.txt', 'isize=is_metrics.txt'])

        # 3. Remove duplicate reads
        run(['sentieon', 'driver', '-t', THREADS, '-i', 'sorted.bam',
             '--algo', 'LocusCollector', '--fun', 'score_info', 'score.txt'])
        run(['sentieon', 'driver', '-t', THREADS, '-i', 'sorted.bam',
             '--algo', 'Dedup', '--rmdup', '--score_info', 'score.txt',
             '--metrics', 'dedup_metrics.txt', 'deduped.bam'])

        # 4. Split reads at junctions; reassign STAR's MAPQ 255 (unique) to 60
        run(['sentieon', 'driver', '-r', REF, '-t', THREADS, '-i', 'deduped.bam',
             '--algo', 'RNASplitReadsAtJunction', '--reassign_mapq', '255:60',
             'splitted.bam'])

        # 5. Indel realignment; output <rg>.realigned.bam, the name used in the calling step
        run(['sentieon', 'driver', '-r', REF, '-t', THREADS, '-i', 'splitted.bam',
             '--algo', 'Realigner', rg + '.realigned.bam'])
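The script expects `input.txt` to contain one sample per line: an SRA accession and a read-group/sample name separated by whitespace. Every sample writes to the same intermediate files (`sorted.bam`, `deduped.bam`, `score.txt`, ...) and only the final BAM and the metrics PDF carry the sample name, so a duplicated name would silently overwrite an earlier sample's result. A small hypothetical pre-check, not part of the original workflow; the accessions in the demo are placeholders.

```python
from typing import List, Tuple


def parse_samples(text: str) -> List[Tuple[str, str]]:
    """Parse 'SRA_accession sample_name' lines, skipping blank lines.

    Raises ValueError on a malformed line or a repeated sample name.
    """
    samples = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f'line {lineno}: expected 2 columns, got {len(fields)}')
        sra, rg = fields
        if rg in seen:
            raise ValueError(f'line {lineno}: duplicate sample name {rg!r}')
        seen.add(rg)
        samples.append((sra, rg))
    return samples


if __name__ == '__main__':
    # Placeholder accessions; sample names follow the BAM names used in the post
    demo = 'SRR0000001 Aimengniu\nSRR0000002 Aodesa3\n\n'
    for sra, rg in parse_samples(demo):
        print(sra, rg)
```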

Once the bam files are ready, the following command produces the vcf file:

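# Example command; for several samples, add one -i per bam file.
# Note: the calling algorithm (--algo ...) and the output VCF are not shown in the original and still need to be appended.
sentieon driver -r /data2/Fshare/FastaAndIndex/IWGSC_v1.0_STAR/IWGSC_v1.0_part.fasta \
    --read_filter MapQualFilter,min_map_qual=60 -t 10 \
    -i Aimengniu.realigned.bam -i Aodesa3.realigned.bam \
    -i Baibiansui.realigned.bam -i Baidatou.realigned.bam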
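With 90 accessions, typing one `-i` per BAM by hand is error-prone. A sketch that assembles the command from a list of sample names, assuming the `<sample>.realigned.bam` naming used above. Because the original command does not show which calling algorithm or output file was used, the sketch takes them as a parameter (`algo_args`); the Haplotyper arguments and the output name `joint.vcf.gz` in the demo are assumptions for illustration only.

```python
import shlex
from typing import List, Sequence

REF = '/data2/Fshare/FastaAndIndex/IWGSC_v1.0_STAR/IWGSC_v1.0_part.fasta'


def build_calling_cmd(ref: str, samples: Sequence[str], algo_args: Sequence[str],
                      threads: int = 10, min_mapq: int = 60) -> List[str]:
    """Assemble a sentieon driver command with one -i per <sample>.realigned.bam."""
    cmd = ['sentieon', 'driver', '-r', ref,
           '--read_filter', f'MapQualFilter,min_map_qual={min_mapq}',
           '-t', str(threads)]
    for name in samples:
        cmd += ['-i', f'{name}.realigned.bam']
    return cmd + list(algo_args)


if __name__ == '__main__':
    samples = ['Aimengniu', 'Aodesa3', 'Baibiansui', 'Baidatou']
    # Assumption: the calling step is not given in the original command
    algo = ['--algo', 'Haplotyper', 'joint.vcf.gz']
    print(' '.join(shlex.quote(c) for c in build_calling_cmd(REF, samples, algo)))
```

Keeping the sample list in one place (for example, the second column of `input.txt`) means the calling command always matches the BAMs the pipeline actually produced.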

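The next step is SNP filtering, which will have to wait for the next installment. Later posts will include more of the kinds of results commonly reported in published papers. Because this dataset is RNA_seq from 90 accessions, they will also cover population genetics and evolution. Upcoming posts will include the following: GWAS, eGWAS, and attempts to pin down candidate genes for QTL by combining other known information.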
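SNP filtering itself is deferred to the next installment. As a rough illustration only, here is a sketch of two statistics commonly used at that step, minor allele frequency (MAF) and missing rate, computed from the GT fields of a single biallelic VCF record. The thresholds (MAF ≥ 0.05, missing rate ≤ 20%) and the demo genotypes are hypothetical, not the author's settings.

```python
from typing import List, Tuple


def maf_and_missing(genotypes: List[str]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing rate) for biallelic GT strings like '0/1'."""
    alt = called = missing = 0
    for gt in genotypes:
        alleles = gt.replace('|', '/').split('/')
        if '.' in alleles:
            missing += 1  # any uncalled allele counts the sample as missing
            continue
        called += len(alleles)
        alt += sum(1 for a in alleles if a == '1')
    miss_rate = missing / len(genotypes)
    if called == 0:
        return 0.0, miss_rate
    af = alt / called
    return min(af, 1 - af), miss_rate


def keep_snp(genotypes: List[str], min_maf: float = 0.05,
             max_missing: float = 0.2) -> bool:
    """Hypothetical filter: keep SNPs that are common enough and mostly called."""
    maf, miss = maf_and_missing(genotypes)
    return maf >= min_maf and miss <= max_missing


if __name__ == '__main__':
    record = ['0/0', '0/1', '1/1', './.', '0/0']  # hypothetical genotypes
    print(maf_and_missing(record), keep_snp(record))
```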

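Finally, I wish everyone a happy holiday and all the best! And happy Valentine's Day; to those who are single, may you find your other half soon.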

Follow "小麦研究联盟" (Wheat Research Alliance) for the latest progress in wheat research.

For submissions, reposting, collaboration, or information releases, contact: wheatgenome

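To repost this article, please contact the original author for permission and credit it as coming from Ma Shengwei's ScienceNet blog.
Link: https://m.sciencenet.cn/blog-1094241-1100172.html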


mashengwei's personal blog: http://blog.sciencenet.cn/u/mashengwei
