12/31 Author of this issue: Mai Meng (麦萌)

It's Sunday again. Do you feel how fast time flies? Today happens to be the last day of 2017, and the turn of the year is a busy time. Tonight Pang Ya (胖丫) is still holed up in the lab. When I asked her why she hadn't gone out for New Year's Eve, she said calmly, "The bustle is theirs; I have nothing."

Below is the list of wheat-related papers from December 2017.

Weekly literature roundups:
Weekly literature recommendations (2017.12.9)
Wheat weekly literature recommendations (12.17)
Wheat weekly literature update (12.23)

1 Interspecific and intergeneric hybridization as a source of variation for wheat grain quality improvement
Wheat quality and its end-uses are mainly based on variation in three traits: grain hardness, gluten quality and starch. In recent times, the importance of nutritional quality and health-related aspects has widened the range of these traits to include other grain components such as vitamins, fibre and micronutrients. One option to enlarge the genetic variability in wheat for all these components has been the use of wild relatives, together with underutilised or neglected wheat varieties or species. In the current review, we summarise the role of each grain component in relation to grain quality, their variation in modern wheat and the alternative sources in which wheat breeders have found novel variation.

2 Functional and DNA–protein binding studies of WRKY transcription factors and their expression analysis in response to biotic and abiotic stress in wheat (Triticum aestivum L.)
WRKY, a plant-specific transcription factor family, plays vital roles in pathogen defense, abiotic stress, and phytohormone signalling. Little is known about the roles and functions of WRKY transcription factors in response to rust diseases in wheat. In the present study, three TaWRKY genes encoding complete protein sequences were cloned. They belonged to class II and III WRKY based on the number of WRKY domains and the pattern of zinc finger structures. Twenty-two DNA–protein binding docking complexes predicted stable interactions of the WRKY domain with the W-box.
Quantitative real-time PCR using wheat near-isogenic lines with or without the Lr28 gene revealed differential up- or down-regulation in response to biotic and abiotic stress treatments, which could be responsible for their functional divergence in wheat. TaWRKY62 was induced upon treatment with JA, MJ, and SA and repressed after ABA treatment. After pathogen inoculation, six of the seven genes reached maximum induction at 48 h post inoculation. Hence, TaWRKY (49, 50, 52, 55, 57, and 62) can be considered potential candidate genes for further functional validation and for crop improvement programs for stress resistance. The results of the present study will enhance understanding of the molecular basis of the mode of action of WRKY transcription factor genes in wheat, and of their role during leaf rust pathogenesis in particular.

3 Durum wheat diversity for heat stress tolerance during inflorescence emergence is correlated to TdHSP101C expression in early developmental stages
The predicted increase in world population, along with climate change, threatens sustainable agricultural supply in the coming decades. It is therefore vital to understand crop diversity associated with abiotic stress response. Heat stress is considered one of the major constraints on crop productivity, so it is essential to develop new approaches for early and rigorous evaluation of varietal diversity regarding heat tolerance. Plant cell membrane thermostability (CMS) is a widely used method for wheat thermotolerance assessment, although its limitations require complementary solutions. In this work we used the CMS assay and explored TdHSP101C genes as an additional tool for durum wheat screening. Genomic and transcriptomic analyses of TdHSP101C genes were performed in varieties with contrasting CMS results and further correlated with heat stress tolerance during fertilization and seed development.
Although the durum wheat varieties studied showed very high homology in TdHSP101C genes (99%), the transcriptomic assessment allowed discrimination between varieties with good CMS results and correlated with the differential impacts of heat treatment during inflorescence emergence and seed development on grain yield. The evidence reported here indicates that TdHSP101C transcription levels induced by heat stress in fully expanded leaves may be a promising complementary screening tool to discriminate between durum wheat varieties identified as thermotolerant through CMS.

4 Virulence of some Puccinia triticina races to the effective wheat leaf rust resistant genes Lr 9 and Lr 19 under Egyptian field conditions
Leaf rust (Puccinia triticina Eriks.) is the most widespread disease of wheat (Triticum aestivum L.) in Egypt and worldwide. The two leaf rust resistance genes Lr 9 and Lr 19 were previously highly effective against the predominant Puccinia triticina races in Egypt. In the 2015/2016 growing season, a susceptible field reaction was recorded on these two genes, with rust severity reaching 40% (S) for Lr 9 and 5% (S) for Lr 19 under Egyptian field conditions at four locations: El-Behira, El-Minufiya, El-Qalubiya and El-Fayom governorates. In this study, 39 leaf rust monogenic lines and 16 commercial wheat cultivars were tested at the seedling stage, while 12 leaf rust monogenic lines and the same 16 wheat cultivars were evaluated at the adult plant stage. Eight leaf rust field samples were collected from these governorates (four from each of Lr 9 and Lr 19). Forty single isolates were derived from the collected samples of Lr 9 and Lr 19 (20 isolates each). Eight pathotypes were identified from Lr 9, while only two were identified from Lr 19. The most frequent pathotype virulent to Lr 9 was KTSPT (30% frequency), followed by TTTMS (25% frequency). The other pathotypes ranged from only 5% to 10% frequency.
In contrast, the most frequent pathotype virulent to Lr 19 was CTTTT (85% frequency), and the least frequent was PKTST (15% frequency). Pathotypes PRSTT, NTKTS and TTTMS (identified from Lr 9) were more aggressive than the others on most of the tested leaf rust monogenic lines, being virulent to 36, 35 and 35 of the 39 monogenic lines, respectively. Likewise, the two pathotypes PKTST and CTTTT (identified from Lr 19) were virulent to 36 and 35 monogenic lines, respectively. Moreover, leaf rust pathotypes NPTNK and PRSTT (from Lr 9) and PKTST (from Lr 19) were the most aggressive on the tested wheat cultivars at the seedling stage. Lr 2a was the most effective leaf rust resistance gene against the tested pathotypes at the adult plant stage. In addition, the three wheat cultivars Misr 1, Misr 2 and Nubariya 1 proved highly resistant to all the tested leaf rust pathotypes at the adult plant stage.

5 Resistance of Aegilops longissima to the rusts of wheat (Plant Disease)
Stem rust (caused by Puccinia graminis f. sp. tritici), leaf rust (P. triticina), and stripe rust (P. striiformis f. sp. tritici) rank among the most important diseases of wheat worldwide. The development of resistant cultivars is the preferred method of controlling rust diseases because it is both environmentally benign and cost-effective. However, new virulence types often arise in pathogen populations, rendering such cultivars vulnerable to losses. The identification of new sources of resistance is key to providing long-lasting control of the rapidly evolving rust pathogens. Thus, the objective of this research was to evaluate the wheat wild relative Aegilops longissima for resistance to stem rust, leaf rust, and stripe rust at the seedling stage in the greenhouse.
A diverse collection of 394 accessions of the species, mostly from Israel, was assembled for the study, but the number included in any one rust evaluation ranged from 308 to 379. With respect to stem rust resistance, 18.2% and 80.8% of accessions were resistant to the widely virulent U.S. and Kenyan P. graminis f. sp. tritici races TTTTF and TTKSK, respectively. The percentages of accessions resistant to the U.S. P. triticina races THBJ and BBBD were 65.9% and 52.2%, respectively. Over half (50.1%) of the Ae. longissima accessions were resistant to the U.S. P. striiformis f. sp. tritici race PSTv-37. Ten accessions (AEG-683-23, AEG-725-15, AEG-803-49, AEG-1274-20, AEG-1276-22, AEG-1471-15, AEG-1475-19, AEG-2974-0, AEG-4005-20, and AEG-8705-10) were resistant to all races of the three rust pathogens used in this study. Distinct differences in the geographic distribution of resistance and susceptibility were found among Ae. longissima accessions from Israel in response to some rust races. For P. graminis f. sp. tritici race TTKSK, populations with a very high frequency of resistance were concentrated in the central and northern parts of Israel, whereas populations with a comparatively higher frequency of susceptibility were concentrated in the southern part of the country. The reverse trend was observed for P. striiformis f. sp. tritici race PSTv-37. The results of this study demonstrate that Ae. longissima is a rich source of rust resistance genes for wheat improvement.

6 Genes WHEAT FRIZZY PANICLE and SHAM RAMIFICATION 2 independently regulate differentiation of floral meristems in wheat
Here we characterized diploid and tetraploid wheat lines of various non-standard spike morphotypes, which allowed us to identify a new mutant allele of the WHEAT FRIZZY PANICLE (WFZP) gene that determines spike branching in diploid wheat Triticum monococcum L. Moreover, we found that the development of SSs and spike branching in wheat T. durum Desf. was the result of a wfzp-A/TtBH-A1 mutation that originated from spontaneous hybridization with T. turgidum convar. compositum (L.f.) Filat. Detailed characterization of the false-true ramification phenotype controlled by the recessive sham ramification 2 (shr2) gene in tetraploid wheat T. turgidum L. allowed us to suggest putative functions of the SHR2 gene, which may be involved in the regulation of spikelet meristem fate and in the specification of floret meristems. The results of a gene interaction test suggested that the genes WFZP and SHR2 function independently in different processes during spikelet development, whereas another spike ramification gene(s) interact(s) with SHR2 and share(s) common functions.

7 Allelic composition and associated quality traits of the Glu-1 and Glu-3 loci in selected modern Ethiopian durum wheat varieties
Gluten protein determines the processing quality of both durum wheat and bread wheat. The glutenin subunit compositions and associated quality traits of 20 Ethiopian durum wheat varieties were systematically analyzed using SDS-PAGE and Payne numbers. A total of 16 glutenin patterns were identified. At the Glu-A1 locus, all varieties carried the null allele. The predominant glutenin alleles at the Glu-B1 locus were Glu-B1b (7+8) and Glu-B1e (20). At Glu-3, the most abundant glutenin subunits were Glu-A3a and Glu-B3c. Based on the Payne scores, the varieties Yerer, Ginchi, Candate, and Foka were identified as having an allelic composition suitable for pasta making. Cluster analysis using the agglomerative hierarchical clustering (AHC) method classified the varieties into four similarity classes. Based on these findings, suggestions were made for improving allelic composition through introgression of superior alleles from known Glu-1 and Glu-3 sources.

8 The NB-LRR gene Pm60 confers powdery mildew resistance in wheat
Readers interested in this paper are welcome to write an interpretation or introductory guide for us.
Powdery mildew is one of the most devastating diseases of wheat.
To date, few powdery mildew resistance genes have been cloned from wheat owing to the size and complexity of the wheat genome. Triticum urartu is the progenitor of the A genome of wheat and an important source of powdery mildew resistance genes. Using molecular markers designed from scaffolds of the sequenced T. urartu accession and standard map-based cloning, a powdery mildew resistance locus was mapped to a 356-kb region containing two genes that encode nucleotide-binding and leucine-rich repeat domain (NB-LRR) proteins. Virus-induced gene silencing, single-cell transient expression, and stable transformation assays demonstrated that one of these two genes, designated Pm60, confers resistance to powdery mildew. Overexpression of full-length Pm60 and two allelic variants in Nicotiana benthamiana leaves induced a hypersensitive cell death response, but expression of the coiled-coil domain alone was insufficient to induce a hypersensitive response. Yeast two-hybrid, bimolecular fluorescence complementation and luciferase complementation imaging assays showed that the Pm60 protein interacts with its neighboring NB-containing protein, suggesting that they might be functionally related. The identification and cloning of this novel wheat powdery mildew resistance gene will facilitate breeding for disease resistance in wheat.

9 Identification of QTL for flag leaf length in common wheat and their pleiotropic effects
Leaf size is an important factor contributing to the photosynthetic capability of wheat plants. It also significantly affects various agronomic traits. In particular, the flag leaves contribute significantly to grain yield in wheat.
A recombinant inbred line (RIL) population developed from varieties with significant differences in flag leaf traits was used to map quantitative trait loci (QTL) for flag leaf length (FLL) and to evaluate their pleiotropic effects on five yield-related traits: spike length (SL), spikelet number per spike (SPN), kernel number per spike (KN), kernel length (KL), and thousand-kernel weight (TKW). Two additional RIL populations were used to validate the detected QTL and to reveal the relationships in different genetic backgrounds. Using the diversity arrays technology (DArT) genetic linkage map, three major QTL for FLL were detected, with individual QTL explaining 8.6–23.3% of the phenotypic variation across environments. All the QTL were detected in at least four environments and were validated in two related populations with the designed primers. These QTL and the newly developed primers are expected to be valuable for fine mapping and marker-assisted selection in wheat breeding programs.

10 The repetitive landscape of the 5100 Mbp barley genome
Here, we present an analysis of the repetitive fraction of the 5100 Mb barley genome, the largest angiosperm genome to have a near-complete sequence assembly. Genes make up only about 2% of the genome, while over 80% is derived from transposable elements (TEs). The TE fraction is composed of at least 350 different families. However, 50% of the genome is made up of only 15 high-copy TE families, while all other TE families are present in moderate or low copy numbers. We found that the barley genome is highly compartmentalized, with different types of TEs occupying different chromosomal "niches", such as distal, interstitial, or proximal regions of chromosome arms. Furthermore, gene space represents its own distinct genomic compartment that is enriched in small non-autonomous DNA transposons, suggesting that these TEs specifically target promoters and downstream regions.
Moreover, their presence in gene promoters is associated with decreased methylation levels.

11 TaNTF2, a contributor for wheat resistance to the stripe rust pathogen
Nuclear Transport Factor 2 (NTF2) functions as a critical regulator in balancing the GTP- and GDP-bound forms of Ran, an evolutionarily conserved small GTP-binding protein. During the incompatible interaction between wheat and Puccinia striiformis f. sp. tritici (Pst), a cDNA fragment encoding a putative wheat NTF2 gene was found to be significantly induced, suggesting a potential role in wheat resistance to Pst. In this work, the full-length TaNTF2 was obtained, with three copies located on chromosomes 7A, 7B and 7D, respectively. qRT-PCR further verified the up-regulated expression of TaNTF2 in response to avirulent Pst. In addition, TaNTF2 was also induced by exogenous hormone applications, especially JA treatment. Transient expression of TaNTF2 in tobacco cells confirmed its subcellular localization in the cytoplasm, perinuclear area and nucleus. Virus-induced gene silencing (VIGS) was used to identify the function of TaNTF2 during an incompatible wheat-Pst interaction. When TaNTF2 was knocked down, wheat resistance to avirulent Pst decreased, with larger necrotic spots and higher numbers of hyphal branches and haustorial mother cells. Our results demonstrated that TaNTF2 contributes to wheat resistance to the stripe rust pathogen, which will help in comprehensively understanding the NTF2/Ran modulating mechanism in the wheat-Pst interaction.

12 Loss of AvrSr50 by somatic exchange in stem rust leads to virulence for Sr50 resistance in wheat

13 Variation in the AvrSr35 gene determines Sr35 resistance against wheat stem rust race Ug99

14 ZmCCT9 enhances maize adaptation to higher latitudes
The last paper is about maize. It was published in PNAS, and I recommend it to anyone doing map-based cloning.

Follow "Wheat Research Alliance" (小麦研究联盟) to keep up with new advances in wheat.
For submissions, reprints, collaboration and announcements, please contact: wheatgenome
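Papers 2 (TaWRKY) and 11 (TaNTF2) above report up- and down-regulation measured by qRT-PCR. Their abstracts do not say how relative expression was calculated. A widely used option is the 2^-ΔΔCt method of Livak and Schmittgen, which assumes close to 100% amplification efficiency for both the target and a stably expressed reference gene. The sketch below shows only that arithmetic. The function name and all Ct values are hypothetical and are not data from either paper.

```python
from statistics import mean


def fold_change_ddct(target_treated, ref_treated, target_control, ref_control):
    """Relative expression (treated vs. control) by the 2^-ΔΔCt method.

    Each argument is a list of Ct values; replicates are averaged.
    Assumes ~100% PCR efficiency for both target and reference genes.
    """
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_treated = mean(target_treated) - mean(ref_treated)
    delta_ct_control = mean(target_control) - mean(ref_control)
    # Compare the normalized values between conditions.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    # Fewer cycles to threshold means more template, hence the negative exponent.
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: after treatment the target crosses the threshold
# 3 cycles earlier, while the reference gene is unchanged.
up = fold_change_ddct([22.0, 22.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
down = fold_change_ddct([26.0], [18.0], [24.0], [18.0])
print(f"up-regulated: {up:.2f}-fold, down-regulated: {down:.2f}-fold")
```

A value above 1 indicates up-regulation and a value below 1 indicates down-regulation. If primer efficiencies differ markedly, an efficiency-corrected model such as Pfaffl's is a better fit.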
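Big data has three aspects: volume, dimensionality and completeness. Volume is already evident in current sequencing data, where a single genome runs to several gigabytes. Dimensionality is reflected in the number of genetic variants, and this is also in place, because whole-genome resequencing of germplasm populations now provides it. The agronomic phenotype dimension, however, is still insufficient, and the number of dimensions can only keep growing by expanding into metabolic and molecular phenotypes. Completeness refers to how fully the different combinations of variants are covered, and this is currently the weakest link. An ordinary species has tens of thousands of genes, while the populations we study contain only a few hundred individuals; for completeness, sample sizes would need to reach tens of thousands before the requirement is roughly met. It can therefore be predicted that transcriptome profiling, metabolome profiling and the creation of new materials by gene editing will be the pillars of big-data research in the life sciences.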
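The completeness argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes inbred (homozygous) lines and unlinked loci, so the chance that a line carries the minor allele at every locus in a combination is simply the product of the allele frequencies. The 5% allele frequency and the population sizes are illustrative choices, not figures from the text.

```python
import math


def expected_carriers(n_lines, allele_freqs):
    """Expected number of lines carrying the minor allele at every listed locus.

    Assumes homozygous lines and independent (unlinked) loci.
    """
    return n_lines * math.prod(allele_freqs)


def prob_observed(n_lines, allele_freqs):
    """Probability that at least one line carries the full allele combination."""
    p = math.prod(allele_freqs)
    return 1 - (1 - p) ** n_lines


for n_lines in (300, 10_000):
    for n_loci in (2, 3):
        freqs = [0.05] * n_loci
        print(
            f"{n_lines:>6} lines, {n_loci} loci: "
            f"expected carriers = {expected_carriers(n_lines, freqs):.2f}, "
            f"P(seen at least once) = {prob_observed(n_lines, freqs):.3f}"
        )
```

With a few hundred lines, a two-locus combination of 5% alleles is expected in fewer than one line (300 × 0.05² = 0.75). With 10,000 lines the expectation is about 25. This matches the author's point that completeness needs sample sizes in the tens of thousands.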
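For transcriptome studies, suppose we have already designed the experiment, collected samples and sequenced them according to plan, and received the standard analysis report. Now it is time to organize the results and write the paper, but faced with the report we don't know where to start. Today I'll show you how to put together the main framework of a transcriptome paper aimed at a journal with an impact factor of about 5.

Results 1. The first part of the paper introduces the experimental design (aims, materials, sequencing method, etc.) and gives an overall assessment of the transcriptome sequencing data, including the amount of raw data, data filtering and assembly. For a de novo (reference-free) transcriptome, these results can be summarized in a table.

Results 2. Screening differentially expressed genes (DEGs). Following the experimental design, choose the sample comparisons used to call DEGs, compare the numbers of shared and differentially expressed genes between groups, and further split the DEGs into up- and down-regulated sets. The numbers of DEGs between samples give a first indication of the biological relationships among them. This part is usually shown as a Venn diagram of DEGs across comparisons.

Results 3. Functional enrichment and pathway analysis of DEGs. To dissect the mechanism of the biological process, perform functional enrichment and pathway analysis on the DEGs between samples. Based on these results, discuss in detail the mechanisms behind the traits shared by the samples and those specific to each. The results are then presented as figures.

Results 4. Co-expression trend analysis of DEGs. This analysis applies to time series with more than two time points. It examines how gene expression patterns change over time; for each gene set sharing a pattern, you can produce expression pattern plots, GO classification plots and KEGG pathway analyses to reveal the mechanism of the biological process. The results are then presented as figures.

Results 5. Module analysis of the DEG co-expression network. WGCNA is used for gene co-expression network analysis. It suits complex data patterns, and five or more groups of data are recommended, for example five stages of organ development, or five time points before and after stress or pathogen infection. Use WGCNA to extract gene modules, analyze the association between modules and phenotypes, identify the biological functions of each module, analyze the interactions between modules, and find the hub genes of each module. The results are then presented as figures.

Results 6. At this point you have usually found the core key genes of the project, and the transcriptome analysis is fairly complete. However, to make the conclusions more convincing and the paper more substantial, consider combining the transcriptome with other methods for an integrated analysis. There are three main approaches: (1) joint analysis with lncRNA, circRNA, small RNA or methylation data; (2) using public databases: download related data, such as raw data from different materials of the same species under the same treatment, and run the transcriptome analysis above to summarize the core conserved genes of that species under that treatment; (3) functional validation of the core key genes with proteomics, metabolomics and other experimental methods.

In summary, the main framework of a transcriptome paper is now in place, with a clear line of reasoning and clear results. All of the transcriptome analyses and plots above can be done on the BMKCloud (百迈客云) platform; interested readers can log in and try it: https://www.biocloud.net/

References:
Fu Y., Poli M. (2016) Dissection of early transcriptional responses to water stress in Arundo donax L. by unigene-based RNA-seq. Biotechnology for Biofuels. DOI 10.1186
Sun Q., Du X. (2016) To be a flower or fruiting branch: insights revealed by mRNA and Small RNA transcriptomes from different cotton developmental stages. Scientific Reports. DOI 10.1038
Vlasova A., Capella-Gutiérrez S. (2016) Genome and transcriptome analysis of the Mesoamerican common bean and the role of gene duplications in establishing tissue and temporal specialization of genes. Genome Biology. DOI 10.1186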
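Results 2 and 3 of the framework above can be sketched in a few lines. This is a minimal illustration under stated assumptions. DEGs are called with |log2 fold change| ≥ 1 and adjusted P < 0.05, which are common conventions and not thresholds given in the post. Venn regions are counted with set operations, and term enrichment uses a one-sided hypergeometric test. In practice the DE statistics come from dedicated tools such as DESeq2 or edgeR, and enrichment P-values across many GO/KEGG terms are corrected for multiple testing. The gene IDs and numbers are made up.

```python
from math import comb


def call_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into up- and down-regulated sets.

    `results` maps gene ID -> (log2 fold change, adjusted P-value).
    """
    up = {g for g, (lfc, padj) in results.items()
          if padj < padj_cutoff and lfc >= lfc_cutoff}
    down = {g for g, (lfc, padj) in results.items()
            if padj < padj_cutoff and lfc <= -lfc_cutoff}
    return up, down


def venn_counts(a, b):
    """Region sizes of a two-set Venn diagram: (only in a, shared, only in b)."""
    return len(a - b), len(a & b), len(b - a)


def enrichment_p(degs, term_genes, universe):
    """One-sided hypergeometric P(X >= k) that a term is over-represented among DEGs."""
    degs, term_genes = degs & universe, term_genes & universe
    total = len(universe)       # annotated background genes
    in_term = len(term_genes)   # background genes annotated to the term
    drawn = len(degs)           # DEGs tested
    hits = len(degs & term_genes)
    tail = sum(comb(in_term, i) * comb(total - in_term, drawn - i)
               for i in range(hits, min(in_term, drawn) + 1))
    return tail / comb(total, drawn)


# Hypothetical results for two comparisons (e.g. stress vs. control at two time points).
cmp_a = {"g1": (2.1, 0.001), "g2": (-1.5, 0.01), "g3": (0.3, 0.5), "g4": (1.2, 0.02)}
cmp_b = {"g1": (1.8, 0.003), "g2": (0.2, 0.7), "g3": (-2.0, 0.001), "g4": (1.1, 0.04)}

up_a, down_a = call_degs(cmp_a)
up_b, down_b = call_degs(cmp_b)
print("Venn (A only, shared, B only):", venn_counts(up_a | down_a, up_b | down_b))

universe = {f"g{i}" for i in range(20)}
term = {"g0", "g1", "g2", "g3", "g4"}
print("enrichment P:", enrichment_p({"g0", "g1", "g2", "g3"}, term, universe))
```

Here 4 of 4 DEGs fall in a 5-gene term out of a 20-gene background, giving P = 5/4845 ≈ 0.001.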
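Single-cell and single-molecule sequencing have brought rapid progress to cancer detection. Nature Methods recently published an impressive paper whose method detects changes at both the genome and the transcriptome level at the same time:

"In those cells where chromosomal gains or losses (either reciprocal or nonreciprocal) were seen at the genomic level, we observed concomitant increases and decreases in chromosome-wide relative gene expression levels after G&T-seq analysis, which established for the first time (to our knowledge) that the effects of gene expression dosage can be rapidly established after the acquisition of aneuploidies during a single cell division."

Below is a short critique of this paper for the teachers and students in our bioinformatics group.

Look at its supplementary figures: the results are so good that I felt obliged to post all of them as well.

The paper's biggest problem is that its bioinformatics analysis is not very professional. Many steps just run common software with default parameters, which is not optimal. The R plots in particular are a mess: several colors are indistinguishable, and with my severe nearsightedness (over 600 degrees) I had to zoom in many times.

The analysis is also somewhat rough. The authors should have checked which genes change expression by a factor consistent with the chromosomal change, that is, changes caused by the chromosome gain or loss itself, and which genes change far more or far less than the chromosomal change would explain. The latter must be attributed to other causes, such as fusion genes or regulation. Surprisingly, the most valuable information this technique provides was not fully exploited.

The paper is titled "G&T-seq: parallel sequencing of single-cell genomes and transcriptomes".

These thoughts raise a question. If cancers generally carry chromosome-level DNA changes, these must alter gene expression, and since the DNA changes are larger, they are probably the dominant factor. What, then, is the point of the huge amount of differential expression analysis we do now? Conversely, what causes these chromosome-level DNA changes in the first place?

But even if all that were clarified, then what? The future will basically proceed on two legs. One is to keep searching for the mechanisms of carcinogenesis, which will probably be a long road. The other is to skip all that, use testing to find features common to a given cancer, and then target certain genes to kill it. The latter idea is clearer, but it is not that simple.

This reminds me that the many groups in China following this line of work are mostly just re-verifying the technical reliability of foreign instruments. What is really needed is more clinical information, especially research on drug–cancer interactions and other aspects that directly help detection and treatment, and that is precisely what nobody is doing. At present, most physicians have abandoned their greatest advantage, their patient resources, to compete with medical school and life science faculty at sacrificing mice.

The precision medicine everyone was clamoring about in the second half of the year had also better not be rolled out blindly. It mostly means large amounts of exome sequencing, which lets hospitals both earn money and publish papers. But for patients, does it improve treatment, or match the most appropriate drug to the mutations detected?

In fact, several years ago we wanted to build a library of more than 3,000 drugs and match it against mutated drug targets. After several unsuccessful grant applications, we could no longer be bothered to write such proposals; they were not worth the paper they were printed on. When we have time, we may collaborate with a company to build something useful that can help cancer patients.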
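The analysis the critique asks for separates genes whose expression change matches the chromosomal copy-number change from genes that deviate from it. It can be sketched as follows. This is a hypothetical illustration, not the G&T-seq authors' pipeline. It assumes a diploid baseline, so a pure dosage effect predicts log2 fold change = log2(copies / 2), and it uses an arbitrary tolerance of 0.5 log2 units. Real single-cell expression is noisy and would need a statistical model rather than a fixed cutoff. The gene names and values are made up.

```python
import math


def dosage_residuals(genes, ploidy=2):
    """Observed minus dosage-expected log2 fold change for each gene.

    `genes` maps gene -> (copy number in this cell, observed log2 fold change
    versus a euploid reference). Genes with zero copies are skipped because
    log2(0) is undefined.
    """
    residuals = {}
    for gene, (copies, observed) in genes.items():
        if copies <= 0:
            continue
        expected = math.log2(copies / ploidy)
        residuals[gene] = observed - expected
    return residuals


def classify(residuals, tolerance=0.5):
    """Label each gene as dosage-consistent or deviating from dosage."""
    labels = {}
    for gene, r in residuals.items():
        if abs(r) <= tolerance:
            labels[gene] = "dosage-consistent"
        elif r > 0:
            labels[gene] = "above dosage"
        else:
            labels[gene] = "below dosage"
    return labels


# Hypothetical genes: two on a gained chromosome, one on a lost one, one unchanged.
cell = {
    "geneA": (3, 0.6),   # trisomic, close to log2(3/2) ~ 0.58
    "geneB": (3, 2.5),   # trisomic, far above dosage: candidate for other mechanisms
    "geneC": (1, -1.1),  # monosomic, close to log2(1/2) = -1
    "geneD": (2, -2.0),  # disomic, strongly down for reasons other than dosage
}
for gene, label in classify(dosage_residuals(cell)).items():
    print(gene, label)
```

Genes labelled "above dosage" or "below dosage" are the ones the critique suggests following up as possible fusion or regulatory events.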
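The transcriptome links the genome to gene function. To understand how the genome relates to cellular function, scientists usually study the products of the genome (RNA or protein). The proteome is the complete set of proteins in a cell or tissue, including information on both quantity and sequence. However, because protein capture technology is still immature, large-scale functional genomics based on proteins is not yet feasible. By contrast, the technology for measuring RNA, the intermediate between genes and proteins, is quite mature and can be applied at large scale.

The transcriptome is a marker of gene activity. In a multicellular organism, every cell contains the same genome and genes, but not every gene is expressed in every cell, nor with the same pattern. By studying the transcriptomes of different cells and tissues, we can:
1) understand the features that define a given cell type;
2) gain deep insight into the gene expression changes caused by disease;
3) interpret gene expression features at different developmental stages;
4) study the expression dynamics of ohnologs and learn how duplicated genes diverge;
5) study which genes' expression is affected by a given mutation;
6) study which mutations (in promoters or coding regions) have a stronger effect on the expression of the gene itself or of downstream genes.

Only a small fraction of the genome is transcribed into RNA; in humans, 5% of the genome is transcribed (Frith et al., 2005). Moreover, most transcribed RNA is non-coding (tRNA, rRNA, microRNA, etc.). mRNA also shows many kinds of variation, including alternative splicing, RNA editing, and variation in transcription start and termination sites.

Original article: http://www.nature.com/scitable/topicpage/transcriptome-connecting-the-genome-to-gene-function-605

Frith, M. C., et al. Genomics: The amazing complexity of the human transcriptome. European Journal of Human Genetics 13, 894–897 (2005). doi:10.1038/sj.ejhg.5201459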
Two newly published rhesus macaque transcriptome papers: one finds that some protein-coding genes originated from non-coding RNAs, and the other builds an expert database for rhesus macaque research. Links to both articles are below; discussion and criticism are welcome.

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002942
Hominoid-Specific De Novo Protein-Coding Genes Originating from Long Non-Coding RNAs
Chen Xie, Yong E. Zhang, Jia-Yu Chen, ..., Liping Wei and Chuan-Yun Li
Abstract
Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes.
Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.

http://nar.oxfordjournals.org/content/early/2012/09/08/nar.gks835.long
RhesusBase: a knowledgebase for the monkey research community
Shi-Jian Zhang, Chujun Liu, ..., Xiuqin Zhang and Chuan-Yun Li
Abstract
Although the rhesus macaque is a unique model for the translational study of human diseases, its use in biomedical research is currently still in its infancy due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community (http://www.rhesusbase.org). We performed strand-specific RNA-Seq studies in 10 macaque tissues and generated 1.2 billion 90-bp paired-end reads, covering 97.4% of the putative exons in macaque transcripts annotated by Ensembl. We found that at least 28.7% of the macaque transcripts were previously mis-annotated, mainly due to incorrect exon–intron boundaries, incomplete untranslated regions (UTRs) and missed exons. Compared with the previous gene models, the revised transcripts show clearer sequence motifs near splicing junctions and the end of UTRs, as well as cleaner patterns of exon–intron distribution for expression tags and cross-species conservation scores. Strikingly, 1292 exon–intron boundary revisions between coding exons corrected the previously mis-annotated open reading frames. The revised gene models were experimentally verified in randomly selected cases. We further integrated functional genomics annotations from 60 categories of public and in-house resources and developed an online accessible database. User-friendly interfaces were developed to update, retrieve, visualize and download the RhesusBase meta-data, providing a 'one-stop' resource for the monkey research community.
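The Xie et al. abstract concludes, "according to the rule of parsimony", that most of these genes were transcribed as non-coding RNAs before gaining coding potential. The minimum-change reasoning behind such a statement can be sketched with Fitch's small-parsimony algorithm on the three-species tree ((human, chimpanzee), macaque). This is an illustrative assumption and not the authors' actual procedure. Each species is reduced to a single discrete state ("coding" or "noncoding" transcript), and the example gene is hypothetical.

```python
def fitch(tree, states):
    """Bottom-up pass of Fitch's small-parsimony algorithm.

    `tree` is a leaf name (str) or a (left, right) tuple; `states` maps each
    leaf to its observed state. Returns (candidate state set, minimum changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No common state: one change is needed somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


def reconstruct(tree, states, parent_state=None, assigned=None):
    """Top-down pass: keep the parent's state when allowed, else pick one candidate."""
    if assigned is None:
        assigned = {}
    candidates, _ = fitch(tree, states)
    # min() is an arbitrary but deterministic tie-break among equally parsimonious states.
    state = parent_state if parent_state in candidates else min(candidates)
    assigned[tree] = state
    if not isinstance(tree, str):
        for child in tree:
            reconstruct(child, states, state, assigned)
    return assigned


# Hypothetical gene: protein-coding in human, non-coding transcript in chimpanzee and macaque.
tree = (("human", "chimpanzee"), "macaque")
observed = {"human": "coding", "chimpanzee": "noncoding", "macaque": "noncoding"}

_, changes = fitch(tree, observed)
ancestors = reconstruct(tree, observed)
print("minimum changes:", changes)
print("human-chimpanzee ancestor:", ancestors[("human", "chimpanzee")])
```

A single change on the human branch (non-coding to coding) explains this pattern, so the reconstructed human-chimpanzee ancestor carries a non-coding transcript. A fuller analysis would typically add more outgroup species and a third state for "not transcribed".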