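Chinese Literature "Going Global": A Corpus Perspective
October 19, 2018, 08:58. Source: Chinese Social Sciences Net (中国社会科学网), Chinese Social Sciences Today (中国社会科学报). Author: Dai Guangrong (戴光荣)

As China's national strength grows and its influence becomes more visible, the drive for Chinese culture to "go global" coincides with the international community's desire to understand China, and this has become a trend of the times. The global influence of Chinese literature is also rising: a number of well-known writers have won notable recognition abroad, which has in turn boosted Chinese readers' confidence in their own literature.

Valuing the translation of literature into foreign languages

Spurred by the national Belt and Road Initiative, scholars have begun to examine in depth how, under new circumstances, China can strengthen its international discursive power and how Chinese literature can receive the attention it deserves worldwide. There is broad agreement that the role of literary translation should not be underestimated. Expanding the international influence of Chinese literature, improving the world's understanding of China, and presenting an image of China as civilized, democratic, open and progressive matter greatly, now and in the future. Translating both the classics and today's booming online literature can help raise China's cultural "soft power". Soft power of this kind takes time to accumulate: only through sustained translation, which lets Chinese works gradually enter the reading lives of foreign readers, can that influence grow and China's cultural soft power and international voice gain wider recognition.

Literary works are a mirror of society and a vehicle for recording historical and cultural life. They reflect, directly or indirectly, many facets of a people, letting the world learn about the everyday life of the Chinese as well as their state of mind and their aspirations.

In recent years, the translation of classics has aimed to introduce Chinese canonical works abroad, and several translation series have been published, such as the Library of Chinese Classics (《大中华文库》). At the national level, a number of programmes promoting the translation of Chinese scholarship have also been launched; the scale of the effort and of the human and material resources invested indirectly reflects China's growing national strength. Which translation strategies to adopt for literary works, and how to assess translation quality, bear directly on the current national emphasis on "telling China's stories well" and "optimizing communication effects". Which techniques and methods to use in exploring these questions has long been a focus of the translation community.

Making good use of corpus software

Today, as corpus technology advances, scholars are paying close attention to corpus-based approaches to translation phenomena: analysing the features of translated language, comparing original and translated language, examining translation style and language quality, and reflecting on how to optimize the effect of translation. Here, composite multilingual corpora can help us make better choices about translation strategy, language features and more. We can collect works, past and present, that have circulated widely and been well received around the world, and build their Chinese and foreign-language versions into Chinese-foreign parallel corpora; we can then collect influential literary classics originally written in foreign languages to form a reference corpus. Together, these three kinds of texts make up a full-featured composite multilingual corpus.

With corpus analysis software, one can analyse the corpus of classic Chinese works, produce word lists and keyword lists, and summarize the themes of Chinese literature that readers care about. One can then analyse the language features of the corresponding translations and of foreign-language originals to learn what regular linguistic patterns well-liked works display. Translators working on Chinese literature can draw on the data from these large corpora as a reference for helping translated works travel further.

As is well known, the contribution of imported foreign literature in the late Qing and early Republican period to modern Chinese literature should not be underestimated. The translation methods of the time differed greatly from today's standard practice, which is inseparable from the translators involved and the social environment. Translators of the period fell roughly into several groups: Western missionaries who knew Chinese and translated directly, such as Young John Allen (林乐知, 1836-1907); missionaries who translated orally while Chinese collaborators wrote the text down, such as John Fryer (傅兰雅, 1839-1928), who orally translated more than a hundred scientific and technical books; Chinese who knew foreign languages and translated directly, such as Yan Fu and Liang Qichao; and teams in which both the oral translator and the writer were Chinese, such as Lin Shu, who knew no foreign language and worked with more than twenty oral interpreters. The foreign classics they translated together influenced generations, and "Lin's translated fiction" (林译小说) became a striking feature of the history of translated literature in China.

The "Lin Shu translation phenomenon" deserves reflection from today's translation community: a master of classical prose who knew no foreign language worked with oral interpreters, one speaking and the other writing, and achieved remarkable results. How, then, should Lin Shu's translations be evaluated objectively and fairly? Scholarly assessments have mostly relied on impressionistic personal judgement, which is often biased, and many studies rest on a single translation or on isolated fragments of it. Conclusions of that kind are open to question and cannot establish the merits of Lin's translations. A fair assessment of their quality requires a comprehensive grasp of his entire translated output, and building a corpus of Lin Shu's translations is the best route to it.

We have built three corpora: a corpus of Lin Shu's translations (monolingual Chinese, classical), a corpus of Lin Shu's original writing (monolingual Chinese, classical), and a bilingual aligned corpus of Lin Shu's translations (mainly English source texts with Lin's classical Chinese renderings). On this basis we carried out a series of studies and obtained many findings of use to translators today. Using the translation corpus and the comparable corpus (translated versus original texts), we described and analysed the language of the three kinds of text objectively, and found that Lin's classical-Chinese translations and his original classical-Chinese writing each have their own strengths. Traces of the foreign source language seep into the translated texts, and traditional classical Chinese and the vernacular meet in them, giving the language of the translations the texture of brocade.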
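The word-list and keyword-list step mentioned above can be illustrated with a short sketch. It is not the software or procedure used in the article; it assumes the texts are already segmented into tokens (for classical Chinese that segmentation is a separate problem) and uses the log-likelihood keyness statistic that is common in corpus linguistics. The function names `log_likelihood` and `keywords` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of a word.

    a: frequency in the study corpus, c: size of the study corpus (tokens)
    b: frequency in the reference corpus, d: size of the reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study = Counter(study_tokens)          # word list of the study corpus
    reference = Counter(reference_tokens)  # word list of the reference corpus
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest keyness first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

In a study like the one described, `study_tokens` might hold the translated texts and `reference_tokens` the translator's original writing, so that the top-ranked words hint at vocabulary characteristic of the translations. Any such ranking would still need the qualitative reading in context that the article calls for.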
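With a large Lin Shu translation corpus, one can extract the relevant language features and their data from each corpus, test and analyse the data, compare translated and original language, and, taking textual context into account, combine quantitative and qualitative analysis. This makes it possible to analyse a translator's style, including lexical features such as high-frequency content words and function words, particular sentence patterns and constructions, and distinctive stylistic traits.

With a Chinese-foreign parallel corpus, one can analyse the regular linguistic and non-linguistic features that translated texts show under various constraints. These features all stem from choices the translator makes during translation, and many reflect the translator's personal language habits. In Lin Shu's translations the translator can often be heard speaking in his own voice. This reflects the conflicted mindset of Lin Shu, translator and author in a period of social and historical transition, who gave equal weight to inheriting the past and absorbing the foreign, to continuity and rupture, to conservation and enlightenment, and it shows in both his writing and his translations. Through analysis, we summarize the qualities that the translator and author thought an ideal citizen should have, examine in depth how cultural terms are used in the translations and in the original writing, and explore the similarities and differences between translating and writing.

Approaching the features of Lin's translations through corpora yields more objective description and more scientific, credible conclusions. Focusing on Lin Shu's dual identity as translator and author offers a distinctive perspective for comparing the translations and original writing of famous authors in terms of style, language features and strategy. The corpus approach can effectively reveal the regular linguistic features of Lin's translations, uncover the causes and effects of his translation practice, and highlight the difference in expressive tension between classical and modern Chinese; the results can be explored further from many angles.

Reducing the difficulty of evaluation

Before electronic corpora were applied to translation studies, translators and institutions usually evaluated translations by traditional means (paper dictionaries, paper texts, peer experts, personal intuition and so on). These offer little help with the conceptual and linguistic knowledge that objective evaluation requires, so much translation evaluation took place in something of a vacuum. The reliability of traditional methods of assessing translation quality leaves room for improvement. As corpus technology has advanced, the resources and tools needed for evaluation have also developed considerably; corpora and corpus analysis software can make translation evaluation more objective and more operable.

Corpus-based translation quality assessment can effectively reduce the influence of the evaluator's subjectivity and so make evaluation less difficult. The corpus-based method has distinctive features. First, its analyses rest on relatively large corpora whose samples are selected, quality-assured texts. Second, it analyses real language patterns in the corpus; it is empirical in nature and therefore objective, avoiding impressionistic analysis. Third, it uses corpus tools and methods, so its data can be reused. Therefore, as Chinese literature "goes global", we should break down barriers and have translators, publishers, online platforms and others work together to build composite multilingual corpora. We would then have more authentic, suitable translated texts, which can help decision-makers make appropriate choices about translation and give translators who later work on "Chinese literature going global" more concrete, objective feedback.

(This article is an interim result of the Fujian Provincial Social Science Research Base major project "Construction and Study of a Corpus of Lin's Translated Fiction from the Perspective of Descriptive Translation Studies" (FJ2015JDZ037) and the Local Literature Collation and Research Center key project "Construction of a Corpus of Lin Shu's Translations and a Study of His Translation Style" (DFWX2015-A01).)

(Author affiliation: School of Humanities, Fujian University of Technology)
http://www.cssn.cn/djch/djch_djchhg/gxwy/201810/t20181019_4719497.shtml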
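Information research uses many test collections. The well-known TREC, for example, includes many collections for retrieval, and BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) is used for information extraction. I had long known that OHSUMED is a test collection built on the MEDLINE database, but never found a suitable description of it, so I have translated the text from the front page of its website. The address is http://ir.ohsu.edu/ohsumed/ohsumed.html, and the collection is free to use.

The OHSUMED test collection is intended for information retrieval research. It is a clinically oriented subset of MEDLINE containing 348,566 references (drawn from more than 7 million), covering 270 medical journals over five years (1987-1991). The collection is about 400 MB in size. Fields not relevant to content retrieval have been removed, leaving only the title, abstract, MeSH indexing terms, author, source and publication type. The data set is neither complete nor kept up to date; it cannot be used for real searching, only for research.

The collection was created as part of a study evaluating how physicians use MEDLINE in a clinical setting. Novice physicians using MEDLINE searched on 106 queries. Before searching, they were asked to provide information about the patient and a statement of their information need. Each query was then searched again by four searchers (two experienced physicians and two medical librarians), and a separate group of physicians judged the relevance of every retrieved document to the query on a three-level scale: definitely relevant, possibly relevant, or not relevant. In total there were 12,565 query-document pairs; to assess inter-observer reliability, more than 10% of these pairs were judged a second time.

The original test collection was later used in experiments with the SMART retrieval system. As expected, SMART retrieved some references that the original searches had not found. After these experiments a second round of relevance judging was carried out, adding 3,575 judged query-document pairs, more than 10% of which were again judged twice to assess inter-observer reliability.

There are therefore now 16,140 query-document pairs with relevance judgments. They are all stored in one file (judged), in which every record carries its relevance judgments. There are also files that list query-document pairs (drel.i, drel.ui, pdrel.i and pdrel.ui). These files use only the original relevance judgments.

(Note: five queries have no definitely relevant documents, and you may wish to remove these queries from your experiments. They are kept in the query file because further analysis may later find relevant documents for them. Some systems, such as SMART, automatically drop queries with no relevant documents from the analysis. The five queries without relevant documents are numbers 8, 28, 49, 86 and 93.)

The National Library of Medicine has agreed to the use of MEDLINE records in this test collection for experiments, subject to the following conditions:
1. The data must not be used in any non-experimental clinical, library or other setting.
2. Any human user of the data must be explicitly told that the data is incomplete and out of date.

The test collection consists of 13 files, described below. (Users who receive the compressed distribution get only 7 files. Each of files 1-5 below is compressed separately with the suffix .tar.Z; files 6-12 are all compressed into a single file named hsumed.rest.tar.Z. The last file is this file, the readme, which is not compressed.)

The files, with their size in parentheses and a description of their contents:

1) ohsumed.87 (60,303,307): the MEDLINE references for 1987. Every MEDLINE reference file follows the conventions of the SMART system, with the following field names (the corresponding NLM designation is in parentheses):
.I sequential identifier
.U MEDLINE identifier (UI)
.M human-assigned MeSH terms (MH)
.T title (TI)
.P publication type (PT)
.W abstract (AB)
.A author (AU)
.S source (SO)
(Note: some abstracts are truncated at 250 words, and some references have no abstract.)
2) ohsumed.88 (78,585,929): the MEDLINE references for 1988, same format.
3) ohsumed.89 (84,719,077): the MEDLINE references for 1989, same format.
4) ohsumed.90 (86,754,890): the MEDLINE references for 1990, same format.
5) ohsumed.91 (89,761,122): the MEDLINE references for 1991, same format.
6) queries (11,591): the 106 queries of the test collection, including patient and topic information, in the format:
.I sequential identifier
.B patient information
.W information request
7) drel.ui (26,919): the query-document pairs judged definitely relevant, ordered by the documents' MEDLINE UI, in the format:
query<TAB>document-ui
8) drel.i (21,709): the query-document pairs judged definitely relevant, ordered by the documents' sequential identifier (the .I field), in the format:
query<TAB>document-i
9) pdrel.ui (57,831): the query-document pairs judged definitely relevant or possibly relevant, ordered by MEDLINE UI, in the format:
query<TAB>document-ui
10) pdrel.i (46,664): the query-document pairs judged definitely relevant or possibly relevant, ordered by sequential identifier (the .I field), in the format:
query<TAB>document-i
11) judged (368,366): a list of all references retrieved by any of the 5 original searchers or by the SMART system, ordered by query number and, within a query, by document number, with relevance judgments of d (definitely relevant), p (possibly relevant) or n (not relevant). Relevance 1 is the original judgment of a document retrieved by the original searchers. Relevance 2 is the second judgment made for the inter-observer reliability assessment. Relevance 3 is the judgment of a document retrieved by SMART rather than by the original searchers, or an additional judgment of an originally retrieved document made during the reliability assessment. The format is:
query<TAB>document-ui<TAB>document-i<TAB>relevance1
12) ui (3,137,094): the MEDLINE UI of all 348,566 references in the collection, one per line.
13) readme: this file.

Because a relative recall procedure was used in building this test collection, and because relevance judgments are subjective by nature, we are well aware that there will be disagreement over which documents are relevant. I do intend to update the data set, but in a systematic way so that results remain comparable across researchers. I therefore welcome reports about this collection: if you find new documents that you consider relevant, or you disagree with a relevance judgment, please let me know by email or by post. We will update the relevance judgments periodically and publish the updated versions.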
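The file layout described above can be read with a small parser. The sketch below is only an illustration built from the field list in the readme. It assumes that every field marker starts a line, that a value may follow on the same line (as is usual for `.I`) or on the following lines, and that the relevance files hold whitespace-separated columns; none of this is checked against the actual distribution. The names `parse_records` and `load_qrels` are hypothetical.

```python
from collections import defaultdict

# Field markers listed in the readme, mapped to illustrative key names.
FIELD_NAMES = {
    ".I": "seq_id", ".U": "medline_ui", ".M": "mesh_terms", ".T": "title",
    ".P": "publication_type", ".W": "abstract", ".A": "authors", ".S": "source",
}


def parse_records(lines):
    """Yield one dict per reference from SMART-style lines.

    Assumption: a new record starts at each ".I" marker.
    """
    record, field = None, None
    for raw in lines:
        line = raw.rstrip("\n")
        marker = line[:2]
        is_marker = marker in FIELD_NAMES and (len(line) == 2 or line[2] == " ")
        if is_marker:
            if marker == ".I":
                if record is not None:
                    yield record
                record = {}
            if record is None:
                continue  # ignore stray fields before the first ".I"
            field = FIELD_NAMES[marker]
            record[field] = line[2:].strip()
        elif record is not None and field is not None and line.strip():
            # Continuation line: append to the current field's value.
            record[field] = (record[field] + " " + line.strip()).strip()
    if record is not None:
        yield record


def load_qrels(lines):
    """Map each query id to the set of document ids paired with it.

    Works for drel.* and pdrel.* style lines: query, then document id.
    """
    qrels = defaultdict(set)
    for raw in lines:
        parts = raw.split()
        if len(parts) >= 2:
            qrels[parts[0]].add(parts[1])
    return dict(qrels)
```

Both functions accept any iterable of lines, so an open file object can be passed directly. Queries with no entry in the returned mapping, such as the five without definitely relevant documents, can then be skipped during evaluation.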
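Bilingual web pages are an extremely valuable resource for multilingual information processing, especially machine translation and cross-language information retrieval. In machine translation, deployed systems now compete on the resources they hold as much as on their models. In current academic work on acquiring bilingual resources, one representative method uses heuristic rules based on the composition of URLs to discover bilingual pages automatically on bilingual sites (call it the URL-pattern-based method). It requires the heuristic rules to be written in advance. We (Kit & Ng, 2007; Zhang, Yao & Kit, 2013) try to have the machine discover such rules automatically, reducing the method's dependence on external prior knowledge.

Kit & Ng (2007) mainly discover bilingual URL patterns automatically and then use those patterns to find bilingual pages. Zhang, Yao & Kit (2013) go further by measuring the credibility of bilingual URL patterns and using link relations to find more high-credibility bilingual pages. Our experiments show that this method finds roughly 20% additional genuine bilingual pages.

The interesting points of this work are:
(1) It distinguishes the global credibility of a URL pattern (computed over all seed sites) from its local credibility (computed over the current site), which recalls bilingual pages whose pattern has low local but high global credibility.
(2) It uses the learned high-credibility bilingual URL patterns to find bilingual pages that have no link between them (we call these Deep Bilingual Webpages).
(3) It uses link relations to move from the bilingual seed sites to further high-credibility bilingual sites outside the seed set, and from there to more high-credibility bilingual pages.

For details, see the following papers:
1. Chunyu Kit and Jessica Y. H. Ng. An intelligent Web agent to mine bilingual parallel pages via automatic discovery of URL pairing patterns. In: Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops: Workshop on Agents and Data Mining Interaction (ADMI-07). November 2-5, 2007, Silicon Valley, California.
2. Chengzhi Zhang, Xuchen Yao and Chunyu Kit. Finding More Bilingual Web Pages with High Credibility via Link Analysis. In: Proceedings of the 6th Workshop on Building and Using Comparable Corpora (BUCC2013). August 8, 2013, Sofia, Bulgaria.

The URL-pattern-based method does, of course, have unavoidable weaknesses. The alternative is to compute the structural or content similarity between candidate bilingual pages directly, which usually costs a great deal of computation or time (for example, crawling as many source-language and target-language pages as possible and then computing cross-language similarity). In my view much remains to be done here, for example removing the need for manually supplied seed sites or keeping their number small; combining the method with semi-supervised learning to discover more high-credibility seed sites may be a good idea.

The source code (Pupsniffer) and data sets (seed sites, collected bilingual pages, evaluation results and so on) used in Zhang, Yao & Kit (2013) are available from the Pupsniffer evaluation site: http://mega.lt.cityu.edu.hk/~czhang22/pupsniffer-eval/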
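The idea of discovering URL pairing patterns automatically can be illustrated with a deliberately simplified sketch. This is not the algorithm of Kit & Ng (2007) or of Pupsniffer, and it computes no credibility scores. It only counts how often swapping one alphanumeric URL token for another turns one crawled URL into another, on the assumption that frequent swaps (such as a hypothetical `en`/`zh`) are candidate pairing patterns. The function names and the `a.example` URLs in the usage note are made up for illustration.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Split a URL into alternating alphanumeric tokens and separator runs.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")


def candidate_patterns(urls):
    """Count token substitutions that map one URL in the set onto another."""
    buckets = defaultdict(set)
    for url in set(urls):
        tokens = TOKEN.findall(url)
        for i, tok in enumerate(tokens):
            if tok.isascii() and tok.isalnum():
                # URLs sharing this key differ only in the token at position i.
                key = (i, tuple(tokens[:i]), tuple(tokens[i + 1:]))
                buckets[key].add(tok)
    support = Counter()
    for toks in buckets.values():
        for a, b in combinations(sorted(toks), 2):
            support[(a, b)] += 1
    return support


def pair_pages(urls, pattern):
    """Return URL pairs that differ only by the (a, b) token substitution."""
    a, b = pattern
    url_set = set(urls)
    pairs = []
    for url in sorted(url_set):
        tokens = TOKEN.findall(url)
        for i, tok in enumerate(tokens):
            if tok == a:
                twin = "".join(tokens[:i] + [b] + tokens[i + 1:])
                if twin in url_set:
                    pairs.append((url, twin))
    return pairs
```

On a crawl that contains `http://a.example/en/about.html` and `http://a.example/zh/about.html`, the substitution `("en", "zh")` gains one unit of support per such pair. A real system would still have to separate language markers from incidental differences such as page numbers, which is where credibility measures of the kind described above come in.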
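[Note] The results of the Ministry of Education's annual review of humanities and social sciences research projects are out. Listed here, for reference, are the projects whose titles contain the word "corpus" (语料库).

Institution; Project type; Project title; Applicant
Inner Mongolia Normal University; Planning Fund Project; A corpus-based analysis of primary-school Mongolian-language "Language Arts" (《语文》) textbooks; 松云
Beijing Vocational College of Opera and Arts (北京戏曲艺术职业学院); Youth Fund Project; Building a specialized Peking opera corpus from open-domain Internet sources; 乐娟
Zhejiang University of Media and Communications; Youth Fund Project; A corpus-based study of the media-discourse archaeology and identity production of "migrant workers" (农民工); 赵凌
Dalian University of Technology; Planning Fund Project; A study of refusal behaviour in Japanese based on a video corpus; 刘玉琴
Qufu Normal University; Planning Fund Project; A corpus-based study of English translations of the Analects (《论语》): perspectives from systemic functional linguistics and Western rhetoric; 鞠玉梅
Guangdong University of Education; Youth Fund Project; The acquisition of semantic prosody in Chinese as a foreign language: a corpus perspective; 李芳兰
Guangzhou Maritime College; Youth Fund Project; Construction and study of an English-Chinese maritime bilingual parallel corpus; 丛丽君
Hefei University of Technology; Youth Fund Project; A corpus-based study of English translations of Mao Dun Literature Prize winning works; 汪晓莉
Hefei University of Technology; Youth Fund Project; A corpus-based multi-dimensional pragmatic analysis of legal texts from mainland China and their English translation; 肖薇
Henan Institute of Science and Technology; Youth Fund Project; Construction of a Chinese-English parallel corpus of discourse structure; 冯文贺
South China Normal University; Youth Fund Project; A comparative study of spoken grammar in cross-strait Chinese based on corpus statistics; 方清明
Liaoning University; Youth Fund Project; A cultural perspective on the translations of Howard Goldblatt (葛浩文): a study assisted by a Chinese-English parallel corpus; 车艳秋
Shanghai Jiao Tong University; Youth Fund Project; A corpus-based comparative study of Tang-dynasty character-standard scholarship (字样学) and character usage in stone inscriptions; 刘元春
Capital Normal University; Youth Fund Project; Syntactic constraints on code-switching: a study based on a Chinese-Japanese bilingual child corpus; 孟海蓉
Sichuan International Studies University; Youth Fund Project; Evaluative language in doctor-patient conversation: a study based on a corpus of Chinese outpatient doctor-patient conversations; 刘兴兵
Taiyuan University of Science and Technology; Youth Fund Project; Disciplinary affinity and individual positioning: a corpus-based contrastive study of critical stance and evaluation in Chinese and Anglo-American academic English; 董艳
Tianjin Foreign Studies University; Youth Fund Project; English translation of traditional Chinese medicine and the construction of its corpus; 刘春梅
Xiangtan University; Youth Fund Project; From Western-centrism to the reconstruction of cultural subjectivity: a parallel-corpus-based study of changes in the English translation of core concept terms in the Analects (《论语》); 刘永利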
The University of Manchester (UK) academic writing phrase collection: http://www.phrasebank.manchester.ac.uk/index.htm

Academic Phrasebank

Writing Introductions

There are many ways to introduce an academic essay or assignment. Most academic writers, however, appear to do one or more of the following in their introductions:
establish the context, background and/or importance of the topic
indicate a problem, controversy or a gap in the field of study
define the topic or key terms
state the purpose of the essay/writing
provide an overview of the coverage and/or structure of the writing

Examples of phrases which are commonly employed to realise these functions are listed below. Note that there may be a certain amount of overlap between some of the categories under which the phrases are listed. Introductory sections for research dissertations are normally much more complex than this and, as well as the elements above, may include the following: a synopsis of key literature/current state of knowledge, synopsis of methods, lists of research questions or hypotheses to be tested, significance of the study, recognition of the limitations of the study, reasons for personal interest in the topic.

Establishing the importance of the topic:
One of the most significant current discussions in legal and moral philosophy is ...
It is becoming increasingly difficult to ignore the ...
X is the leading cause of death in western industrialised countries.
X is a common disorder characterised by ...
X is an important component in the climate system, and plays a key role in Y.
In the new global economy, X has become a central issue for ...
In the history of development economics, X has been thought of as a key factor in ...
Xs are one of the most widely used groups of antibacterial agents and ...
Xs are the most potent anti-inflammatory agents known.
X is a major public health problem, and the cause of about 4% of the global burden of disease.
X is an increasingly important area in applied linguistics.
Central to the entire discipline of X is the concept of ...
X is at the heart of our understanding of ...

Establishing the importance of the topic (time frame given):
Recent developments in X have heightened the need for ...
In recent years, there has been an increasing interest in ...
Recent developments in the field of X have led to a renewed interest in ...
Recently, researchers have shown an increased interest in ...
The past decade has seen the rapid development of X in many ...
The past thirty years have seen increasingly rapid advances in the field of ...
Over the past century there has been a dramatic increase in ...
One of the most important events of the 1970s was ...
Traditionally, Xs have subscribed to the belief that ...
X proved an important literary genre in the early Y community.
The changes experienced by Xs over the past decade remain unprecedented.
Xs are one of the most widely used groups of antibacterial agents and have been extensively used for decades to ...

Highlighting a problem in the field of study:
However, these rapid changes are having a serious effect ...
However, a major problem with this kind of application is ...
Lack of X has existed as a health problem for many years.
Despite its safety and efficacy, X suffers from several major drawbacks:
However, research has consistently shown that first year students have not attained an adequate understanding of ...
There is increasing concern that some Xs are being disadvantaged ...
Despite its long clinical success, X has a number of problems in use.
Questions have been raised about the safety of prolonged use of ...

Highlighting a controversy in the field of study:
To date there has been little agreement on what ...
More recently, literature has emerged that offers contradictory findings about ...
One observer has already drawn attention to the paradox in ...
In many Xs a debate is taking place between Ys and Zs concerning ...
The controversy about scientific evidence for X has raged unabated for over a century.
Debate continues about the best strategies for the management of ...
This concept has recently been challenged by ... studies demonstrating ...
One of the most significant current discussions in legal and moral philosophy is ...
Questions have been raised about the safety of prolonged use of ...
The issue of X has been a controversial and much disputed subject within the field of ...
The issue has grown in importance in light of recent ...
One major theoretical issue that has dominated the field for many years concerns ...
One major issue in early X research concerned ...

Highlighting a knowledge gap in the field of study (for research):
So far, however, there has been little discussion about ...
However, far too little attention has been paid to ...
Most studies in X have only been carried out in a small number of areas.
The research to date has tended to focus on X rather than Y.
In addition, no research has been found that surveyed ...
So far this method has only been applied to ...
Several studies have produced estimates of X (Smith, 2002; Jones, 2003), but there is still insufficient data for ...
However, there have been no controlled studies which compare differences in ...
The experimental data are rather controversial, and there is no general agreement about ...
However, there is no reliable evidence that ...
X's analysis does not take account of ... nor does he examine ...

Focus, aim, argument:
This paper will focus on/examine/give an account of ...
This essay seeks to remedy these problems by analysing the literature of ...
The objectives of this research are to determine whether ...
This paper seeks to address the following questions:
This essay critically examines/discusses/traces ...
The purpose of this paper is to review recent research into the ...
This paper will review the research conducted on ...
This chapter reviews the literature concerning the usefulness of using ...
The aim of this paper is to determine/examine ...
The aim of this study was to evaluate and validate ...
In this paper I argue that ...
In the pages that follow, it will be argued that ...
This paper attempts to show that ...
In this essay, I attempt to defend the view that ...

Outline of structure:
The main questions/issues addressed in this paper are: a), b) and c).
This paper has been divided into four parts. The first part deals with ...
The essay has been organised in the following way.
This paper first gives a brief overview of the recent history of X.
This paper reviews the evidence for ...
This paper begins by ... It will then go on to ...
The first section of this paper will examine ...
Finally, ...
Chapter 2 begins by laying out the theoretical dimensions of the research, and looks at how ...
Chapter 3 describes the design, synthesis, characterization and evaluation of ...
The last chapter assesses the ...

Explaining Keywords:
While a variety of definitions of the term X have been suggested, this paper will use the definition first suggested by Smith (1968) who saw it as ...
Throughout this paper the term X will refer to/will be used to refer to ...
In this article the acronym/abbreviation XYZ will be used.

Referring to Literature

One important characteristic of academic writing is that all the sources of information that the writer has used need to be indicated, not just as a bibliography or list of references, but also in or alongside the text.
In some cases the source will be the main subject of the sentence, in others the sources may be mentioned parenthetically (in brackets) or via a notation system (e.g. footnotes). The more common verbs and verb phrases used in academic writing for referring to sources are given below. Note that different referencing systems are used in different disciplines. In the examples, the Harvard in-text referencing system has been used. Also note that the "author as subject" style is less common in the sciences.

"Ideally, your review should be evaluative and critical of the studies which have a particular bearing on your own. For example, you may think a particular study did not investigate some necessary aspect of the area, or that the authors failed to notice some problem with their results."
Taken from the Manchester Good Practice Guide: http://www.man.ac.uk/goodpractice/

General descriptions of the relevant literature:
A considerable amount of literature has been published on X. These studies ...
There is a large volume of published studies describing the role of ...
The first serious discussions and analyses of X emerged during the 1970s with ...
The generalisability of much published research on this issue is problematic.
What we know about X is largely based upon empirical studies that investigate how ...
During the past 30 years much more information has become available on ...
In recent years, there has been an increasing amount of literature on ...
A large and growing body of literature has investigated ...

General reference to previous research/scholarly activity (usually more than one author):
Many historians have argued that ... (e.g. Jones, 1987; Johnson, 1990; Smith, 1994)
Numerous studies have attempted to explain ... (for example, Smith, 1996; Kelly, 1998; Johnson, 2002)
Recent evidence suggests that ... (Smith, 1996; Jones 1999; Johnson, 2001)
Recently, in vitro studies have shown that T.thermophylus EFTu can ...
(Patel et al., 1997; Jones et al., 1998).
Surveys such as that conducted by Smith (1988) have shown that ...
Several attempts have been made to ... (Smith, 1996; Jones 1999; Johnson, 2001)
Several studies have revealed that it is not just X that acts on ... (Smith, 1996; Jones ...
Several biographies of Harris have been published. Smith presents an ... account, whilst Jones ...
Several studies investigating X have been carried out on ...
Data from several sources have identified the increased morbidity and mortality associated with obesity
Previous studies have reported ... (Smith, 1985; Jones, 1987; Johnson, 1992).
Previous research findings into X have been inconsistent and contradictory (Smith, 1996; Jones 1999, ...
A number of studies have found that ... (Smith, 2003; Jones, 2004).
Twenty cohort study analyses have examined the relationship between ...
At least 152 case-control studies worldwide have examined the relationship between ...
Other studies have considered the relationship ...
The relationship between X and Y has been widely investigated (Smith, 1985; Jones, 1987, ...
The causes of X have been widely investigated (Jones, 1987; Johnson, 1990; Smith, 1994).
The geology of X has been addressed in several small-scale investigations and ...
Xs have been identified as major contributing factors for the decline of many species (1).
X has also been shown to reverse the anti-inflammatory effects of glucocorticoids in murine-induced arthritis (11).
It has been suggested that levels of X are independent of the size of the Y (Smith et al., 1995)
It has conclusively been shown that X and Y increase Z (Smith et al., 1999; Jones, 2001 ...
It has been demonstrated that a high intake of X results in damage to ... (Smith, 1998; ...

Reference to current state of knowledge:
A relationship exists between an individual's working memory and their ability to ... (Jones et al., 1998).
GM varieties of maize are able to cross-pollinate with non-GM varieties (Smith, 1998; Jones, 1999).
There is an unambiguous relationship between spending on education and economic development (Rao, 1998).
X is one of the most intense reactions following CHD (Lane, 2003).
MIF has been found to oppose the anti-inflammatory actions of X on Y (Alourfi, 2004).

Reference to single investigations in the past: researcher(s) as sentence subject
Smith (1999)
  found that as levels of literacy and education of the population rise ...
  showed that reducing X to 190 °C decreased ... (see figure 2).
  demonstrated that when the maximum temperature is exceeded ...
Jones et al. (2001)
  investigated the differential impact of formal and non-formal education on ...
  analysed the data from 72 countries and concluded that ...
  reviewed the literature from the period and found little evidence for this claim.
  interviewed 250 undergraduate students using semi-structured questionnaires.
  studied the effects of Cytochrome P450 on unprotected nerve cells.
  performed a similar series of experiments in the 1960s to show that ...
  carried out a number of investigations into the ...
  conducted a series of trials in which he mixed X with different quantities of ...
  measured both components of the ...
  labelled these subsets as ...
  examined the flow of international students ...
  identified parents of disabled children as ...
  used a survey to assess the various ...
Wang et al. (2004) have recently developed a methodology for the selective introduction of ...

Reference to single investigations or publications in the past: time frame prominent
In 1975, Smith et al. published a paper in which they described ...
In 1990 Patel et al. demonstrated that replacement of H2O with heavy water led to ...
Thirty years later, Smith (1974) reported three cases of Candida Albicans which ...
In the 1950s Gunnar Myrdal pointed to some of the ways in which ... (Myrdal, 1957)
In 1981, Smith and co-workers demonstrated that X induced in vitro resistance to ...
In 1990, El-Guerrouj et al. reported a new and convenient synthetic procedure to obtain ...
In 1984 Jones et al. made several amino acid esters of X and evaluated them as water-soluble pro-drugs.

Reference to single publication: no time frame
Smith has written the most complete synthesis to date of ...

Reference to single investigations in the past: investigation prominent
Preliminary work on X was undertaken by AbdulKarim (1992).
The first systematic study of X was reported by Patel et al. in 1986.
The study of the structural behavior of X was first carried out by Rao et al. (1986) ...
Analysis of the genes involved in X was first carried out by Smith et al. (1983).
A recent study by Smith and Jones (2001) involved ...
A longitudinal study of X by Smith (2002) reports that ...
A small scale study by Smith (2002) reaches different conclusions, finding no increase in ...
Smith's cross-country analysis (2002) showed that ...
Smith's comparative study (2002) found that ...
Detailed examination of X by Smith and Patel (1961) showed that ...
In another major study, Zhao (1974) found that just over half of the ...
In a randomised controlled study of X, Smith (2004) reported that ...
In a large longitudinal study, Boucahy et al. (2004) investigated the incidence of X in Y.

Reference to single investigations in the past: research topic as subject
Classical conditioning was first demonstrated experimentally by Pavlov (Smith, 2002). In his seminal study ...
The electronic spectroscopy of X was first studied by Smith and Douglas (1) in 1970.
The acid-catalyzed condensation reaction between X and Y was first reported by Baeyer in 1872.
X formed the central focus of a study by Smith (2002) in which the author found ...
X was originally isolated from Y in a soil sample from ... (Wang et al., 1952).
The way in which the X gene is regulated was studied extensively by Ho and colleagues (Ho et al. 1995 and 1998).
To determine the effects of X, Zhao et al. (2005) compared ...

Reference to what other writers do in their text (author as subject):
Smith (2003) identifies poor food, bad housing, inadequate hygiene and large families as the major causes of ...
Rao (2003) lists three reasons why the English language has become so dominant. These are: ...
Smith (2003) traces the development of Japanese history and philosophy during the 19th century.
Jones (2003) provides in-depth analysis of the work of Aristotle showing its relevance to contemporary times.
Smith (2003) draws our attention to distinctive categories of motivational beliefs often observed in ...
Smith (2003) defines evidence based medicine as the conscious, explicit and judicious use of ...
Rao (2003) highlights the need to break the link between economic growth and transport growth ...
Smith (2003) discusses the challenges and strategies for facilitating and promoting ...
Toh (2003) mentions the special situation of Singapore as an example of ...
Smith (2003) questions whether mainstream schools are the best environment for ...
Smith (2003) considers whether countries work well on cross-border issues such as ...
Smith (2003) uses examples of these various techniques as evidence that ...
In her major study / seminal article / classic critique of ... / case study of ... / review of ... / analysis of ... / introduction to ..., Smith (2004) identifies five characteristics of ...
Some analysts (e.g. Carnoy, 2002) have attempted to draw fine distinctions between ...
Other authors (see Harbison, 2003; Kaplan, 2004) question the usefulness of such an approach.

Reference to other writers' ideas (author as subject):
According to Smith (2003), preventative medicine is far more cost effective, and therefore better adapted to the developing world.
Smith (2003) points out / argues / maintains / claims / concludes / suggests that preventative medicine is far more cost effective, and therefore better adapted to the developing world.
Smith (2003) argues for / offers / proposes / suggests an explanatory theory for each type of irrational belief.
This view is supported by Jones (2000) who writes ...
Smith argues that her data support O'Brien's (1988) view that ...
As Smith reminds us, ...
Elsewhere, Smith has argued that ...

Some ways of introducing quotations:
In the final part of the Theses, Marx writes: "Philosophers have hitherto only interpreted the world in various ways; the point ..."
Sachs concludes: "The idea of development stands today like a ruin in the intellectual landscape…" (Sachs, 1992a: 156).
As Smith argues: "In the past, the purpose of education was to ..." (Smith, 2000: 150).
As Carnoy (2004: 215) states: "there are many good reasons to be sceptical".

Being Critical

As an academic writer, you are expected to be critical of the sources that you use. This essentially means questioning what you read and not necessarily agreeing with it just because the information has been published. Being critical can also mean looking for reasons why we should not just accept something as being correct or true. This can require you to identify problems with a writer's arguments or methods, or perhaps to refer to other people's criticisms of these. Constructive criticism goes beyond this by suggesting ways in which a piece of research or writing could be improved.

"... being against is not enough. We also need to develop habits of constructive thinking." Edward de Bono

Introducing questions, problems and limitations (theory):
One question that needs to be asked, however, is whether ...
A serious weakness with this argument, however, is that ...
One of the limitations with this explanation is that it does not explain why ...
One criticism of much of the literature on X is that ...
The key problem with this explanation is that ...
The existing accounts fail to resolve the contradiction between X and Y.
However, there is an inconsistency with this argument.
Smith's argument relies too heavily on qualitative analysis of ...
It seems that Jones' understanding of the X framework is questionable.
Smith's interpretation overlooks much of the historical research ...
One major criticism of Smith's work is that ...
Many writers have challenged Jones' claim on the grounds that ...
X's analysis does not take account of ... nor does he examine ...

Introducing questions, problems and limitations (method/practice):
Another problem with this approach is that it fails to take X into account.
Perhaps the most serious disadvantage of this method is that ...
Difficulties arise, however, when an attempt is made to implement the policy.
Nevertheless, the strategy has not escaped criticism from governments, agencies and academics.
One major drawback of this approach is that ...
The main limitation of biosynthetic incorporation, however, is ...
However, this method of analysis has a number of limitations.
However, approaches of this kind carry with them various well known limitations.
All the studies reviewed so far, however, suffer from the fact that ...
However, there are limits to how far the idea of/concept of X can be taken.
However, such explanations tend to overlook the fact that ...
However, one of the problems with the instrument the researchers used to measure X was ...
However, all the previously mentioned methods suffer from some serious limitations / weaknesses / disadvantages / drawbacks.

Identifying a study's weakness:
(However,) the main weakness of the study is the failure to address how ...
the study fails to consider the differing categories of damage that ...
the research does not take into account pre-existing ... such as ...
the author offers no explanation for the distinction between X and Y.
Smith makes no attempt to differentiate between various different types of X.
Jones fails to fully acknowledge the significance of ...
the paper would appear to be over-ambitious in its claims
the author overlooks the fact that X contributes to Y.
what Smith fails to do is to draw a distinction between ...
another weakness is that we are given no explanation of how ...
no attempt was made to quantify the association between X and Y.

Offering constructive suggestions:
Smith's paper / Her conclusions / The study / The findings
  would have been / might have been
  more / much more / far more
  useful / convincing / interesting / persuasive / original
  if he/she had / if the author had
  included ... / considered ... / adopted ... / used ...
A better study would examine a large, randomly selected sample of societies with ...
A much more systematic study would identify how X interacts with other variables that are believed to be linked to ...

Highlighting inadequacies of previous studies:
Most studies in the field of X have only focussed on ...
Most studies in X have only been carried out in a small number of areas.
The generalisability of much published research on this issue is problematic.
The experimental data are rather controversial, and there is no general agreement about ...
Such expositions are unsatisfactory because they ...
However, few writers have been able to draw on any structured research into the opinions and attitudes of ...
The research to date has tended to focus on X rather than Y.
The existing accounts fail to resolve the contradiction between X and Y.
Researchers have not treated X in much detail.
Previous studies of X have not dealt with ...
However, these studies used non-validated methods to measure ...
Half of the studies evaluated failed to specify whether ...... However, much of the research up to now has been descriptive in nature …. Although extensive research has been carried out on X, no single study exists which adequately covers ...... However, these results were based upon data from over 30 years ago and it is unclear if these differences still persist. Introducing other people's criticisms However, Jones (2003) points out that ..... Many analysts now argue that the strategy of X has not been successful. Jones (2003), for example, argues that ..... Non-government agencies are also very critical of the new policies. The X theory has been vigorously / strongly challenged in recent years by a number of writers. Smith's analysis has been criticised by a number of writers. Jones (1993), for example, points out that …… Smith's meta-analysis has been subjected to considerable criticism. The most important of these criticisms is that Smith failed to note that ...... Jones (2003) is probably the best known critic of the X theory. He argues that ..... The latter point has been devastatingly critiqued by Jones (2003). Critics have also argued that not only do social surveys provide an inaccurate measure of X, but the ...... Critics question the ability of poststructuralist theory to provide ...... More recent arguments against X have been summarised by Smith and Jones (1982): Jones (2003) is critical of the conclusions that Smith draws from his findings. Describing Methods In the Methods section of a dissertation or research article, writers give an account of how they carried out their research. The Materials and Methods section should be clear and detailed enough for another experienced person to repeat the research and reproduce the results. Typical features with examples of this language are listed below. 
Describing different methods To date various methods have been developed and introduced to measure X: In most recent studies, X is measured in four different ways. Radiographic techniques are the main non-invasive method used to determine .... Different authors have measured X in a variety of ways. Previous studies have based their criteria for selection on ...... A variety of methods are used to assess X. Each has its advantages and drawbacks. Data were gathered from multiple sources at various time points during the 2007–2008 academic year. Giving reasons why a particular method was adopted The semi-structured approach was chosen because ...... Smith et al (1994) identify several advantages of the case study, ....... It was decided that the best method to adopt for this investigation was to ...... A case study approach was chosen to allow a ...... The design of the questionnaires was based on ...... The X method is one of the more practical ways of ...... It was considered that quantitative measures would usefully supplement and extend the qualitative analysis. Many of the distributions were not normal so non-parametric signed rank tests were run. The X approach has a number of attractive features: ...... Indicating a specific method Article references were searched further for additional relevant publications. Articles were searched from January 1965 until April 2008. Publications were only included if ……. X was prepared according to the procedure used by Patel et al. (1957). The synthesis of X was done according to the procedure of Smith (1973). X was synthesised using the same method that was detailed for Y, using ...... This compound was prepared by adapting the procedure used by Zhao et al. (1990). For this study the X was used to explore the subsurface …… An alternative method for making scales homogenous is by using ….. 
Describing the characteristics of the sample The initial sample consisted of 200 students of whom 13 did not complete all of the interviews. All studies described as using some sort of X procedure were included in the analysis. A systematic literature review was conducted of studies that ..... All of the participants were aged between 18 and 19 at the beginning of the study ..... Two groups of subjects were interviewed, namely X and Y. The first group were ...... A random sample of patients with ...... was recruited from ....... Forty-seven students studying X were recruited for this study. The students were divided into two groups based on their performance on ...... The project used a convenience sample of 32 first-year modern languages students. Just over half the sample (53%) was female, of whom 69% were ...... Participants were recruited from 15 clinics across ......, covering urban and rural areas …… Eligibility criteria required individuals to have received …. Five individuals were excluded from the study on the basis of …. Eligible women who matched the selection criteria were identified by …… Semi-structured interviews were conducted with 17 male offenders with a mean age of 38 years. A comparison group of 12 male subjects without any history of X was drawn from a pool of ……. Indicating reasons for sample characteristics A small sample was chosen because of the expected difficulty of obtaining ...... The subjects were selected on the basis of a degree of homogeneity of their ....... Criteria for selecting the subjects were as follows: Describing the process: infinitive of purpose In order to identify the T10 and T11 spinous processes, the subjects were asked to ...... In order to understand how X regulates Y, a series of transfections was performed. To enable the subjects to see the computer screen clearly, the laptop was configured with ...... To see if the two methods gave the same measurement, the data was plotted and ...... 
To control for bias, measurements were carried out by another person. To measure X, a question asking ...... was used. To determine whether ......, KG-1 cells were incubated for ...... To establish whether ......, To increase the reliability of measures, each X was tested twice with a 4-min break between ....... To compare the scores three weeks after initial screening, a global ANOVA F-test was used. The vials were capped with ..... to prevent volatilisation. In an attempt to make each interviewee feel as comfortable as possible, the interviewer ...... Describing the process: other phrases expressing purpose For the purpose of height measurement, subjects were asked to stand ..... For the purpose of analysis, 2 segments were extracted from each ...... For the estimation of protein concentration, 100 μL of protein sample was mixed with ...... Describing the process: typical verbs (note use of passive form) Data management and analysis were performed using SPSS 8.0 (1999). Published studies were identified using a search strategy developed in ..... The experiments were carried out over the course of the growing period from ....... Injection solutions were coded by a colleague to reduce experimenter bias. Drugs were administered by icv injection under brief CO2 narcosis. The mean score for the two trials was subjected to multivariate analysis of variance to determine ...... The subjects were asked to pay close attention to the characters whenever ...... Prompts were used as an aid to question two so that ...... The pilot interviews were conducted informally by the trained interviewer ...... Blood samples were obtained with consent, from 256 Caucasian male patients ...... Independent tests were carried out on the x and y scores for the four years from ...... This experiment was repeated under conditions in which the poor signal/noise ratio was improved. Significance levels were set at the 1% level using the Student's t-test. 
A total of 256 samples were taken from 52 boreholes (Figure 11). Describing the process: sequence words/phrases Prior to commencing the study, ethical clearance was sought from ...... In the end, the EGO was selected as the measurement tool for the current study. After "training", the subjects were told that the characters stood for X and that their task was to ....... After collection, the samples were shipped back to X in ...... After conformational analysis of X, it was necessary to ...... Once the Xs were located and marked, a thin clear plastic ruler ...... Once the positions had been decided upon, the Xs were removed from each Y and replaced by ..... Once the exposures were completed, the X was removed from the Y and placed in ...... On completion of X, the process of model specification and parameter estimation was carried out. Following this, the samples were recovered and stored overnight at ...... These ratings were then made for the ten stimuli to which the subject had been exposed ...... The analysis was checked when initially performed and then checked again at the end of ...... The subjects were then shown a film individually and were asked to ...... The soil was then weighed again, and this weight was recorded as ...... The results were corrected for the background readings and then averaged before being converted to ...... Finally, questions were asked as to the role of ...... Describing the process: adverbs of manner The soil was then placed in a furnace and gradually heated up to ..... The vials were shaken manually to allow the soil to mix well with the water. The medium was then aseptically transferred to a conical flask. The resulting solution was gently mixed at room temperature for ten minutes and ...... A sample of the concentrate was then carefully injected into ...... The tubes were accurately reweighed to six decimal places using ...... Describing the process: passive verb + using .... 
for instruments 15 subjects were recruited using email advertisements requesting healthy students from ...... All the work on the computer was carried out using Quattro Pro for Windows and ...... Data were collected using two high spectral resolution spectroradiometers. The data was recorded on a digital audio recorder and transcribed using a ....... Semi-automated genotyping was carried out using X software and .... Statistical significance was analysed using analysis of variance and t-tests as appropriate. Comparisons between the two groups were made using unrelated t-tests. Using the X-ray and looking at the actual X, it was possible to identify ...... Using an Anthos Microplate Reader, we were able to separate single cells into different ...... Describing the process: giving detailed information Compounds 3 and 5 were dissolved in X at apparent pH 2.5 to give concentrations of 4 mM ..... ...... and the solutions were degraded at 55°C or 37°C for a total time of 42 hours. At intervals of 0.5 min, 50 μL of the X was aliquoted into 0.5 mL of cooled boric acid buffer (pH 7.5) to ...... Indicating problems or limitations In this investigation there are several sources of error. The main error is ...... Another major source of uncertainty is in the method used to calculate X. It was not possible to investigate the significant relationships of X and Y further because the sample size was too small. Further data collection is required to determine exactly how X affects Y. Reporting Results The standard approach to this section of a dissertation is to merely present the results, without elaborate discussion or comment. This does not mean that you do not need any text to describe data presented in tables and figures. Writers usually comment on the significant data presented in the tables and figures. This often takes the form of the location or summary statement, which identifies the table or figure and indicates its content. 
This is normally followed by a statement or statements which point out and describe the relevant or significant data. All your tables should be numbered and given a title. More elaborate commentary on the results is normally restricted to the Discussion section. In research articles, however, authors may comment extensively on their results as they are presented, and it is not uncommon for the Results section to be combined with the Discussion section under the heading: Results and Discussion. Reference to aim/method To assess X, the Y questionnaire was used. To distinguish between these two possibilities, ...... To compare the scores three weeks after initial screening, a global ANOVA F-test was used. In order to assess Z, repeated-measures ANOVA was used. Regression analysis was used to predict the ...... Changes in X and Y were compared using ...... The average scores of X and Y were compared in order to ...... Nine items on the questionnaire measured the extent to which ...... The correlation between X and Y was tested. The first set of analyses examined the impact of ...... Simple statistical analysis was used to ...... A scatter diagram and a Pearson's product moment correlation were used to determine the relationship between ...... T-tests were used to analyse the relationship between ...... Comparisons between the two groups were made using unrelated t-tests. Location and summary statements: Table 1 / Figure 1 shows / compares / presents / provides the experimental data on X / the results obtained from the preliminary analysis of X / the intercorrelations among the nine measures of X. The results obtained from the preliminary analysis of X are shown / can be compared / are presented in Table 1 / in Fig. 1. As shown in Figure 12.1, / As can be seen from the table (above), / It can be seen from the data in Table 12.1 that / From the graph above we can see that the X group reported significantly more Y than the other two groups. 
The table below illustrates / The pie chart above shows some of the main characteristics of the ...... / the breakdown of ...... Highlighting significant data in a table/chart It is apparent from this table that very few ...... This table is quite revealing in several ways. First, unlike the other tables ...... Data from this table can be compared with the data in Table 4.6 which shows ...... From the data in Figure 9, it is apparent that the length of time left between ...... From this data we can see that Study 2 resulted in the lowest value of ...... The histogram in Fig. 1 indicates that ...... What is interesting in this data is that ...... In Fig. 10 there is a clear trend of decreasing ...... As Table III shows, there is a significant difference (t = -2.15, p = 0.03) between the two groups. Statements of result (positive) Strong evidence of X was found when ...... This result is significant at the p = 0.05 level. There was a significant positive correlation between ...... There was a significant difference between the two conditions ...... On average, Xs were shown to have ...... The mean score for X was ...... Interestingly, for those subjects with X, ...... A positive correlation was found between X and Y. The results, as shown in Table 1, indicate that …. Further analysis showed that ...... Further statistical tests revealed ..... Statements of result (negative) There was no increase of X associated with ..... There were no significant differences between ...... No significant differences were found between ..... No increase in X was detected. No difference greater than X was observed. The Chi-square test did not show any significant differences between ...... None of these differences were statistically significant. Overall, X did not affect males and females differently in these measures. No significant reduction in X was found with Y compared with placebo. A clear benefit of X in the prevention of Y could not be identified in this analysis. 
Highlighting significant, interesting or surprising results The most striking result to emerge from the data is that ...... Interestingly, this correlation is related to ..... The correlation between X and Y is interesting because ...... The more surprising correlation is with the ...... The single most striking observation to emerge from the data comparison was ...... Reporting results from questionnaires and interviews The response rate was 60% at six months and 56% at 12 months. Of the study population, 90 subjects completed and returned the questionnaire. Of the initial cohort of 123 students, 66 were female and 57 male. Thirty-two individuals returned the questionnaires. The majority of respondents/those who responded felt that ..... Over half of those surveyed reported that ...... 70% of those who were interviewed indicated that ...... Almost two-thirds of the participants (64%) said that ...... Approximately half of those surveyed did not comment on ...... A small number of those interviewed suggested that ...... Only a small number of respondents indicated that ...... Of the 148 patients who completed the questionnaire, just over half indicated that ....... A minority of participants (17%) indicated ...... In response to Question 1, most of those surveyed indicated that ...... The overall response to this question was very positive. When the subjects were asked ......, the majority commented that ..... Other responses to this question included ...... The overall response to this question was poor. Some participants expressed the belief that ….. One individual stated that …. And another commented ……. Transition statements Turning now to the experimental evidence on ...... Comparing the two results, it can be seen that ...... A comparison of the two results reveals ...... If we now turn to ...... Discussions The term discussion has a variety of meanings in English. 
In academic writing, however, it usually refers to two types of activity: a) considering both sides of an issue, or question, b) considering the results of research and the implications of these. Discussion sections in dissertations and research articles are probably the most complex in terms of their elements. The most common elements and some of the language that is typically associated with them are listed below: Background information (reference to literature or to research aim/question) A strong relationship between X and Y has been reported in the literature. Prior studies that have noted the importance of ...... In reviewing the literature, no data was found on the association between X and Y. As mentioned in the literature review, ...... Very little was found in the literature on the question of ..... This study set out with the aim of assessing the importance of X in ...... The third question in this research was ...... It was hypothesized that participants with a history of ...... The present study was designed to determine the effect of ...... Statements of result (usually with reference to results section) The results of this study show/indicate that ....... This experiment did not detect any evidence for ...... On the question of X, this study found that ...... The current study found that ...... The most interesting finding was that ...... Another important finding was that ..... The results of this study did not show that ....../did not show any significant increase in ...... In the current study, comparing X with Y showed that the mean degree of ...... In this study, Xs were found to cause ..... X provided the largest set of significant clusters of ...... It is interesting to note that in all seven cases of this study...... Unexpected outcome Surprisingly, X was found to ....... Surprisingly, no differences were found in ...... One unanticipated finding was that ..... It is somewhat surprising that no X was noted in this condition ...... 
What is surprising is that ...... Contrary to expectations, this study did not find a significant difference between ....... However, the observed difference between X and Y in this study was not significant. However, the ANOVA (one way) showed that these results were not statistically significant. This finding was unexpected and suggests that ...... Reference to previous research (support) This study produced results which corroborate the findings of a great deal of the previous work in this field. The findings of the current study are consistent with those of Smith and Jones (2001) who found ...... This finding supports previous research into this brain area which links X and Y. This study confirms that X is associated with ...... This finding corroborates the ideas of Smith and Jones (2008), who suggested that ...... This finding is in agreement with Smith's (1999) findings which showed ....... It is encouraging to compare this figure with that found by Jones (1993) who found that ..... There are similarities between the attitudes expressed by X in this study and those described by Smith (1987, 1995) and Jones (1986). These findings further support the idea of ..... Increased activation in the PCC in this study corroborates these earlier findings. These results are consistent with those of other studies and suggest that ...... The present findings seem to be consistent with other research which found ...... This also accords with our earlier observations, which showed that ...... Reference to previous research (contradict) However, the findings of the current study do not support the previous research. This study has been unable to demonstrate that ...... However, this result has not previously been described. In contrast to earlier findings, however, no evidence of X was detected. Although these results differ from some published studies (Smith, 1992; Jones, 1996), they are consistent with those of ...... 
These results differ from X's 2003 estimate of Y, but they are broadly consistent with earlier ..... Explanations for results: There are several possible explanations for this result. These differences can be explained in part by the proximity of X and Y. A possible explanation for this might be that ..... Another possible explanation for this is that ...... This result may be explained by the fact that ...../ by a number of different factors. It is difficult to explain this result, but it might be related to ...... It seems possible that these results are due to ...... The reason for this is not clear but it may have something to do with ...... It may be that these students benefitted from ...... This inconsistency/discrepancy may be due to ...... This rather contradictory result may be due to ...... These factors may explain the relatively good correlation between X and Y. There are, however, other possible explanations. The possible interference of X cannot be ruled out. The observed increase in X could be attributed to ..... The observed correlation between X and Y might be explained in this way. ..... Some authors (9, 30) have speculated that ...... Since this difference has not been found elsewhere it is probably not due to ...... A possible explanation for some of our results may be the lack of adequate ...... Advising cautious interpretation These data must be interpreted with caution because ...... These results therefore need to be interpreted with caution. However, with a small sample size, caution must be applied, as the findings might not be transferable to ...... These findings cannot be extrapolated to all patients. Although exclusion of X did not reduce the effect on X, these results should be interpreted with caution. Suggesting general hypotheses The value of X suggests that a weak link may exist between ..... 
It is therefore likely that such connections exist between ..... It can thus be suggested that ...... It is possible to hypothesise that these conditions are less likely to occur in ...... It is possible/likely/probable therefore that ...... Hence, it could conceivably be hypothesised that ...... These findings suggest that ...... It may be the case therefore that these variations ...... In general, therefore, it seems that ...... It is possible, therefore, that ...... Therefore, X could be a major factor, if not the only one, causing ...... It can therefore be assumed that the ...... This finding, while preliminary, suggests that …… Noting implications This finding has important implications for developing ..... An implication of this is the possibility that ...... One of the issues that emerges from these findings is ...... Some of the issues emerging from this finding relate specifically to ...... This combination of findings provides some support for the conceptual premise that ..... Commenting on findings However, these results were not very encouraging. These findings are rather disappointing. The test was successful as it was able to identify students who ...... The present results are significant in at least two major respects. The results of this study do not explain the occurrence of these adverse events. Suggestions for future work However, more research on this topic needs to be undertaken before the association between X and Y is more clearly understood. Further research should be done to investigate the ...... Research questions that could be asked include ..... Future studies on the current topic are therefore recommended. A further study with more focus on X is therefore suggested. Further studies, which take these variables into account, will need to be undertaken. Further work is required to establish this. In future investigations it might be possible to use a different X in which ...... This is an important issue for future research. 
Writing Conclusions Conclusions are shorter sections of academic texts which usually serve two functions. The first is to summarise and bring together the main areas covered in the writing, which might be called "looking back"; and the second is to give a final comment or judgement on this. The final comment may also include making suggestions for improvement and speculating on future directions. In dissertations and research papers, conclusions tend to be more complex and will also include sections on significance of the findings and recommendations for future work. Conclusions may be optional in research articles where consolidation of the study and general implications are covered in the Discussion section. However, they are usually expected in dissertations and essays. Summarising the content This paper has given an account of and the reasons for the widespread use of X ...... This essay has argued that X is the best instrument to ...... This assignment has explained the central importance of X in Y. This dissertation has investigated ...... Restatement of aims (research) This study set out to determine ...... The present study was designed to determine the effect of ....... In this investigation, the aim was to assess ...... The purpose of the current study was to determine ...... This project was undertaken to design ...... and evaluate ..... Returning to the hypothesis/question posed at the beginning of this study, it is now possible to state that ..... Summarising the findings (research) This study has shown that ...... These findings suggest that in general ...... One of the more significant findings to emerge from this study is that ..... It was also shown that...... This study has found that generally ....... The following conclusions can be drawn from the present study ...... The relevance of X is clearly supported by the current findings. This study/research has shown that ...... The second major finding was that ........ 
The results of this investigation show that ....... The most obvious finding to emerge from this study is that ...... X, Y and Z emerged as reliable predictors of ...... Multiple regression analysis revealed that the ...... Suggesting implications The evidence from this study suggests that ...... The results of this study indicate that ...... The results of this research support the idea that ....... In general, therefore, it seems that ...... Taken together, these results suggest that ...... An implication of this is the possibility that ...... The findings of this study suggest that ...... Significance of the findings (research contribution) The X that we have identified therefore assists in our understanding of the role of ...... These findings enhance our understanding of ...... This research will serve as a base for future studies and ...... The current findings add substantially to our understanding of ...... The current findings add to a growing body of literature on ...... The study has gone some way towards enhancing our understanding of ...... The methods used for this X may be applied to other Xs elsewhere in the world. The present study, however, makes several noteworthy contributions to...... The empirical findings in this study provide a new understanding of …… The findings from this study make several contributions to the current literature. First,…… The present study provides additional evidence with respect to …… Taken together, these findings suggest a role for X in promoting Y. The present study confirms previous findings and contributes additional evidence that suggests .... . Whilst this study did not confirm X, it did partially substantiate ....... Limitations of the current study (research) Finally, a number of important limitations need to be considered. First, ...... A number of caveats need to be noted regarding the present study. The most important limitation lies in the fact that ...... The current investigation was limited by ...... 
The current study was unable to analyse these variables. The current research was not specifically designed to evaluate factors related to ...... The current study has only examined ...... The project was limited in several ways. First, the project used a convenience sample that ...... However, with a small sample size, caution must be applied, as the findings might not be transferable to ...... The sample was nationally representative of X but would tend to miss people who were ...... A limitation of this study is that the numbers of patients and controls were relatively small. Thirdly, the study did not evaluate the use of ...... However, these findings are limited by the use of a cross-sectional design. Our findings in this report are subject to at least three limitations. First, these data apply only to ….. An issue that was not addressed in this study was whether ….. One source of weakness in this study which could have affected the measurements of X was that …… Several limitations to this pilot study need to be acknowledged. The sample size is ...... The main weakness of this study was the paucity of …… Recommendations for further work (research) This research has thrown up many questions in need of further investigation. Further work needs to be done to establish whether ...... It is recommended that further research be undertaken in the following areas: Further experimental investigations are needed to estimate ...... What is now needed is a cross-national study involving ...... More broadly, research is also needed to determine ..... It is suggested that the association of these factors is investigated in future studies. Further research might explore/investigate ...... Further research in this field/regarding the role of X would be of great help in ....... Further investigation and experimentation into X is strongly recommended. A number of possible future studies using the same experimental set-up are apparent. 
It would be interesting to assess the effects of ...... More information on X would help us to establish a greater degree of accuracy on this matter. If the debate is to be moved forward, a better understanding of ...... needs to be developed. I suggest that before X is introduced, a study similar to this one should be carried out on ..... These findings provide the following insights for future research: ..... Considerably more work will need to be done to determine ...... Future trials should assess a full selective decontamination regimen including ...... More research is needed to better understand when implementation ends and ....... It would be interesting to compare experiences of individuals within the same … group. A further study could assess …... A future study investigating …... would be very interesting. The issue of X is an intriguing one which could be usefully explored in further research. Future research should therefore concentrate on the investigation of …... Large randomised controlled trials could provide more definitive evidence. Implications/recommendations for practice or policy These findings suggest several courses of action for ...... An implication of these findings is that both X and Y should be taken into account when ...... The findings of this study have a number of important implications for future practice. There is, therefore, a definite need for ...... There are a number of important changes which need to be made. Another important practical implication is that ...... Moreover, more X should be made available to ...... Other types of X could include: a), b). ...... Unless governments adopt X, Y will not be attained. This information can be used to develop targeted interventions aimed at ...... A reasonable approach to tackle this issue could be to ...... 
Writing Definitions In academic work students are often expected to give definitions of key words and phrases in order to demonstrate to their tutors that they understand these terms clearly. Academic writers generally, however, define terms so that their readers understand exactly what is meant when certain key terms are used. When important words are not clearly understood misinterpretation may result. In fact, many disagreements (academic, legal, diplomatic, personal) arise as a result of different interpretations of the same term. In academic writing, teachers and their students often have to explore these differing interpretations before moving on to study a topic. Introductory phrases: It is necessary here to clarify exactly what is meant by ..... This shows a need to be explicit about exactly what is meant by the word X. X is a term frequently used in the literature, but to date there is no consensus about ...... Simple three-part definitions A university is an institution where knowledge is "produced" and passed on to others. Social Economics may be broadly defined as the branch of economics concerned with the measurement, causes and consequences of social problems. Research may be defined as a systematic process which consists of three elements or components: (1) a question, problem, or hypothesis, (2) data, and (3) analysis and interpretation of data. General meanings / application of meanings: The term X has come to be used to refer to ...... The term X is generally understood to mean ...... The term X has been applied to situations where students ...... In broad biological terms, X can be defined as any stimulus that is ....... The broad use of the term X is sometimes equated with ...... The term disease refers to a biological event characterised by ....... In the literature, the term tends to be used to refer to ...... X can be defined as ...... It encompasses ...... The term X is a relatively new name for a Y, commonly referred to...…. 
X can be loosely described as a correlation. Indicating difficulties in defining a term: In the field of language teaching, various definitions of fluency are found. Fluency is a commonly used notion in language learning and yet it is a concept difficult to define precisely . A generally accepted definition of fluency is lacking. Smith (2001) identified four abilities that might be subsumed under the term fluency: a) ..... The term poststructuralism embodies a multitude of concepts which ...... Although differences of opinion still exist, there appears to be some agreement that X refers to ...... Specifying terms that are used in an essay/thesis: In this essay the term overseas student will be used in its broadest sense to refer to all students who ...... Throughout this thesis, the term education is used to refer to informal systems as well as formal systems. While a variety of definitions of the term X have been suggested , this paper will use the definition first suggested by Smith (1968) who saw it as ....... In this paper, the term that will be used to describe this phenomenon is X In this dissertation the terms X and Y are used interchangeably to mean ...... Referring to people's definitions (author prominent): Smith (1954) was apparently the first to use the term ...... Chomsky writes that a grammar is a 'device of some sort for producing the .....' (1957, p.11). According to a definition provided by Smith (2001:23), fluency is 'the maximally ...... The term "fluency" is used by Smith (2001) to refer to ...... Smith (2001) uses the term "fluency" to refer to ...... For Smith (2001), fluency means/refers to ....... Macro-stabilisation policy is defined by Smith (2003: 119) as "......................" Aristotle defines the imagination as "the movement which results upon an actual sensation." The term "matter" is used by Aristotle in four overlapping senses. First, it is the underlying ....... Secondly, it is the potential which ...... Smith et al. 
(2002) have provided a new definition of health: "health is a state of being with physical, cultural, psychological ....." In 1987, sports psychologist John Smith popularized the term X to describe ...... Referring to people's definitions (author non-prominent): Validity is the degree to which an assessment process or device measures what it is intended to measure (Smith et al., 1986) Giving Examples Writers may give specific examples as evidence to support their general claims or arguments. Examples can also be used to help the reader or listener understand unfamiliar or difficult concepts, and they tend to be easier to remember. For this reason, they are often used in teaching. Finally, students may be required to give examples in their work to demonstrate that they have understood a complex problem or concept. Many paragraphs in academic writing show development from general statements to specific details or examples. In most paragraphs, therefore, examples usually come after a more general statement, as in the short extract below. Many words can often acquire a more narrow meaning over time, or may come to be chiefly used in one special sense. A classic example of this practice is the word doctor. There were doctors (i.e., learned men) in theology, law, and many other fields beside medicine, but nowadays when we send for the doctor we mean a member of only one profession. Examples as the main information in a sentence: For example / instance, the word doctor used to mean a learned man. For example , Smith and Jones (2004) conducted a series of semi-structured interviews in ...... By way of illustration , Smith (2003) shows how the data for ..... A classic / well-known example of this is ....... An example of this is the study carried out by Smith (2004) in which ....... X is a good example / illustration of ....... X illustrates this point / shows this point clearly. This can be illustrated briefly by ....... Young people begin smoking for a variety of reasons. 
They may, for example, be influenced by their peers, or they may see their parents as role models. The evidence of X can be clearly seen in the case of ...... Another example of what is meant by X is ...... Diseases that can result at least in part from stress include arthritis, asthma, migraine, headaches and ulcers. Examples as additional information in a sentence: Young people begin smoking for a variety of reasons, such as pressure from peers and the role model of parents. Pavlov found that if some other stimulus, for example the ringing of a bell, preceded the food, the dog would start salivating. In Paris, Gassendi kept in close contact with many other prominent scholars such as Kepler, Galileo, Hobbes, and Descartes. The prices of resources, such as copper, iron ore, oil, coal and aluminium, have declined in real terms over the past 20 years. Many diseases can result at least in part from stress, including: arthritis, asthma, migraine, headaches and ulcers. Classifying and Listing When we classify things, we group and name them on the basis of something that they have in common. By doing this we can understand certain qualities and features which they share as a class. Classifying is also a way of understanding differences between things. In writing, classifying is often used as a way of introducing a reader to a new topic. Along with writing definitions, the function of classification may be used in the early part of an essay, or longer piece of writing. We list things when we want to treat and present a series of items or different pieces of information systematically. A list is a series of items. The order of a list may indicate rank or importance. General Classifications: X may be divided into three main classes / sub-groups / categories. X may be classified on the basis of / according to / depending on / in terms of Y into Xi and Xii. Bone is generally classified into two types: cortical bone, also known as ....., and cancellous bone or ...... 
Aristotle's systematic treatises may be grouped in several divisions: logic, psychological works, physical ...... The works of Aristotle fall under three headings: (1) dialogues and ......; (2) collections of facts and ......; and (3) systematic works. There are two basic approaches currently being adopted in research into X. One is the Y approach and the other is ..... Associative learning can be categorised into classical and operant conditioning. Classical conditioning was first ...... Generally, spectratyping provides two types of information: band intensity pattern and band number. Specific Classifications: In the U.S. system, X is graded according to whether ..... / on the basis of ...... / in terms of ...... Smith (1966) divided / classified / grouped Xs into two broad types: Xi's and Xii's. Thomas and Nelson (1996) describe four basic types of validity: logical, content, criterion and construct. Smith and Jones (2003) argue that there are two broad categories of Y, which are: a) ...... and b) .... For Aristotle, motion is of four kinds: (1) motion which ......; (2) motion which ......; (3) motion which ......; and (4) motion which ....... Introducing Lists: The key aspects of management can be listed as follows: There are three reasons why the English language has become so dominant. These are: There are two types of effect which result when a patient undergoes X. These are ...... Appetitive stimuli have three separable basic functions. Firstly, they ....... Secondly, they ...... The disadvantages of the new approach can be discussed under three headings, which are: ...... This topic can best be treated under three headings: X, Y and Z. This section has been included for several reasons: it is ......; it illustrates ......; and it describes ....... The "Mass for Four Voices" consists of five movements, which are: the Kyrie, Gloria, Credo, Sanctus and Agnus Dei. The "Three Voices for Mass" is divided into six sections. These are: the Kyrie, Gloria, ....... 
Referring to other people's lists: Smith (2003) suggests three conditions for its acceptance. Firstly, X should be ..... Secondly, it needs to be.... Thirdly, ..... Smith and Jones (1991) list X, Y and Z as the major causes of infant mortality. Smith and Jones (2003) argue that there are two broad categories of Y, which are: a) ...... and b) .... For Aristotle, motion is of four kinds: (1) motion which ......; (2) motion which ......; (3) motion which ......; and (4) motion which ....... Smith (2003) lists the main features of X as follows: it is X; it is Y; and has Z. Describing Causes and Effects A great deal of academic work involves understanding and suggesting solutions to problems. At postgraduate level, particularly in applied fields, students search out problems to study. In fact, one could say that problems are the food for a significant proportion of academic activity. However, solutions cannot be suggested unless the problem is fully analysed, and this involves a thorough understanding of the causes. Some of the language that you may find useful for explaining causes and effects is listed below: Verbs expressing causality: Lack of protein may cause / can lead to / can result in mental retardation. Low levels of chlorine in the body can give rise to high blood pressure. Much of the instability stems from the economic effects of the war. Kwashiorkor is a disease caused by / resulting from / stemming from insufficient protein. Beri-beri is a disease caused by / resulting from / stemming from vitamin deficiency. Scurvy is a disease caused by / resulting from / stemming from lack of vitamin C. Nouns expressing causality: The most likely causes of X are poor diet and lack of exercise. A consequence of vitamin A deficiency is blindness. Physical activity is an important factor in maintaining fitness. Many other medications have an influence on cholesterol levels. Another reason why Xs are considered to be important is that ....... 
Prepositional phrases expressing causality: 200,000 people per year become deaf owing to / because of / as a result of a lack of iodine. Sentence connectors expressing causality: If undernourished and retarded children do survive to become adults, they have decreased learning ability. Therefore, / Consequently, / Because of this, / As a result (of this), when they grow up, it will probably be difficult for them to find work. Adverbial phrases expressing causality: Malnutrition leads to illness and a reduced ability to work in adulthood, thus/thereby perpetuating the poverty cycle. The warm air rises above the surface of the sea, thus/thereby creating an area of low pressure. Other examples: As a consequence of X, it appears that winds alone are not the causative factor of ....... Due to X and Y, inflowing surface water becomes more dense as it ....... X and Y are important driving factors of Z. The mixing of X and Y exerts a powerful effect upon Z through ...... Possible cause and effect relationships (expressed tentatively): This suggests a weak link may exist between X and Y. The human papilloma virus is linked to most cervical cancer. Stomach cancer in many cases may be associated with certain bacterial infections. A high consumption of seafood could be associated with infertility. There is some evidence that X may affect Y. Comparing and Contrasting By understanding similarities and differences between two things, we can increase our understanding and learn more about both. This usually involves a process of analysis, in which we compare the specific parts as well as the whole. Comparison may also be a preliminary stage of evaluation. For example, by comparing specific aspects of A and B, we can decide which is more useful or valuable. Many paragraphs whose function is to compare or contrast will begin with an introductory sentence expressed in general terms. Note the introductory sentences below: Introductory Sentences: Differences X is different from Y in a number of respects. 
There are a number of important differences between X and Y. X differs from Y in a number of important ways . Smith (2003) found distinct differences between X and Y. Women and men differ not only in physical attributes but also in the way in which they ...... Introductory Sentences: Similarities The mode of processing used by the right brain is similar to that used by the left brain. The mode of processing used by the right brain is comparable in complexity to that used by the left brain. The effects of nitrous dioxide on human health are similar to those of ground level ozone. Both X and Y generally take place in a "safe environment". There are a number of similarities between X and Y. Numerous studies have compared the brain cells in man and animals and found that the cells are essentially identical. Comparison within one sentence In contrast to oral communities, it is very difficult to get away from calendar time in literate societies. Compared with people in oral cultures, people in literate cultures organise their lives around clocks and calendars. Oral societies tend to be more concerned with the present, whereas literate societies have a very definite awareness of the past. Whereas Ghazali rejected non-Islamic philosophers, Aquinas incorporated ancient Greek thought into his own philosophical writings. Women's brains process language simultaneously in the two sides of the brain, while men tend to process it in the left side only. This interpretation contrasts with that of Smith and Jones (2004) who argue that ...... Comparison within one sentence (comparative forms) Women are faster/slower than men at certain precision manual tasks, such as placing pegs in holes on a board. Women tend to perform better/worse than men on tests of perceptual speed. Further, men are more/less accurate in tests of target-directed motor skills. The corpus callosum, a part of the brain connecting the two hemispheres, may be more/less extensive in women. 
Women are more/less likely than men to suffer aphasia when the front part of the brain is damaged. Adolescents are less likely to be put to sleep by alcohol than adults. Women tend to have greater/less verbal fluency than men. Men learned the route in fewer trials and made fewer errors than did women. Comparison across two sentences: It is very difficult to get away from calendar time in literate societies. By contrast / In contrast, many people in oral communities have little idea of the calendar year of their birth. Tests show that women generally can recall lists of words or paragraphs of text better than men. On the other hand, men usually perform better on tests that require the ability to mentally rotate an image in order to solve a problem. Young children learning their first language need simplified, comprehensible input. Similarly, low level adult L2 learners need graded input supplied in most cases by a teacher. Speech functions are less likely to be affected in women because the critical area is less often affected. A similar pattern emerges in studies of the control of hand movements. Writing about the Past Writing about the past in English is made difficult by the rather complex tense system. However, the phrases grouped below give an indication of the uses of the main tenses in academic writing. For a comprehensive explanation of the uses of the various tenses you will need to consult a good English grammar book. A good recommendation is Practical English Usage by Michael Swan, OUP. Time phrases associated with the use of the simple past tense (specific times or periods of time in the past completed): For centuries, / In the second half of the 19th century, / At the end of the nineteenth century, church authorities placed restrictions on academics. During the Nazi period, / Between 1933 and 1945, / From 1933 to 1945, / In the 1930s and 1940s, restrictions were placed on German academics. 
Reference to single investigations or publications in the past: simple past tense used The first systematic study of the X was reported by Patel et al. in 1986. Erythromycin was originally isolated from X in a soil sample from ...... (Wang et al., 1952). In 1975, Smith et al. published a paper in which they described ..... In 1990 Patel et al. demonstrated that replacement of H2O with heavy water led to ...... Thirty years later, Smith (1974) reported three cases of Candida Albicans which ....... In the 1950s Gunnar Myrdal pointed to some of the ways in which …………… (Myrdal, 1957) In 1981, Smith and co workers demonstrated that X induced in vitro resistance to ....... In 1984 Jones et al. made several amino acid esters of X and evaluated them as water-soluble pro-drugs. An experimental demonstration of this effect was first carried out by ...... The first experimental realisation of ......, by Smith et al. , used a ...... Time phrases associated with the use of the present perfect tense (for situations/actions which began in the past and continue up to the present, or for which the period of time is unspecified): Over the past few decades, the world has seen the stunning transformation of X, Y and Z. Since 1965, these four economies have doubled their share of world production and trade. Until recently, there has been little interest in X. Recently, these questions have been addressed by researchers in many fields. In recent years researchers have investigated a variety of approaches to X but .... Up to now, the research has tended to focus on X rather than on Y. To date, little evidence has been found associating X with Y. So far, three factors have been identified as being potentially important: X, Y, and Z. The present perfect tense may also be used to describe recent research or scholarly activity with focus on the area of enquiry - usually more than one study There have been several investigations into the causes of illiteracy (Smith, 1985; Jones, 1987). 
The relationship between a diet high in fats and poor health has been widely investigated (Smith, 1985; Jones, 1987; Johnson, 1992). The new material has been shown to enhance cooling properties (Smith, 1985; Jones, 1987; Johnson, 1992). Invasive plants have been identified as major contributing factors for the decline of many North American species (1). A considerable amount of literature has been published on X. Describing Trends and Projections A trend is a description of change over time. A projection is a prediction of future change. Trends and projections are usually illustrated using line graphs in which the horizontal axis represents time. Some of the language commonly used for writing about trends and projections is given below. Describing trends: The graph shows that there has been a slight / gradual / steady / marked / steep / sharp increase / rise / decrease / fall / decline / drop in the number of divorces in England and Wales since 1981. Describing high and low points in figures: The number of live births outside marriage reached a peak during the Second World War. The peak age for committing a crime is 18. Oil production peaked in 1985. Gas production reached a (new) low in 1990. Projecting trends: The number of Xs / The amount of Y / The rate of Z is projected to / is expected to / is likely to / will probably decline steadily / drop sharply / level off after 2010. Describing Quantities Describing ratios and proportions: The proportion of live births outside marriage reached one in ten in 1945. The annual birth rate dropped from 44.4 to 38.6 per 1000 per annum. Describing fractions: Of the 148 patients who completed the questionnaire, just over half indicated that ....... The response rate was 60% at six months and 56% at 12 months. Over half of those surveyed indicated that ...... 70% of those who were interviewed indicated that ..... Approximately half of those surveyed did not comment on ...... Nearly half of the respondents (48%) agreed that ...... 
Less than a third of those who responded (32%) indicated that ...... The number of first marriages in the United Kingdom fell by nearly two-fifths. Describing percentages: 13.1% of young men and 23.1% of young women who had married said that they ...... Returned surveys from 34 radiologists yielded a 34% response rate. The response rate was 60% at six months and 56% at 12 months. East Anglia had the lowest proportion of lone parents at only 14 per cent. Since 1981, England has experienced an 89% increase in crime. The mean income of the bottom 20 percent of U.S. families declined from $10,716 in 1970 to ....... A study in Java found that of 2,558 abortions, 58% were in young women aged 15-24, of whom 62% were ..... He also noted that less than 10% of the articles included in his study cited ...... In 1960 just over 5% of live births were outside marriage. Describing averages: This figure can be seen as the average life expectancy at various ages. The proposed model suggests a steep decline in mean life expectancy ...... Roman slaves probably had a lower than average life expectancy. The average of 12 observations in the X, Y and Z is 19.2 mgs/m ..... The mean score for the two trials was subjected to multivariate analysis of variance to determine ...... The mean income of the bottom 20 percent of U.S. families declined from $10,716 in 1970 to ....... Describing ranges: The evidence shows that life expectancy from birth lies in the range of twenty to thirty years. Between 575 and 590 metres depth the sea floor is extremely flat, with an average slope of only 1 : 400. The mean income of the bottom 20 percent of U.S. families declined from $10,716 in 1970 to $9,833 in 1990. The respondents had practiced for an average of 15 years (range 6 to 35 years). The participants were aged 19 to 25 and were from both rural and urban backgrounds. They calculated ranges of journal use from 10.7%–36.4% for the humanities, 25%–57% for the ...... 
Rates of decline ranged from 2.71 to 0.08 cm day⁻¹ (Table 11) with a mean of 0.97 cm day⁻¹. It has been estimated that 300,000 people suffer from ......
Corpus Linguistics 2013 International Conference Lancaster University, UK – 22nd to 26th July 2013 The seventh international Corpus Linguistics conference (CL2013) will be held at Lancaster University from Tuesday 23rd July 2013 to Friday 26th July 2013. The main conference will be preceded by a workshop day on Monday 22nd July. The conference is hosted by the UCREL research centre, which brings together the Department of Linguistics and English Language with the School of Computing and Communications at Lancaster. About the conference The goals of the conference are as follows. To gather together current and developing research in the study and application of corpus linguistics; To push the field forwards by promoting dialogue among the many different users of corpora across interconnected sub-disciplines of linguistics – be they descriptive, theoretical, applied or computational; To explore new challenges both within corpus linguistics, and in the extension of corpus approaches to new fields of study. With these goals in mind, we invite contributions on as broad and inclusive a basis as possible. The areas in which we particularly welcome submissions include but are not limited to: Critical explorations of existing measures and methods in corpus linguistics; New methods and techniques in corpus development, annotation and analysis; Corpus approaches to the study of new media; New tools and techniques developed in corpus-based computational linguistics; The application of corpus approaches in the social sciences and humanities; The extension of corpus linguistics to an ever-wider range of (non-English) languages; The interface between corpus and theory; The use of corpora in discourse analysis; The use of corpora in second language acquisition studies and language pedagogy. 
Plenary speakers We are delighted to announce that the following speakers have accepted our invitation to give plenary lectures at CL2013: Karin Aijmer, Guy Cook, Michael Hoey and Ute Römer. Key dates: 31st October 2012 – abstract submission opens via conference website; 31st October 2012 – deadline for proposals for workshops; 8th January 2013 – deadline for abstract submission; 15th February 2013 – notification of the outcome of peer review, early bird registration opens; 1st April 2013 – early bird registration closes; 30th June 2013 – final deadline for registration; 22nd / 23rd July 2013 – workshop day / main conference begins. Conference website: http://ucrel.lancs.ac.uk/cl2013/
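The 14th Chinese Lexical Semantic Workshop (CLSW2013): Call for Papers. 10–12 May 2013, Zhengzhou, China. http://nlp.zzu.edu.cn/CLSW2013/index.html
The Chinese Lexical Semantic Workshop (CLSW) was jointly initiated by Academician Chin-Chuan Cheng of Academia Sinica (Taiwan), Professor Yu Shiwen of Peking University, Professor Chu-Ren Huang of The Hong Kong Polytechnic University, and others. Since 2000 it has been held for 13 consecutive years, in Hong Kong, Beijing, Taipei, Singapore, Xiamen, Hsinchu, Yantai, Suzhou, Wuhan and elsewhere. CLSW is an important academic meeting for Chinese lexical semantics and related fields (such as theoretical linguistics, applied linguistics, computational linguistics and computer-aided lexicography). It has become an established series with wide influence and has advanced both academic research and application development in the field. The 14th workshop (CLSW2013) will be held in 2013 at Zhengzhou University, Zhengzhou, Henan Province, China. CLSW2013 cordially invites you to submit your work. The workshop will recommend outstanding papers for publication in journals such as the Journal of Chinese Information Processing (《中文信息学报》); the post-conference English proceedings will be published by Springer (LNAI) and indexed by EI. Details are as follows:
1. Dates: 10–12 May 2013
2. Venue: School of Information Engineering, Zhengzhou University
3. Scope: CLSW2013 covers the theory, methods, computation and applications of Chinese lexical semantics. We invite previously unpublished papers reporting original research in Chinese lexical semantics, in areas including but not limited to the following:
- the latest advances in all aspects and areas of lexical semantics, for example senses, sememes, semantic primitives, concept classification systems, semantic features, semantic networks, contrastive studies of Chinese and foreign-language vocabulary, and the relationship between lexical semantics and syntactic semantics;
- corpus construction and the theory, techniques, tools, methods and standards of semantic annotation;
- construction of basic Chinese lexical resources of all kinds (dictionaries, thesauri, ontologies and so on), such as the Comprehensive Language Knowledge Base (CLKB) and HowNet;
- representation, computation and inference mechanisms for Chinese lexical semantics;
- applications of Chinese lexical semantics in natural language processing, including information extraction, information retrieval, question answering, machine translation and lexicography;
- new methods in Chinese lexical semantics, including research on machine learning, evolutionary computation and neural networks.
4. Important dates: paper submission deadline, 20 January 2013; notification of acceptance, 22 February 2013; revised version due, 8 March 2013.
5. Working languages: Chinese and English
6. Submissions: Papers may be written in Chinese or English and should be 4-6 pages long (including figures, tables and references), submitted electronically in PDF or Word format. See the conference website for formatting details. Each author may submit only one paper as first author.
7. Contact: If you have any questions, please email clsw2013@zzu.edu.cn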
Jones fails to fully acknowledge the significance of ...... the paper would appear to be over ambitious in its claims. the author overlooks the fact that X contributes to Y. what Smith fails to do is to draw a distinction between ...... another weakness is that we are given no explanation of how ...... no attempt was made to quantify the association between X and Y. Offering constructive suggestions: Smith's paper / Her conclusions / The study / The findings would have been / might have been more / much more / far more useful / convincing / interesting / persuasive / original if he/she had / if the author had included ...... / considered ...... / adopted ...... / used ...... A better study would examine a large, randomly selected sample of societies with ...... A much more systematic study would identify how X interacts with other variables that are believed to be linked to ...... Highlighting inadequacies of previous studies: Most studies in the field of X have only focussed on ...... Most studies in X have only been carried out in a small number of areas. The generalisability of much published research on this issue is problematic. The experimental data are rather controversial, and there is no general agreement about ...... Such expositions are unsatisfactory because they ..... However, few writers have been able to draw on any structured research into the opinions and attitudes of ...... The research to date has tended to focus on X rather than Y. The existing accounts fail to resolve the contradiction between X and Y. Researchers have not treated X in much detail. Previous studies of X have not dealt with ...... However, these studies used non-validated methods to measure ..... Half of the studies evaluated failed to specify whether ...... However, much of the research up to now has been descriptive in nature ...... Although extensive research has been carried out on X, no single study exists which adequately covers ...... 
However, these results were based upon data from over 30 years ago and it is unclear if these differences still persist. Introducing other people's criticisms: However, Jones (2003) points out that ..... Many analysts now argue that the strategy of X has not been successful. Jones (2003), for example, argues that ..... Non-government agencies are also very critical of the new policies. The X theory has been vigorously / strongly challenged in recent years by a number of writers. Smith's analysis has been criticised by a number of writers. Jones (1993), for example, points out that ...... Smith's meta-analysis has been subjected to considerable criticism. The most important of these criticisms is that Smith failed to note that ...... Jones (2003) is probably the best known critic of the X theory. He argues that ...... The latter point has been devastatingly critiqued by Jones (2003). Critics have also argued that not only do social surveys provide an inaccurate measure of X, but the ...... Critics question the ability of poststructuralist theory to provide ...... More recent arguments against X have been summarised by Smith and Jones (1982): Jones (2003) is critical of the conclusions that Smith draws from his findings. Describing Methods In the Methods section of a dissertation or research article, writers give an account of how they carried out their research. The Materials and Methods section should be clear and detailed enough for another experienced person to repeat the research and reproduce the results. Typical features with examples of this language are listed below. Describing different methods: To date various methods have been developed and introduced to measure X: In most recent studies, X is measured in four different ways. Radiographic techniques are the main non-invasive method used to determine .... Different authors have measured X in a variety of ways. Previous studies have based their criteria for selection on ...... 
A variety of methods are used to assess X. Each has its advantages and drawbacks. Data were gathered from multiple sources at various time points during the 2007–2008 academic year. Giving reasons why a particular method was adopted: The semi-structured approach was chosen because ...... Smith et al. (1994) identify several advantages of the case study, ....... It was decided that the best method to adopt for this investigation was to ...... A case study approach was chosen to allow a ...... The design of the questionnaires was based on ...... The X method is one of the more practical ways of ...... It was considered that quantitative measures would usefully supplement and extend the qualitative analysis. Many of the distributions were not normal so non-parametric signed rank tests were run. The X approach has a number of attractive features: ...... Indicating a specific method: Article references were searched further for additional relevant publications. Articles were searched from January 1965 until April 2008. Publications were only included if ...... X was prepared according to the procedure used by Patel et al. (1957). The synthesis of X was done according to the procedure of Smith (1973). X was synthesised using the same method that was detailed for Y, using ...... This compound was prepared by adapting the procedure used by Zhao et al. (1990). For this study the X was used to explore the subsurface ...... An alternative method for making scales homogenous is by using ...... Describing the characteristics of the sample: The initial sample consisted of 200 students of whom 13 did not complete all of the interviews. All studies described as using some sort of X procedure were included in the analysis. A systematic literature review was conducted of studies that ..... All of the participants were aged between 18 and 19 at the beginning of the study ..... Two groups of subjects were interviewed, namely X and Y. The first group were ...... 
A random sample of patients with ...... was recruited from ....... Forty-seven students studying X were recruited for this study. The students were divided into two groups based on their performance on ...... The project used a convenience sample of 32 first year modern languages students. Just over half the sample (53%) was female, of whom 69% were ...... Participants were recruited from 15 clinics across ......, covering urban and rural areas …… Eligibility criteria required individuals to have received …. Five individuals were excluded from the study on the basis of …. Eligible women who matched the selection criteria were identified by …… Semi-structured interviews were conducted with 17 male offenders with a mean age of 38 years. A comparison group of 12 male subjects without any history of X was drawn from a pool of ……. Indicating reasons for sample characteristics A small sample was chosen because of the expected difficulty of obtaining ...... The subjects were selected on the basis of a degree of homogeneity of their ....... Criteria for selecting the subjects were as follows: Describing the process: infinitive of purpose In order to identify the T10 and T11 spinous processes, the subjects were asked to ...... In order to understand how X regulates Y, a series of transfections was performed. To enable the subjects to see the computer screen clearly, the laptop was configured with ...... To see if the two methods gave the same measurement, the data was plotted and ...... To control for bias, measurements were carried out by another person. To measure X, a question asking ...... was used. To determine whether ......, KG-1 cells were incubated for ...... To establish whether ......, To increase the reliability of measures, each X was tested twice with a 4-min break between ....... To compare the scores three weeks after initial screening, a global ANOVA F-test was used. The vials were capped with ..... to prevent volatilisation. 
In an attempt to make each interviewee feel as comfortable as possible, the interviewer ...... Describing the process: other phrases expressing purpose For the purpose of height measurement, subjects were asked to stand ..... For the purpose of analysis, 2 segments were extracted from each ...... For the estimation of protein concentration, 100 μL of protein sample was mixed with ...... Describing the process: typical verbs (note use of passive form) Data management and analysis were performed using SPSS 8.0 (1999). Published studies were identified using a search strategy developed in ..... The experiments were carried out over the course of the growing period from ....... Injection solutions were coded by a colleague to reduce experimenter bias. Drugs were administered by icv injection under brief CO2 narcosis. The mean score for the two trials was subjected to multivariate analysis of variance to determine ...... The subjects were asked to pay close attention to the characters whenever ...... Prompts were used as an aid to question two so that ...... The pilot interviews were conducted informally by the trained interviewer ...... Blood samples were obtained with consent, from 256 Caucasian male patients ...... Independent tests were carried out on the x and y scores for the four years from ...... This experiment was repeated under conditions in which the poor signal/noise ratio was improved. Significance levels were set at the 1% level using the Student's t-test. A total of 256 samples were taken from 52 boreholes (Figure 11). Describing the process: sequence words/phrases Prior to commencing the study, ethical clearance was sought from ...... In the end, the EGO was selected as the measurement tool for the current study. After "training", the subjects were told that the characters stood for X and that their task was to ....... After collection, the samples were shipped back to X in ...... After conformational analysis of X, it was necessary to ...... 
Once the Xs were located and marked, a thin clear plastic ruler ...... Once the positions had been decided upon, the Xs were removed from each Y and replaced by ..... Once the exposures were completed, the X was removed from the Y and placed in ...... On completion of X, the process of model specification and parameter estimation was carried out. Following this, the samples were recovered and stored overnight at ...... These ratings were then made for the ten stimuli to which the subject had been exposed ...... The analysis was checked when initially performed and then checked again at the end of ...... The subjects were then shown a film individually and were asked to ...... The soil was then weighed again, and this weight was recorded as ...... The results were corrected for the background readings and then averaged before being converted to ...... Finally, questions were asked as to the role of ...... Describing the process: adverbs of manner The soil was then placed in a furnace and gradually heated up to ..... The vials were shaken manually to allow the soil to mix well with the water. The medium was then aseptically transferred to a conical flask. The resulting solution was gently mixed at room temperature for ten minutes and ...... A sample of the concentrate was then carefully injected into ...... The tubes were accurately reweighed to six decimal places using ...... Describing the process: passive verb + using .... for instruments 15 subjects were recruited using email advertisements requesting healthy students from ...... All the work on the computer was carried out using Quattro Pro for Windows and ...... Data were collected using two high spectral resolution spectroradiometers. The data was recorded on a digital audio recorder and transcribed using a ....... Semi-automated genotyping was carried out using X software and .... Statistical significance was analysed using analysis of variance and t-tests as appropriate. 
Comparisons between the two groups were made using unrelated t-tests. Using the X-ray and looking at the actual X, it was possible to identify ...... Using an Anthos Microplate Reader, we were able to separate single cells into different ...... Describing the process: giving detailed information Compounds 3 and 5 were dissolved in X at apparent pH 2.5 to give concentrations of 4 mM ..... ...... and the solutions were degraded at 55°C or 37°C for a total time of 42 hours. At intervals of 0.5 min, 50 μL of the X was aliquoted into 0.5 mL of cooled boric acid buffer (pH 7.5) to ...... Indicating problems or limitations In this investigation there are several sources of error. The main error is ...... Another major source of uncertainty is in the method used to calculate X. It was not possible to investigate the significant relationships of X and Y further because the sample size was too small. Further data collection is required to determine exactly how X affects Y. Reporting Results The standard approach to this section of a dissertation is to merely present the results, without elaborate discussion or comment. This does not mean that you do not need any text to describe data presented in tables and figures. Writers usually comment on the significant data presented in the tables and figures. This often takes the form of the location or summary statement, which identifies the table or figure and indicates its content. This is normally followed by a statement or statements which point out and describe the relevant or significant data. All your tables should be numbered and given a title. More elaborate commentary on the results is normally restricted to the Discussion section. In research articles, however, authors may comment extensively on their results as they are presented, and it is not uncommon for the Results section to be combined with the Discussion section under the heading: Results and Discussion. 
Reference to aim/method To assess X, the Y questionnaire was used. To distinguish between these two possibilities, ...... To compare the scores three weeks after initial screening, a global ANOVA F-test was used. In order to assess Z, a repeated-measures ANOVA was used. Regression analysis was used to predict the ...... Changes in X and Y were compared using ...... The average scores of X and Y were compared in order to ...... Nine items on the questionnaire measured the extent to which ...... The correlation between X and Y was tested. The first set of analyses examined the impact of ...... Simple statistical analysis was used to ...... A scatter diagram and a Pearson's product moment correlation were used to determine the relationship between ...... T-tests were used to analyse the relationship between ...... Comparisons between the two groups were made using unrelated t-tests. Location and summary statements: Table 1 / Figure 1 shows / compares / presents / provides the experimental data on X / the results obtained from the preliminary analysis of X / the intercorrelations among the nine measures of X. The results obtained from the preliminary analysis of X are shown / can be compared / are presented in Table 1 / in Fig. 1. As shown in Figure 12.1, / As can be seen from the table (above), / It can be seen from the data in Table 12.1 that / From the graph above we can see that the X group reported significantly more Y than the other two groups. The table below illustrates / The pie chart above shows some of the main characteristics of the ...... / the breakdown of ...... Highlighting significant data in a table/chart It is apparent from this table that very few ...... This table is quite revealing in several ways. First, unlike the other tables ...... Data from this table can be compared with the data in Table 4.6 which shows ...... From the data in Figure 9, it is apparent that the length of time left between ...... 
From this data we can see that Study 2 resulted in the lowest value of ...... The histogram in Fig. 1 indicates that ...... What is interesting in this data is that ...... In Fig. 10 there is a clear trend of decreasing ...... As Table III shows, there is a significant difference (t = -2.15, p = 0.03) between the two groups. Statements of result (positive) Strong evidence of X was found when ...... This result is significant at the p = 0.05 level. There was a significant positive correlation between ...... There was a significant difference between the two conditions ...... On average, Xs were shown to have ...... The mean score for X was ...... Interestingly, for those subjects with X, ...... A positive correlation was found between X and Y. The results, as shown in Table 1, indicate that …. Further analysis showed that ...... Further statistical tests revealed ..... Statements of result (negative) There was no increase of X associated with ..... There were no significant differences between ...... No significant differences were found between ..... No increase in X was detected. No difference greater than X was observed. The Chi-square test did not show any significant differences between ...... None of these differences were statistically significant. Overall, X did not affect males and females differently in these measures. No significant reduction in X was found with Y compared with placebo. A clear benefit of X in the prevention of Y could not be identified in this analysis. Highlighting significant, interesting or surprising results The most striking result to emerge from the data is that ...... Interestingly, this correlation is related to ..... The correlation between X and Y is interesting because ...... The more surprising correlation is with the ...... The single most striking observation to emerge from the data comparison was ...... Reporting results from questionnaires and interviews The response rate was 60% at six months and 56% at 12 months. 
Of the study population, 90 subjects completed and returned the questionnaire. Of the initial cohort of 123 students, 66 were female and 57 male. Thirty-two individuals returned the questionnaires. The majority of respondents/those who responded felt that ..... Over half of those surveyed reported that ...... 70% of those who were interviewed indicated that ...... Almost two-thirds of the participants (64%) said that ...... Approximately half of those surveyed did not comment on ...... A small number of those interviewed suggested that ...... Only a small number of respondents indicated that ...... Of the 148 patients who completed the questionnaire, just over half indicated that ....... A minority of participants (17%) indicated ...... In response to Question 1, most of those surveyed indicated that ...... The overall response to this question was very positive. When the subjects were asked ......, the majority commented that ..... Other responses to this question included ...... The overall response to this question was poor. Some participants expressed the belief that ….. One individual stated that …. And another commented ……. Transition statements Turning now to the experimental evidence on ...... Comparing the two results, it can be seen that ...... A comparison of the two results reveals ...... If we now turn to ...... Discussions The term discussion has a variety of meanings in English. In academic writing, however, it usually refers to two types of activity: a) considering both sides of an issue, or question, b) considering the results of research and the implications of these. Discussion sections in dissertations and research articles are probably the most complex in terms of their elements. The most common elements and some of the language that is typically associated with them are listed below: Background information (reference to literature or to research aim/question) A strong relationship between X and Y has been reported in the literature. 
Prior studies that have noted the importance of ...... In reviewing the literature, no data was found on the association between X and Y. As mentioned in the literature review, ...... Very little was found in the literature on the question of ..... This study set out with the aim of assessing the importance of X in ...... The third question in this research was ...... It was hypothesized that participants with a history of ...... The present study was designed to determine the effect of ...... Statements of result (usually with reference to results section) The results of this study show/indicate that ....... This experiment did not detect any evidence for ...... On the question of X, this study found that ...... The current study found that ...... The most interesting finding was that ...... Another important finding was that ..... The results of this study did not show that ....../did not show any significant increase in ...... In the current study, comparing X with Y showed that the mean degree of ...... In this study, Xs were found to cause ..... X provided the largest set of significant clusters of ...... It is interesting to note that in all seven cases of this study...... Unexpected outcome Surprisingly, X was found to ....... Surprisingly, no differences were found in ...... One unanticipated finding was that ..... It is somewhat surprising that no X was noted in this condition ...... What is surprising is that ...... Contrary to expectations, this study did not find a significant difference between ....... However, the observed difference between X and Y in this study was not significant. However, the ANOVA (one way) showed that these results were not statistically significant. This finding was unexpected and suggests that ...... Reference to previous research (support) This study produced results which corroborate the findings of a great deal of the previous work in this field. 
The findings of the current study are consistent with those of Smith and Jones (2001) who found ...... This finding supports previous research into this brain area which links X and Y. This study confirms that X is associated with ...... This finding corroborates the ideas of Smith and Jones (2008), who suggested that ...... This finding is in agreement with Smith's (1999) findings which showed ....... It is encouraging to compare this figure with that found by Jones (1993) who found that ..... There are similarities between the attitudes expressed by X in this study and those described by Smith (1987, 1995) and Jones (1986). These findings further support the idea of ..... Increased activation in the PCC in this study corroborates these earlier findings. These results are consistent with those of other studies and suggest that ...... The present findings seem to be consistent with other research which found ...... This also accords with our earlier observations, which showed that ...... Reference to previous research (contradict) However, the findings of the current study do not support the previous research. This study has been unable to demonstrate that ...... However, this result has not previously been described. In contrast to earlier findings, however, no evidence of X was detected. Although these results differ from some published studies (Smith, 1992; Jones, 1996), they are consistent with those of ...... These results differ from X's 2003 estimate of Y, but they are broadly consistent with earlier ..... Explanations for results: There are several possible explanations for this result. These differences can be explained in part by the proximity of X and Y. A possible explanation for this might be that ..... Another possible explanation for this is that ...... This result may be explained by the fact that ...../ by a number of different factors. It is difficult to explain this result, but it might be related to ...... 
It seems possible that these results are due to ...... The reason for this is not clear but it may have something to do with ...... It may be that these students benefitted from ...... This inconsistency/discrepancy may be due to ...... This rather contradictory result may be due to ...... These factors may explain the relatively good correlation between X and Y. There are, however, other possible explanations. The possible interference of X cannot be ruled out. The observed increase in X could be attributed to ..... The observed correlation between X and Y might be explained in this way. ..... Some authors 9,30 have speculated that ...... Since this difference has not been found elsewhere it is probably not due to ...... A possible explanation for some of our results may be the lack of adequate ...... Advising cautious interpretation These data must be interpreted with caution because ...... These results therefore need to be interpreted with caution. However, with a small sample size, caution must be applied, as the findings might not be transferable to ...... These findings cannot be extrapolated to all patients. Although exclusion of X did not reduce the effect on X, these results should be interpreted with caution. Suggesting general hypotheses The value of X suggests that a weak link may exist between ..... It is therefore likely that such connections exist between ..... It can thus be suggested that ...... It is possible to hypothesise that these conditions are less likely to occur in ...... It is possible/likely/probable therefore that ...... Hence, it could conceivably be hypothesised that ...... These findings suggest that ...... It may be the case therefore that these variations ...... In general, therefore, it seems that ...... It is possible, therefore, that ...... Therefore, X could be a major factor, if not the only one, causing ...... 
It can therefore be assumed that the ...... This finding, while preliminary, suggests that …… Noting implications This finding has important implications for developing ..... An implication of this is the possibility that ...... One of the issues that emerges from these findings is ...... Some of the issues emerging from this finding relate specifically to ...... This combination of findings provides some support for the conceptual premise that ..... Commenting on findings However, these results were not very encouraging. These findings are rather disappointing. The test was successful as it was able to identify students who ...... The present results are significant in at least two major respects. The results of this study do not explain the occurrence of these adverse events. Suggestions for future work However, more research on this topic needs to be undertaken before the association between X and Y is more clearly understood. Further research should be done to investigate the ...... Research questions that could be asked include ..... Future studies on the current topic are therefore recommended. A further study with more focus on X is therefore suggested. Further studies, which take these variables into account, will need to be undertaken. Further work is required to establish this. In future investigations it might be possible to use a different X in which ...... This is an important issue for future research. Writing Conclusions Conclusions are shorter sections of academic texts which usually serve two functions. The first is to summarise and bring together the main areas covered in the writing, which might be called "looking back"; and the second is to give a final comment or judgement on this. The final comment may also include making suggestions for improvement and speculating on future directions. 
In dissertations and research papers, conclusions tend to be more complex and will also include sections on significance of the findings and recommendations for future work. Conclusions may be optional in research articles where consolidation of the study and general implications are covered in the Discussion section. However, they are usually expected in dissertations and essays. Summarising the content This paper has given an account of and the reasons for the widespread use of X ...... This essay has argued that X is the best instrument to ...... This assignment has explained the central importance of X in Y. This dissertation has investigated ...... Restatement of aims (research) This study set out to determine ...... The present study was designed to determine the effect of ....... In this investigation, the aim was to assess ...... The purpose of the current study was to determine ...... This project was undertaken to design ...... and evaluate ..... Returning to the hypothesis/question posed at the beginning of this study, it is now possible to state that ..... Summarising the findings (research) This study has shown that ...... These findings suggest that in general ...... One of the more significant findings to emerge from this study is that ..... It was also shown that...... This study has found that generally ....... The following conclusions can be drawn from the present study ...... The relevance of X is clearly supported by the current findings. This study/research has shown that ...... The second major finding was that ........ The results of this investigation show that ....... The most obvious finding to emerge from this study is that ...... X, Y and Z emerged as reliable predictors of ...... Multiple regression analysis revealed that the ...... Suggesting implications The evidence from this study suggests that ...... The results of this study indicate that ...... The results of this research support the idea that ....... 
In general, therefore, it seems that ...... Taken together, these results suggest that ...... An implication of this is the possibility that ...... The findings of this study suggest that ...... Significance of the findings (research contribution) The X that we have identified therefore assists in our understanding of the role of ...... These findings enhance our understanding of ...... This research will serve as a base for future studies and ...... The current findings add substantially to our understanding of ...... The current findings add to a growing body of literature on ...... The study has gone some way towards enhancing our understanding of ...... The methods used for this X may be applied to other Xs elsewhere in the world. The present study, however, makes several noteworthy contributions to...... The empirical findings in this study provide a new understanding of …… The findings from this study make several contributions to the current literature. First,…… The present study provides additional evidence with respect to …… Taken together, these findings suggest a role for X in promoting Y. The present study confirms previous findings and contributes additional evidence that suggests .... . Whilst this study did not confirm X, it did partially substantiate ....... Limitations of the current study (research) Finally, a number of important limitations need to be considered. First, ...... A number of caveats need to be noted regarding the present study. The most important limitation lies in the fact that ...... The current investigation was limited by ...... The current study was unable to analyse these variables. The current research was not specifically designed to evaluate factors related to ...... The current study has only examined ...... The project was limited in several ways. First, the project used a convenience sample that ...... However, with a small sample size, caution must be applied, as the findings might not be transferable to ...... 
The sample was nationally representative of X but would tend to miss people who were ...... A limitation of this study is that the numbers of patients and controls were relatively small. Thirdly, the study did not evaluate the use of ...... However, these findings are limited by the use of a cross sectional design. Our findings in this report are subject to at least three limitations. First, these data apply only to ….. An issue that was not addressed in this study was whether ….. One source of weakness in this study which could have affected the measurements of X was that …… Several limitations to this pilot study need to be acknowledged. The sample size is ...... The main weakness of this study was the paucity of …… Recommendations for further work (research) This research has thrown up many questions in need of further investigation. Further work needs to be done to establish whether ...... It is recommended that further research be undertaken in the following areas: Further experimental investigations are needed to estimate ...... What is now needed is a cross-national study involving ...... More broadly, research is also needed to determine ..... It is suggested that the association of these factors is investigated in future studies. Further research might explore/investigate ...... Further research in this field/regarding the role of X would be of great help in ....... Further investigation and experimentation into X is strongly recommended. A number of possible future studies using the same experimental set up are apparent. It would be interesting to assess the effects of ...... More information on X would help us to establish a greater degree of accuracy on this matter. If the debate is to be moved forward, a better understanding of ...... needs to be developed. I suggest that before X is introduced, a study similar to this one should be carried out on ..... These findings provide the following insights for future research: ..... 
Considerably more work will need to be done to determine ...... Future trials should assess a full selective decontamination regimen including ...... More research is needed to better understand when implementation ends and ....... It would be interesting to compare experiences of individuals within the same … group. A further study could assess …... A future study investigating …... would be very interesting. The issue of X is an intriguing one which could be usefully explored in further research. Future research should therefore concentrate on the investigation of …... Large randomised controlled trials could provide more definitive evidence. Implications/recommendations for practice or policy These findings suggest several courses of action for ...... An implication of these findings is that both X and Y should be taken into account when ...... The findings of this study have a number of important implications for future practice. There is, therefore, a definite need for ...... There are a number of important changes which need to be made. Another important practical implication is that ...... Moreover, more X should be made available to ...... Other types of X could include: a), b). ...... Unless governments adopt X, Y will not be attained. This information can be used to develop targeted interventions aimed at ...... A reasonable approach to tackle this issue could be to ...... Writing Definitions In academic work students are often expected to give definitions of key words and phrases in order to demonstrate to their tutors that they understand these terms clearly. Academic writers generally, however, define terms so that their readers understand exactly what is meant when certain key terms are used. When important words are not clearly understood misinterpretation may result. In fact, many disagreements (academic, legal, diplomatic, personal) arise as a result of different interpretations of the same term. 
In academic writing, teachers and their students often have to explore these differing interpretations before moving on to study a topic. Introductory phrases: It is necessary here to clarify exactly what is meant by ..... This shows a need to be explicit about exactly what is meant by the word X. X is a term frequently used in the literature, but to date there is no consensus about ...... Simple three-part definitions A university is an institution where knowledge is "produced" and passed on to others. Social Economics may be broadly defined as the branch of economics concerned with the measurement, causes and consequences of social problems. Research may be defined as a systematic process which consists of three elements or components: (1) a question, problem, or hypothesis, (2) data, and (3) analysis and interpretation of data. General meanings / application of meanings: The term X has come to be used to refer to ...... The term X is generally understood to mean ...... The term X has been applied to situations where students ...... In broad biological terms, X can be defined as any stimulus that is ....... The broad use of the term X is sometimes equated with ...... The term disease refers to a biological event characterised by ....... In the literature, the term tends to be used to refer to ...... X can be defined as ...... It encompasses ...... The term X is a relatively new name for a Y, commonly referred to...…. X can be loosely described as a correlation. Indicating difficulties in defining a term: In the field of language teaching, various definitions of fluency are found. Fluency is a commonly used notion in language learning and yet it is a concept difficult to define precisely . A generally accepted definition of fluency is lacking. Smith (2001) identified four abilities that might be subsumed under the term fluency: a) ..... The term poststructuralism embodies a multitude of concepts which ...... 
Although differences of opinion still exist, there appears to be some agreement that X refers to ...... Specifying terms that are used in an essay/thesis: In this essay the term overseas student will be used in its broadest sense to refer to all students who ...... Throughout this thesis, the term education is used to refer to informal systems as well as formal systems. While a variety of definitions of the term X have been suggested, this paper will use the definition first suggested by Smith (1968) who saw it as ....... In this paper, the term that will be used to describe this phenomenon is X. In this dissertation the terms X and Y are used interchangeably to mean ...... Referring to people's definitions (author prominent): Smith (1954) was apparently the first to use the term ...... Chomsky writes that a grammar is a 'device of some sort for producing the .....' (1957, p.11). According to a definition provided by Smith (2001:23), fluency is 'the maximally ...... The term "fluency" is used by Smith (2001) to refer to ...... Smith (2001) uses the term "fluency" to refer to ...... For Smith (2001), fluency means/refers to ....... Macro-stabilisation policy is defined by Smith (2003: 119) as "......................" Aristotle defines the imagination as "the movement which results upon an actual sensation." The term "matter" is used by Aristotle in four overlapping senses. First, it is the underlying ....... Secondly, it is the potential which ...... Smith et al. (2002) have provided a new definition of health: "health is a state of being with physical, cultural, psychological ....." In 1987, sports psychologist John Smith popularized the term X to describe ...... Referring to people's definitions (author non-prominent): Validity is the degree to which an assessment process or device measures what it is intended to measure (Smith et al., 1986). Giving Examples Writers may give specific examples as evidence to support their general claims or arguments. 
Examples can also be used to help the reader or listener understand unfamiliar or difficult concepts, and they tend to be easier to remember. For this reason, they are often used in teaching. Finally, students may be required to give examples in their work to demonstrate that they have understood a complex problem or concept. Many paragraphs in academic writing show development from general statements to specific details or examples. In most paragraphs, therefore, examples usually come after a more general statement, as in the short extract below. Many words can often acquire a narrower meaning over time, or may come to be chiefly used in one special sense. A classic example of this practice is the word doctor. There were doctors (i.e., learned men) in theology, law, and many other fields besides medicine, but nowadays when we send for the doctor we mean a member of only one profession. Examples as the main information in a sentence: For example / instance, the word doctor used to mean a learned man. For example, Smith and Jones (2004) conducted a series of semi-structured interviews in ...... By way of illustration, Smith (2003) shows how the data for ..... A classic / well-known example of this is ....... An example of this is the study carried out by Smith (2004) in which ....... X is a good example / illustration of ....... X illustrates this point / shows this point clearly. This can be illustrated briefly by ....... Young people begin smoking for a variety of reasons. They may, for example, be influenced by their peers, or they may see their parents as role models. The evidence of X can be clearly seen in the case of ...... Another example of what is meant by X is ...... Diseases that can result at least in part from stress include arthritis, asthma, migraine, headaches and ulcers. Examples as additional information in a sentence: Young people begin smoking for a variety of reasons, such as pressure from peers and the role model of parents. 
Pavlov found that if some other stimulus, for example the ringing of a bell, preceded the food, the dog would start salivating. In Paris, Gassendi kept in close contact with many other prominent scholars such as Kepler, Galileo, Hobbes, and Descartes. The prices of resources, such as copper, iron ore, oil, coal and aluminium, have declined in real terms over the past 20 years. Many diseases can result at least in part from stress, including: arthritis, asthma, migraine, headaches and ulcers. Classifying and Listing When we classify things, we group and name them on the basis of something that they have in common. By doing this we can understand certain qualities and features which they share as a class. Classifying is also a way of understanding differences between things. In writing, classifying is often used as a way of introducing a reader to a new topic. Along with writing definitions, the function of classification may be used in the early part of an essay, or longer piece of writing. We list things when we want to treat and present a series of items or different pieces of information systematically. A list is a series of items. The order of a list may indicate rank or importance. General Classifications: X may be divided into three main classes / sub-groups / categories. X may be classified on the basis of / according to / depending on / in terms of Y into Xi and Xii. Bone is generally classified into two types: cortical bone, also known as ....., and cancellous bone or ...... Aristotle's systematic treatises may be grouped in several divisions: logic, psychological works, physical ...... The works of Aristotle fall under three headings: (1) dialogues and ......; (2) collections of facts and ......; and (3) systematic works. There are two basic approaches currently being adopted in research into X. One is the Y approach and the other is ..... Associative learning can be categorised into classical and operant conditioning. Classical conditioning was first ...... 
Generally, spectratyping provides two types of information: band intensity pattern and band number. Specific Classifications: In the U.S. system, X is graded according to whether ..... / on the basis of ...... / in terms of ...... Smith (1966) divided / classified / grouped Xs into two broad types: Xi's and Xii's. Thomas and Nelson (1996) describe four basic types of validity: logical, content, criterion and construct. Smith and Jones (2003) argue that there are two broad categories of Y, which are: a) ...... and b) .... For Aristotle, motion is of four kinds: (1) motion which ......; (2) motion which ......; (3) motion which ......; and (4) motion which ....... Introducing Lists: The key aspects of management can be listed as follows: There are three reasons why the English language has become so dominant. These are: There are two types of effect which result when a patient undergoes X. These are ...... Appetitive stimuli have three separable basic functions. Firstly, they ....... Secondly, they ...... The disadvantages of the new approach can be discussed under three headings, which are: ...... This topic can best be treated under three headings: X, Y and Z. This section has been included for several reasons: it is ......; it illustrates ......; and it describes ....... The "Mass for Four Voices" consists of five movements, which are: the Kyrie, Gloria, Credo, Sanctus and Agnus Dei. The "Mass for Three Voices" is divided into six sections. These are: the Kyrie, Gloria, ....... Referring to other people's lists: Smith (2003) suggests three conditions for its acceptance. Firstly, X should be ..... Secondly, it needs to be .... Thirdly, ..... Smith and Jones (1991) list X, Y and Z as the major causes of infant mortality. Smith and Jones (2003) argue that there are two broad categories of Y, which are: a) ...... and b) .... For Aristotle, motion is of four kinds: (1) motion which ......; (2) motion which ......; (3) motion which ......; and (4) motion which ....... 
Smith (2003) lists the main features of X as follows: it is X; it is Y; and has Z. Describing Causes and Effects A great deal of academic work involves understanding and suggesting solutions to problems. At postgraduate level, particularly in applied fields, students search out problems to study. In fact, one could say that problems are the food for a significant proportion of academic activity. However, solutions cannot be suggested unless the problem is fully analysed, and this involves a thorough understanding of the causes. Some of the language that you may find useful for explaining causes and effects is listed below: Verbs expressing causality: Lack of protein may cause / can lead to / can result in mental retardation. Low levels of chlorine in the body can give rise to high blood pressure. Much of the instability stems from the economic effects of the war. Kwashiorkor is a disease caused by / resulting from / stemming from insufficient protein. Beri-beri is a disease caused by / resulting from / stemming from vitamin deficiency. Scurvy is a disease caused by / resulting from / stemming from lack of vitamin C. Nouns expressing causality: The most likely causes of X are poor diet and lack of exercise. A consequence of vitamin A deficiency is blindness. Physical activity is an important factor in maintaining fitness. Many other medications have an influence on cholesterol levels. Another reason why Xs are considered to be important is that ....... Prepositional phrases expressing causality: 200,000 people per year become deaf owing to / because of / as a result of a lack of iodine. Sentence connectors expressing causality: If undernourished and retarded children do survive to become adults, they have decreased learning ability. Therefore, / Consequently, / Because of this, / As a result (of this), when they grow up, it will probably be difficult for them to find work. Adverbial phrases expressing causality: Malnutrition leads to illness and a reduced ability to work in adulthood, thus/thereby perpetuating the poverty cycle. 
The warm air rises above the surface of the sea, thus/thereby creating an area of low pressure. Other examples As a consequence of X , it appears that winds alone are not the causative factor of....... Due to X and Y inflowing surface water becomes more dense as it ....... X and Y are important driving factors of Z. The mixing of X and Y exerts a powerful effect upon Z through ...... Possible cause and effect relationships (expressed tentatively) This suggests a weak link may exist between X and Y. The human papilloma virus is linked to most cervical cancer. Stomach cancer in many cases may be associated with certain bacterial infections. A high consumption of seafood could be associated with infertility. There is some evidence that X may affect Y. Comparing and Contrasting By understanding similarities and differences between two things, we can increase our understanding and learn more about both. This usually involves a process of analysis, in which we compare the specific parts as well as whole. Comparison may also be a preliminary stage of evaluation. For example, by comparing specific aspects of A and B, we can decide which is more useful or valuable. Many paragraphs whose function is to compare or contrast will begin with an introductory sentence expressed in general terms. Note the introductory sentences below: Introductory Sentences: Differences X is different from Y in a number of respects . There are a number of important differences between X and Y. X differs from Y in a number of important ways . Smith (2003) found distinct differences between X and Y. Women and men differ not only in physical attributes but also in the way in which they ...... Introductory Sentences: Similarities The mode of processing used by the right brain is similar to that used by the left brain. The mode of processing used by the right brain is comparable in complexity to that used by the left brain. 
The effects of nitrous dioxide on human health are similar to those of ground level ozone. Both X and Y generally take place in a "safe environment". There are a number of similarities between X and Y. Numerous studies have compared the brain cells in man and animals and found that the cells are essentially identical. Comparison within one sentence In contrast to oral communities, it is very difficult to get away from calendar time in literate societies. Compared with people in oral cultures, people in literate cultures organise their lives around clocks and calendars. Oral societies tend to be more concerned with the present, whereas literate societies have a very definite awareness of the past. Whereas Ghazali rejected non-Islamic philosophers, Aquinas incorporated ancient Greek thought into his own philosophical writings. Women's brains process language simultaneously in the two sides of the brain, while men tend to process it in the left side only. This interpretation contrasts with that of Smith and Jones (2004) who argue that ...... Comparison within one sentence (comparative forms) Women are faster/slower than men at certain precision manual tasks, such as placing pegs in holes on a board. Women tend to perform better/worse than men on tests of perceptual speed. Further, men are more/less accurate in tests of target-directed motor skills. The corpus callosum, a part of the brain connecting the two hemispheres, may be more/less extensive in women. Women are more/less likely than men to suffer aphasia when the front part of the brain is damaged. Adolescents are less likely to be put to sleep by alcohol than adults. Women tend to have greater/less verbal fluency than men. Men learned the route in fewer trials and made fewer errors than did women. Comparison across two sentences It is very difficult to get away from calendar time in literate societies. By contrast/in contrast , many people in oral communities have little idea of the calendar year of their birth. 
Tests show that women generally can recall lists of words or paragraphs of text better than men. On the other hand, men usually perform better on tests that require the ability to mentally rotate an image in order to solve a problem. Young children learning their first language need simplified, comprehensible input. Similarly, low level adult L2 learners need graded input supplied in most cases by a teacher. Speech functions are less likely to be affected in women because the critical area is less often affected. A similar pattern emerges in studies of the control of hand movements. Writing about the Past Writing about the past in English is made difficult by the rather complex tense system. However, the phrases grouped below give an indication of the uses of the main tenses in academic writing. For a comprehensive explanation of the uses of the various tenses you will need to consult a good English grammar book. A good recommendation is Practical English Usage by Michael Swan, OUP. Time phrases associated with the use of the simple past tense (specific times or periods of time in the past completed): For centuries, / In the second half of the 19th century, / At the end of the nineteenth century, church authorities placed restrictions on academics. During the Nazi period, / Between 1933 and 1945, / From 1933 to 1945, / In the 1930s and 1940s, restrictions were placed on German academics. Reference to single investigations or publications in the past (simple past tense used): The first systematic study of the X was reported by Patel et al. in 1986. Erythromycin was originally isolated from X in a soil sample from ...... (Wang et al., 1952). In 1975, Smith et al. published a paper in which they described ..... In 1990 Patel et al. demonstrated that replacement of H2O with heavy water led to ...... Thirty years later, Smith (1974) reported three cases of Candida albicans which ....... 
In the 1950s Gunnar Myrdal pointed to some of the ways in which …………… (Myrdal, 1957) In 1981, Smith and co workers demonstrated that X induced in vitro resistance to ....... In 1984 Jones et al. made several amino acid esters of X and evaluated them as water-soluble pro-drugs. An experimental demonstration of this effect was first carried out by ...... The first experimental realisation of ......, by Smith et al. , used a ...... Time phrases associated with the use of the present perfect tense (for situations/actions which began in the past and continue up to the present, or for which the period of time is unspecified): Over the past few decades , the world has seen the stunning transformation of X, Y and Z. Since 1965 , these four economies have doubled their share of world production and trade. Until recently , there has been little interest in X. Recently , these questions have been addressed by researchers in many fields. In recent years researchers have investigated a variety of approaches to X but .... Up to now , the research has tended to focus on X rather than on Y. To date , little evidence has been found associating X with Y. So far , three factors have been identified as being potentially important: X, Y, and Z. The present perfect tense may also be used to describe recent research or scholarly activity with focus on the area of enquiry - usually more than one study There have been several investigations into the causes of illiteracy (Smith, 1985; Jones, 1987). The relationship between a diet high in fats and poor health has been widely investigated (Smith, 1985, Jones, 1987, Johnson, 1992). The new material has been shown to enhance cooling properties (Smith, 1985, Jones, 1987, Johnson, 1992). Invasive plants have been identified as major contributing factors for the decline of many North American species (1). A considerable amount of literature has been published on X. Describing Trends and Projections A trend is a description of change over time. 
A projection is a prediction of future change. Trends and projections are usually illustrated using line graphs in which the horizontal axis represents time. Some of the language commonly used for writing about trends and projections is given below. Describing trends: The graph shows that there has been a slight / gradual / steady / marked / steep / sharp increase / rise / decrease / fall / decline / drop in the number of divorces in England and Wales since 1981. Describing high and low points in figures: The number of live births outside marriage reached a peak during the Second World War. The peak age for committing a crime is 18. Oil production peaked in 1985. Gas production reached a (new) low in 1990. Projecting trends: The number of Xs / The amount of Y / The rate of Z is projected to / is expected to / is likely to / will probably decline steadily / drop sharply / level off after 2010. Describing Quantities Describing ratios and proportions: The proportion of live births outside marriage reached one in ten in 1945. The annual birth rate dropped from 44.4 to 38.6 per 1000 per annum. Describing fractions: Of the 148 patients who completed the questionnaire, just over half indicated that ....... The response rate was 60% at six months and 56% at 12 months. Over half of those surveyed indicated that ...... 70% of those who were interviewed indicated that ..... Approximately half of those surveyed did not comment on ...... Nearly half of the respondents (48%) agreed that ...... Less than a third of those who responded (32%) indicated that ...... The number of first marriages in the United Kingdom fell by nearly two-fifths. Describing percentages: 13.1% of young men and 23.1% of young women who had married said that they ...... Returned surveys from 34 radiologists yielded a 34% response rate. The response rate was 60% at six months and 56% at 12 months. East Anglia had the lowest proportion of lone parents at only 14 per cent. Since 1981, England has experienced an 89% increase in crime. 
The mean income of the bottom 20 percent of U.S. families declined from $10,716 in 1970 to ....... A study in Java found that of 2,558 abortions, 58% were in young women aged 15-24, of whom 62% were ..... He also noted that less than 10% of the articles included in his study cited ...... In 1960 just over 5% of live births were outside marriage. Describing averages: This figure can be seen as the average life expectancy at various ages. The proposed model suggests a steep decline in mean life expectancy ...... Roman slaves probably had a lower than average life expectancy. The average of 12 observations in the X, Y and Z is 19.2 mgs/m ..... The mean score for the two trials was subjected to multivariate analysis of variance to determine ...... The mean income of the bottom 20 percent of U.S. families declined from $10,716 in 1970 to ....... Describing ranges: The evidence shows that life expectancy from birth lies in the range of twenty to thirty years. Between 575 and 590 metres depth the sea floor is extremely flat, with an average slope of only 1 : 400. The mean income of the bottom 20 percent of U.S. families declined from $10,716 in 1970 to $9,833 in 1990. The respondents had practiced for an average of 15 years (range 6 to 35 years). The participants were aged 19 to 25 and were from both rural and urban backgrounds. They calculated ranges of journal use from 10.7%–36.4% for the humanities, 25%–57% for the ...... Rates of decline ranged from 2.71 to 0.08 cm day⁻¹ (Table 11) with a mean of 0.97 cm day⁻¹. It has been estimated that 300,000 people suffer from ......
xupeiyang 2011-11-29 09:10
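Original post by blogger Xu: The ambiguity of single-character subject headings in automatic indexing and literature retrieval

At present, most Chinese literature databases use a large Chinese lexicon (corpus) to truncate and extract words from article titles and abstracts, and then convert these keywords into subject headings so that the database can be searched by subject heading. Subject headings that express compound concepts rarely cause conceptual ambiguity, but single-character headings easily do. Truncating on the subject heading “雨” (rain), for example, also pulls out strings such as “谷雨” (Grain Rain, a solar term) and “余秋雨” (Yu Qiuyu, a personal name). That is the ambiguity in question.

My view is that the decision should depend on how many documents a single-character subject heading covers: if there are very few, the heading should not be used. In CNKI's CHKD database, for instance, the subject heading 钨 (tungsten) covers only 14 documents. With so small a volume there is no need to use this heading for automatic indexing and retrieval; plain keyword search is enough.

Single words in other scripts raise the same problem. “AIDS” is the medical abbreviation for acquired immunodeficiency syndrome, but “aids” also means assistive devices such as crutches. Recall and precision in retrieval do not depend on indexing alone: the thesaurus, the search terms and the search strategy have to work together, in particular by combining subject headings with keywords to pin down the intended concept and exclude unrelated ones.

Many information retrieval systems and platforms now offer intelligent search, supported by more powerful lexicons, thesauri, ontology languages and semantic networks, and can filter the results to reduce conceptual ambiguity and false hits as far as possible, so as to safeguard retrieval efficiency and effectiveness.

There are many such cases, which I offer here for discussion. Attachment: 单字主题词问题.xls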
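The ambiguity described in this post can be made concrete with a small sketch. It is only an illustration under stated assumptions: the three titles and the `LONGER_WORDS` list are invented for the example (the two longer words are the ones named in the post), and the function names are hypothetical, not taken from any indexing system.

```python
# Hypothetical mini-lexicon: longer words that merely contain the character.
LONGER_WORDS = ["谷雨", "余秋雨"]


def find_hits(term, text, longer_words):
    """Return (index, is_standalone) for each occurrence of term in text.

    An occurrence is treated as NOT standalone when it lies inside one of
    the longer lexicon words, e.g. 雨 inside 谷雨.
    """
    hits = []
    start = text.find(term)
    while start != -1:
        covered = False
        for word in longer_words:
            # Try every position at which `term` sits inside `word`.
            offset = word.find(term)
            while offset != -1:
                begin = start - offset
                if begin >= 0 and text.startswith(word, begin):
                    covered = True
                offset = word.find(term, offset + 1)
        hits.append((start, not covered))
        start = text.find(term, start + len(term))
    return hits


def relevant_titles(term, titles, longer_words):
    """Keep only titles in which the term occurs at least once on its own."""
    return [
        title
        for title in titles
        if any(standalone for _, standalone in find_hits(term, title, longer_words))
    ]


if __name__ == "__main__":
    # Invented titles, for illustration only.
    titles = ["谷雨时节的农事安排", "余秋雨散文研究", "雨对农作物生长的影响"]
    print(relevant_titles("雨", titles, LONGER_WORDS))
```

Naive substring matching would return all three titles. Checking each hit against longer lexicon words keeps only the last one. A real system would rely on proper word segmentation and a much larger lexicon rather than a hand-written list.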
carldy 2011-11-28 15:56
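[Note] An article that uses corpus methods to examine English news headlines, posted for reference. Source: the Internet. Original location: http://www.gxou.com.cn/xuebao/07-3/luo%20tianfa.htm

Stylistic Features of English News Headlines as Seen from a Self-Built Corpus
Luo Tianfa (School of Foreign Languages, Guangdong University of Business Studies, Guangzhou, Guangdong 510320)

Abstract: Using the daily news headlines on the Yahoo home page as material, this paper builds a corpus of 280 English news headlines, collected in two batches of 140, in order to test by corpus methods whether English news headlines have distinctive features in grammar (verb tense and voice) and in word choice.
Key words: corpus; English news headlines; Yahoo website; stylistic features
CLC number: H314  Document code: A  Article ID: 1008-7656(2007)03-0070-04

Introduction
The headline is the soul of an advertisement and the main tool for drawing readers in. A US survey found that, on average, five times as many people read the headline as read the full advertisement, which shows how much the headline matters. The headline is also an inseparable part of a news story. When readers open a newspaper they first see the page layout, and the first thing on the page to catch their attention is the headline. A good headline is itself a short news item. Headlines hold a pivotal place in a newspaper: they guide readers in choosing which news to read, and they outline how the news is to be read and understood.
The headline concentrates and summarises the content of the story, condensing its most important or most noteworthy points into a few words. As the essence of the news, it reveals, clarifies and evaluates the content. While studying the stylistic features of English news headlines, the author noticed that the Yahoo home page carries categorised English news headlines around the clock and updates them constantly. The author therefore captured Yahoo's categorised English headlines over two periods, each covering two consecutive days, taking 140 headlines each time and 280 in all, built a database from them, and carried out a stylistic analysis to test whether English news headlines have distinctive features in grammar (verb tense and voice) and in word choice.

1 Word choice in English headline news on the Yahoo home page
1.1 Heavy use of short words:
English news headlines always try to summarise all or most of a story in a very limited number of words. Their wording is therefore as economical and as clear as possible: they favour short, precise and vivid words, use few or no vague and abstract words, and prefer verbs that are short or have the fewest letters. Short, plain and vivid wording makes the news more concise and readable, and it also saves space on the page.
1.1.1 Heavy use of short verbs:
India, Pakistan seek peace after bombing. AP (seek=look for, search for)
Wal-Mart's 4Q profits rise 9.8 percent. AP (rise=go up, increase)
Iran, Saudi heads vow to work for unity. AP (vow=promise, determine)
Elizabeth Hurley weds at British castle. AP (wed=get married, marry)
1.2 Heavy use of clipped words (abbreviations):
Clipping shortens long words with many letters by cutting off the beginning or the end. Common nouns, adjectives and the like are shortened so that the headline becomes brief in form and compact in expression without any loss of content:
1.2.1 Clipping the end:
Tech firms go green as e-waste mounts. AP (Tech=technology; E=electronic)
Nudists sweat it out at Dutch gym. AP - Sun Mar 4 (gym.=gymnasium)
Airline opens 1st-class bathroom to all. AP - Thu Mar 1 (1st-class=first-class)
1.2.2 Replacing a longer word with a shorter synonym:
Home-depot 4Q profit drops 28 percent. AP (4Q=fourth quarter)
Militants hit Iraq base, kill 2 U.S. GIs. AP (GI= person in or a veteran of any of the US)
1.2.3 Heavy use of initialisms (Initials):
These save words in the headline. Such items are used more and more often and are gradually gaining a fairly independent status, a development worth noting.
BOJ to decide whether to raise rates. AP (BOJ=Bank of Japan)
NAACP head resigns AP - Sun Mar 4 (NAACP=National Association for the Advancement of Colored People, a national organisation of white and black Americans that promotes civil rights for black people, with headquarters in New York.)
NYC fast-food chains pull calorie info. AP Fri Mar 2 (NYC=New York City)
1.2.4 To keep headlines short, people are referred to by surname only, without the given name:
Clinton defends S.C. campaign hire. AP - Feb 19 (Clinton=Bill Clinton)
Bush compares Revolutionary, terror wars (Bush=George W. Bush)

2 News covers almost everything, past and present, at home and abroad, in every field. English headlines contain large numbers of names of people, places, nationalities, companies, institutions and events. Some are familiar and many more are not, and some are new, odd or unusual. Without consulting specialised sources, readers whose mother tongue is not English may find them hard to understand:
2.1 Personal names:
McCain: Rumsfeld was one of the worst. AP
Trump's hair on the line at Wrestlemania.
2.2 Place names:
Trump's hair on the line at Wrestlemania.
Study: Yellowstone air quality improves. AP
2.3 Names of companies and institutions:
Hewlett-Packard 1Q profit up 26 percent. AP
Wal-Mart's 4Q profits rise 9.8 points. AP
Home-depot 4Q profit drops 28 percent. AP
2.4 Names of holidays or events:
Businesses also celebrating Mardi Gras.
Irish theme at Philadelphia Show. AP - Fri Mar 2
2.5 Words denoting nationality:
Vietnamese refugees identify with Iraqis. USATODAY.com Thu Mar 1
Irish theme at Philadelphia Show. AP - Fri Mar 2
2.6 Beyond the difficulty of the language, cultural factors also play a part:
Hotels with no condoms get fined. Reuters - Fri Mar 2
This headline concerns a topic widely discussed in the West, namely hygiene and health for adults staying in hotels: condoms. Chinese readers may all know the word "condoms", but some of them, unmarried readers in particular, may be unfamiliar with certain practices in Western hotels, such as "hotels that do not provide condoms are to be fined". This can be counted as a cultural factor.
Gordon, Bulls overcome Redd's 52 points. 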
AP
This headline concerns sport. The body of the story makes it clear: Ben Gordon scored a career-high 48 points to help the Chicago Bulls overcome Michael Redd's 52 points for a 126-121 overtime victory over the Milwaukee Bucks .... The underlined words preceded by the definite article are team names, and those without the definite article are players' names. The headline abbreviates both the personal names and the team names for the sake of brevity, which creates considerable difficulty for readers whose mother tongue is not English.
3 To keep headlines concise, the abbreviated form of a noun is sometimes used as a verb:
Museum IDs new species of dinosaur. AP - Sun Mar 4 (ID=identity; the subject is singular, so the verb takes the ending "s".)
4. To keep headlines concise, a short verb is sometimes used as a noun:
Florida ends skid with win over Kentucky. AP - Sun Mar 4

2 Basic sentence patterns in English headline news on the Yahoo home page:
Legend for the table: S=Subject; V=Predicate; O=Object; Ad=Adverbial; Infin.=Infinitive; Ved=past participle of the verb; Adj.=adjective; NP=Noun Phrase; Ving=present participle of the verb; "quotation (news source)" means a headline marked as quoting someone's own words; SVO=NP+VP+NP; SV=NP+VP.
The table gives the counts of sentence types in the self-built database of English news headlines from the Yahoo home page, and the percentage of the 280 headlines that each type accounts for. It shows that headlines of the form SVO/Ad take the largest share, 52.1%, followed by SV/Ad at 20%, then S(Be)Ved/Adj at 12.1%, and then SVAd and SVOAd at 8.9% and 8.2% respectively. All of these patterns share one component, SV. Headlines built on SV account for as much as 72.1%, and those built on SVO for 52.1%.
Headlines hold a pivotal place in a newspaper: they guide readers in choosing which news to read, and they outline how the news is to be read and understood. Short as it is, a headline has to carry a great deal of information. Overall, the great majority of headlines accurately state two things: who (or what thing or place) is involved, and what happened. Saying this clearly in one short sentence is not easy. Words have to be chosen with care, and the sentence pattern has to be considered as well. In their choice of sentence pattern, news headlines show the following features:
2.1 The "NP + VP" pattern:
In terms of comprehension, the subject-predicate pattern "NP + VP" is the easiest to accept, because it has a theme and a rheme and can convey a relatively complete meaning that meets the reader's need to understand. In the "NP + VP" pattern the subject states the person or thing and the predicate states what happened to that person or thing. The "NP + VP" pattern dominates in news headlines.
2.2 The "VP" pattern:
What people most want to know when they read the news is what happened. Although the "NP + VP" pattern makes up the bulk of news headlines, some headlines still consist of a verb phrase alone, mainly because a "VP" can tell people what happened.
2.3 The "NP" pattern:
In addition, a small number of headlines take the form of a nominal clause. A nominal clause here means a headline made up of a modifier-head noun phrase, or of several nouns set side by side to form a nominal phrase. In general a modifier-head noun phrase can also state who (or what event, thing or place) is involved and what happened, so it performs the function of the "NP + VP" pattern.

3 Tense and voice of verbs in news headlines:
Short as it is, a headline has to carry a great deal of information. As the essence of the news, it reveals, clarifies and evaluates the content, and the tense and voice of its verbs play a key part in condensing and interpreting the story as a whole.
In the self-built corpus, the voice and tense of the 280 English headlines from the Yahoo home page differ from those of ordinary prose. The active voice is used far more than the passive. The passive makes up so small a share of the database that it has almost no statistical significance, and it is used to show that the topic of the headline is the affected party. The simple present is used most, with the present progressive and the simple future appearing occasionally, while the simple past and other tenses are very rare. The simple present is used to report recent events in order to make them more vivid and immediate, and it is usually called the "historic present". News headlines use this tense to stress that something is still going on and that its outcome is not yet known. For the sake of brevity, the auxiliary "be" is usually omitted from the sentence.

4 Ellipsis:
Ellipsis is a major stylistic feature of English news headlines, and the omission of function words is one of its most important aspects.
4.1 The simple future expresses what will happen, usually with one of the following forms: will (shall) do, be going to do, be to do, be about to do, and so on. In news headlines the form "be to do" is the most common. The auxiliary "be" is usually omitted here too, and the form is used to express predictions and judgements about future developments. For example:
JetBlue (is) to detail customers' rights.
Blair (is) to announce Iraq withdrawal plan.
E-Trade (is) to unveil global trading platform.
4.2 As for voice, the active is the mainstream in news headlines, and the passive is used occasionally to show that the topic is the affected party. Here too the auxiliary "be" is usually omitted for brevity. For example:
Next wax museum (is) set for Washington DC AP - Fri Mar 2
Mummified body (was/is) found in front of blaring TV. Reuters - Mon Feb 19
9 (were/are) killed when gas tanker bombed in Iraq. AP .... 
4.3 In addition, in the SV (NP+VP) pattern, when the predicate V is a copular structure, that is, a linking verb or "be" followed by an adjective or adverb as complement, the linking verb or "be" is also usually omitted for brevity. For example:
Ho Chi Minh Trail area (is) safe for wildlife. AP - Sat Mar 3
Oil prices (are) steady above $58 a barrel.
The omission is not made in every such case, however, because the headline may already be short enough that nothing needs to be dropped, as in:
Footage of JFK motorcade is discovered. AP - Feb 19
McCain: Rumsfeld was one of the worst. AP
Global warming scientist is encouraged. AP - Feb 19
4.4 In the active progressive, the auxiliary "be" of "be + doing" is omitted:
Muslim women (are) enjoying special swimsuits. AP - Feb 18
Rice, Abbas (are) leaving for Jordan, Europe.
Mass. Health care plan (is) moving forward. AP - Sun Mar 4
Of all the omissions, the auxiliary "be" and the articles are dropped most often. Conjunctions are also usually omitted and replaced by a comma. English news headlines frequently leave out prepositions and pronouns as well. These omissions do not hinder the reader's comprehension, and they save space.
The study also found some stylistic features in the punctuation of English news headlines. For reasons of space they are only mentioned briefly: besides introducing a quotation in the sense of "says", the colon is often used in place of the linking verb "be"; the comma is often used in place of the conjunction "and"; and the dash is often placed before or after a quotation given without quotation marks, to introduce the speaker.
Because headlines must be brief, they are economical with ink and careful in their choice of words. A few words are preferred to many, a familiar word to a rare one, an abbreviation to the full spelling, and a phrase to a full sentence. In short, whatever can be left out is left out, and the wording stays terse without being tiresome. There is of course a question of degree: economy of wording must not damage the headline itself or the way readers interpret it. Knowing these stylistic features of English news headlines is of real value to readers who are not native speakers of English, both for grasping the gist of a headline and the meaning of English news accurately, and for translating Chinese news headlines into English.

About the author: Luo Tianfa, born 1965, male, from Tianmen, Hubei, Han nationality, associate professor, master's degree, Department of Foreign Languages, Guangdong University of Business Studies. Main research interests: stylistics and English teaching.

Viewing Features of English News Titles from a Self-Compiled Corpus
Luo Tianfa (School of Foreign Studies, Guangdong University of Business Studies, Guangzhou, Guangdong 510320)
Abstract: This article first builds its own corpus from the news headlines of Yahoo, choosing 280 headlines in two batches of 140. The purpose is to confirm, on the basis of the corpus, whether English news headlines have outstanding features in grammar, such as verb tense and voice, as well as in diction.
Key Words: Corpus; English news headlines; Yahoo website; stylistic features
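Section 4.1 of the article notes that headlines express the future with "be to do" and drop the "be". A minimal sketch of how such headlines might be picked out of a small corpus follows. The regular expression and the function name `restore_be` are assumptions made for this illustration, not the author's procedure, and the pattern only covers subjects made of capitalised tokens. The sample headlines are the ones quoted in the article.

```python
import re

# Subject: one or more capitalised tokens (optionally comma-separated),
# followed directly by "to" and a lower-case verb.
FUTURE_TO = re.compile(
    r"^(?P<subject>[A-Z][\w'-]*(?:,? [A-Z][\w'-]*)*) to (?P<verb>[a-z]+)\b"
)


def restore_be(headline):
    """Return the headline with '(is)' re-inserted, or None if it does not
    look like an elliptical 'be to do' future."""
    match = FUTURE_TO.match(headline)
    if match is None:
        return None
    subject = match.group("subject")
    rest = headline[match.end("subject"):]
    # "(is)" follows the article's examples; plural subjects would need "(are)".
    return f"{subject} (is){rest}"


if __name__ == "__main__":
    headlines = [
        "JetBlue to detail customers' rights.",
        "Blair to announce Iraq withdrawal plan.",
        "E-Trade to unveil global trading platform.",
        "Iran, Saudi heads vow to work for unity.",
    ]
    for headline in headlines:
        print(headline, "->", restore_be(headline))
```

The last headline is deliberately not matched: its "to work" is the complement of "vow", not an elliptical future. A pattern this simple will miss subjects that contain lower-case words, so any counts it produced would need manual checking.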
Popularity 2 oscar3 2011-8-4 17:36
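On Joyo (http://www.amazon.cn) I came across Huang Jianping's book 《辅以语料库的新认知教学法在英语教学中的应用》 (The Application of a Corpus-Aided New Cognitive Approach in English Teaching) and assumed it was a monograph of corpus-based research on English teaching. Corpus approaches have been spreading through language research and language teaching in recent years. Theoretical and applied studies that combine corpus methods with cognitive linguistics are still uncommon, and book-length results are rarer still. I therefore bought the book with high hopes, and once it arrived I could not wait to get an overview of its contents.

The result was very disappointing. First, the book is not a piece of corpus-based research but something closer to a survey. Second, it touches on very little corpus theory or method. Without the corpus content, the book obviously loses much of its appeal. It runs to 210,000 characters in 9 chapters, of which only Chapter 5, "The cognitive approach and the corpus-aided new cognitive approach", and Chapter 9, "Applications of corpora", deal with corpora at all. Chapter 5 has the word "corpus" in its title, but it mainly expounds the "new cognitive approach". After reading it, one can hardly tell what role corpora are supposed to play, and the author does not show how cognition and corpora can be combined to improve English teaching. Chapter 9, the last chapter, is the one that deals with corpora most. It has two sections: 9.1, "Application in grammar teaching", and 9.2, "Application in vocabulary teaching". Section 9.1 mainly uses concordance lines to help students tell attributive clauses from appositive clauses. Section 9.2 uses the different semantic prosodies of "cause" and "lead to" to illustrate the use of corpora in vocabulary teaching. Corpora have wide uses both in language teaching broadly and in vocabulary teaching specifically, but the author has not thought through or explored these areas in any depth. The examples also show that the author has very limited hands-on experience with corpora: there are essentially no cases the author analysed personally, and the few examples are borrowed from others. The analysis lacks depth and the examples are not new.

That said, although the author has not opened up a new path, I am still grateful to Huang Jianping for putting forward this new idea, which made my eyes light up when I first read the title. The point to bear in mind is that corpus linguistics and its related fields are highly practical, or operational. Without enough practice and a fair amount of data analysis, introspection and speculation alone are unlikely to produce satisfactory research results.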
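Since the review turns on concordance lines and on the contrast between "cause" and "lead to", a minimal keyword-in-context (KWIC) sketch may help show what such lines look like. Everything here is an assumption made for illustration: the function `kwic`, the search pattern, and the three sample sentences, which are invented and come neither from the book under review nor from a real corpus.

```python
import re

# Matches forms of "cause" and of "lead to".
PATTERN = r"\b(?:caus(?:e|es|ed|ing)|(?:lead|leads|led|leading) to)\b"


def kwic(sentences, pattern, width=30):
    """Return (left, node, right) triples for every match of pattern,
    with at most `width` characters of context on each side."""
    regex = re.compile(pattern, re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in regex.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            lines.append((left, match.group(0), right))
    return lines


if __name__ == "__main__":
    # Invented sentences, for illustration only.
    sample = [
        "Smoking can cause serious illness.",
        "Hard work can lead to success.",
        "Malnutrition leads to illness.",
    ]
    for left, node, right in kwic(sample, PATTERN):
        print(f"{left:>30} [{node}] {right}")
```

Reading down the right-hand context of many such lines is how a difference in semantic prosody would be observed. Three invented sentences demonstrate only the display format, not any finding about the two expressions.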
Popularity 1 gothere 2011-6-19 11:14
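Construction of a Chinese-English parallel corpus of 《史记》 (Shiji) and 《汉书》 (Hanshu) and development of a retrieval system for English translations of terms; Li Xiuying; Liaoning; Dalian University of Technology; general project; research report, computer software; 2014/12/30; 11BYY049
Research on building a lexicon of basic-level categories for international Chinese language teaching; Yang Jichun; higher education; Minzu University of China; general project; collection of papers, computer software; 2014/12/31; 11BYY046
Natural language in the service of information retrieval; Xiong Wenxin; higher education; Beijing Foreign Studies University; general project; research report; 2014/6/30; 11BYY051
Research on Chinese opinion mining based on linguistic features; Zhang Li; Jiangsu; Nanjing University; youth project; collection of papers, computer software; 2013/12/31; 11CYY031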
Popularity 2 carldy 2011-5-13 22:00
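The entry of corpora into cognitive linguistics and metaphor research is part of a general trend in the study of language. Recently I have been reading books on corpora and systemic functional linguistics, and some of the papers are very interesting. In his paper "Corpus analysis: the state of the art and three types of unanswered questions", for example, Stubbs argues as follows.

In the British linguistic tradition represented by Firth, Halliday and Sinclair, the analysis of texts and corpora is central. This tradition attaches great importance to the notion of conventional phraseology and strikes a balance between the creative and the routine in language use. Firth (1953) speaks of the habitual, customary and typical nature of words and phrases. Halliday (1978) points out that most discourse is more or less routine and that we keep expressing the same views again and again. Sinclair (1991) proposes two principles of interpretation, the idiom principle and the open-choice principle, and holds that semi-fixed phrases are very common.

Stubbs uses corpora to study the extent and strengths of English phraseology. He argues that corpus methods can supply large amounts of data for cognitive models of meaning and can help with long-standing unresolved questions in cognitive and social theory, namely how language relates to cognitive and social systems. His central claim is that corpus methods can help with these questions, which he sorts into three kinds (easier, more difficult, and perhaps impossible):

Easier descriptive questions concern how we can make generalizations about phraseology across the lexicon. More difficult questions concern whether different models of phrasal units can be related to each other. The deepest—maybe impossible—questions concern whether linguistic, cognitive and social patterns can be related.

On the second level, corpus data can provide empirical evidence for semantic relations, and therefore evidence for how the mental lexicon is organised. On the third level, corpus data can provide evidence of what speakers frequently talk about, and therefore evidence for the range of socially salient vocabulary, such as "money" and "groups of people". On the first level, the shallower descriptive one, empirical description of culturally key words gives a fresh interpretation of life.

Popper (1963:125) points out that the task of social theory is to explain how our intentions and actions give rise to unintended consequences. We try to convey our own meanings through language, for example by complaining that a door will not budge or by expressing envy of someone's long life, but we do not consciously set out to reproduce typical English phrases. People's use of language is clearly repetitive. Yet it is no easy matter to explain repetition across communities, or to explain what level of repetition is optimal in a functional system. Only through questions of this kind can corpus research move from description to explanation.

Stubbs gives many examples to show what corpora can do in the study of human cognition, such as:

A second example of such an area is the ways in which people are classified and talked about. Two concepts which are encoded in a large number of the words and phrases we use to talk about social life are 'groups of people' and 'the passing of time': The large number of approximate synonyms for 'groups of people' is not surprising, since the different ways in which people can be grouped is of inherent social interest. 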
Here are just a few examples: - band, bunch, crew; family, flock, gang, group, jury, rabble, team - crowd, horde, mob; angry mob, lynch mob, barbarian horde - relative, friends, acquaintances, neighbours, strangers - cults, extremists, fanatics, fundamentalists, militants - anarchy, riot; concert, demonstration, applause, cheer, fame, scandal - infant, baby, child, adolescent, teenager, youth, adult - chilhood, schooldays, youthful, middle-aged, elderly, old, senile - age group, age bracket, age of consent, come of age - in my younger days, in his/her day, in his/her heyday, {cut down in} in his/her prime, thirty something; over the hill, burnt out, past it, twilight years, ripe old age 【备注】 这只是本人读书的点滴收获。还有很多话题等待挖掘。让我感兴趣的是,既然人类通过语言来表达自己的观点、感受,生活中的喜怒哀乐,政治上的价值取向,那么,同一社区的人,是否有共同的认知模式或隐喻方式?如有,这种模式是如何构成的?不同社区的人,又是什么样的情况呢? 语料库、语言、文本、语篇、认知、隐喻、社会生活......
carldy 2010-12-26 10:39
[Note] Listed here, for reference, are the online proceedings of two meetings of the International Symposium on Using Corpora in Contrastive and Translation Studies (UCCTS):
UCCTS2010: Edge Hill University, UK http://www.lancs.ac.uk/fass/projects/corpus/UCCTS2010Proceedings/
UCCTS2008: Zhejiang University, China www.ling.lancs.ac.uk/corplang/UCCTS2008Proceedings
carldy 2010-12-16 22:17
【备注】这里转载的是王克非教授刊发于《中国英语教育》2008年第2期的论文,以备研究之用。 也谢谢王教授能关注到我在第18届世界翻译大会(2008..8上海)上报告的论文: 戴光荣,基于自建平行语料库基础上的翻译明晰化研究 创建语料库,探讨新课题 王克非 ( 北京外国语大学 中国外语教育研究中心,北京 , 100089 ) 提要: 语料库的兴起带来研究工具和研究方法的更新,进而导致语料库语言学和语料库翻译学的产生,丰富了相关研究课题。本文综述近年来国内外语料库翻译学方面的新进展和各种双语类、翻译类语料库的研制情况,并讨论如何在描写研究、应用研究和理论研究三方面展开新课题研究。作者认为,语料库资源的共享,检索工具的不断研制和更新,使语言和翻译的研究范围从语际对比扩大到语内类比,呈现出双向、多重的对比模式,其意义不仅在理论上,在对翻译现象(也包括双语对比)的描述和翻译教学的改进方面也极有应用价值。 关键词 :语料库;翻译研究;翻译教学 中图分类号: H315.9 文献标识码: A 1. 语料库:研究手段与思路的更新 新技术往往带来新的认识工具和新的研究手段。计算机技术的发展使超大规模语料的收集、整理、标注、观察、统计成为可能,从而也使人们对语言的认识有了更宏观的视角,于是发展出现代语料库语言学。这一新型分支学科开展的二十多年来,一方面提供了新的语言考察和分析手段,一方面也推进了依据语料库大量语言事实所作的关于语言的理论思考。正如 Johansson ( 2007 : 1 )指出的: 通过语料库,我们可以观察先前没有意识到或仅仅隐约觉察到的语言模式 。 语料库带来的不只是语言学研究上的更新,还包括翻译学研究上的更新。语料库翻译学的发展,使研究者得以对大量真实语料(尤其是双语 / 多语平行语料,包括书面语和口语)进行对比分析,这为语言对比和翻译研究提供了新的契机。本文主要论述语料库为翻译研究提供的新思路、新课题,包括方法论或工具层面上的应用研究,和关于翻译特征的抽象性的理论研究。 2. 语料库翻译学:内容与定义 翻译研究历来重视语言、文本的比较,因此在理论上多借鉴语言学,在方法上多采用描写法、比较法。一脉传承下来,便有了现代的语料库翻译研究。在语料库的支持下, 翻译学理论研究的重心,从原文与译文的比较或 A 语言与 B 语言的比较,转向文本生成本身与翻译的比较 ( Baker 1995 : 233 )。 Mona Baker ( 1996 : 175 )进而提出 基于语料库的翻译研究 ( corpus-based translation studies )。此后, Tymoczko ( 1998 : 1 )将基于语料库的翻译研究称为 语料库翻译研究 ( Corpus Translation Studies ,简称 CTS )。 Mona Baker 的学生 Laviosa ( 1998 : 1 )更提出翻译研究中的语料库途径是翻译研究的一种 新范式 ( a new paradigm )。 鉴于基于语料库的语言学研究通称为语料库语言学, 基于语料库的翻译研究 我们也不妨称为语料库翻译学。既然是基于语料库的研究,语料库翻译学要借鉴语料库语言学的基本方法,包括语料的整理、标注、检索、统计等,但也有它独特之处。其一、所据语料库不同。语料库语言学依据单语语料库即可,语料库翻译学一般要依靠双语语料库,主要是翻译语料库( translational corpus )、对应语料库( parallel corpus )和类比语料库( comparable corpus )。其二、标注上双语语料库更加复杂。如翻译语料库需要对翻译、译者等要素加以详细标注,对应语料库需要对两种语料作某种层级(通常为句级)的对齐处理,类比语料库需对文体、主题、作者、译者等要素加以标注。而且双语的语料数量有限,选收翻译语料还需有质量上的考虑。其三、研究对象有别。语料库翻译学探究的是两种语言及其转换的过程、特征和规律,以及它们在教学、双语词典编纂上的意义。 若尝试为语料库翻译学作一定义,我们认为,它以语言理论和翻译理论为研究上的指导,以概率和统计为手段,以大规模双语真实语料为对象,采用语内对比与语际对比相结合的方法,对翻译现象进行历时或共时的描写和解释,探索翻译的本质(参看王克非、黄立波 2007 )。我们说语料库翻译学有语言学理论背景,是因为以 Firth 、 Halliday 和 Sinclair 为代表的英国语言学传统思想是语料库语言学的直接理论源头(参看 Stubbs 1993 )。该学派认为语言研究应以真实数据为基础,即以真实文本为主要研究对象开展实证研究,内省式例子本身是解释行为的一部分;将文本整体作为研究的基本单位;基于文本语料的研究以对比为基本模式(参看 
Stubbs 1993: 8-13 )。这些思想一方面打破了传统语言研究建立在内省数据基础上演绎式、规定性的规则系统,另一方面打破了关注语言、忽略言语的倾向,转而对一定社会文化语境中具体的语言使用进行描写和解释,这些间接地为语料库翻译学提供了理论依据。大规模语料库可以帮助翻译研究者重新审视其研究对象,探究翻译研究对象不同于其它研究对象的原因,同时探索指导翻译行为的原则以及制约翻译运作的因素。描写翻译研究摒弃了内省法,从描写真实语料出发,达到对翻译现象的解释和预测。这些均为基于语料库的翻译研究范式提供了直接理论支持。 综上所述,无论是基于语料库还是语料库驱动的研究,都为研究者带来不少新的课题,从而吸引更多的研究人员,并有可能突破原来的研究局面,带来新的启示和进展。 3. 语料库翻译学:三大研究方面 从古今中外的翻译实践和论述看,最自然的翻译研究途径,也是传统的途径,便是以源语文本为参照,以忠实程度为取向,主要探讨译文与原文之间的关系或对应关系。这种二元对立的思路往往将译文视为原文衍生物,在此基础上的理论研究本质上是对译文质量的回溯式评估,这种模式在一定程度上有碍于翻译研究的发展。近三十年来,国内外学者不断有人尝试打破这种局面,探讨新的研究路径。如 Even-Zohar 等人提出多元系统理论,试图 不仅从语言还要从翻译外部即社会文化层面上解释翻译现象,提升了目标语文化语境对于翻译的作用;又如 Holmes 和 Toury 等人提出描写性翻译研究法,一方面重视翻译规范( translation norms )的研究,另一方面也试图探究翻译的普遍性特征( universal features of translation )。语料库的发展,特别是双语语料库的研制,使描写性翻译研究得以更充分地展开,基于语料库的翻译研究途径在逐步从方法论发展成为连贯、综合、丰富的范式,应用于翻译理论以及翻译的描写和实践等系列问题的探讨。综括近十几年国内外的研究,语料库翻译学至少可以使我们在以下三个方面开展一系列探讨: 1) 描写研究:包括大范围的翻译调查、翻译文体考察,不同译本的参数比较,以及语句的对应情况,对应词和短语的搭配及其频率等统计数据的检索与分析。 2) 应用研究:如自动翻译研究,将开展了半个世纪的机器翻译与语料库翻译结合起来,以期取得新的实质性突破;又如翻译教学,大量的对应本文和语句的呈现,有利于学习者翻译意识的养成和翻译技巧的自我提高;在双语词典编纂方面,可以丰富例证和提供搭配情况、频率、反向检索等数据。 3) 理论研究:使更广泛、有效的描写性翻译研究得以展开,包括翻译规范的研究和翻译普遍特征或曰共性的研究;还可以结合翻译研究进行双语对比分析,有助于丰富人们对语言的认识以及对语言习得的认识。 4. 
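Exploring new topics on the basis of corpora

Many new topics and perspectives in translation studies can be pursued on the basis of corpora of various kinds, and scholars have been very active. Internationally, Professor Mona Baker of the University of Manchester Institute of Science and Technology (UMIST) was among the first to carry out corpus-based translation research. Her paper "Corpus linguistics and translation studies" (1993) gave an initial account of how the two fields could be combined. In large-scale surveys of translation, such as studies of translational style, Baker (2000) also pioneered a corpus approach to the style of individual translators, analysing in particular type/token ratio, mean sentence length and characteristic use of lexical items. Malmkjær (2004), taking Dulcken's English translations of Hans Christian Andersen as an example, explored translator style further and proposed the concept of translational stylistics. She asks why, given a particular source text, a translator shapes the translation in a particular way; explanations may draw not only on linguistic factors but also on extralinguistic ones such as translation norms and the purpose of the target text. In addition, Laviosa investigated four core patterns of lexical use in translated English texts; Kenny compared semantic prosody in source and target texts and found sanitisation in translated language; Øverås examined explicitation at the level of cohesion in English-Norwegian translation, using the first 50 sentences of 20 translated novels in each direction (2,000 sentences in all); Wang Kefei used a large parallel corpus to study the expansion of text length in translation; Laviosa discussed differences in lexical use between translations and native original writing; Ke Fei found through corpus investigation that imitating the source text during translation can make the target text more complex and wordier; Xiao and McEnery found that aspect markers are roughly twice as frequent in translated Chinese as in original Chinese; Ebeling compared the use of existential constructions in English and Norwegian; and Maia used a bilingual parallel corpus to observe differences between English and Portuguese in the frequency of personal subjects (see Wang Kefei 2004: 182-183).

Other researchers have explored the use of bilingual corpora to support translation teaching and autonomous learning of translation. Parallel-corpus-assisted translation teaching is a teaching model that draws on large collections of electronic source texts and their translations, supported by computational and statistical tools. Starting from translation products, it aims to engage students actively through observation, comparison, analysis and borrowing, to cultivate their awareness of translation, and to improve their translation skills through evaluating others' translations and their own practice. For international research on using bilingual parallel corpora in translation teaching, see the collection Corpora in Translator Education edited by Zanettin, Bernardini and Stewart (2003), many of whose papers are worth consulting. Bernardini (1997) pointed out that supplementing translation teaching with parallel-corpus searches helps translation students develop awareness, reflectiveness and resourcefulness, the skills that distinguish professional translators from unskilled amateurs. A parallel corpus (with suitable search tools) makes it easy to look up how particular expressions have been translated, so that terms and phrases are translated more accurately and idiomatically; it often offers several translation options for reference, and its examples are richer and more authentic than those in a bilingual dictionary. A corpus-based experiment on autonomous translation learning by Wang Kefei, Qin Hongwu and Wang Haixia (2007) likewise showed that corpora and corpus analysis tools are powerful aids to discovery learning: they stimulate students' interest, focus their attention on the links between meaning and form, and provide a relaxed environment for classroom interaction. In terms of foreign language learning and awareness of translation techniques, learners who observe aligned bilingual data may become more sensitive to collocation, semantic preference and semantic prosody. Translation techniques and strategies can take shape gradually through interactive discussion among students and between students and teachers, which helps students form stable, durable and flexible translation strategies; classroom instruction in translation techniques alone can hardly achieve this.

As outlined above, designing and building corpora is the foundation of corpus-based translation studies. Only once a corpus has been built can new topics be investigated. In recent years new corpora have been created continually and new topics explored. To take only the large General Chinese-English Parallel Corpus at Beijing Foreign Studies University, many studies have already been based on it (see Wang Kefei et al. 2004). Most recently (according to material from the World Translation Congress held in Shanghai in August 2008), Qin Hongwu used it to analyse translated language. He found that translated Chinese and original Chinese differ in their macro-level features: translated Chinese has lower lexical density than original Chinese, and longer sentences. This indicates that translated texts use a less varied vocabulary than original Chinese texts and tend toward longer sentences. Keyword comparison shows that the difference between translated and original Chinese texts is most marked in the use of nouns. Prepositions, conjunctions, pronouns and localizers are far more numerous and frequent in translated Chinese texts than in original Chinese texts, whereas the two show no clear difference in the use of verbs. In syntactic organization, the difference lies mainly in the linear distance between verb and object, which is clearly greater in translated language than in original language. Huang Libo used the same corpus to investigate the explicitation of personal pronoun subjects in English-Chinese translation. He examined the number, frequency and types of shift of personal pronoun subjects in two text types, literary and non-literary, and found: 1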
)文学与非文学英译汉时,人称代词主语数量和频次均呈减少趋势; 2 )从转换类型看,人称代词主语转换以对应关系为主,语际显化和隐化均不明显; 3 )相对非翻译汉语文本,汉语翻译文本类比显化突出。在英译汉过程中,人称代词主语语际转换表现出源语迁移现象;语内类比显化突出。他认为根本原因在于英、汉语对形式手段依赖程度的差异,以及语言相对社会地位和认知规律等因素的作用。 燕山大学新近研制的《红楼梦》中英文语料库也值得注意。这是一个一本多译的翻译语料库。该库以《红楼梦》汉语原本与三种英语译本构成句级平行语料库,可以进行中英对照、英英对照研究和互联网自动链接检索,使相关学者得以充分利用语料库储存数据大、计算机运行速度快、有多种语料库工具可以对大量数据进行分析的诸多优势,多方位多角度地对三个英译本在翻译技巧运用、文体设计、语言选择、文化内容处理等方面开展研究。该项目的主要特点是: 1 )收集了中文 120 回原文和 120 回的霍克斯、闵福德译本、杨宪益夫妇译本及 56 回的乔利译本。这一个原本、四个译本构成四个单语语料库,可分可合,既可作为一个中文语料库对应任何一个英译本的双语平行语料库,又可作为三个译本的单语语内语料库,分别用于中英对照和英英对照研究。 2 )该语料库做了句级平行对应,并对汉英句子的类型、语类、语域、语态及有无修辞、习语等进行了标注。在 ParaConc 等检索工具的帮助下可自由提取语料库中的汉英句对。各个单语数据库支持 WordSmith 、 AntConc 等检索工具对文本各种信息和语言特色的检索分析。 3 )可以对该语料库做统计分析,开展各项研究,具体包括中英文各文本词频表、词目表、词目分布表,各译本句目、句长、词语搭配、句子类型等语言特色以及翻译技巧的对比分析等。 4 )该库建立了网络搜索平台,实现互联网自动链接检索。通过该检索平台可进行基于汉英文本内容和句子属性的检索,同时还支持对译本、章回、句子属性、词频及句长等的选择性检索。 此外,更有意思的是,各地学者纷纷建立各具特色的语料库,开展自己感兴趣的研究。仍以不久前举行的世界翻译大会情况看,就有 1 ) Maeve Olohan 以自己的语料库研究翻译的普遍特征问题, 2 )胡开宝研制莎士比亚汉译语料库,探讨汉译中的显化问题, 3 )胡显耀基于自建汉译小说语料库进行翻译小说虚词特征研究, 4 ) Wallace Chen 基于语料库探讨建立英汉翻译显化模式, 5 )王建新基于语料库研究英汉语中的转折连接词, 6 )戴光荣,基于自建平行语料库基础上的翻译明晰化研究, 7 ) Vandeweghe Rura 则试图建立一个平行语料库作为译者的多功能辅助工具, 8 )董娜以自建语料库开展林语堂作品的翻译研究, 9 )夏云以自建小型语料库进行可比文本分析与应用文体翻译研究,等等。我们最近在计划创建一个新的语料库: 原创文学与翻译文学语料库 ,是一个类比语料库( comparable corpus ),拟收录约 1000 万字的 20 世纪前半叶原创的小说、散文(含小品文)和翻译小说、散文及其英文原文,以便 1 )考察现代汉语白话文的早期发展,特别是词汇和句式上的变化; 2 )考察 20 世纪前期英语文学翻译的特征和变化; 3 )分析比较原创文学与翻译文学在语言上的异同及翻译对创作在语言及其他表现形式上的影响。 由上述可知,语料库翻译学方兴未艾,新课题新探讨层出不穷。 4 、结语 语料库的兴起带来研究工具和研究方法的更新,进而导致语料库语言学和语料库翻译学的产生,带来研究范式上的变化和研究课题的丰富。在描写研究、应用研究和理论研究三大方面,语料库翻译学都大有可为。目前国内外学者们纷纷创建和研制各种语料库,并开展各具特色的研究,大大丰富了我们对翻译的认识,推进了学术的进步。 参考文献 Baker, M. Corpus linguistics and translation studies: Implications and applications . In M. Baker, G. Francis E. Tognini-Bonelli (eds.). Text and Technology: In Honour of John Sinclair . Amsterdam : John Benjamins, 1993: 233-250. Baker, M. Corpus-based translation studies: The challenges that lie ahead . In H. Somers (ed.). Terminology, LSP, and translation . 
Amsterdam: John Benjamins, 1996: 175-186.
Baker, M. Towards a methodology for investigating the style of a literary translator. Target 2000, 12(2): 241-266.
Bernardini, S. A trainee translator's perspective on corpora. Paper presented at Corpus Use and Learning to Translate, Bertinoro, November 1997. http://www.sslmit.unibo.it/introduz.htm, 2008-01-17.
Gellerstam, M. Translationese in Swedish novels translated from English. In L. Wollin & H. Lindquist (eds.). Translation Studies in Scandinavia. Lund: CWK Gleerup, 1986: 88-95.
Johansson, S. Seeing Through Multilingual Corpora. Amsterdam: John Benjamins, 2007.
Laviosa, S. Corpus-based Translation Studies: Theory, Findings and Applications. Amsterdam: Rodopi, 2002.
Laviosa, S. The corpus-based approach: A new paradigm in translation studies. Meta 1998, 43(4): 474-479.
Malmkjær, K. Translational stylistics: Dulcken's translations of Hans Christian Andersen. Language and Literature 2004, 13(1): 13-24.
Olohan, M. Introducing Corpora in Translation Studies. London: Routledge, 2004.
Stubbs, M. British traditions in text analysis -- From Firth to Sinclair. In M. Baker, G. Francis & E. Tognini-Bonelli (eds.). Text and Technology: In Honour of John Sinclair. Amsterdam: John Benjamins, 1993.
Tymoczko, M. Computerized corpora and the future of translation studies. Meta 1998, 43(2): 652-660.
Vanderauwera, R. Dutch Novels Translated Into English. Amsterdam: Rodopi, 1985.
Zanettin, F., S. Bernardini & D. Stewart. Corpora in Translator Education. Manchester: St. Jerome Publishing, 2003.
黄立波, 王克非. 翻译普遍性研究反思. 中国翻译, 2006(5): 36-40.
黄立波. 基于汉英/英汉平行语料库的翻译共性研究. 上海: 复旦大学出版社, 2007.
王克非, 黄立波. 语料库翻译学的几个术语. 四川外语学院学报, 2007(6): 101-105.
王克非, 秦洪武, 王海霞. 双语对应语料库翻译教学平台的应用. 外语电化教学, 2006(6): 43-49.
王克非等. 双语对应语料库: 研制与应用. 北京: 外语教学与研究出版社, 2004.
吴昂, 黄立波. 关于翻译共性的研究. 外语教学与研究, 2006(5): 296-302.
New Approaches to Translation Studies with Bilingual Corpora
WANG Kefei (National Research Centre for Foreign Language Education, Beijing Foreign Studies University, Beijing 100089, China)
Abstract: The rise of corpora has renewed the tools and methods of language research and given rise to corpus linguistics and corpus-based translation studies, opening new approaches to many issues. This paper surveys recent advances in corpus-based translation studies and the compilation of bilingual corpora in China and abroad, and proposes that research be pursued in three areas: descriptive, applied (including translation teaching) and theoretical studies. As corpora are shared and concordance tools are constantly improved, research on language and translation extends its scope from inter-lingual comparison to intra-lingual comparison, and bidirectional, multifold models of comparison become available as new approaches.
Key words: corpus, translation studies, translation teaching
Received: 2008-01-04; revised version: 2008-03-20
About the author: Wang Kefei, PhD, research fellow at the National Research Centre for Foreign Language Education, Beijing Foreign Studies University. Academic interests: linguistics, translation studies.
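The article above cites type/token ratio and mean sentence length as measures used in corpus studies of translator style. The sketch below shows one simple way to compute both; it assumes English text, a crude regular-expression tokenizer, and sentence boundaries at ".", "!" or "?". The function names are hypothetical and are not taken from any tool mentioned in the article.

```python
import re

def tokenize(text):
    """Lower-case the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text.lower())

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def mean_sentence_length(text):
    """Average number of word tokens per sentence; sentences end at '.', '!' or '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    lengths = [len(tokenize(s)) for s in sentences]
    return sum(lengths) / len(sentences)

sample = "The cat sat on the mat. The dog sat too."
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)          # 7 types over 10 tokens
msl = mean_sentence_length(sample)      # sentences of 6 and 4 tokens
```

Because type/token ratio falls as texts get longer, comparisons between translations of different lengths usually rely on a standardized variant computed over fixed-size chunks; that refinement is left out here.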
xupeiyang 2010-11-21 13:57
http://ling.cass.cn/yingyong/courses/corpusbase.htm#gaishu 语料库研究与应用综述 一 概述 语料库通常指为语言研究收集的、用电子形式保存的语言材料,由自然出现的书面语或口语的样本汇集而成,用来代表特定的语言或语言变体。经过科学选材和标注、具有适当规模的语料库能够反映和记录语言的实际使用情况。人们通过语料库观察和把握语言事实,分析和研究语言系统的规律。语料库已经成为语言学理论研究、应用研究和语言工程不可缺少的基础资源。 语料库有多种类型,确定类型的主要依据是它的研究目的和用途,这一点往往能够体现在语料采集的原则和方式上。有人曾经把语料库分成四种类型:( 1)异质的( Heterogeneous):没有特定的语料收集原则,广泛收集并原样存储各种语料;(2)同质的(Homogeneous):只收集同一类内容的语料;(3)系统的(Systematic):根据预先确定的原则和比例收集语料,使语料具有平衡性和系统性,能够代表某 一范围内的语言事实;( 4)专用的(Specialized):只收集用于某一特定用途的语料。除此之外,按照语料的语种,语料库也可以 分成单语的( Monolingual)、双语的(Bilingual)和多语的(Multilingual)。按照语料的采集单位,语料库又可以 分为语篇的、语句的、短语的。双语和多语语料库按照语料的组织形式,还可以分为平行(对齐)语料库和比较语料库,前者的语料构成译文关系,多用于机器翻译、双语词典编撰等应用领域,后者将表述同样内容的不同语言文本收集到一起,多用于语言对比研究。 语料库建设中涉及的主要问题包括: (1) 设计和规划:主要考虑语料库的用途、类型、规模、实现手段、质量保证、可扩展性等。 (2) 语料的采集:主要考虑语料获取、数据格式、字符编码、语料分类、文本描述,以及各类语料的比例以保持平衡性等。 (3) 语料的加工:包括标注项目(词语单位、词性、句法、语义、语体、篇章结构等)标记集、标注规范和加工方式。 (4) 语料管理系统的建设:包括数据维护(语料录入、校对、存储、修改、删除及语料描述信息项目管理)、语料自动加工(分词、标注、文本分割、合并、标记处理等)、用户功能(查询、检索、统计、打印等)。 (5) 语料库的应用:针对语言学理论和应用领域中的各种问题,研究和开发处理语料的算法和软件工具。 我国语料库的建设始于80年代,当时的主要目标是汉语词汇统计研究。进入90年代以后,语料库方法在自然语言信息处理领域得到了广泛的应用,建立了各种类型的语料库,研究的内容涉及语料库建设中的各个问题。90年代末到新世纪初这几年是语料库开发和应用的进一步发展时期,除了语言信息处理和言语工程领域以外,语料库方法在语言教学、词典编纂、现代汉语和汉语史研究等方面也得到了越来越多的应用。 语料库与语言信息处理有着某种天然的联系。当人们还不了解语料库方法的时候,在自然语言理解和生成、机器翻译等研究中,分析语言的主要方法是基于规则的(Rule-based)。对于用规则无法表达或不能涵盖的语言事实,计算机就很难处理。语料库出现以后,人们利用它对大规模的自然语言进行调查和统计,建立统计语言模型,研究和应用基于统计的(Statistical-based)语言处理技术,在信息检索、文本分类、文本过滤、信息抽取等应用方向取得了进展。另一方面,语言信息处理技术的发展也为语料库的建设提供了支持。从字符编码、文本输入和整理,语料的自动分词和标注,到语料的统计和检索,自然语言信息处理的研究都为语料的加工提供了关键性的技术。 下面先简要叙述1998年到2003年中国语料库建设的基本情况,然后介绍语料库的加工、管理和规范问题,最后谈谈语料库方法在语言研究和语言工程等方面的应用。由于以前的《中国语言学年鉴》很少谈及语料库问题,为了尽可能全面地反映我国语料库研究和应用的情况,必要时会将时间上限向前延伸几年。 二 中国语料库建设的基本情况 90年代末到新世纪初这几年投入建设或开始使用的语料库有数十个之多,不同的应用目的使这些语料库的类型各不相同,对语料的加工方法也各不相同。下面是其中已开始使用并且具有一定代表性的语料库。 (一)现代汉语通用语料库 这是一个由国家语言文字工作委员会主持建立、面向全社会应用需求的大型通用语料库,从90年代初开始建设,计划规模7000万字,主要应用目标是语言文字信息处理、语言文字规范和标准的制定、语言文字的学术研究、语文教育、以及语言文字的社会应用。 
这个语料库收录的语料以书面语为主、以书面语转述的口语为辅。语料来源是1919年至今,主要是1977年至今出版的教材、报纸、综合性刊物、专业刊物和图书。在设计原则上,讲求通用性、描述性、实用性和抽样的科学性。在语料分类方面,以门类为主,语体为辅为原则制定三个大类: 第一类:人文与社会科学类(包括8个次类、30个细类) 1 .政法类:哲学政治宗教法律 2 .历史类:历史考古民族 3 .社会类:社会学心理语言文字教育文艺理论新闻民俗 4 .经济类:工业经济农业经济政治经济财贸经济 5 .艺术类:音乐美术舞蹈戏剧 6 .文学类:小说散文传记报告文学科幻口语 7 .军体类:军事体育 8 .生活类 第二类:自然科学类(包括6个次类) 1 .数理类 2 .生化类 3 .天文地理类 4 .海洋气象类 5 .农林类 6 .医药卫生类 第三类:综合类(包括6个次类,30多个细类) 1 .行政公文类: 请示报告批复命令指示布告纪要通知等 2 .章程法规类: 章程条例细则制度公约办法法律条文等 3 .司法文书类: 诉讼辩护词控告信委托书等 4 .商业文告类: 说明广告调查报告经济合同等 5 .礼仪辞令类: 欢迎词贺电讣告唁电慰问信祝酒词等 6 .实用文书类: 请假条检讨申请书请愿书等 在不同类别、不同来源、不同时期的语言材料中,按照不等密度的思路确定合适的语料选取比例, 从共时和历时两个角度保证入选语料的平衡性,是这个语料库的特点。譬如,在语言材料的年限方面,选材比例是: 1919 年 1925年 5% 1926 年 1949年 15% 1950 年 1965年 25% 1966 年 1976年 5% 1977 年以后 50% 在语言材料的门类、语体和来源方面,选材比例是: 人文与社会科学类占59.6%。其中 各个次类在本大类中的比例是: 政法12.7% 历史 8.4% 社会14.0% 经济9.8% 艺术 6.7% 文学44.9% 军体 2.3% 生活1.4% 自然科学类占17.24%。其中 各个次类在本大类中的比例是: 数理 17.2% 生化19.1% 天文地理14.1% 海洋气象9.1% 农林22.8% 医药卫生17.7% 综合类占9.36%。其中 各个次类在本大类中的比例是: 各类应用文91.1% 其他8.9% 报纸类占13.79%。其中 各个次类在本大类中的比例是: 全国性报刊25% 省市报刊75% 这个语料库在选材过程中收集和记录语料的有关描述信息,为每个语料样本设立了20个描述项目:总号、分类号、样本名称、类别、作者、写作时间、书刊名称、编著者、出版者、出版日期、期号(版面号)、版次(初版日期)、印册数、总页数、开本、选样方式、样本起止页数、样本字数、样本总数、繁简字。用户可以利用这些语料描述标记根据各自的需要进行各种方式的检索。语料库的建库工作分为两步,第一步先建立核心语料库(由7000万字的语料中筛选出2000万字语料组成)。到90年代末,完成了 2000万字生语料 的收录工作。从2001年开始,对2000万 字核心语料进行分词和词性标注加工。 (二)《人民日报》标注语料库 《人民日报》标注语料库由北京大学计算语言学研究所和日本富士通公司合作,从1999年开始,到2002年完成,原始语料取自1998年全年的《人民日报》,共约2700万字,到2003年又扩充到3500万字,是我国第一个大型的现代汉语标注语料库。这个语料库加工的项目有词语切分和词性标注,还有专有名词(人名、地名、团体机构名称等)标注、 语素子类标注、动词、形容词的特殊用法标注和短语型标注。下面是一段语料标注的示例,对于 1998年1月1日第5版第1篇文章的第11段: 我国的国有企业改革见成效。位于河南的中国一拖集团有限责任公司面向市场,积极调整产品结构,加快技术改造和新产品研制步伐。图为东方红牌履带拖拉机生产线。(赵鹏摄) 标注后的形式是: 19980101-05-001-011/m 我国/n的/u国有/ vn 企业/n改革/v见/v成效/n。/w位于/v河南/ns的/u nt 面向/v市场/n,/w积极/ad调整/v产品/n结构/n,/w加快/v技术/n改造/ vn 和/c新/a产品/n研制/ vn 步伐/n。/w图/n为/v东方红牌/ nz 履带/n拖拉机/n生产线/n。/w(/w赵/nr鹏/nr摄/Vg)/w 在每一个切分出来的词和标点符号后面,是该词语的标记。譬如词性标记(n,v,a,u,m,w等),专有名词标记(nr,ns, nz 等), 语素子类标记( Vg等),动词和形容词特殊用法标记( vn ,ad)。所有的标记都是以北京大学的《现代汉语语法信息词典》为基础词库,在一个加工规范的指导下标注的。 
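The "word/tag" notation described above is straightforward to process. The following is a minimal sketch, assuming the tokens of a line are separated by whitespace and each token ends in a slash followed by its tag; the sample line reuses the first sentence of the annotated example, and the function name is hypothetical.

```python
from collections import Counter

def parse_tagged_line(line):
    """Split a 'word/tag word/tag ...' line into a list of (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST slash, so a word containing '/' survives
        word, sep, tag = item.rpartition("/")
        if not sep:
            continue  # skip items that carry no tag
        pairs.append((word, tag))
    return pairs

line = "我国/n 的/u 国有/vn 企业/n 改革/v 见/v 成效/n 。/w"
pairs = parse_tagged_line(line)

# Frequency of each tag in the line, e.g. how many common nouns (n) it contains
tag_counts = Counter(tag for _, tag in pairs)
```

Counting tags in this way over a whole corpus gives the kind of word-class distribution statistics that a tagged corpus makes possible and a raw corpus does not.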
利用《人民日报》标注语料库,人们可以从各个角度考察和分析语言事实,统计各种语言单位出现的频率,譬如,词语或词类的分布、搭配和共现,专有名词的结构方式、兼类词在句子中的表现,语素字的使用情况,等等。也可以从语料里提取各种语言单位或语句片段作为研究实例。与仅仅以汉字串的形式表示的生语料相比,经过标注的熟语料显然含有更多的语言学特征信息,对汉语词汇研究、语法研究和汉语信息处理系统来说是更好的语言知识资源。 《人民日报》标注语料库中一半的语料(1998年上半年)共1300万字已经通过《人民日报》新闻信息中心公开提供许可使用权。其中一个月的语料(1998年1月)近200万字在互联网上公布,供自由下载。 (三)用于语言教学和研究的现代汉语语料库 建立现代汉语语料库的主要目的之一是对外汉语教学和现代汉语研究,可以分为书面语语料库和以文本形式表示的口语语料库两类。前者如北京语言大学的汉语中介语语料库、现代汉语研究语料库,后者如中国社会科学院语言研究所的北京地区现场即席话语语料库。 汉语中介语语料库的建设目标 是为对外汉语教学、中介 语研究、偏误分析和汉语本体研究提供资源,因此它的 语料来源很有对外汉语教学的特点。作者先在北京和其他省市的9所高等院校里,从来自96个国家和地区的1635位外国留学生那里收集了成篇成段的汉语作文或练习材料5774篇,共3528988字。再从中抽取了740人的1731篇语料,共有44218句,1041274字。全部语料都记录了学生姓名、性别、年龄、国别、是否华裔、第一语言、文化程度、所学主要教材、语料类别、写作时间、提供者等23项属性。然后对这104万字的语料进行词语切分、词性标注以及一些专用的语言学特征标注。例如,标出了字、词、句、篇等不同的层次,对语料的非规范形式(例如:错字、别字、繁体字、拼音字、非规范词等)做出索引标记,记录其对应的规范形式。这个语料库的管理系统 有语篇属性登录、文本过滤、文字预处理信息登录、语料抽样、断句、分词、词性辅助标注、自动标注以及语料的主题检索、全文检索和数据浏览等各种功能,分别处理语料库的建立、管理和维护,以及用户浏览、查询和检索等。 与人工收集的学生病句卡片资料相比,中介语语料库能够更好地反映学生学习汉语的情况,帮助教师更加全面地观察他们的学习过程,了解影响学习和习得的各种因素。在汉语作为第二语言的教学中,为教材编写、课堂教学、测试等环节提供依据。 现代汉语研究语料库的建设目标是为语言学家提供一个研究平台,由2000万字的粗语料库和200万 字经过分词和词性标注的精语料库两个部分组成。粗语料库收录的语料样本中绝大部分是九十年代的出版物,有《人民日报》1000万字,《中国新闻》500万字,各种书籍250万字,文学作品150万字,准口语材料(书面形式的对话、独白)100万字。精语料库的200万字语料样本是从粗语料库中按照规定的比例由计算机随机抽取的,有书面语语料160万字,准口语语料40万字,是从语体、题材、体裁三个方面均衡选取的平衡语料库。为了对这些语料进行词语切分和词性标注,作者制定了词语切分的细则和词性标记体系的原则,采用了一个含有112个词类标记的标记集,确定了兼类词的处理方法。这个语料库的管理系统具有建库、检索、浏览、统计、输出等功能,可以按词或词类检索,统计出词的频率、词类频率、词类共现频率、平均词长、平均句长等结果。这个语料库建成以后,很快应用在现代汉语语法、汉语教学和汉语信息处理的研究中,研究内容涉及现代汉语的插入语、汉语句子的主题-主语标注、V+N序列实验分析、词性标注中词语归类问题、动宾组合的自动获取与标注,等等。 建设北京地区现场即席话语语料库的目的是,通过收集大量的现场即席话语语料研究现场即席话语的各种动态机制,以揭示现场即席话语的使用规律。这个语料库的研究策略和取样方法很有特点,首先是严格区分资源库和语料库,资源库收集符合现场即席话语定义的录音材料,语料库收录按照一定标准从资源库提取出来的材料;另外在语料采样前先做摸底性研究,通过研究对现场即席话语的真实情况有所了解,确定取样域,再定取样范畴,然后根据取样范畴 去录现场典型材料,这是一种层次范畴化的取样方法。这个语料库目前正在建设之中,已经取得了近 600小时的录音材料和50多小时的 录象材料。 在用于汉语研究的语料库中,讲究选材均衡,注重语料加工,同时也提供公开服务的,当数台湾中央研究院历史语言研究所的现代汉语平衡语料库 (简称 Sinica Corpus) 。这个 语料库的规模为500万个词,每个句子 
都依词断开,标示词类标记,并且配备了检索系统,在网上开放供大家使用。根据自己制定的一套汉语文本属性特征为语料分类,在不同的类别上尽量均衡地采集语料,是这个语料库的特点之一。文本属性用来说明文档的呈现方式、文章的写作方式、文章写作的内容和文档的来源出处,包括 7类,每类下设若干小类: 文类 (文档的呈现方式) 报导、评论、广告图文、信函、公告启事、小说故事寓言、散文、 传记日记、诗歌、语录、说明手册、剧本、会话、演讲 、会议记录 文体 (文章的写作方式) 记叙、论说、说明、描写 语式 (文档的呈现方式) 书面语、演讲稿、剧本/ 台辞 、口语谈话、会议记录 主题 (文章写作的内容) 哲学、科学、社会、艺术、生活、文学 媒体 报纸、一般杂志、学术期刊、教科书、工具书、学术论著、一般图书、 书信、视听媒体、其它 作者 姓名、性别、国籍、母语 出版 出版单位、出版地、出版日期、版次 不同研究目的的语言学者可以自己按语式、文体、媒体和主题的小类选取不同类别的语料,组成自订语料库,在自订语料库的范围内进行语料的检索和统计。除了通常的按词语、词类的检索和统计以外,这个语料库的管理系统还提供了一种进阶处理功能,对检索出来的数据作进一步处理,对处理的结果还可以再次处理,形成多层的检索结果。 (四)面向语言信息处理的现代汉语语料库 90年代中后期,面向语言信息处理的现代汉语语料库开始建立并投入应用。其中最早开发的是清华大学用于研究和开发汉语自动分词技术的现代汉语语料库,经过几年的积累已达到 8亿多字生语料 。在这个语料库的支持下, 用统计语言模型的方法研究了汉语自动分词中的理论、算法和技术,编制了总数为 9万多个词语的《信息处理用现代汉语分词词表》。这些研究工作体现了我国汉语自动分词技术的发展水平,词表被许多汉语自动分词系统作为底 表使用,是不可缺少的基础资源。 TH通用语料库系统是清华大学建立的另一个现代汉语语料库。这个语料库有两个特点,一是语料库管理系统根据不同的加工深度,分四个等级管理语料。第一级是生语料分库,有4千余万字;第二级以上都是加工程度不同的熟语料库,其中第二级存放经过自动分词并由人工校对过的初加工语料500余万字;第三级存放经过词性标注和人工校对的语料约300万字;第四级是经过句子成分标注和人工校对的语料。每个分库又按语料的来源分成一般书籍、报纸、杂志、论文和工具书五类子库。不同等级的语料可以为不同的应用目标服务。第二个特点是在这个语料库的支持下,进行了汉语信息处理技术的研究。譬如,采用以谓语为中心的句型成分分析与语料统计相结合的方法,自动分析汉语的句型,提出了一个汉语句型频度表;在汉语文本中自动标注句子成分和句型成分的边界;根据指定的句型在语料库里搜寻句子实例,等等。 HuaYu 人工标注语料库是清华大学和北京语言大学合作建立的一个现代汉语平衡语料库。这个语料库按文学、新闻、学术、应用文四个大类收录了200余万字语料。它的特点是讲究加工的深度,除了词语切分和词性标注以外,还根据语句中动词的类型和句子的长度进行语块标注和句法树标注,目的是为建立汉语短语分析或句法分析的语言模型获取统计数据提供资源。下面分别 是语块标注和句法树标注的示例。 对句子自古以来,人类就重视档案的保存和利用,设置馆库、选派专人进行管理。 进行语块标注以后得到的是一个无嵌套的线性序列,其中 S是主语 语块, P是述语 语块, O是宾语 语块: 我/ rN ] 的/u 书/n ] ] ] ] 。/w ] (五)用于开发特定语言分析技术的专用语料库 这类语料库是针对汉语信息处理技术的需要专门建立的。例如山西大学的专有名词标注语料库和分词与词性标注语料库。 分词与词性标注语料库,规模为500万字,带有分词标记、词性标记和句法标记。标注时依据《信息处理用现代汉语分词规范》和《信息处理用现代汉语词类及标记集规范》。在这个语料库的支持下,开发汉语自动分词和词性标注软件,研究自动分词和词性标注的评测技术。为了解决汉语自动分词中的切分歧义问题,还建立了交集型歧义字段库和组合型歧义字段库,专门收集这两种类型的歧义切分实例。前者有7.8万字,后者收录了140多条。并且在分词和词性标注语料库里作了这两类切分歧义的标注。利用这些语料调查交集型歧义当中的伪歧义现象(既切分结果只可能有唯一选择的那些交集型歧义切分字段),发现这种现象在歧义切分字段中很普遍,可以达到90%以上。 
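An intersection-type (overlapping) ambiguity arises when two dictionary words share characters, so that a string such as 结合成 can be read as 结合 + 成 or 结 + 合成. The brute-force sketch below lists such strings for a given lexicon. It is only an illustration of the definition, not the procedure used to build the ambiguity libraries described above; the toy lexicon, the `max_len` limit and the function name are all assumptions.

```python
def find_overlap_ambiguities(text, lexicon, max_len=4):
    """Return every substring covered by two lexicon words that partially overlap."""
    found = set()
    n = len(text)
    for i in range(n):
        # first word: text[i:j]
        for j in range(i + 2, min(i + max_len, n) + 1):
            if text[i:j] not in lexicon:
                continue
            # second word: starts inside the first word and ends beyond it
            for k in range(i + 1, j):
                for m in range(j + 1, min(k + max_len, n) + 1):
                    if text[k:m] in lexicon:
                        found.add(text[i:m])
    return found

toy_lexicon = {"结合", "合成", "成分", "分子"}   # hypothetical word list
ambiguous = find_overlap_ambiguities("结合成分子", toy_lexicon)
```

Whether a listed string is a true ambiguity or a pseudo-ambiguity with only one sensible reading still has to be decided from context, which is why annotated instances are collected in a corpus.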
专有名词标注语料库用于研究汉语自动分词中专有名词的识别算法。其中包括标注了中国地名的语料280万字,标注了中国人姓名的语料300万字,标注了西文姓名的语料250万字,标注了汉语机构名称的语料50万字,还有标注了网络新词语的语料150万字。利用这些语料,建立了中国地名用字、用词库,姓氏人名库,姓氏用字频率表,名字用字频率表等, 用统计语言模型的方法识别专有名词。 (六)双语语料库 基于实例的机器翻译(Example-based)需要大规模的双语平行语料库来支持。语料库里 的源语和目标语实例要按照相同级别的翻译单位一一对齐。目前已有的双语平行语料库主要是汉语和英语的,语料对齐的单位有句子级的、子句级的、短语级的,也有词汇级的。机器翻译系统把要翻译的句子与语料库里的源语实例进行对比,分析相似程度,找到最适合的源语实例,再参照与它对齐的目标语实例生成译文。用于这类机器翻译系统的双语语料库必须有一定的规模,用人工做语料对齐的工作显然很难满足要求。这就使文本自动对齐成为建立双语语料库的关键技术。 在目前已有的双语语料库中,哈尔滨工业大学的汉英平行语料库已经直接用来开发英汉双向机器翻译系统。这个语料库有6万个汉语和英语的句子,使用多级对齐加工技术,分别按照句子、短语结构和词一一对齐。中国科学院计算技术研究所的汉英双语语料库有 20万个句 对,也完成了句子一级的对齐,并在网上提供查询服务。北京大学、中国科学院软件研究所等单位也建立 了按句对齐的汉英双语语料库。除此之外,还有以语段或短语为单位收集的汉英双语语料库,譬如中国科学院自动化研究所的汉英双语短语库,有 3~5万对已对齐的汉语和英语短语。东北大学的英汉双语语段库,用来帮助建立电子版的英汉搭配词典。 (七)面向汉语史研究的语料库 面向汉语史研究的语料库建设是从搜集汉语史文献资料开始的。台湾中央研究院历史语言研究所从90年代初期就开始了这项工作,他们先收集上古汉语的语料,然后扩展到中古汉语和近代汉语。90年代中后期逐步开始上古汉语语料和近代汉语的标注,在该院信息研究所和计算中心的协助下进行标注技术和检索技术的开发。根据是否经过分词处理和词性标注,台湾中央研究院的古汉语语料库和近代汉语语料库可以分成两类:生语料库和标记语料库。目前生语料库收集的语料已涵盖上古汉语(先秦至西汉)、中古汉语(东汉魏晋南北朝)、近代汉语(唐五代以后)的大部分重要文献资料,并己陆续开放使用。在标记语料库方面,上古汉语及近代汉语都已有部分语料完成标注工作,也逐步提供网上检索。2001年底,开放了近代汉语标记语料库WWW 版供各界使用,首先提供查询的文献是《红楼梦》及《三遂平妖传》。在查询方面,除了常用的功能以外,还可以在显示词项及词类的同时给出例句的出处,便于历史语法的研究者使用。 多年来中国社会科学院语言研究所也一直在致力于文献资料的建设,搜集整理了近代汉语书面语语料 150万字,中古近代汉语语料约1 千万字,部分语料已作了标注。目前已经完成了一个小型语料库,包括:敦煌变文集、祖堂集、三朝北盟汇编、 碧岩录、朱子语类、刘知远诸宫调、西厢记诸宫调、元刊全相平话五种、元典章 刑部、老乞大谚解、朴通事谚解、孝经直解、鲁斋遗书、经筵讲义等 十余种文献,成为汉语史和语言学理论研究的重要资源。此外,语言研究所的先秦专书电子文档有4部文献,共约120万字,并且已由古汉语学者逐篇逐句标注了语法信息。 上海师范大学、浙江师范大学、四川大学等学校也依据各自汉语史研究的方向,建立了历史文献语料库。四川大学的中古汉语语料库有1亿字的中古汉语语料和有关中古汉语研究的资料。浙江师范大学的 楚辞语库 、前四 史语库、六朝语库、太平广记语库、唐诗语库、宋词语库,已用于 前四史语言研究和唐宋诗词语词研究。 目前历史文献语料库建设的特点是依托学科建设和研究方向,广泛收集资料,注重校勘精审。随着汉语史研究和语料库应用的发展,资源共享和语料加工将得到越来越多的重视。历史文献资源共享,首先要避免语料的重复收集,还要采用国际通用的标准处理语料文本,使语料能够准确、方便地交换和使用。语料加工则是充分发掘语料应用价值的基础工作,从收集历史文献的电子文档,到建成一个具有必要的语言学标记信息、合理的逻辑结构和方便的检索功能的语料库,语料的加工是不可或缺的一步。 (八)比较语料库 为了研究汉语在不同地区的使用情况,香港城市大学建立了LIVAC共时语料库(Linguistic Variation in Chinese Speech 
Communities)。语料来自香港、台湾、北京、上海、澳门及新加坡六地有代表性的中文报纸,以及电子媒介上的新闻报道。自1995年7月开始,每四天一次,收集这六个地区的对等书面语文本,每次约两万字。内容包括新闻、特写、评论等文章。到2003年上半年,已收集了1亿1千多万字、超过56万个词条。计划收集到2005年6月,囊括新旧世纪交接点前后各五年各地华语社区有代表性的重要语言数据,供汉语的各种共时比较研究使用。 在语料的组织和加工方面,这个语料库用计算机自动分词,再经人工校对分类, 可以依字、词、句为基础进行检索,提供字、词配搭、分布等数据,有统计功能。语言学家能通过这个语料库考察上述六地出现的新词、词义有所发展或转移的旧词、以及有地方特色的词语,还可以对具体字或词的频率作统计比较,对字词的差别作计量分析。对研究华人社区的文化、社会、语言差异也有作用。这个语料库的一部分已经在网上提供服务。 (九)少数民族语言语料库 新疆大学从2002年起开始建设现代维吾尔语语料库系统,计划包括5个部分:语料库、电子语法信息词典、规则库、统计信息库和检索统计软件包。其中语料库部分又分成生语料库(经初步整理的原始语料)和加工语料库(经过标注和校对的语料)。目前已有生语料800万词。另外, 新疆大学也正在以新闻领域的维汉 -汉维机器翻译为目标,建设双语平行语料库。内蒙古大学的中世纪蒙古文语料库收集了《元朝秘史》、《黄金史》、《回鹘蒙古文文献集》等历史文献。他们还建立了500万词的现代蒙古语语料库,研究了蒙古文附加成分的自动切分、复合词的自动识别和语料的词性标注,获得了词频统计、音节统计、词类统计、附加成分统计等数据。西北民族大学建立了1亿3 千万字节的大型藏文语料库,用于藏文词汇频度和通用度的统计。中国社会科学院民族学与人类学研究所建立了500万藏语字符的藏语语料库,进行词语切分和标注的研究。新疆师范大学也建立了200万词的维吾尔语语料库。 与汉语语料库相比,少数民族语料库的建设还需要解决一些特殊的问题,譬如拼音文字转写的标准和规范,词语分类体系及其标记集等。 到2003年,已建和在建的各种文本语料库还有很多(包括书面语语料库和以文本形式表示的口语语料库),以上提到的只是有代表性的一部分。与文本语料库相对的,是语音语料库 。语音语料库不仅记录语图、声学参数等语音学数据,还有句法、韵律等各种语言学信息标记和副语言学信息标记,可以 在语音识别与合成系统中用来建立语音模型, 用于语音研究、语音工程开发和汉语普通话教学等领域。语音技术是当前信息技术和通讯领域里最具潜力的发展方向之一,语音语料库在 科研和工程上有很高的使用价值。关于语音语料库的详细情况,请见语音学和言语工程研究综述。 三 语料库的加工、管理和规范 (一)语料的加工 一个计算机语料库的功能主要与三个因素有关,一是语料库的规模,二是语料的分布,三是语料的加工程度。规模的大小关系到统计数据是否可靠,语料的分布涉及统计结果的适用范围,语料加工的深度则决定这个语料库能为使用者提供什么样的语言学信息。 加工语料主要指文本格式处理和文本描述两项工作,前者是对采集的语料文本进行整理,转成统一的电子文本格式,例如数据库格式、XML文本格式等。后者是描述每一篇语料样本的属性或特征,包括篇头描述 和篇体描述。篇头描述说明整篇语料样本的属性,例如语体、内容所属的领域、作者、写作时间、来源出处等等,篇体描述是在文本里添加各种语言学属性标记,对于汉语书面语语料库来说,常见的是词语切分标记、词性标记、专有名词标记,还有某些语法特征如短语标记、子句标记,或语义信息标记,等等。对汉语书面语语料的加工一般是从词语切分、词性标注,到语法、语义属性标注,按顺序进行。标注的信息逐步增多,语料加工的深度也就逐渐增加。人们通常把没有篇体描述信息的语料叫做生语料。对汉语的生语料只能以字为单位进行检索和统计。经过词语切分处理的语料,就能以词为单位进行检索、统计和定量分析。如果还作了词性标记,那么可以获得的语言学信息就更多了。语料的标注如果由人来做,当然能够保证准确性,但是人工标注对处理大规模的语料显然不够现实。所以几乎每一个大规模语料库的加工都需要借助自动化的手段,词语自动切分、词性自动标注等就成为备受关注的语料加工技术。 自动分词是我国最早开始研究的汉语信息处理技术之一。语料库的建设开始以后,自动分词技术在语料加工中又得到了应用和发展。自动分词和词性自动标注一般都需要一个词典,作为分词和词性标注的基础。这个词典与常用的语文词典相比,收录的词目不大一样,包括了语言学家认可的词,以及一些 
比词小的单位(如语素字、词缀等)和一些比词大的单位(如成语、习语、简称略语等)。词典中也包括词类信息和其他语法信息。目前的自动分词技术是基于字符串匹配原理的,有正向最大匹配、逆向最大匹配等基本算法。在切分过程中会出现歧义现象,如何处理歧义是自动分词研究的重点之一,在这方面投入的研究也最多,先后提出了短语结构法、专家系统法、隐马尔科夫模型、串频统计和词匹配等辩识歧义的方法。识别未登录词是自动分词研究的第二个重点。未登录词指没有被分词底表收录的词语,包括人名、地名、机构名等专有名词和新出现的词语。对未登录词的识别一般以基于语料库的统计语言模型方法为主。 词性自动标注通常与自动分词同时进行,根据带有词类信息的分词词典,给切分出来的词语标上初始的词类标记。对于兼类词,必须在句子里判断类别。因此需要分析兼类词语在上下文中的分布特点和语法功能,并用形式化的方式表达出来,作为词性标注系统排除兼类的规则。近年来,已经有几个自动分词和词性自动标注系统投入了应用,其中北京大学用自己研制的系统为《人民日报标注语料库》做分词和词性标注的初加工,北京语言大学的自动分词系统也成为其《面向语言教学研究的汉语语料检索系统》中的关键技术。此外,经过十几年的研究和实践,2001年发布了收录9万多词语的《信息处理用现代汉语分词词表》和《现代汉语词类及标记集规范》。对于1993年制定的国家标准《信息处理用现代汉语分词规范》的可操作性问题,也进行了积极的讨论和实验,提出了有效的解决方法。关于自动分词和词性自动标注的详细情况,请见计算语言学和自然语言信息处理研究综述。 经过分词的语料,除了标注词性以外,还可以进一步标注其他语言学属性,譬如韵律、语调、短语结构、句法结构、语义关系等等。句子的语法结构需要有形式化的方式来表达,大多数语料库或者采用短语结构树,或者采用依存语法树的方式,这样标注过的语料库就成为 短语树库或句法树库。一般情况下,在词性标注的基础上再作进一步的语法标注加工,多以人工为主,也有关于自动短语定界和句法信息自动标注的研究和实验。目前已有的汉语短语库、句法树库规模都不大,至多百万词级。 在双语语料库的建设中,除了上述语料加工项目以外,还有一项不可缺少的语料加工任务:双语语料对齐。语料对齐分为段落、句子、子句、短语和词语几个不同的层次。如果考虑用计算机程序做自动对齐,不同的层次要解决的问题各不相同。每种语言的段落都有可识别的标志,因此段落的对齐最容易实现,句子的对齐在印欧语言之间比它们和汉语之间要容易,词语的对齐需要借助词典,句子内的各种结构要自动对齐则是最难的。目前双语自动对齐技术的研究主要是针对句子和句子内的结构,采用的方法有基于长度的、基于词典的,或者是这两种方法的混合策略。 (二)语料库管理系统 经过科学选材和标注、具有适当规模的语料库,还应该有一个功能齐备的管理系统,包括数据维护(语料录入、校对、存储、修改、删除及语料描述信息项目管理)、语料自动加工(分词、标注、文本分割、合并、语料对齐、标记处理等)、用户服务功能(查询、检索、统计、打印等)。其中数据维护部分主要涉及汉字字符处理、文本处理、文件管理等计算机程序设计技术。语料自动加工部分的主要内容是自动分词、各种语言学属性的标注技术,已经在前面专门介绍过了。这里主要谈谈面向用户的语料检索、统计和分析技术。 语料检索是一种全文检索技术,但是也有自己的特点,仅用普通的全文检索技术还不能满足语料检索的需要。这是因为,全文信息检索关心的是检索目标的意义,不是检索目标的语言表述形式。而面向语言研究的语料检索则特别注重语言的表述形式,它既需要 按照字、字串和词检索,也需要把词语的语言学属性作为检索的目标和约束条件,还要求把检索的结果或目标的出处按照研究的需要排序、输出。除此之外,还要有字频、词频和特定语言形式出现频率的统计功能。 对汉语生语料的检索和统计是以字或字串为单位进行的。这一类检索系统主要以单字索引和字符串匹配为关键技术,由于把词语当作字串来检索,所以检索结果中经常出现非词的问题。例如要查找出警,检索结果中除了迅速出警、拒绝出警、出警次数等实例以外,发出警告、放出警犬等也混在其中。为了解决这些问题,常常需要为字符串匹配的检索表达式另外设置限制条件。这些限制条件大多是个性的,只能排除一部分非词的实例。要想从根本上解决这个问题,就必须对语料作词语切分。经过词语切分处理的熟语料, 
能以词为单位进行检索、统计和定量分析。但是熟语料库的加工代价很高,而且对于语料的词语切分和词性标注,目前还没有既成熟又便于操作的规范,所以近年来,面向生语料库的检索技术一直在广泛应用,并且在用户功能方面不断发展。譬如,可以对用户给出的任何生语料快速生成索引;可以使用具有复合逻辑关系的检索表达式;可以按照汉字、拼音、笔画对检索结果的上下文自动排序;可以提供检出实例的来源、出处;可以按字频统计的数据排序;检索结果和统计结果既可以按文本形式输出,也可以按数据库形式输出;还可以通过网络支持多用户远程检索。 对于经过词语切分处理和词性标注的熟语料库,除了所有生语料的检索功能以外,语料检索系统还可以把词语或词性作为检索的关键字或限制条件,得到关于这些语言学属性的检索和统计结果,并按各种排序和输出形式的提供给用户。语言学属性来自语言学家对汉语的研究,研究过程中有各种观点和认识,从词的定义到词类的确定,一直还没有统一的意见。另一方面,人们检索语料时的目的也各不相同,有的关心词汇问题,有的关心语法现象,还有的目标是汉语信息处理的应用问题。因此 对于熟语料库检索来说,一个好的检索系统应该能够包容各种不同的语言学观点,可以用于不同的检索目的。 为了做到这一点,通常采用的办法是,把用于语料库自动分词的底表和 附着于底表的词性、构词等属性都看作语言学属性表,使这个属性表与检索系统的程序相互独立,检索系统只把属性标记作为抽象的字符串处理,而把建立属性表的工作交给用户。以北京语言大学的《面向语言教学研究的汉语语料检索系统》为例,它的自动分词词表、词属性集和每个词的属性标记都由用户提供,提供的方式是把词目和它的属性标记登记在数据库里。检索系统使用用户提供的这个属性表对生语料自动分词,并生成索引,供给用户检索。检索系统对属性表没有任何限制,规模可大可小,表中的词目也可以跟通常认为的词没有关系,属性可以是语法的,也可以是构词的、语义的、语音的,等等。这样用户就能根据自己的需要检索和研究各种字串在语料中的表现。 把语料加工技术集成在检索系统里面,是语料库检索系统的另一个特点。语料加工技术一般指词语自动切分和词性自动标注。在北京语言大学的语料检索系统中,未登录词的自动识别技术比较有特点。它可以识别各种数字串、中西人名、中西地名、机构名、后缀短语等,并为它们建立索引,供用户检索和统计。 (三)语料库的规范问题 语料库的规范问题主要是对语料加工而言的。汉语语料库首先遇到的规范问题是词语切分。我国90年代初发布了国家标准《信息处理用现代汉语分词规范》(标准号为GB/T13715-92)。这个规范基本上采用《暂拟汉语教学语法系统》中的观点,把词定义为最小的独立运用的语言单位。针对汉语语素、词和词组界限不够清晰的问题,还特别提出了分词单位的概念。把分词单位定义成汉语信息处理使用的具有确定的语义或语法功能的基本单位,并且用结合紧密、使用稳定的原则作为判断分词单位的标准。这样做的目的是避免关于如何界定词的争论。但是结合紧密、使用稳定的原则缺少可操作性,对于自动分词研究中的具体问题常常难有定论。于是就有了根据规范制定一个词表,用规范+词表的办法指导分词的建议。这样在90年代中期和末期,分别提出了收词43570条的《信息处理用现代汉语常用词表》和收词9万多条的《信息处理用现代汉语分词词表》。其中后者是在8亿字的大规模语料库支持下,采用串频、互信息、相关度等计算统计方法,依据定量的数据分析结果辨识分词单位的。与此同时,语言学家也参与了制定这个词表的工作,他们提出的各种语言学规则,从定性分析的角度与统计数据相互作用,最后经过人工审定,确定了92843个词目,其中一级常用词56606个,二级常用词36237个,成为目前许多自动分词系统使用的词表。 90年代中期,台湾的计算语言学会也提出了一个《资讯处理用中文分词规范》。这个规范有三条基本原则,一是分词单位必须符合语言学理论的要求;二是在信息处理上切实可行;三是能够确保真实文本处理的一致性。它把分词规范分成信、达、雅三个不同的等级, 信级是基本资料交换的标准,达级是机器翻译、情报检索等自然语言处理的标准,雅级则是分词的最好结果。这样可以根据不同的应用目的做难易程度不同的分词处理。 词语切分以后,下一个规范问题就是词性标注。经过十多年的词性标注研究和实践,教育部语言文字应用研究所于2001年提出了《信息处理用现代汉语词类标记集规范》。这个规范吸收了语言学家的研究成果,也兼顾了已有的各个用于语言信息处理的词类系统,制定了标记现代汉语书面语词类的符号集,使各种汉语信息处理应用系统能够尽量使用统一的词类标记,有助于信息交换和资源共享。 
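Mutual information, one of the statistics mentioned above for identifying segmentation units, measures how much more often two characters occur together than chance would predict. The sketch below computes pointwise mutual information for adjacent character pairs in a tiny raw corpus. The three-sentence corpus and the function name are invented for illustration, and a real word list would combine this score with string frequency and manual review.

```python
import math
from collections import Counter

def char_bigram_pmi(corpus):
    """PMI(x, y) = log2( P(xy) / (P(x) * P(y)) ) for each adjacent character pair."""
    chars = Counter()
    bigrams = Counter()
    for sent in corpus:
        chars.update(sent)
        bigrams.update(sent[i:i + 2] for i in range(len(sent) - 1))
    n_chars = sum(chars.values())
    n_bigrams = sum(bigrams.values())
    pmi = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = chars[pair[0]] / n_chars
        p_y = chars[pair[1]] / n_chars
        pmi[pair] = math.log2(p_xy / (p_x * p_y))
    return pmi

toy_corpus = ["语料库很大", "语料很多", "他很高"]   # hypothetical raw sentences
pmi = char_bigram_pmi(toy_corpus)
```

In this toy corpus the pair 语料 scores higher than the accidental pair 料很, which is the kind of signal that suggests 语料 is a candidate segmentation unit.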
标注短语和句子结构是语料库进一步深加工的内容,虽然目前尚处于起步阶段,但已经在标注的同时考虑了规范的问题。清华大学提出的《汉语句子的句法树标注规范》,主要包括句法标记集的内容描述、句法树的划分规定、歧义结构的处理、结构分析的方向性等问题。上海师范大学根据自己制定的《汉语文本短语结构人工标注规范》,对100万字的1997年《读者文摘》进行了分词、词性标注和人工标注短语的试验。哈尔滨工业大学采用包含23个短语符号的标记集合,开发了一个8000个句子的 汉语树库。清华大学还建立了一个基于语义依存关系的语料库,也涉及到标注体系的选择和标注关系集的确定。这些工作规模都不大,在规范方面还处于各自为政的状态。随着语料的进一步深入加工,统一规范将成为不可避免的问题。 北京大学的《人民日报》标注语料库是目前规模最大的汉语基本标注语料库。在它的开发过程中,各种加工规范起了关键的作用。在这些加工规范中,有词语的切分规范,主要规定把句子的汉字 串形式切分为词语序列的原则;有现代汉语词类及标记集规范,规定切分出来的词语、短语、标点符号的类别和标识符号;有切分和标注相结合的规范,规定语素构成合成词的方式(重叠、附加和复合);有标注规范,规定词性标注与词库的关系,主要解决如何在上下文环境里确定兼类词的词性;还有收词 7万余条的词库《现代汉语语法信息词典》。加工大规模的语料是一项浩大的语言工程。语料标注的准确性和一致性需要靠完善、合理的词库和严谨、实用的加工规范来保证。《人民日报》标注语料库的加工规范和《现代汉语语法信息词典》是语言学家和信息处理专家合作,在汉语语法研究的理论和方法指导下,根据汉语信息处理的实际需要制定和开发的。在标注大规模语料的实践中,又得到了验证和完善。 除了语料加工以外,语料库还应该在语料的采集和存储格式上有所规范。对于平衡语料库来说,采集规范主要是为了保证语料的平衡性,而类别分布和时间分布是语料平衡的两大要素。每个语料库都要对语料进行分类,分类的原则各不相同。有的根据内容涉及的主题分类,有的根据语体分类。在众多平衡语料库当中,台湾中央研究院的 现代汉语平衡语料库的分类标准很值得注意。这个语料库的研制者认为,用 传统的文体单一特征来界定平衡语料库不足以反映影响整个语言全貌的内在因素。 因此他们采用的是 多重分类原则:把所有语料都标上五个不同特征的值:(1) 文类 (2) 文体 (3) 语式 (4) 主题 (5) 媒体。利用以主题为主的五个特征的多重分类来进行语料库的平衡。 这样做还使 研究者能够任选其中几个特征的组合,定义自己的次语料库(sub-corpora),也可以在次语料库间作比较研究。另外,多重分类原则也有利于以后平衡语料库的更新。语料存储格式的规范一般 指采用统一的编码规范为电子文本作标记,目前可扩充置标语言 XML被广泛地用作语料库标注的元语言,存储格式的标准化有助于语料的交换和共享。 四 语料库在语言研究中的 的 应用 在语言研究中,语料库方法是一种经验的方法,它能提供大量的自然语言材料,有助于研究者根据语言实际得出客观的结论,这种结论同时也是可观测和 可验证的。在计算机技术的支持下,语料库方法对语言研究的许多领域产生了越来越多的影响。各种为不同目的而建立的语料库可以应用在词汇、语法、语义、语用、语体研究,社会语言学研究,口语研究,词典编纂,语言教学以及自然语言处理、人工智能、机器翻译、言语识别与合成等领域。我国在语料库的应用上还处于起步阶段,在计算语言学和语言信息处理领域,语料库主要用来为统计语言模型提供语言特征信息和概率数据,在语言研究的其他领域,多使用语料的检索和频率统计结果。 语料库与自然语言信息处理有着相辅相成的关系,大规模的语料库是 用统计语言模型方法处理自然语言的基础资源。然而统计语言模型本身并不关心其建模对象的语言学信息,它关心的只是一串符号的同现概率。譬如 N元语法模型,它只关心句子中各种单元(比如字、词、短语等)近距离连接关系的概率分布,而对于许多复杂的语言现象,它就无能为力了。在统计语言建模技术最先得到成功应用的自动语音识别领域,语料库的开发和建设受到格外的重视,标注语料库成为不可缺少的系统资源,就是因为,要想改进N元语法的建模技术,必须利用语料库引入更多的语言特征信息和统计语言数据。同样,在书面语语言信息处理领域里,语料库提供的语言知识也越来越多地用在统计语言模型方法中。除了词语自动切分、词性自动标注、双语语料对齐等语料加工技术以外,人们还在语料库的支持下,建立有关语法、语义的语言知识库,开发信息抽取系统、信息检索系统、文本分类和过滤系统,并且把基于统计或实例的分析技术集成到机器翻译系统里面。 
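The N-gram model discussed above can be made concrete with its simplest case, a bigram model, which estimates the probability of a word from the single word before it. The sketch below uses plain maximum-likelihood counts over an already segmented toy corpus. The sentences and function names are hypothetical, and no smoothing is applied, so unseen pairs receive probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and adjacent word pairs, with sentence boundary markers."""
    contexts = Counter()
    bigrams = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        contexts.update(padded[:-1])              # every word that has a successor
        bigrams.update(zip(padded, padded[1:]))   # adjacent pairs
    return contexts, bigrams

def bigram_prob(contexts, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

toy_sentences = [
    ["我", "爱", "语料库"],
    ["我", "爱", "语言"],
    ["他", "爱", "语言"],
]
contexts, bigrams = train_bigram(toy_sentences)
p = bigram_prob(contexts, bigrams, "爱", "语言")   # 2 of the 3 occurrences of 爱
```

The model sees only adjacent pairs, which illustrates the limitation noted above: it captures short-distance co-occurrence but nothing about longer-range structure unless richer annotation is brought in from the corpus.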
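In recent years, with corpus support, a growing number of reports have examined Chinese lexical, grammatical, and semantic questions from an information-processing perspective. This work includes: surveying Chinese word senses with word-by-word concordances; quantitative analysis of collocations; using classifier-noun collocation data to study the classification of Chinese nouns; compiling statistics on modern Chinese sentence patterns; experiments in automatic phrase recognition (for example, base noun phrases and verb-object constructions) and automatic parsing; algorithms for disambiguating words within a sentence; and analyzing and counting the deep-structure types of Chinese reduplication and the ways they are generated.

For lexicology, grammar, linguistic theory, historical linguistics, and related fields, the corpus currently serves mostly through text retrieval and frequency statistics, helping researchers observe and grasp linguistic facts and analyze regularities in language. As corpus methods develop, what is now only an auxiliary technique will gradually become an indispensable resource and tool. With a corpus, a specified grammatical phenomenon can be quantified, and linguistic theories, rules, or hypotheses can be tested and verified.

In the study of minority languages and dialects, a representative project is the Tibeto-Burman corpus and its quantitative description for comparative research. It built a lexical-phonetic database of 160,000 entries from 82 language points across the five branches of the Tibeto-Burman family within China, and carried out distributional and contrastive analyses of more than ten features of Tibetan dialects, including syllables, phonemes, initials, finals, tones, morphemes, word-formation capacity, and phonetic structure. It also gave quantitative descriptions of sound correspondences and phonological contrasts for 15 Tibetan dialect points, and on that basis produced correlation analyses relevant to both diachronic and synchronic comparison, yielding correlation matrices and cluster-analysis charts for language classification.

In applied linguistics, lexicography and language teaching are the two greatest beneficiaries of corpora. Several dictionaries have already used corpora or electronic documents, to varying degrees, to collect lexical data during compilation or revision, for headword selection, definitions, example sentences, attribute labeling, and so on. Nanjing University has in recent years developed NULEXID, a corpus and bilingual dictionary compilation system covering English and Chinese, which played an important role in compiling 《新时代英汉大词典》. Looking at lexicography as a whole, however, we still lack sufficient corpus resources and effective analysis tools, and many worthwhile tasks remain out of reach: for example, analyzing the collocations that appear in corpus data, using a corpus to distinguish word meanings, and using a dynamic corpus to help extract new words.

One example of a corpus used in language teaching is the JDEST English corpus at Shanghai Jiao Tong University. Through comparison, statistics, and filtering of its texts, it supplies usage information on general and technical vocabulary for college English teaching in China, and it provided a reliable quantitative basis for the word list in the college English syllabus. The corpus has also contributed to research on English itself, supporting corpus-based studies of frequency features in English grammar and corpus-driven studies of collocation.

In 2003 the Chinese Learner English Corpus (中国学习者英语语料库) was formally released by Shanghai Foreign Language Education Press. It is a written English corpus of more than one million words, covering Chinese secondary-school students, College English Band 4 and Band 6 learners, and lower- and upper-year English majors, with all texts annotated for grammar and for learner errors. The corpus has yielded ranked word-frequency lists, a spelling-error list, a lemma list, word-frequency distribution tables, frequency tables for grammatical tags, and error tables, and these data have been compared in part with native-speaker English corpora such as BROWN, LOB, FROWN, and FLOB. The corpus provides necessary resources for lexicography, textbook writing, and language testing. Shanghai Jiao Tong University is currently building a spoken English corpus of college English learners.

Building on several years of corpus construction and application, in 2003 the national 973 Program began supporting the establishment of the Chinese Linguistic Data Consortium (ChineseLDC). ChineseLDC is an open language-resource consortium that brings together Chinese universities, research institutes, and companies. Its goal is a general-purpose Chinese language information knowledge base that represents the current state of Chinese information processing. ChineseLDC will build and collect the language resources that Chinese information processing requires, including dictionaries, corpora, data, and tools. On that basis it will distribute resources, promote unified standards and specifications and recommend them to users, and set up evaluation mechanisms for key technologies in Chinese information processing, supporting both basic research and application development.

Over the past few years, corpus construction and application have been a major topic at conferences on computational linguistics and language information processing. Discussion has centered on corpus-based methods of language analysis and on the annotation, management, and standardization of corpus data. Linguists are more concerned with corpus planning and construction and with applying corpus methods in language research and teaching. The linguistics community has also recently held dedicated corpus conferences: for example, the 2001 symposium on research and practice in corpus linguistics and computational linguistics, organized by the Institute of Linguistics of the Chinese Academy of Social Sciences and held at Tsinghua University (mainly on corpus construction and application and on language information processing), and the 2003 international symposium on corpus linguistics, organized by Shanghai Jiao Tong University and other institutions and held at Shanghai Jiao Tong University (theme: corpus research and foreign language teaching).

Below are the relevant references and the URLs of some publicly released corpora (some URLs may change over time).

陈小荷等 1996 关于建立大规模汉语树库的设想,《计算机时代的汉语和汉字研究》,罗振声、袁毓林主编,北京:清华大学出版社
冯志伟 2002 中国语料库研究的历史与现状,《汉语语言与计算学报》,Vol.12,Num.1
顾曰国 1998 语料库与语言研究,《当代语言学》,第1期
顾曰国 2001 北京地区现场即席话语语料库的取样与代表性问题,《首届中法学术论坛论文集》
黄昌宁等 2002 《语料库语言学》,北京:商务印书馆
黄居仁等 1997 《国语日报量词典》,台北:国语日报社
教育部语言文字应用研究所计算语言学室 2001 信息处理用现代汉语词类标记集规范,《语言文字应用》,第3期
靳光瑾 2003 谈语料库建设与规范标准问题,《中文信息处理若干重要问题》,徐波等主编,北京:科学出版社
雷秀云等 2001 基于语料库的研究方法及MD/MF模型与学术英语语体研究,《当代语言学》,第2期
刘开瑛 2003 基于互联网的多层次汉语语料库构建研究,《中文信息处理若干重要问题》,徐波等主编,北京:科学出版社
刘连元 1996 现代汉语语料库研制,《语言文字应用》,第3期
卢亚军等 2003 基于大型藏文语料库的藏文字符、部件、音节、词汇频度与通用度统计及其应用研究,《西北民族大学学报(自然科学版)》,第24卷,第2期
罗振声 1996 清华TH语料库的结构、功能与应用,《计算机时代的汉语和汉字研究》,罗振声、袁毓林主编,北京:清华大学出版社
孙茂松等 1997 汉语搭配定量分析初探,《中国语文》,第1期
孙茂松等 2001 信息处理用现代汉语分词词表,《语言文字应用》,第4期
卫乃兴 2002 基于语料库和语料库驱动的词语搭配研究,《当代语言学》,第2期
邢红兵 2000 汉语词语重叠结构统计分析,《语言教学与研究》,第1期
杨惠中主编 2002 《语料库语言学导论》,上海:上海外语教育出版社
尤方等 2003 基于语义依存关系的汉语语料库的构建,《中文信息学报》,第1期
俞士汶 2002 北京大学现代汉语语料库基本加工规范,《中文信息学报》,第5,6期
俞士汶 2003 语料库与综合型语言知识库的建设,《中文信息处理若干重要问题》,徐波等主编,北京:科学出版社
张普 2003 关于汉语语料库的建设与发展问题的思考,《中文信息处理若干重要问题》,徐波等主编,北京:科学出版社
赵军等 2003 中文语言资源联盟的建设和发展,《中文信息处理若干重要问题》,徐波等主编,北京:科学出版社
郑玉玲等 1996 藏缅语语料库及比较研究的计量描写,《中文信息学报》,第2期
邹嘉彦等 2003 汉语共时语料库与信息开发,《中文信息处理若干重要问题》,徐波等主编,北京:科学出版社

Peking University, annotated 《人民日报》 (People's Daily) corpus: http://www.icl.pku.edu.cn
Beijing Language and Culture University corpora: http://www.blcu.edu.cn/kych/H.htm
Tsinghua University, balanced Chinese corpus TH-ACorpus: http://www.lits.tsinghua.edu.cn/ainlp/source.htm
Shanxi University corpus: http://www.sxu.edu.cn/homepage/cslab/sxuc1.htm
Academia Sinica (Taiwan) corpora:
  Balanced corpus of Modern Chinese: http://www.sinica.edu.tw/SinicaCorpus or http://www.sinica.edu.tw/~tibe/2-words/modern-words/ or http://www.sinica.edu.tw/ftms-bin/kiwi.sh
  Tagged corpus of Early Mandarin: http://www.sinica.edu.tw/Early_Mandarin/
  Old Chinese corpus: http://www.sinica.edu.tw/ftms-bin/ftmsw3 or http://www.eastasian.ucsb.edu/projects/scriptasinica/cgi-bin/ghy/kiwi.cgi or http://www.sinica.edu.tw/~tibe/2-words/old-words/
  Formosan (Austronesian) languages archive: http://www.ling.sinica.edu.tw/Formosan/
  Southern Min archive: http://southernmin.sinica.edu.tw/
  Chinese classical texts database (汉籍电子文献): http://www.sinica.edu.tw/~tdbproj/handy1/ or http://www.sinica.edu.tw/ftms-bin/ftmsw3
City University of Hong Kong, LIVAC synchronous corpus: http://www.rcl.cityu.edu.hk/livac/ or http://www.LIVAC.org
Zhejiang Normal University, historical documents corpus: http://lib.zjnu.net.cn/xueke/hyywzx/xkjj.htm
Institute of Computing Technology, Chinese Academy of Sciences, bilingual corpus: http://mtgroup.ict.ac.cn/corpus/query_process.php
Chinese Linguistic Data Consortium: http://www.chineseldc.org/xyzy.htm

(傅爱平)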
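The survey above names concordance lookup and collocation counting (for example, classifier-noun pairs) as typical corpus uses. The following is only a minimal Python sketch of those two operations, not the method of any project cited in the survey. It assumes the text has already been segmented into words; the function names `kwic` and `collocates` and the toy sentence are made up for illustration.

```python
from collections import Counter

def kwic(tokens, node, span=3):
    """Return (left, node, right) context triples for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, span=3):
    """Count the words that occur within `span` tokens of `node` on either side."""
    counts = Counter()
    for left, _, right in kwic(tokens, node, span):
        counts.update(left)
        counts.update(right)
    return counts

# Toy, pre-segmented data (hypothetical); real Chinese text needs word segmentation first.
tokens = ["一", "本", "书", "和", "两", "本", "词典", "，", "三", "张", "纸"]

# Print each hit of the classifier 本 with one word of context on each side.
for left, node, right in kwic(tokens, "本", span=1):
    print(" ".join(left), f"[{node}]", " ".join(right))

# Raw co-occurrence counts; a real study would go on to apply an association measure.
print(collocates(tokens, "本", span=1).most_common())
```

Raw window counts like these are only a starting point; deciding which collocates matter usually requires an association statistic computed against a larger corpus.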
timy 2010-3-28 19:52
Source: http://www.corpus4u.org/showthread.php?p=35335#post35335 [Updated on March 28, 2010] English terms are on the left. In the Chinese on the right, a semicolon (;) separates distinct senses of a term, and an enumeration comma (、) separates two or more acceptable Chinese renderings.

Absolute frequency 绝对频数
Alignment (of parallel texts) (平行或对应)语料的对齐
Alphanumeric 字母数字类的
Annotate 标注(动词)
Annotation 标注(名词)
Annotation scheme 标注方案
ANSI/American National Standards Institute 美国国家标准学会
ASCII/American Standard Code for Information Interchange 美国信息交换标准码
Associate (of keywords) (主题词的)联想词
AWL/Academic word list 学术词表
Balanced corpus 平衡语料库
Base list 底表、基础词表
Bigram 二元组、二元序列、二元结构
Bi-hapax 两次词
Bilingual corpus 双语语料库
CA/Contrastive Analysis 对比分析
Case-sensitive 大小写敏感、区分大小写
Chi-square (χ²) test 卡方检验
Chunk 词块
CIA/Contrastive Interlanguage Analysis 中介语对比分析
CLAWS/Constituent Likelihood Automatic Word-tagging System CLAWS词性赋码系统
Clean text policy 干净文本原则
Cluster 词簇、词丛
Colligation 类联接、类连接、类联结
Collocate n./v. 搭配词;搭配
Collocability 搭配强度、搭配力
Collocation 搭配、词语搭配
Collocational strength 搭配强度
Collocational framework/frame 搭配框架
Comparable corpora 类比语料库、可比语料库
ConcGram 同现词列、框合结构
Concordance (line) 索引(行)
Concordance plot (索引)词图
Concordancer 索引工具
Concordancing 索引生成、索引分析
Context 语境、上下文
Context word 语境词
Contingency table 连列表、联列表、列连表、列联表
Co-occurrence/Co-occurring 共现
Corpora 语料库(复数)
Corpus Linguistics 语料库语言学
Corpus 语料库
Corpus-based 基于语料库的
Corpus-driven 语料库驱动的
Corpus-informed 语料库指导的、参考了语料库的
Co-select/Co-selection/Co-selectiveness 共选(机制)
Co-text 共文
DDL/Data Driven Learning 数据驱动学习
Diachronic corpus 历时语料库
Discourse 话语、语篇
Discourse prosody 话语韵律
Documentation 备检文件、文检报告
EAGLES/Expert Advisory Groups on Language Engineering Standards EAGLES文本规格
Empirical Linguistics 实证语言学
Empiricism 经验主义
Encoding 字符编码
Error-tagging 错误标注、错误赋码
Extended unit of meaning 扩展意义单位
File-based search/concordancing 批量检索
Formulaic sequence 程式化序列
Frequency 频数、频率
General (purpose) corpus 通用语料库
Granularity 颗粒度
Hapax legomenon/hapax 一次词
Header/Text head 文本头、头标、头文件
HMM/Hidden Markov Model 隐马尔科夫模型
Idiom Principle 习语原则
Index/Indexing (建)索引
In-line annotation 文内标注、行内标注
Key keyword 关键主题词
Keyness 主题性、关键性
Keyword 主题词
KWIC/Key Word in Context 语境中的关键词、语境共现(方式)
Learner corpus 学习者语料库
Lemma 词目、原形词、词元
Lemma list 词形还原对应表
Lemmata 词目、原形词、词元(复数)
Lemmatization 词形还原、词元化
Lemmatizer 词形还原(词元化)工具
Lexical bundle 词束
Lexical density 词汇密度
Lexical item 词项、词语项目
Lexical priming 词汇触发理论
Lexical richness 词汇丰富度
Lexico-grammar/Lexical grammar 词汇语法
Lexis 词语、词项
LL/Log likelihood (ratio) 对数似然比、对数似然率
Longitudinal/Developmental corpus 跟踪语料库、发展语料库、历时语料库
Machine-readable 机读的
Markup 标记、置标
MDA/Multi-dimensional approach 多维度分析法
Metadata 元信息
Meta-metadata 元元信息
MF/MD (Multi-feature/Multi-dimensional) approach 多特征/多维度分析法
Mini-text 微型文本
Misuse 误用
Monitor corpus (动态)监察语料库
Monolingual corpus 单语语料库
Multilingual corpus 多语语料库
Multimodal corpus 多模态语料库
MWU/Multiword unit 多词单位
MWE/Multiword expression 多词单位
MI/Mutual information 互信息、互现信息
N-gram N元组、N元序列、N元结构、N元词、多词序列
NLP/Natural Language Processing 自然语言处理
Node 节点(词)
Normalization 标准化
Normalized frequency 标准化频率、标称频率、归一频率
Observed corpus 观察语料库
Ontology 知识本体、本体
Open Choice Principle 开放选择原则
Overuse 超用、过多使用、使用过度、过度使用
Paradigmatic 纵聚合(关系)的
Parallel corpus 平行语料库、对应语料库
Parole linguistics 言语语言学
Parsed corpus 句法标注的语料库
Parser 句法分析器
Parsing 句法分析
Pattern/patterning 型式
Pattern grammar 型式语法
Pedagogic corpus 教学语料库
Phraseology 短语、短语学
POSgram 赋码序列、码串
POS tagging/Part-of-Speech tagging 词性赋码、词性标注、词性附码
POS tagger 词性赋码器、词性赋码工具
Prefab 预制语块
Probabilistic (基于)概率的、概率性的、盖然的
Probability 概率
Rationalism 理性主义
Raw text/Raw corpus 生文本(语料)
Reference corpus 参照语料库
Regex/RE/RegExp/Regular Expressions 正则表达式
Register variation 语域变异
Relative frequency 相对频率
Representative/Representativeness 代表性(的)
Rule-based 基于规则的
Sample n./v. 样本;取样、采样、抽样
Sampling 取样、采样、抽样
Search term 检索项
Search word 检索词
Segmentation 切分、分词
Semantic preference 语义倾向
Semantic prosody 语义韵
SGML/Standard Generalized Markup Language 标准通用标记语言
Skipgram 跨词序列、跨词结构
Span 跨距
Special purpose corpus 专用语料库、专门用途语料库、专题语料库
Specialized corpus 专用语料库
Standardized TTR/Standardized type-token ratio 标准化类符/形符比、标准化类/形比、标准化型次比
Stand-off annotation 分离式标注
Stop list 停用词表、过滤词表
Stop word 停用词、过滤词
Synchronic corpus 共时语料库
Syntagmatic 横组合(关系)的
Tag 标记、码、标注码
Tagger 赋码器、赋码工具、标注工具
Tagging 赋码、标注、附码
Tag sequence 赋码序列、码串
Tagset 赋码集、码集
Text 文本
TEI/Text Encoding Initiative 文本编码计划
The Lexical Approach 词汇中心教学法
The Lexical Syllabus 词汇大纲
Token 形符、词次
Token definition 形符界定、单词界定
Tokenization 分词
Tokenizer 分词工具
Transcription 转写
Translational corpus 翻译语料库
Treebank 树库
Trigram 三元组、三元序列、三元结构
T-score T值
Type 类符、词型
TTR/Type-token ratio 类符/形符比、类/形比、型次比
Underuse 少用、使用不足
Unicode 通用码
Unit of meaning 意义单位
WaC/Web as Corpus 网络语料库
Wildcard 通配符
Word definition 单词界定
Word form 词形
Word family 词族
Word list 词表
XML/eXtensible Markup Language 可扩展标记语言
Zipf's Law 齐夫定律
Z-score Z值
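Several measures named in the glossary (type-token ratio, standardized TTR, normalized frequency) are simple to compute once a text is tokenized. The sketch below is illustrative only: the function names, the default chunk size of 1000 tokens, and the per-million base are assumptions chosen for the example, not values given in the glossary.

```python
def type_token_ratio(tokens):
    """TTR: number of distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, chunk=1000):
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    A trailing partial chunk is ignored; if the text is shorter than one
    chunk, the plain TTR is returned instead.
    """
    ratios = [type_token_ratio(tokens[i:i + chunk])
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    return sum(ratios) / len(ratios) if ratios else type_token_ratio(tokens)

def normalized_frequency(count, corpus_size, base=1_000_000):
    """Scale a raw count to occurrences per `base` tokens (per million by default)."""
    return count * base / corpus_size

# Hypothetical toy input, already tokenized.
tokens = "the cat sat on the mat".split()
print(type_token_ratio(tokens))
print(standardized_ttr(tokens, chunk=3))
print(normalized_frequency(50, 200_000))
```

Plain TTR falls as texts get longer, which is why the standardized variant averages over fixed-size chunks; normalized frequency serves the same purpose for comparing counts across corpora of different sizes.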
carldy 2010-3-28 18:25
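《寻求第三语码基于汉语译文语料库的翻译共性研究》 (In Pursuit of the Third Code: A Study of Translation Universals Based on a Corpus of Translated Chinese)

This paper appeared in 《外语教学与研究》, 2010, No. 1.

Abstract: Translated language is an objectively existing language variety that differs both from the source language and from native writing in the target language; Frawley (1984) called it the "third code". Drawing on a self-built balanced corpus of translated Chinese, together with the previously created Lancaster Corpus of Mandarin Chinese (LCMC), this paper examines the features of translated Chinese. By contrasting the two corpora, the authors explore the lexical and syntactic features of translated Chinese and find that: 1) compared with native Chinese, translated Chinese has a slightly lower lexical density, and in particular a lower ratio of content words to function words; 2) translated Chinese uses more conjunctions, showing explicitation; 3) passive constructions in translated Chinese are strongly influenced by English, which calls the normalization hypothesis into question.

For details, see the full-text PDF: 寻求_第三语码_基于汉语译文语料库的翻译共性研究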
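The two lexical findings in the abstract rest on measures that can be sketched briefly: lexical density from part-of-speech-tagged text, and a log-likelihood comparison of how often an item (for example, conjunctions) occurs in two corpora. The Python below is a hedged illustration, not the authors' procedure. The tag set in `CONTENT_TAGS`, the sample sentence, and all counts are hypothetical; the log-likelihood function uses the common two-corpus form that compares observed frequencies with the frequencies expected if the item were equally likely in both corpora.

```python
import math

# Assumed tag set: nouns, verbs, adjectives, adverbs count as content words.
# Real tag sets, and the decision about adverbs, differ between studies.
CONTENT_TAGS = {"n", "v", "a", "d"}

def lexical_density(tagged):
    """Share of content words among all tokens, given (word, pos) pairs."""
    if not tagged:
        return 0.0
    content = sum(1 for _, pos in tagged if pos in CONTENT_TAGS)
    return content / len(tagged)

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood statistic for one item's frequency in corpus A vs corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical tagged sentence: pronoun, verb, noun, particle.
sample = [("我", "r"), ("读", "v"), ("书", "n"), ("了", "u")]
print(lexical_density(sample))

# Invented counts: conjunctions in a translated vs. a native corpus of equal size.
print(log_likelihood(150, 1000, 100, 1000))
```

A larger log-likelihood value indicates a frequency difference that is less likely to be due to chance; whether it supports a claim such as explicitation still depends on how the two corpora were sampled and tagged.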

