Li: Today's news flash: the White House has issued a statement on the China-US trade talks.

Statement of the United States Regarding China Talks

For the last two days, high-ranking officials from the United States and China have engaged in intense and productive negotiations over the economic relationship between our two countries. The United States appreciates the preparation, diligence, and professionalism shown throughout these meetings by Vice Premier Liu He and his team.

The talks covered a wide range of issues, including: (1) the ways in which United States companies are pressured to transfer technology to Chinese companies; (2) the need for stronger protection and enforcement of intellectual property rights in China; (3) the numerous tariff and non-tariff barriers faced by United States companies in China; (4) the harm resulting from China’s cyber-theft of United States commercial property; (5) how market-distorting forces, including subsidies and state-owned enterprises, can lead to excess capacity; (6) the need to remove market barriers and tariffs that limit United States sales of manufactured goods, services, and agriculture to China; and (7) the role of currencies in the United States–China trading relationship. The two sides also discussed the need to reduce the enormous and growing trade deficit that the United States has with China. The purchase of United States products by China from our farmers, ranchers, manufacturers, and businesses is a critical part of the negotiations.

The two sides showed a helpful willingness to engage on all major issues, and the negotiating sessions featured productive and technical discussions on how to resolve our differences. The United States is particularly focused on reaching meaningful commitments on structural issues and deficit reduction. Both parties have agreed that any resolution will be fully enforceable.

While progress has been made, much work remains to be done. President Donald J.
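Trump has reiterated that the 90-day process agreed to in Buenos Aires represents a hard deadline, and that United States tariffs will increase unless the United States and China reach a satisfactory outcome by March 1, 2019. The United States looks forward to further talks with China on these vital topics.

The Sogou machine translation follows:

过去两天,美国和中国的高级官员就我们两国的经济关系进行了紧张而富有成效的谈判。美国赞赏刘副总理和他的团队在这些会议中表现出的准备、勤奋和专业精神。

会谈涉及广泛的问题,包括: ( 1 )美国公司向中国公司转让技术的压力;( 2 )加强中国知识产权保护和执法的必要性;( 3 )美国公司在中国面临的众多关税和非关税壁垒;( 4 )中国网络盗窃美国商业财产造成的危害;( 5 )市场扭曲力量,包括补贴和国有企业,如何导致产能过剩;( 6 )需要消除限制美国向中国销售制成品、服务和农业的市场壁垒和关税;( 7 )货币在美中贸易关系中的作用。双方还讨论了减少美国对中国巨大且日益增长的贸易逆差的必要性。中国从我们的农民、牧场主、制造商和企业购买美国产品是谈判的关键部分。

双方表示愿意参与所有重大问题的讨论,谈判会议就如何解决我们的分歧进行了富有成效的技术性讨论。美国特别注重就结构问题和减少赤字达成有意义的承诺。双方同意,任何决议都将完全可以执行。

虽然取得了进展,但仍有许多工作要做。唐纳德·特朗普总统重申,在布宜诺斯艾利斯商定的90天进程是一个艰难的最后期限,除非美国和中国在2019年3月1日前取得令人满意的结果,否则美国关税将会增加。美国期待着与中国就这些重要议题展开进一步会谈。

A: @wei The writing is quite good.

Ma: @wei Before I read your note, I did not realize it was machine translated.

Li: Not a word was changed; this is raw Sogou MT. Translating news, IT, popular science, and everyday conversation is no longer an obstacle. Machine translation has reached the level of human translation and beats most people. The key point is that it is available anytime, anywhere, on the spot. So people who cannot write fluent English need not spend time learning English. You can write Chinese however it comes out, run it through the translator, and get quite fluent English. If you doubt it, try Sogou MT or Youdao MT. The translation is often more fluent than the original. Given how neural machine translation works and where it stands today, the output is always fluent, largely regardless of whether the source is. The source only needs to get the meaning across in rough form. With light editing, this is far easier than struggling to write in English yourself.

In fact, machines could be trained on the conventions of practical writing for specific genres. Such a service might well become more popular than translation between languages, because writing articles and reports is everyone's job, especially under pressure from deadlines and bosses. There is already very good software for fixing typos and scattered grammatical errors, such as Grammarly, which I use every day, but I have not seen anyone deliberately take a whole document and use neural "self-translation" to instantly improve the fluency of a weak writer's prose. This product positioning is a golden idea for NLP applications; I will leave it here for now. In the not too distant future, someone will surely build it and hawk it (that is, market it). There is no reason it should not take off.

I recently tried the iFlytek speech translator; interpreting everyday spoken language is no longer a problem either. I deliberately used non-standard Mandarin and clumsy English. The results were good in both cases.

A: @wei No wonder university English departments are in decline.

Li: If every AI application area reached the level of MT, I would agree that the singularity had arrived, and then I would believe the alarmism of Musk, Hawking, and company, haha. The great escape: follow Musk to Mars for refuge.

Yan: After reading the machine-translated text @wei quoted, and given the rapid progress of recent years, I am more torn than ever. My son, who is in high school in the US, tells me he does not want to study a foreign language, but my wife thinks he must, because the vast majority of colleges have a foreign language requirement for admission. Without a foreign language his choice of colleges would shrink considerably. But my son has a point too: after three years of study he still could not beat machine translation, so he might as well spend the time on something else, such as discrete mathematics. I really cannot make up my mind.

Rui: Nurses in hospitals now use translation apps on their phones to communicate with patients in all sorts of languages. The translation may not be precise, but it is enough to get the main point across.

Mai: @Yan Treating math as the price of learning a foreign language may be a framing error. A person should be versatile. Learning a foreign language develops another part of the brain; it may enhance cognitive ability and greatly improve results in other subjects, who knows.

Yan: @Mai That makes sense. But for indirect benefits to exceed direct ones usually takes special conditions.

Mao: If a machine translation chip could be implanted in the body, then we really would not need to learn foreign languages.

A: @Yan Learning a language is not just learning the mechanics of the language. See the earlier post 《【人文科大】语言赋予思维的变革性力量》 (The transformative power language gives to thought).

Li: To be honest, for the vast majority of Chinese people learning foreign languages, especially the considerable number who find it bitter torment, it really is a waste of effort and money; why bother? If you treat learning a foreign language as a hobby, it can open your eyes and show you that thoughts can be expressed by different means. But one language is enough, and a taste of it will do,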
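unless you intend to study languages professionally.

Rui: For Jack Ma, studying math was torture, but studying English was a pleasure.

A: At the least you learn more about how language is structured. Without studying a Western language you would never know about the past tense, the progressive, the subjunctive mood, and so on; Chinese classes do not seem to teach these. What you learn in the classroom is basically useless.

Li: Back then I studied French as a second foreign language and Russian as a third. Looking back, it was real suffering, and by now I have given almost all of it back to my teachers. Fortunately I became a linguist: although competence in a specific language fades when it goes unused, I still benefit professionally as a linguist. But I absolutely do not advocate or encourage non-linguists to learn a second or third foreign language. With that time, what else could you not do? In fact, learning English is suffering too; I have just suffered for a lifetime and gone numb.

A: For some people, language is a hobby, a game. Tell a gamer that he would be better off playing something else and he will ignore you.

Li: The only language I learned without suffering, with real enjoyment, was Esperanto. It is a truly marvelous thing, and learning it was deeply satisfying. Once learned, it never went back to the teacher: my listening and speaking have slipped somewhat, but reading and writing are unaffected.

Learning a natural language as a foreign language basically means tormenting a person as if he were a machine. Countless odds and ends of idiomatic usage must be memorized by rote and drilled repeatedly. But the human brain is not a hard disk; how can it withstand such torment? It is low-level heavy labor whose objects are memory and conditioned reflexes. Not worth it.

Esperanto is different. Rote memorization is squeezed to a minimum, logic and regularity rule throughout, and once you get into it you feel you have entered a free world of thought and expression. If the point of learning a foreign language is only to broaden your horizons and see ways of expression beyond your mother tongue, I can recommend Esperanto.

I used to think that Esperanto would gradually decline because it is not much use. Now it seems that, with machine translation available, usefulness is no longer the main criterion. What remains is a language's interest, its character, and the return on the effort invested. A hundred years from now it may well be the only "second foreign language" left, kept for entertainment and to satisfy curiosity.

《师弟轶事:疯狂世界语》 (Anecdotes of a Junior Schoolmate: Crazy About Esperanto) is a senior schoolmate's exaggerated account of how absorbed I was in Esperanto back then. It was probably much like a gamer's obsession.

Nuva: Learning a foreign language amounts to developing one more region of the brain, with more links between languages.

Liang: Learning one more foreign language is like opening one more window in your dark room, and it makes your thinking more diverse. Only if you know foreign languages can you accept, or at least not reject, things from other cultures. Also, learning foreign languages lowers the chance of dementia in old age: the more the brain is developed, the lower the risk, because the cognitive reserve is larger.

Li: That is all easy to say when you are not the one bearing the pain. Everything has a cost-performance ratio, an input-output ratio. Set the time and energy that a foreign language demands against the benefits it brings, and it is simply a black hole. And if you do not use it regularly afterwards, you lose more than half of it. If the payoff is being able to read foreign-language sources and, when traveling abroad, to hold simple conversations with foreigners, asking directions, ordering food, finding a restroom, and so on, that payoff is hardly worth mentioning in a modern society with machine translation. What other payoff is there that can be measured?

One kind of payoff is this: because people differ greatly in their capacity to absorb foreign languages, in an environment where everyone studies them, those with strong language ability feel a special sense of superiority. Girls generally learn foreign languages faster and more fluently than boys, so foreign language study is a rare chance for women to hold up more than half the sky. These too count as payoffs in particular settings. But this sense of superiority rests on other people's sense of frustration. If the frustrated suddenly realize that modern society has computers and there is no need to enter the foreign language race at all, this benefit vanishes too.

In short, foreign languages are like the piano. If the child wants to learn, let him; if not, there is no need to force him. It is not that learning brings no benefit, but that the investment is too large and the return too small; in general it is not worth it. The current education system still lags behind the times in requiring everyone to study a foreign language. In another 20 years, perhaps it will no longer be compulsory.

Mao: Completely agree.

Wang: Chasing quick results does not necessarily mean learning well.

Yan: @wei Very encouraging! I will recommend Esperanto to my son! I expect there are only online resources, so I will have to negotiate with the counselor again. The high school offers only French, Spanish, and Japanese.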
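This is the AI joke that has been hotly debated and widely circulated online over the past two days (博鳌AI同传遭热议, "AI simultaneous interpretation at Boao draws heated discussion"):

Yesterday I was still wondering how a translation blunder over "One Belt One Road" (一带一路) could happen. High-frequency new terms and idioms of this kind are exactly what machines are good at; it is just dictionary memory, after all.

Today I read the 新智元 interview (博鳌AI同传遭热议!腾讯翻译君负责人李学朝、讯飞胡郁有话说) and learned that the blunder lay not in the idiom itself but in the ability to "generalize" the idiom. Capturing and translating "generalized idioms" remains a weak spot today.

For Chinese to English, translating 一带一路 poses no problem at all. It is the most popular new term of President Xi's new-era policies in recent years, a household phrase, and nobody generalizes it. Machine translation naturally gets it right: however the mainstream translates it, the machine will translate it the same way, no better and certainly no worse.

But once this Chinese term reaches the English-speaking world, not every audience remembers the exact wording. As a result, the "standard" popular rendering "one belt one road" gets misremembered by some foreigners as "one road one belt", "the road and belt", and so on. That is understandable: foreigners have no political study sessions and no exams on current affairs, so remembering the gist is already good.

Although the wording differs and the order changes, both keywords, road and belt, are still there. This kind of idiom "generalization" poses no challenge for human translators, because the way foreigners misremember and "generalize" matches the interpreter's own cognition, so a human interpreter will never stumble on such cases. Machine translation driven by big data, however, was dumbfounded this time and truly went haywire. These generalized variants are mostly sparse data from spoken language, so they could not be translated back into the Chinese 一带一路, and the joke was born.

Improving MT's ability to handle "idiom generalization" is a pain point today, but it is not entirely intractable, and it will become a point of breakthrough in the future. It is just that most systems and research have not yet gotten around to it. I once cited a typical case of idiom generalization that should be instructive: “1234应犹在,只是56改” ("1234 should still be there; only 56 has changed").

A similar widely circulated joke from the early days of machine translation also turned on an idiom ("The spirit is willing, but the flesh is weak", 心有余而力不足, was reportedly rendered as "The whiskey is alright, but the meat is rotten"), because most people assume idioms are the hardest thing to understand and must therefore challenge machines. That is a thoroughly lay way of thinking. The essence of an idiom is memory, and where memory is concerned the computer is the master and the human brain is tofu.

NLP's earliest practice was machine translation. Under the mystique of the computer, machine translation, regarded as simulating or challenging human intelligence, naturally became a hot topic for the media. One widely circulated machine translation joke stands out as the most misleading of all:

The story goes that a reporter testing a machine translation system thought of this idiom from the Bible:

The spirit is willing, but the flesh is weak (心有余而力不足)

Translated into Russian and then back into English, it became:

The whiskey is alright, but the meat is rotten (威士忌没有问题,但肉却腐烂了)

This is probably the most widely circulated joke in the media. For many years this classic has been repeated with embellishments and has become the standard laughingstock of NLP.

Yet no problem in natural language technology is simpler than idioms. The belief that idioms are a hard problem for NLP is pure lay conjecture, and two factors have led many uncritical people to accept it. The first is that when an NLP system's idiom dictionary is incomplete, "jokes" like the one above result and seem to expose the machine's stupidity. In fact such "errors" are the easiest for a system to debug: just complete the dictionary. Because idioms are by definition listable, the dictionary can be completed by hand or acquired automatically from corpora; either way it is a tractable task. Linguistics tells us that idioms are characterized by a lack of semantic decomposability (no/little semantic compositionality) and must be memorized (stored) as wholes, which makes them a closed, listable class. The second factor is a misunderstanding of machine "understanding" (really a kind of "artificial intelligence"): the assumption that whatever humans find hard to understand must also be hard for machines, when the two kinds of "understanding" are not the same thing at all. Many idioms have historical stories behind them, and historical knowledge is needed to truly grasp their meaning; machines have no background knowledge, so, the argument goes, idioms are the bottleneck of NLP.

The fact is that for NLP, one can say, to recognize is to understand, and recognizing enumerable expressions is nothing more than memory; in the end it is a matter of storage capacity. Yet some people are naive enough to think that a "computer" made of cold inorganic materials really has an autonomous understanding ability or mechanism like that of the human brain.

from NLP 历史上最大的媒体误导:成语难倒了电脑

As for a suitable translation of the new-era 一带一路, I once discussed it from the angle of linguistic word formation:

The official translation of 一带一路 is "one belt one road". I could not make sense of it, and only yesterday did I work out that it is a China-initiated, China-led plan to develop new economic and trade zones along the ancient Silk Road, helping to absorb excess capacity on the one hand and stimulating the regional economy on the other, achieving win-win results, letting countries in the region share the locomotive effect of China's rapid economic growth, and thereby establishing the image of China's rise as that of a peaceful leader. I feel there are more, perhaps better, options. It is an idiom anyway, and from the literal form alone nobody can figure out the real meaning;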
it always needs a further explanation to go with it. So why not translate it as:

一带一路 ===》 one Z one P (pronounced as: one zee one “pee”)

How about that? This translation rivals the classics "long time no see" (好久不见) and "people mountain people sea" (人山人海). Seriously, Zone is much better than Belt. One zone one path. One zone one road. New zone old road. New Silk Road Zone. None of them feels as catchy as one Z one P.

from 【语言学随笔:从缩略语看汉字的优越性】

【相关】 (Related)
博鳌AI同传遭热议!腾讯翻译君负责人李学朝、讯飞胡郁有话说
NLP 历史上最大的媒体误导:成语难倒了电脑
【立委随笔:成语从来不是问题】
【语言学随笔:从缩略语看汉字的优越性】
【语义计算:李白对话录系列】
【置顶:立委NLP博文一览】
《朝华午拾》总目录
At a dinner party, an old colleague said: in 10 years, machine translation should be mature. I did not want to argue. (Let others have the last say.)

Machines (computers) can do things that people cannot, including at chess. But such machines are built by people. Unless someone decides that machine translation is extremely important, I believe that machines 10 years from now will still translate worse than people.

Original:

Human impacts on the carbon cycle are well known, yielding anticipated global changes in the Earth’s climate. Likewise, human impacts on the availability of nitrogen, largely to improve agricultural yield, leave their mark on greater levels of water pollution in rivers and coastal waters. Nitrogen that escapes from agricultural fields, largely as ammonia, affects air quality, especially fine particulate matter, in regions downwind of agricultural. Other gaseous losses of nitrogen yield acidic rain and depletion of stratospheric ozone.

Machine does this:

人类的影响的碳循环是众所周知的 , 产生预期的全球性变化的地球的气候。 同样 , 人类的影响可用性的氮气 , 主要是为了提高农业产量 , 将会在更高一级的水污染的河流和沿海水域。 氮气 , 逸出的农业领域 , 主要是氨气、会影响空气质量 , 尤其是细颗粒物 , 在顺风的农业。 其他气体损失的氮的产生酸性雨中和平流层臭氧的损耗。

https://www.freetranslation.com/en/translate-english-chinese

Mine:

人类对(地球系统)碳循环的影响是众所周知的;这种影响产生了预期中的全球 气候 变化。 我们同样知道,人类为了提高农产量 , 对可用氮的影响加剧了河流和沿海水域的(水)污染。 从农田逸出的氮(主要是氨气)会影响(农田的)下风地区的空气质量 , 尤其是空气中细颗粒物的浓度。 (与人类活动有关的)其他气态形式的氮释放,产生了酸雨,或损耗了平流层中的臭氧。
A formalization problem that has persisted for more than half a century: the internal logical relationship between the Chinese character room that Searle's thought experiment substitutes and the English word room used in the Turing test. It implies two routes to formalization, yet the Chinese large-character-set route that could be adopted has so far not surpassed the prevailing Anglo-American small-character-set route. This formalization operation, seemingly confined to the assignment of values, has stalled in the conceptualization and socialization of belief. This formalization problem of natural language understanding, running for more than half a century from (Turing's) word room to (Searle's) character room, has long gone without a substantive breakthrough merely because of the ambiguity between character and word, which looks like a hair's breadth of difference but in fact leads a thousand miles astray.

Appendix:

Twenty-one years in the Chinese room
John R. Searle
In John M. Preston & Michael A. Bishop (eds.), Views Into the Chinese Room: New Essays on Searle and Artificial Intelligence. Oxford University Press (2002)
http://philpapers.org/rec/SEATYI

A Chinese room that understands
Herbert A. Simon & Stuart A. Eisenstadt
In John M. Preston & Michael A. Bishop (eds.), Views Into the Chinese Room: New Essays on Searle and Artificial Intelligence. Oxford University Press (2003)
http://philpapers.org/rec/SIMACR

John Searle's Chinese room argument
John McCarthy
Abstract: John Searle begins his "Consciousness, Explanatory Inversion and Cognitive Science" with "Ten years ago in this journal I published an article criticising what I call Strong AI, the view that for a system to have mental states it is sufficient for the system to implement the right sort of program with right inputs and outputs. Strong AI is rather easy to refute and the basic argument can be summarized in one sentence: a system, me for example, could implement a program for understanding Chinese, for example, without understanding any Chinese at all. This idea, when developed, became known as the Chinese Room Argument." The Chinese Room Argument can be refuted in one sentence.
http://philpapers.org/rec/MCCJSC
机器翻译研究的现状和发展趋势 (The state and trends of machine translation research)
http://icl.pku.edu.cn/icl_tr/collected_papers/chinese/collection-3/15-bbzw.htm

中国机器翻译的世纪回顾 (A century in review: machine translation in China)
http://my.oschina.net/apdplat/blog/419511

MT Summit VII (大翻译时代的机器翻译: MT in the great translation era)
http://ccl.pku.edu.cn/doubtfire/NLP/Machine_Translation/Overview/Retrospect%20and%20prospect%20in%20computer-based%20translation.htm

大翻译时代的机器翻译:回顾与展望 (Machine translation in the great translation era: retrospect and prospect)
http://www.oktranslation.com/tech/info13096.html

Retrospect and prospect in computer-based translation
http://mt-archive.info/MTS-1999-Hutchins.pdf

Google Scholar
http://scholar.google.com/
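[Liwei's note: This is a machine translation paper I wrote at the start of my career under the guidance of my advisor, Mr. Liu Zhuo. It gives many examples from first-hand experience and addresses the thorniest and most tedious problem in machine translation, the word with multiple translations, together with its solution based on an expert lexicon. Even today it may still offer some inspiration. When I wrote it I did not know the Western conventions: I did no literature survey, the reference list at the end cited only one paper of our own, and the citation was not properly done (it did not state which specific point came from which source). This was not arrogance; I simply did not know the academic norms. At that time there were no such rules in China either. Chinese advisors, famous scholars included, were mostly not very rigorous, especially in the social sciences, and journal editors saw nothing odd in it. It looks rather laughable now, but that is how history was. I remember reading some rigorously cited papers back then, mostly by people who had returned with foreign doctorates, and they had not yet set the trend. Still, this paper is purely a summary of experience, and the "expert lexicon" design for automatic translation is also the core of my advisor's thinking (an AI researcher abroad, Small, had earlier made a similar small-scale attempt at an expert lexicon, which we found when writing another paper). It is all solid substance, and the incomplete citation of others' work does little to reduce its value.]

An early paper by my advisor and me: 《李维 刘倬:机器翻译词义辨识对策》, 【中文信息学报】 Vol. 4, No. 1, 1990

Approach to Lexical Ambiguities in Machine Translation
Authors: Wei Li, Zhuo Liu
In Journal of Chinese Information Processing. Vol. 4, No. 1. pp. 1-13. Beijing 1990.

【置顶:立委科学网博客NLP博文一览(定期更新版)】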
Appendix 9 Cost Estimates of Various Types of Translation Before attempting to determine the costs of various types of translation, it might be instructive to see what the costs would be for an operation that made no use of translations, that is, a system that utilized subject specialists who were also skilled in a second language. Let us assume that we have an agency that employs 100 analysts and let us further assume the following: 1. that 50 of the analysts are competent in Russian in their subject field, 2. that each analyst earns $12,000 per year, 3. that each analyst reads 1,000 words of Russian per day in his work, 4. that each analyst works 220 days per year, and 5. that, therefore, the agency consumes a total of 11,000,000 Russian words a year. Since the major effort in past work on machine translation (MT) has been to develop a program to translate Russian into English, let us now restrict our discussion to the 50 analysts who are proficient in Russian. Salaries for these 50 would amount to $600,000 per year. Other costs such as Social Security, annual and sick leave, and retirement could be calculated at approximately 33 1/3 percent of their gross salaries. Thus the cost for these analysts would be approximately $800,000 per year. Obviously, no duplication checks would be necessary to determine whether a translation of any given work was already in existence. The Committee has no figures on the cost of maintaining facilities necessary for the making of checks to prevent the duplication of translation. If these costs could be determined and if they proved to be substantial, it might be the case that it would be more economical not to make duplication checks of documents less than some specific number of pages in length. In any event, the duplication checks would be superfluous for an agency employing persons proficient in a foreign language. 
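The arithmetic behind these assumptions can be checked with a few lines. The following is only a sketch; the function and parameter names are illustrative and do not come from the report, and the overhead is taken as exactly one third of gross salaries.

```python
def bilingual_agency_cost(analysts=50, salary=12_000,
                          words_per_day=1_000, working_days=220):
    """Yearly cost of an agency whose analysts read Russian directly."""
    salaries = analysts * salary           # gross salaries in dollars
    overhead = salaries / 3                # 33 1/3 percent of gross salaries
    total = salaries + overhead
    words = analysts * words_per_day * working_days   # Russian words read per year
    cost_per_1000_words = total / (words / 1_000)
    return total, words, cost_per_1000_words


total, words, per_1000 = bilingual_agency_cost()
print(f"total cost: ${total:,.0f}")
print(f"words read per year: {words:,}")
print(f"cost per 1,000 words: ${per_1000:.2f}")
```

With the stated inputs this gives $800,000 for 11,000,000 words, or roughly $73 per 1,000 words read, which is of the same order as the per-1,000-word figure quoted with the cost table.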
MAJOR COSTS OF ITEMS OF AN AGENCY UTILIZING 50 ANALYSTS PROFICIENT IN RUSSIAN

50 Analysts at $12,000 per annum                         $600,000
Direct cost overhead at 33 1/3 percent of the above       200,000
Duplication checks                                              0
Total                                                    $800,000

Figured at 220 working days per analyst the total volume of words of Russian read would amount to 11,000,000 or about $75 for each 1,000 words read.

Time lag after receipt of document     none
Total Cost of Translation              0

MONOLINGUALS

If the 50 analysts could not read Russian and had to rely on translation, a number of possibilities exist for providing them with English translation. The agency could

1. employ in-house translators in the conventional method,
2. employ translation using the dictation (or sight) method of translation,
3. employ contract translators,
4. utilize the services of JPRS,
5. provide the analysts with unedited “raw” (MT) output,
6. provide the analysts with postedited MT, or
7. use a system of machine-aided translation.

Throughout the subsequent discussion, the Committee has relied heavily on the cost figures developed by Arthur D. Little, Inc., and contained in An Evaluation of Machine-Aided Translation Activities at FTD. References to this study are indicated below by (ADL) followed by the appropriate page number.

IN-HOUSE TRANSLATORS

At the Foreign Technology Division, the in-house translators work at a rate of about 240 Russian words per hour (ADL, p. 29), yielding a daily output of approximately 2,000 words. Thus one translator can produce enough to keep two analysts in translations. Since ADL estimates (ADL, p. 21) that the cost for in-house translation is $22.97 per 1,000 Russian words, the cost for 11,000,000 Russian words would be $252,670. We assume that direct costs were included in this figure ($5.60 per hr) for translator time. Other costs that must be included in this type of operation are those of space, equipment, recomposition, and proofreading and review.
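The staffing and cost figures for in-house translation follow directly from the stated rates. A minimal sketch, with hypothetical function and parameter names and the daily output taken as the rounded 2,000 words given above:

```python
import math


def in_house_translation(words_per_year=11_000_000, working_days=220,
                         translator_words_per_day=2_000, cost_per_1000=22.97):
    """Translators needed and yearly translation cost for in-house work."""
    daily_demand = words_per_year / working_days          # Russian words needed per day
    translators = math.ceil(daily_demand / translator_words_per_day)
    cost = round(words_per_year / 1_000 * cost_per_1000, 2)
    return translators, cost


translators, cost = in_house_translation()
print(f"translators needed: {translators}")
print(f"translation cost: ${cost:,.2f}")
```

Under these assumptions the agency needs 25 translators for its 50 analysts, matching the statement that one translator keeps two analysts supplied, and the translation cost comes to $252,670.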
MAJOR COSTS FOR IN-HOUSE HUMAN TRANSLATION

25 Translators' salaries and direct cost overhead              $252,670
Recomposition ($14.15 per 1,000 words, ADL, p. 21)              155,650
Proofreading and review ($2.97 per 1,000 words, ADL, p. 21)      32,670
Duplication checks                                                    ?
Total                                                          $440,990

IN-HOUSE TRANSLATION EMPLOYING DICTATION

The Committee's study described in Appendix 14 revealed that the average typing speed of the translator was only 18 words a minute and that typing took approximately 25 percent of the total time needed to produce the translation. It would seem then to be advantageous to use the translator for translating and to use trained typists to do the typing. One agency (see Appendix 1, page 35) found that on suitable texts (those with few graphics to be inserted), the daily output of the translator was doubled. A typist trained in the use of dictating equipment can type about 8,000 words of English per day. To convert this to the number of Russian words one must employ a factor of 1.35 English words per Russian word. Thus the 8,000 English words would represent 6,000 words of original Russian text. If the over-all output of the translator were to be increased by as little as 25 percent, his output would amount to 2,500 words per day. At this rate of output, only 20 translators would be needed instead of 25, and about eight typists would be needed to keep up with the output of the translators.

Although some savings are realized from this type of system, owing to the fact that typists are paid at about half the rate of translators, such savings are offset to some extent by the additional space and equipment required. It seems likely, however, that the use of this system would result in a more attractive product, the copy having been prepared by well-trained typists. Furthermore, an estimated increase of only 25 percent, upon which we have based our computations, may be unduly conservative.
If this is so, and the Committee would like to see studies made to determine more accurately the actual advantages of various systems, the dictation method would be even more attractive.

CONTRACT TRANSLATION

Since contract translation costs vary widely, we will once more base our computations on data in the Arthur D. Little, Inc., report. The ADL team found that the cost per 1,000 Russian words was $24.57 for the translation process, $5.40 for insertion of graphics, and $2.97 for proofreading and review, or a total of $32.94 (ADL, p. 21). The Committee has been told by a reliable and knowledgeable individual connected with the translation at FTD that the proofreading and review procedure was unnecessary since the translations produced by the contractor were of excellent quality. Trusting this individual's judgment, but at the same time being aware that the ADL report is a careful study of what practices were in force (regardless of their necessity or degree of efficiency) at FTD, the Committee conjectured that $1.50 per 1,000 Russian words, rather than $2.97, might be a reasonable cost for the proofreading and review procedure; therefore, our computation differs from the ADL study. It is a fact that contractors have a lower overhead than in-house translators, and it is hoped that the significance of this item will not be overlooked by the reader.

An annual production of 11,000,000 Russian words by contract would cost the using agency

$270,270   for translation
  59,400   for graphics
  16,500   for proofreading and review
$346,170   Total

Since the average document to be translated is about 8,000 (Russian) words in length (ADL, p. A-8), our hypothetical agency would have to handle and control only six or seven documents a day, and few or no additional personnel would be needed for this task. Thus the $346,170 estimated above would approximate the total cost.
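The contract figures are per-1,000-word rates multiplied by the yearly volume. The sketch below reproduces them; the dictionary and function names are illustrative only, and the proofreading rate is the Committee's conjectured $1.50 rather than ADL's $2.97.

```python
# Dollars per 1,000 Russian words, as quoted in the text above.
CONTRACT_RATES = {
    "translation": 24.57,
    "graphics": 5.40,
    "proofreading and review": 1.50,   # Committee's conjecture, not ADL's $2.97
}


def contract_costs(words_per_year=11_000_000, rates=CONTRACT_RATES):
    """Yearly cost per item, in whole dollars, plus the grand total."""
    thousands = words_per_year / 1_000
    costs = {item: round(thousands * rate) for item, rate in rates.items()}
    costs["total"] = sum(costs.values())
    return costs


for item, dollars in contract_costs().items():
    print(f"{item:<25} ${dollars:>9,}")
```

The same function can be reused for any other per-1,000-word schedule by passing a different `rates` dictionary.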
THE JOINT PUBLICATIONS RESEARCH SERVICE (JPRS)

The JPRS (Appendix 3) utilizes subject matter specialists who work at home on a part-time, contract basis. Thus, JPRS is able to handle a large quantity of translations in many languages in many fields at low rates. Because it does handle a large quantity of translations, JPRS is able to charge the same price for all translations regardless of subject matter or language. The current price is $16 per 1,000 words of English. Applying the factor of 1.35 English words for each Russian word, one can see that 11,000,000 Russian words are the equivalent of 14,850,000 English words and that, therefore, the JPRS charge for such translation would amount to $237,600. Once again, as with any contract translation, the number of additional personnel would be minimal, and the cost above would be close to the true cost.

UNEDITED MACHINE TRANSLATION (MT)

The development of an MT program capable of producing translations of such a quality that they would be useful to the reader without requiring the intervention of a translator anywhere in the process has long been the goal of researchers in MT. As far as the Committee can determine, two attempts have been made to give analysts “raw” or unedited machine output. Neither proved to be satisfactory. The FTD experience is stated with admirable succinctness: “This marks a considerable change in attitude toward MT's which, in their earlier unedited form, were generally regarded as unsatisfactory” (ADL, p. F-5).

We have worked out a simple equation that shows how many dollars may be saved by using the unedited machine output. Let

CH = cost of human translation (dollars/1000 words),
CM = cost of MT (dollars/1000 words),
W = loaded salary of user of the translation (dollars/hr),
TH = reading time for human translation (hr/1000 words),
TM = reading time for MT (hr/1000 words),
N = number of people who read the translation,
S = saving by MT (dollars/1000 words).

Then

S = CH − CM − WN(TM − TH).
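The equation is easy to evaluate for different readerships. The sketch below is an illustration, not part of the report: the function names are hypothetical, and the default arguments are the values the report applies further on (CH = $21.60, CM = $7.63, W = $10.00, TH = 0.08, TM = 0.16).

```python
def mt_saving(n_readers, ch=21.60, cm=7.63, w=10.00, th=0.08, tm=0.16):
    """S = CH - CM - W*N*(TM - TH), in dollars per 1,000 words."""
    return ch - cm - w * n_readers * (tm - th)


def break_even_readers(ch=21.60, cm=7.63, w=10.00, th=0.08, tm=0.16):
    """Number of readers N at which the saving S falls to zero."""
    return (ch - cm) / (w * (tm - th))


for n in (1, 10, 15, 17, 18, 20):
    print(f"N = {n:>2}: S = {mt_saving(n):7.2f}")
print(f"break-even at about {break_even_readers():.2f} readers")
```

Because the extra reading time is paid once per reader, S falls linearly with N; with these inputs the saving turns negative between 17 and 18 readers.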
Presumably the saving would be greatest if the reader merely read machine print-out, referring to the untranslated original for figures and equations. Here the cost of machine output could best be compared, not with the cost of JPRS translations, but with the cost of dictated and uncorrected human translations, either voice on tape, or a typewritten transcription of the tape. As we have pointed out in Appendix 1, such translation can be carried out several times as fast as “full translation.” Unfortunately, we do not know what the costs are for translations that are dictated but not typed. It would seem likely, however, that savings would be substantial, since there would be no costs (a) for typist-transcriptionists or (b) for recomposition. Whether the savings involved would be offset by increased difficulty of use by the analyst is not known. Although the analyst would not be presented with a written translation, he would at least be assured of having all the words translated, unlike the raw MT output.

Most translations are apparently read by more than one reader. According to one agency, the preparation of 175 copies of a translation for distribution is standard for documents that appeared originally in the open literature and this distribution accounts for about 90 percent of the documents translated. For the remaining 10 percent (the classified documents) only one copy is prepared, but the requester has the privilege of making as many copies as he deems fit. Even more astonishing is the estimate of the Arthur D. Little, Inc., team that “about 615 members of the Air Force R&D community (40,000 members) would be expected to have a common interest in the average translated document” (ADL, p. F-9).

It was shown by John B. Carroll, in the study that he did for the Committee (see Appendix 10), that the average reader tested took twice as long to read raw MT as he did to read a human translation.
The ADL team found that the average reading rate of those tested was 200 words per minute for well-written English (ADL, p. D-6) or 0.08 hr per 1,000 words. From these two studies we determined the reading rate for raw MT to be 100 words per minute or 0.16 hr per 1,000 words.

Raw MT should be compared, as has been mentioned, with an equally inelegant product. But the Committee has no idea of the cost of a comparable product or the time required to read (or listen to) it, and these factors are crucial in the calculation of savings according to our equation. Prudence demands that we compare raw MT with a product about which we have more certain knowledge concerning cost and reading rates even though such translations are of higher quality. For the purposes of comparison, we have chosen the JPRS for the simple reasons that (1) it is relatively inexpensive and (2) the costs are known and stable.

Applying our equation, we have

CH = $21.60 (the JPRS cost per 1,000 Russian words, the conversion factor of 1.35 being applied to $16.00, the cost per 1,000 English words),
CM = $7.63,
W = $10.00,
TH = 0.08,
TM = 0.16.

Utilizing the figures above, but varying N (the number of readers), we arrive at the savings made by the use of raw output.

If the number of readers is 1:
S = $21.60 − 7.63 − (10.00 × 1 × (0.16 − 0.08)),
S = $21.60 − 7.63 − 0.80,
S = $13.17.

If the number of readers is 10: S = $5.97.
If the number of readers is 15: S = $1.97.
If the number of readers is 17: S = $0.37.
If the number of readers is 18: S = −$0.43.
If the number of readers is 20: S = −$2.03.
If the number of readers is 80: S = −$40.13.
If the number of readers is 175: S = −$127.03.
If the number of readers is 615: S = −$478.13.

Obviously, the break-even point occurs between 17 and 18 readers. But we have seen that, in one agency at least, about 90 percent of the translations are distributed to 175 readers, whereas only 10 percent are prepared for a single reader.
By simple computation it can be determined that whereas the use of JPRS for all translation would result in a loss of $14,487, the use of MT for all translation would result in a loss of $1,257,597. It might be argued that MT is still economical when used to provide translations that are user-limited; but, since relatively few translations seem to be destined for use by less than 18 readers, the volume would probably be too small to warrant the maintenance of an elaborate computer facility with its attendant personnel. To the Committee, machine output (such as that shown on pages 20-23) seems very unattractive. We believe that the only valid argument for its use would be a compelling economic argument. If it can be shown that the use of unedited machine output, taking proper account of increased reading time on the part of the readers, would result in worthwhile savings over efficient human translation of the most nearly comparable kind, then there is a cogent reason for using unedited MT. But, unless such a worthwhile saving can be convincingly demonstrated, we regard the use of unedited machine output as regressive and unkind to readers. In considering the cost of producing unedited machine output we must use the real current cost. It is nice to think that savings may be made someday by using automatic character recognition, but actual savings should be demonstrated conclusively before machine output is inflicted on users in any operational manner. POSTEDITED MACHINE TRANSLATION (MT) To provide 11,000,000 words of postedited Russian-to-English MT per year would cost $397,980 . This estimate should be regarded as a very low one, since the ADL team did not include overhead costs (ADL, p. 3). ADL figures (ADL, p. E-5) that for 100,000 words per day, 44 individuals would be required; for input typing, 14; for machine operation, 1.6; for output typing, 1.4; and for postediting, 28. 
Since we are assuming a 50,000-word-per-day consumption, we will halve this estimate, giving a total of 22 personnel. The point the Committee would like to make in this connection is that since 22 personnel would be required, 14 of whom (the posteditors) have to be proficient in Russian, one might as well hire a few more translators and have the translations done by humans. Another, perhaps better, alternative would be to take part of the money spent on MT and use it either (1) to raise salaries in order to hire bilingual analysts–thus avoiding translation altogether–or, (2) to use the money to teach the analysts Russian. MACHINE-AIDED TRANSLATION (M-AT) We will call M-AT any system of human translation that utilizes the computer to assist the translator and that was designed originally for such a purpose. A system such as that at the FTD might properly be called human-aided machine translation, since the postediting process was added after it became apparent that raw output was unsatisfactory and since humans are employed essentially to make up for the deficiencies of the computer output. Specific costs for the two types of M-AT systems in operation (see Appendix 12 and Appendix 13) are not known to the Committee, but from the given figures that show the proportion of translator time saved, it is possible to make some rough estimates. Both the Federal Armed Forces Translation Agency and the European Coal and Steel Community indicate that a saving of about 50 percent of the translator's time could be expected by the use of a machine-aided system. Since translators' salaries constitute the largest item in the budget for a human-translation facility, such savings would probably be substantial. Input typing costs would not be as great as those at FTD, where the entire document to be translated is keypunched, since only the individual words or sentences with which the translator desires help are keypunched. 
Furthermore, the programming involved is relatively simple and small, and inexpensive computers are adequate. The relatively modest increases in staff, equipment, and money necessary for the production of translator aids are likely to be offset by the increase in quality of the product. It is possible, therefore, that the savings of an M-AT system might approach 50 percent of the cost of translator salaries in a conventional human-translation system. If this estimate is sound, then the cost for an M-AT system to produce 11,000,000 words of Russian-to-English translation would be $314,655 ($126,335 for salaries, $155,650 for recomposition, $32,670 for proofreading and review).

SUMMARY

Throughout our discussion of costs, we have been conscious of the fact that we were not in possession of all the necessary data. We present the following estimates with diffidence and would welcome any studies that would more precisely determine actual translation costs and quality, whether they affirm or deny the validity of our estimate.

ESTIMATES OF COSTS AND QUALITY FOR VARIOUS TYPES OF TRANSLATION

Type                                   Quality           Cost for 11,000,000 Russian Words
In-house (conventional translation)    Good              $440,000
In-house (dictation)                   Good               440,000 −
Contract                               Fair to good       350,000
JPRS                                   Fair               240,000
Raw MT                                 Unsatisfactory      80,000 +
Postedited MT                          Fair               400,000
M-AT                                   Excellent          310,000
Analysts proficient in Russian         -                        0

CONCLUSION

Since no one can be proficient in all languages, there will always be a need for translation. Yet, publication is not evenly distributed among the some 4,000 languages of the world, and this is especially so in the areas of science and technology. Russian-to-English translation constitutes a large part of the total translation done in the United States, and there are no signs that this situation is likely to change radically in the foreseeable future.
This being the case, the present policy of using monolingual analysts and providing them with translations year after year seems lacking in foresight, particularly since the time required for a scientist to learn a foreign language well enough to read an article in his own field of specialization is not very long, and since the facilities are available to train him. In our hypothetical agency, the costs of providing fair and good translations were from 30 to 55 percent greater than the estimated costs of a facility using analysts proficient in Russian. To allow heavy users of Soviet literature to continue to rely on translations seems unwise. Appendix 10 An Experiment in Evaluating the Quality of Translations This experiment* was designed to lay the foundations for a standard procedure for measuring the quality of scientific translations, whether human or mechanical. There have been other experiments on this problem , but their methods for evaluating translations have been too laborious, too subject to arbitrariness in standards, or too lacking in reliability and/or validity to become generally accepted. The measurement procedure developed here gives promise of being amenable to refinement to the point where it will meet the requirements of relative simplicity and feasibility, fixed standards of evaluation, and high validity and reliability. A detailed report of this experiment will be submitted for publication elsewhere; the present brief report will serve to indicate the general nature of the measurement procedure and some of the chief results. THE MEASUREMENT PROCEDURE It was reasoned that the two major characteristics of a translation are (a) its intelligibility, and (b) its fidelity to the sense of the original text. Conceptually, these characteristics are independent; that is, a translation could be highly intelligible and yet lacking in fidelity or accuracy. 
Conversely, a translation could be highly accurate and yet lacking in intelligibility; this would be likely to occur, however, only in cases where the original had low intelligibility. Essentially, the method for evaluating translations employed in this experiment involved obtaining subjective ratings for these two characteristics– intelligibility and fidelity–of sentences selected randomly from a translation and interspersed in random order among other sentences from the same translation and also among sentences selected at random from other translations of varying quality. When a translation sentence was being rated for intelligibility, it was rated without reference to the original. “Fidelity” was measured indirectly: the rater was asked to gather whatever meaning he could from the translation sentence and then evaluate the original sentence for its “informativeness” in relation to what he had understood from the translation sentence. Thus, a rating of the original sentence as “highly informative” relative to the translation sentence would imply that the latter was lacking in fidelity. All ratings were made by persons who were specially selected and trained for this purpose. There were two sets of raters. The first set of raters (called here “monolinguals” for convenience) consisted of 18 native speakers of English who had no knowledge of the language of the original (Russian, in this case). They were all Harvard undergraduates with high tested verbal intelligence and with good backgrounds in science. In rating “informativeness” these raters were provided with carefully prepared English translations of the original sentences, so that in effect they were comparing two sentences in English–one the sentence from the translation being evaluated, and the other the carefully prepared translation of the original. 
The second set of raters (“bilinguals”) consisted of 18 native speakers of English who had a high degree of competence in the comprehension of scientific Russian. Their ratings of the intelligibility of the translation sentences may well have been influenced by their knowledge of the vocabulary and syntax of Russian; at any rate, no attempt was made to prevent them from using such knowledge. To rate “informativeness,” they made a direct comparison between the translation sentence (in English) and the original version. All ratings were made on nine-point scales that had been established by the writer prior to the experiment by an adaptation of a psychometric technique known as the method of equal-appearing intervals. Thus, points on these scales could be assumed to be equally spaced in terms of subjectively observed differences. In the case of the intelligibility scale, each of the nine points on the scale had a verbal description (see Table 4). The same was true of the “informativeness” scale except that verbal descriptions were omitted for a few of the points (see Table 5). In this way each degree on the scales could be characterized in a meaningful way. For example, point 9 on the intelligibility scale was described as follows: “Perfectly clear and intelligible. Reads like ordinary text; has no stylistic infelicities.” Point 5 (the midpoint of the scale): “The general idea is intelligible only after considerable study, but after this study one is fairly confident that he understands. Poor word choice, grotesque syntactic arrangement, untranslated words, and similar phenomena are present, but constitute mainly ‘noise’ through which the main idea is still perceptible.”

TABLE 4. Scale of Intelligibility
9–Perfectly clear and intelligible. Reads like ordinary text; has no stylistic infelicities.
8–Perfectly or almost clear and intelligible, but contains minor grammatical or stylistic infelicities, and/or mildly unusual word usage that could, nevertheless, be easily “corrected.”
7–Generally clear and intelligible, but style and word choice and/or syntactical arrangement are somewhat poorer than in category 8.
6–The general idea is almost immediately intelligible, but full comprehension is distinctly interfered with by poor style, poor word choice, alternative expressions, untranslated words, and incorrect grammatical arrangements. Postediting could leave this in nearly acceptable form.
5–The general idea is intelligible only after considerable study, but after this study one is fairly confident that he understands. Poor word choice, grotesque syntactic arrangement, untranslated words, and similar phenomena are present, but constitute mainly “noise” through which the main idea is still perceptible.
4–Masquerades as an intelligible sentence, but actually it is more unintelligible than intelligible. Nevertheless, the idea can still be vaguely apprehended. Word choice, syntactic arrangement, and/or alternative expressions are generally bizarre, and there may be critical words untranslated.
3–Generally unintelligible; it tends to read like nonsense but, with a considerable amount of reflection and study, one can at least hypothesize the idea intended by the sentence.
2–Almost hopelessly unintelligible even after reflection and study. Nevertheless, it does not seem completely nonsensical.
1–Hopelessly unintelligible. It appears that no amount of study and reflection would reveal the thought of the sentence.

TABLE 5. Scale of Informativeness
(This pertains to how informative the original version is perceived to be after the translation has been seen and studied. If the translation already conveys a great deal of information, it may be that the original can be said to be low in informativeness relative to the translation being evaluated. But if the translation conveys only a certain amount of information, it may be that the original conveys a great deal more, in which case the original is high in informativeness relative to the translation being evaluated.)
9–Extremely informative. Makes “all the difference in the world” in comprehending the meaning intended. (A rating of 9 should always be assigned when the original completely changes or reverses the meaning conveyed by the translation.)
8–Very informative. Contributes a great deal to the clarification of the meaning intended. By correcting sentence structure, words, and phrases, it makes a great change in the reader's impression of the meaning intended, although not so much as to change or reverse the meaning completely.
7–(Between 6 and 8.)
6–Clearly informative. Adds considerable information about the sentence structure and individual words, putting the reader “on the right track” as to the meaning intended.
5–(Between 4 and 6.)
4–In contrast to 3, adds a certain amount of information about the sentence structure and syntactical relationships; it may also correct minor misapprehensions about the general meaning of the sentence or the meaning of individual words.
3–By correcting one or two possibly critical meanings, chiefly on the word level, it gives a slightly different “twist” to the meaning conveyed by the translation. It adds no new information about sentence structure, however.
2–No really new meaning is added by the original, either at the word level or the grammatical level, but the reader is somewhat more confident that he apprehends the meaning intended.
1–Not informative at all; no new meaning is added, nor is the reader's confidence in his understanding increased or enhanced.
0–The original contains, if anything, less information than the translation. The translator has added certain meanings, apparently to make the passage more understandable.

PREPARATION OF TEST MATERIALS AND COLLECTION OF DATA
The measurement procedure was tested by applying it to six varied English translations–three human and three mechanical–of a Russian work entitled Mashina i Mysl' (Machine and Thought), by Z. Rovenskii, A. Uemov, and E. Uemova (Moscow, 1960). These translations were of five passages varying considerably in type of content. (All the passages selected for this experiment, with the original Russian versions, have now been published by the Office of Technical Services, U.S. Department of Commerce, Technical Translation TT 65-60307.) The materials associated with one of these passages were used for pilot studies and rater practice sessions; the experiment proper used the remaining four passages. In preparing materials for the rating task, 36 sentences were selected at random from each of the four passages under study. Since six different translations were being evaluated, six different sets of materials were prepared (in two forms, one for the monolinguals and one for the bilinguals) in such a way that each set contained a different translation of a given sentence. In this way no rater evaluated more than one translation of a given sentence. Each set of materials was given to three monolinguals and to three bilinguals; thus, there were 18 monolinguals and 18 bilinguals. Each rater had 144 sentences to evaluate first for intelligibility and then for the informativeness of the original (or the standard translation of it) after the translation had been seen. The raters required three 90-min sessions to complete this task, dealing with 48 sentences in each session. The raters were not informed as to the source of the translations they were rating, although they were told that some had been made by machine. Before undertaking this task, the raters attended a 1-hr session in which they were given instruction in the rating procedures and required to work through a 30-sentence practice set.
During the rendering of ratings for intelligibility, the raters held stopwatches on themselves to record the number of seconds it took them to read and rate each sentence. RESULTS The results of the experiment can be considered under two headings: (a) the average scores of the various translations, and (b) the variation in the scores as a function of differences in sentences, passages, and raters. Table 6 gives the over-all mean ratings and time scores for the six translations, arranged in order of general excellence according to our data. Consider first the mean ratings for intelligibility by the monolinguals. Translation 1, a published human translation that had presumably been carefully done, received the highest mean rating, 8.30, on the scale established in Table 4. But 8.30 is still appreciably different from the maximum possible mean rating of 9.00, and it is evident that not even this “careful” human translation was as good as one might have expected. Furthermore, the mean rating of Translation 1 is not significantly different from that of Translation 4 (8.21), a “quick” human translation made by rapid dictation procedures. The mean ratings of Translations 1 and 4 do, however, differ significantly from the mean rating (7.36) of Translation 2, another “quick” human translation. It may be concluded that the measurement procedure studied here is sensitive enough to differentiate among human translations. A similar remark may be made about the sensitivity of this procedure to differences in the intelligibility of machine translations. Translations 7 and 5 were shown to be significantly more intelligible, on the average, than Translation 9. Of most current interest, however, are the results having to do with the comparison of the human and the machine translations. Machine translations 7, 5, and 9 received mean ratings, respectively, of 5.72, 5.50, and 4.73. 
A scale value of 5 refers to a translation in which “the general idea is intelligible only after considerable study, but after this study one is fairly confident that he understands . . .” All these machine translations are significantly less intelligible, on the average, than any of the three human translations. As machine translations improve, it should be possible to scale them by the present rating procedure to determine how nearly they approach human translations in intelligibility. The monolinguals' mean ratings on “informativeness” (reflecting the lack of fidelity of the translations) show an almost perfect inverse relationship to the mean ratings on intelligibility, and they differentiate the various translations in the same way and to the same extent. This result means that in practice, when ratings are averaged over sentences, passages, and raters, “intelligibility” and “fidelity” are very highly correlated. The detailed results of this study show that only in the case of a few particular sentences do the mean ratings of intelligibility and informativeness convey different information. Furthermore, the mean reading times per sentence show almost precisely the same pattern of results as the ratings. In fact, the mean reading times are linearly related to the mean ratings, a result that supports the conclusion that the points on the rating scales are evenly spaced. The results from the ratings by bilinguals contribute nothing more to the differentiation of the translations than is obtainable with the monolinguals' ratings. Bilinguals' intelligibility ratings of the translations are slightly (and significantly) higher, on the average, than those of the monolinguals, and correspondingly, their informativeness ratings are slightly lower. Yet, they took significantly longer to read and rate the sentences. Apparently their knowledge of Russian caused them to work harder on trying to understand the translations. 
One is inclined to give more credence to the results from the monolinguals because monolinguals are more representative of potential users of translations and are not influenced by knowledge of the source language. It is also to be noted that the data from the monolinguals differentiate the translations to a somewhat greater extent than do the data from the bilinguals.

The results concerning the differences in ratings due to differences in sentences, passages, and raters can now be considered. (The detailed tables of these results are omitted here to save space.) The more important results may be summarized as follows:

1. The results do not differ significantly from passage to passage; that is, on the average the various passages from a given translation receive highly similar ratings. For intelligibility ratings, however, there is a small but significant interaction between translation and passage, indicating that translations are to some extent differentially effective for different types of content. (This interaction effect is present both for human and for machine translations.)
2. There is a marked variation among the sentences. In fact, as may be seen from Figure 1, there is some overlap between sentences from human translations and from mechanical translations; or, in other words, there are some sentences translated by machine that have higher ratings than some other sentences translated by human translators, even though, on the average, the human-translated sentences are better than the machine-translated ones. These results imply that in order to obtain reliable mean ratings for translations, a fairly large sample of sentences must be rated.
3. Variation among raters is relatively small, but it is large enough to suggest that ratings should always be obtained from several raters–say at least three or four.

CONCLUSION

This experiment has established the fact that highly reliable assessments can be made of the quality of human and machine translations.
In the case of the six particular translations investigated in the study, all the human translations were clearly superior to the machine translations; further, some human translations were significantly superior to other human translations, and some machine translations were significantly superior to other machine translations. On the whole, the machine translations were found to fall about at the midpoint of a scale ranging from the best possible to the poorest possible translation. What is still needed, however, is a system whereby any translation can be easily and reliably assessed. The present experiment has determined the necessary parameters of such a system. FIGURE 1. Frequency distribution of monolinguals' mean intelligibility ratings of the 144 sentences in each of six translations. Translations 1, 4, and 2 are human translations; Translations 7, 5, and 9 are machine translations. 【置顶:立委科学网博客NLP博文一览(定期更新版)】
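The counterbalanced design described under PREPARATION OF TEST MATERIALS AND COLLECTION OF DATA can be sketched in a few lines of Python. This is only an illustration of the stated constraints (six sets, 144 sentences per rater, no rater seeing two translations of the same sentence). The report does not say how sentences were actually allotted to sets, so the cyclic rotation below, and the names `build_material_sets` and `material_sets`, are assumptions made for this sketch.

```python
from collections import Counter

# Labels as used in the report: 1, 4, 2 are human translations; 7, 5, 9 are machine translations.
TRANSLATIONS = [1, 4, 2, 7, 5, 9]
N_PASSAGES = 4               # passages used in the experiment proper
SENTENCES_PER_PASSAGE = 36   # sentences sampled at random from each passage
RATERS_PER_SET = 3           # per rater group (monolingual or bilingual)


def build_material_sets():
    """Build one material set per translation by cyclic rotation.

    ASSUMPTION: the report does not describe the allotment scheme; a cyclic
    (Latin-square style) rotation is used here purely for illustration.
    """
    n_sets = len(TRANSLATIONS)
    n_sentences = N_PASSAGES * SENTENCES_PER_PASSAGE
    return [
        [(sentence, TRANSLATIONS[(sentence + offset) % n_sets])
         for sentence in range(n_sentences)]
        for offset in range(n_sets)
    ]


material_sets = build_material_sets()

# Each rater works through one set, so sees every sentence exactly once.
sentences_per_rater = len(material_sets[0])

# Across the six sets, every sentence appears in all six translations.
all_covered = all(
    {s[i][1] for s in material_sets} == set(TRANSLATIONS)
    for i in range(sentences_per_rater)
)

# Under this particular rotation each set holds an equal share of every translation.
per_translation = Counter(t for _, t in material_sets[0])

# Three raters per set gives the size of each rater group.
raters_per_group = len(material_sets) * RATERS_PER_SET

print(sentences_per_rater, all_covered, dict(per_translation), raters_per_group)
```

With these figures the sketch reproduces the numbers given in the text: 144 sentences per rater and 18 raters in each group. The equal share of 24 sentences per translation in each set is a property of the assumed rotation, not something the report states.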
自动语言处理和计算语言学 在过去的10年里,政府已经使用了,通过各种机构,约2000万美元用于 机器翻译及其密切相关的科目(见附件16 ) 。这已经超过了政府 1年 翻译费用 以上。其他资金已分配到信息检索,图书馆自动化,编程指令。 虽然分时操作的机械制造和编程技术,已经部分得到来自政府的支持,计算机行业也已经使用它自己的资源用于机器开发,自动语言处理相关的支出在计算机硬件进展中起着明显的次要角色。工业界也一直负责投入计算机对新闻字距和连字符调整及其相关排版方面的重要技术(见附录17 ) ,或许是因为这方面的市场较易确定。 相对于 计算机硬件方面的较 小 影响 ,机器翻译,及其由此催生的 计算语言学工作 ,对 计算机软件(编程技术和系统) 做出了显著贡献 。这些贡献在 附录18中有相当详细的 讨论。 到目前为止,机器翻译最重要的结果在于其对语言学的影响,附录19中有更多细节的描述。 计算语言学的问世有望在自然语言的研究工作中引起一场革命。十年前,大多数语言学家认为,句法主要涉及调整词序、形态、功能词(如介词和连词) ,以及语调或标点符号。他们还认为,在普通环境下,多数以英语为母语的人说出的句子语法没有歧义。今天,他们知道,这两个信念相互不协。这个认识是计算机对普通的句子自动分析(parsing)的直接结果,他们使用的是迄今能设计的合理文法,利用程序让给定文法下的所有歧义完全暴露。 如今仍有理论语言学家对实证和计算都不感兴趣,也有应用语言学家对十年来的理论进展无动于衷,对计算机也很木纳。但是,比以往任何时候都有更多的语言学家尝试把微妙的语言理论与更丰富的数据相结合,他们中几乎所有人,无论在哪个国家,都渴望计算机的支持。前一代人需要一辈子来做的一些语言工作(譬如建立对照语库、词汇表、肤浅的文法),如今借助计算机几个星期即可完成。在对于作为人类交流工具的自然语言的理解方面,人类的确迈出了万里长征的第一步。 语言学的革命不完全是机器翻译和自动分析工作的结果,但没有这些尝试,语言学革命不可能如此广泛或重大。 我们看到计算机为语言学家预备了一系列新的挑战、见地和机会。我们相信,这些挑战可与粒子物理面临的挑战、问题和见地类比。毫无疑问,语言在所有现象中的重要性是首屈一指的。而且计算语言学所需要的工具成本,比起需要数十亿伏加速器的粒子物理小太多了。 新的语言学提出了一个有吸引力而且极其重要的挑战。 我们完全有理由相信,面对这一挑战,最终将导致在许多领域的重要贡献。一个更深的语言知识可以帮助: 更有效地教外语。 教语言的本质更有效。 更有效地使用自然语言下指令和通信。 帮助我们构造为特殊用途(例如,飞行员控制塔通讯语言)的人工语言。 使我们能够在语言的使用以及人的沟通和思想方面做有意义的心理实验。除非我们知道语言是什么,我们不知道我们必须解释什么。 用机器辅助翻译和信息检索。 然而,语言学的状态是这样的,本身具有价值的优秀研究是必不可少的,如果 语言学 最终要做出这些贡献。 这样的研究必须 使用 电脑。我们必须研究以找出有关语言奥妙的数据是压倒性的,无论在数量还是复杂性上。电脑承诺帮助我们控制 巨大的数据量 问题,并在一定程度上对付数据的复杂性问题。 但是,我们尚不具有明确而容易使用的电脑处理语言数据的好方法。 因此,下列重要的研究,是需要做的,应予以支持:(1) 计算机处理语言的方法的 基本开发研究,譬如帮助语言科学家发现并说明他的概括的工具,并作为工具帮助检查对数据的概括 建议; (2)发展研究的方法,让语言的科学家用电脑来陈述他们的详细复杂的各种理论(例如,语法和意义理论),使他们生产的理论可以被检查细节。 改善翻译的道路 我们已经注意到,我们已经具有一般科学文献的机器辅助翻译,但是我们并不具有真正有用的机器翻译。此外,机器翻译也不具备直接的或可预见的前景。 我们已经指出,机器翻译的重要贡献主要在促进语言学以及计算机编程方面的进展。我们注意到,翻译本身虽然非常重要,但对翻译需求的满足只要一个不大但有能力的活动组织即可。当然,我们发现,翻译质量的改善还是有具备吸引力的机会,我们呼吁加强针对翻译改善方面的工作。我们也注意到为了保证翻译质量,成本会有显著变化。 因此,取得客观的对准确性和质量的评价非常重要。实际有用的测试,如附录10中所描述的努力,是最重要的。 机器辅助可能是人工翻译或机助翻译的一个重要的支持。美国空军外国技术部( FTD )的数字显示,生产成本(最终翻译的组装和再生产)是非常高的。看来,翻译期刊延误是由于生产,而不是翻译。编辑和生产采用机械化手段可能是可取的(见附件17 ) 。这方面研究和开发的主要成本最好可以由其他比翻译更大的领域来承担。 
机器辅助翻译可能是朝着更好、更快、更便宜的翻译发展的一个重要途径。机器辅助翻译最需要的是良好的工程。什么对人最有帮助,是特殊词汇表,文中部分或全部词的词典查找,还是一个粗略的翻译,如由FTD产出的那样 ?延误往往由于许多步骤需要排队等候所致,怎样才能避免这些延误?如何削减生产成本? 自动字符识别经常被认为对机器辅助翻译很重要。 FTD的数字表明,自动字符识别可能对作业成本略有降低。自动字符识别的工作由下列几种活动资助(例如,信息检索,邮局),这些活动领域通过成功的字符识别将比机器辅助翻译要节省更多成本。因此, 只要能节省 钱就 应采用 字符识别。但这方面研发不需要机器翻译来资助。 最后,对改善翻译究竟应该花多少钱来研究和开发?对一个相对较小规模而且满意度的很好的翻译产业上花费大笔钱,是不合理的。 委员会无法判断改善翻译究竟应该需要在研究和开发上年度总投入多少为宜。然而,钱应该花在脚踏实地、重要而相对短期见效的目标上。 建议 委员会建议在两个不同的领域投入。 首先是作为语言学一部分的计算语言学研究,如自动文法分析、句子生成、结构、语义、统计以及定量的语言问题,包括带有机器辅助或不带机助的实验。应当支持作为科学来研究语言学,这种研究不应根据其在实际翻译的任何直接或可预见的贡献来判断。重要的是要找有能力的人来审批研究方案,评判人应该有能力审定现代语言学的工作,并根据方案的科学价值进行评估。 第二个方面是改善翻译。应该得到资助的工作包括 实用的翻译评价方法; 加快人类翻译过程的种种手段; 评估翻译的质量和成本的各种来源; 调查的翻译的利用率,防止生产无人使用的翻译; 考察翻译全过程的延误,以及消除延误的方法,无论是杂志翻译还是个别项目的翻译; 评价各种各样的机器辅助翻译的相对速度和成本; 现有机械化编辑和翻译生产过程的改造; 翻译全过程; 以及 生产足够的翻译工作参考资料,包括现在主要存在于机器翻译自动字典查找中的词汇表。 所有这些研究的目 应当是增加翻译速度, 降低翻译成本,并达到 指定 的可接受的质量。 ~~~~~~~~~~~~~~~~~~~~~~~~~ Automatic Language Processing and Computational Linguistics Over the past 10 years the government has spent, through various agencies, some $20 million on machine translation and closely related subjects (see Appendix 16). This is more than the government cost of translation for 1 year. Other moneys have been allocated to information retrieval, library automation, and programmed instruction. Although techniques of machine construction and programming for time-shared operation have been developed with partial support from the government, the computer industry has spent its own resources in machine development, and expenditures in connection with automatic language processing have played a distinctly minor role in advances in computer hardware. Industry has also been responsible for the development of important techniques of computer justification and hyphenation of newsprint and related matters of composition (see Appendix 17), perhaps because the market was easy to determine. 
As opposed to its small effect on computer hardware, work toward machine translation, together with the computational linguistic work that has grown out of it, has contributed significantly to computer software (programming techniques and systems). These contributions are discussed in considerable detail in Appendix 18. By far the most important outcome of work toward machine translation has been its effect on linguistics, which is described in more detail in Appendix 19. The advent of computational linguistics promises to work a revolution in the study of natural languages. A decade ago, most linguists believed that syntax had to do with word order, inflection, function words (e.g., prepositions and conjunctions), and intonation or punctuation. They also believed that most sentences uttered by native speakers in ordinary contexts were syntactically unambiguous. Today, they know that these two beliefs are mutually inconsistent. Their knowledge is the immediate result of computer parsing of ordinary sentences, using reasonable grammars as hitherto conceived and programs that expose all ambiguities under a fixed grammar. Today there are linguistic theoreticians who take no interest in empirical studies or in computation. There are also empirical linguists who are not excited by the theoretical advances of the decade – or by computers. But more linguists than ever before are attempting to bring subtler theories into confrontation with richer bodies of data, and virtually all of them, in every country, are eager for computational support. The life's work of a generation ago (a concordance, a glossary, a superficial grammar) is the first small step of today, accomplished in a few weeks (next year, in a few days), the first of 10,000 steps toward an understanding of natural language as the vehicle of human communication. 
The revolution in linguistics has not been solely a result of attempts at machine translation and parsing, but it is unlikely that the revolution would have been extensive or significant without these attempts. We see that the computer has opened up to linguists a host of challenges, partial insights, and potentialities. We believe these can be aptly compared with the challenges, problems, and insights of particle physics. Certainly, language is second to no phenomenon in importance. And the tools of computational linguistics are considerably less costly than the multibillion-volt accelerators of particle physics. The new linguistics presents an attractive as well as an extremely important challenge. There is every reason to believe that facing up to this challenge will ultimately lead to important contributions in many fields. A deeper knowledge of language could help 1. to teach foreign languages more effectively; 2. to teach about the nature of language more effectively; 3. to use natural language more effectively in instruction and communication; 4. to enable us to engineer artificial languages for special purposes (e.g., pilot-to-control tower languages); 5. to enable us to make meaningful psychological experiments in language use and in human communication and thought (unless we know what language is we do not know what we must explain); and 6. to use machines as aids in translation and in information retrieval. However, the state of linguistics is such that excellent research, which has value in itself, is essential if linguistics is ultimately to make such contributions. Such research must make use of computers. The data we must examine in order to find out about language is overwhelming both in quantity and in complexity. Computers give promise of helping us control the problems relating to the tremendous volume of data, and to a lesser extent the problems of data complexity. 
But, we do not yet have good, easily used, commonly known methods for having computers deal with language data. Therefore, among the important kinds of research that need to be done and should be supported are (1) basic developmental research in computer methods for handling language, as tools for the linguistic scientist to use as a help to discover and state his generalizations, and as tools to help check proposed generalizations against data; and (2) developmental research in methods to allow linguistic scientists to use computers to state in detail the complex kinds of theories (for example, grammars and theories of meaning) they produce, so that the theories can be checked in detail. Avenues to Improvement of Translation We have already noted that, while we have machine-aided translation of general scientific text, we do not have useful machine translation. Further, there is no immediate or predictable prospect of useful machine translation. We have noted that the important contributions of machine translation have been primarily to linguistics and secondarily to computer programming. We have noted that while translation itself is vital, needs for translation are being met by a small though capable activity. We find, however, that there are attractive opportunities for improvement in translation, and we urge work aimed at such improvement. We have noted the importance of quality in translations. We have noted that cost varies markedly with asserted quality. It is important, therefore, to achieve some objective evaluation of accuracy and quality. Work toward practical useful tests, such as that described in Appendix 10, is of the greatest importance. Machine aids may be an important adjunct to human or machine-aided translation. USAF Foreign Technology Division (FTD) figures show that production costs (assembly and reproduction of the final translations) are very high. 
It appears that delays in translated journals are attributable to production rather than to translation. Adoption of mechanized means of editing and production might be desirable (see Appendix 17). Here the main cost of research and development can best be borne by other, larger fields than translation. Machine-aided translation may be an important avenue toward better, quicker, and cheaper translation. What machine-aided translation needs most is good engineering. What will help the human being most–special glossaries, dictionary look-up of some or all words in the text, or a rough translation such as that produced by FTD? How can the delays due to queues at many tandem steps be avoided? How can production costs be cut? Automatic character recognition is often mentioned as important to machine-aided translation. FTD figures indicate that automatic character recognition could slightly decrease the cost of the operation. Automatic character recognition work is being supported heavily in connection with several kinds of activity (information retrieval, post office, for example) where the financial savings through successful character recognition would be much greater than in machine-aided translation. Hence, character recognition should be adopted when and if it will save money, but research and development need not be supported in connection with machine translation. Finally, how much should be spent on research and development toward improving translation? It would be unreasonable to spend extravagantly on a relatively small business that is doing the job satisfactorily. The Committee cannot judge what the total annual expenditure for research and development toward improving translation should be. However, it should be spent hardheadedly toward important, realistic, and relatively short- range goals. Recommendations The Committee recommends expenditures in two distinct areas. 
The first is computational linguistics as a part of linguistics–studies of parsing, sentence generation, structure, semantics, statistics, and quantitative linguistic matters, including experiments in translation, with machine aids or without. Linguistics should be supported as science, and should not be judged by any immediate or foreseeable contribution to practical translation. It is important that proposals be evaluated by people who are competent to judge modern linguistic work, and who evaluate proposals on the basis of their scientific worth. The second area is improvement of translation. Work should be supported on such matters as 1. practical methods for evaluation of translations; 2. means for speeding up the human translation process; 3. evaluation of quality and cost of various sources of translations; 4. investigation of the utilization of translations, to guard against production of translations that are never read; 5. study of delays in the over-all translation process, and means for eliminating them, both in journals and in individual items; 6. evaluation of the relative speed and cost of various sorts of machine-aided translation; 7. adaptation of existing mechanized editing and production processes in translation; 8. the over-all translation process; and 9. production of adequate reference works for the translator, including the adaptation of glossaries that now exist primarily for automatic dictionary look-up in machine translation. All such studies should be aimed at increasing the speed and decreasing the cost of translations and at specifying degrees of acceptable quality.
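The Present State of Machine Translation

“Machine translation” presumably means going by algorithm from machine-readable source text* to useful target-language text, without recourse to human translation or postediting. In this sense, there has been no machine translation of general scientific text, and none is in prospect for the foreseeable future.

The contention that there has been no machine translation of general scientific text is supported by the following facts. When, after 8 years of work, the Georgetown University MT project tried to produce useful output in 1962, it still had to resort to postediting. The postedited translation took slightly longer to produce and was more expensive than conventional human translation. The “mechanical translation” facility of the USAF Foreign Technology Division (FTD) postedits the machine output when it produces translations. Dr. Gilbert King of Itek Corporation told the Committee that Itek planned to establish a “machine translation” service, but that it would provide postedited translations. Dr. J. C. R. Licklider of IBM and Dr. Paul Garvin of Bunker-Ramo said that they would not recommend such a service to their companies.

Unedited machine output from scientific text can be deciphered for the most part, but it is sometimes misleading and sometimes wrong (as is postedited output, to a lesser extent), and it makes for slow and painful reading.† (See Appendix 10.)

A recent study by the American Institutes for Research had as its main objective a comparison of the accuracy and speed with which the same Russian documents could be read when they had been translated into English by the FTD machine translation (MT) system (one set postedited, the other just as it came from the computer) and when they had been translated into English by human translators in the conventional manner.

In physics, the tests showed that the reader of raw MT output was 10 percent less accurate, 21 percent slower, and had a comprehension level 29 percent lower than when he used human translation. When he used postedited output, he was 3 percent less accurate, 11 percent slower, and had a comprehension level 13 percent lower than when he used human translation.

In the earth sciences, when he used raw MT output, he was 16 percent less accurate, 21 percent slower, and had a comprehension level 25 percent lower. When he used postedited output, he was 5 percent less accurate, 11 percent slower, and had a comprehension level 23 percent lower than when he used human translation.

Subjectively, much of the trouble seems to lie in unnatural constructions and unnatural word order, although single words given one odd translation, or several alternative translations among which the reader is left to choose, are troublesome as well. (See Appendix 11 for a classification of common error types in machine translation.)

The following are typical of recent (since November 1964) output from four different machine translation systems. Each sample gives the first paragraph, a paragraph from the middle, and the last paragraph (except for translation No. 4) of a Russian article on space biology.

*Machine-readable text is simply text that can be used as input to a computer. It includes punched cards, punched paper tape, and magnetic tape, and is generally prepared from printed text by keyboard operation.

†Excellent machine output from simple or selected text has been achieved in a number of experiments; this is of no practical and of limited theoretical significance.

Bunker-Ramo Corporation No. 1
Biological experiments, conducted on various/different cosmic aircraft, astrophysical researches of the cosmic space and flights of Soviet and American astronauts with the sufficient/rather persuasiveness showed/ indicated/pointed, that momentary/transitory/short orbital flights of lower/below than radiation belts/regions/flanges of earth/land/soil in the absence of the raised/increased/ hightened sun/sunny/solar activity with respect to radiation are/appear/arrive/ report safe/not dangerous/secure. Received/obtained by astronauts of the dosage of the radiation at the expense of the primary cosmic emission/radiation and emissions/radiations of the external/outer radiation belt/region/flange are so/ such a small, that can not render/show/give the harmful influence/action/effect on/in/at/to the organism of man.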
Mammals (dog, mouse/mice, rat, guinea pigs), fly/flies of the drosophilae, vegetable/vegetational objects/items/objectives. Seeds of higher/superior/ supreme plants/vegetables (wheat, peas, onion/bow, the pine tree, beans, radish, carrot etc), microspore of the tradescantia/spiderwort, the culture of the alga/seeweed chlorella on/in/at/to tissue, cellular, sub-cellular, and molecular levels (Gyurdzhian, 1962A. Antipov et al., 1962) were used in these experiments. In experiments on/in/at/to mammals the special/particular/ peculiar attention/consideration/ was given to the research/analysis/ investigation of the state/condition/position of the system of the blood/ hemogenesis formation, the determination/definition/ decision of intermediate products of the exehange of nucleic acids (desoxycytidine and di)epolo$itel* substances), the study/investigation of the state/ condition/position of the natural immunity, the determination/definition/ decision of the maintenance/ content of serotonin in the blood. Moreover, the control for/during/per/beyond the condition/state pigmentation of hair for/ at/by/from black mice (the line/ strain CSUB57 BL) was conducted. Physiological shifts/improvements were studied also/as well on/in/at/to seeds of higher/superior/supreme plants, vegetables microorganisms, cells of various different tissues/cloth in the culture etc. Thus, the consideration/investigation certain/some from/of principal/ basic radiobiological problems shows/indicates/points/displays, that in the given region/area still/yet/more/back/some more very many/very much unsolved questions. This is clear/plain, since cosmic radiobiology is very the young section/division of young science--the cosmic biology. 
However there is/there are/is/eat basis to hope, that by common/general/total efforts of scientific various/different professions of different/various countries of the world/peace radiobiological researches in the cosmic space will be sucessfully continued/ carried on and were expanded/broadened. Computer Concepts, Inc. No. 2 The biological experiments that were carried out on different cosmic flying apparatus, ASTROFIZICESKIE the research of cosmic PROSTRANSTVA and the flights of Soviet and American KOSMONAVTOV with sufficient UBEDITEL6NOST6H showed, that the short-time orbital flights below of the radiational belts of earth in the absence that was raised by the SOLNECNO1 one of activity in a radiational attitude are BEZOPASNYMI. Dose of radiati on at the expense of primary cosmic radiation and the radiation of an exterior radiational belt the obtained by KOSMONAVTAMI are so little, that aren't able to render a harmful influence to the organism of a man. Mammals (dogs, meeth, rats, sea SVINKI) were utilized in these experiments. The flies of drosophila, vegetable objects, semena of higher plants (wheat, GOROX, LUK, a pine tree, BOBY, REDIS, a carrot and others), MIKROSPORY of TRADESKANQII the culture of an alga chlorella in different nourishing mediums, the numerous biological and QITOLOGICESKIE ones objects on the TKANEVOM, cellular, subcellular and molecular levels (Ghrdjian, 1962 and Antipov from Soavt 1962) and in experiences to mammals particular attention was being allotted to the research of the condition of the system of KROVOTVORENI4, to the definition of the intermediate products of the exchange of nucleic acids DEZOKSIQITIDINA and DIWEPOLOJITEL6NYX substances), to the study of the condition of natural IMMUNITETA, to the definition of the content of SEROTONINA in KROVI. Besides, control after the condition of PIGMENTAQII of VOLOS at CERNYX meeth (the line of C(57) of Y) was being carried out. 
Physiological SDVIGI were being studied also on SEMENAX of higher plants, microorganisms, the cells of different tissues in culture and T. of D. Thus, the examination of some from fundamental RADIOBIOLOGICESKIX problems shows, that in this a field still very much NEREWENNYX questions. This is clear, since cosmic RADIOBIOLOGI4 is very young RAZDELOM young science efforts of the scientific different specialties of the different countries of the world successful PRODOLJENY will be expanded there are. FTD, USAF No. 3 Biological experiments, conducted on different space aircraft/vehicles, astrophysical space research and flights of Soviet and American astronauts with/ from sufficient convincingness showed that short-term orbital flights lower than radiation belts of earth in the absence of heightened solar activity in radiation ratio are safe. Obtained by astronauts of dose of radiation at the expense of primary cosmic radiation and radiation of external radiation belt are so small that cannot render harmful influence on organism of person. In these ESKPERIMENTAKH were used mamals (dog, mice, rat, guinea pig), fly of Drosophilae, vegetable objects, seeds of highest plants (wheat, pea, onion/bow, pine, beans, radish, carrot and others), microspore of tradescantia, culture of alga chlorella on different nutrient media, numerous biological and TSITOLOGICHCHESKIE objects on tissue, cellular, sub-cellular and molecular levels (Gyurozhian 1962A, Anti-Pov with/from Soavt, 1962). In experiments on mammals special attention was allotted investigation of state of system of sanguification, determination of intermediate products of exchange of nucleic acids (deoxycytidine and Dischepositive substances), study of state of natural immunity, determination of contents gray-fineness in blood. Furthermore, was conducted counterol for/after state of pigmentation of hairs for black mice (line bl). 
Physiologic shifts were studied also on seeds of highest plants, microorganisms, cages of different fabrics in culture etc. Thus, consideration of certain from basic radiobiological problems shows that in given region still very many unsolved questions. This and intelligibly, since space radiobiology is very young division of young science--space biology. However is base to trust that jointly scientists of different specialties of various countries of world/peace radiobiological investigations in outer space will be successfully continued and expanded. EURATOM, Ispra, Italy No. 4 Biological experiments, which were conducted on different cosmic LETATEL6NYX APPARATI, the astrophysical investigations of cosmic space and the flights of Soviet and also American KOSMONAVTOV with the sufficient convincingness showed, that the short-term orbital flights of below radiation belts of ground upon the absence of the increased solar activity in radiation relation are safe. Obtained by KOSMONAVTAMI of dose of radiation at the expense of initial cosmic radiation and the radiations of external radiation belt are so small, that cannot have harmful action on the organism of man. In these experiments there were used mammals (dogs, mice, KRYSY, the maritime piglets), MUXI DROZOFILY, vegetable objects. The seeds of higher plants (wheat, the pea, LUK, pine, beans, REDIS, MORKOV6 etc.) MIKROSPORY TRADESKANQII, the culture of alga of chlorella on the different feed environments, numerous biological and QITOLOGICESKIE objects on TKANEVOM, cellular, SUBKLETOCNOM and molecular levels (Ghrdjian, 1962 and Antipov with Soavt 1962). In experiments on mammals special attention was devoted to the investigation of state of system of KROVOTVORENI4, the determination of intermediate products the exchange of nucleinic acids (DEZOKSIQITIDINA and DIWEPOLOJITEL6NYX substances), the study of the state of natural IMMUNITETA The determination of content of SEROTONINA in blood. 
Besides this, there was conducted the check for the state or PIGMENTAQII the hair at black mice (the line C(57) Y) the Physiological) shifts were studied also on the seeds of higher plants, microorganisms, the cells of the different tissues in culture and T D.

读者不妨把上面这些样本与10年前用简单或经过挑选的文本得到的结果(乔治敦-IBM实验,1954年1月7日)作一比较,会很有启发:早期的样本比后来的更易读。

The quality of crude oil is determined by calory content. The quality of saltpeter is determined by chemical methods. TNT is produced from coal. They obtain dynamite from nitroglycerine. Ammonite is obtained from saltpeter. Gasoline is prepared by chemical methods from crude oil. They prepare ammonite. Gasoline is produced by chemical methods from crude oil. The price of crude oil is determined by the market. Calory content determines the quality of crude oil. TNT is prepared from coal.

电子数字计算机的发展很快让人想到机器翻译也许是可能的。这个想法抓住了学者和管理者的想象力。实际目标很简单:从机器可读的外文技术文本得到有用的英文文本,准确、可读,最终与美国科学家撰写的文字没有区别。早期对简单或经过挑选的文本所做的机器翻译(如上面给出的那些)令人鼓舞,却具有欺骗性;而一般科学文本的“机器翻译”则一律令人沮丧。但是,机器翻译方面的工作已经产生了许多宝贵的语言学知识和洞见,否则我们是得不到这些的。

当然,没有人能够保证我们不会突然或至少很快地实现机器翻译,但我们觉得这是很不可能的。麻省理工学院电子研究实验室的Victor H. Yngve应委员会主席John R. Pierce的请求,表达了如下意见:

我同意你对机器翻译的看法:目前不经后期编辑的机器翻译没有用处,而加上后期编辑,整个过程缓慢,并且可能不合算。

至于全自动翻译的可能性,我相信我们总有一天会达到它既可行又经济的地步。然而,这需要大量我们目前根本不具备的基础知识,多久才能获得这些知识谁也说不准。不过,我将致力于获取其中的一些知识。全自动翻译是否会经济的问题,必须等到我们看清它究竟是否可能之后才能回答。我觉得,如果它是可能的,那么由于计算机技术的快速发展,它将来会是经济的。

Yngve博士在他的论文“机械翻译研究的启示”中指出:

机械翻译工作遇到了语义障碍……我们已经直面这样的认识:只有当机器能够“理解”它所翻译的内容时,我们才会有合格的机械翻译,而这确实将是一项非常困难的任务……“理解”正是我的意思……我们中的一些人正无所畏惧地奋力前行。

委员会确实认为,以科学的名义无所畏惧地奋力前行是明智的,但这样做的动机不能合理地寄托于实用翻译方面任何可预见的改进。如果对机器翻译存在某种迫切需要,我们的态度也许会不同,但我们没有发现这种需要。

~~~~~~~~~~~~~~~~~~~~

The Present State of Machine Translation

“Machine Translation” presumably means going by algorithm from machine-readable source text* to useful target text, without recourse to human translation or editing. In this context, there has been no machine translation of general scientific text, and none is in immediate prospect.
The contention that there has been no machine translation of general scientific text is supported by the fact that when, after 8 years of work, the Georgetown University MT project tried to produce useful output in 1962, they had to resort to postediting. The postedited translation took slightly longer to do and was more expensive than conventional human translation. The “mechanical translation” facility of the USAF Foreign Technology Division (FTD) postedits the machine output when it produces translations. Dr. Gilbert King of Itek Corporation told the Committee that Itek plans to establish a “machine translation” service, but that it will provide postedited translations. Dr. J.C.R. Licklider of IBM and Dr. Paul Garvin of Bunker-Ramo said they would not advise their companies to establish such a service. Unedited machine output from scientific text is decipherable for the most part, but it is sometimes misleading and sometimes wrong (as is postedited output to a lesser extent), and it makes slow and painful reading.† (See Appendix 10.) A recent study by the American Institutes for Research had as its principal objective comparison of the accuracy and speed with which the same Russian documents can be read when they have been translated into English by the FTD machine translation (MT) system (one set postedited, the other set just as it came out of the computer) and when they had been translated into English by a human translator in the conventional manner. In physics, tests showed that the reader of raw MT output was 10 percent less accurate, 21 percent slower, and had a comprehension level 29 percent lower than when he used human translation. When he used postedited output, he was 3 percent less accurate, 11 percent slower, and had a comprehension level 13 percent lower than when he used human translation. 
In the earth sciences, when he used raw MT output, he was 16 percent less accurate, 21 percent slower, and had a 25 percent lower comprehension level than when he used human translations. When he used postedited output, he was 5 percent less accurate, 11 percent slower, and had a comprehension level 23 percent lower than when he read human translations. Subjectively, a lot of the trouble seems to lie in unnatural constructions and unnatural word order, though strange translations of individual words or multiple translations of one word, with the choice left to the reader, are bothersome. (For a classification of the types of errors common in machine translation see Appendix 11.) The paragraphs below are typical of the recent (since November 1964) output of four different MT systems. Each sample gives the first and last (except for translation No. 4) paragraphs and a paragraph from the middle of a Russian article on space biology. *Machine-readable text is simply text that can be used as an input to a computer. It includes punched cards, punched paper tape, and magnetic tape, and is ordinarily prepared from printed text by a keyboard operator. †Excellent machine output of simple or selected text has been attained in several experiments; this is of no practical and limited theoretical significance. Bunker-Ramo Corporation No. 1 Biological experiments, conducted on various/different cosmic aircraft, astrophysical researches of the cosmic space and flights of Soviet and American astronauts with the sufficient/rather persuasiveness showed/ indicated/pointed, that momentary/transitory/short orbital flights of lower/below than radiation belts/regions/flanges of earth/land/soil in the absence of the raised/increased/ hightened sun/sunny/solar activity with respect to radiation are/appear/arrive/ report safe/not dangerous/secure. 
Received/obtained by astronauts of the dosage of the radiation at the expense of the primary cosmic emission/radiation and emissions/radiations of the external/outer radiation belt/region/flange are so/ such a small, that can not render/show/give the harmful influence/action/effect on/in/at/to the organism of man. Mammals (dog, mouse/mice, rat, guinea pigs), fly/flies of the drosophilae, vegetable/vegetational objects/items/objectives. Seeds of higher/superior/ supreme plants/vegetables (wheat, peas, onion/bow, the pine tree, beans, radish, carrot etc), microspore of the tradescantia/spiderwort, the culture of the alga/seeweed chlorella on/in/at/to tissue, cellular, sub-cellular, and molecular levels (Gyurdzhian, 1962A. Antipov et al., 1962) were used in these experiments. In experiments on/in/at/to mammals the special/particular/ peculiar attention/consideration/ was given to the research/analysis/ investigation of the state/condition/position of the system of the blood/ hemogenesis formation, the determination/definition/ decision of intermediate products of the exehange of nucleic acids (desoxycytidine and di)epolo$itel* substances), the study/investigation of the state/ condition/position of the natural immunity, the determination/definition/ decision of the maintenance/ content of serotonin in the blood. Moreover, the control for/during/per/beyond the condition/state pigmentation of hair for/ at/by/from black mice (the line/ strain CSUB57 BL) was conducted. Physiological shifts/improvements were studied also/as well on/in/at/to seeds of higher/superior/supreme plants, vegetables microorganisms, cells of various different tissues/cloth in the culture etc. Thus, the consideration/investigation certain/some from/of principal/ basic radiobiological problems shows/indicates/points/displays, that in the given region/area still/yet/more/back/some more very many/very much unsolved questions. 
This is clear/plain, since cosmic radiobiology is very the young section/division of young science--the cosmic biology. However there is/there are/is/eat basis to hope, that by common/general/total efforts of scientific various/different professions of different/various countries of the world/peace radiobiological researches in the cosmic space will be sucessfully continued/ carried on and were expanded/broadened. Computer Concepts, Inc. No. 2 The biological experiments that were carried out on different cosmic flying apparatus, ASTROFIZICESKIE the research of cosmic PROSTRANSTVA and the flights of Soviet and American KOSMONAVTOV with sufficient UBEDITEL6NOST6H showed, that the short-time orbital flights below of the radiational belts of earth in the absence that was raised by the SOLNECNO1 one of activity in a radiational attitude are BEZOPASNYMI. Dose of radiati on at the expense of primary cosmic radiation and the radiation of an exterior radiational belt the obtained by KOSMONAVTAMI are so little, that aren't able to render a harmful influence to the organism of a man. Mammals (dogs, meeth, rats, sea SVINKI) were utilized in these experiments. The flies of drosophila, vegetable objects, semena of higher plants (wheat, GOROX, LUK, a pine tree, BOBY, REDIS, a carrot and others), MIKROSPORY of TRADESKANQII the culture of an alga chlorella in different nourishing mediums, the numerous biological and QITOLOGICESKIE ones objects on the TKANEVOM, cellular, subcellular and molecular levels (Ghrdjian, 1962 and Antipov from Soavt 1962) and in experiences to mammals particular attention was being allotted to the research of the condition of the system of KROVOTVORENI4, to the definition of the intermediate products of the exchange of nucleic acids DEZOKSIQITIDINA and DIWEPOLOJITEL6NYX substances), to the study of the condition of natural IMMUNITETA, to the definition of the content of SEROTONINA in KROVI. 
Besides, control after the condition of PIGMENTAQII of VOLOS at CERNYX meeth (the line of C(57) of Y) was being carried out. Physiological SDVIGI were being studied also on SEMENAX of higher plants, microorganisms, the cells of different tissues in culture and T. of D. Thus, the examination of some from fundamental RADIOBIOLOGICESKIX problems shows, that in this a field still very much NEREWENNYX questions. This is clear, since cosmic RADIOBIOLOGI4 is very young RAZDELOM young science efforts of the scientific different specialties of the different countries of the world successful PRODOLJENY will be expanded there are. FTD, USAF No. 3 Biological experiments, conducted on different space aircraft/vehicles, astrophysical space research and flights of Soviet and American astronauts with/ from sufficient convincingness showed that short-term orbital flights lower than radiation belts of earth in the absence of heightened solar activity in radiation ratio are safe. Obtained by astronauts of dose of radiation at the expense of primary cosmic radiation and radiation of external radiation belt are so small that cannot render harmful influence on organism of person. In these ESKPERIMENTAKH were used mamals (dog, mice, rat, guinea pig), fly of Drosophilae, vegetable objects, seeds of highest plants (wheat, pea, onion/bow, pine, beans, radish, carrot and others), microspore of tradescantia, culture of alga chlorella on different nutrient media, numerous biological and TSITOLOGICHCHESKIE objects on tissue, cellular, sub-cellular and molecular levels (Gyurozhian 1962A, Anti-Pov with/from Soavt, 1962). In experiments on mammals special attention was allotted investigation of state of system of sanguification, determination of intermediate products of exchange of nucleic acids (deoxycytidine and Dischepositive substances), study of state of natural immunity, determination of contents gray-fineness in blood. 
Furthermore, was conducted counterol for/after state of pigmentation of hairs for black mice (line bl). Physiologic shifts were studied also on seeds of highest plants, microorganisms, cages of different fabrics in culture etc. Thus, consideration of certain from basic radiobiological problems shows that in given region still very many unsolved questions. This and intelligibly, since space radiobiology is very young division of young science--space biology. However is base to trust that jointly scientists of different specialties of various countries of world/peace radiobiological investigations in outer space will be successfully continued and expanded. EURATOM, Ispra, Italy No. 4 Biological experiments, which were conducted on different cosmic LETATEL6NYX APPARATI, the astrophysical investigations of cosmic space and the flights of Soviet and also American KOSMONAVTOV with the sufficient convincingness showed, that the short-term orbital flights of below radiation belts of ground upon the absence of the increased solar activity in radiation relation are safe. Obtained by KOSMONAVTAMI of dose of radiation at the expense of initial cosmic radiation and the radiations of external radiation belt are so small, that cannot have harmful action on the organism of man. In these experiments there were used mammals (dogs, mice, KRYSY, the maritime piglets), MUXI DROZOFILY, vegetable objects. The seeds of higher plants (wheat, the pea, LUK, pine, beans, REDIS, MORKOV6 etc.) MIKROSPORY TRADESKANQII, the culture of alga of chlorella on the different feed environments, numerous biological and QITOLOGICESKIE objects on TKANEVOM, cellular, SUBKLETOCNOM and molecular levels (Ghrdjian, 1962 and Antipov with Soavt 1962). 
In experiments on mammals special attention was devoted to the investigation of state of system of KROVOTVORENI4, the determination of intermediate products the exchange of nucleinic acids (DEZOKSIQITIDINA and DIWEPOLOJITEL6NYX substances), the study of the state of natural IMMUNITETA The determination of content of SEROTONINA in blood. Besides this, there was conducted the check for the state or PIGMENTAQII the hair at black mice (the line C(57) Y) the Physiological) shifts were studied also on the seeds of higher plants, microorganisms, the cells of the different tissues in culture and T D. The reader will find it instructive to compare the samples above with the results obtained on simple, or selected, text 10 years earlier (the Georgetown IBM Experiment, January 7, 1954) in that the earlier samples are more readable than the later ones. The quality of crude oil is determined by calory content. The quality of saltpeter is determined by chemical methods. TNT is produced from coal. They obtain dynamite from nitroglycerine. Ammonite is obtained from saltpeter. Gasoline is prepared by chemical methods from crude oil. They prepare ammonite. Gasoline is produced by chemical methods from crude oil. The price of crude oil is determined by the market. Calory content determines the quality of crude oil. TNT is prepared from coal. The development of the electronic digital computer quickly suggested that machine translation might be possible. The idea captured the imagination of scholars and administrators. The practical goal was simple: to go from machine-readable foreign technical text to useful English text, accurate, readable, and ultimately indistinguishable from text written by an American scientist. Early machine translations of simple or selected text, such as those given above, were as deceptively encouraging as “machine translations” of general scientific text have been uniformly discouraging. 
However, work toward machine translation has produced much valuable linguistic knowledge and insight that we would not otherwise have attained. No one can guarantee, of course, that we will not suddenly or at least quickly attain machine translation, but we feel that this is very unlikely. Victor H. Yngve of the MIT Research Laboratory of Electronics, in answer to a request from Committee Chairman John R. Pierce, expressed his views as follows: I concur with your view of machine translation, that at present it serves no useful purpose without postediting, and that with postediting the over-all process is slow and probably uneconomical. As to the possibility of fully automatic translation, I am convinced that we will some day reach the point where this will be feasible and economical. However, there is considerable basic knowledge required that we simply don't have at the moment, and it is anybody's guess how soon this knowledge can be obtained. However, I am dedicated to trying to obtain some of this knowledge. The question as to whether fully automatic translation will ever be economical must wait until we see whether it is possible at all. I feel that if it is possible, then it will be economical in the future because of the rapid advances in computer technology. In his paper, “Implications of Mechanical Translation Research” , Dr. Yngve notes: Work in mechanical translation has come up against a semantic barrier. . . We have come face to face with the realization that we will only have adequate mechanical translation when the machine can “understand” what it is translating and this will be a very difficult task indeed . . . “understand” is just what I mean . . . some of us are pressing forward undaunted. The Committee indeed believes that it is wise to press forward undaunted, in the name of science, but that the motive for doing so cannot sensibly be any foreseeable improvement in practical translation. 
Perhaps our attitude might be different if there were some pressing need for machine translation, but we find none.
人工翻译

要了解翻译的本质和困难,或者了解翻译的现有资源和问题,就必须对人工翻译和译员有所了解。因此,委员会在研究初期听取了多位翻译专家的意见。这些专家似乎一致认为,译员的三个必要条件按重要性排序是:(1)良好的目标语言知识,(2)对主题内容的理解,(3)足够的源语言知识。

因此,虽然一些母语不是英语的译者也能把外文译成不错的英语,但一般来说译者的母语最好是英语。此外,虽然一些对科学知识有一般了解的译者也能做出好的翻译,但最好的技术翻译一般出自所涉技术领域的专家之手。同样清楚的是,当译者是该题材的专家时,有限的源语言能力就够用了。

向委员会作报告的几位人士强调,译员需要好的词典和参考书。当一部长篇著作被拆分给多人翻译时,这种需要尤为突出,因为要使技术术语的译法保持一致,合适的词典或术语表必不可少。

译员使用各种辅助工具,包括听写机和打字机,但他们并不总能产出适合复制的最终稿。带有插图和公式的最终稿通常由中心服务部门制作。尽管联合出版物研究服务处(JPRS)或类似机构提供了大量服务,翻译费用的大部分通常还是付给译员。

委员会注意到的一项实验表明,快速口述的翻译几乎与“全译”一样好,而仅需约四分之一的时间(见附录1)。

~~~~~~~~~~~~~

Human Translation

In order to have an appreciation either of the underlying nature and difficulties of translation or of the present resources and problems of translation, it is necessary to know something about human translation and human translators. Thus, early in the course of its study the Committee heard from a number of experts in translation. These experts seem to agree that the three requisites in a translator, in order of importance, are (1) good knowledge of the target language, (2) comprehension of the subject matter, and (3) adequate knowledge of the source language.

Therefore, while good translations into English are made by some translators whose native tongue is not English, in general, translators whose native tongue is English are preferable. Furthermore, while good translations are made by some translators who have a general appreciation of scientific knowledge, the best technical translations are generally made by experts in the technical field covered. It also seems clear that a restricted competence in the source language is adequate when the translator is expert in the subject matter.

It was emphasized by several persons who made presentations to the Committee that translators need good dictionaries and reference books. This need is especially important when a long work is split up for translation, for in such cases adequate dictionaries or glossaries are essential if technical terms are to be translated consistently.
Translators use a variety of aids, including dictating machines and typewriters, but they do not always produce a final copy suitable for reproduction. The final copy, with figures and equations inserted, is usually produced by the central service. Despite the substantial services performed by the Joint Publications Research Service (JPRS) or by similar agencies, the greater part of the cost of translation usually goes to the translator.

One experiment that has come to the attention of the Committee indicates that a rapidly dictated translation is almost as good as a “full translation” and takes only about one fourth the time (see Appendix 1).

××××××××××××××××××××××××

译者从业类型

译者就业的两种主要类型是编内翻译和合同翻译。每种类型对译者本人以及对需要翻译的个人或机构都各有其优点和缺点。

编内翻译

对编内译者而言,好处是全职受雇,并享有机构内其他全职员工所享有的一切福利(如休假和退休)。此外,他可以利用的参考资料条件也比自由译者更好。

对雇主而言,编内翻译的优势主要有以下几点:
1. 在需要的时候,译者可以随时提供即席翻译或口头翻译。
2. 译者和需求方之间更有可能进行互利合作。
3. 译者可以在需要的时候提供快捷服务。
4. 保密信息的安全易于维护。

对雇主而言,编内翻译的缺点是:
1. 算上管理开销和附加福利,这种安排一般比使用自由译者更昂贵。
2. 调度上不时会出现问题,译者的任务有时过多,有时不足。
3. 编内译者不可能是所有领域的专家,因此很难在机构内始终得到高质量的技术翻译。

合同翻译

对译者而言,以自由职业者身份签订合同的优点是:
1. 如果他能处理某些较少见、因而报酬较高的语言中范围较广的题材,他的收入可能大大超过做编内译者的收入。
2. 他在决定何时工作、做多少工作方面有大得多的自由。

对翻译的买方,合同安排的优点是:
1. 他可以在许多题材领域得到技术上合格的翻译。
2. 他从不需要为非翻译时间付费。
3. 他的管理开销低得多。

对买方,合同安排的缺点是:
1. 译者不在机构内,无法即时咨询。
2. 机密文件的安全更难维持。

~~~~~~~~~~~~

Types of Translator Employment

The two main types of translator employment are in-house and contract. Each type has particular advantages and disadvantages for the translator and for the individual or organization requiring the translation.

IN-HOUSE

The advantages to the in-house translator are that he is employed full time and enjoys all the benefits (leave and retirement, for example) that are offered to other full-time employees in the organization. In addition, he has available to him better reference facilities than his free-lance counterparts.
The advantages to the employer of an in-house translator are chiefly the following: 1. The translator can give spot or oral translations when needed. 2. There is greater possibility for mutually beneficial collaboration between the translator and the requester. 3. The translator can provide fast service when needed. 4. The security of classified information is easily maintained. The disadvantages to the employer of the in-house translator are: 1. The arrangement (counting overhead and fringe benefits) is generally more expensive than using free-lance translators. 2. Problems in scheduling may arise from time to time, with the translator having either too much or too little to do. 3. Since it is impossible for the in-house translator to be an expert in all fields, it is difficult to get consistently good technical translations done in-house. CONTRACT The advantages of a free-lance contract arrangement for the translator are: 1. If he can handle a relatively wide range of subject matter in some of the more uncommon and therefore higher-paying languages, he may earn considerably more than he would as an in-house translator. 2. He has considerably more freedom in deciding when and how much he will work. The advantages of the contract arrangement to the buyer of translations are: 1. He can obtain technically competent translations in many fields of subject matter. 2. He never pays for time not spent in translating. 3. He has a much lower overhead. The disadvantages of the contract arrangement to the buyer are: 1. The translator is not on the premises for immediate consultation. 2. Security of classified documents is more difficult to maintain. 
××××××××××××××××××××××××

英语作为科学发表的语言

如果只着眼于世界各地出版的科学文献数量的迅速增长,很容易高估对翻译的需要。美国处于特别幸运的位置,因为英语是科学的主要语言。对《物理文摘》(Physics Abstracts)收录的3000篇文摘和 Referativny Zhurnal 收录的350篇物理文摘所做的一项调查给出以下结果:

被摘论文的语言    物理文摘    Referativny Zhurnal
英语              76%         63%
俄语              14%         24%
法语              4%          3%
德语              4%          2%
其他              2%          8%

虽然英语论文与非英语论文之比因学科领域而异,但一般说来,以英语为母语的科学家比任何其他母语的科学家都更少需要阅读外语文献或请人翻译。

~~~~~~~~~~~~~

English as the Language of Science

It is easy to overestimate the need for translation if one simply looks at the rapidly increasing volume of scientific literature being published throughout the world. The United States is in a particularly fortunate position because English is the predominant language of science. A survey of 3,000 abstracts listed in Physics Abstracts and 350 physics abstracts listed in Referativny Zhurnal gave the following results:

Language of Paper Abstracted    Physics Abstracts    Referativny Zhurnal
English                         76 percent           63 percent
Russian                         14 percent           24 percent
French                          4 percent            3 percent
German                          4 percent            2 percent
Other                           2 percent            8 percent

Although the ratio of English-language articles to non-English articles varies with the subject field, it is generally true that the English-speaking scientist has less need to read in a foreign language or to have translations made than does a scientist of any other native tongue.

×××××××××××××××××××××××××××××

科学家学习俄语所需的时间

委员会认为,在某些情况下,让大量使用俄语译文的人学会直接阅读俄语原文,可能更简单也更经济。J. G. Tolpin 在题为“俄语技术出版物调查:简要教程”的文章中指出,科学家上8至16次课(每次2小时)就可以学会在俄语出版物中识别感兴趣的文章。有时候,他们可以从方程式、表格、图形和图示中提取所需要的资料。在其他许多情况下,只需把感兴趣的材料口头翻译一部分就够了。这些都印证了一个公认的事实:具备技术能力的读者只需要一点点外语知识,就可以利用本专业的外文期刊。* 事实上,多项知名研究†表明,科学家用200个小时或更少的时间,就能获得足以阅读本领域材料的俄语阅读能力。具备这种知识的美国科学家和工程师所占比例越来越大。

教授政府人员阅读俄语科学文本的能力已经具备,但到目前为止这项服务基本上无人使用。国防语言学院西海岸分部(原陆军语言学校)已为此开发了两门课程和专用教材。一门课程为期6周,另一门为期10周。委员会获悉,国防语言学院欢迎学员报名。10周课程的信息见附录2。

* 由此可以得出一个应予更多强调的推论:对于不能充分理解专业内容、也不能把它放在国内外其他工作的背景下看待的人,即使是最好的翻译也没有用处。

†R. D. Burke,《培养合格的科技俄语译员的一些独特问题》,P-1698,兰德公司(1959年5月12日)。 W.
N.洛克,【化学教育期刊】27,426(1950)。 M·菲利普斯,科技中的外语障碍,Aslib,伦敦,英国(1962年),15页。 ~~~~~~~~~~~~~~ Time Required for Scientists to Learn Russian The Committee believes that in some cases it might be simpler and more economical for heavy users of Russian translations to learn to read the documents in the original language. An article by J. G. Tolpin, titled, “Surveying Russian Technical Publications: A Brief Course” , indicates that in eight to sixteen 2-hr class periods scientists can learn to identify articles of interest in Russian publications. Sometimes they can extract what they need from equations, tables, graphs, and figures. In many other cases, a partial oral translation of the material of interest is all that is needed. These are illustrations of the generally acknowledged fact that the technically competent reader needs only a little knowledge of a foreign language in order to make use of foreign journals in his field.* Indeed, several well-known studies † indicate that in 200 hr or less a scientist can acquire an adequate reading knowledge of Russian for material in his field. An increasing fraction of American scientists and engineers have such a knowledge. The capability for teaching government personnel to read Russian scientific text already exists, but so far this service has remained largely unused. The Defense Language Institute, West Coast Branch (formerly the Army Language School), has developed two courses of instruction and special texts for this purpose. One course runs 6 weeks, the other 10. The Committee has been informed that the Defense Language Institute would welcome the enrollment of students. Information concerning the 10-week course is presented in Appendix 2. *A corollary that should be given more emphasis is that even the best translation is of no use to a man who cannot fully understand the subject matter and place it in the context of other work here and abroad. †R. D. 
Burke, Some Unique Problems in the Development of Qualified Translators of Scientific Russian, P-1698, The RAND Corp. (May 12, 1959). W. N. Locke, J. Chem. Educ. 27, 426 (1950). M. Phillips, The Foreign Language Barrier in Science and Technology, Aslib, London, England (1962), p. 15. ×××××××××××××××××××××××××××× 美国政府机关里的翻译 应该强调的是没有一个统一的政府官方翻译系统。事实上,不同的政府机构采用各种不同的方法来填补他们的翻译需求。使用的方法包括合同翻译,编内翻译,联合出版物研究服务社的服务(附录3),以及这些方法的组合。 一些机构使用PL480的配套资金,以增强其在国内获得的翻译(附录4)。其他机构,主要是美国空军,利用 赖特 - 帕特森空军基地 外国技术部 后编辑过的机器输出(附录5)。 此外,美国国家科学基金会,虽然不是主要的翻译生产者,支持着30种期刊的全文翻译(附录6,表1)。 ~~~~~~~~~~~~~~~~~~ Translation in the United States Government It should be emphasized that there is no single official government translation system. Indeed there is considerable variety in the methods used by the various government agencies for filling their translation needs. The methods used include contract only, in-house translation, the services of the Joint Publications Research Service (Appendix 3), and a combination of these methods. Certain agencies are using PL 480 counterpart funds to augment their domestically obtained translations (Appendix 4). Others, principally the U.S. Air Force, utilize the postedited machine output of the Foreign Technology Division, Wright-Patterson Air Force Base (Appendix 5). In addition, the National Science Foundation, while not a primary producer of translations, is supporting the cover-to-cover translation of 30 journals (Appendix 6, Table 1). 
××××××××××××××××××××××××××××× 政府译员的数量 政府内部翻译的确切数目是无法确定的,虽然它的数量本来可以从公务员分类“译员”中简单确定。 有时为了改善经济状况,翻译必须首先争取确保一个更负盛名的职业称号。这样的方式为晋升打开大门,尽管其翻译职责可能保持不变。 更复杂的是,其他职业类别的双语人士经常被要求为他们的同事或上司做粗糙或口头的翻译。这种情况当然 不是 美国政府机构特有的。 虽然实际上分类为“译员“的人的数量有不确定性, 我们从公务员服务委员会获得的1962年10月的数字如下: 翻译和办事员在美国雇用的翻译 262 翻译和办事员译者采用全球 453 (译员数量在各部门,在每个机构和CSC工资的表, 1964年, CSC资格标准,见附件7 )。 从由CSC提供的数据,我们已经得知 联邦翻译(店员翻译不包括在内) 平均年薪在美国约6850美元 。 当政府科学家( 9 000美元的年薪中位数比较,这个数字与1962年,美国科学统筹,科学和技术人员国家注册的报告, NSF 64-16 ,美国国家科学基金会,华盛顿特区, 1964年) ,很明显,有技术培训背景的双语人士将获得更多的优势,比作为在各自领域的技术翻译工作的科学家和技术人员。 尽管事实上,政府科学家的平均,平均工资为政府翻译不高,似乎是一个非常低的流动率政府翻译。事实上,供给超过需求。虽然没有现在手头上在美国就业服务网(华盛顿特区)单个请求一个全职翻译,渴望工作的人在其卷约500翻译(兼职或全职)。 (翻译和他们的语言的可用性,见附录8)。 ~~~~~~~~~~~~~~~~~ Number of Government Translators The exact number of government in-house translators is impossible to determine, although it is a simple matter to determine the number of persons in the Civil Service classification, “Translator.” It sometimes happens that the translator who decides to better his economic situation must first contrive to secure a more prestigious occupational title. Thus the way is open for advancement, even though the bulk of his duties might remain the same. The picture is further obscured by the fact that bilingual persons in other job categories are often called upon to produce rough or oral translations for their colleagues or superiors. This situation is not, of course, peculiar to agencies of the U.S. Government. Keeping in mind the indefiniteness of the number of persons actually classified under “Translator,” we give the figures obtained from the Civil Service Commission for October 1962: Translators and clerk-translators employed in the United States 262 Translators and clerk-translators employed worldwide 453 (For the number of translators in each division and grade, in each agency, and for the CSC salary schedule for 1964, and CSC qualification standards, see Appendix 7.) 
From the data supplied by the CSC, we have figured the average yearly salary of the federal translator (clerk-translator not included) employed in the United States to be approximately $6,850. When one compares this figure with the median annual salary of government scientists ($9,000. American Science Manpower, 1962, A Report of the National Register of Scientific and Technical Personnel, NSF 64-16, National Science Foundation, Washington, D. C., 1964), it is apparent that technically trained bilingual persons would derive more advantages from working as scientists and technologists in their subject specialties than from serving as technical translators in their respective fields. Despite the fact that the average pay for government translators is not as high as the average for government scientists, there seems to be a very low rate of turnover among government translators. Indeed, the facts are that the supply exceeds the demand. Although there is not now on hand at the U.S. Employment Service (Washington, D. C.) a single request for a full-time translator, there are approximately 500 translators on its rolls who desire work (part time or full time). (For the availability of translators and their languages, see Appendix 8.) ××××××××××××××××××××××× 翻译花费金额 考虑到安全的翻译使用的各种方法,并不奇怪,联邦机构已支付了许多不同的翻译价格 - 由$ 9至每千字66元不等的价格。 (不是完全闻所未闻的,翻译买方支付翻译格外好工作比他实际做的更多的话)。 在第一次会议上,委员会决定,这将是非常有用的相当可靠估计的金额,政府花费的翻译。委员会所收集的构成虽然这些数字只是一个估计值 - 一个粗略的估计 - 我们觉得这是到这个时候政府的翻译支出的最佳估计数。 花费金额由政府机构所做的翻译: 百万美元 JPRS 财政年度1964 1.3 商业机构 财政年度1964 3.6 (估价H. 
R.专责委员会) PL 480 财政年度1965 1.5 NSF国内 财政年份1965 1.1 内务 财政年度1963 5.3 FTD MT 3月1 - 10月2 1964 0.27 总计 13.07 政府翻译的大部分事业在政府支持研究和开发中是一个非常小的活动领域,很显然,从以上数字看。 伯纳德·比尔曼,美国翻译协会在纽约的翻译机构的所有者和董事估计,在美国做商业翻译的机构每年约有7.5百万美元的商业价值。加入由政府花费1300万美元,这个数字的总和约2000万美元。对此应加非政府内部翻译花费的金额2百万美元。因此花在翻译上的钱的数额估计将提高到约2200万元。 ~~~~~~~~~~~~~~~~~~~ Amount Spent for Translation Considering the various methods used to secure translations, it is not surprising that federal agencies have paid many different prices for translation – prices ranging from $9 to $66 per 1,000 words. (It is not altogether unheard of for a translation purchaser to pay a translator who does exceptionally good work for more words than he actually translates.) At its first meeting, the Committee decided that it would be useful to have a fairly reliable estimate of the amount of money the government was spending for translation. Although the figures collected by the Committee constitute only an estimate – and a rough estimate, at that – we feel that it is the best estimate of the government's translation expenditures made up to this time.
Amounts spent by government agencies for translations done by:
                                                      $ Millions
JPRS                  Fiscal Year 1964                1.3
Commercial Agencies   Fiscal Year 1964                3.6 (Est. by H. R. Select Committee)
PL 480                Fiscal Year 1965                1.5
NSF Domestic          Fiscal Year 1965                1.1
In-House              Fiscal Year 1963                5.3
FTD MT                1 March - 2 October 1964        0.27
Total                                                 13.07
It is clear from the above figures that translation in the government is a very small field of activity when compared with most undertakings in which the government supports research and development. Bernard Bierman, a New York translation agency owner and a director of the American Translators Association, has estimated that the commercial translation agencies in the United States do about $7.5 million worth of business each year. When this figure is added to the $13 million spent by the government, the sum is about $20 million. 
To this should be added perhaps $2 million for the amount spent for nongovernment in-house translators. Thus the estimate of the amount of money spent on translation would be raised to approximately $22 million. ××××××××××××××××××××××××××××××××××××× 是否短缺翻译或译员? 在过去,有人表示,有翻译或译员短缺的需要尚未得到满足。对于其他语言翻译成英语,委员会认为,事实并非如此。这一结论是基于以下数据: 1。翻译供应大大超过需求。美国就业服务,提供的翻译工作价格低至6元1000字(或更低) ,与翻译交谈都确认了委员会的结论。 2。联合出版物研究服务的容量可以增加一倍的输出(办公室的工作人员只要一个非常小的增加) ,如果需要。 JPRS拥有4000名合同译员,平均一个月只有大约300人被利用。 JPRS选择一个重要的语言作为一个例子,中文翻译可以处理多达两个半倍于目前的需求,这没有困难。 3。美国国家科学基金会的公开支持计划将慎重考虑,通过适当的专业社会,任何外国的杂志,这样的社会提名的翻译支持。 30期刊被翻译盖覆盖在1964年财政年度(见附录6表1 ) 。一个翻译有一个流通的只有200份。这是接近的,以提供个性化的服务。在12年的美国国家科学基金会的支持,已经成为自收自支的19个翻译期刊(见附录6 ,表2 ) 。 委员会拒绝任何翻译短缺的说法,如果这种短缺是根据这样的事实,对PL 480翻译的需求超出其能力5倍以上。这种说法被拒绝的理由是,几乎任何免费商品的需求都是无法满足的。 四十五个(主要是政府)的信息设施,以响应政府研究专责委员会,第88届国会(众议院)发出一份问卷,表明其设施的工作已经有限,缺乏翻译。这45家工厂再次询问他们的设备是否已限制缺乏翻译语言自动处理咨询委员会,如果是这样,这是否是由于缺乏翻译者缺乏合格的译员缺乏授权的位置。委员会共收到25篇。有些人说,他们的设施有没有翻译功能。一位代表说,它已不仅限于译者缺乏,这种情况是由于缺乏授权的位置。六表示,他们并不仅限于缺乏翻译。九设施的回答显然是肯定的,他们已经翻译缺乏的限制,七表示,这是由于缺乏授权的位置。剩下的两个,只有一个,非政府研究中心,说是由于其缺乏缺乏合格的译员。其他人简单地回答说,他们没有足够的服务请求来证明永久职位。 调查结果证实了委员会的信念,不存在短缺的翻译,虽然有可能有一个短缺的翻译授权职位。那么,这是一个财政问题,机构和公务员制度委员会的问题,而不是一个支持机械翻译研究的研究和开发办公室的问题。 委员会得出结论,所有的苏联文献,任何明显的需求是被翻译 ,而且,虽然不容易评估需求或开放或封闭的情报材料覆盖,委员会认为这是决定性的,但遇到了一个单一的情报组织,要求更多的钱用于人类翻译。委员会听取了使用翻译分析师有限,也就是说,即使有更多的材料被翻译,分析师不会利用它。因此,具有讽刺意味的是,一些机构建议花更多的钱,做 “机器翻译”。委员会感到困惑的是,没有理由花费大量的金钱在一个小而已经经济不景气的行业,这个行业只有全职及部分时间劳动力总数不到5000 。 ~~~~~~~~~~~~~~~~~~~~~~ Is There a Shortage of Translators or Translation? In the past, it has been said that there is an unfulfilled need for translation or a shortage of translators. With respect to translators of other languages into English, the Committee finds that this is not so. This conclusion is based on the following data: 1. The supply of translators greatly exceeds the demand. The rolls of the U.S. 
Employment Service, the availability of translators to work at rates as low as $6 per 1,000 words (or lower), and conversations with translators confirm the Committee's conclusion. 2. The Joint Publications Research Service has the capacity to double its output immediately (with a very small increase in office staff) if called upon. The JPRS has 4,000 translators under contract, and in the average month it utilizes the services of only some 300 of them. To choose one important language as an example, the JPRS could with no difficulty handle up to two and a half times the present demand for Chinese translation. 3. The National Science Foundation's Publication Support Program will carefully consider, through a proper professional society, the support of the translation of any foreign journal that such a society nominates. Thirty journals were being translated cover to cover in Fiscal Year 1964 (see Appendix 6, Table 1). One translation has a circulation of only 200 copies. This comes close to providing individual service. In 12 years of NSF support, 19 translated journals have become self-supporting (see Appendix 6, Table 2). The Committee rejects any argument, based on the fact that the demand for the PL 480 translations is five times greater than the program can satisfy, that there is a shortage of translation. Such an argument is rejected on the grounds that the demand for almost any free commodity is insatiable. Forty-five (mostly government) information facilities, in response to a questionnaire issued by the Select Committee on Government Research (House of Representatives, 88th Congress), indicated that the work of their facilities had been limited by a lack of translators. 
These 45 facilities were again asked by the Automatic Language Processing Advisory Committee whether their facility had been limited by a lack of translators, and if so whether this lack was attributable to a lack of authorized positions for translators or to a lack of qualified translators. The Committee received 25 replies. Some said that their facilities had no translation function. One said that it had not been limited by a lack of translators and that this situation was attributable to a lack of authorized positions. Six indicated that they were not limited by a lack of translators. Of the nine facilities that answered clearly in the affirmative that they had been limited by a lack of translators, seven indicated that this was attributable to a lack of authorized positions. Of the two remaining, only one, the nongovernment research center, said its lack was attributable to a lack of qualified translators. The others simply replied by saying that they did not have sufficient requests for services to justify permanent positions. The results of the survey confirm the Committee's belief that there is no shortage of translators, although there may be a shortage of authorized positions for translators. This, then, is a fiscal problem for the agencies and the Civil Service Commission, and not a problem for research and development offices supporting research in mechanical translation. The Committee concludes that all the Soviet literature for which there is any obvious demand is being translated , and, although it is less easy to evaluate the needs or coverage of open or closed material for intelligence, the Committee regards it as decisive that it has not encountered a single intelligence organization that is demanding more money for human translation. The Committee has heard statements that the use of translation is analyst-limited; that is, even if more material were translated, analysts would not be available to utilize it. 
Thus, it is ironic that several agencies propose to spend more money for “machine translation.” The Committee is puzzled by a rationale for spending substantial sums of money on the mechanization of a small and already economically depressed industry with a full-time and part-time labor force of less than 5,000. ×××××××××××××××××××××××××××××××××××××××× 关于可能的超额翻译 虽然委员会没有关注任何缺乏的翻译,它确实有一个翻译可能超过有些担心。翻译的材料,其中有没有一定的前瞻性读者不仅造成浪费,但它堵塞翻译和信息流的渠道。应限于日常翻译期刊或书籍,放心合理的有偿流转和额外的翻译应仅针对具体要求。支持这一立场,我们引述国防部的一个研究组织,研究委员会收到的一封信: 我们已经发现,提供翻译服务,一般不包括我们的技术领域,我们需要为我们的研究的深度。因此,我们不断地把额外的期刊文章和诸如苏联专利翻译请求。我们的问题一直无法获得快速反应,这些特殊的要求,正是这个因素,阻碍我们的工作,而不是限制。如果我们有一个建议,如你做出的一项调查显示,这将是一个更好的平衡之间应建立常规翻译和翻译的特殊的用户请求。我们发现,许多文章被翻译在我们地区不值得的努力,它的出现让我们可以放弃一些日常翻译,以便使更多的特殊要求的快速反应提供翻译服务。 盖盖翻译中,除了许多有价值的信息,这是可能的,许多平庸的研究报告,美国科学家可能已发慈悲放过。 ,在1962年进行的一项有趣的研究,研究的价值包含在苏联医学/公共卫生服务翻译程序 。评估采用的方法是平行的社论裁判的苏联对口美国期刊的文章。翻译的文章的复印件被发送到对口首席美国期刊编辑分配给他们的裁判。初步结果如下。 谢切诺夫生理杂志苏联取自两个问题总数的36篇文章,31 %的人判断是可以接受的,发表在美国生理学杂志或应用生理学杂志。 共41篇文章,从生物物理两个问题(苏联) ,23 %的人判断是可以接受的生物物理学杂志“发表。此外,裁判员表示,另外八条应该是可以接受的,以适当的美国杂志。 取自肿瘤的问题,这两个问题的论文25篇,有76%被认为是可以接受的癌症。裁判表示,另外两篇文章已经接受的一次,但“现在不会被认为是新的,足以值得出版。 ” 进一步的证据是可以找到的翻译可能超过在美国化学家,由Herner及本公司( 1962年6月4日)的美国化学学会的一份报告中的极品苏联翻译: 另一方面,最大的论点,即受访者目前提供给他们的翻译是不是与他们的品质,但在其发行的时间滞后。盖盖翻译过程中,尤其是当涉及翻译是一个比较缓慢的一个。鉴于医学编辑发现,人们可能会怀疑是否平庸或伪劣论文的比例相对较高,不拖延一小部分的外观优势和显着的论文。 也许更揭露真相,除了明确表示不使用苏联翻译的原因是接收苏联的科学信息的首选媒体方面的调查问卷中的问题的答案。三种方法级别比所有其他人。这些国家是:俄罗斯的出版物,定期的英语评论苏联在特定领域的发展,个别文章和翻译的英语摘要。这三种方法当然不是相互排斥的,而是互补的。有趣的是,一些人宁愿盖盖翻译的形式得到他们的苏联信息的受访者数量只有一半的人更需要得到他们的翻译。 。 。 。唯一的东西,可能会做圆了苏联的覆盖面,目前在化学是,第一,确保取水或编辑认为是值得的,苏联的论文,给出了详细的抽象,因为他们很可能不现成的英语第二获得廉价的拷贝引用苏联的论文,可能通过化学文摘社提供手段;和第三选定可用的翻译要求建立一个机制,可能再次通过化学文摘服务。所有这三个领域的改善可能会要求由政府补贴。然而,这将可能意味着小得多的支出比将需要盖盖翻译支持的扩展程序。它也可能会产生更大的回报。 这是委员会的信念,总的技术文献,不值得翻译,它是徒劳的尝试猜测什么人可以在一段时间内要翻译。应该强调的是速度,质量和经济上的要求提供这样的翻译。 如联合出版研究服务,其中收费用户翻译服务翻译不使用少,有利于比如美空军系统司令部的外国技术部,这在一定区域内提供免费的翻译服务。 ~~~~~~~~~~~~~~~~~~~~~~~~ Regarding a Possible Excess of Translation While the Committee is not concerned with any lack of translation, it does have some 
concern about a possible excess of translation. Translation of material for which there is no definite prospective reader is not only wasteful, but it clogs the channels of translation and information flow. Routine translation should be confined to journals or books with reasonably assured paid circulation and additional translations should be made only in response to specific requests. In support of this position we quote from a letter received by the Committee from a research organization of the Department of Defense: We have found that the available translation services generally do not cover our technical areas to the depth that we require for our studies. As a result, we are continually putting in requests for translations of additional journal articles and such things as Soviet patents. Our problem has been the inability to obtain quick reaction to these special requests and it is this factor that has hampered rather than limited our work. If we had one recommendation to make to a survey such as yours, it would be that a better balance should be established between what is routinely translated and the special translation requests of users. We have found that many articles are being translated in our area that do not warrant the effort and it appears to us that some of the routine translations could be abandoned in order to make more translation services available for quick reaction to special requests. It is possible that the cover-to-cover translations contain, in addition to much valuable information, many uninspired research reports that the U.S. scientist could have been mercifully spared. An interesting study, conducted in 1962, investigated the value of the articles contained in the Soviet journals translated in the National Library of Medicine/Public Health Service translation program . The method of evaluation used was parallel editorial refereeing of the Soviet articles by counterpart American journals. 
Copies of the translated articles were sent to the editors in chief of counterpart American journals for distribution to their referees. The preliminary results were as follows. Of the total of 36 articles taken from two issues of the Sechenov Physiological Journal of the USSR, 31 percent were judged acceptable for publication in the American Journal of Physiology or the Journal of Applied Physiology. Of the total of 41 articles taken from two issues of Biophysics (USSR), 23 percent were judged acceptable for publication in the Biophysical Journal. In addition the referees indicated that another eight articles should be acceptable to the appropriate American journal. Of the 25 papers taken from two issues of Problems of Oncology, 76 percent were considered acceptable to Cancer. The referees indicated that another two articles would have been acceptable at one time but “would not now be considered new enough to merit publication.” Further evidence of a possible excess of translation is to be found in The Need for Soviet Translations Among American Chemists, a report to the American Chemical Society by Herner and Company (June 4, 1962): On the other hand, the biggest argument that the respondents had with the translations presently available to them was not with their quality but with time lags in their issuance. The translation process–particularly when cover-to-cover translations are involved–is a relatively slow one. In view of the finding of the medical editors, one might well wonder whether a relatively high proportion of mediocre or inferior papers are not delaying the appearance of a small proportion of superior and significant papers. Perhaps even more revealing than the specifically stated reasons for nonuse of Soviet translations are the answers to the question in the questionnaire in regard to preferred media for receiving Soviet scientific information. Three methods outranked all others. 
These were: English-language abstracts of Russian publications, regular English-language reviews of Soviet developments in specific fields, and translations of individual articles as needed. These three methods are of course not mutually exclusive but complementary. Interestingly, the number of respondents who preferred to get their Soviet information in the form of cover-to-cover translations was only half the number who preferred to get their translations as needed. . . . The only things that might be done to round out the Soviet coverage that is presently available in chemistry is, first, to make sure that Soviet papers that are worthwhile in the opinion of the abstractors or editors are given detailed abstracting because they are likely not to be readily available in English; second to provide means of obtaining cheap copies of cited Soviet papers, possibly through the Chemical Abstracts Service; and third to develop a mechanism for making selected translations available on request, again possibly through the Chemical Abstracts Service. All three areas of improvement would probably require subsidization by the Government. However, it would probably mean a far smaller expenditure than would be required to support an expanded program of cover-to-cover translations. It would also probably produce a far greater return. It is the Committee's belief that the total technical literature does not merit translation, and it is futile to try to guess what someone may at some time want translated. The emphasis should be on speed, quality, and economy in supplying such translations as are requested. A service such as the Joint Publications Research Service, which charges the user for a translation, is less conducive to translation without use than is a service such as the U.S. Air Force Systems Command's Foreign Technology Division, which supplies translations free within certain areas. 
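The refereeing study quoted above reports only article totals and rounded percentages for each journal. As a rough illustration (not part of the report), the Python sketch below backs out approximate article counts from those figures and pools them. The function names are hypothetical, and the counts are estimates recovered from rounded percentages, not numbers stated by the Committee.

```python
# (journal, articles refereed, share judged acceptable), as quoted in the report.
STUDY = [
    ("Sechenov Physiological Journal of the USSR", 36, 0.31),
    ("Biophysics (USSR)", 41, 0.23),
    ("Problems of Oncology", 25, 0.76),
]


def estimated_accepted(total, share):
    """Back out an approximate article count from a rounded percentage."""
    return round(total * share)


def pooled_acceptance(study):
    """Return (accepted, refereed, pooled share) across all journals."""
    accepted = sum(estimated_accepted(n, p) for _, n, p in study)
    refereed = sum(n for _, n, _ in study)
    return accepted, refereed, accepted / refereed


if __name__ == "__main__":
    for name, n, p in STUDY:
        print(f"{name}: about {estimated_accepted(n, p)} of {n}")
    accepted, refereed, share = pooled_acceptance(STUDY)
    print(f"pooled: about {accepted} of {refereed} ({share:.0%})")
```

The pooled figure leaves out the eight Biophysics articles and two oncology articles that the referees mentioned separately, since the report does not fold them into the stated percentages.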
××××××××××××××××××××××××××× 翻译的关键问题 有没有在翻译领域的紧急情况。问题不在于满足一些不存在需要通过不存在的机器翻译。然而,有几个关键问题的翻译。这是质量,速度和成本。 质量 委员会强烈认为,翻译的质量,必须有足够的请求者的需求。一个完美无瑕的生产和抛光翻译为用户有限的读者是浪费时间和金钱。另一方面,当一个档案质量要求生产劣质的翻译更浪费资源。它似乎很清楚的是,在许多情况下,充足,高质量的翻译不提供委员会。 尽管有足够的质量是至关重要的,政府有没有可靠的方法来衡量翻译的质量。鉴于此,该委员会的一名成员已经成立了一个实验质量的评价。这项工作是简要介绍附录10 。一种可靠的方法来衡量质量的重视,在确定正确的翻译费用。成本和质量之间的关系是远远精确。关于这种相关性,我们报价从演示文稿,向委员会提出, 1964年, 9月30日,美国翻译协会会长博士库尔特Gingold : 没有绝对的成本和质量之间的关系。有一些优秀的翻译谁收取适度的利率,而一些不称职的管理至少暂时收取高得多的价格。存在这样的相关性可能是更好的,在高端比在低,换句话说,是一种廉价的翻译几乎总是以某种方式的缺陷,而昂贵的翻译并不总是质量优越。和大,然而,一个得到支付。 速度 合理的速度和及时的翻译是必不可少的。委员会相信,在这方面有很大的改进余地。 2,258名科学家,响应关于翻译的苏联杂志的问卷中,有1,407评论出版的滞后时间; 24.5%的影响,滞后时间应减少(美国翻译的苏联科学期刊的使用,用户编写的研究报告的意见雪城大学的美国国家科学基金会和交换从联邦科学和技术信息研究所,报告号: TT -65- 64026 ) 。 AN / GSQ -16( XW- 2 )自动语言翻译美国空军外国技术部( FTD )为109天(44天为高优先级项目)对于一般的文件处理的延迟时间(收据) 。此外,在FTD ,外部承包商的文件翻译的平均处理时间通常为65天加1.3天,每1000字的俄语翻译。 最快速的翻译服务习惯的基础上提供定期的价格已经到了委员会的关注的是,联合出版研究服务( JPRS ),的,保证50页, 15天, 30天100页。 滞后时间(从收到)出版翻译期刊,由美国国家科学基金会支持的范围从15到26周。平均来说,这种滞后的一半时间花费在翻译和编辑(附录6 ,表3) 。 因此,我们看到,许多延误“翻译”在翻译的过程中,本身不会说谎,但而在编辑和制作花费的时间,有时在避免延误。 FTD的机器辅助翻译,延误生产和后期编辑,队列中的许多操作都必须在串联在这个特殊的形式,机器辅助翻译造成的延误。 应该提到的是高优先级的项目分割成段长文本就可以了额外的快速翻译服务,或支付额外的费用可能从基准利率的25 %至50 % ,甚至更高不等,取决于特定的的情况。 成本 成本是很重要的,因为在许多情况下,它是唯一的措施,政府能够明智地使用在决定如何将其翻译是必须要做的。正如我们所看到的,变化很大,由$ 9至每千字66元。机可能不适用于某些形式的翻译,如非常高品质的外交翻译与文学翻译。但科学材料可以做或没有机器辅助翻译。至于质量和速度,可以实现额外的成本,更好的质量和更高的速度,如果长文本分割成段。因此,一个特定的结果是成本的标准,政府应适用于决定翻译手段。 (见附录9各种类型的翻译的成本的估算。 ) ~~~~~~~~~~~~~~~~ The Crucial Problems of Translation There is no emergency in the field of translation. The problem is not to meet some nonexistent need through nonexistent machine translation. There are, however, several crucial problems of translation. These are quality, speed, and cost. QUALITY The Committee believes strongly that the quality of translation must be adequate to the needs of the requester. The production of a flawless and polished translation for a user-limited readership is wasteful of both time and money. 
On the other hand, production of an inferior translation when one of archival quality is called for is even more wasteful of resources. It seems clear to the Committee that, in many cases, translations of adequate quality are not being provided. Despite the fact that adequate quality is essential, the government has no reliable way to measure the quality of translation. In view of this, one member of the Committee has set up an experiment in the evaluation of quality. This work is described briefly in Appendix 10. A reliable way to measure quality would be of great importance in determining proper cost of translation. The correlation between cost and quality is far from precise. Concerning this correlation, we quote from the presentation made to the Committee on September 30, 1964, by Dr. Kurt Gingold, President of the American Translators Association: There is no absolute correlation between cost and quality. There are some excellent translators who charge moderate rates, while some incompetents manage–at least temporarily–to charge much higher prices. Such correlation as exists is probably better at the low than at the high end; in other words, a cheap translation is almost always defective in some way, while an expensive translation is not always of superior quality. By and large, however, one gets what one pays for. SPEED Reasonable speed and promptness are essential in translation. The Committee is convinced that in this regard there is considerable room for improvement. Of 2,258 scientists responding to a questionnaire concerning translated Soviet journals, 1,407 commented on lag time of publication; 24.5 percent of the comments were to the effect that lag time should be reduced (American Use of Translated Soviet Scientific Journals, a user study prepared by the Syracuse University Research Institute for the National Science Foundation and available from the Clearinghouse for Federal Scientific and Technical Information, Report No. TT-65-64026). 
The lag time (from receipt) for the average document processed by the AN/GSQ-16 (XW-2) Automatic Language Translator of the USAF Foreign Technology Division (FTD) is 109 days (44 days for high-priority items). Also at FTD, the average processing time for documents translated by outside contractors was usually 65 days plus 1.3 days for each 1,000 words of Russian translated. The most rapid translation service offered on a customary basis at regular prices that has come to the attention of the Committee is that of the Joint Publications Research Service (JPRS), which guarantees 50 pages in 15 days, 100 pages in 30 days. The lag time (from receipt) in publication of the translated journals supported by NSF ranges from 15 to 26 weeks. On the average, half of this lag is accounted for by time spent in translation and editing (Appendix 6, Table 3). Thus, we see that many of the delays in “translation” do not lie in the process of translation itself, but rather in time spent in editing and production, and sometimes in avoidable delays. In the FTD machine-aided translation, the delays are in production and postediting, together with the delays caused by queues in the many operations that must be done in tandem in this particular form of machine-aided translation. It should be mentioned that for high-priority items extra fast translation service can be had by splitting long texts into segments, or by paying an additional fee that may range from 25 to 50 percent of the base rate or even higher, depending on the particular circumstances. COST Cost is important because in many cases it is the only measure the government can sensibly use in deciding how its translation is to be done. As we have seen, it varies considerably–from $9 to $66 per 1,000 words. Machines are probably inappropriate for some forms of translations, such as very high-quality diplomatic translation and literary translation. But translations of scientific material can be done with or without machine aids. 
As to quality and speed, at extra cost, better quality and higher speed can be attained if long texts are split into segments. Thus, cost for a particular result is the criterion that the government should apply in deciding on means of translation. (See Appendix 9 for estimates of the costs of various types of translation.)
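The SPEED and COST passages above give two simple figures that can be turned into arithmetic: the FTD outside-contractor processing time of "65 days plus 1.3 days for each 1,000 words" and the price range of $9 to $66 per 1,000 words. The Python sketch below is only an illustration of that arithmetic. The function names are hypothetical, and treating the reported average as a linear rule for any single document is an assumption, not a claim made by the report.

```python
def contractor_turnaround_days(russian_words):
    """Reported FTD contractor average: 65 days plus 1.3 days per 1,000 words."""
    return 65 + 1.3 * russian_words / 1000


def cost_range_dollars(words, low_rate=9.0, high_rate=66.0):
    """Lower and upper cost bounds from the reported $9 to $66 per 1,000 words."""
    thousands = words / 1000
    return low_rate * thousands, high_rate * thousands


if __name__ == "__main__":
    words = 10_000  # hypothetical document length
    low, high = cost_range_dollars(words)
    print(f"{words} words: about {contractor_turnaround_days(words):.0f} days, "
          f"${low:.0f} to ${high:.0f}")
```

For a hypothetical 10,000-word document, this works out to roughly 78 days and somewhere between $90 and $660, which shows how wide a spread the report's price range implies.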
【立委按】 ALPAC 黑皮书 是自然语言处理和机器翻译领域极其重要的历史文献,原文在: http://books.nap.edu/html/alpac_lm/ARC000005.pdf 。如此重要的文献本来以为一定有若干中文译本,居然遍搜而不得。我要是有时间,就给它译了,可现在实在没空。算了,至少先凑合弄个机器翻译版吧(略加最低限度的后编辑)。本来是要枪毙机器翻译的,正好让机器翻译serve它,也算小小的报应。把重要历史文献完整挖掘出来,也算功德一枚。Google Translate,给点力!要是不努力,没准哪天我就弃明投暗,找千百度去,伊人在灯火阑珊处已然守候多时了。 ALPAC 黑皮书 1/n(机器翻译版) ~~~~~~~~~~~~~~~~~~~~~~~~~ 弗雷德里克塞茨院长博士 美国国家科学院 2101华盛顿宪法大道,D. C.20418 1965年8月20日 亲爱的博士塞茨: 在1964年4月你形成了一个自动语言处理咨询委员会,应利兰·霍沃斯博士,美国国家科学基金会主任的请求,以便告知国防部,中央情报局和美国国家科学基金会一般机械外语翻译领域的研究和发展状况。我们很快发现你是正确的,确实有很多强烈,但往往相互冲突的意见,关于机器翻译的承诺和现在应采取的最有成效的步骤是什么。 为了达到合理的结论,并提供合理的建议,我们觉得有必要咨询在各种各样领域的专家(他们的名字被列在附录20 ) 。我们已调查翻译的需求,考量翻译的评价,并比较了机器和人类的翻译和其他语言处理功能。 我们发现,我们所听到的都让我们得出同样的结论。我们谨此提交的报告阐明了我们共同的意见和建议。我们相信,这些可以形成有用的改变,旨在增加理解一个极其重要的现象:语言,并发展旨在改善人类翻译而适当使用的机器辅助。 我们很抱歉,由于有其他义务,查尔斯F.霍凯特,原委员会的成员之一,有必要在我们报告写作前就辞职了。然而,他对我们的工作作出了宝贵的贡献,这是我们要感谢的。 你真诚的, J. R.皮尔斯,董事长 语言自动处理咨询委员会 Dr. Frederick Seitz, President National Academy of Sciences 2101 Constitution Avenue Washington, D.C. 20418 Dear Dr. Seitz: In April of 1964 you formed an Automatic Language Processing Advisory Committee at the request of Dr. Leland Haworth, Director of the National Science Foundation, to advise the Department of Defense, the Central Intelligence Agency, and the National Science Foundation on research and development in the general field of mechanical translation of foreign languages. We quickly found that you were correct in stating that there are many strongly held but often conflicting opinions about the promise of machine translation and about what the most fruitful steps are that should be taken now. In order to reach reasonable conclusions and to offer sensible advice we have found it necessary to learn from experts in a wide variety of fields (their names are listed in Appendix 20). We have informed ourselves concerning the needs for translation, considered the evaluation of translations, and compared the capabilities of machines and human beings in translation and in other language processing functions. 
We found that what we heard led us all to the same conclusions, and the report which we are submitting herewith states our common views and recommendations. We believe that these can form the basis for useful changes in the support of research aimed at an increased understanding of a vitally important phenomenon–language, and development aimed at improved human translation, with an appropriate use of machine aids. We are sorry that other obligations made it necessary for Charles F. Hockett, one of the original members of the Committee, to resign before the writing of our report. He nonetheless made valuable contributions to our work, which we wish to acknowledge. Sincerely yours, J. R. Pierce, Chairman Automatic Language Processing Advisory Committee ×××××××××××××××××××××××××××××××××××× 弗雷德里克塞茨院长博士 美国国家科学院 2101华盛顿宪法大道,D. C.20418 1966年7月27日 亲爱的博士塞茨: 科学与公共政策委员会于3月13日对 国家研究理事会 语言自动处理咨询委员会的报告,进行了审查后,要求董事长,约翰·皮尔斯,准备一份简短的声明,说明计算语言学的资助需求,这不同于自动语言翻译的需求。这一要求源于担心孤独阅读该委员会的报告,可能会导致终止计算语言学研究的支持,以及所建议的减少对在相对短期的 翻译 目标的资助。 皮尔斯博士的建议,部分内容如下: 计算机为语言学家打开了一系列挑战、部分见地和潜力。我们相信,这些挑战可与粒子物理面临的挑战、问题和见地类比。毫无疑问,语言在所有现象中的重要性是首屈一指的。计算语言学所需要的工具成本,比起需要数十亿伏加速器的粒子物理小多了。 新的语言学提出一个有吸引力的,以及一个极其重要的挑战。 我们完全有理由相信,面对这一挑战,最终将导致在许多领域的重要贡献。一个更深的语言知识可以帮助: 1。更有效地教外语。 2。教语言的本质更有效。 3。更有效地使用自然语言下指令和通信。 4。帮助我们构造为特殊用途(例如,飞行员控制塔通讯语言)的人工语言。 5。使我们能够在语言的使用以及人的沟通和思想方面做有意义的心理实验。除非我们知道语言是什么,我们不知道我们必须解释什么。 6。用机器辅助翻译和信息检索。 然而,语言学的状态是这样的,本身具有价值的优秀研究是必不可少的,如果 语言学 最终要做出这些贡献。 这样的研究必须 使用 电脑。我们必须研究以找出有关语言奥妙的数据是压倒性的,无论在数量还是复杂性上。电脑承诺帮助我们控制 巨大的数据量 问题,并在较小程度上对付数据的复杂性问题。但我们尚未有很好的,很容易使用,普及了的方法让计算机处理语言数据。 因此,下列重要的研究,是需要做的,应予以支持:(1) 计算机处理语言的方法的 基本开发研究,譬如帮助语言科学家发现并说明他的概括的工具,并作为工具帮助检查对数据的概括 建议; (2)发展研究的方法,让语言的科学家用电脑来陈述他们的详细复杂的各种理论(例如,语法和意义理论),使他们生产的理论可以被检查细节。 对计算语言学研究最合理的支持来自美国国家科学基金会。需要多大的支持?有些工作必须做在一个相当大的规模上,因为小规模的实验和语言的微缩模型已经证明在过去有严重的偏差,一个真正的问题,只有在一定规模以上的语法、字典、可用语料库的状态下才可把握。 我们估计, 一个机构 60 或 70万一年 可以支持 一个相当 规模的 工作 。我们相信,这种规模的工作有理由在四个或五个中心进行。因此,每年250至300万美元,似乎是合理的研究开支。这个数字不包括在眼前的实际应用中的一种或另一种的工作。这个建议,我明白皮尔斯博士的委员会也认可,还送出了给科学与公共政策委员会的成员征求意见。虽然 
科学与公共政策 委员会没有考虑所建议的计算语言学项目与其他国家科学基金会计划的竞争,但我们相信,皮尔斯博士的声明应提请给 美国国家科学基金会 注意,以便把信息咨询委员会的报告放在适当的角度来看。 此致,哈维·布鲁克斯, 科学与公共政策委员会主席 ~~~~~~~~~~~~~~~~~~ Dr. Frederick Seitz, President National Academy of Sciences 2101 Constitution Avenue Washington, D. C. 20418 July 27, 1966 Dear Dr. Seitz: In connection with the report of the Automatic Language Processing Advisory Committee, National Research Council, which was reviewed by the Committee on Science and Public Policy on March 13, John R. Pierce, the chairman, was asked to prepare a brief statement of the support needs for computational linguistics, as distinct from automatic language translation. This request was prompted by a fear that the committee report, read in isolation, might result in termination of research support for computational linguistics as well as in the recommended reduction of support aimed at relatively short-term goals in translation. Dr. Pierce's recommendation states in part as follows: The computer has opened up to linguists a host of challenges, partial insights, and potentialities. We believe these can be aptly compared with the challenges, problems, and insights of particle physics. Certainly, language is second to no phenomenon in importance. And the tools of computational linguistics are considerably less costly than the multibillion-volt accelerators of particle physics. The new linguistics presents an attractive as well as an extremely important challenge. There is every reason to believe that facing up to this challenge will ultimately lead to important contributions in many fields. A deeper knowledge of language could help: 1. To teach foreign languages more effectively. 2. To teach about the nature of language more effectively. 3. To use natural language more effectively in instruction and communication. 4. To enable us to engineer artificial languages for special purposes (e.g., pilot-to-control-tower languages). 5. 
To enable us to make meaningful psychological experiments in language use and in human communication and thought. Unless we know what language is we don't know what we must explain. 6. To use machines as aids in translation and in information retrieval. However, the state of linguistics is such that excellent research that has value in itself is essential if linguistics is ultimately to make such contributions. Such research must make use of computers. The data we must examine in order to find out about language is overwhelming both in quantity and in complexity. Computers give promise of helping us control the problems relating to the tremendous volume of data, and to a lesser extent the problems of data complexity. But we do not yet have good, easily used, commonly known methods for having computers deal with language data. Therefore, among the important kinds of research that need to be done and should be supported are (1) basic developmental research in computer methods for handling language, as tools to help the linguistic scientist discover and state his generalizations, and as tools to help check proposed generalizations against data; and (2) developmental research in methods to allow linguistic scientists to use computers to state in detail the complex kinds of theories (for example, grammars and theories of meaning) they produce, so that the theories can be checked in detail. The most reasonable government source of support for research in computational linguistics is the National Science Foundation. How much support is needed? Some of the work must be done on a rather large scale, since small-scale experiments and work with miniature models of language have proved seriously deceptive in the past, and one can come to grips with real problems only above a certain scale of grammar size, dictionary size, and available corpus. We estimate that work on a reasonably large scale can be supported in one institution for $600 or $700 thousand a year.
We believe that work on this scale would be justified at four or five centers. Thus, an annual expenditure of $2.5 to $3 million seems reasonable for research. This figure is not intended to include support of work aimed at immediate practical applications of one sort or another. This recommendation, which I understand has the endorsement of Dr. Pierce's committee, was also sent out for comment to the membership of the Committee on Science and Public Policy. While the Committee on Science and Public Policy has not considered the recommended program in computational linguistics in competition with other National Science Foundation programs, we do believe that Dr. Pierce's statement should be brought to the attention of the National Science Foundation as information necessary to put the report of the Advisory Committee in proper perspective. Sincerely yours, Harvey Brooks, Chairman Committee on Science and Public Policy Dr. Frederick Seitz, President National Academy of Sciences 2101 Constitution Avenue Washington, D. C. 20418 ××××××××××××××××××××××××××××××××××××××××× 前言 国防部,美国国家科学基金会和美国中央情报局支持的项目,外国语言的自动处理大约十年; 这些主要是机械翻译。为了提供一个协调的联邦计划,在这方面的研究和开发,这三个机构成立了联合自动语言处理集团( JALPG ) 。 早期JALPG就确认需要一个咨询委员会,可以提供所要求的技术援助以及促进计算语言学、机械翻译,以及其他相关领域的 独立观测 。 1963年10月美国国家科学基金会主任,利兰·霍沃斯,作为这三个机构的代表要求美国国家科学院建立这样一个委员会。 委员会就这样建立了,并在1964年4月,利用 三个机构 提供的 基金, 国家研究理事会 国家科学院 自动语言处理 咨询委员会在约翰·皮尔斯主席 主持下 ,举行了第一次会议。 委员会决定,支持自动语言处理研究的理由有两个基础: (1)智力挑战领域的研究,与支持机构的使命相关;(2)研究和开发具有明确的前景:促成早期成本 降低 ,或大幅提高性能,或满足实际的需要。 委员会明白支持自动语言处理的工作的很大的动机一直是在上述(2)所代表的实用目的。根据这一目标,该委员会调查了整个翻译问题。本报告介绍了该委员会的调查结果和建议。 ~~~~~~~~~~~~~~~~~~~ Preface The Department of Defense, the National Science Foundation, and the Central Intelligence Agency have supported projects in the automatic processing of foreign languages for about a decade; these have been primarily projects in mechanical translation.
In order to provide for a coordinated federal program of research and development in this area, these three agencies established the Joint Automatic Language Processing Group (JALPG). Early in its existence JALPG recognized its need for an advisory committee that could provide directed technical assistance as well as contribute independent observations in computational linguistics, mechanical translation, and other related fields. In October 1963 the Director of the National Science Foundation, Leland J. Haworth, requested on behalf of the three agencies that the National Academy of Sciences establish such a committee. This was done, and in April 1964, with funds made available by the three agencies, the Automatic Language Processing Advisory Committee of the National Academy of Sciences–National Research Council, under the chairmanship of John R. Pierce, held its first meeting. The Committee determined that support for research in automatic language processing could be justified on one of two bases: (1) research in an intellectually challenging field that is broadly relevant to the mission of the supporting agency and (2) research and development with a clear promise of effecting early cost reductions, or substantially improving performance, or meeting an operational need. It is clear to the Committee that the motivation for support of much of the work in automatic language processing has been the practical aim represented in (2) above. In the light of that objective, the Committee studied the whole translation problem. This report presents the findings and recommendations of the Committee. ×××××××××××××××××××××××××××× 目录 人类翻译1 类型译者就业2 英语作为语言的科学4 所需的时间,科学家学习俄语5 在美国政府的翻译 6 政府转换数 7 花费金额为翻译 9 是否有短缺翻译或翻译吗? 
11 就可能超出翻译 13 翻译的关键问题 16 机器翻译的现状 19 机器辅助翻译在曼海姆和卢森堡 25 自动语言处理和计算语言学 29 改善翻译大道 32 建议 34 附录 1。视译与全译实验 35 2。国防语言学院课程科学俄罗斯 37 3。联合出版物研究服务 39 4 。公法 翻译 41 5 。机器翻译的外国技术部,美国 空军系统司令部 43 6 。期刊翻译支持由美国国家科学基金会 45 7。公务员制度委员会的数据联邦翻译 50 8。需求和可翻译 54 9。翻译不同类型的成本估算 57 10。质量评价的实验翻译 67 11。 机器翻译中 常见错误类型 76 12。机器辅助翻译联邦武装部队翻译 重刑局,德国曼海姆 79 13。机器辅助翻译的欧洲煤钢COM- 群落,卢森堡 87 14。机器翻译的翻译对战后期编辑 91 15。评估的科学编辑和联合出版物研究服务副外国技术部翻译 102 16。政府支持的机器翻译研究 107 17。电脑出版 113 18。编程语言和语言学的关系 118 19。机器翻译及语言学系 121 20。委员会构成 124 ~~~~~~~~~~~~~~~~~~~ Contents Human Translation 1 Types of Translator Employment 2 English as the Language of Science 4 Time Required for Scientists to Learn Russian 5 Translation in the United States Government 6 Number of Government Translators 7 Amount Spent for Translation 9 Is There a Shortage of Translators or Translation ? 11 Regarding a Possible Excess of Translation 13 The Crucial Problems of Translation 16 The Present State of Machine Translation 19 Machine-Aided Translation at Mannheim and Luxembourg 25 Automatic Language Processing and Computational Linguistics 29 Avenues to Improvement of Translation 32 Recommendations 34 APPENDIXES 1. Experiments in Sight Translation and Full Translation 35 2. Defense Language Institute Course in Scientific Russian 37 3. The Joint Publications Research Service 39 4. Public Law 480 Translations 41 5. Machine Translations at the Foreign Technology Division, U.S. 43 Air Force Systems Command 6. Journals Translated with Support by the National Science Founda- 45 tion 7. Civil Service Commission Data on Federal Translators 50 8. Demand for and Availability of Translators 54 9. Cost Estimates of Various Types of Translation 57 10. An Experiment in Evaluating the Quality of Translations 67 11. Types of Errors Common in Machine Translation 76 12. Machine-Aided Translation at the Federal Armed Forces Transla- 79 tion Agency, Mannheim, Germany 13. Machine-Aided Translation at the European Coal and Steel Com- 87 munity, Luxembourg 14. 
Translation Versus Postediting of Machine Translation 91 15. Evaluation by Science Editors and Joint Publications Research Service and Foreign Technology Division Translations 102 16. Government Support of Machine-Translation Research 107 17. Computerized Publishing 113 18. Relation Between Programming Languages and Linguistics 118 19. Machine Translation and Linguistics 121 20. Persons Who Appeared Before the Committee 124
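Bilingual web pages are an extremely valuable resource for multilingual information processing, especially machine translation and cross-language information retrieval. In machine translation, the systems now in production compete on their models and, just as much, on the resources they command. In current academic work on acquiring bilingual resources, one representative approach applies heuristic rules to the composition of URLs in order to discover bilingual web pages automatically on bilingual sites (call it the URL-pattern-based approach). That approach needs its heuristic rules drawn up in advance. We (Kit & Ng, 2007; Zhang, Yao & Kit, 2013) try to let the machine discover such rules automatically, reducing the approach's dependence on external prior knowledge.

(Kit & Ng, 2007) mainly discovers bilingual URL patterns automatically and then uses those patterns to find bilingual web pages. (Zhang, Yao & Kit, 2013) goes further: it measures the credibility of bilingual URL patterns and uses link relations to find more high-credibility bilingual web pages. Our experiments show that the method finds roughly 20% additional genuine bilingual web pages.

The interesting points of this work are: (1) it distinguishes the global credibility of a URL pattern (computed over all seed sites) from its local credibility (computed over the current site), which recalls bilingual pages whose pattern has low local but high global credibility; (2) it uses the learned high-credibility bilingual URL patterns to find bilingual pages that have no link relation to each other (we call these Deep Bilingual Webpages); (3) starting from the bilingual seed sites, it uses link relations to discover further high-credibility bilingual sites outside the seed set, and from those, more high-credibility bilingual pages.

For details, see the following papers:

1. Chunyu Kit and Jessica Y. H. Ng. An intelligent Web agent to mine bilingual parallel pages via automatic discovery of URL pairing patterns. In: Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops: Workshop on Agents and Data Mining Interaction (ADMI-07). November 2-5, 2007, Silicon Valley, California.

2. Chengzhi Zhang, Xuchen Yao and Chunyu Kit. Finding More Bilingual Web Pages with High Credibility via Link Analysis. In: Proceedings of the 6th Workshop on Building and Using Comparable Corpora (BUCC2013). August 8, 2013, Sofia, Bulgaria.

The URL-pattern-based approach has unavoidable weaknesses of its own, of course. The alternative is to compute the structural or content similarity between candidate bilingual pages directly, which usually costs a great deal of computation or time (for example, crawling as many source-language and target-language pages as possible and then computing cross-language similarity). In my view there is still plenty of follow-up work to do here, for instance removing the need for hand-picked seed sites, or needing as few as possible. Combining the method with semi-supervised learning to discover more high-credibility seed sites may be a good idea.

The source code (Pupsniffer) and the data sets (seed sites, collected bilingual pages, evaluation results and so on) used in (Zhang, Yao & Kit, 2013) are available on the Pupsniffer evaluation site: http://mega.lt.cityu.edu.hk/~czhang22/pupsniffer-eval/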
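To make the URL-pattern idea concrete, here is a minimal sketch of pattern discovery on a single site. It is not the Pupsniffer implementation: the tokenization, the "exactly one differing token" rule, and the function names `tokenize`, `candidate_pattern`, and `discover_patterns` are all assumptions made for illustration, and the support count only stands in loosely for the credibility measures mentioned above.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(url):
    """Split a URL on common delimiters, keeping the delimiters as tokens."""
    return re.split(r"([/._\-?=&])", url)


def candidate_pattern(url_a, url_b):
    """Return the (token, token) pair by which two URLs differ,
    or None unless they differ in exactly one token position."""
    tokens_a, tokens_b = tokenize(url_a), tokenize(url_b)
    if len(tokens_a) != len(tokens_b):
        return None
    diffs = [(a, b) for a, b in zip(tokens_a, tokens_b) if a != b]
    if len(diffs) != 1:
        return None
    return tuple(sorted(diffs[0]))


def discover_patterns(urls):
    """Count how many URL pairs support each candidate pairing pattern.
    (Assumed simplification: support is a raw count, not a credibility score.)"""
    counts = Counter()
    pairs = {}
    for url_a, url_b in combinations(sorted(urls), 2):
        pattern = candidate_pattern(url_a, url_b)
        if pattern is not None:
            counts[pattern] += 1
            pairs.setdefault(pattern, []).append((url_a, url_b))
    return counts, pairs


# Hypothetical site with three pages in two language versions.
site_urls = [
    "http://example.org/en/news.html",
    "http://example.org/zh/news.html",
    "http://example.org/en/about.html",
    "http://example.org/zh/about.html",
    "http://example.org/en/contact.html",
    "http://example.org/zh/contact.html",
]
counts, pairs = discover_patterns(site_urls)
best_pattern, support = counts.most_common(1)[0]
print(best_pattern, support)  # ('en', 'zh') 3
```

On this toy site the language-switch pattern wins because every page supports it, while accidental patterns such as ('about', 'news') are supported only once per language. A real system would still need a credibility measure to separate the two kinds on noisier sites.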
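Liwei's Notes: keep ambiguity untouched. Posted by: 立委 Date: April 27, 2007 06:09PM

Machine translation: the truly beautiful must always be in it. Posted by: liwei999 Date: September 19, 2006 12:15AM

冰冰 said: 馒头's translation: 茶之至美则必在其中矣。 藕's revised translation: 茶道必有至美匿于其中。 Original sentence: The truly beautiful must always be in it

If you ask me, you are both right, and neither of you is entirely right. The parts that are not right share one cause: being too clever by half. This is also the first time I have seen the two sides of a quarrel each insist that the other is right and they themselves are wrong: "You have a point." "You're not wrong." With manners like these, a linguist who wants to criticize hardly has the heart to.

Those of us who do machine translation have a principle: keep ambiguity untouched (as much as possible). That is how you stay on unbeatable ground. The key is not to be too clever.

Take an example: how should "A and B of C" be translated? Many people translate it as 甲和丙的乙 (A, and C's B). Many others translate it as 丙的甲和乙 (C's A and B). They argue without end, and since they lack good manners, they may well hurt feelings and lose friends. Machine translation never has this trouble: 丙的乙和甲 (which, like the English, can be read either way).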
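Google Translate vs. Baidu Translate

In 1954 Georgetown University in the United States, with support from IBM, ran the first machine translation experiment, translating Russian into English. Five years later, in 1959, China successfully tested Russian-Chinese machine translation. Computer hardware and software were limited then, so only very simple translation experiments within a narrow scope were possible. Today machine capacity and speed are no longer a problem, translation ability has grown greatly, and online translation is a reality. On the whole, though, quite a few problems remain. For example, Google's translation tool renders both I don't go to the party and I didn't go to the party as “我没有去参加派对”.

Recently Baidu (百度) also launched a translation tool. It renders the two sentences above as “我不去参加派对” and “我没有去参加派对” respectively. On this point Baidu is a cut above Google.

On other example sentences, however, each system has its own strengths. Compare:

1) I played basketball at school this morning. 我打了今天上午在学校的篮球。 我在学校打篮球,今天上午。 我今天上午在学校打篮球了。 This morning I played basketball in school. Today I am playing basketball at school.

2) This black granite wall includes the names of over 58,000 American soldiers who died or disappeared during the Vietnam War. 这个黑色花岗岩墙包括超过 58,800 人死亡或在越南战争期间失踪的美国士兵的名字。 这个黑色的花岗岩壁包括姓名,超过 58000 名美国士兵死亡或失踪,在越南战争。

3) The newest of the Presidential memorials is dedicated to Franklin Delano Roosevelt. Located in west Potomac Park, it includes four open air rooms made of rough granite blocks. The four rooms symbolize the four terms that President Roosevelt served guiding the nation through the Great Depression and World War II. 新的总统纪念馆是致力于富兰克林德拉诺罗斯福。位于西部波托马克公园,它包括四个露天粗糙的花岗岩荒料的客房。四个房间的象征,罗斯福总统曾经历过大萧条和第二次世界大战会指导全国的四个方面。 最新的总统纪念碑是献给罗斯福总统。位于西波托马克公园,它包括四个开放气室,由粗花岗岩块。四间客房,象征四条款,罗斯福总统曾指导国家通过大萧条和二战。

Readers can analyze the strengths and weaknesses of the two systems and draw their own conclusions. I can offer a few pointers to think about, though. Machine translation systems fall mainly into two types, rule-based and statistical. Google's translator describes itself as statistical: it finds the best patterns in millions of documents and generates the translation from them. A rule-based system is built on linguistic principles. What Baidu's translator is built on is not stated clearly. It mentions only four technical highlights: outstanding core machine translation technology, leading corpus-mining technology, powerful massive-scale computing technology, and reliable web front-end technology.

To sum up, machine translation has made great progress over more than half a century. But machine translation is at bottom a linguistic problem, and statistics alone is not enough. If we build on the statistical foundation and work hard at uncovering linguistic rules and translation techniques, I believe that in twenty or thirty years there will be translations that can, to some degree, rival human translation.

* Note: this article covers only translation between Chinese and English. Google's translator is in fact far more powerful; it supports translation among 57 languages.

白水 2011-07-28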
世界语到汉语和英语的自动翻译试验 --EChA 机器翻译系统概述 第 39 页———————————————————————————————————————————————————— 10. EChA 试验结果分析 总的来说 , 这次试验结果相当令人满意。译文不但可读 , 多数都很通顺。由于比较重视修辞 , 机器味儿也不浓。当然 , 这毕竟是小范围的实验 , 虽然我们尽量照顾到各种可能出现的语言现象 , 但也难说在今后的扩大试验中会出现什么问题 , 好在该系统比较容易维护和改进。 第二首诗中有两处 (110)(111) 把疑问句错译成英语强调句 : CHU kredas la vorton pure karan: vin mi amas! (111) DO BELIEVE the word purely dear: I love you! Cf: 相信纯粹地亲爱的词吗 : 我爱你 ! 这是因为原诗句为了节奏的需要 , 承前省略了主语 VI (YOU) 。有意思的是 , 译成强调句于诗意没有什么损害。 在 EChA 上机伊始 , 我们由于专心于检验方案主体的可行性和合理性 , 而忽略了修辞。初期译文 (1985.12) 显得较粗糙 , 比较后期结果 (1986.2), 译文的改进是明显的。例如 : 1. 形式主语 IT 的增加 (007)(012)(077)(122)(125)(133): Sed chio chi ankorau okazis sub homa gvidado kaj PLEJ GRAVE ESTIS, KE chio chi bazighis sur la homa scio. (012) 1) But all this still happened under man's guiding and MOST IMPORTANT WAS, THAT all this was based on the man's knowledge. 2) But all this still happened under man's guiding and IT WAS MOST IMPORTANT, THAT all this was based on the man's knowledge. 2. 不定式带 TO 跟不带 TO 的区分 (004)(019)(072)(078)(083)(084)(088)(089)(092)(095)(132)(142)(146): LABORI estas necese.(072) 1) (TO) WORK is necessary. 2) TO WORK is necessary. 工作是必要的 . 3. 双宾语 (128)(143)(144): Donu AL mi iom da kafo! (128) 1) Give TO me a little coffee! 2) Give me a little coffee! 给我一点咖啡 ! 表示存在的 ESTI 译 有 和 THERE TO BE (049)(157): En unu jaro ESTAS kvar sezonoj: printempo, somero, autuno kaj vintro. (049) 1) In one year ARE four seasons: spring, summer, autumn and winter. 在一年里面 是 四季节 : 春季 , 夏季 , 秋季和冬季。 2) In one year THERE ARE four seasons: spring, summer, autumn and winter. 在一年里面 有 四季节 : 春季 , 夏季 , 秋季和冬季 . 。 5. 目标语词义的选择 (059)(067)(081)(046)(098)(013)(014)(027)(118)(130): ELMETU viajn opiniojn pri nia laboro! (059) 1) 输出 你们的关于我们的工作的意见 ! 2) 提出 你们的关于我们的工作的意见 ! OUTPUT your opinions about our work! Chu mi FARIS multajn erarojn en mia hejmtasko? (081) 1) Did I DO a lot of mistakes in my homework? 我在我的家庭作业里面 做 了许多错误吗 ? 2) Did I MAKE a lot of mistakes in my homework? 我在我的家庭作业里面 犯 了许多错误吗 ? 
La partio TRE zorgas la vivon de la popolamaso. (046) 1) The party VERY cares for the life of the masses. 2) The party VERY MUCH cares for the life of the masses. 党很关心人民群众的生活。 La suno levighas CHE oriento. (013) 1) The sun rises AT east. 2) The sun rises IN THE east. 太阳在东方升起。 POST unu monato komencighos la someraj ferioj. (014) 1) AFTER one month will begin the summer's holidays. 2) IN one month will begin the summer's holidays. 暑假在一月以后将开始。 La eksperimento pri mashina tradukado ANKORAU NE estas finita. (027) 1) The experiment about machine's translating STILL has been NOT finished. 关于机器的翻译的试验 仍然没有 被完成。 2) The experiment about machine's translating has been NOT finished YET. 关于机器的翻译的试验 还没有 被完成。 Ni esperas, ke li GAJNU championecon en la konkurso. (118) 1) We hope, that he WIN championship in the competition. 2) We hope, that he WILL WIN championship in the competition. 我们希望 , 让他在比赛里面赢得冠军。 Prenu la lingvon neutralan KIEL la bazon. (130) 1) Take the language neutral AS the base. 2) Take the language neutral FOR the base. 拿中立的语言作为基础。 通过 EChA 试验 , 我们深深体会到 , 同一语系中的语言转换较之不同语系容易许多。亲属关系越近 , 机器翻译对自动分析的精度要求也就越低 , 因而越容易推向实用。英语和汉语都是分析型语言 , 有很多类似的语言特点 , 即便如此 , 世英转换比世汉还是简单得多。只要建立一部世英自动词典 , 再加上一套形态转换算法 , 甚至无需进行层次和句法的分析 , 就可以实现词对词世英机器翻译。这样的译文尽管粗糙 , 但在相当程度上是可用的。我们对 EChA 综合第一线 ( 形态转换 ) 输出的未经调序 * 的中间译文作了统计 , 以不引起误解为标准 , 英语正确率为 95% (150/158) 左右 , 费解的有八句 (003)(010)(075)(095)(102)(108)(111)(141), 汉语正确率为 72% (113/158) 左右。排除形态转换中利用了句法分析结果的部分 , ( 但不排除第一线的虚词分析和转换 ), 英语正确率也在 80% 以上。如果在输出译文时 , 对前置宾格名词加上标识符 , 则可懂度还可提高。当然 , 我们试验的这 158 句总有一定的局限 , 所以上述统计也只具有相对意义。中国的机器翻译 , 从一开始研究的就是印欧和汉藏这两个没有亲属关系的语系间语言的自动转换 , 难度很大。这恐怕是我们的实用系统迟迟不能问世的重要原因之一。所以 , 中国机器翻译工作者肩上的担子更重 , 任务更艰巨 , 更需要独创和献身精神。这种不利的条件也有它的另一面 : 机器翻译与汉语结合带来的许多特别的问题 , 客观上使我们的研究比较深入。我国的机译研究就没有象欧美那样经历词对词翻译的第一代 , 而是直接从第二代句对句翻译开始 , 起点较高 , 并且在很短时间内 (60 年代初期 ) 就赶上了当时的世界先进水平。这显然与我们所研究的特定对象 ( 俄 - 汉 , 英 - 汉等 ) 的要求有关。 现在谈谈另一个问题 : 文学作品可不可以由机器翻译 ?
我们说完全可以 , 不过很困难。要把人在翻译文学作品时所遵循的规则 ( 其中很多是下意识的 ) 形式化算法化 , 显然不容易。即便做到了 , 经济上也不上算。所以 , 在相当长的时间内 , 除特别的实验需要外 , 人们一般不去花这个力气。 EChA 选译了两首诗歌 , 在这个方面做了粗浅的尝试 , 证明机器也可译诗。从译文看 , 英语比汉语美 , 保留了更多的节奏和韵律的特点 , 更象一首诗。汉语译文除了几句译得较好 ( 如 : 向永远战争着的世界 , / 它允诺神圣的和谐 ), 总体上看 , 更象一篇散文。这也难怪 , 因为 EChA 本来就不是专门为翻译诗歌而设计的。诗歌形式上的两个最大特点是节奏和尾韵。可以设想 , 诗歌机译系统的词典跟一般机器词典应有所不同 : 各词条的每一义项下集中了一批同义的目标语等价词。这些词长短不一 , 韵尾各异 , 供机器在诗歌综合时选用 , 正象人在写诗或译诗时常需要翻韵书一样。 一提机器翻译 , 人们总爱问 : 机器能够翻译文学作品吗 ? 为什么不能 ? 离散是对连续的逼近 , 机器智能是对人的智能的模拟 , 二者之间并没有一道不可逾越的鸿沟。从功能上看 , 机器和人没有什么不同。机器不过是无机体的人罢了。只要人会的事情 , 机器迟早也能会。机器的不会并不是它不能 , 而是人没有使它会 , 这正如文盲不会写字是因为没人教他一样。不过 , 机器胃口很刁 , 不懂 意会 , 只有 言传 ( 通过计算机语言 ) 才能教会它。可惜 , 对很多事 , 人至今还是知其然 , 并不知其所以然 , 无法传授。可见 , 机器的无能全由于人的无能。可人今天不知其所以然的 , 并不说明将来总也不知 , 所以从发展的观点看 , 机器和人一样是无所不能的。事实上 , 机器目前已能代替医生 , 译员和作曲家做部分工作 , 而且比技术较差的人做得还象样些 , 因为它 取法乎上 。即便人 , 也只有很少一部分专家能够从事这些工作。机器已经闯进了万物之灵的神圣禁地。 最后 , 一般地谈谈修辞问题。由于机器翻译至今多局限在实验室里 , 所以未予修辞而产生的阅读障碍 ( 包括心理障碍 ) 还不突出。但随着机器翻译的逐步实用化 , 修辞的必要性将越来越明显。前面所举的后期译文对初期译文的改进的实例 , 主要涉及的就是修辞。 1) 什么是机器翻译修辞 ? 机器翻译修辞是保证译文通顺的一个重要手段。它是机器语法之后译文综合的一部分 , 是自动翻译过程的最后一个环节。广义的修辞包括贯穿翻译全过程的 , 一切旨在促使译文通顺和美化的手段 , 譬如成语手段 ( 通过成语词典 ), 虚词分析 ( 通过虚词模块 ), 结构手段 ( 通过搭配关系 ) 等等。有些所谓多义区分 , 实际上也是一种修辞 , 例如 LUDI (PLAY) 可分为 玩 , 打 ( 球 ), 演奏 ( 乐器 ) 等义项 , 但 演奏 义下具体选择 拉 ( 提琴 , 胡琴 )(016), 弹 ( 钢琴 )(038) 还是 吹 ( 口琴 ) 就属于修辞了。 EChA 对于涉及多义的修辞 , 即目标语合适对等词的选择 , 就把它当作多义问题解决 ( 见 EChA 虚词模块 , 词类词义区分表和多义区分模块 ) 。一般来说 , 跟具体的词汇或语法现象联系很紧的修辞 , 以及其他个性较强的特例修辞 , 应该放在相应的词典或语法部分同时处理 , 而可以归出类别的修辞 , 则由最后独立的修辞模块统一解决。 机器翻译修辞具有某种超语言学的特征 , 属于翻译学范畴。我们知道 , 根据原语和译语的语言学角度的对比差异 , 就可以对所译文句实现转换 ( 主要是句型转换 ), 这是我们目前机器翻译的主体工作。但这样直接转换的句子不能保证其通顺 , 甚至也不能保证其正确 ( 即不被误解 ), 因为语言间 ( 尤其是没有亲属关系的语言间 ) 除了词汇语法等差异外 , 还有超语言学 ( 表达习惯 , 思维方式等等 ) 的差异存在 , 即翻译学角度的对比差异。例如 : nun DE LOKO flugu ghi AL LOKO (now FROM PLACE let it fly TO PLACE) (101) / 现在从 一个 地方让它飞到 另一个 地方吧 ( 从地方到地方 不符合汉语表达习惯 ) 。修辞主要是为消除这种差异而设置的。因此 , 只有翻译学角度的语言对比差异 , 才是修辞的根本依据。 2) 修辞的分类 可分作两大类 : 必要修辞和美修辞。必要修辞是保证译文正确可懂所必需的修辞 , 它是修辞的初级阶段。美修辞则是保证译文通顺畅达 , 甚至产生某种美感或帮助形成译文风格所要求的修辞 , 它是修辞的高级阶段。机器翻译修辞首先是作为必要修辞提出来的。必要修辞是基础 , 具有更大的迫切性 , 
是所有实用系统的必要组成部分 , 如形态修辞。这部分修辞数量很有限 , 一定量的研究就可以穷尽它。美修辞可以说是锦上添花。它是为机器译文不断提高质量 , 使之朝成熟 , 完美方向发展 , 以期赶上人工翻译的手段。可见 , 美修辞是无限发展的 , 它本身具有许多层次和侧面。修修补补远不能满足美修辞发展的需要。它要求体系和方法上的不断革新。就机器翻译的前景来说 , 美修辞的比重将逐渐变大。从严格的意义上讲 , 只有美修辞才真正体现修辞本身的特点和规律 , 因为必要修辞在一定的意义上不过是语法的推广 , 即可以算作广义的语法。它的手段跟机器语法没有根本的不同。在现行的 EChA 系统中 , 必要修辞就常常跟语法混在一起。 关于美修辞 , EChA 只是做了一点尝试。应该指出 , 机器翻译的美有自己的侧重点 , 它最推崇 通顺流畅 , 合乎习惯和简洁自然 , 其次是译文风格的形成。我们认为 , 机器译文的风格逐步形成 , 是完全可能的。因为从形式上看 , 风格的承担者主要是词汇 , 尤其是小词 ( 语气词 , 结构词 ), 其次 , 语法形式也有些不同。不同风格的形式特点 , 是可以为机器识辨和接受的。具体做法可以吸收计算风格学 (Computational stylistics) 的研究成果 , 去设计不同风格的译语修辞模型。风格可以有正规体 , 典雅体和口语体等等。正规体格式规范 , 清楚简单 , 给人的印象是客观公正 , 不假藻饰。典雅体的特点是虚词多用古字 ( 如 则 , 即 , 乃 , 便 , 故 , 且 , 其 , 及 等 ), 成语用的也较多 , 显得简洁古雅。口语体则比较松散自由 , 带有更多的语气词 ( 如 吗 , 呢 , 可不 , 是吗 , 啊 等 ) 。 _________________________________________________________________________________ 附注 : 参见 刘涌泉 中国的机器翻译 ( 情报科学 1980, 3 ) 研制世界语类型的机器翻译系统 , 从一开始就得到刘涌泉老师的热情支持 , 从方案主体到具体问题的处理 , 他都给以认真指导。在程序设计和上机调试的的过程中 , 刘倬老师也多次给予指导 , 有些基本操作的算法也是刘倬老师提供的。在 EChA 系统取得初步成果的时候 , 笔者向他们表示深切的感谢。另外 , 还要特别感谢机房韩老师的多方协助。没有她提供的方便 , EChA 系统根本不可能在这么短时间试验成功。 第 45 页—————————————————————————————————————————————— 1. Heinz Dieter MAAS Automata Tradukado en kaj el Esperanto ( Lingvo-kibernetiko kaj aliaj internacilingvaj aktoj de la IX-a Internacia Kongreso de Kibernetiko, pp 75-81, 1982 Gunter Narr Verlag Tubingen ) 2. 机器翻译论文选辑 ( 科学技术文献出版社 , 1979 ) 3. Kalocsay-Waringhien Plena Analiza Gramatiko de Esperanto ( 中国世界语出版社 , 1984 ) 4. 刘涌泉等著 中国的机器翻译 ( 知识出版社 , 1984 ) 5. 刘涌泉 , 高祖舜 , 刘倬著 机器翻译浅说 ( 科学普及出版社 , 1964 ) 6. 刘涌泉 , 李维 巴贝尔通天塔必将建成 ( 中国第一届世界语大会论文 , 1985.8 ) 7. 刘倬 三次机器翻译试验 ( 第一次机器翻译学术会议论文 , 1980.9 ) 论机器翻译规则系统的编制方法 ( 1982.3 上海 ) JFY 型英汉机器翻译系统的研制和试验 ( 语言学会第二届年会论文 , 1983.4 ) 8. 乔毅 开展语言的计算机处理和世界语类型的机器翻译 ( 中国第一届世界语大会论文 , 1985.8 ) 9. 魏原枢 , 徐文琪编 世界语语法 ( 上海外语教育出版社 , 1982 ) 10. 叶蜚声 , 徐通锵著 语言学纲要 ( 北京大学出版社 , 1981 ) 11. 语言和计算机 (1) ( 中国社会科学出版社 , 1982 ) 12. 语言和计算机 (2) ( 中国社会科学出版社 , 1985 ) 13. 
张道真编著 实用英语语法 ( 商务印书馆 , 1984 ) 第 46 页———————————————————————————————————————————————————— EChA 试验结果 (1) LA ORIGINALA TEKSTO / THE ORIGINAL TEXT / 世界语原文 (001) TIEL EVOLUIGHIS PLI KAJ PLI LA PLANADO PER MASHINOJ . (002) TIUJ MASHINOJ KOMENCE NUR ELKALKULIS LA DIKTITAJN MATEMATIKAJN PROBLEMOJN , KONFORME AL LA ENPROGRAMIGO . (003) LA ELEKTRONIKAN PROGRAMIGON PRETIGIS HOMOJ . (004) PLI POSTE , KIAM LA SCIODISKETOJ ESTIS ELTROVITAJ , LA PLENAN INDIKARON , ENDISKIGITAN , ONI METIS EN MASHINOJN KAJ ILI TIAMANIERE POVIS EN SI MEM AKUMULI SCIENCAN STOKON , PLI GRANDAN OL LA HOMA CERBO . (005) KAJ SE TEMIS EKZEMPLE PRI LA PLANADO DE ELEKTROMOTORO , ONI ENMETIS LA SHABLONDISKETON DE LA ELEKTROMOTOR-PLANADO , DONIS LA INDIKOJN DE LA DEZIRATA MOTORO ( KILOVATO , TENSIO , ROTACIO , TIPO , KTP ) , (006) POST KIO LA MASHINO MEM PROGRAMIGIS SIN KAJ FARIS LA KALKULOJN . POST KELKAJ MINUTOJ GHI JAM PRETE ELDONIS LA MEZUROJN : LA DIAMETRON DE LA ROTACIA PARTO , GHIAN LONGON, LA MEZUROJN DE LA KANELOJ , DRATOJ , LA VOLVONOMBRON , ENTUTE CHION BEZONATAN . (007) ECH PLI : BALDAU ESTIS ATINGITE , KE LA MASHINO FARIS LA TUTAN DESEGNON KAJ TRANSDONIS GHIN AL LA FABRIKO . (008) KOMPRENEBLE TIUJ DESEGNOJ NE ESTIS IDENTAJ KUN NIAJ PAPERDESEGNOJ . (009) ILI ESTIS DISKETOJ , KIUJ ENTENIS CHIUN DETALON . (010) TIAMANIERE LA PLANADON KAJ FABRIKADON DE LA MASHINOJ JAM PLENUMIS SAME MASHINOJ . (011) ILI PLANIS LA MENDITAN MASHINON , FABRIKIS , ECH KONTROLPROVIS GHIN KAJ LA FUSHAN FORJHETIS . (012) SED CHIO CHI ANKORAU OKAZIS SUB HOMA GVIDADO KAJ PLEJ GRAVE ESTIS , KE CHIO CHI BAZIGHIS SUR LA HOMA SCIO . LA TEKSTO TRADUKITA EN LA ANGLAN / THE TEXT TRANSLATED INTO ENGLISH / 英语译文 (001) SO DEVELOPED MORE AND MORE THE PLANNING BY MACHINES . (002) THOSE MACHINES AT BEGINNING ONLY CALCULATED OUT THE DICTATED MATHEMATICAL PROBLEMS , ACCORDING TO THE PROGRAMMING . (003) MEN PREPARED THE ELECTRONIC PROGRAMMING . 
(004) MORE LATER , WHEN THE KNOWLEDGE-DISKETTES HAD BEEN FOUND OUT , PEOPLE PUT THE FULL INDICATION , ENDISKED , INTO MACHINES AND THEY THEREFORE COULD IN THEMSELVES ACCUMULATE SCIENTIFIC STOCK , MORE GREAT THAN THE MAN'SBRAIN . (005) AND IF IT CONCERNED FOR EXAMPLE ABOUT THE PLANNING OF ELECTRIC MOTOR , PEOPLE INPUT THE SAMPLE DISKETTE OF THE MOTOR PLANNING , GAVE THE INDICATIONS OF THE DESIRED MOTOR ( KILOWATT , VOLTAGE , ROTATION , TYPE , ETC ) , AFTER WHICH THE MACHINE ITSELF PROGRAMMED ITSELF AND DID THE CALCULATIONS . (006) AFTER SEVERAL MINUTES IT ALREADY READILY GAVE OUT THE MEASUREMENTS : THE DIAMETER OF THE ROTARY PART ,ITS LENGTH , THE MEASUREMENTS OF THE GROOVES , WIRES , THE WINDING NUMBER , IN TOTAL ALL REQUIRED . (007) EVEN MORE : SOON IT HAD BEEN ACHIEVED , THAT THE MACHINE DID THE TOTAL DESIGN AND OVERHANDED IT TO THE FACTORY . (008) OF COURSE THOSE DESIGNS WERE NOT IDENTICAL WITH OUR PAPERDESIGNS . (009) THEY WERE DISKETTES , WHICH CARRIED ALL DETAIL . (010) THEREFORE MACHINES ALREADY FULFILED THE PLANNING AND MANUFACTURING OF THE MACHINES SAMELY . (011) THEY PLANNED THE ORDERED MACHINE , MANUFACTURED , EVEN EXAMINED IT AND THREW AWAY THE USELESS . (012) BUT ALL THIS STILL HAPPENED UNDER MAN'S GUIDING AND IT WAS MOST IMPORTANT , THAT ALL THIS WAS BASED ON THE MAN'S KNOWLEDGE . LA TEKSTO TRADUKITA EN LA CHINAN / THE TEXT TRANSLATED INTO CHINESE / 汉语译文 (001) 这样用机器设计越来越发展了。 (002) 那些机器开始时仅仅按照输入程序计算出所命令的数学问题。 (003) 人准 备了电子程序设计。 (004) 更以后 , 当微型知识磁盘被发明了时 , 人们把所写入磁盘的全套指令集合放到机器里面 , 他 ( 它 ) 们这 样能在自己本身里面积累比人的头脑更大的科学贮蓄。 (005) 如果涉及例如关于电动机的设计 , 人们输入了电动机设计的微 型样品磁盘 , 给了所希望的电动机的指标 ( 千瓦 , 电压 , 运转 , 型号 , 等等 ), 在此以后机器本身把自己程序化了 , 做了计算。 (006) 在几分钟以后它已经就能给出尺寸 : 运转部分的直径 , 它的长度 , 槽纹 , 导线的尺寸 , 圈数 , 总之所需要的一切。 (007) 甚至更 : 很 快达到了 , 机器做了整个图样 , 把它转交到工厂。 (008) 当然那些 图样 与我们的图纸不是一样的。 (009) 他 ( 它 ) 们是储有所 有细节的微型磁盘。 (010) 这样机器已经同样地完成了机器的设计和制造。 (011) 他 ( 它 ) 们设计了所定购的机器 , 制造了 , 甚 至检验了它 , 把废的抛弃了。 (012) 但是这一切仍然在人的指导下进行 , 最重要的是 , 这一切以人的知识作为基础 . 
(2) DIVERSAJ FRAZOJ / VARIOUS SENTENCES / 各类文句 (016) KIAM MI ESTIS LUDANTA VIOLONON , MIA ONKLO VIZITIS NIAN HEJMON . WHEN I WAS PLAYING VIOLIN , MY UNCLE VISITED OUR HOME . 当我 ( 当时 ) 正在拉小提琴时 , 我的叔叔访问了我的家。 (020) MI ESTOS FININTA LA EKSPERIMENTON PRI MASHINA TRADUKADO POST KELKAJ MONATOJ . I WILL HAVE FINISHED THE EXPERIMENT ABOUT MACHINE'S TRANSLATING IN SEVERAL MONTHS. 我在几月以后将已经完成关于机器的翻译的实验。 (028) BABELO NE ESTIS ELKONSTRUITA. BABEL HAD NOT BEEN BUILT UP . 巴贝尔塔没有被建成。 (029) NEPRE ESTOS ELKONSTRUITA LA NOVA BABELO . ABSOLUTELY WILL HAVE BEEN BUILT UP THE NEW BABEL . 新巴贝尔塔必然地将被建成。 (040) KIAL VI LERNAS ESPERANTON ? WHY DO YOU LEARN ESPERANTO ? 为什么你学习世界语 ? (044) NE PROKRASTU LA HODIAUAN LABORON GHIS MORGAU . DON'T PUT OFF THE TODAY'S WORK TILL TOMORROW . 别把今天的工作推迟到明天。 (045) KIEL BONE PENTRAS LA KNABO ! HOW WELL THE BOY PAINTS ! 男孩多么好地画画啊 ! (048) KIU ESTAS LA AUTORO DE LA LIBRO , KIUN VI JHUS LEGIS ? WHO IS THE AUTHOR OF THE BOOK , WHICH YOU JUST READ ? 你刚刚读了的书的作者是谁 ? (050) SE MI PARTOPRENUS EN VIA AMUZA AKTIVADO , MI ESTUS TRE GHOJA . IF I WOULD TAKE PART IN YOUR RECREATIONAL ACTIVITY , I WOULD BE VERY GLAD . 如果我参加你 ( 们 ) 的文娱活动 , 我会是很高兴的 . (056) CHU VI MEMORAS LA TAGOJN , KIAM NI KUNE STUDIS EN LA UNIVERSITATO ? DO YOU REMEMBER THE DAYS , WHEN WE TOGETHER STUDIED IN THE UNIVERSITY ? 你记得我们在一起在大学里面学习的日子吗 ? (058) UNUIGHU PROLETOJ DE CHIUJ LANDOJ ! LET PROLETARIANS OF ALL COUNTRIES UNITE ! 让所有国家的无产者联合吧 ! (061) KIEL SAGHA VI ESTAS ! HOW WISE YOU ARE ! 你是多么聪明啊 ! (062) ESPERANTO ESTAS INTERNACIA HELPA LINGVO . ESPERANTO IS INTERNATIONAL HELP LANGUAGE . 世界语是国际辅助语言。 (067) LIA PROPONO ESTAS , KE NI CHIUJ LIBERE ELMETU NIAJN OPINIOJN . HIS PROPOSAL IS , THAT WE ALL FREELY OUTPUT OUR OPINIONS . 他的建议是 , 让我们所有人自由地提出我们的意见。 (068) MI NE SCIAS , KIAM KOMENCIGHOS NIAJ FERIOJ . I DON'T KNOW , WHEN WILL BEGIN OUR HOLIDAYS . 我不知道 , 我们的假日什么时候将开始。 (069) LA LIBRO , KIU KUSHAS SUR LA TABLO , ESTAS VERDA . THE BOOK , WHICH LIES ON THE TABLE , IS GREEN . 
在桌子上躺的书是绿的。 (071) LA INFANO PLORAS , CHAR IU LIN BATIS . THE CHILD CRIES , BECAUSE SOMEBODY BEAT HIM . 小孩哭 , 因为某人打了他。 (078) LERNI ESPERANTON NE ESTAS MALFACILE . TO LEARN ESPERANTO IS NOT DIFFICULT . 学习世界语不是困难的。 (084) MI NE SCIAS , CHU VI POVAS PLENUMI TIUN CHI TASKON . I DON'T KNOW , WHETHER YOU CAN FULFIL THIS TASK . 我不知道 , 是否你能完成这个任务。 (086) MULTAJ DIVERSLANDAJ ESPERANTISTOJ CHEESTOS LA UNIVERSALAN KONGRESON DE ESPERANTO OKAZONTAN PEKINE . A LOT OF VARIOUS COUNTRY'S ESPERANTISTS WILL ATTEND THE UNIVERSAL CONGRESS OF ESPERANTO TO BE HELD IN BEIJING . 许多不同国家的世界语者将参加在北京将召开的世界语的国际大会。 (089) LIA PROPONO ELEKTI NOVAN PREZIDANTON NE ESTIS AKCEPTITA . HIS PROPOSAL TO ELECT NEW PRESIDENT HAD NOT BEEN ACCEPTED . 他的选举新总统的建议没有被接受。 (090) SHI ESTAS LA PLEJ BELA EL LA KNABINOJ . SHE IS THE MOST BEAUTIFUL OF THE GIRLS . 她在女孩里面是最漂亮的。 (092) FALINTE , LI NE POVIS RELEVIGHI . HAVING FALLEN , HE COULD NOT GET UP . 摔倒了 , 他不能重新起来。 (093) FORIRONTE , LI PREMIS MIAN MANON . TO GO AWAY , HE SHOOK MY HAND . 将要离去 , 他握了我的手。 (098) MI TRE AMAS ESPERANTON , MI PLI AMAS ESPERANTISTOJN , MI PLEJ AMAS LA IDEALON DE ESPERANTO . I VERY MUCH LOVE ESPERANTO , I MORE LOVE ESPERANTISTS , I MOST LOVE THE IDEAL OF ESPERANTO . 我很爱世界语 , 我更爱世界语者 , 我最爱世界语的理想。 (116) NI LUDU , CHU BONE ? LET'S PLAY , ALL RIGHT ? 让我们玩吧 , 好吗 ? (119) KIA MIRAKLO TIO ESTAS , KE NIAJ ANTIKVULOJ KONSTRUIS LA GRANDAN MURON NUR PER SIAJ DU MANOJ ! WHAT MIRACLE IT IS , THAT OUR ANCESTORS BUILT THE GREAT WALL ONLY BY THEIR TWO HANDS ! 我们的祖先仅仅用自己的两手建造了长城 , 这是怎样的奇迹啊 ! (121) FORPASIS UNU TAGO , FORPASIS ANKAU LA DUA . PASSED AWAY ONE DAY , PASSED AWAY ALSO THE SECOND . 一天过去了 , 第二也过去了。 (122) CHU ESTAS EBLE , KE VI NENION SCIAS ? IS IT POSSIBLE , THAT YOU KNOW NOTHING ? 你不知道任何事 , 这是可能的吗 ? (131) LA HOMON , PRI KIU VI PAROLAS , MI NENIAM VIDIS . I NEVER SAW THE MAN , ABOUT WHOM YOU SPEAK . 我从未看见过你提到的人。 (132) NI , ESPERANTISTOJ , DEVAS LABORI PLI ENERGIE OL IAM . WE , ESPERANTISTS , MUST WORK MORE HARD THAN EVER . 
我们 , 世界语者 , 应该比任何时候更努力工作。 (133) SOMERE ESTAS TRE VARME . IN SUMMER IT IS VERY HOT . 夏天是很热的。 (134) DOKTORO ZAMENHOF NASKIGHIS LA 15-AN DE DECEMBRO EN 1859 . DOCTOR ZAMENHOF WAS BORN ON THE 15TH OF DECEMBER IN 1859 . 柴门霍夫博士 1859 年十二月的 15 号出生。 (135) SE VI SCIUS , KIU LI ESTAS , VI LIN PLI ESTIMUS . IF YOU WOULD KNOW , WHO HE IS , YOU MORE WOULD ESTEEM HIM . 如果你知道 , 他是谁 , 你更会尊敬他。 (136) CENTOJ DA MALFERMAJ AUTOJ NIN PORTIS AL LA CENTRA LENIN-STADIONO , MALRAPIDE MOVIGHANTE TRA LA HOMA SVARMO . HUNDREDS OF OPEN CARS CARRIED US TO THE CENTRAL LENIN STADIUM , SLOWLY MOVING THROUGH THE MAN'S SWARM . 成百敞篷汽车把我们带到中央列宁运动场 , 缓慢地通过人群运动。 (137) MI VIDIS , KE LI FALIS KAJ LIA VESTO MALPURIGHIS . I SAW , THAT HE FELL AND HIS CLOTHES BECAME DIRTY . 我看见了 , 他摔倒了 , 他的衣服弄脏了。 (139) MI SCIIS , KE LI NE FAROS , KION LI PROMESIS . I KNEW , THAT HE WOULD NOT DO WHAT HE PROMISED . 我知道 , 他将不做他允诺的。 (140) ESTAS PAULO , KIU ARANGHIS LA AFERON . IT IS PAULO THAT ARRANGED THE AFFAIR . 是 PAULO 安排了事情。 (142) KUREGIS LA KNABO PER SIA TUTA FORTO , SED LI NE POVIS ATINGI LA PAPILION . RAN THE BOY BY HIS TOTAL STRENGTH , BUT HE COULD NOT ACHIEVE THE BUTTERFLY . 男孩用自己的整个力量狂奔 , 但是他不能达到蝴蝶。 (144) LI DONIS AL MI MULTAJN INSTRUAJN LIBROJN . HE GAVE ME A LOT OF TEACHING BOOKS . 他给了我许多教科书。 (145) CHU VI PAROLAS CHINE AU JAPANE ? DO YOU SPEAK IN CHINESE OR IN JAPANESE ? 你用中文还是用日文说话 ? (151) NUR TIU NE ERARAS , KIU NENIAM ION FARAS . ONLY THAT PERSON IS NOT WRONG , WHO NEVER DOES SOMETHING . 仅仅从不做某事的那个人不犯错误。 (155) ESPERANTO ESTAS CHIES PROPRAJHO . ESPERANTO IS EVERYBODY'S PROPERTY . 世界语是所有人的财产。 (156) MI MEMORAS CHIUN , KIUN MI VIDIS . I REMEMBER ALL , WHOM I SAW . 我记得我看见了的所有人。 (157) ESTAS NENIU EN LA CHAMBRO . THERE IS NOBODY IN THE ROOM . 在房间里面没有任何人。 第 页———————————————————————————————————————————————————— (3) DU POEMOJ / TWO POEMS / 两首诗歌 (099) LA ESPERO : ESPERANTISTA HIMNO ( POEMO FAR ZAMENHOF ) . 
(100) EN LA MONDON VENIS NOVA SENTO , TRA LA MONDO IRAS FORTA VOKO ; (101) PER FLUGILOJ DE FACILA VENTO , NUN DE LOKO FLUGU GHI AL LOKO . (102) NE AL GLAVO SANGONSOIFANTA , GHI LA HOMAN TIRAS FAMILION ; (103) AL LA MOND' ETERNE MILITANTA , GHI PROMESAS SANKTAN HARMONION . (099) THE HOPE : ESPERANTIST'S HYMN ( POEM BY ZAMENHOF ) . (100) INTO THE WORLD CAME NEW FEELING , OVER THE WORLD GOES STRONG VOICE ; (101) BY WINGS OF EASY WIND , NOW FROM PLACE LET IT FLY TO PLACE . (102) NOT TO SWORD BLOODTHIRSTY , IT PULLS THE MAN FAMILY ; (103) TO THE WORLD EVER FIGHTING , IT PROMISES SACRED HARMONY . (099) 希望 : 世界语者的颂歌 ( 柴门霍夫所作的诗歌 ) 。 (100) 新感觉来到了世界 , 有力的声音走遍世界 ; (101) 用顺风的翅膀 , 现在让它从一个地方飞到另一个地方吧。 (102) 它不把人的家庭 引到渴血的刀剑 ; (103) 向永远战争着的世界 , 它允诺神圣的和谐。 (104) AL NIA KARA LINGVO ( FAR IU NOVA ESPERANTISTO ) . (105) LA LINGVO GRACIA , KARA MIA , GHIS KIAM VI VENIS AL MI FINE FIN ? (106) ATENDIS SOIFE MI , ETERNE VIA , MI AMAS VIN ! (107) MI AMAS VIN VERE , PRUVU DIO , KAJ MIA BON-KORO BATAS NUR POR VI ; (108) NE PLU SEKRETETO ESTAS TIO : VIN AMAS MI ! (109) CHU KREDAS VI MIAN AMON MARAN ? (110) CHU KREDAS , KE MIA KORO FLAMAS ? (111) CHU KREDAS LA VORTON PURE KARAN : VIN MI AMAS ! (104) TO OUR DEAR LANGUAGE ( BY SOME NEW ESPERANTIST ) . (105) THE LANGUAGE GRACEFUL , MY DEAR , TILL WHEN YOU CAME TO ME AT LAST ? (106) WAITED LONGINGLY I , EVER YOURS , I LOVE YOU ! (107) I LOVE YOU TRUELY , LET GOD PROVE , AND MY GOOD HEART BEATS ONLY FOR YOU ; (108) NO LONGER THAT IS LITTLE SECRET : I LOVE YOU ! (109) DO YOU BELIEVE MY LOVE LIKE SEA ? (110) DO BELIEVE , THAT MY HEART BURNS ? (111) DO BELIEVE THE WORD PURELY DEAR : I LOVE YOU ! (104) 献给我们的亲爱的语言 ( 某新世界语者所作 ) 。 (105) 优美的语言 , 我的亲爱的 , 到什么时候你最后来到了我这儿 ? (106) 我渴望地等待 , 你的永远的 , 我爱你 ! (107) 我真实地爱你 , 让上帝证明吧 , 我的善良的心仅仅为了你跳动 ; (108) 那已经不再是小秘密 : 我爱你 ! (109) 你相信我的大海一样的爱吗 ? (110) 相信 , 我的心燃烧吗 ? (111) 相信纯粹地亲爱的词吗 : 我爱你 ! 
第 57 页———————————————————————————————————————————————————— 世界语摘要 Automata Tradukado el Esperanto en la Chinan kaj Anglan Lingvojn --pri EChA Mashintraduka Sistemo EChA (el Esperanto en la Chinan kaj Anglan Lingvojn) estas esperimenta mashintraduka sistemo, kiu ricevas Esperanton kiel fontolingvon kaj elmetas fine la chinan kaj anglan lingvojn kiel celolingvojn. Ghi estas fraz-al-fraza traduksistemo, en kiu la analizo de la fontolingvo kaj la sintezo de la celolingvoj sendependas unu de alia. La traduka procezo de EChA tute automatas, nebezonante antau-redakton kaj post-redakton. La tuta peniga laboro dauris unu jaron. La sistemo EChA establighis sur la mikro-komputero IBM-PC/XT kaj la progamiga komputero-lingvo estas BASIC (D 2.00). EChA estas subtenata de la CCDOS sistemo (t.e. PC DOS 2.10 kun la tenejode china ideografiajho). La chefa parto de EChA konsistas el 6 linioj da analiza-sinteza programo. Krome, en la sistemo ankau fondighis 3 mashinvortaroj kaj 2 vortotabeloj kune kun la programoj por ilin establi, konsulti, ekspansiigi kaj protekti. La tuta sistemo programighis je ch. 10,000 BASIC-frazoj. En chi tiu eksperimento ni ricevis el EChA la mashintradukajhon de pli ol 150 frazoj kun diversaj lingvistikaj trajtoj inkluzive 2 poemojn (la unua estas La Espero far Zamenhof). La tradukajho en la china kaj angla celolingvoj estas sufiche prava kaj facile komprenebla. ( Vd. la apendicon ) La originala materialo elektighis el: 1. Mashinmondo far Sandor Szhatmari; 2. Gramatiko de Esperanto (Wei Yuanshu kaj Xu Wenqi, 1982). En la sistemo EChA spegulighas la enhavo de la tuta baza gramatiko de Esperanto kun chefaj fraztipoj, tial ghi povas ghuste trakti plejmulton da fenomenoj en Esperanto. Tamen, bedaurinde, limigite de tempo kaj la kondicho de komputero, la kuranta sistemo estas ankorau malgranda, la mashinvortaroj ege limigitas. Kompreneble, la sistemo bezonas ekspansiighon kaj plibonighon. 
Since research on machine translation began in China in 1957, EChA is the first system to process Esperanto. In May 1986 the system will go before the degree committee for review, on the basis of which its designer will receive his master's degree.

Source-language input
    |
[DICTIONARIES]
1. Strip grammatical endings; consult the dictionaries (dictionary of inflected words, dictionary of uninflected words, dictionary of word groups, and the word table for distinguishing markers by part of speech)
    |
[ANALYSIS OF THE SOURCE LANGUAGE]
2. Process conjunctions and punctuation, split the sentence into parts, and handle the other uninflected words
3. Form CDC chains (the interlingua of EChA)
    |
[SYNTHESIS OF THE TARGET LANGUAGES]
4. Generate grammatical endings for English and insert auxiliary words for Chinese; disambiguate polysemous words; consult the English table of irregular words
5. Arrange the word order for English
6. Arrange the word order and polish the sentence for Chinese
    |
Target-language output

The EChA system consists of 3 subsystems:

1) Machine dictionaries between the source language and the target languages

This subsystem contains 5 dictionaries (tables) together with the algorithm for stripping grammatical endings in Esperanto. The first is a dictionary of inflected words, the second a dictionary of uninflected words, and the third handles word groups. The subsystem supplies all the necessary elementary information to the sentence field, which gives the later analysis and synthesis a sound basis.

2) Analysis of the source language

At this stage the subsystem determines the structural layers and semantic relations of the sentence being processed. The result is embodied in a highly formal interlingua, CDC. Analysis proceeds entirely independently of any target language, which is both necessary and easy to understand, since the system does not take any one language as its target. In fact, the designer plans to choose French and Russian as the third and fourth target languages for an expanded EChA.

CDC is the key to the EChA system. As a machine translation interlingua that holds the result of an independent analysis of the source language, it consists of morphological, syntactic, positional, node, layer and chain information. CDC not only describes the tree structure of the sentence accurately but also carries other useful information. In practice it gives the multilingual synthesis subsystem a sound basis.

The first program pass mainly targets the uninflected words, especially conjunctions and punctuation. In principle, a separate set of analysis rules has to be established for each uninflected word. Esperanto has only a fixed number of uninflected words, but their usage is very complex, much like that of function words in national languages. In fact, they chiefly reflect the individuality of the language and therefore need individual treatment.
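The ending-stripping step of subsystem 1 can be illustrated with a minimal Python sketch. This is not EChA's BASIC code or its actual algorithm, which the abstract does not give: the function name `strip_endings`, the `Analysis` record, and the tiny `UNINFLECTED` set are all hypothetical, and treating personal pronouns as table entries is an assumption made only for illustration. The sketch relies on the regular Esperanto endings (-o, -a, -e, plural -j, accusative -n, and the verb endings -as, -is, -os, -us, -u, -i) and uses the same ASCII spelling as the sample sentences (gh, ch, sh).

```python
from typing import FrozenSet, NamedTuple, Optional


class Analysis(NamedTuple):
    stem: str
    pos: str                 # "noun", "adjective", "adverb", "verb", "uninflected", "unknown"
    plural: bool
    accusative: bool
    tense: Optional[str]


# Hypothetical stand-in for the dictionary of uninflected words.
UNINFLECTED: FrozenSet[str] = frozenset(
    {"kaj", "ke", "la", "ne", "al", "de", "nur", "mi", "vi", "ghi"}
)
NOMINAL_ENDINGS = {"o": "noun", "a": "adjective", "e": "adverb"}
# Two-letter verb endings are listed before one-letter ones.
VERB_ENDINGS = (
    ("as", "present"), ("is", "past"), ("os", "future"),
    ("us", "conditional"), ("u", "imperative"), ("i", "infinitive"),
)


def strip_endings(word: str, uninflected: FrozenSet[str] = UNINFLECTED) -> Analysis:
    """Strip regular grammatical endings from one Esperanto word."""
    w = word.lower()
    # Look up uninflected words first, so that "kaj" is not read as a plural.
    if w in uninflected:
        return Analysis(w, "uninflected", False, False, None)
    accusative = w.endswith("n")
    if accusative:
        w = w[:-1]
        if w in uninflected:  # e.g. "vin" = "vi" + accusative -n
            return Analysis(w, "uninflected", False, True, None)
    plural = w.endswith("j")
    if plural:
        w = w[:-1]
    if len(w) > 1 and w[-1] in NOMINAL_ENDINGS:
        return Analysis(w[:-1], NOMINAL_ENDINGS[w[-1]], plural, accusative, None)
    if not (plural or accusative):
        for ending, tense in VERB_ENDINGS:
            if len(w) > len(ending) and w.endswith(ending):
                return Analysis(w[: -len(ending)], "verb", False, False, tense)
    return Analysis(word.lower(), "unknown", False, False, None)
```

For example, `strip_endings("lingvojn")` yields the stem `lingv` marked as a plural accusative noun, while `strip_endings("kaj")` is returned unchanged from the uninflected table. A real system would then look the stem up in the dictionary of inflected words.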
This first pass involves many difficulties, for example with the words KAJ and KE. In general, uninflected words carry more grammatical meanings. The task here is therefore exceptionally important for an automatic analysis system with Esperanto as its source language.

In the second pass the analysis is much more abstract. Processing consists of calling the subroutines cyclically; their core is the verb subroutine, which is in effect a mathematical model of Esperanto grammar. The analysis yields a CDC chain corresponding to the source sentence.

3) Synthesis of the target languages

The first pass of this stage also includes the rules for disambiguating polysemous words and choosing a suitable expression in the target languages, according to the semantic features, the CDC, and the semantic transfer rules of the word being processed.

For Chinese synthesis, the main task is to reorder the sentence, because word order is very free in Esperanto and very rigid in Chinese. The reordering information depends on both the rules of Chinese grammar and the CDC interlingua chain. After reordering, the translation also has to be improved and polished, especially by inserting Chinese auxiliary words, which can convey fine shades of tense, voice and mood as well as other nuances. As is well known, Chinese is an uninflected language in which grammatical endings are entirely absent.

For English, the conditions for synthesis are far more favorable. English nouns do not distinguish nominative from accusative, so the reordering step here aims to fix the sentence in the typical Subject-Predicate-Object (S-P-O) order. The other important task is to generate endings for English. In practice, the morphological transfer rules between the two languages are not complex.

Although EChA is only a small experimental system, it is rich in content. EChA not only performs morphological analysis (of the source language, Esperanto) but also generates morphological endings (for the target language, English).
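The S-P-O reordering for English can be sketched as follows. This is a toy illustration under strong assumptions, not EChA's method: it handles only a flat clause with a single verb, it assumes the English glosses have already been chosen, and it ignores the CDC chain entirely. `Token` and `reorder_spo` are hypothetical names. The idea is that the Esperanto accusative ending -n identifies the object wherever it stands, so the English order can be rebuilt from case marking alone.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    gloss: str        # English word already chosen for this token (assumed given)
    pos: str          # "verb" or any nominal category such as "noun" or "pronoun"
    accusative: bool  # True if the Esperanto source word carried the -n ending


def reorder_spo(tokens: List[Token]) -> List[str]:
    """Return the English glosses in Subject-Predicate-Object order."""
    # Non-verb tokens without -n are treated as the subject group.
    subject = [t.gloss for t in tokens if t.pos != "verb" and not t.accusative]
    predicate = [t.gloss for t in tokens if t.pos == "verb"]
    # Non-verb tokens with -n are treated as the object group.
    obj = [t.gloss for t in tokens if t.pos != "verb" and t.accusative]
    return subject + predicate + obj


# Sentence (108) ends with "VIN AMAS MI": object, verb, subject.
vin_amas_mi = [
    Token("you", "pronoun", True),
    Token("love", "verb", False),
    Token("I", "pronoun", False),
]
# Sentence (111) ends with "VIN MI AMAS": object, subject, verb.
vin_mi_amas = [
    Token("you", "pronoun", True),
    Token("I", "pronoun", False),
    Token("love", "verb", False),
]
```

Both `reorder_spo(vin_amas_mi)` and `reorder_spo(vin_mi_amas)` give `["I", "love", "you"]`, which matches the English output shown for sentences (108) and (111) in the appendix. Modifiers, subordinate clauses and English pronoun case would need the structural information that the abstract attributes to CDC.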
EChA also includes ordering rules (for Chinese and English) and polishing rules (for Chinese). In addition, it has its own interlingua, CDC, which has proved very effective. In short, EChA at least touches on all the problems of a practical system, so it is a genuinely typical, fully automatic model for a practical one-to-many translation system.

______________________________

I would like to express my heartfelt thanks to Professor Liu Yongquan and Professor Liu Zhuo. Without their guidance I could not have completed my experiment with the EChA system. From the very beginning Professor Liu Yongquan enthusiastically supported my EChA project and gave me much guiding advice during the experiment. Professor Liu Zhuo provided me with several algorithms for the elementary operations of machine translation. Thanks also to Ms. Han for her help in the computer room.

BIBLIOGRAPHY

1. Liu Yongquan, Gao Zushun and Liu Zhuo, Enkonduko de Mashina Tradukado ( Kexuepuji Publishing House, 1964 )
2. Liu Yongquan et al., La Mashina Tradukado en Chinio ( Zhishi Publishing House, 1984 )
3. La Elektita Traktataro pri Mashina Tradukado ( Kexuejishuwenxian Publishing House, 1979 )
4. Lingvo kaj Komputero (1) ( Zhongguoshehuikexue Publishing House, 1982 )
5. Lingvo kaj Komputero (2) ( Zhongguoshehuikexue Publishing House, 1985 )
6. Wei Yuanshu and Xu Wenqi, Gramatiko de Esperanto ( Shanghaiwaiyujiaoyu Publishing House, 1982 )
7. Kalocsay-Waringhien, Plena Analiza Gramatiko de Esperanto ( Zhongguoshijieyu Publishing House, 1984 )
8. Zhang Daozhen, Praktika Gramatiko de la Angla Lingvo ( Shangwu Publishing House, 1984 )
9. Ye Feisheng and Xu Tongqiang, Skeleto de Lingvistiko ( Beijingdaxue Publishing House, 1981 )
10. Liu Yongquan and Li Wei, Nepre Estos Konstruita la Nova Babelo, 1985, paper for the First Chinese Congress of Esperanto
11. Liu Zhuo, Tri Eksperimentoj pri Mashina Tradukado, 1980, paper for the First Chinese Congress of Machine Translation
12. Heinz Dieter MAAS, Automata Tradukado en kaj el Esperanto ( Lingvo-kibernetiko kaj aliaj internacilingvaj aktoj de la IX-a Internacia Kongreso de Kibernetiko, pp. 75-81, 1982, Gunter Narr Verlag, Tübingen )
13. J. Chiau, Lingvojn Komputere Prilaboru kaj Esperanton Mashine Tradukadu, 1985, paper for the First Chinese Congress of Esperanto

[Related]
Master's thesis: An Experiment in Automatic Translation from Esperanto into Chinese and English
Liwei's master's thesis: 1. Overview of EChA
Liwei's master's thesis: 2. Esperanto: linguistic features and research value
Liwei's master's thesis: 3. The hierarchical recursive constituent system
Liwei's master's thesis: 4. EChA machine dictionaries and word tables
Liwei's master's thesis: 5. Morphological analysis of Esperanto
Liwei's master's thesis: 6/7. Syntactic analysis of Esperanto
Liwei's master's thesis: 8. English morphological generation
Liwei's master's thesis: 9. Target-language reordering
Liwei's master's thesis: 10. Analysis of the EChA experimental results
Liwei's master's thesis: [Acknowledgments] [References]
Liwei's master's thesis: full text (Esperanto version)
"Zhaohua Wushi: shijie, anecdotes of a junior fellow student (3): Crazy about Esperanto"
Inspiration as if divinely given, craft that surpasses nature's work
"Liwei's Essays: Learn Esperanto Grammar in One Hour"
Liwei's Esperanto article (1987): "China Report: The Tower of Babel Will Surely Be Built"
Liwei's Esperanto paper (1986): "Automatic Translation from the International Language into Chinese and English"
Liwei (1988), "World Science and Technology: An Experiment in Automatic Translation from Esperanto into Chinese and English"
Background on the DLT project
PhD Thesis: Morpho-syntactic Interface in CPSG (cover page)
[On machine translation]
[Pinned: index of Liwei's NLP blog posts]
"Zhaohua Wushi": table of contents
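Liwei's Résumé

(1) Work experience

2006.11 to present
Chief Scientist; architect and designer of the natural language platform and core technology
The natural language platform he designed and developed powers a new-generation search engine for the enterprise market, which mainly searches the Internet for business intelligence, including product and technical information, customer feedback, and so on. The product has been adopted by the research and marketing departments of several Fortune 500 companies, which shows that the value it provides is hard for other search engines and tools to replace.

1997/11 to 2006/03
Cymfony, Research and Development Department, Buffalo, New York, USA
Principal Research Scientist
Vice President, NLP (from 1999)
Wrote research grant proposals and won 18 US government Small Business Innovation Research (SBIR) grants, serving as Principal Investigator (PI) or co-PI, to research and develop a new generation of information extraction (IE) technology based on natural language processing (NLP).
This technology is embodied in Cymfony's InfoXtract(TM) software family, which includes the InfoXtract NLP/IE engine, component technology, lexicon and grammar resources, the Finite State Transducer Toolkit, the Machine Learning Toolkit, and a development platform.
The software products built on it, Brand Dashboard and Digital Consumer Insight, scan and process coverage from thousands of media sources in real time, automatically extract the key information from brand coverage, and filter and consolidate it. The analyzed data give a full picture of brand trends and support decisions by large enterprises in building and protecting well-known brands as intangible assets, with a breadth and statistical precision that manual analysis can hardly match.
In 2000, helped secure 11 million in high-tech venture capital from Wall Street, which turned Cymfony from a company of two or three employees doing general Internet business into a high-tech small-to-medium enterprise with more than 70 employees, three offices (Boston and Buffalo in the US, and a branch in Mumbai, India), professional management, and an information technology (IT) marketing plan.
In 1999, led Cymfony's R&D team in the natural language question answering track of the Eighth Text Retrieval Conference (TREC-8), judged by the US National Institute of Standards and Technology (NIST), and took first place.
Cymfony's technology and growth have been covered by a range of media, including Fortune, The Wall Street Journal, The Buffalo News, and the Chinese-language World Journal. For its outstanding results in a series of SBIR projects, Cymfony was nominated for the 2002 US Small Business Administration Prime Contractor of the Year Award.

1987-1991
Institute of Linguistics, Chinese Academy of Social Sciences, Beijing
Assistant Researcher
Research in foreign-language-to-Chinese machine translation, natural language processing, and Chinese information processing.

1988-1991
Gaoli Software Company, Beijing
Senior Engineer (part-time)
Research and development of GLMT, the Gaoli English-Chinese machine translation system. Main work:
Developed and debugged 800 machine grammar rules
Designed and implemented the background knowledge base of the system's semantic module
Trained and supervised an eight-person team that built a machine translation dictionary of more than 60,000 entries and an expert lexicon rule base with over ten thousand lexical rules
Drove the productization of GLMT 1.0 at Gaoli (1992)
The MT technology was successfully transferred into the pocket electronic dictionary product line of the Hong Kong company Weiyida (韦易达)
GLMT passed official appraisal in January 1992 in the Beijing New Technology Industry Development Experimental Zone. It went on to win the Beijing Science and Technology Progress Award, the silver prize for computer application software at the INFORMATICS'92 international expo in Singapore, and the electronics-industry gold prize at the Second China Light of Science and Technology Expo in 1992, and was included in the Torch Program.

1988
Took on a contract project for the Dutch software company BSO, writing a "Formal Dependency Syntax of Chinese" to serve multilingual machine translation, which was well received.

(2) Education

2001: PhD in computational linguistics, Simon Fraser University, Canada
Dissertation: The Morpho-syntactic Interface in a Chinese Phrase Structure Grammar
This formal grammar of Chinese was successfully applied in an experimental bidirectional English-Chinese machine translation system, showing that the same grammar can serve both Chinese analysis and Chinese synthesis in a bidirectional system.
During the PhD, served several times as Research Assistant in the Natural Language Lab of the computing science department and as Teaching Assistant or Sessional Instructor in the linguistics department.
1991-1992: PhD candidate, Centre for Computational Linguistics, UMIST (CCL/UMIST), Manchester, UK
1986: Master's degree in machine translation, Department of Linguistics, Graduate School of the Chinese Academy of Social Sciences
Thesis: "Automatic Translation from Esperanto into English and Chinese", one of the few explorations of one-to-many machine translation systems in China.
1982: Bachelor's degree in English, Department of Foreign Languages, Anqing Normal College

(3) Awards

2001: Outstanding Achievement Award, Department of Linguistics, Simon Fraser University (award given to the best PhD graduates from the department)
1995-1997: G.R.E.A.T. Award, Science Council of British Columbia, Canada, intended to link applied PhD research with local high-tech companies
1997: President's Research Stipend
1996: Special travel grant from the ICCC conference, Singapore, to present a paper
1995: Graduate Fellowship
1992: Second prize for software, Chinese Academy of Social Sciences, for a machine translation database application developed with Fu Aiping
1991: Sino-British Friendship Scholarship (jointly funded by the Chinese Ministry of Education, the British Council and the Y. K. Pao Foundation) for further study in the UK

(4) Other professional activities

2002-2005: International editorial board member, Journal of Chinese Language and Computing, Singapore
1998-2004: Industrial Advisor, supervising more than 20 PhD or master's candidates on summer internship research projects with industrial application prospects (the interns came from the computer science or linguistics departments of the State University of New York at Buffalo)

(5) Publications

Srihari, R., W. Li and X. Li. 2006. Question Answering Supported by Multiple Levels of Information Extraction, a book chapter in T. Strzalkowski and S. Harabagiu (eds.), Advances in Open-Domain Question Answering. Springer, 2006, ISBN: 1-4020-4744-4.
Srihari, R., W. Li, C. Niu and T. Cornell. 2006. InfoXtract: A Customizable Intermediate Level Information Extraction Engine. Journal of Natural Language Engineering, 12(4), 1-37, 2006.
Niu, C., W. Li, R. Srihari, and H. Li. 2005. Word Independent Context Pair Classification Model For Word Sense Disambiguation. Proceedings of Ninth Conference on Computational Natural Language Learning (CoNLL-2005).
Srihari, R., W. Li, L. Crist and C. Niu. 2005. Intelligence Discovery Portal based on Corpus Level Information Extraction. Proceedings of 2005 International Conference on Intelligence Analysis Methods and Tools.
Niu, C., W. Li and R. Srihari. 2004. Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction. In Proceedings of ACL 2004.
Niu, C., W. Li, R. Srihari, H. Li and L. Crist. 2004.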
Context Clustering for Word Sense Disambiguation Based on Modeling Pairwise Context Similarities. In Proceedings of Senseval-3 Workshop.
Niu, C., W. Li, J. Ding, and R. Srihari. 2004. Orthographic Case Restoration Using Supervised Learning Without Manual Annotation. International Journal of Artificial Intelligence Tools, Vol. 13, No. 1, 2004.
Niu, C., W. Li and R. Srihari. 2004. A Bootstrapping Approach to Information Extraction Domain Porting. AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM), California.
Srihari, R., W. Li and C. Niu. 2004. Corpus-level Information Extraction. In Proceedings of International Conference on Natural Language Processing (ICON 2004), Hyderabad, India.
Li, W., X. Zhang, C. Niu, Y. Jiang, and R. Srihari. 2003. An Expert Lexicon Approach to Identifying English Phrasal Verbs. In Proceedings of ACL 2003. Sapporo, Japan. pp. 513-520.
Niu, C., W. Li, J. Ding, and R. Srihari. 2003. A Bootstrapping Approach to Named Entity Classification using Successive Learners. In Proceedings of ACL 2003. Sapporo, Japan. pp. 335-342.
Li, W., R. Srihari, C. Niu, and X. Li. 2003. Question Answering on a Case Insensitive Corpus. In Proceedings of Workshop on Multilingual Summarization and Question Answering - Machine Learning and Beyond (ACL-2003 Workshop). Sapporo, Japan. pp. 84-93.
Niu, C., W. Li, J. Ding, and R.K. Srihari. 2003. Bootstrapping for Named Entity Tagging using Concept-based Seeds. In Proceedings of HLT/NAACL 2003. Companion Volume, pp. 73-75, Edmonton, Canada.
Srihari, R., W. Li, C. Niu and T. Cornell. 2003. InfoXtract: A Customizable Intermediate Level Information Extraction Engine. In Proceedings of HLT/NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS). pp. 52-59, Edmonton, Canada.
Li, H., R. Srihari, C. Niu, and W. Li. 2003. InfoXtract Location Normalization: A Hybrid Approach to Geographic References in Information Extraction. In Proceedings of HLT/NAACL 2003 Workshop on Analysis of Geographic References. Edmonton, Canada.
Li, W., R. Srihari, C. Niu, and X. Li. 2003. Entity Profile Extraction from Large Corpora. In Proceedings of Pacific Association for Computational Linguistics 2003 (PACLING03). Halifax, Nova Scotia, Canada.
Niu, C., W. Li, R. Srihari, and L. Crist. 2003. Bootstrapping a Hidden Markov Model for Relationship Extraction Using Multi-level Contexts. In Proceedings of Pacific Association for Computational Linguistics 2003 (PACLING03). Halifax, Nova Scotia, Canada.
Niu, C., Z. Zheng, R. Srihari, H. Li, and W. Li. 2003. Unsupervised Learning for Verb Sense Disambiguation Using Both Trigger Words and Parsing Relations. In Proceedings of Pacific Association for Computational Linguistics 2003 (PACLING03). Halifax, Nova Scotia, Canada.
Niu, C., W. Li, J. Ding, and R.K. Srihari. 2003. Orthographic Case Restoration Using Supervised Learning Without Manual Annotation. In Proceedings of the Sixteenth International FLAIRS Conference, St. Augustine, FL, May 2003, pp. 402-406.
Srihari, R. and W. Li. 2003. Rapid Domain Porting of an Intermediate Level Information Extraction Engine. In Proceedings of International Conference on Natural Language Processing 2003.
Srihari, R., C. Niu, W. Li, and J. Ding. 2003. A Case Restoration Approach to Named Entity Tagging in Degraded Documents. In Proceedings of International Conference on Document Analysis and Recognition (ICDAR), Edinburgh, Scotland, Aug. 2003.
Li, H., R. Srihari, C. Niu and W. Li. 2002. Location Normalization for Information Extraction. In Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002). Taipei, Taiwan.
Li, W., R. Srihari, X. Li, M. Srikanth, X. Zhang and C. Niu. 2002. Extracting Exact Answers to Questions Based on Structural Links. In Proceedings of Multilingual Summarization and Question Answering (COLING-2002 Workshop). Taipei, Taiwan.
Srihari, R. and W. Li. 2000. A Question Answering System Supported by Information Extraction. In Proceedings of ANLP 2000. Seattle.
Srihari, R., C. Niu and W. Li. 2000. A Hybrid Approach for Named Entity and Sub-Type Tagging. In Proceedings of ANLP 2000. Seattle.
Li, W. 2000. On Chinese parsing without using a separate word segmenter. In Communications of COLIPS 10 (1). pp. 19-68. Singapore.
Srihari, R. and W. Li. 1999. Information Extraction Supported Question Answering. In Proceedings of TREC-8. Washington.
Srihari, R., M. Srikanth, C. Niu, and W. Li. 1999. Use of Maximum Entropy in Back-off Modeling for a Named Entity Tagger. Proceedings of HKK Conference, Waterloo, Canada.
W. Li. 1997. Chart Parsing Chinese Character Strings. In Proceedings of the Ninth North American Conference on Chinese Linguistics (NACCL-9). Victoria, Canada.
W. Li. 1996. Interaction of Syntax and Semantics in Parsing Chinese Transitive Patterns. In Proceedings of International Chinese Computing Conference (ICCC'96). Singapore.
W. Li and P. McFetridge. 1995. Handling Chinese NP Predicate in HPSG. Proceedings of PACLING-II, Brisbane, Australia.
Uej Li. 1991. Lingvistikaj trajtoj de la lingvo internacia Esperanto. In Serta gratulatoria in honorem Juan Régulo, Vol. IV. pp. 707-723. La Laguna: Universidad de La Laguna.
Z. Liu, A. Fu, and W. Li. 1989. JFY-IV Machine Translation System. In Proceedings of Machine Translation SUMMIT II. pp. 88-93, Munich.
Liu Zhuo, Fu Aiping and Li Wei (1992). A machine translation system based on word-expert technology. In Chen Zhaoxiong (ed.), New Advances in Machine Translation Research, Publishing House of Electronics Industry, pp. 231-242, Beijing.
Li Wei and Liu Zhuo (1990). Strategies for word sense identification in machine translation. Journal of Chinese Information Processing, 1990, No. 1, pp. 1-13, Beijing.
Liu Zhuo, Fu Aiping and Li Wei (1989). An outline of the JFY-IV machine translation system. Journal of Chinese Information Processing, 1989, No. 4, pp. 1-10, Beijing.
Li Wei (1988). The E-Ch/A machine translation system and its synthesis of the target languages Chinese and English. Journal of Chinese Information Processing, 1988, No. 1, pp. 56-60, Beijing.
Other publications (omitted)