科学网


tag 标签: church


相关日志

《泥沙龙笔记:【钟摆摆得太远】高大上,但有偏颇》
liwei999 2015-10-22 22:17
【Church: 钟摆摆得太远】

Nick: 基本是我神经网络简史的nlp版,将来我写little history of ai时,要偷点料。
Yu: 老李这篇译作很好,有很多思路。
我: 绝对精良,信达;雅不敢说。
毛: 这篇东西真是太好了。要是能经尼克的手再加上点作料和八卦,那就绝了。
Nick: 文中某些人我有料,写了一些,如乔、明等。但大料得等以后。
毛: 是啊,加点八角、茴香什么的,回下锅,就像回锅肉不是很好吃吗。
我: 这些人是科霸,不满他们的大有人在。有意思的是,church 当年也是犯上作乱一族,难得他能反思并重新介绍。更有意思的是,他介绍的东西其实站不住脚,譬如乔氏的中心递归论。最有意思的是,church 在万米高空相当准确地看清了天下大势,可是他的反思,价值主要在于自我批判,而不在于对老科霸的招魂。招魂实际上是指错了方向,让后学更糊涂了。
毛: 山上能观虎斗,高空当然更好。这篇东西有点像《The AI Debate》,不过还是太粗线条了一点。
我: 不容易了。再细的话,该写书了,而不是论文。
毛: 对的,就应该有这么一本书,看着才过瘾。
我: 写得实在精彩,否则也不会坐冷板凳一字一句翻译它。本来只是摘要介绍,后来看得上瘾,索性全文翻译了。
毛: 好,有贡献。
Yu: 老李,至少有我们读。想想有多少文章压根没人读。
我: 是啊。原译作的数字版是计算机学会的一整本杂志,下载奇慢,没法普及,如今转载了,手机上也能看了,读者会多一些了: http://www.almosthuman.cn/2015/10/21/mjsx2/#rd

译者按:肯尼斯·丘吉(Kenneth Church)是自然语言领域的泰斗,语料库语言学和机器学习的开拓者之一。丘吉的这篇长文《钟摆摆得太远》(A Pendulum Swung Too Far)是一篇主流反思的扛鼎之作。作者在文章中回顾了人工智能发展中,理性主义和经验主义各领风骚、此消彼长的历史规律,并预测了今后20年自然语言领域的发展趋势。文章的主旨是,我们这一代学者赶上了经验主义的黄金时代(1990年迄今),把唾手可得的低枝果实采用统计学方法采摘下来,留给下一代的都是“难啃的硬骨头”。20多年来,向统计学一边倒的趋势使得我们的教育失之偏颇。现在应该思考如何矫正,使下一代学者做好创新的准备,结合理性主义,把研究推向深入。丘吉的忧思溢于言表。丘吉预测,深度网络的热潮为主流经验主义添了一把火,将会继续主导自然语言领域十多年,从而延宕理性主义回归的日程表。但是他认为理性主义复兴的历史步伐不会改变。他对主流漠视理性主义的现状颇为忧虑,担心下一代学者会淹没在一波又一波的经验主义热潮中。

选自《中国计算机学会通讯》第9卷第12期。本文译自 Linguistics issues in Language Technology, 2011; 6(5),K. Church 的 “A Pendulum Swung Too Far” 一文。

【置顶:立委科学网博客NLP博文一览(定期更新版)】
个人分类: 立委科普|3542 次阅读|0 个评论
Church - 计算语言学课程的缺陷 (翻译节选)
热度 2 liwei999 2013-10-3 08:16
节选译自:K. Church 2011. A Pendulum Swung Too Far. Linguistics issues in Language Technology, Volume 6, Issue 5.

3.5 无视历史注定要重复历史错误

在多数情况下,机器学习、信息检索和语音识别方面的实证复兴派干脆无视 PCM(Pierce, Chomsky and Minsky)的论点,虽然神经网络给感知机增加隐藏层可以看作是对敏斯基和帕佩特批评的让步。尽管如此,敏斯基和帕佩特(1988)对敏斯基和帕佩特(1969)【感知机】出版以来的20年领域进展之缓慢深表失望。

“在编写这一版时,我们本来准备根据进展‘把这些理论更新’。但是,当我们发现自本书1969年第一版以来,没有看见什么有意义的进展,我们认为保留原文更为有利……只需加一个尾声即可。……这个领域进展如此缓慢的原因之一是,不熟悉领域历史的研究人员继续犯别人以前已经犯过的错误。有些读者听说该领域没有什么进步,可能会感到震惊。难道感知机类的神经网络(新名称叫 connectionism,连通主义)没有成为热烈讨论的主题么?是的,的确存在很大的兴趣,很多的讨论。可能确实也有些现在的发现在未来也许会显出重要性。但可以肯定地说,领域的概念基础并没有明显改变。今天引起兴奋的问题似乎与前几轮的兴奋大同小异……。我们的立场依然是当年我们写这本书时的立场:我们相信这个领域的工作是极为重要和丰富的,但我们预计其增长需要一定程度的批判性分析,可这种分析在我们更浪漫的倡导者那里却一直似乎没有人愿意去做,也许因为连通主义的精神似乎变得与严谨分析南辕北辙。”(Minsky and Papert 1988,前言,第vii页)

计算语言学课程的缺陷

正如敏斯基和帕佩特上面指出的,我们之所以不断犯同样的错误与我们的教学有关。辩论的一方在当代计算语言学教科书中不再提及,已被淡忘,需要下一代人重新认识和复活它。当代的计算语言学教科书很少介绍 PCM 三位前辈。皮尔斯在汝拉夫斯基和马丁编著的教科书(Jurafsky and Martin 2000)以及曼宁等编著的两本教科书中(Manning and Schütze 1999;Manning et al. 2008)根本没有提及。敏斯基对感知机的批评只在三本教科书之一中简要提起(Manning and Schütze 1999,第603页)。刚入门的新学生也许意识不到所谓“相关的学习算法”(见下文粗斜体)其实包含了当今领域非常流行的方法,如线性回归和 logistic 回归(linear and logistic regression)。

“一些其他的梯度下降算法(gradient descent algorithms)也有类似的收敛定理,但是多数情况下,收敛只能达到局部最优。……感知机收敛能达到全局最优是因为它们选用了线性分离机这样比较简单的分类模型。很多重要的问题是线性不可分的,其中最著名的是异或(XOR)问题。……决策树(decision tree)算法可以处理这样的问题,而感知机则不能。研究人员在对神经网络的最初热情(Rosenblatt 1962)以后,开始意识到这些局限。其结果是,对于神经网络及其相关的学习算法的兴趣很快消退,此后几十年一直一蹶不振。敏斯基和帕佩特的论文(Minsky and Papert 1969)通常被认为是这类学习算法式微的起点。”

曼宁等 2008 版教科书(Manning et al. 2008)只简短提及敏斯基和帕佩特 1988 年的论文(Minsky and Papert 1988),把它作为感知机的一个很好的描述文献,但并未提及文中的尖锐批评:

“对上面提到但本章未及细述的算法感兴趣的读者可以参阅以下文献:神经网络方面有 Bishop (2006),线性回归和 logistic 回归方面有 Hastie et al. (2001),感知机算法方面有 Minsky and Papert (1988)。”(Manning et al. 2008,第292页)

根据这样的文献指引,一个学生可能得出错误印象,以为敏斯基和帕佩特是这些神经网络算法(以及当今流行的线性回归和 logistic 回归这类方法)的赞许者。毕晓普明确指出,敏斯基和帕佩特绝不是感知机和神经网络的赞许者,而且把他们的工作认作“不正确的构想”(“incorrect conjecture”)予以排斥(Bishop 2006,第193页)。毕晓普把神经网络在实际应用中的普及看做是对敏斯基和帕佩特批评的反证,认为并非如他们所说的那样“没有多少改变”,“多层网络并不比感知机更有能力识别连通性(connectedness)”。

当代教科书应该教授给学生像神经网络这类有用的近似方法的优点和缺点。辩论双方都大有可言。排除任何一方的论证都是对我们的下一代不负责任,尤其是当其中一方的批评是如此的尖锐,用到“不正确的构想”和“没有多少改变”这样的说法。

乔姆斯基比皮尔斯和敏斯基在当代教科书中被提及多一些。曼宁和舒兹的教科书(Manning and Schütze 1999)引用乔姆斯基10次,汝拉夫斯基和马丁的教科书(Jurafsky and Martin 2000)的索引中共有27处文献指向乔姆斯基。第一本书中较少引用是因为它专注于一个相对狭窄的话题,统计型自然语言处理。而第二本教科书涉及面广泛得多,包括音韵学和语音。因此,第二本书还引用了乔姆斯基的音韵学工作(Chomsky and Halle 1968)。

两本教科书都提到乔姆斯基对有限状态方法的批评,以及这些批评在当时对经验主义方法论的打击性效果。但是话题迅速转移到描述这些方法的复兴,却相对较少讨论其论点、经验主义回归的动因及其对目前实践以及未来的影响。

汝拉夫斯基和马丁的教科书第230-231页写道(Jurafsky and Martin 2000):

“在一系列极具影响力的论文中,始于乔姆斯基(1956),包括乔姆斯基(1957)以及米勒和乔姆斯基(1963)(Miller and Chomsky 1963),诺姆·乔姆斯基认为,‘有限状态的马尔可夫过程’虽然可能是有用的工程近似方法,却不可能成为人类语法知识的完整认知模型。当时的这些论证促使许多语言学家和计算语言学家完全脱离了统计模型。

“N元模型的回归开始于耶利内克等(Jelinek, Mercer, Bahl)的工作。……”

两本教科书介绍N元文法都是从引用其优缺点的讨论开始(Jurafsky and Martin 2000,第191页):

“但是必须认识到,所谓‘一个句子的概率’是一个完全无用的概念,无论怎样理解这个术语。”(Chomsky 1965,第57页)

“任何时候,只要一个语言学家离开研究组,识别率就会上升。”(Fred Jelinek,当时他在 IBM 语音组,1988)

曼宁和舒兹(1999,第2页)是以这样的引用开始讨论的:

“统计的考量对于理解语言的操作与发展至关重要。”(Lyons 1968,第98页)

“一个人对合法语句的产生和识别能力不是基于统计近似的概念之类。”(Chomsky 1957,第16页)

这样正反面观点的引用确实给学生介绍了争议的存在,但却不能真正帮助学生明白这些争议意味着什么。我们应提醒学生,乔姆斯基反对的是一些如今极其流行的有限状态方法,包括N元文法和隐马尔可夫模型,因为他相信这些方法无法捕捉远距离的依存关系(例如,一致关系的限制条件和 wh-位移现象)。

乔姆斯基的立场直到今天仍然是有争议的,本文审阅者之一的反对意见也佐证了这种争议。我不希望此时在这场辩论中站在某一方。我只是要求我们应该教给下一代辩论双方的说辞,使他们不需要重新发现任何一方。

计算语言学学生应该接受普通语言学和语音学的培训

为了给进入这行的学生为低垂果实采摘完后的情形做好准备,今天的学生教育应该向广度发展,他们应该全面学习语言学的主要分支,如句法、词法、音韵学、语音学、历史语言学以及语言共性。我们目前毕业的计算语言学学生视野太窄,专业性太强,他们对于一个很专门的领域具有深入的知识(如机器学习和统计型机器翻译),但可能没听说过很多著名的语言学现象,譬如,格林伯格共性(Greenberg's Universals)、提升(Raising)、等同(Equi)、量词辖域(quantifier scope)、空缺(gapping)、孤岛条件(island constraints)等。我们应该确保参与指代(co-reference)研究的学生都知道c-统制(c-command)和指称相异(disjoint reference)。当学生在计算语言学会议上宣讲论文之前,他们应该了解形式语言学(Formal Linguistics)对此问题的标准处理。
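上文引述的感知机局限(线性不可分的 XOR 问题)以及“给感知机增加隐藏层”这一让步,可以用几行代码直观体会。以下是笔者补充的示意性草稿,并非 Church 原文内容:单层感知机在 XOR 上永远达不到全对,而一个手工设定权重的单隐藏层网络可以精确实现 XOR。

```python
# Illustrative sketch (not from Church's paper): a Rosenblatt-style
# perceptron cannot fit XOR because no single line separates the
# classes, while one hidden layer of threshold units removes the limit.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR labels

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic perceptron updates: w += lr * (target - prediction) * x."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = float(w @ xi + b > 0)
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

w, b = train_perceptron(X, y)
perceptron_acc = float(np.mean((X @ w + b > 0) == y))  # at best 3/4

def mlp_xor(x):
    """Two hand-set hidden threshold units: XOR = OR(x) and not AND(x)."""
    h_or = float(x[0] + x[1] > 0.5)   # hidden unit 1: OR
    h_and = float(x[0] + x[1] > 1.5)  # hidden unit 2: AND
    return float(h_or - h_and > 0.5)

print("perceptron accuracy on XOR:", perceptron_acc)
print("hidden-layer network on XOR:", [mlp_xor(x) for x in X])
```

无论训练多少轮,感知机在 XOR 四个样本上的准确率都不可能达到 1.0(因为不存在线性分离面);而隐藏层网络恰好输出 [0, 1, 1, 0]。这正是教科书应当向学生同时讲清的“长处与缺陷”。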
语音识别工作的学生需要了解词的重音(如:Chomsky and Halle 1968)。音韵学重音对于下游语音和声学过程具有相当的影响。

图3 “politics”和“political”的谱图显示有三个/l/同位音,在重音前后出现不同的音位变体。

语音识别目前没有充分利用单词重音特征是一个不小的遗憾,因为重音强调是语音信号中最突出的特性之一。图3显示了最小对立体“politics”和“political”的波形和谱图。这两个词千差万别,目前的技术着重于语音单位层面的区别:

1. “Politics”以 -s 结尾,而“political”以 -al 结尾。
2. 与“politics”不同,“political”中第一个元音是弱化的央元音(schwa)。

重音的区别更为突出。在诸多与重音有关的区别中,图3突出了重音前与重音后/l/同位音之间的区别。另外还有对/t/音的影响:“politics”中/t/是送气音,但在“political”中却是闪音。

目前,在语音单位层面(segmental level),仍有大量低悬水果的工作,但这些工作终有完结之时。我们应该教给语音识别的学生有关音韵学和词重音的知识,以便他们在技术瓶颈已经超越语音单位层面以后依然游刃有余。既然存在与重音相关、超过三元语音单位(tri-phone)的远距离关系,重音方面的进展需要对目前流行的近似方法的长处与缺陷均有深入的理解。语音识别方面的基础性进展,譬如能有效使用重音,很可能要依赖于基础技术的进步。

~~~~~~~~~~~~~~~~~~~~~~~~

3.5 Those Who Ignore History Are Doomed To Repeat It

For the most part, the empirical revivals in Machine Learning, Information Retrieval and Speech Recognition have simply ignored PCM's arguments, though in the case of neural nets, the addition of hidden layers to perceptrons could be viewed as a concession to Minsky and Papert. Despite such concessions, Minsky and Papert (1988) expressed disappointment with the lack of progress since Minsky and Papert (1969).

“In preparing this edition we were tempted to ‘bring those theories up to date.’ But when we found that little of significance had changed since 1969, when the book was first published, we concluded that it would be more useful to keep the original text ... and add an epilogue. ... One reason why progress has been so slow in this field is that researchers unfamiliar with its history have continued to make many of the same mistakes that others have made before them. Some readers may be shocked to hear it said that little of significance has happened in the field. Have not perceptron-like networks - under the new name connectionism - become a major subject of discussion? ... Certainly, yes, in that there is a great deal of interest and discussion.
Possibly yes, in the sense that discoveries have been made that may, in time, turn out to be of fundamental importance. But certainly no, in that there has been little clear-cut change in the conceptual basis of the field. The issues that give rise to excitement today seem much the same as those that were responsible for previous rounds of excitement. ... Our position remains what it was when we wrote the book: We believe this realm of work to be immensely important and rich, but we expect its growth to require a degree of critical analysis that its more romantic advocates have always been reluctant to pursue - perhaps because the spirit of connectionism seems itself to go somewhat against the grain of analytic rigor.” (Minsky and Papert 1988, Prologue, p. vii)

Gaps in Courses on Computational Linguistics

Part of the reason why we keep making the same mistakes, as Minsky and Papert mentioned above, has to do with teaching. One side of the debate is written out of the textbooks and forgotten, only to be revived/reinvented by the next generation. Contemporary textbooks in computational linguistics have remarkably little to say about PCM. Pierce isn't mentioned in Jurafsky and Martin (2000), Manning and Schütze (1999) or Manning et al. (2008). Minsky's criticism of Perceptrons is briefly mentioned in just one of the three textbooks: Manning and Schütze (1999, p. 603). A student new to the field might not appreciate that the reference to “related learning algorithms” (see bold italics below) includes a number of methods that are currently very popular such as linear and logistic regression.

“There are similar convergence theorems for some other gradient descent algorithms, but in most cases convergence will only be to a local optimum. ... Perceptrons converge to a global optimum because they select a classifier from a class of simpler models, the linear separators. There are many important problems that are not linearly separable, the most famous being the XOR problem ...
A decision tree can learn such a problem whereas a perceptron cannot. After some initial enthusiasm about Perceptrons (Rosenblatt, 1962), researchers realized these limitations. As a consequence, interest in perceptrons and related learning algorithms faded quickly and remained low for decades. The publication of Minsky and Papert (1969) is often seen as the point at which the interest in this genre of learning algorithms started to wane.”

Manning et al. (2008) have a brief reference to Minsky and Papert (1988) as a good description of perceptrons, with no mention of the sharp criticism.

“Readers interested in algorithms mentioned, but not described in this chapter, may wish to consult Bishop (2006) for neural networks, Hastie et al. (2001) for linear and logistic regression, and Minsky and Papert (1988) for the perceptron algorithm.”

Based on this description, a student might come away with the mistaken impression that Minsky and Papert are fans of perceptrons (and currently popular related methods such as linear and logistic regression).

Bishop (2006, p. 193) makes it clear that Minsky and Papert are no fans of perceptrons and neural networks, but dismisses their work as “incorrect conjecture”. Bishop points to widespread use of neural networks in practical application as counter-evidence to Minsky and Papert's claim above that “not much has changed” and “multilayer networks will be no more able to recognize connectedness than are perceptrons.”

Contemporary textbooks ought to teach both the strengths and the weaknesses of useful approximations such as neural networks. Both sides of the debate have much to offer. We do the next generation a disservice when we dismiss one side or the other with harsh words like “incorrect conjecture” and “not much has changed.”

Chomsky receives more coverage than Pierce and Minsky in contemporary textbooks. There are 10 references to Chomsky in the index of Manning and Schütze (1999) and 27 in the index of Jurafsky and Martin (2000).
The first textbook has fewer references because it focuses on a relatively narrow topic, Statistical Natural Language Processing, whereas the second textbook takes a broader cut across a wider range of topics including phonology and speech. Thus, the second textbook, unlike the first textbook, cites Chomsky's work in phonology: Chomsky and Halle (1968).

Both textbooks mention Chomsky's criticism of finite-state methods and the devastating effect that they had on empirical methods at the time, though they quickly move on to describe the revival of such methods, with relatively little discussion of the argument, motivations for the revival, and implications for current practice and the future.

“In a series of extremely influential papers starting with Chomsky (1956) and including Chomsky (1957) and Miller and Chomsky (1963), Noam Chomsky argued that ‘finite-state Markov processes,’ while a possibly useful engineering heuristic, were incapable of being a complete cognitive model of human grammatical knowledge. These arguments led many linguists and computational linguists away from statistical models altogether.

“The resurgence of N-gram models came from Jelinek, Mercer, Bahl. …”

Both books also start the ngram discussion with a few quotes, pro and con.

“But it must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.” (Chomsky 1965, p. 57)

“Anytime a linguist leaves the group the recognition rate goes up.” (Fred Jelinek, then of IBM speech group, 1988)

Manning and Schütze (1999, p. 2) starts the discussion with these quotes:

“Statistical considerations are essential to an understanding of the operation and development of languages.” (Lyons 1968, p. 98)

“One's ability to produce and recognize grammatical utterances is not based on notions of statistical approximations and the like.” (Chomsky 1957, p.
16)

Such quotes introduce the student to the existence of a controversy, but they don't help the student appreciate what it means for them. We should remind students that Chomsky objected to a number of finite-state methods that are extremely popular today including ngrams and Hidden Markov Models because he believed such methods cannot capture long-distance dependences (e.g., agreement constraints and wh-movement).

Chomsky's position remains controversial to this day, as evidenced by an objection from one of the reviewers. I do not wish to take a position on this debate here. I am merely asking that we teach both sides of this debate to the next generation so they won't reinvent whichever side we fail to teach.

Educating Computational Linguistics Students in General Linguistics and Phonetics

To prepare students for what might come after the low hanging fruit has been picked over, it would be good to provide today's students with a broad education that makes room for many topics in Linguistics such as syntax, morphology, phonology, phonetics, historical linguistics and language universals. We are graduating Computational Linguistics students these days that have very deep knowledge of one particular narrow sub-area (such as machine learning and statistical machine translation) but may not have heard of Greenberg's Universals, Raising, Equi, quantifier scope, gapping, island constraints and so on. We should make sure that students working on co-reference know about c-command and disjoint reference. When students present a paper at a Computational Linguistics conference, they should be expected to know the standard treatment of the topic in Formal Linguistics.

Students working on speech recognition need to know about lexical stress (e.g., Chomsky and Halle (1968)). Phonological stress has all sorts of consequences on downstream phonetic and acoustic processes.
Speech recognizers currently don't do much with lexical stress which seems like a missed opportunity since stress is one of the more salient properties in the speech signal. Figure 3 shows wave forms and spectrograms for the minimal pair: “politics” and “political.” There are many differences between these two words. The technology currently focuses on differences at the segmental level:

1. “Politics” ends with -s whereas “political” ends with -al.
2. The first vowel in “political” is a reduced schwa unlike the first vowel in “politics.”

The differences in stress are even more salient. Among the many stress-related differences, Figure 3 calls out the differences between pre-stress and post-stress allophones of /l/. There are also consequences in the /t/s; /t/ is aspirated in “politics” and flapped in “political.”

Currently, there is still plenty of low-hanging fruit to work on at the segmental level, but eventually the state of the art will get past those bottlenecks. We ought to teach students in speech recognition about the phonology and acoustic-phonetics of lexical stress, so they will be ready when the state of the art advances past the current bottlenecks at the segmental level. Since there are long-distance dependencies associated with stress that span over more than tri-phones, progress on stress will require a solid understanding of the strengths and weaknesses of currently popular approximations. Fundamental advances in speech recognition, such as effective use of stress, will likely require fundamental advances to the technology.

【置顶:立委科学网博客NLP博文一览(定期更新版)】
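上文乔姆斯基与耶利内克关于“一个句子的概率”的争论,可以用一个极小的二元文法(bigram)示例来直观体会。以下为笔者补充的玩具示例,语料与加一平滑的选择均为虚构,仅说明 N元模型如何给句子概率以可计算的含义:

```python
# Toy bigram language model (corpus is made up, add-one smoothed):
# it assigns a concrete number to "the probability of a sentence",
# the very notion Chomsky dismissed and Jelinek's group engineered.
from collections import Counter

corpus = "the dog runs . the cat runs . the dog sleeps .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(unigrams)  # vocabulary size, used for add-one smoothing

def sentence_prob(words):
    """P(w1..wn) ~= product over i of P(w_i | w_{i-1}), add-one smoothed."""
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)
    return p

print(sentence_prob("the dog runs".split()))
print(sentence_prob("runs dog the".split()))  # scrambled order scores lower
```

模型给见过的词序打出比乱序更高的概率,这正是工程上有用的“近似”;而乔姆斯基的异议在于,这类近似无法刻画一致关系、wh-位移这样的远距离依存,也不构成语法能力的认知模型。两面都应该教给学生。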
个人分类: 立委科普|8259 次阅读|5 个评论
[转载]The inscription in the church of Westminster
hslqdkxw 2013-5-31 09:35
在威斯敏斯特教堂旁边,矗立着一块墓碑,上面刻着一段非常著名的话:

“当我年轻的时候,我梦想改变这个世界;当我成熟以后,我发现我不能够改变这个世界,我将目光缩短了些,决定只改变我的国家;当我进入暮年以后,我发现我不能够改变我们的国家,我的最后愿望仅仅是改变一下我的家庭,但是,这也不可能。当我现在躺在床上,行将就木时,我突然意识到:如果一开始我仅仅去改变我自己,然后,我可能改变我的家庭;在家人的帮助和鼓励下,我可能为国家做一些事情;然后,谁知道呢?我甚至可能改变这个世界。”

其实我们最需要的是改变自己。怎样才能改变自己呢?

“如果你们听过他的道,领了他的教,学了他的真理,就要脱去你们从前行为上的旧人。这旧人是因私欲的迷惑,渐渐变坏的。又要将你们的心志改换一新,并且穿上新人,这新人是照着神的形象造的,有真理的仁义和圣洁。”(以弗所书 4:21-24)

The inscription in the church of Westminster

Next to the church in Westminster, there is a tombstone inscribed with the following famous words:

“When I was young and free and my imagination had no limits, I dreamed of changing the world. As I grew older and wiser, I discovered the world would not change, so I shortened my sights somewhat and decided to change only my country. But it too, seemed immovable. As I grew into my twilight years, in one last desperate attempt, I settled for changing only my family, those closest to me, but alas, they would have none of it. And now as I lie on my deathbed, I suddenly realise: if I had only changed myself first, then by example I would have changed my family. From their inspiration and encouragement, I would then have been able to better my country and, who knows, I may have even changed the world.”

Actually what we most need is to change ourselves. But how can we change ourselves?

“For surely you have heard about him and were taught in him, as truth is in Jesus. You were taught to put away your former way of life, your old self, corrupt and deluded by its lusts, and to be renewed in the spirit of your minds, and to clothe yourselves with the new self, created according to the likeness of God in true righteousness and holiness.” (Ephesians 4:21-24)
个人分类: 研究生|1734 次阅读|0 个评论

Archiver|手机版|科学网 ( 京ICP备07017567号-12 )


Powered by ScienceNet.cn

Copyright © 2007- 中国科学报社
