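The Collaborative Innovation Center of Natural Products in the Three Gorges Reservoir Area (iCCNP) builds on the reservoir area's distinctive medicinal-herb resources. By aligning with the national 2011 Plan, it aims to promote collaborative innovation among government, industry, universities and research institutes, and to raise the collaborative innovation capacity of the traditional Chinese medicine industry and of new drug development. The center is built on the Chongqing Strategic Alliance for Technological Innovation in New Drug Development (重庆市新药创制产业技术创新战略联盟). Guided by the Chongqing Science and Technology Commission and led by Southwest University, it forms a government-industry-university-research-application consortium whose core members include Chongqing University, the Chongqing Academy of Chinese Materia Medica (重庆中药研究院), Taiji Group (太极集团) and 希尔安药业, together with a number of partner institutions in China and abroad. Taking urgent national needs and first-class industry standards as its starting point, the center focuses on the sustainable development and comprehensive exploitation of medicinal resources in the Three Gorges Reservoir Area. It addresses major forward-looking scientific questions, technical problems common to the industry, and key issues in regional economic and social development, and it seeks to break through the institutional constraints that have held back innovation, so as to become an influential academic center, an R&D base for technologies common to the industry, and a leader in regional innovation.

Drawing on the combined strengths of Southwest University and its partner institutions in pharmacy, traditional Chinese pharmacy, chemistry, biology and biomaterials, the center concentrates on modern pharmaceutical analysis, medicinal resources and natural product chemistry, drug synthesis and lead compound discovery, synthetic biology and biotechnology drugs, and drug delivery and new dosage forms, and it aims for major breakthroughs in the comprehensive use of medicinal resources and in new drug development.

Through the postdoctoral research stations in chemistry and in biology at Southwest University, the center recruits year-round outstanding young Ph.D. holders with a background in pharmacy, traditional Chinese pharmacy, chemistry, biology, biomedicine, nanomaterials, mass spectrometry or nuclear magnetic resonance for postdoctoral research.

Competitive pay is offered according to the candidate's qualifications and postdoctoral research plan. Postdoctoral income has three parts: the university's basic income (100,000 yuan, including insurance and performance pay), a supplement from the host supervisor (10,000 to 60,000 yuan), and a matching supplement from the center (10,000 to 60,000 yuan). The total is no less than 120,000 yuan per year (about US$20,000), and outstanding postdocs can earn up to 220,000 yuan per year (about US$36,000). Benefits during the appointment follow university policy. Those with outstanding evaluations may be retained by the university as special talent.

Outstanding young Ph.D. holders are welcome to apply or enquire. If interested, please send a detailed CV and a research plan by e-mail with the subject line "应聘协同创新中心博士后" (Application for postdoctoral position at the Collaborative Innovation Center).

Contact: Li Chong (李翀); Tel: 023-68251225; E-mail: liliccli@163.com

iCCNP, College of Pharmaceutical Sciences, Southwest University

Wanted: Ten postdoctoral fellowship positions have recently been opened in the Collaborative Innovation Center of Natural Products in the Three Gorges Reservoir Area (iCCNP), led by the College of Pharmaceutical Sciences, Southwest University. Applicants must have a Ph.D. degree and a strong background in pharmacy, natural products, chemistry, biology, biomedicine, nanomaterials, mass spectrometry or nuclear magnetic resonance. Candidates will be evaluated on their qualifications, their drive to explore new fields, and their willingness to collaborate. Excellent command of spoken and written English is essential for all positions. Successful candidates will be offered competitive salaries. Interested applicants may submit a copy of their CV along with the names and e-mail addresses of two references to:

Professor Chong Li
College of Pharmaceutical Sciences, Southwest University
Chongqing 400715, P.R. China
Email: liliccli@163.com
Website: http://pharmacy.swu.edu.cn/english/content.php?classid=87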
Shen Bin, Zhou Yingjun, Wang Jiahai. Research on collaborative manufacturing partner selection based on an AHP-RBF neural network. Machinery Design & Manufacture (机械设计与制造), No. 11, Nov. 2008: 139-141.
阅读全文,由此进入: 《软件》杂志 2011年 第6期 http://www.ccomsoft.com/kanlist.asp?ids=154 专家论坛 协同智能计算系统——理论模型及其应用 邹晓辉 邹顺鹏 《软件》杂志 2011年 第5期 http://www.ccomsoft.com/kanlist.asp?ids=153 专家论坛 间接计算模型和间接形式化方法 邹晓辉1,2,3 邹顺鹏1 中国知网:邹晓辉-协同智能计算系统——理论模型及其应用 . 软件 . 2011年06期 http://www.cnki.net/kcms/detail/12.1151.TP.20111004.1537.001.html 致 谢 感谢美国《科学》杂志执行总编(executive editor Monica M. Bradford)和高级编辑(Senior Editor Pamela J. Hines)在《Science》执行总编办公室耐心听取了笔者介绍自 己核心原创成果的三个要点并给予了很好的建议! 同时,还要 感谢UC Berkeley 提供笔者机会来介绍汉语形式化数据库即 字与字组的关系数据库设计方案及其依据的“言本位”理论! 有关单位涉及其 信息学院(Technical Lead Howie X. Lan 在认真听取了笔者介绍自己核心原创成果之后还给予了三点 书面评价,并认为该成果适合放在云端以供分享)、 RecLetterProfZou (1)from-lan.pdf 东方语言系 (汉语中心张丽华不仅两次认真听取了笔者介绍自己核心原创 成果而且还预订了教室并召集了老师们专门听取了笔者的详 细报告和答疑)以及 高等教育研究中心主任(C. Judson King 不仅认真听取了笔者介绍的原创思想而且还通过交谈举例表 示了理解并给予了积极肯定)等。这些对本文的形成或尽快公 开均具有积极的促进作用。 2011-10-5 ZouXiaohui17:10:57 在 “ 协同智能计算系统 ” 中, 数学计算的东西,是由计算机去做的; 英汉双语的解读,才由自然人来做的。 ZouXiaohui17:14:28 我的研究一个最为显著的地方就是我考虑了你这样 “ 一看到数学计算的东西就蒙了 很反感 ” 的很大一批人的情况 Joan17:16:29 呵呵 谢谢老师对我们的体恤 ZouXiaohui17:16:35 并为你们选择了一条机人或人机之间 “ 合理分工、优势互补;高度协作、优化互动 ” ( 16 字方针)各行其道的技术实现途径 ZouXiaohui17:19:32 ZouXiaohui17:20:10 你看第二条路径 哪里还有一个数学符号呢? ZouXiaohui17:21:07 你要知道,它正是为解答你的问题而专门绘制的一幅图呢 ZouXiaohui17:21:57 图 2 协同智能计算模型及其汉语形式化计算实例示意图 ZouXiaohui17:24:21 看见没有? 你,还有千千万万像你这样不喜欢数学的年轻人,都只需要对图 2 左边的几朵云中的言和语进行处理(学习和理解)就足矣! ZouXiaohui17:25:05 右边的数据或数字就全都交给计算机去处理。 ZouXiaohui17:27:08 左右两边相互转换则由在机人或人机之间遵循 “ 合理分工、优势互补;高度协作、优化互动 ”16 字方针的 “ 协同智能计算系统 ” 来完成。 ZouXiaohui17:27:31 图 1 协同智能计算模型及其原理和应用的直观示意图 ZouXiaohui17:28:16 这就是 “ 第三脑智 ” 的功能啊! ZouXiaohui17:28:45 图 3 自然语言文本理解双重技术路线中第二路径的特征示意图 Joan17:30:00 哦 怪不得您让我把曾经翻译的摘要中每句话都编上数字符号 最重要是人工建立数据库 然后让计算机进行匹配这 ZouXiaohui17:31:27 正是。你终于理解啦!很好。我继续往下说: 看见没有,你只需要处理日常语言 —— 也就是你最喜欢也最擅长的翻译啊(我称之为 “ 狭义的双语信息处理 ”) ZouXiaohui17:33:38 图 4 收敛与发散、搜索与穷举在 n 2 矩阵格中均可表达的示意图 这是计算机要处理的,你可以完全不用管它 —— 就因为你不喜欢数学我就可以让你永远不用惧怕它 —— 因为在此机器可帮你对付它啊! 
Joan17:34:24 您的发现 我觉得就是为计算机进行双语信息处理所提供的方法 太高级了 我就不管计算机了啊 ZouXiaohui17:36:13 对啊!你终于明白到这一层啦。哈哈,我接着说: 作为中国人,这是你所能理解的: 图 5 以汉语或中文组字成语的特点为例的语言分析示意图 ZouXiaohui17:37:22 在 “ 协同智能计算系统 ” 中, 计算机和自然人就是这样分工合作的: 图 6 以做和信两字为例揭示双语信息处理中取值与置信的关系示意图 ZouXiaohui17:38:27 北师大国际会议我那一篇文章说的也就是这个道理。 ZouXiaohui17:38:59 图 7 以英汉汉英双语解释和双向机器翻译策略为例的语言处理示意图 由图 7 可见,广义和狭义的双语信息处理的结合示例 ZouXiaohui17:39:23 图 8 仅以做这个字为例来说明双语信息处理中如何取值计算的示意图 ZouXiaohui17:40:16 由图 8 可见,经过自然人专家完成了狭义的双语信息处理之后,广义的双语信息处理就可交给计算机代理来处理, 例如:图 2 描述的为各类学校学生服务的云端计算数据查询服务器就可比普通人(即:各级学校的学生们)都做得更好。 由此可见,这样的结合才能体现协同智能计算的优点。 ZouXiaohui17:43:47 到此,你至少应该可以明白这样一个基本道理,即: 1. 自然人仅需处理右边的母语,最多再加一门外语; 2. 计算机也只处理左边的数字,最多再加它的变体; 3. 协同智能计算系统遵循 16 字方针把两者融合融通。 Joan 17:44:49 老师 其实您在论文中已经讲述得很清晰了 虽然是学术论文 但是更像为大家悉心阐述您思想的教案 目的是让我们这些以后有望得意于您的协同智能计算的学生明白它并利用好它 ZouXiaohui 17:46:34 是的, Joan 17:46:46 左右两边都需要很高智商的人和计算机来做 基础必须打好 ZouXiaohui 17:48:55 对此,我是这样看的: 对学不进去的人,那到不必,因为,有图2所示的阶梯共他们选择; ZouXiaohui 17:51:34 对学得进去的人,就如同你说的那样“ 基础必须打好 ” 阅读全文,由此进入: 《软件》杂志 2011年 第6期 http://www.ccomsoft.com/kanlist.asp?ids=154 专家论坛 协同智能计算系统——理论模型及其应用 邹晓辉 邹顺鹏 《软件》杂志 2011年 第5期 http://www.ccomsoft.com/kanlist.asp?ids=153 专家论坛 间接计算模型和间接形式化方法 邹晓辉1,2,3 邹顺鹏1 《软件》杂志 2011年 第6期 http://www.ccomsoft.com/news.asp?id=245 协同智能计算系统——理论模型及其应用 邹晓辉 邹顺鹏 中国地质大学(北京)高等教育研究所 北京 10008 关键词 本研究工作目的是揭示计算机数据与自然人知识两类信息处理方式基础之上派生的协同智能观及其指导下的协同智能计算系统的理论模型及其应用。它涉及的可验证方法,一方面,在n 2 矩阵范围内,以等价于2 n 的发散方式枚举和以等价于1/2 n 的收敛方式搜索两类基本算法可用作处理纯数字计算的任务,其特征是满足n的取值不影响计算效率的间接计算任务;另一方面,在与n 2 矩阵各个格子一一对应的范围对单音节汉字进行间接形式化处理,其特征不仅在于单音节字,即言,可间接计算,而且,还在于双音节和多音节的字组,即语,也可间接计算,同时,言和语的复用频率均可以且便于间接计算和统计。其结果是:不仅中文的自然语言理解的双重技术路线被揭示,而且,支配这类间接计算模型与间接形式化方法的信息基本定律假说也可被验证。最终可得出这样的结论,即:在前述两方面可验证的两种实证方法,远不仅仅是计算机数据信息处理方式与自然人知识信息处理方式这两类信息处理方式的简单相加,而是这两者合理分工、高度协作所产生的协同智能计算系统的理论模型或第三类信息处理方式及其应用,例如:国内外学术前沿的各类期刊及会议论文摘要、各种软件的常用问题解答以及帮助文件、协同智能计算系统用户个性化记录、自然语言的有限符号及其多样化组合或重复使用过程中蕴含的有限规则、等等各类双语信息的计算机辅助分析,该类云端计算主要服务对象是在创造性合作型生产式教研产学用各类活动中需要计算机辅助双语知识信息数据处理服务的客户。 关键词 计算机;间接计算;间接形式化;软件 中图分类号 TP18 文献标识码 A http://www.ccomsoft.com/index.asp 《软件》杂志 2011年 第6期 
http://www.ccomsoft.com/news.asp?id=245

Collaborative Intelligent Computing System -- Theoretical Model with Its Application

ZOU Xiaohui, ZOU Shunpeng
Institute of Higher Education, China University of Geosciences (Beijing), Beijing 100083, China

【Abstract】 This study aims to reveal the view of collaborative intelligence that derives from two modes of information processing, computer data processing and human knowledge processing, together with the theoretical model of the Collaborative Intelligent Computing System guided by this view and its application. It involves verifiable methods. On the one hand, within an n² matrix, two basic algorithms, enumeration by divergence (equivalent to 2ⁿ) and search by convergence (equivalent to 1/2ⁿ), can be used for purely numerical computing tasks; these are indirect computing tasks whose computational efficiency is not affected by the value of n. On the other hand, indirect formalization is applied to monosyllabic Chinese characters placed in one-to-one correspondence with the cells of the n² matrix. Not only single-syllable characters (Yan) but also two-syllable and multi-syllable character groups (Yu) can be computed indirectly, and the reuse frequencies of both Yan and Yu can easily be counted and computed indirectly. The model thus reveals the dual technical route for Chinese natural language understanding, and it allows the hypothesized basic laws of information that govern the indirect computing model and the indirect formalization method to be verified.

It can finally be concluded that the two verifiable empirical methods above are far more than a simple sum of the two modes of information processing (computer data and human knowledge). Rather, they constitute the theoretical model of the Collaborative Intelligent Computing System, a third mode of information processing produced by a rational division of labor and close collaboration between the two, together with its applications. Examples include computer-aided analysis of all kinds of bilingual information: abstracts of leading journal and conference papers at home and abroad, frequently asked questions and help files for various software, personalized records of the system's users, and the finite symbols of natural language together with the finite rules implicit in their varied combination and reuse. The main clients of such cloud-based computing are those who need computer-aided bilingual knowledge and data processing in creative, cooperative and productive activities in teaching, research, production, learning and use.
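【Key words】 Computer; Indirect Computing; Indirect Formalization; Software

This paper sets out the theoretical model of the Collaborative Intelligent Computing System and its application. Its distinguishing feature is that combining the indirect computing model with the indirect formalization method yields a third mode of information processing. This is not simply the sum of computer data processing and human knowledge processing; it is the theoretical model, and the applications, of a collaborative intelligent computing system that arises from a rational division of labor and close collaboration between the two.

The computer science community will readily accept the following supposition: if artificial intelligence is the crown of information technology, then natural language understanding is a jewel in that crown. A further supposition follows: if the intelligence of the human brain is the first intelligence and the intelligence of the computer is the second, then, by the systems-science principle that the whole is greater than the sum of its parts, can the combination of the two be called a third intelligence? We hold that it can, both in name and in practice. One entry point of this paper, and one of its original contributions, is to approach natural language understanding from this angle. Only in this way could we discover and confirm that natural language understanding has a dual technical route, reveal the scientific mechanism of the second path, and find its use in bilingual information processing. This involves a breakthrough in basic research in linguistics, another in informatics, and a further one in the two fields of education and management.

How can this be? It is hard to believe. Yet after more than a decade of exploration, research and extensive exchange at home and abroad in these fields, we have arrived at definite results, conclusions and concrete application examples. What follows presents to the reader the path from disbelief to conviction.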
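I was recently asked a question: what is the difference between collaboration (collaborative control, 协同) and system optimization? After thinking it over for a long time and consulting some references, I put together the study notes below. This is a fairly open topic, and scholars in different fields may understand it differently, so I am posting the notes here in the hope of drawing out better ideas.

System optimization belongs to systems science. It studies how to allocate resources and tasks sensibly among the parts of a system so that the system achieves its goal more effectively. Like cooperation (合作 / 协作) and collaboration (协同), the other two concepts compared here, it has multiple parties working toward a common goal, but it differs from them in fairly obvious ways. First, the participants in system optimization are in a subordinate relationship; they are not independent individuals, so they take part passively, strictly carry out the tasks assigned from above, and lack autonomy. Second, system optimization usually relies on a centralized decision mechanism. Third, its purpose is to keep the parts of the system from duplicating work or producing work that cannot be joined up. In addition, its objective is more clearly defined than in the other two, and a theoretical maximum often exists. Finally, system optimization is solved from a model of the problem using optimization methods such as operations research, whereas cooperation and collaboration proceed through repeated negotiation and knowledge sharing, with each party working out its own role.

The difference between cooperation and collaboration is less obvious, but the following explanation and examples give a feel for it. First, cooperation places no explicit requirement on who takes part, whereas collaboration stresses having the right mix of participants. Take industry-university-research collaborative innovation: cooperation among a few firms within the industry alone cannot be called collaboration. Second, cooperation is usually a framework arrangement and pays little attention to how each party decides its own actions. Collaboration has a framework agreement too, but it also attends to individual behavior and to how the parties affect and depend on one another. Take writing a book: if A and B cooperate, A writes about safety and B writes about optimization; if A and B collaborate, A writes the safety analysis method and B writes the case study that applies it. Third, cooperation does not require synchrony; the parties can work separately and wait for one another. Collaboration requires close coordination among the parties and puts more weight on real-time interaction.

From this analysis, cooperation is the broader concept, and collaboration adds three further requirements on top of it: the right mix of participants, attention to interdependence, and synchrony. Collaboration is therefore a special kind of cooperation.

Extra value: the value added on top of the maximum benefit that system optimization achieves.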
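Collaborative filtering is the earliest proposed, most thoroughly studied and most widely deployed personalization technique. It serves an individual user, yet it draws on information from all users. In the classic user-centered, similarity-based algorithm, the system first compares historical data to compute the similarity between the target user and every other user, and then recommends to the target user the items liked by users who are very similar to him or her. There are many ways to compute similarity. The measures mentioned in the discussion of association rules are commonly used, for example cosine similarity (the denominator is the square root of the product of the numbers of items the two users bought) and Jaccard similarity (the denominator is the size of the union of the items the two users bought). In both, the numerator is the number of items bought by both users. For example, if user A bought items 1, 2, 3 and user B bought items 2, 3, 4, 5, the numerator is 2, the number of items bought in common; the cosine similarity is 2 divided by the square root of 3 × 4, about 0.577, and the Jaccard similarity is 2 divided by 5, which is 0.4. Once the similarities are known, there are several options: rank items by the similarity-weighted sum of all other users' ratings and recommend the top ones to the target user; take only the k most similar users into account; or set a similarity threshold and consider only users above it.

The collaborative filtering described above is user-centered. Another very widely used approach is item-centered and is usually called item-based collaborative filtering. Its basic idea is to analyze the items the target user has bought and to recommend items similar to them. Because the interests of Internet users change over time, usually only the user's recent purchases are analyzed, or earlier purchases are given an influence that decays with time. Item similarity can be defined through behavior, that is, by whether two items are frequently bought by the same user, or through content, that is, by whether the attributes or descriptions of two items are similar. The latter ties in closely with content analysis, discussed next. In fact, the core of the recommendation algorithm used by Amazon is item-based collaborative filtering built on content analysis. Because books are rich in content, content similarity can be judged very accurately, so the method works well on Amazon. Note, however, that when the method is transplanted to other kinds of products its performance may drop considerably.

Item-based collaborative filtering has two particular strengths. First, it lends itself to real-time algorithms: because item-to-item similarities can be computed offline, each time a user views a new item, adds it to the cart or buys it, the recommendation panel can be recomputed and refreshed at once. Second, it is easy to explain: the system can tell the user that an item is recommended mainly because of certain items the user previously bought or viewed. Explainability greatly improves user experience and is very useful in personalized e-mail marketing. User-based collaborative filtering, by contrast, can uncover deeper latent associations and help raise cross-selling, that is, recommending products from other categories when a user buys in one category, which increases the diversity of what the user buys. This raises the user's order value in the short term and, more importantly, extends the user into new categories, raising that user's overall value. Item-based collaborative filtering tends to recommend items from the same category and so is of less value for cross-selling. Both methods share one problem: they tend to recommend popular products, so the recommendations lack diversity and novelty. How to improve diversity and novelty without hurting accuracy is a major challenge in personalized recommendation research.

L. Lü, T. Zhou, Link prediction in complex networks: a survey, Physica A 390 (2011) 1150-1170.
G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing 7(1) (2003) 76-80.
C.-N. Ziegler, S. M. McNee, J. A. Konstan, G. Lausen, Improving recommendation lists through topic diversification, Proceedings of the 14th International Conference on World Wide Web, ACM Press, New York, 2005.
T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, Y.-C. Zhang, Solving the apparent diversity-accuracy dilemma of recommender systems, Proceedings of the National Academy of Sciences of the United States of America 107 (2010) 4511-4515.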
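The similarity calculation and the user-based recommendation step described in this post can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: purchases are treated as plain sets of item ids (bought or not bought, no ratings), the function names and the extra user C are hypothetical, and the ranking uses the "k most similar users" variant with similarity-weighted votes. It is not the algorithm of any particular system.

```python
from math import sqrt


def cosine_similarity(a: set, b: set) -> float:
    """Shared items divided by the square root of the product of the set sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def jaccard_similarity(a: set, b: set) -> float:
    """Shared items divided by the size of the union of the two sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def recommend(target: str, purchases: dict, k: int = 2,
              similarity=cosine_similarity) -> list:
    """Rank items the target has not bought, using similarity-weighted
    votes from the k most similar other users (user-based filtering)."""
    mine = purchases[target]
    neighbours = sorted(
        ((similarity(mine, items), user)
         for user, items in purchases.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        for item in purchases[user] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Toy data: A and B are the users from the worked example; C is hypothetical.
purchases = {
    "A": {1, 2, 3},
    "B": {2, 3, 4, 5},
    "C": {1, 6},
}
sim_cos = cosine_similarity(purchases["A"], purchases["B"])   # about 0.577
sim_jac = jaccard_similarity(purchases["A"], purchases["B"])  # 0.4
ranked = recommend("A", purchases)                            # [4, 5, 6]
```

With this toy data, user B is more similar to A than user C is, so B's unseen items 4 and 5 rank ahead of C's item 6. Swapping in `similarity=jaccard_similarity`, or replacing the top-k cut with a similarity threshold, gives the other variants mentioned in the text.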
Analysis of the Concept of "Cloud": A Special Case of the Collaborative Intelligent Computing System

Zou Xiaohui 1,2; Zou Shunpeng 1
qhkjy@yahoo.com.cn 15300239971
(1. China University of Geosciences, Beijing 100083, China; 2. Sino-US Berkeley Project)

Time: May 20, 22:31 GMT

Abstract: Since Google publicly put forward cloud computing in its Google 101 project in 2006, the information industry has described it as a new business model; the information technology community has described it as the convergence of three layers of Internet technology, namely software as a service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS); and the communications and computer science communities have described it as a new computing model distinct from existing paradigms. There are many other specific accounts as well. What is still lacking is a systematic, comprehensive yet concise theoretical description or definition, and in particular an explanation set out in parallel English and Chinese. In the authors' view, this hinders both the popularization and the advancement of cloud computing theory and technology. This paper therefore attempts a systematic analysis and outlook on cloud computing from two angles, the origin of the concept and its possible new developments, with the aim of opening new routes for the popularization and advancement of cloud computing theory and technology.

Key words: cloud computing theory; cloud computing technology; cloud computing service
《中国科学 : 信息科学》 “ 云计算与 SaaS” 专题:云计算理论与技术 ——“2011 年第二届中国云计算与 SaaS 学术会议 ” 论文 “ 云 ” 概念的分析研究 —— 协同智能计算系统的一个特例 自 2006 谷歌 101 项目提出云计算( cloud computing )至今,信息产业界说它是一种新的商业模式,信息技术界说它是软件服务、平台服务和基础设施服务三个不同层次的互联网技术的融合,通信与计算机科学界则说它是一种有别于既有范式的新的计算模式,另外还有各种各样的具体说法。可就是缺乏一个系统周全而又简明扼要的理论描述或定义,尤其缺乏英汉双语对照的解释。笔者认为,这对云计算理论与技术的普及和提高均为不利。 为此,本文拟从云计算( cloud computing )提出的源头及其可能的新发展两个方面来对它做一个系统的分析和展望。旨在为云计算理论与技术的普及和提高开辟新途径。 一、采用人机协同的方式来建立 cloud (云)概念分析数据表格 基于统计的方法对 cloud computing (云计算)权威文献 Google and the Wisdom of Clouds (谷歌和云的智慧 )中 42 处提及 cloud (云)的上下文分别做句和段两个层次的对照表格。 表格 1 是 42 处提及 cloud (云)的上下文语句英汉对照表格。 表格 2 是 19 个提及 cloud (云)的上下文语段英汉对照表格。 二、根据表格 1 和表格 2 进行机人协同的 cloud (云)概念分析 根据表格 1 对 cloud (云)概念进行机助人的协同分析 根据表格 2 对 cloud (云)概念进行人机助的协同分析 三、分析 cloud (云)概念之后发现它竟是协同计算的一个特例 其实它在技术上就是一种集群计算,其特点是大规模协同计算,由此必然会引来相应的商业模式改变。这就是笔者研读它后的结论。 笔者的译文“谷歌和云的智慧”之所以有别于目前流行的参考译文“ Google 及其云智慧”,首先是基于“信、达、雅”的翻译准则,同时,考虑英汉双语表格务必兼顾并凸显英汉双语各自特点的基本原则。 附录: 1 Google and the Wisdom of Clouds 2 谷歌和云的智慧 3 A lofty new strategy aims to put incredible computing power in the hands of many 4 这项全新的远大战略 旨在把强大得超乎想象的计算能力分布到众人手中 。 5 by Stephen Baker 6 作者:斯蒂芬 贝克 7 One simple question. That's all it took for Christophe Bisciglia to bewilder confident job applicants at Google (GOOG). Bisciglia, an angular 27-year-old senior software engineer with long wavy hair, wanted to see if these undergrads were ready to think like Googlers. "Tell me," he'd say, "what would you do if you had 1,000 times more data?" 8 这是一个简单的问题,是克里斯托夫·比希利亚为信心十足的谷歌应聘者们出的一道题。作为谷歌公司的高级软件工程师, 27 岁的比希利亚留着一头卷曲的长发,他希望了解这些大学本科生 是否已经准备好以谷歌人的方式去思考 。“告诉我,”他问道,“如果有 1000 多倍的数据量,你将怎么办?” 9 What a strange idea. If they returned to their school projects and were foolish enough to cram formulas with a thousand times more details about shopping or maps or—heaven forbid—with video files, they'd slow their college servers to a crawl. 10 真是个奇怪的问题。假如他们真的跑回学校,愚蠢地想要 去处理容量多达 1000 多倍的细节信息,那么学校的服务器恐怕会被拖累得慢如爬虫 。 11 At that point in the interview, Bisciglia would explain his question. 
To thrive at Google, he told them, they would have to learn to work—and to dream—on a vastly larger scale. He described Google's globe-spanning network of computers. Yes, they answered search queries instantly. But together they also blitzed through mountains of data, looking for answers or intelligence faster than any machine on earth. Most of this hardware wasn't on the Google campus. It was just out there, somewhere on earth, whirring away in big refrigerated data centers. Folks at Google called it "the cloud." And one challenge of programming at Google was to leverage that cloud—to push it to do things that would overwhelm lesser machines. New hires at Google, Bisciglia says, usually take a few months to get used to this scale. "Then one day, you see someone suggest a wild job that needs a few thousand machines, and you say: Hey, he gets it.'" 12 比希利亚 将在面试中阐释他的问题 。他告诉应聘者, 要想在谷歌发展,就必须学会从更宽广、更宏观的角度来工作和思考 。 他描述了 谷歌全球运行的计算机网络 。的确, 这些设备可以实现对搜索需求的即时回馈 ;而当形成集群,它们则能更快地处理浩如烟海的数据,其检索答案或指令的速度将超过世界上任何一台单机 。绝大部分硬件设备并非安放在谷歌公司园区,而是在园区之外,没准就在地球上 某个大型冷却数据中心 里高速运转着。谷歌内部 把这种大规模计算机集群称作“云”。 在谷歌,工程师编程过程中碰到的一大挑战便是 如何驾驭“云” ? 提高它的数据处理能力从而大幅领先于小型计算机群。比希利亚表示,谷歌的新员工通常要花费数月才能习惯 从这种角度思考 。 13 What recruits needed, Bisciglia eventually decided, was advance training. So one autumn day a year ago, when he ran into Google CEO Eric E. Schmidt between meetings, he floated an idea. He would use his 20% time, the allotment Googlers have for independent projects, to launch a course. It would introduce students at his alma mater, the University of Washington, to programming at the scale of a cloud. Call it Google 101. Schmidt liked the plan. Over the following months, Bisciglia's Google 101 would evolve and grow. It would eventually lead to an ambitious partnership with IBM (IBM), announced in October, to plug universities around the world into Google-like computing clouds. 
14 比希利亚认为,谷歌的新人 所需要的是高级培训课程 。 2006 年秋季的一天,当他在会议间歇偶遇公司首席执行官埃里克·施米特时,他脑海里浮现出一个想法。他将利用自己的“ 20% 时间”(即谷歌分配给员工 用于独立开发项目的时间 )来启动一门课程, 这门课程将在他的母校华盛顿大学进行,着重引导学生们进行“云”系统的编程开发,他设想把这个项目命名为谷歌 101 。施米特很是欣赏这一计划。在接下来的数月中,比希利亚的谷歌 101 计划不断发展和深化,最终促成了谷歌与 IBM 在 2007 年 10 月开展了一次雄心勃勃的合作 --- 把全球多所大学纳入类似谷歌的计算“云”中。 15 As this concept spreads, it promises to expand Google's footprint in industry far beyond search, media, and advertising, leading the giant into scientific research and perhaps into new businesses. In the process Google could become, in a sense, the world's primary computer. 16 随着“云”概念影响的扩大 ,谷歌在产业中的足迹必然会远远超出搜索、媒体和广告领域,从而使这家 IT 巨头得以涉足科学研究甚至更新的业务领域。 在这一过程中,谷歌在某种意义上可能会成为世界上首屈一指的超级计算机。 17 "I had originally thought was going to work on education, which was fine," Schmidt says late one recent afternoon at Google headquarters. "Nine months later, he comes out with this new strategy, which was completely unexpected." The idea, as it developed, was to deliver to students, researchers, and entrepreneurs the immense power of Google-style computing, either via Google's machines or others offering the same service. 18 “我最初以为(比希利亚)不过是想在教育上做点事情,这当然也不错,” 施米特最近一个下午在谷歌总部回想道,“ 9 个月后他 拿出了新战略(即‘云’计划) ,太出乎意料了。”随着自身的不断拓展,“云”计划将为学生、研究人员和企业家们 提供谷歌式的无限的计算处理能力 ,不论是通过谷歌自身的设备或是通过提供相同服务的其他厂商。 19 What is Google's cloud? It's a network made of hundreds of thousands, or by some estimates 1 million, cheap servers, each not much more powerful than the PCs we have in our homes. It stores staggering amounts of data, including numerous copies of the World Wide Web. This makes search faster, helping ferret out answers to billions of queries in a fraction of a second . Unlike many traditional supercomputers, Google's system never ages. When its individual pieces die, usually after about three years, engineers pluck them out and replace them with new, faster boxes. This means the cloud regenerates as it grows, almost like a living thing. 
20 谷歌的“云”到底是什么?它是由几十万甚至大约 100 万台廉价的服务器所组成的网络。 这些机器单个而论的话,其性能并不比家用台式机强大多少。但是这个网络存储的数据量惊人, 能容纳不计其数的网络数据拷贝,因此搜索速度能够更快,在眨眼之间便能为数十亿的搜索提交答案 。 与许多传统的超级计算机不同,谷歌的系统永远不会老化。 如果网络中某一台机器落伍(通常在使用 3 年后),工程师们就会把它淘汰,而代之以性能更强的新款计算机。 这意味着,“云”几乎就像生物一样能长生不老。 21 A move towards clouds signals a fundamental shift in how we handle information. At the most basic level, it's the computing equivalent of the evolution in electricity a century ago when farms and businesses shut down their own generators and bought power instead from efficient industrial utilities . Google executives had long envisioned and prepared for this change. Cloud computing, with Google's machinery at the very center, fit neatly into the company's grand vision, established a decade ago by founders Sergey Brin and Larry Page: " to organize the world's information and make it universally accessible. " Bisciglia's idea opened a pathway toward this future. "Maybe he had it in his brain and didn't tell me," Schmidt says. "I didn't realize he was going to try to change the way computer scientists thought about computing. That's a much more ambitious goal." 22 向“云”规模的数据处理迈进 标志着 我们在信息处理方面发生了翻天覆地的转变 。从最基本的层面讲,“云”的发展就如同 100 年前人类用电的进程演变,当时的农场和公司逐渐关闭了自己的发电机,转而从高效的发电厂购买电力。谷歌的高管们很早前就开始展望 这一转变 并 为之进行筹划准备 。 以谷歌设备为核心的“云计算” 完全符合由该公司创始人谢尔盖·布林和拉里·佩奇 10 年前 提出的远大构想 :“构建起跨越全世界的信息,供人们随时随地访问。”比希利亚的想法 刚好为实现这个构想 开辟了一条道路 。“没准他脑子里早就有数,只是没告诉我,”施米特表示,“我开始没有意识到 他将试图改变计算机专家对于计算的固有想法。这个目标太伟大了 。” 23 ONE-WAY STREET 24 单行道 25 For small companies and entrepreneurs, clouds mean opportunity—a leveling of the playing field in the most data-intensive forms of computing. To date, only a select group of cloud-wielding Internet giants has had the resources to scoop up huge masses of information and build businesses upon it. Our words, pictures, clicks, and searches are the raw material for this industry. But it has been largely a one-way street. Humanity emits the data, and a handful of companies—the likes of Google, Yahoo! 
(YHOO), or Amazon.com (AMZN)—transform the info into insights, services, and, ultimately, revenue. 26 对于小型公司和企业主而言,“云”意味着机会,在密集型数据处理领域这一竞技场中,它就像是一道标准线。 今天,掌控“云”系统的互联网巨头中,只有少数几家拥有吞吐海量信息并开展相关业务的资源。 我们的文字、图片、点击和搜索 全都是这个产业的原材料 。一直以来, 很大程度上这是一条单行道 ? 人们产出数据,谷歌、雅虎和亚马逊等公司 则将 信息转化成观点、服务,最终变成收入 。 27 This status quo is already starting to change. In the past year, Amazon has opened up its own networks of computers to paying customers, initiating new players, large and small, to cloud computing. Some users simply park their massive databases with Amazon. Others use Amazon's computers to mine data or create Web services. In November, Yahoo opened up a cluster of computers—a small cloud —for researchers at Carnegie Mellon University. And Microsoft (MSFT) has deepened its ties to communities of scientific researchers by providing them access to its own server farms . As these clouds grow , says Frank Gens, senior analyst at market research firm IDC, "A whole new community of Web startups will have access to these machines. It's like they're planting Google seeds." Many such startups will emerge in science and medicine, as data-crunching laboratories searching for new materials and drugs set up shop in the clouds . 28 这种状况已开始发生改变。 2006 年, 亚马逊 向付费用户 开放了自己的计算机网络,调动新的参与者加入“云”计算 ,而无论其规模大小。一些用户只是简单地将数据库存储在亚马逊,另一些则使用亚马逊的服务器搜索数据或建立网络服务。 2007 年 11 月, 雅虎 也将一个 电脑集群(即小规模的“云”) 开放给卡内基 - 梅隆大学的研究人员。 微软 同样通过开放 服务器群 来加深与科学研究团体的联系。市场调查公司 IDC 的 高级分析师弗兰克·金斯 表示: 随着这些“云”的发展,“新兴的网络公司将有望造访这些服务器 。这就 如同在播撒谷歌种子 ”。随着搜索新材料和药品的数据处理实验室把工作室搬到了“云”上,这些新公司很多将会出现在科学和医药领域。 29 For clouds to reach their potential, they should be nearly as easy to program and navigate as the Web. This, say analysts, should open up growing markets for cloud search and software tools—a natural business for Google and its competitors . Schmidt won't say how much of its own capacity Google will offer to outsiders, or under what conditions or at what prices. 
"Typically, we like to start with free," he says, adding that power users "should probably bear some of the costs." And how big will these clouds grow? "There's no limit," Schmidt says. As this strategy unfolds, more people are starting to see that Google is poised to become a dominant force in the next stage of computing. "Google aspires to be a large portion of the cloud, or a cloud that you would interact with every day," the CEO says . The business plan? For now, Google remains rooted in its core business, which gushes with advertising revenue. The cloud initiative is barely a blip in terms of investment. It hovers in the distance, large and hazy and still hard to piece together, but bristling with possibilities. 30 要想让“云”发挥出潜能,与此相关的编程和操作就应该与使用互联网一样简单 。分析家称,这给“云”搜索及其相关的软件工具打开了增长的市场,对于谷歌及其竞争对手来说,这可谓唾手可得的业务。谷歌将为用户提供多少存储容量,或以什么形式、什么价格提供,对于这些,施米特 都不会明说 。“通常来讲,我们 开始时会采取免费策略 ,”他表示,并强调大客户“应该很有可能负担一些费用”。那么这些“云”能发展到多大的规模呢?“无限大,”施米特表示。 随着“云”策略的展开,更多人看到谷歌随时做好了成为下一代计算的主导力量的准备。 “谷歌渴望占据“云计算”市场中相当的份额,或成为每天都与普通人打交道的‘云’,”施米特说道。那么有什么样的商业计划呢?就目前而言,谷歌仍将继续植根于核心业务,这一业务给它带来了滚滚的广告收入。 从投资角度来说,“云”计划在初始阶段不过是像雷达屏幕上显示的一个小光点。它在远处盘旋,目标很大却烟雾弥漫,很难拼凑在一起,但仍充满着无限可能。 31 Changing the nature of computing and scientific research wasn't at the top of Bisciglia's agenda the day he collared Schmidt. What he really wanted, he says, was to go back to school. Unlike many of his colleagues at Google, a place teeming with PhDs, Bisciglia was snatched up by the company as soon as he graduated from the University of Washington, or U-Dub, as nearly everyone calls it. He'd never been a grad student. He ached for a break from his daily routines at Google— the 10-hour workdays building search algorithms in his cube in Building 44, the long commutes on Google buses from the apartment he shared with three roomies in San Francisco's Duboce Triangle . He wanted to return to Seattle, if only for one day a week, and work with his professor and mentor, Ed Lazowska. "I had an itch to teach," he says. 
32 就在那次比希利亚趁机和施米特谈论谷歌 101 时, 改变计算和科研的现状 还并不是他的主要意图。他自己说, 当时真正想做的是返回学校。 与公司里很多已拿到博士学位的同事不同,比希利亚 刚从华盛顿大学毕业就被谷歌录用,他甚至没读过硕士研究生。 因此 他渴望从谷歌的日常工作中抽出时间换换脑子。在谷歌,比希利亚每天都需要从公寓搭乘班车长途跋涉到公司,然后开始 10 小时的搜索运算法则的编写工作 。 他想回到西雅图,哪怕每周只有一天,回到学校去和他的教授兼导师埃德·拉佐斯卡一起工作 。 33 He didn't think twice before vaulting over the org chart and batting around his idea directly with the CEO. Bisciglia and Schmidt had known each other for years . Shortly after landing at Google five years ago as a 22-year-old programmer, Bisciglia worked in a cube across from the CEO's office . He'd wander in, he says, drawn in part by the model airplanes that reminded him of his mother's work as a United Airlines (UAUA) hostess. Naturally he talked with the soft-spoken, professorial CEO about computing. It was almost like college. And even after Bisciglia moved to other buildings, the two stayed in touch. ("He's never too hard to track down, and he's incredible about returning e-mails," Bisciglia says.) 34 在突发灵感想到“云”计划并直接和老板详细讨论之前,比希利亚并没多加考虑。他和施米特 已相识数年。他 5 年前刚入职谷歌时还只是一个年仅 22 岁的程序员,其工位就在首席执行官的办公室附近 。比希利亚回忆说,他走进办公室时被一架飞机模型所吸引,这让他想起母亲在美国联合航空公司从事的空乘工作。自然而然地,他与话语温和、有学者派头的施米特 聊起了数据计算,那种感觉就像在大学一样 。后来虽然比希利亚搬到了其他办公楼,但两人仍然保持着联系。 35 On the day they first discussed Google 101 , Schmidt offered one nugget of advice: Narrow down the project to something Bisciglia could have up and running in two months . "I actually didn't care what he did," Schmidt recalls. But he wanted the young engineer to get feedback in a hurry. Even if Bisciglia failed, he says, "he's smart, and he'd learn from it." 36 在他们第一次讨论“谷歌 101 ”计划的那一天,施米特 提出了很好的建议:把项目缩减到比希利亚能在两个月内完成的规模 。 “我实际上没太在意他的话,”施米特回忆说,但是他想尽快给这位年轻的工程师发出反馈。他说,即使比希利亚失败了,但“他很聪明,一定能从失败中获得经验”。 37 To launch Google 101, Bisciglia had to replicate the dynamics and a bit of the magic of Google's cloud—but without tapping into the cloud itself or revealing its deepest secrets. These secrets fuel endless speculation among computer scientists. But Google keeps much under cover. 
This immense computer, after all, runs the company. It automatically handles search, places ads, churns through e-mails. The computer does the work, and thousands of Google engineers, including Bisciglia, merely service the machine. They teach the system new tricks or find new markets for it to invade. And they add on new clusters—four new data centers this year alone, at an average cost of $600 million apiece. 38 要顺利启动“谷歌 101 ”计划,比希利亚 必须把项目的来龙去脉和谷歌“云”的些许魔力透露给合作对象,同时又不能深入“云”本身或揭示出核心机密。 这些机密会激发计算机学家无穷无尽的思考, 谷歌对此守口如瓶,毕竟这台“超级计算机”是公司运营的支柱,它能自动处理搜索、放置广告、传递电子邮件等业务。 计算机在从事这些工作,而包括比希利亚在内的上千名谷歌工程师仅仅只是“服侍”着它。他们“教授”系统新的技术或为它寻找新的主攻市场,同时在其中添加新的集群 , 2007 年一年就增加了 4 个新的数据中心,平均每个成本达 6 亿美元。 39 In building this machine, Google, so famous for search, is poised to take on a new role in the computer industry . Not so many years ago scientists and researchers looked to national laboratories for the cutting-edge research on computing. Now, says Daniel Frye, vice-president of open systems development at IBM, "Google is doing the work that 10 years ago would have gone on in a national lab." 40 在搭建这台“计算机”的过程中,在搜索领域名声大震的谷歌随时准备扮演计算机业的新角色 。不久之前,科学家和研究人员曾期望国家实验室能启动数据计算方面的前沿研究。如今, IBM 负责开放系统开发的副总裁 丹尼尔·弗赖感叹 :“谷歌现在做的事情 10 年前只有在国家实验室才能实现。” 41 How was Bisciglia going to give students access to this machine? The easiest option would have been to plug his class directly into the Google computer. But the company wasn't about to let students loose in a machine loaded with proprietary software, brimming with personal data, and running a $10.6 billion business. So Bisciglia shopped for an affordable cluster of 40 computers. He placed the order, then set about figuring out how to pay for the servers. While the vendor was wiring the computers together, Bisciglia alerted a couple of Google managers that a bill was coming. Then he "kind of sent the expense report up the chain, and no one said no." He adds one of his favorite sayings: "It's far easier to beg for forgiveness than to ask for permission." 
("If you're interested in someone who strictly follows the rules, Christophe's not your guy," says Lazowska, who refers to the cluster as "a gift from heaven." ) 42 那么,比希利亚如何让学生们访问这台机器呢?最容易的方案当然是直接从学校连接专线到谷歌服务器。然而公司并不准备彻底放手让学生们随意访问这台装有授权软件、存储着私人信息以及运营着 106 亿美元业务的计算机。 比希利亚因此购买了价位适中的 40 台计算机 组成集群 。他 发出订单 后 开始琢磨 如何给这些服务器付钱。 就在卖家组装电脑集群时,比希利亚告诉谷歌的几名经理将出现一大笔账单。之后他“拿着花销报告从下到上请示了一通,结果没人反对”。说到这里,他又加上自己喜欢的一句格言:“请求原谅比寻求批准容易得多。” 43 A FRENETIC LEARNER 44 狂热的学习者 45 On Nov. 10, 2006, the rack of computers appeared at U-Dub's Computer Science building. Bisciglia and a couple of tech administrators had to figure out how to hoist the 1-ton rack up four stories into the server room. They eventually made it, and then prepared for the start of classes, in January. 46 2006 年 11 月 10 日, 排成阵列的计算机群 出现在华盛顿大学计算机科学学院的教学楼里。比希利亚和几个技术负责人 得想办法 把将近 1 吨重的机柜抬上 4 层放到机房里。他们最终解决了这个问题,并准备在第二年 1 月开始上课。 47 Bisciglia's mother, Brenda, says her son seemed marked for an unusual path from the start. He didn't speak until age 2, and then started with sentences. One of his first came as they were driving near their home in Gig Harbor, Wash. A bug flew in the open window, and a voice came from the car seat in back: "Mommy, there's something artificial in my mouth." 48 比希利亚的母亲布伦达说,她的儿子似乎 从小就注定要走一条不平凡的道路 。他直到两岁才开口说话,但很快就开始成句成句地说。最早的一次是家人开车行至离家不远的华盛顿吉格港时,一只小虫子从打开的车窗飞进来,只听到从后排座传来比利亚的声音:“妈妈,有一件物体在我嘴里。” 49 At school, the boy's endless questions and frenetic learning pace exasperated teachers. His parents, seeing him sad and frustrated, pulled him out and home-schooled him for three years. Bisciglia says he missed the company of kids during that time but developed as an entrepreneur. He had a passion for Icelandic horses and as an adolescent went into business raising them. Once, says his father, Jim , they drove far north into Manitoba and bought horses, without much idea about how to transport the animals back home. "The whole trip was like a scene from one of Chevy Chase's movies," he says. 
Christophe learned about computers developing Web pages for his horse sales and his father's luxury-cruise business. And after concluding that computers promised a brighter future than animal husbandry, he went off to U-Dub and signed up for as many math, physics, and computer courses as he could. 50 在学校里,这个男孩没完没了的提问和飞快的学习进度 惹恼了老师。父母看到他很伤心、很受挫,便把他带回家教了 3 年。比希利亚说,那段时间 他失去了很多小伙伴,但是学会了如何成为一个生意人 。他对冰岛野马兴趣浓厚,并在十六七岁时投身到养马行当。他的父亲吉姆回忆道,一次,他们开车一直向北行驶到马尼托巴买了马匹,却并没有考虑如何把它们运回家。“整个旅行就像塞维·蔡斯电影里的场景,”他说。比希利亚 学会了用计算机为他的贩马事业和父亲的豪华游艇业务制作网页 。比希利亚 断定 计算机比养马 更有前途 ,因此义无反顾地 报考了 华盛顿大学,并 选修了 尽可能多的学科,包括数学、物理和计算机相关学科。 51 In late 2006, as he shuttled between the Googleplex and Seattle preparing for Google 101, Bisciglia used his entrepreneurial skills to piece together a sprawling team of volunteers. He worked with college interns to develop the curriculum, and he dragooned a couple of Google colleagues from the nearby Kirkland (Wash.) facility to use some of their 20% time to help him teach it. Following Schmidt's advice, Bisciglia worked to focus Google 101 on something students could learn quickly. " I was like, what's the one thing I could teach them in two months that would be useful and really important?" he recalls. His answer was "MapReduce." 52 2006 年年末, 当比希利亚 往返于 谷歌大厦和 西雅图 之间筹备“谷歌 101 ”计划时 ,他 运用生意人的技巧,招募了一支组织松散的志愿者队伍。他和学院的实习生一起设计课程,还在谷歌公司位于学校附近的华盛顿州 科克兰德分部 拉拢部分同事,让他们抽出 20% 的时间来帮忙教课 。比希利亚听从了施米特的建议,把“谷歌 101 ”集中在学生们在学习过程中 容易上手的方面 。“我基本想的是, 什么课程 我能 在两个月里教会他们 ,同时又真正有用和重要?”他回忆道 。最终他的答案是 MapReduce 。 53 Bisciglia adores MapReduce, the software at the heart of Google computing. While the company's famous search algorithms provide the intelligence for each search, MapReduce delivers the speed and industrial heft. It divides each task into hundreds, or even thousands, of tasks, and distributes them to legions of computers. In a fraction of a second, as each one comes back with its nugget of information, MapReduce quickly assembles the responses into an answer. Other programs do the same job. 
But MapReduce is faster and appears able to handle near-limitless work. When the subject comes up, Bisciglia rhapsodizes. "I remember graduating, coming to Google, learning about MapReduce, and really just changing the way I thought about computer science and everything," he says. He calls it "a very simple, elegant model." It was developed by another Washington alumnus, Jeffrey Dean. By returning to U-Dub and teaching MapReduce, Bisciglia would be bringing this software "and this way of thinking" back to its roots. 54 比希利亚十分推崇 MapReduce,这是谷歌数据计算的核心软件。公司著名的搜索算法为每一次搜索提供信息,MapReduce 则传递出速度。它把每个任务分解为成百甚至上千块小任务,然后发送到计算机集群中。眨眼之间,每台计算机传送回自己的那部分信息,MapReduce 则迅速整合这些反馈并形成答案。虽然也有一些技术具有同样的功能,但 MapReduce 速度更快且显示出几乎可以解决无限任务的能力。提到 MapReduce,比希利亚变得十分兴奋和狂热:“我记得刚毕业时来到谷歌学习 MapReduce,这的的确确改变了我对计算机科学乃至所有事情的想法。”他把该软件称为“非常简单却极其卓越的模型”。这个软件是由其华盛顿大学校友杰弗里·迪安开发的。因此通过回到母校教授 MapReduce,比希利亚会将这个软件和“这种思考方式”带回源头。 55 There was only one obstacle. MapReduce was anchored securely inside Google's machine, and it was not for outside consumption, even if the subject was Google 101. The company did share some information about it, though, to feed an open-source version of MapReduce called Hadoop. The idea was that, without divulging its crown jewel, Google could push for its standard to become the architecture of cloud computing. 56 只有一个阻碍。MapReduce 曾经安全地“沉寂”在谷歌主机中,而且不允许外界使用,对于“谷歌 101”项目也一视同仁。谷歌曾拿出一部分相关信息与他人共享,以开发开源版本“Hadoop”。当时的想法是在不泄露核心技术的前提下,推动自身的标准成为“云”计算的体系结构。 57 The team that developed Hadoop belonged to a company, Nutch, that got acquired. Oddly, they were now working within the walls of Yahoo, which was counting on the MapReduce offspring to give its own computers a touch of Google magic. Hadoop remained open source, though, which meant the Google team could adapt it and install it for free on the U-Dub cluster. 
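The divide, distribute, and assemble pattern that the article attributes to MapReduce (and to its open-source offspring Hadoop) can be illustrated with a toy word count. This is only a minimal single-process sketch in Python, not Google's MapReduce or Hadoop's actual API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, as well as the sample sentences, are assumptions made up for illustration. On a real cluster each chunk would be handed to a different machine rather than processed in a loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map step: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key so each key is reduced exactly once."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk is mapped independently of the others; this independence is
    # what would let a cluster run the map calls in parallel (assumed here,
    # not demonstrated: this sketch runs them one after another).
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud is big", "the web is tiny", "the cloud grows"]
    print(word_count(docs))
```

The design point is that `map_phase` never looks at more than its own chunk and `reduce_phase` never looks at more than one key, so both steps can be spread over many machines without coordination beyond the grouping step.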
58 开发 Hadoop 的团队属于一家名为 Nutch 的公司。说也奇怪,这家公司现在归入雅虎麾下,雅虎希望依靠 MapReduce 的衍生产物 给自己的数据计算 提供一点谷歌“云”的魔力。 好在 Hadoop 仍然保持开源状态,这意味着谷歌团队能对其加以应用并 可免费安装在 华盛顿大学的计算机集群中。 59 Students rushed to sign up for Google 101 as soon as it appeared in the winter-semester syllabus. In the beginning, Bisciglia and his Google colleagues tried teaching . But in time they handed over the job to professional educators at U-Dub. "Their delivery is a lot clearer," Bisciglia says . Within weeks the students were learning how to configure their work for Google machines and designing ambitious Web-scale projects , from cataloguing the edits on Wikipedia to crawling the Internet to identify spam. Through the spring of 2007, as word about the course spread to other universities, departments elsewhere started asking for Google 101 . 60 “谷歌 101 ”一出现在冬季学期的课程安排中,学生们立即蜂拥而来选修这门课程。 起初比希利亚和谷歌的同事们尝试自己教课,不过后来他们及时地把这一工作转交给华盛顿大学的专职教员。“他们的讲解更加清晰,”比希利亚表示。 接下来的几周里,学生们学习如何调整自己的程序来适应谷歌计算机,并雄心勃勃地设计开发 网络规模 的项目,这些项目 涵盖了 从 维基百科 的 编辑分类 到 互联网 垃圾邮件的鉴别处理 等各个方面。 2007 年的整个春天,有关这门课程的消息不胫而走,其他大学的院系也开始要求参与“谷歌 101 ”计划。 61 Many were dying for cloud knowhow and computing power — especially for scientific research. In practically every field, scientists were grappling with vast piles of new data issuing from a host of sensors, analytic equipment, and ever-finer measuring tools. Patterns in these troves could point to new medicines and therapies, new forms of clean energy. They could help predict earthquakes. But most scientists lacked the machinery to store and sift through these digital El Dorados. "We're drowning in data," said Jeannette Wing, assistant director of the National Science Foundation. 62 很多人 迫切渴望了解“云”的相关知识和计算能力 ,特别是 在科研方面的计算 。实际上在每个领域,从各种传感器、分析设备以及先进的测量工具产生的 大量新数据 浩如烟海,让科学家们大伤脑筋。这些数据可能用于开发新药品和疗法、制造新的清洁能源、甚至预测地震,然而 绝大多数 科学家 缺少设备来 存储和筛检 这些“数据宝藏”。 “我们真是被淹没在了数据里,”美国国家科学基金会的助理主任周以真( Jeannette Wing )表示。 63 BIG BLUE LARGESSE 64 IBM 的慷慨 65 The hunger for Google computing put Bisciglia in a predicament. 
He had been fortunate to push through the order for the first cluster of computers. Could he do that again and again, eventually installing mini-Google clusters in each computer science department? Surely not. To extend Google 101 to universities around the world, the participants needed to plug into a shared resource. Bisciglia needed a bigger cloud. 66 对谷歌 计算能力 的巨大需求 倒是把比希利亚难住了。他能完成第一批 计算机集群 的采购安装已经算是很幸运了,可是他能像这样一次又一次、最终在 每个计算机学院 都装上一个 微型的谷歌“云” 吗?当然不现实。 为了把“谷歌 101 ”计划 扩展到 全球各地的大学,各 参与方 必须要 接入到共享的资源 中。因此比希利亚需要一个更大的“云”集群 。 67 That's when luck descended on the Googleplex in the person of IBM Chairman Samuel J. Palmisano. This was "Sam's day at Google," says an IBM researcher. The winter day was a bit chilly for beach volleyball in the center of campus, but Palmisano lunched on some of the fabled free cuisine in a cafeteria. Then he and his team sat down with Schmidt and a handful of Googlers, including Bisciglia. They drew on whiteboards and discussed cloud computing. It was no secret that IBM wanted to deploy clouds to provide data and services to business customers . At the same time, under Palmisano, IBM had been a leading promoter of open-source software, including Linux. This was a key in Big Blue's software battles, especially against Microsoft. If Google and IBM teamed up on a cloud venture , they could construct the future of this type of computing on Google-based standards, including Hadoop. 68 幸运之神随着 IBM 董事长彭明盛突访谷歌大厦而降临。这天成了“谷歌的彭明盛日”,一位 IBM 的研究员表示。那是一个冬日,如果要在谷歌园区里来场沙滩排球可能会有点寒冷,不过 彭明盛中午在谷歌的餐厅 体验到了 传说中的 免费大餐 。随后,他和他的团队与施米特以及包括比希利亚在内的十几名谷歌工程师 座谈交流 ,他们在白板上写写画画、讨论着“云计算”。 IBM 一直希望 部署“云”系统 来为企业客户 提供数据和服务 。与此同时,在彭明盛的领导下, IBM 已经成为 Linux 系统等开源软件的领先倡导者。这可是蓝色巨人在软件战役中的重点,尤其是在对抗微软的战斗中。 如果谷歌和 IBM 在“云”上合作,它们可能共创这种基于谷歌标准(包括 Hadoop 版本)的“云计算”的未来 。 69 Google, of course, had a running start on such a project: Bisciglia's Google 101. In the course of that one day, Bisciglia's small venture morphed into a major initiative backed at the CEO level by two tech titans. 
By the time Palmisano departed that afternoon, it was established that Bisciglia and his IBM counterpart, Dennis Quan, would build a prototype of a joint Google-IBM university cloud. 70 谷歌当然已在这个项目上先行一步,即比希利亚的“谷歌 101”计划。就在会面的当天,比希利亚小小的实践成为由两家技术巨头的首席执行官支持的一项重大计划的开端。当彭明盛那天下午离开谷歌时,比希利亚和 IBM 公司的丹尼斯·全就被指派组建谷歌-IBM 的联合大学“云”的原型。 71 Over the next three months they worked together at Google headquarters. (It was around this time, Bisciglia says, that the cloud project evolved from a 20%-time project into his full-time job.) The work involved integrating IBM's business applications and Google servers, and equipping them with a host of open-source programs, including Hadoop. In February they unveiled the prototype for top brass in Mountain View, Calif., and for others on video from IBM headquarters in Armonk, N.Y. Quan wowed them by downloading data from the cloud to his cell phone. (It wasn't relevant to the core project, Bisciglia says, but a nice piece of theater.) 72 在接下来的 3 个月中,他们在谷歌总部并肩作战。(比希利亚说,从那时起“云”计划从“20% 时间”变成了他的全职工作。)他们的主要工作是把 IBM 的商用软件和谷歌的服务器进行整合,并装配大量包括 Hadoop 在内的开源程序。2007 年 2 月,他们在加州山景城向高层领导、同时通过视频向位于纽约阿蒙克的 IBM 总部人员首次展示项目原型。丹尼斯·全用手机从“云”集群中下载数据,让在场人员赞叹不已。(比希利亚说,虽然与核心项目关系不大,但这的确是场很精彩的演出。) 73 The Google 101 cloud got the green light. The plan was to spread cloud computing first to a handful of U.S. universities within a year and later to deploy it globally. The universities would develop the clouds, creating tools and applications while producing legions of computer scientists to continue building and managing them. 74 “谷歌 101”计划获得了通过。这一计划是首先将“云计算”用一年时间扩展到全美的多家大学,之后在全球部署。各所大学将会继续开发“云”,创建工具和应用程序,同时培养出大批的计算机科学家来继续建设和管理“云”。 75 Those developers should be able to find jobs at a host of Web companies, including Google. Schmidt likes to compare the data centers to the prohibitively expensive particle accelerators known as cyclotrons. "There are only a few cyclotrons in physics," he says. 
"And every one of them is important, because if you're a top-flight physicist you need to be at the lab where that cyclotron is being run. That's where history's going to be made; that's where the inventions are going to come. So my idea is that if you think of these as supercomputers that happen to be assembled from smaller computers, we have the most attractive supercomputers, from a science perspective, for people to come work on." 76 那些开发者应该能在谷歌这样的网络公司找到工作。施米特喜欢把这些数据中心比作极其昂贵的粒子回旋加速器。“物理界只有几台粒子加速器,”他说,“每一台都十分重要,因为你如果是个顶尖的物理学家,你需要在有粒子加速器运行的实验室工作。那才是创造历史的地方,那才是诞生发明的地方。所以我想,假如你把‘云’当作由小型计算机群组成的超级计算机,那么从科学观点讲,我们拥有最能吸引人才的那种计算机。” 77 As the sea of business and scientific data rises, computing power turns into a strategic resource, a form of capital. "In a sense," says Yahoo Research Chief Prabhakar Raghavan, "there are only five computers on earth." He lists Google, Yahoo, Microsoft, IBM, and Amazon. Few others, he says, can turn electricity into computing power with comparable efficiency. 78 随着商用和科学数据量日益壮大,数据计算能力转变成一种战略资源和一种资本。“从某种意义上说,”雅虎研究主管普拉巴卡·拉加万表示,“世界上不过有 5 台真正的计算机。”他意指谷歌、雅虎、微软、IBM 和亚马逊这几家公司。他表示,除此之外,没有哪家能有相似的实力把“电流”转化为数据计算能力。 79 All sorts of business models are sure to evolve. Google and its rivals could team up with customers, perhaps exchanging computing power for access to their data. They could recruit partners into their clouds for pet projects, such as Google's clean energy initiative, announced in November. With the electric bills at jumbo data centers running upwards of $20 million a year, according to industry analysts, it's only natural for Google to commit both brains and server capacity to the search for game-changing energy breakthroughs. 80 毫无疑问,各种商业模式都会进化。谷歌及其对手能与客户合作,或许是通过交换计算能力来取得数据的访问权。它们可以在“云”中引入合作伙伴进行次级项目的开发,比如谷歌在 2007 年 11 月宣布的清洁能源计划。行业分析家表示,大型数据中心的电费开支每年超过 2000 万美元,因此谷歌自然会投入其智力资源和服务器容量,以图在寻找创新能源方面取得突破。 81 What will research clouds look like? 
Tony Hey, vice-president for external research at Microsoft, says they'll function as huge virtual laboratories, with a new generation of librarians—some of them human—"curating" troves of data, opening them to researchers with the right credentials. Authorized users, he says, will build new tools, haul in data, and share it with far-flung colleagues. In these new labs, he predicts, "you may win the Nobel prize by analyzing data assembled by someone else." Mark Dean, head of IBM's research operation in Almaden, Calif., says that the mixture of business and science will lead, in a few short years, to networks of clouds that will tax our imagination . "Compared to this," he says, "the Web is tiny. We'll be laughing at how small the Web is." And yet, if this "tiny" Web was big enough to spawn Google and its empire, there's no telling what opportunities could open up in the giant clouds . 82 用于研究的“云” 会是什么样子? 微软 负责外部研究的副总裁 Tony Hey 介绍,他们 把“云”建成 大型的虚拟 实验室,应用新一代 管理程序 配以适当人工来 管理数据 ,分级别适度开放给研究人员。 他说,授权用户将开发出新工具、补充新数据,并与各地同事广泛共享。据他预测,在这些新的实验室中,“你可以通过 分析 从别人那里汇集来的 数据 赢取诺贝尔奖”。位于加州 Almaden 的 IBM 研究运营部门负责人马克·迪安表示, 短短几年内,商用和科学用途的 结合 将产生“云”网络,我们尽可放开想象 。“与这个相比,”他说,“现在的网络微不足道。我们将会嘲笑 现在的网络实在太小 。”然而, 如果这个“太小”的网络对于谷歌帝国发展都已经足够大了,那么没人能预测巨型的“云”网络可以提供什么样的机会 。 83 It's a mid-November day at the Googleplex. A jetlagged Christophe Bisciglia is just back from China , where he has been talking to universities about Google 101. He's had a busy time, not only setting up the cloud with IBM but also working out deals with six universities—U-Dub, Berkeley, Stanford, MIT, Carnegie Mellon, and the University of Maryland —to launch it. Now he's got a camera crew in a conference room, with wires and lights spilling over a table. This is for a promotional video about cloud education that they'll release, at some point, on YouTube (GOOG). 
84 2007 年 11 月的一天,比希利亚刚刚从中国回到美国,还没来得及倒时差。他在中国的几所大学介绍了“谷歌 101 ”的计划。他的时间表排得满满当当, 不仅要和 IBM 共同建立“云”集群,还需要处理华盛顿大学、加利福尼亚大学伯克利分校、斯坦福大学、麻省理工学院、卡内基 - 梅隆大学以及马里兰大学等 6 所高校“云”计划启动的相关事宜 。此时,他把一位摄像师请到会议室,线缆和灯光占满了整张桌子。他将录制一段有关“云”教学的宣传片,这段视频或许会 选择在 YouTube 网站 发布。 85 Eric Schmidt comes in. At 52, he is nearly twice Bisciglia's age, and his body looks a bit padded next to his protégé's willowy frame. Bisciglia guides him to a chair across from the camera and explains the plan. They'll tape the audio from the interview and then set up Schmidt for some stand-alone face shots. "B-footage," Bisciglia calls it. Schmidt nods and sits down. Then he thinks better of it. He tells the cameramen to film the whole thing and skip stand-alone shots. He and Bisciglia are far too busy to stand around for B footage. 86 埃里克·施米特走了进来。 52 岁的他几乎比比希利亚年长一倍,与爱徒清瘦的体格相比,他显得强壮不少。比希利亚 把他引到摄像机面前并解释 这个宣传计划。他们会录制采访中的音频,再拍几个施米特的单人正面镜头。比希利亚把这些镜头称为“ B 级胶片”。施米特同意了计划并落座下来。他 想到了更好的主意 ,他告诉摄像师 拍下全景 而省略单人镜头。的确,他和比希利亚 忙得怎么有时间单干 呢?
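1+1>2, everything is conditional... What have several thousand years of clinical use of traditional Chinese herbal medicine, a treasure of the Chinese nation, actually shown? Broadly speaking, that mixtures work: they let the synergy between drugs come through and may achieve multi-target therapy. Diseases are complex. What triggers diabetes, cardiovascular disease, or obesity? Genes, the environment, or gene-environment interactions? Now look at our data analysis. In the currently popular Genome-Wide Association Studies (GWAS), the medical community's standard is single-variable analysis (with the obligatory p-value, of course). A large number of papers in top journals report SNPs or loci associated with some disease on the basis of single-variable analysis. Are such results meaningful, and if so, how meaningful? The vast majority of diseases are not caused by a single gene but by several genes acting together. Of course, studying every combination of variables (genes) is not feasible at this stage, but couldn't the analysis methods advance at least a little and take the joint effects of a few variables into account? The literature contains scattered reports along these lines, but they are by no means mainstream... I look forward to better methods and a better understanding of the data...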
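To make the concern about single-variable analysis concrete, here is a deliberately extreme toy example in Python. It is a sketch under made-up assumptions, not real GWAS data or a real analysis pipeline: the cohort, the two binary variants, and the helper names `rate`, `marginal_rates`, and `joint_rates` are all hypothetical. The phenotype appears only when exactly one of the two variants is carried, so each variant looks uninformative on its own while the pair explains the phenotype completely.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, int]  # (snp_a, snp_b, phenotype)

# Hypothetical balanced cohort: each genotype pair appears 25 times, and the
# phenotype is present only when exactly one variant is carried (a XOR b).
cohort: List[Row] = [
    (a, b, a ^ b) for a, b in product([0, 1], repeat=2) for _ in range(25)
]


def rate(values: Iterable[int]) -> float:
    """Fraction of individuals showing the phenotype."""
    items = list(values)
    return sum(items) / len(items)


def marginal_rates(rows: List[Row], snp_index: int) -> Dict[int, float]:
    """Phenotype rate split by one SNP only, ignoring the other SNP."""
    return {
        g: rate(r[2] for r in rows if r[snp_index] == g) for g in (0, 1)
    }


def joint_rates(rows: List[Row]) -> Dict[Tuple[int, int], float]:
    """Phenotype rate for every combination of the two SNPs."""
    return {
        (a, b): rate(r[2] for r in rows if r[0] == a and r[1] == b)
        for a, b in product([0, 1], repeat=2)
    }


if __name__ == "__main__":
    print("by SNP A:", marginal_rates(cohort, 0))
    print("by SNP B:", marginal_rates(cohort, 1))
    print("jointly: ", joint_rates(cohort))
```

In this constructed cohort a one-variable-at-a-time scan sees the same phenotype rate for carriers and non-carriers of either variant, whereas the two-variable table separates the groups perfectly. Real effects are rarely pure interactions like this, so the example only shows that such signals can exist, not how common they are.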