博文

介绍国外几位生物信息学家(8)~~Ron Shamir

已有 6175 次阅读 2011-5-24 10:19 |个人分类:生物信息|系统分类:科研笔记|关键词:学者| 生物信息学

Ron Shamir, 1953年生于Jerusalem, 是Tel Aviv University计算机科学学校的教授。他在Hebrew University, Jerusalem获得了数学和物理学学士学位，在University of California at Berkelay的运筹学专业获博士学位。他是把复杂巧妙的图论技巧应用于生物信息学的先驱。Shamir的研究兴趣总是围绕着设计和算法分析，他的博士论文研究是线性规划算法，而后他发展了具有强烈趣味的图论算法，经过了曲折的历程，最终把他引向了计算生物学。回到1990年，Shamir与Martin Golumbic合作研究了时序推理(temporal reasoning)相关的问题。在这些问题中，对于一个事件的集合，每一个都与时间轴的区间联系。在两个事件之间，具有一系列的限制关系，比如这些区间是否重叠。需要确定的是在时间轴上是否存在一个区间的事件集合来满足所有的限制。这是当Seymaour Benzer碰到试图确立基因线性的问题时的一般化抽象，可是在那个时候，Shamir还不知道Benzer的工作。他说：

“我在1990年的一次会议上发表了时序推理的工作，而后Gene Lawler告诉我这个模型和DNA绘制问题符合的相当好。我那时对DNA什么都不知道，就开始阅读学习，我还给Eric Lander描述了有绘制数据的时序推理的结果。一些星期之后，Rutgers University组织了一个在计算生物学的开放日。Mike Waterman和Eric Lander来做了非常精彩的基因组计划和计算方面挑战的报告。Eric甚至提到我们的结果是计算机科学对这一领域做贡献的一个重要的例子。我认为他非常夸大了这一点，而这两个报告驱使我开始阅读生物学的教课书和文章。在第一年，我得到了我妻子Michal非常大的帮助，她恰好是一位生物学家，耐心地解答了我许多不起眼的问题。”

这个偶然的会议，Gene Lawer和Eric Lander把Shamir变成了生物信息学家。从1991年他在计算生物学的算法上投入了越来越多的时间。他仍然对图算法问题感兴趣，经常应用图理论技巧解决生物学问题。近些年，他趋向于把离散数学技巧和概率论结合起来，Ron说

“我仍然关注问题的复杂性和算法，也关心和热爱对一个问题NP完全性的证明。而生物学的训练使我当理论计算机科学方法论只能带你走那么远时，而数据却需要的更多的时候，‘容忍’启发式的方法。”

Shamir早期在生物信息学中提倡严密的聚类算法。虽然聚类算法理论在六十年代已经发展起来了，他对聚类的兴趣是在九十中期作为和在柏林的Max Plank Institute的Hans Lehrach的合作的一部分开始的。Lehrach是研究寡核苷酸指纹图谱技术之父，这个技术提早了DNA芯片技术。Lehrach在柏林有一个“工厂”，它能够排列成千的未知cDNA克隆，而后与k-mer(长度为k的DNA序列）探针进行杂交（一般是8-mers或9-mers）这些重复成打的探针，给出每一个cDNA克隆一个指纹图谱。为寻找哪些克隆对应相同的基因，就必须根据指纹图谱，对他们进行聚类，数据中的噪声水平使得精确聚类是个非常大的挑战。

正是这个挑战让Shamir开始思考聚类。由于聚类领域是非常混杂的，散布在不同的领域，他首先开始发展他的算法，而不是学习文献。很自然的，Shamir凭借图论算法，用聚类的单元（在指纹图谱的应用中是克隆）作为节点，被边连在一起的克隆是具有相似的指纹图谱，自然下一步是根据最小分割来重复分割图（最小分割就是那些去除之后使图失去连接的边的集合）。由此他的第一个聚类算法诞生了。事实上这个想法对Shamir来说那么自然，他的学生Erez Hartuv用了很长的时间查找文献，只是确认它还没有被发表，他们没有发现相同的方法，而是发现十年前有类似的想法，对Shamir来说：

“对于普遍存在的聚类问题，我一点也不奇怪在未来发现一篇考古，动物学或其他领域的旧文章有相同的想法。”

然后，在权衡了这个方法的利与弊之后，与他的学生Roded Sharan, Shamir改进了算法，加入了缜密的统计学分析，建立了这个广受欢迎的聚类算法CLICK,由于微阵列表达谱的大量数据的发表，这给它提供了巨大的机会，激发了Shamir与Amir Ben-Dor和Zoha Yakhini进一步开发了PCC和CAST。

Shimir是一位非常少的能够在算法和生物信息学的科研中领军人物的科学家之一，他在成为一个生物信息学家之前已经是算法领域有突破贡献的科学家，他说：

“可能在我的科学生涯中，最令我感到兴奋的时刻是我的1983年博士的结果，证明了Simplex算法的平均复杂性是二次的。Simplex算法是很常用的算法之一，在科学和技术领域差不多有成千的应用实例。从1950年开始，Simplex算法被认为在实践中非常有效率，过去和现在被普遍的应用着。可是Simplex对不同的变种的最差时间是指数的，有多个例子，没有人证明它有效率。这是一个尴尬的处境，解决它是非常可喜的。我与导师Ilan Ader(凸规划方向的领军人物，是Simplex算法之父Dantzig的学生）和Richard Karp（伟大理论计算机学家之一）一起对它进行了工作，这个合作是极端令人兴奋的。类似的结果在同一时间由其他的研究组同时得到。其它令人振奋的时刻，特别是在某天发现了一些非常吃惊的新结果，而第二天又发现其中的错误的之间的那些时刻，可这些正是科学研究懊恼的一种回报。”

Shimir把自己定位于不只投入研究一个特定的问题。在大学的设置中，科学研究的优势在于自由的选择某人的兴趣和嗅觉，而不是在同一问题中强迫工作很多年。好在生物学令人兴奋的快速进展提出了连续不断的新问题、想法和数据。当然，研究者不能因此呼吸影响研究深度的有害空气。实际上，除了“不投入”的想法，Ron发现自己在相同的科研领域持续工作几年的时间，跟踪这些领域的文献一段时间。他趋向于在几个不同领域的问题同时工作，并试图找出它们的一些交叉，这样就省去了理解和跟踪几个领域的努力，Shamir说：

“就我的观点，一个开放的头脑，敏感的嗅觉，在相关领域的坚实基础，以及努力的工作是科学发现中最重要的因素。就象法官说的那样，‘灵感只是出现在法律图书馆的凌晨三点时’，这在科学中更加真实。灵活性和敏感性对意外的发现是非常重要的。”

Shamir认为基因网络的研究是理解基因、蛋白质和其它分子如何象开演奏会一样工作的，这个问题将是一个长期的挑战。

原文：

   Ron Shamir (born 1953 in Jerusalem) is a professor at the School of Computer Science at Tel Aviv University. He holds an undergraduate degree in mathematics and physics from Hebrew University, Jerusalem and a PhD in operations research from the University of California at Berkeley. He was a pioneer in the application of sophisticated graphtheoretical techniques in bioinformatics. Shamir's interests have always revolved around the design and analysis of algorithms and his PhD dissertation was on studies of linear programming algorithms. Later he developed a keen interest in graph algorithms that eventually brought him (in a roundabout way) to computational biology. Back in 1990 Shamir was collaborating with Martin Golumbic on problems related to temporal reasoning. In these problems, one has a set of events, each modeled by an unknown interval on the timeline, and a collection of constraints on the relations between each two events, for example, whether two intervals overlap or not. One has to determine if there exists a realization of the events as intervals on the line, satisfying all the constraints. This is a generalization of the problem that Seymour Benzer faced trying to establish the linearity of genes, but at the time Shamir did not know about Benzer's work. He says:
    I presented my work on temporal reasoning at a workshop in 1990 and the late Gene Lawler told me this model fits perfectly the DNA mapping problem. I knew nothing about DNA at the time but started reading, and I also sent a note to Eric Lander describing the temporal reasoning result with the mapping consequences. A few weeks later Rutgers University organized a kickoff day on computational biology and brought MikeWaterman and Eric Lander who gave fantastic talks on the genome project and the computational challenges. Eric even mentioned our result as an important example of computer science contribution to the field. I think he was exaggerating quite a bit on this point, but the two talks sent me running to read biology textbooks and papers. I got a lot of help during the first years from my wife Michal, who happens to be a biologist and patiently answered all my trivial questions.
    This chance meeting with Gene Lawler and Eric Lander converted Shamir into a bioinformatician and since 1991 he has been devoting more and more time to algorithmic problems in computational biology. He still keeps an interest in graph algorithms and often applies graph-theoretical techniques to biological problems. In recent years, he has tended to combine discrete mathematics techniques and probabilistic reasoning. Ron says:
    I still worry about the complexity of the problems and the algorithms, and would care about (and enjoy) the NP-hardness proof of a new problem, but biology has trained m e to be "Äútolerant‚" to heuristics when the theoretical computer science methodologies only take you so far, and the data require more.
    Shamir was an early advocate of rigorous clustering algorithms in bioinformatics. Although clustering theory has been under development since the 1960s, his interest in clustering started in themid-1990s as part of a collaboration with Hans Lehrach of the Max Planck Institute in Berlin. Lehrach is one of the fathers of the oligonucleotide fingerprinting technology that predates the DNA chips techniques. Lehrach had a "factory"in Berlin that would array thousands of unknown cDNA clones, and then hybridize them to k-mer probes (typically 8-mers or 9-mers). This was repeated with dozens of different probes, giving a fingerprint for each cDNA clone. To find which clones correspond to the same gene, one has to cluster them based on their fingerprints, and the noise level in the data makes accurate clustering quite a challenge.
    It was this challenge that brought Shamir to think about clustering. As clustering theory is very heterogeneous and scattered over many different fields, he set out to develop his algorithm first, before learning the literature. Naturally, Shamir resorted to graph algorithms, modeled the clustered elements (clones in the fingerprinting application) as vertices, and connected clones with highly similar fingerprints by edges. It was a natural next step to repeatedly partition the graph based on minimum cuts (sets of edges whose removal makes the graph disconnected) and hence his first clustering algorithm was born. Actually the idea looked so natural to Shamir that he and his student Erez Hartuv searched the literature for a long time just to make sure it had not been published before. They did not find the same approach 382 10 Clustering and Trees but found similar lines of thinking in works a decade earlier. According to Shamir,
    Given the ubiquity of clustering, I would not be surprised to unearth in the future an old paper with the same idea in archeology, zoology, or some other field.
    Later, after considering the pros and cons of the approach, together with his student Roded Sharan, Shamir improved the algorithm by adding a rigorous probabilistic analysis and created the popular clustering algorithm CLICK. The publication of vast data sets of microarray expression profiles was a great opportunity and triggered Shamir to further develop PCC and CAST together with Amir Ben-Dor and Zohar Yakhini.
    Shamir is one of the very few scientists who is regarded as a leader in both algorithms and bioinformatics andwho was creditedwith an important breakthrough in algorithms even before he became a bioinformatician. He says:
    Probably the most exciting moment inmy scientific life was my PhD result in 1983, showing that the average complexity of the Simplex algorithm is quadratic. The Simplex algorithm is one of the most commonly used algorithms, with thousands of applications in virtually all areas of science and engineering. Since the 1950s, the Simplex algorithm was known to be very efficient in practice and was (and is) used ubiquitously, but worst-case exponential-time examples were given for many variants of the Simplex, and none was proved to be efficient. This was a very embarrassing situation, and resolving it was very gratifying. Doing it together with my supervisors Ilan Adler (a leader in convex optimization, and a student of Dantzig, the father of the Simplex algorithm) and Richard Karp (one of the great leaders of theoretical computer science) was extremely exciting. Similar results were obtained independently at the same time by other research groups. There were some other exhilarating moments, particularly those between finding out an amazing new result and discovering the bug on the next day, but these are the rewards and frustrations of scientific research.
    Shamir views himself as not committed to a particular problem. One of the advantages of scientific research in the university setting is the freedom to follow one's interests and nose, and not be obliged to work on the same problem for years. The exciting rapid developments in biology bring about a continuous flow of new problems, ideas, and data. Of course, one has to make sure breadth does not harm the research depth. In practice, in spite of this "uncommitted" ideology, Ron finds himself working on the same areas for several years, and following the literature in these fields a while later. He tends to work on several problem areas simultaneously but tries to choose topics that are overlapping to some extent, just to save on the effort of understanding and following several fields. Shamir says:
    In my opinion, an openmind, high antennae, a solid background in the relevant disciplines, and hard work are the most important elements of discovery. As a judge once said, "Inspiration only comes to the law library at 3 AM".This is even more true in science. Flexibility and sensitivity to the unexpected are crucial.
   Shamir views the study of gene networks, that is, understanding how genes, proteins, and other molecules work together in concert, as a longstanding challenge.

转载本文请联系原作者获取授权，同时请注明本文来自卢宏超科学网博客。
链接地址：https://m.sciencenet.cn/blog-565112-447379.html

上一篇：介绍国外几位生物信息学家(6)~~Web Miller
下一篇：介绍国外几位生物信息学家(9)~~David Haussler

葡萄皮的个人空间分享 http://blog.sciencenet.cn/u/Hadron74

博文

介绍国外几位生物信息学家(8)~~Ron Shamir

当前推荐数：0

发表评论评论 (0 个评论)

卢宏超

全部作者的精选博文

全部作者的其他最新博文

全部精选博文导读

相关博文

葡萄皮的个人空间分享 http://blog.sciencenet.cn/u/Hadron74

博文

介绍国外几位生物信息学家(8)~~Ron Shamir

当前推荐数：0

发表评论 评论 (0 个评论)

卢宏超

全部作者的精选博文

全部作者的其他最新博文

全部精选博文导读

相关博文

发表评论评论 (0 个评论)