I was sick of seeing university IT warnings about what I can and cannot say in my university email. So, when I had trouble logging in from China, I opened a Yahoo email account. I am not sure when it started, but at some point ads began to show along the right-hand side of the screen while I work on my email. OK, it's free email, so I will let you have it your way. However, today I noticed an extra line ABOVE my email list, and I don't know how to get rid of it. That is really annoying! Well, there is not much I can do to make Yahoo behave, but I can move to Gmail...
Contents
1 Google
   Bigtable: A Distributed Storage System for Structured Data
   Pregel: A System for Large-Scale Graph Processing
   PageRank vs HITS
2 Yahoo
   PNUTS: Yahoo!'s Hosted Data Serving Platform
   Data Challenges at Yahoo!
   Cloud Data Management @ Yahoo!
3 Microsoft
   Scale-Out Beyond Map-Reduce

1 Google
Google's "troika" (三驾马车): Bigtable, GFS
beamer_bigtable_osdi06.pdf
bigtable-osdi06.pdf

Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
OSDI 2006

Abstract
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

1 Introduction
The organization of this paper:
1) Section 2 describes the data model in more detail.
2) Section 3 provides an overview of the client API.
3) Section 4 briefly describes the underlying Google infrastructure on which Bigtable depends.
4) Section 5 describes the fundamentals of the Bigtable implementation.
5) Section 6 describes some of the refinements that we made to improve Bigtable's performance.
6) Section 7 provides measurements of Bigtable's performance.
7) Section 8 describes several examples of how Bigtable is used at Google.
8) Section 9 discusses some lessons we learned in designing and supporting Bigtable.
9) Finally, Section 10 describes related work.
10) Section 11 presents our conclusions.

2 Data Model
3 API
4 Building Blocks
GFS: the distributed Google File System
5 Implementation
Three major components:
-- a library that is linked into every client,
-- one master server,
-- many tablet servers.
5.1 Tablet Location
A three-level hierarchy analogous to that of a B+-tree stores tablet location information (Figure 4).
5.2 Tablet Assignment
5.3 Tablet Serving
The persistent state of a tablet is stored in GFS, as illustrated in Figure 5.
5.4 Compactions
6 Refinements
7 Performance Evaluation
8 Real Applications
9 Lessons
Several interesting lessons: --
10 Related Work
11 Conclusions
Bigtable A Distributed Storage System for Structured Data.pdf

Pregel: A System for Large-Scale Graph Processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski
SIGMOD'10, June 6–11, 2010, Indianapolis, Indiana, USA.

ABSTRACT
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices, trillions of edges) poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
Keywords: Distributed computing, graph algorithms

1. INTRODUCTION
The rest of the paper is structured as follows.
1) Section 2 describes the model.
2) Section 3 describes its expression as a C++ API.
3) Section 4 discusses implementation issues, including performance and fault tolerance.
4) Section 5 presents several applications of this model to graph algorithm problems.
5) Section 6 presents performance results.
6) Finally, Section 7 discusses related work and future directions.
2. MODEL OF COMPUTATION
Pregel a system for large-scale graph processing.pdf
PPT: 20100707_Pregel.ppt

PageRank vs HITS
See also: "PageRank 彻底解说" (a thorough explanation of PageRank), Chinese edition (www) http://www.kreny.com/pagerank_cn.htm
wiki
-- G: the n-by-n connectivity matrix
-- A: the transition probability matrix of the Markov chain
MATLAB implementation: 12 pagerank.pdf
The PageRank Citation Ranking - Redone.ppt
Original paper: The PageRank Citation Ranking.pdf
NetLogo Model Library: PageRank

2 Yahoo
PNUTS: Yahoo!'s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
VLDB '08

ABSTRACT
We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.

1. INTRODUCTION
Web applications. The foremost requirements of a web application:
-- Scalability.
-- Response Time and Geographic Scope.
-- High Availability and Fault Tolerance.
PNUTS Yahoo!'s Hosted Data Serving Platform.pdf

Data Challenges at Yahoo!, 2008
ABSTRACT: In this short paper we describe the data that Yahoo! handles, the current trends in Web applications, and the many challenges that this poses for Yahoo! Research. These challenges have led to the development of new data systems and novel data mining techniques.
Problem: Storing and managing this ocean of data poses several important challenges.
Four major interconnected trends:
-- The emergence of structure
-- Design and dynamics of social systems
-- The Web as a delivery channel
-- The Web as wisdom (Web mining)
Data Platforms:
-- Dynamo
-- Two ongoing research projects: Sherpa and PNUTS
-- PIG
Personal comment: it raises the problems from an industry perspective, which makes it all the more worth reading!
data challenges at yahoo.pdf

Cloud Data Management @ Yahoo!
Raghu Ramakrishnan
Yahoo! Research, USA
ramakris@yahoo-inc.com
DASFAA 2010, Part I, LNCS 5981, p. 2, 2010.
A one-page short paper; abstract only.
Abstract. In this talk, I will present an overview of cloud computing at Yahoo!, in particular, the data management aspects. I will discuss two major systems in use at Yahoo! (the Hadoop map-reduce system and the PNUTS/Sherpa storage system) in the broader context of offline and online data management in a cloud setting. Hadoop is a well known open source implementation of a distributed file system with a map-reduce interface. Yahoo! has been a major contributor to this open source effort, and Hadoop is widely used internally. Given that the map-reduce paradigm is widely known, I will cover it briefly and focus on describing how Hadoop is used at Yahoo!. I will also discuss our approach to open source software, with Hadoop as an example. Yahoo! has also developed a data serving storage system called Sherpa (sometimes referred to as PNUTS) to support data-backed web applications.
These applications have stringent availability, performance and partition tolerance requirements that are difficult, sometimes even impossible, to meet using conventional database management systems. On the other hand, they typically are able to trade off consistency to achieve their goals. This has led to the development of specialized key-value stores, which are now used widely in virtually every large-scale web service. Since most web services also require capabilities such as indexing, we are witnessing an evolution of data serving stores as systems builders seek to balance these trade-offs. In addition to presenting PNUTS/Sherpa, I will survey some of the solutions that have been developed, including Amazon's S3 and SimpleDB, Microsoft's Azure, Google's Megastore, the open source systems Cassandra and HBase, and Yahoo!'s PNUTS, and discuss the challenges in building such systems as "cloud services", providing elastic data serving capacity to developers, along with appropriately balanced consistency, availability, performance and partition tolerance.
Cloud Data Management @ Yahoo!.pdf

3 Microsoft
Scale-Out Beyond Map-Reduce
Raghu Ramakrishnan, CISL Team Members
KDD'13, August 11–14, 2013, Chicago, Illinois, USA.
Keywords: Analytics, Big Data, data science, Hadoop, YARN, REEF, Map-Reduce, SQL, Machine Learning, scale-out.

1. ABSTRACT
Scale-out architectures:
-- Hadoop
-- Hive: a data warehouse tool built on Hadoop
-- Apache Pig: implements an SQL-style shell scripting layer on top of the MapReduce algorithm (framework)
-- YARN: Apache Hadoop NextGen MapReduce
-- Apache Mesos: a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks
Scale-out beyond map-reduce.pdf
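
The Pregel notes above describe programs as a sequence of iterations in which each vertex reads the messages sent to it in the previous iteration, updates its own state, and sends messages along its outgoing edges. The PageRank notes mention the connectivity matrix G and the Markov transition matrix A. A minimal single-machine sketch of how the two fit together might look like the following. It is only an illustration: the names `pregel_pagerank`, `out_edges`, `supersteps` and `damping` are hypothetical, the damping value 0.85 is the conventional choice rather than something stated in these notes, every vertex is assumed to have at least one outgoing edge, and nothing here reflects the distributed C++ API of the actual system.

```python
from collections import defaultdict


def pregel_pagerank(out_edges, supersteps=30, damping=0.85):
    """Simulate vertex-centric supersteps on one machine to compute PageRank.

    out_edges maps every vertex to the list of vertices it links to
    (the nonzero entries of the connectivity matrix G). Assumption:
    every vertex appears as a key and has at least one out-link.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}  # initial state of each vertex
    inbox = defaultdict(list)               # messages from the previous superstep

    for step in range(supersteps + 1):
        outbox = defaultdict(list)          # messages delivered in the next superstep
        for v, targets in out_edges.items():
            if step >= 1:
                # Update the vertex state from messages sent in the previous superstep.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            if step < supersteps and targets:
                # Send an equal share of this vertex's rank along each outgoing edge;
                # these shares are the columns of the transition matrix A.
                share = rank[v] / len(targets)
                for t in targets:
                    outbox[t].append(share)
        # Synchronous barrier: messages become visible only in the next superstep.
        inbox = outbox
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(pregel_pagerank(graph).items()):
        print(vertex, round(score, 4))
```

On this three-vertex graph, vertex c ends up with the highest rank, because it receives all of b's rank and half of a's. Keeping the outgoing messages in a separate `outbox` until the end of the iteration is what gives the synchronous behaviour the Pregel abstract refers to: no vertex ever sees a message sent in the same iteration.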
An Experiment with GOOGLE and YAHOO Online Translation (by 戴小华 Xiaohua Dai)

We need translation when reading source material and writing papers. Online translation systems are now available and can help with both reading and writing to some extent, but on the whole we cannot rely on a web translator alone. Below is a small translation experiment with 8 examples: the first 6 are technical passages and the last 2 are poems.

Chinese to English: YAHOO seems to do slightly better only on the GPS passage (Example 2), while GOOGLE is slightly better on the other 2 examples. As for poetry, YAHOO is much better on Example 7; GOOGLE's output is unintelligible and even leaves Chinese characters in it.

English to Chinese: GOOGLE is slightly better on Example 4. Neither translates Example 5 well. The two are about the same on Example 6. Both translate the poem in Example 8 poorly, and YAHOO's output even leaves English words in it.

Example 1: Translation, Chinese to English
Original: 遥感是利用遥感器从空中来探测地面物体性质的,它根据不同物体对波谱产生不同响应的原理,识别地面上各类地物,具有遥远感知事物的意思。
Source: 百度百科 (Baidu Baike)
GOOGLE translation: Remote sensing is the use of remote sensors to detect from the air, the nature of the ground objects, which according to different objects have different spectral response of the principles, identify various types of surface features on the ground, with remote sensing of the meaning of things.
YAHOO translation: The remote sensing is from airborne surveys the ground object nature using the remote sensor, it acts according to the different object to have the different response principle to the spectrum, distinguishes in the ground each kind of terrain feature, has the remote sensation thing meaning.

Example 2: Translation, Chinese to English
Original: GPS全球卫星定位系统由三部分组成:空间部分GPS星座(GPS星座是由24颗卫星组成的星座,其中21颗是工作卫星,3颗是备份卫星);地面控制部分地面监控系统; 用户设备部分GPS 信号接收机。
Source: 百度百科 (Baidu Baike)
GOOGLE translation: GPS Global Positioning System consists of three parts: the space segment-GPS constellation (GPS constellation is composed of 24 satellite constellation, are the work of one of 21 satellites, three satellites are backed up); ground control segment - the ground monitoring system ; user equipment part-GPS signal receiver.
YAHOO translation: The GPS whole world satellite positioning system is composed of three parts: The space segment - GPS constellation (the GPS constellation is constellation which is composed of 24 satellites, 21 are work satellites, 3 are backup satellite); Ground control part - ground monitoring system; Subscriber's equipment part - GPS signal receiver.

Example 3: Translation, Chinese to English
Original: 生态系统管理是在对生态系统组成、结构和功能过程加以充分理解的基础上,制定适应性的管理策略,以恢复或维持生态系统整体性和可持续性
Source: 百度百科 (Baidu Baike)
GOOGLE translation: Ecosystem management is the ecosystem in the composition, structure and function of the process to be fully understood on the basis of the development of adaptive management strategies to restore or maintain ecosystem integrity and sustainability of
YAHOO translation: The ecosystem management is to the ecosystem composition, the structure and the function process performs the full understanding in the foundation, the formulation compatible management games, restore or maintain the ecosystem integrity and sustainable

Example 4: Translation, English to Chinese (Simplified)
Original: Most remote-sensing technology uses light, whether infrared or visible, that falls at the middle to high end of the electromagnetic spectrum. By contrast, at least one important means of remote detection uses microwaves, which are much lower in energy levels
Source: http://www.answers.com/
GOOGLE translation: 大多数遥感技术利用光,无论是红外线或可见光,属于在中东高端的电磁频谱。相比之下,至少有一个重要手段,利用微波遥感探测,这是低得多的能量水平
YAHOO translation: 多数遥感技术是否使用光,红外或可看见,落在中部对高端电磁波频谱。 相反,一遥远的侦查至少重要手段使用微波,是低在能级

Example 5: Translation, English to Chinese (Simplified)
Original: A system of satellites, computers, and receivers that is able to determine the latitude and longitude of a receiver on Earth by calculating the time difference for signals from different satellites to reach the receiver.
Source: http://www.answers.com/
GOOGLE translation: 一个系统的卫星,电脑和接收器,能够确定的纬度和经度,地球上的接收器通过计算时间差的信号来自不同卫星到达接收器。
YAHOO translation: 能通过计算信号的时差确定一台接收器纬度和经度在地球上的从不同的卫星到达接收器卫星、计算机和接收器的系统。

Example 6: Translation, English to Chinese (Simplified)
Original: Many people and organizations have defined ecosystem management. The following examples represent a cross-section of definitions. There are two themes common to most of these definitions of ecosystem management: (1) management should maintain or improve ecosystems; and (2) ecosystems should provide a range of goods and services to current and future generations.
Source: http://silvae.cfr.washington.edu/ecosystem-management/EcoManFrame.html
GOOGLE translation: 许多个人和组织进行了定义,生态系统管理。下面的例子是一个跨部门的定义。有两个共同主题,其中多数生态系统管理的定义: ( 1 )管理部门应维持或改善的生态系统;及( 2 )生态系统应提供各种货物和服务的当前和未来的几代人。
YAHOO translation: 许多人民和组织定义了生态系管理。 以下例子代表定义的横断面。 有二个题材共同对大多生态系管理的这些定义: (1)管理应该维护或改进生态系; 并且(2)生态系应该提供货物和服务的范围给当前和下一代。

Example 7: Poetry translation, Chinese to English
Original: 红豆生南国,春来发几枝? 愿君多采撷,此物最相思
Source: Tang poem 《相思》 by 王维 (Wang Wei)
GOOGLE translation: Health and southern red beans, spring几枝hair? Eagle would like to gather many, the most complex crassicarpa
YAHOO translation: The red bean lives the southern part of China, how many spring sends? Is willing Mr. to pick, this thing most lovesickness

Example 8: Poetry translation, English to Chinese (Simplified)
Original: the more we live, more brief appear our life's succeeding stages; a day to childhood seems a year, and years like passing ages.
Source: The River of Life (生命之川), by Thomas Campbell (妥默司康沫尔)
GOOGLE translation: 更为我们的生活,更简短的出现 我们生活的成功阶段; 每天的童年似乎一年, 和多年想通过年龄。
YAHOO translation: 越多我们居住,摘要更出现 our生活的成功的阶段; 对童年的a天似乎一年, and年喜欢通过年龄。

Related links:
(1) Accuracy comparison of ten online translation systems [evaluated mid-2008] http://www.quanyo.com/zt/shangwang/2008730233520.html
(2) Which online translator is better? A comparison of four popular systems [evaluated in the second half of 2007] http://bbs.it.com.cn/showtopic-279164.aspx
(3) Breaking the language barrier: a side-by-side review of online translation systems [evaluated mid-2006] http://publish.it168.com/2006/0817/20060817000201.shtml

Note: these 8 examples were picked casually, so the sample is probably too small; refer to the 3 links above to choose the online translation site you prefer. In addition, better translations of technical terms can be found through dict.cnki.net. (2009.4.5)