What we cannot control, we do not understand. — Adapted from Richard Feynman: “What I cannot create, I do not understand.” The best way to predict the future is to control it. — Adapted from: “The best way to predict the future is to invent it.” (See https://quoteinvestigator.com/2012/09/27/invent-the-future/ )
Two excellent books I recommended earlier have both been reissued recently, for your reference: The Art of Doing Science and Engineering: Learning to Learn, by Richard W. Hamming. Link: https://www.amazon.com/Art-Doing-Science-Engineering-Learning/dp/1732265178 Eye of the Hurricane: An Autobiography, by Richard Bellman. Link: https://www.amazon.com/Hurricane-Autobiography-Richard-Ernest-Bellman/dp/9971966018
Volume 7, Issue 4 of IEEE/CAA JAS publishes papers on intelligent control, stability analysis, robotics, image processing, intelligent vehicles, machine learning, multi-agent systems, and related topics. Happy reading. 01 Qinglai Wei, Hongyang Li and Fei-Yue Wang, Parallel Control for Continuous-Time Linear Systems: A Case Study, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 919-928, July 2020. Highlights: ❖ A new parallel control structure for continuous-time linear systems is proposed. ❖ The parallel controller is proposed based on parallel control theory. ❖ The parallel controller considers both system state and control as input. ❖ The parallel controller can avoid the disadvantages of state feedback control. 02 Pierluigi Di Franco, Giordano Scarciotti and Alessandro Astolfi, Stability of Nonlinear Differential-Algebraic Systems Via Additive Identity, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 929-941, July 2020. Highlights: ❖ Representation of DAE systems as feedback interconnection. ❖ Stability analysis for DAE systems via Lyapunov Method and Small Gain-like arguments. ❖ Stability analysis for nonlinear mechanical systems with holonomic constraints. ❖ Stability analysis of Lipschitz DAE systems. 03 Jacob H. White and Randal W. Beard, An Iterative Pose Estimation Algorithm Based on Epipolar Geometry With Application to Multi-Target Tracking, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 942-953, July 2020. Highlights: ❖ This paper introduces a new algorithm for estimating the relative pose of a moving camera. ❖ A novel optimization algorithm solves for the relative pose using the epipolar constraint. ❖ Applications include multi-target tracking, visual odometry, and 3D scene reconstruction. ❖ If IMU information is available, it is used to seed the pose estimation algorithm. ❖ Real-time execution of the algorithm is demonstrated on an embedded flight platform. 04 Haowei Lin, Bo Zhao, Derong Liu and Cesare Alippi, Data-based Fault Tolerant Control for Affine Nonlinear Systems Through Particle Swarm Optimized Neural Networks, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 954-964, July 2020.
Highlights: ❖ A data-based fault tolerant control scheme is investigated. ❖ The unknown system dynamics are approximated by a PSO-NN identifier. ❖ The HJB equation is solved with a high success rate by the PSOCNN. ❖ The online fault tolerant control is shown to be optimal. 05 Xiaodong Zhao, Yaran Chen, Jin Guo and Dongbin Zhao, A Spatial-Temporal Attention Model for Human Trajectory Prediction, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 965-974, July 2020. Highlights: ❖ Study the trajectory prediction jointly with temporal and spatial affinities. ❖ An LSTM model that uses an attention mechanism to improve the accuracy of trajectory prediction. ❖ An experimental error analysis using data based on both world plane and image plane. 06 Ali Forootani, Raffaele Iervolino, Massimo Tipaldi and Joshua Neilson, Approximate Dynamic Programming for Stochastic Resource Allocation Problems, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 975-990, July 2020. Highlights: ❖ MDP based resource allocation problem is proposed. ❖ MPC is considered in the framework of the MDP. ❖ Algorithms suitable for computer implementation are proposed. ❖ Compressive sampling is considered for ADP. ❖ Linear architecture is considered for ADP. 07 Liang Yang, Bing Li, Wei Li, Howard Brand, Biao Jiang and Jizhong Xiao, Concrete Defects Inspection and 3D Mapping Using CityFlyer Quadrotor Robot, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 991-1002, July 2020. Highlights: ❖ A high-quality labeled dataset for crack and spalling detection, which is the first publicly available dataset for visual inspection of concrete structures. ❖ A robotic inspection system with visual-inertial fusion to obtain pose estimation using an RGB-D camera and an IMU. ❖ A depth in-painting model that allows depth hole in-painting in an end-to-end approach with real-time performance. ❖ A multi-resolution model that adapts to image resolution changes and allows accurate defect detection in the field.
08 Giancarlo Fortino, Antonio Liotta, Fabrizio Messina, Domenico Rosaci and Giuseppe M. L. Sarnè, Evaluating Group Formation in Virtual Communities, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1003-1015, July 2020. Highlights: ❖ The problem of forming effective groups in virtual communities is addressed. ❖ The proposed solution exploits trust information without significant overhead by adopting local reputation instead of global reputation. ❖ An index to measure the effectiveness of group formation is introduced, as well as an algorithm to drive group formation as proof of concept. ❖ Experimental trials performed on two data sets extracted from social networks have shown that the adoption of the proposed solution offers significant advantages. 09 Chinthaka Premachandra, Dang Ngoc Hoang Thanh, Tomotaka Kimura and Hiroharu Kawanaka, A Study on Hovering Control of Small Aerial Robot by Sensing Existing Floor Features, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1016-1025, July 2020. Highlights: ❖ Hovering control of small aerial robot. ❖ Image processing using small-type and low-weight microcontrollers. ❖ Specific image feature point detection by weak directional pattern analysis. ❖ On-board camera image processing based autonomous flight control of UAV. ❖ Simple and low-cost image noise removal process. 10 Mohammadhossein Ghahramani, Yan Qiao, MengChu Zhou, Adrian O’Hagan and James Sweeney, AI-Based Modeling and Data-Driven Evaluation for Smart Manufacturing Processes, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1026-1037, July 2020. Highlights: ❖ To address this concern, a dynamic feature selection model based on an integrated algorithm including a meta-heuristic method (GA) and an artificial neural network is proposed. ❖ The implemented algorithm considers two major conflicting objectives: minimizing the number of features and maximizing the classification performance.
❖ The proposed AI-based multi-objective feature selection method, together with an efficient classification algorithm, enables decision makers to scrutinize manufacturing processes. 11 Yaojie Zhang, Bing Xu and Tiejun Zhao, Convolutional Multi-Head Self-Attention on Memory for Aspect Sentiment Classification, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1038-1044, July 2020. Highlights: ❖ Using convolution and self-attention to capture semantic information of n-gram and sequence itself. ❖ The aspect-sequence modeling ability and network parallelism of memory network are preserved. ❖ Completes both ACSA and ATSA tasks and outperforms the baselines. 12 Chaoyue Zu, Chao Yang, Jian Wang, Wenbin Gao, Dongpu Cao and Fei-Yue Wang, Simulation and Field Testing of Multiple Vehicles Collision Avoidance Algorithms, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1045-1063, July 2020. Highlights: ❖ A distributed real-time MVCA algorithm is proposed by extending the reciprocal n-body collision avoidance method; it enables the intelligent vehicles to choose their destinations and control inputs independently. ❖ The effects of latency and packet loss on MVCA are also statistically investigated by theoretically formulating the broadcasting process as a one-dimensional Markov chain; the results show that the tolerable delay should not exceed half of the decision cycle of trajectory planning, and that shortening the sending interval can alleviate the negative effects of packet loss to an extent. ❖ The MVCA was tested by a real intelligent vehicle, the information on obstacles and the latitude and longitude of the vehicle were input into the algorithm, 13 Kritika Bansal and Pankaj Mukhija, Aperiodic Sampled-Data Control of Distributed Networked Control Systems Under Stochastic Cyber-Attacks, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1064-1073, July 2020.
Highlights: ❖ A hybrid aperiodic sampled-data mechanism for distributed networked control systems under stochastic deception attacks is introduced to alleviate the problem of computational load, energy consumption and communication load. ❖A more general attack scenario on distributed networked control systems is considered whereby stochastic deception attacks of different intensity on different subsystems may occur. ❖The implementation of self-triggering strategy alone for distributed networked control systems under attack is also presented. ❖The analysis of the proposed strategy for an isolated system is presented as a special case. Also, minimum inter-event time is obtained for an isolated system under deception attack. 14 Chao Han and Yuzhen Shen, Three-Dimensional Scene Encryption Algorithm Based on Phase Iteration Algorithm of the Angular-Spectral Domain, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1074-1080, July 2020. Highlights: ❖ An accurate angular spectrum diffraction is used to reduce the loss of information transmission. ❖The combination of the angular spectrum diffraction and the three - phase iterative algorithm improves the security of the encrypted information. ❖The algorithm proposed can achieve the encryption and decryption of 3D scenes and increase the capacity of the encrypted information. 15 Xiaoyuan Wang, Chenxi Jin, Xiaotao Min, Dongsheng Yu and Herbert Ho Ching Iu, An Exponential Chaotic Oscillator Design and Its Dynamic Analysis, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1081-1086, July 2020. Highlights: ❖ Exponential nonlinear term This exponentially nonlinear term may make the new chaotic system have better performance. And the effectiveness of this exponential chaotic system has been proved by various theoretical analyses. ❖NIST test The exponential chaotic system passed all fifteen tests, but the Lü system passed only fourteen of them. 
Also the exponential chaotic system has 9 tests with P-values greater than the Lü system in all 15 tests. ❖Circuit This paper has designed a circuit corresponding to the exponential chaotic system. And the simulation results of Multisim are consistent with the theoretical analysis. 16 Mohammad Javad Morshed, A Nonlinear Coordinated Approach to Enhance the Transient Stability of Wind Energy-Based Power Systems, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1087-1097, July 2020. Highlights: ❖ Introduce a new nonlinear coordination method based on MIMO zero dynamics approach. ❖Coordinate controllers of DFIG and synchronous generators (SGs) in multi-machine power systems. ❖Propose a coordinated framework for large scale power systems with n-DFIG and m-SG. ❖Enhance transient and voltage stability of inter-connected power systems. ❖The proposed approach is implemented to the IEEE 39-bus power systems. 17 Chao Deng, Weinan Gao and Weiwei Che, Distributed Adaptive Fault-Tolerant Output Regulation of Heterogeneous Multi-Agent Systems With Coupling Uncertainties and Actuator Faults, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1098-1106, July 2020. Highlights: ❖ A novel distributed adaptive fault-tolerant control method is proposed to solve the fault-tolerant output regulation problem for heterogeneous MASs with matched system uncertainties and mismatched coupling uncertainties among subsystems. ❖Different from the existing distributed fault-tolerant control result, a more general directed network topology is considered in this paper. ❖ A novel sufficient condition with cyclic-small-gain condition is proposed by using the linear matrix inequality technique. 18 Jing Huang, Yimin Chen, Xiaoyan Peng, Lin Hu and Dongpu Cao, Study on the Driving Style Adaptive Vehicle Longitudinal Control Strategy, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1107-1115, July 2020. 
Highlights: ❖ A driver-adaptive fusion control strategy of Adaptive Cruise Control and Collision Avoidance was proposed. ❖ Behavioural data for different driving styles were collected via driving simulator experiments; the corresponding driving behaviour characteristics were extracted and used in the driver-adaptive control. ❖ Real-time recognition of driving style was achieved based on fuzzy reasoning rules. ❖ The effect of the fusion control strategy was validated by virtual experiments. 19 Qi Wu, Li Yu, Yao-Wei Wang and Wen-An Zhang, LESO-based Position Synchronization Control for Networked Multi-Axis Servo Systems With Time-Varying Delay, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1116-1123, July 2020. Highlights: ❖ It is demonstrated that the proposed approach can deal with the effects of system uncertainty, external disturbance, and short time-varying delay for the NMASS. ❖ It is rigorously proved that the closed-loop control system under the proposed controller is bounded-input-bounded-output (BIBO) stable. ❖ It is verified that the proposed method has better tracking and synchronization performance than the improved PID-based method by testing on a four-axis NMASS experimental platform. ❖ The bandwidth-parameterization tuning method is applied in both controller design and observer design, so that the number of parameters that need to be adjusted is greatly reduced. 20 Longwei Fang, Zuowei Wang, Zhiqiang Chen, Fengzeng Jian, Shuo Li and Huiguang He, 3D Shape Reconstruction of Lumbar Vertebra From Two X-ray Images and a CT Model, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1124-1133, July 2020. Highlights: ❖ This paper introduces a novel method that uses a prior model and two X-ray images to reconstruct a 3D vertebra. ❖ We use the CT data of a vertebra specimen to provide both the shape mesh and the intensity model, and only one prior model is used in our method.
❖We combine the elastic-mesh-based and statistical-intensity-model-based methods, which can provide efficient and robust 3D vertebra reconstruction. 21 Jiahai Wang, Yuyan Sun, Zizhen Zhang and Shangce Gao, Solving Multitrip Pickup and Delivery Problem With Time Windows and Manpower Planning Using Multiobjective Algorithms, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1134-1153, July 2020. Highlights: ❖ A multiobjective pickup and delivery problem with time windows and manpower planning is introduced. ❖A multiobjective iterated local search algorithm with adaptive neighborhood is proposed. ❖The nature of objective functions and the properties of the problem are analyzed. ❖The benefits of multiobjective optimization are discussed. 22 Jin Xu, Wei Wu, Keyou Wang and Guojie Li, C-Vine Pair Copula Based Wind Power Correlation Modelling in Probabilistic Small Signal Stability Analysis, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1154-1160, July 2020. Highlights: ❖ In this paper, the C-vine pair copula theory is introduced to describe the complicated dependence of multidimensional wind power injection, and samples obeying this dependence structure are generated. ❖The probabilistic stability of power system integrated with six wind farms is investigated by performing the Monte Carlo simulations under different correlation models and different operating conditions scenarios. ❖In the case study of a modified New England test system, the simplified pair copula construction (sPCC) with C-vine structure proves to have a better reflection of the actual dependence than the linear correlation coefficient (LCC) model and multivariate normal copula model. 23 Shengwen Xiang, Hongqi Fan and Qiang Fu, Distribution of Miss Distance for Pursuit-Evasion Problem, IEEE/CAA J. Autom. Sinica , vol. 7, no. 4, pp. 1161-1168, July 2020. Highlights: ❖ An analytic method for solving the distribution of miss distance is proposed by integrating the error model of zero-effort miss distance. 
❖ Four different types of Bang-Bang disturbances are considered specifically. ❖ Results provide a powerful tool for the design, analysis and performance evaluation of pursuit-evasion problems. 24 Teng Liu, Hong Wang, Bin Tian, Yunfeng Ai and Long Chen, Parallel Distance: A New Paradigm of Measurement for Parallel Driving, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1169-1178, July 2020. Highlights: ❖ Parallel driving 3.0 system as potential autonomous driving system is essentially discussed. ❖ Parallel distance framework is presented to measure real and artificial world. ❖ Techniques related to multiple distance calculation are quantified and compared. ❖ Practical applications of the parallel distance framework are introduced and outlined. 25 Lan Jiang, Hongyun Huang and Zuohua Ding, Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179-1189, July 2020. Highlights: ❖ Fast convergence and better strategy ❖ Deep Q-learning ❖ Experience replay ❖ Heuristic knowledge 26 Luping Wang and Hui Wei, Avoiding Non-Manhattan Obstacles Based on Projection of Spatial Corners in Indoor Environment, IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1190-1200, July 2020. Highlights: ❖ A method is presented to avoid non-Manhattan obstacles in an indoor environment from a monocular camera. ❖ The method can cope with the non-Manhattan obstacle without prior training, making it practical and efficient for a navigating robot. ❖ The approach is robust against changes in illumination and color in 3D scenes, and requires no knowledge of the camera’s intrinsic parameters or of the relation between the camera and the world.
Fundamental Entropic Laws and L_p Limitations of Feedback Systems: Implications for Machine-Learning-in-the-Loop Control 链接: https://arxiv.org/abs/1912.05541
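Journal: Mobile Information Systems. Special issue: Artificial Intelligence for Mobile Health Data Analysis and Processing. Publication date: November 2018. Inspired by the recent rapid growth of mobile health, this special issue explores applications of artificial intelligence to mobile health data and its analysis. The Internet of Things (IoT) is transforming eHealth, and mobile health (m-Health) systems in particular. A growing number of fixed and mobile medical devices are now implanted in patients or attached to medical equipment. The clinical and home environments in which patients live collect massive volumes of diverse health data and send them to medical information systems for analysis. The special issue aims to examine in depth the applications of machine learning and data mining in mobile health. Mobile Hardware-Information System for Neuro-Electrostimulation. Authors: Vladimir S. Kublanov, Mikhail V. Babich and Anton Yu. Dolganov. Affiliation: Research Medical and Biological Engineering Centre of High Technologies, Ural Federal University, Russia. This paper describes the organizational principles of a mobile hardware-information system built on a multifactor neuro-electrostimulation device, and discusses the prospects for applying artificial intelligence and machine learning to the management of the treatment process. Computer-Assisted Diagnosis for Diabetic Retinopathy Based on Fundus Images Using Deep Convolutional Neural Network. Authors: Yung-Hui Li (1), Nai-Ning Yeh (1), Shih-Jen Chen (2) and Yu-Chien Chung (3). 1: National Central University, Taiwan. 2: National Yang-Ming University, Beitou District, Taipei, Taiwan. 3: Fu Jen Catholic University Hospital, Taiwan. Diabetic retinopathy (DR) is a chronic complication of diabetes that has few early symptoms and is hard to detect. DR is currently diagnosed from digital fundus images together with images produced by optical coherence tomography (OCT). OCT equipment is expensive, so both patients and ophthalmologists would benefit if an accurate diagnosis could be made from digital fundus images alone. The authors propose a new algorithm based on a deep convolutional neural network (DCNN) to assist in diagnosing diabetic retinopathy. Chinese translation coordinated by Charlesworth Group.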
Towards Integrating Control and Information Theories: From Information-Theoretic Measures to Control Performance Limitations By Song Fang, Jie Chen, and Hideaki Ishii This book is an attempt towards bridging control theory and information theory to characterize the fundamental limitations of generic feedback systems; in particular, we aim to develop an information-theoretic framework to analyze the performance bounds and design trade-offs that are prevalent in all possible feedback systems. Towards this end, the book introduces new entropic measures compatible with the analysis of feedback control systems and studies various classes of performance limitation relations. In addition, we examine the implications of the results in the context of state estimation and correspondingly obtain generic bounds on estimation errors. It is also worth mentioning that, thanks to the information-theoretic analysis, the aforementioned performance limits are valid for arbitrary causal controllers or estimators, whether they be designed using conventional approaches or, say, machine learning methods (learning in the loop). 链接: https://www.springer.com/us/book/9783319492889
The future of AI: teach machines methods (learning and search), not techniques. Richard S. Sutton, a founder of reinforcement learning, argues that the future of AI lies in teaching machines (computers) methods by which they can learn on their own, rather than techniques designed around how humans think. Those methods rely mainly on the machine's own computation, not on a transfer of human mental effort. The methods shown so far to work better are those that let the machine do its own learning and search. The success of deep learning is one example. In other words, humans are responsible for designing the methods that machines use, not the individual techniques. In the future, machines should be given free rein to find the techniques that suit them, using methods designed by humans. PS: Of course, this leaves people feeling increasingly insecure; we still worry that machines will one day control human society. For now, at least, it seems to be one direction in which AI can keep developing. After all, the machines (computers) seem to be saying too: "Better to teach me to fish than to give me a fish." PS: The original essay follows. The Bitter Lesson. Rich Sutton. March 13, 2019. The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent. In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search.
At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not. A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning. In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc.
On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use. In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better. This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. 
The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach. One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning . The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
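Machine learning finally explains gene regulation clearly. By Zhu Ping (诸平). According to a December 26, 2019 report by Brian Stallard of Cold Spring Harbor Laboratory (CSHL), CSHL researchers have formulated a mathematical thermodynamic model of gene regulation (Fig. 1, top left) as an artificial neural network (ANN) (Fig. 1, bottom left). Large DNA datasets are fed through the new ANN (Fig. 1, right). The pattern of connections is presented in a way that biologists find easy to interpret, so that machine learning can ultimately explain gene control clearly. Fig. 1 A mathematical thermodynamic model for gene regulation (top, left) is formulated as an artificial neural network (ANN) (bottom, left). Large DNA datasets are fed through the new ANN (right). The pattern of connections is presented in a way that is easy for biologists to interpret. Credit: Kinney lab/CSHL, 2019. In this era of "big data," artificial intelligence (AI) has become a valuable ally for scientists. Machine learning algorithms, for example, are helping biologists make sense of the dizzying array of molecular signals that control how genes function. But as new algorithms are developed to analyze ever more data, they also become more complex and harder to interpret. CSHL quantitative biologists Justin B. Kinney and Ammar Tareen have devised a strategy for designing advanced machine learning algorithms that are easier for biologists to understand. The algorithm is a type of ANN. Inspired by the way neurons connect and branch in the brain, ANNs are the computational foundation of advanced machine learning. Despite the name, ANNs are not used exclusively to study the brain. Biologists such as Kinney and Tareen use ANNs to analyze data from an experimental method for studying DNA called the massively parallel reporter assay (MPRA). With these data, quantitative biologists can build ANNs that predict which molecules control specific genes in a process called gene regulation. Cells do not need every protein at all times. Instead, they rely on complex molecular mechanisms to switch the genes that produce proteins on or off as needed. When this regulation fails, disease usually follows. "Understanding how gene regulation works can make the difference between being able to develop molecular therapies for a disease and having no options," Kinney says. Unfortunately, the way standard ANNs are shaped from MPRA data is very different from the way scientists pose questions in the life sciences. This mismatch means biologists find it hard to interpret how gene regulation occurs. At the 1st Conference on Machine Learning in Computational Biology, held on December 13, 2019, Assistant Professor Justin Kinney showcased the relatively easy-to-understand structure of a newly designed ANN (see Fig. 2). Fig. 2 Assistant Professor Justin Kinney showcases the relatively easy-to-understand structure of a newly-designed artificial neural network. His results were officially presented at the 1st Conference on Machine Learning in Computational Biology on December 13.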
Credit: CSHL, 2019. Now Kinney and Tareen have developed a new approach that bridges the gap between computational tools and the way biologists think. They created custom ANNs that mathematically reflect common concepts in biology about genes and the molecules that control them. In this way, the pair in effect force their machine learning algorithms to process data in a way biologists can understand. Kinney explains that these efforts highlight how modern industrial AI techniques can be optimized for use in the life sciences. His lab has validated this new strategy for building custom ANNs and is now applying it to a wide range of biological systems, including key gene circuits linked to human disease. The results were formally announced at the 1st Conference on Machine Learning in Computational Biology in Vancouver, Canada, on December 13, 2019. The original paper is also available as a preprint on CSHL's bioRxiv server. Ammar Tareen, Justin Block Kinney. Biophysical models of cis-regulation as interpretable neural networks, bioRxiv (2019). DOI: 10.1101/835942. Posted December 27, 2019. Abstract: The adoption of deep learning techniques in genomics has been hindered by the difficulty of mechanistically interpreting the models that these techniques produce. In recent years, a variety of post-hoc attribution methods have been proposed for addressing this neural network interpretability problem in the context of gene regulation. Here we describe a complementary way of approaching this problem. Our strategy is based on the observation that two large classes of biophysical models of cis-regulatory mechanisms can be expressed as deep neural networks in which nodes and weights have explicit physiochemical interpretations. We also demonstrate how such biophysical networks can be rapidly inferred, using modern deep learning frameworks, from the data produced by certain types of massively parallel reporter assays (MPRAs). These results suggest a scalable strategy for using MPRAs to systematically characterize the biophysical basis of gene regulation in a wide range of biological contexts. They also highlight gene regulation as a promising venue for the development of scientifically interpretable approaches to deep learning.
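To make the abstract's central observation concrete, here is a minimal sketch (not the authors' code) of the simplest textbook case: one protein binding one DNA site, with a binding energy that is additive over positions. Under that assumption the energy is a linear function of the one-hot encoded sequence and the probability that the site is occupied is a sigmoid of it, which is exactly a one-layer network whose weights are energy contributions. The names `THETA`, `binding_energy` and `occupancy`, the 3-bp site, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical energy matrix for a 3-bp site, in units of kT.
# Rows are positions, columns are bases A, C, G, T; lower energy = tighter binding.
THETA = [
    [-1.0, 0.5, 0.5, 0.5],
    [0.5, -1.0, 0.5, 0.5],
    [0.5, 0.5, -1.0, 0.5],
]


def binding_energy(seq, energy_matrix):
    """Additive energy model: sum one matrix entry per position.

    This is the 'linear layer': a dot product between the flattened
    matrix and the one-hot encoding of the sequence.
    """
    return sum(energy_matrix[i][BASES.index(base)] for i, base in enumerate(seq))


def occupancy(seq, energy_matrix, mu=0.0):
    """Two-state thermodynamic model: P(bound) = 1 / (1 + exp(E - mu)).

    This is the 'sigmoid activation'; mu plays the role of a bias term
    (a chemical potential set by protein concentration).
    """
    energy = binding_energy(seq, energy_matrix)
    return 1.0 / (1.0 + math.exp(energy - mu))


if __name__ == "__main__":
    for site in ("ACG", "ACT", "TTT"):
        print(site, round(occupancy(site, THETA), 3))
```

Because every weight is an energy and the output is an occupancy probability, the fitted parameters can be read directly in physical terms, which is the kind of interpretability the abstract describes. Fitting such parameters to MPRA data would require a training loop and a measurement model that are not shown here.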
Finding a Good Division of Labor: Linguistics and Machine Learning in NLP Dan Flickinger Senior Research Associate Stanford University, USA 15:00-16:30, September 26, 2018 Room 1-312, FIT Building Tsinghua University Linguists developing formal models of language seek to provide detailed accounts of linguistic phenomena, making predictions that can be tested systematically. Part of the challenge in this endeavor comes in making the expressivity of the formal apparatus match the requirements of existing linguistic analyses, and part comes in exploiting the formalism to guide in extending the theory. Computational linguists building broad-coverage grammar implementations must balance several competing demands if the resulting systems are to be both effective and linguistically satisfying. There is an emerging consensus within computational linguistics that hybrid approaches combining rich symbolic resources and powerful machine learning techniques will be necessary to produce NLP applications with a satisfactory balance of robustness and precision. In this talk I will present one approach to this division of labor which we have been exploring at CSLI as part of an international consortium of researchers working on deep linguistic processing (www.delph-in.net). I will argue for the respective roles of a large-scale effort at manual construction of a grammar of English, and the systematic construction of statistical models building on annotated corpora parsed with such a grammar. Illustrations of this approach will come from three applications of NLP: machine translation, English grammar instruction, and teaching of introductory logic. Dr. Dan Flickinger (danf@stanford.edu) is a Senior Research Associate at the Center for the Study of Language and Information (CSLI) at Stanford University. 
He began working in computational linguistics in 1983 in the NLP group at Hewlett-Packard Laboratories, and received his doctorate from the Linguistics Department at Stanford University in 1987. He continued in project management at HP Labs until 1993, when he moved to CSLI to manage what is now the Linguistic Grammars Online (LinGO) laboratory. From 1994 through 2002 he also served as consultant and then as Chief Technology Officer at YY Technologies, once an NLP software company based in Mountain View, California. He is the principal developer of the English Resource Grammar (ERG), a precise broad-coverage implementation of Head-driven Phrase Structure Grammar (HPSG). Current LinGO research is focused on collaborating with McGraw-Hill Education in developing more advanced methods and technology for digital learning in writing, and with Up366 in Beijing to improve writing skills for learners of English as a second language. Flickinger's primary research interests are in wide-coverage grammar engineering for both parsing and generation, lexical representation, the syntax-semantics interface, methodology for evaluation of semantically precise grammars, and practical applications of deep processing. Web page: http://lingo.stanford.edu/danf
上一篇 《什么是机器学习》 介绍说,机器学习是一种自动化构建模型的数据分析方法。计算机的算法利用输入的样本数据,迭代调整一个通用数学模型的参数,使得完成这个“训练”过程后,它能对应用对象的输入,返回一个合理的答案。测试人类的智商是被测者参详样本在选项中挑出最“合理”答案,机器学习模仿这种类比判断的能力。 样本蕴含着数据的规律,在数学上是从问题的属性空间映射到答案空间的一个函数。机器学习的基本算法是从一族假设函数中,通过调整参数得到与之误差最小的函数。实践已经证明,这个学习机制非常成功地模仿了人类的智商功能,让机器从样本获得数据的模式,以此预测应用对象。 进一步深思,我们可能会疑惑,学习算法所得是与样本误差最小的数学模型,为什么它能用来判断样本之外的对象?机器学习的算法以取得与样本最小误差为训练目标,如果仅仅是这样,让机器简单记忆这些样本就好了,而包含有这些样本的应用对象,它的数据模式往往多不胜数,有限的样本怎么能用来判别无限的可能? 解答这个问题,先要分析排除一些简单的情况。 如果应用对象的数据值只有有限个。当学习的样本包含了所有这些值,学习当然不成问题。采用通用数学模型来学习,这时和死记硬背并无区别。只要判别输入是同于已知样本,预测等同于数据库的查询。逻辑推理是发现命题等价和蕴含关系的方法,如果能用逻辑推理,把问题输入归结为那些已知的样本,那可以从查询记忆来得出准确的答案。过去专家系统的人工智能 就是 在这意义上如此工作。科学理论和应用的推理本质上也是如此。这依赖于非常强的假设。 在实践中,应用对象的数据值可能有无限或近乎无限多种。在这种情况,传统的科学研究是通过分析简化,在理论上把它归结为简单的几种,应用时约减实际情况的细节,以纳入已知的条件,它可能工作很好,也可能因忽略关键的细节而出错。而专家系统的人工智能拘于形式逻辑难以过滤细节,往往陷入困境。机器学习最适合应用于这种科学分析无能为力的情况。 再看一个简单的数值预测学习问题,来厘清学习问题的关键。对于单一输入求对应数值的预测问题,其数据的规律是个单输入函数,可以用待定参数 w 的多项式数学模型来学习。公式如下: 数值预测的问题如果存在着这样模式:在一个区间里,当两个输入相差很小时,对应值相差也很小,那这函数可以用多项式来逼近。只要这数学模型有足够的参数,在这区间有足够多随机分布的样本,调整参数 w 总可以取得与样本很小的误差。那么它也能以很小的误差,在这区间预测任何输入的对应值。不难想象要取得学习成功必须满足几个条件。 应用对象的数据必须存在着某种规律或模式。 在应用的范围必须有足够多随机分布的样本数据,以覆盖各种情况。 机器学习的数学模型必须足够灵活的表达能力,以能拟合应用对象的数据模式。 机器学习必须能控制数学模型的表达灵活性,以免过度拟合输入的样本。 这些条件同样适用于一般的机器学习。第一条指应用的对象,如果它们是漫无规则,当然谈不上从中探知规律的学习和预测。第二条要求训练的样本必须包含足够多的数据规律信息,学习是从样本里汲取知识,它们必须充分蕴有。下面深入探讨与学习能力相关的后两条。 上述例子应用对象的数据规律是一个单变量函数,样本表现为 x-y 平面上一些点,学习算法调节数学模型的参数,让模型的函数曲线尽可能靠近这些点,如果模型只具有少量的参数,它的函数曲线比较“硬”,例如只能表达直线或二次曲线,它就不能适应一些更杂函数关系的样本点,表现为训练无法收敛,即数学模型不能拟合样本。另一方面,如果模型具有很多参数,表达非常灵活,它的函数曲线很“软”,调节参数很容易拟合这些样本点,也许应用对象的数据确是在一条直线,结果模型的函数曲线蛇形穿过样本点,特别是在样本含有误差的情况,虽然训练结果与样本点是完美吻合,但在样本之外却有很大的误差,这叫过度拟合。学习算法里的通用数学模型,它必须有足够的能力来拟合应用对象众多的样本,又能够防止过度拟合。这是学习功能的关键。我们从研究它的区分能力入手。 每个学习的机器可能有不同带参数的函数族作为通用的数学模型,同一个算法的机器,用不同的样本数据,赋予机器不同的知识和智能。机器的学习能力,只受数学模型对数据模式的表达能力所限,越多的参数具有越强的表达能力。 VC 维度( VC dimension )用来量度这种数学模型的复杂性、灵活性或表达能力,更准确地说是这函数族对数据模式的区分能力。这是上世纪 60-90 年代 Vladimir Vapnik 和 Alexey Chervonenkis 提出统计学习理论中的核心概念。 输出只有 0 和 1 的判断分类是最基本的情况,其他学习问题可以看作它的复合和变化。对于 N 个样本,把它们分为两类一共有 2 N 种不同的模式。学习算法中带有参数的函数族 H ,能够两分这 N 
个样本的模式,最多的数量记为 m H (N) ,它是一个随 N 增大的成长函数,以 2 N 为上界。如果 m H (N) = 2 N 则说 N 个样本能被 H “粉碎( shatter )”,这时它能够实现这 N 个样本的任何一种两分模式,即对这里任何一种模式,都有 H 中合适参数的函数来实现这种模式的分类。 VC 维度 D H 定义为能够被函数族 H 粉碎的最大样本数 N 。例如,只有一个参数 w 的域值分类函数 ,有 m H (N) =N+1 , D H =1 ;具有两个参数 w 1 , w 2 在直线上的区间分类函数 ,有 m H (N) =N(N+1)/2+1 , D H =2 ;平面上一条直线,它能够区分 3 个点 2 3 =8 种分布模式,但不能区分 4 个点 16 种分布模式中的两种,示意如下图。 所以 2 维线性分类函数的 VC 维度 D H =3 。对于简单模型 VC 维度大致上等于参数的数目。用数学归纳法可以证明,函数族 H 能够两分样本模式的最多数 m H (N) ,在 N 到达 VC 维度之后,按 N 的多项式速度增长,这多项式的最高幂次是 D H . 如果应用对象的数据存在着某种模式,这意味着,当随机供给的样本数增加到一定程度后,随后的样本终将会落入前面那些样本可能的模式之中,所以只要数学模型的 VC 维度足够大,便能实现这种分类模式,这种情况下无论增添多少样本,都会落入机器能够辨识的模式,表现为训练的误差会收敛。另一方面, VC 维度太大,数学模型能够实现很多更为精细的模式,样本数据能被多种模式所拟合,训练可能选用了其中一种精细的模式,也许样本外的数据不能纳入这适应面较窄的那个精细模式,所以 VC 维度太大,错失的可能性也会越大,这表现为模型过度拟合样本,尤其是在样本含有误差的情况,对样本之外的数据误差会更大。 例如两个输入的感知器( Perceptron )或二维逻辑回归的数学模型 y = sign(w 1 x 1 + w 2 x 2 + w 0 ) ,是平面上的一条直线,它的 VC 维度是 3 ,它能够实现辨识 3 个样本点所有可能 8 种中任一的模式,对于线性可分模式的数据,再多的样本点也必定符合 3 个样本点所能划分的模式,只不过更多的样本点会更精确地调整参数,趋近应用对象数据分布的划分直线位置。如果样本含有误差,在较多的样本训练下也会得到纠正。如果采用 VC 维度更大的模型,如 y = sign(w 1 x 1 + w 2 x 2 + w 0 + w 4 x 1 2 ) ,对于线性可分的数据,经过训练它也能够辨识所有的样本,但因为数学模型是二次曲线,对样本外的判断误差较大,尤其是样本含有误差出现上图右边那种分布模式,它们能被模型更精确地拟合,但表现出更大的样本外判断的误差。 按 VC 维度概念所做的上述直观分析,用概率的语言更精确地表达是:以足够多随机选取的样本来训练,机器学习预测误差率能以足够大的概率收敛于训练样本的误差率。估计这误差的概率上界,有 Vapnik-Chervonenkis bound 公式: P ( |E in (h) – E out (h)| ε | ∃ h ∈ H ) ≤ 4m H (2N) exp( - ε 2 N/8) 这里 P 是概率,ε是任给的一个小数值, H 是数学模型的函数族, h 是训练选出的函数, N 是训练样本的个数, E in (h) 是 h 函数对训练样本预测的失误率, E out (h) 是对样本之外预测的失误率(数学期望), m H (2N) 是 H 能够两分 2N 样本的最多模式。只要函数族的 VC 维度是个有限的数, m H (2N) 对 N 按多项式速度增大远小于指数函数的减小。所以不等式的右边随 N 增大趋于 0. 
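The growth-function counts quoted above for the one-parameter threshold classifier and the two-parameter interval classifier can be checked by brute force, and the right-hand side of the VC bound can be evaluated to see how it shrinks with N. The following is only a minimal sketch: the function names (`line_cuts`, `interval_growth`, and so on) are hypothetical helpers introduced for illustration, and the enumeration assumes distinct points on a line.

```python
import math


def line_cuts(points):
    """Candidate parameter values: one below all points, one between each
    neighbouring pair, one above all points (points must be distinct)."""
    pts = sorted(points)
    return [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]


def threshold_dichotomies(points):
    """All labelings produced by h(x) = 1 if x > w else 0."""
    pts = sorted(points)
    return {tuple(int(x > w) for x in pts) for w in line_cuts(pts)}


def interval_dichotomies(points):
    """All labelings produced by h(x) = 1 if w1 <= x <= w2 else 0."""
    pts = sorted(points)
    cuts = line_cuts(pts)
    return {tuple(int(lo <= x <= hi) for x in pts)
            for lo in cuts for hi in cuts if lo <= hi}


def vc_dimension(dichotomies, max_n=6):
    """Largest N <= max_n for which N points on a line are shattered.
    On a line every set of distinct points behaves the same, so testing
    the points 0..N-1 is enough for these two families."""
    return max((n for n in range(1, max_n + 1)
                if len(dichotomies(range(n))) == 2 ** n), default=0)


def interval_growth(n):
    """Closed form m_H(N) = N(N+1)/2 + 1 for the interval classifier."""
    return n * (n + 1) // 2 + 1


def vc_bound_rhs(growth, n, eps):
    """Right-hand side 4 * m_H(2N) * exp(-eps^2 * N / 8) of the VC bound."""
    return 4 * growth(2 * n) * math.exp(-eps ** 2 * n / 8)


for n in range(1, 6):
    print(n, len(threshold_dichotomies(range(n))), len(interval_dichotomies(range(n))))
print("VC dimensions:", vc_dimension(threshold_dichotomies), vc_dimension(interval_dichotomies))
for n in (1_000, 10_000, 100_000):
    print(n, vc_bound_rhs(interval_growth, n, eps=0.1))
```

For small N the bound is larger than 1 and therefore says nothing; only once the exponential factor overtakes the polynomial growth function does the right-hand side fall towards 0, which is the behaviour the text describes.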
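This means that as long as there are enough random training samples and the training error is small enough, predictions outside the sample can also achieve a small error. Many estimates of this error bound have been studied. One of them states that, with probability at least $1-\varepsilon$ (here $\varepsilon$ is a small allowed failure probability),

$$E_{out}(h) \le E_{in}(h) + \sqrt{\frac{8 D_H \ln(2N)}{N} - \frac{8\ln(\varepsilon/4)}{N}}$$

This formula shows that the simpler the mathematical model (fewer parameters, smaller VC dimension $D_H$), the more easily its prediction accuracy after training on a very large number $N$ of samples approaches the accuracy measured on the training samples. On the other hand, the more complex the model, the smaller the training error rate $E_{in}(h)$ and the better it adapts to the samples. Successful machine learning needs both terms to be small. A learning algorithm therefore adjusts the model parameters so that, while minimizing the error on the samples, it also keeps the number of effective parameters as small as possible. The VC dimension that actually takes part in recognizing the data pattern then stays small, and the out-of-sample error drops substantially. Algorithms of this kind are called regularization.

For classifiers built from simple mathematical models, the VC dimension is roughly the number of free variables in the function family. Composing $T$ classifiers, each of VC dimension $D$, can reach a VC dimension of as much as $T(D+1)\,(3\ln(T(D+1)) + 2)$. This means a multilayer neural network can realize fairly complex data patterns with relatively few parameters, so a deep network is more powerful than a shallow one with the same number of connection weights. Conversely, for a class of applications such as image processing, using layers with a preset function in a deep network, for example convolutional layers (CNN), deliberately restricts the patterns the network can recognize, reduces the VC dimension, and gives better training results. Deep learning therefore calls for a great deal of design skill, and much remains to be explored.

The concept of the VC dimension and the probabilistic bounds show that, as long as the data of the application have a regular pattern and the model has a large enough VC dimension, the pattern can in principle be identified by training on samples. With enough random samples the model will very probably predict the application accurately. This is the theoretical basis of machine learning, and it explains why machine learning succeeds in practice.

For complex applications, machine learning needs an enormous number of parameters and random samples, which in turn requires data-acquisition technology and computers able to carry out the computation. That is why the new breakthroughs in artificial intelligence had to wait for the era of big data.

Although the mathematical theory proves that machine learning can have a small prediction error, this holds only in the probabilistic sense and only when training uses enough random, unbiased samples. Its predictions are therefore not as certain as scientific assertions, and even the probabilistic accuracy depends on the training samples being random and unbiased, which in practice cannot be verified. Successful training also often takes many attempts. In the end, machine learning only simulates the human ability to guess by analogy. This is inductive reasoning, which scientific rationality does not endorse. We understand the learning mechanism of the machine, yet we cannot trace the specific grounds for a given judgment by simple logical deduction. Out of a tangle of countless interacting parameters, the learning machine produces an overall judgment much like intuition: the computation is straightforward, but it resists summary and analysis. It shows astonishing intelligence and has undeniably brought a technological revolution, yet it shuts out our rational oversight. Can we trust a mode of cognition that resembles divination and carries error? The next post, "The Cognitive Mode of Machine Learning" (《机器学习的认知模式》), discusses this question.

* This article was previously published in Communications of the CCF (《中国计算机学会通讯》), vol. 13, no. 5 (May 2017). What appears here is essentially the original manuscript, with slightly different wording.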
杨晓凡 德先生 在 Google Cloud Next 2018 大会上有一个万众期待的环节,就是今年三月获得 2017 年图灵奖的 John L. Hennessy、 David A. Patterson 两人的现场访谈。 谷歌母公司 Alphabet 董事长、斯坦福大学前校长 John Hennessy与谷歌 TPU 团队、UC 伯克利大学退休教授 David Patterson两人因计算机体系架构的设计与评价方法以及对RISC微处理器行业的巨大影响获得图灵奖后,在多个场合进行了演讲。在计算机体系结构顶级学术会议 ISCA 2018上,两人就是受邀嘉宾,面对自己研究领域内的研究人员们进行了一场严肃、详细而富有前瞻性的学术演讲。 而在谷歌 2018Next大会上,谷歌云 CEO Diane Greene作为主持人,与两人展开了一场面向普通大众的、覆盖话题更为广泛的现场访谈。访谈氛围不仅轻松幽默,也展现了两人对整个计算机领域的独到观点。 Diane :我知道大家都很期待这个两位大牛的访谈环节。我先简单介绍一下David和John,我想大家都已经认识他们了,不过还是啰嗦一下,John Hennessy是 Alphabet的董事长,David Patterson现在是谷歌的杰出工程师(distinguished engineer)。两人合著了大多数人学习计算机硬件架构所用的教科书(《计算机体系结构(量化研究方法)》,Computer Architecture: A Quantitative Approach),这本书现在也已经发行了第二版了。他们共同开发了RISC架构,也在今年获得了图灵奖,油管上有一个很棒的主题演讲。图灵奖的颁奖词是二人「 开创了一种系统的、定量的方法来设计和评价计算机体系结构,并对 RISC 微处理器行业产生了持久的影响 」。 1997年的时候John来到斯坦福大学任教授,1981年开始研究MIPS项目。1989到 1993年John 任计算机系统实验室主任 ——其实一般介绍的时候是不用说这一句的,但是很巧的是我的丈夫就是在这个时候被招到斯坦福去的,我在斯坦福跟他认识的。然后 2000年到 2016年的 16年间John担任斯坦福大学校长,对斯坦福大学有非常大的影响。 David 1976年加入UC伯克利大学任计算机科学系的教授,从 1980年开始担任 RISC项目的领导者之一。我就是在这里跟David认识的,是他的计算机架构课上的学生 ——我猜他已经不记得我了( - David:当然记得啊。 - John:那她拿到A了吗? - David:A+,肯定是A+)。David 2016年退休,在UC 伯克利工作了 40年。两个人都培养了许多优秀的学生。我刚刚才发现Wikipedia上写了 David在一个每次 2天的自行车骑行慈善活动里从 2006一直到2012年都是最高贡献者,看来后来没有继续骑了。 两位除了为这个世界作出了很大贡献之外,还有一件很棒的事情是, 两个人都与同一个妻子结婚超过 40年 。(全场哄堂大笑) John :如果换了妻子那就肯定不算了啊! David :澄清一下,我们是各自和各自的妻子结婚,不是娶了同一个人……(全场再次大笑) 谣言就是这么产生的…… Diane :那么,你们两个人都不是在硅谷附近长大的,上大学是在 70年代,拿到博士学位要到 70年代中末了。你们肯定在这之前就对电子电气工程和计算机科学感兴趣了,虽然这是很久以前的事情了,不过当时都是因为什么原因呢? 
David :我先说吧,我当时在UCLA念数学专业,这时候大学都还没有开设计算机专业。那时候我已经知道世界上有计算机这种东西了,但我从来没有想过要做计算机行业的事情,也没有什么毕业生劝我去做计算机。大三下学期有一门数学课取消了,我就在UCLA旁听了半门计算机的课程,当时讲的是Fortran语言,用的还是打孔纸卡,不过这都不重要,重要的是就是那个时候计算机来到了我的脑海里,我被深深地迷住了。我在大四的商务和工程课的课后自学了各种计算机相关的课程,然后毕业前有人给了我一份实验室的活干,我也就这样跟着读研了。 John :我第一次接触电脑是在上高中的时候,那时候我们有时分共享的计算机和纸带,现在看起来很奇怪的。我参与了一个科学项目,要做一台能玩三连棋(tic-tac-toe)的机器,然后用的都是继电器,现在的人很难想象,但是当时我也就只买得起这个。玩三连棋的人都知道,你稍微聪明点就能赢,但是很多人其实没那么聪明,所以机器还挺经常赢的。不过也就靠这个,我说服了我当时的女朋友她们一家人觉得我做这个也能做得下去,然后她后来成了我的妻子,所以从这个角度来看结果还算不错。 然后到了我上大学的时候,现在的人肯定不会相信 70年代的时候是没法学计算机专业的,有一些计算机的研究生专业,但是没有本科专业。所以我学的是电子电气工程,也决定好了要继续读计算机的研究生,就这样不后悔地一路过来了。 Diane :这几十年过得很快,如果当时有人告诉你们未来的技术发展是怎么样的,你们会觉得惊讶吗? David :应该会吧。你知道 Gordon Moore (英特尔创始人之一,摩尔定律提出者)当时在IEEE的某个期刊发了一篇文章写了他对未来的预测,他画了几张画,画了未来的计算机和小汽车,还有计算机可以在商店里面买卖。实话说我当时是不太相信他的预测的。 John :对的,有一张画里画的就是有人在销售家用的计算机。不过即便 Gordon对未来也有一些犹豫,他说我们能预测未来 10年会发生什么,但是更远的就预测不了了。所以我觉得没有人能想到,谁能想到微处理器最终占据了整个计算机产业,完全代替了大型机和超级计算机,谁又能想到大数据、机器学习会像现在这样爆发呢? David :是这样。机器学习其实一直以来都有,但它的突然爆发、不停登上媒体头条也就是过去四五年的事情。即便只是在 10年前都很难预测到这样的发展。 Diane :确实很惊人。那么,说到摩尔定律,摩尔定律现在怎么样了呢? David : 结束了!摩尔定律结束了! 
人们很难相信摩尔定律终结了是因为十年、二十年以前就有人说摩尔定律要终结了,但现在是真的来了。摩尔定律是说,每一到两年晶体管的数量就要翻倍,在我们的演讲里John有一页PPT上就有一张图,这个翻倍的趋势已经结束了。这不代表以后不会有新技术带来新的提升了,不代表我们停滞不前了,但是确实不是每一两年就翻番了,它的速度要慢得多了。 John :对的。最近 5年里这个减速的趋势变得非常明显,这 5年里的发展速度已经比摩尔定律预测的速度要少了 10倍,而这样的缓慢发展的趋势还会继续。另外还有一个大家讨论得不那么多的、但是实际上要更尖锐的问题是 Dennard 缩放定律。Robert Dennard开发的技术大家都在使用,就是把单个晶体管作为DRAM的一个单元,我们每个人每天都在用。 他做了一个预测说,单位平方毫米的晶体管所消耗的能源是保持固定的,也就是说,随着技术的发展,单位计算能力所消耗的能源会下降得非常快。这个预测的依据是电压缩放定律等等,但是 Dennard 缩放定律现在也失效了 。这就是你看到现在的处理器需要降低频率、关闭一些核心来避免过热的原因,就是因为 Dennard 缩放定律失效了。 David :第三代的TPU 已经是水冷的了,就是这个原因。 John :对于大型数据中心来说水冷没什么不好,但是手机总不能也用水冷吧,我还得背一个包,包上都是散热片。那也太滑稽了。 Diane :成了比能源的游戏。 John :对,比的是能源了。 Diane :很有趣。那么继续这个处理器的话题,你们一个人做了RISC,一个人做了MIPS,那你们当时做芯片花了多久,为什么要做呢?这个问题挺大了的了。 David :最早我们在UC伯克利开始了RISC的研究。RISC是指精简指令集计算机。我们不仅讨论这个想法讨论了很久,我们也构建了模拟器和编译器。我们两个人都做了芯片的实物出来。 RISC的想法并不难,它的出发点是,软件要借助一定的词库和硬件沟通,这个词库就被称作「指令集」。大个头的计算机里占据了支配地位的思想是要有一个很大、很丰富的词库,可能有好几千个词,别的软硬件要使用起来都比较方便。John和我的想法与此刚好相反,我们觉得要有一个简化的词库、简单的指令集。那我们面对的问题就是,除此之外你还需要处理多少指令集、你处理它们的速度又有多快。最后我们的研究结果是,比我们一开始的计划增加了 25%的指令集,但我们读取的速度增加到了 5倍快。1984年在旧金山,斯坦福的学生和UC伯克利的学生一起在最顶级的会议上发表了科研成果,我们拿出的芯片无可争议地比工业界现有的芯片好得多。 Diane :从你们产生想法,到做出芯片发表,花了多长时间?现在做类似的事情需要花多久? David :4年。现在花的时间肯定要少很多了。 John :从当时到现在,很多东西都在同时变化。微处理刚刚出现的时候,人们都是用汇编语言写程序,然后随着逐步切换到高级语言,大家开始关注有什么编译器可以用,而不是有哪些汇编语言的好程序可以用。UNIX也就是那个时候出现的,我们开始有用高级语言写的操作系统,整个技术环境都在变化,当时虽然也有位片式的计算机,但是微处理器是一种新的技术,有着新的取舍。所有这些东西都给技术发展带来了新的起点,设计计算机也开始换成新的思路。 Diane :那么你们的想法被接受了吗? 
David :大家都觉得我们是 满大街扔鸡尾酒燃烧瓶的激进分子 。当时占据统治地位的想法就是,很丰富的指令集对软件来说更有帮助。要回到更简单、更原始的指令集,很多软件都会出问题。别人都觉得我们这是很危险的点子。1980到 1984年间,我们去那些大规模的会议参与讨论的时候,好几次别人直接生气然后开始大叫。我和John两个人在一方,其他嘉宾、会场里所有其他的人都在我们的对面。过了几年以后,他们才逐渐开始接受我们的观点。 John :不过工业界总是很抗拒。我记得当时有一个著名的计算机先驱对我说,你会遇到的麻烦是,你影响到了他们的现有利润了,因为你的技术的性价比要高得多,他们卖 20万美元的小型计算机,就要被你的 2.5万美元的小盒子代替了。对他们来说简直是毁灭性的。很多人都担心这个。也有很多人不相信这会发生,但是最后就是这样发生了。 David :今天都有很多人不认为RISC有什么好处。(笑) Diane :在你们开发 RISC的时候,Intel也发展得很快。 John :Intel做了很多事情。首先他们发现了一种非常聪明的方式实现一种叫做SIS的指令集,它可以把x86的指令集转换成RISC指令集,然后构建出RISC指令集的工作流水线。他们确实这样做了,在Pentium Pro上很高效地实现了它,在效率方面带来了很大的改进。对于芯片来说,缓存占的面积越来越大,其它的东西变得不那么重要了。但是有那么一个问题是你没法克服、也没法绕过的,就是 芯片的设计开销以及设计时间 。对Intel来说没什么问题,他们的开发周期是 2到 3 年,有四百名工程师的开发团队。但是这个世界上还有很多别的企业,比如设计移动设备的企业,你可能需要有 5款不同的芯片,而不是一款芯片用在所有的场景里,那你就需要能够快速设计、快速验证、快速制造出货的人。RISC在这方面的优势就改变了整个芯片生态的发展。 David :RISC有很大优势。John说的设计时间是一方面,能耗也是一方面。既然你用的晶体管更少了,芯片的能耗比也就更高了。 John :当你需要做低价的芯片,比如物联网领域的芯片的时候,你可能需要做每片只要 1美元的处理器。X86这样的有复杂翻译机制的芯片是没办法做到每片 1美元的。 Diane :我想问问,现在苹果、谷歌都在做自己的芯片,以前他们都没这样做。现在发生什么了? David :是的。一开始谷歌所有东西都是买现成的,现在慢慢地谷歌开始设计自己的计算机、自己的网络。John以前也说过,这些以前都是扁平的企业,现在都开始做垂直整合了。 Diane :看到这样的现状你开心吗? David :算是吧。如果你做的工作是卖新的点子,那你就希望能够找到很急切地希望尝试新点子的人。当市场上只有Intel和ARM两家指令集和芯片的承包商的时候,你必须去说服他们,写了论文以后要先去求他们。他们是守门的人。而谷歌这样的以软件为基础的企业就对硬件创新更开放一些,因为只要在他们自己的企业里面起效就可以了。 John :这样现状是因为哪里有创新的机会,哪里就会往前发展。之前很长的时间里我们都关注的是那些通用计算的芯片,它们变得越来越快。那么 随着通用芯片的效率变得越来越低,我们需要另辟蹊径。那么我们找到的方案就是为特定的任务优化芯片设计 ,比如为机器学习设计TPU,它是专用的计算芯片。那么谁有设计专用芯片所需的知识呢?就是这些垂直整合的企业们,它们有软件设计的能力,可以专门为硬件设计语言和翻译系统。这里也是一个有趣的变化,我觉得以后做计算机体系设计的人要变得更像文艺复兴时期的人,他们要对整个技术堆栈都有了解,要能够和最上层的写软件的人沟通,用和现在完全不一样的方式了解他们要的是什么。这对整个领域都很有意思。 Diane :因为太专用了,设计流程仿佛都倒过来了。做云服务的人能看到服务器上都在进行什么样的运算,他们看到的可能反倒比做处理器的人看到的还要多、还要明白。 David :对。这也是另一点有趣的地方。对云服务提供商来说,他们是一个闭环的生态系统,在企业内部它只需要解决一个局部的问题,不需要考虑通用计算市场和各种奇怪的使用状况; 它只需要解决那一个环节的计算就可以了 。所以这也会缩短开发时间。目前来看,这些大企业都很大胆地做出了各自的行动,微软在 FPGA上有很多投入,谷歌在做自定义的芯片,传闻说Amazon等等也在做一些全新的东西。所以这个时代很有趣,可以看到很多创新。 Diane :腾讯和阿里巴巴的情况如何? 
David :嗯,我觉得他们也在做芯片。 John :我觉得现在这个时候很有趣,是因为有一件我们没有预计到的事情发生了。虽然我们切换到了高级语言和标准的操作系统上来,但是 80和 90年代的时候你的硬件选择反倒变少了。PC的市场占有率太高了,大家都不得不关注PC,很多一开始专门给Mac写软件的企业都变成了专门给PC写软件的企业,就是因为PC几乎占据了整个空间,这限制了这个了领域可以出现的小创意和大创新。那么一旦我们有了很多的创新的量,我们就可以做出很多新的东西。 Diane :它对创新的限制就是因为是它单方面决定了整个过程,别人都要围绕着它来。 David :与x86指令集的二进制兼容性是一件非常有价值的事情。现在让我来看的话,这些垂直整合的企业都是在提升抽象的级别,比如从x86二级制指令到TensorFlow,这是很大的一个跨越。但是到了那个抽象的高度以后,我们就有很多的空间创新、把事情做得更好。 Diane :那语言和框架呢? David :如果抛开硬件架构不谈,现在有这种 让程序员变得更加有生产力 的运动。如果你是刚入门的计算机科学家,你肯定会学Python,这是一种很酷的语言,有很多很强大的功能,也有JupiterNotebooks支持,所以它带来的生产力很高。整个世界都有这样的趋势,我们可以看到 Python这样的脚本语言、TensorFlow这样的特定领域专用的语言等等,它们服务的用户群都更窄,就是为了提高用它们的人的生产力。 John :我觉得这就是正确的方向。如果我们想要有很高的计算性能的同时还要保持软件生产力的话,你知道只是逼程序员们写更高效的程序、发挥更多硬件能力是不行的,硬件本身也要对任务有所优化。那么我们不仅需要对编程语言做创新,还需要对整个编程环境做创新,然后把运行的结果反馈给程序员们。 Diane :这样它就能不断自己改进了。到时候全世界的人、小学生都可以编程了 John :你想象一下, 三年级的小学生在用机器学习,简直是不可思议 。 Diane :你们认为最终大家都会用某一款芯片做机器学习吗? David :以我们的职业经历而言,我觉得这是一批超乎寻常地快速发展的应用领域,由于摩尔定律失效了,它就很需要定制化的计算架构。你知道,典型的计算机架构设计就像是用 打飞盘 ,我们的子弹飞出去要花好几年,但是飞盘飞得太快了,等到子弹过去的时候谁知道飞盘已经飞到哪里了。那么我们现在看到有这么多企业都专门为任务设计优于标准微处理器的硬件,但是谁知道谁的点子最好呢,尤其是这个领域还在继续快速发展。据说现在有四五十家机器学习硬件初创公司,我们期待看到大家尝试各种各样不同的点子,然后看看最终谁能胜出。历史上都是这样,如果你回头看计算机架构的市场占有率,每个人都会做出自己的努力,然后逐渐有人胜出,有人退场了。 Diane :你觉得他们会不会受制于需要配合的那个软件? 
David :这里的考量是,因为我们提高了编程所在的抽象级别,所以不会受到限制。不然就是经典编程的问题了。 John :世界还有一个重要的变化是,如果你回头看 80年代、90年代甚至是 2000年左右的计算机,台式计算机和小型服务器还是绝对主流的计算模式。然后突然就是云计算、移动设备和IoT了,不再是只有中间那一小块空间了。这就是说,对于能耗比、性价比的取舍,我们现在可以有许多种不同的选择。这边我可以造一个 1美元的处理器用在IoT设备上,那边可以有一个水冷的三代谷歌云TPU,许多不同的运行环境,许多不同的设计考量。它也就提供了很高的灵活程度。 David :我现在觉得,这中间是什么呢,中间的设备需要考虑二进制兼容性。在云服务器上二进制兼容性不重要,在大多数 IoT设备上二进制兼容性也不重要。我们只需要创新就好了。 Diane :嗯,这些限制都不见了,那很棒。未来即将要到来的是量子计算,跟我们讲讲这是什么、讲讲你们的看法吧。 John :量子计算是「 未来的技术 」,唯一的问题是它「永远都会是未来的技术」还是有一天会真的到来。这个问题挺开放的,我自己的思考角度是, 如果大多数研究量子计算的都是物理学家,而不是工程师的话,那离我们就还有至少 10年时间 。 那么现在做量子计算的多数都是物理学家 。 量子计算的真正难度在于能否拓展它的规模。对于某一些问题它有着无可比拟的优势,其中一个它能解决得非常好的问题是因数分解,这其实对我们现在的信息安全提出了挑战,因为RSA算法的关键点就在于难以做大数的因数分解;这会给我们带来一些麻烦。其它很多方面也有优势,比如量子化学可以给出非常精确的化学反应模拟结果。但是如果要做大规模有机分子的模拟之类的真正有趣的事情,你就需要造一台很大的量子计算机。大家不要忘了,量子计算机的运行温度只比绝对零度高几K,那么我实际上就需要保持量子计算机的运行环境非常接近绝对零度。这件事很难做。而且,室内的振动、数据的采集甚至如果量子计算机没有做好电磁防护而你带着手机走进了屋里,量子计算机的状态都会完全改变。为了让它能够计算,就要保持它的量子状态,然后最终再采集它的量子状态。这其中的物理规律很惊人,我们肯定能够在研究中学到很多这个领域的知识,但是未来的通用量子计算机会怎么发展,这个问题就很开放了。 David :我觉得量子计算机和核聚变反应堆差不多,都是非常好的研究课题。如果真的能让它们工作起来的话,对整个世界都是很大的推动作用。但是它离我们起码还有十几年的时间,我们的手机也永远都不会是量子计算的。所以,我挺高兴有这么多人在研究它,我也很敬仰愿意做这种长期研究的人,你知道,以我自己来说,我的职业生涯里很难预测 5年或者 7年之后的事情,所以我做的事情都是关注短期一些的目标,比如花 5年做一个项目,然后希望再过几年它可以改变世界。不过我们也经常会错。你预测的东西离现在越远,想要预测对就越难。 Diane :你们两位都是在学术研究的环境里成长,然后加入了企业。不过学校和企业之间的关系也在不断变化吧,你们是怎么看的? David :计算机领域有一个很大的优点是 学术界和业界之间的关系是协同的、互相促进 的 。其他一些领域,比如生物学,他们是对抗性的关系,如果你在学术界你就只能做研究,到了企业就只能卖东西。我们这边不是这样的。 Diane :现在也没问题吗?现在大公司不是把学校里的教授全都招走了? John :这确实是个问题,如果做机器学习的人全都跑到业界去了,就没人来教育以后新一辈的机器学习人才了。 David :过去的 5年里人们对于机器学习的兴趣一下子爆发了,机器学习也确实有不小的商业意义,所以做机器学习的人的薪水也在上升,这确实有点让人担心。我们两个人职业生涯早期也遇到过类似的情况,也是一样的 怕把种子当粮食吃了 。当时微处理器以及别的一些东西因为太赚钱了,就会把所有大学里的人都吸走,没有人教育未来的人才了。现在机器学习确实有这方面的问题,不过你从全局来看的话,总是源源不断地有聪明的年轻人想要研究学术,所以也不会 100%的全都离开学校的。John还做过校长呢,你也说说。 John :像你刚才说的,我们这个领域的一大优点就是学术界和业界的互哺,企业的人和学校的人做的事情虽然不同但是也有项目的尊重。有许多领域都不是这样的,学术界的人觉得业界的人做的是 无聊 的工作,业界的人觉得学术界的人做的是 没用 的工作。计算机科学领域就不是这样的。其中一个原因可能是因为这个领域一直都发展很快、有很多事情在发生。你做的某项科研成果,10年后就可能改变这个领域。这真的很棒。 Diane :我可不可以这么讲,计算机领域的长期研究主要是在学术界,短期研究主要是在企业? 
David :差不多吧。 John :对,差不多吧。不过当然也有一些企业在做长期的投资,比如谷歌收购 DeepMind就是一项长期的投资。微软和谷歌也都在量子计算方面有很多投资,这些都是长期的投资。这和当年ATT收购了贝尔实验室的感觉差不多,都是长期的投资,而且这些投资让整个国家受益匪浅。 Diane :工程技术也随着科学技术在发展。最近我听说亚利桑那州有个人,Michael Crowe,创办了一所工程学校。你们怎么看? John :人们当然是在对工程本身做一些创新。计算机科学相比别的学科在工程方面也有很大的优势。我们有很多跨学科的内容,可以说有很多跨学科带着计算机科学向前走,这种趋势非常明显。有一些学科始终都起到核心的作用,比如医学和一些社会科学,那么大数据、机器学习的革命来临之后,社会科学发生了革命,我们对人类自己的了解、对整个社会的了解、如何改进整个社会都有了新的发现,这都很重要。 那么计算机科学呢,当我 2000年当上斯坦福大学校长的时候,我觉得计算机科学已经发展到头了,它就那样了。然后学生物的、学医学的人开始说「二十一世纪是生物学的世纪」,开始搞功能基因组学之类的东西 ——我不是说功能基因组学不重要,不过计算机科学可能是功能基因组学里最重要的东西,有计算机科学才能做出其中的那些发现。 所以我们看到了这一场难以置信的革命,我们看到学生的数目开始增长,以及谢天谢地,这么多年了,终于看到女学生开始变多一点了。这都是非常健康的现象。我们在吸引着他们,全国的、全世界的最聪明最优秀的人才都加入了这个领域,这让我非常激动。这也改变了整个世界。 David :当我和John刚加入这个领域的时候,其实我们自己的亲戚都觉得我们是不是入错行了,「行吧你想做就做吧」,就这样。 John :我爸都说,「做硬件还行,千万别做软件」。 Diane :我们看到科技行业吸引了这么多的资金,你们自己的学生创办了好多企业,John也建立过自己的企业等等。比尔盖茨现在不做了,全职做慈善。你们做老师的时候也像是慈善事业。那么你们怎么看慈善的事情,以及整个科技行业里的人。 David :我觉得,当年我拿到UC伯克利的Offer之后,过了一阵子才去报道。当时我看了一本书,名叫 Working ,里面采访了四十个不同行业的人,让他们谈谈自己的职业。我从书里读到的是,你要么要做一些结果很持久的事情,比如造帝国大厦,或者造金门大桥,要么和别人一起合作,比如做教师或者做牧师。这样的事情能带给你满足感,因为它们能影响到别人的生活。我自己就比较期待这样的工作。其实在美国,大家默认认为等你有钱了你就开心了, 但是其实如果你的目标是开心的话,你就直接向着开心去就好了 ,挣钱在其中不一定有多么重要。我几十年工作的驱动力就差不多是这样的。有的人其实做了研究,研究什么东西会让人快乐,其中一件事就是帮助别人。 影响别人、帮助别人能让你感到开心。所以我觉得如果你想要变得开心,你就应该帮助别人。 John :讨论这个还挺有趣的。我记得我大概 25年前和比尔盖茨有过一次讨论,我问他对慈善的观点是怎么样的。他说,微软的事情太多太忙了,我现在还没有时间考虑这个。不过如果你见过比尔盖茨本人的话,你就知道他是一个非常专注、非常自我驱动的人,从他管理微软的方式上你也能看得出来。后来当他做慈善的时候,那真的是完完全全的慈善家,他可以和斯坦福医学院的人坐下来谈生物学和疾病感染,谈得非常的深入。他和妻子梅琳达是真的非常投入地要让这个世界变得更好。Gordon Moore也是这样,他建立了摩尔基金会,在科学和保护区两件大事上花了很多钱。 比尔盖茨做慈善的时候很开心,他真的很喜欢慈善事业,他和梅琳达也是很棒的搭档。我在阿拉斯加看到了 Gordon Moore做的濒危野生鲑鱼的栖息地保护区,和 Mark Zuckerberg和他妻子 Priscilla讨论他们的慈善想法,讨论如何减轻人类疾病的影响,都非常棒。我觉得其中每一个例子、每一件事,都给他们的生命带来了一些很激动有趣的东西。 之前我做斯坦福大学校长的时候,我经常在想有什么办法激励别人变得更慈善一些。然后我看了 Alexander Hamilton 的作者 Ron Chernow写的另一本书,讲石油大亨洛克菲勒的事,他快 50岁的时候得了心脏病,差点死掉,然后他就退休了,这辈子剩下的时间都在做慈善,他创办了芝加哥大学,他建立了洛克菲勒基金会,一直好好活到快 100 岁,非常美满。所以我觉得回报他人能带来快乐,我们都是聪明的、有创意的人,都能把事情做好。这也是能真正地让世界变得更好的事情。 \0 \0
我们的文章“智能哲学:‘第三问题’与图灵的‘模仿游戏’”一文[1]着重指出了图灵提出的“模仿游戏”的真正意义和价值,揭示人、机之间的复杂层次关系,本文结合我们对当前人工智能中“机器学习”问题的研究,进一步讨论“机器”与“学习”之间所隐含的人、机复杂关系。我们从图灵的一贯思想出发,发微图灵论文“计算机器与智能”[2]中所包含的丰富思想,特别是文章中第7章的内容。很明显,作为当前人工智能主流的“机器学习”与图灵所探讨的“学习机器”,其思考的角度和深刻性完全不同,启迪良多。 一、 “亚临界”状态“和“超临界”状态 在“计算机器与智能”这篇文章的第7章Learning Machines,图灵总结了对机器不能“思考”这种谬论的反驳,但他的论证是有底线的,他真正关注的方面不是机器的“功能”(思考)如何如何,而是机器的“状态”,在他看来,机器的“纯机械”方式,如钢琴演奏或剥洋葱一样:“绝大多数思想都处于‘亚临界’状态,对应于处于亚临界体积的反应堆,一个想法进入这样的思想中,平均下来只会产生少于一个的想法”,但是“有一小部分思想处于‘超临界’状态,进入其中的想法将会产生二级三级越来越多的想法,最终成为一个完整的’理论。动物的头脑显然是处于亚临界状态的。由于这种相似性,我们不得不问:’一个机器能不能做成超临界的?’” 图灵所说的这个“超临界”的状态,在我们看来,就是指现在不同于“机械步骤”(计算机)的“人工智能”的核心理论问题。 图灵认为,“亚临界”状态和“超临界”状态之间的区分和定义是非常困难的,图灵并不以为所有的这些争论已经解决了关于人的思维与机器思维的相同与不同的问题,这里既有公众对这个问题的关心所包含的模糊性(图灵努力地进行了分析),也有这个问题的自身本质上的问题,图灵承认:“These last two paragraphs do not claim to be convincing arguments. They should rather be described as ‘recitations tending to produce belief’”(上面两段并没有宣称是令人信服的论据,更应该被看作是“为了产生信仰的背书 ”,——即对立的观点的争论不过是背颂各自的宗教式的教条)。 实际上,图灵的思考并未过时,人工智能研究中的两条道路始终存在,一方面,以“联接主义”为名,代表了重视物理关系(硬件)的一方,另一方面则是以“符号主义”为名的重视算法(软件)的一方。重要的不是这两方的对立,而是这两方都有无法克服的困难,特别是双方无法沟通所形成的思想上的混乱,令人不安地再次想起“明斯基的咒语”。 尽管今天的“机器学习”取得了巨大的成功,但在这个领域最前沿工作的专家仍然承认,无法理解和解释最基本的人工神经网络模型(ANN)的机理;另一方面,除了模拟神经元-突触的ANN模型,迄今没有产生通用的Agent 硬件,现在人工智能研究大多是在电子计算机中的建模(函数化)进行的,“人工智能”与“计算机”究竟有何不同,成了公众和专家们共同的困惑。 2017NIPS (神经信息处理系统大会 Conference and Workshop on Neural Information Processing Systems,关于机器学习和计算神经科学的国际会议)上,Test of Time(时间检验奖)论文大奖获得者Ali Rahimi 在演讲中[3],把“机器学习”称为“炼金术”(Alchemy),类似的看法或对立性的争议在学术界一直没有中断过。Rahimi 引用吴恩达的话: “Artificial Intelligence is the new electricity”,(机器学习就是新时代的电力),他们的意思是说,现在的AI研究只是纯粹的技术活,整个AI缺泛严格性和一致性的理论基础,未能成为非常稳固、有规律、有系统理论的知识体系。Rahimi以后解释说,炼金术问题和黑箱问题的区别在于,“一个机器学习系统是黑箱”和“整个领域变成了黑箱”。与此对立的观点,如Facebook的首席人工智能科学家Yann LeCun则认为[4],工程技术上创新可以从乱糟糟中带来核心的理解,加州大学伯克利分校Benjamin Recht教授也认为有条不紊的研究和大胆开拓的研究可以达到一个平衡,“我们两者都需要”。 
在此之前和之后,许多科学家表达了类似的看法和与此有关的剧烈论争,但所有这些对立的观点都承认,AI研究必须要有坚实的理论基础才能成为完整的科学理论体系,但现在的问题在于,我们不知道解决问题的方向,这种困惑几乎一直伴随着人工智能的发展历史。我们认为,可以从图灵的思想中去寻找启示,把当前作为工程技术的“机器学习”与图灵对“学习机器”的本质性思考结合起来,以获得理论研究方向上的灵感。 二、 “学习机器”与“机器学习” “学习机器”(图灵)与“机器学习”(当前AI的主流工作)这两个概念的不同就在于人(研究者)在人、机关系中的地位,也就是我们一直重视的人、机伦理关系。我们强调,图灵一直是作为机器的创造者角色进行思考的,他主要思考的是机器的“状态”,所以他细致地分析了机器的“亚临界”与“超临界”这两种状态,以我们现在习用的术语来说,这就是“线性的”和“非线性的”(指数的)两者本质的不同。 图灵始终以创造者的身份考虑“学习机器”的可能与不可能。对于他来说,算法与“机械步骤”都是功能性的,即“能行的”、“线性的”,对于专家或普通人这都不成为问题,真正的问题是:一个机器能不能做成超临界的? 而且图灵认识到,这个问题的最大困难在于,从工程学的角度上,无法回答这样的问题,以我们今天的理解,就是说,这是人的问题而不是机器的问题。 但当前人工智能研究的真正核心问题似乎还没有被人意识到,人们关心的只是如何发明、设计更好的算法,“机器学习”大部份研究几乎集中于此,所以称之为“电力”或“炼金术”并不冤枉,“机器学习”并不关心“机器学习”的本质是什么,从来没有像图灵一样反思过:“一个机器能不能做成超临界的?”,在他们意识中,似乎只要不断地“试错”下去,一定能让“猴子打出文章来”。 由此我们可以看到,这两个术语区别的重要性,特别是对这两者不加分析地混同,就隐藏或误导了人工智能研究中的本质性问题,实际上这个问题也是科学哲学的基本问题。 “试错”作为一种工程实践在以客观性和实证性为本质的科学领域内最终能产生突破性的成功,甚至引起“范式革命“ ,最终是以人的基本认知的转变,甚或以人的代际之间的替代为代价的,对于基本理论或概念的缺失,不能由“试错”产生。不能成为纯粹客观性和实证性的对象不是科学能力所及的,“智能”作为一个抽象的概念,不能成为科学研究对象。因此,在不知“智能”为何物,或者不能清楚地定义“人的智能”与“人工智能”这两个概念的情况下,想创造研究“智能”的显微镜、试验仪器、试验室或研究方法之类的想法,实际就是事先肯定了“人可以制造超临界的机器”的能力,这本身就是对科学精神的违背。 由此可以看出,“机器学习”与图灵的“学习机器”这两个概念在本质上有别,如何认识这两者的相同与不同,具有重要的实际意义。科学家习惯以科学思维方式工作,这是科学基本精神的人文价值,但以科学的客观性、实证去顶替人文精神,把“科学”当作一个咒语,用在人类所面临的一切,包括人类自身的价值、意义、命运上,这种以科学之名的狂妄与图灵的自知之明(Entscheidungsproblem)无法相比。 三、 对图灵的文章的直译、意译与释译 图灵的工作和文章的价值远没有得到充份的认识,当然图灵也不可能清楚、充份地回答所有的相关问题,但图灵对人类的能力的自知之明永远不会过时。解读图灵的文章时,理解他的思想、认知更重要。对图灵文章需要专研,在读、释中,如何深入地去理解图灵简短表达后的层次复杂性,不仅是语法语义问题,也是对历史的发掘(“知识考古学”——福柯),这是对历史的负责,更可以成为对现在和未来思考的灵感之源。 我们研读图灵文章时始终重视文章中隐含的多层次的复杂性,比如,对当时图灵写作基本思想的一致性和基本认知的理解,这可以举一段图灵原文的叁种不同的理解和译法作为例子: 原文:As I have explained, the problem is mainly one of programming. Advances in engineering will have to be made too, but it seems unlikely that these will not be for the requirements. 
直译法:正如我所解释,问题主要是编程,工程上的进步也是需要的,但这种所需不被满足的可能性似乎不大。 意译法:正如我所解释过的,(现在的)问题主要是编程这一方面,虽然作为(计算机)工程上的问题应当受到(与编程)同样的关注,但这似乎不大可能,因为这些(“编程”和“工程”两者)无法(结合在一起而)胜任这种要求。 释译法:按意译所隐含的对应,one of programming的另一方,是工程(硬件)上的要求,这里的but it seems unlikely中的it 就是指Advances in engineering,it will have to be made too,注意这个Advances不是Advance的复数,Advances是“求爱、热切的要求”的意思,是单数名词,与it对应;后面that 是表达seems unlikely原因的状语子句,因为 these (软件和硬件)will not be adequate for the requirements,即, “工程”(硬件)与“编程”(软件)不能满足同时结合起来讨论的要求(for the requirements)。以现在的方式理解,这是对创造机器能力的人的能力而言,硬件的创造不是机器的能力而是人的能力,图灵始终是作为一个机器创造者(人类身份)而考虑人类的能力问题。编程只是“炼金术”级别,即使在硬件條件简单的情况下,也是可以讨论的。此句以下,图灵分析了当时硬件條件下可以只考虑的“学习机器”问题。 四、 成人的“学习”与儿童的“教育” 图灵区别成人的学习与儿童接受教育,虽然两者都可以名之“学习”,但图灵认为成人大脑所经历的经验不同于儿童大脑接受教育的性质,就是说,这相当于“超临界”状态与“亚临界”状态的不同。因此,“与其试图编程模拟成人大脑,不如模拟儿童大脑”,现在看来很明显,成人的学习是“学而时习之”的个人历史经验过程,儿童的教育具有被动学习的性质,主要依靠记忆和训练。但即使是这样,“儿童机器”的教育仍不同于机械的“学习机器”,儿童在教育过程中的变化是受教育者的责任约束的,人类对儿童的的教育具有类似于“自然选择”的重大责任。 图灵虽然是一个技术理论专家,却充满人文关切的伦理精神:“It will not be possible to apply exactly the same teaching process to the machine as to a normal child. …… The example of Miss Helen Keller shows that education can take place provided that communication in both directions between teacher and pupil can take place by some means or other. ” (对机器不可能应用与正常儿童完全相同的教学过程,……海伦.勒女士的例子表明只要老师和学生能够以某种方式进行双向的直接交流,教育就能进行)。今天,在我们面临AI基本理论问题和受到人、机伦理挑战的困惑的时候,图灵比我们清醒多了。 五、 规则与规则的规则 对于“机器学习”而言,算法、指令、逻辑、规则等具有相同的本质,但图灵对创造“学习机器”,特别是“儿童机器”而言,“规则”与“规则的规则”具有完全不同的意义,图灵认为,这是人工智能的基本性质: The imperatives that can be obeyed by machine that has no limbs are bound to be of a rather intellectual character, as in the example (doing homework) given above. important amongst such imperatives will be ones which regulate the order in which the rules of the logical system concerned are to be applied, For at each stage when one is using a logical system, there is a very large number of alternative steps, any of which one is permitted to apply, so far as obedience to the rules of the logical system is concerned. 
These choices make the difference between a brilliant and a footling reasoner, not the difference between a sound and a fallacious one. -没有肢体的机器人(AI,Agent)所能执行的指令具有智力性质,……在这些指令中,最重要的是调节逻辑系统规则的执行顺序,因为在使用这个系统的每一步,都会有许多不同选择,在遵守逻辑系统规则的情况下,任意选择一个都是允许的。如何选择将区分聪明推理者还是愚蠢推理者(Agent),而不是区分正确推理者还是谬误推理者(计算机)。 如果我们真正理解了图灵的这种思想,就不会为无法区分作为Agent 的AI 与计算机的能力问题而烦恼。 六、 不确定性与人工智能基本问题 我们的NP理论[5]坚持图灵对希尔伯特第十问题解决的基本意义,理解线性(P定义)与非线性(NP定义)分别是最基本的本质的区别,任何以P等于或不等于NP为目的前提、假设或猜想,都是循环定义或循环论证的错误。人类只能在线性与非线性之间建立最优近似性联系(NP-algorithm),但这不能以牺牲“线性”和“非线性”自身本质为代价。这种基本认知问题上的误导,就会产生以“停机问题”替代“不可判定问题”,以“图灵检验”替代“模仿游戏”。因此可以说,这些都是以“炼金术”取代基本概念和基本理论问题研究。 图灵提问:“一个机器能不能做成超临界的?” 实际就是希尔伯特第十问题在人工智能领域的再版化。正是尊循图灵一致性的思想,我们把算法理论、NP理论自然地延申到人工智能领域,有关这些问题,我们在“智能哲学”中进行深入讨论。 参考资料: [1]http://www.aisixiang.com/zhuanti/495.html [2]A.M. Turing, Computing machinery and intelligence, Mind,59, 433- 460,1950. [3]https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==mid=2650734337idx=2sn=a0482d8899df629a8b1e8c51d302fe2echksm=871b3b7fb06cb2697b4dafe698ef87256d49b72a78a7c3cd10563890edda02adb525d9480471scene=0key=57525b5230b75ef7f8d3197a7a096eee12021e31be3b4e863872af2b768b84f3ca7ad830a9e52d3f6963ad76871fdb6e6f478f02a04a20ea4ab84682bee4e71775e51612857a0d8dab789feb98285baeascene=0uin=MTY1NzgxNjIyMg%3D%3Ddevicetype=iMac+MacBookAir7%2C2+OSX+OSX+10.11.6+build(15G1611)version=12020810nettype=WIFIfontScale=100pass_ticket=FuwY1I1ZwTdtzuygXfj8oQ3sbx2mbbuoMkvag5lwjl55D%2BAbF7fEqVeGWQ1ooFcU [4]https://www.facebook.com/yann.lecun/posts/10154938130592143, [5] http://blog.sciencenet.cn/home.php?mod=spaceuid=2322490do=blogclassid=172041view=mefrom=space (此文已在“人工智能学家”( https://mp.weixin.qq.com/s?__biz=MzIzMzc3MjYyNQ==mid=2247486720idx=1sn=c90c47e5c1f14de2b168b043cea6142achksm=e881ce3edff64728f637330c8203b82cb4fb00798934e7976ee8ea36cb0562b21abcf7e1f220mpshare=1scene=1srcid=06127nGJ5BXgB7Ha3H5W2XEXpass_ticket=ry1iovIMQpfbFs%2BmpCgDrjndsjNbybCnX1e6dRuxG1q840kDTn2OBkPNwVidofkB#rd )和”爱思想”网站( http://www.aisixiang.com/data/110518.html )刊出)
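Dimensionality reduction: Principal Component Analysis (PCA)

According to Wikipedia, PCA can be carried out in two ways, by eigenvalue decomposition (EVD) or by singular value decomposition (SVD): https://en.wikipedia.org/wiki/Principal_component_analysis

I. PCA by SVD

This post covers PCA by SVD first, since the previous post introduced SVD. Background on SVD theory: http://blog.sciencenet.cn/blog-1966190-1118220.html

Given data $X \in \mathbb{C}^{m \times n}$ (m observations, n features), we usually make $X$ zero-mean first (normally by subtracting the mean of each feature column) and then compute the SVD:

$$X = U \Sigma V^*$$

where $U$ is a unitary matrix of order m and $V$ is a unitary matrix of order n.

The principal components are $P = XV = U\Sigma$. If only the first $l$ singular values are kept, then $P_l = X V_l$, where $V_l$ holds the first $l$ columns of $V$. At this point $X$ has been turned from n-dimensional data into l-dimensional data, and the reduction is complete.

A correlation analysis of $P$ gives $P^*P = \Sigma^* U^* U \Sigma = \Sigma^*\Sigma$, which is diagonal, so the columns of $P$ (the new dimensions) are uncorrelated, that is, orthogonal.

Another option is to build the principal components as $P = U^*X = \Sigma V^*$.

To keep only the principal part, take the $k$ largest singular values ($k < r$, with $r$ the rank of $X$) and the corresponding vectors of $V$ (the right singular vectors). Because the SVD already orders the singular values from largest to smallest, it is enough to take the leading ones. Building the principal components is an orthogonal transformation of $X$. In signal processing this transformation is known as the KL (Karhunen-Loeve) transform. For KL-transform denoising see: Jones, I.F. and Levy, S. 1987. Signal to Noise Ratio Enhancement in Multichannel Seismic Data via the Karhunen-Loeve Transform, Geophysical Prospecting, 35, 12-32.

If the retained principal components are $P_k = \Sigma_k V_k^*$, the inverse transform is

$$X_k = U_k P_k = U_k \Sigma_k V_k^*$$

This is the transformed data. Some information is clearly lost, and the retained singular values can be used to compute the contribution ratio of the principal components. This is the denoising workflow.

II. PCA by eigenvalue decomposition (EVD)

General concepts

Covariance matrix: let $X$ be n-dimensional data, $X = (x_1, x_2, \dots, x_n)^T$. The auto-covariance matrix of $X$ is defined entrywise as

$$C_{ij} = E\big[(x_i - \mu_i)(x_j - \mu_j)^*\big] = E[x_i x_j^*] - \mu_i \mu_j^*$$

Letting $i$ and $j$ range over all values gives an $n \times n$ matrix. For the covariance matrix of two data sets, replace $x_j$ in the formula by $y_j$.

Correlation matrix: the first term in the last expression, $E[x_i x_j^*]$, is the definition of the correlation matrix. It follows that once the data are made zero-mean, the covariance matrix equals the correlation matrix.

This is basic signal-processing material. See Zhang Xianda (Tsinghua University), Modern Signal Processing (《现代信号处理》).

Properties of the covariance matrix (for complex data): first make every column of $X$ zero-mean and compute the covariance matrix $C = \frac{1}{m-1} X^* X$. Since $C^* = C$, $C$ is a Hermitian matrix.

Next, compute the eigenvalue decomposition of $C$, that is, its eigenvalues and eigenvectors. $C$ can be written as $C = V \Lambda V^*$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ and $V$ is the matrix whose columns are the eigenvectors belonging to those eigenvalues. Sort the eigenvectors by decreasing eigenvalue (MATLAB may return them in increasing order, so reorder if needed), normalize them to unit length, take the first $k$ columns, and form the principal components $P = X V_k$. This gives the reduced data.

III. Hands-on PCA references

https://blog.csdn.net/shizhixin/article/details/51181379
https://blog.csdn.net/google19890102/article/details/27969459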
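The eigenvalue-decomposition route described in this post can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the data are real-valued, only the leading component is extracted, and power iteration stands in for a full eigen-solver (in practice one would call a linear-algebra library). The toy data lie exactly on the line y = 2x, so a single component carries all of the variance.

```python
import math


def center_columns(rows):
    """Subtract the mean of every feature column."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix C = X^T X / (m - 1) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(c, iters=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by
    power iteration (assumes the start vector is not orthogonal to it)."""
    d = len(c)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * c[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v


def project(centered, v):
    """Scores P = X v: each observation reduced to one coordinate."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
xc = center_columns(data)
cov = covariance(xc)
lam, v = leading_eigenpair(cov)
scores = project(xc, v)
explained = lam / (cov[0][0] + cov[1][1])  # share of total variance kept
print(lam, v, scores, explained)
```

The ratio `explained` plays the role of the contribution ratio mentioned in the text: the kept eigenvalue divided by the trace of the covariance matrix.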
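SVD: Singular Value Decomposition

I. General concepts

Unitary matrix: let $A \in \mathbb{C}^{n \times n}$ (A may be complex or real). If $A^*A = I$ (the identity matrix), $A$ is called a unitary matrix. When $A$ is real, this is the familiar real orthogonal matrix.

Hermitian matrix: if $A^* = A$, then $A$ is a Hermitian matrix. If $A$ is real, it is called a symmetric matrix.

Normal matrix: if $A^*A = AA^*$, then $A$ is called a normal matrix. Unitary matrices and Hermitian matrices are both normal.

Positive definite: for a Hermitian matrix $A$, if $x^*Ax > 0$ for every nonzero vector $x \in \mathbb{C}^n$, then $A$ is positive definite. If the inequality is $\ge 0$, $A$ is positive semidefinite. The eigenvalues of a positive definite matrix are all positive.

II. Singular Value Decomposition (SVD)

Singular values: let $A \in \mathbb{C}^{m \times n}$. The non-negative square roots of the eigenvalues of $A^*A$ are called the singular values of $A$. By this definition singular values are always greater than or equal to 0.

SVD theorem: let $A \in \mathbb{C}^{m \times n}$ have rank $r$. Then there exist a unitary matrix $U$ of order m and a unitary matrix $V$ of order n such that

$$U^*AV = \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix}, \qquad \Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r),$$

where the $\sigma_i$ are the singular values. Multiplying both sides on the left by $U$ and on the right by $V^*$, and using the fact that $U$ and $V$ are unitary ($UU^* = I$, $VV^* = I$), gives

$$A = U \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} V^*,$$

which is the usual form of the SVD.

The proof of the SVD uses the Schur decomposition. See Xu Shufang (Peking University), Theory and Methods of Matrix Computation (《矩阵计算的理论与方法》).

III. Meaning and uses of SVD

By the SVD theorem, any matrix has an SVD, whereas the more familiar eigenvalue decomposition requires a square matrix. Singular values are always non-negative, while eigenvalues need not be.

Both singular values and eigenvalues are concrete indicators of the important features of a matrix. Here a feature can be thought of as a pattern, the form in which something exists. A large value marks a salient feature of the information, and a small value may correspond to noise. (A related extension: overfitting.)

If only the larger singular values of $A$ are kept, say the first $k$ ($k < r$), $A$ can still be reconstructed approximately, but some of its information is lost. The dimensions of $U$ and $V$ are reduced as well (dimension here means the number of columns, also called the features of the data, while rows are usually called observations). This is the general idea of dimensionality reduction.

SVD can be used in machine learning to remove noise from data while keeping its main features. Because there is no known target output among the possible results, no training step is needed, which is a common trait of unsupervised learning.

IV. Hands-on SVD references

1. Handwritten digits (MATLAB) https://blog.csdn.net/google19890102/article/details/27109235 http://www-users.math.umn.edu/~lerman/math5467/svd.pdf
2. Improving recommendation algorithms (SVD works here too, remarkably) (Python) https://blog.csdn.net/sinat_17451213/article/details/51201682 http://www.sohu.com/a/127149396_464826
3. Image compression (MATLAB) https://blog.csdn.net/bluecol/article/details/45971423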
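The idea of keeping only the largest singular value and accepting some loss of information can be illustrated with a small plain-Python sketch. It is an assumption-laden toy, not a general SVD routine: the function names are hypothetical, the matrix is real, and only the leading singular triplet is computed, by power iteration on $A^TA$. For the 2 by 2 matrix used here the singular values are $\sqrt{40}$ and $\sqrt{10}$, so the rank-1 approximation should miss $A$ by exactly the discarded singular value.

```python
import math


def top_singular_triplet(a, iters=200):
    """Leading (u, sigma, v) of a real matrix via power iteration on A^T A.
    Assumes the start vector is not orthogonal to the top right singular vector."""
    m, n = len(a), len(a[0])
    v = [1.0 / (j + 1) for j in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(a[i][j] * av[i] for i in range(m)) for j in range(n)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in av))  # sigma = |A v|
    u = [x / sigma for x in av]
    return u, sigma, v


def rank_one(u, sigma, v):
    """The rank-1 matrix sigma * u * v^T."""
    return [[sigma * ui * vj for vj in v] for ui in u]


def frobenius_distance(a, b):
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


a = [[4.0, 0.0], [3.0, -5.0]]
u, sigma, v = top_singular_triplet(a)
a1 = rank_one(u, sigma, v)
err = frobenius_distance(a, a1)
print(sigma, err)
```

The reconstruction error equals the singular value that was dropped, which is why small singular values can be treated as noise and large ones as the main features.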
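Let us set aside the usual arguments for data visualization, such as making results presentable to the boss.

This post follows Wikipedia: https://en.m.wikipedia.org/wiki/Anscombe%27s_quartet

The figure in that article shows the famous Anscombe's quartet. The four datasets have the same summary statistics but different x and y values, and fitting a simple linear regression gives the same result for all of them. In fact the accuracy of the fit is questionable: for some of the datasets it works, for others it is simply wrong.

| Property | Value | Accuracy |
| --- | --- | --- |
| Mean of x | 9 | exact |
| Sample variance of x | 11 | exact |
| Mean of y | 7.50 | to 2 decimal places |
| Sample variance of y | 4.125 | plus/minus 0.003 |
| Correlation between x and y | 0.816 | to 3 decimal places |
| Linear regression line | y = 3.00 + 0.500x | to 2 and 3 decimal places, respectively |
| Coefficient of determination of the linear regression | 0.67 | to 2 decimal places |

Look closely at the plots. The fits for the second and fourth datasets are plainly wrong. The third is barely acceptable but inaccurate, because it contains one outlier that could clearly be discarded. Only the first fit is correct.

So during data exploration a simple check is necessary to see whether the data actually suit the model at hand. The model matters, but data quality matters more.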
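The claim that all four datasets share the same summary statistics can be checked directly. The sketch below uses the published Anscombe (1973) values as listed on the Wikipedia page. The `summarize` helper is a hypothetical name introduced here, and the statistics are computed by hand with the standard formulas rather than with any particular library.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, correlation and least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": sxx / (n - 1),
        "mean_y": my,
        "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in summaries.items():
    print(name, {k: round(val, 3) for k, val in s.items()})
```

The printed rows agree to the precision given in the table, which is exactly why the numbers alone cannot reveal the curved pattern in dataset II or the single leverage point in dataset IV. Plotting the points is the simple check the post recommends.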
前记: 这篇或者即将的这系列博文,主要是缘于我们学院一帮可爱的孩子,在联络一些老师做些关于大数据、人工智能等的科普文章,所以才写,既然写了,也想分享给大家。 其实受邀给大家分享一点儿对于这个时代的标签之一——“大数据“的见解,我是诚惶诚恐的。因为众多的大咖都在布道、躬身实践,唯恐自己的浅薄认知,让人贻笑大方。不过想到知识的包容性,也就释然一些。今天我就以自己这几年对数据的应用认知、基础理论发展脉络的把握,和大家一起聊聊我认为的数据,数据分析与挖掘的核心问题,以及什么是数据价值再造,如有偏颇,欢迎大家批评指正。 2013年9月,我与国外导师Kang L. Wang教授辞行、谈及未来规划的时候,他像孩子般率真微笑道“Yu, do you know the Big Data?”哈哈,说实话当时我有点儿懵圈了~大数据?难道就是直译大数据?还是一个更加专业的术语?“Sorry, I’ve no idea. Professor, is there any story?”随即,王老师给我讲了一个美国FBI的例子,又讲了一个发生在UCLA的故事,这些都是他眼中的大数据。他还说,现在国内big data已经很火了,建议我回去可以好好思考。例子的具体细节不是很清楚了,但是大体明白就是美国也有所谓的大数据,但是他们并没有上升到一个十分夸张的万能概念角度,而是有很多实实在在的应用,更多是为了保护国家的信息安全,等等,诸如此类。这是我生平第一次与大数据有交集。 随后回国,准备博士毕业,申请留校等工作。至次年2014年3月份,留校已经敲定,可以正式留在信软学院嵌入式实时计算团队,去见团队负责人雷航教授的时候,他语重心长对我说“晓瑜,你的博士研究方向量子计算和量子电路综合,很显然工作后将不能继续了,因为团队没有这方面的任何储备。你需要有大局观,站在团队的发展规划上,重新立意自己的研究方向……现在团队有这样几个方向:嵌入式操作系统、大数据、图形图像等,你需要考虑一下选择哪个?”其实这几个都不是我擅长的,第二次听到“大数据”已经不觉得陌生了,至少已经出现过一次了,我当时鬼使神差般回答道“雷老师,我就选大数据吧~”哈哈哈,人生很多时候就是各种戏剧,各种无常,似乎这才符合人生! 就这样我开始真正结缘、走近、认知、熟悉大数据,也开始真正的作为一个参与者而不是旁观者,来审视、建设、推动、批判这个新兴的交叉学科研究方向。一直以来,我都觉得自己有一个很强大的性格特质,那就是“随遇而安”而又适应性极强,最重要是总能活下来,姑且这是自我阿Q的一种精神激励法吧~就像从本科的EE到研究生的CS,再到博士期的Quantum Computing,自己还是自得其乐的。当然,这期间也会遭到质疑,这样会不会不够专注?会不会在哪个阶段就毕不了业呢?其实,我还真的没有想那么多,只是觉得喜欢,或许就是无知才无畏,无畏才有更多的创造力! 
从2014年开始,可以称得上是我的大数据元年,我开始穿梭在国内顶级的几个大数据国际会议现场活动中,开始大量快速的阅读和大数据相关的新闻、图书、paper,其实只要你肯花时间,找到关键点,很快就能切入。我用了半年的时间,对于国内大家对大数据的认知程度、大数据的发展现状、大数据在产业界、学术界、政界等的天花板也了解的七七八八。随后,就开始深入建设我们自己的学术科研队伍,我们几位老师和学生给我们的大数据小组取名SunData Group(尚数据工场)。队伍从一开始只有2位老师、2位本科生;到1年后,我们有3位老师、5位研究生、10+位本科生;再到现在的5位老师、5位博士生、30+研究生、100+优秀本科生。这个成长速度和我们自身的努力分不开,但是还远远没有驻足,因为我们只是完成了长胖的过程,还没有完成长高、健美的过程。和大家分享这些经历,主要是想告诉大家,大数据,离你我并不远,也不神秘,只要你愿意,你也可以像我们一样,快速融入,并能深入和升华。 好,我们言归正传。今天主要和大家探讨如下几个问题: 1、什么是数据,什么是大数据 数据某种程度上是对我们周围的物理信息世界的一种符号抽象,所以数据包罗了各种信息,有用的、无用的、有序的、无序的、显式的、隐式的……同时与数据千丝万缕扯不清的两个概念就是:信息与知识,在我个人看来,数据好比原石,信息好比初步磨出的翡翠,而知识则是经过精雕细琢之后的一件翡翠艺术品。 自2003年世界进入大数据元年开始,各种机构、各路学者陆续给出了对大数据的解读,最后大家初步形成共识的是,大数据的4V+O特征,分别对应了:volume(体积,数据尺度大)、velocity(高速的数据in and out)、variety(数据类型的多样化)、veracity(数据的准确性)、online(线上数据实时性等)。其实有些场合我们也会增加一个V,那就是value(价值),因为无论是数据科学处理的终极目标,还是大数据处理的最终结果,如果没有价值体现,这项工作都将毫无意义。 说到大数据的首要条件就是数据量大,那么究竟多少算得上是大数据呢?我们知道不同行业领域的数据,其尺度存在较大的差别,比如社交媒体产生的数据量就远远超过我们高校学生数据。因为社交媒体含有大量的音频、视频、图片等大容量文件,而高校学生服务数据,多以电子表格、交易记录等为主,数据量基本以KB起,而前者动辄几百个GB,甚至达到TB。所以这几年学术界和产业界呼声较高的一个界定,基本上在PB级或PB级以上才算得上大数据尺度。数据的容量尺度为:KB à MB à GB à TB à PB à EB à ZB…… 同样大数据由于其多样性,也决定了大数据处理的时候,面临的数据类型不再是单纯的结构化数据,还有更多的半结构化、非结构化数据,如我们读的报纸、看的视频、听的广播和歌曲、拍的照片等,这些都称为非结构化数据。所以,大数据处理除了面临数据类型的挑战,还有就是关于海量数据存储的问题。其中,云存储与分布式文件存储等技术,有效的解决了这一问题。大数据时代的来临不是偶然,而是其他方方面面的技术发展带来的必然结果。试想二十年前信息高速公路刚刚提出来的时候,没有想到数据爆炸如此迅猛,自然,数据尺度很难达到所谓的“大”;十年前如果没有云计算技术的落地,今日如此海量数据,该如何存放,又该如何快速计算;近些年如果没有深度学习、机器学习、神经网络等核心技术算法的快速发展,如何支撑大数据的应用落地,等等。 2、大数据的核心问题 说到这里,不得不把大数据的核心问题单独拿出来与大家探讨。与传统的概率统计、机器学习相比,大数据处理有这样几点是需要我们初学者明确的: (1)全体数据,而不是样本数据 大数据研究的是全体数据的问题,而不是抽样样本的相关问题。这一点就决定了数据越多越好、数据越全越好,因为只有这样才能更加接近大数据的全体数据,才能更加接近事物的真相和本质。 (2)关联关系,而不是因果关系 大数据研究的是数据间的关联关系,而不是传统的因果关系。因果关系我们很明确,就是有这个结果,一定有导致其产生的原因,这个因果关系在辩证唯物主义上是普适的,是大家认知所接受的。然而,关联关系是完全无关因果的一种逻辑,正如大家耳熟能详的“啤酒与尿布的故事”、“蝴蝶效应”等。我们通过发现凡是购买了尿布的消费者,一般也有很大的概率购买啤酒这一有趣事实,来指导超市的货架展销策略,将尿布和啤酒放在靠近的区域,进而提升了销售额。这里你就很难说,因为他买了尿布,所以他又买了啤酒;我们只能说买了尿布,进而买啤酒的概率很大,二者有一定的关联关系。这就是突破我们传统认知的关联关系,也是我们要开始进行大数据处理必须学习的。 (3)预测而不是断定 
当然了大数据是有很强大的功能,帮助我们挖掘很多隐藏在数据背后的真相,但是它也不是万能的!就好像我之前在博客里提到的一本网络小说《当我谈论算命时,我想谈的是大数据》,暂且不论这本书是否严谨,但是大数据的核心问题之一,与之有异曲同工之妙。大数据处理、分析、挖掘,最后的结果都是对下一步,或者之后即将发生事情的一种预测,既然是预测就无法做到百分百准确,总是存在概率问题。这一点就有点儿不同于传统的统计分析,并不能够准确给出事物发生的条件概率。甚至通过某些参数的调优工作,只能无限逼近,却永远无法到达。既无奈,又让人执着! (4)决策支持是价值体现 大数据处理的终极目标是实现对决策者的客观第三方辅助支持,那么这就回到了所谓的人工智能中真正的智能决策问题上,这个open question,至今也是争论不休。何谓真正的智能,何为人工智能?对于这两个问题,我们今天暂且不去过多讨论。 首先看下大数据预测与决策支持的问题,我们知道大数据一定是面向于行业和领域应用的。因为很多时候,抛开数据背后的业务逻辑,我们是无法解读出更多数据隐含的信息的。那对于数据分析师而言,既要理解业务逻辑,同时又要能够将数据分析、挖掘的结果,作为一个有利的辅助支撑材料,提供给决策者,以便综合做出最优的决策。大数据处理的价值体现,就在于提供的这个决策到底能起多大的分量。 由于时间和篇幅的有限,下一篇,我们将一起笑谈数据分析与数据挖掘处理的几类核心问题,同时聊聊大数据与物联网、云计算等的关系。
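Artificial intelligence is hot, and so are its gurus. What is the gurus' magic tool? Some say they have found it: the EM algorithm. See this article: "The Nine Levels of the EM Algorithm: EM as Understood by Hinton and Jordan" http://mp.weixin.qq.com/s/NbM4sY93kaG5qshzgZzZIQ

Another story that has recently drawn attention online is that the winner of an AI paper award said in his acceptance speech that AI methods are alchemy, which prompted rebuttals from members of the guru family. The report is here: http://baijiahao.baidu.com/s?id=1586237001216079684wfr=spiderfor=pc

Reading these two pieces made me ask: is the EM algorithm alchemy?

For the past two years I happen to have been working on a new algorithm intended to improve on EM ( http://survivor99.com/lcg/CM/Recent.html ), so I am fairly clear about where EM has problems. My preliminary conclusion is that although the EM algorithm has theoretical problems, it really has produced gold. Put the other way round, although EM has produced gold, its convergence is very unreliable, and the popular explanations of why EM converges are specious. Some people have mystified it, which makes it open to the charge of alchemy. Where is the evidence? Below I first use the mixture model as an example to introduce the EM algorithm briefly, and then show that the popular proof of EM convergence is wrong (I am not saying the algorithm is wrong).

Because there are too many formulas, please see the PDF for details: http://survivor99.com/lcg/CM/cm-ljs.pdf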
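For readers who have not seen EM applied to a mixture model, here is a minimal textbook-style sketch for a two-component one-dimensional Gaussian mixture. It is not the author's improved algorithm and it takes no position on the convergence argument in the PDF. The function names, the toy data, the starting values, and the small variance floor are all assumptions made for illustration. The log-likelihood is recorded at every iteration so that the convergence behaviour can be inspected directly.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, mu, var, weight, iters=50):
    """EM for a 2-component 1-D Gaussian mixture.
    mu, var, weight are length-2 lists of initial guesses."""
    mu, var, weight = list(mu), list(var), list(weight)
    history = []  # log-likelihood of the parameters used in each E-step
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp, loglik = [], 0.0
        for x in xs:
            joint = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(loglik)
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-6)  # floor to avoid a collapsing component
    return mu, var, weight, history


xs = [-2.1, -1.9, -2.0, -1.8, -2.2, 2.9, 3.1, 3.0, 3.2, 2.8]
mu, var, weight, history = em_two_gaussians(
    xs, mu=[-1.0, 1.0], var=[1.0, 1.0], weight=[0.5, 0.5])
print(mu, var, weight)
print(history[:5])
```

On well-separated data like this the estimates settle within a few iterations. With overlapping clusters or poor starting values the same code can converge slowly or to a different local solution, which is the kind of behaviour the post is concerned with.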
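Everyone surely knows UC Berkeley's standing in computer science. The top four in the field, MIT, Stanford, Berkeley, and CMU, have made countless contributions to computing, and Berkeley is especially known for systems research. In 2009 Berkeley's leading researchers wrote the paper "Above the Clouds: A Berkeley View of Cloud Computing", heralding the rise of cloud computing research. Its 6000+ citations tell you how well it was received. Recently, in response to the current AI boom, Berkeley's researchers have written "A Berkeley View of Systems Challenges for AI", which considers, from a systems research perspective, a number of worthwhile research topics in support of AI. Its influence can be expected to be considerable as well.

Before introducing the paper proper, let us look at the author list and gossip a little about the authors:

Ion Stoica: ACM Fellow, an author of the Chord P2P protocol and of Apache Spark.
Dawn Song (宋晓东): 2010 MacArthur Fellow ("genius grant"), one of the few female Chinese professors at the top-four schools.
Raluca Ada Popa: relatively young and not yet widely known; works on security.
David Patterson: inventor of RISC (reduced instruction set computer), member of both the US National Academy of Sciences and the National Academy of Engineering, Fellow of the Computer History Museum, ACM/IEEE/AAAS Fellow, recipient of the ACM Distinguished Service Award.
Michael W. Mahoney: I do not know much about him.
Randy Katz: one of the inventors of RAID disk arrays, author of the book Contemporary Logic Design (《现代逻辑设计》), member of the US National Academy of Engineering and the American Academy of Arts and Sciences, ACM/IEEE Fellow.
Anthony D. Joseph: works on secure machine learning; his personal homepage is no longer updated, so he seems to keep a low profile.
Michael Jordan: an author of LDA, a giant of machine learning, member of the US National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences, ACM/AAAI Fellow, winner of the Rumelhart Prize (the top award in cognitive science) and of the ACM/AAAI Allen Newell Award, named among the most influential computer scientists in 2016.
Joseph M. Hellerstein: Jim Gray Professor of Computer Science at Berkeley (the chaired professorship in databases), named to MIT Technology Review's TR100 list of the world's top 100 young innovators, ACM Fellow; many of his research results have been adopted in commercial and open-source databases. (Our areas are related, and I personally enjoy his papers, which are very insightful.)
Joseph Gonzalez: a newcomer, an author of GraphLab.
Ken Goldberg: a remarkable figure, not only a computer scientist but also an artist, writer, and inventor. His wife, Tiffany Shlain, is a well-known film director, and their films have won awards at many festivals. He has also directed a ballet and invented a sound installation...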
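Ali Ghodsi: a young researcher whose papers many of you have probably read; CEO and a co-founder of Databricks, and an author of Apache Spark and Apache Mesos.
David Culler: a leading networking researcher, an author of PlanetLab.
Pieter Abbeel: a leading machine learning researcher who has founded several companies; a student of Andrew Ng (吴恩达); winner of the MIT TR35 innovation award (35 innovators under 35 worldwide) and of the Presidential Early Career Award for Scientists and Engineers, along with many other national and international awards and best paper awards at NIPS, ICML, and ICRA, too many to list.

The paper is long, so here are the key points first.

Four trends:
Mission-critical AI
Personalized AI
AI across organizations
AI demands outpacing Moore's Law

Nine challenges:
Continual learning
Robust decisions
Explainable decisions
Secure enclaves
Adversarial learning
Shared learning on confidential data
Domain-specific hardware
Composable AI systems
Cloud-edge systems

The full translation follows; the original is at https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-159.pdf

A Berkeley View of Systems Challenges for AI

Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel

Abstract: In recent years, with advances in and commercialization of computer vision, speech recognition, and machine translation, and with the widespread deployment of machine-learning-based back-end technologies such as digital advertising and intelligent infrastructure, AI has moved from a laboratory research project to a key technology that production systems cannot do without. This broad adoption has been driven by the accumulation of massive data, unprecedented computing power, steady progress in machine learning methods, continued innovation in systems software and architecture, and the open-source projects and cloud platforms that make these technologies easy to deploy.

The next generation of AI systems will affect our lives even more broadly: by interacting with the environment, AI will make more critical and more personalized decisions on behalf of humans. For AI to play this larger role, we face many hard problems: we need AI systems that make timely and safe decisions in all kinds of extreme situations, for example that are robust under malicious attacks and that can process data shared across organizations and individuals while preserving privacy. With the end of Moore's Law, the capacity to store and process data will be constrained, making these challenges harder still. In this paper we outline specific research directions in systems, architecture, and security.

Keywords: artificial intelligence, machine learning, systems, security

1.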
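Introduction

Since the idea of simulating human intelligence was first proposed in the 1960s, AI has become a widely applied engineering technology. Using algorithms and data, it solves problems including pattern recognition, learning, and decision making. It is used by a growing number of disciplines in engineering and science and in turn needs support from research in many other disciplines, making it an interdisciplinary technology within computing.

Computer systems have driven the recent progress of AI. Advances in parallel computing devices and highly scalable software systems have spurred the development of machine learning frameworks and algorithms, allowing AI to handle large-scale real-world problems. Storage devices, crowdsourcing, mobile apps, the Internet of Things, and the rapidly falling cost of data collection have further advanced data processing systems and AI techniques. In many practical tasks AI has approached or even surpassed human performance. Mature AI techniques have not only greatly improved the quality of mainstream products such as web search and e-commerce but also fostered emerging industries such as IoT, augmented reality, biotechnology, and autonomous vehicles.

Many applications require AI systems to interact with the real world in order to make decisions: drones, robotic surgery, medical diagnosis and treatment, virtual assistants, and so on. Because the real world keeps changing, sometimes in unexpected ways, these applications need continual learning, life-long learning, and never-ending learning. Life-long learning completes new tasks by efficiently transferring and reusing knowledge already learned, while minimizing the problems caused by catastrophic forgetting. Never-ending learning processes a set of tasks in each iteration; as the task set keeps growing, the quality of the results improves with every iteration.

Meeting these needs poses daunting challenges: how to actively explore a dynamically changing environment, how to make safe and stable decisions under malicious attacks and noisy inputs, how to make decisions more explainable, how to design modular architectures that simplify building applications, and so on. Moreover, with the end of Moore's Law we can no longer count on growth in computing and storage capacity to solve the problems of these next-generation AI systems.

Solving these problems requires coordinated innovation in architecture, software, and algorithms. This paper does not address specific problems in AI algorithms and techniques; rather, it analyzes how important systems research is to the progress of AI and proposes a number of worthwhile systems research directions.

2. Reasons behind AI's success

The rapid development of AI over the past two decades comes down to three factors: (1) big data; (2) highly scalable computer and software systems; and (3) the rise and popularity of open-source software (Spark, TensorFlow, MXNet, Caffe, PyTorch, BigDL) and public cloud services (Amazon AWS, Google Cloud, MS Azure), which let researchers easily rent GPU or FPGA servers to validate their algorithms.

3.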
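Trends and challenges

Although AI has already been applied in many domains, people hope to see it play a role in many more, including health care, transportation, manufacturing, defense, entertainment, energy, agriculture, and retail. Large-scale systems and machine learning frameworks have helped AI achieve a measure of success, and we expect computer systems to advance AI further. To meet the challenges, we need to consider the following trends in AI.

3.1 Mission-critical AI

From banking transactions to autonomous driving, and on to robotic surgery and home automation, AI is beginning to take on mission-critical tasks, applications that are often closely tied to human life and safety. If AI is to be deployed in dynamically changing environments, AI systems must continually adapt to new environments and learn new skills. For example, a self-driving car should adapt quickly to unforeseen hazardous road conditions (such as an accident or an icy road), which it can learn in real time by observing how other cars handle them; likewise, an AI-based intrusion detection system must detect a new attack immediately after the intrusion occurs. In addition, these mission-critical applications must cope with noisy data and defend against malicious attacks.

Challenge: design AI systems that learn and adapt continually through ongoing interaction with a dynamic environment, so that they make timely, robust, and safe decisions.

3.2 Personalized AI

From virtual assistants to autonomous driving and political campaigns, personalized decisions that take into account user behavior (a virtual assistant learning the user's accent) and user preferences (a self-driving system learning the user's driving habits and preferences) are increasingly important. This requires collecting large amounts of sensitive personal information, and misuse of that information can in turn leak users' privacy.

Challenge: design systems that support personalized services while protecting users' privacy and ensuring their security.

3.3 AI across organizations

Companies use third-party data to improve the quality of their own AI services; many hospitals have begun sharing data to prevent epidemic outbreaks; and financial institutions share data to improve their fraud detection. Previously a company processed and analyzed data collected by its own business and built services on it; in the future, multiple companies will share data to provide services. This trend will drive a shift from data monopolies to data ecosystems.

Challenge: design mechanisms for sharing data across organizations that support cross-organization AI systems while keeping each organization's own data confidential; even data shared with competitors must not leak private information.

3.4 AI demands outpacing Moore's Law

The ability to process and store big data has been key to AI's recent success, yet keeping big data processing capacity in step with AI's needs will become increasingly difficult, for two main reasons.

First, data volume continues to grow exponentially. A 2015 Cisco white paper states that the data collected by Internet of Everything devices will reach 400 ZB by 2018, almost 50 times the estimated volume in 2015. A recent study predicts that by 2025 processing human genomes will require a three to four orders of magnitude increase in computing capacity, which means capacity must at least double every year.

Second, relative to this data explosion, growth in hardware capability has hit a bottleneck. DRAM and disk capacity are expected to double only once over the next decade, and CPU performance only once over the next two decades. This mismatch means that storing and processing big data will become exceptionally difficult.

Challenge: develop domain-specific architectures and software systems that meet the needs of AI applications in the post-Moore's Law era, including custom chips for specific AI applications, edge-cloud systems that improve data processing efficiency, and techniques for abstracting and sampling data.

4.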
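Research directions

This section discusses how innovations in systems, security, and architecture can address the challenges raised above. We identify nine research directions, grouped under three themes: acting in dynamic environments, secure AI, and AI-specific architectures. A figure (omitted here) summarizes how the four trends relate to the nine research directions.

4.1 Acting in dynamic environments

In contrast to today's mainstream practice of learning from static data, future AI applications will operate in far more dynamic environments, characterized by sudden, unpredictable, and unrepeatable change. Consider a team of robots responsible for the security of an office building. When one robot breaks down or a new one joins, the others should update their patrol routes, goals, and coordination mechanisms in a consistent way. Or when the environment changes abruptly, because of a robot's own abnormal behavior (say it has been maliciously taken over) or an external change (say an elevator fails), all robots must quickly readjust their strategies. This requires AI systems to respond quickly even in situations they have no prior experience handling.

Research topic 1: Continual learning. Many current AI systems, including movie recommendation, image recognition, and machine translation, work by offline training and online prediction. That is, these tasks are not performed by learning continually from live data but by periodically learning from static data covering some time window and using the resulting model to predict the future. Typically this periodic training runs once a day, updating the model from the previous day's data; at best it runs hourly. But predictions and decisions are needed every second. Periodic training may learn from stale data and so cannot adapt to a continuously changing environment, which is especially dangerous for mission-critical tasks. Some tasks even require AI to keep learning and adapting to asynchronous changes, which makes continual learning harder still.

Learning in dynamic environments can be partly handled by online learning, which updates the model as data arrives. But traditional online learning copes only with changes in the data, not with changes in the environment (as in the robot example). It also needs timely reward feedback on its actions in order to update its model, so it cannot handle delayed feedback (in a board game, for example, the reward, a win or a loss, is known only when the game ends).

These situations can be handled with reinforcement learning (RL). The core task of RL is to learn a policy function that maps observations to actions so as to maximize some long-term reward. In autonomous driving, for instance, with the goal of avoiding collisions, it maps images from the car's camera to a braking action; in a recommender system, with the goal of increasing sales, it maps a user's page request to the action of showing a particular ad. RL algorithms update the policy according to how the agent's actions affect the environment; if a change in the environment changes the reward, the policy is updated accordingly. RL has in fact been very successful in some areas, achieving good results in backgammon, learning to walk, and learning basic motor skills. However, it requires tuning for each application. Recent methods that combine deep neural networks with RL (deep RL) achieve more stable learning and apply across domains, including Google's recent AlphaGo, and have also succeeded in medical diagnosis and resource management.

Systems for RL: Many RL applications today rely on simulated real-world feedback to solve complex tasks, typically needing trillions of simulations to search the solution space, for example trying variants of game settings in game-playing AI or testing different control policies in robot simulation. A single simulation may take only a few milliseconds, but simulation times vary widely: a game may be lost after a few moves or won after several hundred. A real-world RL deployment must process environment data observed by many different sensors; the processing tasks for different sensors can differ greatly in duration, computation, and resource needs, and the system must handle these heterogeneous tasks within fixed time bounds. For example, a large cluster should be able to complete millions of simulations per second. Existing systems fall far short of this: popular data-parallel systems handle only thousands to tens of thousands of simulations per second, while high-performance computing systems and distributed deep learning systems cannot handle heterogeneous tasks. New systems that support RL applications efficiently are therefore needed.

Simulated reality (SR): The ability to interact with the external environment is key to RL's success. However, interaction with the real world can take a long time to yield feedback (tens or even hundreds of seconds) and can cause irreversible physical damage, while a good policy often takes millions of interactions to learn. This makes learning through real-world interaction impractical. Some algorithms can reduce the number of real-world interactions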
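, but a more general approach is to use simulated reality: before actually acting, the agent repeatedly simulates and predicts in a simulated environment and then executes the action with the highest reward and lowest cost.

Simulated reality makes learning both faster and safer. Imagine a robot cleaning a room that comes across a new phone it has never seen before. If it learns how to pick the phone up through a series of trials in the real world, it may need a long time, and one overly forceful attempt may crush the phone. If instead the robot can capture the phone's shape, try different actions in a simulated environment, and learn the phone's hardness, texture, and weight, it can then pick the phone up in the real world with a suitable grasp and force, and the phone is not damaged.

Simulated reality differs from virtual reality (VR). VR simulates an imaginary environment (for example Minecraft, in which players play with textured cubes in a randomly generated 3D world) or replays past real-world scenes (for example a flight simulator), whereas SR simulates the real environment the AI agent is currently interacting with. SR also differs from augmented reality (AR), which inserts virtual objects into real-world scenes.

The biggest challenge for an SR system is that, to simulate a constantly changing real environment, it must keep updating the simulator's parameters while also running many simulated trials before each action. Because the learning algorithm interacts with the real world, it gains knowledge that can improve simulation accuracy; after every real interaction it must update the simulator parameters and, before the next action, complete a great many "what would happen if I did this" trials. The simulations must therefore be extremely fast.

Research: (1) build systems for RL that fully exploit parallelism, support dynamic task graphs, respond within milliseconds, and sustain that responsiveness on heterogeneous hardware; (2) build SR systems that faithfully simulate the (dynamically changing, unpredictable) real-world environment and respond in real time.

Research topic 2: Robust decisions. When AI makes decisions on behalf of humans, especially in mission-critical tasks, it should make robust decisions even when its inputs and feedback are uncertain or wrong. Noise tolerance and robust learning are core problems in statistics and machine learning, and adding systems-level support can significantly improve on traditional methods. For example, we can build systems that track data provenance and treat unstable data sources with special care to limit the effect of uncertainty; we can also use information from other sources to help build a noise model for each source (for example, to discover an occluded camera). These capabilities require the data storage system to support provenance checking and noise modeling. Two kinds of robustness matter most for AI systems: (1) robust learning under noisy inputs and malicious false feedback; and (2) robust decision making in the presence of unforeseen inputs and adversarial inputs (inputs crafted by an attacker to make a machine learning model err).

Learning systems use data from unreliable sources, which may be mislabeled, sometimes deliberately. Microsoft's Tay chatbot, for example, relied too heavily on conversations with humans to improve its dialogue skills; after being put on Twitter for a while, Tay went bad. (One day after launch, Tay began making extreme remarks, including racist ones, so Microsoft suspended Tay's Twitter account; these remarks evidently arose from being deliberately taught through interaction with people posting extreme views online.)

Besides handling noisy data, another research problem is coping with inputs whose distribution is entirely different from the training data. We want the system to recognize such anomalous inputs and respond safely: in autonomous driving the safe action might be to slow down and stop, or, if a person is at hand, to pass control to the human. Ideally the model should explicitly decline to act on inputs it is not confident about, or fall back on a default safe action, which greatly reduces computational cost while yielding accurate and reliable actions.

Research: (1) build AI systems that track data provenance precisely, relate changes in reward to each data source, and automatically learn a per-source noise model; (2) design programming interfaces and languages for specifying decision confidence intervals, so that users can set the interval according to the safety needs of the application and anomalous inputs can be flagged.

Research topic 3: Explainable decisions. Beyond black-box predictions and decisions, AI systems often need to explain their decisions to humans, which matters especially in regulated tasks and in applications with legal liability such as security and health care. Explainability here is not the same as interpretability. Interpretability only means that the output of an AI algorithm is understandable to a domain expert, whereas explainability means being able to point out which properties of the input led to the output and to answer counterfactual questions (that is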
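, what would have happened if something that did not actually occur had occurred) or "what if XX?" questions. In medical diagnosis, for example, I want to know which features of an organ seen on an X-ray (such as size, color, position, and shape) led to the diagnosis, how the result would change if those features changed slightly, whether other combinations of features would lead to the same diagnosis, and which combinations are most likely to. We want not only to explain the output but also to know which other inputs would produce it. This kind of causal inference will be an essential capability for many future AI tasks.

In practice, a key to supporting explainable decisions is the ability to record and replay the computation that led to a decision, which needs systems-level support. The system can replay the computation on the past inputs that produced a decision, on random or adversarial inputs, or on counterfactual inputs. A system able to replay computations on such different inputs helps the AI system analyze the causal relationship between inputs and outputs and makes its decisions more explainable. For example, a video-based security alarm system that wants to find the cause of a false alarm can perturb the input video (say by masking regions of the frames) or try recent similar historical data, and observe whether these trials raise the same false alarm or how they change the alarm probability. Such systems support can also improve the statistical diagnosis and the training and testing of new models, for example in designing new models that are explainable.

Research: build AI systems with interactive diagnostic analysis that can fully replay their execution and help identify the inputs that were decisive for a result, whether by replaying the decision under various perturbed inputs or, going further, by giving the system causal inference capabilities.

4.2 Secure AI

Security is a broad topic, and the issues that determine whether AI applications spread and develop are often security issues. Mission-critical AI applications, personalized learning, and learning across organizations all require strong security. The range of security problems is wide; here we focus on two broad classes. In the first, attackers compromise the correctness of decisions: they may corrupt and take control of the AI system itself, or deliberately alter inputs so that the system unknowingly makes the decision the attacker wants. In the second, attackers obtain the confidential data an AI system was trained on, or break an encrypted model. Below we discuss three promising research directions for defending against such attacks.

Research topic 4: Secure enclaves. (An enclave, in the geographic sense, is a piece of a country's territory separated from the main body of the country and surrounded by other countries. Before the reunification of West and East Germany, West Berlin, formed by merging the American, British, and French occupation sectors of Berlin, was surrounded on all sides by East German territory under Soviet control and was the best-known example.) With the rapid rise of public clouds and the growing complexity of software stacks, AI applications face a greatly increased risk of attack. Twenty years ago most applications ran on a commercial operating system (such as Windows or SunOS) on a single server deployed behind a corporate firewall. Today companies may run AI applications on distributed servers in a public cloud. They do not control these rented servers and very likely share with competitors a complex software stack, in which the operating system itself runs on a hypervisor or inside a container. These applications also share, directly or indirectly, other systems such as log ingestion systems, storage systems, and data processing frameworks. If any of these software components is compromised, the AI application itself may be affected.

A general approach to these attacks is to provide a "secure enclave" abstraction: a secure hardware execution environment that protects the application running inside the enclave from malicious code running outside it. A recent example is Intel's Software Guard Extensions (SGX), which provides a hardware-isolated execution environment. Code inside SGX can compute on input data, and even a compromised operating system or hypervisor (running outside the enclave) cannot see that code or data. SGX also provides remote attestation, a protocol that lets a remote client verify that the enclave is running the expected code. ARM's TrustZone is another example of a hardware enclave. Meanwhile, cloud providers have begun offering physically protected special bare-metal instances, deployed in secure "vaults" to which only authorized personnel authenticated by fingerprint or iris scan have access.

In general, with any enclave technology the application developer must trust all software running inside the enclave. Indeed, even in a hardware enclave, if code running inside is compromised it can leak decrypted data or affect decisions. Since a small code base is usually easier to secure, one research direction is to split the AI system's code so that as little as possible runs inside the enclave, while the rest runs in the untrusted environment using cryptographic techniques. Another way to ensure that code inside the enclave does not leak sensitive information is to develop static and dynamic verification tools and sandboxing.

Note that, besides minimizing the trusted computing base, splitting the application code has two further benefits: more functionality and lower cost. First, some functionality may be unavailable inside the enclave, such as GPU processing for running deep learning (DL) algorithms, or services and applications that have not been vetted or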
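ported to run inside a secure enclave. Second, secure instances offered by cloud providers may be much more expensive than regular instances.

Research: build AI systems that use secure enclaves to ensure data confidentiality, user privacy, and decision correctness; split the AI system's code into a minimal code base that runs inside the enclave and code that runs outside it; and ensure that the enclave neither leaks information nor compromises the correctness of decisions.

Research topic 5: Adversarial learning. The adaptive nature of machine learning algorithms exposes learning systems to new kinds of attack, such as maliciously altering training data or decision inputs to undermine the correctness of decisions. There are two broad classes of attack: evasion attacks and data poisoning attacks.

Evasion attacks occur at inference time: the attacker tries to craft data that the learning system misclassifies. For example, the image of a stop sign can be altered slightly so that a human still sees a stop sign while a self-driving car may take it for a yield sign.

Data poisoning attacks occur at training time: the adversary injects poisoned data (for example, data with wrong labels) into the training set so that the system learns a wrong model, which gives the attacker inputs that the learner will misclassify. Learning systems that retrain periodically are especially vulnerable if the weakly labeled data used for retraining is collected from untrusted or unreliable sources. As new AI systems learn through continual interaction with dynamic environments, handling data poisoning becomes increasingly important.

There is as yet no effective defense against evasion attacks, which leaves several research challenges: explaining why adversarial examples are often so easy to find, finding methods that defend effectively against attackers, and evaluating how well defenses hold up. For data poisoning, the challenges include how to detect poisoned inputs and how to build learning systems that withstand different kinds of poisoning. In addition, when a data source is found to be fraudulent or is explicitly retracted for regulatory reasons, we can use replay (see Research topic 3: Explainable decisions) and incremental computation to efficiently remove that source's influence on the learned model. As noted earlier, this capability comes from combining modeling with data provenance and efficient computation in the data storage system.

Research: build AI systems with adversarial learning capabilities that, during training and prediction, track down fraudulent data sources by designing new machine learning models and network architectures, and that replay or redo computation after removing those sources to obtain new, correct decisions.

Research topic 6: Shared learning on confidential data. Today each company typically collects its own data, analyzes it, and uses it to build new features and products. But not every business has the vast data held by large AI companies such as Google, Facebook, Microsoft, and Amazon. Looking ahead, we expect more and more companies to collect valuable data, more third-party data service companies to appear, and more benefit to be drawn from data spanning multiple organizations (see Section 3).

Indeed, in our collaborations with industry we see this more and more. A large bank gave us a scenario in which it and other banks want to pool their data and use shared learning to improve their collective fraud detection algorithms. Although these banks compete in financial services, such "cooperation" is vital to them for reducing losses from fraud. Likewise, a very large health care provider described a similar scenario in which competing hospitals want to share data to train a shared model that predicts flu outbreaks, on condition that the shared data not be used for any other purpose. This would let them respond faster to epidemics and contain outbreaks, rapidly deploying mobile vaccination vans at critical locations. At the same time, each hospital must protect the privacy of the patients in its own data.

The key challenge of shared learning is how to learn a model from data belonging to different (possibly competing) organizations without leaking private information about that data during training. One possible solution is to pool all the data in a hardware enclave and learn the model there, but hardware enclaves are not yet widely deployed, and in some cases regulatory constraints or sheer data volume prevent the data from being copied into an enclave.

Another promising approach is secure multi-party computation (MPC). MPC allows n parties, each with a private input, to compute a joint function of their inputs without any party learning the others' inputs. However, while MPC is effective for simple computations, it carries a very large overhead for complex ones such as model training. An interesting research direction is how to split model training into (1) local computation and (2) MPC computation so as to minimize the complexity of the MPC part.

Training a model without compromising data confidentiality is an important step toward shared learning, but other problems remain. Model serving, that is, inference based on the model, can still leak private information about the data. One way to meet this challenge is differential privacy, a popular technique in statistical databases. Differential privacy adds noise to each query to protect data privacy. A central concept in differential privacy is the privacy budget, which limits the number of queries for which the privacy guarantee can be provided.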
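The noise-plus-budget idea in the last paragraph can be sketched with the Laplace mechanism, the textbook way to answer counting queries under ε-differential privacy. This is an illustrative sketch rather than anything from the paper: the class name `PrivateCounter`, its parameters, and the simple sequential-composition accounting (each query's ε is subtracted from a fixed total) are all assumptions.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise under a fixed privacy budget.

    Hypothetical helper for illustration; not an API of any real library.
    """

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self._remaining = float(total_epsilon)
        self._rng = random.Random(seed)

    @property
    def remaining_budget(self):
        """Privacy budget (epsilon) still available for future queries."""
        return self._remaining

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        rate = 1.0 / scale
        return self._rng.expovariate(rate) - self._rng.expovariate(rate)

    def count(self, predicate, epsilon):
        """Noisy count of records satisfying `predicate`, spending `epsilon`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for record in self._records if predicate(record))
        # A counting query changes by at most 1 when one record is added or
        # removed (sensitivity 1), so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 67, 38, 44]  # made-up records
    counter = PrivateCounter(ages, total_epsilon=1.0, seed=7)
    print("noisy count of age >= 40:", counter.count(lambda a: a >= 40, epsilon=0.5))
    print("budget left:", counter.remaining_budget)
```

A smaller ε per query gives stronger privacy but noisier answers, and once the total is spent the counter refuses further queries. That refusal is the sense in which the budget caps how many queries can be served with a guarantee.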
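In applying differential privacy to model serving, there are three interesting research directions. First, exploit the inherent statistical properties of models and predictions to apply differential privacy to complex models and inference. Second, despite abundant theoretical work, few differentially private systems are in practical use today; an important direction is to build tools and systems that bring differential privacy protection to real applications, including intelligently choosing which privacy mechanism to use for a given application and automatically converting non-differentially-private computations into differentially private ones. Finally, in continual learning data privacy is time dependent: the privacy of new data matters far more than that of old data. In stock markets and online bidding, for example, the privacy of fresh data is paramount, while historical data is unimportant and sometimes even public. One could develop differentially private systems with adaptive privacy budgets that protect only the most recent data; another direction is to develop differential privacy further for the case where data is later made public.

Even if we can protect data privacy during training and decision making, that is still not enough. In fact, even with privacy guaranteed, organizations and companies may refuse to share their data, because that data may improve a competitor's service quality. We therefore need to study incentive mechanisms that encourage organizations and companies to share their data or by-products of it. Specifically, we need ways to convince these organizations that by sharing data they will get better service (that is, better decisions) than by not sharing. This requires assessing the quality of the data an organization provides, which can be done with a leave-one-out approach: compare the model's performance with and without the organization's data in the training set, and then return decisions corrupted by noise inversely proportional to the quality of the data the organization provided, which motivates organizations to provide higher-quality data. More generally, such incentives need to be placed within a mechanism design framework so that organizations can work out their individual data sharing strategies.

Research: build AI systems that (1) can learn across multiple data sources without leaking private information during training or decision making, and (2) provide incentives that induce potentially competing organizations to share their data.

4.3 AI-specific architectures

The demands of AI will drive innovation in both systems and hardware architectures. These new architectures can improve performance and will also simplify the development of next-generation AI applications by providing rich libraries of modules that are easy to compose.

Research topic 7: Domain-specific hardware. The ability to process and store huge amounts of data is one of the key factors in AI's success (see Section 2), but sustaining the growth of that capacity will become increasingly challenging. As noted in Section 3, data continues to grow exponentially, while the improvements in performance, cost, and energy that have sustained the computer industry for more than 40 years are slowing:

- The end of Moore's Law means transistors are no longer shrinking much.
- The breakdown of Dennard scaling means power limits how much a chip can carry.
- We have already moved from one inefficient processor per chip to many efficient processors per chip, but Amdahl's Law sets a limit on parallel processing.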
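The limit that Amdahl's Law places on parallel speedup, mentioned in the last bullet, is easy to work out numerically. If a fraction p of a program can be parallelized across n processors, the speedup is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). The following is a small sketch; the function names are hypothetical and the 95% figure is just an example value.

```python
import math


def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of processors grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.95  # example value: 95% of the work parallelizes
    for n in (8, 64, 1024):
        print(f"{n:5d} processors: speedup {amdahl_speedup(p, n):.2f}")
    print(f"limit for p={p}: {speedup_limit(p):.1f}")
```

With 95% of the work parallelizable, 8 processors give a speedup of a little under 6, and no number of processors can push it past 20. This is why simply adding more general-purpose cores per chip stops paying off.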
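Designing domain-specific processors is one way to keep improving performance per unit of energy. Such processors do only a few specific tasks but do them extremely well. The rapid gains in processing we came to expect in the Moore's Law era must therefore now come from innovation in computer architecture rather than from improvements in semiconductor processes. Future servers will contain far more heterogeneous processors than in the past. A pioneering example of a domain-specific processor is Google's Tensor Processing Unit (TPU), which was deployed in Google's data centers in 2015 and is used by billions of people. Compared with contemporary CPUs and GPUs, it runs the inference phase of deep neural networks 15 to 30 times faster, with 30 to 80 times better performance per watt. In addition, Microsoft has announced the deployment of FPGAs in its Azure cloud. A range of companies, including Intel, IBM, and startups such as Cerebras and Graphcore, are developing specialized hardware for AI, which promises to sustain large performance gains over today's mainstream processors.

Given that DRAM is showing the same limits, new technologies are being developed to succeed it. 3D XPoint from Intel and Micron aims to offer 10 times the storage capacity at DRAM-like access performance. STT-MRAM aims to succeed flash, which may run into scaling limits similar to DRAM's. The cloud may therefore have more levels of memory and storage, spanning a wider range of technologies. With this growing diversity of processors, memory, and storage devices, matching services to hardware resources becomes an even harder problem. Compared with the classic standard rack design, a top-of-rack switch plus tens of servers each with two CPUs, 1 TB of DRAM, and 4 TB of flash, these rapid changes call for more flexible cloud computing platforms.

For example, the UC Berkeley Firebox project proposes a multi-rack supercomputer that connects thousands of processor chips with DRAM chips and non-volatile storage over optical fiber, providing low-latency, high-bandwidth, long-distance links. Hardware like this lets system software provision computing services from the right ratio and types of domain-specific processors, DRAM, and NVRAM. Resource disaggregation at this scale can greatly improve the matching of increasingly diverse workloads to correspondingly diverse resources. This is especially valuable for AI workloads, which gain significant performance from large memory and have diverse resource requirements.

Beyond performance, new hardware architectures will bring other capabilities, such as support for security. Although Intel's SGX and ARM's TrustZone are venturing into hardware-secured environments, much work remains before they can be fully used for AI applications. In particular, existing secure environments have various resource limits, including on addressable memory, and serve only general-purpose CPUs. Lifting these limits and providing a common hardware abstraction across specialized processors, including GPUs and TPUs, is a direction for future research. In addition, open instruction set processors such as RISC-V offer an exciting environment for developing new security features.

Research: (1) design domain-specific hardware architectures that improve the performance and reduce the energy consumption of AI applications by orders of magnitude, or strengthen their security; (2) design AI software systems that exploit these domain-specific architectures, resource-disaggregated designs, and future non-volatile storage technologies.

Research topic 8: Composable AI systems. Modularity and composability have played an important role in the rapid evolution of software systems, letting developers quickly build new systems from existing components. Examples include microkernel operating systems, the LAMP stack, microservice architectures, and the Internet. Existing AI systems, by contrast, are monolithic, which makes them hard to develop, test, and update.

Likewise, modularity and composability will be key to increasing the development speed and adoption of AI systems, making it easier to integrate AI into complex systems. Below we discuss several research problems in model and action composition.

Model composition is essential to developing more complex and powerful AI systems. Composing multiple models in a model serving system and applying them in different patterns allows trade-offs among decision accuracy, latency, and throughput. For example, we can query models in sequence, where each model either returns a high-accuracy decision or says "I don't know", in which case the decision is passed on to the next model. By ordering the models from highest to lowest "I don't know" rate and from lowest to highest latency, we can optimize both latency and accuracy.

Many hard problems remain before model composition can be fully exploited. For example: (1) designing a declarative language to describe the topology among components and the application's performance goals; (2) providing an accurate performance model for each component, covering resource needs, latency, and throughput; and (3) using scheduling and optimization algorithms to compute an execution plan for the components and map them onto resources at minimum cost while meeting latency and throughput requirements.

Action composition (composing actions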
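) organizes sequences of basic decisions/actions into low-level primitives, also called options. For a self-driving car, an option might be changing lanes on a highway, carried out through actions such as accelerating, braking, steering left or right, and switching on the turn signal. For a robot, a primitive might be grasping an object, carried out by actuating the robot's joints. Options have been studied extensively in hierarchical learning. They can greatly speed up learning in, or adaptation to, a new environment by letting the agent pick from a set of existing options to accomplish a given task rather than from a much longer list of low-level actions.

A rich library of options would make developing a new AI application a matter of simply composing the right options, much as web developers today build applications in a few lines of code by calling powerful web APIs. Options can also improve responsiveness, since choosing the next action from among options is much simpler than choosing one in the raw action space.

Research: design AI systems and APIs that allow models and actions to be composed in a modular and flexible way, and use these APIs to build rich libraries of models and options that greatly simplify AI application development.

Research topic 9: Cloud-edge systems. Many of today's AI applications, such as speech recognition and machine translation, are deployed in the cloud. We expect AI systems that span edge devices and the cloud to grow rapidly. On the one hand, AI systems currently deployed only in the cloud, such as user recommendation systems, will move part of their functionality to edge devices to improve security, protect privacy, and reduce latency (including coping with loss of network connectivity). On the other hand, AI systems currently deployed on edge devices, such as self-driving cars, drones, and home robots, need to share data with the cloud and use its computing resources to update models and policies.

Developing cloud-edge systems is challenging for several reasons. First, the gap in computing power between edge devices and data center servers is large, and it will widen, because edge devices, including smartphones and tablets, face strict power and size limits compared with data center servers. Second, edge devices are heterogeneous in computing resources and capabilities, ranging from low-end ARM or RISC-V CPUs in IoT devices to high-performance GPUs in self-driving cars and software platforms. This heterogeneity makes application development harder. Third, hardware and software update cycles on edge devices are far slower than in data centers. Fourth, as growth in storage capacity slows while data generation keeps accelerating, storing all this data may no longer be feasible or efficient.

There are two general approaches to bridging cloud and edge. The first is to retarget code to diverse platforms through multi-target software design and compilation techniques. To cope with the diversity of edge devices and the difficulty of upgrading the applications running on them, we need new software stacks that abstract over devices and expose hardware capabilities to applications through common APIs. Another possible direction is to develop compilers and just-in-time compilation techniques that efficiently compile complex algorithms at run time so that they can run on edge devices. This can build on recent code generation tools such as TensorFlow's XLA, Halide, and Weld.

The second general approach is to design AI systems suited to execution partitioned between the cloud and the edge. For example, model composition (see Section 4.3) can run lightweight, lower-accuracy models on edge devices and compute-intensive, higher-accuracy models in the cloud. This architecture reduces latency without sacrificing accuracy and has already been adopted in recent video recognition systems. Similarly, action composition can place the learning of hierarchical options in the cloud and the execution of those options on edge devices.

Robotics is another area that can benefit from a cloud-edge architecture. Open-source platforms for robotics applications are scarce. ROS, the most widely used such platform today, is confined to running locally and lacks the performance optimizations real-time applications need. To exploit new results in AI research such as shared learning and continual learning, we need cloud-edge systems that let developers migrate functionality seamlessly between robots and the cloud to optimize decision latency and learning convergence. The cloud can run complex algorithms on information gathered in real time from distributed robots to keep updating models, while the robots keep acting locally on previously downloaded policies.

To handle the large volumes of data collected from edge devices, learning-aware compression methods such as sampling and sketching can reduce processing overhead; these have already been applied successfully to analytics workloads. One research direction is to use sampling and sketching systematically to support a variety of learning algorithms and prediction tasks. A bigger challenge is reducing storage, which may require deleting data. The crux is that we do not know how the data will be used in the future. This is a compression problem, specifically compression tailored to machine learning algorithms. Here too, distributed methods based on sampling and sketching can help, as can machine learning methods for feature selection or model selection.

Research: design cloud-edge AI systems that (1) use edge devices to reduce latency, improve security, and implement intelligent data retention, and (2) use the cloud to share data and models across edge devices, train complex compute-intensive models, and make high-quality decisions.

5.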
Conclusion

The remarkable progress of AI over the past decade has turned it from a laboratory research project into a core technology of commercial applications that can now replace large amounts of human labor. AI systems and robots have not only replaced some human workers but can also unlock human potential and foster new forms of collaboration.

For AI to serve us better, many daunting challenges must be overcome, many of them related to systems and infrastructure. AI systems need to make decisions that are faster, safer, and more explainable; keep those decisions accurate when learning under many kinds of attack; keep increasing computing capability despite the end of Moore's Law; and be built as composable systems that integrate easily into existing applications and operate across cloud and edge.

This paper has outlined several research topics in systems, architecture, and security. We hope these questions will inspire new research that advances AI, making it more computationally capable, explainable, secure, and reliable.

References:
A History of Storage Cost. 2017. http://www.mkomo.com/cost-per-gigabyte-update. (2017).
Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, and Matthieu Devin. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. (2015).
Mike Accetta, Robert Baron, William Bolosky, David Golub, Richard Rashid, Avadis Tevanian, and Michael Young. 1986. Mach: A New Kernel Foundation for UNIX Development. 93–112.
Sameer Agarwal et al. 2013. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys.
Ittai Anati, Shay Gueron, Simon Johnson, and Vincent Scarlata. 2013. Innovative technology for CPU based attestation and sealing. In Proceedings of the 2nd international workshop on hardware and architectural support for security and privacy, Vol. 13.
Ittai Anati, Shay Gueron, Simon P Johnson, and Vincent R Scarlata. 2013. Innovative Technology for CPU Based Attestation and Sealing. (2013).
Apache Hadoop. 2017. http://hadoop.apache.org/. (2017).
Apache Mahout. 2017. http://mahout.apache.org/. (2017).
Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Daniel O'Keeffe, Mark L Stillwell, et al. 2016. SCONE: Secure linux containers with Intel SGX. In 12th USENIX Symp. Operating Systems Design and Implementation.
Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, and Sahaana Suri. 2017. MacroBase: Prioritizing Attention in Fast Data. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD ’17). ACM, New York, NY, USA, 541–556.
Luiz Andre Barroso and Urs Hoelzle. 2009. The Datacenter As a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool.
Andrew Baumann, Marcus Peinado, and Galen Hunt. 2015. Shielding applications from an untrusted cloud with haven. ACM Transactions on Computer Systems (TOCS) 33, 3 (2015), 8.
Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. 1988. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the 20th ACM symposium on Theory of Computing.
James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for scientific computing conference (SciPy), Vol. 4. Austin, TX, 3.
bigdl. BigDL: Distributed Deep Learning on Apache Spark. https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark. (????).
Tay (bot). 2017. https://en.wikipedia.org/wiki/Tay_(bot). (2017).
Léon Bottou. 1998. On-line Learning in Neural Networks. (1998), 9–42.
Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010. Springer, 177–186.
Cerebras. 2017. https://www.cerebras.net/. (2017).
Chainer. 2017. https://chainer.org/. (2017).
T.-H. Hubert Chan, Elaine Shi, and Dawn Song. 2010. Private and Continual Release of Statistics. In ICALP (2), Vol. 6199. Springer.
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274 (2015).
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs/1512.01274 (2015).
Travers Ching, Daniel S Himmelstein, Brett K Beaulieu-Jones, Alexandr A Kalinin, Brian T Do, Gregory P Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L Rosen, et al. 2017. Opportunities And Obstacles For Deep Learning In Biology And Medicine. bioRxiv (2017), 142760.
cisco. 2015. Cisco Global Cloud Index: Forecast and Methodology, 2015-2020. http://www.cisco.com/c/dam/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.pdf. (2015).
D. Clark. 1988. The Design Philosophy of the DARPA Internet Protocols. SIGCOMM Comput. Commun. Rev. 18, 4 (Aug. 1988), 106–114.
CMS updates rule allowing claims data to be sold. 2016. http://www.modernhealthcare.com/article/20160701/NEWS/160709998. (2016).
Graham Cormode, Minos Garofalakis, Peter J Haas, and Chris Jermaine. 2012. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases 4, 1–3 (2012), 1–294.
Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. Clipper: A Low-Latency Online Prediction Serving System. NSDI ’17 (2017).
Peter Dayan and Geoffrey E. Hinton. 1992. Feudal Reinforcement Learning. In Advances in Neural Information Processing Systems 5. 271–278. http://papers.nips.cc/paper/714-feudal-reinforcement-learning
Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marcaurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc Le, and Andrew Y. Ng. 2012. Large Scale Distributed Deep Networks. In NIPS ’12. http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf
Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI’04).
DeepMind AI Reduces Google Data Centre Cooling Bill by 40%. 2017. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/. (2017).
Thomas G. Dietterich. 1998.
The MAXQ Method for Hierarchical Reinforcement Learning. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998. 118–126.
John Duchi, Michael Jordan, and Martin Wainwright. to appear. Minimax optimal procedures for locally private estimation. J. Amer. Statist. Assoc. (to appear).
Cynthia Dwork. 2006. Differential Privacy. In ICALP (2), Vol. 4052. Springer.
Cynthia Dwork. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation.
Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. 2010. Differential privacy under continual observation. In Proceedings of the 42nd ACM symposium on Theory of computing.
Cynthia Dwork and Aaron Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (2014).
The Economist. 2017. The world’s most valuable resource is no longer oil, but data. (May 2017).
FireBox. 2017. https://bar.eecs.berkeley.edu/projects/2015-firebox.html. (2017).
Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1322–1333.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP ’03). 29–43.
Ken Goldberg. 2017. Op-Ed: Call it Multiplicity: Diverse Groups of People and Machines Working Together. Wall Street Journal (2017).
Oded Goldreich, Silvio Micali, and Avi Wigderson. 1987. How to play any mental game. In Proceedings of the 19th ACM symposium on Theory of computing.
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs (OSDI’12). 17–30.
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
Graphcore. 2017. https://www.graphcore.ai/. (2017).
Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24, 2 (2009), 8–12.
Halide: A Language for Image Processing and Computational Photography. 2017. http://halide-lang.org/. (2017).
Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. 1997. Online Aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data (SIGMOD ’97). ACM, New York, NY, USA, 171–182. https://doi.org/10.1145/253260.253291
Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. The MADlib Analytics Library: Or MAD Skills, the SQL. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1700–1711.
John L. Hennessy and David A. Patterson. to appear. Computer Architecture, Sixth Edition: A Quantitative Approach. (to appear).
Intel Nervana. 2017. https://www.intelnervana.com/intel-nervana-hardware/. (2017).
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (EuroSys ’07). 59–72.
Martin Jaggi, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, and Michael I. Jordan. 2015. Communication-Efficient Distributed Dual Coordinate Ascent. In NIPS, 27.
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia. ACM, 675–678.
Norman P.
Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA ’17). ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/3079856.3080246
Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. 2017. Optimizing Deep CNN-Based Queries over Video Streams at Scale. CoRR abs/1703.02529 (2017).
Asterios Katsifodimos and Sebastian Schelter. 2016. Apache Flink: Stream Analytics at Scale.
Ben Kehoe, Sachin Patil, Pieter Abbeel, and Ken Goldberg. 2015. A Survey of Research on Cloud Robotics and Automation. IEEE Trans. Automation Science and Eng. 12, 2 (2015).
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). http://arxiv.org/abs/1412.6980
Sanjay Krishnan, Roy Fox, Ion Stoica, and Ken Goldberg. 2017.
DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations. In 1st Conference on Robot Learning (CoRL).
LAMP (software bundle). 2017. https://en.wikipedia.org/wiki/LAMP_(software_bundle). (2017).
Leo Leung. 2015. How much data does x store? (March 2015). https://techexpectations.org/tag/how-much-data-does-youtube-store/
Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. 2016. End-to-end Training of Deep Visuomotor Policies. J. Mach. Learn. Res. 17, 1 (Jan. 2016), 1334–1373. http://dl.acm.org/citation.cfm?id=2946645.2946684
Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. In OSDI ’14. 583–598.
J. Liedtke. 1995. On Micro-kernel Construction. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP ’95). ACM, New York, NY, USA, 237–250. https://doi.org/10.1145/224056.224075
M. W. Mahoney and P. Drineas. 2009. CUR Matrix Decompositions for Improved Data Analysis. Proc. Natl. Acad. Sci. USA 106 (2009), 697–702.
Dahlia Malkhi, Noam Nisan, Benny Pinkas, Yaron Sella, et al. 2004. Fairplay - Secure Two-Party Computation System. In USENIX Security Symposium, Vol. 4. San Diego, CA, USA.
Michael Mccloskey and Neil J. Cohen. 1989. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. The Psychology of Learning and Motivation 24 (1989), 104–169.
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2016. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). http://arxiv.org/abs/1602.05629
Shike Mei and Xiaojin Zhu. 2015. The Security of Latent Dirichlet Allocation. In AISTATS.
Shike Mei and Xiaojin Zhu. 2015.
UsingMachine Teaching to Identify Optimal Training-Set Attacks on Machine Learners..In AAAI. 2871–2877. Xiangrui Meng, Joseph Bradley, BurakYavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai,Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, RezaZadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning inApache Spark. Journal of Machine Learning Research 17, 34 (2016), 1–7. http://jmlr.org/papers/v17/15-237.html Tom M Mitchell, William W Cohen,Estevam R Hruschka Jr, Partha Pratim Talukdar, Justin Betteridge, AndrewCarlson, Bhavana Dalvi Mishra, Matthew Gardner, Bryan Kisiel, JayantKrishnamurthy, et al. 2015. Never Ending Learning.. In AAAI. 2302–2310. Volodymyr Mnih, Koray Kavukcuoglu,David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves,Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen,Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran,Daan Wierstra, Shane Legg, and Demis Hassabis. 2015. Human-level controlthrough deep reinforcement learning. Nature 518, 7540 (26 02 2015), 529–533. http://dx.doi.org/10.1038/nature14236 Dharmendra Modha. 2016. The brainaĂŹsarchitecture, e#ciencyaĂę on a chip. (Dec. 2016).https://www.ibm.com/blogs/research/2016/12/the-brains-architecture-e#ciency-on-a-chip/ Derek G. Murray, Malte Schwarzkopf,Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. 2011.CIEL: A Universal Execution Engine for Distributed Data-%ow Computing. InProceedings of the 8th USENIX Conference on Networked Systems Design andImplementation (NSDI’11). USENIX Association, Berkeley, CA, USA, 113–126. http://dl.acm.org/citation.cfm?id=1972457.1972470 Average Historic Price of RAM. 2017.http://www.statisticbrain.com/averagehistoric-price-of-ram/. (2017). Frank Olken and Doron Rotem. 1990.Random sampling from database !les: A survey. Statistical and ScienticDatabase Management (1990), 92–111. 
Open MPI: Open Source High PerformanceComputing. 2017. https://www. open-mpi.org/. (2017). Shoumik Palkar, James J. Thomas, AnilShanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe,and Matei Zaharia. 2017. Weld: A Common Runtime for High Performance DataAnalytics. In CIDR. Ronald Parr and Stuart J. Russell.1997. Reinforcement Learning with Hierarchies of Machines. In Advances inNeural Information Processing Systems 10, . 1043–1049. http://papers.nips.cc/paper/1384-reinforcement-learning-with-hierarchies-of-machines Pattern: Microservice Architecture.2017. http://microservices.io/patterns/ microservices.html. (2017). Jan Peters and Stefan Schaal. 2008.Reinforcement learning of motor skills with policy gradients. Neural networks21, 4 (2008), 682–697. Gil Press. 2016. Forrester PredictsInvestment In Arti!cial Intelligence Will Grow 300% in 2017. Forbes (November2016). Project Catapult. 2017.https://www.microsoft.com/en-us/research/project/ project-catapult/. (2017). PyTorch. 2017. http://pytorch.org/.(2017). Rajat Raina, Anand Madhavan, andAndrew Y. Ng. 2009. Large-scale Deep Unsupervised Learning Using GraphicsProcessors. In Proceedings of the 26th Annual International Conference onMachine Learning (ICML ’09). ACM, New York, NY, USA, 873–880. https://doi.org/10.1145/1553374.1553486 Benjamin Recht, Christopher Re,Stephen Wright, and Feng Niu. 2011. Hogwild: A Lock-Free Approach toParallelizing Stochastic Gradient Descent. In NIPS 24. John Schulman, Sergey Levine, PhilippMoritz, Michael I. Jordan, and Pieter Abbeel. 2015. Trust Region PolicyOptimization. In Proceedings of the 32nd International Conference on MachineLearning (ICML). Felix Schuster, Manuel Costa, CédricFournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and MarkRussinovich. 2015. VC3: Trustworthy data analytics in the cloud using SGX. InSecurity and Privacy (SP), 2015 IEEE Symposium on. IEEE, 38–54. Reza Shokri, Marco Stronati, and VitalyShmatikov. 2016. 
Membership inference attacks against machine learning models.arXiv preprint arXiv:1610.05820 (2016). David Silver, Aja Huang, Chris JMaddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, JulianSchrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, andothers. 2016. Mastering the game of Go with deep neural networks and treesearch. Nature 529, 7587 (2016), 484–489. Daniel L Silver, Qiang Yang, andLianghao Li. 2013. Lifelong Machine Learning Systems: Beyond LearningAlgorithms.. In AAAI Spring Symposium: Lifelong Machine Learning, Vol. 13. 05. Satinder P. Singh. 1992. ReinforcementLearning with a Hierarchy of Abstract Models. In Proceedings of the 10thNational Conference on Articial Intelligence. San Jose, CA, July 12-16,1992. 202–207. http://www.aaai.org/Library/AAAI/1992/ aaai92-032.php Evan R Sparks et al. 2013. MLI: An APIfor distributed machine learning. In ICDM. Stephane Ross. 2013. InteractiveLearning for Sequential Decisions and Predictions.https://en.wikipedia.org/wiki/LAMP_(software_bundle). (2013). Zachary D Stephens, Skylar Y Lee,Faraz Faghri, Roy H Campbell, Chengxiang Zhai, Miles J Efron, Ravishankar Iyer,Michael C Schatz, Saurabh Sinha, and Gene E Robinson. 2015. Big data:Astronomical or genomical? PLoS Biology 13, 7 (2015), e1002195. Richard S. Sutton. Integratedarchitectures for learning, planning, and reacting based on approximatingdynamic programming. In Proceedings of the Seventh International Conference onMachine Learning. Morgan Kaufmann. Richard S. Sutton, Doina Precup, andSatinder P. Singh. 1999. Between MDPs and Semi-MDPs: A Framework for TemporalAbstraction in Reinforcement Learning. Artif. Intell. 112, 1-2 (1999), 181–211.https://doi.org/10.1016/S0004-3702(99) 00052-1 Christian Szegedy, Wojciech Zaremba,Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus.2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199(2013). Kai-Fu Tang, Hao-Cheng Kao, Chun-NanChou, and Edward Y. 
Chang. 2016. Inquire and Diagnose: Neural Symptom CheckingEnsemble using Deep Reinforcement Learning.http://infolab.stanford.edu/~echang/NIPS_DeepRL_2016_ Symptom_Checker.pdf.(2016). Russ Tedrake, Teresa Weirui Zhang,and H Sebastian Seung. 2005. Learning to walk in 20 minutes. In Proceedings ofthe Fourteenth Yale Workshop on Adaptive and Learning Systems, Vol. 95585. YaleUniversity New Haven (CT), 1939–1412. TensorFlow Serving. 2017.https://tensor%ow.github.io/serving/. (2017). TensorFlow XLA. 2017.https://www.tensor%ow.org/performance/xla/. (2017). Gerald Tesauro. 1995. Temporaldierence learning and TD-Gammon. Commun. ACM 38, 3 (1995), 58–68. Sebastian Thrun. 1998. Lifelonglearning algorithms. Learning to learn 8 (1998), 181–209. Sebastian Thrun and Anton Schwartz.1994. Finding Structure in Reinforcement Learning. In Advances in NeuralInformation Processing Systems 7, . 385–392. http://papers.nips.cc/paper/887-!nding-structure-in-reinforcement-learning Joshua Tobin, Rachel Fong, Alex Ray,Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. DomainRandomization for Transferring Deep Neural Networks from Simulation to the RealWorld. CoRR abs/1703.06907 (2017). http://arxiv.org/abs/1703.06907 Eric Tzeng, Judy Homan, NingZhang, Kate Saenko, and Trevor Darrell. 2014. Deep Domain Confusion: Maximizingfor Domain Invariance. CoRR abs/1412.3474 (2014). http://arxiv.org/abs/1412.3474 Huang Xiao, Battista Biggio, GavinBrown, Giorgio Fumera, Claudia Eckert, and Fabio Roli. 2015. Is featureselection secure against training data poisoning?. In ICML. 1689–1698. Matei Zaharia et al. 2012. Resilientdistributed datasets: A fault-tolerant abstraction for in-memory clustercomputing. In NSDI ’12. Haoyu Zhang, Ganesh Ananthanarayanan,Peter Bodík, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017.Live Video Analytics at Scale with Approximation and Delay-Tolerance. In NSDI.
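The wave of artificial intelligence is sweeping the world, and a few terms are constantly in our ears: artificial intelligence (AI), machine learning, and deep learning. Many people have only a hazy grasp of what these terms mean and how they relate to one another. This article explains them in the simplest language it can and sorts out their relationships, in the hope that it helps newcomers to the field.

Figure 1. Applications of artificial intelligence

Artificial intelligence: from the birth of the concept to its boom

In 1956 a group of computer scientists met at the Dartmouth conference and proposed the concept of "artificial intelligence", dreaming of using the then brand-new computer to build complex machines with the same essential characteristics as human intelligence. From then on AI stayed on people's minds and slowly incubated in research laboratories. Over the following decades its reputation swung between two poles: it was either hailed as the prophecy of a dazzling future for human civilization or thrown on the scrap heap as the fantasy of technology cranks. Until 2012 both voices could still be heard.

After 2012, thanks to growing data volumes, greater computing power, and the arrival of new machine learning algorithms (deep learning), AI took off. According to LinkedIn's recently published Global AI Talent Report, as of the first quarter of 2017 there were more than 1.9 million AI technical professionals worldwide on the LinkedIn platform, while the shortage of AI talent in China alone exceeded 5 million.

The scope of AI research keeps widening. Figure 2 shows its branches, including expert systems, machine learning, evolutionary computation, fuzzy logic, computer vision, natural language processing, and recommender systems.

Figure 2. Branches of AI research

AI is commonly divided into weak AI and strong AI. Weak AI gives machines the ability to observe and perceive, with some degree of understanding and reasoning; strong AI would give machines the ability to adapt and to solve problems they have never met before. Current research is concentrated on weak AI, where major breakthroughs look likely in the near term. The AI of the movies is mostly strong AI, which remains hard to realize in today's world.

If weak AI is poised for a breakthrough, how is it achieved, and where does the "intelligence" come from? The credit goes mainly to one method of achieving AI: machine learning.

Machine learning: a method for achieving artificial intelligence

At its most basic, machine learning uses algorithms to parse data, learn from it, and then make decisions and predictions about events in the real world. Unlike traditional software that is hard-coded to solve a specific task, a machine learning system is "trained" on large amounts of data, and its algorithms learn from the data how to complete the task.

A simple example: when we browse an online store, product recommendations often appear. The store uses your past purchases and your long wish list to identify the products you are really interested in and willing to buy. A decision model of this kind helps the store make suggestions to customers and encourages purchases.

Machine learning comes directly from the early days of AI. Traditional algorithms include decision trees, clustering, Bayesian classification, support vector machines, EM, AdaBoost, and so on. By learning method, machine learning algorithms can be grouped into supervised learning (for example, classification), unsupervised learning (for example, clustering), semi-supervised learning, ensemble learning, deep learning, and reinforcement learning.

In areas such as fingerprint recognition, Haar-based face detection, and HoG-feature-based object detection, traditional machine learning algorithms have largely met commercial requirements, or at least reached a commercial level in specific scenarios. But every step forward was extremely hard, until deep learning appeared.

Deep learning: a technique for implementing machine learning

Deep learning was not originally an independent learning method; it also uses supervised and unsupervised learning to train deep neural networks. But because the field has advanced so quickly in recent years and techniques specific to it have been proposed one after another (residual networks, for example), more and more people treat it as a learning method in its own right.

Deep learning began as a learning process that uses deep neural networks to solve the problem of feature representation. The deep neural network is not a brand-new concept; it can be understood roughly as a neural network with multiple hidden layers. To improve the training of deep networks, researchers adjusted how neurons are connected, the activation functions, and other aspects. Many of these ideas had in fact been tried years earlier, but with too little training data and too little computing power the results were disappointing.

Deep learning has swept through one task after another, making every kind of machine assistance seem possible. Driverless cars, preventive health care, even better movie recommendations are all within sight or about to arrive.

Differences and connections among the three

Machine learning is a method for achieving artificial intelligence, and deep learning is a technique for implementing machine learning. The simplest way to visualize the relationship is a set of concentric circles.

Figure 3. Relationship among the three

A mistaken but fairly widespread belief in industry today is that "deep learning may eventually replace all other machine learning algorithms". The belief arises mainly because deep learning currently far outperforms traditional machine learning in computer vision and natural language processing, and because the media have greatly exaggerated it in their coverage.

Deep learning is the hottest machine learning method at the moment, but that does not make it the end point of machine learning. At the very least, the following problems remain:

1. Deep learning models need large amounts of training data to show their remarkable results. Real applications often present small-sample problems, where deep learning has nothing to work with and traditional machine learning methods can still cope.
2. In some fields a simple traditional machine learning method solves the problem well, and there is no need to insist on a complicated deep learning method.
3. Deep learning was inspired by the human brain but is by no means a simulation of it. Show a three- or four-year-old child one bicycle, and the child will almost always recognize another bicycle even if it looks completely different. Human learning often needs no large-scale training data, so today's deep learning methods are clearly not a simulation of the brain.

When deep learning pioneer Yoshua Bengio answered a similar question on Quora, he put it especially well, and his words are quoted here as an answer to the question above:

Science is NOT a battle, it is a collaboration. We all build on each other's ideas. Science is an act of love, not war. Love for the beauty in the world that surrounds us and love to share and build something together. That makes science a highly satisfying activity, emotionally speaking!

In other words, science is cooperation rather than war. No discipline advances by stubbornly following a single road; peers learn from and borrow from one another, draw on each other's strengths, and move forward on the shoulders of giants. Machine learning research is no different: fights to the death belong to cults, and openness and tolerance are the right path.

Bengio's words resonate strongly when set against the development of machine learning since 2000. Looking back over this century, the research hot spots can be summarized roughly as manifold learning in 2000-2006, sparse learning in 2006-2011, and deep learning from 2012 to the present. Which machine learning algorithm will be the next hot spot? Andrew Ng, one of the best-known figures in deep learning, has said that "after deep learning, transfer learning will lead the next wave of machine learning technology". But who can really say what the next hot spot will turn out to be?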
A D-Wave system shows that quantum computers can learn to detect particle signatures in mountains of data, but it does not yet outpace conventional methods.

Physicists have been working hard to develop machines that can use quantum mechanical tricks to speed up computation. But they also hope that such quantum computers can return the favour and help them to discover new laws of nature.

Now, a team has shown that a quantum circuit can learn to sift through reams of data from atom-smashing experiments in search of a new particle. Their proof-of-principle study, performed using a machine built by quantum-computing company D-Wave and working on the now-familiar case of the Higgs boson, does not yet provide a clear advantage over conventional techniques. But the authors say that quantum machine learning could make a difference in future experiments, when the amounts of data will grow even larger. Their research was published on 18 October in Nature.

Source: http://www.nature.com/news/quantum-machine-goes-in-search-of-the-higgs-boson-1.22860?WT.mc_id=Weibo_NatureNews_20171020_CONT

Paper: Solving a Higgs optimization problem with quantum annealing for machine learning. http://www.nature.com/nature/journal/v550/n7676/full/nature24047.html
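Among the papers IJAC published online first last week, a review of deep learning by Tomaso Poggio, MIT professor and Fellow of the American Academy of Arts and Sciences, stands out. In it Professor Poggio addresses a basic question about deep learning in neural networks: why are deep networks better than shallow ones?

The article builds on the talk "The Science and the Engineering of Intelligence" that Professor Poggio gave at the China Conference on Artificial Intelligence (CCAI 2016) in August 2016.

Figures 1 and 2. Slides from the CCAI 2016 talk

"The construction of deep learning architectures and machine learning models comes from advances in neuroscience; in other words, the same architecture exists in the cerebral cortex. Thousands of researchers are already working on deep learning in different fields, such as driverless cars and speech recognition. Yet we still do not know why deep learning works in these engineering applications. What is its mechanism? Another reason we are interested in this question is that exploring the mechanism of deep learning will also help us understand why the cerebral cortex has different layers."

In this article Professor Poggio explains the key theory, the latest results, and the open research questions of deep learning.

The article is also part of the forthcoming IJAC Special Issue on Human Inspired Computing. Other papers from the special issue will be published online first in the coming weeks.

A short aside: at last year's CCAI conference the editor had the good fortune to meet Professor Poggio, whose learning, modesty, and approachability left a deep impression. He said he hoped to help young people better understand neuroscience and machine learning. To go far in intelligence we cannot rely on computers alone; the work must be combined with the study of humans themselves to spark more ideas.

Details of the article follow, together with links to other papers IJAC has recently published online. You are welcome to download and read them.

【 Title 】 Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review
【 Author 】 Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
【 Abstract 】 The paper reviews and extends an emerging body of theoretical results on deep learning including the conditions under which it can be exponentially better than shallow learning. A class of deep convolutional networks represent an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. Implications of a few key theorems are discussed, together with new results, open problems and conjectures.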
【 Keywords 】 Machine learning, neural networks, deep and shallow networks, convolutional neural networks, function approximation, deep learning
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-017-1054-2
【 Publish date 】 Published online March 14, 2017

The other recently published online papers include:

【 Title 】 Improvement of wired drill pipe data quality via data validation and reconciliation
【 Author 】 Dan Sui, Olha Sukhoboka, Bernt Sigve Aadnøy
【 Keywords 】 Data quality, wired drill pipe (WDP), data validation and reconciliation (DVR), drilling models
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-017-1068-9
【 Publish date 】 Published online March 4, 2017

【 Title 】 Reaction torque control of redundant free-floating space robot
【 Author 】 Ming-He Jin, Cheng Zhou, Ye-Chao Liu, Zi-Qi Liu, Hong Liu
【 Keywords 】 Redundant space robot, reaction torque, reactionless control, base disturbance minimization, Linux/real time application interface (RTAI)
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-017-1067-x
【 Publish date 】 Published online March 4, 2017

【 Title 】 A piecewise switched linear approach for traffic flow modeling
【 Author 】 Abdelhafid Zeroual, Nadhir Messai, Sihem Kechida, Fatiha Hamdi
【 Keywords 】 Switched systems, modeling, macroscopic, traffic flow, data calibration
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-017-1060-4
【 Publish date 】 Published online March 4, 2017

【 Title 】 Navigation of non-holonomic mobile robot using neuro-fuzzy logic with integrated safe boundary algorithm
【 Author 】 A. Mallikarjuna Rao, K. Ramji, B. S. K. Sundara Siva Rao, V. Vasu, C. Puneeth
【 Keywords 】 Robotics, autonomous mobile robot (AMR), navigation, fuzzy logic, neural networks, adaptive neuro-fuzzy inference system (ANFIS), safe boundary algorithm
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-016-1042-y
【 Publish date 】 Published online March 4, 2017

【 Title 】 Method for visual localization of oil and gas wellhead based on distance function of projected features
【 Author 】 Ying Xie, Xiang-Dong Yang, Zhi Liu, Shu-Nan Ren, Ken Chen
【 Keywords 】 Robot vision, visual localization, 3D object localization, model based pose estimation, distance function of projected features, nonlinear least squares, random sample consensus (RANSAC)
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-017-1063-1
【 Publish date 】 Published online March 4, 2017

【 Title 】 Virtual plate based controlling strategy of toy play for robots communication development in JA space
【 Author 】 Wei Wang, Xiao-Dan Huang
【 Keywords 】 Human robot cooperation, joint attention (JA) space, reachable space, toy play ability, a virtual plate
【 Full Text 】 https://link.springer.com/article/10.1007/s11633-016-1022-2
【 Publish date 】 Published online February 21, 2017

Read more IJAC online-first papers: https://link.springer.com/journal/11633
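Linear algebra grew out of solving systems of linear equations, which amounts to running a mathematical formula in reverse. Before the computer age, when people were limited to hand calculation, linear algebra served mostly as a foundation for theoretical work. Today it is an indispensable tool in every branch of science and engineering that involves statistics or numerical computation, and more recently it has become a popular foundation course for machine learning.

10.1 Ill-conditioned systems

However elegant a mathematical theory is, applying it in practice always comes down to numerical computation. Linear systems are the simplest and most practical mathematical models. In theory, solving a system of linear equations gives a perfectly definite result, but surprises are possible. Consider an example.

Solving the system 0.410x + 0.492y = 0.902, 0.492x + 0.590y = 1.082 gives x = 1, y = 1.

If the right-hand side 0.902 of the first equation changes slightly to 0.901, the solution becomes x = 4.5976, y = -2.000.

Likewise, if the right-hand side 1.082 of the second equation changes to 1.081, the solution becomes x = -2.000, y = 3.500.

Real data always carry some error, and here a tiny change in the parameters makes the result unrecognizable, which is quite unexpected. Results this unstable are worthless in applications. The trouble is that the discrepancy is not caused by computational error: substituting each solution back into its equations verifies it exactly, so nothing can be improved on the computational side. A system that is this sensitive to small changes in its parameters is called ill-conditioned, and the problem lies in the mathematical model itself. The cause is easy to see geometrically. Each equation defines a straight line, and the solution is the intersection of the two lines. The two lines here are nearly parallel, so a slight shift in position (the red and green lines in the figure) moves the intersection (the red and green points) a long way.

In numerical analysis, the condition number describes how a small perturbation of a model affects the computed result. Roughly speaking, for a condition number $\kappa$, $\log_{10}\kappa$ is the number of digits you may lose in the computation, since $\log_{10}(\|\Delta x\|/\|x\|) \le \log_{10}\kappa + \log_{10}(\|\Delta b\|/\|b\|)$. The number $\kappa$ itself is hard to pin down exactly; it serves as an estimate of an upper bound on the relative error. For the linear system $Ax = b$ it means $\|\Delta x\|/\|x\| \le \kappa\,\|\Delta b\|/\|b\|$, and a short derivation gives $\kappa(A) = \|A\|\,\|A^{-1}\|$. For the usual L2 norm, $\kappa(A) = \sigma_{\max}/\sigma_{\min}$, the ratio of the largest to the smallest singular value of $A$. For a normal matrix (for example, a real symmetric one) this equals the ratio of the largest to the smallest eigenvalue in absolute value. An orthogonal (or unitary) matrix has condition number 1, the smallest possible value, so orthogonal matrices have the best numerical stability. The lesson for building linear models is to choose data that are as close to orthogonal as possible when setting up the equations.

In MATLAB or Octave the command cond(A) computes the condition number of A. In the example above, $\kappa(A) = 6099.6$.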
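The numbers in the example of Section 10.1 can be checked with a few lines of code. The following is a minimal sketch in plain Python rather than MATLAB or Octave; the helper names `solve_2x2` and `cond_2x2` are made up for this illustration, and the condition number is computed from the closed-form singular values of a 2x2 matrix rather than by a library call.

```python
import math

def solve_2x2(a, b):
    """Solve the 2x2 system a @ [x, y] = b by Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    x = (b[0] * a22 - a12 * b[1]) / det
    y = (a11 * b[1] - a21 * b[0]) / det
    return x, y

def cond_2x2(a):
    """2-norm condition number sigma_max / sigma_min of a 2x2 matrix."""
    (a11, a12), (a21, a22) = a
    s = a11**2 + a12**2 + a21**2 + a22**2   # sigma_max^2 + sigma_min^2
    d = abs(a11 * a22 - a12 * a21)          # sigma_max * sigma_min
    sigma_max_sq = (s + math.sqrt(max(0.0, s * s - 4 * d * d))) / 2
    return sigma_max_sq / d                 # equals sigma_max / sigma_min

A = [[0.410, 0.492], [0.492, 0.590]]
base = solve_2x2(A, [0.902, 1.082])    # original right-hand side
pert1 = solve_2x2(A, [0.901, 1.082])   # first entry perturbed by 0.001
pert2 = solve_2x2(A, [0.902, 1.081])   # second entry perturbed by 0.001
kappa = cond_2x2(A)

print("base :", base)
print("pert1:", pert1)
print("pert2:", pert2)
print("cond :", kappa)
```

A relative change of about 0.1% in the right-hand side moves the solution by several hundred percent, which is consistent with a condition number in the thousands.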
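10.2 The effect of pivots on computation

Gaussian elimination is the basic tool behind almost every computation in linear algebra. It is used not only to reduce matrices and compute determinants. To solve a linear system, append the right-hand column vector to the matrix to form the augmented matrix $(A, b)$, then apply row operations until the left part becomes triangular. Comparing with the last column then shows whether the system has no solution. If it has one, it is found by back-substitution through the triangular part; if it has many, the same process yields linearly independent vectors of the null space.

To invert a matrix, append the identity to form the augmented matrix $(A, I)$; the right part records all the row operations. As when solving equations, first work from top to bottom, using each pivot (the entry on the main diagonal) to eliminate the nonzero entries below it, so that the left part becomes upper triangular. Then work from bottom to top to eliminate the nonzero entries above the diagonal. If $A$ has full rank, the augmented matrix ends up as $(I, A^{-1})$, and the right part is the inverse. During the top-to-bottom triangularization, as long as the pivots are nonzero no row exchanges are needed. At that stage the left block is an upper triangular matrix $U$ and the right block is a lower triangular matrix equal to $L^{-1}$, so the augmented matrix $(U, L^{-1})$ gives the LU decomposition $A = LU$.

In theory, as long as the pivots are nonzero, elimination works for solving equations, inverting matrices, and computing LU decompositions. In practice there can still be trouble. Consider an LU decomposition obtained by using the first row to eliminate the first-column entry of the second row. (The matrix display is not preserved in this copy.)

The condition number $\kappa(A) = 2.6184$ is still well-conditioned, whereas $\kappa(L) = 10^8$ and $\kappa(U) = 0.9999 \times 10^8$ are both severely ill-conditioned, and computing with this decomposition introduces large errors. The problem is that the pivot used in the computation is too small. The remedy is, before using a pivot for elimination, to search the pivot position and the entries below it for the entry of largest magnitude, and to exchange rows so that this entry becomes the pivot. For this matrix $A$, the two rows are exchanged first and elimination follows. (The matrix display is not preserved in this copy.)

Now $\kappa(L) = 1.0001$ and $\kappa(U) = 2.6182$, both well-conditioned. So in matrix computations rows or columns often have to be exchanged to reduce error. This step may be hidden inside the algorithm, or it may need to appear explicitly in the function call. The exchanges are usually represented by a permutation matrix $P$. In MATLAB or Octave the command [L,U,P] = lu(A) returns the decomposition $PA = LU$. With the two-output form [G,U] = lu(A), any row exchanges are absorbed into $G$, so that $G = P^T L$ and $A = GU$; in that case $G$ no longer keeps the lower triangular form of $L$.

10.3 Machine learning

Humans moved toward rationality by relying on logical proof at the conscious level. Predictions whose causal mechanism is unknown and conclusions whose reasoning cannot be traced have been regarded as superstition and rejected by science. Arithmetic was once the only reliable way to solve practical problems. Applied problems were sorted into types, such as "chickens and rabbits in one cage" or "lamps on a pagoda", and every step and every quantity had a meaning that could be grasped intuitively. Algebra left the path of intuitive reasoning for abstraction: for a system of linear equations in three unknowns it is already hard to follow the meaning of the solution method along a single line of reasoning. We can only use strict logic to show that each algebraic operation produces a description equivalent to, or implied by, the original problem, so that the answer to the simplified problem answers the original one. In rational proof we treat the algebraic method of solving equations as a trustworthy way station: once the relationships in a real problem are written as equations, we stop tracing and judging each step of the solution and accept its result directly. Physics and the other sciences follow the same idea. Once a real problem has been described by a mathematical model, they rely directly on mathematical analysis and computation.

Machine learning takes the lesson of algebra one step further. It no longer relies on human effort to describe a prediction or decision problem as a mathematical model. Instead it takes a general-purpose model with many adjustable parameters and uses the available data to tune those parameters, finding on its own a specific model suited to that kind of data, which is then used to answer prediction and decision questions.

As with the unease when algebra replaced arithmetic, the mechanisms by which machine learning tunes its parameters and applies its model are mathematically exact and effective. But with a huge number of variable parameters, a particular act of recognition or judgment by this simple model cannot be analyzed into a clean causal mechanism like a law of physics, and cannot be understood by tracing a simple chain of logical deduction. The intelligence of machine learning is gradually slipping beyond our rational supervision, yet it is becoming the sharpest tool of future applied computation. A brief introduction follows.

Machine learning performs the same basic computation as human empirical formulas and classification: it uses linear methods to compute parameters, finding the mathematical model that matches the experimental data with the smallest error, and uses that model to predict the unknown. For empirical formulas the method is linear regression, which finds the linear function that is the hyperplane with the smallest error against the data. For classification in pattern recognition the method is logistic regression, which finds the decision function that is the hyperplane separating two classes of samples with the smallest error against the data. Both reduce in the end to a linear-algebra computation that determines the hyperplane. (Note: a hyperplane here means an (n-1)-dimensional plane in n-dimensional geometric space, not necessarily an (n-1)-dimensional linear subspace through the origin.)

In an n-dimensional linear space, the vectors $z$ satisfying the inner product relation $\langle z, a \rangle = b$ form an (n-1)-dimensional plane in n-dimensional geometric space. Extending the vectors to (n+1)-dimensional space by setting $x = (1, z^T)^T$ and $w = (-b, a^T)^T$, the relation can be written as $\langle x, w \rangle = 0$.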
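The effect of a small pivot described in Section 10.2 can be reproduced numerically. The sketch below is an illustration only. The matrix display is missing from this copy, so the matrix `A` used here is an assumption: it was chosen because it reproduces the quoted condition numbers, and it may not be the author's exact example. The function names are hypothetical, and the code is plain Python rather than the MATLAB `lu` command mentioned in the text.

```python
import math

def cond_2x2(m):
    """2-norm condition number sigma_max / sigma_min of a 2x2 matrix."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d        # sigma_max^2 + sigma_min^2
    det = abs(a * d - b * c)                 # sigma_max * sigma_min
    sigma_max_sq = (s + math.sqrt(max(0.0, s * s - 4 * det * det))) / 2
    return sigma_max_sq / det

def lu_no_pivot(m):
    """A = L U for a 2x2 matrix, using the (1,1) entry as pivot as-is."""
    (a11, a12), (a21, a22) = m
    mult = a21 / a11                         # multiplier stored in L
    L = [[1.0, 0.0], [mult, 1.0]]
    U = [[a11, a12], [0.0, a22 - mult * a12]]
    return L, U

def lu_partial_pivot(m):
    """P A = L U: first swap rows so the larger first-column entry is the pivot."""
    rows = [list(m[0]), list(m[1])]
    swapped = abs(rows[1][0]) > abs(rows[0][0])
    if swapped:
        rows.reverse()
    L, U = lu_no_pivot(rows)
    P = [[0, 1], [1, 0]] if swapped else [[1, 0], [0, 1]]
    return P, L, U

# Assumed example matrix with a tiny pivot in the top-left corner.
A = [[1e-4, 1.0], [1.0, 1.0]]

L0, U0 = lu_no_pivot(A)
P1, L1, U1 = lu_partial_pivot(A)

print("cond(A)           :", cond_2x2(A))
print("no pivoting   L, U:", cond_2x2(L0), cond_2x2(U0))
print("with pivoting L, U:", cond_2x2(L1), cond_2x2(U1))
```

Without the row exchange both factors have condition numbers near 10^8 even though A itself is well-conditioned; after the exchange both factors are about as well-conditioned as A.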
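Linear regression: the linear regression model assumes that there are enough attributes describing the object, represented as the variables of a function, and that the numerical formula summarizing experience is a linear function of these attribute variables. We use as much experimental data as possible to estimate the model coefficients with the smallest error. The computation is as follows.

Suppose the n-dimensional attribute vectors $x_i$ and the scalar results $y_i$ are related linearly by the empirical formula $\langle x_i, w \rangle = y_i$, where $w$ is the parameter vector to be determined. We have $m$ sets of experimental data with $m \gg n$, and we want $\langle x_i, w \rangle$ to be very close to $y_i$ for all of them. As noted in the previous article, this is the problem of solving a linear system with a full-column-rank matrix, and it can be solved by least squares.

Write the $m$ row vectors $x_i^T$ as an $m \times n$ matrix $X$ and the $m$ values $y_i$ as a vector $y$. Define the error function $J(w) = \frac{1}{2}\|Xw - y\|^2 = \frac{1}{2}(Xw - y)^T(Xw - y)$. The task is to find the vector $w_0$ that minimizes $J(w)$. $J(w)$ is a quadratic surface over n-dimensional space with a unique minimum where the gradient vanishes, that is, where $X^T(Xw - y) = 0$. This can be written as the normal equations $X^T X w = X^T y$, which have the unique solution $w_0 = (X^T X)^{-1} X^T y$. With big data this closed-form solution is too expensive to compute, and an iterative approach is used instead: start from any initial $w$ and step along the direction in which the gradient $X^T(Xw - y)$ descends, converging to the minimizer $w_0$.

Logistic regression: pattern recognition performs logical classification. It assumes that in a multidimensional space whose coordinates are a sufficient set of attributes, a hyperplane divides the space into two halves that correspond to different logical values. Logistic regression determines the hyperplane that classifies the experimental samples with the smallest error.

In an n-dimensional linear space, the vectors $z$ satisfying $\langle z, a \rangle = b$ all have projection of length $b/\|a\|$ in the direction of $a$. The endpoints of these vectors form an (n-1)-dimensional plane whose distance from the origin is $b/\|a\|$. The hyperplane divides the points of the space into two parts according to the projection onto $a$ of the vector $z$ pointing to each point: whether $\langle z, a \rangle$ is greater than $b$ decides which class the point belongs to. This is the most basic intuitive picture of classification in pattern recognition and machine learning.

In the extended-vector notation above, the hyperplane is determined by the value of $w$, and "learning" means computing this $w$ from the sample data $x_i$ and the corresponding class values $y_i$, each 0 or 1. The prediction of the model is given by the decision function $g(\langle x, w \rangle)$. In theory $g(\langle x, w \rangle) = (\mathrm{sign}(\langle x, w \rangle) + 1)/2$, but to make the gradient easy to compute the sigmoid function $g(t) = 1/(1 + e^{-t})$ is usually used instead. As before, we minimize the error function $J(w) = \frac{1}{2}\|g(Xw) - y\|^2 = \frac{1}{2}(g(Xw) - y)^T(g(Xw) - y)$ to obtain the minimizer $w_0$, approaching it iteratively along the direction of gradient descent.

10.4 Is the world linear?

The previous section said that linear regression and logistic regression, the core algorithms of machine learning, both apply linear models, which are also the basis of empirical formulas and pattern recognition. Is our whole world linear?

As noted at the start of this series, it is both fortunate and puzzling that wherever science uses mathematics and obtains elegant qualitative and quantitative results, the problem can essentially be turned into some kind of linear system or approximated by one. Why?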
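To make the linear regression computation of Section 10.3 concrete, here is a minimal sketch in plain Python that fits $y \approx w_0 + w_1 z$ two ways: by solving the normal equations $X^T X w = X^T y$ directly, and by descending the gradient $X^T(Xw - y)$. The data values, the step size `lr`, the iteration count, and the function names are all assumptions chosen for illustration.

```python
def normal_equations(zs, ys):
    """Solve X^T X w = X^T y for rows x_i = (1, z_i), i.e. fit y = w0 + w1*z."""
    m = len(zs)
    sz = sum(zs)
    szz = sum(z * z for z in zs)
    sy = sum(ys)
    szy = sum(z * y for z, y in zip(zs, ys))
    det = m * szz - sz * sz                 # determinant of the 2x2 matrix X^T X
    w0 = (sy * szz - sz * szy) / det
    w1 = (m * szy - sz * sy) / det
    return w0, w1

def gradient_descent(zs, ys, lr=0.02, steps=5000):
    """Minimize J(w) = 1/2 ||Xw - y||^2 by stepping against the gradient X^T (Xw - y)."""
    w0 = w1 = 0.0
    for _ in range(steps):
        residual = [w0 + w1 * z - y for z, y in zip(zs, ys)]   # Xw - y
        g0 = sum(residual)                                     # gradient, component for w0
        g1 = sum(r * z for r, z in zip(residual, zs))          # gradient, component for w1
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1

# Illustrative data lying roughly on y = 2 + 3z.
zs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.9, 8.2, 10.8, 14.0]

w_exact = normal_equations(zs, ys)
w_iter = gradient_descent(zs, ys)
print("normal equations:", w_exact)
print("gradient descent:", w_iter)
```

The step size has to stay below 2 divided by the largest eigenvalue of $X^T X$ for the iteration to converge, which is why a small value is used here. The same loop with $g(Xw)$ in place of $Xw$, plus the derivative of the sigmoid, would give the logistic regression iteration described above.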
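Is the world linear? Actually it is not. A nonlinear system is not hard to compute in the forward direction, but it is hard to solve in reverse, and it cannot be analyzed in parts and then recombined. The systems for which our mathematical tools give good results are basically linear. Mechanics is a linear dynamic system, and the great majority of circuits are linear systems. The equations of mathematical physics that describe continuous media, the waves of energy and fields, conduction, and steady states are linear systems. A nonlinear system can be handled with linear methods in two ways. First, within a neighborhood small enough for the error to be acceptable it can be treated as linear: differential geometry computes by projecting the neighborhood of a point on a surface onto the tangent linear space, and when a nonlinear dynamic system is controlled to stay stable or to track a known trajectory it can be approximated by a linear system. Second, in applications the system is made as linear as possible, with the nonlinear part confined to one small place or treated as an input, to ease analysis and synthesis. Many nonlinear systems that cannot be handled this way, such as chaotic systems and connectionist models, are hard to study in depth either quantitatively or qualitatively, even when they are described by exact and simple equations. Science and technology, like everything that develops through competition, are path-dependent. When a breakthrough is made in one direction, people swarm toward it and use the first-created concepts as the cornerstones of later construction, building the world as we understand it. By now science offers theories explaining everything, and we seem to understand the world fully. In fact we see only the towns alongside the highway of science, and wherever the highway does not reach lies unknown wilderness. All the same, linear algebra is already a basic building block of modern scientific knowledge, and we must be able to picture it in our minds.

How can the linear models of machine learning be used in a world that is not linear? How can linear regression fit data that lie on a curve? How can one hyperplane separate classes bounded by a curved surface? The answer is to add dimensions. Any set of sample points in space can be regarded as the projection of a point set in a higher-dimensional space, where the points may lie on a hyperplane or be separated by one.

For example, the experimental data (1, 4), (2, 9), (3, 20), (4, 37), (5, 60) are points on the curve $y = 3x^2 - 4x + 5$ in two-dimensional space. If the experiment also measures an additional attribute $x_2 = x_1^2$, the sample points become (1, 1, 4), (2, 4, 9), (3, 9, 20), (4, 16, 37), (5, 25, 60), which lie on the plane $y = 3x_2 - 4x_1 + 5$ in three-dimensional space. By adding a few nonlinear functions as new attributes, any surface can be approximated by a linear function of the extended attribute variables. The same holds for hyperplane classification in pattern recognition. This treatment has long been used widely in research and engineering, and it is analogous to confining the nonlinear parts to a few components so that the system as a whole can still be analyzed and synthesized as a linear system.

10.5 Closing remarks

Modern axiomatic mathematics asks for as much abstraction as possible: start from basic definitions, use no concrete imagery, and reach conclusions by purely formal reasoning. That is what rigorous verification requires, but it is not the way to understand, remember, or apply. The human brain works by association and cannot unconsciously follow logic very far. Our understanding of the world is an imagined picture built in the mind out of symbols. Learning a course means receiving someone else's picture serially through language and rebuilding a picture of one's own. Rigorous mathematical training follows formal logical reasoning, which can only guarantee that the transmission contains no formal error. It cannot guarantee that the same picture is built. Without a picture it is hard to remember and to apply. So learning mathematics is like learning swordsmanship: you must first have the sword in your heart. From the definition of a concept onward you need concrete images that fit its meaning, and then, as the theorems and corollaries unfold, you discard the unsuitable images and build a picture consistent with the logic. Understanding, in learning, is the process of matching the strings of signifying symbols in language with the imagined things they signify. Formal reasoning can never reach the signified content, so a machine of formal reasoning has no understanding. What produces understanding and inspiration is what is seen in the imagined world of the mind, and building that world requires a rich stock of concrete examples. Hence learning mathematics by memorizing only definitions, proofs, and theorems does not take you far, nor does it give the intuition to see through phenomena. Rigorous proof is of course important, but it is only the necessary means of transmitting information one-dimensionally and of confirming logically that the picture is true. What grasps the content of mathematics is the image that can appear in the imagination. Practice and application are the only way to learn a subject well.

(End of series)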
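As a closing illustration of the dimension-adding argument in Section 10.4, the sketch below fits the five sample points from that section twice: once with the attributes (1, x) only, and once after adding the extra attribute x^2. It solves the normal equations with a small Gaussian elimination routine that uses partial pivoting, as recommended in Section 10.2. The function names are hypothetical and the code is plain Python written for this illustration.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]     # augmented matrix (A, b)
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))   # largest pivot candidate
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def least_squares(rows, ys):
    """Least-squares weights from the normal equations X^T X w = X^T y."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(xtx, xty)

def max_residual(rows, ys, w):
    """Largest absolute error of the fitted linear function on the samples."""
    return max(abs(sum(wi * ri for wi, ri in zip(w, r)) - y)
               for r, y in zip(rows, ys))

data = [(1, 4), (2, 9), (3, 20), (4, 37), (5, 60)]   # points on y = 3x^2 - 4x + 5
ys = [float(y) for _, y in data]

plain = [[1.0, float(x)] for x, _ in data]                    # attributes (1, x)
lifted = [[1.0, float(x), float(x * x)] for x, _ in data]     # attributes (1, x, x^2)

w_plain = least_squares(plain, ys)
w_lifted = least_squares(lifted, ys)
print("plain  fit:", w_plain, "max error", max_residual(plain, ys, w_plain))
print("lifted fit:", w_lifted, "max error", max_residual(lifted, ys, w_lifted))
```

With the extra attribute the model is still linear in its weights, yet it recovers the coefficients of the curve, while the fit on (1, x) alone leaves a large error.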
QUINT: On Query-Specific Optimal Networks
Liangyue Li, Yuan Yao, Jie Tang, Wei Fan, and Hanghang Tong. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 985-994. ACM, 2016.
Reading note: "Query-Specific Optimal Networks: My Own Turf" (attachment: RED.pdf)
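The year 2016 opened the era of artificial intelligence. In that year AlphaGo beat humans at Go. Microsoft reported automatic speech recognition (ASR) results better than those of professional transcribers. Japan's NHK television reported that IBM's Watson needed only 10 minutes to complete diagnoses for 41 patients, normally two weeks of work for doctors, and that after reading a large body of literature and a patient's DNA it saved a leukemia patient whom the doctors could not help. Tesla's self-driving cars joined the traffic and also had accidents. Machines produced creditable songs, paintings, poems, novels, and films. Google, IBM, Facebook, Amazon, and Microsoft formed an AI partnership to study norms and standards for the industry. The White House published "Preparing for the Future of Artificial Intelligence" and "The National Artificial Intelligence Research and Development Strategic Plan". The core of today's AI is machine learning. It is not the expert system of the past, which answered by rules built from human knowledge, but a machine that acquires knowledge from samples and makes its own judgments.

This article gives a brief introduction to how machine learning works. The hope is that readers who know linear algebra, least squares, and basic statistical concepts can understand the mechanism through its mathematics and dispel the mystery for themselves, and that interested readers can grasp the essentials and find ideas for applications.

What is machine learning? Put simply, a computer uses a large amount of input sample data to adjust the parameters of a general mathematical model that represents regularities and classes, and then answers with the tuned model. Combinations of linear functions are normally used to represent numerical regularities and to separate classes, and in practice the number of parameters ranges from tens of thousands to tens of billions. Such a model is very simple, yet the sheer number of parameters lets it realize complex functions, enough to cover all kinds of prediction and recognition tasks. Mathematically, the mechanisms for tuning the parameters and applying the model are exact and effective. But because there are so many variables, they are hard to analyze and summarize as a simple, clear causal mechanism like a law of physics, and cannot be understood from the standpoint of human logical deduction.

An IQ test asks a person to answer a few dozen questions. Each question lists several pictures, groups of numbers, or words as samples, and asks you to pick the most "reasonable" answer from a set of options. This has nothing to do with what you know; it measures the brain's ability to judge by analogy from samples. Computers and learning algorithms imitate this ability of the brain and give machines an IQ. Humans have an IQ and can acquire knowledge through learning. Sample data hide a great deal of information, and training gives the machine knowledge, so that the machine shows intelligence in judgment and prediction.

Machine learning falls into two basic types, unsupervised and supervised. Unsupervised learning classifies samples by the way they cluster in the data distribution. For example, take the sizes and weights of a large number of RMB coins as sample data: in the two-dimensional plane they cluster in a few places. A person who looks at the plot sees that there are several classes and uses that knowledge to decide the class of a new coin. A machine can compute distances between data points (K-means) and automatically divide the clusters into groups. Once it has the center and spread of each group, it can decide which group a newly input coin belongs to. Many things look chaotic but actually fall into distinct classes, such as student potential or painting style, and can be told apart if they are described by enough feature attributes. With many attributes, however, people need to study them, summarize, and extract a small number of features whose meaning they can understand, and they find it hard to classify using a very large number of attributes. Machines do this easily. In your own work you may also be able to apply an off-the-shelf N-dimensional automatic classification program to discover hidden classes in the data you already have.

Unsupervised learning is like insight gained without a teacher, and it is inefficient. With a teacher, learning is much faster. Supervised learning is the most widely and successfully applied kind of machine learning: we use our own knowledge to label samples and "teach" the machine to answer questions. Mathematically, the question-and-answer is a function from the attribute space of the question to the space of answers. The basic algorithm of machine learning selects, from a family of candidate functions such as the linear functions, the function whose predictions have the smallest error against the sample labels. The selection is mostly done iteratively, repeatedly correcting the parameters of the candidate function along the gradient direction that reduces the error. This process is called training.

For problems with numerical answers, linear regression is about the most basic form of machine learning; people were using it centuries ago to derive empirical formulas from experimental data. Least squares finds the line or hyperplane with the smallest error against the data. It has a closed-form solution, the solution of the linear system known in linear algebra as the normal equations. In commercial applications, however, the model has a huge number of unknown parameters, and the closed-form solution needs very large amounts of memory and computation, so an approximate solution is usually found by gradient iteration. This is the most widely used learning method for numerical prediction.

What if the answer $y$ cannot be expressed as a linear formula in the input attribute $x$? By adding terms that are nonlinear in the input, such as $x^2, x^3, \ldots$, it may become possible to express the relationship as a formula that is linear in the extended set of terms. This has been well studied mathematically in the approximation theory of spline functions. In applications it corresponds to choosing enough input attributes. Take house valuation as an example: location and floor area are the basic attributes. When linear regression on them gives too large an error, adding attributes that are nonlinearly related to the existing ones, such as the average price of neighboring houses, the number of rooms and bathrooms, and the grade of the decoration, enlarges the dimension of the attribute space and gives better model accuracy.

For judging which class a pattern belongs to, logistic regression is the basic algorithm. Intuitively, a hyperplane divides the input attribute space into two halves that correspond logically to 0 and 1. The hyperplane is represented by a linear function, and the output corresponds to whether the value of that linear function is greater than 0. Several hyperplanes divide the attribute space into several classes. Data that can be classified this way are called linearly separable. The perceptron of the AI boom of the 1950s, in the form described here, applies a sigmoid activation function $S(z) = 1/(1 + e^{-z})$ after the linear function, that is, $y = S(\langle w, x \rangle - b)$, so that the output essentially saturates at 0 or 1 and the gradient for correcting the error is easy to derive. It models the function of a single neuron, and a single-layer neural network built from such units handles linearly separable pattern recognition well. For patterns that are not linearly separable, the method above of adding input feature attributes can make them linearly separable in a higher-dimensional space.

The power of machine learning comes from the enormous number of adjustable parameters. Its learning algorithms are not hard to understand, being basically linear operations on vectors and matrices. The crux is obtaining huge amounts of sample data and the engineering of computing a huge number of unknowns.

The support vector machine (SVM) uses a "kernel function" of inner products to map the input through a nonlinear transformation into a high-dimensional space where it becomes linearly separable. It replaces the sigmoid activation in the neuron with a piecewise linear function, so that adjusting the parameters of the margin-separating hyperplane involves only a small number of points. This greatly reduces the computation and turns it into the problem of finding the extremum of a quadratic function under linear constraints. Practical applications involve computations with huge sparse matrices. In 1998 John C. Platt of Microsoft Research proposed the SMO algorithm, which handles huge numbers of variables and samples very efficiently and led to the wide adoption of SVMs. SVMs are backed by clear mathematical theory, allow effective control over the training result, and are available as software packages in many languages. Compared with multilayer neural networks they need fewer machine resources, but they need people with knowledge of the application domain to help choose a suitable kernel function. They have been applied successfully in many classification areas, such as text, images, proteins, spam filtering, and handwritten character recognition.

An artificial neural network classifies the input of each layer linearly. Because of the sigmoid activation function, the map from each layer's input to its output is a nonlinear transformation, so the network can classify all kinds of data in a general way. Multilayer neural networks have great adaptability and potential for solving complex problems. But a network with more than three layers, trained by the gradient method of back-propagating errors, gives learning results that are hard to control, so it is not easy to apply directly to very complex situations.

Compared with the single-mechanism machine learning models above, deep learning is more like an integrated engineering design. Its basic architecture is a deep neural network, which can handle very complex patterns. To improve the effectiveness and efficiency of training, different network structures are designed for different levels. For example, convolutional layers with "innate" preset functions are applied at the bottom to extract features, and within the deep network unsupervised pre-training is used layer by layer, followed by supervised learning, to make learning more efficient. Today's deep learning networks already have tens of billions of connection parameters and a very high "IQ". They require enormous computing resources and information, and large companies use them to research breakthrough applications and explore the future of AI.

Every machine learning algorithm takes the smallest error against the samples as its learning goal. If that were all, it would be enough to let the machine simply memorize the samples. But reality varies in infinitely many ways. How can a finite set of samples be used to judge infinitely many possibilities?

The answer is: strictly speaking it cannot, but with high probability it can.

Learning to make a judgment whose output is only 0 or 1 is the most basic case. Machine learning uses samples to tune the parameters of a candidate function and obtain a model suitable for making the judgment. As long as the family of candidate functions is not so fine that it can distinguish any points whatsoever in the attribute space, then after training on enough randomly chosen samples, its prediction error converges with high probability to the error rate on the training samples. The Vapnik-Chervonenkis bound is:

$$P\big(\exists\, h \in H:\ |E_{in}(h) - E_{out}(h)| > \varepsilon\big) \le 4\, m_H(2N)\, e^{-\varepsilon^2 N / 8}$$

Here $P$ is probability, $\varepsilon$ is any given small number, $H$ is the family of candidate functions, $h$ is a function in the family (in particular the one selected by training), $N$ is the number of training samples, $E_{in}(h)$ is the error rate of $h$ on the training samples, $E_{out}(h)$ is the expected error rate outside the samples, and $m_H(2N)$ is an upper bound that depends on the family of candidate functions. VC theory says that as long as the family is not so fine that it can distinguish arbitrary samples, $m_H(2N)$ grows at most like a polynomial in $N$ whose degree is the VC dimension. The right-hand side of the inequality therefore tends to 0 as $N$ grows.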
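To see how the right-hand side of the Vapnik-Chervonenkis bound behaves as the number of samples grows, the following minimal sketch evaluates it numerically. It replaces $m_H(2N)$ by the standard polynomial upper bound $(2N)^{d_{vc}} + 1$; the values $d_{vc} = 3$ and $\varepsilon = 0.1$ and the function name `vc_bound` are assumptions chosen purely for illustration.

```python
import math

def vc_bound(n, d_vc, eps):
    """Evaluate 4 * m_H(2N) * exp(-eps^2 * N / 8), bounding the growth
    function m_H(2N) from above by the polynomial (2N)^d_vc + 1."""
    growth = (2 * n) ** d_vc + 1
    return 4 * growth * math.exp(-(eps ** 2) * n / 8)

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N = {n:>9,d}   bound = {vc_bound(n, d_vc=3, eps=0.1):.3e}")
```

For small N the value is far greater than 1 and says nothing, because the polynomial factor dominates. Once N is large enough the exponential factor wins and the bound collapses toward 0, which is the sense in which a finite sample can stand in for infinitely many cases.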
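The VC dimension is roughly the number of free variables in the function family. For example, a perceptron with 2 inputs has 3 adjustable parameters, and its VC dimension is 3, the number of parameters of the linear function. The formula shows that the simpler the model, that is, the fewer its parameters and the smaller its VC dimension, the more easily its prediction accuracy after training on a very large number of samples approaches the accuracy measured on the training samples. On the other hand, the more complex the model, the smaller the training error rate $E_{in}(h)$ and the better the model adapts to the samples. Successful machine learning needs both to be small, and the practical engineering of machine learning lies in striking a balance in model complexity: use the simplest model that gives a satisfactory result on the training samples.

From this we see that the feasibility of machine learning depends on two conditions. First, the data must contain a regular pattern. Randomly distributed data without regularity cannot be predicted, which shows up as a learning process that does not converge. Second, there must be a huge amount of random sample data. The basic assumption of the statistical formula is that the samples are random; the probability estimate holds only with enough random, unbiased samples. The more complex the recognition problem, the larger the probability sample space and the more samples are needed. The samples serve not only to reduce the error on the training set but also to guarantee a sufficiently high probability of accurate prediction. This requires huge amounts of data and computers capable of the corresponding computation, which is why the breakthrough in AI had to wait for the era of big data.

After the general principles and abstract formulas, here are some concrete numbers to give an intuitive impression. At the end of the 1980s, when I taught "Artificial Neural Networks" at the Graduate School of the Academy of Sciences, the computers at our institute were less capable than today's mobile phones. Research on handwritten character recognition could only use small networks and a few hundred samples, learning by serial iteration, and however many hours the computation ran, the results were never satisfactory. For comparison, a few days ago I took the data from an exercise in Andrew Ng's Stanford open course and trained handwritten digit recognition on a PC. The algorithm was almost the same as 20-odd years ago, but there were 5000 samples and the parameters were updated in batch. After 50 iterations, in less than 10 minutes on the PC, the model was trained and reached a recognition accuracy of 95.26%. Note that this was only a simple three-layer network with 400x25x10 nodes, which already has more than 10,000 parameters to determine, and the input samples formed a 400x5000 matrix. And this is only a small project that recognizes images of the 10 digits. The machine learning work of Professor Fei-Fei Li at Stanford used a database of 15 million photographs covering 22,000 kinds of objects to train a neural network with 24 million nodes and 15 billion connections, enabling a computer to understand photographs and describe the scenes in them. The intelligence of a trained machine comes from big data, algorithms, and computers capable of processing those data.

The computer is like the invention of the steam engine: only industrialization changed the occupations of ordinary people. Now all the conditions are ripe, machine learning is industrializing intellectual labor, and we are at the beginning of this new era.

References

Tencent Tech, "The ten most creative AI projects of 2016". http://tech.qq.com/a/20161230/030073.htm
White House report: Preparing for the Future of Artificial Intelligence. https://obamawhitehouse.archives.gov/sites/default/files/whitehouse_files/microsites/ostp/NSTC/preparing_for_the_future_of_ai.pdf
Stanford University, Andrew Ng, Coursera course "Machine Learning". https://www.coursera.org/learn/machine-learning
Caltech course, Yaser S. Abu-Mostafa, "Learning From Data". http://work.caltech.edu/telecourse.html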
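Basic idea

Most learning algorithms learn a target function from the training samples and then apply that function to new instances. The k-nearest-neighbor (k-NN) algorithm is different. It does not learn such a general function; it simply stores the training instances, and when a new instance arrives it assigns the new instance a function value based on its relationship to the training instances.

The basic idea is this: among the k training instances most similar to the new instance, the class that the most instances belong to is the class assigned to the new instance.

The figure illustrates this. The shapes fall into two classes, blue squares and red triangles, and the class of the green circle is unknown. Which class should it belong to? With 3-nearest-neighbor the green circle is assigned to the red triangles, because of the three shapes closest to it two are triangles and one is a square, so triangles are in the majority. With 5-nearest-neighbor the green circle is assigned to the blue squares, because of the five shapes closest to it three are blue squares and only two are red triangles.

Basic algorithm

An instance has n attributes and corresponds to a point in the n-dimensional space $R^n$. The target function value can be discrete or continuous.

Consider the discrete case first. The target function is $f: R^n \to V$, where $V$ is the finite set $\{v_1, v_2, \ldots, v_s\}$. Let the new instance be $x_q$, and find the k training points $x_1, \ldots, x_k$ nearest to $x_q$ (distance between points can be defined as Euclidean distance). The k-nearest-neighbor rule then assigns

$$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta\big(v, f(x_i)\big),$$

where $\delta(a, b) = 1$ if $a = b$ and 0 otherwise.

Weighted algorithm

Points at different distances are given different weights: nearer points matter more and receive larger weights, and farther points receive smaller ones. By analogy with the law of gravitation, the weight is inversely proportional to the square of the distance:

$$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i\, \delta\big(v, f(x_i)\big), \qquad w_i = \frac{1}{d(x_q, x_i)^2}.$$

Curse of dimensionality

The distance between instances is computed from all of their attributes, but in practice some attributes may be irrelevant to the classification. Suppose each instance is described by 20 attributes, of which only two are relevant. Instances that agree on those two attributes may still be far apart in the 20-dimensional space, and classifying by such a distance gives wrong results. This situation is called the curse of dimensionality.

One way to counter the curse of dimensionality is to multiply each attribute by a weighting factor before applying the nearest-neighbor algorithm, classify the test instances, and adjust the weighting factors repeatedly according to the classification error, thereby arriving at a reasonable estimate of the importance of each attribute.

Another weakness of k-nearest-neighbor is that the computation happens mainly when a new instance arrives. Every new instance requires a large amount of computation, so the resource cost is relatively high.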
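The majority-vote rule and its distance-weighted variant can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the training points are invented so that they mimic the situation described for the figure (3 nearest neighbors favor the triangle, 5 favor the square), and the function name `knn_classify` is hypothetical.

```python
import math
from collections import defaultdict

def knn_classify(train, query, k, weighted=False):
    """Classify `query` from the k nearest (point, label) pairs in `train`.

    With weighted=False every neighbor casts one vote; with weighted=True
    each vote counts 1 / distance^2, so closer neighbors matter more.
    """
    neighbours = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbours:
        d = math.dist(point, query)
        if weighted:
            if d == 0.0:
                return label            # exact match: take its label directly
            votes[label] += 1.0 / d ** 2
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)

# Invented training set: distances from the query (0, 0) are noted in comments.
train = [
    ((1.0, 0.0), "triangle"),    # 1.0
    ((-1.1, 0.0), "square"),     # 1.1
    ((0.0, 1.2), "triangle"),    # 1.2
    ((2.0, 0.0), "square"),      # 2.0
    ((0.0, -2.1), "square"),     # 2.1
    ((-3.0, 2.0), "square"),     # about 3.6
    ((3.0, 3.0), "triangle"),    # about 4.2
]
query = (0.0, 0.0)

print("3-NN:", knn_classify(train, query, k=3))
print("5-NN:", knn_classify(train, query, k=5))
print("5-NN weighted:", knn_classify(train, query, k=5, weighted=True))
```

On this data the unweighted answer flips between k = 3 and k = 5, while the weighted vote with k = 5 sides with the two closest triangles. Every call scans and sorts the whole training set, which is the per-query cost the text points out.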
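Whether "outward", as in exploring the vast universe, or "inward", as in revealing the secrets of human cells, massive information is bound to change human society and humanity itself.

Humanity has gradually entered an age of information explosion. Personal data, online behavior, living habits, and even so-called private and "confidential" information such as bank account passwords cannot escape one leak or surveillance scandal after another. Cancer, meanwhile, is an evil ghost hiding in the body, always ready to seize resources and expand its army, and we do not notice it at all.

But with the development of high-throughput biological assays, led by next-generation sequencing, the confidential information of tumors has nowhere to hide either. Tumors are suffering an unprecedented "information leak": their DNA, RNA, proteins, metabolites, and other biological information are all exposed to human measurement. This massive information is called omics data. DNA sequence information is genomics; chromatin structure and the various genome modifications are epigenomics; and there are also transcriptomics, proteomics, metabolomics, and more. From the leaked information, how do we pry into the "privacy" of a tumor, and how do we work out better ways to deal with it?

What can we do with massive tumor omics data? The first task is to build an information dossier on tumors. As in geographic surveying, the first step is to draw the map from the measured data. Several international cancer genome projects, such as ICGC (International Cancer Genome Consortium) and TCGA (The Cancer Genome Atlas), are devoted to building molecular atlases of tumors. TCGA data can be used to identify somatic mutations in tumor cells and, among them, the mutation sites that affect protein structure and function (Niu et al., Nat Genet 2016). A similar study identified more than 47,000 non-synonymous mutation sites from over 7,000 paired tumor and adjacent-tissue samples, greatly improving our knowledge of the tumor mutation spectrum (Shen et al., Am J Hum Genet 2016). The mutation spectrum allows a better estimate of the proportion of patients who might benefit from targeted drugs, computed to be about 40% (Rubio-Perez et al., Cancer Cell 2015).

But because tumors are highly heterogeneous and sampling and assay methods differ between centers, the number of samples in any single project is still far from enough. Take liver cancer: TCGA contains about 400 samples, already the largest number in any public data set. To draw a better "map of the tumor", large-scale atlases spanning more data sources must be built. For example, HCCDB, the omics atlas of hepatocellular carcinoma (HCC) built by our group, already includes gene and miRNA expression data for about 3,500 clinical samples, DNA methylation data for about 800, and CNV data for about 600 (from 17 studies), and currently offers web browsing of the expression profiles. Multi-source, multi-center, large-scale omics data are the foundation of a tumor information dossier. Based on the collected data we have carried out a DNA methylation atlas analysis of HCC (Zheng et al., Brief Bioinform 2016) and are working on expression-based subtyping of HCC. HCC researchers can also conveniently query the analysis results to guide experimental design and form more reasonable hypotheses. A researcher interested in the regulation of HCC cancer stem cells, for instance, can query related pathways or genes using candidate genes such as EPCAM, AFP, and SPP1. In the age of information explosion, or big data, drawing "maps" of all kinds is the first step in realizing the value of data, and building the "tumor map" along multiple dimensions and at multiple levels is the foundation and key to conquering cancer with information technology. Similarly, a large-scale data set of more than 4,000 samples (from 18 studies) was assembled for colorectal cancer, from which four molecular subtypes were derived, each with its own molecular and phenotypic characteristics (Guinney et al., Nat Med 2015).

With a base map in hand, we naturally need to draw it in finer detail and annotate it carefully using "the knowledge of our predecessors", just as a military map must mark the key high ground and passes. This is where artificial intelligence becomes very important. Because cancer omics data are high-dimensional and heterogeneous, better machine learning methods are needed to mine and model them, for tasks such as clustering (molecular subtyping, for example the multi-omics integrative clustering method LRAcluster; Wu et al., BMC Genomics 2015), prediction (molecular markers), identification of core regulatory networks (molecular mechanisms; Gu et al., Mol BioSyst 2014), and possible intervention strategies (drug intervention). Because living systems are highly complex, omics data alone cannot do the "map annotation" well; they must be combined organically with expert knowledge and the literature. This is a new challenge for traditional machine learning methods based on sampled data. To interpret tumor information better, new AI methods that can fuse sampled data with knowledge data will have to be built. Is the combination of deep learning (LeCun et al., Nature 2015) and hierarchical Bayesian learning (Ghahramani, Nature 2015; Lake et al., Science 2015) a feasible path? All of this awaits further research.

As biomedical assays and artificial intelligence develop, more and more of the tumor's "confidential information" will be disclosed, and people will gain more new means of diagnosing and treating tumors.

References:

Ghahramani. Probabilistic machine learning and artificial intelligence. Nature 2015, 521:452-459.
Gu et al. Gene module based regulator inference identifying miR-139 as a tumor suppressor in colorectal cancer. Molecular BioSystems 2014, 10(12):3249-3254.
Guinney et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015, 21(11):1350-1362.
Lake et al. Human-level concept learning through probabilistic program induction. Science 2015, 350(6266):1332-1339.
LeCun et al. Deep Learning. Nature 2015, 521:436-444.
Niu et al. Protein-structure-guided discovery of functional mutations across 19 cancer types. Nat Genet 2016, 48(8):827-837.
Rubio-Perez et al. In Silico Prescription of Anticancer Drugs to Cohorts of 28 Tumor Types Reveals Targeting Opportunities. Cancer Cell 2015, 27:382-396.
Shen et al. Proteome-Scale Investigation of Protein Allosteric Regulation Perturbed by Somatic Mutations in 7,000 Cancer Genomes. Am J Hum Genet 2016, Epub.
Wu et al. Fast dimension reduction and integrative clustering of large-scale multi-omics data using low-rank approximation: application to cancer molecular classification. BMC Genomics 2015, 16:1022.
Zheng et al. Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma. Brief Bioinform 2016, Epub.

December 23, 2016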
Lao Wang recently left for industry to make his fortune, working as a data scientist, the most glamorous job title of the moment. Want to know which company he joined? Want to do machine learning alongside him? His company is now hiring, so what are you waiting for? See the job description below.

- Location: positions are available in both the UAE and Beijing. Lao Wang is based in the UAE and often heads to Dubai for fun, you know what I mean.
- How good is the pay? A seven-figure after-tax annual salary is no dream, and it beats every top-tier company. The company is extremely rich, in the Arab-tycoon-throwing-money sense. Just yesterday it took all employees on holiday to a five-star resort hotel in Dubai!
- Who will you work with? Don't you trust Lao Wang? The boss and the core staff are all Chinese who came over from the US, with experts everywhere. It is the best place to level up in big data.
- What projects? The most challenging big data projects. The clients are tycoons: they have money, they have data, they have vision. All they lack is someone to tell them what their data is worth.

If you are interested, contact Lao Wang right away and send your resume. Lao Wang's email: jywang.ieee@gmail.com

We are looking for a Data Scientist who is passionate about applying data science skills to solve real-life big data analytics problems, to join our core engineering team and build one of the largest big data analytics systems in the world. The system will ingest, enrich, and aggregate data, and generate insights and alerts from hundreds of billions of records in real time every day.

The candidate will work in an agile development environment and have the following responsibilities:
- Perform analysis and gain insight into a broad range of very large data sets
- Develop data models
- Process, cleanse, and verify the integrity of data used for analysis
- Perform data mining using state-of-the-art methods
- Develop predictive analytics using Artificial Intelligence and Machine Learning technologies
- Prototype and demonstrate solutions
- Work with software engineers to implement analytical models and approaches in mission-critical products

Minimum qualifications:
- BS/MS/PhD in Computer Science, Math, Statistics, Physics, Engineering or a related field
- Strong experience with data analysis, data modeling and statistical analysis
- Ability to create extensible and scalable data schemas that lay the foundation for analysis
- Strong experience in at least one scripting language (Python, Perl, Scala, etc.)
- Experience working with very large data sets
- Strong verbal and written English communication skills

Preferred qualifications:
- Strong experience in data analysis, statistical analysis and predictive analytics
- Strong experience with Machine Learning and Artificial Intelligence
- Working experience with statistical analysis tools such as R, SAS, SPSS, etc.
- Experience with Java
- Experience with Hadoop and Spark (Spark Core, Spark ML, SparkR, GraphX, GraphFrames)
- Experience with ElasticSearch/Solr/Lucene
Our group (Computational Social Science, ETH Zurich) is recruiting two postdocs. Details follow.

1-year Postdoc Position: Complex networks/systems and Machine learning

We are looking for a highly motivated postdoctoral fellow in the area of Complex Networks/Systems and Machine Learning. The candidate will collaborate with the group on complex networks research, data-driven modelling of complex sociotechnical and financial systems, machine learning models, and modelling of dynamical processes on complex networks with analytical and computational methods such as Monte Carlo and agent-based simulation. The successful candidate is required to have a PhD in a quantitative field, a good publication record, a strong analytical and mathematical background, and programming skills in Matlab, Mathematica, Python and C. The candidate should also have research experience with machine learning and complex networks. The candidate must also possess excellent English language skills, social skills, team spirit, and international experience, and must be able and willing to work in a highly interdisciplinary environment.

Candidates should submit their application materials as a single pdf file (5 MB), including a short motivational letter, their CV and exam documents (A-levels, bachelor, master or diploma, and PhD) with attention to: ETH Zurich, Mr. Olivier Meyrat, Human Resources, CH-8092 Zürich. 2 letters of recommendation should also be sent directly to us by recommenders. Applicants are encouraged to submit their materials as soon as possible via ETH's application system. The reviewing of applications will continue until the position is filled.

Contact

=================================

1-year Postdoc Position: Big Data, Social Mining, Data Science

We are looking for a highly motivated postdoctoral fellow in the area of Big Data with a particular focus on Social Mining. The ideal candidate shall pursue cutting-edge research in the areas of Big Data, data science, deep learning and their interplay with the Internet of Things for techno-socio-economic application domains such as Smart Cities. Candidates are required to do research on novel data collection and analytics methodologies, and therefore advanced skills in relevant programming platforms (Hadoop, Spark, Mahout, real-time data analytics) are required. Experience in developing Internet of Things systems, such as Arduino and smartphone apps, is a plus. The candidate must also possess excellent English language skills, social skills, team spirit, and international experience, and must be able and willing to work in a highly interdisciplinary environment.

Candidates should submit their application materials as a single pdf file (5 MB), including a short motivational letter, their CV and exam documents (A-levels, bachelor, master or diploma, and PhD) with attention to: ETH Zurich, Mr. Olivier Meyrat, Human Resources, CH-8092 Zürich. 2 to 3 letters of recommendation should also be sent directly to us by recommenders. Applicants are encouraged to submit their materials as soon as possible via ETH's application system. The reviewing of applications will continue until the position is filled.

Contact
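I. Introduction

A well-known scholar, looking back on the history of NLP (Natural Language Processing), has recounted how machine learning replaced the traditional rule-based system as the academic mainstream, saying that more than twenty years ago the field seemed to go through a gripping religious war. It must be admitted that the statisticians' complete victory in NLP had its historical inevitability. The enormous results and benefits of machine learning on many NLP tasks are plain for all to see: machine translation, speech recognition and synthesis, search ranking, spam filtering, document classification, automatic summarization, lexicon acquisition, named entity tagging, part-of-speech tagging, and so on (Church 2007).

Yet on recently browsing several surveys by leading figures in NLP, I was still startled by how much mainstream pride and prejudice they contain. On reflection, the statistical camp really does hold many deep-rooted prejudices against traditional rule-based systems, along with sweeping conclusions that are very popular but do not survive scrutiny. Prejudice itself is not the frightening part; prejudice is everywhere. What is truly frightening is prejudice that circulates unchallenged, and in NLP its circulation has reached a jaw-dropping degree. Accepting these views without reflection has become the norm. Hence the idea of putting them on record here and discussing the core ones in detail. The prejudices listed below can be found everywhere and are widely repeated. To avoid quarrels I do not cite sources, but anyone who knows the field will recognize that these are neither fabrications nor straw men. They are specious and do not survive scrutiny, yet many people take them as self-evident truths. It is not hard to find a rule-based system that exemplifies each one; the crux is the leap from the defects of particular systems to a methodological condemnation of rule-based systems as a whole.

[Prejudice 1] The hand-crafting of rule-based systems is their knowledge bottleneck, whereas machine learning is trained automatically (implication: it has no knowledge bottleneck).
[Prejudice 2] Hand-crafting makes rule-based systems poorly portable: moving to a new domain means starting over from scratch, whereas with machine learning the algorithm and system stay the same and only the training data needs to change (implication: it is highly portable).
[Prejudice 3] Rule-based systems are brittle: when they meet a linguistic phenomenon that was not anticipated, the system will "break" (what does "break" mean: a crash? paralysis? failure?), so robust products cannot be built on them.
[Prejudice 4] The results of rule-based systems carry no confidence scores, so good and bad output are mixed together.
[Prejudice 5] A rule base grows ever more unwieldy until it can no longer be improved and has to be scrapped.
[Prejudice 6] Hand-crafting dooms rule-based systems to be impractical: they cannot scale up and can only be laboratory toys.
[Prejudice 7] Rule-based systems can succeed only in extremely narrow domains and cannot support cross-domain systems.
[Prejudice 8] Rule-based systems can handle only well-formed language (manuals, weather reports, news, and the like) and cannot cope with degraded text such as social media, spoken language, dialects, slang, or OCR documents.
[Prejudice 9] Rule-based systems are last century's technology and were discarded long ago (the implied conclusion seems to be: so they cannot yield high-quality systems).
[Prejudice 10] Judged by results, machine learning always beats rule-based systems.

The prejudices listed fall into two classes. The first class is bias, such as Prejudices 1 through 5. These mostly stem from incomplete induction: their holders may have seen or tried one type of rule-based system, dabbled briefly, and then jumped to conclusions. That is understandable and forgivable, although each one should still be corrected. This article is the first in a series that sets the record straight. The second class is fallacy, which facts can show to be absurd. What is startling is that fallacies, too, can be so popular. The prejudices that follow Prejudice 5 are all fallacies that collapse on their own. Take Prejudice 8, which says rule-based systems can analyze only well-formed language. Facts speak louder than words: the sentiment mining system we developed, which is mainly rule-based, processes exactly the kind of non-standard language found in social media. The large-scale operation and use of this system also refutes Prejudice 6. Readers can judge for themselves whether such a rule-based system qualifies as practical:

The multilingual customer intelligence mining system, whose main customers are Global 500 companies, consists of a back-end and a front-end subsystem. The core engine is the back-end subsystem (back-end indexing engine), which automatically analyzes social media big data and extracts information from it. The results of analysis and extraction are stored with the open-source Apache Lucene text search engine (lucene.apache.org). Back-end index generation is based on the Map-Reduce framework and is distributed over 200 virtual servers in a computing cloud. For one year of archived social media data (about 30 billion documents in more than 40 languages), the back-end can complete the full index in about 7 days. The front-end subsystem (front-end app) is a search-like application delivered as SaaS. Users log in to the application server through a browser and enter a topic of interest; the application server runs a distributed search over the back-end index, merges the results, and presents them in a user-configurable form. The whole process is effectively instant, with a response time of no more than three or four seconds.

II. 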
规则系统手工性的责难 【成见一】说:规则系统的手工编制(hand-crafted)是其知识瓶颈,而机器学习是自动训练的(言下之意:因此没有知识瓶颈)。 NLP主流对规则系统和语言学家大小偏见积久成堆,这第一条可以算是万偏之源。随便翻开计算语言学会议的论文,无论讨论什么语言现象,为了论证机器学习某算法的优越,在对比批评其他学习算法的同时,规则系统大多是随时抓上来陪斗的攻击对象,而攻击的理由往往只有这么一句话,规则系统的手工性决定了 “其难以开发”(或“其不能 scale up”,“其效率低下”,“其不鲁棒”,不一而足),或者干脆不给具体理由,直接说“文献【1】【2】【3】尝试了这个问题的不同方面,但这些系统都是手工编制的”,一句话判处死刑,甚至不用讨论它们的效果和质量。手工性几乎成了规则系统的“原罪”,编制这些系统的语言学家因此成为学术共同体背负原罪的二等公民。 手工编制(hand-crafted)又如何?在日常消费品领域,这是对艺人特别的嘉奖,是对批量机械化生产和千篇一律的反抗,是独特和匠心的代表,是高价格理直气壮的理由。缘何到了NLP领域,突然就成贬义词了呢?这是因为在NLP领域,代表主流的统计学家由于他们在NLP某些任务上的非凡成功,居功自傲,把成功无限夸大,给这个共同体施行了集体催眠术,有意无意引导人相信机器学习是万能的。换句话说,批判手工编制的劣根性,其隐含的前提是机器学习是万能的,有效的,首选的。而实际情况是,面对自然语言的复杂性,机器学习只是划过了语言学的冰山一角,远远没有到主流们自觉或不自觉吹嘘的万能境界。催眠的结果是,不独不少语言学家以及NLP相关利益方(如投资人和用户)被他们洗脑了,连他们自己也似乎逐渐相信了自己编制的神话。 真实世界中,NLP 是应用学科,最终结果体现在应用软件(applications)上,属于语言软件工程。作为一个产业,软件工程领域吸引了无数软件工程师,虽然他们自嘲为“码工”,社会共同体给予他们的尊重和待遇是很高的(Bill Gates 自封了一个 Chief Engineer,说明了这位软件大王对工匠大师的高度重视)。古有鲁班,现有码师(coding master)。这些码工谁不靠手工编制代码作为立足之本呢?没听说一位明星工程师因为编制代码的手工性质而被贬损。同是软件工程,为什么计算语言学家手工编制NLP代码与其他工程师手工编制软件代码,遭遇如此不同的对待。难道是因为NLP应用比其他应用简单?恰恰相反,自然语言的很多应用比起大多数应用(譬如图形软件、字处理软件等等)更加复杂和艰难。解释这种不同遭遇的唯一理由就是,作为大环境的软件领域没有NLP主流的小环境里面那么多的傲慢和偏见。软件领域的大师们还没有狂妄到以为可以靠自动编程取代手工编程。他们在手工编程的基础建设(编程架构和开发环境等)上下功夫,而不是把希望寄托在自动编程的万能上。也许在未来的某一天,一些简单的应用可以用代码自动化来实现,但是复杂任务的全自动化从目前来看是遥遥无期的。不管从什么标准来看,非浅层的自然语言分析和理解都是复杂任务的一种。因此,机器学习作为自动编程的一个体现是几乎不可能取代手工代码的。规则系统的NLP应用价值会长期存在。 自动是一个动听的词汇。如果一切人工智能都是自动学习的,前景该有多么美妙。机器学习因为与自动连接在一起,显得那么高高在上,让人仰视。它承载着人类对未来世界的幻想。这一切理应激励自动学习专家不断创新,而绝不该成为其傲慢和偏见的理由。 在下面具体论述所谓规则系统的知识瓶颈软肋之前,值得一提的是,本文所谓自动是指系统的开发,不要混淆为系统的应用。在应用层面,无论是机器学习出来的系统,还是手工编制的系统,都是全自动地服务用户的,这是软件应用的性质决定的。虽然这是显而易见的事实,可确实有人被误导,一听说手工编制,就引申为基于规则系统的应用也是手工的,或者半自动的。 手工编制NLP系统是不是规则系统的知识瓶颈?毋庸讳言,确实如此。这个瓶颈体现在系统开发的周期上。但是,这个瓶颈是几乎所有大型软件工程项目所共有的,是理所当然的资源成本,不独为 NLP 
“专美”。从这个意义上看,以知识瓶颈诟病规则系统是可笑的,除非可以证明对所有NLP项目,用机器学习开发系统比编制规则系统,周期短且质量高(个别的项目可能是这样,但一般而言绝非如此,后面还要详谈)。大体说来,对于NLP的浅层应用(譬如中文切词,专名识别,等等),没有三个月的开发,没有至少一位计算语言学家手工编制和调试规则和至少半个工程师的平台层面的支持,是出不来规则系统的。对于NLP的深层应用(如句法分析,舆情抽取等),没有至少一年的开发,涉及至少一位计算语言学家的手工编制规则,至少半个质量检测员的协助和半个工程师的平台支持,外加软件工程项目普遍具有的应用层面的用户接口开发等投入,也是出不来真正的软件产品的。当然需要多少开发资源在很大程度上决定于开发人员(包括作为知识工程师的计算语言学家)的经验和质量以及系统平台和开发环境的基础(infrastructures)如何。 计算语言学家编制规则系统的主体工作是利用形式化工具编写并调试语言规则、各类词典以及语言分析的流程调控。宏观上看,这个过程与软件工程师编写应用程序没有本质不同,不过是所用的语言、形式框架和开发平台(language,formalism and development platform)不同,系统设计和开发的测重点不同而已。这就好比现代的工程师用所谓高级语言 Java 或者 C,与30年前的工程师使用汇编语言的对比类似,本质是一样的编程,只是层次不同罢了。在为NLP特制的“高级”语言和平台上,计算语言学家可以不用为内存分配等非语言学的工程细节所羁绊,一般也不用为代码的优化和效率而烦扰,他们的注意力更多地放在面对自然语言的种种复杂现象,怎样设计语言处理的架构和流程,怎样平衡语言规则的条件宽窄,怎样与QA(质量检测)协调确保系统开发的健康,怎样保证语言学家团队编制规则的操作规范(unit testing,regression testing,code review,baselines,等等)以确保系统的可持续性,怎样根据语言开发需求对于现有形式框架的限制提出扩展要求,以及怎样保证复杂系统的鲁棒性,以及怎样突破规则系统的框架与其他语言处理包括机器学习进行协调,等等。一个领头的计算语言学家就是规则系统的架构师,系统的成败绝不仅仅在于语言规则的编制及其堆积,更多的决定于系统架构的合理性。明星工程师是软件企业的灵魂,NLP 规则系统的大规模成功也一样召唤着语言工程大师。 关于知识瓶颈的偏见,必须在对比中评估。自然语言处理需要语言学知识,把这些知识形式化是每个NLP系统的题中应有之义,机器学习绝不会自动免疫,无需知识的形式化。规则系统需要语言学家手工开发的资源投入,机器学习也同样需要资源的投入,不过是资源方式不同而已。具体说,机器学习的知识瓶颈在于需要大数量的训练数据集。排除研究性强实用性弱的无监督学习(unsupervised learning),机器学习中可资开发系统的方法是有监督的学习(supervised learning)。有监督的学习能开发知识系统成为应用的前提是必须有大量的手工标注的数据,作为学习的源泉。虽然机器学习的过程是自动的(学习算法的创新、调试和实现当然还是手工的),但是大量的数据标注则是手工的(本来就有现成标注不计,那是例外)。因此,机器学习同样面临知识瓶颈,不过是知识瓶颈的表现从需要少量的语言学家变成需要大量的低端劳动者(懂得语言及其任务的中学生或大学生即可胜任)。马克思说金钱是一般等价物,知识瓶颈的问题于是转化为高级劳动低级劳动的开销和转换问题:雇佣一个计算语言学家的代价大,还是雇佣10个中学生的代价大?虽然这个问题根据不同项目不同地区等因素答案会有不同,但所谓机器学习没有知识瓶颈的神话可以休矣。 另外,知识瓶颈的对比问题不仅仅是针对一个应用而言,而应该放在多应用的可移植性上来考察。我们知道大多数非浅层的NLP应用的技术支持都源于从自然语言做特定的信息抽取:抽取关系、事件、舆情等。由于机器学习把信息抽取看成一个直接对应输入和输出的黑匣子,所以一旦改变信息抽取目标和应用方向,以前的人工标注就废弃了,作为知识瓶颈的标注工作必须完全重来。可是规则系统不同,它通常设计成一个规则层级体系,由独立于领域的语言分析器(parser)来支持针对领域的信息抽取器(extractor)。结果是,在转移应用目标的时候,作为技术基础的语言分析器保持不变,只需重新编写不同的抽取规则而已。实践证明,对于规则系统,真正的知识瓶颈在语言分析器的开发上,而信息抽取本身花费不多。这是因为前者需要应对自然语言变化多端的表达方式,将其逻辑化,后者则是建立在逻辑形式(logical 
form)上,一条规则等价于底层规则的几百上千条。因此,从多应用的角度看,规则系统的知识成本趋小,而机器学习的知识成本则没有这个便利。 III. 主流的反思 如前所述,NLP领域主流意识中的成见很多,积重难返。世界上还很少有这样的怪现象:号称计算语言学(Computational Linguistics)的领域一直在排挤语言学和语言学家。语言学家所擅长的规则系统,与传统语言学完全不同,是可实现的形式语言学(Formal Linguistics)的体现。对于非浅层的NLP任务,有效的规则系统不可能是计算词典和文法的简单堆积,而是蕴含了对不同语言现象的语言学处理策略(或算法)。然而,这一路研究在NLP讲台发表的空间日渐狭小,资助亦难,使得新一代学人面临技术传承的危险。Church (2007)指出,NLP研究统计一边倒的状况是如此明显,其他的声音已经听不见。在浅层NLP的低垂果实几乎全部采摘完毕以后,当下一代学人面对复杂任务时,语言学营养缺乏症可能导致统计路线捉襟见肘。 可喜的是,近年来主流中有识之士(如,Church 2007, Wintner 2009)开始了反思和呼吁,召唤语言学的归来:“In essence, linguistics is altogether missing in contemporary natural language engineering research. … I want to call for the return of linguistics to computational linguistics.”(Wintner 2009)。相信他们的声音会受到越来越多的人的注意。 参考文献 Church 2007. A Pendulum Swung Too Far. Linguistics issues in Language Technology, Volume 2, Issue 4. Wintner 2009. What Science Underlies Natural Language Engineering? Computational Linguistics, Volume 35, Number 4 原载 《W. Li T. Tang: 主流的傲慢与偏见:规则系统与机器学习》 【计算机学会通讯】2013年第8期(总第90期) Pride and Prejudice in Mainstream: Rule System vs. Machine Learning In the area of Computational Linguistics, there are two basic approaches to natural language processing, the traditional rule system and the mainstream machine learning. They are complementary and there are pros and cons associated with both. However, as machine learning is the dominant mainstream philosophy reflected by the overwhelming ratio of papers published in academia, the area seems to be heavily biased against the rule system methodology. The tremendous success of machine learning as applied to a list of natural language tasks has reinforced the mainstream pride and prejudice in favor of one and against the other. As a result, there are numerous specious views which are often taken for granted without check, including attacks on the rule system's defects based on incomplete induction or misconception. 
This is not healthy for NLP itself as an applied research area and exerts an inappropriate influence on the young scientists coming to this area. This is the first piece of a series of writings aimed at correcting the prevalent prejudice, focused on the in-depth examination of the so-called hand-crafted defect of the rule system and the associated knowledge bottleneck issue. 【相关】 K. Church: A Pendulum Swung Too Far , Linguistics issues in Language Technology, 2011; 6(5) 【科普随笔:NLP主流的傲慢与偏见】 【 关于NLP方法论以及两条路线之争 】 专栏: NLP方法论 【置顶:立委NLP博文一览】 《朝华午拾》总目录
人工智能的进步一方面给人类带来了科技文明的新进步,另一方面促进了对人的智能更深刻的理解。我们一直认为只有基于对人的智能的深刻理解,才能清楚地区分人工智能与人的智能之间的复杂关系,而不能以人工智能的观点去定义人的智能,正像在算法理论中,不能以“多项式时间”(Polynomial time)的计算能力去定义“不确定性问题”(NP,Nondeterministic Problem)一样。 人工智能基本理论不仅牵涉到大量相关的应用技术科学,更深刻地与数学、逻辑学、哲学、中西比较文化等领域中基本理论问题相关联,我们与对此有兴趣的学者一样,希望把中国文化中最有价值的中国思想与前沿科学基本理论相联系,作些探索性的工作,。。。 我们的文章(智能哲学:人、机之间的“战争”与“学习”,http://www.aisixiang.com/data/99245-2.html),集中在讨论“机器学习”与“人的学习”的关系上。人、机的围棋对弈不等同于人、机之战,人的“学习”与机器的“学习”具有完全不同的性质和层次性。AlphaGo中基于人工神经网络ANN和“深度学习”等技术对围棋棋局的判断使算法有效搜索空间成为可能。一方面,AlphaGo确实是学习了弈棋高手的经验才胜过了人;但另一方面,更应该看到,人模仿了大脑神经系统制造了可以“学习”的机器,然后才是机器“学习”人的经验,这两方面才是对AlphaGo和“人工智能”的智能性的正确理解。 人、机之间最大的区别就在于人是天生的主体学习者,而机器则是在人造的“先天性”上才得到自己的“学习”能力。 ****** 智能哲学:人、机之间的“战争”与“学习” 周剑铭 柳渝 ( yu.li@u-picardie.fr ) 摘要:人、机的围棋对弈不等同于人、机之战。人的“学习”与机器的“学习”具有完全不同的性质和层次性。AlphaGo中基于人工神经网络ANN和“深度学习”等技术对围棋棋局的判断使算法有效搜索空间成为可能。一方面,AlphaGo确实是学习了弈棋高手的经验才胜过了人;但另一方面,更应该看到,人模仿了大脑神经系统制造了可以“学习”的机器,然后才是机器“学习”人的经验,这两方面才是对AlphaGo和“人工智能”的智能性的正确理解。人、机之间最大的区别就在于人是天生的主体学习者,机器则是在人造的“先天性”上才得到自己的“学习”能力。 目录: 一、“一石激起千层浪” 二、“围棋之战”不等于“人、机之战” 三、“模仿”与“学习” 四、围棋的全局性与AlphaGo 一、 “一石激起千层浪” 3月12日,韩国著名围棋棋手李世石对战谷歌AlphaGo的人机围棋大战在韩国首尔举行,AlphaGo(谷歌首席程序员Aja Huang执子)与李世石对弈。在去年战胜了欧洲围棋冠军樊麾后,AlphaGo与九段高手李世石之间的对弈,成了科学技术领域和新闻界的重大事件。 阿尔法围棋(AlphaGo)是一款围棋人工智能机器,由位于英国伦敦的谷歌(Google)旗下DeepMind公司的戴维·西尔弗、艾佳·黄和戴密斯·哈萨比斯等团队开发。AlphaGo这次与李世石的比赛五盘三胜,胜者可获得奖金100万美元。此前李世石曾表示,自己看过AlphaGo的对局,也会做一些针对性的准备,认为机器人AlphaGo的棋力相当于三段棋手的水平。如果人工智能技术继续发展的话,再过一至两年,比赛的结果将很难预料,但AlphaGo表现出乎所有人意料。 在首尔四季宾馆,经过4个多小时的对弈,到第176手AlphaGo在围棋棋盘上落下了最后一“石”,曾获多项世界冠军的九段棋手李世石向谷歌AlphaGo认输,AlphaGo以3比0率先获胜,“石破天惊”,全球哗然! 
科幻电影迷和“强人工智能”追求者把这次“人机大战”看成是人类社会“奇点来临”(Ray Kurzweil:“The Singularity Is Near”)的一个信号,既然机器的智能可以胜过人的智能,机器反过来控制人类和统治世界就是可能的,一些大科学家、企业家都对人工智能的强大远景感到忧虑,比如英国“独立报”网站(http://www.independent.co.uk/news/science/stephen-hawking-ai-could-be-the-end-of-humanity-9898320.html )就刊登有关资讯,引用霍金的话:人工智能将是人类的终结吗?像这些并非笑谈的认真思考与人工智能发展的新高峰形成了当前全世界关注的大浪潮;虽然有很多业内人士认为真正能达到人的智能的机器的出现在时间上仍很遥远,但这并不能抹去这种笼罩在人类头顶上的阴影。另一方面,仍然有很多人相信,机器与人具有本质的区别,机器不会具有真正的感情、自我意识、良心、社会责任等人类独具的能力,遗憾的是现在并没有看到在理论、逻辑上可以对这种信心做出的有力支持,哲学家、人类学家、社会学家、文化理论家们似乎尴尬,因为对“智能”、“知识”、“情感”、“自我意识”等等最基本的概念几千年来几无定论,人工智能几乎把由概念和逻辑构成的晦涩、复杂、精致的庞大哲学和抽象理论轻松地推到一边去了,在人工智能发展的速度难以预测的情况下,所有的人都是哲学家,都直接面对我们人类和世界的命运:十年或五十年? 二、 “围棋之战”不等于“人、机之战” AlphaGo与樊麾、李世石的对弈是“围棋的人、机之战”,不是也不等同于一般意义上的“人、机之战”。AlphaGo是一台学习了历史上所有围棋高手经验的机器,AlphaGo对弈樊、李实际是所有过去的围棋高手(经验)对弈一个围棋高手,所以无论谁胜,都是人棋游戏,在“胜、负”的意义上,都是人与人之间的对弈游戏,不是一般意义上的“人、机之战”。 把人机对弈看成是“人、机大战”是层次上的严重混淆,这种混淆误导人们对人工智能的认识,认为所有人设计的机器具有“天生”的与人的对立性,就是把机器的“自主”能力等同于人的自主能力,认为机器能“学习”,就是一种自主性,也就是把机器“学习”等同于人的自主性学习。 迄今为止的“计算机”都是“算法计算机”,即由“机器”执行“算法”进行“计算”,所以计算机需人编制程序才能工作。计算机的硬件和软件是分离的,一台元器件组装的“祼机”首先要装入操作系统才能开机(“点亮”),然后装入如“看图”、“办公”等应用软件才能工作,所以计算机的工作能力是由人灌装进机器里面去的。 “机器学习”就是不需要人事先灌装程序得到工作能力的人工智能,“机器学习”与“算法计算”的本质区别不在于“计算”而在于“判断”,比如图形识别、语音辨析等,这实际上是人在日常生活中遇到的最多的也是最基本的问题,所以也是人的一种基本能力,这种能力是无法事先学习的,“只能在游泳中学会游泳”,这只是一种个人性的经验。 “机器学习”的这种不同于“算法计算”的判断能力确实是一种经验的学习途径,“机器学习”的能力源于“人工神经网络”(ANN),ANN是模仿大脑中的神经元、突触的联接而得到的对数据特征提取和判断的能力,“机器学习”就是通过大量的样本训练使机器“记住”了某些特征,这样就可以用这种特征去甄别要处理的对象。 AlphaGo就是通过大量的围棋实战棋谱训练而得到对棋局整体性的把握能力,人也可以通过棋谱的学习而提高棋术,但人一定是参与实战而得到下棋能力,机器则恰恰是“先学好游泳”,这才是人、机之间的根本区别。但机器为什么可以在棋盘上战胜人呢?这是因为在对经验的记忆能力、反应的速度上人不如机器,所以机器是以记忆量与敏捷性的优势上战胜人的,但是机器的看盘“直觉”和走子的策略上却是学习了人的经验的结果。 由于人、机围棋对弈限定了人的能力只能在棋盘上,人、机之间的围棋对弈最多只是体现了人机复杂关系中在“人工智能”这个问题上的局部关系,以“围棋之战”等同于“人、机之战”是造成对人工智能本质的误解的一个基本原因。 机器能与人对弈只是表明机器成功地“学习”了人下围棋的方法,AlphaGo对人获胜也只是证明“机器学习”这种“人工智能”的学习能力得到了肯定。机器在一种人、机的游戏中战胜了人,这与“机器战胜了人”这个大题目是层次上非常不同的事情。 人与人的智能的关系本身就是一个不确定性的问题,在能做出“人的智能”与“人工智能”谁比谁更强大这种判断之前,我们现在还不知道人的“智能”究竟是什么? 
三、 “模仿”与“学习” 朱光潜说“‘模仿’和‘学习’本来不是两件事。姑且拿‘学习’来说。小儿学写字,最初是描红,其次是写印本,再其次是临帖。这些方法都是在借旁人所写的字做榜样,逐渐养成手腕筋肉的习惯。……推广一点说,一切艺术上的模仿都可以作如是观。比如说作诗作文,似乎没有什么筋肉的技巧,其实也是一理。诗文都要有情感和思想。情感都见于筋肉的活动,我们在前面已经说过。思想离不开语言,语言离不开喉舌的动作。比如想到“虎”字时,喉舌间都不免起若干说出“虎”字的筋肉动作。这是行为派心理学的创见,现在已逐渐为一般心理学家所公认。诗人和文人常欢喜说‘思路’,所谓‘思路’并无若何玄妙,也不过是筋肉活动所走的特殊方向而已。……” “学”,繁体字写作“學”,会意字,本义教塾中的孩子模仿。“习”,繁体字写作“習”,会意字,从羽,与鸟飞有关,本义小鸟反复地试飞。学习,先模仿别人的经验,然后反复练习,慢慢掌握技巧,变成自己的技能,养成习惯,“习以为常”,就是说通过“习”使之变为“常”。大意就是说,身体力行的“模仿”是最根本的“学习”。王阳明就特别强调“知行合一”,表达了后期儒家对日益增长的“知识”与个人心、性之间的关系以及个人身、心关系的一致性。 按中国传统文化,“学而时习之”,强调个人的在“学习”中的主体性,“学”之于人。“习”之于己,谓之“习得”,“习”就是个人的主体过程,“学”只是一种被动的模仿行为,自身的主动性才是真正的动力,人的主体性也是学习的最终归宿。“子曰:不愤不启,不悱不发。举一隅不以三隅反,则不复也。” (论语·述而) 孟子曰:“君子深造之以道,欲其自得之也。自得之,则居之安;居之安,则资之深;资之深,则取之左右逢其源,故君子欲其自得之也。”(孟子·离娄下)现代的皮亚杰(Jean Piaget 1896-1980)也认为,儿童的心理(智力)既不是起源于先天的成熟,也不是起源于后天的经验,而是起源于主体的动作。克拉申(Stephen D. Krashen 1941-) 提出“习得--学习差异” 假设,认为语言的习得就是一种无意识地、自然而然地学习,学习者通常意识不到自己在习得语言;而语言“学习”则是通过设定的教学计划和教材并有意识的练习、记忆,达到对所学语言、语法的掌握。习得的结果是潜意识的语言能力;而学习的结果是对语言结构有意识的掌握,语言的“学习”只起对语言的检测、编辑的作用。 西方文化传统重在知识,“学”在致“知”,从苏格拉底的“知识就是美德”到培根的“知识就是力量”,学习(learning)主要是指知识的学习,虽然知识的学习于人是不可或缺的,知识的积累和进化是人类文明最主要的组成部份和动力,但片面强调这种知识性的学习也会产生对人自身主体性的困惑。似乎是在“机器学习”这个概念中,人们才领悟到机器是由人“模仿”人的神经系统而使机器得到“学习”能力的,这里的“模仿”是指人在计算机中“建模(modeling)”方式的模仿,这种强调区分了人与机器在层次上的不同。实际上这里就隐含了模仿的三个层次,首先是人对神经网络的模仿(imitating)而得到ANN这种“机器代理”(Agent),然后是在“算法计算”的通用计算机中建模(modeling)ANN,然后才是机器的学习训练(“有监督”或“无监督的”机器学习)以得到机器的人工智能性。现在广泛使用的“机器学习”大抵上只是理解为将人类的既有经验纳入机器中,比如数据库式的专家系统和ANN式的“机器代理”(Agent)都是这种意义上的“学习”。 人、机之间最大的区别就在于,人主要是天生的主体学习者,机器则是在人造的“先天性”上才得到自己的“学习”能力的。 由于人工智能的出现而引发了我们对人的智能更深刻的理解,也带来了很多相关观念和概念的更新,并正在引发人 对 自我和世界的重新认识,对“模仿”和“学习”的意义和之间关系的分析能够帮助我们更加认识到,区别图灵或冯·诺依曼构型的计算机的“算法计算”与“机器代理”(Agent)的Matrix的重要(可参见 )。我们一直认为只有基于对人的智能的深刻理解,才能清楚地区分机器的人工智能与人的智能之间的复杂关系,而不能以人工智能的观点去理解人的智能,正像在算法理论中,不能以“多项式时间”(Polynomial time)的计算能力去定义“不确定性问题”(NP,Nondeterministic Problem)一样 。 总的来说,一方面,AlphaGo确实是学习了弈棋高手的经验才胜过了人,但另一方面,应更应该看到,人模仿了大脑神经系统制造了可以“学习”的机器,然后才是这个机器去“学习”人的经验,这两方面才是对AlphaGo与人之间的“围棋之战”和一般意义上所谓的“人机之战”完全不同的层次的真正分别,这也是对“人工智能”的智能性的正确理解。 四、 
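The Global Nature of Go and AlphaGo

Under the banner of "deep learning", AlphaGo cannot be said to rest entirely on artificial neural networks (ANNs), but it certainly pushed artificial intelligence for adversarial games to the limit: the enormous search space that no powerful search algorithm could handle on its own became searchable once an ANN supplied an "intuitive" judgment of the board. In terms of the basic principles of computation, ANNs have indeed moved artificial intelligence from "algorithmic computation" onto the path of "agent computation". The global judgment that an ANN displays is the "intuition" of which humans are so often proud. Yet to this day people do not understand the secret of human intuition, and so even experts in ANN research admit that the basic principles of ANNs are still not understood.

Go is an adversarial game in which the "position" has global significance. The situation on the board is determined not by the relation between individual stones and their locations, but by the "position" that each stone forms together with all the other stones, and the local parts of a game stand in the same relation to the whole. This is the implicit meaning of the now-common term "deep", and the convolutional neural network (CNN) was developed in exactly this sense.

Chess pieces are individual: a piece's rank and its location on the board largely determine its value. Deep Blue, which played chess, built the "space" formed by all pieces and board locations, then used traditional algorithmic search methods and techniques to search the whole space of possible moves and find the best one. This approach is powerless against the enormous space formed by Go stones and board locations. On the basis of ANNs, the DeepMind team developed a way to judge Go "positions": the "value network" and the "policy network" in AlphaGo can judge whether a position is good or bad, much like a strong player's intuitive impression of the board. Such global judgment cannot directly produce concrete moves, but it provides evaluations that shrink the search space handed to the search algorithm. This combination of value judgment and algorithmic search resembles the cooperation of the left and right halves of the human brain.

Although the details of how AlphaGo is put together are not available to us, DeepMind describes the general picture as follows: "First, the depth of the search may be reduced by position evaluation: truncating the search tree at state s and replacing the subtree below s by an approximate value function v(s) ≈ v*(s) that predicts the outcome from state s. This approach has led to superhuman performance in chess, checkers and othello, but it was believed to be intractable in Go due to the complexity of the game. Second, the breadth of the search may be reduced by sampling actions from a policy p(a|s) that is a probability distribution over possible moves a in position s. Recently, deep convolutional neural networks have achieved unprecedented performance in visual domains: for example, image classification, face recognition, and playing Atari games. They use many layers of neurons, each arranged in overlapping tiles, to construct increasingly abstract, localized representations of an image. We employ a similar architecture for the game of Go. We pass in the board position as a 19 × 19 image and use convolutional layers to construct a representation of the position. We use these neural networks to reduce the effective depth and breadth of the search tree: evaluating positions using a value network, and sampling actions using a policy network."

In our reading (our additions in parentheses): first, the search depth is cut by position evaluation, in that the search tree is truncated at a state (judged by the value network) and the subtree below it is replaced by an approximate value v(s) ≈ v*(s). Position evaluation of this kind produced superhuman play in chess, checkers, and Othello, but on its own it was believed to be intractable for a game as complex as Go. Second, therefore, the search breadth (at that level) is cut by sampling candidate moves from the probability distribution p(a|s) (given by the policy network). The board is fed in as a 19 × 19 image, and convolutional layers build the representation of the position, just as deep convolutional networks do in image classification, face recognition, and Atari games. Position evaluation uses the value network, and move selection uses the policy network.

Conclusion: from the nature of human and machine "learning" and the relation between them, we can see that the human-machine relationship in artificial intelligence is at once a relationship between humans and the objective world, between humans and society, and between humans and themselves. The entanglement among these levels is more complex than ordinary plural, interactive relationships. Artificial intelligence brings us not only progress in material and social civilization, but also more uncertainty for humanity and the world.

Main references
Mastering the game of Go with deep neural networks and tree search, Nature 529 (7587): 484–489.
Zhu Guangqian (朱光潜), “不似则失其所以为诗,似则失其所以为我”, No. 13 in the series 《谈美》 (On Beauty).
Zhou Jianming (周剑铭), 智能哲学:人与人工智能 (Philosophy of Intelligence: Humans and Artificial Intelligence), web article.
Zhou Jianming (周剑铭) and Liu Yu (柳渝), 机器与“学习”——寻找人工智能的幽灵 (Machines and "Learning": In Search of the Ghost of Artificial Intelligence), web article.
Liu Yu (柳渝), 不确定性的困惑与NP理论 (The Puzzle of Uncertainty and NP Theory), http://blog.sciencenet.cn/home.php?mod=spaceuid=2322490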
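The two reductions in the DeepMind passage quoted above (cutting depth with a value function, cutting breadth with a policy) can be illustrated on a toy game. The following is only a sketch under loud assumptions: the game is a simple take-away game rather than Go, `value_estimate` and `policy_prior` are hand-written stand-ins for trained networks, and the search is a plain depth-limited negamax that keeps the top few moves by prior, not the Monte Carlo tree search with sampling that AlphaGo uses. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; whoever takes the last stone wins


def legal_moves(stones: int) -> List[int]:
    return [m for m in MOVES if m <= stones]


def value_estimate(stones: int) -> float:
    """Stand-in for a value network v(s): score for the player to move."""
    # Assumption: we cheat and use the known closed form of this toy game.
    return -1.0 if stones % 4 == 0 else 1.0


def policy_prior(stones: int) -> Dict[int, float]:
    """Stand-in for a policy network p(a|s): a probability for each legal move."""
    weights = {m: (3.0 if (stones - m) % 4 == 0 else 1.0) for m in legal_moves(stones)}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def search(stones: int, depth: int, breadth: int) -> Tuple[float, int]:
    """Negamax limited in depth (via value_estimate) and breadth (via policy_prior).

    Returns (score for the player to move, best move, or 0 if nothing was expanded).
    """
    if stones == 0:
        return -1.0, 0  # the opponent took the last stone, so the player to move lost
    if depth == 0:
        return value_estimate(stones), 0  # truncate: replace the subtree by v(s)
    prior = policy_prior(stones)
    # Breadth reduction: expand only the `breadth` moves with the highest prior.
    candidates = sorted(prior, key=prior.get, reverse=True)[:breadth]
    best_score, best_move = -2.0, 0
    for move in candidates:
        child_score, _ = search(stones - move, depth - 1, breadth)
        if -child_score > best_score:  # the child's score is from the opponent's view
            best_score, best_move = -child_score, move
    return best_score, best_move


print(search(10, depth=2, breadth=2))
```

With ten stones the search looks only two plies deep and at two moves per node, yet it picks the move that leaves the opponent a multiple of four, because the stand-in evaluation carries the knowledge that deeper search would otherwise have to discover.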
Machine learning: deep learning reading (links collected from the web)

Deep Learning Reading List

Following is a growing list of some of the materials I found on the web for deep learning beginners.

Free Online Books
Deep Learning by Yoshua Bengio, Ian Goodfellow and Aaron Courville
Neural Networks and Deep Learning by Michael Nielsen
Deep Learning by Microsoft Research
Deep Learning Tutorial by LISA lab, University of Montreal

Courses
Machine Learning by Andrew Ng in Coursera
Neural Networks for Machine Learning by Geoffrey Hinton in Coursera
Neural networks class by Hugo Larochelle from Université de Sherbrooke
Deep Learning Course by CILVR lab @ NYU
CS231n: Convolutional Neural Networks for Visual Recognition (ongoing)
CS224d: Deep Learning for Natural Language Processing (about to start)

Video and Lectures
How To Create A Mind by Ray Kurzweil (an inspiring talk)
Deep Learning, Self-Taught Learning and Unsupervised Feature Learning by Andrew Ng
Recent Developments in Deep Learning by Geoff Hinton
The Unreasonable Effectiveness of Deep Learning by Yann LeCun
Deep Learning of Representations by Yoshua Bengio
Principles of Hierarchical Temporal Memory by Jeff Hawkins
Machine Learning Discussion Group - Deep Learning w/ Stanford AI Lab by Adam Coates
Making Sense of the World with Deep Learning by Adam Coates
Demystifying Unsupervised Feature Learning by Adam Coates
Visual Perception with Deep Learning by Yann LeCun

Papers
ImageNet Classification with Deep Convolutional Neural Networks
Using Very Deep Autoencoders for Content Based Image Retrieval
Learning Deep Architectures for AI
CMU's list of papers

Tutorials
UFLDL Tutorial 1
UFLDL Tutorial 2
Deep Learning for NLP (without Magic)
A Deep Learning Tutorial: From Perceptrons to Deep Networks

Websites
deeplearning.net
deeplearning.stanford.edu

Datasets
MNIST: handwritten digits
Google House Numbers from Street View
CIFAR-10 and CIFAR-100
IMAGENET
Tiny Images: 80 million tiny images
Flickr Data: 100 million Yahoo dataset
Berkeley Segmentation Dataset 500

Frameworks
Caffe
Torch7
Theano
cuda-convnet
Ccv
NuPIC
DeepLearning4J

Miscellaneous
Google Plus - Deep Learning Community
Caffe Webinar
100 Best GitHub Resources for DL
Word2Vec
Caffe DockerFile
TorontoDeepLearning convnet
Vision data sets
Fantastic Torch Tutorial (my personal favourite; also check out gfx.js)
Torch7 Cheat sheet

Original link: http://jmozah.github.io/links/#rd
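Our "Semantic Computing" chat group was discussing the syntactic structure of this sentence:

The asbestos fiber, crocidolite, is unusually resilient once it enters the lungs, with even brief exposures to it causing symptoms that show up decades later, researchers said.

I said: it looks fine in its entirety. The once-clause has a main clause before it, so it is perfectly grammatical. The PP "with even brief exposures to it" is an adverbial of "causing ...": usually a PP modifies a preceding verb, but here it modifies the following ING-verb, which is OK.

Then it occurred to me to test our parser on it. Sure enough, it attached the PP wrongly, saying that the PP modifies "enters" rather than "causing". Apart from that, my parse is entirely correct. This may be a forgivable error. To improve it, I could keep both possibilities. Statistically, though, it may not be worth it: when a PP sits between a preceding finite verb and a following non-finite verb, the probability that it modifies the former is far greater than the probability that it modifies the latter.

Professor Zhang asked: Is this sentence in your training set, assuming this is a statistical method? If not, that is quite a feat.

I said: this is a rule-based system built by a linguist-programmer, not a statistical method, and the sentence is not in my dev corpus. Parsing is a tractable task. With some effort it can always be done, to a level that approaches human experts (linguists) and exceeds ordinary people (non-linguists). I am speaking from the observations and experience of my own practice. Reliable parsing can be handled by an experienced linguist-programmer without relying on machine learning.

To illustrate the point, I tested my Chinese parser. The parse of the Chinese sentence has only one error: the link between "语言学" (linguistics) and "程序猿" (programmer) was dropped. This shows that the parser still has room for improvement; development of Chinese parsing started later and is harder, and at present dropped links still happen occasionally. On the whole, though, the result is basically reliable. So even Chinese, whose syntax is harder than English, remains a tractable task that manual work can handle.

What linguists cannot handle are tasks with countless tangled threads, such as speech recognition, document classification, and clustering-based ontology acquisition. These tasks are a balancing act among a great many features, and the human brain is not up to it: it sees the trees and misses the forest. But deep parsing and information extraction dissect one tree at a time, carefully and methodically. That is the linguist's specialty; these are tractable tasks and can certainly be handled. (However large the data, once every sentence has been analyzed and the extractions stored in a database, some "mining" is still needed at retrieval time, and at that point statistics is naturally required so that one leaf does not hide the forest.)

On tractable tasks that call for careful dissection (deep parsing, for example), my basic view is that a linguist with NLP experience cannot be beaten. Machine learning, including deep learning (currently the most acclaimed machine learning tool), may some day approach expert level, and that is worth looking forward to. But approaching the linguist is the most it can do; I find it hard to believe it will surpass manual work. No machine learning algorithm, however powerful, can beat expert hand-coding on every task. This ought to be obvious, yet most people in academia take it for granted that deep learning can always surpass hand-built systems.

The direct goal of a parser is not to resolve semantics but to provide a reliable structural foundation, so that downstream applications (pragmatics-level semantic understanding, information extraction, sentiment analysis, machine translation, automatic summarization, intelligent assistants, and other NLP applications) can work with a finite set of patterns rather than an unbounded set of linear sequences. Measured against this goal, both our Chinese and our English parsers already meet the standard.

[Related]
[Microblog: parsing still depends on linguists; machine learning falls short]
The weak spot of hand-crafted rule systems is document classification
"Liwei's notes: two approaches to automatic language analysis"
More on machine learning and hand-crafted systems: who is smarter and more capable, human or machine?
[Why hybrid? On machine learning vs. hand-coded rules in NLP]
Comparison of Pros and Cons of Two NLP Approaches
[Pinned: index of Liwei's NLP posts on the ScienceNet blog (regularly updated)]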
IEEE Visualization Conference 2015 - Increasing Influence of Machine Learning
ML Blog Team, 11 Nov 2015

This post is authored by Yiwen Sun, Data Scientist at Microsoft.

I attended the IEEE Visualization Conference 2015 in Chicago recently and jotted down a few points related to machine learning. For those of you who are unfamiliar with this conference, it's the largest annual gathering of practitioners, academics and researchers looking to make data visually understandable and usable. Conference paper talks are organized into three tracks: Visual Analytics Science and Technology (VAST), Information Visualization (InfoVis), and Scientific Visualization (SciVis). Co-located are three IEEE symposiums: Large Data Analysis and Visualization (LDAV), Visualization for Cyber Security (VizSec), and the very first Symposium on Visualization in Data Science (VDS). Over 1500 attendees participated this year, with leading companies in Business Intelligence and Advanced Analytics represented, including Bloomberg, Google, IBM, Tableau, and, of course, Microsoft.

One big impression I got is that ML and Data Visualization are becoming more tightly coupled. Over half of the papers address ML techniques in their data processing step. For example, the best paper for VAST, "Reducing Snapshots to Points: A Visual Analytics Approach to Dynamic Network Exploration", utilizes vectorization, normalization, and dimensionality reduction to project high-dimensional dynamic network data onto two dimensions, then visualizes them using two juxtaposed views: one showing network snapshots and the other showing the evolution of the network. This enables users to differentiate regular, stable states from anomalies more easily.
Below is a summary of ML techniques highlighted in four major application areas: In network or spatial data visualization, clustering and classification have been widely used to reduce clutter and identify regions of interest. For example, in the paper “ MobilityGraphs: Visual Analysis of Mass Mobility Dynamics via Spatio-Temporal Graphs and Clustering ”, hourly Twitter user movement data in Greater London area are spatially aggregated into regional clusters and color-coded by temporal clusters. (Image from Interactive Graphics Systems Group at Technical University of Darmstadt) For time-series data visualization, a big challenge is to present large dataset on the limited display space without over-plotting. An effective approach is to aggregate the data points into segments of time, and create a hierarchy of multi-focus zoomed line chart, as illustrated in the paper “ TimeNotes: A Study on Effective Chart Visualization and Interaction Techniques for Time-Series Data ” (Image from TimeNotes ) In textual data visualization, text mining techniques such as entity extraction, topic identification and sentiment analysis become essential. In the paper “ Exploring Evolving Media Discourse Through Event Cueing ”, multiple mining results, such as entities in Wordle, sentiment scores over timeline, are linked together to enable and enhance the analysis of media discourse. (Image from VADER Lab at Arizona State University) Anomaly detection, though not a standalone research area for visualization, has been studied by different research groups, to assist human judgement with automated analysis results. In “ Visualization and Analysis of Rotating Stall for Transonic Jet Engine Simulation ” the authors applied Grubbs’ test to identify outliers in blade passages as the early sign of turbine engine’s rotating stall. 
In "TargetVue: visual analysis of anomalous user behaviors in online communication systems", a TLOF (time-adaptive local outlier factor) model was used to identify sudden changes in user behavior based on a set of features extracted for each user from the online communication data.

The VAST Challenge was another highlight. This annual contest began in 2006 and is designed to reflect real-world analytics challenges and encourage research into novel data processing, visualization and interaction methods. This year's challenge was to analyze individual and group movement in an amusement park over a weekend during which a crime takes place. Popular languages used for data processing and ML were Python and R, both of which are currently supported by Azure Machine Learning.

Overall, the conference was a great place to learn about the very latest in all things visualization, and to interact with experts in the domain.

Yiwen

Source: http://blogs.technet.com/b/machinelearning/archive/2015/11/11/ieee-visualization-conference-2015-increasing-influence-of-machine-learning.aspx
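The post mentions Grubbs' test as the outlier detector in the rotating-stall paper. As a rough illustration only, the standard test statistic can be computed as below. This is a generic textbook sketch, not that paper's code: the function names and sample readings are made up for the example, and the critical value takes the Student-t quantile as an input (`t_crit`, looked up elsewhere) because the Python standard library does not provide it.

```python
import math
import statistics
from typing import Sequence, Tuple


def grubbs_statistic(values: Sequence[float]) -> Tuple[float, int]:
    """Return (G, index): G = max |x_i - mean| / s, and the index of that point."""
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if stdev == 0:
        raise ValueError("all values are identical")
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / stdev, index


def grubbs_critical(n: int, t_crit: float) -> float:
    """Two-sided critical value for sample size n.

    t_crit is the upper alpha/(2n) quantile of Student's t with n - 2 degrees
    of freedom; it must be supplied by the caller (table or a stats library).
    """
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))


readings = [10.0, 10.0, 10.0, 10.0, 20.0]  # hypothetical per-passage measurements
g, suspect = grubbs_statistic(readings)
print(f"G = {g:.3f}, most extreme reading at index {suspect}")
```

The most extreme point is flagged as an outlier when G exceeds `grubbs_critical(n, t_crit)` for the chosen significance level. The test assumes roughly normal data and checks one outlier at a time.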
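This is no gospel for many of the parties involved in social media mining, but it is just as powerful and true: the mainstream machine learning approach, used without linguistic analysis, is powerless before social media. Linguistic structures have to be involved in addition to keywords. This message is too important to go unnoticed, so this post elaborates on my last English post.

I am bringing this to the top because it is just too important to be left aside. At the risk of wearing out the reader, the point has to be made thoroughly. For many people, including investors, entrepreneurs, and users, this is no gospel, yet it is just as universal, explosive, and true. Although it is really not hard to understand, plenty of people cover their eyes and refuse to admit it, abroad as well as at home, in industry as well as in academia. But reality is reality.

The mainstream approach of applying machine learning to a Bag of Words (BOW) cannot be trusted for social media mining such as sentiment classification, whether in Western languages or in Chinese. Stretched thin and unfit for use: that is the basic state of affairs. The reason is plain. Machine learning fails in the face of social media dominated by short messages, which simply do not contain a dense enough set of data points (the so-called keyword density) for it to work with. Even the cleverest cook cannot make a meal without rice. This limitation is built into the bag-of-words methodology, and no training set, however large, can overcome it. Without linguistic structural analysis, it is an insurmountable challenge.

In Chinese and Western languages alike, the overwhelming majority of social media content in the mobile era consists of short messages, and someone has to expose the truth behind social media big data mining. That BOW is helpless before short messages is an indisputable fact. It will not suddenly work wonders where it does not fit merely because it is the most readily available mainstream method and most people use it. What does not work does not work: this route cannot break through the 60% precision bottleneck and stays far from the generally accepted usability threshold of 80%. The methodology itself determines this. (From the earlier post "Every system that claims to do social media sentiment mining with machine learning deserves suspicion".)

For BOW sentiment classification, 60% is the ceiling. Several independent industry sources confirm this, and our own internal experiments support the same conclusion. Sentiment classification is the natural extension of Information Extraction to subjective language. In information extraction, the long-standing consensus is that 80% precision is the threshold at which a system becomes usable. Among the traditional extraction tasks, Named Entity recognition reached the mark long ago (90%+), and Relationship extraction came close in its day (70%-80%). Extraction of complex events (Scenario Template) never became practical, because the best systems of the time reached only about 50% precision. (In fact, extraction of simplified events, or General Events, can fully reach the mark with the help of syntactic patterns. It is easier than sentiment analysis, which is why complex events were gradually replaced by simple events as the target.) Relationship extraction and simple event extraction built on entities are the core of so-called knowledge graph technology. Because their object is the more tractable class of objective language, big data extraction on top of automatic analysis (parsing) is already fully mature and is only a matter of workload. Sentiment analysis, by contrast, is arguably the hardest of the information extraction tasks. Practice shows that with careful, in-depth development on top of deep parsing it can also reach the mark (80%-90%). But a bag of words cannot; it does not even get through the door. Structural analysis is a hurdle that cannot be bypassed. Most short messages contain only a dozen or so words in total. Without structural analysis as a fulcrum, the few content words left after the usual stop-word removal cannot possibly yield sentiment or semantics.

With the arrival of the big data era, and thanks to the information redundancy that pervades big data, the quality trap for sentiment systems lies not in recall but in precision. For the overwhelmingly short messages of social media, sentiment classification that skips structural analysis and relies only on the traditional bag of words cannot break the 60% precision bottleneck, however powerful the machine learning algorithm and however much training data there is. This is the reality that every machine learning system for social media has to face. No matter how attractive a product's visualizations or how polished its social media sentiment reports look, it is hard to imagine trustworthy quality if the method makes no use of linguistic structure.

[Related]
[Liwei's popular science: what is a bag of words in NLP], 2015-11-27
Every system that claims to do social media sentiment mining with machine learning deserves suspicion, 2015-11-21
[Liwei's popular science: keyword-based sentiment classification systems face challenges]
The significance of independent verification of sentiment mining systems, 2015-11-22
[Pinned: index of Liwei's NLP posts on the ScienceNet blog (regularly updated)]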
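The claim that a bag of words discards the structure a short message depends on can be seen in a few lines. This is a minimal sketch, not the author's system: the stop-word list and the two example messages are invented for illustration, and the tokenizer simply splits on whitespace.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this illustration only.
STOP_WORDS = {"is", "as", "the", "a", "than"}


def content_bag(message: str) -> Counter:
    """Bag-of-words features: lowercase, split on whitespace, drop stop words."""
    return Counter(w for w in message.lower().split() if w not in STOP_WORDS)


first = "iPhone is not as good as Galaxy"
second = "Galaxy is not as good as iPhone"

print(content_bag(first))
print(content_bag(first) == content_bag(second))
```

The two messages express opposite opinions about the two products, yet they reduce to the same four content words. Any classifier that sees only these features must give both the same label, however much training data it has, whereas an analysis that records which product is the subject of "not as good as" can tell them apart.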
IEEE Visualization Conference 2015 - Increasing Influence of Machine Learning

ML Blog Team, 11 Nov 2015 9:00 AM

This post is authored by Yiwen Sun, Data Scientist at Microsoft.

I attended the IEEE Visualization Conference 2015 in Chicago recently and jotted down a few points related to machine learning. For those of you who are unfamiliar with this conference, it’s the largest annual gathering of practitioners, academics and researchers looking to make data visually understandable and usable. Conference paper talks are organized into three tracks: Visual Analytics Science and Technology (VAST), Information Visualization (InfoVis), and Scientific Visualization (SciVis). Co-located are three IEEE symposiums: Large Data Analysis and Visualization (LDAV), Visualization for Cyber Security (VizSec), and the very first Symposium on Visualization in Data Science (VDS). Over 1500 attendees participated this year, with leading companies in Business Intelligence and Advanced Analytics represented, including Bloomberg, Google, IBM, Tableau, and, of course, Microsoft.

One big impression I got is that ML and data visualization are becoming more tightly coupled. Over half of the papers use ML techniques in their data processing step. For example, the best paper for VAST, “Reducing Snapshots to Points: A Visual Analytics Approach to Dynamic Network Exploration”, uses vectorization, normalization, and dimensionality reduction to project high-dimensional dynamic network data onto two dimensions, then visualizes the result in two juxtaposed views: one showing network snapshots and the other showing the evolution of the network. This lets users differentiate regular, stable states from anomalies more easily.
Below is a summary of ML techniques highlighted in four major application areas:

In network or spatial data visualization, clustering and classification have been widely used to reduce clutter and identify regions of interest. For example, in the paper “MobilityGraphs: Visual Analysis of Mass Mobility Dynamics via Spatio-Temporal Graphs and Clustering”, hourly Twitter user movement data in the Greater London area are spatially aggregated into regional clusters and color-coded by temporal clusters. (Image from Interactive Graphics Systems Group at Technical University of Darmstadt)

For time-series data visualization, a big challenge is to present a large dataset in limited display space without over-plotting. An effective approach is to aggregate the data points into segments of time and create a hierarchy of multi-focus zoomed line charts, as illustrated in the paper “TimeNotes: A Study on Effective Chart Visualization and Interaction Techniques for Time-Series Data”. (Image from TimeNotes)

In textual data visualization, text mining techniques such as entity extraction, topic identification and sentiment analysis become essential. In the paper “Exploring Evolving Media Discourse Through Event Cueing”, multiple mining results, such as entities in a Wordle and sentiment scores over a timeline, are linked together to enable and enhance the analysis of media discourse. (Image from VADER Lab at Arizona State University)

Anomaly detection, though not a standalone research area for visualization, has been studied by different research groups to assist human judgement with automated analysis results. In “Visualization and Analysis of Rotating Stall for Transonic Jet Engine Simulation”, the authors applied Grubbs’ test to identify outliers in blade passages as an early sign of rotating stall in a turbine engine.
In “TargetVue: visual analysis of anomalous user behaviors in online communication systems”, a TLOF (time-adaptive local outlier factor) model was used to identify sudden changes in user behavior based on a set of features extracted for each user from the online communication data.

The VAST Challenge was another highlight. This annual contest began in 2006 and is designed to reflect real-world analytics challenges and to encourage research into novel data processing, visualization and interaction methods. This year’s challenge was to analyze individual and group movement in an amusement park over a weekend as part of a criminal investigation. Popular languages used for data processing and ML were Python and R, both of which are currently supported by Azure Machine Learning.

Overall, the conference was a great place to learn about the very latest in all things visualization, and to interact with experts in the domain.

Yiwen

Source: http://blogs.technet.com/b/machinelearning/archive/2015/11/11/ieee-visualization-conference-2015-increasing-influence-of-machine-learning.aspx
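For readers who have not met Grubbs' test, which the post mentions for the jet engine paper, the sketch below computes only its test statistic, G = max |x_i - mean| / s. The readings are made up, and the final step of the test (comparing G with a critical value derived from the t-distribution for the chosen significance level) is deliberately left out. This is not the method of the cited paper, just the textbook statistic.

```python
import math

def grubbs_statistic(values):
    """Return (G, index): G = max |x_i - mean| / s, and the index of that point."""
    n = len(values)
    if n < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all observations are identical")
    index = max(range(n), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / s, index

# Made-up sensor readings with one suspicious value at the end.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 14.0]
g, suspect = grubbs_statistic(readings)
print(suspect, round(g, 2))
```

The function flags the single most extreme point; whether it counts as an outlier depends on the critical value for the sample size and significance level, which has to come from a table or a statistics library.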
Deep Learning is Hot: Max-Pooling Plagiarism?

By Juan L. Castro-Garcia

Deep learning is a term that describes learning by an artificial neural network that consists of a cascade of nonlinear processing layers. Deep learning networks have recently attracted great interest from industry, including Google, Microsoft, IBM, Samsung, and Baidu. I report a key architectural mechanism of the deep learning network Cresceptron, now well known as max-pooling, and ask the reader whether HMAX plagiarized Cresceptron. In this report I do not claim that this is plagiarism.

In August 2014, the Chinese Journal of Journalism & Communication announced that Ms. Yu Yanru, a PhD student at Peking University, had published an article in the journal (issue 7, 2013) that plagiarized another article, by Nina R. Gelbart, published in the journal Eighteenth-Century Studies (issue 4, 1984). The plagiarizing article was withdrawn and the author was disciplined by the journal. The announcement was widely reported, including by BBC China online. Ms. Yu was a graduate student, but the following involves a senior researcher.

The Merriam-Webster online dictionary defines “plagiarize” as: “to steal and pass off (the ideas or words of another) as one’s own; use (another’s production) without crediting the source.”

Until 1991, deep neural networks were used for recognizing isolated two-dimensional (2-D) handwritten digits. Three-dimensional (3-D) object recognition until then used 3-D model-based approaches: matching 2-D images with a handcrafted 3-D object model.

Juyang Weng et al. assumed that a monolithic 3-D object model does not exist inside a human brain, although one may subjectively feel otherwise. They published Cresceptron in 1992 for detecting and recognizing learned 3-D objects in natural, cluttered 2-D images and for segmenting the recognized objects from those images. Experimental examples of the learned objects included human faces, human bodies, walkways, cars, dogs, fire hydrants, traffic signs, telephones, chairs, and desktop computers. Experimental examples of the natural and cluttered scenes included TV program scenes, university campus outdoors, and indoor offices. Representations in Cresceptron are responses of distributed feature detectors that are shared among many objects.

A Cresceptron is fully developmental in the sense that it incrementally grows and adapts through experience. It consists of a cascade of nonlinear processing modules, where each module consists of a number of layers. The early layers in each module are one or two pattern matching layers, each of which performs convolution: a convolution kernel learned at one image location is applied to all other locations, so that the same feature can be detected at all other locations. The convolution therefore provides within-layer shift-invariance.

However, a key challenge is that the number of training samples is limited. In order to recognize similar object views that Cresceptron has not observed, it must tolerate deformation in object views.

The key mechanism in Cresceptron for tolerating deformation is the (2x2) to 1 reduction of nodes in every module using a maximization operation, which implements a logic OR over the firing rates of each group of (2x2) neurons. The 1993 publication of Cresceptron gave the mathematical expression for the hierarchical max operations in max-pooling.

This is now commonly called max-pooling; see, e.g., a deep learning review by Juergen Schmidhuber. According to the review, Cresceptron was the first to use max-pooling. “Max-pooling is widely used in today’s deep feedforward neural networks”. For example, the winner of the ImageNet LSVRC-2010 and ILSVRC-2012 contests used an architecture consisting of a cascade of modules in which convolution layer(s) are followed by a max-pooling layer.

Kindly invited by Prof. Tomaso Poggio, Weng gave a talk at the Center for Biological and Computational Learning, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, on Oct. 19, 1994. In a nearly full seminar room at MIT, he presented Cresceptron under the title “Frameworks for Visual Learning.” Weng said that he greatly appreciated the visit, with the host paying for the air ticket and accommodations.

Weng explained to me that hierarchical max-pooling has at least four advantages: (1) hierarchical tolerance of local location deformation, (2) increasing the size of receptive fields without necessarily increasing the size of the convolution kernels, since large convolution kernels are computationally very expensive, (3) reduction of feature detection density to tolerate feature-template matching errors, and (4) tolerance of local dropouts (absence of components due to, e.g., occlusions), because the maximum of the four values is independent of the three smaller values.

However, hierarchical max-pooling does not guarantee that the output of a deep convolutional network is invariant to object shifts in the pixel plane, as explained in the fully detailed 1997 journal publication of Cresceptron. Furthermore, the deep cascade architecture is still fundamentally weak, regardless of the size of the training set and the power of the computers, because it has no mechanism for automatic figure-ground segmentation on training and testing sets of the kind a brain can perform. The newer Developmental Network (DN) architecture has such a mechanism, through autonomous and incremental development.

About five years after Weng’s MIT visit of Oct. 19, 1994, Maximilian Riesenhuber and Tomaso Poggio published a paper in Nature Neuroscience that was received June 17, 1999. Its abstract reads “Surprisingly, little quantitative modeling has been done ... We describe a new hierarchical model ... The model is based on a MAX-like operation ... ” Its Fig. 2 caption cited Kunihiko Fukushima, but nowhere did the paper cite Cresceptron or its max-pooling method for the key max operation in their model.

Fukushima handpicked particular layers to reduce the location precision, but he did not use the two major mechanisms of max-pooling: (1) the maximization operation (see Eq. (4) in ) and (2) automatic, computer-driven reduction of the location resolution through every level of the network.

Later, Tomaso Poggio called their model HMAX but still did not cite Cresceptron.

To investigate whether idea plagiarism took place, compare, for example, the left-column display equation on page 124 of Weng et al. (1993), Eq. (17) of , the last equation in the last line of the left column on page 1024 of Riesenhuber and Poggio (1999), and Eq. (3) of . Also compare Fig. 10(c) of and the dashed arrows in Fig. 2 of .

Due to the introduction of key architectural mechanisms like max-pooling and the practicality of massively parallel computers such as GPUs, deep learning networks have shown increasing performance in many tests on some pattern recognition tasks and have attracted increasing interest from industry, including Google, Microsoft, IBM, Samsung, and Baidu.

The Nature Publishing Group’s policy document on plagiarism reads: “Discussion of published work: When discussing the published work of others, authors must properly describe the contribution of the earlier work. Both intellectual contributions and technical developments must be acknowledged as such and appropriately cited.”

For example, a paragraph within a paper that paraphrased a contribution without attributing it to its source was found by two independent committees, inquiry and investigative, to be plagiarism.

Respectfully and privately, Weng contacted Prof. Poggio a few times with regard to this issue, but he did not reply. Weng said: “I wish that your raising this issue does not upset Prof. Tommy Poggio. He is one of my respected teachers because his early papers introduced me to computational neuroscience at its early stage when I was a graduate student 1983-1988.”

In 1997, Prof. Poggio was elected a fellow of the American Academy of Arts and Sciences (AAAS).

REFERENCES

K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36:193–202, 1980.

A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.

Z. McMillin. MSU professor admits to plagiarism in 2008 article. The State News, April 6, 2010.

M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999.

J. Schmidhuber. Deep learning in neural networks: An overview. Technical Report IDSIA-03-14, The Swiss AI Lab IDSIA, Manno-Lugano, Switzerland, October 8, 2014.

T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio. Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(3):411–426, 2007.

M. B. Sticklen. Retraction: Plant genetic engineering for biofuel production: towards affordable cellulosic ethanol. Nature Reviews Genetics, 11(308), 2008.

J. Weng. Natural and Artificial Intelligence: Introduction to Computational Brain-Mind. BMI Press, Okemos, Michigan, 2012.

J. Weng, N. Ahuja, and T. S. Huang. Cresceptron: A self-organizing neural network which grows adaptively. In Proc. Int’l Joint Conference on Neural Networks, volume 1, pages 576–581, Baltimore, Maryland, June 1992.

J. Weng, N. Ahuja, and T. S. Huang. Learning recognition and segmentation of 3-D objects from 2-D images. In Proc. IEEE 4th Int’l Conf. Computer Vision, pages 121–128, May 1993.

J. Weng, N. Ahuja, and T. S. Huang. Learning recognition and segmentation using the Cresceptron. International Journal of Computer Vision, 25(2):109–143, Nov. 1997.

J. Weng and M. D. Luciw. Brain-inspired concept networks: Learning concepts from cluttered scenes. IEEE Intelligent Systems Magazine, 29(6):14–22, 2014.
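The (2x2) to 1 maximization described in the article can be illustrated with a few lines of code. This is a minimal sketch, not Cresceptron's implementation: it assumes non-overlapping 2x2 blocks on a plain 2-D list, and the response values are made up.

```python
def max_pool_2x2(grid):
    """Reduce each non-overlapping 2x2 block of a 2-D list to its maximum."""
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("this sketch expects even height and width")
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# Made-up 4x4 map of feature-detector responses (firing rates).
responses = [
    [0.1, 0.9, 0.0, 0.0],
    [0.2, 0.3, 0.0, 0.4],
    [0.0, 0.0, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.6],
]
pooled = max_pool_2x2(responses)
print(pooled)  # [[0.9, 0.4], [0.5, 0.7]]

# Move the strongest response one node down-left inside its 2x2 block.
shifted = [row[:] for row in responses]
shifted[0][1], shifted[1][0] = shifted[1][0], shifted[0][1]
print(max_pool_2x2(shifted) == pooled)  # True: the pooled map is unchanged
```

The second check shows the tolerance the article talks about: a response that moves within its 2x2 group produces the same pooled output, and the three smaller values in each group have no effect on the result. A shift that crosses a block boundary can still change the output, which matches the article's caveat that max-pooling does not guarantee shift invariance.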
Recommended Books

Here is a list of books that I have read and consider worth recommending to friends who are interested in computer science.

Machine Learning

Pattern Recognition and Machine Learning
Christopher M. Bishop
A new treatment of classic machine learning topics, such as classification, regression, and time series analysis, from a Bayesian perspective. It is a must-read for people who intend to do research on Bayesian learning and probabilistic inference.

Graphical Models, Exponential Families, and Variational Inference
Martin J. Wainwright and Michael I. Jordan
A comprehensive and brilliant presentation of three closely related subjects: graphical models, exponential families, and variational inference. This is the best manuscript that I have ever read on this subject. Strongly recommended to everyone interested in graphical models. The connections between various inference algorithms and convex optimization are clearly explained. Note: a pdf version of this book is freely available online.

Big Data: A Revolution That Will Transform How We Live, Work, and Think
Viktor Mayer-Schönberger and Kenneth Cukier
A short but insightful manuscript that will motivate you to rethink how we should face the explosive growth of data in the new century.

Statistical Pattern Recognition (2nd/3rd Edition)
Andrew R. Webb and Keith D. Copsey
A well-written book on pattern recognition for beginners. It covers basic topics in this field, including discriminant analysis, decision trees, feature selection, and clustering -- all basic knowledge that researchers in machine learning or pattern recognition should understand.

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Bernhard Schölkopf and Alexander J. Smola
A comprehensive and in-depth treatment of kernel methods and support vector machines. It not only clearly develops the mathematical foundation, namely the reproducing kernel Hilbert space, but also gives a lot of practical guidance (e.g. how to choose or design kernels).

Mathematics

Topology (2nd Edition)
James Munkres
A classic on topology for beginners. It provides a clear introduction to important concepts in general topology, such as continuity, connectedness, compactness, and metric spaces, which are the fundamentals that you have to grasp before embarking on more advanced subjects such as real analysis.

Introductory Functional Analysis with Applications
Erwin Kreyszig
A very well-written book on functional analysis that I would recommend to everyone who wants to study this subject for the first time. Starting from simple notions such as metrics and norms, the book gradually unfolds the beauty of functional analysis, covering important topics including Banach spaces, Hilbert spaces, and spectral theory in reasonable depth and breadth. Most of the important concepts needed in machine learning are covered by this book. The exercises are of great help in reinforcing your understanding.

Real Analysis and Probability (Cambridge Studies in Advanced Mathematics)
R. M. Dudley
A dense text that combines real analysis and modern probability theory in 500+ pages. What I like about this book is its emphasis on the interplay between real analysis and probability theory. The exposition of measure theory based on semi-rings also gives deep insight into the algebraic structure of measures.

Convex Optimization
Stephen Boyd and Lieven Vandenberghe
A classic on convex optimization. Everyone I know who has read this book liked it. The presentation style is very comfortable and inspiring, and it assumes only minimal prerequisites in linear algebra and calculus. Strongly recommended for any beginner in optimization. Note: the pdf of this book is freely available on Prof. Boyd's website.

Nonlinear Programming (2nd Edition)
Dimitri P. Bertsekas
A thorough treatment of nonlinear optimization. It covers gradient-based techniques, Lagrange multiplier theory, and convex programming. Part of this book overlaps with Boyd's. Overall, it goes deeper and takes more effort to read.

Introduction to Smooth Manifolds
John M. Lee
This is the book that I used to learn differential geometry and Lie group theory. It provides a detailed introduction to the basics of modern differential geometry -- manifolds, tangent spaces, and vector bundles. The connections between manifold theory and Lie group theory are also clearly explained. It also covers de Rham cohomology and Lie algebras, where the reader is invited to discover the beauty of linking geometry with algebra.

Modern Graph Theory
Béla Bollobás
A modern treatment of this classical theory, which emphasizes the connections with other mathematical subjects -- for example, random walks and electrical networks. I found some of the messages conveyed by this book enlightening for my research on machine learning methods.

Probability Theory: A Comprehensive Course (Universitext)
Achim Klenke
A complete coverage of modern probability theory. It includes not only traditional topics, such as measure theory, independence, and convergence theorems, but also topics that typically appear in textbooks on stochastic processes, such as martingales, Markov chains, Brownian motion, Poisson processes, and stochastic differential equations. It is recommended as the main textbook on probability theory.

A First Course in Stochastic Processes (2nd Edition)
Samuel Karlin and Howard M. Taylor
A classic textbook on stochastic processes which I think is particularly suitable for beginners without much background in measure theory. It provides a complete coverage of many important stochastic processes in an intuitive way. Its development of Markov processes and renewal processes is enlightening.

Poisson Processes (Oxford Studies in Probability)
J. F. C. Kingman
If you are interested in Bayesian nonparametrics, this is the book that you should definitely check out. This manuscript provides an unparalleled introduction to random point processes, including Poisson and Cox processes, and their deep theoretical connections with complete randomness.

Programming

Structure and Interpretation of Computer Programs (2nd Edition)
Harold Abelson, Gerald Jay Sussman, and Julie Sussman
A timeless classic that must be read by all computer science majors. While some topics and the use of Scheme as the teaching language seem odd at first glance, the presentation of fundamental concepts such as abstraction, recursion, and modularity is more beautiful and insightful than anything you will experience elsewhere.

Thinking in C++: Introduction to Standard C++ (2nd Edition)
Bruce Eckel
While it is kind of old (written in 2000), I still recommend this book to all beginners learning C++. The thoughts underlying object-oriented programming are very clearly explained. It also provides a comprehensive coverage of C++ at a well-tuned pace.

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition)
Scott Meyers
The Effective C++ series by Scott Meyers is a must for anyone who is serious about C++ programming. The items (rules) listed in this book convey the author's deep understanding of both C++ itself and modern software engineering principles. This edition reflects the latest updates in C++ development, including generic programming and the use of the TR1 library.

Advanced C++ Metaprogramming
Davide Di Gennaro
Like it or hate it, metaprogramming has played an increasingly important role in modern C++ development. If you asked me what key aspect distinguishes C++ from all other languages, I would say it is the unparalleled generic programming capability based on C++ templates. This book summarizes the advances in metaprogramming over the past decade. I believe it will take the place of Modern C++ Design (the book behind the Loki library) as the bible for C++ metaprogramming.

Introduction to Algorithms (2nd/3rd Edition)
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
If you know nothing about algorithms, you will never understand computer science. This book is definitely a classic on algorithms and data structures that everyone who is serious about computer science must read. Its contents range from elementary topics, such as classic sorting algorithms and hash tables, to advanced topics such as maximum flow, linear programming, and computational geometry. It is a book for everyone. Every time I read it, I learn something new.

Design Patterns: Elements of Reusable Object-Oriented Software
Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
Textbooks on C++, Java, or other languages typically use toy examples (animals, students, etc.) to illustrate the concepts of OOP. This approach, however, does not reflect the full strength of object-oriented programming. This book, which has been widely acknowledged as a classic in software engineering, shows you, via compelling examples distilled from real-world projects, how specific OOP patterns can vastly improve your code's reusability and extensibility.

Structured Parallel Programming: Patterns for Efficient Computation
Michael McCool, James Reinders, and Arch Robison
Recent hardware advancement has switched from increasing CPU frequencies to increasing the number of cores. A significant implication of this change is that the free lunch has come to an end -- you have to explicitly parallelize your code in order to benefit from the latest progress in CPUs and GPUs. This book summarizes common patterns used in parallel programming, such as mapping, reduction, and pipelining -- all very useful in writing parallel code.

Introduction to High Performance Computing for Scientists and Engineers
Georg Hager and Gerhard Wellein
This book covers important topics that you should know when developing high performance computing programs. In particular, it introduces SIMD, memory hierarchies, OpenMP, and MPI. With this knowledge in mind, you will understand the factors that might influence the run-time performance of your code.

CUDA Programming: A Developer's Guide to Parallel Computing with GPUs
Shane Cook
This book provides an in-depth coverage of important aspects of CUDA programming, a programming technique that can unleash the unparalleled power of GPU computation. With CUDA and an affordable GPU card, you can run in a matter of minutes a data analysis program that might otherwise require multiple servers running for hours.
Datawocky On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising 【立委按】讨论中提到,即便机器学习已经达到手工系统的水平,谷歌搜索的研发人员也不愿意转用机器学习。说担心机器学出来的模型在训练集未见的现象上铸成大错。而他们相信,手工系统对付未见现象不至于走偏太大。这个论点,不好置评。不过,我觉得,更主要的原因不在这里,而在遇到具体质量问题时,机器学习系统是一锅粥,很难 debug(除非不怕麻烦,重新训练去再煮一锅粥,但也常常是隔靴搔痒,很难保证这锅新粥对要解决的具体问题会奏效)。而手工系统只要设计合理(比如模块化设计,减少牵一发动全身的后果),具体问题具体对待,可直接针对性调控,debug 就容易多了。因此,即便质量相近的系统,机器学习也不占优势,因为不好维护调控以逐步提高质量(incremental enhancement)。 A couple of days ago I had coffee with Peter Norvig . Peter is currently Director of Research at Google. For several years until recently, he was the Director of Search Quality -- the key man at Google responsible for the quality of their search results. Peter also is an ACM Fellow and co-author of the best-selling AI textbook Artificial Intelligence: A Modern Approach . As such, Peter's insights into search are truly extraordinary. I have known Peter since 1996, when he joined a startup called Junglee , which I had started together with some friends from Stanford. Peter was Chief Scientist at Junglee until 1998, when Junglee was acquired by Amazon.com. I've always been a great admirer of Peter and have kept in touch with him through his short stint at NASA and then at Google. He's now taking a short leave of absence from Google to update his AI textbook. We had a fascinating discussion, and I'll be writing a couple of posts on topics we covered. It has long been known that Google's search algorithm actually works at 2 levels: An offline phase that extracts signals from a massive web crawl and usage data. An example of such a signal is page rank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, these signals are necessarily query-independent. You can think of them tags on the documents in the index. There are about 200 such signals. An online phase, in response to a user query. 
A subset of documents is identified based on the presence of the user's keywords. Then, these documents are ranked by a very fast algorithm that combines the 200 signals in-memory using a proprietary formula. The online, query-dependent phase appears to be made-to-order for machine learning algorithms. Tons of training data (both from usage and from the armies of raters employed by Google), and a manageable number of signals (200) -- these fit the supervised learning paradigm well, bringing into play an array of ML algorithms from simple regression methods to Support Vector Machines . And indeed, Google has tried methods such as these. Peter tells me that their best machine-learned model is now as good as, and sometimes better than, the hand-tuned formula on the results quality metrics that Google uses. The big surprise is that Google still uses the manually-crafted formula for its search results. They haven't cut over to the machine learned model yet. Peter suggests two reasons for this. The first is hubris: the human experts who created the algorithm believe they can do better than a machine-learned model. The second reason is more interesting. Google's search team worries that machine-learned models may be susceptible to catastrophic errors on searches that look very different from the training data. They believe the manually crafted model is less susceptible to such catastrophic errors on unforeseen query types. This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic. The classic example is the Black Swan , popularized by Nassim Taleb's eponymous book . 
Before the 17th century, the only swans encountered in the Western world were white. Thus, it was reasonable to conclude that all swans are white. Of course, when Australia was discovered, so were the black swans living there. Thus, a black swan is shorthand for something unexpected that is outside the model.

Taleb argues that black swans are more common than commonly assumed in the modern world. He divides phenomena into two classes:

Mediocristan, consisting of phenomena that fit the bell curve model, such as games of chance, height and weight in humans, and so on. Here future observations can be predicted by extrapolating from variations in statistics based on past observation (for example, sample means and standard deviations).

Extremistan, consisting of phenomena that don't fit the bell curve model, such as search queries, the stock market, the length of wars, and so on. Such phenomena can sometimes be modeled using power laws or fractal distributions, and sometimes not. In many cases, the very notion of a standard deviation is meaningless.

Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.

The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan. The very metrics these algorithms use, such as precision, recall, and root-mean-square error (RMSE), make sense only in Mediocristan. It's easy to fit the observed data and fail catastrophically on unseen data. My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows).

I'll leave it as an exercise to the interested graduate student to figure out whether new machine learning algorithms can be devised that work well in Extremistan, or prove that it cannot be done.
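To make the Mediocristan/Extremistan contrast concrete, here is a minimal sketch (not from the original post). It assumes a standard normal as the bell-curve stand-in and a Pareto distribution with tail index 1.1 as an arbitrary heavy-tailed stand-in, and measures how much of a sample's total comes from its single largest observation.

```python
import random


def largest_share(samples):
    """Fraction of the sample total contributed by the single largest value."""
    return max(samples) / sum(samples)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
n = 100_000

# "Mediocristan" stand-in: magnitudes of standard normal draws.
gaussian = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]

# "Extremistan" stand-in: Pareto draws with tail index 1.1 (an assumed value).
pareto = [rng.paretovariate(1.1) for _ in range(n)]

gaussian_share = largest_share(gaussian)
pareto_share = largest_share(pareto)

print(f"largest observation / total, Gaussian: {gaussian_share:.6f}")
print(f"largest observation / total, Pareto:   {pareto_share:.6f}")
```

In the bell-curve sample no single draw matters, while in the heavy-tailed sample one draw can account for a visible fraction of the total, which is why summary statistics fitted to past data can mislead there.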
Posted May 24, 2008 in Data Mining, Search, under the title "Are Machine-Learned Models Prone to Catastrophic Errors?". Source: http://anand.typepad.com/datawocky/2008/05/are-human-experts-less-prone-to-catastrophic-errors-than-machine-learned-models.html
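I have been studying sentiment analysis for a little over half a year and have read through most of the sentiment analysis papers in the ACL Anthology, though I have no real results of my own yet. What follows is a survey I wrote for a classmate's graduation project. The papers cited are mostly sentiment analysis papers from ACL, COLING and EMNLP over the years, so the tone is fairly academic. I do not plan to publish it, but since it may be useful later, and for academic reasons, please credit the source if you quote it ( http://blog.sina.com.cn/s/blog_48f3f8b10100irhl.html ).

Overview

Since Bo Pang introduced it in 2002, sentiment analysis has been studied extensively, especially polarity classification of online reviews, where the best reported accuracy now exceeds 90%. Deeper sentiment analysis, however, inevitably involves semantic analysis, and sentiment shifts within a text are common, so progress on deep-semantic and document-level sentiment analysis has been limited. Another open problem is the lack of a standard test corpus. Bo Pang's movie review dataset (http://www.cs.cornell.edu/people/pabo/movie-review-data/) and the MPQA corpus built by Theresa Wilson et al. (http://www.cs.pitt.edu/mpqa/) are the two most widely used datasets, but neither has been confirmed as an accepted standard.

Current research largely borrows machine learning methods from text classification and has not yet developed a methodology of its own; to some extent, sentiment analysis can be viewed as a special kind of text classification. Supervised learning is the most mature approach. Semi-supervised and unsupervised learning have been studied much less, and purely rule-based sentiment analysis has rarely been studied in the last couple of years. Because so much of the work is based on machine learning, feature selection is an important question. N-grams and other syntactic features are the most widely used; semantic features (semantic computation) and structural features (tree kernels) perform far worse than syntactic features from a text classification standpoint, so they receive little attention.

Supervised sentiment analysis is already mature. In the real world, however, test data far outnumbers training data, and its domain is not restricted to match the training domain as supervised learning assumes. In other words, the inductive bias assumed by current sentiment analysis is too strong for the real world. To match reality, semi-supervised or weakly supervised sentiment analysis and cross-domain sentiment analysis are bound to be among the future research trends.

In the early days, semantics-based and rule-based sentiment analysis received considerable attention, but because of their implementation complexity and the success of text classification and machine learning methods, there is now little work in this direction. Yet dependence on semantics and on context is exactly where sentiment analysis differs most from text classification, so combining semantics-based and rule-based sentiment analysis with machine learning will also be one of the future trends.

The sections below briefly introduce the origins of sentiment analysis and current work on supervised, unsupervised, rule-based and cross-domain sentiment analysis.

Origins

Although there was some earlier related work, it is generally agreed that systematic research on sentiment analysis began with (Pang et al., 2002), who classified the polarity of movie reviews with supervised learning, and (Turney, 2002), who classified text polarity with unsupervised learning. (Pang et al., 2002) used features such as n-grams and part-of-speech (POS) tags with Naive Bayes, Maximum Entropy and Support Vector Machines (SVM) to classify polarity as positive or negative; this binary split is still in use today. The movie review dataset from their experiments has become a widely used test set. (Turney, 2002) used pointwise mutual information (PMI) to compute the similarity between key phrases extracted from the text and the seed words (excellent, poor) in order to determine polarity (the SO-PMI algorithm).

Most later work builds on (Pang et al., 2002). By comparison, the unsupervised method of (Turney, 2002) is simpler to implement, but sentiment similarity between words is hard to compute accurately and seed words are hard to choose, so there has been little follow-up work in the unsupervised direction. The idea of using SO-PMI to compute polarity, however, has been adopted by many researchers.

Supervised learning

Supervised sentiment analysis is still the mainstream. Apart from (Li et al., 2009), based on non-negative matrix tri-factorization, and (Abbasi et al., 2008), based on a genetic algorithm, the most widely used supervised algorithms are Naive Bayes, k-nearest neighbor (k-NN), Maximum Entropy and SVM. Improvements are made mainly in the text preprocessing stage.

One difference from text classification is that sentiment analysis sometimes needs to extract the sentences that actually express sentiment. (Pang et al., 2004), through selection of subjective sentences, and (Wilson et al., 2009), through analysis of neutral instances, both aim to isolate the sentences in a text that truly express sentiment. (Abbasi et al., 2008) proposed using information gain (IG) to select, from a large feature set, the features that benefit sentiment analysis.

As for feature selection, beyond n-gram and POS features, (Wilson et al., 2009) proposed sentiment analysis with a mixture of syntactic features: word features, negation features, sentiment modifier features, sentiment shifter features and so on. (Abbasi et al., 2008) proposed combining syntactic features of sentences (n-grams, POS, punctuation) with structural features (word length, number of words in each POS class, structural features of the text, and so on).

Beyond preprocessing, supervised sentiment analysis has also been studied along the following lines. (Melville et al., 2009) and (Li et al., 2009) combine the prior, lexicon-based polarity of sentiment words with the posterior, context-based polarity learned from training text to judge the polarity of a text. (Taboada et al., 2009) combine the genre of the text (description, comment, background, interpretation, etc.) with features of the text itself. (Tsutsumi et al., 2007) classify sentiment by combining multiple classifiers. (Wan, 2008) and (Wan, 2009) use the rich sentiment analysis resources available for English to improve Chinese sentiment analysis.

Rule-based / unsupervised learning

Compared with supervised sentiment analysis, there is less work on rule-based and unsupervised methods. Besides (Turney, 2002), (Zhu Yanlan et al., 2006) used HowNet to compute the semantic orientation of Chinese words. (Lou Decheng et al., 2006) used syntactic structure and dependency relations to analyze the sentiment of Chinese sentences. (Hiroshi et al., 2004) adapted a rule-based machine translation engine to perform phrase-level sentiment analysis of Japanese. (Zagibalov et al., 2008) built on the SO-PMI algorithm of (Turney, 2002), and through careful analysis of the characteristics of Chinese text plus an iterative mechanism, considerably improved the accuracy of unsupervised sentiment analysis.

Cross-domain sentiment analysis

Cross-domain sentiment analysis is an emerging area with little work so far, mainly because current research has not found a good way to establish a mapping between two domains, or in other words a way to balance feature weights between two domains. Work in this area began when (Blitzer et al., 2007) introduced Structural Correspondence Learning (SCL) to cross-domain sentiment analysis. SCL is a widely applicable cross-domain text analysis algorithm whose goal is to map features of the training set onto the test set as far as possible. (Tan et al., 2009) brought SCL to Chinese cross-domain sentiment analysis. (Tan 2 et al., 2009) applied a semi-supervised method combining Naive Bayes and the EM algorithm to cross-domain sentiment analysis. (Wu et al., 2009) applied a graph ranking algorithm, in the spirit of EM, to cross-domain sentiment analysis; graph ranking can be regarded as an iterative k-NN algorithm.

Current work shows that the main problem in cross-domain sentiment analysis is finding a mapping between two domains, but such a mapping is either hard to find or needs fairly strong mathematical justification. Many studies therefore borrow semi-supervised methods and reduce the difference between the training and test sets step by step through repeated iteration.

References:

Xiaojun Wan. Using Bilingual Knowledge and Ensemble Techniques for Unsupervised Chinese Sentiment Analysis. Proceedings of EMNLP-08, 553-561.

Xiaojun Wan. Co-Training for Cross-Lingual Sentiment Classification. Proceedings of ACL-09, 234-243.

Theresa Wilson, Janyce Wiebe, Paul Hoffmann.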
Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis. Computational Linguistics, 35(3), 399-433.

Ahmed Abbasi, Hsinchun Chen, Arab Salem. Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums. ACM Transactions on Information Systems, 26(3), 12:1-12:34.

Prem Melville, Wojciech Gryc, Richard D. Lawrence. Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification. Proceedings of KDD-09, 1275-1283.

KANAYAMA Hiroshi, NASUKAWA Tetsuya, WATANABE Hideo. Deep Sentiment Analysis Using Machine Translation Technology. Proceedings of Coling-04.

Maite Taboada, Julian Brooke, Manfred Stede. Genre-Based Paragraph Classification for Sentiment Analysis. Proceedings of SIGDIAL-09, 62-70.

Taras Zagibalov, John Carroll. Automatic Seed Word Selection for Unsupervised Sentiment Classification of Chinese Text. Proceedings of Coling-08, 1073-1080.

Bo Pang, Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of ACL-04.

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of EMNLP-02, 79-86.

Peter D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of ACL-02, 417-424.

Kimitaka Tsutsumi, Kazutaka Shimada, Tsutomu Endo. Movie Review Classification Based on a Multiple Classifier. Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation (PACLIC21), 481-488.

John Blitzer, Mark Dredze, Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 440-447.

Songbo Tan, Xueqi Cheng. Improving SCL Model for Sentiment-Transfer Learning. Proceedings of NAACL HLT 2009: Short Papers, 181-184.

Songbo Tan, Xueqi Cheng, Yuefen Wang, Hongbo Xu. Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis. ECIR 2009, 337-349.

Qiong Wu, Songbo Tan, Xueqi Cheng. Graph Ranking for Sentiment Transfer. Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, 317-320.

Tao Li, Yi Zhang, Vikas Sindhwani. A Non-negative Matrix Tri-factorization Approach to Sentiment Classification with Lexical Prior Knowledge. Proceedings of ACL-09, 244-252.

娄德成, 姚天妨. 汉语句子语义极性分析和观点抽取方法的研究 (A study of semantic polarity analysis and opinion extraction for Chinese sentences). 计算机应用, 2006, 26(11), 2622-2625.

朱嫣岚, 闵锦, 周雅倩, 黄萱菁, 吴立德. 基于HowNet的词汇语义倾向计算 (Computing the semantic orientation of words based on HowNet). 中文信息学报, 2006, 20(1), 14-20.
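As a rough illustration of the SO-PMI idea described in the Origins section of the survey, the sketch below scores a word by how much more often it co-occurs with "excellent" than with "poor". It is an assumption-laden toy: co-occurrence is counted per document over a tiny made-up corpus rather than through search-engine hit counts, and the function name `so_pmi` and the smoothing constant 0.01 are choices made here for illustration.

```python
import math


def so_pmi(word, docs, pos_seed="excellent", neg_seed="poor", smooth=0.01):
    """Semantic orientation of `word` as PMI(word, pos_seed) - PMI(word, neg_seed).

    `docs` is a list of token sets; both seed words must occur at least once.
    """
    def hits(*terms):
        # Number of documents that contain every one of the given terms.
        return sum(all(t in doc for t in terms) for doc in docs)

    numerator = (hits(word, pos_seed) + smooth) * hits(neg_seed)
    denominator = (hits(word, neg_seed) + smooth) * hits(pos_seed)
    return math.log2(numerator / denominator)


# Hypothetical toy corpus, invented for this example only.
corpus = [
    {"great", "excellent", "plot"},
    {"great", "excellent", "acting"},
    {"awful", "poor", "plot"},
    {"awful", "poor", "acting"},
    {"great", "acting"},
]

for w in ("great", "awful", "plot"):
    print(w, round(so_pmi(w, corpus), 2))
```

A positive score suggests positive orientation, a negative score the opposite, and a word that appears equally with both seeds scores zero. The sketch also makes the survey's caveat visible: the result depends entirely on the choice of seed words and on how co-occurrence is counted.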
If you can't explain it simply, you don't understand it well enough. A new theory that is not built on a physical picture even a child can understand is very likely worthless. (Attachment: 爱因斯坦的宇宙.pdf) -- Albert Einstein

Seminar requirements:

1. Write the text in English wherever possible (the textbook's own words); do not translate.
2. Use the common PPT template ( ppt模板 ).
3. When a slide quotes a formula from the textbook, give its equation number where possible.
4. When presenting a figure from the book, analyze and compare it; do not simply tell the audience that this one is a \beta distribution and that one is a Gaussian.

Textbooks for this semester

Christopher M. Bishop, Pattern Recognition and Machine Learning (PRML). 2006, New York: Springer. ( .pdf , errata.pdf , Solutions to the Exercises.pdf , PRML 笔记.pdf , chp01.ppt , chp02.ppt , chp03.ppt , chp08.ppt )

Dougherty, G., Pattern Recognition and Classification: An Introduction. 2013, New York: Springer. ( .pdf , data.rar , code.rar )

Kevin P. Murphy, Machine Learning: a Probabilistic Perspective (MLaPP). 2012, MIT Press. ( MLapp.pdf , code.rar )

Seminar log

2014/03/08, Sat., 广C-616, seminar. Speakers: 石丽 ( 验证码识别.ppt ), 蔡逸飞 ( 图像修复.pptx ), 梁浩然 ( 深度学习.pptx ).

2014/03/15, Sat., 广C-616, seminar. Speakers: 吴烨 ( 人脑神经纤维成像及三维重构研究.pptx ), 蔡逸飞 ( 图像修复(2).pptx , Criminisi.pdf , MinimumErrorBoundaryCut.pdf , PSO_Inpainting.pdf ).

2014/04/09, Wed., 广C-515, group seminar. Talk: 模式识别的非参数方法.ppt

2014/04/16, Wed., 广C-515, group seminar. Speaker: 李泽界, on Sparse Subspace Clustering: Algorithm, Theory, and Applications (Elhamifar et al., 2013, TPAMI). The main idea is motivated by the famous paper Robust face recognition via sparse representation (J. Wright et al., 2009, TPAMI), where the labels of the training data are assumed to be known beforehand, and the prior information that the test sample y can be sparsely represented by the training samples from the same subspace as y is used for classification. For the clustering problem the situation is reversed: the labels of the training data are unknown, and the sparse prior is again used, this time to derive the labels. In this paper, Elhamifar et al. use the sparse coefficient matrix C of the data Y to calculate the weights W between data points (corresponding to the weights W on the edges of the similarity graph), and then apply spectral clustering to the similarity graph.
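Therefore, sparse representation and spectral clustering are the two key points of this paper. Unfortunately, neither of them was made clear in the seminar.

2014/04/19, Sat., 广C-515, seminar: 石丽 (Bayes decision theory) and Mordekai (distributions).

2014/05/04, Sun., overcast and rainy, 广C-515. 蔡逸飞 introduced the Gaussian distribution (Sec. 2.3.1 ~ Sec. 2.3.4), 沈闻佳 (Chp. 1, a summary of curve fitting), Mordekai (Chp. 1, Sec. 1.5 Decision Theory).

2014/05/11, Sun., rain, 广C-616. Second discussion of Sec. 2.3 The Gaussian Distribution.

[Speakers] 高晨晖 (Sec. 2.3, introduction), 史金专 (Sec. 2.3.1~2.3.2), 姜海军 (Sec. 2.3.3), 沈闻佳 (Sec. 2.3.4, ppt ), 李小薪 (Sec. 2.3.5~2.3.7)

[Comments on the talks] Everyone put a lot of work into preparing, but 沈闻佳's talk was the best: it gave a clear and rigorous derivation of the maximum likelihood estimates of the mean and covariance of the Gaussian, and rigorously proved which estimate is biased and which is unbiased. It was the first time I heard such a rigorous and clear talk from a student in this seminar, and I was delighted. Stendhal, the author of The Red and the Black, said that absolute clarity is the only beauty of style. Isn't that what we should be striving to learn?

[Review] 2.3.1~2.3.3, conditional and marginal Gaussian distributions; 2.3.4~2.3.6, parameter estimation for the Gaussian: maximum likelihood, sequential maximum likelihood estimation, and Bayesian estimation (inference); 2.3.7, the Student's t-distribution, derived from the Gaussian-Gamma distribution (which arises when estimating the precision \lambda of a univariate Gaussian); 2.3.8, Gaussian distributions for periodic variables; 2.3.9, mixtures of Gaussians.

Introduction. The Gaussian distribution is extremely important because it "arises in many different contexts and can be motivated from a variety of different perspectives": for example, the distribution that maximizes entropy is the Gaussian, and the central limit theorem and the binomial distribution are both related to it.

The section first proves that the multivariate Gaussian (2.43) is normalized (its integral from -\infty to +\infty is 1). Starting from the geometric form of the Gaussian (the Mahalanobis distance), it focuses on the covariance matrix (the covariance matrix is the main difference between Mahalanobis and Euclidean distance). Using the expansion of the covariance matrix in terms of its eigenvectors, it transforms the Gaussian from the original space into the space whose axes are the eigenvectors of the covariance matrix, and in that transformed space it is easy to show that (2.43) is indeed normalized.

It then proves that the mean of the multivariate Gaussian is \mu and its covariance is \Sigma. This proof relies on the important change of variables z = x - \mu, which makes the argument simple and clear.

Finally, it discusses the limitations of the multivariate Gaussian, which leads to the concepts of latent/hidden variables, mixtures of Gaussians, hierarchical models and probabilistic graphical models.

Conditional Gaussian distributions. If the whole is Gaussian, what distribution does a part follow, conditioned on the rest? That is: given x = (x_a, x_b) ~ N(x | \mu, \Sigma), what distribution does p(x_a | x_b) follow? The author shows in one sentence that p(x_a | x_b) is Gaussian: From the product rule of probability, we see that this conditional distribution can be evaluated from the joint distribution simply by fixing x_b to the observed value and normalizing the resulting expression to obtain a valid probability distribution over x_a.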
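By the product rule, p(x_a | x_b) = p(x_a, x_b) / p(x_b), and the p(x_b) in the denominator can be seen as being there only to normalize p(x_a | x_b). Computing p(x_a | x_b) therefore amounts to treating p(x_a, x_b) as a function of x_a (with x_b held constant) and then normalizing. Since p(x_a, x_b) is Gaussian, it should also be Gaussian when viewed as a function of x_a (this statement may not be entirely accurate), so p(x_a | x_b) is Gaussian as well. Building on this conclusion, the author uses the simple trick of "completing the square" to obtain the relationship between the mean and covariance of p(x_a | x_b) (which is also Gaussian) and \mu and \Sigma.

Marginal Gaussian distributions. If the whole is Gaussian, what distribution does a part follow? That is: given x = (x_a, x_b) ~ N(x | \mu, \Sigma), what distribution does p(x_a) follow? Since p(x_a) = \int p(x_a, x_b) dx_b, it suffices to integrate out x_b to obtain p(x_a), and then read off its mean and covariance from the form of p(x_a). Consider first how to integrate out x_b. The integral looks complicated, but the author again shows how to simplify: take the terms involving x_b out of the joint expression (2.70), use "completing the square" to turn them into a Gaussian-style quadratic form like -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu), namely (2.84), and then use the fact that a Gaussian integrates to 1 to evaluate this part of the integral. For reference, (2.84) reads

-\frac{1}{2} x_b^T \Lambda_{bb} x_b + x_b^T m = -\frac{1}{2} (x_b - \Lambda_{bb}^{-1} m)^T \Lambda_{bb} (x_b - \Lambda_{bb}^{-1} m) + \frac{1}{2} m^T \Lambda_{bb}^{-1} m, with m = \Lambda_{bb} \mu_b - \Lambda_{ba} (x_a - \mu_a),

and it can be checked by expanding the right-hand side: the cross terms give x_b^T m, and the constant \frac{1}{2} m^T \Lambda_{bb}^{-1} m cancels.

Bayes' Theorem for Gaussian Variables. Given that p(x) and p(y|x) are both Gaussian and that the mean of p(y|x) is a linear function of x (a linear Gaussian model), find p(y) and p(x|y). The approach: first compute the joint PDF from p(x, y) = p(x) p(y|x), then use the marginal Gaussian result to get p(y) from p(x, y), and the conditional Gaussian result to get p(x|y) from p(x, y).

Maximum likelihood for the Gaussian. See 沈闻佳's PPT.

Sequential estimation. When the number of samples is very large, batch maximum likelihood estimation for the Gaussian becomes difficult, and a sequential method can be used instead. The maximum likelihood estimate is the solution obtained by setting the derivative of the log-likelihood to 0. The Robbins-Monro algorithm gives a general formula for finding such a root: \theta^{(N)} = \theta^{(N-1)} - a_{N-1} z(\theta^{(N-1)}), and the resulting sequence converges with probability 1 to the root, where the regression function is 0.

Bayesian inference for the Gaussian. Maximum likelihood places no constraint on the parameters through a prior. This section explains how to use conjugate priors to impose a prior when estimating the mean, the covariance, or both. The conjugate prior for the mean is again a Gaussian; the conjugate prior for the precision (the inverse of the covariance) is the gamma distribution (univariate Gaussian) or the Wishart distribution; the conjugate prior for the mean and precision together is the Gaussian-gamma distribution. For the univariate Gaussian, equation (2.140) uses completing the square to give the estimates of the mean and variance (this corresponds to Exercise 2.38). The derivation is as follows.

First, in the likelihood function, the standard quadratic form in \mu of the exponent is

-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 = -\frac{N}{2\sigma^2} \mu^2 + \frac{N \mu_{ML}}{\sigma^2} \mu + const, (7.1)

The maximum likelihood estimate of \mu is \mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n, which appears in the linear term; the precision appears as the coefficient of the quadratic term: N / \sigma^2.

For \mu with the prior N(\mu | \mu_0, \sigma_0^2) imposed, the exponent becomes

-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{1}{2\sigma_0^2} (\mu - \mu_0)^2,

which is brought into the standard quadratic form as follows (note that it is a quadratic in \mu, and constant terms are ignored):

-\frac{1}{2} \left( \frac{N}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \mu^2 + \left( \frac{N \mu_{ML}}{\sigma^2} + \frac{\mu_0}{\sigma_0^2} \right) \mu + const.

Comparing with (7.1) gives

\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2} and \mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2} \mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2} \mu_{ML}.

Student's t-distribution. The main point is the robustness of the t-distribution; see Fig. 2.16 on p. 104. Code for it is the implementation of Fig. 2.8 on p. 40 of MLaPP: robustDemo.m . The MLaPP code can be downloaded from here.

2014/05/17, Sat., light rain, 广C-616. Discussion: Sec. 2.4 The Exponential Family. 姜海军 presented the Exponential Family; 蔡逸飞 presented Fig. 2.16 ( robustDemo.m ).

Seminar schedule (date, section, speaker)

2014/05/17, Saturday:
Fig. 2.16, robustDemo.m (code downloadable from here, MLapp.pdf): 蔡逸飞
Sec. 2.3.8~2.3.9: 姜晓睿
Sec. 2.4 Exponential Family: 姜海军
Sec. 3.1 Linear Basis Function Models: 朱娅妮

2014/05/24, Saturday:
Sec. 2.3.8~2.3.9: 姜晓睿
Sec. 3.2 The Bias-Variance Decomposition: 高晨晖
Sec. 3.3 Bayesian Linear Regression: 史金专
Sec. 3.4 Bayesian Model Comparison: 石丽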
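Returning to the notes on sequential estimation and Bayesian inference for the Gaussian mean, the following small sketch may help check the formulas numerically. It assumes a univariate Gaussian with known variance; the function names are invented here, and the update rules are the standard textbook ones (running-mean update, and the posterior mean and variance under a Gaussian prior on the mean).

```python
def sequential_mean(xs):
    """Maximum likelihood mean computed one sample at a time.

    Uses mu_N = mu_{N-1} + (x_N - mu_{N-1}) / N, so no sample has to be stored.
    """
    mu = 0.0
    for n, x in enumerate(xs, start=1):
        mu += (x - mu) / n
    return mu


def posterior_mean_var(xs, sigma2, mu0, sigma0_2):
    """Posterior over the mean of a Gaussian with known variance `sigma2`,
    given a Gaussian prior N(mu0, sigma0_2) on that mean."""
    n = len(xs)
    mu_ml = sum(xs) / n
    precision_n = 1.0 / sigma0_2 + n / sigma2      # precisions add
    var_n = 1.0 / precision_n
    mu_n = var_n * (mu0 / sigma0_2 + n * mu_ml / sigma2)
    return mu_n, var_n


data = [2.0, 4.0, 6.0]
print(sequential_mean(data))                        # matches the batch mean
print(posterior_mean_var(data, sigma2=1.0, mu0=0.0, sigma0_2=1.0))
```

With three observations and a unit-variance prior centred at 0, the posterior mean sits between the prior mean and the sample mean, pulled three quarters of the way toward the data, which is the weighting the completed-square derivation predicts.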
Original post: http://blog.csdn.net/xiaowei_cqu/article/details/23689189

OpenCV implements two versions of the Gaussian Mixture-based Background/Foreground Segmentation Algorithm. The calling interface is straightforward and the results are good.

BackgroundSubtractorMOG usage example

    #include <opencv2/opencv.hpp>
    using namespace cv;

    int main() {
        VideoCapture video("1.avi");
        Mat frame, mask;
        int frameNum = 0;
        // history = 20 frames, 10 mixtures, backgroundRatio = 0.5, noiseSigma = 0
        BackgroundSubtractorMOG bgSubtractor(20, 10, 0.5, 0);
        while (true) {
            video >> frame;
            if (frame.empty()) break;          // stop at the end of the video
            ++frameNum;
            bgSubtractor(frame, mask, 0.001);  // update the model, get the foreground mask
            imshow("mask", mask);
            waitKey(10);
        }
        return 0;
    }

The object can be built with either the default constructor or the constructor with parameters:

    BackgroundSubtractorMOG::BackgroundSubtractorMOG()
    BackgroundSubtractorMOG::BackgroundSubtractorMOG(int history, int nmixtures, double backgroundRatio, double noiseSigma=0)

Here history is the number of past frames used, nmixtures is the number of Gaussian mixtures, backgroundRatio is the background ratio, and noiseSigma is the noise strength.

The only calling interface is the overloaded operator():

    void BackgroundSubtractorMOG::operator()(InputArray image, OutputArray fgmask, double learningRate=0)

Here image is the current frame, fgmask is the output foreground mask, and learningRate is the background learning rate.

[Screenshot: foreground/background detection with BackgroundSubtractorMOG]

BackgroundSubtractorMOG2 usage example

    #include <iostream>
    #include <opencv2/opencv.hpp>
    using namespace cv;
    using namespace std;

    int main() {
        VideoCapture video("1.avi");
        Mat frame, mask;
        int frameNum = 0;
        // history = 20 frames, varThreshold = 16, shadow detection on
        BackgroundSubtractorMOG2 bgSubtractor(20, 16, true);
        while (true) {
            video >> frame;
            if (frame.empty()) break;          // stop at the end of the video
            ++frameNum;
            bgSubtractor(frame, mask, 0.001);
            cout << frameNum << endl;
            // imshow("mask", mask);
            // waitKey(10);
        }
        return 0;
    }

As before, either the default constructor or the constructor with parameters can be used:

    BackgroundSubtractorMOG2::BackgroundSubtractorMOG2()
    BackgroundSubtractorMOG2::BackgroundSubtractorMOG2(int history, float varThreshold, bool bShadowDetection=true)

history is the same as above. varThreshold is the threshold on the squared Mahalanobis distance used to decide whether a pixel is background (it does not affect the background update rate). bShadowDetection says whether shadow detection is enabled (if it is, shadows are marked with 127 in the mask).

The per-frame detection is invoked through the overloaded operator():

    void BackgroundSubtractorMOG2::operator()(InputArray image, OutputArray fgmask, double learningRate=-1)

The parameters have the same meaning as in BackgroundSubtractorMOG's operator().

BackgroundSubtractorMOG2 also provides getBackgroundImage(), which returns the background image:

    void BackgroundSubtractorMOG2::getBackgroundImage(OutputArray backgroundImage)

The OpenCV refman also says that other model-related parameters can be changed after the object is created. The annoying part is that OpenCV declares these parameters as protected and provides no accessors, so to change them you still have to modify the source file and add accessors yourself.

    protected:
        Size frameSize;
        int frameType;
        Mat bgmodel;
        Mat bgmodelUsedModes;  // keep track of number of modes per pixel
        int nframes;
        int history;
        int nmixtures;
        //! here it is the maximum allowed number of mixture components.
        //! Actual number is determined dynamically per pixel
        double varThreshold;
        // threshold on the squared Mahalanobis distance to decide if it is well described
        // by the background model or not. Related to Cthr from the paper.
        // This does not influence the update of the background. A typical value could be 4 sigma
        // and that is varThreshold=4*4=16; Corresponds to Tb in the paper.

        /////////////////////////
        // less important parameters - things you might change but be careful
        ////////////////////////
        float backgroundRatio;
        // corresponds to fTB=1-cf from the paper
        // TB - threshold when the component becomes significant enough to be included into
        // the background model. It is the TB=1-cf from the paper. So I use cf=0.1 => TB=0.9
        // For alpha=0.001 it means that the mode should exist for approximately 105 frames before
        // it is considered foreground
        // float noiseSigma;
        float varThresholdGen;
        // corresponds to Tg - threshold on the squared Mahalan. dist. to decide
        // when a sample is close to the existing components. If it is not close
        // to any a new component will be generated. I use 3 sigma => Tg=3*3=9.
        // Smaller Tg leads to more generated components and higher Tg might
        // lead to small number of components but they can grow too large
        float fVarInit;
        float fVarMin;
        float fVarMax;
        // initial variance for the newly generated components.
        // It will influence the speed of adaptation. A good guess should be made.
        // A simple way is to estimate the typical standard deviation from the images.
        // I used here 10 as a reasonable value
        // min and max can be used to further control the variance
        float fCT;  // CT - complexity reduction prior
        // this is related to the number of samples needed to accept that a component
        // actually exists. We use CT=0.05 of all the samples. By setting CT=0 you get
        // the standard Stauffer&Grimson algorithm (maybe not exact but very similar)

        // shadow detection parameters
        bool bShadowDetection;           // default 1 - do shadow detection
        unsigned char nShadowDetection;  // do shadow detection - insert this value as the detection result - 127 default value
        float fTau;
        // Tau - shadow threshold. The shadow is detected if the pixel is darker
        // version of the background. Tau is a threshold on how much darker the shadow can be.
        // Tau= 0.5 means that if pixel is more than 2 times darker then it is not shadow
        // See: Prati,Mikic,Trivedi,Cucchiarra,"Detecting Moving Shadows...",IEEE PAMI,2003.

[Screenshot: foreground and background detected with BackgroundSubtractorMOG2]

References:

KaewTraKulPong, Pakorn, and Richard Bowden. An improved adaptive background mixture model for real-time tracking with shadow detection. Video-Based Surveillance Systems. Springer US, 2002. 135-144.

Zivkovic, Zoran. Improved adaptive Gaussian mixture model for background subtraction. Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 2. IEEE, 2004.
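To make the role of varThreshold and learningRate more tangible, here is a deliberately simplified sketch. It is not OpenCV's implementation: it models a single grey-level pixel with one Gaussian instead of a mixture, and the function names are invented for illustration. It only shows how a squared Mahalanobis distance is compared against a threshold such as 16 (that is, 4 sigma) while the model is updated at a small learning rate regardless of the classification.

```python
def squared_mahalanobis(x, mean, var):
    """Squared Mahalanobis distance of a scalar pixel value from a 1-D Gaussian."""
    return (x - mean) ** 2 / var


def classify_and_update(x, mean, var, learning_rate=0.001, var_threshold=16.0):
    """Classify one pixel value, then adapt the single-Gaussian background model.

    Returns (is_foreground, new_mean, new_var). The threshold only affects the
    classification; the update below is applied to every sample.
    """
    is_foreground = squared_mahalanobis(x, mean, var) > var_threshold
    diff = x - mean
    new_mean = mean + learning_rate * diff
    new_var = var + learning_rate * (diff * diff - var)
    return is_foreground, new_mean, new_var


# Background model: mean 100, variance 25 (standard deviation 5).
print(classify_and_update(105, 100.0, 25.0))  # 1 sigma away: background
print(classify_and_update(130, 100.0, 25.0))  # 6 sigma away: foreground
```

A larger learning rate lets the model absorb scene changes faster, at the cost of slow-moving objects fading into the background sooner.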
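The title is a bit sensational.

The New York Times recently hired Chris Wiggins, a biology researcher at Columbia University, as its chief data scientist. For the full story see: WHY THE NEW YORK TIMES HIRED A BIOLOGY RESEARCHER AS ITS CHIEF DATA SCIENTIST

What Chris has to do is easy to state: predict which users are going to unsubscribe from The New York Times ( Unsubscribing? The New York Times Wants to Predict That. ). Doing it is not easy at all. One look at Chris's CV makes this clear: the job takes a strong background in statistics and also a deep grounding in machine learning (ML).

Statistics is already a household discipline, but machine learning may still be unfamiliar to some researchers. ML is a branch of artificial intelligence that is already widely applied in many fields. Master it and, like Chris, you can open up a new career path. There is more encouraging news:

Google this year spent $400 million to acquire a single machine-learning startup (see Is Google Cornering the Market on Deep Learning? ).

Amazon is currently advertising positions for 40 machine-learning scientists, in addition to the scores it already employs.

Beyond these, many well-known international companies (such as Microsoft and IBM) take ML very seriously; see Microsoft Research - machine learning department. In today's era of big data, ML has become indispensable.

Of course, if you would rather not move into another industry, there is plenty to do within biology. ML has become a core technique in bioinformatics and systems biology, and a great deal of theoretical and applied research builds on it. In the well-known Human ENCODE project, ML has already played a major role ( Machine learning approaches to genomics: ENCODE has applied machine learning approaches to enable integration and exploration of large and diverse data ). Search PubMed for the keyword "machine learning" and you will get more than 5000 ML-related research articles. If you are interested, start with the reviews among them.

Finally, a plug for our own work: we recently used ML to mine stress-responsive genes in Arabidopsis. Analysis of six abiotic stresses shows that ML predicts stress-responsive genes more accurately than traditional differential expression analysis, and it can also predict a set of candidate stress-responsive genes whose expression levels change only slightly. Anyone interested is welcome to dig further into these predictions. For details see: Machine Learning-Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis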
In this lecture, Prof. Lin classifies machine learning algorithms from four different angles.

1. Learning with Different Output Space $\mathcal{Y}$

binary classification: $\mathcal{Y} = \{+1, -1\}$;
multiclass classification: $\mathcal{Y} = \{1, 2, \cdots, K\}$;
regression: $\mathcal{Y} = \mathbb{R}$;
structured learning: $\mathcal{Y} = $ structures;
......and a lot more!!

2. Learning with Different Data Label $y_n$

supervised: all labels $y_n$ known;
unsupervised: no labels known;
semi-supervised: some labels known;
reinforcement: implicit $y_n$ by goodness($\tilde{y}_n$);
......and more!!

3. Learning with Different Protocol $f \Rightarrow (x_n, y_n)$

Protocol $\Leftrightarrow$ Learning Philosophy

batch: duck feeding (learn everything at the same time);
online: passive sequential (learn from one sample at a time);
active: question asking (sequentially): query the $y_n$ of the chosen $x_n$.

Active learning: improve the hypothesis with fewer labels (hopefully) by asking questions strategically.

There is also mini-batch, which sits between batch and online: each step learns from a small subset of the data.

4. Learning with Different Input Space $\mathcal{X}$

concrete: sophisticated (and related) physical meaning;
raw: simple physical meaning;
abstract: no (or little) physical meaning;
......and more!!

Concrete features: each dimension of $\mathcal{X} \subseteq \mathbb{R}^d$ represents a sophisticated physical meaning. Concrete features are those that capture the most essential differences or connections for the learning task at hand.
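The difference between the batch, online and mini-batch protocols comes down to how many samples the learner sees per update. The sketch below illustrates only that data-feeding pattern; the function name `feed` and the protocol strings are hypothetical choices for this example, and no actual learning algorithm is included.

```python
def feed(data, protocol, batch_size=2):
    """Yield the chunks of `data` a learner would see under each protocol."""
    if protocol == "batch":
        size = max(1, len(data))   # everything at once
    elif protocol == "online":
        size = 1                   # one sample per step
    elif protocol == "mini-batch":
        size = batch_size          # a small subset per step
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    for start in range(0, len(data), size):
        yield data[start:start + size]


samples = list(range(5))
for protocol in ("batch", "online", "mini-batch"):
    print(protocol, list(feed(samples, protocol)))
```

Active learning does not fit this pattern, because there the learner itself chooses which $x_n$ to ask about next.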
Source: http://submodularity.org/

Overview

This page collects some material and references related to submodular optimization, with applications in particular in machine learning and AI. Convex optimization has become a main workhorse for many machine learning algorithms during the past ten years. When minimizing a convex loss function for, e.g., training a Support Vector Machine, we can rest assured to efficiently find an optimal solution, even for large problems. In recent years, another fundamental problem structure, which has similar beneficial properties, has emerged as very useful in a variety of machine learning applications: Submodularity is an intuitive diminishing returns property, stating that adding an element to a smaller set helps more than adding it to a larger set. Similarly to convexity, submodularity allows one to efficiently find provably (near-)optimal solutions.

Tutorials

Tutorial on Submodularity in Machine Learning -- New Directions at ICML 2013 by Stefanie Jegelka and Andreas Krause.

Tutorials on Submodularity in Machine Learning and Computer Vision at DAGM 2012 and ECAI 2012 by Stefanie Jegelka and Andreas Krause.

Invited tutorial Intelligent Optimization with Submodular Functions at LION 2012 by Andreas Krause. Slides:

Intelligent Information Gathering and Submodular Function Optimization at IJCAI 2009 by Andreas Krause and Carlos Guestrin. Slides:

Beyond Convexity: Submodularity in Machine Learning at ICML 2008 by Andreas Krause and Carlos Guestrin.
Video (recorded Oct 17 2008 at Carnegie Mellon University)
Part I: Minimizing submodular functions
Part II: Maximizing submodular functions
Extended tutorial slides, updated July 6 2008

Software, Materials and References

High-performance implementation of the minimum norm point algorithm for submodular function minimization with several applications

MATLAB Toolbox for submodular function optimization maintained by Andreas Krause. Journal of Machine Learning Research Open Source Software paper

Survey on Submodular Function Maximization by Daniel Golovin and Andreas Krause. To appear as chapter in Tractability: Practical Approaches to Hard Problems (This draft is for personal use only. No further distribution without permission).

Class on Submodular Functions by Jeff Bilmes

Annotated bibliography.

Related Meetings and Workshops

Cargese Workshop on Combinatorial Optimization, Topic: Submodular Functions, organized by Samuel Fiorini, Gianpaolo Oriolo, Gautier Stauffer and Paolo Ventura.

NIPS 2012 Workshop on Discrete Optimization in Machine Learning: Structure and Scalability, organized by Stefanie Jegelka, Andreas Krause, Pradeep Ravikumar, Jeff Bilmes.

Modern Aspects of Submodularity workshop at GeorgiaTech, organized by Shabbir Ahmed, Nina Balcan, Satoru Iwata and Prasad Tetali.

NIPS 2011 Workshop on Discrete Optimization in Machine Learning: Uncertainty, Generalization and Feedback, organized by Andreas Krause, Pradeep Ravikumar, Jeff Bilmes and Stefanie Jegelka.

NIPS 2010 Workshop on Discrete Optimization in Machine Learning: Structures, Algorithms and Applications, organized by Andreas Krause, Pradeep Ravikumar, Jeff Bilmes and Stefanie Jegelka.

NIPS 2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity and Polyhedra, organized by Andreas Krause, Pradeep Ravikumar and Jeff Bilmes.

This page is maintained by Andreas Krause and Carlos Guestrin. Please send suggested additions or corrections by email.
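The diminishing returns property from the Overview can be seen in a small set-cover example. The sketch below is an illustration written for this page, not code from the toolbox listed above: the coverage function, the helper names and the toy sets are all assumptions. It uses the classic greedy rule, which for monotone submodular functions under a cardinality constraint is known to achieve at least a (1 - 1/e) fraction of the optimum.

```python
def coverage(sets, chosen):
    """Number of distinct elements covered by the chosen sets (a submodular function)."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


def marginal_gain(sets, chosen, name):
    """How much adding `name` to `chosen` increases the coverage."""
    return coverage(sets, list(chosen) + [name]) - coverage(sets, chosen)


def greedy(sets, k):
    """Pick up to k sets, each time taking the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        candidates = [n for n in sets if n not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda n: marginal_gain(sets, chosen, n))
        chosen.append(best)
    return chosen


# Hypothetical toy instance.
toy = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}

# Diminishing returns: the gain of "b" shrinks as the chosen set grows.
print([marginal_gain(toy, s, "b") for s in ([], ["a"], ["a", "c"])])
print(greedy(toy, 2), coverage(toy, greedy(toy, 2)))
```

Adding "b" to the empty set covers two new elements, adding it after "a" covers one, and adding it after "a" and "c" covers none, which is exactly the "helps more in a smaller set" behaviour.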
Recommended Machine Learning Books

Posted on August 22, 2013 by justin. Source: the AI board of 水木社区 (SMTH BBS).

Poster: Insomnia (完美主义是种病), Board: AI
Subject: Machine Learning reading list
Posted from: 水木社区 (Fri Mar 29 16:46:37 2013), on-site

Continuously updated; additions welcome.

Besides the books recommended below, the survey articles published in Foundations and Trends in Machine Learning are all worth reading.

Introductory:
Pattern Recognition And Machine Learning. Christopher M. Bishop
Machine Learning: A Probabilistic Perspective. Kevin P. Murphy
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman
Information Theory, Inference and Learning Algorithms. David J. C. MacKay
All of Statistics: A Concise Course in Statistical Inference. Larry Wasserman

Optimization:
Convex Optimization. Stephen Boyd, Lieven Vandenberghe
Numerical Optimization. Jorge Nocedal, Stephen Wright
Optimization for Machine Learning. Suvrit Sra, Sebastian Nowozin, Stephen J. Wright

Kernel methods:
Kernel Methods for Pattern Analysis. John Shawe-Taylor, Nello Cristianini
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Bernhard Schölkopf, Alexander J. Smola

Semi-supervised learning:
Semi-Supervised Learning. Olivier Chapelle

Gaussian processes:
Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). Carl Edward Rasmussen, Christopher K. I. Williams

Probabilistic graphical models:
Graphical Models, Exponential Families, and Variational Inference. Martin J Wainwright, Michael I Jordan

Boosting:
Boosting: Foundations and Algorithms. Schapire, Robert E.; Freund, Yoav

Bayesian:
Statistical Decision Theory and Bayesian Analysis. James O. Berger
The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Christian P. Robert
Bayesian Nonparametrics. Nils Lid Hjort, Chris Holmes, Peter Müller, Stephen G. Walker
Principles of Uncertainty. Joseph B. Kadane
Decision Theory: Principles and Approaches. Giovanni Parmigiani, Lurdes Inoue

Monte Carlo:
Monte Carlo Strategies in Scientific Computing. Jun S. Liu
Monte Carlo Statistical Methods. Christian P. Robert, George Casella

Information geometry:
Methods of Information Geometry. Shun-Ichi Amari, Hiroshi Nagaoka
Algebraic Geometry and Statistical Learning Theory. Watanabe, Sumio
Differential Geometry and Statistics. M.K. Murray, J.W. Rice

Asymptotics:
Asymptotic Statistics. A. W. van der Vaart
Empirical Processes in M-estimation. Geer, Sara A. van de

Not recommended:
Statistical Learning Theory. Vladimir N. Vapnik
Bayesian Data Analysis, Second Edition. Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin
Probabilistic Graphical Models: Principles and Techniques. Daphne Koller, Nir Friedman
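While translating Professor Kenneth Church's "A Pendulum Swung Too Far", I came across Minsky's classic criticism of neural networks: exclusive-or (XOR) is the Achilles' heel of perceptrons. Professor Church points out that this criticism actually applies to many popular machine learning algorithms, because they assume the training data is linearly separable, and XOR is not linearly separable. The relevant passages are appended at the end of this post.

Professor Church says that because of this XOR problem, sentiment classification has had to give up using loaded terms in learning: these words carry strong sentiment, but which way they point changes with context. That change often depends on an XOR judgment, and machine learning built on linear separation simply cannot learn it.

How should we understand this XOR predicament? A friend explained it this way, which I find quite vivid:

Sentiment words cannot be separated linearly, because their value depends on an XOR judgment. That is:

good/bad classification = "polarity of the sentiment word" XOR "target"

For example, from "our" point of view:

good word XOR us = good
good word XOR them = bad
bad word XOR us = bad
bad word XOR them = good

Put the target on the X axis, with us as +1 and them (our competitors) as -1; put the sentiment word on the Y axis, with +1 for a good word and -1 for a bad word. Then "good/bad classification" = X*Y, where +1 means good and -1 means bad. The two good points lie in the first and third quadrants and the two bad points in the second and fourth. No single straight line can separate these two groups of points.

So a learner based on linear separation cannot learn sentiment words.

[Appendix: the relevant passages of the XOR criticism]

(Excerpted from: K. Church 2011. A Pendulum Swung Too Far. Linguistic Issues in Language Technology, Volume 6, Issue 5.)

3.2 Minsky's Objections

Minsky and Papert (1969) showed that perceptrons (and more generally, linear separators) cannot learn functions that are not linearly separable such as XOR and connectedness. In two dimensions, a scatter plot is linearly separable when a line can separate the points with positive labels from the points with negative labels. More generally, in n dimensions, points are linearly separable when there is a n-1 dimensional hyperplane that separates the positive labels from the negative labels.

......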
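The X*Y picture above can be checked with a few lines of code. This is a sketch written for this post, not something from Church's paper: it trains a plain perceptron on the four (target, sentiment word) points, first with the two raw coordinates and then with the product X*Y added as a third feature. The function names are invented here.

```python
def train_perceptron(samples, epochs=20):
    """Classic perceptron rule; `samples` is a list of (features, label) with labels +1/-1."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                      # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:                           # converged
            break
    return w, b


def count_errors(samples, w, b):
    """Number of samples not strictly on the correct side of the learned hyperplane."""
    return sum(
        1 for x, y in samples
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0
    )


points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]       # (target, sentiment word)
linear = [((x, y), x * y) for x, y in points]        # raw coordinates only
product = [((x, y, x * y), x * y) for x, y in points]  # plus the interaction feature

errors_linear = count_errors(linear, *train_perceptron(linear))
errors_product = count_errors(product, *train_perceptron(product))
print("errors with (X, Y):      ", errors_linear)
print("errors with (X, Y, X*Y): ", errors_product)
```

With only X and Y the perceptron can never classify all four points, however long it trains, while the added X*Y feature makes the problem linearly separable. In sentiment terms, that extra feature is the knowledge of what the loaded term is predicated of.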
3.3 Why Current Technology Ignores Predicates

Weighting systems for Information Retrieval and Sentiment Analysis tend to focus on rigid designators (e.g., nouns) and ignore predicates (verbs, adjectives and adverbs) and intensifiers (e.g., "very") and loaded terms (e.g., "Mickey Mouse" and "Rinky Dink"). The reason might be related to Minsky and Papert's criticism of perceptrons. Years ago, we had access to MIMS, a collection of text comments collected by AT&T operators. Some of the comments were labeled by annotators as positive, negative or neutral. Rigid designators (typically nouns) tend to be strongly associated with one class or another, but there were quite a few loaded terms that were either positive or negative, but rarely neutral.

[Translator's note: rigid designators are entity nouns whose meaning stands on its own and does not change with context. Machine learning based on keyword pattern matching models them relatively easily because they are not disturbed by context. "Mickey Mouse" is a loaded term because, like "Rinky Dink", it is derogatory: calling a company Mickey Mouse expresses contempt.]

How can loaded terms be positive? It turns out that the judges labeled the document as good for us if the loaded term was predicated of the competition, and bad if it was predicated of us. In other words, there is an XOR dependency (loaded term XOR us) that is beyond the capabilities of a linear separator.

Current practice in Sentiment Analysis and Information Retrieval does not model modifiers (predicate-argument relationships, intensifiers and loaded terms), because it is hard to make sense of modifiers unless you know what they are modifying. Ignoring loaded terms and intensifiers seems like a missed opportunity, especially for Sentiment Analysis, since loaded terms are obviously expressing strong opinions. But you can't do much with a feature if you don't know the sign, even if you know the magnitude is large.

When predicate-argument relationships are eventually modeled, it will be necessary to revisit the linearly separable assumption because of the XOR problem mentioned above.
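Veterans looking back on the history of NLP (Natural Language Processing) like to tell the story of how statistical models (via so-called machine learning) displaced traditional knowledge-based systems (also called rule-based systems) as the academic mainstream, describing it as a harrowing religious war fought some twenty-odd years ago. To me it looks more like the PLA crossing the Yangtze in 1949: the knowledge-based systems of traditional NLP collapsed like the Kuomintang armies and handed over the whole country. When the bookish scholar runs into the blunt engineer-soldier, no argument, sound or not, gets a hearing, and surrender is the inevitable result. The only regret, perhaps, is that the statistical camp won too smoothly and met too little resistance, so the victory may feel a little hollow, even a little unsporting. The pale, frail linguists simply could not take a punch.

Ever since the statisticians swept in and took over twenty years ago, people like me who came from linguistics became second-class citizens in academia overnight, playing second fiddle and following the trend on tiptoe. When I get tired of following, I play Ah Q for a while.

In NLP the statisticians' complete victory had a historical inevitability to it, and there is no denying it. The statistical camp does harbor many deep-rooted prejudices against traditional rule-based systems, along with sweeping conclusions that are very popular but do not survive scrutiny (I will settle those accounts one by one in later posts :). But the enormous achievements and payoff of machine learning are everywhere for all to see: machine translation, speech recognition and synthesis, search ranking, spam filtering, document classification, automatic summarization, knowledge acquisition, you name it. One could even go so far as to say that the successes of rule-based systems always look like individual cases: experience, coincidence, the art of an old herbal doctor, good fortune and luck. The successes of machine learning, although they too sometimes involve tricks, are on the whole the proper scientific path: repeatable and reproducible at scale.

Success that is hard to replicate is like Chinese cuisine: with the same ingredients and recipe, different chefs turn out completely different flavors. That is why Chinese food, although it has spread all over the world, wins over the most demanding gourmets and has countless fans at home and abroad, is still served mostly by mediocre restaurants, and can never produce a giant like McDonald's.

Statistical NLP and machine learning are that McDonald's-style giant: the taste is rather monotonous, even junk, but it reliably does the job when you are hungry. It is fulfilling and, above all, there is no drama, no big ups and downs. In any corner of the world it is a product of the same assembly line, with the same taste and quality.

If I cannot be mainstream, I will be a chef. Being a first-class chef feels fine too. In the end the system has the final say. Deng Xiaoping was wise to come up with his white-cat-black-cat theory; otherwise we relics of the old regime might as well bang our heads against the wall.

Take the past ten-plus years: I have stuck with multi-level deep parsing to support all kinds of NLP applications. Watching the statisticians pursue simplicity and shallow processing of massive data, I thought: no wonder that on some tasks, although you get results quickly and robustly, quality always gets stuck at a certain point and cannot get past it. From the conceptual height of "artificial intelligence", shallow learning and deep parsing are simply not in the same league, however "scientific" you are. But nobody would listen if you told the statisticians this at the time. You could not get the point across, because they fundamentally despised or ignored linguists, and the chemistry for a conversation between equals just was not there. In the end they figured it out on their own, and so multi-layer deep learning recently fell from the sky, hailed as a miracle, seeming to take over the whole of machine learning overnight, with crowds flocking to it. Many are amazed, many are pleased with themselves, and they argue that learning deeper, layer by layer, is a revolutionary breakthrough and that quality naturally improves by a large margin. I thought to myself: I saw this principle clearly more than a decade ago; different roads, same destination. It reminds me that, before deep learning swept the world, a like-minded old friend once commented:

To me, Dr. Li is essentially the only one who actually builds true industrial NLP systems with deep parsing. While the whole world is praised with heavy statistics on shallow linguistics, Dr. Li proved with excellent system performances such a simple truth: deep parsing is useful and doable in large scale real world applications.

My prediction is that it will take perhaps another 20 years (don't they say fortune's wheel turns every 20 years?) for the mainstream prejudices to be partly corrected. Even then it will not be a returning springtime for rules and knowledge, but a fairly harmonious cooperation between statistics and rules. The religious-style enmity and disparagement will gradually fade. Amitabha!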
Today I attended an industry-organized data mining meetup (organized through Meetup, a site whose purpose is to bring together people with shared interests: http://www.meetup.com/ ). The current focus of applied data mining and machine learning turns out to be handling large-scale data (big data), especially learning when computing power is limited and the number of features is huge. The venue was the Financial Times headquarters in London; the FT's support for the event shows how much value DM and ML have for industry.

There were three speakers:

“Approximate Methods for Scalable Data Mining” by Andrew Clegg, Data Scientist, Tech Manager of Analytics Team @Pearson
Probabilistic data structures let you trade off accuracy for scalability, by allowing a small and measurable amount of error in return for huge improvements in efficiency. Andrew’s talk provides an overview with use cases.

This speaker is from the well-known company Pearson. Their main problem with big data is insufficient computing power and storage, so his approach is to map features into compact codes made of 0s and 1s, which reduces storage and speeds up processing.

“Storm + Trident and The Holistic Architecture: Using Hadoop for batch and Storm for real time” by Yodit Stanton, Freelance Data Scientist, Developer, Systems Architect
Computing arbitrary functions on an arbitrary dataset in real time is a daunting problem. There is no single tool that provides a complete solution. Instead, you have to use a variety of tools and techniques to build a complete Big Data system. A Holistic Architecture may solve the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer.

This speaker is a practitioner who uses data mining. The talk was light on substance and mostly covered what her company uses.

“PredictionIO - An Open Source Scalable Machine Learning Architecture” by Simon Chan, Product Lead @ Prediction.IO
To deal with big data in a production environment, a horizontally non-blocking and scalable system is needed. PredictionIO provides a flexible architecture for data engineers to evaluate algorithms and apply them to real applications. The whole stack is built on top of open source software while PredictionIO itself is an open source Scala project. Simon will introduce the system design and answer any questions you may have as developers or data scientists.

The speaker, Simon, is of Chinese descent. His work is inventive: an open-source learning platform called PredictionIO. Users supply data, run the platform's engine and get the output. In effect it wraps an extra layer around the low-level APIs, so users mainly need to care about modeling the data mining problem and preprocessing the data, which is very valuable for practitioners. If you are interested, see their site http://prediction.io/ ; they are also still recruiting platform developers.

Throughout the meetup there was plenty of beer, fruit and snacks, and interested people gathered to dig into particular questions. That relaxed atmosphere really does bring out the best in people.
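As a small illustration of the accuracy-for-scalability trade-off mentioned in the first talk, here is a minimal Bloom filter sketch. It is an assumption that a Bloom filter is representative of what the talk covered; the class name, default sizes and the SHA-256-based hashing scheme are choices made for this example, not anything taken from the talk. A Bloom filter stores set membership in a fixed bit array: it never reports a stored item as missing, but it may occasionally report an unseen item as present.

```python
import hashlib


class BloomFilter:
    """Fixed-size approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["hadoop", "storm", "trident", "scala"]:
    seen.add(word)

print("storm" in seen)        # stored items are always reported as present
print(len(seen.bits), "bytes of state, regardless of item length")
```

The memory cost is fixed by `num_bits` rather than by the size of the stored items, which is the sense in which a bounded, measurable error rate buys scalability.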
Ⅰ. Machine Learning
Andrew Ng

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you'll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you'll learn about some of Silicon Valley's best practices in innovation as it pertains to machine learning and AI.

This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you'll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

Ⅱ. Neural Networks for Machine Learning
Geoffrey Hinton

Neural networks use learning algorithms that are inspired by our understanding of how the brain learns, but they are evaluated by how well they work for practical applications such as speech recognition, object recognition, image retrieval and the ability to recommend products that a user will like. As computers become more powerful, Neural Networks are gradually taking over from simpler Machine Learning methods. They are already at the heart of a new generation of speech recognition devices and they are beginning to outperform earlier systems for recognizing objects in images. The course will explain the new learning procedures that are responsible for these advances, including effective new procedures for learning multiple layers of non-linear features, and give you the skills and understanding required to apply these procedures in many other domains. This YouTube video gives examples of the kind of material that will be in the course, but the course will present this material at a much gentler rate and with more examples.

Ⅲ. Machine Learning (机器学习)
Kai Yu (余凯), Tong Zhang (张潼)

Kai Yu, a deputy engineering director of Baidu, managing the company's multimedia department. Tong Zhang, Professor in the Department of Statistics, Rutgers University.

Today, if you work in web search, online advertising, user behavior analysis, image recognition, natural language understanding, bioinformatics, intelligent robotics or financial forecasting, there is one core subject you need to understand in depth: machine learning. As the core of artificial intelligence, machine learning develops algorithms that learn useful models from historical data and use them to make predictions about unseen data or events. As a frontier discipline it draws on computer algorithms, probability theory, statistics, neuroscience, control theory, psychology and optimization theory.

Both lecturers have international reputations in machine learning. Each has published over a hundred papers in top journals and conferences, and each has years of working experience at well-known high-tech companies. Through this course, students will systematically learn the basic concepts, theory and algorithms of machine learning, and will see through worked examples how much it achieves in applications.

Course Videos, Course Slides

The first two courses are on Coursera; you can register an account and take them for free. The first course is very influential. A new session starts around April 22 this year and runs for 10 weeks. I recommend the second course mainly because it is taught by Hinton, the leading figure in deep learning. I have only watched a few lectures so far, but it is well worth studying. The last course is from the Dragon Star program in China. It covers the basic theory of machine learning and also introduces some current research topics. There are 19 lectures of about 45 minutes each (in Chinese).

The main content is copied from:
1. https://www.coursera.org/course/ml
2. https://www.coursera.org/course/neuralnets
3. http://bigeye.au.tsinghua.edu.cn/DragonStar2012/index.html
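Today I read an interesting ICML 2012 paper, "Machine Learning that Matters", by Kiri Wagstaff. Here is its abstract:

Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society. From this perspective, there exist glaring limitations in the data sets we investigate, the metrics we employ for evaluation, and the degree to which results are communicated back to their originating domains. What changes are needed to how we conduct research to increase the impact that ML has? We present six Impact Challenges to explicitly focus the field’s energy and attention, and we discuss existing obstacles that must be addressed. We aim to inspire ongoing discussion and focus on ML that matters.

The author points out that current machine learning research focuses too heavily on benchmark data sets such as UCI and neglects the domains the data come from. Meanwhile, the evaluation measures in common use, such as AUC and ROC curves, ignore the application context of the data entirely. They give a purely numerical comparison, and there is little guarantee that the number means anything in practice. For example, in botany 80% accuracy may be a very good result, yet if a classifier says with 99% accuracy that a certain kind of mushroom is not poisonous, we might still not dare to eat it. Performance evaluation only has real value when tied to the specific application.

The author argues that a genuinely useful machine learning algorithm or system should look like the figure in the paper (not reproduced here). Current research, however, concentrates on the part shown in the second row of that figure; the first row is replaced by standard benchmark data, and the last row, domain knowledge, is often ignored, even at the leading machine learning journals and conferences. Once these are ignored, standard benchmark data are actually worse than synthetic data, because synthetic data are controllable while benchmark data are not.

The author herself is also interesting: she received her PhD in CS in 2002, a Master's in geology in 2008, and is now working on a Master of Library and Information Science. Impressive.

I strongly recommend reading this paper carefully.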
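The mushroom remark above can be made concrete with a small sketch. Everything numeric here is hypothetical: the confusion-matrix counts for the two classifiers and the two cost values are invented for illustration and do not come from the paper. The point is only that ranking by accuracy and ranking by application-specific cost can disagree.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def expected_cost(tp, tn, fp, fn, cost_fp, cost_fn):
    """Average cost per example, given a cost for each kind of error."""
    return (fp * cost_fp + fn * cost_fn) / (tp + tn + fp + fn)


# "Positive" means "poisonous". A false negative is a poisonous mushroom
# declared safe; a false positive is an edible mushroom thrown away.
COST_FP = 1      # hypothetical: waste one edible mushroom
COST_FN = 1000   # hypothetical: someone eats a poisonous mushroom

# Hypothetical results on 1000 mushrooms (100 poisonous, 900 edible).
classifier_a = dict(tp=90, tn=900, fp=0, fn=10)    # misses 10 poisonous ones
classifier_b = dict(tp=100, tn=850, fp=50, fn=0)   # over-cautious, misses none

for name, counts in [("A", classifier_a), ("B", classifier_b)]:
    print(
        name,
        "accuracy =", accuracy(**counts),
        "expected cost =", expected_cost(**counts, cost_fp=COST_FP, cost_fn=COST_FN),
    )
```

Under these made-up costs, classifier A has the higher accuracy but the far higher expected cost, which is the kind of gap between a metric and its application that the post describes.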
Leading researchers and institutions in data science (big data processing), a summary (3rd revision)

1 Jeffrey David Ullman (Stanford University) http://infolab.stanford.edu/~ullman/
Author of Mining of Massive Datasets and of Compilers: Principles, Techniques, and Tools.

2 Anand Rajaraman
First author of Mining of Massive Datasets.

3 Jim Gray http://research.microsoft.com/en-us/um/people/gray/
Sadly, he was lost at sea. A great loss.

4 Andrew Ng (吴恩达), Stanford University http://ai.stanford.edu/~ang/
Speaks Chinese; a seriously impressive person.

5 Daphne Koller, Stanford University http://ai.stanford.edu/~koller/index.html

6 Michael I. Jordan, University of California, Berkeley http://www.cs.berkeley.edu/~jordan/

7 David M. Blei, Princeton University http://www.cs.princeton.edu/~blei/

8 Geoffrey E. Hinton, University of Toronto http://www.cs.toronto.edu/~hinton/
Godfather of neural networks and the leading figure in deep learning.
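I have been doing machine learning research all along and keep hearing the term "statistical machine learning". But why "statistical"? Because machine learning means "learn from the data". We can view this as a statistical process: it yields some "regularities", and those regularities are then used to predict what properties newly input data should have. In machine learning we set up a mathematical model, estimate the values of its parameters from existing data, and finally make predictions.

Artificial intelligence has interested me since my undergraduate days, but I never quite understood what it is. Is it the study of machines that think and solve problems the way people do? To answer that, we first need to know what intelligence is. Intelligence has been defined in many different ways including, but not limited to, abstract thought, understanding, self-awareness, communication, reasoning, learning, having emotional knowledge, retaining, planning, and problem solving. Artificial intelligence is the simulation of intelligence in machines (from Wikipedia).

Current artificial intelligence methods use prior knowledge to lay down a set of rules and then reason and decide according to that rule set. This makes their generalization ability weak: if a situation arises that the rule set has no way of handling, the decision falls into an undefined state.

A few days ago I read the slides of a talk by Prof. Wang Jue (王珏) at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, titled "Structure + Averaging" (结构+平均), downloaded from Sina's file-sharing service. Prof. Wang said: "Machine learning erases the structure among variables; artificial intelligence methods ignore the conditional independence relations among variables." The slides mention Daphne Koller's book Probabilistic Graphical Models, which tries to find a good meeting point between the two approaches. If you are interested in this approach, read the book; it runs to more than 1,200 pages, so it takes serious effort.

Finally, I hope readers will point out any mistakes in this post. Thank you.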
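The three steps described above (set up a model, estimate its parameters from existing data, then predict) can be sketched with the simplest possible case. This is a minimal illustration, assuming a straight-line model fitted by ordinary least squares; the function names and the four data points are made up for the example.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Step 1: the model family is a line. Step 2: learn its parameters from data.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]   # made-up data lying exactly on y = 2x + 1
model = fit_line(train_x, train_y)

# Step 3: predict for an input that was not in the training data.
print(model, predict(model, 5.0))
```

The "statistical" part is step 2: the parameters are not written by hand as rules but estimated from observed data.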
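10.20 Learning Theory

Questions addressed:
1. How do we trade off bias and variance, and can the trade-off be made formal? (More on this later, under model selection.)
2. What we care about in machine learning is the generalization error, yet most procedures only look at the training error. How can the training error tell us about the generalization error? In other words, when is the training error a good proxy for the generalization error?
3. Under what conditions does a learned hypothesis actually work well?

To illustrate underfitting and overfitting, picture three fits to the same data: the first uses a linear function (underfitting), the second a quadratic, and the third a fifth-degree polynomial (overfitting).

First, what are training error and generalization error? The training error is the fraction of training examples that the classifier gets wrong. The generalization error is the probability that the classifier gets a new example wrong, where the example is drawn from the true data distribution and not just from the training set. An example:

Suppose a learning algorithm works with a finite hypothesis class $H = \{h_1, h_2, \ldots, h_k\}$, where each hypothesis maps the input $X$ to a class in $\{0, 1\}$. Since some misclassification is unavoidable, the goal is to pick the hypothesis $h_i$ with the smallest training error.

This is handled in two steps: (1) show that the training error $\hat{\varepsilon}(h)$ is a reasonable, convincing estimate of the error for every hypothesis; (2) show that $\hat{\varepsilon}(h)$ reflects the generalization error of $h$ well, i.e. that the learned classifier performs about the same on training samples and on real data. Writing the generalization error as $\varepsilon(h)$, this is done by proving that $|\varepsilon(h) - \hat{\varepsilon}(h)|$ stays below a small bound.

Pick any $h_i \in H$. When the data points follow a distribution $(x, y) \sim D$, the indicator variable

$$Z = 1\{h_i(x) \neq y\}$$

follows a Bernoulli distribution. Note that $Z$ gives the generalization error: the probability that a generic data point is misclassified is simply its expected value, i.e. the probability that the Bernoulli variable equals 1. Likewise, assume the points in the training set are drawn from $D$, and define

$$Z_j = 1\{h_i(x^{(j)}) \neq y^{(j)}\}.$$

Then $Z$ and the $Z_j$ all follow the same distribution.

Let $\varepsilon(h_i)$ denote the expected value of $Z$ (and of each $Z_j$). The training error can then be written as

$$\hat{\varepsilon}(h_i) = \frac{1}{m} \sum_{j=1}^{m} Z_j.$$

By the Hoeffding inequality (for $m$ i.i.d. Bernoulli($\phi$) variables with sample mean $\hat{\phi}$, $P(|\phi - \hat{\phi}| > \gamma) \le 2\exp(-2\gamma^2 m)$), we get

$$P\big(|\varepsilon(h_i) - \hat{\varepsilon}(h_i)| > \gamma\big) \le 2\exp(-2\gamma^2 m).$$

So as long as the sample size $m$ is large enough, the training error and the generalization error are very close.

Next, let $A_i$ denote the event $|\varepsilon(h_i) - \hat{\varepsilon}(h_i)| > \gamma$, so that $P(A_i) \le 2\exp(-2\gamma^2 m)$. By the union bound, $P(A_1 \cup \cdots \cup A_k) \le P(A_1) + \cdots + P(A_k)$, which gives

$$P\big(\exists\, h \in H : |\varepsilon(h) - \hat{\varepsilon}(h)| > \gamma\big) \le \sum_{i=1}^{k} P(A_i) \le 2k\exp(-2\gamma^2 m).$$

Subtracting both sides from 1:

$$P\big(\forall\, h \in H : |\varepsilon(h) - \hat{\varepsilon}(h)| \le \gamma\big) \ge 1 - 2k\exp(-2\gamma^2 m).$$

In other words, the probability that, for every hypothesis $h \in H$, the absolute difference between training error and generalization error does not exceed the bound $\gamma$ (uniform convergence) is at least $1 - 2k\exp(-2\gamma^2 m)$.

So far, then: given $\gamma$ and the number of training samples $m$, all hypotheses in $H$ converge uniformly with probability at least $1 - 2k\exp(-2\gamma^2 m)$.

This leads to the notion of "sample complexity". If we want $|\varepsilon(h) - \hat{\varepsilon}(h)| \le \gamma$ to hold for all hypotheses in $H$ with probability at least $1 - \delta$, the number of samples required is

$$m \ge \frac{1}{2\gamma^2} \log\frac{2k}{\delta},$$

and this is the sample complexity.

The upshot is how to choose the sample size $m$ so that every hypothesis in $H$ performs on the training sample roughly as it does on real data. Only then is it meaningful to ask which hypothesis $h$ minimizes the error. Otherwise you may pick a hypothesis (a rough example: the fifth-degree polynomial in the overfitting illustration above) whose error on the training set is tiny but whose error on real data may well be very large, which makes it useless.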
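To get a feel for the numbers, the bound above can be evaluated directly. The sketch below assumes the standard finite-hypothesis-class result obtained from the Hoeffding inequality and the union bound, namely that the failure probability is at most 2k·exp(−2γ²m) and that m ≥ log(2k/δ) / (2γ²) samples suffice. The function names and the example values of k, γ and δ are chosen here purely for illustration.

```python
import math


def uniform_failure_bound(k, m, gamma):
    """Upper bound on P(some h in H has |eps(h) - eps_hat(h)| > gamma)."""
    return min(1.0, 2 * k * math.exp(-2 * gamma ** 2 * m))


def sample_complexity(k, gamma, delta):
    """Smallest integer m with 2k * exp(-2 * gamma^2 * m) <= delta."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))


# Illustrative values: 1000 hypotheses, errors within 0.05, 95% confidence.
k, gamma, delta = 1000, 0.05, 0.05
m = sample_complexity(k, gamma, delta)

print("samples needed:", m)
print("failure bound at m:", uniform_failure_bound(k, m, gamma))
print("failure bound at m - 1:", uniform_failure_bound(k, m - 1, gamma))
```

Note how the requirement grows only logarithmically in the number of hypotheses k but quadratically in 1/γ: tightening the accuracy is far more expensive than enlarging the hypothesis class.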
PHOG descriptor

Code: Download the PHOG code. It computes the PHOG descriptor over a Region Of Interest (ROI). If you want to compute the descriptor for the whole image, set the ROI to the image size.

Link: http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html
Prof. Shah is a leading figure in computer vision, particularly strong in action recognition in video; Jingen Liu is one of his students.

Source Code

Background Modeling

Bayesian Object Detection in Dynamic Scenes
This code performs background modeling and foreground estimation in dynamic scenes captured by static cameras. The algorithm implemented has three innovations over existing approaches. First, the correlation in intensities of spatially proximal pixels is exploited by using a nonparametric density estimation method over a joint domain-range representation of image pixels, which models multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color). The model of the background is implemented as a single probability density, as opposed to individual, independent, pixel-wise distributions. Second, temporal persistence is used as a detection criterion. Unlike previous approaches to object detection which detect objects by building adaptive models of the background, the foreground is modeled to augment the detection of objects (without explicit tracking) since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. This method is useful for moving object detection in scenes containing dynamic backgrounds, e.g., fountains, fans, and moving trees. The entry point for background modeling is Main.m.
Project Page: http://server.cs.ucf.edu/~vision/projects/backgroundsub.htm
Yaser Sheikh and Mubarak Shah, Bayesian Modelling of Dynamic Scenes for Object Detection, IEEE Transactions on PAMI, Vol. 27, Issue 11 (Nov 2005), pp. 1778-1792.
Shape from Shading

Zhang-Tsai-Cryer-Shah (C code)
Code from the following publication: Ruo Zhang, Ping-Sing Tsai, James Cryer and Mubarak Shah, Shape from Shading: A Survey, IEEE Transactions on PAMI, Volume 21, Number 08, August, 1999, pp 690-706

Cryer-Tsai-Shah Method (C code)
Source code for the Cryer-Tsai-Shah method for combining shape from shading and stereo depth maps. Related Publication: James Cryer, Ping-Sing Tsai and Mubarak Shah, Shape from Shading and Stereo, Pattern Recognition, Volume 28, No. 7, pp 1033-1043, Jul 1995.

Tsai-Shah Method (C code)
Source code for the Tsai-Shah method for shape from shading. Related Publication: Ping-sing Tsai and Mubarak Shah, Shape From Shading Using Linear Approximation, Technical Report, 1992.

Fundamental Matrix

Fundamental Matrix Code (Matlab), normalise2dpts (Matlab)
Computes the fundamental matrix from 8 or more matching points in a stereo pair of images using the normalized 8 point algorithm given by Hartley and Zisserman. To achieve accurate results it is recommended that 12 or more points are used. The code uses the normalise2dpts.m file also provided. For directions on using the code, please refer to the code documentation. Acknowledgements: The code was provided by Peter Kovesi. http://www.csse.uwa.edu.au/~pk/Research/MatlabFns/

Fundamental Matrix Code (C++ code)
Please note that the code requires OpenCV version 1.0 (April Edition) to be installed on the target system. The package includes sample stereo images together with the correspondence points. Acknowledgements: The code was provided by Paul Smith.
http://www.cs.ucf.edu/~rps43158/index.php

Mean-Shift Algorithms

Edge Detection and Image SegmentatiON (EDISON) System (C++ source) (binary)
The EDISON system contains the image segmentation/edge preserving filtering algorithm described in the paper Mean shift: A robust approach toward feature space analysis and the edge detection algorithm described in the paper Edge detection with embedded confidence. There is also a Matlab interface for the EDISON system at the link below. Acknowledgements: The source code is also available from Rutgers: http://www.caip.rutgers.edu/riul/research/robust.html

Approximate Mean-Shift Method (C++ code)
For instructions on using the code please refer to the readme.txt file included in the zip package. Note the code requires OpenCV to be installed on the target system. Acknowledgements: The code was provided by Alper Yilmaz. http://www.cs.ucf.edu/~yilmaz/

Kernel Density Estimation
The KDE class is a general Matlab class for k-dimensional kernel density estimation. It is written in a mix of Matlab ".m" files and MEX/C++ code. Thus, to use it you will need to be able to compile C++ code for Matlab. The kernels supported are Gaussian, Epanechnikov and Laplacian. Detailed instructions on how to use it are at http://www.ics.uci.edu/~ihler/code/kde.html

K-Means Algorithms for Data Clustering

K-Means in Statistics Toolbox (Matlab code)
The advantage of this code is that it provides options such as 'distance measure', 'emptyaction', and 'onlinephase'. It is quite slow on large datasets and can sometimes run out of memory.

Efficient K-Means using JIT (Matlab code)
This code uses the JIT acceleration in Matlab, so it is much faster than k-means in the Statistics Toolbox. It is very simple and easy to read. Acknowledgements: The code was provided by Yi Cao.
You can also find it at http://www.mathworks.com/matlabcentral/fileexchange/19344-efficient-k-means-clustering-using-jit

K-means from VGG (C code with Matlab interface)
This code calls the C code of k-means. It is the fastest of these three and can deal with large matrices. Acknowledgements: The code was provided by Mark Everingham. http://www.comp.leeds.ac.uk/me

Normalized Cuts
You can find the code for data clustering and image segmentation at http://www.cis.upenn.edu/~jshi/software/ , which was provided by Jianbo Shi.

Dimension Reduction
PCA (Matlab code)
Multidimensional Scaling (Matlab code)

Facial Analysis

Haar Face Detection (C++ code)
For instructions on using the code please refer to the readme.txt file included in the zip package. Note the code requires OpenCV and fltk (an open source window toolkit) to be installed on the target system. Acknowledgements: The code was provided by Paul Smith. http://www.cs.ucf.edu/~rps43158/index.php

Optical Flow

Lucas Kanade Method (matlab)
This code includes the basic Lucas Kanade algorithm and Hierarchical LK using pyramids. Please refer to the 'readme' file included in the package for help on using the code. Following is a test sample to demonstrate the use of this code to calculate the optical flow. Acknowledgements: The code was written by Sohaib Khan. http://www.cs.ucf.edu/~khan/

Code from Piotr Dollar (matlab)
It provides three methods to calculate optical flow: Lucas Kanade, Horn-Schunck and cross-correlation. Acknowledgements: The code was written by Piotr Dollar. http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html

Brox and Sand Methods (matlab)
This code implements a variation of the paper "High accuracy optical flow using a theory for warping", presented at ECCV 2004. Details of implementation, especially the numerics, have been adapted from the PhD thesis of Dr. Thomas Brox.
Extension to different color channels, and the concept of a local smoothing function, has been adopted from the PhD thesis of Dr. Peter Sand. Acknowledgements: The code was written by Visesh Uday Kumar Chari . http://perception.inrialpes.fr/~chari/myweb/Software/ Image Registration Registration (matlab) Please refer to the 'readme' file included in the package for help on using the code. Following is a test sample to demonstrate the use of this code for image registration. Acknowledgements: The code for image registration along with test samples is provided by Yaser Sheikh. http://www.cs.ucf.edu/~yaser/ Color Space Transformations RGB to LUV (matlab) LUV to RGB (matlab) RGB to LAB (matlab) Image Acquisition VFM (matlab) VFM performs frame grabbing from any Video for Windows source. On directions to using the code please refer to the code documentation. Acknowledgements: The code was written by Farzad Pezeshkpour, School of Information Systems, University of East Anglia. Miscellaneous Snakes Demo Page - (Williams-Shah Snakes Algorithm) Interactive java demo of Williams-Shah snakes algorithm. Code written by Sebastian van Delden Deformable Contours (C++ code) Code for the greedy snake algorithm. Writing Video Applications DirectShow tutorial. 3D SIFT This MATLAB code is meant for research purposes only. 
There have been various changes made to the code since the initial publication. Some subtle, some not so subtle. The most significant change is the use of a tessellation method to calculate the orientation bins. Our testing has shown improved results; however, currently rotational invariance has not been re-implemented. Rotational invariance is useful in certain applications but useless in others, and for this reason we have focused our time elsewhere. Another notable change is the elimination of some points due to lack of descriptive information (multiple gradient orientations). This change has a flag and can therefore be turned on or off; however, I suggest leaving it on and writing your frontend in a way that allows 3DSIFT to refuse points, as this too has proven very effective in our testing. Please see the README file for more detailed and up-to-date information. Code from the following publication: Paul Scovanner, Saad Ali, and Mubarak Shah, A 3-Dimensional SIFT Descriptor and its Application to Action Recognition, ACM MM 2007.

SPREF

SPREF Code (Matlab)
SPatiotemporal REgularity Flow (SPREF) is a new spatiotemporal feature that represents the directions in which a video or an image is regular, i.e., the pixel appearances change the least. It has several applications, such as video inpainting and video compression. For more detail, please refer to the SPREF section of our project page.

FRAISE

FRAISE Code (C/C++)
Fast Registration of Aerial Image SEquences (FRAISE) is a lightweight OpenCV based software system written in C/C++ to register a sequence of aerial images in near-realtime. A demo test video sequence and an image sequence with corresponding FRAISE alignment are included. Acknowledgements: The code was written by Subhabrata Bhattacharya. http://www.cs.ucf.edu/~subh/
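This multi-target tracking system was built by the research group of David G. Lowe, the inventor of SIFT. The corresponding paper, A Boosted Particle Filter: Multitarget Detection and Tracking, won the Best Paper prize in Cognitive Vision. The code can be downloaded from the project page.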
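Preface: A full 16 years have passed since I received my PhD, but what I went through as a doctoral student is still vivid. It has also been 13 years since I began advising the first PhD student in my own independent laboratory; 11 of the PhD students and postdocs from the lab now run independent laboratories as PIs at universities in the United States and China. Their paths differed enormously, as did their personalities and abilities. No student, it is fair to say, can stand out simply by following in the footsteps of some other outstanding scientist. In that sense a scientist's success cannot be copied. But the traits that outstanding scientists tend to share should still be instructive for young students.

This essay is based mainly on a 2.5-hour lecture I gave in 2008 at Tsinghua University's orientation for new graduate students, combined with later reflections. In that lecture I stressed repeatedly that my aim was not to have graduate students do exactly as I say, but to jolt the way they think and help each of them find the path that suits them best. The essay is long and will be posted in four parts.

1. Putting in the time.

The one thing all successful scientists have in common is that they put in enormous amounts of time and effort. This is simply true. In fact, in any profession, becoming one of the best requires more time than most people put in. Occasionally an outstanding scientist, answering students or the media, says offhandedly that his success came from luck rather than hard work. Such an answer is not really objective, and it is somewhat irresponsible, because it deliberately leaves out the large amount of time invested and stresses only one accidental factor. It often misleads young students badly: some naive students start cutting corners and waiting for so-called luck instead of giving their all; others go looking for luck and spend a good part of their energy and time on things unrelated to research. To put it bluntly: if there really is a scientist who succeeded mainly by luck rather than by time invested, that success was very likely taken from other people's work, and he most probably lacks truly leading scholarship in the field.

About ten years ago, the eminent Chinese-American biologist Mu-ming Poo (蒲慕明) wrote a now famous email that circulated widely online. He wrote it to all the PhD students and postdocs in his lab, and I fully agree with its views. It is earnest, and his concern for his students shows. Both at Princeton and at Tsinghua I forwarded it to all the students in my lab to reflect on. One passage reads: “The most important thing is what I consider to be sufficient amount of time and effort in the lab work. I mentioned that about 60 hr working time per week is what I consider the minimal time an average successful young scientist in these days has to put into the lab work……I suggest that everyone puts in at least 6 hr concentrated bench work and 2+ hr reading and other research-related activity each day. Reading papers and books should be done mostly after work.”

Some students tell me after reading the email: "It seems I am not cut out for academia, because I really cannot take this kind of hardship." I usually reply: "At your age I also found it inconceivable to work like this for years. But without noticing it, you will gradually be moved by the elegance of scientific research and take pride in your own effort and results, and you will gradually adapt to this way of life!" On the surface I am encouraging the student; in fact it is my own experience.

I loved to play as a child and did not like studying. But education and pressure from school and parents pushed me to study as hard as I could. I attended Henan Experimental High School (河南省实验中学) and, by working harder than others, stayed near the top of the overall rankings. In 1984 I placed first in the Henan division of the national high school mathematics competition and was admitted to Tsinghua University by recommendation. In college I kept up the habit of hard work, ranked first in my class overall and graduated a year early. But one result of this exam-driven, spoon-fed education was that I rarely thought independently and had no interest in my major. When I graduated I had no intention of doing research; I was set on going into business. By a twist of fate I ended up going to the United States for graduate school.

As you can imagine, in my first year abroad my mood swung widely. I was restless and lost, in no frame of mind to study or do research, and spent a lot of time working in Chinese restaurants and taking computer science courses. In the second year I gradually adapted to the "tedium" of research and began to have some insights of my own. Grasping something subtle would leave me quite pleased with myself, thinking "so that's all there is to it", and I gradually gained a little confidence in my research ability. During this period I finished all the PhD coursework; I did experiments five days a week from 9 a.m. to 7 or 8 p.m., and went in for two half-days on weekends. By the third year I had begun to grasp the logic of research and was eager to try things. I often asked questions at group meeting, and that feeling of having "got in the door" made me much more interested in research. I often worked past 11 p.m. and took the last shuttle from the Johns Hopkins medical school back to the Homewood campus (I lived nearby). In 1993 I once wrote next to the date in my lab notebook, "This is the 21st consecutive day of working in the lab.", to spur myself on. That was partly for show, in truth, because over one of those weekends I did only five or six hours of experiments in total. From the fourth year on I was fully at home in the lab and no longer felt tedium or time pressure. My schedule simply followed what the experiments needed, pushing ahead as fast as possible. I actually spent far more time on experiments than when I first joined the lab, but it felt much better.

In the later part of graduate school, my diligence was well known in the lab. My postdoc in New York was the hardest two years of my life: I did experiments until around 3 a.m. every night, and by the time I lay down to sleep it was often past four; but every morning at eight I was woken by the traffic noise of First Avenue outside my window, and around nine I was back in the lab for a new day. All three meals were eaten in the lab, at 9 a.m., 3 p.m., and 9 or 10 p.m. This rhythm ran for 11 days, from Monday to the Friday of the following week. On Friday night I took the Greyhound bus home to Baltimore, slept nearly ten hours on each weekend day to make up for 11 days of severe sleep deficit, and on Monday morning began the next 11-day stretch. I was physically exhausted, but content and proud: I knew I was building my future through action, building something of my own. Sometimes I encouraged myself in my diary. I lived near 65th Street and First Avenue in Manhattan, close to New York's famous Central Park, where there were cultural and entertainment events from time to time, but in two full years of working in New York I never once set foot in Central Park.

I tell this story to every one of my students. New students often ask: "Professor, did it feel hard to you?" I usually answer: "It only feels hard when you are doing things you have no interest in. Once you are interested, it does not feel hard at all." Indeed, a beautiful experiment gives me far more pleasure than a Hollywood blockbuster. Looking back on that hard work, I still feel proud and energized! Sometimes I wonder: if I had not pushed forward during those seven and a half years as a PhD student and postdoc, and had instead indulged in movies, novels and entertainment (the Internet then had far less content than it does now), where would I be now?

To be an outstanding PhD student, putting in the time is a necessary condition.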
Important Dates

Paper submission deadline: July 1, 2012
Review open: August 30, 2012
Rebuttal deadline: September 6, 2012
Notification of acceptance: September 25, 2012
Camera ready early registration: October 10, 2012
Workshops/Tutorials/Demos/Special session: November 5-6, 2012
Main conference: November 7-9, 2012

Call For Papers

Motion and Tracking / Stereo and Structure from Motion / Shape from X / Color and Texture / Segmentation and Grouping / Image-Based Modeling / Illumination and Reflectance Modeling / Sensors / Early and Biologically-Inspired Vision / Computational Photography and Video / Object Recognition / Object Detection and Categorization / Video Analysis and Event Recognition / Face and Gesture Analysis / Statistical Methods and Learning / Performance Evaluation / Medical Image Analysis / Optimization Methods / Applications of Computer Vision

link: http://www.accv2012.org/
Reposted from elsewhere. Some of the impact factors are not quite right, so check the journal's home page again before you submit.

Abbreviation | Full title | Recent impact factor
P IEEE | Proceedings Of The IEEE | 3.686
IEEE WIREL COMMUN | IEEE Wireless Communications | 2.577
IEEE T MOBILE COMPUT | IEEE Transactions On Mobile Computing | 2.55
IEEE NETWORK | IEEE Network | 2.211
IEEE PERVAS COMPUT | IEEE Pervasive Computing | 2.062
IEEE INTERNET COMPUT | IEEE Internet Computing | 1.935
IEEE J SEL AREA COMM | IEEE Journal On Selected Areas In Communications | 1.816
IEEE ACM T NETWORK | IEEE-ACM Transactions On Networking | 1.789
IEEE COMMUN MAG | IEEE Communications Magazine | 1.678
IEEE T SIGNAL PROCES | IEEE Transactions On Signal Processing | 1.57
COMMUN ACM | Communications Of The ACM | 1.509
IEEE T PARALL DISTR | IEEE Transactions On Parallel And Distributed Systems | 1.246
IEEE T COMMUN | IEEE Transactions On Communications | 1.208
IEEE T WIREL COMMUN | IEEE Transactions On Wireless Communications | 1.184
IEEE T VEH TECHNOL | IEEE Transactions On Vehicular Technology | 1.071
J NETW COMPUT APPL | Journal Of Network And Computer Applications | 1.071
WIREL NETW | Wireless Networks | 0.812
IEEE COMMUN LETT | IEEE Communications Letters | 0.684
SIGNAL PROCESS | (EURASIP) European Association For Signal Processing | 0.669
MOBILE NETW APPL | Mobile Networks Applications | 0.659
COMPUT NETW | Computer Networks | 0.631
COMPUT COMMUN REV | Computer Communication Review | 0.578
WIREL COMMUN MOB COM | Wireless Communications Mobile Computing | 0.511
NETWORKS | Networks | 0.485
EURASIP J APPL SIG P | EURASIP Journal On Advances In Signal Processing | 0.463
SCI CHINA F INFOSCIENCES | Science in China, Series F (《中国科学F辑》, English edition) | 0.454
COMPUT COMMUN | Computer Communications | 0.444
EUR T TELECOMMUN | European Transactions On Telecommunications | 0.434
J PARALLEL DISTR COM | Journal Of Parallel And Distributed Computing | 0.43
INT J DISTRIB SENS N | International Journal Of Distributed Sensor Networks | 0.333
J COMPUT SCI TECHNOL | Journal of Computer Science and Technology (《计算机科学与技术》, English edition) | 0.293
IEICE T COMMUN | IEICE Transactions On Communications | 0.29
WIRELESS PERS COMMUN | Wireless Personal Communications | 0.247
J COMMUN NETW-S KOR | Journal Of Communications And Networks | 0.233
IET Communications | IET Communications | 0.199
CHINESE J ELECTRON | Chinese Journal of Electronics (《电子学报》, English edition) | 0.185
IET Engineering Technology | IET Engineering Technology | 0.126
13th ICCV, 2011, Barcelona, Spain
Marr Prize Paper:
Devi Parikh, Kristen Grauman, Relative Attributes
Best Student Paper:
Haichao Zhang, Jianchao Yang, Yanning Zhang, Thomas Huang, Close the Loop: Joint Blind Image Restoration and Recognition with Sparse Representation Prior

12th ICCV, 2009, Kyoto, Japan
Marr Prize Paper:
Chaitanya Desai, Deva Ramanan, and Charless Fowlkes, Discriminative Models for Multi-class Object Layout
Marr Prize Honorable Mention:
Ahmed Kirmani, Tyler Hutchison, James Davis, and Ramesh Raskar, Looking Around the Corner Using Transient Imaging

11th ICCV, 2007, Rio de Janeiro, Brazil
Marr Prize Paper:
Bradley Davis, P. Thomas Fletcher, Elizabeth Bullitt, Sarang Joshi, Population Shape Regression From Random Design Data
Marr Prize Honorable Mention:
Ying Nian Wu, Zhangzhang Si, Chuck Fleming, Song-Chun Zhu, Deformable Template As Active Basis
Abhijeet Ghosh, Shruthi Achutha, Wolfgang Heidrich, Matthew O'Toole, BRDF Acquisition with Basis Illumination
Manmohan Chandraker, Sameer Agarwal, David Kriegman, Serge Belongie, Globally Optimal Affine and Metric Upgrades in Stratified Autocalibration

10th ICCV, 2005, Beijing, China
Marr Prize Papers:
Fredrik Kahl, Didier Henrion, Globally Optimal Estimates for Geometric Reconstruction Problems
Marr Prize Honorable Mention Papers:
Kiriakos N. Kutulakos, Eron Steger, A Theory of Refractive and Specular Shape by Light-Path Triangulation
Oren Boiman, Michal Irani, Detecting Irregularities in Images and in Video
Stefan Roth, Michael J. Black, On the Spatial Statistics of Optical Flow

9th ICCV, 2003, Nice, France
Marr Prize Papers:
Andrew Fitzgibbon, Yonatan Wexler, and Andrew Zisserman, Image-based Rendering using Image-based Priors
Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, and Song-Chun Zhu, Image Parsing: Unifying Segmentation, Detection and Recognition
Paul Viola, Michael J. Jones, and Daniel Snow, Detecting Pedestrians using Patterns of Motion and Appearance

8th ICCV, 2001, Vancouver, Canada
Marr Prize Papers:
Kentaro Toyama and Andrew Blake, Probabilistic Tracking in a Metric Space
Steven Seitz, The Space of All Stereo Images
Marr Prize Honorable Mention Papers:
Yaron Caspi and Michal Irani, Alignment of Non-Overlapping Sequences
Lior Wolf and Amnon Shashua, On Projection Matrices and their Applications in Computer Vision

7th ICCV, 1999, Kerkyra, Greece
Marr Prize Papers:
Kiriakos Kutulakos and Steven Seitz, A Theory of Shape by Space Carving
Yi Ma, Stefano Soatto, Jana Kosecka, and Shankar Sastry, Euclidean Reconstruction and Reprojection up to Subgroups
Marr Prize Honorable Mention Papers:
Michael Black and David Fleet, Probabilistic Detection and Tracking of Motion Discontinuities
Ying Nian Wu, Song-Chun Zhu, Xiuwen Liu, Equivalence of Julesz and Gibbs texture ensembles

6th ICCV, 1998, Bombay, India
Marr Prize Papers:
Marc Pollefeys, Reinhard Koch, and Luc Van Gool, Self-Calibration and Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters
Phil Torr, Andrew Fitzgibbon, and Andrew Zisserman, Maintaining multiple motion model hypotheses over many views to recover matching and structure
Marr Prize Honorable Mention Paper:
Richard Szeliski and Polina Golland, Stereo Matching with Transparency and Matting

5th ICCV, 1995, Cambridge, U.S.A.
Marr Prize Papers:
Michael Oren and Shree Nayar, A Theory of Specular Surface Geometry
Toshikazu Wada, Hiroyuki Ukida, and Takashi Matsuyama, Shape from Shading with Interreflections under a Proximal Light Source: Distortion-Free Copying of an Unfolded Book
Marr Prize Honorable Mention Papers:
Paul Viola and William Wells III, Alignment by Maximization of Mutual Information
Anders Heyden, Reconstruction from Image Sequences by Means of Relative Depths
Yalin Xiong and Steven Shafer, Hypergeometric Filters for Optical Flow and Affine Matching

4th ICCV, 1993, Berlin, Germany
Marr Prize Paper:
Charles A. Rothwell, David A. Forsyth, Andrew Zisserman, and Joseph L. Mundy, Extracting Projective Structure from Single Perspective Views of 3D Point Sets

3rd ICCV, 1990, Osaka, Japan
Marr Prize Paper:
Shree Nayar, Katsushi Ikeuchi, and Takeo Kanade, Shape from Interreflections

2nd ICCV, 1988, Tampa, U.S.A.
Marr Prize Paper:
Brian Funt and Jian Ho, Color from Black and White
Marr Prize Honorable Mention Papers:
David Lowe, Organization of Smooth Image Curves at Multiple Scales
Vishvjit Nalwa, Representing Oriented Piecewise C2 Surfaces
Alan Yuille and Norberto Grzywacz, The Motion Coherence Theory

1st ICCV, 1987, London, United Kingdom
Marr Prize Paper:
David Heeger, Optical Flow using Spatiotemporal Filters
Marr Prize Honorable Mention Papers:
John Tsotsos, A 'Complexity Level' Analysis of Immediate Vision
Michael Kass, Andrew Witkin, and Demetri Terzopoulos, Snakes: Active Contour Models
Yiannis Aloimonos and Isaac Weiss, Active Vision
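For research in machine vision and image processing, two things matter most: first, keeping up with the international state of the art; second, making sure the work has a strong practical background. For the first, identify the handful of top experts whose achievements in the field are widely recognized (and see what they are working on) and the most authoritative publications (and read their latest papers). For the second, it is best to find a real application project and write papers as you build it. One way to do both is to use online resources: authoritative websites and the experts' personal home pages.

The material is organized as follows:
Research groups (international and Chinese)
Experts' home pages
Leading international and Chinese journals and conferences
Search resources
GPL software resources

I. Research groups

For finding well-known international computer vision research groups (CV groups):
List of computer vision research groups worldwide http://peipa.essex.ac.uk/info/groups.html
List of computer vision research groups in the USA http://peipa.essex.ac.uk/info/groups.html#USA

http://www-2.cs.cmu.edu/~cil/vision.html or http://www.cs.cmu.edu/~cil/vision.html
The home page of the computer vision research group at Carnegie Mellon University. It offers very complete material, from downloadable publications to demo programs, test images, frequently used links and related software and hardware, and even a search engine. Well-known people include Tomasi and Kanade.
Stereo vision page (hosted at Middlebury) http://vision.middlebury.edu/stereo/
Carnegie Mellon research groups http://www.cs.cmu.edu/~cil/v-groups.html
Several other laboratories:
Calibrated Imaging Laboratory
Digital Mapping Laboratory
Interactive Systems Laboratory
Vision and Autonomous Systems Center

http://www.via.cornell.edu/
The computer vision and image analysis research group at Cornell University, apparently in the electrical and computer engineering department. It focuses on medical research but hosts quite good resources; the site is still under construction, so it is worth following for updates.
Cornell University: Robotics and Vision group

http://www-cs-students.stanford.edu/ Stanford University Computer Science Department home page
1. http://white.stanford.edu/
2. http://vision.stanford.edu/
3. http://ai.stanford.edu/ Stanford Artificial Intelligence and Robotics Laboratory, USA
The Stanford AI Lab (SAIL) is the intellectual home for researchers in the Stanford Computer Science Department whose primary research focus is Artificial Intelligence. The lab is located in the Gates...
Vision and Imaging Science and Technology

http://www.fmrib.ox.ac.uk/analysis/
Main research: Brain Extraction Tool, Nonlinear noise reduction, Linear Image Registration, Automated Segmentation, Structural brain change analysis, motion correction, etc.

http://www.cse.msu.edu/prip/
The pattern recognition and image processing research group in the Department of Computer and Electrical Engineering at Michigan State University; its FTP site has many papers (NEW).
Pattern Recognition and Image Processing Laboratory, Michigan State University, USA
The Pattern Recognition and Image Processing (PRIP) Lab faculty and students investigate the use of machines to recognize patterns or objects. Methods are developed to sense objects, to discover which...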
http://www.cse.msu.edu/rgroups/prip/

http://pandora.inf.uni-jena.de/p/e/index.html
A digital image processing research group in Germany; its site has some good link collections.

TU Berlin http://www.cv.tu-berlin.de/

Computer vision group, University of Bonn, Germany
Computer Vision Group located within the Division III of the Computer Science Department in the University of Bonn in Germany. This server offers information on topics concerning our computer vision
http://www-dbv.informatik.uni-bonn.de/

http://www-staff.it.uts.edu.au/~sean/CVCC.dir/home.html
CVIP (used to be CVCC for Computer Vision and Cluster Computing) is a research group focusing on cluster-based computer vision within the Spiral Architecture.

http://cfia.gmu.edu/
The mission of the Center for Image Analysis is to foster multi-disciplinary research in image, multimedia and related technologies by establishing links between academic institutes, industry and government agencies, and to transfer key technologies to help industry build next generation commercial and military imaging and multimedia systems.

The Digital Media Group at the University of Bristol, UK, is strong in advanced graphics and imaging. Its work mainly concerns light computation in scenes, for example processing high-dynamic-range images with global illumination or various local illumination models, photorealistic simulation of real environments, and building 3D models (human heads and the like) from a few photographs. It has also worked on reconstructing models of ancient buildings.
http://www.cs.bristol.ac.uk/Research/Digitalmedia/
According to the Times UK ranking for computer science it places third, so this counts as fairly top-tier research.

http://www.cmis.csiro.au/IAP/zimage.htm
A site focused on image analysis; average overall, but it provides an image analysis environment: ZIMAGE and SZIMAGE.

MIT vision laboratories http://groups.csail.mit.edu/vision/welcome/
AI Laboratory Computer Vision group
Center for Biological and Computational Learning
Media Laboratory, Vision and Modeling Group
Perceptual Science group

UC Berkeley
http://0-vision.berkeley.edu.ilstest.lib.neu.edu/vsp/index.html
http://www.cs.berkeley.edu.ilste ... n/vision_group.html
UC Berkeley vision lab, David A. Forsyth: http://www.cs.berkeley.edu/~daf/

UCLA (University of California, Los Angeles) vision lab http://vision.ucla.edu/

A. Zisserman, Oxford, UK (robotics lab): http://www.robots.ox.ac.uk/~az/

Institute for Robotics and Intelligent Systems (IRIS), University of Southern California, Los Angeles, USA
IRIS is an interdepartmental unit of USC's School of Engineering with ties to USC's Information Sciences Institute (ISI). Members include faculty, graduate students, and research staff associated with...
http://iris.usc.edu/

Computer Vision Laboratory, University of Southern California, USA
Computer Vision Laboratory at the University of Southern California has been one of the major centers of computer vision research for thirty years. They conduct research in a number of basic and applied are...
http://iris.usc.edu/USC-Computer-Vision.html

Advanced Computer Architecture Group (neural networks), University of York, UK
The Advanced Computer Architecture Group has had a thriving research programme in neural networks for over 10 years. The 15 researchers, led by Jim Austin, focus their work in the theory and applicati...
http://www.cs.york.ac.uk/arch/neural/

Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), Switzerland
IDIAP is a research institute established in Martigny in the Swiss Alps since 1991. Active in the areas of multimodal interaction and multimedia information management, the institute is also the leade...
http://www.idiap.ch/

Centre for Vision, Speech and Signal Processing, University of Surrey, UK
The Centre for Vision, Speech and Signal Processing (CVSSP) is more than 60 members strong, comprising 12 academic staff, 18 research fellows and more than 44 research students. The activities of the ...
http://www.ee.surrey.ac.uk/Research/VSSP/

Computer Vision Laboratory, University of Massachusetts Amherst, USA
The Computer Vision Laboratory was established in the Computer Science Department at the University of Massachusetts in 1974 with the goal of investigating the scientific principles underlying the con...
http://vis-www.cs.umass.edu
University of Massachusetts: Computer Vision Laboratory for Perceptual Robotics

Robotics and computer vision laboratory, Beckman Institute, University of Illinois at Urbana-Champaign, USA
Includes the following groups: Professor Seth Hutchinson's Research Group Professor David Kriegman's Research Group Professor Jean Ponce's Research Group Professor Narendra Ahuja's Research Gro...
http://www-cvr.ai.uiuc.edu/
Computer Vision and Robotics Laboratory
Vision Interfaces and Systems Laboratory (VISLab)

Vision research group, School of Computer Science, University of Birmingham, UK
The vision group at the School of Computer Science (a RAE 5 rated department) performs research into a wide variety of computer vision and image understanding areas. Much of this work is performed in ...
http://www.cs.bham.ac.uk/research/vision/

Microsoft Research machine learning and perception group / computer vision group
The research group focuses on the development of more advanced and intelligent computer systems through the exploitation of statistical methods in machine learning and computer vision. The site lists ...
http://research.microsoft.com/mlp/
http://research.microsoft.com/en-us/groups/vision/
Microsoft publications: http://research.microsoft.com/research/pubs
Microsoft Research Asia: http://research.microsoft.com/asia/ (worth following: Harry Shum, Jian Sun, Steven Lin, Long Quan (also affiliated with HKUST), etc.)

Vision group, Department of Mathematics, Lund University, Sweden: http://www.maths.lth.se/matematiklth/personal/andersp/
It seems that many vision researchers abroad come from mathematics departments; presumably computer vision demands a lot of mathematics.

Australian National University: http://users.rsise.anu.edu.au/~hartley/
University of North Carolina, USA: http://www.cs.unc.edu/~marc/
INRIA, France: http://www-sop.inria.fr/odyssee/team/ (a team with many outstanding researchers, led by Olivier Faugeras)
L. Van Gool, University of Leuven, Belgium: www.esat.kuleuven.ac.be/psi/visics/
Leuven is said to be no larger than a small town in China, yet its university reportedly ranks in the top 10 in Europe and the top 100 in the world and has produced several Nobel laureates; its vision research is also very strong.
Middlebury, USA: http://vision.middlebury.edu/stereo/

The list below includes research groups at US institutions that are not top-tier. No links are given (a few were already mentioned above); it is provided for reference.
Amerinex Applied Imaging, Inc.
Boston University: Image and Video Computing Research group
University of California at Santa Barbara: Vision Research Lab
University of California at San Diego: Computer Vision Robotics Research Laboratory; Visual Computing laboratory
University of California at Irvine (Irvine is a city in southern California, southeast of Santa Ana): Computer Vision laboratory
University of California, Riverside: Visualization and Intelligent Systems Laboratory (VISLab)
University of California at Santa Cruz: Perceptual Science Laboratory
Caltech: Vision group
University of Central Florida: Computer Vision laboratory
University of Florida: Center for Computer Vision and Visualization
Colorado State University: Computer Vision group
Columbia University: Automated Vision Environment (CAVE); Robotics group
University of Georgia, Athens: Visual and Parallel Computing Laboratory
Harvard University: Robotics Laboratory
University of Illinois at Urbana-Champaign: Robotics and Computer Vision
University of Iowa: Division of Physiologic Imaging
Jet Propulsion Laboratory: Machine Vision and Tracking Sensors group
Khoral Research, Inc
Lawrence Berkeley Laboratories: Imaging and Collaborative Computing Group; Imaging and Distributed Computing
Lehigh University: Image Processing and Pattern Analysis Lab; Vision And Software Technology Laboratory
University of Louisville: Computer Vision and Image Processing Lab
University of Maryland: Computer Vision Laboratory
University of Miami: Underwater Vision and Imaging Laboratory
University of Michigan: AI Laboratory
Michigan State University: Pattern Recognition and Image Processing laboratory
Environmental Research Institute of Michigan (ERIM) (the University of Michigan has research on car body inspection)
University of Missouri-Columbia: Computational Intelligence Research Laboratory
NEC: Computer Vision and Image Processing
University of Nevada: Computer Vision Laboratory
Notre-Dame University: Vision-Based Robotics using Estimation
Ohio State University: Signal Analysis and Machine Perception Laboratory
University of Pennsylvania: GRASP laboratory; Medical Image Processing group; Vision Analysis and Simulation Technologies (VAST) Laboratory
Penn State University: Computer Vision; Precision Digital Images
Purdue University: Robot Vision laboratory; Video and Image Processing Laboratory (VIPER)
Rensselaer Polytechnic Institute (RPI): Computer Science Vision
University of Rochester: Center for Electronic Imaging Systems; Vision and Robotics laboratory
Rutgers University (The State University of New Jersey): Image Understanding Lab
University of Southern California: Computer Vision
University of South Florida: Image Analysis Research group
Stanford Research Institute International (SRI): RADIUS -- Research and Development for Image Understanding Systems; The Perception program at SRI's AI Center
SUNY at Stony Brook: Computer Vision Lab
University of Tennessee: Imaging, Robotics and Intelligent Systems laboratory
University of Texas, Austin: Laboratory for Vision Systems
University of Utah: Center for Scientific Computing and Imaging; Robotics and Computer Vision
University of Virginia: Computer Vision Research (CS)
University of Washington: Image Computing Systems Laboratory; Information Processing Laboratory; CVIA Laboratory
University of West Florida: Image Analysis/Robotics Research Laboratory
University of Wisconsin: Computer Vision group
Vanderbilt University: Center for Intelligent Systems
Washington State University: Imaging Research laboratory
Wright-Patterson: Model-Based Vision laboratory
Wright State University: Intelligent Systems Laboratory
University of Wyoming: Wyoming Image and Signal Processing Research (WISPR)
Yale University: Computational Vision Group http://www.cs.yale.edu/; School of Medicine, Image Processing and Analysis group

In China:
National Laboratory of Pattern Recognition, Chinese Academy of Sciences http://www.nlpr.ia.ac.cn/English/rv/mainpage.html
Iris recognition, palmprint recognition, face recognition.
Lotus Hill http://www.stat.ucla.edu/~sczhu/Lotus/
State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University
Research areas include laser and optoelectronic measurement technology, sensing and measurement information technology, micro/nano measurement and manufacturing technology, and manufacturing quality control technology. It is the only state key laboratory in China in the field of precision measurement.
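Five representative achievements (07.3): "intelligent microsystems and their integrated application technology", "optical measurement technology for microstructures", "safety inspection technology for oil and gas storage and transportation", "vision measurement and its key technologies in advanced manufacturing", and "principles and characteristics of orthogonally polarized lasers and their application in precision metrology".
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences http://www.ciomp.ac.cn/ny/keyan.asp
Shenyang Institute of Automation, Chinese Academy of Sciences http://www.sia.ac.cn/index.php
Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences http://www.opt.ac.cn/yanjiushi/gpcxjs1.htm
Department of Intelligence Science, Peking University http://www.cis.pku.edu.cn/vision/vision.htm
3D visual computing and robotics; biometrics and image recognition.

II. Experts' home pages

http://www.ai.mit.edu/people/wtf/
This is Bill Freeman of the MIT Artificial Intelligence Laboratory. Specialty: Bayesian models for understanding.

http://www.merl.com/people/brand/
At MERL (Mitsubishi Electric Research Laboratory); known for "Style Machine".

http://research.microsoft.com/~ablake/
A. Blake, highly respected in the CV community, graduated from Trinity College, Cambridge, in 1977 with a bachelor's degree in mathematics and electrical sciences. He then set up research groups at MIT, Edinburgh and Oxford in turn and became a professor at Oxford, before joining Microsoft Research Cambridge in 1999. His main field is computer vision.

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/har/Web/home.html
This expert seems to be learning Chinese; his home page even collects songs such as "Two Tigers" (两只老虎). His page also lists several other experts: Shumeet Baluja and Takeo Kanade. Their face detection work is absolutely world class. He graduated from the Computer Science Department at Carnegie Mellon University; his interest is computer vision.

III. Leading international and Chinese journals and conferences

Most of the journals here can be found indirectly through the experts' home pages above.
1. International conferences
2. International journals
3. Chinese journals
4. Neural networks
5. CV
6. Digital images
7. Educational resources, universities
8. FAQs

1. International conferences
The three major international conferences in computer vision today are ICCV, CVPR and ECCV, collectively known as ICE.
ICCV stands for International Conference on Computer Vision. It is held every two years, alternating with ECCV, and is widely regarded as the highest-ranked of the three.
ECCV stands for European Conference on Computer Vision; it is a European conference.
CVPR stands for IEEE Conference on Computer Vision and Pattern Recognition. It is held once a year, in the United States.
Other conferences: ICIP, BMVC, MVA, the International Conference on Pattern Recognition (ICPR), and the Asian Conference on Computer Vision (ACCV).

2. International journals
There are also many international journals with computer vision as one of their main subjects, for example:
International Journal of Computer Vision
IEEE Trans. on PAMI http://www.computer.org/tpami/index.htm
IEEE Transactions on Image Processing http://www.ieee.org/organizations/pubs/transactions/tip.htm
Pattern Recognition http://www.elsevier.com/locate/issn/00313203
Pattern Recognition Letters http://www.elsevier.com/locate/issn/01678655
IEEE Trans. on Robotics and Automation
CVGIP (Computer Vision, Graphics and Image Processing)
Visual Image Computing
IJPRAI (International Journal of Pattern Recognition and Artificial Intelligence)

As is well known, computer vision (CV) has three top conferences: ICCV, CVPR and ECCV. They are of about the same standing, all belong among the first-rate conferences, and there is no need to rank them. Some people in the US consider ICCV/CVPR slightly better than ECCV, most Europeans consider ICCV/ECCV slightly better than CVPR, and some in the UK even consider BMVC better than CVPR. In short, the three conferences are comparable, each with its own emphasis and preferences. Below I discuss their similarities and differences from personal experience, for reference and discussion.

The three are the flagships and bellwethers of the CV field, and their oral papers (including best papers) represent the highest level of CV in a given year. To quote Harry Shum: if you want to know what a field is working on, look through its proceedings from the last few years. ICCV and CVPR are organized under the lead of the IEEE Computer Society; ECCV does not seem to have a dedicated organizing body. CVPR is held in the US every year (except 2002); ECCV is held every two years, in Europe only; ICCV is also held every two years, rotating among continents. This essentially guarantees two conferences each year, so researchers get two chances a year to make it into a top conference.

Acceptance rates fluctuate at all three. ICCV 2001, for example, had an acceptance rate of 30%, and two (Chinese) authors each had three first-author papers, which is unusual for a top conference (and invites suspicion of padding). ICCV 2003 and 2005, however, both had low acceptance rates, around 20%. ECCV followed a similar pattern: 30% before 2004, dropping to around 20% in 2006. CVPR's acceptance rate has been on the high side in recent years; since 2004 it has consistently been [figure missing]. The most recent, CVPR 2006, was 28.1%; statistics for CVPR 2007 are not yet known. My guess is that, to keep the absolute number of accepted papers stable, the acceptance rate runs high when submissions are few and low when they are many. In recent years submissions to all three conferences have exceeded 1000, and compared with before 2000 their acceptance rates have all dropped substantially, the largest drop being from 50% to 20%. Readers interested in acceptance-rate trends can consult http://vrlab.epfl.ch/~ulicny/statistics/ (the CVPR 2004 figures there are wrong) and http://www.adaptivebox.net/research/bookmark/CICON_stat.html.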
Clearly, more and more people are going into CV and the field keeps growing, quite unlike machine learning, which has always practiced a foolish cliquishness. Another point worth noting: ICCV and ECCV accept only vision-related topics, whereas CVPR accepts a small number of pattern recognition papers, such as fingerprint work, but not PR papers that have nothing to do with images or video, such as speech recognition. A friend of mine once reviewed a speech paper submitted to CVPR; all three reviewers rejected it, and one of them quipped that the paper must have been rejected by ICASSP and then resubmitted to CVPR. In terms of topics, CVPR has the broadest coverage. There is also an unverified explanation for CVPR's high acceptance rate: many US researchers are unwilling, or lack the funding, to attend conferences outside the US, so CVPR gives preference to many papers from the US (keeping everyone happy).

The analysis above is a useful guide for submitting papers. Most research today, I think, is still done on paper and has to go through the pipeline read paper - write paper - publish paper - publish paper on top conferences and journals. So it is essential to learn some basic submission skills and to keep track of the field's directions and hot topics. Avoid wasted effort, choose a fitting topic, improve the presentation, and mind the format (follow the prescribed template); I think these are things many newcomers need to watch. ICCV 2007, for example, stated explicitly that a submission without a summary page would be rejected outright, yet some people still ignored this, which is a real waste.

3. Chinese journals
Acta Automatica Sinica (自动化学报), Chinese Journal of Computers (计算机学报), Journal of Software (软件学报), Acta Electronica Sinica (电子学报), Journal of Image and Graphics (中国图象图形学报), Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 光电子激光 (optoelectronics and lasers), 精密光学工程 (precision optical engineering), and others.

4. Neural networks
Neural Networks Tutorial Review http://hem.hj.se/~de96klda/NeuralNetworks.htm ftp://ftp.sas.com/pub/neural/FAQ.html
Image Compression with Neural Networks http://www.comp.glam.ac.uk/digimaging/neural.htm
Backpropagator's Review http://www.dontveter.com/bpr/bpr.html
Bibliographies on Neural Networks http://liinwww.ira.uka.de/bibliography/Neural/
Intelligent Motion Control with an Artificial Cerebellum http://www.q12.org/phd.html
Kernel Machines http://www.kernel-machines.org/
Some Neural Networks Research Organizations http://www.ieee.org/nnc/ http://www.inns.org/
Neural Network Modeling in Vision Research http://www.rybak-et-al.net/nisms.html
Neural Networks and Machine Learning http://learning.cs.toronto.edu/
Neural Application Software http://attrasoft.com
Neural Network Toolbox for MATLAB http://www.mathworks.com/products/neuralnet/
Netlab Software http://www.ncrg.aston.ac.uk/netlab/
Kunama Systems Limited http://www.kunama.co.uk/

5. Computer Vision
Annotated Computer Vision Bibliography http://iris.usc.edu/Vision-Notes/bibliography/contents.html http://iris.usc.edu/Vision-Notes/rosenfeld/contents.html
Lawrence Berkeley National Lab Computer Vision and Robotics Applications http://www-itg.lbl.gov/ITG.hm.pg.docs/VISIon/vision.html
CVonline by University of Edinburgh: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision www.dai.ed.ac.uk/CVonline
Computer Vision Handbook www.cs.hmc.edu/~fleck/computer-vision-handbook
Vision Systems Courseware www.cs.cf.ac.uk/Dave/Vision_lecture/Vision_lecture_caller.html
Research Activities in Computer Vision http://www-syntim.inria.fr/syntim/analyse/index-eng.html
Vision Systems Acronyms www.vision-systems-design.com/vsd/archive/acronyms.html
Dictionary of Terms in Human and Animal Vision http://cns-web.bu.edu/pub/laliden/WWW/Visionary/Visionary.html
Metrology based on Computer Vision www.cranfield.ac.uk/sme/amac/research/metrology/metrology.html

6. Digital Photography (digital images)
Digital Photography, Scanning, and Image Processing www.dbusch.com/scanners/scanners.html

7. Educational Resources, Universities
Center for Image Processing in Education www.cipe.com
Library of Congress Call Numbers Related to Imaging Science by Rochester Institute of Technology http://wally2.rit.edu/pubs/guides/imagingcall.html
Mathematical Experiences through Image Processing, University of Washington www.cs.washington.edu/research/metip/metip.html
Vismod Tech Reports and Publications, MIT http://vismod.www.media.mit.edu/cgi-bin/tr_pagemaker
Vision Lab PhD dissertation list, University of Antwerp http://wcc.ruca.ua.ac.be/~visielab/theses.html
INRIA (France) Research Projects: Human-Computer Interaction, Image Processing, Data Management, Knowledge Systems www.inria.fr/Themes/Theme3-eng.html
Image Processing Resources http://eleceng.ukc.ac.uk/~rls3/Contents.htm
Publications of Carsten Steger http://www9.informatik.tu-muench ... r/publications.html

8. FAQs
comp.dsp FAQ www.bdti.com/faq/dsp_faq.htm
Robotics FAQ www.frc.ri.cmu.edu/robotics-faq
Where's the sci.image.processing FAQ? www.cc.iastate.edu/olc_answers/p ... processing.faq.html
comp.graphics.algorithms FAQ, Section 3, 2D Image/Pixel Computations www.exaflop.org/docs/cgafaq
Astronomical Image Processing System FAQ www.cv.nrao.edu/aips/aips_faq.html

IV. Search resources

http://sal.kachinatech.com/
http://cheminfo.pku.edu.cn/mirrors/SAL/index.shtml (Peking University mirror)
Searching Google for "computer vision" or "computer vision groups" returns many results.
Online resources:
CVonline http://homepages.inf.ed.ac.uk/rbf/CVonline/ (list of vision research groups)
Computer vision test images http://www.cs.cmu.edu/~cil/v-images.html (Carnegie Mellon standard image library)
Vision paper search: http://www.researchindex.com

V. GPL image processing libraries (code libraries, image libraries, etc.)

http://www.ph.tn.tudelft.nl/~klamer/cppima.html
Cppima is a C++ function library for image processing. The page has fairly comprehensive documentation of its library functions; you can also download the compressed GZIP package, which includes the documentation in TexInfo format.

http://iraf.noao.edu/
Welcome to the IRAF Homepage! IRAF is the Image Reduction and Analysis Facility, a general purpose software system for the reduction and analysis of astronomical data

http://entropy.brni-jhu.org/tnimage.html
A very good image processing tool for Unix systems; take a look at its screenshots. You can build your own specialized image processing toolkit on top of it.

http://sourceforge.net/projects/
A hub for GPL software; you can search it for image processing libraries.

CSDN, in China: http://www.csdn.net/
A roundup of face databases. Here are some face data sets often used by researchers:

The Color FERET Database, USA
The FERET program set out to establish a large database of facial images that was gathered independently from the algorithm developers. Dr. Harry Wechsler at George Mason University was selected to direct the collection of this database. The database collection was a collaborative effort between Dr. Wechsler and Dr. Phillips. The images were collected in a semi-controlled environment. To maintain a degree of consistency throughout the database, the same physical setup was used in each photography session. Because the equipment had to be reassembled for each session, there was some minor variation in images collected on different dates. The FERET database was collected in 15 sessions between August 1993 and July 1996. The database contains 1564 sets of images for a total of 14,126 images that includes 1199 individuals and 365 duplicate sets of images. A duplicate set is a second set of images of a person already in the database and was usually taken on a different day. For some individuals, over two years had elapsed between their first and last sittings, with some subjects being photographed multiple times. This time lapse was important because it enabled researchers to study, for the first time, changes in a subject's appearance that occur over a year.

SCface - Surveillance Cameras Face Database
SCface is a database of static images of human faces. Images were taken in an uncontrolled indoor environment using five video surveillance cameras of various qualities. The database contains 4160 static images (in visible and infrared spectrum) of 130 subjects. Images from cameras of different quality mimic real-world conditions and enable robust testing of face recognition algorithms, emphasizing different law enforcement and surveillance use case scenarios. The SCface database is freely available to the research community. The paper describing the database is available here.
Multi-PIE
A close relationship exists between the advancement of face recognition algorithms and the availability of face databases varying factors that affect facial appearance in a controlled manner. The PIE database, collected at Carnegie Mellon University in 2000, has been very influential in advancing research in face recognition across pose and illumination. Despite its success the PIE database has several shortcomings: a limited number of subjects, a single recording session and only few expressions captured. To address these issues researchers at Carnegie Mellon University collected the Multi-PIE database. It contains 337 subjects, captured under 15 view points and 19 illumination conditions in four recording sessions for a total of more than 750,000 images. The paper describing the database is available here.

The Yale Face Database
Contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.

The Yale Face Database B
Contains 5760 single light source images of 10 subjects each seen under 576 viewing conditions (9 poses x 64 illumination conditions). For every subject in a particular pose, an image with ambient (background) illumination was also captured.

PIE Database, CMU
A database of 41,368 images of 68 people, each person under 13 different poses, 43 different illumination conditions, and with 4 different expressions.

Project - Face In Action (FIA) Face Video Database, AMP, CMU
The capturing scenario mimics real-world applications, for example, when a person is going through the airport check-in point. Six cameras capture human faces from three different angles. Three of the six cameras have a smaller focal length, and the other three have a larger focal length. The plan is to capture 200 subjects in 3 sessions in different time periods. For one session, both indoor and outdoor scenarios will be captured. User-dependent pose and expression variation are expected from the video sequences.

ATT "The Database of Faces" (formerly "The ORL Database of Faces")
Ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Cohn-Kanade AU Coded Facial Expression Database
Subjects in the released portion of the Cohn-Kanade AU-Coded Facial Expression Database are 100 university students. They ranged in age from 18 to 30 years. Sixty-five percent were female, 15 percent were African-American, and three percent were Asian or Latino. Subjects were instructed by an experimenter to perform a series of 23 facial displays that included single action units and combinations of action units. Image sequences from neutral to target display were digitized into 640 by 480 or 490 pixel arrays with 8-bit precision for grayscale values. Included with the image files are "sequence" files; these are short text files that describe the order in which images should be read.

MIT-CBCL Face Recognition Database
The MIT-CBCL face recognition database contains face images of 10 subjects. It provides two training sets: 1. High resolution pictures, including frontal, half-profile and profile view; 2. Synthetic images (324/subject) rendered from 3D head models of the 10 subjects. The head models were generated by fitting a morphable model to the high-resolution training images. The 3D models are not included in the database. The test set consists of 200 images per subject. We varied the illumination, pose (up to about 30 degrees of rotation in depth) and the background.
Image Database of Facial Actions and Expressions - Expression Image Database
24 subjects are represented in this database, yielding between about 6 to 18 examples of the 150 different requested actions. Thus, about 7,000 color images are included in the database, and each has a matching gray scale image used in the neural network analysis.

Face Recognition Data, University of Essex, UK
395 individuals (male and female), 20 images per individual. Contains images of people of various racial origins, mainly first-year undergraduate students, so the majority of individuals are between 18 and 20 years old, but some older individuals are also present. Some individuals wear glasses or have beards.

NIST Mugshot Identification Database
There are images of 1573 individuals (cases), 1495 male and 78 female. The database contains both front and side (profile) views when available. Separating front views and profiles, there are 131 cases with two or more front views and 1418 with only one front view. Profiles have 89 cases with two or more profiles and 1268 with only one profile. Cases with both fronts and profiles have 89 cases with two or more of both fronts and profiles, 27 with two or more fronts and one profile, and 1217 with only one front and one profile.

NLPR Face Database
450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people under different lighting/expressions/backgrounds.

M2VTS Multimodal Face Database (Release 1.00)
The database is made up of 37 different faces and provides 5 shots for each person. These shots were taken at one-week intervals or when drastic face changes occurred in the meantime. During each shot, people were asked to count from '0' to '9' in their native language (most of the people are French speaking), then rotate the head from 0 to -90 degrees, again to 0, then to +90 and back to 0 degrees. They were also asked to rotate the head once again without glasses if they wear any.
The Extended M2VTS Database, University of Surrey, UK
Contains four recordings of 295 subjects taken over a period of four months. Each recording contains a speaking head shot and a rotating head shot. Sets of data taken from this database are available, including high quality colour images, 32 kHz 16-bit sound files, video sequences and a 3D model.

The AR Face Database, Purdue University, USA
4,000 color images corresponding to 126 people's faces (70 men and 56 women). Images feature frontal view faces with different facial expressions, illumination conditions, and occlusions (sun glasses and scarf).

The University of Oulu Physics-Based Face Database
Contains 125 different faces, each in 16 different camera calibration and illumination conditions, with an additional 16 if the person has glasses. Faces in frontal position captured under Horizon, Incandescent, Fluorescent and Daylight illuminants. Includes 3 spectral reflectances of skin per person, measured from both cheeks and the forehead. Contains the RGB spectral response of the camera used and the spectral power distribution of the illuminants.

CAS-PEAL Face Database
The CAS-PEAL face database has been constructed under the sponsorship of the National Hi-Tech Program and ISVISION. The goals in creating the PEAL face database include: providing researchers of the FR community worldwide with a large-scale Chinese face database for training and evaluating their algorithms; facilitating the development of FR by providing large-scale face images with different sources of variation, especially Pose, Expression, Accessories, and Lighting (PEAL); advancing state-of-the-art face recognition technologies aimed at practical applications, especially for the oriental.

Japanese Female Facial Expression (JAFFE) Database
The database contains 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects.
BioID Face DB - HumanScan AG, Switzerland The dataset consists of 1521 gray level images with a resolution of 384x286 pixel. Each one shows the frontal view of a face of one out of 23 different test persons. For comparison reasons the set also contains manually set eye postions. Psychological Image Collection at Stirling (PICS) This is a collection of images useful for research in Psychology, such as sets of faces and objects. The images in the database are organised into SETS, with each set often representing a separate experimental study. The Sheffield Face Database (previously: The UMIST Face Database) Consists of 564 images of 20 people. Each covering a range of poses from profile to frontal views. Subjects cover a range of race/sex/appearance. Each subject exists in their own directory labelled 1a, 1b, ... 1t and images are numbered consequetively as they were taken. The files are all in PGM format, approximately 220 x 220 pixels in 256 shades of grey. Face Video Database of the Max Planck Institute for Biological Cybernetics This database contains short video sequences of facial Action Units recorded simultaneously from six different viewpoints, recorded in 2003 at the Max Planck Institute for Biological Cybernetics. The video cameras were arranged at 18 degrees intervals in a semi-circle around the subject at a distance of roughly 1.3m. The cameras recorded 25 frames/sec at 786x576 video resolution, non-interlaced. In order to facilitate the recovery of rigid head motion, the subject wore a headplate with 6 green markers. The website contains a total of 246 video sequences in MPEG1 format. Caltech Faces 450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people under with different lighting/expressions/backgrounds. EQUINOX HID Face Database Human identification from facial features has been studied primarily using imagery from visible video cameras. Thermal imaging sensors are one of the most innovative emerging techonologies in the market. 
Fueled by ever-lowering costs and improved sensitivity and resolution, our sensors provide exciting new opportunities for biometric identification. As part of our involvement in this effort, Equinox is collecting an extensive database of face imagery in the following modalities: coregistered broadband-visible/LWIR (8-12 microns), MWIR (3-5 microns), SWIR (0.9-1.7 microns). This data collection is made available for experimentation and statistical performance evaluations.
VALID Database
With the aim of facilitating the development of robust audio, face, and multi-modal person recognition systems, the large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control over illumination or acoustic noise. The database consists of five recording sessions of 106 subjects over a period of one month. One session is recorded in a studio with controlled lighting and no background noise; the other 4 sessions are recorded in office-type scenarios. The database contains uncompressed JPEG images at a resolution of 720x576 pixels.
The UCD Colour Face Image Database for Face Detection
The database has two parts. Part one contains colour pictures of faces having a high degree of variability in scale, location, orientation, pose, facial expression and lighting conditions, while part two has manually segmented results for each of the images in part one of the database. These images were acquired from a wide variety of sources such as digital cameras, pictures scanned using a photo scanner, other face databases and the World Wide Web. The database is intended for distribution to researchers.
Georgia Tech Face Database
The database contains images of 50 people and is stored in JPEG format. For each individual, there are 15 color images captured between 06/01/99 and 11/15/99. Most of the images were taken in two different sessions to take into account the variations in illumination conditions, facial expression, and appearance.
In addition to this, the faces were captured at different scales and orientations.
Indian Face Database
The database contains a set of face images taken in February 2002 on the IIT Kanpur campus. There are eleven different images of each of 40 distinct subjects. For some subjects, some additional photographs are included. All the images were taken against a bright homogeneous background with the subjects in an upright, frontal position. The files are in JPEG format. The size of each image is 640x480 pixels, with 256 grey levels per pixel. The images are organized in two main directories - males and females. In each of these directories, there are directories named with serial numbers, each corresponding to a single individual. In each of these directories, there are eleven different images of that subject, which have names of the form abc.jpg, where abc is the image number for that subject. The following orientations of the face are included: looking front, looking left, looking right, looking up, looking up towards left, looking up towards right, looking down. Available emotions are: neutral, smile, laughter, sad/disgust.
VidTIMIT Database
The VidTIMIT database comprises video and corresponding audio recordings of 43 people reciting short sentences. It can be useful for research on topics such as multi-view face recognition, automatic lip reading and multi-modal speech recognition. The dataset was recorded in 3 sessions, with a space of about a week between each session. There are 10 sentences per person, chosen from the TIMIT corpus. In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down and finally returning to center. The recording was done in an office environment using a broadcast-quality digital video camera.
The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. The corresponding audio is stored as a mono, 16 bit, 32 kHz WAV file.
The LFWcrop Database
LFWcrop is a cropped version of the Labeled Faces in the Wild (LFW) dataset, keeping only the center portion of each image (i.e. the face). In the vast majority of images almost all of the background is omitted. LFWcrop was created due to concern about the misuse of the original LFW dataset, where face matching accuracy can be unrealistically boosted through the use of background parts of images (i.e. exploitation of possible correlations between faces and backgrounds). As the location and size of faces in LFW was determined through the use of an automatic face locator (detector), the cropped faces in LFWcrop exhibit real-life conditions, including mis-alignment, scale variations, in-plane as well as out-of-plane rotations.
Labeled Faces in the Wild
Labeled Faces in the Wild is a database of face photographs designed for studying the problem of unconstrained face recognition. The database contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1680 of the people pictured have two or more distinct photos in the database. The only constraint on these faces is that they were detected by the Viola-Jones face detector. Please see the database web page and the technical report linked there for more details.
3D_RMA database
The 3D_RMA database is a collection of two sessions (Nov 1997 and Jan 1998) consisting of 120 persons. For each session, three shots were recorded with different (but limited) orientations of the head. Details about the population and typical problems affecting the quality are given in the referred link. 3D was captured thanks to a first prototype of a proprietary system based on structured light (analog camera!).
The quality was limited but sufficient to show the ability of 3D face recognition. For privacy reasons, the texture images are not made available. In the period 2003-2008, this database was downloaded by about 100 researchers. A few papers present recognition results with the database (including, of course, papers by the author).
GavabDB: 3D face database, GAVAB research group, Universidad Rey Juan Carlos, Spain
GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female), with 9 images for each person. All of the individuals are Caucasian and their ages are between 18 and 40 years. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to the pose and the facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (±30°, looking up and looking down respectively) with neutral expression, 2 y-rotated views (±90°, left and right profiles respectively) with neutral expression and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).
FRAV2D Database
This database is formed by up to 109 subjects (75 men and 34 women), with 32 colour images per person. Each picture has a 320 x 240 pixel resolution, with the face occupying most of the image in an upright position. For any single person, all the photographs were taken on the same day, although the subject was made to stand up and sit down again in order to change pose and gesture. In all cases, the background is plain and dark blue.
The 32 images were classified in six groups according to the pose and lighting conditions: 12 frontal images, 4 15°-turned images, 4 30°-turned images, 4 images with gestures, 4 images with occluded face features and 4 frontal images with a change of illumination. This database is delivered for free exclusively for research purposes.
FRAV3D Database
This database contains 106 subjects, with approximately one woman for every three men. The data were acquired with a Minolta VIVID 700 scanner, which provides texture information (2D image) and a VRML file (3D image). If needed, the corresponding range data (2.5D image) can be computed from the VRML file. Therefore, it is a multimodal database (2D, 2.5D and 3D). Throughout, a strict acquisition protocol was followed, with controlled lighting conditions. The person sat down on an adjustable stool opposite the scanner and in front of a blue wall. No glasses, hats or scarves were allowed. A total of 16 captures per person were taken in every session, with different poses and lighting conditions, trying to cover all possible variations, including turns in different directions, gestures and lighting changes. In every case only one parameter was modified between two captures. This is one of the main advantages of this database with respect to others. This database is delivered for free exclusively for research purposes.
BJUT-3D Chinese Face Database
The BJUT-3D is a three-dimensional face database including 500 Chinese persons. There are 250 females and 250 males in the database. Each person has 3D face data with neutral expression and without accessories. The original high-resolution 3D face data were acquired by the CyberWare 3D scanner in a given environment. All 3D face data have been preprocessed, with the redundant parts cut. The face database is now available for research purposes only.
The Multimedia and Intelligent Software Technology Beijing Municipal Key Laboratory in Beijing University of Technology is serving as the technical agent for distribution of the database and reserves the copyright of all the data in the database.
The Bosphorus Database
The Bosphorus Database is a new 3D face database that includes a rich set of expressions, systematic variation of poses and different types of occlusions. This database is unique in three respects: (1) The facial expressions are composed of a judiciously selected subset of Action Units as well as the six basic emotions, and many actors/actresses are incorporated to obtain more realistic expression data; (2) A rich set of head pose variations is available; (3) Different types of face occlusions are included. Hence, this new database can be a very valuable resource for development and evaluation of algorithms on face recognition under adverse conditions and facial expression analysis as well as for facial expression synthesis.
PUT Face Database
PUT Face Database consists of almost 10000 hi-res images of 100 people. Images were taken in controlled conditions and the database is supplied with additional data including: rectangles containing face, eyes, nose and mouth, landmark positions and manually annotated contour models. The database is available for research purposes.
The Basel Face Model (BFM)
The Basel Face Model (BFM) is a 3D Morphable Face Model constructed from 100 male and 100 female example faces. The BFM consists of a generative 3D shape model covering the face surface from ear to ear and a high-quality texture model. The model can be used either directly for 2D and 3D face recognition or to generate training and test images for any imaging condition. Hence, in addition to being a valuable model for face analysis, it can also be viewed as a meta-database which allows the creation of accurately labeled synthetic training and testing images.
To allow for a fair comparison with other algorithms, we provide both the training data set (the BFM) and the model fitting results for several standard image data sets (CMU-PIE, FERET) obtained with our fitting algorithm. The BFM web page additionally provides a set of registered scans of ten individuals, together with a set of 270 renderings of these individuals with systematic pose and light variations. These scans are not included in the training set of the BFM and form a standardized test set with a ground truth for pose and illumination.
Plastic Surgery Face Database
The plastic surgery face database is a real-world database that contains 1800 pre- and post-surgery images pertaining to 900 subjects. Different types of facial plastic surgery have different impacts on facial features. To enable researchers to design and evaluate face recognition algorithms on all types of facial plastic surgery, the database contains images from a wide variety of cases such as Rhinoplasty (nose surgery), Blepharoplasty (eyelid surgery), brow lift, skin peeling, and Rhytidectomy (face lift). For each individual, there are two frontal face images with proper illumination and neutral expression: the first is taken before surgery and the second is taken after surgery. The database contains 519 image pairs corresponding to local surgeries and 381 cases of global surgery (e.g., skin peeling and face lift). The details of the database and a performance evaluation of several well-known face recognition algorithms are available in the accompanying paper.
The Iranian Face Database (IFDB)
The Iranian Face Database (IFDB), the first image database in the Middle East, contains color facial imagery of a large number of Iranian subjects. IFDB is a large database that can support studies of age classification systems. It contains over 3,600 color images.
IFDB can be used for age classification, facial feature extraction, aging, facial ratio extraction, percent of facial similarity, facial surgery, race detection and other similar research.
The Hong Kong Polytechnic University NIR Face Database
The Biometric Research Centre at The Hong Kong Polytechnic University developed a real-time NIR face capture device and used it to construct a large-scale NIR face database. The NIR face image acquisition system consists of a camera, an LED light source, a filter, a frame grabber card and a computer. The camera used is a JAI camera, which is sensitive to the NIR band. The active light source is in the NIR spectrum between 780 nm and 1,100 nm. The peak wavelength is 850 nm. The strength of the total LED lighting is adjusted to ensure good quality of the NIR face images when the camera-face distance is between 80 cm and 120 cm, which is convenient for the users. By using the data acquisition device described above, we collected NIR face images from 335 subjects. During the recording, the subject was first asked to sit in front of the camera, and normal frontal face images of him/her were collected. Then the subject was asked to make expression and pose changes and the corresponding images were collected. To collect face images with scale variations, we asked the subjects to move nearer to or away from the camera within a certain range. Finally, to collect face images with time variations, samples from 15 subjects were collected at two different times with an interval of more than two months. In each recording, we collected about 100 images from each subject, and in total about 34,000 images were collected in the PolyU-NIRFD database.
The Hong Kong Polytechnic University Hyperspectral Face Database (PolyU-HSFD)
The Biometric Research Centre at The Hong Kong Polytechnic University established a Hyperspectral Face database.
An indoor hyperspectral face acquisition system was built, consisting mainly of a CRI VariSpec LCTF and a halogen light; the database includes a hyperspectral dataset of 300 hyperspectral image cubes from 25 volunteers aged 21 to 33 (8 female and 17 male). For each individual, several sessions were collected with an average interval of 5 months. The minimal interval is 3 months and the maximum is 10 months. Each session consists of three hyperspectral cubes - frontal, right and left views with neutral expression. The spectral range is from 400 nm to 720 nm with a step length of 10 nm, producing 33 bands in all. Since the database was constructed over a long period of time, significant appearance variations of the subjects, e.g. changes of hair style and skin condition, are present in the data. In data collection, the positions of the camera, light and subject are fixed, which allows us to concentrate on the spectral characteristics for face recognition without masking from environmental changes.
MOBIO - Mobile Biometry Face and Speech Database
The MOBIO database consists of bi-modal (audio and video) data taken from 152 people. The database has a female-male ratio of nearly 1:2 (100 males and 52 females) and was collected from August 2008 until July 2010 at six different sites in five different countries. This led to a diverse bi-modal database with both native and non-native English speakers. In total 12 sessions were captured for each client: 6 sessions for Phase I and 6 sessions for Phase II. The Phase I data consists of 21 questions with the question types ranging from: Short Response Questions, Short Response Free Speech, Set Speech, and Free Speech. The Phase II data consists of 11 questions with the question types ranging from: Short Response Questions, Set Speech, and Free Speech. The database was recorded using two mobile devices: a mobile phone and a laptop computer.
The mobile phone used to capture the database was a NOKIA N93i mobile while the laptop computer was a standard 2008 MacBook. The laptop was only used to capture part of the first session; this first session consists of data captured on both the laptop and the mobile phone.
Texas 3D Face Recognition Database (Texas 3DFRD)
Texas 3D Face Recognition database (Texas 3DFRD) contains 1149 pairs of facial color and range images of 105 adult human subjects. The images were acquired at the company Advanced Digital Imaging Research (ADIR), LLC (Friendswood, TX), formerly a subsidiary of Iris International, Inc. (Chatsworth, CA), with assistance from research students and faculty from the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin. This project was sponsored by the Advanced Technology Program of the National Institute of Standards and Technology (NIST). The database is being made available by Dr. Alan C Bovik at UT Austin. The images were acquired using a stereo imaging system at a high spatial resolution of 0.32 mm. The color and range images were captured simultaneously and thus are perfectly registered to each other. All faces have been normalized to the frontal position and the tip of the nose is positioned at the center of the image. The images are of adult humans from all the major ethnic groups and both genders. For each face, information is also available about the subject's gender, ethnicity, facial expression, and the locations of 25 anthropometric facial fiducial points. These fiducial points were located manually on the facial color images using a computer-based graphical user interface. Specific data partitions (training, gallery, and probe) that were employed at LIVE to develop the Anthropometric 3D Face Recognition algorithm are also available.
Natural Visible and Infrared facial Expression database (USTC-NVIE)
The database contains both spontaneous and posed expressions of more than 100 subjects, recorded simultaneously by a visible and an infrared thermal camera, with illumination provided from three different directions. The posed database also includes expression images with and without glasses. The paper describing the database is available here.
FEI Face Database
The FEI face database is a Brazilian face database that contains a set of face images taken between June 2005 and March 2006 at the Artificial Intelligence Laboratory of FEI in Sao Bernardo do Campo, Sao Paulo, Brazil. There are 14 images for each of 200 individuals, a total of 2800 images. All images are in colour and taken against a white homogeneous background in an upright frontal position with profile rotation of up to about 180 degrees. Scale might vary by about 10% and the original size of each image is 640x480 pixels. The faces are mainly those of students and staff at FEI, between 19 and 40 years old, with distinct appearance, hairstyle, and adornments. The numbers of male and female subjects are exactly the same, 100 each.
ChokePoint
The ChokePoint video dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (i.e. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three-camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal.
The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. In total, the dataset consists of 54 video sequences and 64,204 labelled face images.
UMB database of 3D occluded faces
The University of Milano Bicocca 3D face database is a collection of multimodal (3D + 2D colour images) facial acquisitions. The database is available to universities and research centers interested in face detection, face recognition, face synthesis, etc. The UMB-DB has been acquired with a particular focus on facial occlusions, i.e. scarves, hats, hands, eyeglasses and other types of occlusion which can occur in real-world scenarios.
VADANA: Vims Appearance Dataset for facial ANAlysis
The primary use of VADANA is for the problems of face verification and recognition across age progression. The main characteristics of VADANA, which distinguish it from current benchmarks, are the large number of intra-personal pairs (order of 168 thousand); natural variations in pose, expression and illumination; and the rich set of additional meta-data provided along with standard partitions for direct comparison and benchmarking efforts.
Link: http://www.face-rec.org/databases/
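Several of the entries above describe a per-subject directory layout, for example the Indian Face Database (two main directories, males and females, each holding one numbered directory per subject with JPEG images inside). As a rough sketch of how such a layout could be indexed before running experiments, the snippet below walks a root folder and groups image paths by subject. The function name `index_subjects` and the exact pattern `root/<group>/<subject>/<image>.jpg` are assumptions made for illustration; the real archives may differ in naming and depth.

```python
from pathlib import Path


def index_subjects(root):
    """Map (group, subject_id) -> sorted list of image paths.

    Assumes a hypothetical layout root/<group>/<subject_id>/<image>.jpg,
    e.g. root/males/1/001.jpg. Adjust the glob pattern for other layouts.
    """
    index = {}
    # sorted() makes the per-subject image order deterministic
    for image in sorted(Path(root).glob("*/*/*.jpg")):
        subject_dir = image.parent
        key = (subject_dir.parent.name, subject_dir.name)
        index.setdefault(key, []).append(image)
    return index
```

Calling `index_subjects("path/to/database")` returns a dictionary whose keys can be used directly as class labels, and whose values can be split per subject into training and test images.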
Open-source software on JMLR
To support the open source software movement, JMLR MLOSS publishes contributions related to implementations of non-trivial machine learning algorithms, toolboxes or even languages for scientific computing. Submission instructions are available here.
A Library for Locally Weighted Projection Regression. Stefan Klanke, Sethu Vijayakumar, Stefan Schaal; 9(Apr):623--626, 2008.
Shark. Christian Igel, Verena Heidrich-Meisner, Tobias Glasmachers; 9(Jun):993--996, 2008.
LIBLINEAR: A Library for Large Linear Classification. Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin; 9(Aug):1871--1874, 2008.
JNCC2: The Java Implementation Of Naive Credal Classifier 2. Giorgio Corani, Marco Zaffalon; 9(Dec):2695--2698, 2008.
Python Environment for Bayesian Learning: Inferring the Structure of Bayesian Networks from Knowledge and Data. Abhik Shah, Peter Woolf; 10(Feb):159--162, 2009.
Nieme: Large-Scale Energy-Based Models. Francis Maes; 10(Mar):743--746, 2009.
Java-ML: A Machine Learning Library. Thomas Abeel, Yves Van de Peer, Yvan Saeys; 10(Apr):931--934, 2009.
Model Monitor (M^2): Evaluating, Comparing, and Monitoring Models. Troy Raeder, Nitesh V. Chawla; 10(Jul):1387--1390, 2009.
Dlib-ml: A Machine Learning Toolkit. Davis E. King; 10(Jul):1755--1758, 2009.
RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments. Brian Tanner, Adam White; 10(Sep):2133--2136, 2009.
DL-Learner: Learning Concepts in Description Logics. Jens Lehmann; 10(Nov):2639--2642, 2009.
Error-Correcting Output Codes Library. Sergio Escalera, Oriol Pujol, Petia Radeva; 11(Feb):661--664, 2010.
PyBrain. Tom Schaul, Justin Bayer, Daan Wierstra, Yi Sun, Martin Felder, Frank Sehnke, Thomas Rückstieß, Jürgen Schmidhuber; 11(Feb):743--746, 2010.
Continuous Time Bayesian Network Reasoning and Learning Engine. Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu; 11(Mar):1137--1140, 2010.
SFO: A Toolbox for Submodular Function Optimization. Andreas Krause; 11(Mar):1141--1144, 2010.
MOA: Massive Online Analysis. Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer; 11(May):1601--1604, 2010.
FastInf: An Efficient Approximate Inference Library. Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan; 11(May):1733--1736, 2010.
The SHOGUN Machine Learning Toolbox. Sören Sonnenburg, Gunnar Rätsch, Sebastian Henschel, Christian Widmer, Jonas Behr, Alexander Zien, Fabio de Bona, Alexander Binder, Christian Gehl, Vojtěch Franc; 11(Jun):1799--1802, 2010.
A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design. Dirk Gorissen, Ivo Couckuyt, Piet Demeester, Tom Dhaene, Karel Crombecq; 11(Jul):2051--2055, 2010.
Model-based Boosting 2.0. Torsten Hothorn, Peter Bühlmann, Thomas Kneib, Matthias Schmid, Benjamin Hofner; 11(Aug):2109--2113, 2010.
libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models. Joris M. Mooij; 11(Aug):2169--2173, 2010.
Gaussian Processes for Machine Learning (GPML) Toolbox. Carl Edward Rasmussen, Hannes Nickisch; 11(Nov):3011--3015, 2010.
CARP: Software for Fishing Out Good Clustering Algorithms. Volodymyr Melnykov, Ranjan Maitra; 12(Jan):69--73, 2011.
The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets. Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, Christian Buchta; 12(Jun):2021--2025, 2011.
MSVMpack: A Multi-Class Support Vector Machine Package. Fabien Lauer, Yann Guermeur; 12(Jul):2293--2296, 2011.
Waffles: A Machine Learning Toolkit. Michael Gashler; 12(Jul):2383--2387, 2011.
MULAN: A Java Library for Multi-Label Learning. Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Jozef Vilcek, Ioannis Vlahavas; 12(Jul):2411--2414, 2011.
LPmade: Link Prediction Made Easy. Ryan N. Lichtenwalter, Nitesh V. Chawla; 12(Aug):2489--2492, 2011.
Scikit-learn: Machine Learning in Python. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay; 12(Oct):2825--2830, 2011.
The Stationary Subspace Analysis Toolbox. Jan Saputra Müller, Paul von Bünau, Frank C. Meinecke, Franz J. Király, Klaus-Robert Müller; 12(Oct):3065--3069, 2011.
Original link: http://jmlr.csail.mit.edu/mloss/
An online learning algorithm for sparse coding; many recent application papers use it.
The SPArse Modelling Software (SPAMS) can be downloaded here. It includes fast implementations of LARS, OMP, a dictionary learning algorithm and its variants for NMF, sparse PCA, as well as efficient sparse solvers based on proximal methods.
June, 2011: SPAMS is now released under an open-source licence. This should be good news for Mac, Windows, R, Python and C++ users. The package contains scripts for compiling the library under Linux. If you manage to make a script for compiling it under Mac OS and Windows, I would be glad to include it in the release.
The denoising code of my ICCV'09 paper can be found here. The package contains binary files for 64-bit Linux computers, and an instruction file. Academic use only.
The demosaicking code of my ICCV'09 paper can be found here. The package contains binary files for 64-bit Linux computers, and an instruction file. Academic use only.
The KSVD source code of my IEEE-TIP and SIAM-MMS papers from 2008 can be found here. This package is not maintained anymore and I will not respond to any question about the source code. If you need Linux binaries to do experiments, please contact me.
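The description above mentions sparse solvers based on proximal methods. As a rough illustration of that idea (this is not SPAMS code and does not use its API), here is a minimal pure-Python sketch of ISTA, the basic proximal-gradient iteration for the lasso problem: minimise 0.5 * ||y - D x||^2 + lam * ||x||_1. The function names `soft_threshold` and `ista`, the toy dictionary and the parameter values are all assumptions chosen for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, y, lam, step, n_iter=100):
    """Proximal gradient (ISTA) for 0.5 * ||y - D x||^2 + lam * ||x||_1.

    D is a list of m rows with n entries each, y has length m.
    step should be at most 1 / L, with L the largest eigenvalue of D^T D.
    """
    m, n = len(D), len(D[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # residual r = D x - y
        r = [sum(D[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # gradient of the smooth term: g = D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step followed by the L1 proximal step
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy check with an identity dictionary, where the minimiser is simply
# the soft-thresholded signal.
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, -0.2, 1.5]
code = ista(D, y, lam=0.5, step=1.0)
```

With the identity dictionary the small coefficient (-0.2) is set exactly to zero and the other two are shrunk by `lam`, which is the sparsity-inducing behaviour these solvers rely on. A production library would use accelerated variants and optimised linear algebra rather than nested Python loops.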
I. Submission letters
1. Dear Dr. Defendi ML: I am sending a manuscript entitled “” by – which I should like to submit for possible publication in the journal of - . Yours sincerely
2. Dear Dr. A: Enclosed is a manuscript entitled “” by sb, which we are submitting for publication in the journal of - . We have chosen this journal because it deals with - . We believe that sth would be of interest to the journal’s readers.
3. Dear Dr. A: Please find enclosed for your review an original research article, “” by sb. All authors have read and approved this version of the article, and due care has been taken to ensure the integrity of the work. No part of this paper has been published or submitted elsewhere. No conflict of interest exists in the submission of this manuscript, and we have attached to this letter the signed letter granting us permission to use Figure 1 from another source. We appreciate your consideration of our manuscript, and we look forward to receiving comments from the reviewers.
II. Asking whether the manuscript was received
Dear Editors, We dispatched our manuscript to your journal on 3 August 2006 but have not, as yet, received acknowledgement of its safe arrival. We fear that it may have been lost and should be grateful if you would let us know whether or not you have received it. If not, we will send our manuscript again. Thank you in advance for your help.
III. Asking about the review decision
Dear Editors, It is more than 12 weeks since I submitted our manuscript (No: ) for possible publication in your journal. I have not yet received a reply and am wondering whether you have reached a decision. I should appreciate your letting me know what you have decided as soon as possible.
IV. General review comments on a paper
1. This is a carefully done study and the findings are of considerable interest. A few minor revisions are listed below.
2. This is a well-written paper containing interesting results which merit publication. For the benefit of the reader, however, a number of points need clarifying and certain statements require further justification.
These are given below.
3. Although these observations are interesting, they are rather limited and do not advance our knowledge of the subject sufficiently to warrant publication in PNAS. We suggest that the authors try submitting their findings to a specialist journal such as –
4. Although this paper is good, it would be even better if some extra data were added.
5. This manuscript is not suitable for publication in the journal of – because the main observation it describes was reported 3 years ago in a reputable journal of - .
6. Please ask someone familiar with the English language to help you rewrite this paper. As you will see, I have made some corrections at the beginning of the paper where some syntax is not satisfactory.
7. We feel that this potentially interesting study has been marred by an inability to communicate the findings correctly in English and should like to suggest that the authors seek the advice of someone with a good knowledge of English, preferably a native speaker.
8. The wording and style of some sections, particularly those concerning HPLC, need careful editing. Attention should be paid to the wording of those parts of the Discussion and Summary which have been underlined.
9. Only preliminary experiments have been done and, with the exception of that summarized in Table 2, none has been repeated. This is clearly unsatisfactory, particularly when there is so much variation between assays.
10. The conditions of incubation are poorly defined. What is the temperature? Were antibodies used?
V. Replies to the editor
1. In reply to the referee’s main criticism of the paper, it is possible to say that – One minor point raised by the referee concerns the extra composition of the reaction mixture in Figure 1. This has now been corrected. Further minor changes have been made on page 3, paragraph 1 (lines 3-8) and 2 (lines 6-11). These do not affect our interpretation of the results.
2.
I have read the referee’s comments very carefully and conclude that the paper has been rejected on the sole grounds that it lacks toxicity data. I admit that I did not include a toxicity table in my article although perhaps I should have done. This was for the sake of brevity rather than an error or omission.
3. Thank you for your letter of – and for the referee’s comments concerning our manuscript entitled “”. We have studied their comments carefully and have made corrections which we hope meet with their approval.
4. I enclose a revised manuscript which includes a report of additional experiments done at the referee’s suggestion. You will see that our original findings are confirmed.
5. We are sending the revised manuscript according to the comments of the reviewers. Revised portions are underlined in red.
6. We found the referee’s comments most helpful and have revised the manuscript.
7. We are pleased to note the favorable comments of the reviewers in their opening sentence.
8. Thank you for your letter. I am very pleased to learn that our manuscript is acceptable for publication in Cancer Research with minor revisions.
9. We have therefore completed a further series of experiments, the results of which are summarized in Table 5. From this we conclude that intrinsic factor is not account.
10. We deleted the relevant passages since they are not essential to the contents of the paper.
11. I feel that the reviewer’s comments concerning Figures 1 and 2 result from a misinterpretation of the data.
12. We would have included a non-protein inhibitor in our system, as a control, if one had been available.
13. We prefer to retain the use of Table 4 for reasons that should be clear from the new paragraph inserted at the end of the Results section.
14. Although the reviewer does not consider it important to measure the temperature of the cells, we consider it essential.
15. The running title has been changed to “”.
16.
The Materials and Methods section now includes details for measuring uptake of isotope and assaying hexokinase. 17. The concentration of HAT media (page12 paragraph 2) was incorrectly stated in the original manuscript. This has been rectified. The authors are grateful to the referees for pointing out their error. 18. As suggested by both referees, a discussion of the possibility of laser action on chromosome has been included (page16, paragraph 2). 19. We included a new set of photographs with better definition than those originally submitted and to which a scale has been added. 20. Following the suggestion of the referees, we have redraw Figure 3 and 4. 21. Two further papers, published since our original submission, have been added to the text and Reference section. These are: 22. We should like to thank the referees for their helpful comments and hope that we have now produced a more balance and better account of our work. We trust that the revised manuscript is acceptable for publication. 23. I greatly appreciate both your help and that of the referees concerning improvement to this paper. I hope that the revised manuscript is now suitable for publication. 24. I should like to express my appreciation to you and the referees for suggesting how to improve our paper. 25. I apologize for the delay in revising the manuscript. This was due to our doing an additional experiment, as suggested by referees.
I. Initial submission: cover letter

Dear Editors:
We would like to submit the enclosed manuscript entitled “Paper Title”, which we wish to be considered for publication in “Journal Name”. No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part. All the authors listed have approved the manuscript that is enclosed.
In this work, we evaluated …… (briefly describe the novelty of the paper). I hope this paper is suitable for “Journal Name”.
The following is a list of possible reviewers for your consideration:
1) Name A, E-mail: ××××@××××
2) Name B, E-mail: ××××@××××
We deeply appreciate your consideration of our manuscript, and we look forward to receiving comments from the reviewers. If you have any queries, please don’t hesitate to contact me at the address below.
Thank you and best regards.
Yours sincerely,
××××××
Corresponding author:
Name: ×××
E-mail: ××××@××××

II. Status inquiry letter

Dear Prof. ×××:
Sorry for disturbing you. I am not sure if it is the right time to contact you to inquire about the status of my submitted manuscript titled “Paper Title” (ID: manuscript number). The status has remained “With Editor” for more than two months, and the manuscript was submitted to the journal three months ago. I am just wondering whether my manuscript has been sent to reviewers. I would greatly appreciate it if you could spend some of your time checking the status for us. I would be very pleased to hear from you about the reviewers’ comments.
Thank you very much for your consideration.
Best regards!
Yours sincerely,
××××××
Corresponding author:
Name: ×××
E-mail: ××××@××××

III. Revised manuscript: cover letter

Dear Dr./Prof. .. (give the name of the editor handling your paper, as a mark of respect; for the initial submission the handling editor is unknown, so only the generic “Editors” can be used):
On behalf of my co-authors, I thank you very much for giving us an opportunity to revise our manuscript. We greatly appreciate the editor and reviewers for their positive and constructive comments and suggestions on our manuscript entitled “Paper Title” (ID: manuscript number).
We have studied the reviewers’ comments carefully and have made revisions, which are marked in red in the paper. We have tried our best to revise our manuscript according to the comments. Attached please find the revised version, which we would like to submit for your kind consideration.
We would like to express our great appreciation to you and the reviewers for the comments on our paper. Looking forward to hearing from you.
Thank you and best regards.
Yours sincerely,
××××××
Corresponding author:
Name: ×××
E-mail: ××××@××××

IV. Response to the reviewers’ comments on the revised manuscript (the most important part)

List of Responses
Dear Editors and Reviewers:
Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Paper Title” (ID: manuscript number). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the reviewers’ comments are as follows:
Responses to the reviewers’ comments:
Reviewer #1:
1. Response to comment: (…… briefly list the comment ……)
Response: ××××××
2. Response to comment: (…… briefly list the comment ……)
Response: ××××××
……
Respond to every comment point by point; none may be omitted.
The following polite phrases can be used as appropriate for different issues:
We are very sorry for our negligence of ……
We are very sorry for our incorrect writing ……
It is really true, as the Reviewer suggested, that ……
We have made corrections according to the Reviewer’s comments.
We have re-written this part according to the Reviewer’s suggestion.
As the Reviewer suggested, ……
Considering the Reviewer’s suggestion, we have ……
Finally, specifically thank this reviewer for the comments:
Special thanks to you for your good comments.
Reviewer #2: same as above
Reviewer #3: ××××××
Other changes:
1. Lines 60-61, the statement “……” was corrected to “…………”
2. Line 107, “……” was added
3. Line 129, “……” was deleted
××××××
We tried our best to improve the manuscript and made some changes in it. These changes will not influence the content and framework of the paper. We have not listed them here, but they are marked in red in the revised paper.
We earnestly appreciate the Editors’/Reviewers’ hard work, and hope that the corrections will meet with approval.
Once again, thank you very much for your comments and suggestions.

V. After acceptance, consider thanking the handling editor or the editor-in-chief (as needed)

Dear Prof. ××××××:
Thanks very much for your kind work and consideration on the publication of our paper. On behalf of my co-authors, I would like to express our great appreciation to the editor and reviewers.
Thank you and best regards.
Yours sincerely,
××××××
Corresponding author:
Name: ×××
E-mail: ××××@××××

VI. Inquiry about proofs (if a long time has passed since acceptance)

Dear ×××:
Sorry for disturbing you. I am not sure if it is the right time to contact you to inquire about the status of our accepted manuscript titled “Paper Title” (ID: manuscript number), since the copyright agreement for publication was sent to you two months ago. I am just wondering how long it will be before I receive the proof of our manuscript from you. I would greatly appreciate it if you could spend some of your time on a reply. I would be very pleased to hear from you.
Thank you very much for your consideration.
Yours sincerely,
××××××
Corresponding author:
Name: ×××
E-mail: ××××@××××

VII. Proof correction letter

Dear Mr. ×××:
Thanks very much for your kind letter about the proof of our paper titled “Paper Title” (ID: manuscript number) for publication in “Journal Name”. We have finished proofreading and checking carefully, and some corrections to the proof and the answers to the queries are provided below.
Corrections:
1. In ****** should be **** (Page ***, Right column, line ***)
2. In **** the “*****” should be “****” (Page ****, Right column, line ****)
Answers for “author queries”:
1. *********************
2. **********************
3. **********************
We greatly appreciate the efficient, professional and rapid processing of our paper by your team. If there is anything else we should do, please do not hesitate to let us know.
Thank you and best regards.
Yours sincerely,
××××××
Corresponding author:
Name: ×××
E-mail: ××××@××××
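Recommender systems have been very popular lately, so here is a reposted introductory reading list, recommended by my advisor. Source: "A Step-by-Step Reading List for Recommender Systems" (推荐系统的循序进阶读物).

To help readers move from theory to practice and from beginner to expert, and to build a systematic understanding of recommender systems and related topics step by step, this reading list was put together. You can read in the order given. Comments are welcome, as are pointers to classic papers that are missing, so that the list can serve different disciplines (to keep beginners from being overwhelmed, only a few classic papers are recommended for each direction).

1. Chinese surveys (understanding the concepts, introductory)
a) 个性化推荐系统的研究进展 (Research Progress in Personalized Recommender Systems)
b) 个性化推荐系统评价方法综述 (A Survey of Evaluation Methods for Personalized Recommender Systems)
2. English surveys (understanding the concepts, advanced)
a) 2004ACMTois-Evaluating collaborative filtering recommender systems
b) 2004ACMTois -Introduction to Recommender Systems - Algorithms and evaluation
c) 2005IEEEtkde Toward the next generation of recommender systems - A survey of the state-of-the-art and possible extensions
3. Hands-on skills (implementing algorithms, introductory)
a) 2004ACMtois Item-based top-N recommendation algorithms.pdf (collaborative filtering)
b) 2007PRE Bipartite network projection and personal recommendation.pdf (network structure)
4. Hands-on skills (implementing algorithms, advanced)
a) 2010PNAS-Solving the apparent diversity-accuracy dilemma of recommender systems.pdf (mass diffusion and heat conduction)
b) 2009NJP Accurate and diverse recommendations via eliminating redundant correlations.pdf (multi-step mass diffusion)
c) 2008EPL Effect of initial configuration on network-based Recommendation.pdf (initial resource allocation problem)
5. Extended applications of recommender systems (advanced)
a) 2009EPJB Predicting missing links via local information.pdf (similarity measures)
b) 2010theis-Evaluating Collaborative Filtering over time.pdf (PhD thesis on temporal effects)
c) 2009PA Personalized recommendation via integrated diffusion on user-item-tag tripartite graphs.pdf (tag-based tripartite graph method)
d) 2004LNCS Trust-aware collaborative filtering for recommender systems.pdf (trust-based)
e) 1997CA-Fab_content-based, collaborative recommendation.pdf (based on text content)
6. Explaining recommendation results (advanced)
a) 2000CSCW-Explaining Collaborative Filtering Recommendations.pdf
b) 2011PRE-Information filtering via biased heat conduction.pdf
c) 2011PRE- Information filtering via preferential diffusion.pdf
d) 2010EPL Link Prediction in weighted networks - The role of weak ties
e) 2010EPL-Solving the cold-start problem in recommender systems with social tags.pdf
7. Comprehensive references (monographs, large surveys, PhD theses)
a) 2005Ziegler-thesis-Towards Decentralized Recommender Systems.pdf
b) 2010Recommender Systems Handbook.pdf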
1. Convex Optimization, written by Professor Stephen P. Boyd of Stanford and highly authoritative. Book homepage: http://www.stanford.edu/~boyd/cvxbook/
2. Nonlinear Programming: 2nd Edition, by Dimitri P. Bertsekas
3. Numerical Optimization, by Nocedal and Wright
These are all true classics.
At CVPR 2011, Jason Chang of MIT published the paper "Efficient MCMC Sampling with Implicit Shape Representations", which is used mainly for edge detection. The paper reports a 50,000-fold speedup over previous methods ("In contrast to previous methods, BFPS easily and efficiently handles topological changes, large perturbations, and multiple regions, while exhibiting a 50,000 times speed up."). The code can be downloaded from the author's homepage.
Important Dates
Paper submission deadline: July 1, 2012
Review open: August 30, 2012
Rebuttal deadline: September 6, 2012
Notification of acceptance: September 25, 2012
Camera ready, early registration: October 10, 2012
Workshops/Tutorials/Demos/Special session: November 5-6, 2012
Main conference: November 7-9, 2012

Call For Papers
Motion and Tracking / Stereo and Structure from Motion / Shape from X / Color and Texture / Segmentation and Grouping / Image-Based Modeling / Illumination and Reflectance Modeling / Sensors / Early and Biologically-Inspired Vision / Computational Photography and Video / Object Recognition / Object Detection and Categorization / Video Analysis and Event Recognition / Face and Gesture Analysis / Statistical Methods and Learning / Performance Evaluation / Medical Image Analysis / Optimization Methods / Applications of Computer Vision

link: http://www.accv2012.org/
Most authors have already posted this year's CVPR papers on their homepages, so it is time to start reading. Below are the papers I plan to read in the near future; I hope I can stick with it.
Discriminative Virtual Views for Cross-View Action Recognition
Detecting activities of daily living in first-person camera views
We are not Contortionists: Coupled Adaptive Learning for Head and Body Orientation Estimation in Surveillance Video
Understanding Collective Crowd Behaviors: Learning Mixture Model of Dynamic Pedestrian-Agents
Max-Margin Early Event Detectors
Weakly Supervised Structured Output Learning for Semantic Segmentation
Contextual Boost for Pedestrian Detection
Accidental pinhole and pinspeck cameras: revealing the scene outside the picture
Finding Animals: Semantic Segmentation using Regions and Parts
A Unified Approach to Salient Object Detection via Low Rank Matrix Recovery
Pedestrian detection at 100 frames per second
Stream-based Joint Exploration-Exploitation Active Learning
Beyond Spatial Pyramids: Receptive Field Learning for Pooled Image Features
A-Optimal Non-negative Projection for Image Representation
Kristen Grauman is a professor at the University of Texas at Austin and a winner of the 2011 Marr Prize. The Visual Recognition course she is teaching this fall is rich in content, very up to date, and comes with plenty of resources; take a look if you are interested. Course page: http://www.cs.utexas.edu/~grauman/courses/fall2011/schedule.html

Schedule columns: Date / Topics / Papers and links / Presenters / Items due

Aug 24: Course intro
Topic preferences due via email by Monday August 29

I. Single-object recognition fundamentals: representation, matching, and classification

Aug 31: Recognizing specific objects: Invariant local features, instance recognition, bag-of-words models
*Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.
*Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, 2008.
*Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.
For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges
Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006.
SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.
Bundling Features for Large Scale Partial-Duplicate Web Image Search. Z. Wu, Q. Ke, M. Isard, and J. Sun. CVPR 2009.
Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, U. Martin, and T. Pajdla, BMVC 2002.
City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.
Object Retrieval with Large Vocabularies and Fast Spatial Matching. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.
I Know What You Did Last Summer: Object-Level Auto-annotation of Holiday Snaps, S. Gammeter, L. Bossard, T. Quack, L. van Gool, ICCV 2009.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. O. Chum et al. CVPR 2007.
A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid. CVPR 2003.
Oxford group interest point software
Andrea Vedaldi's VLFeat code, including SIFT, MSER, hierarchical k-means.
INRIA LEAR team's software, including interest points, shape features
Semantic Robot Vision Challenge links
CVPR 2009 Workshop on Visual Place Categorization
Code for downloading Flickr images, by James Hays
UW Community Photo Collections homepage
FLANN - Fast Library for Approximate Nearest Neighbors. Marius Muja et al.
Google Goggles
Kooaba

Sept 7: Recognition via classification and global models: Global appearance models for category and scene recognition, sliding window detection, detection as a binary decision.
*A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan. CVPR 2008.
*Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006.
*Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.
Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.
Locality-Constrained Linear Coding for Image Classification. J. Wang, J. Yang, K. Yu, and T. Huang. CVPR 2010.
Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.
Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.
Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman.
Eigenfaces for Recognition, Turk and Pentland, 1991.
Sampling Strategies for Bag-of-Features Image Classification. E. Nowak, F. Jurie, and B. Triggs. ECCV 2006.
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search. C. Lampert, M. Blaschko, and T. Hofmann. CVPR 2008.
A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.
Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. CVPR 2005.
LIBPMK feature extraction code, includes dense sampling
LIBSVM library for support vector machines
PASCAL VOC Visual Object Classes Challenge

Sept 14: Regions and mid-level representations. Segmentation, grouping, surface estimation
*Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.
*Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005.
*Contour Detection and Hierarchical Image Segmentation. P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. PAMI 2011.
From Contours to Regions: An Empirical Evaluation. P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. CVPR 2009.
Boundary-Preserving Dense Local Regions. J. Kim and K. Grauman. CVPR 2011.
Object Recognition as Ranking Holistic Figure-Ground Hypotheses. F. Li, J. Carreira, and C. Sminchisescu. CVPR 2010.
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman. CVPR 2006.
Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR workshop 2004.
Efficient Region Search for Object Detection. S. Vijayanarasimhan and K. Grauman. CVPR 2011.
Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006.
Learning Mid-level Features for Recognition. Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010.
Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.
Object Recognition by Integrating Multiple Image Segmentations, C. Pantofaru, C. Schmid, and M. Hebert, ECCV 2008
Image Parsing: Unifying Segmentation, Detection, and Recognition. Tu, Z., Chen, Z., Yuille, A.L., Zhu, S.C. ICCV 2003
GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.
Recognition Using Regions. C. Gu, J. Lim, P. Arbelaez, J. Malik, CVPR 2009.
Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.
Co-segmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake. CVPR 2006.
Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images. Y. J. Lee and K. Grauman. CVPR 2010.
An Efficient Algorithm for Co-segmentation, D. Hochbaum, V. Singh, ICCV 2009.
Normalized Cuts and Image Segmentation, J. Shi and J. Malik. PAMI 2000.
Greg Mori's superpixel code
Berkeley Segmentation Dataset and code
Pedro Felzenszwalb's graph-based segmentation code
Michael Maire's segmentation code and paper
Mean-shift: a Robust Approach Towards Feature Space Analysis
David Blei's Topic modeling code
Expts: Brian, Cho-Jui
Implementation assignment due Friday Sept 16, 5 PM

II. Beyond single objects: scenes and properties

Sept 21: Context and scenes. Multi-object scenes, inter-object relationships, understanding scenes' spatial layout, 3d context
*Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces. D. Lee, A. Gupta, M. Hebert, and T. Kanade. NIPS 2010.
*Multi-Class Segmentation with Relative Location Prior. S. Gould, J. Rodgers, D. Cohen, G. Elidan and D. Koller. IJCV 2008.
*Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization. Torralba, Murphy, and Freeman. CACM 2009.
Contextual Priming for Object Detection, A. Torralba. IJCV 2003.
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. J. Shotton, J. Winn, C. Rother, A. Criminisi. ECCV 2006.
Recognition Using Visual Phrases. M. Sadeghi and A. Farhadi. CVPR 2011.
Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry. V. Hedau, D. Hoiem, and D. Forsyth. ECCV 2010
Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, A. Gupta, A. Efros, and M. Hebert. ECCV 2010.
Object-Graphs for Context-Aware Category Discovery. Y. J. Lee and K. Grauman. CVPR 2010.
Geometric Reasoning for Single Image Structure Recovery. D. Lee, M. Hebert, and T. Kanade. CVPR 2009.
Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.
Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.
Closing the Loop in Scene Interpretation. D. Hoiem, A. Efros, and M. Hebert. CVPR 2008.
Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.
Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.
An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.
Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.
Context Based Object Categorization: A Critical Survey. C. Galleguillos and S. Belongie.
What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007.
Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Unsupervised Framework, L-J. Li, R. Socher, L. Fei-Fei, CVPR 2009.
Labelme Database
Scene Understanding Symposium
Papers: Nishant, Jung
Expts: Saurajit

Sept 28: Saliency and attention. Among all items in the scene, which deserve attention (first)?
*A Model of Saliency-based Visual Attention for Rapid Scene Analysis. L. Itti, C. Koch, and E. Niebur. PAMI 1998
*Learning to Detect a Salient Object. T. Liu et al. CVPR 2007.
*Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video. X. Ren and C. Gu. CVPR 2010
*What Do We Perceive in a Glance of a Real-World Scene? L. Fei-Fei, A. Iyer, C. Koch, and P. Perona. Journal of Vision, 2007.
Interesting Objects are Visually Salient. L. Elazary and L. Itti. Journal of Vision, 8(3):1–15, 2008.
Accounting for the Relative Importance of Objects in Image Retrieval. S. J. Hwang and K. Grauman. BMVC 2010.
Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona. ECCV 2008.
What Makes an Image Memorable? P. Isola et al. CVPR 2011.
The Discriminant Center-Surround Hypothesis for Bottom-Up Saliency. D. Gao, V. Mahadevan, and N. Vasconcelos. NIPS, 2007.
Category-Independent Object Proposals. I. Endres and D. Hoiem. ECCV 2010.
What is an Object? B. Alexe, T. Deselaers, and V. Ferrari. CVPR 2010.
A Principled Approach to Detecting Surprising Events in Video. L. Itti and P. Baldi. CVPR 2005
Optimal Scanning for Faster Object Detection, N. Butko, J. Movellan. CVPR 2009.
What Attributes Guide the Deployment of Visual Attention and How Do They Do It? J. Wolfe and T. Horowitz. Neuroscience, 5:495–501, 2004.
Visual Correlates of Fixation Selection: Effects of Scale and Time. B. Tatler, R. Baddeley, and I. Gilchrist. Vision Research, 45:643, 2005.
Objects Predict Fixations Better than Early Saliency. W. Einhauser, M. Spain, and P. Perona. Journal of Vision, 8(14):1–26, 2008.
Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags. S. J. Hwang and K. Grauman. CVPR 2010.
Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video. S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstrack, S. Chung, A. Ng. IJCAI 2007.
Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006.
Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.
Visual Recognition and Detection Under Bounded Computational Resources, S. Vijayanarasimhan and A. Kapoor. CVPR 2010.
Key-Segments for Video Object Segmentation. Y. J. Lee, J. Kim, and K. Grauman. ICCV 2011
Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features on Object Search. A. Torralba, A. Oliva, M. Castelhano, J. Henderson.
The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.
Papers: Lu Xia
Expts: Larry

Oct 5: Attributes: Visual properties, learning from natural language descriptions, intermediate representations
*Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009.
*Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.
*Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar. ICCV 2009.
Relative Attributes. D. Parikh and K. Grauman. ICCV 2011.
A Discriminative Latent Model of Object Classes and Attributes. Y. Wang and G. Mori. ECCV, 2010.
Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.
Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.
FaceTracer: A Search Engine for Large Collections of Images with Faces. N. Kumar, P. Belhumeur, and S. Nayar. ECCV 2008.
Attribute-Centric Recognition for Cross-Category Generalization. A. Farhadi, I. Endres, D. Hoiem. CVPR 2010.
Automatic Attribute Discovery and Characterization from Noisy Web Data. T. Berg et al. ECCV 2010.
Attributes-Based People Search in Surveillance Environments. D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk. WACV 2009.
Image Region Entropy: A Measure of "Visualness" of Web Images Associated with One Concept. K. Yanai and K. Barnard. ACM MM 2005.
What Helps Where And Why? Semantic Relatedness for Knowledge Transfer. M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych and B. Schiele. CVPR 2010.
Recognizing Human Actions by Attributes. J. Liu, B. Kuipers, S. Savarese, CVPR 2011.
Interactively Building a Discriminative Vocabulary of Nameable Attributes. D. Parikh and K. Grauman. CVPR 2011.
Papers: Saurajit
Expts: Qiming, Harsh
Proposal abstracts due Friday Oct 7, 5 PM

III. External input in recognition

Oct 12: Language and description. Discovering the correspondence between words and other language constructs and images, generating descriptions
*Baby Talk: Understanding and Generating Image Descriptions. Kulkarni et al. CVPR 2011.
*Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, A. Gupta and L. Davis, ECCV 2008.
*Learning Sign Language by Watching TV (using weakly aligned subtitles), P. Buehler, M. Everingham, and A. Zisserman. CVPR 2009.
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth. ECCV 2002.
The Mathematics of Statistical Machine Translation: Parameter Estimation. P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer. Association for Computational Linguistics, 1993. (background for Duygulu et al paper)
How Many Words is a Picture Worth? Automatic Caption Generation for News Images. Y. Feng and M. Lapata. ACL 2010.
Matching words and pictures. K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan. JMLR, 3:1107–1135, 2003.
Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation. L. Jie, B. Caputo, and V. Ferrari. NIPS 2009.
Watch, Listen & Learn: Co-training on Captioned Images and Videos. S. Gupta, J. Kim, K. Grauman, and R. Mooney. ECML 2008.
Systematic Evaluation of Machine Translation Methods for Image and Video Annotation, P. Virga, P. Duygulu, CIVR 2005.
Localizing Objects and Actions in Videos Using Accompanying Text. Johns Hopkins University Summer Workshop Report. J. Neumann et al. 2010.
Subrip for subtitle extraction
Reuters captioned photos
Sonal Gupta's data for commentary+video
Papers: Chris
Expts: Jae, Naga

Oct 19: Interactive learning and recognition. Human-in-the-loop learning, active annotation collection, crowdsourcing
*Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. S. Vijayanarasimhan and K. Grauman. CVPR 2011.
*Visual Recognition with Humans in the Loop. Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S. ECCV 2010.
*The Multidimensional Wisdom of Crowds. Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010.
*What’s It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. S. Vijayanarasimhan and K. Grauman. CVPR 2009
iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance, D. Batra, A. Kowdle, D. Parikh, J. Luo and T. Chen. CVPR 2010.
Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI, 2004.
Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. J. Whitehill et al. NIPS 2009.
Utility Data Annotation with Amazon Mechanical Turk. A. Sorokin and D. Forsyth. Wkshp on Internet Vision, 2008.
Far-Sighted Active Learning on a Budget for Image and Video Recognition. S. Vijayanarasimhan, P. Jain, and K. Grauman. CVPR 2010.
Multiclass Recognition and Part Localization with Humans in the Loop. C. Wah et al. ICCV 2011.
Multi-Level Active Prediction of Useful Image Annotations for Recognition. S. Vijayanarasimhan and K. Grauman. NIPS 2008.
Active Learning from Crowds. Y. Yan, R. Rosales, G. Fung, J. Dy. ICML 2011.
Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles. P. Donmez and J. Carbonell. CIKM 2008.
Inactive Learning? Difficulties Employing Active Learning in Practice. J. Attenberg and F. Provost. SIGKDD 2011.
Annotator Rationales for Visual Recognition. J. Donahue and K. Grauman. ICCV 2011.
Interactively Building a Discriminative Vocabulary of Nameable Attributes. D. Parikh and K. Grauman. CVPR 2011.
Actively Selecting Annotations Among Objects and Attributes. A. Kovashka, S. Vijayanarasimhan, and K. Grauman. ICCV 2011.
Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit. V. Raykar et al. ICML 2009.
Multi-class Active Learning for Image Classification. A. J. Joshi, F. Porikli, and N. Papanikolopoulos. CVPR 2009.
GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts. C. Rother, V. Kolmogorov, A. Blake. SIGGRAPH 2004.
Active Learning for Piecewise Planar 3D Reconstruction. A. Kowdle, Y.-J. Chang, A. Gallagher and T. Chen. CVPR 2011.
Amazon Mechanical Turk
Using Mechanical Turk with LabelMe
Papers: Brian, Harsh
Expts: Yunsik
Proposal extended outline due Friday Oct 21, 5 PM

IV. Activity in images and video

Oct 26 Pictures of people
Finding people and their poses, automatic face tagging
*Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations. L. Bourdev and J. Malik. ICCV 2009.
*Understanding Images of Groups of People. A. Gallagher and T. Chen. CVPR 2009.
*Real-Time Human Pose Recognition in Parts from a Single Depth Image. J. Shotton et al. CVPR 2011.
*"'Who are you?' - Learning Person Specific Classifiers from Video". J. Sivic, M. Everingham, and A. Zisserman. CVPR 2009.
Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. Burak Gokturk, and B. Sumengen. CVPR 2007.
Fast Pose Estimation with Parameter Sensitive Hashing. G. Shakhnarovich, P. Viola, T. Darrell. ICCV 2003.
Finding and Tracking People From the Bottom Up. D. Ramanan, D. A. Forsyth. CVPR 2003.
Where's Waldo: Matching People in Images of Crowds. R. Garg, D. Ramanan, S. Seitz, N. Snavely. CVPR 2011.
Autotagging Facebook: Social Network Context Improves Photo Annotation. Z. Stone, T. Zickler, and T. Darrell. CVPR Internet Vision Workshop 2008.
Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang. MM 2004.
Progressive Search Space Reduction for Human Pose Estimation. V. Ferrari, M. Marin-Jimenez, and A. Zisserman. CVPR 2008.
Leveraging Archival Video for Building Face Datasets. D. Ramanan, S. Baker, and S. Kakade. ICCV 2007.
Names and Faces in the News. T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth. CVPR 2004.
Face Discovery with Social Context. Y. J. Lee and K. Grauman. BMVC 2011.
"Hello! My name is... Buffy" – Automatic Naming of Characters in TV Video. M. Everingham, J. Sivic and A. Zisserman. BMVC 2006.
Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities. B. Yao, L. Fei-Fei. CVPR 2010.
A Face Annotation Framework with Partial Clustering and Interactive Labeling. R. X. Y. Tian, W. Liu, F. Wen, and X. Tang. CVPR 2007.
From 3D Scene Geometry to Human Workspace. A. Gupta et al. CVPR 2011.
Pictorial Structures Revisited: People Detection and Articulated Pose Estimation. M. Andriluka et al. CVPR 2009.
Face detection code in OpenCV
Gallagher's Person Dataset
Face data from Buffy episode, from Oxford Visual Geometry Group
CALVIN upper-body detector code
Papers: Sunil, Larry
Expts: Nishant, Jung

Nov 2 Activity recognition
Recognizing and localizing human actions in video
*Actions in Context. M. Marszalek, I. Laptev, C. Schmid. CVPR 2009.
*A Hough Transform-Based Voting Framework for Action Recognition. A. Yao, J. Gall, L. Van Gool. CVPR 2010.
*Beyond Actions: Discriminative Models for Contextual Group Activities. T. Lan, Y. Wang, W. Yang, and G. Mori. NIPS 2010.
Objects in Action: An Approach for Combining Action Understanding and Object Perception. A. Gupta and L. Davis. CVPR 2007.
Learning Realistic Human Actions from Movies. I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld. CVPR 2008.
Understanding Egocentric Activities. A. Fathi, A. Farhadi, J. Rehg. ICCV 2011.
Exploiting Human Actions and Object Context for Recognition Tasks. D. Moore, I. Essa, and M. Hayes. ICCV 1999.
A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg. ICCV 2007.
Recognizing Actions at a Distance. A. Efros, G. Mori, J. Malik. ICCV 2003.
Activity Recognition from First Person Sensing. E. Taralova, F. De la Torre, M. Hebert. CVPR 2009 Workshop on Egocentric Vision.
Action Recognition from a Distributed Representation of Pose and Appearance. S. Maji, L. Bourdev, J. Malik. CVPR 2011.
Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition. A. Kovashka and K. Grauman. CVPR 2010.
Temporal Causality for the Analysis of Visual Events. K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg. CVPR 2010.
Modeling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model. Loy, Xiang, Gong. ICCV 2009.
What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes. D. Kuettel et al. CVPR 2010.
Learning Actions From the Web. N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff. ICCV 2009.
Content-based Retrieval of Functional Objects in Video Using Scene Context. S. Oh, A. Hoogs, M. Turek, and R. Collins. ECCV 2010.
Ivan Laptev's Space-Time Interest Points code
Hollywood activity dataset
UCF activity datasets
PASCAL VOC action recognition taster challenge
Greg Mori and Ivan Laptev's tutorial on action recognition at ECCV 2010
TRECVID video retrieval challenge
UMich Collective Activity dataset
Papers: Qiming, Yunsik
Expts: Lu Xia

V. Dealing with lots of data/categories

Nov 9 Scaling with a large number of categories
Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships
*Sharing Visual Features for Multiclass and Multiview Object Detection. A. Torralba, K. Murphy, W. Freeman. PAMI 2007.
*What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei. ECCV 2010.
*Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition. T. Gao and D. Koller. ICCV 2011.
Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010.
Learning and Using Taxonomies for Fast Visual Categorization. G. Griffin and P. Perona. CVPR 2008.
Cross-Generalization: Learning Novel Classes from a Single Example by Feature Replacement. CVPR 2005.
80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition. A. Torralba, R. Fergus, and W. Freeman. PAMI 2008.
Constructing Category Hierarchies for Visual Recognition. M. Marszalek and C. Schmid. ECCV 2008.
Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision, 2004.
Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis. CVPR 2007.
Exploiting Object Hierarchy: Combining Models from Different Category Levels. A. Zweig and D. Weinshall. ICCV 2007.
Incremental Learning of Object Detectors Using a Visual Shape Alphabet. Opelt, Pinz, and Zisserman. CVPR 2006.
Sequential Learning of Reusable Parts for Object Detection. S. Krempp, D. Geman, and Y. Amit. 2002.
ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. CVPR 2009.
Semantic Label Sharing for Learning with Many Categories. R. Fergus et al. ECCV 2010.
Learning a Tree of Metrics with Disjoint Visual Features. S. J. Hwang, K. Grauman, F. Sha. NIPS 2011.
SUN Scene dataset of 899 scene classes
ImageNet dataset of 15K objects and ImageNet challenge
Papers: Cho-Jui, Si Si
Expts: Lu Pan

Nov 16 Large-scale search and mining
Scalable retrieval algorithms for massive databases, mining for themes
*VisualRank: Applying PageRank to Large-Scale Image Search. Y. Jing and S. Baluja. PAMI 2008.
*Kernelized Locality Sensitive Hashing for Scalable Image Search. B. Kulis and K. Grauman. ICCV 2009.
*Video Mining with Frequent Itemset Configurations. T. Quack, V. Ferrari, and L. Van Gool. CIVR 2006.
Learning Binary Projections for Large-Scale Image Search. K. Grauman and R. Fergus. Chapter (draft) to appear in Registration, Recognition, and Video Analysis, R. Cipolla, S. Battiato, and G. Farinella, Editors.
World-scale Mining of Objects and Events from Community Photo Collections. T. Quack, B. Leibe, and L. Van Gool. CIVR 2008.
Interest Seam Image. X. Zhang, G. Hua, L. Zhang, H. Shum. CVPR 2010.
Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval. C. Lampert. ICCV 2009.
Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack. O. Chum, M. Perdoch, and J. Matas. CVPR 2009.
FaceTracer: A Search Engine for Large Collections of Images with Faces. N. Kumar, P. Belhumeur, and S. Nayar. ECCV 2008.
Efficiently Searching for Similar Images. K. Grauman. Communications of the ACM, 2009.
Fast Image Search for Learned Metrics. P. Jain, B. Kulis, and K. Grauman. CVPR 2008.
Small Codes and Large Image Databases for Recognition. A. Torralba, R. Fergus, and Y. Weiss. CVPR 2008.
Object Retrieval with Large Vocabularies and Fast Spatial Matching. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. CVPR 2007.
LSH homepage
Nearest Neighbor Methods in Learning and Vision. Shakhnarovich, Darrell, and Indyk, editors.
Papers: Naga, Jae
Expts: Si Si

Nov 23 Summarization
Video synopsis, discovering repeated objects, visualization
*Webcam Synopsis: Peeking Around the World. Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg. ICCV 2007.
*Using Multiple Segmentations to Discover Objects and their Extent in Image Collections. B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman. CVPR 2006.
*Summarizing Visual Data Using Bi-Directional Similarity. D. Simakov, Y. Caspi, E. Shechtman, M. Irani. CVPR 2008.
Fast Unsupervised Ego-Action Learning for First-Person Sports Video. K. Kitani, T. Okabe, Y. Sato, A. Sugimoto. CVPR 2011.
Scene Summarization for Online Image Collections. I. Simon, N. Snavely, S. Seitz. ICCV 2007.
VideoCut: Removing Irrelevant Frames by Discovering the Object of Interest. D. Liu, G. Hua, T. Chen. ECCV 2010.
Video Epitomes. V. Cheung, B. J. Frey, and N. Jojic. CVPR 2005.
Making a Long Video Short. A. Rav-Acha, Y. Pritch, and S. Peleg. CVPR 2006.
Structural Epitome: A Way to Summarize One's Visual Experience. N. Jojic, A. Perina, V. Murino. NIPS 2010.
Video Abstraction: A Systematic Review and Classification. B. Truong and S. Venkatesh. ACM 2007.
Shape Discovery from Unlabeled Image Collections. Y. J. Lee and K. Grauman. CVPR 2009.
Detecting and Sketching the Common. S. Bagon, O. Brostovski, M. Galun, M. Irani. CVPR 2010.
Object-Graphs for Context-Aware Category Discovery. Y. J. Lee and K. Grauman. CVPR 2010.
Unsupervised Object Discovery: A Comparison. T. Tuytelaars et al. IJCV 2009.
Papers: Lu Pan
Expts: Sunil, Chris
Final paper drafts due Wed Nov 23

Nov 30 Final project presentations in class
Final papers due Tues Dec 6, 5 PM
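The previous two posts covered background modeling methods that use pixel values as features: "Pixel-value-based methods (2)" and "Pixel-value-based methods (1)". This post covers methods that use texture as the feature; the best known are built on LBP and SILTP.

The LBP-based paper is "A texture-based method for modeling the background and detecting moving objects". It appeared in TPAMI in 2006 and again comes from the University of Oulu group, who have pushed LBP about as far as it will go; LBP has by now been applied in nearly every area of computer vision. The first step is to compute the LBP code; the formula is: (formula image not preserved). Once the features are represented, the background model is built as a set of K models. The model update rule is: (formula image not preserved). The similarity between two histograms is measured with histogram intersection. The experiments are fairly thorough, but the comparison is thin: the method is compared only against the Gaussian mixture model (GMM).

Another good texture-based background modeling method is the CVPR 2010 paper "Modeling Pixel Process with Scale Invariant Local Patterns for Background Subtraction in Complex Scenes". It proposes a new texture representation, the scale invariant local ternary pattern (SILTP), and for updating the background model it proposes pattern kernel density estimation. The comparison experiments are also comprehensive: three methods are tested on nine video sequences.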
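The LBP pipeline described above (code each pixel, build a histogram, compare by intersection, blend the model toward new observations) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes the basic 8-neighbour, radius-1 LBP with plain thresholding, and the function names and the learning rate `alpha` are hypothetical choices for this sketch.

```python
def lbp_code(img, r, c):
    """Return the 8-neighbour, radius-1 LBP code of pixel (r, c).

    img is a 2-D list of gray values; (r, c) must not lie on the border.
    """
    center = img[r][c]
    # Clockwise from top-left; the order only decides which bit each neighbour sets.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:  # threshold neighbour against the centre
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over the interior pixels."""
    hist = [0.0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def update_model(model, hist, alpha=0.05):
    """Blend a model histogram toward a new observation (alpha is an assumed rate)."""
    return [alpha * h + (1.0 - alpha) * m for h, m in zip(hist, model)]
```

In a background subtractor one would compare the current histogram against each of the K model histograms and treat the pixel as background when the best intersection exceeds a threshold; that threshold is a tuning parameter not given in this post.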
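I have done some work on background modeling and moving-object detection and want to summarize it, so let me start with this CVPR 2011 evaluation paper.

Evaluation of Background Subtraction Techniques for Video Surveillance (PDF)
Sebastian Brutzer, Benjamin Hoeferlin (University of Stuttgart), Gunther Heidemann (University of Stuttgart)

Project page: http://www.vis.uni-stuttgart.de/index.php?id=sabs
The latest dataset and some evaluation code can be downloaded there (note: evaluation code, not code for the background modeling methods themselves).

The paper compares a number of recent background modeling methods. The methods compared are: (table not preserved)

In my view the paper made it into a top venue like CVPR mainly for the following reasons:
1. The authors released a public dataset, and because the dataset is synthetic, it is convenient for quantitatively evaluating other methods.
2. The workload looks large on the surface. I say "on the surface" because, although the authors compare nine methods, source code for almost all of them is available online (integrated in OpenCV), and some others have public executables, so the actual workload is not that large. Look at the Features column: essentially every method uses color as the feature, and the authors ignore texture-based background modeling. The LBP group at the University of Oulu published texture-based background modeling back in 2006, and in PAMI at that, so the authors can hardly be unaware of it. Which of the nine compared methods was published in PAMI? There is also a CVPR 2010 paper from Stan Li's group that uses texture, and both papers report very good results. I do not know why the authors left them out. I will cover these two classic methods in later posts.
3. The analysis is decent; it seems every CVPR paper has good analysis.

The authors identify the following challenges for background modeling:
Gradual illumination changes: It is desirable that background model adapts to gradual changes of the appearance of the environment. For example in outdoor settings, the light intensity typically varies during day.
Sudden illumination changes: Sudden once-off changes are not covered by the background model. They occur for example with sudden switch of light, strongly affect the appearance of background, and cause false positive detections.
Dynamic background: Some parts of the scenery may contain movement, but should be regarded as background, according to their relevance. Such movement can be periodical or irregular (e.g., traffic lights, waving trees).
Camouflage: Intentionally or not, some objects may poorly differ from the appearance of background, making correct classification difficult. This is especially important in surveillance applications.
Shadows: Shadows cast by foreground objects often complicate further processing steps subsequent to background subtraction. Overlapping shadows of foreground regions for example hinder their separation and classification. Hence, it is preferable to ignore these irrelevant regions.
Bootstrapping: If initialization data which is free from foreground objects is not available, the background model has to be initialized using a bootstrapping strategy.
Video noise: Video signal is generally superimposed by noise. Background subtraction approaches for video surveillance have to cope with such degraded signals affected by different types of noise, such as sensor noise or compression artifacts.

Evaluation results: (figure not preserved)

It is worth noting that the Barnich method does very well on both speed and accuracy. The paper includes pseudocode, and the authors' homepage provides an executable that can be integrated into your own program.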
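The paper Locality-constrained Linear Coding for Image Classification, which Jianchao Yang co-authored (an introduction to it is at http://blog.sina.com.cn/s/blog_631a4cc40100wdul.html ), builds on two earlier papers: Linear Spatial Pyramid Matching using Sparse Coding for Image Classification (CVPR 2009) and Nonlinear Dimensionality Reduction by Locally Linear Embedding (LLE). Both are introduced below.

Linear Spatial Pyramid Matching using Sparse Coding for Image Classification (CVPR 2009) is also Jianchao Yang's work, and its code is public. It has already been cited 175 times by others, which is a good count for a paper only three years old. The core formula of the paper is: (formula image not preserved). The first term constrains the reconstruction error; the second term uses the L1 norm as an approximation of the L0 norm to enforce sparsity. After that come some basic operations: max pooling and the image pyramid. This formulation addresses two problems: reconstructing each feature from several codewords reduces the reconstruction error, and using a linear SVM reduces training time. The objective is optimized with Honglak Lee's code.

Nonlinear Dimensionality Reduction by Locally Linear Embedding (LLE) is a classic of manifold learning. It was published in Science in 2000 and has been cited 4550 times by others. Its author Sam T. Roweis was a leading researcher who made his name early; he sadly died in 2010. The idea of the paper is fairly simple: reconstruct each point from a few of its nearest neighbours, keep only the relative relationships among them, and reduce the dimension on that basis.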
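To make the two-term objective described above concrete, here is a small sketch that encodes one descriptor against a fixed codebook and then max-pools a set of codes. It assumes the usual form of the objective, $\min_u \lVert x - \sum_k u_k v_k \rVert^2 + \lambda \sum_k |u_k|$, and it uses plain ISTA (iterative soft-thresholding) as a stand-in solver; the post says the actual work used Honglak Lee's code, which is a different algorithm. All function names here are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; this is the proximal step for the L1 term."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, codebook, lam, steps=500):
    """Approximately minimise ||x - sum_k u[k]*codebook[k]||^2 + lam*sum_k |u[k]|.

    x is a list of D floats; codebook is a list of K codewords (each D floats,
    not all zero). Uses ISTA with a conservative fixed step size.
    """
    K, D = len(codebook), len(x)
    # 2 * squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(v * v for word in codebook for v in word)
    step = 1.0 / lipschitz
    u = [0.0] * K
    for _ in range(steps):
        recon = [sum(u[k] * codebook[k][d] for k in range(K)) for d in range(D)]
        resid = [recon[d] - x[d] for d in range(D)]
        grad = [2.0 * sum(resid[d] * codebook[k][d] for d in range(D))
                for k in range(K)]
        u = [soft_threshold(u[k] - step * grad[k], step * lam) for k in range(K)]
    return u


def max_pool(codes):
    """Pool a list of sparse codes into one vector by max absolute response."""
    return [max(abs(code[k]) for code in codes) for k in range(len(codes[0]))]
```

With an orthonormal codebook the solution is just a soft-thresholded copy of the input, which makes the sparsity effect easy to see: small coefficients are set exactly to zero rather than merely shrunk.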
The original SIFT paper is Distinctive Image Features from Scale-Invariant Keypoints by David G. Lowe, published in IJCV in 2004. It is a milestone piece of work and has already been cited 12239 times by others. The author had at least ten years of accumulated work in this area behind such an outstanding paper, which is a hint that we should also keep working steadily in one direction.

Nearly all current image classification papers use SIFT as the low-level feature, so a classic like this is worth understanding. The IJCV paper is long, yet some details are still not spelled out, so it helps to read the code written by Rob Hess (available for download). The code is tidy, very easy to follow, and written in C.

SIFT Library
The Scale Invariant Feature Transform (SIFT) is a method to detect distinctive, invariant image feature points, which easily can be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images. The open-source SIFT library available here is implemented in C using the OpenCV open-source computer vision library and includes functions for computing SIFT features in images, matching SIFT features between images using kd-trees, and computing geometrical image transforms from feature matches using RANSAC. The library also includes functionality to import and work with image features from both David Lowe's SIFT detector and the Oxford VGG's affine covariant feature detectors. The images below depict some of this functionality.

For experiments, though, I still recommend VLFeat from the University of Oxford: it is fast and performs well.

The VLFeat open source library implements popular computer vision algorithms including SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, and quick shift. It is written in C for efficiency and compatibility, with interfaces in MATLAB for ease of use, and detailed documentation throughout. It supports Windows, Mac OS X, and Linux. The latest version of VLFeat is 0.9.13.
The original reference for the FastMap method is [1]. Its biggest advantage is speed: its time complexity is O(p * n), where p is the dimension of the target space and n is the number of objects being embedded or mapped. Similar methods include MetricMap [2] (which I personally find rather hard to understand) and Landmark MDS [3], and Platt has shown that all three reduce to the Nystrom method [4].

From an implementation point of view, FastMap and Landmark MDS are both easy, while MetricMap may take more effort. I implemented FastMap with reference to [5]; the source code is in fastmap.py. Compared with [5], it adds handling of out-of-sample objects, and it does so with one unified procedure.

In my view FastMap is fast but not very accurate (I have run a large number of experiments and will post the results later). It can, however, be combined with MDS. A good way to initialize MDS at present is classical MDS, but classical MDS is very slow, so the output of FastMap, MetricMap or Landmark MDS can be used to initialize MDS instead, and MDS can then iterate toward the optimum (in general a local optimum, of course).

References:
[1] Christos Faloutsos and King-Ip (David) Lin, 1995. FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. Proceedings of the ACM SIGMOD International Conference on Management of Data, Michael J. Carey and Donovan A. Schneider, eds., San Jose, California, 163-174.
[2] Jason Tsong-Li Wang, Xiong Wang, Dennis Shasha, Kaizhong Zhang, 2005. MetricMap: An Embedding Technique for Processing Distance-based Queries in Metric Spaces. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 35, No. 5, pp. 973-987.
[3] Vin de Silva, Joshua B. Tenenbaum, 2004. Sparse Multidimensional Scaling using Landmark Points. Technical Report, Stanford University.
[4] John C. Platt, 2005. FastMap, MetricMap, and Landmark MDS are all Nystrom Algorithms. Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pp. 261-268.
[5] http://gromgull.net/blog/2009/08/fastmap-in-python/
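For readers who have not seen FastMap, the following is a rough sketch of the idea, including one possible way to place an out-of-sample object using the stored pivots. It is not the fastmap.py mentioned above: the function names, the pivot heuristic (start at a random object, then jump twice to the farthest one) and the `project` helper are assumptions made for illustration. Each of the p coordinates needs a constant number of passes over the n objects, which is where the O(p * n) cost comes from.

```python
import math
import random


def fastmap(objects, dist, p, seed=0):
    """Embed objects into at most p dimensions using only pairwise distances.

    Returns (coords, pivots): coords[i] is the p-vector of object i, and
    pivots[k] is the (a, b) index pair that defines axis k.
    """
    n = len(objects)
    coords = [[0.0] * p for _ in range(n)]
    pivots = []
    rng = random.Random(seed)

    def d2(i, j, k):
        # Squared distance left over after removing the first k coordinates.
        v = dist(objects[i], objects[j]) ** 2
        v -= sum((coords[i][t] - coords[j][t]) ** 2 for t in range(k))
        return max(v, 0.0)

    for k in range(p):
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: d2(a, j, k))
        a = max(range(n), key=lambda j: d2(b, j, k))
        dab2 = d2(a, b, k)
        if dab2 == 0.0:
            break  # all residual distances are zero; remaining axes stay 0
        pivots.append((a, b))
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][k] = (d2(a, i, k) + dab2 - d2(b, i, k)) / (2.0 * dab)
    return coords, pivots


def project(q, objects, coords, pivots, dist):
    """Place a new object q using the same pivots (illustrative helper)."""
    out = []
    for k, (a, b) in enumerate(pivots):
        def res2(i):
            v = dist(q, objects[i]) ** 2
            v -= sum((out[t] - coords[i][t]) ** 2 for t in range(k))
            return max(v, 0.0)
        dab2 = dist(objects[a], objects[b]) ** 2
        dab2 -= sum((coords[a][t] - coords[b][t]) ** 2 for t in range(k))
        out.append((res2(a) + dab2 - res2(b)) / (2.0 * math.sqrt(dab2)))
    return out
```

The resulting coordinates could then serve as the starting configuration for an iterative MDS solver, as suggested in the post.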
Before we start discussing the topic of a hybrid NLP (Natural Language Processing) system, let us look at the concept of hybrid from our life experiences. I drove a classic Camry for years and had never thought of changing to another brand because, as a vehicle, there was really nothing to complain about. Yes, the style is old, but I am getting old too, so who beats whom? That lasted until one day a few years ago, when we needed to buy a new car to replace my damaged Camry. My daughter suggested a hybrid, following the trend of going green. So I ended up driving a Prius and have been in love with it ever since. It is quiet, with Bluetooth and line-in, ideal for enjoying my iPhone music. It has low emissions and I can finally say goodbye to smog tests. It saves at least a third on gas. We could have gained all these benefits by purchasing an expensive all-electric car, but I want the same feel of power on the freeway and dislike the idea of having to charge the car too frequently. A hybrid gets me the best of both worlds, and is not much more expensive. Now back to NLP. There are two major approaches to NLP, namely machine learning and grammar engineering (or hand-crafted rule systems). As mentioned in previous posts, each has its own strengths and limitations, as summarized below. In general, a rule system is good at capturing a specific language phenomenon (trees) while machine learning is good at representing the general picture of the phenomena (forest). As a result, it is easier for rule systems to reach high precision, but it takes a long time to develop enough rules to gradually raise the recall. Machine learning, on the other hand, has much higher recall, usually with a compromise in precision or with a precision ceiling. Machine learning is good at simple, clear and coarse-grained tasks while rules are good at fine-grained tasks. One example is sentiment extraction.
The coarse-grained task there is sentiment classification of documents (thumbs-up or thumbs-down), which can be achieved fast by a learning system. The fine-grained task for sentiment extraction involves extraction of sentiment details and the related actionable insights, including association of the sentiment with an object, differentiating positive/negative emotions from positive/negative behaviors, capturing the aspects or features of the object involved, decoding the motivation or reasons behind the sentiment, etc. In order to perform sophisticated tasks of extracting such details and actionable insights, rules are a better fit. The strength of machine learning lies in its retraining ability. In theory, the algorithm, once developed and debugged, remains stable, and improvement of a learning system can be expected once a larger and better-quality corpus is used for retraining (in practice, retraining is not always easy: I have seen famous learning systems deployed at client sites for years without being retrained, for various reasons). Rules, on the other hand, need to be manually crafted and enhanced. Supervised machine learning is more mature for applications but it requires a large labelled corpus. Unsupervised machine learning only needs a raw corpus, but it is research oriented and more risky in application. A promising approach is semi-supervised learning, which only needs a small labelled corpus as seeds to guide the learning. We can also use rules to generate the initial corpus or seeds for semi-supervised learning. Both approaches involve knowledge bottlenecks. The bottleneck for rule systems is skilled labor: they require linguists or knowledge engineers to manually encode each rule, much like a software engineer in the daily work of coding. The biggest challenge to machine learning is the sparse data problem, which requires a very large labelled corpus to help overcome.
The knowledge bottleneck for supervised machine learning is the labor required for labeling such a large corpus. We can build a system to combine the two approaches to complement each other. There are different ways of combining the two approaches in a hybrid system. One example is the practice we use in our product, where the results of insights are structured in a back-off model: high precision results from rules are ranked higher than the medium precision results returned by statistical systems or machine learning. This helps the system to reach a configurable balance between precision and recall. When labelled data are available (e.g. the community has already built the corpus, or for some tasks the public domain has the data, e.g. sentiment classification of movie reviews can use the review data with users' feedback on a 5-star scale), and when the task is simple and clearly defined, using machine learning will greatly speed up the development of a capability. Not every task is suitable for both approaches. (Note that suitability is in the eye of the beholder: I have seen many passionate ML specialists willing to try everything in ML irrespective of the nature of the task; as the old saying goes, when you have a hammer, everything looks like a nail.) For example, machine learning is good at document classification while rules are mostly powerless for such tasks. But for complicated tasks such as deep parsing, rules constructed by linguists usually achieve better performance than machine learning. Rules also perform better for tasks which have clear patterns, for example, identifying data items like time, weight, length, money, address etc. This is because clear patterns can be directly encoded in rules to be logically complete in coverage, while machine learning based on samples still has a sparse data challenge.
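The back-off arrangement described above, where rule output is ranked first and learned output fills in the rest, might look something like the following toy sketch. Nothing here reflects the actual product: the regular expression, the result fields and the function names are all hypothetical, and the "ML results" are simply passed in as scored dictionaries.

```python
import re


def rule_sentiment(text):
    """Toy high-precision rule: fires only on an explicit 'love X' / 'hate X' pattern."""
    found = []
    for m in re.finditer(r"\b(love|hate)s? (?:the |my )?(\w+)", text.lower()):
        polarity = "positive" if m.group(1) == "love" else "negative"
        found.append({"object": m.group(2), "polarity": polarity})
    return found


def backoff_merge(rule_results, ml_results):
    """Rank rule results first; back off to ML results only for uncovered objects."""
    ranked = [dict(r, source="rule") for r in rule_results]
    covered = {r["object"] for r in rule_results}
    # Among ML results, prefer the higher-scoring ones.
    for r in sorted(ml_results, key=lambda r: r["score"], reverse=True):
        if r["object"] not in covered:
            ranked.append(dict(r, source="ml"))
            covered.add(r["object"])
    return ranked
```

Truncating the ranked list earlier favours precision, and keeping more of the ML tail favours recall, which is one way to read the "configurable balance" mentioned in the text.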
When designing a system, in addition to using a hybrid approach for some tasks, we should for other tasks choose the most suitable approach depending on the nature of the tasks. Other aspects of comparison between the two approaches involve modularization and debugging in industrial development. A rule system can be structured as a pipeline of modules fairly easily, so that a complicated task is decomposed into a series of subtasks handled by different levels of modules. In such an architecture, a reported bug is easy to localize and fix by adjusting the rules in the related module. Machine learning systems are based on the learned model trained from the corpus. The model itself, once learned, is often like a black box (even when the model is represented by a list of symbolic rules as results of learning, it is risky to manually mess with the rules in fixing a data quality bug). Bugs are supposed to be fixable during retraining of the model based on an enhanced corpus and/or by adjusting features. But retraining is a complicated process which may or may not solve the problem. It is difficult to localize and directly handle specific reported bugs in machine learning. To conclude, due to the complementary pros and cons of the two basic approaches to NLP, a hybrid system involving both approaches is desirable and worth more attention and exploration. There are different ways of combining the two approaches in a system, including a back-off model using rules for precision and learning for recall, semi-supervised learning using high precision rules to generate the initial corpus or "seeds", etc. Related posts: Comparison of Pros and Cons of Two NLP Approaches; Is Google ranking based on machine learning?; 《立委随笔:语言自动分析的两个路子》; 《立委随笔:机器学习和自然语言处理》; 【置顶:立委科学网博客NLP博文一览(定期更新版)】
AI: http://www.douban.com/online/10918517/
Start: Monday, October 10, 2011, 09:00. End: Friday, December 16, 2011, 09:00.
Stanford will launch the online course "Artificial Intelligence" this October. All content is published online, including video lectures, homework and exams. Those who complete the course will receive a certificate. More than 560,000 students worldwide have already signed up. The course is entirely in English, but YouTube offers automatically translated Chinese subtitles (not very accurate). This event is meant to give people who want to take the course a place to exchange ideas. Please spread the word. Details at http://www.ai-class.com/

A bold experiment in distributed education, "Introduction to Artificial Intelligence" will be offered free and online to students worldwide during the fall of 2011. The course will include feedback on progress and a statement of accomplishment. Taught by Sebastian Thrun and Peter Norvig, the curriculum draws from that used in Stanford's introductory Artificial Intelligence course. The instructors will offer similar materials, assignments, and exams. Artificial Intelligence is the science of making computer software that reasons about the world around it. Humanoid robots, Google Goggles, self-driving cars, even software that suggests music you might like to hear are all examples of AI. In this class, you will learn how to create this software from two of the leaders in the field. Class begins October 10. Details on the course, including a syllabus, are available here. Sign up above to receive additional information about participating in the online version when it becomes available.

ML: http://www.douban.com/online/10918628/
Start: Monday, October 10, 2011, 07:00. End: Tuesday, December 6, 2011, 06:00.
Stanford will offer the public online course "Machine Learning" this October. All content is published online free of charge, including video lectures, homework and exams. Those who complete the course will receive a certificate. The course is entirely in English; YouTube offers automatically translated Chinese subtitles. This event is meant to give people who want to take the course a place to exchange ideas. Please spread the word. Details at http://www.ml-class.com/

Course Description
This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning).
(iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). (iv) Reinforcement learning. The course will also draw from numerous case studies and applications, so that you'll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
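While writing my paper these past two days, I had planned to plot a precision-recall curve (PRC). The PRC comes from information retrieval, and what I am doing resembles retrieval, so I wanted this plot. To my surprise it turned out to be hard to draw. I dug through a lot of material and still could not get it right, which is a pity for such an important, good-looking, and practical plot. So today I spent some time building a toolkit of my own that computes the precision-recall curve and the confusion matrix for multi-class problems and visualizes them.
Drawing the precision-recall curve for each class takes some care. For a two-class problem, such as retrieval, you usually end up computing a single (recall, precision) pair, but one pair is one point, and a point does not make a curve. In practice the curve is drawn as follows:
1. Split the samples into a positive and a negative class; a multi-class problem can be mapped to this form one class at a time.
2. Sort the samples by decision value in ascending order (in classification every sample has some probability or confidence supporting its class, such as the dec_values matrix in libsvm).
3. Compute precision and recall for all samples, then all samples minus 1, all samples minus 2, and so on until none remain. Each step yields one (recall, precision) pair, and together the pairs form the curve. This description is rough; see my code for details. Some functions are adapted from other people's code, as noted in the source.

                              correct result / classification
                              E1                     E2
obtained result /    E1       tp (true positive)     fp (false positive)
classification       E2       fn (false negative)    tn (true negative)

Precision and recall are then defined as:
$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn}$$
Recall in this context is also referred to as the True Positive Rate; other related measures used in classification include True Negative Rate and Accuracy:
$$\text{True Negative Rate} = \frac{tn}{tn + fp}, \qquad \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$
True Negative Rate is also called Specificity.
My code package for computing these: PG_Curve.zip: Matlab code for computing and visualization: Confusion Matrix, Precision/Recall Curve, ROC, Accuracy, F-Measure etc. for Classification.
In the plot, the jagged red line is the raw curve; the green line is the result of someone else's smoothing algorithm.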
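The three plotting steps above can be sketched in a few lines of Python. This is not the PG_Curve Matlab code, just a minimal illustration under assumptions: labels are 1 for the positive class and 0 otherwise (a one-vs-rest mapping for multi-class data), tied scores are not treated specially, and the function name is made up for this example.

```python
from typing import List, Tuple

def precision_recall_points(labels: List[int],
                            scores: List[float]) -> List[Tuple[float, float]]:
    """Return one (recall, precision) pair per cut-off.

    labels: 1 for the positive class, 0 for the negative class.
    scores: decision values; higher means more confidently positive.
    """
    # Step 2: sort samples by decision value, ascending.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranked = [labels[i] for i in order]

    total_pos = sum(ranked)
    tp = total_pos                # with every sample predicted positive, all positives are hits
    predicted_pos = len(ranked)
    points = []

    # Step 3: all samples, then drop the lowest-scored sample one at a time.
    for label in ranked:
        precision = tp / predicted_pos
        recall = tp / total_pos if total_pos else 0.0
        points.append((recall, precision))
        tp -= label               # dropping a positive sample loses one true positive
        predicted_pos -= 1
    return points

if __name__ == "__main__":
    for recall, precision in precision_recall_points(
            [0, 1, 0, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.9]):
        print(f"recall={recall:.3f} precision={precision:.3f}")
```

Plotting the returned pairs with recall on the x-axis gives the raw, jagged curve; any smoothing is a separate step.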
From: http://as.wiley.com/WileyCDA/WileyTitle/productCd-0470749911,descCd-description.html
Cluster Analysis, 5th Edition. Brian S. Everitt, Dr Sabine Landau, Dr Morven Leese, Dr Daniel Stahl. ISBN: 978-0-470-74991-3. Hardcover, 336 pages, March 2011. Wiley List Price: US $95.00.
Description: This edition provides a thorough revision of the fourth edition which focuses on the practical aspects of cluster analysis and covers new methodology in terms of longitudinal data and provides examples from bioinformatics. Real life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. This book includes an appendix of getting started on cluster analysis using R, as well as a comprehensive and up-to-date bibliography.
Table of Contents
Preface
Acknowledgement
1 An introduction to classification and clustering: 1.1 Introduction 1.2 Reasons for classifying 1.3 Numerical methods of classification - cluster analysis 1.4 What is a cluster? 1.5 Examples of the use of clustering 1.6 Summary
2 Detecting clusters graphically: 2.1 Introduction 2.2 Detecting clusters with univariate and bivariate plots of data 2.3 Using lower-dimensional projections of multivariate data for graphical representations 2.4 Three-dimensional plots and trellis graphics 2.5 Summary
3 Measurement of proximity: 3.1 Introduction 3.2 Similarity measures for categorical data 3.3 Dissimilarity and distance measures for continuous data 3.4 Similarity measures for data containing both continuous and categorical variables 3.5 Proximity measures for structured data 3.6 Inter-group proximity measures 3.7 Weighting variables 3.8 Standardization 3.9 Choice of proximity measure 3.10 Summary
4 Hierarchical clustering: 4.1 Introduction 4.2 Agglomerative methods 4.3 Divisive methods 4.4 Applying the hierarchical clustering process 4.5 Applications of hierarchical methods 4.6 Summary
5 Optimization clustering techniques: 5.1 Introduction 5.2 Clustering criteria derived from the dissimilarity matrix 5.3 Clustering criteria derived from continuous data 5.4 Optimization algorithms 5.5 Choosing the number of clusters 5.6 Applications of optimization methods 5.7 Summary
6 Finite mixture densities as models for cluster analysis: 6.1 Introduction 6.2 Finite mixture densities 6.3 Other finite mixture densities 6.4 Bayesian analysis of mixtures 6.5 Inference for mixture models with unknown number of components and model structure 6.6 Dimension reduction - variable selection in finite mixture modelling 6.7 Finite regression mixtures 6.8 Software for finite mixture modelling 6.9 Some examples of the application of finite mixture densities 6.10 Summary
7 Model-based cluster analysis for structured data: 7.1 Introduction 7.2 Finite mixture models for structured data 7.3 Finite mixtures of factor models 7.4 Finite mixtures of longitudinal models 7.5 Applications of finite mixture models for structured data 7.6 Summary
8 Miscellaneous clustering methods: 8.1 Introduction 8.2 Density search clustering techniques 8.3 Density-based spatial clustering of applications with noise 8.4 Techniques which allow overlapping clusters 8.5 Simultaneous clustering of objects and variables 8.6 Clustering with constraints 8.7 Fuzzy clustering 8.8 Clustering and artificial neural networks 8.9 Summary
9 Some final comments and guidelines: 9.1 Introduction 9.2 Using clustering techniques in practice 9.3 Testing for absence of structure 9.4 Methods for comparing cluster solutions 9.5 Internal cluster quality, influence and robustness 9.6 Displaying cluster solutions graphically 9.7 Illustrative examples 9.8 Summary
Bibliography
Index
NICTA open-sources elefant. Posted February 28, 2010 by cvchina.
NICTA (National ICT Australia), an independent Australian company, recently open-sourced elefant (Efficient Learning, Large-scale Inference, and Optimisation Toolkit). elefant is similar to Weka and provides many machine learning and data mining algorithms. Even better, it is commercial grade.
About elefant: Elefant (Efficient Learning, Large-scale Inference, and Optimisation Toolkit) is an open source library for machine learning licensed under the Mozilla Public License (MPL). We develop an open source machine learning toolkit which provides algorithms for machine learning utilising the power of multi-core/multi-threaded processors/operating systems (Linux, Windows, Mac OS X), a graphical user interface for users who want to quickly prototype machine learning experiments, tutorials to support learning about Statistical Machine Learning (Statistical Machine Learning at The Australian National University), and detailed and precise documentation for each of the above.
About NICTA: NICTA (National ICT Australia) is Australia's Information and Communications Technology (ICT) Centre of Excellence. We are an independent company in the business of research, commercialisation and research training. With over 700 people, NICTA is the largest organisation in Australia dedicated to ICT research.
Besides elefant, NICTA has released a good deal of other open-source software; details are on OpenNICTA. One release deserves a mention: a pedestrian dataset containing 25k+ pedestrian images. Good news for anyone working on pedestrian detection.
From: http://www.cse.ust.hk/~sinnopan/conferenceTL.htm List of Conferences and Workshops Where Transfer Learning Paper Appear This webpage will be updated regularly. Main Conferences Machine Learning and Artificial Intelligence Conferences AAAI 2008 Transfer Learning via Dimensionality Reduction Transferring Localization Models across Space Transferring Localization Models over Time Transferring Multi-device Localization Models using Latent Multi-task Learning Text Categorization with Knowledge Transfer from Heterogeneous Data Sources Zero-data Learning of New Tasks 2007 Transferring Naive Bayes Classifiers for Text Classification Mapping and Revising Markov Logic Networks for Transfer Learning Measuring the Level of Transfer Learning by an AP Physics Problem-Solver 2006 Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping IJCAI 2009 Transfer Learning Using Task-Level Features with Application to Information Retrieval Transfer Learning from Minimal Target Data by Mapping across Relational Domains Domain Adaptation via Transfer Component Analysis Knowledge Transfer on Hybrid Graph Manifold Alignment without Correspondence Robust Distance Metric Learning with Auxiliary Knowledge Can Movies and Books Collaborate? 
Cross-Domain Collaborative Filtering for Sparsity Reduction Exponential Family Sparse Coding with Application to Self-taught Learning 2007 Learning and Transferring Action Schemas General Game Learning Using Knowledge Transfer Building Portable Options: Skill Transfer in Reinforcement Learning Transfer Learning in Real-Time Strategy Games Using Hybrid CBR/RL An Experts Algorithm for Transfer Learning Transferring Learned Control-Knowledge between Planners Effective Control Knowledge Transfer through Learning Skill and Representation Hierarchies Efficient Bayesian Task-Level Transfer Learning ICML 2009 Deep Transfer via Second-Order Markov Logic Feature Hashing for Large Scale Multitask Learning A Convex Formulation for Learning Shared Structures from Multiple Tasks EigenTransfer: A Unified Framework for Transfer Learning Domain Adaptation from Multiple Sources via Auxiliary Classifiers Transfer Learning for Collaborative Filtering via a Rating-Matrix Generative Model 2008 Bayesian Multiple Instance Learning: Automatic Feature Selection and Inductive Transfer Multi-Task Learning for HIV Therapy Screening Self-taught Clustering Manifold Alignment using Procrustes Analysis Automatic Discovery and Transfer of MAXQ Hierarchies Transfer of Samples in Batch Reinforcement Learning Hierarchical Kernel Stick-Breaking Process for Multi-Task Image Analysis Multi-Task Compressive Sensing with Dirichlet Process Priors A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning 2007 Boosting for Transfer Learning Self-taught Learning: Transfer Learning from Unlabeled Data Robust Multi-Task Learning with t-Processes Multi-Task Learning for Sequential Data via iHMMs and the Nested Dirichlet Process Cross-Domain Transfer for Reinforcement Learning Learning a Meta-Level Prior for Feature Relevance from Multiple Related Tasks Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach The Matrix Stick-Breaking Process for Flexible 
Multi-Task Learning Asymptotic Bayesian Generalization Error When Training and Test Distributions Are Different Discriminative Learning for Differing Training and Test Distributions 2006 Autonomous Shaping: Knowledge Transfer in Reinforcement Learning Constructing Informative Priors using Transfer Learning NIPS 2008 Clustered Multi-Task Learning: A Convex Formulation Multi-task Gaussian Process Learning of Robot Inverse Dynamics Transfer Learning by Distribution Matching for Targeted Advertising Translated Learning: Transfer Learning across Different Feature Spaces An empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis Domain Adaptation with Multiple Sources 2007 Learning Bounds for Domain Adaptation Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations A Spectral Regularization Framework for Multi-Task Structure Learning Multi-task Gaussian Process Prediction Semi-Supervised Multitask Learning Gaussian Process Models for Link Analysis and Transfer Learning Multi-Task Learning via Conic Programming Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation 2006 Correcting Sample Selection Bias by Unlabeled Data Dirichlet-Enhanced Spam Filtering based on Biased Samples Analysis of Representations for Domain Adaptation Multi-Task Feature Learning AISTAT 2009 A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation 2007 Kernel Multi-task Learning using Task-specific Features Inductive Transfer for Bayesian Network Structure Learning ECML/PKDD 2009 Relaxed Transfer of Different Classes via Spectral Partition Feature Selection by Transfer Learning with Linear Regularized Models Semi-Supervised Multi-Task Regression 2008 Actively Transfer Domain Knowledge An Algorithm for Transfer Learning in a Heterogeneous Environment Transferred Dimensionality Reduction Modeling Transfer Relationships between Learning Tasks for Improved Inductive 
Transfer Kernel-Based Inductive Transfer 2007 Graph-Based Domain Mapping for Transfer Learning in General Games Bridged Refinement for Transfer Learning Transfer Learning in Reinforcement Learning Problems Through Partial Policy Recycling Domain Adaptation of Conditional Probability Models via Feature Subsetting 2006 Skill Acquisition via Transfer Learning and Advice Taking COLT 2009 Online Multi-task Learning with Hard Constraints Taking Advantage of Sparsity in Multi-Task Learning Domain Adaptation: Learning Bounds and Algorithms 2008 Learning coordinate gradients with multi-task kernels Linear Algorithms for Online Multitask Classification 2007 Multitask Learning with Expert Advice 2006 Online Multitask Learning UAI 2009 Bayesian Multitask Learning with Latent Hierarchies Multi-Task Feature Learning Via Efficient L2,1-Norm Minimization 2008 Convex Point Estimation using Undirected Bayesian Transfer Hierarchies Data Mining Conferences KDD 2009 Cross Domain Distribution Adaptation via Kernel Mapping Extracting Discriminative Concepts for Domain Adaptation in Text Mining 2008 Spectral domain-transfer learning Knowledge transfer via multiple model local structure mapping 2007 Co-clustering based Classification for Out-of-domain Documents 2006 Reverse Testing: An Efficient Framework to Select Amongst Classifiers under Sample Selection Bias ICDM 2008 Unsupervised Cross-domain Learning by Interaction Information Co-clustering Using Wikipedia for Co-clustering Based Cross-domain Text Classification SDM 2008 Type-Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation 2007 On Sample Selection Bias and Its Efficient Correction via Model Averaging and Unlabeled Examples Probabilistic Joint Feature Selection for Multi-task Learning Application Conferences SIGIR 2009 Mining Employment Market via Text Block Detection and Adaptive Cross-Domain Information Extraction 
Knowledge transformation for cross-domain sentiment classification 2008 Topic-bridged PLSA for cross-domain text classification 2007 Cross-Lingual Query Suggestion Using Query Logs of Different Languages 2006 Tackling Concept Drift by Temporal Inductive Transfer Constructing Informative Prior Distributions from Domain Knowledge in Text Classification Building Bridges for Web Query Classification WWW 2009 Latent Space Domain Transfer between High Dimensional Overlapping Distributions 2008 Can Chinese web pages be classified with English data source? ACL 2009 Transfer Learning, Feature Selection and Word Sense Disambiguation Graph Ranking for Sentiment Transfer Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction Cross-Domain Dependency Parsing Using a Deep Linguistic Grammar Heterogeneous Transfer Learning for Image Clustering via the SocialWeb 2008 Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition Multi-domain Sentiment Classification Active Sample Selection for Named Entity Transliteration Mining Wiki Resources for Multilingual Named Entity Recognition Multi-Task Active Learning for Linguistic Annotations 2007 Domain Adaptation with Active Learning for Word Sense Disambiguation Frustratingly Easy Domain Adaptation Instance Weighting for Domain Adaptation in NLP Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets 2006 Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation Simultaneous English-Japanese Spoken Language Translation Based on Incremental Dependency Parsing and Transfer CVPR 2009 Domain Transfer SVM for Video Concept Detection Boosted Multi-Task Learning for Face Verification With Applications to Web Image and Video Search 2008 Transfer Learning for Image Classification with Sparse Prototype Representations Workshops NIPS 2005 Workshop - 
Inductive Transfer: 10 Years Later NIPS 2005 Workshop - Interclass Transfer NIPS 2006 Workshop - Learning when test and training inputs have different distributions AAAI 2008 Workshop - Transfer Learning for Complex Tasks
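Reposted from: http://apex.sjtu.edu.cn/apex_wiki/Transfer%20Learning
Transfer Learning, by Gui-Rong Xue
In the traditional machine learning framework, the task is to learn a classification model from sufficient training data and then use the learned model to classify and predict test documents. However, machine learning algorithms face a key problem in current Web mining research: in newly emerging domains, large amounts of training data are very hard to obtain. Web applications develop rapidly and new domains keep appearing, from traditional news to web pages, images, blogs, podcasts, and so on. Traditional machine learning requires labeling a large amount of training data for every domain, which costs a great deal of labor and resources. Without large amounts of labeled data, much learning-related research and many applications cannot proceed. Second, traditional machine learning assumes that training data and test data follow the same distribution. In many cases this assumption does not hold; a typical example is training data that has gone out of date. We then have to label large amounts of new training data, which is very expensive. Seen from another angle, if we already have a large amount of training data drawn from a different distribution, discarding it entirely is also wasteful. How to make sensible use of such data is the main problem transfer learning addresses. Transfer learning transfers knowledge from existing data to help future learning. Its goal is to apply knowledge learned in one environment to a learning task in a new environment. Transfer learning therefore does not make the same-distribution assumption that traditional machine learning does.
Our work on transfer learning currently falls into three parts: instance-based transfer learning in a homogeneous space, feature-based transfer learning in a homogeneous space, and transfer learning across heterogeneous spaces. Our research indicates that instance-based transfer learning has stronger knowledge-transfer ability, feature-based transfer learning has broader knowledge-transfer ability, and heterogeneous-space transfer has broad learning and extension ability. Each has its strengths.
1. Instance-based transfer learning in a homogeneous space
The basic idea of instance-based transfer learning is that, although the auxiliary training data differs more or less from the source training data, some part of the auxiliary data should still be suitable for training an effective classifier that fits the test data. Our goal is therefore to find the instances in the auxiliary training data that suit the test data and transfer them into learning on the source training data. Here we generalized the traditional AdaBoost algorithm into a boosting algorithm with transfer ability, TrAdaBoost, so that it makes maximum use of the auxiliary training data in classifying the target. Our key idea is to use boosting to filter out the auxiliary data that is least like the source training data. Boosting provides a mechanism that adjusts weights automatically: the weights of important auxiliary training instances increase and the weights of unimportant ones decrease. After reweighting, the weighted auxiliary training data serves as additional training data and is used together with the source training data to improve the reliability of the classification model.
Instance-based transfer learning only works when the source data and the auxiliary data are very similar. When they differ substantially, instance-based algorithms often struggle to find knowledge that can be transferred. But we found that even when the source data and the target data share no common knowledge at the instance level, they may still overlap at the feature level. We therefore studied feature-based transfer learning, which addresses how to learn from knowledge shared at the feature level.
2. Feature-based transfer learning in a homogeneous space
For feature-based transfer learning we proposed several algorithms, such as the CoCC algorithm, the TPLSA algorithm, a spectral analysis algorithm, and a self-taught learning algorithm. Among these, co-clustering is used to produce a common feature representation that helps the learning algorithm. The basic idea is to co-cluster the source data and the auxiliary data simultaneously to obtain a common feature representation that is better than one based on the source data alone. Transfer is achieved by representing the source data in this new space. Applying this idea, we proposed feature-based supervised transfer learning and feature-based unsupervised transfer learning.
2.1 Feature-based supervised transfer learning
Our work here is co-clustering based cross-domain classification. The problem considered is: given a new, different domain in which labeled data is extremely scarce, how can the large amount of labeled data in the original domain be used for transfer learning? In this work we defined a unified information-theoretic formulation for cross-domain classification, in which the co-clustering based classification problem becomes the optimization of an objective function. In our model the objective function is defined as the loss in mutual information among the source data instances, the common feature space, and the auxiliary data instances.
2.2 Feature-based unsupervised transfer learning: self-taught clustering
Our self-taught clustering algorithm belongs to feature-based unsupervised transfer learning. The problem considered is: in practice even labeled auxiliary data may be hard to obtain, so how can a large amount of unlabeled auxiliary data be used for transfer learning? The basic idea of self-taught clustering is to cluster the source data and the auxiliary data simultaneously to obtain a common feature representation. Because this representation is built on a large amount of auxiliary data, it is better than one derived from the source data alone, and so it helps clustering.
The two strategies above (feature-based supervised and unsupervised transfer learning) both address the case where the source data and the auxiliary data lie in the same feature space. For the case where they lie in different feature spaces, we also studied feature-based transfer learning across feature spaces, which is another kind of feature-based transfer learning.
3. Transfer learning across heterogeneous spaces: translated learning
Our translated learning addresses the case where the source data and the test data belong to two different feature spaces. In that work we use large amounts of easily obtained labeled text data to help image classification with only a few labels (illustrated by a figure in the original page). Our method uses data that has two views to build a bridge between the two feature spaces. Although such multi-view data may not be usable as training data for classification, it can be used to build a translator. Through this translator we combine the nearest-neighbor algorithm with feature translation, translate the auxiliary data into the source feature space, and learn and classify with a unified language model.
References:
Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated Learning: Transfer Learning across Different Feature Spaces. Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, British Columbia, Canada, December 8-13, 2008.
Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. Spectral Domain-Transfer Learning. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), pages 488-496, Las Vegas, Nevada, USA, August 24-27, 2008.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Self-taught Clustering. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML 2008), pages 200-207, Helsinki, Finland, July 5-9, 2008.
Gui-Rong Xue, Wenyuan Dai, Qiang Yang and Yong Yu. Topic-bridged PLSA for Cross-Domain Text Classification. In Proceedings of the Thirty-first International ACM SIGIR Conference on Research and Development on Information Retrieval (SIGIR 2008), pages 627-634, Singapore, July 20-24, 2008.
Xiao Ling, Gui-Rong Xue, Wenyuan Dai, Yun Jiang, Qiang Yang and Yong Yu. Can Chinese Web Pages be Classified with English Data Source? In Proceedings of the Seventeenth International World Wide Web Conference (WWW 2008), pages 969-978, Beijing, China, April 21-25, 2008.
Xiao Ling, Wenyuan Dai, Gui-Rong Xue and Yong Yu. Knowledge Transferring via Implicit Link Analysis. In Proceedings of the Thirteenth International Conference on Database Systems for Advanced Applications (DASFAA 2008), pages 520-528, New Delhi, India, March 19-22, 2008.
Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Co-clustering based Classification for Out-of-domain Documents. In Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), pages 210-219, San Jose, California, USA, August 12-15, 2007.
Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Transferring Naive Bayes Classifiers for Text Classification. In Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI 2007), pages 540-545, Vancouver, British Columbia, Canada, July 22-26, 2007.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Boosting for Transfer Learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML 2007), pages 193-200, Corvallis, Oregon, USA, June 20-24, 2007.
Dikan Xing, Wenyuan Dai, Gui-Rong Xue and Yong Yu. Bridged Refinement for Transfer Learning. In Proceedings of the Eleventh European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2007), pages 324-335, Warsaw, Poland, September 17-21, 2007. (Best Student Paper Award)
Xin Zhang, Wenyuan Dai, Gui-Rong Xue and Yong Yu. Adaptive Email Spam Filtering based on Information Theory. In Proceedings of the Eighth International Conference on Web Information Systems Engineering (WISE 2007), pages 159-170, Nancy, France, December 3-7, 2007.
Transfer Learning (last edited 2009-10-29 03:03:46 by grxue)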
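To make the boosting-based reweighting from the instance-based section more concrete, here is a rough sketch of a single reweighting round. It follows the update rule as I understand it from the "Boosting for Transfer Learning" paper in the reference list, so the constants should be checked against that paper. The function name, the boolean error lists, and the clamping of the error rate are my own assumptions for illustration, and the weak learner itself is left out.

```python
import math
from typing import List, Tuple

def tradaboost_reweight(aux_w: List[float], src_w: List[float],
                        aux_wrong: List[bool], src_wrong: List[bool],
                        n_rounds: int) -> Tuple[List[float], List[float]]:
    """One TrAdaBoost-style reweighting round (illustrative sketch).

    aux_w / src_w: current weights of auxiliary and source training instances.
    aux_wrong / src_wrong: True where the current weak learner misclassified.
    n_rounds: total number of boosting rounds planned.
    """
    # The weighted error is measured on the source (same-distribution) data only.
    eps = sum(w for w, bad in zip(src_w, src_wrong) if bad) / sum(src_w)
    # Assumption: clamp eps so beta_t stays finite and below 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)

    # Fixed factor (< 1) used to shrink auxiliary instances that look unlike the source data.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(aux_w)) / n_rounds))

    # Misclassified auxiliary instances lose weight; correct ones are unchanged.
    new_aux = [w * beta if bad else w for w, bad in zip(aux_w, aux_wrong)]
    # Misclassified source instances gain weight, as in ordinary AdaBoost.
    new_src = [w / beta_t if bad else w for w, bad in zip(src_w, src_wrong)]
    return new_aux, new_src

if __name__ == "__main__":
    aux, src = tradaboost_reweight([1.0] * 4, [1.0] * 4,
                                   [True, False, False, False],
                                   [True, False, False, False],
                                   n_rounds=10)
    print("auxiliary:", aux)
    print("source:   ", src)
```

Over many rounds, auxiliary instances that keep disagreeing with the source data fade out, which is the filtering effect the text describes.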
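The volume of data multiplies every year, yet the amount of useful information seems to be shrinking. The field of data mining, which emerged over the past 20 years, is devoted to this problem. It is not only an important research area but also has great potential value in real-world applications.
Data mining and knowledge discovery in databases (DMKDD) is a frontier of information technology that arose in the 1990s. It emerged as data and databases grew far faster than people's ability to process and understand data, and it is the result of the convergence of databases, statistics, machine learning, optimization, and computing technology.
Knowledge discovery is the complex process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Data mining is the step within knowledge discovery that produces particular patterns through specific algorithms within acceptable limits of computational efficiency. Knowledge discovery is the whole process of arriving at knowledge, including data selection, data preprocessing, data transformation, data mining, and pattern evaluation; data mining is one key step in it. Because data mining matters so much to knowledge discovery, most knowledge discovery research concentrates on data mining algorithms and applications, and many researchers do not distinguish strictly between the two and use the terms interchangeably.
The current state of data mining research and practice resembles that of database research and practice in the 1960s. Back then, application programmers had to build a complete database environment every time they wrote a program. With the development of the relational data model, query processing and optimization techniques, transaction management strategies, and dedicated query languages (SQL) and interfaces, today's environment is entirely different. Over the next few decades, data mining may follow a path similar to that of databases, becoming easier to use and to develop with.
References:
1. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy. Advances in knowledge discovery and data mining. AAAI/MIT Press, 1996.
2. J. Han, M. Kamber. Data mining: concepts and techniques. Morgan Kaufmann Publishers, 2001. (2nd Edition, 2006)
3. M. H. Dunham. Data Mining: Introductory and Advanced Topics. Pearson Education, Inc., 2003. (Chinese translation by Guo Chonghui, Tian Fengzhan, Jin Xiaoming et al., 数据挖掘教程, Tsinghua University Press, 2005.)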
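Statistical Learning Theory (SLT) is a theory devoted to statistics in the finite-sample setting. It establishes a new theoretical framework for finite-sample statistical problems, in which the rules of statistical inference not only account for asymptotic performance but also seek the best result obtainable from the limited information available. V. Vapnik and colleagues began working on it in the 1970s. By the mid-1990s, as the theory developed and matured, and as neural networks and similar methods lacked substantive theoretical progress, statistical learning theory began to attract increasingly wide attention. It rests on a fairly solid theoretical foundation and offers a unified framework for finite-sample learning problems.
Meanwhile, a new general-purpose prediction method, the support vector machine (SVM), was developed on the basis of statistical learning theory. It has already shown performance better than existing methods in many respects, it subsumes many existing methods (such as polynomial approximation, radial basis function methods, and multilayer perceptron networks), and it promises to help with many problems that were previously hard to solve (such as choosing a neural network structure and local minima). SLT and SVM are becoming the new research focus after neural networks and will drive major developments in the theory and techniques of data mining and machine learning.
References:
1. V. Vapnik. The nature of statistical learning theory. Springer-Verlag, 1995.
2. V. Vapnik. Statistical learning theory. John Wiley and Sons, Inc., 1998.
3. B. E. Boser, I. Guyon, V. Vapnik. A training algorithm for optimal margin classifiers. In: D. Haussler, Editor, Proceedings of the Fifth Annual ACM Workshop of Computational Learning Theory, 144-152, ACM Press, 1992.
4. C. Cortes, V. Vapnik. Support-vector networks. Machine Learning, 1995, 20, 273-297.
5. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2), 121-167.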
http://www.support-vector-machines.org/SVM_soft.html SHOGUN is a new machine learning toolbox focused on large-scale kernel methods, especially Support Vector Machines (SVM), with an emphasis on bioinformatics. It provides a generic SVM object interfacing to several different SVM implementations. Each of the SVMs can be combined with a variety of the many kernels implemented. It can deal with weighted linear combinations of a number of sub-kernels, each of which does not necessarily work on the same domain, where an optimal sub-kernel weighting can be learned using Multiple Kernel Learning. Apart from SVM 2-class classification and regression problems, a number of linear methods like Linear Discriminant Analysis (LDA), Linear Programming Machine (LPM), (Kernel) Perceptrons and also algorithms to train hidden Markov models are implemented. The input feature-objects can be dense, sparse or strings and of type int/short/double/char and can be converted into different feature types. Chains of preprocessors (e.g. subtracting the mean) can be attached to each feature object allowing for on-the-fly pre-processing. SHOGUN comes in different flavours, a stand-alone version and also with interfaces to Matlab(tm), R, Octave, Readline and Python. This is the R package.
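Reposted from: http://bbs.byr.edu.cn/wForum/disparticle.php?boardName=PR_AIID=3229pos=12

I often recommend books on the TopLanguage discussion group, and I often ask the experts there to help dig up related material. Artificial intelligence, machine learning, natural language processing, knowledge discovery (data mining in particular) and information retrieval are without doubt the most fun branches of CS (and they are closely interconnected). Here I gather the learning resources on machine learning and artificial intelligence that have come up recently.

First, two excellent Wikipedia articles. I am something of a heavy Wikipedia user: when learning a subject, I often find that I start at Wikipedia, go through several rounds of Google searches, and end up at one or a few books.

The first is History of Artificial Intelligence. As I wrote on the discussion group:

The article I read today is the best I have come across on Wikipedia so far. Titled "History of artificial intelligence", it follows the timeline of AI's development, weaving in countless stories of brilliant people, full of twists and turns and grand in sweep; truth really is stranger than fiction. AI began in philosophical speculation, then went through a stage without help from psychology (cognitive neuroscience in particular), in which exploration relied only on brilliant researchers' induction from the outward behavior of human thought, on introspection, and on mathematical tools. The most exciting moment of that period was the automated theorem prover written by Herbert Simon (father of decision theory, Nobel laureate, cross-disciplinary giant) and his colleagues, which proved 38 of the first 52 theorems in chapter 2 of Whitehead and Russell's Principia Mathematica; one of its proofs was more elegant than the one in the book. Simon's program used heuristic search, because a proof in an axiomatic system can be reduced to a tree search from premises to conclusion (and because of combinatorial explosion, heuristic pruning is required). Later Simon also wrote GPS (General Problem Solver), which reportedly could solve problems that can be formalized well, such as the Tower of Hanoi. In the end, though, Simon's work touched only a very small aspect of human thought: formal logic, or more narrowly deductive reasoning (that is, excluding inductive reasoning and transductive reasoning, commonly called analogical thinking). Many other aspects, such as common sense, vision, and above all the most complex ones, language and consciousness, remain unsolved mysteries. Another interesting point is that some argue AI must be grounded in a physical body. A body that can sense the physical rules of the world is itself a powerful source of information, and from it humans continually build up and update what we call common-sense knowledge (this is the so-called embodied mind theory). By contrast, hand-building a common-sense knowledge base, as some have tried, is rather naive: humans acquire knowledge from nature through their perceptual systems in a dynamic, self-updating way, whereas a hand-built common-sense base is no different from the old expert-system approach. Of course, this summarizes only the small part I personally found interesting or novel; everyone will find different things interesting. For example, the article covers the rise and fall of neural network theory in considerable detail. So I strongly recommend reading it yourself, and don't forget to follow its links to other pages.

By the way, Xu You plans to find time to translate this article. It is quite long, so those who struggle with English can wait for the translation :)

The second is Artificial Intelligence. There are, of course, also Machine Learning and others. Starting from these articles you can find many useful and reliable in-depth references.

Next, some books:
1. "Programming Collective Intelligence": a good introductory book from recent years. Building interest is the most important step; starting with a heavyweight tome tends to scare people off :P
2. Stuart Russell and Peter Norvig's "Artificial Intelligence: A Modern Approach" (2nd edition): the undisputed classic of the field.
3. "The Elements of Statistical Learning": fairly mathematical; good as a reference.
4. "Foundations of Statistical Natural Language Processing": the recognized classic of natural language processing.
5. "Data Mining: Concepts and Techniques": written by a Chinese-American scientist; explains deep material in an accessible way.
6. "Managing Gigabytes": a good book on information retrieval.
7. "Information Theory, Inference, and Learning Algorithms": a reference book, fairly deep.

Related mathematical foundations (reference books, not suited to reading cover to cover):
1. Linear algebra: there are too many references to list.
2. Matrix mathematics: "Matrix Analysis", Roger Horn. The undisputed classic of matrix analysis.
3. Probability and statistics: "An Introduction to Probability Theory and Its Applications", William Feller. Also an outstanding book, but too heavily mathematical to suit people doing machine learning. So Du Lei from the discussion group recommended "All of Statistics", writing: For machine learning, statistics is just as important. I recommend All of Statistics, a concise CMU textbook that emphasizes concepts, simplifies computation, and trims concepts and statistical material unrelated to machine learning. It is a very good quick introduction.
4. Optimization: "Nonlinear Programming, 2nd", a reference on nonlinear programming, and "Convex Optimization", a reference on convex optimization. Further books can be found through the Wikipedia article on optimization methods. A deep understanding of the technical details of machine learning methods (SVM, for example) often requires optimization as groundwork.

Wang Ning recommended several books:

"Machine Learning", Tom Mitchell, 1997. An old book by a master. The content no longer looks deep, and many chapters only scratch the surface, but it suits beginners well (though not beginners so new that they know nothing of algorithms and probability). The decision tree part, for example, is excellent, and since there has been no major progress there in recent years, it is not out of date. The book also serves as a grand survey of the decades of machine learning work before 1997, and its bibliography is extremely valuable. Translated and reprint editions exist in China; I don't know whether they are out of print.

"Modern Information Retrieval", Ricardo Baeza-Yates et al., 1999. An old book by a master. Apparently the first book to cover IR comprehensively. Unfortunately IR has advanced rapidly in recent years and the book is somewhat dated, though it is still good to browse as a reference. Incidentally, Ricardo now heads Yahoo Research for Europe and Latin America.

"Pattern Classification" (2nd edition), Richard O. Duda, Peter E. Hart, David G. Stork. A hefty volume, also from around 2001; a color reprint edition is available. I haven't finished it, but for anyone who wants to study ML and IR in depth, the first three chapters (introduction, Bayesian learning, linear classifiers) are required reading.

There are other classics I have only glanced at and am not qualified to judge. There are also two slim volumes, collections of papers in nature, that cover a good deal of frontier work and detail, such as how to compress indexes. Unfortunately I have forgotten their titles, and they are buried at the bottom of a box, unlikely to see daylight before my next move. (Ha, I just remembered one: "Mining the Web: Discovering Knowledge from Hypertext Data".)

One very well-known book: "Data Mining: Practical Machine Learning Tools and Techniques", written by the authors of Weka. Unfortunately the content is mediocre: the theory is too thin, and the practical part is far removed from real practice. There are already plenty of introductory data mining books, so this one can be skipped. To learn Weka, the documentation is enough. A second edition is out; I haven't read it and can't comment.

On information retrieval, Du Lei again recommended:

For information retrieval, I now suggest Stanford's "Introduction to Information Retrieval". It has just been formally published, so the content is naturally up to date. In addition, Croft, the leading figure in information retrieval, is writing a textbook that should appear soon; it is said to be a very practical book.

For those interested in information retrieval, I strongly recommend Dr. ChengXiang Zhai's summer school course at Peking University. The full slides and reading materials are here: http://net.pku.edu.cn/~course/cs410/schedule.html

maximzhao recommended a machine learning book:

One more book: Bishop, "Pattern Recognition and Machine Learning".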
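There is no reprint edition, but it can be downloaded online. A classic among classics. Pattern Classification and this book are the two must-reads. "Pattern Recognition and Machine Learning" is very recent (2007), makes deep material accessible, and is hard to put down.

Finally, on artificial intelligence (judgment and decision making in particular), two more interesting books: "Simple Heuristics That Make Us Smart" and "Bounded Rationality: The Adaptive Toolbox".

Unlike the statistical machine learning methods adopted in the computer science community, these two books focus more on the ways of thinking that humans actually use. Below is the introduction I wrote on the discussion group:

Both books were written collectively by the ABC Research Group in Germany (an interdisciplinary group of computer scientists, cognitive scientists, neuroscientists, economists, mathematicians, statisticians and others). Both attracted wide attention in the field, especially the first; the second extends the model of human rationality proposed by Herbert Simon (father of decision science, Nobel laureate). Together they can be said to have put the question of what human intelligence really is on the table. The core idea is that our brains simply cannot perform large amounts of statistical computation or use fancy mathematics to explain and predict the world; instead they face an uncertain world with simple and robust heuristics (for example, two heuristics from the first book that later became famous: the recognition heuristic and Take the Best). Of course, the two books do not reject statistical methods: with large amounts of data the advantages of statistics show, while with little data statistical methods perform very poorly. Simple human heuristics exploit regularities in the ecological environment and are both computationally cheap and robust.

About the second book:
1. Who is Herbert Simon
2. What is bounded rationality
3. What the book is about:

I have always found human judgment and decision making a fascinating problem. Put simply, this book can be seen as a more comprehensive and more theoretical version of 《决策与判断》 ("Decision Making and Judgment"). It gives a systematic, theoretical account of the heuristics used in human judgment and decision making and of their pros and cons: why they are fast and robust approximations to optimization methods when information is scarce, and why they lead to bad outcomes in some situations. For example, anyone who has studied machine learning knows that naive Bayes often performs no worse than a Bayesian network, and is faster as well; likewise, the higher the degree of a polynomial interpolant, the more easily it overfits, whereas piecewise spline interpolation based on low-degree polynomials has proven to be a very robust scheme.

Here is a very interesting example from the book. Two teams were asked to design a robot that can catch a baseball thrown across a field. The first team did a detailed mathematical analysis and built a fairly complex, approximately parabolic model (not strictly parabolic, since air resistance and the like must be considered) to compute where the ball will land so that the robot can catch it. This approach is obviously expensive, and the computation itself takes time. As everyone knows, signals in biological neural networks travel at no more than about a hundred meters per second, so computation is a precious resource for living things. The approach is feasible but not good enough. The second team interviewed real players, listened to how they described catching a ball, and then built this robot: it does nothing for the first half of the ball's flight, starts running only once the ball is fairly close, and while running keeps the angle of gaze between its eyes and the ball constant. Holding that angle constant guarantees that the robot's path will intersect the ball's trajectory; throughout, the robot makes only a very rough estimate of the trajectory. Think about it: when you catch a ball, don't you keep your eyes on it and adjust your running according to the angle of your line of sight? That is in fact what humans do, and that is the power of heuristics.

Compared with 《决策与判断》, which leans toward psychology and popular science, this book is more theoretical, cites many classic references, overlaps with both artificial intelligence and machine learning, and contains a fair amount of mathematics. It consists of a dozen or so chapters, each written by different authors in the manner of a paper: rigorous, with little padding, similar to "Psychology of Problem Solving". Well suited to geeks.

Also, those who cannot get through the technical details of the theory are encouraged to read books like 《决策与判断》 (and easy popular-science reads such as 《别做正常的傻瓜》), which are of great benefit for decision making in everyday life. Human judgment and decision making rely on many heuristics, and unfortunately many of them were formed in adapting to the social environment of hundreds of thousands of years ago and do not suit modern society. Understanding these flaws and blind spots in our thinking therefore helps a great deal in becoming a good decision maker, and it is a fascinating field in its own right.

(End)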
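The baseball-catching heuristic described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not the robot from the book: the flight is two-dimensional with no air resistance, the fielder has no speed limit, and the function name `simulate_gaze_heuristic` and all parameter values are hypothetical. The fielder waits for the first half of the flight, then moves so that the gaze angle to the ball stays at the value it had when the run started. The fielder's position is computed from the ball's current position only, never from a predicted landing point, which is known here only to the simulation for checking.

```python
import math


def simulate_gaze_heuristic(v0=30.0, launch_deg=45.0, fielder_x0=100.0,
                            dt=0.001, g=9.81):
    """Return (final fielder position, ball landing position) in meters."""
    vx = v0 * math.cos(math.radians(launch_deg))
    vy = v0 * math.sin(math.radians(launch_deg))
    flight_time = 2.0 * vy / g  # used only to stop the simulation

    # The fielder does nothing during the first half of the flight.
    t = flight_time / 2.0
    ball_x = vx * t
    ball_y = vy * t - 0.5 * g * t * t

    # Cotangent of the gaze angle at the moment the fielder starts running.
    cot_gaze = (fielder_x0 - ball_x) / ball_y

    fielder_x = fielder_x0
    while t + dt < flight_time:
        t += dt
        ball_x = vx * t
        ball_y = vy * t - 0.5 * g * t * t
        # Stand where the gaze angle to the ball is unchanged.
        fielder_x = ball_x + ball_y * cot_gaze

    landing_x = vx * flight_time
    return fielder_x, landing_x


fielder_x, landing_x = simulate_gaze_heuristic()
gap = abs(fielder_x - landing_x)  # distance between fielder and landing point
```

As the ball's height falls to zero, the constant-angle rule forces the horizontal gap between fielder and ball to zero as well, which is why the two paths must meet. A more realistic version would cap the fielder's speed and adjust it by feedback on the observed angle.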