【Research Update】Deep Learning Roadmap (Part 1)

In applied machine learning, the most time-consuming and important stage is extracting features from raw data. Deep learning is a newer branch of machine learning whose aim is to skip the feature-engineering stage entirely and learn representations directly from the data. Most deep learning methods are based on neural networks, in which complex high-level structure is built by stacking many layers of nonlinear neuron functions.

Probably the easiest introduction to neural networks and deep learning is Geoff Hinton's Coursera course. (AITMR translator's note: I have taken this course before; you need some background to follow it.) In this course you will pick up the key ideas and be able to implement some simple algorithms yourself. (Geoff is a pioneer of this field. AITMR note: we all like to call him the grandmaster; his 2006 paper on deep learning in Science is considered the first detailed paper in the field, and it came with code.)

Stripped to its essence, deep learning means learning from raw data so that the model works better. But the field has not yet reached the point where you can simply feed the data in and the model learns entirely on its own. For now, you still have to judge many questions yourself: Is the model overfitting? Has the optimization converged? Do I need more hidden units? More layers? Unfortunately, there is no consensus on these questions yet, so you need to keep thinking and experimenting. To reach that level, you need a deep understanding of the core algorithms and of the related key ideas in machine learning. This series lays out a roadmap along those lines to help you understand deep learning better.

If you have not used Metacademy much before, you can find an overview of the site's structure and content here. On Metacademy, the core concepts are marked in red; these give you the basic background you need. As more content is added to Metacademy, the materials will be updated in real time. External links are shown in green; although we have tried to list them as thoroughly as possible, you should still choose among them according to your own situation. Nor do you have to follow our roadmap exactly, since everyone's background is different.

You can also check out one of several review papers, which give readable overviews of recent progress in the field:
Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009.
Y. Bengio, A. Courville, and P. Vincent. Representation learning: a review and new perspectives. 2014.

【Research Update】Deep Learning Roadmap (Part 2): Supervised models

If you are interested in using neural networks, chances are you want to make predictions automatically. Supervised learning is a machine learning framework in which you have a particular task you want performed and the computer learns a model from a labeled dataset. For example, suppose you want to automatically decide whether an email is spam. In the supervised framework, you would have, say, 100,000 emails labeled "spam" or "not spam", and this dataset would be used to train a classifier that can judge emails it has never seen before.

Before diving into neural networks, you should first get to know some shallow machine learning algorithms such as linear regression, logistic regression, and support vector machines (SVMs). These are easier to implement, and mature software packages exist for them (e.g. scikit-learn. AITMR translator's note: scikit-learn really is an excellent package; the translator has used it all along, and you can also study its source code to learn the algorithms in depth.) These algorithms are part of the groundwork for implementing neural networks. Moreover, the neuron, the basic building block of neural networks, is closely related to these models, so time spent learning them also lays the foundation for a deeper understanding of neural networks.

To apply supervised learning well, you need to understand generalization, the ability to predict well on new data. You need to understand how to balance the tradeoff between overfitting and underfitting: the model should capture the data fully without becoming so complex that it overfits. In regression this reduces to bias and variance, which give a more intuitive picture. Cross-validation is an algorithm you can use to measure generalization ability.

The first deep learning model to study is the feed-forward neural net, which is trained by backpropagation. Vision is a major application area for deep neural networks, and convolutional nets have achieved breakthrough progress there. Recurrent neural nets are a class of networks that represent data with temporal structure. Backpropagation through time is an elegant training algorithm for them, but it still has practical problems.

【Research Update】Deep Learning Roadmap (Part 3): Unsupervised learning

In supervised models, labeled data is available to train a model to make predictions. In many situations, however, labels are hard to obtain, or even hard to define; all you may have is unlabeled data. Learning in this setting is called unsupervised learning. For example, suppose you want to classify email as "spam" vs. "not spam" but have no labeled dataset. What can you do with unlabeled data alone? First, you can simply look for patterns in the data: it may have latent attributes that you can uncover with principal component analysis or factor analysis. Second, you can cluster the data so that items within a cluster are more similar to each other than to items in other clusters; the main clustering algorithms are k-means and mixtures of Gaussians.

In the neural net world, unsupervised learning plays another role: it can help supervised learning, especially when unlabeled data is much easier to obtain than labeled data. Suppose you are working on object recognition: labeling the objects in images is extremely tedious, while unlabeled images can be downloaded from the web by the thousands.

Unsupervised pre-training has been shown in many domains to improve accuracy substantially. The idea is that you train an unsupervised neural net on the unlabeled data and then combine the analogous structures into a supervised network. Both objectives amount to modeling the raw data, and pre-training extracts some of the data's structure in advance. In addition, deep unsupervised models are easier to train than deep supervised ones (though nobody knows exactly why yet). Initializing the network with unsupervised pre-training helps keep training from getting stuck in poor local optima.

The evidence for the benefits of unsupervised pre-training is still complicated, and many successful deep networks avoid it, especially in big-data settings. But it has a good track record and is worth our attention.

So which neural nets are unsupervised? The most basic is the autoencoder, a feed-forward network trained to predict its own input. By itself that is trivial, so constraints are added to make the task harder. First, one or more layers can have fewer units than the data layer. Second, the hidden activations can be constrained to be sparse (translator's note: only a small fraction of the units have nonzero output). Third, noise can be added to the input so that the network must learn to denoise it (a denoising autoencoder; a minimal runnable sketch appears at the end of this part).

Another approach to unsupervised learning is generative modeling: one assumes the data follows some underlying distribution and tries to model that distribution. Restricted Boltzmann machines (RBMs) are unsupervised generative models with a single hidden layer, and they can be stacked into multilayer generative models such as deep belief nets (DBNs) and deep Boltzmann machines (DBMs). DBMs can learn to model some pretty complex data distributions. Generative modeling is a deep and rich area, and you can find lots more examples in the Bayesian machine learning roadmap.
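To make the autoencoder discussion concrete, here is a minimal denoising autoencoder sketch in NumPy. It is a toy illustration under assumed settings (random data, one hidden layer, tied weights, squared-error loss), not any specific model from the references above.

```python
# Minimal denoising autoencoder sketch (hypothetical toy example).
# One hidden sigmoid layer with tied weights, trained by full-batch
# gradient descent on squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((200, 20))            # toy data: 200 examples, 20 dimensions

n_vis, n_hid = X.shape[1], 8
W = rng.normal(0, 0.1, size=(n_vis, n_hid))  # tied weights: W encodes, W.T decodes
b_h = np.zeros(n_hid)
b_v = np.zeros(n_vis)

lr, noise_level = 0.1, 0.3
for epoch in range(50):
    # Corrupt the input: zero out a random fraction of the entries,
    # so the network must denoise rather than simply copy its input.
    mask = rng.random(X.shape) > noise_level
    X_tilde = X * mask

    h = sigmoid(X_tilde @ W + b_h)   # encode the corrupted input
    X_rec = sigmoid(h @ W.T + b_v)   # decode back to input space
    err = X_rec - X                  # compare against the *clean* input

    # Backpropagate the squared-error loss through both layers.
    d_rec = err * X_rec * (1 - X_rec)
    d_h = (d_rec @ W) * h * (1 - h)
    grad_W = X_tilde.T @ d_h + d_rec.T @ h   # tied weights: gradient has two terms
    W -= lr * grad_W / len(X)
    b_h -= lr * d_h.mean(axis=0)
    b_v -= lr * d_rec.mean(axis=0)

print("final reconstruction error:", (err ** 2).mean())
```

With a sparsity penalty on h instead of input corruption, the same skeleton becomes a sparse autoencoder, the second constraint mentioned above.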
【Research Update】Deep Learning Roadmap (Part 4): Optimization

Once you have defined the architecture of a deep neural network, how do you train it? The workhorse training method is stochastic gradient descent (SGD), which in each update uses a single training example (or a small batch of them) to take a small step that reduces the loss function. This requires computing the gradient of the loss, which can be done with the backpropagation algorithm. After writing the code, be sure to check your gradient computations to make sure they are correct (a minimal finite-difference sketch appears after Part 5 below). SGD is easy to understand, straightforward to implement, and very convenient to use.

A rich body of convex optimization applies to this training problem. For convex problems, SGD and other local search algorithms are guaranteed to find the global optimum, because the function is bowl-shaped (convex), so every small step moves toward the global optimum. Much machine learning research therefore goes into constructing convex optimization problems. Deep neural networks, however, are generally not convex, so you are only guaranteed to find a local optimum. That sounds disappointing, but it turns out to be something we can live with: for most feed-forward and generative networks, the local optima are reasonably good. (Recurrent neural nets are an exception.)

An even bigger problem than local optima is the curvature of the loss function. Neural networks are non-convex, so curvature issues become prominent, yet the methods used to train them are largely borrowed from convex optimization. For background, see Boyd and Vandenberghe's book Convex Optimization:
Sections 9.2-9.3 talk about gradient descent, the canonical first-order optimization method (i.e. a method which only uses first derivatives)
Section 9.5 talks about Newton's method, the canonical second-order optimization method (i.e. a method which accounts for second derivatives, or curvature)

Newton's method is well suited to dealing with curvature, but it is impractical for large-scale neural net training, for two reasons. First, it is a batch method, so each update must process the entire training set. Second, it requires constructing and inverting the Hessian matrix, whose dimension equals the number of parameters. (Translator's note: the computation is enormous; for a very large network this is simply a disaster!) It has therefore long served as an idealized second-order method that people try to approximate. In practice, the most common choices are:
conjugate gradient
limited memory BFGS

Compared with ordinary neural net models, training RBMs raises new challenges: the objective function involves the partition function, and the gradient computation requires inference, both of which appear intractable. In practice, contrastive divergence and persistent contrastive divergence are widely used to approximate the gradient. Model evaluation nevertheless remains a problem: annealed importance sampling can be used to estimate the model likelihood, but it is fragile, and assessing a model's performance is still hard.

Even once you understand the math behind these algorithms, the devil's in the details. Here are some good practical guides for getting these algorithms to work in practice:
G. Hinton. A practical guide to training restricted Boltzmann machines. 2010.
J. Martens and I. Sutskever. Training deep and recurrent networks with Hessian-free optimization. Neural Networks: Tricks of the Trade, 2012.
Y. Bengio. Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade, 2012.
L. Bottou. Stochastic gradient descent tricks. Neural Networks: Tricks of the Trade, 2012.

【Research Update】Deep Learning Roadmap (Part 5): Applications and software

Vision applications: Computer vision is the main application area for neural networks and deep learning. As early as 1998, convolutional nets showed their power on handwritten digit recognition, and the MNIST handwritten digit dataset has served ever since as a standard benchmark for neural net research. (Translator's note: the impact of convolutional nets on computer vision is unprecedented; reportedly, check recognition in US ATMs uses CNNs, and CNN research has recently entered another boom, with many new variants such as 3D CNNs appearing. I have studied the MATLAB code for CNNs carefully; it really is a fine algorithm, and its recognition accuracy on images is high.) Recently, convolutional nets have pushed the problem of classifying between thousands of object categories a big step forward. DeepMind's system that learns to play Atari games from raw pixels alone also relies on this kind of visual recognition.

There has also been much work on generative models of images, focusing on learning sparse representations and modeling the local covariance structure of images. If you model images with a generative model that has a convolutional structure, you can obtain deeper features.

Software:
Caffe is an increasingly popular deep learning software package designed for image-related tasks, e.g. object recognition. It's one of the fastest deep learning packages available — it's written in C++ and CUDA.
The University of Toronto machine learning group has put together some nice GPU libraries for Python. GNumPy gives a NumPy-like wrapper for GPU arrays. It wraps around Cudamat, a GPU linear algebra library, and npmat, which pretends to be a GPU on a CPU machine (for debugging).
PyLearn is a neural net library developed by the University of Montreal machine learning group. It is intended for researchers, so it is built to be customizable and extendable.
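Returning to Part 4's advice to check your gradient computations, here is a minimal sketch comparing an analytic gradient against centered finite differences. The logistic regression loss and random data are hypothetical stand-ins; the check itself works for any differentiable model.

```python
# Gradient check sketch: the analytic gradient should agree with a
# centered finite-difference estimate up to numerical precision.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = rng.integers(0, 2, size=50)

def loss_and_grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probabilities
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad = X.T @ (p - y) / len(y)               # analytic gradient
    return loss, grad

w = rng.normal(size=5)
_, grad = loss_and_grad(w)

eps = 1e-6
num_grad = np.zeros_like(w)
for i in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    num_grad[i] = (loss_and_grad(w_plus)[0] - loss_and_grad(w_minus)[0]) / (2 * eps)

# The relative error should be tiny (roughly 1e-8 or below) if the
# analytic gradient is implemented correctly.
rel_err = np.linalg.norm(grad - num_grad) / np.linalg.norm(grad + num_grad)
print("relative error:", rel_err)
```

The same loop applies to a full backpropagation implementation: perturb each weight in turn and compare against the backpropagated gradient before trusting the training code.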
PyLearn is built on top of Theano, a Python library for neural nets and related algorithms (also developed at Montreal), which provides symbolic differentiation and GPU support. If for some reason you hate Python, Torch is a powerful machine learning library for Lua.

【Research Update】Deep Learning Roadmap (Part 6): Relationships with other machine learning techniques

Neural networks are related to other machine learning algorithms in countless ways, and understanding these relationships can help in choosing a network architecture. Many neural net architectures can be seen as nonlinear generalizations built on shallow models. A feed-forward network can be viewed as the analogue of logistic regression, and autoencoders can be seen as a nonlinear analogue of the dimensionality-reduction algorithm PCA. RBMs with all-Gaussian units are equivalent to factor analysis, and RBMs can also be generalized to other exponential family distributions.

Kernel methods are another technique for turning linear algorithms into nonlinear ones. There is a subtle relationship between neural nets and kernel methods: a Bayesian neural network with infinitely many hidden units is a Gaussian process. (See Chapter 2 of Radford Neal's Ph.D. thesis. Background: Gaussian processes.)

Relationship with the brain

If these models are called "neural" nets, it's natural to ask whether they have anything to do with how the brain works. In a certain sense, they don't: you can understand and apply the algorithms without knowing anything about neuroscience. Mathematically, feed-forward neural nets are just adaptive basis function expansions. But the connections do run pretty deep between practical machine learning and studies of the mind and brain. Unfortunately, Metacademy doesn't have any neuroscience content (yet!), so the background links in this section will be fairly incomplete. Doubly unfortunately, neuroscience and cognitive science seem not to have the same commitment to open access that machine learning does, so this section might only be useful if you have access to a university library.

When trying to draw parallels between learning algorithms and the brain, we need to be precise about what level we're talking about. In "The philosophy and the approach" (Chapter 1 of Vision: a Computational Investigation), David Marr argued for explicitly separating different levels of analysis: computation, algorithms, and implementation. (This is worth reading, even if you read nothing else in this section.) While not all researchers agree with this way of partitioning things, it's useful to keep in mind when trying to understand exactly what someone is claiming.

Neuroscience

Jeff Hawkins's book On Intelligence aims to present a unifying picture of the computational role of the neocortex. While the theory itself is fairly speculative, the book is an engaging and accessible introduction to the structure of the cortex. Many neural net models have learned similar response properties to neurons in the primary visual cortex (V1). Olshausen and Field's sparse coding model (background) was the first to demonstrate that a purely statistical learning algorithm discovered filters similar to those of V1. (Whether or not this is a neural net is a matter of opinion.) Since then, a wide variety of representation learning algorithms based on seemingly different ideas have recovered similar representations. Other statistical models have learned topological representations similar to the layout of cell types in V1. Karklin and Lewicki fit a more sophisticated statistical model which reproduced response properties of complex cells. While the connection between V1 and learned filters may seem tidy, Olshausen highlights a lot of things we still don't understand about V1. For more on the neuroscience of the visual system, check out Eye, Brain, and Vision, a freely available book written by David Hubel, one of the pioneers who first studied V1. (Chapters 3, 4, and 5 are the most relevant.)
There have also been neural nets explicitly proposed as models of the brain. Riesenhuber and Poggio's HMAX model is a good example. Jim DiCarlo found that deep convolutional networks yield neurons which behave similarly to those high up in the primate visual hierarchy.

Cognitive science

It's not just at the level of neurons that researchers have tried to draw connections between the brain and neural nets. Cognitive science refers to the interdisciplinary study of thought processes, and can be thought of as a study of the mind rather than the brain. Connectionism is a branch of cognitive science, especially influential during the 1980s, which attempted to model high-level cognitive processes in terms of networks of neuron-like units. (Several of the most influential machine learning researchers came out of this tradition.) McClelland and Rumelhart's book Parallel Distributed Processing (volumes 1 and 2) is the connectionist Bible. Other significant works in the field include:
J. McClelland and T. Rogers. The parallel distributed processing approach to semantic cognition. Nature Reviews Neuroscience, 2003.

One of the most perplexing questions about the brain is how neural systems can model the compositional structure of language. Linguists tend to model language in terms of recursive structures like grammars, which are very different from the representations used in most neural net research. Paul Smolensky and Geraldine Legendre's book The Harmonic Mind presents a connectionist theory of language, where neurons implement a system of constraints between different linguistic features.

Reposted from:
http://www.aitmr.com/index.php/airesearch/373.html
http://www.aitmr.com/index.php/airesearch/401.html
http://www.aitmr.com/index.php/airesearch/417.html
http://www.aitmr.com/index.php/airesearch/425.html
http://www.aitmr.com/index.php/airesearch/442.html
http://www.aitmr.com/index.php/airesearch/448.html
Translated from: http://metacademy.org/roadmaps/rgrosse/deep_learning
Introduction: Neural Networks and Support Vector Machines (SVMs) are the two flagship methods of statistical learning. Both can be traced back to the Perceptron, the linear classification model invented by Rosenblatt in 1958. The perceptron handles linear classification, but real-world classification problems are usually nonlinear. Neural networks and SVMs (together with kernel methods) are both nonlinear classification models. In 1986, Rumelhart and McClelland introduced the Back Propagation learning algorithm for neural networks; later, in 1992, Vapnik and colleagues proposed the support vector machine. A neural network is a multi-layer (typically three-layer) nonlinear model, while an SVM uses the kernel trick to turn a nonlinear problem into a linear one.

Neural networks and SVMs have long been "rivals". Schölkopf, Vapnik's foremost student and a leading figure in SVM and kernel-method research, says that Vapnik invented the SVM precisely to beat neural networks ("He wanted to kill Neural Network"). SVMs are indeed effective, and for a time the SVM camp had the upper hand. In recent years, Hinton, a master of the neural network camp, proposed the Deep Learning algorithm for neural networks (2006), greatly increasing their capability and making them competitive with SVMs again. Deep Learning assumes a multi-layer network: first a Boltzmann Machine (unsupervised learning) is used to learn the structure of the network, and then Back Propagation (supervised learning) is used to learn its weights. On the name, Hinton once joked: "I want to call SVM shallow learning." (Note: "shallow" also connotes superficial.) Deep Learning itself simply means learning with deep, multi-layer networks. In short, Deep Learning is a new statistical learning algorithm worth watching.

Deep Learning is a new area of ML research, introduced with the goal of moving ML closer to one of its original objectives: AI. See a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms. Deep learning is about learning multiple levels of representation and abstraction that help make sense of data such as images, sound, and text. For more about deep learning algorithms, see:
The monograph or review paper Learning Deep Architectures for AI (Foundations and Trends in Machine Learning, 2009).
The ICML 2009 Workshop on Learning Feature Hierarchies webpage has a list of references.
The LISA public wiki has a reading list and a bibliography.
Geoff Hinton has readings from last year's NIPS tutorial.

This tutorial introduces some of the most important deep learning algorithms and shows how to run them with Theano. Theano is a Python library that makes it easy to write deep learning models and offers options for training them on a GPU.

The tutorial has a few prerequisites. You should know a little Python and be familiar with numpy. Since this tutorial is about using Theano, you should read the Theano basic tutorial first. Once you have done that, read our Getting Started chapter: it introduces the notation, the datasets, and how to optimize models with stochastic gradient descent.

The purely supervised learning algorithms are meant to be read in order:
Logistic Regression - using Theano for something simple
Multilayer perceptron - introduction to layers
Deep Convolutional Network - a simplified version of LeNet5

The unsupervised and semi-supervised learning algorithms can be read in any order (the auto-encoders can be read independently of the RBM/DBN sections):
Auto Encoders, Denoising Autoencoders - description of autoencoders
Stacked Denoising Auto-Encoders - easy steps into unsupervised pre-training for deep nets
Restricted Boltzmann Machines - single layer generative RBM model
Deep Belief Networks - unsupervised generative pre-training of stacked RBMs followed by supervised fine-tuning

Building towards the mcRBM model, there is also a new tutorial on sampling from energy models:
HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan()

The above is translated from http://deeplearning.net/tutorial/

See also the survey: Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), 2009.

Depth

The computations involved in producing an output from an input can be represented by a flow graph: a graph in which each node represents an elementary computation and a value (the result of that computation, applied to the values at the node's children). Consider the set of computations allowed at each node, together with the possible graph structures; this defines a family of functions. Input nodes have no children, and output nodes have no parents.

For example, the flow graph for the expression sin(a^2 + b/a) can be represented by a graph with two input nodes a and b: one node takes a and b as input (i.e. as children) and computes the division b/a; one node takes only a as input and computes the square a^2; one node takes the square node and the division node as input and computes the sum (its value is a^2 + b/a); and a final output node computes the sine of its single input coming from the addition node.

A particularly important property of a flow graph is its depth: the length of the longest path from an input to an output (a small executable version of this example appears at the end of this part). A traditional feed-forward neural network can be seen as having depth equal to its number of layers (e.g. the number of hidden layers plus 1 for the output layer). SVMs have depth 2 (one level for the kernel outputs or the feature space, and one for the linear combination producing the output).

Motivation for deep architectures

The main motivations for studying learning algorithms based on deep architectures are:
Insufficient depth can be harmful;
The brain has a deep architecture;
Cognitive processes are deep.

Insufficient depth can be harmful

In many cases, depth 2 is enough (e.g. with logical gates, formal neurons, sigmoid neurons, or Radial Basis Function units as in SVMs) to represent any function to a given target accuracy. But this may come at a price: the required number of nodes in the graph (i.e. the number of computations and parameters) may grow very large. Theoretical results show that there exist function families for which the required number of nodes grows exponentially with the input size; this has been shown for logical gates, formal neurons, and RBF units. In the latter case, Hastad showed that when the depth is d, a function family can be represented efficiently (compactly) with O(n) nodes (for n inputs), but that if the depth is restricted to d-1, an exponential number O(2^n) of nodes may be required.

We can view a deep architecture as a kind of factorization. Most randomly chosen functions cannot be represented efficiently, whether with a deep or a shallow architecture. But many functions that can be represented efficiently with a deep architecture cannot be represented efficiently with a shallow one (see the polynomials example in the Bengio survey paper). The existence of a compact, deep representation implies some structure in the underlying function to be represented. If there were no structure at all, it would be impossible to generalize well.
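Here is the promised executable version of the flow-graph example, a sketch in plain Python (the Node class and its methods are our own illustration, not from the original notes). It builds the graph for sin(a^2 + b/a), evaluates it, and measures its depth.

```python
# Flow-graph sketch for sin(a**2 + b/a): each node stores its children;
# depth is the length of the longest input-to-output path.
import math

class Node:
    def __init__(self, name, fn=None, children=()):
        self.name, self.fn, self.children = name, fn, children

    def depth(self):
        # Input nodes have depth 0; otherwise 1 + the deepest child.
        return 0 if not self.children else 1 + max(c.depth() for c in self.children)

    def eval(self, env):
        if not self.children:            # input node: look up its value
            return env[self.name]
        return self.fn(*(c.eval(env) for c in self.children))

a = Node("a")
b = Node("b")
square = Node("square", lambda x: x * x, (a,))      # a**2
div = Node("div", lambda x, y: x / y, (b, a))       # b/a
add = Node("add", lambda x, y: x + y, (square, div))
out = Node("sin", math.sin, (add,))

print(out.eval({"a": 2.0, "b": 3.0}))  # sin(2**2 + 3/2) = sin(5.5)
print("depth:", out.depth())           # longest input-to-output path: 3
```

Counting layers of a feed-forward net with this same definition recovers the statement above: each hidden layer adds one level to the longest path.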
The brain has a deep architecture

The visual cortex, for example, is well studied and shows a sequence of areas, each containing a representation of the input and signals flowing from one area to the next. (This ignores the lateral connections and the parallel paths at some levels, so the reality is more complex.) Each level of this feature hierarchy represents the input at a different level of abstraction, with more abstract features further up the hierarchy, defined in terms of the lower-level ones.

Note that representations in the brain lie between densely distributed and purely local: they are sparse, with roughly 1% of neurons active simultaneously. Given the huge number of neurons, this is still a very efficient (exponentially efficient) representation.

Cognitive processes appear to be deep

Humans organize ideas and concepts hierarchically;
Humans first learn simpler concepts and then compose them to represent more abstract ones;
Engineers decompose tasks into multiple levels of abstraction in order to solve them.

It would be nice to be able to learn / discover these concepts (did knowledge engineering fail for lack of introspection?). Introspection on linguistically expressible concepts also suggests a sparse representation: only a small fraction of all possible words/concepts apply to any particular input (say, a visual scene).

The breakthrough in learning deep architectures

Before 2006, attempts to train deep architectures failed: training a deep supervised feed-forward neural network tended to yield worse results (in both training and test error) than a shallow one (with 1 or 2 hidden layers). Three papers changed that in 2006, spearheaded by Hinton's revolutionary work on Deep Belief Networks (DBNs):
Hinton, G. E., Osindero, S. and Teh, Y., A fast learning algorithm for deep belief nets. Neural Computation 18:1527-1554, 2006
Yoshua Bengio, Pascal Lamblin, Dan Popovici and Hugo Larochelle, Greedy Layer-Wise Training of Deep Networks, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems 19 (NIPS 2006), pp. 153-160, MIT Press, 2007
Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra and Yann LeCun, Efficient Learning of Sparse Representations with an Energy-Based Model, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, 2007

These three papers identified the following key principles (a minimal sketch of the first two appears after this part):
Unsupervised learning of representations is used to (pre-)train each layer;
The layers are pre-trained one at a time, in order, each on top of the previously trained ones, with the representation learned at each layer used as input to the next;
Supervised training is then used to fine-tune all the layers (together with one or more additional layers dedicated to producing predictions).

DBNs use RBMs for the unsupervised learning of the representation at each layer. The Bengio et al. paper explores and compares RBMs and auto-encoders (neural networks that predict their input through a bottleneck internal representation layer). The Ranzato et al. paper uses a sparse auto-encoder (similar to sparse coding) in the context of a convolutional architecture. Auto-encoders and convolutional architectures will be covered later in the course.

Since 2006, a great many papers on deep learning have been published, some exploring other principles for guiding the training of intermediate representations; see Learning Deep Architectures for AI.

The English original of this article is at http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html
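Here is the promised sketch of the first two principles, greedy layer-wise unsupervised pre-training, in NumPy. It is an illustration under assumed settings (toy random data, tied-weight autoencoders as the per-layer learner), not the exact RBM-based procedure of DBNs; supervised fine-tuning (the third principle) is omitted for brevity.

```python
# Greedy layer-wise pretraining sketch: train an autoencoder on the raw
# input, then train another on its hidden codes, and so on up the stack.
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hid, lr=0.1, epochs=100):
    """One-hidden-layer autoencoder with tied weights; returns encoder params."""
    n_vis = X.shape[1]
    W = rng.normal(0, 0.1, size=(n_vis, n_hid))
    b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)
    for _ in range(epochs):
        h = sigmoid(X @ W + b_h)                 # encode
        X_rec = sigmoid(h @ W.T + b_v)           # decode
        d_rec = (X_rec - X) * X_rec * (1 - X_rec)
        d_h = (d_rec @ W) * h * (1 - h)
        W -= lr * (X.T @ d_h + d_rec.T @ h) / len(X)
        b_h -= lr * d_h.mean(axis=0)
        b_v -= lr * d_rec.mean(axis=0)
    return W, b_h

X = rng.random((500, 64))          # stand-in for unlabeled data
layers, sizes = [], [32, 16, 8]    # hidden sizes of the stack
H = X
for n_hid in sizes:
    W, b_h = train_autoencoder(H, n_hid)   # train this layer in isolation...
    layers.append((W, b_h))
    H = sigmoid(H @ W + b_h)               # ...then feed its codes to the next

print("final representation shape:", H.shape)  # (500, 8)
```

In the full recipe, the stacked encoder weights would then initialize a supervised network, with a prediction layer on top, and the whole stack would be fine-tuned with backpropagation.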
Excerpted from: http://cseweb.ucsd.edu/~dasgupta/254-deep/

CSE 254: Seminar on Learning Algorithms
Time: TuTh 3:30-5 in CSE 2154
Instructor: Sanjoy Dasgupta. Office hours TBA in EBU3B 4138

This quarter the theme of CSE 254 is deep learning. Prerequisite: CSE 250AB. The first couple of lectures will be an overview of basic material. Thereafter, in each class meeting, a student will give a talk lasting about 60 minutes presenting a technical paper (or several papers) in detail. In questions during the talk, and in the final 20 minutes, all seminar participants will discuss the paper and the issues raised by it.

Schedule (date: presenter, paper):
Jan 10: Sanjoy, Introduction
Jan 12: Sanjoy, Hopfield nets
Jan 17: Sanjoy, Markov random fields, Gibbs sampling, simulated annealing
Jan 19: Sanjoy, Deep belief nets as autoencoders and classifiers
Jan 24: Brian, Task-driven dictionary learning (slides here)
Jan 26: Vicente, A quantitative theory of immediate visual recognition (slides here)
Jan 31: Emanuele, Convolutional deep belief networks (slides here)
Feb 2: Nakul, Restricted Boltzmann machines: learning, and hardness of inference (slides here)
Feb 7: Craig, The independent components of natural scenes are edge filters (slides here)
Feb 9: No class: ITA conference at UCSD
Feb 14: Janani, Deep learning via semi-supervised embedding (slides here)
Feb 16: Stefanos, A unified architecture for natural language processing (slides here)
Feb 21: Hourieh, An analysis of single-layer networks in unsupervised feature learning (slides here)
Feb 23: Ozgur, Emergence of simple-cell receptive field properties by learning a sparse code for natural images (slides here)
Feb 28: Matus, Representation power of neural networks: Barron, Cybenko, Kolmogorov (slides here)
Mar 1: Frederic, Reinforcement learning on slow features of high-dimensional input streams
Mar 6: Dibyendu, Sreeparna, Learning deep energy models, and What is the best multistage architecture for object recognition? (slides here)
Mar 8: No class: Sanjoy out of town
Mar 13: Bryan, Inference of sparse combinatorial-control networks (slides here)
Mar 15: Qiushi, Weighted sums of random kitchen sinks (slides here)

This is a four-unit course in which the work consists of oral presentations. The procedure for each student presentation is as follows:
· One week in advance: finish a draft of the LaTeX/PowerPoint slides that clearly present the work in the paper. Make an appointment with me to discuss the draft slides, and email me the slides.
· Several days in advance: meet for about one hour to discuss improving the slides, and how to give a good presentation.
· Day of presentation: give a good presentation with confidence, enthusiasm, and clarity.
· Less than three days afterwards: make the changes to the slides suggested by the class discussion, and email me the slides in PDF, two slides per page, for publishing. Try to make your PDF file less than one megabyte.

Please read, reflect upon, and follow these presentation guidelines, courtesy of Prof Charles Elkan. Presentations will be evaluated, in a friendly way but with high standards, using this feedback form. Here is a preliminary list of papers.
Deep Learning
Instructor: Bhiksha Raj
Course number: MLD 10-805; LTI 11-785 (Lab) / 11-786 (Seminar)
Timings: 1:30 p.m. -- 2:50 p.m., Mondays and Wednesdays
Location: GHC 4211
Website: http://deeplearning.cs.cmu.edu
Credits: 10-805 and 11-786 are 6-credit seminar courses. 11-785 is a 12-credit lab course. Students who register for 11-785 will be required to complete all lab exercises.
IMPORTANT: LTI students are requested to switch to the 11-XXX courses. All students desiring 12 credits must register for 11-785.
Instructor: Bhiksha Raj. Contact: bhiksha@cs.cmu.edu, phone: 8-9826, office: GHC 6705. Office hours: 3:30-5:00 Mondays. You may also meet me at other times if I'm free.
TA: Anders Oland. Contact: anderso@cs.cmu.edu, office: GHC 7709. Office hours: 12:30-2:00 Fridays.

Deep learning algorithms attempt to learn multi-level representations of data, embodying a hierarchy of factors that may explain them. Such algorithms have been demonstrated to be effective at uncovering underlying structure in data and have been successfully applied to a large variety of problems ranging from image classification to natural language processing and speech recognition. In this course students will learn about this resurgent subject. The course presents the subject through a series of seminars, which will explore it from its early beginnings and work their way to some of the state of the art. The seminars will cover the basics of deep learning and the underlying theory, the breadth of application areas to which it has been applied, and the latest issues on learning from very large amounts of data. Although the concept of deep learning has been applied to a number of different models, we will concentrate largely, although not entirely, on the connectionist architectures most commonly associated with it. Students who participate in the course are expected to present at least one paper on the topic to the class. Presentations are expected to be thorough and, where applicable, illustrated through experiments and simulations conducted by the student. Students registered for the lab course must also complete all lab exercises.

Labs
Lab 1 is up. Lab 1: Perceptrons and MLPs (data sets). Due: 18 Sep 2013
Lab 2 is up. Lab 2: The effect of increasing network depth (data set). Due: 17 Oct 2013

Papers and presentations (date: paper, author; presenter; additional links)

28 Aug 2013:
- Introduction. Bhiksha Raj
- Intelligent Machinery. Alan Turing. Presenter: Subhodeep Moitra

4 Sep 2013:
- Bain on Neural Networks. Brain and Cognition 33:295-305, 1997. Alan L. Wilkes and Nicholas J. Wade. Presenter: Lars Mahler
- A Logical Calculus of the Ideas Immanent in Nervous Activity, Bulletin of Mathematical Biophysics, 5:115-137, 1943. W. S. McCulloch and W. H. Pitts. Presenter: Kartik Goyal. Additional link: Michael Marsalli's tutorial on the McCulloch and Pitts neuron

9 Sep 2013:
- The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review 65(6): 386-408, 1958. F. Rosenblatt. Presenter: Daniel Maturana
- Chapter from "The Organization of Behavior", 1949. D. O. Hebb. Presenter: Sonia Todorova

11 Sep 2013:
- The Widrow-Hoff learning rule (ADALINE and MADALINE). Widrow. Presenter: Pallavi Baljekar
- Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks 2(6): 459-473, 1989. T. Sanger. Presenter: Khoa Luu. Additional link: A simplified neuron model as a principal component analyzer, by Erkki Oja

16 Sep 2013:
- Learning representations by back-propagating errors. Nature 323(6088): 533-536. Rumelhart et al. Presenter: Ahmed Hefny. Additional links: chapter by Rumelhart, Hinton and Williams; Backpropagation through time: what it does and how to do it, P. Werbos, Proc. IEEE, 1990
- A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm, IEEE Intl. Conf. on Neural Networks, 1993. M. Riedmiller, H. Braun. Presenter: Danny (ZhenZong) Lan

18 Sep 2013:
- Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sciences, Vol 79, 2554-2558, 1982. J. J. Hopfield. Presenter: Prasanna Muthukumar
- The self-organizing map. Proc. IEEE, Vol 78, 1464-1480, 1990. Teuvo Kohonen. Presenter: Fatma Faruq

23 Sep 2013:
- Phoneme recognition using time-delay neural networks, IEEE Trans. Acoustics, Speech and Signal Processing, Vol 37(3), March 1989. A. Waibel et al. Presenter: Chen Chen
- A tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the echo state network approach, GMD Report 159, German National Research Center for Information Technology, 2002. Herbert Jaeger. Presenter: Shaowei Wang

25 Sep 2013:
- Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, Vol 45(11), Nov. 1997. M. Schuster and K. Paliwal. Presenter: Felix Juefei Xu
- Long short-term memory. Neural Computation, 9(8):1735-1780, 1997. S. Hochreiter and J. Schmidhuber. Presenter: Dougal Sutherland

30 Sep 2013:
- A learning algorithm for Boltzmann machines, Cognitive Science, 9, 147-169, 1985. D. Ackley, G. Hinton, T. Sejnowski. Presenter: Siyuan
- Improved simulated annealing, Boltzmann machine, and attributed graph matching, EURASIP Workshop on Neural Networks, Vol 412 of LNCS, Springer, pp. 151-160, 1990. Lei Xu, Erkki Oja. Presenter: Ran Chen

2 Oct 2013:
- Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position, Pattern Recognition, Vol. 15(6), pp. 455-469, 1982. K. Fukushima, S. Miyake. Presenter: Sam Thomson. Additional link: Shift invariance and the Neocognitron, E. Barnard and D. Casasent, Neural Networks, Vol 3(4), pp. 403-410, 1990
- Face recognition: A convolutional neural-network approach, IEEE Transactions on Neural Networks, Vol 8(1), pp. 98-113, 1997. S. Lawrence, C. L. Giles, A. C. Tsoi, A. D. Back. Presenter: Hoang Ngan Le. Additional links: Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis, P. Y. Simard, D. Steinkraus, J. C. Platt, Proc. Document Analysis and Recognition, 2003; Gradient-based learning applied to document recognition, Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Proceedings of the IEEE, November 1998, pp. 1-43

7 Oct 2013:
- On the problem of local minima in backpropagation, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol 14(1), 76-86, 1992. M. Gori, A. Tesi. Presenter: Jon Smereka
- Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Networks, Vol 5(2), pp. 157-166, 1994. Y. Bengio, P. Simard, P. Frasconi. Presenter: Keerthiram Murugesan. Additional links: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, in A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001; Backpropagation is sensitive to initial conditions, J. F. Kolen and J. B. Pollack, Advances in Neural Information Processing Systems, pp. 860-867, 1990

9 Oct 2013:
- Multilayer feedforward networks are universal approximators, Neural Networks, Vol 2(3), 359-366, 1989. K. Hornik, M. Stinchcombe, H. White. Presenter: Sonia Todorova.
Additional links: Approximation by superpositions of a sigmoidal function, G. Cybenko, Mathematics of Control, Signals and Systems, Vol 2, pp. 303-314, 1989; On the approximate realization of continuous mappings by neural networks, K. Funahashi, Neural Networks, Vol 2(3), pp. 183-192, 1989; Universal approximation bounds for superpositions of a sigmoidal function, A. R. Barron, IEEE Trans. on Info. Theory, Vol 39(3), pp. 930-945, 1993
- On the expressive power of deep architectures, Proc. 14th Intl. Conf. on Discovery Science, 2011. Y. Bengio and O. Delalleau. Presenter: Prasanna Muthukumar. Additional links: Scaling learning algorithms towards AI, Y. Bengio and Y. LeCun, in Large Scale Kernel Machines, Eds. Bottou, Chapelle, DeCoste, Weston, 2007; Shallow vs. deep sum-product networks, O. Delalleau and Y. Bengio, Advances in Neural Information Processing Systems, 2011

14 Oct 2013:
- Information processing in dynamical systems: Foundations of Harmony Theory; in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Rumelhart and McClelland, eds., 1986. Paul Smolensky. Presenter: Kathy Brigham. Additional link: Geometry of the restricted Boltzmann machine, M. A. Cueto, J. Morton, B. Sturmfels, Contemporary Mathematics, Vol. 516, pp. 135-153, 2010
- Exponential family harmoniums with an application to information retrieval, Advances in Neural Information Processing Systems (NIPS), 2004. M. Welling, M. Rosen-Zvi, G. Hinton. Presenter: Ankur Gandhe. Additional links: Continuous restricted Boltzmann machine with an implementable training algorithm, H. Chen and A. F. Murray, IEE Proceedings on Vision, Image and Signal Processing, Vol. 150(3), pp. 153-158, 2003; Diffusion networks, products of experts, and factor analysis, T. K. Marks and J. R. Movellan, 3rd Intl. Conf. on Independent Component Analysis and Signal Separation, 2001

16 Oct 2013:
- Distributed optimization of deeply nested systems. Unpublished manuscript, Dec. 24, 2012, arXiv:1212.5921. M. Carreira-Perpiñán and W. Wang. Presenter: M. Carreira-Perpiñán

21 Oct 2013:
- Training products of experts by minimizing contrastive divergence, Neural Computation, Vol. 14(8), pp. 1771-1800, 2002. G. Hinton. Presenter: Yuxiong Wang. Additional links: On contrastive divergence learning, M. Carreira-Perpiñán, AI and Statistics, 2005; Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient, T. Tieleman, International Conference on Machine Learning (ICML), pp. 1064-1071, 2008; An Analysis of Contrastive Divergence Learning in Gaussian Boltzmann Machines, Chris Williams, Felix Agakov, tech report, University of Edinburgh, 2002; Justifying and generalizing contrastive divergence, Y. Bengio, O. Delalleau, Neural Computation, Vol. 21(6), pp. 1601-1621, 2009

23 Oct 2013:
- A fast learning algorithm for deep belief networks, Neural Computation, Vol. 18, No. 7, pp. 1527-1554, 2006. G. Hinton, S. Osindero, Y.-W. Teh. Presenter: Aaron Wise. Additional link: Reducing the dimensionality of data with neural networks, G. Hinton and R. Salakhutdinov, Science, Vol. 313, No. 5786, pp. 504-507, 28 July 2006
- Greedy layer-wise training of deep networks, Neural Information Processing Systems (NIPS), 2007. Y. Bengio, P. Lamblin, D. Popovici and H. Larochelle. Presenter: Ahmed Hefny. Additional link: Efficient Learning of Sparse Overcomplete Representations with an Energy-Based Model, M. Ranzato, C. S. Poultney, S. Chopra, Y. LeCun, Neural Information Processing Systems (NIPS), 2006

28 Oct 2013:
- ImageNet classification with deep convolutional neural networks, NIPS 2012. A. Krizhevsky, I. Sutskever, G. Hinton. Presenter: Danny Lan. Additional links: Convolutional-recursive deep learning for 3D object classification, R. Socher, B. Huval, B. Bhat, C. Manning, A. Ng, NIPS 2012; Multi-column deep neural networks for image classification, D. Ciresan, U. Meier and J. Schmidhuber, CVPR 2012
- Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 35(8), pp. 1915-1929, 2012. C. Couprie, L. Najman, Y. LeCun. Presenter: Jon Smereka. Additional link: Learning convolutional feature hierarchies for visual recognition, K. Kavukcuoglu, P. Sermanet, Y-Lan Boureau, K. Gregor, M. Mathieu, Y. LeCun, NIPS 2010

30 Oct 2013:
- Statistical language models based on neural networks, PhD dissertation, Brno, 2012, chapters 3 and 6. T. Mikolov. Presenter: Fatma Faruq
- Semi-supervised recursive autoencoders for predicting sentiment. R. Socher, J. Pennington, E. Huang, A. Ng and C. Manning. Presenter: Yueran Yuan. Additional links: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection, R. Socher, E. Huang, J. Pennington, A. Ng, C. Manning, EMNLP 2011; Joint learning of words and meaning representations for open-text semantic parsing, A. Bordes, X. Glorot, J. Weston, Y. Bengio, AISTATS 2012

4 Nov 2013:
- Supervised sequence labelling with recurrent neural networks, PhD dissertation, T. U. München, 2008, Chapters 4 and 7. A. Graves. Presenter: Georg Schoenherr. Additional link: Speech recognition with deep recurrent neural networks, A. Graves, A. Mohamed, G. Hinton, ICASSP 2013
- Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine, Vol 29(6), pp. 82-97, 2012. G. Hinton et al. Presenter: Daniel Maturana

6 Nov 2013:
- Modeling Documents with a Deep Boltzmann Machine, UAI 2013. N. Srivastava, R. Salakhutdinov, G. Hinton. Presenter: Siyuan. Additional link: Generating text with recurrent neural networks, I. Sutskever, J. Martens, G. Hinton, ICML 2011
- Word representations: A simple and general method for semi-supervised learning, ACL 2010. J. Turian, L. Ratinov, Y. Bengio. Presenter: Sam Thomson

11 Nov 2013:
- An empirical evaluation of deep architectures on problems with many factors of variation, ICML 2007. H. Larochelle, D. Erhan, A. Courville, J. Bergstra, Y. Bengio. Presenter: Ran Chen
- The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training, AISTATS 2009. D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, P. Vincent. Presenter: Ankur Gandhe

13 Nov 2013:
- Extracting and Composing Robust Features with Denoising Autoencoders, ICML 2008. P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol. Presenter: Pallavi Baljekar
- Improving neural networks by preventing co-adaptation of feature detectors. G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov. Presenter: Subhodeep Moitra

18 Nov 2013:
- A theory of deep learning architectures for sensory perception: the ventral stream. Fabio Anselmi, Joel Z. Leibo, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti, Tomaso Poggio. Presenter: Dipan Pal

20 Nov 2013:
- No more pesky learning rates, ICML 2013. Tom Schaul, Sixin Zhang and Yann LeCun. Presenter: Georg Schoenherr. Additional link: No more pesky learning rates: supplementary material
- On the importance of initialization and momentum in deep learning, JMLR 28(3): 1139-1147, 2013. Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton. Presenter: Kartik Goyal. Additional link: supplementary material for the paper

25 Nov 2013:
- Guest lecture: Quoc Le

27 Nov 2013:
- A multi-layer sparse coding network learns contour coding from natural images, Vision Research 42(12): 1593-1605, 2002. Patrik O. Hoyer and Aapo Hyvarinen (Neural Networks Research Centre)
- Sparse Feature Learning for Deep Belief Networks, NIPS 2007. Marc'Aurelio Ranzato, Y-Lan Boureau, Yann LeCun
- Sparse deep belief net model for visual area V2, NIPS 2007. Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng
- Deep Sparse Rectifier Neural Networks, JMLR 15: 315-323, 2011. Xavier Glorot, Antoine Bordes, Yoshua Bengio

To be arranged:
- Exploring strategies for training deep neural networks, Journal of Machine Learning Research, Vol. 10, pp. 1-40, 2009. H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin
- Why Does Unsupervised Pre-training Help Deep Learning?, AISTATS 2010. D. Erhan, A. Courville, Y. Bengio, P. Vincent
- Understanding the difficulty of training deep feedforward neural networks, AISTATS 2010. X. Glorot and Y. Bengio
- A Provably Efficient Algorithm for Training Deep Networks, arXiv:1304.7045, 2013. R. Livni, S. Shalev-Shwartz, O. Shamir
Accepted Papers

Oral presentations:
- Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction. Jian Zhou, Olga Troyanskaya
- Playing Atari with Deep Reinforcement Learning. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

Poster presentations:
- Sparse Combinatorial Autoencoders (ID 2). Karthik Narayan, Pieter Abbeel
- Grounded Compositional Semantics for Finding and Describing Images with Sentences (ID 4). Richard Socher, Quoc Le, Christopher Manning, Andrew Ng
- Curriculum Learning for Handwritten Text Line Recognition (ID 5). Jerome Louradour, Christopher Kermorvant
- A Deep and Tractable Density Estimator (ID 7). Benigno Uria, Iain Murray, Hugo Larochelle
- Multi-Column Deep Neural Networks for Offline Handwritten Chinese Character Classification (ID 11). Dan Ciresan, Juergen Schmidhuber
- End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks (ID 12). Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss
- Scalable Wide Sparse Learning for Connectomics (ID 15). Jeremy Maitin-Shepard, Pieter Abbeel
- Is deep learning really necessary for word embeddings? (ID 16). Rémi Lebret, Joël Legrand, Ronan Collobert
- Recurrent Conditional Random Fields (ID 18). Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Dong Yu, Xiaolong Li, Feng Gao
- Recurrent Convolutional Neural Networks for Scene Parsing (ID 20). Pedro Pinheiro, Ronan Collobert
- Backpropagation in Sequential Deep Belief Networks (ID 22). Galen Andrew, Jeff Bilmes
- Learning semantic representations for the phrase translation model (ID 23). Jianfeng Gao, Xiaodong He, Wen-tau Yih, Li Deng
- Event-driven Contrastive Divergence in Spiking Neural Networks (ID 25). Emre Neftci, Bruno Pedroni, Gert Cauwenberghs, Kenneth Kreutz-Delgado, Srinjoy Das
- Dynamics of learning in deep linear neural networks (ID 27). Andrew Saxe, James McClelland, Surya Ganguli
- Exploring Deep and Recurrent Architectures for Optimal Control (ID 28). Sergey Levine
- Analyzing noise in autoencoders and deep networks (ID 29). Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli
- Structured Recurrent Temporal Restricted Boltzmann Machines (ID 30). Roni Mittelman, Benjamin Kuipers, Silvio Savarese, Honglak Lee
- Learning Deep Representations via Multiplicative Interactions between Factors of Variation (ID 31). Scott Reed, Honglak Lee
- Learning Input and Recurrent Weight Matrices in Echo State Networks (ID 32). Hamid Palangi, Li Deng, Rabab Ward
- Learning Sum-Product Networks with Direct and Indirect Variable Interactions (ID 33). Amirmohammad Rooshenas, Daniel Lowd
- Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure (ID 34). Ozan Irsoy, Claire Cardie
- Estimating Dependency Structures for non-Gaussian Components (ID 38). Hiroaki Sasaki, Michael Gutmann, Hayaru Shouno, Aapo Hyvarinen
- Multimodal Neural Language Models (ID 42). Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel
- Non-degenerate Priors for Arbitrarily Deep Networks (ID 43). David Duvenaud, Oren Rippel, Ryan Adams, Zoubin Ghahramani
- Learning Multilingual Word Representations using a Bag-of-Words Autoencoder (ID 44). Stanislas Lauly, Alex Boulanger, Hugo Larochelle
- Multilingual Deep Learning (ID 45). Sarath Chandar A P, Mitesh M. Khapra, Balaraman Ravindran, Vikas Raykar, Amrita Saha
- Learned-norm pooling for deep neural networks (ID 46). Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio
- Transition-based Dependency Parsing Using Recursive Neural Networks (ID 47). Pontus Stenetorp

30 papers accepted in total.
Source: https://sites.google.com/site/deeplearningworkshopnips2013/accepted-papers
http://www.cs.washington.edu/research/ml/projects/
Original page: http://www.cs.washington.edu/node/8805

In machine learning, as throughout computer science, there is a tradeoff between expressiveness and tractability. On the one hand, we need powerful model classes to capture the richness and complexity of the real world. On the other, we need inference in those models to remain tractable, otherwise their potential for widespread practical use is limited. Deep learning can induce powerful representations, with multiple layers of latent variables, but these models are generally intractable. We are developing new classes of similarly expressive but still tractable models, including sum-product networks and tractable Markov logic (a toy sum-product network sketch follows at the end of this excerpt). These models capture both class-subclass and part-subpart structure in the domain, and are in some aspects more expressive than traditional graphical models like Bayesian networks and Markov random fields. Research includes designing representations, studying their properties, developing efficient algorithms for learning them, and applications to challenging problems in natural language understanding, vision, and other areas.

Awards
NIPS 2012 Outstanding Student Paper: Discriminative Learning of Sum-Product Networks
UAI 2011 Best Paper: Sum-Product Networks: A New Deep Architecture
EMNLP 2009 Best Paper: Unsupervised Semantic Parsing

People
Pedro Domingos, Abram L. Friesen, Robert C. Gens, Chloe M. Kiddon, Aniruddh Nath, Mathias Niepert, W. Austin Webb

Publications
Learning the Structure of Sum-Product Networks (2013)
A Tractable First-Order Probabilistic Logic (2012)
Discriminative Learning of Sum-Product Networks (2012)
Learning Multiple Hierarchical Relational Clusterings (2012)
Coarse-to-Fine Inference and Learning for First-Order Probabilistic Models (2011)
Sum-Product Networks: A New Deep Architecture (2011)
Approximate Inference by Compilation to Arithmetic Circuits (2010)
Learning Efficient Markov Networks (2010)
Unsupervised Ontology Induction from Text (2010)
Unsupervised Semantic Parsing (2009)
Learning Arithmetic Circuits (2008)
Naive Bayes Models for Probability Estimation (2005)

Research Groups
Artificial Intelligence
Machine Learning
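For readers unfamiliar with sum-product networks, here is a toy sketch of the core idea (our own illustration, not the group's code or models): leaves are distributions over single variables, product nodes combine children with disjoint scopes, sum nodes take weighted mixtures, and a single bottom-up pass evaluates the joint probability in time linear in the number of edges, which is what makes these deep models tractable.

```python
# Toy sum-product network over two binary variables (hypothetical example).
class Leaf:
    """Bernoulli distribution over one variable."""
    def __init__(self, var, p):
        self.var, self.p = var, p
    def value(self, x):
        return self.p if x[self.var] == 1 else 1 - self.p

class Product:
    """Multiplies children with disjoint variable scopes."""
    def __init__(self, children):
        self.children = children
    def value(self, x):
        out = 1.0
        for c in self.children:
            out *= c.value(x)
        return out

class Sum:
    """Weighted mixture of children over the same scope; weights sum to 1."""
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children
    def value(self, x):
        return sum(w * c.value(x) for w, c in self.weighted_children)

# P(x0, x1) as a mixture of two fully factored distributions.
spn = Sum([
    (0.4, Product([Leaf(0, 0.9), Leaf(1, 0.2)])),
    (0.6, Product([Leaf(0, 0.1), Leaf(1, 0.7)])),
])

total = sum(spn.value({0: a, 1: b}) for a in (0, 1) for b in (0, 1))
print(total)  # sums to 1.0: the network defines a valid distribution
```

Deeper stacks of alternating sum and product nodes give the class-subclass (sum) and part-subpart (product) structure described above, while evaluation stays a single linear-time pass.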
Facebook Launches Advanced AI Effort to Find Meaning in Your Posts
A technique called deep learning could help Facebook understand its users and their data better.
By Tom Simonite on September 20, 2013

Facebook's piles of data on people's lives could allow it to push the boundaries of what can be done with the emerging AI technique known as deep learning.

Facebook is set to get an even better understanding of the 700 million people who use the social network to share details of their personal lives each day. A new research group within the company is working on an emerging and powerful approach to artificial intelligence known as deep learning, which uses simulated networks of brain cells to process data. Applying this method to data shared on Facebook could allow for novel features and perhaps boost the company's ad targeting.

Deep learning has shown potential as the basis for software that could work out the emotions or events described in text even if they aren't explicitly referenced, recognize objects in photos, and make sophisticated predictions about people's likely future behavior.

The eight-person group, known internally as the AI team, only recently started work, and details of its experiments are still secret. But Facebook's chief technology officer, Mike Schroepfer, will say that one obvious way to use deep learning is to improve the news feed, the personalized list of recent updates he calls Facebook's "killer app." The company already uses conventional machine learning techniques to prune the 1,500 updates that average Facebook users could possibly see down to 30 to 60 that are judged most likely to be important to them. Schroepfer says Facebook needs to get better at picking the best updates because its users are generating more data and using the social network in different ways.

"The data set is increasing in size, people are getting more friends, and with the advent of mobile, people are online more frequently," Schroepfer told MIT Technology Review. "It's not that I look at my news feed once at the end of the day; I constantly pull out my phone while I'm waiting for my friend or I'm at the coffee shop. We have five minutes to really delight you."

Schroepfer says deep learning could also be used to help people organize their photos or choose which is the best one to share on Facebook.

In looking into deep learning, Facebook follows its competitors Google and Microsoft, which have used the approach to impressive effect in the past year. Google has hired and acquired leading talent in the field (see "10 Breakthrough Technologies 2013: Deep Learning"), and last year it created software that taught itself to recognize cats and other objects by reviewing stills from YouTube videos. The underlying technology was later used to slash the error rate of Google's voice recognition services (see "Google's Virtual Brain Goes to Work").

Meanwhile, researchers at Microsoft have used deep learning to build a system that translates speech from English to Mandarin Chinese in real time (see "Microsoft Brings Star Trek's Voice Translator to Life"). Chinese Web giant Baidu also recently established a Silicon Valley research lab to work on deep learning.

Less complex forms of machine learning have underpinned some of the most useful features developed by major technology companies in recent years, such as spam detection systems and facial recognition in images.
The largest companies have now begun investing heavily in deep learning because it can deliver significant gains over those more established techniques, says Elliot Turner, founder and CEO of AlchemyAPI, which rents access to its own deep learning software for text and images. "Research into understanding images, text, and language has been going on for decades, but the typical improvement a new technique might offer was a fraction of a percent," he says. "In tasks like vision or speech, we're seeing 30 percent-plus improvements with deep learning."

The newer technique also allows much faster progress in training a new piece of software, says Turner. Conventional forms of machine learning are slower because before data can be fed into learning software, experts must manually choose which features of it the software should pay attention to, and they must label the data to signify, for example, that certain images contain cars. Deep learning systems can learn with much less human intervention because they can figure out for themselves which features of the raw data are most significant. They can even work on data that hasn't been labeled, as Google's cat-recognizing software did. Systems able to do that typically use software that simulates networks of brain cells, known as neural nets, to process data. They require more powerful collections of computers to run.

Facebook's AI group will work on applications that can help the company's products as well as on more general research that will be made public, says Srinivas Narayanan, an engineering manager at Facebook who's helping to assemble the new group. He says one way Facebook can help advance deep learning is by drawing on its recent work creating new types of hardware and software to handle large data sets (see "Inside Facebook's Not-So-Secret New Data Center"). "It's both a software and a hardware problem together; the way you scale these networks requires very deep integration of the two," he says.

Facebook hired deep learning expert Marc'Aurelio Ranzato away from Google for its new group. Other members include Yaniv Taigman, cofounder of the facial recognition startup Face.com (see "When You're Always a Familiar Face"); computer vision expert Lubomir Bourdev; and veteran Facebook engineer Keith Adams.

Original article: http://www.technologyreview.com/news/519411/facebook-launches-advanced-ai-effort-to-find-meaning-in-your-posts/
- Adaptive dropout for training deep neural networks. J. Ba, B. Frey. http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/1409.pdf
- A Deep Architecture for Matching Short Texts. Z. Lu, H. Li. http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/697.pdf
- A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks. J. Yin, Q. Ho, E. Xing. http://papers.nips.cc/paper/4978-a-scalable-approach-to-probabilistic-latent-space-inference-of-large-scale-networks.pdf
- Bayesian Hierarchical Community Discovery. C. Blundell, Y. Teh
- Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Y. Hu, J. Boyd-Graber, H. Daume III, Z. Ying
- Convex Two-Layer Modeling. Ö. Aslan, H. Cheng, X. Zhang, D. Schuurmans
- Deep content-based music recommendation. A. van den Oord, S. Dieleman, B. Schrauwen
- Deep Fisher Networks for Large-Scale Image Classification. K. Simonyan, A. Vedaldi, A. Zisserman
- Deep Neural Networks for Object Detection. C. Szegedy, A. Toshev, D. Erhan
- DeViSE: A Deep Visual-Semantic Embedding Model. A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, T. Mikolov
- Dropout Training as Adaptive Regularization. S. Wager, S. Wang, P. Liang
- Extracting regions of interest from biological images with convolutional sparse block coding. M. Pachitariu, M. Sahani, A. Packer, N. Pettit, H. Dalgleish
- Generalized Denoising Auto-Encoders as Generative Models. Y. Bengio, L. Yao, G. Alain, P. Vincent
- Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream. D. Yamins, H. Hong, C. Cadieu, J. DiCarlo
- Learning a Deep Compact Image Representation for Visual Tracking. N. Wang, D. Yeung
- Learning Multi-level Sparse Representations. F. Diego, F. Hamprecht
- Learning Stochastic Feedforward Neural Networks. Y. Tang, R. Salakhutdinov
- Lexical and Hierarchical Topic Regression. V. Nguyen, J. Boyd-Graber, P. Resnik
- Multi-Prediction Deep Boltzmann Machines. I. Goodfellow, M. Mirza, A. Courville, Y. Bengio
- Multisensory Encoding, Decoding, and Identification. A. Lazar, Y. Slutskiy
- On the Expressive Power of Restricted Boltzmann Machines. J. Martens, A. Chattopadhya, T. Pitassi, R. Zemel
- Pass-efficient unsupervised feature selection. H. Schweitzer, C. Maung
- Predicting Parameters in Deep Learning. M. Denil, B. Shakibi, L. Dinh, M. Ranzato, N. de Freitas
- Reasoning With Neural Tensor Networks for Knowledge Base Completion. R. Socher, D. Chen, C. Manning, A. Ng
- Robust Image Denoising with Multi-Column Deep Neural Networks. F. Agostinelli, H. Lee, M. Anderson
- Spike train entropy-rate estimation using hierarchical Dirichlet process priors. K. Knudson, J. Pillow
- Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs. Y. Dauphin, Y. Bengio
- Top-Down Regularization of Deep Belief Networks. H. Goh, N. Thome, M. Cord, J. Lim
- Training and Analysing Deep Recurrent Neural Networks. M. Hermans, B. Schrauwen
- Understanding Dropout. P. Baldi, P. Sadowski
- Wavelets on Graphs via Deep Learning. R. Rustamov, L. Guibas

Source: http://nips.cc/Conferences/2013/Program/accepted-papers.php
Title: Sparse representation in computer vision and visual cortex
Speakers: Yigang Peng, Ph.D. from the Department of Automation, Tsinghua University; research interests: image/video processing, sparse representation, low-rank matrix recovery. Da Xiao, faculty member at the School of Computer Science, Beijing University of Posts and Telecommunications.
Outline:
1. From sparsity to low-rankness and more (slides: http://vdisk.weibo.com/s/KMQW6)
2. Self-organizing cortical map model and the Topographica simulator (slides: http://vdisk.weibo.com/s/KMR4I; see also the references)
Video replay: http://www.duobei.com/room/4411032613
References:
Bednar JA: Building a mechanistic model of the development and function of the primary visual cortex. J Physiol Paris, 2012, 106(5-6):194-211.
Demo code: http://topographica.org/
Some veterans of NLP (Natural Language Processing), recounting how statistical models (via so-called machine learning) displaced traditional knowledge systems (a.k.a. rule-based systems) as the academic mainstream, describe what happened twenty-odd years ago as a soul-stirring religious war. To me it felt more like the People's Liberation Army crossing the Yangtze in 1949: traditional knowledge-based NLP collapsed like the Nationalist army, handing over the whole territory. When the bookish humanities scholar meets the science geek in battle, no amount of reasoning helps; surrender was the inevitable outcome. The only regret, perhaps, is that the statisticians won too smoothly, against too little resistance, and may even have felt the victory a bit hollow. The pale, frail linguists simply could not take a punch.

Ever since the statisticians marched irresistibly in twenty years ago and unified the field, those of us trained as linguists became second-class citizens in academia overnight, beating the side drum and following the fashion as if treading on thin ice; and when I tire of following, I allow myself a turn as Ah Q.

In NLP, the statisticians' total victory had its historical inevitability; one has to concede it. The statistical camp does hold many deep-rooted prejudices against traditional rule systems, and many popular sweeping verdicts that do not survive scrutiny (to be argued later, grievance by grievance), but the enormous achievements and benefits of machine learning are plain to see and everywhere: machine translation, speech recognition/synthesis, search ranking, spam filtering, document classification, automatic summarization, knowledge acquisition, you name it. One could even put it this way, a bit extremely: the successes of rule systems always look like isolated cases, craft experience, coincidence, the old Chinese-medicine doctor, providence and luck; whereas the successes of machine learning, even if they sometimes involve tricks, are on the whole the proper path of science, repeatable and reproducible at scale.

Success that cannot be replicated is like Chinese cooking: with the same ingredients and the same recipe, different chefs produce completely different flavors. That destines Chinese cuisine, though it spans the globe and wins over the most exacting gourmets and countless overseas fans, to be a trade crowded with pretenders, and never to produce a behemoth like McDonald's. Statistical NLP and machine learning are exactly that McDonald's: the taste is monotonous, even junk, but it absolutely fills you up when you are hungry, and above all there is no drama, no wild swings. In any corner of the world, it is the product of the same assembly line, identical in taste and quality.

If I cannot be mainstream, then let me be a master chef. Being a first-rate chef feels fine too; in the end the system has the final say. Deng Xiaoping really was clever with his "white cat, black cat" line; without it, leftovers of the old dynasty like us might as well go bang our heads against a wall.

Take just the past decade or so: I have persisted in multi-level deep parsing to support all kinds of NLP applications. Watching the statisticians pursue simplicity and shallow processing of massive data, I thought: no wonder on certain tasks, however fast and robust your results, the quality always gets stuck at a ceiling it cannot cross. Viewed from the conceptual height of "artificial intelligence", shallow learning and deep parsing are simply not on the same level, however "scientific" you may be. But had I said this to the statisticians at the time, nobody would have listened; there was no arguing it, because at bottom they despised or ignored linguists, and the chemistry for an equal dialogue simply did not exist. In the end they worked it out for themselves: multi-layer deep learning fell from the sky, hailed as a miracle, and seemingly overnight it came to dominate machine learning, with everyone flocking to it. Many marvel at it, many congratulate themselves on it, arguing that learning layer by layer, ever deeper, is a revolutionary breakthrough that naturally lifts quality by a wide margin. I think to myself: I saw this truth plainly more than a decade ago; different roads, same destination after all. Before deep learning swept the world, a kindred old friend once commented:

To me, Dr. Li is essentially the only one who actually builds true industrial NLP systems with deep parsing. While the whole world is praised with heavy statistics on shallow linguistics, Dr. Li proved with excellent system performances such a simple truth: deep parsing is useful and doable in large scale real world applications.

My prediction: in another twenty years or so (don't they say fortune's wheel turns every twenty years?), the mainstream's prejudices will be partly corrected. It will not be a returning spring for rules and knowledge, but a reasonably harmonious cooperation between statistics and rules. The religious enmity and disparagement will gradually fade. Amitabha!

【Related posts】
[Wei Li's jottings: The century's tragedy of the humanities scholar and the science geek (romance tragedy)]
ZT: 2013 breakthrough technologies: "Deep Learning"
[Pinned: An index of Wei Li's NLP posts on his ScienceNet blog (periodically updated)]