科学网

Tag: machine — related blog posts

RL Resource Websites (2.27.2020)
ChenChengHUST 2020-2-27 19:44
Python: https://blog.csdn.net/u014119694/article/details/76095796
RL: http://incompleteideas.net/book/code/code2nd.html
RL video lectures on Bilibili: https://www.bilibili.com/video/av32149008?p=1
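Alongside these links, here is a tiny, self-contained sketch of the ε-greedy k-armed bandit, typically one of the first algorithms covered in RL courses. It is my own illustration, not taken from the linked code (the code2nd.html page appears to be the code companion to Sutton and Barto's textbook):

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1):
    """Illustrative epsilon-greedy action selection on a k-armed bandit."""
    k = len(true_means)
    q = [0.0] * k          # estimated value of each arm
    n = [0] * k            # pull counts per arm
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                 # explore
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit current best estimate
        reward = random.gauss(true_means[a], 1.0)   # noisy reward from arm a
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental mean update
        total_reward += reward
    return q, total_reward

if __name__ == "__main__":
    estimates, total = epsilon_greedy_bandit([0.2, 0.8, 0.5])
    print(estimates, total)
```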
URLs for Common Machine Learning Methods (First Collection)
zlyang 2019-9-28 19:06
URLs for Common Machine Learning Methods (First Collection)

(1) 2016-09-26, Learn scikit-learn's basic regression methods (linear, decision tree, SVM, KNN) and ensemble methods (random forest, AdaBoost and GBRT) in 30 minutes: https://blog.csdn.net/u010900574/article/details/52666291 — Surprisingly, KNN, the least computationally efficient of these algorithms, gave the best results there.
(2) 2018-12-26, Several machine learning methods (KNN, logistic regression, SVM, decision tree, random forest, extremely randomized trees, ensemble learning, AdaBoost, GBDT): https://blog.csdn.net/fanzonghao/article/details/85260775
(3) 2018-02-13, Machine Learning: ten classic machine learning algorithms: https://zhuanlan.zhihu.com/p/33794257
(4) 2018-12-28, Machine learning (ML): https://easyai.tech/ai-definition/machine-learning/ — 15 classic machine learning algorithms and their training paradigms:
- Linear regression — supervised learning
- Logistic regression — supervised learning
- Linear discriminant analysis — supervised learning
- Decision tree — supervised learning
- Naive Bayes — supervised learning
- K-nearest neighbors — supervised learning
- Learning vector quantization — supervised learning
- Support vector machine — supervised learning
- Random forest — supervised learning
- AdaBoost — supervised learning
- Gaussian mixture model — unsupervised learning
- Restricted Boltzmann machine — unsupervised learning
- K-means clustering — unsupervised learning
- Expectation-maximization (EM) — unsupervised learning
(5) 2017-06-01, Deep learning notes — sentence-pair matching based on traditional machine learning algorithms (LR, SVM, GBDT, random forest): https://blog.csdn.net/mpk_no1/article/details/72836042 — In terms of accuracy, random forest performed best; in terms of time, SVM took the longest.
(6) 2016-08-04, A comparison of 8 common machine learning algorithms: https://www.leiphone.com/news/201608/WosBbsYqyfwcDNa4.html — Usually: [GBDT = SVM = RF = Adaboost = Other…]
(7) 2016-07-21, Application scenarios for different classification methods (logistic regression, support vector machine, random forest, GBT, deep learning): https://www.quora.com/What-are-the-advantages-of-different-classification-algorithms
(8) 2018-12-10, A comparison of LR, SVM, RF, GBDT, XGBoost and LightGBM: https://www.cnblogs.com/x739400043/p/10098659.html
(9) 2018-03-07, Machine Learning (36) — XGBoost, LightGBM, Parameter Server: https://antkillerfarm.github.io/ml/2018/03/07/Machine_Learning_36.html
(10) 2019-07-29, Deep Learning (37) — CenterNet, Anchor-Free, NN Quantization: https://blog.csdn.net/antkillerfarm/article/details/97623139
(11) 2019-05-13, How should one get started with machine learning?: https://www.zhihu.com/question/20691338
(12) 2018-12-28, What is the relationship among machine learning, artificial intelligence and deep learning?: https://easyai.tech/ai-definition/machine-learning/
(13) 2019-09-27, Machine learning (Zhihu topic): https://www.zhihu.com/topic/19559450/hot ; 2019-09-26, ALBERT: A Lite BERT for Self-supervised Learning of Language Representations: https://arxiv.org/abs/1909.11942
(14) 2016, Machine-Learning: https://github.com/JustFollowUs/Machine-Learning
(15) Bradley Efron, Trevor Hastie, 2016-08, Computer Age Statistical Inference: Algorithms, Evidence and Data Science: https://web.stanford.edu/~hastie/CASI/
(16) 2019-06-05, Academician Zhang Bo: AI technology has entered its third generation: https://blog.csdn.net/cf2SudS8x8F0v/article/details/90986936

Related links:
2016-09-01, Support Vector Machine program URLs: http://blog.sciencenet.cn/blog-107667-1000087.html
2016-09-01, Crosswavelet and Wavelet Coherence: program URLs for wavelet analysis: http://blog.sciencenet.cn/blog-107667-1000091.html
2019-07-27, Weibull Distribution: a collection of resource pages: http://blog.sciencenet.cn/blog-107667-1191323.html
2016-09-01, Extreme Learning Machines (ELM) program URLs: http://blog.sciencenet.cn/blog-107667-1000094.html
2019-09-27, Extreme Value Distribution: related pages: http://blog.sciencenet.cn/blog-107667-1199726.html
2019-09-22, Fuzzy mathematics: Zadeh's "defuzzification" and Kalman's "fuzzification" (a collection of blog pages): http://blog.sciencenet.cn/blog-107667-1199064.html
2018-08-26, A collection of MATLAB programs (URLs) for estimating the largest Lyapunov exponent: http://blog.sciencenet.cn/blog-107667-1131215.html
2018-08-18, In the era of "big data", "small-sample mathematical statistics" is needed all the more: http://blog.sciencenet.cn/blog-107667-1129894.html
2017-07-11, Bradley Efron: 2005 U.S. National Medal of Science laureate (statistics): http://blog.sciencenet.cn/blog-107667-1065714.html

Thank you for your advice! Thank you for pointing out any errors above! Thank you for providing more related material!
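As a companion to item (1), here is a minimal scikit-learn sketch that fits the regression and ensemble methods it mentions and compares them on held-out data. The synthetic dataset and default parameters are my own illustrative choices, not taken from the linked post, and the resulting ranking will vary with the data:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor

# Synthetic regression data, used only for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "SVM (SVR)": SVR(),
    "KNN": KNeighborsRegressor(),
    "random forest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "GBRT": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # R^2 on held-out data; which method "wins" depends entirely on the dataset
    print(f"{name}: R^2 = {model.score(X_test, y_test):.3f}")
```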
[Repost] An Inquiry into the Research Progress of Electrospinning Technology
QingziNano 2018-4-25 16:34
An Inquiry into the Research Progress of Electrospinning Technology

Foshan Lepton Precision Measurement And Control Technology Co., Ltd., 528225

Summary: Electrospinning is currently a method for directly and continuously manufacturing polymer nanofibers. It offers advantages such as a simple process, convenient operation and fast production, and it is widely applied in medical and environmental areas. This article reviews electrospinning technology and the research progress of its applications in recent years, summarizes the theory of electrospinning and its influencing factors, and discusses the prospects for future applications of the technology.

Keywords: electrospinning; nanofiber; progress

Foreword: In the strict sense, a nanofiber is an ultrafine fiber with a diameter of less than 100 nm. Its characteristic features are a high specific surface area and high porosity, so it can be widely applied in areas such as high-efficiency filter materials, biological materials, high-precision instruments, protective materials and nanocomposites. In the 1990s, research on nanotechnology heated up, and nanofiber manufacture quickly became a research hot spot. Electrospinning of polymer nanofibers requires only simple equipment and is easy to operate; to date it remains one of the most important methods for manufacturing polymer nanofibers.

1. Electrospinning

A typical electrospinning setup is shown in Diagram 1. It consists mainly of three parts: a high-voltage power supply, a nozzle and a fiber collection device. A direct-current supply is usually adopted rather than an alternating-current supply, and electrospinning requires a high voltage of about 1-30 kV. A syringe (or pipette) delivers the solution or melt to the nozzle at its end. The nozzle is a very thin metal tube carrying an electrode. The collection device, or collection plate, collects the nanofibers; by changing its geometry and shape, the arrangement of the collected nanofibers can be changed.

2. Electrospinning Theory

As early as 1882, Rayleigh found that a charged liquid drop is unstable in an electric field: after entering the field, the electric force tends to split the drop into smaller droplets. Taylor's research showed that when a charged drop enters the electric field through the nozzle, under the combined action of the electric force and the surface tension of the liquid, the drop is gradually stretched into a cone (the Taylor cone) with a 49.3° angle. In electrospinning, the polymer solution or melt is extruded to the nozzle and, under the action of the electric force and surface tension, a Taylor cone forms at the nozzle. As spinning solution is pushed into the electric field, a jet sprays out from the tip of the Taylor cone and is continuously stretched by the electric force. When the jet has been stretched to a certain extent, it overcomes the surface tension, bends unstably, and is stretched and split into thinner jets. At this point the jet's specific surface area increases rapidly, so the solvent evaporates quickly.
Finally, the jets are collected on the collection device, where they solidify into a nonwoven fiber mat.

3. The Influencing Factors of Electrospinning

The factors influencing electrospinning mainly include solution properties (such as viscosity, concentration, molecular weight distribution, conductivity, dielectric constant and surface tension), process conditions (such as voltage, feed rate, the distance between the nozzle and the collection device, and nozzle diameter) and environmental factors (such as temperature, humidity and gas flow rate). Many researchers have studied these factors. The existing results show that the main process parameters influencing fiber properties are the polymer solution concentration, the spinning voltage, the solidification distance (the distance between the nozzle and the collection device), the solvent volatility and the extrusion rate.

(1) Polymer solution concentration. The higher the polymer solution concentration, the higher the viscosity and surface tension. After the drop leaves the nozzle, its ability to split decreases as the surface tension increases. Usually, with other conditions unchanged, the fiber diameter increases as the solution concentration increases.

(2) Spinning voltage. As the voltage applied to the polymer solution increases, the electrostatic force in the system increases, the splitting ability of the drop increases, and the fiber diameter decreases.

(3) Solidification distance. After spraying out from the nozzle, the polymer drop loses solvent in the air, concentrates and solidifies into a fiber, which is finally collected on the collection device. The influence of the solidification distance on fiber diameter differs from system to system: research on the PS/THF system showed that changing the solidification distance has no obvious effect on fiber diameter, whereas for the PAN/N,N-DMF system the fiber diameter decreases as the collection distance increases.

(4) Solvent. As in conventional solution spinning, the solvent properties strongly influence the formation, structure and properties of electrospun fibers; the volatility of the solvent is particularly important for the fiber morphology.

4. Applications of Electrospinning Technology

With the development of nanotechnology, electrospinning, as a simple and effective new process for manufacturing nanofibers, will play a significant role in biomedical materials, filtration and protection, catalysis, energy, optoelectronics, food engineering, cosmetics and other areas.

① In biomedicine, the nanofiber diameter is smaller than a cell, so nanofibers can mimic the structure and biological function of the natural extracellular matrix; the form and structure of many human tissues and organs are also similar to nanofibers, which makes nanofibers promising for the repair of tissues and organs. Some electrospun materials have good biocompatibility and degradability, can serve as carriers into the human body and are easily absorbed. In addition, electrospun nanofibers offer high specific surface area, high porosity and other desirable features. They have therefore attracted continuing attention from researchers in biomedicine and have been applied successfully to drug delivery, wound repair, tissue engineering and other areas.

② The filtration efficiency of a fibrous filter material improves as the fiber diameter decreases.
Decreasing the fiber diameter is therefore an effective way to improve the filtration quality of fibrous filter materials. Electrospun fibers have many advantages, such as small diameter, small pore size, high porosity and good fiber uniformity, which give them great application potential in air filtration, liquid filtration and personal protection.

③ Electrospinning can effectively regulate and control the fine structure of fibers; combined with low-surface-energy materials, it can produce superhydrophobic materials, which are promising for ship hulls, the inner walls of petroleum pipelines, glass on high-rise buildings, automotive glass, and so on. However, before electrospun fiber materials can reach these self-cleaning applications, the strength and abrasion resistance of the fiber membrane, and its bonding strength to the substrate, must be improved.

④ Catalyst particles with nanostructure tend to aggregate, which reduces their dispersion and utilization. Electrospun fiber materials can therefore serve as templates on which the particles are dispersed uniformly. At the same time, the flexibility and ease of handling of the polymer carrier can be exploited, and combining the catalytic material with the polymer surface at the micro/nano scale can produce a strong synergistic effect that improves catalytic performance.

⑤ Electrospun nanofibers have very high specific surface area and porosity, which enlarge the active contact area between the sensing material and the analyte and are expected to improve sensor performance substantially. Electrospun nanofibers can also be applied in many other areas such as energy, optoelectronics and food engineering.

5. The Technological Progress of Electrospinning

5.1 Improvements of the Electrospinning Method

(1) Combination electrospinning. In 2003, Philipps University in Germany and Zussman's group in Israel jointly developed combination (co-)electrospinning. This technique uses two solutions and two nozzles: a compound droplet forms at the tip of the nozzles and produces a jet, into which the inner liquid is also drawn. The droplet is therefore difficult to control, but when it is controlled well, core-shell fibers and hollow fibers can be produced.

(2) The development of TUFT. TUFT is short for tubular fiber template. A polymer is first electrospun into nanofibers; another polymer, a metal or a ceramic is then deposited onto the nanofibers, and the original polymer is removed, leaving hollow tubes. Composite layers can also be built up to make nano capacitors. For example, coating palladium with a polymer gives nanocables with a conductive core and an insulating sheath; depositing aluminum onto the polymer gives aluminum tubes, and depositing chromium gives chromium tubes.

(3) Composite nozzles. Electrospinning basically uses a nozzle as the spinning outlet, and the University of Shiga Prefecture in Japan developed a composite (multi-needle) nozzle, which is indispensable for the continuous manufacture of nanofiber nonwoven fabric. When the spacing between nozzles is large both horizontally and vertically, the influence of electrostatic repulsion decreases, so the nozzles are usually arranged with a spacing of about 10 mm horizontally and 50 mm vertically. Each nozzle is a stainless steel tube 0.5 mm in diameter, and chemically resistant fluororubber hoses deliver the solution to each steel tube.
Each steel tube is inserted into a hole in a copper tube, and the high voltage is applied to the copper tube; the stainless steel tubes therefore need to be connected to the copper tube firmly yet remain detachable. At present the nozzles are arranged in a line.

5.2 The Development of M-ESP (Melt Electrospinning)

F. Ko connected the nozzle of an extruder to ground, applied high voltage to the collector, and electrostatically jet-spun PP. With this device, however, fibers with an average diameter below 1 μm could not be obtained, and the fibers on the collector cannot be removed while the high voltage is on, which is a problem for industrialization. Warner wound a plastic pipe around a PP-filled syringe and circulated a heat-transfer medium to produce the melt; with the spinning cabinet heated and high voltage applied between the syringe nozzle and the collector, nanofibers were obtained for the first time. Joo placed PLA in a syringe and built a device that controls the syringe temperature, the spinning temperature and the collector temperature, and successfully produced PLA nanofibers. The devices above prepare the polymer melt in a container and mount the nozzle on the container; this merely replaces the solution of S-ESP (solution electrospinning) with a melt, so the method is an extension of S-ESP. The University of Fukui in Japan developed a device that irradiates a polymer rod with a laser from a distance, melts part of it, and applies high voltage to the melt. Its working principle is to feed a polymer rod (diameter below 1 mm) into the melting zone at a constant speed (about 0.2 mm/s), heat its front end simultaneously from three directions with a carbon dioxide laser so that the rod melts uniformly and locally, and apply high voltage to the polymer melt; with the spinning zone heated, a fiber is drawn out by electrostatic traction. The laser-irradiated region is spindle-shaped, and a fiber is generated from its lower end (Figure 2). Trials with various polymers showed that in every case a single fiber forms from the melted region, and the fibers collected on the collector are less than 1 μm in diameter. The features of this device are: because a laser heats the melt, local heating is achieved in an instant and the energy loss is small; because the heating is non-contact, the requirements on the device are not high; and because no spinneret plate is used, polymer chips with high melting points can also be spun.

6. Concluding Remarks

At present, electrospun nanofiber technology is still in its infancy, but its broad application prospects are already visible, and it is expected to generate billions in market value in the future. Researchers will continue to overcome the technical difficulties in each application area of nanofibers, and each advance may drive the scientific and technological progress of the whole spinning industry. Of course, this progress also requires close cooperation across many fields, such as spinning technology, chemical technology, biology, polymer science and materials science.

Source: China Journal Net. Writer: Jiezhuang Guo, Foshan Lepton Precision Measurement And Control Technology Co., Ltd.
Warts (taking a "translation dog" along for the ride)
zuojun 2017-9-8 14:40
Chinese source: 疣是人类乳头瘤病毒(HPV)所引起,以往认为这些疾病是慢性良性疾病,但发现HPV感染后有一部分会导致恶性肿瘤,如皮肤癌、舌癌和宫颈癌等,因而引起人们的重视。

Machine used: SDL, https://www.freetranslation.com/en/translate-english-chinese

SDL: It is a human papilomavirus (HPV), previously thought to be caused by these diseases are chronic diseases, but found benign HPV infection is a part of the cause of malignant tumors, such as skin cancer, Tongue cancer such as cervical and breast cancer, and this has aroused the attention of the people. It is caused by the virus to cells of a primary response accretion of leather superficial table of benign. Infected about potential four months or so. More likely to young people.

Human: This is caused by human papilloma virus (HPV). Previously, it was thought that these diseases were chronic, but it has been found that some benign HPV could cause malignant tumors, such as skin cancer, tongue cancer, cervical cancer, etc. This has aroused attention to these diseases.
On Hand-crafted Myth of Knowledge Bottleneck
liwei999 2016-8-8 04:39
In my article "Pride and Prejudice of Main Stream", the first myth listed among the top 10 misconceptions in NLP is as follows: a rule-based system faces a knowledge bottleneck of hand-crafted development while a machine learning system involves automatic training (implying no knowledge bottleneck). While there are numerous misconceptions about the old school of rule systems, this hand-crafted myth can be regarded as the source of them all. Just review the NLP papers: no matter what language phenomena are being discussed, it is almost a cliché to cite a couple of old-school works to demonstrate the superiority of machine learning algorithms, and the reason for the attack needs only one sentence, to the effect that the hand-crafted rules lead to a system "difficult to develop" (or "difficult to scale up", "with low efficiency", "lacking robustness", etc.), or simply rejecting it like this: "the literature [...] has tried to handle the problem in different aspects, but these systems are all hand-crafted". Once labeled with hand-crafting, one does not even need to discuss the effect and quality. Hand-crafting becomes the rule system's "original sin", and the linguists crafting rules therefore become the community's second-class citizens bearing that sin.

So what is wrong with hand-crafting or coding linguistic rules for computer processing of languages? NLP development is software engineering. From a software engineering perspective, hand-crafting is programming while machine learning belongs to automatic programming. Unless we assume that natural language is a special object whose processing can all be handled by systems automatically programmed or learned by machine learning algorithms, it does not make sense to reject or belittle the practice of coding linguistic rules for developing an NLP system.

For consumer products and the arts, hand-craft is definitely a positive word: it represents quality or uniqueness and high value, a legitimate reason for a good price. Why does it become a derogatory term in NLP? The root cause is that in the field of NLP, almost as if some collective hypnosis had hit the community, people are intentionally or unintentionally led to believe that machine learning is the only correct choice. In other words, by criticizing, rejecting or disregarding hand-crafted rule systems, the underlying assumption is that machine learning is a panacea, universal and effective, always a preferred approach over the other school.

The fact of life is that, in the face of the complexity of natural language, machine learning from data so far only surfaces the tip of the iceberg of the language monster (called low-hanging fruit by Church in K. Church: A Pendulum Swung Too Far), far from reaching the goal of a complete solution to language understanding and applications. There is no basis for the claim that machine learning alone can solve all language problems, nor is there any evidence that machine learning necessarily leads to better quality than coding rules by domain specialists (e.g. computational grammarians). Depending on the nature and depth of the NLP tasks, hand-crafted systems actually have more chances of performing better than machine learning, at least for non-trivial and deep-level NLP tasks such as parsing, sentiment analysis and information extraction (we have tried and compared both approaches).
In fact, the only major reason why they are still there, having survived all the rejections from mainstream and still playing a role in industrial practical applications, is the superior data quality, for otherwise they cannot have been justified for industrial investments at all. the “forgotten” school: why is it still there? what does it have to offer? The key is the excellent data quality as advantage of a hand-crafted system, not only for precision, but high recall is achievable as well. quote from On Recall of Grammar Engineering Systems In the real world, NLP is applied research which eventually must land on the engineering of language applications where the results and quality are evaluated. As an industry, software engineering has attracted many ingenious coding masters, each and every one of them gets recognized for their coding skills, including algorithm design and implementation expertise, which are hand-crafting by nature. Have we ever heard of a star engineer gets criticized for his (manual) programming? With NLP application also as part of software engineering, why should computational linguists coding linguistic rules receive so much criticism while engineers coding other applications get recognized for their hard work? Is it because the NLP application is simpler than other applications? On the contrary, many applications of natural language are more complex and difficult than other types of applications (e.g. graphics software, or word processing apps). The likely reason to explain the different treatment between a general purpose programmer and a linguist knowledge engineer is that the big environment of software engineering does not involve as much prejudice while the small environment of NLP domain is deeply biased, with belief that the automatic programming of an NLP system by machine learning can replace and outperform manual coding for all language projects. For software engineering in general, (manual) programming is the norm and no one believes that programmers’ jobs can be replaced by automatic programming in any time foreseeable. Automatic programming, a concept not rare in science fiction for visions like machines making machines, is currently only a research area, for very restricted low-level functions. Rather than placing hope on automatic programming, software engineering as an industry has seen a significant progress on work of the development infrastructures, such as development environment and a rich library of functions to support efficient coding and debugging. Maybe in the future one day, applications can use more and more of automated code to achieve simple modules, but the full automation of constructing any complex software project is nowhere in sight. By any standards, natural language parsing and understanding (beyond shallow level tasks such as classification, clustering or tagging) is a type of complex tasks. Therefore, it is hard to expect machine learning as a manifestation of automatic programming to miraculously replace the manual code for all language applications. The application value of hand-crafting a rule system will continue to exist and evolve for a long time, disregarded or not. “Automatic” is a fancy word. What a beautiful world it would be if all artificial intelligence and natural languages tasks could be accomplished by automatic machine learning from data. There is, naturally, a high expectation and regard for machine learning breakthrough to help realize this dream of mankind. 
All this should encourage machine learning experts to continue to innovate to demonstrate its potential, and should not be a reason for the pride and prejudice against a competitive school or other approaches. Before we embark on further discussions on the so-called rule system’s knowledge bottleneck defect, it is worth mentioning that the word “automatic” refers to the system development, not to be confused with running the system. At the application level, whether it is a machine-learned system or a manual system coded by domain programmers (linguists), the system is always run fully automatically, with no human interference. Although this is an obvious fact for both types of systems, I have seen people get confused so to equate hand-crafted NLP system with manual or semi-automatic applications. Is hand-crafting rules a knowledge bottleneck for its development? Yes, there is no denying or a need to deny that. The bottleneck is reflected in the system development cycle. But keep in mind that this “bottleneck” is common to all large software engineering projects, it is a resources cost, not only introduced by NLP. From this perspective, the knowledge bottleneck argument against hand-crafted system cannot really stand, unless it can be proved that machine learning can do all NLP equally well, free of knowledge bottleneck: it might be not far from truth for some special low-level tasks, e.g. document classification and word clustering, but is definitely misleading or incorrect for NLP in general, a point to be discussed below in details shortly. Here are the ballpark estimates based on our decades of NLP practice and experiences. For shallow level NLP tasks (such as Named Entity tagging, Chinese segmentation), a rule approach needs at least three months of one linguist coding and debugging the rules, supported by at least half an engineer for tools support and platform maintenance, in order to come up with a decent system for initial release and running. As for deep NLP tasks (such as deep parsing, deep sentiments beyond thumbs-up and thumbs-down classification), one should not expect a working engine to be built up without due resources that at least involve one computational linguist coding rules for one year, coupled with half an engineer for platform and tools support and half an engineer for independent QA (quality assurance) support. Of course, the labor resources requirements vary according to the quality of the developers (especially the linguistic expertise of the knowledge engineers) and how well the infrastructures and development environment support linguistic development. Also, the above estimates have not included the general costs, as applied to all software applications, e.g. the GUI development at app level and operations in running the developed engines. Let us present the scene of the modern day rule-based system development. A hand-crafted NLP rule system is based on compiled computational grammars which are nowadays often architected as an integrated pipeline of different modules from shallow processing up to deep processing. A grammar is a set of linguistic rules encoded in some formalism, which is the core of a module intended to achieve a defined function in language processing, e.g. a module for shallow parsing may target noun phrase (NP) as its object for identification and chunking . What happens in grammar engineering is not much different from other software engineering projects. 
As knowledge engineer, a computational linguist codes a rule in an NLP-specific language, based on a development corpus. The development is data-driven, each line of rule code goes through rigid unit tests and then regression tests before it is submitted as part of the updated system for independent QA to test and feedback. The development is an iterative process and cycle where incremental enhancements on bug reports from QA and/or from the field (customers) serve as a necessary input and step towards better data quality over time. Depending on the design of the architect, there are all types of information available for the linguist developer to use in crafting a rule’s conditions, e.g. a rule can check any elements of a pattern by enforcing conditions on (i) word or stem itself (i.e. string literal, in cases of capturing, say, idiomatic expressions), and/or (ii) POS (part-of-speech, such as noun, adjective, verb, preposition), (iii) and/or orthography features (e.g. initial upper case, mixed case, token with digits and dots), and/or (iv) morphology features (e.g. tense, aspect, person, number, case, etc. decoded by a previous morphology module), (v) and/or syntactic features (e.g. verb subcategory features such as intransitive, transitive, ditransitive), (vi) and/or lexical semantic features (e.g. human, animal, furniture, food, school, time, location, color, emotion). There are almost infinite combinations of such conditions that can be enforced in rules’ patterns. A linguist’s job is to code such conditions to maximize the benefits of capturing the target language phenomena, a balancing art in engineering through a process of trial and error. Macroscopically speaking, the rule hand-crafting process is in its essence the same as programmers coding an application, only that linguists usually use a different, very high-level NLP-specific language, in a chosen or designed formalism appropriate for modeling natural language and framework on a platform that is geared towards facilitating NLP work. Hard-coding NLP in a general purpose language like Java is not impossible for prototyping or a toy system. But as natural language is known to be a complex monster, its processing calls for a special formalism (some form or extension of Chomsky’s formal language types) and an NLP-oriented language to help implement any non-toy systems that scale. So linguists are trained on the scene of development to be knowledge programmers in hand-crafting linguistic rules. In terms of different levels of languages used for coding, to an extent, it is similar to the contrast between programmers in old days and the modern software engineers today who use so-called high-level languages like Java or C to code. Decades ago, programmers had to use assembly or machine language to code a function. The process and workflow for hand-crafting linguistic rules are just like any software engineers in their daily coding practice, except that the language designed for linguists is so high-level that linguistic developers can concentrate on linguistic challenges without having to worry about low-level technical details of memory allocation, garbage collection or pure code optimization for efficiency, which are taken care of by the NLP platform itself. Everything else follows software development norms to ensure the development stay on track, including unit testing, baselines construction and monitoring, regressions testing, independent QA, code reviews for rules’ quality, etc. 
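To make the condition-checking described above concrete, here is a purely illustrative Python sketch of a pattern rule that conditions on POS and orthography to chunk proper-noun NPs. The feature names and the toy rule are invented for illustration; this is not the proprietary NLP-specific formalism or platform discussed in this post:

```python
# Each token is a dict of features a rule may condition on; the feature
# names here are hypothetical stand-ins for the kinds listed above.
tokens = [
    {"word": "Barack",  "pos": "NNP", "orth": "init_upper", "sem": "human"},
    {"word": "Obama",   "pos": "NNP", "orth": "init_upper", "sem": "human"},
    {"word": "visited", "pos": "VBD", "orth": "lower",      "sem": None},
    {"word": "Paris",   "pos": "NNP", "orth": "init_upper", "sem": "location"},
]

def np_chunk_rule(window):
    """Toy rule: one or more capitalized proper nouns form an NP chunk."""
    return all(t["pos"] == "NNP" and t["orth"] == "init_upper" for t in window)

# Scan for maximal spans satisfying the rule, akin to shallow NP chunking.
chunks, i = [], 0
while i < len(tokens):
    j = i
    while j < len(tokens) and np_chunk_rule(tokens[i:j + 1]):
        j += 1
    if j > i:
        chunks.append(" ".join(t["word"] for t in tokens[i:j]))
        i = j
    else:
        i += 1

print(chunks)  # ['Barack Obama', 'Paris']
```

A real rule would of course also reference morphology, subcategorization and lexical-semantic features, and would be written in the high-level formalism described above rather than in general-purpose code.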
Each level language has its own star engineer who masters the coding skills. It sounds ridiculous to respect software engineers while belittling linguistic engineers only because the latter are hand-crafting linguistic code as knowledge resources. The chief architect in this context plays the key role in building a real life robust NLP system that scales. To deep-parse or process natural language, he/she needs to define and design the formalism and language with the necessary extensions, the related data structures, system architecture with the interaction of different levels of linguistic modules in mind (e.g. morpho-syntactic interface), workflow that integrate all components for internal coordination (including patching and handling interdependency and error propagation) and the external coordination with other modules or sub-systems including machine learning or off-shelf tools when needed or felt beneficial. He also needs to ensure efficient development environment and to train new linguists into effective linguistic “coders” with engineering sense following software development norms (knowledge engineers are not trained by schools today). Unlike the mainstream machine learning systems which are by nature robust and scalable, hand-crafted systems’ robustness and scalability depend largely on the design and deep skills of the architect. The architect defines the NLP platform with specs for its core engine compiler and runner, plus the debugger in a friendly development environment. He must also work with product managers to turn their requirements into operational specs for linguistic development, in a process we call semantic grounding to applications from linguistic processing. The success of a large NLP system based on hand-crafted rules is never a simple accumulation of linguistics resources such as computational lexicons and grammars using a fixed formalism (e.g. CFG) and algorithm (e.g. chart-parsing). It calls for seasoned language engineering masters as architects for the system design. Given the scene of practice for NLP development as describe above, it should be clear that the negative sentiment association with “hand-crafting” is unjustifiable and inappropriate. The only remaining argument against coding rules by hands comes down to the hard work and costs associated with hand-crafted approach, so-called knowledge bottleneck in the rule-based systems. If things can be learned by a machine without cost, why bother using costly linguistic labor? Sounds like a reasonable argument until we examine this closely. First, for this argument to stand, we need proof that machine learning indeed does not incur costs and has no or very little knowledge bottleneck. Second, for this argument to withstand scrutiny, we should be convinced that machine learning can reach the same or better quality than hand-crafted rule approach. Unfortunately, neither of these necessarily hold true. Let us study them one by one. As is known to all, any non-trivial NLP task is by nature based on linguistic knowledge, irrespective of what form the knowledge is learned or encoded. Knowledge needs to be formalized in some form to support NLP, and machine learning is by no means immune to this knowledge resources requirement. 
In rule-based systems, the knowledge is directly hand-coded by linguists and in case of (supervised) machine learning, knowledge resources take the form of labeled data for the learning algorithm to learn from (indeed, there is so-called unsupervised learning which needs no labeled data and is supposed to learn from raw data, but that is research-oriented and hardly practical for any non-trivial NLP, so we leave it aside for now). Although the learning process is automatic, the feature design, the learning algorithm implementation, debugging and fine-tuning are all manual, in addition to the requirement of manual labeling a large training corpus in advance (unless there is an existing labeled corpus available, which is rare; but machine translation is a nice exception as it has the benefit of using existing human translation as labeled aligned corpora for training). The labeling of data is a very tedious manual job. Note that the sparse data challenge represents the need of machine learning for a very large labeled corpus. So it is clear that knowledge bottleneck takes different forms, but it is equally applicable to both approaches. No machine can learn knowledge without costs, and it is incorrect to regard knowledge bottleneck as only a defect for the rule-based system. One may argue that rules require expert skilled labor, while the labeling of data only requires high school kids or college students with minimal training. So to do a fair comparison of the costs associated, we perhaps need to turn to Karl Marx whose “Das Kapital” has some formula to help convert simple labor to complex labor for exchange of equal value: for a given task with the same level of performance quality (assuming machine learning can reach the quality of professional expertise, which is not necessarily true), how much cheap labor needs to be used to label the required amount of training corpus to make it economically an advantage? Something like that. This varies from task to task and even from location to location (e.g. different minimal wage laws), of course. But the key point here is that knowledge bottleneck challenges both approaches and it is not the case believed by many that machine learning learns a system automatically with no or little cost attached. In fact, things are far more complicated than a simple yes or no in comparing the costs as costs need also to be calculated in a larger context of how many tasks need to be handled and how much underlying knowledge can be shared as reusable resources. We will leave it to a separate writing for the elaboration of the point that when put into the context of developing multiple NLP applications, the rule-based approach which shares the core engine of parsing demonstrates a significant saving on knowledge costs than machine learning. Let us step back and, for argument’s sake, accept that coding rules is indeed more costly than machine learning, so what? Like in any other commodities, hand-crafted products may indeed cost more, they also have better quality and value than products out of mass production. For otherwise a commodity society will leave no room for craftsmen and their products to survive. This is common sense, which also applies to NLP. If not for better quality, no investors will fund any teams that can be replaced by machine learning. 
What is surprising is that there are so many people, NLP experts included, who believe that machine learning necessarily performs better than hand-crafted systems not only in costs saved but also in quality achieved. While there are low-level NLP tasks such as speech processing and document classification which are not experts’ forte as we human have much more restricted memory than computers do, deep NLP involves much more linguistic expertise and design than a simple concept of learning from corpora to expect superior data quality. In summary, the hand-crafted rule defect is largely a misconception circling around wildly in NLP and reinforced by the mainstream, due to incomplete induction or ignorance of the scene of modern day rule development. It is based on the incorrect assumption that machine learning necessarily handles all NLP tasks with same or better quality but less or no knowledge bottleneck, in comparison with systems based on hand-crafted rules. Note: This is the author’s own translation, with adaptation, of part of our paper which originally appeared in Chinese in Communications of Chinese Computer Federation (CCCF), Issue 8, 2013 Domain portability myth in natural language processing Pride and Prejudice of NLP Main Stream K. Church: A Pendulum Swung Too Far , Linguistics issues in Language Technology, 2011; 6(5) Wintner 2009. What Science Underlies Natural Language Engineering? Computational Linguistics, Volume 35, Number 4 Pros and Cons of Two Approaches: Machine Learning vs Grammar Engineering Overview of Natural Language Processing Dr. Wei Li’s English Blog on NLP
Any system that claims to use machine learning for social media sentiment mining deserves skepticism
liwei999 2015-11-21 03:51
Any system that claims to mine public opinion and sentiment from social media with mainstream machine learning deserves skepticism. The basic state of affairs is that such systems are stretched too thin to be usable. The reason is obvious: machine learning breaks down in the face of social media dominated by short messages. Short messages simply do not offer data points of sufficient density (the so-called keyword density) for machine learning to work with. Even the cleverest cook cannot make a meal without rice; this limitation is built into the bag-of-words methodology, and no training set, however large, can overcome it. Without linguistic structure analysis, this is an insurmountable challenge.

I have articulated this point in various previous posts or blogs before, but the world is so dominated by the mainstream that it does not seem to carry far. So let me make it simple to understand: the sentiment classification approach based on the bag-of-words (BOW) model, so far the dominant approach in the mainstream for sentiment analysis, simply breaks in front of social media. The major reason is simple: social media posts are full of short messages which do not have the keyword density required by a classifier to make a proper sentiment decision. The precision ceiling for this line of work in real-life social media is found to be 60%, far behind the widely acknowledged precision minimum of 80% for a usable extraction system. Trusting a machine learning classifier to perform social media sentiment analysis is not much better than flipping a coin. So let us get this straight: from now on, whoever claims to use machine learning for social media mining of public opinions and sentiments is likely offering a trap (unless the system is verified to have involved parsing of linguistic structures or patterns, which so far has never been heard of in practical systems based on machine learning). Fancy visualizations may make the mining results look real and attractive, but they are just not trustworthy at all.

[Addendum] A friend sent me a screenshot from WeChat Moments, saying this post knocks over a whole boatload of people with one swing of the pole. But on this point there is really no way around it. Whether in Chinese or in Western languages, short messages make up the overwhelming majority of social media in the mobile era, and someone has to expose the truth behind social media big data mining. That BOW is helpless in the face of short messages is an indisputable fact; it will not suddenly start working in a setting unsuited to it just because it is the most convenient and available mainstream method and most people use it. What does not work does not work: this line of work cannot break the 60% precision ceiling and remains far from the widely acknowledged 80% usability threshold, and that is determined by the methodology.

Related Posts:
Pros and Cons of Two Approaches: Machine Learning and Grammar Engineering
Coarse-grained vs. fine-grained sentiment analysis
The significance of independent validation of sentiment mining systems (舆情挖掘系统独立验证的意义) 2015-11-22
What is a bag of words in NLP? (【立委科普:NLP 中的一袋子词是什么】) 2015-11-27
[Pinned: Overview of Wei Li's NLP blog posts on ScienceNet (periodically updated)] (【置顶:立委科学网博客NLP博文一览(定期更新版)】)
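As a rough illustration of the keyword-density point made above (a toy model with made-up data, not the author's system or any production classifier), note how little signal a bag-of-words classifier gets from a typical short post:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data: a few labeled posts (1 = positive, 0 = negative)
train_texts = [
    "love this phone, great battery and screen",
    "terrible service, never buying again",
    "awesome update, works great",
    "awful quality, broke in two days",
]
train_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = MultinomialNB().fit(X_train, train_labels)

# A typical short social media message: very few tokens, hence very few
# non-zero bag-of-words features for the classifier to base its decision on.
short_post = "meh"
X_test = vectorizer.transform([short_post])
print("non-zero features in this post:", X_test.nnz)  # often 0 for short posts
print("predicted label:", clf.predict(X_test)[0])     # essentially a guess
```

With no (or almost no) non-zero features, the prediction falls back on class priors, which is the coin-flip behavior described above.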
From the archives: Early arguments for a hybrid model for NLP and IE
liwei999 2015-10-25 01:01
January 2000

On Hybrid Model — Pre-Knowledge-Graph Profile Extraction Research via SBIR (3)

This section presents the feasibility study conducted in Phase I of the proposed hybrid model for Level-2 and Level-3 IE. The study is based on a literature review and supported by extensive experiments and a prototype implementation. The model complements corpus-based machine learning with hand-coded FST rules. The essential argument for this strategy is that by combining machine learning methods with an FST rule-based system, the system can exploit the best of both paradigms while overcoming their respective weaknesses. This approach was intended to meet the demand on the designed system of processing unrestricted real-life text.

2.2.1 Hybrid Approach

It was proposed that hand-crafted FST rules be combined with corpus-based learning in all major modules of Textract. More precisely, each module M consists of two sub-modules M1 and M2, i.e. an FST model and a trained model; the former serves as a preprocessor, as shown below.

M1: FST sub-module → M2: trained sub-module

The trained model M2 has two features: (i) adaptive training; (ii) structure-based training. In a pipeline architecture, the output of the previous module is the input of the succeeding module. If the succeeding module is a trained model, there are two types of training: adaptive and non-adaptive. In adaptive training, the input in the training phase is exactly the same as the input in the application phase; that is, the possibly imperfect output from the previous module is used as training input even though the previous module may make certain mistakes. This type of training "adapts" the model to imperfect input, so the trained model is more robust and makes the necessary adjustments. In contrast, naive non-adaptive training is often conducted on perfect, often artificial input, on the assumption that the previous module is continuously improving and will be able to provide near-perfect output for the next module.

There are pros and cons to both adaptive and non-adaptive methods. Non-adaptive training is suitable when the training time is significantly long, or when the previous module is simple and reaches high precision. An adaptively trained model, in contrast, has to be re-trained each time the previous module(s) undergo major changes; otherwise the performance will be seriously affected. This imposes stringent requirements on training time and algorithm efficiency. Since the machine learning tools Cymfony has developed in-house are very efficient, Textract can afford to adopt the more flexible training method using adaptive input.

Adaptive training provides the rationale for placing the FST model before the trained model. The FST sub-module M1 and the trained sub-module M2 can be developed independently; when the time comes to integrate M1 and M2 for better performance, it suffices to re-train M2 on the output of M1. The flexible adaptive training capabilities make this design viable, as verified in the prototype implementation of Textract 2.0/CE. In contrast, if M1 were placed after M2, the development of hand-crafted rules for M1 would have to wait until M2 is implemented; otherwise many rules might have to be rewritten and re-debugged, which is not desirable. A schematic sketch of this rules-first, adaptively trained pipeline is given below; the second issue, structure-based training, is taken up after it.
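A minimal, hypothetical sketch of the M1 → M2 arrangement with adaptive training follows. The module names, rules and data are invented for illustration and are not Cymfony's actual Textract code:

```python
import re

def m1_fst_rules(text):
    """M1 stand-in: a rule-based preprocessor; here a single regex pre-tags years."""
    return re.sub(r"\b(19|20)\d{2}\b", "<YEAR>", text)

class M2Trained:
    """M2 stand-in: a 'trained' module that simply memorizes labels for M1 output."""
    def fit(self, m1_outputs, labels):
        self.table = dict(zip(m1_outputs, labels))
        return self

    def predict(self, m1_output):
        return self.table.get(m1_output, "UNKNOWN")

# Adaptive training: M2 is trained on M1's actual (possibly imperfect) output,
# i.e. exactly the same form of input it will see in the application phase.
raw_train = ["hired in 2003", "founded in 1998"]
labels = ["EMPLOYMENT_EVENT", "ORG_EVENT"]
m2 = M2Trained().fit([m1_fst_rules(t) for t in raw_train], labels)

# Application phase: the same M1 -> M2 pipeline runs fully automatically.
print(m2.predict(m1_fst_rules("hired in 1997")))  # EMPLOYMENT_EVENT
```

If M1's rules change, re-running the fit step on M1's new output is all that is needed to re-integrate the two sub-modules, which is the point made above about re-training M2 on the output of M1.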
Natural language is structural by nature; any sophisticated high level IE can hardly be successful based on linear strings of tokens. In order to capture CE/GEphenomena, traditional n -gram training with a window size of n linear tokens is not sufficient. Sentences can be long where the related entities are far apart, not to mention the long distance phenomena in linguistics. Without structure based training, no matter how large the window size one chooses, generalized rules cannot be effectively learned. However, once the training is based on linguistic structures, the distance between the entities becomes tractable. In fact, as linguistic structures are hierarchical, we need to perform multi-level training in order to capture CE/GE. For CE, it has been found during the Phase I research that three levels of training are necessary. Each level of training should be supported by the corresponding natural language parser. The remainder of this section presents the feasibility study and arguments for the choice of an FST rule based system to complement the corpus-based machine learning models. 2.2.2 FST Grammars The most attractive feature of the FST formalism lies in its superior time and space efficiency. Applying FST basically depends linearly on the input size of the text. This is in contrast to the more pervasive formalism used in NLP, namely, Context Free Grammars. This theoretical time/space efficiency has been verified through the extensive use of Cymfony’s proprietary FST Toolkit in the following applications of Textract implementation: (i) tokenizer; (ii) FST-based rules for capturing NE; (iii) FST representation of lexicons (lexical transducers); (iv) experiments in FST local grammars for shallow parsing; and (v) local CE/GEgrammars in FST. For example, the Cymfony shallow parser has been benchmarked to process 460 MB of text an hour on a 450 MHz Pentium II PC running Windows NT. There is a natural combination of FST-based grammars and lexical approaches to natural language phenomena . In order for IE grammars/rules to perform well, the lexical approach must be employed. In fact, the NE/CE/GE grammars which have been developed in Phase I have demonstrated a need for the lexical approach. Take CE as an example. In order to capture a certain CE relationship, say affiliation , the corresponding rules need to check patterns involving specific verbs and/or prepositions, say work for/hired by , which denote this relationship in English. The GE grammar, which aims at decoding the key semantic relationships in the argument structure in stead of surface syntactic relationships, has also demonstrated the need to involve considerable level of lexical constraints. Efficiency is always an important consideration indeveloping a large-scale deployable software system. Efficiency is particularly required for lexical grammars since lexical grammars are usually too large for efficient processing using conventional, more powerful grammar formalisms (e.g. Context Free Grammar formalism). Cymfony is convinced through extensive experiments that the FST technology is an outstanding tool to tackle this efficiency problem. It was suggested that a set of cascaded FST grammars could simulate sophisticated natural language parsing. This use of FST application has already successfully been applied to the Textract shallow parsing and local CE/GE extraction. There are a number of success stories of FST-based rule systems in the field of IE. 
For example, the commercial NE system NetOwl relies heavily on FST pattern matching rules . SRI also applied a very efficient FST local grammar for the shallow parsing of basic noun phrases and verb groups in order to support IE tasks . More recently, Universite Paris VII/LADL has successfully applied FST technology to one specified information extraction/retrieval task; that system can extract information on-the-fly about one's occupation from huge amounts of free text. The system is able to answer questions which conventional retrieval systems cannot handle, e.g. W ho is the minister of culture in France? Finally, it has also been proven by many research programs such as , and INTEX , as well as Cymfony , that an FST-based rule system is extremely efficient. In addition, FST is a convenient tool for capturing linguistic phenomena, especially for idioms and semi-productive expressions that are abundant in natural languages. As Hobbs says, “languages in general are very productive in the construction of short, multiword fixed phrases and proper names employing specialized microgrammars”. However, a purely FST-based rule system suffers from the same disadvantage in knowledge acquisition as that for all handcrafted rule systems. After all, the FST rules or local grammars have to be encoded by human experts, imposing this traditional labor-intensive problem in developing large scale systems. The conclusion is that while FST overcomes a number of shortcomings of the traditional rule based system (in particular the efficiency problem), it does not relieve the dependence on highly skilled human labor. Therefore, approaches for automatic machine learning techniques are called for. 2.2.3 Machine Learning The appeal of corpus-based machine learning in language modeling lies mainly in its automatic training/learning capabilities,hence significantly reducing the cost for hand-coding rules. Compared with rule based systems, there are definite advantages of corpus-based learning: · automatic knowledge acquisition: results in fast development time since the system discovers regularity automatically when given sufficient correctly annotated data · robustness: since knowledge/rules are learned directly from corpus · acceptable speed : in general, there is little run-time processing; the knowledge/rules obtained in the training phase can be stored in efficient data structures for run-time lookup · portability : a domain shift only requires the truthing of new data; new knowledge/rules will be automatically learned with no need to change any part of the program or control BBN has recently implemented an integrated, fully trainable model, SIFT, applied to IE . This system performs the tasks of linguistic processing (POS tagging, syntactic parsing and semantic relationship identification), TE and TR as well as NE, all at once. They have reported 83.49% F-measures for TE and 71.23% F-measures for TR, a result close to those of the best systems in MUC-7. In addition, their successful experiment in making use of the Penn Treebank for training the initial syntactic parser significantly reduces the cost of human annotation. There is no doubt that their effort is a significant progress in this field. It demonstrates the state-of-the-art in applying grammar induction to Level-2 IE. However, there are two potentially serious problems with their approach. The first is the lack of efficiency in applying the model. As they acknowledge, the speed of the system is rather slow. 
In terms of efficiency, the CKY‑based parsing algorithm they use is not comparable with algorithms for formalisms based on the finite state scheme (e.g. FST, Viterbi for HMM). This limiting factor is due to the inherent nature of the learned grammar based on the CFG formalism. To overcome this problem, rule induction has been explored in the direction of learning FST style grammars for local CE/GEextraction instead of CFG. The second problem is with their integrated approach. Because everything is integrated in one process, it is extremely difficult to trace where a problem lies, making debugging difficult. It is believed that a much more secure way is to follow the conventional practice of modularizing the NLP/IE process in different tasks and sub-tasks, as Cymfony has proposed in the Textract architecture design: POS tagging, shallow parsing, co-referencing, full parsing, pragmatic filtering, NE, CE,GE. Along this line, it is easy to find directly whether a particular degradation in performance is due to poor support from co-referencing or from mistakes in shallow parsing, for example. Performance benchmarking can be measured for each module; efforts to improve the performance of each individual module will contribute to the improvement of the overall system performance. 2.2.4 Drawbacks of Corpus-based Learning The following drawbacks motivate the proposed idea of building a hybrid system/module, complementing the automatic corpus-based learning by handcrafted grammars in FST. · ‘Sparse data’ problem : this is recognized as a bottle-neck for all corpus-based models . Unfortunately, a practical solution to this problem (e.g. smoothing or back-off techniques) often results in a model much less sophisticated than traditional rule-based systems. · ‘Local maxima’ problem : even if the training corpus is large and sufficiently representative, the training program can result in a poor model because training got stuck in a local maximum and failed to find the global peak . This is an inherent problem with the standard training algorithms for both HMM (i.e. forward-backward algorithm ) and CFG grammar induction ( inside-outside algorithm ). This problem can be very serious when there is no extra information applied to guide the training process. · computational complexity problem : It is often the case that there is a trade-off between expressive power/prior knowledge/constraints in the templates and feasibility. Usually, the more sophisticateda model or rule template is, the more the minimum requirement for a corpus increases, often up to an unrealistic level of training complexity. To extend the length of the string to be examined (e.g. from bigram to trigram), or to add more features (or categories/classes) for a template to be able to make reference to, usually means an enormous jump in such requirement. Otherwise, the system suffers from more serious sparse data effect. In many cases, the limitation imposed on the training complexity makes some research ideas unattainable, which in turn limits the performance power. · potentially very high cost for manual annotationof corpus: that is why Cymfony has proposed as one important direction for future research to explore the combination of supervised training and unsupervised training. Among the above four problems, the sparse data problem is believed to be most serious. To a considerable extent, the success of a system depends on how this problem is addressed. 
In general, there are three ways to minimize the negative effect of sparse data, discussed below. The first is to condition the probabilities/rules on fewer elements, e.g. to back off from N-gram model to (N-1)-gram model. This remedy is clearly a sacrifice of the power and therefore is not a viable option for sophisticated NLP/IE tasks. The second approach is to condition the probabilities/rules on appropriate levels of linguistic structures (e.g. basic phrase level) instead of surface based linear tokens. The research in the CE prototyping showed this to be one the most promising ways of handling the sparse data problem. This approach calls for a reliable natural language parser to establish the necessary structural foundation for conducting structure-based adaptive learning. The shallow parser which Cymfony has built using the FST engine and an extensively tested manual grammar has been tested to perform with 90.5% accuracy. The third method is to condition the probabilities/rules on more general features, e.g. using syntactic categories (e.g. POS) or semantic classes (e.g. the results from semantic lexicon; or from word clustering training) instead of the token literal. This is also a proven effective means for overcoming this bottleneck. However,there is considerable difficulty in applying this approach due to the high degree of lexical ambiguity widespread in natural languages. As for the ‘local maxima’ problem, the proposed hybrid approach in integrating handcrafted FST rules and the automatic grammar learner promises a solution. The learned model can be re-trained using the FST component as a ‘seed’ to guide the learning. In general, the more constraints and heuristics that are given to the initial statistical model for training, the better the chance for the training algorithm to result in the global maximum. It is believed that a handcrafted grammar is the most effective of such constraints since it embodies human linguistic knowledge. 2.2.5 Feasibility and Advantages of Hybrid Approach In fact, the feasibility of such collaboration between a handcrafted rule system (FST in this case) and a corpus-based system has already been verified for all the major types of models: · For transformation based systems, Brill's training algorithm ensures that the input to the system can be either a randomly tagged text ( naive initial state ) or a text tagged by another module with the same function ( sophisticated initial state ) . Using the POS tagging as an example, the input to the transformation-based tagger can be either a text randomly tagged or a text tagged by another POS tagger. The shift in the input sources only requires re-training the system; nothing in the algorithm and the annotated corpus need to be changed. · In the case of rule induction, the FST-based grammar can serve as a ‘seed’ to effectively constrain/guide the learning process in overcoming the ‘local maxima’ problem. In general, a better initial estimate of the parameters gives the learning procedure a chance to obtain better results when many local maximal points exist . It is proven by experiments conducted by Briscoe Waegner that even with a very crude handcrafted grammar of only seven binary-branching rules (e.g. PP -- P NP) to start with, a much better grammar is automatically learned than the one using the same approach without a grammar ‘seed’. Another more interesting experiment they conducted gives the following encouraging results. 
Given the seed of an artificial grammar that can only parse 25% of the 50,000-word corpus, the training program is able to produce a grammar capable of parsing 75% of the corpus. This demonstrates the feasibility of combining handcrafted grammar and automatic grammar induction in line with the general approach proposed above: FST rules before the statistical model.
· When the trained sub-module is an HMM, Cymfony has verified its feasibility through extensive experiments in implementing the hybrid NE tagger, Textract 1.0. Cymfony first implemented an NE system purely on HMM bi-gram learning, and found there were weaknesses. Due to the sparse data problem, although time and numerical NEs are expressed in very predictable patterns, there was a considerable amount of mistagging. Later this problem was addressed by FST rules, which are good at capturing these patterns. The FST pattern rules for NE serve as a preprocessor. As a result, Textract 1.0 achieved a significant performance enhancement (F-measure raised from 85% to 93%).
The advantages of this proposed hybrid approach are summarized below:
· strict modularity: the proposal of combining FST rules and statistical models makes the system more modular, as each major module is now divided into two sub-modules. Of course, adaptive re-training is necessary in the later stage of integrating the two sub-modules, but it is not a burden as the process is automatic and, in principle, it does not require modifications in the algorithm or the training corpus.
· enhanced performance: due to the complementary nature of handcrafted and machine-learning systems.
· flexible ratio of sub-modules: one module may have a large trained model and a small FST component, or the other way around, depending on the nature of a given task, i.e. how well the FST approach or the learning approach applies to the task. One is free to decide how to allocate more effort and resources to develop one component or the other. If we judge that for Task One automatic learning is most effective, we are free to decide that more effort and resources should be used to develop the trained module M2 for this task (and less effort for the FST module M1). In other words, the relative size or contribution of M1 versus M2 is flexible, e.g. M1=20% and M2=80%.
Technology developed for the proposed information extraction system and its application has focused on six specific areas: (i) machine learning toolkit, (ii) CE, (iii) CO, (iv) GE, (v) QA and (vi) truthing and evaluation. The major accomplishments in these areas from the Phase I research are presented in the following sections.
In fact, it is also the case in the development of a pure statistical system: repeated training and testing is the normal practice of adjusting the model in the effort for performance improvement and debugging. It is possible that one module is based exclusively on FST rules, i.e. M1=100% and M2=0%, or completely on a learned model, i.e. M1=0% and M2=100%, so long as its performance is deemed good enough or the overhead of combining the FST grammar and the learned model outweighs the slight gain in performance. In fact, some minor modules like the Tokenizer and POS Tagger can produce very reliable results using only one approach.
REFERENCES
Abney, S.P. 1991. Parsing by Chunks, Principle-Based Parsing: Computation and Psycholinguistics, Robert C. Berwick, Steven P. Abney, Carol Tenny, eds. Kluwer Academic Publishers, Boston, MA, pp. 257-278.
Appelt, D.E. et al. 1995. SRI International FASTUS System MUC-6 Test Results and Analysis.
Proceedings of MUC-6, Morgan Kaufmann Publishers, San Mateo, CA.
Beckwith, R. et al. 1991. WordNet: A Lexical Database Organized on Psycholinguistic Principles. Lexicons: Using On-line Resources to build a Lexicon, Uri Zernik, editor, Lawrence Erlbaum, Hillsdale, NJ.
Bikel, D.M. et al., 1997. Nymble: a High-Performance Learning Name-finder. Proceedings of the Fifth Conference on Applied Natural Language Processing, Morgan Kaufmann Publishers, pp. 194-201.
Brill, E., 1995. Transformation-based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging, Computational Linguistics, Vol. 21, No. 4, pp. 227-253.
Briscoe, T. & Waegner, N., 1992. Robust Stochastic Parsing Using the Inside-Outside Algorithm. Workshop Notes, Statistically-Based NLP Techniques, AAAI, pp. 30-53.
Charniak, E. 1994. Statistical Language Learning, MIT Press, Cambridge, MA.
Chiang, T-H., Lin, Y-C. & Su, K-Y. 1995. Robust Learning, Smoothing, and Parameter Tying on Syntactic Ambiguity Resolution, Computational Linguistics, Vol. 21, No. 3, pp. 321-344.
Chinchor, N. & Marsh, E. 1998. MUC-7 Information Extraction Task Definition (version 5.1), Proceedings of MUC-7.
Darroch, J.N. & Ratcliff, D. 1972. Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, pp. 1470-1480.
Grishman, R., 1997. TIPSTER Architecture Design Document Version 2.3. Technical report, DARPA.
Hobbs, J.R. 1993. FASTUS: A System for Extracting Information from Text, Proceedings of the DARPA Workshop on Human Language Technology, Princeton, NJ, pp. 133-137.
Krupka, G.R. & Hausman, K. 1998. IsoQuest Inc.: Description of the NetOwl (TM) Extractor System as Used for MUC-7, Proceedings of MUC-7.
Lin, D. 1998. Automatic Retrieval and Clustering of Similar Words, Proceedings of COLING-ACL '98, Montreal, pp. 768-773.
Miller, S. et al., 1998. BBN: Description of the SIFT System as Used for MUC-7. Proceedings of MUC-7.
Mohri, M. 1997. Finite-State Transducers in Language and Speech Processing, Computational Linguistics, Vol. 23, No. 2, pp. 269-311.
Mooney, R.J. 1999. Symbolic Machine Learning for Natural Language Processing. Tutorial Notes, ACL '99.
MUC-7, 1998. Proceedings of the Seventh Message Understanding Conference (MUC-7), published on the website http://www.muc.saic.com/
Pine, C. 1996. Statement-of-Work (SOW) for The Intelligence Analyst Associate (IAA) Build 2, Contract for IAA Build 2, USAF, AFMC, Rome Laboratory.
Riloff, E. & Jones, R. 1999. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping, Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99).
Rosenfeld, R. 1994. Adaptive Statistical Language Modeling. PhD thesis, Carnegie Mellon University.
Senellart, J. 1998. Locating Noun Phrases with Finite State Transducers, Proceedings of COLING-ACL '98, Montreal, pp. 1212-1219.
Silberztein, M. 1998. Tutorial Notes: Finite State Processing with INTEX, COLING-ACL '98, Montreal (also available at http://www.ladl.jussieu.fr)
Srihari, R. 1998. A Domain Independent Event Extraction Toolkit, AFRL-IF-RS-TR-1998-152 Final Technical Report, published by Air Force Research Laboratory, Information Directorate, Rome Research Site, New York.
Yangarber, R. & Grishman, R. 1998. NYU: Description of the Proteus/PET System as Used for MUC-7 ST, Proceedings of MUC-7.
Pre-Knowledge-Graph Profile Extraction Research via SBIR (1) 2015-10-24 Pre-Knowledge-Graph Profile Extraction Research via SBIR (2) 2015-10-24 朝华午拾:在美国写基金申请的酸甜苦辣 - 科学网 【置顶:立委科学网博客NLP博文一览(定期更新版)】
个人分类: 立委科普|5146 次阅读|0 个评论
机器学习与R语言 书中R程序问题
waterbridge7 2015-9-15 10:18
1. In section 4.2.2 there is a problem when reading the data: str(sms_raw$type) shows three factor levels. The cause is one row of the data with a bad format; replacing that row with another row fixes it.
2. In section 4.2.3, before the line sms_dtm <- DocumentTermMatrix(corpus_clean) you need to add corpus_clean <- tm_map(corpus_clean, PlainTextDocument). Source: http://www.zhihu.com/question/29114787/answer/44383827
3. In section 6.2.3, ins_model3 is probably a typo; it should be ins_model, without the 3.
个人分类: 交流|1010 次阅读|0 个评论
The Machine - 机器
benlion 2015-9-8 11:15
- Scientific Art of Bio-systems 1. R.Rosen & MD.Mesarovic (1968), DW.Thompson & S.Leduc (1910). 2. BJ.Zeng, Structure Theory of Bio-systems, Methodology of Graph Theory and Network Topology (1991-1997): a). Solar-energy and Bio-Electronics (1991), b). Systems Medicine and Pharmacology (1992), c). Structure Theory on the Integration, Stability and Construction of Systems (1993), d). Systems Genetics and Bio-engineering (1994), e). Bio-computer and Cell Bionic Engineering, Oviduct Bioreactor and Transgenics (1994), f). Bio-systems Theory and Systems Bio-engineering, SG of the First International Conference on Transgenic Animals (1996), g). Biosystem Network, BSSE (1999) – Positive and Synthetic Thoughts: Structure Theory of Bio-systems, Computational, Experimental and Engineering Manipulation of Bio-systems, Bionics and Transgenics of Artificial Bio-systems etc. 3. On Systems Biology (2001) at the Aspects: Systems Theory (O.Wolkenhauer), Experimental Omics (L.Hood), Computation in Silico (H.Kitano) and Engineering Design (AP.Arkin) etc. - (08/09/2015) -
个人分类: genesis|1738 次阅读|0 个评论
Why Hybrid?
liwei999 2015-6-19 00:19
Before we start discussing the topic of a hybrid NLP (Natural Language Processing) system, let us look at the concept of hybrid from our life experiences. I was driving a classic Camry for years and had never thought of a change to other brands because, as a vehicle, there was really nothing to complain about. Yes, the style is old, but I am getting old too, so who beats whom? Until one day a few years ago when we needed to buy a new car to retire my damaged Camry. My daughter suggested a hybrid, following the trend of going green. So I have been driving a Prius ever since and have fallen in love with it. It is quiet, with Bluetooth and line-in, ideal for my iPhone music enjoyment. It has low emissions and I finally can say bye to smog tests. It saves at least 1/3 on gas. We could have gained all these benefits by purchasing an expensive all-electric car, but I want the same feel of power on the freeway and dislike the concept of having to charge the car too frequently. Hybrid gets the best of both worlds for me now, and is not that much more expensive.
Now back to NLP. There are two major approaches to NLP, namely machine learning and grammar engineering (or a hand-crafted rule system). As mentioned in previous posts, each has its own strengths and limitations, as summarized below.
In general, a rule system is good at capturing a specific language phenomenon (trees) while machine learning is good at representing the general picture of the phenomena (forest). As a result, it is easier for rule systems to reach high precision, but it takes a long time to develop enough rules to gradually raise the recall. Machine learning, on the other hand, has much higher recall, usually with a compromise in precision or with a precision ceiling.
Machine learning is good at simple, clear and coarse-grained tasks while rules are good at fine-grained tasks. One example is sentiment extraction. The coarse-grained task there is sentiment classification of documents (thumbs-up or thumbs-down), which can be achieved fast by a learning system. The fine-grained task for sentiment extraction involves extraction of sentiment details and the related actionable insights, including association of the sentiment with an object, differentiating positive/negative emotions from positive/negative behaviors, capturing the aspects or features of the object involved, decoding the motivation or reasons behind the sentiment, etc. In order to perform sophisticated tasks of extracting such details and actionable insights, rules are a better fit.
The strength of machine learning lies in its retraining ability. In theory, the algorithm, once developed and debugged, remains stable, and the improvement of a learning system can be expected once a larger and better quality corpus is used for retraining (in practice, retraining is not always easy: I have seen famous learning systems deployed at clients for years without being retrained for various reasons). Rules, on the other hand, need to be manually crafted and enhanced.
Supervised machine learning is more mature for applications but it requires a large labelled corpus. Unsupervised machine learning only needs a raw corpus, but it is research oriented and more risky in application. A promising approach is called semi-supervised learning, which only needs a small labelled corpus as seeds to guide the learning. We can also use rules to generate the initial corpus or seeds for semi-supervised learning. Both approaches involve knowledge bottlenecks.
A rule system's bottleneck is the skilled labor: it requires linguists or knowledge engineers to manually encode each rule in NLP, much like a software engineer in the daily work of coding. The biggest challenge to machine learning is the sparse data problem, which requires a very large labelled corpus to overcome. The knowledge bottleneck for supervised machine learning is the labor required for labeling such a large corpus.
We can build a system to combine the two approaches to complement each other. There are different ways of combining the two approaches in a hybrid system. One example is the practice we use in our product, where the results of insights are structured in a back-off model: high precision results from rules are ranked higher than the medium precision results returned by statistical systems or machine learning. This helps the system to reach a configurable balance between precision and recall.
When labelled data are available (e.g. the community has already built the corpus, or for some tasks the public domain has the data, e.g. sentiment classification of movie reviews can use the review data with users' feedback on a 5-star scale), and when the task is simple and clearly defined, using machine learning will greatly speed up the development of a capability.
Not every task is suitable for both approaches. (Note that suitability is in the eye of the beholder: I have seen many passionate ML specialists willing to try everything in ML irrespective of the nature of the task: as an old saying goes, when you have a hammer, everything looks like a nail.) For example, machine learning is good at document classification while rules are mostly powerless for such tasks. But for complicated tasks such as deep parsing, rules constructed by linguists usually achieve better performance than machine learning. Rules also perform better for tasks which have clear patterns, for example, identifying data items like time, weight, length, money, address, etc. This is because clear patterns can be directly encoded in rules to be logically complete in coverage, while machine learning based on samples still has a sparse data challenge. When designing a system, in addition to using a hybrid approach for some tasks, for other tasks we should choose the most suitable approach depending on the nature of the tasks.
Other aspects of comparison between the two approaches involve modularization and debugging in industrial development. A rule system can be structured as a pipeline of modules fairly easily so that a complicated task is decomposed into a series of subtasks handled by different levels of modules. In such an architecture, a reported bug is easy to localize and fix by adjusting the rules in the related module. Machine learning systems are based on the learned model trained from the corpus. The model itself, once learned, is often like a black box (even when the model is represented by a list of symbolic rules as the result of learning, it is risky to manually mess with the rules in fixing a data quality bug). Bugs are supposed to be fixable during retraining of the model based on an enhanced corpus and/or adjusting new features. But re-training is a complicated process which may or may not solve the problem. It is difficult to localize and directly handle specific reported bugs in machine learning.
To conclude, Hybrid gets the best of both worlds.
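As a concrete, if simplified, picture of the back-off model described above, here is a minimal sketch: a hand-written high-precision rule is consulted first, and a statistical classifier is used only when no rule fires. The patterns, labels and placeholder classifier are all invented for illustration and are not the actual components of any product.

```python
import re

# A couple of hand-crafted, high-precision patterns (placeholders).
RULES = [
    (re.compile(r"\b(love|awesome|fantastic)\b", re.I), "positive"),
    (re.compile(r"\b(hate|terrible|awful)\b", re.I), "negative"),
]

def rule_classify(text):
    """Return a label only when a high-precision rule fires."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None  # no rule matched

def statistical_classify(text):
    """Stand-in for a trained model (e.g. a Naive Bayes or MaxEnt classifier)."""
    # ... model.predict(features(text)) would go here ...
    return "neutral"

def hybrid_classify(text):
    # Back-off model: trust the precise rules first, fall back to the
    # high-recall statistical component when the rules stay silent.
    return rule_classify(text) or statistical_classify(text)

print(hybrid_classify("I love this camera"))      # decided by a rule
print(hybrid_classify("It arrived on Tuesday"))   # falls back to the model
```

The ordering is the whole point of the design: rules keep the precision of the final output high, and the statistical fallback keeps the recall from collapsing when no rule applies.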
Due to the complementary nature of the pros/cons of the two basic approaches to NLP, a hybrid system involving both approaches is desirable and worth more attention and exploration. There are different ways of combining the two approaches in a system, including a back-off model using rules for precision and learning for recall, semi-supervised learning using high precision rules to generate an initial corpus or "seeds", etc. Related posts: Comparison of Pros and Cons of Two NLP Approaches Is Google ranking based on machine learning? 《立委随笔：语言自动分析的两个路子》 《立委随笔：机器学习和自然语言处理》 【置顶：立委科学网博客NLP博文一览(定期更新版)】
个人分类: 立委科普|3993 次阅读|0 个评论
[转载]深度学习、机器学习与模式识别三者之间的区别
machinelearn 2015-3-30 16:10
Let's take a close look at three related terms (Deep Learning vs Machine Learning vs Pattern Recognition), and see how they relate to some of the hottest tech themes in 2015 (namely Robotics and Artificial Intelligence). In our short journey through jargon, you should acquire a better understanding of how computer vision fits in, as well as gain an intuitive feel for how the machine learning zeitgeist has slowly evolved over time.
Fig 1. Putting a human inside a computer is not Artificial Intelligence (Photo from WorkFusion Blog)
If you look around, you'll see no shortage of jobs at high-tech startups looking for machine learning experts. While only a fraction of them are looking for Deep Learning experts, I bet most of these startups can benefit from even the most elementary kind of data scientist. So how do you spot a future data scientist? You learn how they think.
The three highly-related learning buzz words "Pattern recognition," "machine learning," and "deep learning" represent three different schools of thought. Pattern recognition is the oldest (and as a term is quite outdated). Machine Learning is the most fundamental (one of the hottest areas for startups and research labs as of today, early 2015). And Deep Learning is the new, the big, the bleeding-edge -- we're not even close to thinking about the post-deep-learning era. Just take a look at the following Google Trends graph. You'll see that a) Machine Learning is rising like a true champion, b) Pattern Recognition started as synonymous with Machine Learning, c) Pattern Recognition is dying, and d) Deep Learning is new and rising fast.
1. Pattern Recognition: The birth of smart programs
Pattern recognition was a term popular in the 70s and 80s. The emphasis was on getting a computer program to do something "smart" like recognize the character 3. And it really took a lot of cleverness and intuition to build such a program. Just think of 3 vs B and 3 vs 8. Back in the day, it didn't really matter how you did it as long as there was no human-in-a-box pretending to be a machine. (See Figure 1) So if your algorithm would apply some filters to an image, localize some edges, and apply morphological operators, it was definitely of interest to the pattern recognition community. Optical Character Recognition grew out of this community, and it is fair to call "Pattern Recognition" the "Smart Signal Processing" of the 70s, 80s, and early 90s. Decision trees, heuristics, quadratic discriminant analysis, etc all came out of this era. Pattern Recognition became something CS folks did, and not EE folks. One of the most popular books from that time period is the infamous Duda & Hart Pattern Classification book, and it is still a great starting point for young researchers. But don't get too caught up in the vocabulary, it's a bit dated.
The character 3 partitioned into 16 sub-matrices. Custom rules, custom decisions, and custom smart programs used to be all the rage. See OCR Page.
Quiz: The most popular Computer Vision conference is called CVPR, and the PR stands for Pattern Recognition. Can you guess the year of the first CVPR conference?
2. Machine Learning: Smart programs can learn from examples
Sometime in the early 90s people started realizing that a more powerful way to build pattern recognition algorithms is to replace an expert (who probably knows way too much about pixels) with data (which can be mined from cheap laborers). So you collect a bunch of face images and non-face images, choose an algorithm, and wait for the computations to finish. This is the spirit of machine learning.
Machine Learning emphasizes that the computer program (or machine) must do some work after it is given data. The Learning step is made explicit. And believe me, waiting 1 day for your computations to finish scales better than inviting your academic colleagues to your home institution to design some classification rules by hand.
What is Machine Learning, from Dr Natalia Konstantinova's Blog. The most important part of this diagram is the Gears, which suggest that crunching/working/computing is an important step in the ML pipeline.
As Machine Learning grew into a major research topic in the mid 2000s, computer scientists began applying these ideas to a wide array of problems. No longer was it only character recognition, cat vs. dog recognition, and other "recognize a pattern inside an array of pixels" problems. Researchers started applying Machine Learning to Robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, as well as to predict financial markets. Machine Learning was married with Graph Theory under the brand "Graphical Models," every robotics expert had no choice but to become a Machine Learning Expert, and Machine Learning quickly became one of the most desired and versatile computing skills.
However, Machine Learning says nothing about the underlying algorithm. We've seen convex optimization, Kernel-based methods, Support Vector Machines, as well as Boosting have their winning days. Together with some custom manually engineered features, we had lots of recipes, lots of different schools of thought, and it wasn't entirely clear how a newcomer should select features and algorithms. But that was all about to change...
Further reading: To learn more about the kinds of features that were used in Computer Vision research see my blog post: From feature descriptors to deep learning: 20 years of computer vision.
3. Deep Learning: one architecture to rule them all
Fast forward to today and what we're seeing is a large interest in something called Deep Learning. The most popular kinds of Deep Learning models, as they are used in large-scale image recognition tasks, are known as Convolutional Neural Nets, or simply ConvNets.
ConvNet diagram from Torch Tutorial
Deep Learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) and that you can use data to fill in the missing parameters. But with deep learning comes great responsibility. Because you are starting with a model of the world which has a high dimensionality, you really need a lot of data (big data) and a lot of crunching power (GPUs). Convolutions are used extensively in deep learning (especially computer vision applications), and the architectures are far from shallow.
If you're starting out with Deep Learning, simply brush up on some elementary Linear Algebra and start coding. I highly recommend Andrej Karpathy's Hacker's guide to Neural Networks. Implementing your own CPU-based backpropagation algorithm on a non-convolution based problem is a good place to start.
There are still lots of unknowns. The theory of why deep learning works is incomplete, and no single guide or book is better than true machine learning experience. There are lots of reasons why Deep Learning is gaining popularity, but Deep Learning is not going to take over the world. As long as you continue brushing up on your machine learning skills, your job is safe. But don't be afraid to chop these networks in half, slice 'n dice at will, and build software architectures that work in tandem with your learning algorithm.
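For readers who want to follow the advice above about implementing their own CPU-based backpropagation on a non-convolutional problem, here is a tiny self-contained numpy sketch of a one-hidden-layer network on the XOR toy problem. The layer sizes, learning rate and iteration count are arbitrary choices for illustration, not a recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)           # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)           # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: squared-error loss, chain rule applied layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # typically approaches [0, 1, 1, 0]
```

The point is only the gradient bookkeeping: each layer's error term is the downstream error propagated back through the weights, multiplied by the local activation derivative.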
The Linux Kernel of tomorrow might run on Caffe (one of the most popular deep learning frameworks), but great products will always need great vision, domain expertise, market development, and most importantly: human creativity.
Other related buzz-words
Big-data is the philosophy of measuring all sorts of things, saving that data, and looking through it for information. For business, this big-data approach can give you actionable insights. In the context of learning algorithms, we've only started seeing the marriage of big-data and machine learning within the past few years. Cloud-computing, GPUs, DevOps, and PaaS providers have made large scale computing within reach of the researcher and ambitious everyday developer.
Artificial Intelligence is perhaps the oldest term, the most vague, and the one that has gone through the most ups and downs in the past 50 years. When somebody says they work on Artificial Intelligence, you are either going to want to laugh at them or take out a piece of paper and write down everything they say.
Further reading: My 2011 Blog post Computer Vision is Artificial Intelligence.
Conclusion
Machine Learning is here to stay. Don't think about it as Pattern Recognition vs Machine Learning vs Deep Learning, just realize that each term emphasizes something a little bit different. But the search continues. Go ahead and explore. Break something. We will continue building smarter software and our algorithms will continue to learn, but we've only begun to explore the kinds of architectures that can truly rule-them-all.
If you're interested in real-time vision applications of deep learning, namely those suitable for robotic and home automation applications, then you should check out what we've been building at vision.ai. Hopefully in a few days, I'll be able to say a little bit more. :-)
个人分类: 科研笔记|4633 次阅读|0 个评论
《概率图模型:原理与技术》译者序
热度 23 王飞跃 2015-3-28 14:21
《概率图模型:原理与技术》译者序 王飞跃 由清华大学出版社出版的《概率图模型:原理与技术》将于近期发布,敬请关注京东、当当、亚马逊的图书信息。 -------------------------------------------------------------------------------------------- 译者序 美国斯坦福大学教授 Daphne Koller 和以色列希伯来大学教授 Nir Friedman 的专著《概率图模型:原理与技术( P robabilistic Graphical Models: Principles and Techniques )》是机器学习和人工智能领域一部里程碑式的著作。本书内容十分丰富,作者以前所未有的广度和深度,对概率图模型这一领域的基础知识和最新进展进行了全面描述和总结。显然,这本书是对 Judea Pearl 教授关于不确定情况下贝叶斯网络推理与决策方面开创性工作之权威和重要的拓广,是本领域专家学者和研究生的最佳参考书和教科书之一。近几年来,人工智能和机器学习的研究与应用已成为全球性的科技热点,此书的出版恰当其时、正当其用,并且已产生了十分积极的影响。 将一本近 1200 页的英文原著翻译成中文,的确是一次刻骨铭心的难忘经历,有时甚至感觉自已比原著的第二作者 Friedman 更有资格被称为 “ 烤焦的人 ” (英文就是 Fried man )! 此书内容广博,而且作者特别善于见微知著,长篇细论,即便是单纯地读完此书亦需要坚韧的毅力,何况将其译成中文!用 Koller 自己在邮件中的话说: “ 这是一次英雄般的努力! ” 对于英文原著国内外已有许多评论,我已无需锦上添花。在此,只希望简要回顾一下自己为何如此钟情于此书以及本书五年的翻译历程,权当为序。 自提出社会计算这一新的研究领域之后,我一直试图寻求能够支撑社会计算的理论框架与解析方法。 2004 年正式提出基于人工社会、计算实验和平行执行的复杂系统计算方法 ACP 框架,同时也将其作为社会计算的一般框架,因为社会计算所涉及的社会问题,显然是典型的复杂系统问题。但社会计算用于量化分析的一般解析方法是什么?在哪里?这在当时是一个首要的研究问题。自 2007 年起,作为副主编,我不断地推动《 IEEE 智能系统( IEEE Intelligent Systems )》杂志将社会计算列为主题,并先后组织了社会计算、社会学习、社会媒体、社会安全等相关专题专刊的征稿与出版,试图通过这些工作寻找完整的社会计算框架、方法及应用的学术 “ 生态 ” 体系。这些工作,得到了时任主编 James Hendler 教授的热情支持,他自己与 Tim Berners-Lee 还相应地提出了社会机器( Social Machines )的研究方向。我于 2008 年底接任主编,次年的年度编委会之后, Jim 打 电话问我对 Koller 和 Friedman 的《 概率图模型 》的看法如何,可否在 IEEE Intelligent Systems 上组织一篇书评。但当时我并没有看过此书,因此建议由他来找人写。 2010 年春,我们实验室的王珏研究员再次让我注意此书,认为这本书会将对机器学习的研究产生重要影响,建议我组织大家学习讨论,并组织人员翻译。在此情况下,我才买了 英文原版书,一看此书一千多页,比一块厚砖头还厚,当即就对能否找到时间翻译产生怀疑,表示不可能有时间从事这项工作。 然而,粗读此书后,特别是从网络搜索了相关基于概率图模型的研究论文之后,我下定决心组织人员翻译此书,并立即让正在从事相关研究的博士生周建英着手写相关综述(见周建英、王飞跃、曾大军, “ 分层 Dirichlet 过程及其应用综述 ” , 自动化学报 , 第 37 卷 4 期 , 389-407, 2011 )。促使我态度转变的原因主要有三点:首先,本书是按照 “ 知识表示 — 学习推理 — 自适应决策 ” 这一大框架组织的,与我自己的 ACP (即:人工社会 / 组织 / 系统 + 计算实验 + 平行执行)方法之主旨完全一致。显然,基于概率图的表示是人工组织建模方法的特殊形式,机器学习和相关推理是计算实验的智能化方式,而自适应决策则更是平行执行的闭环反馈化的具体实现。实际上,我一直希望把计算实验与机器学习打通,从而使统计实验方法升华并与最新的机器学习方法,特别是与集成学习( ensemble learning )结合起来。其次,概率图模型方法已在文本分类、观点分析、行为分析、舆情计算、车辆分类、交通路况分析以及智能控制上有了初步应用,这些都是我所带领的团队当时正在从事的研究及开发工作,而我正苦寻有何方法能将这些工作的基础统一起来,从而实现从基础、研究到应用的一体化整合,概率图模型为我开启了一个十分具有前景和希望的途径。唯一的缺憾,是我对此书的内容组织和写作方式有所保留,觉得虽然总体合理流畅,而且还用了 “ 技巧框盒( skill boxes ) ” 、 “ 案例框盒 ( case study boxes) ” 、 “ 概念框盒( concept boxes ) ” 等让人一目了然印象深刻的表述形式,但对一般研究人员,特别是伴随微博微信成长起来的 QQ 新一代,此书相对难读,对某些人甚至基本无法读下去。因此,我希望通过细品此书,能够写出一本自己的书,就叫《可能图模型:原理与方法》,作为社会计算、平行方法和知识自动化的基础教材,统一我们团队学生之基本功的培育和修炼。 有了这些想法,立即付诸行动。为了尽快完成翻译,我准备尝试利用自己一直全力推动的 “ 人肉搜索 ” 众包形式,希望通过社会媒体号召专业和社会人士参与此次工作,也为将来此书中文版之发行打下一个良好基础,就算是一种 “ 计算广告 ” 和 “ 精准投放 ” 实习吧!首先,我在实验室组织了十多名学生参与此次工作,作为众包的 “ 种子 ” ;再请王珏老师给大家系统化地上课,讲解此书;希望在此基础上,再挑选一批人,进行翻译,其余的参加翻译的讨论,并将翻译讨论后的结果分布于网上,发动专业和社会人士,进行评论、修正和改进,以 “ 众包 ” 方式完成此书的中文翻译。王珏老师对此给予我极大地支持,并安排了他的学生韩素青和张军平等参与此次工作。 与此同时,我也开始联系此书的版权事宜。王珏老师推荐了清华大学出版社薛慧老师作为联系人,我也找到了正邀请我写作《社会计算》的 MIT 出版社的 Jane 女士,询问翻译版权事宜。 Jane 立即回信,表示感兴趣,同时将回信抄送 Koller 和 Friedman ,并表示她将参加 2010 年 5 月在 Alaska 召开的 IEEE 机器人与自动化国际大会( ICRA )大会,因我是 IEEE ICRA2010 的国际程序委员会委员,希望见面细谈。其实自己原本不想去 Alaska ,但为了尽快顺利促成此事,我决定还是去一趟。对于此次会面,我印象极深,一是刚下飞机就接到国内电话,社会计算工程团队发生重大事件,让我十分担心;二是刚进大会酒店 Hilton 大厅,一眼就看到左边酒吧里同事 JM 正与一位女士坐在一起品酒交谈, JM 起身与我打招呼,并介绍他的女伴,原来正是 Jane !版权事宜很顺利, Jane 说她刚开始时担心翻译质量,因为此书很有难度,现在放心了。而且,对清华大学出版社也很了解,可又担心此文的中文版权是否有繁体与简体的冲突,因为繁体中文版权已授予一家台湾公司。对此,我只能介绍了她与清华大学出版社薛慧直接联系,似乎后来的版权事宜还算顺利。 没有想到的是,接下来的翻译工作极不顺畅。参与学习班的许多同学提交的初稿可以用一个词来描述: “ 惨不忍睹 ” 。错误理解、望文生义,甚至将意思满拧,比比皆是,有时让人哭笑不得,完全可以用 “ 天马行空,独往独来 ” 来形容。让我难以相信,这是出自完成大学甚至硕士学业的学生之手。我一度想把这些翻译放到网上,让大家开心一下,或许会吸引更多的人士参与此书翻译的众包工作。但王珏和几位同事的意见完全打消了我的想法,也使我放弃了以 “ 众包 ” 
方式进行翻译的打算。主要原因是当时希望能趁大家还在热切期盼此书的时候出版中文版,而众包翻译这种形式从未尝试过,结果如何不能妄断,万一比学生翻译的还差怎么办?何时才能完成?就是完成之后,署名和其他出版问题如何解决?最后,决定由我主笔并组织翻译工作,分翻译、统稿、审校、修正、清样校对五个阶段进行,邀请实验室毕业的韩素青、张军平、周建英和杨剑以及王立威和孙仕亮等机器学习领域的一线年轻研究人员辅助。在 2012 年以前,我只能用零星时间从事翻译工作。 2011 年底的一场大病,让我在之后有了充裕的时间和精力一边修养一边翻译修改。特别是在北京西郊与山林相伴的日日夜夜,效率极高,终于在 2012 年夏初初步完成了此书的翻译和统稿。 必须说明的是,本项工作是集体努力的结果。参与人员五十余人,多为我和王珏老师的历届学生。首先,非常感谢韩素青博士在翻译和统稿过程付出的巨大努力和心血,她的坚持使我打消了一度放弃此项目的想法。此外,我团队的学生王友忠、王坤峰、王凯、叶佩军、田滨、成才、吕宜生、任延飞、孙涛、苏鹏、李叶、李林静、李泊、李晓晨、沈栋、宋东平、张柱、陈松航、陈诚、周建英、赵学亮、郝春辉、段伟、顾原、徐文聪、彭景、葛安生等参与了本书的翻译,北京大学的王立威教授、复旦大学的张军平教授、华东师范大学的孙仕亮教授、北京工业大学的杨剑教授、公安部第三研究所的周建英博士、中国科学院自动化研究所的王坤峰博士参与了审校,我的学生王坤峰、田滨、李叶、李泊、苟超、姚彦洁等参与了修正,最后,王坤峰带领我团队的学生王晓、亢文文、朱燕燕、刘玉强、刘裕良、杨坚、陈亚冉、陈圆圆、苟超、赵一飞、段艳杰、姚彦洁等完成了清样的校对和通读,在此我向他们深表谢意。还有许多其他同学和同事在不同阶段参与了本项工作,十分感谢他们的贡献,抱歉无法在此一一具名。 在此书的翻译过程中,还得到 Koller 教授的帮助。 2011 年 8 月上旬,我在旧金山的 AAAI 年会上与她相见,讨论了翻译事宜。 Koller 表示可以让她的两位中国学生参与翻译,我还同他们联系过,但除了几个名词和一些修正问题,并没有太劳驾其学生。 Koller 提供的详细勘读表和网站信息,对我们翻译的校正很有帮助。今年四月,我赴美参加主编会议,本计划去旧金山与 Koller 见面确定翻译的最后一些细节,不想因病作罢,只能通过邮件进行。 此书的翻译还让我与斯坦福大学人工智能实验室的三位主任有了较深的来往,而且三位分别是当今世界上最成功 MOOC 网络大学 Coursera 和 Udacity 的创始人。当 Koller 因和吴恩达创办 Coursera 而辞去 AI 主任之后, Sebastian Thrun 接任,那时我恰好与他合作组织 IJCAI2013 北京大会。 2011 年他来京得知我们正在翻译 《概率图模型》 后,希望也能翻译他的《概率机器人( Probabilistic Robotics )》。自己虽然教授了 20 年的机器人课程,但再无精力和时间做此类工作,只能安排实验室其他研究人员承担。但是他的中文翻译版权已被转让,只好作罢。后来 Thrun 辞去斯坦福和谷歌的工作,创办 Udacity ,接任实验室主任的,正是后来加入百度的吴恩达博士,十分感谢他表示愿为《概率图模型》中文版的推广而尽力。 从初春北京的西山到五月阿拉斯加的海滨,从夏雨中长沙的跨线桥到烈日下图森的仙人掌,从秋枫叶飘的旧金山到海浪冲沙的海牙,从深秋风凉的青岛石老人再回到初冬消失在雾霾里的北京高楼, ...... 本书的翻译伴我度过了五年的风风雨雨,差点成了完不成的任务( Mission Impossible !)。今日落稿,顿觉释然,除了感谢自己的学生与同事之外,我必须特别感谢清华大学出版社的薛慧女士,感谢她在整个翻译过程中的热心和耐心。 最后,希望把本书的中文版献给王珏老师,没有他就没有本项目的开始。更重要的是,希望本书中文版的完成,能使他早日从疾病中康复! 中国科学院自动化研究所复杂系统管理与控制国家重点实验室 国防科技大学军事计算实验与平行系统技术研究中心 王飞跃 2012 年秋初记于长沙跨线桥居, 2013 年初补记于北京国奥新村, 2014 年初春再记于美国图森 Catalina 山居,同年深秋重记于青岛石老人海滨。 又记: 十分不幸的是, 2014 年 12 月 3 日传来噩耗:王珏老师因病逝世!相识相知相助 21 年,内心悲痛,无以言喻。特别是王珏老师生前没能看到本书中文版的正式出版,遗憾之余,深感自责。鉴于他和他学生的巨大帮助,我曾多次同他商谈,希望将他列为译者之一,但每次他都坚决拒绝;最后无奈,曾托薛慧拿着出版合同请他签字,但依然被拒绝。唯感欣慰的是, 12 月 2 日下午,在他神志清醒的最后时刻,我们见了最后一面。他去世后,当日我即电邮 Daphne Koller ,告她先前不曾知晓的王珏老师,还有他对中国机器学习的重要贡献以及在翻译其专著过程中所起的关键作用,希望她在中文版的序言里有所表述。 英文如下: Prof. Jue Wang, a pioneer in ML in China and a research scientist in my lab, died of cancer today at age 66. He was a big promoter of your book and without his strong push behind, I might not have determined to do the translation in the first place. Many of outstanding young ML researchers in China are his former students and they have given me a huge support during the translation and proofreading of your book. So I would like you to say a few words about his effortin your preface. 可以告慰王珏老师的是, Koller 教授在其序言里恰如其分地表示了对他的贡献之衷心感谢。本书中文版的最终出版,就是对王珏老师的纪念! 
2014 年 12 月 9 日 于北京科技会堂 -------------------------------------------------------------------------------------------- 图书目录 致谢 插图目录 算法目录 专栏目录 第 1 章 引言 1.1 动机 1.2 结构化概率模型 1.2.1 概率图模型 1.2.2 表示、推理、学习 1.3 概述和路线图 1.3.1 各章的概述 1.3.2 读者指南 1.3.3 与其他学科的联系 1.4 历史注记 第 2 章 基础知识 2.1 概率论 2.1.1 概率分布 2.1.2 概率中的基本概念 2.1.3 随机变量与联合分布 2.1.4 独立性与条件独立性 2.1.5 查询一个分布 2.1.6 连续空间 2.1.7 期望与方差 2.2 图 2.2.1 节点与边 2.2.2 子图 2.2.3 路径与迹 2.2.4 圈与环 2.3 相关文献 2.4 练习 第Ⅰ部分 表 示 第 3 章 贝叶斯网表示 3.1 独立性性质的利用 3.1.1 随机变量的独立性 3.1.2 条件参数化方法 3.1.3 朴素贝叶斯模型 3.2 贝叶斯网 3.2.1 学生示例回顾 3.2.2 贝叶斯网的基本独立性 3.2.3 图与分布 3.3 图中的独立性 3.3.1 d- 分离 3.3.2 可靠性与完备性 3.3.3 d- 分离算法 3.3.4 I- 等价 3.4 从分布到图 3.4.1 最小 I-Map 3.4.2 P-Map 3.4.3 发现 P-Map* 3.5 小结 3.6 相关文献 3.7 习题 第 4 章 无向图模型 4.1 误解示例 4.2 参数化 4.2.1 因子 4.2.2 吉布斯分布与马尔可夫网 4.2.3 简化的马尔可夫网 4.3 马尔可夫网的独立性 4.3.1 基本独立性 4.3.2 独立性回顾 4.3.3 从分布到图 4.4 参数化回顾 4.4.1 细粒度参数化方法 4.4.2 过参数化 4.5 贝叶斯网与马尔可夫网 4.5.1 从贝叶斯网到马尔可夫网 4.5.2 从马尔可夫网到贝叶斯网 4.5.3 弦图 4.5.4 I- 等价 4.6 部分有向图 4.6.1 条件随机场 4.6.2 链图模型 * 4.7 小结与讨论 4.8 相关文献 4.9 习题 第 5 章 局部概率模型 5.1 CPD 表 5.2 确定性 CPD 5.2.1 表示 5.2.2 依赖性 5.3 上下文特定的 CPD 5.3.1 表示 5.3.2 独立性 5.4 因果影响的独立性 5.4.1 noisy-or 模型 5.4.2 广义线性模型 5.4.3 一般公式化表示 5.4.4 独立性 5.5 连续变量 5.5.1 混合模型 5.6 条件贝叶斯网 5.7 小结 5.8 相关文献 5.9 习题 第 6 章 基于模板的表示 6.1 引言 6.2 时序模型 6.2.1 基本假设 6.2.2 动态贝叶斯网 6.2.3 状态观测模型 6.3 模板变量与模板因子 6.4 对象 - 关系领域的有向概率模型 6.4.1 plate 模型 6.4.2 概率关系模型 6.5 无向表示 6.6 结构不确定性 * 6.6.1 关系不确定性 6.6.2 对象不确定性 6.7 小结 6.8 相关文献 6.9 习题 第 7 章 高斯网络模型 7.1 多元高斯分布 7.2.1 基本参数化方法 7.2.2 高斯分布的运算 7.2.3 高斯分布的独立性 7.2 高斯贝叶斯网 7.3 高斯马尔可夫随机场 7.4 小结 7.5 相关文献 7.6 练习 第 8 章 指数族 8.1 引言 8.2 指数族 8.2.1 线性指数族 8.3 因子化的指数族 (factored exponential families) 8.3.1 积分布 (product distributions) 8.3.2 贝叶斯网络 8.4 熵和相对熵 8.4.1 熵 8.4.2 相对熵 8.5 投影 8.5.1 比较 8.5.2 M- 投影 8.5.3 I- 投影 8.6 小结 8.7 相关文献 8.8 习题 第Ⅱ部分 推 理 第 9 章 精确推理:变量消除 9.1 复杂性分析 9.1.1 精确推理分析 9.1.2 近似推理分析 9.2 变量消除:基本思路 9.3 变量消除 9.3.1 基本消除 9.3.2 证据处理 9.4 复杂性与图结构:变量消除 9.4.1 简单分析 9.4.2 图论分析 9.4.3 寻找消除排序 * 9.5 条件作用 * 9.5.1 条件作用算法 9.5.2 条件作用与变量消除 9.5.3 图论分析 9.5.4 改进的条件作用算法 9.6 用结构 CPD 推理 * 9.6.1 因果影响的独立性 9.6.2 上下文特定的独立性 9.6.3 讨论 9.7 小结与讨论 9.8 相关文献 9.9 习题 第 10 章 精确推理:团树 10.1 变量消除与团树 10.1.1 聚类图 10.1.2 团树 10.2 信息传递:和积 10.2.1 团树中的变量消除 10.2.2 团树校准 10.2.3 作为分布的校准团树 10.3 信息传递:信念更新 10.3.1 使用除法的信息传递 10.3.2 和 - 积与信息 - 更新消息的等价性 10.3.3 回答查询 10.4 构建一个团树 10.4.1 源自变量消除的团树 10.4.2 来自弦图的团树 10.5 小结 10.6 相关文献 10.7 习题 第 11 章 推理优化 11.1 引言 11.1.1 再议精确推理 * 11.1.2 能量泛函 11.1.3 优化能量泛函 11.2 作为优化的精确推理 11.2.1 不动点刻画 . 
11.2.2 推理优化 11.3 基于传播的近似 11.3.1 一个简单的例子 11.3.2 聚类图信念传播 11.3.3 聚类图信念传播的性质 11.3.4 收敛性分析 * 11.3.5 构建聚类图 11.3.6 变分分析 11.3.7 其他熵近似 * 11.3.8 讨论 11.4 用近似信息传播 * 11.4.1 因子分解的消息 11.4.2 近似消息计算 11.4.3 用近似消息推理 11.4.4 期望传播 11.4.5 变分分析 11.4.6 讨论 11.5 结构化的变分近似 11.5.1 平均场近似 11.5.2 结构化的近似 11.5.3 局部变分法 * 11.6 小结与讨论 11.7 相关文献 11.8 习题 第 12 章 基于粒子的近似推理 12.1 前向采样 12.1.1 从贝叶斯网中采样 12.1.2 误差分析 12.1.3 条件概率查询 12.2 似然加权与重要性采样 12.2.1 似然加权:直觉 12.2.2 重要性采样 12.2.3 贝叶斯网的重要性采样 12.2.4 重要性采样回顾 12.3 马尔可夫链的蒙特卡罗方法 12.3.1 吉布斯采样算法 12.3.2 马尔可夫链 12.3.3 吉布斯采样回顾 12.3.4 一马尔可夫链的一个更广泛的类 * 12.3.5 利用马尔可夫链 12.4 坍塌的粒子 12.4.1 坍塌的似然加权 * 12.4.2 坍塌的 MCMC 12.5 确定性搜索方法 * 12.6 小结 12.7 相关文献 12.8 习题 第 13 章 最大后验推断 13.1 综述 13.1.1 计算复杂性 13.1.2 求解方法综述 13.2 (边缘) MAP 的变量消除 13.2.1 最大 - 积变量消除 13.2.2 找到最可能的取值 13.2.3 边缘 MAP 的变量消除 * 13.3 团树中的最大 - 积 13.3.1 计算最大边缘 13.3.2 作为再参数化的信息传递 13.3.3 最大边缘解码 13.4 多圈聚类图中的最大 - 积信念传播 13.4.1 标准最大 - 积消息传递 13.4.2 带有计数的最大 - 积 BP* 13.4.3 讨论 13.5 作为线性优化问题的 MAP* 13.5.1 整数规划的公式化 13.5.2 线性规划松弛 13.5.3 低温极限 13.6 对 MAP 使用图割 13.6.1 使用图割的推理 13.6.2 非二元变量 13.7 局部搜索算法 * 13.8 小结 13.9 相关文献 13.10 习题 第 14 章 混合网络中的推理 14.1 引言 14.1.1 挑战 14.1.2 离散化 14.1.3 概述 14.2 高斯网络中的变量消除 14.2.1 标准型 14.2.2 和 - 积算法 14.2.3 高斯信念传播 14.3 混合网 14.3.1 面临的困难 14.3.2 混合高斯网络的因子运算 14.3.3 CLG 网络的 EP 14.3.4 一个“准确的” CLG 算法 * 14.4 非线性依赖 14.4.1 线性化 14.4.2 期望传播与高斯近似 14.5 基于粒子的近似方法 14.5.1 在连续空间中采样 14.5.2 贝叶斯网中的前向采样 14.5.3 MCMC 方法 14.5.4 坍塌的粒子 14.5.4 非参数信息传递 14.6 小结与讨论 14.7 相关文献 14.8 习题 第 15 章 在时序模型中推理 15.1 推理任务 15.2 精确推理 15.2.1 基于状态观测模型的滤波 15.2.2 作为团树传播的滤波 15.2.3 DBN 中的团树推理 15.2.4 纠缠 15.3 近似推理 15.3.1 核心思想 15.3.2 因子分解的信念状态方法 15.3.3 粒子滤波 15.2.4 确定性搜索技术 15.4 混合 DBN 15.4.1 连续模型 15.4.2 混合模型 15.5 小结 15.6 相关文献 15.7 习题 第Ⅲ部分 学 习 第 16 章 图模型学习:概述 16.1 动机 16.2 学习目标 16.2.1 密度估计 16.2.2 具体的预测任务 16.2.3 知识发现 16.3 作为优化的学习 16.3.1 经验风险与过拟合 16.3.2 判别式与生成式训练 16.4 学习任务 16.4.1 模型限制 16.4.2 数据的可观测性 16.4.3 学习任务的分类 16.5 相关文献 第 17 章 参数估计 17.1 最大似然估计 17.1.1 图钉的例子 17.1.2 最大似然准则 17.2 贝叶斯网的 MLE 17.2.1 一个简单的例子 17.2.2 全局似然分解 17.2.3 条件概率分布表 17.2.4 高斯贝叶斯网 * 17.2.5 作为 M- 投影的最大似然估计 * 17.3 贝叶斯参数估计 17.3.1 图钉例子的回顾 17.3.2 先验与后验 . 17.4 贝叶斯网络中的贝叶斯参数估计 17.4.1 参数独立性与全局分解 17.4.2 局部分解 17.4.3 贝叶斯网络学习的先验分布 17.4.4 MAP 估计 * 17.5 学习具有共享参数的模型 17.5.1 全局参数共享 17.5.2 局部参数共享 17.5.3 具有共享参数的贝叶斯推理 17.5.4 层次先验 * 17.6 泛化分析 * 17.6.1 渐进性分析 . 17.6.2 PAC 界 17.7 小结 17.8 相关文献 17.9 习题 第 18 章 贝叶斯网络中的结构学习 18.1 引言 18.1.1 问题定义 18.1.2 方法概述 18.2 基于约束的方法 18.2.1 基本框架 18.2.2 独立性测试 . 18.3 结构得分 18.3.1 似然得分 18.3.2 贝叶斯得分函数 18.3.3 单个变量的边缘似然 18.3.4 贝叶斯网的贝叶斯得分 18.3.5 理解贝叶斯得分 18.3.6 先验 18.3.7 得分等价性 * 18.4 结构搜索 18.4.1 学习树结构网 18.4.2 给定序 18.4.3 一般的图 18.4.4 用等价类学习 * 18.5 贝叶斯模型平均 * 18.5.1 基本理论 18.5.2 基本给定序的模型平均 18.5.3 一般的情况 . 18.6 关于额外结构学习模型 18.6.1 关于局部结构学习 18.6.2 学习模板模型 18.7 小结与讨论 18.8 相关文献 18.9 习题 第 19 章 部分观测数据 19.1 基础知识 19.1.1 数据的似然和观测模型 19.1.2 观测机制的解耦 19.1.3 似然函数 19.1.4 可识别性 19.2 参数估计 19.2.1 梯度上升方法 19.2.2 期望最大化( EM ) 19.2.3 比较:梯度上升与 EM 19.2.4 近似推断 * 19.3 使用不完全数据的贝叶斯学习 * 19.3.1 概述 19.3.2 MCMC 采样 19.3.3 变分贝叶斯学习 19.4 结构学习 19.4.1 得分的结构 19.4.2 结构搜索 19.4.3 结构的 EM 19.5 用隐变量学习模型 19.5.1 隐变量的信息内容 19.5.2 确定基数 19.5.3 引入隐变量 19.6 小结 19.7 相关文献 19.8 习题 第 20 章 学习无向模型 20.1 概述 20.2 似然函数 20.2.1 一个例子 20.2.2 似然函数的形式 20.2.3 似然函数的性质 20.3 最大(条件)似然参数估计 20.3.1 最大似然估计 20.3.2 条件训练模型 20.3.3 用缺失数据学习 20.3.4 最大熵和最大似然 * 20.4 参数先验与正则化 20.4.1 局部先验 20.4.2 全局先验 20.5 用近似推理学习 20.5.1 信念传播 20.5.2 基于 MAP 的学习 * 20.6 替代目标 20.6.1 伪似然及其推广 20.6.2 对比优化准则 20.7 结构学习 20.7.1 使用独立性检验的结构学习 . 20.7.2 基于得分的学习:假设空间 . 20.7.3 目标函数 20.7.4 优化任务 20.7.5 评价模型的改变 20.8 小结 20.9 相关文献 20.10 习题 第Ⅳ部分 行为与决策 第 21 章 因果关系 21.1 动机与概述 21.1.1 条件作用与干预 21.1.2 相关关系和因果关系 21.2 因果关系模型 21.3 结构因果关系的可识别性 . 
21.3.1 查询简化规则 21.3.2 迭代的查询简化 21.4 机制与响应变量 * 21.5 功能因果模型中的部分可识别性 * 21.6 仅事实查询 * 21.6.1 成对的网络 21.6.2 仅事实查询的界 21.7 学习因果模型 21.7.1 学习没有混合因素的因果模型 21.7.2 从干预数据中学习 21.7.3 处理隐变量 * 21.7.4 学习功能因果关系模型 * 21.8 小结 21.9 相关文献 21.10 习题 第 22 章 效用和决策 22.1 基础:期望效用最大化 22.1.1 非确定性决策制订 22.1.2 理论证明 * 22.2 效用曲线 22.2.1 货币效用 22.2.2 风险态度 22.2.3 合理性 22.3 效用的获取 22.3.1 效用获取过程 22.3.2 人类生命的效用 22.4 复杂结果的效用 22.4.1 偏好和效用独立性 * 22.4.2 加法独立性特性 22.5 小结 22.6 相关文献 22.7 习题 第 23 章 结构化决策问题 23.1 决策树 23.1.1 表示 23.1.2 逆向归纳算法 23.2 影响图 23.2.1 基本描述 23.2.2 决策规则 23.2.3 时间与记忆 23.2.4 语义与最优性准则 23.3 影响图的逆向归纳 23.3.1 影响图的决策树 23.3.2 求和 - 最大化 - 求和规则 23.4 期望效用的计算 23.4.1 简单的变量消除 23.4.2 多个效用变量:简单的方法 23.4.3 广义变量消除 * 23.5 影响图中的最优化 23.5.1 最优化一个单一的决策规则 23.5.2 迭代优化算法 23.5.3 策略关联与全局最优性 * 23.6 忽略无关的信息 * 23.7 信息的价值 23.7.1 单一观测 23.7.2 多重观测 23.8 小结 23.9 相关文献 23.10 习题 第 24 章 结束语 附录 A 背景材料 A.1 信息论 . A.1.1 压缩和熵 A.1.2 条件熵与信息 A.1.3 相对熵和分布距离 A.2 收敛界 A.2.1 中心极限定理 A.2.2 收敛界 A.3 算法与算法的复杂性 A.3.1 基本图算法 A.3.2 算法复杂性分析 A.3.3 动态规划 A.3.4 复杂度理论 A.4 组合优化与搜索 A.4.1 优化问题 A.4.2 局部搜索 A.4.3 分支定界搜索 A.5 连续最优化 A.5.1 连续函数最优解的刻画 A.5.2 梯度上升方法 A.5.3 约束优化 A.5.4 凸对偶性 参考文献 符号索引 符号缩写
个人分类: 科研记事|44788 次阅读|23 个评论
Coursera: Neural Networks for ML- Lecture 13
zhuwei3014 2014-8-13 11:59
Belief Nets wake-sleep algorithm 1. The history of backpropagation 2. Belief Nets 3. Learning Sigmoid Belief Nets 4. Wake-Sleep Algorithm
个人分类: Coursera: Neural Networks|4971 次阅读|0 个评论
Coursera: Neural Networks for ML- Lecture 12
zhuwei3014 2014-8-7 13:38
2012_An efficient learning procedure for deep Boltzmann machines ICML2008_Training restricted Boltzmann machines using approximations to the likelihood gradient (PCD) 1. The Boltzmann Machine learning algorithm 2. More efficient ways to get the statistics (Optional) 3. Restricted Boltzmann Machine 4. An example of RBM learning 5. RBM for collaborative filtering
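As a rough sketch of the RBM learning example listed above, the snippet below performs a contrastive-divergence (CD-1) update in numpy on toy binary data. The sizes, learning rate and data are placeholders; a practical implementation, as in the cited papers, would add mini-batches, momentum and persistent chains (PCD).

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(10, 6)).astype(float)   # toy binary data
n_visible, n_hidden = 6, 3

W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.1

for epoch in range(100):
    # Positive phase: hidden probabilities and samples given the data
    ph = sigmoid(data @ W + b_h)
    h = (rng.random(ph.shape) < ph).astype(float)

    # Negative phase: one step of Gibbs sampling (CD-1 reconstruction)
    pv = sigmoid(h @ W.T + b_v)
    nh = sigmoid(pv @ W + b_h)

    # Approximate likelihood gradient: <v h>_data - <v h>_model
    W += lr * (data.T @ ph - pv.T @ nh) / len(data)
    b_v += lr * (data - pv).mean(axis=0)
    b_h += lr * (ph - nh).mean(axis=0)

print("reconstruction error:", np.mean((data - pv) ** 2))
```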
个人分类: Coursera: Neural Networks|2669 次阅读|0 个评论
翻译机GTS,我的新助手
zuojun 2014-7-16 17:05
花了一些时间,用英文写了一篇短文,纪念一位老教授 (因为我已经不习惯用中文写文章了)。可是,向中国的同事们递交一篇 英文 纪念 短文,我,我,我还没那个胆。于是,我决定借GTS的一臂之力: http://www.gts-translation.com/tools/free-online-translation/ I believe in mentoring —in memory of Prof. Chen 我 坚 信 为 人 师长是我们的义务 -以此纪念陈世训教授 When I look back on my life, I see more than my own footprints. There were my parents, of course. There were also teachers at every phase of my life, from grade school all the way through college. And, there were graduate school professors. Unlike in college when there were 50 students in one class, students in graduate schools had close interaction with their professors then. Whenever I look back on my school years, I am always filled with gratitude, especially toward my professors and mentors in meteorology andoceanography. 当我回首 往事 ,我看到在我自己的脚印周 围 , 还 有 许许 多多的脚印。有我父母的脚印。有每一 个阶段的 老 师们 的脚印,从小学一年 级 一直到大学 毕业 ;研究生院教授 们 的脚印更是清新可 见 。当时的研究生与教授之 间 有着比较密切的互 动 。每当想起我的学生 时 代,我就充 满 了感激之情,尤其是 对 我的气象学与海洋学的教授 们 和 导师们。 I have many fond memories about the Meteorology Department of Zhongshan University, where I was a graduate student and then stayed on briefly as a junior faculty. I remember riding bicycle across the old Pearl River Bridge everyday to use a “super computer” off campus for my thesis work. I learnt how to play volleyball from my colleagues at the department. Though I never became very good at it, I was ableto join beach volleyball games when I was a graduate student at Nova in South Florida. I don’t need to close my eyes to see you, Prof. Chen, always neatly dressed and always with a kind smile. I never forget what you told us in agroup meeting before my graduation, on the importance of solidarity. 我 对 中大气象系有 许 多美好的回 忆。在那儿,我度过了五个春秋, 三年的 硕 士研究生,然后留校工作。 为 了 硕 士 论 文的数据 计 算, 我们 每天 骑 自行 车 横跨珠江大 桥 去市里的一个 计 算中心工作 。 系里的 男教工教我们女教工打排球。虽然我一直打得不好,但在南佛罗里达州的 Nova 读研究生的时候,我也能够参加沙滩排球友谊赛。记忆中的陈先生总是衣着整齐,面带微笑。这一切,至今历历在目。我也不会忘记在毕业前的一次组会上,您 谆谆 教 诲 我们“要团结”。 (隐去 下面 三段。否则,原文发表时会被认为是“抄袭”。) ps. The Chinese version has been edited by a real person, not GTS, after I have worked hard on it.
个人分类: 中文博客|3025 次阅读|0 个评论
A smart machine to translate Chinese into English or E to C
zuojun 2014-6-30 17:40
I typed in 打 边 炉 and it gave me Hot pot. I am impressed! Now, you try it, and let me know how you like it (or not). http://www.gts-translation.com/tools/free-online-translation/ Is it perfect? Of course NOT! It knows 好好学习, but NOT 天天向上。
个人分类: Scientific Translation|3330 次阅读|0 个评论
Comparison of Pros and Cons of Two NLP Approaches
liwei999 2014-6-19 17:19
So it is time to compare and summarize the pros and cons of the two basic NLP (Natural Language Processing) approaches and show where they are complementary to each other. Some notes:
1. In text processing, the majority of basic robust machine learning is based on keywords, the so-called BOW (bag-of-words) model, although there is research on machine learning that goes beyond keywords. It actually utilizes n-gram (mostly bigram or trigram) linear word sequences to simulate the language structure.
2. Grammar engineering is mostly a hand-crafted rule system based on linguistic structures (often represented internally as a grammar tree), to simulate the linguistic parsing in the human mind.
3. Machine learning is good at viewing the forest (tasks such as document classification or word clustering from a corpus; and it fails in short messages) while rules are good at examining each tree (sentence-level tasks such as parsing and extraction; and it handles short messages well). This is understandable. A document or corpus contains a fairly big bag of keywords, making it easy for the machine to learn statistical clues of the words for a given task. Short messages do not have enough data points for a machine learning system to use as evidence. On the other hand, grammar rules decode the linguistic relationships between words to understand the sentence, therefore they are good at handling short messages.
4. In general, a machine learning system based on keyword statistics is recall-oriented while a rule system is precision-oriented. They are complementary in these two core metrics of data quality. Each rule may only cover a tiny portion of the language phenomena, but once it captures it, it is usually precise. It is easy to develop a highly precise rule system but the recall typically only picks up incrementally in accordance with the number of rules developed. Because keyword-based machine learning has no knowledge of sentence structures (at best its n-gram evidence indirectly simulates language structure), it usually cannot reach high precision, but as long as the training corpus is sizable, good recall can be expected by the nature of the underlying keyword statistics and the disregard for structural constraints.
5. Machine learning is known for its robustness and scalability as its algorithms are based on science (e.g. MaxEnt is based on information theory) that can be repeated and rigorously tested (of course, as in any application area, there are tricks and know-how to make things work or fail in practice). The development is also fast once the labeled corpus is available (which is often not easy in practice) because there are off-the-shelf tools in open source and tons of documentation and literature in the community for proven ML algorithms.
6. Grammar engineering, on the other hand, tends to depend more on the expertise of the designer and developer for being robust and scalable. It requires deep skills and secret sauce which may only be accumulated based on years of successes as well as lessons learned. It is not purely a scientific undertaking but more of a balancing art in architecture, design and development. To a degree, this is like chefs for Chinese cooking: with the same materials and presumably the same recipe, one chef's dish can taste a lot better or different from that of another chef. The recipe only gives a framework while the secret of great taste is in the details of know-how. It is not easily repeatable across developers but the same master can repeatedly make the best quality dishes/systems.
7. The knowledge bottleneck is reflected both in machine learning systems and in grammar systems. A decent machine learning system requires a large hand-labeled corpus (research-oriented unsupervised learning systems do not need manual annotation, but they are often not practical either). There is consensus in the community that the quality of machine learning usually depends more on the data than on the algorithms. On the other hand, the bottleneck of grammar engineering lies in skilled designers (data scientists) and well-trained domain developers (computational linguists), who are often in short supply today.
8. Machine learning is good at coarse-grained, specific tasks (a typical example is classification) while grammar engineering is good at fine-grained analysis and detailed insight extraction. Their respective strengths make them highly complementary in certain application scenarios because, as information consumers, users often demand both a coarse-grained overview as well as details of actionable intelligence.
9. One big problem of a machine learning system is the difficulty of fixing a reported quality bug. This is because the learned model is usually a black box and no direct human intervention is allowed or even possible to address a specific problem unless the model is re-trained with a new corpus and/or new features. In the latter case, there is no guarantee that the specific problem we want to solve will be addressed well by re-training, as the learning process needs to balance all features in a unified model. This issue is believed to be the major reason why the Google search ranking algorithm favors hand-crafted functions over machine learning, because their objective of better user experience can hardly be achieved by a black box model.
10. A grammar system is much more transparent in the language understanding process. The modern grammar systems are all designed with careful modularization so that each specific quality bug can be traced to the corresponding module of the system for fine-tuning. The effect is direct, immediate and can be incrementally accumulated for overall performance enhancement.
11. From the perspective of NLP depth, at least for the current state of the art, machine learning seems to do shallow NLP work fairly well while grammar engineering can go much deeper in linguistic parsing to achieve deep analytics and insights. (The on-going deep learning research program might get machine learning somewhat deeper than before, but it remains to be seen how effectively it can do real deep NLP and how deep it can go, especially in the area of text processing and understanding.)
Related blogs: why hybrid? on machine learning vs. hand-coded rules in NLP 再谈机器学习和手工系统：人和机器谁更聪明能干？ 【置顶：立委科学网博客NLP博文一览(定期更新版)】
个人分类: 立委科普|11202 次阅读|0 个评论
[转载] Is Google ranking based on machine learning?
liwei999 2014-6-18 17:21
Quora has a question with discussions on "Why is machine learning used heavily for Google's ad ranking and less for their search ranking?": A lot of people I've talked to at Google have told me that the ad ranking system is largely machine learning based, while search ranking is rooted in functions that are written by humans using their intuition (with some components using machine learning).
Surprise? Contrary to what many people have believed, Google search consists of hand-crafted functions using heuristics. Why?
One very popular reply there is from Edmond Lau, Ex-Google Search Quality Engineer, who said something which we have been experiencing and have indicated over and over in my past blogs on Machine Learning vs. Rule System, i.e. it is very difficult to debug an ML system for specific observed quality bugs while a rule system, if designed modularly, is easy to control for fine-tuning:
From what I gathered while I was there, Amit Singhal, who heads Google's core ranking team, has a philosophical bias against using machine learning in search ranking. My understanding of the two main reasons behind this philosophy is: In a machine learning system, it's hard to explain and ascertain why a particular search result ranks more highly than another result for a given query. The explainability of a certain decision can be fairly elusive; most machine learning algorithms tend to be black boxes that at best expose weights and models that can only paint a coarse picture of why a certain decision was made. Even in situations where someone succeeds in identifying the signals that factored into why one result was ranked more highly than another, it's difficult to directly tweak a machine learning-based system to boost the importance of certain signals over others in isolated contexts. The signals and features that feed into a machine learning system tend to only indirectly affect the output through layers of weights, and this lack of direct control means that even if a human can explain why one web page is better than another for a given query, it can be difficult to embed that human intuition into a system based on machine learning. Rule-based scoring metrics, while still complex, provide a greater opportunity for engineers to directly tweak weights in specific situations.
From Google's dominance in web search, it's fairly clear that the decision to optimize for explainability and control over search result rankings has been successful at allowing the team to iterate and improve rapidly on search ranking quality. The team launched 450 improvements in 2008, and the number is likely only growing with time.
Ads ranking, on the other hand, tends to be much more of an optimization problem where the quality of two ads is much harder to compare and intuit than two web page results. Whereas web pages are fairly distinctive and can be compared and rated by human evaluators on their relevance and quality for a given query, the short three- or four-line ads that appear in web search all look fairly similar to humans. It might be easy for a human to identify an obviously terrible ad, but it's difficult to compare two reasonable ones: Branding differences, subtle textual cues, and behavioral traits of the user, which are hard for humans to intuit but easy for machines to identify, become much more important. Moreover, different advertisers have different budgets and different bids, making ad ranking more of a revenue optimization problem than merely a quality optimization problem.
Because humans are less able to understand the decision behind an ads ranking decision that may work well empirically, explainability and control -- both of which are important for search ranking -- become comparatively less useful in ads ranking, and machine learning becomes a much more viable option.
Jackie Bavaro, Google PM for 3 years:
Edmond Lau's answer is great, but I wanted to add one more important piece of information. When I was on the search team at Google (2008-2010), many of the groups in search were moving away from machine learning systems to rules-based systems. That is to say that Google Search used to use more machine learning, and then went the other direction because the team realized they could make faster improvements to search quality with a rules-based system. It's not just a bias, it's something that many sub-teams of search tried out and preferred. I was the PM for Images, Video, and Local Universal - 3 teams that focus on including the best results when they are images, videos, or places. For each of those teams I could easily understand and remember how the rules worked. I would frequently look at random searches and their results and think "Did we include the right Images for this search? If not, how could we have done better?" And when we asked that question, we were usually able to think of signals that would have helped - try it yourself. The reasons why *you* think we should have shown a certain image are usually things that Google can actually figure out.
Anonymous:
Part of the answer is legacy, but a bigger part of the answer is the difference in objectives, scope and customers of the two systems. The customer for the ad system is the advertiser (and by proxy, Google's sales dept). If the machine-learning system does a poor job, the advertisers are unhappy and Google makes less money. Relatively speaking, this is tolerable to Google. The system has an objective function ($) and machine learning systems can be used when they can work with an objective function to optimize. The total search space (# of ads) is also much, much smaller. The search ranking system has a very subjective goal - user happiness. CTR, query volume etc. are very inexact metrics for this goal, especially on the fringes (i.e. query terms that are low-volume/volatile). While much of the decisioning can be automated, there are still lots of decisions that need human intuition. To tell whether site A is better than site B for topic X with limited behavioural data is still a very hard problem. It degenerates into lots of little messy rules and exceptions that try to impose a fragile structure onto human knowledge, and that necessarily needs tweaking. An interesting question is - is the Google search index (and associated semantic structures) catching up (in size and robustness) to the subset of the corpus of human knowledge that people are interested in and searching for? My guess is that right now, the gap is probably growing - i.e. interesting/search-worthy human knowledge is growing faster than Google's index. Amit Singhal's job is probably getting harder every year.
By extension, there are opportunities for new search providers to step into the increasing gap with unique offerings. p.s: I used to manage an engineering team for a large search provider (many years ago). 【置顶:立委科学网博客NLP博文一览(定期更新版)】
个人分类: 立委科普|3747 次阅读|0 个评论
求解OSCAR regularization问题的长文
热度 1 zengxrong 2013-9-25 17:43
This is our full-length paper on solving the OSCAR regularization problem: http://arxiv.org/pdf/1309.6301.pdf
个人分类: 压缩感知|1849 次阅读|2 个评论
machine learning学习小感
a6657266 2013-7-17 16:55
I watched Andrew Ng's online courses on machine learning and feature learning, got a lot out of them, and many concepts that were unclear to me before are now clear. Most of these points really need the formulas to be explained properly; for convenience I only list the textual points here.
- Machine learning essentially keeps looking for ways to make the cost function as small as possible.
- Feature scaling is very important; otherwise it slows down gradient descent.
- The learning rate can be initialized with about 10 candidate values, 0.001, 0.003, 0.009, ..., and you pick the one that makes the cost function decrease fastest.
- A matrix that is not invertible is also called a singular matrix.
- Using Newton's method to find the best parameters is fast but computationally heavy; it can be used when the number of training examples is below about 1000, and then it works better than gradient descent.
- Regularization is there to suppress overfitting.
- Stochastic gradient descent is more commonly used, because each iteration only needs to compute one example instead of the sum over all examples.
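A small sketch, on made-up data, of two of the notes above: scale the features before running gradient descent, and sweep several learning rates to see which one drives the cost function down fastest. It is only a linear-regression toy, not code from the course.

```python
import numpy as np

rng = np.random.default_rng(42)
X_raw = np.column_stack([rng.uniform(0, 2000, 100),      # e.g. a feature on a large scale
                         rng.uniform(1, 5, 100)])        # e.g. a feature on a small scale
y = X_raw @ np.array([0.3, 10.0]) + 5 + rng.normal(0, 1, 100)

# Feature scaling (mean normalization) so gradient descent converges faster
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.column_stack([np.ones(len(X)), X])                # add intercept term

def run_gd(alpha, steps=200):
    """Batch gradient descent on the squared-error cost; returns the final cost."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= alpha * grad
    return np.mean((X @ theta - y) ** 2) / 2

# Try a few learning rates and keep the one with the lowest final cost
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    print(alpha, run_gd(alpha))
```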
4510 次阅读|0 个评论
[转载]scikit-learn: machine learning in Python
chuangma2006 2013-4-27 07:18
scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages. It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering. website: http://scikit-learn.org/dev/index.html
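A minimal usage sketch (assuming scikit-learn is installed; the package imports as sklearn, and the iris dataset ships with the library):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load a small built-in dataset and split it into train/test parts
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classifier and report its accuracy on the held-out data
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

The same fit/score pattern applies across the library's classifiers and regressors, which is what makes it convenient as a general-purpose tool.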
个人分类: Python|3187 次阅读|0 个评论
[转载]CiteSpace(Could not create the Java virtual machine)
jerrycueb 2013-2-20 18:40
When CiteSpace is launched via Web Start, it now requests up to 1 GB of memory for the Java virtual machine (JVM). If the memory on your computer cannot meet this requirement, you will run into the "Could not create the Java virtual machine" problem and CiteSpace will not start. There are two solutions:
1. Add more memory.
2. Pick the launcher below that matches the memory you actually have:
0.5GB: http://cluster.ischool.drexel.edu/~cchen/citespace/current/citespace512mb.jnlp
1.5GB: http://cluster.ischool.drexel.edu/~cchen/citespace/current/citespace1.5gb.jnlp
2.0GB: http://cluster.ischool.drexel.edu/~cchen/citespace/current/citespace2gb.jnlp
3.0GB: http://cluster.ischool.drexel.edu/~cchen/citespace/current/citespace3gb.jnlp
4.0GB: http://cluster.ischool.drexel.edu/~cchen/citespace/current/citespace4gb.jnlp
个人分类: 知识图谱|3102 次阅读|0 个评论
开始科研,路漫漫其曲折
热度 1 ayzcq 2013-1-16 14:41
博士阶段第一个学期即将结束,回顾一下,有所得,有所惑。写在此处,算一个非正式年终小结,也借此整理一下思绪吧。
1. 8月初-9月中,大约一个半月时间,适应并学习了Structured SVM,感觉machine learning挺有意思,但是对于自己来说,确实有些难,硬着头皮看tutorial,看paper,终于在第一次自己讲组会后,有了阶段性理解。
2. 9月底-期末,做实验,效果一直不好,直到现在,model似乎看不出问题,但是就是没有预想的效果。开始的时候是code漏洞百出,千锤百炼之后,code问题的可能已经极小。到底怎么回事儿呢?数据有问题?kernel map有问题?code有问题?模型真的有问题?
1687 次阅读|1 个评论
A sad day for machine learning
benyang22 2012-12-21 14:06
Machine learning algorithms are used to identify VPN traffic. Internet control in China seems to have been tightened recently, according to the Guardian. Several VPN providers claimed that the censorship system can 'learn, discover and block' encrypted VPN protocols. Using machine learning algorithms in protocol classification is not exactly a new topic in the field. And given the fact that even the founding father of the 'Great Firewall', Fang Binxing himself, has also written a paper about utilizing machine learning algorithms in encrypted traffic analysis, it would not be surprising at all if they are now starting to identify suspicious encrypted traffic using numerically efficient classifiers. So the arms race between anti-censorship and surveillance technology goes on.
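For readers unfamiliar with how such protocol classification is typically set up, here is a rough, purely illustrative sketch: each flow is summarized by a few statistics and fed to an ordinary classifier. The feature set, numbers and labels below are invented for illustration and are not taken from any of the cited work:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # hypothetical flow features: [mean packet size, std packet size,
    #                              mean inter-arrival time, flow duration]
    # toy labels: 1 = VPN-like encrypted tunnel, 0 = other traffic (not real data)
    rng = np.random.RandomState(0)
    X_vpn   = rng.normal([900, 120, 0.02, 60], [80, 30, 0.01, 20], size=(200, 4))
    X_other = rng.normal([500, 300, 0.10, 10], [150, 80, 0.05, 5], size=(200, 4))
    X = np.vstack([X_vpn, X_other])
    y = np.array([1] * 200 + [0] * 200)

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.predict([[880, 110, 0.02, 55]]))   # classify a new flow by its statistics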
个人分类: 饭后胡言|2879 次阅读|0 个评论
a few useful things to know about machine learning
justinzhao 2012-12-15 17:51
(1) A few useful things to know about machine learning
(2) The attached paper is a translation of this paper.
个人分类: 读书日记|3935 次阅读|0 个评论
[转载]机器学习几本书:my list of cool machine learning books
jiandanjinxin 2012-11-27 16:09
http://matpalm.com/blog/2010/08/06/my-list-of-cool-machine-learning-books/

0) "Machine Learning: a Probabilistic Perspective" by Kevin Patrick Murphy
Now available at amazon.com and other vendors. Electronic versions (e.g., for Kindle) will be available later in the Fall. The book's page also offers the table of contents, Chapter 1 (Introduction), information for instructors from MIT Press (if you are an official instructor, you can request an e-copy, which can help you decide if the book is suitable for your class; you can also request the solutions manual), errata, Matlab software, and all the figures together with matlab code to generate them.

1) "programming collective intelligence" by toby segaran
if you know nothing about machine learning and haven't done maths since high school then this is the book for you. it's a fantastically accessible introduction to the field. includes almost no theory and explains algorithms using actual python implementations.

2) "data mining" by witten and frank
this book covers quite a bit more than programming c.i. while still being extremely practical (ie very few formula). about a fifth of the book is dedicated to weka, a machine learning workbench which was written by the authors. apart from the weka section this book has no code. i made a little screencast on weka awhile back if you're after a summary.

3) "introduction to data mining" by tan, steinbach and kumar
covers almost the same material as the witten/frank text but delves a little bit deeper and with more rigour. includes no code (none of the books do from now on) with algorithms described by formula. has a number of appendices on linear algebra, probability, statistics etc so that you can read up if you're a bit rusty or new to the fields (the witten/frank text lack these). some people might argue having both of these books is a waste since they cover so much of the same ground but i've always found multiple explanations from different authors to be a great way to help understand a topic. i read the witten/frank text first and am glad i did but if i could only keep one i'd keep this one.

intermission
at this point you've probably got enough mental firepower to handle some of the uni level machine learning course notes that are floating about online. if you're keen to get a better foundation of the maths side of things it'd be worth working through andrew ng's lecture series on machine learning (20 hours of a second year stanford course on machine learning). i also found andrew moore's lecture slides really great (they do though require a reasonable understanding of the basics).

4) "foundations of statistical natural language processing" by manning and schutze
not a machine learning book as such but great for learning to deal with one of the most common types of data around; text. since most of machine learning theory is about maths (ie numbers) this is awesome in helping to understand how to deal with text in a mathematical context.

5) "introduction to machine learning" by ethem alpaydin
covers generally the same sort of topics as the data mining books but with much more rigour and theory (derivations, proofs, etc). i think this is a good thing though since understanding how things work at a low level gives you the ability to tweak and modify as required. loads more formulas but again with appendices that introduce the basics in enough detail to get by.
6) "all of statistics" by larry wasserman
by this stage you'll probably have an appreciation of how important statistics is for this domain and it might be worth focusing on it for a bit. personally i found this book to be a great read and though i've only read certain sections in depth i'm looking forward to when i get a chance to work through it cover to cover.

7) "the elements of statistical learning" by hastie, tibshirani and friedman
with a bit more stats under your belt you might have a chance of getting through this one; the most complex of the lot. this book is absolutely beautifully presented and now that it's FREE to download you've got no reason not to have a crack at it. a remarkable piece of work and one i've yet to get through fully cover to cover, it's quite hardcore and right on the border of my level of understanding (which makes it perfect for me :P)

ps. books i haven't read that are in the mail
"machine learning" by tom mitchell
have been wanting to read this one for awhile, i'm a big fan of tom mitchell, but couldn't justify the cost. however just found out the other day the paperback is a third of the price of the hardback i was looking at!! the book's in the mail.
"pattern recognition and machine learning" by chris bishop
all of a sudden seemed like everyone was reading this but me so it was time to jump on the bandwagon.

《模式分类》如果是计算机、物理背景的,先看Bishop的Machine Learning and Pattern Recognition,然后看T. Hastie的Elements of Statistical Learning;如果是数学、统计背景的,调转个顺序就可以了。
Bishop的那本太厚,推荐Jordan的统计学习的课件,全面,难度适中 http://www.cs.berkeley.edu/~jordan/courses/281B-spring04/
如果实在对英文没兴趣,可以看看李航的那本统计学习,比较基础。
如果仅仅想看看这方面的应用情景,推荐吴军的数学之美。
以上内容转自 http://www.zhizhihu.com/html/y2012/4019.html
2974 次阅读|0 个评论
[转载]machine learning non-linear: SVM, Neural Network
genesquared 2012-11-8 15:59
Support vector machine - Wikipedia, the free encyclopedia
en.wikipedia.org/.../Support_vector_machin...
In machine learning, support vector machines (SVMs, also support vector ...) ... SVMs can efficiently perform non-linear classification using what is called the kernel ... Formal definition - History - Motivation - Linear SVM

Machine Learning: Do there exist non-linear online (stochastic ...
www.quora.com/Machine-Learning/Do-there-exist-no...
Not sure weather I got it right but ... Artificial Neural Networks (ANN) are able to capture non-linear hypothesis. Furthermore, extensions to ANN incorporate ...

Machine Learning
scianta.com/technology/machinelearning.htm
Linear and Non-Linear Regression is a machine learning technique for fitting a curve to a collection of data. The algebraic formula for the curve is a model of the ...

Foundations of Machine Learning Lecture 5 [PDF]
www.cs.nyu.edu/~mohri/mls/lecture_5.pdf
作者:M Mohri - Mehryar Mohri - Foundations of Machine Learning. Motivation. Non-linear decision boundary. Efficient computation of inner products in high dimension. Flexible ...

Machine Learning: Learning highly non-linear functions [PDF]
www.cs.cmu.edu/~epxing/Class/10701.../lecture5.pdf
1. Machine Learning. Neural Networks. Eric Xing. 10-701/15-781, Fall 2011. Lecture 5, September 26, 2011. Reading: Chap. 5 CB. © Eric Xing ...

Machine Learning: Learning non-linear functions [PDF]
www.cs.cmu.edu/~epxing/Class/10701.../lecture6.pdf
Lecture 6, February 4, 2008. Reading: Chap. 1.6, CB Chap 3, TM. Learning non-linear functions f: X → Y. X (vector of) continuous and/or discrete vars ...

machine learning - non linear svm kernel dimension - Stack Overflow
stackoverflow.com/.../non-linear-svm-kernel...
I have some problems with understanding the kernels for non-linear ... The transformation usually increases the number of dimensions of your ...

Investigation of expert rule bases, logistic regression, and non-linear ...
www.ncbi.nlm.nih.gov/pubmed/19474477
作者:MC Prosperi - 2009 - 被引用次数:17 - Investigation of expert rule bases, logistic regression, and non-linear machine learning techniques for predicting response to antiretroviral treatment. Prosperi ...

Original article: Investigation of expert rule bases, logistic regression ... [PDF]
www.intmedpress.com/serveFile.cfm?sUID...847b...
作者:MCF Prosperi - 2009 - 被引用次数:17 - developed through machine learning methods. Methods: The aim of the study was to investigate linear and non-linear statistical learning models for classifying ...

machine learning - Non-linear (e.g. RBF kernel) SVM with SCAD ...
stats.stackexchange.com/.../non-linear-e-g-r...
1 Mar 2012 – Is there one? I think there's a penalizedSVM package in R but it looks to use a linear kernel. Can't quite tell from the documentation. If it's linear ...
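A minimal sketch of the "non-linear classification via the kernel trick" idea that runs through the links above, using scikit-learn's SVC on a toy dataset (the dataset and hyper-parameters are arbitrary choices for illustration only):

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

    linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
    rbf_svm    = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)  # kernel trick: implicit non-linear feature map

    print("linear kernel accuracy:", linear_svm.score(X, y))
    print("RBF kernel accuracy:   ", rbf_svm.score(X, y))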
2317 次阅读|0 个评论
[J-2012] Characteristics of chip evolution with elevating cu
melius 2012-9-4 20:38
Characteristics of chip evolution with elevating cutting speed from low to very high Liu Zhanqiang*, Su Guosheng To validate the correlation between chip morphology and material dynamic mechanical properties, a wide-range cutting speed (from 30 m/min to 7000 m/min) experiment is conducted with AerMet100 steel. The chips are collected and photos are taken with an optical microscope. The focus is put on workpiece material embrittlement and chip morphology evolution with the cutting speed rising. It is found that with the increase of cutting speed the workpiece material embrittles. At 7000 m/min the metal becomes completely brittle and the chip is made up of small non-plastic fractured fragments. Characteristics of the cutting temperature and cutting heat in the process are also presented. 2012-Characteristics of chip evolution with elevating cutting speed from low to .pdf
个人分类: [Publications] 论文全文|2569 次阅读|0 个评论
[转载]machine learning libraries in Python
chuangma2006 2012-8-5 08:31
http://amundblog.blogspot.com/2008/05/pragmatic-classification-with-python.html
个人分类: Python|2969 次阅读|0 个评论
What is Life? -- a Turing Machine!
jingpeng 2012-7-16 18:41
What is Life? -- a Turing Machine!
What is life? 是在1943年,薛定谔在都柏林做的一系列演讲题目,并汇总写成了一本书,就叫《What is Life?》,在我博客《 薛定谔的深刻洞见 》里有介绍。这个问题和这本书吸引了很多物理学家投身生命科学,最终发现DNA的双螺旋结构,破解了很多生命的密码。现在,2012年,文特尔(Craig Venter)在都柏林又做了一次演讲,题目是《What is Life? – A 21 st century perspective》,用现代的生物学发现,试图回答这个问题,阐释生命的本质。 Venter认为,生命的本质就是图灵机,生命的过程就像是运行一个软件!这个软件就编码在DNA中,并且可以修改!Venter组建了 Celera Genomic ,先于人类基因组计划完成测序,这已经是一个了不起的成就。Venter还创立了 JCVI( J. Craig Venter Institute ),通过组建DNA,创造了生命,引发了合成生物学的革命。可以说,在Venter眼里,生命确实是一堆超长序列编码的程序,人生就是这个程序的运行。这个程序可以修改,但必须非常小心,因为每个部分是紧密相连的。胡乱修改很容易造成系统崩溃。 从人性的角度,这是让很多人难以接受的。一个大活人,怎么变成了一部机器和程序了?这生命是不是太没意义了? 我认同Venter的观点,但这并不会导致人生没有意义。天生的东西难以改变,这是父母遗传。本身可以看作自然的馈赠,构成了自我的一部分,但并不是全部的自我。全部的自我还包括后天的经历,人生的轨迹等等。当然,还有大脑的发育和记忆的形成,都是基于基因,但超越基因的。相同的基因,得到类似的大脑结构,就像构建了一个计算机硬件。但里面运行什么程序,记载了什么内容,则是后天的自我经历造成的。每个人的身体和脑结构差不多,但还是可以有不一样的人生。打个不太恰当的比喻,几台计算机,有的装了Windows,有的装了Linux,有的有魔兽,有的有星际,运转起来是很不一样的。即使是都装了星际,选择的地图,操作的玩家,选择的战略战术,还有微操,对手的战略战术等,都会造成一局独一无二的游戏。这个游戏就是自我。 Craig Venter 网上热传的,延参法师讲解绳命。这也可以对比中西方的文化和思维,这两人还有点像,有木有?!中国人的思维方式还是模糊的,西方人一直在追求精确的理解。这是普通大众的兴趣造成的,也就是一种文化。在博文《 文化之殇--传统对锐气的消磨 》里有分析。 参考资料: 【1】 “人造生命之父”文特尔:生命是一台图灵机 ,中国科学报,2012 【2】 ‘What is Life? – A 21st Century Perspective’ with Dr. Craig Venter
个人分类: 文化-评论|3940 次阅读|0 个评论
[转载]Free online course: Learning from data (July 10—
热度 1 zuojun 2012-7-8 07:36
http://www.work.caltech.edu/telecourse.html
A real Caltech course, not a watered-down version
Free, introductory Machine Learning course
Taught by Caltech Professor Yaser Abu-Mostafa
Lectures recorded from a live broadcast, including Q&A
Prerequisites: Basic probability, matrices, and calculus
Homeworks with online grading and ranking
Discussion forum for participants
Summer session starts on July 10, 2012
个人分类: Education|2285 次阅读|2 个评论
探秘公钥与私钥
yufree 2012-6-16 14:09
探秘公钥与私钥
最近读了阮一峰关于数字签名的 介绍 ,中间有一个问题一直困扰我:既然公钥与私钥是不同的,那它们又是如何保证可逆的解读明文与密文呢?直接讨论这个问题理解上有点困难,先从简单的加密与解密开始吧。 1什么是密码 这个问题似乎很简单,密码学里密码(cipher)就是用来加密与解密的运算法则。一般使用时,密码(cipher)跟暗语(code)是差不多的,但在古典密码的范畴里,code更多指利用codebook解读的无规律语句,而cipher则强调存在相应的运算法则。 2密码的分类 如下图,密码被分为古典密码与现代密码,至于中间的那个Rotor machine是用来解决流密码的一种机械解码装置,算是古典密码的一种,因为区别古典密码与现代密码的最主要方式就是看密码的表示方式是否是二进制的。古典密码主要采用移位与替换来实现加密与解密,众所周知的凯撒密码就是一种移位密码;现代密码主要包括私钥加密技术与公钥加密技术,其中私钥加密技术本质上与古典密码差不多,加密解密的key是一致的,但公钥加密技术就不一样了,其加密与解密的key是不一样的。这就是今天所要讨论的问题,这样的加密机制是什么。 3密码的有效性 在讨论公钥加密机制之前,有必要先讨论下密码的有效性问题。众所周知,密码是用来保密的,如果很容易被第三方解密那就没什么意义了,那么密码是如何保证自己不被破解呢?首先,密码可以借用随机数来实现,但这里提到的随机数必须是真随机数而不是由函数提供的伪随机数,例如 费纳姆密码 就是采用明文加随机数来实现,shannon就证明过理论不可解的密码的key必须至少与明文一样长而且要加入随机数,但因为随机数实际上往往并不随机,所以事实上存在被破解的可能;其次,密码可以通过庞大的运算量来实现,换句话说就是解密的成本(如时间或运算量)过大,而明文时效性有限,这样基于实际上解密的不经济性而提供的密码正是当今使用的主流;最后,上面两种问题如果不考虑技术限制都可能被破解,这其中一个重要的原因就是明文代入的规律性,也就是说,密文带有明文的统计信息或自然语义信息越多就越有可能被破解,但另一方面,如果密文没什么规律而明文又是有规律的那这种加密方式本身事实上就是包含了明文,这样的加密方式并不具备通用性因而意义也不大,爱伦坡的小说中就出现过一个密文全是i的信,这样的密码解起来自然就不太可能了。为了实现密码的有效性与通用性,私钥加密技术就出现了,例如DES算法、AES算法……这类算法虽然不错但推广时会发现如果两个人用所需要的key还少一些,但在互联网上,数以亿计的用户如果在实现通信时互换key,且不说这个过程有多麻烦,单是key的数量就是天文数字,因此公钥加密技术出现了。 4公钥密码加密的理论基础 关于公钥密码的表现形式,请参照阮一峰的博文。这里要谈的是为什么加密与解密不用同一把钥匙。 直觉上,加密与解密是互逆过程,也就是说算法也应该互逆。现代密码中采用2进制通讯,所以这类算法至少对二进制互逆,最常用的莫过于XOR算法。这类算法的实现就是对比明文与key的二进制代码,数值一致则密文为0,否则为1;这样解密时就简单多了,同样的key去对比,明文是1,key是0则密文为1,解密出的明文也还原为1。当然这说的是不包含其他算法的加密解密,但过程上只要可逆,我们直觉上就更易接受。 在公钥加密中,主要采用了单向函数。通俗地说,单向函数的意义就在于逆向运算困难,这里主要采用的是素因数分解与离散对数两种运算。 1)素因数分解。这个不难理解,给你一个数,例如42,你可以规定是通过2×3×7得到的,但它也可能是6×7得到的,对于小数字似乎不明显,但对于一个大数而言例如1675401,你能看出来是3×7×19×19×13×17得到的吗?如果里面的素数因子越大,这样的反算就越困难。但相反的,如果你知道数m的n-1个因子就很容易得到另一个未知的因子,这就是key,也就是攻克单向函数的单向陷门函数。 2)离散对数。如果你知道a^b=c这个公式中的a与b,那么c很容易就求出来了,但如果知道的是a与c求b就悲剧了,这就是离散对数问题。 那么这些运算方法如何运用在公钥加密里的呢?以RSA算法为例讲解的话,就要了解下取模运算,取模运算可以保证数n的运算结果只出现在1~n-1之间,且也能保证四则运算都可以进行,这对于擅长处理有限域问题的计算机而言是十分有利的。那么取模运算又是如何与上面的素因数分解结合在一起的呢?这就依赖 费马小定理 了,如果两个数a与n互素的话,那么a^(n-1)!=1(mod n)。但事实上,费马小定理判定出的数不见得就是素数,如3215031751这个数在a=2,a=3,a=5,a=7上是满足费马小定理的,但这是个伪素数(151×751×28351)。但在实际应用上是可以接受这种误差的,所以这个问题到不用太纠结。事实上,费马小定理只是欧拉定理的一个特例,事实上,a的欧拉函数(只小于n与n互素的数的数量)次方对n取模都得1,所以我们可以看到事实上在1~(n-1)的所有所有整数上,有 a^(kL+1)=a(mod n) 其中,L是n=pq中(p-1)与(q-1)的最小公倍数,k为非负整数 那么对于RSA算法而言是如何利用这些规律的呢? 首先,加密的表示是C=P^e(mod N),这里P是明文,C是密文,e N都是公钥;解密的表示是P=C^d(mod N),其中d是私钥。将解密的公式代入会发现事实上有C=C^ed(mod N) ,那么这里面的问题就很明显了,只要ed=kL+1的话这一组加密与解密就成立,更重要的是,L正是素因数的最小公倍数,这就将素因数分解与加密相结合了,同时也可以发现用不同的e(公钥)可以得到不同的d,一般为了安全性e会选的大一点这样对应的d也会大一些,破解起来就更麻烦。在RSA算法中,明文通过编码方式转换为二进制码,二进制码通过分组转换为10进制的数,然后通过公钥进行加密,解密的时候根据私钥计算得到明文,然后转成二进制码,之后转为明文就可以了。从这一过程可以看出事实上加密与解密的关键在于公钥与密钥一定是配对的而不是独立的。上面是用素因数分解构成的算法,实际离散对数也可以构成相应的算法,只不过还需要一个随机数来辅助。说到这里,你会发现在这个加密体系中运算量实际是很大的,与之形成对比的是私钥加密体系,其运算速度快,但问题就是发放钥匙比较麻烦,所以实际使用时往往采用hybird加密方法,也就是用公钥体系加密私钥体系的钥匙,用私钥加密大量的明文信息且一同封包传输。至于说不加密的明文传输中想知道文本信息是否被篡改,可以通过校检hash函数值来实现(毕竟不是每条信息都需要加密)。如果还不放心,可以用依赖私有钥匙的hash函数来校检。再不放心就要用电子签名了,这一点在阮一峰的博文里提到了就不赘述了。 5公钥的应用实现 前面扯了一些基础,可能你会奇怪,这么复杂的过程怎么感觉不到呢?其实很容易感觉到,我们需要的是有公信力的第三方颁布的公钥证书而已。当我们浏览加密网页时,服务器传输的事实是用自己私钥加密过的密文与证书,而我们的浏览器需要做的就是通过证书管理器来校检这证书是否靠谱,不靠谱就会提示错误而靠谱就会用证书里的公钥与服务器通讯。KK在《失控》里提到加密必胜,并认为这是节制互联网无限链接的法宝,没错,无规矩的自由是混乱,有隐私的互联才会稳定。
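按照上文 C = P^e (mod N)、P = C^d (mod N)、以及 ed 与 L(即 (p-1) 与 (q-1) 的最小公倍数)满足 ed ≡ 1 (mod L) 的关系,下面用极小的素数写一个玩具版 RSA 做演示(纯属示意,素数太小毫无安全性;取值均为随意假设,求模逆用到 Python 3.8+ 的 pow(e, -1, L)):

    from math import gcd

    # toy key generation with tiny primes (insecure, for illustration only)
    p, q = 61, 53
    N = p * q                                     # 3233
    L = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1) = 780
    e = 17                                        # public exponent, coprime to L
    d = pow(e, -1, L)                             # private exponent: e*d ≡ 1 (mod L), Python 3.8+

    def encrypt(P):                               # C = P^e mod N
        return pow(P, e, N)

    def decrypt(C):                               # P = C^d mod N
        return pow(C, d, N)

    P = 65                                        # a "plaintext" block, must be < N
    C = encrypt(P)
    print(C, decrypt(C))                          # decrypt(encrypt(P)) == P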
个人分类: 科搜研手册|5400 次阅读|0 个评论
[转载]Science Blog 2012年06月13日 20:10 (星期三)
xupeiyang 2012-6-13 21:24
http://scienceblog.com/
Blurring the line between man and machine
Work out harder through motivation in a pill?
Do Trees Crave Personal Space?
MicroRNAs, autophagy and clear cell renal cell carcinoma
Tighten your Tropical Belts: Climate Change in the North
Volcanic gases could deplete ozone layer
Contaminated Alcohol Pads Tied to Illnesses in Children’s Hospital
Why are some people greener than others?
Radiation-resistant circuits can survive space, damaged nuclear plants
NASA Sees Smoke from Siberian Fires Reach the U.S. Coast
Mild stress can affect perceptions same as life-threatening stress
A father’s love is one of the greatest influences on personality development
个人分类: 科学博客|1677 次阅读|0 个评论
局部线性支持向量机——Local Linear Support Vector Machine
rockycxy 2012-6-12 09:10
ICML 2011 的文章,是对SVM的改进,主要是考虑性能与训练时间之间的平衡。实验部分,性能虽然和最好的方法有一定的差距,但是训练测试时间大大缩短。这篇文章的想法比较巧妙,值得一读。
2 次阅读|0 个评论
[转载]machine learning界的大牛们
hjanime 2012-5-19 04:17
闲着无事,想写点一些我所了解的machine learning大家。由于学识浅薄,见识有限,并且仅局限于某些领域,一些在NLP及最近很热的生物信息领域活跃的学者我就浅陋无知,所以不对的地方大 家仅当一笑。 Machine Learning 大家(1):M. I. Jordan 在我的眼里,M Jordan无疑是武林中的泰山北斗。他师出MIT,现在在berkeley坐镇一方,在附近的两所名校(加stanford)中都可以说无出其右 者,stanford的Daphne Koller虽然也声名遐迩,但是和Jordan比还是有一段距离。 Jordan身兼 stat和cs两个系的教授,从他身上可以看出Stat和ML的融合。 Jordan 最先专注于mixtures of experts,并迅速奠定了自己的地位,我们哈尔滨工业大学的校友徐雷跟他做博后期间,也在这个方向上沾光不少。Jordan和他的弟子在很多方面作出 了开创性的成果,如spectral clustering, Graphical model和nonparametric Bayesian。现在后两者在ML领域是非常炙手可热的两个方向,可以说很大程度上是Jordan的lab一手推动的。 更难能 可贵的是, Jordan不仅自己武艺高强,并且揽钱有法,教育有方,手下门徒众多且很多人成了大器,隐然成为江湖大帮派。他的弟子中有10多人任教授,个人认为他现 在的弟子中最出色的是stanford的Andrew Ng,不过由于资历原因,现在还是assistant professor,不过成为大教授指日可待;另外Tommi Jaakkola和David Blei也非常厉害,其中Tommi Jaakkola在mit任教而David Blei在cmu做博后,数次获得NIPS最佳论文奖,把SVM的最大间隔方法和Markov network的structure结构结合起来,赫赫有名。还有一个博后是来自于toronto的Yee Whye Teh,非常不错,有幸跟他打过几次交道,人非常nice。另外还有一个博后居然在做生物信息方面的东西,看来jordan在这方面也捞了钱。这方面他有 一个中国学生Eric P. Xing(清华大学校友),现在在cmu做assistant professor。 总的说来,我 觉得 Jordan现在做的主要还是graphical model和Bayesian learning,他去年写了一本关于graphical model的书,今年由mit press出版,应该是这个领域里程碑式的著作。3月份曾经有人答应给我一本打印本看看,因为Jordan不让他传播电子版,但后来好像没放在心上(可见 美国人也不是很守信的),人不熟我也不好意思问着要,可以说是一大遗憾. 另外发现一个有趣的现象就是Jordan对hierarchical情有独钟,相当多的文章都是关于hierarchical的,所以能 hierarchical大家赶快hierarchical,否则就让他给抢了。 用我朋友话说看jordan牛不牛,看他主页下 面的Past students and postdocs就知道了。 Machine Learning大家(2):D. Koller D. Koller是1999年美国青年科学家总统奖(PECASE)得主,IJCAI 2001 Computers and Thought Award(IJCAI计算机与思维奖,这是国际人工智能界35岁以下青年学者的最高奖)得主,2004 World Technology Award得主。 最先知道D koller是因为她得了一个大奖,2001年IJCAI计算机与思维奖。Koller因她在概率推理的理论和实践、机器学习、计算博弈论等领域的重要贡 献,成为继Terry Winograd、David Marr、Tom Mitchell、Rodney Brooks等人之后的第18位获奖者。说起这个奖挺有意思的,IJCAI终身成就奖(IJCAI Award for Research Excellence),是国际人工智能界的最高荣誉; IJCAI计算机与思维奖是国际人工智能界35岁以下青年学者的最高荣誉。早期AI研究将推理置于至高无上的地位; 但是1991年牛人Rodney Brooks对推理全面否定,指出机器只能独立学习而得到了IJCAI计算机与思维奖; 但是koller却因提出了Probabilistic Relational Models 而证明机器可以推理论知而又得到了这个奖,可见世事无绝对,科学有轮回。 D koller的Probabilistic Relational Models在nips和icml等各种牛会上活跃了相当长的一段时间,并且至少在实验室里证明了它在信息搜索上的价值,这也导致了她的很多学生进入了 google。虽然进入google可能没有在牛校当faculty名声响亮,但要知道google的很多员工现在可都是百万富翁,在全美大肆买房买车的 主。 Koller的研究主要都集中在probabilistic graphical model,如Bayesian网络,但这玩意我没有接触过,我只看过几篇他们的markov network的文章,但看了也就看了,一点想法都没有,这滩水有点深,不是我这种非科班出身的能趟的,并且感觉难以应用到我现在这个领域中。 Koller 才从教10年,所以学生还没有涌现出太多的牛人,这也是她不能跟Jordan比拟的地方,并且由于在stanford的关系,很多学生直接去硅谷赚大钱去 了,而没有在学术界开江湖大帮派的影响,但在stanford这可能太难以办到,因为金钱的诱惑实在太大了。不过Koller的一个学生我非常崇拜,叫 Ben Taskar,就是我在(1)中所提到的Jordan的博后,是好几个牛会的最佳论文奖,他把SVM的最大间隔方法和Markov network结合起来,可以说是对structure data处理的一种标准工具,也把最大间隔方法带入了一个新的热潮,近几年很多牛会都有这样的workshop。 我最开始上Ben Taskar的在stanford的个人网页时,正赶上他刚毕业,他的顶上有这么一句话:流言变成了现实,我终于毕业了!可见Koller是很变态的,把 自己的学生关得这么郁闷,这恐怕也是大多数女faculty的通病吧,并且估计还非常的push! Machine learning 大家(3):J. D. 
Lafferty 大家都知道NIPS和ICML向来都是由大大小小的山头所割据,而 John Lafferty无疑是里面相当高的一座高山,这一点可从他的publication list里的NIPS和ICML数目得到明证。虽然江湖传说计算机重镇CMU现在在走向衰落,但这无碍Lafferty拥有越来越大的影响力,翻开AI兵 器谱排名第一的journal of machine learning research的很多文章,我们都能发现author或者editor中赫然有Lafferty的名字。 Lafferty给人 留下的最大的印象似乎是他2001年的conditional random fields,这篇文章后来被疯狂引用,广泛地应用在语言和图像处理,并随之出现了很多的变体,如Kumar的discriminative random fields等。虽然大家都知道discriminative learning好,但很久没有找到好的discriminative方法去处理这些具有丰富的contextual inxxxxation的数据,直到Lafferty的出现。 而现在Lafferty做的东西好像很 杂,semi-supervised learning, kernel learning,graphical models甚至manifold learning都有涉及,可能就是像武侠里一样只要学会了九阳神功,那么其它的武功就可以一窥而知其精髓了。这里面我最喜欢的是semi- supervised learning,因为随着要处理的数据越来越多,进行全部label过于困难,而完全unsupervised的方法又让人不太放心,在这种情况下 semi-supervised learning就成了最好的。这没有一个比较清晰的认识,不过这也给了江湖后辈成名的可乘之机。到现在为止,我觉得cmu的semi- supervised是做得最好的,以前是KAMAL NIGAM做了开创性的工作,而现在Lafferty和他的弟子作出了很多总结和创新。 Lafferty 的弟子好像不是很多,并且好像都不是很有名。不过今年毕业了一个中国人,Xiaojin Zhu(上海交通大学校友),就是做semi-supervised的那个人,现在在wisconsin-madison做assistant professor。他做了迄今为止最全面的Semi-supervised learning literature survey,大家可以从他的个人主页中找到。这人看着很憨厚,估计是很好的陶瓷对象。另外我在(1)中所说的Jordan的牛弟子D Blei今年也投奔Lafferty做博后,就足见Lafferty的牛了。 Lafferty做NLP是很好的,著名的Link Grammar Parser还有很多别的应用。其中language model在IR中应用,这方面他的另一个中国学生ChengXiang Zhai(南京大学校友,2004年美国青年科学家总统奖(PECASE)得主),现在在uiuc做assistant professor。 Machine learning 大家(4):Peter L. Bartlett 鄙人浅薄之见,Jordan 比起同在berkeley的Peter Bartlett还是要差一个层次。Bartlett主要的成就都是在learning theory方面,也就是ML最本质的东西。他的几篇开创性理论分析的论文,当然还有他的书Neural Network Learning: Theoretical Foundations。 UC Berkeley的统计系在强手如林的北美高校中一直是top3,这就足以证明其肯定是群星荟萃,而其中,Peter L. Bartlett是相当亮的一颗星。关于他的研究,我想可以从他的一本书里得到答案:Neural Network Learning: Theoretical Foundations。也就是说,他主要做的是Theoretical Foundations。基础理论虽然没有一些直接可面向应用的算法那样引人注目,但对科学的发展实际上起着更大的作用。试想vapnik要不是在VC维 的理论上辛苦了这么多年,怎么可能有SVM的问世。不过阳春白雪固是高雅,但大多数人只能听懂下里巴人,所以Bartlett的文章大多只能在做理论的那 个圈子里产生影响,而不能为大多数人所广泛引用。 Bartlett在最近两年做了大量的Large margin classifiers方面的工作,如其convergence rate和generalization bound等。并且很多是与jordan合作,足见两人的工作有很多相通之处。不过我发现Bartlett的大多数文章都是自己为第一作者,估计是在教育 上存在问题吧,没带出特别牛的学生出来。 Bartlett的个人主页的talk里有很多值得一看的slides,如Large Margin Classifiers: Convexity and Classification;Large Margin Methods for Structured Classification: Exponentiated Gradient Algorithms。大家有兴趣的话可以去下来看看。 Machine learning 大家(5): Michael Collins Michael Collins (http://people.csail.mit.edu/mcollins/ 自然语言处理(NLP)江湖的第一高人。出身Upenn,靠 一身叫做Collins Parser的武功在江湖上展露头脚。当然除了资质好之外,其出身也帮了不少忙。早年一个叫做Mitchell P. Marcus的师傅传授了他一本葵花宝典-Penn Treebank。从此,Collins整日沉迷于此,终于练成盖世神功。 学成之后,Collins告别师傅开始闯荡江湖,投入了一个叫ATT Labs Research的帮会,并有幸结识了Robert Schapire、Yoram Singer等众多高手。大家不要小瞧这个叫ATT Labs Research的帮会,如果谁没有听过它的大名总该知道它的同父异母的兄弟Bell Labs吧。 言归正传,话说 Collins在这里度过了3年快乐的时光。其间也奠定了其NLP江湖老大的地位。并且练就了Discriminative Reranking, Convolution Kernels,Discriminative Training Methods for Hidden Markov Models等多种绝技。然而,世事难料,怎奈由于帮会经营不善,这帮大牛又不会为帮会拼杀,终于被一脚踢开,大家如鸟兽散了。Schapire去了 Princeton, Singer 也回老家以色列了。Collins来到了MIT,成为了武林第一大帮的六袋长老,并教授一门叫做的Machine Learning Approaches for NLP (http://www.ai.mit.edu/courses/6.891-nlp/ 的功夫。虽然这一地位与其功力极不相符,但是这并没有打消Collins的积极性,通过其刻苦打拼,终于得到了一个叫Sloan Research Fellow的头衔,并于今年7月,光荣的升任7袋Associate Professor。 在其下山短短7年时间 内,Collins共获得了4次世界级武道大会冠军(EMNLP2002, 2004, UAI2004, 2005)。相信年轻的他,总有一天会一统丐帮,甚至整个江湖。 看过Collins和别人合作的一篇文章,用 conditional random fields 做object recogntion。还这么年轻,admire to death!
个人分类: 生物信息|4411 次阅读|0 个评论
[转载]人脸识别方法个人见解(转载)
hailuo0112 2012-4-21 16:02
看到j.liu关于人脸识别的帖子,萌发写这个帖子的念头。没有别的意思,就是想抛砖引玉,把问题说的全面一点,希望j.liu和回其帖子的兄弟姐妹们不要介意。如有兴趣,欢迎继续讨论。在以下讨论中, TPAMI = IEEE Transactions on PAMI 这个杂志 PAMI 是指 pattern analysis and machine intelligence这两个领域 1)PCA和LDA及其相关方法 Eigenfaces和Fisherfaces无疑是人脸识别中里程碑式的工作。就使用的方法而言,PCA和LDA都不是新方法,但是他们都是被第一次十分明确的用在人脸识别中的方法。之所以说"十分明确",是因为就发表的时间来看,这两个论文都不是首次把这两个方法用在PAMI相关的分类识别中。这给我们一个小小的启示:一个新的方法专注于解决一个具体的问题可能会带来更大的影响,虽然这个方法具有一般性。 在现在人脸识别的方法中,这两个方法也是follow的人最多的。究其原因,除了其有效性之外,简单是其最大的特点。纵观PAMI历史风云,能经受住时间考验而流传下来的方法,除了有效之外一般都有两个特点其一:1)简单 (PCA, LDA, K-Means, Normalized Cuts etc.);2)复杂 ,但是解决一个具有一般性而且很难被解决的问题 (在AAM、3d morphable model有深刻影响的Lucas-Kanade算法)。所以如果你的方法一般人都有能力做得到,那就尽量把你的方法做的简单明确。这就是外国人推崇备至的所谓的Ockham's Razor原理(就个人情感而言,我十分讨厌这个名词)。在这里我要强调一点是,这里说的简单并不是说原理简单,Normalized Cuts就方法本身来说简单,但是原理并不简单;微分几何中的Gauss-Bonnet定理形式非常简单,内涵何其丰富。 在此我想多提两句。由于国内有诸多发论文的方法论,其中一个流传下来的一句话就是:系统做的越简单越好,理论做的越复杂越好。不可否认,这句话有它有道理的地方,但是如果用这句话教育后人,误人子弟矣。 后来出现了许多新的与之类似的方法,就TPAMI上发表的来看,比较有代表性就是 HE Xiaofei 的LPP和 YAN Shuicheng 的MFA。关于这两个方法的评论大家可参看j.liu贴中knato的回帖。 在这里我想谈谈我的个人见解。首先这两个方法的出现有它们的意义。LPP是流形学习中Laplacian Eigenmaps线性化,这样无疑会带动其它流形学习方法在识别问题中的尝试,一个为解决问题找到一个思路,二个为进入寒冬的流形学习找到新的用武之地,虽然这两个都不是上档次的思路,但是潜在影响还是有的。后来 YANG Jian 的UDP就是在LPP号召下在TPAMI上的产物。LPP是非监督方法,所以它的识别性能比LDA好的概率极其微弱。 MFA是基于局部数据关系的监督鉴别方法。它有两个最近临近点数量的参数要调。这两个参数是这个方法双刃剑。参数调的好,MFA会比LDA效果好,调的不好则不行。这样MFA用起来比LDA复杂,这样如果MFA的性能比LDA好的有限,而用起来复杂得多的话,它终将被历史所抛弃。 另外就像j.Liu在他的帖子中说出的一样,这些方法有一定的投机性,比如这两篇文章的试验,他们都把Fisherfaces(PCA+LDA)设为c- 1,虽然这是按照原始论文的取法,但是是做过这方面工作的人都知道PCA的主元数目如果取得太大,PCA+LDA的性能会显著降低,在WANG Xiaogang的IJCV上的Random sampling LDA中清楚地给出了图形说明。所以他们论文中给出的实验比较不具可信性。 LPP, UDP, MFA都是我们中国人(至少这些方法发表时还都是)为第一作者发表的方法,个人认为其存在有一定的价值,但它们将是PAMI研究发展中的过眼烟云,无法与PCA,LDA相媲美。 2)LDA奇异性问题 众所周知,LDA是基于求解广义特征值问题(Sb*u=Alpha*Sw*u),所以在实际应用时遇到奇异性的问题,就是Sw矩阵不可逆。在人脸识别中解决这一问题的论文“浩如烟海”。这也说明了LDA的影响力之大。在这一类方法中,也有风格之分。 o. PCA 降维 在Fisherfaces中采用的就是先用PCA降维,再用LDA,这也是现在处理这一问题的一般方法。这里有个比较讽刺的事情。Belhumeur在他的论文里说:PCA actually smears the classes together。那末既然smears the classes together,既然PCA破坏类的结构,那为什莫还要用PCA降维?而且事实证明,即使在Sw可逆的情况下,用PCA features也会增强LDA在人脸识别中的性能。这里只能说明,PCA的作用或是PCA features并不是Belhumeur和其以后follow这样说法的人叙述的那样。PCA虽然简单,但是人们应该对它有个正确的认识,这个以后如果有机会再谈。 a. RDA 至今影响最大最实用的还是基于regularization思想的RDA。其实这个问题不仅仅在人脸识别中才被注意到。很早在统计中就被解决过,RDA发表于1989的Journal of the Americal Statistical Association杂志上,可见其久远。在Sw上加一个扰动项也是解决这一问题的最简单方法。 b.子空间投影 论文最多的也就在这一块。应用knato类似的排列组合方法,令image(Sw)和null(Sw)分别表示Sw的列(像)空间和零空间,则我们可很容易的就列出如下组合方法 (强调:这里却不是提供给大家发论文的方法论,而是以较形象的方式叙述!) 把样本投影到 aa. image(Sb), bb. null(Sw), cc. image(Sw), dd. image(Sw)+null(Sw), ee. image(Sb)+null(Sw) 可并列可串行, ff. image(St)+null(Sw) 以上每一种组合就代表不止一篇论文,在此就不详细列举了。另外,你还可以把random sampling技术加进来,这样就可以不止翻倍。还有,你还可以把同样的技术用到KPCA KLDA (kFA)上,这样又可翻倍。更进一步,你还可以把ICA,LBP, Gabor features等诸如此类的东西和以上子空间混合,...,子子孙孙无穷尽焉。 把这个东西做到极致的是国内的 YANG Jian。另外香港中文大学的 TANG Xiaoou 和他以前的学生 WANG Xiaogang 也做这相关的工作,但是他们做一个idea就是一个,没有灌水之嫌。YANG Jian的工作可以用他在TPAMI上的 KPCA plus LDA 这篇文章来概括,虽然他灌水无数,但就子空间方法而言,他这篇文章还有他发表在国内自动化学报上的那篇长文还是有东西的。如果你想做这一块的工作,值得看一看,是个较为全面的总结。TANG Xiaoou在子空间方面的代表工作(开山之作)就是dual spaces LDA, random sampling (and bagging) LDA, unified subspaces。(在此之后他还有学生一直在做,就不详细列举了。) 我建议想做这一块工作的同学们,要把TANG and YANG的工作烂熟于心,取长补短,相互学习,取其精华,这样可以较为快速而全面地掌握。 c. QR分解 矩阵和数值功底比较好的人,能做得更像模像样。Cheong Hee Park 和 YE Jieping 无疑是这方面的高手。去看看他们在TPAMI,JMLR, 和SIAM的J. Matrix Anal. Appl上发表的论文可知一二。 d. 相关性 如果Sw可逆,则Sb*u=Alpha*Sw*u可以转化为 inv(Sw)*Sb*u=Alpha*u。那末就可以考察Sw的子空间和Sb子空间的相关性。这方面的代表工作就是Aleix M. Martinez在TPAMI上长文的那个工作。 e. 
变商为差 变u'*Sb*u/(u'*Sw*u)为u'*(Sb-Sw)*u。 3)基于图像局部结构的方法 这一类获得广泛认可的方法有Gabor和LBP,另外还有可能有用的SIFT和differential features。 Gabor应用比较早有影响力的代表作就是EBGM。Gabor也是提取用来识别的visual feature的最常用手段。 有无数人因为LBP的极其简单而怀疑它的性能,但是有趣的是最近Ahonen在TPAMI上的短文,就是把LBP应用在人脸识别上,没有任何新的改进,这也说明Reviewer们和editor对这类方法的肯定和鼓励。在非监督feature extraction中,LBP有明显的优势,但是绝对没有达到作者在论文显示的那个水平。在他的论文中,LBP特别weighted LBP效果非常好,这和他们应用的FERET人脸库的人脸crop形式有关。他们应用CSU的椭圆模板来crop人脸,如果应用正方形的模板 weighted LBP提高很有限。特别在FRGC Version 2上测试,LBP绝对没有一般监督性的识别方法好。另外这也给我们一个小小启示,就是加个weight其识别性能就能大大提高,这说明什莫问题呢? 另外我不敢苟同j.liu在他文章说的LBP对image blocks大小不敏感是个美丽谎言的说法。首先,有一定的敏感性,这个是要承认的。但是LBP有一个性能稳定的image blocks,并不是人们认为的histogram要符合一定的统计性等等。这个block size的选取比最优的PCA主元数目的选取要容易得多。当然这些都是小问题。 国内有人做Gabor和LBP的结合。当然是值得探索的,但是我个人认为不应该在这两种方法结合上花费太多精力。完全可以用类似形式考虑别的思路。 4) Sparse representation NMF和NTF都属于sparse representation的方法,都曾被应用在人脸识别中,但效果都非常有限。特别是NTF,属于数学理论上非常优美,但是实际效果很勉强的典型。 另外,Sparse representation (coding) 是一个很有趣也是很有前途的方法,Sparse representation 有很多方式,关键要看你怎莫用、解决怎样的问题。过段时间我们还有机会再谈。 5)Tensor方法 Tensor在人脸识别中至少到现在为止,还非常得不成功。最典型的就是M. Alex O.Vasilescu在ECCV'02上的tensorfaces。他们对于问题的分析和tensor的对应天衣无缝,非常有道理,数学实现上也同样简单,但是自从那个方法发表出来以后基本无人follow。究其原因,个人认为就是把本来简单的问题复杂化,最重要的就是复杂化以后并没有带来该有的益处。 Alex对tensor的应用是flattening high-way tensor。这是一种常见的处理tensor的方法,这样做的好处就是使tensor好处理易于计算。two-way tensorfaces就是我们理解的Eigenfaces。但是同样是tensor,这种tensor和Amnon Shashua的NTF有着本质的区别。NTF是纯正的tensor思想。但是它实现起来过于复杂,又加上原理比Alex的tensor更复杂,所以无人问津。但是不可否认,它们都是数学上十分优美的方法。如果你想学习tensor而又不想枯燥,我推荐你去看这三篇论文(Shashua两篇)。 6)参数模型 参数模型的应用也多种多样,比如HMM, GMM等。这两个都是一般性的建模方法,所以应用也很庞杂,而且在人脸识别中的应用大多是从speech recognition中的方法转化而来,在此就不多谈。有兴趣的同学们可以参看H. Othman在PAMI上的论文和Conrad Sanderson在PR上的论文。 但是在此其中,最简单的是Baback Moghaddam在TPAMI上那个Probabilistic Subspaces的文章,这个文章也是WANG Xiaogang的unified spaces的参考原本。 7) 3D 模型 代表作是Volker Blanz在TPAMI上的那个文章。不过个人十分不看好。 8)Personal Perspectives a. 基于子空间的方法很难在实际应用中有所用处 b. 基于找图像局部结构的方法更有希望。像EBGM, LBP, SIFT之类可以给我们很多有益的启示。这点和j.liu的观点一致。 c. 把人脸识别中的方法推广开来,应用到一般的分类和统计问题中,这也是人脸识别衍生出来的一大作用。 d. 由于我们国内的特殊研究环境,大家一般都喜欢做简易快的工作,所以人脸识别这一领域出现有华人名字的论文为数可观。其实在某些压力之下这也无可厚非,但是还是希望我们国人在有条件的情况下,不要以发论文为主,多关注于解决问题本身、尽量向推动理论发展的方向努力。我们绝对有这个能力。君不见,NIPS ‘06两篇Best student paper被在国外留学的中国人获得,CVPR'07更是又传来喜讯:Best student paper由清华学生获得,这些都是迹象。我们正处于一个意气风发、大有可为的时代。就本人学术水平和资历来说,绝没有资格来说这些话,这只不过是个人的一点心愿和号召而已,同时更是勉励自己。 这个帖子主要是谈谈在上一篇中没有谈到或是一带而过的问题。和上一篇一样,还是就方法论方法。 1,kernel methods a. KPCA及其相关 kernel席卷PAMI领域的趋势还在加强。原因很简单,绝大多数的问题都能和kernel挂上钩。在人脸识别里,KPCA和KFA的影响力远不及 PCA和LDA。就应用领域来说,KPCA也远没有PCA应用的广泛。YANG Jian在PAMI上的那个KPCA plus LDA就是子空间和kernel结合的典型论文。如果用作一般性的降维KPCA确实会比PCA效果好,特别是你用的feature空间不是一般的欧式空间的时候更为明显。所以,把LDA用在KPCA变换的空间里自然会比用在PCA变换的空间里效果好。 但是就降维来说,KPCA有一个严重的缺点,就是由它不能得到一个可表示的子空间,比如PCA也可以得到一组正交基作为表示基。当然,这也是kernel 方法的本质属性导致的。这样就会限制kernel方法的应该范围。举个简单的例子,有人做过用PCA来给SIFT特征降维的方法,也就是那个SIFT+ PCA,但他们没有用KPCA+SIFT。就原理上来说,KPCA更适合给SIFT降维,但是在实际应用中,对于SIFT来说,如果需要降维的话,用来降维的东西必须事先学好,PCA就可以事先通过大量的自然图片来学习一个子空间。但是,KPCA做不到。虽然有out-of-sample的方法,但是这种方法有明显的缺点:如果训练样本过大,KPCA的kernel矩阵就很大,这样就很不方便应用,如果过小,效果又不好。其实这也是这类kernel方法的通病(不是一般)。 b. 
regression regression也是分类常用的一种方法。CVPR'07就有一篇Kernel ridge regression。 regression用来分类的原理很简单,但是他和传统的LDA等类似的方法有着明显的区别。就ridge regression来说,它就是要找一个变换,使样本在变换后的空间里和他们本身的label尽量接近,那末这个学到的变换就在最小二乘意义下尽量好的刻画了样本空间的类结构。一般的,对变换函数(离散就是向量或是矩阵)做一个l2范数上的限制,美其名曰保证函数的smooth(这个下面还会再谈)。这样就可以得到一个形式上较为美的闭解。其实根本不用kernelizaton,regression本身就可以和kernel直接挂上钩,因为求出来变换矩阵在一定限制下就可以看成kernel矩阵(YE Jieping CVPR‘07的metric learning中就用到了类似的思想)。这个和用graph Laplacian做ranking的方法非常相似。Laplacian(或是其简单变形)的逆矩阵如果是正定的,那末就把这个逆看作kernel矩阵。那末和kernel直接相关的方法和思路就用上来了,特别是learning中,种类繁杂。 把ridge regression核化的全部技术含量就在计算的trick上。由于把样本映射到Hilbert空间中只是一个虚的表示,在出现内积的情况下才能写成现实的表达式,所以对于kernel方法来说,计算上的trick要求就比较高。但是,往往这类trick都是在统计和矩阵早已被解决的问题,所以大部分工作就是怎样用好而已。 像这样“借壳还魂”的做法,在很多理论的研究上都非常重要。我们要达到我们的目的,但是这个东西又不是直接可表达的,那末就可以把它放到一定的空间中,按照这个空间中的基本原理来计算,最后到达一个可以表达的形式,而且是按照你的idea来推导的。这种东西一旦做出来,质量还不低。 2,regularization 虽然名字叫regularization,其实就想谈谈优化目标和优化约束问题。 如果你看了ICML'07,CVPR'07和即将出炉的ICCV'07,你就会发现07年是个不平凡的一年,降维领域有点混乱。或者说自从97年以来一直就没有平静过,都是Fisherfaces惹的祸:) 还记得knato回帖中斗胆列出的排列组合吗?如果不记得暂且去温习一下,因为我要用一把。把knato列出的不同排列组合加上如下regression一个的一个优化 ||Y-W'X||^2, 就可以概括所有今年的和这类相关论文的思想。然后,如果你愿意,你还可以衍生出很多。优化目标确定以后,所不同的就是求解方法。你可以带着这个观点再去看一下今年的论文,了然于胸。 由此,线性降维的混乱过程经历了一个小小的转折————从子空间组合到优化目标和优化约束的组合。子空间主要集中在1998--2005(当然还不会消失),后一种在今年可以说是达到一个小小的高潮。如果再加上应用算法的策略,就形成了乱世中的三足鼎立局面。特别是后一种,往往穿插出现,而且有待加强。这其中的代表人物 TANG Xiaoou, YANG Jian, YE Jieping, HE Xiaofei,YAN Shuicheng。导致这一变更的主要因素来源于非线性方法的应用,特别kernel和manifold learning的线性化应用,这其中LPP起了很大的刺激作用。 如果你能站在一个高度(一定范围内)看待这些东西,那末当你面临毕业出国压力时,你就可以“察若水三千,得一瓢饮”来缓解压力。而且还可以尽量饮得好水。(再次郑重声明:这不是发这个帖子的原意。) 3,子空间方法中常用的计算技巧 a. 关于这一块的东西,Stan Z. Li编辑过一个小书挺好的,可以通过下面的网站找到。 http://www.face-rec.org/ 不过,我想谈谈规律性的东西。这其中涉及到的东西就是 column (range) space, null space, generalized inverse。这些东西都和QR分解,SVD或是GSVD相关。遇到这些东西,就想起他们准没错。如果你有兴趣,可以看看YE Jieping和Haesun Park关于子空间的论文,都是一个模式。 b. 正交化 从发表的论文来看,对于广义特征值问题,如果求解一组相互正交的基,比B-orthogonal效果要好很多。代表作就是CAI Deng的orthogonal LPP和YE Jieping的 orthogonal LDA。 CAI Deng做了一个orthogonal LPP发在TIP上。他用的就是88年发在TPAMI上的方法,原理一模一样。YE Jieping用的是同时对角化三个矩阵。风格不同,各有长短。个人还是倾向于CAI Deng用的那个方法。 4,Tensor revisited 在上一篇中,我谈了tensor的方法,主要说了tensorfaces和NTF。这里再多说几句。 最近在tensor方面功夫最多的是YAN Shuicheng,最近的TPAMI, TIP, 和 CVPR'07都有他与此相关的文章。这对于发扬和推广tensor的思想和方法确实是个好事情,我是赞同探讨的。 另外,HE Xiaofei和CAI Deng也做过tensor subspace。准确地说,他们只是借用了tensor的概念,他们的方法可以和2D PCA, 2D LDA归为一类。 其实做这一块东西最早的是YANG Jian的一个大师兄,在90年代PR上的工作,后来YANG Jian把它发扬光大,最初的结果就是PR和TPAMI上各一篇短文(2DPCA)。 最早把这类东西以tensor形式呈现的是CV中的大牛Amnon Shashua在01年CVPR上的论文,有兴趣可以看看。不过,大牛终究是大牛,当他听说了NMF以后,NTF立马横空出世(ICML'05)。这个中间的变化是质的跨越,能做出前面那种方法的可以说非常之多,能做出后面那种方法的真是寥寥。这是值得我们好好学习的。 (B.T.W.,Amnon此人并不只是学术了得,其妻子是以色列小姐,again,也值得大家学习的榜样,特别是整天闷头做科研的我们) 在这里要强调的是,我们不能完全否定一些简单的东西,上轨道的或是正宗有深度的方法往往就是这样慢慢做出来的。 5,其它 关于kernel的方法我就是点到而止。在上一个帖子中有人提出说说SVM和Boosting,如果谁有兴趣,可以谈谈。 另外也有人说在上一个贴中我漏掉了Bayesianfaces,实际这个就是我在参数模型中提到的Probabilistic Subspaces方法。有兴趣可以看看。 结束语 纵观PAMI领域困扰纷争,虽然我们达不到“跳出三界外,不在五行中”的境界,但是至少我们可以更好的看清楚这个领域的情况。如果你能站在一个高度看待这些东西,你就有可能认清你自己认为有希望的方向在哪儿,从而更准确地找到自己的目标而少走弯路,或是更好地给自己定位。 写这些东西,就是想帮助了解这一领域的人能全面准确地了解这一块的东西,少走弯路。另外,对于已经谙熟于心的人,激发一个讨论的话题。在上一篇贴子中,看贴的人多,回帖的人少,这个现象可不好。欢迎大家踊跃发言,良性讨论,这样才会带来更多益处,千万不要担心自己是新手,越是新手越需要发言。 俗话说:“乱世出英雄”,当今在PAMI领域正是需要英雄的时机,就是我在I中说的“我们正处在一个大有可为的时代”,希望下次力挽狂澜的是华人的名字。 以上尽是一家之言,欢迎大家批评指正、主动参与讨论。
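就上文提到的用 ridge regression 做分类的思路(优化 ||Y - W'X||^2 并对变换加 l2 范数限制,从而得到闭式解),补一个极简的示意实现(数据随手构造,正则参数取值任意,仅为说明闭解形式,并非文中任何论文的实现):

    import numpy as np

    rng = np.random.RandomState(0)
    # toy 3-class data: 60 samples, 5 features, each class shifted to its own mean
    X = rng.randn(60, 5) + np.repeat(np.eye(3), 20, axis=0).dot(rng.randn(3, 5) * 3)
    labels = np.repeat(np.arange(3), 20)
    Y = np.eye(3)[labels]                       # one-hot label matrix (60 x 3)

    lam = 1.0                                   # l2 regularization strength (arbitrary)
    # closed-form ridge solution: W = (X'X + lam*I)^-1 X'Y
    W = np.linalg.solve(X.T.dot(X) + lam * np.eye(X.shape[1]), X.T.dot(Y))

    pred = X.dot(W).argmax(axis=1)              # assign each sample to the closest label code
    print((pred == labels).mean())              # training accuracy on the toy data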
2689 次阅读|0 个评论
[转载]机器学习中的数学(4)-线性判别分析(LDA), 主成分分析(PCA)
zhenliangli 2012-3-24 14:14
版权声明: 本文由LeftNotEasy发布于 http://leftnoteasy.cnblogs.com , 本文可以被全部的转载或者部分使用,但请注明出处,如果有问题,请联系 wheeleast@gmail.com 前言: 第二篇 的文章中谈到,和部门老大一宁出去outing的时候,他给了我相当多的机器学习的建议,里面涉及到很多的算法的意义、学习方法等等。一宁上次给我提到,如果学习分类算法,最好从线性的入手,线性分类器最简单的就是LDA,它可以看做是简化版的SVM,如果想理解SVM这种分类器,那理解LDA就是很有必要的了。 谈到LDA,就不得不谈谈PCA,PCA是一个和LDA非常相关的算法,从推导、求解、到算法最终的结果,都有着相当的相似。 本次的内容主要是以推导数学公式为主,都是从算法的物理意义出发,然后一步一步最终推导到最终的式子,LDA和PCA最终的表现都是解一个矩阵特征值的问题,但是理解了如何推导,才能更深刻的理解其中的含义。本次内容要求读者有一些基本的线性代数基础,比如说特征值、特征向量的概念,空间投影,点乘等的一些基本知识等。除此之外的其他公式、我都尽量讲得更简单清楚。 LDA: LDA的全称是Linear Discriminant Analysis(线性判别分析), 是一种supervised learning。 有些资料上也称为是Fisher’s Linear Discriminant,因为它被Ronald Fisher发明自1936年,Discriminant这次词我个人的理解是,一个模型,不需要去通过概率的方法来训练、预测数据,比如说各种贝叶斯方法,就需要获取数据的先验、后验概率等等。LDA是在 目前机器学习、数据挖掘领域经典且热门 的一个算法,据我所知,百度的商务搜索部里面就用了不少这方面的算法。 LDA的原理是,将带上标签的数据(点),通过投影的方法,投影到维度更低的空间中,使得投影后的点,会形成按类别区分,一簇一簇的情况,相同类别的点,将会在投影后的空间中更接近。要说明白LDA,首先得弄明白线性分类器( Linear Classifier ):因为LDA是一种线性分类器。对于K-分类的一个分类问题,会有K个线性函数: 当满足条件:对于所有的j,都有Yk Yj,的时候,我们就说x属于类别k。对于每一个分类,都有一个公式去算一个分值,在所有的公式得到的分值中,找一个最大的,就是所属的分类了。 上式实际上就是一种投影,是将一个高维的点投影到一条高维的直线上,LDA最求的目标是,给出一个标注了类别的数据集,投影到了一条直线之后,能够使得点尽量的按类别区分开,当k=2即二分类问题的时候,如下图所示: 红色的方形的点为0类的原始点、蓝色的方形点为1类的原始点,经过原点的那条线就是投影的直线,从图上可以清楚的看到,红色的点和蓝色的点被 原点 明显的分开了,这个数据只是随便画的,如果在高维的情况下,看起来会更好一点。下面我来推导一下二分类LDA问题的公式: 假设用来区分二分类的直线(投影函数)为: LDA分类的一个目标是使得不同类别之间的距离越远越好,同一类别之中的距离越近越好,所以我们需要定义几个关键的值。 类别i的原始中心点为:(Di表示属于类别i的点) 类别i投影后的中心点为: 衡量类别i投影后,类别点之间的分散程度(方差)为: 最终我们可以得到一个下面的公式,表示LDA投影到w后的损失函数: 我们 分类的目标是,使得类别内的点距离越近越好(集中),类别间的点越远越好。 分母表示每一个类别内的方差之和,方差越大表示一个类别内的点越分散,分子为两个类别各自的中心点的距离的平方,我们最大化J(w)就可以求出最优的w了。想要求出最优的w,可以使用拉格朗日乘子法,但是现在我们得到的J(w)里面,w是不能被单独提出来的,我们就得想办法将w单独提出来。 我们定义一个投影前的各类别分散程度的矩阵,这个矩阵看起来有一点麻烦,其实意思是,如果某一个分类的输入点集Di里面的点距离这个分类的中心店mi越近,则Si里面元素的值就越小,如果分类的点都紧紧地围绕着mi,则Si里面的元素值越更接近0. 带入Si,将J(w)分母化为: 同样的将J(w)分子化为: 这样损失函数可以化成下面的形式: 这样就可以用最喜欢的拉格朗日乘子法了,但是还有一个问题,如果分子、分母是都可以取任意值的,那就会使得有无穷解,我们将分母限制为长度为1(这是用拉格朗日乘子法一个很重要的技巧,在下面将说的PCA里面也会用到,如果忘记了,请复习一下高数),并作为拉格朗日乘子法的限制条件,带入得到: 这样的式子就是一个求特征值的问题了。 对于N(N2)分类的问题,我就直接写出下面的结论了: 这同样是一个求特征值的问题,我们求出的第i大的特征向量,就是对应的Wi了。 这里想多谈谈特征值,特征值在纯数学、量子力学、固体力学、计算机等等领域都有广泛的应用,特征值表示的是矩阵的性质,当我们取到矩阵的前N个最大的特征值的时候,我们可以说提取到的矩阵主要的成分(这个和之后的PCA相关,但是不是完全一样的概念)。在机器学习领域,不少的地方都要用到特征值的计算,比如说图像识别、pagerank、LDA、还有之后将会提到的PCA等等。 下图是图像识别中广泛用到的特征脸(eigen face),提取出特征脸有两个目的,首先是为了压缩数据,对于一张图片,只需要保存其最重要的部分就是了,然后是为了使得程序更容易处理,在提取主要特征的时候,很多的噪声都被过滤掉了。跟下面将谈到的PCA的作用非常相关。 特征值的求法有很多,求一个D * D的矩阵的时间复杂度是O(D^3), 也有一些求Top M的方法,比如说 power method ,它的时间复杂度是O(D^2 * M), 总体来说,求特征值是一个很费时间的操作,如果是单机环境下,是很局限的。 PCA: 主成分分析(PCA)与LDA有着非常近似的意思,LDA的输入数据是带标签的,而PCA的输入数据是不带标签的,所以PCA是一种unsupervised learning。LDA通常来说是作为一个独立的算法存在,给定了训练数据后,将会得到一系列的判别函数(discriminate function),之后对于新的输入,就可以进行预测了。而PCA更像是一个预处理的方法,它可以将原本的数据降低维度,而使得降低了维度的数据之间的方差最大(也可以说投影误差最小,具体在之后的推导里面会谈到)。 方差这个东西是个很有趣的,有些时候我们会考虑减少方差(比如说训练模型的时候,我们会考虑到方差-偏差的均衡),有的时候我们会尽量的增大方差。方差就像是一种信仰(强哥的话),不一定会有很严密的证明,从实践来说,通过尽量增大投影方差的PCA算法,确实可以提高我们的算法质量。 说了这么多,推推公式可以帮助我们理解。 我下面将用两种思路来推导出一个同样的表达式。首先是最大化投影后的方差,其次是最小化投影后的损失(投影产生的损失最小)。 最大化方差法: 假设我们还是将一个空间中的点投影到一个向量中去。首先,给出原空间的中心点: 假设u1为投影向量,投影之后的方差为: 上面这个式子如果看懂了之前推导LDA的过程,应该比较容易理解,如果线性代数里面的内容忘记了,可以再温习一下,优化上式等号右边的内容,还是用拉格朗日乘子法: 将上式求导,使之为0,得到: 这是一个标准的特征值表达式了,λ对应的特征值,u对应的特征向量。上式的左边取得最大值的条件就是λ1最大,也就是取得最大的特征值的时候。假设我们是要将一个D维的数据空间投影到M维的数据空间中(M D), 那我们取前M个特征向量构成的投影矩阵就是能够使得方差最大的矩阵了。 最小化损失法: 假设输入数据x是在D维空间中的点,那么,我们可以用D个正交的D维向量去完全的表示这个空间(这个空间中所有的向量都可以用这D个向量的线性组合得到)。在D维空间中,有无穷多种可能找这D个正交的D维向量,哪个组合是最合适的呢? 
假设我们已经找到了这D个向量,可以得到: 我们可以用近似法来表示投影后的点: 上式表示,得到的新的x是由前M 个基的线性组合加上后D - M个基的线性组合,注意这里的z是对于每个x都不同的,而b对于每个x是相同的,这样我们就可以用M个数来表示空间中的一个点,也就是使得数据降维了。但是这样降维后的数据,必然会产生一些扭曲,我们用J描述这种扭曲,我们的目标是,使得J最小: 上式的意思很直观,就是对于每一个点,将降维后的点与原始的点之间的距离的平方和加起来,求平均值,我们就要使得这个平均值最小。我们令: 将上面得到的z与b带入降维的表达式: 将上式带入J的表达式得到: 再用上拉普拉斯乘子法(此处略),可以得到,取得我们想要的投影基的表达式为: 这里又是一个特征值的表达式,我们想要的前M个向量其实就是这里最大的M个特征值所对应的特征向量。证明这个还可以看看,我们J可以化为: 也就是当误差J是由最小的D - M个特征值组成的时候,J取得最小值。跟上面的意思相同。 下图是PCA的投影的一个表示,黑色的点是原始的点,带箭头的虚线是投影的向量,Pc1表示特征值最大的特征向量,pc2表示特征值次大的特征向量,两者是彼此正交的,因为这原本是一个2维的空间,所以最多有两个投影的向量,如果空间维度更高,则投影的向量会更多。 总结: 本次主要讲了两种方法,PCA与LDA,两者的思想和计算方法非常类似,但是一个是作为独立的算法存在,另一个更多的用于数据的预处理的工作。另外对于PCA和LDA还有核方法,本次的篇幅比较大了,先不说了,以后有时间再谈: 参考资料: prml bishop,introduce to LDA(对不起,这个真没有查到出处) 本文转载自: http://www.cnblogs.com/LeftNotEasy/archive/2011/01/08/lda-and-pca-machine-learning.html
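对应上文的推导(LDA 归结为 Sb·u = λ·Sw·u 的广义特征值问题,PCA 归结为协方差矩阵的特征值问题),下面给一段直接按公式写的示意代码(数据为随手构造的二分类数据;为避免 Sw 奇异,这里给 Sw 加了一个很小的扰动项,这只是常见做法之一,并非原文内容):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.RandomState(0)
    # two toy classes in 4 dimensions
    X0 = rng.randn(50, 4)
    X1 = rng.randn(50, 4) + np.array([3.0, 1.0, 0.0, 0.0])
    X = np.vstack([X0, X1])

    # ----- PCA: eigenvectors of the covariance matrix -----
    Xc = X - X.mean(axis=0)
    cov = Xc.T.dot(Xc) / len(X)
    pca_vals, pca_vecs = eigh(cov)              # eigenvalues in ascending order
    pca_axes = pca_vecs[:, ::-1]                # largest-variance direction first

    # ----- LDA: generalized eigenproblem Sb u = lambda * Sw u -----
    m0, m1, m = X0.mean(0), X1.mean(0), X.mean(0)
    Sw = (X0 - m0).T.dot(X0 - m0) + (X1 - m1).T.dot(X1 - m1)            # within-class scatter
    Sb = 50 * np.outer(m0 - m, m0 - m) + 50 * np.outer(m1 - m, m1 - m)  # between-class scatter
    Sw += 1e-6 * np.eye(4)                      # small perturbation so Sw is invertible
    vals, vecs = eigh(Sb, Sw)                   # solves Sb u = lambda * Sw u
    w_lda = vecs[:, -1]                         # direction with the largest generalized eigenvalue
    print(w_lda)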
个人分类: MachineLearning|2863 次阅读|0 个评论
[转载]机器学习中的数学(2)-线性回归,偏差、方差权衡
zhenliangli 2012-3-24 14:12
版权声明: 本文由LeftNotEasy所有,发布于 http://leftnoteasy.cnblogs.com 。如果转载,请注明出处,在未经作者同意下将本文用于商业用途,将追究其法律责任。如果有问题,请联系作者 wheeleast@gmail.com 前言: 距离上次发文章,也快有半个月的时间了,这半个月的时间里又在学习机器学习的道路上摸索着前进,积累了一点心得,以后会慢慢的写写这些心得。写文章是促进自己对知识认识的一个好方法,看书的时候往往不是非常细,所以有些公式、知识点什么的就一带而过,里面的一些具体意义就不容易理解了。而写文章,特别是写科普性的文章,需要对里面的具体意义弄明白,甚至还要能举出更生动的例子,这是一个挑战。为了写文章,往往需要把之前自己认为看明白的内容重新理解一下。 机器学习可不是一个完全的技术性的东西,之前和部门老大在outing的时候一直在聊这个问题,机器学习绝对不是一个一个孤立的算法堆砌起来的,想要像看《算法导论》这样看机器学习是个不可取的方法,机器学习里面有几个东西一直贯穿全书,比如说数据的分布、最大似然(以及求极值的几个方法,不过这个比较数学了),偏差、方差的权衡,还有特征选择,模型选择,混合模型等等知识,这些知识像砖头、水泥一样构成了机器学习里面的一个个的算法。想要真正学好这些算法,一定要静下心来将这些基础知识弄清楚,才能够真正理解、实现好各种机器学习算法。 今天的主题是线性回归,也会提一下偏差、方差的均衡这个主题。 线性回归定义: 在 上一个主题 中,也是一个与回归相关的,不过上一节更侧重于梯度这个概念,这一节更侧重于回归本身与偏差和方差的概念。 回归最简单的定义是,给出一个点集D,用一个函数去拟合这个点集,并且使得点集与拟合函数间的误差最小。 上图所示,给出一个点集(x,y), 需要用一个函数去拟合这个点集,蓝色的点是点集中的点,而红色的曲线是函数的曲线,第一张图是一个最简单的模型,对应的函数为y = f(x) = ax + b,这个就是一个线性函数, 第二张图是二次曲线,对应的函数是y = f(x) = ax^2 + b。 第三张图我也不知道是什么函数,瞎画的。 第四张图可以认为是一个N次曲线,N = M - 1,M是点集中点的个数,有一个定理是,对于给定的M个点,我们可以用一个M - 1次的函数去完美的经过这个点集。 真正的线性回归,不仅会考虑使得曲线与给定点集的拟合程度最好,还会考虑模型最简单,这个话题我们将在本章后面的偏差、方差的权衡中深入的说,另外这个话题还可以参考我之前的一篇文章: 贝叶斯、概率分布与机器学习 ,里面对模型复杂度的问题也进行了一些讨论。 线性回归(linear regression),并非是指的线性函数,也就是 (为了方便起见,以后向量我就不在上面加箭头了) x0,x1…表示一个点不同的维度,比如说上一节中提到的,房子的价钱是由包括面积、房间的个数、房屋的朝向等等因素去决定的。而是用广义的线性函数: wj是系数,w就是这个系数组成的向量,它影响着不同维度的Φj(x)在回归函数中的影响度,比如说对于房屋的售价来说,房间朝向的w一定比房间面积的w更小。Φ(x)是可以换成不同的函数,不一定要求Φ(x)=x,这样的模型我们认为是广义线性模型。 最小二乘法与最大似然: 这个话题在 此处 有一个很详细的讨论,我这里主要谈谈这个问题的理解。最小二乘法是线性回归中一个最简单的方法,它的推导有一个假设,就是 回归函数的估计值与真实值间的误差假设是一个高斯分布 。这个用公式来表示是下面的样子: ,y(x,w)就是给定了w系数向量下的回归函数的估计值,而t就是真实值了,ε表示误差。我们可以接下来推出下面的式子: 这是一个简单的条件概率表达式,表示在给定了x,w,β的情况下,得到真实值t的概率,由于ε服从高斯分布,则从估计值到真实值间的概率也是高斯分布的,看起来像下面的样子: 贝叶斯、概率分布与机器学习 这篇文章中对分布影响结果这个话题讨论比较多,可以回过头去看看,由于最小二乘法有这样一个假设,则会导致,如果我们给出的估计函数y(x,w)与真实值t不是高斯分布的,甚至是一个差距很大的分布,那么算出来的模型一定是不正确的,当给定一个新的点x’想要求出一个估计值y’,与真实值t’可能就非常的远了。 概率分布是一个可爱又可恨的东西,当我们能够准确的预知某些数据的分布时,那我们可以做出一个非常精确的模型去预测它,但是在大多数真实的应用场景中,数据的分布是不可知的,我们也很难去用一个分布、甚至多个分布的混合去表示数据的真实分布,比如说给定了1亿篇网页,希望用一个现有的分布(比如说混合高斯分布)去匹配里面词频的分布,是不可能的。在这种情况下,我们只能得到词的出现概率,比如p(的)的概率是0.5,也就是一个网页有1/2的概率出现“的”。如果一个算法,是对里面的分布进行了某些假设,那么可能这个算法在真实的应用中就会表现欠佳。 最小二乘法对于类似的一个复杂问题,就很无力了 偏差、方差的权衡(trade-off): 偏差(bias)和方差(variance)是统计学的概念,刚进公司的时候,看到每个人的嘴里随时蹦出这两个词,觉得很可怕。首先得明确的,方差是多个模型间的比较,而非对一个模型而言的,对于单独的一个模型,比如说: 这样的一个给定了具体系数的估计函数,是不能说f(x)的方差是多少。而偏差可以是单个数据集中的,也可以是多个数据集中的,这个得看具体的定义。 方差和偏差一般来说,是从同一个数据集中,用科学的采样方法得到几个不同的子数据集,用这些子数据集得到的模型,就可以谈他们的方差和偏差的情况了。方差和偏差的变化一般是和模型的复杂程度成正比的,就像本文一开始那四张小图片一样,当我们一味的追求模型精确匹配,则可能会导致同一组数据训练出不同的模型,它们之间的差异非常大。这就叫做方差,不过他们的偏差就很小了,如下图所示: 上图的蓝色和绿色的点是表示一个数据集中采样得到的不同的子数据集,我们有两个N次的曲线去拟合这些点集,则可以得到两条曲线(蓝色和深绿色),它们的差异就很大,但是他们本是由同一个数据集生成的,这个就是模型复杂造成的方差大。模型越复杂,偏差就越小,而模型越简单,偏差就越大,方差和偏差是按下面的方式进行变化的: 当方差和偏差加起来最优的点,就是我们最佳的模型复杂度。 用一个很通俗的例子来说,现在咱们国家一味的追求GDP,GDP就像是模型的偏差,国家希望现有的GDP和目标的GDP差异尽量的小,但是其中使用了很多复杂的手段,比如说倒卖土地、强拆等等,这个增加了模型的复杂度,也会使得偏差(居民的收入分配)变大,穷的人越穷(被赶出城市的人与进入城市买不起房的人),富的人越富(倒卖土地的人与卖房子的人)。其实本来模型不需要这么复杂,能够让居民的收入分配与国家的发展取得一个平衡的模型是最好的模型。 最后还是用数学的语言来描述一下偏差和方差: E(L)是损失函数,h(x)表示真实值的平均,第一部分是与y(模型的估计函数)有关的,这个部分是由于我们选择不同的估计函数(模型)带来的差异,而第二部分是与y无关的,这个部分可以认为是模型的固有噪声。 对于上面公式的第一部分,我们可以化成下面的形式: 这个部分在PRML的1.5.5推导,前一半是表示偏差,而后一半表示方差,我们可以得出:损失函数=偏差^2+方差+固有噪音。 下图也来自PRML: 这是一个曲线拟合的问题,对同分布的不同的数据集进行了多次的曲线拟合,左边表示方差,右边表示偏差,绿色是真实值函数。ln lambda表示模型的复杂程度,这个值越小,表示模型的复杂程度越高,在第一行,大家的复杂度都很低(每个人都很穷)的时候,方差是很小的,但是偏差同样很小(国家也很穷),但是到了最后一幅图,我们可以得到,每个人的复杂程度都很高的情况下,不同的函数就有着天壤之别了(贫富差异大),但是偏差就很小了(国家很富有)。 本文转载自: http://www.cnblogs.com/LeftNotEasy/archive/2010/12/19/mathmatic_in_machine_learning_2_regression_and_bias_variance_trade_off.html
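对照上文“模型越复杂、方差越大、偏差越小”的讨论,可以用不同阶数的多项式在多个随机子数据集上反复拟合同一条曲线,数值地感受 偏差^2 与 方差 的此消彼长(纯示意代码,真实函数、噪声大小与多项式阶数都是随意假设的):

    import numpy as np

    rng = np.random.RandomState(0)

    def true_f(x):
        return np.sin(2 * np.pi * x)            # the "true" curve

    x_grid = np.linspace(0, 1, 50)

    def fit_and_predict(degree):
        """Fit one polynomial of the given degree on a fresh noisy sample of 20 points."""
        x = rng.uniform(0, 1, 20)
        y = true_f(x) + rng.randn(20) * 0.3
        coef = np.polyfit(x, y, degree)
        return np.polyval(coef, x_grid)

    for degree in [1, 3, 9]:
        preds = np.array([fit_and_predict(degree) for _ in range(200)])  # 200 sub-datasets
        bias2 = ((preds.mean(axis=0) - true_f(x_grid)) ** 2).mean()      # squared bias
        variance = preds.var(axis=0).mean()                              # variance across datasets
        print(degree, round(bias2, 3), round(variance, 3))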
个人分类: MachineLearning|2843 次阅读|0 个评论
[转载]zhou zhihua 对AI领域的会议的评点
junyichai 2012-3-12 11:44
The First Class: 今天先谈谈AI里面tier-1的conferences, 其实基本上就是AI里面大家比较公认的top conference. 下面同分的按字母序排列. IJCAI (1+): AI最好的综合性会议, 1969年开始, 每两年开一次, 奇数年开. 因为AI实在太大, 所以虽然每届基本上能录100多篇(现在已经到200多篇了),但分到每个领域就没几篇了,象machine learning、computer vision这么大的领域每次大概也就10篇左右, 所以难度很大. 不过从录用率上来看倒不太低,基本上20%左右, 因为内行人都会掂掂分量, 没希望的就别浪费reviewer的时间了. 最近中国大陆投往国际会议的文章象潮水一样, 而且因为国内很少有能自己把关的研究组, 所以很多会议都在complain说中国的低质量文章严重妨碍了PC的工作效率. 在这种情况下, 估计这几年国际会议的录用率都会降下去. 另外, 以前的IJCAI是没有poster的, 03年开始, 为了减少被误杀的好人, 增加了2页纸的poster.值得一提的是, IJCAI是由貌似一个公司的"IJCAI Inc."主办的(当然实际上并不是公司, 实际上是个基金会), 每次会议上要发几个奖, 其中最重要的两个是IJCAI Research Excellence Award 和 Computer Thoughts Award, 前者是终身成就奖, 每次一个人, 基本上是AI的最高奖(有趣的是, 以AI为主业拿图灵奖的6位中, 有2位还没得到这个奖), 后者是奖给35岁以下的青年科学家, 每次一个人. 这两个奖的获奖演说是每次IJCAI的一个重头戏.另外, IJCAI 的 PC member 相当于其他会议的area chair, 权力很大, 因为是由PC member去找 reviewer 来审, 而不象一般会议的PC member其实就是 reviewer. 为了制约这种权力, IJCAI的审稿程序是每篇文章分配2位PC member, primary PC member去找3位reviewer, second PC member 找一位. AAAI (1): 美国人工智能学会AAAI的年会. 是一个很好的会议, 但其档次不稳定, 可以给到1+, 也可以给到1-或者2+, 总的来说我给它"1". 这是因为它的开法完全受IJCAI制约: 每年开, 但如果这一年的IJCAI在北美举行, 那么就停开. 所以, 偶数年里因为没有IJCAI, 它就是最好的AI综合性会议, 但因为号召力毕竟比IJCAI要小一些, 特别是欧洲人捧AAAI场的比IJCAI少得多(其实亚洲人也是), 所以比IJCAI还是要稍弱一点, 基本上在1和1+之间; 在奇数年, 如果IJCAI不在北美, AAAI自然就变成了比IJCAI低一级的会议(1-或2+), 例如2005年既有IJCAI又有AAAI, 两个会议就进行了协调, 使得IJCAI的录用通知时间比AAAI的deadline早那么几天, 这样IJCAI落选的文章可以投往AAAI.在审稿时IJCAI 的 PC chair也在一直催, 说大家一定要快, 因为AAAI那边一直在担心IJCAI的录用通知出晚了AAAI就麻烦了. COLT (1): 这是计算学习理论最好的会议, ACM主办, 每年举行. 计算学习理论基本上可以看成理论计算机科学和机器学习的交叉, 所以这个会被一些人看成是理论计算机科学的会而不是AI的会. 我一个朋友用一句话对它进行了精彩的刻画: "一小群数学家在开会". 因为COLT的领域比较小, 所以每年会议基本上都是那些人. 这里顺便提一件有趣的事, 因为最近国内搞的会议太多太滥, 而且很多会议都是LNCS/LNAI出论文集, LNCS/LNAI基本上已经被搞臭了, 但很不幸的是, LNCS/LNAI中有一些很好的会议, 例如COLT. CVPR (1): 计算机视觉和模式识别方面最好的会议之一, IEEE主办, 每年举行. 虽然题目上有计算机视觉, 但个人认为它的模式识别味道更重一些. 事实上它应该是模式识别最好的会议, 而在计算机视觉方面, 还有ICCV与之相当. IEEE一直有个倾向, 要把会办成"盛会", 历史上已经有些会被它从quality很好的会办成"盛会"了. CVPR搞不好也要走这条路. 这几年录的文章已经不少了. 最近负责CVPR会议的TC的chair发信说, 对这个community来说, 让好人被误杀比被坏人漏网更糟糕, 所以我们是不是要减少好人被误杀的机会啊? 所以我估计明年或者后年的CVPR就要扩招了. ICCV (1): 介绍CVPR的时候说过了, 计算机视觉方面最好的会之一. IEEE主办, 每年举行. ICML (1): 机器学习方面最好的会议之一. 现在是IMLS主办, 每年举行. 参见关于NIPS的介绍. NIPS (1): 神经计算方面最好的会议之一, NIPS主办, 每年举行. 值得注意的是, 这个会每年的举办地都是一样的, 以前是美国丹佛, 现在是加拿大温哥华; 而且它是年底开会,会开完后第2年才出论文集, 也就是说, NIPS'05的论文集是06年出. 会议的名字是"Advances in Neural Information Processing Systems", 所以, 与ICMLECML这样的"标准的"机器学习会议不同, NIPS里有相当一部分神经科学的内容, 和机器学习有一定的距离. 但由于会议的主体内容是机器学习, 或者说与机器学习关系紧密, 所以不少人把NIPS看成是机器学习方面最好的会议之一. 这个会议基本上控制在Michael Jordan的徒子徒孙手中, 所以对Jordan系的人来说, 发NIPS并不是难事, 一些未必很强的工作也能发上去, 但对这个圈子之外的人来说, 想发一篇实在很难, 因为留给"外人"的口子很小. 所以对Jordan系以外的人来说, 发NIPS的难度比ICML更大. 换句话说, ICML比较开放, 小圈子的影响不象NIPS那么大, 所以北美和欧洲人都认, 而NIPS则有些人(特别是一些欧洲人, 包括一些大家)坚决不投稿. 这对会议本身当然并不是好事, 但因为Jordan系很强大, 所以它似乎也不太care. 最近IMLS(国际机器学习学会)改选理事, 有资格提名的人包括近三年在ICMLECMLCOLT发过文章的人, NIPS则被排除在外了. 无论如何, 这是一个非常好的会. ACL (1-): 计算语言学/自然语言处理方面最好的会议, ACL (Association of Computational Linguistics) 主办, 每年开. KR (1-): 知识表示和推理方面最好的会议之一, 实际上也是传统AI(即基于逻辑的AI)最好的会议之一. KR Inc.主办, 现在是偶数昕? SIGIR (1-): 信息检索方面最好的会议, ACM主办, 每年开. 这个会现在小圈子气越来越重. 信息检索应该不算AI, 不过因为这里面用到机器学习越来越多, 最近几年甚至有点机器学习应用会议的味道了, 所以把它也列进来. SIGKDD (1-): 数据挖掘方面最好的会议, ACM主办, 每年开. 这个会议历史比较短, 毕竟, 与其他领域相比,数据挖掘还只是个小弟弟甚至小侄儿. 在几年前还很难把它列在tier-1里面, 一方面是名声远不及其他的top conference响亮, 另一方面是相对容易被录用. 但现在它被列在tier-1应该是毫无疑问的事情了. 另: 参见sir和lucky的介绍. UAI (1-): 名字叫"人工智能中的不确定性", 涉及表示推理学习等很多方面, AUAI (Association of UAI) 主办, 每年开. The Second Class: tier-2的会议列得不全, 我熟悉的领域比较全一些. AAMAS (2+): agent方面最好的会议. 但是现在agent已经是一个一般性的概念, 几乎所有AI有关的会议上都有这方面的内容, 所以AAMAS下降的趋势非常明显. ECCV (2+): 计算机视觉方面仅次于ICCV的会议, 因为这个领域发展很快, 有可能升级到1-去. ECML (2+): 机器学习方面仅次于ICML的会议, 欧洲人极力捧场, 一些人认为它已经是1-了. 我保守一点, 仍然把它放在2+. 
因为机器学习发展很快, 这个会议的reputation上升非常明显. ICDM (2+): 数据挖掘方面仅次于SIGKDD的会议, 目前和SDM相当. 这个会只有5年历史, 上升速度之快非常惊人. 几年前ICDM还比不上PAKDD, 现在已经拉开很大距离了. SDM (2+): 数据挖掘方面仅次于SIGKDD的会议, 目前和ICDM相当. SIAM的底子很厚, 但在CS里面的影响比ACM和IEEE还是要小, SDM眼看着要被ICDM超过了, 但至少目前还是相当的. ICAPS (2): 人工智能规划方面最好的会议, 是由以前的国际和欧洲规划会议合并来的. 因为这个领域逐渐变冷清, 影响比以前已经小了. ICCBR (2): Case-Based Reasoning方面最好的会议. 因为领域不太大, 而且一直半冷不热, 所以总是停留在2上. COLLING (2): 计算语言学/自然语言处理方面仅次于ACL的会, 但与ACL的差距比ICCV-ECCV和ICML-ECML大得多. ECAI (2): 欧洲的人工智能综合型会议, 历史很久, 但因为有IJCAI/AAAI压着, 很难往上升. ALT (2-): 有点象COLT的tier-2版, 但因为搞计算学习理论的人没多少, 做得好的数来数去就那么些group, 基本上到COLT去了, 所以ALT里面有不少并非计算学习理论的内容. EMNLP (2-): 计算语言学/自然语言处理方面一个不错的会. 有些人认为与COLLING相当, 但我觉得它还是要弱一点. ILP (2-): 归纳逻辑程序设计方面最好的会议. 但因为很多其他会议里都有ILP方面的内容, 所以它只能保住2-的位置了. PKDD (2-): 欧洲的数据挖掘会议, 目前在数据挖掘会议里面排第4. 欧洲人很想把它抬起来, 所以这些年一直和ECML一起捆绑着开, 希望能借ECML把它带起来. 但因为ICDM和SDM, 这已经不太可能了. 所以今年的PKDD和ECML虽然还是一起开, 但已经独立审稿了(以前是可以同时投两个会, 作者可以声明优先被哪个会考虑, 如果ECML中不了还可以被PKDD接受). The Third Class: 列得很不全. 另外, 因为AI的相关会议非常多, 所以能列在tier-3也算不错了, 基本上能进到所有AI会议中的前30%吧 ACCV (3+): 亚洲的计算机视觉会议, 在亚太级别的会议里算很好的了. DS (3+): 日本人发起的一个接近数据挖掘的会议. ECIR (3+): 欧洲的信息检索会议, 前几年还只是英国的信息检索会议. ICTAI (3+): IEEE最主要的人工智能会议, 偏应用, 是被IEEE办烂的一个典型. 以前的quality还是不错的, 但是办得越久声誉反倒越差了, 糟糕的是似乎还在继续下滑, 现在其实3+已经不太呆得住了. PAKDD (3+): 亚太数据挖掘会议, 目前在数据挖掘会议里排第5. ICANN (3+): 欧洲的神经网络会议, 从quality来说是神经网络会议中最好的, 但这个领域的人不重视会议,在该领域它的重要性不如IJCNN. AJCAI (3): 澳大利亚的综合型人工智能会议, 在国家/地区级AI会议中算不错的了. CAI (3): 加拿大的综合型人工智能会议, 在国家/地区级AI会议中算不错的了. CEC (3): 进化计算方面最重要的会议之一, 盛会型. IJCNN/CEC/FUZZ-IEEE这三个会议是计算智能或者说软计算方面最重要的会议, 它们经常一起开, 这时就叫WCCI (World Congress on Computational Intelligence). 但这个领域和CS其他分支不太一样, 倒是和其他学科相似, 只重视journal, 不重视会议, 所以录用率经常在85%左右, 所录文章既有quality非常高的论文, 也有入门新手的习作. FUZZ-IEEE (3): 模糊方面最重要的会议, 盛会型, 参见CEC的介绍. GECCO (3): 进化计算方面最重要的会议之一, 与CEC相当,盛会型. ICASSP (3): 语音方面最重要的会议之一, 这个领域的人也不很care会议. ICIP (3): 图像处理方面最著名的会议之一, 盛会型. ICPR (3): 模式识别方面最著名的会议之一, 盛会型. IEA/AIE (3): 人工智能应用会议. 一般的会议提名优秀论文的通常只有几篇文章, 被提名就已经是很高的荣誉了, 这个会很有趣, 每次都搞1、20篇的优秀论文提名, 专门搞几个session做被提名论文报告, 倒是很热闹. IJCNN (3): 神经网络方面最重要的会议, 盛会型, 参见CEC的介绍. IJNLP (3): 计算语言学/自然语言处理方面比较著名的一个会议. PRICAI (3): 亚太综合型人工智能会议, 虽然历史不算短了, 但因为比它好或者相当的综合型会议太多, 所以很难上升. Combined List: 说明: 纯属个人看法, 仅供参考. tier-1的列得较全, tier-2的不太全, tier-3的很不全.同分的按字母序排列. 
不很严谨地说, tier-1是可以令人羡慕的, tier-2是可以令人尊敬的,由于AI的相关会议非常多, 所以能列进tier-3的也是不错的 tier-1: IJCAI (1+): International Joint Conference on Artificial Intelligence AAAI (1): National Conference on Artificial Intelligence COLT (1): Annual Conference on Computational Learning Theory CVPR (1): IEEE International Conference on Computer Vision and Pattern Recognition ICCV (1): IEEE International Conference on Computer Vision ICML (1): International Conference on Machine Learning NIPS (1): Annual Conference on Neural Information Processing Systems ACL (1-): Annual Meeting of the Association for Computational Linguistics KR (1-): International Conference on Principles of Knowledge Representation and Reasoning SIGIR (1-): Annual International ACM SIGIR Conference on Research and Development in Information Retrieval SIGKDD (1-): ACM SIGKDD International Conference on Knowledge Discovery and Data Mining UAI (1-): International Conference on Uncertainty in Artificial Intelligence tier-2: AAMAS (2+): International Joint Conference on Autonomous Agents and Multiagent Systems ECCV (2+): European Conference on Computer Vision ECML (2+): European Conference on Machine Learning ICDM (2+): IEEE International Conference on Data Mining SDM (2+): SIAM International Conference on Data Mining ICAPS (2): International Conference on Automated Planning and Scheduling ICCBR (2): International Conference on Case-Based Reasoning COLLING (2): International Conference on Computational Linguistics ECAI (2): European Conference on Artificial Intelligence ALT (2-): International Conference on Algorithmic Learning Theory EMNLP (2-): Conference on Empirical Methods in Natural Language Processing ILP (2-): International Conference on Inductive Logic Programming PKDD (2-): European Conference on Principles and Practice of Knowledge Discovery in Databases tier-3: ACCV (3+): Asian Conference on Computer Vision DS (3+): International Conference on Discovery Science ECIR (3+): European Conference on IR Research ICTAI (3+): IEEE International Conference on Tools with Artificial Intelligence PAKDD (3+): Pacific-Asia Conference on Knowledge Discovery and Data Mining ICANN (3+): International Conference on Artificial Neural Networks AJCAI (3): Australian Joint Conference on Artificial Intelligence CAI (3): Canadian Conference on Artificial Intelligence CEC (3): IEEE Congress on Evolutionary Computation FUZZ-IEEE (3): IEEE International Conference on Fu Systems GECCO (3): Genetic and Evolutionary Computation Conference ICASSP (3): International Conference on Acoustics, Speech, and Signal Processing ICIP (3): International Conference on Image Processing ICPR (3): International Conference on Pattern Recognition IEA/AIE (3): International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems IJCNN (3): International Joint Conference on Neural Networks IJNLP (3): International Joint Conference on Natural Language Processing PRICAI (3): Pacific-Rim International Conference on Artificial Intelligence
194 次阅读|0 个评论
machine learning introduction
jiangdm 2012-2-6 09:35
书籍: 1. 入门好书《Programming Collective Intelligence》,培养兴趣是最重要的一环,一上来看大部头很容易被吓走的:P 2. Peter Norvig 的 《AI, Modern Approach 2nd》(无争议的领域经典),上次讨论中 Shenli 使我开始看这本书了,建议有选择的看,部头还是太大了,比如我先看里面的概率推理部分的。 3. 《The Elements of Statistical Learning》,数学性比较强,可以做参考了。 4. 《Foundations of Statistical Natural Language Processing》,自然语言处理领域公认经典。 5. 《Data Mining, Concepts and Techniques》,华裔科学家写的书,相当深入浅出。 6. 《Managing Gigabytes》,信息检索经典。 7. 《Information Theory:Inference and Learning Algorithms》,参考书吧,比较深。 相关数学基础(参考,不是拿来通读的): 《矩阵分析》,Roger Horn。矩阵分析领域无争议的经典。 《概率论及其应用》,威廉·费勒。也是极牛的书。 线性代数的? 《Nonlinear Programming, 2nd》非线性规划的参考书。 《Convex Optimization》凸优化的参考书 Bishop Pattern Recognition and Machine Learning 工具: 1. Weka (或知识发现大奖的数据挖掘开源工具集,非常华丽 Tom Michell 主页上 Andrew Moore 周志华 统计机器学习概论 .ppt 强化学习.ppt 机器学习研究进展.ppt 机器学习及其挑战.ppt AdvanceML.ppt 机器学习研究及最新进展.pdf
个人分类: ML|1 次阅读|0 个评论
IEEE Spectrum Seminar – S2 – The brain of a new machine
feiyou 2011-12-21 17:07
Abstract: 介绍由于忆阻器的研究进展,人工智能在HP实验室的研究下能够往前进一大步。本Session从大脑和神经元的基础入手,解释人工智能的现状和终极目标。 并介绍了忆阻器的基本原理和特性。介绍HP实验室的MoNETA计划的结构、原理及研发步骤和最终目标。 slides download here (资料中涉及图表文字均归原作者所有,仅作个人学习参考,部分文献来自网络,未一一列举参考文献)
2173 次阅读|0 个评论
[转载]南京大学周志华对AI会议的评述
hustliaohh 2011-12-5 20:58
The First Class: 今天先谈谈AI里面tier-1的conferences, 其实基本上就是AI里面大家比较公认的top conference. 下面同分的按字母序排列. IJCAI (1+): AI最好的综合性会议, 1969年开始, 每两年开一次, 奇数年开. 因为AI 实在太大, 所以虽然每届基本上能录100多篇(现在已经到200多篇了),但分到每个 领域就没几篇了,象machine learning、computer vision这么大的领域每次大概也 就10篇左右, 所以难度很大. 不过从录用率上来看倒不太低,基本上20%左右, 因为内 行人都会掂掂分量, 没希望的就别浪费reviewer的时间了. 最近中国大陆投往国际会 议的文章象潮水一样, 而且因为国内很少有能自己把关的研究组, 所以很多会议都在 complain说中国的低质量文章严重妨碍了PC的工作效率. 在这种情况下, 估计这几年 国际会议的录用率都会降下去. 另外, 以前的IJCAI是没有poster的, 03年开始, 为了 减少被误杀的好人, 增加了2页纸的poster.值得一提的是, IJCAI是由貌似一个公司 的"IJCAI Inc."主办的(当然实际上并不是公司, 实际上是个基金会), 每次会议上要 发几个奖, 其中最重要的两个是IJCAI Research Excellence Award 和 Computer Thoughts Award, 前者是终身成就奖, 每次一个人, 基本上是AI的最高奖(有趣的 是, 以AI为主业拿图灵奖的6位中, 有2位还没得到这个奖), 后者是奖给35岁以下的 青年科学家, 每次一个人. 这两个奖的获奖演说是每次IJCAI的一个重头戏.另外, IJCAI 的 PC member 相当于其他会议的area chair, 权力很大, 因为是由PC member 去找 reviewer 来审, 而不象一般会议的PC member其实就是 reviewer. 为了制约 这种权力, IJCAI的审稿程序是每篇文章分配2位PC member, primary PC member去找 3位reviewer, second PC member 找一位. AAAI (1): 美国人工智能学会AAAI的年会. 是一个很好的会议, 但其档次不稳定, 可 以给到1+, 也可以给到1-或者2+, 总的来说我给它"1". 这是因为它的开法完全受 IJCAI制约: 每年开, 但如果这一年的IJCAI在北美举行, 那么就停开. 所以, 偶数年 里因为没有IJCAI, 它就是最好的AI综合性会议, 但因为号召力毕竟比IJCAI要小一些, 特别是欧洲人捧AAAI场的比IJCAI少得多(其实亚洲人也是), 所以比IJCAI还是要稍弱 一点, 基本上在1和1+之间; 在奇数年, 如果IJCAI不在北美, AAAI自然就变成了比 IJCAI低一级的会议(1-或2+), 例如2005年既有IJCAI又有AAAI, 两个会议就进行了协 调, 使得IJCAI的录用通知时间比AAAI的deadline早那么几天, 这样IJCAI落选的文章 可以投往AAAI.在审稿时IJCAI 的 PC chair也在一直催, 说大家一定要快, 因为AAAI 那边一直在担心IJCAI的录用通知出晚了AAAI就麻烦了. COLT (1): 这是计算学习理论最好的会议, ACM主办, 每年举行. 计算学习理论基本上 可以看成理论计算机科学和机器学习的交叉, 所以这个会被一些人看成是理论计算 机科学的会而不是AI的会. 我一个朋友用一句话对它进行了精彩的刻画: "一小群数 学家在开会". 因为COLT的领域比较小, 所以每年会议基本上都是那些人. 这里顺便 提一件有趣的事, 因为最近国内搞的会议太多太滥, 而且很多会议都是LNCS/LNAI出 论文集, LNCS/LNAI基本上已经被搞臭了, 但很不幸的是, LNCS/LNAI中有一些很好的 会议, 例如COLT. CVPR (1): 计算机视觉和模式识别方面最好的会议之一, IEEE主办, 每年举行. 虽然题 目上有计算机视觉, 但个人认为它的模式识别味道更重一些. 事实上它应该是模式识 别最好的会议, 而在计算机视觉方面, 还有ICCV与之相当. IEEE一直有个倾向, 要把 会办成"盛会", 历史上已经有些会被它从quality很好的会办成"盛会"了. CVPR搞不好 也要走这条路. 这几年录的文章已经不少了. 最近负责CVPR会议的TC的chair发信 说, 对这个community来说, 让好人被误杀比被坏人漏网更糟糕, 所以我们是不是要减 少好人被误杀的机会啊? 所以我估计明年或者后年的CVPR就要扩招了. ICCV (1): 介绍CVPR的时候说过了, 计算机视觉方面最好的会之一. IEEE主办, 每年举行 . ICML (1): 机器学习方面最好的会议之一. 现在是IMLS主办, 每年举行. 参见关于NIPS的 介绍. NIPS (1): 神经计算方面最好的会议之一, NIPS主办, 每年举行. 值得注意的是, 这个会 每年的举办地都是一样的, 以前是美国丹佛, 现在是加拿大温哥华; 而且它是年底开会, 会开完后第2年才出论文集, 也就是说, NIPS'05的论文集是06年出. 会议的名字是 "Advances in Neural Information Processing Systems", 所以, 与ICML\ECML这样 的"标准的"机器学习会议不同, NIPS里有相当一部分神经科学的内容, 和机器学习有 一定的距离. 但由于会议的主体内容是机器学习, 或者说与机器学习关系紧密, 所以 不少人把NIPS看成是机器学习方面最好的会议之一. 这个会议基本上控制在Michael Jordan的徒子徒孙手中, 所以对Jordan系的人来说, 发NIPS并不是难事, 一些未必很 强的工作也能发上去, 但对这个圈子之外的人来说, 想发一篇实在很难, 因为留给"外 人"的口子很小. 所以对Jordan系以外的人来说, 发NIPS的难度比ICML更大. 换句话说, ICML比较开放, 小圈子的影响不象NIPS那么大, 所以北美和欧洲人都认, 而NIPS则有 些人(特别是一些欧洲人, 包括一些大家)坚决不投稿. 这对会议本身当然并不是好事, 但因为Jordan系很强大, 所以它似乎也不太care. 最近IMLS(国际机器学习学会)改选 理事, 有资格提名的人包括近三年在ICML\ECML\COLT发过文章的人, NIPS则被排除在 外了. 无论如何, 这是一个非常好的会. ACL (1-): 计算语言学/自然语言处理方面最好的会议, ACL (Association of Computational Linguistics) 主办, 每年开. KR (1-): 知识表示和推理方面最好的会议之一, 实际上也是传统AI(即基于逻辑的AI) 最好的会议之一. KR Inc.主办, 现在是偶数年开. SIGIR (1-): 信息检索方面最好的会议, ACM主办, 每年开. 这个会现在小圈子气越来 越重. 信息检索应该不算AI, 不过因为这里面用到机器学习越来越多, 最近几年甚至 有点机器学习应用会议的味道了, 所以把它也列进来. SIGKDD (1-): 数据挖掘方面最好的会议, ACM主办, 每年开. 这个会议历史比较短, 毕竟, 与其他领域相比,数据挖掘还只是个小弟弟甚至小侄儿. 在几年前还很难把它列 在tier-1里面, 一方面是名声远不及其他的top conference响亮, 另一方面是相对容易 被录用. 但现在它被列在tier-1应该是毫无疑问的事情了. 另: 参见sir和lucky的介绍. UAI (1-): 名字叫"人工智能中的不确定性", 涉及表示\推理\学习等很多方面, AUAI (Association of UAI) 主办, 每年开. The Second Class: tier-2的会议列得不全, 我熟悉的领域比较全一些. AAMAS (2+): agent方面最好的会议. 但是现在agent已经是一个一般性的概念, 几乎所有AI有关的会议上都有这方面的内容, 所以AAMAS下降的趋势非常明显. ECCV (2+): 计算机视觉方面仅次于ICCV的会议, 因为这个领域发展很快, 有可能 升级到1-去. 
ECML (2+): 机器学习方面仅次于ICML的会议, 欧洲人极力捧场, 一些人认为它已 经是1-了. 我保守一点, 仍然把它放在2+. 因为机器学习发展很快, 这个会议 的reputation上升非常明显. ICDM (2+): 数据挖掘方面仅次于SIGKDD的会议, 目前和SDM相当. 这个会只有5年 历史, 上升速度之快非常惊人. 几年前ICDM还比不上PAKDD, 现在已经拉开很大 距离了. SDM (2+): 数据挖掘方面仅次于SIGKDD的会议, 目前和ICDM相当. SIAM的底子很厚, 但在CS里面的影响比ACM和IEEE还是要小, SDM眼看着要被ICDM超过了, 但至少 目前还是相当的. ICAPS (2): 人工智能规划方面最好的会议, 是由以前的国际和欧洲规划会议合并 来的. 因为这个领域逐渐变冷清, 影响比以前已经小了. ICCBR (2): Case-Based Reasoning方面最好的会议. 因为领域不太大, 而且一直 半冷不热, 所以总是停留在2上. COLLING (2): 计算语言学/自然语言处理方面仅次于ACL的会, 但与ACL的差距比 ICCV-ECCV和ICML-ECML大得多. ECAI (2): 欧洲的人工智能综合型会议, 历史很久, 但因为有IJCAI/AAAI压着, 很难往上升. ALT (2-): 有点象COLT的tier-2版, 但因为搞计算学习理论的人没多少, 做得好 的数来数去就那么些group, 基本上到COLT去了, 所以ALT里面有不少并非计算 学习理论的内容. EMNLP (2-): 计算语言学/自然语言处理方面一个不错的会. 有些人认为与COLLING 相当, 但我觉得它还是要弱一点. ILP (2-): 归纳逻辑程序设计方面最好的会议. 但因为很多其他会议里都有ILP方面 的内容, 所以它只能保住2-的位置了. PKDD (2-): 欧洲的数据挖掘会议, 目前在数据挖掘会议里面排第4. 欧洲人很想把 它抬起来, 所以这些年一直和ECML一起捆绑着开, 希望能借ECML把它带起来. 但因为ICDM和SDM, 这已经不太可能了. 所以今年的PKDD和ECML虽然还是一起开, 但已经独立审稿了(以前是可以同时投两个会, 作者可以声明优先被哪个会考虑, 如果ECML中不了还可以被PKDD接受). The Third Class: 列得很不全. 另外, 因为AI的相关会议非常多, 所以能列在tier-3也算不错了, 基本上能 进到所有AI会议中的前30%吧 ACCV (3+): 亚洲的计算机视觉会议, 在亚太级别的会议里算很好的了. DS (3+): 日本人发起的一个接近数据挖掘的会议. ECIR (3+): 欧洲的信息检索会议, 前几年还只是英国的信息检索会议. ICTAI (3+): IEEE最主要的人工智能会议, 偏应用, 是被IEEE办烂的一个典型. 以前的 quality还是不错的, 但是办得越久声誉反倒越差了, 糟糕的是似乎还在继续下滑, 现在 其实3+已经不太呆得住了. PAKDD (3+): 亚太数据挖掘会议, 目前在数据挖掘会议里排第5. ICANN (3+): 欧洲的神经网络会议, 从quality来说是神经网络会议中最好的, 但这个领域 的人不重视会议,在该领域它的重要性不如IJCNN. AJCAI (3): 澳大利亚的综合型人工智能会议, 在国家/地区级AI会议中算不错的了. CAI (3): 加拿大的综合型人工智能会议, 在国家/地区级AI会议中算不错的了. CEC (3): 进化计算方面最重要的会议之一, 盛会型. IJCNN/CEC/FUZZ-IEEE这三个会议是 计算智能或者说软计算方面最重要的会议, 它们经常一起开, 这时就叫WCCI (World Congress on Computational Intelligence). 但这个领域和CS其他分支不太一样, 倒是和 其他学科相似, 只重视journal, 不重视会议, 所以录用率经常在85%左右, 所录文章既有 quality非常高的论文, 也有入门新手的习作. FUZZ-IEEE (3): 模糊方面最重要的会议, 盛会型, 参见CEC的介绍. GECCO (3): 进化计算方面最重要的会议之一, 与CEC相当,盛会型. ICASSP (3): 语音方面最重要的会议之一, 这个领域的人也不很care会议. ICIP (3): 图像处理方面最著名的会议之一, 盛会型. ICPR (3): 模式识别方面最著名的会议之一, 盛会型. IEA/AIE (3): 人工智能应用会议. 一般的会议提名优秀论文的通常只有几篇文章, 被提名 就已经是很高的荣誉了, 这个会很有趣, 每次都搞1、20篇的优秀论文提名, 专门搞几个 session做被提名论文报告, 倒是很热闹. IJCNN (3): 神经网络方面最重要的会议, 盛会型, 参见CEC的介绍. IJNLP (3): 计算语言学/自然语言处理方面比较著名的一个会议. PRICAI (3): 亚太综合型人工智能会议, 虽然历史不算短了, 但因为比它好或者相当的综 合型会议太多, 所以很难上升. Combined List: 说明: 纯属个人看法, 仅供参考. tier-1的列得较全, tier-2的不太全, tier-3的很不全 . 同分的按字母序排列. 
不很严谨地说, tier-1是可以令人羡慕的, tier-2是可以令 人尊敬的,由于AI的相关会议非常多, 所以能列进tier-3的也是不错的 tier-1: IJCAI (1+): International Joint Conference on Artificial Intelligence AAAI (1): National Conference on Artificial Intelligence COLT (1): Annual Conference on Computational Learning Theory CVPR (1): IEEE International Conference on Computer Vision and Pattern Recognition ICCV (1): IEEE International Conference on Computer Vision ICML (1): International Conference on Machine Learning NIPS (1): Annual Conference on Neural Information Processing Systems ACL (1-): Annual Meeting of the Association for Computational Linguistics KR (1-): International Conference on Principles of Knowledge Representation and Reasoning SIGIR (1-): Annual International ACM SIGIR Conference on Research and Development in Information Retrieval SIGKDD (1-): ACM SIGKDD International Conference on Knowledge Discovery and Data Mining UAI (1-): International Conference on Uncertainty in Artificial Intelligence tier-2: AAMAS (2+): International Joint Conference on Autonomous Agents and Multiagent Systems ECCV (2+): European Conference on Computer Vision ECML (2+): European Conference on Machine Learning ICDM (2+): IEEE International Conference on Data Mining SDM (2+): SIAM International Conference on Data Mining ICAPS (2): International Conference on Automated Planning and Scheduling ICCBR (2): International Conference on Case-Based Reasoning COLLING (2): International Conference on Computational Linguistics ECAI (2): European Conference on Artificial Intelligence ALT (2-): International Conference on Algorithmic Learning Theory EMNLP (2-): Conference on Empirical Methods in Natural Language Processing ILP (2-): International Conference on Inductive Logic Programming PKDD (2-): European Conference on Principles and Practice of Knowledge Discovery in Databases tier-3: ACCV (3+): Asian Conference on Computer Vision DS (3+): International Conference on Discovery Science ECIR (3+): European Conference on IR Research ICTAI (3+): IEEE International Conference on Tools with Artificial Intelligence PAKDD (3+): Pacific-Asia Conference on Knowledge Discovery and Data Mining ICANN (3+): International Conference on Artificial Neural Networks AJCAI (3): Australian Joint Conference on Artificial Intelligence CAI (3): Canadian Conference on Artificial Intelligence CEC (3): IEEE Congress on Evolutionary Computation FUZZ-IEEE (3): IEEE International Conference on Fu Systems GECCO (3): Genetic and Evolutionary Computation Conference ICASSP (3): International Conference on Acoustics, Speech, and Signal Processing ICIP (3): International Conference on Image Processing ICPR (3): International Conference on Pattern Recognition IEA/AIE (3): International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems IJCNN (3): International Joint Conference on Neural Networks IJNLP (3): International Joint Conference on Natural Language Processing PRICAI (3): Pacific-Rim International Conference on Artificial Intelligence 关于List的补充说明: 列list只是为了帮助新人熟悉领域, 给出的评分或等级都是个人意见, 仅供参考. 特别要 说明的是: 1. tier-1 conference上的文章并不一定比tier-3的好, 只能说前者的平均水准更高. 2. 研究工作的好坏不是以它发表在哪儿来决定的, 发表在高档次的地方只是为了让工作更 容易被同行注意到. tier-3会议上发表1篇被引用10次的文章可能比在tier-1会议上发表1 0篇被引用0次的文章更有价值. 所以, 数top会议文章数并没有太大意义, 重要的是同行的 评价和认可程度. 3. 很多经典工作并不是发表在高档次的发表源上, 有不少经典工作甚至是发表在很低档的 发表源上. 原因很多, 就不细说了. 4. 会议毕竟是会议, 由于审稿时间紧, 错杀好人和漏过坏人的情况比比皆是, 更何况还要 考虑到有不少刚开始做研究的学生在代老板审稿. 5. 会议的reputation并不是一成不变的,新会议可能一开始没什么声誉,但过几年后就野 鸡变凤凰,老会议可能原来声誉很好,但越来越往下滑. 6. 只有计算机科学才重视会议论文, 其他学科并不把会议当回事. 
但在计算机科学中也有 不太重视会议的分支. 7. Politics无所不在. 你老板是谁, 你在哪个研究组, 你在哪个单位, 这些简单的因素都 可能造成决定性的影响. 换言之, 不同环境的人发表的难度是不一样的. 了解到这一点后 , 你可能会对high-level发表源上来自low-level单位名不见经传作者的文章特别注意(例 如如果计算机学报上发表了平顶山铁道电子信息科技学院的作者的文章,我一定会仔细读 ). 8. 评价体系有巨大的影响. 不管是在哪儿谋生的学者, 都需要在一定程度上去迎合评价体 系, 否则连生路都没有了, 还谈什么做研究. 以国内来说, 由于评价体系只重视journal, 有一些工作做得很出色的学者甚至从来不投会议. 另外, 经费也有巨大的制约作用. 国外 很多好的研究组往往是重要会议都有文章. 但国内是不行的, 档次低一些的会议还可以投 了只交注册费不开会, 档次高的会议不去做报告会有很大的负面影响, 所以只能投很少的 会议. 这是在国内做CS研究最不利的地方. 我的一个猜想:人民币升值对国内CS研究会有 不小的促进作用(当然, 人民币升值对整个中国来说利大于弊还是弊大于利很难说). 9. ... ... 最近,实验室的大牛Gawd也写了一个rank,说的也很好,并且反映了这些会议的一些最新 的变化。 1. NIPS:心中的痛,同样也是心中的梦想。尽管屡投屡被挫,但是要越挫越勇。不得不承 认,NIPS上的文章质量上还是非常上乘的。对于圈外人来说,NIPS总是很难。不管其匿名 与否,圈内的人一看便可猜个大概文章究竟出自何方神圣笔下。难为了我们这些无名小卒 。 1. UAI:同样也是梦想,文章质量一样的上乘,同样的难。 3. ICML:虽说去年国内人士灌水无数,但这并不能削弱ICML文章的质量和影响力,从另外 一个侧面讲,这正是说明了国内ML的进步神速。 3. AISTATS:别说他的文章接收率高,但和nips一样,想要进去还真得颇下一番功夫。看 看zoubin等人对此会如此钟爱便可知晓该会议文章质量究竟如何。当然,该会议每每选心 旷神怡之处做为栖身之地,也成为其吸引大牛竞相前往的得力手段。 5. IJCAI:且不论从前的IJCAI如何,上次在印度开的IJCAI可真是有点让人大跌眼镜,or al和poster文章数量之多都是一个里程碑。然而,并不能否定IJCAI在权内的影响力和号召 力,对于做AI的人来说,IJCAI应该还是一个最主要的会议。只是如果照上届那般海收,恐 怕文章质量无法保障。不知下届如何。 6. KDD:之所以把他写在IJCAI之后是因为他毕竟是从IJCAI上派生出来的,而其直至今日 的发展速度不可谓不让人赞叹。然而我认为,一个会如果不匿名,那么其文章质量必然得 不到大的飞跃。KDD里面的猫腻多了去了,不必熬述。希望可以尽快匿名。 7. ICCV:对于做CV的人来说,一定不同意把他排在第7位,然而就去年的iccv来看,在上 面发文的做方法的人比比皆是。对于做方法的人来说,这个位置我想比较公平。具体原由 大家只要看看上面做方法的文章便知。 7. CVPR:无法比较它与ICCV孰优孰劣,也许大部分人都认为ICCV稍胜一筹。我确认为不然 ,就去年ICCV上所发表的CVPR拒掉文章的数量便可见一斑。而且CVPR文章的投稿数量和接 收文章数都是与日俱增,不知是否在Miami可以看到上千人开会的场面。 7. SIGIR:做方法的人要在SIGIR上篇文章还是挺难的,因为实验的要求比较高。当然,M SRA除外。 10. AAAI:将其放在此处并不是从AI的角度,而还是从ML相关的方法的角度。该会议上出 现的ML以及PR文章的影响力不如上面8个会,不信可以去查citeseer。 11. SIGMM:当然,不可从做系统的同学的角度如此评价此会议。但是对于就是用ML的方法 做了MM应用的人来说,此会议也只能在这个地方了。 11. ECML:尽管上面有ICML押着,不过文章质量一直很坚挺,这点从接收率也可见一斑。 11. SDM:尽管老吴一直力挺ICDM,但我认为其文章质量仍无法与SDM相比。从参会的人就 可以看到。 11. ECCV:亏就亏在E字头了,不过文章质量不错。另外我发现上面发表的有关纯粹方法的 文章并不多。 15. ICDM:尽管一直用接收率标榜自己,然而一个最主要的原因是基数大。如果文章质量 没有硬保障,接收率也只是空谈。 上面也只是列出了我感兴趣的一些会议,从做方法的人的角度来看的rank。一家之谈,仅 供消遣。
个人分类: 科研道路|6315 次阅读|0 个评论
why hybrid? on machine learning vs. hand-coded rules in NLP
热度 1 liwei999 2011-10-8 04:00
Before we start discussing the topic of a hybrid NLP (Natural Language Processing) system, let us look at the concept of hybrid from our life experiences. I drove a classic Camry for years and had never thought of switching to another brand because, as a vehicle, there was really nothing to complain about. Yes, the style is old, but I am getting old too, so who beats whom? Then one day a few years ago we needed to buy a new car to retire my damaged Camry. My daughter suggested a hybrid, following the trend of going green. So I have ended up driving a Prius ever since and have fallen in love with it. It is quiet, with bluetooth and line-in, ideal for enjoying music from my iPhone. It has low emissions, and I can finally say goodbye to smog tests. It saves at least a third on gas. We could have gained all these benefits by purchasing an expensive all-electric car, but I want the same feeling of power on the freeway and dislike having to charge the car too frequently. Hybrid gets the best of both worlds for me, and is not that much more expensive.

Now back to NLP. There are two major approaches to NLP, namely machine learning and grammar engineering (hand-crafted rule systems). As mentioned in previous posts, each has its own strengths and limitations, as summarized below.

In general, a rule system is good at capturing a specific language phenomenon (the trees) while machine learning is good at representing the general picture of the phenomena (the forest). As a result, it is easier for rule systems to reach high precision, but it takes a long time to develop enough rules to gradually raise recall. Machine learning, on the other hand, has much higher recall, usually with a compromise in precision or a precision ceiling.

Machine learning is good at simple, clear and coarse-grained tasks, while rules are good at fine-grained tasks. One example is sentiment extraction. The coarse-grained task there is sentiment classification of documents (thumbs-up vs. thumbs-down), which can be achieved quickly by a learning system. The fine-grained task of sentiment extraction involves extracting sentiment details and the related actionable insights, including associating the sentiment with an object, differentiating positive/negative emotions from positive/negative behaviors, capturing the aspects or features of the object involved, decoding the motivation or reasons behind the sentiment, etc. For sophisticated tasks of extracting such details and actionable insights, rules are a better fit.

The strength of machine learning lies in its retraining ability. In theory, the algorithm, once developed and debugged, remains stable, and improvement of a learning system can be expected once a larger and better-quality corpus is used for retraining (in practice, retraining is not always easy: I have seen famous learning systems deployed at client sites for years without being retrained, for various reasons). Rules, on the other hand, need to be manually crafted and enhanced.

Supervised machine learning is more mature for applications, but it requires a large labelled corpus. Unsupervised machine learning only needs a raw corpus, but it is research-oriented and riskier in applications. A promising middle ground is semi-supervised learning, which only needs a small labelled corpus as seeds to guide the learning. We can also use rules to generate the initial corpus or seeds for semi-supervised learning. Both approaches involve knowledge bottlenecks. 
The bottleneck for rule systems is skilled labor: they require linguists or knowledge engineers to manually encode each rule, much like a software engineer's daily work of coding. The biggest challenge for machine learning is the sparse data problem, which requires a very large labelled corpus to overcome. The knowledge bottleneck for supervised machine learning is therefore the labor required for labeling such a large corpus.

We can build a system that combines the two approaches so that they complement each other. There are different ways of combining them in a hybrid system. One example is the practice we use in our product, where the resulting insights are structured in a back-off model: high-precision results from rules are ranked higher than the medium-precision results returned by statistical systems or machine learning. This helps the system reach a configurable balance between precision and recall.

When labelled data are available (e.g. the community has already built the corpus, or, for some tasks, the data exist in the public domain; sentiment classification of movie reviews, for instance, can use review data with users' feedback on a 5-star scale), and when the task is simple and clearly defined, using machine learning will greatly speed up the development of a capability.

Not every task is suitable for both approaches. (Note that suitability is in the eye of the beholder: I have seen many passionate ML specialists willing to try everything in ML irrespective of the nature of the task; as the old saying goes, when you have a hammer, everything looks like a nail.) For example, machine learning is good at document classification, while rules are mostly powerless for such tasks. But for complicated tasks such as deep parsing, rules constructed by linguists usually achieve better performance than machine learning. Rules also perform better for tasks which have clear patterns, for example, identifying data items like time, weight, length, money, address, etc. This is because clear patterns can be directly encoded in rules to be logically complete in coverage, while machine learning based on samples still faces a sparse data challenge. When designing a system, in addition to using a hybrid approach for some tasks, for other tasks we should choose the most suitable approach depending on the nature of the task.

Other aspects of comparison between the two approaches involve modularization and debugging in industrial development. A rule system can be structured as a pipeline of modules fairly easily, so that a complicated task is decomposed into a series of subtasks handled by modules at different levels. In such an architecture, a reported bug is easy to localize and fix by adjusting the rules in the related module. Machine learning systems are based on a model learned from the corpus. The model itself, once learned, is often like a black box (even when the model is represented by a list of symbolic rules as the result of learning, it is risky to manually mess with those rules to fix a data quality bug). Bugs are supposed to be fixable by retraining the model on an enhanced corpus and/or with adjusted features. But retraining is a complicated process which may or may not solve the problem. It is difficult to localize and directly handle specific reported bugs in machine learning.

To conclude, due to the complementary nature of the pros and cons of the two basic approaches to NLP, a hybrid system involving both approaches is desirable and worth more attention and exploration. 
There are different ways of combining the two approaches in a system, including a back-off model that uses rules for precision and learning for recall, and semi-supervised learning that uses high-precision rules to generate the initial corpus or "seeds", etc. Related posts: Comparison of Pros and Cons of Two NLP Approaches Is Google ranking based on machine learning ? 《立委随笔:语言自动分析的两个路子》 《立委随笔:机器学习和自然语言处理》 【置顶:立委科学网博客NLP博文一览(定期更新版)】
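To make the back-off combination described above concrete, here is a minimal, hypothetical sketch (not the author's actual product code): a rule stage emits high-confidence labels for clear patterns, and a statistical classifier fills in the recall for everything the rules do not cover. The rule patterns, the StatisticalClassifier stand-in and the labels are all invented for illustration.

import re

# A few hand-crafted, high-precision sentiment rules (illustrative only).
RULES = [
    (re.compile(r"\b(love|excellent|fantastic)\b", re.I), "positive"),
    (re.compile(r"\b(hate|terrible|awful)\b", re.I), "negative"),
]

class StatisticalClassifier:
    """Stand-in for any trained model (Naive Bayes, logistic regression, ...)."""
    def predict(self, text):
        # A trained model would return (label, confidence); here we fake it.
        return ("positive", 0.55)

def hybrid_sentiment(text, model):
    # Stage 1: rules first -- high precision, limited recall.
    for pattern, label in RULES:
        if pattern.search(text):
            return {"label": label, "confidence": 0.95, "source": "rule"}
    # Stage 2: back off to the statistical model -- broad recall, lower precision.
    label, conf = model.predict(text)
    return {"label": label, "confidence": conf, "source": "model"}

if __name__ == "__main__":
    model = StatisticalClassifier()
    print(hybrid_sentiment("I love the battery life", model))   # handled by a rule
    print(hybrid_sentiment("It is okay, I guess", model))       # falls back to the model

Because rule hits carry a higher confidence than model outputs, downstream consumers can rank or threshold results to trade precision against recall, which is the configurable balance the post refers to.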
个人分类: 立委科普|8734 次阅读|1 个评论
review: 马尔可夫逻辑网络研究
jiangdm 2011-9-24 22:51
马尔可夫逻辑网络研究 徐从富, 郝春亮, 苏保君, 楼俊杰 软件学报 , 2011 摘要: 马尔可夫逻辑网络是将马尔可夫网络与一阶逻辑相结合的一种统计关系学习模型,在自然语言处理、复 杂网络、信息抽取等领域都有重要的应用前景.较为全面、深入地总结了马尔可夫逻辑网络的理论模型、推理、权重和结构学习,最后指出了马尔可夫逻辑网络未来的主要研究方向. 关键词: Markov 逻辑网;统计关系学习;概率图模型;推理;权重学习;结构学习 the hard work of AI: 如何有效地处理复杂性和不确定性等问题? -- 统计关系学习(statistical relational learning, SRL) -- 概率图模型(probabilistic graphical model,PGM) 统计关系学习: 通过集成关系/逻辑表示、概率推理、不确定性处理、机器学习和数据挖掘等方法,以获取关系数据中的似然模型 概率图模型: 一种通用化的不确定性知识表示和处理方法 -- 贝叶斯网络(Bayesian networks) -- 隐马尔可夫模型(hidden Markov model) -- 马尔可夫决策过程(Markov decision process) -- 神经网络(neural network) idea: 统计关系学习(尤其是关系/逻辑表示) + 概率图模型 马尔可夫逻辑网络(Markov logic networks)= Markov网 + 一阶逻辑 Markov 网常用近似推理算法: 1 Markov 逻辑网 1.1 Markov网和一阶逻辑 Markov 网: Markov 随机场(Markov random field,MRF) 1.2 Markov逻辑网的定义和示例 定义: Markov 逻辑网 2 Markov 逻辑网的推理 -- 概率图模型推理的基本问题: 计算边缘概率、条件概率以及对于最大可能存在状态的推理 -- Markov 逻辑网推理: 生成的闭Markov 网 2.1 最大可能性问题 MaxWalkSAT 算法 LazySAT 算法 2.2 边缘概率和条件概率 概率图模型一重要推理形式: 计算边缘概率和条件概率,通常采用MCMC 算法、BP 算法等 3 Markov 逻辑网的学习 3.1 参数学习 3.1.1 伪最大似然估计 3.1.2 判别训练 训练Markov 逻辑网权重的高效算法: VP(voted perceptron)算法、CD(contrastive divergence)算法 3.2 结构学习 3.2.1 评价标准 结构学习两个难题: 一是如何搜索潜在的结构; 二是如何为搜索到的结构建立起一个评价标准,即如何筛选出最优结构. 3.2.2 自顶而下的结构学习 4 算法比较和分析 4.1 与基于Bayesian网的统计关系学习算法比较 基于 Bayesian 网的SRL 算法: 传统Bayesian 网的基础上进行扩展的SRL 方法 4.2 与基于随机文法的统计关系学习算法比较 4.3 与基于HMM的统计关系学习算法比较 5 Markov 逻辑网的应用 5.1 应用概况 5.2 应用举例 6 述评 Markov 逻辑网: 1) 将传统的一阶谓词逻辑与当前主流的统计学习方法有机地结合起来 2) 填补了AI 等领域中存在的高层与底层之间的巨大鸿沟. -- 一阶谓词逻辑更适用于高层知识的表示与推理 -- 基于概率统计的机器学习方法则擅长于对底层数据进行统计学习. Open problem: (1) 增强算法的学习能力,使其可以从缺值数据中学习; (2) 提高真值闭从句的计算速度,解决结构学习算法效率的瓶颈问题; (3) 从一阶逻辑和Markov 网这两个方面完善Markov 逻辑网的理论; (4) 增强Markov 逻辑网模型的实用性,从而更好地解决实际应用问题. 马尔可夫链蒙特卡洛(Markov chain Monte Carlo,简称MCMC)方法 信念传播算法 推理 学习 个人点评: 没看懂,可关注,问其与 Bayesian Network什么关系呢? 马尔可夫逻辑网络研究.pdf
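The object surveyed here, a Markov logic network, attaches a weight w_i to each first-order formula F_i and defines P(X=x) = (1/Z) * exp(sum_i w_i * n_i(x)), where n_i(x) counts the true groundings of F_i in world x. The toy sketch below is not from the paper: the constants, formula and weight are made up. It grounds the classic Smokes/Friends rule over two constants and computes exact world probabilities by brute-force enumeration, which is only viable for tiny domains; real systems use MaxWalkSAT or MCMC-style inference, as the survey describes.

from itertools import product
from math import exp

people = ["A", "B"]
# Ground atoms: Smokes(p) for each person, Friends(p, q) for each ordered pair.
atoms = [("S", p) for p in people] + [("F", p, q) for p in people for q in people]

W_RULE = 1.5   # weight of: Friends(x,y) AND Smokes(x) => Smokes(y)  (made-up value)

def n_true_groundings(world):
    """Count groundings of the implication that are satisfied in this world."""
    count = 0
    for x, y in product(people, repeat=2):
        friends, sx, sy = world[("F", x, y)], world[("S", x)], world[("S", y)]
        # An implication is true unless its body holds and its head fails.
        if not (friends and sx and not sy):
            count += 1
    return count

# Enumerate all 2^|atoms| possible worlds and their unnormalized weights.
worlds = []
for bits in product([False, True], repeat=len(atoms)):
    world = dict(zip(atoms, bits))
    worlds.append((world, exp(W_RULE * n_true_groundings(world))))

Z = sum(w for _, w in worlds)  # partition function

# Example query: marginal probability that B smokes.
p_b_smokes = sum(w for world, w in worlds if world[("S", "B")]) / Z
print("P(Smokes(B)) =", round(p_b_smokes, 4))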
个人分类: AI & ML|7278 次阅读|0 个评论
review: 李群机器学习研究综述
热度 1 jiangdm 2011-9-24 21:56
《李群机器学习研究综述》,李凡长 何书萍 钱旭培 计算机学报,2010 摘 要   文中简述了李群机器学习的相关研究内容,包括李群机器学习的概念、公理假设、代数学习模型、几何学习模型、Dynkin图的几何学习算法、量子群、辛群分类器的设计、轨道生成学习算法等. 关键词:  李群机器学习;公理假设;李群;分类器 李群机器学习(Lie Group Machine Learning, LML) 李群机器学 vs 流形学习 个人点评: 文章作为综述,弱了。文章层次和分类不清楚,首先摘要没写好。 李群机器学习研究综述.pdf
个人分类: AI & ML|5007 次阅读|1 个评论
review: ELIQoS: 一种高效节能、与位置无关的传感器网络服务质量
jiangdm 2011-9-19 23:26
ELIQoS: 一种高效节能、与位置无关的传感器网络服务质量协议 (ELIQoS: an energy-efficient, location-independent quality-of-service protocol for sensor networks). 毛莺池, 龚海刚, 刘明, 陈道蓄, 谢立. 计算机研究与发展, 2006. Abstract: How to cover enough of the monitored region while prolonging the network lifetime is one of the most important problems facing wireless sensor networks. A widely used strategy is to select a set of working nodes that satisfies the quality of service (i.e., the coverage ratio) expected by the application, while turning off the remaining redundant nodes. For a randomly deployed network, given the size of the monitored region and the sensing range of the nodes, the paper analyzes the mathematical relationship between the expected quality of service and the number of working nodes required, without any node location information. On this basis it proposes ELIQoS, an energy-efficient, location-independent quality-of-service protocol for sensor networks, which selects the minimum number of working nodes according to their remaining energy so as to meet the expected quality of service. Experimental results show that ELIQoS not only provides the expected quality of service effectively, but also reduces energy consumption and balances the energy load. Keywords: wireless sensor network; quality of service; coverage; energy efficiency; state scheduling. ELIQoS 一种高效节能、与位置无关的传感器网络服务质量协议.pdf
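The location-free relationship the abstract refers to can be illustrated with a standard probabilistic coverage argument (my own reconstruction, not necessarily the exact derivation in the paper): if n nodes are placed uniformly at random in a region of area A and each node senses an area a, a given point is covered with probability 1 - (1 - a/A)^n, so the number of working nodes needed for an expected coverage ratio q is roughly n >= ln(1 - q) / ln(1 - a/A), ignoring boundary effects.

import math

def nodes_needed(coverage_q, region_area, sensing_radius):
    """Estimated number of uniformly random nodes needed so that the expected
    fraction of the region covered is at least coverage_q (boundary effects ignored)."""
    a = math.pi * sensing_radius ** 2          # sensing area of one node
    p_miss_one = 1.0 - a / region_area         # a fixed point misses one random node
    return math.ceil(math.log(1.0 - coverage_q) / math.log(p_miss_one))

if __name__ == "__main__":
    # Example (hypothetical numbers): 500 m x 500 m field, 20 m sensing radius, 95% coverage.
    print(nodes_needed(0.95, 500 * 500, 20))

Note that this estimate needs only the region size and the sensing range, not node positions, which matches the protocol's location-independent premise.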
个人分类: Network|1 次阅读|0 个评论
reading "machine learning:concepts and techniques"
barsterd 2011-8-17 16:38
I. Basic steps of data mining: 1. data cleaning; 2. data integration; 3. data selection; 4. data transformation; 5. data mining; 6. pattern evaluation; 7. knowledge presentation.
II. Types of knowledge: characterization, association, classification, clustering, evolution analysis.
III. Data mining tasks by the kind of value predicted: 1. classification, for predicting discrete values; 2. prediction, for predicting continuous values.
IV. Difference between ML and DM: in contrast to machine learning, the emphasis of data mining lies on the discovery of previously unknown patterns, as opposed to generalizing known patterns to new data.
V. Differences between ID3 and C4.5: C4.5 made a number of improvements to ID3. Some of these are:
Handling both continuous and discrete attributes: to handle a continuous attribute, C4.5 creates a threshold and then splits the examples into those whose attribute value is above the threshold and those that are less than or equal to it (a sketch of this threshold search is given below).
Handling training data with missing attribute values: C4.5 allows attribute values to be marked as "?" for missing; missing attribute values are simply not used in gain and entropy calculations.
Handling attributes with differing costs.
Pruning trees after creation: C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help, replacing them with leaf nodes.
PS: C5.0 performs better than C4.5. To be continued...
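To illustrate the continuous-attribute handling noted above, here is a small, self-contained sketch (my own illustration, not code from the book): it picks the binary split threshold for one numeric attribute by maximizing information gain, which is the core of what C4.5 does before it additionally applies its gain-ratio correction. The toy temperature data are invented.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information gain) for the best binary split
    value <= t vs. value > t, in the style of C4.5's continuous handling."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # candidate thresholds lie between distinct attribute values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) / len(pairs)) * entropy(left) \
                    - (len(right) / len(pairs)) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

if __name__ == "__main__":
    # Toy attribute (e.g. temperature) and class labels.
    temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
    play  = ["y", "n", "y", "y", "y", "n", "n", "y", "n", "y", "y", "n"]
    print(best_threshold(temps, play))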
个人分类: 论文总结|0 个评论
review: Knowledge Discovery in Databases: An Overview
jiangdm 2011-8-4 15:52
《Knowledge Discovery in Databases: An Overview》, William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus, AAAI ,1992 Abstract: After a decade of fundamental interdisciplinary research in machine learning, the spadework in this field has been done; the 1990s should see the widespread exploitation of knowledge discovery as an aid to assembling knowledge bases. The contributors to the AAAI Press book \emph{Knowledge Discovery in Databases} were excited at the potential benefits of this research. The editors hope that some of this excitement will communicate itself to AI Magazine readers of this article the goal of this article: This article presents an overview of the state of the art in research on knowledge discovery in databases. We analyze Knowledge Discovery and define it as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. We then compare and contrast database, machine learning, and other approaches to discovery in data. We present a framework for knowledge discovery and examine problems in dealing with large, noisy databases, the use of domain knowledge, the role of the user in the discovery process, discovery methods, and the form and uses of discovered knowledge. We also discuss application issues, including the variety of existing applications and propriety of discovery in social databases. We present criteria for selecting an application in a corporate environment. In conclusion, we argue that discovery in databases is both feasible and practical and outline directions for future research, which include better use of domain knowledge, efficient and incremental algorithms, interactive systems, and integration on multiple levels. 个人点评: 一篇老些的经典数据挖掘综述,个人认为本文两个入脚点:一是Machine Learning (Table 1,2),二是文中 Figure 1 Knowledge Discovery in Databases Overview.pdf beamer_Knowledge_Discovery_Database_Overview.pdf beamer_Knowledge_Discovery_Database_Overview.tex
个人分类: AI & ML|1 次阅读|0 个评论
[转载]Slides of Machine Learning Summer School
timy 2011-6-17 10:36
From: http://mlss2011.comp.nus.edu.sg/index.php?n=Site.Slides
MLSS 2011 Machine Learning Summer School, 13-17 June 2011, Singapore
Slides (speaker, topic, format):
Chiranjib Bhattacharyya, Kernel Methods (pdf)
Wray Buntine, Introduction to Machine Learning (pdf)
Zoubin Ghahramani, Gaussian Processes; Graphical Model Structure Learning (Part 1 pdf, Part 2 pdf, Part 3 pdf)
Stephen Gould, Markov Random Fields for Computer Vision (Part 1 pdf, Part 2 pdf, Part 3 pdf)
Marko Grobelnik, How We Represent Text? ...From Characters to Logic (pptx)
David Hardoon, Multi-Source Learning: Theory and Application (pdf)
Mark Johnson, Probabilistic Models for Computational Linguistics (Part 1 pdf, Part 2 pdf, Part 3 pdf)
Wee Sun Lee, Partially Observable Markov Decision Processes (pdf, pptx)
Hang Li, Learning to Rank (pdf)
Sinno Pan and Qiang Yang, Transfer Learning (Part 1 pptx, Part 2 pdf)
Tomi Silander, Introduction to Graphical Models (pdf)
Yee Whye Teh, Bayesian Nonparametrics (pdf)
Ivor Tsang, Feature Selection using Structural SVM and its Applications (pdf)
Max Welling, Learning in Markov Random Fields (pdf, pptx)
个人分类: 机器学习|4265 次阅读|0 个评论
[转载]Statistical Machine Translation 基于统计的机器翻译系统及原理
geneculture 2011-6-15 09:36
Statistical Machine Translation
Abstract: We have been developing a statistical machine translation system for speech-to-speech translation. We focus our research on the text-to-text translation task for now, but we will include speech-to-speech translation among our research topics soon. We are interested in building a translation model, decoding a word graph, and combining a statistical machine translation system with a speech recognizer. http://isoft.postech.ac.kr/research/SMT/smt.html
Statistical Machine Translation
Input: the SMT system gets a foreign sentence as input.
Output: the SMT system generates a native sentence which is a translation of the input.
The Language Model provides the probability of an arbitrary word sequence. The Translation Model provides the probabilities of possible translation pairs. The Decoding Algorithm is a graph search algorithm that finds the best path on a word graph.
Decoding process: the decoder is the core component of the SMT system. The decoder gets possible partial translations from the translation model, then selects and re-arranges them to make the best translation. Initialize: create a small partial model for caching and pre-calculate the future cost. A hypothesis is a partial translation generated by applying a series of translation options. The decoding process is an iteration of two tasks: choosing a hypothesis and expanding the hypothesis. The process terminates when there is no remaining hypothesis to expand.
Speech-to-Speech Machine Translation: speech-to-speech machine translation can be achieved by cascading three independent components: an ASR, the SMT system and a TTS system. That is, the output of the ASR becomes the input of the SMT system, and the output of the SMT system becomes the input of the TTS system. We use the cascading approach now, but we are interested in a joint model which combines the ASR and the SMT decoder. http://isoft.postech.ac.kr/research/SMT/smt.html
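The hypothesis-expansion loop described above can be illustrated with a tiny monotone phrase-based decoder. This is a didactic sketch only: the phrase table, the language-model stand-in and the beam size are all made up, and real decoders also handle reordering and future-cost estimation.

import math

# Toy phrase table: foreign phrase -> [(english phrase, translation-model log score)]
PHRASES = {
    ("la",): [("the", -0.1)],
    ("casa",): [("house", -0.2), ("home", -0.5)],
    ("la", "casa"): [("the house", -0.15)],
}

def lm_score(words):
    # Stand-in for a real n-gram language model: mildly prefer shorter outputs.
    return -0.05 * len(words)

def decode(source, beam_size=5):
    """Monotone beam-search decoding: a hypothesis = (next source position,
    english words so far, accumulated score)."""
    hyps = [(0, (), 0.0)]
    completed = []
    while hyps:
        new_hyps = []
        for pos, eng, score in hyps:
            if pos == len(source):
                completed.append((eng, score + lm_score(eng)))
                continue
            # Expand: try every phrase that matches the source at this position.
            for length in range(1, len(source) - pos + 1):
                f = tuple(source[pos:pos + length])
                for e, tm in PHRASES.get(f, []):
                    new_hyps.append((pos + length, eng + tuple(e.split()), score + tm))
        # Keep only the best few partial translations (the beam).
        hyps = sorted(new_hyps, key=lambda h: h[2], reverse=True)[:beam_size]
    return max(completed, key=lambda c: c[1]) if completed else None

if __name__ == "__main__":
    print(decode(["la", "casa"]))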
2034 次阅读|0 个评论
[转载]Classical Paper List on ML and NLP
wqfeng 2011-3-25 12:40
Classical Paper List on Machine Learning and Natural Language Processing from Zhiyuan Liu Hidden Markov Models Rabiner, L. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. (Proceedings of the IEEE 1989) Freitag and McCallum, 2000, Information Extraction with HMM Structures Learned by Stochastic Optimization, (AAAI'00) Maximum Entropy Adwait R. A Maximum Entropy Model for POS tagging, (1994) A. Berger, S. Della Pietra, and V. Della Pietra. A maximum entropy approach to natural language processing. (CL'1996) A. Ratnaparkhi. Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, University of Pennsylvania, 1998. Hai Leong Chieu, 2002. A Maximum Entropy Approach to Information Extraction from Semi-Structured and Free Text, (AAAI'02) MEMM McCallum et al., 2000, Maximum Entropy Markov Models for Information Extraction and Segmentation, (ICML'00) Punyakanok and Roth, 2001, The Use of Classifiers in Sequential Inference. (NIPS'01) Perceptron McCallum, 2002 Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms (EMNLP'02) Y. Li, K. Bontcheva, and H. Cunningham. Using Uneven-Margins SVM and Perceptron for Information Extraction. (CoNLL'05) SVM Z. Zhang. Weakly-Supervised Relation Classification for Information Extraction (CIKM'04) H. Han et al. Automatic Document Metadata Extraction using Support Vector Machines (JCDL'03) Aidan Finn and Nicholas Kushmerick. Multi-level Boundary Classification for Information Extraction (ECML'2004) Yves Grandvalet, Johnny Marià , A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification. (NIPS' 05) CRFs J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. (ICML'01) Hanna Wallach. Efficient Training of Conditional Random Fields. MS Thesis 2002 Taskar, B., Abbeel, P., and Koller, D. Discriminative probabilistic models for relational data. (UAI'02) Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT/NAACL 2003) B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. (NIPS'2003) S. Sarawagi and W. W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction (NIPS'04) Brian Roark et al. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm (ACL'2004) H. M. Wallach. Conditional Random Fields: An Introduction (2004) Kristjansson, T.; Culotta, A.; Viola, P.; and McCallum, A. Interactive Information Extraction with Constrained Conditional Random Fields. (AAAI'2004) Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction. (NIPS'2004) John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel Conditional Random Fields: Representation and Clique Selection. (ICML'2004) Topic Models Thomas Hofmann. Probabilistic Latent Semantic Indexing. (SIGIR'1999). David Blei, et al. Latent Dirichlet allocation. (JMLR'2003). Thomas L. Griffiths, Mark Steyvers. Finding Scientific Topics. (PNAS'2004). POS Tagging J. Kupiec. Robust part-of-speech tagging using a hidden Markov model. (Computer Speech and Language'1992) Hinrich Schutze and Yoram Singer. Part-of-Speech Tagging using a Variable Memory Markov Model. (ACL'1994) Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. (EMNLP'1996) Noun Phrase Extraction E. Xun, C. Huang, and M. Zhou. A Unified Statistical Model for the Identification of English baseNP. 
(ACL'00) Named Entity Recognition Andrew McCallum and Wei Li. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons. (CoNLL'2003). Moshe Fresko et al. A Hybrid Approach to NER by MEMM and Manual Rules, (CIKM'2005). Chinese Word Segmentation Fuchun Peng et al. Chinese Segmentation and New Word Detection using Conditional Random Fields, COLING 2004. Document Data Extraction Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. (ICML'2000). David Pinto, Andrew McCallum, etc. Table Extraction Using Conditional Random Fields. SIGIR 2003. Fuchun Peng and Andrew McCallum. Accurate Information Extraction from Research Papers using Conditional Random Fields. (HLT-NAACL'2004) V. Carvalho, W. Cohen. Learning to Extract Signature and Reply Lines from Email. In Proc. of Conference on Email and Spam (CEAS'04) 2004. Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang, Email Data Cleaning, SIGKDD'05 P. Viola, and M. Narasimhan. Learning to Extract Information from Semi-structured Text using a Discriminative Context Free Grammar. (SIGIR'05) Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Li Teng, and Qinghua Zheng, Automatic Extraction of Titles from General Documents using Machine Learning, Information Processing and Management, 2006 Web Data Extraction Ariadna Quattoni, Michael Collins, and Trevor Darrell. Conditional Random Fields for Object Recognition. (NIPS'2004) Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Shuming Shi, Yunbo Cao, and Hang Li, Title Extraction from Bodies of HTML Documents and Its Application to Web Page Retrieval, (SIGIR'05) Jun Zhu et al. Mutual Enhancement of Record Detection and Attribute Labeling in Web Data Extraction. (SIGKDD 2006) Event Extraction Kiyotaka Uchimoto, Qing Ma, Masaki Murata, Hiromi Ozaku, and Hitoshi Isahara. Named Entity Extraction Based on A Maximum Entropy Model and Transformation Rules. (ACL'2000) GuoDong Zhou and Jian Su. Named Entity Recognition using an HMM-based Chunk Tagger (ACL'2002) Hai Leong Chieu and Hwee Tou Ng. Named Entity Recognition: A Maximum Entropy Approach Using Global Information. (COLING'2002) Wei Li and Andrew McCallum. Rapid development of Hindi named entity recognition using conditional random fields and feature induction. ACM Trans. Asian Lang. Inf. Process. 2003 Question Answering Rohini K. Srihari and Wei Li. Information Extraction Supported Question Answering. (TREC'1999) Eric Nyberg et al. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategh Approach with Dynamic Planning. (TREC'2003) Natural Language Parsing Leonid Peshkin and Avi Pfeffer. Bayesian Information Extraction Network. (IJCAI'2003) Joon-Ho Lim et al. Semantic Role Labeling using Maximum Entropy Model. (CoNLL'2004) Trevor Cohn et al. Semantic Role Labeling with Tree Conditional Random Fields. (CoNLL'2005) Kristina toutanova, Aria Haghighi, and Christopher D. Manning. Joint Learning Improves Semantic Role Labeling. (ACL'2005) Shallow parsing Ferran Pla, Antonio Molina, and Natividad Prieto. Improving text chunking by means of lexical-contextual information in statistical language models. (CoNLL'2000) GuoDong Zhou, Jian Su, and TongGuan Tey. Hybrid text chunking. (CoNLL'2000) Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT-NAACL'2003) Acknowledgement Dr. Hang Li , for original paper list.
个人分类: 模式识别|3020 次阅读|0 个评论
获取cpu逻辑核的数量
热度 1 hillpig 2011-1-18 20:23
In many-core program design you often need to know how many logical CPU cores the machine has. I found this snippet useful, so I am excerpting it here. Source: http://stackoverflow.com/questions/150355/programmatically-find-the-number-of-cores-on-a-machine

#ifdef _WIN32
#include <windows.h>
#elif MACOS
#include <sys/param.h>
#include <sys/sysctl.h>
#include <stdint.h>
#else
#include <unistd.h>
#endif

/* Returns the number of logical cores available. */
int getNumCores() {
#ifdef WIN32
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    return sysinfo.dwNumberOfProcessors;
#elif MACOS
    int nm[2];
    size_t len = 4;
    uint32_t count;

    nm[0] = CTL_HW;
    nm[1] = HW_AVAILCPU;              /* CPUs available to the current process */
    sysctl(nm, 2, &count, &len, NULL, 0);

    if (count < 1) {
        nm[1] = HW_NCPU;              /* fall back to the total CPU count */
        sysctl(nm, 2, &count, &len, NULL, 0);
        if (count < 1) { count = 1; }
    }
    return count;
#else
    return sysconf(_SC_NPROCESSORS_ONLN);   /* Linux and other POSIX systems */
#endif
}

Feel free to add me on my personal WeChat to talk about technology.
个人分类: postgresql|6097 次阅读|2 个评论
临床营养学_百度百科—translate this page
zuojun 2010-12-2 14:31
Before you read this Blog , I have a short story to share with you. One day many months ago, a colleague asked me how I was going to make a living in a few years when MACHINE will be translating, say English into Chinese, or vice verse. I didn't know how to answer his question, and became worried (because I was planning to be a full-time freelance English editor). So, I went home and did my homework, by asking the machine to translate a page for me online. Guess what happened? This is what a machine can do for us, in terms of translation. Enjoy 百科名片 Wikipedia card 临床营养学是关于食物中营养素的性质,分布,代谢作用以及食物摄入不足的后果的一门科学。 Journal of Clinical Nutrition is about the nature of nutrients in food, distribution, metabolism and food intake in the consequences of a science. 临床营养学中的营养素是指食物中能被吸收及用于增进健康的化学物。 In Clinical Nutrition is the food nutrients can be absorbed and used to improve the health of the chemicals. 某些营养素是必需的,因为它们不能被机体合成,因此必须从食物中获得。 Certain nutrients are necessary because they can not synthesized by the body and therefore must obtain from food. 对患者来说,合理平衡的营养饮食极为重要。 For patients, a reasonable balance diet is extremely important. 医食同源,药食同根,表明营养饮食和药物对于治疗疾病有异曲同工之处。 Medical and Edible food and medicine from the same root, that diet and medication for the treatment of diseases would be similar. 合理的营养饮食可提高机体预防疾病、抗手术和麻醉的能力。 A reasonable diet can improve the body to prevent disease, the ability of anti-surgery and anesthesia.
个人分类: Thoughts of Mine|5407 次阅读|0 个评论
一周科技新闻简评——机器时代
songshuhui 2008-9-3 15:59
wilddonkey 发表于2008-04-21 星期一 18:35 分类: 其他 | | NASA 的工程师们正在对一种巨大的机器人进行测试,这种机器人的 使命是在荒无人烟的月球上充当宇航员的坐骑。这些大家伙宽 7.5 米,有六条 6 米多长的腿,绰号运动员,最快时速 10 公里。 运动员行动起来很稳重,像一只海龟,这有些辜负了它们的绰号。但稳重是必需的,因为背上是宇航员的住所。运动员每条腿都装上了轮子,可以行驶,经过有坡度的地方,各条腿的长度会自动进行调整,以确保住所里的宇航员坐的安稳。遇到大的沟坎,或是轮子陷进土里,运动 员会把轮子变成脚,六只脚轮流抬起、放下,小心翼翼走出困境。 有了这样一个坐骑,宇航员们就不必在固定的基地里日复一日的例行公事,而是有机会过上一种自由、浪漫的游牧生活。在荒凉的月球上四处游荡,在感兴趣的地方安营扎寨,勘探研究一番后,拔寨启程,开始新的旅途。 从月球到地球,人类越来越依赖于机器,各式各样的困难都试图通过 机器来解决。比如父母们在孩子长到一定程度,确切地说是牙齿已长齐并且能够独立使用牙刷的时候,会面临一个难题,那就是如何说服他们去刷牙。孩子们不喜欢刷牙,即便你成功地引发了他们的刷牙愿望,他们也会因为技术问题刷的不尽如人意,比如 5 岁的孩子通常只能清洁到 1/4 的牙齿。 为了解决这一难题,台湾大学的一个研究小组把这项枯燥乏味的琐事改造成了一场游戏。这场游戏的参与者包括接在普通牙刷 尾部的一个长方形小盒子,盒子的四个面嵌有以不同模式排列的三个发光二极管;一个安在盥洗台上方,通过发光二极管实时跟踪牙刷移动的摄像机;一个放在盥洗台上的计算机,接收摄像机发来的牙刷实时移动数据,据此精确计算出哪颗牙正在被刷到,并在屏幕上显示出一颗虚拟的牙齿从脏兮兮的五颜六色变为了光亮白净;最后,还需要一个觉得看到屏幕上的虚拟牙齿随着自己的动作一个个变干净非常有趣,乐在其中的孩子。 这场刷牙游戏或许还能吸引更大一点的孩子,甚至成年人,不过那需要屏幕上显示出 的场景更好玩一些。 家中添置了新机器,那些比电冰箱、洗衣机更高级、更智能的机器会给我们的 生活 带 来什 么新的改变?一个盘子大小的机器人正 在家中各个阴暗角落里 辛勤工作,确切地说它是一个自动吸尘器,商品名 Roo mba , iRobot 公司生产。但它的主人 不会认同这种说法,主人给它套上了外套,另起了名 字,赋予了性别,它成为了家中的一员。人们开始对身边的机器产生感情。 在伊拉克的美国士兵承认当他们的机器人被炸弹、地雷摧毁时,他们感到非常悲伤。 2020 年后,常驻月 球的宇航员们也会对自己的坐骑生出情感。未来,人与机器将演绎出更为复杂动人的情感纠葛。 标签: machine time , news
个人分类: 其他|1350 次阅读|0 个评论
