The Research Progress of Electrospinning Technology
Foshan Lepton Precision Measurement And Control Technology Co., Ltd., 528225

Summary: Electrospinning is currently a method for the direct, continuous manufacture of polymer nanofibers. It offers advantages such as a simple process, convenient operation and fast production, and it is widely applied in the medical and environmental fields. This article reviews electrospinning technology in recent years and the research progress of its applications, summarizes the theory of electrospinning and its influencing factors, and discusses prospects for future applications of electrospinning technology. Keywords: electrospinning; nanofiber; progress

Foreword: In the strict sense, nanofibers are ultrafine fibers with diameters below 100 nm. Their features are high specific surface area and high porosity, so they can be widely applied in areas such as high-efficiency filter materials, biological materials, high-precision instruments, protective materials and nanocomposites. In the 1990s, research on nanotechnology heated up, quickly making nanofiber manufacture a research hotspot. Electrospinning of polymer nanofibers features simple equipment and easy operation; to date it remains one of the most important methods of manufacturing polymer nanofibers.

1. Electrospinning
The electrospinning setup is shown in Diagram 1. It consists mainly of three parts: a high-voltage power supply, a nozzle and a fiber collection device. A direct-current supply is usually adopted instead of an alternating-current supply; electrospinning requires a high voltage of 1~30 kV.
A syringe (or pipette) delivers the solution or melt to the nozzle at its end. The nozzle is a very thin metal tube fitted with an electrode. The collection device (collection plate) collects the nanofibers; by changing its geometry and shape, the arrangement of the collected nanofibers can be changed.

2. Theory of Electrospinning
As early as 1882, Rayleigh found that a charged liquid drop is unstable in an electric field: under the electric field force, the drop tends to split into smaller drops. Taylor's research showed that when a charged drop enters the electric field through the nozzle, under the combined action of the electric field force and the surface tension of the liquid, the drop is gradually stretched into a cone (the Taylor cone) with a half-angle of 49.3°. During electrospinning, the polymer solution or melt is extruded to the nozzle, where a Taylor cone forms under the actions of the electric field force and surface tension. As spinning solution is pushed into the electric field, a jet sprays from the tip of the Taylor cone and is continuously stretched by the electric field force. When the jet has been stretched to a certain extent, it overcomes the surface tension, bends unstably, and is stretched and split into thinner jets. The specific surface area of the jet then increases rapidly, so the solvent evaporates quickly. The fibers are finally collected by the collection device, where they solidify into a nonwoven-cloth-like fiber felt.
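The 49.3° figure quoted above can be stated compactly. As a brief mathematical note (following Taylor's classical 1964 analysis, which this article cites but does not derive): an electrified conical liquid surface can be in equilibrium between electrostatic stress and capillary pressure only at one special angle.

```latex
% Equilibrium of a charged conical meniscus (Taylor, 1964):
% the electric normal stress balances the capillary pressure only if
P_{1/2}(\cos\theta) = 0 ,
% where P_{1/2} is the Legendre function of degree 1/2. Its relevant
% root is \theta \approx 130.7^\circ, giving a cone half-angle of
\alpha = 180^\circ - \theta \approx 49.3^\circ .
```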
3. The Influencing Factors of Electrospinning
The influencing factors of electrospinning mainly include solution properties (such as viscosity, concentration, molecular weight distribution, conductivity, dielectric constant and surface tension), process conditions (such as voltage, feed rate, the distance between the nozzle and the collection device, and nozzle diameter) and environmental factors (such as temperature, humidity and gas flow rate). Many researchers have studied these factors. Existing results show that the main process parameters influencing fiber properties during electrospinning are polymer solution concentration, spinning voltage, solidification distance (the distance between the nozzle and the collection device), solvent volatility and extrusion velocity.

(1) Polymer Solution Concentration: The higher the polymer solution concentration, the higher the viscosity and surface tension. After the drop leaves the nozzle, its ability to split decreases as the surface tension increases. Usually, with other conditions unchanged, the fiber diameter increases as the polymer solution concentration increases.

(2) Spinning Voltage: As the voltage applied to the polymer solution increases, the electrostatic force in the system increases, the splitting ability of the drop increases, and the fiber diameter decreases.

(3) Solidification Distance: After spraying from the nozzle, the polymer drop concentrates and solidifies into fiber as the solvent evaporates in the air, and is finally collected by the collection device. The influence of the solidification distance on fiber diameter differs from system to system. For example, research on the PS/THF system showed that changing the solidification distance had no apparent influence on fiber diameter, while for the PAN/N,N-DMF system the fiber diameter decreased as the collection distance increased.
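The qualitative trends above (diameter grows with concentration and shrinks with voltage, other conditions fixed) can be encoded in a toy helper for illustration only. This is a hypothetical sketch, not a physical model from the article; the function name, exponents and reference values are all made up to show the direction of each trend.

```python
def toy_fiber_diameter(base_nm, concentration, voltage_kv,
                       ref_concentration=1.0, ref_voltage_kv=15.0):
    """Toy monotonic model of the trends described in the text:
    fiber diameter increases with solution concentration and
    decreases with spinning voltage. Exponents are illustrative only."""
    d = base_nm
    d *= (concentration / ref_concentration) ** 1.5  # higher concentration -> thicker fiber
    d *= (ref_voltage_kv / voltage_kv) ** 0.5        # higher voltage -> thinner fiber
    return d

# Trend check: raising concentration thickens the fiber,
# raising voltage thins it.
assert toy_fiber_diameter(200, 2.0, 15.0) > toy_fiber_diameter(200, 1.0, 15.0)
assert toy_fiber_diameter(200, 1.0, 25.0) < toy_fiber_diameter(200, 1.0, 15.0)
```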
(4) Solvent: As with conventional solution spinning, the solvent properties have a large influence on the formation, structure and properties of electrospun fibers. The volatility of the solvent is especially important to the fiber morphology.

4. Applications of Electrospinning Technology
With the development of nanotechnology, electrospinning, as a simple and effective new process for manufacturing nanofibers, will play a significant role in biomedical materials, filtration and protection, catalysis, energy, optoelectronics, food engineering and cosmetics. ① In biomedicine, nanofiber diameters are smaller than cells, so nanofibers can simulate the structure and biological function of the natural extracellular matrix; the forms and structures of many human tissues and organs are also similar to those of nanofibers, which makes it possible to use nanofibers for tissue and organ repair. Some electrospun materials have good biocompatibility and degradability, can serve as carriers into the human body, and are easily absorbed. In addition, electrospun nanofibers offer high specific surface area, high porosity and other favorable features. They have therefore drawn continued attention from biomedical researchers and are well applied in drug delivery systems, wound repair, biological tissue engineering and other areas. ② The filtration efficiency of a fiber filter material improves as the fiber diameter decreases; decreasing the fiber diameter is therefore an effective way to improve the filtration quality of fiber filter materials. Electrospun fibers have many advantages, such as small diameter, small pore size, high porosity and fiber uniformity, which give them great application potential in air filtration, liquid filtration and personal protection.
③ Electrospinning can effectively regulate the fine structure of fibers; combined with low-surface-energy materials, it can yield superhydrophobic materials, which may be applied to ship hulls, the inner walls of petroleum pipelines, high-rise glass, automotive glass, etc. However, before electrospun fiber materials can achieve these self-cleaning applications, the strength and abrasion resistance of the fiber membrane material, and its bonding strength to the substrate, must be improved. ④ Nanostructured catalyst particles tend to aggregate, which impairs their dispersibility and utilization. Electrospun fiber materials can therefore serve as a template to disperse the particles uniformly. At the same time, they exploit the flexibility and ease of handling of the polymer carrier, and the combination of the catalytic material with the polymer surface at the micro/nano scale can produce a strong synergistic effect and improve catalytic performance. ⑤ Electrospun nanofibers have very high specific surface area and porosity, which increases the contact area between sensing materials and analytes and promises to improve sensor performance substantially. Electrospun nanofibers can also be applied in many other areas, such as energy, optoelectronics and food engineering.

5. The Technological Progress of Electrospinning
5.1 Technological Improvements of the Electrospinning Method
(1) Combination Electrospinning: In 2003, Philipps University in Germany and Zussman's group in Israel jointly developed combination electrospinning. This technique uses two solutions and two nozzles. A compound drop forms at the front end of the nozzles and produces a jet, with the inner drop also drawn into the jet, so the drop is difficult to control; when controlled well, the technique can produce core-shell fibers and hollow fibers.

(2) The Development of TUFT: TUFT is the abbreviation of "tubes by fiber templates".
In this process, a polymer nanofiber is manufactured first; other polymers, metals or ceramics are deposited onto it, and the original polymer template is then removed, leaving the fiber hollow. Composite layers can also be built up to manufacture nano-capacitors. For example, coating palladium particles with a polymer yields nanocables whose core is a conductor and whose outer layer is an insulator; depositing aluminum onto the polymer yields an aluminum tube, and depositing chromium yields a chromium tube.

(3) Composite Nozzle: Electrospinning basically adopts a nozzle as its spinning outlet. The University of Shiga Prefecture in Japan developed a composite nozzle; for the continuous manufacture of nanofiber nonwoven fabric, a composite nozzle is indispensable. When the spacing between nozzles (top-bottom and left-right) is large, the influence of electrostatic repulsion between jets decreases; the nozzles are therefore usually arranged with a spacing of 10 mm left-right and 50 mm top-bottom. The nozzles are stainless-steel tubes of 0.5 mm diameter, and chemically resistant fluororubber hoses deliver solution to each steel tube. Each steel tube is inserted into a hole in a copper pipe to which the high voltage is applied; for this, the stainless-steel tubes must connect firmly to the copper pipe yet remain detachable. At present, the nozzles are arranged in a line.

5.2 The Development of M-ESP (Melt Electrospinning)
F. Ko connected the nozzle of an extruder to ground, applied high voltage to the collector, and electrostatically jet-spun PP. However, this device could not produce fibers with an average diameter below 1 μm, and the fibers on the collector could not be removed while the high voltage was on, which is a problem for industrialization.
Warner wound plastic tubing around a PP-filled syringe and circulated a heat-transfer medium through it to melt the polymer; with the spinning cabinet heated and high voltage applied between the syringe nozzle and the collector, nanofibers were obtained for the first time. Joo loaded PLA into a syringe and built a device that controls the syringe temperature, spinning temperature and collector temperature, successfully producing PLA nanofibers. In the devices above, the polymer is melted in a container with the nozzle mounted on it; this merely replaces the solution of S-ESP (solution electrospinning) with a melt, and the method is thus an extension of S-ESP. The University of Fukui in Japan developed a device that irradiates a polymer rod from a distance with a laser, melts part of it, and applies high voltage to the melt. Its operating principle is to feed a polymer rod (diameter less than 1 mm) into the melting zone at a constant speed (about 0.2 mm/s) while heating its front end simultaneously from three directions with a carbon dioxide laser, so that the rod melts uniformly and locally; high voltage is applied to the polymer melt, and with the spinning area heated, fiber is drawn out by electrostatic traction. The laser-irradiated zone is spindle-shaped and generates a single fiber from its bottom (Figure 2). Trial manufacture with various polymers showed that in every case a fiber forms from the melt zone, and the fibers collected by the collector had diameters below 1 μm. The features of this device are: because the melt is heated by laser, heating is local and nearly instantaneous, so energy loss is small; because heating is non-contact, the demands on the device are not high; and because no spinneret plate is used, high-melting-point chips can also be spun.

6. Concluding Remarks
At present, electrospinning nanofiber technology is still in its infancy, but its broad application prospects are already visible, and it will generate billions in market value in the future.
Researchers will also overcome the technical difficulties in each application area of nanofibers, and each such technology may drive scientific and technological progress across the whole spinning industry. Of course, this progress also requires close cooperation among many sectors of society, such as textile technology, chemical technology, biology, polymer science and materials science. Source: China Journal Net. Writer: Jiezhuang Guo, Foshan Lepton Precision Measurement And Control Technology Co., Ltd.
Source (Chinese): 疣是人类乳头瘤病毒（HPV）所引起，以往认为这些疾病是慢性良性疾病，但发现HPV感染后有一部分会导致恶性肿瘤，如皮肤癌、舌癌和宫颈癌等，因而引起人们的重视。
[English gloss: Warts are caused by human papillomavirus (HPV). These diseases were previously considered chronic and benign, but it has been found that some HPV infections can lead to malignant tumors such as skin cancer, tongue cancer and cervical cancer, which has drawn people's attention.]

Machine translation (SDL, https://www.freetranslation.com/en/translate-english-chinese): "It is a human papilomavirus (HPV), previously thought to be caused by these diseases are chronic diseases, but found benign HPV infection is a part of the cause of malignant tumors, such as skin cancer, Tongue cancer such as cervical and breast cancer, and this has aroused the attention of the people. It is caused by the virus to cells of a primary response accretion of leather superficial table of benign. Infected about potential four months or so. More likely to young people."

Human translation: "This is caused by human papilloma virus (HPV). Previously, it was thought that these diseases were chronic, but it has been found some benign HPV could cause malignant tumors, such as skin cancer, tongue cancer, cervical cancer, etc. This has aroused the attention for these diseases."
In my article “Pride and Prejudice of Main Stream”, the first myth listed among the top 10 misconceptions in NLP is as follows: a rule-based system faces a knowledge bottleneck of hand-crafted development, while a machine learning system involves automatic training (implying no knowledge bottleneck). While there are numerous misconceptions about the old school of rule systems, this hand-crafted myth can be regarded as the source of them all. Just review NLP papers: no matter what language phenomena are being discussed, it is almost a cliché to cite a couple of old-school works to demonstrate the superiority of machine learning algorithms, and the attack needs only one sentence, to the effect that hand-crafted rules lead to a system “difficult to develop” (or “difficult to scale up”, “with low efficiency”, “lacking robustness”, etc.), or a simple rejection like, “literature [...] have tried to handle the problem in different aspects, but these systems are all hand-crafted”. Once labeled with hand-crafting, one does not even need to discuss effect or quality. Hand-crafting becomes the rule system’s “original sin”, and the linguists crafting rules therefore become the community’s second-class citizens bearing that sin. So what is wrong with hand-crafting, or coding linguistic rules for computer processing of languages? NLP development is software engineering. From a software engineering perspective, hand-crafting is programming while machine learning belongs to automatic programming. Unless we assume that natural language is a special object whose processing can be handled entirely by systems automatically programmed or learned via machine learning algorithms, it does not make sense to reject or belittle the practice of coding linguistic rules to develop an NLP system. For consumer products and arts, hand-craft is definitely a positive word: it represents quality or uniqueness and high value, a legitimate reason for a good price.
Why does it become a derogatory term in NLP? The root cause is that in the field of NLP, almost as if some collective hypnosis had hit the community, people are intentionally or unintentionally led to believe that machine learning is the only correct choice. In other words, in criticizing, rejecting or disregarding hand-crafted rule systems, the underlying assumption is that machine learning is a panacea, universal and effective, always preferred over the other school. The fact of life is that, in the face of the complexity of natural language, machine learning from data so far only surfaces the tip of the iceberg of the language monster (called low-hanging fruit by Church in K. Church: A Pendulum Swung Too Far), far from reaching the goal of a complete solution to language understanding and applications. There is no basis to support the claim that machine learning alone can solve all language problems, nor is there any evidence that machine learning necessarily leads to better quality than coding rules by domain specialists (e.g. computational grammarians). Depending on the nature and depth of the NLP task, hand-crafted systems actually have a better chance of performing well, at least for non-trivial, deep-level NLP tasks such as parsing, sentiment analysis and information extraction (we have tried and compared both approaches). In fact, the only major reason why they are still there, having survived all the rejections from the mainstream and still playing a role in industrial practical applications, is their superior data quality; otherwise they could not have been justified for industrial investment at all. The “forgotten” school: why is it still there? What does it have to offer? The key is the excellent data quality of a hand-crafted system: not only precision, but high recall is achievable as well.
To quote from “On Recall of Grammar Engineering Systems”: in the real world, NLP is applied research which must eventually land on the engineering of language applications, where the results and quality are evaluated. As an industry, software engineering has attracted many ingenious coding masters, each of whom is recognized for coding skills, including algorithm design and implementation expertise, which are hand-crafting by nature. Have we ever heard of a star engineer being criticized for his (manual) programming? With NLP applications also being part of software engineering, why should computational linguists coding linguistic rules receive so much criticism while engineers coding other applications are recognized for their hard work? Is it because NLP applications are simpler than other applications? On the contrary, many natural language applications are more complex and difficult than other types of applications (e.g. graphics software or word-processing apps). The likely explanation for the different treatment of a general-purpose programmer and a linguist knowledge engineer is that the big environment of software engineering does not involve as much prejudice, while the small environment of the NLP domain is deeply biased, with the belief that the automatic programming of an NLP system by machine learning can replace and outperform manual coding for all language projects. For software engineering in general, (manual) programming is the norm, and no one believes that programmers’ jobs can be replaced by automatic programming in any foreseeable time. Automatic programming, a concept not rare in science fiction for visions like machines making machines, is currently only a research area, limited to very restricted low-level functions.
Rather than placing hope on automatic programming, software engineering as an industry has seen significant progress in development infrastructure, such as development environments and rich libraries of functions to support efficient coding and debugging. Maybe one day applications will be able to use more and more automated code for simple modules, but the full automation of constructing any complex software project is nowhere in sight. By any standard, natural language parsing and understanding (beyond shallow-level tasks such as classification, clustering or tagging) is a complex task. It is therefore hard to expect machine learning, as a manifestation of automatic programming, to miraculously replace manual code for all language applications. The application value of hand-crafting a rule system will continue to exist and evolve for a long time, disregarded or not. “Automatic” is a fancy word. What a beautiful world it would be if all artificial intelligence and natural language tasks could be accomplished by automatic machine learning from data. There is, naturally, a high expectation and regard for a machine learning breakthrough to help realize this dream of mankind. All this should encourage machine learning experts to continue to innovate and demonstrate its potential, but it should not be a reason for pride and prejudice against a competing school or other approaches. Before we embark on further discussion of the so-called knowledge bottleneck defect of rule systems, it is worth mentioning that the word “automatic” refers to system development, not to be confused with running the system. At the application level, whether it is a machine-learned system or a manual system coded by domain programmers (linguists), the system always runs fully automatically, with no human interference.
Although this is an obvious fact for both types of systems, I have seen people get confused and equate a hand-crafted NLP system with manual or semi-automatic applications. Is hand-crafting rules a knowledge bottleneck for development? Yes; there is no denying, or need to deny, that. The bottleneck is reflected in the system development cycle. But keep in mind that this “bottleneck” is common to all large software engineering projects; it is a resource cost, not something introduced only by NLP. From this perspective, the knowledge bottleneck argument against hand-crafted systems cannot really stand, unless it can be proved that machine learning can do all of NLP equally well, free of any knowledge bottleneck: that may not be far from the truth for some special low-level tasks, e.g. document classification and word clustering, but it is definitely misleading or incorrect for NLP in general, a point to be discussed in detail shortly. Here are ballpark estimates based on our decades of NLP practice and experience. For shallow-level NLP tasks (such as named entity tagging or Chinese segmentation), a rule approach needs at least three months of one linguist coding and debugging the rules, supported by at least half an engineer for tools support and platform maintenance, to come up with a decent system for initial release. As for deep NLP tasks (such as deep parsing, or deep sentiment beyond thumbs-up/thumbs-down classification), one should not expect a working engine to be built without resources of at least one computational linguist coding rules for a year, coupled with half an engineer for platform and tools support and half an engineer for independent QA (quality assurance). Of course, the labor requirements vary with the quality of the developers (especially the linguistic expertise of the knowledge engineers) and with how well the infrastructure and development environment support linguistic development.
Also, the above estimates do not include general costs that apply to all software applications, e.g. GUI development at the app level and operations for running the developed engines. Let us present the scene of modern-day rule-based system development. A hand-crafted NLP rule system is based on compiled computational grammars, which are nowadays often architected as an integrated pipeline of modules from shallow processing up to deep processing. A grammar is a set of linguistic rules encoded in some formalism; it is the core of a module intended to achieve a defined function in language processing, e.g. a shallow parsing module may target the noun phrase (NP) as its object for identification and chunking. What happens in grammar engineering is not much different from other software engineering projects. As a knowledge engineer, a computational linguist codes a rule in an NLP-specific language, based on a development corpus. The development is data-driven: each line of rule code goes through rigorous unit tests and then regression tests before it is submitted as part of the updated system for independent QA to test and give feedback on. The development is an iterative cycle in which incremental enhancements driven by bug reports from QA and/or from the field (customers) serve as necessary input and steps towards better data quality over time. Depending on the architect’s design, all types of information are available for the linguist developer to use in crafting a rule’s conditions; e.g. a rule can check any element of a pattern by enforcing conditions on (i) the word or stem itself (i.e. the string literal, for capturing, say, idiomatic expressions), and/or (ii) POS (part of speech, such as noun, adjective, verb, preposition), and/or (iii) orthography features (e.g. initial upper case, mixed case, tokens with digits and dots), and/or (iv) morphology features (e.g. tense, aspect, person, number, case, etc.
decoded by a previous morphology module), and/or (v) syntactic features (e.g. verb subcategorization features such as intransitive, transitive, ditransitive), and/or (vi) lexical semantic features (e.g. human, animal, furniture, food, school, time, location, color, emotion). There are almost infinite combinations of such conditions that can be enforced in rule patterns. A linguist’s job is to code such conditions to maximize the benefit of capturing the target language phenomena, a balancing art in engineering through a process of trial and error. Macroscopically speaking, the rule hand-crafting process is in essence the same as programmers coding an application, except that linguists usually use a different, very high-level NLP-specific language, in a formalism chosen or designed for modeling natural language, on a platform geared towards facilitating NLP work. Hard-coding NLP in a general-purpose language like Java is not impossible for prototyping or a toy system. But as natural language is known to be a complex monster, its processing calls for a special formalism (some form or extension of Chomsky’s formal language types) and an NLP-oriented language to help implement any non-toy system that scales. So linguists are trained on the scene of development to be knowledge programmers hand-crafting linguistic rules. In terms of the different levels of language used for coding, it is to an extent similar to the contrast between programmers of old and the modern software engineers who use so-called high-level languages like Java or C. Decades ago, programmers had to use assembly or machine language to code a function.
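To make the flavor of such rule conditions concrete, here is a minimal, hypothetical sketch, not the author's proprietary formalism or platform: tokens carry features, and a rule is a sequence of condition dictionaries matched against a token window. The feature names are illustrative, and the POS labels follow Penn Treebank conventions (DT, JJ, NN, NNS).

```python
# Toy pattern-rule matcher illustrating the kinds of conditions a
# linguist can enforce: literal word/stem, POS, orthography, lexical
# semantic features. All names and features here are hypothetical.

def token_matches(token, cond):
    """A token (a dict of features) matches if every condition holds.
    A set-valued condition means 'feature value must be in this set'."""
    for feat, allowed in cond.items():
        value = token.get(feat)
        if isinstance(allowed, set):
            if value not in allowed:
                return False
        elif value != allowed:
            return False
    return True

def match_rule(tokens, rule, start):
    """Match a rule (a list of condition dicts) at position `start`."""
    if start + len(rule) > len(tokens):
        return False
    return all(token_matches(tok, cond)
               for tok, cond in zip(tokens[start:start + len(rule)], rule))

# Toy NP rule: determiner, adjective, common noun (singular or plural).
NP_RULE = [
    {"pos": "DT"},
    {"pos": "JJ"},
    {"pos": {"NN", "NNS"}},
]

sent = [
    {"word": "the", "pos": "DT"},
    {"word": "red", "pos": "JJ"},
    {"word": "cars", "pos": "NNS"},
]
assert match_rule(sent, NP_RULE, 0)
```

A real grammar engineering platform compiles thousands of such patterns, with far richer condition types, into an efficient runtime; the point here is only that each condition is a piece of hand-coded linguistic knowledge.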
The process and workflow for hand-crafting linguistic rules are just like any software engineer’s daily coding practice, except that the language designed for linguists is so high-level that linguistic developers can concentrate on linguistic challenges without having to worry about low-level technical details such as memory allocation, garbage collection or pure code optimization for efficiency, which are taken care of by the NLP platform itself. Everything else follows software development norms to ensure the development stays on track, including unit testing, baseline construction and monitoring, regression testing, independent QA, code reviews for rule quality, etc. Each level of language has its own star engineers who master its coding skills. It sounds ridiculous to respect software engineers while belittling linguistic engineers only because the latter hand-craft linguistic code as knowledge resources. The chief architect in this context plays the key role in building a real-life, robust NLP system that scales. To deep-parse or process natural language, he or she needs to define and design the formalism and language with the necessary extensions, the related data structures, and the system architecture, with the interaction of different levels of linguistic modules in mind (e.g. the morpho-syntactic interface), as well as the workflow that integrates all components for internal coordination (including patching and handling interdependency and error propagation) and the external coordination with other modules or sub-systems, including machine learning or off-the-shelf tools when needed or deemed beneficial. The architect also needs to ensure an efficient development environment and to train new linguists into effective linguistic “coders” with engineering sense who follow software development norms (knowledge engineers are not trained by schools today).
Unlike mainstream machine learning systems, which are robust and scalable by nature, the robustness and scalability of hand-crafted systems depend largely on the design and deep skills of the architect. The architect defines the NLP platform with specs for its core engine compiler and runner, plus the debugger, in a friendly development environment. He must also work with product managers to turn their requirements into operational specs for linguistic development, in a process we call semantic grounding of linguistic processing to applications. The success of a large NLP system based on hand-crafted rules is never a simple accumulation of linguistic resources such as computational lexicons and grammars using a fixed formalism (e.g. CFG) and algorithm (e.g. chart parsing). It calls for seasoned language engineering masters as architects of the system design. Given the scene of NLP development practice described above, it should be clear that the negative sentiment associated with “hand-crafting” is unjustifiable and inappropriate. The only remaining argument against coding rules by hand comes down to the hard work and costs associated with the hand-crafted approach, the so-called knowledge bottleneck of rule-based systems. If things can be learned by a machine without cost, why bother using costly linguistic labor? It sounds like a reasonable argument until we examine it closely. First, for this argument to stand, we need proof that machine learning indeed incurs no costs and has no, or very little, knowledge bottleneck. Second, for this argument to withstand scrutiny, we should be convinced that machine learning can reach the same or better quality than the hand-crafted rule approach. Unfortunately, neither of these necessarily holds true. Let us study them one by one. As is known to all, any non-trivial NLP task is by nature based on linguistic knowledge, irrespective of the form in which the knowledge is learned or encoded.
Knowledge needs to be formalized in some form to support NLP, and machine learning is by no means immune to this knowledge resources requirement. In rule-based systems, the knowledge is hand-coded directly by linguists; in (supervised) machine learning, the knowledge resources take the form of labeled data for the learning algorithm to learn from (there is, indeed, so-called unsupervised learning, which needs no labeled data and is supposed to learn from raw data, but it is research-oriented and hardly practical for any non-trivial NLP, so we leave it aside for now). Although the learning process is automatic, the feature design, the learning algorithm implementation, debugging and fine-tuning are all manual, in addition to the requirement of manually labeling a large training corpus in advance (unless an existing labeled corpus is available, which is rare; machine translation is a nice exception, as it has the benefit of using existing human translations as labeled aligned corpora for training). The labeling of data is a very tedious manual job. Note that the sparse data challenge means machine learning needs a very large labeled corpus. So it is clear that the knowledge bottleneck takes different forms, but it applies equally to both approaches. No machine can learn knowledge without cost, and it is incorrect to regard the knowledge bottleneck as a defect only of the rule-based system. One may argue that rules require skilled expert labor, while the labeling of data only requires high school kids or college students with minimal training.
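The point that supervised learning consumes knowledge in the form of labeled data can be seen in miniature. Below is a hedged sketch in pure Python (the data and function names are hypothetical): a tiny naive-Bayes-style word-count classifier whose only knowledge source is a hand-labeled corpus.

```python
from collections import Counter, defaultdict
import math

# The "knowledge resource" here is nothing but labeled examples:
# every (text, label) pair represents manual annotation work.
labeled_corpus = [
    ("the battery life is great", "pos"),
    ("great screen and great speed", "pos"),
    ("the battery died fast", "neg"),
    ("terrible screen", "neg"),
]

def train(corpus):
    """Count words per class; smoothing is applied at prediction time."""
    counts = defaultdict(Counter)
    for text, label in corpus:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the class maximizing an add-one-smoothed log-likelihood."""
    vocab = {w for c in counts.values() for w in c}
    def score(label):
        c = counts[label]
        total = sum(c.values()) + len(vocab)
        return sum(math.log((c[w] + 1) / total) for w in text.split())
    return max(counts, key=score)

model = train(labeled_corpus)
```

Every pair in `labeled_corpus` is annotation labor, and real systems need orders of magnitude more such pairs to cope with sparse data; scaling the quality of the learned model means scaling this labeled knowledge resource.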
So to do a fair comparison of the costs involved, we perhaps need to turn to Karl Marx, whose Das Kapital offers a formula for converting simple labor into complex labor for the exchange of equal value: for a given task at the same level of performance quality (assuming machine learning can reach the quality of professional expertise, which is not necessarily true), how much cheap labor must be spent labeling the required amount of training corpus before it becomes an economic advantage? Something like that. This varies from task to task and even from location to location (e.g. different minimum wage laws), of course. But the key point is that the knowledge bottleneck challenges both approaches; it is not the case, as many believe, that machine learning produces a system automatically with little or no cost attached. In fact, things are far more complicated than a simple yes or no, as costs must also be calculated in the larger context of how many tasks need to be handled and how much underlying knowledge can be shared as reusable resources. We will leave it to a separate writing to elaborate the point that, in the context of developing multiple NLP applications, the rule-based approach, which shares the core parsing engine, demonstrates significant savings on knowledge costs over machine learning. Let us step back and, for argument's sake, accept that coding rules is indeed more costly than machine learning. So what? As with any other commodity, hand-crafted products may indeed cost more, but they also have better quality and value than products of mass production; otherwise a commodity society would leave no room for craftsmen and their products to survive. This is common sense, and it also applies to NLP. If not for better quality, no investors would fund teams that could be replaced by machine learning.
What is surprising is that so many people, NLP experts included, believe that machine learning necessarily outperforms hand-crafted systems not only in costs saved but also in quality achieved. While there are low-level NLP tasks such as speech processing and document classification that are not experts' forte, since we humans have much more restricted memory than computers do, deep NLP involves far more linguistic expertise and design than the simple notion of learning from corpora can be expected to deliver. In summary, the supposed hand-crafted rule defect is largely a misconception circulating widely in NLP and reinforced by the mainstream, due to incomplete induction or ignorance of the scene of modern-day rule development. It rests on the incorrect assumption that machine learning necessarily handles all NLP tasks with the same or better quality, but with less or no knowledge bottleneck, compared with systems based on hand-crafted rules. Note: This is the author's own translation, with adaptation, of part of our paper which originally appeared in Chinese in Communications of the Chinese Computer Federation (CCCF), Issue 8, 2013.
Related Posts:
Domain portability myth in natural language processing
Pride and Prejudice of NLP Main Stream
K. Church: A Pendulum Swung Too Far, Linguistics Issues in Language Technology, 2011; 6(5)
Wintner 2009. What Science Underlies Natural Language Engineering? Computational Linguistics, Volume 35, Number 4
Pros and Cons of Two Approaches: Machine Learning vs Grammar Engineering
Overview of Natural Language Processing
Dr. Wei Li's English Blog on NLP
Any system that claims to use mainstream machine learning methods for social media sentiment mining deserves skepticism; stretched thin and unfit for real application is the basic state of affairs. The reason is obvious: machine learning fails in the face of social media dominated by short messages. Short messages simply do not offer a sufficient density of data points (so-called keyword density) for machine learning to work with. Even the cleverest cook cannot make a meal without rice; this is determined by the bag-of-words methodology, and no training set, however large, can overcome the limitation. Without linguistic structural analysis, it is an insurmountable challenge. I have articulated this point in various previous posts or blogs, but the world is so dominated by the mainstream that the message does not seem to carry far. So let me make it simple to understand: the sentiment classification approach based on the bag-of-words (BOW) model, so far the dominant approach in the mainstream for sentiment analysis, simply breaks in front of social media. The major reason is simple: social media is full of short messages, which do not have the keyword density required by a classifier to make a proper sentiment decision. The precision ceiling for this line of work in real-life social media is found to be 60%, far behind the widely acknowledged precision minimum of 80% for a usable extraction system. Trusting a machine learning classifier to perform social media sentiment analysis is not much better than flipping a coin. So let us be straight: from now on, any claim of using machine learning for social media mining of public opinions and sentiments is likely a trap (unless it is verified to involve parsing of linguistic structures or patterns, which so far has never been heard of in practical systems based on machine learning). Fancy visualizations may make the mining results look real and attractive, but they are just not trustworthy. [Addendum] A friend sent me a screenshot from his WeChat circle, saying this looks like tarring everyone with the same brush. But on this point there is really no way around it: in Chinese as in Western languages, short messages are the overwhelming majority in the social media of the mobile era, and someone has to expose the truth behind social media big data mining. That BOW is helpless before short messages is an indisputable fact; the method will not suddenly start working where it does not fit just because it is the most convenient mainstream approach available and most people use it. What does not work does not work: this line cannot break the 60% precision ceiling and remains far from the commonly acknowledged 80% usability threshold, and that is determined by the methodology.
Related Posts:
Pros and Cons of Two Approaches: Machine Learning and Grammar Engineering
Coarse-grained vs. fine-grained sentiment analysis
The significance of independent verification of sentiment mining systems 2015-11-22
[Wei's science popularization: What is a bag of words in NLP?] 2015-11-27
[Pinned: Overview of NLP posts on Wei's ScienceNet blog (periodically updated)]
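The keyword-density argument above can be made concrete with a deliberately tiny sketch. The lexicon, threshold and example texts below are all invented for illustration (this is not any deployed system): a bag-of-words scorer that has plenty of evidence in a long review has literally nothing to count on a tweet-length post.

```python
# Toy illustration of why bag-of-words sentiment starves on short posts:
# the lexicon hits per message ("keyword density") are too few to decide.
POSITIVE = {"love", "great", "awesome", "excellent"}
NEGATIVE = {"hate", "awful", "terrible", "broken"}

def bow_sentiment(text):
    """Count lexicon hits in a bag of words; None = not enough evidence."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:          # zero data points: coin-flip territory
        return None
    return "positive" if pos >= neg else "negative"

# A long review accumulates many evidence points ...
review = ("I love this phone , the screen is great , battery life is "
          "awesome and the camera is excellent even at night")
# ... while a typical tweet-length post often yields zero lexicon hits,
# e.g. when sentiment is carried by structure or sarcasm, not keywords:
tweet = "well that went exactly as expected"

print(bow_sentiment(review))   # many hits -> "positive"
print(bow_sentiment(tweet))    # no hits -> None (no basis for a decision)
```

No training set enlarges the number of words inside a given short message, which is why the density problem is methodological rather than a matter of corpus size.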
January 2000
On Hybrid Model
Pre-Knowledge-Graph Profile Extraction Research via SBIR (3)
This section presents the feasibility study conducted in Phase I of the proposed hybrid model for Level-2 and Level-3 IE. The study is based on a literature review and supported by extensive experiments and a prototype implementation. The model complements corpus-based machine learning with hand-coded FST rules. The essential argument for this strategy is that by combining machine learning methods with an FST rule-based system, the system can exploit the best of both paradigms while overcoming their respective weaknesses. This approach was intended to meet the demand of the designed system for processing unrestricted real-life text.
2.2.1 Hybrid Approach
It was proposed that hand-crafted FST rules be combined with corpus-based learning in all major modules of Textract. More precisely, each module M will consist of two sub-modules M1 and M2, i.e. an FST model and a trained model. The former serves as a preprocessor, as shown below.
M1: FST sub-module → M2: trained sub-module
The trained model M2 has two features: (i) adaptive training; (ii) structure-based training. In a pipeline architecture, the output of the previous module is the input of the succeeding module. If the succeeding module is a trained model, there are two types of training: adaptive and non-adaptive. In adaptive training, the input in the training phase is exactly the same as the input in the application phase. That is, the possibly imperfect output from the previous module is used as the input for training, even if the previous module makes certain mistakes. This type of training "adapts" the model to imperfect input, so the trained model is more robust and makes the necessary adjustments. In contrast, naive non-adaptive training is often conducted on perfect, often artificial, input.
The assumption is that the previous module is continuously improving and will be able to provide near-perfect output for the next module. There are pros and cons to both adaptive and non-adaptive methods. Non-adaptive training is suitable when the training time is significantly long, or when the previous module is simple and reaches high precision. In contrast, an adaptively trained model has to be re-trained each time the previous module(s) undergo major changes; otherwise, performance will be seriously affected. This imposes stringent requirements on training time and algorithm efficiency. Since the machine learning tools Cymfony has developed in-house are very efficient, Textract can afford to adopt the more flexible training method using adaptive input. Adaptive training provides the rationale for placing the FST model before the trained model. The development of the FST sub-module M1 and the trained sub-module M2 can proceed independently. When the time comes to integrate M1 and M2 for better performance, it suffices to re-train M2 on the output of M1. The flexible adaptive training capability makes this design viable, as verified in the prototype implementation of Textract 2.0/CE. In contrast, if M1 were placed after M2, the development of hand-crafted rules for M1 would have to wait until M2 is implemented; otherwise, many rules might have to be re-written and re-debugged, which is not desirable. The second issue is structure-based training. Natural language is structural by nature; sophisticated high-level IE can hardly succeed on linear strings of tokens. To capture CE/GE phenomena, traditional n-gram training with a window of n linear tokens is not sufficient. Sentences can be long, with the related entities far apart, not to mention the long-distance phenomena of linguistics.
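The adaptive-training idea above can be sketched in a few lines. Everything here is a toy stand-in: the rule "preprocessor", the majority-vote "learner" and the four-token example are invented for illustration, not Cymfony's actual modules; the point is only that M2 is trained on M1's real, imperfect output rather than on an artificial gold input.

```python
# Minimal sketch of adaptive training in a two-stage pipeline
# (module names M1/M2 follow the text; data and learner are toys).
from collections import Counter, defaultdict

def m1_rule_tagger(tokens):
    """Imperfect FST-style preprocessor: tags capitalized tokens as NE."""
    return [(t, "NE" if t[0].isupper() else "O") for t in tokens]

def train_m2(tagged_sentences):
    """'Learn' the majority final tag for each (token, m1_tag) pair."""
    votes = defaultdict(Counter)
    for sent, gold in tagged_sentences:
        for (tok, m1_tag), g in zip(sent, gold):
            votes[(tok, m1_tag)][g] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}

tokens = ["Smith", "joined", "Cymfony", "Monday"]
gold   = ["PERSON", "O", "ORG", "DATE"]

# Adaptive training: M2 is trained on M1's actual (imperfect) output,
# so at run time it sees exactly the distribution it was trained on.
m1_out = m1_rule_tagger(tokens)
model = train_m2([(m1_out, gold)])
run_time = [model.get(pair, "O") for pair in m1_rule_tagger(tokens)]
print(run_time)   # ['PERSON', 'O', 'ORG', 'DATE']
```

If M1 changes substantially, only the cheap re-training step (`train_m2` on fresh M1 output) is repeated, which is the flexibility the text attributes to efficient in-house learning tools.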
Without structure-based training, no matter how large a window size one chooses, generalized rules cannot be effectively learned. Once the training is based on linguistic structures, however, the distance between the entities becomes tractable. In fact, since linguistic structures are hierarchical, multi-level training is needed to capture CE/GE. For CE, the Phase I research found that three levels of training are necessary, each supported by the corresponding natural language parser. The remainder of this section presents the feasibility study and the arguments for choosing an FST rule-based system to complement the corpus-based machine learning models.
2.2.2 FST Grammars
The most attractive feature of the FST formalism is its superior time and space efficiency. Applying an FST takes time essentially linear in the size of the input text, in contrast to the more pervasive formalism used in NLP, namely Context Free Grammars. This theoretical time/space efficiency has been verified through the extensive use of Cymfony's proprietary FST Toolkit in the following applications of the Textract implementation: (i) the tokenizer; (ii) FST-based rules for capturing NE; (iii) FST representation of lexicons (lexical transducers); (iv) experiments in FST local grammars for shallow parsing; and (v) local CE/GE grammars in FST. For example, the Cymfony shallow parser has been benchmarked at 460 MB of text an hour on a 450 MHz Pentium II PC running Windows NT. There is a natural fit between FST-based grammars and lexical approaches to natural language phenomena. For IE grammars/rules to perform well, the lexical approach must be employed; in fact, the NE/CE/GE grammars developed in Phase I have demonstrated the need for it. Take CE as an example.
In order to capture a certain CE relationship, say affiliation, the corresponding rules need to check patterns involving specific verbs and/or prepositions, say work for / hired by, which denote this relationship in English. The GE grammar, which aims at decoding the key semantic relationships of the argument structure instead of surface syntactic relationships, has also demonstrated the need for a considerable level of lexical constraints. Efficiency is always an important consideration in developing a large-scale deployable software system. It is particularly required for lexical grammars, which are usually too large for efficient processing under conventional, more powerful grammar formalisms (e.g. the Context Free Grammar formalism). Cymfony is convinced through extensive experiments that FST technology is an outstanding tool for tackling this efficiency problem. It has been suggested that a set of cascaded FST grammars can simulate sophisticated natural language parsing, and this use of FSTs has already been applied successfully to Textract shallow parsing and local CE/GE extraction. There are a number of success stories of FST-based rule systems in the field of IE. For example, the commercial NE system NetOwl relies heavily on FST pattern matching rules. SRI also applied a very efficient FST local grammar to the shallow parsing of basic noun phrases and verb groups in order to support IE tasks. More recently, Université Paris VII/LADL successfully applied FST technology to a specific information extraction/retrieval task; that system can extract information on-the-fly about a person's occupation from huge amounts of free text, and is able to answer questions which conventional retrieval systems cannot handle, e.g. Who is the minister of culture in France? Finally, it has also been proven by many research programs, such as INTEX, as well as by Cymfony, that an FST-based rule system is extremely efficient.
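A lexicalized finite-state pattern for the affiliation relationship mentioned above can be sketched as follows. Real FST toolkits compile such patterns into transducers; here a regular expression (the textual equivalent of a finite-state machine) stands in. Only the trigger phrases work for / hired by come from the text; the name-shape heuristics and examples are invented for the demo.

```python
# Toy finite-state pattern for the "affiliation" CE relationship:
# <Person> (works for | was hired by) <Org>.
import re

AFFILIATION = re.compile(
    r"(?P<person>[A-Z][a-z]+)\s+"        # crude person-name shape
    r"(?:works? for|was hired by)\s+"    # lexical trigger from the text
    r"(?P<org>[A-Z][A-Za-z]+)"           # crude organization-name shape
)

def extract_affiliation(sentence):
    """Return (person, org) if the affiliation pattern matches, else None."""
    m = AFFILIATION.search(sentence)
    return (m.group("person"), m.group("org")) if m else None

print(extract_affiliation("Smith works for Cymfony"))   # ('Smith', 'Cymfony')
print(extract_affiliation("Jones was hired by IBM"))    # ('Jones', 'IBM')
print(extract_affiliation("nothing matches here"))      # None
```

Because the pattern is anchored on specific lexical items, it illustrates why lexical grammars grow large, and hence why the linear-time behavior of FSTs matters for them.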
In addition, the FST is a convenient tool for capturing linguistic phenomena, especially the idioms and semi-productive expressions that are abundant in natural languages. As Hobbs says, "languages in general are very productive in the construction of short, multiword fixed phrases and proper names employing specialized microgrammars". However, a purely FST-based rule system suffers from the same disadvantage in knowledge acquisition as all handcrafted rule systems: the FST rules or local grammars have to be encoded by human experts, imposing the traditional labor-intensive burden of developing large-scale systems. The conclusion is that while FST overcomes a number of shortcomings of the traditional rule-based system (in particular the efficiency problem), it does not relieve the dependence on highly skilled human labor. Therefore, automatic machine learning techniques are called for.
2.2.3 Machine Learning
The appeal of corpus-based machine learning in language modeling lies mainly in its automatic training/learning capability, which significantly reduces the cost of hand-coding rules. Compared with rule-based systems, corpus-based learning has definite advantages:
· automatic knowledge acquisition: fast development time, since the system discovers regularities automatically when given sufficient correctly annotated data
· robustness: since knowledge/rules are learned directly from the corpus
· acceptable speed: in general there is little run-time processing; the knowledge/rules obtained in the training phase can be stored in efficient data structures for run-time lookup
· portability: a domain shift only requires the truthing of new data; new knowledge/rules are learned automatically, with no need to change any part of the program or control
BBN has recently implemented an integrated, fully trainable model, SIFT, applied to IE.
This system performs the tasks of linguistic processing (POS tagging, syntactic parsing and semantic relationship identification), TE and TR as well as NE, all at once. They have reported an F-measure of 83.49% for TE and 71.23% for TR, results close to those of the best systems in MUC-7. In addition, their successful use of the Penn Treebank for training the initial syntactic parser significantly reduces the cost of human annotation. There is no doubt that their effort is significant progress in this field; it demonstrates the state of the art in applying grammar induction to Level-2 IE. However, there are two potentially serious problems with their approach. The first is the lack of efficiency in applying the model. As they acknowledge, the speed of the system is rather slow. In terms of efficiency, the CKY-based parsing algorithm they use is not comparable with algorithms for formalisms based on the finite-state scheme (e.g. FST, or Viterbi for HMM). This limiting factor is inherent in the learned grammar's CFG formalism. To overcome the problem, rule induction has been explored in the direction of learning FST-style grammars for local CE/GE extraction instead of CFG. The second problem is their integrated approach. Because everything is integrated in one process, it is extremely difficult to trace where a problem lies, making debugging difficult. A much more secure way is to follow the conventional practice of modularizing the NLP/IE process into tasks and sub-tasks, as Cymfony has proposed in the Textract architecture design: POS tagging, shallow parsing, co-referencing, full parsing, pragmatic filtering, NE, CE, GE. Along this line, it is easy to determine whether a particular degradation in performance is due, for example, to poor support from co-referencing or to mistakes in shallow parsing.
Performance benchmarking can be measured for each module; efforts to improve each individual module contribute to the improvement of the overall system performance.
2.2.4 Drawbacks of Corpus-based Learning
The following drawbacks motivate the proposed idea of building a hybrid system/module, complementing automatic corpus-based learning with handcrafted grammars in FST.
· 'Sparse data' problem: this is recognized as a bottleneck for all corpus-based models. Unfortunately, a practical solution to this problem (e.g. smoothing or back-off techniques) often results in a model much less sophisticated than traditional rule-based systems.
· 'Local maxima' problem: even if the training corpus is large and sufficiently representative, the training program can result in a poor model because training gets stuck in a local maximum and fails to find the global peak. This is an inherent problem with the standard training algorithms for both HMM (the forward-backward algorithm) and CFG grammar induction (the inside-outside algorithm). It can be very serious when no extra information is applied to guide the training process.
· computational complexity problem: there is often a trade-off between the expressive power/prior knowledge/constraints in the templates and feasibility. Usually, the more sophisticated a model or rule template is, the more the minimum corpus requirement increases, often to an unrealistic level of training complexity. Extending the length of the string to be examined (e.g. from bigram to trigram), or adding more features (or categories/classes) for a template to reference, usually means an enormous jump in this requirement; otherwise the system suffers from a more serious sparse data effect. In many cases, the limitation imposed on the training complexity makes some research ideas unattainable, which in turn limits the performance power.
· potentially very high cost of manual corpus annotation: that is why Cymfony has proposed, as one important direction for future research, to explore the combination of supervised and unsupervised training.
Among the above four problems, the sparse data problem is believed to be the most serious; to a considerable extent, the success of a system depends on how this problem is addressed. In general, there are three ways to minimize the negative effect of sparse data, discussed below. The first is to condition the probabilities/rules on fewer elements, e.g. to back off from an N-gram model to an (N-1)-gram model. This remedy clearly sacrifices power and is therefore not a viable option for sophisticated NLP/IE tasks. The second approach is to condition the probabilities/rules on appropriate levels of linguistic structure (e.g. the basic phrase level) instead of surface-based linear tokens. The research in the CE prototyping showed this to be one of the most promising ways of handling the sparse data problem. This approach calls for a reliable natural language parser to establish the structural foundation for structure-based adaptive learning. The shallow parser which Cymfony has built, using the FST engine and an extensively tested manual grammar, performs at 90.5% accuracy. The third method is to condition the probabilities/rules on more general features, e.g. using syntactic categories (e.g. POS) or semantic classes (e.g. from a semantic lexicon, or from word clustering training) instead of the literal token. This too is a proven means of overcoming the bottleneck, though it is considerably complicated by the high degree of lexical ambiguity widespread in natural languages. As for the 'local maxima' problem, the proposed hybrid approach of integrating handcrafted FST rules with the automatic grammar learner promises a solution.
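The first remedy above, backing off from an N-gram to an (N-1)-gram estimate, can be sketched in a few lines. The toy corpus and the simple stupid-backoff-style weight (0.4) are invented for illustration; this is not a full smoothing scheme.

```python
# Minimal bigram-to-unigram back-off: when a bigram was never observed
# (sparse data), fall back to a discounted unigram estimate.
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def backoff_prob(prev, word, alpha=0.4):
    """Score(word | prev) with back-off to the unigram estimate."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    return alpha * unigrams[word] / total   # unseen bigram: back off

print(backoff_prob("the", "cat"))   # seen bigram: 2/3
print(backoff_prob("mat", "ran"))   # unseen bigram: 0.4 * 1/9
```

The trade-off the text describes is visible here: the backed-off score ignores the conditioning word entirely, which is exactly the loss of power that makes pure back-off unattractive for sophisticated NLP/IE tasks.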
The learned model can be re-trained using the FST component as a 'seed' to guide the learning. In general, the more constraints and heuristics given to the initial statistical model for training, the better the chance that the training algorithm reaches the global maximum. A handcrafted grammar is believed to be the most effective such constraint, since it embodies human linguistic knowledge.
2.2.5 Feasibility and Advantages of Hybrid Approach
In fact, the feasibility of such collaboration between a handcrafted rule system (FST in this case) and a corpus-based system has already been verified for all the major types of models:
· For transformation-based systems, Brill's training algorithm ensures that the input can be either a randomly tagged text (naive initial state) or a text tagged by another module with the same function (sophisticated initial state). Using POS tagging as an example, the input to the transformation-based tagger can be either a randomly tagged text or a text tagged by another POS tagger. The shift in input source only requires re-training the system; nothing in the algorithm or the annotated corpus needs to change.
· In the case of rule induction, the FST-based grammar can serve as a 'seed' to effectively constrain/guide the learning process, overcoming the 'local maxima' problem. In general, a better initial estimate of the parameters gives the learning procedure a chance to obtain better results when many local maximal points exist. Experiments conducted by Briscoe and Waegner show that even starting with a very crude handcrafted grammar of only seven binary-branching rules (e.g. PP → P NP), a much better grammar is learned automatically than with the same approach without a grammar 'seed'. Another, more interesting, experiment of theirs gives the following encouraging result.
Given the seed of an artificial grammar that can parse only 25% of the 50,000-word corpus, the training program is able to produce a grammar capable of parsing 75% of the corpus. This demonstrates the feasibility of combining handcrafted grammar and automatic grammar induction, in line with the general approach proposed above: FST rules before the statistical model.
· When the trained sub-module is an HMM, Cymfony has verified the feasibility through extensive experiments in implementing the hybrid NE tagger, Textract 1.0. Cymfony first implemented an NE system purely on HMM bigram learning, and found weaknesses: due to the sparse data problem, there was a considerable amount of mistagging even though time and numerical NEs are expressed in very predictable patterns. This problem was later addressed by FST rules, which are good at capturing such patterns; the FST pattern rules for NE serve as a preprocessor. As a result, Textract 1.0 achieved a significant performance enhancement (F-measure raised from 85% to 93%).
The advantages of the proposed hybrid approach are summarized below:
· strict modularity: combining FST rules and statistical models makes the system more modular, as each major module is now divided into two sub-modules. Adaptive re-training is of course necessary in the later stage of integrating the two sub-modules, but it is not a burden: the process is automatic and, in principle, requires no modification of the algorithm or the training corpus.
· enhanced performance: due to the complementary nature of handcrafted and machine-learning systems.
· flexible ratio of sub-modules: one module may have a large trained model and a small FST component, or the other way around, depending on the nature of the given task, i.e. how well the FST approach or the learning approach applies to it. One is free to decide how to allocate effort and resources between the two components.
If we judge that for Task One automatic learning is most effective, we are free to devote more effort and resources to developing the trained module M2 for that task (and less to the FST module M1). In other words, the relative size or contribution of M1 versus M2 is flexible, e.g. M1 = 20% and M2 = 80%. In fact, the same holds in the development of a pure statistical system: repeated training and testing is the normal practice for adjusting the model in the effort toward performance improvement and debugging. It is even possible for a module to be based exclusively on FST rules, i.e. M1 = 100% and M2 = 0%, or completely on a learned model, i.e. M1 = 0% and M2 = 100%, so long as its performance is deemed good enough, or the overhead of combining the FST grammar and the learned model outweighs the slight gain in performance. Indeed, some minor modules like the Tokenizer and POS Tagger can produce very reliable results using only one approach. Technology developed for the proposed information extraction system and its application has focused on six specific areas: (i) machine learning toolkit, (ii) CE, (iii) CO, (iv) GE, (v) QA and (vi) truthing and evaluation. The major accomplishments in these areas from the Phase I research are presented in the following sections.
REFERENCES
Abney, S.P. 1991. Parsing by Chunks. Principle-Based Parsing: Computation and Psycholinguistics, Robert C. Berwick, Steven P. Abney, Carol Tenny, eds. Kluwer Academic Publishers, Boston, MA, pp. 257-278.
Appelt, D.E. et al. 1995. SRI International FASTUS System MUC-6 Test Results and Analysis. Proceedings of MUC-6, Morgan Kaufmann Publishers, San Mateo, CA.
Beckwith, R. et al. 1991. WordNet: A Lexical Database Organized on Psycholinguistic Principles. Lexicons: Using On-line Resources to Build a Lexicon, Uri Zernik, editor, Lawrence Erlbaum, Hillsdale, NJ.
Bikel, D.M. et al., 1997. Nymble: a High-Performance Learning Name-finder.
Proceedings of the Fifth Conference on Applied Natural Language Processing, Morgan Kaufmann Publishers, pp. 194-201.
Brill, E., 1995. Transformation-based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, Vol. 21, No. 4, pp. 227-253.
Briscoe, T. and Waegner, N., 1992. Robust Stochastic Parsing Using the Inside-Outside Algorithm. Workshop Notes, Statistically-Based NLP Techniques, AAAI, pp. 30-53.
Charniak, E. 1994. Statistical Language Learning. MIT Press, Cambridge, MA.
Chiang, T-H., Lin, Y-C. and Su, K-Y. 1995. Robust Learning, Smoothing, and Parameter Tying on Syntactic Ambiguity Resolution. Computational Linguistics, Vol. 21, No. 3, pp. 321-344.
Chinchor, N. and Marsh, E. 1998. MUC-7 Information Extraction Task Definition (version 5.1). Proceedings of MUC-7.
Darroch, J.N. and Ratcliff, D. 1972. Generalized Iterative Scaling for Log-linear Models. The Annals of Mathematical Statistics, pp. 1470-1480.
Grishman, R., 1997. TIPSTER Architecture Design Document Version 2.3. Technical report, DARPA.
Hobbs, J.R. 1993. FASTUS: A System for Extracting Information from Text. Proceedings of the DARPA Workshop on Human Language Technology, Princeton, NJ, pp. 133-137.
Krupka, G.R. and Hausman, K. 1998. IsoQuest Inc.: Description of the NetOwl (TM) Extractor System as Used for MUC-7. Proceedings of MUC-7.
Lin, D. 1998. Automatic Retrieval and Clustering of Similar Words. Proceedings of COLING-ACL '98, Montreal, pp. 768-773.
Miller, S. et al., 1998. BBN: Description of the SIFT System as Used for MUC-7. Proceedings of MUC-7.
Mohri, M. 1997. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, Vol. 23, No. 2, pp. 269-311.
Mooney, R.J. 1999. Symbolic Machine Learning for Natural Language Processing. Tutorial Notes, ACL '99.
MUC-7, 1998. Proceedings of the Seventh Message Understanding Conference (MUC-7), published on the website http://www.muc.saic.com/
Pine, C. 1996.
Statement-of-Work (SOW) for The Intelligence Analyst Associate (IAA) Build 2, Contract for IAA Build 2, USAF, AFMC, Rome Laboratory.
Rilof, E. and Jones, R. 1999. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99).
Rosenfeld, R. 1994. Adaptive Statistical Language Modeling. PhD thesis, Carnegie Mellon University.
Senellart, J. 1998. Locating Noun Phrases with Finite State Transducers. Proceedings of COLING-ACL '98, Montreal, pp. 1212-1219.
Silberztein, M. 1998. Tutorial Notes: Finite State Processing with INTEX. COLING-ACL '98, Montreal (also available at http://www.ladl.jussieu.fr).
Srihari, R. 1998. A Domain Independent Event Extraction Toolkit. AFRL-IF-RS-TR-1998-152 Final Technical Report, Air Force Research Laboratory, Information Directorate, Rome Research Site, New York.
Yangarber, R. and Grishman, R. 1998. NYU: Description of the Proteus/PET System as Used for MUC-7 ST. Proceedings of MUC-7.
Related Posts:
Pre-Knowledge-Graph Profile Extraction Research via SBIR (1) 2015-10-24
Pre-Knowledge-Graph Profile Extraction Research via SBIR (2) 2015-10-24
Chaohua Wushi: The ups and downs of writing grant applications in America - ScienceNet
- Scientific Art of Bio-systems
1. R. Rosen, MD. Mesarovic (1968); DW. Thompson, S. Leduc (1910).
2. BJ. Zeng, Structure Theory of Bio-systems, Methodology of Graph Theory and Network Topology (1991-1997): a) Solar-energy and Bio-Electronics (1991); b) Systems Medicine and Pharmacology (1992); c) Structure Theory on the Integration, Stability and Construction of Systems (1993); d) Systems Genetics and Bio-engineering (1994); e) Bio-computer and Cell Bionic Engineering, Oviduct Bioreactor and Transgenics (1994); f) Bio-systems Theory and Systems Bio-engineering, SG of the First International Conference on Transgenic Animals (1996); g) Biosystem Network, BSSE (1999) – positive and synthetic thoughts: Structure Theory of Bio-systems; Computational, Experimental and Engineering Manipulation of Bio-systems; Bionics and Transgenics of Artificial Bio-systems, etc.
3. On Systems Biology (2001) at the aspects: Systems Theory (O. Wolkenhauer), Experimental Omics (L. Hood), Computation in Silico (H. Kitano) and Engineering Design (AP. Arkin), etc.
- (08/09/2015) -
Before we start discussing the topic of a hybrid NLP (Natural Language Processing) system, let us look at the concept of hybrid from our life experiences. I drove a classic Camry for years and had never thought of changing to another brand, because as a vehicle there was really nothing to complain about. Yes, the style is old, but I am getting old too; who beats whom? Then one day a few years ago we needed to buy a new car to retire my damaged Camry. My daughter suggested a hybrid, following the trend of going green. So I have ended up driving a Prius ever since, and have fallen in love with it. It is quiet, with Bluetooth and line-in, ideal for my iPhone music enjoyment. It has low emissions, and I can finally say goodbye to smog tests. It saves at least a third on gas. We could have gained all these benefits by purchasing an expensive all-electric car, but I want the same feel of power on the freeway and dislike having to charge the car too frequently. Hybrid gets the best of both worlds for me, and is not that much more expensive. Now back to NLP. There are two major approaches to NLP, namely machine learning and grammar engineering (the hand-crafted rule system). As mentioned in previous posts, each has its own strengths and limitations, as summarized below. In general, a rule system is good at capturing a specific language phenomenon (the trees) while machine learning is good at representing the general picture of the phenomena (the forest). As a result, it is easier for rule systems to reach high precision, but it takes a long time to develop enough rules to gradually raise the recall. Machine learning, on the other hand, has much higher recall, usually with a compromise in precision or a precision ceiling. Machine learning is good at simple, clear and coarse-grained tasks, while rules are good at fine-grained tasks. One example is sentiment extraction.
The coarse-grained task there is sentiment classification of documents (thumbs-up versus thumbs-down), which can be achieved quickly by a learning system. The fine-grained task of sentiment extraction involves extracting sentiment details and the related actionable insights, including associating the sentiment with an object, differentiating positive/negative emotions from positive/negative behaviors, capturing the aspects or features of the object involved, decoding the motivation or reasons behind the sentiment, etc. For sophisticated tasks of extracting such details and actionable insights, rules are a better fit. The strength of machine learning lies in its retraining ability. In theory, the algorithm, once developed and debugged, remains stable, and improvement of a learning system can be expected once a larger and better-quality corpus is used for retraining (in practice, retraining is not always easy: I have seen famous learning systems deployed at client sites for years without being retrained, for various reasons). Rules, on the other hand, need to be manually crafted and enhanced. Supervised machine learning is more mature for applications, but it requires a large labeled corpus. Unsupervised machine learning only needs a raw corpus, but it is research-oriented and riskier in applications. A promising approach is called semi-supervised learning, which only needs a small labeled corpus as seeds to guide the learning. We can also use rules to generate the initial corpus or seeds for semi-supervised learning. Both approaches involve knowledge bottlenecks. A rule system's bottleneck is skilled labor: it requires linguists or knowledge engineers to manually encode each rule, much like a software engineer in the daily work of coding. The biggest challenge to machine learning is the sparse-data problem, which requires a very large labeled corpus to overcome.
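The idea of using rules to generate seeds for semi-supervised learning can be sketched in a few lines. This is only an illustrative toy, not our product: the rule, the "classifier" (a trivial keyword-overlap model standing in for a real learner), and the corpus are all invented for the example.

```python
# Toy self-training sketch: a high-precision, low-recall rule labels a few
# seeds, then a simple keyword model (stand-in for any learner) labels the
# rest of the corpus. All rules and data here are hypothetical.

def rule_label(text):
    """High-precision rule: only fires on unambiguous sentiment cues."""
    if "love" in text:
        return "pos"
    if "hate" in text:
        return "neg"
    return None  # abstain: the rule does not cover this text

def self_train(corpus):
    # Step 1: let the rule label whatever it can (the seeds).
    labeled = {t: rule_label(t) for t in corpus if rule_label(t)}
    # Step 2: "train" a trivial keyword model from the seeds.
    vocab = {"pos": set(), "neg": set()}
    for text, label in labeled.items():
        vocab[label].update(text.split())
    # Step 3: label the remaining texts by keyword overlap with each class.
    for text in corpus:
        if text not in labeled:
            pos = len(vocab["pos"] & set(text.split()))
            neg = len(vocab["neg"] & set(text.split()))
            labeled[text] = "pos" if pos >= neg else "neg"
    return labeled

corpus = ["i love this phone", "i hate the battery", "this phone is great"]
labels = self_train(corpus)
```

The third sentence is never touched by the rule, yet the seed-trained model labels it by its overlap with the seed vocabulary; a real system would iterate this loop with a proper statistical learner.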
The knowledge bottleneck for supervised machine learning is the labor required to label such a large corpus. We can build a system that combines the two approaches so they complement each other. There are different ways of combining the two approaches in a hybrid system. One example is the practice we use in our product, where the extracted insights are structured in a back-off model: high-precision results from rules are ranked higher than the medium-precision results returned by statistical systems or machine learning. This lets the system reach a configurable balance between precision and recall. When labeled data are available (e.g., the community has already built the corpus, or, for some tasks, the public domain has the data: sentiment classification of movie reviews can use review data with users' feedback on a 5-star scale), and when the task is simple and clearly defined, using machine learning will greatly speed up the development of a capability. Not every task is suitable for both approaches. (Note that suitability is in the eye of the beholder: I have seen many passionate ML specialists willing to try everything in ML irrespective of the nature of the task; as the old saying goes, when you have a hammer, everything looks like a nail.) For example, machine learning is good at document classification while rules are mostly powerless for such tasks. But for complicated tasks such as deep parsing, rules constructed by linguists usually achieve better performance than machine learning. Rules also perform better for tasks with clear patterns, for example, identifying data items like time, weight, length, money, addresses, etc. This is because clear patterns can be directly encoded in rules to be logically complete in coverage, while machine learning based on samples still faces a sparse-data challenge.
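The back-off model described above amounts to a simple precedence scheme: take the rule output when the rules fire, otherwise fall back to the statistical output. A minimal sketch, with both extractors reduced to hypothetical placeholders:

```python
# Minimal back-off sketch: prefer high-precision rule results; fall back to
# medium-precision statistical results when the rules abstain. Both
# extractors below are invented placeholders, not a real system.

def rule_extract(text):
    """High precision, low recall: returns (label, confidence) or None."""
    return ("pos", 0.95) if "excellent" in text else None

def statistical_extract(text):
    """Medium precision, high recall: always returns something."""
    return ("pos", 0.6) if "good" in text else ("neg", 0.6)

def backoff(text):
    # Rules outrank the statistical component whenever they cover the input.
    return rule_extract(text) or statistical_extract(text)

print(backoff("an excellent phone"))  # rule fires
print(backoff("a good phone"))        # rules abstain, back off to statistics
```

The precision/recall balance is configured by deciding how aggressive the rule layer is allowed to be before backing off.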
When designing a system, in addition to using a hybrid approach for some tasks, we should choose the most suitable approach for the other tasks depending on their nature. Other aspects of comparison between the two approaches involve modularization and debugging in industrial development. A rule system can be structured as a pipeline of modules fairly easily, so that a complicated task is decomposed into a series of subtasks handled by different levels of modules. In such an architecture, a reported bug is easy to localize and fix by adjusting the rules in the relevant module. Machine learning systems are based on a model trained from the corpus. The model itself, once learned, is often a black box (even when the model is represented by a list of symbolic rules as the result of learning, it is risky to manually mess with those rules to fix a data-quality bug). Bugs are supposed to be fixable by retraining the model on an enhanced corpus and/or with adjusted features, but retraining is a complicated process which may or may not solve the problem. It is difficult to localize and directly handle specific reported bugs in machine learning. To conclude: hybrid gets the best of both worlds. Due to the complementary pros and cons of the two basic approaches to NLP, a hybrid system involving both approaches is desirable and worth more attention and exploration. There are different ways of combining the two approaches in a system, including a back-off model using rules for precision and learning for recall, and semi-supervised learning using high-precision rules to generate the initial corpus or "seeds". Related posts: Comparison of Pros and Cons of Two NLP Approaches Is Google ranking based on machine learning? 立委's Notes: Two Approaches to Automatic Language Analysis (《立委随笔:语言自动分析的两个路子》) 立委's Notes: Machine Learning and Natural Language Processing (《立委随笔:机器学习和自然语言处理》) [Pinned: an overview of 立委's NLP posts on the ScienceNet blog (periodically updated)]
Let's take a close look at three related terms (Deep Learning vs. Machine Learning vs. Pattern Recognition), and see how they relate to some of the hottest tech themes of 2015 (namely Robotics and Artificial Intelligence). In our short journey through jargon, you should acquire a better understanding of how computer vision fits in, as well as gain an intuitive feel for how the machine learning zeitgeist has slowly evolved over time. Fig 1. Putting a human inside a computer is not Artificial Intelligence (Photo from WorkFusion Blog) If you look around, you'll see no shortage of jobs at high-tech startups looking for machine learning experts. While only a fraction of them are looking for Deep Learning experts, I bet most of these startups can benefit from even the most elementary kind of data scientist. So how do you spot a future data scientist? You learn how they think. The three highly related learning buzzwords "pattern recognition," "machine learning," and "deep learning" represent three different schools of thought. Pattern recognition is the oldest (and as a term is quite outdated). Machine learning is the most fundamental (one of the hottest areas for startups and research labs as of today, early 2015). And deep learning is the new, the big, the bleeding edge -- we're not even close to thinking about the post-deep-learning era. Just take a look at the following Google Trends graph. You'll see that a) Machine Learning is rising like a true champion, b) Pattern Recognition started as synonymous with Machine Learning, c) Pattern Recognition is dying, and d) Deep Learning is new and rising fast. 1. Pattern Recognition: The birth of smart programs Pattern recognition was a term popular in the 70s and 80s. The emphasis was on getting a computer program to do something "smart" like recognize the character 3. And it really took a lot of cleverness and intuition to build such a program. Just think of 3 vs. B and 3 vs. 8.
Back in the day, it didn't really matter how you did it as long as there was no human in a box pretending to be a machine (see Figure 1). So if your algorithm would apply some filters to an image, localize some edges, and apply morphological operators, it was definitely of interest to the pattern recognition community. Optical Character Recognition grew out of this community, and it is fair to call Pattern Recognition the "smart signal processing" of the 70s, 80s, and early 90s. Decision trees, heuristics, quadratic discriminant analysis, etc. all came out of this era. Pattern Recognition became something CS folks did, and not EE folks. One of the most popular books from that time period is the infamous Duda-Hart Pattern Classification book, and it is still a great starting point for young researchers. But don't get too caught up in the vocabulary; it's a bit dated. The character 3 partitioned into 16 sub-matrices. Custom rules, custom decisions, and custom smart programs used to be all the rage. See the OCR Page. Quiz: The most popular computer vision conference is called CVPR, and the PR stands for Pattern Recognition. Can you guess the year of the first CVPR conference? 2. Machine Learning: Smart programs can learn from examples Sometime in the early 90s people started realizing that a more powerful way to build pattern recognition algorithms is to replace an expert (who probably knows way too much about pixels) with data (which can be mined from cheap laborers). So you collect a bunch of face images and non-face images, choose an algorithm, and wait for the computations to finish. This is the spirit of machine learning. Machine learning emphasizes that the computer program (or machine) must do some work after it is given data. The learning step is made explicit. And believe me, waiting one day for your computations to finish scales better than inviting your academic colleagues to your home institution to design some classification rules by hand.
What is Machine Learning, from Dr. Natalia Konstantinova's blog. The most important part of this diagram is the gears, which suggest that crunching/working/computing is an important step in the ML pipeline. As machine learning grew into a major research topic in the mid-2000s, computer scientists began applying these ideas to a wide array of problems. No longer was it only character recognition, cat vs. dog recognition, and other "recognize a pattern inside an array of pixels" problems. Researchers started applying machine learning to robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, as well as to predicting financial markets. Machine learning was married with graph theory under the brand "graphical models," every robotics expert had no choice but to become a machine learning expert, and machine learning quickly became one of the most desired and versatile computing skills. However, "machine learning" says nothing about the underlying algorithm. We've seen convex optimization, kernel-based methods, Support Vector Machines, and Boosting have their winning days. Together with some custom manually engineered features, we had lots of recipes and lots of different schools of thought, and it wasn't entirely clear how a newcomer should select features and algorithms. But that was all about to change... Further reading: To learn more about the kinds of features that were used in computer vision research, see my blog post: From feature descriptors to deep learning: 20 years of computer vision. 3. Deep Learning: one architecture to rule them all Fast forward to today, and what we're seeing is a large interest in something called Deep Learning. The most popular kinds of deep learning models, as used in large-scale image recognition tasks, are known as Convolutional Neural Nets, or simply ConvNets.
ConvNet diagram from the Torch Tutorial. Deep learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) and that you can use data to fill in the missing parameters. But with deep learning comes great responsibility. Because you are starting with a model of the world which has high dimensionality, you really need a lot of data (big data) and a lot of crunching power (GPUs). Convolutions are used extensively in deep learning (especially computer vision applications), and the architectures are far from shallow. If you're starting out with deep learning, simply brush up on some elementary linear algebra and start coding. I highly recommend Andrej Karpathy's Hacker's Guide to Neural Networks. Implementing your own CPU-based backpropagation algorithm on a non-convolution-based problem is a good place to start. There are still lots of unknowns. The theory of why deep learning works is incomplete, and no single guide or book is better than true machine learning experience. There are lots of reasons why deep learning is gaining popularity, but it is not going to take over the world. As long as you continue brushing up on your machine learning skills, your job is safe. But don't be afraid to chop these networks in half, slice 'n dice at will, and build software architectures that work in tandem with your learning algorithm. The Linux kernel of tomorrow might run on Caffe (one of the most popular deep learning frameworks), but great products will always need great vision, domain expertise, market development, and most importantly: human creativity. Other related buzzwords Big data is the philosophy of measuring all sorts of things, saving that data, and looking through it for information. For business, this big-data approach can give you actionable insights. In the context of learning algorithms, we've only started seeing the marriage of big data and machine learning within the past few years.
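The "implement your own CPU-based backpropagation on a non-convolution problem" exercise recommended above can be done in plain Python. This is a toy sketch under my own choices (a 2-2-1 sigmoid network, squared-error loss, the OR function as the target), not code from Karpathy's guide:

```python
import math
import random

# Toy backpropagation from scratch: a 2-2-1 sigmoid network trained with
# squared-error loss on the OR truth table. Deliberately tiny; real networks
# use matrix libraries and GPUs.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # input -> hidden
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]                      # hidden -> output
b2 = 0.0

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # OR

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return h, y

lr = 0.5
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        dy = (y - t) * y * (1 - y)               # error signal at the output
        for j in range(2):
            dh = dy * w2[j] * h[j] * (1 - h[j])  # chain rule into hidden unit j
            w2[j] -= lr * dy * h[j]
            b1[j] -= lr * dh
            for i in range(2):
                w1[j][i] -= lr * dh * x[i]
        b2 -= lr * dy

preds = [round(forward(x)[1]) for x, _ in data]  # thresholded outputs
```

After training, the rounded outputs recover the OR table; the whole exercise is just the chain rule applied twice.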
Cloud computing, GPUs, DevOps, and PaaS providers have put large-scale computing within reach of the researcher and the ambitious everyday developer. Artificial Intelligence is perhaps the oldest term, the most vague, and the one that has gone through the most ups and downs in the past 50 years. When somebody says they work on Artificial Intelligence, you are either going to want to laugh at them or take out a piece of paper and write down everything they say. Further reading: my 2011 blog post Computer Vision is Artificial Intelligence. Conclusion Machine learning is here to stay. Don't think about it as Pattern Recognition vs. Machine Learning vs. Deep Learning; just realize that each term emphasizes something a little bit different. But the search continues. Go ahead and explore. Break something. We will continue building smarter software and our algorithms will continue to learn, but we've only begun to explore the kinds of architectures that can truly rule them all. If you're interested in real-time vision applications of deep learning, namely those suitable for robotics and home automation applications, then you should check out what we've been building at vision.ai. Hopefully in a few days, I'll be able to say a little bit more. :-)
2012_An efficient learning procedure for deep Boltzmann Machine ICML2008_Training restricted Boltzmann machines using approximations to the likelihood gradient (PCD) 1. The Boltzmann machine learning algorithm 2. More efficient ways to get the statistics (optional) 3. Restricted Boltzmann Machines 4. An example of RBM learning 5. RBMs for collaborative filtering
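The contrastive-divergence approximation behind the references above can be sketched in a few lines. This is a toy illustration of CD-1 (one Gibbs step, no biases, invented data), not the PCD variant from the ICML 2008 paper:

```python
import math
import random

# Toy RBM trained with one step of contrastive divergence (CD-1), the cheap
# stand-in for the intractable likelihood gradient. Pure Python, tiny on
# purpose; biases are omitted to keep the sketch short.

random.seed(1)
NV, NH = 4, 2                                         # visible / hidden units
W = [[random.gauss(0, 0.1) for _ in range(NH)] for _ in range(NV)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v):
    return [sigmoid(sum(v[i] * W[i][j] for i in range(NV))) for j in range(NH)]

def visible_probs(h):
    return [sigmoid(sum(h[j] * W[i][j] for j in range(NH))) for i in range(NV)]

def sample(probs):
    return [1 if random.random() < p else 0 for p in probs]

data = [[1, 1, 0, 0], [0, 0, 1, 1]]   # two binary patterns to model
lr = 0.1
for _ in range(500):
    for v0 in data:
        p_h0 = hidden_probs(v0)           # positive-phase statistics
        h0 = sample(p_h0)
        v1 = sample(visible_probs(h0))    # one Gibbs step (the "1" in CD-1)
        p_h1 = hidden_probs(v1)           # negative-phase statistics
        for i in range(NV):
            for j in range(NH):
                W[i][j] += lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
```

The update pushes up statistics from the data and pushes down statistics from the one-step reconstruction; PCD replaces the fresh reconstruction with a persistent Markov chain.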
I spent some time writing a short essay in English in memory of an old professor (because I am no longer used to writing articles in Chinese). But to submit an English memorial essay to my colleagues in China... I, I, I just don't have the nerve. So I decided to enlist the help of GTS: http://www.gts-translation.com/tools/free-online-translation/ I believe in mentoring - in memory of Prof. Chen 我坚信为人师长是我们的义务——以此纪念陈世训教授 When I look back on my life, I see more than my own footprints. There were my parents, of course. There were also teachers at every phase of my life, from grade school all the way through college. And, there were graduate school professors. Unlike in college, when there were 50 students in one class, students in graduate school had close interaction with their professors. Whenever I look back on my school years, I am filled with gratitude, especially toward my professors and mentors in meteorology and oceanography. 当我回首往事,我看到在我自己的脚印周围,还有许许多多的脚印。有我父母的脚印。有每一个阶段的老师们的脚印,从小学一年级一直到大学毕业;研究生院教授们的脚印更是清新可见。当时的研究生与教授之间有着比较密切的互动。每当想起我的学生时代,我就充满了感激之情,尤其是对我的气象学与海洋学的教授们和导师们。 I have many fond memories of the Meteorology Department of Zhongshan University, where I was a graduate student and then stayed on briefly as junior faculty. I remember riding a bicycle across the old Pearl River Bridge every day to use a "super computer" off campus for my thesis work. I learnt how to play volleyball from my colleagues in the department. Though I never became very good at it, I was able to join beach volleyball games when I was a graduate student at Nova in South Florida. I don't need to close my eyes to see you, Prof. Chen, always neatly dressed and always with a kind smile. I will never forget what you told us in a group meeting before my graduation, on the importance of solidarity. 我对中大气象系有许多美好的回忆。在那儿,我度过了五个春秋,三年的硕士研究生,然后留校工作。为了硕士论文的数据计算,我们每天骑自行车横跨珠江大桥去市里的一个计算中心工作。系里的男教工教我们女教工打排球。虽然我一直打得不好,但在南佛罗里达州的 Nova 读研究生的时候,我也能够参加沙滩排球友谊赛。记忆中的陈先生总是衣着整齐,面带微笑。这一切,至今历历在目。我也不会忘记在毕业前的一次组会上,您谆谆教诲我们"要团结"。 (The following three paragraphs are withheld; otherwise, the original would be flagged as "plagiarism" when it is published.) ps.
The Chinese version has been edited by a real person, not GTS, after I had worked hard on it.
I typed in 打边炉 and it gave me "hot pot". I am impressed! Now, you try it, and let me know how you like it (or not). http://www.gts-translation.com/tools/free-online-translation/ Is it perfect? Of course NOT! It knows 好好学习 ("study hard"), but NOT 天天向上 ("make progress every day").
So it is time to compare and summarize the pros and cons of the two basic NLP (Natural Language Processing) approaches and show where they complement each other. Some notes: 1. In text processing, the majority of basic robust machine learning is based on keywords, the so-called BOW (bag-of-words) model, although there is research on machine learning that goes beyond keywords. It actually utilizes n-gram (mostly bigram or trigram) linear word sequences to simulate language structure. 2. Grammar engineering is mostly a hand-crafted rule system based on linguistic structures (often represented internally as a grammar tree), to simulate linguistic parsing in the human mind. 3. Machine learning is good at viewing the forest (tasks such as document classification or word clustering from a corpus; it fails on short messages) while rules are good at examining each tree (sentence-level tasks such as parsing and extraction; they handle short messages well). This is understandable. A document or corpus contains a fairly big bag of keywords, making it easy for a machine to learn statistical clues from the words for a given task. Short messages do not have enough data points for a machine learning system to use as evidence. On the other hand, grammar rules decode the linguistic relationships between words to understand the sentence, so they are good at handling short messages. 4. In general, a machine learning system based on keyword statistics is recall-oriented while a rule system is precision-oriented. They are complementary in these two core metrics of data quality. Each rule may only cover a tiny portion of the language phenomena, but what it does capture, it usually captures precisely. It is easy to develop a highly precise rule system, but recall typically only picks up incrementally with the number of rules developed.
Because keyword-based machine learning has no knowledge of sentence structures (at best its n-gram evidence indirectly simulates language structure), it usually cannot reach high precision; but as long as the training corpus is sizable, good recall can be expected from the nature of the underlying keyword statistics and the disregard for structural constraints. 5. Machine learning is known for its robustness and scalability, as its algorithms are based on science (e.g., MaxEnt is based on information theory) that can be repeated and rigorously tested (of course, as in any application area, there are tricks and know-how that make things work or fail in practice). Development is also fast once the labeled corpus is available (which is often not easy in practice), because there are off-the-shelf open-source tools and tons of documentation and literature in the community for proven ML algorithms. 6. Grammar engineering, on the other hand, tends to depend more on the expertise of the designer and developer to be robust and scalable. It requires deep skills and secret sauce which may only be accumulated through years of successes as well as lessons learned. It is not a purely scientific undertaking but more of a balancing art in architecture, design, and development. To a degree, this is like chefs in Chinese cooking: with the same ingredients and presumably the same recipe, one chef's dish can taste a lot better than, or very different from, another chef's. The recipe only gives a framework; the secret of great taste is in the details of know-how. It is not easily repeatable across developers, but the same master can repeatedly make the best-quality dishes/systems. 7. The knowledge bottleneck shows up in both machine learning systems and grammar systems. A decent machine learning system requires a large hand-labeled corpus (research-oriented unsupervised learning systems do not need manual annotation, but they are often not practical either).
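The keyword-plus-n-gram representation described in notes 1 and 4 above can be made concrete in a few lines. A minimal sketch (the sentence is invented for illustration):

```python
from collections import Counter

# A bag-of-words feature vector extended with n-grams: the typical input
# representation for the keyword-based statistical models discussed above.

def bow_features(text, n=2):
    tokens = text.lower().split()
    feats = Counter(tokens)                    # unigrams: the bag of words
    for k in range(2, n + 1):                  # add n-grams up to length n
        for i in range(len(tokens) - k + 1):
            feats[" ".join(tokens[i:i + k])] += 1
    return feats

f = bow_features("the screen is great but the battery is poor")
```

The bigrams ("is great", "battery is", ...) are the only structure such a model sees, which is why it cannot tell which object the sentiment attaches to; a grammar rule operating on the parse can.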
There is consensus in the community that the quality of machine learning usually depends more on the data than on the algorithms. On the other hand, the bottleneck of grammar engineering lies in skilled designers (data scientists) and well-trained domain developers (computational linguists), who are often in short supply today. 8. Machine learning is good at coarse-grained specific tasks (the typical example is classification) while grammar engineering is good at fine-grained analysis and detailed insight extraction. Their respective strengths make them highly complementary in certain application scenarios, because as information consumers, users often demand both a coarse-grained overview and the details of actionable intelligence. 9. One big problem of a machine learning system is the difficulty of fixing a reported quality bug. This is because the learned model is usually a black box, and no direct human intervention is allowed or even possible to address a specific problem unless the model is retrained with a new corpus and/or new features. In the latter case, there is no guarantee that the specific problem we want to solve will be addressed well by retraining, as the learning process needs to balance all features in a unified model. This issue is believed to be the major reason why the Google search ranking algorithm favors hand-crafted functions over machine learning: their objective of better user experience can hardly be achieved by a black-box model. 10. A grammar system is much more transparent in the language understanding process. Modern grammar systems are all designed with careful modularization so that each specific quality bug can be traced to the corresponding module of the system for fine-tuning. The effect is direct and immediate, and can be incrementally accumulated for overall performance enhancement. 11.
From the perspective of NLP depth, at least at the current state of the art, machine learning seems to do shallow NLP work fairly well, while grammar engineering can go much deeper in linguistic parsing to achieve deep analytics and insights. (The ongoing deep learning research program might take machine learning somewhat deeper than before, but it remains to be seen how effectively it can do real deep NLP and how deep it can go, especially in the area of text processing and understanding.) Related blogs: why hybrid? on machine learning vs. hand-coded rules in NLP More on machine learning vs. hand-crafted systems: which is smarter and more capable, man or machine? (再谈机器学习和手工系统:人和机器谁更聪明能干?)
Quora has a question with discussions on Why is machine learning used heavily for Google's ad ranking and less for their search ranking? A lot of people I've talked to at Google have told me that the ad ranking system is largely machine learning based, while search ranking is rooted in functions that are written by humans using their intuition (with some components using machine learning). Surprised? Contrary to what many people have believed, Google search consists of hand-crafted functions using heuristics. Why? One very popular reply there is from Edmond Lau, ex-Google Search Quality Engineer, who said something we have been experiencing and have pointed out over and over in my past blogs on Machine Learning vs. Rule Systems, i.e., it is very difficult to debug an ML system for specific observed quality bugs, while a rule system, if designed modularly, is easy to control for fine-tuning: From what I gathered while I was there, Amit Singhal, who heads Google's core ranking team, has a philosophical bias against using machine learning in search ranking. My understanding of the two main reasons behind this philosophy is: In a machine learning system, it's hard to explain and ascertain why a particular search result ranks more highly than another result for a given query. The explainability of a certain decision can be fairly elusive; most machine learning algorithms tend to be black boxes that at best expose weights and models that can only paint a coarse picture of why a certain decision was made. Even in situations where someone succeeds in identifying the signals that factored into why one result was ranked more highly than another, it's difficult to directly tweak a machine learning-based system to boost the importance of certain signals over others in isolated contexts.
The signals and features that feed into a machine learning system tend to only indirectly affect the output through layers of weights, and this lack of direct control means that even if a human can explain why one web page is better than another for a given query, it can be difficult to embed that human intuition into a system based on machine learning. Rule-based scoring metrics, while still complex, provide a greater opportunity for engineers to directly tweak weights in specific situations. From Google's dominance in web search, it's fairly clear that the decision to optimize for explainability and control over search result rankings has been successful at allowing the team to iterate and improve rapidly on search ranking quality. The team launched 450 improvements in 2008, and the number is likely only growing with time. Ads ranking, on the other hand, tends to be much more of an optimization problem where the quality of two ads is much harder to compare and intuit than two web page results. Whereas web pages are fairly distinctive and can be compared and rated by human evaluators on their relevance and quality for a given query, the short three- or four-line ads that appear in web search all look fairly similar to humans. It might be easy for a human to identify an obviously terrible ad, but it's difficult to compare two reasonable ones: branding differences, subtle textual cues, and behavioral traits of the user, which are hard for humans to intuit but easy for machines to identify, become much more important. Moreover, different advertisers have different budgets and different bids, making ad ranking more of a revenue optimization problem than merely a quality optimization problem.
Because humans are less able to understand the decision behind an ads ranking decision that may work well empirically, explainability and control -- both of which are important for search ranking -- become comparatively less useful in ads ranking, and machine learning becomes a much more viable option. Jackie Bavaro, Google PM for 3 years (written 10 Apr 2013): Edmond Lau's answer is great, but I wanted to add one more important piece of information. When I was on the search team at Google (2008-2010), many of the groups in search were moving away from machine learning systems to rules-based systems. That is to say, Google Search used to use more machine learning, and then went the other direction because the team realized they could make faster improvements to search quality with a rules-based system. It's not just a bias; it's something that many sub-teams of search tried out and preferred. I was the PM for Images, Video, and Local Universal - three teams that focus on including the best results when they are images, videos, or places. For each of those teams I could easily understand and remember how the rules worked. I would frequently look at random searches and their results and think, "Did we include the right images for this search? If not, how could we have done better?" And when we asked that question, we were usually able to think of signals that would have helped - try it yourself. The reasons why *you* think we should have shown a certain image are usually things that Google can actually figure out. Anonymous: Part of the answer is legacy, but a bigger part of the answer is the difference in objectives, scope, and customers of the two systems.
The customer of the ad system is the advertiser (and by proxy, Google's sales dept). If the machine-learning system does a poor job, the advertisers are unhappy and Google makes less money. Relatively speaking, this is tolerable to Google. The system has an objective function ($), and machine learning systems can be used when there is an objective function to optimize. The total search space (# of ads) is also much, much smaller. The search ranking system has a very subjective goal - user happiness. CTR, query volume, etc. are very inexact metrics for this goal, especially on the fringes (i.e., query terms that are low-volume/volatile). While much of the decision-making can be automated, there are still lots of decisions that need human intuition. Telling whether site A is better than site B for topic X with limited behavioural data is still a very hard problem. It degenerates into lots of little messy rules and exceptions that try to impose a fragile structure onto human knowledge, and that necessarily need tweaking. An interesting question is: is the Google search index (and associated semantic structures) catching up (in size and robustness) to the subset of the corpus of human knowledge that people are interested in and searching for? My guess is that right now the gap is probably growing - i.e., interesting/search-worthy human knowledge is growing faster than Google's index. Amit Singhal's job is probably getting harder every year. By extension, there are opportunities for new search providers to step into the increasing gap with unique offerings. p.s.: I used to manage an engineering team for a large search provider (many years ago).
scikit-learn is a Python module integrating classic machine learning algorithms into the tightly-knit world of scientific Python packages. It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering. Website: http://scikit-learn.org/dev/index.html
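The "simple and efficient" claim above is easy to see in a minimal example. This sketch assumes scikit-learn is installed; the toy spam/ham texts are invented for illustration:

```python
# Minimal scikit-learn workflow: vectorize text, fit a classifier, predict.
# The data is a hypothetical toy example.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["free money now", "cheap pills offer",
               "meeting at noon", "lunch tomorrow?"]
train_labels = ["spam", "spam", "ham", "ham"]

# Pipeline: bag-of-words features feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["free pills"])[0])
```

Swapping the classifier (say, for `sklearn.svm.LinearSVC`) changes one line, which is exactly the reusability the blurb is advertising.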
Machine learning algorithms are being used to identify VPN traffic. Internet control in China seems to have been tightened recently, according to the Guardian. Several VPN providers claimed that the censorship system can 'learn, discover and block' encrypted VPN protocols. Using machine learning algorithms for protocol classification is not exactly a new topic in the field. And given the fact that even the founding father of the 'Great Firewall,' Fang Binxing himself, has written a paper about utilizing machine learning algorithms in encrypted traffic analysis, it would not be surprising at all if they are now starting to identify suspicious encrypted traffic using numerically efficient classifiers. So the arms race between anti-censorship and surveillance technology goes on.
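To make "numerically efficient classifiers" concrete: such systems typically reduce a flow to a few statistics (packet sizes, timings) and compare them against learned profiles. This is purely a hypothetical sketch; the feature choice, centroid values, and protocol labels are invented, not taken from any real censorship system:

```python
import statistics

# Hypothetical sketch of flow classification from packet-size statistics:
# a nearest-centroid classifier over (mean size, size stdev) features.
# The centroids below are invented stand-ins for profiles that would be
# learned from labeled flows.

def features(packet_sizes):
    return (statistics.mean(packet_sizes), statistics.pstdev(packet_sizes))

centroids = {
    "https": (900.0, 450.0),   # mixed small/large packets
    "vpn":   (1350.0, 80.0),   # uniformly large, padded packets
}

def classify(packet_sizes):
    f = features(packet_sizes)
    return min(centroids,
               key=lambda c: sum((a - b) ** 2
                                 for a, b in zip(f, centroids[c])))

print(classify([1380, 1400, 1350, 1370]))  # uniform large packets
```

No payload inspection is needed, which is what makes this kind of classifier cheap to run at scale even on encrypted traffic.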
http://matpalm.com/blog/2010/08/06/my-list-of-cool-machine-learning-books/ 0) "Machine Learning: a Probabilistic Perspective" by Kevin Patrick Murphy Now available at amazon.com and other vendors. Electronic versions (e.g., for Kindle) will be available later in the fall. Table of contents Chapter 1 (Introduction) Information for instructors from MIT Press. If you are an official instructor, you can request an e-copy, which can help you decide if the book is suitable for your class. You can also request the solutions manual. Errata Matlab software All the figures, together with matlab code to generate them 1) "programming collective intelligence" by toby segaran if you know nothing about machine learning and haven't done maths since high school then this is the book for you. it's a fantastically accessible introduction to the field. includes almost no theory and explains algorithms using actual python implementations. 2) "data mining" by witten and frank this book covers quite a bit more than programming c.i. while still being extremely practical (i.e., very few formulas). about a fifth of the book is dedicated to weka, a machine learning workbench which was written by the authors. apart from the weka section this book has no code. i made a little screencast on weka a while back if you're after a summary. 3) "introduction to data mining" by tan, steinbach and kumar covers almost the same material as the witten/frank text but delves a little bit deeper and with more rigour. includes no code (none of the books do from now on) with algorithms described by formulas. has a number of appendices on linear algebra, probability, statistics etc. so that you can read up if you're a bit rusty or new to the fields (the witten/frank text lacks these). some people might argue having both of these books is a waste since they cover so much of the same ground but i've always found multiple explanations from different authors to be a great way to help understand a topic.
i read the witten/frank text first and am glad i did, but if i could only keep one i'd keep this one. intermission: at this point you've probably got enough mental firepower to handle some of the uni-level machine learning course notes that are floating about online. if you're keen to get a better foundation on the maths side of things it'd be worth working through andrew ng's lecture series on machine learning (20 hours of a second-year stanford course on machine learning). i also found andrew moore's lecture slides really great (they do though require a reasonable understanding of the basics). 4) "foundations of statistical natural language processing" by manning and schutze: not a machine learning book as such, but great for learning to deal with one of the most common types of data around: text. since most of machine learning theory is about maths (i.e. numbers) this is awesome for helping to understand how to deal with text in a mathematical context. 5) "introduction to machine learning" by ethem alpaydin: covers generally the same sort of topics as the data mining books but with much more rigour and theory (derivations, proofs, etc.). i think this is a good thing though, since understanding how things work at a low level gives you the ability to tweak and modify as required. loads more formulas, but again with appendices that introduce the basics in enough detail to get by. 6) "all of statistics" by larry wasserman: by this stage you'll probably have an appreciation of how important statistics is for this domain and it might be worth focusing on it for a bit. personally i found this book to be a great read, and though i've only read certain sections in depth i'm looking forward to getting a chance to work through it cover to cover. 7) "the elements of statistical learning" by hastie, tibshirani and friedman: with a bit more stats under your belt you might have a chance of getting through this one, the most complex of the lot.
this book is absolutely beautifully presented, and now that it's FREE to download you've got no reason not to have a crack at it. a remarkable piece of work and one i've yet to get through fully cover to cover; it's quite hardcore and right on the border of my level of understanding (which makes it perfect for me :P). ps. books i haven't read that are in the mail: "machine learning" by tom mitchell: have been wanting to read this one for a while (i'm a big fan of tom mitchell) but couldn't justify the cost. however, i just found out the other day that the paperback is a third of the price of the hardback i was looking at!! the book's in the mail. "pattern recognition and machine learning" by chris bishop: all of a sudden it seemed like everyone was reading this but me, so it was time to jump on the bandwagon. (On Pattern Classification:) if you come from a computer science or physics background, read Bishop's Pattern Recognition and Machine Learning first, then T. Hastie's Elements of Statistical Learning; if your background is mathematics or statistics, just reverse the order. Bishop's book is rather thick, so Jordan's lecture notes on statistical learning are recommended: comprehensive and of moderate difficulty: http://www.cs.berkeley.edu/~jordan/courses/281B-spring04/ If you really have no appetite for English-language texts, Li Hang's book on statistical learning is a good basic option. If you just want a look at the application scenarios, Wu Jun's The Beauty of Mathematics is recommended. The above is reposted from http://www.zhizhihu.com/html/y2012/4019.html
Characteristics of chip evolution with elevating cutting speed from low to very high. Liu Zhanqiang*, Su Guosheng. To validate the correlation between chip morphology and the dynamic mechanical properties of the material, a wide-range cutting-speed experiment (from 30 m/min to 7000 m/min) was conducted with AerMet100 steel. The chips were collected and photographed with an optical microscope. The focus is on workpiece material embrittlement and the evolution of chip morphology as the cutting speed rises. It is found that the workpiece material embrittles with increasing cutting speed; at 7000 m/min the metal becomes completely brittle and the chip is made up of small, non-plastically fractured fragments. Characteristics of the cutting temperature and cutting heat in the process are also presented. 2012-Characteristics of chip evolution with elevating cutting speed from low to .pdf
What is life? This was the title of a series of lectures Schrödinger gave in Dublin in 1943, later collected into a book of the same name, "What is Life?", which I introduced in my blog post "Schrödinger's Deep Insight". The question and the book drew many physicists into the life sciences, ultimately leading to the discovery of the double-helix structure of DNA and the cracking of many of life's codes. Now, in 2012, Craig Venter has given another lecture in Dublin, titled "What is Life? – A 21st Century Perspective", using modern biological discoveries to try to answer the question and explain the essence of life. In Venter's view, life is essentially a Turing machine, and the process of life is like running a piece of software! That software is encoded in DNA, and it can be modified! Venter founded Celera Genomics, which completed sequencing ahead of the Human Genome Project, already a remarkable achievement. He also founded JCVI (the J. Craig Venter Institute), which created life by assembling DNA and set off the revolution in synthetic biology. In Venter's eyes, life really is a program encoded in an extremely long sequence, and a human life is the running of that program. The program can be modified, but only with great care, because every part is tightly interconnected; careless modification easily crashes the system. From a human perspective, this is hard for many people to accept. How can a living person be reduced to a machine and a program? Doesn't that make life meaningless? I agree with Venter's view, but it does not make life meaningless. What is innate is hard to change; that is parental inheritance, which can be seen as a gift of nature and forms part of the self, but not all of it. The whole self also includes one's later experiences, the trajectory of one's life, and so on. Likewise, the development of the brain and the formation of memory are based on genes but go beyond them. The same genes yield similar brain structures, like building the same computer hardware; but which programs run on it and what is stored in it are shaped by one's own experiences. Everyone's body and brain structure are roughly alike, yet people can still live very different lives. To use an imperfect analogy: several computers, some running Windows, some Linux, some with Warcraft installed, some with StarCraft, behave very differently when running. Even if both have StarCraft, the chosen map, the player at the controls, the strategy and tactics, the micro-management, and the opponent's strategy all make each game unique. That game is the self. Craig Venter. There is also the video circulating online of Master Yancan lecturing on "life" (in his accent, "shengming"). It makes a nice contrast between Chinese and Western culture and ways of thinking, and the two men even look a bit alike, don't they?! Chinese thinking tends to remain fuzzy, while Westerners have always pursued precise understanding. This is driven by the interests of the general public, that is, by culture, as analyzed in my post "The Tragedy of Culture: How Tradition Wears Down the Spirit". References: [1] "Father of artificial life" Venter: life is a Turing machine, China Science Daily, 2012. [2] "What is Life? – A 21st Century Perspective" with Dr. Craig Venter.
http://www.work.caltech.edu/telecourse.html A real Caltech course, not a watered-down version. A free, introductory machine learning course taught by Caltech Professor Yaser Abu-Mostafa. Lectures recorded from a live broadcast, including Q&A. Prerequisites: basic probability, matrices, and calculus. Homeworks with online grading and ranking, and a discussion forum for participants. The summer session starts on July 10, 2012.
http://scienceblog.com/ Blurring the line between man and machine Work out harder through motivation in a pill? Do Trees Crave Personal Space? MicroRNAs, autophagy and clear cell renal cell carcinoma Tighten your Tropical Belts: Climate Change in the North Volcanic gases could deplete ozone layer Contaminated Alcohol Pads Tied to Illnesses in Children’s Hospital Why are some people greener than others? Radiation-resistant circuits can survive space, damaged nuclear plants NASA Sees Smoke from Siberian Fires Reach the U.S. Coast Mild stress can affect perceptions same as life-threatening stress A father’s love is one of the greatest influences on personality development
Before we start discussing the topic of a hybrid NLP (Natural Language Processing) system, let us look at the concept of hybrid from our life experience. I drove a classic Camry for years and had never thought of changing to another brand because, as a vehicle, there was really nothing to complain about. Yes, the style is old, but I am getting old too; who beats whom? Then one day a few years ago, we needed to buy a new car to retire my damaged Camry. My daughter suggested a hybrid, following the trend of going green. So I have been driving a Prius ever since, and have fallen in love with it. It is quiet, with Bluetooth and line-in, ideal for my iPhone music enjoyment. It has low emissions, and I can finally say goodbye to smog tests. It saves at least a third on gas. We could have gained all these benefits by purchasing an expensive all-electric car, but I want the same feel of power on the freeway and dislike the idea of having to charge the car too frequently. A hybrid gets the best of both worlds for me, and is not that much more expensive. Now back to NLP. There are two major approaches to NLP, namely machine learning and grammar engineering (i.e., hand-crafted rule systems). As mentioned in previous posts, each has its own strengths and limitations, as summarized below. In general, a rule system is good at capturing a specific language phenomenon (trees) while machine learning is good at representing the general picture of the phenomena (forest). As a result, it is easier for rule systems to reach high precision, but it takes a long time to develop enough rules to gradually raise the recall. Machine learning, on the other hand, has much higher recall, usually with a compromise in precision or with a precision ceiling. Machine learning is good at simple, clear, coarse-grained tasks, while rules are good at fine-grained tasks. One example is sentiment extraction.
The coarse-grained task there is sentiment classification of documents (thumbs-up / thumbs-down), which can be achieved fast by a learning system. The fine-grained task of sentiment extraction involves extracting sentiment details and the related actionable insights, including associating the sentiment with an object, differentiating positive/negative emotions from positive/negative behaviors, capturing the aspects or features of the object involved, decoding the motivation or reasons behind the sentiment, etc. For sophisticated tasks of extracting such details and actionable insights, rules are a better fit. The strength of machine learning lies in its retraining ability. In theory, once the algorithm is developed and debugged it remains stable, and a learning system can be expected to improve once a larger and better-quality corpus is used for retraining (in practice, retraining is not always easy: I have seen famous learning systems deployed at client sites for years without being retrained, for various reasons). Rules, on the other hand, need to be manually crafted and enhanced. Supervised machine learning is more mature for applications, but it requires a large labelled corpus. Unsupervised machine learning only needs a raw corpus, but it is research-oriented and riskier in applications. A promising approach is semi-supervised learning, which only needs a small labelled corpus as seeds to guide the learning. We can also use rules to generate the initial corpus or seeds for semi-supervised learning. Both approaches involve knowledge bottlenecks. A rule system's bottleneck is skilled labor: it requires linguists or knowledge engineers to manually encode each rule in NLP, much like a software engineer in the daily work of coding. The biggest challenge to machine learning is the sparse-data problem, which requires a very large labelled corpus to overcome.
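A minimal sketch of what a "fine-grained" sentiment rule looks like, assuming a toy lexicon and a single hand-crafted pattern; real rule systems use full parsers and large lexicons, but the essential idea, tying each sentiment to the object it is about rather than labeling the whole document, is the same.

```python
# Toy fine-grained sentiment rule: associate a polarity with its object.
# The lexicon and the single pattern below are invented for illustration.
import re

POS = {"love", "like", "enjoy"}
NEG = {"hate", "dislike", "regret"}

# rule: "I <sentiment-verb> (the|this|my) <object>"
PATTERN = re.compile(r"\bI\s+(\w+)\s+(?:the|this|my)\s+(\w+)", re.I)

def extract(sentence):
    """Return (polarity, object) pairs found by the rule, or []."""
    results = []
    for verb, obj in PATTERN.findall(sentence):
        v = verb.lower()
        if v in POS:
            results.append(("positive", obj))
        elif v in NEG:
            results.append(("negative", obj))
    return results

print(extract("I love this camera but I hate the battery"))
# → [('positive', 'camera'), ('negative', 'battery')]
```

A document classifier would label this sentence with a single mixed or positive tag; the rule instead yields one opinion per object, which is the actionable detail the post is describing.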
The knowledge bottleneck for supervised machine learning is the labor required to label such a large corpus. We can build a system that combines the two approaches so that they complement each other. There are different ways of combining the two approaches in a hybrid system. One example is the practice we use in our product, where the extracted insights are structured in a back-off model: high-precision results from rules are ranked higher than the medium-precision results returned by statistical systems or machine learning. This helps the system reach a configurable balance between precision and recall. When labelled data are available (e.g., the community has already built the corpus, or, for some tasks, the public domain has the data; for instance, sentiment classification of movie reviews can use review data with users' feedback on a 5-star scale), and when the task is simple and clearly defined, using machine learning will greatly speed up the development of a capability. Not every task is suitable for both approaches. (Note that suitability is in the eye of the beholder: I have seen many passionate ML specialists willing to try everything in ML irrespective of the nature of the task; as the old saying goes, when you have a hammer, everything looks like a nail.) For example, machine learning is good at document classification, while rules are mostly powerless for such tasks. But for complicated tasks such as deep parsing, rules constructed by linguists usually achieve better performance than machine learning. Rules also perform better for tasks which have clear patterns, for example, identifying data items like time, weight, length, money, address, etc. This is because clear patterns can be directly encoded in rules to be logically complete in coverage, while machine learning based on samples still faces a sparse-data challenge.
When designing a system, in addition to using a hybrid approach for some tasks, for other tasks we should choose the most suitable approach depending on their nature. Other aspects of comparison between the two approaches involve modularization and debugging in industrial development. A rule system can fairly easily be structured as a pipeline of modules, so that a complicated task is decomposed into a series of subtasks handled by different levels of modules. In such an architecture, a reported bug is easy to localize and fix by adjusting the rules in the related module. Machine learning systems are based on a model learned from the corpus. The model itself, once learned, is often a black box (even when the model is represented by a list of symbolic rules as the result of learning, it is risky to manually mess with those rules to fix a data-quality bug). Bugs are supposed to be fixable by retraining the model on an enhanced corpus and/or with new features. But retraining is a complicated process which may or may not solve the problem. It is difficult to localize and directly handle specific reported bugs in machine learning. To conclude, due to the complementary pros and cons of the two basic approaches to NLP, a hybrid system involving both approaches is desirable and worth more attention and exploration. There are different ways of combining the two approaches in a system, including a back-off model using rules for precision and learning for recall, and semi-supervised learning using high-precision rules to generate the initial corpus or "seeds". Related posts: Comparison of Pros and Cons of Two NLP Approaches; Is Google ranking based on machine learning?; "Wei's Notes: Two Approaches to Automatic Language Analysis"; "Wei's Notes: Machine Learning and Natural Language Processing"; [Pinned: Index of Wei's NLP posts on the ScienceNet blog (periodically updated)]
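The back-off combination described in the post can be sketched as follows; `rule_extract` and `ml_extract` are hypothetical stand-ins for the two components, and the confidence scores are invented for illustration.

```python
# Sketch of a back-off model: high-precision rule hits are kept first,
# and statistical (ML) results only fill in objects the rules missed.
# Both extractors below are toy stand-ins for real components.

def rule_extract(text):
    """Stand-in for a hand-crafted pattern: high precision, low recall."""
    return [("positive", "camera", 0.95)] if "love this camera" in text else []

def ml_extract(text):
    """Stand-in for a learned classifier: high recall, medium precision."""
    hits = []
    if "camera" in text:
        hits.append(("positive", "camera", 0.70))
    if "battery" in text:
        hits.append(("negative", "battery", 0.65))
    return hits

def backoff(text):
    """Rules outrank ML; ML results are used only where rules found nothing."""
    results = list(rule_extract(text))
    covered = {obj for _, obj, _ in results}
    for pol, obj, conf in ml_extract(text):
        if obj not in covered:
            results.append((pol, obj, conf))
    return results

print(backoff("I love this camera but the battery is poor"))
# → [('positive', 'camera', 0.95), ('negative', 'battery', 0.65)]
```

The rule's verdict on "camera" shadows the lower-confidence ML verdict on the same object, while the ML component contributes the "battery" opinion the rule missed, which is exactly the configurable precision/recall trade the post describes.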
I. Basic steps of data mining: 1. data cleaning; 2. data integration; 3. data selection; 4. data transformation; 5. data mining; 6. pattern evaluation; 7. knowledge presentation. II. Types of knowledge: characterization, association, classification, clustering, evolution analysis. III. Classification of data mining by prediction target: 1. classification, for predicting discrete values; 2. prediction, for predicting continuous values. IV. Difference between ML and DM: in contrast to machine learning, the emphasis of data mining lies on the discovery of previously unknown patterns, as opposed to generalizing known patterns to new data. V. Difference between ID3 and C4.5: C4.5 made a number of improvements to ID3. Some of these are: handling both continuous and discrete attributes (to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it); handling training data with missing attribute values (C4.5 allows attribute values to be marked as '?' for missing; missing attribute values are simply not used in gain and entropy calculations); handling attributes with differing costs; and pruning trees after creation (C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help by replacing them with leaf nodes). PS: C5.0 performs better than C4.5. To be continued...
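Two of the C4.5 behaviours noted above, threshold splits for continuous attributes and skipping '?' values in the entropy/gain calculation, can be sketched in a few lines; the data set is a made-up toy example with one continuous attribute.

```python
# Sketch of two C4.5 mechanisms: a threshold split for a continuous
# attribute, with '?' (missing) values excluded from entropy and gain.
# The data are a toy example.
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a non-empty list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_for_threshold(rows, threshold):
    """rows: (value, label) pairs; '?' values are simply not used."""
    rows = [(v, y) for v, y in rows if v != "?"]
    labels = [y for _, y in rows]
    left  = [y for v, y in rows if v <= threshold]
    right = [y for v, y in rows if v > threshold]
    n = len(rows)
    split = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - split

def best_threshold(rows):
    """C4.5 tries thresholds between adjacent observed values."""
    values = sorted({v for v, _ in rows if v != "?"})
    cands = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(cands, key=lambda t: gain_for_threshold(rows, t))

data = [(64, "no"), (65, "yes"), (68, "yes"), (69, "yes"),
        (70, "no"), ("?", "yes"), (72, "no"), (75, "no")]
print(best_threshold(data))  # → 69.5 (splits 3-yes/1-no from 3-no)
```

The '?' row contributes nothing to any count, which is exactly the "simply not used in gain and entropy calculations" behaviour the notes describe.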
《Knowledge Discovery in Databases: An Overview》, William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus, AAAI, 1992. Abstract: After a decade of fundamental interdisciplinary research in machine learning, the spadework in this field has been done; the 1990s should see the widespread exploitation of knowledge discovery as an aid to assembling knowledge bases. The contributors to the AAAI Press book Knowledge Discovery in Databases were excited at the potential benefits of this research. The editors hope that some of this excitement will communicate itself to AI Magazine readers of this article. The goal of this article: to present an overview of the state of the art in research on knowledge discovery in databases. We analyze knowledge discovery and define it as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. We then compare and contrast database, machine learning, and other approaches to discovery in data. We present a framework for knowledge discovery and examine problems in dealing with large, noisy databases, the use of domain knowledge, the role of the user in the discovery process, discovery methods, and the form and uses of discovered knowledge. We also discuss application issues, including the variety of existing applications and the propriety of discovery in social databases. We present criteria for selecting an application in a corporate environment. In conclusion, we argue that discovery in databases is both feasible and practical, and outline directions for future research, which include better use of domain knowledge, efficient and incremental algorithms, interactive systems, and integration on multiple levels. Personal comment: an older but classic data-mining survey. In my view, the article has two entry points: one is machine learning (Tables 1 and 2), the other is Figure 1 in the paper. Knowledge Discovery in Databases Overview.pdf beamer_Knowledge_Discovery_Database_Overview.pdf beamer_Knowledge_Discovery_Database_Overview.tex
From: http://mlss2011.comp.nus.edu.sg/index.php?n=Site.Slides MLSS 2011 Machine Learning Summer School, 13-17 June 2011, Singapore. Slides: Chiranjib Bhattacharyya, Kernel Methods (pdf); Wray Buntine, Introduction to Machine Learning (pdf); Zoubin Ghahramani, Gaussian Processes, Graphical Model Structure Learning (Part 1 pdf, Part 2 pdf, Part 3 pdf); Stephen Gould, Markov Random Fields for Computer Vision (Part 1 pdf, Part 2 pdf, Part 3 pdf); Marko Grobelnik, How We Represent Text? ...From Characters to Logic (pptx); David Hardoon, Multi-Source Learning: Theory and Application (pdf); Mark Johnson, Probabilistic Models for Computational Linguistics (Part 1 pdf, Part 2 pdf, Part 3 pdf); Wee Sun Lee, Partially Observable Markov Decision Processes (pdf, pptx); Hang Li, Learning to Rank (pdf); Sinno Pan and Qiang Yang, Transfer Learning (Part 1 pptx, Part 2 pdf); Tomi Silander, Introduction to Graphical Models (pdf); Yee Whye Teh, Bayesian Nonparametrics (pdf); Ivor Tsang, Feature Selection using Structural SVM and its Applications (pdf); Max Welling, Learning in Markov Random Fields (pdf, pptx)
Statistical Machine Translation. Abstract: We have been developing a statistical machine translation system for speech-to-speech translation. We focus our research on the text-to-text translation task now, but we will include speech-to-speech translation among our research topics soon. We are interested in building a translation model, decoding over a word graph, and combining a statistical machine translation system with a speech recognizer. http://isoft.postech.ac.kr/research/SMT/smt.html Statistical Machine Translation. Input: the SMT system gets a foreign sentence as input. Output: the SMT system generates a native sentence which is a translation of the input. The Language Model provides the probability of an arbitrary word sequence. The Translation Model provides the probabilities of possible translation pairs. The Decoding Algorithm is a graph-search algorithm that finds the best path on a word graph. Decoding process: the decoder is the core component of the SMT system. The decoder gets possible partial translations from the translation model, then selects and re-arranges them to make the best translation. Initialize: create a small partial model for caching and pre-calculate the future cost. A hypothesis is a partial translation generated by applying a series of translation options. The decoding process is an iteration of two tasks: choosing a hypothesis and expanding it. The process terminates when there is no remaining hypothesis to expand. Speech-to-Speech Machine Translation: speech-to-speech machine translation can be achieved by cascading three independent components: an ASR system, an SMT system, and a TTS system. That is, the output of the ASR is the input to the SMT system, and the output of the SMT system is the input to the TTS system. We use the cascading approach now, but we are interested in a joint model which combines the ASR and the SMT decoder.
Classical Paper List on Machine Learning and Natural Language Processing from Zhiyuan Liu Hidden Markov Models Rabiner, L. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. (Proceedings of the IEEE 1989) Freitag and McCallum, 2000, Information Extraction with HMM Structures Learned by Stochastic Optimization, (AAAI'00) Maximum Entropy Adwait R. A Maximum Entropy Model for POS tagging, (1994) A. Berger, S. Della Pietra, and V. Della Pietra. A maximum entropy approach to natural language processing. (CL'1996) A. Ratnaparkhi. Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, University of Pennsylvania, 1998. Hai Leong Chieu, 2002. A Maximum Entropy Approach to Information Extraction from Semi-Structured and Free Text, (AAAI'02) MEMM McCallum et al., 2000, Maximum Entropy Markov Models for Information Extraction and Segmentation, (ICML'00) Punyakanok and Roth, 2001, The Use of Classifiers in Sequential Inference. (NIPS'01) Perceptron Collins, 2002, Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms (EMNLP'02) Y. Li, K. Bontcheva, and H. Cunningham. Using Uneven-Margins SVM and Perceptron for Information Extraction. (CoNLL'05) SVM Z. Zhang. Weakly-Supervised Relation Classification for Information Extraction (CIKM'04) H. Han et al. Automatic Document Metadata Extraction using Support Vector Machines (JCDL'03) Aidan Finn and Nicholas Kushmerick. Multi-level Boundary Classification for Information Extraction (ECML'2004) Yves Grandvalet and Johnny Mariéthoz. A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification. (NIPS'05) CRFs J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. (ICML'01) Hanna Wallach. Efficient Training of Conditional Random Fields. MS Thesis 2002 Taskar, B., Abbeel, P., and Koller, D. Discriminative probabilistic models for relational data. 
(UAI'02) Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT/NAACL 2003) B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. (NIPS'2003) S. Sarawagi and W. W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction (NIPS'04) Brian Roark et al. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm (ACL'2004) H. M. Wallach. Conditional Random Fields: An Introduction (2004) Kristjansson, T.; Culotta, A.; Viola, P.; and McCallum, A. Interactive Information Extraction with Constrained Conditional Random Fields. (AAAI'2004) Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction. (NIPS'2004) John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel Conditional Random Fields: Representation and Clique Selection. (ICML'2004) Topic Models Thomas Hofmann. Probabilistic Latent Semantic Indexing. (SIGIR'1999). David Blei, et al. Latent Dirichlet allocation. (JMLR'2003). Thomas L. Griffiths, Mark Steyvers. Finding Scientific Topics. (PNAS'2004). POS Tagging J. Kupiec. Robust part-of-speech tagging using a hidden Markov model. (Computer Speech and Language'1992) Hinrich Schutze and Yoram Singer. Part-of-Speech Tagging using a Variable Memory Markov Model. (ACL'1994) Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. (EMNLP'1996) Noun Phrase Extraction E. Xun, C. Huang, and M. Zhou. A Unified Statistical Model for the Identification of English baseNP. (ACL'00) Named Entity Recognition Andrew McCallum and Wei Li. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons. (CoNLL'2003). Moshe Fresko et al. A Hybrid Approach to NER by MEMM and Manual Rules, (CIKM'2005). Chinese Word Segmentation Fuchun Peng et al. Chinese Segmentation and New Word Detection using Conditional Random Fields, COLING 2004. 
Document Data Extraction Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. (ICML'2000). David Pinto, Andrew McCallum, etc. Table Extraction Using Conditional Random Fields. SIGIR 2003. Fuchun Peng and Andrew McCallum. Accurate Information Extraction from Research Papers using Conditional Random Fields. (HLT-NAACL'2004) V. Carvalho, W. Cohen. Learning to Extract Signature and Reply Lines from Email. In Proc. of Conference on Email and Spam (CEAS'04) 2004. Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang, Email Data Cleaning, SIGKDD'05 P. Viola, and M. Narasimhan. Learning to Extract Information from Semi-structured Text using a Discriminative Context Free Grammar. (SIGIR'05) Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Li Teng, and Qinghua Zheng, Automatic Extraction of Titles from General Documents using Machine Learning, Information Processing and Management, 2006 Web Data Extraction Ariadna Quattoni, Michael Collins, and Trevor Darrell. Conditional Random Fields for Object Recognition. (NIPS'2004) Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Shuming Shi, Yunbo Cao, and Hang Li, Title Extraction from Bodies of HTML Documents and Its Application to Web Page Retrieval, (SIGIR'05) Jun Zhu et al. Mutual Enhancement of Record Detection and Attribute Labeling in Web Data Extraction. (SIGKDD 2006) Event Extraction Kiyotaka Uchimoto, Qing Ma, Masaki Murata, Hiromi Ozaku, and Hitoshi Isahara. Named Entity Extraction Based on A Maximum Entropy Model and Transformation Rules. (ACL'2000) GuoDong Zhou and Jian Su. Named Entity Recognition using an HMM-based Chunk Tagger (ACL'2002) Hai Leong Chieu and Hwee Tou Ng. Named Entity Recognition: A Maximum Entropy Approach Using Global Information. (COLING'2002) Wei Li and Andrew McCallum. Rapid development of Hindi named entity recognition using conditional random fields and feature induction. ACM Trans. Asian Lang. Inf. Process. 
2003 Question Answering Rohini K. Srihari and Wei Li. Information Extraction Supported Question Answering. (TREC'1999) Eric Nyberg et al. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. (TREC'2003) Natural Language Parsing Leonid Peshkin and Avi Pfeffer. Bayesian Information Extraction Network. (IJCAI'2003) Joon-Ho Lim et al. Semantic Role Labeling using Maximum Entropy Model. (CoNLL'2004) Trevor Cohn et al. Semantic Role Labeling with Tree Conditional Random Fields. (CoNLL'2005) Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. Joint Learning Improves Semantic Role Labeling. (ACL'2005) Shallow Parsing Ferran Pla, Antonio Molina, and Natividad Prieto. Improving text chunking by means of lexical-contextual information in statistical language models. (CoNLL'2000) GuoDong Zhou, Jian Su, and TongGuan Tey. Hybrid text chunking. (CoNLL'2000) Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT-NAACL'2003) Acknowledgement Dr. Hang Li, for the original paper list.
Before you read this blog, I have a short story to share with you. One day many months ago, a colleague asked me how I was going to make a living in a few years when MACHINES would be translating, say, English into Chinese, or vice versa. I didn't know how to answer his question, and became worried (because I was planning to be a full-time freelance English editor). So, I went home and did my homework, asking the machine to translate a page for me online. Guess what happened? This is what a machine can do for us, in terms of translation. Enjoy. 百科名片 Wikipedia card 临床营养学是关于食物中营养素的性质,分布,代谢作用以及食物摄入不足的后果的一门科学。 Journal of Clinical Nutrition is about the nature of nutrients in food, distribution, metabolism and food intake in the consequences of a science. 临床营养学中的营养素是指食物中能被吸收及用于增进健康的化学物。 In Clinical Nutrition is the food nutrients can be absorbed and used to improve the health of the chemicals. 某些营养素是必需的,因为它们不能被机体合成,因此必须从食物中获得。 Certain nutrients are necessary because they can not synthesized by the body and therefore must obtain from food. 对患者来说,合理平衡的营养饮食极为重要。 For patients, a reasonable balance diet is extremely important. 医食同源,药食同根,表明营养饮食和药物对于治疗疾病有异曲同工之处。 Medical and Edible food and medicine from the same root, that diet and medication for the treatment of diseases would be similar. 合理的营养饮食可提高机体预防疾病、抗手术和麻醉的能力。 A reasonable diet can improve the body to prevent disease, the ability of anti-surgery and anesthesia.