The 3rd Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2010) (EI-indexed)
http://nlpoe2010.pqpq.net/
In conjunction with the 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10)
August 31-September 3, 2010, Toronto, Canada

Call for Papers

Natural Language Processing (NLP) addresses the problems of automated understanding and generation of natural human languages. The former identifies the syntactic structure of a sentence and judges the semantic relations among its syntactic constituents, with the aim of eventually understanding the sentence. The latter constructs semantic structures and syntactic constituents according to the semantic and syntactic properties of the selected lexical items, and eventually generates grammatically well-formed sentences. The goal of NLP applications is to facilitate human-machine communication in natural language. In particular, the aim is to build software systems that process natural language, for tasks such as machine translation, computer-assisted teaching, information retrieval, automatic text categorization, automatic summarization, speech recognition and synthesis, information extraction from text, and intelligent search on the Internet. Today, with the wide use of the Internet, the demand for language information puts a high premium on the automated processing of massive amounts of language data.

Ontology engineering is a subfield of artificial intelligence and computer science that aims at a structured representation of the terms within a particular domain and the relationships between them, in order to facilitate knowledge sharing and reuse. An ontology project involves the development of ontology-building programs, ontology life-cycle management, research on ontology-building methods, support tools and ontology languages, and a series of similar activities.
Ontologies have found important applications in information sharing, system integration, knowledge-based software development and many other areas of the software industry. However, ontology engineering is a time-consuming and painstaking endeavor, and NLP technology has important contributions to make to the rapid, automatic development of ontologies.

This workshop will focus on recent advances in ontology engineering and NLP, with the aim of promoting interaction between, and the joint growth of, the two areas. We are particularly interested in the building of upper-level language ontologies in NLP and in the application of NLP technology to ontology engineering. We hope that individuals and research institutions in both areas will take an interest in this workshop, which may contribute to the integration and growth of the two fields.

The topics of the workshop include, but are not limited to, the following:
1. Natural language understanding, including syntactic parsing, word sense disambiguation, semantic role labeling, etc.;
2. Text mining, including named entity recognition, term recognition, extraction of terms, synonyms and concepts, relation extraction, etc.;
3. Lexical resources and corpora, including dictionaries, thesauri, ontologies, etc.;
4. Ontology learning and population from text, the Web and other resources;
5. Application issues of ontology-based NLP: information extraction, text categorization, text summarization and other applications;
6. Other relevant topics in ontology learning, ontology evolution, ontology modeling, ontology application, etc.

Paper Submission

Paper submissions should be limited to a maximum of 4 pages (one additional page is allowed, subject to an extra-page fee). Papers must be in English and should be formatted according to the IEEE 2-column format (see the Author Guidelines at http://www.computer.org/portal/pages/cscps/cps/final/wi08.xml ).
All submitted papers will be reviewed by at least 2 program committee members on the basis of technical quality, relevance, significance, and clarity. The workshop only accepts on-line submissions. Please use the Submission Form on the WI'10 website to submit your paper:
http://wi-consortium.org/cyberchair/wiiat10/scripts/ws_submit.php

Publication

All papers accepted for workshops will be included in the Workshop Proceedings published by the IEEE Computer Society Press, which are indexed by EI, and will be available at the workshops.

Important Dates

Workshop paper submission: April 16, 2010
Notification of paper acceptance: May 28, 2010
Camera-ready of accepted papers: June 21, 2010
Workshops date: August 31, 2010
Conference dates: September 1 - 3, 2010

Workshop Organizers

Zhifang Sui, Associate Professor
Institute of Computational Linguistics (ICL), Peking University
No.5 Yiheyuan Rd., Haidian District, Beijing 100871, China
E-mail: suizhifang@gmail.com
Tel: 086-01062753081-105

Yao Liu, Associate Professor
Institute of Scientific and Technical Information of China
No.15 Fuxing Road, Haidian District, Beijing 100038, China
E-mail: liuy@istic.ac.cn
Tel: 086-01058882 053

From: http://www.sciencenet.cn/m/user_content.aspx?id=301834
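This year I took part in the Open Ontology Repository (OOR) activities and, while drafting the FRSAR report, reread many classics on subject classification. Looking back at what ontologies can and cannot do, I realized that my earlier understanding of them had many blind spots that are worth writing down. These observations may not agree with the views of many authorities; I set them out here simply to start a discussion.

Ever since the term ontology (also called a practical classification system) began to circulate in library and information science, it has at times been invested with boundless hope, as if a glimmer of light had finally appeared in the long, seemingly endless search for ways to organize and retrieve library and information resources. This miracle cure, not yet tasted (because we have not obtained it, not because we could not bear to try it) but already labeled a panacea, is on its way to becoming both the starting point and the end point of every grand plan for the 2010s; we hope that the disorder of the networked world will simply dissolve before the all-purpose power of ontologies...

The existence of hundreds, even thousands, of ontology resources on the Web (meaning ontologies that have been built and formally expressed), the outstanding results of long-term, large-scale ontology centers funded by the governments of the United States, the United Kingdom and others (for example BioPortal), and the popular annual Ontology Summit with its fast-paced weekly online pre-summit meetings all attest, to some degree, to the popularity and the capability of ontologies.

The question is: for our library and information work, which centers on information retrieval and the organization of document resources and which serves readers rather than machines, how much can ontologies really do?

In my view, (1) ontologies are good knowledge organization systems (KOS) that we can put to use, and we must understand and exploit the many ontology resources that already exist; (2) ontologies (in the general sense) that are built for reasoning, according to logic and axioms, are only one of the KOS available to us and should be neither our only nor our ultimate goal and means.

"An ontology is a formal, explicit specification of a shared conceptualization." Studer, R., Benjamins, V. R., and Fensel, D. (1998). Knowledge engineering: Principles and methods. Data and Knowledge Engineering, 25: 161-197.

Here "formal" means machine-processable: concepts, attributes, relationships, constraints and so on are all explicitly defined.

In machine processing, only an ontology built on logic and axioms can avoid semantically ambiguous inferences. The class Insecta can belong only to the phylum Arthropoda; among the classes of an ontology, a subclass naturally inherits all attributes of its superclass (and has further, more specific attributes of its own).

This strict genus-species hierarchical relationship is, generally speaking, largely guaranteed in taxonomies, guaranteed to a partial-to-large degree in thesauri (depending on the discipline), and guaranteed to a lesser degree in library classification schemes.

The genus-species relationship is only one of the hierarchical relationships used in library classification. In many cases a library classification is arranged according to how things are studied, researched and discussed, and according to the disciplines, books, journals and papers that result. Insects can therefore be listed under many classes, for example agricultural pests, disease vectors, food, artistic representation and control techniques. Such "perspective hierarchies" do not carry the logical relationship of conceptual intension that genus-species relationships follow, and they do not reflect basic properties such as reflexivity, anti-symmetry and transitivity. In an ontology, these viewpoints and perspectives on the concept under study are expressed by many means other than hierarchy, such as attributes, restrictions and rules, semantic relation types, and the concept-relation assertions generated from them.

Ontologies and other KOS also use whole-part and class-instance hierarchies, which I will not discuss here.

Ontologies and the KOS of library and information science: (1) Is there any need to use them together? (2) Can each make use of the other? (3) Will ontologies replace all KOS?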
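The contrast between a strict genus-species hierarchy and a perspective hierarchy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names follow the Insecta/Arthropoda example above, while the attribute names, the `Animalia` root and all function names are hypothetical and do not come from any real ontology.

```python
# Toy genus-species hierarchy. Class names follow the Insecta example in the
# text; the attribute names are illustrative assumptions only.
IS_A = {
    "Insecta": "Arthropoda",
    "Arthropoda": "Animalia",
    "Animalia": None,
}

OWN_ATTRIBUTES = {
    "Animalia": {"multicellular"},
    "Arthropoda": {"exoskeleton", "jointed_legs"},
    "Insecta": {"six_legs"},
}

# A perspective hierarchy as found in a library classification: the same
# topic is filed under several headings, with no is-a semantics implied.
PERSPECTIVES = {
    "Insects": ["Agricultural pests", "Disease vectors", "Food",
                "Artistic representation", "Control techniques"],
}


def ancestors(cls):
    """Return all superclasses of cls, nearest first."""
    chain = []
    parent = IS_A.get(cls)
    while parent is not None:
        chain.append(parent)
        parent = IS_A.get(parent)
    return chain


def is_subclass_of(sub, sup):
    """Subsumption: reflexive by definition, transitive via the ancestor chain."""
    return sub == sup or sup in ancestors(sub)


def all_attributes(cls):
    """A subclass inherits every attribute of its superclasses and adds its own."""
    attrs = set(OWN_ATTRIBUTES.get(cls, set()))
    for sup in ancestors(cls):
        attrs |= OWN_ATTRIBUTES.get(sup, set())
    return attrs


if __name__ == "__main__":
    print(sorted(all_attributes("Insecta")))
    print(is_subclass_of("Insecta", "Animalia"))
    # Filing "Insects" under "Food" licenses no inheritance of attributes.
    print(PERSPECTIVES["Insects"])
```

The subsumption test is reflexive, anti-symmetric and transitive, so attribute inheritance is safe along it. The perspective headings are kept as a plain lookup table on purpose: treating "Food" as a superclass of "Insects" would wrongly make every insect inherit the attributes of food.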
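(More on this when I have time.)

The figure below is a typical example of how ontologies display relationships between concepts, although it does not show the various expressions of those relationships inside the file that holds the ontology.

This ontology was built with Protege and registered in BioPortal. BioPortal gathers many ontologies in biology and biomedicine, in three representation formats: Protege, OWL (Lite, DL, Full) and OBO's own format. Whatever the format used for representation and storage, the interface shown to people (rather than machines) is the same, as in the figure.

[Figure: screenshot of the Foundational Model of Anatomy ontology (FMA) as displayed in BioPortal]

Note how Cell appears in the hasSubclass hierarchy on the left. Cell inherits all attributes of its superclasses, and its subclasses inherit all of its attributes. These can be seen under "attributes" in the upper-right pane and include: has_boundary; has_inherent_3-D_shape; dimension; has_dimension; has_mass; Definition; Comment; Synonym.

Part of this class's relationships to other classes is shown in the lower-right pane. These relationships follow the relation types defined by the ontology in question; different ontologies often define their own relation types as needed. As the display shows, the relation types are made explicit, which thesauri and classification schemes do not do.

It is worth remembering why ontologies need these definitions and strict relationships: their original purpose is not indexing and retrieval but judgment and inference.

The screenshot comes from the Foundational Model of Anatomy ontology (FMA) as displayed in BioPortal.
Source: BioPortal of the Open Biomedical Ontologies (OBO) library, National Center for Biomedical Ontology (http://www.bioontology.org/tools/portal/bioportal.html)

As for the formats in which relationships are expressed, the encoding language an ontology uses determines how far it can go. The OWL Web Ontology Language, for example, builds on RDF and offers many specific constructs, such as:
- cardinality constraints on properties, e.g., a Star is memberOf exactly one Galaxy;
- constraints on the range or cardinality of a property that depend on the class of the resource, e.g., for a binarySystem the hasMember property has 2 values, while for a tripleSystem the same property should have 3 values;
- specifying that a given property is transitive, e.g., if A hasAncestor B, and B hasAncestor C, then A hasAncestor C;
- specifying that a given property is a unique identifier (or key) for instances of a particular class;
- Equivalent class: specifying that two different classes (having different URIrefs) actually represent the same class;
- Same as: specifying that two different instances (having different URIrefs) actually represent the same individual;
- the ability to describe new classes in terms of combinations (e.g., unions and intersections) of other classes; and
- the ability to describe disjoint classes (i.e., no instance belongs to both classes), e.g., benign and malignant.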
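To make three of the OWL constructs listed above more concrete (transitive properties, disjoint classes and exact cardinality), here is a minimal Python sketch. It is not an OWL reasoner and uses no OWL library. All function names and sample data are hypothetical. The checks are also closed-world simplifications: a real OWL reasoner works under the open-world assumption, so it would not treat a missing value as a cardinality violation.

```python
from itertools import product


def transitive_closure(pairs):
    """Infer all pairs implied by a transitive property such as hasAncestor."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow during the pass.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def disjointness_violations(types, disjoint_pairs):
    """Return individuals asserted to belong to two classes declared disjoint."""
    bad = []
    for individual, classes in types.items():
        if any(a in classes and b in classes for a, b in disjoint_pairs):
            bad.append(individual)
    return sorted(bad)


def cardinality_violations(triples, subjects, prop, n):
    """Return subjects whose number of distinct values for prop is not n.

    Closed-world simplification: values not listed are assumed not to exist.
    """
    values = {s: set() for s in subjects}
    for s, p, o in triples:
        if p == prop and s in values:
            values[s].add(o)
    return sorted(s for s, objs in values.items() if len(objs) != n)


if __name__ == "__main__":
    # Hypothetical sample data, named after the examples in the list above.
    print(transitive_closure({("A", "B"), ("B", "C")}))
    print(disjointness_violations(
        {"tumor1": {"Benign"}, "tumor2": {"Benign", "Malignant"}},
        [("Benign", "Malignant")],
    ))
    print(cardinality_violations(
        [("Sun", "memberOf", "MilkyWay"),
         ("StarX", "memberOf", "GalaxyA"),
         ("StarX", "memberOf", "GalaxyB")],
        ["Sun", "StarX"], "memberOf", 1,
    ))
```

In this sketch `tumor2` is flagged because it is typed with two disjoint classes, and `StarX` is flagged because it is a member of two galaxies where exactly one is required. Neither check can be expressed with plain broader/narrower links.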
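These OWL constructs are what make relational inference in an ontology reliable: for instance, the specification of the range or even the number of members, and the statement that concepts are disjoint (for example benign versus malignant tumors; an uncomfortable example, but a persuasive one, so I use it for now).

I would like to use an example to look at the relationship between ontologies and traditional KOS such as taxonomies. The SWED example is easy to follow, so I present it first; other examples will follow next time.

SWED (Semantic Web Environmental Directory) is a directory of organizations and projects in environmental science and environmental protection. It is a portal whose content is harvested using Semantic Web tools. The SWED ontology includes several taxonomies that provide hierarchical structure, as shown in the figure.

[Figure: taxonomies in the SWED ontology]
Source: Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/05

In addition, Alistair Miles used the SWED example to compare the cost-effectiveness of different encodings and found that the most suitable approach is a combination of SKOS and OWL.

Source: Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/06
http://isegserv.itd.rl.ac.uk/public/skos/press/cistrana200602/taxonomies-semanticweb.ppt

In other words, SKOS is sufficient for encoding the hierarchically structured components of a KOS (for example taxonomies and thesauri), because the relationships they need to express are fairly simple: mainly hierarchical, plus some undifferentiated associative relationships. Using OWL's complex expressions for them would be like using an ox cleaver to kill a chicken, and is unnecessary. But to express the complex relationships that ontologies must represent (see the last section of the previous post), OWL is required. Several large European NKOS projects are currently trying to demonstrate the importance and cost-effectiveness of SKOS.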
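To show how little is needed to encode a simple taxonomy in SKOS, the following Python sketch writes a few concepts as Turtle text using only `skos:Concept`, `skos:prefLabel`, `skos:broader` and `skos:related`. The topic names, the `ex:` namespace and the function names are hypothetical placeholders and are not taken from the SWED data. The output is built by plain string formatting, so it is a sketch, not a validated serializer.

```python
# Namespace prefixes; the ex: namespace is a placeholder, not a real vocabulary.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/topics#> .\n"
)


def skos_concept(local_name, label, broader=None, related=()):
    """Return one SKOS concept as a Turtle statement.

    Only hierarchical (skos:broader) and undifferentiated associative
    (skos:related) links are emitted, which is all a simple taxonomy needs.
    """
    parts = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
    if broader is not None:
        parts.append(f"skos:broader ex:{broader}")
    for other in related:
        parts.append(f"skos:related ex:{other}")
    return f"ex:{local_name} " + " ;\n    ".join(parts) + " ."


def to_turtle(concepts):
    """Serialize a list of concept dictionaries as one Turtle document."""
    body = "\n\n".join(skos_concept(**c) for c in concepts)
    return PREFIXES + "\n" + body + "\n"


# Hypothetical environmental topics, for illustration only.
TOPICS = [
    {"local_name": "Environment", "label": "Environment"},
    {"local_name": "WaterQuality", "label": "Water quality",
     "broader": "Environment"},
    {"local_name": "AirQuality", "label": "Air quality",
     "broader": "Environment", "related": ("WaterQuality",)},
]

if __name__ == "__main__":
    print(to_turtle(TOPICS))
```

Each concept needs at most a label, one broader link and a few related links. Constraints such as cardinality or disjointness have no place in this encoding, which is why they are left to OWL.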
From: http://www.langware.com/index.php?/content/view/30/45/

Ontology

http://en.wikipedia.org/wiki/Ontology
From Wikipedia, the free encyclopedia. In philosophy, ontology is the study of being or existence.

http://en.wikipedia.org/wiki/Ontology_(computer_science)
Ontology (computer science). From Wikipedia, the free encyclopedia. In both computer science and information science, an ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. It is used to reason about the objects within that domain.

http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
What is an Ontology? Short answer: An ontology is a specification of a conceptualization.

http://www.jfsowa.com/ontology/
The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D.

http://www.formalontology.it/
Ontology is the theory of objects and their ties. Ontology provides criteria for distinguishing various types of objects (concrete and abstract, existent and non-existent, real and ideal, independent and dependent) and their ties (relations, dependences and predication).

http://ontology.buffalo.edu/
State University of New York at Buffalo, Department of Philosophy; Ontology

http://www.newadvent.org/cathen/11258a.htm
Ontology is not a subjective science as Kant describes it (Ub. d. Fortschr. d. Met., 98) nor an inferential Psychology, as Hamilton regards it (Metaphysics, Lect. VII); nor yet a knowledge of the absolute (theology); nor of some ultimate reality whether conceived as matter or as spirit, which Monists suppose to underlie and produce individual real beings and their manifestations.

http://pespmc1.vub.ac.be/ONTOLI.html
Ontology (the science of being) is a word, like metaphysics, that is used in many different senses. It is sometimes considered to be identical to metaphysics, but we prefer to use it in a more specific sense, as that part of metaphysics that specifies the most fundamental categories of existence, the elementary substances or structures out of which the world is made.

http://www.aaai.org/AITopics/html/ontol.html
Ontological analysis clarifies the structure of knowledge. Given a domain, its ontology forms the heart of any system of knowledge representation for that domain. Without ontologies, or the conceptualizations that underlie knowledge, there cannot be a vocabulary for representing knowledge.... Second, ontologies enable knowledge sharing. (From "What Are Ontologies, and Why Do We Need Them?", B. Chandrasekaran, John R. Josephson, and V. Richard Benjamins)

http://www.daml.org/ontologies/
DAML Ontology Library

http://ontology.buffalo.edu/smith/articles/ontologies.htm
Ontology as a branch of philosophy is the science of what is, of the kinds and structures of the objects, properties and relations in every area of reality. Ontology in this sense is often used in such a way as to be synonymous with metaphysics. In simple terms it seeks the classification of entities. In the field of information processing there arises what we might call the Tower of Babel problem.

http://www.linguistics-ontology.org/
The GOLD Community is a vision to bring together those interested in the best-practice encoding of linguistic data.

http://emeld.org/documents/GLOT-LinguisticOntology.pdf
A linguistic ontology for the semantic web

http://www.formalontology.it/linguistic-relativity.htm
Language and Thought: Ontological Problems. Ontology and the Linguistic Relativity (Sapir-Whorf) Hypothesis

http://ontology.teknowledge.com/
This site contains information about the SUMO (Suggested Upper Merged Ontology). This ontology is being created as part of the IEEE Standard Upper Ontology Working Group. The goal of this Working Group is to develop a standard upper ontology that will promote data interoperability, information search and retrieval, automated inferencing, and natural language processing. The SUMO has been translated into various representation formats, but the language of development is a variant of KIF (a version of the first-order predicate calculus).

http://www.fb10.uni-bremen.de/anglistik/langpro/webspace/jb/info-pages/ontology/ontology-root.htm
This page is a collection of starting points for information on ontologies gathered together for ease of reference for our own ontology-related projects. It is made available as is in case it is of use to anyone else.

http://www.cs.vu.nl/~guus/papers/Hage05a.pdf
A Method to Combine Linguistic Ontology-Mapping Techniques. We discuss four linguistic ontology-mapping techniques and evaluate them on real-life ontologies in the domain of food. Furthermore we propose a method to combine ontology-mapping techniques with high Precision and Recall to reduce the necessary amount of manual labor and computation.

http://zimmer.csufresno.edu/~wlewis/projects/DDLOD.html
Data-Driven Linguistic Ontology Development. Universität Bremen. The intent of the DDLOD project is to semi-automatically capture a picture of the semantic space of the field of linguistics, and use this snapshot to make the Generalized Ontology for Linguistic Description (GOLD) as complete and comprehensive as possible.

http://linguistlist.org/emeld/tools/ontology.cfm
Markup: Linguistic Ontology. Traditionally markup has been defined as systematic annotation designed to reveal a text's typographical and informational structure. Linguistic markup might be broadly described as annotation representing: (a) the grammatical structure of text couched in the focus language and (b) the structure of documents presenting a linguistic description or analysis of such text.

http://www.aifb.uni-karlsruhe.de/WBS/pci/annotation.pdf
Ontology-based linguistic annotation. Institute AIFB; University of Karlsruhe

http://zimmer.csufresno.edu/~wlewis/projects/DDLOD-overview.html
The World Wide Web has become a primary source for disseminating data on the world's languages, with a variety of language data regularly posted to the Web, including large numbers of scholarly papers on language. Often embedded in these documents are enriched language data encoded in the form of Interlinear Glossed Text (IGT). IGT is a standard method for presenting linguistic data, and consists of a line of language data, usually broken down by morpheme, a line of grammatical and gloss information aligned with the text in the first line, and a line representing the translation.

http://cogprints.org/4009/
The ontology of signs as linguistic and non-linguistic entities: a cognitive perspective

http://www.phil.uni-passau.de/linguistik/linguistik_urls/urls.php?CAT=computing:Software:Ontology+Engineering
Linguistics Links Database: Computing, Software, Ontology Engineering. JATKE (unified platform for ontology learning); OntoLT (middleware for ontology extraction from text); Protégé (ontology editor and knowledge-base editor); Text2Onto (framework for ontology learning from text); TextToOnto (ontology construction using text mining techniques)

http://www.phil.uni-passau.de/linguistik/linguistik_urls/urls.php?CAT=computing:Software
Linguistics Links Database. Department of General Linguistics at the University of Passau.

http://www.cs.utexas.edu/users/mfkb/related.html
Some Ongoing KBS/Ontology Projects and Groups. Knowledge-Base Projects, Groups, and Related Material

http://sigart.acm.org/ai/ontology.html
A lot of stuff for linguistics, networks and computers.

http://www.essex.ac.uk/linguistics/clmt/other_sites/index_1.html
A lot of links for linguistics, networks and computers. No longer maintained.

http://www.sim.hcuge.ch/ontology/03_MedicalLinguistics.htm
The Service d'Informatique Médicale (SIM) is part of the Radiology and Medical Informatics Department of the University Hospitals of Geneva. This entity is in charge of development of medical applications like patient record, medical orders and other knowledge based applications. A group of SIM has been long specialized for Natural Language Processing.

http://linguistlist.org/emeld/school/classroom/ontology/index.html
E-MELD school of best practices in digital language documentation

http://linguistlist.org/emeld/workshop/2005/papers/saulwick-paper.doc
Semantic relations in ontology mediated linguistic data integration

http://llc.oxfordjournals.org/cgi/content/abstract/21/suppl_1/29
Oxford Journals, Literary and Linguistic Computing: Designing and Implementing an Ontology for Logic and Linguistics

http://www.legenden.dk/blog/2003/12/links.html
Online Philosophy. List of philosophers with online papers about: Language, Linguistics, Metaphysics, Epistemology, Logic and Mathematics

http://www.let.uu.nl/linguistics/log/
EBoLi - an E-Book for Linguistics

http://suo.ieee.org/email/msg12240.html
Multi-Source Ontology (MSO) Draft Ballot Question

http://xml.coverpages.org/xml.html
Extensible Markup Language (XML) and links for ontology.

http://www.onlineoriginals.com/showitem.asp?itemID=287articleID=10
A GENETIC INTERPRETATION OF RICOEUR'S PHILOSOPHY OF LANGUAGE. Furnishing Ricoeur's theory of language with an ontology that is consistent with his own assumptions

http://www.clres.com/dict.html
ACL SIGLEX Resource Links

http://swik.net/ontology?index
ontology Pages. Filter by Tag related to ontology

http://www.cs.brandeis.edu/~jamesp/arda/time/readings.html
The site contains References and Links; General References; Ontology WG; Corpus WG; TimeML WG

http://nlp.shef.ac.uk/links.html
Natural language processing group

http://www.imi.uni-luebeck.de/~ingenerf/terminology/Term-oth.html
Materials about Basic Sciences and; Terminology; Ontology; Artificial Intelligence; Knowledge Representation; Computational Linguistics; Information Retrieval

http://citeseer.ist.psu.edu/704251.html
Introduction. The World Wide Web has the potential to become a primary source for storing and accessing linguistic data, including data of the sort that are routinely collected by field linguists. Having large amounts of linguistic data on the Web will give linguists, indigenous communities, and language learners access to resources that have hitherto been difficult to obtain. For linguists, scientific data from the world's languages will be just as accessible as information in on-line

http://citeseer.ist.psu.edu/760180.html
Class Relation Predicate GrammaticalRelation Aspect Tense Case Agreement Attribute GrammaticalAttribute Gender Person Number. 4.2 Details of the Ontology. As much as possible we tried to use existing elements of the SUMO. First of all SUMO already includes a good semiotics architecture for the representation and the communication of information in general. Expanded from the original SUMO somewhat are the basic segments of language, which are classified as LinguisticExpressions

http://www.loa-cnr.it/Files/SOIA.pdf
SOIA: Semantics and Ontology of InterAction. Joint project ISTC - IRIT (CNRS-UPS, Toulouse, France)

http://opim-sun.wharton.upenn.edu/~asa28//useful_semiotics_research_links.htm
Useful Semiotics, linguistics, semantics, syntactics, controlled language, domain-specific language, etc. Research Links

http://links.jstor.org/sici?sici=0097-8507(198309)59%3A3%3C708%3AEILO%3E2.0.CO%3B2-L
Essays in Linguistic Ontology

http://www.jfsowa.com/ontology/lexicon.htm
The lexicon is the bridge between a language and the knowledge expressed in that language. Every language has a different vocabulary, but every language provides the grammatical mechanisms for combining its stock of words to express an open-ended range of concepts. Different languages, however, differ in the grammar, the words, and the concepts they express.

http://www.cs.bilkent.edu.tr/~erayo/ontology/html/bookmarks/Ontologies/Linguistics_Oriented/index.html
Annotated Ontology Resources: Linguistics Oriented

http://www.sciencedirect.com/science?_ob=ArticleURL_udi=B6V0N-47TFMYT-5 _user=10_coverDate=11%2F15%2F2002_rdoc=1_fmt=_orig=search_ sort=dview=c_acct=C000050221_version=1_urlVersion=0_userid=10md5 =85bb0f32be97f1d75abcbd7652951834
Linguistic kleptomania in computer science. Department of Informatics, Aristotle University, Thessaloniki, Greece

http://www.fi.muni.cz/gwc2004/proc/118.pdf
One Dead Armadillo on WordNet's Speedway to Ontology. Institute for Formal Ontology and Medical Information Science, University of Leipzig

http://www.ling.su.se/DaLi/research/index.htm
Research in Computational Linguistics at SU

http://xml.coverpages.org/muleco.html
Multilingual Upper-Level Electronic Commerce Ontology (MULECO)

http://xml.coverpages.org/oil.html
Ontology Interchange Language (OIL)

http://xml.coverpages.org/owl.html
OWL Web Ontology Language

http://xml.coverpages.org/oml.html
Ontology and Conceptual Knowledge Markup Languages

http://xml.coverpages.org/shoe.html
Simple HTML Ontology Extensions (SHOE)

http://xml.coverpages.org/xol.html
XOL - XML-Based Ontology Exchange Language

http://www.cstr.ed.ac.uk/
University of Edinburgh, The Centre for Speech Technology Research

http://www.cl.cam.ac.uk/research/nl/index.html
Natural Language and Information Processing (NLIP) Group, Computer Laboratory, University of Cambridge
Note by Tumou: On 14 August 2009, Science published a commentary titled "Strategic Reading, Ontologies, and the Future of Scientific Publishing". The authors are Allen H. Renear and Carole L. Palmer of the Graduate School of Library and Information Science (LIS) at the University of Illinois at Urbana-Champaign. The abstract and the section "How Will Scientists Work with the Literature in 2019?" are excerpted here for reference. Thanks to caveman (Jason Zou) for providing the original text!

Source: Allen H. Renear, et al. Strategic Reading, Ontologies, and the Future of Scientific Publishing. Science 325, 828 (2009). DOI: 10.1126/science.1157784 (Author information: Allen H. Renear and Carole L. Palmer, Center for Informatics Research in Science and Scholarship, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA)

Title: Strategic Reading, Ontologies, and the Future of Scientific Publishing

Abstract: The revolution in scientific publishing that has been promised since the 1980s is about to take place. Scientists have always read strategically, working with many articles simultaneously to search, filter, scan, link, annotate, and analyze fragments of content. An observed recent increase in strategic reading in the online environment will soon be further intensified by two current trends: (i) the widespread use of digital indexing, retrieval, and navigation resources and (ii) the emergence within many scientific disciplines of interoperable ontologies. Accelerated and enhanced by reading tools that take advantage of ontologies, reading practices will become even more rapid and indirect, transforming the ways in which scientists engage the literature and shaping the evolution of scientific publishing.

How Will Scientists Work with the Literature in 2019?
Scientists will still read narrative prose, even as text mining and automated processing become common; however, these reading practices will become increasingly strategic, supported by enhanced literature and ontology-aware tools. As part of the publishing workflow, scientific terminology will be indexed routinely against rich ontologies. More importantly, formalized assertions, perhaps maintained in specialized structured abstracts, will provide indexing and browsing tools with computational access to causal and ontological relationships. Hypertext linking will be extensive, generated both automatically and by readers providing commentary on blogs and through shared annotation databases. At the same time, more tools for enhanced searching, scanning, and analyzing will appear and exploit the increasingly rich layer of indexing, linking, and annotation information.

There are no technical obstacles to this trajectory, and it is already under way. The changes, as always, will be incremental: Scientists, who today already make extensive use of existing indexing and retrieval services, will encounter a steady stream of new enhancements and adopt those that allow rapid and productive engagement with the literature. The new functionality will sometimes be provided as part of the application interface (new features in PubMed, for instance) or as shared external tools that users can add to their Web browsers.
These developments chart a middle course between the already obsolete activity of finding an article to read on the one hand, and the narrower objectives of text mining on the other, responding directly to the entrenched necessity and value of strategic reading in the daily work of today's scientists.
Data Mining with Ontologies: Implementations, Findings, and Frameworks
Source: https://igi-pub.com/reference/details.asp?ID=6844v=preface
Edited by: Hector Oscar Nigro, Sandra Elizabeth Gonzalez Cisaro, and Daniel Hugo Xodo (all Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina)

Preface

Data mining, also referred to as knowledge discovery in databases (KDD), is the process of finding new, interesting, previously unknown, potentially useful, and ultimately understandable patterns in very large volumes of data. It is a discipline that brings together database systems, statistics, artificial intelligence, machine learning, parallel and distributed processing, and visualization, among other disciplines (Fayyad et al., 1996; Han & Kamber, 2001; Hernández Orallo et al., 2004).

Nowadays, one of the most important and challenging problems in data mining is the definition of prior knowledge, which can originate from the process or from the domain. This contextual information may help select the appropriate information, features, or techniques, reduce the hypothesis space, represent the output in a more comprehensible way, and improve the whole process. Therefore we need a conceptual model to help represent this knowledge. Following Gruber's definition of an ontology as an explicit formal specification of the terms in a domain and the relations among them (Gruber, 1993, 2002), we can represent both knowledge about the knowledge discovery process and knowledge about the domain. Principally, ontologies are used for communication (between machines and/or humans), automated reasoning, and the representation and reuse of knowledge (Cimiano et al., 2004). As a result, an ontological foundation is a precondition for efficient automated use of knowledge discovery information.
Thus, we can perceive the relation between ontologies and data mining in two ways:

From ontologies to data mining: we incorporate knowledge into the process through the use of ontologies, i.e., how the experts understand and carry out the analysis tasks. Representative applications are intelligent assistants for the discovery process (Bernstein et al., 2001, 2005), interpretation and validation of mined knowledge, and ontologies for resource and service description and knowledge Grids (Cannataro et al., 2003; Brezany et al., 2004).

From data mining to ontologies: we include domain knowledge in the input information or use ontologies to represent the results, so the analysis is done over these ontologies. The most characteristic applications are in medicine, biology, and spatial data, such as gene representation, taxonomies, applications in geosciences, medical applications, and especially evolving domains (Langley, 2006; Gottgtroy et al., 2003, 2005; Bogorny et al., 2005).

When we can represent and include knowledge in the process through ontologies, we can transform data mining into knowledge mining.

Data Mining with Ontologies Cycle

Figure 1 shows our vision of the data mining with ontologies cycle.

Metadata ontologies: These ontologies establish how a variable is constructed, i.e., which process allowed us to obtain its value, and how that value may vary if another method is used. Of course, this ontology must also express general information about how the variable is treated.

Domain ontologies: These ontologies express the knowledge about the application domain.

Ontologies for the data mining process: These ontologies codify all knowledge about the process, i.e., how to select features, how to select the best algorithms according to the variables and the problem, and how to establish valid process sequences (Bernstein, 2001, 2005; Cannataro, 2003, 2004).
According to Gómez-Pérez and Manzano-Macho (2003), the different methods and approaches that allow the extraction of ontologies or semantics from database schemas can be classified along three dimensions: main goal, techniques used, and sources used for learning. The attributes of each dimension, summarizing ontology learning methods from relational schemas, are:

Main goal:
To map a relational schema to a conceptual schema
To create (and refine) an ontology
To create ontological instances (from a database)
To enhance ad hoc queries

Techniques used:
Mappings
Reverse engineering
Inductive inference
Rule generation
Graphic modeling

Sources used for learning:
Relational schemas (of a database)
Schemas of domain-specific databases
Flat files
Relational databases

In the next paragraphs we explain in more detail these three classes of ontologies, based on earlier works from different knowledge fields.

Domain Ontology

The models that many scientists use to represent their working hypotheses are generally cause-effect diagrams. Models make use of general laws or theories to predict or explain behavior in specific situations. Currently these cause-effect diagrams can be translated without difficulty into ontologies by means of conceptual maps, which organize a taxonomy into central concepts, main concepts, secondary concepts, and specific concepts. Discovery systems produce models that are valuable for prediction, but they should also produce models stated in some declarative format that can be communicated clearly and precisely and that helps people understand observations in terms familiar to them (Bridewell, 2006; Langley, 2002, 2006). Models can take different forms and levels of abstraction, but the more complex the facts for which they account, the more important it is that they be cast in some formal notation with an unambiguous interpretation.
Of course, this new knowledge can then be easily communicated and updated between systems and knowledge databases. In the data mining field in particular, knowledge can be represented in different formalisms, known as models, e.g., rules, decision trees, and clusters. Discovery systems should generate knowledge in a format that is familiar to domain users. There is an important relation between knowledge structures and the discovery process with machine learning: the former are important outputs of the discovery process and are also important inputs to discovery (Langley, 2000). Thus knowledge plays as crucial a role as data in the automation of discovery. Therefore, ontologies provide a structure capable of supporting the representation of domain knowledge.

Metadata Ontologies

As Spyns et al. (2002) affirm, ontologies in current computer science language are computer-based resources that represent agreed domain semantics. Unlike data models, the fundamental asset of ontologies is their relative independence of particular applications, i.e., an ontology consists of relatively generic knowledge that can be reused by different kinds of applications and tasks. In contrast, a data model represents the structure and integrity of the data elements of the, in principle "single", specific enterprise application(s) by which it will be used. Consequently, the conceptualization and the vocabulary of a data model are not intended a priori to be shared by other applications (Gottgtroy et al., 2005). Similarly, in data modeling practice, the semantics of data models often constitute an informal accord between the developers and the users of the data model, including when a data warehouse is designed, and in many cases the data model is updated as it evolves, when particular new functional requirements pop up, without any significant update in the metadata repository. Both the ontology model and the data model have similarities in terms of scope and task.
They are context-dependent knowledge representations; that is, there is no strict line between generic and specific knowledge when you are building an ontology. Moreover, both modeling techniques are knowledge-acquisition-intensive tasks, and the resulting models represent a partial account of conceptualizations (Gottgtroy et al., 2003). In spite of the differences, we should consider the similarities, and the fact that data models carry a lot of useful hidden knowledge about the domain in their data schemas, in order to build ontologies from data and improve the process of knowledge discovery in databases. The fact that data schemas do not have the semantic knowledge required to intelligently guide ontology construction has been presented as a challenge for database and ontology engineers (Gottgtroy et al., 2003).

Ontologies for Data Mining Process

The vision of the KDD process is changing over time. In its beginnings, the main objective was to extract a valuable pattern from a flat file as a game of trial and error. As time goes by, researchers and, fundamentally, practitioners discuss the importance of a priori knowledge, the knowledge and understanding of the problem, the choice of methodology for the discovery, and expertise in similar situations, and an important question arises: to what extent is such investment in data mining projects worthwhile? As practitioners and researchers in this field, we can perceive that expertise is very important and that knowledge about the domain is helpful and simplifies the process. To make the process more attractive to managers, practitioners must carry it out more efficiently and reuse experience. So we can codify all statistical and machine learning knowledge with ontologies and use it. Bernstein et al. (2001) have developed the concept of the intelligent discovery assistant (IDA), which helps data miners explore the space of valid data mining processes.
It takes advantage of an explicit ontology of data mining techniques, which defines the various techniques and their properties. Its main characteristics are (Bernstein et al., 2005):

A systematic enumeration of valid DM processes, so that data miners do not miss important, potentially fruitful options.
Effective rankings of these valid processes by different criteria, to help them choose between the options.
An infrastructure for sharing data mining knowledge, which leads to what economists call network externalities.

Cannataro and colleagues have made another interesting contribution to this kind of ontology. They developed an ontology that can be used to simplify the development of distributed knowledge discovery applications on the Grid, offering a domain expert a reference model for the different kinds of data mining tasks, methodologies, and software available to solve a given problem, and helping a user find the most appropriate solution (Cannataro et al., 2003, 2004). The authors have adopted the Enterprise Methodology (Corcho et al., 2003).

Research Works in the Topic

The next paragraphs describe the most recent research works in the field of data mining with ontologies.

Singh, Vajirkar, and Lee (2003) have developed a context-aware data mining framework that provides accuracy and efficacy to data mining outcomes. Context factors were modeled using an ontological representation. Although the proposed context-aware framework is generic in nature and can be applied to most fields, the medical scenario provided served as a proof of concept for the proposed model.

Hotho, Staab, and Stumme (2003) have shown that using ontologies as filters in term selection prior to the application of a K-means clustering algorithm increases the tightness and relative isolation of document clusters as a measure of improvement.

Pan and Shen (2005) have proposed an architecture for knowledge discovery in evolving environments.
The architecture creates a communication mechanism to incorporate known knowledge into the discovery process through an ontology service facility. The continuous mining is transparent to the end user; moreover, the architecture supports logical and physical data independence.

Rennolls (2005, p. 719) has developed an intelligent framework for data mining, knowledge discovery, and business intelligence. The ontological framework guides the user in the choice of models from an expanded data mining toolkit, and the epistemological framework assists the user in interpreting and appraising the discovered relationships and patterns.

On domain ontologies, Pan and Pan (2006) have proposed the ontobase ontology repository. It is an implementation that allows users and agents to retrieve ontologies and metadata through open Web standards and an ontology service. Key features of the system include the use of XML Metadata Interchange to represent and import ontologies and metadata, support for smooth transformation and transparent integration using ontology mapping, and the use of ontology services to share and reuse domain knowledge in a generic way.

Recently, Bounif et al. (2006) have explained the articulation of a new approach for database schema evolution and outlined the use of a domain ontology. The approach they propose belongs to a new trend of a priori approaches. It implies investigating potential future requirements, in addition to the current requirements, during the standard requirements analysis phase of schema design or redesign, and including them in the conceptual schema. Those requirements are determined with the help of a domain ontology called "a requirements ontology", using data mining techniques and a schema repository.

Book Organization

This book is organized into three major sections dealing respectively with implementations, findings, and frameworks.
Section I: Implementations includes applications or case studies on data mining with ontologies.

Chapter I, TODE: An Ontology-Based Model for the Dynamic Population of Web Directories, by Sofia Stamou, Alexandros Ntoulas, and Dimitris Christodoulakis, studies how we can organize the continuously proliferating Web content into topical categories, also known as Web directories. The authors have implemented a system, named TODE, that uses a Topical Ontology for Directories' Editing. TODE's performance is also evaluated; experimental results imply that the use of a rich topical ontology significantly increases classification accuracy for dynamic content.

Chapter II, Raising, to Enhance Rule Mining in Web Marketing with the Use of an Ontology, by Xuan Zhou and James Geller, introduces Raising as an operation used as a preprocessing step for data mining. Rules have been derived using demographic and interest information as input for data mining. The Raising step takes advantage of an interest ontology to advance data mining and to improve rule quality. Furthermore, the effects caused by Raising are analyzed in detail, showing an improvement in the support and confidence values of useful association rules for marketing purposes.

Chapter III, Web Usage Mining for Ontology Management, by Brigitte Trousse, Marie-Aude Aufaure, Bénédicte Le Grand, Yves Lechevallier, and Florent Masseglia, proposes an original approach for ontology management in the context of Web-based information systems. Their approach relies on usage analysis of the chosen Web site, complementing the existing approaches based on content analysis of Web pages. One major contribution of this chapter is thus the application of usage analysis to support ontology evolution and/or Web site reorganization.
Chapter IV, SOM-Based Clustering of Multilingual Documents Using an Ontology, by Minh Hai Pham, Delphine Bernhard, Gayo Diallo, Radja Messai, and Michel Simonet, presents a method that makes use of a Self-Organizing Map (SOM) to cluster medical documents. The originality of the method is that it does not rely on the words shared by documents but rather on concepts taken from an ontology. The goal is to cluster various medical documents into thematically consistent groups. The authors have compared the results for two indexing schemes: stem-based indexing and conceptual indexing.

Section II: Findings comprises more theoretical aspects of data mining with ontologies, such as ontologies for interpretation and validation and domain ontologies.

Chapter V, Ontology-Based Interpretation and Validation of Mined Knowledge: Normative and Cognitive Factors in Data Mining, by Ana Isabel Canhoto, addresses the role of cognition and context in the interpretation and validation of mined knowledge. She proposes the use of ontology charts and norm specifications to map how varying levels of access to information and exposure to specific social norms lead to divergent views of mined knowledge. Domain knowledge and bias information influence which patterns in the data are deemed useful and, ultimately, valid.

Chapter VI, Data Integration Through Protein Ontology, by Amandeep S. Sidhu, Tharam S. Dillon, and Elizabeth Chang, discusses the conceptual framework of the Protein Ontology, which has a hierarchical classification of concepts represented as classes, from general to specific; a list of attributes related to each concept, for each class; a set of relations between classes to link concepts in the ontology in more complicated ways than implied by the hierarchy, to promote reuse of concepts in the ontology; and a set of algebraic operators to query protein ontology instances.
Chapter VII, TtoO: Mining a Thesaurus and Texts to Build and Update a Domain Ontology, by Josiane Mothe and Nathalie Hernandez, introduces a method that reuses a thesaurus built for a given domain in order to create new resources of a higher semantic level in the form of an ontology. The originality of the method is that it is based on both the knowledge extracted from a thesaurus and the knowledge semiautomatically extracted from a textual corpus. In parallel, the authors have developed mechanisms based on the resulting ontology to accomplish a science-monitoring task. An example is provided in this chapter.

Chapter VIII, Evaluating the Construction of Domain Ontologies for Recommender Systems Based on Texts, by Stanley Loh, Daniel Lichtnow, Thyago Borges, and Gustavo Piltcher, investigates different aspects of the construction of a domain ontology for a content-based recommender system. The chapter discusses different approaches to constructing the domain ontology, including the use of text mining software tools for supervised learning, the intervention of domain experts in the engineering process, and the use of a normalization step.

Section III: Frameworks includes different architectures for different domains in the context of data warehousing or mining with ontologies.

Chapter IX, by Vania Bogorny, Paulo Martins Engel, and Luis Otavio Alvares, introduces the problem of mining frequent geographic patterns and spatial association rules from geographic databases. A large number of natural geographic associations are explicitly represented in geographic database schemas and geo-ontologies, which have not been used so far in frequent geographic pattern mining. The main goal of this chapter is to show how the large amount of knowledge represented in geo-ontologies can be used as prior knowledge to avoid the extraction of patterns already known to be noninteresting.
Chapter X, Ontology-Based Construction of Grid Data Mining Workflows, by Peter Brezany, Ivan Janciak, and A Min Tjoa, introduces an ontology-based framework for automated construction of complex interactive data mining workflows. The authors present their solution, called GridMiner Assistant (GMA), which addresses the whole life cycle of the knowledge discovery process. In addition, the conceptual and implementation architectures of the framework are presented, and its application to an example taken from the medical domain is illustrated.

Chapter XI, Ontology-Based Data Warehousing and Mining Approaches in Petroleum Industries, by Shastri L. Nimmagadda and Heinz Dreher, observes that complex, heterogeneous geo-spatial data structures complicate the accessibility and presentation of data in petroleum industries. A data warehousing approach supported by ontology is described for effective data mining. An ontology-based data warehousing framework with fine-grained multidimensional data structures facilitates the mining and visualization of data patterns, trends, and correlations hidden in massive volumes of data.

Chapter XII, A Framework for Integrating Ontologies and Pattern-Bases, by Evangelos Kotsifakos, Gerasimos Marketos, and Yannis Theodoridis, proposes the integration of pattern base management systems (PBMS) and ontologies as a solution to the need of many scientific fields for efficient extraction of useful information from large databases and the exploitation of knowledge. The authors use a case study of data mining over scientific (seismological) data to illustrate their proposal.

Book Objective

This book aims to publish original academic work with high-quality scientific papers. The key objective is to provide data mining students, practitioners, professionals, professors, and researchers with an integrated vision of the topic.
This book specifically focuses on those areas that explore new methodologies or examine real case studies that are ontology-based. The book describes the state of the art, innovative theoretical frameworks, advanced and successful implementations, as well as the latest empirical research findings in the area of data mining with ontologies.

Audience

The target audience of this book is readers who want to learn how to apply data mining based on ontologies to real-world problems. The purpose is to show users how to go from theory and algorithms to real applications. The book is also geared toward students, practitioners, professionals, professors, and researchers with a basic understanding of data mining. The information technology community can increase its knowledge and skills with these new techniques. People working in the knowledge management area, such as engineers, managers, and analysts, can also read it, because the areas of data mining, ontologies, and knowledge management are directly linked.

References

Bernstein, A., Hill, S., & Provost, F. (2001). Towards intelligent assistance for the data mining process: An ontology-based approach. CeDER Working Paper IS-02-02, New York University.
Bernstein, A., Provost, F., & Hill, S. (2005). Towards intelligent assistance for the data mining process: An ontology-based approach for cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 17(4), 503-518.
Bogorny, V., Engel, P. M., & Alvares, L. O. (2005). Towards the reduction of spatial join for knowledge discovery in geographic databases using geo-ontologies and spatial integrity constraints. In M. Ackermann, B. Berendt, M. Grobelnik, & V. Svátek (Eds.), Proceedings ECML/PKDD Second Workshop on Knowledge Discovery and Ontologies (pp. 51-58).
Bounif, H., Spaccapietra, S., & Pottinger, R. (2006, September 12-15). Requirements ontology and multirepresentation strategy for database schema evolution. Paper presented at the 2nd VLDB Workshop on Ontologies-based Techniques for Databases and Information Systems, Seoul, Korea.
Brezany, P., Janciak, I., Woehrer, A., & Tjoa, A. M. (2004). GridMiner: A framework for knowledge discovery on the Grid from a vision to design and implementation. Cracow Grid Workshop. Cracow, Poland: Springer.
Bridewell, W., Sánchez, J. N., Langley, P., & Billman, D. (2006). An interactive environment for the modeling and discovery of scientific knowledge. International Journal of Human-Computer Studies, 64, 1009-1014.
Cannataro, M., & Comito, C. (2003, May 20-24). A data mining ontology for Grid programming. Paper presented at the I Workshop on Semantics Peer to Peer and Grid Computing, Budapest. Retrieved March, 2006, from http://www.isi.edu/~stefan/SemPGRID
Cannataro, M., Congiusta, A., Pugliese, A., Talia, D., & Trunfio, P. (2004). Distributed data mining on Grids: Services, tools, and applications. IEEE Transactions on Systems, Man and Cybernetics, Part B, 34(6), 2451-2465.
Cimiano, P., Stumme, G., Hotho, A., & Tane, J. (2004). Conceptual knowledge processing with formal concept analysis and ontologies. In Proceedings of the Second International Conference on Formal Concept Analysis (ICFCA 04).
Corcho, O., Fernández-López, M., & Gómez-Pérez, A. (2003). Methodologies, tools and languages for building ontologies: Where is their meeting point? Data & Knowledge Engineering, 46(1), 41-64. Amsterdam: Elsevier Science Publishers B. V.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in knowledge discovery and data mining. Menlo Park, California: AAAI Press.
Gómez-Pérez, A., & Manzano-Macho, D. (Eds.) (2003). Survey of ontology learning methods and techniques. Deliverable 1.5, OntoWeb Project Documentation. Universidad Politécnica de Madrid. Retrieved November, 2006, from http://www.deri.at/fileadmin/documents/deliverables/Ontoweb/D1.5.pdf
Gottgtroy, P., Kasabov, N., & MacDonell, S. (2003, December). An ontology engineering approach for knowledge discovery from data in evolving domains. In Proceedings of Data Mining 2003 Data Mining IV. Boston: WIT.
Gottgtroy, P., MacDonell, S., Kasabov, N., & Jain, V. (2005). Enhancing data analysis with Ontologies and OLAP. Paper presented at Data Mining 2005, Sixth International Conference on Data Mining, Text Mining and their Business Applications, Skiathos, Greece.
Gruber, T. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199-220.
Gruber, T. (2002). What is an ontology? Retrieved November, 2006, from http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. Morgan Kaufmann.
Hernández Orallo, J., Ramírez Quintana, M., & Ferri Ramirez, C. (2004). Introducción a la Minería de Datos. Madrid: Editorial Pearson Educación SA.
Hotho, A., Staab, S., & Stumme, G. (2003). Ontologies improve text document clustering. In Proceedings of the 3rd IEEE Conference on Data Mining, Melbourne, FL (pp. 541-544).
Langley, P. (2000). The computational support of scientific discovery. International Journal of Human-Computer Studies, 53, 393-410.
Langley, P. (2006). Knowledge, data, and search in computational discovery. Invited talk at the International Workshop on Feature Selection for Data Mining: Interfacing Machine Learning and Statistics (FSDM), April 22, 2006, Bethesda, Maryland, in conjunction with the 2006 SIAM Conference on Data Mining (SDM).
Pan, D., & Shen, J. Y. (2005). Ontology service-based architecture for continuous knowledge discovery. In Proceedings of International Conference on Machine Learning and Cybernetics, 4, 2155-2160. IEEE Press.
Pan, D., & Pan, Y. (2006, June 21-23). Using ontology repository to support data mining. In Proceedings of the Sixth World Congress on Intelligent Control and Automation, Dalian, China (pp. 5947-5951).
Rennolls, K. (2005). An intelligent framework (O-SS-E) for data mining, knowledge discovery and business intelligence. Keynote paper. In Proceedings of the 2nd International Workshop on Philosophies and Methodologies for Knowledge Discovery, PMKD'05, in the DEXA'05 Workshops (pp. 715-719). IEEE Computer Society Press. ISBN 0-7695-2424-9.
Singh, S., Vajirkar, P., & Lee, Y. (2003). Context-based data mining using ontologies. In I. Song, S. W. Liddle, T. W. Ling, & P. Scheuermann (Eds.), Proceedings 22nd International Conference on Conceptual Modeling. Lecture Notes in Computer Science (vol. 2813, pp. 405-418). Springer.
Spyns, P., Meersman, R., & Jarrar, M. (2002). Data modeling versus ontology engineering. SIGMOD Record Special Issue on Semantic Web, Database Management and Information Systems, 31.
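The rapid growth in the volume of stored biological text data has made it difficult for people to obtain the information they need conveniently and efficiently. The problem arises because most of the information is implicit in unstructured or semi-structured text that computers cannot easily interpret.

This paper presents an ontology-based Biological Information Extraction and Query Answering (BIEQA) system. The system first mines text for a set of concepts stored in a biological ontology, then applies natural language processing and co-occurrence analysis to mine possible biological relations between the concepts. Text mining is used to extract the biological relations that occur frequently between each pair of biological concepts. Each mined relation is labeled with a fuzzy membership value, equal to the ratio of that relation's occurrence frequency to the frequency of relations in the whole document collection, and is called a fuzzy biological relation. The fuzzy biological relations extracted from the text collection are stored in a database together with other related information, such as the biological entities occurring in the relations.

The database is integrated with a query-processing module. The query-processing module has an interface that guides users in formulating formal queries at different levels of specificity.

Biological relation extraction and query answering from MEDLINE abstracts using ontology-based text mining. Muhammad Abulaish and Lipika Dey. Data & Knowledge Engineering, Volume 61, Issue 2, May 2007, Pages 228-262.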
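The fuzzy membership value described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BIEQA implementation: the (concept, relation, concept) triple representation, the function name `fuzzy_memberships`, and the sample relations are all hypothetical, and the denominator is assumed to be the total number of relation occurrences mined from the collection.

```python
from collections import Counter


def fuzzy_memberships(triples):
    """Map each distinct relation triple to its share of all mined occurrences.

    `triples` is an iterable of (concept, relation, concept) tuples, one per
    occurrence found in the document collection (a hypothetical format).
    """
    counts = Counter(triples)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing was mined, so there is nothing to score
    # Membership = occurrences of this relation / occurrences of all relations.
    return {triple: count / total for triple, count in counts.items()}


# Hypothetical occurrences mined from a small set of abstracts.
mined = [
    ("protein_A", "activates", "gene_B"),
    ("protein_A", "activates", "gene_B"),
    ("protein_A", "inhibits", "gene_C"),
    ("protein_D", "binds", "protein_A"),
]

memberships = fuzzy_memberships(mined)
for triple, value in sorted(memberships.items(), key=lambda kv: -kv[1]):
    print(triple, round(value, 2))
```

Storing the resulting dictionary alongside the triples would let a query module rank or threshold relations by membership value, which is one plausible reading of how queries at different levels of specificity could be supported.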