科学网

Tag: 本体 (ontology)

Related blog posts

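Culture Is as Hard to Change as One's Nature
周可真 2010-9-16 06:27
What fundamentally distinguishes humankind from nature? Culture. What fundamentally distinguishes one nation from another? Culture. What fundamentally distinguishes one person from another? Nature (本性).

What is this nature? It is the "nature" in the saying "by nature people are alike; by practice they grow apart" (性相近,习相远): the individuality a person is born with, which differs from that of others and which later practice (social life) makes ever more pronounced.

No two leaves in the world are exactly alike, just as no two fingerprints are. The same holds for individuality, and for culture.

Even so, differences in individuality do not prevent people who differ from coexisting, and differences in culture do not prevent nations that differ from coexisting, just as one tree bears many different leaves.

The world contains many nations and their cultures, and they can coexist. People should consciously recognize that these cultures not only coexist in fact but must coexist in principle. Otherwise the world would be completely uniform, and completely uniform nations and cultures amount to no nations and no cultures, just as absolute light amounts to no light.

That being so, the coexistence of ugly and beautiful cultures is the natural order of the world and cannot be changed by human effort. The harder a culture tries to become like other cultures, the more unlike them it becomes, unless the change destroys that culture and its nation.

The Chinese saying "one's nature is hard to change" (本性难移) contains all of this and more. "Hard" here does not mean difficult; it stands in for "impossible" and is a polite way of saying so. "One's nature is hard to change" is another way of saying that one's nature cannot change.

Admittedly, to the senses it seems that nature can change and is in fact changing. But such change is an illusion. It is the same nature expressing itself differently under different conditions, where "conditions" means the relations between a being of one individuality and beings of other individualities. Sun Wukong has seventy-two transformations and yet remains Sun Wukong; what changes is his appearance, not Sun Wukong himself.

The thing itself (本身) is its noumenon (本体), which is also its nature (本性). The thing itself is one; its manifestations (现身) are many. The many express the one, that is, they derive from it: "one begets two, two begets three, three begets the ten thousand things." All the changing things and events of the world are manifestations, expressions of the thing itself. For example, Zhang San is the thing itself; Teacher Zhang, Big Brother Zhang, Little Brother Zhang, Section Chief Zhang, and so on are his manifestations. Different as these appearances are, they are all Zhang San himself expressed under different conditions. Zhang San is the one; Teacher Zhang, Big Brother Zhang, Little Brother Zhang, and Section Chief Zhang are the many.

Admittedly, Zhang San did not generate himself; he was born of Zhang Er and Li Er, as their son or daughter. But once Zhang San is born, nothing can replace Zhang San. Zhang San is always Zhang San; the thing itself does not change, only its manifestations do. Because this holds for all things and events, the world is as varied as it is.

Suppose that things and events had only manifestations and no thing itself; then each would be both thing itself and manifestation. This still would not alter the fact that Zhang San is Zhang San, that is, it would not alter the fact of the world's diversity. Once that fact is granted, then whether or not one distinguishes the thing itself from its manifestations, one must grant that Zhang San is Zhang San and only Zhang San. The only difference is that without the distinction Zhang San is unchanging and unique, while with the distinction Zhang San is not unchanging but changing, not unique but manifold.

What determines the uniqueness and unchangingness of Zhang San, or of any thing or event, or of the thing itself? We do not know and cannot know. What is so without our knowing why is called fate (命). Hence the uniqueness and unchangingness of any thing or event, or of the thing itself, is determined by fate.

Fate cannot be escaped or resisted, increased or diminished, so people must know their fate and accept it.

But the fact that everything is fated, and that people must know and accept fate, does not mean that nothing can be done in life.

Something can be done: each person can choose his or her manifestations. That is, one can choose which beings of other individualities to enter into relations with, and this self-choice is the process by which one's potential appears or is exercised, which is what we usually call effort.

Effort, whether the effort of action (有为) or of non-action (无为), can change only the appearance of the self (the form of its manifestation); it cannot change the self as such.

In sum, every person consists of three elements: the self (the thing itself), the not-self (its manifestations), and self-choice (effort).

Likewise, every culture consists of three elements: the culture itself, cultural phenomena, and cultural choice. Cultural choice means only that a culture decides its own phenomenal form, that is, how it expresses itself, or what kind of relation it enters into with other cultures. In exchanges between China and the West, for instance, Chinese culture can decide what form its relation to Western culture takes: rejection, acceptance, or a bashful or hesitant rejection or acceptance.

Yet whatever choice Chinese culture makes, it cannot change Chinese culture itself, its uniqueness and distinctiveness, unless the choice is to destroy that culture and so make the Chinese nation vanish from the world.

Therefore, whether Chinese culture appears ugly, beautiful, or otherwise, its nature is hard to change.
Category: 人文之思 (Reflections on the Humanities) | 4233 views | 9 comments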
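[Repost] The Semantic Web and Ontologies
skymoon619 2010-7-22 17:38
The Semantic Web is a vision of the future Web in which information is given explicit meaning, so that machines can automatically process and integrate the information available online. The Semantic Web uses XML to define custom tag formats and the flexibility of RDF to express data. The next thing needed is an ontology language for the Web (such as OWL) to describe the explicit meaning of the terms used in Web documents and the relations among them.

语义网 is the Chinese name for the Semantic Web. The Semantic Web is a web that can make judgments on the basis of semantics. Put simply, it is described as an intelligent network that can understand human language and that makes communication between people and computers as easy as communication between people.

OWL adds more vocabulary for describing properties and classes: for example, disjointness between classes, cardinality, equivalence, richer typing of properties, property characteristics (such as symmetry), and enumerated classes.

(1) The Semantic Web differs from today's WWW; it is an extension of the existing WWW.
(2) The existing WWW is oriented toward documents, while the Semantic Web is oriented toward the data those documents represent.
(3) The Semantic Web will be easier for computers to interpret and process, and will have some capacity for judgment and inference.

Although the Semantic Web shows a bright future for the WWW and a resulting revolution for the Internet, realizing it still faces major challenges:
(1) Availability of content: there are still very few Web pages built on ontologies.
(2) Ontology development and evolution: developing core ontologies usable in all domains, methodological and technical support during development, and the evolution, annotation, and version control of ontologies.
(3) Scalability of content: once Semantic Web content exists, how to manage it in a scalable way, including how to organize, store, and search it.
(4) Multilingual support.
(5) Standardization of ontology languages.

The data on the existing World Wide Web is meant mainly for human use. The next-generation WWW will also provide data that computers can process, which will make a large number of intelligent services possible. The goal of Semantic Web research is to develop languages and technologies for expressing semantic information that computers can interpret and process, in order to support broad and effective automated reasoning in a networked environment.

The World Wide Web we use today is in effect a medium for storing and sharing images and text. All a computer sees is a mass of text or images whose content it cannot recognize. For a computer to process information on the Web, the information must first be converted into a form the computer can interpret, which is laborious. The Semantic Web would make this much simpler.

The Semantic Web changes the nature of the World Wide Web. Its main development task is to make data easier for computers to process and find. Its ultimate goal is to make the user nearly all-knowing about the vast resources on the Internet: computers can find the information you need among those resources, turning today's isolated islands of information into one huge database.

The Semantic Web would free people from the heavy labor of searching for relevant pages. Computers on the network can use intelligent agents to filter out the relevant, useful information when searching tens of thousands of pages, unlike today's Web, which merely lists tens of thousands of useless search results.

For example, when you register online for a conference, the organizer's site lists the time, the place, and discounts at nearby hotels. On today's Web you must look up the schedule, copy and paste, and then phone or go online to book flights and a hotel. With the Semantic Web, software installed on your computer completes these steps for you, and all you do is click a few buttons.

When you browse news, the Semantic Web would label every news report, describing in detail which part is the author, which the lead, and which the headline. So if you type "works of Lao She" into a search engine, you easily find Lao She's works rather than articles about him.

In short, the Semantic Web is a richer and more personalized Web. You can place a high degree of trust in it and let it filter out content you do not want, so that the Web feels more like your own.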
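The Lao She search example can be made concrete with a minimal sketch. It stores statements as (subject, predicate, object) triples, the shape RDF uses, and separates "works by Lao She" from "articles about Lao She" by predicate rather than by keyword. The `ex:` identifiers and the two predicates are illustrative assumptions, not a real vocabulary or dataset.

```python
# Toy triple store: each statement is (subject, predicate, object), as in RDF.
# All ex:... identifiers are hypothetical and exist only for this illustration.
TRIPLES = [
    ("ex:Teahouse", "ex:author", "ex:LaoShe"),
    ("ex:RickshawBoy", "ex:author", "ex:LaoShe"),
    ("ex:EssayOnLaoShe", "ex:about", "ex:LaoShe"),
    ("ex:EssayOnLaoShe", "ex:author", "ex:SomeCritic"),
]


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject s for which (s, predicate, obj) is asserted."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def keyword_search(term, triples=TRIPLES):
    """Keyword-style match: any statement that mentions the term at all."""
    return sorted({s for s, p, o in triples if term in (s, o)})


works_by = subjects("ex:author", "ex:LaoShe")      # what the user asked for
articles_about = subjects("ex:about", "ex:LaoShe")  # a different relation
mentions = keyword_search("ex:LaoShe")              # mixes both kinds together
```

The keyword search returns the essay alongside the two works, while the predicate-based query keeps them apart; that separation is what the labeled data is meant to buy.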
Category: 图书馆学与情报学杂志 (Library and Information Science Journals) | 2320 views | 0 comments
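Bibliometric Analysis and Visualization of Ontology Application Research in China and Abroad, 1998-2008
BlueSkyBird 2010-7-6 17:30
Hu Zewen (胡泽文), Wang Xiaoyue (王效岳)
Institute of Scientific and Technical Information, Shandong University of Technology, Zibo 255049

Abstract: Using bibliometric methods, computer-based statistical analysis, and social network analysis software, we analyze the historical literature of the ontology field. By plotting the distribution of publication counts and the co-occurrence network of core keywords, we extract the development trends, overall state, and research hotspots of ontology applications. The aim is to give readers a direct and clear picture of ontology application research in China and abroad and to offer guidance for later work.

Keywords: ontology; co-occurrence analysis; bibliometric analysis; social network analysis; visualization map

Highlight 1: High-frequency keywords are counted with SQL statements; co-occurrence analysis and visualization of the high-frequency terms are done with social network analysis software such as Ucinet and NetDraw.

Highlight 2: Combined with charts, a clear and concise review of the research hotspots found by co-occurrence analysis lets readers quickly grasp the fields in which ontologies are applied and how they are applied there.

Article download: 1998-2008年国内外本体应用研究计量分析及可视化.pdf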
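The keyword co-occurrence step can be sketched in a few lines. The post says the authors counted keywords with SQL and drew the network with Ucinet and NetDraw; the sketch below swaps in plain Python for both steps, and the sample keyword lists and the `min_freq` threshold are made-up assumptions, not the paper's data or settings.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: one keyword list per paper.
PAPERS = [
    ["ontology", "semantic web", "information retrieval"],
    ["ontology", "semantic web"],
    ["ontology", "knowledge management"],
    ["semantic web", "information retrieval"],
]


def keyword_frequencies(papers):
    """Count in how many papers each keyword appears."""
    return Counter(kw for paper in papers for kw in set(paper))


def cooccurrence(papers, min_freq=2):
    """Count pairs of high-frequency keywords that appear in the same paper."""
    freq = keyword_frequencies(papers)
    frequent = {kw for kw, n in freq.items() if n >= min_freq}
    pairs = Counter()
    for paper in papers:
        kept = sorted(set(paper) & frequent)  # sorted so each pair has one canonical order
        pairs.update(combinations(kept, 2))
    return pairs


freq = keyword_frequencies(PAPERS)
pairs = cooccurrence(PAPERS)
```

Each pair count is the weight of one edge in the co-occurrence network; a drawing tool then only needs the list of weighted edges.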
Category: 研究论文 (Research Papers) | 6798 views | 2 comments
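The Threefold Root of Existence and Its Solution
HuangFengli 2010-7-5 15:25
The Threefold Root of Existence and Its Solution
Huang Fenglin (黄凤琳), Graduate School of the Chinese Academy of Social Sciences

Existence, the noumenon (本体), direction (向)

1. Existence and its proof. The essence of existence is objectivity; existence is not the highest category. Proof of objectivity: proof by the reflective practice of a subjectivist subject (strong subjecthood), where the reflective practice of an individual subjectivist subject is incomplete and that of humanity as subject is the most complete; not proof by the reflective practice of an objectivist subject (weak subjecthood) or a non-subjectified subject (very weak subjecthood).

Existence here is relative to consciousness. Consciousness is subjective, so existence is objective. Existence here is therefore the category that reflects the shared objective property of all existents that exist relatively independently of consciousness (consciousness reflects existence without squeezing itself out). The essential property of existence is thus objectivity. This differs from the category that reflects the shared property of all existents that exist absolutely independently of consciousness (consciousness squeezes itself out in order to reflect existence), namely the highest category: the noumenon. Because the noumenon is absolutely independent of consciousness, it has no objectivity. Non-existence does not exist in the world. What, then, is the noumenon? To answer, we must first explain two phenomena of thought: consciousness reflecting existence without squeezing itself out, and consciousness reflecting existence by squeezing itself out.

First, note that objectivity, the essential property of existence as object, is relative to subjecthood (主体性), the essential property of the subject, just as existence as object exists relative to the subject. This must be distinguished from the statement that existence is relative to consciousness and that, consciousness being subjective, existence is objective. Consciousness is a component of subjecthood but is not identical with it. So consciousness has subjectivity (主观性), but subjectivity is not subjecthood. Where subjecthood differs, objecthood differs; and so where subjectivity differs, objectivity differs.

In the author's Draft Principles of the Science of Science (《科学学原理草稿》) and Principles of the Science of Social Science (《社会科学学原理》), subjects are divided by the strength of their subjecthood into three kinds: strong, weak, and very weak (finer divisions are possible). The subject may be individual or collective. For states as subjects, within the same historical period the subjecthood of a state with an advanced capacity for practice in the international natural division of labor is weaker than that of a state with a backward capacity. When the gap in this capacity between states in the same period is very large, the subjecthood of the state with the weak capacity is very weak; its subjecthood can be regarded as stripped away, as when it is reduced to a class ruled by the strong state. Class subjects and individual subjects can be analyzed similarly.

To return to the earlier point: reflecting existence without squeezing itself out is the thought phenomenon of a subject with weaker subjecthood. In such a subject's thinking, self-consciousness is not strong, or lies in the subconscious. "Consciousness does not squeeze itself out" means that consciousness, which includes a self-consciousness that is subconscious or only weakly explicit, does not use explicit consciousness, or enough of it, to squeeze self-consciousness out. (Consciousness that lacks explicit self-consciousness lacks awareness of long-term changes in the environment, so the subject has difficulty forming effective long-term behavior.) Reflecting existence by squeezing itself out is the thought phenomenon of a subject with stronger subjecthood. In its thinking, self-consciousness is strong or explicit. "Consciousness squeezes itself out" means that consciousness, including explicit self-consciousness, uses explicit consciousness to squeeze self-consciousness out. (Consciousness with explicit self-consciousness is good at recognizing long-term changes in the environment, so the subject is able to form effective long-term behavior.)

Self-consciousness is squeezed out in order to reflect the subject itself (which does not include self-consciousness; the self is defined by reflecting the contents of consciousness that do not include self-consciousness). Consciousness cannot cut itself off in order to reflect existence. Hence the shared property of all existents that exist relatively independently of consciousness is objectivity. Self-consciousness, once squeezed out, reflects with absolute impartiality the contents of consciousness that do not include self-consciousness, that is, the subject itself, and therefore has no subjecthood. What remains that can reflect the shared objective property of all existents existing relatively independently of consciousness is only the consciousness that does not include self-consciousness. The noumenon is the category that reflects this consciousness: self-consciousness, in order to define itself, comes to know the contents of consciousness that do not include self-consciousness and thereby reflects the content of the subject. But until this consciousness without self-consciousness can be made explicit, self-consciousness can only trace its own remote external connections without limit, so as to reflect and define the subject as fully as possible, and it subconsciously projects the content of this consciousness into the category of essence. This is the essence of the noumenon. The next subsection analyzes the features of the noumenon in terms of its form and lays the groundwork for the author's theory of direction (向论).

Proof of objectivity. Proving the objectivity of natural existence is clearly easy, because consciousness can reflect natural existence whether or not it squeezes itself out. Proving the objectivity of social existence is harder: it can be reflected only through self-consciousness after consciousness has squeezed self-consciousness out. So a subject (state, class, or individual) with stronger subjecthood can more easily reflect its own social existence, while a subject with weaker subjecthood cannot do so easily. Moreover, a collective subject's reflective practice proves the objectivity of existence more completely, in theory, than an individual subject's.

2. The noumenon and its proof. Ultimate spatial existence (the highest category). The paradox and the route to completeness: ultimate temporal existence (self-complete), that is, the noumenon does not exist. Non-existence does not exist in the world. The noumenon is the subject's self-reflected existence. The proof of the noumenon is the subject's process of self-reflection, which defines the self by tracing without limit the self's ultimate remote external connections (the ultimate remote external connection, that is, ultimate spatial existence, does not exist; the subject's self-reflected existence is therefore projected into the ultimate remote external connections).

The essence of the noumenon was analyzed above; its form is analyzed here. In form, the noumenon appears as the highest category reflecting the shared property of all existents. Under a space-based view of space-time, this highest category reflects the properties of existence in the smallest space and uses them to determine the properties of existence in larger spaces. Ever since the noumenon was posited, people have questioned whether it can be known (in the author's view the finest proof concerning the human capacity to know the noumenon is Gödel's incompleteness theorems), but no one has pointed out that the noumenon, as defined by its formal content, simply does not exist, or said outright that the noumenon is the subject itself, although philosophy since the modern period has tended in this direction.

In the author's view, the proof of the noumenon has passed through three stages: traditional ontology, early modern epistemology, and modern subject theory. Traditional ontology came to know the subject in the way described in the first subsection (knowing and defining the self by knowing the ultimate remote external connections of self-consciousness). Early modern epistemology, by questioning whether humans can know the noumenon, completed traditional ontology or deconstructed it, and so came to know the subject more fully. Modern subject theory (the self-reflection of the modern subjectivist state as subject; the author counts Germany as one such state, and holds that the philosophical ontology of the later state-socialist (国家社会主义) countries also belongs to modern subject theory) directly asserts that the subject should reflect the subject itself, though it never pointed out that this is a return of the noumenon to the subject.

The author holds that to resolve the paradox of the noumenon, the space-based view of space-time must be replaced by a time-based view (dialectics is similar, though it never made its time-based view explicit). The idea is bold, because we find that at most it can be self-complete in theory and has no practical value (its value may lie only in showing that the noumenon is not real, or that the subject is). A time-based view can be self-complete in theory because time can determine itself: every cross-section of the flow of time is determined within the nature of time and determines the next cross-section. But this kind of knowledge of existence does nothing to help us know existence; it can only explain. The transition from time to space to matter runs as follows: the sum of directions (向) is time, called a vector (向量); the sum of vectors is space, called a space vector; the sum of space vectors is space; the sum of spaces is velocity; the sum of velocities is mass; the sum of masses is force; the sum of forces is momentum; the sum of momenta is energy.

3. Theory of direction: laminar time flow (unit direction value), turbulent time flow (levels of direction value / levels of existence). Laminar time flow is used to reflect the flow of unit directions, which, like laminar flow, is logically self-complete. Turbulent time flow is used to reflect the flow of summed directions, which, like turbulence, requires the subject to reflect it layer by layer through practice. In laminar time flow the subject does not need to know time itself; it is self-complete. In turbulent time flow the subject must reflect the logic of the turbulence step by step through the logic of practice; that is, it can rely only on the logic of practice and cannot be self-complete. Turbulent time flow is caused by the existence of levels of direction value, or levels of existence, such as the sequence quantum, atom, molecule, in which each summation is a leap.

The threefold root of existence: external connections (X), internal connections (Y), external-internal connection (F()); Y = F(X)

1. Turbulent time flow produces the threefold root of existence (conceptual form): external connections, internal connections, external-internal connection. As noted above, turbulent time flow means that existence cannot be reflected through the self-complete logic of laminar time flow; it must rely on the logic of practice. (So long as one does not ask about the nature of the noumenon, this does not show up as a space-based view of space-time and matter, but is guided by the logic of the threefold root of existence, which does not answer the question of which is primary.) The logic of practice means that the subject, through practice, peels away and reflects layer by layer the levels of direction value that produce the turbulence. The logic of practice in human history has formed the logic of the threefold root of existence, which does not answer whether time or space is primary. Expressed in concepts, the three are: the external connections of an existent; its internal connections; and the external-internal connection, by which development and change in the external connections bring about development and change in the internal connections. The first two reflect the existent's connections in space; the third reflects turbulent time flow. Human knowledge of existence advances step by step in its grasp of the concrete content of these three connections, through practice built on historical reflective practice, and so makes the turbulent time flow logically smoother.

2. Mathematical form of the threefold root of existence: X, Y, F(). Here mathematical form is used to reflect the threefold root. X denotes the external connections of an existent, Y its internal connections, and F() the external-internal connection. F() must be greater than 1; it cannot be equal to or less than 1, otherwise there is non-existence, that is, no external-internal connection, or no mathematical form of the external-internal connection belonging to the existent's internal connections. Like the three roots in solving a function, the three connections inseparably constitute existence. In mathematical form, existence is Y = F(X).

3. Mathematical form of existence: Y = F(X)

Solving the threefold root of existence: reflective practice

1. Solving the threefold root of existence: reflective practice (as distinct from contact practice; divided into perceptual and rational reflective practice, and further into perceptual and rational reflective practice in the international natural division of labor and in the domestic social division of labor, and so on). The threefold root of existence is solved directly through reflective practice. In Draft Principles of the Science of Science, the author holds that the external connection of practice is historical practice (whose own external connection is adapted existents such as DNA), and that the internal connections of practice are contact practice and reflective practice, of which contact practice is the external connection of reflective practice. The internal connections of reflective practice are perceptual and rational reflective practice, of which perceptual reflective practice is the external connection of rational reflective practice. The process of knowing existence is the rational reflective practice of the threefold root of existence, carried out on the basis of historical practice, through continually new contact practice, with perceptual reflective practice as its material. The deeper the rational reflective practice of the threefold root goes, the smoother the logical reflection of turbulent time flow becomes, and the more profound humanity's knowledge of existence.

2. The progression of the solution: reflected existence and its accumulation, and the continual generation of new reflective practice. The previous section already touched on the progression of the solution. Because the external connection of practice is historical practice, the basis of the progression is what historical practice has been transformed into, namely historical rational reflected existence; on this basis, new reflective practice is formed by planning or prescribing new contact practice.

The process by which humanity solves the threefold root of existence can be put in mathematical form. Suppose historical practice already knows the concrete content of the threefold root Y, X, F() of some existent. New reflective practice then goes deeper on that basis, reflecting Z = F(Y) (reflection of the threefold root of the existent's internal connections) or X = F(W) (reflection of the threefold root of the existent's external connections). To know an existent more precisely, more deeply, and more completely, one can, as in the latter case, keep tracing and solving its external connections and remote external connections.
8190 views | 1 comment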
[Repost] Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2010) (EI-indexed)
hanpu0725 2010-5-10 16:43
3rd Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2010)
http://nlpoe2010.pqpq.net/
In conjunction with the 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10)
August 31-September 3, 2010, Toronto, Canada

Call for Papers

Natural Language Processing (NLP) addresses the problems of automated understanding and generation of natural human languages. Understanding identifies the syntactic structure of a sentence and judges the semantic relations among its syntactic constituents, in the hope of arriving at an eventual understanding of the sentence. Generation constructs the semantic structures and syntactic constituents according to the semantic and syntactic properties of the lexical items selected, and eventually produces grammatically well-formed sentences. The goal of NLP applications is to facilitate human-machine communication in natural language. In particular, it is to build software systems that process natural language, such as machine translation, computer-assisted teaching, information retrieval, automatic text categorization, automatic summarization, speech recognition and synthesis, information extraction from text, and intelligent search on the Internet. Today, with the wide use of the Internet, the demand for language information puts a high premium on the automated processing of massive amounts of it.

Ontology engineering is a subfield of artificial intelligence and computer science that aims at a structured representation of terms and the relationships between them within a particular domain, in order to facilitate knowledge sharing and reuse. An ontology project involves the development of ontology building programs, ontology life-cycle management, research on ontology building methods, support tools and ontology languages, and a series of similar activities.

Ontologies have found important applications in information sharing, system integration, knowledge-based software development, and many other issues in the software industry. However, ontology engineering is a time-consuming and painstaking endeavor, and NLP technology has important contributions to make to the quick and automatic development of ontologies.

This workshop will focus on recent advances in ontology engineering and NLP, with the aim of promoting interaction between the two areas and their common growth. We are particularly interested in the building of upper-level language ontologies in NLP and the application of NLP technology in ontology engineering. We hope that individuals and research institutions in both areas will pay attention to this workshop, which may contribute to the integration and growth of the two areas.

The topics of the workshop include, but are not limited to, the following:
1. Natural language understanding, including syntactic parsing, word sense disambiguation, semantic role labeling, etc.;
2. Text mining, including named entity recognition, term recognition, term, synonym and concept extraction, relation extraction, etc.;
3. Lexical resources and corpora, including dictionaries, thesauri, ontologies, etc.;
4. Ontology learning and population from text, the Web, and other resources;
5. Application issues of ontology-based NLP: information extraction, text categorization, text summarization, and other applications;
6. Other relevant topics in ontology learning, ontology evolution, ontology modeling, ontology application, etc.

Paper Submission

Paper submissions should be limited to a maximum of 4 pages (one additional page is allowed for an extra fee). Papers must be in English and should be formatted according to the IEEE 2-column format (see the Author Guidelines at http://www.computer.org/portal/pages/cscps/cps/final/wi08.xml ).

All submitted papers will be reviewed by at least 2 program committee members on the basis of technical quality, relevance, significance, and clarity. The workshop accepts only on-line submissions. Please use the Submission Form on the WI'10 website to submit your paper: http://wi-consortium.org/cyberchair/wiiat10/scripts/ws_submit.php

Publication

All papers accepted for workshops will be included in the Workshop Proceedings published by the IEEE Computer Society Press, which are indexed by EI, and will be available at the workshops.

Important Dates

Workshop paper submission: April 16, 2010
Notification of paper acceptance: May 28, 2010
Camera-ready copy of accepted papers: June 21, 2010
Workshop date: August 31, 2010
Conference dates: September 1-3, 2010

Workshop Organizers

Zhifang Sui, Associate Professor
Institute of Computational Linguistics (ICL), Peking University
No. 5 Yiheyuan Rd., Haidian District, Beijing 100871, China
E-mail: suizhifang@gmail.com
Tel: 086-01062753081-105

Yao Liu, Associate Professor
Institute of Scientific and Technical Information of China
No. 15 Fuxing Road, Haidian District, Beijing 100038, China
E-mail: liuy@istic.ac.cn
Tel: 086-01058882053

From: http://www.sciencenet.cn/m/user_content.aspx?id=301834
Category: 会议征文 (Calls for Papers) | 2258 views | 0 comments
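[Repost] Understanding the Functions and Limits of Ontologies Correctly [excerpt]
sqwang 2010-4-9 16:17
This year I took part in the Open Ontology Repository (OOR) activities and, while writing the FRSAR report, reread many classics on subject classification. Looking back at what ontologies can and cannot do, I found that my earlier understanding had many blind spots that are worth writing down. These observations may not agree with the views of many authorities; I set them down here to get the discussion started.

Since the word "ontology" (also called a practical classification system, 实用分类系统) began to circulate in library and information science, ontologies have at times been invested with boundless hope, as if a glimmer of light had finally appeared in the long and endless search for ways to organize and retrieve library information. This remedy, not yet tasted (because we have not obtained it, not because we cannot bear to try it) but already labeled a cure-all, has nearly become the starting point and the end point of every grand plan for the 2010s. We hope that the disorder of the networked world will collapse of itself before the all-purpose power of ontologies...

The hundreds or thousands of ontology resources online (meaning ontologies that have been compiled and formally expressed), the outstanding results of long-term, large-scale ontology centers funded by the governments of the United States, Britain, and others (for example BioPortal), and the popular annual Ontology Summit with its fast-paced weekly online pre-summit sessions all attest, more or less, to how hot and how capable ontologies are.

The question is how much ontologies can really do for our library and information work, which is about information retrieval and the organization of document resources and which serves readers (not machines).

My view is this. (1) Ontologies are good knowledge organization systems (KOS) that we can use; we must understand and use the many ontology resources that already exist. (2) Ontologies (in the generic sense) that are built for inference according to logic and axioms are only one of the usable knowledge organization systems; they should be neither our only nor our final goal and means.

"An ontology is a formal, explicit specification of a shared conceptualization." Studer, R., Benjamins, and Fensel, D. (1998). Knowledge engineering: Principles and methods, Data and Knowledge Engineering, 25(1998): 161-197.

Here "formal" means machine-processable, with concepts, properties, relations, constraints, and so on all explicitly defined.

In machine processing, only an ontology built on logic and axioms can avoid semantically ambiguous inference: the class Insecta can belong only to the phylum Arthropoda, and among the classes of an ontology a subclass inherits all the attributes of its superclass (while having further attributes of its own).

This strict genus-species hierarchy is for the most part guaranteed in taxonomies; it is guaranteed to some extent, or for the most part, in thesauri (depending on the discipline); and it is guaranteed least in library classification schemes.

The genus-species relation is only one of the hierarchical relations used in library classification. In many cases a library classification is laid out according to how things are studied, researched, and discussed, and according to the disciplines, books, journals, and papers that result. Insects can therefore be listed under many classes, for example agricultural pests, disease vectors, food, artistic representation, and control techniques. Such "perspective hierarchies" do not carry the logical relation of conceptual intension that genus-species relations follow, and they do not reflect basic properties such as reflexivity, anti-symmetry, and transitivity. In an ontology, these viewpoints and perspectives on the concept under study are expressed in many other ways (not hierarchically), such as attributes, restrictions and rules, semantic relation types, and the concept-relation assertions generated from them.

Ontologies and other KOS also use whole-part and class-instance hierarchies, which are left aside here.

As for ontologies and the KOS of library and information science: (1) Is there any need to use them together? (2) Can they make use of each other? (3) Will ontologies replace all KOS?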
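The contrast between a strict genus-species hierarchy and a perspective hierarchy can be illustrated with a minimal sketch. The class names follow the insect example in the text; the attribute names, the dictionary layout, and the helper functions are illustrative assumptions, not taken from any actual ontology or classification scheme.

```python
# Strict is-a hierarchy: each class has one parent and inherits its attributes.
PARENT = {"Insecta": "Arthropoda", "Arthropoda": "Animalia", "Animalia": None}
OWN_ATTRIBUTES = {  # simplified, hypothetical attribute names
    "Animalia": {"multicellular"},
    "Arthropoda": {"exoskeleton"},
    "Insecta": {"six_legs"},
}

# Perspective hierarchy: a library scheme lists the topic wherever it is studied.
# Nothing is inherited here: "Food" says nothing about what an insect is.
LISTED_UNDER = {"Insects": ["Agricultural pests", "Disease vectors", "Food"]}


def ancestors(cls):
    """All superclasses of cls, nearest first."""
    out = []
    parent = PARENT.get(cls)
    while parent is not None:
        out.append(parent)
        parent = PARENT.get(parent)
    return out


def is_a(sub, sup):
    """Subsumption test: reflexive and transitive over PARENT."""
    return sub == sup or sup in ancestors(sub)


def attributes(cls):
    """Own attributes plus everything inherited from superclasses."""
    result = set(OWN_ATTRIBUTES.get(cls, set()))
    for ancestor in ancestors(cls):
        result |= OWN_ATTRIBUTES.get(ancestor, set())
    return result
```

Here `is_a("Insecta", "Animalia")` holds by transitivity and `is_a("Arthropoda", "Insecta")` does not, while `LISTED_UNDER` supports no such reasoning: it records where a topic is shelved, not what the thing is.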
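(I will toss out another brick on these questions when I have time.)

The figure below is a typical example that can be used to explain how ontology resources display concept relations, although it does not show the various expressions of concept relations inside the file that holds the ontology. This ontology was built with Protégé and registered in BioPortal. BioPortal gathers many ontologies in biology and biomedicine, in three formats: Protégé, OWL (Lite, Full, DL), and OBO's own format. Whatever format is used for representation and storage, the interface shown to people (not machines) is the same, as in the figure.

Note how Cell is displayed in the hasSubclass hierarchy on the left. Cell inherits all the attributes of its superclass, and its subclasses inherit all of its attributes. These can be seen under "attributes" in the upper right pane and include: has_boundary; has_inherent_3-D_shape; dimension; has_dimension; has_mass; Definition; Comment; Synonym. Part of this class's relations to other classes is shown in the lower right pane. These relations follow the relation types defined by the particular ontology; different ontologies often define relation types of their own as needed. Relation types are thus made explicit in a way that thesauri and classification schemes lack.

Keep in mind why ontologies need these definitions and strict relations: their original purpose is not indexing and retrieval but judgment and inference.

The screenshot is from the display of the Foundational Model of Anatomy ontology (FMA) in BioPortal. Source: BioPortal of the Open Biomedical Ontologies (OBO) library, National Center for Biomedical Ontology (http://www.bioontology.org/tools/portal/bioportal.html)

In the formats for expressing relations, the encoding language an ontology uses lets it reach different levels of expressiveness. For example, OWL (Web Ontology Language) is built on RDF and has many specific constructs, such as (summarized in English in the original):
- cardinality constraints on properties, e.g., a Star is memberOf exactly one Galaxy;
- specifying that constraints on the range or cardinality of a property depend on the class of resource the property is applied to, e.g., for a binarySystem the hasMember property has 2 values, while for a tripleSystem the same property should have 3 values;
- specifying that a given property is transitive, e.g., if A hasAncestor B, and B hasAncestor C, then A hasAncestor C;
- specifying that a given property is a unique identifier (or key) for instances of a particular class;
- equivalent class: specifying that two different classes (having different URIrefs) actually represent the same class;
- same as: specifying that two different instances (having different URIrefs) actually represent the same individual;
- the ability to describe new classes in terms of combinations (e.g., unions and intersections) of other classes; and
- the ability to describe disjoint classes (i.e., no instance belongs to both classes), e.g., benign and malignant.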
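Three of the constructs listed above (transitive properties, exact cardinality, and disjoint classes) can be sketched as plain checks over asserted data. This is a minimal closed-world illustration with hypothetical data and function names; it is not how an OWL reasoner works, since OWL uses the open-world assumption and does not assume that different names denote different individuals.

```python
def transitive_closure(pairs):
    """Expand a relation so that (a, b) and (b, c) imply (a, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def cardinality_violations(assertions, prop, exactly):
    """Subjects whose number of distinct values for prop differs from `exactly`.

    Only subjects with at least one asserted value are seen by this check.
    """
    values = {}
    for s, p, o in assertions:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return sorted(s for s, vs in values.items() if len(vs) != exactly)


def disjoint_violations(types, class_a, class_b):
    """Individuals asserted to belong to both of two disjoint classes."""
    return sorted(i for i, classes in types.items() if {class_a, class_b} <= classes)


# Hypothetical sample data mirroring the examples in the list.
has_ancestor = transitive_closure({("A", "B"), ("B", "C")})
stars = [
    ("Sun", "memberOf", "MilkyWay"),
    ("StarX", "memberOf", "GalaxyOne"),
    ("StarX", "memberOf", "GalaxyTwo"),
]
bad_stars = cardinality_violations(stars, "memberOf", exactly=1)
tumours = {"t1": {"Benign"}, "t2": {"Benign", "Malignant"}}
bad_tumours = disjoint_violations(tumours, "Benign", "Malignant")
```

Thesauri and classification schemes have no place to state such constraints, which is why data conforming to them cannot be checked or extended by inference in this way.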
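These constructs secure relational inference in an ontology: for example, constraints on the range or even the number of members, and the statement of disjoint concepts (such as benign versus malignant tumours; the example is unpleasant to read but makes the point well, so I use it for now).

I want to use the following example to look at the relation between ontologies and traditional KOS (such as taxonomies). The SWAD example is easy to follow, so I put it here first; other examples will follow next time.

SWED (Semantic Web Environmental Directory) is a directory of organizations and projects in environmental science and environmental protection. As a portal, its content is harvested using Semantic Web tools. The SWAD ontology includes several taxonomies that supply hierarchical structure, as shown in the figure.
Source: Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/05

In addition, Alistair Miles used the SWAD example to compare the cost-effectiveness of different encodings and found that the most suitable approach is a combination of SKOS and OWL.
Source: Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/06 http://isegserv.itd.rl.ac.uk/public/skos/press/cistrana200602/taxonomies-semanticweb.ppt

That is, for encoding the hierarchical KOS components (such as taxonomies and thesauri), SKOS is sufficient, because the relations to be expressed are simple: mainly hierarchical relations, plus some undifferentiated associative relations. Using OWL's complex expressions for them would be like killing a chicken with an ox-cleaver, and is unnecessary. But to express the complex relations that ontologies are meant to express (see the last section of the previous part), OWL must be used. Several large European NKOS projects are currently trying to demonstrate the importance and cost-effectiveness of SKOS.
Category: 人工智能 (Artificial Intelligence) | 2917 views | 0 comments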
Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2010) (EI-indexed)
liuysd 2010-2-23 08:46
Category: 会议征文 (Calls for Papers) | 6528 views | 2 comments
Ontology [Repost]
timy 2009-12-29 20:13
From: http://www.langware.com/index.php?/content/view/30/45/ Ontology http://en.wikipedia.org/wiki/Ontology From Wikipedia, the free encyclopedia In philosophy, ontology is the study of being or existence. http://en.wikipedia.org/wiki/Ontology_(computer_science) Ontology (computer science) From Wikipedia, the free encyclopedia In both computer science and information science, an ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. It is used to reason about the objects within that domain. http://www-ksl.stanford.edu/kst/what-is-an-ontology.html What is an Ontology? Short answer: An ontology is a specification of a conceptualization. http://www.jfsowa.com/ontology/ The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. http://www.formalontology.it/ Ontology is the theory of objects and their ties. Ontology provides criteria for distinguishing various types of objects (concrete and abstract, existent and non-existent, real and ideal, independent and dependent) and their ties (relations, dependences and predication). http://ontology.buffalo.edu/ State University of New York at Buffalo Department of Philosophy; Ontology http://www.newadvent.org/cathen/11258a.htm Ontology is not a subjective science as Kant describes it (Ub. d. Fortschr. d. Met., 98) nor an inferential Psychology, as Hamilton regards it (Metaphysics, Lect. VII); nor yet a knowledge of the absolute (theology); nor of some ultimate reality whether conceived as matter or as spirit, which Monists suppose to underlie and produce individual real beings and their manifestations. 
http://pespmc1.vub.ac.be/ONTOLI.html Ontology (the science of being) is a word, like metaphysics, that is used in many different senses. It is sometimes considered to be identical to metaphysics, but we prefer to use it in a more specific sense, as that part of metaphysics that specifies the most fundamental categories of existence, the elementary substances or structures out of which the world is made. http://www.aaai.org/AITopics/html/ontol.html Ontological analysis clarifies the structure of knowledge. Given a domain, its ontology forms the heart of any system of knowledge representation for that domain. Without ontologies, or the conceptualizations that underlie knowledge, there cannot be a vocabulary for representing knowledge....Second, ontologies enable knowledge sharing. -from What Are Ontologies, and Why Do We Need Them? B. Chandrasekaran, Jorn R. Josephson, V. and Richard Benjamins http://www.daml.org/ontologies/ DAML Ontology Library http://ontology.buffalo.edu/smith/articles/ontologies.htm Ontology as a branch of philosophy is the science of what is, of the kinds and structures of the objects, properties and relations in every area of reality. Ontology in this sense is often used in such a way as to be synonymous with metaphysics. In simple terms it seeks the classification of entities. In the field of information processing there arises what we might call the Tower of Babel problem. http://www.linguistics-ontology.org/ The GOLD Community is a vision to bring together those interested in the best-practice encoding of linguistic data. http://emeld.org/documents/GLOT-LinguisticOntology.pdf A linguistic ontology for the semantic web http://www.formalontology.it/linguistic-relativity.htm Language and Thought: Ontological Problems Ontology and the Linguistic Relativity (Sapir-Whorf) Hypothesis http://ontology.teknowledge.com/ This site contains information about the SUMO (Suggested Upper Merged Ontology). 
This ontology is being created as part of the IEEE Standard Upper Ontology Working Group. The goal of this Working Group is to develop a standard upper ontology that will promote data interoperability, information search and retrieval, automated inferencing, and natural language processing. The SUMO has been translated into various representation formats, but the language of development is a variant of KIF (a version of the first-order predicate calculus). http://www.fb10.uni-bremen.de/anglistik/langpro/webspace/jb/info-pages/ ontology/ontology-root.htm This page is a collection of starting points for information on ontologies gathered together for ease of reference for our own ontology-related projects. It is made available as is in case it is of use to anyone else. http://www.cs.vu.nl/~guus/papers/Hage05a.pdf A Method to Combine Linguistic Ontology-Mapping Techniques We discuss four linguistic ontology-mapping techniques and evaluate them on real-life ontologies in the domain of food. Furthermore we propose a method to combine ontology-mapping techniques with high Precision and Recall to reduce the necessary amount of manual labor and computation. http://zimmer.csufresno.edu/~wlewis/projects/DDLOD.html Data-Driven Linguistic Ontology Development Universitt Bremen The intent of the DDLOD project is to semi-automatically capture a picture of the semantic space of the field of linguistics, and use this snapshot to make the Generalized Ontology for Linguistic Description (GOLD) as complete and comprehensive as possible. http://linguistlist.org/emeld/tools/ontology.cfm Markup: Linguistic Ontology Traditionally markup has been defined as systematic annotation designed to reveal a text's typographical and informational structure. Linguistic markup might be broadly described as annotation representing: (a) the grammatical structure of text couched in the focus language and (b) the structure of documents presenting a linguistic description or analysis of such text. 
http://www.aifb.uni-karlsruhe.de/WBS/pci/annotation.pdf Ontology-based linguistic annotation. Institute AIFB, University of Karlsruhe
http://zimmer.csufresno.edu/~wlewis/projects/DDLOD-overview.html The World Wide Web has become a primary source for disseminating data on the world's languages, with a variety of language data regularly posted to the Web, including large numbers of scholarly papers on language. Often embedded in these documents are enriched language data encoded in the form of Interlinear Glossed Text (IGT). IGT is a standard method for presenting linguistic data, and consists of a line of language data, usually broken down by morpheme, a line of grammatical and gloss information aligned with the text in the first line, and a line representing the translation.
http://cogprints.org/4009/ The ontology of signs as linguistic and non-linguistic entities: a cognitive perspective
http://www.phil.uni-passau.de/linguistik/linguistik_urls/urls.php?CAT=computing:Software:Ontology+Engineering Linguistics Links Database: Computing, Software, Ontology Engineering. JATKE (unified platform for ontology learning); OntoLT (middleware for ontology extraction from text); Protégé (ontology editor and knowledge-base editor); Text2Onto (framework for ontology learning from text); TextToOnto (ontology construction using text mining techniques)
http://www.phil.uni-passau.de/linguistik/linguistik_urls/urls.php?CAT=computing:Software Linguistics Links Database, Department of General Linguistics at the University of Passau.
http://www.cs.utexas.edu/users/mfkb/related.html Some Ongoing KBS/Ontology Projects and Groups. Knowledge-Base Projects, Groups, and Related Material
http://sigart.acm.org/ai/ontology.html A lot of stuff for linguistics, networks and computers.
http://www.essex.ac.uk/linguistics/clmt/other_sites/index_1.html A lot of links for linguistics, networks and computers. No longer maintained.
http://www.sim.hcuge.ch/ontology/03_MedicalLinguistics.htm The Service d'Informatique Médicale (SIM) is part of the Radiology and Medical Informatics Department of the University Hospitals of Geneva. This entity is in charge of developing medical applications such as patient records, medical orders and other knowledge-based applications. A group within SIM has long specialized in Natural Language Processing.
http://linguistlist.org/emeld/school/classroom/ontology/index.html E-MELD school of best practices in digital language documentation
http://linguistlist.org/emeld/workshop/2005/papers/saulwick-paper.doc Semantic relations in ontology mediated linguistic data integration
http://llc.oxfordjournals.org/cgi/content/abstract/21/suppl_1/29 Oxford Journals, Literary and Linguistic Computing: Designing and Implementing an Ontology for Logic and Linguistics
http://www.legenden.dk/blog/2003/12/links.html Online Philosophy: list of philosophers with online papers about Language, Linguistics, Metaphysics, Epistemology, Logic and Mathematics
http://www.let.uu.nl/linguistics/log/ EBoLi, an E-Book for Linguistics
http://suo.ieee.org/email/msg12240.html Multi-Source Ontology (MSO) Draft Ballot Question
http://xml.coverpages.org/xml.html Extensible Markup Language (XML) and links for ontology.
http://www.onlineoriginals.com/showitem.asp?itemID=287articleID=10 A Genetic Interpretation of Ricoeur's Philosophy of Language: furnishing Ricoeur's theory of language with an ontology that is consistent with his own assumptions
http://www.clres.com/dict.html ACL SIGLEX Resource Links
http://swik.net/ontology?index Ontology pages; filter by tags related to ontology
http://www.cs.brandeis.edu/~jamesp/arda/time/readings.html The site contains References and Links; General References; Ontology WG; Corpus WG; TimeML WG
http://nlp.shef.ac.uk/links.html Natural language processing group
http://www.imi.uni-luebeck.de/~ingenerf/terminology/Term-oth.html Materials about Basic Sciences; Terminology; Ontology; Artificial Intelligence; Knowledge Representation; Computational Linguistics; Information Retrieval
http://citeseer.ist.psu.edu/704251.html Introduction: The World Wide Web has the potential to become a primary source for storing and accessing linguistic data, including data of the sort that are routinely collected by field linguists. Having large amounts of linguistic data on the Web will give linguists, indigenous communities, and language learners access to resources that have hitherto been difficult to obtain. For linguists, scientific data from the world's languages will be just as accessible as information in on-line
http://citeseer.ist.psu.edu/760180.html Class, Relation, Predicate, GrammaticalRelation, Aspect, Tense, Case, Agreement, Attribute, GrammaticalAttribute, Gender, Person, Number. 4.2 Details of the Ontology: As much as possible we tried to use existing elements of the SUMO. First of all SUMO already includes a good semiotics architecture for the representation and the communication of information in general. Expanded from the original SUMO somewhat are the basic segments of language, which are classified as LinguisticExpressions
http://www.loa-cnr.it/Files/SOIA.pdf SOIA: Semantics and Ontology of InterAction. Joint project ISTC - IRIT (CNRS-UPS, Toulouse, France)
http://opim-sun.wharton.upenn.edu/~asa28//useful_semiotics_research_links.htm Useful research links on semiotics, linguistics, semantics, syntactics, controlled language, domain-specific language, etc.
http://links.jstor.org/sici?sici=0097-8507(198309)59%3A3%3C708%3AEILO%3E2.0.CO%3B2-L Essays in Linguistic Ontology
http://www.jfsowa.com/ontology/lexicon.htm The lexicon is the bridge between a language and the knowledge expressed in that language. Every language has a different vocabulary, but every language provides the grammatical mechanisms for combining its stock of words to express an open-ended range of concepts. Different languages, however, differ in the grammar, the words, and the concepts they express.
http://www.cs.bilkent.edu.tr/~erayo/ontology/html/bookmarks/Ontologies/Linguistics_Oriented/index.html Annotated Ontology Resources: Linguistics Oriented
http://www.sciencedirect.com/science?_ob=ArticleURL_udi=B6V0N-47TFMYT-5_user=10_coverDate=11%2F15%2F2002_rdoc=1_fmt=_orig=search_sort=dview=c_acct=C000050221_version=1_urlVersion=0_userid=10md5=85bb0f32be97f1d75abcbd7652951834 Linguistic kleptomania in computer science. Department of Informatics, Aristotle University, Thessaloniki, Greece
http://www.fi.muni.cz/gwc2004/proc/118.pdf One Dead Armadillo on WordNet's Speedway to Ontology. Institute for Formal Ontology and Medical Information Science, University of Leipzig
http://www.ling.su.se/DaLi/research/index.htm Research in Computational Linguistics at SU
http://xml.coverpages.org/muleco.html Multilingual Upper-Level Electronic Commerce Ontology (MULECO)
http://xml.coverpages.org/oil.html Ontology Interchange Language (OIL)
http://xml.coverpages.org/owl.html OWL Web Ontology Language
http://xml.coverpages.org/oml.html Ontology and Conceptual Knowledge Markup Languages
http://xml.coverpages.org/shoe.html Simple HTML Ontology Extensions (SHOE)
http://xml.coverpages.org/xol.html XOL: XML-Based Ontology Exchange Language
http://www.cstr.ed.ac.uk/ University of Edinburgh, The Centre for Speech Technology Research
http://www.cl.cam.ac.uk/research/nl/index.html Natural Language and Information Processing (NLIP) Group, Computer Laboratory, University of Cambridge
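Category: Natural Language Processing | 8464 views | 5 comments
Strategic Reading, Ontologies, and the Future of Scientific Publishing (excerpted translation)
libseeker 2009-12-20 12:52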
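Tumou's note: On 14 August 2009, Science published a review article, "Strategic Reading, Ontologies, and the Future of Scientific Publishing", by Allen H. Renear and Carole L. Palmer of the Graduate School of Library and Information Science (LIS) at the University of Illinois at Urbana-Champaign. I have translated the abstract and the section "How Will Scientists Work with the Literature in 2019?" for reference. Thanks to caveman (Jason Zou) for providing the original text!

Translated from: Allen H. Renear, et al. Strategic Reading, Ontologies, and the Future of Scientific Publishing. Science 325, 828 (2009). DOI: 10.1126/science.1157784 (Author information: Allen H. Renear and Carole L. Palmer, Center for Informatics Research in Science and Scholarship, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA)

Title: Strategic Reading, Ontologies, and the Future of Scientific Publishing

Abstract (translation): The revolution in scientific publishing that has been heralded since the 1980s is about to happen. Scientists read strategically, working with many papers at once to search, filter, scan, link, annotate, and analyze fragments of content. Observations show that strategic reading in the online environment has recently increased, and it will soon be intensified by two current trends: first, the widespread use of digital indexing, retrieval, and navigation resources; second, the emergence of interoperable ontologies within many disciplines. With reading tools that exploit ontologies, reading practices will become faster and more indirect, changing the way scientists use the literature and reshaping the evolution of scientific publishing.

Abstract (original): The revolution in scientific publishing that has been promised since the 1980s is about to take place. Scientists have always read strategically, working with many articles simultaneously to search, filter, scan, link, annotate, and analyze fragments of content. An observed recent increase in strategic reading in the online environment will soon be further intensified by two current trends: (i) the widespread use of digital indexing, retrieval, and navigation resources and (ii) the emergence within many scientific disciplines of interoperable ontologies. Accelerated and enhanced by reading tools that take advantage of ontologies, reading practices will become even more rapid and indirect, transforming the ways in which scientists engage the literature and shaping the evolution of scientific publishing.

How will scientists work with the literature in 2019? (translation)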
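Even as text mining and automated processing become commonplace, scientists will still read narrative prose. These reading practices, however, will become increasingly strategic, supported by enhanced literature and ontology-aware tools. As part of the publishing workflow, scientific terminology will routinely be indexed against rich ontologies. More importantly, formalized assertions, perhaps kept in specialized structured abstracts, will give indexing and browsing tools computational access to causal and ontological relationships. Hypertext links will be extensive, generated both automatically and by readers who comment on blogs or contribute to shared annotation databases. At the same time, more tools for enhanced searching, scanning, and analysis will appear and exploit the increasingly rich layer of indexing, linking, and annotation information.

There are no technical obstacles on this path, and it is already under way. As always, the changes will be incremental. Scientists, who already make extensive use of existing indexing and retrieval services, will encounter a steady stream of new enhancements and adopt those that let them engage with the literature quickly and productively. New functionality will sometimes be offered as part of the application interface (new features in PubMed, for example) or as shared external tools that users can add to their Web browsers. These developments chart a middle course between the already outdated activity of finding one article to read and the narrower goals of text mining, and they respond directly to the enduring necessity and value of strategic reading in scientists' daily work.

Original: How Will Scientists Work with the Literature in 2019? Scientists will still read narrative prose, even as text mining and automated processing become common; however, these reading practices will become increasingly strategic, supported by enhanced literature and ontology-aware tools. As part of the publishing workflow, scientific terminology will be indexed routinely against rich ontologies. More importantly, formalized assertions, perhaps maintained in specialized structured abstracts, will provide indexing and browsing tools with computational access to causal and ontological relationships. Hypertext linking will be extensive, generated both automatically and by readers providing commentary on blogs and through shared annotation databases. At the same time, more tools for enhanced searching, scanning, and analyzing will appear and exploit the increasingly rich layer of indexing, linking, and annotation information. There are no technical obstacles to this trajectory, and it is already under way. The changes, as always, will be incremental: Scientists, who today already make extensive use of existing indexing and retrieval services, will encounter a steady stream of new enhancements and adopt those that allow rapid and productive engagement with the literature. The new functionality will sometimes be provided as part of the application interface (new features in PubMed, for instance) or as shared external tools that users can add to their Web browsers. These developments chart a middle course between the already obsolete activity of finding an article to read on the one hand, and the narrower objectives of text mining on the other, responding directly to the entrenched necessity and value of strategic reading in the daily work of today's scientists.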
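Category: Gleanings from the Sea of Learning | 3917 views | 0 comments
Exploring an Ontology-Based Method for Organizing and Representing Translational Medicine Information
zilu85 2009-11-13 08:52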
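[The following is the introduction to a paper I presented at the Chinese Medical Association's annual academic conference on medical informatics.]

Translational medicine means integrating the findings of basic researchers and clinicians and applying them directly to patients; its goal is to find ways across the barrier between the laboratory and the bedside. Because medical practice has always aimed to use as much knowledge and data as possible to cure patients, translational medicine is not a new concept. What has changed is that advances in information science and technology over the past 20 years have made the conditions for translational research increasingly mature.

In particular, high-throughput molecular technologies have produced large volumes of complex, dynamic data, and a growing number of research papers draw on these data from basic science laboratories. The theories proposed in this literature have changed our understanding of human disease and have directly affected patient care. Progress in high-throughput molecular technology has therefore created opportunities for biomedicine in general and for translational medicine research in particular.

How should translational medicine research be carried out? One key step is to link gene expression data from the laboratory with patients' clinical characteristics.

Take breast cancer as an example. It is one of the most common cancers in women, so research on its diagnosis and treatment matters greatly. Breast cancer has long been regarded as a heterogeneous disease that needs finer classification before treatment can be individualized. With current knowledge, clinicians can use clinical information about the tumor (such as size, lymph node metastasis, distant metastasis, and histology), about the patient (such as age, smoking history, and menstrual status), and about immunohistochemistry (such as ER, PR, and ERBB2) to roughly estimate tumor behavior and to help judge prognosis and response to treatment. Patient clinical characteristics are thus important factors in breast cancer prognosis and treatment. Some differences in gene expression between two groups of patients may be caused by other factors (such as age) rather than by the targeted factor (such as treatment).

However, when microarray data from cancer patients' tumor specimens are analyzed, differences in clinical characteristics between patients are usually minimized by design. For example, a microarray study of a treatment will, wherever possible, select two groups of patients whose ages do not differ significantly. Because microarray studies tend to be expensive, researchers rarely have enough samples to draw statistically significant conclusions about patients' clinical characteristics. If the clinical data in existing microarray datasets could be pooled and organized, and the associations between clinical characteristics and gene expression data explored in depth, basic research and clinical practice might be brought together; this is one possible route to a solution.

Many records in gene expression databases already contain patient information related to the experimental data. An important public resource is the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo). GEO is a gene expression data repository and online resource that NCBI built to store and freely distribute high-throughput gene expression data submitted by researchers. It holds many types of gene expression data from microarrays, high-density oligonucleotide arrays (HDA), hybridization filters, and serial analysis of gene expression (SAGE).

GEO currently stores about one billion individual gene expression measurements from more than 100 organisms, covering a wide range of biological questions. Some GEO records contain clinical information about patients. For example, series GSE2019 contains several hundred samples, and the Description field of each sample is annotated with clinical information, including the patient's age, sex, and ethnicity and the tumor's pathological stage and subtype (see Figure 1).

Because these basic research records contain patients' clinical data, they implicitly link basic research with clinical practice, which makes them valuable for translational medicine. How to organize and represent the clinical data in gene expression databases, especially the basic terms and concepts of this field and the relations among those concepts, is therefore a precondition for retrieving, storing, organizing, and using patient clinical data from different laboratories. It also underpins the future development of related knowledge bases and is the foundation for further translational medicine research.

For these reasons, this study proposes building an ontology to represent clinical information and knowledge in a standardized way.

In recent years, as ontology research has matured, ontology techniques have been adopted in more and more research fields and have become an important tool for integrating and interpreting biomedical data. Simply put, an ontology is an explicit, formal specification of the terms in a domain and the relations among them. In medicine, a large number of standardized, structured vocabularies have been developed, such as SNOMED and the Unified Medical Language System (UMLS), which greatly ease the communication, organization, representation, analysis, and use of information.

To this end, we collected the breast cancer gene expression records in GEO, selected those that contain patient information, and analyzed the terms and concepts they use. Using an ontology approach, we built a knowledge base that represents the clinical information of breast cancer patients in gene expression databases. The ontology allows breast cancer microarray data to be retrieved, analyzed, and interpreted more accurately, and thereby promotes translational science and systems science. It also provides a basis for applying ontologies to analyze and organize information for other tumor types and other high-throughput platforms.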
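To make the idea concrete, here is a minimal Python sketch of how free-text clinical annotations from different laboratories might be mapped onto shared ontology concepts. It is only an illustration under stated assumptions: the concept names, the synonym lists, the `key: value; key: value` layout of the description text, and the helper names `annotate` and `is_a` are all hypothetical and are not taken from the paper or from GEO's actual record format.

```python
import re

# Hypothetical mini-ontology of clinical concepts. "parent" gives the is-a
# hierarchy; "synonyms" lists labels that different laboratories might use.
ONTOLOGY = {
    "ClinicalFeature": {"parent": None, "synonyms": []},
    "PatientFeature": {"parent": "ClinicalFeature", "synonyms": []},
    "TumorFeature": {"parent": "ClinicalFeature", "synonyms": []},
    "Age": {"parent": "PatientFeature", "synonyms": ["age", "age at diagnosis"]},
    "Sex": {"parent": "PatientFeature", "synonyms": ["sex", "gender"]},
    "TumorStage": {"parent": "TumorFeature", "synonyms": ["stage", "pathological stage"]},
    "ERStatus": {"parent": "TumorFeature", "synonyms": ["er", "er status"]},
}

# Reverse index: lower-cased label -> concept name.
SYNONYM_INDEX = {
    synonym: concept
    for concept, entry in ONTOLOGY.items()
    for synonym in entry["synonyms"]
}


def annotate(description):
    """Map 'label: value' pairs in a description string to ontology concepts.

    Labels that are not known to the ontology are skipped.
    """
    result = {}
    for label, value in re.findall(r"([^:;]+):\s*([^;]+)", description):
        concept = SYNONYM_INDEX.get(label.strip().lower())
        if concept is not None:
            result[concept] = value.strip()
    return result


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]["parent"]
    return False


# Two made-up descriptions that use different labels for the same concepts.
lab_a = annotate("Age: 54; Gender: F; Pathological Stage: IIA; ER: positive")
lab_b = annotate("age at diagnosis: 61; sex: F; stage: III; batch: 7")
print(lab_a)
print(lab_b)
```

Once both records use the same concept names, a query such as "all tumor features of each sample" can be answered with `is_a(concept, "TumorFeature")` regardless of which laboratory's wording was used.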
Category: Bioinformatics | 5958 views | 2 comments
Seminar announcement: Toward Web 3.0: How Far Do We Still Have to Go?
timy 2009-8-28 16:06
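Forwarding this announcement:

Title: Toward Web 3.0: How Far Do We Still Have to Go?
Speaker: Dr. Zhisheng Huang, Department of Computer Science, Vrije Universiteit Amsterdam, the Netherlands
Time: Thursday, 10 September 2009, 2:00 p.m.
Venue: Conference Room 548, 5th floor, Institute of Scientific and Technical Information of China (15 Fuxing Road, Haidian District, Beijing, at the west gate of China Central Television)

About Dr. Huang: He received his PhD from the logic institute of the University of Amsterdam, the Netherlands. He is currently a senior researcher in the Department of Computer Science at Vrije Universiteit Amsterdam and a visiting professor at the School of Computer Science, Southeast University, and at Jiangsu University of Science and Technology. His research interests include the logical foundations of the Semantic Web, intelligent multimedia technology, formal theories and implementation of intelligent agents, applied logic for decision support systems, and reasoning about actions. He has published more than one hundred papers in high-level international journals and conferences. Homepage: http://www.cs.vu.nl/~huang/

Outline: Within just a few decades, the technical development and spread of the World Wide Web (WWW) has affected every aspect of human society in unprecedented ways. It has profoundly changed many people's daily lives and ways of working, and it has left human society an enormous wealth of information and knowledge. Web technology has evolved from Web 1.0 to Web 2.0 and is now moving toward Web 3.0, which is much debated and much anticipated. This talk analyzes the application prospects of Web 3.0 and its connection to the Semantic Web and ontology technology. By presenting the research of LarKC (Large Knowledge Collider), a large Semantic Web project in the European Union's Seventh Framework Programme, it describes in detail the technologies for massive-scale semantic processing and reasoning. The talk also introduces a series of semantic products and application systems to show the rich technical prospects of Web 3.0.

All are welcome, from inside and outside the institute!

Information Technology Support Center
Academic Committee
28 August 2009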
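Category: Peer Exchange | 4561 views | 0 comments
Perspectives on the Semantic Web and Ontology Technology, Part 2: Comments on the First Topical Discussion of the China Semantic Web Forum
ZSHuang 2009-3-11 22:25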
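At Admin's invitation, I am commenting on the first topical discussion on the Semantic Web: "What exactly is the Semantic Web, what are its defining features, and what can it bring us?" Over the past four months, participants in this China Semantic Web Forum have debated the most basic questions of the Semantic Web with enthusiasm. So far there have been 55 posts and nearly 8,000 views, which is no small feat for a purely academic thread. What I have seen is that everyone has approached this discussion in a spirit of scholarly inquiry, speaking freely and amicably and showing a healthy academic culture.

I should note that my comments below, whether they support or oppose your view, are not a final verdict on anyone's opinion. I hope to take part as an equal participant in the discussion. First, I thank everyone who took part; all of your opinions deserve encouragement and are therefore valuable.

What exactly is the Semantic Web, and what are its defining features? This is the core question that every Semantic Web researcher has to think about.

As Admin pointed out, the core idea of the SW has two aspects: semantics and web. Semantics refers to providing data that computers can understand, that is, the dimension of logical analysis and semantic representation. Web refers to the fact that such semantic data do not exist in isolation but are linked to one another into a network structure, that is, the dimension of data connection.

There are therefore four different understandings of what the Semantic Web is:
(1) semantic+web: the Semantic Web adds a little semantic analysis to existing web data, or adds some web description capability to existing semantic data.
(2) semantic+Web: here the first letter of Web is capitalized; the Semantic Web is mostly web, with relatively little semantic representation and processing.
(3) Semantic+web: the Semantic Web is mostly semantic processing, with relatively little web processing.
(4) Semantic+Web: the Semantic Web is a great deal of semantic processing plus a great deal of web processing.

For details see: http://bbs.xml.org.cn/dispbbs.asp?boardID=2&ID=69324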
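Category: Gleanings from the Sea of Science | 5490 views | 1 comment
Perspectives on the Semantic Web and Ontology Technology: Semantics and the Web
ZSHuang 2009-2-13 22:23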
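The core problem of the Semantic Web is to express the semantics of information on the web, that is, what we usually call its meaning. From the standpoint of logic and linguistics, semantics is the link between a description or a word (or, loosely, a concept) and the entity in the objective or subjective world that it is meant to express. For example, the semantics of the word "tiger" is the class of animals in the objective world to which it corresponds; the semantics of the concept "Sun Wukong" is the particular character described in people's cultural and imaginative world. Once such a link between a concept and the entity it denotes has been established, we can usually say that the meaning of the concept, its semantics, has been expressed. This is the denotational character of semantics.

More precisely, semantics has the following main properties:
Denotation: as described above, semantics should capture the link between a concept or term and the entity in the external world to which it corresponds.
Uniqueness: if different terms express the same meaning, they should point to one and the same external entity, not to several.
Relatedness: semantics should express how a concept is related to other concepts, rather than simply map it to an external entity.

Of course, meaning as people ordinarily understand it is far richer than these properties. Frege, one of the founders of mathematical logic, distinguished Reference from Sense. The former is the denotational property described above, as when we use "Teacher Zhang" to denote a particular person in the world. The latter is the connotation a description carries in its context of use; "Teacher Zhang" may, in a particular setting, also convey respect. On the Semantic Web we are concerned with the denotation of descriptions and its related properties, and for now we set aside the properties involved in connotation.

The Semantic Web achieves denotation by pointing a concept at a web resource. Specifically, it prefixes the description with a URI (Uniform Resource Identifier). To express the animal concept tiger, one uses a description like the following:

http://cohse.semanticweb.org/ontologies/animal#tiger

Here tiger is the direct description of the concept, and the preceding http://cohse.semanticweb.org/ontologies/animal# is the unique web resource identifier that corresponds to the concept. Clearly, the denotation shown here does not correspond directly to a counterpart in the objective world as we usually understand it (which in fact could not be done directly). In logic and mathematical linguistics too, semantics is achieved by linking linguistic statements to a semantic model. That model is only a formal mathematical description, and the most important feature of a formal semantic definition is that it achieves uniqueness and relatedness. The URI gives the Semantic Web a very effective means of achieving semantic uniqueness, because a URI always identifies a web resource uniquely. A not entirely accurate but vivid summary is: to be unambiguous is to be meaningful. That is, as long as a computer or a human can map a concept to an unambiguous denoted entity, the semantics of the concept can be considered grasped. Relatedness is achieved through ontology descriptions. As introduced earlier in this chapter, an ontology describes inclusion relations between concepts, relations between the individual and the general, and relations between part and whole, which fully express how concepts are related.

Web technology can therefore, to a certain extent, be used to express the semantics of descriptions, and it provides the most important foundation for automated machine processing. The key technology here is the development and use of web-based ontology languages.

Excerpted from Ma Zhanghua and Huang Zhisheng, 《网络信息资源组织》 (Organization of Network Information Resources), Chapter 8, "Ontology Technology and the Semantic Web", Peking University Press, 2007.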
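The three properties discussed above (denotation, uniqueness, relatedness) can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the namespace is the one quoted in the text, while the `big_cat` and `mammal` concepts, the synonym table, the `subClassOf` label, and the helper names are assumptions invented for the example.

```python
# Namespace quoted in the text; everything else below is illustrative.
ANIMAL_NS = "http://cohse.semanticweb.org/ontologies/animal#"


def uri(local_name):
    """Denotation: turn a bare description into a URI naming one web resource."""
    return ANIMAL_NS + local_name


# Uniqueness: different surface terms with the same meaning share one URI.
# (Hypothetical synonym table.)
SYNONYMS = {"tiger": "tiger", "老虎": "tiger", "Panthera tigris": "tiger"}


def denote(term):
    """Map a surface term to the single URI that stands for its meaning."""
    return uri(SYNONYMS[term])


# Relatedness: an ontology states how concepts relate to one another.
# (Hypothetical subclass assertions, stored as subject-predicate-object triples.)
SUBCLASS_OF = "subClassOf"
TRIPLES = {
    (uri("tiger"), SUBCLASS_OF, uri("big_cat")),
    (uri("big_cat"), SUBCLASS_OF, uri("mammal")),
}


def ancestors(concept):
    """Follow subClassOf links upward from a concept (single inheritance only)."""
    found = []
    current = concept
    while True:
        parents = [o for (s, p, o) in TRIPLES if s == current and p == SUBCLASS_OF]
        if not parents:
            return found
        current = parents[0]
        found.append(current)


def split_uri(full):
    """Split a URI at '#' into the namespace prefix and the local description."""
    prefix, _, local = full.partition("#")
    return prefix + "#", local


print(denote("老虎"))
print(ancestors(uri("tiger")))
print(split_uri(denote("tiger")))
```

The design choice mirrors the text: meaning is never looked up "in the world", only in identifiers and in the relations asserted between them, which is what lets a machine treat two differently spelled terms as the same concept.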
Category: Gleanings from the Sea of Science | 8443 views | 1 comment
A Strong Book Combining Ontologies and Data Mining
timy 2009-2-9 20:25
Data Mining with Ontologies: Implementations, Findings, and Frameworks

Source: https://igi-pub.com/reference/details.asp?ID=6844v=preface

Edited by: Hector Oscar Nigro, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina; Sandra Elizabeth Gonzalez Cisaro, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina; Daniel Hugo Xodo, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina

Preface

Data mining, also referred to as knowledge discovery in databases (KDD), is the process of finding new, interesting, previously unknown, potentially useful, and ultimately understandable patterns in very large volumes of data. It is a discipline that brings together database systems, statistics, artificial intelligence, machine learning, parallel and distributed processing, and visualization, among other disciplines (Fayyad et al., 1996; Hand Kamber, 2001; Hernández Orallo et al., 2004).

Nowadays, one of the most important and challenging problems in data mining is the definition of prior knowledge, which can originate from the process or from the domain. This contextual information may help to select the appropriate information, features, or techniques, reduce the hypothesis space, represent the output in a more comprehensible way, and improve the whole process.

We therefore need a conceptual model to help represent this knowledge. Following Gruber's definition of an ontology as explicit formal specifications of the terms in the domain and the relations among them (Gruber, 1993, 2002), we can represent both knowledge about the knowledge discovery process and knowledge about the domain. Ontologies are used principally for communication (between machines and/or humans), automated reasoning, and the representation and reuse of knowledge (Cimiano et al., 2004). As a result, an ontological foundation is a precondition for efficient automated use of knowledge discovery information.
Thus, we can view the relation between ontologies and data mining in two ways:

From ontologies to data mining: we incorporate knowledge into the process through the use of ontologies, that is, how experts understand and carry out the analysis tasks. Representative applications are intelligent assistants for the discovery process (Bernstein et al., 2001, 2005), interpretation and validation of mined knowledge, and ontologies for resource and service description and knowledge Grids (Cannataro et al., 2003; Brezany et al., 2004).

From data mining to ontologies: we include domain knowledge in the input information or use ontologies to represent the results, so the analysis is done over these ontologies. The most characteristic applications are in medicine, biology, and spatial data, such as gene representation, taxonomies, applications in geosciences, medical applications, and especially evolving domains (Langley, 2006; Gottgtroy et al., 2003, 2005; Bogorny et al., 2005).

When we can represent and include knowledge in the process through ontologies, we can transform data mining into knowledge mining.

Data Mining with Ontologies Cycle

Figure 1 shows our vision of the data mining with ontologies cycle.

Metadata ontologies: These ontologies establish how a variable is constructed, that is, which process allowed us to obtain its value, and how that value could vary if another method were used. Such an ontology must also express general information about how the variable is treated.

Domain ontologies: These ontologies express knowledge about the application domain.

Ontologies for the data mining process: These ontologies codify all knowledge about the process, that is, how to select features, how to select the best algorithms according to the variables and the problem, and how to establish valid process sequences (Bernstein, 2001, 2005; Cannataro, 2003, 2004).
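As a rough illustration of the third kind, a process ontology can record what each data mining operator requires and what it changes, so that a planner enumerates only valid operator sequences. The sketch below is a toy under stated assumptions: the operator names, the data-state flags (`numeric`, `categorical`, `has_missing`, `model`), and the `valid_processes` helper are hypothetical and are not taken from the systems of Bernstein et al. or Cannataro et al.

```python
# Hypothetical process ontology. For each operator:
#   requires: flags that must hold in the current data state
#   forbids:  flags that must not hold
#   adds / removes: how the operator changes the data state
OPERATORS = {
    "impute_missing": {
        "requires": {"has_missing"}, "forbids": set(),
        "adds": set(), "removes": {"has_missing"},
    },
    "discretize": {
        "requires": {"numeric"}, "forbids": {"has_missing"},
        "adds": {"categorical"}, "removes": {"numeric"},
    },
    "naive_bayes": {
        "requires": {"categorical"}, "forbids": {"has_missing", "numeric"},
        "adds": {"model"}, "removes": set(),
    },
    "decision_tree": {
        "requires": set(), "forbids": {"has_missing"},
        "adds": {"model"}, "removes": set(),
    },
}


def valid_processes(state, max_steps=3):
    """Enumerate operator sequences that are valid and end with a model."""
    results = []

    def extend(current, plan):
        if "model" in current:
            results.append(plan)  # a complete, valid process
            return
        if len(plan) == max_steps:
            return  # give up on plans that grow too long
        for name, op in OPERATORS.items():
            applicable = op["requires"] <= current and not (op["forbids"] & current)
            if applicable:
                extend((current - op["removes"]) | op["adds"], plan + [name])

    extend(frozenset(state), [])
    return results


for plan in valid_processes({"numeric", "has_missing"}):
    print(" -> ".join(plan))
```

Because applicability is decided from the declared properties alone, adding a new technique means adding one entry to the ontology rather than changing the planner.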
According to Gomez-Perez and Manzano-Macho (2003), the methods and approaches that allow ontologies or semantics to be extracted from database schemas can be classified along three dimensions: main goal, techniques used, and sources used for learning. The attributes of each dimension, summarizing ontology learning methods from relational schemas, are:

Main goal: to map a relational schema to a conceptual schema; to create (and refine) an ontology; to create ontological instances (from a database); to enhance ad hoc queries.
Techniques used: mappings; reverse engineering; inductive inference; rule generation; graphic modeling.
Sources used for learning: relational schemas (of a database); schemas of domain-specific databases; flat files; relational databases.

In the next paragraphs we explain in more detail the three classes of ontologies, drawing on earlier works from different knowledge fields.

Domain Ontology

The models that many scientists use to represent their working hypotheses are generally cause-effect diagrams. Models make use of general laws or theories to predict or explain behavior in specific situations. Currently these cause-effect diagrams can easily be translated into ontologies by means of conceptual maps that organize a taxonomy into central concepts, main concepts, secondary concepts, and specific concepts.

Discovery systems produce models that are valuable for prediction, but they should also produce models stated in some declarative format that can be communicated clearly and precisely and that helps people understand observations in terms familiar to them (Bridewell, 2006; Langley, 2002, 2006). Models can take different forms and levels of abstraction, but the more complex the phenomenon they account for, the more important it is that they be cast in some formal notation with an unambiguous interpretation.
This new knowledge can then be easily communicated and updated between systems and knowledge bases. In the data mining field in particular, knowledge can be represented in different formalisms, known as models, such as rules, decision trees, and clusters. Discovery systems should generate knowledge in a format that is familiar to domain users. There is an important relation between knowledge structures and the discovery process in machine learning: knowledge structures are important outputs of the discovery process and also important inputs to discovery (Langley, 2000). Thus knowledge plays as crucial a role as data in the automation of discovery. Ontologies therefore provide a structure capable of supporting knowledge representation about the domain.

Metadata Ontologies

As Spyns et al. (2002) affirm, ontologies in current computer science parlance are computer-based resources that represent agreed domain semantics. Unlike data models, the fundamental asset of ontologies is their relative independence from particular applications: an ontology consists of relatively generic knowledge that can be reused by different kinds of applications and tasks. A data model, by contrast, represents the structure and integrity of the data elements of the, in principle "single", specific enterprise application(s) by which it will be used. Consequently, the conceptualization and the vocabulary of a data model are not intended a priori to be shared by other applications (Gottgtroy et al., 2005). Similarly, in data modeling practice, the semantics of data models often constitute an informal agreement between the developers and the users of the data model, including when a data warehouse is designed, and in many cases the data model is updated as it evolves, when particular new functional requirements arise, without any significant update in the metadata repository. Ontology models and data models are nevertheless similar in scope and task.
Both are context-dependent knowledge representations; that is, there is no strict line between generic and specific knowledge when building an ontology. Moreover, both modeling techniques are knowledge-acquisition-intensive tasks, and the resulting models represent a partial account of conceptualizations (Gottgtroy et al., 2003). In spite of the differences, we should consider the similarities, and the fact that data models carry a lot of useful hidden knowledge about the domain in their data schemas, in order to build ontologies from data and improve the process of knowledge discovery in databases. The fact that data schemas lack the semantic knowledge required to guide ontology construction intelligently has been presented as a challenge for database and ontology engineers (Gottgtroy et al., 2003).

Ontologies for Data Mining Process

The view of the KDD process has changed over time. In the beginning the main objective was to extract a valuable pattern from a flat file by trial and error. As time went by, researchers and, above all, practitioners came to discuss the importance of a priori knowledge, knowledge and understanding of the problem, the choice of discovery methodology, and expertise in similar situations. An important question arises: to what extent is such investment in data mining projects worthwhile?

As practitioners and researchers in this field, we can see that expertise is very important and that knowledge about the domain is helpful and simplifies the process. To make the process more attractive to managers, practitioners must carry it out more efficiently and reuse experience. We can therefore codify statistical and machine learning knowledge in ontologies and use it.

Bernstein et al. (2001) developed the concept of the intelligent discovery assistant (IDA), which helps data miners explore the space of valid data mining processes.
It takes advantage of an explicit ontology of data mining techniques, which defines the various techniques and their properties. Its main characteristics are (Bernstein et al., 2005): a systematic enumeration of valid DM processes, so that users do not miss important, potentially fruitful options; effective rankings of these valid processes by different criteria, to help users choose between the options; and an infrastructure for sharing data mining knowledge, which leads to what economists call network externalities.

Cannataro and colleagues have made another interesting contribution to this kind of ontology. They developed an ontology that can be used to simplify the development of distributed knowledge discovery applications on the Grid. It offers a domain expert a reference model for the different kinds of data mining tasks, methodologies, and software available to solve a given problem, helping a user find the most appropriate solution (Cannataro et al., 2003, 2004). The authors adopted the Enterprise Methodology (Corcho et al., 2003).

Research Works in the Topic

The next paragraphs describe the most recent research works in the field of data mining with ontologies.

Singh, Vajirkar, and Lee (2003) developed a context-aware data mining framework that provides accuracy and efficacy to data mining outcomes. Context factors were modeled using an ontological representation. Although the proposed context-aware framework is generic and can be applied to most fields, the medical scenario provided served as a proof of concept for the model.

Hotho, Staab and Stumme (2003) showed that using ontologies as filters in term selection, prior to applying a K-means clustering algorithm, increases the tightness and relative isolation of document clusters as a measure of improvement.

Pand and Shen (2005) proposed an architecture for knowledge discovery in evolving environments.
The architecture creates a communication mechanism that incorporates known knowledge into the discovery process through an ontology service facility. The continuous mining is transparent to the end user; moreover, the architecture supports logical and physical data independence.

Rennolls (2005, p. 719) developed an intelligent framework for data mining, knowledge discovery, and business intelligence. The ontological framework guides the user in choosing models from an expanded data mining toolkit, and the epistemological framework assists the user in interpreting and appraising the discovered relationships and patterns.

On domain ontologies, Pan and Pan (2006) proposed the ontobase ontology repository. This implementation allows users and agents to retrieve ontologies and metadata through open Web standards and an ontology service. Key features of the system include the use of XML Metadata Interchange to represent and import ontologies and metadata, support for smooth transformation and transparent integration using ontology mapping, and the use of ontology services to share and reuse domain knowledge in a generic way.

Recently, Bounif et al. (2006) explained a new approach to database schema evolution and outlined the use of a domain ontology. Their approach belongs to a new tendency, that of a priori approaches. It involves investigating potential future requirements, in addition to current requirements, during the standard requirements analysis phase of schema design or redesign, and including them in the conceptual schema. Those requirements are determined with the help of a domain ontology called "a requirements ontology", using data mining techniques and a schema repository.

Book Organization

This book is organized into three major sections dealing respectively with implementations, findings, and frameworks.
Section I: Implementations includes applications or case studies on data mining with ontologies.

Chapter I, TODE: An Ontology-Based Model for the Dynamic Population of Web Directories, by Sofia Stamou, Alexandros Ntoulas, and Dimitris Christodoulakis, studies how the continuously proliferating Web content can be organized into topical categories, also known as Web directories. The authors have implemented a system named TODE, which uses a Topical Ontology for Directories' Editing. TODE's performance is also evaluated; experimental results imply that the use of a rich topical ontology significantly increases classification accuracy for dynamic contents.

Chapter II, Raising, to Enhance Rule Mining in Web Marketing with the Use of an Ontology, by Xuan Zhou and James Geller, introduces Raising, an operation used as a preprocessing step for data mining. Rules are derived using demographic and interest information as input for data mining. The Raising step takes advantage of an interest ontology to advance data mining and improve rule quality. The effects of Raising are analyzed in detail, showing an improvement in the support and confidence values of association rules that are useful for marketing purposes.

Chapter III, Web Usage Mining for Ontology Management, by Brigitte Trousse, Marie-Aude Aufaure, Bénédicte Le Grand, Yves Lechevallier, and Florent Masseglia, proposes an original approach to ontology management in the context of Web-based information systems. Their approach relies on usage analysis of the chosen Web site, complementing existing approaches based on content analysis of Web pages. One major contribution of this chapter is the application of usage analysis to support ontology evolution and/or Web site reorganization.
Chapter IV, SOM-Based Clustering of Multilingual Documents Using an Ontology, by Minh Hai Pham, Delphine Bernhard, Gayo Diallo, Radja Messai, and Michel Simonet, presents a method that uses a Self-Organizing Map (SOM) to cluster medical documents. The originality of the method is that it relies not on the words shared by documents but on concepts taken from an ontology. The goal is to cluster various medical documents into thematically consistent groups. The authors compare the results for two indexing schemes: stem-based indexing and conceptual indexing.

Section II: Findings comprises more theoretical aspects of data mining with ontologies, such as ontologies for interpretation and validation, and domain ontologies.

Chapter V, Ontology-Based Interpretation and Validation of Mined Knowledge: Normative and Cognitive Factors in Data Mining, by Ana Isabel Canhoto, addresses the role of cognition and context in the interpretation and validation of mined knowledge. She proposes the use of ontology charts and norm specifications to map how varying levels of access to information and exposure to specific social norms lead to divergent views of mined knowledge. Domain knowledge and bias information influence which patterns in the data are deemed useful and, ultimately, valid.

Chapter VI, Data Integration Through Protein Ontology, by Amandeep S. Sidhu, Tharam S. Dillon, and Elizabeth Chang, discusses the conceptual framework of a Protein Ontology that has: a hierarchical classification of concepts represented as classes, from general to specific; a list of attributes related to each concept, for each class; a set of relations between classes that link concepts in the ontology in more complicated ways than the hierarchy implies, to promote reuse of concepts in the ontology; and a set of algebraic operators to query protein ontology instances.
Chapter VII, TtoO: Mining a Thesaurus and Texts to Build and Update a Domain Ontology, by Josiane Mothe and Nathalie Hernandez, introduces a method that reuses a thesaurus built for a given domain in order to create new resources of a higher semantic level in the form of an ontology. The originality of the method is that it is based on both the knowledge extracted from a thesaurus and the knowledge semiautomatically extracted from a textual corpus. In parallel, the authors have developed mechanisms based on the resulting ontology to accomplish a science-monitoring task. An example is provided in the chapter.

Chapter VIII, Evaluating the Construction of Domain Ontologies for Recommender Systems Based on Texts, by Stanley Loh, Daniel Lichtnow, Thyago Borges, and Gustavo Piltcher, investigates different aspects of constructing a domain ontology for a content-based recommender system. The chapter discusses different approaches to constructing the domain ontology, including the use of text mining software tools for supervised learning, the interference of domain experts in the engineering process, and the use of a normalization step.

Section III: Frameworks includes different architectures for different domains in the context of data warehousing or mining with ontologies.

Chapter IX, by Vania Bogorny, Paulo Martins Engel, and Luis Otavio Alvares, introduces the problem of mining frequent geographic patterns and spatial association rules from geographic databases. A large number of natural geographic associations are explicitly represented in geographic database schemas and geo-ontologies, which have not been used so far in frequent geographic pattern mining. The main goal of this chapter is to show how the large amount of knowledge represented in geo-ontologies can be used as prior knowledge to avoid extracting patterns already known to be noninteresting.
Chapter X, Ontology-Based Construction of Grid Data Mining Workflows, by Peter Brezany, Ivan Janciak, and A Min Tjoa, introduces an ontology-based framework for the automated construction of complex interactive data mining workflows. The authors present their solution, called GridMiner Assistant (GMA), which addresses the whole life cycle of the knowledge discovery process. In addition, the conceptual and implementation architectures of the framework are presented, and its application to an example taken from the medical domain is illustrated.

Chapter XI, Ontology-Based Data Warehousing and Mining Approaches in Petroleum Industries, by Shastri L. Nimmagadda and Heinz Dreher. Complex, heterogeneous geo-spatial data structures complicate the accessibility and presentation of data in petroleum industries. A data warehousing approach supported by ontology is described for effective data mining. An ontology-based data warehousing framework with fine-grained multidimensional data structures facilitates the mining and visualization of data patterns, trends, and correlations hidden in massive volumes of data.

Chapter XII, A Framework for Integrating Ontologies and Pattern-Bases, by Evangelos Kotsifakos, Gerasimos Marketos, and Yannis Theodoridis, proposes the integration of pattern base management systems (PBMS) and ontologies. It is offered as a solution to the need of many scientific fields for efficient extraction of useful information from large databases and for the exploitation of knowledge. The authors use a case study of data mining over scientific (seismological) data to illustrate their proposal.

Book Objective

This book aims at publishing original academic work with high-quality scientific papers. The key objective is to provide data mining students, practitioners, professionals, professors, and researchers with an integral vision of the topic.
This book specifically focuses on those areas that explore new methodologies or examine real case studies that are ontology-based. The book describes the state of the art, innovative theoretical frameworks, advanced and successful implementations, and the latest empirical research findings in the area of data mining with ontologies.

Audience

The target audience of this book is readers who want to learn how to apply ontology-based data mining to real-world problems. The purpose is to show users how to go from theory and algorithms to real applications. The book is also geared toward students, practitioners, professionals, professors, and researchers with a basic understanding of data mining. The information technology community can increase its knowledge and skills with these new techniques. People working in the Knowledge Management area, such as engineers, managers, and analysts, can also read it, because data mining, ontologies, and knowledge management are directly linked.

References

Bernstein, A., Hill, S., Provost, F. (2001). Towards intelligent assistance for the data mining process: An ontology-based approach. CeDER Working Paper IS-02-02, New York University.
Bernstein, A., Provost, F., Hill, S. (2005). Towards intelligent assistance for the data mining process: An ontology-based approach for cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 17(4), 503-518.
Bogorny, V., Engel, P. M., Alvares, L. O. (2005). Towards the reduction of spatial join for knowledge discovery in geographic databases using geo-ontologies and spatial integrity constraints. In M. Ackermann, B. Berendt, M. Grobelnik, V. Svátek (Eds.), Proceedings ECML/PKDD Second Workshop on Knowledge Discovery and Ontologies (pp. 51-58).
Bounif, H., Spaccapietra, S., Pottinger, R. (2006, September 12-15). Requirements ontology and multirepresentation strategy for database schema evolution. Paper presented at the 2nd VLDB Workshop on Ontologies-based techniques for Databases and Information Systems. Seoul, Korea.
Brezany, P., Janciak, I., Woehrer, A., Tjoa, A. M. (2004). GridMiner: A framework for knowledge discovery on the Grid from a vision to design and implementation. Cracow Grid Workshop. Cracow, Poland: Springer.
Bridewell, W., Sánchez, J. N., Langley, P., Billman, D. (2006). An interactive environment for the modeling and discovery of scientific knowledge. International Journal of Human-Computer Studies, 64, 1009-1014.
Cannataro, M., Comito, C. (2003, May 20-24). A data mining ontology for Grid programming. Paper presented at the I Workshop on Semantics Peer to Peer and Grid Computing. Budapest. Retrieved March, 2006, from http://www.isi.edu/~stefan/SemPGRID
Cannataro, M., Congiusta, A., Pugliese, A., Talia, D., Trunfio, P. (2004). Distributed data mining on Grids: Services, tools, and applications. IEEE Transactions on Systems, Man and Cybernetics, Part B, 34(6), 2451-2465.
Cimiano, P., Stumme, G., Hotho, A., Tane, J. (2004). Conceptual knowledge processing with formal concept analysis and ontologies. In Proceedings of The Second International Conference on Formal Concept Analysis (ICFCA 04).
Corcho, O., Fernández-López, M., Gómez-Pérez, A. (2003). Methodologies, tools and languages for building ontologies: Where is their meeting point? Data & Knowledge Engineering, 46(1), 41-64. Amsterdam: Elsevier Science Publishers B. V.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (1996). Advances in knowledge discovery and data mining. Menlo Park, California: AAAI Press.
Gómez Pérez, A., Manzano Macho, D. (Eds.) (2003). Survey of ontology learning methods and techniques. Deliverable 1.5 OntoWeb Project Documentation. Universidad Politécnica de Madrid. Retrieved November, 2006, from http://www.deri.at/fileadmin/documents/deliverables/Ontoweb/D1.5.pdf
Gottgtroy, P., Kasabov, N., MacDonell, S. (2003, December). An ontology engineering approach for knowledge discovery from data in evolving domains. In Proceedings of Data Mining 2003: Data Mining IV. Boston: WIT.
Gottgtroy, P., MacDonell, S., Kasabov, N., Jain, V. (2005). Enhancing data analysis with Ontologies and OLAP. Paper presented at Data Mining 2005, Sixth International Conference on Data Mining, Text Mining and their Business Applications, Skiathos, Greece.
Gruber, T. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199-220.
Gruber, T. (2002). What is an ontology? Retrieved November, 2006, from http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
Han, J., Kamber, M. (2001). Data mining: Concepts and techniques. Morgan Kaufmann.
Hernández Orallo, J., Ramírez Quintana, M., Ferri Ramirez, C. (2004). Introducción a la Minería de Datos. Madrid: Editorial Pearson Educación SA.
Hotho, A., Staab, S., Stumme, G. (2003). Ontologies improve text document clustering. In Proceedings of the 3rd IEEE Conference on Data Mining, Melbourne, FL (pp. 541-544).
Langley, P. (2000). The computational support of scientific discovery. International Journal of Human-Computer Studies, 53, 393-410.
Langley, P. (2006). Knowledge, data, and search in computational discovery. Invited talk at the International Workshop on Feature Selection for Data Mining: Interfacing Machine Learning and Statistics (FSDM), April 22, 2006, Bethesda, Maryland, in conjunction with the 2006 SIAM Conference on Data Mining (SDM).
Pan, D., Shen, J. Y. (2005). Ontology service-based architecture for continuous knowledge discovery. In Proceedings of International Conference on Machine Learning and Cybernetics, 4, 2155-2160. IEEE Press.
Pan, D., Pan, Y. (2006, June 21-23). Using ontology repository to support data mining. In Proceedings of the Sixth World Congress on Intelligent Control and Automation, Dalian, China (pp. 5947-5951).
Rennolls, K. (2005). An intelligent framework (O-SS-E) for data mining, knowledge discovery and business intelligence. Keynote paper. In Proceedings of the 2nd International Workshop on Philosophies and Methodologies for Knowledge Discovery, PMKD'05, in the DEXA'05 Workshops (pp. 715-719). IEEE Computer Society Press. ISBN 0-7695-2424-9.
Singh, S., Vajirkar, P., Lee, Y. (2003). Context-based data mining using ontologies. In Song, I., Liddle, S. W., Ling, T. W., Scheuermann, P. (Eds.), Proceedings 22nd International Conference on Conceptual Modeling. Lecture Notes in Computer Science (vol. 2813, pp. 405-418). Springer.
Spyns, P., Meersman, R., Jarrar, M. (2002). Data modeling versus ontology engineering. SIGMOD Record Special Issue on Semantic Web, Database Management and Information Systems, 31.
Category: Text Mining | 11522 views | 5 comments
The 4th National Symposium on Semantic Web and Ontology (SWON2009) [repost]
timy 2009-1-4 15:34
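Reposted from: http://www.jos.org.cn/ch/reader/view_news.aspx?id=2009010495036001

The 4th National Symposium on Semantic Web and Ontology (SWON2009)
Call for Papers
(September 26-28, 2009, China University of Mining and Technology, Xuzhou)

The Semantic Web draws on research from artificial intelligence, information theory, philosophy, logic, computational complexity, and other disciplines. It seeks to improve how information on the Web is represented and accessed, in order to remove the bottlenecks of today's Web. Its core idea is to add semantic information so that computers can take part in processing Web information automatically, which provides the technical foundation for intelligent Web applications.

The National Symposium on Semantic Web and Ontology (SWON) is a conference series sponsored by the China Computer Federation and its Technical Committee on E-Government and Office Automation. SWON 2009 will be held in Xuzhou in September 2009. The symposium aims to give the Semantic Web academic and industrial communities a forum for exchange, and to reflect the latest international and domestic research results and progress on the Semantic Web.

The principal accepted papers are tentatively planned to be published in English in a regular volume by IEEE Computer Society Press (EI-indexed). The remaining papers will be published in a special issue of the core journal 《计算机科学》 (Computer Science), in a regular issue of 《计算机与数字工程》 (Computer and Digital Engineering), and by Tsinghua University Press (the venues will be decided according to the number of accepted papers). Besides paper presentations, the symposium will invite distinguished experts to give keynote talks and will continue to select outstanding student papers.

I. Topics (including but not limited to)
Semantic Web languages and tools
Semantic Web knowledge representation
Semantic Web knowledge management
Semantic Web reasoning
Semantic Web services
Semantic Web security
Semantic Web mining
Semantic annotation
Semantic retrieval and querying
Ontology learning and metadata generation
Ontology storage and management
Semantic integration and mapping

II. Submission Requirements
1. Submissions are mainly accepted online; please avoid submitting by email where possible. Paper manuscripts are not accepted. Multiple submission of the same manuscript is strictly prohibited.
2. Manuscripts may be in Chinese or English and should generally not exceed 6000 characters. To facilitate publication of the proceedings, each manuscript must include abstracts in Chinese and English, keywords, funding sources, and main references, and must state the names of the authors and the main contact person, their affiliations, detailed mailing addresses (including email addresses), and author biographies. Manuscripts must be in WORD or PDF format.

III. Contact Information
1. Submission site: http://www.easychair.org/conferences/?conf=swon09
2. Conference website: http://www.neu.edu.cn/wisa2009
3. Conference affairs: 姜淑娟 (Jiang Shujuan), China University of Mining and Technology (shjjiang@cumt.edu.cn)

IV. Important Dates
1. Submission deadline: April 25, 2009
2. Notification of acceptance: May 20, 2009
3. Camera-ready papers due: June 5, 2009
4. Symposium dates: September 26-28, 2009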
Category: Peer Exchange | 4531 views | 0 comments
Building an Ontology Myself
zilu85 2008-11-8 09:27
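They say a swimming coach does not necessarily know how to swim. So although I have supervised 2 students in developing ontologies, I had never actually practiced it hands-on myself. I understood all the principles, but it was armchair theorizing: professing a love for something I had never faced, and passing myself off as a player in the ensemble. Now that I am here and have to play solo, I will walk through the whole process from start to finish, properly, so the trip is not wasted. After all, the teachers here are experienced.

Building an ontology is not the same as knowing how to use an ontology editor (such as Protege, which I use). It means having a clear understanding of the concepts and concept relations of the domain to be represented, and thinking about how to design something such that everyone accepts the ontology built from your thinking as the knowledge framework they have in mind. So the real skill lies outside the tool, and personally I feel the engineering side is not what matters. This is easy to say and very hard to do. I keep failing to tell classes from properties. Take the simplest case: a patient is 40 years old. Should age be listed as a class, or as a property? Sometimes I feel it should be a class, and sometimes I feel it is a property of some class. I went back and reread the materials that introduce the most basic concepts, and this time my understanding rose another level. It seems that to learn something you have to really practice it and then go back to the theory. Such a simple truth, and yet here I am experiencing it and summing it up again.

When I get back, I will be able to discuss ontologies with full confidence with anyone who brings them up, because I have built one with my own hands. It is no invention, but the experience has given me confidence.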
Category: Leisure | 3403 views | 2 comments
Biological Relation Extraction and Query Answering from MEDLINE Abstracts Using Ontology-Based Text Mining
zilu85 2008-10-14 02:14
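The rapid growth in stored biological text data has made it difficult for people to obtain the information they need conveniently and effectively. The problem arises because most of the information is implicit in unstructured or semi-structured text that computers cannot easily interpret.

This paper presents an ontology-based Biological Information Extraction and Query Answering (BIEQA) system. The system first mines the text for a set of concepts stored in a biological ontology, and then applies natural language processing and co-occurrence analysis to mine possible biological relations between the concepts. Using text mining, the system extracts the biological relations that occur frequently between each pair of biological concepts. Each mined relation is labeled with a fuzzy membership value, equal to the ratio of that relation's frequency to the frequency of relations in the whole document collection; such a relation is called a fuzzy biological relation. The fuzzy biological relations extracted from the text collection are stored in a database together with other relevant information, such as the biological entities that occur in the relations.

The database is integrated with a query processing module. The query processing module has an interface that guides users in formulating formal queries at different levels of specificity.

Biological relation extraction and query answering from MEDLINE abstracts using ontology-based text mining. Muhammad Abulaish and Lipika Dey. Data & Knowledge Engineering, Volume 61, Issue 2, May 2007, Pages 228-262.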
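As a rough illustration of the fuzzy membership value described in this summary, the sketch below divides each relation's count by the total number of relation occurrences in a toy collection. The function name, the triple layout, and the sample mentions are hypothetical, and the exact normalization used in the BIEQA paper may differ from this reading of the summary.

```python
from collections import Counter

def fuzzy_memberships(relation_mentions):
    """Map each (concept_a, relation, concept_b) triple to count / total count."""
    counts = Counter(relation_mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {triple: n / total for triple, n in counts.items()}

# Hypothetical relation mentions, as if extracted from a small set of abstracts.
mentions = [
    ("protein_A", "activates", "gene_B"),
    ("protein_A", "activates", "gene_B"),
    ("protein_A", "inhibits", "gene_C"),
    ("protein_D", "binds", "protein_A"),
]
memberships = fuzzy_memberships(mentions)
for triple, value in memberships.items():
    print(triple, value)
```

A relation seen more often across the collection gets a higher membership value, which a query module could use to rank or filter answers.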
Category: Biomedical Text Mining | 4806 views | 1 comment
A Method for Building a Medical Ontology from Textual Resources (Abridged Translation)
zilu85 2008-9-30 23:15
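In medicine, it is widely accepted that unambiguous vocabularies should be developed by building ontologies. The goal of this study is to help pneumology specialists code their diagnostic and therapeutic activities, using software that represents medical knowledge through a domain ontology. This paper presents our knowledge engineering method for building a medical ontology from terms extracted from text. Natural language processing tools were applied to the text of patient discharge summaries to develop the resources needed to build a pneumology ontology. The results show that, for building ontologies of this kind, combining distributional analysis with lexico-syntactic patterns gives satisfactory results.

Introduction

For nearly 10 years, French public hospitals have been exchanging information about their medical activities. Information on each patient can be collected from the patient's discharge summary, and each patient's diagnoses are classified using the International Classification of Diseases. In France, coding is generally done manually by physicians using specialized medical vocabularies. These vocabularies were compiled to help physicians code commonly used terms, and the coding tools built on them clearly do not meet physicians' needs accurately. In fact, the vocabularies suffer from ambiguous word meanings and incomplete coverage, and maintaining their consistency and completeness is also a problem. More seriously, partly because of this ambiguity, inconsistent coding has become a well-known problem. The literature has therefore proposed that automating the coding task requires a conceptual organization of medical terms, that is, the meaning of these terms should be written into the model structure of an ontology. An ontology is a formal structure whose goal is to represent a particular domain of knowledge through the organization of basic elements, concepts, and their definitions and interrelations. We believe that developing ontological resources will help in developing efficient, reliable, advanced coding tools.

Objective

We hold that the classification criteria structuring the taxonomy should be designed according to the purpose for which the ontology is developed. We note that there is as yet no ontology covering the French-language coding process in pneumology. The goal of this study is to build such an ontology. There are many reports on methods for building ontologies, but few describe the conceptualization step in detail, that is, the process of acquiring and organizing concepts and their relations. The main constraint of our study is that the ontology has to be built by knowledge engineers rather than directly by physicians. For a knowledge engineer, the main problem is to identify and classify the concepts of a domain. We applied a text-driven method and took text reports as the main source of information. Natural language processing tools were used to analyze the corpus. The method adopted in this paper is based on differential semantics principles. The main hypothesis of our study is that combining the following two methods improves the efficiency of ontology building: 1) distributional analysis, to build terminological resources; 2) identifying semantic relations by observing the sentences in the corpus that express the required relations.

The paper first presents the materials and tools used in the study. The methods section then details the steps of building the ontology. The results section presents statistical measures for evaluating the ontology, the ontology's coverage of the specialty, and its use in assisted coding. Finally, conclusions are drawn from a discussion of what the study achieved.

http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1839277blobtype=pdf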
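To make the idea of a lexico-syntactic pattern more concrete, here is a minimal sketch that reads candidate is-a relations off one English "X such as Y, Z and W" pattern. It is only an illustration under stated assumptions: the study worked on French discharge summaries with dedicated NLP tools, and the pattern, the function name `extract_is_a`, and the sample sentence are all hypothetical. The regular expression is deliberately crude and would over-match on longer sentences.

```python
import re

# One pattern of the form "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
SUCH_AS = re.compile(r"(?P<hyper>[a-z]+) such as (?P<hypos>[a-z ,]+)")

def extract_is_a(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group("hyper")
        # Split the enumeration on commas and on the words "and" / "or".
        parts = re.split(r",|\band\b|\bor\b", match.group("hypos"))
        for part in parts:
            hyponym = part.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs

sentence = ("The patient has a history of infections such as "
            "pneumonia, bronchitis and tuberculosis.")
pairs = extract_is_a(sentence)
print(pairs)
```

Candidate pairs like these would still need review by a knowledge engineer before being added to the ontology, which matches the semi-manual process the abstract describes.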
Category: Bioinformatics | 4455 views | 0 comments
