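After reading Prof. Wu Yishan's blog post "The Historical Connections among Scientometrics, the History of Science, and Information Science", I found my interest in how these disciplines relate to one another rekindled, and I couldn't resist sharing my own thoughts. Ha, I hope you won't laugh: only now have I begun to make some sense of such a simple question.

Whether we call it informetrics, bibliometrics, or scientometrics, the core activity is counting; you could do it on your fingers or with an abacus. Going by how the fields are named, information and documents are the objects being counted, while science is the object to which the counting is applied. Seen this way, they are really two aspects of one and the same activity (counting), much as a young girl wears different clothes for different occasions. Bibliometrics has a somewhat longer history, and informetrics and scientometrics seem to have been proposed a little later. Bibliometrics has three classic laws (Lotka, Bradford, Zipf), two regularities (growth and obsolescence), and citation analysis, so it looks rather complicated. Yet from the standpoint of application, I personally think bibliometrics solves essentially two kinds of problems:

(1) Analysis of the structure of science, i.e. the research themes of a given topical field. Whether it is co-citation analysis of papers, co-word clustering of subject headings or keywords, author co-citation analysis, or even the distribution of citation ages, all of these reveal the basic state of research activity by counting documents, which of course belongs to scientometrics.

(2) Performance evaluation: evaluating the research output of institutions, individuals, or even whole countries, for example by the number of publications and the number of citations. This can be scaled up to the national level or down to the level of individual journals. If this research is not scientometrics, what is it?

So, in terms of what it is applied to, bibliometrics is a component of scientometrics; one could even say that bibliometrics is still the main research method of scientometrics to date. If we go a step deeper and no longer use the whole article as the unit of counting, but instead extract subject terms, concepts, or knowledge from the articles, together with the relationships among them, then we can speak of informetrics. It has made little progress, however, and has since been largely taken over by data mining.

This brings us to text mining, which has long puzzled me. Ever since 1986, when I attended my first class with Prof. Qiu Junping at Wuhan University, I have worked on bibliometrics. In recent years I have described my field as data mining and knowledge discovery, which admittedly smacks of following fashion, and I have privately wondered whether what I do is still just the same old bibliometrics. Although I can recite by heart the definition of data mining, the main tasks of text mining, and even the main research directions of text mining in biomedicine (mostly because I teach them), I was still afraid of being asked this question. Later, a concrete example gave me some insight into the difference between the two.

A persistent student asked me: what are the main external factors affecting research on health equity? He hoped I could answer it with the methods of information science. My first instinct was to find the literature on health equity, extract the concepts in it and the relationships between them, form rules or templates, and then go back to the larger document collection to find the answer. It turned out that what I got were the factors affecting health equity, not the factors affecting health equity research. That is precisely the difference between bibliometrics and text mining:

Finding the factors that affect health equity research is really a task for bibliometrics or scientometrics. It is an analysis of how a field is developing, so the methods should still come from bibliometrics.

Finding the factors that affect health equity itself is a task for text mining, just like the protein-protein interactions and gene-disease relationships (which diseases can this gene cause?) that appear in the many papers now being published in bioinformatics. I once also analyzed the side effects of aspirin, but at that time I was still muddled about how the disciplines relate.

Finally, let's review the main tasks of text mining in biomedicine.

2005: Aaron M. Cohen and William R. Hersh. A survey of current work in biomedical text mining. Briefings in Bioinformatics, 6(1): 57-71, March 2005.
(1) Named entity recognition: identifying the various names of a certain kind of thing in a document collection, for example all drug names in a set of journal articles, or gene names and symbols in a set of MEDLINE abstracts.
(2) Text classification: automatically deciding whether a document has a certain property, usually whether it discusses a given topic or contains a specific type of information.
(3) Synonym and abbreviation extraction: mainly extracting previously unrecorded synonyms or abbreviations of gene names.
(4) Relationship extraction: detecting that a specific pair of entities stands in some predefined relationship, for example the various biomedical relationships among genes, proteins, or drugs, or one particular kind of relationship (such as regulation).
(5) Hypothesis generation: based on Swanson's discovery of complementary but non-interactive literatures.
(6) Integration frameworks: TXTGate, PubMatrix, Textpresso, etc.

2007: Pierre Zweigenbaum, Dina Demner-Fushman, Hong Yu, Kevin B. Cohen. Frontiers of biomedical text mining: current progress. Brief Bioinform. 2007 September; 8(5): 358-375.
(1) Extracting facts from texts
(1.1) Named entity recognition
(1.2) Identifying relations between biomedical entities
(2) Beyond information extraction
(2.1) Summarization: automatically summarizing texts, identifying the most important content of one or more papers and presenting it concisely and in a standardized form.
(2.2) Processing non-textual material: using image analysis and natural language processing to analyze figures, tables, and their accompanying text, or to handle special kinds of text such as chemical compound names.
(2.3) Question answering: high-precision literature retrieval that returns a short answer together with supporting material and links.
(2.4) Literature-based discovery: again, Swanson's work.
(3) Assessment and user-focused systems
(3.1) Annotated text collections and large-scale evaluation: corpora and similar resources for evaluating text mining systems.
(3.2) Understanding user needs: taking users' needs, behavior, and interaction with the tools into account during system development, in order to judge whether biomedical informatics services and tools are necessary and useful. For example, the development of the FlyBase database drew on observations of user behavior and on user feedback.

Summarizing the two articles above, we can roughly conclude that text mining in biomedicine mainly covers: (1) the basic techniques of text mining, such as named entity recognition and relation extraction; (2) applied research built on those basic techniques; and (3) research on developing and evaluating systems.

Oh, and here is a simple introductory reading on text mining for any interested colleagues: K. Bretonnel Cohen, Lawrence Hunter. Getting Started in Text Mining. PLoS Computational Biology, 2008, 4(1): e20. (www.ploscompbiol.org)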
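The post's central claim is that bibliometric structure analysis is, at bottom, counting. As a minimal sketch of what that counting looks like for co-word analysis, assuming each paper is represented simply as a list of its keywords (the input format and sample records are hypothetical, not real data), the snippet below tallies how many papers each keyword pair co-occurs in. A real co-word study would then normalize these counts with an association measure and cluster them; that step is deliberately left out here.

```python
from collections import Counter
from itertools import combinations


def co_word_counts(papers):
    """Count how many papers each unordered keyword pair co-occurs in.

    papers: iterable of keyword lists, one list per paper (hypothetical format).
    Returns a Counter mapping alphabetically ordered (kw_a, kw_b) tuples to counts.
    """
    pairs = Counter()
    for keywords in papers:
        # Normalize case, deduplicate, and sort so that each pair is counted
        # at most once per paper and ("a", "b") and ("b", "a") share one key.
        unique = sorted(set(k.strip().lower() for k in keywords))
        pairs.update(combinations(unique, 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical keyword sets for three papers, for illustration only.
    sample = [
        ["health equity", "income", "access to care"],
        ["health equity", "income"],
        ["access to care", "Health Equity"],
    ]
    for (a, b), n in co_word_counts(sample).most_common():
        print(f"{a} -- {b}: {n}")
```

Counting at the level of whole documents, as here, stays within bibliometrics in the post's sense; extracting the relation between the two terms from the sentences themselves would be the text mining step.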
The 6th International Conference on Scientometrics and University Evaluation
Nov. 5-6, 2010, Wuhan, China
http://www.icsue2010.nseac.com

The 6th International Conference on Scientometrics and University Evaluation (ICSUE 2010) will be held from November 5th to 6th, 2010 in Wuhan, China. Located in the middle of China, Wuhan is one of the most delightful and modern cities in China. ICSUE 2010 aims to provide a high-level international forum for researchers and scholars to present and discuss recent advances and new topics in Scientometrics and Scientific Evaluation. All accepted papers will be recommended to be indexed by EI and ISTP. Some excellent papers will be recommended to the journal Scientometrics (indexed by SCI/SSCI).

The topics of interest include, but are not limited to:

Scientometrics
1. Scientometrics and Society's Development
2. Development and Trend of Scientometrics
3. Theory and Method of Scientometrics
4. Scientometrics, Bibliometrics and Informetrics
5. Development and Trend of Webometrics and Knowmetrics

University Evaluation
1. Theory and Method of University Evaluation
2. Indicator System of University Evaluation
3. Status and Function of University Evaluation
4. Comparative Study on University Evaluation
5. Current Research, Problems and Strategy on Scientific Evaluation
6. Case Study on University Evaluation, Scientific Evaluation and Journal Evaluation
7. Standard of World-Class Universities and Evaluation Research
8. Development Strategy and Construction on World-Class Universities

Application Research of Five-metrics
1. On University Evaluation
2. On Scientific Evaluation
3. On Journal Evaluation

Sponsor: Wuhan University
Cooperators: Huazhong Normal University; Chinese Association for Science of Science and S&T Policy; International Society for Scientometrics and Informetrics (ISSI); US News & World Report (USNEWS); International Ranking Expert Group (IREG)

IMPORTANT DATES
Papers Due: Jul. 15th, 2010
Acceptance Notification: Aug. 15th, 2010
Registration Deadline: Sep. 15th, 2010
Conference: Nov. 5-6, 2010

CONTACT INFORMATION
Website: http://www.icsue2010.nseac.com
Fax: +86-27-68754477
E-mail: icsue2010@gmail.com
Please use the open link: http://arxiv.org/list/cs.DL/0912

Digital Libraries
Authors and titles for cs.DL in Dec 2009

arXiv:0912.1221
Title: Clusters and Maps of Science Journals Based on Bi-connected Graphs in the Journal Citation Reports
Authors: Loet Leydesdorff
Journal-ref: Journal of Documentation, 60(4), 2004, 317-427
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

arXiv:0912.1224
Title: The university-industry knowledge relationship: Analyzing patents and the science base of technologies
Authors: Loet Leydesdorff
Journal-ref: Journal of the American Society for Information Science and Technology, 55(11), 2004, 991-1001
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

arXiv:0912.1227
Title: Mapping the Chinese Science Citation Database
Authors: Loet Leydesdorff, Jin Bihui
Journal-ref: Proceedings of the 67th ASIST Annual Meeting, Vol. 41 (Medford, NJ: Information Today, 2004), pp. 488-495
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

arXiv:0912.1371
Title: A study of seismology as a dynamic, distributed area of scientific research
Authors: Caroline S. Wagner, Loet Leydesdorff
Journal-ref: Scientometrics 58(1) (2003) 91-114
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

arXiv:0912.1767
Title: An evaluation of Flickr's distributed classification system, from the perspective of its members, and as an image retrieval tool in comparison with a controlled vocabulary
Authors: Samuel Piker
Comments: Dissertation, 40 pages including appendices
Subjects: Digital Libraries (cs.DL); Information Theory (cs.IT)

arXiv:0912.2032
Title: Institutional Repository saber.ula.ve: A testimonial perspective
Authors: Y. Briceno, H.Y. Contreras, L. A. Nunez, F. Salager-Meyer, A. Rojas, R. Torrens
Comments: 7th International Conference on Open Access in Accra, Ghana, from 2nd to 3rd November 2009
Subjects: Digital Libraries (cs.DL)

arXiv:0912.3098
Title: Maps on the basis of the Arts & Humanities Citation Index: The journals Leonardo and Art Journal versus Digital Humanities as a topic
Authors: Loet Leydesdorff, Alkim Almila Akdag Salah
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

arXiv:0912.3882
Title: Science overlay maps: a new tool for research policy and library management
Authors: Ismael Rafols, Alan L. Porter, Loet Leydesdorff
Comments: 40 pages, 6 Figures
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

arXiv:0912.3953
Title: Studies on access: a review
Authors: Philip M. Davis
Comments: 18 pages, 2 tables
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY)

arXiv:0912.4141
Title: The SJR indicator: A new indicator of journals' scientific prestige
Authors: Borja Gonzalez-Pereira (1), Vicente Guerrero-Bote (1), Felix Moya-Anegon (2) ((1) University of Extremadura, Department of Information and Communication, Scimago Group, Spain (2) CSIC, CCHS, IPP, Scimago Group Spain)
Comments: 21 pages with graphs and tables
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

arXiv:0912.4188
Title: The skewness of computer science
Authors: Massimo Franceschet
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY)
Please use the open link: http://arxiv.org/list/cs.DL/pastweek?skip=0&show=7#item7

Digital Libraries
Authors and titles for recent submissions

Tue, 26 Jan 2010

arXiv:1001.4433
Title: Scientometrics and Communication Theory: Towards Theoretically Informed Indicators
Authors: Loet Leydesdorff, Peter Van den Besselaar
Journal-ref: Scientometrics 38(1) (1997), 155-174
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

arXiv:1001.4276
Title: Towards Automatic Extraction of Social Networks of Organizations in PubMed Abstracts
Authors: Siddhartha Jonnalagadda, Philip Topham, Graciela Gonzalez
Comments: 8 pages, First International Workshop on Graph Techniques for Biomedical Networks in Conjunction with IEEE International Conference on Bioinformatics and Biomedicine, Washington D.C., USA, Nov. 1-4, 2009
Subjects: Digital Libraries (cs.DL)

arXiv:1001.4274
Title: ONER: Tool for Organization Named Entity Recognition from Affiliation Strings in PubMed Abstracts
Authors: Siddhartha Jonnalagadda, Philip Topham, Graciela Gonzalez
Comments: 3 pages, The 3rd International Symposium on Languages in Biology and Medicine, Jeju Island, South Korea, November 8-10, 2009
Subjects: Digital Libraries (cs.DL)

Mon, 25 Jan 2010

arXiv:1001.4023
Title: Digital Mathematics Libraries: The Good, the Bad, the Ugly
Authors: Thierry Bouche (IF, CCDNM)
Journal-ref: Mathematics in Computer Science 3, 3 (2010) 20
Subjects: Digital Libraries (cs.DL)

Thu, 21 Jan 2010

arXiv:1001.3663
Title: Collaboration in an Open Data eScience: A Case Study of Sloan Digital Sky Survey
Authors: Jian Zhang, Chaomei Chen
Comments: iConference 2010
Subjects: Digital Libraries (cs.DL)

Tue, 19 Jan 2010

arXiv:1001.2837 (cross-list from physics.soc-ph)
Title: The long-term dynamics of co-authorship scientific networks, Iberoamerican Countries (1973-2006)
Authors: Guillermo A. Lemarchand
Comments: 37 pages; 18 figures; 15 tables, co-authorship networks, self-organization, preferential attachment
Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL)

Mon, 18 Jan 2010

arXiv:1001.2576 (cross-list from hep-ph)
Title: HepML, an XML-based format for describing simulated data in high energy physics
Authors: S. Belov, L. Dudko, D. Kekelidze, A. Sherstnev
Comments: 21 pages, 4 eps figures, elsart.cls
Subjects: High Energy Physics - Phenomenology (hep-ph); Digital Libraries (cs.DL); Software Engineering (cs.SE)
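The latest issue of Scientometrics announced the winners of the biennial Derek John de Solla Price Medal, the highest award in scientometrics: Péter Vinkler of Hungary and Michel Zitt of France. Early in their careers they were, respectively, a chemist and a management scholar; in recent years both have made notable contributions on hot topics such as citation analysis and evaluation methods and indicators. My own impression of the two: Vinkler has also joined the current wave of h-index research, while Zitt has very strong powers of logical reasoning. In addition, Glänzel points out that Vinkler is a lone wolf. Why? I just checked: since entering the field in the 1980s, he has published more than 30 articles in LIS, every one of them single-authored! I hope to see a Chinese scientometrician receive this honor soon.

Reference:
Péter Vinkler and Michel Zitt win the 2009 Derek John de Solla Price Medal. Scientometrics, 2009, 81: 1-5.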
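Since the post mentions Vinkler's part in the h-index research boom, here is a minimal sketch of Hirsch's h-index for readers new to it: an author has index h if h of their papers have each been cited at least h times. This is only the basic definition, not any of the variants Vinkler or others proposed, and the citation counts in the example are made up.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here on, so no larger h is possible.
            break
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for one author's five papers.
    print(h_index([10, 8, 5, 4, 3]))  # -> 4
```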
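Lately, whenever I submit a paper, I feel as if there is nowhere suitable to send it. For the sake of communication and dissemination, I am generally reluctant to submit to journals with little influence, yet some influential journals are not quite the right fit topically.

Research in scientometrics and the related fields of bibliometrics, informetrics, and webometrics has been under way in China for some time. Chinese journals in library and information science, science of science, and research management all publish some scientometrics papers, but there has never been a dedicated scholarly journal like Scientometrics abroad. Internationally, the field seems to enjoy a plentiful supply of submissions and considerable influence. Even the newly founded Journal of Informetrics reached an impact factor of 1.188 in the very first year it received one, placing it roughly in the top third of LIS journals.

My sense is that scientometrics and related research in China is fairly lively at present. Although some of it is only thesis work or papers written for promotion, one can also find some genuinely innovative work, so there should be enough submissions to sustain a journal of reasonable quality. With the advantage of being close to home, and provided quality is assured, getting into the mainstream evaluation system within a short time should not be too hard (as Journal of Informetrics has shown).

Chinese social science journals tend to be general in scope, and specialization seems lacking. Of course, running a journal is itself an extremely challenging undertaking.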
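As a reminder of what a figure like the 1.188 quoted above measures: the standard two-year Journal Impact Factor reported in Journal Citation Reports for a journal in year Y can be written as follows (the symbols are illustrative labels, not official notation):

```latex
\mathrm{JIF}_{Y} = \frac{C_{Y}(Y-1) + C_{Y}(Y-2)}{N_{Y-1} + N_{Y-2}}
```

Here C_Y(t) is the number of citations received in year Y by items the journal published in year t, and N_t is the number of citable items (chiefly articles and reviews) it published in year t. This is why a new journal obtains its first impact factor only after it has been publishing for two full years.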