Who Is an Outstanding Scientist? Predicting with the H-Index (reposted from 麦兜)

8284 views | 2007-10-8 09:16 | Personal category: Research insights | System category: Research notes | Keywords: scholar

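The h-index was devised in 2005 by Jorge Hirsch, a physicist at the University of California, San Diego, and immediately drew wide attention from academia around the world. After the paper was posted online, Nature and Science each reported on it, and the formal paper was published in the Proceedings of the National Academy of Sciences in November 2005.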

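Before the h-index, the usual measures of academic achievement were the total number of papers and the total number of citations. The problem with the former is that it rewards people who publish large numbers of low-impact papers; the problem with the latter is that one or two highly cited papers can mask a large body of rarely cited ones. By comparison, the average number of citations per paper is a fairer measure.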

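However, the h-index is now widely regarded as a sounder indicator than the citation average. A person's h-index is the largest number N such that N of his or her papers have each been cited at least N times. For example, Ed Witten, a physicist at the Institute for Advanced Study in Princeton, has an h-index of 110, meaning that 110 of his papers have each been cited at least 110 times. (For an h-index ranking in chemistry, see below.)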

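Unlike other metrics, a person's h-index is quite easy to determine: go to the SCI website, look up all of the person's SCI-indexed papers, sort them by times cited from highest to lowest, and work down the list until a paper's rank exceeds its citation count. That rank minus 1 is the h-index. (If no paper's rank ever exceeds its citation count, the h-index is simply the number of papers.)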
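The sort-and-scan procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `h_index` and the sample citation counts are made up for the example, and the input is assumed to be a plain list of per-paper citation counts already collected from a citation database.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Sort papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    for rank, cites in enumerate(ranked, start=1):
        # Stop at the first paper whose rank exceeds its citation count;
        # the h-index is that rank minus 1.
        if rank > cites:
            return rank - 1
    # Every paper has at least as many citations as its rank.
    return len(ranked)


# Hypothetical citation counts for seven papers.
example = [25, 8, 5, 4, 3, 1, 0]
print(h_index(example))  # 4: four papers have at least 4 citations each
```

The fifth paper in the sorted list has only 3 citations, so the scan stops at rank 5 and returns 4.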

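Jorge Hirsch, the inventor of the h-index, recently wrote that the h-index can be used not only to assess a researcher's past academic performance but also to predict future academic achievement.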

A related post from 买卖提 (MITBBS) (a fellow blogger says it was written by a user named LAODOUFU; thanks for the tip)

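[Note: A few days ago I saw people discussing the H-Index and had some thoughts, hence this post. Saying that H-Index values from different fields cannot be compared is easy; acting on it is hard. The same is true of IF, yet how many people actually separate the fields today? Especially as partially multidisciplinary subjects keep multiplying. My understanding is limited, so criticism is welcome.]
Impact Factor, "often abbreviated IF, is a measure of the citations to science and social science journals." (Wikipedia definition)
H-Index "is a hybrid index that quantifies scientific productivity and impact of a scientist based on the set of his/her most quoted papers and the number of citations that they have received in other people's publications." (Wikipedia definition)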

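There will always be people who are keen on rankings.
In old times there were rankings of heroes and martial-arts masters: Li Yuanba ranked first, Qin Shubao tied for thirteenth, and so on; or Zhou Botong/Wang Chongyang first, followed by the four so-and-sos. Today there are rankings of every kind: which NBA team has the best record, which movie is the best, which singer sells the most records.
But not everything can be ranked. In my view, science is one of the things that cannot.
The "Impact Factor", invented by scientists who study science, and the "H-index", invented by a scientist who studies scientists, were each meant to find a scientific way of assessing the academic influence of a journal and of a scientist, respectively. In a scientific (statistical?) sense there is nothing wrong with that. The problem is that busybodies and bureaucrats, whether they understand science or not, promote these metrics out of self-interest, and so these scientific methods have gone bad, to the point that IF is now a life-or-death token in the hands of those in power. To make a clumsy analogy: nuclear fusion was certainly cool science when it was discovered in the lab, but it would not be funny at all if a terrorist were waving a nuclear bomb around.
Everyone who does science has a scale in their own mind and knows which journals are the best, which are OK, and which are poor. Plenty of group leaders have published in Macromolecules/JPC all their lives, never touched JACS, and cared even less about Angew. Yet at some point "IF" seems to have taken root in people's minds; checking how high the IF is before submitting a paper seems to have become a habit.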

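Take a quick look at the question:
Is "Cell" really better than "JACS"?
Is "Nano Letters" really better than "JACS" and "Phys. Rev. Lett."?
Is "Adv. Mater." really better than "Chem. Mater.", "JOC", "JPC", and "Macromolecules"?
Is "Annu. Rev. Immunol." really better than "Chem. Rev."?
Ranking journals by IF is one thing, but now this H-Index has popped up to rank and score people, to quantify scientists. It would be fine if people looked at it, laughed, and moved on, but some are loudly pushing to spread it, and it is about to become, or already is, a hiring criterion. That is not just exasperating; it will certainly lead people astray.
Is my nanoscience more important than your mechanics? Not necessarily.
Is my paper in Nature more impressive than your paper in "Journal of Materials Research"? Even less so.
Is my paper with 1000 citations today more important than your paper with only 10? Still not necessarily.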

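To take an example close to home, look at the nano field:
What does it tell us that Younan Xia's H-Index is about 10 points higher than those of Peidong Yang and Hongjie Dai? Is there any point in debating whether George Whitesides, Richard Smalley, or even Charlie Lieber has the higher H-Index?
Look at Chinese science: IF naturally encourages people to do good science and submit to good journals, but the shallow, showy atmosphere that comes from tying it to research funding and personal gain is, frankly, disgusting; I cannot find a more fitting word. With the arrival of the H-Index, I worry about even more dangerous and destructive consequences. Do not think this is sour grapes: my own H-Index is currently 20, which is not too low among "young yet established pre-scientists". But I can dimly see a near future in which scientists outside nano and biology, young and old, sit blankly at their desks staring at an H-Index that reduces their life's pursuit to simple arithmetic: an utterly absurd number tied to their salaries, bonuses, grants, and positions.
Before the vast unknown, I am proud to feel infinitely small; in this "scientific" world gone mad for rankings, I am sad to feel just as small.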

A new index for evaluating research achievement
Proc. Natl. Acad. Sci. USA | November 15, 2005 | vol. 102 | no. 46 | 16569-16572
An index to quantify an individual's scientific research output
J. E. Hirsch
Department of Physics, University of California at San Diego, La Jolla, CA 92093-0319
Abstract:

I propose the index h, defined as the number of papers with citation number ≥h, as a useful index to characterize the scientific output of a researcher. For example, the highest h among physicists appears to be E. Witten's h, which is 110. That is, Witten has written 110 papers with at least 110 citations each. That gives a lower bound on the total number of citations to Witten's papers of h^2 = 12,100.

Schematic curve of number of citations versus paper number, with papers numbered in order of decreasing citations. The intersection of the 45° line with the curve gives h. The total number of citations is the area under the curve. Assuming the second derivative is nonnegative everywhere, the minimum area is given by the distribution indicated by the dotted line, yielding a = 2 in Nc,tot = ah^2 (Eq. 1), where Nc,tot is the total number of citations and a is the proportionality constant, empirically ranging between 3 and 5.

A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np – h) papers have ≤h citations each.
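As a rough illustration of the relation Nc,tot = ah^2, the sketch below computes h, the total citation count, and the implied constant a for one made-up publication record. The function names and the citation counts are hypothetical; the only claims taken from the text are that Nc,tot is at least h^2 and that a is reported to fall empirically between 3 and 5.

```python
def h_index(citations):
    """h-index from a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, "citations >= rank" holds exactly for the first h papers.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def proportionality_constant(citations):
    """Return a in Nc,tot = a * h^2 for the given record."""
    h = h_index(citations)
    if h == 0:
        raise ValueError("a is undefined when h is 0")
    return sum(citations) / h**2


# Hypothetical record of twelve papers (not real data).
record = [48, 33, 30, 22, 15, 12, 9, 7, 6, 3, 1, 0]
h = h_index(record)
total = sum(record)
a = proportionality_constant(record)

# The h most-cited papers alone contribute at least h citations each,
# so the total can never fall below h^2.
assert total >= h * h
print(h, total, round(a, 2))
```

For this invented record h is 7 and the total is 186, so a comes out a little under 4, inside the 3 to 5 range quoted above. A record dominated by one or two "big hits" would push a well above 5.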

I argue that h is preferable to other single-number criteria commonly used to evaluate scientific output of a researcher, as follows:

(1) Total number of papers (Np). Advantage: measures productivity. Disadvantage: does not measure importance or impact of papers.

(2) Total number of citations (Nc,tot). Advantage: measures total impact. Disadvantage: hard to find and may be inflated by a small number of "big hits," which may not be representative of the individual if he or she is a coauthor with many others on those papers. In such cases, the relation in Eq. 1 will imply a very atypical value of a, >5. Another disadvantage is that Nc,tot gives undue weight to highly cited review articles versus original research contributions.

(3) Citations per paper (i.e., ratio of Nc,tot to Np). Advantage: allows comparison of scientists of different ages. Disadvantage: hard to find, rewards low productivity, and penalizes high productivity.

(4) Number of "significant papers," defined as the number of papers with >y citations (for example, y = 50). Advantage: eliminates the disadvantages of criteria 1, 2, and 3 and gives an idea of broad and sustained impact. Disadvantage: y is arbitrary and will randomly favor or disfavor individuals, and y needs to be adjusted for different levels of seniority.

(5) Number of citations to each of the q most-cited papers (for example, q = 5). Advantage: overcomes many of the disadvantages of the criteria above. Disadvantage: It is not a single number, making it more difficult to obtain and compare. Also, q is arbitrary and will randomly favor and disfavor individuals.

Instead, the proposed h index measures the broad impact of an individual's work, avoids all of the disadvantages of the criteria listed above, usually can be found very easily by ordering papers by "times cited" in the Thomson ISI Web of Science database (http://isiknowledge.com), and gives a ballpark estimate of the total number of citations
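To make the comparison among the five criteria and h concrete, here is a small sketch that computes all of them for two invented publication records. The function name `summarize`, the dictionary keys, and both records are assumptions made for illustration; the defaults y = 50 and q = 5 follow the examples given in criteria (4) and (5).

```python
def summarize(citations, y=50, q=5):
    """Compute criteria (1)-(5) and h for a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    n_papers = len(ranked)
    n_citations = sum(ranked)
    return {
        "Np": n_papers,                                             # criterion (1)
        "Nc_tot": n_citations,                                      # criterion (2)
        "citations_per_paper": n_citations / n_papers if n_papers else 0.0,  # (3)
        "significant_papers": sum(1 for c in ranked if c > y),      # criterion (4)
        "top_q": ranked[:q],                                        # criterion (5)
        # h: number of papers whose citation count is at least their rank.
        "h": sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank),
    }


# Two hypothetical records (not real data).
one_big_hit = [900, 4, 3, 2, 1]
steady = [60, 55, 52, 40, 30, 22, 18, 12, 9, 6]

for name, record in [("one_big_hit", one_big_hit), ("steady", steady)]:
    print(name, summarize(record))
```

In this toy comparison the record with a single big hit wins on total citations (910 versus 304) and on citations per paper, yet its h is 3 against 9 for the steady record, which is the kind of distortion the criteria list describes.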

...........................................
I have proposed an easily computable index, h, which gives an estimate of the importance, significance, and broad impact of a scientist's cumulative research contributions. I suggest that this index may provide a useful yardstick with which to compare, in an unbiased way, different individuals competing for the same resource when an important evaluation criterion is scientific achievement.
Note that this paper was first posted and can also be found online at http://arxiv.org/abs/physics/0508025
See also the preview at http://ucsdnews.ucsd.edu/newsrel/science/mcH.asp
November 7, 2005
UCSD Physicist Proposes New Way to Rank Scientists' Output
By Kim McDonald
Publications in peer-reviewed journals are the yardstick by which academic scientists compare their work with their colleagues. But is the best measure of a scientist’s worth the total number of his or her published papers? Or the average quality of those papers, based on the number of times they are cited or the reputation of the journals in which they are published?

According to a physicist at the University of California, San Diego, neither of these methods—often used in academe or federal agencies to judge scientific publication records for hiring, promotion or grant awards—gives consistent and satisfactory comparisons. So Jorge E. Hirsch, a physics professor at UCSD, devised an alternative that appears to be a simpler and more reliable way to rank scientific output within a discipline than any now in use.

In a paper published in the November 15 issue of the Proceedings of the National Academy of Sciences, which appears this week in the journal’s early online edition, Hirsch explains that his “h-index” can give a reliable “estimate of the importance, significance and broad impact of a scientist’s cumulative research contributions.” What’s more, for each scientist, his method provides a single number, which takes only 30 seconds to compute, that can be used to compare a scientist’s relative rank within a discipline.

“For a person to have a high h-index is not an accident,” Hirsch says, after testing his method on scientists in a variety of disciplines and circulating his formula on physics bulletin boards for other scholars to test. “I myself was surprised to see how consistent an estimate you get with this method. It does seem to say something about a person’s overall academic achievement.”

The h-index is derived from the number of times a scientist’s publications are cited in other papers, but is calculated in a way to avoid some of the problems associated with counting large numbers of marginal papers or high-profile coauthors.

For example, Hirsch says that while the total number of publications gives some indication of a scientist’s productivity, it says little about the quality of those publications. And while the total number of times a scientist’s papers are cited in other publications says something about their quality, he says those measurements can be suspect if a scientist has high-performing coauthors, few publications or a lifetime of mediocre work skewed by one or two highly cited papers. Citation counts may also be skewed if a scientist publishes scientific review articles, which are not reports of original research, but summaries of other scientists’ work frequently referenced in subsequent journal articles.

Hirsch was motivated to develop the h-index because of his own problems publishing controversial papers on superconductivity in journals considered high-impact. Although these papers ended up in journals categorized as low-impact, they garnered many citations, evidence of their importance to the field.

His new method relies on the use of the Thomson ISI Web of Science database at http://isiknowledge.com. To search for a scholar’s h-index, go to the Web of Science and enter the name in the “General Search” category. Clicking on “Search” brings up a list of papers over the entire lifetime by that author. To reorder the list from the most highly cited papers to least cited, click on “Sort by Times Cited” in the right hand column.

The h-index is obtained by moving down this list until the rank of a paper exceeds the number of citations to that paper; the rank just before that point is the scholar’s h. For example, a scholar has an h value of 75 if the 76th paper on the list has been cited 75 or fewer times and the 75th paper has been cited 75 or more times. Put another way, this scholar has published 75 papers with at least 75 citations each.

Hirsch devotes a section in his paper to demonstrate mathematically why this method for “h”—which stands for “high citations”—seems to work. But the real proof of the pudding came when he applied the h-index to the scientific luminaries within various disciplines and found that they ended up where expected.

Edward Witten, a theoretical physicist at the Institute for Advanced Study in Princeton, N.J., who developed an extension of string theory and is widely regarded as one of the most brilliant physicists ever, has the highest h-index in physics, 110. By contrast, Nobel laureate Philip Anderson of Princeton University has an h-index of 91, while fellow Nobel laureates Steven Weinberg of the University of Texas, Frank Wilczek of the Massachusetts Institute of Technology and David Gross of UC Santa Barbara have h-index values of 88, 68 and 66, respectively.

Hirsch, whose own h-index is 49, notes that comparisons of h-index among scientists in different disciplines don’t work as well. High-impact biologists tend to have generally higher h-index values, he says, possibly because of their greater research resources, while social scientists tend to have lower h-index values, presumably because their other non-journal publications, such as books, are not factored into this calculation.

Nevertheless, Hirsch is able to make some generalizations. After a 20-year career in science, he says in his paper, an h-index of 20 should generally indicate a “successful scientist,” while an h-index of 40 “characterizes outstanding scientists, likely to be found only at the top universities or major research laboratories.” An h-index of 60 after 20 years or 90 after a 30-year scientific career, meanwhile, he says, “characterizes truly unique individuals.”

Hirsch says he is concerned that his h-index, while useful to compare publication records, not be misused.

“It should only be used as one measure, not as the primary basis for evaluating people for awards or promotion,” he adds. “You surely wouldn’t want to say that in order to get tenure or to get into the National Academy of Sciences you need to have an h-index of such and such.”

Nonetheless, Hirsch’s h-index has generated intense interest among scientists who have found out about it and used it.
“The reaction I’ve gotten has been very favorable,” he says. “Scientists want to know how they compare to their colleagues. The h-index really says something about that person and their work.”

To repost this article, please contact the original author for permission, and note that it comes from the ScienceNet blog of 戴启广.
Link: https://m.sciencenet.cn/blog-3913-8529.html


Catalysis China, sharing Chinese catalysis: http://blog.sciencenet.cn/u/catachina 化学家 (www.chemj.cn)
