Xiong Rongchuan (熊荣川)
Bioinformatics Laboratory, Liupanshui Normal University
xiongrongchuan@126.com
http://blog.sciencenet.cn/u/Bearjazz

Sorting a table is an essential step in many data-analysis workflows, and most of us are used to doing it in Excel. R can sort tabular data just as easily. Once you know a few useful functions, you no longer need to shuttle data back and forth between Excel and R during an analysis.

Below are the commands I entered in R and the results they produced. The example file order.csv is attached to this blog post. Remember to set the working directory to the folder where you saved order.csv.

rm(list = ls())  # clear the workspace (remove all objects)
setwd("D:/ziliao/zhuanye/R bear/lab03")  # set the working directory
y = read.csv("order.csv")
y
    X       V1       V2         X.1          V5       V6
1   1 0.153979 3213.282 1000.000000 1000.000000 0.153979
2   2 0.163979 3215.253    0.010000          NA       NA
3   3 0.173979 3218.715    0.010000          NA       NA
4   4 0.183979 3219.471    0.010000          NA       NA
5   5 0.193979 3238.251    0.010000          NA       NA
6   6 0.203979 3269.727    0.010000          NA       NA
7   7 0.213979 3270.134    0.010000          NA       NA
8   8 0.223979 3279.202    0.010000          NA       NA
9   9 0.233979 3260.387    0.010000          NA       NA
10 10 0.828344 3266.762    0.594365    0.594365 0.828344
11 11 0.838344 3244.964    0.010000          NA       NA
12 12 0.848344 3247.315    0.010000          NA       NA
13 13 0.858344 3258.594    0.010000          NA       NA
14 14 0.868344 3266.271    0.010000          NA       NA
15 15 0.878344 3278.918    0.010000          NA       NA
16 16 0.888344 3273.031    0.010000          NA       NA
17 17 0.898344 3281.390    0.010000          NA       NA
18 18 0.908344 3290.748    0.010000          NA       NA
19 19 0.918344 3269.900    0.010000          NA       NA
20 20 1.586302 3259.754    0.667958    0.667958 1.586302

x = y$V2  # take the third column (V2) as a vector
x
 [1] 3213.282 3215.253 3218.715 3219.471 3238.251 3269.727 3270.134 3279.202
 [9] 3260.387 3266.762 3244.964 3247.315 3258.594 3266.271 3278.918 3273.031
[17] 3281.390 3290.748 3269.900 3259.754

x = sort(x)  # sort the vector x in ascending order
x
 [1] 3213.282 3215.253 3218.715 3219.471 3238.251 3244.964 3247.315 3258.594
 [9] 3259.754 3260.387 3266.271 3266.762 3269.727 3269.900 3270.134 3273.031
[17] 3278.918 3279.202 3281.390 3290.748

y[order(y$V2), ]  # sort the rows by the third column; the other columns move with it
    X       V1       V2         X.1          V5       V6
1   1 0.153979 3213.282 1000.000000 1000.000000 0.153979
2   2 0.163979 3215.253    0.010000          NA       NA
3   3 0.173979 3218.715    0.010000          NA       NA
4   4 0.183979 3219.471    0.010000          NA       NA
5   5 0.193979 3238.251    0.010000          NA       NA
11 11 0.838344 3244.964    0.010000          NA       NA
12 12 0.848344 3247.315    0.010000          NA       NA
13 13 0.858344 3258.594    0.010000          NA       NA
20 20 1.586302 3259.754    0.667958    0.667958 1.586302
9   9 0.233979 3260.387    0.010000          NA       NA
14 14 0.868344 3266.271    0.010000          NA       NA
10 10 0.828344 3266.762    0.594365    0.594365 0.828344
6   6 0.203979 3269.727    0.010000          NA       NA
19 19 0.918344 3269.900    0.010000          NA       NA
7   7 0.213979 3270.134    0.010000          NA       NA
16 16 0.888344 3273.031    0.010000          NA       NA
15 15 0.878344 3278.918    0.010000          NA       NA
8   8 0.223979 3279.202    0.010000          NA       NA
17 17 0.898344 3281.390    0.010000          NA       NA
18 18 0.908344 3290.748    0.010000          NA       NA

If you sort a vector that contains NA values with sort(), the result no longer contains any NA values, because sort() drops them by default. Give it a try.

It's that simple. Happy researching!

Attachment: order.csv

Also, to sort a character vector, use the sort() function as well.
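To check the NA remark and a few other everyday cases, here is a small sketch on made-up data (the vector v, the data frame df and its columns id, group and score are hypothetical, not taken from order.csv). It contrasts sort(), which drops NA by default, with order(), which keeps NA rows at the end by default, and it also shows descending, two-key and character sorting.

```r
# Toy data for illustration only (hypothetical; not the contents of order.csv)
v <- c(3.2, NA, 1.5, 2.7, NA)

sorted_default <- sort(v)                   # NA values dropped (na.last = NA is the default)
sorted_keep_na <- sort(v, na.last = TRUE)   # NA values kept and placed at the end
sorted_desc    <- sort(v, decreasing = TRUE)

df <- data.frame(
  id    = 1:5,
  group = c("b", "a", "b", "a", "c"),
  score = c(3.2, NA, 1.5, 2.7, 0.9)
)

# order() returns row positions; its default na.last = TRUE keeps NA rows at the bottom
by_score <- df[order(df$score), ]

# Descending order by one numeric column (NA row still last)
by_score_desc <- df[order(df$score, decreasing = TRUE), ]

# Two sort keys: group first, then score within each group
by_group_score <- df[order(df$group, df$score), ]

# Character vectors are sorted alphabetically by sort()
fruits <- sort(c("pear", "apple", "fig"))

print(sorted_default)
print(by_group_score)
```

Because order() returns row positions rather than sorted values, it is the function to use whenever the other columns of a table must move along with the sort key.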
Quantifying the influence of scientists and their publications: distinguishing between prestige and popularity
Authors: Yan-Bo Zhou, Linyuan Lü* and Menghui Li
Journal: New J. Phys. 14 (2012) 033033
Download: http://iopscience.iop.org/1367-2630/14/3/033033

Abstract
The number of citations is a widely used metric for evaluating the scientific credit of papers, scientists and journals. However, it so happens that papers with fewer citations from prestigious scientists have a higher influence than papers with more citations. In this paper, we argue that by whom the paper is being cited is of greater significance than merely the number of citations. Accordingly, we propose an interactive model of author–paper bipartite networks as well as an iterative algorithm to obtain better rankings for scientists and their publications. The main advantage of this method is twofold: (i) it is a parameter-free algorithm; (ii) it considers the relationship between the prestige of scientists and the quality of their publications. We conducted real experiments on publications in econophysics, and used this method to evaluate the influence of related scientific journals. The comparison between the rankings by our method and simple citation counts suggests that our method is effective in distinguishing prestige from popularity.

GENERAL SCIENTIFIC SUMMARY
Introduction and background. How to measure the scientific influence of scientists and their publications has long been debated. Citation counts have been widely used to evaluate scientific impact. However, a paper with fewer citations, but those from prestigious scientists, is of greater influence than papers with more citations from less prestigious sources. The value of each citation should depend on its source.
We therefore propose an iterative algorithm to quantify the scientists' prestige and the quality of their publications via their interrelationship on an author–paper bipartite network; we call this the AP rank.

Main results. We apply AP rank to classify scientists and papers in the field of econophysics. Although some overlap exists between AP rank and citation counts, the outliers reveal remarkable and meaningful differences. The figure shows the co-authorship network. With AP rank, we also identify the top five mainstream journals in econophysics: Physica A, Physical Review E, European Physical Journal B, Quantitative Finance and Physical Review Letters.

Wider implications. The main advantages of AP rank are: (1) it is parameter-free; (2) it considers the interaction between the prestige of scientists and the quality of their publications; and (3) it is effective in distinguishing prestige from popularity. Our algorithm can be generalized to a wide range of systems. For example, on Twitter we can build an online reputation system that identifies influential users and evaluates the quality of their tweets by constructing a bipartite network in which retweets are treated as a kind of citation.
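The summary describes AP rank only in words, without its update rules. As a rough illustration of the underlying idea (a citation counts for more when the citing paper's authors are prestigious, and an author's prestige comes from the quality of their papers), here is a generic mutual-reinforcement, HITS-style power iteration on a toy author–paper network. This is not the authors' AP rank: the update rules, the function name mutual_rank and the toy matrices A and C are all assumptions made for illustration. Convergence is not guaranteed for arbitrary networks; this toy graph was chosen so that the iteration converges.

```r
# Generic mutual-reinforcement ranking on a toy author-paper network.
# NOT the AP rank update rules; every name and number here is hypothetical.
papers  <- paste0("p", 1:5)
authors <- paste0("a", 1:4)

# Authorship matrix: A[i, p] = 1 if author i wrote paper p
A <- matrix(0, nrow = length(authors), ncol = length(papers),
            dimnames = list(authors, papers))
A["a1", c("p1", "p2")] <- 1
A["a2", "p3"] <- 1
A["a3", "p4"] <- 1
A["a4", "p5"] <- 1

# Citation matrix: C[q, p] = 1 if paper q cites paper p (rows citing, columns cited)
cites <- rbind(c("p3", "p1"), c("p4", "p1"), c("p2", "p1"),
               c("p5", "p2"), c("p5", "p3"), c("p1", "p3"),
               c("p2", "p4"))
C <- matrix(0, nrow = length(papers), ncol = length(papers),
            dimnames = list(papers, papers))
C[cbind(match(cites[, 1], papers), match(cites[, 2], papers))] <- 1

mutual_rank <- function(A, C, tol = 1e-10, max_iter = 1000) {
  prestige <- rep(1 / nrow(A), nrow(A))        # start from equal prestige
  for (k in seq_len(max_iter)) {
    w <- as.vector(t(A) %*% prestige)          # weight of each citing paper = summed prestige of its authors
    quality <- as.vector(t(C) %*% w)           # paper quality = summed weight of the papers citing it
    quality <- quality / sum(quality)
    new_prestige <- as.vector(A %*% quality)   # author prestige = summed quality of own papers
    new_prestige <- new_prestige / sum(new_prestige)
    converged <- sum(abs(new_prestige - prestige)) < tol
    prestige <- new_prestige
    if (converged) break
  }
  list(prestige = setNames(prestige, rownames(A)),
       quality  = setNames(quality, colnames(A)))
}

res <- mutual_rank(A, C)
print(round(res$prestige, 3))
print(round(res$quality, 3))
print(colSums(C))  # raw citation counts per paper, for comparison
```

In this toy network p3 has two citations and p4 only one, yet both end with the same quality, because one of p3's citations comes from a4, whose own paper is never cited and who therefore ends with zero prestige. For the same reason p2 ends at zero even though it has one citation.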
With the combined effort of five of us, we have laid an "egg" (a paper), and I am posting it on Christmas night to mark the occasion!
Egg title: Identifying influential nodes in complex networks
Egg ID: Physica A 391 (2012) 1777–1787
Egg features: proposes a local index for characterizing the influence of nodes in a network.
Download the egg: http://www.sciencedirect.com/science/article/pii/S0378437111007333

Abstract: Identifying influential nodes that lead to faster and wider spreading in complex networks is of theoretical and practical significance. The degree centrality method is very simple but of little relevance. Global metrics such as betweenness centrality and closeness centrality can better identify influential nodes, but cannot be applied to large-scale networks because of their computational complexity. In order to design an effective ranking method, we propose a semi-local centrality measure as a tradeoff between the low-relevance degree centrality and other time-consuming measures. We use the Susceptible–Infected–Recovered (SIR) model to evaluate the performance in terms of the spreading rate and the number of infected nodes. Simulations on four real networks show that our method identifies influential nodes well.
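The abstract does not give the formula for the semi-local measure. The sketch below assumes the definition commonly given for this semi-local centrality: N(w) is the number of nearest and next-nearest neighbours of node w, Q(u) is the sum of N(w) over the neighbours w of u, and the local centrality C_L(v) is the sum of Q(u) over the neighbours u of v. The function name semi_local_centrality and the six-node toy graph are hypothetical, and the SIR evaluation from the paper is not reproduced here.

```r
# Semi-local centrality on an undirected graph given as a 0/1 adjacency matrix.
# Assumes the definition stated in the lead-in; the toy graph below is hypothetical.
semi_local_centrality <- function(adj) {
  reach2 <- (adj + adj %*% adj) > 0   # TRUE if j is reachable from i in one or two steps
  diag(reach2) <- FALSE               # a node does not count as its own neighbour
  n_w <- rowSums(reach2)              # N(w): number of nearest and next-nearest neighbours
  q_u <- as.vector(adj %*% n_w)       # Q(u): sum of N(w) over the neighbours w of u
  as.vector(adj %*% q_u)              # C_L(v): sum of Q(u) over the neighbours u of v
}

# Toy graph: a small star around node 1, plus a chain 4-5-6
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(4, 5), c(5, 6))
adj <- matrix(0, nrow = 6, ncol = 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                # undirected graph: make the matrix symmetric

cl  <- semi_local_centrality(adj)
deg <- rowSums(adj)                   # degree centrality, for comparison
print(cl)
print(deg)
```

On this toy graph node 4 has degree 2 but scores 18, ahead of node 1 (degree 3, score 15), because node 4 links the star around node 1 to the chain 5-6. This illustrates how the semi-local measure can rank nodes differently from plain degree centrality while still needing only neighbourhood information.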