匠人府分享 http://blog.sciencenet.cn/u/meiweipingg

With multiple independent variables, how do you determine which is more important?

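19824 reads  2017-6-28 12:25 | Personal category: Data processing and statistical analysis | System category: Research notes | Keywords: scholar | multiple independent variables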

梅卫平

Basic knowledge worth spreading!

This article has the answer: http://mp.weixin.qq.com/s/mqUFwAuE1oHpBVEjgs3odg


  • As the title asks: given a model such as y ~ x1, x2, …, which of x1 and x2 has the larger effect on y?

1.    Compare the size of the correlation coefficient r or the coefficient of determination R^2?


Wrong. The size of the coefficient has no necessary relationship with the importance of an independent variable. Correlation does NOT tell anything about the effect of x (independent variable) on y (dependent variable) [1].

   

Addendum: if you only want to test whether the difference between two correlation coefficients is significant, you can use the R package cocor.
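To make point 1 concrete, here is a minimal sketch with simulated data. Every number in it (sample size, standard deviations, slopes) is an assumption chosen for illustration and has nothing to do with the swiss example used later. It shows a regressor with the larger per-unit effect ending up with the smaller correlation, simply because it varies over a narrower range.

```r
# Simulated data: all values below are made up for illustration.
set.seed(1)
n  <- 1000
x1 <- rnorm(n, mean = 0, sd = 0.2)   # varies over a narrow range
x2 <- rnorm(n, mean = 0, sd = 2)     # varies over a wide range
y  <- 5 * x1 + 1 * x2 + rnorm(n)     # x1 has the larger per-unit effect

fit    <- lm(y ~ x1 + x2)
slopes <- coef(fit)[c("x1", "x2")]               # estimated per-unit effects
cors   <- c(x1 = cor(y, x1), x2 = cor(y, x2))    # simple correlations with y

print(round(slopes, 2))
print(round(cors, 2))
```

In this setup the slope of x1 is the larger one while the correlation of x2 with y is the larger one, so ranking by r alone would give a different answer than ranking by per-unit effect.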

2.    Compare the significance (p-value)?


Wrong. A statistically significant result may not be important in practice [2].
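A small simulated sketch of the same point follows. The sample size and the true slope are assumptions picked for illustration. With enough observations, even a negligible effect produces a very small p-value while explaining almost none of the variance.

```r
# Simulated data: the sample size and the effect size are illustrative assumptions.
set.seed(42)
n <- 100000
x <- rnorm(n)
y <- 0.03 * x + rnorm(n)   # tiny true effect of x on y

s         <- summary(lm(y ~ x))
p_value   <- s$coefficients["x", "Pr(>|t|)"]   # significance of the slope
r_squared <- s$r.squared                       # share of variance explained

print(p_value)
print(r_squared)
```

Here the slope is statistically significant at any conventional level, yet x accounts for well under 1% of the variance of y.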


3.    Compare standardized regression coefficients? This seems feasible [3,4], but I am not sure whether it is correct.


    This method appears to be usable, but I am not certain. Criticism and corrections are very welcome!


    The calculation can be done with the R package relaimpo. See the references for a detailed description [5] and the manual [6].


Example R code:

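   library(relaimpo)   # provides calc.relimp()

   data("swiss")       # built-in dataset
   cor(swiss)          # pairwise correlations

   linmod <- lm(Fertility ~ ., data = swiss)
   summary(linmod)

   # rela = TRUE rescales the contributions of the independent variables so that they sum to 100%
   metrics <- calc.relimp(linmod, type = c("lmg", "first", "last", "betasq", "pratt", "genizi", "car"), rela = TRUE)

   # For type, "first" is not recommended (it may assign a high contribution to independent
   # variables that are not significant); "last", "betasq", "pratt" etc. are recommended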

Details

lmg: the R^2 contribution averaged over orderings among regressors, cf. e.g. Lindeman, Merenda and Gold 1980, p.119ff or Chevan and Sutherland (1991).

pmvd: the proportional marginal variance decomposition as proposed by Feldman (2005) (non-US version only). It can be interpreted as a weighted average over orderings among regressors, with data-dependent weights.

last: each variable's contribution when included last, also sometimes called "usefulness".

first: each variable's contribution when included first, which is just the squared correlation between y and the variable.

betasq: the squared standardized coefficient.

pratt: the product of the standardized coefficient and the correlation.

genizi: the R^2 decomposition according to Genizi 1993.

car: the R^2 decomposition according to Zuber and Strimmer 2010, also available from package care (squares of scores produced by function carscore).
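Three of the metrics defined above ("first", "last" and "pratt") are simple enough to compute by hand. The following is a minimal base-R sketch on the swiss data. It is not how relaimpo computes them internally, and the variable names are my own. The values are unnormalized, so they correspond to rela = FALSE.

```r
data("swiss")
y_name  <- "Fertility"
x_names <- setdiff(names(swiss), y_name)

full    <- lm(Fertility ~ ., data = swiss)
r2_full <- summary(full)$r.squared

# "first": R^2 of the regression on each regressor alone
first <- sapply(x_names, function(v) {
  summary(lm(reformulate(v, response = y_name), data = swiss))$r.squared
})

# "last": gain in R^2 when the regressor is added after all the others
last <- sapply(x_names, function(v) {
  reduced <- lm(reformulate(setdiff(x_names, v), response = y_name), data = swiss)
  r2_full - summary(reduced)$r.squared
})

# Standardized coefficients: raw slope * sd(x) / sd(y)
beta_std <- coef(full)[x_names] * sapply(swiss[x_names], sd) / sd(swiss[[y_name]])

# "pratt": standardized coefficient times the correlation with y
r_xy  <- sapply(swiss[x_names], cor, y = swiss[[y_name]])
pratt <- beta_std * r_xy

print(round(cbind(first, last, pratt), 4))
```

Two checks are useful here: "first" equals the squared correlation of each regressor with y, and the pratt values add up to the R^2 of the full model.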



   metrics
   plot(metrics)

   metrics01 <- calc.relimp(linmod, type = "betasq", rela = TRUE)
   metrics01

Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

                     betasq
Agriculture      0.12911291
Examination      0.03580132
Education        0.59260934
Catholic         0.15931588
Infant.Mortality 0.08316055
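The betasq shares printed above can be reproduced without relaimpo. This is a minimal base-R sketch, with variable names of my own choosing: standardize each slope, square it, and divide by the total so that the shares sum to 1, which is the normalization rela = TRUE applies.

```r
data("swiss")
fit     <- lm(Fertility ~ ., data = swiss)
x_names <- names(coef(fit))[-1]   # regressor names, intercept dropped

# Standardized coefficient: raw slope * sd(x) / sd(y)
beta_std <- coef(fit)[x_names] * sapply(swiss[x_names], sd) / sd(swiss$Fertility)

betasq <- beta_std^2
share  <- betasq / sum(betasq)    # normalized to sum to 1

print(round(sort(share, decreasing = TRUE), 4))
```

Education should come out at about 0.5926, matching the relaimpo output above.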


Relative importance (betasq method), sorted from largest to smallest

Relative importance ranking

            Education   Catholic    Agriculture  Infant.Mortality  Examination
Fertility   0.59260934  0.15931588  0.12911291   0.08316055        0.03580132

Note: the order of the independent variables by relative importance differs from their order by significance or by correlation.

Significance ranking

            Education   Examination  Catholic  Infant.Mortality  Agriculture
Fertility   3.659e-07   9.45e-07     0.001029  0.003585          0.01492

Correlation ranking

            Education    Examination  Catholic   Infant.Mortality  Agriculture
Fertility   -0.66378886  -0.6458827   0.4636847  0.41655603        0.35307918
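The post does not say how the significance and correlation rankings were produced. The sketch below assumes they come from the pairwise correlation of each regressor with Fertility, using cor() for the coefficient and cor.test() for its p-value. That is an assumption on my part, although the values it yields agree with the tables above.

```r
data("swiss")
x_names <- setdiff(names(swiss), "Fertility")

# Pearson correlation of each regressor with Fertility
r <- sapply(swiss[x_names], cor, y = swiss$Fertility)

# Two-sided p-value of each pairwise correlation test
p <- sapply(swiss[x_names], function(x) cor.test(x, swiss$Fertility)$p.value)

by_correlation  <- names(sort(abs(r), decreasing = TRUE))
by_significance <- names(sort(p))

print(by_correlation)
print(by_significance)
```

Both rankings put Examination second, whereas the betasq ranking puts it last. That mismatch is exactly what the note above points out.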



   plot(metrics01)


References

[1] https://www.researchgate.net/post/Are_the_use_of_regression_and_correlation_coefficients_enough_to_measure_effect_of_the_independent_variable_on_the_dependent_variable

[2] http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/p-value-and-significance-level/practical-significance/

[3] http://blog.minitab.com/blog/adventures-in-statistics-2/how-to-identify-the-most-important-predictor-variables-in-regression-models

[4] https://www.quora.com/How-can-I-tell-which-independent-variable-has-more-effect-from-other-independent-variables-on-a-dependent-variable-in-regression-analysis

[5] https://www.researchgate.net/publication/26469377_Relative_Importance_for_Linear_Regression_in_R_The_Package_relaimpo

[6] https://cran.r-project.org/web/packages/relaimpo/relaimpo.pdf




https://m.sciencenet.cn/blog-651374-1063444.html
