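We are honored to appoint Yu Zhangsheng (俞章盛), Deputy Director of the SJTU-Yale Joint Center for Biostatistics and tenured professor in the Department of Bioinformatics and Biostatistics and the Department of Mathematics at Shanghai Jiao Tong University, as an Associate Editor of the Wiley journal Statistics in Medicine. Mr. Hu Changjie (胡昌杰), Associate Director of Editorial Development at Wiley, presented the letter of appointment to Professor Yu on behalf of the journal's co-Editors-in-Chief Ralph D'Agostino, Simon Day, Joel Greenhouse, and Louise Ryan.

Statistics in Medicine (print ISSN: 0277-6715, online ISSN: 1097-0258) was founded in 1982 and publishes 24 issues a year; its 2015 SCI impact factor is 1.533. It publishes research papers on the application of statistical and other quantitative methods to medicine (including the collection, analysis, presentation, and interpretation of medical data), covering medical data, clinical trials, diagnosis, dose control, epidemiology, and health care.

The journal aims to influence medical practice and its associated sciences through the publication of papers on statistical and other quantitative methods. The main criteria for publication are that the statistical methods are appropriate to a real medical problem and that the exposition is clear. The journal seeks to improve communication among statisticians, clinicians, and medical researchers.

About Professor Yu Zhangsheng

Professor Yu received his bachelor's degree in statistics from East China Normal University, then master's degrees in demography and in statistics from East China Normal University and Bowling Green State University (USA), followed by a PhD in biostatistics from the University of Michigan. He was a visiting scholar at Harvard University in 2005-2006, and afterwards taught in the biostatistics department of The Ohio State University and the Department of Biostatistics at the Indiana University School of Medicine, where he was appointed associate professor (with tenure) in 2014. From 2009 to 2013 he served in turn as vice president and president of the Central Indiana Statistical Association. In 2014 he was named a Shanghai "Eastern Scholar" Distinguished Professor.

As an expert in biostatistics (clinical medical statistics), Professor Yu collaborates widely with clinical specialists and has made outstanding contributions to improving the efficiency of medical research, the suitability of research methods, and the correct interpretation of research results.

Professor Yu also serves as an Associate Editor of the Journal of System Science and Complexity, as Statistical Editor of Heart Rhythm, and on the editorial board of Pediatric Pulmonology.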
The 5-level structure of information, which includes Statistics, Syntax, Semantics, Pragmatics, and Apobetics. This review is from: In the Beginning Was Information In this fascinating book, the author Werner Gitt explains in detail the principle of information theory, namely defining the characteristics of information and all the observational evidence we have for the origin and formation of information. He carefully and clearly delineates what is considered information for the purposes of the theory, and the 5-level structure of information, which includes Statistics, Syntax, Semantics, Pragmatics, and Apobetics. It is shown that the well-known theory of information given by Shannon is an important contribution, but can only describe the lowest (statistical) level of information, while ignoring the most crucial aspects of its higher level definition. All information, as defined by the book, has these higher level aspects, which include the structure and code (syntax); the meaning (semantics); the intended action (pragmatics); and purpose or goal (apobetics). Of course that is an oversimplification of the concept, but Gitt does a fine job of explaining it with numerous fascinating examples both from the biological and technical realm. Gitt shows how all attempts to generate (or simulate the generation of) information apart from a mental process have failed. This is the most fundamental hurdle that the theory of evolution must overcome in order to claim validity as a complete explanation of the origin of life apart from the Creator or a mental source. DNA is undeniably information, and it is coded in such an efficient and marvelous way, that it is utterly unmatched by the greatest technological advancements of today. 
Even an experiment to show the formation of meaningful DNA from materialistic processes, in sufficient quantity to produce life, would still fall far short of proving this necessary step for evolution, since apart from a meaningful context of proteins and RNA to participate in the replication, transcription, and translation of the information in DNA, DNA is useless. And as is well known in biology, the paradox goes deeper: the proteins that are required for replication, etc., are coded for BY the DNA! The challenge of information theory to evolution cannot be brushed aside, and this book does an excellent job of laying out the theory in a detailed yet understandable and compelling manner. Gitt's book offers a fresh look at the creation and evolution debate by presenting a robust positive case for creation on the basis of the theorems and natural laws encompassed by information theory and the countless observations that have affirmed this theory. He discusses numerous examples that have been proposed as counterexamples to it, and how they have failed to falsify the theory. Gitt devotes limited time to discounting evolution, but makes reference to other writings of his that deal with it more specifically. The purpose of the book is not so much to deconstruct evolutionary theory as to establish by scientific theorems that all known information has a mental source, and this has yet to be disproven. He is also unabashedly a Christian and a believer in special creation, which comes across clearly in his book, yet he rightly admits that the existence of God cannot be proved. However, he points out the consistency of the inference of a Creator with all other observations about information. In the Beginning Was Information will be a very informative book not only for creationists but for evolutionists as well, due to its thorough explanation of information.
If you read this book, by all means read the appendix at the end; it contains some of the most intriguing examples in the whole book!
I generally break my projects into 4 pieces:

load.R
clean.R
func.R
do.R

load.R: Takes care of loading in all the data required. Typically this is a short file, reading in data from files, URLs and/or ODBC. Depending on the project, at this point I'll either write out the workspace using save() or just keep things in memory for the next step.

clean.R: This is where all the ugly stuff lives: taking care of missing values, merging data frames, handling outliers.

func.R: Contains all of the functions needed to perform the actual analysis. source()'ing this file should have no side effects other than loading up the function definitions. This means that you can modify this file and reload it without having to go back and repeat steps 1 & 2, which can take a long time to run for large data sets.

do.R: Calls the functions defined in func.R to perform the analysis and produce charts and tables.

The main motivation for this setup is working with large data, where you don't want to have to reload the data each time you make a change to a subsequent step. Also, keeping my code compartmentalized like this means I can come back to a long-forgotten project and quickly read load.R and work out what data I need to update, and then look at do.R to work out what analysis was performed.

source: http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/
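The four-file split can be sketched in a single script, with one function standing in for each file. This is only an illustration under stated assumptions: the function names (load_data, clean_data, summarise_by_group), the cache file name, and the tiny inline data frame are all hypothetical and not part of the original answer, and saveRDS() is used in place of save() purely for brevity.

```r
# Hypothetical one-file sketch of the load.R / clean.R / func.R / do.R split.
cache_file <- file.path(tempdir(), "clean_data.rds")  # assumed cache location

# load.R: read the raw data (an inline data frame stands in for files/URLs/ODBC)
load_data <- function() {
  data.frame(group = c("a", "a", "b", "b", "b"),
             value = c(1, NA, 3, 5, 100))
}

# clean.R: the "ugly" step; here it only drops missing values, then caches
clean_data <- function(raw) {
  cleaned <- raw[!is.na(raw$value), ]
  saveRDS(cleaned, cache_file)
  cleaned
}

# func.R: function definitions only, so re-sourcing has no side effects
summarise_by_group <- function(d) {
  aggregate(value ~ group, data = d, FUN = mean)
}

# do.R: reuse the cached clean data if present, otherwise rebuild it
dat <- if (file.exists(cache_file)) readRDS(cache_file) else clean_data(load_data())
result <- summarise_by_group(dat)
print(result)
```

Because the analysis functions have no side effects, summarise_by_group() can be edited and re-run against the cached data without repeating the load and clean steps.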
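I am not a statistician by training, but I am quite interested in the subject, and I have recently started looking into Bayesian methods. So far I have come across several books in this area. They differ from one another, but all are worth consulting:

1. Introductory level, suitable for readers outside statistics

1.1 Introduction to WinBUGS for Ecologists
A book that introduces Bayesian modeling to ecologists. Easy to follow. Unfortunately the website supporting the book has not been loading, so I have not yet run its examples. If you know only the most basic statistical regression, you can start on Bayesian methods with this book.

1.2 A First Course in Bayesian Statistical Methods
Similar to the previous one. It has somewhat more formulas, but it is entirely suitable for self-study by non-specialists. The author writes in the preface: "My experience has been that once a student understands the basic idea of posterior sampling, their data analyses quickly become more creative and meaningful, using relevant posterior predictive distributions and interesting functions of parameters." So it seems Bayesian methods are not only useful but also fairly easy.

2. Advanced level

2.1 Bayesian Data Analysis, by Gelman, Carlin, Stern, and Rubin (1995, 2004)
2.2 Data Analysis Using Regression and Multilevel/Hierarchical Models, by Gelman and Hill (2007)
I have only skimmed these two. Following the examples in the books is not hard, but actually understanding them seems like a pipe dream. From an applied standpoint, though, they are must-reads.

3. Level unknown

3.1 Bayes and Empirical Bayes Methods for Data Analysis
3.2 Bayesian Analysis for Population Ecology_R
3.3 Bayesian Analysis of Gene Expression Data
3.4 Bayesian Biostatistics
3.5 Bayesian Computation With R-2ed
3.6 Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology
3.7 Bayesian Methods for Ecology
3.8 Bayesian Modeling Using WinBUGS
3.9 Bayesian Statistical Modelling_R
3.10 Bayesian_core_a_practical_approach_to_computational
3.11 Introduction to Bayesian Scientific Computing
3.12 Introducing Monte Carlo Methods with R
3.13 Introduction to Probability Simulation and Gibbs Sampling with R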
Wang, S., M. K. Cowles, et al. (2008). Grid computing of spatial statistics: using the TeraGrid for Gi*(d) analysis. Concurrency and Computation: Practice and Experience 20(14): 1697-1720.

The massive quantities of geographic information that are collected by modern sensing technologies are difficult to use and understand without data reduction methods that summarize distributions and report salient trends. Statistical analyses, therefore, are increasingly being used to analyze large geographic data sets over a broad spectrum of spatial and temporal scales. Computational Grids coordinate the use of distributed computational resources to form a large virtual supercomputer that can be applied to solve computationally intensive problems in science, engineering, and commerce. This paper presents a solution to computing a spatial statistic, Gi*(d), using Grids. Our approach is based on a quadtree-based domain decomposition that uses task-scheduling algorithms based on GridShell and Condor. Computational experiments carried out on the TeraGrid were designed to evaluate the performance of solution processes. The Grid-based approach to computing values for Gi*(d) shows improved performance over the sequential algorithm while also solving larger problem sizes. The solution demonstrated not only advances knowledge about the application of the Grid in spatial statistics applications but also provides insights into the design of Grid middleware for other computationally intensive applications. Copyright 2008 John Wiley & Sons, Ltd.
R Code for CRW simulation

# Copy and paste the following code into R
# to simulate a Correlated Random Walk (CRW) in an open space
# Original code by Xiaohua Dai

# Required libraries
require(circular)
require(CircStats)

## CRW initial parameters
# Step length ~ gamma distribution (sh, sc)
# For a gamma distribution: gamma(shape, scale)
#   mean = shape*scale
#   variance = shape*scale*scale
# Then, scale = variance/mean, shape = mean/scale
# Shape parameter:
sh <- 0.285
# Scale parameter:
sc <- 362
# Turning angle ~ wrapped Cauchy distribution (m, rh)
# Mean turning angle in radians:
m <- 0.145
# Mean resultant length rho:
rh <- 0.356

# Squared displacements (rows = simulations, columns = steps)
R <- matrix(0, 1000, 25)
# x,y coordinates
x <- matrix(0, 1000, 25)
y <- matrix(0, 1000, 25)
# Turning angles (cumulative heading)
the <- matrix(0, 1000, 25)
# Lower 2.5% CI of R
r25 <- matrix(0, 25)
# Mean of R
rm <- matrix(0, 25)
# Upper 2.5% CI of R
r975 <- matrix(0, 25)

# Start simulation; sim = index of the simulation run
for (sim in 1:1000) {
  for (step in 2:25) {
    l <- rgamma(1, shape = sh, scale = sc)
    ta <- rwrappedcauchy(1, mu = m, rho = rh)
    the[sim, step] <- the[sim, step - 1] + ta
    x[sim, step] <- x[sim, step - 1] + l * cos(the[sim, step])
    y[sim, step] <- y[sim, step - 1] + l * sin(the[sim, step])
    R[sim, step] <- x[sim, step]^2 + y[sim, step]^2
  }
}

# With 1000 runs, the 25th and 975th sorted values bound the central 95%
for (step in 1:25) {
  r25[step] <- sort(R[, step])[25]
  rm[step] <- mean(R[, step])
  r975[step] <- sort(R[, step])[975]
}

# Output
write.table(data.frame(r25, rm, r975), "CRWoutput.txt")
write.csv(data.frame(r25, rm, r975), "CRWoutput.csv")

Posted Wednesday July 5, 2006 - 11:15am (EEST)
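The moment-matching rule in the comments (scale = variance/mean, shape = mean/scale) and a single CRW path can be sketched in base R without extra packages. This is a minimal sketch, not the author's code: the helper names gamma_from_moments, rwrapped_cauchy_base and simulate_crw are hypothetical, and the wrapped Cauchy draw is obtained by wrapping an ordinary Cauchy variate with scale -log(rho) onto [0, 2*pi), which is a swapped-in technique replacing rwrappedcauchy() from the circular package.

```r
# Hypothetical helper: gamma shape/scale from a given mean and variance
gamma_from_moments <- function(mean, variance) {
  scale <- variance / mean
  shape <- mean / scale
  list(shape = shape, scale = scale)
}

# Hypothetical helper: wrapped Cauchy draws, made by wrapping a linear
# Cauchy(location = mu, scale = -log(rho)) onto the circle
rwrapped_cauchy_base <- function(n, mu, rho) {
  rcauchy(n, location = mu, scale = -log(rho)) %% (2 * pi)
}

# One correlated random walk starting at the origin with heading 0
simulate_crw <- function(n_steps, shape, scale, mu, rho) {
  step_len <- rgamma(n_steps, shape = shape, scale = scale)
  turn <- rwrapped_cauchy_base(n_steps, mu, rho)
  heading <- cumsum(turn)                 # heading = running sum of turns
  x <- c(0, cumsum(step_len * cos(heading)))
  y <- c(0, cumsum(step_len * sin(heading)))
  data.frame(x = x, y = y, r2 = x^2 + y^2)  # r2 = squared displacement
}

set.seed(1)
p <- gamma_from_moments(mean = 0.285 * 362, variance = 0.285 * 362^2)
path <- simulate_crw(24, p$shape, p$scale, mu = 0.145, rho = 0.356)
head(path)
```

Feeding the mean and variance implied by sh = 0.285 and sc = 362 back through gamma_from_moments() recovers the same shape and scale, which is a quick check on the formulas in the comments.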
R code for grid-based movement simulation

Grid size: 1 km x 1 km square cells
Initial agent: an individual animal
Local movements: governed by a habitat selection index H_i (set according to the percentage level of the utilization distribution, UD_i, in cell i)
## H could also be determined according to habitat quality, prey density, etc.
Time step: 0.5 hr

At time step t the agent is at cell m (center coordinate = (x_t, y_t)). At t+1 the agent moves to (or stays at) one of the nine cells (n = m-4, ..., m+4), laid out as follows:

(x_t-1, y_t-1)   (x_t, y_t-1)   (x_t+1, y_t-1)
(x_t-1, y_t)     (x_t, y_t)     (x_t+1, y_t)
(x_t-1, y_t+1)   (x_t, y_t+1)   (x_t+1, y_t+1)

The probability of moving to (or staying at) cell n is P_n = H_n / SUM(H_i), with i running from m-4 to m+4.

##### Here's the R script to simulate animal movement #####
# Original code by Xiaohua Dai

# Required R packages
require(adehabitat)
require(car)
require(spdep)

## Initial parameters
# Location time series (x,y)
# time = number of time steps
time <- 15000
x <- array(0, time)
y <- array(0, time)

# Number of animal occurrences at location x,y: location
# Grid map of Kruger
# (NOTE: zero-value grids buffer around its border:
#  1. to make the grid contain NRow * NCol cells
#  2. to ensure each cell in Kruger has 8 neighbouring cells)
# Kruger and KrugerUD are written as posted; NRow and NCol must be defined
# beforehand. adehabitat reads ASCII grid files with import.asc().
location <- image.asc(Kruger)

# The values of habitat selection index H decrease as the utilization level increases
# H = 0 when the cells are not in the home range, so elephants won't move to those cells
H <- location
UD <- image.asc(KrugerUD)
H <- round(100/UD)
BB <- array(H)
neigh <- cell2nb(NRow, NCol, torus = FALSE, type = "queen")  # Generate 8 neighbours for each cell
image(as.asc(H))  # Display the grid space of habitats

# Location coordinates (lx, ly)
# Use lxy to combine lx and ly together as a data frame
lx <- rep(1:NRow, NCol)          # e.g. 123412341234
ly <- rep(1:NCol, each = NRow)   # e.g. 111122223333
lxy <- data.frame(lx, ly)

# Initial location of animal
loc <- round(runif(1, min = 1, max = length(lx)))

## Movement simulation
for (t in 1:time) {
  # Record location time series
  x[t] <- lxy$lx[loc]
  y[t] <- lxy$ly[loc]
  # Draw location point
  points(lxy$lx[loc], lxy$ly[loc], col = round(runif(1, max = 10)), pch = 19)
  # 9-cell neighbourhood matrix of habitat selection
  # Repeat the number of k according to its selection level BB[k]
  # Previous cell also included since the animal has a certain probability of staying in it
  cxy <- rep(loc, BB[loc])
  for (i in 1:8) {
    k <- neigh[[loc]][i]  # 8 neighbouring cells
    cxy <- c(cxy, rep(k, BB[k]))
  }
  # Sample one value in the selection array cxy
  # The larger BB[k] is, the higher the probability that the animal moves to cell k
  # Move to the new location and add 1 to the number of animal occurrences at loc
  loc <- some(cxy, 1)
  location[loc] <- location[loc] + 1
}  # Simulate the next move

Posted Wednesday July 5, 2006 - 11:22am (EEST)
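The movement rule P_n = H_n / SUM(H_i) can be illustrated on a single 3 x 3 neighbourhood in base R. This is a small sketch under assumed inputs: the H values below are made up, and sample() with a prob argument is used here as a direct alternative to building a pool of repeated cell indices and drawing one at random, which is what the script above does. The sketch checks that the two give the same probabilities.

```r
# Hypothetical habitat selection indices for the nine cells m-4 ... m+4
# (the 5th entry is the current cell; zeros mark cells outside the home range)
H <- c(0, 2, 5, 1, 10, 4, 0, 3, 25)
cells <- 1:9

# Movement probabilities P_n = H_n / SUM(H_i)
p <- H / sum(H)

# Direct weighted draw of the next cell
set.seed(42)
next_cell <- sample(cells, size = 1, prob = p)

# Replication approach: repeat each cell index H times, then pick uniformly.
# The share of each cell in the pool equals p.
pool <- rep(cells, times = H)
pool_share <- tabulate(pool, nbins = 9) / length(pool)

print(round(p, 3))
print(next_cell)
```

The weighted draw avoids growing a long vector at every time step, which may matter when H values are large; the replication approach has the advantage of matching the original script line for line.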
R code to simulate animal movement in a torus

# Original code by Xiaohua Dai

# Required R packages
require(adehabitat)
require(car)
require(spdep)

## Initial parameters
# Location time series (x,y)
# time = number of time steps
time <- 15000
x <- array(0, time)
y <- array(0, time)

# Number of animal occurrences at location x,y: location
# NOTE: HB (habitat grid), CellN (cells per side) and location (occurrence
# counts) must exist before this point; the line below is commented out as posted.
# location <- round(runif(length(HB), min = 1, max = 3))
BB <- array(HB)
neigh <- cell2nb(CellN, CellN, torus = TRUE, type = "queen")  # Generate 8 neighbours for each cell
image(as.asc(HB))  # Display the grid space of habitats

# Location coordinates (lx, ly)
# Use lxy to combine lx and ly together as a data frame
lx <- rep(1:CellN, CellN)
ly <- rep(1:CellN, each = CellN)
lxy <- data.frame(lx, ly)

# Initial location of animal
loc <- round(runif(1, min = 1, max = length(lx)))

## Movement simulation
for (t in 1:time) {
  # Record location time series
  x[t] <- lxy$lx[loc]
  y[t] <- lxy$ly[loc]
  # Draw location point
  points(lxy$lx[loc], lxy$ly[loc], col = round(runif(1, max = 10)), pch = 19)
  # 9-cell neighbourhood matrix of habitat selection
  cxy <- loc
  for (i in 1:8) {
    k <- neigh[[loc]][i]  # 8 neighbouring cells in a torus
    # Repeat the number of k according to its preference degree BB[k]
    # Previous cell also included since the animal has a certain probability of staying in it
    cxy <- c(cxy, rep(k, BB[k]))
  }
  # Sample one value in the selection array cxy
  # The larger BB[k] is, the higher the probability that the animal moves to cell k
  # Move to the new location and add 1 to the number of animal occurrences at loc
  loc <- some(cxy, 1)
  location[loc] <- location[loc] + 1
}  # Simulate the next move

## Estimation of kernel home range at the 25%, 50% and 95% levels
# for home range contour estimation
xy <- data.frame(x, y)
ud <- kernelUD(xy)
ver <- getverticeshr(ud, 95)
plot(ver, add = TRUE)
ver <- getverticeshr(ud, 50)
plot(ver, add = TRUE)
ver <- getverticeshr(ud, 25)
plot(ver, add = TRUE)

Posted Wednesday July 5, 2006 - 11:23am (EEST)
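What "8 neighbours in a torus" means can be shown with modular arithmetic in base R. The sketch below is an illustration only: torus_neighbours is a hypothetical helper, not part of spdep, and it assumes the same cell numbering as lx and ly in the script (cell index = (ly - 1) * CellN + lx). It is not claimed to reproduce the exact ordering returned by cell2nb().

```r
# Hypothetical helper: the 8 queen-type neighbours of a cell on an n x n torus.
# Cells are numbered so that lx cycles fastest, as in lx/ly above.
torus_neighbours <- function(cell, n) {
  lx <- (cell - 1) %% n + 1
  ly <- (cell - 1) %/% n + 1
  offsets <- expand.grid(dx = -1:1, dy = -1:1)
  offsets <- offsets[!(offsets$dx == 0 & offsets$dy == 0), ]  # drop the cell itself
  # Wrap coordinates around the edges: stepping off one side re-enters
  # on the opposite side
  nx <- (lx - 1 + offsets$dx) %% n + 1
  ny <- (ly - 1 + offsets$dy) %% n + 1
  sort((ny - 1) * n + nx)
}

# Corner cell of a 4 x 4 grid: its neighbours wrap to the far row and column
corner <- torus_neighbours(1, 4)
print(corner)

# An interior cell has the ordinary 3 x 3 neighbourhood
interior <- torus_neighbours(6, 4)
print(interior)
```

On a torus every cell has exactly 8 neighbours (for grids of at least 3 x 3 cells), so no zero-valued buffer is needed around the border.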
R code to generate convex hulls around point clusters

# Original code by Roger Bivand
# Modified by Xiaohua Dai
require(maptools)
require(sp)
require(amap)
require(shapefiles)

# Reading point shape
foodloc <- readShapePoints("foodtree.shp")
# yourloc <- readShapePoints("yourshape.shp")
xy <- coordinates(foodloc)
xy_clusts <- hcluster(xy, method = "euclidean", link = "complete")
# hcluster uses half as much memory, as it doesn't store the distance matrix
# complete linkage hierarchical clustering
plot(xy_clusts)  # shows the clustering tree
cl <- cutree(xy_clusts, 200)  # 200 is the number of clusters
which_cl <- tapply(1:nrow(xy), cl, function(i) xy[i, ])
chulls_cl <- lapply(which_cl, function(x) x[chull(x), ])
plot(xy)
res <- lapply(chulls_cl, polygon)
n <- length(chulls_cl)
polygons <- lapply(1:n, function(i) {
  chulls_cl[[i]] <- rbind(chulls_cl[[i]], chulls_cl[[i]][1, ])
  # the convex hulls do not join first and last points, so we copy here
  Polygons(list(Polygon(coords = chulls_cl[[i]])), ID = i)
})
out <- SpatialPolygonsDataFrame(SpatialPolygons(polygons), data = data.frame(ID = 1:n))
plot(out)
# note standard-violating intersecting polygons!
tempfile <- tempfile()
writePolyShape(out, tempfile)
in_again <- readShapePoly(tempfile)
plot(in_again, border = "blue", add = TRUE)

# Output
test <- read.shapefile(tempfile)
write.shapefile(test, "ptcluster")

# Refer to:
# http://www.google.com/search?hl=zh-CN&q=%22outline+polygons+of+point+clumps%22+r-project&btnG=Google+%E6%90%9C%E7%B4%A2&lr=

Posted Wednesday July 5, 2006 - 12:34pm (EEST)
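The cluster-then-hull step can be tried without shapefiles or extra packages. The following is a minimal base-R sketch on made-up data, not the original workflow: hclust() on a full distance matrix stands in for amap's hcluster(), the two synthetic point clumps and the choice of k = 2 are assumptions for illustration, and the hulls are kept as plain coordinate matrices rather than sp polygon objects.

```r
# Two well-separated synthetic point clumps (hypothetical data)
set.seed(7)
xy <- rbind(
  cbind(rnorm(30, mean = 0,  sd = 0.5), rnorm(30, mean = 0,  sd = 0.5)),
  cbind(rnorm(30, mean = 10, sd = 0.5), rnorm(30, mean = 10, sd = 0.5))
)

# Complete-linkage hierarchical clustering, cut into k groups
tree <- hclust(dist(xy), method = "complete")
cl <- cutree(tree, k = 2)

# Convex hull of each cluster; drop = FALSE keeps a matrix even for
# one-point clusters, and the first vertex is repeated to close the ring
hulls <- lapply(split(seq_len(nrow(xy)), cl), function(i) {
  pts <- xy[i, , drop = FALSE]
  h <- pts[chull(pts), , drop = FALSE]
  rbind(h, h[1, , drop = FALSE])
})

# Draw points and hull outlines when run interactively
if (interactive()) {
  plot(xy, asp = 1)
  for (h in hulls) polygon(h)
}
```

Each element of hulls is a closed ring of coordinates, which is the form Polygon(coords = ...) expects in the script above.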
Let's R (来用 R)
Posted by entomology on 2005-6-16 17:27:00

What is R?
* R is not only a programming language; it is also a graphical statistical environment with plenty of easily loaded packages. (I like this; it is much like the easy-to-use extensions for ArcView.)

How to R?
* You can write your own scripts, and you can also call a large number of powerful functions.

Why R?
* You can run R on UNIX, Windows and Mac OS.
* R is free: free of charge and free to use.
* R combines functional programming and object-oriented programming.
* You need not be a programmer; you can quickly become one.
* Many R users and big-name statisticians around the world will answer your questions on the mailing lists.

Where is R?
* Home page: http://www.R-project.org/ and many mirrors
* Useful mini-course for beginners: http://life.bio.sunysb.edu/~dstoebel/R/
* R introduction in Chinese: http://www.biosino.org/pages/newhtm/r/schtml/
* R resources for ecologists: http://cran.r-project.org/web/views/Environmetrics.html

Last update 2000.06.16
Xiaohua Dai @ ecoinformatics.blog.edu.cn
Search keywords: R statistical software, R in Chinese, R language
GIS-related packages in R
Posted by entomology on 2005-7-8 20:39:00

GIS-related packages in R:
ade4 -- Analysis of Environmental Data: Exploratory and Euclidean methods in Environmental sciences
adehabitat -- Analysis of habitat selection by animals
fields -- Tools for spatial data
GRASS -- Interface between GRASS 5.0 geographical information system and R
mapdata -- Extra Map Databases
mapproj -- Map Projections
maps -- Draw Geographical Maps
maptools -- Tools for reading and handling shapefiles
maptree -- Mapping, pruning, and graphing tree models
PBSmapping -- PBS Mapping 2
shapefiles -- Read and Write ESRI Shapefiles
sp -- Classes and methods for spatial data
spatial -- Functions for Kriging and Point Pattern Analysis
spatstat -- Spatial Point Pattern analysis, model-fitting and simulation
spdep -- Spatial dependence: weighting schemes, statistics and models
etc.
Common R tools and websites
Posted by entomology on 2008-7-26 20:02:00

These are the tools and websites I have found most useful in my several years of learning R. This is a partial list; I will add to it as more come to mind.

1 R Task Views: install the set of packages for a particular field of work.
http://cran.r-project.org/web/views/
For example, for ecology: http://cran.r-project.org/web/views/Environmetrics.html

2 R Reference Card: a printed guide to keep at hand. It is only a few pages long (the shortest version is a single page) but full of useful hints, and can be printed out for quick reference.
(1) One-page version, English: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
(2) Multi-page version, English: http://cran.r-project.org/doc/contrib/refcard.pdf
(3) Multi-page version, Chinese: http://cran.r-project.org/doc/contrib/Liu-R-refcard.pdf

3 Tinn-R: an R editor with a graphical interface that makes R easier to use.
http://sourceforge.net/projects/tinn-r

4 Rcmdr: a GUI for R.
http://cran.r-project.org/web/packages/Rcmdr/index.html
http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/

5 When updating packages, you can choose a Korean mirror: it is fast, and it is updated much sooner than the mirrors in China.
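Items 1 and 5 can be sketched in a few lines of R. This is a hedged illustration, not the author's setup: the mirror URL below is the generic CRAN redirector used as a stand-in (substitute whichever mirror is fastest for you), install_task_view is a hypothetical wrapper, and the calls that need network access are left commented out. The ctv package and its install.views() function are what CRAN provides for installing a whole Task View.

```r
# Point install.packages()/update.packages() at a chosen CRAN mirror
# for this session. The URL is a stand-in, not a recommendation.
options(repos = c(CRAN = "https://cloud.r-project.org"))
current_mirror <- getOption("repos")[["CRAN"]]
print(current_mirror)

# Hypothetical wrapper: install every package listed in a CRAN Task View.
# Defined but not called here, because it downloads many packages.
install_task_view <- function(view = "Environmetrics") {
  if (!requireNamespace("ctv", quietly = TRUE)) install.packages("ctv")
  ctv::install.views(view)
}

# Run these manually when a network connection is available:
# install_task_view("Environmetrics")
# update.packages(ask = FALSE)
```

In an interactive session, chooseCRANmirror() offers the same choice through a menu.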