ScienceNet (科学网)


Tag: STATA


Related blog posts

[Repost] Set of tools to put Stata output in LaTeX
amelielele 2012-7-1 21:21
Gerhard Riener has a nice page, replicated below.

LaTeX and Stata @ Gerhard's Homepage @ Department of Economics @ Essex

Stata Integration to LaTeX

Producing tables and graphs and including them in publications and articles is often a very tedious task, especially when LaTeX tables are involved. I collected some hopefully useful programs and links to instructions on how to get your output to the masses as painlessly as possible.

General Articles
A good website for people who (have to) work with data: DataNinja, maintained by an Econ PhD student
Rosa Gini and Jacopo Pasquini: Automatic generation of documents
Florent Bresson: Outils Stata pour LaTeX
Ben Jann: Making regression tables from stored estimates
Roger B. Newson: Confidence intervals and p-values for delivery to the end user

Stata Modules
The most comprehensive Stata module is OUTTEX. It provides a lot of features for exporting to HTML, LaTeX and text.
CORRTEX: Stata module to generate correlation tables formatted in LaTeX
OUTREG2: Stata module to arrange regression outputs into an illustrative table
TABOUT: Stata module to export publication quality cross-tabulations
EST2TEX: Stata module to create LaTeX tables from estimation results
ESTOUT: Stata module to make regression tables
SUTEX: Stata module to LaTeX code for summary statistics tables
MAKETEX: Stata module to generate LaTeX code from a text file
OUTTABLE: Stata module to write matrix to LaTeX table

Source: http://iman.edublogs.org/2009/07/02/set-of-tools-to-put-stata-output-in-latex/
Category: Widget | 3140 views | 0 comments
[Repost] A comparison of several commonly used statistical analysis packages
agri521 2010-7-29 14:36
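This post is reposted from 周恬弘's blog 为了美丽的地面. Mr. 周 graduated from 维吉尼亚州大学 (a university in the state of Virginia) with a PhD and is now administrative vice superintendent of a hospital in Taiwan. http://thchou.blogspot.com/2008/07/blog-post_13.html (layout edited by the reposter)

Over the past year I have come across several different statistical packages. Some I have actually used myself; others I have only heard described or understand only at a conceptual level. Each has its own strengths and weaknesses, so here is a brief comparison of the ones I know so far.

Originally I only knew SPSS, and not in much depth. SPSS is a statistical package designed for social science researchers. Its advantage is that the windowed interface, dialog boxes, and drop-down menus make it easy for ordinary computer users to get started, which has made SPSS one of the most widely used statistical packages.

In my econometrics course we used a package called Stata. I had not known it existed, but after using it I found it quite handy, with some distinctive features. Stata originally relied only on simple commands: researchers write a series of commands to organize the data and run the statistical analysis. Later versions added a windowed interface, dialog boxes, and menus similar to those of SPSS. For me, the most convenient thing about Stata (an advantage our instructor stressed repeatedly) is that the whole set of commands can be saved as a do file, so the same analysis can be run again and again. To change some step of the analysis, you only edit the relevant commands and the computer redoes the work, which saves a lot of time and reduces the chance of error. With dialog boxes, by contrast, every run means ticking the options all over again, which is time-consuming and error-prone. A friend later told me that SPSS has a command file feature similar to Stata's do file, so that anyone who knows its commands can also rerun a whole batch.

In day-to-day operation, then, I find SPSS and Stata much the same. I personally prefer the way Stata presents results, which feels more concise. Stata also offers a more complete set of statistical functions than SPSS. In regression analysis especially, Stata covers more ground, and it is more flexible than SPSS when testing hypotheses. Neither SPSS nor Stata is cheap. Stata does have cheaper small and student versions, but these limit the amount of data and the number of variables, and Stata charges every year. Both packages also have limits on the amount of data they can handle; for very large data files (for example, files with a great many records) they sometimes cannot cope.

For handling large volumes of data, SAS is currently regarded as the best statistical package. Like Stata, SAS carries out data management and statistical analysis through written commands. Its feature set is very complete, and it is probably the package that professional statisticians rely on most. Its drawback is that there is no single-user version, only an institutional one, billed annually and not cheaply. In other words, SAS is offered only to institutions or groups. Researchers who do not work at an institution, or whose institution has no contract with SAS, have no access to it.

This summer I took Dr. Luke's course on strategy for health care organizations. In class Dr. Luke uses another package called JMP, one of the newer products from the SAS group. It is a single-user packaged program and feels quite compact and friendly. JMP is interactive analysis software that also uses windows and dialog boxes. Each action the researcher performs shows its result immediately, and the user can then organize or analyze the data further on that basis until the desired final result appears. JMP presents results with many graphics, so the analyst can take them in at a glance. I also find JMP's data management the friendliest I have used so far: the analyst can easily manipulate the data as intended and immediately get the layout wanted. As far as I know JMP can handle fairly large data sets, but its statistical coverage is probably not as broad or complete as that of Stata or SAS. Another drawback is that it has no command file feature, so every action has to be repeated through dialog boxes and drop-down menus.

Recently a friend introduced me to an open-source statistical package called R. I am only beginning to learn about it. As far as I know, R's statistical functions are also quite complete and it can handle large amounts of data. Analysis is done by writing commands, but unlike SAS, SPSS, and Stata, R is also interactive, letting the analyst proceed step by step and layer by layer until the desired result is reached. R also seems to be able to run batch files. Its biggest advantage is that it is freely available at no cost. R is said to be hard to pick up, however, because it is developed jointly by statistically expert users online and is therefore still changing, and because there is no single complete manual; the documentation is scattered, and finding the right guide often takes time. For that reason a kind statistician has set up a website that introduces R to people with some experience of other statistical packages, hoping to shorten the time spent groping around. Wikipedia also has a comparison of the features of the various statistical packages that is worth consulting.

Sometimes I wonder why there have to be so many different statistical packages. Each has its strengths and weaknesses, so someone like me touches each one, learns a little, masters none, and still has to work out which to adopt as the main tool. Too many choices can be a headache. At such times I wish for a perfect statistical package: free, able to handle unlimited data, covering every statistical function, able to run batch command files, equipped with windows and dialog boxes, and easy to learn.

http://www.statmethods.net/index.html
http://en.wikipedia.org/wiki/Comparison_of_statistical_packages

Reader replies follow:

The diversity of these statistical packages is directly related to competition in the software market. But it is because the considerations and needs behind the design of statistical software go beyond the level of the individual user that this market has become so complicated.

Looking at the history of computing: in the 1950s and even the 1960s, before powerful personal computers existed and when storage was very limited, large volumes of data were stored and output under the control of central systems, namely the mainframes developed by IBM. The database language of the time was Cobol, and the language for mathematical computation was Fortran.

Fortran counts as a mid-level language, only one level above the most basic Assembly language, and for researchers who were not computer language specialists, learning programming was very hard. To meet this need for something friendlier to ordinary users, several social science researchers teaching at Chicago University worked with computer experts to develop the first generation of SPSS. Note that although SPSS is software, an environment that reads data, computes, and presents results, it is also a new computer language: a new syntax built on top of Fortran. That is, the system first had to translate SPSS into Fortran, then Fortran into Assembly language, and only then execute the actual instructions. SPSS thus became the highest-level language, and the name also refers to the environment in which that language controls computation and output. This model became the template for all later statistical software languages. At first SPSS also required users to write the language (syntax) themselves. The windowed environment and mouse clicking came much later.

The first few generations of SPSS ran into problems of compatibility between the language and IBM systems. The real issue was that IBM systems kept being updated and Fortran could not keep up, which in turn constrained SPSS. In addition, the first users of SPSS were not professional programmers and were not good at reading and storing data on the large systems. These two problems became the fatal weakness that kept SPSS from being widely adopted in business and industry. Things improved only once the speed and storage capacity of personal computers could rival the large systems.

SAS began development and entered the software market around the late 1960s, but was not widely used until the early 1970s. SAS, too, was originally written in Fortran. What let it overtake SPSS and move into business and industry was the invention of the C language. C is easier to use than Fortran and talks to Assembly language more flexibly. In other words, C is more easily compatible with the large systems and makes it convenient for users to read, write, and store data on them. The people at the SAS company had more business foresight. As soon as they saw C's potential, they switched later versions of SAS to C rather than Fortran. Professional programmers who knew C could therefore learn SAS easily. Companies had no shortage of money to hire people to learn the newest things: they hired the newest C programmers as system administrators and found that these people also knew a little SAS (and vice versa), while statisticians were hired separately. This produced teams in which system administrators and statisticians cooperated, with SAS programmers as the interface between the two, and pushed companies to adopt SAS on a large scale. That is how SAS beat SPSS in overall marketing and nearly monopolized the statistical software market in the 1980s.

In the early 1980s C was also used to develop another management system, Unix, which became a strong rival to the IBM mainframe systems. From the 1990s it began to displace IBM mainframes and become the new system management platform. SAS made the right move back then and is now riding the crest of the wave, far ahead of SPSS. This is especially so in pharmaceutical companies that run clinical trials. For example, when the US Food and Drug Administration reviews clinical trial reports submitted by drug makers, it accepts and requires that the data be submitted in SAS file format. (Recently, to break the SAS monopoly, the rules also allow submission in XML, a format standard that is not tied to any software's file format.)

SAS did not overtake SPSS in marketing alone. In the flexibility and breadth of the language itself, it is large enough to suit different kinds of enterprise. The strength of the language is that it can do everything without relying on windows: the SAS language itself can read and write all kinds of commercial databases, produce entirely new report formats, and create new statistical applications, much as one would with C. SAS programming has itself become a profession. The computing skills of the great majority of people who use only SPSS really do not compare. For most students and teachers in research institutions, though, mastering SAS programming takes a lot of study and is genuinely difficult.

STATA was originally developed jointly by econometricians and programmers, specifically for forecasting economic indicators. Its developers could see the strengths and weaknesses of both SPSS and SAS. STATA's language environment therefore looks like a mixture of the two: using the windows and writing pure code give almost the same capabilities (actions clicked in the windows can be recorded as a do file and submitted for execution in batch mode). Modern SPSS has likewise found that windows alone sacrifice speed and limit use, so it lets users press the paste button in each main window to record the syntax into a syntax file, similar to a do file, which can also be run in batch mode.

Once you know STATA a little better, you learn that it handles data in a way very different from SPSS and SAS. This is STATA's strength, though pickier people call it a weakness.

SPSS and SAS process data by reading one record (record or case) into the main work memory, passing it through each computational function, writing it out to another work memory, and then reading the next record into the main work memory. It is a linear flow in the order in which the records are stored. One record comes in and goes out before the next can be read. A record that has been processed and written out cannot be revisited or processed again in reverse. This flow is not explicit in the SPSS language, but it is how the system works under the hood. So most SPSS users, unless someone has explained it to them, have no idea why some commands must be written before others; over time they never work out how to write good programs and end up relying only on the windows.

The SAS language clearly separates reading data from the computational functions that process it (called procedures in SAS, abbreviated proc). The part of the syntax that reads data is called the data step. It too reads one record at a time, does basic rewriting or manipulation, writes the record out to another work memory, and then reads the next.

STATA, however, breaks away from this record-by-record processing. It reads all the records into the main work memory at once and organizes them as a table like one in Excel, essentially a matrix. The user can specify which variables to use and which records to select, and can process them by skipping around or in reverse order. For example, in SPSS or SAS you write the program as if a single record had been read in, and what you can do is arithmetic among the values of different variables on that same record (the horizontal direction). Besides such horizontal operations, STATA can, in a single action, do arithmetic among the values of different records under one particular variable (the vertical direction). It is therefore more flexible.

The catch is that, because all the data are read into work memory at once to build the matrix, the work memory must be large enough. The RAM in an ordinary personal computer today is enough for academic research data. But the databases used in business, such as a bank's millions of accounts or health insurance data, are sometimes too large for a personal computer's RAM to load in one go. So however much the academic and educational communities praise STATA's flexibility, they have not considered the practical concerns of business, and they are puzzled by the business world's loyalty to SAS alone.

R's basic structure for handling data is closer to SPSS and SAS. It evolved from S, a language created at ATT Bell lab in the 1980s, and is the most recent development. The number of records is therefore less of a concern. But it is not as fast as SAS. Although many statisticians have written the newest statistical functions for it, user documentation is comparatively lacking because there is no market incentive. If these two problems are gradually solved, R may indeed free users in future from having to pay for other packages and keep upgrading versions. At present, however, R users who try to write R tutorials that map to the programs of other packages are being obstructed in every way by other companies on copyright grounds.

Seeing SAS's advantage, the SPSS company later hurried to rebuild on C and stopped relying on Fortran. But once the user communities had split and settled, with most professional programmers gravitating to the SAS camp, the later development of SPSS was constrained. Why?

A new programming language and its software need roughly ten years at least to mature. The people who first develop a new language and write new software cannot understand the needs of every industry. They learn how to improve and upgrade only from the feedback and evaluation of users. As the previous comment said, the early users of SPSS were only social science researchers in universities. Their statistical skills may have been good, but they lacked awareness of the breadth of computer languages and were not adept at handling SPSS's compatibility with management systems. To keep this core group of customers, SPSS was the first of all the statistical packages to adopt a windowed environment when personal computers took off, for the convenience of ordinary users with limited programming skills. At the time this looked like the most progressive approach, but in the long run it was self-limiting and made professional programmers even more inclined to avoid SPSS. The reason: if everyone can use windows and a mouse, who needs professional programmers? What would programmers live on?

So although SAS was later to adopt the windowed environment of the personal computer, it had more staying power and could rely on a user community with strong programming skills to develop more powerful new versions together, for use in all kinds of business environments.

STATA saw the outcome of this tortoise-and-hare race between SAS and SPSS. It provides a windowed environment for the general public, but it has not dared to neglect the STATA language itself, hoping to keep users of all levels. Put plainly: writing SPSS syntax is slightly more trouble than using the windows; conversely, writing SAS code is far more powerful than SAS's point-and-click functions; writing STATA code is about as efficient as using its windows, but learning to write STATA syntax directly lets users design their own statistical procedures and extend their data handling. Professional programmers who learn STATA therefore have more room than ordinary users to apply their skills and earn a living. This approach is also what gives people at either level, whether they came from SPSS or from SAS, a reason to switch to STATA.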
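The reader reply above distinguishes "horizontal" arithmetic within one record from "vertical" arithmetic across records of one variable. A minimal Python sketch of that distinction follows. The function names `row_totals` and `first_differences` and the field names `x` and `z` are hypothetical, chosen only for illustration; this is not how any of the packages is implemented.

```python
def row_totals(records):
    """Horizontal: combine variables within one record at a time.

    Works as a stream; each record is handled and then forgotten,
    so the whole data set never has to be in memory.
    """
    for rec in records:
        yield rec["x"] + rec["z"]


def first_differences(column):
    """Vertical: combine values of one variable across records.

    Needs access to the neighbouring record, which is easy when the
    whole column is held in memory. The first record has no
    predecessor, so its difference is None (a missing value).
    """
    return [None] + [b - a for a, b in zip(column, column[1:])]


records = [{"x": 1, "z": 2}, {"x": 3, "z": 4}]
totals = list(row_totals(records))
diffs = first_differences([10, 13, 19])
```

The streaming function only ever sees one record, while the column function assumes the full column is available, which mirrors the memory trade-off the reply describes.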
Category: Reposted posts | 5154 views | 0 comments
Classic Stata commands: panel
zhao1198 2010-2-12 09:11
Panel Data Econometrics: Stata Commands

***************************
* Panel setup and descriptive statistics
. tsset    // Declare a dataset to be panel data
    panel variable: firmid (unbalanced)
    time variable: yeara, 1990 to 2006, but with gaps
. xtdes
    firmid: 1, 2, ..., 3218    n = 3219
    yeara: 1989, 1990, ..., 2006    T = 18
    Delta(yeara) = 1; (2006-1989)+1 = 18
    (firmid*yeara uniquely identifies each observation)
. xtsum    // Summarize xt data
. xttab    // Tabulate xt data

***************************
* Hausman test
* http://fmwww.bc.edu/ec-c/s2009/327/ec327.s2009.php
use traffic, clear
summarize fatal beertax spircons unrate perincK

* Fixed-effects (within) regression
xtreg fatal beertax spircons unrate perincK, fe

* Fixed-effects (within) regression, adding year dummies first
qui tabulate year, generate(yr)
local j 0
forvalues i=82/87 {
    local ++j
    rename yr`j' yr`i'
    qui replace yr`i' = yr`i' - yr7
}
drop yr7
xtreg fatal beertax spircons unrate perincK yr*, fe
test yr82 yr83 yr84 yr85 yr86 yr87

* Between regression (regression on group means)
xtreg fatal beertax spircons unrate perincK, be

* Random-effects GLS regression
xtreg fatal beertax spircons unrate perincK, re

* Hausman test
qui xtreg fatal beertax spircons unrate perincK, fe
estimates store fix
qui xtreg fatal beertax spircons unrate perincK, re
hausman fix .

***************************
* Dynamic panel: xtabond2
* http://fmwww.bc.edu/ec-c/s2009/327/xtabond2.pdf
use http://www.stata-press.com/data/r7/abdata.dta, clear

* Dynamic panel-data estimation, one-step difference GMM
xtabond2 n l.n l(0/1).(w k) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984) ///
    noleveleq small

* Dynamic panel-data estimation, two-step system GMM
xtabond2 n l.n l(0/1).(w k) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984, mz) ///
    robust twostep small h(2)
xtabond2 n l(1/2).n l(0/1).w l(0/2).(k ys) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984) ///
    robust twostep small
xtabond2 n l(1/2).n l(0/1).w l(0/2).(k ys), gmm(w k, lag(1 .)) gmm(ys, lag(2 .)) iv(yr198*, eq(lev)) ///
    robust twostep

***************************
* Dynamic panel: xtabond2
* http://fmwww.bc.edu/ec-c/s2009/327/ec327.s2009.php
* http://fmwww.bc.edu/ec-c/s2009/327/xtabond2.pdf

* Dynamic panel-data estimation, two-step difference GMM
xtabond2 fatal L.fatal spircons year, ///
    gmmstyle(beertax spircons unrate perincK) ///
    ivstyle(year) twostep robust noleveleq

* Dynamic panel-data estimation, two-step system GMM
xtabond2 fatal L.fatal spircons year, ///
    gmmstyle(beertax spircons unrate perincK) ///
    ivstyle(year) twostep robust

-----------------------
Using Arellano Bond Dynamic Panel GMM Estimators in Stata (Elitza Mileva)

tsset ctry_dum year
ssc install xtabond2, replace

* Dynamic panel-data estimation, one-step difference GMM
// * gmm( ) lists the endogenous variables.
// * lag(2 2) instructs xtabond2 to use only the second lag of the endogenous variables as instruments.
// * iv( ) lists all strictly exogenous variables (l.growth, uncert, tot, dev_m2) as well as the additional instrumental variables (fin_integr, trans_index, flows_eeca), which are not part of equation (1) and, therefore, are not listed before the comma in the Stata command.
// * nolevel (or noleveleq) tells Stata to apply the difference GMM estimator. By default xtabond2 applies system GMM if you don't specify nolevel.
// * small tells Stata to use the small-sample adjustment and report t- instead of z-statistics and the F test instead of the Wald chi-squared test.
// * twostep specifies that the two-step estimator is calculated instead of the default one-step. In two-step estimation, the standard covariance matrix is robust to panel-specific autocorrelation and heteroskedasticity, but the standard errors are downward biased. Use twostep robust to get the finite-sample corrected two-step covariance matrix.
// * robust specifies that the resulting standard errors are consistent with panel-specific autocorrelation and heteroskedasticity in one-step estimation.
xtabond2 inv l.inv fdi loans portfolio l.growth uncert tot dev_m2, gmm(inv fdi loans portfolio, lag(2 2)) iv(fin_integr trans_index flows_eeca l.growth uncert tot dev_m2) nolevel small

* Dynamic panel-data estimation, one-step system GMM
// * equation() is a sub-option that specifies which equation should use the instruments: first-difference only (equation(diff)) or levels only (equation(level)). The default is both equations.
xtabond2 inv l.inv fdi loans portfolio l.growth uncert tot dev_m2, gmm(inv fdi loans portfolio, lag(3 3)) iv(fin_integr trans_index flows_eeca l.growth uncert tot dev_m2) small noconst
Category: Stata | 181 views | 0 comments
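To make the "fixed-effects (within) regression" step above more concrete, here is a small Python sketch of the within transformation for a single regressor: each variable is demeaned by panel unit before the slope is computed. This is an illustration under simplifying assumptions (one regressor, no standard errors), not a reproduction of what `xtreg, fe` does internally. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """Ordinary least-squares slope that ignores the panel structure."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


def within_slope(groups, x, y):
    """Slope after subtracting each unit's own mean from x and y."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # unit -> [sum x, sum y, count]
    for g, xi, yi in zip(groups, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sx, sy, n = sums[g]
        xd = xi - sx / n  # deviation of x from the unit mean
        yd = yi - sy / n  # deviation of y from the unit mean
        num += xd * yd
        den += xd * xd
    return num / den


# Toy panel: y = unit effect + 2 * x, with unit effects 10 and -5.
groups = [1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 3, 5, 7]
fe = within_slope(groups, x, y)
ols = pooled_slope(x, y)
```

In this constructed data the unit effects are correlated with `x`, so the pooled slope comes out negative while the within slope recovers the value 2 used to build the data. That gap between estimators is the kind of difference the Hausman test above is designed to examine.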
[Repost] Command for converting an unbalanced panel to a balanced panel: xtbalance
zhao1198 2009-10-2 17:26
Stata: Unbalanced to Balanced
Command for converting an unbalanced panel to a balanced panel: xtbalance
http://blog.cnfol.com/arlion/article/1183850.html

Usage example:
xtbalance, range(1998 2005)

After downloading and unzipping, put the file in the personal folder. It can also go in another folder, but then that folder's path has to be added with the adopath + command.

Help file:
---------------------------------------------------------------------------------------------------------------
help for xtbalance                                                                                version 1.0
---------------------------------------------------------------------------------------------------------------
Transform the dataset into a balanced panel

xtbalance, range(numlist)

You must tsset your data before using xtbalance; see help tsset.

Description:
xtbalance transforms the dataset into a balanced panel with the sample range specified by option range.

Options:
range(numlist) specifies the sample range to be transformed. numlist must be two integers specified in ascending order.

Examples:
. help xtbalance
. xtbalance, range(1998 2005)

For problems and suggestions login my blog http://blog.cnfol.com/arlion
Author: Yu-Jun Lian, Jinhe Center, Xi'an Jiaotong University, China.
==================
FAQ: how to deal with problems that may come up during installation.

Reader: For some reason, even following your method I still cannot add xtbalance to STATA 9.0. So frustrating!

Reply from the blog owner: Run the following command and then try xtbalance again.
adopath + D:\stata9\ado\personal

Reply from the blog owner: I don't know whether you have set up a profile.do file in your Stata. If not, you can create one. Its purpose is to define some basic settings, which are executed automatically each time Stata starts. To set it up, paste the code below into the do-file editor and save it in D:\stata9 under the name profile.do. You can of course add or remove commands to suit your own needs.

adopath + D:\stata9\ado\personal
adopath + D:\stata9\ado\personal\invt
adopath + D:\stata9\ado\personal\update2
//adopath + D:\statawd\chung
//adopath + D:\statawd\mine
local fn = subinstr("`c(current_time)'", ":", "", 2)
log using d:\stata9\ado\do\s`fn'.log, text replace
cmdlog using d:\stata9\ado\do\c`fn'.log, replace
sysdir set PLUS D:\stata9\ado\plus
sysdir set OLDPLACE D:\ado
sysdir set PERSONAL D:\stata9\ado\personal
set matsize 2000
set more off, perma
cd d:\stata9\ado\personal

The following commands keep the time span unchanged and convert an unbalanced panel to a balanced one:

tsset firm year, yearly
xtdes
by firm: gen obs = _N
drop if obs < r(max)

Attachment: xtbalance_ado
Category: Stata | 288 views | 0 comments
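The idea behind balancing a panel over a range can be sketched in a few lines of Python. This is an illustration of the concept only, under the assumption that balancing means keeping just those units observed in every period of the range and dropping observations outside it; the help text above does not spell out the exact rules xtbalance applies, and the function name `balance_panel` and the tuple layout are hypothetical.

```python
def balance_panel(rows, first, last):
    """Keep units observed in every period from first to last (inclusive).

    rows is a list of (unit, period, value) tuples. Observations outside
    the range are dropped, and so are all observations of any unit that
    is missing at least one period inside the range.
    """
    needed = set(range(first, last + 1))
    seen = {}
    for unit, period, _value in rows:
        seen.setdefault(unit, set()).add(period)
    complete = {unit for unit, periods in seen.items() if needed <= periods}
    return [r for r in rows if r[0] in complete and first <= r[1] <= last]


rows = [
    ("a", 1998, 1.0), ("a", 1999, 2.0), ("a", 2000, 3.0),
    ("b", 1998, 4.0), ("b", 2000, 5.0),                      # b lacks 1999
    ("c", 1997, 6.0), ("c", 1998, 7.0), ("c", 1999, 8.0), ("c", 2000, 9.0),
]
balanced = balance_panel(rows, 1998, 2000)
```

Unit `b` is removed because it has a gap in 1999, and unit `c` loses its 1997 observation because that year lies outside the requested range.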
How do I graph data onto a map with tmap? (Stata FAQ)
zhao1198 2009-9-28 13:11
How do I graph data onto a map with tmap? http://www.stata.com/support/faqs/graphics/tmap.html
Category: Stata | 2847 views | 0 comments
Guide to creating maps with Stata
zhao1198 2009-9-28 13:09
Guide to creating maps with Stata The graphs and maps on this site are created with the Stata statistical package. This article describes how to make maps like those showing Millennium Development Goal regions and UNICEF regions in Stata from a shapefile. Shapefiles store geographic features and related information and were developed by ESRI for its ArcGIS line of software. The shapefile format is used by many other programs and maps in this format can be downloaded from various sites on the Internet. Another common map format is the MapInfo Interchange Format for use with the MapInfo software. Shapefile data is usually stored in a set of three files (.shp, .shx, .dbf), while MapInfo data is stored in two files (.mif, .mid). Some sources for shapefiles and other data are listed on the website of the U.S. Centers for Disease Control and Prevention (CDC) under Resources for Creating Public Health Maps . The CDC itself provides shapefiles for all countries with administrative boundaries down to the state level. Please note that these shapefiles are not in the public domain and are intended for use with the CDC's Epi Info software only. Other sources of shapefiles can be found with a Google search. This guide is divided into two parts. Read part 1 if you have Stata 9 or 10 and part 2 if you have Stata 8. The creation of maps is not supported in older versions of Stata. Part 1: Creating maps with Stata 9 or 10 To create a map with Stata 9 or 10 you need the following software. Stata version 9.2 or newer. spmap: Stata module for drawing thematic maps, by Maurizio Pisati. Install in Stata with the command ssc install spmap . shp2dta: Stata module for converting shapefiles to Stata format, by Kevin Crow. Install in Stata with the command ssc install shp2dta . Shapefile: For the example in this guide, download world_adm0.zip (646 KB), a shapefile that contains the boundaries of all countries of the world. 
Step 1: Convert shapefile to Stata format Unzip world_adm0.zip to a folder that is visible to Stata. The archive contains three files called world_adm0.dbf, world_adm0.shp, and world_adm0.shx. Start Stata and run this command: shp2dta using world_adm0, data(world-d) coor(world-c) genid(id) Two new files will be created: world-d.dta (with the country names and other information) and world-c.dta (with the coordinates of the country boundaries). If you plan to superimpose labels on a map, for example country names, you should run the following command instead, which will add centroid coordinates to the file world-d.dta: shp2dta using world_adm0, data(world-d) coor(world-c) genid(id) genc(c) Please refer to the spmap documentation to learn more about labels because they are not covered in this guide. The DBF, SHP, and SHX files can be deleted. Some shapefiles are not compatible with the shp2dta command and Stata will abort the conversion with an error message. If this is the case, you can use a combination of two other programs, shp2mif and mif2dta. These programs are explained in the instructions for Stata 8 (see Step 1 and Step 2 in part 2 of this guide). Step 2: Draw map in Stata Open world-d.dta in Stata. The file contains no country-specific data that could be used for this example so we will create a variable with the length of each country's name. The Stata command for this is: generate length = length(NAME) Draw a map that indicates the length of all country names with this command: spmap length using world-c.dta, id(id) Be patient because spmap is slow if a map contains many features. The default map is monochrome, it shows Antarctica, the legend is too small and the legend values are arranged from high to low. 
We can draw a second map without Antarctica, with a blue palette, and with a bigger legend with values arranged from low to high: spmap length using world-c.dta if NAME!="Antarctica", id(id) fcolor(Blues) legend(symy(*2) symx(*2) size(*2)) legorder(lohi) You now have the map below. Darker colors indicate longer names, ranging from 4 letters (for example Cuba and Iraq) to 33 letters (Falkland Islands (Islas Malvinas)). To customize the map further, please read the Stata help file for spmap. Map created with spmap in Stata: length of country names The instructions above can be used to convert any shapefile to Stata format. If you have maps in MapInfo format you have to use another program called mif2dta that is described in part 2 of this guide. Part 2: Creating maps with Stata 8 To create a map with Stata 8 you need the following software. Stata version 8.2. tmap: Stata module for thematic mapping by Maurizio Pisati. Install in Stata with the command ssc install tmap . mif2dta: Stata module for converting files from MapInfo to Stata format, also by Maurizio Pisati. Install in Stata with the command ssc install mif2dta . SHP2MIF: DOS program for converting shapefiles to MapInfo format. Go to the website of RouteWare and click on SHP2MIF (135 Kb) under the heading Converters to get ishp2mif.zip. Shapefile: For the example in this guide, download world_adm0.zip (646 KB), a shapefile that contains the boundaries of all countries of the world. Step 1: Convert shapefile to MapInfo format Unzip ishp2mif.zip. The archive contains three files, among them SHP2MIF.EXE. Unzip world_adm0.zip to the same folder as SHP2MIF.EXE. The archive contains three files called world_adm0.dbf, world_adm0.shp, and world_adm0.shx. Open a DOS command window: Windows Start menu - Run - command - OK. Change the path in the command window to the folder that contains SHP2MIF.EXE and the three map files. Use the DOS command cd to change the path.
SHP2MIF works best with short file names in the 8.3 format (name up to 8 characters, extension up to 3 characters). Rename the map files with this DOS command: rename world_adm0.* world.* The map files are now called world.dbf, world.shp, and world.shx. Convert the maps to MapInfo format by typing shp2mif world in the DOS command window. This produces two new files: WORLD.MID and WORLD.MIF. Close the DOS command window. The DBF, SHP and SHX files can be deleted. Step 2: Convert MapInfo files to Stata format Move the MIF and MID files to a folder that is visible to Stata. Start Stata and run this command: mif2dta world, genid(id) Two new files will be created: world-Coordinates.dta (with the country boundaries) and world-Database.dta (with the country names and other information). If you plan to superimpose labels on a map, for example country names, you should run the following command instead, which will add centroid coordinates to the file world-Database.dta: mif2dta world, genid(id) genc(c) Please refer to the tmap documentation to learn more about labels because they are not covered in this guide. The MIF and MID files can be deleted. Step 3: Draw map in Stata Open world-Database.dta in Stata. The file contains no country-specific data that could be used for this example so we will create a variable with the length of each country's name. The Stata command for this is: generate length = length(name) Draw a map that indicates the length of all country names with this command: tmap choropleth length, map(world-Coordinates.dta) id(id) Be patient because tmap is slow if a map contains many features. The default map is monochrome, it shows Antarctica and the legend is too small. 
We can draw a second map without Antarctica, with a blue palette, and with a bigger legend: tmap choropleth length if name!="Antarctica", map(world-Coordinates.dta) id(id) palette(Blues) legsize(2) To reduce the margins, display the graph again and set the margins to zero: graph display, margins(zero) You now have the map below. Darker colors indicate longer names, ranging from 4 letters (for example Cuba and Iraq) to 33 letters (Falkland Islands (Islas Malvinas)). To customize the map further, please read the Stata help file for tmap and the tmap user's guide by Maurizio Pisati. The user's guide and additional tmap files can be downloaded in Stata with the commands ssc describe tmap and net get tmap . Map created with tmap in Stata: length of country names The instructions above can be used to convert any shapefile to Stata format. If you have maps in MapInfo format you can skip step 1 of the instructions and start with step 2. Related articles Guide to integrating Stata and external text editors Guide to creating PNG images with Stata Guide to reading Statalist with Gmail External links Stata FAQ: How do I graph data onto a map? Wikipedia article on shapefiles Wikipedia article on MapInfo Interchange Format Resources for Creating Public Health Maps from the Centers for Disease Control and Prevention (CDC) Friedrich Huebler, 6 November 2005 (edited 30 June 2009), Creative Commons License Permanent URL: http://huebler.blogspot.com/2005/11/creating-maps-with-stata.html
Category: Stata | 6059 views | 0 comments
[STATA]HAIF: Stata module to compute Homoskedastic Adjustment Inflation Factors
zhao1198 2009-9-14 17:34
haif calculates homoskedastic adjustment inflation factors (HAIFs) for core variables in the corevarlist, caused by adjustment by the additional variables specified by addvars(). HAIFs are calculated for the variances and standard errors of estimated linear regression parameters corresponding to the core variables. For each variance (or standard error), the HAIF is defined as the ratio between that variance (or standard error) of that parameter, in a model containing both the core variables and the additional variables, to the corresponding variance (or standard error) of the same parameter, in a model containing only the core variables, calculated assuming that the second model is true, and also assuming that the outcome variable is homoskedastic (meaning that it has equal variances in all subpopulations defined by the predictor variables). haifcomp calculates the ratios between the HAIFs for the same core variables caused by adjustment for two alternative lists of additional variables, namely a numerator list and a denominator list. haif and haifcomp are intended for use in model selection, allowing the user to choose a model based on the joint distribution of the exposures and confounders, before estimating the parameters of the model from the data on the outcome variable.
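For intuition about the ratio defined above, consider the simplest case: one core variable x, one additional variable z, and a constant in both models. Under homoskedasticity the variance of the x coefficient is proportional to 1/Sxx in the core model and to 1/(Sxx(1 - r^2)) in the larger model, where r is the sample correlation between x and z, so the variance ratio reduces to 1/(1 - r^2). The Python sketch below computes that special case. It is a worked illustration of the definition, not the haif module's implementation, and the function name `haif_single` is hypothetical.

```python
import math


def haif_single(x, z):
    """Variance and standard-error inflation for one core variable x
    when one additional variable z is added (constant in both models).

    Returns (variance ratio, standard-error ratio).
    """
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    r2 = sxz ** 2 / (sxx * szz)    # squared correlation between x and z
    var_ratio = 1.0 / (1.0 - r2)   # inflation of the variance
    return var_ratio, math.sqrt(var_ratio)


# Uncorrelated x and z: adjusting for z costs nothing in this special case.
no_inflation = haif_single([-1, 1, -1, 1], [-1, -1, 1, 1])

# Correlation 0.8 between x and z: variance inflated by 1 / (1 - 0.64).
inflated = haif_single([1, 2, 3, 4], [1, 3, 2, 4])
```

Because the ratio depends only on x and z, it can be computed before the outcome variable is looked at, which matches the stated purpose of choosing a model from the joint distribution of exposures and confounders.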
Category: Stata | 3811 views | 0 comments
[STATA]SORTOBS: Stata module to sort observations according to a specified order
zhao1198 2009-9-14 17:28
http://ideas.repec.org/c/boc/bocode/s457003.html sortobs allows the user to sort observations by either (1) a variable's specific values or (2) observation numbers. Observations that are not specified in the command retain their original, respective sort orders.
Category: Stata | 2854 views | 0 comments
[STATA]GREP: Stata module to search within your datasets for keywords
zhao1198 2009-9-14 17:25
http://ideas.repec.org/c/boc/bocode/s457002.html grep emulates the Unix/Linux command of the same name and will of course run on all operating systems. You can use it to parse any list of dta files and find the ones with variables whose names or variable labels contain strings that interest you. It displays the results in SMCL format, and they are clickable so you can directly describe the results. Furthermore, it returns everything, including the datasets and variables found, so you can program on top of it.
Category: Stata | 2878 views | 0 comments
SEQCOMP, a sequence analysis Stata plug-in
zhao1198 2009-8-31 18:23
SEQCOMP, a sequence analysis Stata plug-in
http://laurent.lesnard.free.fr/article.php3?id_article=8

Version 1.0. Available for Stata (v9 and higher), Mac (Intel and PPC) and Windows.
Wednesday 28 May 2008

This Stata plug-in implements a sequence analysis method which has been presented in a working paper and previously in an article published in the Electronic International Journal of Time Use Research, Vol. 1 No. 1, pp. 67-91. Social sciences lack solutions to perform sequence analysis. This paper presents the Stata plug-in which was developed to implement a sequence analysis method I thought up to build a taxonomy of work schedules.

Warning! Prior to version 0.7, the plugin was not the exact implementation of the formula proposed [1] here. Many thanks to Renzo Carriero who pointed that out to me. First version: 7 December 2006

A sequence comparison method based on the sole substitution operations

Although this method can be seen as a particular case of Optimal Matching, it is only a distant relative since only substitution operations are used. As a consequence, this method is only suitable for sequences of identical length. In a way, this method is closer to the Hamming distance, which is usually considered as the ancestor of the Levenshtein distance (OM). Hence, a possible name for this method could be dynamic Hamming dissimilarity measure. Indeed, substitution costs are not equal to one unit as in the Hamming distance but are derived from the series of transition matrices which describe, between two episodes, the fluctuations between the states considered in the analysis. More precisely, sizable transitions between two states between t and t+1 mean that they are close in probabilistic terms: the chances of switching between the two states are high. On the contrary, if few transitions are observed between two states, these two states are distant. Work schedules can be summarized by a two-state (work and no work) process. At 9 AM, transitions between work and no work are presumably more frequent than at 9 PM and consequently, workers and non-workers will be considered as close at 9 AM and very distant at 9 PM.

As a sequence comparison method, the end result is a matrix composed of the dissimilarity for every pair of sequences. A data reduction technique, such as cluster analysis or multidimensional scaling (MDS), is needed if these dissimilarities are to be exploited.

Content of the zip file

A Stata plug-in is actually composed of two distinct files:
the plug-in strictly speaking, whose extension is simply plugin [2];
an ado file, named here seqcomp.ado, an interface to distseq.plugin.

These two files must be unzipped into your local personal ado folder, installed somewhere on your computer. Once these two files are installed, the plugin can be used through basic Stata syntax:

seqcomp varlist

varlist, the first argument, should contain the list of variables that make up the sequences to be analyzed. The analysis can be restricted to certain sequences through the if option, and weights can also be used [3]. Typical use is:

seqcomp episode1-episode100

The dissimilarities computed by the plugin are available as a Stata dissimilarity matrix named dhamdist. Note that the size of this matrix does not depend on matsize, hence it can be way over 800 for Stata Intercooled users and way over 11,000 for Stata SE ones. Getting the dissimilarities as a Stata matrix slows things down a little, so it is possible to disable this feature using the nodistmat option. In this case the export option, which saves the result in a dissimilarity list, becomes compulsory (results have to be stored somewhere!). The using clause is also compulsory when export is chosen, as it indicates where the results are to be stored. Note that the file path must end with the appropriate folder separator. For example

seqcomp episode1-episode100 using C:\temp\, export nodistmat

will analyse all the sequences in the variables from episode1 to episode100 and will put the results in C:\temp\.

id() is optional but useful when export is chosen, as it tells seqcomp which variable identifies observations: a file mapping this variable to the internal id used to compute dissimilarities will be produced. Weights are taken into account for the calculation of the transition matrices but not for matching, which is by definition a one-to-one comparison. When weights are turned on, it is the user's responsibility to use them again properly in the data reduction stage.

Results are made of three files if the export option is chosen:
substitution.dat, which contains the series of the substitution cost matrices;
distancelist.dat, which presents the dissimilarity matrix as a dissimilarity list file with three columns: dissimilarities are located in the third column, whereas the ids of the pairs of sequences can be found in the first two columns:
2 1 x
3 1 x
3 2 x
4 2 x
1 3 x
...
idmapping.dat, made of two columns: the first one lists the internal ids of observations and the second gives their true id.

The dissimilarity list is the most efficient way of storing a dissimilarity matrix and is quite easy to use with standard statistical packages, in particular with the cluster package ClustanGraphics, which reads proximity lists without problem. Stata itself reads proximity lists but is restricted to small matrices [4]. However, Stata is not good when it comes to cluster analysis: few (old) algorithms are available. SAS and ClustanGraphics are better in this field but neither features the latest methods.

Why write a plug-in and not a classical Stata ado file with Mata statements?

The principle of sequence analysis is quite simple but requires a lot of computer memory. Stata is not good at managing memory with such procedures, and the only solution is to program these elements in C.

Notes
[1] Differences are likely to be minor but users are advised to check on their data.
[2] This extension is hiding a dll.
[3] The keyword iw is used since version 0.4 in place of aw: iw is used to reflect the relative importance of observations (post-stratification etc.) whereas aw is inversely proportional to some variance measure (and as a consequence has nothing to do with sampling considerations).
[4] Matrix maximum size is 800 for Stata Intercooled and 11,000 for Stata Special Edition (SE).
Category: Stata | 4353 views | 0 comments
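The description above can be turned into a small Python sketch of a Hamming-style dissimilarity whose substitution costs vary over time and are derived from observed transition rates. The page does not state the exact cost formula, so the cost used here, 2 minus the two transition rates between the pair of states, is an assumption made for illustration, as is the choice to reuse the last available transition at the final position. All function names are hypothetical, and this is not the SEQCOMP implementation.

```python
from collections import Counter


def transition_rates(sequences, t):
    """Share of sequences moving from state a at t to state b at t+1."""
    pair_counts = Counter((s[t], s[t + 1]) for s in sequences)
    from_counts = Counter(s[t] for s in sequences)
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


def substitution_cost(rates, a, b):
    """Assumed cost: low when transitions between a and b are frequent."""
    if a == b:
        return 0.0
    return 2.0 - rates.get((a, b), 0.0) - rates.get((b, a), 0.0)


def dynamic_hamming(seq1, seq2, sequences):
    """Sum of position-specific substitution costs between two sequences.

    Only substitutions are used, so both sequences must have the same
    length (at least 2, so that one transition exists).
    """
    if len(seq1) != len(seq2) or len(seq1) < 2:
        raise ValueError("sequences must share a length of at least 2")
    last = len(seq1) - 1
    total = 0.0
    for t, (a, b) in enumerate(zip(seq1, seq2)):
        # Use the transition t -> t+1; at the last position reuse the
        # final available transition (an assumption of this sketch).
        rates = transition_rates(sequences, min(t, last - 1))
        total += substitution_cost(rates, a, b)
    return total


# W = work, N = no work. Half of each state switches between the two episodes.
mixed = ["WN", "NW", "WW", "NN"]
close = dynamic_hamming("WW", "NN", mixed)

# Nobody ever switches state, so the two states are as distant as possible.
frozen = ["WW", "NN"]
distant = dynamic_hamming("WW", "NN", frozen)
```

With frequent switching between the states the two sequences come out closer than when no switching is observed, which is the behaviour the text describes for 9 AM versus 9 PM.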
Archive for the ‘Stata’ Category
zhao1198 2009-8-31 18:21
Archive for the Stata Category Many Stata do-files , also with R. http://changjx.wordpress.com/category/stacomputing/stata/
Category: Stata | 2547 views | 0 comments
Stata notes: blog of a student in a seven-year medical school program
zhao1198 2009-8-31 17:22
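Contains Stata study notes. http://chenchen0221.spaces.live.com/blog/cns!3F0A2D82728FA043!336.entry

Assigning values to variables in Stata

Creating a new variable with generate
generate newvar = expression
generate bh = _n    // assign the internal observation number of the dataset to variable bh
generate group = int((_n-1)/5) + 1    // in the current order of the dataset, produce five 1s, five 2s, five 3s, and so on until the dataset ends
generate block = mod(_n, 6)    // in the current order of the dataset, produce 1, 2, 3, 4, 5, 0 in turn
generate y = log(x) if x > 0    // create new variable y equal to log(x) for all x > 0; where x <= 0, y is set to missing

Creating a new variable with egen
set obs 12
egen a = seq()    // the natural numbers from 1 to N
egen b = seq(), b(3)    // a sequence in which each element is repeated # times
egen c = seq(), to(4)    // repeated sequences, each running from 1 to #
egen d = seq(), f(4) t(6)    // repeated sequences, each running from #1 to #2
Category: Stata | 2287 views | 0 comments
Princeton Stata tutorial (2)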
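The patterns produced by these commands can be checked with a short Python sketch. The function `seq` below mimics the behaviour described in the notes for egen's seq() with a starting value, an ending value, and a block size. It is an illustration written from that description, assuming the starting value does not exceed the ending value, and its parameter names are hypothetical.

```python
def seq(n, first=1, last=None, block=1):
    """Repeating integer sequence for n observations.

    Values run from first to last and then start over; each value is
    repeated block times. When last is omitted it defaults to n.
    """
    if last is None:
        last = n
    span = last - first + 1
    return [first + (i // block) % span for i in range(n)]


n_obs = 12
a = seq(n_obs)                    # 1 to N
b = seq(n_obs, block=3)           # each element repeated 3 times
c = seq(n_obs, last=4)            # repeated runs from 1 to 4
d = seq(n_obs, first=4, last=6)   # repeated runs from 4 to 6

# The generate examples, with obs standing in for Stata's _n (1-based).
group = [(obs - 1) // 5 + 1 for obs in range(1, n_obs + 1)]
block = [obs % 6 for obs in range(1, n_obs + 1)]
```

With 12 observations, `group` gives five 1s, five 2s and two 3s, because the dataset ends before the third group is complete.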
zhao1198 2009-8-30 17:06
STATA
http://dss.princeton.edu/online_help/stats_packages/stata/stata.htm

Stata is an interactive data analysis program which runs on a variety of platforms. Stata is installed on the Windows machines and Macs in OIT's public clusters and on the Windows machines in the DSS Data Lab, as well as on the Tombstone Unix server.

DSS Resources

Introduction to Stata
Introduction
Issuing Commands
Stata's Online Help
Operating System Interface
Dealing With Memory Requirements - what to do if there's no room
Keeping Track of Your Work
Stata's Built-in Calculator: display

Data, Datasets and Variables
Data Files - what they are and how to get them in Stata
Converting to and from Excel and spreadsheet files
Reading other kinds of text data
Saving Data
Missing Values
Stata Variable Types
Stata Variable Names

Exploring your Data
Examining your Data
Summary Statistics
Simple Regression
Predicted Values

Creating and Modifying Variables
Variable creation commands
The if qualifier
Combining tests: and and or
Subscripting

Running Stata on Unix
Running Unix Stata in text mode
Stata for Unix with an XWindows interface
Running large jobs in the background