科学网


Tag: standard


Related blog posts

[Repost] Distinguishing STANDARD ERROR from STANDARD DEVIATION
Zhouwenju 2014-3-9 14:04
Hi, I will try to put it in a nutshell. The SD is a measure of the dispersion of the data around the mean. If you have a sample (let us call it sample 1) and you take some measurement on it (e.g. you record the heart beats per minute of a sample of people), you will find that the mean is, let us say, 60 with an SD of 10. This tells you how much variation there is in your sample and, if your observations are normally distributed, you are in the position to know how many observations lie between, say, the mean and a given number of SDs.

If you draw other samples (samples 2, 3, 4, 5, 6 and so forth) you will get other mean values; sometimes they can be 60, sometimes 58, sometimes 62, and so on (each with its own SD). If you collect all these mean values (that is, the mean of sample 1, the mean of sample 2, the one of samples 3, 4, 5, 6, and so on) you will get a new distribution (called the sampling distribution) with a new mean and a new SD. This new SD (the SD of the sampling distribution) is the SE, that is, a measure of the dispersion of the distribution of all those sample means you have collected.

Usually, you are in the position to have just one sample, and you have to estimate the SE on the basis of the unique sample you have. How do you calculate the SE? On the basis of the SD of your single sample (you can easily find the formula on the web). The SE is important to calculate the confidence interval for the population mean. In other words, given your sample, you may want to infer the mean of the population the sample comes from. You get this by calculating a confidence interval in which the true population mean will lie, starting from the mean of your sample and from the SE that you estimated from the SD of your sample. I hope this helps a bit, regards, Gm

http://www.talkstats.com/showthread.php/10262-Difference-between-standard-deviation-and-standard-error
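A minimal Python/numpy sketch of the distinction (the heart-rate values below are invented for illustration; SE = SD/sqrt(n), and the 1.96 factor gives an approximate 95% confidence interval for the population mean):

# SD describes the spread of the data; SE describes the spread of the sample mean.
import numpy as np

heart_rates = np.array([55, 62, 71, 58, 64, 60, 67, 59, 63, 61])  # beats per minute
n = heart_rates.size

mean = heart_rates.mean()
sd = heart_rates.std(ddof=1)        # sample standard deviation
se = sd / np.sqrt(n)                # standard error of the mean

ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se   # approx. 95% CI for the population mean
print(mean, sd, se, (ci_low, ci_high))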
Category: Basics
Difference between the approximate Likelihood-Ratio Test and the standard bootstrap
zczhou 2013-3-7 00:46
The difference between aLRT (parametric bootstrap) and the standard (nonparametric) bootstrap: aLRT is an alternative method in PhyML for computing branch support values. The supports from the Chi2-based aLRT (approximate Likelihood-Ratio Test) for branches tend to be looser, while the SH-like supports come out closer to the standard bootstrap values. From the PhyML documentation:

aLRT (approximate Likelihood-Ratio Test) for branches
-b (or --bootstrap) int
int = -1 : approximate likelihood ratio test returning aLRT statistics.
int = -2 : approximate likelihood ratio test returning Chi2-based parametric branch supports.
int = -3 : minimum of Chi2-based parametric and SH-like branch supports.
int = -4 : SH-like branch supports alone.
aLRT is a statistical test to compute branch supports. It applies to every (internal) branch and is computed along the PhyML run on the original data set. Thus, aLRT is much faster than the standard bootstrap, which requires running PhyML 100-1,000 times with resampled data sets. As with any test, the aLRT branch support is significant when it is larger than 0.90-0.99. With good quality data (enough signal and sites), the sets of branches with bootstrap proportion > 0.75 and aLRT > 0.9 (SH-like option) tend to be similar.

Perform bootstrap and number of resampled data sets
-b (or --bootstrap) int
int > 0 : int is the number of bootstrap replicates.
int = 0 : neither approximate likelihood ratio test nor bootstrap values are computed.
When there is only one data set you can ask PhyML to generate resampled bootstrap data sets from this original data set. PhyML then returns the bootstrap tree with branch lengths and bootstrap values, using standard NEWICK format. The "Print pseudo trees" option gives the pseudo trees in a *_boot_trees.txt file.

Reference links:
http://www.atgc-montpellier.fr/phyml/usersguide.php?type=command
http://www.atgc-montpellier.fr/phyml/alrt/
[Repost] Fortran compilers for MacOS (ZZ)
hjlyyc 2013-2-24 19:13
http://www.microscopy.cen.dtu.dk/computing/fortran/index.html gcc (Gnu compiler collection) contains fortran as well as C compilers. Apple's version of gcc is modified from the standard version, it contains a number of extensions (eg -arch to compile for Intel or PPC) but does not support fortran. So a separate installation of a fortran compiler is needed. And there are many different possible versions... All compilers need the Developer tools installed which are included on the operating system CDs or can be downloaded from Apple's developer site (you will need to create a (free) account). For Lion (10.7) the developer tools (Xcode 4.1) can be downloaded from the App store. Apple's gcc versions are (Aug 2011): Lion 10.7.0 gcc 4.2.1, Snow Leopard 10.6.8 gcc 4.2.1, Leopard 10.5.8 gcc 4.0.1 The latest stable version of gcc (27 June 2011) is gcc 4.6.1 In the past, by default gcc produced 64 bit code but gfortran produced 32 bit code! More recently, gfortran (at least the "R for Macos version") now produces 64 bit code by default. See 32 bit 64 bit compilers below. Below are some of the fortran compilers available for MacOS and where to get them from. gfortran gfortran is the most recent fortran 95 compiler and is supported by gcc. Executive summary of the info below: for the latest gfortran straight from the sources go for the ones from "HPC MacOS X" or from fink for an Xcode compatible gfortran with Apple extensions go for the one from "R for MacOS X". GNU GCC Source for gcc is available from gcc . But it takes some effort to compile. There is an offical home page for gfortran , a gfortran wiki (more useful than the home page) and a page on the gfortran wiki containing links to gfortran binaries for MacOS and other machines. Binaries are available for Leopard (Intel and PPC) and for Tiger (Intel only) and date from early/mid 2009. I believe these are straight gcc and don't contain the Apple extensions. HPC MacOS X HPC MacOS X have gcc 4.6 binaries containing gfortran and gfortran only binaries for Lion, Snow Leopard and Leopard. These are from GNU sources and I believe don't contain the Apple extensions. They install in /usr/local. As of 9 Aug 2011 both gcc and gfortran are version 4.6.x The binaries are intel 64 bit only and are compiled to produce 64 bit code. MacResearch.org MacResearch.org have a number of pages on gfortran. gfortran for Leopard has a link to a gfortran installer based on gcc 4.3 and dated Nov 2007. I think this is straight gcc. gfortran (64 bit) in OSX 10.6 recommends the version of gcc 4.4 from fink as being 64 bit provided it is installed from 64 bit fink. It also suggests that gcc from MacPorts doesn't work on 10.6 yet (but may be fixed by now). R for MacOS X The "R for MacOS X" Tools page has a link to a gfortran 4.2.3 binary and the comment (referring to compiling the "R" package) "do not use compilers from HPC, they won't work correctly!". It is a universal binary and supports the Apple extensions eg compiling for Intel and PPC plus 64 bit versions and can create universal binaries. Not sure whether this version works on Snow Leopard. Lower down there is a link to a gfortran 4.2.4 package for Lion (gfortran-lion-5666-3.pkg), Snow Leopard (gfortran-42-5646.pkg), Leopard and Tiger. This adds gfortran to Xcode 3.2 on Snow Leopard and thus includes the Apple extensions. It does not install the man page. This version installs gfortran into /usr/bin. The version for Lion is a universal binary for 32 64 bit Intel only. 
For Snow Leopard it is a universal binary for 32 bit Intel and 32 bit PPC only. It will produce 32 bit Intel or PPC or 64 bit Intel code. It does not have the libraries to produce 64 bit PPC code. See also the info at the bottom of this page on universal compilers . Fink Fink ( try here if the previous link is dead) is a package management system for MacOS. It comes in 32 bit and 64 bit versions but you have to install one or the other, not both and all packages must be the same. Fink has both 32 and 64 bit versions of gfortran (as part of gcc 4.4.1) for Snow Leopard. It installs all files into /sw. Fink gives its gcc a different name (gcc-4) from that installed by Apple to avoid conflicts. A recent (29 Oct 2009) attempt to install this gcc and gfortran on Snow Leopard caused the machine to hang part way through. MacPorts MacPorts is another package management system for MacOs. It has a version of gcc 4.4.1 including gfortran for Snow Leopard. There was a rumor (see MacResearch.org above) that it doesn't work. Uninstall gfortran To uninstall gfortran do "sudo gfortran-uninstall" g95 g95 is an alternative fortran 95 compiler to gfortran. The website doesn't seem to have been updated since July 2009. The G95 project The g95 project has g95 binaries for x86 and PPC on its downloads page . These are dated 24 June 2009 so may not work on Snow Leopard. Fink Fink includes only 32 bit versions of g95 for Leopard and earlier. There seems to be no Snow Leopard version. g77 g77 is now obsolete but still very popular. The final version of g77 is 3.4.3. HPC MacOS X HPC MacOS X has g77 binaries for Intel and PPC dating from October 2006. Not sure which OS versions they run on. Fink Fink includes both 32 and 64 bit versions of g77 for Snow Leopard and earlier. f2c and fort77 f2c converts fortran source into C source, which can then be compiled with gcc. fort77 is a wrapper that runs f2c and compiles in one go, thus behaving like a fortran compiler. fort77 supports the -i2 option to make the default for INTEGER to be INTEGER*2 not INTEGER*4. Neither g77 nor gfortran support this option. Fink Fink includes both 32 and 64 bit versions of f2c and fort77 for Snow Leopard and earlier. 32 bit and 64 bit compilers "32 bit" and "64 bit" refer to the size of the integers used to reference memory. Unsigned 32 bit (4 byte) integers have a maximum of 2 32 = 4.295×10 9 and thus can only address up to 4GB of memory (2.147×10 9 for signed integers). Any computer that uses more than 4GB of memory needs to use 64 bit addressing to make use of more than 4GB of its memory. As an aside, 32 bit Windows is limited to fractionally under 3GB because memory addresses above 3GB are reserved for "memory mapped input/output" (MMIO). And the "fractionally under" is because some addreses between 640K and 1M are still reserved for the same function from the days of the 640KB limit and 20 bit addressing (max 1MB) in early PCs. 64 bit programs (applications) To make use of more than 4GB of memory a program must be compiled to be "64 bit", ie use 8 byte integers for memory addressing. In addition, for a 64 bit program to run the operating system must be 64 bit, any libraries used must be 64 bit and the processor must be 64 bit. 64 bit compilers To produce a 64 bit program the compiler must be capable of generating 64 bit code. It does not matter whether the compiler is itself a 32 or 64 bit program. So for example g77 is available in both 32 bit and 64 bit versions, but both versions produce 32 bit code only (I believe...). 
Universal binaries A universal binary is a binary that has been compiled for more than one architecture and bundled together in one file. On the mac this means it can be compiled for any combination of 32 or 64 bit, or PPC or Intel. So for example Apple's gcc on Snow Leopard is a universal binary compiled for 32 bit Intel (i386), 32 bit PPC (ppc7400) and 64 bit Intel (x86_64). Use the file command to find the type of architecture, eg file /usr/bin/gcc Use the -arch option to produce code for alternative architectures. Alternatives are: i386, x86_64, ppc and ppc64 (note that Apple's gcc does not have libraries for ppc64). Multiple values give a universal binary, eg gfortran -arch x86_64 -arch i386 -arch ppc hello.f -o hello There are also -m32 and -m64 options to gcc and gfortran. These are not Apple extensions so are present in all versions of gcc and gfortran. I believe they are subtly different from the corresponding -arch options... As an aside, to find out which libraries a program needs do eg otool -L hello Defaults By default gcc on a 64 bit computer running Snow Leopard produces 64 bit code. But gfortran on Snow Leopard (at least the version from "R for MacOS X") is a 32 bit program and produces 32 bit code by default (although you can ask it to produce 64 bit code). So if you try to link code compiled with gfortran with code compiled with gcc you get errors about incompatible libraries. To solve, specify the architecture you need with either -arch or -m32/64. On Lion both gcc and gfortran from "R for MacOS X" are universal 32 64 bit Intel programs and by default both produce 64 bit code. CEN | Computing
Category: Software tutorials
[Repost] Gold standard algorithm, platinum standard & diamond standard
cooooldog 2013-2-22 10:13
Does "gold standard algorithm" mean very good, top-tier? If so, would a "platinum standard algorithm" or a "diamond standard algorithm" be even better?
[Repost] GeneMapper fragment size menu
genesquared 2013-2-6 15:21
Size-Matching/Size-Calling Algorithm

This algorithm uses a dynamic programming approach that is efficient (runs in low polynomial time and space) and guarantees an optimal solution. It first matches a list of peaks from the electropherogram to a list of fragment sizes from the size standard. It then derives quality values statistically by examining the similarity between the theoretical and actual distance between the fragments.

Size-Matching Algorithm Example

Figure 3-14 shows an example of how the size-matching/calling algorithm works using contaminated GeneScan™ 120 size standard data. Detected peaks (standard and contamination) are indicated by blue lower bars along the x-axis. The size standard fragments as determined by the algorithm (and their corresponding lengths in base pairs) are designated by the upper green bars. Note that there are more peaks than size standard locations because the standard was purposely contaminated to test the algorithm. The algorithm correctly identifies all the size standard peaks and removes the contamination peaks (indicated by the black triangles) from consideration. The large peak is excluded from the candidate list by a filter that identifies the peak as being atypical with respect to the other peaks.

[Figure 3-14: Size-matching example]
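The matching step can be illustrated with a toy dynamic-programming sketch in Python (this is only an illustration of the idea described above, not GeneMapper's actual implementation; the peak positions and fragment sizes below are invented):

# Match every size-standard fragment to one detected peak, skipping spurious peaks,
# by keeping the local data-points-per-bp spacing close to the global scale.
def match_size_standard(peaks, sizes):
    # peaks: detected peak positions (data points), ascending
    # sizes: known fragment lengths (bp), ascending; requires len(peaks) >= len(sizes)
    # returns one peak index per size, minimizing a spacing-discrepancy cost
    n, m = len(peaks), len(sizes)
    INF = float("inf")
    scale = (peaks[-1] - peaks[0]) / (sizes[-1] - sizes[0])  # crude global scale (assumes clean end peaks)
    cost = [[INF] * n for _ in range(m)]   # cost[j][i]: best cost with sizes[j] matched to peaks[i]
    back = [[-1] * n for _ in range(m)]
    for i in range(n):
        cost[0][i] = 0.0
    for j in range(1, m):
        ds = sizes[j] - sizes[j - 1]                       # expected spacing (bp)
        for i in range(j, n):
            for k in range(j - 1, i):
                if cost[j - 1][k] == INF:
                    continue
                dp = peaks[i] - peaks[k]                   # observed spacing (data points)
                c = cost[j - 1][k] + abs(dp / ds - scale)  # penalize deviation from the global scale
                if c < cost[j][i]:
                    cost[j][i], back[j][i] = c, k
    i = min(range(n), key=lambda x: cost[m - 1][x])        # best peak for the last fragment
    path = [i]
    for j in range(m - 1, 0, -1):
        i = back[j][i]
        path.append(i)
    return path[::-1]

sizes = [15, 20, 25, 35, 50, 62, 80, 110, 120]                    # GeneScan-120-like standard
peaks = [100, 145, 190, 230, 280, 415, 523, 600, 685, 955, 1045]  # 230 and 600 are contamination
print(match_size_standard(peaks, sizes))  # -> [0, 1, 2, 4, 5, 6, 8, 9, 10], skipping the spurious peaks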
Category: 3500
EOF analysis of one climate field
gwangcc 2012-11-1 17:09
function [Xeofs,Xpcs,Var,Xrecon]=eof(X,neof)
% function [Xeofs,Xpcs,Var,Xrecon]=eof_analysis(X,neof)
% Wrapper function to perform PCA of a field X
% with TWO spatial dimensions. (This code will also
% work with a SINGLE spatial dimension. But it might
% be easier to directly call 'principal_component_analysis'.)
% This function basically transforms the data matrix into
% a standard 2-d data matrix, taking into account NaNs.
% Input:
%   X: (x,y,t) or (x,t).
%   neof: number of EOF/PC to return
% Output:
%   EOFs: 3-d (x,y,e) or 2-d (x,e) matrix with spatial patterns.
%   PCs: 2-d matrix (t,e) with principal components (scores) in the columns
%   Var: variance of each principal component
%   Xrecon: 3-d (x,y,t) or 2-d (x,t) matrix with reconstructed X
%           WITHOUT adding back the mean
% Note: Xpc is the projection of X onto the corresponding EOF.
%       That is, Xpc(it,ie) = nansum(nansum(X(:,:,it).*Xeofs(:,:,ie)))
%       Use this to project a physical field onto EOF space.

% If X only has 2 dimensions, assume second dimension is time.
% Insert 'y' dimension to conform with rest of function which
% assumes X represents multiple realizations of a 2-D field.
if ndims(X)==2
  [nx,nt]=size(X);
  X=reshape(X,[nx 1 nt]);
end

% Flatten 2-d fields into single vector
[nx,ny,nt]=size(X);
Xd=reshape(X,[nx*ny nt]); % (space,time)

% remove NaNs
inn=find(~isnan(Xd(:,1)));
Xd=Xd(inn,:); % (space,time)
n=size(Xd,1);

% normalize
for i=1:n
%  Xd(i,:)=Xd(i,:)-nanmean(Xd(i,:));
%  Xd(i,:)=detrend(squeeze(Xd(i,:)),'constant');
%  Xd(i,:)=detrend(squeeze(Xd(i,:)));
  Xd(i,:)=zscore(detrend(squeeze(Xd(i,:)),'constant'));
end

Xd=Xd';
Xd=double(Xd);

% PCA
if nargout>3
  [EOFs,Xpcs,Var,XR]=principal_component_analysis(Xd,neof);
else
  [EOFs,Xpcs,Var]=principal_component_analysis(Xd,neof);
end
% Xpcs (t,e)

% Reshape EOFs to physical space
Xeofs=repmat(NaN,[nx*ny neof]);
Xeofs(inn,:)=EOFs;
Xeofs=squeeze(reshape(Xeofs,[nx ny neof]));

% Reconstruct the field from the retained EOFs/PCs
Xrecon=zeros(nx,ny,nt);
for ie=1:neof
  for it=1:nt
    Xrecon(:,:,it)=Xrecon(:,:,it)+Xpcs(it,ie)*Xeofs(:,:,ie);
  end
end
if nargout>3
  XR=XR'; % (x,t)
  Xrecon=repmat(NaN,[nx*ny nt]);
  Xrecon(inn,:)=XR;
  Xrecon=squeeze(reshape(Xrecon,[nx ny nt]));
end

function Xpcs=eof_project(Xanom,Xeofs,numd)
% function Xpcs=eof_analysis(Xanom,Xeofs,numd)
% Function to project physical field (1 or 2 spatial
% dimensions) onto EOF.
% Input:
%   Xanom: (x,y,t) or (x,t) - physical ANOMALY field
%   Xeofs: 3-d (x,y,e) or 2-d (x,e) matrix with spatial patterns.
%   numd: if Xanom is (x,t), MUST set numd=1
% Output:
%   Xpcs: 2-d matrix (t,e) with projection coefficients
%         (principal components) in the columns. Number
%         of projection coefficients returned is equal
%         to the number of EOFs passed in the input argument.
% Samar Khatiwala (spk@ldeo.columbia.edu)

if nargin<3 % default is 2-d
  numd=2;
end

if ndims(Xanom)==2 && numd==1 % data is (x,t)
  Xpcs=Xeofs'*Xanom;
  Xpcs=Xpcs';
else
%  nt=size(Xanom,3);
%  neofs=size(Xeofs,3);
%  Xpcs=repmat(NaN,[nt neofs]);
%  for it=1:nt
%    Xpcs(it,:)=squeeze(nansum(nansum(Xeofs.*repmat(Xanom(:,:,it),[1 1 neofs]))))';
%  end
%  for ie=1:neofs
%    Xpcs(:,ie)=squeeze(nansum(nansum(Xanom.*repmat(Xeofs(:,:,ie),[1 1 nt]))));
%  end
  % Flatten 2-d fields into single vector
  [nx,ny,nt]=size(Xanom);
  neofs=size(Xeofs,3);
  Xanom=reshape(Xanom,[nx*ny nt]); % (x,t)
  Xeofs=reshape(Xeofs,[nx*ny neofs]); % (x,e)
  % remove NaNs
  inn=find(~isnan(Xanom(:,1)));
  Xanom=Xanom(inn,:); % (x,t)
  Xeofs=Xeofs(inn,:); % (x,e)
  Xpcs=Xeofs'*Xanom;
  Xpcs=Xpcs';
end

function [EOFs,PCs,Var,Xrecon]=principal_component_analysis(X,neof)
% function [EOFs,PCs,Var,Xrecon]=principal_component_analysis(X,neof)
% Function to do a principal component analysis of data matrix X.
% Input:
%   X: (t,x) each row corresponds to a sample, each column is a variable.
%      (Each column is a time series of a variable.)
%   neof: number of EOF/PC to return
% Output:
%   EOFs: (x,e) matrix with EOFs (loadings) in the columns
%   PCs: (t,e) matrix with principal components (scores) in the columns
%   Var: variance of each principal component
%   Xrecon: (t,x) reconstructed X (WITHOUT adding back the mean)
%           To reconstruct: Xrecon = PCs*EOFs'
% Notes: (1) This routine will subtract off the mean of each
%            variable (column) before performing PCA.
%        (2) sum(var(X)) = sum(Var) = sum(diag(S)^2/(m-1))
% Samar Khatiwala (spk@ldeo.columbia.edu)

if strcmp(class(X),'single')
  disp('WARNING: Converting input matrix X to class DOUBLE')
  X=double(X);
end

% Center X by subtracting off column means
[m,n] = size(X);
%X = X - repmat(mean(X,1),m,1);
r = min(m-1,n); % max possible rank of X

% SVD
if nargin<2
  [U,S,EOFs]=svds(X,r);
else
  [U,S,EOFs]=svds(X,min(r,neof));
end
% EOFs: (x,e)
% U: (t,e)

% Determine the EOF coefficients
PCs=U*S; % PCs=X*EOFs (t,e)

% compute variance of each PC
% Modified
EV=diag(S).^2/sum( diag(S).^2 );
Var = EV*100;

% Note: X = U*S*EOFs'
% EOFs are eigenvectors of X'*X = (m-1)*cov(X)
% sig^2 (=diag(S)^2) are eigenvalues of X'*X
% So tr(X'*X) = sum(sig_i^2) = (m-1)*(total variance of X)

if nargout>3
  Xrecon = PCs*EOFs'; % (t,x)
end
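For readers working outside MATLAB, the same EOF-via-SVD idea can be sketched in a few lines of Python/numpy (a minimal illustration on a synthetic field, not a translation of the full functions above; all names and sizes here are arbitrary):

# Flatten space, drop NaN grid points, remove the time mean, SVD, reshape EOFs back onto the grid.
import numpy as np

nx, ny, nt, neof = 20, 15, 100, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((nx, ny, nt))            # synthetic (x, y, t) field
X[0, 0, :] = np.nan                              # pretend one grid point is masked

Xd = X.reshape(nx * ny, nt)                      # (space, time)
valid = ~np.isnan(Xd[:, 0])
A = Xd[valid, :].T                               # (time, space)
A = A - A.mean(axis=0)                           # remove the time mean at each point

U, s, Vt = np.linalg.svd(A, full_matrices=False)
pcs = U[:, :neof] * s[:neof]                     # principal components (t, e)
var_pct = 100 * s[:neof] ** 2 / np.sum(s ** 2)   # percent of variance explained

eofs = np.full((nx * ny, neof), np.nan)
eofs[valid, :] = Vt[:neof, :].T                  # spatial patterns back on the (x, y) grid
eofs = eofs.reshape(nx, ny, neof)
print(var_pct)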
[Repost] Why P ≤ 0.05?
carldy 2012-7-28 21:48
For reference: Why P ≤ 0.05? From: http://www.jerrydallal.com/LHSP/p05.htm

The standard level of significance used to justify a claim of a statistically significant effect is 0.05. For better or worse, the term statistically significant has become synonymous with P ≤ 0.05.

There are many theories and stories to account for the use of P=0.05 to denote statistical significance. All of them trace the practice back to the influence of R.A. Fisher. In 1914, Karl Pearson published his Tables for Statisticians & Biometricians. For each distribution, Pearson gave the value of P for a series of values of the random variable. When Fisher published Statistical Methods for Research Workers (SMRW) in 1925, he included tables that gave the value of the random variable for specially selected values of P. SMRW was a major influence through the 1950s. The same approach was taken for Fisher's Statistical Tables for Biological, Agricultural, and Medical Research, published in 1938 with Frank Yates. Even today, Fisher's tables are widely reproduced in standard statistical texts.

Fisher's tables were compact. Where Pearson described a distribution in detail, Fisher summarized it in a single line in one of his tables, making them more suitable for inclusion in standard reference works*. However, Fisher's tables would change the way the information could be used. While Pearson's tables provide probabilities for a wide range of values of a statistic, Fisher's tables only bracket the probabilities between coarse bounds.

The impact of Fisher's tables was profound. Through the 1960s, it was standard practice in many fields to report summaries with one star attached to indicate P < 0.05 and two stars to indicate P < 0.01. Occasionally, three stars were used to indicate P < 0.001.

Still, why should the value 0.05 be adopted as the universally accepted value for statistical significance? Why has this approach to hypothesis testing not been supplanted in the intervening three-quarters of a century? It was Fisher who suggested giving 0.05 its special status. Page 44 of the 13th edition of SMRW, describing the standard normal distribution, states

The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty.

Similar remarks can be found in Fisher (1926, 504).

... it is convenient to draw the line at about the level at which we can say: "Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials."... If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.

However, Fisher's writings might be described as inconsistent.
On page 80 of SMRW, he offers a more flexible approach:

In preparing this table we have borne in mind that in practice we do not want to know the exact value of P for any observed χ2, but, in the first place, whether or not the observed value is open to suspicion. If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. Belief in the hypothesis as an accurate representation of the population sampled is confronted by the logical disjunction: Either the hypothesis is untrue, or the value of χ2 has attained by chance an exceptionally high value. The actual value of P obtainable from the table by interpolation indicates the strength of the evidence against the hypothesis. A value of χ2 exceeding the 5 per cent. point is seldom to be disregarded.

These apparent inconsistencies persist when Fisher dealt with specific examples. On page 137 of SMRW, Fisher suggests that values of P slightly less than 0.05 are not conclusive:

The result of t shows that P is between .02 and .05. The result must be judged significant, though barely so; in view of the data we cannot ignore the possibility that on this field, and in conjunction with the other manures used, nitrate of soda has conserved the fertility better than sulphate of ammonia; the data do not, however, demonstrate this point beyond the possibility of doubt.

On pages 139-140 of SMRW, Fisher dismisses a value greater than 0.05 but less than 0.10:

We find... t = 1.844. The difference between the regression coefficients, though relatively large, cannot be regarded as significant. There is not sufficient evidence to assert that culture B was growing more rapidly than culture A.

while in Fisher he is willing to pay attention to a value not much different.

...P=.089. Thus a larger value of χ2 would be obtained by chance only 8.9 times in a hundred, from a series of values in random order. There is thus some reason to suspect that the distribution of rainfall in successive years is not wholly fortuitous, but that some slowly changing cause is liable to affect in the same direction the rainfall of a number of consecutive years.

Yet in the same paper another such value is dismissed!

...P=.093 from Elderton's Table, showing that although there are signs of association among the rainfall distribution values, such association, if it exists, is not strong enough to show up significantly in a series of about 60 values.

Part of the reason for the apparent inconsistency is the way Fisher viewed P values. When Neyman and Pearson proposed using P values as absolute cutoffs in their style of fixed-level testing, Fisher disagreed strenuously. Fisher viewed P values more as measures of the evidence against a hypothesis, as reflected in the quotation from page 80 of SMRW above and this one from Fisher (1956, p 41-42):

The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection.
This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance.

Still, we continue to use P values nearly as absolute cutoffs but with an eye on rethinking our position for values close to 0.05**. Why have we continued doing things this way? A procedure such as this has an important function as a gatekeeper and filter--it lets signals pass while keeping the noise down. The 0.05 level guarantees the literature will be spared 95% of potential reports of effects where there are none. For such procedures to be effective, it is essential there be a tacit agreement among researchers to use them in the same way. Otherwise, individuals would modify the procedure to suit their own purposes until the procedure became valueless. As Bross (1971) remarks,

Anyone familiar with certain areas of the scientific literature will be well aware of the need for curtailing language-games. Thus if there were no 5% level firmly established, then some persons would stretch the level to 6% or 7% to prove their point. Soon others would be stretching to 10% and 15% and the jargon would become meaningless. Whereas nowadays a phrase such as statistically significant difference provides some assurance that the results are not merely a manifestation of sampling variation, the phrase would mean very little if everyone played language-games. To be sure, there are always a few folks who fiddle with significance levels--who will switch from two-tailed to one-tailed tests or from one significance test to another in an effort to get positive results. However such gamesmanship is severely frowned upon and is rarely practiced by persons who are native speakers of fact-limited scientific languages--it is the mark of an amateur.

Bross points out that the continued use of P=0.05 as a convention tells us a good deal about its practical value.

The continuing usage of the 5% level is indicative of another important practical point: it is a feasible level at which to do research work. In other words, if the 5% level is used, then in most experimental situations it is feasible (though not necessarily easy) to set up a study which will have a fair chance of picking up those effects which are large enough to be of scientific interest. If past experience in actual applications had not shown this feasibility, the convention would not have been useful to scientists and it would not have stayed in their languages. For suppose that the 0.1% level had been proposed. This level is rarely attainable in biomedical experimentation. If it were made a prerequisite for reporting positive results, there would be very little to report. Hence from the standpoint of communication the level would have been of little value and the evolutionary process would have eliminated it.

The fact that many aspects of statistical practice in this regard have changed gives Bross's argument additional weight.
Once (mainframe) computers became available and it was possible to calculate precise P values on demand, standard practice quickly shifted to reporting the P values themselves rather than merely whether or not they were less than 0.05. The value of 0.02 suggested by Fisher as a strong indication that the hypothesis fails to account for the whole of the facts has been replaced by 0.01. However, science has seen fit to continue letting 0.05 retain its special status denoting statistical significance.

* Fisher may have had additional reasons for developing a new way to table commonly used distribution functions. Jack Good, on page 513 of the discussion section of Bross (1971), says, "Kendall mentioned that Fisher produced the tables of significance levels to save space and to avoid copyright problems with Karl Pearson, whom he disliked."

** It is worth noting that when researchers worry about P values close to 0.05, they worry about values slightly greater than 0.05 and why they deserve attention nonetheless. I cannot recall published research downplaying P values less than 0.05. Fisher's comment cited above from page 137 of SMRW is a rare exception.

References
Bross IDJ (1971), "Critical Levels, Statistical Language and Scientific Inference," in Godambe VP and Sprott (eds), Foundations of Statistical Inference. Toronto: Holt, Rinehart & Winston of Canada, Ltd.
Fisher RA (1956), Statistical Methods and Scientific Inference. New York: Hafner.
Fisher RA (1926), "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture of Great Britain, 33, 503-513.
Fisher RA (19xx), "On the Influence of Rainfall on the Yield of Wheat at Rothamstead,"
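A quick numerical check of the SMRW passage quoted above, that the two-sided 5% point of the standard normal is "1.96 or nearly 2" and corresponds to about one chance in twenty (a minimal Python sketch; it assumes scipy is installed):

from scipy.stats import norm

z = norm.ppf(1 - 0.05 / 2)           # two-sided 5% critical value -> about 1.96
p = 2 * (1 - norm.cdf(1.96))         # P(|Z| > 1.96) -> about 0.05
print(round(z, 3), round(p, 3), round(1 / p, 1))   # 1.96  0.05  roughly 1 in 20 trials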
Category: Reading notes (Harvest)
[Repost] [GAMIT-BLOCK] GAMIT 10.4 example run walkthrough
zhenghui2915 2012-5-3 17:08
gamit.4运行例子讲解 * README for the GAMIT/GLOBK standard example * Southern California 2000 034-036, 2002 041-042, 2004 -52-052 * Last updated by rwk 101109 21:20 UTC The example is set up to 1) conduct phase processing (sh_gamit) for three days in 2000, two days in 2002, and two days in 2004, using up to six continuous stations (BLYT, BRAN, CIT1, JPLM, TRAK, WLSN) and one survey-mode station (7001); 2) compute and plot daily repeatabilities (sh_glred) for 2000; and 3) compute repeatabilities and velocities for the three years together (globk/glorg). Input files provided in advance are RINEX, sittbl., sites.defaults, apriori coordinates, and the GLOBK command files ; all the others are to be created by the user as part of the example. The structure established by the example has three GAMIT processing directories, named by year ( /2000, /2002, /2004), each of which has below it a /rinex, /tables, and /gsoln directory specific to that year. At the top level there is a processing directory for GLOBK (/vsoln) and an additional /tables directory used for the multiyear combination. The steps described assume that you have downloaded and linked OTL grid and have internet access while processing; if you do not have these, see Notes 3 and 4 below. In directory /check_files are saved copies of the q filesand .org files for each day, and the .org files and postscript files for the multiyear repeatablities and velocities. Before you start, make sure that you have constructed the paths and aliases described in the installation README. The example may be run from any directory on your system, preferably the place you intend to process your own data, not under gg. 工作目录文件结构 STEP 1: Run GAMIT for the three days from 2000. In the example/2000 directory type sh_setup -yr 2000 The 2000/tables directory will now contain links to most standard files in gg/tables and copies of these files for process.defaults, sestbl., station.info (complete SOPAC version), and autcln.cmd. The sites.defaults and sittbl. files were already in directory and therefore not overwritten by sh_setup. Examine sites.defaults to note that it has been set up to ftp from a remote archive (SOPAC by default) RINEX files for BLYT, JPLM, LNCO, and MATH, and 'xstnfo' is set to avoid any automatic update of station.info during processsing. Note also that the sittbl. is set up to impose moderate constraints on BLYT, BRAN, CIT1, JPLM, TRAK, and WLSN ambiguity resolution in GAMIT. Edit 2000/tables/process.defaults to change aprf from itrf05.apr to regional.apr , and 'mailto' to your own email address to receive the sh_gamit summary file. Construct a small, experiment-specific station.info file by using the following procedure in the 2000/tables directory: sh_upd_stnfo -l sd will create 'station.info.new', using from the SOPAC station.info only the sites listed in sites.defaults. After checking, rename it to 'station.info' (overwriting the no-longer-useful SOPAC station.info). sh_upd_stnfo -files ../rinex/*.00o will add entries to station.info from the headers of the RINEX files, only one site (7001) in this case. Note that any RINEX files to be read by sh_upd_stnfo needs to be uncompressed first. In general, unless you know that the RINEX headers are correct, you need to check station.info after this step and make corrections as needed. 
As it happens, the RINEX files for 7001 have antenna names that are not IGS standard, a common occurrence since the receiver cannot detect the antenna type, leaving it up to the person creating the RINEX file to get it right. If you have a current version of guess_rcvant.dat in your gg/tables directory, this update will work correctly since this file includes the entry ' ant TYPE^T TRBROG' , which will allow sh_upd_stnfo to recognize 'DORNE-MARGOL. TYPE T' as GAMIT code TRBROG and create from it (using rcvant.dat) the IGS-standard name'AOAD/M_T'. Type at the /2000 level sh_gamit -expt scal -d 2000 034 035 036 -pres ELEV -orbit IGSF -copt x k p -dopts c ao (You may redirect the output to a file using with ! sh_gamit.logwith csh or sh_gamit.log with bash.) A summary file should be emailed to you as each day completes execution. Check the summary for number of stations (6), Postfit RMS (3-6 mm, none 0), Postfit nrms (~0.2), ambiguity resolution (~90%), and coordinate adjustments 30 cm (only site 7001. Look at the sky plots and phase vs elevation angle plots in the /gifs directory. If necessary, compare the q-files and sky plots with those in the check_files directory (Note that the sky plots in check_files are still in postscript, whereas those in /gifs have been converted to gif images.) Type at the /2000 level sh_glred -s 2000 034 2000 036 -expt scal -opt H G E ! sh_glred.log The script as commanded will translate the GAMIT ascii h-files in each day directory to GLOBK binary h-files and put them into the /glbf directory ( H ); create a gdl file for each day listing the h-file; run GLOBK for each day ( G ) using globk_comb.cmd and glorg_comb.cmd ( G ); and generate time series plots ( E, from program ensum). In /gsoln, use ghostscript to inspect the repeatability plots (gs psbase*). Optionally compare the glorg print files (.org) with those in the /check_files directory. We obtain good repeatablities here because we have constructed a file of a priori coordinates (regional.apr) based on a multiyear regional solution. Alternatively you can incorporate more ITRF sites and define the frame more broadly. Optionally, remove additional files from the day directories to save space: sh_cleanup -d 2000 034 035 036 -dopts p x k Notes: 1. For large or complex data sets, the utility sh_get_times can be helpful in determining the days and session spans to be processed. 2. In creating station.info for your own experiments, it is important to check it after updating from the RINEX headers unless you are sure these headers are correct. In processing, the station.info entries always override whatever is in the RINEX header or the x-files. An alternative way of creating entries for survey-mode sites is to use interactive program 'make_stnfo', then merge this file with the one created from the SOPAC station.info file for continuous sites. The survey-mode file have a shorter form of the station.info format, but this will be converted when it is merged with the continuous file, which should be listed as the reference (-ref) in the call to sh_upd_stnfo. 3. The example is set up to use ocean tidal loading ( 'Use otl.grid = Y' in the sestbl. ), which requires you to have previously downloaded into gg/tables an OTL file from the anonymous ftp directory on everest.mit.edu and to have linked this file to otl.grid also within gg/tables. The IERS/IGS standard model is otl_FES2004.grid (730 Mb). 
You may, however, substitute the smaller (45 Mb) otl_CSR4.grid, or you may turn off ocean tidal loading by setting 'Tides applied = 23' and 'Use otl.grid = N' in the sestbl. (Note that the links to the other grid and list files (met.grid, met.list, map.grid, etc.) can remain empty for running the example and for most processing. 4. If you want or need to run the example without having internet access while running, you can pre-load the RINEX (*.00o) files into 2000/rinex, the navigation files (RINEX brdc 0.00n) into 2000/brdc and the orbit (.sp3) files into 2000/igs and set -noftp in the sh_gamit command line. STEP 2: Repeat Step 1 for 2002 and 2004: At /2002: sh_setup -yr 2002 Edit process.defaults for mailto and aprf At /2002/tables: sh_upd_stnfo -l sd ; mv station.info.new station.info At /2002: sh_gamit -expt scal -d 2002 041 042 -orbit IGSF -copt x k p -dopts c ao ! sh_gamit.log sh_glred -s 2002 041 2002 042 -expt scal -opt H G E ! sh_glred.log At /2004: sh_setup -yr 2004 Edit process.defaults for mailto and aprf At /2004/tables: sh_upd_stnfo -l sd ; mv station.info.new station.info At /2004 sh_gamit -expt scal -d 2004 051 052 -orbit IGSF -copt x k p -dopts c ao ! sh_gamit.log sh_glred -s 2004 051 2004 052 -expt scal -opt H G E ! sh_glred.log Note that site 7001 is not available in 2002 or 2004. Since there are no station.info entries needed for 2002 and 2004 other than those from the SOPAC file, you can skip the second sh_upd_stnfo step. Don't forget to edit 'mailto' and 'aprf' in process.defaults after running sh_setup. STEP 3: Run GLOBK to get 3-epoch repeatabilities and velocities The key user-specific controls for this step (also incorporated in the sh_glred runs within each year) are list of sites to be used in defining the reference frame, and the a priori coordinates for these sites, Here we use the same sites and coordinate file (regional.apr) as in the single-year solutions, but this may not always be the case. Create a list of h-files to be input to GLOBK for repeatabilies and velocities. In /vsoln, type ls ../????/glbf/h*glx scal.gdl For large or complex data sets, it's helpful at this point to run glist, which will check for blunders and give you a list of all the sites used and their spans. Program glist2cmd can then be helpful in establishing a use_site list. To get repeatabilities, type glred 6 globk_comb.prt globk_comb.log scal.gdl globk_comb.cmd This run will produce in a single file (globk_comb.org) solutions for each h-file in the list (each day over the three years). You should inspect this file for quality of stabilization and adjustments and uncertainties to the positions and velocities. With a very long file, it is convenient to get a summary of the stabilization quality (rms and number of stations retained) using grep 'POS STAT' globk_comb.org Plot the results using sh_plotcrd -f globk_comb.org -s long -res -o 1 -vert -col 1 -x 2000.0 2005.0 Then inspect the plots with 'gs psbase*' and if necessary compare them with the plots in /check_files. If there are any outliers, they must be dealt with by exclusion (rename to _XPS or _XCL in the eq_file) or downweighting (sig_neu entries in globk_comb.cmd. Finally, perform a velocity solution globk 6 globk_vel.prt globk_vel.log scal.gdl globk_vel.cmd Check the .org file for chi2 increments in stacking the 7 h-files; quality of stablization for velocity (primarily) and position; and the adjustments and uncertainties of the velocity estimates. 
Plot the velocities using

sh_plotvel -ps scal -f globk_vel.org -R240/246/32/35 -factor 0.5 -arrow_value 10 -page L

The scal.ps plot should look like the one in check_files.

------------

To start over from scratch, run sh_clean_example, once for each year and once for the velocity solution. (Type the name of the script without arguments to see the documentation.)

Chinese explanation: http://www.gcedd.com/topic-26-1.html;jsessionid=AE3447013652EDDCEE7AACA2E19CBA84

The example uses six continuous (IGS) stations whose data can be downloaded online; the example package itself does not contain the data for these six stations. They are BLYT, BRAN, CIT1, JPLM, TRAK, and WLSN. There is also one project-specific station, the regional site 7001. The project covers three days in 2000, two days in 2002, and two days in 2004; site 7001 is not included in the processing for the last two years.

Now let's start.

Step 1, in the 2000 directory:
1) Run sh_setup -yr 2000, then edit the sites.defaults file. Set the e-mail address so that GAMIT mails the summary file to your mailbox when processing finishes. The a priori coordinate file regional.apr is very accurate and helps the solution a great deal; in process.defaults replace itrf05.apr with regional.apr. To complete this step, first inspect sites.defaults and confirm the stations it lists; this file is needed later, mainly for generating station.info. After checking and editing sites.defaults:
2) In the 2000/tables directory, run sh_upd_stnfo -l sd and then rename station.info.new to 'station.info'; this step mainly extracts the station.info entries for the IGS stations. One thing almost forgotten here: you first need to uncompress the original .Z observation files and convert the D-files to O-files, and then run sh_upd_stnfo -files ../rinex/*.00o.
3) In the /2000 directory, run: sh_gamit -expt scal -d 2000 034 035 036 -pres ELEV -orbit IGSF -copt x k p -dopts c ao (this also draws sky plots). It first downloads the required data and table files, and only then runs the solution. The -pres ELEV option produces the sky plots; without it no plots appear in the gifs directory. When the GAMIT solution finishes, check the three days, and if everything looks fine move on to the next step.
4) sh_glred -s 2000 034 2000 036 -expt scal -opt H G E >! sh_glred.log. This performs the adjustment and lets you check each station's three-day repeatability.
5) sh_cleanup -d 2000 034 035 036 -dopts p x k (deletes files that are no longer needed).

Notes:
1. sh_get_times can be used to check the data time spans.
2. There are two ways to generate the station.info file (interactive: make_stnfo; automatic: sh_upd_stnfo).
3. Ocean tide loading file download and setup (this was already done when installing GAMIT; for how to install GAMIT 10.4, see the video I recorded earlier).
4. How to run the example without internet access: mainly, download the data in advance and use the -noftp option.

Step 2: process the 2002 and 2004 data in the same way as 2000. Below we simply go through the steps.

(1) 2002:
1. At /2002: sh_setup -yr 2002
2. Edit process.defaults for mailto and aprf
3. At /2002/tables: sh_upd_stnfo -l sd; rename station.info.new to station.info. If there were any non-IGS stations you would also run sh_upd_stnfo -files ../rinex/*.00o as for 2000, but that is not needed here since all the stations are IGS sites.
4. At /2002: sh_gamit -expt scal -d 2002 041 042 -orbit IGSF -copt x k p -dopts c ao >! sh_gamit.log, then sh_glred -s 2002 041 2002 042 -expt scal -opt H G E >! sh_glred.log

(2) 2004:
1. At /2004: sh_setup -yr 2004
2. Edit process.defaults for mailto and aprf
3. At /2004/tables: sh_upd_stnfo -l sd; mv station.info.new station.info
4. At /2004: sh_gamit -expt scal -d 2004 051 052 -orbit IGSF -copt x k p -dopts c ao >! sh_gamit.log (downloading the data takes a long time if your connection is slow; you can of course download it in advance), then sh_glred -s 2004 051 2004 052 -expt scal -opt H G E >! sh_glred.log

Step 3: run GLOBK to produce the three-year repeatabilities and velocities.
1. In /vsoln: ls ../????/glbf/h*glx > scal.gdl
2. Repeatability solution: glred 6 globk_comb.prt globk_comb.log scal.gdl globk_comb.cmd, then grep 'POS STAT' globk_comb.org (globk_comb.org is the results file).
3. Plot the results: sh_plotcrd -f globk_comb.org -s long -res -o 1 -vert -col 1 -x 2000.0 2005.0
4. Velocity solution: globk 6 globk_vel.prt globk_vel.log scal.gdl globk_vel.cmd
5. Plot the velocities: sh_plotvel -ps scal -f globk_vel.org -R240/246/32/35 -factor 0.5 -arrow_value 10 -page L

Site 7001 only has data in 2000. The plot produced here is the final velocity map. OK, the run is complete; if you hit no errors it should all go smoothly. How to inspect and evaluate the results will be covered in a sequel...

by zzh_my@163.com. Learning never ends.
Category: GAMIT-GLOBK
[Repost] A brief look at the sstream library
huozhenhua 2012-2-20 11:34
The sstream library defines three classes: istringstream, ostringstream and stringstream, used respectively for stream input, output, and combined input/output. Each class also has a corresponding wide-character version. sstream uses string objects instead of character arrays, which avoids the danger of buffer overflow; moreover, the types of the arguments and of the target object are deduced automatically, so there is no risk even if the wrong format specifier is used.

istringstream and ostringstream are mainly used for in-memory formatting (writing formatted data into a string object with ostream methods, or reading formatted data out of a string object). For example:

ostringstream outstr;
double price = 281.00;
char* ps = "for a copy of the ISO/IEC C++ standard!";
outstr << fixed;
outstr << "Pay only$" << price << ps << endl;
string msg = outstr.str();

istringstream lets you use istream methods to read the data held in an istringstream object, and you can also initialize an istringstream object from a string object. In short, istringstream and ostringstream let you use the methods of the istream and ostream classes to manage character data stored in strings.

stringstream is usually used for data conversion. Compared with the C library conversions it is safer, more automatic and more direct. For example:

#include <string>
#include <sstream>
#include <iostream>
int main()
{
    std::stringstream stream;
    std::string result;
    int i = 1000;
    stream << i;        // insert the int into the stream
    stream >> result;   // extract the previously inserted int value
    std::cout << result << std::endl;   // prints the string "1000"
}

Besides conversions between the basic types, char* conversions are also supported:

#include <sstream>
#include <iostream>
int main()
{
    std::stringstream stream;
    char result[8];
    stream << 8888;     // insert 8888 into the stream
    stream >> result;   // extract the value from the stream into result
    std::cout << result << std::endl;   // prints "8888"
}

Note that the following usage needs care:

ifstream fs(Filename);
stringstream buff;
buff << fs.rdbuf();   // this writes the whole file into the string buffer in one go;
                      // you can then assign buff.str() to a string object.

Writing buff << fs; is wrong: look at the operator<< overloads listed below and you will see that none of them accepts such an argument. You can, however, write fs >> buff; that form works. Writing cout << Outbuff << endl; compiles, but at run time it prints nothing; it has to be changed to cout << Outbuff.rdbuf() << endl;

istringstream and ostringstream are used with file streams in the same way as stringstream: you must go through the rdbuf method to see the contents. rdbuf returns a pointer to a stringbuf object, while the str method returns a string object; the rdbuf calls above can also be replaced by str. The str and rdbuf member functions behave the same way in all three classes. The difference is that str has two overloads:

string str() const;           // copy the stream buffer into a string object
void str(const string& s);    // set the stream buffer contents from a string object

The rdbuf usage above can also be written as Outbuff.rdbuf()->str(), which is slightly more efficient. Note in particular that to empty one of these objects you cannot use the clear method, which only resets the error flags; use str("") instead.

stringstream has the same methods as ostream, and only the << operators below are available. These are its member functions:

ostream& operator<< (bool val);
ostream& operator<< (short val);
ostream& operator<< (unsigned short val);
ostream& operator<< (int val);
ostream& operator<< (unsigned int val);
ostream& operator<< (long val);
ostream& operator<< (unsigned long val);
ostream& operator<< (float val);
ostream& operator<< (double val);
ostream& operator<< (long double val);
ostream& operator<< (void* val);
ostream& operator<< (streambuf* sb);
ostream& operator<< (ostream& (*pf)(ostream&));
ostream& operator<< (ios& (*pf)(ios&));
ostream& operator<< (ios_base& (*pf)(ios_base&));

and these are global functions:

ostream& operator<< (ostream& out, char c);
ostream& operator<< (ostream& out, signed char c);
ostream& operator<< (ostream& out, unsigned char c);
ostream& operator<< (ostream& out, const char* s);
ostream& operator<< (ostream& out, const signed char* s);
ostream& operator<< (ostream& out, const unsigned char* s);

We can also use stringstream to clear a file's contents. Example code:

ofstream fs(FileName);
stringstream str;
str << fs;
fs.close();

The file's contents are now cleared, but the file itself still exists.
Category: C/C++
[Repost] Econometric software package information
kerong1996 2012-1-18 10:30
Econometric software package information Amos This package is designed for estimating linear structural equation models. It is particularly well suited for models with latent variables or measurement error components. Bootstrap methods are provided for the computation of standard errors. While this is not a general purpose econometrics package, it is useful in specialized applications. A free student version is available for Windows and OS/2 platforms. AREMOS This package, provided by Global Insight, is primarily designed for the analysis and manipulation of time-series and panel data. AREMOS estimates OLS, 2SLS, 3SLS, ARIMA, VAR, cointegration, and some nonlinear models. It is available on Windows platforms. Autobox A package that automates the identification of ARIMA and transfer function models. This package is available on DOS, Windows, and RISC platforms. BMDP A classic package that estimates regression, logit, survival function, maximum likelihood (user-specified function), and ARIMA models. Available Windows 2000/XP platforms. DataDesk A Mac/Windows program designed for exploratory data analysis. Peforms basic regression analysis. Dataplore Dataplore is a full-featured package for the analysis of time-series data using a frequency domain approach. Academic users may download evaluation copies of the software for Windows, Linux, and Sun Solaris platforms. EasyReg This package contains a wide variety of estimators for cross-sectional and time-series models. In addition to standard regression models, this package also estimates a variety of limited dependent variable (such as logit, probit, and Tobit models) and time-series models (including ARMA error processes, ARCH tests, VAR models, and tests for unit roots and cointegration). The program and sample data sets are downloadable from this site. This powerful software package operates on Windows platforms. EasyPlot A 3D data plotting package that will estimate some nonlinear relationships. Egwald Statistics - Multiple Regression This online regression package, created by Elmer G. Wiens, allows the user to estimate multiple regression models online (including models with parameter restrictions). Epicure A statistical package designed to deal with risk models. EQS - Structural Equation Modeling Software A package for estimating LISREL-type models. EViews (Quantitative Micro Software) The modern windows based replacement to the very popular MicroTSP package. This package offers state of the art time-series modelling capabilities and a good variety of single equation and simultaneous equation regression models. A variety of limited dependent variable, panel data, and ARCH models are also included to provide a very well-rounded econometrics package. Remote access to web-based data files is possible. Available for Windows. Fair-Parke program This program may be used to estimate systems of simultaneous equations (including rational expectations and autocorrelated error models). It is available in either an executable form for MS-DOS machines or as FORTRAN source code for other platforms. First Bayes A software package designed to teach Bayesian statistics. This package runs on Windows platforms and a download available at this site. (There is no charge for educational users.) Frontier This software package, written by Tim Coelli, provides a maximum likelihood estimator for the parameters of frontier regression models. This may be used for the estimation of cost or production functions with truncated normal error terms. 
It allows for time-variant and time-invariant efficiencies and may be used with either cross-sectional or panel data.
GAMS (General Algebraic Modeling System) - This is a modeling system that can be used for a variety of statistical, econometric, and other mathematical models. Model libraries are available for downloading.
Gauss - This econometrics package/matrix programming language is one of the most popular among those working with maximum likelihood estimators. Gauss is available on DOS, Windows, and Unix platforms. This site contains online tutorials and links to several libraries of Gauss code. Links to a variety of Gauss resources may be found at American University or at Eric Zivot's Gauss Resources page.
GB-Stat - A reasonably comprehensive statistics package that can be used for many econometric applications. It is available for Windows or Mac platforms.
G*Power - A free program that computes t, F, and Chi-squared statistics and the power of an experimental test. PC and Mac versions are available and may be downloaded from this site.
GQOPT - The classic nonlinear optimization package, containing a large variety of options for solving difficult maximum likelihood problems. This package, written in Fortran, requires a user-written Fortran subroutine to evaluate the likelihood function. The user may also specify analytic derivatives (as a subroutine) or choose among a variety of built-in numerical derivative routines. Available for DOS, Windows, and mainframe environments. (A Fortran compiler is also required.)
Gretl - The Gnu Regression, Econometrics and Time-series Library (gretl) is a very useful econometrics package developed by Allin Cottrell. It offers a very easy-to-use graphical interface and a growing collection of statistical and econometric routines.
Grocer - An econometrics toolkit for Scilab, a matrix processing language similar to Gauss and Matlab. Grocer and Scilab are both free open-source projects. Grocer provides an automatic general-to-specific model estimator that is analogous to PC-GETS. Estimators for 2SLS, SUR, 3SLS, VAR, VEC, VARMA, and GARCH models are also part of this package.
JMP - A package that contains a variety of ANOVA, regression, ARIMA, and time-series smoothing models.
Leading Market Technologies (producers of EXPO, a statistical package designed for financial analysts) - This site contains information about the EXPO package (a free student version is available).
LIMDEP 8.0 - This econometrics software package was written by William Greene, the author of Econometric Analysis. LIMDEP is updated frequently and contains state-of-the-art estimators for most single-equation and simultaneous-equation econometric models. No other package contains as diverse a mix of estimators for limited dependent variable models. Its only weakness is its limited capability for dealing with time-series models.
LISREL - A statistical package designed to estimate models involving linear structural relationships among observed and latent (unobserved) variables.
MacAnova - A statistical and matrix algebra package that is strongest in ANOVA models, with some basic regression and time-series models. Available on Mac, Windows, and Linux platforms. This package is free for educational users.
Macsyma - A commercial symbolic algebra package that is comparable to Mathematica and Maple.
Maple - One of the most popular mathematical processing languages. While this is not a statistical package, it is often used to teach statistical and econometric concepts. Software libraries are available for many applications.
Mathematica - The major competitor to Maple (see the notes above for Maple). A Mathematica help page is available at Yale.
MathCad (from Mathsoft) - MathCad is another mathematical processing language.
Matlab - A mathematical processing language that competes with Mathematica and Maple.
Matrixer - This Windows shareware program offers a large and growing selection of time-series, cross-sectional, and limited dependent variable models.
Maxima - An open-source descendant of the original Macsyma symbolic algebra project.
Microfit - A general econometrics package, developed by B. and M. Hashem Pesaran, for Windows platforms. It has a very good mix of estimators for ARCH, GARCH, cointegrating VAR models, and other time-series models.
Minitab - A commonly used package for teaching basic statistics and econometrics. It does not have many of the features of modern econometric software, but it is easy to use and offers good online help facilities.
MLEQuick - A menu-driven program that contains estimators for a wide variety of regression, limited dependent variable, and survival models. Those with some programming experience in C++ may also wish to use MLE++ (a collection of MLE routines written in C++).
MuPAD - An alternative to Mathematica or Maple that is available on Windows, Linux, and Mac platforms. A free "Light" version of this package is available from this site.
Mx - A structural equation modeling program that allows the user to specify models using either a matrix algebra language or a graphical user interface. Linear and nonlinear equality constraints, missing data, and multilevel models may be handled with this package.
NAG (Numerical Algorithms Group) - A major supplier of Fortran 77, Fortran 90, C, and Ada code for statistical and other applications. This company also provides a variety of statistical packages, including GLIM and a statistical add-on for Excel. (These packages can be used for a variety of ANOVA, linear, nonlinear, and user-specified maximum likelihood procedures.)
O-Matrix - A matrix programming language that can be used for statistical applications. A demo version may be downloaded from this site.
Octave - Another mathematical programming language.
Ox - An object-oriented matrix programming language designed for statistical applications. Educational users may download a copy of Ox from this site (along with PcGive, described below). This package provides a nice free alternative to GAUSS for educational and noncommercial research applications.
PcGive - A statistical package written by Doornik and Hendry that is particularly well suited for time-series models. It has been designed to encourage the general-to-specific model-building practice for which the London School of Economics is known. (This package now includes PcFiml.)
R - The R Project for Statistical Computing is a GNU project that has developed a set of mathematical tools and statistical procedures providing a free alternative to AT&T's S package.
RATS and CATS - A very powerful time-series package that runs on Windows and Mac platforms. It is very well suited for estimating VARs and a variety of time-series models. The CATS package provides additional cointegration tests and features.
Resampling Stats - A site that contains information about an alternative approach to teaching statistics, plus downloadable software programs that can be used for this approach.
S-Plus - An exploratory/visual data analysis package with limited econometric functionality.
SAS - For many years, SAS has been the dominant mainframe package for dealing with large econometric models.
Shazam - A popular package that contains estimators for most basic econometric models. This page provides sample documentation and data, and allows visitors to run Shazam programs remotely.
Soritec - A reasonably full-featured econometrics package with nice graphical capabilities. Estimators are available for OLS, 2SLS, 3SLS, SURE, ARIMA, logit, probit, and transfer function models. A student version is available for $10. Soritec is available for DOS and Windows platforms. While this package contains most regression model estimators, it is still a bit limited in sample selectivity models.
Spatial Statistics Software, Spatial Data, and Articles - This site, provided by Kelley Pace, contains free spatial software written in Matlab that makes it possible to estimate large spatial autoregression models. A Fortran 90 version of the software (Spacestatpack) is also available, as well as spatial data, spatial articles, and links to other sites with information on spatial statistics models.
STAMP (Software for Structural Time Series Modeling) - A structural time-series package that models and forecasts time-series variables.
StatsDirect - A basic statistics package that covers regression, probit and logit models, and survival analysis.
SPSS - A very popular package among econometricians in the late 1970s and early 1980s.
SST - An econometrics package that can be used to estimate a variety of maximum likelihood problems. A variety of regression and limited dependent variable models are available.
Stata - During the last decade, Stata has become one of the most widely used econometric software packages. It now includes a wide variety of robust estimators for regression, limited dependent variable, panel data, and time-series models. Only SAS comes close to the variety of estimators provided by this package. See also UCLA's Stata tutorial.
StatTransfer - While this is not an econometric software package, it is an extremely valuable tool that converts among most commonly used data formats, including spreadsheet and database formats as well as the internal storage formats of virtually all major econometric and statistical software packages.
StatView - A now discontinued product that was very popular among Mac users (a Windows version was also available). This package provides basic regression and survival analysis models, but does not contain many fundamental econometric estimators.
Stixbox - A free library of statistical routines for use with MATLAB.
SYSTAT - A popular statistical package distributed by SPSS. It is available on Windows and Mac platforms.
TSP International - The home of the original makers of TSP, the first widely used econometrics software package on mainframe computers. The current version runs on mainframe and PC platforms and contains several estimators that are not available in the current Micro-TSP and EViews packages. It still relies on the command-line interface that will be familiar to econometricians who used earlier versions; those who prefer using a mouse may wish to try the simpler interface of EViews or MicroTSP. TSP is available for mainframe, DOS, Windows, OS/2, Mac, and Unix machines.
WinSolve - A package for analyzing nonlinear economic models. A full demo version is available that functions for 120 days.
Xlisp-Stat - A statistical package written in Lisp by Luke Tierney at the University of Minnesota. The package may be downloaded from this FTP site.
XploRe - A package designed for exploratory data analysis and nonparametric econometric applications.
XTREMES - A software package designed to analyze extreme-value data, primarily used for the analysis of insurance and speculative price data. XPGL, a graphical programming language for statistics, operates within the XTREMES package.
Yorick - Another free matrix language with numerous built-in statistical functions.

That may be more packages than anyone needs, and there is no reason to try every one of them. The ones worth focusing on are EViews, TSP, Gauss, LIMDEP, Mathematica, Matlab, SAS, SPSS, and Stata. In fact these are the only ones I use regularly, so below I share my experience with them, partly to help others and partly to sort out my own thoughts.

1. EViews is probably the package everyone knows best. It is the most widely used software in universities here and is the Windows version of TSP (the old DOS program). It is known for its friendly interface and ease of use: operation is essentially point-and-click, yet it is very practical. Regression is its strength, and it handles ordinary regression, including multiple regression, well. I particularly like its unit root test and Granger causality test commands, as well as its cointegration and ARIMA models (see the code sketch at the end of this post). I recommend it to beginners. Its drawback is that the menu-driven workflow is a black box: the results may not be fully transparent, and some people may even manufacture results to support a conclusion, which lowers its credibility. (For plain regression, though, I believe every package gives identical results on the same data.) Another shortcoming is that it can only handle time-series data. As for TSP itself, nobody uses it anymore, because nobody runs DOS anymore.
2. Gauss is quite powerful. In China, Dr. Lin Guangping's book 《计算计量经济学：计量经济学家和金融分析师GAUSS编程与应用》 (Computational Econometrics: GAUSS Programming and Applications for Econometricians and Financial Analysts) comes with a light version of Gauss. The book is excellent and includes many ready-made, trustworthy code packages; you only need to write a little code to chain them together for your purpose. This effectively opens up part of the black box and improves credibility. Unfortunately I have never seen the full Gauss software; the copies in circulation are probably pirated.
3. SPSS. I used to like this package very much: friendly interface, simple to use, yet quite powerful, and it can also be programmed. It handles everything EViews can, and cross-sectional data is its strength; it deals well with multivariate problems such as factor analysis, principal component analysis, cluster analysis, and survival analysis. I currently have a licensed copy of version 11.5. If you have spare capacity to learn it, I strongly recommend it.
4. SAS. This package is extremely powerful and is also hyped as somewhat mysterious. It has everything SPSS has, plus some features for business decision support. Two things limit its use, though. First, it is more complicated than the packages above and not easy to learn. Second, SAS is huge and pirated copies are everywhere; companies tend to buy licenses, but for poor students and scholars a licensed copy is a luxury (my old machine ran a pirated copy). Piracy can undermine confidence in the results.
5. Mathematica. This is a mathematics package, now at version 5.0. It is very convenient: a few simple commands get you the result you want, it handles data fitting and model fitting well, and the graphics are beautiful. I used it often for undergraduate mathematical modeling, but its statistical capabilities are not very strong.
6. Matlab. This is engineering software, very powerful and widely used in construction and engineering; its graphics can only be described as perfect, and its programming capabilities are strong. For statistics it is a bit of overkill, and the programming is relatively involved, but for mathematical modeling it is definitely a good tool.
7. LIMDEP. This is a specialized statistics package, small (the installer is under 8 MB), currently at version 8.0. I have a licensed copy plus the three-volume user guide. Besides time-series and cross-sectional data, panel data is its strength. But I find it far too complicated to use: the commands are convoluted, the interface is unfriendly, learning it takes too much time, and it is not efficient to work with. I do not recommend it.
8. Stata. Saved for last because it is my favorite; only an endless string of "so good" can describe it. I found it while suffering through panel data and stochastic frontier analysis models in LIMDEP, and it was a relief. It combines the strengths of EViews, SPSS, LIMDEP, and Gauss, is simple to use, and genuinely unifies point-and-click menus with commands and programming. The current version is 9.0. Panel data is its strength, its feature set is enormous, the technical details are handled very well, and with a licensed copy there are online updates almost every week, plus excellent support on the website. If you can charge it to a research grant, I absolutely encourage you to buy a license; it is well worth it.

Having said all that, it may sound complicated, so here is my advice for beginners choosing among these packages:
1. Pick only what fits. If you are just starting out, on practical grounds EViews should be your first choice, with SPSS second. On grounds of simplicity, use EViews for time series, SPSS for cross-sectional data, and Stata for panel data, depending on your level and your problem. Every package has strengths and weaknesses; I combine them and choose according to the task.
2. Use licensed software if at all possible. I know we are all poor and our advisors are stingy with grant money, but in terms of how much it helps your learning and how much you can trust your results, buying a license is the rational choice. If you really must use pirated software, stick to EViews and SPSS; they are simple and still reasonably trustworthy.

That is my overview. Where it is incomplete or mistaken, additions and corrections from everyone are welcome.

(A few additions:
Excel can also do simple, intuitive statistical analysis, and with the macro add-ins installed it can do some numerical analysis as well; quite practical.
EViews can in fact handle panel data, but not very conveniently.
Some readers have commented that SPSS stands out for multivariate statistics while EViews is better for econometric models, so combining the two is ideal; that SAS is strong in data mining; and that Matlab and Mathematica have the edge in numerical computation. These points are well taken.)
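To go with point 1 above: the unit root and Granger causality tests that the EViews menus automate are standard procedures, and the sketch below shows roughly what they look like in Python's statsmodels, used here only as a stand-in (the post itself contains no code). The series x and y, the sample size, and the lag length are all invented for illustration.

# Illustrative only: a minimal Python/statsmodels sketch of the ADF unit-root
# test and the Granger causality test mentioned above. The series x and y are
# simulated; in practice you would load your own data.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, grangercausalitytests

rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200)).cumsum()            # a simulated I(1) series
y = 0.5 * x.shift(1).fillna(0) + rng.normal(size=200)   # y driven by lagged x

# Augmented Dickey-Fuller test (null hypothesis: the series has a unit root)
adf_stat, p_value = adfuller(x, autolag="AIC")[:2]
print(f"ADF statistic for x: {adf_stat:.3f}, p-value: {p_value:.3f}")

# Granger causality test (null: the 2nd column does not Granger-cause the 1st)
grangercausalitytests(pd.concat([y, x], axis=1), maxlag=2)

Any of the packages reviewed above (EViews, Stata, R, gretl and so on) runs these same two tests; the point of the sketch is only that the procedures are standard and easy to reproduce outside a menu-driven black box.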
3216 reads | 0 comments
The editor-in-chief's reason for rejection: TOO LONG. Is a 32-page review really that long?
Popularity 1 whenand 2011-11-4 15:38
Dear Mr. Chengning Zhang,

ID: SMCC-11-10-0474
Title: Reconsiderations on Clustering Analysis
Authors: Zhang, Chengning; Zhao, Mingyang; Luo, Haibo

After careful examination, I have concluded that we will not be able to publish your paper in the Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews. This looks like an interesting work. However, your paper seems to be TOO LONG, exceeding the standard number of pages significantly. For this reason we will not consider it further.

Thank you for considering our Transactions as an outlet for publication of your manuscript.

Sincerely,
Prof. Vladimir Marik
Editor-in-Chief, Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews
1769 reads | 1 comment
[Repost] New standard: New ISO standard to increase security of escalators
LEOLAND 2011-11-1 14:16
[Photo: escalators at a Copenhagen Metro station]

A new ISO standard will increase the safety and user comfort of the escalators used around the world on a daily basis by millions of people. Escalators are installed in many locations such as shopping centres, cinemas, airports, railway stations and subways, and they are used every day by millions of people worldwide.

The ISO technical specification ISO/TS 25740-1:2011, Safety requirements for escalators and moving walks – Part 1: Global essential safety requirements (GESR), specifies safety requirements for escalators and moving walks, their components and functions, and provides methods for minimizing the safety risks that might arise in the course of the operation and use of, or work on, escalators and moving walks. The objective of this technical specification is to define a common global level of safety for all people using, or associated with, escalators and moving walks and to provide a uniform process for assessing their safety.

The requirements will help:
- Developers of safety or safety-related standards for escalators and moving walks
- Designers of escalators and moving walks, manufacturers and installers, and maintenance and service organizations
- Independent third-party conformity assessment bodies
- Inspection and testing bodies and similar organizations

ISO/TS 25740-1 facilitates innovation for escalators and moving walks not designed according to existing local, national or regional safety standards, while maintaining equal levels of safety (if such innovations become state of the art, they can then be integrated into the detailed local safety standards at a later date). Developed as a product safety standard, the technical specification will help users and non-users to be protected from the effects of falling, shearing, crushing or abrasion, or other injuries.

The objectives of the safety requirements are to:
- Introduce a universal approach to identifying and mitigating potential safety risks on new designs of components for escalators and moving walks that use new technologies, materials or concepts not adequately addressed in existing standards
- Stimulate harmonization of existing safety standards for escalators and moving walks

ISO/TS 25740-1 was developed by ISO technical committee ISO/TC 178, Lifts, escalators and moving walks, and is available from ISO national member institutes. It may also be obtained directly from the ISO Central Secretariat, price 142 Swiss francs, through the ISO Store or by contacting the Marketing, Communication and Information department.
Category: Standards Archive | 2128 reads | 0 comments
Irresponsible rejection comments
Popularity 5 zhilinyang 2011-10-8 11:55
Having a manuscript rejected by reviewers is perfectly normal. But some reviewers are truly obnoxious and write irresponsible rejection comments. Here are a few examples of such review comments.

1) Although the results seem interesting and correct, the ideas and methods are standard and the results and examples are simple. Hence I do not think this paper deserves to be published in XX.
2) The studied problem is interesting, but, overall, the paper does not reach the high standards of XX.
3) The argument here is standard and not surprising. The paper does not meet the high standards of XX.
4) Although the paper has some merit, it does not contain new ideas. Most of the ideas have appeared in the literature. As a result the paper does not meet the standards of XX.
5) The argument in Section 3 is standard (once the results in Section 2 are derived). Many authors are now presenting results similar to Section 2. Unfortunately this paper does not meet the HIGH standards of XX.
Category: Teaching and Research | 8024 reads | 11 comments
The C programming language standard
hillpig 2010-5-13 12:01
ANSI C: http://en.wikipedia.org/wiki/ANSI_C
C99 (WG14): http://www.open-std.org/jtc1/sc22/wg14/
C1X: http://en.wikipedia.org/wiki/C1X
http://www-949.ibm.com/software/rational/cafe/community/ccpp/standards
Category: Uncategorized | 9 reads | 0 comments
