科学网

Tag: prediction

Related blog posts

[Repost] Checklist for computational programs in bioinformatics
chuangma2006 2012-11-28 13:57
How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis
Mauno Vihinen
BMC Genomics 2012, 13(Suppl 4):S2, doi:10.1186/1471-2164-13-S4-S2
http://www.biomedcentral.com/1471-2164/13/S4/S2
This checklist is provided to help when comparing and measuring performance of predictors and when selecting a suitable one. These are items that method developers should include in articles, or as supplement to articles, as they enable effective comparison and evaluation of the performance of predictors.
Items to check when estimating method performance and comparing performance of different methods:
- Is the method described in detail?
- Have the developers used established databases and benchmarks for training and testing (if available)?
- If not, are the datasets available?
- Is the version of the method mentioned (if several versions exist)?
- Is the contingency table available?
- Have the developers reported all the six performance measures: sensitivity, specificity, positive predictive value, negative predictive value, accuracy and Matthews correlation coefficient? If not, can they be calculated from figures provided by developers?
- Has cross validation or some other partitioning method been used in method testing?
- Are the training and test sets disjoint?
- Are the results in balance, e.g. between sensitivity and specificity?
- Has the ROC curve been drawn based on the entire test set?
- Inspect the ROC curve and AUC.
- How does the method compare to others in all the measures?
- Does the method provide probabilities for predictions?
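The checklist asks whether the six performance measures can be calculated from a contingency table. The sketch below shows one way to do so, using the standard textbook definitions. The function name, the dictionary keys and the example counts are illustrative assumptions and are not taken from the article.

```python
import math


def performance_measures(tp, fn, fp, tn):
    """Compute the six checklist measures from a 2x2 contingency table.

    tp, fn, fp, tn are the counts of true positives, false negatives,
    false positives and true negatives.
    """
    def ratio(num, den):
        # Return None rather than raising when a denominator is zero.
        return num / den if den else None

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in performance_measures(tp=40, fn=10, fp=5, tn=45).items():
        print(name, value)
```

Reporting all six values side by side makes an imbalance between, say, sensitivity and specificity easy to spot, which is one of the checks in the list.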
Category: Research | 1754 views | 0 comments
Evaluating binary classification in defect prediction
lzhx171 2012-11-10 21:49
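Because of exams and revision I have read few new papers lately. The main one was a 2010 ASE paper by Tim Menzies et al. on defect prediction using static code attributes (Defect prediction from static code features: current results, limitations, new approaches). I am taking this opportunity to start summarizing some of the literature I have read.
This week I mainly summarized how the papers I have read evaluate binary classification. First, a brief note on why prediction from static code uses binary classification. When a defect is found, it is hard to know how severe it is; in other words, static code alone makes it hard to work out which defects should be fixed first. Earlier papers have also noted that "severity is defined very broadly and there is no unified standard", so a 0/1 binary classification is more clear-cut. Other methods predict the number of defects in a module, but mining the NASA datasets shows that most modules contain only one defect, so predicting the count within a module is not very meaningful (this concerns counts within a module, not global prediction), and binary classification is sufficient.
Most papers evaluate binary classification with AUC, the area under the ROC curve, which is drawn from the relationship between pd and pf. pd is the recall: the proportion of all defects that are detected. pf is the false alarm rate: the proportion of all defect-free modules that are wrongly predicted to be defective. In 2007 Tim Menzies et al. observed that although AUC alone indicates model quality to some extent, an ROC curve contains many (pd, pf) points, so one best point has to be chosen as the model's score. To choose that point they introduced the balance measure, defined as follows: [formula image not preserved]
Clearly the best case is (pd, pf) = (1, 0). The formula is essentially the Euclidean distance from (pd, pf) to this ideal point, used as the model's evaluation measure. It weighs pd and pf equally, but its drawback is that it may correspond to more than one (pd, pf) point.
Another commonly used measure is the F1-measure, defined as: F1 = 2rp / (r + p), where r is the recall above and p is the precision (the proportion of modules predicted defective that really are defective). Many papers use it as well; its main advantage is that it puts more emphasis on predicting the defective modules.
In 2010 Tim Menzies et al. argued that binary defect classification should not be assessed with information-retrieval measures alone, and that evaluation should also take software-testing concerns from software engineering into account. For example, if we predict that 100% of the modules are defective, QA has to spend a great deal on inspecting them, whereas predicting 10% of the modules as defective keeps testing cost low; the goal of defect prediction is to find the most defects by inspecting the fewest modules. They therefore introduced an effort parameter, representing the proportion predicted defective, as a measure of testing cost, and added it to the ROC curve to form a three-dimensional coordinate system, as in the figure below: [figure not preserved: the modified ROC plot]
A balance-like measure for the modified ROC plot then becomes: [formula image not preserved]
$\alpha$ and the other parameters are weights that can be set according to the situation. The formula is the weighted distance to the ideal point (1, 0, 0).
There are many ways to evaluate binary defect classification; I will update this summary when I come across new ones.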
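The formula images for balance and its effort-aware variant are missing from the post, so the following sketch fills them in under stated assumptions. `balance` uses the commonly cited form, 1 minus the Euclidean distance from (pd, pf) to (1, 0) divided by sqrt(2). `weighted_balance` assumes the analogous weighted distance to the ideal point (pd, pf, effort) = (1, 0, 0); the assignment of alpha, beta and gamma to the three axes and the normalization are guesses, not the paper's confirmed definition. `f1_measure` follows the F1 = 2rp / (r + p) definition given in the post.

```python
import math


def balance(pd, pf):
    """1 minus the normalized Euclidean distance from (pd, pf) to the ideal (1, 0)."""
    return 1.0 - math.sqrt((1.0 - pd) ** 2 + (0.0 - pf) ** 2) / math.sqrt(2.0)


def f1_measure(recall, precision):
    """F1 = 2rp / (r + p); returns 0.0 when both inputs are zero."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


def weighted_balance(pd, pf, effort, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed form: 1 minus the normalized weighted distance to (1, 0, 0).

    alpha weights the pd axis, beta the pf axis, gamma the effort axis
    (this mapping of weights to axes is a hypothetical choice).
    """
    dist = math.sqrt(
        alpha * (1.0 - pd) ** 2 + beta * pf ** 2 + gamma * effort ** 2
    )
    return 1.0 - dist / math.sqrt(alpha + beta + gamma)


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    print(balance(pd=0.7, pf=0.2))
    print(f1_measure(recall=0.7, precision=0.5))
    print(weighted_balance(pd=0.7, pf=0.2, effort=0.3))
```

With this convention a higher score is better and the ideal point scores 1, which makes the two balance variants directly comparable with F1.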
130 views | 0 comments
Sample-based defect prediction
lzhx171 2012-10-14 18:25
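This week I read a 2012 paper published in ASE, titled Sample-based software defect prediction with active and semi-supervised learning.
The authors describe three approaches: (1) random sampling with a conventional machine learner; (2) random sampling with a semi-supervised learner; and (3) active sampling with an active semi-supervised learner, for which they propose an algorithm called ACoForest.
The core of the paper is active learning plus semi-supervised learning, and the ACoForest algorithm that combines the two. To describe it, we first describe the semi-supervised algorithm with ordinary sampling (CoForest, which the authors proposed in 2007 and applied to medical diagnosis).
Given a labeled set L and an unlabeled set U, first initialize N random trees from the labeled training set. Then, in each iteration, use the ensemble of the other N-1 random trees to predict labels for the data in U, add the high-confidence instances to a newly labeled set L', randomly subsample L' so that it satisfies certain conditions, and refine the tree on the labeled set L together with the newly labeled set L'. This repeats until no random tree changes during an iteration. (The steps marked in red in the original post must satisfy certain conditions, which were covered in an earlier report.)
That is the CoForest method. When refining the random trees, one should pick the data that helps refinement most, which shrinks the training set while improving precision. So before the refinement step (marked in blue in the original post), the top M groups of data on which the N random trees disagree most (a sign that they carry more information) are selected, and the rest of the procedure follows as before. This is ACoForest. It combines the advantages of active learning and semi-supervised learning, so that each random tree converges faster.
In the experiments, the authors compared ACoForest with the other methods; its F1 score was better than CoForest's on almost all datasets. The novelty of the algorithm lies in how the training set is obtained. In my view the point is to find the set of data that carries the most useful information: building on a disagreement-based semi-supervised learning method they proposed earlier, the authors use the degree to which multiple classifiers disagree on each data point to decide whether it is worth adding to the training set. This paper can be seen as an applied extension of the authors' earlier research.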
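The active-sampling step described above, picking the M examples on which the N trees disagree most, can be sketched as follows. Vote entropy is used here as the disagreement measure; that is an assumption for illustration, and the paper's actual criterion may differ. The function names and the toy votes are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label distribution among the ensemble's votes for one example."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def most_contentious(votes_per_example, m):
    """Return the indices of the m examples with the highest vote entropy.

    votes_per_example[i] is the list of labels that the N classifiers
    predicted for unlabeled example i. Ties keep their original order.
    """
    ranked = sorted(
        range(len(votes_per_example)),
        key=lambda i: vote_entropy(votes_per_example[i]),
        reverse=True,
    )
    return ranked[:m]


if __name__ == "__main__":
    # Toy votes from four hypothetical trees on four unlabeled modules.
    votes = [[1, 1, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
    print(most_contentious(votes, 2))
```

In a full pipeline the selected examples would be sent for manual labeling and added to L before the trees are refined; that surrounding loop is omitted here.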
201 views | 0 comments
Conference information: 4th WGNE workshop
jiati0214 2012-9-27 11:22
4th WGNE workshop on systematic errors in weather and climate models. The JSC/CAS Working Group on Numerical Experimentation (WGNE) is organising a workshop on systematic errors in weather and climate models to be hosted at the Met Office, Exeter, UK, during 15-19 April 2013. The principal goal will be to increase understanding of the nature and cause of errors in models used for weather and climate prediction (including intra-seasonal to inter-annual). It is anticipated that the focus will be on General Circulation Models (GCMs) such as those used in CMIP5, TIGGE, operational NWP, etc., including atmosphere-only, coupled atmosphere-ocean and earth system models. Biases in the atmosphere, land surface, ocean and cryosphere are all of interest. A wide variety of diagnostic techniques will be discussed, including traditional analysis methods applied to global models, process studies, the use of diagnostic and process models (e.g. single-column, cloud-resolving), and simplified experiments (e.g. aqua-planet). Of special interest will be studies that consider errors found in multiple models and errors which are present across timescales. Diagnostics and metrics that utilize novel or multi-variate observational resources and constraints to identify and characterize systematic errors are welcomed, together with studies which infer the amount of systematic error in predicted extremes from systematic errors in non-extreme situations. Alongside WGNE, the following groups will contribute to the coordination of the workshop: the Working Group on Coupled Models (WGCM), the Working Group on Seasonal to Inter-annual Prediction (WGSIP), the Working Group on Ocean Model Development (WGOMD), Stratospheric Processes And their Role in Climate (SPARC), Global Energy and Water Cycle Experiment (GEWEX), the Joint Working Group on Forecast Verification Research (JWGFVR), and the Year Of Tropical Convection (YOTC) project.
For details, see: http://www.metoffice.gov.uk/conference/wgne2013
2305 views | 0 comments
The role of prediction in pedestrian counter flow: cover article
Popularity: 3 majian 2012-7-24 22:33
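Effect of prediction on the self-organization of pedestrian counter flow http://iopscience.iop.org/1751-8121/45/30/305004/ I have just been notified that a paper I recently wrote with Dr. Wang Ziyang (王子洋) of Beijing Jiaotong University has been selected as a cover article of Journal of Physics A (2012, volume 45, issue 30). [cover image not preserved] The paper considers how the prediction mechanism people use while moving affects self-organized behavior, which shows up mainly as changes in lane formation in counter flow. To quantify this effect, we built on the method from our Physica A paper (2010, 389:2101-2117); the cover figure gives an intuitive view of how the number and shape of the lanes change. The Physica A paper is available at: k-Nearest-Neighbor interaction induced self-organized pedestrian counter flow http://www.sciencedirect.com/science/article/pii/S0378437110000464
Category: Complex Systems | 4755 views | 5 comments
My fifth first-author paper: Journal of Geophysical Research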
yongbin 2012-5-11 14:39
2011JD017069.pdf Citation: Yong, B., Y. Hong, L. L. Ren, J. J. Gourley, G. J. Huffman, X. Chen, W. Wang, and S. I. Khan (2012), Assessment of evolving TRMM-based multi-satellite real-time precipitation estimation methods and their impacts on hydrologic prediction in a high latitude basin, Journal of Geophysical Research-Atmospheres, 117, D09108, doi:10.1029/2011JD017069.
Category: Scientific Research | 6226 views | 0 comments
2004-01 Achieving real-time pulse-to-pulse PRI prediction
lcj2212916 2012-1-23 14:53
9 pages. Free cloud-drive download: http://www.ctdisk.com/file/4322860 Forum download: http://radarew.5d6d.com/thread-582-1-1.html
1859 views | 0 comments
2005-10 Enabling technology radar PRI and RF prediction
lcj2212916 2012-1-23 11:31
23 pages. Free cloud-drive download: http://www.ctdisk.com/file/4322102 Forum download: http://radarew.5d6d.com/thread-580-1-1.html
1844 views | 0 comments
2004-09 Using PRI prediction to improve ECM effectiveness
lcj2212916 2012-1-22 18:55
54 pages. Free cloud-drive download: http://www.ctdisk.com/file/4319476
1683 views | 0 comments
2005 PRI and RF prediction enabling technology
lcj2212916 2012-1-22 18:34
Free cloud-drive download: http://www.ctdisk.com/file/4319421
1781 views | 0 comments
2012-01 Achieving real-time pulse-to-pulse PRI prediction
lcj2212916 2012-1-22 17:40
28 pages. Free cloud-drive download: http://www.ctdisk.com/file/4316990 Forum download: http://radarew.5d6d.com/thread-577-1-1.html
1706 views | 0 comments
2002-02 PRI prediction enhance EW training
lcj2212916 2012-1-22 15:26
Free cloud-drive download: http://www.ctdisk.com/file/4314691 Forum download: http://radarew.5d6d.com/thread-576-1-1.html
1999 views | 0 comments
Applying number theory to short-term earthquake prediction
Popularity: 1 sfw111 2011-11-29 12:41
Abstract: Short-term earthquake prediction has always been a very difficult problem in geology. Taking pre-displacement and pre-fracture as its basis, this article offers theoretical reflections on short-term earthquake prediction; if the theory is confirmed in practice, it would be a theoretical breakthrough in short-term earthquake prediction. Key words: solid mechanics; earthquake; short-term forecasting; pre-displacement; pre-fracture. Attachment: 预位移预断裂短期地震预报数学方法探析.pdf
Category: Scientific Research | 366 views | 1 comment
[Repost] Network-based prediction: Network-based prediction for sources .
Fangjinqin 2011-8-12 09:33
Attachment: 基于网络的 预测13347.full.pdf
Network-based prediction for sources of transcriptional dysregulation using latent pathway identification analysis
Lisa Pham, Lisa Christadore, Scott Schaus, and Eric D. Kolaczyk (Program in Bioinformatics, ...)
Understanding the systemic biological pathways and the key cellular mechanisms that dictate disease states, drug response, and altered cellular function poses a significant challenge. Although high-throughput measurement techniques, such as transcriptional profiling, give some insight into the altered state of a cell, they fall far short of providing by themselves a complete picture. Some improvement can be made by using enrichment-based methods to, for example, organize biological data of this sort into collections of dysregulated pathways. However, such methods arguably are still limited to primarily a transcriptional view of the cell. Augmenting these methods still further with networks and additional -omics data has been found to yield pathways that play more fundamental roles. We propose a previously undescribed method for identification of such pathways that takes a more direct approach to the problem than any published to date. Our method, called latent pathway identification analysis (LPIA), looks for statistically significant evidence of dysregulation in a network of pathways constructed in a manner that implicitly links pathways through their common function in the cell. We describe the LPIA methodology and illustrate its effectiveness through analysis of data on (i) metastatic cancer progression, (ii) drug treatment in human lung carcinoma cells, and (iii) diagnosis of type 2 diabetes. With these analyses, we show that LPIA can successfully identify pathways whose perturbations have latent influences on the transcriptionally altered genes.
Category: Academic Articles | 2284 views | 0 comments
[Repost] Buy America (I mean American houses!)
zuojun 2011-7-14 06:52
Why? Because the price for houses will go up eventually! Gary Shilling: 20% Drop in Housing to Cause Recession in 2012. Gary Shilling, President of A. Gary Shilling & Co. and author of The Age of Deleveraging, says another recession is brewing, no matter what action the Fed takes. Shilling says the shock to trigger the next recession is "another big leg-down in housing." (An asset class the Fed has not been able to reflate.) As those familiar with Shilling know, his forecasts are generally bearish. However, in his defense, Shilling was one of the few economists who correctly predicted the dangers of the subprime mortgage market and its impact on the broader economy. The problem with the real estate market remains excess inventory. Based on Shilling's research, there are 2 million to 2.5 million excess homes in the country, a supply that will take 4-5 years to work off. The result: housing prices will fall another 20% and underwater mortgages will balloon from 23% to 40%, he says. With housing slumping again, Shilling says recession is coming to a town near you in 2012. http://finance.yahoo.com/blogs/daily-ticker/20-drop-housing-cause-recession-2012-says-gary-161445494.html
Category: From the U.S. | 1586 views | 0 comments
[Repost] CAFA: Critical Assessment of Function Annotation
lry198010 2010-8-5 19:11
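With advances in sequencing technology, obtaining genes, genomes and related sequences is no longer a major problem. For a long time to come, annotating gene function effectively can be expected to be one of the pressing problems in biology. The number of genes of unknown function is currently so large that improving function prediction by even one percentage point would save considerable resources and labor. Annotating such a large number of unknown genes entirely by experiment will not be feasible for a long time, so gene function prediction software will be the last resort for predicting the function of uncharacterized genes. However, with so many gene function prediction programs available, how to assess the reliability and accuracy of their predictions, and how to compare the results of different programs, remains an open question. CAFA is an attempt to address it. Starting now, CAFA will provide 50,000 protein sequences of unknown function. Every gene function prediction program taking part in the project annotates the function of these sequences and submits its predictions to CAFA by next January, and the predictions will then be evaluated in May at a dedicated meeting at ISMB 2011. This is a very interesting initiative! It is expected to have a large influence on gene function prediction software: new standards for evaluating the accuracy of results may emerge, and these 50,000 protein sequences may become a benchmark for protein function prediction. CAFA website: CAFA experiment website
Category: genetic association breeding | 3353 views | 0 comments
[Repost] Mathematical and Statistical Approaches to Climate Modelling and Prediction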
zuojun 2010-7-9 07:11
Isaac Newton Institute for Mathematical Sciences Mathematical and Statistical Approaches to Climate Modelling and Prediction 11 August - 22 December 2010 Here is the link: http://www.newton.ac.uk/programmes/CLP/index.html
Category: My Research Interests | 2279 views | 0 comments
Two newly published papers on Link Prediction
babyann519 2009-10-28 02:53
Many complex systems can be well described by networks where nodes represent individuals or agents, and links denote the relations or interactions between nodes. Recently, link prediction in complex networks has attracted more and more attention from computer scientists and physicists. Link prediction aims at estimating the likelihood of the existence of a link between two nodes, based on the observed links and the attributes of the nodes. For example, classical information retrieval can be viewed as predicting missing links between words and documents, and the process of recommending items to a user can be considered as a link prediction problem in the user-item bipartite network. Attached please find two newly published papers about the problem of link prediction. One (EPJB) discusses missing-link prediction via local information. The other (PRE) introduces an efficient and effective similarity index, called the Local Path index, for link prediction. PRE_80_046122 EPJB_71_623
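As a rough illustration of similarity-based link prediction, the sketch below scores unconnected node pairs with a Local Path style index. The post does not give the formula; the form assumed here is the commonly cited S = A^2 + epsilon * A^3, where A is the adjacency matrix, so the A^2 term counts common neighbors and the A^3 term adds paths of length three. The helper names, the value of epsilon and the toy graph are all hypothetical.

```python
def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph on n nodes."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matmul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def local_path_scores(a, epsilon=0.01):
    """Assumed Local Path index: S = A^2 + epsilon * A^3."""
    a2 = matmul(a, a)
    a3 = matmul(a2, a)
    n = len(a)
    return [[a2[i][j] + epsilon * a3[i][j] for j in range(n)] for i in range(n)]


def rank_missing_links(a, scores):
    """Sort currently unconnected node pairs by descending similarity score."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if a[i][j] == 0]
    return sorted(pairs, key=lambda p: scores[p[0]][p[1]], reverse=True)


if __name__ == "__main__":
    # Toy graph, for illustration only.
    a = adjacency(5, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)])
    print(rank_missing_links(a, local_path_scores(a)))
```

Setting epsilon to 0 reduces the score to the plain common-neighbors count, which is a simple example of prediction from local information alone.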
Category: Uncategorized | 13461 views | 8 comments
