Reading The Elements of Statistical Learning

Viewed 6263 times | 2010-9-5 20:29 | Personal category: Research notes | System category: Research notes | Keywords: Learning, statistical

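Chapter 2 Overview of Supervised Learning
2.1 Some common terms that mean the same thing:
Inputs are called predictors in the statistical literature, or classically independent variables; in pattern recognition they are called features.
Outputs are called responses, or classically dependent variables.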

2.2 gives the basic definitions of the regression and classification problems.

2.3 introduces two simple prediction methods, least squares and KNN:
Least squares produces a linear decision boundary: low variance but potentially high bias.
KNN is wiggly and unstable, i.e. high variance and low bias.
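To make the contrast concrete, here is a minimal sketch of both classifiers. It assumes NumPy and a made-up two-class Gaussian dataset; the function names and the simulated data are my own illustration, not taken from the book.

```python
import numpy as np

def fit_least_squares(X, y):
    """Regress the 0/1 class label on the inputs (intercept first)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta

def predict_least_squares(beta, X):
    """Classify as 1 when the fitted value exceeds 0.5 (a linear boundary)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return (Xa @ beta > 0.5).astype(int)

def knn_predict(X_train, y_train, X_new, k):
    """Majority vote among the k nearest training points (use odd k)."""
    # Squared Euclidean distance from every new point to every training point.
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Hypothetical data: two overlapping Gaussian clouds in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

beta = fit_least_squares(X, y)
err_ls = float(np.mean(predict_least_squares(beta, X) != y))
err_1nn = float(np.mean(knn_predict(X, y, X, k=1) != y))
err_15nn = float(np.mean(knn_predict(X, y, X, k=15) != y))
print("training error:", err_ls, err_1nn, err_15nn)
```

The 1-NN training error is zero because every training point is its own nearest neighbor, which is exactly why training error says little about how wiggly the fitted boundary is.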

This passage is a classic summary:

A large subset of the most popular techniques in use today are variants of these two simple procedures. In fact 1-nearest-neighbor, the simplest of all, captures a large percentage of the market for low-dimensional problems. The following list describes some ways in which these simple procedures have been enhanced:

~ Kernel methods use weights that decrease smoothly to zero with distance from the target point, rather than the effective 0/1 weights used by k-nearest neighbors.

~ In high-dimensional spaces the distance kernels are modified to emphasize some variable more than others.

~ Local regression fits linear models by locally weighted least squares rather than fitting constants locally.

~ Linear models fit to a basis expansion of the original inputs allow arbitrarily complex models.

~ Projection pursuit and neural network models consist of sums of non-linearly transformed linear models.

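2.4 Statistical decision theory

I couldn't get into this and didn't really follow it; I'll reread it tomorrow before starting new material. Today's reading: p35-p43.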

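2.5 discusses the problems of local methods such as KNN with high-dimensional features. As the dimension p grows, a hypercube neighborhood that captures a fraction r of uniformly distributed samples needs edge length r^(1/p), which approaches 1, so the neighborhood is no longer local. Shrinking r to keep it local leaves very few samples to average, which makes the variance very high.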
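A quick numeric check of the edge-length argument, assuming inputs uniformly distributed in a unit hypercube. The function name is my own.

```python
def edge_length(r, p):
    """Edge of the sub-cube that captures a fraction r of a unit p-cube's volume."""
    return r ** (1.0 / p)

# Fraction r = 1% of the data, increasing dimension p.
for p in (1, 2, 3, 10, 100):
    print(p, round(edge_length(0.01, p), 3))
```

In 10 dimensions, capturing just 1% of the data already requires covering about 63% of the range of each input.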

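2.6 is organized into three parts: statistical models, supervised learning, and function approximation. The statistical-model part gives the general probabilistic model for the problem; the supervised-learning part explains fitting a function from training examples; the function-approximation part covers the usual parameter estimation, e.g. maximum likelihood, which takes as the estimate the parameters that maximize the objective (the log-likelihood).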
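As a worked example of "choose the parameters that maximize the objective", here is the standard case of an additive Gaussian error model. The notation (f_θ, σ, N) follows the usual convention and is my assumption about what the section uses.

```latex
% Additive error model with Gaussian noise
Y = f_\theta(X) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

% Log-likelihood of N training pairs (x_i, y_i)
L(\theta) = -\frac{N}{2}\log(2\pi) - N\log\sigma
            - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2

% Only the last term depends on \theta, so maximizing L(\theta)
% is the same as minimizing the residual sum of squares
\hat\theta = \arg\max_\theta L(\theta)
           = \arg\min_\theta \sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2
```

So under this model, least squares is just maximum likelihood in disguise.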

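2.7 introduces structured regression methods. Minimizing the residual sum of squares over arbitrary functions has infinitely many solutions, so restricting the class of functions makes an otherwise ill-posed problem solvable.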

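2.8 An introduction to some classes of estimators:

2.8.1 Roughness penalty: in essence these are regularized methods, where the penalty term limits the complexity of the function space.

2.8.2 Kernel methods and local regression: a kernel function is really similar to the local-neighbor approach, with the kernel weighting samples by their distance from the target point.

2.8.3 Basis functions and dictionary methods: select a number of basis functions from a dictionary and sum them to obtain the fitted function. Single-layer feed-forward neural networks, boosting, MARS, and MART all belong to this class.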
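A small sketch of how kernel weighting relates to the nearest-neighbor average, using a Nadaraya-Watson estimate with a Gaussian kernel in one dimension. The kernel choice, the bandwidth, and the toy data are assumptions for illustration only.

```python
import math

def gaussian_kernel(x0, x, bandwidth):
    """Weight that decays smoothly with the distance between x and the target x0."""
    return math.exp(-((x - x0) ** 2) / (2.0 * bandwidth ** 2))

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of the responses around x0."""
    weights = [gaussian_kernel(x0, x, bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def knn_average(x0, xs, ys, k):
    """Plain average over the k nearest points: weights are 1 inside, 0 outside."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

# Toy data: y = x^2 on a small grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
print(knn_average(2.0, xs, ys, 3), nadaraya_watson(2.0, xs, ys, 0.7))
```

Both estimates are local averages; the only difference is whether a point's weight drops to zero abruptly (KNN) or smoothly (kernel).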

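2.9 Model selection and the bias-variance tradeoff

Usually, the higher the model complexity (e.g. the smaller the regularization parameter), the lower the bias but the higher the variance, which yields a very low training error but a very high test error; and vice versa. See Figure 2.11.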
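The tradeoff can be seen in a small simulation. This sketch assumes NumPy, uses polynomial degree as the complexity knob (rather than a regularization parameter), and fits a made-up noisy sine curve; none of these choices come from the book.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Hypothetical data: noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

def mse(coefs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

x_train, y_train = make_data(30)
x_test, y_test = make_data(1000)

degrees = [1, 3, 7]
train_mse, test_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, d)  # least-squares polynomial fit
    train_mse.append(mse(coefs, x_train, y_train))
    test_mse.append(mse(coefs, x_test, y_test))

for d, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree {d}: train {tr:.3f}, test {te:.3f}")
```

Training error can only go down as the degree increases, because each model contains the previous one; test error has no such guarantee, which is the point of Figure 2.11.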

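Chapter 3 Linear Methods for Regression

Read on from p61, i.e. p61 to p97. This part covers several linear methods for regression problems: first the basic regression problem, then multiple regression and multiple outputs, then subset selection and forward stepwise/stagewise selection (the difference is that stagewise does not adjust the other variables when it updates).

3.4 Shrinkage methods add a regularizer to smooth things out, because subset selection is a discrete process: a variable is either kept or dropped. A squared penalty gives ridge regression, and an absolute-value penalty gives the lasso. There is also a variant, least angle regression, which is closely related to the lasso; I'll look at it again tomorrow.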
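The squared versus absolute-value penalty distinction can be sketched as follows. This assumes NumPy, centered data with no intercept, and the objective (1/2)·RSS + λ·‖β‖₁ for the lasso solved by plain coordinate descent; the book's own algorithms and scaling conventions may differ, and the data are simulated.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Shrink z toward zero by t, setting it exactly to zero if |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with variable j's current contribution removed.
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta

# Hypothetical data: only two of five variables matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X -= X.mean(axis=0)
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(0.0, 0.5, 50)
y -= y.mean()

print("ridge:", np.round(ridge(X, y, 10.0), 3))
print("lasso:", np.round(lasso_cd(X, y, 10.0), 3))
```

Ridge shrinks every coefficient but leaves them non-zero, while the soft-thresholding step in the lasso can set coefficients exactly to zero, which is what makes it behave like a continuous form of subset selection.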

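Addendum: Section 3.4.3 analyzes the L_p-norm penalties that correspond to the constraint in linear regression (the book writes q for what I call p here). With p = 1.2 the contour looks very similar to the elastic net penalty, but the former is smooth while the latter has sharp (non-differentiable) corners. (Differentiable here only means that the first derivative exists, not that derivatives of all orders exist.)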
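For reference, the two penalties being compared can be written as below. The symbols λ and α are the usual ones for the elastic net; treating them as matching the book's notation is my assumption.

```latex
% L_q penalty (q is the book's symbol; the note above calls it p)
\sum_{j} |\beta_j|^{q}, \qquad q = 1.2

% Elastic net penalty: a mix of the ridge and lasso penalties
\lambda \sum_{j} \bigl(\alpha\,\beta_j^{2} + (1-\alpha)\,|\beta_j|\bigr)

% For q > 1 the L_q term is differentiable at zero:
\frac{d}{d\beta}\,|\beta|^{q} = q\,|\beta|^{q-1}\operatorname{sign}(\beta)
\;\longrightarrow\; 0 \quad \text{as } \beta \to 0

% The |\beta_j| term in the elastic net has one-sided derivatives
% -1 and +1 at zero, hence the sharp corners.
```

The corners matter because they are what allow coefficients to be set exactly to zero.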

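3.4 Least Angle Regression (LAR) is almost identical to the lasso. The difference is the lasso modification: when a non-zero coefficient hits zero, the corresponding variable must be dropped from the active set and the direction recomputed.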

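3.5 discusses principal components regression and partial least squares. These can be understood as dimension reduction: map the original d-dimensional data onto m (m < d) derived directions and then solve the regression there.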
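A minimal sketch of principal components regression along these lines, assuming NumPy and centered (but not standardized) inputs. The function name and the simulated data are hypothetical; partial least squares is not shown.

```python
import numpy as np

def pcr(X, y, m):
    """Regress y on the first m principal components of X.

    Returns (intercept, beta) expressed in the original d input coordinates.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Rows of Vt are the principal directions of the centered inputs.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_m = Vt[:m].T                      # d x m projection matrix
    Z = Xc @ V_m                        # n x m derived inputs
    # The columns of Z are orthogonal, so each coefficient is a univariate fit.
    theta = (Z.T @ yc) / (Z ** 2).sum(axis=0)
    beta = V_m @ theta                  # map back to the d original inputs
    return y_mean - x_mean @ beta, beta

# Hypothetical data with d = 4 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(0.0, 0.3, 40)

intercept2, beta2 = pcr(X, y, m=2)
intercept4, beta4 = pcr(X, y, m=4)
print(np.round(beta2, 3), np.round(beta4, 3))
```

With m equal to d nothing is discarded and the fit coincides with ordinary least squares; the regularization comes entirely from dropping the low-variance directions.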

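3.6 compares the selection and shrinkage methods; the difference seems to lie in how each chooses its direction of optimization.

3.7 Selection and shrinkage with multiple outputs

3.8 More on the lasso and path algorithms: the basic optimization form is loss + penalty, and different choices of loss and penalty give rise to much of the discussion around the lasso and its relatives. It also mentions that linear programs can be solved with the simplex method; I'm noting this down in case I need to read up on linear programming later and don't know where to start.

3.9 Analysis of computational cost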

Chapter 4 Linear Methods for Classification

4.1 introduces linear methods as those whose decision boundaries are linear.

4.2 Linear regression of an indicator matrix

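4.3 LDA (linear discriminant analysis): assume each class has a multivariate Gaussian density,

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right).$$

When these densities are used to obtain the class posterior probabilities, assuming that the covariance matrices $\Sigma_k$ are all equal leads to LDA.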
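With a shared covariance, the log posterior ratio between classes becomes linear in x, and each class gets a linear discriminant score x'Σ⁻¹μ_k − ½ μ_k'Σ⁻¹μ_k + log π_k. A minimal sketch of that rule, assuming NumPy and plug-in estimates (class proportions, class means, pooled covariance); the function names and the simulated data are my own.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate priors, class means, and the pooled within-class covariance."""
    classes = np.unique(y)
    n, p = X.shape
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    pooled = np.zeros((p, p))
    for c, mu in zip(classes, means):
        d = X[y == c] - mu
        pooled += d.T @ d
    pooled /= n - len(classes)
    return classes, priors, means, pooled

def predict_lda(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, priors, means, pooled = model
    A = np.linalg.solve(pooled, means.T)   # column k is Sigma^{-1} mu_k
    scores = X @ A - 0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Hypothetical data: two well-separated Gaussian classes with equal covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)

model = fit_lda(X, y)
accuracy = float(np.mean(predict_lda(model, X) == y))
print("training accuracy:", accuracy)
```

Because the scores are linear in x, the boundary between any two classes is a hyperplane, which is where the "linear" in LDA comes from.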

The text then discusses the various cases of LDA and how to compute it.

4.4 (p137): go over it again tomorrow and finish Chapter 4.

To repost this article, please contact the original author for permission and credit 彭泽武's ScienceNet (科学网) blog.
Link: https://m.sciencenet.cn/blog-472136-359654.html

Sharing research notes http://blog.sciencenet.cn/u/petrelli Keeping notes is an effective way of managing oneself
