
MATLAB clustering algorithm notes

Read 11809 times | 2011-11-4 15:59 | Personal category: teresa study notes | System category: Research notes | Keywords: scholar, notes

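MATLAB provides two ways to perform cluster analysis:

1. Call the clusterdata function to cluster the sample data in a single step. This is concise and convenient, but its scope is narrow: the short form clusterdata(X, cutoff) fixes the distance metric and the linkage method (Euclidean distance, single linkage), and the name-value form exposes only a limited set of options.

2. Cluster step by step: (1) use pdist to compute the pairwise distances between the objects (rows) of the data set, which measure their similarity or dissimilarity; (2) use linkage to define how the objects are linked into a hierarchical tree; (3) use cophenet to evaluate the quality of the clustering; (4) use cluster to form the clusters.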
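The four steps above can be sketched as follows. This is a minimal illustration, not the author's original code: it assumes the Statistics and Machine Learning Toolbox is installed, and the seed, the 'average' linkage method, and the choice of three clusters are assumptions made only for the example.

```matlab
% Step-by-step hierarchical clustering sketch (assumed settings, see lead-in).
rng(12);                                          % assumed seed, for reproducibility only
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5]; % 30 observations, 3 variables

Y = pdist(X, 'euclidean');      % (1) pairwise distances between rows, as a row vector
Z = linkage(Y, 'average');      % (2) hierarchical cluster tree, (m-1)-by-3 matrix
c = cophenet(Z, Y);             % (3) cophenetic correlation; closer to 1 means the
                                %     tree reflects the original distances better
T = cluster(Z, 'maxclust', 3);  % (4) cut the tree into at most 3 clusters

fprintf('cophenetic correlation: %.3f\n', c);
disp(accumarray(T, 1)');        % number of observations in each cluster
```

Working step by step makes it possible to swap the distance metric or linkage method and compare the resulting cophenetic correlations before committing to a cut.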

clusterdata
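clusterdata clusters the sample data in a single call. Its drawback is limited flexibility: the short syntax does not let the user change how the distances are computed.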
Construct clusters from data

Syntax

T = clusterdata(X, cutoff)
T = clusterdata(X,'param1',val1,'param2',val2,...)

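clusterdata clusters the data X by calling three functions in turn: pdist, linkage, and cluster.

X is an m×n matrix (m observations in rows, n variables in columns). cutoff is the threshold that determines where the hierarchy is cut into clusters: per the MATLAB documentation, a value with 0 < cutoff < 2 is compared against the inconsistency coefficient, while an integer cutoff ≥ 2 is interpreted as the maximum number of clusters.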
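To make the relationship between the one-step and the step-by-step routes concrete, the sketch below runs both on the same data. It assumes that the short form uses Euclidean distance and single linkage, as the MATLAB documentation describes; the seed and test data are assumptions chosen for the example.

```matlab
% Compare clusterdata(X, cutoff) with the equivalent three-call sequence.
rng(12);                                          % assumed seed
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];
cutoff = 0.9;

T1 = clusterdata(X, cutoff);          % one-step clustering

Y  = pdist(X, 'euclidean');           % same steps written out by hand
Z  = linkage(Y, 'single');
T2 = cluster(Z, 'cutoff', cutoff);    % default criterion is 'inconsistent'

sameResult = isequal(T1, T2);         % true if the two routes agree
disp(sameResult);
```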

'distance': any of the distance metric names allowed by pdist, for example 'euclidean', 'minkowski' or 'mahalanobis' (follow the 'minkowski' option by the value of the exponent p).

'linkage': any of the linkage methods allowed by the linkage function.

'cutoff': cutoff for the inconsistent or distance measure.

'maxclust': maximum number of clusters to form.

'criterion': either 'inconsistent' or 'distance'.

'depth': depth for computing inconsistent values.
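The parameters listed above can be combined in a single call. The following is a hedged sketch: the particular metric, linkage method, and threshold values are assumptions picked for illustration, and the Statistics and Machine Learning Toolbox is assumed to be available.

```matlab
% Name-value usage of clusterdata (illustrative settings only).
rng(12);                                          % assumed seed
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];

% City-block distance with average linkage, at most 3 clusters
Ta = clusterdata(X, 'distance', 'cityblock', ...
                    'linkage',  'average', ...
                    'maxclust', 3);

% Cut the tree at a linkage distance of 1.5 instead of using the
% inconsistency coefficient
Tb = clusterdata(X, 'linkage',   'complete', ...
                    'criterion', 'distance', ...
                    'cutoff',    1.5);

fprintf('clusters found: %d (maxclust), %d (distance cutoff)\n', ...
        numel(unique(Ta)), numel(unique(Tb)));
```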

Example 1
X=[11978 12.5 93.5 31908;…;57500 67.6 238.0 15900];

T=clusterdata(X,0.9);

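Example 2

rand('state',12); % seed the legacy 'state' generator so the random numbers are reproducible
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5]; % three groups of 10 rows, offset by 0, 1.2 and 2.5, giving a 30-by-3 matrix
T = clusterdata(X,'maxclust',3); % cluster the rows of X into at most 3 clusters
find(T==2) % indices of the rows assigned to cluster 2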

ans =

    11
    12
    13
    14
    15
    16
    17
    18
    19
    20

The random values in X are:

0.526563655116966 0.314160189162942 0.080065636597459
0.750205183120925 0.460299825114432 0.898696464610818
0.665461227195465 0.694011417546359 0.910465702645885
0.964047588742116 0.001430822000113 0.739874220859649
0.108159056609906 0.553028790706944 0.066380478467501
0.931359132232088 0.825424913690079 0.952315438754947
0.678086959238781 0.341903966913527 0.561481952384538
0.982730942848522 0.704605210117893 0.087097863371214
0.61469160803023 0.046998923124057 0.60240645087182
0.580161260939054 0.917354969151808 0.588163845515278
1.382463100625415 1.963581607169883 1.944378753177476
2.10675860143888 1.67148731861097 1.348544774679616
1.398800733731886 1.661420472538929 1.322245532927235
1.714104593458096 1.491763801233318 1.45432173385559
1.541023406502844 1.843749450951724 1.646589531966269
2.085124805604476 1.845243529032419 2.173408525894387
1.307487415137871 1.538016451755838 2.160077353655978
1.414477011958066 1.993290719360019 1.991074187198809
1.6194348823557 1.477032783770685 1.897881627154902
1.59880598537658 1.549889835739045 1.575633454910911
3.372473795296814 2.696353072311677 3.399817031232327
3.137051221640599 3.365280920827324 3.060890738629505
3.294132532102184 3.19619501414256 2.907001691195813
2.655105137336478 3.067858951189933 2.971985435647922
3.309410399232246 2.592839654750768 2.577141096894014
2.59557218643413 3.334773703571633 3.087931862332622
2.58206179687188 3.416156742412155 3.264419917354281
2.71127001520713 2.770324454152381 2.634665034882088
2.796178480239203 3.254737176245175 3.418015616180941
2.647417542325437 2.545380417876791 3.253541134557589

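All of the random values are less than 5.

Now set X(9,1)=500, cluster again, and look at the result. As expected, row 9 is placed in a cluster of its own: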

find(T==3)

ans =

     9

find(T==2)

ans =

     1
     2
     3
     4
     5
     6
     7
     8
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20

find(T==1)

ans =

    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
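The outlier experiment can be reproduced with a sketch like the one below. It uses rng instead of the legacy rand('state',12) call, so the random values (and possibly the cluster numbering) differ from the listing above; that substitution and the seed are assumptions of this example.

```matlab
% Re-run Example 2 with an outlier injected into row 9.
rng(12);                                          % assumed seed; not the legacy generator
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];
X(9,1) = 500;                                     % make row 9 an extreme outlier

T = clusterdata(X, 'maxclust', 3);                % defaults: Euclidean distance, single linkage

% List the members of each cluster
for k = 1:max(T)
    fprintf('cluster %d: %s\n', k, mat2str(find(T == k)'));
end

outlierAlone = (sum(T == T(9)) == 1);             % true when row 9 forms its own cluster
disp(outlierAlone);
```

Because single linkage merges the closest points first and row 9 is roughly 500 units away from everything else, it joins the tree last and occupies one of the three clusters by itself, leaving only two clusters for the remaining 29 rows.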

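To reprint this article, please contact the original author for permission and state that it comes from the ScienceNet blog of Sun Yuefang (孙月芳).
Link: https://m.sciencenet.cn/blog-582961-504552.html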
