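Implementing the k-means algorithm in Python
This article shares a concrete Python implementation of the k-means algorithm for reference. The details follow.
This is also Exercise 9.4 in Zhou Zhihua's "Machine Learning".
The data set is watermelon data set 4.0, shown below. The columns are 编号 (ID), 密度 (density) and 含糖率 (sugar content); the header is kept in Chinese because the code selects columns by these names.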
编号,密度,含糖率
1,0.697,0.46
2,0.774,0.376
3,0.634,0.264
4,0.608,0.318
5,0.556,0.215
6,0.403,0.237
7,0.481,0.149
8,0.437,0.211
9,0.666,0.091
10,0.243,0.267
11,0.245,0.057
12,0.343,0.099
13,0.639,0.161
14,0.657,0.198
15,0.36,0.37
16,0.593,0.042
17,0.719,0.103
18,0.359,0.188
19,0.339,0.241
20,0.282,0.257
21,0.784,0.232
22,0.714,0.346
23,0.483,0.312
24,0.478,0.437
25,0.525,0.369
26,0.751,0.489
27,0.532,0.472
28,0.473,0.376
29,0.725,0.445
30,0.446,0.459
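The algorithm is simple: choose k samples as the initial mean vectors, assign every sample to its nearest mean vector, recompute each mean vector from the samples assigned to it, and repeat until the mean vectors stop changing. The code is not complicated either, so here it is: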
# -*- coding: utf-8 -*-
"""Exercise 9.4"""
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv('../dataset/watermelon4.0.csv', sep=',')[["密度", "含糖率"]].values

########################################## K-means #######################################
k = int(sys.argv[1])
# Randomly choose k samples from data as the initial mean vectors
mean_vectors = random.sample(list(data), k)


def dist(p1, p2):
    """Euclidean distance between two points."""
    return np.sqrt(np.sum((p1 - p2) * (p1 - p2)))


while True:
    print(mean_vectors)
    # Start every iteration with one empty cluster per mean vector
    clusters = [[] for _ in mean_vectors]
    for sample in data:
        distances = [dist(sample, m) for m in mean_vectors]
        min_index = distances.index(min(distances))
        clusters[min_index].append(sample)
    new_mean_vectors = []
    for c, v in zip(clusters, mean_vectors):
        # An empty cluster keeps its old mean vector
        new_mean_vector = np.mean(c, axis=0) if c else v
        # If the relative change between the new mean vector and the old one
        # is less than 0.0001 in every component, do not update the mean vector
        if np.all(np.abs((new_mean_vector - v) / v) < 0.0001):
            new_mean_vectors.append(v)
        else:
            new_mean_vectors.append(new_mean_vector)
    if np.array_equal(mean_vectors, new_mean_vectors):
        break
    mean_vectors = new_mean_vectors

# Show the clustering result
total_colors = ['r', 'y', 'g', 'b', 'c', 'm', 'k']
colors = random.sample(total_colors, k)  # only 7 colors, so k must be <= 7
for cluster, color in zip(clusters, colors):
    density = [arr[0] for arr in cluster]
    sugar_content = [arr[1] for arr in cluster]
    plt.scatter(density, sugar_content, c=color)
plt.show()
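The two steps inside the loop (assign every sample to its nearest mean vector, then recompute the means) can also be written with NumPy broadcasting. The following is only a minimal sketch of one iteration, not the script itself: the helper name `kmeans_step` is hypothetical, and the six samples are rows 1, 2, 3, 10, 11 and 12 of the table above.

```python
import numpy as np


def kmeans_step(data, means):
    """One k-means iteration: assign samples, then recompute the means.

    `kmeans_step` is a hypothetical helper used only for illustration.
    """
    # Pairwise Euclidean distances, shape (n_samples, k)
    distances = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
    # Index of the nearest mean vector for every sample
    labels = distances.argmin(axis=1)
    new_means = means.copy()
    for i in range(len(means)):
        members = data[labels == i]
        if len(members) > 0:  # an empty cluster keeps its old mean
            new_means[i] = members.mean(axis=0)
    return labels, new_means


# Rows 1, 2, 3, 10, 11 and 12 of the watermelon table (density, sugar content)
data = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264],
    [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
])
means = data[[0, 3]].copy()  # start from rows 1 and 10 as the two means
labels, means = kmeans_step(data, means)
print(labels)
print(means)
```

Calling `kmeans_step` repeatedly until the returned means stop changing gives the same kind of loop as the script, without the explicit per-sample distance list.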
To run it, type python k_means.py 4 on the command line, where 4 is the value of k.
Below are the results for k = 3, 4 and 5. Because the initial mean vectors are chosen at random, the result differs from run to run.
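Since the outcome depends on the random initial mean vectors, one common way to make runs repeatable and comparable is to seed the random generator and keep the run with the smallest within-cluster squared error, which is the quantity k-means tries to minimize. The sketch below is an assumption-laden illustration rather than part of the original script: `assign`, `kmeans` and `squared_error` are hypothetical helper names, and the seeds 0 to 4 are arbitrary.

```python
import numpy as np


def assign(data, means):
    """Index of the nearest mean vector for every sample."""
    distances = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(data, k, seed, max_iter=100):
    """Plain k-means with a seeded random initialisation (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Choose k distinct samples as the initial mean vectors
    means = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(max_iter):
        labels = assign(data, means)
        new_means = means.copy()
        for i in range(k):
            members = data[labels == i]
            if len(members) > 0:  # an empty cluster keeps its old mean
                new_means[i] = members.mean(axis=0)
        if np.allclose(new_means, means):
            break
        means = new_means
    # Recompute the labels so they always match the returned means
    return assign(data, means), means


def squared_error(data, labels, means):
    """Sum of squared distances from each sample to its cluster mean."""
    return float(np.sum((data - means[labels]) ** 2))


# Watermelon data set 4.0 (density, sugar content), rows 1 to 30
data = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
    [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
    [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
    [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
    [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.784, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437],
    [0.525, 0.369], [0.751, 0.489], [0.532, 0.472], [0.473, 0.376],
    [0.725, 0.445], [0.446, 0.459],
])

# Five restarts with different (arbitrary) seeds; keep the lowest-error run
runs = [kmeans(data, k=3, seed=s) for s in range(5)]
errors = [squared_error(data, labels, means) for labels, means in runs]
best_labels, best_means = runs[int(np.argmin(errors))]
print(errors)
print(best_means)
```

Re-running with the same seed reproduces the same clustering, so a particular figure can be regenerated; different seeds may still converge to different local optima.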