Cluster Analysis in Python
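Clustering
The data has no labels, so clustering is a form of unsupervised learning.
Hierarchical clustering
- linkage: computes the distances used to merge clusters and returns the linkage matrix
- fcluster: cuts the hierarchy described by the linkage matrix into flat cluster labels
- Both functions come from the scipy package (scipy.cluster.hierarchy)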
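# Import linkage and fcluster functions, plus the plotting libraries
from scipy.cluster.hierarchy import linkage, fcluster
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with numeric columns 'x' and 'y'
# Use the linkage() function to compute distances
Z = linkage(df[['x', 'y']], 'ward')

# Generate cluster labels (at most 2 flat clusters)
df['cluster_labels'] = fcluster(Z, 2, criterion='maxclust')

# Plot the points with seaborn
sns.scatterplot(x='x', y='y', hue='cluster_labels', data=df)
plt.show()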
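The snippet above depends on a DataFrame that is not shown. As a minimal self-contained sketch of the same linkage/fcluster steps, the example below uses hypothetical synthetic data (two well-separated blobs generated with NumPy, not data from the original post) and skips the plotting.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),  # blob around (0, 0)
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),  # blob around (5, 5)
])

# Ward linkage: each of the n - 1 rows of Z records one merge
Z = linkage(points, method='ward')

# Cut the tree so that at most 2 flat clusters remain
labels = fcluster(Z, 2, criterion='maxclust')

print(Z.shape)
print(labels)
```

Note that fcluster numbers its clusters starting from 1, not 0.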
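k-means clustering
- kmeans: computes the cluster centers (centroids)
- vq: assigns each sample to its nearest centroid, forming n clusters
- whiten: a common preprocessing step that rescales each feature by dividing it by its standard deviation, so that every feature has unit variance. Despite the name, SciPy's whiten does not center the data or decorrelate features.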
Normalize a group of observations on a per feature basis.
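# Import kmeans and vq functions, plus the plotting libraries
from scipy.cluster.vq import kmeans, vq
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with numeric columns 'x' and 'y'
# Compute cluster centers (the second return value is the distortion)
centroids, _ = kmeans(df[['x', 'y']], 2)

# Assign cluster labels (the second return value holds the distances)
df['cluster_labels'], _ = vq(df[['x', 'y']], centroids)

# Plot the points with seaborn
sns.scatterplot(x='x', y='y', hue='cluster_labels', data=df)
plt.show()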
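For a version that runs without the DataFrame, here is a rough sketch of the same kmeans/vq workflow on hypothetical synthetic data (two NumPy-generated blobs, an assumption made only for illustration). The data is passed through whiten first, which the SciPy documentation recommends before calling kmeans.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical toy data: two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),  # blob around (0, 0)
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),  # blob around (5, 5)
])

# Rescale every feature to unit variance before clustering
scaled = whiten(points)

# kmeans returns the centroids and the mean distortion
centroids, distortion = kmeans(scaled, 2)

# vq assigns each point to its nearest centroid
labels, distances = vq(scaled, centroids)

print(centroids)
print(labels)
```

Unlike fcluster, vq numbers clusters from 0. Which blob receives label 0 depends on the random initialization, so it can change between runs.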
# Import the whiten function
from scipy.cluster.vq import whiten

goals_for = [4, 3, 2, 3, 1, 1, 2, 0, 1, 4]

# Use the whiten() function to standardize the data
scaled_data = whiten(goals_for)
print(scaled_data)

Output:
[3.07692308 2.30769231 1.53846154 2.30769231 0.76923077 0.76923077
 1.53846154 0.         0.76923077 3.07692308]
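To make the scaling concrete, the following sketch checks that whiten on this list matches a plain division by the population standard deviation (NumPy's default, ddof=0). The variable name manual is introduced here only for illustration.

```python
import numpy as np
from scipy.cluster.vq import whiten

goals_for = np.array([4, 3, 2, 3, 1, 1, 2, 0, 1, 4], dtype=float)

# whiten divides each feature by its standard deviation
scaled = whiten(goals_for)

# The same scaling done by hand; the standard deviation here is 1.3
manual = goals_for / goals_for.std()

print(np.allclose(scaled, manual))  # True
print(scaled.std())                 # approximately 1.0
print(scaled.mean())                # not 0: the data is scaled, not centered
```

This is also why scaled values keep a nonzero mean while their standard deviation becomes 1.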
A small demo on the FIFA dataset
# fifa is assumed to be a pandas DataFrame that is already loaded
from scipy.cluster.vq import whiten
import matplotlib.pyplot as plt

# Scale wage and value
fifa['scaled_wage'] = whiten(fifa['eur_wage'])
fifa['scaled_value'] = whiten(fifa['eur_value'])

# Plot the two columns in a scatter plot
fifa.plot(x='scaled_wage', y='scaled_value', kind='scatter')
plt.show()

# Check mean and standard deviation of scaled values
print(fifa[['scaled_wage', 'scaled_value']].describe())

Output:
       scaled_wage  scaled_value
count      1000.00       1000.00
mean          1.12          1.31
std           1.00          1.00
min           0.00          0.00
25%           0.47          0.73
50%           0.85          1.02
75%           1.41          1.54
max           9.11          8.98