Machine Learning: Keras Classifies Breast Cancer with 98.18% Accuracy
Dataset
The data is available here (https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/).
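Import the Python libraries
import numpy as np
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20;
# model_selection provides the same train_test_split function
from sklearn import preprocessing
from sklearn import model_selection as cross_validation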
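Read the data
The file has no header row, so pass header=None; otherwise pandas consumes the first record as column names and only 698 of the 699 records remain.
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None)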
Reshaping
Assign the feature column names to the dataframe:
df.columns = ['id','clump_thickness','unif_cell_size','unif_cell_shape','marg_adhesion','single_epith_size','bare_nuclei','bland_chrom','norm_nucleoli','mitoses','class']
Drop the id column, because it has no correlation with the class
df.drop(['id'], inplace=True, axis=1)
Replace missing values (marked '?') with -99999 so that they stand out as outliers
df.replace('?', -99999, inplace=True)
Map the class values to binary. In our data they are 2 and 4 (2 is benign, 4 is malignant), so 4 becomes 1 and 2 becomes 0.
df['class'] = df['class'].map(lambda x: 1 if x == 4 else 0)
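The two cleaning steps above can be checked on a tiny example before running them on the full file. The sketch below uses a hypothetical three-row frame (`sample` is an illustrative name, not part of the dataset) to count the '?' placeholders and confirm the class mapping.

```python
import pandas as pd

# Hypothetical three-row sample in the same format as the real data
sample = pd.DataFrame({
    "bare_nuclei": ["1", "?", "10"],
    "class": [2, 4, 2],
})

# Count the '?' placeholders per column before replacing them
missing_counts = (sample == "?").sum()

# Same two steps as in the article: sentinel value, then binary class
sample = sample.replace("?", -99999)
sample["class"] = sample["class"].map(lambda x: 1 if x == 4 else 0)

print(missing_counts["bare_nuclei"])  # number of '?' entries in that column
print(sample["class"].tolist())       # malignant (4) -> 1, benign (2) -> 0
```

Running the same count on the real dataframe shows how many rows the -99999 sentinel will affect.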
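The final dataframe has nine feature columns and one binary class column.
Scaling the data
Create X (features) and y (class labels)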
X = np.array(df.drop(['class'], axis=1))
y = np.array(df['class'])
Create a scaler instance
scaler = preprocessing.MinMaxScaler()
Finally, scale the data
X = scaler.fit_transform(X)
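With its default settings, MinMaxScaler maps each column to the range [0, 1] using (x - min) / (max - min). The following is a minimal NumPy sketch of that formula; `min_max_scale` is a hypothetical helper written for illustration, not the scikit-learn implementation. It also shows what the -99999 sentinel does to a column: the sentinel becomes 0 and every real value is pushed very close to 1.

```python
import numpy as np

def min_max_scale(column):
    """Rescale a 1-D array to [0, 1] using (x - min) / (max - min)."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    return (column - lo) / (hi - lo)

# A column without missing values spreads over the whole [0, 1] range
clean = min_max_scale([1, 5, 10])

# The same column with one -99999 sentinel: real values crowd near 1
with_sentinel = min_max_scale([1, 5, 10, -99999])

print(clean)
print(with_sentinel)
```

If that compression turns out to hurt training, dropping or imputing the rows with missing values would be an alternative worth trying.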
Splitting the data
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
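The article scales the full dataset first and splits afterwards, which lets the test rows influence the scaler's minimum and maximum. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that order on synthetic data; the names `X_demo`, `y_demo`, and `demo_scaler`, and the `stratify` and `random_state` arguments, are my assumptions and are not used in the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real features and labels (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 11, size=(100, 9)).astype(float)
y_demo = rng.integers(0, 2, size=100)

# Split first; stratify keeps the class ratio similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# Fit the scaler on the training rows only, then reuse it for the test rows
demo_scaler = MinMaxScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

Fixing `random_state` also makes the split, and therefore the reported result, reproducible between runs.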
Building and training the machine learning model
Import the Python libraries
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
Create the machine learning model
Create a model instance:
model = Sequential()
Add the layers to the model:
model.add(Dense(9, activation='sigmoid', input_shape=(9,)))
model.add(Dense(27, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(54, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(27, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))
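To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, and Dropout layers add none. The sketch below does the arithmetic in plain Python; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
# Input width followed by the units of each Dense layer in the model above
layer_sizes = [9, 9, 27, 54, 27, 1]

def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Pair each layer's input width with its output width
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)
print(total)
```

The total should match the trainable parameter count that `model.summary()` reports for the layers above.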
Compile the model
model.compile(optimizer=keras.optimizers.Adam(), loss=keras.losses.mean_squared_logarithmic_error)
I use Adam as the optimizer and mean squared logarithmic error as the loss function.
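Mean squared logarithmic error averages the squared difference between log(1 + y_true) and log(1 + y_pred). The following NumPy sketch is my own re-implementation of that definition for illustration (`msle` is a hypothetical helper, and it skips the small clipping that Keras applies internally). It shows that for labels and sigmoid outputs in [0, 1], the loss can never exceed (ln 2)^2, roughly 0.48.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean of (log(1 + y_true) - log(1 + y_pred)) squared."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

# Every prediction exactly right
perfect = msle([0, 1], [0.0, 1.0])

# Every prediction as wrong as a sigmoid output can be
worst = msle([0, 1], [1.0, 0.0])

print(perfect)
print(worst)
```

Because the worst case is about 0.48, a quantity computed as 100 - loss * 100 stays above roughly 52 even for a model that gets every sample wrong.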
Train the machine learning model
model.fit(X_train, y_train, batch_size=30, epochs=2000, verbose=1, validation_data=(X_test, y_test))
Output:
Epoch 2000/2000
558/558 [==============================] - 0s 320us/step - loss: 0.0104 - val_loss: 0.0182
Evaluating the results
loss = model.evaluate(X_test, y_test, verbose=1, batch_size=30)
print("Final result is {}".format(100 - loss*100))
Output:
Final result is 98.18395614690546
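The final result is 98.18%. Note that this number is 100 minus 100 times the test loss, so it reflects how small the loss is rather than the share of correctly classified samples.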
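To report classification accuracy in the usual sense, the sigmoid outputs can be thresholded and compared with the labels. The sketch below assumes a 0.5 threshold and uses hypothetical probabilities in place of real model output; `accuracy_from_probabilities` is an illustrative helper, not part of Keras.

```python
import numpy as np

def accuracy_from_probabilities(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    y_true = np.asarray(y_true)
    # Flatten (n, 1) model output to (n,) before thresholding
    y_pred = (np.asarray(y_prob).reshape(-1) >= threshold).astype(int)
    return float(np.mean(y_pred == y_true))

# Hypothetical values standing in for y_test and model.predict(X_test)
demo_true = [0, 1, 1, 0]
demo_prob = [[0.1], [0.8], [0.4], [0.2]]

demo_accuracy = accuracy_from_probabilities(demo_true, demo_prob)
print(demo_accuracy)  # 3 of 4 correct -> 0.75
```

With the trained model this would be called as accuracy_from_probabilities(y_test, model.predict(X_test)). Passing metrics=['accuracy'] to model.compile is another way to have Keras report accuracy during training and evaluation.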