Can a machine learning algorithm classify different waveforms?
Suppose we have three kinds of waveforms, as shown in the figure.
The blue pattern shows data sampled from a sine function;
the red pattern shows data sampled from a random distribution;
the green pattern shows data sampled from a combination of three sine functions.
Here is our question: given many samples drawn from these three waveform types, can a machine learning algorithm classify them correctly, as in a typical multi-class classification problem in supervised learning?
The short answer? Yes! Both classical machine learning algorithms and deep learning models can handle this multi-class classification problem well. Here is a brief description of the experiment.
Data: 10,000 training waveforms and 10,000 test waveforms, each consisting of 500 data points. The three patterns are roughly evenly distributed in both the training and test sets (a data-generation sketch follows this list).
Algorithms: four classification algorithms from Scikit-Learn (Naive Bayes, Random Forest, Gradient Boosting, and Support Vector Machine) and one neural network architecture implemented in Keras (a multi-layer perceptron) were tested.
Performance: every classifier achieved prediction accuracy above 98%.
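To make the classifier code below self-contained, here is a minimal sketch of how such a dataset could be generated. The frequencies, phases, noise distribution, and label encoding are illustrative assumptions, not the original data-generation code.

# generate synthetic waveforms (illustrative assumptions) -----------------------
import numpy as np

def make_waveform(kind, t, rng):
    # kind 0: single sine; kind 1: random noise; kind 2: sum of three sines
    if kind == 0:
        return np.sin(rng.uniform(1, 5) * t + rng.uniform(0, 2 * np.pi))
    if kind == 1:
        return rng.normal(0.0, 1.0, t.size)
    return sum(np.sin(rng.uniform(1, 5) * t + rng.uniform(0, 2 * np.pi))
               for _ in range(3))

def make_dataset(n_samples, n_points=500, seed=0):
    # roughly even mix of the three pattern types, 500 points per waveform
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    labels = rng.integers(0, 3, n_samples)
    data = np.stack([make_waveform(k, t, rng) for k in labels])
    return data, labels

X, Y = make_dataset(10000, seed=0)            # 10,000 training waveforms
X_test, Y_test = make_dataset(10000, seed=1)  # 10,000 test waveforms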
Four classification algorithms implemented in Python with Scikit-Learn
# import libraries (timing and metrics) -----------------------------------------
import timeit
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
# algorithm 1 ------------------------------------------------------------------
print(" Naive Bayes ... ")
start = timeit.default_timer()
from sklearn import naive_bayes
classifier = naive_bayes.GaussianNB()
nb_model = classifier.fit(X, Y)
prediction = nb_model.predict(X_test)
end = timeit.default_timer()
print(" accuracy = ", accuracy_score(Y_test, prediction), " time = ", end - start)
print(confusion_matrix(Y_test, prediction))
print("")
# algorithm 2 ------------------------------------------------------------------
print(" Random Forest ... ")
start = timeit.default_timer()
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
rf_model = classifier.fit(X, Y)
prediction = rf_model.predict(X_test)
end = timeit.default_timer()
print(" accuracy = ", accuracy_score(Y_test, prediction), " time = ", end - start)
print(confusion_matrix(Y_test, prediction))
print("")
# algorithm 3 ------------------------------------------------------------------
print(" Gradient Boosting ... ")
start = timeit.default_timer()
from sklearn.ensemble import GradientBoostingClassifier as gbc
classifier = gbc()
gbc_model = classifier.fit(X, Y)
prediction = gbc_model.predict(X_test)
end = timeit.default_timer()
print(" accuracy = ", accuracy_score(Y_test, prediction), " time = ", end - start)
print(confusion_matrix(Y_test, prediction))
print("")
# algorithm 4 ------------------------------------------------------------------
print(" SVM ... ")
start = timeit.default_timer()
from sklearn import svm
classifier = svm.SVC()
svc_model = classifier.fit(X, Y)
prediction = svc_model.predict(X_test)
end = timeit.default_timer()
print(" accuracy = ", accuracy_score(Y_test, prediction), " time = ", end - start)
print(confusion_matrix(Y_test, prediction))
print("")
A three-layer multilayer perceptron implemented with Keras
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 32)                16032
_________________________________________________________________
activation_1 (Activation)    (None, 32)                0
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056
_________________________________________________________________
activation_2 (Activation)    (None, 32)                0
_________________________________________________________________
dense_3 (Dense)              (None, 32)                1056
_________________________________________________________________
activation_3 (Activation)    (None, 32)                0
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 99
_________________________________________________________________
activation_4 (Activation)    (None, 3)                 0
=================================================================
Total params: 18,243
Trainable params: 18,243
Non-trainable params: 0
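For completeness, here is a minimal Keras sketch that reproduces the layer shapes in the summary above (500 inputs, three hidden Dense layers of 32 units, a 3-unit softmax output). The activation functions, optimizer, loss, and batch size are assumptions; only the layer sizes and the 30 training epochs come from this article.

# minimal MLP sketch; activations, optimizer, loss and batch size are assumptions
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical

model = Sequential()
model.add(Dense(32, input_dim=500))   # dense_1: 500*32 + 32 = 16,032 params
model.add(Activation('relu'))
model.add(Dense(32))                  # dense_2: 32*32 + 32 = 1,056 params
model.add(Activation('relu'))
model.add(Dense(32))                  # dense_3: 1,056 params
model.add(Activation('relu'))
model.add(Dense(3))                   # dense_4: 32*3 + 3 = 99 params
model.add(Activation('softmax'))      # three-way class probabilities
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

model.fit(X, to_categorical(Y, 3), epochs=30, batch_size=32, verbose=0)
prediction = model.predict(X_test).argmax(axis=1)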
Accuracy and execution time for each algorithm (in each confusion matrix, rows are true classes and columns are predicted classes)
MLP (3 layers, 32 neurons, 30 epochs) ...
 accuracy =  0.9986  time =  14.8910531305
[[3378    0    0]
 [   0 3407    0]
 [  13    1 3201]]

 Naive Bayes ...
 accuracy =  0.9874  time =  1.30055759897
[[3252    0  126]
 [   0 3407    0]
 [   0    0 3215]]

 Random Forest ...
 accuracy =  1.0  time =  3.23794154915
[[3378    0    0]
 [   0 3407    0]
 [   0    0 3215]]

 Gradient Boosting ...
 accuracy =  1.0  time =  128.849120774
[[3378    0    0]
 [   0 3407    0]
 [   0    0 3215]]

 SVM ...
 accuracy =  1.0  time =  116.055568152
[[3378    0    0]
 [   0 3407    0]
 [   0    0 3215]]