This semester I took a course on applying Python to big data, taught by Mr. Fang, and got to know some of the libraries commonly used for data analysis: NumPy, plt (matplotlib), sklearn, and so on.
The libraries that left the deepest impression on me:
1. plt: visualizes data, which makes data analysis much more intuitive.
2. sklearn: ships many machine-learning algorithms ready to use, which is very convenient.
The teacher assigned a few Python exercises, so I'm writing one up here as a summary.
Problem description:
Load the iris dataset with scikit-learn, classify it with the KNN, SVM, and naive Bayes algorithms, then compare the strengths and weaknesses of the three methods.
Code:
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 1 18:24:09 2019
@author: Administrator
"""
# =============================================================================
# Assignment:
# Load the iris dataset with scikit-learn, classify it with the KNN, SVM,
# and naive Bayes algorithms, then compare the pros and cons of the three.
# =============================================================================
# =============================================================================
# Iris is a classic multivariate dataset: from four attributes (sepal
# length, sepal width, petal length, petal width) we predict which of
# three species (Setosa, Versicolour, Virginica) a flower belongs to.
# =============================================================================
# Import the required packages
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # train/test splitting
from sklearn.neighbors import KNeighborsClassifier    # KNN classifier
from sklearn.svm import SVC                           # SVM classifier
from sklearn.naive_bayes import GaussianNB            # Gaussian naive Bayes

iris = load_iris()    # load the iris dataset
data = iris.data      # feature matrix
target = iris.target  # species labels

# Use train_test_split() to divide the data into a training set and a test set
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.3, random_state=0)

def plot_by_label(labels, title):
    """Scatter-plot the test samples (first two features) colored by label.

    data_test is a 2-D array, so we loop over its rows; boolean masks
    (or reshape) would work just as well.
    """
    for i in range(len(labels)):
        if labels[i] == 0:
            plt.scatter(data_test[i, 0], data_test[i, 1], c='r')
        elif labels[i] == 1:
            plt.scatter(data_test[i, 0], data_test[i, 1], c='g')
        else:
            plt.scatter(data_test[i, 0], data_test[i, 1], c='b')
    plt.title(title)
    plt.xlabel("sepal length")
    plt.ylabel("sepal width")
    plt.show()

# 1. KNN classification
knn = KNeighborsClassifier()       # build the KNN classifier
knn.fit(data_train, target_train)  # train it
accurate_Knn = knn.score(data_test, target_test, sample_weight=None)
print('KNN predictions:', knn.predict(data_test))
print('True labels:', target_test)
print('KNN test-set accuracy:', accurate_Knn)
plot_by_label(target_test, "iris")                 # true labels
plot_by_label(knn.predict(data_test), "iris-KNN")  # predicted labels
print("\n\n")

# 2. SVM classification
svm = SVC(kernel='rbf', gamma=0.1, decision_function_shape='ovo', C=0.8)
svm.fit(data_train, target_train)  # train the SVC
# NOTE: scored on the training set here, unlike KNN above
accurate_Svm = svm.score(data_train, target_train)
print('SVM predictions:', svm.predict(data_test))
print('True labels:', target_test)
print('SVM training-set accuracy:', accurate_Svm)
plot_by_label(target_test, "iris")
plot_by_label(svm.predict(data_test), "iris-SVM")
print("\n\n")

# 3. Naive Bayes classification
nb = GaussianNB()  # build the classifier
nb.fit(data_train, target_train)
accurate_Nb = nb.score(data_train, target_train)  # training-set accuracy
print('NB predictions:', nb.predict(data_test))
print('True labels:', target_test)
print('NB training-set accuracy:', accurate_Nb)
plot_by_label(target_test, "iris")
plot_by_label(nb.predict(data_test), "iris-NB")
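A side note on the plotting: matplotlib's scatter accepts whole arrays, so the per-point loop can be collapsed into one call per class using boolean masks. A minimal sketch, reusing data_test, target_test, and iris from the script above:

import numpy as np
import matplotlib.pyplot as plt

colors = ['r', 'g', 'b']
for label, color in zip(np.unique(target_test), colors):
    mask = target_test == label  # rows belonging to this species
    plt.scatter(data_test[mask, 0], data_test[mask, 1],
                c=color, label=iris.target_names[label])
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.title("iris")
plt.legend()
plt.show()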
To save space I won't paste the remaining output; the other runs look much the same.
Analysis of results:
Comparing the accuracies printed by the three algorithms, I got KNN = 0.97, SVM = 0.96, naive Bayes = 0.94,
i.e. on this small dataset KNN > SVM > naive Bayes; so with little data, KNN and SVM reach somewhat higher classification accuracy here, while naive Bayes trails slightly. (One caveat: the script scores KNN on the test set but SVM and naive Bayes on the training set, so the three numbers are not strictly comparable; see the cross-validation sketch below.)
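A minimal, self-contained sketch of a more even-handed comparison, using 5-fold cross-validation with the same hyperparameters (the cv=5 choice is mine, not part of the assignment):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
models = {
    'KNN': KNeighborsClassifier(),
    'SVM': SVC(kernel='rbf', gamma=0.1, C=0.8),
    'NB':  GaussianNB(),
}
for name, model in models.items():
    # cross_val_score trains and scores each model on 5 different splits
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    print('%s mean accuracy: %.3f (std %.3f)' % (name, scores.mean(), scores.std()))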
My skills are limited, so if you spot any problems, corrections are most welcome.