
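Implementing Decision Trees, Random Forests, Logistic Regression, KNN, Naive Bayes and SVM with sklearn, Using the Raisin Dataset as an Example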

Date: 2023-09-25 10:21:48

Dataset introduction

The dataset used here is the Raisin dataset from the UCI repository:

https://archive.ics.uci.edu/ml/datasets/Raisin+Dataset

Its description:

Images of Kecimen and Besni raisin varieties grown in Turkey were obtained with CVS. A total of 900 raisin grains were used, including 450 pieces from both varieties. These images were subjected to various stages of pre-processing and 7 morphological features were extracted. These features have been classified using three different artificial intelligence techniques.

In short, 7 features are extracted from the images through image processing:

1.) Area: Gives the number of pixels within the boundaries of the raisin.

2.) Perimeter: It measures the environment by calculating the distance between the boundaries of the raisin and the pixels around it.

3.) MajorAxisLength: Gives the length of the main axis, which is the longest line that can be drawn on the raisin.

4.) MinorAxisLength: Gives the length of the small axis, which is the shortest line that can be drawn on the raisin.

5.) Eccentricity: It gives a measure of the eccentricity of the ellipse, which has the same moments as raisins.

6.) ConvexArea: Gives the number of pixels of the smallest convex shell of the region formed by the raisin.

7.) Extent: Gives the ratio of the region formed by the raisin to the total pixels in the bounding box.

There are two raisin classes: Kecimen and Besni.

A look at part of the data:

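The dataset comes from the paper "Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods".

The paper uses three methods, LR, MLP and SVM, with the following accuracies:

Here I try several different machine learning methods on the same data.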

Data preprocessing

Step 1: read the dataset:

import pandas as pd

datas = pd.read_excel('Raisin_Dataset.xlsx')
datas.head(10)

Step 2: separate the features from the label:

# Extract the feature data and the label data
cols = [i for i in datas.columns if i not in ['Class']]  # feature names, excluding the label
print(cols)

uesr_data = datas[cols]
uesr_data.head()

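Step 3: normalization, which noticeably helps accuracy. I first did it with the torch.nn module (the figures showed the data before and after normalization). I do not recommend this approach, because nn.functional.normalize rescales each sample (row) to unit L2 norm instead of scaling each feature. Other normalization methods work as well, and sklearn has its own scaling tools: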

import numpy as np
import torch
import torch.nn as nn

data = uesr_data.to_numpy(dtype=np.float64)
print(data)
print(data.shape)
# Normalize: by default F.normalize divides each row (sample) by its L2 norm
data = torch.FloatTensor(data)
data = nn.functional.normalize(data)
data = data.numpy()
print(data)
print(data.shape)

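Standardization with sklearn:

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
data = sc.fit_transform(uesr_data)  # each feature scaled to zero mean and unit variance
data

target = datas['Class'].values  # target holds the labels as a NumPy array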
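The two approaches above can be compared on a toy matrix. This minimal NumPy sketch uses made-up numbers, not raisin data. It assumes the PyTorch defaults p=2 and dim=1 for nn.functional.normalize, which divide each row by its L2 norm, and compares that with column-wise standardization as done by StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy matrix: 3 samples x 2 features on very different scales (made-up numbers)
X = np.array([[90000.0, 0.75],
              [75000.0, 0.68],
              [60000.0, 0.64]])

# Row-wise L2 normalization (what nn.functional.normalize does by default)
row_normalized = X / np.linalg.norm(X, axis=1, keepdims=True)

# Column-wise standardization (what StandardScaler does)
standardized = (X - X.mean(axis=0)) / X.std(axis=0)
sk_standardized = StandardScaler().fit_transform(X)

print(np.linalg.norm(row_normalized, axis=1))               # every sample now has length 1
print(standardized.mean(axis=0), standardized.std(axis=0))  # about 0 and 1 per feature
print(row_normalized[:, 1])  # the small-scale feature is squashed to almost 0
```

Row normalization lets the large-valued pixel-count features dominate and nearly erases the small ratio features. Standardization puts every feature on a comparable scale.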

Step 4: split into training and test sets, with shuffling:

from sklearn.model_selection import train_test_split

# random_state fixes the random seed; stratify=target keeps the class ratio the same in both splits
x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.3, shuffle=True,
                                                    random_state=666, stratify=target)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
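The sketch below shows what stratify=target does. It uses a hypothetical label vector with the same 450/450 class balance as the raisin data, and the feature column is only a placeholder. With test_size=0.3 the test set gets 270 of the 900 samples, split evenly between the two classes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with the raisin class balance: 450 Kecimen + 450 Besni
labels = np.array(["Kecimen"] * 450 + ["Besni"] * 450)
features = np.arange(900).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(features, labels, test_size=0.3, shuffle=True,
                                    random_state=666, stratify=labels)
classes, test_counts = np.unique(y_te, return_counts=True)
test_class_counts = {str(c): int(n) for c, n in zip(classes, test_counts)}
print(test_class_counts)  # {'Besni': 135, 'Kecimen': 135}
```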

Decision tree

The methods below mainly follow the Chinese sklearn documentation and this Bilibili video:

/video/BV1vJ41187hk/

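A first try with a decision tree

I start with a decision tree on this dataset. I use the ready-made estimators in sklearn; see the Chinese sklearn documentation to learn more.

from sklearn import tree

clf = tree.DecisionTreeClassifier(criterion='gini')
clf = clf.fit(x_train, y_train)
clf_result = clf.score(x_test, y_test)  # accuracy on the test set
clf_result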

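The accuracy is 0.815. The best of the paper's methods reaches about 0.86, so this is already a decent result.

Next comes parameter tuning. First, here is what the main parameters do:

1. criterion decides how impurity is computed. sklearn offers two choices:

1) "entropy": use information entropy

2) "gini": use the Gini impurity

The default is the Gini impurity; "entropy" makes the tree split on information gain.

Use the Gini impurity when the data is high-dimensional and noisy.

When the dimensionality is low and the data is clean, entropy and Gini make little difference.

When the tree underfits, try entropy.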

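2. random_state sets the seed for the randomness used when choosing splits; the default is None.

3. splitter also controls randomness in the tree and takes two values. With "best", the tree still has some randomness but picks the best split among the features, favoring the more important ones (feature importance can be inspected through the feature_importances_ attribute). With "random", splits are chosen more randomly. The tree tends to grow deeper and larger because it picks up more unnecessary information, and that lowers its fit to the training set, which can help against overfitting.

4. max_depth: limits the maximum depth of the tree; all branches beyond this depth are cut off.

5. min_samples_leaf: after a split, every child node must contain at least min_samples_leaf training samples. Otherwise the split does not happen, or it happens in a way that satisfies this constraint. It is usually combined with max_depth and works especially well in regression trees, where it makes the model smoother. A value that is too small can cause overfitting; a value that is too large prevents the model from learning the data.

6. min_samples_split: a node must contain at least min_samples_split training samples to be allowed to split; otherwise it is not split.

7. max_features: limits the number of features considered when searching for the best split, and the remaining features are ignored for that split. Like max_depth, it restricts how complex the tree can become.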
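The sketch below makes the two impurity measures concrete and shows how max_depth and min_samples_leaf constrain a fitted tree. The helpers gini and entropy are hypothetical, written for illustration. The data comes from make_classification as a stand-in for the 7 raisin features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def gini(counts):
    """Gini impurity of a node from its class counts: 1 - sum(p_k^2)."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return float(1.0 - np.sum(p ** 2))

def entropy(counts):
    """Information entropy (in bits) of a node: sum(p_k * log2(1 / p_k))."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]  # empty classes contribute nothing (0 * log 0 is taken as 0)
    return float(np.sum(p * np.log2(1.0 / p)))

# A node with 450 samples of each class is maximally impure
print(gini([450, 450]), entropy([450, 450]))  # 0.5 1.0
# A pure node has zero impurity under both criteria
print(gini([900, 0]), entropy([900, 0]))      # 0.0 0.0

# Synthetic stand-in data (assumption: not the real raisin dataset)
X, y = make_classification(n_samples=900, n_features=7, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                             min_samples_leaf=20, random_state=0).fit(X, y)
leaf_sizes = np.bincount(clf.apply(X))  # training samples per node id
leaf_sizes = leaf_sizes[leaf_sizes > 0]  # only leaves receive samples
print(clf.get_depth(), leaf_sizes.min())
```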

Tuning

First, find a suitable max_depth:

from sklearn import tree
import matplotlib.pyplot as plt

acc = []
for i in range(1, 21):
    clf1 = tree.DecisionTreeClassifier(criterion='entropy', random_state=166, splitter='random',
                                       max_depth=i, min_samples_leaf=20, min_samples_split=20)
    clf1 = clf1.fit(x_train, y_train)
    clf1_result = clf1.score(x_test, y_test)  # test-set accuracy for this depth
    acc.append(clf1_result)
plt.plot(range(1, 21), acc, color='red', label='max_depth')
plt.legend()
plt.show()
print(acc)

max_depth=9 reaches 0.874, so I keep max_depth=9 and tune min_samples_leaf next:

from sklearn import tree
import matplotlib.pyplot as plt

acc = []
for i in range(1, 40):
    clf1 = tree.DecisionTreeClassifier(criterion='entropy', random_state=166, splitter='random',
                                       max_depth=9, min_samples_leaf=i, min_samples_split=20)
    clf1 = clf1.fit(x_train, y_train)
    clf1_result = clf1.score(x_test, y_test)
    acc.append(clf1_result)
plt.plot(range(1, 40), acc, color='red', label='min_samples_leaf')
plt.legend()
plt.show()
print(acc)

min_samples_leaf=13 reaches 0.885. That is the general tuning workflow, and other parameters can be tuned the same way.

GridSearchCV can also do the tuning, and it adds 10-fold cross-validation:

from sklearn import tree
from sklearn.model_selection import GridSearchCV

param_grid = [{'criterion': ['entropy', 'gini']}]
clf1 = tree.DecisionTreeClassifier()
clf_GS = GridSearchCV(clf1, param_grid, cv=10)  # 10-fold cross-validation
clf_GS.fit(data, target)
print(clf_GS.best_params_)     # best parameter combination
print(clf_GS.best_score_)      # its mean cross-validated accuracy
print(clf_GS.best_estimator_)  # estimator refit on all data with those parameters
print(clf_GS.best_index_)
print(clf_GS.scorer_)
print(clf_GS.n_splits_)
print(clf_GS.cv_results_)
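The sketch below writes out by hand the 10-fold cross-validation that GridSearchCV(cv=10) runs for each parameter candidate. It uses synthetic data in place of data and target. For a classifier, an integer cv means StratifiedKFold without shuffling, so the manual loop and cross_val_score should give the same mean.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (data, target); an assumption, not the raisin data
X, y = make_classification(n_samples=900, n_features=7, random_state=0)
model = DecisionTreeClassifier(criterion="entropy", random_state=0)

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])  # fresh unfitted copy per fold
    scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_mean = np.mean(scores)

library_mean = cross_val_score(model, X, y, cv=10).mean()
print(manual_mean, library_mean)
```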

Entropy is used from here on. Next, tune random_state:

import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV

param_grid = [{'random_state': np.arange(1, 300, 10)}]
clf1 = tree.DecisionTreeClassifier(criterion='entropy')
clf_GS = GridSearchCV(clf1, param_grid, cv=10)
clf_GS.fit(data, target)
print(clf_GS.best_params_)
print(clf_GS.best_score_)
print(clf_GS.best_estimator_)

Choose random_state=211 and tune the other parameters:

import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV

param_grid = [{'max_depth': np.arange(1, 20, 1), 'max_leaf_nodes': np.arange(25, 50, 1)}]
clf1 = tree.DecisionTreeClassifier(criterion='entropy', random_state=211)
clf_GS = GridSearchCV(clf1, param_grid, cv=10)
clf_GS.fit(data, target)
print(clf_GS.best_params_)
print(clf_GS.best_score_)
print(clf_GS.best_estimator_)


import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV

# Other grids tried earlier:
# param_grid = [{'criterion': ['entropy', 'gini'], 'max_features': [2, 4, 6, 8]},
#               {'max_depth': np.arange(1, 30, 1), 'max_leaf_nodes': np.arange(25, 50, 1),
#                'min_samples_split': np.arange(2, 2 + 20, 1)}]
param_grid = [{'min_samples_leaf': np.arange(2, 2 + 20, 1)}]
clf1 = tree.DecisionTreeClassifier(criterion='entropy', max_depth=2, max_features=6,
                                   max_leaf_nodes=25, random_state=211, min_samples_split=2)
clf_GS = GridSearchCV(clf1, param_grid, cv=10)
clf_GS.fit(data, target)
print(clf_GS.best_params_)
print(clf_GS.best_score_)
print(clf_GS.best_estimator_)

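Useful methods

Using feature_importances_:

# GridSearchCV fits clones of clf1, so clf1 itself stays unfitted; use the refit best estimator
best_tree = clf_GS.best_estimator_
[*zip(cols, best_tree.feature_importances_)]

# apply returns the index of the leaf each test sample falls into
best_tree.apply(x_test)

# predict returns the predicted class (or regression value) for each test sample
clf.predict(x_test)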

graphviz can draw the branching structure of the decision tree:

import graphviz
from sklearn import tree

clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=2, max_features=6,
                                  max_leaf_nodes=25, random_state=211, min_samples_split=2)
clf = clf.fit(x_train, y_train)
clf_result = clf.score(x_test, y_test)
print(clf_result)
# class_names must follow clf.classes_, which sklearn sorts alphabetically (Besni, Kecimen)
clf_data = tree.export_graphviz(clf, feature_names=cols, class_names=list(clf.classes_),
                                filled=True, rounded=True)
graph = graphviz.Source(clf_data)
graph

The figure can be saved:

graph.render('test-output/round-table.gv', view=True)
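If Graphviz is not installed, sklearn can also print the tree as plain text with export_text. This minimal sketch uses synthetic data, and the names feature_0 to feature_6 are placeholders, not the raisin column names.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=900, n_features=7, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder feature names
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0).fit(X, y)

report = export_text(clf, feature_names=names)  # text rendering of the split rules
print(report)
```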

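With cross-validation the accuracy dropped, to a final value of 0.85. The earlier 0.885 was chosen by repeatedly scoring the same test split, so it is optimistic. This is probably also why the accuracies reported in the paper are lower.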
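The scores above have one caveat: StandardScaler was fit on the whole dataset before splitting and before GridSearchCV, so the validation folds influenced the scaling. Wrapping the scaler and the model in a Pipeline re-fits the scaler on the training folds of every split. Decision trees are insensitive to this kind of per-feature scaling, so the effect is small here. It matters more for KNN, SVM and logistic regression later on. The sketch below runs on synthetic data, and the max_depth grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=900, n_features=7, random_state=0)

# Scaling happens inside each CV split, fit on that split's training folds only
pipe = make_pipeline(StandardScaler(),
                     DecisionTreeClassifier(criterion="entropy", random_state=0))
scores = cross_val_score(pipe, X, y, cv=10)
print(scores.mean(), scores.std())

# Pipeline parameters are addressed as <step name>__<parameter>
grid = GridSearchCV(pipe, {"decisiontreeclassifier__max_depth": [2, 4, 6, 8]}, cv=10).fit(X, y)
print(grid.best_params_)
```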

Feature extraction

I use sklearn's PCA.

# Feature extraction
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn import tree

pca = PCA(n_components=4)  # n_components: number of new features to keep (4 here)
pca_data = pca.fit_transform(data)

# Decision tree on the PCA features
# test_size=0.3 puts 30% of the samples in the test set; stratify keeps the class ratio
x_train2, x_test2, y_train2, y_test2 = train_test_split(pca_data, target, test_size=0.3, shuffle=True,
                                                        random_state=666, stratify=target)
clf2 = tree.DecisionTreeClassifier(criterion='entropy', random_state=36, splitter='random',
                                   max_depth=7, min_samples_leaf=20, min_samples_split=20)
clf2 = clf2.fit(x_train2, y_train2)
clf2_result = clf2.score(x_test2, y_test2)
print(clf2_result)

Feature extraction did not improve accuracy on this dataset.
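To judge how much information the 4 PCA components keep, you can inspect explained_variance_ratio_, or pass n_components as the fraction of variance to retain. The sketch uses synthetic, standardized data in place of the raisin features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7 standardized raisin features (assumption)
X, _ = make_classification(n_samples=900, n_features=7, n_informative=3,
                           n_redundant=2, random_state=0)
X = StandardScaler().fit_transform(X)

pca = PCA(n_components=4).fit(X)
ratios = pca.explained_variance_ratio_  # share of total variance per component
print(ratios, ratios.sum())

# Let PCA pick the smallest number of components that keeps 95% of the variance
pca95 = PCA(n_components=0.95).fit(X)
print(pca95.n_components_)
```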

Random forest

Basic random forest usage

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc = rfc.fit(x_train, y_train)
Forest_result = rfc.score(x_test, y_test)
Forest_result

With default settings it already reaches 0.86 accuracy.

10-fold cross-validation

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rfc = RandomForestClassifier(n_estimators=20)
rfc_yan = cross_val_score(rfc, data, target, cv=10).mean()  # mean accuracy over the 10 folds
rfc_yan

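Tuning

n_estimators is the number of trees in the forest, i.e. the number of base estimators. Its effect on accuracy is broadly monotonic: the larger n_estimators, the better the model tends to be. But every model has a performance ceiling. Once n_estimators reaches a certain size, the forest's accuracy usually stops rising or starts to fluctuate. A larger n_estimators also needs more computation and memory, and training takes longer. For this parameter the goal is a balance between training cost and model quality.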
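GridSearchCV is not the only way to compare n_estimators values. A cheaper option is the out-of-bag (OOB) score. Each tree is trained on a bootstrap sample, and every sample is scored only by the trees that did not see it. The sketch below uses synthetic data, and the candidate values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=900, n_features=7, random_state=0)

oob = {}
for n in (25, 50, 100, 200):
    # oob_score=True evaluates each sample with the trees whose bootstrap sample left it out
    rf = RandomForestClassifier(n_estimators=n, oob_score=True, random_state=0).fit(X, y)
    oob[n] = rf.oob_score_
print(oob)
```

Because no separate folds are refit, this costs one fit per candidate instead of ten.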

Tuning n_estimators

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'n_estimators': np.arange(1, 200, 10)}]
rfc1 = RandomForestClassifier()
rfc_GS = GridSearchCV(rfc1, param_grid, cv=10)
rfc_GS.fit(data, target)
print(rfc_GS.best_params_)
print(rfc_GS.best_score_)
print(rfc_GS.best_estimator_)

The best value is n_estimators=71. Next, tune the depth:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'max_depth': np.arange(1, 30, 1)}]
rfc1 = RandomForestClassifier(n_estimators=71)
rfc_GS = GridSearchCV(rfc1, param_grid, cv=10)
rfc_GS.fit(data, target)
print(rfc_GS.best_params_)
print(rfc_GS.best_score_)
print(rfc_GS.best_estimator_)

Tuning max_leaf_nodes

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'max_leaf_nodes': np.arange(25, 50, 1)}]
rfc1 = RandomForestClassifier(max_depth=4, n_estimators=71)
rfc_GS = GridSearchCV(rfc1, param_grid, cv=10)
rfc_GS.fit(data, target)
print(rfc_GS.best_params_)
print(rfc_GS.best_score_)
print(rfc_GS.best_estimator_)

Tuning random_state

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'random_state': np.arange(1, 300, 10)}]
rfc1 = RandomForestClassifier(max_depth=4, max_leaf_nodes=40, n_estimators=71)
rfc_GS = GridSearchCV(rfc1, param_grid, cv=10)
rfc_GS.fit(data, target)
print(rfc_GS.best_params_)
print(rfc_GS.best_score_)
print(rfc_GS.best_estimator_)

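Many more parameters can be tuned the same way; I won't show them all.

Overall, the random forest reaches about 0.876 accuracy, better than the single decision tree.

Logistic regression

Basic logistic regression usage

from sklearn.linear_model import LogisticRegression

lrl1 = LogisticRegression(penalty="l2", solver="liblinear", C=0.5, max_iter=1000)
lrl1 = lrl1.fit(x_train, y_train)
print(lrl1.coef_)  # one weight per feature
lrl1_score = lrl1.score(x_test, y_test)
print(lrl1_score)

from sklearn.linear_model import LogisticRegressionCV

# LogisticRegressionCV chooses C by internal 10-fold cross-validation on the training set
lrl2 = LogisticRegressionCV(cv=10, penalty="l2", solver="liblinear", max_iter=1000, random_state=0)
lrl2 = lrl2.fit(x_train, y_train)
print(lrl2.coef_)
lrl2_score = lrl2.score(x_test, y_test)
print(lrl2_score)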

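Tuning

1. penalty

Pass "l1" or "l2" to choose the type of regularization; the default is "l2".

Note: with "l1", only the "liblinear" and "saga" solvers can be used. With "l2", every solver works.

2. C

C is the inverse of the regularization strength and must be a positive float; the default is 1.0. The smaller C is, the heavier the penalty added to the loss function and the stronger the regularization.

3. Cs (LogisticRegressionCV only): a list of floats or an int, default 10. A list gives the C values to try. An int n tries n values on a logarithmic grid between 1e-4 and 1e4.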

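4. solver: the algorithm used to optimize the logistic regression loss. The options are:

solver: {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'

'newton-cg': a Newton conjugate-gradient method; 'lbfgs': a limited-memory quasi-Newton method; 'liblinear': coordinate descent from the LIBLINEAR library; 'sag': stochastic average gradient descent; 'saga': a variant of 'sag' that also supports L1. 'newton-cg', 'lbfgs' and 'sag' only support the L2 penalty, while 'liblinear' supports both L1 and L2. The L1-regularized loss is not continuously differentiable, and 'newton-cg', 'lbfgs' and 'sag' all need continuous first or second derivatives of the loss. 'liblinear' has no such requirement.

5. n_jobs: int or None, optional (default None). Number of CPU cores used during the cross-validation loop.

6. random_state: int, RandomState instance or None, optional (default None). If int, it is the seed used by the random number generator. If a RandomState instance, it is the random number generator itself. If None, the RandomState instance used by np.random is used.

7. max_iter: int, default=100. Maximum number of iterations of the optimization algorithm.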
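The sketch below checks two points from the list above on synthetic data. First, it tests which solvers accept penalty='l1', since incompatible combinations raise a ValueError when fit is called. Second, it shows how a smaller C shrinks the coefficients. The C values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=900, n_features=7, random_state=0)

# Which solvers accept an L1 penalty?
l1_ok = []
for solver in ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]:
    try:
        LogisticRegression(penalty="l1", solver=solver, max_iter=5000).fit(X, y)
        l1_ok.append(solver)
    except ValueError:
        pass  # this solver rejects the L1 penalty
print(l1_ok)

# Smaller C means stronger L2 regularization and therefore smaller coefficients
norms = {}
for C in [0.01, 1.0, 100.0]:
    lr = LogisticRegression(penalty="l2", C=C, max_iter=1000).fit(X, y)
    norms[C] = float(np.linalg.norm(lr.coef_))
print(norms)
```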

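Choosing penalty:

from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

# The default solver 'lbfgs' only supports 'l2'; 'saga' supports all three penalties,
# and 'elasticnet' also needs l1_ratios (0.5 here is an illustrative value)
param_grid = [{'penalty': ['l1', 'l2', 'elasticnet']}]
lrl3 = LogisticRegressionCV(solver='saga', l1_ratios=[0.5], max_iter=5000)
lrl_GS = GridSearchCV(lrl3, param_grid, cv=10)
lrl_GS.fit(data, target)
print(lrl_GS.best_params_)
print(lrl_GS.best_score_)
print(lrl_GS.best_estimator_)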

Tuning solver

from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

param_grid = [{'solver': ['liblinear', 'newton-cg', 'lbfgs', 'sag', 'saga']}]
lrl3 = LogisticRegressionCV(penalty='l2')
lrl_GS = GridSearchCV(lrl3, param_grid, cv=10)
lrl_GS.fit(data, target)
print(lrl_GS.best_params_)
print(lrl_GS.best_score_)
print(lrl_GS.best_estimator_)

Tuning max_iter

import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

param_grid = [{'max_iter': np.arange(1, 300, 10)}]
lrl3 = LogisticRegressionCV(penalty='l2', solver='sag')
lrl_GS = GridSearchCV(lrl3, param_grid, cv=10)
lrl_GS.fit(data, target)
print(lrl_GS.best_params_)
print(lrl_GS.best_score_)
print(lrl_GS.best_estimator_)

Tuning random_state

import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

param_grid = [{'random_state': np.arange(1, 300, 10)}]
lrl3 = LogisticRegressionCV(penalty='l2', solver='sag', max_iter=111)
lrl_GS = GridSearchCV(lrl3, param_grid, cv=10)
lrl_GS.fit(data, target)
print(lrl_GS.best_params_)
print(lrl_GS.best_score_)
print(lrl_GS.best_estimator_)

Logistic regression reaches at most 0.847.

KNN

Basic KNN usage

from sklearn.neighbors import KNeighborsClassifier

knn1 = KNeighborsClassifier()
knn1 = knn1.fit(x_train, y_train)
knn1_score = knn1.score(x_test, y_test)  # accuracy of the KNN model on the test set
print(knn1_score)

Tuning

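1. n_neighbors: int, optional (default = 5)

The number of neighbors used by default for kneighbors queries, i.e. the k in k-NN: the k nearest points are used.

2. weights: {'uniform', 'distance'} or callable, default='uniform'

The default is 'uniform'. The value can be 'uniform', 'distance' or a user-defined function. With 'uniform', all neighbors have equal weight. With 'distance', points are weighted by the inverse of their distance, so closer neighbors have more influence than distant ones.

3. algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'

The algorithm used for the nearest-neighbor search. The default 'auto' lets sklearn choose a suitable one. You can also specify ball_tree, kd_tree or brute yourself. brute is brute-force search, i.e. a linear scan, which is very slow when the training set is large. kd_tree builds a k-d tree, a tree-shaped data structure for fast lookups. A k-d tree is a binary tree built by splitting at the median, each node is a hyperrectangle, and it is efficient when there are fewer than about 20 dimensions. The ball tree was invented to overcome the k-d tree's loss of efficiency in high dimensions. It partitions the sample space by a centroid C and a radius r, so each node is a hypersphere.

4. leaf_size: int, default=30

The leaf size passed to the k-d tree and the ball tree. It affects how fast the tree is built and queried, as well as the memory needed to store it. The best value depends on the nature of the problem.

5. p: int, default=2

The power parameter of the Minkowski distance: p=1 is the Manhattan distance, p=2 the Euclidean distance.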
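The voting rule behind n_neighbors and weights, and the Minkowski distance behind p, fit in a few lines. This brute-force sketch uses the hypothetical helpers minkowski and knn_predict and checks them against KNeighborsClassifier on synthetic data. It ignores the tie-breaking details of the real implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def minkowski(a, b, p=2):
    """Minkowski distance (sum |a_i - b_i|^p)^(1/p); p=1 Manhattan, p=2 Euclidean."""
    diff = np.abs(np.asarray(a, float) - np.asarray(b, float))
    return float(np.sum(diff ** p) ** (1.0 / p))

print(minkowski([0, 0], [3, 4], p=1), minkowski([0, 0], [3, 4], p=2))  # 7.0 5.0

def knn_predict(X_train, y_train, X_test, k=5, weights="uniform", p=2):
    """Brute-force k-NN: weighted vote among the k nearest training points."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        dist = np.array([minkowski(x, row, p) for row in X_train])
        idx = np.argsort(dist)[:k]  # indices of the k nearest neighbours
        # 'uniform': one vote each; 'distance': inverse-distance weights
        w = np.ones(k) if weights == "uniform" else 1.0 / dist[idx]
        votes = [w[y_train[idx] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(votes))])
    return np.array(preds)

X, y = make_classification(n_samples=200, n_features=7, random_state=0)
X_tr, X_te, y_tr = X[:150], X[150:], y[:150]
agreement = {}
for w in ("uniform", "distance"):
    ref = KNeighborsClassifier(n_neighbors=5, weights=w).fit(X_tr, y_tr).predict(X_te)
    agreement[w] = float((knn_predict(X_tr, y_tr, X_te, k=5, weights=w) == ref).mean())
print(agreement)
```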

Tuning n_neighbors

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'n_neighbors': np.arange(1, 50, 1)}]
knn1 = KNeighborsClassifier()
knn_GS = GridSearchCV(knn1, param_grid, cv=10)
knn_GS = knn_GS.fit(data, target)
print(knn_GS.best_params_)
print(knn_GS.best_score_)
print(knn_GS.best_estimator_)

Tuning weights

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'weights': ['uniform', 'distance']}]
knn1 = KNeighborsClassifier(n_neighbors=15)
knn_GS = GridSearchCV(knn1, param_grid, cv=10)
knn_GS = knn_GS.fit(data, target)
print(knn_GS.best_params_)
print(knn_GS.best_score_)
print(knn_GS.best_estimator_)

Tuning algorithm

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

param_grid = [{'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']}]
knn1 = KNeighborsClassifier(n_neighbors=15, weights='uniform')
knn_GS = GridSearchCV(knn1, param_grid, cv=10)
knn_GS = knn_GS.fit(data, target)
print(knn_GS.best_params_)
print(knn_GS.best_score_)
print(knn_GS.best_estimator_)

Gaussian naive Bayes

Basic Gaussian naive Bayes usage

from sklearn.naive_bayes import GaussianNB

gnb1 = GaussianNB()
gnb1 = gnb1.fit(x_train, y_train)
gnb1_score = gnb1.score(x_test, y_test)
print(gnb1_score)
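Gaussian naive Bayes models every feature, within each class, as an independent normal distribution and picks the class with the highest posterior. The from-scratch sketch below uses a hypothetical helper, gaussian_nb_predict, and mimics GaussianNB's default var_smoothing=1e-9, which adds that fraction of the largest feature variance to every variance. It compares the result with sklearn on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

def gaussian_nb_predict(X_train, y_train, X_test, var_smoothing=1e-9):
    """Gaussian naive Bayes: one normal distribution per class and feature."""
    classes = np.unique(y_train)
    eps = var_smoothing * X_train.var(axis=0).max()  # variance smoothing term
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        mean, var = Xc.mean(axis=0), Xc.var(axis=0) + eps
        log_prior = np.log(len(Xc) / len(X_train))
        # Independence assumption: the log-likelihood is a sum of per-feature log densities
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X_test - mean) ** 2 / var, axis=1))
        log_post.append(log_prior + log_lik)
    return classes[np.argmax(np.column_stack(log_post), axis=1)]

X, y = make_classification(n_samples=900, n_features=7, random_state=0)
mine = gaussian_nb_predict(X, y, X)
ref = GaussianNB().fit(X, y).predict(X)
print((mine == ref).mean())
```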

Support vector machine (SVM)

Basic SVM usage

from sklearn import svm

svm1 = svm.SVC()
svm1 = svm1.fit(x_train, y_train)
svm1_score = svm1.score(x_test, y_test)
print(svm1_score)

Tuning

The parameter descriptions mainly follow this translation:

/TeFuirnever/article/details/99646257

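1. C: float, optional (default=1.0)

Penalty parameter C of the error term. A larger C penalizes the slack variables more, pushing them toward 0. Misclassification is punished more heavily and the model tries to classify the whole training set correctly, which gives high training accuracy but weaker generalization. A smaller C punishes misclassification less and tolerates errors by treating those points as noise, so the model generalizes better.

2. kernel: string, optional (default='rbf')

'linear': linear kernel

'poly': polynomial kernel

'rbf': radial basis function (Gaussian) kernel

'sigmoid': sigmoid kernel

'precomputed': a precomputed kernel matrix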

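3. degree: int, optional (default=3)

Degree n of the polynomial kernel. It is only used by the 'poly' kernel and ignored by all other kernels.

4. gamma: {'scale', 'auto'} or float, optional (default='scale')

Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. With 'auto', gamma is the inverse of the number of features, 1/n_features. With 'scale', the default since scikit-learn 0.22 (older versions defaulted to 'auto'), it is 1/(n_features * X.var()).

5. coef0: float, optional (default=0.0)

Independent term in the kernel function, i.e. the constant c. It only matters for the 'poly' and 'sigmoid' kernels.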

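6. shrinking: bool, optional (default=True)

Whether to use the shrinking heuristic.

7. probability: bool, optional (default=False)

Whether to enable probability estimates. It must be enabled before calling fit() and makes fit() slower.

8. tol: float, optional (default=1e-3)

Tolerance for the stopping criterion of the SVM training.

9. cache_size: float, optional (default=200)

Size of the kernel cache used during training, in MB.

10. class_weight: {dict, 'balanced'}, optional

Class weights, a dict or the string 'balanced'; the default is None. The weight scales the penalty parameter C separately for each class. If it is not given, every class has weight 1, i.e. all classes use the C above. With 'balanced', the values of y are used to set weights inversely proportional to the class frequencies in the input data.

11. verbose: bool, default False

Enable verbose output. This uses a per-process runtime setting in libsvm that may not work properly in a multithreaded context if enabled. Normally leave it False.

12. max_iter: int, optional (default=-1)

Hard limit on the number of iterations; -1 means no limit.

13. decision_function_shape: 'ovo' or 'ovr', default='ovr'

Shape of the decision function: 'ovo' means one-vs-one, 'ovr' means one-vs-rest.

14. random_state: int, RandomState instance or None, optional (default=None)

Seed of the pseudo-random number generator used to shuffle the data for probability estimates.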
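The sketch below shows what the gamma settings resolve to for the RBF kernel, K(x, z) = exp(-gamma * ||x - z||^2), on synthetic data. It computes the kernel value by hand and with sklearn.metrics.pairwise.rbf_kernel. It also compares an SVC with gamma='scale' against one given the same number explicitly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=7, random_state=0)

# gamma='scale' means 1 / (n_features * X.var()); gamma='auto' means 1 / n_features
gamma_scale = 1.0 / (X.shape[1] * X.var())
gamma_auto = 1.0 / X.shape[1]
print(gamma_scale, gamma_auto)

# RBF kernel between the first two samples, by hand and with sklearn
manual = np.exp(-gamma_scale * np.sum((X[0] - X[1]) ** 2))
library = rbf_kernel(X[:1], X[1:2], gamma=gamma_scale)[0, 0]
print(manual, library)

# gamma='scale' should behave exactly like passing the computed value
svc_scale = SVC(kernel="rbf", gamma="scale").fit(X, y)
svc_explicit = SVC(kernel="rbf", gamma=gamma_scale).fit(X, y)
same = np.allclose(svc_scale.decision_function(X), svc_explicit.decision_function(X))
print(same)
```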

Tuning C

import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV

param_grid = [{'C': np.arange(1, 60, 2)}]
svm1 = svm.SVC()
svm_GS = GridSearchCV(svm1, param_grid, cv=10)
svm_GS = svm_GS.fit(data, target)
print(svm_GS.best_params_)
print(svm_GS.best_score_)
print(svm_GS.best_estimator_)

Tuning kernel

from sklearn import svm
from sklearn.model_selection import GridSearchCV

param_grid = [{'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}]
svm1 = svm.SVC(C=59)
svm_GS = GridSearchCV(svm1, param_grid, cv=10)
svm_GS = svm_GS.fit(data, target)
print(svm_GS.best_params_)
print(svm_GS.best_score_)
print(svm_GS.best_estimator_)

Tuning degree
