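1. Where does the name "logistic regression" come from, and why is it called "logistic" regression?
2. Is logistic regression a linear regression method or a classification method?
3. Compared with the decision tree and Bayesian methods covered in Chapter 4, how does logistic regression differ in terms of the sample set?
4. How to write a logistic regression program.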
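As background for questions 1 and 2, the following is a minimal sketch of the logistic (sigmoid) function that gives the method its name. It shows how a linear score is squashed into a probability between 0 and 1 and then thresholded into a class label, which is why the model is normally used as a classifier even though it is called a regression. The helper names `sigmoid` and `predict_label`, and the example weights, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(weights, bias, x, threshold=0.5):
    """Hypothetical helper: linear score -> probability -> class label."""
    # Linear part, the same form as in linear regression
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    # Logistic part, turns the score into P(class = 1)
    p = sigmoid(z)
    return (1 if p >= threshold else 0), p

# Illustrative weights and input, not fitted to any data
label, prob = predict_label([1.0, -1.0], 0.0, [2.0, 0.5])
print(label, round(prob, 3))
```

The linear part is what the model has in common with linear regression; the sigmoid and the threshold are what make it a classification method.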
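The data uses the iris dataset as an example. First load and preprocess the data: keep the two classes setosa and virginica, convert the labels to 0 and 1, and prepare the data for logistic regression.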
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from sklearn.model_selection import train_test_split
import seaborn as sns
iris=pd.read_csv('iris.csv')
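# Keep only the two classes; .copy() avoids a SettingWithCopyWarning on the assignment below
iris=iris[(iris['Species']=='setosa') | (iris['Species']=='virginica')].copy()
print(iris.head(5))
# Encode the labels as integers: setosa -> 0, virginica -> 1
iris['Species'] =iris['Species'].map({'setosa': 0, 'virginica': 1})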
print(iris.tail(5))
X=iris[['Sepal.Length','Sepal.Width','Petal.Length','Petal.Width' ]].values
Y=iris['Species'].values
print(X[0:5])
print(Y[0:5])
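If the file `iris.csv` is not at hand, the same two-class data can probably be built from the copy of the iris dataset bundled with scikit-learn. This is a sketch under that assumption; note that the bundled column names (for example `sepal length (cm)`) differ from the `Sepal.Length` style names in the CSV, and the `label` column is a name introduced here for illustration.

```python
from sklearn.datasets import load_iris

# Load the bundled iris data as a pandas DataFrame (150 rows, 3 species)
bunch = load_iris(as_frame=True)
df = bunch.frame.copy()

# Turn the numeric target (0, 1, 2) into species names
df['Species'] = df['target'].map(dict(enumerate(bunch.target_names)))

# Keep setosa and virginica only, then encode them as 0 and 1
df = df[df['Species'].isin(['setosa', 'virginica'])].copy()
df['label'] = df['Species'].map({'setosa': 0, 'virginica': 1})

X = df[bunch.feature_names].values   # four feature columns
Y = df['label'].values               # 0 = setosa, 1 = virginica
print(X.shape)
print(Y[:5], Y[-5:])
```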
Split the dataset into a training set and a test set (7:3 ratio), then standardize the features:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1, stratify=Y)
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
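The split and standardization step can be checked with a short sketch like the one below. It assumes the bundled scikit-learn iris data instead of `iris.csv`. It illustrates two points: `stratify=Y` keeps the two classes balanced in both subsets, and the scaler is fitted on the training set only, so the training features end up with mean 0 and standard deviation 1 while the test features are merely transformed with the same parameters.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled iris data; class 1 (versicolor) is dropped
data = load_iris()
mask = data.target != 1
X = data.data[mask]
Y = (data.target[mask] == 2).astype(int)   # setosa -> 0, virginica -> 1

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

# Fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

print(np.bincount(Y_train), np.bincount(Y_test))   # class counts per subset
print(X_train_std.mean(axis=0).round(6))           # approximately 0
print(X_train_std.std(axis=0).round(6))            # approximately 1
```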
Train the model:
# multi_class is omitted: the task is binary, and the parameter is deprecated in recent scikit-learn releases
lr = LogisticRegression(C=100.0, random_state=1, solver='lbfgs')
lr.fit(X_train_std, Y_train)
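The parameter `C` is the inverse of the regularization strength, so `C=100.0` means weak regularization. The sketch below illustrates this by fitting the model with several values of `C` and printing the norm of the coefficient vector. It assumes the bundled scikit-learn iris data and, for brevity, fits on all 100 samples rather than on a training split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: bundled iris data, setosa (0) versus virginica (1)
data = load_iris()
mask = data.target != 1
X = StandardScaler().fit_transform(data.data[mask])
Y = (data.target[mask] == 2).astype(int)

norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, solver='lbfgs').fit(X, Y)
    # A larger C penalizes the weights less, so they are allowed to grow
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: ||w|| = {norms[C]:.3f}")
```

The coefficient norm should grow with `C`; very large weights on a small dataset are one sign that the model may be fitting the training data too closely.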
Model prediction: predict on the test set features to obtain the predicted labels:
Y_predict = lr.predict(X_test_std)
print(Y_predict)
print(Y_test)
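To see where the predicted labels come from, the following sketch compares `predict` with `predict_proba` and `decision_function`. It assumes the bundled scikit-learn iris data and rebuilds the pipeline so that it runs on its own. In the binary case the class-1 probability is the sigmoid of the linear score, and the predicted label is 1 whenever that probability exceeds 0.5.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled iris data, setosa (0) versus virginica (1)
data = load_iris()
mask = data.target != 1
X = data.data[mask]
Y = (data.target[mask] == 2).astype(int)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

sc = StandardScaler().fit(X_train)
lr = LogisticRegression(C=100.0, solver='lbfgs').fit(sc.transform(X_train), Y_train)

X_test_std = sc.transform(X_test)
scores = lr.decision_function(X_test_std)   # linear score w.x + b
proba = lr.predict_proba(X_test_std)        # columns: P(class 0), P(class 1)
sigmoid = 1.0 / (1.0 + np.exp(-scores))     # logistic function of the score
Y_predict = lr.predict(X_test_std)

print(proba[:5].round(3))
print(Y_predict[:5])
```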
Model evaluation with a confusion matrix:
matrix_of_confusion = metrics.confusion_matrix(Y_test, Y_predict)
fig, ax = plt.subplots(figsize = (10, 6))
sns.heatmap(matrix_of_confusion, annot=True, fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual class')
plt.xlabel('Predicted class')
plt.show()
Model evaluation (precision, recall, F1, accuracy):
print("Logistic regression recall: %.3f" % metrics.recall_score(Y_test, Y_predict))
print("Logistic regression precision: %.3f" % metrics.precision_score(Y_test, Y_predict))
print("Logistic regression F1: %.3f" % metrics.f1_score(Y_test, Y_predict))
print("Logistic regression accuracy: %.3f" % metrics.accuracy_score(Y_test, Y_predict))
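The four scores can all be derived from the confusion matrix. The sketch below uses a small hand-made pair of label lists (illustrative values, not results from the iris model) to show how precision, recall, F1 and accuracy follow from the counts of true negatives, false positives, false negatives and true positives.

```python
from sklearn import metrics

# Hand-made labels for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)                      # correct share of predicted positives
recall = tp / (tp + fn)                         # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tn, fp, fn, tp)
print("precision=%.3f recall=%.3f f1=%.3f accuracy=%.3f"
      % (precision, recall, f1, accuracy))
```

The values computed by hand should agree with `metrics.precision_score`, `metrics.recall_score`, `metrics.f1_score` and `metrics.accuracy_score` on the same inputs.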