
Machine Learning: Linear Regression with the Least Squares Method and Gradient Descent



I have just started learning linear regression in machine learning, and this is also my first time working with Python. I put together the notes below from material found online plus my own understanding; I hope they help you.

1. Linear Regression

Linear regression fits a function that is a linear combination of the inputs, so that it matches all the data points as closely as possible.

Simple (one-variable) linear regression: e.g. y = wx + b; find the straight line that minimizes the total distance from all sample points to the line. Multiple linear regression: e.g. y = θ1x1 + θ2x2 + θ3x3 + ... + θ0; find a hyperplane that fits the sample points. For compactness of notation and computation, this is written in matrix form as y = Xθ + ε.

How do we judge whether a model fits the observations best? The model whose sum of squared errors is smallest is taken to be the one closest to the true relationship.
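As a tiny illustration of that criterion (the numbers and candidate parameters here are made up for the example, not taken from the data used later), the candidate line with the smaller mean squared error is the better fit:

import numpy as np

# toy data: three points that lie roughly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2])

def mse(w, b):
    # mean of the squared residuals between the observations and the line w*x + b
    return np.mean((y - (w * x + b)) ** 2)

print(mse(2.0, 1.0))  # close to the data, so the error is small
print(mse(1.0, 0.0))  # a poor fit, so the error is much larger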

2. Solution Methods

Method 1: The Least Squares Method

For a detailed derivation of the least-squares formulas, see: 一文让你彻底搞懂最小二乘法(超详细推导) (胤风's CSDN blog): /MoreAction_/article/details/106443383

Example 1: compute w and b of the line y = wx + b from the closed-form formulas, then evaluate the error with the error function (the mean of the squared losses).
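For reference, the closed-form estimates that the fit function in the code below implements (the standard simple-linear-regression least-squares formulas, with M samples and x̄ the mean of the xᵢ) are:

w = Σ yᵢ(xᵢ − x̄) / (Σ xᵢ² − M·x̄²)
b = (1/M) Σ (yᵢ − w·xᵢ)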

import numpy as np
import matplotlib.pyplot as plt

# ---------------- 1. Prepare the data ----------------
data = np.array([[32, 31], [53, 68], [61, 62], [47, 71], [59, 87], [55, 78], [52, 79], [39, 59], [48, 75], [52, 71],
                 [45, 55], [54, 82], [44, 62], [58, 75], [56, 81], [48, 60], [44, 82], [60, 97], [45, 48], [38, 56],
                 [66, 83], [65, 118], [47, 57], [41, 51], [51, 75], [59, 74], [57, 95], [63, 95], [46, 79], [50, 83]])
# Extract the two columns of data as x and y
x = data[:, 0]
y = data[:, 1]
# Draw the scatter plot with matplotlib
# plt.scatter(x, y)
# plt.show()

# ---------------- 2. Define the loss function ----------------
# The loss is a function of the coefficients; the data points are passed in as well
def compute_cost(w, b, points):
    total_cost = 0
    M = len(points)
    # Accumulate the squared error point by point, then take the mean
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += (y - w * x - b) ** 2
    return total_cost / M

# ---------------- 3. Define the fitting functions ----------------
# First, a helper that computes the mean of a sequence
def average(data):
    sum = 0
    num = len(data)
    for i in range(num):
        sum += data[i]
    return sum / num

# The core fitting function (closed-form least squares)
def fit(points):
    M = len(points)
    x_bar = average(points[:, 0])
    sum_yx = 0
    sum_x2 = 0
    sum_delta = 0
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        sum_yx += y * (x - x_bar)
        sum_x2 += x ** 2
    # Compute w from the closed-form formula
    w = sum_yx / (sum_x2 - M * (x_bar ** 2))
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        sum_delta += (y - w * x)
    b = sum_delta / M
    return w, b

# ---------------- 4. Test ----------------
w, b = fit(data)
print("w is: ", w)
print("b is: ", b)
cost = compute_cost(w, b, data)
print("cost is: ", cost)

# ---------------- 5. Plot the fitted line ----------------
plt.scatter(x, y)
# Compute the predicted y for every x
pred_y = w * x + b
plt.plot(x, pred_y, c='r')
plt.show()
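As a quick sanity check (this check is my addition, not part of the original post; np.polyfit is NumPy's built-in polynomial least-squares fit), the w and b printed above can be compared against NumPy's own estimate on the same data array:

import numpy as np
# `data` is the array defined in the script above
w_np, b_np = np.polyfit(data[:, 0], data[:, 1], deg=1)
print(w_np, b_np)  # should agree with the w and b printed by fit(data)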

Method 2: Gradient Descent

There are three main variants of gradient descent (a minimal sketch of their update rules follows the list):

1. Batch Gradient Descent (BGD)

2. Stochastic Gradient Descent (SGD)

3. Mini-batch Gradient Descent (a compromise between BGD and SGD)
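A minimal sketch of how the three variants differ in how much data each parameter update consumes (illustrative only; the function and variable names are mine, and a fixed learning rate lr is assumed):

import numpy as np

def bgd_step(X, y, theta, lr):
    # batch GD: one update per pass, gradient computed from ALL samples
    grad = X.T @ (X @ theta - y) / len(y)
    return theta - lr * grad

def sgd_step(X, y, theta, lr):
    # stochastic GD: one update from a single randomly chosen sample
    i = np.random.randint(len(y))
    xi, yi = X[i:i + 1], y[i:i + 1]
    return theta - lr * xi.T @ (xi @ theta - yi)

def minibatch_step(X, y, theta, lr, batch_size=32):
    # mini-batch GD: one update from a small random subset of samples
    idx = np.random.choice(len(y), size=min(batch_size, len(y)), replace=False)
    Xb, yb = X[idx], y[idx]
    return theta - lr * Xb.T @ (Xb @ theta - yb) / len(yb)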

2.1 Comparing least squares and gradient descent

Example 1: the data from Example 1 of the least-squares section, fitted again with (batch) gradient descent in Python. The gradients used by the code are given first, then the code:
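For reference, with the mean squared error L(w, b) = (1/M) Σ (yᵢ − w·xᵢ − b)², the gradients that step_grad_desc in the code below computes are:

∂L/∂w = (2/M) Σ (w·xᵢ + b − yᵢ)·xᵢ
∂L/∂b = (2/M) Σ (w·xᵢ + b − yᵢ)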

import numpy as np
import matplotlib.pyplot as plt

# ---------------- 1. Prepare the data ----------------
data = np.array([[32, 31], [53, 68], [61, 62], [47, 71], [59, 87], [55, 78], [52, 79], [39, 59], [48, 75], [52, 71],
                 [45, 55], [54, 82], [44, 62], [58, 75], [56, 81], [48, 60], [44, 82], [60, 97], [45, 48], [38, 56],
                 [66, 83], [65, 118], [47, 57], [41, 51], [51, 75], [59, 74], [57, 95], [63, 95], [46, 79],
                 [50, 83]])
x = data[:, 0]
y = data[:, 1]

# ---------------- 2. Define the loss function ----------------
def compute_cost(w, b, data):
    total_cost = 0
    M = len(data)
    # Accumulate the squared error point by point, then take the mean
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        total_cost += (y - w * x - b) ** 2
    return total_cost / M

# ---------------- 3. Define the hyperparameters ----------------
alpha = 0.0001
initial_w = 0
initial_b = 0
num_iter = 10

# ---------------- 4. Define the core gradient descent functions ----------------
def grad_desc(data, initial_w, initial_b, alpha, num_iter):
    w = initial_w
    b = initial_b
    # Keep a list of all loss values to visualize the descent
    cost_list = []
    for i in range(num_iter):
        cost_list.append(compute_cost(w, b, data))
        w, b = step_grad_desc(w, b, alpha, data)
    return [w, b, cost_list]

def step_grad_desc(current_w, current_b, alpha, data):
    sum_grad_w = 0
    sum_grad_b = 0
    M = len(data)
    # Plug each point into the gradient formulas and sum
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        sum_grad_w += (current_w * x + current_b - y) * x
        sum_grad_b += current_w * x + current_b - y
    # Current gradients
    grad_w = 2 / M * sum_grad_w
    grad_b = 2 / M * sum_grad_b
    # Gradient descent update of w and b
    updated_w = current_w - alpha * grad_w
    updated_b = current_b - alpha * grad_b
    return updated_w, updated_b

# ---------------- 5. Test: run gradient descent to find the best w and b ----------------
w, b, cost_list = grad_desc(data, initial_w, initial_b, alpha, num_iter)
print("w is: ", w)
print("b is: ", b)
cost = compute_cost(w, b, data)
print("cost is: ", cost)
# plt.plot(cost_list)
# plt.show()

# ---------------- 6. Plot the fitted line ----------------
plt.scatter(x, y)
# Compute the predicted y for every x
pred_y = w * x + b
plt.plot(x, pred_y, c='r')
plt.show()

2.2 Comparing BGD and SGD

(Text version, which is more detailed; a table version would only be a brief summary of the same points.)

① Gradient computation:

BGD uses the entire dataset in every iteration: each update sums the gradients of the cost function over all samples, so each iteration is slow.

SGD randomly picks a single sample in each iteration and computes the gradient of the cost function on that one sample, so each iteration is fast.

② Computational cost:

BGD is slow and computationally expensive; SGD is computationally much less expensive. For example, if the dataset contains 100 million samples, BGD uses all 100 million of them for every update, whereas SGD uses only one.

③ Update process:

BGD takes comparatively smooth steps; because of the randomness in its descent, SGD needs a higher number of iterations to reach the optimum, and its convergence path is noisier.

④ When each is suitable:

BGD suits small training sets and does not require the samples to be in random order; SGD suits large training sets and requires the sample order to be randomized (shuffled).

That said, BGD works very well on convex or relatively smooth error manifolds, and it scales as the number of features increases.

⑤ Solution quality:

Given enough time to converge, BGD can deliver the optimal solution; SGD, once it gets near the minimum, keeps bouncing around instead of settling, so it yields a good value for the model rather than the optimal one. This can be mitigated by gradually lowering the learning rate at each step (a minimal decay-schedule sketch follows this list).

BGD is more likely to get trapped in a local minimum; SGD is less likely to get trapped in a local minimum.
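One simple way to implement the "lower the learning rate over time" idea from point ⑤ (the inverse-time schedule and its constants are just one illustrative choice of mine, not from the original post):

def decayed_lr(initial_lr, epoch, decay=0.01):
    # inverse-time decay: the step size shrinks as training progresses,
    # so SGD bounces around the minimum less and less
    return initial_lr / (1.0 + decay * epoch)

# example: the rate falls from 0.1 at epoch 0 to 0.05 by epoch 100
for epoch in [0, 10, 100]:
    print(epoch, decayed_lr(0.1, epoch))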

3. Complete Code Examples

3.1 Linear regression with the normal equation, BGD, and SGD

"""
Created on Sun Oct 16 11:23:55
@author: imagine
"""
# %matplotlib inline  (uncomment when running in a Jupyter notebook)
# import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import warnings
warnings.filterwarnings("ignore")

num_features = 2
theta_ = np.array([1, 1])  # true parameters, a length-2 vector
num_samples = 100

# ---------------- Step 1: generate the data X, y ----------------
# X has shape (100, 2); y has shape (100,)
def generate_data(num_samples, num_features, scale, function):
    # range of x: [-0.5, 0.5]
    X = np.random.rand(num_samples, num_features) - 0.5
    # X @ theta is equivalent to np.matmul(X, theta)
    y = function(X) + np.random.normal(scale=scale, size=(num_samples))
    return X, y

def func1(X):
    return X @ theta_  # X is (100, 2), theta_ is (2,), so the result has shape (100,)

X, y = generate_data(num_samples, num_features, 0.2, func1)

# ---------------- Step 2: estimate theta with three methods ----------------
# normal equation; batch gradient descent (BGD); stochastic gradient descent (SGD)

# ---------------- Method 1: normal equation ----------------
def linear_regression(X, y):
    # closed-form solution obtained by setting the gradient to zero
    return np.linalg.inv(X.transpose() @ X) @ X.transpose() @ y

theta_linear_regression = linear_regression(X, y)
print('the estimated parameter using normal equation is {}'.format(theta_linear_regression))

# ---------------- Method 2: BGD ----------------
theta_linear_regression_2 = np.zeros(num_features)  # length-2 vector
max_epoch = 50
alpha = 0.01

# function to compute the least-squares error
def LSError(X, y, theta):
    return np.linalg.norm(np.matmul(X, theta) - y)

# implementation of batch gradient descent
def BGD(X, y, max_epoch, alpha, theta):
    for epoch in range(max_epoch):
        # stop early if the least-squares error is already small enough
        if LSError(X, y, theta) <= 1e-5:
            break
        # each epoch uses the WHOLE dataset to compute the gradient and update the parameter
        deriv = np.matmul(np.matmul(X.T, X), theta) - np.matmul(X.T, y)
        # update the parameter in place
        theta -= alpha * deriv
    return theta

BGD(X, y, max_epoch, alpha, theta_linear_regression_2)
print('the estimated parameter using BGD is {}'.format(theta_linear_regression_2))

# ---------------- Method 3: SGD ----------------
theta_linear_regression_3 = np.zeros(num_features).reshape(-1, 1)  # shape (2, 1)
max_epoch = 50
alpha = 0.01

def SGD(X, y, max_epoch, alpha, theta):
    y_col = y.reshape(-1, 1)  # column vector so the shapes match theta's (2, 1)
    for epoch in range(max_epoch):
        if LSError(X, y_col, theta) <= 1e-5:
            break
        # each epoch shuffles the data, then uses ONE sample at a time to update the parameter
        data1 = np.hstack((X, y_col))  # stack horizontally: the columns of X followed by y
        np.random.shuffle(data1)       # shuffle by rows; one row is one sample
        X_mini = data1[:, :-1]         # the first two columns
        y_mini = data1[:, -1].reshape(-1, 1)  # the last column
        for i in range(X.shape[0]):
            x_temp = X_mini[i, :].reshape(1, -1)
            y_temp = y_mini[i, :].reshape(1, -1)
            deriv = x_temp.T @ x_temp @ theta - x_temp.T @ y_temp
            # update the parameter in place
            theta -= alpha * deriv
    return theta

SGD(X, y, max_epoch, alpha, theta_linear_regression_3)
print('the estimated parameter using SGD is {}'.format(theta_linear_regression_3))

3.2 Python code for mini-batch gradient descent (MBGD)

# -*- coding: utf-8 -*-
"""
Created on Sat Oct 15 15:04:14
@author: imagine
"""
# ---------------- Step 1: import dependencies, generate data for linear regression and visualize it ----------------
import numpy as np
import matplotlib.pyplot as plt

# 1.1 create the data
mean = np.array([5.0, 6.0])
cov = np.array([[1.0, 0.95], [0.95, 1.2]])
data = np.random.multivariate_normal(mean, cov, 8000)  # 8000 examples, each with two attributes

# 1.2 visualize the data
plt.scatter(data[:500, 0], data[:500, 1], marker='.')
plt.show()

# 1.3 train-test split: prepend a bias column, then split into a training set
#     (X_train, y_train) of 7200 examples and a test set (X_test, y_test) of 800 examples
data = np.hstack((np.ones((data.shape[0], 1)), data))
split_factor = 0.90
split = int(split_factor * data.shape[0])
X_train = data[:split, :-1]
y_train = data[:split, -1].reshape((-1, 1))
X_test = data[split:, :-1]
y_test = data[split:, -1].reshape((-1, 1))
print('Number of examples in training set= ', (X_train.shape[0]))
print('Number of examples in testing set=', (X_test.shape[0]))

# linear regression using "mini-batch" gradient descent
# function to compute hypothesis / predictions
def hypothesis(X, theta):
    return np.dot(X, theta)

# function to compute gradient of error function w.r.t. theta
def gradient(X, y, theta):
    h = hypothesis(X, theta)
    grad = np.dot(X.transpose(), (h - y))
    return grad

# function to compute the error for current values of theta
def cost(X, y, theta):
    h = hypothesis(X, theta)
    J = np.dot((h - y).transpose(), (h - y))
    J /= 2
    return J[0]

# function to create a list containing mini-batches
def create_mini_batches(X, y, batch_size):
    mini_batches = []
    data = np.hstack((X, y))   # stack horizontally: the columns of X followed by y
    np.random.shuffle(data)    # shuffle by rows; one row is one example
    n_minibatches = data.shape[0] // batch_size  # // keeps only the integer part of the quotient
    for i in range(n_minibatches):  # range(n) iterates i = 0, 1, ..., n-1
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))  # reshape into one column; -1 lets NumPy infer the row count
        mini_batches.append((X_mini, Y_mini))        # append the (X_mini, Y_mini) pair to the end of the list
    if data.shape[0] % batch_size != 0:
        # the leftover examples that do not fill a whole batch
        mini_batch = data[n_minibatches * batch_size:, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    return mini_batches

# function to perform mini-batch gradient descent
def gradientDescent(X, y, learning_rate=0.001, batch_size=32):
    theta = np.zeros((X.shape[1], 1))
    error_list = []
    max_iters = 3
    for itr in range(max_iters):
        mini_batches = create_mini_batches(X, y, batch_size)
        for mini_batch in mini_batches:
            X_mini, y_mini = mini_batch
            theta = theta - learning_rate * gradient(X_mini, y_mini, theta)
            error_list.append(cost(X_mini, y_mini, theta))
    return theta, error_list

theta, error_list = gradientDescent(X_train, y_train)
print("Bias = ", theta[0])
print("Coefficients = ", theta[1:])

# visualising gradient descent
plt.plot(error_list)
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()

# predicting output for X_test
y_pred = hypothesis(X_test, theta)
plt.scatter(X_test[:, 1], y_test[:, ], marker='.')
plt.plot(X_test[:, 1], y_pred, color='orange')
plt.show()

# calculating error in predictions
error = np.sum(np.abs(y_test - y_pred) / y_test.shape[0])
print('Mean absolute error =', error)

References:

1. ML | Stochastic Gradient Descent (SGD) - GeeksforGeeks
2. Difference between Batch Gradient Descent and Stochastic Gradient Descent - GeeksforGeeks
3. 机器学习之线性回归算法Linear Regression(python代码实现) - 卷不动的程序猿, CSDN blog: /qq_41750911/article/details/124883520
4. 一文让你彻底搞懂最小二乘法(超详细推导) - 胤风, CSDN blog: /MoreAction_/article/details/106443383
