
Python scatter-plot regression: fitting a curve to the boundary of a scatter plot

Date: 2023-08-14 16:48:57

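I found the question really interesting, so I decided to give it a try. I can't say how Pythonic or natural it is, but I think I've found a more accurate way of fitting an edge to a dataset like yours, one that uses the information from every point.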

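First, let's generate some random data that looks like what you've shown. This part can easily be skipped; I'm posting it only to make the code complete and reproducible. I used two bivariate normal distributions to simulate the overdensities and sprinkled a layer of uniformly distributed random points on top of them. Then I added all of them to a curve similar to yours, cutting off everything that would fall below it. The plotting part of the code draws the resulting scatter plot.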

Here is the code that produces it:

import numpy as np

x_res = 1000
x_data = np.linspace(0, 2000, x_res)

# true parameters and a function that takes them
true_pars = [80, 70, -5]
model = lambda x, a, b, c: (a / np.sqrt(x + b) + c)
y_truth = model(x_data, *true_pars)

mu_prim, mu_sec = [1750, 0], [450, 1.5]
cov_prim = [[300**2,      0],
            [     0, 0.2**2]]
# covariance matrix of the second dist is trickier
cov_sec = [[200**2,     -1],
           [    -1, 1.0**2]]
prim = np.random.multivariate_normal(mu_prim, cov_prim, x_res*10).T
sec = np.random.multivariate_normal(mu_sec, cov_sec, x_res*1).T
uni = np.vstack([x_data, np.random.rand(x_res) * 7])

# censoring points that will end up below the curve (y offset <= 0);
# also dropping the few points at x < 0, outside the plotted range
# (far enough left, x + b < 0 and the square root would return NaN)
prim = prim[:, (prim[1] > 0) & (prim[0] >= 0)]
sec = sec[:, (sec[1] > 0) & (sec[0] >= 0)]

# rescaling to data: shift every point up by the true edge curve
for dset in [uni, sec, prim]:
    dset[1] += model(dset[0], *true_pars)

# this code block generates the figure described above:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(prim[0], prim[1], '.', alpha=0.1, label='2D Gaussian #1')
plt.plot(sec[0], sec[1], '.', alpha=0.5, label='2D Gaussian #2')
plt.plot(uni[0], uni[1], '.', alpha=0.5, label='Uniform')
plt.plot(x_data, y_truth, 'k:', lw=3, zorder=1.0, label='True edge')
plt.xlim(0, 2000)
plt.ylim(-8, 6)
plt.legend(loc='lower left')
plt.show()

# mashing it all together into a single 2 x N array of (x, y) points
dset = np.concatenate([prim, sec, uni], axis=1)

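Now that we have the data and the model, we can brainstorm how to fit the edge of the point distribution. Commonly used regression methods, such as the nonlinear least-squares routine scipy.optimize.curve_fit, take the data values y and optimize the free parameters of a model so that the squared residuals between y and model(x) are minimal. Nonlinear least squares is an iterative process: at every step it adjusts the curve parameters to improve the fit. Clearly, this is exactly what we don't want here. A best-fit curve runs through the middle of the cloud, whereas we want our minimization routine to take us as far below the best-fit curve as possible (but not too far), down to the edge of the points.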
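A quick check on synthetic data shows why a plain least-squares fit does not trace the edge. This is a minimal sketch under stated assumptions: the points scatter only above the true edge, with exponentially distributed offsets (an illustrative choice, not the answer's dataset), and the bounds on `b` are added here only to keep `x + b` positive during the fit. The ordinary fit ends up passing through the cloud rather than along its lower boundary.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # same functional form as the answer's model
    return a / np.sqrt(x + b) + c

rng = np.random.default_rng(42)
x = np.linspace(0, 2000, 500)
true_pars = [80, 70, -5]
y_edge = model(x, *true_pars)

# Assumption for illustration: every point lies on or above the true edge
y = y_edge + rng.exponential(scale=1.0, size=x.size)

# Ordinary least squares; the bounds keep x + b > 0 so sqrt stays real
popt, _ = curve_fit(model, x, y, p0=true_pars,
                    bounds=([0, 1, -np.inf], [np.inf, np.inf, np.inf]))
y_fit = model(x, *popt)

offset = np.mean(y_fit - y_edge)   # how far the fit sits above the true edge
frac_below = np.mean(y < y_fit)    # share of points lying below the fitted curve
print(f"mean offset above the true edge: {offset:.2f}")
print(f"fraction of points below the fitted curve: {frac_below:.2f}")
```

Because `c` is a free additive parameter, the least-squares optimum makes the residuals average to zero, so the fitted curve sits roughly one mean offset above the true edge, with a large share of the points below it.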

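So let's consider the following function. Instead of simply returning the residuals, at every step of the iteration it also "flips" the points lying above the curve and takes them into account. This way there are effectively always more points below the curve than above it, which makes the curve move down with every iteration! Once the lowest points are reached, the minimum of the function has been found, and with it the edge of the scatter. Of course, this method assumes there are no outliers below the curve; then again, your figure doesn't seem to suffer much from them.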
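The downward drift described above can be checked on a toy problem. This is not the answer's code: it is a minimal sketch that fits only a constant baseline `c`, uses a hypothetical helper `one_sided_resid`, and swaps in `scipy.optimize.least_squares` as the optimizer. Points above the current curve contribute zero residual, so only the points below it pull `c` down, and the fit settles at the lowest point instead of the mean.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy data (assumption): every point lies on or above the line y = 2,
# so the true lower edge is the constant 2
rng = np.random.default_rng(0)
y = 2.0 + rng.exponential(scale=1.0, size=300)

def one_sided_resid(params, y):
    """Hypothetical helper: residuals of a constant model c in which
    points above the current curve are ignored (residual 0) and only
    points below it count."""
    c = params[0]
    r = y - c
    r[r > 0] = 0.0  # points above the curve no longer pull it upward
    return r

# Ordinary least squares on a constant converges to the mean of y
ordinary = least_squares(lambda p: y - p[0], x0=[5.0]).x[0]
# One-sided residuals keep pushing c down until no point is below it
edge = least_squares(one_sided_resid, x0=[5.0], args=(y,)).x[0]

print(f"ordinary fit: {ordinary:.3f}  min(y): {y.min():.3f}  one-sided fit: {edge:.3f}")
```

Every curve below all the points gives zero cost, so the one-sided fit depends on starting above the data and stops once it reaches the lowest point; this is also why outliers below the edge would drag such a fit down.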

Here are the functions that implement this idea:

def get_flipped(y_data, y_model):
    # y_model - y_data is positive where a point lies below the curve;
    # those entries are zeroed, so only points above the curve stay (negative)
    flipped = y_model - y_data
    flipped[flipped > 0] = 0
    return flipped

def flipped_resid(pars, x, y):
    """
    For every iteration, everything above the currently proposed
    curve is going to be mirrored down, so that the next iterations
    are going to progressively shift downwards.
    """
    y_model = model(x, *pars)