it's only fitting

rob mahurin

2025-07-31 14:19

Here's a quick-n-dirty Gaussian curve fitter, which I seem to reinvent about twice a year. There are a number of canned solutions that I can never remember how to use, but I also frequently find myself wanting to fit weird functions.

Below in a minimum working example. But the short-short version is

def func(x, *params): ...
x,y = read_some_data()
guess_params = [...]

from scipy import optimize
better_params, covariance = optimize.curve_fit(
    func, x, y, p0=guess_params)

First, we need to define the function to fit. I think you can't have keyword arguments, but I suppose you could kludge around that with a wrapper or with functools.partial. Here, the gaussian, with normalizations for area and also normalizations for fitting to the peak:

def gaussian_norm(x, mean, std_dev, integral, offset):
    import numpy as np
    mu,sigma = mean, std_dev # standard symbols
    bell = np.exp( -(x-mu)**2 / (2*sigma**2) )
    norm = integral / np.sqrt(2*np.pi*sigma**2)
    return norm*bell + offset

def gaussian_peak(x, mean, std_dev, peak, offset):
    import numpy as np
    integral = peak * np.sqrt(2*np.pi*std_dev**2)
    return gaussian_norm(x, mean, std_dev, integral, offset)

Next, generate some noisy data (perhaps by doing an experiment):

import numpy as np
import matplotlib.pyplot as plt

# Use messy bins because data always sucks
frequency = np.linspace(2900,3100, 1253)

# Pick some default parameters
mean, std_dev, amplitude, offset = 3027, 15, 20, 5

# Generate some noisy data
nice_data = gaussian_peak(frequency, mean, std_dev, amplitude, offset)
data = nice_data + np.random.randn(*nice_data.shape)

plt.plot(frequency, data)

plot of messy data

Because this is quick-n-dirty, it's necessary to guess at some parameters:

from scipy import optimize

guess = (3000, 50, 30, 0)
better_guess, covariance = optimize.curve_fit(
    gaussian_peak, frequency, data, p0 = guess)

plt.plot(frequency, data, alpha=0.5, label='data')
plt.plot(frequency, gaussian_peak(frequency, *guess), label='intial parameters')
plt.plot(frequency, gaussian_peak(frequency, *better_guess), label='fit result', color='red')

plot of messy data with bad initial guess and good fit

The fit results seem to be pretty good:

specified	fitted
3027.00	3026.91
15.00	15.01
20.00	19.97
5.00	5.01

Note that gaussian_norm, while its integral is better-defined, develops a huge covariance between the integral and the width of the distribution. It's easier to read the parameters for gaussian_peak off of the graph, and the parameters seem to be much more orthogonal to each other.