# Data processing¶

## Fitting¶

Class `fitting.Fitter` is a user-friendly wrapper around `scipy.optimize.least_squares()` routine. Dealing with fitting is made more convenient in a couple of ways:

• It is easy to specify the x-parameter name (in the case it is not the first parameter), or specify multiple x-parameters;
• All of the fit and fixed parameters are specified by name; it is easy to switch between any parameter being fit or fixed;
• The wrapper automatically handles complex parameters (split into real and imaginary parts), numpy arrays, lists, or tuples (including nested structures);
• The final parameters (fit and fixed) are returned in a single dictionary indexed by their names;
• The wrapper also returns the fit function with all of the parameters bound to the final fit and fixed values;
• The fit function result is flattened during fitting, so it works for functions returning multi-dimensional (for example, 2D) arrays.

### Examples¶

Fitting a Lorentzian:

```def lorentzian(frequency, position=0., width=1., height=1.):
return height/(1.+4.*(frequency-position)**2/width**2)

## creating the fitter
# fit_parameters dictionary specifies the initial guess
fit_par = {"position":0.5, "height":1.}
fitter = pll.Fitter(lorentzian, xarg_name="frequency", fit_parameters=fit_par)
# additional fit parameter is supplied during the call
fit_par, fit_func = fitter.fit(xdata, ydata, fit_parameters={"width":1.0})
plot(xdata, ydata)  # plot the experimental data
plot(xdata, fit_func(xdata))  # plot fit result
```

Fitting a sum of complex Lorentzians with the same width:

```def lorentzian_sum(frequency, positions, width, amplitudes):
# list of complex lorentzians
#   positions and amplitudes are lists, one per peak
lorentzians = [a/(1.+2j*(frequency-p)/width) for (a,p) in zip (amplitudes,positions)]
return np.sum(lorentzians, axis=0)

## creating the fitter
# fit_parameters dictionary specifies the initial guess
#     (complex initial guess for the "amplitude" parameter hints that this parameter is complex)
fit_par = {"positions":[0.,0.5,1.], "amplitudes":[1.+0.j]*3}
fitter = pll.Fitter(lorentzian_sum, xarg_name="frequency", fit_parameters=fit_par)
# fixed parameter is supplied during the call (could have also been supplied on Fitter initialization)
fit_par, fit_func = fitter.fit(xdata, ydata, fixed_parameters = {"width":0.3})
plot(xdata, ydata.real)  # plot the experimental data
plot(xdata, fit_func(xdata).real)  # plot fit result
```

Fitting 2D Gaussian and getting the parameter estimation errors:

```def gaussian(x, y, pos, width, height):
return np.exp( -((x-pos)**2+(y-pos)**2)/(2*width**2) )*height

## creating the fitter
# fit_parameters dictionary specifies the initial guess
fit_par = {"pos":(100,100), "width":10., "height":5.}
fitter = pll.Fitter(gaussian, xarg_name=["x","y"], fit_parameters=fit_par)
xs, ys = np.meshgrid(np.arange(img.shape), np.arange(img.shape), indexing="ij") # building x and y coordinates for the image
# fit_stderr is a dictionary containing the fit error for the corresponding parameters
fit_par, fit_func, fit_stderr = fitter.fit([xs,ys], img, return_stderr=True)
imshow(fit_func(xs, ys))  # plot fit result
```

The full module documentation is given at `pylablib.core.dataproc.fitting`.

## Filtering and decimation¶

There are several functions present for filtering the data to smooth it or reduce its size. Most of them are thin wrapper around standard numpy or scipy method, but they provide more universal interface which work both with numpy arrays and pandas DataFrames:

## Fourier transform¶

There is a couple of methods to work with Fourier transform. They are built around `numpy.fft.fft()`, but allow more convenient normalization (e.g., in units of power spectral density), and work better with pandas DataFrames. They also have an option to automatically trim the trace length to the nearest “good” size, which is a product of small primes. This can have fairly strong (up to a factor of several) effect on the transform runtime, while typically trimming off less than 1% of the data.

The main methods are `fourier.fourier_transform()` for the direct transform, `fourier.inverse_fourier_transform()` for the inverse transform, and `fourier.power_spectral_density()` for the power spectral density:

```>> x = np.random.normal(size=10**5)  # normal distribution centered at 0 with a width of 1
>> PSD = pll.power_spectral_density(x, dt=1E-3)  # by default, use density normalization; assume time step of 1ms
>> df = PSD[1,0] - PSD[0,0]
>> df  # total span is 1kHz with 10**5 points, resulting in 0.01Hz step
0.01
>> np.sum(PSD[:,1]) * df  # integrated PSD is equal to the original trace RMS squared, which is about 1 for the normal distribution
1.005262206692361
>> np.mean(x**2)
1.005262206692361
```

## Feature detection¶

There are several methods for simple feature detection:

• The peak detection, which is usually achieved by the combination of `feature.multi_scale_peakdet()` and `feature.find_peaks_cutoff()`. The first applies difference-of-Lorentzians or difference-of-Gaussians filter, which detects peaks of a particular width. The second finds peaks using a cutoff.
• Another way to find peaks is using `feature.find_local_extrema()`, which finds local minima or maxima in a sliding window of a given width.
• Switching between two states with a noisy trace can be detected using `feature.latching_trigger()`. It implements a more robust approach to find when the trace is above/below threshold by considering two thresholds: a higher “on” thresholds and a lower “off” threshold. It makes the on/off state “latch” to its current value and is robust to small trace fluctuations around the threshold, which would lead to rapid on/off switches in a single-threshold scheme.

## Miscellaneous utilities¶

Additionally, there is a variety of small functions to simplify some data analyses and transforms: