Code
try:
import fysisk_biokemi
print("Already installed")
except ImportError:
%pip install -q "fysisk_biokemi[colab] @ git+https://github.com/au-mbg/fysisk-biokemi.git"
Curve fitting is a fundamental skill in biochemistry and biophysics for analyzing experimental data. We use curve fitting to determine parameters in mathematical models that describe biological processes like enzyme kinetics, binding affinity, and reaction rates.
The basic idea is to find the parameters of a mathematical function that best describes our experimental data.
Let’s start with some simple data that follows a linear relationship. The data below represents a theoretical experiment where we measure some output \(y\) at different input values \(x\):
Looking at this data, we can see it roughly follows a straight line. Let’s assume our model is:
\[y = ax + b\]
where \(a\) and \(b\) are parameters we want to determine from the data.
To use scipy.optimize.curve_fit
, we need to define a function where:
Define a linear function for fitting:
Always a good idea to check that your function works as expected
Come up with another test case to check
Now we can use curve_fit
to find the best parameters. The basic syntax is:
Where:
function
: The function we defined abovex_data
, y_data
: Our experimental datap0
: Initial guess for parameters (optional but recommended)popt
: The optimized parameters (what we want!)pcov
: Covariance matrix (contains information about parameter uncertainties)Finish the cell below to perform the curve fit by adding the three missing arguments to the call to curve_fit
.
It’s crucial to always plot your fit to see how well it describes the data. To do this we evaluate the function with the fitted parameters and a densely sampled independent variable.
Then we can plot it
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x_data, y_data, 'o', markersize=8, label='Experimental data')
# Then we can plot the fit
ax.plot(..., ..., '-', linewidth=2, label=f'Fit: y = {a_fit:.2f}x + {b_fit:.2f}')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
ax.set_title('Linear curve fit')
Many biological processes follow nonlinear relationships. Let’s work with exponential decay, which is common in biochemistry (e.g., radioactive decay).
First, let’s generate some exponential data:
# Generate exponential decay data
t = np.array([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
# True parameters: A=10, k=0.8
A_true, k_true = 10, 0.8
signal = A_true * np.exp(-k_true * t) + np.random.normal(0, 0.3, len(t))
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(t, signal, 'o', markersize=8)
ax.set_xlabel('Time')
ax.set_ylabel('Signal')
ax.set_title('Exponential decay data')
Text(0.5, 1.0, 'Exponential decay data')
The model we want to fit is: \[\text{signal} = A e^{-kt}\]
Define the exponential decay function:
Now fit the exponential function to the data:
Again we should plot to check that it looks as expected
p0
) - poor guesses can lead to fitting failureYou now have the fundamental skills needed to fit curves to biochemical data! In the exercises, you’ll apply these techniques to analyze real experimental data and extract meaningful biological parameters.
Train your estimation skills using the widget below.
The widget below shows the curves for \(\theta\) using both the simple expression and the quadratic binding expression. Use it to determine how large [P] has to be for them to be notably different.
A dialysis experiment was set up where equal amounts of a protein were separately dialyzing against buffers containing different concentrations of a ligand – each measurement was done in triplicate. The average number of ligands bound per protein molecule, \(\bar{n}\) were obtained from these experiments. The corresponding concentrations of free ligand and values are given in dataset dialys-exper.xlsx
.
Run the next cell after uploading the file
Explain how the values of \(\bar{n}\) is calculated when knowing the concentrations of ligand inside and outside the dialysis bag, as well as the total concentration of the protein, [\(\text{P}_{\text{tot}}\)].
Convert the concentrations of free ligand to SI-units given in M, add it as a row to the DataFrame
.
Now we want to fit the data to extract \(K_D\) and \(\nu_{\text{max}}\), by using the equation
\[ \nu([L_{\text{free}}]) = \nu_{\text{max}} \frac{[L_{\text{free}}]}{K_D + [L_{\text{free}}]} \]
To do so we need to implmenet it as a Python function
Fitting refers to finding the parameters that make an assumed functional form best ‘fit’ the data. Programmatically we will use the curve_fit
from the scipy
package to do so. The signature of this function looks like this
The arguments are
function
: A python function where the first argument is the independent variable, and other arguments are the parameters of the functions.x_data
: The observed values of the independent variable.y_data
: The observed values of the dependent variable.p0
: Initial guesses for the parameters.When called curve_fit
starts by calculating how well the functions fits the data with the initial parameters in p0
and then iteratively improves the fit by trying new values for the parameters in an intelligent way.
The found parameters will generally depend on p0
and it is therefore necessary to provide a good (or good enough) guess for p0
.
Generally, the best way to learn more about a function is to read it’s documentation and then play around with it. The documentation is in this case on the SciPy website. You don’t need to read it, unless you want more details.
Finish the code to perform the fitting in the cell below.
from scipy.optimize import curve_fit
# Choose the variables from the dataframe
x = df['Free Ligand [L](M)']
y = df['n-bar']
# Initial guess
K_D_guess = ... # Your initial guess for K_D
nu_max_guess = ... # Your initial guess for nu_max
p0 = [K_D_guess, nu_max_guess]
# Curve fit
# Replace the four ... with the correct arguments in the correct order.
popt, pcov = curve_fit(..., ..., ..., ...)
# Print the parameters
nu_max_fit, K_D_fit = popt
print(f"{nu_max_fit = :1.3f} ")
print(f"{K_D_fit = :e}")
Are the parameters you find reasonable? How can you tell if they are reasonable by looking at the plot you made earlier?
When we have the fitted parameters we can calculate and plot the function. To do so we make an array of values for the independent variable and use our function to calculate the dependent variable
Now that we calculated the dependent variable we can plot the fit along with the data.
The binding of ADP to the enzyme pyruvate kinase was measured by fluorescence. The enzyme concentration was 4 μM throughout the titration, and each measurement was done in triplicate. The binding results were obtained at 310 K and are given in the .csv
-file adp-bindin-pyruva-kinase.csv
.
As always, use the widget to load the dataset
Run the next cell after uploading the file
The concentrations in the dataset are given in mM, add a new column to the DataFrame
with the units given in M.
For each value of \(\bar{n}\) calculate the concentration of [ADP\(_\text{free}\)] from [ADP\(_\text{tot}\)] and [enzyme].
Make a plot of the free ligand concentration versus \(\bar{n}\).
To fit we need a implement the function we want to fit the parameters of, the functional form is
\[ n = n_{\text{max}} \frac{[L]^n}{K_D + [L]^n} \]
Finish the code below to create a fit.
from scipy.optimize import curve_fit
# Choose the variables from the dataframe
x = df['[ADPfree](M)']
y = df['nbar']
# Initial guess
K_D_guess = ... # Your initial guess for K_D
nu_max_guess = ... # Your initial guess for nu_max
n_exp = ... # Your initial guess for the exponent.
p0 = [K_D_guess, nu_max_guess, n_exp]
# Bounds
bounds = (0, np.inf) # We limit the parameters to be positve.
# Curve fit
popt, pcov = curve_fit(n_bound, x, y, p0=p0, bounds=bounds)
# Print the parameters
n_max_fit, K_D_fit, n_exp_fit = popt
print(f"{n_max_fit = :1.3f} ")
print(f"{K_D_fit = :e}")
print(f"{n_exp_fit = :1.3f} ")
Once we’ve obtained the fitted parameters we can plot the fit together with the data.
L = np.linspace(0, 1.2 * x.max(), 50)
n = n_bound(L, n_max_fit, K_D_fit, 1)
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(df['[ADPfree](M)'], df['nbar'], 'o', label='Observations')
ax.plot(L, n)
ax.axvline(K_D_fit, color='k', linewidth=0.5, linestyle='--')
ax.axhline(n_max_fit, color='k', linewidth=0.5, linestyle='--')
ax.set_xlabel(r'$[\text{ADP}_{\text{free}}]$', fontsize=14)
ax.set_ylabel(r'$\bar{n}$', fontsize=14)
ax.set_ylim([0, n.max()*1.1])
Calculate the free energy for the association of the ADP-pyruvate kinase complex assuming \(R = 8.314472 \times 10^{-3} \ \frac{\text{kJ}}{\text{mol} \cdot \text{K}}\) and \(T = 310 \ \text{K}\).
Consider the difference between association and dissociation
Start by defining the two given constants as variables
And do the calculation
The inter-bindin-data.xlsx
contains a protein binding dataset.
Load the dataset using the widget below
Run the next cell after uploading the file
Add a new column to the DataFrame
with the ligand concentration in SI units.
Make plots of the binding data directly with a linear and logarithmic x-axis.
Estimate \(K_D\) by visual inspection of these plots!
import matplotlib.pyplot as plt
# This makes a figure with two axes.
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
# Can with [0] to plot in the first axis.
ax = axes[0]
ax.plot(..., ..., 'o') # Replace ... with your code.
ax.set_xlabel('[L](M)', fontsize=14)
ax.set_ylabel(r'$\bar{n}$', fontsize=14)
ax = axes[1]
... # Put some code here - perhaps you can copy it from somewhere?
ax.set_xscale('log') # This make the x-axis logarithmic.
Ths command ax.set_xscale('log')
tells matplotlib
that we want the x-axis to use a log-scale.
Make a fit to determine \(K_D\), as always we start by implementing the function to fit with
And then we can make the fit
from scipy.optimize import curve_fit
# Choose the variables from the dataframe
x = ... # Choose x-data from the dataframe
y = ... # Choose y-data from the dataframe
# Initial guess
p0 = [k_d_estimate]
# Bounds
bounds = (0, np.inf) # We limit the parameters to be positve.
# Curve fit
popt, pcov = ... # Call the curve_fit function.
# Print the parameters
k_d_fit = popt[0]
print(f"{k_d_fit = :e}")
Use the figure below to compare your guess with the fitted value.
Based on the value of \(K_D\) found from the fit,
The binding of NAD+ to the protein yeast glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was studied using equilibrium dialysis. The enzyme concentration was 71 μM. The concentration of \([\text{NAD}^{+}_\text{free}]\) and the corresponding values of \(\bar{n}\) were determined with the resulting data found in the dataset deter-type-streng-coope.xlsx
.
Load the dataset using the widget below
Run the next cell after uploading the file
Start by adding a new column to the DataFrame
with the average value of \(\bar{n}\) across the three series
Now also add a column with the ligand concentration in SI units with the column-name [NAD+free]_(M)
.
Finally, set the concentration of the GADPH in SI units
Make a plot of the average \(\bar{n}\) as a function of \([\text{NAD}^{+}_\text{free}]\).
Make a Scatchard plot based on the average \(\bar{n}\).
How many binding sites does GAPDH contain for \(\text{NAD}^{+}\)?
What type of cooperativity do the plots indicate?
Make a fit using the functional form
\[ \bar{n} = N \frac{[L]^h}{K_D + [L]^h} \]
As usual, start by defining the function in Python
Now we can fit
from scipy.optimize import curve_fit
# This selects the '[NAD+free]_(m)'-column three times and stitches it together.
x = np.concatenate([df['[NAD+free]_(M)'], df['[NAD+free]_(M)'], df['[NAD+free]_(M)']])
# Do the same to stitch together the nbar1, nbar2 and nbar3 columns.
y = ...
# Initial guess
p0 = [..., ..., ...]
# Bounds
bounds = (0, np.inf) # We limit the parameters to be positve.
# Curve fit
popt, pcov = curve_fit(n_bar, x, y, p0=p0, bounds=bounds)
# Print the parameters
N_fit, k_d_fit, h_fit = popt
print(f"{N_fit = :.3f}")
print(f"{k_d_fit = :e}")
print(f"{h_fit = :.3f}")
L = np.linspace(0, df['[NAD+free]_(M)'].max()*1.5)
n_bar_fit = n_bar(L, N_fit, k_d_fit, h_fit)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y, 'o', label='Data')
ax.plot(L, n_bar_fit, '-', label='Fit')
ax.axhline(N_fit, color='C2', label='N from fit')
ax.axvline(k_d_fit, color='C3', label=r'$K_D$ from fit')
ax.set_xlabel(r'$[\text{NAD}^{+}_\text{free}]$', fontsize=14)
ax.set_ylabel(r'$\bar{n}$', fontsize=14)
ax.legend()
plt.show()
Below is given the general expression for saturation of a binding site by one of the ligands, \(L\), when two ligands \(L\) and \(C\) are competing for binding to the same site on a protein. Assume that \([P_T] = 10^{-9}\) \(\mathrm{M}\).
\[ \theta = \frac{[PL]}{[P_T]} = \frac{1}{\frac{K_D}{[L]}\left(1 + \frac{[C]}{K_C}\right) + 1} \]
Consider these four situations
# | \(K_D\) | \([L_T]\) | \(K_C\) | \([C_T]\) |
---|---|---|---|---|
1 | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-3}\) \(\mathrm{M}\) | \(0\) | |
2 | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-3}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) | \(1·10^{-2}\) \(\mathrm{M}\) |
3 | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-3}\) \(\mathrm{M}\) | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-3}\) \(\mathrm{M}\) |
4 | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) |
Explain how the fact that \([P_T]\) is much smaller than \([L_T]\) and \([C_T]\) simplifies the calculations using the above equation.
Calculate the degree of saturation of the protein with ligand \(L\) in the four situations.
Start by writing a Python function that calculates the degree of saturation \(\theta\).
Then use that function to calculate \(\theta\) for each situation.
What is the degree of saturation with respect to the competitor \(C\) in #1, #2 and #4?
What is the fraction of \([P_{\mathrm{free}}]\) in #2, #3, #4?
Consider how to express the fraction of \([P_{\mathrm{free}}]\) in terms of theta_L_X
and theta_C_X
.
---
title: Week 46
engine: jupyter
categories: ['regression', 'calculation', 'data', 'plotting']
format-links:
- text: "Open in Google Colab"
href: "https://colab.research.google.com/github/au-mbg/fysisk-biokemi/blob/built-notebooks/built_notebooks/student/week_46.ipynb"
icon: box-arrow-up-right
---
{{< include tidbits/_install_import.qmd >}}
---
{{< include intros/regression_intro.qmd >}}
---
{{< include exercises/estim-bindin-affini.qmd >}}
---
{{< include exercises/dialys-exper.qmd >}}
---
{{< include exercises/adp-bindin-pyruva-kinase.qmd >}}
---
{{< include exercises/inter-bindin-data.qmd >}}
---
{{< include exercises/deter-type-streng-coope.qmd >}}
---
{{< include exercises/compe-ligand-bind.qmd >}}