Code
try:
import fysisk_biokemi
print("Already installed")
except ImportError:
%pip install -q "fysisk_biokemi[colab] @ git+https://github.com/au-mbg/fysisk-biokemi.git"Curve fitting is a fundamental skill in biochemistry and biophysics for analyzing experimental data. We use curve fitting to determine parameters in mathematical models that describe biological processes like enzyme kinetics, binding affinity, and reaction rates.
The basic idea is to find the parameters of a mathematical function that best describes our experimental data.
Real biochemical data is often requires complex fitting functions. Therefore, we will start out with a simpler data that follows a linear relationship. The data below represents a theoretical experiment where we measure some output \(y\) at different input values \(x\):
The functions above simply make two arrays - i.e. columns of data - containing the number. You can view the content of these colmns by with print(x_data) or print(y_data).
Looking at this data, we can see it roughly follows a straight line. Let’s assume our model is:
\[y = ax + b\]
where \(a\) and \(b\) are parameters we want to determine from the data.
To use scipy.optimize.curve_fit, we need to define a function where:
Define a linear function for fitting:
Always a good idea to check that your function works as expected
As a test we are adding 1 in the place of x, 2 in the place of a and 0 in the place of b. In other words: \(1 \cdot 2 + 0 = 2\)
When in doubt it is a good idea to do sanity checks like this.
Now we can use curve_fit to find the best parameters. The basic syntax is:
Where:
function: The function we defined abovex_data, y_data: Our experimental datap0: Initial guess for parameters (optional but recommended)fitted_parameters: The optimized parameters (what we want!)trash: Don’t worry about this.Finish the cell below to perform the curve fit by adding the three missing arguments to the call to curve_fit.
# Initial guess: a=2, b=0
initial_guess = [2, 0]
## Your task: Perform the fit ##
## You need give the function three arguments ##
## The function: linear_function ##
## The x-data: x_data
## The y-data: y_data
fitted_parameters, trash = curve_fit(..., ..., ..., p0=initial_guess)
## Extract parameters ##
a_fit, b_fit = popt
print(fitted_parameters)
print(a_fit)
print(b_fit)It’s crucial to always plot your fit to see how well it describes the data. To do this we evaluate the function with the fitted parameters and a densely sampled independent variable.
Then we can plot it
Many biological processes follow nonlinear relationships. Let’s work with exponential decay, which is common in biochemistry (e.g., radioactive decay or unimolecular chemical reactions).
You can load the dataset from the exp_decay_data.xlsx-file with the widget below
Run the next cell after uploading the file
Now make a plot of the data
The model we want to fit is: \[\text{signal} = A e^{-kt}\]
Define the exponential decay function:
def exponential_decay(t, A, k):
## Your task: Calculate the exponential function ##
## Remember that np.exp calculates the exponential function ##
result = ...
return result
## This calculates the function for t=1, A=1, k=1 ##
## If you did it correctly you should see 0.3678...
test_result = exponential_decay(1, 1, 1)
print(test_result)Now fit the exponential function to the data:
## We set the initial guess as A=8 and k=1 ##
initial_guess = [8, 1]
## Your task: Make the fit using the exponenital_decay function, and the time and signal data.
##
fitted_parameters, trash = curve_fit(..., ..., ..., p0=initial_guess)
# Extract parameters
A_fit, k_fit = fitted_parameters
print(A_fit)
print(k_fit)Again we should plot to check that it looks as expected
Now we can plot the fit along with the data.
curve_fit
Fitting refers to finding the parameters that make an assumed functional form best ‘fit’ the data.
In Python we will use the curve_fit from the scipy package to do so. The function looks like this
The arguments are
function: A python function where the first argument is the independent variable (typically the x coordinate), and other arguments are the parameters of the functions.x_data: The observed values of the independent variable (The x coordinates).y_data: The observed values of the dependent variable (The y coordinates) .p0: Initial guesses for the parameters.When called curve_fit starts by calculating how well the functions fits the data with the initial parameters in p0 and then iteratively improves the fit by trying new values for the parameters in an intelligent way.
The found parameters will generally depend on p0 and it is therefore necessary to provide a good (or good enough) guess for p0.
Generally, the best way to learn more about a function is to read it’s documentation and then play around with it. The documentation is in this case on the SciPy website. You don’t need to read it, unless you want more details.
You now have the fundamental skills needed to fit curves to biochemical data! In the exercises, you’ll apply these techniques to analyze real experimental data and extract meaningful biological parameters.
Train your estimation skills using the widget below.
The widget below shows the curves for \(\theta\) using both the simple expression assuming that \(L = L_{tot}\) and the quadratic binding expression. Vary \(K_D\) and \(P_{total}\) to work out a rule of thumb for when the two equations give a similar curve.
A dialysis experiment was set up where equal amounts of a protein were separately dialyzing against buffers containing different concentrations of a ligand – each measurement was done in triplicate. The average number of ligands bound per protein molecule, \(\bar{n}\) were obtained from these experiments. The corresponding concentrations of free ligand and values are given in dataset dialys-exper.xlsx.
Run the next cell after uploading the file
Explain how the values of \(\bar{n}\) is calculated when knowing the concentrations of ligand inside and outside the dialysis bag, as well as the total concentration of the protein, [\(\text{P}_{\text{tot}}\)].
Convert the concentrations of free ligand to SI-units given in M, add it as a row to the DataFrame.
Now we want to fit the data to extract \(K_D\) and \(n\), by using the equation
\[ \bar{n}([L_{\text{free}}]) = N \frac{[L_{\text{free}}]}{K_D + [L_{\text{free}}]} \]
To do so we need to implmenet it as a Python function
Finish the code to perform the fitting in the cell below.
from scipy.optimize import curve_fit
# Choose the variables from the dataframe
x = df['Free_ligand_(M)']
y = df['n_bar']
## Your task: Make initial guesses for the two parameters ##
K_D_guess = ... # Your initial guess for K_D
N_guess = ... # Your initial guess for N.
initial_guess = [K_D_guess, N_guess]
## Your task: Give the four arguments in the correct order to make calculate ##
## the fitted parameters ##
fitted_parameters, trash = curve_fit(..., ..., ..., ...)
# Print the parameters
N_fit, K_D_fit = fitted_parameters
print(fitted_parameters)
print(N_fit)
print(K_D_fit)Are the parameters you find reasonable? How can you tell if they are reasonable by looking at the plot you made earlier?
When we have the fitted parameters we can calculate and plot the function. To do so we make an array of values for the independent variable and use our function to calculate the dependent variable
Now that we calculated the dependent variable we can plot the fit along with the data.
The inter-bindin-data.xlsx contains a protein binding dataset.
Load the dataset using the widget below
Run the next cell after uploading the file
Add a new column to the DataFrame with the ligand concentration in SI units.
Make plots of the binding data directly with a linear and logarithmic x-axis.
Estimate \(K_D\) by visual inspection of these plots!
import matplotlib.pyplot as plt
# This makes a figure with two axes.
fig, axes = plt.subplots(1, 2)
## We index with [0] to plot in the first axis. ##
## Don't worry about remembering this ##
ax = axes[0]
## Your task: Plot the data [L]_(M) vs nbar.
ax.plot(..., ..., 'o') # Replace ... with your code.
## We index with [0] to plot in the first axis. ##
## Don't worry about remembering this ##
## We will make this a log plot ##
ax = axes[1]
## Your task: Plot the data [L]_(M) vs nbar.
## Same exact code as above
ax.plot(..., ..., 'o') # Replace ... with your code.
## This makes it a log plot ##
## You do not need to remember or understand this in detail ##
ax.set_xscale('log')Ths command ax.set_xscale('log') tells matplotlib that we want the x-axis to use a log-scale.
Make a fit to determine \(K_D\), as always we start by implementing the function to fit with
And then we can make the fit
## Your task: Choose the variables from the dataframe ##
x = ... # Choose x-data from the dataframe
y = ... # Choose y-data from the dataframe
## Initial guess ##
initial_guess = [K_D_estimate] # Using your estimate as the initial value for K_D
## Your task: Finish the curve fitting
fitted_parameters, trash = ... # Call the curve_fit function.
# Print the parameters
K_D_fit = fitted_parameters[0]
print(K_D_fit)Compare the fitted values with your guess.
## These lines make equally spaced points on a normal linear axis ##
## And on a logarithmic axis. ##
L_smooth = np.linspace(0, df['[L]_(M)'].max(), 100)
L_smooth_log = np.geomspace(df['[L]_(M)'].min(), df['[L]_(M)'].max(), 100)
## Youe task: Evaluate the fitted function with L_smooth and L_smooth_log ##
nbar_fit = n_bar(..., K_D_fit)
nbar_fit_log = n_bar(..., K_D_fit)Now we can make a plot.
## This plot uses some extra commands to make it look nice, don't worry
## about it - you don't need to understand all the details.
# This makes a figure with two axes.
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
# Index with [0] to plot in the first axis - Linear plot
ax = axes[0]
ax.plot(df['[L]_(M)'], df['nbar'], 'o', color='C0')
ax.plot(L_smooth, nbar_fit, color='C2')
ax.axvline(K_D_estimate, label='Estimate', color='C1')
ax.axvline(K_D_fit, label='Fit', color='C2')
ax.set_xlabel('[L](M)', fontsize=14)
ax.set_ylabel(r'$\bar{n}$', fontsize=14)
ax.legend()
# Index with [1] to plot in the second axis - Log plot.
ax = axes[1]
ax.plot(df['[L]_(M)'], df['nbar'], 'o', color='C0')
ax.plot(L_smooth_log, nbar_fit_log, color='C2')
ax.set_xlabel('[L](M)', fontsize=14)
ax.set_ylabel(r'$\bar{n}$', fontsize=14)
ax.axvline(K_D_estimate, label='Estimate', color='C1')
ax.axvline(K_D_fit, label='Fit', color='C2')
ax.legend()
ax.set_xscale('log')Based on the value of \(K_D\) found from the fit,
The binding of NAD+ to the protein yeast glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was studied using equilibrium dialysis. The enzyme concentration was 71 μM. The concentration of \([\text{NAD}^{+}_\text{free}]\) and the corresponding values of \(\bar{n}\) were determined with the resulting data found in the dataset deter-type-streng-coope.xlsx.
The dataset consists of three independent repititions of the same experiment.
Load the dataset using the widget below
Run the next cell after uploading the file
Start by adding a new column to the DataFrame with the average value of \(\bar{n}\) across the three series
Now also add a column with the ligand concentration in SI units with the column-name [NAD+free]_(M).
Finally, set the concentration of the GADPH in SI units
Make a plot of the average \(\bar{n}\) as a function of \([\text{NAD}^{+}_\text{free}]\).
Make a Scatchard plot based on the average \(\bar{n}\).
Estimate by eye how many bindings sites GAPDH contains for \(\text{NAD}^{+}\)?
Based on a visual inspection of the plot above, are there in signs of cooperativity? If so, which kind?
Make a fit using the functional form
\[ \bar{n} = N \frac{[L]^h}{K + [L]^h} \]
As usual, start by defining the function in Python
Now we can fit
## We want to use all the data for the fit, but it was given in three different columns.
## We stitch it together to make a column containing all x-coordinates
## And another column containing all y-coordinates
x = np.concatenate([df['[NAD+free]_(M)'], df['[NAD+free]_(M)'], df['[NAD+free]_(M)']])
y = np.concatenate([df['nbar1'], df['nbar2'], df['nbar3']])
## Your task: Make initial guesses for the three parameters
## The order is that of the function
## So N, K and h.
initial_guess = [..., ..., ...]
## Your task: Make the fit
## Note: The bounds argument has been added to have the function
## only explore positive sets of parameters.
fitted_parameters, trash = curve_fit(..., ..., ..., ..., bounds=(0, np.inf))
# Print the parameters
N_fit, K_fit, h_fit = fitted_parameters
print('N_fit', N_fit)
print('K_fit', K_fit)
print('h_fit', h_fit)Do the fitted parameters support your intuitive reading of the type of cooperativity?
In the data sets used, repeated experiments were given as a three seperate columns, which is a quite natural way of recording data during an experiment. However, it is not an appropriate format for regression, where we want all the data-points in a single column. In this exercise we did a bit of data wrangling to make the data appropriate for what we want to do with it - in this case by creating a new joint column containing all the data by using the np.concatenate function.
Below is given the general expression for saturation of a binding site by one of the ligands, \(L\), when two ligands \(L\) and \(C\) are competing for binding to the same site on a protein. Assume that \([P_T] = 10^{-9}\) \(\mathrm{M}\).
\[ \theta_L = \frac{[PL]}{[P_T]} = \frac{1}{\frac{K_D}{[L]}\left(1 + \frac{[C]}{K_C}\right) + 1} \]
Consider these four situations
| # | \([L_T]\) | \([C_T]\) | \(K_D\) | \(K_C\) |
|---|---|---|---|---|
| 1 | \(1·10^{-3}\) \(\mathrm{M}\) | \(0\) | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) |
| 2 | \(1·10^{-3}\) \(\mathrm{M}\) | \(1·10^{-2}\) \(\mathrm{M}\) | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) |
| 3 | \(1·10^{-3}\) \(\mathrm{M}\) | \(1·10^{-3}\) \(\mathrm{M}\) | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-5}\) \(\mathrm{M}\) |
| 4 | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) | \(1·10^{-5}\) \(\mathrm{M}\) | \(1·10^{-6}\) \(\mathrm{M}\) |
Explain how the fact that \([P_T]\) is much smaller than \([L_T]\) and \([C_T]\) simplifies the calculations using the above equation.
Calculate the degree of saturation of the protein with ligand \(L\) in the four situations.
Start by writing a Python function that calculates the degree of saturation \(\theta\).
Then use that function to calculate \(\theta_L\) for each situation.
## Your task: Calculate the degree of saturation for each set of parameters ##
theta_L_1 = bound_fraction(..., ..., ..., ...)
theta_L_2 = ...
theta_L_3 = ...
theta_L_4 = ...
print('Situation 1', theta_L_1)
print('Situation 2', theta_L_2)
print('Situation 3', theta_L_3)
print('Situation 4', theta_L_4)What is the degree of saturation with respect to the competitor \(C\) in #1, #2, #3 and #4?
## Your task: Calculate the degree of saturation for each set of parameters ##
## Note: Now you are calculating it for the competitor!
theta_C_1 = ...
theta_C_2 = ...
theta_C_3 = ...
theta_C_4 = ...
## Prints the results so we can analyze them.
print('Situation 1', theta_C_1)
print('Situation 2', theta_C_2)
print('Situation 3', theta_C_3)
print('Situation 4', theta_C_4)What is the fraction of \([P_{\mathrm{free}}]\) in #1, #2, #3 and #4?
Consider how to express the fraction of \([P_{\mathrm{free}}]\) in terms of theta_L_X and theta_C_X.
## Your task: Calculate the P_free.
## Hint: Do not use the function from before - figure out another way.
theta_free_1 = ...
theta_free_2 = ...
theta_free_3 = ...
theta_free_4 = ...
## Prints the results so we can analyze them.
print('Situation 1', theta_free_1)
print('Situation 2', theta_free_2)
print('Situation 3', theta_free_3)
print('Situation 4', theta_free_4)Describe in your own words why these four competitive situations end up with their respective binding patterns. Consider the relative concentrations and affinities of ligands L and C.
Consider the reaction:
\[ A \rightleftarrows B \]
The time course of the reaction was followed monitoring formation of the product \(B\). The starting concentration of \(A\) was 50 µM, and as you will learn later in the course, the reversible reaction going from \(A\) to \(B\) follows first order kinetics.
The data is contained in the diff_q_keq_data.xlsx file.
Run the next cell after uploading the file
Convert the measured concentrations of \(B\), \([B]\), to concentrations in the SI-unit \(\mathrm{M} \ (\mathrm{mol/L})\). Repeat for the starting concentration of \(A\), \([A_0]\).
Use the laws of mass balance to calculate \([A]\) as a function of time in the unit \(\mathrm{M}\), add it as a column to the DataFrame
Make a plot of the time vs. the concentration of \(A\) and of time versus the concentration of \(B\).
Calculate and plot the mass action ratio \(Q\) as a function of time.
Now plot the mass action ratio versus time.
What is the difference between \(Q\) and \(K_{eq}\)?
Which of the two compounds \(A\) and \(B\) is favored at equilibrium? (Favored: highest concentration)
What is the equilibrium constant of this reaction?
The equilibrium constant for a reversible reaction going from \(A\) to \(B\) was measured as a function of temperature.
The data obtained is given in the Excel document deter_delta_h_data.xlsx - load the data with the widget
Run the next cell after uploading the file
In NumPy the np.log function calculates the natural logarithm that we often write as \(\ln\) and np.log10 calculates the base 10 logarithm.
Add new columns to the DataFrame containing \(\ln(K_{eq})\) and \(1/T\) and make a plot with \(1/T\) on the x-axis and \(\ln(K_{eq})\) on the y-axis.
Now you can make the plot in the cell below
Make a linear trendline by fitting a linear function to the data.
Start by defining a linear function
Now we can use regression to find the parameters
Then make a plot with the trendline added
fig, ax = plt.subplots()
## Your task: Choose points to evaluate the fit
## Look at the x-axis above and estimate a value for the minimum and maximum.
## Then calculate the fit at those points
max_x = ...
min_x = ...
T_trend = np.array([min_x, max_x]) # Select or make some appropriate points for the plotting evaluating the trendline.
y_fit = vant_hoff(T_trend, ..., ...)
## Your task: Plot the fit
...
## Plots the data
ax.plot(df['1/T'], df['ln(Keq)'], 'o')
## EXTRA: You do not need to edit or understand this.
## It adds axis labels to the plot you made.
ax.set_xlabel('1/T [K]')
ax.set_ylabel(r'$\ln(K_{eq})$');Calculate the reaction enthalpy and entropy, at \(T = 298 \ \mathrm{K}\), assuming that \(R = 8.314472 \times 10^{-3} \ \frac{\mathrm{kJ}}{\mathrm{mol}\cdot\mathrm{K}}\). Is the reaction endo- or exothermic?
The units of a slope is the units of the y-axis dvided by the units of the x-axis.
In the van’t Hoff plot the y-axis is unitless and the x-axis is in 1/K.
Calculate the free energy change, \(\Delta G^\circ\), at \(T = 298 \ \mathrm{K}\), assuming that \(R = 8.314472 \times 10^{-3} \ \frac{\mathrm{kJ}}{\mathrm{mol}\cdot\mathrm{K}}\)
Below is shown the structure of ATP.
\(\Delta G^\circ\) for the hydrolysis of ATP to ADP and Pi is \(-30.5 \ \mathrm{kJ/mol}\).
In cells at \(298 \ \mathrm{K}\) the steady state concentrations of ATP, ADP og Pi resulted in a value for the reaction quotient \(Q = \frac{\mathrm{[ADP]}[\mathrm{Pi}]}{[\mathrm{ATP}]} = 0.002\). Calculate ΔG’ under these conditions.
Assuming that the temperature within these cells is \(298 \ \mathrm{K}\), are these cells in equilibrium with respect to the hydrolysis of ATP?
Calculate the number of moles of ATP synthesized by a resting person in a period of 24 hours, assuming that the energy consumption by this person is \(8368 \ \mathrm{kJ}\) per 24 hours and that 40% of this energy is converted directly into ATP without loss of energy. Use the ΔG’ value calculated in question a, and assume that \(298 \ \mathrm{K}\).
You can use np.absolute to calculate the absolute value of a number.
Next, calculate the average number of times each ATP molecule is turned over (synthesis followed by hydrolysis) in 24 hours when the total amount of ATP in the body equals \(0.2 \ \mathrm{mol}\).
---
title: Week 46
engine: jupyter
categories: ['regression', 'calculation', 'data', 'plotting']
format-links:
- text: "Open in Google Colab"
href: "https://colab.research.google.com/github/au-mbg/fysisk-biokemi/blob/built-notebooks/built_notebooks/student/week_46.ipynb"
icon: box-arrow-up-right
---
{{< include tidbits/_install_import.qmd >}}
---
{{< include intros/regression_intro.qmd >}}
---
{{< include exercises/estim-bindin-affini.qmd >}}
---
{{< include exercises/dialys-exper.qmd >}}
---
{{< include exercises/inter-bindin-data.qmd >}}
<!-- ---
{{< include exercises/adp-bindin-pyruva-kinase.qmd >}} -->
---
{{< include exercises/deter-type-streng-coope.qmd >}}
---
{{< include exercises/compe-ligand-bind.qmd >}}
---
{{< include exercises/diff_q_keq.qmd >}}
---
{{< include exercises/deter_deltah_for_react.qmd >}}
---
{{< include exercises/atp_hydrolysis.qmd >}}