try:
    import fysisk_biokemi
    print("Already installed")
except ImportError:
    %pip install -q "fysisk_biokemi[colab] @ git+https://github.com/au-mbg/fysisk-biokemi.git"
Evaluate the following expressions using Python
Recall that the basic arithmetic operations are represented in Python as follows:

- Addition: `+`
- Subtraction: `-`
- Division: `/`
- Multiplication: `*`
- Exponentiation (\(x^y\)): `x**y`
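A few quick examples of these operators, using arbitrary numbers chosen only for illustration:

```python
# Basic arithmetic in Python (numbers chosen only for illustration)
addition = 7 + 2        # 9
subtraction = 7 - 2     # 5
division = 7 / 2        # 3.5
multiplication = 7 * 2  # 14
power = 7**2            # 49, x**y means x raised to the power y

print(addition, subtraction, division, multiplication, power)  # 9 5 3.5 14 49
```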
The cell below defines an array \(a\) with all the numbers from 0 to 9.
For each entry \(a_i\) in the array, calculate the expression \(b_i = a_i / 3 + a_i^2\) and assign the result to a new array \(b\). Use the fact that you can do elementwise operations with an array; do not calculate it separately for every element of the array.
With an array, all the basic arithmetic operations are done elementwise, so the expression \(x^2 + 1\) can be computed for all elements of an array `x` at once, just as if `x` were a single number. Similarly, expressions can involve adding arrays together.
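As an illustration, here is a sketch using an arbitrary example array `x` (not the array `a` from the exercise). Each operation is applied to every element without writing a loop:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# x**2 + 1 is computed for every element at once
y = x**2 + 1
print(y)  # [ 2  5 10 17]

# Arrays of the same length can also be added elementwise
z = x + y
print(z)  # [ 3  7 13 21]
```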
Variables are an essential part of a program. Calculate the following, assigning each intermediate result to a variable:
\[ \begin{aligned} a &= 5 \times 3 \\ b &= a + 9 \\ c &= \frac{4}{3}a + b + 2 \end{aligned} \]
Bonus question: Why does `c` print as a decimal number when the others print without decimals?
Parentheses are very important: one misplaced parenthesis and the universe could have been drastically different; if parentheses had been missing, it is likely the universe would never have existed.
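To see why, compare the same numbers with and without parentheses. Python follows the usual mathematical precedence (`**` first, then `*` and `/`, then `+` and `-`), so parentheses are needed whenever you want a different order; the numbers below are arbitrary:

```python
# Without parentheses, multiplication is done before addition
without_parentheses = 2 + 3 * 4   # 3 * 4 first: 14
# With parentheses, the sum is evaluated first
with_parentheses = (2 + 3) * 4    # 20
print(without_parentheses, with_parentheses)  # 14 20

# A fraction like (6 + 4) / 2 needs parentheses around the numerator
print(6 + 4 / 2)    # 8.0
print((6 + 4) / 2)  # 5.0
```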
Using the variables `a`, `b` and `c` you calculated in the previous exercise, evaluate the following expressions:
For repeated computations, it's good practice to define a function. Define a function that calculates the following expression:
\[ f(x) = \frac{Ax + B + C}{A + B} \]
Where \(A = 10\), \(B=5\), \(C=2.5\)
Now evaluate the function for \(x = 62.5\).
There are many ways to plot; we will be using the `matplotlib` library, but the concepts are similar across most ways of plotting.
We can make a plot of a straight line between two points \((x_1, y_1)\) and \((x_2, y_2)\) by passing their x- and y-coordinates to `ax.plot`.
`ax` is a special type (like `int`, `float` and `str`); you can think of it as the box that contains the plot.
If we also want to connect to a third point \((x_3, y_3)\), we simply include its coordinates as well.
Notice that with three sets of points we plot two line segments.
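A minimal sketch of such a plot, using made-up coordinates for three points:

```python
import matplotlib.pyplot as plt

# Made-up coordinates of three points (x_1, y_1), (x_2, y_2), (x_3, y_3)
x = [0, 1, 2]
y = [0, 2, 1]

# plt.subplots() creates a figure (fig) and the box that holds the plot (ax)
fig, ax = plt.subplots()

# ax.plot connects consecutive points with straight line segments
ax.plot(x, y)
plt.show()
```

Dropping the last entry of each list gives the two-point straight line.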
Make a plot with lines between the following 4 points.

Start by defining an array for both `x` and `y`.

Now use `ax.plot` to make the plot.
Consider the following:

In order to actually plot a square, you will need to update `x` and `y`; they both need to contain five numbers such that four line segments are plotted.

Copy your code for making the plot from the previous exercise to the cell below.
Often the data we plot can broadly be thought of as originating from a function, that is we have
\[ y = f(x) \]
where \(f\) is the function. This might for example be an experiment that produces some output \(y\) given some input \(x\), or it might be a traditional mathematical function like \(y = x^2\). Plotting this type of data is exactly the same as plotting a square; it just usually consists of many more data points, resulting in many line segments that produce a smooth-looking curve.
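To see the effect of the number of points, the sketch below plots \(y = x^2\) twice: once with only 5 points, where the individual line segments are visible, and once with 100 points, where the curve looks smooth. The interval from -3 to 3 is an arbitrary choice:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Few points: the individual line segments are clearly visible
x_coarse = np.linspace(-3, 3, 5)
ax.plot(x_coarse, x_coarse**2)

# Many points: the segments are so short that the curve looks smooth
x_fine = np.linspace(-3, 3, 100)
ax.plot(x_fine, x_fine**2)

plt.show()
```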
Make a plot of the function
\[ y = \mathrm{e}^x \]
Once you've made the plot, try changing the number of points in `x` by changing the last argument in the call to `np.linspace`.

Remember that `np.exp` is used to calculate the exponential function.
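A short sketch of what these two functions return, using an arbitrary interval from 0 to 1:

```python
import numpy as np

# np.linspace(start, stop, num) returns num evenly spaced numbers from start to stop
x = np.linspace(0, 1, 5)
print(x)  # [0.   0.25 0.5  0.75 1.  ]

# Increasing the last argument gives more points in the same interval
x_more = np.linspace(0, 1, 11)
print(len(x_more))  # 11

# np.exp applies the exponential function e^x to every element
y = np.exp(np.array([0.0, 1.0]))
print(y)  # e^0 = 1 and e^1 is about 2.718
```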
Often we want to plot multiple functions in the same figure as it enables us to compare them. Luckily, this is quite simple!
Plot the function
\[ y = \mathrm{e}^{ax} \]
With \(a = \left[1, \cfrac{1}{2}, 1.2\right]\) in the same plot producing three curves.
We will generally need to customize plots a little; for example, the plot from the previous exercise doesn't have axis labels, there's no way of telling which curve is which, and it has no title.
To add labels to the x- and y-axes, we use the two functions
ax.set_xlabel('String with the name of the x-axis')
ax.set_ylabel('String with the name of the y-axis')
Information about the curves can be added by giving each of them a label, which is passed as the `label` argument to the `ax.plot` call, e.g. `ax.plot(x, y, label='my curve')`:

- The `label` argument is given when the curve is plotted.
- `ax.legend()` is called to tell matplotlib to show a legend containing the labels of all the curves that have been given one.
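A minimal sketch combining these steps for two curves in the same figure. The functions \(\sin x\) and \(\cos x\) are arbitrary examples, and `ax.set_title` (not described above, but a standard matplotlib method) adds the title:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()

# Each call to ax.plot adds another curve to the same axes;
# the label argument names the curve for the legend
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Two curves in one plot')

# Show a legend with the labels of all labelled curves
ax.legend()
plt.show()
```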
Copy your code from the previous exercise and add labels for both the x- and y-axis as well as for each curve; for example, you can label the curves according to the value of the parameter \(a\), like `a = 1` etc.
There are many other customization options that can be added in the same way as `label`, i.e. as extra arguments to `ax.plot`. Some useful examples are:

- `linestyle`: Controls whether the line is solid, dashed or dotted, using `-`, `--` and `:` respectively. So `ax.plot(..., ..., linestyle=":")` produces a dotted line.
- `color`: Controls the color of the line; valid options are listed at https://matplotlib.org/stable/gallery/color/named_colors.html
- `linewidth`: Sets the width of the line; can be any positive number.
- `alpha`: Controls the transparency of the plot, with `0` being fully transparent and `1` being fully visible.

You can try some of these options for the plot you've made above if you want to. There are many other ways of customizing plots for different situations; the matplotlib gallery shows a number of them.
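A sketch that uses each of these options once, on arbitrary example curves (`'tab:red'` and `'tab:blue'` are two of the named matplotlib colors):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 50)

fig, ax = plt.subplots()
# Solid black line
ax.plot(x, x, linestyle='-', color='black', label='solid')
# Dashed, thicker red line
ax.plot(x, x**2, linestyle='--', color='tab:red', linewidth=3, label='dashed, thick')
# Dotted, half-transparent blue line
ax.plot(x, x**3, linestyle=':', color='tab:blue', alpha=0.5, label='dotted, transparent')
ax.legend()
plt.show()
```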
Sometimes we don’t want to connect each point with a line segment but just show the points in a scatter plot.
The next cell makes some data
Try plotting it as in the previous exercises.

You will see a very strange plot, like when you scribble out a word on a piece of paper. Clearly a line plot is not a very good way of showing this data! There are at least two ways of making a scatter plot:
- `ax.plot(x, y, 'o')`: Quick and dirty way of just showing the points and not the lines.
- `ax.scatter(x, y)`: Function specifically for making these types of plots, with its own set of customization arguments.

Try either, or both, of these ways in the cell below.
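A sketch of both approaches side by side. The data here are random numbers generated only for this example (not the data from the cell above):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical random data, generated only for this example
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, 50)
y = rng.uniform(0, 10, 50)

fig, axes = plt.subplots(1, 2, figsize=(7, 3))

# Way 1: ax.plot with the format string 'o' draws markers without lines
axes[0].plot(x, y, 'o')
axes[0].set_title("ax.plot(x, y, 'o')")

# Way 2: ax.scatter, which is made specifically for scatter plots
axes[1].scatter(x, y)
axes[1].set_title('ax.scatter(x, y)')

plt.show()
```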
In the accompanying Excel file (`averag-prope-amino-acids.xlsx`), you will find a table that contains the molecular weight of the 20 common amino acid residues, i.e. their weight as residues in a peptide chain. Additionally, you will find their relative frequency in E. coli proteins, where a frequency of 0.01 means that this residue constitutes 1 % of the residues in a protein.
Use the widget below to load the `averag-prope-amino-acids.xlsx` file.

The command below will display the table as a `DataFrame`.
Calculate the average molecular weight of a residue in a protein. To do this our procedure will be as follows
In the cell below, finish the calculation of `weight_times_freq` by extracting the `"MW of AA residue"` column and the `"Frequency in proteins"` column and multiplying them together.

You can index the DataFrame by column name; for example, `df["MW of AA residue"]` gives the `"MW of AA residue"` column.
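A sketch of column indexing and elementwise multiplication on a small made-up DataFrame that reuses the column names from the exercise. The residue names and numbers are invented and are NOT the real amino acid data:

```python
import pandas as pd

# A small made-up table with the same column names as the exercise
example = pd.DataFrame({
    "Name": ["Residue 1", "Residue 2", "Residue 3"],
    "MW of AA residue": [100.0, 150.0, 200.0],
    "Frequency in proteins": [0.5, 0.25, 0.25],
})

# Index with the column name to get that column (a pandas Series)
mw = example["MW of AA residue"]
print(mw.tolist())  # [100.0, 150.0, 200.0]

# Columns can be multiplied elementwise, row by row
product = example["MW of AA residue"] * example["Frequency in proteins"]
print(product.tolist())  # [50.0, 37.5, 50.0]
```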
Arrays also allow us to operate on every element in the array at the same time, so arrays can be added, subtracted, multiplied, divided, etc.; see for example this figure.
The syntax `f"{average_mw = :.3f}"` is just a way of printing the value with a nicer format; in this case we print the value to 3 decimal places. In Python these are called f-strings; you don't need to understand the details at the moment.
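For the curious, a small sketch of the two parts of this syntax, using an arbitrary value (the `=` form needs Python 3.8 or newer):

```python
average = 2 / 3

# The '=' prints the variable name too; ':.3f' rounds to 3 decimal places
print(f"{average = :.3f}")  # average = 0.667

# Without the '=', only the formatted value is printed
print(f"{average:.3f}")  # 0.667
```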
What would the molecular weight of a 300-residue protein most likely be, if you did not know its sequence?
In many projects, you will be working with a mixture of proteins. This could for example be a cell lysate or a biological fluid for protein abundance analysis, or the early stages of a protein purification process. In these situations, you cannot work with a molecule-specific extinction coefficient. Instead, we use average values, which we will determine below.
Calculate the average concentration of amino acid residues in a protein mixture at 1 mg/mL.
Calculate the absorbance from such a mixture under the assumption that only Trp and Tyr contribute.
For a cell lysate, you measure an absorbance of 0.78 at a path length of 0.5 cm. What is the protein concentration?
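For reference, the relation behind this calculation is the Beer-Lambert law, which can be rearranged to give the concentration:

\[ A = \epsilon c l \quad \Longrightarrow \quad c = \frac{A}{\epsilon l} \]

where \(A\) is the (unitless) absorbance, \(\epsilon\) the molar extinction coefficient in \(\mathrm{M^{-1}\,cm^{-1}}\), \(c\) the molar concentration in \(\mathrm{M}\) and \(l\) the path length in \(\mathrm{cm}\). A molar concentration is converted to a mass concentration by multiplying by the molar mass, since \(\mathrm{mol/L} \times \mathrm{g/mol} = \mathrm{g/L} = \mathrm{mg/mL}\).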
# Set known values:
A = 0.78 # Unitless
l = 0.5 # cm
# Extract frequencies
freq = df.set_index("Name")["Frequency in proteins"]
f_trp = freq["Tryptophan (Trp/W)"]
f_tyr = freq["Tyrosine (Tyr/Y)"]
# Calculate extinction coefficient in [L/(mol cm)]
eps_mix = ...
# Calculate molar concentration [mol/L]
c_res = ...
# Calculate concentration [g/L] = [mg/mL]:
conc_mg_per_mL = ...
print(f"Protein concentration = {conc_mg_per_mL:.3f} mg/mL")
In this exercise we will analyze the spectra of apo- and holo-myoglobin. The dataset is given in `uv-spec-apo-holo-myo.csv`.
Use the widget to load the dataset.
Run this cell after having uploaded the file in the cell above.
Using `matplotlib`, plot each spectrum in the same figure as line plots.

Based on the spectra, explain what ApoMb and HoloMb represent.
You have learned that pure proteins without any UV/Vis-absorbing prosthetic groups bound have basically no absorbance at wavelengths λ > 320 nm. Nevertheless, ApoMb still shows some absorbance above 320 nm. In this case, it can be explained by the fact that ApoMb was generated from HoloMb by a procedure that will not be explained here.
Explain why ApoMb in the spectrum above absorbs light at λ>320 nm.
Give a rough estimate of the efficiency of the chosen procedure of heme-group removal.
Are there any isosbestic points between the two spectra?
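An isosbestic point is a wavelength where the two spectra have the same absorbance, i.e. where the curves cross. Besides reading crossings off the plot, they can be located numerically. The function `find_crossings` below is a hypothetical helper (not part of any library), written as a sketch under the assumption that both spectra are sampled at the same, increasing wavelengths; the example spectra are synthetic, not the myoglobin data:

```python
import numpy as np

def find_crossings(wavelengths, spectrum_1, spectrum_2):
    """Return the wavelengths at which two spectra have equal values.

    Assumes both spectra are sampled at the same, increasing wavelengths.
    Between two samples the spectra are approximated by straight lines.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    diff = np.asarray(spectrum_1, dtype=float) - np.asarray(spectrum_2, dtype=float)

    crossings = []
    for i in range(len(diff) - 1):
        if diff[i] == 0:
            # The spectra are exactly equal at this sample
            crossings.append(float(wavelengths[i]))
        elif diff[i] * diff[i + 1] < 0:
            # The difference changes sign between sample i and i + 1,
            # so find where the straight line between them crosses zero
            fraction = diff[i] / (diff[i] - diff[i + 1])
            step = wavelengths[i + 1] - wavelengths[i]
            crossings.append(float(wavelengths[i] + fraction * step))
    if len(diff) > 0 and diff[-1] == 0:
        crossings.append(float(wavelengths[-1]))
    return crossings


# Synthetic example spectra (made-up numbers, not the myoglobin data)
wl = [400, 410, 420, 430]
spectrum_a = [1.0, 0.8, 0.6, 0.4]
spectrum_b = [0.5, 0.6, 0.7, 0.8]
print(find_crossings(wl, spectrum_a, spectrum_b))  # one crossing near 416.7 nm
```

Real spectra are noisy, so small wiggles where the two curves nearly touch can also show up as crossings; it is worth comparing the result with the plot.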
To get a better understanding of the causes of the different spectra, you can compare to the literature. The figure below shows the absorbance spectra of three states of myoglobin.
\[ \begin{aligned} \mathrm{MbFe(II)O_2} &\Rightarrow \mathrm{oxyMb} \\ \\ \mathrm{MbFe(III)} &\Rightarrow \mathrm{metMb} \\ \\ \mathrm{MbFe(II)} &\Rightarrow \mathrm{deoxyMb} \end{aligned} \]
metMb \((\mathrm{MbFe(III)})\) is normally described as ‘aged’ myoglobin. What does this mean in terms of the bound iron?
Give a qualitative explanation for the observed change in absorbance of metMb compared to fresh oxy-/deoxyMb.
Is the spectral difference from deoxyMb to metMb a redshift or a blueshift?
You would like to set up an experiment where the absorbance of myoglobin at defined wavelength(s) is used to measure the level of \(\mathrm{O}_2\) binding. Sketch what the absorbance spectra would look like when going from deoxyMb and continuously increasing the concentration of \(\mathrm{O}_2\) (draw this for five different \(\mathrm{O}_2\) concentrations, going from pure deoxyMb to pure oxyMb).
In this exercise we will learn how Python is excellent for handling datasets with many data points and how it can be used to apply the same procedure to all the data points at once.
A researcher wants to determine the concentration of two proteins in blood plasma that are suspected to be involved in the development of an autoimmune disease. 500 patients and 500 healthy individuals were included in the study, and the absorbance of the two purified proteins from all blood plasma samples was measured at 280 nm. The molecular weights and extinction coefficients of the two proteins are given in the table below.
| Protein | \(M_w\) \([\text{kDa}]\) | \(\epsilon\) \([\text{M}^{-1}\text{cm}^{-1}]\) |
|---|---|---|
| 1 | 130 | 180000 |
| 2 | 57 | 80000 |
Use the widget to load the dataset as a dataframe from the file protei-blood-plasma.xlsx
Run this cell after having uploaded the file in the cell above.
Calculate the molar concentration of the two proteins in all samples; the light path for every measurement is 0.1 cm.

It is always a good idea to assign known values to variables.
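A sketch of how the Beer-Lambert law can be applied to many samples at once. The values of \(\epsilon\), \(M_w\) and the path length are those given above for protein 1, while the four absorbances are made up (NOT the real dataset). The same expression works on a DataFrame column, since it also supports elementwise arithmetic:

```python
import numpy as np

# Known values (protein 1 from the table above)
epsilon_1 = 180000  # M^-1 cm^-1
mw_1 = 130e3        # g/mol (130 kDa)
path_length = 0.1   # cm

# Made-up absorbances at 280 nm for four samples (NOT the real dataset)
absorbance = np.array([0.9, 1.8, 0.45, 3.6])

# Beer-Lambert: c = A / (epsilon * l), in mol/L, for every sample at once
conc_molar = absorbance / (epsilon_1 * path_length)

# mol/L * g/mol = g/L, which is the same as mg/mL
conc_mg_per_ml = conc_molar * mw_1

print(conc_molar)
print(conc_mg_per_ml)
```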
Add another set of four columns containing the concentrations in mg/mL.
Now that we have the concentrations, calculate the mean concentration in each of the four categories.
When displaying the DataFrame above, we indexed it with `names` as `df[names]`. We can do the same to compute something over just those four columns.

For example, if we have a `DataFrame` called `example_df`, we can calculate the mean over the rows as `example_df.mean(axis=0)`.

Here `axis=0` means that we apply the operation over the first axis, which by convention is the rows, so we get one value per column. The figure below visualizes this.
Calculate the standard deviation.

The standard deviation can be calculated using the `.std` method, which works in the same way as the `.mean` method we used above.
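A sketch of both methods on a small made-up `example_df` with two columns and three rows. Note that pandas' `.std` computes the sample standard deviation (`ddof=1`) by default:

```python
import pandas as pd

# A small made-up DataFrame with two columns and three rows
example_df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, 20.0, 60.0],
})

# axis=0 combines the rows, giving one value per column
means = example_df.mean(axis=0)
print(means)  # mean of A is 2.0, mean of B is 30.0

# .std works the same way (sample standard deviation, ddof=1)
stds = example_df.std(axis=0)
print(stds)   # std of A is 1.0, std of B is about 26.46
```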
Consider the following questions
The spectra of many fluorescent proteins can be found at the website: www.fpbase.org. Go to FPbase and search for “mCherry”.
Find the following parameters for the protein.

Save them to separate variables in the cell below.
What is the absorbance of a 1 µM solution of mCherry at its absorption maximum?
The sequence of the protein is also given. From this, determine the extinction coefficient at 280 nm.

Start by taking the sequence from the website and assigning it to the variable `sequence` in the cell below.

Now use the sequence to calculate the extinction coefficient: finish the code below (or reuse the function you implemented in a previous exercise!).
The excitation and emission spectra can be downloaded as a csv-file by clicking the download icon as highlighted below
Use the widget below to load the dataset as a DataFrame
Run the next cell after uploading the file
Make your own plot showing the excitation and emission spectra of “mCherry” using the above data.
To plot, we use the `matplotlib` package. Plots are generally just straight lines connecting points; with enough points we get a smooth-looking figure.

For example, we can plot a line connecting three datapoints. To do so, we first create a `fig` and an `ax`; the `ax` object is the box where our plot is created.
There are many ways of customizing plots; you will see different ones in the exercises, but by no means all of them. If you are interested, you can find more information in the matplotlib documentation.
You don't have to worry about the `NaN` values in the dataset when plotting; matplotlib just skips plotting the affected line segments.
What is the Stokes shift of mCherry?
If you have two arrays `A` and `B`, you can find the entry in `A` corresponding to the largest value in `B` with `A[np.argmax(B)]`.

`np.argmax` stands for argument maximum, meaning that it finds the index of the maximum value in a given array. The figure below illustrates this.
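A small worked example with hypothetical data, where `A` plays the role of wavelengths and `B` the measured signal:

```python
import numpy as np

# Hypothetical data: wavelengths (A) and the signal at each wavelength (B)
A = np.array([400, 450, 500, 550, 600])
B = np.array([0.1, 0.4, 0.9, 0.6, 0.2])

index_of_max = np.argmax(B)  # position of the largest value in B
print(index_of_max)          # 2
print(A[index_of_max])       # 500
```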
import numpy as np
# Start by extracting the wavelengths from the DataFrame
wavelengths = ...
# Find the indices
ex_max_idx = np.argmax(...)
em_max_idx = ...
# Use the index to find the corresponding wavelengths
lambda_ex = wavelengths[...]
lambda_em = ...
stokes_shift = ...
print(f"{lambda_ex = :d}")
print(f"{lambda_em = :d}")
print(f"{stokes_shift = :d}")
What colors of light correspond to the excitation and emission maxima, respectively?
You investigate a protein that contains neither tyrosine nor cysteine residues. A 37 µM solution gives an absorbance of 0.41 at 280 nm with a light path of 1 cm.
How many tryptophan residues does the protein contain?
You would like to conduct a protein stability experiment at an absorbance of 0.8 (path length 1 cm). What concentration should you use?
The cell below loads extinction and emission spectra for tryptophan in aqueous buffer. (The data files are `trypt-absor-fluor-emission.xlsx` and `trypt-absor-fluor-extinction.xlsx`.)
| | wavelength_(nm) | emission_(AU) |
|---|---|---|
| 0 | 280.0 | 379 |
| 1 | 280.5 | 396 |
| 2 | 281.0 | 403 |
| ... | ... | ... |
| 396 | 478.0 | 5024 |
| 397 | 478.5 | 4950 |
| 398 | 479.0 | 4813 |

399 rows × 2 columns
| | wavelength_(nm) | molar_extinction_(cm-1/M) |
|---|---|---|
| 0 | 219.75 | 12488 |
| 1 | 220.00 | 12872 |
| 2 | 220.25 | 12428 |
| ... | ... | ... |
| 396 | 318.75 | 14 |
| 397 | 319.00 | 18 |
| 398 | 319.25 | 45 |

399 rows × 2 columns
Make a plot showing the two spectra and determine the Stokes shift.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(7, 3), layout='constrained')
# Emission
ax = axes[0]
ax.set_title('Emission')
# Replace ... with your code.
ax.plot(..., ..., label='Emission', color='C0')
ax.set_ylabel('Emission')
ax.set_xlabel('Wavelength [nm]')
# Extinction
ax = axes[1]
ax.set_title('Extinction')
# Replace ... to plot the extinction spectrum
...
ax.set_ylabel('Extinction')
ax.set_xlabel('Wavelength [nm]')
plt.show()
Recall that we have used `np.argmax` before to determine the index of the maximum; use it to calculate the Stokes shift.
max_index = np.argmax(...) # Replace ... to get the index of the maximum of the emission spectrum
emission_wavelength_max = df_emission['wavelength_(nm)'][max_index]
max_index = ... # Replace ... to get the index of the maximum of the extinction spectrum
extinction_wavelength_max = df_extinction[...][...] # Use the index to get the wavelength of the maximum.
print(f"{emission_wavelength_max = :.1f} nm")
print(f"{extinction_wavelength_max = :.1f} nm")
Next, you compare the fluorescence emission spectra for your protein at pH 7 and at pH 2. You can find the data in the files `trypt-absor-fluor-ph2.xlsx` and `trypt-absor-fluor-ph7.xlsx`; the cell below also loads these datasets.
| | Wavelength(nm) | Fluo_Int |
|---|---|---|
| 0 | 311.623037 | 0.890000 |
| 1 | 314.450262 | 1.128668 |
| 2 | 315.863874 | 1.128668 |
| ... | ... | ... |
| 43 | 367.068063 | 27.990971 |
| 44 | 368.638743 | 27.313770 |
| 45 | 370.052356 | 26.636569 |

46 rows × 2 columns
| | Wavelength(nm) | Fluo_Int |
|---|---|---|
| 0 | 311.623037 | 0.890000 |
| 1 | 314.450262 | 1.128668 |
| 2 | 315.863874 | 1.128668 |
| ... | ... | ... |
| 43 | 367.068063 | 27.990971 |
| 44 | 368.638743 | 27.313770 |
| 45 | 370.052356 | 26.636569 |

46 rows × 2 columns
Make a plot comparing the two emission spectra. Which resembles the spectrum of tryptophan in water most?
Explain why the same protein gives such different spectra at pH 2 and pH 7.
The protein sequence of human myoglobin is given below.

We want to calculate the extinction coefficient of this protein. We have seen that this can be calculated using the formula
\[ \epsilon(280\ \mathrm{nm}) = N_{Trp} \epsilon_{Trp} + N_{Tyr} \epsilon_{Tyr} + N_{Cys} \epsilon_{Cys} \tag{1}\]
where \(N_{Trp}\) is the number of tryptophans in the protein (and likewise for the other two terms), and the three constants \(\epsilon_{Trp}\), \(\epsilon_{Tyr}\) and \(\epsilon_{Cys}\) are given as
\[ \begin{aligned} \epsilon_{Trp} &= 5500 \ \mathrm{M^{-1}\,cm^{-1}} \\ \epsilon_{Tyr} &= 1490 \ \mathrm{M^{-1}\,cm^{-1}} \\ \epsilon_{Cys} &= 125 \ \mathrm{M^{-1}\,cm^{-1}} \end{aligned} \]
To evaluate the formula, we need the counts of the relevant residues. We can use Python to get them: strings have a `count` method, so, for example, `sequence.count("W")` counts the tryptophans (W) in a string called `sequence`.
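A sketch with a short made-up sequence (not the myoglobin sequence). In the one-letter code, W is tryptophan, Y is tyrosine and C is cysteine:

```python
# A short made-up sequence, for illustration only (not the myoglobin sequence)
sequence = "MWKYCWAYL"

# str.count returns how many times the given letter occurs in the string
n_trp = sequence.count("W")
print(n_trp)  # 2
```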
In the cell below, find the number of each of these residues.

You can check what Python has stored in each variable by using `print`.

Use Equation 1 to calculate the extinction coefficient of human myoglobin.
What are the units of this value?
ProtParam is an online tool that calculates various physical and chemical parameters from a given protein sequence and is used worldwide in research laboratories.
Go to ProtParam at this link: https://web.expasy.org/protparam/ and paste the sequence and click Compute Parameters. You should then see the calculated parameters, similar to the image below.
On the output page you will see the number of residues, does that match your calculation?
Using the extinction coefficient and the molecular weight given by ProtParam, calculate the absorbance at 280 nm of a myoglobin solution at a concentration of 1 mg/mL in a cuvette with a light path of 1 cm.
Remember to convert the concentration to \(\mathrm{mol/L}\).
This value is what is known as the A280(0.1%) of a protein, i.e. the absorbance of a given protein at a concentration of 0.1% weight/volume (= 1 g/L = 1 mg/mL).
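Written out with the Beer-Lambert law for \(c = 1\ \mathrm{g/L}\) and \(l = 1\ \mathrm{cm}\), the molar concentration is \(c = (1\ \mathrm{g/L}) / M_w\), so

\[ A_{280}(0.1\%) = \epsilon \cdot \frac{1\ \mathrm{g/L}}{M_w} \cdot 1\ \mathrm{cm} \]

With \(\epsilon\) in \(\mathrm{M^{-1}\,cm^{-1}}\) and \(M_w\) in \(\mathrm{g/mol}\), the units cancel and the value is numerically just \(\epsilon / M_w\).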
We have now calculated the extinction coefficient of a protein. Next, we will make our code more reusable so that it can easily be applied to other proteins.
A function in Python is a set of instructions, like a recipe, that can be defined and reused multiple times. The syntax is like this
- The `def` command is used to define the function's name, here `my_function`, and to state its inputs, like the name and ingredients of a recipe.
- The function can `return` something, like the final product of a recipe.
Note that the function is not executed by doing this, just as a cake isn't baked by writing down the recipe; in order to actually use the function, it needs to be called. This is also how we have already used other functions like `print`.
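A minimal sketch of defining and then calling a function. The body of `my_function` here is a hypothetical example, not the function from the exercise:

```python
# A hypothetical example function: a "recipe" with two inputs
def my_function(a, b):
    result = a * b + 1  # the instructions of the recipe
    return result       # hand the final product back to the caller

# Defining the function does not run it; calling it does
value = my_function(2, 3)
print(value)  # 7
```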
We make the code reusable by defining a function that does the necessary operations for a given sequence. In this way, the code can be reused for any sequence.
Finish implementing the body of the function below, note that you have already written all the required code - you just need to copy it into the function.
It's always a good idea to check that functions do what we expect, so confirm that it gives the same result for human myoglobin as we calculated before.
The largest known protein is titin; the cell below loads the sequence of titin and prints a few bits of information about it. (You can also find the full sequence in the dataset `extin-coeff-human-myogl.txt`.)
Use your function to calculate the extinction coefficient of titin.
You don't need to understand the code below; it's just meant to illustrate that knowing some Python will allow you to explore the topics that interest you in more detail.

In general, Python is very powerful for exploring properties of sequences. For example, the cell below calculates the number of residues between consecutive tryptophan (W), tyrosine (Y) and cysteine (C) residues in the titin sequence and plots their distributions.
import numpy as np
import matplotlib.pyplot as plt

def get_distance(sequence, letter):
    """Return the gaps (in residues) between consecutive occurrences of
    `letter` in `sequence`, and the number of occurrences."""
    # Positions (0-based indices) where the residue matches the letter
    positions = np.flatnonzero(np.array(list(sequence)) == letter)
    count = len(positions)
    # Difference between each position and the previous one
    distance = np.diff(positions)
    return distance, count

letters = ['W', 'Y', 'C']

fig, axes = plt.subplots(1, 3, figsize=(3*3, 3), sharey=True)
axes = axes.flatten()

for ax, letter in zip(axes, letters):
    distance, count = get_distance(titin_sequence, letter)
    # Normalized histogram of the distances, in bins of 25 residues
    ax.hist(distance, bins=np.arange(0, 500, 25),
            edgecolor='black', alpha=0.75, density=True)
    ax.set_xlabel('Distance [Number of residues]')
    # Summary text in the upper right corner of each panel
    info = f'Residue: {letter} \nCount: {count} \nMean distance: {np.mean(distance):.1f}'
    ax.text(0.975, 0.975, info, transform=ax.transAxes, ha='right', va='top')
---
title: Week 45
engine: jupyter
categories: ['calculation', 'data', 'plotting']
format-links:
- text: "Open in Google Colab"
href: "https://colab.research.google.com/github/au-mbg/fysisk-biokemi/blob/built-notebooks/built_notebooks/student/week_45.ipynb"
icon: box-arrow-up-right
---
{{< include tidbits/_install_import.qmd >}}
---
{{< include intros/calculation_intro.qmd >}}
---
{{< include intros/plotting_intro.qmd >}}
---
{{< include exercises/averag-prope-amino-acids.qmd >}}
---
{{< include exercises/uv-spec-apo-holo-myo.qmd >}}
---
{{< include exercises/protei-blood-plasma.qmd >}}
---
{{< include exercises/the-fluor-protei-mcherr.qmd >}}
---
{{< include exercises/trypt-absor-fluor.qmd >}}
---
{{< include exercises/extin-coeff-human-myogl.qmd >}}