Biological Modeling

Screenshot 1401-11-22 at 16.47.19

Random Walks and Turing Patterns

zebra stripes?

Random Walk Theorem

After n steps of unit length in a random walk, a particle will on average find itself a distance from its origin that is proportional to $\sqrt{n}$ .

consider more than one type of particle are randomly walking

Reaction-Diffusion System

parameters

kill rate ( $ k $ )

predator-prey rate ( $ r $ )

$ A + 2B \rightarrow 3B$

reactants $ \rightarrow $ products

the emergent patterns are called Turing patterns

feed rate ( $ f $ )

$ \frac{d[B]}{dt} = k [B] $

diffusion rate ( $ d_A $ )

if we raise the diffusion rate $ \rightarrow $ more of the reaction A + 2B → 3B.

if f = 2k we can see zebra stripes

we want to simulate system

cellular automaton model

concentration of a particle

$ [ A ] $

simulation

diffusion

convolution

feed

add $f.(1-[A])$ to each cell

kill

delete $k[B]$ from each cell

reaction

A + 2B → 3B

subtract $r.[A][B]^2$ from concentration of A and add it to concentration of B

in our sample model

$[A]_{\text{new}} = [A] + \triangle A + f.(1 - [A]) - r.[A][B]^2$ $[B]_{\text{new}} = [B] + \triangle B - k.[B] + r.[A][B]^2$

Applying these reaction-diffusion computations

a cellular automaton called the Gray-Scott model

fine-tuned system

very slight changes in parameter values can lead to significant changes in the system.

robust system

perturbations such as parameter variations do not lead to substantive changes in the ultimate behavior of the system

Network of Proteins

model protein interactions

network

biological networks has

network motifs

frequently recurring structures hidden within biological networks

central dogma

regulation

level

Protein

transcription factor

DNA

transcription factors

extracellular stimuli can activate a transcription factor via a system of signaling molecules that convey a signal through relay molecules to the transcription factor

week binding affinity to DNA in general

but strong for sequence motif

find the set of genes to which a transcription factor binds

ChIP-seq

chromatin immunoprecipitation sequencing

transcription factors and genes make a network

transcription factor network

nodes

organism’s proteins

edges

x to y if x is TF for y

loops

autoregulation

in E coli

most are negative

some parts are more common -> Motifs

Negative Autoregulation Motif

simulation

image activation of TF Y

reaction

kill rate

proteases enzymes degrade proteins

diffusion rate

same for both paricles

we ignore X kill rate because the system produce it in a way that its concentration become steady

Y repress itself

X -> X + Y

2Y -> Y

Repressilator

three components are transcription factors, and they repress themself in a loop

reaction

X + Y -> X

Y + Z -> Y

Z + X -> Z

all three particles are produced as the result of an activation process by some other transcription factor

hidden particle

I -> I + X

I -> I + Y

I -> I + Z

kill rate

same for all

diffusion rate

same for all

Start: X

because the biological systems are random

Noise is a feature of biological systems

The Feedforward Loop Motif

type-1 incoherent feedforward loop

simulation

image activation of protein Z, X and Y should be transcription factors

reaction

X -> X + Y

Y + Z -> Z

kill rate

we ignore X kill rate because the system produce it in a way that its concentration become steady

Y and Z kill rates are same

diffusion rate

same for both paricles

damped oscillation

have we non damping oscillator?

Repressilator

diagram

negative autoregulation makes system response time shorter

diagram

X -> X + Z

we have different type of that

result

Screenshot 1401-10-03 at 16.13.57

Screenshot 1401-10-03 at 16.13.45

Screenshot 1401-10-03 at 16.11.09

Screenshot 1401-10-03 at 16.14.07

Screenshot 1401-10-03 at 16.13.32

result

Screenshot 1401-10-03 at 16.14.16

The repressilator is robust to disturbance

chemotaxis

How does a simple organism like E. coli sense an attractant or repellent in its environment?

The movement of organisms like E. coli in response to a chemical stimulus is called chemotaxis.

E. coli and other bacteria

move toward

attractants like glucose and electron acceptors

move away from

repellents like $\text{Ni}^{2+}$ and $\text{Co}^{2+}$

Every E. coli cell has between five and twelve flagella distributed on its surface that can rotate both clockwise and counter-clockwise.

all flagella are rotating counter-clockwise

propel the cell forward at about 20 μm per second.

it is about ten times the length of the cell per second

a single flagellum rotates clockwise

bacterium stops and rotates.

alternate between periods of “running” in a straight line and then “tumbling”

Screenshot 1401-11-29 at 13.21.37

random walk through its environment

Bacteria are amazingly diverse.

but

variations in bacterial tumbling frequencies are relatively small.

In the absence of an attractant or repellent, E. coli stops to tumble once every 1 to 1.5 seconds, which is similar to most other bacteria.

we wonder why evolution might hold tumbling frequency constant across species.

Chemotaxis

perceive a change in its environment and react accordingly

signal transduction

cell identifies a stimulus outside the cell and then transmits this stimulus into the cell.

certain molecule’s extracellular concentration increases

receptor proteins

have more frequent binding with these molecules

This signal is then “transduced” via a series of internal chemical processes.

For example

cascade begins that eventually changes a transcription factor into an active state, so that it is ready to activate or repress the genes that it regulates.

E. coli

Screenshot 1401-11-29 at 13.27.29

receptor proteins that detect attractants such as glucose by binding to and forming a complex with these attractant ligands.

To model ligand-receptor dynamics, we will use a reversible reaction that proceeds continuously in both directions at possibly different rates.

type

receptor-ligand complex will dissociate

L + T <--> LT

if we have $l_o$ from L and $t_0$ from T the system should become equilibrium in some point

ligand collides with a receptor

LT -> L + T

L + T -> LT

reverse reaction

means

the rates of the forward
and reverse reactions are equal.

$k_{bind}$.[L].[T] = $k_{dissociate}$.[LT]

conservation of mass

[L] + [LT] = $l_0$
[T] + [LT] = $t_0$

$k_{bind}$.[L].[T] = $k_{dissociate}$.[LT]
[L] + [LT] = $l_0$
[T] + [LT] = $t_0$

we can find [L], [T] and [LT]

Analyzing the Coronavirus Spike Protein

2003

severe acute respiratory syndrome, or SARS

This mysterious new disease had crossed the Pacific Ocean within a week of entering Hong Kong.

SARS was deadly, killing close to 10% of those who became sick.

2017

researchers published the results of sampling horseshoe bats for five years in a cave in Yunnan province. They found that these bats harbored coronaviruses with remarkable genetic similarity to the virus causing SARS.

but no body cares 😭

coronavirus disease 2019 (COVID-19)

Coronaviruses

SARS coronavirus 2 (SARS-CoV-2)

their outer membranes are covered in a layer of spike proteins

ACE2 enzyme on a human cell’s membrane

both SARS-CoV and SARS-CoV2 has spike protein

but

SARS-CoV-2 is more infectious

SARS-CoV-2

genome : nearly 30,000 nucleotides

Upon sequence comparison

related to several coronaviruses isolated from bats and distantly related to SARS-CoV

spike protein : 1,273 amino acid sequence

what is the spike protein 3D shape?

Nature’s magic protein folding algorithm

This folding process occurs spontaneously for all proteins and without any outside influence

Screenshot 1401-11-03 at 15.30.59

the same polypeptide chain will almost always fold into the same 3-D structure in a manner of microseconds.

how does SARS-CoV-2 spike protein structure and function differ from the same protein in SARS-CoV?

Determine the spike protein’s shape from its sequence of amino acids?

how

experimentally

cryo-electron microscopy (cryo-EM)

X-ray crystallography

crystallize many copies of a protein

shine an intense beam of X-rays at the crystal

The light hitting the protein is diffracted

newer method

preserve thousands of copies of a protein in non-crystalline ice

examine these copies with an electron microscope

expensive

An X-ray crystallography experiment for a single protein costs upward of $2,000, and building an electron microscope can cost millions.

Protein structures that have been determined experimentally are typically stored in the PDB, which contains over 160,000 protein structures.

This number may seem large, but a recent study estimated that the 20,000 human genes translate into between 620,000 and 6.13 million protein isoforms (i.e., protein variants with slightly different structures).

computationally

hard

fine-tuned with respect to some mutations but robust with respect to others

As a result, two very different amino acid sequences can fold into proteins with similar structure and comparable function.

example

hemoglobin subunit alpha taken from three species: humans, short-fin mako sharks, and emus.

Polypeptide

Amino Acid

Screenshot 1401-11-12 at 20.53.42

peptide bond

Screenshot 1401-11-12 at 20.53.56

the bond is between C and N

the angle of this bond is fix but it's more flexible for C_alpha

Screenshot 1401-11-12 at 20.55.04

So we can have different shape for a chain of amino acids

secondary strucrure

Screenshot 1401-11-12 at 20.55.22

tertiary structure

Screenshot 1401-11-12 at 20.55.51

quaternary structure

Screenshot 1401-11-12 at 20.56.06

spike protein

Screenshot 1401-11-12 at 20.56.15

three chain -> three identical protein structures

each chain

two subunit, S1 and S2

each subunit

protein domains

the SARS-CoV-2 spike protein has a receptor binding domain (RBD) located on the S1 subunit of the spike protein that is responsible for interacting with the human ACE2 enzyme

side chain

Screenshot 1401-11-12 at 20.53.42

twenty types

Screenshot 1401-11-12 at 21.21.49

nonpolar = hydrophobic

nine -> polar

tend to find these amino acids tucked away on the interior of the protein.

A central theme of Chapter 3 was that a system of chemical reactions moves toward equilibrium. The same principle is true of protein folding; when a protein folds into its final structure, it reaches a conformation that is as chemically stable as possible.

potential energy = free energy = bonded energy + non-bonded energy

bonded energy

protein’s covalent bonds

non-bonded energy

electrostatic interactions

bond angles between adjacent amino acids

torsion angles

van der Waals interactions

attraction and repulsion forces from the electric charge between pairs of charged amino acids.

Two nearby amino acids of opposite charge may interact to form a salt bridge.

Conformations that contain salt bridges and keep apart similarly charged amino acids will therefore have a lower free energy component contributed by electrostatic interactions.

due to random chance, the negatively charged electrons in an atom could momentarily be unevenly distributed on one side of the nucleus.

negative charge on the side

induced dipole

one side of the atom may attract only the oppositely charged components of another atom

Screenshot 1401-11-12 at 21.46.15

Van der Waals forces refer to the attraction and repulsion between atoms because of induced dipoles.

As the protein folds, it seeks a conformation of lowest total potential energy

creating patterns from which the position of every atom in the protein can be inferred.

$ \min_{structures~for~P} potential~energy~(strucrure)$

ab initio

Predicting a protein’s structure using only its amino acid sequence

What we have?

force fields

local search

QUARK

Slow!

.pdb

How to compare to 3D structures?

for shapes S and T

translate S -> same center of mass

sample n, $s_i$ and $t_i$ points from two shapes

RMSD = $ \sqrt{\frac{1}{n} \sum (s_i - t_i)^2}$

rotate and flip S so minimize RMSD

Kabsch algorithm

$d(S, T) = \min RMSD$

Screenshot 1401-11-16 at 00.33.08

Screenshot 1401-11-16 at 00.33.14

the potential energy of a candidate protein shape

Homology Modeling

(comparative modeling)

the information contained in known structures to help us predict the structure of a protein with unknown structure.

SARS-CoV spike protein -> SARS-CoV2 spike protein

fragment assembly

input

primary structure

find template with MSA

use template for structures of conserved parts

use fragment library for non-conserved parts

SWISS-MODEL

Robetta

GalaxyWEB

ProDy

Critical Assessment of protein Structure Prediction (CASP)

In 2020, the 14th iteration of this contest (CASP14) was won in a landslide.

The second version of AlphaFold, a DeepMind project, vastly outperformed the world’s foremost structure prediction approaches

based on deep learning

Local Difference in Proteins

SARS-CoV

receptor binding motif (RBM)

has mutated so much

Do the mutations that SARS-CoV-2 has accumulated somehow make it easier for the virus to infect human cells?

What about Structure?

In addition to verifying the structure of the spike protein in both SARS-CoV and SARS-CoV-2, researchers also determined the structure of the RBD complexed with ACE2 in both SARS-CoV (PDB entry: 2ajf) and SARS-CoV-2 (PDB entry: 6vw1).

RMSD -> for global structure alignment

Not good for find local differences

M(i, j) = 1 if amino acid i is more near to j than a threshold

contact map

SARS-CoV-2

Screenshot 1401-11-16 at 03.11.50

SARS-CoV

Screenshot 1401-11-16 at 03.11.57

similarity of i-th alpha carbon position

Screenshot 1401-11-16 at 17.01.29

k = 2 or 3 (2 for first and last)

$\sigma_{i, j}^2 = |i-j|^2$

VMD

Screenshot 1401-11-16 at 20.39.49

left : SARS-CoV right : SARS-CoV2

difference1

silver part

hydrophobic pocket

Met82, Leu79, Tyr83

hidden away from the outside of the ACE2 enzyme

yellow part

SARS-CoV-2, Phe486

inserts itself into the pocket

main-chain hydrogen bond

Asn487

Ala475

we can compute potential energy for ACE2 and Spike protein Complex

this complex has fewer energy in ACE2 - SARS-CoV2 Spike binding domain

What about dynamic movements of protein structure?

we use static structure

elastic network model (ENM)

Gaussian network model (GNM) is a kind of ENM

nodes: alpha carbons
springs: two C nearer than threshold

Screenshot 1401-11-17 at 14.23.20

The fluctuation of carbons are Gaussian, meaning that the alpha carbon deviates randomly from its equilibrium position according to a normal (bell-shaped) distribution.

Although atomic fluctuations are powered by randomness, the movements of protein atoms are heavily correlated.

$R_{ij} = $ distance between i and j

Screenshot 1401-11-17 at 14.31.27

$R_i^0 = $ the equilibrium position of node i

$\Delta R_i = $ displacement from equilibrium

Screenshot 1401-11-17 at 14.31.40

what is relation between $\Delta R_i$ and $\Delta R_j$ ?

cross-correlation of alpha carbons i and j

$C_{ij} = \frac{<\Delta R_i, \Delta R_j>}{\sqrt{<\Delta R_i, \Delta R_i> . <\Delta R_j, \Delta R_j>}}$

from -1 to 1

1 means completely correlated

$n \times n$ cross-correlation matrix C

Screenshot 1401-11-17 at 15.56.45

which parts are more flexible?

B-factor = $\frac{8 \pi^2}{3} <\Delta R_i, \Delta R_i>$

Normal mode analysis (NMA)

deconvolving a protein’s movement into individual normal modes

Screenshot 1401-11-17 at 15.52.30

Screenshot 1401-11-17 at 15.57.43

Screenshot 1401-11-17 at 15.51.26

left is for SARS-CoV-2

its NTD is more flexible

what about direction?

anisotropic network model (ANM)

Screenshot 1401-11-17 at 16.00.32

Classifying White Blood Cells

what are blood cells?

red blood cell (RBC)

transport oxygen via the hemoglobin protein

white blood cell (WBC)

How Are Blood Cells Counted?

classic method

device

hemocytometer

a technician filters a small amount of blood onto a gridded slide

counts the number of cells of each type in the squares on the grid

estimate the number of each type of cell per volume

identify and attack foreign cells as part of your immune system

divide into families based on their structure and function

monocytes

lymphocytes

granulocytes

Screenshot 1401-11-21 at 21.55.44

Screenshot 1401-11-21 at 21.55.52

Screenshot 1401-11-21 at 21.55.36

computational methods

identify WBC by color threshold

a binarized image

Screenshot 1401-11-22 at 14.13.28

not bad!

How to classify them?

(granulocytes, lymphocytes, and monocytes)

KNN

What is our features?

Binary Image ->

Find Boundary ->

take n samples ->

$\textbf{s} = (s_1, s_2, \dots, s_n)$

When two shapes are near?

$RMSD(s, t) = \sqrt{\frac{1}{n} [d(s_1, t_1)^2 + \dots + d(s_n, t_n)^2]}$

but we should rotate and move!

Kabsch algorithm

for every pair! so expensive 😢

rotate all images concurrently

major axis

the longest line segment that connects two points on the outside of the image and crosses through the image’s center of mass.

Screenshot 1401-11-22 at 16.47.14

Screenshot 1401-11-22 at 16.47.19

our desired shape space 😃

$\textbf{s} = (s_1, s_2, \dots, s_n)$

curse of dimensionality

principal components analysis (PCA)

have a multilobular nucleus

subtipe

eosinophils

neutrophils

d < n

$\textbf{s} = (s_1, s_2, \dots, s_d)$

KNN

F1-Score

The best known classification algorithms for WBC image

deep learning

basophils

forward reaction

$ \frac{d[A]}{dt} = f $

activate or repress

SARS coronavirus (SARS-CoV)

$k_{dissociate}$

$k_{bind}$

if we add more particles, solving equations become hard

Gillespie’s stochastic simulation algorithm

input

$l_0~~ t_0~~ k_{bind}~~ k_{dissociate}$

output

[L], [T] and [LT]

how we find $k_{bind}$ and $k_{dissociate}$?

experimentally

what is units?

concentration

reaction rate

$k_{bind}[L][T] \rightarrow \frac{mol}{\mu m^3 . s}$

$k \rightarrow$ different

$k_{dissociate}[LT] \rightarrow \frac{mol}{\mu m^3 . s}$

$[A] \rightarrow \frac{mol}{\mu m^3}~~~~~$

detour

a well-mixed environment of particles

on average, $\lambda$ customers enter a store in a single hour

$ X \sim $ number of customers enter the store in next hour

$X \sim Poisson$

$Pr(X = n) = \frac{\lambda^n e^{-\lambda}}{n!}~~~~~~$

$ X \sim $ number of customers enter the store in next $t$ hours

$X \sim Poisson$

$Pr(X = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}~~~~~~$

how long we will typically have to wait for the next customer to arrive?

$T \sim $ exponential distribution

$Pr(T > t) = Pr(X = 0, T = t) = e^{- \lambda t}$

if t increases the probability of Pr(T > t) decrease exponentially

a reaction involving those particles taking place at some average rate

how long should we expect to wait before this reaction occurs somewhere in the environment?

customers : instances of a chemical reaction
$\lambda$ : the rate r at which the reaction occurs

$E(T) = \frac{1}{\lambda}$

we have n reactions with rates $r_1, r_2, \dots, r_n$

average waiting time

$\frac{1}{r_1, r_2, \dots, r_n}$

we can generate random time from exponential distribution

now we should do a reaction but witch

select one with probability $P_i = \frac{r_i}{r_1 + r_2 + \dots + r_n} $

example

$P(L + T \rightarrow LT) = \frac{r_{bind}}{r_{bind} + r_{dissociate}}$

$P(LT \rightarrow L + T) = \frac{r_{dissociate}}{r_{bind} + r_{dissociate}}$

process of signal transduction

phosphorylation

phosphoryl group usually comes from

dephosphorylation

$ATP \rightarrow ADP + PO_{3}^{-}$

a molecule loses its phosphoryl group

a chemical reaction that attaches a phosphoryl group ($PO_{3}^{-}$) to an organic molecule.

Phosphoryl modifications serve as an information exchange

activate or deactivate certain enzymes

phosphorylation cascade

a sequence of phosphorylation and dephosphorylation events

in bacterial chemotaxis serves to

transmit information within the cell about the amount of ligand binding being detected on the cell’s exterior.

Screenshot 1401-12-15 at 21.05.35

material

receptor

methyl-accepting chemotaxis proteins (MCPs)

in the cell exterior

bind to ligand

in the cell interior

bind to proteins

The pathway also includes a number of additional proteins, which all start with the prefix Che (short for “chemotaxis”).

Screenshot 1401-12-15 at 21.10.41

MSP bind to CheW and CheA

in absence of MCP-ligand, this complex is more stable

CheA molecule autophosphorylates

phosphorylated CheA pass on its phosphoryl group to CheY

phosphorylated CheY interact with flagellar motor switch

flagellar motor switch

change of flagellar rotation from counter-clockwise to clockwise

responsible for controlling the direction of flagellar rotation.

tumble

which in the absence of an increase in attractant occurs every 1 to 1.5 seconds

in presence of MCP-ligand, this complex is unstable

CheA is less able to autophosphorylate

less likely to tumble

run for a longer period of time

We Can add this reactions to Gillespie algorithm

increase in Ligand

decrease in phosphorylated CheY

less tumbling

adaptation

attractant remains constant for a period of time

regardless of the absolute value of the concentration

cell returns to the same background tumbling frequency

Bacteria remember past concentrations using methylation

methylation

a methyl group ($−CH_3$) is added to an organic molecule

Every MCP receptor contains four methylation sites

demethylation

On the plasma membrane

MCPs, CheW, and CheA molecules form an array structure

Methylation reduces the negative charge on the receptors

stabilizing the array

facilitating CheA autophosphorylation

how increase tumbling frequency?

ligand

methylation

CheR -> Methylation of MCPs

When bound to MCPs, CheR methylates ligand-bound MCPs faster

the rate of MCP methylation by CheR is higher if the MCP is bound to a ligand

when ligand increase

autophosphorylated CheY $\downarrow$

run

after a time

CheR methylate MCP

autophosphorylated CheY $\uparrow$

tumble

CheB -> Demethylation of MCPs

CheA -> Phosphorylate CheB

autophosphorylated CheA $\uparrow$

autophosphorylated CheB $\uparrow$

demethylation of MCP $\uparrow$

autophosphorylated CheA $\downarrow$

autophosphorylated CheY $\downarrow$

combinatorial explosion : means that building realistic models of biochemical systems at scale can be daunting.

rule-based modeling : a paradigm in which a potentially enormous number of reactions are specified by a much smaller collection of “rules” from which all reactions can be inferred.

Bacterial tumbling is robust to large sudden changes in attractant

$l_0 = 100,000$

$l_0 = 1,000,000$

$l_0 = 10,000,000$

Screenshot 1401-12-16 at 21.22.28

Screenshot 1401-12-16 at 21.22.33

Screenshot 1401-12-16 at 21.22.38

in each state it returns to equilibrium quick and in same time

orange is CheY+P concentration

x side is time

Traveling up an attractant gradient

Imagine a glucose cube in an aqueous solution. As the cube dissolves, a gradient will form, with a glucose concentration that decreases outward from the cube.

to simulate

we increase $[L]$ exponentially

$[L] = l_0 e^{k.t}$

k represents the steepness of the gradient

$l_0 = 1000$ and $k = 0.1$

Screenshot 1402-01-01 at 02.01.02

The Beauty of E. coli’s Random Exploration Algorithm

we simulate two possible scenario

average waiting time is fix

average waiting time depend on environment

Screenshot 1402-01-01 at 02.23.12

Screenshot 1402-01-01 at 02.23.15