Computational Biology
Midterm Study Guide
Campbell Biology (2014) · Jones & Pevzner (2004) · Suraishkumar (2019)
What Is Computational Biology?
Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling, and computational simulations to understand biological systems and their relationships (NIH BISTI, 2000). It sits at the intersection of computer science, biology, data science, applied mathematics, chemistry, and genetics.
Course clarification
In IF3211, Bioinformatics ≈ Computational Biology. The course focuses on applying computational techniques to solve biological problems, not purely wet-lab biology.
Five Related Fields — Know the Difference
Computational Biology
Using CS, math & statistics to understand biology. Broad umbrella term.
Bioinformatics
Using tech to analyze DNA, RNA, proteins & big data in biology. Often used synonymously here.
Bio-inspired Computing
Using models of biology to solve CS problems — e.g., genetic algorithms, neural networks, swarm intelligence.
Biomedical Engineering
Using engineering & computing to treat disease. Retinal prosthetics, biosensors, biochips.
Biological Computing
Using biologically derived molecules (DNA, proteins) to perform computations — DNA computing.
Applications of Computational Biology in Genetics
Biology vs Mathematics — Why Biology Is Different
Biology is dominated by observations; its rules are not yet fully universal, and mathematical representations are still evolving. Unlike math (where operations are precisely defined) or physics (where observations map cleanly to equations), biology requires pattern recognition across messy, context-dependent data. This is exactly why computational methods are so valuable.
Bio-Inspired Algorithms — CS from Biology
Genetic Algorithms (GA)
Population of candidate solutions (chromosomes of bits). Uses selection, crossover, mutation to evolve better solutions over generations.
Artificial Neural Networks
Inspired by biological neurons: weighted inputs → summation → activation function → output. Basis of modern deep learning.
Swarm Intelligence
Ant colony optimization, particle swarm — emergent intelligent behavior from simple agents following local rules.
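The selection, crossover, and mutation loop described for genetic algorithms can be sketched on a toy problem. This is only an illustration, not a method taken from the course: the OneMax objective (maximize the number of 1 bits), the function names, the population size, the mutation rate, and the seed are all assumptions chosen for the example.

```python
import random

def fitness(chrom):
    """OneMax toy objective (assumed for illustration): count the 1 bits."""
    return sum(chrom)

def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]  # best fitness per generation
    for _ in range(generations):
        elite = max(pop, key=fitness)   # elitism: carry the best forward unchanged
        children = [elite[:]]
        while len(children) < pop_size:
            # Selection: tournament of 3 random individuals, done twice
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Crossover: single cut point
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability p_mut
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the elite individual is copied unchanged into each new generation, the best fitness in `history` can never decrease; without elitism it could.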
Levels of Biological Organization
Unifying Themes in Biology (Campbell Ch. 1)
- Evolution — the overarching framework; natural selection drives diversity.
- Information flow — DNA → RNA → Protein (Central Dogma).
- Energy transformation — all life transforms energy; photosynthesis captures it, cellular respiration releases it.
- Structure & function — form at every level correlates with function (e.g., enzyme active sites, membrane phospholipids).
- Emergent properties — life arises from interaction of non-living components; the whole is greater than the sum of its parts.
- Cell theory — the cell is the fundamental unit of life; all cells come from pre-existing cells.
Exam tip — Sequence Analysis Problem
Many CompBio problems come from biology to CS (fragment assembly, phylogenetic trees, sequence alignment) and from CS to biology (sequencing by hybridization, DNA computing). Know which direction each flows.
All living organisms are built from chemicals based mostly on carbon. A compound containing carbon is an organic compound. Carbon's unique ability to form four covalent bonds allows it to build large, complex, varied molecules. The four classes of biological macromolecules are carbohydrates, lipids, proteins, and nucleic acids.
Polymers and Monomers — The Core Concept
Dehydration Synthesis (Condensation)
Monomers are joined by removing a water molecule from between them, forming a covalent bond. This builds polymers.
Hydrolysis
Polymers are broken apart by adding a water molecule across a bond, splitting monomers apart. Digestion uses hydrolysis.
1. Carbohydrates (CH₂O)ₙ
Carbohydrates serve as fuel and building materials. Their molecular formula is generally multiples of CH₂O. They are classified by size into monosaccharides, disaccharides, and polysaccharides.
Key Terms
Important Polysaccharides
| Name | Monomer | Linkage | Function | Found in |
|---|---|---|---|---|
| Starch (amylose/amylopectin) | α-glucose | α-1,4 + α-1,6 (branched) | Energy storage | Plants |
| Glycogen | α-glucose | α-1,4 + α-1,6 (highly branched) | Energy storage | Animals, fungi |
| Cellulose | β-glucose | β-1,4 | Structural support | Plant cell walls |
| Chitin | N-acetylglucosamine | β-1,4 | Structural support | Arthropod exoskeletons, fungal walls |
α vs β linkage — a critical distinction
α-linkages (starch, glycogen) form helical, digestible chains. β-linkages (cellulose, chitin) form straight, hydrogen-bonded sheets that are much stronger and indigestible by most animals. Humans lack cellulase.
2. Lipids
Lipids are hydrophobic (water-fearing) molecules. Unlike the other three macromolecules, lipids are not true polymers — they don't form long repetitive chains of identical monomers. They are unified by being mostly nonpolar hydrocarbons that do not mix with water.
3. Proteins
Proteins are the most structurally and functionally diverse macromolecules. They are polymers of amino acids joined by peptide bonds. A protein's specific shape determines its specific function. Proteins carry out virtually every process in a cell.
Functions of Proteins
- Enzymes — biological catalysts that speed up reactions (e.g., amylase, DNA polymerase, ATP synthase).
- Structural — collagen (connective tissue), keratin (hair, nails), actin & myosin (muscle).
- Transport — hemoglobin (O₂), membrane transport proteins, channel proteins.
- Signaling — hormones (insulin), receptors, second-messenger proteins.
- Defense — antibodies (immunoglobulins) bind to foreign antigens.
- Gene regulation — transcription factors bind DNA to turn genes on/off.
- Motor proteins — kinesin, dynein move along cytoskeletal tracks.
Amino Acid Structure
All 20 amino acids share a central α-carbon bonded to: an amino group (–NH₂), a carboxyl group (–COOH), a hydrogen (–H), and a variable R group (side chain). The R group determines the amino acid's chemical properties.
Nonpolar / Hydrophobic R
Glycine, Alanine, Valine, Leucine, Isoleucine, Proline, Phenylalanine, Methionine, Tryptophan
Polar / Uncharged R
Serine, Threonine, Cysteine, Asparagine, Glutamine, Tyrosine
Charged R (ionized)
Positive: Lysine, Arginine, Histidine.
Negative: Aspartate, Glutamate.
Four Levels of Protein Structure
Denaturation
Disruption of a protein's structure (secondary and beyond) by heat, extreme pH, or chemicals. The primary structure remains intact, but function is lost. Some proteins can renature (e.g., ribonuclease); most cannot in vivo.
4. Nucleic Acids — DNA & RNA
Nucleic acids store, transmit, and help express genetic information. They are polymers of nucleotides. Each nucleotide has three components: a 5-carbon sugar (deoxyribose in DNA, ribose in RNA), a nitrogenous base, and a phosphate group. Nucleotides are joined by phosphodiester bonds between the 3'-OH of one sugar and the 5'-phosphate of the next.
| Property | DNA | RNA |
|---|---|---|
| Sugar | Deoxyribose (–H at 2') | Ribose (–OH at 2') |
| Bases | A, T, G, C | A, U, G, C (Uracil replaces Thymine) |
| Strands | Double-stranded (double helix) | Single-stranded (can fold) |
| Location | Nucleus, mitochondria, chloroplasts | Nucleus, cytoplasm, ribosomes |
| Function | Long-term genetic information storage | Protein synthesis (mRNA, tRNA, rRNA), regulation (miRNA, siRNA) |
| Stability | Very stable (deoxyribose + double-stranded) | Less stable (2'-OH is reactive) |
Nitrogenous Bases — Base-Pairing Rules
Purines (double ring)
Adenine (A) and Guanine (G)
Pyrimidines (single ring)
Cytosine (C), Thymine (T/DNA only), Uracil (U/RNA only)
In any double-stranded DNA, %A = %T and %G = %C (Chargaff's rules). The ratio (A+T)/(G+C), however, is not fixed and varies by organism.
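The base-pairing rules can be checked with a few lines of code. The sketch below builds the complementary strand of a made-up sequence and confirms that A matches T and G matches C across the duplex, while the GC content is whatever the sequence happens to give. The sequence and function names are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing: A<->T, G<->C
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the complementary strand, written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]

def base_fractions(strands):
    """Fraction of each base counted over all strands given."""
    counts = Counter("".join(strands))
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}

top = "ATGCGGCGAA"                 # hypothetical strand, 5'->3'
bottom = reverse_complement(top)   # antiparallel partner, 5'->3'
fractions = base_fractions([top, bottom])
gc_content = fractions["G"] + fractions["C"]

print(bottom)
print(fractions, gc_content)
```

A single strand on its own does not have to satisfy %A = %T; the equality only appears when both strands of the duplex are counted.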
ATP — The Cell's Energy Currency
Adenosine triphosphate (ATP) is a nucleotide (adenine + ribose + 3 phosphates) that is the direct energy currency of all cells. The bond between the second and third phosphate is a high-energy phosphoanhydride bond. Hydrolysis of this bond releases ~7.3 kcal/mol of free energy under standard conditions.
This exergonic reaction (ATP + H₂O → ADP + Pᵢ) is coupled to endergonic cellular work (biosynthesis, transport, movement). ATP is regenerated by cellular respiration and photosynthesis.
Macromolecule Quick-Reference
| Macromolecule | Monomer | Bond | Key Functions | Example |
|---|---|---|---|---|
| Carbohydrate | Monosaccharide | Glycosidic | Energy, structure | Glucose → starch / cellulose |
| Lipid (fat) | Glycerol + fatty acids | Ester | Energy storage, insulation | Triacylglycerol |
| Lipid (membrane) | Glycerol + 2 FA + phosphate | Ester | Membrane bilayer | Phosphatidylcholine |
| Protein | Amino acid | Peptide | Catalysis, structure, signaling, transport | Hemoglobin, collagen, insulin |
| DNA | Deoxyribonucleotide | Phosphodiester | Genetic information storage | Chromosome |
| RNA | Ribonucleotide | Phosphodiester | Gene expression, catalysis | mRNA, tRNA, rRNA, ribozyme |
Genetics at the molecular level explains how information is stored in DNA, copied faithfully during cell division, and converted into functional proteins. This section covers DNA replication, the Central Dogma (transcription + translation), gene regulation, and mutations — all relevant computational biology targets.
DNA Double Helix — Structure
- Watson-Crick model (1953): Two antiparallel polynucleotide strands coiled around a common axis. One strand runs 5'→3', the other 3'→5'.
- Sugar-phosphate backbones on the outside; nitrogenous bases stacked on the inside, perpendicular to the helix axis (~0.34 nm apart; one full turn = 10 base pairs = 3.4 nm).
- Base pairs stabilized by hydrogen bonds (A=T via 2; G≡C via 3) and by hydrophobic stacking interactions between adjacent base pairs.
- Major and minor grooves: Proteins that interact with DNA (transcription factors, restriction enzymes) typically bind the major groove where base-specific contacts are possible.
DNA Replication — Semiconservative Model
Each new DNA molecule consists of one original (template) strand and one newly synthesized strand — the semiconservative model (confirmed by Meselson-Stahl experiment, 1958).
For a visual overview of the replication process, see this excellent video!
[Video] DNA Replication Animation: https://www.youtube.com/watch?v=TNKWgcFPHqw
Key Enzymes & Proteins
Leading vs Lagging Strand
Leading Strand
Synthesized continuously in the 5'→3' direction toward the replication fork. Requires one RNA primer.
Lagging Strand
Synthesized discontinuously (away from fork) as Okazaki fragments (100–200 nt in eukaryotes, 1000–2000 in prokaryotes). Each fragment needs its own RNA primer.
End Replication Problem & Telomeres
Eukaryotic chromosomes shorten with each replication at the 3' end of the lagging strand template. Telomeres (TTAGGG repeats) cap chromosome ends. Telomerase (reverse transcriptase carrying its own RNA template) extends telomeres in germline cells and stem cells. Cancer cells often reactivate telomerase for immortality.
The Central Dogma of Molecular Biology
Genetic information flows DNA → RNA → protein. Reverse transcription (RNA→DNA, by retroviruses) and RNA replication (RNA→RNA, by RNA viruses) are special cases outside this standard flow.
Transcription (DNA → mRNA)
Transcription produces an RNA molecule complementary to the DNA template strand (read 3'→5'), building mRNA 5'→3'. It occurs in the nucleus in eukaryotes. The resulting mRNA is the same sequence as the coding strand (except T→U).
- Initiation: RNA polymerase binds the promoter (in bacteria: –10 and –35 sequences; in eukaryotes: TATA box ~25 bp upstream, recognized by transcription factors). The double helix unwinds locally.
- Elongation: RNA polymerase moves along the template strand 3'→5', synthesizing mRNA 5'→3' at ~40 nt/s. Proofreading is far more limited than in DNA replication, so the error rate is higher. Ribonucleotides (ATP, UTP, GTP, CTP) are incorporated.
- Termination: In bacteria: rho-independent (hairpin in mRNA) or rho-dependent termination. In eukaryotes: polyadenylation signal (AAUAAA) followed by cleavage and poly-A addition.
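The strand relationships in transcription are easy to mix up, so a small sketch may help. It produces the mRNA two ways: by copying the coding strand with T replaced by U, and by pairing bases against the template strand. Both routes should agree. The sequences and function names are made up for illustration.

```python
def transcribe(coding_strand):
    """mRNA (5'->3') from the coding strand (5'->3'): same sequence, T -> U."""
    return coding_strand.upper().replace("T", "U")

def transcribe_from_template(template_3to5):
    """mRNA (5'->3') built by base-pairing against the template read 3'->5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3to5.upper())

coding = "ATGGCCTAA"     # hypothetical coding strand, 5'->3'
template = "TACCGGATT"   # its template strand, written 3'->5'

mrna = transcribe(coding)
print(mrna)
print(mrna == transcribe_from_template(template))
```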
Eukaryotic Pre-mRNA Processing
- 5' Capping: Modified guanosine cap added to 5' end (protects from degradation; aids ribosome recognition).
- 3' Poly-A Tail: ~200 adenine nucleotides added to 3' end (stability; nuclear export signal).
- RNA Splicing: Introns (non-coding intervening sequences) are removed by the spliceosome. Exons (expressed sequences) are joined together. Alternative splicing of the same pre-mRNA can produce different protein isoforms.
Translation (mRNA → Protein)
Translation occurs at the ribosome (large + small subunit). It reads mRNA in the 5'→3' direction, in triplets called codons, to assemble a polypeptide from amino acids delivered by tRNA molecules. The genetic code is redundant (degenerate), universal (with minor exceptions), and unambiguous.
Start codon: AUG (methionine). Stop codons: UAA, UAG, UGA. Multiple codons per amino acid = degeneracy (wobble at 3rd position).
- Initiation: Small ribosomal subunit binds mRNA at the 5' cap (eukaryotes) or Shine-Dalgarno sequence (bacteria). Initiator tRNA (Met-tRNA) pairs with AUG start codon in the P site. Large subunit joins. GTP hydrolysis used.
- Elongation: Aminoacyl-tRNA enters the A site; peptide bond forms (peptidyl transferase activity of 23S/28S rRNA is a ribozyme!); ribosome translocates 3 nt in 5'→3' direction; empty tRNA exits via E site.
- Termination: Stop codon (UAA, UAG, UGA) enters A site; release factor (not tRNA) binds; polypeptide released; ribosome dissociates. Polyribosomes (polysomes) = multiple ribosomes on one mRNA.
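The codon-reading steps above can be sketched in code. This is a deliberately simplified model: it starts at the first AUG it finds and ignores the 5' cap, the Shine-Dalgarno sequence, and everything else about initiation. The codon table is the standard genetic code written compactly in U, C, A, G order; the example mRNA and the function name are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""            # no start codon, nothing to translate
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break            # a stop codon has no matching tRNA
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GGAUGGCCAAAUGGUAAGC"))
```

Degeneracy shows up directly in the table: 64 codons map to only 20 amino acids plus stop, and most synonymous codons differ only at the third position.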
tRNA — The Adapter Molecule
tRNA has a cloverleaf secondary structure folded into an L-shape in 3D. The anticodon loop (3 nucleotides) base-pairs with the mRNA codon. The 3' CCA-OH end carries the specific amino acid, attached by aminoacyl-tRNA synthetase (one per amino acid).
Mutations — Types & Consequences
Gene Regulation
Not all genes are expressed at all times in all cells. Gene regulation occurs at multiple levels:
- Chromatin level: Histone acetylation decondenses chromatin (accessible); histone methylation often condenses it (silenced). This is epigenetic control, heritable but not encoded in the DNA sequence.
- Transcriptional: Transcription factors (activators and repressors) bind enhancers/silencers (sometimes thousands of bp away, looping to the promoter) to control RNA polymerase. In bacteria, operons play this role (lac operon: lacZ, lacY, lacA genes repressed by the lac repressor; induced by allolactose).
- Post-transcriptional: Alternative splicing; mRNA stability (AU-rich elements in the 3'UTR target mRNA for degradation); miRNA/siRNA (small ~22 nt RNAs guide the RISC complex to complementary mRNA for cleavage or translational repression).
- Translational: Control of ribosome binding; iron-responsive elements (IREs) in ferritin mRNA, where the IRP protein blocks translation when iron is low.
- Post-translational: Protein folding (chaperones), cleavage (zymogen activation), chemical modification (phosphorylation, glycosylation, ubiquitination → proteasomal degradation).
Computational Approaches to Genetics
Sequence Alignment
Finding regions of similarity between DNA/protein sequences. Pairwise (BLAST, Smith-Waterman) and multiple sequence alignment (ClustalW, MUSCLE). Dynamic programming is the basis of optimal alignment.
Fragment Assembly
Reconstructing a genome from millions of short reads (shotgun sequencing). Overlap-layout-consensus or De Bruijn graph approaches. Hamiltonian/Eulerian path problems.
Gene Finding
Identifying ORFs (open reading frames), promoter sequences, splice sites in genomic DNA. Hidden Markov Models (HMMs) widely used.
Motif Finding
Identifying short recurring patterns (transcription factor binding sites) in a set of sequences. Gibbs sampling, expectation-maximization (EM) methods.
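As a concrete starting point for the gene-finding idea above, here is a minimal ORF scanner. It is far simpler than the HMM-based gene finders mentioned: it only looks for ATG followed by an in-frame stop codon in the three forward reading frames, and ignores the reverse strand, introns, and promoters. The function name, the `min_codons` parameter, and the test sequence are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop ORF in the 3 forward frames.

    min_codons counts the codons before the stop codon, including ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                          # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                       # close it and keep scanning
    return orfs

print(find_orfs("CCATGAAATGGTAGATGTAA"))
```

In the example, the second ATG in frame is followed immediately by a stop, so it falls below `min_codons` and is not reported.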
Sequence Alignment — Key Concepts (Jones & Pevzner)
e.g. d_H("ATGC", "AAGC") = 1
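The Hamming distance in the example counts the positions at which two equal-length strings differ. A minimal sketch (the function name is an assumption):

```python
def hamming_distance(s, t):
    """Number of mismatched positions between two equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(s, t))

print(hamming_distance("ATGC", "AAGC"))
```

Hamming distance only models substitutions. Insertions and deletions shift every later position, which is why alignment by dynamic programming is needed for sequences of different lengths.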
```python
# Smith-Waterman local alignment: fills and returns the score matrix H
def smith_waterman(seq1, seq2, match, mismatch, gap):
    # mismatch and gap are expected to be negative penalties
    H = [[0] * (len(seq2) + 1) for _ in range(len(seq1) + 1)]
    for i in range(1, len(seq1) + 1):
        for j in range(1, len(seq2) + 1):
            score = match if seq1[i-1] == seq2[j-1] else mismatch
            H[i][j] = max(
                0,                      # restart: a local score never drops below 0
                H[i-1][j-1] + score,    # diagonal (match/mismatch)
                H[i-1][j] + gap,        # gap in seq2
                H[i][j-1] + gap         # gap in seq1
            )
    return H
```
```python
# De Bruijn graph for genome assembly

# k-mer decomposition of a read
def kmer_decomp(seq, k):
    return [seq[i:i+k] for i in range(len(seq) - k + 1)]

# Build De Bruijn graph: nodes = (k-1)-mers, edges = k-mers
def de_bruijn(kmers):
    graph = {}
    for kmer in kmers:
        prefix = kmer[:-1]   # first k-1 bases
        suffix = kmer[1:]    # last k-1 bases
        graph.setdefault(prefix, []).append(suffix)
    return graph
```
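Once the De Bruijn graph is built, assembly becomes the Eulerian path problem mentioned under Fragment Assembly: find a walk that uses every edge (k-mer) exactly once. The sketch below uses Hierholzer's algorithm. It assumes an Eulerian path exists and makes no attempt to handle sequencing errors; when the genome contains repeats, several Eulerian paths may exist and this returns just one of them. All names and the input k-mers are hypothetical.

```python
from collections import defaultdict

def eulerian_path(graph):
    """Hierholzer's algorithm on a directed multigraph {node: [successors]}."""
    adj = {u: list(vs) for u, vs in graph.items()}   # copy, since edges get consumed
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, vs in adj.items():
        outdeg[u] += len(vs)
        for v in vs:
            indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    # Start at the node with one more outgoing than incoming edge, if any
    start = next((n for n in nodes if outdeg[n] - indeg[n] == 1), next(iter(adj)))
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())   # follow an unused edge
        else:
            path.append(stack.pop())     # dead end: node is final in the path
    return path[::-1]

def assemble(kmers):
    """Build the De Bruijn graph from k-mers and spell out one Eulerian path."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    path = eulerian_path(graph)
    return path[0] + "".join(node[-1] for node in path[1:])

kmers = ["ATG", "TGG", "GGC", "GCA"]   # hypothetical 3-mers
genome = assemble(kmers)
print(genome)
```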
The plasma membrane is a selectively permeable barrier that regulates the passage of substances in and out of the cell. Its structure is described by the Fluid Mosaic Model (Singer & Nicolson, 1972): a fluid phospholipid bilayer with embedded proteins that can move laterally.
Membrane Structure
Passive Transport — No Energy Required
Passive transport moves substances down their concentration (or electrochemical) gradient — from high to low concentration. No ATP needed; driven by thermodynamics (entropy increase).
Osmotic Terminology
Hypotonic Solution
Solute concentration lower than cell interior. Water flows into cell → cell swells (animal: lysis; plant: turgid/healthy).
Hypertonic Solution
Solute concentration higher than cell interior. Water flows out → cell shrinks (animal: crenation; plant: plasmolysis).
Isotonic Solution
Solute concentration equal to cell interior. No net movement of water. Cell maintains normal volume.
π = iMRT
where π = osmotic pressure, i = van 't Hoff factor (ionization), M = molarity, R = gas constant (0.0821 L·atm/(mol·K)), T = temperature (Kelvin). Critical for biomedical IV solutions.
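A worked example using the van 't Hoff relation π = iMRT may make the symbols concrete. The numbers are assumptions chosen for illustration: roughly 0.154 M NaCl (about 0.9% saline), an ideal van 't Hoff factor of 2 for NaCl, and body temperature of 310 K.

```python
R = 0.0821  # gas constant in L·atm/(mol·K)

def osmotic_pressure(i, molarity, temp_kelvin):
    """Osmotic pressure in atm from the van 't Hoff relation pi = i * M * R * T."""
    return i * molarity * R * temp_kelvin

# Assumed values: ~0.9% NaCl, ideal dissociation into 2 ions, 37 °C
pi_saline = osmotic_pressure(i=2, molarity=0.154, temp_kelvin=310)
print(round(pi_saline, 2), "atm")
```

Real solutions deviate somewhat from the ideal factor of 2, so this is an upper-end estimate rather than a measured value.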
Active Transport — Energy Required
Active transport moves substances against their concentration gradient — from low to high concentration. Requires ATP (primary active transport) or an ion gradient (secondary active transport).
Net result of the Na⁺/K⁺ ATPase (3 Na⁺ pumped out and 2 K⁺ pumped in per ATP hydrolyzed): cytoplasm negative relative to outside. High K⁺ inside; high Na⁺ outside. Exploited by every excitable cell.
Bulk Transport — Moving Large Molecules
Transport Comparison Table
| Type | Direction | Energy | Protein needed? | Examples |
|---|---|---|---|---|
| Simple diffusion | High → Low | None | No | O₂, CO₂, ethanol |
| Osmosis | High [H₂O] → Low [H₂O] | None | No (or aquaporin) | Water across all membranes |
| Facilitated diffusion (channel) | High → Low | None | Yes (channel) | Ion channels, aquaporins |
| Facilitated diffusion (carrier) | High → Low | None | Yes (carrier) | GLUT1 (glucose), uniporters |
| Primary active | Low → High | ATP | Yes (pump) | Na⁺/K⁺ ATPase, H⁺ pump, Ca²⁺ pump |
| Secondary active (symport) | Low → High (solute) | Ion gradient | Yes | SGLT1 (Na⁺/glucose), NKCC |
| Secondary active (antiport) | Low → High (solute) | Ion gradient | Yes | Na⁺/H⁺ exchanger, Na⁺/Ca²⁺ exchanger |
| Endocytosis/Exocytosis | In/Out | ATP | Yes (vesicle machinery) | LDL uptake, neurotransmitter release |
Metabolism is the totality of an organism's chemical reactions. It is organized into metabolic pathways: ordered sequences of reactions, each catalyzed by a specific enzyme, transforming substrates into products. Metabolic pathways are either catabolic (degradation, releasing energy) or anabolic (biosynthesis, requiring energy).
Thermodynamic Foundations
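ΔG = ΔH − TΔS
where ΔG = free energy change, ΔH = enthalpy change (heat), T = absolute temperature (K), ΔS = entropy change. A reaction with ΔG < 0 is exergonic (spontaneous); ΔG > 0 is endergonic. Also: ΔG° = −RT ln Keq (standard free energy and equilibrium constant).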
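The two thermodynamic relations, ΔG = ΔH − TΔS and ΔG° = −RT ln Keq, can be turned into a small calculator. The ΔH and ΔS inputs below are invented numbers for illustration; the −7.3 kcal/mol figure is the standard free energy of ATP hydrolysis quoted earlier in this guide, and 298.15 K is an assumed standard temperature.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol·K)

def delta_g(delta_h, temp_kelvin, delta_s):
    """Gibbs free energy change: dG = dH - T*dS (kcal/mol, K, kcal/(mol·K))."""
    return delta_h - temp_kelvin * delta_s

def keq_from_delta_g0(delta_g0, temp_kelvin=298.15):
    """Equilibrium constant from dG0 = -RT ln Keq, rearranged for Keq."""
    return math.exp(-delta_g0 / (R_KCAL * temp_kelvin))

# Hypothetical reaction: dH = -10 kcal/mol, dS = +0.02 kcal/(mol·K), T = 298 K
print(delta_g(-10.0, 298.0, 0.02))

# ATP hydrolysis, dG0 of about -7.3 kcal/mol
print(f"{keq_from_delta_g0(-7.3):.2e}")
```

A strongly negative ΔG° corresponds to a very large Keq, which is why ATP hydrolysis lies far toward products at equilibrium.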
Enzymes — Biological Catalysts
Enzymes are proteins (or ribozymes) that lower the activation energy (Eₐ) of reactions without being consumed. They do not change ΔG or the equilibrium; they only speed the rate at which equilibrium is reached.
v = Vₘₐₓ[S] / (Kₘ + [S])
where v = reaction rate, Vₘₐₓ = maximum rate, [S] = substrate concentration, Kₘ = Michaelis constant (substrate concentration at ½Vₘₐₓ). Low Kₘ = high affinity for substrate.
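The Michaelis-Menten relation v = Vₘₐₓ[S] / (Kₘ + [S]) is short enough to evaluate directly. The Vₘₐₓ and Kₘ values below are hypothetical and unitless; the point is only to show the saturation behavior.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v for substrate concentration s."""
    return vmax * s / (km + s)

vmax, km = 100.0, 5.0   # hypothetical enzyme parameters
for s in (0.5, 5.0, 50.0, 500.0):
    print(s, round(michaelis_menten(s, vmax, km), 1))
```

At [S] = Kₘ the rate is exactly half of Vₘₐₓ, and at high [S] the rate levels off toward Vₘₐₓ instead of rising linearly.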
Cellular Respiration — Overview
Cellular respiration is the controlled release of chemical energy from organic molecules to produce ATP. In the presence of oxygen (aerobic respiration), glucose is completely oxidized.
C₆H₁₂O₆ + 6 O₂ → 6 CO₂ + 6 H₂O + energy (ATP + heat)
ΔG° = −686 kcal/mol. Energy is harvested gradually through redox reactions. NAD⁺ and FAD act as electron carriers (reduced to NADH and FADH₂).
Glycolysis (cytoplasm) → Pyruvate Oxidation (mitochondrial matrix) → Citric Acid Cycle (mitochondrial matrix) → Oxidative Phosphorylation (inner mitochondrial membrane)
Stage 1: Glycolysis (Cytoplasm)
Glucose (6C) is split into two pyruvate (3C) molecules through 10 enzymatic steps. Occurs in cytosol. Requires no oxygen (anaerobic-capable).
Investment phase (steps 1–5): 2 ATP consumed. Payoff phase (steps 6–10): 4 ATP + 2 NADH produced. Net: 2 ATP, 2 NADH per glucose.
Stage 2: Pyruvate Oxidation (Pyruvate Dehydrogenase Complex)
2 pyruvates per glucose → 2 Acetyl-CoA + 2 CO₂ + 2 NADH. Occurs in mitochondrial matrix. Pyruvate dehydrogenase complex contains three enzymes and five cofactors (including thiamine pyrophosphate, lipoic acid, CoA, FAD, NAD⁺).
Stage 3: Citric Acid Cycle (Krebs Cycle) — per Acetyl-CoA
Per acetyl-CoA (one turn): 3 NADH, 1 FADH₂, 1 ATP (substrate-level), 2 CO₂. Per glucose (×2): 6 NADH, 2 FADH₂, 2 ATP (substrate-level), 4 CO₂. Key intermediates: citrate (6C), isocitrate, α-ketoglutarate (5C), succinyl-CoA (4C), succinate, fumarate, malate, oxaloacetate (regenerated).
Stage 4: Oxidative Phosphorylation (Inner Mitochondrial Membrane)
NADH and FADH₂ donate electrons to the electron transport chain (ETC). Energy released is used to pump H⁺ across the inner mitochondrial membrane, creating a proton gradient (proton motive force). H⁺ flows back through ATP synthase (chemiosmosis), driving ATP synthesis.
Inhibited by: DNP (uncoupler — dissipates H⁺ gradient without ATP synthesis, releasing heat); oligomycin (ATP synthase blocker); cyanide/CO (Complex IV blocker — no final electron acceptor).
ATP Yield Summary (per glucose)
| Stage | Location | Direct ATP | NADH | FADH₂ | ATP equiv. |
|---|---|---|---|---|---|
| Glycolysis | Cytoplasm | 2 | 2 | — | 2 + 5 = 7 |
| Pyruvate oxidation | Matrix | 0 | 2 | — | 5 |
| Citric acid cycle | Matrix | 2 | 6 | 2 | 2 + 15 + 3 = 20 |
| Total (approx.) | — | 4 | 10 | 2 | ~30–32 ATP |
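The ATP-equivalent column can be reproduced with simple arithmetic. The sketch below assumes the conversion factors implied by the table (about 2.5 ATP per NADH and 1.5 ATP per FADH₂). The lower bound assumes that the two cytosolic NADH from glycolysis enter the electron transport chain through a shuttle that yields only 1.5 ATP each, which is one common explanation for the 30 to 32 range.

```python
ATP_PER_NADH = 2.5    # assumed yield per NADH, as implied by the table
ATP_PER_FADH2 = 1.5   # assumed yield per FADH2

# Per-glucose outputs of each stage, taken from the table
stages = {
    "Glycolysis":         {"atp": 2, "nadh": 2, "fadh2": 0},
    "Pyruvate oxidation": {"atp": 0, "nadh": 2, "fadh2": 0},
    "Citric acid cycle":  {"atp": 2, "nadh": 6, "fadh2": 2},
}

def atp_equivalents(stage):
    """Direct ATP plus ATP made from this stage's NADH and FADH2."""
    return (stage["atp"]
            + stage["nadh"] * ATP_PER_NADH
            + stage["fadh2"] * ATP_PER_FADH2)

for name, stage in stages.items():
    print(name, atp_equivalents(stage))

total_high = sum(atp_equivalents(s) for s in stages.values())
# Lower bound: cytosolic NADH from glycolysis yields 1.5 instead of 2.5 ATP each
total_low = total_high - stages["Glycolysis"]["nadh"] * (ATP_PER_NADH - ATP_PER_FADH2)
print(total_low, total_high)
```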
Anaerobic Respiration & Fermentation
When O₂ is absent or insufficient, cells regenerate NAD⁺ from NADH via fermentation — allowing glycolysis to continue producing 2 ATP per glucose.
Lactic Acid Fermentation
Pyruvate + NADH → Lactate + NAD⁺. Occurs in muscle cells during intense exercise, and in bacteria used to make yogurt & cheese.
Alcohol Fermentation
Pyruvate → Acetaldehyde (CO₂ released) → Ethanol + NAD⁺. Yeast and some bacteria. Basis of brewing and baking (CO₂ makes dough rise).
Photosynthesis — Capturing Light Energy
Photosynthesis converts light energy into chemical energy stored in glucose. Occurs in chloroplasts.
6 CO₂ + 6 H₂O + light energy → C₆H₁₂O₆ + 6 O₂
In terms of atoms this is the reverse of aerobic respiration, but the mechanism is very different.
Light-dependent reactions (thylakoid membrane)
Chlorophyll absorbs light (mainly red ~680 nm and blue ~430 nm). Photosystem II splits H₂O → ½ O₂ + 2H⁺ + 2e⁻. Electrons flow through the ETC, pumping H⁺ into the thylakoid lumen, and ATP is synthesized by chemiosmosis. Photosystem I re-energizes the electrons, which then reduce NADP⁺ to NADPH.
Calvin Cycle / Light-independent (stroma)
Uses ATP + NADPH from light reactions to fix CO₂ into sugar. 3 turns to produce one G3P. RuBisCO catalyzes CO₂ fixation (most abundant enzyme on Earth). Net: 3 CO₂ → 1 G3P (requires 9 ATP + 6 NADPH).
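The Calvin cycle stoichiometry scales linearly, which makes it easy to work out the cost of larger products. The sketch below uses the per-G3P figures given above (3 CO₂, 9 ATP, 6 NADPH) and treats one glucose as equivalent to two G3P. The function name is an assumption.

```python
# Inputs needed to net one G3P, from the figures above
PER_G3P = {"CO2": 3, "ATP": 9, "NADPH": 6}

def calvin_inputs(n_g3p):
    """Total CO2, ATP and NADPH needed to net n_g3p molecules of G3P."""
    return {molecule: amount * n_g3p for molecule, amount in PER_G3P.items()}

# One glucose (6C) corresponds to two G3P (3C each)
print(calvin_inputs(2))
```

Per CO₂ fixed, this works out to 3 ATP and 2 NADPH.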
Metabolic Networks & Computational Biology
Metabolism can be modeled as a directed graph where nodes are metabolites and edges are reactions (enzyme-catalyzed). This is the basis of flux balance analysis (FBA) and systems biology tools used to predict metabolic phenotypes.
```python
# Simplified metabolic pathway graph (glycolysis key nodes)
# Each metabolite maps to a list of (product, enzyme) edges.
glycolysis = {
    'Glucose': [('G6P', 'Hexokinase')],
    'G6P':     [('F6P', 'Phosphoglucose isomerase')],
    'F6P':     [('F1,6BP', 'PFK-1')],                      # rate-limiting step
    'F1,6BP':  [('DHAP', 'Aldolase'), ('G3P', 'Aldolase')],
    'G3P':     [('1,3-BPG', 'GAPDH')],                     # NADH produced here
    '1,3-BPG': [('3PG', 'Phosphoglycerate kinase')],       # ATP
    '3PG':     [('2PG', 'Phosphoglycerate mutase')],
    '2PG':     [('PEP', 'Enolase')],
    'PEP':     [('Pyruvate', 'Pyruvate kinase')],          # ATP
}

def trace_pathway(graph, start, end, path=None):
    """Depth-first search: return the first metabolite path from start to end."""
    path = (path or []) + [start]   # avoid a mutable default argument
    if start == end:
        return path
    for neighbor, _enzyme in graph.get(start, []):
        result = trace_pathway(graph, neighbor, end, path)
        if result:
            return result
    return None   # dead end (DHAP has no outgoing edge in this simplified graph)

print(trace_pathway(glycolysis, 'Glucose', 'Pyruvate'))
```
Key point to remember: PFK-1 regulation
PFK-1 catalyzes the committed step of glycolysis and is its most important regulatory enzyme. It is inhibited by high ATP (energy surplus) and citrate (plenty of CAC intermediates), and activated by AMP (energy deficit) and F2,6BP. This is a classic case of allosteric feedback inhibition in a catabolic pathway.
Quick Recap — 5 Big Ideas
- CompBio is an intersection: CS + math + statistics applied to biological systems. Bioinformatics ≈ using computers to analyze DNA, RNA, proteins and big biological data.
- Four macromolecules: Carbohydrates (CH₂O; energy & structure), Lipids (hydrophobic; membrane & energy storage), Proteins (amino acid polymers; catalysis & structure), Nucleic acids (nucleotide polymers; information). All built by dehydration, broken by hydrolysis.
- Central Dogma: DNA (replication & template) → mRNA (transcription) → Protein (translation). Mutations change sequence; regulation controls when/where/how much.
- Membrane transport: Passive (down gradient: diffusion, osmosis, facilitated diffusion — no ATP) vs. Active (against gradient — requires ATP or ion gradient). Phospholipid bilayer is selectively permeable.
- Metabolism = energy transformation: Glycolysis (2 ATP, cytoplasm, anaerobic) → Pyruvate oxidation → CAC (2 ATP, NADH, FADH₂) → Oxidative phosphorylation (~26–28 ATP via chemiosmosis, inner mitochondrial membrane). Total: ~30–32 ATP/glucose.