IF3211 · Domain Specific Computation · STEI ITB

Computational Biology
Midterm Study Guide

Campbell Biology (2014) · Jones & Pevzner (2004) · Suraishkumar (2019)

Topic 01 Introduction to Computational Biology

What Is Computational Biology?

Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling, and computational simulations to understand biological systems and their relationships (NIH BISTI, 2000). It sits at the intersection of computer science, biology, data science, applied mathematics, chemistry, and genetics.

Course clarification

In IF3211, Bioinformatics ≈ Computational Biology. The course focuses on applying computational techniques to solve biological problems, not purely wet-lab biology.

Five Related Fields — Know the Difference

Computational Biology

Using CS, math & statistics to understand biology. Broad umbrella term.

Bioinformatics

Using tech to analyze DNA, RNA, proteins & big data in biology. Often used synonymously here.

Bio-inspired Computing

Using models of biology to solve CS problems — e.g., genetic algorithms, neural networks, swarm intelligence.

Biomedical Engineering

Using engineering & computing to treat disease. Retinal prosthetics, biosensors, biochips.

Biological Computing

Using biologically derived molecules (DNA, proteins) to perform computations — DNA computing.

Applications of Computational Biology in Genetics

Genome Assembly: Reconstructing full genome sequences from short sequencing reads (fragment assembly problem).
Variant Calling: Identifying SNPs, indels, and structural variants from sequencing data versus a reference genome.
Phylogenetics: Building evolutionary trees from sequence data to infer relationships between organisms.
Functional Genomics: Studying gene expression patterns, regulatory networks, and gene–function relationships at scale.
Structural Bioinformatics: Predicting 3D structure of proteins and other macromolecules from sequence (e.g., AlphaFold).
Systems Biology: Modeling entire biological pathways and networks as mathematical/computational systems.
Personalized Medicine: Tailoring treatments to individual genomic profiles using computational analysis.
Population Genetics: Analyzing genetic variation across populations to study demography, selection, and migration.

Biology vs Mathematics — Why Biology Is Different

Biology is dominated by observations; its rules are not yet fully universal, and mathematical representations are still evolving. Unlike math (where operations are precisely defined) or physics (where observations map cleanly to equations), biology requires pattern recognition across messy, context-dependent data. This is exactly why computational methods are so valuable.

Bio-Inspired Algorithms — CS from Biology

Genetic Algorithms (GA)

Population of candidate solutions (chromosomes of bits). Uses selection, crossover, mutation to evolve better solutions over generations.

Artificial Neural Networks

Inspired by biological neurons: weighted inputs → summation → activation function → output. Basis of modern deep learning.

Swarm Intelligence

Ant colony optimization, particle swarm — emergent intelligent behavior from simple agents following local rules.
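
The artificial neuron described above (weighted inputs, summation, activation function, output) can be sketched in a few lines. This is only an illustration of a single forward pass, not a training procedure; the names `sigmoid` and `neuron_output`, the choice of a sigmoid activation, and the example weights are assumptions made for the demo.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0):
    """Weighted inputs -> summation -> activation -> output."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # summation step
    return sigmoid(z)                                       # activation step

# Hypothetical example values: the weighted sum is 1*0.5 + 2*(-0.25) = 0.
y = neuron_output([1.0, 2.0], [0.5, -0.25])
print(y)  # 0.5, because sigmoid(0) = 0.5
```

Deep networks stack many such units in layers and learn the weights from data; the sketch fixes them by hand.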

Levels of Biological Organization

Atom → Molecule → Organelle → Cell → Tissue → Organ → Organism → Population → Ecosystem

Unifying Themes in Biology (Campbell Ch. 1)

  • Evolution — the overarching framework; natural selection drives diversity.
  • Information flow — DNA → RNA → Protein (Central Dogma).
  • Energy transformation — all life transforms energy; photosynthesis captures it, cellular respiration releases it.
  • Structure & function — form at every level correlates with function (e.g., enzyme active sites, membrane phospholipids).
  • Emergent properties — life arises from interaction of non-living components; the whole is greater than the sum of its parts.
  • Cell theory — the cell is the fundamental unit of life; all cells come from pre-existing cells.

Exam tip — Sequence Analysis Problem

Many CompBio problems come from biology to CS (fragment assembly, phylogenetic trees, sequence alignment) and from CS to biology (sequencing by hybridization, DNA computing). Know which direction each flows.

Topic 02 Macromolecules — Carbohydrates, Lipids, Proteins & Nucleic Acids

All living organisms are built from chemicals based mostly on carbon. A compound containing carbon is an organic compound. Carbon's unique ability to form four covalent bonds allows it to build large, complex, varied molecules. The four classes of biological macromolecules are carbohydrates, lipids, proteins, and nucleic acids.

Polymers and Monomers — The Core Concept

Dehydration Synthesis (Condensation)

Monomers are joined by removing a water molecule from between them, forming a covalent bond. This builds polymers.

Reaction: Monomer–OH + H–Monomer → Monomer–Monomer + H₂O

Hydrolysis

Polymers are broken apart by adding a water molecule across a bond, splitting monomers apart. Digestion uses hydrolysis.

Reaction: Monomer–Monomer + H₂O → Monomer–OH + H–Monomer

1. Carbohydrates (CH₂O)ₙ

Carbohydrates serve as fuel and building materials. Their molecular formula is generally multiples of CH₂O. They are classified by size into monosaccharides, disaccharides, and polysaccharides.

Key Terms

Monosaccharide: Single sugar unit. Formula (CH₂O)ₙ where n = 3–7. Classified by carbon count (triose = 3C, pentose = 5C, hexose = 6C) and carbonyl position (aldose = end, ketose = middle).
Glucose: C₆H₁₂O₆, the most common monosaccharide and the primary cellular fuel. The ring form dominates in solution (α-glucose and β-glucose differ in –OH orientation at C1).
Fructose: C₆H₁₂O₆, an isomer of glucose; a ketose sugar found in fruit.
Ribose / Deoxyribose: 5-carbon sugars. Ribose is in RNA; deoxyribose (missing –OH at C2) is in DNA.
Disaccharide: Two monosaccharides joined by a glycosidic linkage via dehydration. Sucrose = glucose + fructose; Lactose = glucose + galactose; Maltose = glucose + glucose.
Glycosidic linkage: Covalent bond between two sugar monomers. The position (α or β, 1–4 or 1–6) determines polymer properties.
Polysaccharide: Polymer of hundreds to thousands of monosaccharides. Function depends on monomer type and linkage geometry.

Important Polysaccharides

Name | Monomer | Linkage | Function | Found in
Starch (amylose/amylopectin) | α-glucose | α-1,4 (amylose, unbranched); α-1,4 + α-1,6 (amylopectin, branched) | Energy storage | Plants
Glycogen | α-glucose | α-1,4 + α-1,6 (highly branched) | Energy storage | Animals, fungi
Cellulose | β-glucose | β-1,4 | Structural support | Plant cell walls
Chitin | N-acetylglucosamine | β-1,4 | Structural support | Arthropod exoskeletons, fungal walls

α vs β linkage — a critical distinction

α-linkages (starch, glycogen) form helical, digestible chains. β-linkages (cellulose, chitin) form straight, hydrogen-bonded sheets that are much stronger and indigestible by most animals. Humans lack cellulase.

2. Lipids

Lipids are hydrophobic (water-fearing) molecules. Unlike the other three macromolecules, lipids are not true polymers — they don't form long repetitive chains of identical monomers. They are unified by being mostly nonpolar hydrocarbons that do not mix with water.

Fatty acid: Long hydrocarbon chain (typically 14–20 carbons) with a carboxyl group (–COOH) at one end. The backbone of most lipids.
Saturated fatty acid: No C=C double bonds; all carbons carry the maximum number of hydrogens. Straight chain → solid at room temperature (e.g., palmitic acid). Found in animal fats.
Unsaturated fatty acid: One or more C=C double bonds; each introduces a "kink" in the chain → liquid at room temperature (oils). Found in plants and fish. Polyunsaturated = multiple double bonds.
Triacylglycerol (fat): Glycerol + 3 fatty acid chains joined by ester linkages. Primary long-term energy storage; yields more than twice the energy per gram compared to carbohydrates (9 kcal/g vs 4 kcal/g).
Phospholipid: Glycerol + 2 fatty acids + phosphate group + polar head. Amphipathic (hydrophilic head + hydrophobic tails). Spontaneously forms bilayers → the structural basis of all cell membranes.
Steroid: Four fused carbon rings. Cholesterol is the foundational steroid in animal membranes; it modulates membrane fluidity. Steroid hormones (testosterone, estrogen, cortisol) are all cholesterol derivatives.
Wax: Long fatty acid + long alcohol chain. Highly hydrophobic; protective coating on plant cuticles, bird feathers, insect exoskeletons.

Ester linkage in fat formation (dehydration): Glycerol–OH + HOOC–Fatty acid → Glycerol–OOC–Fatty acid + H₂O

3. Proteins

Proteins are the most structurally and functionally diverse macromolecules. They are polymers of amino acids joined by peptide bonds. A protein's specific shape determines its specific function. Proteins carry out virtually every process in a cell.

Functions of Proteins

  • Enzymes — biological catalysts that speed up reactions (e.g., amylase, DNA polymerase, ATP synthase).
  • Structural — collagen (connective tissue), keratin (hair, nails), actin & myosin (muscle).
  • Transport — hemoglobin (O₂), membrane transport proteins, channel proteins.
  • Signaling — hormones (insulin), receptors, second-messenger proteins.
  • Defense — antibodies (immunoglobulins) bind to foreign antigens.
  • Gene regulation — transcription factors bind DNA to turn genes on/off.
  • Motor proteins — kinesin, dynein move along cytoskeletal tracks.

Amino Acid Structure

All 20 amino acids share a central α-carbon bonded to: an amino group (–NH₂), a carboxyl group (–COOH), a hydrogen (–H), and a variable R group (side chain). The R group determines the amino acid's chemical properties.

Nonpolar / Hydrophobic R

Glycine, Alanine, Valine, Leucine, Isoleucine, Proline, Phenylalanine, Methionine, Tryptophan

Polar / Uncharged R

Serine, Threonine, Cysteine, Asparagine, Glutamine, Tyrosine

Charged R (ionized)

Positive: Lysine, Arginine, Histidine.
Negative: Aspartate, Glutamate.

Peptide bond formation (dehydration): –COOH (aa₁) + H₂N– (aa₂) → –CO–NH– (peptide bond) + H₂O

Four Levels of Protein Structure

Primary (1°): The specific sequence of amino acids in the polypeptide chain. Encoded directly by the gene. All higher-order structure follows from this.
Secondary (2°): Local folding patterns stabilized by hydrogen bonds between backbone –C=O and –N–H groups. Main forms: α-helix (right-handed coil) and β-pleated sheet (zigzag strands).
Tertiary (3°): Overall 3D shape of a single polypeptide chain. Stabilized by interactions among R groups: disulfide bridges (–S–S–), hydrogen bonds, ionic bonds, hydrophobic interactions, van der Waals forces.
Quaternary (4°): Association of two or more polypeptide chains (subunits) into a multi-subunit protein. Example: hemoglobin = 4 subunits (2α + 2β). Not all proteins have quaternary structure.

Denaturation

Disruption of a protein's structure (secondary and beyond) by heat, extreme pH, or chemicals. The primary structure remains intact, but function is lost. Some proteins can renature (e.g., ribonuclease); most cannot in vivo.

4. Nucleic Acids — DNA & RNA

Nucleic acids store, transmit, and help express genetic information. They are polymers of nucleotides. Each nucleotide has three components: a 5-carbon sugar (deoxyribose in DNA, ribose in RNA), a nitrogenous base, and a phosphate group. Nucleotides are joined by phosphodiester bonds between the 3'-OH of one sugar and the 5'-phosphate of the next.

Property | DNA | RNA
Sugar | Deoxyribose (–H at 2') | Ribose (–OH at 2')
Bases | A, T, G, C | A, U, G, C (Uracil replaces Thymine)
Strands | Double-stranded (double helix) | Single-stranded (can fold)
Location | Nucleus, mitochondria, chloroplasts | Nucleus, cytoplasm, ribosomes
Function | Long-term genetic information storage | Protein synthesis (mRNA, tRNA, rRNA), regulation (miRNA, siRNA)
Stability | Very stable (deoxyribose + double-stranded) | Less stable (2'-OH is reactive)

Nitrogenous Bases — Base-Pairing Rules

Purines (double ring)

Adenine (A) and Guanine (G)

Pyrimidines (single ring)

Cytosine (C), Thymine (T/DNA only), Uracil (U/RNA only)

Chargaff's Rules (DNA complementary base pairing): A pairs with T (2 H-bonds) | G pairs with C (3 H-bonds)

In any double-stranded DNA: %A = %T and %G = %C, so purines equal pyrimidines (A+G = T+C). The A+T content need not equal the G+C content; that ratio varies by organism.
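
The composition rule above can be checked on a toy duplex. The sketch below builds the complementary strand and counts bases over both strands; the helper names `reverse_complement` and `duplex_composition` and the 10-nt example sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def duplex_composition(strand):
    """Base fractions counted over both strands of the duplex formed by `strand`."""
    counts = Counter(strand + reverse_complement(strand))
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ATGC"}

# Hypothetical 10-nt strand used only for illustration.
comp = duplex_composition("ATGCGGATTA")
print(comp)  # {'A': 0.3, 'T': 0.3, 'G': 0.2, 'C': 0.2}
```

Here %A = %T and %G = %C hold by construction, while A+T (60%) differs from G+C (40%). A single strand on its own does not have to obey either equality.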

ATP — The Cell's Energy Currency

Adenosine triphosphate (ATP) is a nucleotide (adenine + ribose + 3 phosphates) that is the direct energy currency of all cells. The bond between the second and third phosphate is a high-energy phosphoanhydride bond. Hydrolysis of this bond releases ~7.3 kcal/mol of free energy under standard conditions.

ATP Hydrolysis: ATP + H₂O → ADP + Pᵢ + energy (~7.3 kcal/mol)

This exergonic reaction is coupled to endergonic cellular work (biosynthesis, transport, movement). ATP is regenerated by cellular respiration and photosynthesis.

Macromolecule Quick-Reference

Macromolecule | Monomer | Bond | Key Functions | Example
Carbohydrate | Monosaccharide | Glycosidic | Energy, structure | Glucose → starch / cellulose
Lipid (fat) | Glycerol + fatty acids | Ester | Energy storage, insulation | Triacylglycerol
Lipid (membrane) | Glycerol + 2 FA + phosphate | Ester | Membrane bilayer | Phosphatidylcholine
Protein | Amino acid | Peptide | Catalysis, structure, signaling, transport | Hemoglobin, collagen, insulin
DNA | Deoxyribonucleotide | Phosphodiester | Genetic information storage | Chromosome
RNA | Ribonucleotide | Phosphodiester | Gene expression, catalysis | mRNA, tRNA, rRNA, ribozyme
Topic 03 Genetics — Molecular Basis of Inheritance & Gene Expression

Genetics at the molecular level explains how information is stored in DNA, copied faithfully during cell division, and converted into functional proteins. This section covers DNA replication, the Central Dogma (transcription + translation), gene regulation, and mutations — all relevant computational biology targets.

DNA Double Helix — Structure

  • Watson-Crick model (1953): Two antiparallel polynucleotide strands coiled around a common axis. One strand runs 5'→3', the other 3'→5'.
  • Sugar-phosphate backbones on the outside; nitrogenous bases stacked on the inside, perpendicular to the helix axis (~0.34 nm apart; one full turn = 10 base pairs = 3.4 nm).
  • Base pairs stabilized by hydrogen bonds (A=T via 2; G≡C via 3) and by hydrophobic stacking interactions between adjacent base pairs.
  • Major and minor grooves: Proteins that interact with DNA (transcription factors, restriction enzymes) typically bind the major groove where base-specific contacts are possible.
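
The helix dimensions quoted above (0.34 nm per base pair, 10 base pairs per turn) turn directly into arithmetic. The following is a small sketch using only those two figures; the function names and the 1 kb fragment are assumptions for the example.

```python
RISE_PER_BP_NM = 0.34   # spacing between stacked base pairs, from the text
BP_PER_TURN = 10        # base pairs per full helical turn, from the text

def helix_length_nm(n_bp):
    """Contour length of a double helix with n_bp base pairs, in nanometres."""
    return n_bp * RISE_PER_BP_NM

def helix_turns(n_bp):
    """Number of full helical turns spanned by n_bp base pairs."""
    return n_bp / BP_PER_TURN

n_bp = 1000  # hypothetical 1 kb fragment
print(f"{helix_length_nm(n_bp):.0f} nm, {helix_turns(n_bp):.0f} turns")  # 340 nm, 100 turns
```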

DNA Replication — Semiconservative Model

Each new DNA molecule consists of one original (template) strand and one newly synthesized strand — the semiconservative model (confirmed by Meselson-Stahl experiment, 1958).

For a visual overview of the replication process, see this excellent video!

[Video] DNA Replication Animation: https://www.youtube.com/watch?v=TNKWgcFPHqw

Key Enzymes & Proteins

Helicase: Unwinds the double helix by breaking H-bonds between bases at the replication fork. Creates positive supercoiling ahead of the fork.
Topoisomerase: Relieves torsional stress ahead of the replication fork by cutting, rotating, and rejoining DNA. DNA gyrase (a topoisomerase II) in bacteria is the target of fluoroquinolone antibiotics.
Primase: Synthesizes short RNA primers (~10 nt) providing a 3'-OH start point, since DNA polymerase cannot initiate new strands.
DNA Polymerase III (bacteria): Adds deoxyribonucleotides to the 3'-OH end of the growing strand (5'→3' direction). Also has 3'→5' proofreading exonuclease activity (~1 error per 10⁷ bases).
DNA Polymerase I: Removes RNA primers and replaces them with DNA.
DNA Ligase: Seals the nick between Okazaki fragments (lagging strand) by forming phosphodiester bonds.
Single-Strand Binding Proteins (SSBPs): Stabilize single-stranded DNA at the replication fork, preventing re-annealing.

Leading vs Lagging Strand

Leading Strand

Synthesized continuously in the 5'→3' direction toward the replication fork. Requires one RNA primer.

Lagging Strand

Synthesized discontinuously (away from fork) as Okazaki fragments (100–200 nt in eukaryotes, 1000–2000 in prokaryotes). Each fragment needs its own RNA primer.

End Replication Problem & Telomeres

Eukaryotic chromosomes shorten with each replication at the 3' end of the lagging strand template. Telomeres (TTAGGG repeats) cap chromosome ends. Telomerase (reverse transcriptase carrying its own RNA template) extends telomeres in germline cells and stem cells. Cancer cells often reactivate telomerase for immortality.

The Central Dogma of Molecular Biology

Information flow (Crick, 1958): DNA →(Replication)→ DNA →(Transcription)→ RNA →(Translation)→ Protein

Reverse transcription (RNA→DNA, by retroviruses) and RNA replication (RNA→RNA, by RNA viruses) are special exceptions.

Transcription (DNA → mRNA)

Transcription produces an RNA molecule complementary to the DNA template strand (read 3'→5'), building mRNA 5'→3'. It occurs in the nucleus in eukaryotes. The resulting mRNA is the same sequence as the coding strand (except T→U).

  1. Initiation: RNA polymerase binds the promoter (in bacteria: –10 and –35 sequences; in eukaryotes: TATA box ~25 bp upstream, recognized by transcription factors). The double helix unwinds locally.
  2. Elongation: RNA polymerase moves along template strand 3'→5', synthesizing mRNA 5'→3' at ~40 nt/s. No proofreading mechanism. Ribonucleotides (ATP, UTP, GTP, CTP) are incorporated.
  3. Termination: In bacteria: rho-independent (hairpin in mRNA) or rho-dependent termination. In eukaryotes: polyadenylation signal (AAUAAA) followed by cleavage and poly-A addition.
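
The strand relationships in transcription (template read 3'→5', mRNA built 5'→3', mRNA equal to the coding strand with U in place of T) can be illustrated as below. This is a toy sketch that ignores promoters, processing, and termination; the function names and the 9-nt sequence are hypothetical.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Build mRNA 5'->3' from a template strand given in 3'->5' orientation."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

def coding_to_mrna(coding_5_to_3):
    """Shortcut: the mRNA equals the coding strand with T replaced by U."""
    return coding_5_to_3.replace("T", "U")

coding = "ATGGCCTAA"     # coding strand, 5'->3' (hypothetical)
template = "TACCGGATT"   # template strand, 3'->5' (base-by-base complement)
mrna = transcribe(template)
print(mrna)                            # AUGGCCUAA
print(mrna == coding_to_mrna(coding))  # True
```

Writing the template in 3'→5' orientation keeps the code a plain base-by-base mapping; if the template were stored 5'→3', it would need to be reversed first.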

Eukaryotic Pre-mRNA Processing

  • 5' Capping: Modified guanosine cap added to 5' end (protects from degradation; aids ribosome recognition).
  • 3' Poly-A Tail: ~200 adenine nucleotides added to 3' end (stability; nuclear export signal).
  • RNA Splicing: Introns (non-coding intervening sequences) are removed by the spliceosome. Exons (expressed sequences) are joined together. Alternative splicing of the same pre-mRNA can produce different protein isoforms.

Translation (mRNA → Protein)

Translation occurs at the ribosome (large + small subunit). It reads mRNA in the 5'→3' direction, in triplets called codons, to assemble a polypeptide from amino acids delivered by tRNA molecules. The genetic code is redundant (degenerate), universal (with minor exceptions), and unambiguous.

Reading Frame Rule: 4³ = 64 possible codons = 61 sense codons (encoding the 20 amino acids) + 3 stop codons

Start codon: AUG (methionine); it is one of the 61 sense codons, not an extra codon. Stop codons: UAA, UAG, UGA. Multiple codons per amino acid = degeneracy (wobble at 3rd position).

  1. Initiation: Small ribosomal subunit binds mRNA at the 5' cap (eukaryotes) or Shine-Dalgarno sequence (bacteria). Initiator tRNA (Met-tRNA) pairs with AUG start codon in the P site. Large subunit joins. GTP hydrolysis used.
  2. Elongation: Aminoacyl-tRNA enters the A site; peptide bond forms (peptidyl transferase activity of 23S/28S rRNA is a ribozyme!); ribosome translocates 3 nt in 5'→3' direction; empty tRNA exits via E site.
  3. Termination: Stop codon (UAA, UAG, UGA) enters A site; release factor (not tRNA) binds; polypeptide released; ribosome dissociates. Polyribosomes (polysomes) = multiple ribosomes on one mRNA.
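
The codon arithmetic and the initiation/elongation/termination steps above can be sketched with the standard genetic code. This is a simplified illustration: it scans for the first AUG rather than modelling ribosome binding, and the names `CODON_TABLE` and `translate` plus the example mRNA are assumptions for the demo.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids; codons enumerated in UCAG order.
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":                  # stop codon: no tRNA, chain is released
            break
        peptide.append(aa)
    return "".join(peptide)

stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), stops)         # 64 ['UAA', 'UAG', 'UGA']
print(translate("GGAUGGCCUAAGG"))      # MA
```

The table confirms the count: 64 codons, of which 3 are stops and 61 encode the 20 amino acids, with AUG (M) among them.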

tRNA — The Adapter Molecule

tRNA has a cloverleaf secondary structure folded into an L-shape in 3D. The anticodon loop (3 nucleotides) base-pairs with the mRNA codon. The 3' CCA-OH end carries the specific amino acid, attached by aminoacyl-tRNA synthetase (one per amino acid).

Mutations — Types & Consequences

Point mutation: Single nucleotide change. Can be a transition (purine↔purine or pyrimidine↔pyrimidine) or transversion (purine↔pyrimidine).
Silent (synonymous): Codon changes but still encodes the same amino acid (due to degeneracy). No phenotypic effect.
Missense: Codon changes to a different amino acid. May or may not affect function. Example: HbS (sickle-cell), Glu→Val at position 6 of β-globin.
Nonsense: Codon changes to a stop codon. Truncated protein usually nonfunctional.
Frameshift: Insertion or deletion of nucleotides not in multiples of 3. Shifts the reading frame of all downstream codons, usually yielding a garbled protein.
Insertion/Deletion (Indel): One or more nucleotides inserted or deleted. If the number of nucleotides is not a multiple of 3, it causes a frameshift.
Chromosomal mutation: Large-scale changes: deletion, duplication, inversion, translocation of chromosome segments.
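
Two of the classifications above are purely mechanical and can be sketched directly: transition versus transversion, and frameshift versus in-frame indel. The function names are hypothetical, and the sketch does not decide silent/missense/nonsense, which would also need a codon table.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt are identical; not a mutation")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def indel_effect(n_bases):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return "in-frame" if n_bases % 3 == 0 else "frameshift"

# The sickle-cell Glu -> Val change corresponds to an A -> T substitution
# in the codon (GAG -> GTG on the coding strand).
print(substitution_type("A", "T"))        # transversion
print(substitution_type("C", "T"))        # transition
print(indel_effect(1), indel_effect(3))   # frameshift in-frame
```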

Gene Regulation

Not all genes are expressed at all times in all cells. Gene regulation occurs at multiple levels:

Chromatin remodeling

Histones acetylated (decondense chromatin → accessible) or methylated (condense → silenced). Epigenetic control — heritable but not encoded in DNA sequence.

Transcriptional

Transcription factors (activators and repressors) bind enhancers/silencers (sometimes thousands of bp away, looping to promoter) to control RNA polymerase. Operons in bacteria (lac operon: lacZ, lacY, lacA genes repressed by lac repressor; induced by allolactose).

Post-transcriptional

Alternative splicing; mRNA stability (AU-rich elements in 3'UTR target mRNA for degradation); miRNA/siRNA (small ~22 nt RNAs guide RISC complex to complementary mRNA for cleavage or translational repression).

Translational

Control of ribosome binding; iron-responsive elements (IREs) in ferritin mRNA — IRP protein blocks translation when iron is low.

Post-translational

Protein folding (chaperones), cleavage (zymogen activation), chemical modification (phosphorylation, glycosylation, ubiquitination → proteasomal degradation).

Computational Approaches to Genetics

Sequence Alignment

Finding regions of similarity between DNA/protein sequences. Pairwise (BLAST, Smith-Waterman) and multiple sequence alignment (ClustalW, MUSCLE). Dynamic programming is the basis of optimal alignment.

Fragment Assembly

Reconstructing a genome from millions of short reads (shotgun sequencing). Overlap-layout-consensus or De Bruijn graph approaches. Hamiltonian/Eulerian path problems.

Gene Finding

Identifying ORFs (open reading frames), promoter sequences, splice sites in genomic DNA. Hidden Markov Models (HMMs) widely used.

Motif Finding

Identifying short recurring patterns (transcription factor binding sites) in a set of sequences. Gibbs sampling, expectation-maximization (EM) methods.

Sequence Alignment — Key Concepts (Jones & Pevzner)

Hamming Distance (same-length strings): d_H(v, w) = number of positions where v and w differ

e.g. d_H("ATGC", "AAGC") = 1

Edit (Levenshtein) Distance: d(v, w) = minimum number of insertions, deletions, and substitutions needed to convert v → w
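
Both distances defined above are short to compute. The sketch below uses a position-by-position count for Hamming distance and the usual dynamic-programming recurrence for edit distance, keeping only two rows of the table; the function names are assumptions, not taken from the course material.

```python
def hamming_distance(v, w):
    """Number of positions at which two equal-length strings differ."""
    if len(v) != len(w):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(v, w))

def edit_distance(v, w):
    """Levenshtein distance by dynamic programming, O(len(v) * len(w)) time."""
    prev = list(range(len(w) + 1))          # distance from "" to each prefix of w
    for i, a in enumerate(v, start=1):
        curr = [i]                          # distance from v[:i] to ""
        for j, b in enumerate(w, start=1):
            curr.append(min(
                prev[j] + 1,                # delete a from v
                curr[j - 1] + 1,            # insert b into v
                prev[j - 1] + (a != b),     # substitute (free when a == b)
            ))
        prev = curr
    return prev[-1]

print(hamming_distance("ATGC", "AAGC"))  # 1
print(edit_distance("ATGC", "AAGC"))     # 1
print(hamming_distance("ATGC", "TGCA"))  # 4
print(edit_distance("ATGC", "TGCA"))     # 2
```

The last pair shows why the two measures differ: a one-base shift makes every position mismatch, but only one deletion and one insertion are needed to convert one string into the other.
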
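# Smith-Waterman local alignment (score matrix only, no traceback)
# Assumes match > 0 and mismatch, gap < 0 (penalties are added, not subtracted).
def smith_waterman(seq1, seq2, match, mismatch, gap):
    """Return the local-alignment score matrix H (rows: seq1, columns: seq2)."""
    H = [[0] * (len(seq2) + 1) for _ in range(len(seq1) + 1)]  # row 0 and column 0 stay 0
    for i in range(1, len(seq1) + 1):
        for j in range(1, len(seq2) + 1):
            score = match if seq1[i-1] == seq2[j-1] else mismatch
            H[i][j] = max(
                0,                       # restart: a local score never drops below 0
                H[i-1][j-1] + score,     # diagonal (match/mismatch)
                H[i-1][j]   + gap,       # gap in seq2
                H[i][j-1]   + gap        # gap in seq1
            )
    return H  # the best local score is the largest entry in H

# De Bruijn graph for genome assembly
# k-mer decomposition of a read
def kmer_decomp(seq, k):
    """Return every overlapping k-mer of seq, in order."""
    return [seq[i:i+k] for i in range(len(seq) - k + 1)]

# Build De Bruijn graph: nodes=(k-1)-mers, edges=k-mers
def de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = {}
    for kmer in kmers:
        prefix = kmer[:-1]   # first k-1 bases: source node
        suffix = kmer[1:]    # last k-1 bases: target node
        graph.setdefault(prefix, []).append(suffix)  # one edge per k-mer occurrence
    return graph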
Topic 04 Cell Transport — Membrane Structure & Transport

The plasma membrane is a selectively permeable barrier that regulates the passage of substances in and out of the cell. Its structure is described by the Fluid Mosaic Model (Singer & Nicolson, 1972): a fluid phospholipid bilayer with embedded proteins that can move laterally.

Membrane Structure

Phospholipid bilayer: Two layers of phospholipids arranged with hydrophilic heads facing aqueous environments (inside and outside) and hydrophobic tails facing inward. Forms a stable self-sealing barrier.
Membrane fluidity: Membrane must stay fluid for function. Unsaturated fatty acids (kinked tails prevent packing) and cholesterol (at moderate temperatures, prevents phospholipids from packing too closely or becoming too rigid) both maintain fluidity.
Integral proteins: Embedded within the bilayer, often spanning it (transmembrane proteins). Nonpolar transmembrane α-helices interact with the hydrophobic core. Functions: channels, carriers, receptors, enzymes.
Peripheral proteins: Loosely associated with membrane surfaces (not embedded). Often associated with integral proteins. Examples: cytoskeletal attachment proteins, some enzymes.
Glycoproteins/Glycolipids: Carbohydrate chains attached to proteins/lipids on the extracellular surface. Form the "glycocalyx", used in cell-cell recognition, blood type antigens, immune signaling.

Passive Transport — No Energy Required

Passive transport moves substances down their concentration (or electrochemical) gradient — from high to low concentration. No ATP needed; driven by thermodynamics (entropy increase).

Simple diffusion: Small nonpolar molecules (O₂, CO₂, N₂) and some small uncharged polar molecules (e.g., ethanol) cross the lipid bilayer directly. Rate proportional to concentration gradient.
Osmosis: Diffusion of water across a selectively permeable membrane from the hypotonic side (lower solute concentration, higher water concentration) to the hypertonic side (higher solute concentration, lower water concentration). Water moves down its own concentration gradient.
Facilitated diffusion: Polar or large molecules cross with the help of membrane proteins (channel proteins or carrier proteins). Still no ATP required; driven by the concentration gradient.
Channel proteins: Form hydrophilic pores. Ion channels are gated (ligand-gated, voltage-gated, mechanically-gated) or always open (leak channels). Aquaporins are specialized water channels.
Carrier proteins: Bind a specific solute, then undergo a conformational change to release it on the other side. More selective, slower than channels. GLUT transporters carry glucose.

Osmotic Terminology

Hypotonic Solution

Solute concentration lower than cell interior. Water flows into cell → cell swells (animal: lysis; plant: turgid/healthy).

Hypertonic Solution

Solute concentration higher than cell interior. Water flows out → cell shrinks (animal: crenation; plant: plasmolysis).

Isotonic Solution

Solute concentration equal to cell interior. No net movement of water. Cell maintains normal volume.

Osmotic Pressure (van 't Hoff equation): π = iMRT

π = osmotic pressure, i = van 't Hoff factor (ionization), M = molarity, R = gas constant (0.0821 L·atm/mol·K), T = temperature (Kelvin). Critical for biomedical IV solutions.
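
As a worked instance of π = iMRT, the sketch below evaluates the osmotic pressure of a saline-like solution. The example values (0.154 M NaCl, i = 2 assuming complete dissociation and ideal behaviour, 310 K) and the function name are assumptions chosen for illustration.

```python
R_L_ATM = 0.0821  # gas constant in L·atm/(mol·K), as given in the text

def osmotic_pressure_atm(i, molarity, temp_kelvin):
    """van 't Hoff equation: pi = i * M * R * T, in atmospheres."""
    return i * molarity * R_L_ATM * temp_kelvin

# Assumed example: 0.154 M NaCl, i = 2 (ideal, fully dissociated), 310 K (37 °C).
pi = osmotic_pressure_atm(i=2, molarity=0.154, temp_kelvin=310)
print(round(pi, 2))  # 7.84
```

Real solutions deviate from ideal behaviour, so the effective i for NaCl is somewhat below 2 and the measured pressure is correspondingly lower.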

Active Transport — Energy Required

Active transport moves substances against their concentration gradient — from low to high concentration. Requires ATP (primary active transport) or an ion gradient (secondary active transport).

Primary active transport: Protein pumps use ATP hydrolysis directly to transport ions/molecules against gradient. Classic example: Na⁺/K⁺ ATPase, which pumps 3 Na⁺ out and 2 K⁺ in per ATP hydrolyzed. Maintains resting membrane potential and cell volume.
Na⁺/K⁺ ATPase: Electrogenic pump (net transfer of charge = 1 positive charge out per cycle), maintaining −70 mV resting potential in neurons. Constitutes ~25% of total ATP usage in the human body.
Proton pump (H⁺ ATPase): In plants, fungi, bacteria. Pumps H⁺ out, generating proton gradient used to drive secondary active transport (e.g., sucrose uptake in plant phloem).
Secondary active transport (cotransport): Uses the gradient of one ion (usually Na⁺, established by primary pump) to move another substance against its gradient. Symporter: both molecules same direction. Antiporter: opposite directions. SGLT1 (intestine): Na⁺ symport with glucose.

Na⁺/K⁺ ATPase Summary: 3 Na⁺(in) + 2 K⁺(out) + ATP → 3 Na⁺(out) + 2 K⁺(in) + ADP + Pᵢ

Net result: cytoplasm negative relative to outside. High K⁺ inside; high Na⁺ outside. Exploited by every excitable cell.

Bulk Transport — Moving Large Molecules

Endocytosis: Cell engulfs material by membrane invagination, forming a vesicle. Three types:
  • Phagocytosis: "Cell eating". Large particles (bacteria, dead cells) are engulfed as pseudopods surround them. Vesicle → phagosome → fuses with lysosome.
  • Pinocytosis: "Cell drinking". Small droplets of extracellular fluid are engulfed non-specifically. Constitutive (always occurring).
  • Receptor-mediated endocytosis: Specific ligands bind receptors in coated pits (clathrin-coated); vesicle formed. Very selective uptake. LDL (cholesterol) uptake is a classic example; defect causes familial hypercholesterolemia.
Exocytosis: Vesicles from Golgi apparatus or secretory vesicles fuse with plasma membrane, releasing contents outside. Used for secretion (neurotransmitters, hormones, digestive enzymes, mucus).

Transport Comparison Table

Type | Direction | Energy | Protein needed? | Examples
Simple diffusion | High → Low | None | No | O₂, CO₂, ethanol
Osmosis | High [H₂O] → Low [H₂O] | None | No (or aquaporin) | Water across all membranes
Facilitated diffusion (channel) | High → Low | None | Yes (channel) | Ion channels, aquaporins
Facilitated diffusion (carrier) | High → Low | None | Yes (carrier) | GLUT1 (glucose), uniporters
Primary active | Low → High | ATP | Yes (pump) | Na⁺/K⁺ ATPase, H⁺ pump, Ca²⁺ pump
Secondary active (symport) | Low → High (solute) | Ion gradient | Yes | SGLT1 (Na⁺/glucose), NKCC
Secondary active (antiport) | Low → High (solute) | Ion gradient | Yes | Na⁺/H⁺ exchanger, Na⁺/Ca²⁺ exchanger
Endocytosis/Exocytosis | In/Out | ATP | Yes (vesicle machinery) | LDL uptake, neurotransmitter release
Topic 05 Metabolism — Energy, Enzymes & Cellular Respiration

Metabolism is the totality of an organism's chemical reactions. It is organized into metabolic pathways: ordered sequences of reactions, each catalyzed by a specific enzyme, transforming substrates into products. Metabolic pathways are either catabolic (degradation, releasing energy) or anabolic (biosynthesis, requiring energy).

Thermodynamic Foundations

Free energy (G): Gibbs free energy, the portion of a system's energy available to do work at constant temperature and pressure. Predicts spontaneity of reactions.
ΔG (free energy change): ΔG = G_products − G_reactants. If ΔG < 0: exergonic (spontaneous, releases energy). If ΔG > 0: endergonic (non-spontaneous, requires energy input). If ΔG = 0: at equilibrium.
Exergonic reaction: ΔG < 0; energy released to surroundings; proceeds spontaneously. Cellular respiration (glucose oxidation): ΔG = −686 kcal/mol.
Endergonic reaction: ΔG > 0; energy absorbed from surroundings; not spontaneous; requires energy coupling (e.g., from ATP hydrolysis). Protein synthesis, active transport, photosynthesis.
Energy coupling: Exergonic reactions (ATP hydrolysis) are coupled to endergonic reactions, making them thermodynamically favorable. The cell's economy: ATP is the coupling agent.

Gibbs Free Energy Equation: ΔG = ΔH − TΔS

ΔH = enthalpy change (heat), T = absolute temperature (K), ΔS = entropy change. Also: ΔG° = −RT ln Keq (standard free energy and equilibrium constant).
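
The two relations above, ΔG = ΔH − TΔS and ΔG° = −RT ln Keq, can be exercised numerically. The sketch below is illustrative only: the ΔH and ΔS values are hypothetical, the function names are assumptions, and R is taken as 1.987 × 10⁻³ kcal/(mol·K) so that units match the kcal/mol figures used in this guide.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol·K)

def delta_g(delta_h, temp_kelvin, delta_s):
    """Gibbs equation: dG = dH - T*dS (dH in kcal/mol, dS in kcal/(mol·K))."""
    return delta_h - temp_kelvin * delta_s

def is_spontaneous(dg):
    """Exergonic (dG < 0) reactions are spontaneous."""
    return dg < 0

def keq_from_delta_g0(delta_g0, temp_kelvin=298.0):
    """Invert dG° = -RT ln Keq to get Keq = exp(-dG° / RT)."""
    return math.exp(-delta_g0 / (R_KCAL * temp_kelvin))

# Hypothetical reaction with dH = +10 kcal/mol and dS = +0.05 kcal/(mol·K):
# endergonic at low temperature, exergonic once T*dS outweighs dH.
print(is_spontaneous(delta_g(10.0, 150.0, 0.05)))  # False
print(is_spontaneous(delta_g(10.0, 298.0, 0.05)))  # True

# Using the ~-7.3 kcal/mol quoted for ATP hydrolysis as dG°:
print(f"{keq_from_delta_g0(-7.3):.2e}")
```

A strongly negative ΔG° corresponds to Keq far above 1, which is why ATP hydrolysis lies so heavily toward products at equilibrium.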

Enzymes — Biological Catalysts

Enzymes are proteins (or ribozymes) that lower the activation energy (Eₐ) of reactions without being consumed. They do not change ΔG or the equilibrium; they only speed the rate at which equilibrium is reached.

Active site: The specific pocket or cleft where substrate binds. Shaped to complement the substrate (induced fit model: active site changes shape slightly upon substrate binding, improving catalysis).
Substrate: The reactant molecule that binds the enzyme's active site. The enzyme-substrate complex forms, Eₐ is lowered, and products are then released.
Cofactor: Non-protein molecule required for enzyme activity. Inorganic: metal ions (Zn²⁺, Fe²⁺, Mg²⁺). Organic cofactors = coenzymes (e.g., NAD⁺, FAD, coenzyme A; often vitamins or vitamin derivatives).
Competitive inhibition: Inhibitor resembles substrate and competes for the active site. Effect overcome by increasing [substrate]. Increases apparent Kₘ; Vₘₐₓ unchanged. Example: statins compete with HMG-CoA for the active site of HMG-CoA reductase.
Noncompetitive inhibition: Inhibitor binds an allosteric site (not the active site), changing enzyme shape. Reduces Vₘₐₓ; Kₘ unchanged. Cannot be overcome by adding more substrate.
Allosteric regulation: Regulatory molecule binds away from the active site. Activators stabilize the active form; inhibitors stabilize the inactive form. Allosteric enzymes often show cooperativity (sigmoidal kinetics). Key for metabolic pathway control.
Feedback inhibition: End-product of a pathway inhibits an early enzyme in that pathway. Prevents overproduction. Classic example: isoleucine inhibits threonine deaminase (first enzyme in its own biosynthesis pathway).
Michaelis-Menten Kinetics: v = (Vₘₐₓ × [S]) / (Kₘ + [S])

v = reaction rate, Vₘₐₓ = maximum rate, [S] = substrate concentration, Kₘ = Michaelis constant (substrate concentration at ½Vₘₐₓ). Low Kₘ = high affinity for substrate.
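
To see how the two inhibition types change the curve, the rate equation can be evaluated directly. The sketch below assumes the standard textbook treatment in which an inhibitor at concentration [I] with inhibition constant Kᵢ scales Kₘ (competitive) or Vₘₐₓ (noncompetitive) by the factor (1 + [I]/Kᵢ). Kᵢ and the function names are not defined in this guide and are introduced only for illustration.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate: v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)

def competitive_rate(s, vmax, km, inhibitor, ki):
    """Competitive inhibition: apparent Km rises, Vmax is unchanged."""
    km_apparent = km * (1 + inhibitor / ki)
    return mm_rate(s, vmax, km_apparent)

def noncompetitive_rate(s, vmax, km, inhibitor, ki):
    """Noncompetitive inhibition: apparent Vmax falls, Km is unchanged."""
    vmax_apparent = vmax / (1 + inhibitor / ki)
    return mm_rate(s, vmax_apparent, km)

# Arbitrary illustrative parameters (units omitted).
VMAX, KM, INHIBITOR, KI = 10.0, 2.0, 1.0, 1.0

for s in (2.0, 20.0, 2000.0):
    print(s,
          round(mm_rate(s, VMAX, KM), 3),
          round(competitive_rate(s, VMAX, KM, INHIBITOR, KI), 3),
          round(noncompetitive_rate(s, VMAX, KM, INHIBITOR, KI), 3))
```

At [S] = Kₘ the uninhibited rate is exactly half of Vₘₐₓ. As [S] grows, the competitive curve catches up with the uninhibited one, while the noncompetitive curve levels off at the reduced Vₘₐₓ no matter how much substrate is added.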

Cellular Respiration — Overview

Cellular respiration is the controlled release of chemical energy from organic molecules to produce ATP. In the presence of oxygen (aerobic respiration), glucose is completely oxidized.

Overall Aerobic Respiration: C₆H₁₂O₆ + 6O₂ → 6CO₂ + 6H₂O + ~30–32 ATP

ΔG° = −686 kcal/mol. Energy is harvested gradually through redox reactions. NAD⁺ and FAD act as electron carriers (reduced to NADH and FADH₂).

Glycolysis (Cytoplasm) → Pyruvate Oxidation (Mitochondrial matrix) → Citric Acid Cycle (Mitochondrial matrix) → Oxidative Phosphorylation (Inner mitochondrial membrane)

Stage 1: Glycolysis (Cytoplasm)

Glucose (6C) is split into two pyruvate (3C) molecules through 10 enzymatic steps. Occurs in cytosol. Requires no oxygen (anaerobic-capable).

Glycolysis Net Equation: Glucose + 2 NAD⁺ + 2 ADP + 2 Pᵢ → 2 Pyruvate + 2 NADH + 2 H⁺ + 2 ATP + 2 H₂O

Investment phase (steps 1–5): 2 ATP consumed. Payoff phase (steps 6–10): 4 ATP + 2 NADH produced. Net: 2 ATP, 2 NADH per glucose.

Key glycolytic enzymes: Hexokinase (step 1); Phosphofructokinase-1/PFK-1 (step 3, main regulatory enzyme, inhibited by ATP, citrate; activated by AMP); Pyruvate kinase (step 10).
Substrate-level phosphorylation: ATP produced by direct transfer of a phosphate group from a phosphorylated substrate to ADP. Occurs in glycolysis (steps 7 & 10) and citric acid cycle (step 5).

Stage 2: Pyruvate Oxidation (Pyruvate Dehydrogenase Complex)

Per pyruvate: Pyruvate + CoA + NAD⁺ → Acetyl-CoA + CO₂ + NADH

2 pyruvates per glucose → 2 Acetyl-CoA + 2 CO₂ + 2 NADH. Occurs in mitochondrial matrix. Pyruvate dehydrogenase complex contains three enzymes and five cofactors (including thiamine pyrophosphate, lipoic acid, CoA, FAD, NAD⁺).

Stage 3: Citric Acid Cycle (Krebs Cycle) — per Acetyl-CoA

Net per Acetyl-CoA (one turn): Acetyl-CoA + 3 NAD⁺ + FAD + ADP + Pᵢ + 2H₂O → 2 CO₂ + 3 NADH + FADH₂ + ATP + CoA

Per glucose (×2): 6 NADH, 2 FADH₂, 2 ATP (substrate-level), 4 CO₂. Key intermediates: citrate (6C), isocitrate, α-ketoglutarate (5C), succinyl-CoA (4C), succinate, fumarate, malate, oxaloacetate (regenerated).

Key CAC enzymes: Citrate synthase (regulated), Isocitrate dehydrogenase (produces first CO₂; inhibited by NADH/ATP), α-Ketoglutarate dehydrogenase (produces second CO₂; similar to pyruvate DH complex).

Stage 4: Oxidative Phosphorylation (Inner Mitochondrial Membrane)

NADH and FADH₂ donate electrons to the electron transport chain (ETC). Energy released is used to pump H⁺ across the inner mitochondrial membrane, creating a proton gradient (proton motive force). H⁺ flows back through ATP synthase (chemiosmosis), driving ATP synthesis.

Complex I (NADH dehydrogenase): Accepts 2e⁻ from NADH; pumps 4 H⁺ across membrane. Transfers electrons to ubiquinone (CoQ).
Complex II (Succinate dehydrogenase): Accepts 2e⁻ from FADH₂; does NOT pump H⁺. Transfers electrons to CoQ. Part of both CAC and ETC.
Complex III (Cytochrome bc₁): Accepts electrons from CoQH₂; pumps 4 H⁺. Transfers electrons to cytochrome c.
Complex IV (Cytochrome c oxidase): Accepts electrons from cytochrome c; pumps 2 H⁺; transfers electrons to O₂ (final electron acceptor), forming H₂O. O₂ must be present for ETC to continue.
ATP synthase (Complex V): H⁺ flows through it (down gradient) from intermembrane space to matrix. Rotational catalysis: the F₀ subunit rotates, driving conformational changes in F₁ subunit that catalyze ATP synthesis from ADP + Pᵢ. ~2.5 ATP per NADH; ~1.5 ATP per FADH₂.
Chemiosmosis (Mitchell's Hypothesis, 1961): Proton motive force = ΔΨ (membrane potential) + ΔpH (chemical gradient)

Inhibitors of oxidative phosphorylation: DNP (uncoupler; dissipates the H⁺ gradient without ATP synthesis, releasing heat); oligomycin (blocks ATP synthase); cyanide/CO (block Complex IV, so electrons cannot be passed to O₂ and the whole chain backs up).
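
The ~2.5 and ~1.5 ATP figures follow from the proton counts listed for Complexes I to IV. The sketch below reproduces that arithmetic. One number is an assumption not stated in this guide: that roughly 4 H⁺ must flow back per ATP made (about 3 through ATP synthase plus 1 for phosphate import). The dictionary and function names are illustrative only.

```python
# H+ pumped per pair of electrons at each complex (values from the list above).
PROTONS_PUMPED = {"I": 4, "II": 0, "III": 4, "IV": 2}

# NADH feeds electrons in at Complex I, FADH2 at Complex II.
ROUTES = {
    "NADH": ["I", "III", "IV"],
    "FADH2": ["II", "III", "IV"],
}

# Assumption: about 4 H+ return to the matrix per ATP synthesized.
H_PER_ATP = 4

def protons_per_carrier(carrier):
    """Total H+ pumped as one carrier's electron pair travels to O2."""
    return sum(PROTONS_PUMPED[c] for c in ROUTES[carrier])

def atp_per_carrier(carrier, h_per_atp=H_PER_ATP):
    """Approximate ATP yield for one NADH or FADH2."""
    return protons_per_carrier(carrier) / h_per_atp

for carrier in ROUTES:
    print(carrier, protons_per_carrier(carrier), atp_per_carrier(carrier))
```

FADH₂ yields less ATP simply because its electrons bypass Complex I and so miss the first 4 pumped protons.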

ATP Yield Summary (per glucose)

Stage              | Location  | Direct ATP | NADH | FADH₂ | ATP equiv.
Glycolysis         | Cytoplasm | 2          | 2    | 0     | 2 + 5 = 7
Pyruvate oxidation | Matrix    | 0          | 2    | 0     | 5
Citric acid cycle  | Matrix    | 2          | 6    | 2     | 2 + 15 + 3 = 20
Total (approx.)    | -         | 4          | 10   | 2     | ~30–32 ATP
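
The totals in this summary can be recomputed from the per-stage counts using ~2.5 ATP per NADH and ~1.5 ATP per FADH₂. The sketch below also shows one common explanation for the 30 to 32 range, which is an assumption added here rather than something stated in this guide: the 2 NADH made in the cytoplasm by glycolysis may be worth only ~1.5 ATP each if their electrons enter the mitochondrion through the glycerol phosphate shuttle instead of the malate-aspartate shuttle.

```python
ATP_PER_NADH = 2.5
ATP_PER_FADH2 = 1.5

# Per-glucose counts taken from the summary table.
STAGES = {
    "Glycolysis":         {"atp": 2, "nadh": 2, "fadh2": 0},
    "Pyruvate oxidation": {"atp": 0, "nadh": 2, "fadh2": 0},
    "Citric acid cycle":  {"atp": 2, "nadh": 6, "fadh2": 2},
}

def atp_equivalents(stage, cytosolic_nadh_yield=ATP_PER_NADH):
    """ATP equivalents for one stage; glycolytic NADH may be worth less."""
    counts = STAGES[stage]
    nadh_yield = cytosolic_nadh_yield if stage == "Glycolysis" else ATP_PER_NADH
    return counts["atp"] + counts["nadh"] * nadh_yield + counts["fadh2"] * ATP_PER_FADH2

def total_atp(cytosolic_nadh_yield=ATP_PER_NADH):
    """Total ATP equivalents per glucose across all stages."""
    return sum(atp_equivalents(s, cytosolic_nadh_yield) for s in STAGES)

for stage in STAGES:
    print(stage, atp_equivalents(stage))
print("upper estimate:", total_atp())
print("lower estimate:", total_atp(cytosolic_nadh_yield=1.5))
```

Only 4 of the ATP come from substrate-level phosphorylation; everything else depends on the electron carriers being oxidized by the ETC.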

Anaerobic Respiration & Fermentation

When O₂ is absent or insufficient, cells regenerate NAD⁺ from NADH via fermentation — allowing glycolysis to continue producing 2 ATP per glucose.

Lactic Acid Fermentation

Pyruvate + NADH → Lactate + NAD⁺. Occurs in muscle cells during intense exercise, and in bacteria used to make yogurt & cheese.

Alcohol Fermentation

Pyruvate → Acetaldehyde + CO₂; then Acetaldehyde + NADH → Ethanol + NAD⁺. Occurs in yeast and some bacteria. Basis of brewing and baking (CO₂ makes dough rise).

Photosynthesis — Capturing Light Energy

Photosynthesis converts light energy into chemical energy stored in glucose. Occurs in chloroplasts.

Overall equation: 6CO₂ + 6H₂O + light energy → C₆H₁₂O₆ + 6O₂

Reverse of aerobic respiration in terms of atoms, but mechanistically very different.

Light-dependent reactions (thylakoid membrane)

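Chlorophyll absorbs light (mainly red ~680 nm and blue ~430 nm). Photosystem II splits water: 2 H₂O → O₂ + 4 H⁺ + 4 e⁻. The electrons pass down an electron transport chain to Photosystem I, and the energy released pumps H⁺ into the thylakoid lumen; the resulting gradient drives ATP synthesis by chemiosmosis. Photosystem I re-energizes the electrons with light and uses them to reduce NADP⁺ to NADPH.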

Calvin Cycle / Light-independent (stroma)

Uses ATP + NADPH from light reactions to fix CO₂ into sugar. 3 turns to produce one G3P. RuBisCO catalyzes CO₂ fixation (most abundant enzyme on Earth). Net: 3 CO₂ → 1 G3P (requires 9 ATP + 6 NADPH).
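
The Calvin cycle costs scale linearly with the number of CO₂ molecules fixed, so the figures above (9 ATP + 6 NADPH per 3 CO₂) work out to 3 ATP and 2 NADPH per CO₂. A small sketch of that bookkeeping, with a hypothetical helper name:

```python
ATP_PER_CO2 = 3     # 9 ATP per 3 CO2
NADPH_PER_CO2 = 2   # 6 NADPH per 3 CO2
CO2_PER_G3P = 3     # 3 turns of the cycle give one net G3P

def calvin_cost(n_co2):
    """ATP, NADPH and net G3P for fixing n_co2 molecules of CO2."""
    return {
        "ATP": n_co2 * ATP_PER_CO2,
        "NADPH": n_co2 * NADPH_PER_CO2,
        "G3P": n_co2 / CO2_PER_G3P,
    }

print(calvin_cost(3))  # one net G3P
print(calvin_cost(6))  # two G3P, enough carbon for one glucose
```

Six CO₂ give two G3P (six carbons), which matches the 6CO₂ in the overall photosynthesis equation for one glucose.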

Metabolic Networks & Computational Biology

Metabolism can be modeled as a directed graph where nodes are metabolites and edges are reactions (enzyme-catalyzed). This is the basis of flux balance analysis (FBA) and systems biology tools used to predict metabolic phenotypes.

# Simplified metabolic pathway graph (glycolysis key nodes).
# Each metabolite maps to a list of (product, enzyme) edges.
glycolysis = {
    'Glucose':     [('G6P', 'Hexokinase')],
    'G6P':         [('F6P', 'Phosphoglucose isomerase')],
    'F6P':         [('F1,6BP', 'PFK-1')],   # rate-limiting step
    'F1,6BP':      [('DHAP', 'Aldolase'), ('G3P', 'Aldolase')],
    'DHAP':        [('G3P', 'Triose phosphate isomerase')],
    'G3P':         [('1,3-BPG', 'GAPDH')],  # NADH produced here
    '1,3-BPG':     [('3PG', 'Phosphoglycerate kinase')],  # ATP
    '3PG':         [('2PG', 'Phosphoglycerate mutase')],
    '2PG':         [('PEP', 'Enolase')],
    'PEP':         [('Pyruvate', 'Pyruvate kinase')],  # ATP
}

def trace_pathway(graph, start, end, path=None):
    """Depth-first search: return the first metabolite path from start to end, or None."""
    path = (path or []) + [start]
    if start == end:
        return path
    for neighbor, _enzyme in graph.get(start, []):
        if neighbor in path:  # skip metabolites already on the path (guards against cycles)
            continue
        result = trace_pathway(graph, neighbor, end, path)
        if result:
            return result
    return None

print(trace_pathway(glycolysis, 'Glucose', 'Pyruvate'))
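
Flux balance analysis starts from a steady-state assumption: for every internal metabolite, production equals consumption, which is written S·v = 0 for a stoichiometric matrix S and flux vector v. The sketch below only checks that constraint on a made-up three-metabolite chain (it is not glycolysis and it is not a full FBA, which would also add flux bounds and a linear objective). The matrix, reaction names, and helper names are all illustrative assumptions.

```python
# Toy linear pathway:  (uptake) -> A -> B -> C -> (export)
METABOLITES = ["A", "B", "C"]
REACTIONS = ["uptake", "A_to_B", "B_to_C", "export"]

# Rows = metabolites, columns = reactions.
# +1 means the reaction produces the metabolite, -1 means it consumes it.
S = [
    [1, -1,  0,  0],   # A
    [0,  1, -1,  0],   # B
    [0,  0,  1, -1],   # C
]

def net_production(stoich, fluxes):
    """Compute S*v: net rate of change of each metabolite."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in stoich]

def is_steady_state(stoich, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(x) <= tol for x in net_production(stoich, fluxes))

balanced = [2.0, 2.0, 2.0, 2.0]     # same flux through every step
unbalanced = [2.0, 1.0, 1.0, 1.0]   # uptake faster than downstream steps

print(net_production(S, balanced), is_steady_state(S, balanced))
print(net_production(S, unbalanced), is_steady_state(S, unbalanced))
```

In the unbalanced case metabolite A accumulates, so that flux vector would be rejected. FBA searches among the vectors that pass this check for the one that maximizes a chosen objective, such as biomass or ATP production.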

Key point to remember: PFK-1 regulation

PFK-1 catalyzes the committed step of glycolysis and is its most important regulatory enzyme. It is inhibited by high ATP (energy surplus) and citrate (plenty of CAC intermediates), and activated by AMP (energy deficit) and F2,6BP (fructose 2,6-bisphosphate). Inhibition by ATP and citrate is classic feedback inhibition in a catabolic pathway.

Glossary: All Key Terms (Alphabetical Reference)
[CompBio] Bioinformatics: Applying CS and stats to analyze biological data (sequences, structures, pathways).
[Macro] Active site: Region of enzyme that binds substrate; shape complementary to substrate (induced fit).
[Macro] Amino acid: Monomer of proteins; central C bonded to –NH₂, –COOH, –H, and R group. 20 standard types.
[Metabolism] ATP: Adenosine triphosphate; universal cellular energy currency. Hydrolysis releases ~7.3 kcal/mol.
[Macro] Base pairing: A=T (2 H-bonds), G≡C (3 H-bonds) in DNA; A=U in RNA.
[Metabolism] Calvin Cycle: Light-independent reactions in chloroplast stroma; CO₂ fixed to G3P via RuBisCO; uses ATP + NADPH.
[Macro] Carbohydrate: (CH₂O)ₙ; energy and structural role. Monosaccharides → disaccharides → polysaccharides.
[Metabolism] CAC / Krebs cycle: Citric acid cycle; 8-step cycle in matrix; produces NADH, FADH₂, ATP, CO₂ per Acetyl-CoA turn.
[Macro] Cellulose: β-1,4-linked glucose polymer; plant cell wall; structural, not digestible by animals.
[Transport] Channel protein: Integral membrane protein forming aqueous pore; allows specific ions/molecules through by facilitated diffusion.
[Metabolism] Chemiosmosis: ATP synthesis powered by H⁺ flowing through ATP synthase down proton gradient (Mitchell hypothesis).
[Macro] Cholesterol: Steroid lipid; membrane fluidity regulator; precursor to steroid hormones, bile acids, vitamin D.
[Genetics] Codon: 3-nucleotide mRNA sequence specifying one amino acid (or stop). 64 possible codons.
[Genetics] Central Dogma: DNA → RNA → Protein; information flow direction in all living cells.
[Genetics] CRISPR-Cas9: Bacterial adaptive immune system repurposed as precise gene-editing tool; guide RNA directs Cas9 nuclease to target DNA sequence.
[Transport] Cotransport: Secondary active transport; one ion's gradient drives another molecule against its gradient. Symport or antiport.
[Metabolism] Denaturation: Loss of protein 3D structure (not primary) due to heat, pH, chemicals. Usually destroys function.
[Macro] Dehydration synthesis: Bond formation between monomers by removing H₂O. Builds all polymers.
[Macro] Deoxyribose: 5C sugar in DNA backbone; –H at 2' position (ribose has –OH).
[Genetics] DNA ligase: Seals nicks in DNA backbone; joins Okazaki fragments; used in cloning.
[Genetics] DNA polymerase: Synthesizes DNA 5'→3'; requires primer; has proofreading exonuclease; uses dNTPs.
[Metabolism] ETC: Electron transport chain; Complexes I–IV in inner mitochondrial membrane; transfers electrons ultimately to O₂.
[Metabolism] Exergonic / Endergonic: Exergonic (ΔG < 0) releases energy and is spontaneous; endergonic (ΔG > 0) absorbs energy and requires coupling.
[Genetics] Exon / Intron: Exons = coding sequences retained in mRNA. Introns = non-coding sequences removed by splicing.
[Metabolism] Feedback inhibition: End-product inhibits early enzyme in its biosynthesis pathway; prevents overproduction.
[Macro] Fatty acid: Carboxyl group + hydrocarbon tail; saturated (no double bonds) or unsaturated (one or more C=C).
[Metabolism] Fermentation: Anaerobic oxidation of glucose; regenerates NAD⁺ via pyruvate reduction. Net: 2 ATP/glucose.
[Transport] Fluid mosaic model: Membrane = fluid phospholipid bilayer with mobile embedded proteins (1972, Singer & Nicolson).
[CompBio] Genetic algorithm: Evolutionary computation; population of solutions evolves via selection + crossover + mutation.
[Genetics] Genetic code: Triplet codons map to 20 amino acids; redundant, unambiguous, nearly universal.
[Metabolism] Glycolysis: 10-step pathway; glucose (6C) → 2 pyruvate (3C); net 2 ATP + 2 NADH; cytoplasm; anaerobic.
[Macro] Glycogen: Highly branched α-1,4/1,6 glucose polymer; animal energy storage; liver and muscle.
[Transport] Hypo/Hypertonic: Solute concentration of the surrounding solution relative to the cell. In a hypotonic solution water enters the cell; in a hypertonic solution water leaves.
[Macro] Hydrolysis: Bond cleavage by adding H₂O; breaks all biological polymers. Digestion.
[Genetics] mRNA: Messenger RNA; carries genetic information from DNA in nucleus to ribosomes in cytoplasm.
[Genetics] Mutation: Permanent change in DNA sequence. Types: silent, missense, nonsense, frameshift.
[Macro] NAD⁺/NADH: Nicotinamide adenine dinucleotide; electron carrier. NAD⁺ (oxidized) accepts electrons → NADH (reduced); used in ETC.
[Genetics] Nucleotide: Monomer of nucleic acids: nitrogenous base + 5C sugar + phosphate group(s).
[Transport] Osmosis: Diffusion of water across semipermeable membrane from low [solute] to high [solute] side.
[Genetics] Okazaki fragment: Short DNA fragments on lagging strand; each needs RNA primer; joined by DNA ligase.
[Transport] Phospholipid: Amphipathic membrane lipid; glycerol + 2 FA + phosphate head; forms bilayer.
[Metabolism] Photosynthesis: 6CO₂ + 6H₂O + light → C₆H₁₂O₆ + 6O₂. Light reactions (thylakoid) + Calvin cycle (stroma).
[Macro] Polymer / Monomer: Polymer = long chain of identical/similar subunits (monomers) linked by covalent bonds.
[Genetics] Promoter: DNA sequence upstream of a gene where RNA polymerase binds to initiate transcription.
[Macro] Protein structure: 1° (aa sequence) → 2° (α-helix/β-sheet) → 3° (3D fold) → 4° (multi-subunit).
[Genetics] Ribosome: Large + small subunit; site of translation; rRNA + protein. 70S (prokaryote), 80S (eukaryote).
[Macro] Starch: α-1,4/1,6-linked glucose; plant energy storage. Amylose (linear) + amylopectin (branched).
[Genetics] Telomere: Repetitive DNA (TTAGGG) at chromosome ends; protects the ends so that the shortening that occurs with each replication does not erode genes. Maintained by telomerase.
[Genetics] Transcription: DNA → mRNA; catalyzed by RNA polymerase; reads template 3'→5'; synthesizes mRNA 5'→3'.
[Genetics] Translation: mRNA → protein at ribosome; codons decoded by tRNA anticodons; initiation-elongation-termination.
[Genetics] tRNA: Transfer RNA; cloverleaf structure; carries amino acid at 3' CCA end; anticodon pairs with mRNA codon.
[Transport] Na⁺/K⁺ ATPase: Primary active transporter; 3 Na⁺ out / 2 K⁺ in per ATP; maintains resting membrane potential.
[Macro] Nucleic acid: DNA or RNA; polymer of nucleotides; stores and expresses genetic information.
[CompBio] De Bruijn graph: Graph representation of overlapping k-mers; used for genome assembly. Eulerian path solves assembly.
[CompBio] Smith-Waterman: Dynamic programming algorithm for optimal local sequence alignment. Time: O(mn).
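
As a concrete reference for the last entry, here is a minimal score-only sketch of Smith-Waterman local alignment. It fills an (m+1)×(n+1) table, which is where the O(mn) time comes from. The scoring values (match +2, mismatch −1, linear gap −2) are arbitrary choices for illustration, not values prescribed by this guide, and no traceback is done, so only the best score is returned.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The 0 lets an alignment restart anywhere (this makes it local).
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # shared core "ACG"
print(smith_waterman_score("AAAA", "TTTT"))         # nothing in common
```

The clamp at zero is the only difference from global (Needleman-Wunsch style) scoring: poorly matching flanks cost nothing, so the score reflects just the best-matching region.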

Quick Recap — 5 Big Ideas

  • CompBio is an intersection: CS + math + statistics applied to biological systems. Bioinformatics ≈ using computers to analyze DNA, RNA, proteins and big biological data.
  • Four macromolecules: Carbohydrates (CH₂O; energy & structure), Lipids (hydrophobic; membrane & energy storage), Proteins (amino acid polymers; catalysis & structure), Nucleic acids (nucleotide polymers; information). All built by dehydration, broken by hydrolysis.
  • Central Dogma: DNA (replication & template) → mRNA (transcription) → Protein (translation). Mutations change sequence; regulation controls when/where/how much.
  • Membrane transport: Passive (down gradient: diffusion, osmosis, facilitated diffusion — no ATP) vs. Active (against gradient — requires ATP or ion gradient). Phospholipid bilayer is selectively permeable.
  • Metabolism = energy transformation: Glycolysis (2 ATP, cytoplasm, anaerobic) → Pyruvate oxidation → CAC (2 ATP, NADH, FADH₂) → Oxidative phosphorylation (~26–28 ATP via chemiosmosis, inner mitochondrial membrane). Total: ~30–32 ATP/glucose.