Proteins

Jaewon Chung

(he/him) - NeuroData lab
Johns Hopkins University - Biomedical Engineering

Email: j1c@jhu.edu
GitHub: @j1c
Twitter: @j1c

Generate: Biomedicines

  • Designs and develops protein therapeutics
  • Uses generative AI 🤯🤯🤯
  • >$600M in funding 💰💰💰
  • Backed by Flagship Pioneering
    • Founded Moderna 💊💊💊
  • Located in Somerville, MA

What are proteins?


Special Types of Proteins - Antibodies

But why protein-based therapeutics?

Proteins

  • High Target Specificity
    • Lower Toxicity
    • Fewer side effects
    • Narrower range of targets
    • Limited drug-drug interactions
  • Longer half-life
  • Examples:
    • Monoclonal antibodies, hormones

Small Molecules

  • Low Target Specificity
    • Higher Toxicity
    • More side effects
    • Wider range of targets
    • More drug-drug interactions
  • Shorter half-life
  • Examples:
    • Ibuprofen, lithium

General Drug Discovery Pipeline

  1. Target Identification and Validation (what disease/condition?)
    • Generate has its own business department; partners with other companies
  2. Lead Discovery (what drug candidates?)
    • Generate uses Chroma (diffusion-based protein generative model)
    • Given candidates, test binding affinity and selectivity, and obtain structures.
    • Use data to update Chroma.
  3. Lead Optimization (can we make some candidates better?)
    • Iterative optimization to improve binding affinity and selectivity.
  4. Preclinical Development (is it safe?)
  5. Clinical Development (is it safe and effective in humans?)

What are some open problems in protein prediction?

Single Protein

  • Given sequence, predict structure
    • AlphaFold, Rosetta
  • Given sequence, predict function
    • Large language models (ProteinBERT)
  • Given function, predict sequence and/or structure

Protein Complex

  • Given two structures, predict complex structure
    • Protein docking
  • Given two structures, predict binding sites

What did I do?

  • Mainly worked on interaction prediction.

Surface as representation of proteins



Different Sequences & Structures but Same Surfaces

  • Compared to query A

Structure   Sequence Alignment Score   Structure Alignment Score   Surface Alignment Score
B           0.23                       0.35                        2.85
C           0.34                       0.32                        3.10
D           0.85                       0.78                        2.96





Can we learn representations of the interacting surfaces?

Protein-Protein Interaction



Point Cloud as Representation of the Surface

Atoms in Space

Atoms + Surface Point Cloud
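
The slides do not say how the surface point cloud is computed. As a rough illustration only, the sketch below uses a Shrake-Rupley-style construction: place points on a sphere around each atom at its radius plus a probe radius, then drop points that fall inside any neighboring atom's expanded sphere. The function names, the 1.4 Å probe radius, and the 64 points per atom are assumptions made for this example, not values from the talk.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def surface_point_cloud(atoms, probe=1.4, n_per_atom=64):
    """atoms: list of (x, y, z, radius). Returns a list of (x, y, z, atom_index).

    probe and n_per_atom are illustrative defaults (assumptions).
    """
    unit = fibonacci_sphere(n_per_atom)
    cloud = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        shell = ri + probe
        for ux, uy, uz in unit:
            p = (xi + shell * ux, yi + shell * uy, zi + shell * uz)
            # A point is buried if it lies inside another atom's expanded sphere.
            buried = any(
                j != i and math.dist(p, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms)
            )
            if not buried:
                cloud.append((*p, i))
    return cloud

# Two overlapping atoms: points facing the neighbor are discarded.
example_atoms = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
example_cloud = surface_point_cloud(example_atoms)
```

Each retained point keeps the index of the atom it came from, which is one simple way to attach per-atom chemical features to surface points later.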

Initial Point Cloud Features

Chemical Features

Geometric Features

Geodesic Convolutions

True Interacting Points
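
The slides do not state how "true interacting" points are defined. One common convention, assumed here, is to label a surface point as interacting when some surface point of the partner protein lies within a distance cutoff. The function name and the 3.0 Å cutoff below are hypothetical choices for illustration.

```python
import math

def interacting_mask(cloud_a, cloud_b, cutoff=3.0):
    """For each point in cloud_a, True if any point of cloud_b is within cutoff.

    cutoff is an assumed value, not one taken from the talk.
    """
    return [
        any(math.dist(p, q) <= cutoff for q in cloud_b)
        for p in cloud_a
    ]

# Toy surface points for the two partners in a complex.
antibody_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
antigen_points = [(1.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(interacting_mask(antibody_points, antigen_points))  # [True, False]
```

This brute-force version is quadratic in the number of points; a spatial index such as a k-d tree would be the usual replacement for real structures.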

Point Clouds of a Complex

All Surface Points

Only True Interacting Points

Triplet Sampling
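
The exact sampling scheme is not spelled out in the slides. A minimal sketch, assuming the standard triplet margin loss: the anchor is an interacting point on one protein, the positive is its partner point on the other protein, and the negative is a non-interacting point. All names, the margin, and the way negatives are drawn are assumptions for illustration.

```python
import math
import random

def sample_triplets(pairs, non_interacting, n, seed=0):
    """Draw n (anchor, positive, negative) index triplets.

    pairs: list of (anchor_index, positive_index) for truly interacting points.
    non_interacting: list of indices of points that do not interact.
    """
    rng = random.Random(seed)
    triplets = []
    for _ in range(n):
        anchor, positive = rng.choice(pairs)
        negative = rng.choice(non_interacting)
        triplets.append((anchor, positive, negative))
    return triplets

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin) on embeddings."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Positive already closer than the negative by more than the margin: no loss.
print(triplet_margin_loss((0.0, 0.0), (0.0, 1.0), (0.0, 5.0)))  # 0.0
# Positive farther than the negative: loss is 5 - 1 + 1.
print(triplet_margin_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)))  # 5.0
```

Minimizing this loss pulls embeddings of interacting point pairs together and pushes non-interacting points at least a margin farther away.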

Structural Antibody Database (SAbDab)

  • ~1400 antibody/antigen structures from PDB

  • Data splits:

    • Based on antigen clustering
    • ~1000 train
    • ~200 validation
    • ~200 test
  • Result: Close to state of the art
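
Splitting by antigen cluster means every structure whose antigen falls in a given cluster lands in the same split, so near-duplicate antigens cannot leak from train into validation or test. A minimal sketch of that idea, assuming cluster labels are already available; the function name, the greedy fill, and the default fractions (roughly matching ~1000/~200/~200) are illustrative assumptions, not the procedure used in the talk.

```python
import random
from collections import defaultdict

def split_by_cluster(cluster_of, fractions=(0.7, 0.15, 0.15), seed=0):
    """cluster_of: dict mapping structure id -> antigen cluster id.

    Returns (train, validation, test) lists of structure ids such that
    no cluster is shared between splits.
    """
    groups = defaultdict(list)
    for structure_id, cluster_id in cluster_of.items():
        groups[cluster_id].append(structure_id)

    cluster_ids = sorted(groups)
    random.Random(seed).shuffle(cluster_ids)

    targets = [f * len(cluster_of) for f in fractions]
    splits = ([], [], [])
    k = 0
    for cluster_id in cluster_ids:
        # Move on once the current split has reached its target size.
        while k < 2 and len(splits[k]) >= targets[k]:
            k += 1
        splits[k].extend(groups[cluster_id])
    return splits

# Toy example: 28 structures in 7 antigen clusters.
toy_clusters = {f"s{i}": i % 7 for i in range(28)}
train_ids, val_ids, test_ids = split_by_cluster(toy_clusters)
```

Because whole clusters are assigned at once, the realized split sizes only approximate the requested fractions.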

Special Thanks

