PLMNeuron
AI & ML interests
PLMs, ESM, AlphaFold, AlphaGenome
PLMNeuron is an interdisciplinary research collective at UC Berkeley dedicated to uncovering how protein language models “think.”
🔬 We investigate the inner workings of state-of-the-art protein language models (PLMs), such as ESM2, by mapping biophysical and functional properties—like charge, hydrophobicity, and secondary structure motifs—onto individual neurons. Our goal is to turn black-box embeddings into transparent, interpretable, and steerable representations.
📚 This organization hosts datasets and resources for the paper:
"Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models"
Proceedings of the Workshop on Generative AI for Biology at the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada. PMLR 267, 2025.
🧬 These datasets enable research on interpretability and generative control by providing natural language annotations of individual neurons across multiple layers and model sizes.
➡️ We welcome collaboration! Dive into the datasets or reach out to connect.
📊 Dataset Index (Work in Progress)
| Dataset Name | Link | Description | Schema Overview |
|---|---|---|---|
| protolyze/esm3B_500k_neuron_explanations_redo | 🔗 Link | Annotations of neurons in ESM2-t36-3B with interpretable biophysical and functional features. | neuron_id, explanation_1, explanation_2 |
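
As a rough illustration of how the schema above might be used, the sketch below searches neuron annotations for a keyword. It assumes each row behaves like a dictionary with the three listed columns. The sample rows, the `neuron_id` format, and the helper name `find_neurons` are hypothetical and not taken from the dataset. In practice the rows would likely come from the Hugging Face `datasets` library, for example `load_dataset("protolyze/esm3B_500k_neuron_explanations_redo")`, though the available split names are not stated here.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical rows that follow the schema in the table above
# (neuron_id, explanation_1, explanation_2). The IDs and explanation
# text are invented for illustration; they are not real dataset entries.
SAMPLE_ROWS = [
    {
        "neuron_id": "L0_N12",
        "explanation_1": "Activates on positively charged residues such as lysine and arginine.",
        "explanation_2": "Responds to basic, charged side chains.",
    },
    {
        "neuron_id": "L5_N301",
        "explanation_1": "Activates on hydrophobic stretches in transmembrane regions.",
        "explanation_2": "Responds to leucine-rich hydrophobic segments.",
    },
    {
        "neuron_id": "L20_N7",
        "explanation_1": "Activates on alpha-helix motifs.",
        "explanation_2": "Responds to helical secondary structure.",
    },
]

EXPLANATION_COLUMNS = ("explanation_1", "explanation_2")


def find_neurons(rows: Iterable[Dict[str, str]], keyword: str) -> Iterator[str]:
    """Yield the neuron_id of every row whose explanations mention keyword.

    Matching is a case-insensitive substring test over both explanation
    columns; missing or empty explanations are treated as empty strings.
    """
    needle = keyword.lower()
    for row in rows:
        text = " ".join(str(row.get(col) or "") for col in EXPLANATION_COLUMNS)
        if needle in text.lower():
            yield row["neuron_id"]


if __name__ == "__main__":
    # Look up candidate neurons for a biophysical property of interest.
    print(list(find_neurons(SAMPLE_ROWS, "hydrophobic")))
```

Because `find_neurons` only needs an iterable of dictionary-like rows, the same function should work on a loaded dataset split without changes, assuming the column names match the schema listed in the table.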