AI & ML interests

PLMs, ESM, AlphaFold, AlphaGenome

PLMNeuron is an interdisciplinary research collective at UC Berkeley dedicated to uncovering how protein language models “think.”

🔬 We investigate the inner workings of state-of-the-art protein language models (PLMs), such as ESM2, by mapping biophysical and functional properties—like charge, hydrophobicity, and secondary structure motifs—onto individual neurons. Our goal is to turn black-box embeddings into transparent, interpretable, and steerable representations.

📚 This organization hosts datasets and resources for the paper:

"Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models"
Proceedings of the Workshop on Generative AI for Biology at the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada. PMLR 267, 2025.

Paper links: Hugging Face Paper, arXiv, ICML

🧬 These datasets enable research on interpretability and generative control by providing natural language annotations of individual neurons across multiple layers and model sizes.

➡️ We welcome collaboration! Dive into the datasets or reach out to connect.


📊 Dataset Index (Work in Progress)

Dataset Name | Link | Description | Schema Overview
protolyze/esm3B_500k_neuron_explanations_redo | 🔗 Link | Annotations of neurons in ESM2-t36-3B with interpretable biophysical and functional features. | neuron_id, explanation_1, explanation_2
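
To make the schema concrete, here is a minimal sketch of how rows with the three listed columns (neuron_id, explanation_1, explanation_2) might be searched by keyword. The helper name `find_neurons`, the placeholder rows, and the assumption that every column holds plain text are illustrative only; they are not taken from the dataset or the paper.

```python
from typing import Iterable, List, Mapping


def find_neurons(rows: Iterable[Mapping[str, str]], keyword: str) -> List[Mapping[str, str]]:
    """Return the rows whose explanations mention `keyword` (case-insensitive).

    Hypothetical helper: it assumes each row exposes the three columns named
    in the schema overview and that both explanation columns are text.
    """
    needle = keyword.lower()
    return [
        row
        for row in rows
        if needle in str(row["explanation_1"]).lower()
        or needle in str(row["explanation_2"]).lower()
    ]


# Placeholder rows for illustration only; these are NOT real dataset entries,
# and the real neuron_id format may differ.
rows = [
    {
        "neuron_id": "example_a",
        "explanation_1": "placeholder: responds to hydrophobic residues",
        "explanation_2": "placeholder: buried side chains",
    },
    {
        "neuron_id": "example_b",
        "explanation_1": "placeholder: responds to positive charge",
        "explanation_2": "placeholder: lysine and arginine",
    },
]

matches = find_neurons(rows, "charge")
print([row["neuron_id"] for row in matches])
```

Rows obtained from the hosted dataset (for example through the Hugging Face `datasets` library) could be passed in place of the placeholder list, assuming they expose the same three columns.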
