protolyze (PLMNeuron)

Organization Card

PLMNeuron is an interdisciplinary research collective at UC Berkeley dedicated to uncovering how protein language models “think.”

🔬 We investigate the inner workings of state-of-the-art protein language models (PLMs), such as ESM2, by mapping biophysical and functional properties—like charge, hydrophobicity, and secondary structure motifs—onto individual neurons. Our goal is to turn black-box embeddings into transparent, interpretable, and steerable representations.
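
To make "mapping a property onto a neuron" concrete, here is a minimal sketch that correlates one neuron's per-residue activations with a per-residue hydrophobicity profile. It is an illustration under stated assumptions, not the labelling pipeline from the paper: the Kyte-Doolittle scale is a swapped-in, standard hydropathy measure, and the sequence and activation values are hypothetical placeholders that would in practice come from a forward pass through a PLM such as ESM2.

```python
from math import sqrt
from typing import List, Sequence

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str) -> List[float]:
    """Return one hydropathy value per residue.

    Raises KeyError for non-standard residue codes (e.g. "X").
    """
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical inputs: a toy sequence and made-up activations for one neuron,
# one value per residue. Real activations would be read from the model.
sequence = "MKVLAAGIDE"
activations = [0.6, -0.9, 1.1, 0.9, 0.5, 0.4, 0.0, 1.2, -0.8, -0.7]

score = pearson(hydropathy_profile(sequence), activations)
print(f"hydrophobicity correlation for this neuron: {score:.2f}")
```

A correlation near +1 or -1 on many sequences would suggest the neuron tracks hydrophobicity; a single short sequence, as here, is only enough to show the mechanics.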

📚 This organization hosts datasets and resources for the paper:

"Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models"
Proceedings of the Workshop on Generative AI for Biology at the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada. PMLR 267, 2025.

🧬 These datasets enable research on interpretability and generative control by providing natural language annotations of individual neurons across multiple layers and model sizes.

➡️ We welcome collaboration! Dive into the datasets or reach out to connect.

📊 Dataset Index (Work in Progress)

| Dataset Name | Link | Description | Schema Overview |
| --- | --- | --- | --- |
| `protolyze/esm3B_500k_neuron_explanations_redo` | 🔗 Link | Annotations of neurons in ESM2-t36-3B with interpretable biophysical and functional features. | `neuron_id`, `explanation_1`, `explanation_2` |
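
As a starting point for exploring the annotations, the sketch below loads the dataset with the Hugging Face `datasets` library and searches the explanation columns for a keyword. The repository ID and column names come from the index above; the split name `"train"` is an assumption, and the helper names and example rows are hypothetical and only illustrate the documented schema.

```python
from typing import Iterable, Iterator

REPO_ID = "protolyze/esm3B_500k_neuron_explanations_redo"
EXPLANATION_COLUMNS = ("explanation_1", "explanation_2")


def load_neuron_explanations(split: str = "train"):
    """Download the annotations from the Hugging Face Hub.

    The default split name is an assumption; check the dataset viewer
    for the splits that actually exist.
    """
    from datasets import load_dataset  # requires `pip install datasets`

    return load_dataset(REPO_ID, split=split)


def find_neurons(rows: Iterable[dict], keyword: str) -> Iterator[dict]:
    """Yield rows whose explanations mention `keyword` (case-insensitive)."""
    needle = keyword.lower()
    for row in rows:
        texts = (str(row.get(col) or "") for col in EXPLANATION_COLUMNS)
        if any(needle in text.lower() for text in texts):
            yield row


# Hypothetical rows in the documented schema, for illustration only.
# The real `neuron_id` format and explanation wording may differ.
example_rows = [
    {"neuron_id": 0,
     "explanation_1": "fires on hydrophobic residues in helices",
     "explanation_2": "leucine-rich segments"},
    {"neuron_id": 1,
     "explanation_1": "negatively charged residues",
     "explanation_2": "acidic patches near termini"},
    {"neuron_id": 2,
     "explanation_1": "beta-strand motifs",
     "explanation_2": "alternating hydrophobic and polar residues"},
]

for row in find_neurons(example_rows, "hydrophobic"):
    print(row["neuron_id"], "|", row["explanation_1"])
```

With network access, `find_neurons(load_neuron_explanations(), "hydrophobic")` should work the same way, since iterating over a loaded dataset yields one dictionary per row.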

Collections 1

Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models

models 0

datasets 52

protolyze/simulator_X_simple

protolyze/3B_original_steering_metrics_all

protolyze/weight_masked_steering

protolyze/combined_steering_experiments

protolyze/utility

protolyze/steering_experiment_2

protolyze/multi_goal_steering

protolyze/steered_sequences_3B_multiple

protolyze/8M_original_steering_metrics_all

protolyze/utility_multi_steer

Team members 4
