HIPPIE: A Generative Model for Electrophysiological Analysis Across Species, Technologies, and Modalities

HIPPIE is a conditional variational autoencoder for extracellular electrophysiology. It ingests three per-unit modalities (mean waveform, inter-spike-interval distribution, autocorrelogram) and learns a joint latent representation conditioned on recording technology. The released checkpoint was pretrained on 11 publicly available electrophysiology datasets and is intended for transfer to new recordings without retraining the full model.


Model Details

Property | Value
--- | ---
Architecture | Conditional VAE (1D ResNet18 encoders, MLP fusion, per-modality decoders)
Latent dimension | 30 (10 per modality across 3 modalities)
KL weight (β) | 1.0
Input modalities | Waveform (50 samples), ISI (100 bins), ACG (100 bins)
Conditioning | 3-class recording-technology embedding (dim=5)
Pretraining datasets | 11 (see below)
Framework | PyTorch + PyTorch Lightning
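A 30-D latent with β = 1.0 implies the usual β-weighted VAE objective: reconstruction error plus β times the KL divergence to a standard-normal prior. The NumPy sketch below shows that objective. It assumes the standard diagonal-Gaussian formulation. HIPPIE's actual training loss is defined in the repo and may add per-modality weighting or class and source terms. The function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps with eps ~ N(0, I); sigma = exp(logvar / 2)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dims -> one value per unit
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)

def beta_vae_loss(recon_error, mu, logvar, beta=1.0):
    # recon_error: (N,) per-unit reconstruction loss (assumed already summed over modalities)
    return np.mean(recon_error + beta * kl_to_standard_normal(mu, logvar))

rng = np.random.default_rng(0)
mu = np.zeros((4, 30))      # 4 units, 30-D latent
logvar = np.zeros((4, 30))  # unit variance
z = reparameterize(mu, logvar, rng)
print(z.shape)  # (4, 30)
```

With β = 1.0 this reduces to the plain ELBO. Larger β pushes the posterior harder toward the prior.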

Architecture

Each modality is independently encoded by a 1D ResNet18 backbone with channel widths [64, 128, 256, 512]. A fusion encoder concatenates the three modality representations and projects to the shared latent space. The decoder mirrors the encoder structure and is conditioned on both the latent code and (during supervised fine-tuning) a cell-type label; the encoder is class-agnostic at inference time (encoder_uses_class_embedding=False in the production config). The released checkpoint disables super-region and layer conditioning (num_super_regions=0, num_layers=0); only the recording-technology source embedding is active.
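The shape-level sketch below illustrates fusion plus technology conditioning. It uses random weights and is not the released model code. Several points are assumptions:

- Each ResNet18 branch yields a pooled 512-d feature, matching its last channel width.
- The 5-d technology embedding is appended both to the fused encoder input and to the latent fed to the decoder. The encoder side is inferred only from `get_embeddings` accepting a `tech_id`.
- The 30 latent dims are treated as one shared vector.

The cell-type label that conditions the decoder during fine-tuning is omitted.

```python
import numpy as np

FEAT = 512               # assumed pooled width of the last ResNet18 stage
Z_DIM = 30               # shared latent size
N_TECH, TECH_DIM = 3, 5  # technology vocabulary size and embedding dim

def fuse_and_condition(wave_f, isi_f, acg_f, tech_id, params, rng):
    """Hypothetical fusion + source conditioning; returns mu, z, and decoder input."""
    tech_emb = params["tech_table"][tech_id]                           # (N, 5) embedding lookup
    fused = np.concatenate([wave_f, isi_f, acg_f, tech_emb], axis=1)   # (N, 3*512 + 5)
    mu = fused @ params["W_mu"]                                        # (N, 30)
    logvar = fused @ params["W_logvar"]                                # (N, 30)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)     # reparameterization
    decoder_in = np.concatenate([z, tech_emb], axis=1)                 # (N, 35)
    return mu, z, decoder_in

rng = np.random.default_rng(0)
in_dim = 3 * FEAT + TECH_DIM
params = {
    "tech_table": rng.standard_normal((N_TECH, TECH_DIM)),
    "W_mu": rng.standard_normal((in_dim, Z_DIM)) * 0.01,
    "W_logvar": rng.standard_normal((in_dim, Z_DIM)) * 0.01,
}
N = 8
feats = [rng.standard_normal((N, FEAT)) for _ in range(3)]  # wave, ISI, ACG features
tech_id = np.zeros(N, dtype=int)                            # all units: neuropixels (ID 0)
mu, z, decoder_in = fuse_and_condition(*feats, tech_id, params, rng)
print(mu.shape, decoder_in.shape)  # (8, 30) (8, 35)
```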

For bimodal datasets that ship without an autocorrelogram (only waveform and ISI are available), the ACG channel is zero-filled.
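The helper below is a minimal sketch of the zero-fill step; the name `acg_or_zeros` is hypothetical. It returns an all-zero (N, 100) block when no ACG exists and otherwise checks the shape. Apply the fill after normalization. Running a constant zero vector through the min-max formula in the waveform preprocessing reference gives (0 - 0) / 1e-8 * 2 - 1 = -1 everywhere, not 0.

```python
import numpy as np

ACG_LEN = 100  # canonical ACG length

def acg_or_zeros(acg, n_units, acg_len=ACG_LEN):
    """Return a float32 (n_units, acg_len) ACG array, zero-filled when acg is None."""
    if acg is None:
        # Missing modality: zeros, inserted after normalization (do not min-max this block)
        return np.zeros((n_units, acg_len), dtype=np.float32)
    acg = np.asarray(acg, dtype=np.float32)
    if acg.shape != (n_units, acg_len):
        raise ValueError(f"expected ACG shape {(n_units, acg_len)}, got {acg.shape}")
    return acg

acg = acg_or_zeros(None, n_units=3)
print(acg.shape)  # (3, 100)
```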


Training Data

Pretrained on 11 labeled electrophysiology datasets spanning mouse, rat, and macaque. Cell-type counts and sample sizes follow the manuscript Methods.

Dataset | Identifier | Species / Region | Technology | Sample size
--- | --- | --- | --- | ---
Hausser (Beau et al. 2024, C4) | hausser_cell_type | Mouse cerebellar cortex | Neuropixels 1.0 | 1,998 total / 113 labeled (6 types)
Hull (Beau et al. 2024, C4) | hull_cell_type | Mouse cerebellar cortex | Neuropixels 1.0 | 103 labeled (5 types)
Lisberger (Beau et al. 2024, C4) | lissberger_labeled_cell_type | Macaque cerebellar floccular complex | 16-ch Plexon silicon S-Probe | 1,152 total / 668 labeled (5 types)
CellExplorer (Petersen et al. 2021) | cellexplorer_cell_type | Mouse visual cortex / hippocampus | Neuropixels 1.0 | 430 neurons (7 classes)
IBL Brainwide Map | ibl_brainwide_good | Mouse whole-brain (10 Cosmos regions) | Neuropixels | 62,993 neurons / 139 subjects
Allen Visual Coding | allen_scope_neuropixel_area_subset | Mouse cortex / hippocampus / thalamus (19 CCF regions) | Neuropixels | 61,781 neurons / 47 mice
Watson (DANDI 000041) | dandi_000041_cell_type | Rat frontal cortex | 64-site silicon probe | 221 neurons (2 classes)
Calvigioni (DANDI 000473) | dandi_000473_cell_type | Mouse prefrontal cortex | Neuropixels | 9,213 neurons (2 classes)
Ramachandran (DANDI 000955) | dandi_000955_cell_type | Rat somatosensory cortex | 32-ch NeuroNexus electrode | 134 neurons (2 classes)
A1 (Lakunina et al. 2020) | a1data_remove_undef | Mouse auditory cortex | Silicon probe | 285 neurons (3 classes)
Juxtacellular S1 (Yu et al. 2019) | juxtacellular_mouse_s1_area | Mouse barrel cortex | Juxtasomal glass micropipette | 224 neurons (5 classes)

Technology Conditioning Vocabulary

hippie.inference.TECHNOLOGY_IDS maps the three technology classes to the integer used by the source-conditioning embedding:

ID | Label | Datasets that used this slot during pretraining
--- | --- | ---
0 | neuropixels | Hausser, Hull, CellExplorer, IBL, Allen, Calvigioni (DANDI 000473)
1 | silicon_probe | Lisberger, Watson (DANDI 000041), Ramachandran (DANDI 000955), A1
2 | juxtacellular | Juxtacellular Mouse S1
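The sketch below resolves a technology label or integer to a valid ID before it is passed as `tech_id`. It assumes `hippie.inference.TECHNOLOGY_IDS` is a plain label-to-integer dict matching the table above, so it uses a local copy. The helper `resolve_tech_id` is hypothetical.

```python
# Local mirror of the table above; assumed to match hippie.inference.TECHNOLOGY_IDS
TECHNOLOGY_IDS = {"neuropixels": 0, "silicon_probe": 1, "juxtacellular": 2}

def resolve_tech_id(tech):
    """Map a technology label or integer to a validated conditioning ID (0, 1, or 2)."""
    if isinstance(tech, str):
        key = tech.strip().lower()
        if key not in TECHNOLOGY_IDS:
            raise KeyError(f"unknown technology {tech!r}; expected one of {sorted(TECHNOLOGY_IDS)}")
        return TECHNOLOGY_IDS[key]
    tech = int(tech)
    if tech not in TECHNOLOGY_IDS.values():
        raise ValueError(f"technology ID must be one of {sorted(TECHNOLOGY_IDS.values())}, got {tech}")
    return tech

print(resolve_tech_id("silicon_probe"))  # 1
```

For hardware not listed, choosing the closest slot is a judgment call. During pretraining, for example, both the Plexon S-Probe and the NeuroNexus electrode shared `silicon_probe`.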

Installation

The released hippie Python package wraps the model and provides a small inference API. Install it from the HIPPIE release:

git clone https://github.com/braingeneers/HIPPIE.git
cd HIPPIE
pip install -e .
pip install huggingface-hub   # to pull the checkpoint

How To Use

Quick start: pretrained encoder from the Hub

from hippie.inference import HIPPIEClassifier

classifier = HIPPIEClassifier.from_pretrained(
    repo_id="Jesusgf23/hippie",
    filename="hippie_techcond_v1.ckpt",
    device="cuda",      # or "cpu"
)

Extract latent embeddings

import numpy as np
from hippie.inference import TECHNOLOGY_IDS

# Inputs (float32, batch-first), all pre-normalized to [-1, 1]:
#   wave: (N, 50)   waveform, min-max normalized
#   isi:  (N, 100)  log(x+1) of ISI histogram, min-max normalized
#   acg:  (N, 100)  autocorrelogram, min-max normalized (zeros if unavailable)

z = classifier.get_embeddings(
    wave=wave,
    isi=isi,
    acg=acg,
    tech_id=TECHNOLOGY_IDS["neuropixels"],   # or the raw integer 0, 1, or 2
    batch_size=256,
)
# z shape: (N, 30)
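Shape and range errors are easier to catch before calling `get_embeddings`. The check below is a hypothetical sketch based only on the input contract in the comments above: float32, batch-first, lengths 50/100/100, values within [-1, 1], and one shared N.

```python
import numpy as np

EXPECTED_LEN = {"wave": 50, "isi": 100, "acg": 100}

def check_inputs(wave, isi, acg, tol=1e-6):
    """Validate the three modality arrays; return the shared batch size N."""
    n = None
    for name, arr in (("wave", wave), ("isi", isi), ("acg", acg)):
        arr = np.asarray(arr)
        if arr.ndim != 2 or arr.shape[1] != EXPECTED_LEN[name]:
            raise ValueError(f"{name}: expected (N, {EXPECTED_LEN[name]}), got {arr.shape}")
        if n is None:
            n = arr.shape[0]
        elif arr.shape[0] != n:
            raise ValueError(f"{name}: batch size {arr.shape[0]} != {n}")
        if arr.dtype != np.float32:
            raise TypeError(f"{name}: expected float32, got {arr.dtype}")
        if not np.isfinite(arr).all():
            raise ValueError(f"{name}: contains NaN or inf")
        # Values should already be min-max normalized to [-1, 1]
        if arr.size and (arr.min() < -1 - tol or arr.max() > 1 + tol):
            raise ValueError(f"{name}: values outside [-1, 1]")
    return n

n_units = check_inputs(np.zeros((4, 50), np.float32),
                       np.zeros((4, 100), np.float32),
                       np.zeros((4, 100), np.float32))
print(n_units)  # 4
```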

UMAP and HDBSCAN on the embeddings

coords = HIPPIEClassifier.umap_reduce(z, n_components=2, n_neighbors=30, metric="cosine")
clusters = HIPPIEClassifier.hdbscan_cluster(z, min_cluster_size=5)
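In the snippet above, HDBSCAN runs on the 30-D embeddings rather than on the 2-D UMAP coordinates, so UMAP serves for visualization only. The sketch below summarizes the clustering result. It assumes `hdbscan_cluster` returns one integer label per unit and follows the HDBSCAN convention of labeling noise points -1. The helper `summarize_clusters` is hypothetical.

```python
import numpy as np

def summarize_clusters(labels):
    """Return ({cluster_id: n_units}, noise_fraction), treating label -1 as noise."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    summary = {int(i): int(c) for i, c in zip(ids, counts)}
    n_noise = summary.pop(-1, 0)  # drop the noise bucket from the cluster counts
    return summary, n_noise / max(len(labels), 1)

summary, noise_frac = summarize_clusters([0, 0, 1, -1, 1, 1])
print(summary)  # {0: 2, 1: 3}
```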

Loading from a local checkpoint

classifier = HIPPIEClassifier.from_checkpoint("./hippie_techcond_v1.ckpt", device="cuda")

End-to-end script

A runnable CLI version of the snippets above lives at examples/extract_embeddings.py in the GitHub repo. It loads the checkpoint (Hub by default, --checkpoint for local), iterates a directory of per-dataset CSVs in the canonical HIPPIE layout, and writes the concatenated embeddings to a single .npz:

python examples/extract_embeddings.py \
  --datasets-root ./datasets_hippie \
  --output ./embeddings.npz

The extract_embeddings.py file shipped alongside this model card is a mirror; the GitHub copy is the source of truth.
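The array names inside the output .npz are not documented here, so listing them is a sensible first step. The sketch below uses only NumPy's `np.load`; the helper `describe_npz` is hypothetical. The demo writes a small in-memory file rather than assuming any key names in `embeddings.npz`.

```python
import io
import numpy as np

def describe_npz(path_or_file):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    info = {}
    with np.load(path_or_file, allow_pickle=False) as data:
        for name in data.files:
            arr = data[name]
            info[name] = (arr.shape, str(arr.dtype))
    return info

# Self-contained demo with an in-memory archive; for real output use describe_npz("embeddings.npz")
buf = io.BytesIO()
np.savez(buf, z=np.zeros((5, 30), dtype=np.float32))
buf.seek(0)
info = describe_npz(buf)
print(info)  # {'z': ((5, 30), 'float32')}
```

`allow_pickle=False` (NumPy's default) refuses object arrays. Enable it only for files from a trusted source.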

Preprocessing reference

The model expects each modality to be min-max normalized to [-1, 1] and resampled to the canonical lengths (waveform: 50, ISI: 100, ACG: 100). ISI is additionally log(x+1)-transformed before normalization. The MultiModalEphysDataset in hippie/dataloading.py is the canonical implementation; the snippet below reproduces it for the waveform case:

import numpy as np
import torch
import torch.nn.functional as F

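def preprocess_waveform(raw: np.ndarray, wave_len: int = 50) -> np.ndarray:
    # Accept a single waveform (L,) or a batch (N, L); output is (N, wave_len) float32
    t = torch.as_tensor(raw, dtype=torch.float32)
    if t.dim() == 1:
        t = t.unsqueeze(0)
    # Linearly resample each waveform to the canonical length
    if t.shape[-1] != wave_len:
        t = F.interpolate(t.unsqueeze(1), size=(wave_len,),
                          mode="linear", align_corners=False).squeeze(1)
    # Per-unit min-max normalization to [-1, 1]; the 1e-8 term avoids division by zero
    mn = t.amin(dim=-1, keepdim=True)
    mx = t.amax(dim=-1, keepdim=True)
    return ((t - mn) / (mx - mn + 1e-8) * 2.0 - 1.0).numpy().astype(np.float32)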
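A matching NumPy sketch for the two histogram modalities follows; the name `preprocess_histogram` is hypothetical. It applies log(x+1) only for ISI, resamples to 100 bins, and min-max normalizes each unit to [-1, 1]. Two details are assumptions:

- The log transform happens before resampling.
- `np.interp` on endpoint-aligned grids stands in for the torch `align_corners=False` interpolation, so values can differ slightly from the canonical pipeline.

For exact parity, use `MultiModalEphysDataset`.

```python
import numpy as np

def preprocess_histogram(hist, out_len=100, log_transform=False):
    """Sketch: optional log(x+1), linear resample to out_len, per-unit min-max to [-1, 1]."""
    h = np.atleast_2d(np.asarray(hist, dtype=np.float64))  # (N, bins)
    if log_transform:
        h = np.log1p(h)  # ISI only: log(x + 1)
    if h.shape[-1] != out_len:
        src = np.linspace(0.0, 1.0, h.shape[-1])
        dst = np.linspace(0.0, 1.0, out_len)
        h = np.stack([np.interp(dst, src, row) for row in h])
    mn = h.min(axis=-1, keepdims=True)
    mx = h.max(axis=-1, keepdims=True)
    return ((h - mn) / (mx - mn + 1e-8) * 2.0 - 1.0).astype(np.float32)

isi = preprocess_histogram(np.arange(200.0), log_transform=True)  # ISI: log + resample
acg = preprocess_histogram(np.ones((2, 100)))                      # ACG: no log transform
print(isi.shape, acg.shape)  # (1, 100) (2, 100)
```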

Fine-tune on a new dataset

# Requires the HIPPIE release (cross_dataset_script.py)
python cross_dataset_script.py \
  --training-dataset YOUR_DATASET \
  --pretrain-checkpoint ./hippie_techcond_v1.ckpt \
  --config class_decoder_source_bn_aug_reg \
  --z_dim 30 --beta 1.0

Artifacts in this repo

File | Description
--- | ---
hippie_techcond_v1.ckpt | PyTorch Lightning checkpoint (technology-conditioned)
embeddings.npz | Precomputed 30-D embeddings for all 11 labeled datasets
umap_coords.npz | 2-D UMAP projection of the embeddings
umap_by_dataset.png | UMAP scatter colored by dataset
extract_embeddings.py | Script that reproduces embeddings.npz from the checkpoint
download_artifacts.sh | Helper to pull artifacts from S3

Citation

If you use this model, please cite:

@article{gonzalez-ferrer2025hippie,
  title   = {HIPPIE: A Generative Model for Electrophysiological Analysis
             Across Species, Technologies, and Modalities},
  author  = {Gonzalez-Ferrer, Jesus and Lehrer, Julian and
             Alvarez-Esteban, Bruno and Schweiger, Hunter E. and
             Geng, Jinghui and Eugenio dos Santos, Luiz F. S. and
             Moreno-Ochando, Avelina and Hernandez, Sebastian and
             Reyes, Francisco and Sevetson, Jess L. and
             Schneider, Aidan and Salama, Sofie R. and
             Teodorescu, Mircea and Haussler, David and
             Mostajo-Radji, Mohammed A.},
  year    = {2025},
  note    = {Under revision at Nature Communications}
}

License

Apache 2.0. See LICENSE for details.
