torch_amt - PyTorch Auditory Modeling Toolbox

Differentiable, hardware-accelerated PyTorch implementations of computational auditory models from the MATLAB Auditory Modeling Toolbox (AMT).

Overview

torch_amt provides a comprehensive collection of differentiable auditory models and building blocks for psychoacoustic research, computational neuroscience, and audio deep learning applications.

Key Features:

🔥 Hardware acceleration - CUDA, MPS (Apple Silicon), and CPU support
📊 Fully differentiable - Integrate with neural networks and optimize via backpropagation
🧩 Modular architecture - Mix and match components for custom auditory pipelines
🎓 Scientific adherence - Matching MATLAB AMT v1.6.0 implementations
📚 Comprehensive documentation - Detailed API reference with equations and examples

Installation

pip install torch-amt

Or from source:

git clone https://github.com/StefanoGiacomelli/torch_amt.git
cd torch_amt
pip install -e .

Quick Start

Complete Auditory Model

import torch
import torch_amt

# Load Dau et al. (1997) model
model = torch_amt.Dau1997(fs=48000)

# Process 1 second of audio
audio = torch.randn(1, 48000)  # (batch, time)
output = model(audio)

print(f"Input: {audio.shape}")
# Input: torch.Size([1, 48000])
print(f"Output: List of {len(output)} frequency channels")
# Output: List of 31 frequency channels
print(f"Each channel shape: {output[0].shape}")
# Each channel shape: torch.Size([1, 8, 48000]) - (batch, modulation_channels, time)

Custom Processing Pipeline

import torch
import torch_amt

# Build custom auditory processing chain
filterbank = torch_amt.GammatoneFilterbank(fs=48000, fc=(80, 8000))
ihc = torch_amt.IHCEnvelope(fs=48000)
adaptation = torch_amt.AdaptLoop(fs=48000)

# Process signal
audio = torch.randn(2, 48000)     # Batch of 2 signals
filtered = filterbank(audio)      # (2, 31, 48000) - 31 frequency channels
envelope = ihc(filtered)          # (2, 31, 48000) - Envelope extraction
adapted = adaptation(envelope)    # (2, 31, 48000) - Temporal adaptation

print(f"Input: {audio.shape}")
# Input: torch.Size([2, 48000])
print(f"After Gammatone filterbank: {filtered.shape}")
# After Gammatone filterbank: torch.Size([2, 31, 48000])
print(f"After IHC envelope: {envelope.shape}")
# After IHC envelope: torch.Size([2, 31, 48000])
print(f"After adaptation: {adapted.shape}")
# After adaptation: torch.Size([2, 31, 48000])
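The 31 channels in the example above are consistent with center frequencies spaced one ERB apart between 80 Hz and 8000 Hz. The dependency-free sketch below reproduces that count using the ERB-number formula of Glasberg and Moore (1990). It is an illustration only: the helper names are hypothetical, and whether GammatoneFilterbank uses exactly this spacing rule internally is an assumption.

```python
import math

def hz_to_erb(f_hz):
    """Frequency in Hz -> ERB-number (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_to_hz(erb):
    """Inverse of hz_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_spaced_fc(f_low, f_high, spacing=1.0):
    """Center frequencies `spacing` ERB apart, centered within [f_low, f_high].

    Hypothetical helper for illustration; not part of the torch_amt API.
    """
    lo, hi = hz_to_erb(f_low), hz_to_erb(f_high)
    n_steps = math.floor((hi - lo) / spacing)
    # Split the leftover range evenly between both ends
    offset = ((hi - lo) - n_steps * spacing) / 2.0
    return [erb_to_hz(lo + offset + k * spacing) for k in range(n_steps + 1)]

fc = erb_spaced_fc(80.0, 8000.0)
print(len(fc))  # 31
```

The range from 80 Hz to 8000 Hz spans roughly 30.5 ERB, so one-ERB steps yield 31 center frequencies.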

Hardware Acceleration

import torch
import torch_amt

# Check available hardware
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")

# Pick the best available device (MPS or CUDA, falling back to CPU)
if torch.backends.mps.is_available():
    device = torch.device('mps')   # Apple Silicon
elif torch.cuda.is_available():
    device = torch.device('cuda')  # NVIDIA GPU
else:
    device = torch.device('cpu')
print(f"Using device: {device}")

# Move the model to the selected device
model = torch_amt.Dau1997(fs=48000).to(device)

# Create the input on the same device and process it
audio = torch.randn(8, 48000, device=device)
output = model(audio)

Learnable Front-ends for Neural Networks

import torch
import torch.nn as nn
import torch_amt

class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable auditory front-end
        self.auditory = torch_amt.King2019(fs=48000, learnable=True)
        self.classifier = nn.Linear(155, 10)  # 31 freqs × 5 mods = 155 → 10 classes

    def forward(self, audio):
        features = self.auditory(audio)     # (B, T, F, M) e.g., (4, 24000, 31, 5)
        pooled = features.mean(dim=1)       # (B, F, M) e.g., (4, 31, 5) - Pool over time
        flattened = pooled.flatten(1)       # (B, F×M) e.g., (4, 155)
        return self.classifier(flattened)   # (B, 10)

# Train end-to-end with backpropagation
model = AudioClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Example forward pass
audio = torch.randn(4, 24000)  # Batch of 4 signals, 0.5 seconds @ 48kHz
logits = model(audio)  # (4, 10)
print(f"Input: {audio.shape} → Output: {logits.shape}")
# Input: torch.Size([4, 24000]) → Output: torch.Size([4, 10])
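The example above builds the optimizer but stops at a forward pass. A single training step could look like the sketch below. To keep it runnable without torch_amt installed, it swaps in a hypothetical StandInFrontEnd (a learnable Conv1d filterbank, not part of torch_amt) and uses random labels; with the real King2019 front-end only the model definition would change.

```python
import torch
import torch.nn as nn

class StandInFrontEnd(nn.Module):
    """Hypothetical stand-in for a learnable auditory front-end (not torch_amt)."""
    def __init__(self, n_channels=31, kernel_size=65):
        super().__init__()
        self.filters = nn.Conv1d(1, n_channels, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, audio):                   # audio: (B, T)
        x = self.filters(audio.unsqueeze(1))    # (B, F, T)
        return torch.relu(x).mean(dim=-1)       # (B, F): rectify, pool over time

torch.manual_seed(0)
model = nn.Sequential(StandInFrontEnd(), nn.Linear(31, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
criterion = nn.CrossEntropyLoss()

audio = torch.randn(4, 2400)            # dummy batch of 4 signals
labels = torch.randint(0, 10, (4,))     # dummy class labels

before = model[0].filters.weight.detach().clone()

# One optimization step: gradients reach the front-end filters
optimizer.zero_grad()
loss = criterion(model(audio), labels)
loss.backward()
optimizer.step()

changed = not torch.equal(before, model[0].filters.weight.detach())
print(f"loss: {loss.item():.4f}, front-end updated: {changed}")
```

The point of the sketch is the last check: the front-end weights change after the step, which is what end-to-end training of a differentiable front-end relies on.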

Available Models

torch_amt includes 6 complete auditory models:

Dau1997 - Temporal processing model with adaptation loops
Glasberg2002 - Loudness model with specific loudness transformation
Moore2016 - Binaural loudness model with spatial processing
King2019 - FM/AM masking model with broken-stick compression
Osses2021 - Temporal integration model
Paulick2024 - Physiological CASP model with advanced IHC

It also provides 43+ building-block components, organized into:

Ear Models - Outer and middle ear filtering
Auditory Filterbanks - Gammatone, DRNL, excitation patterns
Inner Hair Cell Models - Envelope extraction, physiological models
Modulation Analysis - Temporal modulation filterbanks (standard & fast)
Loudness Processing - Compression, specific loudness, binaural processing
Signal Processing - Filters, transforms, utilities
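As an illustration of one such building block, the adaptation loops named in the Dau1997 entry above can be sketched as a chain of divisive feedback stages: each stage divides its input by a low-pass filtered copy of its own output. The dependency-free sketch below is a simplified textbook form using the time constants commonly cited for the Dau et al. model (5, 50, 129, 253 and 500 ms). It omits overshoot limiting, minimum-level handling and output scaling, and it is not torch_amt's AdaptLoop implementation; the function name and initial states are assumptions.

```python
import math

def adaptation_loops(signal, fs, taus=(0.005, 0.050, 0.129, 0.253, 0.500)):
    """Simplified chain of divisive feedback loops for a positive envelope.

    Hypothetical helper for illustration; not part of the torch_amt API.
    """
    coeffs = [math.exp(-1.0 / (tau * fs)) for tau in taus]
    states = [1.0] * len(taus)          # arbitrary positive initial states
    out = []
    for x in signal:
        y = x
        for i, a in enumerate(coeffs):
            y = y / states[i]                          # divide by the low-passed output
            states[i] = a * states[i] + (1.0 - a) * y  # first-order low-pass of the stage output
        out.append(y)
    return out

fs = 1000                                # low sampling rate keeps the pure-Python loop fast
response = adaptation_loops([4.0] * (10 * fs), fs)
steady = response[-1]
expected = 4.0 ** (1.0 / 32.0)           # five square-root stages in series
print(abs(steady - expected) < 1e-6)
```

For a stationary input each stage settles at the square root of its input, so five stages approach the 32nd root of the input level, while fast onsets pass through with far less compression.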

Citation

If you use torch_amt in your research, please cite:

@software{giacomelli2026torch_amt,
  author = {Giacomelli, Stefano},
  title = {torch\_amt: PyTorch Auditory Modeling Toolbox},
  year = {2026},
  url = {https://github.com/StefanoGiacomelli/torch_amt},
  version = {0.1.0}
}

Also consider citing the original AMT paper:

@article{majdak2022amt,
  author = {Majdak, Piotr and Hollomey, Clara and Baumgartner, Robert},
  title = {AMT 1.x: A toolbox for reproducible research in auditory modeling},
  journal = {Acta Acustica},
  volume = {6},
  pages = {19},
  year = {2022},
  doi = {10.1051/aacus/2022011},
  url = {https://amtoolbox.org/}
}

Contact

Stefano Giacomelli
ICT - Ph.D. Candidate
Department of Engineering, Information Science & Mathematics (DISIM dpt.)
University of L’Aquila, Italy

📧 Email: stefano.giacomelli@graduate.univaq.it
🔗 GitHub: https://github.com/StefanoGiacomelli
🆔 ORCID: https://orcid.org/0009-0009-0438-1748
🎓 Scholar: https://scholar.google.com/citations?user=l-n0hl4AAAAJ&hl=it
💼 LinkedIn: https://www.linkedin.com/in/stefano-giacomelli-811654135

This project is funded by the Italian Ministry of University and Research under the Italian National Recovery and Resilience Plan (NRRP), “Methods of Computational Auditory Scene Analysis and Synthesis supporting eXtended and Immersive Reality Services”.

License

This project is licensed under the GNU General Public License v3.0 or later (GPLv3+).