torch_amt - PyTorch Auditory Modeling Toolbox

Differentiable, hardware-accelerated PyTorch implementations of computational auditory models from the MATLAB Auditory Modeling Toolbox (AMT).

License: GPL v3 | Python 3.14+ | PyTorch 2.0+

Overview

torch_amt provides a comprehensive collection of differentiable auditory models and building blocks for psychoacoustic research, computational neuroscience, and audio deep learning applications.

Key Features:

  • 🔥 Hardware acceleration - CUDA, MPS (Apple Silicon), and CPU support

  • 📊 Fully differentiable - Integrate with neural networks and optimize via backpropagation

  • 🧩 Modular architecture - Mix and match components for custom auditory pipelines

  • 🎓 Scientific adherence - Matching MATLAB AMT v1.6.0 implementations

  • 📚 Comprehensive documentation - Detailed API reference with equations and examples

Installation

pip install torch-amt

Or from source:

git clone https://github.com/StefanoGiacomelli/torch_amt.git
cd torch_amt
pip install -e .

Quick Start

Complete Auditory Model

import torch
import torch_amt

# Load Dau et al. (1997) model
model = torch_amt.Dau1997(fs=48000)

# Process 1 second of audio
audio = torch.randn(1, 48000)  # (batch, time)
output = model(audio)

print(f"Input: {audio.shape}")
# Input: torch.Size([1, 48000])
print(f"Output: List of {len(output)} frequency channels")
# Output: List of 31 frequency channels
print(f"Each channel shape: {output[0].shape}")
# Each channel shape: torch.Size([1, 8, 48000]) - (batch, modulation_channels, time)

Custom Processing Pipeline

import torch
import torch_amt

# Build custom auditory processing chain
filterbank = torch_amt.GammatoneFilterbank(fs=48000, fc=(80, 8000))
ihc = torch_amt.IHCEnvelope(fs=48000)
adaptation = torch_amt.AdaptLoop(fs=48000)

# Process signal
audio = torch.randn(2, 48000)     # Batch of 2 signals
filtered = filterbank(audio)      # (2, 31, 48000) - 31 frequency channels
envelope = ihc(filtered)          # (2, 31, 48000) - Envelope extraction
adapted = adaptation(envelope)    # (2, 31, 48000) - Temporal adaptation

print(f"Input: {audio.shape}")
# Input: torch.Size([2, 48000])
print(f"After Gammatone filterbank: {filtered.shape}")
# After Gammatone filterbank: torch.Size([2, 31, 48000])
print(f"After IHC envelope: {envelope.shape}")
# After IHC envelope: torch.Size([2, 31, 48000])
print(f"After adaptation: {adapted.shape}")
# After adaptation: torch.Size([2, 31, 48000])
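
The 31 frequency channels reported above are consistent with one filter per ERB between 80 Hz and 8 kHz. The sketch below reproduces that count in plain Python using the Glasberg and Moore (1990) ERB-number scale. It is an illustration only: the helper names are hypothetical, and the assumption that torch_amt spaces its Gammatone center frequencies at 1-ERB intervals starting from the lower edge is not confirmed by this page.

```python
import math


def freq_to_erb_number(f_hz):
    """Map frequency in Hz to the ERB-number scale (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_number_to_freq(erb):
    """Inverse of freq_to_erb_number."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_spaced_centers(f_low, f_high, step=1.0):
    """Center frequencies starting at f_low, spaced `step` ERB apart, not exceeding f_high.

    Assumption: spacing starts exactly at f_low; the library may center the grid differently.
    """
    e_low = freq_to_erb_number(f_low)
    e_high = freq_to_erb_number(f_high)
    n = int(math.floor((e_high - e_low) / step)) + 1
    return [erb_number_to_freq(e_low + i * step) for i in range(n)]


centers = erb_spaced_centers(80.0, 8000.0)
print(len(centers))  # 31
```

Widening or narrowing the `fc` range changes the channel count accordingly, which in turn changes the size of every downstream tensor in the pipeline.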

Hardware Acceleration

import torch
import torch_amt

# Check available hardware
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")

# Pick the best available device: MPS (Apple Silicon), then CUDA (NVIDIA), else CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
print(f"Using device: {device}")

# Move the model to the selected device
model = torch_amt.Dau1997(fs=48000).to(device)

# Process on accelerated hardware; the input must live on the same device as the model
audio = torch.randn(8, 48000, device=device)
output = model(audio)
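
When choosing a batch size for GPU processing, it helps to estimate how much memory the model output alone occupies. The following back-of-the-envelope sketch uses the shapes shown in the Quick Start and makes several assumptions: every one of the 31 frequency channels is taken to have 8 modulation channels (the Quick Start only shows the first channel), tensors are float32 (PyTorch's default), and intermediate activations and autograd buffers are ignored, so the real footprint will be higher. The function name is hypothetical.

```python
def output_bytes(batch, freq_channels, mod_channels, samples, bytes_per_element=4):
    """Bytes needed to hold the model output, assuming equally sized channels."""
    return batch * freq_channels * mod_channels * samples * bytes_per_element


# Batch of 8 one-second signals at 48 kHz, shapes taken from the Quick Start
n_bytes = output_bytes(batch=8, freq_channels=31, mod_channels=8, samples=48000)
print(f"{n_bytes / 2**20:.1f} MiB")  # 363.3 MiB
```

If this lower bound is already close to the available device memory, reducing the batch size or the signal length is the simplest remedy.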

Learnable Front-ends for Neural Networks

import torch
import torch.nn as nn
import torch_amt

class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable auditory front-end
        self.auditory = torch_amt.King2019(fs=48000, learnable=True)
        self.classifier = nn.Linear(155, 10)  # 31 freqs × 5 mods = 155 → 10 classes

    def forward(self, audio):
        features = self.auditory(audio)     # (B, T, F, M) e.g., (4, 24000, 31, 5)
        pooled = features.mean(dim=1)       # (B, F, M) e.g., (4, 31, 5) - Pool over time
        flattened = pooled.flatten(1)       # (B, F×M) e.g., (4, 155)
        return self.classifier(flattened)   # (B, 10)

# Train end-to-end with backpropagation
model = AudioClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Example forward pass
audio = torch.randn(4, 24000)  # Batch of 4 signals, 0.5 seconds @ 48kHz
logits = model(audio)  # (4, 10)
print(f"Input: {audio.shape} → Output: {logits.shape}")
# Input: torch.Size([4, 24000]) → Output: torch.Size([4, 10])
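
The example above stops at the forward pass. A single training update could look like the sketch below, which uses only standard PyTorch calls. To keep it runnable without torch_amt installed, it trains a small stand-in module (`TinyClassifier`, a hypothetical placeholder, not part of torch_amt); in practice you would pass an `AudioClassifier` instance to `train_step` instead. The random labels are for illustration only.

```python
import torch
import torch.nn as nn


def train_step(model, optimizer, audio, labels):
    """Run one supervised update and return the scalar loss."""
    model.train()
    optimizer.zero_grad()
    logits = model(audio)                               # (B, num_classes)
    loss = nn.functional.cross_entropy(logits, labels)
    loss.backward()                                     # gradients reach every trainable parameter
    optimizer.step()
    return loss.item()


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for AudioClassifier: frame-wise mean magnitude + linear layer."""

    def __init__(self, n_frames=16, n_classes=10):
        super().__init__()
        self.n_frames = n_frames
        self.classifier = nn.Linear(n_frames, n_classes)

    def forward(self, audio):
        b, t = audio.shape
        usable = t - t % self.n_frames                  # drop samples that do not fill a frame
        frames = audio[:, :usable].reshape(b, self.n_frames, -1)
        return self.classifier(frames.abs().mean(dim=-1))


torch.manual_seed(0)
model = TinyClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

audio = torch.randn(4, 24000)                           # batch of 4 signals
labels = torch.randint(0, 10, (4,))                     # random class labels (illustration only)
loss = train_step(model, optimizer, audio, labels)
print(f"loss: {loss:.3f}")
```

With a learnable auditory front-end, the same `loss.backward()` call would also populate gradients for the front-end parameters, which is what makes end-to-end training possible.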

Available Models

torch_amt includes 6 complete auditory models:

  • Dau1997 - Temporal processing model with adaptation loops

  • Glasberg2002 - Loudness model with specific loudness transformation

  • Moore2016 - Binaural loudness model with spatial processing

  • King2019 - FM/AM masking model with broken-stick compression

  • Osses2021 - Temporal integration model

  • Paulick2024 - Physiological CASP model with advanced IHC

Plus 43+ building block components organized into:

  • Ear Models - Outer and middle ear filtering

  • Auditory Filterbanks - Gammatone, DRNL, excitation patterns

  • Inner Hair Cell Models - Envelope extraction, physiological models

  • Modulation Analysis - Temporal modulation filterbanks (standard & fast)

  • Loudness Processing - Compression, specific loudness, binaural processing

  • Signal Processing - Filters, transforms, utilities

Citation

If you use torch_amt in your research, please cite:

@software{giacomelli2026torch_amt,
  author = {Giacomelli, Stefano},
  title = {torch\_amt: PyTorch Auditory Modeling Toolbox},
  year = {2026},
  url = {https://github.com/StefanoGiacomelli/torch_amt},
  version = {0.1.0}
}

Also consider citing the original AMT paper:

@article{majdak2022amt,
  author = {Majdak, Piotr and Hollomey, Clara and Baumgartner, Robert},
  title = {AMT 1.x: A toolbox for reproducible research in auditory modeling},
  journal = {Acta Acustica},
  volume = {6},
  pages = {19},
  year = {2022},
  doi = {10.1051/aacus/2022011},
  url = {https://amtoolbox.org/}
}

Contact

Stefano Giacomelli
ICT - Ph.D. Candidate
Department of Engineering, Information Science & Mathematics (DISIM dpt.)
University of L'Aquila, Italy

DISIM - University of L'Aquila

📧 Email: stefano.giacomelli@graduate.univaq.it
🔗 GitHub: https://github.com/StefanoGiacomelli
🆔 ORCID: https://orcid.org/0009-0009-0438-1748
🎓 Scholar: https://scholar.google.com/citations?user=l-n0hl4AAAAJ&hl=it
💼 LinkedIn: https://www.linkedin.com/in/stefano-giacomelli-811654135

This project is funded by the Italian National Ministry of University and Research under the Italian National Recovery and Resilience Plan (NRRP) project “Methods of Computational Auditory Scene Analysis and Synthesis supporting eXtended and Immersive Reality Services”.

License

This project is licensed under the GNU General Public License v3.0 or later (GPLv3+).