torch_amt - PyTorch Auditory Modeling Toolbox
Differentiable, hardware-accelerated PyTorch implementations of computational auditory models from the MATLAB Auditory Modeling Toolbox (AMT).
Overview
torch_amt provides a comprehensive collection of differentiable auditory models and building blocks for psychoacoustic research, computational neuroscience, and audio deep learning applications.
Key Features:
Hardware acceleration - CUDA, MPS (Apple Silicon), and CPU support
Fully differentiable - Integrate with neural networks and optimize via backpropagation
Modular architecture - Mix and match components for custom auditory pipelines
Scientific adherence - Matches the MATLAB AMT v1.6.0 implementations
Comprehensive documentation - Detailed API reference with equations and examples
Installation
pip install torch-amt
Or from source:
git clone https://github.com/StefanoGiacomelli/torch_amt.git
cd torch_amt
pip install -e .
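After either install route, a quick sanity check is to confirm that Python can locate the packages. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of torch_amt.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Report whether PyTorch and torch_amt are visible to this interpreter
for name in ("torch", "torch_amt"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `torch_amt` is reported as missing, the install most likely went into a different environment than the interpreter running the check.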
Quick Start
Complete Auditory Model
import torch
import torch_amt
# Load Dau et al. (1997) model
model = torch_amt.Dau1997(fs=48000)
# Process 1 second of audio
audio = torch.randn(1, 48000) # (batch, time)
output = model(audio)
print(f"Input: {audio.shape}")
# Input: torch.Size([1, 48000])
print(f"Output: List of {len(output)} frequency channels")
# Output: List of 31 frequency channels
print(f"Each channel shape: {output[0].shape}")
# Each channel shape: torch.Size([1, 8, 48000]) - (batch, modulation_channels, time)
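Because the output above is a list with one tensor per frequency channel, downstream code often wants a single tensor. The sketch below is one possible way to do that, not a torch_amt function: the helper `stack_channels` is hypothetical, and it assumes that channels may carry different numbers of modulation filters, in which case the modulation axis is zero-padded. Random stand-in tensors replace the real model output so the block runs without torch_amt.

```python
import torch
import torch.nn.functional as F


def stack_channels(channels):
    """Stack per-frequency tensors shaped (batch, mod, time) into (batch, freq, mod, time).

    Channels with fewer modulation filters are zero-padded along the modulation axis.
    """
    max_mod = max(c.shape[1] for c in channels)
    # F.pad takes pairs starting from the last dim: (time_left, time_right, mod_left, mod_right)
    padded = [F.pad(c, (0, 0, 0, max_mod - c.shape[1])) for c in channels]
    return torch.stack(padded, dim=1)


# Stand-in for a model output: three frequency channels with 6, 8 and 8 modulation channels
output = [torch.randn(1, n_mod, 4800) for n_mod in (6, 8, 8)]
stacked = stack_channels(output)
print(stacked.shape)  # torch.Size([1, 3, 8, 4800])
```

Zero-padding keeps the tensor rectangular, but any later pooling over the modulation axis should account for the padded entries.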
Custom Processing Pipeline
import torch
import torch_amt
# Build custom auditory processing chain
filterbank = torch_amt.GammatoneFilterbank(fs=48000, fc=(80, 8000))
ihc = torch_amt.IHCEnvelope(fs=48000)
adaptation = torch_amt.AdaptLoop(fs=48000)
# Process signal
audio = torch.randn(2, 48000) # Batch of 2 signals
filtered = filterbank(audio) # (2, 31, 48000) - 31 frequency channels
envelope = ihc(filtered) # (2, 31, 48000) - Envelope extraction
adapted = adaptation(envelope) # (2, 31, 48000) - Temporal adaptation
print(f"Input: {audio.shape}")
# Input: torch.Size([2, 48000])
print(f"After Gammatone filterbank: {filtered.shape}")
# After Gammatone filterbank: torch.Size([2, 31, 48000])
print(f"After IHC envelope: {envelope.shape}")
# After IHC envelope: torch.Size([2, 31, 48000])
print(f"After adaptation: {adapted.shape}")
# After adaptation: torch.Size([2, 31, 48000])
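The 31 channels reported for `fc=(80, 8000)` are consistent with center frequencies spaced one ERB apart on the Glasberg and Moore ERB-number scale. The sketch below illustrates that spacing rule in plain Python. Whether `GammatoneFilterbank` computes its center frequencies exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def freq_to_erb(f_hz):
    """Convert frequency in Hz to ERB-number (Glasberg and Moore scale)."""
    return 9.2645 * math.log(1.0 + 0.00437 * f_hz)


def erb_to_freq(erb):
    """Convert ERB-number back to frequency in Hz."""
    return (math.exp(erb / 9.2645) - 1.0) / 0.00437


def erb_spaced_fc(f_low, f_high, spacing=1.0):
    """Center frequencies `spacing` ERB apart, centered within [f_low, f_high]."""
    lo, hi = freq_to_erb(f_low), freq_to_erb(f_high)
    n = math.floor((hi - lo) / spacing)
    # Split the leftover range equally between both ends
    offset = ((hi - lo) - n * spacing) / 2.0
    return [erb_to_freq(lo + offset + k * spacing) for k in range(n + 1)]


fc = erb_spaced_fc(80.0, 8000.0)
print(len(fc))  # 31
```

A smaller `spacing` value yields a denser filterbank at the cost of more channels to process.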
Hardware Acceleration
import torch
import torch_amt
# Check available hardware
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")
# Move model to GPU (CUDA or MPS)
model = torch_amt.Dau1997(fs=48000)
if torch.backends.mps.is_available():
    model = model.to('mps')  # Apple Silicon
    print("Using device: mps")
elif torch.cuda.is_available():
    model = model.cuda()  # NVIDIA GPU
    print("Using device: cuda")
else:
    print("Using device: cpu")
# Process on accelerated hardware
audio = torch.randn(8, 48000).to(model.gammatone_fb.fc.device)
output = model(audio)
Learnable Front-ends for Neural Networks
import torch
import torch.nn as nn
import torch_amt
class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable auditory front-end
        self.auditory = torch_amt.King2019(fs=48000, learnable=True)
        self.classifier = nn.Linear(155, 10)  # 31 freqs × 5 mods = 155 → 10 classes

    def forward(self, audio):
        features = self.auditory(audio)    # (B, T, F, M) e.g., (4, 24000, 31, 5)
        pooled = features.mean(dim=1)      # (B, F, M) e.g., (4, 31, 5) - Pool over time
        flattened = pooled.flatten(1)      # (B, F×M) e.g., (4, 155)
        return self.classifier(flattened)  # (B, 10)
# Train end-to-end with backpropagation
model = AudioClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
# Example forward pass
audio = torch.randn(4, 24000) # Batch of 4 signals, 0.5 seconds @ 48kHz
logits = model(audio) # (4, 10)
print(f"Input: {audio.shape} → Output: {logits.shape}")
# Input: torch.Size([4, 24000]) → Output: torch.Size([4, 10])
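The example above stops at the forward pass. A single end-to-end optimization step might look like the sketch below. To stay runnable without torch_amt, it replaces the King2019 front-end with a hypothetical `StandInFrontEnd` that only mimics the (B, T, F, M) output shape. The random labels and the choice of `CrossEntropyLoss` are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class StandInFrontEnd(nn.Module):
    """Hypothetical placeholder for a learnable auditory front-end.

    Maps (B, T) audio to (B, T, 31, 5) features; it is NOT an auditory model.
    """

    def __init__(self, n_freqs=31, n_mods=5):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(n_freqs, n_mods))

    def forward(self, audio):
        # Broadcast (B, T, 1, 1) * (F, M) -> (B, T, F, M)
        return audio[:, :, None, None] * self.gain


class AudioClassifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.auditory = StandInFrontEnd()
        self.classifier = nn.Linear(31 * 5, n_classes)

    def forward(self, audio):
        pooled = self.auditory(audio).mean(dim=1)  # pool over time -> (B, F, M)
        return self.classifier(pooled.flatten(1))  # (B, n_classes)


torch.manual_seed(0)
model = AudioClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
criterion = nn.CrossEntropyLoss()

audio = torch.randn(4, 2400)          # short stand-in batch
labels = torch.randint(0, 10, (4,))   # random class labels, for illustration only

optimizer.zero_grad()
logits = model(audio)
loss = criterion(logits, labels)
loss.backward()                       # gradients reach the front-end parameters
front_end_grad = model.auditory.gain.grad
optimizer.step()
print(f"loss: {loss.item():.4f}, front-end grad shape: {tuple(front_end_grad.shape)}")
```

The point of the check on `front_end_grad` is that the front-end receives gradients alongside the classifier, which is what end-to-end training requires.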
Available Models
torch_amt includes 6 complete auditory models:
Dau1997 - Temporal processing model with adaptation loops
Glasberg2002 - Loudness model with specific loudness transformation
Moore2016 - Binaural loudness model with spatial processing
King2019 - FM/AM masking model with broken-stick compression
Osses2021 - Temporal integration model
Paulick2024 - Physiological CASP model with advanced IHC
Plus 43+ building block components organized into:
Ear Models - Outer and middle ear filtering
Auditory Filterbanks - Gammatone, DRNL, excitation patterns
Inner Hair Cell Models - Envelope extraction, physiological models
Modulation Analysis - Temporal modulation filterbanks (standard & fast)
Loudness Processing - Compression, specific loudness, binaural processing
Signal Processing - Filters, transforms, utilities
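As an illustration of one component named in the model list, the broken-stick compression mentioned for King2019 is commonly described as a gain that is linear below a knee point and follows a power law above it. The sketch below shows that general idea on a single sample. The function name, knee, and exponent are illustrative assumptions and are not taken from the torch_amt implementation.

```python
import math


def broken_stick(x, knee=1.0, exponent=0.3):
    """Linear below the knee, power-law compression above it, sign preserved."""
    magnitude = abs(x)
    if magnitude <= knee:
        return x
    # The two branches meet at the knee, so the curve is continuous
    return math.copysign(knee * (magnitude / knee) ** exponent, x)


for sample in (0.5, 1.0, 8.0, -8.0):
    print(sample, round(broken_stick(sample, knee=1.0, exponent=0.3), 4))
```

Samples below the knee pass through unchanged, while larger magnitudes grow much more slowly than the input.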
Documentation Contents
API Reference
Citation
If you use torch_amt in your research, please cite:
@software{giacomelli2026torch_amt,
author = {Giacomelli, Stefano},
title = {torch\_amt: PyTorch Auditory Modeling Toolbox},
year = {2026},
url = {https://github.com/StefanoGiacomelli/torch_amt},
version = {0.1.0}
}
Also consider citing the original AMT paper:
@article{majdak2022amt,
author = {Majdak, Piotr and Hollomey, Clara and Baumgartner, Robert},
title = {AMT 1.x: A toolbox for reproducible research in auditory modeling},
journal = {Acta Acustica},
volume = {6},
pages = {19},
year = {2022},
doi = {10.1051/aacus/2022011},
url = {https://amtoolbox.org/}
}
Contact
Stefano Giacomelli
ICT - Ph.D. Candidate
Department of Engineering, Information Science & Mathematics (DISIM dpt.)
University of L'Aquila, Italy
Email: stefano.giacomelli@graduate.univaq.it
GitHub: https://github.com/StefanoGiacomelli
ORCID: https://orcid.org/0009-0009-0438-1748
Scholar: https://scholar.google.com/citations?user=l-n0hl4AAAAJ&hl=it
LinkedIn: https://www.linkedin.com/in/stefano-giacomelli-811654135
This project is funded under the Italian National Ministry of University and Research, for the Italian National Recovery and Resilience Plan (NRRP) "Methods of Computational Auditory Scene Analysis and Synthesis supporting eXtended and Immersive Reality Services".
License
This project is licensed under the GNU General Public License v3.0 or later (GPLv3+).