torch_amt - PyTorch Auditory Modeling Toolbox
==============================================

**Differentiable, hardware-accelerated PyTorch implementations of computational auditory models from the MATLAB Auditory Modeling Toolbox (AMT).**

.. image:: https://img.shields.io/badge/License-GPLv3-blue.svg
   :target: https://www.gnu.org/licenses/gpl-3.0
   :alt: License: GPL v3

.. image:: https://img.shields.io/badge/python-3.14+-blue.svg
   :target: https://www.python.org/downloads/
   :alt: Python 3.14+

.. image:: https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg
   :target: https://pytorch.org/
   :alt: PyTorch 2.0+

.. figure:: ../../dev/AMT_front_image.png
   :alt: torch_amt - PyTorch Auditory Modeling Toolbox
   :align: center
   :width: 800px

Overview
--------

torch_amt provides a comprehensive collection of differentiable auditory models and building blocks
for psychoacoustic research, computational neuroscience, and audio deep learning applications.

**Key Features:**

* 🔥 **Hardware acceleration** - CUDA, MPS (Apple Silicon), and CPU support
* 📊 **Fully differentiable** - Integrate with neural networks and optimize via backpropagation
* 🧩 **Modular architecture** - Mix and match components for custom auditory pipelines
* 🎓 **Scientific adherence** - Follows the MATLAB AMT v1.6.0 reference implementations
* 📚 **Comprehensive documentation** - Detailed API reference with equations and examples

Installation
------------

.. code-block:: bash

   pip install torch-amt

Or from source:

.. code-block:: bash

   git clone https://github.com/StefanoGiacomelli/torch_amt.git
   cd torch_amt
   pip install -e .
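
To confirm that the package is importable after either install method, a minimal sketch using only the standard library follows. The helper name ``is_importable`` is hypothetical and not part of torch_amt; the import name ``torch_amt`` is taken from the Quick Start examples.

.. code-block:: python

    import importlib.util


    def is_importable(module_name: str) -> bool:
        """Return True if a top-level module can be located without importing it."""
        return importlib.util.find_spec(module_name) is not None


    if __name__ == "__main__":
        # Report whether PyTorch and torch_amt are visible to this interpreter
        for name in ("torch", "torch_amt"):
            status = "found" if is_importable(name) else "missing"
            print(f"{name}: {status}")

A ``missing`` result usually means the install went into a different Python environment than the one running the check.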

Quick Start
-----------

Complete Auditory Model
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import torch
    import torch_amt

    # Load Dau et al. (1997) model
    model = torch_amt.Dau1997(fs=48000)

    # Process 1 second of audio
    audio = torch.randn(1, 48000)  # (batch, time)
    output = model(audio)

    print(f"Input: {audio.shape}")
    # Input: torch.Size([1, 48000])
    print(f"Output: List of {len(output)} frequency channels")
    # Output: List of 31 frequency channels
    print(f"Each channel shape: {output[0].shape}")
    # Each channel shape: torch.Size([1, 8, 48000]) - (batch, modulation_channels, time)
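
The model returns a Python list with one tensor per frequency channel. When a single tensor is more convenient, the list can be stacked along a new frequency axis. The sketch below is an illustration, not part of torch_amt: the helper name ``stack_channels`` is hypothetical, and it assumes every channel has the same shape, which the example above shows only for ``output[0]``.

.. code-block:: python

    import torch


    def stack_channels(channels):
        """Stack per-frequency tensors into (batch, freq, mod, time).

        Assumes all tensors share one shape; raises ValueError otherwise.
        """
        shapes = {tuple(c.shape) for c in channels}
        if len(shapes) != 1:
            raise ValueError(f"Channels have different shapes: {sorted(shapes)}")
        return torch.stack(list(channels), dim=1)


    # Dummy data standing in for a model output: 31 channels of (1, 8, 100)
    dummy = [torch.zeros(1, 8, 100) for _ in range(31)]
    stacked = stack_channels(dummy)
    print(stacked.shape)

If the channels of a real model output differ in shape, the helper raises an error instead of silently padding, so the list form should be kept in that case.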

Custom Processing Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import torch
    import torch_amt

    # Build custom auditory processing chain
    filterbank = torch_amt.GammatoneFilterbank(fs=48000, fc=(80, 8000))
    ihc = torch_amt.IHCEnvelope(fs=48000)
    adaptation = torch_amt.AdaptLoop(fs=48000)

    # Process signal
    audio = torch.randn(2, 48000)     # Batch of 2 signals
    filtered = filterbank(audio)      # (2, 31, 48000) - 31 frequency channels
    envelope = ihc(filtered)          # (2, 31, 48000) - Envelope extraction
    adapted = adaptation(envelope)    # (2, 31, 48000) - Temporal adaptation

    print(f"Input: {audio.shape}")
    # Input: torch.Size([2, 48000])
    print(f"After Gammatone filterbank: {filtered.shape}")
    # After Gammatone filterbank: torch.Size([2, 31, 48000])
    print(f"After IHC envelope: {envelope.shape}")
    # After IHC envelope: torch.Size([2, 31, 48000])
    print(f"After adaptation: {adapted.shape}")
    # After adaptation: torch.Size([2, 31, 48000])
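
The three stages above are applied strictly in sequence, so they can be wrapped into a single callable. The following is a minimal sketch using only the standard library; ``chain`` is a hypothetical helper, not a torch_amt function, and the stand-in stages in the demo are plain Python functions rather than auditory components.

.. code-block:: python

    from functools import reduce
    from typing import Any, Callable, Sequence


    def chain(stages: Sequence[Callable[[Any], Any]]) -> Callable[[Any], Any]:
        """Compose callables left to right: chain([f, g])(x) == g(f(x))."""
        def run(x: Any) -> Any:
            return reduce(lambda value, stage: stage(value), stages, x)
        return run


    # Stand-in stages (plain functions) to show the call order
    pipeline = chain([lambda x: x + 1, lambda x: x * 2])
    result = pipeline(3)  # (3 + 1) * 2
    print(result)

Applied to the objects defined above, ``chain([filterbank, ihc, adaptation])(audio)`` would perform the same three calls as the explicit version.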

Hardware Acceleration
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import torch
    import torch_amt

    # Check available hardware
    print(f"CUDA available: {torch.cuda.is_available()}")
    print(f"MPS available: {torch.backends.mps.is_available()}")

    # Select the best available device
    if torch.backends.mps.is_available():
        device = torch.device('mps')   # Apple Silicon
    elif torch.cuda.is_available():
        device = torch.device('cuda')  # NVIDIA GPU
    else:
        device = torch.device('cpu')
    print(f"Using device: {device}")

    # Move the model to the selected device
    model = torch_amt.Dau1997(fs=48000).to(device)

    # Create the input on the same device and process it
    audio = torch.randn(8, 48000, device=device)
    output = model(audio)

Learnable Front-ends for Neural Networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch_amt

    class AudioClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            # Learnable auditory front-end
            self.auditory = torch_amt.King2019(fs=48000, learnable=True)
            self.classifier = nn.Linear(155, 10)  # 31 freqs × 5 mods = 155 → 10 classes
        
        def forward(self, audio):
            features = self.auditory(audio)     # (B, T, F, M) e.g., (4, 24000, 31, 5)
            pooled = features.mean(dim=1)       # (B, F, M) e.g., (4, 31, 5) - Pool over time
            flattened = pooled.flatten(1)       # (B, F×M) e.g., (4, 155)
            return self.classifier(flattened)   # (B, 10)

    # Train end-to-end with backpropagation
    model = AudioClassifier()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

    # Example forward pass
    audio = torch.randn(4, 24000)  # Batch of 4 signals, 0.5 seconds @ 48 kHz
    logits = model(audio)  # (4, 10)
    print(f"Input: {audio.shape} → Output: {logits.shape}")
    # Input: torch.Size([4, 24000]) → Output: torch.Size([4, 10])
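
The example above stops at a forward pass. One optimization step could look like the sketch below, which is written to run without torch_amt: ``ToyFrontEnd`` is a hypothetical stand-in for ``torch_amt.King2019(fs=48000, learnable=True)`` that only reproduces the ``(B, T, F, M)`` output layout, and the labels, learning rate, and clip length are arbitrary illustration values.

.. code-block:: python

    import torch
    import torch.nn as nn


    class ToyFrontEnd(nn.Module):
        """Hypothetical stand-in for a learnable auditory front-end.

        Maps (B, T) audio to (B, T, F, M) features with one learnable gain
        per (frequency, modulation) pair. It is NOT an auditory model.
        """

        def __init__(self, n_freqs: int = 31, n_mods: int = 5):
            super().__init__()
            self.gain = nn.Parameter(torch.ones(n_freqs, n_mods))

        def forward(self, audio: torch.Tensor) -> torch.Tensor:
            # Broadcast (B, T, 1, 1) against (F, M) -> (B, T, F, M)
            return audio[:, :, None, None].abs() * self.gain


    class AudioClassifier(nn.Module):
        def __init__(self, front_end: nn.Module, n_features: int = 155, n_classes: int = 10):
            super().__init__()
            self.auditory = front_end
            self.classifier = nn.Linear(n_features, n_classes)

        def forward(self, audio: torch.Tensor) -> torch.Tensor:
            features = self.auditory(audio)            # (B, T, F, M)
            pooled = features.mean(dim=1)              # (B, F, M)
            return self.classifier(pooled.flatten(1))  # (B, n_classes)


    def train_step(model, optimizer, audio, labels):
        """Run one optimization step and return the scalar loss."""
        model.train()
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(audio), labels)
        loss.backward()   # gradients also reach the front-end parameters
        optimizer.step()
        return loss.item()


    torch.manual_seed(0)
    model = AudioClassifier(ToyFrontEnd())
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    audio = torch.randn(4, 2400)         # short random clips
    labels = torch.randint(0, 10, (4,))  # random class targets
    loss = train_step(model, optimizer, audio, labels)
    print(f"Loss after one step: {loss:.4f}")

Because the front-end parameters are registered in ``model.parameters()``, the same optimizer updates both the front-end and the classifier.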

Available Models
----------------

torch_amt includes 6 complete auditory models:

* **Dau1997** - Temporal processing model with adaptation loops
* **Glasberg2002** - Loudness model with specific loudness transformation
* **Moore2016** - Binaural loudness model with spatial processing
* **King2019** - FM/AM masking model with broken-stick compression
* **Osses2021** - Temporal integration model
* **Paulick2024** - Physiological CASP model with advanced IHC

It also provides 43+ building-block components, organized into:

* **Ear Models** - Outer and middle ear filtering
* **Auditory Filterbanks** - Gammatone, DRNL, excitation patterns
* **Inner Hair Cell Models** - Envelope extraction, physiological models
* **Modulation Analysis** - Temporal modulation filterbanks (standard & fast)
* **Loudness Processing** - Compression, specific loudness, binaural processing
* **Signal Processing** - Filters, transforms, utilities

Documentation Contents
----------------------

.. toctree::
   :maxdepth: 2
   :caption: User Guide

   installation
   quickstart
   tutorials

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   api/models
   api/filterbanks
   api/ihc
   api/adaptation
   api/modulation
   api/loudness
   api/ears
   api/filters

.. toctree::
   :maxdepth: 1
   :caption: Additional Information

   changelog
   license
   citing

Indices and Tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Citation
--------

If you use torch_amt in your research, please cite:

.. code-block:: bibtex

   @software{giacomelli2026torch_amt,
     author = {Giacomelli, Stefano},
     title = {torch\_amt: PyTorch Auditory Modeling Toolbox},
     year = {2026},
     url = {https://github.com/StefanoGiacomelli/torch_amt},
     version = {0.1.0}
   }

Also consider citing the original AMT paper:

.. code-block:: bibtex

   @article{majdak2022amt,
     author = {Majdak, Piotr and Hollomey, Clara and Baumgartner, Robert},
     title = {AMT 1.x: A toolbox for reproducible research in auditory modeling},
     journal = {Acta Acustica},
     volume = {6},
     pages = {19},
     year = {2022},
     doi = {10.1051/aacus/2022011},
     url = {https://amtoolbox.org/}
   }

Contact
-------

| **Stefano Giacomelli**
| ICT - Ph.D. Candidate
| Department of Engineering, Information Science & Mathematics (DISIM dpt.)
| University of L'Aquila, Italy

.. figure:: https://phdict.disim.univaq.it/wp-content/uploads/2024/06/logo-univaq-disim-2-2-768x283.png
   :alt: DISIM - University of L'Aquila
   :align: left
   :width: 400px
   :height: 150px

| 📧 Email: stefano.giacomelli@graduate.univaq.it
| 🔗 GitHub: https://github.com/StefanoGiacomelli
| 🆔 ORCID: https://orcid.org/0009-0009-0438-1748
| 🎓 Scholar: https://scholar.google.com/citations?user=l-n0hl4AAAAJ&hl=it
| 💼 LinkedIn: https://www.linkedin.com/in/stefano-giacomelli-811654135

*This project is funded by the Italian Ministry of University and Research under the Italian National Recovery and Resilience Plan (NRRP), project "Methods of Computational Auditory Scene Analysis and Synthesis supporting eXtended and Immersive Reality Services".*

License
-------

This project is licensed under the GNU General Public License v3.0 or later (GPLv3+).