Source code for torch_amt.models.glasberg2002

"""
Glasberg & Moore (2002) Loudness Model
======================================

Author:
    Stefano Giacomelli - Ph.D. candidate @ DISIM dpt. - University of L'Aquila

License:
    GNU General Public License v3.0 or later (GPLv3+)

This module implements the Glasberg & Moore (2002) model for perceptual loudness 
computation applicable to time-varying sounds. The model provides a complete pipeline 
from audio waveform to loudness perception in sone units, accounting for frequency-
dependent hearing sensitivity, masking effects, and temporal dynamics.

The implementation is ported from the MATLAB Auditory Modeling Toolbox (AMT) 
and extended with PyTorch for gradient-based optimization and GPU acceleration.

References
----------
.. [1] B. R. Glasberg and B. C. J. Moore, "A Model of Loudness Applicable to 
       Time-Varying Sounds," *J. Audio Eng. Soc.*, vol. 50, no. 5, pp. 331-342, 
       May 2002.

.. [2] B. C. J. Moore and B. R. Glasberg, "A Model for the Prediction of Thresholds, 
       Loudness, and Partial Loudness," *J. Audio Eng. Soc.*, vol. 45, no. 4, 
       pp. 224-240, Apr. 1997.

.. [3] B. R. Glasberg and B. C. J. Moore, "Derivation of auditory filter shapes 
       from notched-noise data," *Hear. Res.*, vol. 47, no. 1-2, pp. 103-138, 
       Aug. 1990.

.. [4] B. C. J. Moore and B. R. Glasberg, "Formulae describing frequency selectivity 
       as a function of frequency and level, and their use in calculating excitation 
       patterns," *Hear. Res.*, vol. 28, no. 2-3, pp. 209-225, 1987.

.. [5] ISO 226:2003, "Acoustics - Normal equal-loudness-level contours," 
       International Organization for Standardization, 2003.

.. [6] P. Majdak, C. Hollomey, and R. Baumgartner, "AMT 1.x: A toolbox for 
       reproducible research in auditory modeling," *Acta Acust.*, vol. 6, 
       p. 19, 2022.
"""

from typing import Dict, Any, Tuple

import torch
import torch.nn as nn

from torch_amt.common.filterbanks import MultiResolutionFFT, ERBIntegration, ExcitationPattern
from torch_amt.common.loudness import SpecificLoudness, LoudnessIntegration



[docs]
class Glasberg2002(nn.Module):
    r"""
    Glasberg & Moore (2002) model for time-varying loudness perception.
    
    Implements the complete loudness computation pipeline from Glasberg & Moore (2002), 
    providing perceptual loudness measures in sone from audio waveforms. The model 
    accounts for frequency-dependent hearing sensitivity (ISO 226), masking effects 
    via asymmetric excitation spreading, and temporal integration with attack/release 
    dynamics.
    
    This implementation is based on the MATLAB Auditory Modeling Toolbox (AMT) 
    ``glasberg2002`` function and provides a differentiable, GPU-accelerated version 
    suitable for neural network training and loudness-based optimization.
    
    Algorithm Overview
    ------------------
    The model implements a 5-stage loudness processing pipeline:
    
    **Stage 1: Multi-Resolution FFT**
    
    Performs time-frequency analysis with multiple FFT sizes to balance temporal 
    and frequency resolution across the audible spectrum:
    
    .. math::
        X(t, f) = \\text{FFT}_{N(f)}(x(t))
    
    where :math:`N(f)` is frequency-dependent FFT size (larger for low frequencies).
    Outputs power spectral density (PSD) in :math:`\\text{Pa}^2/\\text{Hz}`.
    
    **Stage 2: ERB Integration**
    
    Maps PSD to perceptual ERB frequency scale with 1/4 ERB resolution:
    
    .. math::
        E_{\\text{ERB}}(t, f_{\\text{ERB}}) = \\int P(t, f) \\cdot W_{\\text{ERB}}(f, f_{\\text{ERB}}) df
    
    where :math:`W_{\\text{ERB}}` is the ERB weighting function. Output in dB SPL.
    
    **Stage 3: Excitation Pattern**
    
    Models asymmetric frequency spreading with level-dependent slopes:
    
    .. math::
        E_{\\text{spread}}(t, f) = \\sum_g E_{\\text{ERB}}(t, f+g) \\cdot S(g, E)
    
    where :math:`S(g, E)` is the spreading function (steeper upward, shallower downward).
    
    **Stage 4: Specific Loudness**
    
    Applies 3-regime loudness transformation (Moore & Glasberg 1997):
    
    .. math::
        N(t, f) = \\begin{cases}
        0 & E < E_{\\text{thrq}} \\\\
        C \\cdot (E - E_{\\text{thrq}}) & E_{\\text{thrq}} < E < E_0 \\\\
        C \\cdot E_0^{1-\\alpha} (E - E_{\\text{thrq}})^{\\alpha} & E > E_0
        \\end{cases}
    
    with :math:`C=0.047`, :math:`\\alpha=0.2`, :math:`E_0=10` dB above threshold.
    
    **Stage 5: Loudness Integration**
    
    Spatial integration (sum across ERB channels) followed by temporal integration 
    with asymmetric attack/release filter:
    
    .. math::
        \\text{STL}(t) = \\sum_f N(t, f), \\quad 
        \\text{LTL}[n] = (1-\\alpha[n])\\text{STL}[n] + \\alpha[n]\\text{LTL}[n-1]
    
    where :math:`\\alpha = \\exp(-\\Delta t / \\tau)` with :math:`\\tau_{\\text{attack}}=50` ms, 
    :math:`\\tau_{\\text{release}}=200` ms.
    
    Parameters
    ----------
    fs : int, optional
        Sampling rate in Hz. Default: 32000 Hz.
        Higher sampling rates improve temporal resolution but increase computational cost.
        Typical values: 16000, 32000, 44100, 48000 Hz.
    
    learnable : bool, optional
        If True, all model stages become trainable with gradient-based optimization. 
        Default: False (fixed parameters).
        When True, enables end-to-end model training for task-specific optimization.
    
    return_stages : bool, optional
        If True, returns intermediate processing stages along with final output. 
        Default: False (only final long-term loudness).
        Useful for visualization, analysis, and multi-stage training.
    
    **multi_fft_kwargs : dict, optional
        Additional keyword arguments passed to :class:`MultiResolutionFFT`.
        Common options:
        
        - ``hop_length`` (int): Hop size for STFT in samples.
        - ``n_ffts`` (list): FFT sizes for multi-resolution analysis.
        - Other parameters accepted by MultiResolutionFFT.
    
    **erb_kwargs : dict, optional
        Additional keyword arguments passed to :class:`ERBIntegration`.
        Common options:
        
        - ``f_min`` (float): Minimum frequency in Hz. Default: 50.0.
        - ``f_max`` (float): Maximum frequency in Hz. Default: 15000.0.
        - ``erb_step`` (float): ERB frequency step. Default: 0.25.
        - ``bandwidth_scale`` (float): Bandwidth scaling factor. Default: 1.0.
        - Other parameters accepted by ERBIntegration.
    
    **excitation_kwargs : dict, optional
        Additional keyword arguments passed to :class:`ExcitationPattern`.
        Common options:
        
        - ``upper_slope_base`` (float): Base upper spreading slope. Default: 27.0 dB/ERB.
        - ``lower_slope_base`` (float): Base lower spreading slope. Default: 27.0 dB/ERB.
        - ``upper_slope_per_db`` (float): Upper slope level dependency. Default: 0.0.
        - ``lower_slope_per_db`` (float): Lower slope level dependency. Default: -0.4 dB/ERB per dB.
        - Other parameters accepted by ExcitationPattern.
    
    **specific_loudness_kwargs : dict, optional
        Additional keyword arguments passed to :class:`SpecificLoudness`.
        Common options:
        
        - ``f_min`` (float): Minimum ERB frequency. Default: 50.0 Hz.
        - ``f_max`` (float): Maximum ERB frequency. Default: 15000.0 Hz.
        - ``erb_step`` (float): ERB step. Default: 0.25.
        - Other parameters accepted by SpecificLoudness.
    
    **loudness_integration_kwargs : dict, optional
        Additional keyword arguments passed to :class:`LoudnessIntegration`.
        Common options:
        
        - ``tau_attack`` (float): Attack time constant in seconds. Default: 0.05 (50 ms).
        - ``tau_release`` (float): Release time constant in seconds. Default: 0.20 (200 ms).
        - Other parameters accepted by LoudnessIntegration.
    
    Attributes
    ----------
    fs : int
        Sampling rate in Hz.
    
    learnable : bool
        Whether model parameters are trainable.
    
    return_stages : bool
        Whether to return intermediate processing stages.
    
    multi_fft : MultiResolutionFFT
        Stage 1: Multi-resolution time-frequency analysis module.
    
    erb_integration : ERBIntegration
        Stage 2: ERB frequency scale integration module.
    
    excitation_pattern : ExcitationPattern
        Stage 3: Excitation pattern spreading module.
    
    specific_loudness : SpecificLoudness
        Stage 4: Specific loudness transformation module.
    
    loudness_integration : LoudnessIntegration
        Stage 5: Spatial and temporal loudness integration module.
    
    Input Shape
    -----------
    audio : torch.Tensor
        Audio signal with shape:
        
        - :math:`(B, T)` - Batch of audio samples
        - :math:`(T,)` - Single audio sample (mono)
        
        where:
        
        - :math:`B` = batch size
        - :math:`T` = time samples
    
    Output Shape
    ------------
    When ``return_stages=False`` (default):
        torch.Tensor
            Long-term loudness in sone, shape :math:`(B, F)` where:
            
            - :math:`F` = number of time frames (depends on hop_length)
            
    When ``return_stages=True``:
        Tuple[torch.Tensor, Dict[str, torch.Tensor]]
            - First element: long-term loudness (as above)
            - Second element: dict with keys:
              
              - ``'stl'``: Short-term loudness, shape :math:`(B, F)` in sone
              - ``'specific_loudness'``: Specific loudness, shape :math:`(B, F, N_{\\text{ERB}})` in sone/ERB
              - ``'excitation'``: Excitation pattern, shape :math:`(B, F, N_{\\text{ERB}})` in dB SPL
              - ``'erb_excitation'``: ERB-integrated excitation, shape :math:`(B, F, N_{\\text{ERB}})` in dB SPL
              - ``'psd'``: Power spectral density, shape :math:`(B, F, N_{\\text{freq}})`
              - ``'freqs'``: Frequency vector for PSD, shape :math:`(N_{\\text{freq}},)` in Hz
    
    Examples
    --------
    **Basic usage:**
    
    >>> import torch
    >>> from torch_amt.models import Glasberg2002
    >>> 
    >>> # Create model
    >>> model = Glasberg2002(fs=32000)
    >>> n_erb = model.erb_integration.n_erb_bands
    >>> print(f"ERB channels: {n_erb}")
    ERB channels: 150
    >>> 
    >>> # Process 1 second of audio
    >>> audio = torch.randn(2, 32000)  # 2 batches
    >>> ltl = model(audio)
    >>> print(f"LTL shape: {ltl.shape}, range: [{ltl.min():.2f}, {ltl.max():.2f}] sone")
    LTL shape: torch.Size([2, 62]), range: [0.23, 45.67] sone
    
    **With intermediate stages:**
    
    >>> model_debug = Glasberg2002(fs=32000, return_stages=True)
    >>> ltl, stages = model_debug(audio)
    >>> 
    >>> print(f"Available stages: {list(stages.keys())}")
    Available stages: ['stl', 'specific_loudness', 'excitation', 'erb_excitation', 'psd', 'freqs']
    >>> print(f"STL shape: {stages['stl'].shape}")
    STL shape: torch.Size([2, 62])
    >>> print(f"Specific loudness shape: {stages['specific_loudness'].shape}")
    Specific loudness shape: torch.Size([2, 62, 150])
    >>> print(f"Excitation shape: {stages['excitation'].shape}")
    Excitation shape: torch.Size([2, 62, 150])
    
    **Single channel input:**
    
    >>> audio_mono = torch.randn(32000)  # No batch dimension
    >>> ltl_mono = model(audio_mono)
    >>> print(f"Output shape (mono): {ltl_mono.shape}")
    Output shape (mono): torch.Size([62])
    
    **Learnable model for optimization:**
    
    >>> model_learnable = Glasberg2002(fs=32000, learnable=True)
    >>> n_params = sum(p.numel() for p in model_learnable.parameters())
    >>> print(f"Trainable parameters: {n_params}")
    Trainable parameters: 8743
    >>> 
    >>> # Example training loop
    >>> optimizer = torch.optim.Adam(model_learnable.parameters(), lr=1e-3)
    >>> # ... training code ...
    
    **Custom submodule parameters:**
    
    >>> # Custom ERB frequency range
    >>> model_custom_erb = Glasberg2002(
    ...     fs=44100,
    ...     erb_kwargs={'f_min': 80.0, 'f_max': 12000.0, 'erb_step': 0.5}
    ... )
    >>> print(f"ERB channels: {model_custom_erb.erb_integration.n_erb_bands}")
    ERB channels: 75
    >>> 
    >>> # Custom excitation spreading
    >>> model_custom_exc = Glasberg2002(
    ...     fs=32000,
    ...     excitation_kwargs={
    ...         'upper_slope_base': 30.0,  # Steeper upper slope
    ...         'lower_slope_per_db': -0.5  # More level-dependent lower slope
    ...     }
    ... )
    >>> 
    >>> # Custom temporal integration
    >>> model_custom_temp = Glasberg2002(
    ...     fs=32000,
    ...     loudness_integration_kwargs={
    ...         'tau_attack': 0.03,  # Faster attack (30 ms)
    ...         'tau_release': 0.30  # Slower release (300 ms)
    ...     }
    ... )
    
    **Different sampling rates:**
    
    >>> model_44k = Glasberg2002(fs=44100)
    >>> audio_44k = torch.randn(2, 44100)  # 1 second @ 44.1 kHz
    >>> ltl_44k = model_44k(audio_44k)
    >>> print(f"Output frames @ 44.1kHz: {ltl_44k.shape[1]}")
    Output frames @ 44.1kHz: 86
    
    **Reset temporal state for new signal:**
    
    >>> # Process first signal
    >>> signal1 = torch.randn(1, 32000)
    >>> ltl1 = model(signal1)
    >>> 
    >>> # Reset before processing unrelated second signal
    >>> model.reset_state()
    >>> signal2 = torch.randn(1, 32000)
    >>> ltl2 = model(signal2)
    
    **Convert to loudness level (phon):**
    
    >>> ltl_sone = model(audio)
    >>> ltl_phon = model.compute_loudness_level(ltl_sone)
    >>> print(f"Loudness: {ltl_sone.mean():.2f} sone = {ltl_phon.mean():.2f} phon")
    Loudness: 12.34 sone = 54.32 phon
    
    Notes
    -----
    **Model Configuration:**
    
    The Glasberg2002 model uses specific configurations for each processing stage:
    
    - **Multi-resolution FFT**: Multiple FFT sizes (frequency-dependent) for balanced resolution
    - **ERB integration**: 1/4 ERB steps from 50 Hz to 15 kHz (150 channels)
    - **Excitation pattern**: Asymmetric spreading (27 dB/ERB base, -0.4 dB/ERB per dB lower slope)
    - **Specific loudness**: 3-regime transformation (:math:`C=0.047`, :math:`\\alpha=0.2`, :math:`E_0=10` dB)
    - **Loudness integration**: Attack 50 ms, release 200 ms
    
    **Customizing Submodule Parameters:**
    
    All submodules can be customized through dedicated kwargs dictionaries:
    
    - Use ``multi_fft_kwargs`` to pass parameters to :class:`MultiResolutionFFT`
    - Use ``erb_kwargs`` to pass parameters to :class:`ERBIntegration`
    - Use ``excitation_kwargs`` to pass parameters to :class:`ExcitationPattern`
    - Use ``specific_loudness_kwargs`` to pass parameters to :class:`SpecificLoudness`
    - Use ``loudness_integration_kwargs`` to pass parameters to :class:`LoudnessIntegration`
    
    The ``learnable`` and ``dtype`` parameters are always centralized and applied 
    to all submodules automatically. Custom parameters override defaults while 
    maintaining the Glasberg2002 model structure.
    
    **Computational Complexity:**
    
    Processing time scales approximately as:
    
    .. math::
        T_{\\text{compute}} \\propto B \\cdot F \\cdot N_{\\text{ERB}} \\cdot \\log N_{\\text{FFT}}
    
    where :math:`F` = number of time frames (~60 per second), :math:`N_{\\text{ERB}}=150`.
    For 1 second at 32 kHz: ~0.05-0.2 seconds on CPU, ~0.005-0.02 seconds on GPU.
    
    **Memory Requirements:**
    
    Peak memory scales with intermediate representations:
    
    .. math::
        Memory \\approx B \\cdot F \\cdot N_{\\text{ERB}} \\cdot 4\\,\\text{bytes}
    
    For batch=8, 1 second @ 32 kHz: ~20-40 MB.
    
    **Differences from MATLAB AMT:**
    
    - This implementation uses PyTorch tensors for GPU acceleration
    - Supports batch processing natively
    - All stages are differentiable for gradient-based optimization
    - Output frames depend on hop_length (not fixed downsampling)
    
    **Loudness Units:**
    
    - **Sone**: Perceptual loudness unit. 1 sone = loudness of 1 kHz tone at 40 dB SPL
    - **Phon**: Loudness level unit. Equal to dB SPL at 1 kHz
    - Conversion: :math:`L_{\\text{phon}} = 40 + 10\\log_2(L_{\\text{sone}})`
    
    **Applications:**
    
    The model output can be used for:
    
    - Perceptual loudness measurement and normalization
    - Audio quality assessment (loudness-based metrics)
    - Dynamic range compression/expansion
    - Hearing aid fitting and evaluation
    - Psychoacoustic model validation
    - Feature extraction for machine learning
    
    See Also
    --------
    MultiResolutionFFT : Stage 1 - Time-frequency analysis
    ERBIntegration : Stage 2 - ERB frequency scale
    ExcitationPattern : Stage 3 - Excitation spreading
    SpecificLoudness : Stage 4 - Loudness transformation
    LoudnessIntegration : Stage 5 - Spatial and temporal integration
    
    References
    ----------
    .. [1] B. R. Glasberg and B. C. J. Moore, "A Model of Loudness Applicable to 
           Time-Varying Sounds," *J. Audio Eng. Soc.*, vol. 50, no. 5, pp. 331-342, 
           May 2002.
    
    .. [2] B. C. J. Moore and B. R. Glasberg, "A Model for the Prediction of Thresholds, 
           Loudness, and Partial Loudness," *J. Audio Eng. Soc.*, vol. 45, no. 4, 
           pp. 224-240, Apr. 1997.
    
    .. [3] B. R. Glasberg and B. C. J. Moore, "Derivation of auditory filter shapes 
           from notched-noise data," *Hear. Res.*, vol. 47, no. 1-2, pp. 103-138, 
           Aug. 1990.
    
    .. [4] B. C. J. Moore and B. R. Glasberg, "Formulae describing frequency selectivity 
           as a function of frequency and level, and their use in calculating excitation 
           patterns," *Hear. Res.*, vol. 28, no. 2-3, pp. 209-225, 1987.
    
    .. [5] ISO 226:2003, "Acoustics - Normal equal-loudness-level contours," 
           International Organization for Standardization, 2003.
    
    .. [6] P. Majdak, C. Hollomey, and R. Baumgartner, "AMT 1.x: A toolbox for 
           reproducible research in auditory modeling," *Acta Acust.*, vol. 6, 
           p. 19, 2022.
    """
    

[docs]
    def __init__(self, 
                 fs: int = 32000, 
                 learnable: bool = False,
                 return_stages: bool = False,
                 multi_fft_kwargs: Dict[str, Any] = None,
                 erb_kwargs: Dict[str, Any] = None,
                 excitation_kwargs: Dict[str, Any] = None,
                 specific_loudness_kwargs: Dict[str, Any] = None,
                 loudness_integration_kwargs: Dict[str, Any] = None):
        """
        Initialize Glasberg & Moore (2002) loudness model.
        
        Parameters
        ----------
        fs : int, optional
            Sampling rate in Hz. Default: 32000.
        
        learnable : bool, optional
            If True, all model parameters become trainable. Default: False.
        
        return_stages : bool, optional
            If True, return intermediate processing stages. Default: False.
        
        **kwargs : dict
            Additional keyword arguments for submodules (see class docstring).
        """
        super().__init__()
        
        self.fs = fs
        self.learnable = learnable
        self.return_stages = return_stages
        
        # Initialize kwargs dictionaries if None
        multi_fft_kwargs = multi_fft_kwargs or {}
        erb_kwargs = erb_kwargs or {}
        excitation_kwargs = excitation_kwargs or {}
        specific_loudness_kwargs = specific_loudness_kwargs or {}
        loudness_integration_kwargs = loudness_integration_kwargs or {}
        
        # Stage 1: Multi-resolution FFT
        self.multi_fft = MultiResolutionFFT(fs=fs, learnable=learnable, **multi_fft_kwargs)
        
        # Stage 2: ERB integration
        self.erb_integration = ERBIntegration(fs=fs, learnable=learnable, **erb_kwargs)
        
        # Stage 3: Excitation pattern
        self.excitation_pattern = ExcitationPattern(fs=fs, learnable=learnable, **excitation_kwargs)
        
        # Stage 4: Specific loudness
        self.specific_loudness = SpecificLoudness(fs=fs, learnable=learnable, **specific_loudness_kwargs)
        
        # Stage 5: Loudness integration
        self.loudness_integration = LoudnessIntegration(fs=fs, learnable=learnable, **loudness_integration_kwargs)

    

[docs]
    def forward(self, audio: torch.Tensor) -> torch.Tensor | Tuple[torch.Tensor, Dict[str, Any]]:
        """
        Process audio through the Glasberg2002 loudness model.
        
        Parameters
        ----------
        audio : torch.Tensor
            Input audio signal. Shape: (B, T) or (T,).
            
        Returns
        -------
        torch.Tensor or tuple
            If return_stages=False:
                Long-term loudness in sone, shape (B, F) or (F,).
            If return_stages=True:
                Tuple of (ltl, stages) where stages is a dict with:
                - 'stl': Short-term loudness (B, F) in sone
                - 'specific_loudness': Specific loudness (B, F, N_ERB) in sone/ERB
                - 'excitation': Excitation pattern (B, F, N_ERB) in dB SPL
                - 'erb_excitation': ERB-integrated excitation (B, F, N_ERB) in dB SPL
                - 'psd': Power spectral density (B, F, N_freq)
                - 'freqs': Frequency vector (N_freq,) in Hz
        """
        stages = {} if self.return_stages else None
        
        # Stage 1: Multi-resolution FFT
        psd, freqs = self.multi_fft(audio)
        if self.return_stages:
            stages['psd'] = psd
            stages['freqs'] = freqs
        
        # Stage 2: ERB integration (perceptual frequency scale)
        erb_excitation = self.erb_integration(psd, freqs)
        if self.return_stages:
            stages['erb_excitation'] = erb_excitation
        
        # Stage 3: Excitation pattern (asymmetric spreading, level-dependent)
        excitation = self.excitation_pattern(erb_excitation)
        if self.return_stages:
            stages['excitation'] = excitation
        
        # Stage 4: Specific loudness (3-regime compression)
        specific_loudness = self.specific_loudness(excitation)
        if self.return_stages:
            stages['specific_loudness'] = specific_loudness
        
        # Stage 5: Loudness integration (spatial + temporal)
        ltl, stl = self.loudness_integration(specific_loudness, return_stl=True)
        if self.return_stages:
            stages['stl'] = stl
        
        if self.return_stages:
            return ltl, stages
        else:
            return ltl

    

[docs]
    def reset_state(self):
        """
        Reset temporal integration state for processing discontinuous signals.
        
        Clears the internal state of the temporal integration filter (LoudnessIntegration).
        Call this method when processing multiple unrelated audio signals sequentially 
        to prevent temporal blending between signals.
        
        Examples
        --------
        >>> model = Glasberg2002(fs=32000)
        >>> signal1 = torch.randn(1, 32000)
        >>> ltl1 = model(signal1)
        >>> 
        >>> # Reset before processing new signal
        >>> model.reset_state()
        >>> signal2 = torch.randn(1, 32000)
        >>> ltl2 = model(signal2)  # No temporal carryover from signal1
        """
        self.loudness_integration.reset_state()

    

[docs]
    def get_erb_frequencies(self) -> torch.Tensor:
        """
        Get ERB channel center frequencies.
        
        Returns the center frequencies (in Hz) of the ERB-spaced frequency channels 
        used throughout the model pipeline. Useful for frequency-domain visualization 
        and analysis.
        
        Returns
        -------
        torch.Tensor
            Center frequencies of ERB channels, shape (n_erb_bands,) in Hz.
            Typically 150 channels from 50 Hz to 15 kHz with 1/4 ERB spacing.
        
        Examples
        --------
        >>> model = Glasberg2002(fs=32000)
        >>> fc = model.get_erb_frequencies()
        >>> print(f"ERB frequencies: {fc.shape}, range: [{fc.min():.1f}, {fc.max():.1f}] Hz")
        ERB frequencies: torch.Size([150]), range: [50.0, 14999.2] Hz
        """
        return self.erb_integration.fc_erb

    

[docs]
    def get_learnable_parameters(self) -> Dict[str, Any]:
        """
        Get all learnable parameters organized by model component.
        
        Returns a nested dictionary containing the current values of all trainable 
        parameters in each pipeline stage. Only returns parameters when model is 
        initialized with ``learnable=True``.
        
        Returns
        -------
        dict
            Dictionary with component names as keys and parameter dicts as values:
            
            - ``'multi_fft'``: MultiResolutionFFT parameters (hop_length, n_ffts)
            - ``'erb_integration'``: ERBIntegration parameters (fc_erb, bandwidth_scale)
            - ``'excitation_pattern'``: ExcitationPattern parameters (slopes)
            - ``'specific_loudness'``: SpecificLoudness parameters (C, alpha, E0, thresholds)
            - ``'loudness_integration'``: LoudnessIntegration parameters (tau_attack, tau_release)
            
            Empty dict if ``learnable=False``.
        
        Examples
        --------
        >>> model = Glasberg2002(fs=32000, learnable=True)
        >>> params = model.get_learnable_parameters()
        >>> print(f"Components: {list(params.keys())}")
        Components: ['multi_fft', 'erb_integration', 'excitation_pattern', 'specific_loudness', 'loudness_integration']
        >>> 
        >>> # Access specific parameters
        >>> print(f"Attack time: {params['loudness_integration']['tau_attack']:.3f} s")
        Attack time: 0.050 s
        >>> print(f"Bandwidth scale: {params['erb_integration']['bandwidth_scale']:.2f}")
        Bandwidth scale: 1.00
        """
        if not self.learnable:
            return {}
        
        params = {}
        
        # MultiResolutionFFT parameters
        params['multi_fft'] = {'hop_length': self.multi_fft.hop_length,
                               'n_ffts': self.multi_fft.n_ffts}
        
        # ERBIntegration parameters
        params['erb_integration'] = {'fc_erb': self.erb_integration.fc_erb,
                                     'bandwidth_scale': self.erb_integration.bandwidth_scale}
        
        # ExcitationPattern parameters
        params['excitation_pattern'] = {'upper_slope_base': self.excitation_pattern.upper_slope_base,
                                        'lower_slope_base': self.excitation_pattern.lower_slope_base,
                                        'upper_slope_per_db': self.excitation_pattern.upper_slope_per_db,
                                        'lower_slope_per_db': self.excitation_pattern.lower_slope_per_db}
        
        # SpecificLoudness parameters
        params['specific_loudness'] = self.specific_loudness.get_parameters()
        
        # LoudnessIntegration parameters
        tau_attack, tau_release = self.loudness_integration.get_time_constants()
        params['loudness_integration'] = {'tau_attack': tau_attack, 'tau_release': tau_release}
        
        return params

    

[docs]
    def compute_loudness_level(self, ltl: torch.Tensor) -> torch.Tensor:
        """
        Convert loudness from sone to loudness level in phon.
        
        Applies Stevens' power law to convert perceptual loudness (sone) to 
        loudness level (phon), which is equivalent to dB SPL at 1 kHz.
        
        Parameters
        ----------
        ltl : torch.Tensor
            Loudness in sone, shape (B, F) or (F,).
            Values should be non-negative.
        
        Returns
        -------
        torch.Tensor
            Loudness level in phon, same shape as input.
            Formula: :math:`L_{\\text{phon}} = 40 + 10\\log_2(L_{\\text{sone}})`
        
        Notes
        -----
        **Loudness Units:**
        
        - **Sone**: Perceptual loudness. 1 sone = loudness of 1 kHz tone at 40 dB SPL.
          Doubling sone value = doubling perceived loudness.
        
        - **Phon**: Loudness level. Equal to dB SPL of equally loud 1 kHz tone.
          40 phon = 40 dB SPL @ 1 kHz = 1 sone.
        
        **Conversion Examples:**
        
        - 1 sone = 40 phon (reference)
        - 2 sone = 50 phon (10 dB louder)
        - 4 sone = 60 phon (20 dB louder)
        - 0.5 sone = 30 phon (10 dB quieter)
        
        Examples
        --------
        >>> model = Glasberg2002(fs=32000)
        >>> audio = torch.randn(2, 32000)
        >>> ltl_sone = model(audio)
        >>> ltl_phon = model.compute_loudness_level(ltl_sone)
        >>> 
        >>> print(f"Loudness: {ltl_sone.mean():.2f} sone")
        Loudness: 12.34 sone
        >>> print(f"Loudness level: {ltl_phon.mean():.2f} phon")
        Loudness level: 54.32 phon
        
        References
        ----------
        .. [1] S. S. Stevens, "Perceived level of noise by Mark VII and decibels (E)," 
               *J. Acoust. Soc. Am.*, vol. 51, no. 2B, pp. 575-601, 1972.
        """
        # Avoid log of zero
        ltl_safe = torch.clamp(ltl, min=1e-10)
        loudness_level = 40.0 + 10.0 * torch.log2(ltl_safe)
        return loudness_level

    

[docs]
    def extra_repr(self) -> str:
        """
        Extra representation for printing.
        
        Returns
        -------
        str
            String representation of module parameters.
        """
        n_erb_bands = self.erb_integration.n_erb_bands
        return (f"fs={self.fs}, n_erb_bands={n_erb_bands}, "
                f"learnable={self.learnable}, return_stages={self.return_stages}")