Source code for vitalDSP.physiological_features.advanced_entropy

"""
Advanced Entropy Analysis Module
=================================

This module provides advanced entropy-based complexity measures for physiological
signal analysis, including:

1. Multi-Scale Entropy (MSE) - Costa et al. (2002)
2. Composite Multi-Scale Entropy (CMSE) - Wu et al. (2013)
3. Refined Composite Multi-Scale Entropy (RCMSE) - Wu et al. (2014)
4. Multi-Scale Sample Entropy (MSSE)
5. Multi-Scale Fuzzy Entropy (MFE)

These methods analyze signal complexity across multiple time scales, providing
insights into the multi-scale structure of physiological signals.

Clinical Applications:
---------------------
- Cardiac arrhythmia detection and classification
- Aging assessment and cardiovascular health
- Autonomic nervous system function evaluation
- Disease progression monitoring (heart failure, diabetes)
- Sleep stage classification
- Seizure prediction and epilepsy monitoring

Mathematical Background:
-----------------------
Multi-scale entropy extends traditional entropy measures by analyzing the signal
at multiple temporal scales through a coarse-graining procedure. This reveals
complexity at different time scales, which is crucial for understanding
physiological regulation mechanisms.

References:
----------
1. Costa, M., Goldberger, A. L., & Peng, C. K. (2002). Multiscale entropy analysis
   of complex physiologic time series. Physical review letters, 89(6), 068102.

2. Wu, S. D., Wu, C. W., Lin, S. G., Wang, C. C., & Lee, K. Y. (2013). Time series
   analysis using composite multiscale entropy. Entropy, 15(3), 1069-1084.

3. Wu, S. D., Wu, C. W., Lin, S. G., Lee, K. Y., & Peng, C. K. (2014). Analysis of
   complex time series using refined composite multiscale entropy. Physics Letters A,
   378(20), 1369-1374.

4. Ahmed, M. U., & Mandic, D. P. (2011). Multivariate multiscale entropy: A tool for
   complexity analysis of multichannel data. Physical Review E, 84(6), 061918.

Date: October 10, 2025
Version: 1.0
"""

"""
Physiological Features Module for Physiological Signal Processing

This module provides comprehensive capabilities for physiological
signal processing including ECG, PPG, EEG, and other vital signs.

Author: vitalDSP Team
Date: 2025-01-27
Version: 1.0.0

Key Features:
- Object-oriented design with comprehensive classes
- Multiple processing methods and functions
- NumPy integration for numerical computations
- SciPy integration for advanced signal processing
- Interactive visualization capabilities

Examples:
--------
Basic usage:
    >>> import numpy as np
    >>> from vitalDSP.physiological_features.advanced_entropy import MultiScaleEntropy
    >>> signal = np.random.randn(1000)
    >>> mse = MultiScaleEntropy(signal, max_scale=10)
    >>> result = mse.compute_mse()
    >>> print(f'MSE values: {result}')
"""


import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma
import warnings
from typing import Tuple, List, Optional, Union


class MultiScaleEntropy:
    """
    Multi-Scale Entropy (MSE) analysis for physiological signals.

    MSE quantifies the complexity of a signal across multiple temporal scales
    through coarse-graining followed by entropy calculation at each scale. The
    method reveals how signal complexity changes with scale, providing insights
    into the multi-scale regulatory mechanisms of physiological systems.

    Parameters
    ----------
    signal : numpy.ndarray
        Input time series signal (1D array)
    max_scale : int, optional
        Maximum scale factor for coarse-graining (default: 20)
        Recommended: 20 for HRV analysis, 10-15 for shorter signals
    m : int, optional
        Embedding dimension (pattern length) for entropy calculation (default: 2)
        Typically m=2 for physiological signals
    r : float, optional
        Tolerance for pattern matching (default: 0.15)
        Expressed as fraction of signal standard deviation
        Recommended: 0.15-0.25 for physiological signals
    fuzzy : bool, optional
        Use fuzzy membership functions instead of binary matching (default: False)
        Fuzzy entropy is more stable for short signals

    Attributes
    ----------
    signal : numpy.ndarray
        Original input signal
    max_scale : int
        Maximum scale for analysis
    m : int
        Embedding dimension
    r : float
        Tolerance (absolute value, i.e. the input fraction times the signal std)
    fuzzy : bool
        Whether to use fuzzy entropy

    Methods
    -------
    compute_mse()
        Compute Multi-Scale Entropy across all scales
    compute_cmse()
        Compute Composite Multi-Scale Entropy (improved stability)
    compute_rcmse()
        Compute Refined Composite Multi-Scale Entropy (best stability)
    get_complexity_index()
        Calculate complexity index (area under MSE curve)

    Examples
    --------
    >>> # Analyze heart rate variability
    >>> import numpy as np
    >>> from vitalDSP.physiological_features.advanced_entropy import MultiScaleEntropy
    >>>
    >>> # Generate synthetic HRV signal (RR intervals in seconds)
    >>> np.random.seed(42)
    >>> rr_intervals = 1.0 + 0.05 * np.random.randn(1000)  # 60 BPM baseline
    >>>
    >>> # Compute MSE
    >>> mse = MultiScaleEntropy(rr_intervals, max_scale=20, m=2, r=0.15)
    >>> entropy_values = mse.compute_mse()
    >>>
    >>> # Get complexity index
    >>> ci = mse.get_complexity_index(entropy_values)
    >>> print(f"Complexity Index: {ci:.4f}")
    >>>
    >>> # Compare young vs elderly (example)
    >>> # Young: Higher complexity at multiple scales
    >>> # Elderly: Reduced complexity, flatter MSE curve

    Notes
    -----
    **Interpretation Guidelines:**

    - **Healthy/Young:** MSE values remain high or increase at larger scales,
      indicating rich multi-scale complexity
    - **Disease/Aging:** MSE values decrease more rapidly with scale,
      indicating loss of complexity and adaptive capacity
    - **Scale-Specific Information:**
        - Scales 1-4: Short-term dynamics (seconds to minutes)
        - Scales 5-10: Mid-term dynamics (minutes to tens of minutes)
        - Scales 10-20: Long-term dynamics (tens of minutes to hours)

    **Signal Length Requirements:**

    - Recommended minimum: 100 * max_scale samples (2000 for max_scale=20);
      the constructor emits a UserWarning below this length
    - Shorter signals: Use smaller max_scale or CMSE/RCMSE variants

    **Parameter Selection:**

    - m=2: Standard for most physiological signals
    - m=3: For signals requiring more detailed patterns
    - r=0.15: Conservative choice (good specificity)
    - r=0.20-0.25: More lenient (better for noisy signals)
    """

    def __init__(
        self,
        signal: np.ndarray,
        max_scale: int = 20,
        m: int = 2,
        r: float = 0.15,
        fuzzy: bool = False,
    ):
        """
        Initialize Multi-Scale Entropy analyzer.

        Parameters
        ----------
        signal : numpy.ndarray
            Input time series (1D)
        max_scale : int
            Maximum coarse-graining scale
        m : int
            Embedding dimension
        r : float
            Tolerance (fraction of std)
        fuzzy : bool
            Use fuzzy entropy
        """
        # Input validation
        if not isinstance(signal, np.ndarray):
            signal = np.array(signal)

        if len(signal) < 10:
            raise ValueError(
                f"Signal too short ({len(signal)} samples). Minimum: 10 samples."
            )

        if max_scale < 1:
            raise ValueError(f"max_scale must be >= 1, got {max_scale}")

        if m < 1 or m > 10:
            raise ValueError(f"Embedding dimension m must be 1-10, got {m}")

        if r < 0 or r > 1:
            raise ValueError(f"Tolerance r must be 0-1 (fraction of std), got {r}")

        # Store parameters
        self.signal = signal
        self.max_scale = max_scale
        self.m = m
        self.r = r * np.std(signal)  # Convert to absolute tolerance
        self.fuzzy = fuzzy

        # Warn if signal might be too short
        min_recommended_length = 100 * max_scale
        if len(signal) < min_recommended_length:
            warnings.warn(
                f"Signal length ({len(signal)}) is less than recommended "
                f"({min_recommended_length} for scale {max_scale}). "
                f"Consider reducing max_scale or using CMSE/RCMSE for better stability.",
                UserWarning,
            )

    def _coarse_grain(self, scale: int, start_index: int = 0) -> np.ndarray:
        """
        Perform coarse-graining operation on the signal.

        Coarse-graining averages consecutive non-overlapping windows of length
        'scale' to create a new time series at the specified temporal scale.

        Parameters
        ----------
        scale : int
            Scale factor (window size for averaging)
        start_index : int, optional
            Starting index for coarse-graining (default: 0)
            Used in composite methods to create multiple coarse-grained series

        Returns
        -------
        coarse_signal : numpy.ndarray
            Coarse-grained time series

        Mathematical Definition:
        -----------------------
        For scale τ, the coarse-grained series y^(τ) is:

            y^(τ)_j = (1/τ) * Σ(i=(j-1)τ+1 to jτ) x_i

        where j = 1, 2, ..., N/τ

        Examples:
        --------
        >>> signal = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
        >>> mse = MultiScaleEntropy(signal, max_scale=2)
        >>> coarse = mse._coarse_grain(scale=2)
        >>> # Result: [1.5, 3.5, 5.5, 7.5, 9.5]
        """
        n = len(self.signal)

        # Calculate number of complete windows
        n_windows = (n - start_index) // scale

        if n_windows < 1:
            raise ValueError(
                f"Signal too short for scale {scale} with start_index {start_index}. "
                f"Need at least {scale + start_index} samples, got {n}."
            )

        # Extract relevant portion of signal
        end_index = start_index + n_windows * scale
        signal_portion = self.signal[start_index:end_index]

        # Reshape and average
        # Shape: (n_windows, scale) -> average over axis 1
        coarse_signal = signal_portion.reshape(n_windows, scale).mean(axis=1)

        return coarse_signal

    def _sample_entropy(self, coarse_signal: np.ndarray) -> float:
        """
        Compute Sample Entropy for a given signal.

        Sample Entropy (SampEn) is a modification of Approximate Entropy that is
        more consistent and less biased. It measures the negative natural logarithm
        of the conditional probability that two sequences similar for m points
        remain similar at m+1 points.

        Parameters
        ----------
        coarse_signal : numpy.ndarray
            Coarse-grained time series

        Returns
        -------
        sample_entropy : float
            Sample entropy value
            Returns 0.0 if the signal is too short to form templates, NaN if no
            matches of length m are found, and inf if no matches of length m+1
            are found

        Mathematical Definition:
        -----------------------
        SampEn(m, r, N) = -ln(A/B)

        where:
        - A = number of template matches of length m+1
        - B = number of template matches of length m
        - r = tolerance for matching
        - N = signal length

        Algorithm Steps:
        ---------------
        1. Form all possible patterns of length m and m+1
        2. For each pattern, count matches within tolerance r
        3. Compute ratio of matches: A/B
        4. Return -ln(A/B)

        Computational Complexity:
        ------------------------
        O(N²) for naive implementation
        Typically close to O(N log N) with spatial data structures (KD-tree)

        Implementation Notes:
        --------------------
        This implementation uses scipy's cKDTree for neighbor search, which
        typically reduces the cost relative to the O(N²) brute-force comparison.
        The actual cost depends on how many neighbors fall within the tolerance.

        References:
        ----------
        Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis
        using approximate entropy and sample entropy. American Journal of
        Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        """
        N = len(coarse_signal)

        # Minimum length check
        # Need at least m+2 points to form templates
        if N < self.m + 2:
            warnings.warn(
                f"Signal too short ({N} samples) for SampEn with m={self.m}. "
                f"Returning 0.",
                UserWarning,
            )
            return 0.0

        # Helper function to count template matches using KD-tree
        def _count_matches(m_current: int) -> int:
            """
            Count template matches of length m_current within tolerance r.
            Uses KD-tree for efficient nearest neighbor search.
            """
            # Create templates (delay vectors)
            templates = np.array(
                [coarse_signal[i : i + m_current] for i in range(N - m_current + 1)]
            )

            if len(templates) < 2:
                return 0

            # Build KD-tree for efficient search
            # Note: Chebyshev distance (L∞) is used for SampEn
            tree = cKDTree(templates)

            # Count matches within radius r, excluding self-matches
            total_matches = 0
            for template in templates:
                # Query neighbors within distance r
                # We use p=np.inf for Chebyshev distance (maximum norm)
                neighbors = tree.query_ball_point(template, r=self.r, p=np.inf)

                # Subtract 1 to exclude the self-match
                # (a point is always within r of itself)
                matches = len(neighbors) - 1
                total_matches += matches

            return total_matches

        # Count matches for length m and m+1
        B = _count_matches(self.m)  # Matches of length m
        A = _count_matches(self.m + 1)  # Matches of length m+1

        # Calculate SampEn
        if B == 0:
            warnings.warn(
                "No template matches found at embedding dimension m. "
                "Signal may be too short or tolerance too small. Returning NaN.",
                UserWarning,
            )
            return float('nan')

        if A == 0:
            return float('inf')

        # Sample Entropy = -ln(A/B)
        sampen = -np.log(A / B)

        # Ensure non-negative (numerical precision issues)
        sampen = max(0.0, sampen)

        return sampen

    def _fuzzy_entropy(self, coarse_signal: np.ndarray) -> float:
        """
        Compute Fuzzy Entropy for a given signal.

        Fuzzy Entropy (FuzzyEn) uses fuzzy membership functions instead of binary
        matching, making it more stable for short and noisy signals.

        Parameters
        ----------
        coarse_signal : numpy.ndarray
            Coarse-grained time series

        Returns
        -------
        fuzzy_entropy : float
            Fuzzy entropy value

        Mathematical Definition:
        -----------------------
        FuzzyEn uses an exponential membership function:

            μ(d) = exp(-(d/r)^n)

        where:
        - d = distance between patterns
        - r = tolerance
        - n = gradient parameter (typically n=2)

        Advantages over SampEn:
        ----------------------
        1. More stable for short signals
        2. Continuous similarity measure
        3. Better statistical properties
        4. Less sensitive to parameter choices

        References:
        ----------
        Chen, W., Wang, Z., Xie, H., & Yu, W. (2007). Characterization of surface
        EMG signal based on fuzzy entropy. IEEE Transactions on neural systems and
        rehabilitation engineering, 15(2), 266-272.
        """
        N = len(coarse_signal)

        if N < self.m + 2:
            warnings.warn("Signal too short for FuzzyEn. Returning 0.", UserWarning)
            return 0.0

        # Gradient parameter for fuzzy function
        n = 2

        def _phi(m_current: int) -> float:
            """Compute phi function for fuzzy entropy."""
            # Create templates
            templates = np.array(
                [coarse_signal[i : i + m_current] for i in range(N - m_current + 1)]
            )

            if len(templates) < 2:
                return 0.0

            # Accumulate the fuzzy similarity of every ordered pair (i != j)
            similarities = 0.0
            n_patterns = len(templates)

            for i in range(n_patterns):
                for j in range(n_patterns):
                    if i != j:
                        # Maximum absolute difference (Chebyshev distance)
                        d = np.max(np.abs(templates[i] - templates[j]))

                        # Fuzzy membership function
                        similarity = np.exp(-((d / self.r) ** n))
                        similarities += similarity

            # Average similarity
            phi = similarities / (n_patterns * (n_patterns - 1))

            return phi

        # Compute phi for m and m+1
        phi_m = _phi(self.m)
        phi_m_plus_1 = _phi(self.m + 1)

        # Fuzzy Entropy
        if phi_m_plus_1 == 0 or phi_m == 0:
            warnings.warn("FuzzyEn calculation failed. Returning 0.", UserWarning)
            return 0.0

        fuzzy_en = np.log(phi_m) - np.log(phi_m_plus_1)

        return max(0.0, fuzzy_en)

    def compute_mse(self) -> np.ndarray:
        """
        Compute Multi-Scale Entropy (MSE) across all scales.

        This is the standard MSE algorithm that computes entropy at each
        coarse-grained scale from 1 to max_scale.

        Returns
        -------
        mse_values : numpy.ndarray
            Array of entropy values for each scale (length: max_scale)
            Index i corresponds to scale i+1

        Algorithm:
        ---------
        For each scale τ = 1, 2, ..., max_scale:
            1. Coarse-grain signal at scale τ
            2. Compute Sample Entropy (or Fuzzy Entropy) of coarse-grained signal
            3. Store entropy value for scale τ

        Time Complexity:
        ---------------
        O(max_scale * N log N) where N is signal length

        Examples:
        --------
        >>> mse = MultiScaleEntropy(signal, max_scale=20)
        >>> entropy_values = mse.compute_mse()
        >>>
        >>> # Plot MSE curve
        >>> import matplotlib.pyplot as plt
        >>> scales = np.arange(1, 21)
        >>> plt.plot(scales, entropy_values, 'o-')
        >>> plt.xlabel('Scale Factor')
        >>> plt.ylabel('Sample Entropy')
        >>> plt.title('Multi-Scale Entropy')
        >>> plt.grid(True)
        >>> plt.show()

        Clinical Interpretation:
        -----------------------
        - **Healthy/Young:** MSE stays elevated or increases at larger scales
        - **Disease/Aging:** MSE decreases rapidly with scale
        - **Heart Failure:** Marked decrease in entropy at all scales
        - **Atrial Fibrillation:** Very high entropy at small scales, rapid decrease
        """
        mse_values = []

        # Select entropy calculation method
        entropy_func = self._fuzzy_entropy if self.fuzzy else self._sample_entropy

        for scale in range(1, self.max_scale + 1):
            try:
                # Coarse-grain signal at current scale
                coarse_signal = self._coarse_grain(scale)

                # Compute entropy
                entropy = entropy_func(coarse_signal)
                mse_values.append(entropy)

            except Exception as e:
                warnings.warn(
                    f"Failed to compute entropy at scale {scale}: {str(e)}. "
                    f"Using 0.",
                    UserWarning,
                )
                mse_values.append(0.0)

        return np.array(mse_values)

    def compute_cmse(self) -> np.ndarray:
        """
        Compute Composite Multi-Scale Entropy (CMSE).

        CMSE improves upon standard MSE by averaging entropy values across
        multiple coarse-grained series with different starting points. This
        reduces variance and provides more stable estimates, especially for
        shorter signals.

        Returns
        -------
        cmse_values : numpy.ndarray
            Array of composite entropy values for each scale

        Algorithm:
        ---------
        For each scale τ = 1, 2, ..., max_scale:
            1. Create τ different coarse-grained series starting at indices
               0, 1, ..., τ-1
            2. Compute entropy for each coarse-grained series
            3. Average the τ entropy values

        Advantages over Standard MSE:
        -----------------------------
        1. **Reduced Variance:** Averaging reduces statistical fluctuations
        2. **Better Stability:** More reliable for short signals
        3. **Improved Discrimination:** Better separates different signal classes
        4. **Consistent Results:** Less sensitive to signal length

        Time Complexity:
        ---------------
        O(max_scale² * N log N)
        Note: ~τ times slower than MSE due to multiple coarse-grainings

        Examples:
        --------
        >>> mse = MultiScaleEntropy(signal, max_scale=15)
        >>> cmse_values = mse.compute_cmse()
        >>>
        >>> # Compare with standard MSE
        >>> mse_values = mse.compute_mse()
        >>>
        >>> import matplotlib.pyplot as plt
        >>> scales = np.arange(1, 16)
        >>> plt.plot(scales, mse_values, 'o-', label='MSE')
        >>> plt.plot(scales, cmse_values, 's-', label='CMSE')
        >>> plt.xlabel('Scale')
        >>> plt.ylabel('Entropy')
        >>> plt.legend()
        >>> plt.grid(True)

        References:
        ----------
        Wu, S. D., Wu, C. W., Lin, S. G., Wang, C. C., & Lee, K. Y. (2013).
        Time series analysis using composite multiscale entropy. Entropy, 15(3),
        1069-1084.

        Notes:
        -----
        CMSE is particularly recommended when:
        - Signal length < 1000 samples
        - max_scale > 10
        - Comparing signals of different lengths
        - High precision is required
        """
        cmse_values = []

        entropy_func = self._fuzzy_entropy if self.fuzzy else self._sample_entropy

        for scale in range(1, self.max_scale + 1):
            scale_entropies = []

            # Create multiple coarse-grained series with different starting points
            for start_idx in range(scale):
                try:
                    coarse_signal = self._coarse_grain(scale, start_index=start_idx)

                    # Skip if coarse-grained signal is too short
                    if len(coarse_signal) < self.m + 2:
                        continue

                    entropy = entropy_func(coarse_signal)
                    scale_entropies.append(entropy)

                except Exception:
                    # Skip this starting point if it fails
                    continue

            # Average entropy across all starting points
            if scale_entropies:
                cmse_value = np.mean(scale_entropies)
            else:
                warnings.warn(
                    f"No valid entropy values at scale {scale}. Using 0.", UserWarning
                )
                cmse_value = 0.0

            cmse_values.append(cmse_value)

        return np.array(cmse_values)

    def compute_rcmse(self) -> np.ndarray:
        """
        Compute Refined Composite Multi-Scale Entropy (RCMSE).

        RCMSE refines CMSE by combining the coarse-grained series from all
        starting offsets before the entropy is evaluated, rather than averaging
        one entropy value per offset.

        Returns
        -------
        rcmse_values : numpy.ndarray
            Array of refined composite entropy values

        Refined Coarse-Graining:
        -----------------------
        For each scale τ and each offset k = 0, 1, ..., τ-1, the signal is
        coarse-grained with non-overlapping windows:

            y^(τ)_{k,j} = (1/τ) * Σ(i=k+(j-1)τ+1 to k+jτ) x_i

        This implementation concatenates the τ offset series into one pooled
        series and computes a single entropy value from it. Note that Wu et al.
        (2014) instead sum the template-match counts A and B over the τ offset
        series and take -ln(ΣA / ΣB). The pooled series used here also forms
        templates that span the boundary between two offset series, so values
        may differ from the published definition.

        Advantages over CMSE:
        --------------------
        1. **Better Use of Data:** Every offset contributes to one entropy estimate
        2. **Smoother Curves:** Less jagged MSE curves
        3. **Improved Sensitivity:** Better detects subtle changes
        4. **Best Stability:** Superior performance on short signals

        When to Use RCMSE:
        -----------------
        - Short signals (< 500 samples)
        - Need maximum stability
        - Require smooth, interpretable curves
        - Comparing very different conditions

        References:
        ----------
        Wu, S. D., Wu, C. W., Lin, S. G., Lee, K. Y., & Peng, C. K. (2014).
        Analysis of complex time series using refined composite multiscale entropy.
        Physics Letters A, 378(20), 1369-1374.
        """
        rcmse_values = []

        entropy_func = self._fuzzy_entropy if self.fuzzy else self._sample_entropy

        for scale in range(1, self.max_scale + 1):
            # Guard: the signal must be long enough relative to the scale
            n = len(self.signal)
            n_windows = n - scale + 1

            if n_windows < self.m + 2:
                warnings.warn(
                    f"Signal too short for RCMSE at scale {scale}. Using 0.",
                    UserWarning,
                )
                rcmse_values.append(0.0)
                continue

            # Create the non-overlapping coarse-grained series for each of the
            # `scale` starting offsets, then pool all values into one series
            all_coarse = []
            for k in range(scale):
                for j in range((n - k) // scale):
                    start = k + j * scale
                    end = start + scale
                    all_coarse.append(np.mean(self.signal[start:end]))

            coarse_signal = np.array(all_coarse)

            if len(coarse_signal) < self.m + 2:
                rcmse_values.append(0.0)
                continue

            try:
                entropy = entropy_func(coarse_signal)
                rcmse_values.append(entropy)
            except Exception as e:
                warnings.warn(
                    f"Failed to compute RCMSE at scale {scale}: {str(e)}. Using 0.",
                    UserWarning,
                )
                rcmse_values.append(0.0)

        return np.array(rcmse_values)

    def get_complexity_index(
        self, entropy_values: np.ndarray, scale_range: Optional[Tuple[int, int]] = None
    ) -> float:
        """
        Calculate Complexity Index (CI) as area under the MSE curve.

        The complexity index summarizes the overall complexity across scales
        into a single scalar value. Higher CI indicates more complex, healthy
        physiological regulation.

        Parameters
        ----------
        entropy_values : numpy.ndarray
            MSE/CMSE/RCMSE values
        scale_range : tuple of int, optional
            (start_scale, end_scale) for integration, both inclusive
            (default: all scales)
            Useful for focusing on specific temporal scales

        Returns
        -------
        complexity_index : float
            Area under the entropy curve (using trapezoidal integration)

        Formula:
        -------
        CI = Σ(i=1 to max_scale-1) [(Entropy_i + Entropy_(i+1)) / 2]

        Clinical Interpretation:
        -----------------------
        - **High CI:** Complex, adaptive physiological regulation (healthy)
        - **Low CI:** Simple, less adaptive regulation (disease, aging)
        - **Very Low CI:** Pathological simplification (severe disease)

        Examples:
        --------
        >>> mse = MultiScaleEntropy(signal)
        >>> entropy = mse.compute_mse()
        >>>
        >>> # Overall complexity
        >>> ci_total = mse.get_complexity_index(entropy)
        >>>
        >>> # Short-term complexity (scales 1-5)
        >>> ci_short = mse.get_complexity_index(entropy, scale_range=(1, 5))
        >>>
        >>> # Long-term complexity (scales 10-20)
        >>> ci_long = mse.get_complexity_index(entropy, scale_range=(10, 20))

        Notes:
        -----
        Different scale ranges provide insights into different regulatory mechanisms:
        - Scales 1-5: Intrinsic cardiac dynamics
        - Scales 5-10: Sympathovagal balance
        - Scales 10-20: Long-term regulatory mechanisms
        """
        if scale_range is None:
            scale_range = (1, len(entropy_values))

        start_idx = scale_range[0] - 1  # Convert to 0-indexed
        end_idx = scale_range[1]

        # Extract relevant portion
        entropy_subset = entropy_values[start_idx:end_idx]

        if len(entropy_subset) < 2:
            warnings.warn(
                "Not enough entropy values for complexity index. Returning 0.",
                UserWarning,
            )
            return 0.0

        # Trapezoidal integration with unit spacing between scales.
        # np.trapz was renamed np.trapezoid in NumPy 2.0, so support both.
        trapezoid = getattr(np, "trapezoid", None) or np.trapz
        complexity_index = trapezoid(entropy_subset)

        return complexity_index


# Export main class
__all__ = ["MultiScaleEntropy"]
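
The coarse-graining and SampEn definitions given in the docstrings above can be checked against a small brute-force version. The following is a minimal sketch, not the module's implementation: the names `coarse_grain`, `sample_entropy`, and `mse_curve` are hypothetical, the comparison is a plain O(N²) distance matrix rather than a KD-tree, and both template lengths use the same N - m start positions (the Richman and Moorman convention), so values may differ slightly from `MultiScaleEntropy.compute_mse()`.

```python
import numpy as np


def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // scale
    # Drop the incomplete trailing window, then average each row.
    return x[: n_windows * scale].reshape(n_windows, scale).mean(axis=1)


def sample_entropy(x, m=2, r=0.15):
    """Brute-force SampEn; `r` is an absolute tolerance (assumption of this sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def count_pairs(length):
        # Same N - m start positions for templates of length m and m + 1.
        templates = np.array([x[i : i + length] for i in range(n - m)])
        # Pairwise Chebyshev (maximum-norm) distances, shape (k, k).
        diff = templates[:, None, :] - templates[None, :, :]
        dist = np.max(np.abs(diff), axis=2)
        # Ordered pairs within r, minus the k self-matches on the diagonal.
        return int(np.sum(dist <= r)) - len(templates)

    b = count_pairs(m)      # matches of length m
    a = count_pairs(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("nan")  # undefined when either count is zero
    return float(-np.log(a / b))


def mse_curve(x, max_scale=5, m=2, r_fraction=0.15):
    """SampEn of the coarse-grained series at scales 1..max_scale."""
    # The tolerance is fixed once from the original signal's standard deviation.
    r = r_fraction * np.std(x)
    return np.array(
        [sample_entropy(coarse_grain(x, s), m, r) for s in range(1, max_scale + 1)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(500)
    curve = mse_curve(noise, max_scale=5)
    # Complexity index: trapezoidal area under the curve with unit spacing.
    ci = float(np.sum((curve[:-1] + curve[1:]) / 2))
    print("MSE curve:", np.round(curve, 3))
    print("Complexity index:", round(ci, 3))
```

The brute-force distance matrix needs O(N²) memory, so this sketch is only suitable for short series. It is meant as a readable reference for spot-checking results, not as a replacement for the KD-tree search used by the class.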