Advanced Features Guide

This guide provides comprehensive documentation for vitalDSP’s advanced physiological signal analysis features, including Multi-Scale Entropy, Symbolic Dynamics, and Transfer Entropy analysis.

Overview

The advanced features module implements state-of-the-art nonlinear dynamics and information-theoretic methods for analyzing complex physiological signals. These modules are:

  • Clinically validated: Methods validated on MIT-BIH, MIMIC-III, and PhysioNet databases

  • Computationally efficient: O(N log N) algorithms using KD-trees for large datasets

  • Production-ready: Robust error handling, input validation, and edge case management

Modules Covered

  1. Multi-Scale Entropy Analysis (advanced_entropy.py)

    • Multi-scale complexity quantification

    • Standard MSE, Composite MSE (CMSE), Refined Composite MSE (RCMSE)

    • Clinical applications in cardiac health and autonomic assessment

  2. Symbolic Dynamics Analysis (symbolic_dynamics.py)

    • Continuous-to-discrete signal transformation

    • Pattern analysis and complexity measures

    • HRV pattern classification and arrhythmia detection

  3. Transfer Entropy Analysis (transfer_entropy.py)

    • Directional information flow quantification

    • Cardio-respiratory coupling analysis

    • Multi-organ system dynamics assessment

Quick Start

Installation

The advanced features are included in the core vitalDSP package:

pip install vitalDSP

Import the modules:

from vitalDSP.physiological_features.advanced_entropy import MultiScaleEntropy
from vitalDSP.physiological_features.symbolic_dynamics import SymbolicDynamics
from vitalDSP.physiological_features.transfer_entropy import TransferEntropy

Basic Usage Examples

Multi-Scale Entropy:

import numpy as np
from vitalDSP.physiological_features.advanced_entropy import MultiScaleEntropy

# Load RR intervals
rr_intervals = np.loadtxt('patient_rr.txt')

# Analyze complexity
mse = MultiScaleEntropy(rr_intervals, max_scale=20, m=2, r=0.15)
entropy_curve = mse.compute_rcmse()
complexity_index = mse.get_complexity_index(entropy_curve)

print(f"Complexity Index: {complexity_index:.2f}")

Symbolic Dynamics:

from vitalDSP.physiological_features.symbolic_dynamics import SymbolicDynamics

# HRV pattern analysis
sd = SymbolicDynamics(rr_intervals, n_symbols=4, method='0V')
shannon = sd.compute_shannon_entropy()    # float, in bits
forbidden = sd.detect_forbidden_words()   # list of words that never occur

print(f"Shannon Entropy: {shannon:.3f}")
print(f"Forbidden Words: {len(forbidden)}")

Transfer Entropy:

from vitalDSP.physiological_features.transfer_entropy import TransferEntropy

# Cardio-respiratory coupling
# respiration and heart_rate are equal-length 1D NumPy arrays
te = TransferEntropy(respiration, heart_rate, k=2, l=2, delay=1)
te_forward, te_backward = te.compute_bidirectional_te()

print(f"Respiration → HR: {te_forward:.3f}")
print(f"HR → Respiration: {te_backward:.3f}")

Multi-Scale Entropy Analysis

Theory and Mathematical Background

Multi-Scale Entropy (MSE) quantifies signal complexity across multiple temporal scales through:

  1. Coarse-graining: Signal averaging at different scales

  2. Sample Entropy: Quantifying regularity at each scale

  3. Complexity Index: Area under the MSE curve

Mathematical Formula:

For scale τ, the coarse-grained series y^(τ) is:

\[y^{(\tau)}_j = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i\]

Sample Entropy is then calculated:

\[SampEn(m, r, N) = -\ln\left(\frac{A}{B}\right)\]

where A = matches of length m+1, B = matches of length m.
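The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not vitalDSP's implementation: the function names `coarse_grain` and `sample_entropy` are hypothetical, the match counting is the naive O(N²) version rather than a KD-tree search, and the tolerance is assumed to be `r` times the standard deviation of the series passed in.

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // scale
    return x[:n_windows * scale].reshape(n_windows, scale).mean(axis=1)

def sample_entropy(x, m=2, r=0.15):
    """Naive SampEn = -ln(A / B); `r` is a fraction of the series' std (assumption)."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    n_templates = len(x) - m  # same number of templates for lengths m and m + 1

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(n_templates)])
        # Chebyshev distance between every pair of templates
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        # Count pairs i < j within tolerance, which excludes self-matches
        return int(np.sum(np.triu(dist <= tol, k=1)))

    b = count_matches(m)      # B: matches of length m
    a = count_matches(m + 1)  # A: matches of length m + 1
    if a == 0 or b == 0:
        return float("nan")   # undefined when no matches are found
    return float(-np.log(a / b))
```

Applying `sample_entropy(coarse_grain(x, tau))` for each scale τ gives one point of the MSE curve. The pairwise distance matrix makes this sketch suitable only for short series.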

Class API

class vitalDSP.physiological_features.advanced_entropy.MultiScaleEntropy(signal: ndarray, max_scale: int = 20, m: int = 2, r: float = 0.15, fuzzy: bool = False)[source]

Bases: object

Multi-Scale Entropy (MSE) analysis for physiological signals.

MSE quantifies the complexity of a signal across multiple temporal scales through coarse-graining followed by entropy calculation at each scale.

The method reveals how signal complexity changes with scale, providing insights into the multi-scale regulatory mechanisms of physiological systems.

Parameters:
  • signal (numpy.ndarray) – Input time series signal (1D array)

  • max_scale (int, optional) – Maximum scale factor for coarse-graining (default: 20). Recommended: 20 for HRV analysis, 10-15 for shorter signals.

  • m (int, optional) – Embedding dimension (pattern length) for entropy calculation (default: 2). Typically m=2 for physiological signals.

  • r (float, optional) – Tolerance for pattern matching (default: 0.15), expressed as a fraction of the signal standard deviation. Recommended: 0.15-0.25 for physiological signals.

  • fuzzy (bool, optional) – Use fuzzy membership functions instead of binary matching (default: False). Fuzzy entropy is more stable for short signals.

signal

Original input signal

Type:

numpy.ndarray

max_scale

Maximum scale for analysis

Type:

int

m

Embedding dimension

Type:

int

r

Tolerance (absolute value)

Type:

float

fuzzy

Whether to use fuzzy entropy

Type:

bool

compute_mse()[source]

Compute Multi-Scale Entropy across all scales

compute_cmse()[source]

Compute Composite Multi-Scale Entropy (improved stability)

compute_rcmse()[source]

Compute Refined Composite Multi-Scale Entropy (best stability)

get_complexity_index()[source]

Calculate complexity index (area under MSE curve)

Examples

>>> # Analyze heart rate variability
>>> import numpy as np
>>> from vitalDSP.physiological_features.advanced_entropy import MultiScaleEntropy
>>>
>>> # Generate synthetic HRV signal (RR intervals in seconds)
>>> np.random.seed(42)
>>> rr_intervals = 1.0 + 0.05 * np.random.randn(1000)  # 60 BPM baseline
>>>
>>> # Compute MSE
>>> mse = MultiScaleEntropy(rr_intervals, max_scale=20, m=2, r=0.15)
>>> entropy_values = mse.compute_mse()
>>>
>>> # Get complexity index
>>> ci = mse.get_complexity_index(entropy_values)
>>> print(f"Complexity Index: {ci:.4f}")
>>>
>>> # Compare young vs elderly (example)
>>> # Young: Higher complexity at multiple scales
>>> # Elderly: Reduced complexity, flatter MSE curve

Notes

Interpretation Guidelines:

  • Healthy or Young: MSE values remain high or increase at larger scales, indicating rich multi-scale complexity

  • Disease or Aging: MSE values decrease more rapidly with scale, indicating loss of complexity and adaptive capacity

  • Scale-Specific Information:
    • Scales 1-4: Short-term dynamics (seconds to minutes)

    • Scales 5-10: Mid-term dynamics (minutes to tens of minutes)

    • Scales 10-20: Long-term dynamics (tens of minutes to hours)

Signal Length Requirements:

  • Minimum: 100 * scale samples for reliable estimation

  • Recommended: 500-1000+ samples for max_scale=20

  • Shorter signals: Use smaller max_scale or CMSE/RCMSE variants

Parameter Selection:

  • m=2: Standard for most physiological signals

  • m=3: For signals requiring more detailed patterns

  • r=0.15: Conservative choice (good specificity)

  • r=0.20-0.25: More lenient (better for noisy signals)

compute_cmse() ndarray[source]

Compute Composite Multi-Scale Entropy (CMSE).

CMSE improves upon standard MSE by averaging entropy values across multiple coarse-grained series with different starting points. This reduces variance and provides more stable estimates, especially for shorter signals.

Returns:

cmse_values (numpy.ndarray) – Array of composite entropy values for each scale

Algorithm

For each scale τ = 1, 2, …, max_scale:

  1. Create τ different coarse-grained series starting at indices 0, 1, …, τ-1

  2. Compute entropy for each coarse-grained series

  3. Average the τ entropy values

Advantages over Standard MSE

  1. Reduced Variance: Averaging reduces statistical fluctuations

  2. Better Stability: More reliable for short signals

  3. Improved Discrimination: Better separates different signal classes

  4. Consistent Results: Less sensitive to signal length

Time Complexity

O(max_scale² * N log N)

Note: ~τ times slower than MSE at scale τ due to multiple coarse-grainings

Examples

>>> import numpy as np
>>> mse = MultiScaleEntropy(signal, max_scale=15)
>>> cmse_values = mse.compute_cmse()
>>>
>>> # Compare with standard MSE
>>> mse_values = mse.compute_mse()
>>>
>>> import matplotlib.pyplot as plt
>>> scales = np.arange(1, 16)
>>> plt.plot(scales, mse_values, 'o-', label='MSE')
>>> plt.plot(scales, cmse_values, 's-', label='CMSE')
>>> plt.xlabel('Scale')
>>> plt.ylabel('Entropy')
>>> plt.legend()
>>> plt.grid(True)

References

Wu, S. D., Wu, C. W., Lin, S. G., Wang, C. C., & Lee, K. Y. (2013). Time series analysis using composite multiscale entropy. Entropy, 15(3), 1069-1084.

Notes

CMSE is particularly recommended when:

  • Signal length < 1000 samples

  • max_scale > 10

  • Comparing signals of different lengths

  • High precision is required
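The composite step (one coarse-grained series per starting offset, then an average) can be sketched as follows. The helper names `coarse_grain_offsets` and `composite_value` are hypothetical, and `entropy_fn` stands in for whichever entropy estimator is used; this illustrates the algorithm described above rather than the library's code.

```python
import numpy as np

def coarse_grain_offsets(x, scale):
    """Return the `scale` coarse-grained series starting at offsets 0..scale-1."""
    x = np.asarray(x, dtype=float)
    series = []
    for offset in range(scale):
        n_windows = (len(x) - offset) // scale
        segment = x[offset:offset + n_windows * scale]
        series.append(segment.reshape(n_windows, scale).mean(axis=1))
    return series

def composite_value(x, scale, entropy_fn):
    """Average `entropy_fn` over all offset coarse-grainings at one scale."""
    values = [entropy_fn(s) for s in coarse_grain_offsets(x, scale)]
    return float(np.mean(values))
```

Because each scale τ evaluates the entropy τ times, the cost grows with the scale, which is the source of the extra factor in the time complexity.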

compute_mse() ndarray[source]

Compute Multi-Scale Entropy (MSE) across all scales.

This is the standard MSE algorithm that computes entropy at each coarse-grained scale from 1 to max_scale.

Returns:

mse_values (numpy.ndarray) – Array of entropy values for each scale (length: max_scale). Index i corresponds to scale i+1.

Algorithm

For each scale τ = 1, 2, …, max_scale:

  1. Coarse-grain signal at scale τ

  2. Compute Sample Entropy (or Fuzzy Entropy) of coarse-grained signal

  3. Store entropy value for scale τ

Time Complexity

O(max_scale * N log N) where N is signal length

Examples

>>> import numpy as np
>>> mse = MultiScaleEntropy(signal, max_scale=20)
>>> entropy_values = mse.compute_mse()
>>>
>>> # Plot MSE curve
>>> import matplotlib.pyplot as plt
>>> scales = np.arange(1, 21)
>>> plt.plot(scales, entropy_values, 'o-')
>>> plt.xlabel('Scale Factor')
>>> plt.ylabel('Sample Entropy')
>>> plt.title('Multi-Scale Entropy')
>>> plt.grid(True)
>>> plt.show()

Clinical Interpretation

  • Healthy or Young: MSE stays elevated or increases at larger scales

  • Disease or Aging: MSE decreases rapidly with scale

  • Heart Failure: Marked decrease in entropy at all scales

  • Atrial Fibrillation: Very high entropy at small scales, rapid decrease

compute_rcmse() ndarray[source]

Compute Refined Composite Multi-Scale Entropy (RCMSE).

RCMSE further refines CMSE by using a modified coarse-graining procedure that preserves more information from the original signal.

Returns:

rcmse_values (numpy.ndarray) – Array of refined composite entropy values

Refined Coarse-Graining

Instead of non-overlapping windows, RCMSE uses overlapping windows:

y^(τ)_j = (1/τ) * Σ(i=j to j+τ-1) x_i

This preserves more temporal structure and reduces information loss.

Advantages over CMSE

  1. Better Information Preservation: Overlapping windows retain more details

  2. Smoother Curves: Less jagged MSE curves

  3. Improved Sensitivity: Better detects subtle changes

  4. Best Stability: Superior performance on short signals

When to Use RCMSE

  • Short signals (< 500 samples)

  • Need maximum stability

  • Require smooth, interpretable curves

  • Comparing very different conditions

References

Wu, S. D., Wu, C. W., Lin, S. G., Lee, K. Y., & Peng, C. K. (2014). Analysis of complex time series using refined composite multiscale entropy. Physics Letters A, 378(20), 1369-1374.
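The overlapping-window formula given for the refined coarse-graining is a moving average, which can be sketched with `numpy.convolve`. The function name `moving_average_coarse_grain` is hypothetical, and the sketch only illustrates the stated formula, not how the library computes it internally.

```python
import numpy as np

def moving_average_coarse_grain(x, scale):
    """Overlapping-window mean: y_j = mean(x[j : j + scale])."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(scale) / scale
    # mode="valid" keeps only windows that lie fully inside the signal
    return np.convolve(x, kernel, mode="valid")
```

The result has N - τ + 1 points instead of roughly N / τ, which is why overlapping windows leave more samples available for entropy estimation on short signals.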

get_complexity_index(entropy_values: ndarray, scale_range: Tuple[int, int] | None = None) float[source]

Calculate Complexity Index (CI) as area under the MSE curve.

The complexity index summarizes the overall complexity across scales into a single scalar value. Higher CI indicates more complex, healthy physiological regulation.

Parameters:
  • entropy_values (numpy.ndarray) – MSE/CMSE/RCMSE values

  • scale_range (tuple of int, optional) – (start_scale, end_scale) for integration (default: all scales). Useful for focusing on specific temporal scales.

Returns:

complexity_index (float) – Area under the entropy curve (using trapezoidal integration)

Formula

CI = Σ(i=1 to max_scale-1) [(Entropy_i + Entropy_(i+1)) / 2]

Clinical Interpretation

  • High CI: Complex, adaptive physiological regulation (healthy)

  • Low CI: Simple, less adaptive regulation (disease, aging)

  • Very Low CI: Pathological simplification (severe disease)

Examples

>>> mse = MultiScaleEntropy(signal)
>>> entropy = mse.compute_mse()
>>>
>>> # Overall complexity
>>> ci_total = mse.get_complexity_index(entropy)
>>>
>>> # Short-term complexity (scales 1-5)
>>> ci_short = mse.get_complexity_index(entropy, scale_range=(1, 5))
>>>
>>> # Long-term complexity (scales 10-20)
>>> ci_long = mse.get_complexity_index(entropy, scale_range=(10, 20))

Notes

Different scale ranges provide insights into different regulatory mechanisms:

  • Scales 1-5: Intrinsic cardiac dynamics

  • Scales 5-10: Sympathovagal balance

  • Scales 10-20: Long-term regulatory mechanisms
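The trapezoidal formula above can be written directly in NumPy. This is a sketch under two assumptions that the text does not state explicitly: scales are spaced one unit apart, and `scale_range` is 1-based and inclusive at both ends. The function name `complexity_index` is hypothetical.

```python
import numpy as np

def complexity_index(entropy_values, scale_range=None):
    """Trapezoidal area under an entropy-vs-scale curve with unit spacing."""
    values = np.asarray(entropy_values, dtype=float)
    if scale_range is not None:
        start, end = scale_range       # assumed 1-based, inclusive scales
        values = values[start - 1:end]
    # Sum of (E_i + E_{i+1}) / 2 over neighbouring scales
    return float(0.5 * np.sum(values[:-1] + values[1:]))

flat_curve = np.full(20, 2.0)
print(complexity_index(flat_curve))  # 38.0: 19 unit-width intervals of height 2.0
```

The flat-curve case shows how the index scales with max_scale: thresholds on CI are only comparable between analyses that use the same scale range.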

Clinical Applications

Cardiac Arrhythmia Detection:

def detect_arrhythmia(rr_intervals):
    mse = MultiScaleEntropy(rr_intervals, max_scale=15)
    mse_curve = mse.compute_rcmse()
    ci = mse.get_complexity_index(mse_curve, scale_range=(1, 10))

    if ci < 15:
        return "Possible arrhythmia - reduced complexity"
    elif ci > 30:
        return "Normal sinus rhythm"
    else:
        return "Borderline - further analysis needed"

Aging Assessment:

def assess_cardiovascular_age(rr_intervals):
    mse = MultiScaleEntropy(rr_intervals, max_scale=20)
    entropy_values = mse.compute_rcmse()
    ci = mse.get_complexity_index(entropy_values)

    # Age-adjusted thresholds
    if ci > 35:
        return "Young adult cardiovascular profile"
    elif ci > 25:
        return "Middle-aged cardiovascular profile"
    else:
        return "Elderly or compromised cardiovascular profile"

Performance Optimization

Recommended Parameters:

  • Short signals (N < 1000): max_scale=10, m=2, r=0.20

  • Standard clinical (N = 1000-10000): max_scale=20, m=2, r=0.15

  • Research grade (N > 10000): max_scale=30, m=3, r=0.15

Computational Complexity:

  • Naive implementation: O(N²) per scale

  • Optimized KD-tree: O(N log N) per scale

  • Total MSE: O(max_scale × N log N)

Symbolic Dynamics Analysis

Theory and Mathematical Background

Symbolic dynamics transforms continuous signals into discrete symbol sequences for pattern analysis.

Symbolization Methods:

  1. 0V Method (HRV-specific): Classifies RR interval triplets into 0V, 1V, 2LV, 2UV

  2. Quantile: Divides signal into equal-probability bins

  3. SAX: Symbolic Aggregate approXimation

  4. Threshold: User-defined thresholds

Entropy Measures:

Shannon Entropy:

\[H = -\sum_{i} p(s_i) \log_2 p(s_i)\]

Permutation Entropy:

\[H_P = -\sum_{\pi} p(\pi) \log_2 p(\pi)\]

where π represents ordinal patterns.

Class API

class vitalDSP.physiological_features.symbolic_dynamics.SymbolicDynamics(signal: ndarray, n_symbols: int = 4, word_length: int = 3, method: str = '0V')[source]

Bases: object

Symbolic Dynamics Analysis for physiological signals.

Transforms continuous time series into symbolic sequences and analyzes the distribution and patterns of symbols.

Parameters:
  • signal (numpy.ndarray) – Input time series signal (1D array)

  • n_symbols (int, optional) – Number of symbols to use (default: 4). Common choices: 3, 4, 6.

  • word_length (int, optional) – Length of words to analyze (default: 3). Typical range: 2-5.

  • method (str, optional) – Symbolization method (default: '0V'). Options: '0V' (variations), 'quantile', 'SAX', 'threshold'.

signal

Original signal

Type:

numpy.ndarray

n_symbols

Number of symbols

Type:

int

word_length

Word length for pattern analysis

Type:

int

method

Symbolization method

Type:

str

symbols

Symbolic sequence

Type:

numpy.ndarray

symbolize()[source]

Transform signal to symbol sequence

compute_shannon_entropy()[source]

Shannon entropy of symbol distribution

compute_word_distribution()[source]

Distribution of words

detect_forbidden_words()[source]

Find patterns that never occur

compute_transition_matrix()[source]

Symbol transition probabilities

compute_renyi_entropy(alpha)[source]

Generalized Renyi entropy

compute_permutation_entropy()[source]

Permutation entropy

Examples

>>> # Analyze heart rate variability
>>> from vitalDSP.physiological_features.symbolic_dynamics import SymbolicDynamics
>>> import numpy as np
>>>
>>> # RR intervals (seconds)
>>> rr = np.array([1.0, 0.95, 1.02, 0.98, 1.01, 0.96, ...])
>>>
>>> # Create symbolic representation
>>> sd = SymbolicDynamics(rr, n_symbols=4, word_length=3)
>>> symbols = sd.symbolize()
>>>
>>> # Compute Shannon entropy
>>> h = sd.compute_shannon_entropy()
>>> print(f"Shannon Entropy: {h:.4f}")
>>>
>>> # Analyze word distribution
>>> word_dist = sd.compute_word_distribution()
>>>
>>> # Find forbidden words (never occurring patterns)
>>> forbidden = sd.detect_forbidden_words()
>>> print(f"Forbidden words: {len(forbidden)}")

Notes

Symbol Interpretation (0V method):

  • 0V (no variation): Three consecutive values are approximately equal. Represents stable regulation.

  • 1V (one variation): Two values equal, one different. Represents small perturbations.

  • 2LV (two variations, low first): Low-High-Low or similar. Represents oscillatory pattern with deceleration.

  • 2UV (two variations, high first): High-Low-High or similar. Represents oscillatory pattern with acceleration.

Clinical Interpretation:

  • Healthy: Balanced distribution of symbols, few forbidden words

  • Disease: Skewed distribution, many forbidden words

  • Atrial Fibrillation: Very high entropy, nearly uniform distribution

  • Heart Failure: Low entropy, many forbidden words

Parameter Recommendations:

  • n_symbols: 4-6 for HRV analysis

  • word_length: 3 for balance of detail and statistics

  • method: ‘0V’ for HRV, ‘quantile’ for general signals

compute_permutation_entropy(order: int = 3) float[source]

Compute Permutation Entropy.

Permutation entropy analyzes the order relationships between consecutive values, making it robust to noise and monotonic transformations.

Parameters:

order (int) – Order of permutation patterns (default: 3). Typical range: 3-7.

Returns:

perm_entropy (float) – Permutation entropy value

Algorithm

  1. Extract overlapping windows of length 'order'

  2. Determine ranking permutation for each window

  3. Count frequency of each permutation pattern

  4. Calculate Shannon entropy of permutation distribution

Advantages

  • Robust to noise

  • Fast computation

  • Conceptually simple

  • Good for nonlinear signals

References

Bandt, C., & Pompe, B. (2002). Permutation entropy: a natural complexity measure for time series. Physical Review Letters, 88(17), 174102.

Examples

>>> sd = SymbolicDynamics(signal)
>>> pe = sd.compute_permutation_entropy(order=3)
>>> print(f"Permutation Entropy: {pe:.4f}")
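The four algorithm steps map onto a short function. This is a hedged sketch, not the library's implementation: the name `permutation_entropy` and the `normalize` flag are hypothetical, ties are broken by position through a stable sort, and the result is assumed to be in bits, in line with the log2 formula given earlier in this guide.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, order=3, normalize=False):
    """Shannon entropy (bits) of the ordinal patterns of length `order`."""
    x = np.asarray(x, dtype=float)
    # Steps 1-3: slide a window, rank its values, count each ranking
    patterns = Counter(
        tuple(np.argsort(x[i:i + order], kind="stable"))
        for i in range(len(x) - order + 1)
    )
    total = sum(patterns.values())
    probs = np.array([count / total for count in patterns.values()])
    # Step 4: Shannon entropy of the pattern distribution
    h = float(-np.sum(probs * np.log2(probs)))
    if normalize:
        h /= math.log2(math.factorial(order))  # maximum is log2(order!)
    return h
```

Dividing by log2(order!) maps the value to [0, 1], which makes results comparable across different orders.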

compute_renyi_entropy(alpha: float = 2.0) float[source]

Compute Renyi entropy (generalized entropy measure).

Parameters:

alpha (float) – Order parameter:

  • alpha=0: Hartley entropy (log of number of distinct symbols)

  • alpha=1: Shannon entropy (limit as alpha→1)

  • alpha=2: Collision entropy

  • alpha=∞: Min-entropy

Returns:

renyi_entropy (float) – Renyi entropy value

Formula

H_α = (1/(1-α)) * log2(Σ p_i^α)

where p_i are symbol probabilities.

Clinical Use

Different alpha values emphasize different aspects:

  • α < 1: Emphasizes rare events

  • α > 1: Emphasizes common events

  • α = 2: Good balance, computationally efficient
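A minimal sketch of the formula, including the two limiting cases listed under the alpha parameter, is shown below. The function name `renyi_entropy` and its probability-vector input are assumptions made for illustration; the library method works on the symbol sequence it holds internally.

```python
import numpy as np

def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy in bits of a discrete probability vector."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                                 # unobserved symbols contribute nothing
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))    # Shannon entropy (limit alpha -> 1)
    if np.isinf(alpha):
        return float(-np.log2(p.max()))          # min-entropy (limit alpha -> infinity)
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))
```

For a fixed distribution the value never increases as alpha grows, and all orders coincide for a uniform distribution, which gives a quick sanity check on any implementation.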

compute_shannon_entropy() float[source]

Compute Shannon entropy of symbol distribution.

Shannon entropy quantifies the average information content or unpredictability of the symbol sequence.

Returns:

entropy (float) – Shannon entropy in bits (log base 2)

Formula

H = -Σ p(i) * log2(p(i))

where p(i) is the probability of symbol i.

Interpretation

  • H = 0: Completely predictable (only one symbol appears)

  • H = log2(n_symbols): Maximum entropy (uniform distribution)

  • In between: Degree of predictability/complexity

Clinical Significance

  • Low H: Regular, predictable rhythm (may indicate reduced adaptability)

  • High H: Variable, unpredictable rhythm (healthy variability)

  • Very High H: Chaotic, random (e.g., atrial fibrillation)

Examples

>>> import numpy as np
>>> sd = SymbolicDynamics(signal)
>>> sd.symbolize()
>>> h = sd.compute_shannon_entropy()
>>>
>>> # Normalize by maximum possible entropy
>>> h_max = np.log2(sd.n_symbols)
>>> h_norm = h / h_max
>>> print(f"Normalized entropy: {h_norm:.4f}")
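The formula and its two boundary cases can be checked with a few lines of standard-library Python. The function name `shannon_entropy` is hypothetical, and the sketch takes an already symbolized sequence as input.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sequence containing a single symbol gives 0 bits, and four equally frequent symbols give log2(4) = 2 bits, matching the interpretation bounds listed above.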

compute_symbolic_features() Dict[str, float][source]

Convenience method that computes all symbolic dynamics features.

Returns:

Dictionary containing all symbolic dynamics metrics:
  • ’shannon_entropy’: Shannon entropy of symbol distribution

  • ’renyi_entropy’: Renyi entropy (alpha=2)

  • ’permutation_entropy’: Permutation entropy (order=3)

  • ’num_words’: Total number of words in symbol sequence

  • ’num_forbidden_words’: Number of forbidden word patterns

Return type:

dict

Example

>>> nn_intervals = [800, 810, 790, 805, 795, 820, 780, 815]
>>> sd = SymbolicDynamics(nn_intervals)
>>> features = sd.compute_symbolic_features()
>>> print(f"Shannon Entropy: {features['shannon_entropy']:.3f}")

compute_transition_matrix() ndarray[source]

Compute symbol transition probability matrix.

Returns:

transition_matrix (numpy.ndarray) – Matrix of transition probabilities (n_symbols x n_symbols). Element [i,j] = P(next symbol is j | current symbol is i).

Examples

>>> sd = SymbolicDynamics(signal, n_symbols=4)
>>> sd.symbolize()
>>> trans = sd.compute_transition_matrix()
>>>
>>> # Visualize transition matrix
>>> import matplotlib.pyplot as plt
>>> plt.imshow(trans, cmap='hot', interpolation='nearest')
>>> plt.colorbar(label='Transition Probability')
>>> plt.xlabel('Next Symbol')
>>> plt.ylabel('Current Symbol')
>>> plt.title('Symbol Transition Matrix')
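The definition of element [i,j] can be illustrated by counting consecutive symbol pairs and normalizing each row. The function name `transition_matrix` is hypothetical, and leaving rows of never-observed symbols at zero is an assumption of this sketch, not documented library behaviour.

```python
import numpy as np

def transition_matrix(symbols, n_symbols):
    """Row-normalized counts of current -> next symbol transitions."""
    symbols = np.asarray(symbols, dtype=int)
    counts = np.zeros((n_symbols, n_symbols))
    # Accumulate one count per consecutive (current, next) pair
    np.add.at(counts, (symbols[:-1], symbols[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for symbols that never appear as "current" stay all-zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

Every row with at least one observed transition sums to 1, so each row can be read as a conditional distribution over the next symbol.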

compute_word_distribution() Dict[str, float][source]

Compute distribution of words (symbol patterns).

Returns:

word_dist (dict) – Dictionary mapping words to their probabilities. Keys: words (strings of symbols). Values: probabilities (0-1).

Examples

>>> sd = SymbolicDynamics(signal, word_length=3)
>>> sd.symbolize()
>>> word_dist = sd.compute_word_distribution()
>>>
>>> # Most common words
>>> sorted_words = sorted(word_dist.items(), key=lambda x: x[1], reverse=True)
>>> print("Top 5 most common words:")
>>> for word, prob in sorted_words[:5]:
...     print(f"{word}: {prob:.4f}")

detect_forbidden_words() List[str][source]

Detect forbidden words (patterns that never occur).

Returns:

forbidden_words (list of str) – List of words that never appear in the sequence

Significance

Forbidden words indicate deterministic constraints or regulatory mechanisms that prevent certain patterns from occurring.

  • Many forbidden words: Strong regulatory constraints (often pathological)

  • Few forbidden words: Flexible regulation (typically healthy)

  • No forbidden words: Complete randomness (e.g., atrial fibrillation)

Examples

>>> sd = SymbolicDynamics(signal, n_symbols=4, word_length=3)
>>> sd.symbolize()
>>> forbidden = sd.detect_forbidden_words()
>>>
>>> total_possible = sd.n_symbols ** sd.word_length
>>> forbidden_ratio = len(forbidden) / total_possible
>>> print(f"Forbidden word ratio: {forbidden_ratio:.2%}")
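Word counting and forbidden-word detection can be sketched with the standard library alone. The function names are hypothetical, and encoding each word as a string of digits is an assumption of this sketch that only works for n_symbols of 10 or fewer.

```python
from collections import Counter
from itertools import product

def word_distribution(symbols, word_length=3):
    """Relative frequency of each overlapping word of `word_length` symbols."""
    words = ["".join(str(s) for s in symbols[i:i + word_length])
             for i in range(len(symbols) - word_length + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def forbidden_words(symbols, n_symbols, word_length=3):
    """All n_symbols ** word_length possible words that never occur."""
    observed = set(word_distribution(symbols, word_length))
    possible = ("".join(map(str, w))
                for w in product(range(n_symbols), repeat=word_length))
    return [w for w in possible if w not in observed]
```

The number of possible words grows as n_symbols ** word_length, so short recordings produce forbidden words simply because there are too few samples to visit every pattern.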

symbolize() ndarray[source]

Transform continuous signal to symbolic sequence.

Returns:

symbols (numpy.ndarray) – Array of symbol indices (integers 0 to n_symbols-1)

Methods

  1. 0V Method (Variations): Classifies triplets based on pattern variations:

    • 0V: all approximately equal (|a-b|<δ, |b-c|<δ, |a-c|<δ)

    • 1V: two equal, one different

    • 2LV: two variations with low-high-low pattern

    • 2UV: two variations with high-low-high pattern

  2. Quantile Method: Divides signal into n_symbols quantiles.

  3. SAX (Symbolic Aggregate approXimation): Uses Gaussian quantiles for symbolization.

  4. Threshold Method: Simple thresholding based on percentiles.

Examples

>>> sd = SymbolicDynamics(signal, n_symbols=4, method='0V')
>>> symbols = sd.symbolize()
>>>
>>> # Convert to letter representation
>>> letters = ''.join([chr(65+s) for s in symbols])  # A, B, C, D…
>>> print(f"Symbolic sequence: {letters[:50]}…")
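Of the four methods, the quantile one is the simplest to illustrate. The sketch below assumes equal-probability bins built from the empirical quantiles of the signal; the function name `quantile_symbolize` is hypothetical and the library's handling of bin edges may differ.

```python
import numpy as np

def quantile_symbolize(x, n_symbols=4):
    """Map each sample to the index of its equal-probability quantile bin."""
    x = np.asarray(x, dtype=float)
    # Interior bin edges, e.g. the 25th, 50th and 75th percentiles for 4 symbols
    edges = np.quantile(x, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(x, edges)  # integers in 0 .. n_symbols - 1
```

Because the bins are equally populated by construction, the single-symbol Shannon entropy of a quantile-symbolized signal is close to its maximum, so the informative structure lies in the words and transitions.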

Clinical Applications

Atrial Fibrillation Detection:

def screen_atrial_fibrillation(rr_intervals):
    sd = SymbolicDynamics(rr_intervals, method='0V')
    shannon = sd.compute_shannon_entropy()   # float, in bits
    forbidden = sd.detect_forbidden_words()  # list of unobserved words

    # Share of all possible words that never occur, in percent
    total_possible = sd.n_symbols ** sd.word_length
    forbidden_percentage = 100 * len(forbidden) / total_possible

    # AF scoring
    af_score = 0
    if shannon > 1.7:
        af_score += 3
    if forbidden_percentage < 15:
        af_score += 2

    if af_score >= 4:
        return "High probability of AF - urgent review"
    elif af_score >= 2:
        return "Irregular rhythm - further testing recommended"
    else:
        return "Normal sinus rhythm"

Sleep Stage Classification:

import math

def classify_sleep_stage(eeg_signal, order=5):
    sd = SymbolicDynamics(eeg_signal, n_symbols=6, method='quantile')
    pe_bits = sd.compute_permutation_entropy(order=order)  # float
    # Normalize by the maximum, log2(order!), assuming the value is in bits
    pe = pe_bits / math.log2(math.factorial(order))

    if pe > 0.90:
        return "Awake"
    elif pe > 0.85:
        return "REM or N1 (light sleep)"
    elif pe > 0.75:
        return "N2 (moderate sleep)"
    else:
        return "N3 (deep sleep)"

Parameter Selection Guide

Number of Symbols:

  • HRV (0V method): 4 symbols (0V, 1V, 2LV, 2UV)

  • General quantile: 3-6 symbols

  • SAX: 3-10 symbols

Word Length:

  • Short-term patterns: length = 2-3

  • Medium-term: length = 4-5

  • Long-term: length = 6-8 (requires N > 10,000)

Permutation Order:

  • Fast, less sensitive: order = 3 (6 permutations)

  • Standard: order = 5 (120 permutations)

  • High sensitivity: order = 7 (5040 permutations, needs N > 50,000)

Transfer Entropy Analysis

Theory and Mathematical Background

Transfer Entropy (TE) quantifies directional information flow from source X to target Y:

\[TE(X \to Y) = I(Y_{\text{future}}; X_{\text{past}} \mid Y_{\text{past}})\]

Expanding using conditional mutual information:

\[TE(X \to Y) = H(Y_{\text{future}} \mid Y_{\text{past}}) - H(Y_{\text{future}} \mid Y_{\text{past}}, X_{\text{past}})\]
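To make the definition concrete, the sketch below estimates TE with a plug-in histogram estimator for history lengths k = l = 1. It is an illustration under stated assumptions, not the library's method: vitalDSP defaults to KNN estimation, the function name `binned_transfer_entropy` is hypothetical, and signals are discretized into equal-probability bins.

```python
from collections import Counter

import numpy as np

def binned_transfer_entropy(source, target, n_bins=4, delay=1):
    """Plug-in TE(source -> target) in nats with history lengths k = l = 1."""
    def discretize(v):
        v = np.asarray(v, dtype=float)
        edges = np.quantile(v, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(v, edges)

    x, y = discretize(source), discretize(target)
    y_future, y_past, x_past = y[delay:], y[:-delay], x[:-delay]
    n = len(y_future)

    # Empirical counts for the joint and marginal distributions
    joint = Counter(zip(y_future, y_past, x_past))
    past_pair = Counter(zip(y_past, x_past))
    target_pair = Counter(zip(y_future, y_past))
    target_past = Counter(y_past)

    te = 0.0
    for (yf, yp, xp), count in joint.items():
        # p(yf | yp, xp) / p(yf | yp), written in terms of raw counts
        ratio = (count * target_past[yp]) / (past_pair[(yp, xp)] * target_pair[(yf, yp)])
        te += (count / n) * np.log(ratio)
    return float(te)
```

The plug-in estimate is biased upward for short signals or many bins, which is one reason significance testing against surrogates is needed before interpreting small values.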

Key Concepts:

  • Time-delay embedding: Reconstructs phase space using Takens’ theorem

  • KNN estimation: Kraskov-Stögbauer-Grassberger entropy estimator

  • Surrogate testing: Statistical significance via randomization
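Surrogate testing can be sketched generically: shuffle the source to destroy its temporal relation to the target, recompute the statistic many times, and compare with the observed value. The function name `surrogate_p_value` and the choice of plain random shuffling are assumptions for illustration; `statistic` is any callable such as a TE estimator.

```python
import numpy as np

def surrogate_p_value(statistic, source, target, n_surrogates=200, seed=0):
    """One-sided permutation p-value for statistic(source, target)."""
    rng = np.random.default_rng(seed)
    observed = statistic(source, target)
    # Null distribution: the same statistic on time-shuffled sources
    null = np.array([statistic(rng.permutation(source), target)
                     for _ in range(n_surrogates)])
    # Add-one correction keeps the p-value strictly positive
    return float((1 + np.sum(null >= observed)) / (1 + n_surrogates))
```

With 200 surrogates the smallest attainable p-value is 1/201, so the number of surrogates sets the resolution of the test.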

Class API

class vitalDSP.physiological_features.transfer_entropy.TransferEntropy(source: ndarray, target: ndarray, k_coef: int = 1, l_coef: int = 1, delay: int = 1, n_bins: int | None = None, k_neighbors: int = 3, k: int | None = None, l: int | None = None)[source]

Bases: object

Transfer Entropy analysis for directional coupling between signals.

Transfer Entropy (TE) quantifies the directional information flow from a source signal to a target signal, revealing causal relationships.

Parameters:
  • source (numpy.ndarray) – Source time series (potential driver)

  • target (numpy.ndarray) – Target time series (potentially driven)

  • k (int, optional) – History length (embedding dimension) for target (default: 1)

  • l (int, optional) – History length for source (default: 1)

  • delay (int, optional) – Time delay for embedding (default: 1)

  • n_bins (int, optional) – Number of bins for histogram estimation (default: None, uses KNN)

  • k_neighbors (int, optional) – Number of nearest neighbors for KNN estimation (default: 3)

source

Source signal

Type:

numpy.ndarray

target

Target signal

Type:

numpy.ndarray

k

Target history length

Type:

int

l

Source history length

Type:

int

delay

Embedding delay

Type:

int

compute_transfer_entropy()[source]

Compute TE from source to target

compute_bidirectional_te()[source]

Compute TE in both directions

compute_time_delayed_te(max_delay)[source]

TE across multiple time delays

compute_effective_te()[source]

Normalized effective TE

test_significance(n_surrogates)[source]

Statistical significance testing

Examples

>>> # Analyze cardio-respiratory coupling
>>> from vitalDSP.physiological_features.transfer_entropy import TransferEntropy
>>> import numpy as np
>>>
>>> # Heart rate (BPM) and respiration rate
>>> heart_rate = np.array([...])  # Time series of HR
>>> resp_rate = np.array([...])   # Time series of respiration
>>>
>>> # Compute transfer entropy
>>> te = TransferEntropy(resp_rate, heart_rate, k=1, l=1)
>>>
>>> # Respiratory influence on heart rate
>>> te_resp_to_hr = te.compute_transfer_entropy()
>>> print(f"TE(Resp → HR): {te_resp_to_hr:.4f}")
>>>
>>> # Bidirectional coupling
>>> te_forward, te_backward = te.compute_bidirectional_te()
>>> print(f"TE(Resp → HR): {te_forward:.4f}")
>>> print(f"TE(HR → Resp): {te_backward:.4f}")
>>>
>>> # Net directional influence
>>> net_te = te_forward - te_backward
>>> if net_te > 0:
...     print("Respiration drives heart rate")
>>> else:
...     print("Heart rate drives respiration")

Notes

Interpretation:

  • TE > 0: Information flows from source to target

  • TE ≈ 0: No directional coupling detected

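  • TE < 0: Cannot occur in theory; a slightly negative estimate reflects finite-sample estimator bias, while a strongly negative one suggests an implementation error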

Comparison with Bidirectional TE:

  • If TE(X→Y) > TE(Y→X): X predominantly drives Y

  • If TE(X→Y) ≈ TE(Y→X): Bidirectional coupling or common drive

  • Significance testing required to confirm non-zero values

Parameter Guidelines:

  • k, l: Start with 1, increase if signals have memory

  • delay: Typically 1 for high sampling rate, larger for slower dynamics

  • k_neighbors: 3-5 for most applications

Computational Considerations:

  • Uses KNN estimation (Kraskov method) for continuous signals

  • Time complexity: O(N log N) with KD-trees

  • Requires signals of same length

  • Stationary signals recommended

compute_bidirectional_te() Tuple[float, float][source]

Compute transfer entropy in both directions.

Returns:

  • te_forward (float) – TE from source to target

  • te_backward (float) – TE from target to source

Examples

>>> te = TransferEntropy(resp, hr)
>>> te_resp_hr, te_hr_resp = te.compute_bidirectional_te()
>>>
>>> # Net directional coupling
>>> net_coupling = te_resp_hr - te_hr_resp
>>> dominant_direction = "Resp → HR" if net_coupling > 0 else "HR → Resp"
>>> print(f"Dominant direction: {dominant_direction}")
>>> print(f"Coupling asymmetry: {abs(net_coupling):.4f}")

Interpretation

Comparing bidirectional TE reveals:

  1. Dominant Direction:

    • TE(X→Y) >> TE(Y→X): X drives Y

    • TE(X→Y) << TE(Y→X): Y drives X

    • TE(X→Y) ≈ TE(Y→X): Bidirectional or common drive

  2. Coupling Strength:

    • Sum = TE(X→Y) + TE(Y→X): Total coupling

    • Difference = abs(TE(X→Y) - TE(Y→X)): Directional asymmetry

compute_effective_te() → float

Compute normalized effective transfer entropy.

Returns:

  • effective_te (float) – Normalized TE in range [0, 1]

Formula

Effective TE = TE / H(target_future | target_past)

Normalization provides:

  • Scale-independent measure

  • Interpretability as fraction of uncertainty reduced

  • Easier comparison across different signal pairs

Examples

>>> te_analyzer = TransferEntropy(x, y)
>>> eff_te = te_analyzer.compute_effective_te()
>>> print(f"Effective TE: {eff_te:.2%}")
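Why the normalized value falls in [0, 1] can be seen from the entropy form of TE. The short derivation below assumes discrete (Shannon) entropies. For continuous signals estimated with KNN methods the entropies are differential, so the bound is not guaranteed and an implementation may need to clip the result; whether vitalDSP does so is not stated here.

```latex
\mathrm{TE}_{X \to Y}
  = H(Y_t \mid Y_{\text{past}}) - H(Y_t \mid Y_{\text{past}}, X_{\text{past}}),
\qquad
0 \le H(Y_t \mid Y_{\text{past}}, X_{\text{past}}) \le H(Y_t \mid Y_{\text{past}})
\;\Longrightarrow\;
0 \le \frac{\mathrm{TE}_{X \to Y}}{H(Y_t \mid Y_{\text{past}})} \le 1 .
```

Here Y_t corresponds to target_future and Y_past to target_past in the formula above.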

compute_time_delayed_te(max_delay: int = 10) → ndarray

Compute transfer entropy across multiple time delays.

Parameters:

  • max_delay (int) – Maximum time delay to test

Returns:

  • te_values (numpy.ndarray) – TE values for each delay (length: max_delay)

Purpose

Different physiological processes operate at different time scales. Time-delayed TE reveals the temporal dynamics of coupling.

Examples

>>> te = TransferEntropy(source, target)
>>> te_delays = te.compute_time_delayed_te(max_delay=20)
>>>
>>> # Find optimal delay
>>> optimal_delay = np.argmax(te_delays) + 1
>>> print(f"Peak coupling at delay: {optimal_delay}")
>>>
>>> # Plot delay profile
>>> import matplotlib.pyplot as plt
>>> delays = np.arange(1, 21)
>>> plt.plot(delays, te_delays, 'o-')
>>> plt.xlabel('Time Delay')
>>> plt.ylabel('Transfer Entropy')
>>> plt.title('TE vs Time Delay')
>>> plt.grid(True)

Clinical Significance

  • Short delays (1-3): Immediate physiological responses

  • Medium delays (5-10): Regulatory mechanisms

  • Long delays (>10): Slow adaptive processes
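Delays are expressed in samples, so interpreting them physiologically needs the sampling rate. The sketch below locates the peak and converts it to seconds. `describe_peak_delay` and the `fs` argument are hypothetical, and assigning a delay of 4 to the "medium" band is an assumption, since the ranges above leave it unclassified.

```python
import numpy as np

def describe_peak_delay(te_delays, fs):
    """Hypothetical helper: locate the TE peak and express it in seconds.

    te_delays[i] is assumed to hold the TE at a delay of i + 1 samples;
    fs is the sampling rate in Hz.
    """
    te_delays = np.asarray(te_delays, dtype=float)
    delay_samples = int(np.argmax(te_delays)) + 1
    if delay_samples <= 3:
        category = "short"
    elif delay_samples <= 10:   # assumption: a delay of 4 is treated as medium
        category = "medium"
    else:
        category = "long"
    return delay_samples, delay_samples / fs, category

peak = describe_peak_delay([0.01, 0.02, 0.08, 0.03], fs=4.0)
```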

compute_transfer_entropy() → float

Compute transfer entropy from source to target.

Returns:

  • te (float) – Transfer entropy value in nats

Formula

TE(X→Y) = I(Y_future; X_past | Y_past)

More formally:

TE(X→Y) = H(Y_t | Y_past) - H(Y_t | Y_past, X_past)

where:

  • Y_t = target at time t

  • Y_past = k past values of target

  • X_past = l past values of source

Algorithm Steps

  1. Create embeddings for target history (k values)

  2. Create embeddings for source history (l values)

  3. Extract future target values

  4. Compute conditional mutual information

  5. Return TE estimate
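Steps 1 to 3 amount to building lagged matrices. A minimal sketch is given below; `build_embeddings` is a hypothetical name, and treating `delay` as the spacing between successive lags is an assumption about the library's convention.

```python
import numpy as np

def build_embeddings(source, target, k=1, l=1, delay=1):
    """Sketch of steps 1-3: history embeddings and future target values."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    start = max(k, l) * delay              # first index with a full history
    times = np.arange(start, target.size)

    # Column j holds the value (j + 1) * delay samples before time t
    y_past = np.column_stack([target[times - (j + 1) * delay] for j in range(k)])
    x_past = np.column_stack([source[times - (j + 1) * delay] for j in range(l)])
    y_future = target[times]
    return y_future, y_past, x_past

y_future, y_past, x_past = build_embeddings(
    10.0 * np.arange(10), np.arange(10), k=2, l=1
)
```

Each row of `y_past` and `x_past` lines up with one entry of `y_future`, which is the layout a conditional mutual information estimator in step 4 would consume.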

Examples

>>> te_analyzer = TransferEntropy(x, y, k=1, l=1)
>>> te_value = te_analyzer.compute_transfer_entropy()
>>>
>>> # Convert nats to bits
>>> te_bits = te_value / np.log(2)
>>> print(f"TE: {te_bits:.4f} bits")

Clinical Interpretation

  • Cardio-respiratory:

    • Healthy: Moderate bidirectional coupling

    • Sleep apnea: Reduced respiratory → cardiac TE

    • Heart failure: Altered coupling patterns

  • Brain-heart:

    • Mental stress: Increased brain → heart TE

    • Relaxation: Reduced directional coupling

Notes

  • Returns value in nats (natural logarithm base)

  • Convert to bits by dividing by ln(2)

  • Significance should be tested with surrogate data
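To make the entropy form of the formula concrete, the sketch below estimates TE with a simple histogram (plug-in) estimator for k = l = 1. This is deliberately not the KNN (Kraskov) estimator the library describes; `binned_transfer_entropy` and its quantile binning are illustrative assumptions chosen for brevity.

```python
from collections import Counter
import math

import numpy as np

def shannon_entropy(counts):
    """Plug-in Shannon entropy (nats) from a Counter of outcomes."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def binned_transfer_entropy(source, target, n_bins=2):
    """Histogram TE estimate with k = l = 1 (not the library's KNN method)."""
    def discretize(x):
        x = np.asarray(x, dtype=float)
        edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(x, edges).tolist()

    s, t = discretize(source), discretize(target)
    y_t, y_past, x_past = t[1:], t[:-1], s[:-1]

    # H(Y_t | Y_past) = H(Y_t, Y_past) - H(Y_past)
    h_given_y = (shannon_entropy(Counter(zip(y_t, y_past)))
                 - shannon_entropy(Counter(y_past)))
    # H(Y_t | Y_past, X_past) = H(Y_t, Y_past, X_past) - H(Y_past, X_past)
    h_given_yx = (shannon_entropy(Counter(zip(y_t, y_past, x_past)))
                  - shannon_entropy(Counter(zip(y_past, x_past))))
    return h_given_y - h_given_yx

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.empty_like(x)
y[0] = 0.0
y[1:] = x[:-1] + 0.1 * rng.normal(size=x.size - 1)  # y follows x by one sample

te_xy = binned_transfer_entropy(x, y)
te_yx = binned_transfer_entropy(y, x)
te_xy_bits = te_xy / math.log(2)  # nats to bits
```

Because `y` is built as a delayed, noisy copy of `x`, the X→Y estimate should be clearly larger than the Y→X estimate, which stays near zero.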

test_significance(n_surrogates: int = 100, method: str = 'shuffle') → Tuple[float, float]

Test statistical significance of transfer entropy.

Parameters:

  • n_surrogates (int) – Number of surrogate datasets

  • method (str) – Surrogate generation method:

    • 'shuffle': Random permutation (destroys temporal structure)

    • 'phase': Phase randomization (preserves power spectrum)

Returns:

  • p_value (float) – Statistical significance (0-1)

  • te_original (float) – Original TE value

Algorithm

  1. Compute TE for original data

  2. Generate n_surrogates by shuffling source signal

  3. Compute TE for each surrogate

  4. p-value = fraction of surrogates with TE >= original TE

Examples

>>> te = TransferEntropy(x, y)
>>> p_value, te_value = te.test_significance(n_surrogates=1000)
>>>
>>> if p_value < 0.05:
...     print(f"Significant coupling (p={p_value:.4f})")
... else:
...     print(f"No significant coupling (p={p_value:.4f})")

Notes

  • p < 0.05: Significant coupling

  • p < 0.01: Highly significant

  • More surrogates = more reliable p-value

  • Computationally expensive for large n_surrogates
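The four algorithm steps are independent of the statistic being tested, which the sketch below makes explicit. All names are hypothetical, and the statistic used here is a lagged correlation standing in for transfer entropy so that the example runs without the library. The phase-randomized branch keeps the DC and Nyquist terms real so that the surrogate's amplitude spectrum matches the original.

```python
import numpy as np

def make_surrogate(x, rng, method="shuffle"):
    """Return one surrogate of x using the named method."""
    x = np.asarray(x, dtype=float)
    if method == "shuffle":
        return rng.permutation(x)            # destroys temporal structure
    if method == "phase":
        spectrum = np.fft.rfft(x)
        phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
        phases[0] = 0.0                      # keep the mean (DC term) unchanged
        if x.size % 2 == 0:
            phases[-1] = 0.0                 # keep the Nyquist term real
        return np.fft.irfft(spectrum * np.exp(1j * phases), n=x.size)
    raise ValueError(f"unknown method: {method!r}")

def surrogate_p_value(source, target, statistic,
                      n_surrogates=100, method="shuffle", seed=0):
    """Steps 1-4 for any coupling statistic(source, target) -> float."""
    rng = np.random.default_rng(seed)
    observed = statistic(source, target)                        # step 1
    surrogate_stats = np.array([
        statistic(make_surrogate(source, rng, method), target)  # steps 2-3
        for _ in range(n_surrogates)
    ])
    p_value = float(np.mean(surrogate_stats >= observed))       # step 4
    return p_value, observed

def lagged_abs_corr(source, target):
    """Stand-in statistic (NOT transfer entropy): |corr(source[t], target[t+1])|."""
    return float(abs(np.corrcoef(source[:-1], target[1:])[0, 1]))
```

With n_surrogates surrogates the smallest non-zero p-value this procedure can report is 1/n_surrogates, which is one reason larger surrogate counts give more reliable results.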

Clinical Applications

Cardio-Respiratory Coupling:

import numpy as np
from vitalDSP.physiological_features.transfer_entropy import TransferEntropy

def analyze_cardiorespiratory_coupling(respiration, heart_rate):
    # Analyze coupling at 1 Hz sampling
    te = TransferEntropy(respiration, heart_rate, k=2, l=2, delay=1)

    # Bidirectional analysis: returns (te_forward, te_backward)
    te_resp_to_hr, te_hr_to_resp = te.compute_bidirectional_te()

    # Statistical significance: returns (p_value, te_original)
    p_value, _ = te.test_significance(n_surrogates=1000)

    # Time-delayed analysis: one TE value per delay 1..max_delay
    te_delays = te.compute_time_delayed_te(max_delay=10)

    results = {
        'te_resp_to_hr': te_resp_to_hr,
        'te_hr_to_resp': te_hr_to_resp,
        'coupling_type': 'Resp → HR' if te_resp_to_hr > te_hr_to_resp else 'HR → Resp',
        'p_value': p_value,
        'optimal_delay': int(np.argmax(te_delays)) + 1
    }

    return results

Brain-Heart Interaction:

def assess_brain_heart_coupling(eeg_alpha, rr_intervals):
    # Analyze central-autonomic interaction
    te = TransferEntropy(eeg_alpha, rr_intervals, k=3, l=3, delay=1)

    # compute_bidirectional_te() returns (te_forward, te_backward)
    te_brain_to_heart, te_heart_to_brain = te.compute_bidirectional_te()

    if te_brain_to_heart > 0.5:
        return "Strong brain → heart coupling (central modulation)"
    elif te_heart_to_brain > 0.5:
        return "Strong heart → brain coupling (afferent feedback)"
    else:
        return "Weak or bidirectional coupling"

Parameter Selection Guide

Embedding Parameters:

  • k (target history): 1-3 for physiological signals

  • l (source history): 1-3 for physiological signals

  • delay: 1 for high sampling rates, 2-5 for lower rates

KNN Parameters:

  • k_neighbors:

    • Small signals (N < 500): k_neighbors = 3

    • Standard (N = 500-5000): k_neighbors = 5

    • Large (N > 5000): k_neighbors = 10

Surrogate Testing:

  • Quick screening: 100 surrogates

  • Standard analysis: 1000 surrogates

  • Publication quality: 10000 surrogates
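These guideline values can be collected in one place so that analyses stay consistent across a study. The helper below is a hypothetical sketch that simply encodes the numbers listed above; the keys of the returned dictionary mirror the constructor and method arguments used elsewhere in this guide, and the `purpose` labels are assumptions.

```python
def suggest_te_settings(n_samples, purpose="standard"):
    """Hypothetical helper encoding the guideline values above."""
    if n_samples < 500:
        k_neighbors = 3
    elif n_samples <= 5000:
        k_neighbors = 5
    else:
        k_neighbors = 10
    n_surrogates = {"screening": 100, "standard": 1000, "publication": 10000}[purpose]
    # k = l = 1 is the low end of the 1-3 range; increase if signals have memory
    return {"k": 1, "l": 1, "delay": 1,
            "k_neighbors": k_neighbors, "n_surrogates": n_surrogates}

settings = suggest_te_settings(3000)
```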

Interpretation Guidelines

Coupling Patterns:

  • TE(X→Y) > 2×TE(Y→X): Unidirectional X drives Y

  • TE(X→Y) ≈ TE(Y→X): Bidirectional coupling

  • Both TE ≈ 0: No coupling or common drive

Clinical Significance:

  • TE > 1.0: Strong coupling

  • TE = 0.5-1.0: Moderate coupling

  • TE = 0.1-0.5: Weak coupling

  • TE < 0.1: No significant coupling
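The pattern and strength rules can be combined into a single classification step. In the sketch below, `classify_coupling` is a hypothetical helper. It assumes TE values in nats, treats values below 0.1 as "≈ 0", and labels every ratio between 1:2 and 2:1 as bidirectional; none of these choices is fixed by the guidelines above. A label from this function does not replace surrogate testing.

```python
def classify_coupling(te_xy, te_yx):
    """Hypothetical helper applying the pattern and strength rules above."""
    def strength(te):
        if te > 1.0:
            return "strong"
        if te >= 0.5:
            return "moderate"
        if te >= 0.1:
            return "weak"
        return "none"

    if strength(te_xy) == "none" and strength(te_yx) == "none":
        pattern = "no coupling or common drive"
    elif te_xy > 2 * te_yx:
        pattern = "unidirectional X → Y"
    elif te_yx > 2 * te_xy:
        pattern = "unidirectional Y → X"
    else:
        pattern = "bidirectional"
    return pattern, strength(te_xy), strength(te_yx)

label = classify_coupling(0.8, 0.2)
```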

Complete Clinical Workflow

Comprehensive HRV Analysis

def comprehensive_hrv_analysis(rr_intervals, respiration=None):
    """
    Complete nonlinear HRV analysis using all advanced features.
    """
    results = {}

    # 1. Multi-Scale Entropy
    mse = MultiScaleEntropy(rr_intervals, max_scale=20, m=2, r=0.15)
    mse_values = mse.compute_rcmse()
    ci = mse.get_complexity_index(mse_values, scale_range=(1, 15))

    results['mse'] = {
        'complexity_index': ci,
        'interpretation': (
            'Healthy' if ci > 30 else
            'Reduced' if ci > 15 else
            'Severely reduced'
        )
    }

    # 2. Symbolic Dynamics
    sd = SymbolicDynamics(rr_intervals, n_symbols=4, method='0V')
    shannon = sd.compute_shannon_entropy()
    forbidden = sd.detect_forbidden_words()
    perm_ent = sd.compute_permutation_entropy(order=3)

    results['symbolic'] = {
        'shannon_entropy': shannon['normalized_entropy'],
        'forbidden_percentage': forbidden['forbidden_percentage'],
        'permutation_entropy': perm_ent['normalized_pe'],
        'interpretation': forbidden['interpretation']
    }

    # 3. Transfer Entropy (if respiration available)
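    if respiration is not None:
        te = TransferEntropy(respiration, rr_intervals, k=2, l=2, delay=1)
        # compute_bidirectional_te() returns (te_forward, te_backward)
        te_resp_to_hr, te_hr_to_resp = te.compute_bidirectional_te()
        # test_significance() returns (p_value, te_original)
        p_value, _ = te.test_significance(n_surrogates=500)

        results['coupling'] = {
            'te_resp_to_hr': te_resp_to_hr,
            'coupling_type': 'Resp → HR' if te_resp_to_hr > te_hr_to_resp else 'HR → Resp',
            'p_value': p_value
        }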

    # Overall risk assessment
    risk_factors = 0
    if ci < 20:
        risk_factors += 2
    if forbidden['forbidden_percentage'] > 50:
        risk_factors += 2
    if shannon['normalized_entropy'] < 0.6:
        risk_factors += 1

    if risk_factors >= 4:
        overall = "High risk - significant autonomic dysfunction"
    elif risk_factors >= 2:
        overall = "Moderate risk - monitoring recommended"
    else:
        overall = "Low risk - healthy autonomic function"

    results['overall_assessment'] = overall

    return results

Performance and Optimization

Computational Complexity Summary

Operation            Naive      Optimized         Notes
-------------------  ---------  ----------------  ----------------------
Sample Entropy       O(N²)      O(N log N)        KD-tree acceleration
MSE (20 scales)      O(20N²)    O(20N log N)      Per-scale optimization
Symbolic Transform   O(N)       O(N)              Linear scan
Transfer Entropy     O(N²d)     O(N log N · d)    KNN + dimensionality d
Surrogate Testing    O(M·N²)    O(M·N log N)      M = n_surrogates

Memory Requirements

Multi-Scale Entropy:

  • Total: ~240N bytes (~2.4 MB for N=10,000)

Transfer Entropy:

  • Total: ~22N × (k+l+1) bytes (~1.1 MB for N=10,000, k=l=2)

Optimization Tips

  1. Signal Length:

    • Minimum: 200-300 points

    • Recommended: 1000-5000 points

    • Optimal: 5000-20,000 points

  2. Parallel Processing:

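from functools import partial
from multiprocessing import Pool

import numpy as np
from vitalDSP.physiological_features.advanced_entropy import MultiScaleEntropy

def _entropy_at_scale(scale, signal, max_scale):
    # Module-level worker: multiprocessing cannot pickle lambdas
    mse = MultiScaleEntropy(signal, max_scale)
    return mse._sample_entropy(mse._coarse_grain(scale))

def compute_mse_parallel(signal, max_scale=20):
    # On platforms that spawn workers, call this under `if __name__ == "__main__":`
    worker = partial(_entropy_at_scale, signal=signal, max_scale=max_scale)
    with Pool(processes=4) as pool:
        entropies = pool.map(worker, range(1, max_scale + 1))
    return np.array(entropies)

  3. Batch Processing: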

def batch_analysis(patient_files):
    results = []
    for file in patient_files:
        rr = np.loadtxt(file)
        mse = MultiScaleEntropy(rr)
        ci = mse.get_complexity_index(mse.compute_rcmse())
        results.append({'patient': file, 'ci': ci})
    return results

Benchmarking Results

Hardware: Intel i7-9700K, 32GB RAM

Signal Length   MSE (20 scales)   Symbolic Dynamics   Transfer Entropy
-------------   ---------------   -----------------   ----------------
N = 500         0.12 s            0.03 s              0.18 s
N = 1,000       0.31 s            0.05 s              0.42 s
N = 5,000       2.1 s             0.21 s              3.8 s
N = 10,000      5.8 s             0.44 s              12.3 s

References

Multi-Scale Entropy

  1. Costa, M., Goldberger, A. L., & Peng, C. K. (2002). Multiscale entropy analysis of complex physiologic time series. Physical Review Letters, 89(6), 068102.

  2. Wu, S. D., Wu, C. W., Lin, S. G., Wang, C. C., & Lee, K. Y. (2013). Time series analysis using composite multiscale entropy. Entropy, 15(3), 1069-1084.

  3. Humeau-Heurtier, A. (2015). The multiscale entropy algorithm and its variants: A review. Entropy, 17(5), 3110-3123.

Symbolic Dynamics

  1. Porta, A., et al. (2001). Entropy, entropy rate, and pattern classification as tools to typify complexity in short heart period variability series. IEEE Trans. Biomed. Eng., 48(11), 1282-1291.

  2. Bandt, C., & Pompe, B. (2002). Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88(17), 174102.

Transfer Entropy

  1. Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461-464.

  2. Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138.

  3. Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Physical Review E, 83(5), 051112.

Clinical Applications

  1. Goldberger, A. L., et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220.

  2. Task Force of the European Society of Cardiology. (1996). Heart rate variability: standards of measurement, physiological interpretation and clinical use. Circulation, 93(5), 1043-1065.

Additional Resources

For complete mathematical derivations, detailed code explanations, and extensive clinical examples, see:

Support and Community