optimize DFT decomposition for improved performance

Open • S8XY opened this issue 9 months ago • 1 comment

the current DFT-based series decomposition implementation has performance gaps that can be addressed to improve training and inference speed.

current limitations

  1. naive implementation without batch optimization
  2. repeated calculations that could be cached
  3. ignores GPU memory layout optimization
  4. misses opportunities for parallel computation

proposed optimizations

1. batched operations

reimplement the DFT computation to operate on entire batches at once rather than processing each time series individually:

# Instead of:
for i in range(batch_size):
    xf = torch.fft.rfft(x[i])
    # ...processing...

# Use:
xf = torch.fft.rfft(x)  # processes entire batch at once
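a minimal sketch of the equivalence, assuming a (batch, time, channels) layout; the shapes and variable names here are illustrative and not taken from the repository. note that torch.fft.rfft transforms the last dimension by default, so the time axis is passed explicitly via dim:

```python
import torch

torch.manual_seed(0)
# Assumed layout: (batch, time, channels); the FFT runs along the time axis.
x = torch.randn(4, 96, 7)

# Per-sample loop: one rfft call per series in the batch.
looped = torch.stack([torch.fft.rfft(x[i], dim=0) for i in range(x.shape[0])])

# Batched: a single call with the time axis named explicitly.
batched = torch.fft.rfft(x, dim=1)

print(batched.shape)  # torch.Size([4, 49, 7])
# Compare real and imaginary parts of the two results.
print(torch.allclose(torch.view_as_real(looped),
                     torch.view_as_real(batched), atol=1e-4))
```

both paths produce the same spectrum; the batched call simply avoids the Python-level loop and the extra stack.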

2. pre-compute frequency indices

for stationary time series, the top-k frequencies often remain consistent. implement a caching mechanism to avoid recomputing these indices repeatedly:

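def forward(self, x):
    batch_size = x.shape[0]
    if self._cached_freq_indices is None or batch_size != self._cached_batch_size:
        # compute and cache indices (_compute_top_k_indices is a placeholder helper)
        self._cached_freq_indices = self._compute_top_k_indices(x)
        self._cached_batch_size = batch_size
    # use cached indices
    top_idx = self._cached_freq_indices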
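one possible shape for such a cache, as a hedged sketch: the class name CachedTopKDecomp, the reset_cache method, and the (batch, time, channels) layout are all assumptions made for illustration. unlike the snippet above, this variant shares a single set of frequency bins across the whole batch, so the cache does not depend on batch size. reusing indices is an approximation that only holds while the dominant frequencies stay stable:

```python
import math
import torch
from torch import nn


class CachedTopKDecomp(nn.Module):
    """Hypothetical sketch: reuse top-k frequency bins across forward calls."""

    def __init__(self, top_k: int = 5):
        super().__init__()
        self.top_k = top_k
        self._cached_freq_indices = None  # 1-D LongTensor of rfft bin indices

    def reset_cache(self):
        # Call when the data distribution (or sequence length) changes.
        self._cached_freq_indices = None

    def forward(self, x):
        # x: (batch, time, channels), an assumed layout.
        xf = torch.fft.rfft(x, dim=1)
        if self._cached_freq_indices is None:
            with torch.no_grad():
                # Mean magnitude per bin over batch and channels; skip the DC bin.
                mag = xf.abs().mean(dim=(0, 2))
                mag[0] = 0
                self._cached_freq_indices = torch.topk(mag, self.top_k).indices
        # Keep only the cached bins, zero the rest.
        mask = torch.zeros(xf.shape[1], dtype=torch.bool, device=xf.device)
        mask[self._cached_freq_indices] = True
        xf_kept = torch.where(mask.view(1, -1, 1), xf, torch.zeros_like(xf))
        x_season = torch.fft.irfft(xf_kept, n=x.shape[1], dim=1)
        return x_season, x - x_season


# Usage: a pure 4-cycle sinusoid, so bin 4 dominates the spectrum.
t = torch.arange(96, dtype=torch.float32)
x = torch.sin(2 * math.pi * 4 * t / 96).view(1, 96, 1).repeat(2, 1, 1)
decomp = CachedTopKDecomp(top_k=1)
x_season, x_trend = decomp(x)
print(decomp._cached_freq_indices.tolist())  # [4]
```

the rfft itself still runs on every call; only the magnitude reduction and top-k selection are skipped once the cache is filled.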

3. optimize memory operations

restructure memory operations to minimize data movement:

  • use in-place operations where possible
  • align memory access patterns
  • reduce intermediate tensor allocations
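as an illustration of the first and third points, the sketch below zeroes the discarded frequency bins in place with masked_fill_ instead of building a masked copy of the spectrum. the function name decomp_inplace and the (batch, time, channels) layout are assumptions, and the function is restricted to inference with torch.no_grad since in-place edits can interfere with autograd:

```python
import math
import torch


@torch.no_grad()
def decomp_inplace(x: torch.Tensor, top_k: int = 5):
    """Hypothetical inference-only sketch; x is (batch, time, channels)."""
    xf = torch.fft.rfft(x, dim=1)
    mag = xf.abs()
    mag[:, 0, :] = 0  # ignore the DC bin when ranking frequencies
    # k-th largest magnitude for each (batch, channel) series.
    thresh = torch.topk(mag, top_k, dim=1).values[:, -1:, :]
    # Zero the discarded bins in place instead of allocating a masked copy.
    xf.masked_fill_(mag < thresh, 0)
    x_season = torch.fft.irfft(xf, n=x.shape[1], dim=1)
    x_trend = x - x_season
    return x_season, x_trend


# Usage: a strong 3-cycle component plus a weak 10-cycle component.
t = torch.arange(96, dtype=torch.float32)
big = torch.sin(2 * math.pi * 3 * t / 96)
small = 0.1 * torch.sin(2 * math.pi * 10 * t / 96)
x = (big + small).view(1, 96, 1)
x_season, x_trend = decomp_inplace(x, top_k=1)
```

with top_k=1 the seasonal part recovers the strong component and the weak one lands in the trend; the input x itself is never modified.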

4. CUDA kernel implementation (optional)

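for maximum performance, implement a custom CUDA kernel for the specific DFT decomposition patterns used in TimeMixer. a lighter first step is to compile the decomposition with TorchScript; note that torch.jit.script compiles Python code and is not itself a custom CUDA kernel:

@torch.jit.script
def optimized_dft_decomp(x: torch.Tensor, top_k: int):
    # optimized implementation goes here; top_k needs the int annotation,
    # since TorchScript treats unannotated arguments as tensors
    return x  # placeholder body so the stub compiles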

expected benefits

  • 30-50% speedup in DFT decomposition operations
  • reduced memory footprint
  • improved training throughput
  • lower inference latency
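the figures above are expectations rather than measurements. a small timing harness like the following could be used to check them; it is a sketch with hypothetical helper names (time_fn, looped_rfft, batched_rfft) and arbitrary shapes, and it only times the rfft step rather than the full decomposition:

```python
import time
import torch


def time_fn(fn, x, repeats=20):
    """Return mean seconds per call (hypothetical helper)."""
    fn(x)  # warm-up call, excluded from timing
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


def looped_rfft(x):
    # One rfft call per series in the batch.
    return torch.stack([torch.fft.rfft(x[i], dim=0) for i in range(x.shape[0])])


def batched_rfft(x):
    # A single rfft call over the whole batch.
    return torch.fft.rfft(x, dim=1)


x = torch.randn(32, 96, 7)  # assumed (batch, time, channels) layout
t_loop = time_fn(looped_rfft, x)
t_batch = time_fn(batched_rfft, x)
print(f"looped: {t_loop * 1e3:.3f} ms, batched: {t_batch * 1e3:.3f} ms")
```

results depend on hardware, batch size, and sequence length, so any speedup should be reported together with those settings.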

affected code

the main class affected is DFT_series_decomp in models/TimeMixer.py.

S8XY, May 05 '25 16:05

Thank you very much for your suggestion. Indeed, the DFT module should be further improved. We have also received many recommendations from colleagues in the signal processing community who are very interested in applying Fourier transforms to time-series modeling. If possible, you’re welcome to submit a merge request — we’ll carefully review and fully take your feedback into account.

kevinliu2000, Oct 16 '25 14:10