Publications

A collection of my research publications and academic papers.

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen

Preprint, April 2026

TriAttention proposes a novel KV cache compression approach for long reasoning in LLMs. It leverages trigonometric series based on fixed centers in pre-RoPE space to score key importance, achieving 2.5x higher throughput or 10.7x KV memory reduction while matching Full Attention reasoning accuracy.

PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

Xiaolong Li*, Youping Gu*, Xi Lin*, Weijie Wang, Bohan Zhuang

* Equal contribution

Preprint, December 2025

PSA introduces an efficient attention mechanism to accelerate video understanding and generation. It uses a multi-level sparse attention strategy that mitigates information loss while preserving computational efficiency under a low compute budget.