MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

Ling-Hao Chen1, 2, Wenxun Dai1, Xuan Ju3, Shunlin Lu4, Lei Zhang2🤗

🤗Corresponding author.
1Tsinghua University, 2International Digital Economy Academy (IDEA),
3The Chinese University of Hong Kong, 4The Chinese University of Hong Kong, Shenzhen

📖 Abstract

This research delves into the problem of interactive editing of human motion generation. Previous motion diffusion models lack explicit modeling of word-level text-motion correspondence and offer limited explainability, which restricts their fine-grained editing ability. To address this issue, we propose an attention-based motion diffusion model, namely MotionCLR (/ˈmoʊʃn klɪr/), with CLeaR modeling of attention mechanisms. Technically, MotionCLR models in-modality and cross-modality interactions with self-attention and cross-attention, respectively. More specifically, the self-attention mechanism measures the sequential similarity between frames and impacts the order of motion features. By contrast, the cross-attention mechanism finds the fine-grained word-sequence correspondence and activates the corresponding timesteps in the motion sequence. Based on these key properties, we develop a versatile set of simple yet effective motion editing methods by manipulating attention maps, such as motion (de-)emphasizing, in-place motion replacement, and example-based motion generation. To further verify the explainability of the attention mechanism, we additionally explore the potential of action counting and grounded motion generation via attention maps. Our experimental results show that our method achieves strong generation and editing ability with good explainability.
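To make the abstract's core idea concrete, below is a minimal PyTorch sketch of the cross-attention (de-)emphasizing edit: scaling a single word's attention column and renormalizing, so that word activates its corresponding frames more or less strongly. This is an illustrative sketch only; the function and argument names (`cross_attention_edit`, `word_idx`, `scale`) are assumptions and not the paper's released API.

```python
import torch

def cross_attention_edit(motion_feats, text_feats, word_idx=None, scale=None):
    """Sketch of cross-attention with a training-free (de-)emphasis edit.

    motion_feats: (frames, dim) per-frame motion features (queries).
    text_feats:   (words, dim)  per-word text features (keys/values).
    word_idx:     index of the word whose attention column is rescaled.
    scale:        > 1 emphasizes the word, < 1 de-emphasizes it.
    """
    d = motion_feats.shape[-1]
    # Standard scaled dot-product attention over words, per frame.
    attn = torch.softmax(motion_feats @ text_feats.T / d ** 0.5, dim=-1)

    if word_idx is not None and scale is not None:
        attn = attn.clone()
        attn[:, word_idx] *= scale                     # rescale one word's column
        attn = attn / attn.sum(dim=-1, keepdim=True)   # renormalize each row

    return attn @ text_feats  # attended text features per frame
```

Because the edit only rescales and renormalizes an existing attention map at inference time, no retraining or fine-tuning is involved, which is what makes the editing training-free.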

Demo Video

MotionCLR Interactive Demo Video

🤗 MotionCLR v1-preview Demo

📖 Introducing MotionCLR

In-place Motion Replacement

Motion (De-)emphasizing

Motion Erasing

Motion Shifting

Motion Style Transfer
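The editing operations above all manipulate attention maps at inference time rather than retraining the model. As a hedged sketch of the shared mechanism, the snippet below records a reference motion's self-attention map and injects it when generating a new motion, either wholly (example-based generation) or over a frame range (in-place replacement). The names `self_attention_inject`, `ref_attn`, and `frame_range` are illustrative assumptions, not the released code.

```python
import torch

def self_attention_inject(x, ref_attn=None, frame_range=None):
    """Self-attention over a motion sequence with optional map injection.

    x:           (frames, dim) motion features.
    ref_attn:    optional (frames, frames) attention map recorded from a
                 reference motion; injecting it wholly transfers the
                 reference's frame ordering (example-based generation).
    frame_range: optional (start, end); inject only on these query rows,
                 mimicking an in-place edit that leaves other frames intact.
    """
    d = x.shape[-1]
    attn = torch.softmax(x @ x.T / d ** 0.5, dim=-1)  # (frames, frames)

    if ref_attn is not None:
        if frame_range is None:
            attn = ref_attn                # full injection
        else:
            s, e = frame_range
            attn = attn.clone()
            attn[s:e] = ref_attn[s:e]      # partial, in-place injection

    return attn @ x  # re-aggregated motion features
```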

🌹 Acknowledgement

Citation

@article{motionclr,
  title={MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms},
  author={Chen, Ling-Hao and Dai, Wenxun and Ju, Xuan and Lu, Shunlin and Zhang, Lei},
  journal={arXiv preprint arXiv:2410.18977},
  year={2024}
}

The website template was adapted from the HumanMAC project.