Can Pose Transfer Models Generate Realistic Human Motion?

Vaclav Knapp¹ and Maty Bohacek²

¹SSPS
²Stanford University

Recent pose-transfer methods aim to generate temporally consistent and fully controllable videos of human actions, reenacting the motion from a reference video with a new identity. However, evaluating the perceptual quality and action consistency of these methods, especially on identities and motions outside the training distribution, remains challenging. To address this, we introduce PoseTransfer-HumanEval, a benchmarking framework that evaluates video pose-transfer methods by generating videos outside the training distribution and conducting a participant study on their quality. We demonstrate it on three state-of-the-art methods: AnimateAnyone, MagicAnimate, and ExAvatar. We find that participants, presented with the pose-transferred videos, correctly identify the intended action only 42.92% of the time. Overall, participants find the actions in the generated videos consistent with the reference (source) videos only 36.46% of the time.


IEEE FG-W 2025

Introduction

Pose transfer methods generate videos in which a new human identity mimics the actions from a reference video. These techniques have potential applications in areas like animation, healthcare, and fashion, offering new ways to animate humans, assist therapy, or visualize clothing. At their core, pose transfer models work by separating motion from identity and combining them in a believable way.
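
To make this decomposition concrete, below is a minimal sketch of the typical two-stage interface, assuming a pose estimator that strips appearance and a generator conditioned on identity. Every name in it (PoseSequence, extract_motion, pose_transfer, and so on) is a hypothetical placeholder, not the API of any of the methods evaluated here.

from dataclasses import dataclass
from typing import List

@dataclass
class PoseSequence:
    """Per-frame motion representation (e.g., skeleton keypoints or a dense pose map)."""
    frames: List[list]

@dataclass
class IdentityEmbedding:
    """Appearance features of the person who should appear in the output."""
    features: list

def extract_motion(driving_video: List[list]) -> PoseSequence:
    # Stage 1: keep only the motion from the driving video, discarding appearance.
    return PoseSequence(frames=[list(f) for f in driving_video])

def encode_identity(reference_image: list) -> IdentityEmbedding:
    # Encode who should move, independently of how they move.
    return IdentityEmbedding(features=list(reference_image))

def pose_transfer(driving_video: List[list], reference_image: list) -> List[list]:
    # Stage 2: recombine -- render the target identity following the driving motion.
    motion = extract_motion(driving_video)
    identity = encode_identity(reference_image)
    return [identity.features + frame for frame in motion.frames]

Real systems implement extract_motion with a keypoint or dense-pose estimator and pose_transfer with a learned image or video generator; the list arithmetic here only stands in for that rendering step.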

While state-of-the-art methods such as AnimateAnyone, MagicAnimate, and ExAvatar achieve impressive quantitative benchmark results, their real-world adoption remains limited. This is largely due to the fact that current evaluation relies on quantitative metrics that may not capture how humans actually perceive these generated videos.

[Figure: A driving pose video and a target identity, alongside the corresponding outputs from AnimateAnyone, MagicAnimate, and ExAvatar.]

To address this gap, we introduce PoseTransfer-HumanEval (PT-HE), a human-centered benchmarking framework. It consists of three complementary participant survey tasks that assess generated videos from distinct angles: (1) whether viewers can recognize the action, (2) whether the motion is consistent with a source video, and (3) how the visuals hold up qualitatively. Each dimension captures an essential part of what makes pose transfer effective in practice.

1/3 Semantics

Can viewers recognize what action is being performed in the generated video? In this task, participants were shown pose-transferred videos and asked to select one of 20 possible action labels from UCF101.

2/3 Consistency

To measure temporal and motion consistency, participants were shown a generated video alongside its source and asked to judge whether the actions match.

3/3 Qualitative

Beyond metrics, we asked participants to describe what they saw: Was the person photorealistic? Was the identity stable? Were there any artifacts?
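
As a concrete illustration of how the two quantitative tasks can be scored, here is a minimal Python sketch. The response format (dicts with chosen_label, true_label, and verdict fields) is a hypothetical stand-in for however the survey data is actually stored.

from collections import Counter
from typing import Dict, List

def recognition_accuracy(responses: List[Dict]) -> float:
    # Task 1/3: share of participants who picked the intended UCF101 label.
    # With 20 candidate labels, random chance is 1/20 = 5%.
    correct = sum(r["chosen_label"] == r["true_label"] for r in responses)
    return 100.0 * correct / len(responses)

def consistency_breakdown(responses: List[Dict]) -> Dict[str, float]:
    # Task 2/3: distribution of consistency verdicts over generated/source pairs.
    counts = Counter(r["verdict"] for r in responses)
    return {v: 100.0 * n / len(responses) for v, n in counts.items()}

# Toy example: two responses, one correct -> prints 50.0
print(recognition_accuracy([
    {"chosen_label": "JumpRope", "true_label": "JumpRope"},
    {"chosen_label": "Lunges", "true_label": "JumpRope"},
]))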

Demonstration

We demonstrate PT-HE on three prominent pose transfer methods: AnimateAnyone, MagicAnimate, and ExAvatar. In the Semantics task (1/3), we find that most videos are not recognized as the intended action, with ExAvatar outperforming AnimateAnyone and MagicAnimate:

Method           Recognition Accuracy (20 classes)
AnimateAnyone    47.50%
MagicAnimate     13.12%
ExAvatar         68.12%
Random chance     5.00%

In the Consistency task (2/3), we find that, while ExAvatar has the highest rate of videos rated fully consistent among the tested models, roughly 38-47% of videos are rated inconsistent across all three:

Method           Consistent    Partially Consistent    Inconsistent
AnimateAnyone    27.50%        25.62%                  46.88%
MagicAnimate     26.88%        31.25%                  41.88%
ExAvatar         55.00%        18.12%                  38.54%
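
For reference, the headline figures in the abstract appear to be the unweighted means of the per-method results above; this is our reading of the tables, not something the page states explicitly:

$$\tfrac{1}{3}(47.50 + 13.12 + 68.12) \approx 42.91\%, \qquad \tfrac{1}{3}(27.50 + 26.88 + 55.00) = 36.46\%$$

The small mismatch with the abstract's 42.92% presumably comes from averaging raw response counts rather than rounded per-method percentages.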

Conclusion

Despite strong benchmark performance, current pose transfer models often fail to generate motion that is both semantically clear and visually consistent. Our human evaluation reveals significant gaps between quantitative metrics and real-world perception, especially for out-of-distribution identities and actions, and equips future work in this area with a standardized human-evaluation procedure.

Citation

@article{knapp2025can,
  title={Can Pose Transfer Models Generate Realistic Human Motion?},
  author={Knapp, Vaclav and Bohacek, Matyas},
  journal={arXiv preprint arXiv:2501.15648},
  year={2025}
}