We present HeadEvolver, a novel framework for generating stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To address the lack of fine-grained, semantics-aware local shape control in global Jacobian-based deformation, we introduce a trainable weighting factor for the Jacobian at each triangle that adaptively changes local shapes while maintaining global correspondences and facial features. Moreover, to ensure the coherence of the resulting shape and appearance across viewpoints, we use pretrained image diffusion models for differentiable rendering, with regularization terms that refine the deformation under text guidance. Extensive experiments demonstrate that our method generates diverse head avatars with articulated meshes that can be edited seamlessly in 3D graphics software, facilitating downstream applications such as more efficient animation with inherited blend shapes and semantic consistency.
HeadEvolver deforms a template mesh by optimizing per-triangle weighted Jacobians guided by a text prompt. Rendered normal and RGB images are fed into a diffusion model to compute the respective losses. Our regularization of the weighted Jacobians controls the fidelity and semantics of facial features so that they conform to the text guidance.
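The per-triangle weighting idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the blend J_w = w·J + (1 − w)·I and the identity regularizer are simplified assumptions standing in for the learnable weighting factor and the regularization described above.

```python
import numpy as np

def weighted_jacobians(jacobians, weights):
    """Blend each per-triangle 3x3 Jacobian toward identity by a scalar
    weight (hypothetical formulation: J_w = w * J + (1 - w) * I).

    jacobians: (T, 3, 3) array of per-triangle Jacobians
    weights:   (T,) array of per-triangle weighting factors in [0, 1]
    """
    I = np.eye(3)
    w = weights[:, None, None]            # (T, 1, 1), broadcasts over 3x3
    return w * jacobians + (1.0 - w) * I

def jacobian_regularizer(jacobians):
    """Identity regularizer: mean squared Frobenius distance ||J - I||_F^2,
    discouraging deformations that destroy global correspondences."""
    diff = jacobians - np.eye(3)
    return float(np.mean(np.sum(diff * diff, axis=(1, 2))))

# Toy example: one strongly scaled triangle, one rigid triangle.
J = np.stack([2.0 * np.eye(3), np.eye(3)])   # (2, 3, 3)
w = np.array([0.5, 1.0])
Jw = weighted_jacobians(J, w)                 # first triangle damped to 1.5*I
reg = jacobian_regularizer(Jw)
```

In an actual pipeline, both the Jacobians and the weights would be optimization variables, and the regularizer would be added to the diffusion-guided rendering losses; here the functions only illustrate how the weighting localizes and tempers the deformation.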
Compared with text-to-3D avatar generation methods (TADA, Fantasia3D, and TextDeformer), HeadEvolver excels at producing 3D head avatars with high-quality meshes.
HeadEvolver supports attribute inheritance, preserving properties of the source template mesh such as the rigging armature, UV mapping, facial landmarks, and 3DMM parameters.
The generated head avatars can be animated and manipulated with the previously defined rigging armature and 3DMM parameters (e.g., shape and expression).
The generated texture maps can be seamlessly transferred to other head models.
HeadEvolver supports editing via text prompts as well as manipulation of the generated head avatars in graphics software.
The generated head avatars can be further edited locally through textual descriptions, in terms of both shape and appearance.
We show downstream applications of our semantic-preserving, high-quality avatars in 3D graphics software, such as creating morphing effects and adding accessories.
In this section, we present a few more visual results demonstrating the capabilities of HeadEvolver discussed above.
This is not an official product of Tencent.
We highlight that we did not use extra training data or fine-tune Stable Diffusion, so our pipeline strictly follows the Stable Diffusion use license (CreativeML OpenRAIL-M). All demo images and videos are generated with Stable Diffusion; any commercial use should obtain formal permission under the Stable Diffusion license.
@article{wang2024headevolver,
title={HeadEvolver: Text to Head Avatars via Locally Learnable Mesh Deformation},
author={Wang, Duotun and Meng, Hengyu and Cai, Zeyu and Shao, Zhijing and Liu, Qianxi
and Wang, Lin and Fan, Mingming and Shan, Ying and Zhan, Xiaohang and Wang, Zeyu},
journal={arXiv preprint arXiv:2403.09326},
year={2024}
}