HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving
Mesh Deformation

1The Hong Kong University of Science and Technology (Guangzhou), 2Tencent AI Lab, 3South China University of Technology, 4The Hong Kong University of Science and Technology
*Equal Contributions

Video Preview


Current text-to-avatar methods often rely on implicit representations, leading to 3D content that artists cannot easily edit and animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, which leverages locally learnable mesh deformation and 2D diffusion priors to achieve high-quality digital assets for attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which can better express identity and geometric details. We also employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated head avatars from multiple views without relying on any specific shape prior. Our framework can not only generate realistic shapes and textures that can be further edited via text, but also support seamless editing using the preserved attributes from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework can generate diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in 3D graphics software, facilitating downstream applications such as more efficient asset creation and animation with preserved attributes.

Method Overview

HeadEvolver deforms a template mesh by optimizing per-triangle vector fields guided by a text prompt. Rendered normal and RGB images, together with MediaPipe landmarks, are fed into a diffusion model to compute the respective losses. Our regularization of Jacobians controls the fidelity and semantics of facial features so that they conform to the text guidance.
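As a minimal, illustrative sketch (not the paper's implementation; function names are our own), a per-face Jacobian can be computed from a triangle's edge-vector frame and then modulated by an anisotropic diagonal scaling, echoing the learnable vector field described above:

```python
import numpy as np

def face_frame(v0, v1, v2):
    """3x3 frame of two edge vectors plus the unit face normal."""
    e1, e2 = v1 - v0, v2 - v0
    n = np.cross(e1, e2)
    n /= np.linalg.norm(n)
    return np.column_stack([e1, e2, n])

def face_jacobian(src_tri, dst_tri):
    """Per-face Jacobian J satisfying J @ frame(src) = frame(dst)."""
    Fs = face_frame(*src_tri)
    Fd = face_frame(*dst_tri)
    return Fd @ np.linalg.inv(Fs)

def apply_scaled_jacobian(tri, J, scale):
    """Modulate J by an anisotropic diagonal scaling and deform the
    triangle about its first vertex (a toy stand-in for the learnable
    per-face modulation; the paper instead solves a global Poisson
    system over all per-face Jacobians to recover vertex positions)."""
    S = np.diag(scale)
    v0 = tri[0]
    return np.array([v0 + S @ J @ (v - v0) for v in tri])
```

Here an identity deformation yields J = I, and the scale vector stretches the face anisotropically while leaving its local rotation untouched.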

Text to Head Avatar Generation

Compared with text-to-3D avatar generation methods (TADA, Fantasia3D, and TextDeformer), HeadEvolver excels at producing 3D head avatars with high-quality meshes.

Click and rotate with the mouse. Press G to toggle wireframe, R to reset view.
(From left to right: "Terracotta Army", "Monkey", "Superman", and "Kobe Bryant".)

Attribute Preserving

HeadEvolver preserves properties of the source template mesh, such as the rigging armature, UV mapping, facial landmarks, and 3DMM parameters.

Motion Retargeting

The generated head avatars can be animated and manipulated with the previously defined rigging armature and 3DMM parameters (e.g., shape and expression).
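Because the deformation keeps the template's topology, blendshape deltas defined on the template still apply to the generated avatar. A minimal sketch of linear blendshape animation (function names are illustrative, not from the paper's code):

```python
import numpy as np

def apply_blendshapes(base, deltas, weights):
    """Linear blendshape model: vertices = base + sum_i w_i * delta_i.
    `base` is an (N, 3) vertex array; each delta shares the same shape,
    which holds here because HeadEvolver preserves vertex correspondence
    with the template mesh."""
    out = base.copy()
    for w, d in zip(weights, deltas):
        out += w * d
    return out
```

The same retargeting logic applies to 3DMM expression parameters, which are linear offsets in the same vertex space.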

Texture Transfer

The generated texture maps can be seamlessly transferred to other head models.

Editing Support

HeadEvolver supports editing by text prompts and manipulation of the generated head avatars in graphics software.

Text-based Local Editing

The generated head avatars can be further edited locally through textual descriptions, in terms of both shape and appearance.

Editing in Blender

We show downstream applications using our semantics-preserving, high-quality avatars in 3D graphics software, such as creating morphing effects and adding accessories.

More Results

In this section, we present additional visual results demonstrating the capabilities of HeadEvolver discussed above.

Avatar Generation

Texture Transfer


Click and rotate with the mouse. Press G to toggle wireframe, R to reset view.
(From left to right: "Cate Blanchett", "Stephen Curry", "Keira Knightley", "Kit Harington", "Donald Trump", "Vincent van Gogh", "Hulk", "Pinocchio", and "Anne Hathaway".)

License & Disclaimer

This is not an official product of Tencent.

  • Please carefully read and comply with the open-source license applicable to the data before using it.
  • Please carefully read and comply with the intellectual property declaration applicable to this project before using it.
  • The pipeline of this project runs completely offline and does not collect any personal information or other data. If you use this pipeline to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (including but not limited to publishing privacy policies, adopting necessary data security strategies, and signing data collection agreements). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent, HKUST(GZ) or HKUST.
  • Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
  • The pipeline of this project does not have the ability to directly provide services to end-users. If you need to use the pipeline and future released code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent, HKUST(GZ) or HKUST.
  • It is prohibited to use this project or future released open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including but not limited to providing incorrect or false information, spreading pornographic, terrorist, and violent information). Otherwise, you may be liable for legal responsibilities.
  • We emphasize that we neither used extra training data nor fine-tuned Stable Diffusion, so our pipeline strictly follows Stable Diffusion's use license (CreativeML Open RAIL-M). All demo images and videos are generated by Stable Diffusion; any commercial use requires formal permission under the Stable Diffusion license.


        @article{wang2024headevolver,
          title={HeadEvolver: Text to Head Avatars via Locally Learnable Mesh Deformation},
          author={Wang, Duotun and Meng, Hengyu and Cai, Zeyu and Shao, Zhijing and Liu, Qianxi
                  and Wang, Lin and Fan, Mingming and Zhan, Xiaohang and Wang, Zeyu},
          journal={arXiv preprint arXiv:2403.09326},
          year={2024}
        }