HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

1The Hong Kong University of Science and Technology (Guangzhou), 2Tencent AI Lab, 3The Hong Kong University of Science and Technology
*Equal Contributions


Abstract

Current text-to-avatar methods often rely on implicit representations, producing 3D content that artists cannot easily edit or animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, leveraging locally learnable mesh deformation and 2D diffusion priors to produce high-quality digital assets that support attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which better expresses identity and geometric details. We also employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated head avatars across multiple views without relying on any specific shape prior. Our framework not only generates realistic shapes and textures that can be further edited via text, but also supports seamless editing using attributes preserved from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework generates diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in 3D graphics software, facilitating downstream applications such as more efficient asset creation and animation with preserved attributes.
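How a learnable vector field can scale a face anisotropically without disturbing its rotation is easiest to see on the Jacobians themselves. The minimal PyTorch sketch below illustrates one way to do this via the singular values of each per-face Jacobian; it is not the paper's exact parameterization, and the names (modulate_jacobians, log_s) are ours, not from the released code:

```python
import torch

def modulate_jacobians(J, log_s):
    """Anisotropically rescale per-face Jacobians while preserving rotation.

    J:     (F, 3, 3) per-face Jacobians of the template-to-target map.
    log_s: (F, 3) learnable per-face log-scale vector field.
    """
    U, sigma, Vt = torch.linalg.svd(J)        # J = U diag(sigma) V^T
    s = torch.exp(log_s)                      # positive anisotropic scales
    # Rescaling the singular values leaves the rotational part R = U V^T
    # untouched, so only the local stretch of each face changes.
    return U @ torch.diag_embed(sigma * s) @ Vt

# Toy usage: identity Jacobians and a unit-scale (zero-log) field.
F = 1000
J = torch.eye(3).expand(F, 3, 3).clone()
log_s = torch.zeros(F, 3, requires_grad=True)  # optimized during generation
J_mod = modulate_jacobians(J, log_s)
```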

Method Overview

HeadEvolver deforms a template mesh by optimizing per-triangle vector fields guided by a text prompt. Rendered normal and RGB images, together with MediaPipe landmarks, are fed into a diffusion model to compute the respective losses. Regularizing the Jacobians controls the fidelity and semantics of facial features so that they conform to the text guidance.
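To make the objective concrete, here is a hedged sketch of the two regularization terms around the diffusion guidance; the SDS-style term is a placeholder, the weights are illustrative, and none of the names are taken from the authors' implementation:

```python
import torch

def jacobian_regularization(J):
    """Penalize deviation of per-face Jacobians from the identity map,
    keeping the deformation close to the template."""
    I = torch.eye(3, device=J.device).expand_as(J)
    return ((J - I) ** 2).sum(dim=(1, 2)).mean()

def landmark_loss(pred_lm, tgt_lm):
    """L2 distance between landmarks detected on the rendering and the
    landmarks of the guided view (MediaPipe Face Mesh yields 468 points)."""
    return ((pred_lm - tgt_lm) ** 2).sum(dim=-1).mean()

def total_loss(sds_loss, pred_lm, tgt_lm, J, w_lm=0.1, w_reg=1e-3):
    """Hypothetical overall objective: diffusion (SDS-style) guidance on
    rendered normal/RGB images plus landmark and Jacobian regularization."""
    return (sds_loss
            + w_lm * landmark_loss(pred_lm, tgt_lm)
            + w_reg * jacobian_regularization(J))

# Dummy call with stand-in tensors; sds_loss would come from the 2D prior.
loss = total_loss(torch.tensor(0.0),
                  torch.rand(468, 2), torch.rand(468, 2),
                  torch.eye(3).expand(500, 3, 3))
```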

Text to Head Avatar Generation

Mouse drag to rotate; mouse scroll to zoom. Press G to toggle the wireframe, R to reset the view.
(From left to right: "Terracotta Army", "Monkey", "Superman", "Kobe Bryant", "Vincent van Gogh", and "Donald Trump".)

Ours


HumanNorm


TADA


TextDeformer


Attribute Preservation

Our framework preserves the properties of the source template mesh, such as the rigging armature, UV mapping, facial landmarks, and 3DMM parameters.

Motion Retargeting

The generated head avatars can be animated and manipulated with the rigging armature and 3DMM parameters (e.g., shape and expression) defined on the template.
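Because the deformation preserves the template's topology and vertex correspondence, blendshape deltas authored on the template retarget to the generated avatar directly. A toy NumPy sketch with hypothetical, randomly filled arrays:

```python
import numpy as np

V = 5023                                  # e.g., a FLAME-like vertex count
template = np.random.rand(V, 3)           # template vertices
blendshapes = np.random.rand(10, V, 3)    # expression targets on the template
avatar = template + 0.05 * np.random.rand(V, 3)  # stands in for our result

def animate(avatar, template, blendshapes, weights):
    """Apply template-space blendshape deltas to the generated avatar.
    Shared vertex order makes the per-vertex deltas transfer as-is."""
    deltas = blendshapes - template[None]            # (K, V, 3)
    return avatar + np.tensordot(weights, deltas, axes=1)

animated = animate(avatar, template, blendshapes,
                   weights=np.array([0.8] + [0.0] * 9))
```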

Texture Transfer

The generated texture maps can be seamlessly transferred to other head models that share the template's UV layout.
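Since the generated texture uses the template's UV layout, applying it to another head model with the same layout is just an image swap. A sketch using trimesh and Pillow, with hypothetical file names:

```python
import trimesh
from PIL import Image

# Hypothetical assets: our generated texture and another head mesh whose
# UV parameterization matches the template's.
texture = Image.open("generated_texture.png")
mesh = trimesh.load("other_head.obj", process=False)

# UVs are inherited from the template, so swapping the image suffices.
mesh.visual = trimesh.visual.TextureVisuals(uv=mesh.visual.uv, image=texture)
mesh.export("other_head_retextured.obj")
```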

Editing Support

Our method supports editing via text prompts as well as direct manipulation of the generated head avatars in graphics software.

Text-based Local Editing

The generated head avatars can be further edited locally through textual descriptions, in terms of both shape and appearance.

Editing in Blender

We show downstream applications of our semantics-preserving, high-quality avatars in 3D graphics software, such as creating morphing effects and adding accessories.
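For instance, because every generated avatar keeps the template's vertex order, a morphing effect between two avatars reduces to a per-vertex linear blend (a toy sketch with hypothetical vertex arrays):

```python
import numpy as np

def morph(verts_a, verts_b, t):
    """Linearly blend two avatars that share the template's topology."""
    return (1.0 - t) * verts_a + t * verts_b

# Keyframing t from 0 to 1 produces the morphing animation.
verts_a, verts_b = np.random.rand(5023, 3), np.random.rand(5023, 3)
halfway = morph(verts_a, verts_b, 0.5)
```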