Abstract

Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.
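Below is a minimal sketch of the core idea described above, not the authors' code: an editing direction is computed in CLIP image embedding space from a pair of text prompts and then projected onto an identity-preserving subspace before being used to shift the conditioning embedding. The texture prior `prior_sample` and the projection matrix `P` are hypothetical placeholders standing in for the paper's learned components.

```python
# Minimal sketch (not the authors' implementation).
# Assumptions:
#   - `prior_sample(prompt, n)` is a hypothetical texture prior that maps a
#     text prompt to n sampled CLIP *image* embeddings, shape (n, d).
#   - `P` is a precomputed (d, d) projection matrix onto a CLIP subspace
#     that minimizes identity variations.
import torch

def edit_direction(prior_sample, P, src_prompt, dst_prompt, n_samples=64):
    # Sample CLIP image embeddings for the source and target prompts,
    # e.g. "aged wood" -> "new wood".
    src = prior_sample(src_prompt, n_samples)   # (n_samples, d)
    dst = prior_sample(dst_prompt, n_samples)   # (n_samples, d)
    # Average the per-sample differences to get a direction in CLIP space.
    direction = (dst - src).mean(dim=0)
    # Project onto the identity-preserving subspace and normalize.
    direction = P @ direction
    return direction / direction.norm()

def apply_slider(image_embedding, direction, strength):
    # Shift the texture's CLIP image embedding along the editing direction;
    # `strength` is the slider value (e.g. in [-1, 1]).
    edited = image_embedding + strength * direction
    # Renormalize, since CLIP-conditioned diffusion models typically expect
    # unit-norm embeddings.
    return edited / edited.norm()
```

The shifted embedding would then replace the original CLIP image embedding when conditioning the diffusion model, so a single scalar slider controls the strength of the edit.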

Paper

Paper (authors' version): PDF
Supplemental Material: PDF

Results


We show some interactive results here. See the supplemental results on our full test dataset here.


Bibtex

@article{guerrero2024texsliders,
  title={TexSliders: Diffusion-Based Texture Editing in CLIP Space},
  author={Guerrero-Viu, Julia and Hasan, Milos and Roullier, Arthur and Harikumar, Midhun and Hu, Yiwei and Guerrero, Paul and Gutierrez, Diego and Masia, Belen and Deschaintre, Valentin},
  journal={arXiv preprint arXiv:2405.00672},
  year={2024}
}

Acknowledgments

This work has been partially supported by grant PID2022-141539NB-I00, funded by MICIU/AEI/10.13039/501100011033 and by ERDF, EU, and by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 956585 (PRIME). Julia Guerrero-Viu developed part of this work during an Adobe internship and was also partially supported by the FPU20/02340 predoctoral grant. We thank Daniel Martin and Sergio Izquierdo for insightful discussions and for their help preparing the final figures, Ajinkya Kale for insightful discussions, and the team that developed the internal backbone diffusion model.