Stylization and style transfer are fundamental tasks in image editing, particularly in professional illustration creation: they transform the visual style of an image while preserving its core content. Text-to-image (T2I) generative models have proven successful at creating visually stunning images from textual descriptions, and recent advances in diffusion models have opened the door to personalized styling in image generation.
This project aims to provide a unified codebase to evaluate training-free stylization models. Various models from the literature were evaluated on Style-Rank, an evaluation dataset of images that we compiled from the most popular stylization papers. On top of evaluating the different models, we also propose Inversion-InstantStyle, a small improvement over InstantStyle that computes a starting latent with DDIM Inversion and adds noise to it. See the Inversion-InstantStyle demo and the method's technical diagram below.
This project makes it possible to benchmark several training-free stylization methods on the aggregated Style-Rank dataset and to compute the corresponding quantitative metrics.
We provide Style-Rank, an evaluation dataset of images that we compiled using reference images from the most popular stylization papers. Our codebase can also be used with your own dataset to evaluate the models on more specific use cases, such as enterprise applications. Note that the corresponding original licenses still apply to each image in this dataset. The evaluated methods, together with the implementations we rely on, are listed below:
Model | Arxiv | Code | Project Page | Implementation |
---|---|---|---|---|
StyleAligned | Arxiv | Code | Project Page | Official |
VisualStyle | Arxiv | Code | Project Page | Official |
IP-Adapter | Arxiv | Code | Project Page | Diffusers |
InstantStyle | Arxiv | Code | Project Page | Diffusers |
CSGO | Arxiv | Code | Project Page | Official |
Style-Shot | Arxiv | Code | Project Page | Official |
On top of the above-mentioned open-source methods, we also provide a new method, Inversion-InstantStyle, that simply combines DDIM Inversion, renoising and InstantStyle. In more detail, the reference style image is first inverted with DDIM Inversion to obtain a starting latent, noise is then added to this latent, and generation is finally run with InstantStyle conditioning, as sketched below.
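As a rough illustration, the sketch below shows how such a pipeline can be assembled on top of diffusers. The SDXL checkpoint, IP-Adapter weights, prompt and noise level are illustrative assumptions, and `ddim_invert` is a hypothetical helper standing in for the DDIM Inversion loop; this is not the exact implementation shipped in this repository.

```python
# Minimal sketch of the Inversion-InstantStyle idea, assuming SDXL + IP-Adapter
# (InstantStyle per-block scales) from diffusers and a hypothetical `ddim_invert` helper.
import torch
from diffusers import DDIMScheduler, StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# InstantStyle: apply the IP-Adapter only to the style-sensitive attention blocks.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale({"up": {"block_0": [0.0, 1.0, 0.0]}})

style_image = load_image("reference_style.png")  # placeholder path

# 1) DDIM Inversion: map the style image back to a starting latent.
#    `ddim_invert` is hypothetical; the inversion loop is not a diffusers one-liner.
inverted_latents = ddim_invert(pipe, style_image, num_steps=50)

# 2) Renoising: add fresh Gaussian noise to the inverted latent so the prompt can impose
#    new content (the noise level chosen here is an assumption, not a fixed recipe).
pipe.scheduler.set_timesteps(50)
noisy_latents = pipe.scheduler.add_noise(
    inverted_latents, torch.randn_like(inverted_latents), pipe.scheduler.timesteps[:1]
)

# 3) Denoise from the renoised latent with InstantStyle conditioning on the style image.
image = pipe(
    prompt="a robot reading a book",  # example prompt only
    ip_adapter_image=style_image,
    latents=noisy_latents,
    num_inference_steps=50,
).images[0]
image.save("stylized.png")
```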
The following metrics are computed to assess the quality of the models:

- **ImageReward** - reward score between the prompt and the generated image, computed with the ImageReward model
- **Clip-Text** - similarity between the prompt (embedded using `ClipTextModel`) and the generated image (embedded using `ClipVisionModel`) - using the implementation from Transformers
- **Clip-Image** - similarity between the reference image and the generated image (both embedded using `ClipVisionModel`) - using the implementation from Transformers
- **Dinov2** - similarity between the reference image and the generated image (both embedded using `Dinov2Model`) - using the implementation from Dino
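For reference, the snippet below sketches how the cosine-similarity metrics can be computed with the Transformers library. The checkpoints, file names and the 0-100 scaling are assumptions and may differ from the exact settings used in the benchmark; the ImageReward score comes from a dedicated learned reward model and is not reproduced here.

```python
# Sketch of the Clip-Text, Clip-Image and Dinov2 metrics with Transformers.
# Checkpoints, file names and the 0-100 scaling are assumptions, not the benchmark's settings.
import torch
from PIL import Image
from transformers import AutoImageProcessor, CLIPModel, CLIPProcessor, Dinov2Model

prompt = "a robot reading a book"                              # example evaluation prompt
generated = Image.open("generated.png").convert("RGB")         # image produced by a model
reference = Image.open("reference_style.png").convert("RGB")   # style reference image

# CLIP embeddings for the prompt and both images.
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
with torch.no_grad():
    text_emb = clip.get_text_features(**clip_proc(text=[prompt], return_tensors="pt", padding=True))
    gen_emb = clip.get_image_features(**clip_proc(images=generated, return_tensors="pt"))
    ref_emb = clip.get_image_features(**clip_proc(images=reference, return_tensors="pt"))

# Clip-Text: prompt vs. generated image; Clip-Image: reference vs. generated image.
clip_text = 100 * torch.cosine_similarity(text_emb, gen_emb).item()
clip_image = 100 * torch.cosine_similarity(ref_emb, gen_emb).item()

# Dinov2: cosine similarity between the CLS embeddings of the reference and generated images.
dino = Dinov2Model.from_pretrained("facebook/dinov2-base")
dino_proc = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
with torch.no_grad():
    gen_cls = dino(**dino_proc(images=generated, return_tensors="pt")).last_hidden_state[:, 0]
    ref_cls = dino(**dino_proc(images=reference, return_tensors="pt")).last_hidden_state[:, 0]
dinov2 = 100 * torch.cosine_similarity(ref_cls, gen_cls).item()

print(f"Clip-Text: {clip_text:.2f} | Clip-Image: {clip_image:.2f} | Dinov2: {dinov2:.2f}")
```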
Following the instructions provided in our benchmark codebase, we compiled the results of the different methods listed above (including Inversion-InstantStyle) on the provided dataset.
Model | ImageReward ↑ | Clip-Text ↑ | Clip-Image ↑ | Dinov2 ↑ |
---|---|---|---|---|
StyleAligned | -1.26 | 19.26 | 68.72 | 36.29 |
VisualStyle | -0.72 | 22.12 | 66.68 | 20.80 |
IP-Adapter | -2.03 | 15.01 | 83.66 | 40.50 |
Style-Shot | -0.38 | 21.34 | 65.04 | 23.04 |
CSGO | -0.29 | 22.16 | 61.73 | 16.85 |
InstantStyle | -0.13 | 22.78 | 66.43 | 18.48 |
Inversion-InstantStyle | -1.30 | 18.90 | 76.60 | 49.42 |
The results are in line with those reported in the respective stylization papers. Note that the metrics can fluctuate depending on the prompts and seeds chosen for evaluation.
@misc{benaroche2024stylerank, title={Style-Rank: Benchmarking stylization for diffusion models}, author={Eyal Benaroche and Clement Chadebec and Onur Tasar and Benjamin Aubin}, year={2024} }