[Teaser image]

Abstract

Stylization and style transfer are fundamental tasks in image editing, particularly in professional illustration creation. These techniques transform the visual style of an image while preserving its core content. Text-to-image (T2I) generative models have been successful in creating visually stunning images from textual descriptions, and recent advances in diffusion models have opened the door to personalized styling in image generation.

This project aims to provide a unified codebase for evaluating training-free stylization models. Various models from the literature were evaluated on Style-Rank, an evaluation dataset of images that we compiled from the most popular stylization papers. On top of evaluating the different models, we also propose Inversion-InstantStyle, a small improvement over InstantStyle that computes a starting latent using DDIM inversion and adds noise to it. See the Inversion-InstantStyle demo and the method's technical diagram below.

Evaluation

This project allows benchmarking several training-free stylization methods on the aggregated Style-Rank dataset and computing the corresponding quantitative metrics.


Dataset

We provide Style-Rank, an evaluation dataset of images that we compiled using reference images from popular stylization papers.

Our codebase can also be used with your own dataset to evaluate the models on more specific use cases, such as enterprise data. Note that the original licenses still apply to each image in this dataset.

Training-free methods

We provide wrappers for the following training-free stylization methods, based on either the official implementations or the Diffusers implementations (a minimal usage sketch of the Diffusers InstantStyle implementation follows the table):
| Model | Arxiv | Code | Project Page | Implementation |
| --- | --- | --- | --- | --- |
| StyleAligned | Arxiv | Code | Project Page | Official |
| VisualStyle | Arxiv | Code | Project Page | Official |
| IP-Adapter | Arxiv | Code | Project Page | Diffusers |
| InstantStyle | Arxiv | Code | Project Page | Diffusers |
| CSGO | Arxiv | Code | Project Page | Official |
| Style-Shot | Arxiv | Code | Project Page | Official |
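
As a concrete example of the Diffusers route, InstantStyle boils down to applying an IP-Adapter only to the style-relevant attention blocks via per-block scales. The sketch below follows the Diffusers documentation for SDXL and the public h94/IP-Adapter weights; it illustrates the general pattern, not the exact API of our wrappers.

```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

# Load an SDXL pipeline (the backbone choice is an assumption for illustration).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Attach the IP-Adapter image encoder and projection weights.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)

# InstantStyle: inject the style image only into the style-sensitive attention
# layers of the first up block; all other layers keep a scale of 0.
pipe.set_ip_adapter_scale({"up": {"block_0": [0.0, 1.0, 0.0]}})

style_image = load_image("style_reference.png")  # hypothetical local path
image = pipe(
    prompt="a cat, masterpiece, best quality",
    ip_adapter_image=style_image,
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
image.save("stylized.png")
```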

Inversion-InstantStyle method

In addition to the above-mentioned open-source methods, we also provide a new method, Inversion-InstantStyle, which simply combines DDIM inversion, renoising, and InstantStyle.

In more detail:

  1. We first use DDIM inversion without any conditioning (no CFG) to invert the style reference image, resulting in an inverted latent.
  2. We then add random noise to this inverted latent, with a strength that can be controlled by the user. We hypothesize that this process removes part of the information in the inverted latent, preserving only high-level style features (colors, textures, etc.) and discarding content information.
  3. Finally, starting from this renoised latent, we use InstantStyle to generate a new image in the same style as the reference image (a code sketch follows this list).
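
The sketch below walks through the three steps, assuming a Stable Diffusion 1.5 backbone and the Diffusers IP-Adapter machinery. The checkpoint names, the renoising rule, and the choice of style blocks are illustrative assumptions, not the exact implementation shipped in this repository.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, DDIMInverseScheduler
from diffusers.utils import load_image

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

@torch.no_grad()
def ddim_invert(latents, num_steps=50):
    # Step 1: DDIM inversion with an empty prompt and no CFG.
    inv = DDIMInverseScheduler.from_config(pipe.scheduler.config)
    inv.set_timesteps(num_steps, device=device)
    prompt_embeds, _ = pipe.encode_prompt(
        "", device, num_images_per_prompt=1, do_classifier_free_guidance=False
    )
    for t in inv.timesteps:
        eps = pipe.unet(latents, t, encoder_hidden_states=prompt_embeds).sample
        latents = inv.step(eps, t, latents).prev_sample
    return latents

# Encode the style reference into VAE latents.
style_image = load_image("style_reference.png")  # hypothetical local path
ref = pipe.image_processor.preprocess(style_image, height=512, width=512)
ref = ref.to(device=device, dtype=torch.float16)
latents = pipe.vae.encode(ref).latent_dist.sample() * pipe.vae.config.scaling_factor

inverted = ddim_invert(latents)

# Step 2: renoise the inverted latent. The linear mixing rule is an assumption;
# the method only specifies a user-controlled noise strength.
strength = 0.7
noisy = (1.0 - strength) * inverted + strength * torch.randn_like(inverted)

# Step 3: generate from the renoised latent with InstantStyle conditioning,
# i.e. an IP-Adapter restricted to style-relevant blocks. The block selection
# below is an assumption for SD 1.5 (the documented SDXL recipe targets
# {"up": {"block_0": [0.0, 1.0, 0.0]}}).
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale({"up": {"block_1": [0.0, 1.0, 0.0]}})
image = pipe(
    prompt="a robot reading a book",
    ip_adapter_image=style_image,
    latents=noisy,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("inversion_instantstyle.png")
```

Loading the IP-Adapter only after inversion matters: once it is attached, the UNet expects image embeddings at every call, which the unconditioned inversion loop does not provide.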

[Inversion-InstantStyle technical diagram]

Metrics

The following metrics are computed to assess the quality of the models (a minimal sketch of their computation follows the list):

  • CLIP-Text metric: cosine similarity between the target caption (embedded using CLIPTextModel) and the generated image (embedded using CLIPVisionModel), using the Transformers implementation
  • CLIP-Image metric: cosine similarity between the target style image and the generated image (both embedded using CLIPVisionModel), using the Transformers implementation
  • DINOv2 metric: cosine similarity between the target style image and the generated image (both embedded using Dinov2Model), using the DINO implementation
  • ImageReward: score assigned by the ImageReward model to the generated image
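
Below is a minimal sketch of the three similarity metrics using the Transformers library. The checkpoint names are assumptions for illustration (the benchmark may pin different ones), and the combined CLIPModel is used for brevity in place of separate CLIPTextModel/CLIPVisionModel wrappers.

```python
import torch
import torch.nn.functional as F
from transformers import AutoImageProcessor, CLIPModel, CLIPProcessor, Dinov2Model

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
dino = Dinov2Model.from_pretrained("facebook/dinov2-base")
dino_proc = AutoImageProcessor.from_pretrained("facebook/dinov2-base")

@torch.no_grad()
def clip_text_score(caption, generated):
    # Cosine similarity between the caption and the generated image in CLIP space.
    inputs = clip_proc(text=[caption], images=generated, return_tensors="pt", padding=True)
    txt = clip.get_text_features(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    img = clip.get_image_features(pixel_values=inputs["pixel_values"])
    return F.cosine_similarity(txt, img).item()

@torch.no_grad()
def clip_image_score(style, generated):
    # Cosine similarity between the style reference and the generated image.
    pix = clip_proc(images=[style, generated], return_tensors="pt")["pixel_values"]
    feats = clip.get_image_features(pixel_values=pix)
    return F.cosine_similarity(feats[:1], feats[1:]).item()

@torch.no_grad()
def dino_score(style, generated):
    # Cosine similarity between pooled DINOv2 embeddings of the two images.
    pix = dino_proc(images=[style, generated], return_tensors="pt")["pixel_values"]
    feats = dino(pixel_values=pix).pooler_output
    return F.cosine_similarity(feats[:1], feats[1:]).item()
```

The ImageReward score can be obtained analogously via the image-reward package, e.g. RM.load("ImageReward-v1.0") followed by model.score(prompt, image).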

Results

Following the instructions in our benchmark codebase, we compiled the results of the different listed methods (including Inversion-InstantStyle) on the provided dataset.

| Model | ImageReward ↑ | CLIP-Text ↑ | CLIP-Image ↑ | DINOv2 ↑ |
| --- | --- | --- | --- | --- |
| StyleAligned | -1.26 | 19.26 | 68.72 | 36.29 |
| VisualStyle | -0.72 | 22.12 | 66.68 | 20.80 |
| IP-Adapter | -2.03 | 15.01 | 83.66 | 40.50 |
| Style-Shot | -0.38 | 21.34 | 65.04 | 23.04 |
| CSGO | -0.29 | 22.16 | 61.73 | 16.85 |
| InstantStyle | -0.13 | 22.78 | 66.43 | 18.48 |
| Inversion-InstantStyle | -1.30 | 18.90 | 76.60 | 49.42 |

The results are aligned with those reported in the different stylization papers. Note that the metrics can fluctuate depending on the prompts and seeds chosen for evaluation.

[Scatter plot: CLIP-Text vs. CLIP-Image scores for the evaluated methods]

Citation

@misc{benaroche2024stylerank,
  title={Style-Rank: Benchmarking stylization for diffusion models},
  author={Eyal Benaroche and Clement Chadebec and Onur Tasar and Benjamin Aubin},
  year={2024},
}