Multimodal Color Recommendation in Vector Graphic Documents

Qianru Qiu, Xueting Wang, Mayu Otani
CyberAgent AI Lab

Abstract

Color selection plays a critical role in graphic document design and requires careful consideration of various contexts. However, recommending appropriate colors that harmonize with the other colors and the textual context of a document is challenging, even for experienced designers. In this study, we propose a multimodal masked color model that integrates both color and textual contexts to provide text-aware color recommendation for graphic documents. The model comprises self-attention networks that capture the relationships between colors in multiple palettes, and cross-attention networks that incorporate both color and CLIP-based text representations. Our method primarily targets color palette completion, which recommends colors based on the given colors and text. It is also applicable to a second color recommendation task, full palette generation, which generates a complete color palette corresponding to the given text. Experimental results demonstrate that our approach surpasses previous color palette completion methods in accuracy, color distribution, and user experience, and surpasses full palette generation methods in color diversity and similarity to the ground truth palettes.

Method

(Left) Representation processes of color and text in a graphic document. (Right) Multimodal masked color model.
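As a rough illustration of the cross-attention idea named above, the following sketch lets color embeddings (queries) attend to text embeddings (keys and values) using single-head scaled dot-product attention. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the absence of learned projections and multiple heads, and the toy 2-d vectors standing in for CLIP features are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(color_queries, text_keys, text_values):
    """Single-head scaled dot-product attention without learned projections.

    Each color embedding (query) is replaced by a weighted average of the
    text value vectors, weighted by query-key similarity.
    """
    dim = len(text_keys[0])
    attended = []
    for q in color_queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in text_keys
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, text_values))
            for j in range(len(text_values[0]))
        ])
    return attended


# Toy 2-d embeddings (hypothetical values, not real CLIP features).
colors = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
out = cross_attention(colors, text, text)
```

Each output row is a convex combination of the text vectors, so a color embedding is pulled toward the text embedding it is most similar to. A full model would add learned query, key, and value projections and stack this with self-attention over the palette colors.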

Results of Color Palette Completion

Color recommendation results from our proposed method, Qiu et al., and random color selection. The colors selected for recoloring are marked with ‘✓’. In the first sample, three colors in the image element are recolored. In the second sample, two colors are recolored: one in the graphic element and the other in the text element. In the third sample, one color in the graphic element is recolored. (b) Text and palettes are extracted from the ground truth (GT), including text contents (TC), image labels (IL), and the palettes of image, graphic, and text elements.
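The completion setup described in the caption above can be sketched as a masking step: the colors selected for recoloring are hidden, and a model is asked to recommend replacements from the remaining colors and the text. The sketch below only prepares such an input. The `[MASK]` token name, the hex-string color format, the dictionary layout, and the example colors are assumptions for illustration, not the representation used in the paper.

```python
MASK = "[MASK]"  # hypothetical placeholder token


def mask_palettes(palettes, selected):
    """Replace selected colors with MASK and return (masked_input, targets).

    palettes: dict mapping element type -> list of hex color strings.
    selected: set of (element type, index) pairs chosen for recoloring.
    """
    masked = {}
    targets = {}
    for element, colors in palettes.items():
        masked[element] = []
        for i, color in enumerate(colors):
            if (element, i) in selected:
                masked[element].append(MASK)
                targets[(element, i)] = color  # ground-truth color to recover
            else:
                masked[element].append(color)
    return masked, targets


# Hypothetical document mirroring the second sample:
# one graphic color and one text color are selected.
palettes = {
    "image": ["#d9c7a1", "#3b5b7a"],
    "graphic": ["#f2a900", "#ffffff"],
    "text": ["#1a1a1a"],
}
masked, targets = mask_palettes(palettes, {("graphic", 0), ("text", 0)})
```

Here `masked` keeps the image palette intact and hides the two selected colors, while `targets` records the ground-truth colors against which a recommendation could be scored.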

Results of Full Palette Generation

Palettes generated by our proposed method and by the related work TPN, shown alongside the ground truth. PP (post-processing) eliminates duplicated colors in a palette.
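One plausible reading of the post-processing step is a simple order-preserving removal of repeated colors. The sketch below assumes colors are hex strings compared case-insensitively and that only exact duplicates are dropped. The caption does not say whether near-duplicate colors are also merged, so the function name and the matching rule are assumptions.

```python
def remove_duplicate_colors(palette):
    """Drop repeated colors, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for color in palette:
        key = color.lower()  # treat "#FFFFFF" and "#ffffff" as the same color
        if key not in seen:
            seen.add(key)
            unique.append(color)
    return unique


# Hypothetical generated palette containing two repeated colors.
generated = ["#FFFFFF", "#1a1a1a", "#ffffff", "#f2a900", "#1A1A1A"]
cleaned = remove_duplicate_colors(generated)
```

Keeping the first occurrence preserves the order in which the colors were generated, which matters if position in the palette carries meaning.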