Deep Saliency Prior for Reducing Visual Distraction


CVPR 2022


Google Research

Given an input image and one or more regions to edit, our method backpropagates through a visual saliency prediction model to solve for an image in which the saliency level of the region of interest is modified. We explore a set of differentiable operators, whose parameters are all guided by the saliency model, producing a variety of effects such as (a) camouflaging, (b) semantic editing, (c) inpainting, and (d) color harmonization.
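One plausible way to write this as an optimization problem is sketched below. The notation is an assumption introduced here for illustration and does not appear on this page: I is the input image, M the binary mask of the region to edit, S the pretrained saliency model, O_theta a differentiable operator with parameters theta, T a target saliency level, and beta a weight on preserving the pixels outside the mask.

```latex
\theta^{*} = \arg\min_{\theta}\;
  \bigl\| M \odot \bigl( S(O_{\theta}(I)) - T \bigr) \bigr\|_2^2
  \;+\; \beta \,\bigl\| (1 - M) \odot \bigl( O_{\theta}(I) - I \bigr) \bigr\|_2^2
```

Under this reading, a low T pushes saliency down inside the region and a high T raises it. The saliency model's weights stay fixed; only theta is updated.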

Paper
Supplementary Material
Eye-Gaze Study

Using only a model that was trained to predict where people look in images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images. Given an image and a mask specifying the region to edit, we backpropagate through a state-of-the-art saliency model to parameterize a differentiable editing operator, such that the saliency within the masked region is reduced. We demonstrate several operators, including a recoloring operator, which learns to apply a color transform that camouflages and blends distractors into their surroundings; a warping operator, which warps less salient image regions to cover distractors, gradually collapsing objects into themselves and effectively removing them (an effect akin to inpainting); and a GAN operator, which uses a semantic prior to fully replace image regions with plausible, less salient alternatives. The resulting effects are consistent with cognitive research on the human visual system (e.g., since color mismatch is salient, the recoloring operator learns to harmonize objects' colors with their surroundings to reduce their saliency), and, importantly, are all achieved solely through the guidance of the pretrained saliency model, with no additional supervision. We present results on a variety of natural images and conduct a perceptual study to evaluate and validate the changes in viewers' eye-gaze between the original images and our edited results.
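As a minimal sketch of this optimization loop, the toy Python below fits a recoloring operator to a masked region. Several things here are assumptions made for illustration: the operator is parameterized as a per-channel gain and bias, the pretrained saliency network is replaced by a hand-written stand-in that scores color mismatch with the surround, gradients are derived by hand rather than by backpropagation, and all function names are hypothetical.

```python
# Toy sketch only: a hand-written "saliency" stand-in replaces the deep
# saliency network, and gradients are analytic instead of backpropagated.

def surround_mean(pixels, mask):
    """Mean RGB color of the pixels outside the mask."""
    outside = [p for p, m in zip(pixels, mask) if not m]
    return [sum(p[c] for p in outside) / len(outside) for c in range(3)]


def recolor(pixels, mask, gain, bias):
    """Per-channel affine color transform, applied only inside the mask."""
    return [
        [gain[c] * p[c] + bias[c] for c in range(3)] if m else list(p)
        for p, m in zip(pixels, mask)
    ]


def toy_saliency_loss(pixels, mask, target):
    """Stand-in saliency: mean squared color mismatch of masked pixels
    against the surround color (assumption: color mismatch is salient)."""
    inside = [p for p, m in zip(pixels, mask) if m]
    return sum((p[c] - target[c]) ** 2 for p in inside for c in range(3)) / len(inside)


def optimize_recolor(pixels, mask, steps=500, lr=0.1):
    """Gradient descent on (gain, bias) to lower the toy saliency loss."""
    target = surround_mean(pixels, mask)  # fixed: the operator never edits the surround
    inside = [p for p, m in zip(pixels, mask) if m]
    gain, bias = [1.0] * 3, [0.0] * 3     # start from the identity transform
    for _ in range(steps):
        for c in range(3):
            residuals = [gain[c] * p[c] + bias[c] - target[c] for p in inside]
            grad_gain = 2 * sum(r * p[c] for r, p in zip(residuals, inside)) / len(inside)
            grad_bias = 2 * sum(residuals) / len(inside)
            gain[c] -= lr * grad_gain
            bias[c] -= lr * grad_bias
    return gain, bias


# A 1-D strip of RGB pixels: a red distractor on a greenish background.
pixels = [[0.2, 0.6, 0.3]] * 2 + [[0.9, 0.1, 0.1]] * 2 + [[0.2, 0.6, 0.3]] * 2
mask = [False, False, True, True, False, False]

target = surround_mean(pixels, mask)
before = toy_saliency_loss(pixels, mask, target)
gain, bias = optimize_recolor(pixels, mask)
after = toy_saliency_loss(recolor(pixels, mask, gain, bias), mask, target)
print(f"toy saliency before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the optimizer simply learns to match the distractor's color to its surround, which mirrors the color harmonization behavior described above. A real saliency network responds to many more cues than color, so the learned transform would generally differ.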

Saliency Driven Warping - Visualization

Visualization of the intermediate steps of the warp operator optimization:

Panels: Input, Optimization (zoom-in), Optimization (zoom-in), Output

Results

Reducing Distraction in Video Conference Calls

Combining our approach with background blur reduces the visual attention drawn to distracting regions while maintaining the structural integrity of the subject's environment. Compare this with the common background blur effect, which leaves colorful, attention-grabbing blobs in the background:

Saliency Increase in StyleGAN Space:

The output image (right) is obtained by learning directions in the latent space such that the saliency of the original image (left) is increased in the region of interest (marked in red on the corresponding saliency map). The directions found are semantically meaningful and natural (e.g., adding a moustache or adding prominent domes):

Validation through Real Eye-Gaze Measurement:

Samples of real eye-gaze saliency maps measured in our perceptual study. Each pair in the first row shows an original image (left), with the region of interest outlined in red, and our result (right). The second row shows the corresponding eye-gaze maps, averaged across the participants in the study:
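The averaging step behind the second row can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names, the made-up 2x3 gaze maps, and the "gaze share inside the region" summary are all assumptions introduced here.

```python
# Illustrative only: toy gaze maps and a hypothetical summary metric.

def average_gaze_map(maps):
    """Pixel-wise mean of per-participant gaze maps (equal-sized 2-D lists)."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)] for r in range(rows)]


def gaze_share_in_roi(gaze_map, roi_mask):
    """Fraction of the total gaze mass that falls inside the region of interest."""
    total = sum(sum(row) for row in gaze_map)
    inside = sum(
        v
        for row, mask_row in zip(gaze_map, roi_mask)
        for v, m in zip(row, mask_row)
        if m
    )
    return inside / total if total else 0.0


# Two hypothetical participants per condition, each with a 2x3 gaze map.
roi = [[False, True, False],
       [False, False, False]]
original_maps = [[[0, 4, 0], [0, 0, 0]],
                 [[0, 2, 0], [0, 2, 0]]]
edited_maps = [[[0, 1, 0], [0, 3, 0]],
               [[0, 1, 0], [0, 3, 0]]]

share_original = gaze_share_in_roi(average_gaze_map(original_maps), roi)
share_edited = gaze_share_in_roi(average_gaze_map(edited_maps), roi)
print(share_original, share_edited)
```

A lower share for the edited image would indicate that less of the measured gaze lands on the edited region, which is the kind of change the study is designed to detect.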

BibTeX

@article{aberman2021deep,
  author = {Aberman, Kfir and He, Junfeng and Gandelsman, Yossi and Mosseri, Inbar and Jacobs, David E. and Kohlhoff, Kai and Pritch, Yael and Rubinstein, Michael},
  title = {Deep Saliency Prior for Reducing Visual Distraction},
  publisher = {arXiv}
}