News

July 2024: Code and model available (see Code).
July 2024: Website launched.

Abstract

Saliency prediction in 360° video plays an important role in modeling visual attention, and can be leveraged for content creation, compression techniques, or quality assessment methods, among others. Visual attention in immersive environments depends not only on visual input, but also on inputs from other sensory modalities, primarily audio. Despite this, only a minority of saliency prediction models have incorporated auditory inputs, and much remains to be explored about what auditory information is relevant and how to integrate it in the prediction. In this work, we propose an audiovisual saliency model for 360° video content, AViSal360. Our model integrates both spatialized and semantic audio information, together with visual inputs. We perform exhaustive comparisons to demonstrate both the actual relevance of auditory information in saliency prediction, and the superior performance of our model when compared to previous approaches.

Downloads

Paper (authors' version): PDF
Supplementary material: PDF

Results

The qualitative comparisons between our model, AViSal360, and the three main state-of-the-art approaches can be found in our web-based browser for the D-SAV360 dataset.

Code

You can find the code and model for AViSal360 in our GitHub repository.

BibTeX:

@INPROCEEDINGS{10765391,
  author = {Bernal-Berdun, Edurne and Pina, Jorge and Vallejo, Mateo and Serrano, Ana and Martin, Daniel and Masia, Belen},
  booktitle = {2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
  title = {{AViSal360: Audiovisual Saliency Prediction for 360° Video}},
  year = {2024},
  pages = {1246--1255},
  doi = {10.1109/ISMAR62088.2024.00141},
  publisher = {IEEE Computer Society}
}

Related Work

2023: D-SAV360: A Dataset of Gaze Scanpaths on 360° Ambisonic Videos

@article{bernal2023d,
  title={D-SAV360: A Dataset of Gaze Scanpaths on 360° Ambisonic Videos},
  author={Bernal-Berdun, Edurne and Martin, Daniel and Malpica, Sandra and Perez, Pedro J and Gutierrez, Diego and Masia, Belen and Serrano, Ana},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2023},
  publisher={IEEE}
}

2022: SST-Sal: A spherical spatio-temporal approach for saliency prediction in 360° videos

@article{bernal2022sst,
  title={SST-Sal: A spherical spatio-temporal approach for saliency prediction in 360° videos},
  author={Bernal-Berdun, Edurne and Martin, Daniel and Gutierrez, Diego and Masia, Belen},
  journal={Computers \& Graphics},
  volume={106},
  pages={200--209},
  year={2022},
  publisher={Elsevier}
}

2022: ScanGAN360: A Generative Model of Realistic Scanpaths for 360 Images

@article{martin2022scangan360,
  title={ScanGAN360: A Generative Model of Realistic Scanpaths for 360 Images},
  author={Martin, Daniel and Serrano, Ana and Bergman, Alexander W and Wetzstein, Gordon and Masia, Belen},
  journal={IEEE Transactions on Visualization \& Computer Graphics},
  number={01},
  pages={1--1},
  year={2022},
  publisher={IEEE Computer Society}
}

2020: Panoramic convolutions for 360° single-image saliency prediction

@inproceedings{martin20saliency,
  author={Martin, Daniel and Serrano, Ana and Masia, Belen},
  title={Panoramic convolutions for $360^{\circ}$ single-image saliency prediction},
  booktitle={CVPR Workshop on Computer Vision for Augmented and Virtual Reality},
  year={2020}
}

This work has been supported by grant PID2022-141539NB-I00, funded by MICIU/AEI/10.13039/501100011033 and by ERDF, EU.

Edurne Bernal-Berdun, Jorge Pina, Mateo Vallejo, Ana Serrano, Daniel Martin, Belen Masia
Universidad de Zaragoza, I3A