Facial Image Compression via Neural Image Manifold Compression
Published in: | IEEE transactions on circuits and systems for video technology 2024, p.1-1 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Although learning-based image and video coding techniques have developed rapidly, their signal-fidelity-driven objective steers them away from a coding framework that is effective and efficient for both humans and machines. In this paper, we address this issue by using generative models to bridge the gap between full fidelity (for human vision) and high discrimination (for machine vision). Relying on existing pretrained generative adversarial networks (GANs), we build a GAN inversion framework that projects the image onto a low-dimensional natural image manifold. The feature in this manifold, called the latent code, is highly discriminative and also encodes the appearance information of the image. By applying a variational bit-rate constraint with a hyperprior model to model and suppress the entropy of the image manifold code, our method can meet the needs of both machine and human vision at very low bit-rates. To improve the visual quality of image reconstruction, we further propose multiple latent codes and scalable inversion. The former obtains several latent codes during inversion, while the latter additionally compresses and transmits a shallow compact feature to support visual reconstruction. Experimental results demonstrate the superiority of our method in both human vision tasks, i.e., image reconstruction, and machine vision tasks, including semantic parsing and attribute prediction. |
---|---|
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2023.3274739 |