(VI(S)TA) Visual Statistics Group

Image Processing

IMAGE CODING
Transform Coding, Image Representation and Quantizer Design (MATLAB code)

Image compression systems commonly operate by transforming the input signal into a new representation whose elements are then independently quantized. The success of such a system depends on two properties of the representation. First, the coding rate is minimized only if the elements of the representation are statistically independent. Second, the perceived coding distortion is minimized only if the errors in a reconstructed image arising from quantization of the different elements of the representation are perceptually independent.
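The transform-then-quantize pipeline described above can be sketched in a few lines. This is only a minimal illustration, not the group's MATLAB code: the 1-D orthonormal DCT, the sample block, the step size and all function names are assumptions chosen for the example, and the first-order entropy of the quantization indices is used as a rough stand-in for the coding rate.

```python
import math
from collections import Counter

def dct(x):
    """Orthonormal DCT-II of a 1-D block (illustrative transform)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def quantize(coeffs, step):
    """Independent uniform quantization of each coefficient."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    return [q * step for q in indices]

def entropy_bits(symbols):
    """First-order entropy (bits per symbol) of the quantization indices."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical 8-sample block and step size, chosen only for illustration.
block = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
step = 10.0
indices = quantize(dct(block), step)
reconstructed = idct(dequantize(indices, step))
rate = entropy_bits(indices)
```

Because the transform is orthonormal, the squared reconstruction error equals the squared quantization error of the coefficients, so each coefficient contributes at most (step/2)^2. This purely Euclidean bound says nothing about how visible the error is, which is the point the following paragraphs address.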

According to the Barlow hypothesis, the perceptual representation of the image should also be statistically efficient. The match with natural image statistics, together with the fact that most image coding applications are intended to be judged by a human observer, has motivated the use of human vision models and perceptual metrics to inspire both the image representation in transform coders and the bit allocation in the selected representation (e.g. JPEG and JPEG2000). Our work in this field has focused on using accurate models of the perceptual non-linearities to improve these standards. The key issue is to apply uniform quantization in a perceptually uniform domain. We have developed a distortion criterion that unifies the results in perceptually based quantization: uniform quantization in the perceptual domain is equivalent to restricting the Maximum Perceptual Error (MPE) in each component of the perceptual representation. Under this criterion, all the proposed approaches (ours and those of others, e.g. JPEG and JPEG2000) are particular cases that use different perception models.
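The link between uniform quantization in the perceptual domain and a bounded Maximum Perceptual Error can be illustrated with a toy model. The sketch below assumes a point-wise power-law non-linearity with exponent 0.5 as the "perceptual" transform; this exponent, the coefficient values and the function names are hypothetical and are not the perception models used in the cited papers.

```python
import math

GAMMA = 0.5  # assumed exponent of a toy point-wise perceptual non-linearity

def to_perceptual(c):
    """Map a transform coefficient to the (toy) perceptual domain."""
    return math.copysign(abs(c) ** GAMMA, c)

def from_perceptual(r):
    """Inverse of to_perceptual."""
    return math.copysign(abs(r) ** (1.0 / GAMMA), r)

def quantize_perceptual(c, step):
    """Uniformly quantize in the perceptual domain, then map back."""
    r_hat = step * round(to_perceptual(c) / step)
    return from_perceptual(r_hat)

step = 0.5
coeffs = [0.3, 2.0, 17.0, 120.0, -450.0]  # hypothetical coefficient amplitudes
decoded = [quantize_perceptual(c, step) for c in coeffs]

# Error measured in the perceptual domain and in the coefficient domain.
perceptual_errors = [abs(to_perceptual(c) - to_perceptual(d))
                     for c, d in zip(coeffs, decoded)]
coefficient_errors = [abs(c - d) for c, d in zip(coeffs, decoded)]
max_perceptual_error = max(perceptual_errors)
```

In this sketch every component has a perceptual error of at most step/2, whatever its amplitude, while the error in the coefficient domain is allowed to grow for large coefficients. Changing the non-linearity changes the perception model but not the criterion.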

The first attempts to use human vision models in transform coding were too simple: the original JPEG standard [Wallace91] used the block-DCT as the first stage of the perceptual representation, and bit allocation was designed using a simple linear approximation of the second stage (the achromatic and chromatic CSFs).

Over the last few years, we have shown that using progressively more accurate approximations of the second, non-linear stage significantly improves the JPEG results [Malo95, Malo99, Malo00, Epifanio03, Malo03, Malo06a]. In particular, we have shown that linear transforms cannot achieve either of the two goals (statistical independence and perceptual independence), and we have proposed an adaptive non-linear image representation (divisive normalization) that greatly reduces both the statistical and the perceptual redundancy among the elements of the representation. We developed an efficient method of inverting this representation, and we demonstrated that this dual reduction in dependency can greatly improve the visual quality of compressed images. For an extended review of the proposed methods in the context of color image coding, see [Malo02] (in Spanish).
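A generic divisive normalization and one way of inverting it can be sketched as follows. The functional form (each rectified, weighted coefficient divided by a constant plus a weighted pool of its neighbours) is a common textbook formulation; the parameter values alpha, beta, h and gamma are hypothetical, and the fixed-point iteration shown here is a simple illustrative inversion, not necessarily the method derived in [Malo06a].

```python
import math

def normalize(c, alpha, beta, h, gamma):
    """Divisive normalization: r_i = sign(c_i) e_i / (beta_i + sum_j h_ij e_j),
    with e_i = |alpha_i c_i| ** gamma."""
    n = len(c)
    e = [abs(a * x) ** gamma for a, x in zip(alpha, c)]
    r = []
    for i in range(n):
        pool = beta[i] + sum(h[i][j] * e[j] for j in range(n))
        r.append(math.copysign(e[i] / pool, c[i]))
    return r

def invert(r, alpha, beta, h, gamma, iterations=200):
    """Recover c from r by iterating e <- |r| * (beta + H e)."""
    n = len(r)
    mag = [abs(x) for x in r]
    e = [mag[i] * beta[i] for i in range(n)]
    for _ in range(iterations):
        e = [mag[i] * (beta[i] + sum(h[i][j] * e[j] for j in range(n)))
             for i in range(n)]
    return [math.copysign(e[i] ** (1.0 / gamma) / alpha[i], r[i])
            for i in range(n)]

# Hypothetical parameters for a 4-coefficient toy example.
n = 4
alpha = [1.0, 0.8, 0.5, 0.3]   # assumed linear (CSF-like) weights
beta = [1.0] * n               # assumed regularizing constants
gamma = 2.0                    # assumed excitation exponent
h = [[0.1 * math.exp(-((i - j) ** 2) / 2.0) for j in range(n)]
     for i in range(n)]        # assumed interaction kernel

coeffs = [3.0, -1.5, 0.8, 0.0]
responses = normalize(coeffs, alpha, beta, h, gamma)
recovered = invert(responses, alpha, beta, h, gamma)
```

The inversion is needed because the decoder only receives the quantized responses: it must undo the normalization to return to the linear transform domain. With these toy parameters the iteration converges quickly; how fast it converges in general depends on the responses and on the kernel.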

An illustration of the results that summarizes our work in this field is given below (0.18 bits/pix):

Figure panels: Original (8 bits/pix); JPEG [Wallace91]; [Malo95, Malo99, Malo00]; [Epifanio03]; [Malo03, Malo06a]

Rate-Distortion performance of DCT-based methods (rate axis: entropy in bits/pix)
Dotted: JPEG [Wallace91]
Dash-dot: [Malo95, Malo99, Malo00] (this simple non-linearity was intended to work in the JPEG range, i.e. between 0.4 and 0.9 bits/pix)
Dashed: [Epifanio03]
Solid: [Malo03, Malo06a]

The new standard JPEG2000 [Taubman02] uses wavelets in the first linear stage, and it incorporates more complex, yet not exact, expressions for the non-linear perceptual stage [Daly02]. In particular, in addition to the simplest linear model (the CSF), it allows point-wise non-linearities and simplified versions of divisive normalization. The major obstacle to using the most accurate version of divisive normalization is the mathematical complexity of its inversion, which is why the standard uses only simplified versions of the non-linearity.

In [Navarro05] we improved the performance of JPEG2000 by using the perceptual non-linearities as well. We extended to wavelets the robust and fast algorithm for inverting the exact expression of divisive normalization that was derived and used in the DCT context [Malo06a]. In this way, we obtained significant improvements in rate-distortion performance and better color reproduction than JPEG2000 (see the example below, at 0.2 bits/pix).

Figure panels: Original (24 bits/pix); Linear JPEG2000 [Daly02]; Simple non-linear JPEG2000 [Daly02]; Our method [Navarro05]

Rate-Distortion performance of wavelet-based methods
Dotted: Linear JPEG2000 [Daly02]
Dashed: Simple non-linear JPEG2000 [Daly02]
Solid: [Navarro05]

In collaboration with Dr. Gustavo Camps (Dept. d'Eng. Electr., Universitat de Valencia), we have applied a different kind of non-linear processing after the linear transform. In particular, we trained perceptually weighted Support Vector Machines (SVMs) to select the subset of most relevant coefficients in the linear representation.

SVM learning was recently proposed for image compression in the frequency domain, using a constant insensitivity zone, by Robinson and Kecman [Rob&Kec03]. However, according to the statistical properties of natural images and the properties of human perception (the Barlow hypothesis again), a constant insensitivity makes sense in the spatial domain but is not a good option in the frequency domain. In fact, their approach relied on a fixed, ad-hoc low-pass assumption: they neglected the high-frequency coefficients in the SVM training.

In [Gomez04] we proposed the use of adaptive insensitivity SVMs [Camps01] for image coding, with an appropriate distortion criterion [Malo95-Malo06a] based on a simple visual cortex model. Training the SVM with an accurate perception model avoids any a priori assumption, reduces the blocking effect, and improves the subjective rate-distortion performance of the original approach.
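The difference between a constant and an adaptive insensitivity zone can be illustrated with the epsilon-insensitive loss alone. This is a minimal sketch, not an SVM trainer: the sensitivity weights below are hypothetical stand-ins for a CSF-like profile, and making the insensitivity inversely proportional to them is an assumption made here for illustration.

```python
def insensitive_loss(error, epsilon):
    """Epsilon-insensitive loss: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(error) - epsilon)

def adaptive_epsilons(sensitivities, base_epsilon):
    """Wider insensitivity zone where the (assumed) visual sensitivity is lower."""
    return [base_epsilon / s for s in sensitivities]

# Hypothetical sensitivities for four DCT coefficients, from low to high frequency.
sensitivities = [1.0, 0.8, 0.4, 0.1]
base_epsilon = 0.2
errors = [0.5, 0.5, 0.5, 0.5]  # same approximation error on every coefficient

constant_losses = [insensitive_loss(e, base_epsilon) for e in errors]
epsilons = adaptive_epsilons(sensitivities, base_epsilon)
adaptive_losses = [insensitive_loss(e, eps) for e, eps in zip(errors, epsilons)]
```

With a constant zone, the same error is penalized equally at every frequency. With the adaptive profile, the penalty vanishes on the coefficients where the assumed sensitivity is low, so the learning machine can spend its support vectors on the coefficients that matter perceptually without a hard low-pass cut-off.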

Our results, compared with those of straightforward SVMs in the DCT domain and with the method proposed in [Rob&Kec03], are shown below (0.3 bits/pix). Even better results are obtained when operating in non-linearly transformed domains [Gutiérrez07, Camps08]. We have recently extended these results to color images [Malo08].

Figure panels: Original (8 bits/pix); Euclidean SVM; [Rob&Kec03]; [Gomez04]

Rate-Distortion performance of DCT-SVM based methods
Dotted: Euclidean SVM
Dashed: [Rob&Kec03]
Solid: CSF-SVM [Gomez04]

PUBLICATIONS

G. Camps, J. Gutiérrez, G. Gómez and J. Malo
On the Suitable Domain for SVM Training in Image Coding.
Journal of Machine Learning Research, Vol. 9, pp. 49-66 (2008)