|
IMAGE CODING
Transform
Coding,
Image Representation and Quantizer Design
Image compression systems
commonly operate
by transforming the input signal into a new representation whose
elements
are then independently quantized. The success of such a system depends
on two properties of the representation. First, the coding rate is
minimized
only if the elements of the representation are statistically
independent.
Second, the perceived coding distortion is minimized only if the errors
in a reconstructed image arising from quantization of the different
elements
of the representation are perceptually independent.
According to the Barlow
hypothesis, the perceptual representation of the image should also
be statistically efficient. The match with natural
image statistics and the fact that most of the image coding
applications
are intended to be judged by a human observer have motivated the use of
human
vision models and perceptual
metrics to inspire the image representation in transform coders as
well as the bit allocation in the selected representation (e.g. JPEG
and
JPEG2000). Our work in this field has been
focused
in using accurate models of the perceptual non-linearties to improve
these
standards. The key issue is making a uniform quantization in a
perceptually
uniform domain. We have developed a distortion criterion that unifies
all
the results in perceptually based quantization: making a uniform
quantization
in the perceptual domain is equivalent to restrict the Maximum
Perceptual
Error (MPE) in each component of the perceptual representation. When
using
this concept, all the proposed approaches (ours and those of other
people,
e.g. JPEG and JPEG2000) are just particular cases using different
perception
models.
The first attempts to use human
vision
models in tranform coding were too simple: the original JPEG standard
[Wallace91]
used the block-DCT as the first stage of the perceptual representation
and a simple linear approximation of the second stage was used to
design
bit allocation (using achromatic and chromatic CSFs).
Over the last years, we have
shown that
using progressively more accurate approximations of the second
non-linear
stage significantly improve the JPEG results [Malo95, Malo99, Malo00, Epifanio03, Malo03, Malo04].
In particular, we have shown that linear transforms cannot achieve
either
of the (statistical and perceptual independence) goals, and we have
proposed
an adaptive non-linear image representation (the divisive
normalization)
that greatly reduces both the statistical and the perceptual redundancy
amongst representation elements. We developed an efficient method of
inverting
this representation, and we demonstrated that this dual reduction in
dependency
can greatly improve the visual quality of compressed images. For an
extended
review of the proposed methods in the context of color image coding,
see
[Malo02] (in Spanish!).
An illustration of the results
that summarizes
our work in this field is given below (0.18 bits/pix):

Rate-Distortion
performance of DCT based methods
Rate (Entropy in bits/pix)
|
Dotted: JPEG
[Wallace91]
Dash-dot: [Malo95, Malo99, Malo00]
(Note that
this simple
non-linearity was intended to work in the JPEG range, i.e, between
[0.4-0.9]
bits/pix)
Dashed: [Epifanio03]
Solid: [Malo03, Malo04]
|
The new standard JPEG2000
[Taubman02] uses
wavelets in the first linear stage; and it incorporates more complex
expressions, yet
not exact, for the non-linear perceptual stage [Daly02]. In
particular,
in addition of the simplest linear model (the CSF), it allows
point-wise
non-linearities and simplified versions of the divisive normalization.
The major problem found in using the most accurate version of divisive
normalization is the mathematical complexity of its inversion. This is
why the standard just uses simplified versions of the non-linearity.
In [Navarro05]
we improved the performance of JPEG2000 by using the perceptual
non-linearities
as well. We extended to wavelets the robust and fast algorithm to
invert
the exact expression of divisive normalization that was derived and
used
in the DCT context [Malo04].
In this way, significant improvements in rate-distortion performance
and
better color reproduction than in JPEG2000 have been found (see the
example
below 0.2 bits/pix).

| Original
(24 bits/pix) |
Linear
JPEG2000 [Daly02] |
| Simple
non-linear JPEG2000
[Daly02] |
Our method [Navarro05] |
Rate-Distortion
performance of Wavelet based methods
|
Dotted: Linear
JPEG2000 [Daly02]
Dashed: Simple
non-linear JPEG2000
[Daly02]
Solid: [Navarro05]
|
Due to our colaboration with Dr.
Gustavo Camps (Dept. d'Eng. Electr. Universitat de Valencia) we are
applied a different kind of non-linear processing after the linear
transform
representation. In particular, we trained perceptually weighted
Support Vector Machines (SVMs) to select the subset of more relevant
coefficients
in the linear representation.
SVM learning has been recently
proposed
for image compression in the frequency domain using a constant insensitivity
zone by Robinson and Kecman [Rob&Kec03]. However, according to
the statistical properties of natural images and the properties of
human
perception (the Barlow
hypothesis again!), a constant insensitivity makes sense in the
spatial
domain but it is certainly not a good option in a frequency domain. In
fact, in their approach they made a fixed low-pass ad-hoc assumption:
they
neglected high-frequency coefficients in the SVM training.
In [Gomez04]
we have proposed the use of adaptive insensitivity SVMs [Camps01] for
image
coding using an appropriate distortion criterion [Malo95-Malo04]
based on a simple visual cortex model. Training the SVM by using an
accurate
perception model avoids any a priori assumption and reduces the
blocking
effect and improves the subjective rate-distortion performance of the
original
approach.
Our results compared to the
results using
straightforward SVMs in the DCT domain and with the method proposed in
[Rob&Kec03] are shown below (0.3 bits/pix).
Original (8 bits/pix) |
Euclidean SVM |
[Robinson&Kecman03] |
[Gomez04] |
Rate-Distortion
performance of DCT-SVM based methods
|
Dotted: Euclidean
SVM
Dashed:
[Rob&Kec03]
Solid: CSF-SVM [Gomez04]
|
PUBLICATIONS
|
J. Gutiérrez, G. Gomez, G. Camps and
J.Malo
Perceptual image
representations in Suppot Vector Machine image coding.
Book chapter in: Kernel methods in bioengineering,
communications and image processing.
Publisher Idea Group Inc. (submmited in June 2005)
|
|
G.
Gomez, G. Camps, J. Gutiérrez and J. Malo
Perceptual Adaptive
Insensitivity for
Support Vector Machine Image Coding.
IEEE Trans.
Neural Networks
(Accepted June 2005)
|
|
Y.
Navarro, J. Rovira, J. Gutiérrez and J.Malo
Gain control for the
chromatic channels
in JPEG2000.
Proc. of the 10th
Intl. Conf.
AIC. Vol. 1, pp. 539-542. (2005)
|
|
J.Malo,
I.Epifanio, R.Navarro and E. Simoncelli,
Non-linear Image
Representation for
Efficient Perceptual Coding.
IEEE Trans.
Im. Proc.,
(Accepted in oct. 2004)
|
|
J. Malo
Normalized Image
Representation for
Efficient Coding.
Proc. IEEE 37th Asilomar
Conf. Signals
Systems & Comp., Vol. 2, pp. 1408-1412 (2003)
|
|
I.
Epifanio, J. Gutiérrez and J.Malo
Linear Transform for
Simultaneous Diagonalization
of Covariance and Perceptual Metric Matrix in Image Coding.
Pattern Recognition,
Vol. 36, pp.
1799-1811 (2003)
|
|
J.Malo
Almacenamiento y
transmisión
de imágenes en color
Tecnología del color
(Artigas,
Capilla y Pujol Eds.). Cap. 4, pp 117-164.
Servei de Publicacions de la
Universitat
de València (2002)
|
|
J.Malo,
R.Navarro, I.Epifanio, F.Ferri, J.M.Artigas
Non-linear Invertible
Representation
for Joint Statistical and Perceptual Feature Decorrelation.
Lecture Notes on Computer
Science,
Vol. 1876, pp. 658-667 (2000)
|
|
J.Malo,
F.Ferri, R.Navarro, R.Valerio
Perceptually and
Statistically Decorrelated
Features for Image Representation:
Application to Transform
Coding.
Proc. IEEE Int. Conf. Patt.
Rec.,
Vol. 3, pp. 242-245 (2000)
|
|
J.Malo,
F.Ferri, J.Albert, J.Soret, J.M.Artigas
The role of perceptual
contrast non-linearities
in image transform quantization.
Image & Vision Computing
,
Vol. 18, 3, pp. 233-246 (2000)
|
|
J.Malo,
F.Ferri, J.Albert, J.Soret
Comparison of perceptually
uniform
quantization with average perceptual error minimization in image
transform
coding.
Electronics Letters ,
Vol. 35,
13, pp. 1067-1068 (1999)
|
|
J.Malo,
A.Pons, J.M.Artigas
Bit allocation algorithm
for codebook
design in vector quantization fully based on HVS non-linearities for
suprathreshold
contrasts.
Electronics Letters ,
Vol. 31,
15, pp.1222-1224 (1995)
|
|
|