A filter bank for rotationally invariant image recognition

We present new rotation moment invariants based on multiresolution filter bank techniques. The multiresolution pyramid motivates our simple but efficient feature selection procedure based on the fuzzy C-mean clustering methodology combined with the Mahalanobis distance measure. The proposed procedure examines the impact of random noise as well as an interesting, lesser-known impact of noise due to spatial transformations. The recognition accuracy of the proposed technique has been compared with that of the Zernike moments, the Fourier-Mellin moments and wavelet-based schemes. The numerical experiments, with more than 30 000 images, demonstrate a tangible accuracy increase of about 3% for low-level noise, 8% for average-level noise and 15% for high-level noise.


Introduction
The first region-based rotation moment invariants introduced by Hu, as described in [16,21,22], are projections of the image onto monomial functions. The moments are believed to be reliable for complex shapes, because they involve not only the contour pixels but all the pixels of the object. However, a dramatic increase in complexity associated with the relevant projections renders Hu's moments impractical. Besides, the redundancy of the Hu moments, noticed in [22], clearly indicates a need for further research. Shortly after Hu's paper, a variety of invariant moments were proposed and analyzed [3,4,9,10,11,14,18,19,20,21,22,25,28]. The major developments are characterized by the Legendre moments [14,21,22], the orthogonal Zernike moments [9,10,11,14,22], the Fourier-Mellin moments [9,10,22], the complex moments [3,4,22], the Tchebichef moments [25] and the Krawtchouk moments [28]. As a matter of fact, almost every known system of orthogonal polynomials has been tested and analyzed. Finally, Shen [20] introduced a rotationally invariant moment representing the image by projections onto wavelets. It has been demonstrated that such wavelet invariants may ensure a higher classification rate.
The multiresolution pyramid is well-known [12]. However, to the best of the authors' knowledge, it has not been applied to construct moment invariants. Therefore, we propose to develop the idea of wavelet-based moments by introducing the filter bank representation. As opposed to conventional wavelets, this representation is rotationally invariant due to the circular Fourier transform applied prior to the multiresolution analysis [12]. Since the Mallat-like expansion is always overcomplete, the features are selected by the Fuzzy C-mean (FCM) clustering method endowed with the Mahalanobis distance measure and the elimination of redundant and noise-sensitive features. It should be noted that the sensitivity of the moment invariants to image noise has been mentioned repeatedly in the literature (see, for instance, [22]). As a consequence, the moments are rotationally invariant only when they are computed from ideal images. Even in the absence of noise induced by physical devices, there always exists noise due to the finite resolution of the image subjected to spatial transformations. Therefore, in practice, the spatial transformations themselves affect the invariance. Since this transformation noise may appear even at low frequencies, the moment invariants should be evaluated by the response not only to random high-frequency noise, but also to rotations and scaling.
First of all, we construct the multiresolution pyramid using a fast quadrature mirror filter (QMF). Next, we eliminate the features belonging to different resolution levels that are sensitive to either random or transformational noise. Furthermore, features from different resolution levels are considered in combinations. The objects are represented by means of FCM clusters. A minimum of the FCM function corresponds to a better discriminative set. At this step, the use of the Mahalanobis distance is important, since the entire set of moments is redundant [1]. Additional redundancy appears due to the overcompleteness of the multiresolution analysis.
The recognition rate of the algorithm has been tested on 30 000 different images and compared with the Zernike moments, the Fourier-Mellin moments as well as with a wavelet-based representation proposed by Shen [20]. Our proposed technique provides a significant accuracy increase ranging from 3% to 15%.

Rotationally Invariant Moments
A general moment $M$ of a function $f(r, \theta)$ with respect to a function $F(r, \theta)$, in a polar coordinate system with the origin at the centroid of the object, is defined by

$$M = \iint f(r, \theta)\, F(r, \theta)\, r\, dr\, d\theta.$$

In the context of image processing, the function $f(r, \theta)$ is the image intensity (the gray level). We assume that $F(r, \theta) = R(r)G(\theta)$, where $R(r)$ denotes a basis function such as the Zernike polynomial and $G(\theta)$ an angular function. Taking $G(\theta) \equiv G_\Gamma(\theta) = e^{i\Gamma\theta}$ for some $\Gamma$ provides the rotational invariance. Note that if $\Gamma$ is a continuous variable, then the integral with regard to $\theta$ is none other than the circular Fourier transform. Usually (but not necessarily), in the theory of rotationally invariant moments, $\Gamma$ is an integer called the angular order [20]. We represent the above integral by

$$M_\Gamma = \int S_\Gamma(r)\, R(r)\, r\, dr, \qquad \text{where} \qquad S_\Gamma(r) = \int_0^{2\pi} f(r, \theta)\, e^{i\Gamma\theta}\, d\theta.$$

Note that if $\tilde{M}_\Gamma$ is a moment of the rotated image $f(r, \theta + \varphi)$, where $\varphi$ is the angle of rotation, then the substitution $\theta \to \theta - \varphi$ gives $\tilde{M}_\Gamma = e^{-i\Gamma\varphi} M_\Gamma$. Therefore, $|\tilde{M}_\Gamma| = |M_\Gamma|$. Thus, rotation of the object affects the phase but not the magnitude. Furthermore, the moment phase cancellation may be performed by multiplication of appropriate powers of moments rather than just by taking the moments' magnitudes (since the latter case yields a redundant feature system). Flusser [3,4] has shown that the rotation invariants can be constructed as products $\prod_{i=1}^{n} M_{\Gamma_i}^{k_i}$ from some minimal set defined by a supplementary integer equation with regard to $\Gamma_i$ and $k_i$ (see [3] for further details). Given the magnitudes, the Flusser invariants may be evaluated by an explicit identity.

Furthermore, from a functional analysis point of view, each object is represented by an infinite and unique set of the invariants if $R(r)$ constitutes a basis in the appropriate functional space. A wavelet basis has a number of advantages, since it may be adapted to the spectrum as well as to the spatial properties of a particular set of objects. In [9,20] the radial functions $R_{m,n}(r) = \psi_{m,n}(r)$ are dilated and shifted copies of the mother wavelet $\psi(r)$, where $m$ is the dilation parameter (the scale index) and $n$ is the shifting parameter. Our forthcoming multiresolution analysis employs the most common choice $m = 2^j$, where $j$ is an integer.
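The rotation property of the circular Fourier transform can be checked numerically on a single ring of samples. The sketch below is only an illustration: the gray levels are hypothetical, the integral is replaced by a Riemann sum, and the rotation angle is restricted to a multiple of the angular step so that no interpolation is needed. With the kernel $e^{i\Gamma\theta}$ and the rotated image $f(r, \theta + \varphi)$, the phase factor in this sketch comes out as $e^{-i\Gamma\varphi}$; the magnitude is unchanged.

```python
import cmath
import math


def circular_fourier(ring, gamma):
    """Riemann-sum estimate of S_Gamma(r) on one ring of angular samples."""
    n = len(ring)
    dtheta = 2.0 * math.pi / n
    return dtheta * sum(
        value * cmath.exp(1j * gamma * k * dtheta) for k, value in enumerate(ring)
    )


def rotate_ring(ring, shift):
    """Samples of f(r, theta + phi) for phi = shift * dtheta (a cyclic shift)."""
    return ring[shift:] + ring[:shift]


# Hypothetical gray levels at 12 equally spaced angles on a single ring.
ring = [0.0, 3.0, 7.0, 2.0, 0.0, 0.0, 5.0, 9.0, 4.0, 1.0, 0.0, 0.0]
gamma, shift = 2, 5
phi = shift * 2.0 * math.pi / len(ring)

s_original = circular_fourier(ring, gamma)
s_rotated = circular_fourier(rotate_ring(ring, shift), gamma)

# The two magnitudes agree; only the phase differs.
print(abs(s_original), abs(s_rotated))
```

Because $M_\Gamma$ is a linear functional of $S_\Gamma(r)$, the same phase factor carries over to the moment itself. For rotation angles that are not multiples of the angular step, resampling introduces the transformation noise discussed in the next section.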
Coordinates of $r S_\Gamma(r)$ in the basis $R_{m,n}(r)$, given by

$$M_{m,n,\Gamma} = \int S_\Gamma(r)\, R_{m,n}(r)\, r\, dr,$$

are called the wavelet moments. A periodic treatment (see, for instance, [5]) is used when $S_\Gamma(r)\, r$ is integrated at the boundaries. Observe that this procedure requires variation of the angular order along with the scale index and the shifting parameter. Therefore, the number of features may be very large.
From the point of view of multiresolution analysis, such projections correspond to the so-called "details" associated with the high frequency part of the object shape.Therefore, such coefficients are usually sensitive to noise and not always appropriate.
Furthermore, the multiresolution pyramid is well-known. However, to the best of the authors' knowledge, it has not been applied to construct rotation moment invariants. Therefore, our approach includes multiresolution analysis combined with FCM clustering, the Mahalanobis distance measure and a selection procedure which eliminates the redundant features.

Accuracy of the wavelet moment invariants
As mentioned in the introduction, the spatial transformations induce some noise, even when physical noise is negligible. We show that such noise may affect a pattern recognition system drastically and therefore requires special attention when applying the feature selection. Sensitivity of the conventional moments has been mentioned repeatedly in the literature, and the rotation noise is not an exception. For instance, the error induced by rotation noise for the seventh-order Hu moment, applied to the images presented in this section, is larger than 100%. Furthermore, we show that the wavelet moments (which are thought of as less noise-sensitive features [19]) may be affected by the transformations as well. Consider the B-spline mother wavelet $\psi(r)$ in the Gaussian approximation form [9,20,26], whose parameters are $k$, $a$, $f_0$ and $\sigma_w$.
Also consider six silhouettes of aircraft [27] rotated through 360° in increments of 5° by means of standard graphics software, such as Photoshop (see Figure 1), using bilinear interpolation. In order to eliminate accumulation of errors due to multiple re-sampling, each rotation has been performed by rotating the original silhouette. Figure 2 shows a typical impact of the rotations on $S_1(r)$ and $|M_{0,1,1}|$. Let us evaluate the accuracy by measuring the standard deviation of the normalized features [6].

Table 1: The standard deviation at $\Gamma = 1$. Light gray: the best for a particular frequency; dark gray: the best for all of the frequencies.
Consider Table 1. Clearly, some values of $\psi_{m,n}$ produce poor results, since the rotation noise has been magnified [18,19]. Another reason for the poor results is that the information has been "washed out" (for instance, when $\psi_{m,n}$ is small at the peak of $S_\Gamma(r)$).
Since the rotation may shift the centroid of the object, the spatial noise may appear at low frequencies [18,19]. Therefore, a large error is also induced by the transformation noise appearing at the same frequencies as $S_\Gamma(r)$ [18,19].
Finally, we compare the rotations performed physically and by the software (see Table 1). The physical rotation may produce larger errors due to illumination, additional random noise, etc. However, in our case physically rotated objects display almost the same level of inaccuracies.
Scaling produces considerable errors, in particular when combined with rotations. Consider the wavelet moment invariants $|M^{w}_{1,0,1}|$ and $|M^{w}_{1,2,1}|$ applied to represent the scaled and rotated Alpha Jet (see Figure 4). Here the superscript $w$ denotes "wavelet." Note that $|M^{w}_{1,0,1}|$ is the most accurate feature for the Alpha Jet and $|M^{w}_{1,2,1}|$ for the MiG-29. The maximum error in $|M^{w}_{1,0,1}|$ is 2.34%, 2.96% and 4.39% for the rotated Alpha Jet scaled by 20%, 40% and 60%, respectively. The maximum error in $|M^{w}_{1,2,1}|$ is 7.94%, 11.26% and 19.96%. These errors are significant and should be considered when selecting features for pattern recognition.
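The error figures quoted in this section can be reproduced for any feature once its values over the transformed copies of an object are available. The sketch below is a minimal illustration under stated assumptions: the feature values are hypothetical, the maximum error is taken relative to the value for the untransformed object, and the normalization of the standard deviation (division by the mean) is an assumption, since the normalization of [6] is not spelled out here.

```python
import statistics


def max_relative_error(reference, values):
    """Largest deviation from the reference value, in percent of the reference."""
    return max(abs(v - reference) for v in values) / abs(reference) * 100.0


def normalized_std(values):
    """Population standard deviation divided by the magnitude of the mean."""
    return statistics.pstdev(values) / abs(statistics.fmean(values))


# Hypothetical magnitudes of one invariant: the untransformed object,
# then the same object after several rotations and rescalings.
reference = 2.00
transformed = [2.01, 1.98, 2.03, 1.96, 2.05]

print(f"max error: {max_relative_error(reference, transformed):.2f}%")
print(f"normalized std: {normalized_std(transformed):.4f}")
```

A feature whose maximum error stays small over all rotations and scales is a candidate for the selection procedure; a feature with a large error is better discarded even if it discriminates well on ideal images.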

Filter bank moment invariants
In this section, we introduce new moment invariants based on the filter bank technique. In the case of discrete orthogonal wavelets, the low-resolution coefficients may be calculated from higher-resolution coefficients by a scheme called the filter bank. The QMF is a fast algorithm first proposed by Mallat [12] and extended to the biorthogonal case by Unser et al. [2,23,24]. The approximation and the detail wavelet moments are constructed as

$$A_{m,n,\Gamma} = \sum_{k} H(k - 2n)\, A_{m+1,k,\Gamma}, \qquad D_{m,n,\Gamma} = \sum_{k} G(k - 2n)\, A_{m+1,k,\Gamma},$$

respectively, for $m = L - 1, L - 2, \ldots, 0$, where $L$ is the finest resolution level. Here $H$ and $G$ are the so-called finite impulse response filters, $A_{L,n,\Gamma} = S_\Gamma(r_n)\, r_n$ and $r_n = n/K$ for all $n = 1, 2, \ldots, K$ [2,12,23,24]. It is not difficult to demonstrate that $|A_{m,n,\Gamma}|$ and $|D_{m,n,\Gamma}|$ are rotation invariants for any $\Gamma$: the recursion is linear, so the unit-modulus phase factor introduced by a rotation is common to all coefficients and cancels in the magnitude.
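The pyramid construction can be sketched in a few lines. The example below is an illustration only and makes several assumptions: it uses the two-tap Haar filters instead of the cubic B-splines used in the experiments, it extends the signal periodically, and the samples of $S_\Gamma(r_n)$ are hypothetical. It builds the approximation and detail coefficients level by level and compares their magnitudes before and after multiplying the input by a common rotation phase.

```python
import cmath
import math

# Haar analysis filters (an assumption; the experiments use cubic B-splines).
H = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]   # low-pass
G = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]  # high-pass


def analysis_step(fine):
    """One pyramid level: filter with H and G, then keep every second sample."""
    size = len(fine)
    approx, detail = [], []
    for n in range(size // 2):
        # Periodic extension at the boundary.
        window = [fine[(2 * n + j) % size] for j in range(len(H))]
        approx.append(sum(h * x for h, x in zip(H, window)))
        detail.append(sum(g * x for g, x in zip(G, window)))
    return approx, detail


def pyramid(finest, levels):
    """Return [(A_m, D_m)] for m = L-1, L-2, ..., starting from A_L = finest."""
    result, current = [], list(finest)
    for _ in range(levels):
        current, detail = analysis_step(current)
        result.append((current, detail))
    return result


# Hypothetical samples of S_Gamma(r_n) on K = 8 rings, with r_n = n / K.
K = 8
s_gamma = [complex(math.cos(0.7 * n), math.sin(1.3 * n)) for n in range(1, K + 1)]
a_finest = [s_gamma[n - 1] * n / K for n in range(1, K + 1)]

# A rotation multiplies every S_Gamma(r_n) by the same unit-modulus factor.
phase = cmath.exp(-1j * 2 * math.radians(25.0))
a_rotated = [phase * a for a in a_finest]

features = [[abs(c) for c in a + d] for a, d in pyramid(a_finest, 3)]
features_rotated = [[abs(c) for c in a + d] for a, d in pyramid(a_rotated, 3)]
print(features[0])
print(features_rotated[0])
```

The two printed feature vectors coincide up to rounding, which is the invariance claimed above for an ideal rotation. The sketch says nothing about interpolation noise, which has to be measured on resampled images.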

Selecting the Filter Bank Invariants
Selection of features [8] is a crucial step for any shape recognition system. In general, larger feature sets do not necessarily provide better discrimination. The multiresolution moment invariants imply that for dissimilar objects the features should be taken mostly from the approximation coefficients. However, for similar objects one should employ the details. In order to find the best combination of the approximation and the detail coefficients, we first examine the features individually and discard those with a low discriminatory capability. Further selection is done by analyzing combinations of the features. The entire feature selection procedure is as follows:
1. Discard the noise-sensitive angular orders by considering the least-squares error between $S_\Gamma(r_k)$ of the objects and the class templates, where $I$ is the number of classes, $J$ the number of objects in each class and $S_\Gamma(r_k)_{i,\mathrm{Template}}$ is the circular Fourier transform of the template associated with class $i$. The resulting set $\Gamma^{*} = \{\Gamma_1, \Gamma_2, \ldots, \Gamma_L\}$ is fed to the next step of the procedure.
3. Reduce the dimension of the feature space by analyzing the features individually. We eliminate the redundant features using ANOVA [13], a powerful statistical procedure which compares the class means by analyzing the variance. We use a one-way ANOVA with a randomized complete block design to test the hypothesis $\mu_1 = \mu_2 = \ldots = \mu_i = \ldots = \mu_I$, where $\mu_i$ is the mean feature of class $i$.
4. Analyze combinations of the features. At this stage, the multiresolution analysis is combined with the FCM technique [7,17]. First of all, the features should be normalized [6] (otherwise the FCM function may be larger for a better feature set). Then, we consider all possible combinations of the features at a scale $a$: $F_a = \{M_{a,\Gamma_1}, M_{a,\Gamma_2}, \ldots, M_{a,\Gamma_L}\}$. Next, we consider combinations of the features selected from several scales $a, b, c, d, e, \ldots$. The discriminatory capability of a set is evaluated by minimizing the FCM function. A minimum of the function corresponds to a better set. As mentioned before, the filter bank moment invariants are redundant. Therefore, the algorithm involves the Mahalanobis distance measure [1]. Finally, once an appropriate feature set has been selected, the classification templates are automatically found as the centroids of the FCM clusters.
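The role of the Mahalanobis distance in the last step can be illustrated with a small fuzzy C-means sketch. Everything specific in it is an assumption made for illustration: the two-dimensional feature values are hypothetical, the metric is the inverse of the pooled within-class covariance of labeled training data, the fuzziness exponent is 2, and the initial centers are one training point per class. The two classes are strongly correlated (redundant) features that differ only by a small offset.

```python
def pooled_covariance(classes):
    """Pooled within-class covariance (sxx, sxy, syy) of labeled 2-D features."""
    sxx = sxy = syy = 0.0
    count = 0
    for points in classes:
        mx = sum(p[0] for p in points) / len(points)
        my = sum(p[1] for p in points) / len(points)
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        count += len(points)
    return sxx / count, sxy / count, syy / count


def inverse_2x2(sxx, sxy, syy):
    """Inverse of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    return syy / det, -sxy / det, sxx / det


def squared_distance(p, c, metric):
    """(p - c)^T W (p - c) for the symmetric matrix W = [[a, b], [b, d]]."""
    a, b, d = metric
    dx, dy = p[0] - c[0], p[1] - c[1]
    return a * dx * dx + 2.0 * b * dx * dy + d * dy * dy


def memberships(points, centers, metric, q):
    """Standard FCM membership update for fuzziness exponent q."""
    rows = []
    for p in points:
        d2 = [max(squared_distance(p, c, metric), 1e-12) for c in centers]
        rows.append([1.0 / sum((di / dj) ** (1.0 / (q - 1.0)) for dj in d2) for di in d2])
    return rows


def fcm(points, centers, metric, q=2.0, iterations=30):
    """Plain fuzzy C-means with a fixed metric: centers, memberships, objective."""
    for _ in range(iterations):
        u = memberships(points, centers, metric, q)
        centers = []
        for i in range(len(u[0])):
            w = [row[i] ** q for row in u]
            centers.append((
                sum(wk * p[0] for wk, p in zip(w, points)) / sum(w),
                sum(wk * p[1] for wk, p in zip(w, points)) / sum(w),
            ))
    u = memberships(points, centers, metric, q)
    objective = sum(
        u[k][i] ** q * squared_distance(p, c, metric)
        for k, p in enumerate(points)
        for i, c in enumerate(centers)
    )
    return centers, u, objective


# Hypothetical training features: two elongated, nearly parallel classes.
ts = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
es = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1]
class_a = [(t, t + e) for t, e in zip(ts, es)]
class_b = [(t, t + 2.0 + e) for t, e in zip(ts, es)]
points = class_a + class_b
start = [class_a[0], class_b[-1]]

mahalanobis = inverse_2x2(*pooled_covariance([class_a, class_b]))
euclidean = (1.0, 0.0, 1.0)

_, u_m, _ = fcm(points, start, mahalanobis)
_, u_e, _ = fcm(points, start, euclidean)
print("Mahalanobis labels:", [row.index(max(row)) for row in u_m])
print("Euclidean labels:  ", [row.index(max(row)) for row in u_e])
```

With the Mahalanobis metric the elongated classes become compact and the clusters follow the class labels. With the Euclidean metric the clusters may instead split the data along the common elongation, mixing the two classes. The objective values under the two metrics are not directly comparable, which is one reason the features are normalized before feature sets are compared.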

Experimental Results
We evaluate the performance of the proposed algorithm by means of two data sets. The first data set consists of 37 500 noisy images [27] based on fifteen basic aircraft silhouettes: Alpha Jet, Am-X, Jaguar, Hawk, An-12 Cub, An-24 Coke, An-32 Cline, C-130 Provider, C-137 Hercules, G-222, MB-326, MB-339A, MiG-29, MiG-17 and Jastreb. Each silhouette produces 1 600 training images and 900 testing images. Our second data set, based on the online database NIST [15], consists of machine-printed characters, namely, 9 000 Roman capitals (bold, courier). We use 5 800 letters for training and 3 200 for testing. Both data sets are degraded by impulse noise varying from 1% to 8% and by transformation noise. We perform the experiments via the filter bank moment invariants obtained by means of cubic B-splines. The orthogonal Daubechies wavelets of order 6, 4 and 2 and the Coiflet wavelets have been tested as well. Although the orthogonal wavelets easily allow reconstruction of the image, the B-splines always seem to perform slightly better. We also analyzed the wavelet moment invariants introduced by Shen [20]. As mentioned before, Shen's invariants, obtained by projecting $\psi_{m,n}(r)$ onto $S_\Gamma(r)\, r$, correspond to $D_{m,n,\Gamma}$.
We denote our proposed algorithm by QMF-FCM-M in the case of the Mahalanobis distance and by QMF-FCM-E in the case of the Euclidean distance. We use the notation FCM to indicate our feature selection algorithm and the notation I if the features were selected individually (see, for instance, [20]). For example, Shen-I-E means "Shen's invariants with individual selection in terms of the Euclidean distance."

The comparison of the average classification rate of the proposed QMF-FCM-M with that of the most popular moment invariants is shown in Table 2. Table 2 includes degradation by all types of noise: rotation, translation, scaling and random noise. In addition, in the case of NIST, we consider an interesting effect of the boundary noise appearing after separation of touching letters by means of dilation. Consider Tables 2 and 3. Shen-I-E applied to the NIST symbols exhibits an 88.4% average recognition rate, whereas our method achieves a 94% recognition rate. The table shows that every component of the algorithm is almost equally important (namely, combining the QMF with the FCM shows a 1.5% increase, whereas the Mahalanobis distance increases the recognition rate further by 2%). Differentiation by the intensity and the type of noise given in Tables 1-6 reveals that our algorithm almost always outperforms Shen's invariants, in particular when they are based on the individual feature selection and the Euclidean distance. The efficiency of the algorithm compared to that of the preceding techniques becomes apparent when increasing the noise intensity. The most impressive result is an almost 28% absolute increase (45% relative increase) with regard to the Fourier-Mellin invariants in the case of the aircraft silhouettes degraded by 6-8% impulse noise and rotation noise.

Tables 5 and 6 exemplify the experiments with the NIST printed characters. Note that the rotation and segmentation noise has a more significant impact on the characters, since the centroids of the characters often lie outside the character body. Consequently, the centroids are much more sensitive to the noise. However, for 4.5-6% noise and a combination of the impulse and the transformation noise, Shen's invariants, the Zernike invariants and the Fourier-Mellin invariants achieve 51.5%, 46.7% and 37.8% recognition rates respectively, whereas the filter bank invariants achieve a 67.5% recognition rate.
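Since the text reports both absolute and relative increases, it may help to make the arithmetic explicit for the rates quoted above for 4.5-6% noise. The helper name `gains` is ours; the numbers are the recognition rates from the preceding paragraph.

```python
def gains(baseline, proposed):
    """Absolute gain in percentage points and relative gain in percent."""
    absolute = proposed - baseline
    return absolute, 100.0 * absolute / baseline


# Recognition rates (percent) quoted in the text for 4.5-6% noise.
filter_bank = 67.5
baselines = {"Shen": 51.5, "Zernike": 46.7, "Fourier-Mellin": 37.8}

for name, rate in baselines.items():
    absolute, relative = gains(rate, filter_bank)
    print(f"{name}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

For these rates the absolute gains are 16.0, 20.8 and 29.7 percentage points, which correspond to relative gains of roughly 31%, 45% and 79%.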

Conclusions
The proposed filter bank invariants extend the idea of applying wavelets for rotation invariant pattern recognition. Our approach, based on the analysis of the high and the low frequency filter bank coefficients combined with elimination of the redundant features, always seems to lead to a tangible improvement in the recognition rate when compared to those of conventional methods. For instance, we obtain an increase of approximately

Figure 2(b) shows that rotations may have a drastic impact on the wavelet invariants. For instance, in our case the error of the invariant $|M_{0,1,1}|$ varies from 0 to 15.71%, with the maximum produced by the Alpha Jet rotated by 120°. The accuracy analysis reveals a good performance of $|M_{1,0,1}|$, with a maximum error of approximately 1.17% for the Alpha Jet rotated by 25°. It should be noted that the extrema of $\psi_{1,0}(r)$ and $|S_1(r)|$ are close, which results in an accurate and representative feature (see Figure 3(b)).

The Mahalanobis distance makes it possible to eliminate redundant features. It also provides better separability. Consider the scattergrams $|A_{3,13,1}|/|A_{3,14,1}|$ associated with the two similar aircraft Alpha Jet and Am-X, and $|A_{2,10,1}|/|A_{2,11,1}|$ associated with the capital letters O and Q, in Figures 5(a) and 6(b) respectively. The training classes form an elliptical shape. The circles representing the classes in terms of the Euclidean distance overlap. However, the two ellipses representing the Mahalanobis distance are separable. Moreover, even if the sets are separable in both metrics, the Mahalanobis metric usually requires fewer FCM iterations (see Figures 6(a) and 6(b)).

Figure 5: Scattergrams of Alpha Jet and Am-X and the characters O and Q corrupted by 0-2% impulse noise. (a) Aircraft. (b) Characters.

Table 4: Aircraft images, impulse noise combined with rotation and scaling.

Table 5: The NIST characters, impulse noise and segmentation noise.

Table 6: The NIST characters, impulse noise and transformation noise.