This paper proposes the method to aggregate multi-similarity measures based on relevance feedback in CBIR. In this approach, the overall similarity measure between the two
images is calculated by non-additive aggregation from different individual similarity measure based on Choquet integral. The important weights are used in Choquet integral, which
are defined not only for each individual similarity, but also for all the aggregations between
them. The weights are estimated by the Simplex algorithm for optimization linear objective
function based on user’s feedback information, therefore, the overall similarity measure is
more suitable than the target user’s query. Experimental results on two benchmark datasets
Oliva and Corel have shown the effectiveness of the proposed approach compared to two
methods [13] and [4]. Despite promising results, identifying the most informative images for
user to label, which is one of the critical issues in relevance feedback has not been addressed
in the proposed method yet. The authors plan to incorporate active learning methods in the
proposed to more effectively exploit the feedback information of users.
19 trang |
Chia sẻ: huongthu9 | Lượt xem: 473 | Lượt tải: 0
Bạn đang xem nội dung tài liệu Learning interaction measure with relevance feedback in image retrieval, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Journal of Computer Science and Cybernetics, V.32, N.2 (2016), 113–131
DOI no. 10.15625/1813-9663/32/2/8605
LEARNING INTERACTION MEASURE WITH RELEVANCE
FEEDBACK IN IMAGE RETRIEVAL
NGO TRUONG GIANG1, NGO QUOC TAO2, NGUYEN DUC DUNG2, NGO HOANG HUY2
1Faculty of Information Technology, HaiPhong Private University;
2Institute of Information Technology, Vietnam Academy of Science and Technology;
giangnt@hpu.edu.vn,{nqtao,nddung,nhhuy}@ioit.ac.vn
Abstract. Relevance feedback is an effective approach to bridge the gap between low-level feature
extraction and high-level semantic concept in content-based image retrieval (CBIR). In this paper,
the authors further improve the use of users feedback with multi-feature query and the Choquet
integral. Taking into account the interaction among feature sets, feedback information is used to
adjust the feature’s relevance weights that are considered as the fuzzy density values in the Choquet
integral to define the overall similarity measure between two images. The feature weight adjustment
and integration aims at minimizing the difference between users desire and outcome of the retrieval
system. Experimental results on several benchmark datasets have shown the effectiveness of the
proposed method in improving the quality of CBIR systems.
Keywords. Content-based image retrieval, relevance feedback, linear programming, fuzzy mea-
sures, Choquet integral.
1. INTRODUCTION
Image retrieval has become an important research topic in multimedia applications and attracted
attention of many researchers in last decades. Basically, there are two main approaches in building
an image retrieval system: text-based and content-based [22, 11, 17]. In text-based image retrieval
systems, the users’ queries are composed by key-words, which describe image content. The system
retrieves images using image labels which are annotated manually. However, the difficulties in an-
notating a massive number of images and the avoiding subjectively labeling make this framework
impractical in many situations.
In contrast to the keywords description, content-based image retrieval (CBIR) uses visual fea-
tures such as colors, textures, patterns and shapes to describe image content. Its advantage over the
keyword based image retrieval lies in the fact that the features are automatically extracted and the
image’s content. However, the most challenging in CBIR is the semantic gap between human under-
standing (high level concepts) and machine understanding (low-level features) [22]. Many algorithms
have been proposed to describe low level features of image and compute the similarity between them
[5, 31, 32, 14, 37, 8], the semantic gap has not yet been satisfactorily resolved and the performance
of CBIR is still far from the expectations of the user.
One possible solution to narrow down the semantic gap is integrating human interaction into
CBIR systems, which is popularly known as Relevance Feedback (RF) [34, 30]. Basically, RF is a su-
pervised learning technique that aims to improve the retrieval performance by learning the subjective
c© 2016 Vietnam Academy of Science & Technology
114 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
perception of users about the similarity between the images. According to the method, the system
needs to be run through several iterations. For each iteration, the system retrieves a series of images
that are ordered according to a pre-defined similarity measure, and it requires user’s interaction to
mark the relevant and non-relevant retrievals [2]. These data are then used to update the reformu-
lated query vector for helping next query results that are being nearer the user’s requirement [21].
Therefore, the query model used in the system should be designed to be flexibly enough so that it
can be dynamically adjusted to approximate a variety of similarity adjustments encountered in query
process.
Multi-feature query paradigm used in CBIR system aims to mine the complementary information
of multiple features for achieving strong generalization. In this paradigm, each image is represented
by a multi-dimensional feature vector, and each element of the feature vector is assigned the different
weights which based on the feedback of users. These weights are then used for aggregating to produce
a definitive result that may consist of a composite similarity or score value [3]. Many methods for
constructing an aggregation of similarities have been developed. Rui et al. [26] proposed an interactive
retrieval approach in which, each image is represented by vectors with the corresponding weights that
represent the relative importance of each feature within a vector as well as their importance across
the entire data set. These weights are dynamically updated based on user’s subjective perception
so as to place more weight on relevant elements and less on irrelevant ones. Iqbal et al. [19] used
a technique based on Gaussian normalization to process distances of color, texture and structure
measures. Then, a global measure is defined as a weighted linear aggregation of the normalized
distances. In [10], a set of weights for each descriptor is derived based on genetic algorithm, then
they are used to combine image database descriptors. Knowles et. al. [35] have defined a global
measure as an optimal linear aggregation of partial similarity functions by using the multi-objective
optimization technique by a Pareto Archive Evolution Strategy [20]. In [3], a new similarity measure
is designed based on assuming statistical independence, which combines the results obtained with
each independent function. This approach is expanded in [4] by generating a straightforward rule
that can be easily implemented. The negative feedback is used to identify the context of positive
example selection as well as disambiguate the concept being searched. The key idea behind above
methods is to change the weight of each image feature. However, the mentioned methods consider
the features in terms of independent elements. And the additive operators as the weighted sum are
often used to produce an overall similarity measure. Therefore, they are not flexible enough to model
queries when several of features may be important to the user only if some other feature is present
or absent.
A solution to overcome the above problems is to use the Sugeno measure [25] with the Choquet
integral (CI) [24]. In [13], the Sugeno measure is used to define the importance of each component
feature and the Choquet integral is used to estimate the similarity between images. However, a
limitation of Sugeno measure is considered all pairs of features interact the same way which is not
appropriate in the context of retrieval using multiple feature spaces.
In this paper, the Choquet integral is used as a similarity measure, and a effective algorithm is
proposed to learn the relevant weight of the features based on the relevance feedback. The user’s
feedback information is modeled by a fuzzy set used to learn relevant weight of the features. These
weights are then used as the fuzzy density functions in Choquet integral to aggregate individual
similarity measures in order to produce overall similarity measure. The learning relevant weight
of the features is stated in the least absolute deviation sense that is solved by linear programming
techniques. This allows us to estimate the relevant weights not only to each feature, but also to
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 115
their aggregations that represents query image more closely to user’s target. Experimental results
on several benchmark datasets show that the proposed method effectively improve the quality of
the CBIR system. The paper is organized as follows. The brief of discrete Choquet integrals and
some related concepts are presented in the Section 2. Section 3 presents detailed descriptions of the
proposed method for combining the similarity measures based on relevance feedback. The empirical
results are presented in section 4. Finally, some conclusions and future research directions are drawn.
2. FUZZY MEASURES AND CHOQUET INTEGRAL
In this section, the basic concepts of the Choquet integral is recalled. This is a tool for data
fusion, which is documented, e.g. in [28, 16, 15, 25].
2.1. Fuzzy measures
Definition 1. [28] Let X be a set, X 6= ∅. A fuzzy measure (also called capacity) on X is a
set function g : 2X → [0, 1] satisfying the following conditions:
• Boundary conditions: g (∅) = 0, g (X) = 1;
• Monotonicity: g (A) ≤ g (B) ∀A ⊂ B ⊂ X ;
• Continuity: if {Ai}1≤i satisfies Ai ⊂ X,Ai+1 ⊂ Ai, ∀i, then
lim
x→∞ g (Ai) = g
( ∞⋃
i=1
Ai
)
.
Fuzzy measures represent the importance of the set of properties X. They are defined
not only in every property separately but also for all the aggregations among the properties
of the set X.
Definition 2. [25] A fuzzy measure is a Sugeno measure (or a λ−fuzzy measure) if satisfying
the following additional conditions:
gλ (A ∪B) = gλ (A) + gλ (B) + λgλ (A) gλ (B) .
For all A,B ⊂ X,A ∩ B = ∅, and when X = {x1, x2, . . . , xn} is a finite set, X 6= ∅,
λ > −1, λ 6= 0, is determined by solving the following equation:
(λ+ 1) =
n∏
i=1
(
1 + λg(i)
)
(1)
where, g(i)
def
= g({xi}). Thus, the fuzzy measure of a subset A ⊂ X can be inferred from the
following formula:
g({A}) = −1 +
∏k
i=1(1 + λg
(i))
λ
(2)
where, A = {x1, x2, . . . , xk}.
116 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
Definition 3. [16] Let X be a finite set, X 6= ∅. The Mo¨bius transform of a given fuzzy
measure g is a set function Mg : 2X → [0, 1] defined by
A ⊂ X 7→ Mg(A) =
∑
B⊂A
(−1)|A\B| g(B). (3)
The Mo¨bius transformation is invertible. The fuzzy measure restored by its inverse
transformation, called Zeta transform
g(A) =
∑
B⊂A
Mg(B),∀A ⊂ X. (4)
The boundary and monotonicity conditions of of a fuzzy measure are explained, respectively,
as follows:
• Mg (∅) = 0 and ∑A⊂XMg(A) = 1;
• ∑B⊂A|i∈BMg(B) ≥ 0,∀A ⊂ X, i ∈ A.
Definition 4. [23] A fuzzy measure g is called K-additive if
• ∀A ⊂ X,M(A) = 0 if card(A) > K;
• ∃A ⊂ X such that card(A) = K and Mg(A) 6= 0.
A K-additive fuzzy measure is defined by only (n1) + (n2) + · · · + (nk) coeficients instead of
2n − 1 of a general fuzzy measure g, where n = card(X).
2.2. Choquet integral
Choquet integral [24] is the most common form of fuzzy integral which is an integral of
a real function with respect to a fuzzy measure [9].
Definition 5. [25] Let f : X → [0, 1] be a function where f(x) denotes the confidence value
of x. The Choquet integral of f with respect to the fuzzy measure g can be defined as:
Cg (f)
def
=
∫
x
f (x) ◦ g def=
∫ 1
0
g (Aα) dα (5)
where, ∀α ∈ [0, 1], Aα def= {x ∈ X |f (x) ≥ α}. In caseX is a discrete set, the discrete Choquet
integral with respect to a fuzzy measure g is defined as
Cg (f) =
n∑
i=1
[
f (xti)− f
(
xti−1
)]
g (Ati) (6)
such that 0 ≤ f (xt1) ≤ f (xt2) ≤ ... ≤ f (xtn) , f (xt0) = 0, Ati
def
= {xti , ..., xtn}.
With the help of the Mo¨bius transformation, Choquet integral can be expressed as
Cg (f) =
∑
A⊂X
Mg (A) min
i∈A
f(xi). (7)
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 117
3. THE PROPOSED METHOD
In this section, the integral Choquet is used as aggregation operator to produce the
overall similarity measure from individual similarity measures. The first, the form property
of Choquet integral is clarified by the following statement:
Proposition 1.
(i) If gλ is a λ−fuzzy measure, then gλ is monotonicity, ∀A ⊂ B ⊂ X ⇒ gλ(A) ≤ gλ(B),
and A 6= Ø, A ⊂ X,A is finite⇒ gλ (A) =
−1 + ∏
a∈A
(1 + λgλ ({a}))
λ
.
(ii) If f1, f2 : X → [0, 1], f1(x) ≤ f2(x),∀x ∈ X, g is the fuzzy measure on X then
Cg(f1) ≤ Cg(f2).
Special, inf(f) ≤ Cg(f)≤ sup(f),∀f : X → [0, 1]. This property demonstrates that
when X is a finite set, the Cg is an aggregation operator.
(iii) X =
{
xi
∣∣i = 1, n}, g is the fuzzy measure on X, f : X → [0, 1] is a function and
i1,i2,. . . ,in is a permutation of 1, 2.., n such that: 0 = f(xi0) ≤ f(xi1) ≤ f(xi2) ≤ ... ≤
f(xin),∀k = 1, n,Aik
def
= {xik , ..., xin} , Ain+1
def
= ∅, then
n∑
k=1
(
f(xik)− f(xik−1)
)
g(Aik) =
n∑
k=1
f(xik)
(
g(Aik)− g(Aik+1)
)
.
Proof:
(i) A ⊂ B ⇒ B = A ∪B\A⇒ gλ(B) = gλ(A) + gλ(B\A) + gλ(B)gλ(B\A)
=gλ(A) + gλ(B\A) {1 + λgλ(A)} ≥ gλ(A) by 0 ≤ gλ(B\A), 0 ≤ gλ(A) ≤
1, λ > −1.
Rewrite the Sugeno equation:
∀A,B ⊂ X,A ∩B = ∅, gλ(A ∪B) = gλ(A) + gλ(B) + λgλ(A)gλ(B)
⇒ (1 + λgλ(A ∪B)) = (1 + λgλ(A)) (1 + λgλ(B))
⇒ (1 + λgλ ({a1, a2, .., ai})) = (1 + λgλ ({a1, a2, .., ai−1})) (1 + λgλ ({ai})) ,
∀i = k, k − 1, ..., 2
⇒ 1 + λgλ ({a1, a2, .., ak}) =
k∏
i=1
(1 + λgλ ({ai}))
⇒ gλ ({a1, a2, .., ak}) =
−1 +
k∏
i=1
(1 + λgλ ({ai}))
λ
(ii) ∀α ∈ [0, 1], {x ∈ X |f1(x) ≥ α} ⊂ {x ∈ X |f2(x) ≥ α} do f1(x) ≤ f2(x)∀x ∈
X
⇒ g ({x ∈ X |f1(x) ≥ α}) ≤ g ({x ∈ X |f2(x) ≥ α})∀α ∈ [0, 1]
⇒ Cg(f1) =
∫ 1
0
g ({x ∈ X |f1(x) ≥ α}) dα ≤
∫ 1
0
g ({x ∈ X |f2(x) ≥ α}) dα = Cg(f2)
118 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
f : X → [0, 1],∀x ∈ X, f(x) = inf(f), f : X → [0, 1],∀x ∈ X, f(x) = sup(f)
inf(f) = Cg(f) ≤ Cg(f) ≤ Cg(f) = sup(f)
(iii) ∀k = 1, n, ak def= f(xik)− f(xik−1), s0 = 0, sk
def
=
∑k
l=1 al ⇒ sk = f(xik)
n∑
k=1
(
f(xik)− f(xik−1)
)
g(Aik) =
n∑
k=1
akg(Aik) =
n∑
k=1
skg(Aik)
= f(xin)g(Ain+1) +
n∑
k=1
sk
{
g(Aik)− g(Aik+1)
}
(Abel′s formular )
=
n∑
k=1
f(xik)
{
g(Aik)− g(Aik+1)
}
.
When g is the additive fuzzy measure,
Cg(f) =
n∑
k=1
f(xik)
{
g(Aik−1)− g(Aik)
}
=
n∑
k=1
f(xik)g({xik}) =
n∑
i=1
f(xi)g({xi}).
Remaks:
• Formula gλ (A) in (i) demonstrates that all pairs of features interact the
same way, is a polynomial of λ in which coefficients are importance of in-
dividual feature.
A = {a1, a2, .., ak} ⇒
gλ(A) =
k∑
i=1
gλ ({ai})+λ
k−1∑
i1=1
k∑
i2=i1+1
gλ ({ai1})gλ ({ai2})+...+λk−1
k∏
i=1
gλ ({ai}) .
• Right side of the equation (iii) means that, if new information that xik con-
tribution is less, the measure should be chosen so that g(Aik)−g(Aik+1) ≈ 0.
3.1. Modeling the overall similarity using the discrete Choquet Integral
Let F = {ft}Tt=1 be a set of n feature spaces which represent an image used by CBIR
system. Here, ft is a multi-dimension vector of feature space which can be modelled by
several representations, e.g. Color Histogram, Grid Color Moment, Local Binary Pattern,
Gabor Wavelets Texture and Edge Histogram. Let F q and F j be two feature vectors of two
images q, j, respectively.
Definition 6. The similarity measure with respect to multi-feature between two images q and
j, is denoted as SimT (q, j) and is defined as follows:
SimT (q, j) = Φ
({
Simt
(
f qt , f
j
t
)
|t = 1...T
})
(8)
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 119
where, T is number of feature spaces, Simt
(
f qt , f
j
t
)
is similarity measure between query
image and image j according to tth feature space, Φ is a paradigm for aggregating the
similarity measure of individual feature spaces into the overall similarity measure. This
definition is general for using many types of different feature spaces. The most popular
aggregation paradigm, which is used in CBIR systems, is the weighted sum. This paradigm
is defined as follows:
Φ
({
Simt
(
f qt , f
j
t
)
|t = 1...T
})
=
T∑
t=1
λt Simt
(
f qt , f
j
t
)
(9)
where,
∑T
t=1 λt = 1, and λt > 0, ∀t = 1, T .
In practice, when users observe the images, they can recognize the similar between two
images in either an individual feature or in some of aggregating features. In addition, a
similar of individual feature is only important for the user if similarity with respect to some
other feature appears, it means that, there is a mutual interaction in each other features.
Thus, above model is not flexibly enough for such query paradigms. For example, assuming
that there has three images I1, I2, I3 and their overall similarities are evaluated based on
three feature namely color, texture and shape, and the weight is assigned on each feature
to be the same. Using the aggregation paradigm in Eq(9), the overall similarity measure of
images is shown in Table 1. In this results, image I3 has the lowest rank. However, if the user
wants to search for a well-balanced both shape and texture similarity, image I3 should be
considered as the first rank. This will be difficult to achieve by additive operators, although
the λ changes with any values.
Table 1. The overall similarity using weighted sum operator
(with weights λ(color) = λ(texture) = λ(shape) = 1/3)
Image Color Texture Shape Overall Similarity
I1 0.72 0.68 0.44 0.613
I2 0.60 0.52 0.72 0.613
I3 0.56 0.60 0.60 0.587
In this case, the weights should be assigned to both the individual feature and their
subsets. For above example, the weight assigned to the aggregation of two featuresbreak
{texture, shape} should be bigger than or equal to the sum of their individual weights, i.e.,
λ {texture, shape} ≥ λ {texture}+ λ {shape} .
Additionally, the weights assigned to the rest combination should be smaller or equal to
the sum of their individual weight because they do not receive much interest of users, i.e.,
λ {color, texture} ≤ λ {color}+λ {texture}, similarly, yielding λ {color, shape} ≤ λ {color}+
λ {shape}. This reasoning can be easy to model by fuzzy measures [15]. Hence, the Choquet
integral will be effective solution for the query paradigms.
In this work, Choquet integral with fuzzy measure is used to define aggregation paradigm
Φ. The similarity of features are considered as source information, and the relevance weight
120 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
of features is considered as fuzzy density function. These are then used in Choquet integral
model to compute the overall similarity measure. Thus Eq(8) can be rewritten as follows:
SimT (q, j) = Choquetµ(q, j)
def
=
T∑
i=1
(
Simt(i)
(
f qt , f
j
t
)
− Simt(i−1)
(
f qt , f
j
t
))
× µ ({fi, ..., fT })
(10)
where {ti}1≤i≤T , it means that, the indexes of features set are permuted in such a way that:
Simt1
(
f qt , f
j
t
)
≤ ... ≤ Simt(T )
(
f qt , f
j
t
)
, and convention Simt0
(
f qt , f
j
t
)
= 0. The notation
µ(.) presents the relevance weights of features which will also serve as the with fuzzy measures
under the constraints as:
{µ(A)}A∈2{1,2,..,T} ,∀A ∈ 2{1,2,...,T}, 0 ≤ µ(A) ≤ 1
µ(∅) = 0, µ ({1, 2, ...T}) = 1
∀A ∈ 2{1,2,...,T}, A 6= ∅, max
B⊂A,|B|=A−1
{µ(B)} ≤ µ(A).
(11)
Using K-additive fuzzy measure in Mo¨bius representation, formula Eq(10) is able to be
rewritten as follows:
SimT (q, j)
def
=
∑
A⊆{1,...,T}
MµA mint∈A
(
Simt
(
f qt , f
j
t
))
(12)
where Mµ(A) satisfying constrains:
∑
B⊆A,|B|≤KMµB ≥ 0
∀A ⊆ {1, 2, ..,T}, |A| > 1, ∀t ∈ A,Mµ{t} ≥ 0∑
B⊆{1,2,...,T},|B|<KMµB = 1.
(13)
Considering above example, suppose the individual importance of each feature is assigned
by fixed values where µ {color} = µ {texture} = µ {shape} = 0.3, and the importance on
the aggregation of features is assigned by:
µ {textue, shape} = 0.9 ≥ µ {texture}+ µ {shape} ,
µ {color, texture} = 0.4 ≤ µ {color}+ µ {texture} ,
µ {color, shape} = 0.4 ≤ µ {color}+ µ {shape} .
According to Definition 1, µ {∅} = 0, µ {color, texture, shape} = 1. The fuzzy measure
values and similarity measures in Table 1 are substituted into Eq(10) to compute the overall
similarity. The result is summarized as shown in Table 2. Based on this result, we see, image
I3 is identified as the best similar image which fit user’s expectation.
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 121
Table 2. The overall similarity using integral Choquet
Image Overall similarity measure Ranking
I1 Step1: Sorting the feature’s similarity measure in ascending order:
Sims<Simt<Simc.
Step2: Calculating the overrall similarity measure according using
Eq(10)
Cg(color, texture, shape)= Sims*g{color,texture,shape}+
(Simt-Sims)*g{texture,color}+(Simc-Simt)*g{color}
=0.44*1+(0.68-0.44)*0.4+(0.72-0.68)*0.2
=0.548
3
I2 0.588 2
I3 0.596 1
3.2. Learning features’s relevance weight
To use the Choquet integral for aggregating similarity, it is necessary to identify the
weight of all subsets of feature space. However, it is difficult for estimating 2n values of
fuzzy measure, especially when the number of features, n is sufficiently large [27]. Many
methods have been proposed to identify fuzzy measure values, the most of them are stated
under of the form of an optimization problem: see [7] for detail. However, finding a solution
via these methods in the case of a very large number of parameters and constraints still
remains as an obstacle. In general, the usability of each identification method depending on
several factors: types of inputs, number of inputs and number of fuzzy measure values that
need to be identified. In this work, the learning relevance weight of features is formulated in
the context of linear programming problem, which can be solved quickly and reliably.
Let S+, S− be two sets of image related and unrelated, which labeled by user via relevance
feedback. In this work, it is needed to estimate relevance weights µ of candidate subsets of
features so that the value of the overall similarity match the desired outputs yk as close
as possible, i.e., that the images in S+are ranked in closest to query image, whereas the
images in S− are ranked in farthest to query image. For this purpose, the learning feature’s
relevance weight can be formulated as the least absolute deviation problem:
minimize
K∑
k=1
|Choquetµ(q, j)− yk| (14)
subject to various constraints on µ.
Let yk =
{
y+k , y
−
k
}
be the output values which are defined by
y+k = max1≤t≤T
Simt(f
q
t , f
j
t ), Ik ∈ S+,
y−k = min1≤t≤T
Simt(f
q
t , f
j
t ), Ik ∈ S−.
(15)
122 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
The problem (14) can be restated as follows:
minimize
∑
p∈S+
max
(
y+p − Choquetµ (p, j) , 0
)
+
∑
p∈S−
max
(
Choquetµ (p, j)− y−p , 0
) (16)
subject to various constraints (11) on µ. However, these conditions do not reduce the com-
plexity of the least absolute deviation problems. It is possible to reduce the complexity of
the problem when working in Mo¨bius representation [6].
By using K-additive fuzzy measure in Mo¨bius representation, the problem (16) is trans-
formed into a simplified optimization problem:
minimize
∑
p∈S+
max
y+p − ∑
A||A|≤K
MµAhA (p) , 0
+
∑
p∈S−
max
∑
A||A|≤K
MµAhA (p)− y−p , 0
(17)
subject to various constraints:
∑
B⊆A,|B|≤KMµB ≥ 0
∀A ⊆ {1, 2, ..,T}, |A| > 1, ∀t ∈ A,Mµ{t} ≥ 0∑
B⊆{1,2,...,T},|B|<KMµB = 1
(18)
where
hA (p) = min
t∈A
(Simt (f
q
t , f
p
t )) ,
Set
e+p = y
+
p −
∑
A||A|≤K
hA(p)MµA, p ∈ S+,
e−p =
∑
A||A|≤K
hA(p)MµA,−y−p , p ∈ S−.
By using auxiliary variables e+p , e
−
p , the problem (17) is transformed into:
minimize
K∑
p=1
e+p + e
−
p (19)
subject to constraints:
e+p −
∑
A||A|≤K hA(p)MµA ≥ −y+p ,∀p ∈ S+,
e−p +
∑
A||A|≤K hA(p)MµA ≥ y−p ,∀p ∈ S−,∑
B⊆A,|B|≤KMµB ≥ 0,
∀A ⊆ {1, 2, ..,T}, |A| > 1, ∀t ∈ A,Mµ{t} ≥ 0,∑
B⊆{1,2,...,T},|B|<KMµB = 1,
e+p ≥ 0, e−p ≥ 0.
(20)
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 123
This is a linear programming problem which is very convenient to solve by using several
standard methods as Simplex method. The procedure for calculating overall similarity mea-
sure is described in algorithm 1, and the proposed method for content-based image retrieval
is described in algorithm 2.
Algorithm 1 Aggregation similariy measure based on Choquet integral
Input: Set of m fuzzy measure in Mo¨bius representation Mob =
{MµA, ∀A ⊆ {1, 2, ..,T}},
set of T similarity measures X =
{
Simt
(
f qt , f
j
t
)}
|t = 1...T.
Output: aggregation similarity measure S.
1: Initialisation: S = 0;
2: for j = 1 to m do
3: for i = 1 to T do
4: r=0;
5: if bit i of j (from right to left) is 1 then
6: r = min(r,X[i]);
7: end if
8: end for
9: if r > 1 then
10: r = 0;
11: end if
12: S = S +Mob[j] ∗ r;
13: end for
14: return S;
4. EXPERIMENTS
4.1. Image database
In this section, the efficiency of proposed method is we evaluated on two sets of data as
ImgDB1 and ImgDB2 is evaluated. The ImgDB1 is set of Caltech101 data [12] including
9144 images classified into 101 classes, there are from 40 to 800 images on each one. The
ImgDB2 is subset of set of Corel data [29] including 15000 images classified into 150 classes
with different semantic topics. There are 100 images on each one. They are benchmark data
set that are often used to evaluate the efficiency in CBIR systems [11, 29].
4.2. Feature extraction
In CBIR system in general, each image is often presented by main three features: Color,
Texture and Shape [22, 11]. In the experiment, five descriptions for these features in [36]
are used, that are: a 9-dimension-vector for Grid Color Moment (GCM), a 59-dimension-
vector for Local Binary Pattern (LBP), a 120-dimension-vector for Gabor Wavelets Texture
(GWT), a 37-dimension-vector for Edge Histogram (EH) and a 512-dimension vector for
GIST feature. All of feature descriptions have been normalized according to method [1] so
that their each element has value in range [0, 1]. The similarity measure between pairs of
124 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
Algorithm 2 CBIR-Choquet (Proposed method for content-based image retrieval)
Input: Query image xq ={f qt }1≤t≤T , Feature database DB=
{
xj
}
1≤j≤N , x
j =
{
f jt
}
1≤t≤T
,
where, f qt , f
j
t ∈ [0, 1] .
Output: Result: Image list are arranged in the similarity measure to query image
1: Initialisation: g(Simt) =
1
T ∀t = 1, T ;S+ = ∅;S− = ∅;
% Ranking without relevance feedback
2: for j = 1 to N do
3: Compute the similarity between the query image and every image in DB according to
feature spaces.
{
Simt
(
f qt , f
j
t
)}
|t = 1...T
4: Compute overall similarity between the query image and every image in DB
SimTDB(j) = SimT (q, j); using Eq(9)
5: end for
6: Sort SimTDB(j), 1 ≤ j ≤ N ; in descending order
7: Normalize
{
Simt
(
f qt , f
j
t
)}
|t = 1...T , s.t Simt
(
f qt , f
j
t
)
∈ [0, 1].
% In each kth round of relevance feedback
8: while User not satisfied do
9: Identify sets of S+k ,S
−
k from user’s feedback:
S+ = S+ ∪ S+k ;S− = S− ∪ S−k ;
10: Compute the y+k , y
−
k ; using Eq(15)
11: Compute the MµA; by solving the problem (19).
12: for j = 1 to N do
13: Compute the overall similarity
SimTDB(j) = Choquet(
{
Simt
(
f qt , f
j
t
)}
|t = 1...T,MµA); by algorithm 1
14: end for
15: Result= Sort SimTDB(j), 1 ≤ j ≤ N ; in descending order
16: end while
17: return Result;
image of each description is calculated by using Euclidean distance. The descriptions used
in experiment and their properties are shown in Table 3.
4.3. Evaluating efficiency
A serial of experiments is implemented to show the feasibility and utility of the proposed
method. To illustrate the specific situation of online users, twenty images are selected ran-
domly from each class of each image database and considered as query images. Therefore,
it will have 2020 and 3000 query sessions on each database, respectively. At initial step of
each query session, images in database are sorted by Euclid distance to query image, and
then RF is automatically performed by a computer. At each feed-back round, the top 20
retrieved results are serially examined from the top, and all the unlabeled images within
the scope were automatically labeled using the ground truth in order to simulate the users
feedback. The images which belong to the same class are considered relevance and others
are considered irrelevance. All of labeled images in feedback loops will be used to update
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 125
Table 3. Descriptions and their properties used in the experiment
Descriptions Type of feature Size of feature vectors
GCM Color 81
LBP Texture 59
GWT Texture 120
EDH Shape 37
GIST Shape 512
fuzzy weights that are used to combine similarity measure.
The query result without feedback on set of data ImgDB1 is shown in Fig.1, distance
measure values are shown in Table 4, respectively. The image at the left top position is
query one, the images which have red border are the same semantic class with query image,
others are different from semantic class. It is very easy to see that amount of images having
relevance to query image is very limited; a lot of images have a very closed distance to query
image but very different in semantic and vice versa. However, after the fourth feedback
iteration, number of relevance images which are retrieved by proposed method increases
significantly as Fig.2.
Figure 1. The retrieval results without relevance feedback for query image 377022.jpg.
The Average Precision measure (AP)[18] has used to evaluate efficiency of the system
after each feedback iteration. At each feedback iteration, AP is defined as the average of
precision values obtained after each relevant image is retrieved. The precision value is the
ratio between the number of relevant images found by the system and the number of pictures
currently retrieved at the current iteration. The mean average precision (MAP) is the mean
of the average precision over all queries:
MAP =
1
Q
Q∑
q=1
1
NR
NR∑
i=1
p(i) (21)
126 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
Figure 2. The retrieval results after the fourth feedback iteration for query image 377022.jpg.
Table 4. Distance measure of the top 20 retrieved results without relevance feedback for
query image 377022.jpg.
Image GCM LBP GWT EDH GIST Dist
377022 0 0 0 0 0 0
113006 0.241 0.081 0.149 0.214 0.357 1.042
377024 0.225 0.112 0.174 0.216 0.373 1.099
208035 0.166 0.078 0.198 0.235 0.425 1.101
377029 0.236 0.113 0.170 0.227 0.363 1.110
134011 0.233 0.094 0.208 0.233 0.349 1.117
377075 0.213 0.176 0.183 0.225 0.326 1.124
104079 0.193 0.110 0.151 0.318 0.355 1.128
377078 0.256 0.109 0.142 0.237 0.387 1.130
499011 0.228 0.108 0.219 0.188 0.393 1.136
136016 0.295 0.122 0.143 0.222 0.358 1.140
482032 0.285 0.092 0.123 0.264 0.383 1.147
113032 0.286 0.143 0.147 0.258 0.319 1.152
150022 0.301 0.115 0.177 0.249 0.317 1.160
77088 0.242 0.128 0.226 0.267 0.300 1.163
377035 0.222 0.175 0.236 0.215 0.321 1.171
283069 0.174 0.149 0.187 0.264 0.398 1.172
283056 0.208 0.119 0.233 0.223 0.390 1.172
208096 0.259 0.125 0.216 0.226 0.349 1.174
326018 0.213 0.102 0.190 0.239 0.434 1.177
where, Q is total of query image, NR is the total number of relevant images for the query q.
The effectiveness of the proposed method is evaluated in two aspects: The effectiveness
of weight learning model, and the aggregation model of similarity measures. With the first
aspect, the proposed method is compared with the method in [13] which uses the Choquet
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 127
integral with Sugeno measure as the distance measure of features. In this method, the
authors have used the Euclidean distance of each component in 48-dimensional feature vector
representing the image’s texture features as sources of information in the CI. The coefficient
λ that is used to compute weight of interactive features is computed by estimating the real
roots of a polynomial of degree 48. However, in the context of multi-features, applying this
method directly for each feature component is not suitable due to the need of estimating the
real root of the polynomial of degree n(n = 809 in our case) when calculating the coefficient λ.
In this experiment, the information source used in Choquet integral is the similarity measure
of five feature tuples, and the weight of the interactive features is evaluated according to the
respective proposals. Determining of the coefficient λ is easily performed by finding the real
root of a polynomial of degree 5.
Figure 3. The effectiveness of learning model of fuzzy measure on ImgDB1 dataset
The results in Fig.3 and Fig.4 show that the proposed method is superior to the method
using Sugeno measure. The reason is that, in the proposed method, the weight of the
interactive features is more closely reflected user’s target because this weight is learned
during the relevant feedback process. Meanwhile, the method proposed in [13] only learns
the important weights of the corresponding individual feature tuple. These weights are
then used to estimate the coefficient λ that is used to compute the weight of the interactive
features, meaning that this method only considers the interaction of all pairs of feature tuples
in the same way, which cannot fully reflect the user’s target.
To evaluate the performance of the proposed method according to the second aspect, an
experiment is carried out as follows: Firstly, the method [33] is used to compute similarity
measure for individual feature tuple. Then, these similarity measures are aggregated ac-
cording to three ways: using weighted sum operator, our proposed method, and the method
proposed in [4] which are named EMR+WS, EMR+Proposed and IMR+[4], respectively.
The weights of the feature tuples are estimated according to the learning algorithm in the
proposed method. The comparison results are shown in Fig.5. In most cases, the results of
128 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
Figure 4. The effectiveness of learning model of fuzzy measure on ImgDB2 dataset
Figure 5. The effectiveness of aggregation paradigm of similarity measures on ImgDB2
dataset
the proposed method are better than those of other methods. As analyzed above, the aggre-
gation model based on Choquet integral considers all the interactions between subsets of the
feature tuples, so the results of this model reflect more accurately the target of users, espe-
cially, in the case of multiple semantic concepts, which is present in the image as in ImgDB2.
The computation of weights of interactive features is expensive due to the need of consid-
ering all the interactions. However, the proposed method applies aggregation paradigm of
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 129
similarity measures for T feature tuples (T = 5 in this case) instead of computing individual
feature components, so this difficulty can be easily overcome.
5. CONCLUSION
This paper proposes the method to aggregate multi-similarity measures based on rele-
vance feedback in CBIR. In this approach, the overall similarity measure between the two
images is calculated by non-additive aggregation from different individual similarity mea-
sure based on Choquet integral. The important weights are used in Choquet integral, which
are defined not only for each individual similarity, but also for all the aggregations between
them. The weights are estimated by the Simplex algorithm for optimization linear objective
function based on user’s feedback information, therefore, the overall similarity measure is
more suitable than the target user’s query. Experimental results on two benchmark datasets
Oliva and Corel have shown the effectiveness of the proposed approach compared to two
methods [13] and [4]. Despite promising results, identifying the most informative images for
user to label, which is one of the critical issues in relevance feedback has not been addressed
in the proposed method yet. The authors plan to incorporate active learning methods in the
proposed to more effectively exploit the feedback information of users.
ACKNOWLEDGMENT
We would like to thank anonymous reviewers for their valuable comments and suggestions
on the submission version of this paper. This work was supported in part by the Vietnam
Academy of Science and Technology under the grant VAST.HTQT.Bulgaria.02/15-16: ”Im-
proving the quality of an image management system by using machine learning methods”.
This work was also supported in part by the project CS16.03 of the Institute of Information
Technology, ”Improving CBIR method based on visual perception and multi- query”.
REFERENCES
[1] S. Aksoy and R. M. Haralick, “Feature normalization and likelihood-based similarity measures
for image retrieval,” Pattern Recognition Letters, vol. 22, no. 5, pp. 563–582, 2001, image/Video
Indexing and Retrieval.
[2] M. Arevalillo-Herra´ez, F. J. Ferri, and J. Domingo, “A naive relevance feedback model for
content-based image retrieval using multiple similarity measures,” Pattern Recogn., vol. 43, no. 3,
pp. 619–629, Mar. 2010.
[3] M. Arevalillo-Herrez, J. Domingo, and F. J. Ferri, “Combining similarity measures in content-
based image retrieval,” Pattern Recognition Letters, vol. 29, no. 16, pp. 2174–2181, 2008.
[4] M. Arevalillo-Herrez, F. J. Ferri, and J. Domingo, “A naive relevance feedback model for content-
based image retrieval using multiple similarity measures,” Pattern Recognition, vol. 43, no. 3,
pp. 619–629, 2010.
[5] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “Speeded-up robust features (surf),” Computer
Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008, similarity Matching in
Computer Vision and Multimedia.
[6] G. Beliakov, “Fitting fuzzy measures by linear programming. programming library fmtools,” in
Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence).
IEEE International Conference on, June 2008, pp. 862–867.
130 TRUONG-GIANG.N, QUOC-TAO.N,DUC-DUNG.N, HOANG-HUY.N
[7] W. Budiharto, A. R. Krishnan, M. M. Kasim, and E. M. N. E. A. Bakar, “A short survey on
the usage of choquet integral and its associated fuzzy measure in multiple attribute analysis,”
Procedia Computer Science, vol. 59, pp. 427–434, 2015.
[8] O. M. A. B. Chawki Youness, El Asnaoui Khalid, “New method of content based image re-
trieval based on 2-d esprit method and the gabor filters,” TELKOMNIKA Indonesian Journal
of Electrical Engineering, vol. 15, no. 2, pp. 313–320, August 2015.
[9] Y. Choi, D. Kim, and R. Krishnapuram, “Relevance feedback for content-based image retrieval
using the choquet integral,” in Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International
Conference on, vol. 2, 2000, pp. 1207–1210 vol.2.
[10] R. da S. Torres, A. X. Falca˜o, B. Zhang, W. Fan, E. A. Fox, M. A. Gonc¸alves, and P. Calado, “A
new framework to combine descriptors for content-based image retrieval,” in Proceedings of the
14th ACM International Conference on Information and Knowledge Management, ser. CIKM
’05. New York, NY, USA: ACM, 2005, pp. 335–336.
[11] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and trends of the
new age,” ACM Computing Surveys, vol. 40, no. 2, pp. 1–60, May 2008.
[12] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training
examples: An incremental bayesian approach tested on 101 object categories,” Comput. Vis.
Image Underst., vol. 106, no. 1, pp. 59–70, Apr. 2007.
[13] H. Frigui, “Interactive image retrieval using fuzzy sets,” Pattern Recognition Letters, vol. 22,
no. 9, pp. 1021–1031, 2001.
[14] N. T. Giang, N. Q. Tao, N. D. Dung, and N. T. The, “Skeleton based shape matching using
reweighted random walks,” in The proceding of the IEEE on 9th International Conference on
Information, Communications and Signal Processing (ICICS), December 2013, pp. 1–5.
[15] M. Grabisch, “The application of fuzzy integrals in multicriteria decision making,” European
Journal of Operational Research, vol. 89, no. 3, pp. 445–456, 1996.
[16] M. Grabisch, I. Kojadinovic, and P. Meyer, “A review of methods for capacity identification in
choquet integral based multi-attribute utility theory: Applications of the kappalab r package,”
European Journal of Operational Research, vol. 186, no. 2, pp. 766–785, 2008.
[17] W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank, “A survey on visual content-based video indexing
and retrieval,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and
Reviews), vol. 41, no. 6, pp. 797–819, Nov 2011.
[18] M. J. Huiskes and M. S. Lew, “Performance evaluation of relevance feedback methods,” in
Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval,
ser. CIVR ’08. New York, NY, USA: ACM, 2008, pp. 239–248.
[19] Q. Iqbal and J. K. Aggarwal, “Combining structure, color and texture for image retrieval: A per-
formance evaluation,” in Pattern Recognition, 2002. Proceedings. 16th International Conference
on, vol. 2, 2002, pp. 438–443 vol.2.
[20] J. D. Knowles and D. W. Corne, “Approximating the nondominated front using the pareto
archived evolution strategy,” Evol. Comput., vol. 8, no. 2, pp. 149–172, Jun. 2000.
[21] C.-H. Lee and M.-F. Lin, “Ego-similarity measurement for relevance feedback,” Expert Systems
with Applications, vol. 37, no. 1, pp. 871–877, 2010.
[22] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-
level semantics,” Pattern Recogn., vol. 40, no. 1, pp. 262–282, Jan. 2007.
[23] G. Michel, “K-order additive discrete fuzzy measures and their representation,” Fuzzy Sets Syst.,
vol. 92, no. 2, pp. 167–189, Dec. 1997.
LEARNING INTERACTION MEASURE WITH RELEVANCE FEEDBACK 131
[24] T. Murofushi and M. Sugeno, “An interpretation of fuzzy measures and the choquet integral as
an integral with respect to a fuzzy measure,” Fuzzy Sets and Systems, vol. 29, no. 2, pp. 201–227,
1989.
[25] Y. Narukawa and T. Murofushi, Choquet integral and Sugeno integral as aggregation functions.
Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 27–39.
[26] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: a power tool for
interactive content-based image retrieval,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 8, no. 5, pp. 644–655, Sep 1998.
[27] A. Silvia, G. Salvatore, L. Fabio, and M. Benedetto, “Assessing non-additive utility for multi-
criteria decision aid,” European Journal of Operational Research, vol. 158, no. 3, pp. 734–744,
2004.
[28] M. S. T. Murofushi, “An interpretation of fuzzy measure and the choquet integral as an integral
with respect to a fuzzy measure,” Fuzzy Sets and Systems, vol. 29, pp. 201–227, 1989.
[29] D. Tao, X. Tang, X. Li, and Y. Rui, “Direct kernel biased discriminant analysis: a new content-
based image retrieval relevance feedback algorithm,” IEEE Transactions on Multimedia, vol. 8,
no. 4, pp. 716–727, Aug 2006.
[30] B. Thomee and M. Lew, “Interactive search in image retrieval: a survey,” International Journal
of Multimedia Information Retrieval, vol. 1, no. 2, pp. 71–86, 2012.
[31] J. Wu and J. M. Rehg, “Centrist: A visual descriptor for scene categorization,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1489–1501, Aug 2011.
[32] L. Wu and S. C. H. Hoi, “Enhancing bag-of-words models with semantics-preserving metric
learning,” IEEE MultiMedia, vol. 18, no. 1, pp. 24–37, Jan 2011.
[33] B. Xu, J. Bu, C. Chen, C. Wang, D. Cai, and X. He, “Emr: A scalable graph-based ranking model
for content-based image retrieval,” IEEE Transactions on Knowledge and Data Engineering,
vol. 27, no. 1, pp. 102–114, Jan 2015.
[34] M. O. Y. Rui, T. S. Huang and S. Mehrotra, “Relevance feedback: A powerful tool for inter-
active content-based image retrieval,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 8, pp. 644– 655, 1998.
[35] Q. Zhang and E. Izquierdo, “Optimizing metrics combining low-level visual descriptors for image
annotation and retrieval,” in 2006 IEEE International Conference on Acoustics Speech and Signal
Processing Proceedings, vol. 2, May 2006.
[36] J. Zhu, S. C. Hoi, M. R. Lyu, and S. Yan, “Near-duplicate keyframe retrieval by nonrigid image
matching,” in Proceedings of the 16th ACM International Conference on Multimedia, ser. MM
’08. New York, NY, USA: ACM, 2008, pp. 41–50.
[37] Y. K. J. K. Zukuan WEI, Hongyeon KIM, “An efficient content based image retrieval scheme,”
TELKOMNIKA Indonesian Journal of Electrical Engineering, vol. 11, no. 11, pp. 6986–6991,
November 2013.
Received on August 09 - 2016
Revised on October 03 - 2016
Các file đính kèm theo tài liệu này:
- learning_interaction_measure_with_relevance_feedback_in_imag.pdf