In this paper, the authors propose a fuzzy probabilistic relational database model, called FPRDB,
as an extension of the PRDB model with fuzzy sets. FPRDB is also a complete development for the
model in [18] with a full set of basic fuzzy probabilistic relational algebraic operations. FPRDB is
capable of representing and manipulating both imprecise and uncertain information in the real world
applications. FPRDB has been built based on the association of the probability theory and fuzzy
theory. The relational schemas, relations, functional dependencies and relational algebraic operations
of FPRDB have been defined coherently and consistently. A set of basic properties of the algebraic
operations in FPRDB has also been proposed as theorems and proven completely.
Towards applying FPRDB in practice, a management system for FPRDB will be build with the
familiar querying and manipulating language like SQL that is able to represent and handle imprecise
and uncertain information in the real world.
19 trang |
Chia sẻ: huongthu9 | Lượt xem: 448 | Lượt tải: 0
Bạn đang xem nội dung tài liệu A fuzzy probabilistic relational database model and algebra, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
ly the value of each attribute,
although we know that the attribute may take one of the values that can be vague of a certain set.
More recently, in [17], the author introduced a probabilistic relational database model called
PRDB that was able to represent situations in which we do not know exactly the value of each
attribute but we know the probability interval for it takes one of values of a candidate set. It
means that the model can overcome the shortcoming of the model in [16]. However, the PRDB
model could not express and deal with vague information. In [18], the author extended the model
in [17] with fuzzy set values to a fuzzy probabilistic relational data base model, called FPRDB, for
representing and handling uncertain and vague information. However, in the model [18], only the
selection operation was defined while all other algebraic operations are missing. In this paper, using
the combination of fuzzy probabilistic triples introduced in [2] and the probabilistic interpretation
of relations on fuzzy sets defined in [18], we extend more the FPRDB model with a complete set of
fuzzy probabilistic relational algebraic operations for representing and manipulating both imprecise
and uncertain information in practice.
The basis of mathematics to develop FPRDB is presented in Section 2. Schemas and relations of
FPRDB are introduced in Section 3. Section 4,5 and 6 present fuzzy probabilistic relational algebraic
operations and their properties in FPRDB. Finally, Section 7 concludes the paper and outlines further
research directions in the future.
2. PROBABILITY AND FUZZY SETS
In this section, some notions about probability and fuzzy sets are presented as the basis for extending
the probabilistic relational database model PRDB with fuzzy set values.
2.1. Probabilistic interpretation of relations on fuzzy sets
First, the mass assignment formulated based on the voting model of fuzzy sets in [19] as the basis for
the probabilistic interpretation of relations on them is defined as follows.
Definition 1. Let A =
∑
i=1,n
∑
j=1,mi
xi,j : yi be a normal fuzzy set on a domain U , where n, mi ∈ N
and yi > yj if i < j, ∀i =1,. . . , n and ∀j =1,. . . , mi. The mass assignment corresponding to A is
a mapping mA: 2
U →[0, 1] that is defined by mA(z1) = y1−y2, . . . , mA(∪i=1,jzi) = yj−yj+1,. . . ,
mA(∪i=1,nzi) = yn, where zi = ∪j=1,mi{xi,j}.
It is noted that, the mass assignment
mA(z1) = y1 − y2, . . . ,mA(∪i=1,jzi) = yj − yj+1,. . . , mA(∪i=1,n zi) = yn
can be denoted by
mA = z1: y1 − y2, . . . ,∪i=1,jzi: yj − yj+1,. . . , ∪i=1,nzi: yn.
The probabilistic interpretation of relations on fuzzy sets, as the probability measures for the
relations are true, is defined in [2, 18] as below.
Definition 2. Let A be a fuzzy set on a domain U , B be a fuzzy set on a domain V , and θ be
a binary relation from {=, 6=, ≤, <, ⊆, ∈} assumed to be valid on (U × V ). The probabilistic
interpretation of a relation AθB, denoted by prob(AθB), is a value in [0, 1] that is defined by
NGUYEN HOA 191
∑
S⊆U,T⊆V
p(uθv|u ∈ S, v ∈ T ).mA(S).mB(T ),
where mA, mB are the mass assignments corresponding to A and B and p(uθv|u ∈ S, v ∈ T ) is
the conditional probability of uθv given u ∈ S, v ∈ T .
Intuitively, given fuzzy propositions “x is A” and “y is B”, prob(AθB) is the probability for
xθy being true.
Definition 3. Let A and B be two fuzzy sets on a domain U. The probabilistic interpretation
of the relation A→ B, denoted by prob(A→ B), is a value in [0, 1] that is defined by∑
S,T⊆U
p(u ∈ T |u ∈ S).mA(S).mB(T ),
where mA, mB are the mass assignments corresponding to A and B and p(u ∈ T |u ∈ S) is the
conditional probability for u ∈ T given u ∈ S.
The intuitive meaning of prob(A→ B) is that given a fuzzy proposition “x ∈ A”, prob(A→ B)
is the probability for x ∈ B being true. In other words, it is the fuzzy conditional probability of x ∈ B
given x ∈ A. It is noted that the above probabilistic interpretation can also be adapted for fuzzy
sets on continuous domains, using integration instead of additionas for computing the probability of
fuzzy events in [20].
Example 1. Let A = {3:0.2, 4:0.5, 5:0.9, 6:1} and B= {6: 0.3, 5:1, 4:0.3} be the fuzzy sets on the
domain {1, 2, 3, 4, 5, 6}, then the mass assignments corresponding to A and B are mA={6}:0.1, {5,
6}:0.4, {4, 5, 6}:0.3, {3, 4, 5, 6}:0.2 and mB = {5}:0.7, {4, 5, 6}:0.3. The probabilistic interpretation
of A→ B is computed as follows:
prob(A→ B) = p(u ∈{5}|u ∈{6}).mA({6}).mB({5}) +
p(u ∈{5}|u ∈{5,6}).mA({5,6}).mB({5}) +
p(u ∈{5}|u ∈{4,5,6}).mA({4,5,6}).mB({5}) +
p(u ∈{5}|u ∈{3,4,5,6}).mA({3,4,5,6}).mB({5}) +
p(u ∈{4,5,6}|u ∈{6}).mA({6}).mB({4,5,6}) +
p(u ∈{4,5,6}|u ∈{5,6}).mA({5,6}).mB({4,5,6}) +
p(u ∈{4,5,6}|u ∈{4,5,6}).mA({4,5,6}).mB({4,5,6}) +
p(u ∈{4,5,6}|u ∈{3,4,5,6}).mA({3,4,5,6}).mB({4,5,6})
= 0 × 0.1× 0.7 + 1/2 × 0.4 × 0.7 + 1/3 × 0.3 × 0.7 + 1/4 × 0.2 × 0.7 +
1.0 × 0.1 × 0.3 + 1.0 × 0.4 × 0.3 + 1.0 × 0.3 × 0.3 + 3/4 × 0.2 × 0.3= 0.53.
2.2. Fuzzy probabilistic triples
For representing imprecise and uncertain attribute values in FPRDB, the fuzzy probabilistic triples
are used in [18] extended from probabilistic triples in [17]. First, the probability distribution function
as the basis for the concept of the fuzzy probabilistic triple is defined as below.
Definition 4. Let X be a finite set, a probability distribution function α over X is a mapping
α: X → [0, 1] such that ∑
x∈X
α(x) ≤ 1.
An important probability distribution function which we often encounter in practice is the uniform
distribution u(x) = 1/|X|, ∀x ∈ X . For example, if X = {e1, e2, e3}, the uniform distribution u
over X is u(x) = 1/3, ∀x ∈{e1, e2, e3}.
192 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
Definition 5. A probabilistic triple 〈X , α, β 〉 consists of a finite set X , a probability distribution
function α over X , and a function β: X → [0, 1] such that α(x) ≤ β(x), ∀x ∈ X . If the elements
of X are fuzzy sets then 〈X , α, β〉 is called a fuzzy probabilistic triple.
Informally, a fuzzy probabilistic triple 〈X , α, β〉 assigns each element x ∈ X a probability
interval [α(x), β(x)] to express the imprecise and uncertainty degree of x in X . It means that
the fuzzy probabilistic triple 〈X , α, β〉 assigns each element x ∈ X a probability p(x) which
α(x) ≤ p(x) ≤β(x) to represent the imprecise and uncertainty degree of x in X . It is easy to see
that the fuzzy probabilistic triple 〈X , α, β〉 is a probability interval extension of the probability
distribution function α on X . Many probabilistic database models, such as [1, 9, 11, 17], used the
interval [α(x), β(x)] to represent the probability for x instead of using the value of the distribution
function α(x). Because, in many situations, it is impossible to know exactly the probability α for x
but only the probability for x belonging to an interval [α, β].
Example 2. Suppose the daily treatment cost of a patient is estimated within about 50 or 60
(thousand VND) with a probability for each between 0.4 and 0.6. Then this information can be
represented by the fuzzy probabilistic triple 〈X , α, β〉 = 〈{about 50, about 60}, 0.8u, 1.2u〉, where
about 50 and about 60 are fuzzy sets defining the imprecise treatment costs and u is the uniform
distribution over X= {about 50, about 60}. Here, 0.8u and 1.2u are the probability distribution
functions α and β respectively with α(x) = 0.8u(x) = 0.8(1/2) = 0.4 and β(x) = 1.2u(x) = 1.2(1/2)
= 0.6, ∀x ∈ X= {about 50, about 60}. In addition, it is noted more that
∑
x∈X
α(x) = 0.8u(about 50) + 0.8u(about 60) = 0.8(1/2) + 0.8(1/2) = 0.4 + 0.4 = 0.8 ≤ 1
and
α(x) = 0.8u(x) = 0.8(1/2) = 0.4 ≤ 0.6 = 1.2(1/2) = 1.2u(x) = β(x),∀x ∈ X.
It means that α(x) ≤β(x) and β(x) ∈[0, 1], ∀x ∈ X .
2.3. Probabilistic combination strategies
Let two events e1 and e2 have probabilities in the intervals [L1, U1] and [L2, U2], then probability
intervals of the conjunction event e1 ∧ e2, disjunction event e1 ∨ e2, or difference event e1 ∧ ¬e2
can be computed by alternative strategies. In this work, we employ the conjunction, disjunction,
and difference strategies given in [1, 17] as presented in Table 1, where ⊗, ⊕, and denote the
conjunction, disjunction, and difference operators, respectively.
In following sections, the notation [L1, U1] ≤ [L2, U2] is used to replace L1 ≤ L2 and U1 ≤ U2
whereas the notation [L1, U1] ⊆ [L2, U2] is used to replace L2 ≤ L1 and U1 ≤ U2.
2.4. Conjunction, disjunction and difference of fuzzy probabilistic triples
For building algebraic operations such as the join, intersection, union and difference of fuzzy proba-
bilistic relations in FPRDB, combination strategies of probabilistic triples in [17] are extended with
fuzzy sets. It is noted that, here h(v) denotes the height of a fuzzy set v, whereby v is a normal
fuzzy set if and only if h(v) = 1.
Definition 6. Let fpt1 = 〈V1, α1, β1〉 and fpt2 = 〈V2, α2, β2〉 be two fuzzy probabilistic triples,
and ⊗ be a probabilistic conjunction strategy. Then the conjunction of fpt1 and fpt2 under ⊗,
denoted by fpt1⊗fpt2, is the fuzzy probabilistic triple fpt = 〈V , α, β〉, such that:
NGUYEN HOA 193
Table 1. Examples of probabilistic combination strategies
Strategy Operators
Ignorance ([L1, U1] ⊗ig[L2, U2]) ≡ [max(0, L1 + L2– 1), min(U1, U2)]
([L1, U1] ⊕ig[L2, U2]) ≡ [max(L1, L2), min(1, U1 + U2)]
([L1, U1] ig[L2, U2]) ≡ [max(0, L1– U2), min(U1,1– L2)]
Independence ([L1, U1] ⊗in[L2, U2]) ≡ [L1 . L2, U1. U2]
([L1, U1] ⊕in[L2, U2]) ≡ [L1 +L2– (L1 . L2), U1 +U2– (U1 . U2)]
([L1, U1] in[L2, U2]) ≡ [L1. (1– U2), U1. (1– L2)]
Positive correlation
(when e1 implies e2,
or e2 implies e1)
([L1, U1] ⊗pc[L2, U2]) ≡ [min(L1, L2), min(U1, U2)]
([L1, U1] ⊕pc[L2, U2]) ≡ [max(L1, L2), max(U1, U2)]
([L1, U1] pc[L2, U2]) ≡ [max(0, L1 – U2), max(0, U1 –L2)]
Mutual exclusion
(when e1 and e2 are
mutually exclusive)
([L1, U1] ⊗me[L2, U2]) ≡ [0, 0]
([L1, U1] ⊕me[L2, U2]) ≡ [min(1, L1 + L2), min(1, U1 + U2)]
([L1, U1] me[L2, U2]) ≡ [L1, min(U1, 1 – L2)]
1) V ={v = v1∩v2|v1 ∈ V1, v2 ∈ V2, h(v) = 1, [α1(v1), β1(v1)]⊗[α2(v2), β2(v2)] 6=[0,0]}, and
2) [α(v), β(v)] = ⊕me:v1∈V 1,v2∈V 2,v=v1∩v2[α1(v1), β1(v1)]⊗[α2(v2), β2(v2)], for every v ∈ V ,
where ⊕me is the mutual exclusion probabilistic disjunction strategy.
It is noted that, unlike the PRDB, in that each v1 and v2 in V1 and V2 respectively is elementary
and non-fuzzy, here v1 and v2 may be fuzzy sets, since there can be more than one pair (v1, v2) ∈
V1×V2 such that v = v1∩v2. So the probability intervals for those pairs must be combined using the
mutual exclusion probabilistic disjunction strategy ⊕me in the above computation of [α(v), β(v)].
Example 3. Let fpt1 = 〈{about 40, about 50}, u, u〉 and fpt2 = 〈{about 50, about 60}, 0.8u,
1.2u〉 be fuzzy probabilistic triples, where about 40, about 50 and about 60 are fuzzy sets with
h(about 40∩about 60 )< 1, then fpt1⊗infpt2 with the independence probabilistic conjunction strat-
egy is the fuzzy probabilistic triple fpt = 〈{about 50}, 0.2u, 0.3u〉.
Next, the disjunction and difference of fuzzy probabilistic triples in turn are defined as below.
Definition 7. Let fpt1 = 〈V1, α1, β1〉 and fpt2 = 〈V2, α2, β2〉 be two fuzzy probabilistic triples,
and ⊕ be a probabilistic disjunction strategy. Then the disjunction of fpt1 and fpt2 under ⊕,
denoted by fpt1⊕fpt2, is the fuzzy probabilistic triple fpt = 〈V ,α,β〉, such that:
1) V = P ∪Q∪R, where P= {v1 ∈ V1|¬∃v2 ∈ V2: h(v1∩ v2) = 1}, Q= {v2 ∈ V2|¬∃v1 ∈ V1:
h(v1 ∩ v2) = 1}, and R= {v1 ∩ v2|v1 ∈ V1, v2 ∈ V2, h(v1 ∩ v2) = 1}, and
2)
[α(v), β(v)] =
[α1(v), β1(v)], ∀v ∈ P
[α2(v), β2(v)], ∀v ∈ Q
⊕me:v1∈V1,v2∈V2,v=v1∩v2 [α1(v1), β1(v1)]⊕ [α2(v2), β2(v2)], ∀v ∈ R
Definition 8. Let fpt1 = 〈V1, α1, β1〉 and fpt2 = 〈V2, α2, β2〉 be two fuzzy probabilistic triples,
and be a probabilistic difference strategy. Then the difference of fpt1 and fpt2 under , denoted
by fpt1 fpt2, is the fuzzy probabilistic triple fpt = 〈V , α, β〉, such that:
194 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
1) V = P ∪ Q, where P= {v1 ∈ V1|¬∃v2 ∈ V2: h(v1 ∩ v2) = 1}, Q= {v = v1 ∩ v2|v1 ∈
V1,v2 ∈ V2, h(v1 ∩ v2) = 1 and [α1(v1), β1(v1)] [α2(v2), β2(v2)] 6= [0, 0]}, and
2)
[α(v), β(v)] =
{
[α1(v), β1(v)], ∀v ∈ P
⊕me:v1∈V1,v2∈V2,v=v1∩v2 [α1(v1), β1(v1)] [α2(v2), β2(v2)], ∀v ∈ Q
3. SCHEMAS AND FUZZY PROBABILISTIC RELATIONS
3.1. Fuzzy probabilistic relational schemas
Fuzzy probabilistic relational schemas are extended from probabilistic relational schemas in [17]. A
fuzzy probabilistic relational schema describes a set of attributes of a set of certain objects of which
each attribute is associated with fuzzy probabilistic triples as the following definition.
Definition 9. A fuzzy probabilistic relational schema is a pair R= (U , ℘), where
1) U = {A1, A2,. . . ,Ak} is a set of pairwise different attributes.
2) ℘ is a function that maps each attribute A ∈U to the set of all fuzzy probabilistic triples on
the value domain of A (i.e. each element of ℘(A) is a fuzzy probabilistic triple that has the
form 〈V , α, β〉, where V is a subset of the set of all fuzzy sets on the value domain of A).
Note that as in the classical relational database, for simplicity, the notations R(U , ℘) and R can
be used to replace R = (U , ℘). In addition, the domain of each attribute A is denoted by dom(A).
3.2. Fuzzy probabilistic relations
A fuzzy probabilistic relation is an instance of a fuzzy probabilistic relational schema in which each
attribute may take imprecise and uncertain values represented by a fuzzy probabilistic triple as the
definition below.
Definition 10. Let U= {A1, A2,. . . ,Ak} be a set of k pairwise different attributes. A fuzzy
probabilistic relation r over the fuzzy probabilistic relational schema R(U , ℘), is a finite set
{t|t= (〈V1, α1, β1〉, 〈V2, α2, β2〉,. . . , 〈Vk, αk, βk〉)} in which each element t is a list of k fuzzy
probabilistic triples such that 〈Vi, αi, βi〉 belongs to the set fi = ℘(Ai) and Vi 6= ∅, for every i =1,
2,. . . , k.
Each element t in the relation r over R(U , ℘) is called a tuple on U . Each fuzzy probabilistic
triple 〈Vi, αi, βi〉 represents the imprecise and uncertain value of the attribute Ai of the tuple t,
the notation t.Ai denotes 〈Vi, αi, βi〉 that is t.Ai = 〈Vi, αi, βi〉. For each set of attributes X ⊆
{A1, A2, . . . , Ak}, the notation t[X ] is used to denote the rest of t after eliminating the value of
attributes not belonging to X .
From Definition 5, it is noted that, each attribute Ai of a tuple t in the relation r over R(U , ℘)
only takes one value v in Vi with a probability p(v) ∈ [αi(v),βi(v)]. As in [1,3,16], the model FPRDB
adopts the closed world assumption (CWA). It means, for each tuple t, every value v ∈dom(Ai)−Vi
has the probability 0. In addition, each precise value v ∈ Vi is also considered as a special fuzzy set
on dom(Ai) with the membership function µv(v) =1 and µv(x) = 0, ∀x ∈dom(Ai) and x 6= v.
So, each probabilistic relation in [17] can be considered as a particular fuzzy probabilistic relation by
Definition 10.
NGUYEN HOA 195
Example 4. A simple fuzzy probabilistic relation PATIENT (over the schema PATIENT) about
patients at the clinic of a hospital can be organised as Table 2 In the relation, the attributes P ID,
P NAME, AGE, DISEASE and D COST describe the information about the identifier, name, age,
disease and daily treatment cost of each patient, respectively. In reality, while diagnosing, the disease
of each patient is not always determined certainly by the physicians. Similarly, the treatment cost
for patients are also not known definitely even as the patients know about their diseases. Here,
the conventional unit for the treatment cost is 1000 (VND). It is noted that, for each attribute
A ∈U={P ID, P NAME, AGE, DISEASE D COST} in the schema PATIENT(U , ℘), ℘(A)
includes all fuzzy probabilistic triples on the value domain of A (Definition 9).
Table 2. Relation PATIENT
P ID P NAME AGE DISEASE D COST
PT226 N.V.
Ha
〈{65}, u, u〉 〈{lung cancer, tuber-
culosis},
0.8u, 1.2u〉
〈{300, 350}, u, u〉
PT234 T.V.
Son
〈{young},u,
u〉
〈{hepatitis, cirrho-
sis},
0.9u, 1.3u〉
〈{about 60,
about 70}, 0.8u,
1.2u〉
PT242 L.T.
Lan
〈{middle aged},
u, u〉
〈{cholecystitis}, u,
u〉
〈{8}, u, u〉
Also note that, in the real world applications, fuzzy set values of attributes of the fuzzy prob-
abilistic relation PATIENT, such as about 60, about 70, young and middle aged will be defined
(maybe by professional experts) compatibly and consistently with the meaning of the information
represented by them. In this example, the fuzzy set values only simply illustrate for Definition 10.
For example, about 60 = {58: 0.5, 59: 0.9, 60: 1, 61: 0.9, 62: 0.5} or about 70 = {68: 0.5, 69:
0.9, 70: 1, 71: 0.9, 72: 0.5} can be considered as a fuzzy set value representing the daily treatment
cost of the patient T.V. Son who has hepatitis or cirrhosis. Similarly, young and middle aged
whose membership function graphs as Figure 1, may be considered as fuzzy set values representing
the imprecise age of the patients T.V.Son and L.T. Lan respectively. In addition, for simplicity,
each fuzzy probabilistic triple 〈V , u, u〉, with V ={v}, will be represented as a single value v because
if the attribute takes such a fuzzy probabilistic triple, then actually it only takes a value v with the
probability is 1 (Definition 5). In other words, the attribute certainly takes the value v.
Figure 1. Fuzzy set values of the attribute AGE
Now, the notion of a fuzzy probabilistic relational database is defined as follows.
Definition 11. A fuzzy probabilistic relational database over a set of attributes is a set of fuzzy
196 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
probabilistic relations corresponding with the set of their fuzzy probabilistic relational schemas.
Note that, if we only care about an unique relation over a schema then we can unify its symbol
name with its schema’s name.
3.3. Fuzzy probabilistic functional dependencies
For defining the fuzzy probabilistic functional dependent concept in FPRDB, first a probability
measure is proposed to determine the fuzzy equal degree of two values of the same attribute for two
different tuples in a relation as below.
Definition 12. Let t1 and t2 be two tuples in a fuzzy probabilistic relation r, A be an attribute
of r and ⊗ be a probabilistic conjunction strategy. The probability interval for the values of the
attribute A of two tuples t1 and t2 respectively are equal under ⊗, denoted by p(t1.A =⊗ t2.A) is
[Σv∈V α(v).prob(v1 = v2), min(1, Σv∈V β(v).prob(v1 = v2))]
where,
t1.A = 〈V1, α1, β1〉, t2.A = 〈V2, α2, β2〉 and
[α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)], ∀v= (v1, v2) ∈ V = V1 × V2.
Now, the fuzzy probabilistic functional dependency in FPRDB is extended from the probabilistic
functional dependency in PRDB [17] as follows.
Definition 13. Let U= {A1, A2,. . . ,Ak} be a set of k pairwise different attributes R(U , ℘)
be a fuzzy probabilistic relational schema, r be any fuzzy probabilistic relation over R, ⊗ be a
probabilistic conjunction strategy, X= {Ai,. . . ,Al} and Y = {Aj ,. . . ,Am} be two subsets of U . A
fuzzy probabilistic functional dependency of Y on X under ⊗ over R, denoted by X ⊗Y , if and
only if
∀t1, t2 ∈ r, Pr(t1[X ] =⊗t2[X ]) ≤Pr(t1[Y ] =⊗t2[Y ]),
where
Pr(t1[X ] =⊗t2[X ]) = p(t1.Ai =⊗ t2.Ai) ⊗ . . .⊗p(t1.Al =⊗ t2.Al)
and
Pr(t1[Y ] =⊗t2[Y ]) = p(t1.Aj =⊗ t2.Aj) ⊗ . . .⊗p(t1.Am =⊗ t2.Am).
An obvious example of the fuzzy probabilistic functional dependency is every attribute Ai de-
pending on the set {A1, A2,. . . , Ak} that consists of all attributes of the schema R. It is noted that
in the classical database, one can consider the probability for two values of an attribute equal i.e.
only 0 or 1, so the functional dependency in the classical relational database is a particular case of
the fuzzy probabilistic functional dependency in this definition.
As for the classical relational database model, the keys of a schema in FPRDB are the basis
for recognising a tuple in a fuzzy probabilistic relation. In the model and management systems of
the classical relational database, key attributes are constrained not to take the value NULL [3, 4].
Similarly, in FPRDB, it is assuming that the value of each key attribute is always certain and definite.
The key concept of fuzzy probabilistic relational schema is defined using the fuzzy probabilistic
functional dependency as follows.
Definition 14. Let R(U , ℘) be a fuzzy probabilistic relational schema, r be any relation over R
and ⊗ be a probabilistic conjunction strategy, a set of attributes K ⊆U is called a key of R under
⊗ if the value of each attribute of K is always certain, precise in r and there is a fuzzy probabilistic
functional dependency K ⊗U such that not to exist any subset of K has the properties.
NGUYEN HOA 197
In the relation PATIENT above, if we assume that each patient has a unique identifier corre-
sponding with the value of the attribute P ID and the identifier differs from every identifier of other
patients, then by the definition, P ID is a key of the schema PATIENT under any probabilistic
conjunction strategy.
4. SELECTION OPERATION ON A FUZZY PROBABILISTIC RELATION
4.1. Syntax of selection conditions
The selection is a basic algebraic operation and is used to query the imprecise and uncertain informa-
tion in FPRDB. Before defining the selection operation, we present the formal syntax and semantics
of selection conditions by extending those definitions of PRDB with fuzzy sets. We start with the
syntax of selection expressions as the following definition.
Definition 15. Let R be a schema in FPRDB and X be a set of relational tuple variables. Then
selection expressions are inductively defined and have one of the following forms:
1) x.A θc, where x ∈X , A is an attribute in R, θ is a binary relation from {=, 6=, ≤, <, ⊆,∈,→}
and c is a single value or a fuzzy set.
2) x.A1 =⊗x.A2, where x ∈X , A1 and A2 are two different attributes in R and ⊗ is a proba-
bilistic conjunction strategy.
3) E1 ⊗ E2, where E1 and E2 are selection expressions on the same relational tuple variable, ⊗
is a probabilistic conjunction strategy.
4) E1 ⊕ E2, where E1 and E2 are selection expressions on the same relational tuple variable, ⊕
is a probabilistic disjunction strategy.
Example 5. Consider the fuzzy probabilistic relational schema PATIENT in Example 4, the selec-
tion of “all patients who get hepatitis is and pay the daily treatment cost about 60 (thousand VND)”
can be expressed by the selection expression x.DISEASE = hepatitis⊗x.D COST→about 60.
In FPRDB, each selection condition is a logical combination of selection expressions with proba-
bility intervals need to be satisfied as the following definition.
Definition 16. Let R be a schema in FPRDB, selection conditions are inductively defined as
follows:
1) If E is a selection expression and [L, U ] is a subinterval of [0, 1], then (E)[L,U ] is a selection
condition.
2) If φ and ψ are selection conditions on the same tuple variable, then ¬φ, (φ ∧ ψ), (φ ∨ ψ) are
selection conditions.
Example 6. With the schema of the relation PATIENT in Example 4, then the selection of “all
patients who are young with a probability of at least 0.4 and have lung cancer with a probability of
at least 0.8” can be done using the selection condition (x.AGE → young) [0.4, 1.0] ∧ (x.DISEASE
= lung cancer)[0.8, 1.0].
198 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
4.2. Semantics of selection conditions
For defining the semantics of selection conditions in FPRDB, we first extend probabilistic interpre-
tations of selection expressions in [17] with the binary relations on fuzzy sets in Section 2 as the
definition below.
Definition 17. Let R be a fuzzy probabilistic relational schema in FPRDB, r be a relation over R,
x be a tuple variable and t be a tuple in r. The probabilistic interpretation of selection expressions
with respect to R, r and t, denoted by probR,r,t, is the partial mapping from the set of all selection
expressions to the set of all closed subintervals of [0, 1] that is inductively defined as follows:
1) probR,r,t(x.Aθc) = [Σv∈V α(v).prob(vθc), min(1, Σv∈V β(v).prob(vθc))],
where t.A = 〈V , α, β〉.
2) probR,r,t(x.A1 =⊗ x.A2) = [Σv∈V α(v).prob(v1= v2), min(1, Σv∈V β(v).prob(v1 = v2))],
where t.A1= 〈V1, α1, β1〉, t.A2 = 〈V2, α2, β2〉 and
[α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)], ∀v = (v1, v2) ∈ V = V1 × V2.
3) probR,r,t(E1 ⊗ E2) = probR,r,t(E1)⊗ probR,r,t(E2).
4) probR,r,t(E1 ⊕ E2) = probR,r,t(E1)⊕ probR,r,t(E2).
Intuitively, probR,r,t(x.Aθc) is the probability interval for the attribute A of the tuple t having
a value v such that vθc and probR,r,t(x.A1 =⊗ x.A2) is the probability interval for the attributes
A1 and A2 of the tuple t having values v1 and v2 respectively, such that v1 = v2.
Example 7. Let r denote the relation PATIENT in Example 4 and R denote the schema of PATIENT
consider the tuple t2 (the second tuple) in r, using Definition 3 we have prob(about 60 →about 60 )
= 0.86 and prob(about 70 →about 60 ) = 0.0. From that
probR,r,t2(x.D COST→about 60 ) = [0.8u(about 60 ).prob(about 60 →about 60 ) +
0.8u(about 70 ).prob(about 70 →about 60 ),
min(1, 1.2u(about 60 ).prob(about 60 →about 60 ) +
1.2u(about 70 ).prob(about 70 →about 60 ))]
= [0.8 ×0.5×0.86 + 0.8 ×0.5×0., min(1, 1.2 ×0.5×0.86 + 1.2 ×0.5×0.0)] =[0.344, 0.516].
On the basis of the probabilistic interpretation of selection expressions, the satisfaction or seman-
tics of selection conditions in FPRDB is defined as below.
Definition 18. Let R be a relational schema in FPRDB, r be a fuzzy probabilistic relation over
R and t ∈ r. The satisfaction of selection conditions under probR,r,t is defined as follows:
1) probR,r,t |= (E)[L, U ] if and only if (iff) probR,r,t(E) ⊆[L, U ].
2) probR,r,t |= ¬φ iff probR,r,t |= φ does not hold.
3) probR,r,t |= φ ∧ ψ iff probR,r,t |= φ and probR,r,t |= ψ.
4) probR,r,t |= φ ∨ ψ iff probR,r,t |= φ or probR,r,t |= ψ.
NGUYEN HOA 199
Note that, in the classical database, the concepts of selection expression and selection condition
are identical and we can consider probability intervals [L, U ] in selection conditions being always
equal to [1.0, 1.0]. This also means that the concept of satisfaction of selection conditions in the
classical relational database model is a particular case of the concept of satisfaction of selection
conditions in FPRDB.
Now, the selection operation on a relation in FPRDB is defined as follows.
Definition 19. Let R be a relational schema in FPRDB, r be a fuzzy probabilistic relation over
R and φ be a selection condition over a tuple variable x. The selection on r with respect to φ,
denoted by σφ(r), is the relation r’ = {t ∈ r|probR,r,t |= φ} over R, including all satisfied tuples of
the selection condition φ.
Example 8. Let approx 15 = (10: 0; 15: 1; 20: 0) denote the continuous triangle shaped fuzzy
set whose vertices are (15, 1), (10, 0) and (20, 0). Consider the relation PATIENT in Example 4.
Then, the query “Find all patients who are approximately 15 years old with a probability from 0.2
to 0.5 and have hepatitis with a probability of at least 0.4” can be done by the selection operation
r’=σφ(PATIENT) with φ =(x.AGE→approx 15 )[0.2, 0.5]∧ (x.DISEASE = hepatitis)[0.4, 1.0].
As noted in Definition 3, the probabilistic interpretation prob(A→ B) can also be adapted for
fuzzy sets on continuous domains, using integration instead of addition. That is
prob(A→ B) =
1∫
0
1∫
0
p(u ∈ yB|u ∈ xA)dxdy =
1∫
0
1∫
0
p(xA ∩ yB)
p(xA)
dxdy =
1∫
0
1∫
0
|xA ∩ yB|
|xA| dxdy
where xA and yB are α -cuts of the fuzzy sets A and B with α = x and α = y, respectively.
For the fuzzy sets approx 15 above and young given in Example 4, their α-cuts are respectively
αapprox 15 = [10+5α, 20-5α] and αyoung= [0, 35-15α]. From that prob(young→approx 15 ) is
computed as follows:
prob(young → approx 15) =
1∫
0
1∫
0
|xyoung ∩ yapprox 15|
|xyoung| dxdy
=
1∫
0
1∫
0
|[0, 35− 15x] ∩ [10 + 5y, 20− 5y]|
|[0, 35− 15x]| dxdy
=
1∫
0
1∫
0
10− 10y
35− 15xdxdy =
1∫
0
1∫
0
2− 2y
7− 3xdxdy =
ln 7− ln 4
3
≈ 0.2.
The selection is implemented by checking the satisfaction of all tuples in PATIENT for the
selection condition φ. From the result computed above, we can see only the tuple t2 satisfies φ be-
cause probR,r,t2(x.AGE→approx 15 ) = [u(young).prob(young→approx 15 ), min(1, u(young).
prob(young→approx 15 ))] =[1 ×0.2, min(1, 1 ×0.2]= [0.2, 0.2]⊆ [0.2, 0.5] and probR,r,t2(x. DIS-
EASE = hepatitis) = [0.45, 0.65]⊆[0.4, 1.0]. So, the result of the selection is the relation r’=
σφ(PATIENT) as in Table 3.
200 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
Table 3. Relation r’=σφ (PATIENT)
P ID P NAME AGE DISEASE D COST
PT234 T.V. Son 〈{young},u,
u〉
〈{hepatitis, cirrhosis},
0.9u, 1.3u〉
〈{about 60, about 70},
0.8u, 1.2u〉
5. OTHER OPERATIONSON FUZZY PROBABILISTIC RELATIONS
As for the classical relational database and PRDB, other basic operations on fuzzy probabilistic
relations are the projection, Cartesian product, join, intersection, union, and difference. We now
extend those operations of PRDB for FPRDB taking into account imprecise and uncertain values of
relational attributes.
5.1. Projection
A projection of a fuzzy probabilistic relation on a set of attributes is a new fuzzy probabilistic relation
in which only the attributes in that set are considered for each tuple of the new relation as the following
definition.
Definition 20. Let R= (U , ℘) be a fuzzy probabilistic relational schema, r be a relation over
R and L be a subset of attributes of U . The projection of r on L denoted by ΠL(r), is the fuzzy
probabilistic relation r’ over the schema R’, determined by:
1) R’ = (L, ℘’) and ℘’(A) = ℘(A), ∀A ∈L,
2) r’ = {t’= t[L]|t ∈ r}, i.e., r’ consists of tuples t’ to achieve from the tuples t = (〈V1, α1,
β1〉,. . . , 〈Vk, αk, βk〉) ∈ r by eliminating every 〈Vj , αj , βj〉 that t.Aj = 〈Vj , αj , βj〉 and
Aj /∈L.
5.2. Cartesian product
For the Cartesian product of two fuzzy probabilistic relations, as in the classical relational database
and PRDB, we assume the set of attributes of their schemas are disjoint. Also, for the operation
being commutative, we assume every k-tuple t = (〈V1, α1, β1〉,. . . , 〈Vk, αk, βk〉) is an un-ordered
list.
Definition 21. The fuzzy probabilistic relational schemasR1(U 1, ℘1) andR2(U 2, ℘2) are Carte-
sian product-compatible if and only if U 1 and U 2 have not any common attribute.
Note that, any schemas R1(U 1, ℘1) and R2(U 2, ℘2) can be made Cartesian product-compatible
by renaming of attributes in U 1 and U 2.
Now, the Cartesian product of two fuzzy probabilistic relations in FPRDB is extended from that
operation of PRDB as follows.
Definition 22. Let r1 and r2 be two fuzzy probabilistic relations over the Cartesian product-
compatible schemas R1= (U 1, ℘1) and R2= (U 2, ℘2), respectively. The Cartesian product of r1
and r2, denoted by r1 × r2, is the fuzzy probabilistic relation r over R determined by:
1) R = (U , ℘), whereU= U 1∪ U 2, ℘(A) = ℘1(A) if A ∈U 1 and ℘(A) = ℘2(A) if A ∈U 2,
NGUYEN HOA 201
2) r= {t =(〈V1, α1, β1〉, . . . , 〈Vk, αk, βk〉, 〈Vk+1, αk+1, βk+1〉,. . . , 〈Vk+m, αk+m, βk+m〉)|t1
=(〈V1, α1, β1〉, . . . , 〈Vk, αk, βk〉) and t2 = (〈Vk+1, αk+1, βk+1〉,. . . , 〈Vk+m, αk+m, βk+m〉),
t1 ∈ r1 and t2 ∈ r2}.
5.3. Join
The join of two fuzzy probabilistic relations in FPRDB is extended from the natural join of two
relations in the classical relational database and the join in PRDB. The join only works with relations
whose schemas are join-compatible as the definition below.
Definition 23. The fuzzy probabilistic relational schemas R1(U 1, ℘1) and R2(U 2, ℘2) are join
-compatible if and only if the domains of two attributes of the same name A in U 1 and U 2,
respectively are identical.
From Definition 9, we can see that for two attributes of the same name A in U 1 and U 2 of two
join–compatible schemas R1(U 1, ℘1) and R2(U 2, ℘2) then ℘1(A) = ℘2(A). For building the join
of fuzzy two probabilistic relations, we first extend the join of two tuples in PRDB for FPRDB using
the conjunction of fuzzy probabilistic triples as follows.
Definition 24. Let t1 and t2 be two tuples on two sets of attributes U 1 and U 2 respectively, and
⊗ be a probabilistic conjunction strategy. The join of t1 and t2 under ⊗, denoted by t1 ./⊗ t2, is
the tuple t on U 1∪U 2 defined by:
1) t.A = t1.A, ∀A ∈U 1−U 2,
2) t.A = t2.A, ∀A ∈U 2−U 1,
3) t.A =t1.A⊗ t2.A, ∀A ∈U 1∩U 2.
It is noted that t1.A ⊗ t2.A is the conjunction of two fuzzy probabilistic triples t1.A and t2.A by
Definition 6. It is easy to see that t1 ./⊗ t2 = t2 ./⊗ t1 for every probabilistic conjunction strategy,
i.e. the join of two tuples is commutative.
Definition 25. Let r1 and r2 be two fuzzy probabilistic relations over the join-compatible schemas
R1= (U 1, ℘1) and R2= (U 2, ℘2), respectively and let ⊗ be a probabilistic conjunction strategy.
The join of r1 and r2 under ⊗, denoted by r1 ./⊗ r2, is the fuzzy probabilistic relation r over the
schema R, determined by:
1) R = (U , ℘) where U = U 1∪ U 2, ℘(A) = ℘1(A) if A ∈U 1− U 2, ℘(A) = ℘2(A) if
A ∈U 2− U 1 and ℘(A) = ℘1(A) = ℘2(A) if A ∈U 1∩U 2 (because ℘1(A) = ℘2(A),
Definition 23).
2) r= {t = t1 ./⊗ t2| t1 ∈ r1, t2 ∈ r2 and ∀A ∈U 1∩U 2, if t1.A ⊗ t2.A = 〈V , α, β〉 then
V 6= ∅}.
Example 9. Given two fuzzy probabilistic relations DOCTOR1 and DOCTOR2 as in Tables 4 and 5,
where approx 30 = (25:0;30:1;35:0) denotes the continuous triangle shaped fuzzy set whose vertices
are (30, 1), (25, 0) and (35, 0) and middle aged is the fuzzy set in Example 4. Then, the result of
the join of them under the probabilistic conjunction strategy ⊗in and the standard intersection of
fuzzy sets (with the height equals 1, by Definition 6) is the relation DOCTOR computed as in Table 6.
202 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
Table 4. Relation DOCTOR1
DOCTOR ID D AGE
DT005 〈{middle aged}, u, u〉
DT093 〈{approx 30}, u, u〉
DT102 〈{55,56}, u, u〉
Table 5. Relation DOCTOR2
DOCTOR NAME D AGE
L.V Cuong 〈{30, 31}, 0.8u, 1.2u〉
N.V. Hung 〈{middle aged}, u, u〉
N.T.Dat 〈{54, 55}, u, u〉
Here, the names of each relation and its schema are identical, the set of fuzzy probabilistic triples
℘(A) for each attribute A in the schemas consists of all fuzzy probabilistic triples on dom(A).
5.4. Intersection, union, and difference
Intersection, union and difference of two fuzzy probabilistic relations, respectively, over the same
schema is a fuzzy probabilistic relation over that schema, in which the value of attributes in common
tuples of those two relations associated by a probabilistic combination strategy. A common tuple
of two fuzzy probabilistic relations over the same schema is the tuple whose key attributes’ values
are identical in both relations. It is due to the impreciseness and uncertainty of attribute values, a
common tuple of two fuzzy probabilistic relations is not completely identical as that of two relations
in the classical relational database.
First, the intersection of two tuples as the basis for the intersection of two fuzzy probabilistic
relations is defined as follows.
Definition 26. Let t1 and t2 be two tuples respectively in two fuzzy probabilistic relations over
the same schema R(U , ℘) and ⊗ be a probabilistic conjunction strategy. The intersection of t1
and t2 under ⊗, denoted by t1 ∩⊗ t2, is the tuple t on U defined by t.A = t1.A⊗ t2.A for every
A ∈U .
Definition 27. Let r1 and r2 be two fuzzy probabilistic relations over the same schemaR(U , ℘), K
be a key of R and ⊗ be a probabilistic conjunction strategy. The intersection of r1 and r2 under ⊗,
denoted by r1∩⊗r2, is the fuzzy probabilistic relation r overR defined by r = {t = t1∩⊗t2| t1 ∈ r1,
t2 ∈ r2 such that t1[K] = t2[K] and t1.A⊗ t2.A 6= 〈∅, α, β〉, ∀A ∈ U }.
Table 6. Relation DOCTOR= DOCTOR1 ./⊗inDOCTOR2
DOCTOR ID DOCTOR NAME D AGE
DT005 N.V. Hung 〈{middle aged}, u, u〉
DT093 L.V Cuong 〈{30}, 0.4u, 0.6u〉
DT102 N.T.Dat 〈{55}, 0.25u, 0.25u〉
NGUYEN HOA 203
It is noted that, the notation t1[K] = t2[K] is used in the definition due to the value of each key
attribute assumed is certain and definite as in the Definition 14.
Example 10. Consider two relations DIAGNOSE1 and DIAGNOSE2 over the same schema DI-
AGNOSE(U , ℘) as in Tables 7 and 8 where U={P ID, DOCTOR ID, P AGE, DISEASE} and
{P ID, DOCTOR ID} is a key of DIAGNOSE, young, middle aged and approx 15 are the
fuzzy sets given in Examples 4 and 8. The set ℘(A) for each attribute A in the schema DIAG-
NOSE (U , ℘) consists of all fuzzy probabilistic triples 〈V , α, β〉 on dom(A). Then the intersection
of DIAGNOSE1 and DIAGNOSE2 under ⊗in is the relation DIAGNOSE computed as in Table 9.
Table 7. Relation DIAGNOSE1
P ID DOCTOR ID P AGE DISEASE
PT226 DT093 〈{65}, u, u〉 〈{lung cancer, tuberculosis}, 0.8u, 1.2u〉
PT234 DT102 〈{approx 15},u,
u〉
〈{hepatitis, cirrhosis}, u, u〉
Table 8. Relation DIAGNOSE2
P ID DOCTOR ID P AGE DISEASE
PT383 DT102 〈{69, 70}, u, u〉 〈{lung cancers}, u, u〉
PT234 DT102 〈{young},u, u〉 〈{hepatitis, gall-stone}, 0.8u, 1.2u〉
PT242 DT025 〈{middle aged}, u, u〉 〈{cholecystitis}, u, u〉
Table 9. Relation DIAGNOSE = DIAGNOSE1∩⊗inDIAGNOSE2
P ID DOCTOR ID P AGE DISEASE
PT234 DT102 〈{approx 15},u, u〉 〈{hepatitis}, 0.2u, 0.3u〉
It is noted that, in the example, the standard intersection is used to compute the intersection
of fuzzy sets whereby, for example approx 15 ∩ young= approx 15. In addition, by Definition 6,
the height of each fuzzy set computed from the intersection of two fuzzy sets is required to equal 1.
The union of two fuzzy probabilistic relations over the same schema in FPRDB are based on the
union of tuples as below.
Definition 28. Let t1 and t2 be two tuples respectively in two fuzzy probabilistic relations over
the same schema R(U , ℘) and ⊕ be a probabilistic disjunction strategy. The union of t1 and t2
under ⊕, denoted by t1 ∪⊕ t2, is the tuple t on U defined by t.A = t1.A⊕ t2.A for every A ∈U .
Definition 29. Let r1 and r2 be two fuzzy probabilistic relations over the same schema R(U , ℘),
K be a key of R and ⊕ be a probabilistic disjunction strategy. The union of r1 and r2 under ⊕,
denoted by r1 ∪⊕ r2, is the fuzzy probabilistic relation r over R defined by r = {t1 ∈ r1| there is
not any tuple t2 ∈ r2 such that t1[K] =t2[K]}∪{t2 ∈ r2| there is not any tuple t1 ∈ r1 such that
t2[K] = t1[K]}∪{t = t1 ∪⊕ t2|t1 ∈ r1, t2 ∈ r2 such that t1[K] = t2[K]}.
204 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
As for the intersection and union operations, for defining the difference operation of two fuzzy
probabilistic relations, we first define the difference operation of two tuples as follows.
Definition 30. Let t1 and t2 be two tuples respectively in two fuzzy probabilistic relations over
the same schema R(U , ℘) and be a probabilistic difference strategy. The difference of t1 and t2
under , denoted by t1 − t2, is the tuple t on U defined by t.A = t1.A t2.A for every A ∈U .
Definition 31. Let r1 and r2 be two fuzzy probabilistic relations over the same schema R(U , ℘),
K be a key of R and be a probabilistic difference strategy. The difference of r1 and r2 under ,
denoted by r1 − r2, is the fuzzy probabilistic relation r over R defined by r= {t1 ∈ r1| there is
not any tuple t2 ∈ r2 such that t1[K] = t2[K]}∪{t = t1 − t2| t1 ∈ r1, t2 ∈ r2 such that t1[K]
= t2[K] and t1.A t2.A 6= 〈∅, α, β〉, ∀A ∈ U }.
6. PROPERTY OF ALGEBRAIC OPERATIONS
In this section, we propose some properties of the fuzzy probabilistic relational algebraic operations
in FPRDB as an extension from those in the classical relational database and PRDB. Clearly, these
properties say that FPRDB model is coherent and consistent.
Theorem 1. Let r be a fuzzy probabilistic relation over the schema R in FPRDB, φ1 and φ2
be two selection conditions. Then
σφ1(σφ2(r)) = σφ2(σφ1(r)) = σφ1∧φ2(r) (1)
where, the last expression assumes that φ1 and φ2 have the same tuple variable.
The first property shows that the selections may be reordered.
Proof. Let r1 = σφ1(r), r2 = σφ2(r) and r1∧2 = σφ1∧φ2(r). Then for each t ∈ r, we have
σφ1(σφ2(r)) = {t ∈ r2|probR,r2,t |= φ1}
= {t ∈ r|(probR,r,t |= φ2) ∧ (probR,r2,t |= φ1)}
= {t ∈ r|(probR,r,t |= φ2) ∧ (probR,r,t |= φ1)}(because of r2 ⊆ r)
= {t ∈ r|probR,r,t |= φ1 ∧ φ2} (Definition 18)
= σφ1∧φ2(r).
So, σφ1(σφ2(r)) = σφ1∧φ2(r) is proven. Equation σφ2(σφ1(r)) = σφ2∧φ1(r) is proven similarly.
Since φ1 ∧ φ2 ⇔ φ2 ∧ φ1 (the logical conjunction of selection conditions are commutative), hence
σφ1∧φ2(r) = σφ2∧φ1(r). Therefore, we have σφ1(σφ2(r)) = σφ2(σφ1(r)) and so σφ1(σφ2(r)) =
σφ2(σφ1(r)) = σφ1∧φ2(r). Thus, Theorem 1 is proven.
Theorem 2. Let r be a fuzzy probabilistic relation over the schema R in FPRDB, A and B
be two subsets of attributes of R and A ⊆B . Then
ΠA(ΠB(r)) = ΠA(r). (2)
Proof. Because A ⊆ B , so A∩B= A and sides of (2) are the relations over the same schema
(Definition 20). From that, we are easy to see ΠA(ΠB(r)) = ΠA∩B(r) = ΠA(r). Thus, equation(2)
is proven.
NGUYEN HOA 205
Theorem 3. Let R1, R2 and R3 be pairwise join-compatible schemas in FPRDB, r1, r2 and
r3 be fuzzy probabilistic relations over R1, R2 and R3 respectively. Let ⊗ be a probabilistic
conjunction strategy. Then
r1 ./⊗ r2 = r2 ./⊗ r1, (3)
(r1 ./⊗ r2) ./⊗ r3 = r1 ./⊗ (r2 ./⊗ r3). (4)
Equation (4) and (5) say that the join operation of fuzzy probabilistic relations is commutative
and associative.
Proof. Clearly, r1 ./⊗ r2 and r2 ./⊗ r1 are two relations over the same schema. By Definition
6, the conjunction of fuzzy probabilistic triples is commutative (due to the commutativity of proba-
bilistic conjunction strategies and the intersection of fuzzy sets). Consequently, the join of tuples is
commutative (by Definition 24). So, by Definition 25, we have r1 ./⊗ r2 = r2 ./⊗ r1.
Since R1, R2 and R3 are pairwise join-compatible, so the results of two sides of (4) are the
relations over the same schema. Moreover, the intersection of fuzzy sets has the associativity, by
Definition 6, it follows that the conjunction of fuzzy probabilistic triples is associative. From asso-
ciativity of the classical relational join and by Definition 24, it is easy to see that the join of tuples
which are based on the conjunction of fuzzy probabilistic triples is associative. Thus, by Definition
25, it results in (r1 ./⊗ r2) ./⊗ r3 = r1 ./⊗ (r2 ./⊗ r3).
Because the Cartesian product is a particular case of the join (Definition 22 and Definition 25),
we have the straight corollary of Theorem 3 below.
Corollary 1.Let R1, R2 and R3 be pairwise Cartesian product-compatible schemas in FPRDB,
r1, r2 and r3 be fuzzy probabilistic relations over R1, R2 and R3 respectively. Then
r1 × r2 = r2 × r1, (5)
(r1 × r2)× r3 = r1 × (r2 × r3). (6)
Theorem 4. Let r1, r2 and r3 be fuzzy probabilistic relations over the same schema R in
FPRDB. Let ⊗/⊕ be a probabilistic conjunction/disjunction strategy. Then
r1 ∩⊗ r2 = r2 ∩⊗ r1, (7)
(r1 ∩⊗ r2) ∩⊗ r3 = r1 ∩⊗ (r2 ∩⊗ r3), (8)
r1 ∪⊕ r2 = r2 ∪⊕ r1, (9)
(r1 ∪⊕ r2) ∪⊕ r3 = r1 ∪⊕ (r2 ∪⊕ r3). (10)
Equations of (7), (8), (9) and (10) say that the intersection and union of relations in FPRDB are
commutative and associative.
Proof. Equations in the theorem are proven respectively as follows:
206 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA
Equations (7) and (8): From commutativity and associativity of the intersection of fuzzy sets, it
follows the conjunction of fuzzy probabilistic triples has commutativity and associativity. Thus, the
intersection of tuples, by Definition 26, has commutativity and associativity. So, the intersection of
tuples that have the same key value in r1, r2 and r3 respectively is commutative and associative.
From that, it follows Equations (7) and (8).
Equations (9) and (10): From commutativity of the union, intersection of fuzzy sets, the disjunc-
tion of fuzzy probabilistic triples (Definition 7) and the union of tuples (Definition 28), by Definition
29 we have Equation (9).
For Equation (10), let K be the key used to determine common tuples of r1, r2 and r3. Without
loss of generality, we may assume that each tuple t belongs to one of the three relations r1, r2 and
r3 then there exists two tuples belonging to the two remaining relations, respectively such that the
value of the key K of the three tuples respectively are always the same. This can be done by adding
t to the relations in which it is missing and resetting α(v) = β(v) = 0 for every v in the set V of
values of each attribute A /∈ K of the tuple t. Under this technical assumption, the result of each
expression in Equation (10) is not changed and only the union case of two tuples in Definition 29 is
relevant. Now, from associativity of the disjunction of fuzzy probabilistic triples (Definition 7) and
the union of tuples, Equation(10) obviously holds.
7. CONCLUSION
In this paper, the authors propose a fuzzy probabilistic relational database model, called FPRDB,
as an extension of the PRDB model with fuzzy sets. FPRDB is also a complete development for the
model in [18] with a full set of basic fuzzy probabilistic relational algebraic operations. FPRDB is
capable of representing and manipulating both imprecise and uncertain information in the real world
applications. FPRDB has been built based on the association of the probability theory and fuzzy
theory. The relational schemas, relations, functional dependencies and relational algebraic operations
of FPRDB have been defined coherently and consistently. A set of basic properties of the algebraic
operations in FPRDB has also been proposed as theorems and proven completely.
Towards applying FPRDB in practice, a management system for FPRDB will be build with the
familiar querying and manipulating language like SQL that is able to represent and handle imprecise
and uncertain information in the real world.
REFERENCES
[1] T. Eiter, J. J. Lu, T. Lukasiewicz, and V. Subrahmanian, “Probabilistic object bases,” ACM
Transactions on Database Systems (TODS), vol. 26, no. 3, pp. 264–312, 2001.
[2] T. H. Cao and H. Nguyen, “Uncertain and fuzzy object bases: a data model and algebraic
operations,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,
vol. 19, no. 02, pp. 275–305, 2011.
[3] E. F. Codd, “A relational model of data for large shared data banks,” Communications of the
ACM, vol. 13, no. 6, pp. 377–387, 1970.
[4] D. C. J., An introduction to database systems. Prentice hall New Jersey, 8th ed., 2008.
[5] G. Klir and B. Yuan, Fuzzy sets and fuzzy logic. Prentice hall New Jersey, 1995, vol. 4.
[6] Z. L.A., “Fuzzy sets,” Journal of Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
NGUYEN HOA 207
[7] T. Ge, A. Dekhtyar, and J. Goldsmith, “Uncertain data: Representations, query processing, and
applications,” in Advances in Probabilistic Databases for Uncertain Information Management.
Springer, 2013, pp. 67–108.
[8] Z. Ma and L. Yan, Advances in probabilistic databases for uncertain information management.
Springer, 2013, vol. 304.
[9] N. Hoa and T. D. Hieu, “A probabilistic relational data model for uncertain information,” in
2013 IEEE Third International Conference on Information Science and Technology (ICIST).
IEEE, 2013, pp. 607–613.
[10] R. Ross, V. Subrahmanian, and J. Grant, “Aggregate operators in probabilistic databases,”
Journal of the ACM (JACM), vol. 52, no. 1, pp. 54–101, 2005.
[11] W. Zhao, A. Dekhtyar, and J. Goldsmith, “Databases for interval probabilities,” International
journal of intelligent systems, vol. 19, no. 9, pp. 789–815, 2004.
[12] X. Meng, Z. Ma, and X. Zhu, “A knowledge-based fuzzy query and results ranking approach for
relational databases,” Journal of Computational Information Systems, vol. 6, no. 6, 2010.
[13] J. Mishra and S. Ghosh, “A new functional dependency in a vague relational database model,”
International Journal of Computer Applications, vol. 39, no. 8, pp. 29–36, 2012.
[14] F. Petry, “Fuzzy databases: Principles and applications (with chapter contribution by patrick
bosc). international series in intelligent technologies. ed. hj zimmermann,” J. Zimmermann.
Kluwer Academic Publishers (KAP), 1996.
[15] A. A. Sabour, A. M. Gadallah, and H. A. Hefny, “Flexible querying of relational databases: Fuzzy
set based approach,” in International Conference on Advanced Machine Learning Technologies
and Applications. Springer, 2014, pp. 446–455.
[16] L. Yan and Z. Ma, “A fuzzy probabilistic relational database model and algebra,” International
Journal of Fuzzy Systems, vol. 15, no. 1, p. 244, 2013.
[17] H. Nguyen, “A probabilistic relational database model and algebra,” Journal of Computer Sci-
ence and Cybernetics, vol. 31, no. 4, p. 305, 2015.
[18] ——, “Towards a fuzzy probabilistic relational data base model,” in Knowledge and Systems
Engineering (KSE), 2015 Seventh International Conference on. IEEE, 2015, pp. 298–301.
[19] J. Baldwin, J. Lawry, and T. Martin, “A mass assignment theory of the probability of fuzzy
events,” Fuzzy Sets and Systems, vol. 83, no. 3, pp. 353–367, 1996.
[20] ——, “A note on probability/possibility consistency for fuzzy events,” in Proceedings of the
6th International Conference on Information Processing and Management of Uncertainty in
Knowledge-Based Systems, 1996, pp. 521–525.
Received on March 15 - 2016
Revised on August 18 - 2016
Các file đính kèm theo tài liệu này:
- a_fuzzy_probabilistic_relational_database_model_and_algebra.pdf