A fuzzy probabilistic relational database model and algebra

In this paper, the authors propose a fuzzy probabilistic relational database model, called FPRDB, as an extension of the PRDB model with fuzzy sets. FPRDB is also a complete development for the model in [18] with a full set of basic fuzzy probabilistic relational algebraic operations. FPRDB is capable of representing and manipulating both imprecise and uncertain information in the real world applications. FPRDB has been built based on the association of the probability theory and fuzzy theory. The relational schemas, relations, functional dependencies and relational algebraic operations of FPRDB have been defined coherently and consistently. A set of basic properties of the algebraic operations in FPRDB has also been proposed as theorems and proven completely. Towards applying FPRDB in practice, a management system for FPRDB will be build with the familiar querying and manipulating language like SQL that is able to represent and handle imprecise and uncertain information in the real world.

19 trang | Chia sẻ: huongthu9 | Lượt xem: 608 | Lượt tải: 0

Bạn đang xem nội dung tài liệu A fuzzy probabilistic relational database model and algebra, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên

ly the value of each attribute, although we know that the attribute may take one of the values that can be vague of a certain set. More recently, in [17], the author introduced a probabilistic relational database model called PRDB that was able to represent situations in which we do not know exactly the value of each attribute but we know the probability interval for it takes one of values of a candidate set. It means that the model can overcome the shortcoming of the model in [16]. However, the PRDB model could not express and deal with vague information. In [18], the author extended the model in [17] with fuzzy set values to a fuzzy probabilistic relational data base model, called FPRDB, for representing and handling uncertain and vague information. However, in the model [18], only the selection operation was defined while all other algebraic operations are missing. In this paper, using the combination of fuzzy probabilistic triples introduced in [2] and the probabilistic interpretation of relations on fuzzy sets defined in [18], we extend more the FPRDB model with a complete set of fuzzy probabilistic relational algebraic operations for representing and manipulating both imprecise and uncertain information in practice. The basis of mathematics to develop FPRDB is presented in Section 2. Schemas and relations of FPRDB are introduced in Section 3. Section 4,5 and 6 present fuzzy probabilistic relational algebraic operations and their properties in FPRDB. Finally, Section 7 concludes the paper and outlines further research directions in the future. 2. PROBABILITY AND FUZZY SETS In this section, some notions about probability and fuzzy sets are presented as the basis for extending the probabilistic relational database model PRDB with fuzzy set values. 2.1. Probabilistic interpretation of relations on fuzzy sets First, the mass assignment formulated based on the voting model of fuzzy sets in [19] as the basis for the probabilistic interpretation of relations on them is defined as follows. Definition 1. Let A = ∑ i=1,n ∑ j=1,mi xi,j : yi be a normal fuzzy set on a domain U , where n, mi ∈ N and yi > yj if i < j, ∀i =1,. . . , n and ∀j =1,. . . , mi. The mass assignment corresponding to A is a mapping mA: 2 U →[0, 1] that is defined by mA(z1) = y1−y2, . . . , mA(∪i=1,jzi) = yj−yj+1,. . . , mA(∪i=1,nzi) = yn, where zi = ∪j=1,mi{xi,j}. It is noted that, the mass assignment mA(z1) = y1 − y2, . . . ,mA(∪i=1,jzi) = yj − yj+1,. . . , mA(∪i=1,n zi) = yn can be denoted by mA = z1: y1 − y2, . . . ,∪i=1,jzi: yj − yj+1,. . . , ∪i=1,nzi: yn. The probabilistic interpretation of relations on fuzzy sets, as the probability measures for the relations are true, is defined in [2, 18] as below. Definition 2. Let A be a fuzzy set on a domain U , B be a fuzzy set on a domain V , and θ be a binary relation from {=, 6=, ≤, <, ⊆, ∈} assumed to be valid on (U × V ). The probabilistic interpretation of a relation AθB, denoted by prob(AθB), is a value in [0, 1] that is defined by NGUYEN HOA 191 ∑ S⊆U,T⊆V p(uθv|u ∈ S, v ∈ T ).mA(S).mB(T ), where mA, mB are the mass assignments corresponding to A and B and p(uθv|u ∈ S, v ∈ T ) is the conditional probability of uθv given u ∈ S, v ∈ T . Intuitively, given fuzzy propositions “x is A” and “y is B”, prob(AθB) is the probability for xθy being true. Definition 3. Let A and B be two fuzzy sets on a domain U. The probabilistic interpretation of the relation A→ B, denoted by prob(A→ B), is a value in [0, 1] that is defined by∑ S,T⊆U p(u ∈ T |u ∈ S).mA(S).mB(T ), where mA, mB are the mass assignments corresponding to A and B and p(u ∈ T |u ∈ S) is the conditional probability for u ∈ T given u ∈ S. The intuitive meaning of prob(A→ B) is that given a fuzzy proposition “x ∈ A”, prob(A→ B) is the probability for x ∈ B being true. In other words, it is the fuzzy conditional probability of x ∈ B given x ∈ A. It is noted that the above probabilistic interpretation can also be adapted for fuzzy sets on continuous domains, using integration instead of additionas for computing the probability of fuzzy events in [20]. Example 1. Let A = {3:0.2, 4:0.5, 5:0.9, 6:1} and B= {6: 0.3, 5:1, 4:0.3} be the fuzzy sets on the domain {1, 2, 3, 4, 5, 6}, then the mass assignments corresponding to A and B are mA={6}:0.1, {5, 6}:0.4, {4, 5, 6}:0.3, {3, 4, 5, 6}:0.2 and mB = {5}:0.7, {4, 5, 6}:0.3. The probabilistic interpretation of A→ B is computed as follows: prob(A→ B) = p(u ∈{5}|u ∈{6}).mA({6}).mB({5}) + p(u ∈{5}|u ∈{5,6}).mA({5,6}).mB({5}) + p(u ∈{5}|u ∈{4,5,6}).mA({4,5,6}).mB({5}) + p(u ∈{5}|u ∈{3,4,5,6}).mA({3,4,5,6}).mB({5}) + p(u ∈{4,5,6}|u ∈{6}).mA({6}).mB({4,5,6}) + p(u ∈{4,5,6}|u ∈{5,6}).mA({5,6}).mB({4,5,6}) + p(u ∈{4,5,6}|u ∈{4,5,6}).mA({4,5,6}).mB({4,5,6}) + p(u ∈{4,5,6}|u ∈{3,4,5,6}).mA({3,4,5,6}).mB({4,5,6}) = 0 × 0.1× 0.7 + 1/2 × 0.4 × 0.7 + 1/3 × 0.3 × 0.7 + 1/4 × 0.2 × 0.7 + 1.0 × 0.1 × 0.3 + 1.0 × 0.4 × 0.3 + 1.0 × 0.3 × 0.3 + 3/4 × 0.2 × 0.3= 0.53. 2.2. Fuzzy probabilistic triples For representing imprecise and uncertain attribute values in FPRDB, the fuzzy probabilistic triples are used in [18] extended from probabilistic triples in [17]. First, the probability distribution function as the basis for the concept of the fuzzy probabilistic triple is defined as below. Definition 4. Let X be a finite set, a probability distribution function α over X is a mapping α: X → [0, 1] such that ∑ x∈X α(x) ≤ 1. An important probability distribution function which we often encounter in practice is the uniform distribution u(x) = 1/|X|, ∀x ∈ X . For example, if X = {e1, e2, e3}, the uniform distribution u over X is u(x) = 1/3, ∀x ∈{e1, e2, e3}. 192 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA Definition 5. A probabilistic triple 〈X , α, β 〉 consists of a finite set X , a probability distribution function α over X , and a function β: X → [0, 1] such that α(x) ≤ β(x), ∀x ∈ X . If the elements of X are fuzzy sets then 〈X , α, β〉 is called a fuzzy probabilistic triple. Informally, a fuzzy probabilistic triple 〈X , α, β〉 assigns each element x ∈ X a probability interval [α(x), β(x)] to express the imprecise and uncertainty degree of x in X . It means that the fuzzy probabilistic triple 〈X , α, β〉 assigns each element x ∈ X a probability p(x) which α(x) ≤ p(x) ≤β(x) to represent the imprecise and uncertainty degree of x in X . It is easy to see that the fuzzy probabilistic triple 〈X , α, β〉 is a probability interval extension of the probability distribution function α on X . Many probabilistic database models, such as [1, 9, 11, 17], used the interval [α(x), β(x)] to represent the probability for x instead of using the value of the distribution function α(x). Because, in many situations, it is impossible to know exactly the probability α for x but only the probability for x belonging to an interval [α, β]. Example 2. Suppose the daily treatment cost of a patient is estimated within about 50 or 60 (thousand VND) with a probability for each between 0.4 and 0.6. Then this information can be represented by the fuzzy probabilistic triple 〈X , α, β〉 = 〈{about 50, about 60}, 0.8u, 1.2u〉, where about 50 and about 60 are fuzzy sets defining the imprecise treatment costs and u is the uniform distribution over X= {about 50, about 60}. Here, 0.8u and 1.2u are the probability distribution functions α and β respectively with α(x) = 0.8u(x) = 0.8(1/2) = 0.4 and β(x) = 1.2u(x) = 1.2(1/2) = 0.6, ∀x ∈ X= {about 50, about 60}. In addition, it is noted more that ∑ x∈X α(x) = 0.8u(about 50) + 0.8u(about 60) = 0.8(1/2) + 0.8(1/2) = 0.4 + 0.4 = 0.8 ≤ 1 and α(x) = 0.8u(x) = 0.8(1/2) = 0.4 ≤ 0.6 = 1.2(1/2) = 1.2u(x) = β(x),∀x ∈ X. It means that α(x) ≤β(x) and β(x) ∈[0, 1], ∀x ∈ X . 2.3. Probabilistic combination strategies Let two events e1 and e2 have probabilities in the intervals [L1, U1] and [L2, U2], then probability intervals of the conjunction event e1 ∧ e2, disjunction event e1 ∨ e2, or difference event e1 ∧ ¬e2 can be computed by alternative strategies. In this work, we employ the conjunction, disjunction, and difference strategies given in [1, 17] as presented in Table 1, where ⊗, ⊕, and denote the conjunction, disjunction, and difference operators, respectively. In following sections, the notation [L1, U1] ≤ [L2, U2] is used to replace L1 ≤ L2 and U1 ≤ U2 whereas the notation [L1, U1] ⊆ [L2, U2] is used to replace L2 ≤ L1 and U1 ≤ U2. 2.4. Conjunction, disjunction and difference of fuzzy probabilistic triples For building algebraic operations such as the join, intersection, union and difference of fuzzy proba- bilistic relations in FPRDB, combination strategies of probabilistic triples in [17] are extended with fuzzy sets. It is noted that, here h(v) denotes the height of a fuzzy set v, whereby v is a normal fuzzy set if and only if h(v) = 1. Definition 6. Let fpt1 = 〈V1, α1, β1〉 and fpt2 = 〈V2, α2, β2〉 be two fuzzy probabilistic triples, and ⊗ be a probabilistic conjunction strategy. Then the conjunction of fpt1 and fpt2 under ⊗, denoted by fpt1⊗fpt2, is the fuzzy probabilistic triple fpt = 〈V , α, β〉, such that: NGUYEN HOA 193 Table 1. Examples of probabilistic combination strategies Strategy Operators Ignorance ([L1, U1] ⊗ig[L2, U2]) ≡ [max(0, L1 + L2– 1), min(U1, U2)] ([L1, U1] ⊕ig[L2, U2]) ≡ [max(L1, L2), min(1, U1 + U2)] ([L1, U1] ig[L2, U2]) ≡ [max(0, L1– U2), min(U1,1– L2)] Independence ([L1, U1] ⊗in[L2, U2]) ≡ [L1 . L2, U1. U2] ([L1, U1] ⊕in[L2, U2]) ≡ [L1 +L2– (L1 . L2), U1 +U2– (U1 . U2)] ([L1, U1] in[L2, U2]) ≡ [L1. (1– U2), U1. (1– L2)] Positive correlation (when e1 implies e2, or e2 implies e1) ([L1, U1] ⊗pc[L2, U2]) ≡ [min(L1, L2), min(U1, U2)] ([L1, U1] ⊕pc[L2, U2]) ≡ [max(L1, L2), max(U1, U2)] ([L1, U1] pc[L2, U2]) ≡ [max(0, L1 – U2), max(0, U1 –L2)] Mutual exclusion (when e1 and e2 are mutually exclusive) ([L1, U1] ⊗me[L2, U2]) ≡ [0, 0] ([L1, U1] ⊕me[L2, U2]) ≡ [min(1, L1 + L2), min(1, U1 + U2)] ([L1, U1] me[L2, U2]) ≡ [L1, min(U1, 1 – L2)] 1) V ={v = v1∩v2|v1 ∈ V1, v2 ∈ V2, h(v) = 1, [α1(v1), β1(v1)]⊗[α2(v2), β2(v2)] 6=[0,0]}, and 2) [α(v), β(v)] = ⊕me:v1∈V 1,v2∈V 2,v=v1∩v2[α1(v1), β1(v1)]⊗[α2(v2), β2(v2)], for every v ∈ V , where ⊕me is the mutual exclusion probabilistic disjunction strategy. It is noted that, unlike the PRDB, in that each v1 and v2 in V1 and V2 respectively is elementary and non-fuzzy, here v1 and v2 may be fuzzy sets, since there can be more than one pair (v1, v2) ∈ V1×V2 such that v = v1∩v2. So the probability intervals for those pairs must be combined using the mutual exclusion probabilistic disjunction strategy ⊕me in the above computation of [α(v), β(v)]. Example 3. Let fpt1 = 〈{about 40, about 50}, u, u〉 and fpt2 = 〈{about 50, about 60}, 0.8u, 1.2u〉 be fuzzy probabilistic triples, where about 40, about 50 and about 60 are fuzzy sets with h(about 40∩about 60 )< 1, then fpt1⊗infpt2 with the independence probabilistic conjunction strat- egy is the fuzzy probabilistic triple fpt = 〈{about 50}, 0.2u, 0.3u〉. Next, the disjunction and difference of fuzzy probabilistic triples in turn are defined as below. Definition 7. Let fpt1 = 〈V1, α1, β1〉 and fpt2 = 〈V2, α2, β2〉 be two fuzzy probabilistic triples, and ⊕ be a probabilistic disjunction strategy. Then the disjunction of fpt1 and fpt2 under ⊕, denoted by fpt1⊕fpt2, is the fuzzy probabilistic triple fpt = 〈V ,α,β〉, such that: 1) V = P ∪Q∪R, where P= {v1 ∈ V1|¬∃v2 ∈ V2: h(v1∩ v2) = 1}, Q= {v2 ∈ V2|¬∃v1 ∈ V1: h(v1 ∩ v2) = 1}, and R= {v1 ∩ v2|v1 ∈ V1, v2 ∈ V2, h(v1 ∩ v2) = 1}, and 2) [α(v), β(v)] =  [α1(v), β1(v)], ∀v ∈ P [α2(v), β2(v)], ∀v ∈ Q ⊕me:v1∈V1,v2∈V2,v=v1∩v2 [α1(v1), β1(v1)]⊕ [α2(v2), β2(v2)], ∀v ∈ R Definition 8. Let fpt1 = 〈V1, α1, β1〉 and fpt2 = 〈V2, α2, β2〉 be two fuzzy probabilistic triples, and be a probabilistic difference strategy. Then the difference of fpt1 and fpt2 under , denoted by fpt1 fpt2, is the fuzzy probabilistic triple fpt = 〈V , α, β〉, such that: 194 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA 1) V = P ∪ Q, where P= {v1 ∈ V1|¬∃v2 ∈ V2: h(v1 ∩ v2) = 1}, Q= {v = v1 ∩ v2|v1 ∈ V1,v2 ∈ V2, h(v1 ∩ v2) = 1 and [α1(v1), β1(v1)] [α2(v2), β2(v2)] 6= [0, 0]}, and 2) [α(v), β(v)] = { [α1(v), β1(v)], ∀v ∈ P ⊕me:v1∈V1,v2∈V2,v=v1∩v2 [α1(v1), β1(v1)] [α2(v2), β2(v2)], ∀v ∈ Q 3. SCHEMAS AND FUZZY PROBABILISTIC RELATIONS 3.1. Fuzzy probabilistic relational schemas Fuzzy probabilistic relational schemas are extended from probabilistic relational schemas in [17]. A fuzzy probabilistic relational schema describes a set of attributes of a set of certain objects of which each attribute is associated with fuzzy probabilistic triples as the following definition. Definition 9. A fuzzy probabilistic relational schema is a pair R= (U , ℘), where 1) U = {A1, A2,. . . ,Ak} is a set of pairwise different attributes. 2) ℘ is a function that maps each attribute A ∈U to the set of all fuzzy probabilistic triples on the value domain of A (i.e. each element of ℘(A) is a fuzzy probabilistic triple that has the form 〈V , α, β〉, where V is a subset of the set of all fuzzy sets on the value domain of A). Note that as in the classical relational database, for simplicity, the notations R(U , ℘) and R can be used to replace R = (U , ℘). In addition, the domain of each attribute A is denoted by dom(A). 3.2. Fuzzy probabilistic relations A fuzzy probabilistic relation is an instance of a fuzzy probabilistic relational schema in which each attribute may take imprecise and uncertain values represented by a fuzzy probabilistic triple as the definition below. Definition 10. Let U= {A1, A2,. . . ,Ak} be a set of k pairwise different attributes. A fuzzy probabilistic relation r over the fuzzy probabilistic relational schema R(U , ℘), is a finite set {t|t= (〈V1, α1, β1〉, 〈V2, α2, β2〉,. . . , 〈Vk, αk, βk〉)} in which each element t is a list of k fuzzy probabilistic triples such that 〈Vi, αi, βi〉 belongs to the set fi = ℘(Ai) and Vi 6= ∅, for every i =1, 2,. . . , k. Each element t in the relation r over R(U , ℘) is called a tuple on U . Each fuzzy probabilistic triple 〈Vi, αi, βi〉 represents the imprecise and uncertain value of the attribute Ai of the tuple t, the notation t.Ai denotes 〈Vi, αi, βi〉 that is t.Ai = 〈Vi, αi, βi〉. For each set of attributes X ⊆ {A1, A2, . . . , Ak}, the notation t[X ] is used to denote the rest of t after eliminating the value of attributes not belonging to X . From Definition 5, it is noted that, each attribute Ai of a tuple t in the relation r over R(U , ℘) only takes one value v in Vi with a probability p(v) ∈ [αi(v),βi(v)]. As in [1,3,16], the model FPRDB adopts the closed world assumption (CWA). It means, for each tuple t, every value v ∈dom(Ai)−Vi has the probability 0. In addition, each precise value v ∈ Vi is also considered as a special fuzzy set on dom(Ai) with the membership function µv(v) =1 and µv(x) = 0, ∀x ∈dom(Ai) and x 6= v. So, each probabilistic relation in [17] can be considered as a particular fuzzy probabilistic relation by Definition 10. NGUYEN HOA 195 Example 4. A simple fuzzy probabilistic relation PATIENT (over the schema PATIENT) about patients at the clinic of a hospital can be organised as Table 2 In the relation, the attributes P ID, P NAME, AGE, DISEASE and D COST describe the information about the identifier, name, age, disease and daily treatment cost of each patient, respectively. In reality, while diagnosing, the disease of each patient is not always determined certainly by the physicians. Similarly, the treatment cost for patients are also not known definitely even as the patients know about their diseases. Here, the conventional unit for the treatment cost is 1000 (VND). It is noted that, for each attribute A ∈U={P ID, P NAME, AGE, DISEASE D COST} in the schema PATIENT(U , ℘), ℘(A) includes all fuzzy probabilistic triples on the value domain of A (Definition 9). Table 2. Relation PATIENT P ID P NAME AGE DISEASE D COST PT226 N.V. Ha 〈{65}, u, u〉〈{lung cancer, tuber- culosis}, 0.8u, 1.2u〉〈{300, 350}, u, u〉 PT234 T.V. Son 〈{young},u, u〉〈{hepatitis, cirrho- sis}, 0.9u, 1.3u〉〈{about 60, about 70}, 0.8u, 1.2u〉 PT242 L.T. Lan 〈{middle aged}, u, u〉〈{cholecystitis}, u, u〉〈{8}, u, u〉 Also note that, in the real world applications, fuzzy set values of attributes of the fuzzy prob- abilistic relation PATIENT, such as about 60, about 70, young and middle aged will be defined (maybe by professional experts) compatibly and consistently with the meaning of the information represented by them. In this example, the fuzzy set values only simply illustrate for Definition 10. For example, about 60 = {58: 0.5, 59: 0.9, 60: 1, 61: 0.9, 62: 0.5} or about 70 = {68: 0.5, 69: 0.9, 70: 1, 71: 0.9, 72: 0.5} can be considered as a fuzzy set value representing the daily treatment cost of the patient T.V. Son who has hepatitis or cirrhosis. Similarly, young and middle aged whose membership function graphs as Figure 1, may be considered as fuzzy set values representing the imprecise age of the patients T.V.Son and L.T. Lan respectively. In addition, for simplicity, each fuzzy probabilistic triple 〈V , u, u〉, with V ={v}, will be represented as a single value v because if the attribute takes such a fuzzy probabilistic triple, then actually it only takes a value v with the probability is 1 (Definition 5). In other words, the attribute certainly takes the value v. Figure 1. Fuzzy set values of the attribute AGE Now, the notion of a fuzzy probabilistic relational database is defined as follows. Definition 11. A fuzzy probabilistic relational database over a set of attributes is a set of fuzzy 196 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA probabilistic relations corresponding with the set of their fuzzy probabilistic relational schemas. Note that, if we only care about an unique relation over a schema then we can unify its symbol name with its schema’s name. 3.3. Fuzzy probabilistic functional dependencies For defining the fuzzy probabilistic functional dependent concept in FPRDB, first a probability measure is proposed to determine the fuzzy equal degree of two values of the same attribute for two different tuples in a relation as below. Definition 12. Let t1 and t2 be two tuples in a fuzzy probabilistic relation r, A be an attribute of r and ⊗ be a probabilistic conjunction strategy. The probability interval for the values of the attribute A of two tuples t1 and t2 respectively are equal under ⊗, denoted by p(t1.A =⊗ t2.A) is [Σv∈V α(v).prob(v1 = v2), min(1, Σv∈V β(v).prob(v1 = v2))] where, t1.A = 〈V1, α1, β1〉, t2.A = 〈V2, α2, β2〉 and [α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)], ∀v= (v1, v2) ∈ V = V1 × V2. Now, the fuzzy probabilistic functional dependency in FPRDB is extended from the probabilistic functional dependency in PRDB [17] as follows. Definition 13. Let U= {A1, A2,. . . ,Ak} be a set of k pairwise different attributes R(U , ℘) be a fuzzy probabilistic relational schema, r be any fuzzy probabilistic relation over R, ⊗ be a probabilistic conjunction strategy, X= {Ai,. . . ,Al} and Y = {Aj ,. . . ,Am} be two subsets of U . A fuzzy probabilistic functional dependency of Y on X under ⊗ over R, denoted by X ⊗Y , if and only if ∀t1, t2 ∈ r, Pr(t1[X ] =⊗t2[X ]) ≤Pr(t1[Y ] =⊗t2[Y ]), where Pr(t1[X ] =⊗t2[X ]) = p(t1.Ai =⊗ t2.Ai) ⊗ . . .⊗p(t1.Al =⊗ t2.Al) and Pr(t1[Y ] =⊗t2[Y ]) = p(t1.Aj =⊗ t2.Aj) ⊗ . . .⊗p(t1.Am =⊗ t2.Am). An obvious example of the fuzzy probabilistic functional dependency is every attribute Ai de- pending on the set {A1, A2,. . . , Ak} that consists of all attributes of the schema R. It is noted that in the classical database, one can consider the probability for two values of an attribute equal i.e. only 0 or 1, so the functional dependency in the classical relational database is a particular case of the fuzzy probabilistic functional dependency in this definition. As for the classical relational database model, the keys of a schema in FPRDB are the basis for recognising a tuple in a fuzzy probabilistic relation. In the model and management systems of the classical relational database, key attributes are constrained not to take the value NULL [3, 4]. Similarly, in FPRDB, it is assuming that the value of each key attribute is always certain and definite. The key concept of fuzzy probabilistic relational schema is defined using the fuzzy probabilistic functional dependency as follows. Definition 14. Let R(U , ℘) be a fuzzy probabilistic relational schema, r be any relation over R and ⊗ be a probabilistic conjunction strategy, a set of attributes K ⊆U is called a key of R under ⊗ if the value of each attribute of K is always certain, precise in r and there is a fuzzy probabilistic functional dependency K ⊗U such that not to exist any subset of K has the properties. NGUYEN HOA 197 In the relation PATIENT above, if we assume that each patient has a unique identifier corre- sponding with the value of the attribute P ID and the identifier differs from every identifier of other patients, then by the definition, P ID is a key of the schema PATIENT under any probabilistic conjunction strategy. 4. SELECTION OPERATION ON A FUZZY PROBABILISTIC RELATION 4.1. Syntax of selection conditions The selection is a basic algebraic operation and is used to query the imprecise and uncertain informa- tion in FPRDB. Before defining the selection operation, we present the formal syntax and semantics of selection conditions by extending those definitions of PRDB with fuzzy sets. We start with the syntax of selection expressions as the following definition. Definition 15. Let R be a schema in FPRDB and X be a set of relational tuple variables. Then selection expressions are inductively defined and have one of the following forms: 1) x.A θc, where x ∈X , A is an attribute in R, θ is a binary relation from {=, 6=, ≤, <, ⊆,∈,→} and c is a single value or a fuzzy set. 2) x.A1 =⊗x.A2, where x ∈X , A1 and A2 are two different attributes in R and ⊗ is a proba- bilistic conjunction strategy. 3) E1 ⊗ E2, where E1 and E2 are selection expressions on the same relational tuple variable, ⊗ is a probabilistic conjunction strategy. 4) E1 ⊕ E2, where E1 and E2 are selection expressions on the same relational tuple variable, ⊕ is a probabilistic disjunction strategy. Example 5. Consider the fuzzy probabilistic relational schema PATIENT in Example 4, the selec- tion of “all patients who get hepatitis is and pay the daily treatment cost about 60 (thousand VND)” can be expressed by the selection expression x.DISEASE = hepatitis⊗x.D COST→about 60. In FPRDB, each selection condition is a logical combination of selection expressions with proba- bility intervals need to be satisfied as the following definition. Definition 16. Let R be a schema in FPRDB, selection conditions are inductively defined as follows: 1) If E is a selection expression and [L, U ] is a subinterval of [0, 1], then (E)[L,U ] is a selection condition. 2) If φ and ψ are selection conditions on the same tuple variable, then ¬φ, (φ ∧ ψ), (φ ∨ ψ) are selection conditions. Example 6. With the schema of the relation PATIENT in Example 4, then the selection of “all patients who are young with a probability of at least 0.4 and have lung cancer with a probability of at least 0.8” can be done using the selection condition (x.AGE → young) [0.4, 1.0] ∧ (x.DISEASE = lung cancer)[0.8, 1.0]. 198 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA 4.2. Semantics of selection conditions For defining the semantics of selection conditions in FPRDB, we first extend probabilistic interpre- tations of selection expressions in [17] with the binary relations on fuzzy sets in Section 2 as the definition below. Definition 17. Let R be a fuzzy probabilistic relational schema in FPRDB, r be a relation over R, x be a tuple variable and t be a tuple in r. The probabilistic interpretation of selection expressions with respect to R, r and t, denoted by probR,r,t, is the partial mapping from the set of all selection expressions to the set of all closed subintervals of [0, 1] that is inductively defined as follows: 1) probR,r,t(x.Aθc) = [Σv∈V α(v).prob(vθc), min(1, Σv∈V β(v).prob(vθc))], where t.A = 〈V , α, β〉. 2) probR,r,t(x.A1 =⊗ x.A2) = [Σv∈V α(v).prob(v1= v2), min(1, Σv∈V β(v).prob(v1 = v2))], where t.A1= 〈V1, α1, β1〉, t.A2 = 〈V2, α2, β2〉 and [α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)], ∀v = (v1, v2) ∈ V = V1 × V2. 3) probR,r,t(E1 ⊗ E2) = probR,r,t(E1)⊗ probR,r,t(E2). 4) probR,r,t(E1 ⊕ E2) = probR,r,t(E1)⊕ probR,r,t(E2). Intuitively, probR,r,t(x.Aθc) is the probability interval for the attribute A of the tuple t having a value v such that vθc and probR,r,t(x.A1 =⊗ x.A2) is the probability interval for the attributes A1 and A2 of the tuple t having values v1 and v2 respectively, such that v1 = v2. Example 7. Let r denote the relation PATIENT in Example 4 and R denote the schema of PATIENT consider the tuple t2 (the second tuple) in r, using Definition 3 we have prob(about 60 →about 60 ) = 0.86 and prob(about 70 →about 60 ) = 0.0. From that probR,r,t2(x.D COST→about 60 ) = [0.8u(about 60 ).prob(about 60 →about 60 ) + 0.8u(about 70 ).prob(about 70 →about 60 ), min(1, 1.2u(about 60 ).prob(about 60 →about 60 ) + 1.2u(about 70 ).prob(about 70 →about 60 ))] = [0.8 ×0.5×0.86 + 0.8 ×0.5×0., min(1, 1.2 ×0.5×0.86 + 1.2 ×0.5×0.0)] =[0.344, 0.516]. On the basis of the probabilistic interpretation of selection expressions, the satisfaction or seman- tics of selection conditions in FPRDB is defined as below. Definition 18. Let R be a relational schema in FPRDB, r be a fuzzy probabilistic relation over R and t ∈ r. The satisfaction of selection conditions under probR,r,t is defined as follows: 1) probR,r,t |= (E)[L, U ] if and only if (iff) probR,r,t(E) ⊆[L, U ]. 2) probR,r,t |= ¬φ iff probR,r,t |= φ does not hold. 3) probR,r,t |= φ ∧ ψ iff probR,r,t |= φ and probR,r,t |= ψ. 4) probR,r,t |= φ ∨ ψ iff probR,r,t |= φ or probR,r,t |= ψ. NGUYEN HOA 199 Note that, in the classical database, the concepts of selection expression and selection condition are identical and we can consider probability intervals [L, U ] in selection conditions being always equal to [1.0, 1.0]. This also means that the concept of satisfaction of selection conditions in the classical relational database model is a particular case of the concept of satisfaction of selection conditions in FPRDB. Now, the selection operation on a relation in FPRDB is defined as follows. Definition 19. Let R be a relational schema in FPRDB, r be a fuzzy probabilistic relation over R and φ be a selection condition over a tuple variable x. The selection on r with respect to φ, denoted by σφ(r), is the relation r’ = {t ∈ r|probR,r,t |= φ} over R, including all satisfied tuples of the selection condition φ. Example 8. Let approx 15 = (10: 0; 15: 1; 20: 0) denote the continuous triangle shaped fuzzy set whose vertices are (15, 1), (10, 0) and (20, 0). Consider the relation PATIENT in Example 4. Then, the query “Find all patients who are approximately 15 years old with a probability from 0.2 to 0.5 and have hepatitis with a probability of at least 0.4” can be done by the selection operation r’=σφ(PATIENT) with φ =(x.AGE→approx 15 )[0.2, 0.5]∧ (x.DISEASE = hepatitis)[0.4, 1.0]. As noted in Definition 3, the probabilistic interpretation prob(A→ B) can also be adapted for fuzzy sets on continuous domains, using integration instead of addition. That is prob(A→ B) = 1∫ 0 1∫ 0 p(u ∈ yB|u ∈ xA)dxdy = 1∫ 0 1∫ 0 p(xA ∩ yB) p(xA) dxdy = 1∫ 0 1∫ 0 |xA ∩ yB| |xA| dxdy where xA and yB are α -cuts of the fuzzy sets A and B with α = x and α = y, respectively. For the fuzzy sets approx 15 above and young given in Example 4, their α-cuts are respectively αapprox 15 = [10+5α, 20-5α] and αyoung= [0, 35-15α]. From that prob(young→approx 15 ) is computed as follows: prob(young → approx 15) = 1∫ 0 1∫ 0 |xyoung ∩ yapprox 15| |xyoung| dxdy = 1∫ 0 1∫ 0 |[0, 35− 15x] ∩ [10 + 5y, 20− 5y]| |[0, 35− 15x]| dxdy = 1∫ 0 1∫ 0 10− 10y 35− 15xdxdy = 1∫ 0 1∫ 0 2− 2y 7− 3xdxdy = ln 7− ln 4 3 ≈ 0.2. The selection is implemented by checking the satisfaction of all tuples in PATIENT for the selection condition φ. From the result computed above, we can see only the tuple t2 satisfies φ be- cause probR,r,t2(x.AGE→approx 15 ) = [u(young).prob(young→approx 15 ), min(1, u(young). prob(young→approx 15 ))] =[1 ×0.2, min(1, 1 ×0.2]= [0.2, 0.2]⊆ [0.2, 0.5] and probR,r,t2(x. DIS- EASE = hepatitis) = [0.45, 0.65]⊆[0.4, 1.0]. So, the result of the selection is the relation r’= σφ(PATIENT) as in Table 3. 200 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA Table 3. Relation r’=σφ (PATIENT) P ID P NAME AGE DISEASE D COST PT234 T.V. Son 〈{young},u, u〉〈{hepatitis, cirrhosis}, 0.9u, 1.3u〉〈{about 60, about 70}, 0.8u, 1.2u〉 5. OTHER OPERATIONSON FUZZY PROBABILISTIC RELATIONS As for the classical relational database and PRDB, other basic operations on fuzzy probabilistic relations are the projection, Cartesian product, join, intersection, union, and difference. We now extend those operations of PRDB for FPRDB taking into account imprecise and uncertain values of relational attributes. 5.1. Projection A projection of a fuzzy probabilistic relation on a set of attributes is a new fuzzy probabilistic relation in which only the attributes in that set are considered for each tuple of the new relation as the following definition. Definition 20. Let R= (U , ℘) be a fuzzy probabilistic relational schema, r be a relation over R and L be a subset of attributes of U . The projection of r on L denoted by ΠL(r), is the fuzzy probabilistic relation r’ over the schema R’, determined by: 1) R’ = (L, ℘’) and ℘’(A) = ℘(A), ∀A ∈L, 2) r’ = {t’= t[L]|t ∈ r}, i.e., r’ consists of tuples t’ to achieve from the tuples t = (〈V1, α1, β1〉,. . . , 〈Vk, αk, βk〉) ∈ r by eliminating every 〈Vj , αj , βj〉 that t.Aj = 〈Vj , αj , βj〉 and Aj /∈L. 5.2. Cartesian product For the Cartesian product of two fuzzy probabilistic relations, as in the classical relational database and PRDB, we assume the set of attributes of their schemas are disjoint. Also, for the operation being commutative, we assume every k-tuple t = (〈V1, α1, β1〉,. . . , 〈Vk, αk, βk〉) is an un-ordered list. Definition 21. The fuzzy probabilistic relational schemasR1(U 1, ℘1) andR2(U 2, ℘2) are Carte- sian product-compatible if and only if U 1 and U 2 have not any common attribute. Note that, any schemas R1(U 1, ℘1) and R2(U 2, ℘2) can be made Cartesian product-compatible by renaming of attributes in U 1 and U 2. Now, the Cartesian product of two fuzzy probabilistic relations in FPRDB is extended from that operation of PRDB as follows. Definition 22. Let r1 and r2 be two fuzzy probabilistic relations over the Cartesian product- compatible schemas R1= (U 1, ℘1) and R2= (U 2, ℘2), respectively. The Cartesian product of r1 and r2, denoted by r1 × r2, is the fuzzy probabilistic relation r over R determined by: 1) R = (U , ℘), whereU= U 1∪ U 2, ℘(A) = ℘1(A) if A ∈U 1 and ℘(A) = ℘2(A) if A ∈U 2, NGUYEN HOA 201 2) r= {t =(〈V1, α1, β1〉, . . . , 〈Vk, αk, βk〉, 〈Vk+1, αk+1, βk+1〉,. . . , 〈Vk+m, αk+m, βk+m〉)|t1 =(〈V1, α1, β1〉, . . . , 〈Vk, αk, βk〉) and t2 = (〈Vk+1, αk+1, βk+1〉,. . . , 〈Vk+m, αk+m, βk+m〉), t1 ∈ r1 and t2 ∈ r2}. 5.3. Join The join of two fuzzy probabilistic relations in FPRDB is extended from the natural join of two relations in the classical relational database and the join in PRDB. The join only works with relations whose schemas are join-compatible as the definition below. Definition 23. The fuzzy probabilistic relational schemas R1(U 1, ℘1) and R2(U 2, ℘2) are join -compatible if and only if the domains of two attributes of the same name A in U 1 and U 2, respectively are identical. From Definition 9, we can see that for two attributes of the same name A in U 1 and U 2 of two join–compatible schemas R1(U 1, ℘1) and R2(U 2, ℘2) then ℘1(A) = ℘2(A). For building the join of fuzzy two probabilistic relations, we first extend the join of two tuples in PRDB for FPRDB using the conjunction of fuzzy probabilistic triples as follows. Definition 24. Let t1 and t2 be two tuples on two sets of attributes U 1 and U 2 respectively, and ⊗ be a probabilistic conjunction strategy. The join of t1 and t2 under ⊗, denoted by t1 ./⊗ t2, is the tuple t on U 1∪U 2 defined by: 1) t.A = t1.A, ∀A ∈U 1−U 2, 2) t.A = t2.A, ∀A ∈U 2−U 1, 3) t.A =t1.A⊗ t2.A, ∀A ∈U 1∩U 2. It is noted that t1.A ⊗ t2.A is the conjunction of two fuzzy probabilistic triples t1.A and t2.A by Definition 6. It is easy to see that t1 ./⊗ t2 = t2 ./⊗ t1 for every probabilistic conjunction strategy, i.e. the join of two tuples is commutative. Definition 25. Let r1 and r2 be two fuzzy probabilistic relations over the join-compatible schemas R1= (U 1, ℘1) and R2= (U 2, ℘2), respectively and let ⊗ be a probabilistic conjunction strategy. The join of r1 and r2 under ⊗, denoted by r1 ./⊗ r2, is the fuzzy probabilistic relation r over the schema R, determined by: 1) R = (U , ℘) where U = U 1∪ U 2, ℘(A) = ℘1(A) if A ∈U 1− U 2, ℘(A) = ℘2(A) if A ∈U 2− U 1 and ℘(A) = ℘1(A) = ℘2(A) if A ∈U 1∩U 2 (because ℘1(A) = ℘2(A), Definition 23). 2) r= {t = t1 ./⊗ t2| t1 ∈ r1, t2 ∈ r2 and ∀A ∈U 1∩U 2, if t1.A ⊗ t2.A = 〈V , α, β〉 then V 6= ∅}. Example 9. Given two fuzzy probabilistic relations DOCTOR1 and DOCTOR2 as in Tables 4 and 5, where approx 30 = (25:0;30:1;35:0) denotes the continuous triangle shaped fuzzy set whose vertices are (30, 1), (25, 0) and (35, 0) and middle aged is the fuzzy set in Example 4. Then, the result of the join of them under the probabilistic conjunction strategy ⊗in and the standard intersection of fuzzy sets (with the height equals 1, by Definition 6) is the relation DOCTOR computed as in Table 6. 202 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA Table 4. Relation DOCTOR1 DOCTOR ID D AGE DT005 〈{middle aged}, u, u〉 DT093 〈{approx 30}, u, u〉 DT102 〈{55,56}, u, u〉 Table 5. Relation DOCTOR2 DOCTOR NAME D AGE L.V Cuong 〈{30, 31}, 0.8u, 1.2u〉 N.V. Hung 〈{middle aged}, u, u〉 N.T.Dat 〈{54, 55}, u, u〉 Here, the names of each relation and its schema are identical, the set of fuzzy probabilistic triples ℘(A) for each attribute A in the schemas consists of all fuzzy probabilistic triples on dom(A). 5.4. Intersection, union, and difference Intersection, union and difference of two fuzzy probabilistic relations, respectively, over the same schema is a fuzzy probabilistic relation over that schema, in which the value of attributes in common tuples of those two relations associated by a probabilistic combination strategy. A common tuple of two fuzzy probabilistic relations over the same schema is the tuple whose key attributes’ values are identical in both relations. It is due to the impreciseness and uncertainty of attribute values, a common tuple of two fuzzy probabilistic relations is not completely identical as that of two relations in the classical relational database. First, the intersection of two tuples as the basis for the intersection of two fuzzy probabilistic relations is defined as follows. Definition 26. Let t1 and t2 be two tuples respectively in two fuzzy probabilistic relations over the same schema R(U , ℘) and ⊗ be a probabilistic conjunction strategy. The intersection of t1 and t2 under ⊗, denoted by t1 ∩⊗ t2, is the tuple t on U defined by t.A = t1.A⊗ t2.A for every A ∈U . Definition 27. Let r1 and r2 be two fuzzy probabilistic relations over the same schemaR(U , ℘), K be a key of R and ⊗ be a probabilistic conjunction strategy. The intersection of r1 and r2 under ⊗, denoted by r1∩⊗r2, is the fuzzy probabilistic relation r overR defined by r = {t = t1∩⊗t2| t1 ∈ r1, t2 ∈ r2 such that t1[K] = t2[K] and t1.A⊗ t2.A 6= 〈∅, α, β〉, ∀A ∈ U }. Table 6. Relation DOCTOR= DOCTOR1 ./⊗inDOCTOR2 DOCTOR ID DOCTOR NAME D AGE DT005 N.V. Hung 〈{middle aged}, u, u〉 DT093 L.V Cuong 〈{30}, 0.4u, 0.6u〉 DT102 N.T.Dat 〈{55}, 0.25u, 0.25u〉 NGUYEN HOA 203 It is noted that, the notation t1[K] = t2[K] is used in the definition due to the value of each key attribute assumed is certain and definite as in the Definition 14. Example 10. Consider two relations DIAGNOSE1 and DIAGNOSE2 over the same schema DI- AGNOSE(U , ℘) as in Tables 7 and 8 where U={P ID, DOCTOR ID, P AGE, DISEASE} and {P ID, DOCTOR ID} is a key of DIAGNOSE, young, middle aged and approx 15 are the fuzzy sets given in Examples 4 and 8. The set ℘(A) for each attribute A in the schema DIAG- NOSE (U , ℘) consists of all fuzzy probabilistic triples 〈V , α, β〉 on dom(A). Then the intersection of DIAGNOSE1 and DIAGNOSE2 under ⊗in is the relation DIAGNOSE computed as in Table 9. Table 7. Relation DIAGNOSE1 P ID DOCTOR ID P AGE DISEASE PT226 DT093 〈{65}, u, u〉〈{lung cancer, tuberculosis}, 0.8u, 1.2u〉 PT234 DT102 〈{approx 15},u, u〉〈{hepatitis, cirrhosis}, u, u〉 Table 8. Relation DIAGNOSE2 P ID DOCTOR ID P AGE DISEASE PT383 DT102 〈{69, 70}, u, u〉〈{lung cancers}, u, u〉 PT234 DT102 〈{young},u, u〉〈{hepatitis, gall-stone}, 0.8u, 1.2u〉 PT242 DT025 〈{middle aged}, u, u〉〈{cholecystitis}, u, u〉 Table 9. Relation DIAGNOSE = DIAGNOSE1∩⊗inDIAGNOSE2 P ID DOCTOR ID P AGE DISEASE PT234 DT102 〈{approx 15},u, u〉〈{hepatitis}, 0.2u, 0.3u〉 It is noted that, in the example, the standard intersection is used to compute the intersection of fuzzy sets whereby, for example approx 15 ∩ young= approx 15. In addition, by Definition 6, the height of each fuzzy set computed from the intersection of two fuzzy sets is required to equal 1. The union of two fuzzy probabilistic relations over the same schema in FPRDB are based on the union of tuples as below. Definition 28. Let t1 and t2 be two tuples respectively in two fuzzy probabilistic relations over the same schema R(U , ℘) and ⊕ be a probabilistic disjunction strategy. The union of t1 and t2 under ⊕, denoted by t1 ∪⊕ t2, is the tuple t on U defined by t.A = t1.A⊕ t2.A for every A ∈U . Definition 29. Let r1 and r2 be two fuzzy probabilistic relations over the same schema R(U , ℘), K be a key of R and ⊕ be a probabilistic disjunction strategy. The union of r1 and r2 under ⊕, denoted by r1 ∪⊕ r2, is the fuzzy probabilistic relation r over R defined by r = {t1 ∈ r1| there is not any tuple t2 ∈ r2 such that t1[K] =t2[K]}∪{t2 ∈ r2| there is not any tuple t1 ∈ r1 such that t2[K] = t1[K]}∪{t = t1 ∪⊕ t2|t1 ∈ r1, t2 ∈ r2 such that t1[K] = t2[K]}. 204 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA As for the intersection and union operations, for defining the difference operation of two fuzzy probabilistic relations, we first define the difference operation of two tuples as follows. Definition 30. Let t1 and t2 be two tuples respectively in two fuzzy probabilistic relations over the same schema R(U , ℘) and be a probabilistic difference strategy. The difference of t1 and t2 under , denoted by t1 − t2, is the tuple t on U defined by t.A = t1.A t2.A for every A ∈U . Definition 31. Let r1 and r2 be two fuzzy probabilistic relations over the same schema R(U , ℘), K be a key of R and be a probabilistic difference strategy. The difference of r1 and r2 under , denoted by r1 − r2, is the fuzzy probabilistic relation r over R defined by r= {t1 ∈ r1| there is not any tuple t2 ∈ r2 such that t1[K] = t2[K]}∪{t = t1 − t2| t1 ∈ r1, t2 ∈ r2 such that t1[K] = t2[K] and t1.A t2.A 6= 〈∅, α, β〉, ∀A ∈ U }. 6. PROPERTY OF ALGEBRAIC OPERATIONS In this section, we propose some properties of the fuzzy probabilistic relational algebraic operations in FPRDB as an extension from those in the classical relational database and PRDB. Clearly, these properties say that FPRDB model is coherent and consistent. Theorem 1. Let r be a fuzzy probabilistic relation over the schema R in FPRDB, φ1 and φ2 be two selection conditions. Then σφ1(σφ2(r)) = σφ2(σφ1(r)) = σφ1∧φ2(r) (1) where, the last expression assumes that φ1 and φ2 have the same tuple variable. The first property shows that the selections may be reordered. Proof. Let r1 = σφ1(r), r2 = σφ2(r) and r1∧2 = σφ1∧φ2(r). Then for each t ∈ r, we have σφ1(σφ2(r)) = {t ∈ r2|probR,r2,t |= φ1} = {t ∈ r|(probR,r,t |= φ2) ∧ (probR,r2,t |= φ1)} = {t ∈ r|(probR,r,t |= φ2) ∧ (probR,r,t |= φ1)}(because of r2 ⊆ r) = {t ∈ r|probR,r,t |= φ1 ∧ φ2} (Definition 18) = σφ1∧φ2(r). So, σφ1(σφ2(r)) = σφ1∧φ2(r) is proven. Equation σφ2(σφ1(r)) = σφ2∧φ1(r) is proven similarly. Since φ1 ∧ φ2 ⇔ φ2 ∧ φ1 (the logical conjunction of selection conditions are commutative), hence σφ1∧φ2(r) = σφ2∧φ1(r). Therefore, we have σφ1(σφ2(r)) = σφ2(σφ1(r)) and so σφ1(σφ2(r)) = σφ2(σφ1(r)) = σφ1∧φ2(r). Thus, Theorem 1 is proven. Theorem 2. Let r be a fuzzy probabilistic relation over the schema R in FPRDB, A and B be two subsets of attributes of R and A ⊆B . Then ΠA(ΠB(r)) = ΠA(r). (2) Proof. Because A ⊆ B , so A∩B= A and sides of (2) are the relations over the same schema (Definition 20). From that, we are easy to see ΠA(ΠB(r)) = ΠA∩B(r) = ΠA(r). Thus, equation(2) is proven. NGUYEN HOA 205 Theorem 3. Let R1, R2 and R3 be pairwise join-compatible schemas in FPRDB, r1, r2 and r3 be fuzzy probabilistic relations over R1, R2 and R3 respectively. Let ⊗ be a probabilistic conjunction strategy. Then r1 ./⊗ r2 = r2 ./⊗ r1, (3) (r1 ./⊗ r2) ./⊗ r3 = r1 ./⊗ (r2 ./⊗ r3). (4) Equation (4) and (5) say that the join operation of fuzzy probabilistic relations is commutative and associative. Proof. Clearly, r1 ./⊗ r2 and r2 ./⊗ r1 are two relations over the same schema. By Definition 6, the conjunction of fuzzy probabilistic triples is commutative (due to the commutativity of proba- bilistic conjunction strategies and the intersection of fuzzy sets). Consequently, the join of tuples is commutative (by Definition 24). So, by Definition 25, we have r1 ./⊗ r2 = r2 ./⊗ r1. Since R1, R2 and R3 are pairwise join-compatible, so the results of two sides of (4) are the relations over the same schema. Moreover, the intersection of fuzzy sets has the associativity, by Definition 6, it follows that the conjunction of fuzzy probabilistic triples is associative. From asso- ciativity of the classical relational join and by Definition 24, it is easy to see that the join of tuples which are based on the conjunction of fuzzy probabilistic triples is associative. Thus, by Definition 25, it results in (r1 ./⊗ r2) ./⊗ r3 = r1 ./⊗ (r2 ./⊗ r3). Because the Cartesian product is a particular case of the join (Definition 22 and Definition 25), we have the straight corollary of Theorem 3 below. Corollary 1.Let R1, R2 and R3 be pairwise Cartesian product-compatible schemas in FPRDB, r1, r2 and r3 be fuzzy probabilistic relations over R1, R2 and R3 respectively. Then r1 × r2 = r2 × r1, (5) (r1 × r2)× r3 = r1 × (r2 × r3). (6) Theorem 4. Let r1, r2 and r3 be fuzzy probabilistic relations over the same schema R in FPRDB. Let ⊗/⊕ be a probabilistic conjunction/disjunction strategy. Then r1 ∩⊗ r2 = r2 ∩⊗ r1, (7) (r1 ∩⊗ r2) ∩⊗ r3 = r1 ∩⊗ (r2 ∩⊗ r3), (8) r1 ∪⊕ r2 = r2 ∪⊕ r1, (9) (r1 ∪⊕ r2) ∪⊕ r3 = r1 ∪⊕ (r2 ∪⊕ r3). (10) Equations of (7), (8), (9) and (10) say that the intersection and union of relations in FPRDB are commutative and associative. Proof. Equations in the theorem are proven respectively as follows: 206 A FUZZY PROBABILISTIC RELATIONAL DATABASE MODEL AND ALGEBRA Equations (7) and (8): From commutativity and associativity of the intersection of fuzzy sets, it follows the conjunction of fuzzy probabilistic triples has commutativity and associativity. Thus, the intersection of tuples, by Definition 26, has commutativity and associativity. So, the intersection of tuples that have the same key value in r1, r2 and r3 respectively is commutative and associative. From that, it follows Equations (7) and (8). Equations (9) and (10): From commutativity of the union, intersection of fuzzy sets, the disjunc- tion of fuzzy probabilistic triples (Definition 7) and the union of tuples (Definition 28), by Definition 29 we have Equation (9). For Equation (10), let K be the key used to determine common tuples of r1, r2 and r3. Without loss of generality, we may assume that each tuple t belongs to one of the three relations r1, r2 and r3 then there exists two tuples belonging to the two remaining relations, respectively such that the value of the key K of the three tuples respectively are always the same. This can be done by adding t to the relations in which it is missing and resetting α(v) = β(v) = 0 for every v in the set V of values of each attribute A /∈ K of the tuple t. Under this technical assumption, the result of each expression in Equation (10) is not changed and only the union case of two tuples in Definition 29 is relevant. Now, from associativity of the disjunction of fuzzy probabilistic triples (Definition 7) and the union of tuples, Equation(10) obviously holds. 7. CONCLUSION In this paper, the authors propose a fuzzy probabilistic relational database model, called FPRDB, as an extension of the PRDB model with fuzzy sets. FPRDB is also a complete development for the model in [18] with a full set of basic fuzzy probabilistic relational algebraic operations. FPRDB is capable of representing and manipulating both imprecise and uncertain information in the real world applications. FPRDB has been built based on the association of the probability theory and fuzzy theory. The relational schemas, relations, functional dependencies and relational algebraic operations of FPRDB have been defined coherently and consistently. A set of basic properties of the algebraic operations in FPRDB has also been proposed as theorems and proven completely. Towards applying FPRDB in practice, a management system for FPRDB will be build with the familiar querying and manipulating language like SQL that is able to represent and handle imprecise and uncertain information in the real world. REFERENCES [1] T. Eiter, J. J. Lu, T. Lukasiewicz, and V. Subrahmanian, “Probabilistic object bases,” ACM Transactions on Database Systems (TODS), vol. 26, no. 3, pp. 264–312, 2001. [2] T. H. Cao and H. Nguyen, “Uncertain and fuzzy object bases: a data model and algebraic operations,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 19, no. 02, pp. 275–305, 2011. [3] E. F. Codd, “A relational model of data for large shared data banks,” Communications of the ACM, vol. 13, no. 6, pp. 377–387, 1970. [4] D. C. J., An introduction to database systems. Prentice hall New Jersey, 8th ed., 2008. [5] G. Klir and B. Yuan, Fuzzy sets and fuzzy logic. Prentice hall New Jersey, 1995, vol. 4. [6] Z. L.A., “Fuzzy sets,” Journal of Information and Control, vol. 8, no. 3, pp. 338–353, 1965. NGUYEN HOA 207 [7] T. Ge, A. Dekhtyar, and J. Goldsmith, “Uncertain data: Representations, query processing, and applications,” in Advances in Probabilistic Databases for Uncertain Information Management. Springer, 2013, pp. 67–108. [8] Z. Ma and L. Yan, Advances in probabilistic databases for uncertain information management. Springer, 2013, vol. 304. [9] N. Hoa and T. D. Hieu, “A probabilistic relational data model for uncertain information,” in 2013 IEEE Third International Conference on Information Science and Technology (ICIST). IEEE, 2013, pp. 607–613. [10] R. Ross, V. Subrahmanian, and J. Grant, “Aggregate operators in probabilistic databases,” Journal of the ACM (JACM), vol. 52, no. 1, pp. 54–101, 2005. [11] W. Zhao, A. Dekhtyar, and J. Goldsmith, “Databases for interval probabilities,” International journal of intelligent systems, vol. 19, no. 9, pp. 789–815, 2004. [12] X. Meng, Z. Ma, and X. Zhu, “A knowledge-based fuzzy query and results ranking approach for relational databases,” Journal of Computational Information Systems, vol. 6, no. 6, 2010. [13] J. Mishra and S. Ghosh, “A new functional dependency in a vague relational database model,” International Journal of Computer Applications, vol. 39, no. 8, pp. 29–36, 2012. [14] F. Petry, “Fuzzy databases: Principles and applications (with chapter contribution by patrick bosc). international series in intelligent technologies. ed. hj zimmermann,” J. Zimmermann. Kluwer Academic Publishers (KAP), 1996. [15] A. A. Sabour, A. M. Gadallah, and H. A. Hefny, “Flexible querying of relational databases: Fuzzy set based approach,” in International Conference on Advanced Machine Learning Technologies and Applications. Springer, 2014, pp. 446–455. [16] L. Yan and Z. Ma, “A fuzzy probabilistic relational database model and algebra,” International Journal of Fuzzy Systems, vol. 15, no. 1, p. 244, 2013. [17] H. Nguyen, “A probabilistic relational database model and algebra,” Journal of Computer Sci- ence and Cybernetics, vol. 31, no. 4, p. 305, 2015. [18] ——, “Towards a fuzzy probabilistic relational data base model,” in Knowledge and Systems Engineering (KSE), 2015 Seventh International Conference on. IEEE, 2015, pp. 298–301. [19] J. Baldwin, J. Lawry, and T. Martin, “A mass assignment theory of the probability of fuzzy events,” Fuzzy Sets and Systems, vol. 83, no. 3, pp. 353–367, 1996. [20] ——, “A note on probability/possibility consistency for fuzzy events,” in Proceedings of the 6th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, 1996, pp. 521–525. Received on March 15 - 2016 Revised on August 18 - 2016

Các file đính kèm theo tài liệu này:

a_fuzzy_probabilistic_relational_database_model_and_algebra.pdf