Journal of Computer Science and Cybernetics, V.33, N.2 (2017), 131–142
DOI 10.15625/1813-9663/33/2/9281

SOME ALGORITHMS RELATED TO CONSISTENT DECISION TABLE

HOANG MINH QUANG^1, VU DUC THI^2, NGUYEN NGOC SAN^3

^1 Institute of Information Technology, Vietnam Academy of Science and Technology
^2 The Information Technology Institute (ITI), Vietnam National University, Hanoi
^3 Posts and Telecommunications Institute of Technology

^1 hoangquang@ioit.ac.vn

Abstract. Rough set theory is a useful mathematical tool developed to deal with vagueness and uncertainty. As an important concept of rough set theory, an attribute reduct is a subset of attributes that are jointly sufficient and individually necessary for preserving a particular property of the given information table. Rough set theory is also the most popular approach for generating decision rules from decision tables. In this paper, we propose an algorithm for finding object reducts of a consistent decision table. We also present an algorithm for finding attribute reducts, and we prove the correctness of both algorithms. These algorithms have polynomial time complexity. Our object reduct helps other algorithms for finding attribute reducts become more effective, especially when working with huge consistent decision tables.

Keywords. Object reduct, attribute reduct, rough set theory, decision table, decision rules.

1. INTRODUCTION

Rough set theory, introduced by Zdzislaw Pawlak in the early 1980s [3], is a mathematical tool to deal with vagueness and uncertainty. This approach seems to be of fundamental importance to artificial intelligence and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.

The information [4] about the real world is given in the form of an information table (sometimes called a decision table). Thus, the information table represents input data, gathered from any domain, such as medicine, finance, or the military. Rows of a decision table are called objects (examples, entities). Properties of objects, the columns of a decision table, are perceived through assigning values to some variables. We distinguish between two kinds of variables: attributes (sometimes called condition attributes) and decisions (sometimes called decision attributes). Usually a single decision is all that is required. For example, if the information table describes a hospital, the objects may be patients; the attributes, symptoms and tests; and the decisions, diseases. Each patient is characterized by the results of tests and symptoms and is classified by the physicians (experts) as being on some level of disease severity. If the information table describes an industrial process, the objects may represent samples of the process taken at some specific moments in time; the attributes, the parameters of the process; and the decisions, actions taken by the operators (experts).

The main concept of rough set theory [4] is an indiscernibility relation, normally associated with a set of attributes. Obviously, the indiscernibility relation is an equivalence relation. The redundant (or dispensable) attributes are defined through the concept of the indiscernibility relation.
If a set of attributes and its superset define the same indiscernibility relation, then any attribute that belongs to the superset and not to the set is redundant. A set of attributes with no redundant attribute is called minimal (or independent). A set P of attributes is a reduct (or covering) of another set Q of attributes if P is minimal and the indiscernibility relations defined by P and Q are the same. In [5], it is shown that the problems of generating minimal (relative) reducts and of generating minimal dependencies are NP-hard.

In this paper, we propose a novel method to eliminate redundant objects of a consistent decision table while the problem of finding attribute reducts is preserved. Our method runs in polynomial time. An object reduct of a consistent decision table is a table reduced from the original one that satisfies both of the following conditions: first, the table has to preserve all attribute reducts of the original one; second, the table has to have a minimal number of objects. For example, a consistent decision table with one million objects may have ten attribute reducts in total; after applying the object reduct algorithm it still keeps the ten attribute reducts but has only ten thousand objects, and that number of objects is minimal in the sense that if we remove one more object, we cannot preserve the ten attribute reducts. It is clear that our algorithm is useful in reducing the number of objects and the size of data storage. Because the computational time complexity of most attribute reduct algorithms depends on the number of objects, our algorithm obviously helps these algorithms become more effective and efficient, especially for tables that contain a huge number of objects.

2. BASIC CONCEPTS

Some basic concepts of relational database theory and of rough set theory can be found in [1, 2, 6] and [3, 4, 5], respectively. We consider a decision table in rough set theory as a relation in relational database theory, because both consist of rows and columns and share similar concepts of objects and attributes. Based on results in relational database theory [6], we can apply these results to decision tables [7]. Some definitions and notations used in this paper are given as follows. Pawlak defines information systems and decision tables in [3]. The definitions of relation and functional dependency (FD) are given in [2].

Definition 1. Let DS = (U, C ∪ {d}, V, f) be a decision table, where U = {u_1, ..., u_m} is a relation over C ∪ {d}. The decision table DS is consistent if and only if the functional dependency C → {d} holds; that is, for any x, y ∈ U, if C(x) = C(y) then d(x) = d(y). Otherwise, DS is inconsistent.

Example 2. Given a consistent decision table DS = (U, C ∪ {d}, V, f) where U = {u_1, ..., u_6} (or {1, 2, 3, 4, 5, 6}), {d} is the decision attribute "Flu" (denoted {f}), C = {Headache, Muscle pain, Temperature} (denoted {h, m, t} or {hmt}), and R = C ∪ {d} = {hmtf}. Here V_Headache = {yes, no}, V_Muscle pain = {yes, no}, V_Temperature = {normal, high, very high}, V_d = {no, yes}, V = V_Headache ∪ V_Muscle pain ∪ V_Temperature ∪ V_d, and the function f : U × (C ∪ {d}) → V is given by Table 1.

Table 1. Consistent decision table

No.  Headache  Muscle pain  Temperature  Flu
1    yes       yes          normal       no
2    yes       yes          high         yes
3    yes       yes          very high    yes
4    no        yes          normal       no
5    no        no           high         no
6    no        yes          very high    yes
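The consistency test of Definition 1 is easy to mechanize. The following is a minimal Python sketch, not taken from the paper: the table literal TABLE encodes Table 1, and the helper name is_consistent is ours. It checks the functional dependency C → {d} by grouping objects on their condition values.

```python
# A minimal sketch of Definition 1: DS is consistent iff objects that agree
# on all condition attributes also agree on the decision attribute.
TABLE = [  # Table 1: (Headache, Muscle pain, Temperature, Flu)
    ("yes", "yes", "normal",    "no"),
    ("yes", "yes", "high",      "yes"),
    ("yes", "yes", "very high", "yes"),
    ("no",  "yes", "normal",    "no"),
    ("no",  "no",  "high",      "no"),
    ("no",  "yes", "very high", "yes"),
]

def is_consistent(rows):
    """Return True iff C -> {d} holds (the last column is the decision d)."""
    decision_of = {}
    for *c_values, d_value in rows:
        key = tuple(c_values)
        if decision_of.setdefault(key, d_value) != d_value:
            return False          # same C-values, different decisions
    return True

print(is_consistent(TABLE))      # True: Table 1 is consistent
```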
Definition 3. Every attribute subset P ⊆ C ∪ {d} determines an indiscernibility relation IND(P) = {(u, v) ∈ U × U : ∀a ∈ P, f(u, a) = f(v, a)}. IND(P) determines a partition of U, which is denoted by U/P. Any element [u]_P = {v ∈ U : (u, v) ∈ IND(P)} of U/P is called an equivalence class.

Example 4. (continuing Example 2)

U/h = {{1, 2, 3}, {4, 5, 6}}
U/m = {{1, 2, 3, 4, 6}, {5}}
U/t = {{1, 4}, {2, 5}, {3, 6}}
U/f = {{1, 4, 5}, {2, 3, 6}}
U/hm = {{1, 2, 3}, {4, 6}, {5}}
U/ht = {{1}, {2}, {3}, {4}, {5}, {6}}
U/mt = {{1, 4}, {2}, {3, 6}, {5}}
U/hmt = {{1}, {2}, {3}, {4}, {5}, {6}}.

Definition 5. Pawlak [3] defines the lower approximation, the upper approximation and the positive region based on equivalence classes as follows, with B ⊆ C and X ⊆ U:

• the B-lower approximation of X is the set B̲X = {u ∈ U : [u]_B ⊆ X},
• the B-upper approximation of X is the set B̄X = {u ∈ U : [u]_B ∩ X ≠ ∅},
• the B-boundary is the set BN_B(X) = B̄X \ B̲X,
• the B-positive region of D is the set POS_B(D) = ⋃_{X ∈ U/D} B̲X.

Example 6. (continuing Example 4) From Definition 5 we obtain

POS_h({f}) = ∅
POS_m({f}) = {5}
POS_t({f}) = {1, 3, 4, 6}
POS_hm({f}) = {5}
POS_ht({f}) = {1, 2, 3, 4, 5, 6}
POS_mt({f}) = {1, 2, 3, 4, 5, 6}.

Definition 7. A pair s = ⟨R, F⟩, where R is a set of attributes and F is a set of FDs on R, is called a relation schema. For any A ⊆ R, the set A^+ = {a : A → {a} ∈ F^+} is called the closure of A on s. It is clear that A → B ∈ F^+ if and only if B ⊆ A^+. Similarly, A^+_r = {a : A → {a} holds on r} is called the closure of A on a relation r.

Definition 8. Let r be a relation, s = ⟨R, F⟩ a relation schema and A ⊆ R. Then A is a key of r (a key of s) if A → R (A → R ∈ F^+). A is a minimal key of r (of s) if A is a key of r (s) and no proper subset of A is a key of r (s). The set of all minimal keys of r (s) is denoted by K_r (K_s). A family K ⊆ P(R) is a Sperner-system on R if for any A, B ∈ K we have A ⊄ B. It is clear that K_r (K_s) are Sperner-systems.

Definition 9. Let K be a Sperner-system over R, regarded as the set of all minimal keys of s. We define the set of antikeys of K, denoted K^{-1}, as follows:

K^{-1} = {A ⊂ R : (B ∈ K ⇒ B ⊄ A) and (A ⊂ C ⇒ ∃B ∈ K : B ⊆ C)}.

It is easy to see that K^{-1} is the set of subsets of R which do not contain any element of K and which are maximal for this property; they are the maximal non-keys. Clearly, K^{-1} is also a Sperner-system.

Definition 10. Let r be a relation over R. Denote E_r = {E_ij : 1 ≤ i < j ≤ |r|}, where E_ij = {a ∈ R : h_i(a) = h_j(a)}. Then E_r is called the equality set of r. For A ⊆ R, A^+_r = ∩{E_ij ∈ E_r : A ⊆ E_ij} if there exists E_ij ∈ E_r such that A ⊆ E_ij; otherwise A^+_r = R.

Example 11. (continuing Example 2) Let U be a relation over C ∪ {d} and 1 ≤ i < j ≤ |U|. For each pair of rows (i, j) we construct the set E_ij. We have E_{1,2} = {hm} and E_{1,3} = {hm}. Doing the same for the pairs (1, 4), ..., (1, 6), (2, 3), (2, 4), ..., (5, 6), we obtain the set E_r containing the sets A_i as follows:

A_1 = {hm} = E_{1,2} = E_{1,3} = E_{4,6}
A_2 = {mtf} = E_{1,4} = E_{3,6}
A_3 = {f} = E_{1,5}
A_4 = {m} = E_{1,6} = E_{2,4} = E_{3,4}
A_5 = {hmf} = E_{2,3}
A_6 = {t} = E_{2,5}
A_7 = {mf} = E_{2,6}
A_8 = {hf} = E_{4,5}
A_9 = {h} = E_{5,6}

E_r = {A_1, ..., A_9} = E^U_r.
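Definition 10 and Example 11 translate directly into code. Below is a sketch reusing TABLE from the earlier snippet; the attribute names in ATTRS and the helper name equality_sets are our own. Following Example 11, pairs of rows that agree on no attribute contribute nothing and are dropped.

```python
# A sketch of Definition 10: E_ij collects the attributes on which two
# objects agree; E_r gathers the distinct nonempty E_ij over all row pairs.
from itertools import combinations

ATTRS = ("h", "m", "t", "f")     # Headache, Muscle pain, Temperature, Flu

def equality_sets(rows, attrs=ATTRS):
    """Return the equality set E_r as a set of frozensets."""
    e_r = set()
    for u, v in combinations(rows, 2):
        e_ij = frozenset(a for a, x, y in zip(attrs, u, v) if x == y)
        if e_ij:                  # empty agreement sets carry no information
            e_r.add(e_ij)
    return e_r

# len(equality_sets(TABLE)) == 9: the sets A_1, ..., A_9 of Example 11,
# e.g. frozenset({'h', 'm'}) = E_{1,2} = E_{1,3} = E_{4,6}.
```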
Definition 12. Let r = {h_1, ..., h_m} be a relation over R and let E_r be the equality set of r. Let M_r = {E_ij ∈ E_r : ∄E_st ∈ E_r, E_ij ⊂ E_st}, where 1 ≤ i < j ≤ m and 1 ≤ s < t ≤ m. M_r is called the maximal equality system of r. The set M_d = {A ∈ E_r : d ∉ A, ∄B ∈ E_r : d ∉ B, A ⊂ B} is called the maximal equality system of r with respect to the decision attribute d of the consistent decision table DS.

Example 13. (continuing Examples 2 and 11) We construct the set M^U_d, the maximal equality system of E_r consisting of the maximal elements that do not contain the decision attribute d of the consistent decision table DS. This gives M_d = M^U_d = {hm, t} = {B_1, B_2}.

Definition 14. Let s = ⟨R, F⟩ be a relation schema over R and a ∈ R. The set K^s_a = {A ⊆ R : A → {a}, ∄B ⊂ A : B → {a}} is called the family of minimal sets of the attribute a over s. Similarly, the set K^r_a = {A ⊆ R : A → {a}, ∄B ⊂ A : B → {a}} is called the family of minimal sets of the attribute a over r.

Definition 15. If K is a Sperner-system over R that is the family of minimal sets of the attribute a over r (or s), in other words K = K^r_a (or K = K^s_a), then K^{-1} = {K^r_a}^{-1} (or K^{-1} = {K^s_a}^{-1}) is the family of maximal subsets of R that are not minimal sets of the attribute a, defined as

{K^r_a}^{-1} = {A ⊆ R : A → {a} ∉ F^+_r, and A ⊂ B ⇒ B → {a} ∈ F^+_r},
{K^s_a}^{-1} = {A ⊆ R : A → {a} ∉ F^+, and A ⊂ B ⇒ B → {a} ∈ F^+},

where F_r denotes the family of FDs that hold on r. It is clear that R ∉ K^s_a, R ∉ K^r_a, {a} ∈ K^s_a, {a} ∈ K^r_a, and K^s_a, K^r_a are Sperner-systems over R.

Definition 16. Let DS = (U, C ∪ {d}, V, f) be a decision table. If B ⊆ C satisfies
1) POS_B(D) = POS_C(D),
2) ∀b ∈ B, POS_{B−{b}}(D) ≠ POS_C(D),
then B is called a reduct of C. If DS is a consistent decision table, then B is an attribute reduct of C if B satisfies B → {d} and, for all B′ ⊂ B, B′ → {d} does not hold. Let RED(C) be the set of all reducts of C. From this definition and the formula for K^r_a in Definition 14 we have RED(C) = K^r_d − {d}, where K^r_d is the family of all minimal sets of the attribute {d} over r = ⟨U, C ∪ {d}⟩.

Example 17. (continuing Example 6) Based on Definition 16, we test POS_B({f}) for B ∈ {{h}, {m}, {t}, {hm}, {ht}, {mt}} and find that B ∈ {{ht}, {mt}} satisfies POS_B({f}) = POS_C({f}). Thus RED_U(C) = {ht, mt}.
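Before turning to the algorithms, the maximal equality system of Definition 12 can also be made concrete in code. The following sketch continues the snippets above (equality_sets and TABLE are assumed; the helper name maximal_equality_system is ours) and reproduces M_d = {hm, t} of Example 13.

```python
# A sketch of Definition 12: M_d keeps the elements of E_r that avoid d
# and are maximal (w.r.t. set inclusion) among those.
def maximal_equality_system(e_r, d="f"):
    """Return M_d = {A in E_r : d not in A, no B in E_r with d not in B, A < B}."""
    no_d = [a for a in e_r if d not in a]
    return {a for a in no_d if not any(a < b for b in no_d)}

# maximal_equality_system(equality_sets(TABLE))
#   == {frozenset({'h', 'm'}), frozenset({'t'})}, i.e. M_d = {hm, t}.
```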
3. ALGORITHMS RELATED TO CONSISTENT DECISION TABLE

In this section, we construct two algorithms, running in polynomial time, to find an object reduct and an attribute reduct. The first algorithm preserves the process of finding attribute reducts according to positive-region reducts [3]. This is a significant result for the data mining process, especially with respect to large consistent decision tables. The objective of most algorithms related to consistent decision tables is to find all attribute reducts. Since the computational time complexity of these algorithms depends not only on the number of attributes but also on the number of objects, many redundant objects make the process slower. Therefore, in this paper we introduce a method to remove these redundant objects while preserving all attribute reducts. The method is based on database theory and rough set theory. In rough set theory, we have the definition of RED(C); in database theory, we have the definition of K^r_d. According to the definitions above, RED(C) = K^r_d − {d}: the set of all attribute reducts of a consistent decision table equals the set of all minimal keys of the relation U over C ∪ {d} of DS = (U, C ∪ {d}, V, f), minus the decision attribute {d}. On the other hand, we know that if there exists a set of all minimal keys, then there also exists a set of antikeys, and they determine each other. By Lemma 18 below, M_d = (K^r_d)^{-1}.

If we temporarily remove an object of the consistent decision table and M_d does not change, then (K^r_d)^{-1} does not change either, hence the set of all minimal keys does not change and RED(C) does not change; we then conclude that this object is redundant and remove it permanently, and vice versa. At the end, we obtain an object reduct, which is a consistent decision table from which all redundant objects have been rejected.

Lemma 18. Let DS = (U, C ∪ {d}, V, f) be a consistent decision table where C = {c_1, c_2, ..., c_n} and U = {u_1, u_2, ..., u_m}. Consider r = {u_1, u_2, ..., u_m} on the attribute set R = C ∪ {d}. Set E_r = {E_ij : 1 ≤ i < j ≤ m}, where E_ij = {a ∈ R : a(u_i) = a(u_j)}, and set M_d = {A ∈ E_r : d ∉ A, ∄B ∈ E_r : d ∉ B, A ⊂ B}. Then M_d = (K^r_d)^{-1}, where K^r_d is the family of minimal sets of the attribute {d} over the relation r. Lemma 18 is proved in [7].

Definition 19. An object reduct of a consistent decision table DS = (U, C ∪ {d}, V, f) is a consistent decision table DS′ = (U′, C ∪ {d}, V, f), where RED(C) = RED_U(C) and:
1) U′ ⊆ U,
2) RED_U(C) = RED_{U′}(C),
3) RED_U(C) ≠ RED_{U′−{u}}(C) for all u ∈ U′.

Algorithm 1: Finding an object reduct of a consistent decision table
Input: DS = (U, C ∪ {d}, V, f)
Output: DS′ = (U′, C ∪ {d}, V, f)
Step 1: Compute E_r = {A_1, ..., A_t};
Step 2: Compute M^U_d = {A ∈ E_r : d ∉ A, ∄B ∈ E_r : d ∉ B, A ⊂ B};
Step 3: Set T(0) = U = {u_1, ..., u_m};
Step 4: Set
    T(i+1) = T(i) − u_{i+1}  if M^{T(i)−u_{i+1}}_d = M^U_d,
    T(i+1) = T(i)            otherwise.
Finally, set U′ = T(m).
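Algorithm 1 can be rendered directly in Python, continuing the sketches above (equality_sets, maximal_equality_system and TABLE are assumed). This naive version recomputes E_r and M_d after each tentative deletion instead of maintaining them incrementally, so it is a readable sketch rather than an optimized implementation.

```python
# A sketch of Algorithm 1: drop an object permanently only when the maximal
# equality system M_d is unchanged; what remains is an object reduct U'.
def object_reduct(rows):
    m_d_full = maximal_equality_system(equality_sets(rows))
    kept = list(range(len(rows)))              # indices of surviving objects
    for i in range(len(rows)):                 # scan u_1, ..., u_m in order
        candidate = [j for j in kept if j != i]
        sub_rows = [rows[j] for j in candidate]
        if maximal_equality_system(equality_sets(sub_rows)) == m_d_full:
            kept = candidate                   # u_{i+1} is redundant
    return [rows[j] for j in kept]

# On TABLE this keeps exactly the objects {2, 4, 5, 6} of Example 21 below;
# a different scan order of U may give a different object reduct.
print(object_reduct(TABLE))
```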
Theorem 20. T(m) satisfies the three conditions 1), 2) and 3) of Definition 19.

Proof. We prove the theorem by induction. At the basis step T(0) = U; clearly U′ = U and RED_{U′}(C) = RED_U(C), so the two conditions 1), 2) are satisfied. At the inductive step, assume that T(i) = U(i) satisfies conditions 1), 2) of Definition 19. We prove that T(i+1) = U(i+1) satisfies the two conditions.

• In the first case, if T(i+1) = T(i), then obviously U(i+1) = U(i) and RED_{U(i+1)}(C) = RED_{U(i)}(C) = RED(C) by the induction hypothesis. Thus T(i+1) satisfies conditions 1), 2) of Definition 19.

• In the second case, if T(i+1) = T(i) − {u_{i+1}}, then M^U_d = M^{U(i+1)}_d. By Lemma 18, M^U_d = (K^U_d)^{-1} where U = {u_1, ..., u_m}, and M^{U(i+1)}_d = (K^{U(i+1)}_d)^{-1}; hence (K^U_d)^{-1} = (K^{U(i+1)}_d)^{-1}. By Definitions 9 and 15 (K and K^{-1} uniquely determine each other), it follows that K^U_d = K^{U(i+1)}_d. From Definition 16 and its consequence, we have RED_U(C) = K^U_d − {d} and RED_{U(i+1)}(C) = K^{U(i+1)}_d − {d}, so (ii1) RED_U(C) = RED_{U(i+1)}(C). From the induction hypothesis we have (ii2) RED_U(C) = RED_{U(i)}(C). From (ii1) and (ii2) we obtain RED_U(C) = RED_{U(i)}(C) = RED_{U(i+1)}(C). Note that RED_U(C) = RED(C) is a Sperner-system (by definition K^U_d is a Sperner-system, hence K^U_d − {d} is a Sperner-system), and RED_{U(i)}(C), RED_{U(i+1)}(C) are Sperner-systems. Finally, the two conditions of Definition 19 are satisfied at step i+1:
1) U(i+1) ⊆ U(i),
2) RED_{U(i+1)}(C) = RED_{U(i)}(C) = ... = RED_U(C) = RED(C).

When i+1 = m, Algorithm 1 stops. It remains to show that U(m) satisfies condition 3) of Definition 19, i.e., RED_{U(m)−u}(C) ≠ RED_U(C) for all u ∈ U(m). Assume that there exists u = u_{i+1}, u ∈ U(m), such that RED_{U(m)−u_{i+1}}(C) = RED_U(C) (ii3). By Definition 16, RED_{U(m)−u_{i+1}}(C) = K^{U(m)−u_{i+1}}_d − {d} and RED_U(C) = K^U_d − {d}; thus (ii3) ⇔ K^{U(m)−u_{i+1}}_d − {d} = K^U_d − {d} ⇔ K^{U(m)−u_{i+1}}_d = K^U_d (ii4). By Definitions 9, 15 and Lemma 18 (K and K^{-1} uniquely determine each other), (ii4) ⇔ (K^{U(m)−u_{i+1}}_d)^{-1} = (K^U_d)^{-1} ⇔ M^{U(m)−u_{i+1}}_d = M^U_d (ii5). By the induction above, if M^{U(m)−u_{i+1}}_d = M^U_d then u_{i+1} would have been removed, so u_{i+1} ∉ U(m), which contradicts the hypothesis u = u_{i+1} ∈ U(m). Hence condition 3) of Definition 19 is satisfied. The theorem is proved. □

It is clear that the number of steps for computing E_r by Definition 10 is less than |U|^2. The number of steps for computing M_d is less than |E_r|^2, and |E_r| ≤ |U|(|U| − 1)/2. Thus the worst-case time complexity of Algorithm 1 is not greater than O(|U|^5). If we change the order of the universe set U, we can find another object reduct.

Example 21. (continuing Example 13) In Steps 3 and 4 of Algorithm 1 we have T(0) = U = {1, 2, 3, 4, 5, 6}. Using the formula for M_d in Definition 12, we compute:

E^{T(0)−{1}}_r = {A_1, A_2, A_4, A_5, A_6, A_7, A_8, A_9},  M^{T(0)−{1}}_d = {hm, t} = M^U_d  ⇒  T(1) = T(0) − {1}
E^{T(1)−{2}}_r = {A_1, A_2, A_4, A_8, A_9},  M^{T(1)−{2}}_d = {hm} ≠ M^U_d  ⇒  T(2) = T(1)
E^{T(2)−{3}}_r = {A_1, A_4, A_6, A_7, A_8, A_9},  M^{T(2)−{3}}_d = {hm, t} = M^U_d  ⇒  T(3) = T(2) − {3}
E^{T(3)−{4}}_r = {A_6, A_7, A_9},  M^{T(3)−{4}}_d = {t, h} ≠ M^U_d  ⇒  T(4) = T(3)
E^{T(4)−{5}}_r = {A_1, A_4, A_7},  M^{T(4)−{5}}_d = {hm} ≠ M^U_d  ⇒  T(5) = T(4)
E^{T(5)−{6}}_r = {A_4, A_6, A_8},  M^{T(5)−{6}}_d = {m, t} ≠ M^U_d  ⇒  T(6) = T(5).

Set U′ = T(6) = {2, 4, 5, 6}; then DS′ = (U′, C ∪ {d}, V, f) is the object reduct of the consistent decision table DS = (U, C ∪ {d}, V, f), shown as Table 2.

Table 2. The object reduct of Table 1

No.  Headache  Muscle pain  Temperature  Flu
2    yes       yes          high         yes
4    no        yes          normal       no
5    no        no           high         no
6    no        yes          very high    yes

Example 22. (continuing Example 21) Using Definition 3 we have

U′/h = {{2}, {4, 5, 6}}
U′/m = {{2, 4, 6}, {5}}
U′/t = {{2, 5}, {4}, {6}}
U′/f = {{2, 6}, {4, 5}}
U′/hm = {{2}, {4, 6}, {5}}
U′/ht = {{2}, {4}, {5}, {6}}
U′/tm = {{2}, {4}, {5}, {6}}.

Example 23. (continuing Example 22) Using Definition 5 we have

POS′_h({f}) = {2}
POS′_m({f}) = {5}
POS′_t({f}) = {4, 6}
POS′_hm({f}) = {2, 5}
POS′_ht({f}) = {2, 4, 5, 6}
POS′_tm({f}) = {2, 4, 5, 6}.

Example 24. (continuing Examples 17 and 23) Based on Definition 16, we have RED_{U′}(C) = {ht, tm}. By Definition 19, RED_{U′}(C) is a Sperner-system and RED_U(C) = RED_{U′}(C) = {ht, tm}. Thus Algorithm 1 preserves RED(C) of the consistent decision table DS (Table 1). Clearly, the consistent decision table DS′ is efficient for finding RED(C) of a consistent decision table according to rough set theory [3].

Algorithm 2: Finding a minimal key from a set of antikeys
Input: Sperner-systems K, H and C = {c_1, ..., c_n} ⊆ R such that H^{-1} = K and ∃B ∈ H : B ⊆ C (i.e., C is a key)
Output: D ∈ H
Step 1: Set A(0) = C;
Step i+1: Set
    A(i+1) = A(i) − {c_{i+1}}  if ∀B ∈ K : A(i) − {c_{i+1}} ⊈ B,
    A(i+1) = A(i)              otherwise.
Finally, set D = A(n).

Lemma 25 [6]. If K is a set of antikeys, then A(n) ∈ H.

Algorithm 3: Finding an attribute reduct of a consistent decision table
Input: DS = (U, C ∪ {d}, V, f)
Output: D ∈ RED(C)
Step 1: Compute E_r = {A_1, ..., A_t};
Step 2: Compute M_d = {A ∈ E_r : d ∉ A, ∄B ∈ E_r : d ∉ B, A ⊂ B};
Step 3: Set H(0) = C = {c_1, ..., c_n};
Step 4: Set
    H(i+1) = H(i) − c_{i+1}  if ∄B ∈ M_d : H(i) − c_{i+1} ⊆ B,
    H(i+1) = H(i)            otherwise.
Finally, set D = H(n).
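Algorithm 3 admits an equally short sketch, again reusing equality_sets, maximal_equality_system and TABLE from the snippets above. An attribute is dropped whenever the remainder is contained in no element of M_d, i.e., it still functionally determines {d}; as noted after Theorem 26 below, the scan order of C determines which reduct is returned.

```python
# A sketch of Algorithm 3: greedily remove condition attributes while the
# remaining set is a subset of no element of M_d; the result is one reduct.
def attribute_reduct(rows, conds=("h", "m", "t"), d="f"):
    m_d = maximal_equality_system(equality_sets(rows), d)
    reduct = set(conds)
    for c in conds:                            # scan c_1, ..., c_n in order
        smaller = reduct - {c}
        if not any(smaller <= b for b in m_d):
            reduct = smaller                   # c is dispensable here
    return reduct

# On Table 1 (or its object reduct, Table 2) this yields {'m', 't'}, the
# reduct {mt} of Example 27; scanning C in another order can yield {'h', 't'}.
print(attribute_reduct(TABLE))
```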
Theorem 26. H(n) ∈ RED(C).

Proof. Algorithm 3 is based on Algorithm 2. By Lemma 18, (K^r_d)^{-1} = M_d. By Lemma 25, H(n) ∈ K^r_d (1). By the consequence of Definition 16, RED(C) = K^r_d − {d} (2). At Step 3 of Algorithm 3 we set C = {c_1, ..., c_n}, so d ∉ C; thus in Algorithm 3 d ∉ H(n) (3). From (1) and (3) we have H(n) ∈ K^r_d − {d} (4). From (2) and (4) we obtain H(n) ∈ RED(C). The theorem is proved. □

Similarly to Algorithm 1, the time complexity of Algorithm 3 is not greater than O(|C| × |U|^4). If we change the order of the set C in Step 3, we can get another attribute reduct of the consistent decision table DS.

Example 27. (continuing Example 21) In Step 1 of Algorithm 3 we have E_{1,2} = {m}, E_{1,3} = {t}, E_{1,4} = {mf}, E_{2,3} = {hf}, E_{2,4} = {hm}, E_{3,4} = {h}. In Step 2 of Algorithm 3, E_r = {m, t, mf, hf, hm, h} ⇒ M_d = {hm, t} = {B_1, B_2}. In Steps 3 and 4 of Algorithm 3 we have

C = H(0) = {hmt}
temp = H(0) − {h} = {mt} ⊈ B for all B ∈ M_d  ⇒  H(1) = temp
temp = H(1) − {m} = {t} ⊆ B_2 ∈ M_d  ⇒  H(2) = H(1)
temp = H(2) − {t} = {m} ⊆ B_1 ∈ M_d  ⇒  H(3) = H(2).

Set D = H(3) = {mt}; by Theorem 26, D ∈ RED(C), and we obtain Table 3.

Table 3. An attribute reduct {mt} of Table 2

No.  Muscle pain  Temperature  Flu
1    yes          high         yes
2    yes          normal       no
3    no           high         no
4    yes          very high    yes

By analogy with attributes [4], we can define elementary sets associated with the decision as subsets of the set of all objects with the same value of the decision. Such subsets will be called concepts. We observe that, in terms of rough set theory, the decision "Flu" depends on the attributes "Muscle pain" and "Temperature", since all elementary sets of the indiscernibility relation associated with {mt} are subsets of some concepts. The decision rules induced from Table 3 are equal to those induced from Table 1 with respect to the reduct {mt}; they are

normal(Temperature) → ¬(Flu),
¬(Muscle pain) ∧ high(Temperature) → ¬(Flu),
(Muscle pain) ∧ high(Temperature) → (Flu),
very high(Temperature) → (Flu).

4. CONCLUSION

In this paper, we proposed some novel algorithms to find an object reduct and an attribute reduct of a consistent decision table and proved the correctness of the algorithms. Based on this result, we can apply machine learning methods such as learning decision rules or learning decision trees in an effective way. In addition, the combination of attribute reducts and object reducts can help mine large consistent decision tables more effectively than traditional methods.

REFERENCES

[1] J. Demetrovics and V. D. Thi, "Algorithms for generating an Armstrong relation and inferring functional dependencies in the relational datamodel," Computers & Mathematics with Applications, vol. 26, no. 4, pp. 43–55, 1993.
[2] J. Demetrovics and V. D. Thi, "Keys, antikeys and prime attributes," Annales Univ. Sci. Budapest, Sect. Comp., vol. 8, pp. 35–52, 1987.
[3] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[4] Z. Pawlak, J. Grzymala-Busse, R. Slowinski, and W. Ziarko, "Rough sets," Communications of the ACM, vol. 38, no. 11, pp. 88–95, 1995.
[5] A. Skowron and C. Rauszer, "The discernibility matrices and functions in information systems," in Intelligent Decision Support, Springer, 1992, pp. 331–362.
[6] V. D. Thi, "The minimal keys and antikeys," Acta Cybernetica, vol. 7, no. 4, pp. 361–371, 1986.
[7] V. D. Thi and N. L. Giang, "A method to construct decision table from relation scheme," Cybernetics and Information Technologies, vol. 11, no. 3, pp. 32–41, 2011.

Received on March 6, 2017.
Revised on March 10, 2017.
