Journal of Computer Science and Cybernetics, V.33, N.2 (2017), 131–142
DOI 10.15625/1813-9663/33/2/9281
SOME ALGORITHMS RELATED TO CONSISTENT DECISION
TABLE
HOANG MINH QUANG1, VU DUC THI2, NGUYEN NGOC SAN3
1Institute of Information Technology, Vietnam Academy of Science and Technology
2The Information Technology Institute (ITI), Vietnam National University, Hanoi
3Posts and Telecommunications Institute of Technology
1hoangquang@ioit.ac.vn
Abstract. Rough set theory is a useful mathematical tool developed to deal with vagueness and
uncertainty. As an important concept of rough set theory, an attribute reduct is a subset of attributes
that are jointly sufficient and individually necessary for preserving a particular property of the given
information table. Rough set theory is also the most popular for generating decision rules from
decision table. In this paper, we propose an algorithm finding object reducts of consistent decsion
table. On the other hand, we also show an algorithm to find some attribute reducts and the correctness
of our algorithms is proof-theoretical. These algorithms of ours have polynomial time complexity.
Our finding object reduct helps other algorithms of finding attribute reducts become more effective,
especially as working with huge consistent decision table.
Keywords. Object reduct, attribute reduct, rough set theory, decision table, decision rules.
1. INTRODUCTION
Rough set theory, introduced by Zdzislaw Pawlak in the early 1980s [3], is a mathemat-
ical tool to deal with vagueness and uncertainty. This approach seems to be of fundamental
importance to artificial intelligence and cognitive sciences, especially in the areas of machine
learning, knowledge acquisition, decision analysis, knowledge discovery from databases, ex-
pert systems, decision support systems, inductive reasoning, and pattern recognition. The
information [4] about the real world is given in the form of an information table (sometimes
called a decision table). Thus, the information table represents input data, gathered from
any domain, such as medicine, finance, or the military. Rows of a decision table are called
objects (examples, entities). Properties of objects, the columns of a decision table, are perceived
through assigning values to some variables. We distinguish between two kinds of variables:
attributes (sometimes called condition attributes) and decisions (sometimes called decision
attributes). Usually a single decision is all that is required. For example, if the information
table describes a hospital, the objects may be patients; the attributes, symptoms and tests;
and the decisions, diseases. Each patient is characterized by the results of tests and symp-
toms and is classified by the physicians (experts) as being on some level of disease severity.
If the information table describes an industrial process, the objects may represent samples of
a process taken at some specific moments in time; attributes, the parameters of the process;
and decisions, actions taken by the operators (experts).
© 2017 Vietnam Academy of Science & Technology
The main concept of rough set theory [4] is the indiscernibility relation, normally as-
sociated with a set of attributes. Obviously, the indiscernibility relation is an equivalence
relation. Redundant (or dispensable) attributes are defined via the concept of the indis-
cernibility relation: if a set of attributes and its superset define the same indiscernibility
relation, then any attribute that belongs to the superset but not to the set is redundant.
A set of attributes with no redundant attribute is called minimal (or independent).
A set P of attributes is a reduct (or covering) of another set Q of attributes if P is
minimal and the indiscernibility relations defined by P and Q are the same.
In [5] it is shown that the problems of generating minimal (relative) reducts and of generating
minimal dependencies are NP-hard. In this paper, we propose a novel method that eliminates
redundant objects of a consistent decision table while the problem of finding attribute reducts is
preserved. Our methods run in polynomial time. An object reduct of a consistent decision table
is a table derived from the original one that satisfies two conditions: first, it preserves all
attribute reducts of the original table; second, it has a minimal number of objects. For example,
suppose a consistent decision table with one million objects has ten attribute reducts in total;
after applying the object reduct algorithm it still has the ten attribute reducts but only ten
thousand objects, and that number of objects is minimal: if we removed one more object, we
could not preserve the ten attribute reducts. Clearly, our algorithm is useful in reducing the
number of objects and the size of data storage. Because the computational time complexity of
most attribute reduct algorithms depends on the number of objects, our algorithm obviously
helps these algorithms become more effective and efficient, especially for tables that contain a
huge number of objects.
2. BASIC CONCEPTS
Some basic concepts of relational database theory and of rough set theory can be found
in [1, 2, 6] and [3, 4, 5], respectively. We treat a decision table in rough set theory as a
relation in relational database theory, because both consist of rows and columns and share
similar notions of objects and attributes. Based on results in relational database theory [6],
we can apply these results to decision tables [7]. The definitions and notation used in this
paper are as follows:
Pawlak defines information systems and decision tables in [3]. The definitions of
relation and functional dependency (FD) are given in [2].
Definition 1. Let a decision table DS = (U, C ∪ {d}, V, f), with U = {u1, ..., um}, be a relation
over C ∪ {d}. The decision table DS is consistent if and only if the functional dependency
C → {d} holds; that is, for any x, y ∈ U, if C(x) = C(y) then d(x) = d(y). Otherwise, DS is
inconsistent.
Example 2. Given a consistent decision table DS = (U,C ∪ {d}, V, f)
where U = {u1, ..., u6} (or {1, 2, 3, 4, 5, 6}), {d} is the decision attribute “Flu” (denoted as {f}),
C = {Headache,Muscle pain, Temperature} (denoted as {h,m, t} or {hmt}),
R = C ∪ {d} = {hmtf}.
VHeadache = {yes, no},
VMuscle pain = {yes, no},
VTemperature = {normal, high, very high},
Vd = {no, yes}, V = VHeadache ∪ VMuscle pain ∪ VTemperature ∪ Vd,
and the function f : U × (C ∪ {d}) → ⋃_{a∈C∪{d}} Va is given by Table 1.
Table 1. Consistent decision table
No. Headache Muscle pain Temperature Flu
1 yes yes normal no
2 yes yes high yes
3 yes yes very high yes
4 no yes normal no
5 no no high no
6 no yes very high yes
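
To make Definition 1 concrete, the following Python sketch (ours, not part of the original paper; the names U, C and is_consistent are our own illustrative choices) checks the functional dependency C → {d} on the data of Table 1, using the paper's abbreviations h, m, t, f:

# Rows of Table 1, with the paper's abbreviations h, m, t, f.
U = [
    {"h": "yes", "m": "yes", "t": "normal",    "f": "no"},
    {"h": "yes", "m": "yes", "t": "high",      "f": "yes"},
    {"h": "yes", "m": "yes", "t": "very high", "f": "yes"},
    {"h": "no",  "m": "yes", "t": "normal",    "f": "no"},
    {"h": "no",  "m": "no",  "t": "high",      "f": "no"},
    {"h": "no",  "m": "yes", "t": "very high", "f": "yes"},
]
C = ["h", "m", "t"]

def is_consistent(rows, cond, dec):
    # C -> {d} holds iff equal condition values force equal decision values.
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in cond)
        if key in seen and seen[key] != row[dec]:
            return False
        seen[key] = row[dec]
    return True

print(is_consistent(U, C, "f"))  # True: Table 1 is consistent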
Definition 3. Every attribute subset P ⊆ C ∪D determines an indiscernibility relation
IND(P ) = {(u, v) ∈ U × U |∀a ∈ P, f(u, a) = f(v, a)},
IND(P ) determines a partition of U which is denoted by U/P .
Any element [u]P = {v ∈ U | (u, v) ∈ IND(P)} of U/P is called an equivalence class.
Example 4. (continuing Example 2)
U/h = {{1, 2, 3}, {4, 5, 6}}
U/m = {{1, 2, 3, 4, 6}, {5}}
U/t = {{1, 4}, {2, 5}, {3, 6}}
U/f = {{1, 4, 5}, {2, 3, 6}}
U/hm = {{1, 2, 3}, {4, 6}, {5}}
U/ht = {{1}, {2}, {3}, {4}, {5}, {6}}
U/mt = {{1, 4}, {2}, {3, 6}, {5}}
U/hmt = {{1}, {2}, {3}, {4}, {5}, {6}}.
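
These partitions can be computed mechanically; here is a minimal sketch of Definition 3, continuing the snippet after Table 1 (partition is our own helper name):

from collections import defaultdict

def partition(rows, P):
    # Group 1-based object indices by their value tuple on the attribute set P.
    classes = defaultdict(list)
    for i, row in enumerate(rows, start=1):
        classes[tuple(row[a] for a in P)].append(i)
    return sorted(classes.values())

print(partition(U, ["h"]))       # [[1, 2, 3], [4, 5, 6]], i.e. U/h
print(partition(U, ["h", "t"]))  # six singleton classes, i.e. U/ht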
Definition 5. Pawlak [3] defines the lower approximation, the upper approximation and the
positive region based on equivalence classes as follows, for B ⊆ C and X ⊆ U:
• the B-upper approximation of X is the set B̄X = {u ∈ U : [u]B ∩ X ≠ ∅},
• the B-lower approximation of X is the set B̲X = {u ∈ U : [u]B ⊆ X},
• the B-boundary is the set BNB(X) = B̄X \ B̲X,
• the B-positive region of D is the set POSB(D) = ⋃_{X∈U/D} B̲X.
Example 6. (continuing Example 4) From Definition 5 we obtain
POSh({f}) = ∅
POSm({f}) = {{5}}
POSt({f}) = {{1, 4}, {3, 6}}
POShm({f}) = {{5}}
POSht({f}) = {{1}, {2}, {3}, {4}, {5}, {6}}
POSmt({f}) = {{1}, {2}, {3}, {4}, {5}, {6}}.
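
These positive regions follow directly from Definition 5; here is a sketch building on the partition helper above (positive_region is our name; it returns the union of the lower approximations as a flat set of objects rather than a family of classes):

def positive_region(rows, B, D):
    # POS_B(D): union of the B-lower approximations of the classes of U/D.
    pos = set()
    for X in partition(rows, D):          # the concepts, i.e. classes of U/D
        for block in partition(rows, B):  # the B-indiscernibility classes
            if set(block) <= set(X):      # [u]_B is contained in X
                pos |= set(block)
    return sorted(pos)

print(positive_region(U, ["t"], ["f"]))       # [1, 3, 4, 6]: the union of the classes in POSt({f})
print(positive_region(U, ["h", "t"], ["f"]))  # [1, 2, 3, 4, 5, 6]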
Definition 7. A pair s = ⟨R, F⟩, where R is a set of attributes and F is a set of FDs on
R, is called a relation schema. For any A ⊆ R, the set A⁺ = {a : A → {a} ∈ F⁺} is called
the closure of A on s. It is clear that A → B ∈ F⁺ if and only if B ⊆ A⁺. Similarly,
A^+_r = {a : A → {a} holds in r} is called the closure of A on a relation r.
Definition 8. Let r be a relation, s = ⟨R, F⟩ a relation schema, and A ⊆ R. Then A is
a key of r (a key of s) if A → R (A → R ∈ F⁺). A is a minimal key of r (of s) if A is a key
of r (s) and no proper subset of A is a key of r (s). The set of all minimal keys of r (s)
is denoted by Kr (Ks). A family K ⊆ P(R) is a Sperner-system on R if for any A, B ∈ K,
A ⊄ B holds. It is clear that Kr and Ks are Sperner-systems.
Definition 9. Let K be a Sperner-system over R, regarded as the set of all minimal keys of s.
We define the set of antikeys of K, denoted by K⁻¹, as follows:
K⁻¹ = {A ⊂ R : (∀B ∈ K)(B ⊄ A) and (A ⊂ C) ⇒ (∃B ∈ K)(B ⊆ C)}.
It is easy to see that K⁻¹ consists of the subsets of R which do not contain any element of
K and which are maximal with respect to this property; they are the maximal non-keys. Clearly,
K⁻¹ is also a Sperner-system.
Definition 10. Let r be a relation over R. Denote Er = {Eij : 1 ≤ i < j ≤ |r|}, where
Eij = {a ∈ R : hi(a) = hj(a)}. Then Er is called the equality set of r. For A ⊆ R,
A^+_r = ∩{Eij ∈ Er : A ⊆ Eij} if there exists Eij ∈ Er with A ⊆ Eij; otherwise A^+_r = R.
Example 11. (continuing Example 2) Let U be a relation over C ∪ {d}, 1 ≤ i < j ≤ |U|. For
each pair of rows (i, j), we construct the sets Eij. We have:
E1,2 = {hm}, E1,3 = {hm}. By doing the same thing with pairs (1, 4), ..., (1, 6), (2, 3),
(2, 4), ..., (5, 6) we obtain the set Er containing sets Ai as follows
A1 = {hm} = E1,2 = E1,3 = E4,6
A2 = {mtf} = E1,4 = E3,6
A3 = {f} = E1,5
A4 = {m} = E1,6 = E2,4 = E3,4
A5 = {hmf} = E2,3
A6 = {t} = E2,5
A7 = {mf} = E2,6
A8 = {hf} = E4,5
A9 = {h} = E5,6
Er = {A1, ..., A9} = E^U_r.
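
The same computation in a short sketch of Definition 10, continuing the snippets above (equality_sets is our helper; like Example 11, it keeps only the nonempty sets Eij — here E3,5 is empty):

from itertools import combinations

def equality_sets(rows, attrs):
    # The distinct nonempty sets Eij of attributes on which two rows agree.
    E = set()
    for u, v in combinations(rows, 2):
        Eij = frozenset(a for a in attrs if u[a] == v[a])
        if Eij:
            E.add(Eij)
    return E

Er = equality_sets(U, ["h", "m", "t", "f"])
print(sorted("".join(sorted(A)) for A in Er))
# ['f', 'fh', 'fhm', 'fm', 'fmt', 'h', 'hm', 'm', 't'] -- the nine sets A1..A9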
Definition 12. Let r = {h1, ..., hm} be a relation over R, and let Er be the equality set of r. Let
Mr = {Eij ∈ Er : ∀Est ∈ Er, Eij ⊆ Est ⇒ Eij = Est},
where 1 ≤ i < j ≤ m, 1 ≤ s < t ≤ m. Mr is called the maximal equality system of r. The set
Md = {A ∈ Er : d ∉ A, ∄B ∈ Er : d ∉ B, A ⊂ B}
is called the maximal equality system of r with respect to the decision attribute d of a consistent
decision table DS.
Example 13. (continuing Examples 2 and 11) We construct the set M^U_d, the maximal
equality system of Er whose elements do not contain the decision attribute d of the consistent
decision table DS. That is,
Md = M^U_d = {hm, t} = {B1, B2}.
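
A sketch of Md from Definition 12, again with our own helper name; it keeps the maximal elements of Er that avoid the decision attribute:

def maximal_equality_system(Er, d):
    # Keep A in Er with d not in A and no strict superset B in Er with d not in B.
    cand = [A for A in Er if d not in A]
    return {A for A in cand if not any(A < B for B in cand)}

Md = maximal_equality_system(Er, "f")
print(sorted("".join(sorted(A)) for A in Md))  # ['hm', 't'], as in Example 13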
Definition 14. Let s = ⟨R, F⟩ be a relation schema over R and a ∈ R. The set
K^s_a = {A ⊆ R : A → {a}, ∄B : (B → {a})(B ⊂ A)}
is called a family of minimal sets of the attribute a over s. Similarly, the set
K^r_a = {A ⊆ R : A → {a}, ∄B ⊆ R : (B → {a})(B ⊂ A)}
is called a family of minimal sets of the attribute a over r.
Definition 15. If K is a Sperner-system over R being the family of minimal sets of the attribute
a over r (or s), in other words K = K^r_a (or K = K^s_a), then K⁻¹ = (K^r_a)⁻¹ (or K⁻¹ =
(K^s_a)⁻¹) is the family of maximal subsets of R which do not functionally determine the
attribute a, defined as
(K^r_a)⁻¹ = {A ⊆ R : a ∉ A^+_r, and A ⊂ B ⇒ a ∈ B^+_r},
(K^s_a)⁻¹ = {A ⊆ R : a ∉ A⁺, and A ⊂ B ⇒ a ∈ B⁺}.
It is clear that R ∉ K^s_a, R ∉ K^r_a, {a} ∈ K^s_a, {a} ∈ K^r_a, and K^s_a, K^r_a are
Sperner-systems over R.
Definition 16. Let DS = (U, C ∪ {d}, V, f) be a decision table. If B ⊆ C satisfies
1) POSB(D) = POSC(D),
2) ∀b ∈ B, POSB−{b}(D) ≠ POSC(D),
then B is called a reduct of C.
If DS is a consistent decision table, then B is an attribute reduct of C if B satisfies B → {d}
and ∀B′ ⊂ B, B′ ↛ {d}. Let RED(C) be the set of all reducts of C. From Definition 16
and the formula for K^r_a in Definition 14 we have RED(C) = K^r_d − {d}, where K^r_d is the
family of all minimal sets of the attribute {d} over r = ⟨U, C ∪ {d}⟩.
Example 17. (continuing Example 6) Based on Definition 16, we test POSB({f}) for
B ∈ {{h}, {m}, {t}, {hm}, {ht}, {mt}}; then B ∈ {{ht}, {mt}} satisfies POSB({f}) =
POSC({f}). Thus, RED_U(C) = {ht, mt}.
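
For small tables, Definition 16 can be checked by brute force, exactly as Example 17 does by hand. The sketch below (ours; exponential in |C|, so a sanity check only) enumerates the subsets B of C with POSB({f}) = POSC({f}) and keeps the minimal ones, reusing the positive_region helper above:

from itertools import combinations

def all_reducts(rows, C, d):
    # Subsets of C preserving the positive region, filtered to minimal ones.
    pos_C = positive_region(rows, C, [d])
    keeps = [set(B) for k in range(1, len(C) + 1)
             for B in combinations(C, k)
             if positive_region(rows, list(B), [d]) == pos_C]
    return [B for B in keeps if not any(B2 < B for B2 in keeps)]

print(all_reducts(U, ["h", "m", "t"], "f"))
# [{'h', 't'}, {'m', 't'}] (print order of set elements may vary)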
3. ALGORITHMS RELATED TO CONSISTENT DECISION TABLE
In this section, we construct two algorithms, one finding an object reduct and one finding an
attribute reduct, both running in polynomial time. The first algorithm preserves the process
of finding attribute reducts according to positive-region reducts [3]. This is a significant result
for the data mining process, especially with respect to large consistent decision tables.
The objective of most algorithms related to consistent decision tables is to find all
attribute reducts. Since the computational time complexity of these algorithms depends not
only on the number of attributes but also on the number of objects, many redundant objects
make the process slower. Therefore, in this paper we introduce a method to remove these
redundant objects while preserving all attribute reducts. This method is based on database
theory and rough set theory. In rough set theory, we have the definition of RED(C); in
database theory, we have the definition of K^r_d. According to the above definition, RED(C) =
K^r_d − {d}; the formula means that the set of all attribute reducts of a consistent decision table
equals the set of all minimal keys of the relation U over C ∪ {d} of DS = (U, C ∪ {d}, V, f),
minus the decision attribute {d}. On the other hand, we know that if there exists a set of all
minimal keys, then there also exists a set of antikeys, and they determine each other. By
Lemma 18 below, Md = (K^r_d)⁻¹. If we tentatively remove an object of the consistent decision
table so that Md does not change, then (K^r_d)⁻¹ also does not change, thus the set of all
minimal keys does not change and RED(C) does not change; we then conclude that this object
is redundant and remove it permanently, and vice versa. At the end, we obtain an object reduct,
which is a consistent decision table from which all redundant objects have been removed.
Lemma 18. Let DS = (U, C ∪ {d}, V, f) be a consistent decision table, where C = {c1, c2, ..., cn}
and U = {u1, u2, ..., um}. Consider r = {u1, u2, ..., um} on the attribute set R = C ∪ {d}.
We set Er = {Eij : 1 ≤ i < j ≤ m}, where Eij = {a ∈ R : a(ui) = a(uj)}.
We set Md = {A ∈ Er : d ∉ A, ∄B ∈ Er : d ∉ B, A ⊂ B}.
Then we have Md = (K^r_d)⁻¹, where K^r_d is the family of minimal sets of the attribute {d}
over the relation r.
Lemma 18 is proved in [7].
Definition 19. An object reduct of a consistent decision table DS = (U, C ∪ {d}, V, f) is a
consistent decision table DS′ = (U′, C ∪ {d}, V, f), where RED(C) = RED_U(C) and:
1) U′ ⊆ U,
2) RED_U(C) = RED_{U′}(C),
3) RED_U(C) ≠ RED_{U′−{u}}(C), ∀u ∈ U′.
Algorithm 1: Finding an object reduct of a consistent decision table
Input : DS = (U, C ∪ {d}, V, f)
Output: DS′ = (U′, C ∪ {d}, V, f)
Step 1: Compute Er = {A1, ..., At};
Step 2: Compute M^U_d = {A ∈ Er : d ∉ A, ∄B ∈ Er : d ∉ B, A ⊂ B};
Step 3: Set T(0) = U = {u1, ..., um};
Step 4: Set T(i + 1) = T(i) − {u_{i+1}} if M^{T(i)−{u_{i+1}}}_d = M^U_d, and T(i + 1) = T(i) otherwise.
Then we set U′ = T(m).
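
A direct sketch of Algorithm 1, composed from the helpers introduced in Section 2 (equality_sets and maximal_equality_system; object_reduct is our own name). Each object is tentatively dropped, and the removal is kept only when M^U_d is unchanged:

def object_reduct(rows, attrs, d):
    full_Md = maximal_equality_system(equality_sets(rows, attrs), d)
    T = list(rows)
    for u in rows:                            # scan u1, ..., um in order
        trial = [v for v in T if v is not u]  # T(i) - {u_{i+1}}
        if maximal_equality_system(equality_sets(trial, attrs), d) == full_Md:
            T = trial                         # removal preserves M_d, keep it
    return T

U_prime = object_reduct(U, ["h", "m", "t", "f"], "f")
print([U.index(row) + 1 for row in U_prime])  # [2, 4, 5, 6], matching Example 21 below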
Theorem 20. T(m) satisfies the three conditions 1), 2) and 3) in Definition 19.
Proof. We prove the theorem by induction. At the basis step T(0) = U; clearly U′ = U and
RED_{U′}(C) = RED_U(C), so the two conditions 1) and 2) are satisfied. At the inductive step,
assume that T(i) = U(i) satisfies conditions 1) and 2) in Definition 19. We have to
prove that T(i + 1) = U(i + 1) satisfies the two conditions.
• In the first case, if T(i + 1) = T(i), then it is obvious that
U(i + 1) = U(i), RED_{U(i+1)}(C) = RED_{U(i)}(C) = RED(C)
by the induction hypothesis. Thus, T(i + 1) satisfies conditions 1) and 2) in Definition
19.
• In the second case, if T(i + 1) = T(i) − {u_{i+1}}, then M^U_d = M^{U(i+1)}_d.
By Lemma 18, M^U_d = (K^U_d)⁻¹ (with U = {u1, ..., um}) and M^{U(i+1)}_d = (K^{U(i+1)}_d)⁻¹,
hence (K^U_d)⁻¹ = (K^{U(i+1)}_d)⁻¹.
By Definitions 9 and 15 (K and K⁻¹ are uniquely determined by one another), it can be
seen that K^U_d = K^{U(i+1)}_d. From Definition 16 and the remark following it, we have
RED_U(C) = K^U_d − {d}
and
RED_{U(i+1)}(C) = K^{U(i+1)}_d − {d} ⇒ (ii1) RED_U(C) = RED_{U(i+1)}(C).
From the induction hypothesis, we have (ii2) RED_U(C) = RED_{U(i)}(C). From (ii1) and
(ii2) we obtain RED_U(C) = RED_{U(i)}(C) = RED_{U(i+1)}(C). Because RED_U(C) =
RED(C) is a Sperner-system (by definition K^U_d is a Sperner-system, hence K^U_d − {d} is
a Sperner-system), RED_{U(i)}(C) and RED_{U(i+1)}(C) are Sperner-systems. Finally, the
two conditions in Definition 19 are satisfied at step i + 1 as follows
1) U(i + 1) ⊆ U(i),
2) RED_{U(i+1)}(C) = RED_{U(i)}(C) = ... = RED_U(C) = RED(C).
When i + 1 = m, Algorithm 1 stops. Now we need to show that U(m) satisfies condition
3) in Definition 19, which means that RED_{U(m)−{u}}(C) ≠ RED_U(C) for all u ∈ U(m).
Assume that there exists u = u_{i+1}, u ∈ U(m), such that RED_{U(m)−{u_{i+1}}}(C) = RED_U(C)
(ii3). By Definition 16, RED_{U(m)−{u_{i+1}}}(C) = K^{U(m)−{u_{i+1}}}_d − {d} and RED_U(C) = K^U_d − {d},
thus
(ii3) ⇔ K^{U(m)−{u_{i+1}}}_d − {d} = K^U_d − {d} ⇔ K^{U(m)−{u_{i+1}}}_d = K^U_d (ii4).
By Definitions 9 and 15 and Lemma 18 (K and K⁻¹ are uniquely determined by one another),
it follows that
(ii4) ⇔ (K^{U(m)−{u_{i+1}}}_d)⁻¹ = (K^U_d)⁻¹ ⇔ M^{U(m)−{u_{i+1}}}_d = M^U_d (ii5).
By the above induction, if M^{U(m)−{u_{i+1}}}_d = M^U_d, then u_{i+1} would have been removed,
so u_{i+1} ∉ U(m), which contradicts the hypothesis u = u_{i+1} ∈ U(m). Hence condition 3)
in Definition 19 is satisfied. The theorem is proved.
It is clear that the number of steps for computing Er by Definition 10 is less than |U|².
The number of steps for computing Md is less than |Er|², and |Er| ≤ |U|(|U| − 1)/2. Thus, the
worst-case time complexity of Algorithm 1 is not greater than O(|U|⁵). If we change the order
of the universe set U, we can find another object reduct.
Example 21. (continuing Example 13) In Steps 3 and 4 of Algorithm 1 we have
T(0) = U = {1, 2, 3, 4, 5, 6}. Using the formula for Md in Definition 12 we compute
E^{T(0)−{1}}_r = {A1, A2, A4, A5, A6, A7, A8, A9}, M^{T(0)−{1}}_d = {hm, t} = M^U_d ⇒ T(1) = T(0) − {1};
E^{T(1)−{2}}_r = {A1, A2, A4, A8, A9}, M^{T(1)−{2}}_d = {hm} ≠ M^U_d ⇒ T(2) = T(1);
E^{T(2)−{3}}_r = {A1, A4, A6, A7, A8, A9}, M^{T(2)−{3}}_d = {hm, t} = M^U_d ⇒ T(3) = T(2) − {3};
E^{T(3)−{4}}_r = {A6, A7, A9}, M^{T(3)−{4}}_d = {t, h} ≠ M^U_d ⇒ T(4) = T(3);
E^{T(4)−{5}}_r = {A1, A4, A7}, M^{T(4)−{5}}_d = {hm} ≠ M^U_d ⇒ T(5) = T(4);
E^{T(5)−{6}}_r = {A4, A6, A8}, M^{T(5)−{6}}_d = {m, t} ≠ M^U_d ⇒ T(6) = T(5).
Set U′ = T(6) = {2, 4, 5, 6}; then DS′ = (U′, C ∪ {d}, V, f) is the object reduct of the
consistent decision table DS = (U, C ∪ {d}, V, f), shown as Table 2.
Table 2. The object reduct of Table 1
No. Headache Muscle pain Temperature Flu
2 yes yes high yes
4 no yes normal no
5 no no high no
6 no yes very high yes
Example 22. (continuing Example 21) By using Definition 3 we have
U ′/h = {{2}, {4, 5, 6}}
U ′/m = {{2, 4, 6}, {5}}
U ′/t = {{2, 5}, {4}, {6}}
U ′/f = {{2, 6}, {4, 5}}
U ′/hm = {{2}, {4, 6}, {5}}
U ′/ht = {{2}, {4}, {5}, {6}}
U ′/tm = {{2}, {4}, {5}, {6}}.
Example 23. (continuing Example 22) By using Definition 5 we have
POS′h({f}) = {{2}}
POS′m({f}) = {{5}}
POS′t({f}) = {{4}, {6}}
POS′hm({f}) = {{2}, {5}}
POS′ht({f}) = {{2}, {4}, {5}, {6}}
POS′tm({f}) = {{2}, {4}, {5}, {6}}.
Example 24. (continuing Examples 17 and 23) Based on Definition 16, we have RED_{U′}(C) =
{ht, tm}. By Definition 19, RED_{U′}(C) is a Sperner-system and RED_U(C) = RED_{U′}(C) =
{ht, tm}. Thus, Algorithm 1 preserves RED(C) of the consistent decision table DS
(Table 1). Clearly, the consistent decision table DS′ is more efficient than DS for finding
RED(C) of a consistent decision table according to rough set theory [3].
Algorithm 2: Finding a minimal key from a set of antikeys
Input : Sperner-systems K, H and C = {c1, ..., cn} ⊆ U such that H⁻¹ = K and
∃B ∈ K : B ⊆ C
Output: D ∈ H
Step 1: Set A(0) = C;
Step i + 1: Set A(i + 1) = A(i) − {c_{i+1}} if ∀B ∈ K : A(i) − {c_{i+1}} ⊈ B, and A(i + 1) = A(i) otherwise.
Then we set D = A(n).
Lemma 25. [6] If K is a set of antikeys, then A(n) ∈ H.
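
A sketch of Algorithm 2 in the same style (minimal_key is our own name): starting from the key C, each attribute is discarded while the remainder is still contained in no antikey, which by the definition of antikeys means it is still a key:

def minimal_key(C, K):
    # C: ordered attributes of a key; K: the antikeys, as a family of sets.
    A = set(C)
    for c in C:                               # try c1, ..., cn in order
        if not any(A - {c} <= B for B in K):  # A - {c} is still a key
            A -= {c}
    return A

# With the antikeys K = Md = {hm, t} of Example 13 (Lemma 18):
print(minimal_key(["h", "m", "t"], [{"h", "m"}, {"t"}]))  # {'m', 't'}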
Algorithm 3: Finding an attribute reduct from a consistent decision table
Input : DS = (U, C ∪ {d}, V, f)
Output: D ∈ RED(C)
Step 1: Compute Er = {A1, ..., At};
Step 2: Compute Md = {A ∈ Er : d ∉ A, ∄B ∈ Er : d ∉ B, A ⊂ B};
Step 3: Set H(0) = C = {c1, ..., cn};
Step 4: Set H(i + 1) = H(i) − {c_{i+1}} if ∄B ∈ Md : H(i) − {c_{i+1}} ⊆ B, and H(i + 1) = H(i) otherwise.
Then we set D = H(n).
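
Chaining the earlier sketches gives Algorithm 3 end to end (attribute_reduct is our own name): compute Er and Md, then run the Algorithm 2 loop over C with the decision attribute excluded:

def attribute_reduct(rows, C, d):
    Er = equality_sets(rows, C + [d])    # Step 1
    Md = maximal_equality_system(Er, d)  # Step 2
    return minimal_key(C, Md)            # Steps 3-4 via Algorithm 2

print(attribute_reduct(U, ["h", "m", "t"], "f"))  # {'m', 't'}
print(attribute_reduct(U, ["t", "m", "h"], "f"))  # {'t', 'h'}: another reduct, cf. the remark after Theorem 26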
Theorem 26. H(n) ∈ RED(C).
Proof. Algorithm 3 is based on Algorithm 2. By Lemma 18, (K^r_d)⁻¹ = Md. By
Lemma 25, H(n) ∈ K^r_d (1). By the remark following Definition 16, RED(C) = K^r_d − {d} (2).
At Step 3 of Algorithm 3 we set C = {c1, ..., cn}, so d ∉ C and hence d ∉ H(n) (3).
From (1) and (3) we have H(n) ∈ K^r_d − {d} (4). From (2) and (4) we obtain H(n) ∈ RED(C).
The theorem is proved.
Similarly to Algorithm 1, the time complexity of Algorithm 3 is not greater than
O(|C| × |U|⁴). If we change the order of the set C in Step 3, we can get another attribute
reduct of the consistent decision table DS.
Example 27. (continuing Example 21) Applying Algorithm 3 to the object reduct
U′ = {2, 4, 5, 6} of Table 2, with the rows renumbered 1–4, in Step 1 we have
E1,2 = {m}, E1,3 = {t}, E1,4 = {mf}, E2,3 = {hf}, E2,4 = {hm}, E3,4 = {h}.
In Step 2 of Algorithm 3
Er = {m, t, mf, hf, hm, h} ⇒ Md = {hm, t} = {B1, B2}.
In Steps 3 and 4 of Algorithm 3 we have
C = H(0) = {hmt},
H(0) − {h} = {mt} ⊈ B for every B ∈ Md ⇒ H(1) = {mt},
H(1) − {m} = {t} ⊆ B2 ∈ Md ⇒ H(2) = H(1),
H(2) − {t} = {m} ⊆ B1 ∈ Md ⇒ H(3) = H(2).
Set D = H(3) = {mt}; by Theorem 26, D ∈ RED(C), and we obtain Table 3.
Table 3. An attribute reduct {mt} of Table 2
No. Muscle pain Temperature Flu
1 yes high yes
2 yes normal no
3 no high no
4 yes very high yes
By analogy with attributes [4], we can define elementary sets associated with the decision as subsets of the
set of all objects with the same value of the decision. Such subsets will be called concepts. We
observe that, in terms of rough set theory, the decision “Flu” depends on the attributes “Muscle pain”
and “Temperature”, since all elementary sets of the indiscernibility relation associated with {mt}
are subsets of some concepts. The decision rules induced from Table 3 are equal to those
induced from Table 1 with respect to the reduct {mt}; they are
normal(Temperature)→ ¬(Flu),
¬(Muscle pain) ∧ high(Temperature)→ ¬(Flu),
(Muscle pain) ∧ high(Temperature)→ (Flu),
very high(Temperature)→ (Flu).
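
A last sketch (ours) reads the rules off the object reduct: since DS′ is consistent, each equivalence class of U′/{m, t} yields one certain rule; the four unsimplified rules below condense to the four rules listed above (U_prime comes from the Algorithm 1 sketch):

def induced_rules(rows, B, d):
    # One certain rule per distinct value combination on the reduct B.
    rules = set()
    for row in rows:
        cond = " AND ".join(f"{a}={row[a]}" for a in B)
        rules.add(f"IF {cond} THEN {d}={row[d]}")
    return sorted(rules)

for rule in induced_rules(U_prime, ["m", "t"], "f"):
    print(rule)
# e.g. IF m=yes AND t=high THEN f=yes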
4. CONCLUSION
In this paper, we proposed novel algorithms to find an object reduct and an attribute
reduct of a consistent decision table and proved the correctness of the algorithms. Based on
this result, we can apply machine learning methods such as learning decision rules or
learning decision trees in an effective way. In addition, the combination of attribute and
object reducts can help mine large consistent decision tables more effectively than traditional
methods.
REFERENCES
[1] J. Demetrovics and V. D. Thi, “Algorithms for generating an Armstrong relation and inferring
functional dependencies in the relational data model,” Computers & Mathematics with Applica-
tions, vol. 26, no. 4, pp. 43–55, 1993.
[2] ——, “Keys, antikeys and prime attributes,” in Annales Univ. Sci. Budapest, Sect. Comp, vol. 8,
1987, pp. 35–52.
[3] Z. Pawlak, “Rough sets,” International Journal of Computer & Information Sciences, vol. 11,
no. 5, pp. 341–356, 1982.
[4] Z. Pawlak, J. Grzymala-Busse, R. Slowinski, and W. Ziarko, “Rough sets,” Communications of
the ACM, vol. 38, no. 11, pp. 88–95, 1995.
[5] A. Skowron and C. Rauszer, “The discernibility matrices and functions in information systems,”
in Intelligent Decision Support. Springer, 1992, pp. 331–362.
[6] V. D. Thi, “The minimal keys and antikeys,” Acta Cybernetica, vol. 7, no. 4, pp. 361–371, 1986.
[7] V. D. Thi and N. L. Giang, “A method to construct decision table from relation scheme,” Cyber-
netics and Information Technologies, vol. 11, no. 3, pp. 32–41, 2011.
Received on March 06 - 2017
Revised on March 10 - 2017