Hedges algebras and fuzzy partition problem for qualitative attributes

Hedge algebra, a new and quite effective tool, can be used instead of the fuzzy theory in many cases due to the order structure on the set of natural language elements. This paper demonstrates this trend through the method of building the membership function to split fuzzy item sets in the fuzzy data mining problem. This is an important step that has not still been much invested and researched. In order to meet the requirements of optimization problem for both the quantity and the MF parameters mentioned above, the expansion of hedge algebra (not only for 5 terms) will not only solve data minings problem better but also promote Hedge Algebras strength. Moreover, the number of classes can be easily increased and still ensure a strong partitions to divide the identified domain of items based on this method. This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.01-2017.06

pdf16 trang | Chia sẻ: huongthu9 | Lượt xem: 321 | Lượt tải: 0download
Bạn đang xem nội dung tài liệu Hedges algebras and fuzzy partition problem for qualitative attributes, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Journal of Computer Science and Cybernetics, V.32, N.4 (2016), 335–350 DOI 10.15625/1813-9663/32/4/9145 HEDGES ALGEBRAS AND FUZZY PARTITION PROBLEM FOR QUALITATIVE ATTRIBUTES TRAN THAI SON1, NGUYEN TUAN ANH2 1Institute of Information Technology, Vietnam Academy of Science and Technology 2University of Information and Communication Technology, Thai Nguyen University 1ttson1955@gmail.com; 2anhnt@ictu.edu.vn  Abstract. There have been many approaches proposed to derive membership functions for mining fuzzy association rules using genetic algorithms (GAs) and show their advantages. However, these approaches show that the number of linguistic terms needs to be predefined. In this paper, the author proposed a new method to construct the membership functions (MFs) based on database. The theory of hedge algrebra was used to build the membership functions and GA is applied to optimize them. The experimental results demonstrate the benefits of this method. Keywords. Fuzzy Association Rules; Data mining; Hedge algebras; Genetic algorithms; Member- ship functions 1. INTRODUCTION Recently, mining fuzzy association rules, such as “If students have high academic results and are passionate about researching, they will find a good job.”, has been the topic that is considered and developed. To be able to get fuzzy rules, in the first step, we partition each attribute domain of a database into fuzzy sets, characterized by membership functions (MFs). Then, each value (number) in the database will be converted to the corresponding set of the degree of membership. Numerical Database is converted into a fuzzy database ready for fuzzy mining in next steps. Traditionally, the fuzzy database is often assumed to be available, i.e, the MF sets are defined for each DB attribute. Such MF sets are usually determined experimentally and independently from database. In fact, it was shown that the approach may inversely affect the quality of the mining rules. Therefore, researchers pay attention to the construction of optimal sets MFs according to some predefined criteria by different algorithms of DB. However, there are very few studies of contructing MF sets for extracting association rules. The majority of them are related to the problems of automatic classification and regression [8, 9, 10, 11, 24]. In addition, despite its flexibility in approach, the use of L.Zadeh fuzzy set theory still has certain limitations. In this paper we propose the approach using Hedge Algebra [18, 19] to build MFs for extracting association rules. The advantages of this method will be demonstrated in analysis of experimental results on standard data of the Population of US in 1995. c© 2016 Vietnam Academy of Science & Technology 336 TRAN THAI SON, NGUYEN TUAN ANH 2. RELATED WORK 2.1. Association rule Let I = I1, I2, ..., Im be a set of items. Let D, the task-relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with an identifier, called TID [21]. Definition 1. [22] An association rule has the form of X ⇒ Y , where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. Two important measures of association rule are support(s) and confidence(c) defined in [22]. Definition 2. [22] The support of association rule X ⇒ Y is the probability that X ∪ Y exists in a transaction in the database D. support(X ⇒ Y ) = P (X ∪ Y ) = n (X ∪ Y ) N . (1) Definition 3. [22] The confidence of the association rule X ⇒ Y is the probability that X ∪ Y exists given that a transaction contains X, i.e. confidence (X ⇒ Y ) = P ( X Y ) = n (X ∪ Y ) n (Y ) , (2) where n(X) - transaction count (including X), N - total of transaction database. Mining the association rules is used to find all rules having the degree of support and confidence that is greater than that of min support and min confidence determined by the user. In fuzzy association rules, the degree of support of a fuzzy range sk belonging to xi is defined as following [3, 4]: FS ( Ax1s1 , A x2 s2 , .., A xk sk ) = 1 N N∑ j=1 k∏ i=1 µxisi ( dxij ) . (3) And the reliability of a fuzzy range s1, s2, .., sk of items x1, x2, .., xk respectively is FS ( Ax1s1 , A x2 s2 , . . . , A xk k ) = 1 N N∑ j = 1 min ( µx1s1 ( dx1j ) , µx2s2 ( dx2j ) , . . . , µxksk ( dxkj )) , (4) where xi is i th item, sj is fuzzy range belonging to item i th, N is the total of transactions in the database, µxisk(d xi j ) is the membership degree of the value at the i th column, row j into the fuzzy set sk. HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 337 2.2. Hedges algebras (HA) Suppose that there is a set of linguistic values of any linguistic variable that includes:...< Very Negative < Negative < Little Negative < Zero < Little Positive < Positive < Very Positive <... These linguistic values appear in the language rules (LRB - Linguistic Rule Base) of approximately reasoning problems based on knowledge. Therefore, it is necessary to have a strict computing architecture which preserves order relation of the inherent linguistic values. Since then the semantic relationship of the linguistic value in the rules can be calculated. HA [18, 19] is a mathematical structure which has the order of collection of linguistic items. The order relationship is defined by the semantics of the linguistic items from this collection. The quantitative semantics value of linguistic items through Semantically Quan- tifying Mappings SQMs allows performing a full description and showing rule set model and approximate inference process of inference in a logical and coherent way. Definition 4. [19] Hedge algebras of the linguistic variable X can be represented as an algebraic structure which is the set of 5 components AX = (X,G,C,H,≤), where X is a set of items in X; ≤ denotes the natural semantic order relationships of the items in X; G = {c−, c+} , c− ≤ c+ called the generating elements; C = 0,W, 1 is the set of constants, with the range 0 ≤ c− ≤W ≤ c+ ≤ 1, to indicate the elements that has the smallest, largest and neutral elements. The set of hedges H = H− ∪H+, with H− = {hj : 1 ≤ j ≤ q} is the set of negative hedges, H+ = {hj : 1 ≤ j ≤ p} is the positive hedges. Apparently, the set X consists of linguistic items which have the order of the linguistic variable X and are generated by the impact of hedges h ∈ H on the generating elements c ∈ G. The components in AX has some following properties: • ∀h ∈ H,x ∈ X : hx ≤ or hx ≥ x • ∀h ∈ H, hx = x then x is the permanent element. We have H (x) = {x|x ∈ C} • ∀h ∈ H− then hc+ ≤ c+, hc− ≥ c−; ∀h ∈ H+ then hc+ ≥ c+ hc− ≤ c− • h, k ∈ H+, h ≥ k if hc+ ≥ kc+ (or hc− ≤ kc−) • h, k ∈ H−, h ≥ k if hc+ ≤ kc+ (or hc− ≥ kc−) • h, k ∈ H, x ∈ X, h is positive for k if hkx < kx < x (or x < kx < hkx), h is negative for k if x < kx < hkx (or x < hkx < kx) From the above characteristics, we can define a function sgn as follows: Definition 5. [25, 26] sgn : X → {−1, 0, 1}. With k, h ∈ H, c ∈ G, x ∈ X 1) sgn (c+) = +1 and sgn (c−) = −1 2) {h ∈ H+|sgn (h) = +1} and {h ∈ H−|sgn (h) = −1} 3) sgn (hc+) = +sgn (c+) if hc+ ≥ c+ or sgn(hc−) = +sgn(c−); if hc−c− and sgn(hc+) = −sgn(c+); if hc+ ≤ c+ or sgn(hc−) = −sgn(c−); if hc− ≥ c− or sgn (hc) = sgn (h) sgn (c) 338 TRAN THAI SON, NGUYEN TUAN ANH 4) sgn (khx) = +sgn (hx) if k is positive for h (sgn (k, h) = +1) and sgn (khx) = −sgn (hx) if k is negative for h (sgn (k, h) = −1) 5) sgn (khx) = 0 if khx = hx. Fuzzy measurement of a concept x ∈ X is equal to the radius of the set H(x), denoted as fm(x) and can be calculated recursively from the fuzzy measurement of the generating elements fm(c−), fm(c+) and the fuzzy measurement of the hedges µ (h), h ∈ H, called the fuzzy parameters of X. fm(x) is defined recursively as follows: Definition 6. [20] fm : X → [0, 1], x ∈ X where 1) fm(C−) + fm (C+) = 1 and ∑ h∈H fm (hu) = fm (x), for all x ∈ X 2) fm(x) = 0, with ∀x, H (x) = {x}. Especially, fm (0) = fm (W ) = fm (1) = 0 3) ∀x, y ∈ X, ∀h ∈ H, fm (hx) fm (x) = fm (hy) fm (y) does not depend on specific elements and is called the fuzziness measure of h, denoted by µ (h). With x ∈ X, x = hn, hn−1, ..., h1c, hj ∈ H, c ∈ G, we have the characteristics of fm(x) and µ (h) as follows: fm (hx) = µ (h) fm (x) ,∀x ∈ X, (5) p∑ i=−q,i6=0 fm (hic) = fm (c) , with C ∈ { C−, C+ } , (6) p∑ i=−q,i6=0 fm (hix) = fm (x) , (7) fm (x) = fm (hnhn−1 . . . h1c) = µ (hn)µ (hn−1) . . . µ (h1) fm (c) , (8) −q∑ i=−1 µ (hi) = α, and p∑ i=1 µ (hi) = β, with α, β > 0 andα+ β = 1. (9) With advance fm(c−), fm(c+) and µ (h) , h ∈ H specifically, semantically quantifying value is recursively determined by the semantically quantifying mapping function v as follows. Definition 7. [19] v : X → [0, 1] v(W ) = fm(c−), v(c−) = θ − αfm(c−) = βfm(c−), (10a) v ( c+ ) = θ + αfm ( c+ ) = 1− βfm (c+) , (10b) HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 339 v (hjx) = v (x) + sign (hjx)  j∑ i=sign(j) fm (hj)− ω (hjx) fm (hjx)  , (11a) v (hjx) = v (x) + sign (hjx)  j∑ i=sign(j) fm (hj)− ω (hjx) fm (hjx)  , (11b) where ω (hjx) = 1 2 [1 + sgn (hp, hj) (β − α)], i ∈ [−q ∧ q] = [−q, p] \ {0}. The definition and properties of the HA can be found in [1]. Similarly it is also set I(k) = ∪l=1,...,kIl. Figure 1. Fuzziness measure of TRUTH In HA, the set (of elements) of the same capacity (length) will create a partition with domain-specific attributes. Each fuzzy interval corresponds to a fuzzy measure of an element. These fuzzy intervals are arranged in the order that reflexes the “strength” of the se- mantics values (aka, quantitative semantics value). These results give us a simple method to establish a set of domain-specific partition MFs [26]. 3. FUZZY PARTITIONS IN QUANTITATIVE ATTRIBUTES 3.1. Using fuzzy logic method for Partitions There are several methods for partitioning domain values using fuzzy logic as follows: • Random partitioning: In this method, a fixed number of sub-domains (partitions) is chosen (conventionally, 3) so that each sub-domain has the same length. This method is simple and effective when there is not enough information about the database. How- ever, it does not take into account the distribution of data. • Partition clustering: In this method, data is partitioned into clusters based on the proximity of a certain criteria between them. Popular algorithm is k-means clustering. 340 TRAN THAI SON, NGUYEN TUAN ANH Normally, the number of clusters are chosen in advance and fixed (usually, 3). Unlike the previous one, this method takes into account the distribution of data. However, it is also very time-consuming. • Dynamically constrained partitioning: In this method, data is partitioned into fuzzy domain under constraints on the membership functions to ensure a given criteria. According to [11], these criteria might be: 1) The number of partions must be reasonably small. 2) The membership functions are distinguished, for example, two MFs are not spe- cific to the same linguistic label. 3) Each MF is normalized ie if it reaches the value 1 at least at one point of the range. 4) The range of values is covered entirely by the corresponding fuzzy domain and at least one MF received value x > 0 at any point in the value domain. 3.2. Using HA method for Partitions In section 2.2., we presented some basic concepts of HA. These values refer to different fuzzy sets of the partition for each numeric dataset attribute. It is easy to calculate the value of membership degree based on fuzzy membership functions as in [17]. Having defined the domain of the item (the normalization on the interval [0,1]), any value between two quan- tification of semantics value of consecutive fuzzy intervals or coincides with a quantification of semantics value will create domain-specific partition fuzzy intervals. Thus, the distance from xij to two quantitative semantics value can be taken as the membership degree xij in fuzzy sets (if xij is equal to a quantitative semantics value, only one membership degree): The smaller the distance is, the higher the membership degree will be (maximum value 1 when being a full member). In [14], the authors built MFs from quantification of semantics values. Membership functions are triangular with ver- tices having coordinates (v(xi), 1). The coordinates of the three vertices of a triangle are: (v(xi), 1), (v(xi−1), 0), (v(xi+1), 0) with v(xi−1), v(xi), v(xi+1) being consecutive quantifica- tion of semantics values (in Figure 2). It can be seen that in Figure 2, the two methods are used. Assuming E is an arbitrary point in the determining area of property Ii. With the first method, the distance between Ev(x2) and Ev(x3) will be used to determine the membership degree of E to fuzzy sets represented by triangle membership functions v(x1)Bv(x3) and v(x2)Cv(x4) (based on the standardization so that membership degree is always in the range [0,1]. Using the second method, EG and EF are the membership degree of E to the two fuzzy sets. EG is parallel to v(x2)B, so EG v(x2)B = Ev(x3) v(x2)v(x3) . Similarly, EF v(x3)C = v(x2)E v(x2)v(x3) . Furthermore, v(x2)B = v(x3)C = 1, so EF EG = Ev(x2) Ev(x3) . Therefore, it is equivalent to use the membership degree in these two methods. Especially, determining the membership degree based on HA is also reasonable and close to the natural experience. Using the HA to build membership functions has some advantages: HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 341 Figure 2. Build MFs using HA a) Since constructing HA is based on the semantics that human beings feel, it is sensible that the membership functions bring good reflection on the semantics of the fuzzy set it represents. b) Coverage of MFs as well (always identified domain covered). If need optimal suitability of MFs, we just need to optimize performance and usability overlap. Optimization problem according to the parameters of the overlapping HA and usefulness can be solved by a genetic algorithm (GA). c) The number of optimal parameters is limited. When changing the initial parameters of the HA, it is easy to rebuild the MFs. Therefore, this method is simple and reasonable. To prove the effectiveness of this approach, we tested on the standard sample FAM95 data as presented in the following section (FAM95 database conducted by the Bureau of the Census for the Bureau of Labor Statisticsin 1995). 4. GENETIC MINING PROCESS In [7], the authors used methods to evaluate the membership functions. MFs’ prominence was assessed by three factors: The overlap factor represents the overlap ratio of the MFs, The coverage factor represents the coverage ratio of the MFs, and usage factor. In this paper, we use GA algorithm as in [5] to optimize the MFs according to the above criteria. The components required to design this GA include: • CHC genetic model. • MFs codification and initial gene pool. • Chromosome evaluation. • Crossover operator. 4.1. CHC genetic model In this section, the population-based selection approach is considered using the CHC genetic model [18] to perform the adequate global search. In the genetic CHC model, N 342 TRAN THAI SON, NGUYEN TUAN ANH parents and their offsprings compete to select the best N individuals for the next population. The CHC uses the “incest prevention” mechanism and a process of restarting to create the diversity in the population (instead of mutation). The incest prevention mechanism is used when provoking the crossover operator. While considering a real coding scheme, each gene is transformed using a Gray Code with a fixed number of bits per gene. This number is determined by experts. The crossover threshold L is calculated as following: L = (#Genes BITSGENE)/4.0 where #Genes is the number of genes, BITSGENE is the number of bits per gene. In the original CHC model, if there is no new individuals in the popuplation in a generation, L is decreased by one. To make L indepedent of #Genes and BITSGENE, we decrease L by ϕ% (ϕ is set by users, normally 10%). The algorithm restarts when L < 0. 4.2. MFs codification and initial gene pool In this paper, we use structured HA as follows: AT = (X,G, H,≤) , G = {C− = {low} ∪ C+ = {high}}, H = {H− = {Little} ∪H+ = {V ery}}, α = µ (Little) = 1− µ (V ery), β = 1− α, w = fm (low) = 1− fm (high). We performed a chromosome, a real number array size n× 2 (where n is the number of items, 2 corresponds to the parameters α and w in each HA): (α1, . . . , αn, w1, . . . , wn) . (12) For each pair (αi, wi) are parameters of a HA. Initialize population consisting of N chromosomes: based on the experience of the value α and w will receive a random value in the interval [0.2 to 0.8]. 4.3. Chromosomal evaluation To evaluate the chromosome, we use fitness function to define in [7]. The fitness function of a chromosome Cq is defined as follows: fitness (Cq) = ∑ x∈L1 fuzzy support (x) suitability (Cq) (13) with - L1 is frequent 1-itemset using MFs in Cq, - fuzzy support(x): fuzzy support of 1-itemset x is calculated from the transaction database, - suitability (Cq): appropriate coverage of MF in Cq. HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 343 The suitability of the set of chromosomes of MFs in Cq is defined as follows: suitability(Cq) = n∑ k=1 [overlap factor(Cqk) + coverage factor(Cqk)] (14) where n is number of items, overlap factor(Cqk) is the overlap factor of the MFs for an item Ik in the chromosome Cq, and coverage factor(Cqk) is the coverage factor of the MFs for an item Ik in the chromosome Cq. The overlap factor presents the ratio of overlying between the MFs (denoted Ri and Rj , i < j) for an item Ik in the chromosome Cq. This ratio is defined as the length of the overlap region from the right span of Ri and the left span of Rj . If this length is larger than the minimum of the spans, it is said to be “little redundant”. Thus, the overlap factor is defined as Overlap factor (Cqk) = m∑ k=1 m∑ j=i+1 [ max ( overlap (Ri, Rj) min ( spanRRi , spanLRj , ) , 1)− 1] (15) where overlap(Ri, Rj) is the overlap length of Ri and Rj , spanRRi is the right span of Ri, spanLRj is the left span of Rj and m is the number of MFs for Ik. In this case, spanRRi and spanRRj have the same size as the MFs are shifted in the uniform partition (and thus maintain the original MFs’ shapes). The coverage factor is the ratio of coverage of the MFs for an item Ik in the chromo- some and is defined as the coverage range divided by the maximum number of items in the transaction. The larger the coverage ratio is, the better the derived MFs will be. The coverage factor of the MFs for an item Ik in the chromosome Cq is defined as: Coverage factor (Cqk) = 1 rang (R1, . . . , Rm) max (Ik) , (16) where range(R1, R2, ..., Rm) - coverage range of the MFs and max(Ik) - maximum quantity of Ik in the transactions. Figure 3. MFs of Item Ij 344 TRAN THAI SON, NGUYEN TUAN ANH Figure 4. Two bad kinds of membership functions These suitable factors in the fitness functions may help to prevent two unfavorable types of MFs (redundant or separated): the overlap factor - for the redundant MFs, the coverage factor - for the separated MFs. Recently, we also use the concept of strong fuzzy partition to build collective MF [12]. This concept is defined as follows: MFs make a strong fuzzy partition domain if they cover property values and at any point on each identified domain, the total’s membership degree on this point to all of the partitions MFs valued at 1. Strong fuzzy partition MFs also generate relatively good distribution. With this measure, it is possible to use genetic algorithms to get optimal MFs. 4.4. Crossover operator The crossover operators create a cooperation within genetic model generating the con- vergence by pressuring on the offsprings. Particularly, the Parent Centric BLX (PCBLX) operator based on the BLX − α is considered here. The PCBLX operator is described as follows. Let us assume that X = (x1, ..., xn) and Y = (y1, ..., yn), (xi, yi ∈ [ai, bi] ⊂ <, i = 1, ..., n), are two real-coded chromosomes that are going to be crossed. We generate the two following offsprings: (1) O1 = (o11, ..., o1n) where o1i is a randomly (uniformly) chosen number from the interval [l1i , u 1 i ], with l 1 i = max {ai, xi − Iiα}, u1i = max {bi, xi − Iiα}, and Ii = |xi, yi|. (2) O2 = (o21, ..., o2n) where o2i is a randomly (uniformly) chosen number from the interval [l2i , u 2 i ], with l 2 i = max {ai, yi − Iiα}, u2i = max {bi, yi − Iiα}, and Ii = |xi, yi|. 5. PROPOSED MINING ALGORITHM In this section, a proposed algorithm for mining MFs and association rules is described in detail. Input: Transaction data with T quantities, n-item set (each item has m predefined linguistic terms), support threshold min Support, confidence threshold min confidence, population size N. Output: Set of association rules with associated set of MFs. Phase 1: Learning the MFs. HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 345 Step 1: Initial population generation with N chromosomes. Chromosome representation of the form (α1, . . . , αn, w1, . . . , wn), with each pair of (αi, wi) is a HA. Step 2: Population evaluation. For each chromosome: The MFs is a string encryption as described in Section 4.2. Based on the HA has been in step 1, to build the MFs for a property in the database as described in Section 3.2. Step 3: Calculate the fitness function for each chromosome in the population as follows: - For each transaction Di, i = 1 to T , and for each item Ij , j = 1 to n, convert the quantitative value v (i) j (Di = (v (i) 1 ), ..., v (i) n )) into a fuzzy set f (i) j represented as f (i) j = f (i)j1 Rj1 + f (i) j2 Rj2 + . . .+ f (i) jl Rjl  using the corresponding MFs, where Rjk is the k-th linguistic term of item Ij , f (i) jk is v (i) j ’s membership value in Rjk region, and m is the number of linguistic terms for Ij . - For each linguistic term Rjk, assess its count on the transactions: countjk = n∑ i=1 f (i) j (17) - For each Rjk, 1 < j < n and 1 < k < m, check if countjk is larger than or equal to the minimum support threshold min Support. If Rjk satisfies the above condition, put it in the set of large 1-itemsets (L1): L1 = {Rjk|countjk ≥ α, 1 ≤ j ≤ m, 1 ≤ k ≤ |Ij |} . (18) - Set the fitness value of the chromosome as the sum of the fuzzy support (the count/T ) of the linguistic terms in L1 divided by suitability(Cq). That is fitness (Cq) = ∑ x∈L1 fuzzy support (x) suitability (Cq) . (19) Step 4: Threshold value (L) initialization. Step 5: Next population generation: - Shuffle the population. - Select parents two by two. - Evaluate new individuals. - Join the parents with their offspring and select the best N individuals to put into the next population. 346 TRAN THAI SON, NGUYEN TUAN ANH Step 6: In case the best chromosome does not change or there is no new individual in the population, L = L− (Linitial ∗ 0.1). Step 7: If L < 0, restart the population. Step 8: If the maximum number of evaluations is not reached, go to Step 5. Phase 2: Basic method for mining association rules. Step 9: The set of the best MFs is then applied in mining fuzzy association rules from the given quantitative database using the algorithm proposed in [23]. 6. EXPERIMENTAL RESULTS The source of the data is taken from FAM95 database, conducted by the Bureau of the Census for the Bureau of Labor Statistics in 1995. We selected 10 attribute numbers that include: age of the head of the family, number of persons in the family, number of children, hours head worked last week, head’s personal income, family income, taxable income for head, federal tax for head, final sampling weight for weight and March supplement income and tax. Table 1. Results obtained in the genetic process Proposed Approach Hong et als Approach Uniform Fuzzy Partition Sup Fit Fsup Suit #1I Fit Fsup Suit #1I Fit Fsup Suit #1I 0.2 0.98 9.83 10 22 0.53 10.22 19.27 22 0.94 9.43 10 21 0.5 0.79 7.87 10 10 0.38 7.95 20.63 12 0.46 4.57 10 7 0.7 0.66 6.62 10 8 0.2 3.96 19.54 5 0.24 2.36 10 3 0.9 0.09 0.94 10 1 0.06 0.9 15.01 1 0 0 10 0 In these experiments, the proposed approach with one uniform fuzzy partition is com- pared to the Hong et al.s work [7]. For our method, each item is divided into 5 fuzzy domains with the corresponding labels in the HA. They are {0, C−,W,C+, 1}. We performed a genetic algorithm with the following parameters: 50 individuals, 10,000 evaluations, 0 bits per gene for the Gray codification, 0.6 as crossover probability. The results are shown in Table 1, with the total level of support, with Fsup for the sum of the fuzzy support of the large 1-itemsets, Suit for the suitability and #1I for the number of large 1-itemsets. The results from the table 1 show that: - With the minsupp value of 20, the large 1-itemsets calculated by the HA and by Hong et al. have the same values which are better than the others. - The Hong method brings better result only with the minsupp value of 0.5. However, the HA approach wins the others. Herrera et al. also conducted experiments on the same data using a method called 2- tuples. This method shows better large 1-itemsets (22, 15, 10, 1, accordingly). However, the received MF sets (after the GA calculating) are quite poor (fig. 6.: MFs for 10 attributes with minimum support value of 20%). It is shown that, in these situations, the received MFs sets all have a pair of overlapping MFs, which does not satisfy the overlap criteria. These results prove that the fuzzy domain HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 347 Figure 5. MFs obtained by Herrera et al.’ approach with 5 linguistic terms Figure 6. MFs displacements of the MFs with 5 linguistic terms 348 TRAN THAI SON, NGUYEN TUAN ANH partitioned by this method is not good enough (ex. In this situation, its more reasonable to get 4 partitions, which reduce the amount language labels to 4 (instead of 5)). Selecting the MFs with fixed amount of items as well as determining the number of items are important issue since the final results are dependent on the amount of the MF for each item. We will return to this problem in the next paper with algorithms for optimizing the quantity and parameter of the attribute MFs in order to obtain the best result in data mining through the concept of multi domain particle fuzzy partitioning. Figure 6. introduces the illustration of MFs calculated by HA approach. The triangles representing the MFs make up a strong partitioning. 7. CONCLUSIONS Hedge algebra, a new and quite effective tool, can be used instead of the fuzzy theory in many cases due to the order structure on the set of natural language elements. This paper demonstrates this trend through the method of building the membership function to split fuzzy item sets in the fuzzy data mining problem. This is an important step that has not still been much invested and researched. In order to meet the requirements of optimization problem for both the quantity and the MF parameters mentioned above, the expansion of hedge algebra (not only for 5 terms) will not only solve data minings problem better but also promote Hedge Algebras strength. Moreover, the number of classes can be easily increased and still ensure a strong partitions to divide the identified domain of items based on this method. This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.01-2017.06 REFERENCES [1] N.C. Ho, Wechler W., Hedge algebra: An algebraic approach to structures of sets of linguistic truth values, Fuzzy Sets and Systems, vol. 35, pp. 281–293, 1990. [2] L. Eshelman, The CHC adaptive search algorithm: How to have safe search when engag- ing in nontraditional genetic recombination, in: G. Rawlin (ed.), Foundations of Genetic Algorithms, 265–283, 1991. [3] C. M. Kuok, A. Fu, and M. H. Wong, Mining fuzzy association rules in databases, ACM Sigmod Record, vol. 27, no. 1, pp. 41–46, 1998. [4] A. Gyenesei, A fuzzy approach for mining quantitative association rules, Acta Cybern., vol. 15, no. 2, pp. 305–320, 2001. [5] Herrera, Martinez, Learning the Membership Function Contexts for Mining Fuzzy Asso- ciation Rules by Using Genetic Algorithms, Fuzzy Set and System, 905-921, 2009. [6] Li-Xing Wang and J.M.Mendel, Generating Fuzzy Rules by Lerning from Examples, IEEE Trans. SMC, 1, 1992. [7] C. Chen, T. Hong, Vincent S. T. and L. Chen, Multi-objective genetic-fuzzy data mining. International Journal of Innovative Computing, Information and Control, 8, 2012. HEDGES ALGEBRAS AND PROBLEM FUZZY PARTITION FOR QUALITATIVE ATTRIBUTES 349 [8] M.J. Gacto, R. Alcal, F. Herrera, Interpretability of linguistic fuzzy rule-basedsystems: An overview of interpretability measures, Information Sciences, 8, 2011. [9] M. Antonelli, P. Ducange, F. Marcelloni, Genetic Training Instance Selection in Multiob- jective Evolutionary Fuzzy Systems: A coevolutionary Approach, IEEE Trans. on Fuzzy Systems, 20, 276–290, 2012. [10] Alcala-Fdez, Jesu´s and Alcala, Rafael and Herrera, Francisco, A fuzzy association rule- based classification model for high-dimensional problems with genetic rule selection and lateral tuning, IEEE Transactions on Fuzzy Systems, 19, 5, 857–872, 2011. [11] P.Pulkkinen and H.Koivisto, A Dynamically Constrained Multiobjective Genetic Fuzzy System for Regression Problems, IEEE Tran. on Fuzzy Systems, 18, 857–872, 2010. [12] Corrado Mencar, Marco Lucarelli, Ciro Castiello, Anna M. Fanelli, Design of Strong Fuzzy Partitions from Cuts, Conference of the European Society for Fuzzy Logic and Technology, 2013. [13] Tanaka, H, Uejima, S, and Asia, K., Linear regression analysis with Fuzzy model, IEEE Trans. Systems.Man.Cybernet, 12, 903–07, 1982. [14] N. C. Ho, T.T. Son, D.L. Long, Hedge Algebras approach to fuzzy classification, Journal of Computer Science and Cybernetics, 25, 53–68, 2009. [15] N. C. Ho, T. T. Son, and P. D. Phong, Modeling of a semantics core of linguistic terms based on an extension of hedge algebra semantics and its application, Knowledge-Based Systems, vol. 67, pp. 244–262, 2014. [16] N. C. Ho, W. Pedrycz, D. T. Long et al., A genetic design of linguistic terms for fuzzy rule based classifiers, International Journal of Approximate Reasoning, vol. 54, no. 1, pp. 1–21, 2012. [17] N. C. Ho, T. T. Son, T. D. Khang et al., Fuzziness Measure, Quantified Sematic Mapping and Interpolative Method of Approximate Reasoning in Medical Expert Systems, Journal of Computer Science and Cybernetics, vol. 18, no. 3, pp. 237–252, 2002. [18] N.C. Ho and Wechler, Wolfgang, Hedge algebras: an algebraic approach to structure of sets of linguistic truth values, Fuzzy sets and systems, 35, 3, 281–293, 1990. [19] N.C. Ho, Wechler, Wolfgang, Extended hedge algebras and their application to fuzzy logic, Fuzzy sets and systems, 52, 3, 259–281, 1992. [20] N. C. Ho, and N. V. Long, Fuzziness measure on complete hedge algebras and quanti- fying semantics of terms in linear hedge algebras, Fuzzy Sets and Systems, vol. 158, no. 4, pp. 452–471, 2007. [21] Jiawei Han, Data Mining: Concepts and Techniques: University of Illinois at Urbana- Champaign, Micheline Kamber, 2012. [22] R. Agrawal, T. Imielinski, A. Swami, Fast algorithms for mining association rules, The International Conference on Very Large Database, 487–499, 1994. 350 TRAN THAI SON, NGUYEN TUAN ANH [23] D. L. Olson, and D. Delen, ”Advanced data mining techniques”, Springer Science & Business Media, 2008. [24] C.-H. Chen, T.-P. Hong, Y.-C. Lee et al., Finding Active Membership Functions for Genetic-Fuzzy Data Mining, International Journal of Information Technology & Decision Making, vol. 14, no. 06, pp. 1215–1242, 2015. [25] N. C. Ho, T. D. Khang, H. V. Nam et al., Hedge algebras, linguistic-value logic and their application to fuzzy reasoning, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 7, no. 04, pp. 347–361, 1999. [26] N. C. Ho, V. N. Lan, and L. X. Viet, Quantifying hedge algebras, interpolate reasoning method and its application to some problems of fuzzy control, WSEAS Transactions on Computers, vol. 5, no. 11, pp. 2519–2529, 2006. Received on November 11 - 2017 Revised on July 01 - 2017

Các file đính kèm theo tài liệu này:

  • pdfhedges_algebras_and_fuzzy_partition_problem_for_qualitative.pdf