In this paper, we have introduced the RTWU structure
and two algorithms – EAHUI-Miner sequential algorithm
and PEAHUI-Miner parallel algorithm. Experimental results have showed that the number of candidate sets that
EAHUI-Miner generates is less than that of FHM. Moreover, the two proposed algorithms are also faster than
two state-of-the-art algorithms (FHM and EFIM), with
datasets containing thousands of items and huge numbers
of transactions (large databases), but there are fewer items
in each transaction. In the future, we plan to test and
implement the algorithms using Hadoop framework.
7 trang |
Chia sẻ: huongthu9 | Lượt xem: 529 | Lượt tải: 0
Bạn đang xem nội dung tài liệu Parallel Mining for High Utility Itemsets Mining by Efficient Data Structure, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Research and Development on Information and Communication Technology
Parallel Mining for High Utility Itemsets
Mining by Efficient Data Structure
Nguyen Manh Hung1 and Dau Hai Phong2
1 Military Technical Academy, Hanoi, Vietnam
2 Thang Long University, Hanoi, Vietnam
E-mail: manhhungk12@mta.edu.vn, phong4u@gmail.com
Correspondence: Dau Hai Phong
Communication: received 5 July 2017, revised 8 August 2017, accepted 16 August 2017
Abstract: Mining high utility itemsets in transaction
database is an important task in data mining and widely
applied in many areas. Recently, many algorithms have been
proposed, but most algorithms for identifying high utility
itemsets need to generate candidate sets by overestimating
their utility and then calculating their exact utility value.
Therefore, the number of candidate itemsets is much larger
than the actual number of high utility itemsets. In this
paper, we introduce the Retail Transaction-Weighted Utility
(RTWU) structure and propose two algorithms: EAHUI-
Miner algorithm and PEAHUI-Miner parallel algorithm. They
have been experimented and compared to the two most
efficient algorithms: EFIM and FHM. Results show that our
algorithm is better with sparse datasets.
Keywords: High utility mining, EAHUI-miner, PEAHUI-miner,
utility-list, RTWU.
I. INTRODUCTION
Today, searching for hidden knowledge in huge volume
datasets grows rapidly and becomes a very interesting
problem. High utility mining, which aims to find a benefit
value of an itemset greater than a given threshold, is a
difficult problem. Unlike frequent itemset mining, in the
mining of high utility itemsets, it is allowed to evaluate the
importance of each item in the dataset.
One of the challenges in mining high utility itemsets is
that they do not have the closure property [1], which ensures
that if X is not a high utility itemset then all itemsets
containing X are also not high utility itemsets. This leads
to increasing the number of candidates and search space.
Some high utility mining algorithms proposed a two-
phase mining approach, such as Udepth [2], PB [3], HP [4],
UP-Growth [5] and UP-Hist [6]. However, algorithms using
this approach generate a large number of candidates and
require multiple scans of the database. Recently, to avoid
the generation of a large number of candidates, algorithms
such as HUI-Miner [7], EFIM [8] and FHM [9] attempted
to directly search for high utility itemsets in only one phase.
In 2012, Liu et al. proposed the HUI-Miner algorithm [7]
that uses a utility-list structure to store utility information
of each itemset and information for reducing search space.
Different from previous algorithms, HUI-Miner does not
generate candidate high utility itemsets. After constructing
the initial utility-lists from a mined database, HUI-Miner
can mine high utility itemsets from these utility-lists.
In 2015, Zida and Fournier-Viger proposed the EFIM
algorithm [8] that relies on two upper-bounds named sub-
tree utility and local utility to more effectively prune the
search space. It also introduced a novel array-based utility
counting technique named Fast Utility Counting to calculate
these upper-bounds in linear time and space. Moreover, to
reduce the cost of database scans, EFIM proposes efficient
database projection and transaction merging techniques. An
extensive experimental study on various datasets showed
that EFIM is in general two to three orders of magnitude
and memory requirement is up to eight times less than the
state-of-art algorithms, such as d2HUP, HUI-Miner, HUP-
Miner, FHM, and UP-Growth+.
The FHM algorithm [9] restricted the high join cost
of the HUI-Miner algorithm [7] based on the closure
property of the Transaction-Weighted Utility (TWU). It
does not join itemsets containing the pair (x, y) having
TWU(x, y) smaller than a given minimum utility threshold.
However, as reviewed in [4], the complexity of TWU is
more than necessary.
In this paper, we propose the Retail Transaction-Weighted
Utility (RTWU) structure, the EAHUI-Miner sequential
algorithm, and the PEAHUI-Miner parallel algorithm for
mining high utility itemsets by efficiently pruning candidate
itemsets using the RTWU structure.
The paper is organized as follows. First, Section II
reviews related works. Then, Sections III and IV respec-
27
Research and Development on Information and Communication Technology
TABLE I
DATABASE TRANSACTION
tid Transactions
1 b:1, c:2, d:1, g:1
2 a:4, b:1, c:3, d:1, e:1
3 a:4, c:2, d:1
4 c:2, e:1, f :1
5 a:5, b:2, d:1, e:2
6 a:3, b:4, c:1, f :2
7 d:1, g:5
TABLE II
UTILITY TABLE
Item a b c d e f g
Utility 1 2 1 5 4 3 1
tively present the EAHUI-Miner and PEAHUI-Miner algo-
rithms. Section V gives experimental evaluation. Finally,
Section VI concludes the paper.
II. RELATED WORKS
Let I = {i1, i2, i3, . . . , in} be a set of items and D be
a database composed of a utility table and a transaction
table. Each item in I has a utility value in the utility
table. Each transaction Tj in the transaction table T has
a unique transaction identifier (tid = j), and is a subset
of I, in which each item is associated with a count value.
An itemset is a subset of I and is called a k-itemset if it
contains k items.
To better explain the concept, we give a database trans-
action represented as a table, as Table I, and utilities of the
items, as shown in Table II.
Definition 1 ([7]): The internal utility of item ik in
transaction Tj , denoted as O(ik,Tj), is the count value
associated with i in Tj in the transaction table of D.
For examples, in Table I, O(a,T2) = 4 and O(d,T2) = 1.
Definition 2 ([7]): The external utility of item ik ,
denoted as S(ik), is the utility value of ik in the utility
table of D.
For examples, in Table II, S(a) = 1 and S(d) = 5.
Definition 3 ([7]): The utility of item ik in transaction
Tj , denoted as U(ik,Tj), is the product of O(ik,Tj) and S(ik),
that is,
U(ik,Tj) = S(ik) ×O(ik,Tj).
For examples, U(a,T2) = 1× 4 = 4 and U(d,T2) = 5× 1 = 5.
Definition 4 ([7]): The utility of itemset X in transac-
tion Tj , denoted as U(X,Tj), is the sum of the utilities of
all the items in X in Tj in which X is contained, that is,
U(X,Tj) =
∑
(ik ∈X)∧(X⊆Tj )
U(ik,Tj).
For example, U({ad},T2) = 1 × 4 + 5 × 1 = 9.
Definition 5 ([7]): The utility of itemset X , denoted as
U(X), is the sum of the utilities of X in all the transactions
Tj containing X in D, that is,
U(X) =
∑
(Tj ∈D)∧(X⊆Tj )
U(X,Tj).
For example, in Table I, U({ad}) = U({ad},T2) +
U({ad},T3) + U({ad},T5) = (1 × 4 + 5 × 1) + (1 × 4 + 5 ×
1) + (1 × 5 + 5 × 1) = 28.
Definition 6 ([7]): The utility of transaction Tj , denoted
as TU(Tj), is the sum of the utilities of all the items in Tj ,
that is,
TU(Tj) =
∑
ik ∈X
U(ik,Tj),
and the total utility of D is the sum of the utilities of all
the transactions in D.
For example, TU(T3) = 1 × 4 + 1 × 2 + 5 × 1 = 11 and
TU(T4) = 1 × 2 + 4 × 1 + 3 × 1 = 9.
Definition 7 ([7]): The transaction-weighted utility of
itemset X in D, denoted as TWU(X), is the sum of the
utilities of all the transactions containing X in D, that is,
TWU(X) =
∑
(X⊆Tj )∧(Tj ∈D)
TU(Tj).
For example, TWU({ad}) = TU(T2)+TU(T5) = 18+26 = 44.
Definition 8 (High transaction-weighted utility itemset
(HTWU) [10]): For a given itemset X , X is called a high
transaction-weighted utility itemset if TWU(X) ≥ λmin,
where λmin is a user-specified threshold.
For example, if λmin = 12 and TWU({ad}) then {ad} is
HTWU.
Definition 9 ([7]): Given an itemset X and a transaction
(or itemset) T with X ⊆ T , the set of all the items after
X in T is denoted as T/X .
For example, in Table I, T2/{abc}= {de} and T2/{cd}= {e}.
Definition 10 ([7]): The remaining utility of itemset
X in transaction T , denoted as ru(X,T), is the sum of the
utilities of all the items in T/X in T , that is,
ru(X,T) =
∑
i∈(T/X)
U(i,T).
For example, ru({abc},T2) = 5 × 1 + 4 × 1 = 9.
Definition 11 ([7]): Each element in the utility-list of an
itemset X contains three fields: tid, iutil, and rutil, where
tid is the identifier of the transaction, Ttid , containing X ,
iutil is the utility of X in Ttid , i.e., U(X,Ttid), and rutil is
the remaining utility of X in Ttid , i.e., ru(X,Ttid).
Lemma 1 ([7]): Given the utility-list of an itemset X , if
the sum of all the iutils and rutils in the utility-list is less
28
Vol. E–3, No. 14, Sep. 2017
{ab} {ac} {abc}
tid iutil rutil tid iutil rutil tid iutil rutil
2 6 12
+
2 7 9 → 2 9 9
5 9 13 3 6 5 6 12 6
6 11 7 6 4 6 Utility = 21
Utility = 26 Utility = 17
Figure 1. Joint utility-list of itemsets {ab} and {ac }.
than a given threshold λmin, any extension X ′ of X is not
high-utility.
For example, the joint utility-list of itemsets {ab} and
{ac} from Table I and Table II is shown in Figure 1.
Let us join utility-list of two itemsets based on the
transaction identifier tid. For example, iutil({abc},T2) =
iutil({ab},T2)+iutil({ac},T2)−iutil({a},T2) = 6+7−4 = 9.
Similarly, iutil({abc},T6) = iutil({ab},T6) + iutil({ac},
T6) − iutil({a},T6) = 11 + 4 − 3 = 12. Then, calculate
rutil({abc},T2) = rutil({c},T2) = 9 and rutil({abc},T6) =
rutil({c},T6) = 6.
Assuming that λmin = 30, we have the total utility of
the itemset {abc} is 21, so the itemset {abc} is not a high
utility one. But the sum of the iutil and rutil of the itemset
{abc} is 21+ 15 = 36. Thus, the itemset {abc} should still
be able to generate high utility itemsets.
The FHM algorithm is based on the observation that
although HUI-Miner performs a single phase and thus
does not generate candidates as does the two-phase model,
it explores the search space of itemsets by generating
itemsets and a costly join operation has to be performed to
evaluate the utility of each itemset. To reduce the number of
joins that are performed, FHM proposes a pruning strategy
named Estimated Utility Co-occurrence Pruning (EUCP)
that can prune itemsets without having to perform joins.
Specifically, FHM stores the TWU value of all pairs (a, b)
and is based on the closure property of TWU, so all itemsets
containing pairs (a, b) and having a TWU value smaller than
λmin will not join the utility-list of itemsets.
Furthermore, the FHM algorithm mines high itemsets
in depth. Assume that items are arranged in the alphabet
order, {aX} is an itemset with the prefix a, {bX} is an
itemset with the prefix b. Thus, {bX} does not contain
item a, but since computation of TWU({bX}) may still
contain the utility of item a, the value of TWU({bX}) is the
upper bound of U({bX}) and larger than necessary, so using
TWU({bX}) to prune candidate itemsets is not effective.
III. EAHUI-MINER ALGORITHM
In this section we present the RTWU structure and
EAHUI-Miner sequential algorithm for mining high utility
TABLE III
UTILITY-LIST OF {cd}
tid iutil itemutil rutil
1 7 5 1
2 8 5 4
3 7 5 0
itemsets using the RTWU structure to effectively pruning
candidate itemsets.
Definition 12: Extended utility-list of an itemset P is
a list of items in which each item consists of 4 fields:
tid, iutil, itemutil and rutil, where tid is the identity
of transaction containing P, i.e., Ttid , iutil is the utility
of itemset P in transaction Ttid , i.e., U(P,Ttid), itemutil
is utility of item x in Ttid , i.e., U(x,Ttid), and rutil is
remaining utility of transaction Ttid except the utility of
P, starting from the item after item x.
For example, utility-list of {cd} is shown in Table III.
Definition 13: The remaining transaction-weighted util-
ity of an item (x, y) (x is before y in order) in a transaction
T is the sum of the remaining utilities of all the items in T
from item x, denoted as RTWU(x, y,T), that is,
RTWU(x, y,T) =
∑
i∈[T/x′]
rutil(i,T),
where [T/x ′] is the transaction T that does not contain items
before item x in T .
For example, RTWU(c, d,T2) = 4.
Definition 14: The remaining transaction-weighted util-
ity of an item pair (x, y) in database D is the sum of the
remaining transaction-weighted utilities of pair (x, y) in all
transactions containing items x and y in D, denoted as
RTWU(x, y), that is,
RTWU(x, y) =
∑
(T ∈D)∧((x,y)⊆T )
RTWU(x, y,T).
For example, RTWU(c, d) = 1 + 4 + 0 = 5.
Definition 15: The RTWU structure is determined by a
triplet (x; y; c) ∈ I × I × R, where I is the set of items in
the database, x and y are two items of I (x is before y in
order), and c is a real number such that c = RTWU(x, y).
Definition 16: Sum of all the iutils in the extended
utility-list of itemset Px, denoted as Px.sumiutils, i.e.,
Px.sumiutils =
∑
(Px∈T )∧(T ∈D)
U(P,T).
Definition 17: Sum of all the rutils in the extended
utility-list of itemset Px, denoted as Px.sumrutils, i.e.,
Px.sumiutils =
∑
(Px∈T )∧(T ∈D)
U(T \Px,T).
29
Research and Development on Information and Communication Technology
(iutil = U_set; rutil = U_rem; itemutil = U_item).
{ab} {ac}
tid iutil itemutil rutil tid iutil itemutil rutil
2 4 2 12
2 4 3 9
5 5 4 13 3 4 2 5
6 3 8 7 6 3 1 6
Figure 2. Extended utility-lists of itemsets {ab} and {ac }
Definition 18: Sum of all the itemutils in the extended
utility-list of itemset Px, denoted as Px.sumitemutils, i.e.,
Px.sumitemutils =
∑
(Px∈T )∧(T ∈D)
U(x,T).
Lemma 2: Given two extended utility-lists of itemsets
Px and Py of P. If min(Px.sumiutils, Py.sumiutils) +
RTWU(x, y) is smaller than λmin then Pxy and expand of
Pxy are not high transaction weight utility itemsets.
Proof: For all pairs (x, y), we have
U(Pxy) = UTIL(P) +UTIL(xy),
where U(Pxy) is utility of itemset Pxy in D, UTIL(P) is
utility of P in transactions containing Pxy, UTIL(xy) is
utility of itemset {xy} in transactions containing Pxy.
According to Definition 12 of an extended utility-list
and join operation of two extended utility-lists of Px and
Py, implemented as join operation of HUI-Miner [7] and
FHM [9] algorithms, we have
UTIL(P) ≤ min(Px.sumiutils, Py.sumiutils).
According to Definition 13 for RTWU(x, y), we have
UTIL(xy) ≤ RTWU(x, y).
Thus,
U(Pxy) = UTIL(P) +UTIL(xy)
≤ min(Px.sumiutils, Py.sumiutils) + RTWU(x, y).
For example, in Figure 2, extended utility-list of itemsets
{ab} and {ac}with λmin = 30.
According to the FHM algorithm, with the itemset
P = {abc} having the utility-list shown in Figure 1, we
have P.sumiutils + P.sumrutils = 21 + 15 = 36 and
TWU({ab}) = TU(T2)+TU(T5)+TU(T6) = 18+ 22+ 18 =
58. Similarly, TWU({ac}) = 47 and TWU({bc}) = 46.
Thus, the itemset P contains pairs (a, b), (a, c) and (b, c),
the TWU of each is greater than 30, and since P.sumiutils+
P.sumrutils > 30, it continues to join the utility-list of the
itemset P with utility-lists of other itemsets to find high
utility itemsets.
Above, P = {abc} has
P.sumiutils + P.sumrutils = 36 > 30.
Algorithm 1: Building extended utility-list
Inputs:
Px.ULs . extended utility-list of Px;
Py.ULs . extended utility-list of Py;
Output:
Pxy.ULs . extended utility-list of Pxy;
function CONSTRUCT(Px, Py)
Pxy.ULs = Null;
for each item Ex ∈ Px.ULs do
if (∃ E y ∈ Py.ULs and Ex.Ttid = E y.Ttid) then
Exy = 〈Ex.Ttid, Ex.iutils + Ex.itemutils,
E y.itemutils, E y.rutils〉;
Insert Exy into Pxy.ULs;
end if
end for
Return Pxy.ULs;
end function
However, according to Lemma 2, we have
min({ab}.sumiutils, {ac}.sumiutils) + RTWU(b, c)
= min{(4 + 5 + 3), (4 + 4 + 3)} + (9 + 6)
= 11 + 15 = 26 < 30.
Thus, it does not join the utility-list of itemset {abc} with
utility-lists of other itemsets.
Based on Lemma 2, we propose to improve the FHM
algorithm using the RTWU structure, which is presented
in the next section.
1. Building the Extended Utility-List
In the first database scan, the extended utility-list of 1-
itemset of P is empty (i.e., P.iutil=0). Extended utility-lists
containing k-itemsets (k ≥ 2) are built using Algorithm 1.
2. Mining High Utility Itemsets
Algorithm 2 shows the pseudo-code of the EAHUI-Miner
algorithm. For each utility-list X in ULs, if the sum of all
the utilities of items in X exceeds λmin, then the extension
associated with X is high utility and is returned by the
algorithm. In EAHUI-Miner, we use the EUCS structure
of the FHM algorithm [9] and the RTWU structure of
Definition 14.
IV. PEAHUI-MINER PARALLEL ALGORITHM
In this section, we present the PEAHUI-Miner parallel
algorithm derived from the EAHUI-Miner algorithm in
Section III. This is a dynamic distribution parallel algo-
rithm following a fine-grain and shared memory model to
30
Vol. E–3, No. 14, Sep. 2017
Algorithm 2: High Utility Itemset Mining Algorithm
Inputs:
ULs . high utility 1-itemset;
λmin . minimum utility threshold;
Output:
HUIs . high utility itemsets.
procedure EAHUI-MINER(ULs, λmin)
for each utility-list of Px ∈ ULs do
if (Px.sumiutils+Px.sumitemutils ≥ λmin) then
return Px;
end if
if (Px.sumiutils + Px.sumitemutils
+Px.sumrutils ≥ λmin) then
exULs = Null;
for each Py after Px ∈ ULs do
if (EUCS(x, y) ≥ λmin) &
min(Px.sumiutils, Py.sumiutils)
+RTWU(x, y) ≥ λmin) then
exULs = exULs+Construct(Px, Py);
end if
Call EAHUI-Miner(exULs, λmin);
end for
end if
end for
end procedure
improve load balancing between processes. The flowchart
of the algorithm is shown in Figure 3.
The main parallel step is implemented by the parallel
loop structure (#pragma omp parallel for) of the OpenMP
library for utility-list of 1-itemsets. The fine-grain parallel
is a method which divides each element in the ULs of 1-
itemsets for each process.
The number of processes is usually much smaller than the
number of elements in the ULs. The parallel loop structure
will divide the next element in ULs for each processor
when it has finished processing the previous element.
V. EXPERIMENTS
1. Environment and Data
We performed experiments to evaluate the performance
of the proposed EAHUI-Miner and PEAHUI-Miner algo-
rithms. Experiments were carried out on a computer with
a third generation 64-bit core i7 processor running Win-
dows 7 and 8 GB of RAM. We compared the performance
of EAHUI-Miner with the state-of-the-art algorithms, EFIM
and FHM, which are downloaded from the homepage of
Fournier-Viger [11]. Our two algorithms are coded by Java,
and the OpenMP library is used for the parallel algo-
rithm. Experiments were performed using a set of standard
. . .
Database
Caculate utility-list 1-itemsets
i = 0
j = m
F
T
j < ULs.size()
m1
If (P1 - avaiable)
If ( j < m + 1)
Mining item ULs(i + 1)
Else
Mining item ULs( j)
If (P1 - avaiable)
If ( j < m + 1)
Mining item ULs(i + m)
Else
Mining item ULs( j)
j = j + 1
Waiting all threads is completed
Finish.
Range
R(Ptindex)
Figure 3. Flowchart of the PEAHUI-Miner algorithm.
TABLE IV
DATASET CHARACTERISTICS
Database AvgLen #Trans #Items
T10I4D100K 10 100,000 1,000
T10I4D200K 10 200,000 1,000
Foodmart 4.4 4,141 1,559
Mushroom 23 8,124 119
TABLE V
NUMBER OF CANDIDATE ON DIFFERENT DATASETS
λmin FHM EAHUI-Miner
T10I4D100K 2,500 153,016 125,647
T10I4D200K 2,500 925,994 891,585
Foodmart 1,000 259,876 258,921
Mushroom 100,000 1,588,018 1,587,927
datasets namely T10I4D100K, T10I4D200K, Foodmart and
Mushroom. Table IV summarizes their characteristics.
31
Research and Development on Information and Communication Technology
Minimum Utility Threshold
100000 150000 200000
R
un
ni
ng
ti
m
e
(s)
×105
0
0.5
1
1.5
2
Mushroom
EFIM
FHM
EAHUI-Miner
Figure 4. Execution times on Mushroom dataset.
Minimum Utility Threshold
1000 1500 2000
R
un
ni
ng
ti
m
e
(s)
0
100
200
300
400
500
600
700
Foodmart
EFIM
FHM
EAHUI-Miner
Figure 5. Execution times on Foodmart dataset.
2. Candidates and High Utility Itemsets
The experimental results of the EAHUI-Miner, PEAHUI-
Miner, EFIM, and FHM algorithms are obtained with many
minimum thresholds of utility and the same number of high
utility itemsets.
Table V shows the number of candidates that the two
algorithms generated. The results show that number of
candidates generated by FHM is larger than that generated
by EAHUI-Miner.
3. Running Time
Running times of the EFIM, FHM, and EAHUI-Miner
algorithms are shown in Figures 4, 5, 6 and 7. The results
show that EFIM is very fast on dense datasets with small
size for the set I, EAHUI-Miner and FHM are faster than
Minimum Utility Threshold
2500 3000 3500
R
un
ni
ng
ti
m
e
(s)
×104
0
0.5
1
1.5
2
T10I4D100K
EFIM
FHM
EAHUI-Miner
Figure 6. Execution times on T10I4D100K dataset.
Minimum Utility Threshold
2500 3000 3500
R
un
ni
ng
ti
m
e
(s)
×104
1
1.5
2
2.5
3
3.5
4
T10I4D200K
EFIM
FHM
EAHUI-Miner
Figure 7. Execution times on T10I4D200K dataset.
EFIM on sparse large datasets with large size for I but
there are fewer items in each transaction.
Figure 8 and 9 compare the running times of the EAHUI-
Miner sequential algorithm and the PEAHUI-Miner parallel
algorithm on the T10I4D100K and T10I4D200K datasets.
VI. CONCLUSIONS
In this paper, we have introduced the RTWU structure
and two algorithms – EAHUI-Miner sequential algorithm
and PEAHUI-Miner parallel algorithm. Experimental re-
sults have showed that the number of candidate sets that
EAHUI-Miner generates is less than that of FHM. More-
over, the two proposed algorithms are also faster than
two state-of-the-art algorithms (FHM and EFIM), with
datasets containing thousands of items and huge numbers
32
Vol. E–3, No. 14, Sep. 2017
Minimum Utility Threshold
2500 3000 3500
R
un
ni
ng
ti
m
e
(s)
×104
0
0.5
1
1.5
2
T10I4D100K
EAHUI-Miner
PEAHUI-Miner
Figure 8. Execution times on T10I4D100K dataset.
Minimum Utility Threshold
2500 3000 3500
R
un
ni
ng
ti
m
e
(s)
×104
0.5
1
1.5
2
2.5
T10I4D200K
EAHUI-Miner
PEAHUI-Miner
Figure 9. Execution times on T10I4D200K dataset.
of transactions (large databases), but there are fewer items
in each transaction. In the future, we plan to test and
implement the algorithms using Hadoop framework.
REFERENCES
[1] R. Agrawal, “Fast algorithms for mining association rules
in large databases,” in Proceedings of the 20th International
Conference on Very Large Data Bases, 1994, pp. 478–499.
[2] W. Song, Y. Liu, and J. Li, “Vertical mining for high utility
itemsets,” in IEEE International Conference on Granular
Computing (GrC). IEEE, 2012, pp. 429–434.
[3] G.-C. Lan, T.-P. Hong, and V. S. Tseng, “An efficient
projection-based indexing approach for mining high utility
itemsets,” Knowledge and information systems, vol. 38, no. 1,
pp. 85–107, 2014.
[4] D. H. Phong and N. M. Hung, “Một mô hình hiệu quả khai
phá tập mục lợi ích cao,” Research and Development on
Information and Communication, pp. 26–36, Jun. 2015.
[5] V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu, “Up-growth:
an efficient algorithm for high utility itemset mining,” in
Proceedings of the 16th ACM SIGKDD international con-
ference on Knowledge discovery and data mining. ACM,
2010, pp. 253–262.
[6] S. Dawar and V. Goyal, “Up-hist tree: An efficient data
structure for mining high utility patterns from transac-
tion databases,” in Proceedings of the 19th International
Database Engineering & Applications Symposium. ACM,
2015, pp. 56–61.
[7] M. Liu and J. Qu, “Mining high utility itemsets without
candidate generation,” in Proceedings of the 21st ACM
international conference on Information and knowledge man-
agement. ACM, 2012, pp. 55–64.
[8] S. Zida, P. Fournier-Viger, J. C.-W. Lin, C.-W. Wu, and V. S.
Tseng, “Efim: a highly efficient algorithm for high-utility
itemset mining,” in Mexican International Conference on
Artificial Intelligence. Springer, 2015, pp. 530–546.
[9] P. Fournier-Viger, C.-W. Wu, S. Zida, and V. S. Tseng,
“Fhm: faster high-utility itemset mining using estimated
utility co-occurrence pruning,” in International symposium
on methodologies for intelligent systems. Springer, 2014,
pp. 83–92.
[10] Y. Liu, W. keng Liao, and A. Choudhary, “A two-phase
algorithm for fast discovery of high utility itemsets,” in
Proceedings of the Pacific-Asia Conference on Knowledge
Discovery and Data Mining (PAKDD), 2005, pp. 689–695.
[11] Philippe Fournier-Viger’s site.
viger.com.
Nguyen Manh Hung received the Ph.D.
degree in Computer Science, Scientific
Academy, Russian Federation in 2004. He
is working in the Postgraduate Department
- Military Technical Academy, Hanoi, Viet-
nam. His current research interests include:
modern data structures, data mining, clas-
sification of the high-performance packet.
Dau Hai Phong received the Master degree
in Information Technology, Military Tech-
nical Academy, Hanoi, Vietnam in 2008.
He is working in Faculty of Mathemat-
ics and Computer Science, Thang Long
University. His current research interests
include: Data mining, Parallel computing,
Distributed Database.
33
Các file đính kèm theo tài liệu này:
- parallel_mining_for_high_utility_itemsets_mining_by_efficien.pdf