PHÂN LOẠI THÔNG ĐIỆP TRÊN DIỄN ĐÀN
LÊ THANH TÂM
Trang nhan đề
Mục lục
Chương 1: Tổng quan.
Chương 2: Tiền xử lý dữ liệu.
Chương 3: Phân loại thông điệp.
Chương 4: Chương trình phân loại.
Chương 5: Kết luận.
Tài liệu tham khảo
Phụ lục
33 trang |
Chia sẻ: maiphuongtl | Lượt xem: 1899 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Luận văn Phân loại thông điệp trên diễn đàn, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
28
Chuang2- TIEN XU LY DU LI~U
Theohinh 1.3thi giai do~nti~nxu 1)-'dii'li~uduQ'cthvchi~ntheobabu6c
chinhnhusau:
Phantichva
bi~udi~n
thongdi~p
Xay d1Jllg
C\lllltir ChQnd~ctrung
2.1.Phantichvabi~udi~nthongdi~p
PhAnnaybaag6m:phantachtutrongthongdi~pvabi€u di~nthongdi~p.
MQtv~nd~khokhandAtitientrongxu1)-'tvdQngti~ngVi~tlavi~cdinhnghlatIT
trongti~ngVi~tv~nconnhi~utr,!nhlu~n.f)~thu~nti~nchovi~ctrin}Ibayv~sau
chungtoitheoquaildi€mcuaDinhDi~n[6]sau:mQtdiuti~ngVi~tbaag6mnhi~u
tIT,m6itl1baag6mmQthaynhi~u'ti~ng',m6i'ti~ng'lam6ichu6ikYtv li~nnhau
phanbi~tv6icacti~ngkhacb~ngmQthaynhi~likhoangtr~ng.Thid\l:tIT'hQc'la
mQtITg6mmQti~ng,tu 'hQcsinh'lamQtug6mhaiti~ng.C\lmtu'khoahQct1J
nhien'g6m2tuhay4ti~ng.
Trongcacti~ngChilliau,nguaitacoth€ dUllgianI~yxacdinhcactITnha
vaocackhoangtr~ngphancachtuvachQncactud~ctf11ngchonQidungvanban
(dvavaotAnsu~txu~thi~ncuatu)lamd~ctrung.D6iv6iti~ngVi~tchungta
khongth~lamtuungtvb6in~uchungtaxacdinhtuchidvacackhoangtr~ngphan
cachthichungtacoth€ chinh~nduQ'cac'ti~ng'vonghlavadododQchinhxac
cuah~th6ngser~th~p.Theocacnhangoi1ngii'hQcthiti~ngVi~tcod~n80%la
cactIT2 'ti~ng'[6].Nhuv~ymu6nd~tchinhxaccaDthichungtaphaiti~nhanh
tachtu.
2.1.1.Tachtir
Bu6cnaytachmQthongdi~pthanhcaccalivam6icalithanhcacdunvi tIT.
29
Phuangphav
Buac 1.Xay d\l11gotomatamti~tdminnh~ tfitcacacamti~tti~ngVi~t.
Buac 2.Xay dlJngotomatITVlJl1gdoannh~ntfitcacactITvlJngti~ngVi~t.
Buac3.DlJatrencacotomatneutren,xaydlJngdbthituO'ngungvaiCall
cftnphantichvaS11d\mgthu~ttoantimki~mtrendbthid€ li~tkecaccachphan
tichcothe.
Bangchucaicuaotomatamti~tlabangchucaiti~ngVi~t,m6iclingchuy€n
duQ"cghitrendomQtky tlJ. Thi d\l,vai baamti~tphuang,philp,trinhtaseco
otomatdoannh~namti~tnhuHinh2.1.
f:~~
q, ~
~~
Hinh2.1.XaydlJngotomatamti~t
Thu~toanxayd\l11gotomatamti~t
Nhfjp:Tir di€n amti~t
Xudt:Otomatamti~t.
Thufjtloan:
1.L~ptn;mgthaikh&idftu%;
2.Vong l~pdQcchotai khi h~t~pdu li~u,lfiyratUngamti~t.GQicackY tlJ
cuaamti~tdatac(pcl'°o.,Cn-l.
a. p:=%;i:=O;
b.Vongl~ptrongkhi(i ~n-l )
30
i. LAy ra kY tl,rCj;
ii. Timtrongcacclingchuy~ntirtr~ngthaiP clingtrendoghiky
t1JCj. Neucocling(p,q) nhuthe:
i :==i +1;
p:==q;
iii. Neukhongcocling(p,q) naonhuthethithoatkhoivong
Hipb.
c.V6i j ill i denn-l
i. T~om6itr~ngthaiq, ghinh~nq Iatr~ngthaikhongket;
ii. Themclingchuy~n(p,q)trendoghikytl,rCj;
iii. p:==q;
d.Ghinh~nq Iatr~gthaiket;
Otomatill VlJngduQ'cxaydlJtlgtuangtl,r,v6idi~mkhacnhusail:thayvi ghi
trenm6iclingchuy~nm(>tamtiet,taghis6bi~ucuatr~ngthai(ket)cuaotomatam
tiett~idodoannh~m6iamtietcuaill nh~mgiftmkichthu6ccuaotomatill VlJtlg.
Thi d\l,v6i haitirphl£O'ngphapvaphl£O'ngtrinh,giftsirkhi duaIAnIuqtcacamtiet
phl£ang,phap,trinhquaotomatamtiet,tadenduQ'cactr~ngthaiketghicacs6nj,
n],n3thitrencacclingchuy~ntuangimgtaghicacs6nj, n2,n3(Hinh2.2).
~~
"\
D
Hinh2.2:Xfiy dl,Tngotomatirvl,Tng
Thu~tOMxaydlJtlgotomatill VlJtlg:
31
Nhgp:Tir dientiTVlJIlg,otomatamtiet
Xudt:OtomattirV\l'ng.
Thugtloan:
I. L~ptr:;mgthaikh6'idftu%;
2.Yang l~pdQcchot6'ikhihett~pdftli~u,Idyratirngm\lCtiTword.OQicac
amtietcuawordhi SO'SP""Sn-l' ,
3. Su d\lngotomatamtietdedoannh~ncacamtiettren,duQ'cacs6hi~u
cuatr:;mgthai(ket)wangtmgla mo,mp...,mn-l
a p:=q 'i'- O
'
. 0' .- ,
b. Yang l~ptrongkhi (i:::;n-I)
i. Ldyras6mj;
ii. TimtrongcaccungchuyentiTtr~ngthaip cungtrendoghis6
m;.Neucocung(p,q)nhuthe
I i :=i +l'. '
2.P:=q;
iii. N~ukhongcocung(p,q)naonhuthethithoatkh6ivang
l~pb.
c.V6'i j tiTi denn-l
i. T~om6'itr~gthaiq, ghinh~nq latr~gthaikhongk~t;
ii. Themcungchuyen(p,q)trendoghis6mj;
iii. p:=q;
d.Ohinh~nq latr~gthaik~t
Saukhidaxayd1!ngxonghaiotomat,aghichungvaohait~pdjnhkieude
dungtrongbu6'cphantachtirVlJIlg.N~um6ikYW(char)duQ'cghivaot~pv6'ikich
32
thuO'c2 byte(maUnicode),m6is6nguyen(int)cokichthuO'c4 bytethi t~pluu
otomatamti<§tcokichthuO'c270KB,t~potomatill V\IDgcokichthuO'c1MB.
Tu tuangcuathu~ttoanphfLntachtirV\IDglaquyvi~cphantachcauv€ vi~c
timduangditrenmQtde,thicohu6ng,khongcotrQngs6.
GiaSlrcaubandfiulamQtdayge,mn+1 amti€t So,Sl, ...,Sn.Ta xayd\!ng
mQtde,thicon+2dinhVo,VI, ...,VmVn+l,s~pthllWtrenmQtduangth~ngill trai
sangphai;trongdo,ill dinhVid€n dinhVjco cung(i <j) n€u cacamti€t Si,Si+],...,
Sj-ltheothut\! l~pthanhmQttir.Khi do m6icachphantachcaukhacnhautuong
UngvO'imQtduangdi trende,thi ill dinhdfiuVod€n di.Qhcu6iVn+l.Trongth\!ct€,
cachphfLntichcaudungd~nnhAtthuangUngvO'iduangdi quait cungnhAttrende,
thi.
TrongtruanghQ'pcauco S\!nh~pnhfuIgthi de,thi seco nhi€u bonmQt
duangding~nhAtirdinhdfiud€n dinhcu6i,tali~tketoanbQcacduangding~n
nhAtrende,thi,tirdoduaratAtcacacphuongantachcaucoth~vad~nguoidung
quy€tdinhsechQnphuonganmio,tuythuQcvaongii'nghiaho~cvancanh.Thi dl,l,
xetmQtcaucoc\lm"thu(Jcdiabim",tacode,thinhusau(Hinh2.3).
thu9Cdill
I~--'
djllh~
Hinh2.3.MQttinhhu6ngnh~pnhfuIg
C\lmnayco S\!nh~pnh~nggiii'athu(Jcdiavadia himvatasecohaik€t qua
phantachla "thu(Jcdia / bim"va "thu(Jc/ dia bim".Ta co th~chi rarAtnhi€u nhii'ng
c\lmnh~pnhfuIgtrongti€ng Vi~t,ch~gh~n"t6hQ'pamti€t", "bfuIgchUngcO"',...
TruanghQ'ptrongcauco amti€t khongn~mtrongtir di~nthi ro rangotomat
amti€t khongdoannh~nduQ'camti€t nay.K€t qualade,thitaxayd\Illgtircaudo
lakhonglienthong.D\!avaotinhchAtnay,tathAyrfuIgn€u de,thikhonglienthong
33
thid~dangphathi~nrar&ngdaTIvi amtietkhongdoannh~nduQ'ckhongn&mtrong
tirdi~namtiet,tircnoduQ'cvietsajchinhtaho~clamC)tdaTIvi amtiet(tirv\ffig)
moi
DanhgiGkitqua
V6i cachtierc~nnhutren,battoanphantachtirvlJngtrongCalitiengVi~t
vScO'band5duQ'cgiaiquyet,d~cbi~tlavAndStachcact6hQ'ptirtuangduO'ngv6i
mQtdaTIvi tirVlJng,thuanglacacClJmtirc6dinh,ngii'c6dinhho~cacthanhngii'
trongtiengVi~t.v6i nhiiTIgCalinh~pvaocoSlJnh~pnh&ngtirV\ffig,tircconhiSu
banmC)tcachphantachthitaphaichQnramC)tphuangan.TrongtAtcacacphuang
anphantachdobaogiaclingt6nt~iphuangandung.
DdgiaiquyitvdndJ naytacothddung:
-Cacquyt~cfigii'phapdochuyengiangonngfrxayd\ffig.Tienhanhphan
tichcuphapcuaCaliv6inhfrngphuangantachtirV\ffigcoth~,tirdolo~iranhiiTIg
phuangansatcupharo
-Dungphuangphapxac suAt- th6ngke.Phaith6ngketrongkhovanban
tuO'ngd6i16ncuatiengVi~td~timraxacsuAtcuacacbC)dothaybC)batir lo~iho~c
tirvlJngdi c~nhnhau.Tir doIlJa chQnphuO'nganphantachcoxacsuAtsatit nhAt.
0 daytasirdlJngcachthirhal,tircIa dungxacsuAtth6ngke d~chQnphuO'ng
ant6tnhAt.ClJ th~la taduaramC)tdC)donhutrongdinhnghla2.1.
Dinh nghza2.1 (D(jkit dinhc¥mfir):Cho ClJmtir dain tir (wJ"",wn).DC)do
pccuanoduQ'cdinhnghlanhusau:
C(Wp..., Wn)
pc(w"...,wn)=n4C(w,)*",*C(wn)
(2.1)
V6i C(wJ"",wn)la s6IfinmatAtcacactirtrongclJmtitcungxuAthi~nv6i
nhau.C(Wj),i=l,nIa s6IfinxuAthi~ncuatirWitrongmQiClJmtir.D~thAyr&ngdC)
dopccaokhicactirthanhphfinxayrachungv6inhaunhiSubanvachungxayra
riengIeithall.
34
Nhuv~ytatinhdQk~tdinhpcchotfttcacacphuanganvachQnphuangan
co gia tr! 16TInhftt.
2.1.2.Trich dii'li~u
Dinhnghla2.2:StoplistIa danhsachcactirxuftthi~nquanhi~utrongkho
ngftli~uvakhongconQidungrorangtrongthongdi~pnhucacgi6'ill, d~itirva
lientir...
Thi d\!cactirthuQcstoplistla:"cai","va","v~",...Lu~tZipfduQ'cdungd~
nh~nbi~ttAnsufttxuftthi~ncuacactir quanhi~uhayquait. Xoa cactir thuQc
stoplistla mQttrongnhftngcachkJ:rucactirV\ffigphi>bi~nnhftt.Vi~ckhircactir
hi~mxuftthi~nduQ'cth\lchi~na bu6'cphatsinhlu~tb&ngcachxacdinhnguang
min_sup.
T~pcactirtrongstoplistthikhongc6dinhmaduQ'cxacdinhph\!thuQcvao
m6ilInhV\lc.Thi V\ltir "maytinh"khongthuQcstoplisttrongcactaili~uphiky
thu~tnhungnosethuQcstoplisttrongcactiii li~uv~maytinh.Ta secocongC\!
clingcftpchonguaidungd~hQcoth~themhoi;icxoamQtirtrongstoplist.
Bu6'cnaytaxoacactir tAmthuang(stoplist)trongcan,chi giftl~icactir
chuathongtinhUnich.Taxemthu~ttoan2.1.
ThuG!oim2.1
Nh~p:(1)Scactirtrongthongdi~p;(2)DanhsachcacStoplist
Xuftt:Cactirhqpl~T trongcacCali.
Phuangphap:
1)T=
2)whilechuah~tS {
3)Trichtirt k~ti~ptrongS ;
4)
5)
}
ift~ L andt~T then
T=Tu t;
35
2.1.3.Bi~udi~ndii'li~u
D~apdl.mgbQrheinlo~ik~thqpchothongdi~ptacAnbi~udi~nthongdi~p
m thanhgiaotacT={l11c,mJ,{V}}.V6i mela tl,rad~cuathongdi~p,m!la chi~udai
cuanovaV={tt,...,td lavectO'k fir trongnQidungcuam.
Vi thu~toanchi thaotactrencacgiaotac86nenmiphAntu duqcanhx~
t6imQts6tuO'ng(mg.Ta daubriengmQtmi~ns6chomithuQctinhvagallm6igia
trim6icuathuQctinhv6is6chuadungk~ti~ptrongmi~n ay.V6im6imi~ns6,s6
chuadungdAutienla 1vatangthem1v6im6igiatri m6iduqcthemvao.Ky 86
dAutiencuamigiatri la Id cuathuQctinh.Thi d\ln~uphAnnQidungcuathong
di~pcoId la 1vamQtfirtrongdocos6123thinoduQ'cluutrongcO'So'dft1i~ula
1123.Tilt cacacchud~clingduqcanhx~ungv6i cacs6nguyennhungchung
khongdoih6iphaicoId vi th~mQtchud~phaithuQcv~phaicuamQtlu~tnaodo.
Chungtagi6ithi~umQtHeuristichiamQthongdi~pthanhnhi~uphAnma
m6iphAncodQdaikhongvuqtquamax_fen(thongdi~pcochi~udait6ida)vakern
theocacId thuQctinhcuathongdi~p.Di~unaynh~mcaiti~ndQchinhxacvatinh
thihanhcuabQrheinlo~i.Saudaylacaclqi ichkhiapd\lngHeuristichianay:
-DQdaigiaotacng~nhan:lamgiamdangk~thaigianxayd\l1lglu~tkhi
rheinlo~i.
-C\lCbQhoalu~t:lamC\lCbQhoatrenmilu~t,firdomilu~tchichuas6
phAntu it hO'nmax_leu.Tinh C\lCbQt~ochocacfir trongcaccaugi6ngnhauho~c
gAllgi6ngnhaucotinhtuO'ngquailnhi~uhall.
-Tang trngs6 cho thuQctinh quail trng:lam tangtrngs6 cho cacthuQc
tinhtlJad~macacnothuangquailtr<;mgtrongthongdi~p.
Heuristicduqcungd\lngchot~pcacthongdi~phuilnluy~nh~mt~ocho
mohinhrheinlo~ihi~uquahan.Khongchiacacthongdi~ptrongt~pdft1i~uthu
nghi~mvacacthongdi~pm6i.
36
Thidu2.1:Xemthongdi~ptronghinh2.4.
Tl7aG6C6phainhUngngu6'ithiehnh~eTCS1anhUngngu6'ithiehcoGO'll?
Daquy Toi 1amQtfaneuaTCS,mQtfanth(jtsY'tirkhitoiconr~tnh().Baihatnao
euaT dinglamtoilientuUngG~nmQtthigi6'ieoa6evatrimtu(Jng.Bljtn
than
toi,r~tyeuthiGhnhUngaemV(ingth~nayng6ingheKL hatnhgeT,m6t
minh
thai,vaeamth6ythiehm6tminhnhuv~y.N~ubljtlleiing1angu6'ithiGh
nhge
T, bljtlle6th~yn1inhthiehliingnghenhgqTmQtminhnhutoikhong?
Hengap1ai!TuanHung
Hinh2.4:MQtthongdi~ptrendi€n Gall.
Bangli: Cacill trichtirthongdi~ptrenhinh2.4
ThuQetinh CaefirtrongthongGi~p Cacgiatri so
Ungv6i caefir
{11,12,13,14,15,16,
17,18,19,110,111,
112,113,114,115,116,
117,118,119,
120,121}
{21,22,23,24}
13~
NQidung {fan,tCS,tp.~tsl7,nh6,bail1at,1c;un,li~Jf
tu6TIg,th~gi6i,coGQc,truutUQ'Ilg,
yellthich,Gemv~ng,ng6i,nghe,KL,
hat,nhljte,mOtminh,camth~y,thieh,
1~ngnghe}
{thich,nhljtc,TCS, coGO'll}
{daquy}
Tl7aGe
Ngu6'igui
CQtthu2 cuabang2.1lamQtt~pcacfir duQ'ctrichfir thongdi~p(daxoa
stoplist).Chungtagiasur~ngdaylathongdi~pddutientrongcO'saduli~uvas6
ddutienchuadungla 1.Do dod~tId(NQidung)= 1,Id(T1,lad~)=2 vaId(Nguai
gui)=3.
R5rangtrongnQidung:"fan"duQ'canhx~d~n11,"TCS" la 12,...
Trongt1,lad~:"thich"duQ'canhx~d~n21,"nhac"la22,...
Trongnguaigui:"Daquy"duQ'canhx~d~n31.
CQtthu3chuacacgiatrjs6cuacactir.
37
Di;itmax_len=3thiphfinn<)idungduQ'chiathanh2 phfin:
Phfin 1 chua 10tITdfiutien(lrngv6i 2 diu dfiu),phfin2 chua 11ill conl~i
(ungv6i 2 diu conl~i).Id cuatlJad6vangmJigiri luonco trongm6iphfinnay.K~t
quacohaigiaotaccod<)daituanglrngla 15va 16.Di;itId chud6cuathongdi~pla
3thitacohaigiaotacnhusau:
1
2
{3,15,11,12,13,14,15,16)7,18,19,110,21,22,23,24,31}
{3,16,111,112,113,114,115,116,1~7,118,119,120,121,21,22,23,24,31}
2.2.Xayd1}'ngc\lmfirvacbenc\lmfir
Dinhnghza2.3:M(JtcZ,JmfirtaSlfkit h9'Pcoy nghzavathuangxuyencua
haihaynhiJufirmamJi chungkh6ngthu(Jcstoplist.
Tadffbi~tr~ngcorfitnhi6utITxufithi~nph6Qi~ntrongvanbannhungchua
rfitit thongtin hftuich nhungd6ikh.ik~thQ'Pchungl~it~ora m<)tCl,lmtITthi chua
thongtin quailtrQngva khongconph6bi~nnua.Thi dl,ltrongm<)tdi€n dan,co
nhi6uthongdi~pchuacabatIT"tblJc","hi~n"va"duQ'C"lanhungtITthu<)cstoplist
vaconhi6ungunghlakhacnhaumyvaongucanh.Tuynhienn~uxemxetCl,lmtIT
"thlJchi~nduQ'c"thinokhongthu<)cstoplistnua.
Trongphfinnaytanghienc1JlJ11;1<)tthu~tgiaim6i do la FP-phrasemano sir
dl,lngthu~ttoankhaikhoangcacm~uph6bi~ngi5ngnhuFP-growthd€ xaydlJng
cl,lmtIT.Chungtaclingsirdl,lngd<)dopcd€ x~ph~ngvatiacl,lmtITtrongquatrinh
chungsinhfa.
TfitcacacphuangphapxaydlJngcl,lmtIThi~nnayduQ'chiathanhcacnhom
sau:
-PhuangphaptlJ d<)ngvathucong:Phuangphapthucongkhachinhxac
nhungt5nnhi6uthaigianvayeticfiuhi€u bi~tnhi6uv6cacchud6.PhuangphaptlJ
d<)ngduQ'chiathanh ailo~i:dlJatrencackythu~tngonnguhQcvadlJatrenth5ng
keduli~u.
38
-PhUO'Ilgphapd\1'avitongftcanh:NgftcanhcuamQtc\lmill coth6Iacungtai
li~u,cungcau...
-PhUO'IlgphapC\lmill cothut\1'vakhongthut\1':N~uchQnIa cothut\1'thi
d6iv6ihaiC\lmtircocacill gi6ngnhaunhungthut1!khacnhauthixemnhukhac
nhau.
-PhUO'Ilgphapd\1'avitoill Io~id6xacdinhC\lmill: Thi d\ltheosailcacdanh
ill IamQtchu6icacdanhill haytinhill thiduQ'cxemIa c\lmill.
-DungdQdai t6i da cuaC\lmtir tim duQ'c:Nhi~uphuO'Ilgphapchi xetcac
c\lmtircohaitir,mQts6phuO'Ilgphapkhacthixetcacc\lmill codQdaibAtky.
CacphuO'Ilgphapxir Iy ngonngftt1!nhienthi sird\lllgcacd~ctrungill V\lllg
vaCllphap.d mucdQtirV\lllgthi gallmQtill Io~i(danhtir, dQngtir,tinhtir...)cho
m6itir.d muccuphapthixacdinhm6iill d\1'avitochucnangcuachungtrongCall
(chutir,tuctir, dQngtir)va dungcacthongtin nayd6gomnhomcacill trongCall
thanhcacdonvi cocdutruc.
CacphuO'Ilgphapth6ngke sird\lngcacdQdonhu:tAnsudtxudthi~n,thong
tintuangh6,dQh6ndQn,dQk~tdinh...d6u6c IuQ'llgtinhhQ'PIy cuamQtS\1'k~thQ'P
giuacactirxemnhumQtC\lmtir.MQtphuO'IlgphapdO'IlgiannhdtIaxemmQtC\lm
tirnhumQtbQcacill khongthuQcStoplistvadi k~nhauit nhdttrong25taili~u.
MQtphuongphapkhacdinhnghiadQk~tdinhchomQtc\lmtirvachicocacC\lmill
cogiatr!naythoatrennguangm6iduQ'chApnh~.Nhi~uphuO'Ilgphapti~pc~
theocachtbnghQ'P,thid\lnhudungcackythu~tngonngftd6phatsinhc\lmtirva
mQtdQdothongtintUO'Ilgh6duQ'cth6ngked6Io~iboS\1'trungI~p.PhuO'Ilgphap
khacI~iirngd\lngthu~ttoanApriorid6timcacc\lmill cothut1!vakhongthut1!.
M~cducacphuO'Ilgphapth6ngkekhongd\1'avitocacd~ctinhcuphaphay
ngunghianhungchungkhamphacacC\lmtirv6i chiphirehO'IlrAtnhi~usov6i
dungcackythu~tngonngft.
39
2.2.1.Ph1f(mgphapFP-Growth(FrequentPatternGrowth)
FP-growthlagiaithu~tkhaiImminghi~uquacacmftuph6bi~nsovaithu~t
toanApriori.
Quizrac:ChomQt~pI cacphfu1ill, t~pX={ihiz...idc I duqcgQilak-
itemset.
V~yI-itemsetla t~pmQtphfu1tic,2-itemsetla t~p2phAntic...
2.2.1.1.Thi~tk~vaxayd1}11gFP-tree
CacthuQctiOOsandayt~ochoFP-treelamQtc~utrucdftli~ur~tcodQngva
hi~uqua:
l-CacnutcuanochilUlltrftI-itemset.Nhuv~ykhixayd\Ingcityta
chicAnduy~t2 lfu1trencoosadftli~u.
2-T~tcacacgiaotaccoosadftli~uduqcs~px~ptheothu1\Ixacdinh.
Do dosed~dangquansatkhicoOOi€ugiaotacdiquamQts6nutchungvacoth~
gQPl~icacnutgi6ngOOauchivi~ctangbi~nd~mchochUng.
3-S~px~pcacgiaotactheothut\I giamcuatAns6xu~thi~ncua
chungOO~mcaiti~ntinhcodQngcuacayvatangs6nutduqctrQnhO'n.
Dinhngh'ia2.4: FP-treela mQtcaycacmJuph6biJn vaclinglamarQng
cuac~utruccayti€n t6valUlltrftthongtindinhluqngxacdinhchocacmftuph6
bi~n.
I-Cay comQtnutg6ccoOOanla "null",mQtt~pcaccayconti€n t6va
mQtbangHeader.
2-M6inuttrongcayconti€n t6chua3 thuQctiOO:item-name,count
vanode-link.Vai item-namelatencuaphAnill d~idi~nchonut,countchua
s6giaotacchuamftudotiOOtirnutdod~nnutg6cvanode-linkchid~nnut
k~ti~ptrongcaycophfu1ill doho?cchid~nnulln~unolanutla.Cacnode-
linkgiupchoS\Iduy~tcityd~dangtrongquatrinhkhaikhoangdftli~u.
3-S6dongtrongbangHeaderb~ngs6phAnticphanbi~trongcayFP
vam6idongclingchua3thuQctiOO:item-name,item-countvanode-link.Vai item-.
40
nameleitencuaphAntu, item-countla tAngsf>bi€n d€m cuatAtcacacnutchua
phAntu dovanode-linkchid€n nutdAutienchuaphAnill dotrongcay.
ThUGtloan2.2:Xay dlJngcayFP titmQtcO'sadftli~ucacgiaotaco
Nhtjp: (1)CO'sadftli~ugiaotacD; (2)Ngufmgmill_sup.
Xudt:CayFP v6'icO'sadftli~uD vath6amill_sup.
Phurmgphap:
(1)Duy~trenD IAndAuvathut~pmQt~pcacmfiuphAbi€n F cling
v6'idQh6trQ'cuachung.
S~px€p cacphAntutrongF theochi~ugiiimcuadQh6trQ'valUll
vaodanhsachL.
N€u conhi~uphAnill coclingdQh6trQ'thi s~ptheochi~utangten
cuachung.
(2)T(;lonutgf>cR vagallnhanla"null".
T(;lobangHeaderco IFIdongva d~tAtcacacnode-linkchid€n
null.
(3)For eachgiaotacTED do{//dQcD IAn2
ChichQncaephAnill phAbi€n cuaT duavaoP;
S~pP theothutlJcuaL.
CallInsert_Tree(P,R);
}
Procedureinsert_tree(P,R) f
I) I>~tP = [PIP-p],v6'ip laphAnill dAutiencuaP va P -P laphAn
conl(;licuadanhsach.
2) ifR comQtconN saDchoN.item-name=pthen
3) N.count=N.count+1;
4)else{
5) T(;lonutm6'iN;
41
6)
7)
8)
N.count=1;
N.item-name=p;
N.parent=R;
II T~onode-linkp chid€nphAntirdAtitiencoitem-namenay.
II H labangheader.
9) N.node-link=H(p).head;
10) H(p).head=N;
11)}
II Tangbi€n countcuapthem1trongbangheaderH.
12)H(p).count=H(p).count+1;
13)ifP -p"*then
14) Callinsert- tree(P- p,N);
}
Thidu2.2:Xayd\lngFP-treeill bangdftli~u2.2vaimill_sup=3.
Bang2.2:Bangdftli~ucacgiaotac
Buac1:duy~ttrencO's6dftli~utimdanhsachcacmduph6bi€n
L = {(3,5),(2,4),(1,3),(5,3)}, vai s6dAtilatenvas6thuhailadl)h6.
trqcuano.
Buac2:T~onutg6cchocayvaduy~tcO's6dftli~uIAnhaid~chen5
giaotacvaocaynhuhinh2.5(a)-(e).
Tid Giaotac Mducanchen
1 {1,2,3,4} {3,2,1}
2 {1,2,3} {3,2,1}
3 {2,3,5} {3,2,5}
4 {3,5,6} {3,5}
5 {1,2,3,5} {3,2,1,5}
42
Truactien,giaotac{1,2,3,4}duQ'chen(Hinh2.5a).PhAntu4 khongph6
bi~nvacacphAntu conI~iduQ'cs~ptheothirt\1'trongL:{3,2,I}. NhanhdAtitien
duQ'chenvaodiy va cacbi~nd~mcuacacphAntu 3,2,1duQ'ctangten1trong
bangHeader.
Giaotacthirhai {I,2,3}chiracacphAntu d€u ph6bi~nvaduQ'cs~pthanh
{3,2,1}.No cochungm<)tducmgdanvai giaotacdAtitiennenkhongconutmai
naoduQ'chenvaodiymachicocacbi~nd~mduQ'c~pnh~t.
Giaotacthirbala {2,3,5}duQ'cs~pthanh{3,2,5}.Vi cochungduemgdanla
{3,2}vaiduemgcli la {3,2,1}nenbi~n<,t~mm6inutdQctheonoduQ'ctangthemI
vacom<)tnutmai(5:1)duQ'ct~ovalienk~tnhum<)tnutconcua(2:3).
GiaotacthirtucophAntu6khongph6bi~n.PhAnconI~iduQ'cs~pla {3,5}
nencochungnut{3}vai caydanghi~nhanh,vi th~biend~mcuanotangthem1
com<)tnutmai(5:1)duQ'chenvaolamconcuanut(3:4).Vi cayhi~nt~ichiranut
cophAntu5nennode-linkduQ'ct~ogiftachung.
GiaotacthirnamduQ'cs~pla{3,2,1,5}clingt~othemm<)tnutmaichophAn
tu5.Node-linkcua5baygiachirabanut.UiachicuanutdAtitienclingvainode-
linkduQ'clUlltrongdongtuonglIngcuabangHeader.
BangHeader
Item-nameItem-count
~ t ~de-link
~ ------~(
a)Chenmati{3,2,1} b) Chenmati{3,2,1}
3 1 -
2 1 -
1 I -
5 0
3 2 -
2 2 -
1 2 -
5 0
,,,
, j', ,' ,' ,, ,, ,, ,'---
c) Chenmati{3,2,5}
43
d)Chenmati{3,5}
\
\
\
\
\
\
\
\
\ , ,
, ,,,,
\
\
\
I
I
,,
---------------
e)Chenmati{3,2,1,5}
Hinh2.5:Xaydl,lngcayFP
3 3
-
2 3 -
1 2 -
5 1 ',
3 5
-.
2 4 -'
1 3 -
5 3 \
3 4 -
2 3 -
1 2 -
5 2 "
44
2.2.1.2.Khai khmingcaemiu ph&bi~nbing FP-tree
PhuongphapFP-growthti€p c~ theohu6'ngchiava trj chovi~ckhai
khoangcacm~uph6bi€n.Phuongphaptimki€m mQtcachd~quyd6iv6'icacm~u
ph6bi€n ng~nhonvasaudotimcacm~udaihonb~ngcachghepmQtphfintuph6
bi€n vaosaumQtm~ung~hon.
FP-treekhongnhftngchophepbi€u di€n dit li~ucodQngmaconti~nIQ'icho
khaikhoangmQtm~uph6bi€n hi~uquabai cacthuQctinhsauday:
1-Tfttcacacm~uph6bi€n chuamQtm~uph6bi€n conajcoth€ duQ'c
timthftytrongcooSo'm~uco di~uki~ncuam~uconmano chuamQtt~pcacdu<mg
d~nti~nt6trongFP-tree.
2-Tfttcacacdu<mgd~nti~nt6cuaajcoth€ duQ'ctimthftyb~ngcach
duy~tquamQtnode-linkcuaphfintudfiutiencuaajtrongcayFP codi~uki~ncua
aj,v6'im6idu<mgd~nti~nt6ladu<mgdi ill mQtnutdQctheonode-linkd€n nutg6c
cuacay.MQtcaycodi~uki~nduQ'cxayd\IIlgb~g cachdungcooSo'codi~uki~n
cuaaj.
3-Tfttcacacm~uph6bi€n chuam~uh~ut6ajcoth€ duQ'ctimthfty
b~ngcachghepajv6'icacm~uconph6bi€n duQ'ckhaikhoangill cayFP codi~u
ki~ncuaai.
4- Bfttkym~uph6bi€n duQ'cxayd\IIlgdungajnhumQth~ut6ph6
bi€n thicocungdQh6trQ'v6'iaj.
Thu~toand€ khaikhoangtfttcacacm~uph6bi€n dungFP-treeduQ'ctrinh
baydu6'iday.
ThUGtloan2.3:DungFP-treed€ khaikhoangcacm~uph6bi€n b~g cach
tangtruangphanmanhm~u.
Nhijp:(1)CayFP d6iv6'icooSo'ditli~uD; (2)Nguangmill_sup;
Xudt:MQtt~pdfiyducacm~uph6bi€n F.
Phuangphap:Call FP-growth(Tree,null)
ProcedureFP-growth(Tree,a){
1)F=<D;
45
2)If Treechuam<)tdmmgddndanP then{
3) foreacht6hQp~cuacacnuttrongP do{
4) Phatsinhmdup=~u a;
5) sup(P)=mill_supcuacacnuttrong~;
6) F=Fu p;
7) }
8) }
9) else
10)for eachaibangheadercuacayb~tdAtill mduph6bi€n itnh~tdo
11)
12)
13)
14)
15)
16)
17)
18)
19)}
{ Phatsinhmdu~=aiu a;
sup(~)=sup(ai);
F=F u ~;
Xay d\ll1gcO's6codi€u ki~ncua~;
Xay d\ll1gFP-treecodi€u ki~nTreepcua~;
If Tree~*then
Call FP-growth(Tree,);
}
Thid\lsaudayapd\lngthu~tloanFP-growthchocay6hinh2.5e.
Thidu2.3:
CacphAnill ph6bi€n trongbangHeaderduQ'cxemxetb~tdAtill phAnill co
d<)h6trg th~pnh~tvi mufmdambaoduy~tcayill du6itennh~mtranhxetm<)t
phAntll nhi€u IAn.Tru6clienxetphAntir5,m<)tphAntirph6bi€n du6icungcua
cay.Noxayratrongbanhanhlrngv6icacduangddn(3,2,1:1),(3:1)va(3,2:1).£><)
h6trg cuam6iduangddnnayb~gnhaulrngv6inutcua5.Saukhit6ngcacIfA1..~
trgcuacacphAnill trongcabanhanhduQ'CtinhchungtaduQ'cbangtAnsu~tC(
ki~n(3:3,2:2,1:1).PhAntir I khongth6amin_sup=2,vi th€ chico3 va2m6i
dungd€ xayd\ll1gcapFP codi€u ki~nchophAnill 5nhuhinh2.6.Caynay(.
46
xaydlJnggi6ngnhucayFP nhungthayvi cacgiaotactachencacduangdftn.Vi
cayFP codi€u ki~ncuaphfinill 5chicomQtduangdftndO'11(3:3,2:2)nencacmftu
ph6bi~nconcoth~duQ'cphatsinhIa(3:3),(2:2),(3,2:2),v6'idQh6trQ'duQ'cgan
b~ngdQh6trQ't6i thi~ucuacacphfinill trongmQtmftucon.
BangHeader
!tem-\,e !t
i
m-count 0
\ Node-link jrrfl u --- ~
[TIlm.cb
Hinh 2.6:Xay dlJllgcayFP codi€u ki~ncuaphfinill 5tronghinh2.5e.
Tfitcacacmftuph6bi~nchiraphfintu5duQ'cpbatsinhb~gcachghepphfin
ill 5vitosaucacmftucon:{5:3},{3,5:3},{2,5:2},{3,2,5:2}.
D6i v6'iphfinill 1,ca samftucodi€u ki~nchichiramQtduangdftn(3,2:3)
lIngv6'icaycodi€u ki~n.Tfitcacact6hQ'Pcoth~cuacacphfintuchota
cacmftuph6bi~nsau:{I:3},{3,I:3},{2,I:3},{3,2,1:3}.
D6i v6'iphfinill 2, ca samftucodi€u ki~nvacayFP co di€u ki~nIa (3:4)va
.No chocacmftu{2:4},{3,2:4}.
Ca samftucodi€u ki~ncua3 la r6ngvi th~comQtmftuph6bi~n{3:5}.Quy
trinhkhaikhoangduQ'ct6ngk~ta bang2.3.
47
Bfm~: QuytrinhkhaiImmingcayFP tronghinh2.5e
Dinhnghza2.5(Cumfirthira):C~mfirp g(Jila thiraniu vachiniu coqlm
firPsubla c1:lmfir con cuaP maco hc;mgcao han h{;mgcuap.
Khongbaogiach<;mcacC\lmill thiramab~tbuQcphaitiano.d daychungta
phatsinhtAtcacacc\lmill saildotiacacc\lmtirthira.
Dinh nghfa2.6 (ThuoctinhAnti-Monotone):MQtmJup khongthoamiinmQt
thuQctinhthi thuQctinhnayg(Ji la Anti-Monotoneniu vachi niu khongcosieu
mJunaocuap thoano.
ThuQctinhAnti-MonotonerAthtIuichbaivi nocoth~duQ'cdAyt6idavito
quytrinhkhaikhoangcacm~uph6bi€n.
Dinh Iv2.1(ThuoctinhAnti-MonotonecuaDC):
Cho c\lmill P=(W},"',Wn),gQiDen(w},...,Wn)Ia IJ1~Us6cua(2.1)d6iv6i p va
PCiIadQdopccuabAtkyc\lmill PicodQdain-l ill thuduQ'cb~ngcachb6ill thui
trongc\lmtirp (i=l,n).
Deni(W},...,Wn)Iam~us6trongPCi.S~px€p cacill theothun.rgiambi€n d€m
toanC\lCcuachungtrongmQiC\lmill. N€u cacdi~uki~nsaildaydungv6i n
(i) pc(w}"",wn)<pciCw},...,wn)
(ii) Den(w},...,wn)>Deni(w},...,wn)
thi chungclingdungv6i bAtkym>n.
Phan CO'sacodieukin CayFP co Cacmu phobien
ill diu kin duQ'csinhra
5 {{3,2,1:1),{3:1),(3,2:I)} {5:3},{3,5:3},{2,5:2},
{3,2,5:2}
1 {{3,2:3)} {1:3},{3,1:3},{2,1:3},
{3,2,1:3}
2 ({3:4)} {2:4},{3,2:4}
3 cD R6ng {3:5}
48
Chimgminh
- Bu6'c1:D~tk Iii s6n t6i thi~usaochodi~uki~n(ii) dung.V6'ik thi
(i) duQ"csuyra ill (ii). Th~tv~ytheoheuristicApriori taco:
Nom(WI"",Wk):::;;Nomi(WI"",Wk)
V6'iNom Iii tit s6trong(2.1).Di~uniiydUngvi c\lmill bellv€ tnii Iii
sieuClJmtircuaClJmtirbellv€ phai.K€t hQ'Pb~tdfulgthucniiyv6'i(ii) taduQ"c(i).
- Bu6'c2: Giii sirhai di~uki~ncuadinh Iy dungv6'in, tacAnchUng
minhnodungv6'in+1.
Tir d~ngthuc(2.1)tavi€t I~i(ii) nhusau:
n~C( ) C() C() !C(Wl)*C(WZ)*...*C(wn)WI * WZ *...* W >n-Z
n 1 C(wJ
C(Wl)n-z *C(WZ) n-Z*...*C(Wn) n-z> C(Wl) n-I*C(Wz) n-I*...*C(Wn) n-I
C(WJ n-I
.
Saukhi rutg<;mvanhanhaiv€ choC(Wit-1taduQ'c:
C(Wit-1> C(Wi)*C(WZ)*...*C(Wn)
DaychinhIii mOtd~g khaccua(ii).
D~chUngminh(i) dungv6'in+1tacAnchUngminhdi~uki~n(ii) clingdung
v6'in+1.Tuc Ia:
(2.2)
Den(Wl,WZ...Wn,Wn+l)>Deni(wl,WZ...Wn,Wn+l)
Theo Heuristic Apriori taco:
(2.3)
Nom(Wl,WZ"",Wn,Wn+l):::;;Nomi(wl,WZ",Wn,Wn+l)
D~chungminh(2.3)tacAnchungminh:
C(Wit> C(Wl)*C(WZ)*...*C(wn)*C(Wn+l) (2.4)
Vi t~tcii cactir trongClJmtir duQ"cs~ptheochi~ugiiimbi€n d€m to~mClJC
cuachung,nentaco:
C(Wi);;:::C(Wn+l) i=I,...,n
Nhan(2.2)v6'i(2.5)v€ v6'iv€ taduQ"c(2.4).
V~ytheonguyenIyquyn~ptaduQ"cdi~uphiiichUngminh!!!
(2.5)
49
He qua2.1:N~ue\lmtu p(Wl,"',Wn)la thua,co dQdopc va 2 di~uki~nsau
th6aman:
0) Den(wJ.,"'~wnY> Deni(wJ.,...~wn)
(ii) C(WI)~C(wz)~ ~C(wn)
thibAtky sieue\lmtUnaoeuap dIngthua.
ChUngminh:
SlJ thuaduqenoitrongdi~uki~n(i) euadinhly 2.1.Vi theodinhly thi(i) la
dungv6im>n,tirelit:bAtkysieue\lmtUnaoeuap clinglathuan~uthir1\1'trencae
bi~nd~meuaphfinill duqebaatoim.
Caedi~uki~na caemdusf,vathir1\1'caed~iluqnglarAtqUailtrQng.Thid\l
di~uki~n(i) euadinhly ladungtrongkhi(ii) sainhungsaudok~tlu~neuadinhly
thikhongdambaadungm~edilnocoth~dung.Xemthid\ldu6iday.
Thidu2.4:
Gia sircaedi~uki~neuadinhly th6amanva a,b, e,d la caetUco caebi~n
d~mnhusau:
C(a)=80; C(b)=15; C(e)=5; C(d)=5
C(be)=5; C(abe)=4; C(bed)=4; C(abed)=3
TAteacaedi~uki~neuadinhly2.1th6aman:
Den(be)=16.5=75;Den(abe)= .JSO*15*5 =77.46
pe(be)=51Den(be)=0.066;pe(abe)=41Den(abe)=0.052
Tinh dQdopcehobedvaabed:
pe(bed)=41.J15*5*5 =0.207
pe(abed)= 3/VSO*15*5*5= 0.097
R5rangpe(bed)>pe(abed)dungnhu dinhly.
ChungtaxettruanghQ'PtAteacaedi~uki~neuadinhly duqeth6aman
ngo~itmdi~uki~n(ii)nhusau:
50
C(a)=20;C(b)=15;C(e)=10;C(d)=5;C(be)=10;
C(abe)=3;C(bed)=3;C(abed)=3;Den(be)=15*10=150;
Den(abe)= .J20*15*10 =54.77;pe(be)=10/Den(be)=0.066;
pe(abe)=3/Den(abe)=0.055;pe(bed)=3/.J15*10*5 =0.110;
pe(abed)=3/\120*15*10*5 =0.122
Hinh2.7:Tia c\lmtITthiraC\lCbe>
C?/mtit D{jaopc
ab 0.015
abe 0.080
abed 0.149
abede 0.239
abee 0.160
abd 0.070
abde 0.170
abe 0.078
ae 0.015
aed
0.091}aede 0.202ace 10
ad 0.015
ade 0.111 }-
ae 0.015
be 0.083
bed 0.231
bede 0.376
bee 0.258
bd 0.080
bde 0.283
be 0.100
cd 0.133
ede 0.365
ce 0.167
de 0.200
51
Trongtruanghqpnaypc(bcd)<pc(abcd),baivi di~uki~n(ii) kh6ngthoa.
DinhIy2.1chophepphatbi~uIu~tsaudaytrongquytrinhtiaC\lmtITthiraC\lCbt).
LUGt2.1 (ria cumtirthiracucbodungdodopc):
Tia mt)tC\lmtITthirap va tAtcacacStellC\lmtITcuano n~uco it nhAtmt)t
c\lmtITconPsubcuaP coDen(psub)<Den(p).
Chi c~nki~mtradi~uki~n(ii) vi di~uki~n(i) duqcsuyra tIT(ii). Thli t\1'
giamcuacacbi~nd~mtoanC\lCcuacactITriengIe cuam9i c\lmtITsed£;1tduqc
b~ngcachsuathu~toanFP-growthduqcdi€n giai dual day.Lu~t2.1chopheptia
nhi~uc\lmtITthiranhungkh6ngphaiIa tAt cabaivi lu~tnayti~pc~ntheohuangbi
quailvatiacacC\lmtITchikhinaono thoatAtcacacdi~uki~ncuadinhIy 2.1.
Nhungcoth~dinhIyv§:ndungth~mchidi~uki~1J(ii) kh6ngthoa.Quytrinhtiacac
c\lmtITthiraconl£;1itadungCP-treeseduqctrinhbaytrongcacph~sau.
Tahayxemthid\ltiaC\lmtITthiraC\lCbt)dungdt)dopcnhusau:
Thidu2.5:ChovanbanchliacactITa,b,c,d,evaicacbi~nd~mnhusau:
C(a)=66;C(b)= 10;C(c)=6;C(d)=5;C(e)=4;C(ab)= 10;C(bc)=5;
C(ac)=5;C(ad)=4;C(ae)=4;C(bd)=4;C(be)=4;C(cd)=4;C(ce)=4;C(de)=4;
C(abc)=5;C(abd)=4;C(bcd)=4;C(bce)=4;C(bde)=4;C(ced)=4;
C(abcd)=4;C(abce)=4;C(abde)=4;C(bcde)=4;C(abcde)=4.
Hinh2.7chothAycacC\lmtITduqcphatsinhvaimill_sup=3vacacgiatri
pctuanglingcuachung.CacdAtingo~chiracacC\lmtITduqctia.Truactienc\lm
ill abduqcsinhraKh6ngc~nki~mtracacc\lmtITconcuanovi dt)dait6ithi~ucho
phepcuanoia3.Ti~ptheoIaC\lmtITabcvaiDen(abc)=63.929.C\lmtITconcuano
IabccoDen(bc)=60.000.TheoIu~t2.1,c\lmtITabcvatAtcacacStellC\lmtITcua
noduqctia.Chuy r~ngcacStellc\lmtITabcd,aberle,abce,abd,abde,abeduqctia
makh6ngc~nsinhfa.NhomC\lmtITthli hatIa (acd,acde,ace)duqctiabai vi
Den(acd)=44.497vaDen(cd)=30.000.Nhomthlibaduqctiachicomt)tc\lmtITade,
vaiDen(ade)=36.332vaDen(de)=20.000.
52
2.2.2-PhU'O'ngphapFP-phrase
Giai thu~tFP-phraseduQ'cdungd~xayd\ffigC\lmfir.No lAygiatri nh~pvao
Ia danhsachcacgiaotacmam6igiaotacIamQtt1Jad€ ho~cmQtCalitrongnQidung
cuathongdi~p.CacC\lmfirduQ'ckhatkhoangxuyenquatAtcacacthongdi~ptrong
t~pdftli~uhuAnIuy~nmam6ithongdi~pdaxacdinhIO£;li.Nguangmill_suptrong
khikhatkhoangIanhanmill_suptruacdoseduQ'cdungd~khatkhoang6 buac
sau.
Quatrinhxayd\ffigFP-treechokhatkh,oangC\lmfir gi6ngnhuthu~toan2.2
nhungtruackhi chencacgiaotacv;lOcaythi tas~pchungtheochi€u tangcuatAn
suAt.Thayd6inayIacAnthi€tnhAmthoamapdi€u ki~hcuadinhIy2.1mab~tbuQc
cacphAntITtrongC\lmfirphaiduQ'cs~ptheochi€ugiamtAnsuAt.Vi cayduQ'cduy~t
firduaiIennenchungtacAncacphAntITcotAnsuAtIannhAtnAm6 duaicay.Dang
ti€c Ia di€u naylamchocayr~mr£;lphanvaduy~tch~mhannhungcokhanang
vi~ctiamfiud£;lthi~uquahan.
Quatrinhkhatkhoangmfiuph6bi€n clingdothQisirad6i trongtruanghqp
diyFP codi€u ki~nchuamQtduangdfindanvatAtcacacmfiuph6bi€n duQ'csinh
rabAngcachd€m tAtcacacmfiuconcoth~cuaduangdfin.Trongthu~ttoanFP-
growth(thu~ttoan2.3)thivi~cd€m naycoth~theothut1JbAtky trongkhi FP-
phrasethiUtitienduy~ttheochi€usaucuavi~cd€mmfiud~lamchovi~ctiasaut6i
dacoth~.Thi d\l,n€u duangdfinIa,thut1Jd€m mfiulIngvaiduy~tthu
t1Jcuacaytronghinh2.8Ia:{ab},{abc},{abcd},{abcde},{abce},{abd},{abde},
{abe},{ac},{acd},{acde},{ace},{ad},{ade},{ae},{bc},{bcd},{bcde},{bce},
{bd},{bde},{be},{cd},{cde},{ce},{de}.
53
Hinh2.8:Li~tkem~utheoUtitiepdQsaucuaduangdfut
Trongb6sungnaychungtaxemtruanghQ'PmQtduangd~ndannhutruang.
hQ'Pt6ngquatbfuIgcachxayd\lIlgcayFP codi~uki~ncuano.Thu~ttoanFP-
phrasedaydudungd~phatsinhC\lmtirkhongthiranhusau:
ThuGtloan2.4(FP-phrase):KhaikhoangtAtcaC\lmtirph6bi~nkhongthira
gi6ngnhuphuO'ngphapFP-growth.
Nhejp:(1)T~pcacgiaotacT, m6igiaotaclatl!ad~ho~cmQtCallnQidung
cuathongdi~p;
(2)Nguangph6bi~nt6ithi~uC\lmtirSmin;
(3)BQ dait6idaC\lmtir Imax.
Xudt:T~pdayduP cacC\lmtirph6bi~nkhongthirath6aImax.
Phlfangphap:
(I) Xay d\1'ngFP-trees~px~pm6igiaotactheothirt\1'tangtansuAt.
(2)P=~;
(3)CallFP-phrase(Tree,null)
54
ProcedureFP-phrase(Tree,a) {
1)foreachajtrongheaderctiacayb~td~utirph~ntuph6bi~nnhMdo
2) { Phcitsinhmfiu~=aju a;
3) sup(P)=sup(aj);
4) iflength(~);:::2andlength(~)~lmaxthen{
5) if mfiucon~subctiacoDen(~sub)<den(~)then
6) Tiavakh6ngphatri~nonua;
7) elseif ~kh6ngduQ'ctiabaiCP-treethen{
8) P=P u ~;
9) Xayd\ll1gcosamfiucodi~uki~nctia~;
10) XfLydlJngcayFP codi~uki~nTree~;
11) If Tree~"*then
12) callFP-growth(Tree~, );
13) }
14) }//endiflength
15) }//endfor
16)}
Tia cacc\lmtirconl~idungCP-treea dong(7)duQ'ctrinhbaytrongph~n
2.2.3.
f)~tranhki~mtravetc~ntdtcacacmfiuconcoth~ctiamQiC\lmtira dong
(5)tadungSlJt6iUtisau.V6imQiC\lmtirPn=WI,...,Wnthiconhi~uc\lmtirconctia
nokh6ngc~nki~mtra,dola tdtcacacc\lmtirconSPn-lctiaC\lmtircon
Pn-l=WI,.."Wn-ldflbi~tvath6adi~uki~n:
Den(SPn-l);:::Den(Pn-l) (2.6)
Th~tv~y,n~u(2.6)kh6ngdungd6iv6i it nhdtmQtSPn-lthiPn-lduQ'ctiacung
v6i tdtcasieuc\lmtir ctianovaPnkh6ngduQ'cxetnua.Tdt cacacc\lmtir conctia
Pn-lclingd6ngthai lacacc\lmtirconctiaPn, vi th~PnduQ'ctian~u:
55
Den(Pn)>rninSpn-l(Den(SPn-l» (2.7)
D6 ki6rntradi€u ki~nnaychungtachi cAnlUll giatri rninspn-l(Den(sPn-l»va
kh6ngcAntinhrnfius6cuabAtky rnfiuconnaocuaPn.N~u(2.7)kh6ngth6atacAn
tinhrnfius6 cuacacrnfiucon cuaPnnhungkh6ngla rnfiucon cuaPn-l'Hay xern
thu~ttoandu6iday.
Thudtolm2.5:Ki6rntratAtcacacrnfiuconcuaC\lrnfirhi~nhanh.
Nh(tp:C\lrnfirPnvaDen(Pn);(2)DenmincuatAt<;1ac crnfiuconcua
Pn-l,v6i Pn=w U Pu-l(w la rnQts6firnaodo).
Xudt:(1)Truen~uPnduQ'ctja,Falsen~ukhacdi; (2)C~pnh~tDenmin;
PhLiangphap:
1) iflength(Pn)=2then{
2) C~pnh~t:Denmin=Den(Pn);
3) returnFalse;
4) }
5) else{
6) if (Den(Pn»Denmin)then{
7) returnTrue;
8) else{
9) for eachc\lrnfirconsPncuaPnIDalength(sPn)~2andwe SPndo{
10) ifDen(Pn)>Den(sPn)then
11) returnTrue;
12) }
13) if Den(Pn)<Denminthen
14) C~pnh~tDenmin=Den(Pn);
15) returnFalse;
16) }
17) }
56
N~uthu~tmlnch~yd~ndong(13),mclaDen(pn)::s;;denminvaDen(Pn)::s;;
Den(sPn)v6i mQimfiuconSPn.DenminduQ'c~Pnh~t&dong(14)n~ubAtd~ngthuc
nh6hanth~tSlJ.DenmincdnduQ'c~Pnh~tn~uthu~tmlntraveTrue,vi trong
tru<mghQ'PnaymQtmfiukhongduQ'ctr6ngnil'a.
2.2.3.Tia c1}.mfir thiradungCP-Tree (CompressedPatternTree)
CP-treela mQtd~ngdangiancuaCR-treeduQ'ctrinhbaytrongchuangsail.
CP-treekhonglUll tril'thongtin ve lo~ivamQinutcuano t'rngv6i mQtmfiut6i da.
ChungtadungdQdopc d€ x~Ph~g C\lmill. MQtC\lmill co th€ duQ'ctiangaykhi
chennovaoCP-treeb~ngcachki€m ITacot6nt~iC\lmill concoh~gcaohanhay
khong?Sieuc\lmtircoh~ngthAphanhaykhong?N~ucothitiachung.CacC\lmtir
conl~iduQ'cxuAtrab~ngcachduy~ttrendu<mgdfutcuacaybiltdAtill tAtcacac
nutcuaCP-treecokernthongtinthuQctinhmfiuchod~nnutg6c.
2.2.4.Chenc1}.mfir
Trongphflnnaychungtamotacachsird\lngcacC\lmtirdaxaydlJngd€ thay
th~cacill riengIe.Nhieuc\lmill coth€ d\lngdQb&ichungcomQts6ill chung.B€
giaiquy~tdieunaytadinhnghiam9tthutlJt6ngtrenmQiC\lmill.
LulU 2.2(X~ph~ngc\lmill) Cho hai c\lmill Pi vaPj. Pi co h~g caohanPj
hayPi >Pj n~u:
(i) PC(Pi)>pc(Pj),ho~c
(ii) PC(Pi)=pc(Pj)vasup(Pi)>sup(Pj),ho~c
(iii) PC(Pi)=pc(Pj)vaSUP(Pi)=sup(Pj)valength(pi)>length(pj),ho~c
(iv) PC(Pi)=pc(Pj)va SUP(Pi)= sup(Pj)va length(pi)= length(pj)vaPi duQ'C
sinhratru6cPj.
TAtcacacc\lmill sinhraduQ'csilpgiamddntheoh~ng.V6i m6iCallmacac
c\lmill phiLnothiduQ'chQnlamt'rngvienthayth~,v6iphiLnayduQ'cdinhnghia
nhusail:
57
Dinh nghza2.7 (Phil C1,lmtir)Ciiu S all(Ycphil bai C1,lmtirp niu tatcdcactir
cilaC1,lmtirnlimtrongciiuS, tucfa: VWj EP, 3wj E S, v6'iWj,Wjfacactir.
Cac c\lmtiTphil duQ'cxetb~tdAutiTC\lmco h~g caonh~t.MQt Id C\lmtiT
thaychonhi~uId cuacacill thuQcc\lmdo.Quatrinhti~ptl,lcd~nkhi t~tcacact6
hQ'PtiTcothetrongdiuduQ'Cthayth~bO'imQtsf>C\lmtiTho~cd~nh~tcacC\lmtiT.
Detf>iuuthaigianth\lchi~ncuathu~ttoanchUngtaluuv~tsf>cactiTcuab~tky
c\lmill m\otrongm6idiu..MQthu~ttoandAyduduQ'cdungdethayth~C\lmill
duQ'ctrinhbaydu6iday(Thu~ttoan2.6).
Thuiitloan2.6:(Thayth~c\lmill) Thayth~cacnhomill trongmicaubO'i
m<)tC\lmtir.
Nh(1p:(1)T~pP t~tcacacC\lmill; (2)Co sO'ditli~ucacgiaotacT, v6im6i
giaotacla mQtcaunQidunghayWad~cuathongdi~p;m6igiaotacchuaId cua
thongdi~ptuang(mg.
Xuat:Co sO'ditli~ucacgiaotacD,v6im6igiaotacthehi~nmQthongdi~p
duQ'cthayth~bO'icacc\lmill.
Phllangphap:
1) P =Sort(P);//S~ptheochi~ugiamcuah~g cuacacc\lmill
2) For eachgiaotactj E T do {
3) Npt(tj)=sf>C\lmtirtronggiaotactjtru6ckhithayC\lmtir;
4) Timt~pconPCOy c P lat~tcacacc\lmill philtj;
5) do {
6) ifc\lmill k~ti~pPj E PCOy vfu1philtj then
7) Thayt~tcacacill cuaPjtrongtjbO'iPj;
8) C~pnh~tsf>c\lmtiTtrong4:npt(tj)=npt(tj)-1(Pj);
9) }
10) }until(endofPcovornpt(tj)<2);
11) }
.....
58
12)Gomnhomt~tcacacgiaotactrongT theeId thongdi~pvaomQtgiao
tilemai.
13)Chengiaotacm6'inayvaoD;
2.3. ChQD d~ctrU'Dg
V~ndequailtr<;mgtrongphantichduli~uvanbanlakichthu6'ckhonggian
d~ctrungIan.B6ivi m6iill phfulbi~tduQ'cthemvaolamtangthemmOtkichthu6'c
chokhonggian.Do dokichthu6'ckhonggiand~ctrungcoth~dC;ltd~nhangchlJc
th~mchihangtramngan.Thl!CrachicomOts6itcacill lamd~c!rungd~phanIOC;li
taili~utrongkhir~tnhi€ucactirOOi~u,khongranghialamtangthaigiantiOOtmln
dangk~vak~tquakerntinc~y.Vi v~ychungtaphaiti~nhanhchQnd~ctrung.
Dinhnghia2.8:ChQnd~c!runglaquytriOOxoacacill khongranghiatrong
taili~unh~mcaiti~ndOchinhxacvagiamthaigiantinhtoanphuctC;lP.
Conhi€uphuangphapchQnd~ctrungduQ'clrngdlJngtrongbu6'cti€n xirIf
dftli~umachtiy~udl!avao:nguO'ngtAnsu~taili~u,l -th6ngke,thongtintuang
h6vacuO'ngdOxuAthi~nill. SaildaylacacdOdodungd~chQnd~ctrungtrenthl!c
t~.
Thon!!tintutm!!h6trun!!binh
DOdothongtin tuangh6trungbinhduQ'Csird\lngrOngraid~phan IOC;li
thongdi~p.CongthuctiOOthongtintuangh6trungbiOOgiuamOtt~ptAtcacac
IOC;liC va mOtill t la:
P(e,ft)
MI(C,t)=L L P(e,ft)logzPee)*pert)ceC f,e{O,I} (2.8)
Trongdo4=0n~ukhongcot trongthongdi~pva4= 1n~unguQ'clC;li.
MI(C,t) caokhi bi~tSl!ki~nmaill t com~thayvfu1gm~trongthongdi~pva
nogiupdl!deanIOC;lic chothongdi~p.
59
D(lt{cih=l,mla kY hi~ucuame)t~pcaclo~i.Phuangtrinh(2.8)coth€ vi€t l~i
nhusau:
m m m
MI(C,t)=-2:P(cj)log2P(Cj)+P(t)2:P(cilt)log2P(cdt)+P(t)2:P(cilt)log2P(cilt) (2.9)
~l ~l ~l
MI duQ'ctinhv6i m6itit1.CactitduQ'chQnco th€ ho(lcla tAtcacactitthoa
manMI tbi thi€u (donguaidungquydinh)ho(lcnguQ'cl~ino co th€ la k titmaco
giatri de)docaDnhAt,v6i k duQ'cxacdinhtheoth\lcnghi~m.
Dotincar
De)tinc~ycoth€ dungd€ chQnd(lctrungchobe)phanlo~idancApvaphan
lo~ico thu b~c.
TrongtruanghQ'Pphanlo~idancAp,de)tinc~ycuaill tv6i lo~iCichinhlade)
tinc~ycualu~t =>CivaduQ'Ctinhnhusau:
~( )
count(tc;)
conl~t~ c. =
I count(t)
(2.10)
Dbi v6i truanghQ'Pphanlo~icothub~c,de)tin c~ycheoduQ'cdinhnghla
nhusau:
contE(t ~ Cj) = P(tCj)
P(tcJ+ LB(cj,cJP(tc)
(2.11)
Cj*Cj
V6iB(Cj,Ci)lakhoangcachgiualo~iCjvaCjtrongcayphancApchud~.Theo
congthucde)tinc~ynaythigiatricangnhon€u t ph6bi€n trongcaclo~iCjmacac
lo~iCjnayxav6i lo~iduQ'cd\ldocinCj.No d~tgiatri IannhAtla 1n€u t duQ'cb~t
g(lpchitrongcacthongdi~pcolo~iCj.R5 rangde)donaychobi€t tinhchinhxac
cuabe)phanlo~i.
60
DodoJ
£)9 do J cling co the dUng de chpn dilc trung cho ca tnrang hpp phiin lo{li (/07
cip vaphiin 10ft;co thiJ bPc.
DQ do J dc3ivai phan loej lei:
P(e.lt) P(-.c.lt)
J(t~cJ=P(t)*[P(eilt)log2 1 +P(-.c;lt)log2 ']
P(cJ P(-,cJ
(2.12)
DQdoJ la itchcuahaid~iluQ11g.Tru6etieD,pet)daTIgUmla dQdoeualu~t
va d~iluQ11gthuhai la trongngo~evuonglamn6i b~teadQehiOOxaevatiOOdaTI
gianeuabQphanlo~i.
Chnd~etrungb~ngeachdungdQtin e~yho~edQdo J va thgehi~nrieng
d6iv6i m6i Io~ibAngeachxaedinhgia tri t6i thieueiiad9 doho~eehnehom6i
Io~imQts6d~etrungmacoh~ngcaDooAtd6iv6i lo~inay.
Chil Y r~g dQtine~yvadQdoJ d6ucothedu,edungehoehnd~etrungva
chnlu~tva;ho~cx~ph~ng.Caelu~tk~thqpco thedu,exemOOulo~id~c!rung
maco dQdai Ian ban 1va m6iill riengbi~tla mQtt~pti6ndi6uki~ncuamQtlu~t
tuang(mg.
Caegiatr!P(c),pet)vaP(e,t)du,ct£OOtheocaccachkhacOOau.Thi d\l d6i
v6i mohinhdabi~nBernoullicuaBayesthiP(c) b~g s6tai li~uco lo~ic chiacho
t6ngs6taiIi~u;pet)b~ngs6taili~uchuaill t chiachot6ngs6taili~u;P(c,t)b~ng
86taili~ulo~ic vaehuaill t chiachot6ngs6taili~utrongt~pluy~n.D6iv6imo
hiOOdathuccuaBayesthiP(c)b~gs6tircungxuAthi~ntrongcactaili~ucolo~ie
chiachot6ngs6tircungxuAthi~n;pet)b~gs6IAnxuAthi~ncuatirt chiachot6ng
86tirxuAthi~nvaP(e,t)b~ngs6IAnxuAthi~neuaill t trongcaetaili~ulo~ie chia
chot6ngs6ill xuAthi~n.