Luận văn Phân loại thông điệp trên diễn đàn

PHÂN LOẠI THÔNG ĐIỆP TRÊN DIỄN ĐÀN LÊ THANH TÂM Trang nhan đề Mục lục Chương 1: Tổng quan. Chương 2: Tiền xử lý dữ liệu. Chương 3: Phân loại thông điệp. Chương 4: Chương trình phân loại. Chương 5: Kết luận. Tài liệu tham khảo Phụ lục

pdf33 trang | Chia sẻ: maiphuongtl | Lượt xem: 1899 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Luận văn Phân loại thông điệp trên diễn đàn, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
28 Chuang2- TIEN XU LY DU LI~U Theohinh 1.3thi giai do~nti~nxu 1)-'dii'li~uduQ'cthvchi~ntheobabu6c chinhnhusau: Phantichva bi~udi~n thongdi~p Xay d1Jllg C\lllltir ChQnd~ctrung 2.1.Phantichvabi~udi~nthongdi~p PhAnnaybaag6m:phantachtutrongthongdi~pvabi€u di~nthongdi~p. MQtv~nd~khokhandAtitientrongxu1)-'tvdQngti~ngVi~tlavi~cdinhnghlatIT trongti~ngVi~tv~nconnhi~utr,!nhlu~n.f)~thu~nti~nchovi~ctrin}Ibayv~sau chungtoitheoquaildi€mcuaDinhDi~n[6]sau:mQtdiuti~ngVi~tbaag6mnhi~u tIT,m6itl1baag6mmQthaynhi~u'ti~ng',m6i'ti~ng'lam6ichu6ikYtv li~nnhau phanbi~tv6icacti~ngkhacb~ngmQthaynhi~likhoangtr~ng.Thid\l:tIT'hQc'la mQtITg6mmQti~ng,tu 'hQcsinh'lamQtug6mhaiti~ng.C\lmtu'khoahQct1J nhien'g6m2tuhay4ti~ng. Trongcacti~ngChilliau,nguaitacoth€ dUllgianI~yxacdinhcactITnha vaocackhoangtr~ngphancachtuvachQncactud~ctf11ngchonQidungvanban (dvavaotAnsu~txu~thi~ncuatu)lamd~ctrung.D6iv6iti~ngVi~tchungta khongth~lamtuungtvb6in~uchungtaxacdinhtuchidvacackhoangtr~ngphan cachthichungtacoth€ chinh~nduQ'cac'ti~ng'vonghlavadododQchinhxac cuah~th6ngser~th~p.Theocacnhangoi1ngii'hQcthiti~ngVi~tcod~n80%la cactIT2 'ti~ng'[6].Nhuv~ymu6nd~tchinhxaccaDthichungtaphaiti~nhanh tachtu. 2.1.1.Tachtir Bu6cnaytachmQthongdi~pthanhcaccalivam6icalithanhcacdunvi tIT. 29 Phuangphav Buac 1.Xay d\l11gotomatamti~tdminnh~ tfitcacacamti~tti~ngVi~t. Buac 2.Xay dlJngotomatITVlJl1gdoannh~ntfitcacactITvlJngti~ngVi~t. Buac3.DlJatrencacotomatneutren,xaydlJngdbthituO'ngungvaiCall cftnphantichvaS11d\mgthu~ttoantimki~mtrendbthid€ li~tkecaccachphan tichcothe. Bangchucaicuaotomatamti~tlabangchucaiti~ngVi~t,m6iclingchuy€n duQ"cghitrendomQtky tlJ. Thi d\l,vai baamti~tphuang,philp,trinhtaseco otomatdoannh~namti~tnhuHinh2.1. f:~~ q, ~ ~~ Hinh2.1.XaydlJngotomatamti~t Thu~toanxayd\l11gotomatamti~t Nhfjp:Tir di€n amti~t Xudt:Otomatamti~t. Thufjtloan: 1.L~ptn;mgthaikh&idftu%; 2.Vong l~pdQcchotai khi h~t~pdu li~u,lfiyratUngamti~t.GQicackY tlJ cuaamti~tdatac(pcl'°o.,Cn-l. a. p:=%;i:=O; b.Vongl~ptrongkhi(i ~n-l ) 30 i. LAy ra kY tl,rCj; ii. Timtrongcacclingchuy~ntirtr~ngthaiP clingtrendoghiky t1JCj. Neucocling(p,q) nhuthe: i :==i +1; p:==q; iii. Neukhongcocling(p,q) naonhuthethithoatkhoivong Hipb. c.V6i j ill i denn-l i. T~om6itr~ngthaiq, ghinh~nq Iatr~ngthaikhongket; ii. Themclingchuy~n(p,q)trendoghikytl,rCj; iii. p:==q; d.Ghinh~nq Iatr~gthaiket; Otomatill VlJngduQ'cxaydlJtlgtuangtl,r,v6idi~mkhacnhusail:thayvi ghi trenm6iclingchuy~nm(>tamtiet,taghis6bi~ucuatr~ngthai(ket)cuaotomatam tiett~idodoannh~m6iamtietcuaill nh~mgiftmkichthu6ccuaotomatill VlJtlg. Thi d\l,v6i haitirphl£O'ngphapvaphl£O'ngtrinh,giftsirkhi duaIAnIuqtcacamtiet phl£ang,phap,trinhquaotomatamtiet,tadenduQ'cactr~ngthaiketghicacs6nj, n],n3thitrencacclingchuy~ntuangimgtaghicacs6nj, n2,n3(Hinh2.2). ~~ "\ D Hinh2.2:Xfiy dl,Tngotomatirvl,Tng Thu~tOMxaydlJtlgotomatill VlJtlg: 31 Nhgp:Tir dientiTVlJIlg,otomatamtiet Xudt:OtomattirV\l'ng. Thugtloan: I. L~ptr:;mgthaikh6'idftu%; 2.Yang l~pdQcchot6'ikhihett~pdftli~u,Idyratirngm\lCtiTword.OQicac amtietcuawordhi SO'SP""Sn-l' , 3. Su d\lngotomatamtietdedoannh~ncacamtiettren,duQ'cacs6hi~u cuatr:;mgthai(ket)wangtmgla mo,mp...,mn-l a p:=q 'i'- O ' . 0' .- , b. Yang l~ptrongkhi (i:::;n-I) i. Ldyras6mj; ii. TimtrongcaccungchuyentiTtr~ngthaip cungtrendoghis6 m;.Neucocung(p,q)nhuthe I i :=i +l'. ' 2.P:=q; iii. N~ukhongcocung(p,q)naonhuthethithoatkh6ivang l~pb. c.V6'i j tiTi denn-l i. T~om6'itr~gthaiq, ghinh~nq latr~gthaikhongk~t; ii. Themcungchuyen(p,q)trendoghis6mj; iii. p:=q; d.Ohinh~nq latr~gthaik~t Saukhidaxayd1!ngxonghaiotomat,aghichungvaohait~pdjnhkieude dungtrongbu6'cphantachtirVlJIlg.N~um6ikYW(char)duQ'cghivaot~pv6'ikich 32 thuO'c2 byte(maUnicode),m6is6nguyen(int)cokichthuO'c4 bytethi t~pluu otomatamti<§tcokichthuO'c270KB,t~potomatill V\IDgcokichthuO'c1MB. Tu tuangcuathu~ttoanphfLntachtirV\IDglaquyvi~cphantachcauv€ vi~c timduangditrenmQtde,thicohu6ng,khongcotrQngs6. GiaSlrcaubandfiulamQtdayge,mn+1 amti€t So,Sl, ...,Sn.Ta xayd\!ng mQtde,thicon+2dinhVo,VI, ...,VmVn+l,s~pthllWtrenmQtduangth~ngill trai sangphai;trongdo,ill dinhVid€n dinhVjco cung(i <j) n€u cacamti€t Si,Si+],..., Sj-ltheothut\! l~pthanhmQttir.Khi do m6icachphantachcaukhacnhautuong UngvO'imQtduangdi trende,thi ill dinhdfiuVod€n di.Qhcu6iVn+l.Trongth\!ct€, cachphfLntichcaudungd~nnhAtthuangUngvO'iduangdi quait cungnhAttrende, thi. TrongtruanghQ'pcauco S\!nh~pnhfuIgthi de,thi seco nhi€u bonmQt duangding~nhAtirdinhdfiud€n dinhcu6i,tali~tketoanbQcacduangding~n nhAtrende,thi,tirdoduaratAtcacacphuongantachcaucoth~vad~nguoidung quy€tdinhsechQnphuonganmio,tuythuQcvaongii'nghiaho~cvancanh.Thi dl,l, xetmQtcaucoc\lm"thu(Jcdiabim",tacode,thinhusau(Hinh2.3). thu9Cdill I~--' djllh~ Hinh2.3.MQttinhhu6ngnh~pnhfuIg C\lmnayco S\!nh~pnh~nggiii'athu(Jcdiavadia himvatasecohaik€t qua phantachla "thu(Jcdia / bim"va "thu(Jc/ dia bim".Ta co th~chi rarAtnhi€u nhii'ng c\lmnh~pnhfuIgtrongti€ng Vi~t,ch~gh~n"t6hQ'pamti€t", "bfuIgchUngcO"',... TruanghQ'ptrongcauco amti€t khongn~mtrongtir di~nthi ro rangotomat amti€t khongdoannh~nduQ'camti€t nay.K€t qualade,thitaxayd\Illgtircaudo lakhonglienthong.D\!avaotinhchAtnay,tathAyrfuIgn€u de,thikhonglienthong 33 thid~dangphathi~nrar&ngdaTIvi amtietkhongdoannh~nduQ'ckhongn&mtrong tirdi~namtiet,tircnoduQ'cvietsajchinhtaho~clamC)tdaTIvi amtiet(tirv\ffig) moi DanhgiGkitqua V6i cachtierc~nnhutren,battoanphantachtirvlJngtrongCalitiengVi~t vScO'band5duQ'cgiaiquyet,d~cbi~tlavAndStachcact6hQ'ptirtuangduO'ngv6i mQtdaTIvi tirVlJng,thuanglacacClJmtirc6dinh,ngii'c6dinhho~cacthanhngii' trongtiengVi~t.v6i nhiiTIgCalinh~pvaocoSlJnh~pnh&ngtirV\ffig,tircconhiSu banmC)tcachphantachthitaphaichQnramC)tphuangan.TrongtAtcacacphuang anphantachdobaogiaclingt6nt~iphuangandung. DdgiaiquyitvdndJ naytacothddung: -Cacquyt~cfigii'phapdochuyengiangonngfrxayd\ffig.Tienhanhphan tichcuphapcuaCaliv6inhfrngphuangantachtirV\ffigcoth~,tirdolo~iranhiiTIg phuangansatcupharo -Dungphuangphapxac suAt- th6ngke.Phaith6ngketrongkhovanban tuO'ngd6i16ncuatiengVi~td~timraxacsuAtcuacacbC)dothaybC)batir lo~iho~c tirvlJngdi c~nhnhau.Tir doIlJa chQnphuO'nganphantachcoxacsuAtsatit nhAt. 0 daytasirdlJngcachthirhal,tircIa dungxacsuAtth6ngke d~chQnphuO'ng ant6tnhAt.ClJ th~la taduaramC)tdC)donhutrongdinhnghla2.1. Dinh nghza2.1 (D(jkit dinhc¥mfir):Cho ClJmtir dain tir (wJ"",wn).DC)do pccuanoduQ'cdinhnghlanhusau: C(Wp..., Wn) pc(w"...,wn)=n4C(w,)*",*C(wn) (2.1) V6i C(wJ"",wn)la s6IfinmatAtcacactirtrongclJmtitcungxuAthi~nv6i nhau.C(Wj),i=l,nIa s6IfinxuAthi~ncuatirWitrongmQiClJmtir.D~thAyr&ngdC) dopccaokhicactirthanhphfinxayrachungv6inhaunhiSubanvachungxayra riengIeithall. 34 Nhuv~ytatinhdQk~tdinhpcchotfttcacacphuanganvachQnphuangan co gia tr! 16TInhftt. 2.1.2.Trich dii'li~u Dinhnghla2.2:StoplistIa danhsachcactirxuftthi~nquanhi~utrongkho ngftli~uvakhongconQidungrorangtrongthongdi~pnhucacgi6'ill, d~itirva lientir... Thi d\!cactirthuQcstoplistla:"cai","va","v~",...Lu~tZipfduQ'cdungd~ nh~nbi~ttAnsufttxuftthi~ncuacactir quanhi~uhayquait. Xoa cactir thuQc stoplistla mQttrongnhftngcachkJ:rucactirV\ffigphi>bi~nnhftt.Vi~ckhircactir hi~mxuftthi~nduQ'cth\lchi~na bu6'cphatsinhlu~tb&ngcachxacdinhnguang min_sup. T~pcactirtrongstoplistthikhongc6dinhmaduQ'cxacdinhph\!thuQcvao m6ilInhV\lc.Thi V\ltir "maytinh"khongthuQcstoplisttrongcactaili~uphiky thu~tnhungnosethuQcstoplisttrongcactiii li~uv~maytinh.Ta secocongC\! clingcftpchonguaidungd~hQcoth~themhoi;icxoamQtirtrongstoplist. Bu6'cnaytaxoacactir tAmthuang(stoplist)trongcan,chi giftl~icactir chuathongtinhUnich.Taxemthu~ttoan2.1. ThuG!oim2.1 Nh~p:(1)Scactirtrongthongdi~p;(2)DanhsachcacStoplist Xuftt:Cactirhqpl~T trongcacCali. Phuangphap: 1)T= 2)whilechuah~tS { 3)Trichtirt k~ti~ptrongS ; 4) 5) } ift~ L andt~T then T=Tu t; 35 2.1.3.Bi~udi~ndii'li~u D~apdl.mgbQrheinlo~ik~thqpchothongdi~ptacAnbi~udi~nthongdi~p m thanhgiaotacT={l11c,mJ,{V}}.V6i mela tl,rad~cuathongdi~p,m!la chi~udai cuanovaV={tt,...,td lavectO'k fir trongnQidungcuam. Vi thu~toanchi thaotactrencacgiaotac86nenmiphAntu duqcanhx~ t6imQts6tuO'ng(mg.Ta daubriengmQtmi~ns6chomithuQctinhvagallm6igia trim6icuathuQctinhv6is6chuadungk~ti~ptrongmi~n ay.V6im6imi~ns6,s6 chuadungdAutienla 1vatangthem1v6im6igiatri m6iduqcthemvao.Ky 86 dAutiencuamigiatri la Id cuathuQctinh.Thi d\ln~uphAnnQidungcuathong di~pcoId la 1vamQtfirtrongdocos6123thinoduQ'cluutrongcO'So'dft1i~ula 1123.Tilt cacacchud~clingduqcanhx~ungv6i cacs6nguyennhungchung khongdoih6iphaicoId vi th~mQtchud~phaithuQcv~phaicuamQtlu~tnaodo. Chungtagi6ithi~umQtHeuristichiamQthongdi~pthanhnhi~uphAnma m6iphAncodQdaikhongvuqtquamax_fen(thongdi~pcochi~udait6ida)vakern theocacId thuQctinhcuathongdi~p.Di~unaynh~mcaiti~ndQchinhxacvatinh thihanhcuabQrheinlo~i.Saudaylacaclqi ichkhiapd\lngHeuristichianay: -DQdaigiaotacng~nhan:lamgiamdangk~thaigianxayd\l1lglu~tkhi rheinlo~i. -C\lCbQhoalu~t:lamC\lCbQhoatrenmilu~t,firdomilu~tchichuas6 phAntu it hO'nmax_leu.Tinh C\lCbQt~ochocacfir trongcaccaugi6ngnhauho~c gAllgi6ngnhaucotinhtuO'ngquailnhi~uhall. -Tang trngs6 cho thuQctinh quail trng:lam tangtrngs6 cho cacthuQc tinhtlJad~macacnothuangquailtr<;mgtrongthongdi~p. Heuristicduqcungd\lngchot~pcacthongdi~phuilnluy~nh~mt~ocho mohinhrheinlo~ihi~uquahan.Khongchiacacthongdi~ptrongt~pdft1i~uthu nghi~mvacacthongdi~pm6i. 36 Thidu2.1:Xemthongdi~ptronghinh2.4. Tl7aG6C6phainhUngngu6'ithiehnh~eTCS1anhUngngu6'ithiehcoGO'll? Daquy Toi 1amQtfaneuaTCS,mQtfanth(jtsY'tirkhitoiconr~tnh().Baihatnao euaT dinglamtoilientuUngG~nmQtthigi6'ieoa6evatrimtu(Jng.Bljtn than toi,r~tyeuthiGhnhUngaemV(ingth~nayng6ingheKL hatnhgeT,m6t minh thai,vaeamth6ythiehm6tminhnhuv~y.N~ubljtlleiing1angu6'ithiGh nhge T, bljtlle6th~yn1inhthiehliingnghenhgqTmQtminhnhutoikhong? Hengap1ai!TuanHung Hinh2.4:MQtthongdi~ptrendi€n Gall. Bangli: Cacill trichtirthongdi~ptrenhinh2.4 ThuQetinh CaefirtrongthongGi~p Cacgiatri so Ungv6i caefir {11,12,13,14,15,16, 17,18,19,110,111, 112,113,114,115,116, 117,118,119, 120,121} {21,22,23,24} 13~ NQidung {fan,tCS,tp.~tsl7,nh6,bail1at,1c;un,li~Jf tu6TIg,th~gi6i,coGQc,truutUQ'Ilg, yellthich,Gemv~ng,ng6i,nghe,KL, hat,nhljte,mOtminh,camth~y,thieh, 1~ngnghe} {thich,nhljtc,TCS, coGO'll} {daquy} Tl7aGe Ngu6'igui CQtthu2 cuabang2.1lamQtt~pcacfir duQ'ctrichfir thongdi~p(daxoa stoplist).Chungtagiasur~ngdaylathongdi~pddutientrongcO'saduli~uvas6 ddutienchuadungla 1.Do dod~tId(NQidung)= 1,Id(T1,lad~)=2 vaId(Nguai gui)=3. R5rangtrongnQidung:"fan"duQ'canhx~d~n11,"TCS" la 12,... Trongt1,lad~:"thich"duQ'canhx~d~n21,"nhac"la22,... Trongnguaigui:"Daquy"duQ'canhx~d~n31. CQtthu3chuacacgiatrjs6cuacactir. 37 Di;itmax_len=3thiphfinn<)idungduQ'chiathanh2 phfin: Phfin 1 chua 10tITdfiutien(lrngv6i 2 diu dfiu),phfin2 chua 11ill conl~i (ungv6i 2 diu conl~i).Id cuatlJad6vangmJigiri luonco trongm6iphfinnay.K~t quacohaigiaotaccod<)daituanglrngla 15va 16.Di;itId chud6cuathongdi~pla 3thitacohaigiaotacnhusau: 1 2 {3,15,11,12,13,14,15,16)7,18,19,110,21,22,23,24,31} {3,16,111,112,113,114,115,116,1~7,118,119,120,121,21,22,23,24,31} 2.2.Xayd1}'ngc\lmfirvacbenc\lmfir Dinhnghza2.3:M(JtcZ,JmfirtaSlfkit h9'Pcoy nghzavathuangxuyencua haihaynhiJufirmamJi chungkh6ngthu(Jcstoplist. Tadffbi~tr~ngcorfitnhi6utITxufithi~nph6Qi~ntrongvanbannhungchua rfitit thongtin hftuich nhungd6ikh.ik~thQ'Pchungl~it~ora m<)tCl,lmtITthi chua thongtin quailtrQngva khongconph6bi~nnua.Thi dl,ltrongm<)tdi€n dan,co nhi6uthongdi~pchuacabatIT"tblJc","hi~n"va"duQ'C"lanhungtITthu<)cstoplist vaconhi6ungunghlakhacnhaumyvaongucanh.Tuynhienn~uxemxetCl,lmtIT "thlJchi~nduQ'c"thinokhongthu<)cstoplistnua. Trongphfinnaytanghienc1JlJ11;1<)tthu~tgiaim6i do la FP-phrasemano sir dl,lngthu~ttoankhaikhoangcacm~uph6bi~ngi5ngnhuFP-growthd€ xaydlJng cl,lmtIT.Chungtaclingsirdl,lngd<)dopcd€ x~ph~ngvatiacl,lmtITtrongquatrinh chungsinhfa. TfitcacacphuangphapxaydlJngcl,lmtIThi~nnayduQ'chiathanhcacnhom sau: -PhuangphaptlJ d<)ngvathucong:Phuangphapthucongkhachinhxac nhungt5nnhi6uthaigianvayeticfiuhi€u bi~tnhi6uv6cacchud6.PhuangphaptlJ d<)ngduQ'chiathanh ailo~i:dlJatrencackythu~tngonnguhQcvadlJatrenth5ng keduli~u. 38 -PhUO'Ilgphapd\1'avitongftcanh:NgftcanhcuamQtc\lmill coth6Iacungtai li~u,cungcau... -PhUO'IlgphapC\lmill cothut\1'vakhongthut\1':N~uchQnIa cothut\1'thi d6iv6ihaiC\lmtircocacill gi6ngnhaunhungthut1!khacnhauthixemnhukhac nhau. -PhUO'Ilgphapd\1'avitoill Io~id6xacdinhC\lmill: Thi d\ltheosailcacdanh ill IamQtchu6icacdanhill haytinhill thiduQ'cxemIa c\lmill. -DungdQdai t6i da cuaC\lmtir tim duQ'c:Nhi~uphuO'Ilgphapchi xetcac c\lmtircohaitir,mQts6phuO'Ilgphapkhacthixetcacc\lmill codQdaibAtky. CacphuO'Ilgphapxir Iy ngonngftt1!nhienthi sird\lllgcacd~ctrungill V\lllg vaCllphap.d mucdQtirV\lllgthi gallmQtill Io~i(danhtir, dQngtir,tinhtir...)cho m6itir.d muccuphapthixacdinhm6iill d\1'avitochucnangcuachungtrongCall (chutir,tuctir, dQngtir)va dungcacthongtin nayd6gomnhomcacill trongCall thanhcacdonvi cocdutruc. CacphuO'Ilgphapth6ngke sird\lngcacdQdonhu:tAnsudtxudthi~n,thong tintuangh6,dQh6ndQn,dQk~tdinh...d6u6c IuQ'llgtinhhQ'PIy cuamQtS\1'k~thQ'P giuacactirxemnhumQtC\lmtir.MQtphuO'IlgphapdO'IlgiannhdtIaxemmQtC\lm tirnhumQtbQcacill khongthuQcStoplistvadi k~nhauit nhdttrong25taili~u. MQtphuongphapkhacdinhnghiadQk~tdinhchomQtc\lmtirvachicocacC\lmill cogiatr!naythoatrennguangm6iduQ'chApnh~.Nhi~uphuO'Ilgphapti~pc~ theocachtbnghQ'P,thid\lnhudungcackythu~tngonngftd6phatsinhc\lmtirva mQtdQdothongtintUO'Ilgh6duQ'cth6ngked6Io~iboS\1'trungI~p.PhuO'Ilgphap khacI~iirngd\lngthu~ttoanApriorid6timcacc\lmill cothut1!vakhongthut1!. M~cducacphuO'Ilgphapth6ngkekhongd\1'avitocacd~ctinhcuphaphay ngunghianhungchungkhamphacacC\lmtirv6i chiphirehO'IlrAtnhi~usov6i dungcackythu~tngonngft. 39 2.2.1.Ph1f(mgphapFP-Growth(FrequentPatternGrowth) FP-growthlagiaithu~tkhaiImminghi~uquacacmftuph6bi~nsovaithu~t toanApriori. Quizrac:ChomQt~pI cacphfu1ill, t~pX={ihiz...idc I duqcgQilak- itemset. V~yI-itemsetla t~pmQtphfu1tic,2-itemsetla t~p2phAntic... 2.2.1.1.Thi~tk~vaxayd1}11gFP-tree CacthuQctiOOsandayt~ochoFP-treelamQtc~utrucdftli~ur~tcodQngva hi~uqua: l-CacnutcuanochilUlltrftI-itemset.Nhuv~ykhixayd\Ingcityta chicAnduy~t2 lfu1trencoosadftli~u. 2-T~tcacacgiaotaccoosadftli~uduqcs~px~ptheothu1\Ixacdinh. Do dosed~dangquansatkhicoOOi€ugiaotacdiquamQts6nutchungvacoth~ gQPl~icacnutgi6ngOOauchivi~ctangbi~nd~mchochUng. 3-S~px~pcacgiaotactheothut\I giamcuatAns6xu~thi~ncua chungOO~mcaiti~ntinhcodQngcuacayvatangs6nutduqctrQnhO'n. Dinhngh'ia2.4: FP-treela mQtcaycacmJuph6biJn vaclinglamarQng cuac~utruccayti€n t6valUlltrftthongtindinhluqngxacdinhchocacmftuph6 bi~n. I-Cay comQtnutg6ccoOOanla "null",mQtt~pcaccayconti€n t6va mQtbangHeader. 2-M6inuttrongcayconti€n t6chua3 thuQctiOO:item-name,count vanode-link.Vai item-namelatencuaphAnill d~idi~nchonut,countchua s6giaotacchuamftudotiOOtirnutdod~nnutg6cvanode-linkchid~nnut k~ti~ptrongcaycophfu1ill doho?cchid~nnulln~unolanutla.Cacnode- linkgiupchoS\Iduy~tcityd~dangtrongquatrinhkhaikhoangdftli~u. 3-S6dongtrongbangHeaderb~ngs6phAnticphanbi~trongcayFP vam6idongclingchua3thuQctiOO:item-name,item-countvanode-link.Vai item-. 40 nameleitencuaphAntu, item-countla tAngsf>bi€n d€m cuatAtcacacnutchua phAntu dovanode-linkchid€n nutdAutienchuaphAnill dotrongcay. ThUGtloan2.2:Xay dlJngcayFP titmQtcO'sadftli~ucacgiaotaco Nhtjp: (1)CO'sadftli~ugiaotacD; (2)Ngufmgmill_sup. Xudt:CayFP v6'icO'sadftli~uD vath6amill_sup. Phurmgphap: (1)Duy~trenD IAndAuvathut~pmQt~pcacmfiuphAbi€n F cling v6'idQh6trQ'cuachung. S~px€p cacphAntutrongF theochi~ugiiimcuadQh6trQ'valUll vaodanhsachL. N€u conhi~uphAnill coclingdQh6trQ'thi s~ptheochi~utangten cuachung. (2)T(;lonutgf>cR vagallnhanla"null". T(;lobangHeaderco IFIdongva d~tAtcacacnode-linkchid€n null. (3)For eachgiaotacTED do{//dQcD IAn2 ChichQncaephAnill phAbi€n cuaT duavaoP; S~pP theothutlJcuaL. CallInsert_Tree(P,R); } Procedureinsert_tree(P,R) f I) I>~tP = [PIP-p],v6'ip laphAnill dAutiencuaP va P -P laphAn conl(;licuadanhsach. 2) ifR comQtconN saDchoN.item-name=pthen 3) N.count=N.count+1; 4)else{ 5) T(;lonutm6'iN; 41 6) 7) 8) N.count=1; N.item-name=p; N.parent=R; II T~onode-linkp chid€nphAntirdAtitiencoitem-namenay. II H labangheader. 9) N.node-link=H(p).head; 10) H(p).head=N; 11)} II Tangbi€n countcuapthem1trongbangheaderH. 12)H(p).count=H(p).count+1; 13)ifP -p"*then 14) Callinsert- tree(P- p,N); } Thidu2.2:Xayd\lngFP-treeill bangdftli~u2.2vaimill_sup=3. Bang2.2:Bangdftli~ucacgiaotac Buac1:duy~ttrencO's6dftli~utimdanhsachcacmduph6bi€n L = {(3,5),(2,4),(1,3),(5,3)}, vai s6dAtilatenvas6thuhailadl)h6. trqcuano. Buac2:T~onutg6cchocayvaduy~tcO's6dftli~uIAnhaid~chen5 giaotacvaocaynhuhinh2.5(a)-(e). Tid Giaotac Mducanchen 1 {1,2,3,4} {3,2,1} 2 {1,2,3} {3,2,1} 3 {2,3,5} {3,2,5} 4 {3,5,6} {3,5} 5 {1,2,3,5} {3,2,1,5} 42 Truactien,giaotac{1,2,3,4}duQ'chen(Hinh2.5a).PhAntu4 khongph6 bi~nvacacphAntu conI~iduQ'cs~ptheothirt\1'trongL:{3,2,I}. NhanhdAtitien duQ'chenvaodiy va cacbi~nd~mcuacacphAntu 3,2,1duQ'ctangten1trong bangHeader. Giaotacthirhai {I,2,3}chiracacphAntu d€u ph6bi~nvaduQ'cs~pthanh {3,2,1}.No cochungm<)tducmgdanvai giaotacdAtitiennenkhongconutmai naoduQ'chenvaodiymachicocacbi~nd~mduQ'c~pnh~t. Giaotacthirbala {2,3,5}duQ'cs~pthanh{3,2,5}.Vi cochungduemgdanla {3,2}vaiduemgcli la {3,2,1}nenbi~n<,t~mm6inutdQctheonoduQ'ctangthemI vacom<)tnutmai(5:1)duQ'ct~ovalienk~tnhum<)tnutconcua(2:3). GiaotacthirtucophAntu6khongph6bi~n.PhAnconI~iduQ'cs~pla {3,5} nencochungnut{3}vai caydanghi~nhanh,vi th~biend~mcuanotangthem1 com<)tnutmai(5:1)duQ'chenvaolamconcuanut(3:4).Vi cayhi~nt~ichiranut cophAntu5nennode-linkduQ'ct~ogiftachung. GiaotacthirnamduQ'cs~pla{3,2,1,5}clingt~othemm<)tnutmaichophAn tu5.Node-linkcua5baygiachirabanut.UiachicuanutdAtitienclingvainode- linkduQ'clUlltrongdongtuonglIngcuabangHeader. BangHeader Item-nameItem-count ~ t ~de-link ~ ------~( a)Chenmati{3,2,1} b) Chenmati{3,2,1} 3 1 - 2 1 - 1 I - 5 0 3 2 - 2 2 - 1 2 - 5 0 ,,, , j', ,' ,' ,, ,, ,, ,'--- c) Chenmati{3,2,5} 43 d)Chenmati{3,5} \ \ \ \ \ \ \ \ \ , , , ,,,, \ \ \ I I ,, --------------- e)Chenmati{3,2,1,5} Hinh2.5:Xaydl,lngcayFP 3 3 - 2 3 - 1 2 - 5 1 ', 3 5 -. 2 4 -' 1 3 - 5 3 \ 3 4 - 2 3 - 1 2 - 5 2 " 44 2.2.1.2.Khai khmingcaemiu ph&bi~nbing FP-tree PhuongphapFP-growthti€p c~ theohu6'ngchiava trj chovi~ckhai khoangcacm~uph6bi€n.Phuongphaptimki€m mQtcachd~quyd6iv6'icacm~u ph6bi€n ng~nhonvasaudotimcacm~udaihonb~ngcachghepmQtphfintuph6 bi€n vaosaumQtm~ung~hon. FP-treekhongnhftngchophepbi€u di€n dit li~ucodQngmaconti~nIQ'icho khaikhoangmQtm~uph6bi€n hi~uquabai cacthuQctinhsauday: 1-Tfttcacacm~uph6bi€n chuamQtm~uph6bi€n conajcoth€ duQ'c timthftytrongcooSo'm~uco di~uki~ncuam~uconmano chuamQtt~pcacdu<mg d~nti~nt6trongFP-tree. 2-Tfttcacacdu<mgd~nti~nt6cuaajcoth€ duQ'ctimthftyb~ngcach duy~tquamQtnode-linkcuaphfintudfiutiencuaajtrongcayFP codi~uki~ncua aj,v6'im6idu<mgd~nti~nt6ladu<mgdi ill mQtnutdQctheonode-linkd€n nutg6c cuacay.MQtcaycodi~uki~nduQ'cxayd\IIlgb~g cachdungcooSo'codi~uki~n cuaaj. 3-Tfttcacacm~uph6bi€n chuam~uh~ut6ajcoth€ duQ'ctimthfty b~ngcachghepajv6'icacm~uconph6bi€n duQ'ckhaikhoangill cayFP codi~u ki~ncuaai. 4- Bfttkym~uph6bi€n duQ'cxayd\IIlgdungajnhumQth~ut6ph6 bi€n thicocungdQh6trQ'v6'iaj. Thu~toand€ khaikhoangtfttcacacm~uph6bi€n dungFP-treeduQ'ctrinh baydu6'iday. ThUGtloan2.3:DungFP-treed€ khaikhoangcacm~uph6bi€n b~g cach tangtruangphanmanhm~u. Nhijp:(1)CayFP d6iv6'icooSo'ditli~uD; (2)Nguangmill_sup; Xudt:MQtt~pdfiyducacm~uph6bi€n F. Phuangphap:Call FP-growth(Tree,null) ProcedureFP-growth(Tree,a){ 1)F=<D; 45 2)If Treechuam<)tdmmgddndanP then{ 3) foreacht6hQp~cuacacnuttrongP do{ 4) Phatsinhmdup=~u a; 5) sup(P)=mill_supcuacacnuttrong~; 6) F=Fu p; 7) } 8) } 9) else 10)for eachaibangheadercuacayb~tdAtill mduph6bi€n itnh~tdo 11) 12) 13) 14) 15) 16) 17) 18) 19)} { Phatsinhmdu~=aiu a; sup(~)=sup(ai); F=F u ~; Xay d\ll1gcO's6codi€u ki~ncua~; Xay d\ll1gFP-treecodi€u ki~nTreepcua~; If Tree~*then Call FP-growth(Tree,); } Thid\lsaudayapd\lngthu~tloanFP-growthchocay6hinh2.5e. Thidu2.3: CacphAnill ph6bi€n trongbangHeaderduQ'cxemxetb~tdAtill phAnill co d<)h6trg th~pnh~tvi mufmdambaoduy~tcayill du6itennh~mtranhxetm<)t phAntll nhi€u IAn.Tru6clienxetphAntir5,m<)tphAntirph6bi€n du6icungcua cay.Noxayratrongbanhanhlrngv6icacduangddn(3,2,1:1),(3:1)va(3,2:1).£><) h6trg cuam6iduangddnnayb~gnhaulrngv6inutcua5.Saukhit6ngcacIfA1..~ trgcuacacphAnill trongcabanhanhduQ'CtinhchungtaduQ'cbangtAnsu~tC( ki~n(3:3,2:2,1:1).PhAntir I khongth6amin_sup=2,vi th€ chico3 va2m6i dungd€ xayd\ll1gcapFP codi€u ki~nchophAnill 5nhuhinh2.6.Caynay(. 46 xaydlJnggi6ngnhucayFP nhungthayvi cacgiaotactachencacduangdftn.Vi cayFP codi€u ki~ncuaphfinill 5chicomQtduangdftndO'11(3:3,2:2)nencacmftu ph6bi~nconcoth~duQ'cphatsinhIa(3:3),(2:2),(3,2:2),v6'idQh6trQ'duQ'cgan b~ngdQh6trQ't6i thi~ucuacacphfinill trongmQtmftucon. BangHeader !tem-\,e !t i m-count 0 \ Node-link jrrfl u --- ~ [TIlm.cb Hinh 2.6:Xay dlJllgcayFP codi€u ki~ncuaphfinill 5tronghinh2.5e. Tfitcacacmftuph6bi~nchiraphfintu5duQ'cpbatsinhb~gcachghepphfin ill 5vitosaucacmftucon:{5:3},{3,5:3},{2,5:2},{3,2,5:2}. D6i v6'iphfinill 1,ca samftucodi€u ki~nchichiramQtduangdftn(3,2:3) lIngv6'icaycodi€u ki~n.Tfitcacact6hQ'Pcoth~cuacacphfintuchota cacmftuph6bi~nsau:{I:3},{3,I:3},{2,I:3},{3,2,1:3}. D6i v6'iphfinill 2, ca samftucodi€u ki~nvacayFP co di€u ki~nIa (3:4)va .No chocacmftu{2:4},{3,2:4}. Ca samftucodi€u ki~ncua3 la r6ngvi th~comQtmftuph6bi~n{3:5}.Quy trinhkhaikhoangduQ'ct6ngk~ta bang2.3. 47 Bfm~: QuytrinhkhaiImmingcayFP tronghinh2.5e Dinhnghza2.5(Cumfirthira):C~mfirp g(Jila thiraniu vachiniu coqlm firPsubla c1:lmfir con cuaP maco hc;mgcao han h{;mgcuap. Khongbaogiach<;mcacC\lmill thiramab~tbuQcphaitiano.d daychungta phatsinhtAtcacacc\lmill saildotiacacc\lmtirthira. Dinh nghfa2.6 (ThuoctinhAnti-Monotone):MQtmJup khongthoamiinmQt thuQctinhthi thuQctinhnayg(Ji la Anti-Monotoneniu vachi niu khongcosieu mJunaocuap thoano. ThuQctinhAnti-MonotonerAthtIuichbaivi nocoth~duQ'cdAyt6idavito quytrinhkhaikhoangcacm~uph6bi€n. Dinh Iv2.1(ThuoctinhAnti-MonotonecuaDC): Cho c\lmill P=(W},"',Wn),gQiDen(w},...,Wn)Ia IJ1~Us6cua(2.1)d6iv6i p va PCiIadQdopccuabAtkyc\lmill PicodQdain-l ill thuduQ'cb~ngcachb6ill thui trongc\lmtirp (i=l,n). Deni(W},...,Wn)Iam~us6trongPCi.S~px€p cacill theothun.rgiambi€n d€m toanC\lCcuachungtrongmQiC\lmill. N€u cacdi~uki~nsaildaydungv6i n (i) pc(w}"",wn)<pciCw},...,wn) (ii) Den(w},...,wn)>Deni(w},...,wn) thi chungclingdungv6i bAtkym>n. Phan CO'sacodieukin CayFP co Cacmu phobien ill diu kin duQ'csinhra 5 {{3,2,1:1),{3:1),(3,2:I)} {5:3},{3,5:3},{2,5:2}, {3,2,5:2} 1 {{3,2:3)} {1:3},{3,1:3},{2,1:3}, {3,2,1:3} 2 ({3:4)} {2:4},{3,2:4} 3 cD R6ng {3:5} 48 Chimgminh - Bu6'c1:D~tk Iii s6n t6i thi~usaochodi~uki~n(ii) dung.V6'ik thi (i) duQ"csuyra ill (ii). Th~tv~ytheoheuristicApriori taco: Nom(WI"",Wk):::;;Nomi(WI"",Wk) V6'iNom Iii tit s6trong(2.1).Di~uniiydUngvi c\lmill bellv€ tnii Iii sieuClJmtircuaClJmtirbellv€ phai.K€t hQ'Pb~tdfulgthucniiyv6'i(ii) taduQ"c(i). - Bu6'c2: Giii sirhai di~uki~ncuadinh Iy dungv6'in, tacAnchUng minhnodungv6'in+1. Tir d~ngthuc(2.1)tavi€t I~i(ii) nhusau: n~C( ) C() C() !C(Wl)*C(WZ)*...*C(wn)WI * WZ *...* W >n-Z n 1 C(wJ C(Wl)n-z *C(WZ) n-Z*...*C(Wn) n-z> C(Wl) n-I*C(Wz) n-I*...*C(Wn) n-I C(WJ n-I . Saukhi rutg<;mvanhanhaiv€ choC(Wit-1taduQ'c: C(Wit-1> C(Wi)*C(WZ)*...*C(Wn) DaychinhIii mOtd~g khaccua(ii). D~chUngminh(i) dungv6'in+1tacAnchUngminhdi~uki~n(ii) clingdung v6'in+1.Tuc Ia: (2.2) Den(Wl,WZ...Wn,Wn+l)>Deni(wl,WZ...Wn,Wn+l) Theo Heuristic Apriori taco: (2.3) Nom(Wl,WZ"",Wn,Wn+l):::;;Nomi(wl,WZ",Wn,Wn+l) D~chungminh(2.3)tacAnchungminh: C(Wit> C(Wl)*C(WZ)*...*C(wn)*C(Wn+l) (2.4) Vi t~tcii cactir trongClJmtir duQ"cs~ptheochi~ugiiimbi€n d€m to~mClJC cuachung,nentaco: C(Wi);;:::C(Wn+l) i=I,...,n Nhan(2.2)v6'i(2.5)v€ v6'iv€ taduQ"c(2.4). V~ytheonguyenIyquyn~ptaduQ"cdi~uphiiichUngminh!!! (2.5) 49 He qua2.1:N~ue\lmtu p(Wl,"',Wn)la thua,co dQdopc va 2 di~uki~nsau th6aman: 0) Den(wJ.,"'~wnY> Deni(wJ.,...~wn) (ii) C(WI)~C(wz)~ ~C(wn) thibAtky sieue\lmtUnaoeuap dIngthua. ChUngminh: SlJ thuaduqenoitrongdi~uki~n(i) euadinhly 2.1.Vi theodinhly thi(i) la dungv6im>n,tirelit:bAtkysieue\lmtUnaoeuap clinglathuan~uthir1\1'trencae bi~nd~meuaphfinill duqebaatoim. Caedi~uki~na caemdusf,vathir1\1'caed~iluqnglarAtqUailtrQng.Thid\l di~uki~n(i) euadinhly ladungtrongkhi(ii) sainhungsaudok~tlu~neuadinhly thikhongdambaadungm~edilnocoth~dung.Xemthid\ldu6iday. Thidu2.4: Gia sircaedi~uki~neuadinhly th6amanva a,b, e,d la caetUco caebi~n d~mnhusau: C(a)=80; C(b)=15; C(e)=5; C(d)=5 C(be)=5; C(abe)=4; C(bed)=4; C(abed)=3 TAteacaedi~uki~neuadinhly2.1th6aman: Den(be)=16.5=75;Den(abe)= .JSO*15*5 =77.46 pe(be)=51Den(be)=0.066;pe(abe)=41Den(abe)=0.052 Tinh dQdopcehobedvaabed: pe(bed)=41.J15*5*5 =0.207 pe(abed)= 3/VSO*15*5*5= 0.097 R5rangpe(bed)>pe(abed)dungnhu dinhly. ChungtaxettruanghQ'PtAteacaedi~uki~neuadinhly duqeth6aman ngo~itmdi~uki~n(ii)nhusau: 50 C(a)=20;C(b)=15;C(e)=10;C(d)=5;C(be)=10; C(abe)=3;C(bed)=3;C(abed)=3;Den(be)=15*10=150; Den(abe)= .J20*15*10 =54.77;pe(be)=10/Den(be)=0.066; pe(abe)=3/Den(abe)=0.055;pe(bed)=3/.J15*10*5 =0.110; pe(abed)=3/\120*15*10*5 =0.122 Hinh2.7:Tia c\lmtITthiraC\lCbe> C?/mtit D{jaopc ab 0.015 abe 0.080 abed 0.149 abede 0.239 abee 0.160 abd 0.070 abde 0.170 abe 0.078 ae 0.015 aed 0.091}aede 0.202ace 10 ad 0.015 ade 0.111 }- ae 0.015 be 0.083 bed 0.231 bede 0.376 bee 0.258 bd 0.080 bde 0.283 be 0.100 cd 0.133 ede 0.365 ce 0.167 de 0.200 51 Trongtruanghqpnaypc(bcd)<pc(abcd),baivi di~uki~n(ii) kh6ngthoa. DinhIy2.1chophepphatbi~uIu~tsaudaytrongquytrinhtiaC\lmtITthiraC\lCbt). LUGt2.1 (ria cumtirthiracucbodungdodopc): Tia mt)tC\lmtITthirap va tAtcacacStellC\lmtITcuano n~uco it nhAtmt)t c\lmtITconPsubcuaP coDen(psub)<Den(p). Chi c~nki~mtradi~uki~n(ii) vi di~uki~n(i) duqcsuyra tIT(ii). Thli t\1' giamcuacacbi~nd~mtoanC\lCcuacactITriengIe cuam9i c\lmtITsed£;1tduqc b~ngcachsuathu~toanFP-growthduqcdi€n giai dual day.Lu~t2.1chopheptia nhi~uc\lmtITthiranhungkh6ngphaiIa tAt cabaivi lu~tnayti~pc~ntheohuangbi quailvatiacacC\lmtITchikhinaono thoatAtcacacdi~uki~ncuadinhIy 2.1. Nhungcoth~dinhIyv§:ndungth~mchidi~uki~1J(ii) kh6ngthoa.Quytrinhtiacac c\lmtITthiraconl£;1itadungCP-treeseduqctrinhbaytrongcacph~sau. Tahayxemthid\ltiaC\lmtITthiraC\lCbt)dungdt)dopcnhusau: Thidu2.5:ChovanbanchliacactITa,b,c,d,evaicacbi~nd~mnhusau: C(a)=66;C(b)= 10;C(c)=6;C(d)=5;C(e)=4;C(ab)= 10;C(bc)=5; C(ac)=5;C(ad)=4;C(ae)=4;C(bd)=4;C(be)=4;C(cd)=4;C(ce)=4;C(de)=4; C(abc)=5;C(abd)=4;C(bcd)=4;C(bce)=4;C(bde)=4;C(ced)=4; C(abcd)=4;C(abce)=4;C(abde)=4;C(bcde)=4;C(abcde)=4. Hinh2.7chothAycacC\lmtITduqcphatsinhvaimill_sup=3vacacgiatri pctuanglingcuachung.CacdAtingo~chiracacC\lmtITduqctia.Truactienc\lm ill abduqcsinhraKh6ngc~nki~mtracacc\lmtITconcuanovi dt)dait6ithi~ucho phepcuanoia3.Ti~ptheoIaC\lmtITabcvaiDen(abc)=63.929.C\lmtITconcuano IabccoDen(bc)=60.000.TheoIu~t2.1,c\lmtITabcvatAtcacacStellC\lmtITcua noduqctia.Chuy r~ngcacStellc\lmtITabcd,aberle,abce,abd,abde,abeduqctia makh6ngc~nsinhfa.NhomC\lmtITthli hatIa (acd,acde,ace)duqctiabai vi Den(acd)=44.497vaDen(cd)=30.000.Nhomthlibaduqctiachicomt)tc\lmtITade, vaiDen(ade)=36.332vaDen(de)=20.000. 52 2.2.2-PhU'O'ngphapFP-phrase Giai thu~tFP-phraseduQ'cdungd~xayd\ffigC\lmfir.No lAygiatri nh~pvao Ia danhsachcacgiaotacmam6igiaotacIamQtt1Jad€ ho~cmQtCalitrongnQidung cuathongdi~p.CacC\lmfirduQ'ckhatkhoangxuyenquatAtcacacthongdi~ptrong t~pdftli~uhuAnIuy~nmam6ithongdi~pdaxacdinhIO£;li.Nguangmill_suptrong khikhatkhoangIanhanmill_suptruacdoseduQ'cdungd~khatkhoang6 buac sau. Quatrinhxayd\ffigFP-treechokhatkh,oangC\lmfir gi6ngnhuthu~toan2.2 nhungtruackhi chencacgiaotacv;lOcaythi tas~pchungtheochi€u tangcuatAn suAt.Thayd6inayIacAnthi€tnhAmthoamapdi€u ki~hcuadinhIy2.1mab~tbuQc cacphAntITtrongC\lmfirphaiduQ'cs~ptheochi€ugiamtAnsuAt.Vi cayduQ'cduy~t firduaiIennenchungtacAncacphAntITcotAnsuAtIannhAtnAm6 duaicay.Dang ti€c Ia di€u naylamchocayr~mr£;lphanvaduy~tch~mhannhungcokhanang vi~ctiamfiud£;lthi~uquahan. Quatrinhkhatkhoangmfiuph6bi€n clingdothQisirad6i trongtruanghqp diyFP codi€u ki~nchuamQtduangdfindanvatAtcacacmfiuph6bi€n duQ'csinh rabAngcachd€m tAtcacacmfiuconcoth~cuaduangdfin.Trongthu~ttoanFP- growth(thu~ttoan2.3)thivi~cd€m naycoth~theothut1JbAtky trongkhi FP- phrasethiUtitienduy~ttheochi€usaucuavi~cd€mmfiud~lamchovi~ctiasaut6i dacoth~.Thi d\l,n€u duangdfinIa,thut1Jd€m mfiulIngvaiduy~tthu t1Jcuacaytronghinh2.8Ia:{ab},{abc},{abcd},{abcde},{abce},{abd},{abde}, {abe},{ac},{acd},{acde},{ace},{ad},{ade},{ae},{bc},{bcd},{bcde},{bce}, {bd},{bde},{be},{cd},{cde},{ce},{de}. 53 Hinh2.8:Li~tkem~utheoUtitiepdQsaucuaduangdfut Trongb6sungnaychungtaxemtruanghQ'PmQtduangd~ndannhutruang. hQ'Pt6ngquatbfuIgcachxayd\lIlgcayFP codi~uki~ncuano.Thu~ttoanFP- phrasedaydudungd~phatsinhC\lmtirkhongthiranhusau: ThuGtloan2.4(FP-phrase):KhaikhoangtAtcaC\lmtirph6bi~nkhongthira gi6ngnhuphuO'ngphapFP-growth. Nhejp:(1)T~pcacgiaotacT, m6igiaotaclatl!ad~ho~cmQtCallnQidung cuathongdi~p; (2)Nguangph6bi~nt6ithi~uC\lmtirSmin; (3)BQ dait6idaC\lmtir Imax. Xudt:T~pdayduP cacC\lmtirph6bi~nkhongthirath6aImax. Phlfangphap: (I) Xay d\1'ngFP-trees~px~pm6igiaotactheothirt\1'tangtansuAt. (2)P=~; (3)CallFP-phrase(Tree,null) 54 ProcedureFP-phrase(Tree,a) { 1)foreachajtrongheaderctiacayb~td~utirph~ntuph6bi~nnhMdo 2) { Phcitsinhmfiu~=aju a; 3) sup(P)=sup(aj); 4) iflength(~);:::2andlength(~)~lmaxthen{ 5) if mfiucon~subctiacoDen(~sub)<den(~)then 6) Tiavakh6ngphatri~nonua; 7) elseif ~kh6ngduQ'ctiabaiCP-treethen{ 8) P=P u ~; 9) Xayd\ll1gcosamfiucodi~uki~nctia~; 10) XfLydlJngcayFP codi~uki~nTree~; 11) If Tree~"*then 12) callFP-growth(Tree~, ); 13) } 14) }//endiflength 15) }//endfor 16)} Tia cacc\lmtirconl~idungCP-treea dong(7)duQ'ctrinhbaytrongph~n 2.2.3. f)~tranhki~mtravetc~ntdtcacacmfiuconcoth~ctiamQiC\lmtira dong (5)tadungSlJt6iUtisau.V6imQiC\lmtirPn=WI,...,Wnthiconhi~uc\lmtirconctia nokh6ngc~nki~mtra,dola tdtcacacc\lmtirconSPn-lctiaC\lmtircon Pn-l=WI,.."Wn-ldflbi~tvath6adi~uki~n: Den(SPn-l);:::Den(Pn-l) (2.6) Th~tv~y,n~u(2.6)kh6ngdungd6iv6i it nhdtmQtSPn-lthiPn-lduQ'ctiacung v6i tdtcasieuc\lmtir ctianovaPnkh6ngduQ'cxetnua.Tdt cacacc\lmtir conctia Pn-lclingd6ngthai lacacc\lmtirconctiaPn, vi th~PnduQ'ctian~u: 55 Den(Pn)>rninSpn-l(Den(SPn-l» (2.7) D6 ki6rntradi€u ki~nnaychungtachi cAnlUll giatri rninspn-l(Den(sPn-l»va kh6ngcAntinhrnfius6cuabAtky rnfiuconnaocuaPn.N~u(2.7)kh6ngth6atacAn tinhrnfius6 cuacacrnfiucon cuaPnnhungkh6ngla rnfiucon cuaPn-l'Hay xern thu~ttoandu6iday. Thudtolm2.5:Ki6rntratAtcacacrnfiuconcuaC\lrnfirhi~nhanh. Nh(tp:C\lrnfirPnvaDen(Pn);(2)DenmincuatAt<;1ac crnfiuconcua Pn-l,v6i Pn=w U Pu-l(w la rnQts6firnaodo). Xudt:(1)Truen~uPnduQ'ctja,Falsen~ukhacdi; (2)C~pnh~tDenmin; PhLiangphap: 1) iflength(Pn)=2then{ 2) C~pnh~t:Denmin=Den(Pn); 3) returnFalse; 4) } 5) else{ 6) if (Den(Pn»Denmin)then{ 7) returnTrue; 8) else{ 9) for eachc\lrnfirconsPncuaPnIDalength(sPn)~2andwe SPndo{ 10) ifDen(Pn)>Den(sPn)then 11) returnTrue; 12) } 13) if Den(Pn)<Denminthen 14) C~pnh~tDenmin=Den(Pn); 15) returnFalse; 16) } 17) } 56 N~uthu~tmlnch~yd~ndong(13),mclaDen(pn)::s;;denminvaDen(Pn)::s;; Den(sPn)v6i mQimfiuconSPn.DenminduQ'c~Pnh~t&dong(14)n~ubAtd~ngthuc nh6hanth~tSlJ.DenmincdnduQ'c~Pnh~tn~uthu~tmlntraveTrue,vi trong tru<mghQ'PnaymQtmfiukhongduQ'ctr6ngnil'a. 2.2.3.Tia c1}.mfir thiradungCP-Tree (CompressedPatternTree) CP-treela mQtd~ngdangiancuaCR-treeduQ'ctrinhbaytrongchuangsail. CP-treekhonglUll tril'thongtin ve lo~ivamQinutcuano t'rngv6i mQtmfiut6i da. ChungtadungdQdopc d€ x~Ph~g C\lmill. MQtC\lmill co th€ duQ'ctiangaykhi chennovaoCP-treeb~ngcachki€m ITacot6nt~iC\lmill concoh~gcaohanhay khong?Sieuc\lmtircoh~ngthAphanhaykhong?N~ucothitiachung.CacC\lmtir conl~iduQ'cxuAtrab~ngcachduy~ttrendu<mgdfutcuacaybiltdAtill tAtcacac nutcuaCP-treecokernthongtinthuQctinhmfiuchod~nnutg6c. 2.2.4.Chenc1}.mfir Trongphflnnaychungtamotacachsird\lngcacC\lmtirdaxaydlJngd€ thay th~cacill riengIe.Nhieuc\lmill coth€ d\lngdQb&ichungcomQts6ill chung.B€ giaiquy~tdieunaytadinhnghiam9tthutlJt6ngtrenmQiC\lmill. LulU 2.2(X~ph~ngc\lmill) Cho hai c\lmill Pi vaPj. Pi co h~g caohanPj hayPi >Pj n~u: (i) PC(Pi)>pc(Pj),ho~c (ii) PC(Pi)=pc(Pj)vasup(Pi)>sup(Pj),ho~c (iii) PC(Pi)=pc(Pj)vaSUP(Pi)=sup(Pj)valength(pi)>length(pj),ho~c (iv) PC(Pi)=pc(Pj)va SUP(Pi)= sup(Pj)va length(pi)= length(pj)vaPi duQ'C sinhratru6cPj. TAtcacacc\lmill sinhraduQ'csilpgiamddntheoh~ng.V6i m6iCallmacac c\lmill phiLnothiduQ'chQnlamt'rngvienthayth~,v6iphiLnayduQ'cdinhnghia nhusail: 57 Dinh nghza2.7 (Phil C1,lmtir)Ciiu S all(Ycphil bai C1,lmtirp niu tatcdcactir cilaC1,lmtirnlimtrongciiuS, tucfa: VWj EP, 3wj E S, v6'iWj,Wjfacactir. Cac c\lmtiTphil duQ'cxetb~tdAutiTC\lmco h~g caonh~t.MQt Id C\lmtiT thaychonhi~uId cuacacill thuQcc\lmdo.Quatrinhti~ptl,lcd~nkhi t~tcacact6 hQ'PtiTcothetrongdiuduQ'Cthayth~bO'imQtsf>C\lmtiTho~cd~nh~tcacC\lmtiT. Detf>iuuthaigianth\lchi~ncuathu~ttoanchUngtaluuv~tsf>cactiTcuab~tky c\lmill m\otrongm6idiu..MQthu~ttoandAyduduQ'cdungdethayth~C\lmill duQ'ctrinhbaydu6iday(Thu~ttoan2.6). Thuiitloan2.6:(Thayth~c\lmill) Thayth~cacnhomill trongmicaubO'i m<)tC\lmtir. Nh(1p:(1)T~pP t~tcacacC\lmill; (2)Co sO'ditli~ucacgiaotacT, v6im6i giaotacla mQtcaunQidunghayWad~cuathongdi~p;m6igiaotacchuaId cua thongdi~ptuang(mg. Xuat:Co sO'ditli~ucacgiaotacD,v6im6igiaotacthehi~nmQthongdi~p duQ'cthayth~bO'icacc\lmill. Phllangphap: 1) P =Sort(P);//S~ptheochi~ugiamcuah~g cuacacc\lmill 2) For eachgiaotactj E T do { 3) Npt(tj)=sf>C\lmtirtronggiaotactjtru6ckhithayC\lmtir; 4) Timt~pconPCOy c P lat~tcacacc\lmill philtj; 5) do { 6) ifc\lmill k~ti~pPj E PCOy vfu1philtj then 7) Thayt~tcacacill cuaPjtrongtjbO'iPj; 8) C~pnh~tsf>c\lmtiTtrong4:npt(tj)=npt(tj)-1(Pj); 9) } 10) }until(endofPcovornpt(tj)<2); 11) } ..... 58 12)Gomnhomt~tcacacgiaotactrongT theeId thongdi~pvaomQtgiao tilemai. 13)Chengiaotacm6'inayvaoD; 2.3. ChQD d~ctrU'Dg V~ndequailtr<;mgtrongphantichduli~uvanbanlakichthu6'ckhonggian d~ctrungIan.B6ivi m6iill phfulbi~tduQ'cthemvaolamtangthemmOtkichthu6'c chokhonggian.Do dokichthu6'ckhonggiand~ctrungcoth~dC;ltd~nhangchlJc th~mchihangtramngan.Thl!CrachicomOts6itcacill lamd~c!rungd~phanIOC;li taili~utrongkhir~tnhi€ucactirOOi~u,khongranghialamtangthaigiantiOOtmln dangk~vak~tquakerntinc~y.Vi v~ychungtaphaiti~nhanhchQnd~ctrung. Dinhnghia2.8:ChQnd~c!runglaquytriOOxoacacill khongranghiatrong taili~unh~mcaiti~ndOchinhxacvagiamthaigiantinhtoanphuctC;lP. Conhi€uphuangphapchQnd~ctrungduQ'clrngdlJngtrongbu6'cti€n xirIf dftli~umachtiy~udl!avao:nguO'ngtAnsu~taili~u,l -th6ngke,thongtintuang h6vacuO'ngdOxuAthi~nill. SaildaylacacdOdodungd~chQnd~ctrungtrenthl!c t~. Thon!!tintutm!!h6trun!!binh DOdothongtin tuangh6trungbinhduQ'Csird\lngrOngraid~phan IOC;li thongdi~p.CongthuctiOOthongtintuangh6trungbiOOgiuamOtt~ptAtcacac IOC;liC va mOtill t la: P(e,ft) MI(C,t)=L L P(e,ft)logzPee)*pert)ceC f,e{O,I} (2.8) Trongdo4=0n~ukhongcot trongthongdi~pva4= 1n~unguQ'clC;li. MI(C,t) caokhi bi~tSl!ki~nmaill t com~thayvfu1gm~trongthongdi~pva nogiupdl!deanIOC;lic chothongdi~p. 59 D(lt{cih=l,mla kY hi~ucuame)t~pcaclo~i.Phuangtrinh(2.8)coth€ vi€t l~i nhusau: m m m MI(C,t)=-2:P(cj)log2P(Cj)+P(t)2:P(cilt)log2P(cdt)+P(t)2:P(cilt)log2P(cilt) (2.9) ~l ~l ~l MI duQ'ctinhv6i m6itit1.CactitduQ'chQnco th€ ho(lcla tAtcacactitthoa manMI tbi thi€u (donguaidungquydinh)ho(lcnguQ'cl~ino co th€ la k titmaco giatri de)docaDnhAt,v6i k duQ'cxacdinhtheoth\lcnghi~m. Dotincar De)tinc~ycoth€ dungd€ chQnd(lctrungchobe)phanlo~idancApvaphan lo~ico thu b~c. TrongtruanghQ'Pphanlo~idancAp,de)tinc~ycuaill tv6i lo~iCichinhlade) tinc~ycualu~t =>CivaduQ'Ctinhnhusau: ~( ) count(tc;) conl~t~ c. = I count(t) (2.10) Dbi v6i truanghQ'Pphanlo~icothub~c,de)tin c~ycheoduQ'cdinhnghla nhusau: contE(t ~ Cj) = P(tCj) P(tcJ+ LB(cj,cJP(tc) (2.11) Cj*Cj V6iB(Cj,Ci)lakhoangcachgiualo~iCjvaCjtrongcayphancApchud~.Theo congthucde)tinc~ynaythigiatricangnhon€u t ph6bi€n trongcaclo~iCjmacac lo~iCjnayxav6i lo~iduQ'cd\ldocinCj.No d~tgiatri IannhAtla 1n€u t duQ'cb~t g(lpchitrongcacthongdi~pcolo~iCj.R5 rangde)donaychobi€t tinhchinhxac cuabe)phanlo~i. 60 DodoJ £)9 do J cling co the dUng de chpn dilc trung cho ca tnrang hpp phiin lo{li (/07 cip vaphiin 10ft;co thiJ bPc. DQ do J dc3ivai phan loej lei: P(e.lt) P(-.c.lt) J(t~cJ=P(t)*[P(eilt)log2 1 +P(-.c;lt)log2 '] P(cJ P(-,cJ (2.12) DQdoJ la itchcuahaid~iluQ11g.Tru6etieD,pet)daTIgUmla dQdoeualu~t va d~iluQ11gthuhai la trongngo~evuonglamn6i b~teadQehiOOxaevatiOOdaTI gianeuabQphanlo~i. Chnd~etrungb~ngeachdungdQtin e~yho~edQdo J va thgehi~nrieng d6iv6i m6i Io~ibAngeachxaedinhgia tri t6i thieueiiad9 doho~eehnehom6i Io~imQts6d~etrungmacoh~ngcaDooAtd6iv6i lo~inay. Chil Y r~g dQtine~yvadQdoJ d6ucothedu,edungehoehnd~etrungva chnlu~tva;ho~cx~ph~ng.Caelu~tk~thqpco thedu,exemOOulo~id~c!rung maco dQdai Ian ban 1va m6iill riengbi~tla mQtt~pti6ndi6uki~ncuamQtlu~t tuang(mg. Caegiatr!P(c),pet)vaP(e,t)du,ct£OOtheocaccachkhacOOau.Thi d\l d6i v6i mohinhdabi~nBernoullicuaBayesthiP(c) b~g s6tai li~uco lo~ic chiacho t6ngs6taiIi~u;pet)b~ngs6taili~uchuaill t chiachot6ngs6taili~u;P(c,t)b~ng 86taili~ulo~ic vaehuaill t chiachot6ngs6taili~utrongt~pluy~n.D6iv6imo hiOOdathuccuaBayesthiP(c)b~gs6tircungxuAthi~ntrongcactaili~ucolo~ie chiachot6ngs6tircungxuAthi~n;pet)b~gs6IAnxuAthi~ncuatirt chiachot6ng 86tirxuAthi~nvaP(e,t)b~ngs6IAnxuAthi~neuaill t trongcaetaili~ulo~ie chia chot6ngs6ill xuAthi~n.

Các file đính kèm theo tài liệu này:

  • pdf3.pdf
  • pdf0.pdf
  • pdf1.pdf
  • pdf2_2.pdf
  • pdf4_2.pdf
  • pdf5.pdf
  • pdf6.pdf
  • pdf7.pdf
  • pdf8.pdf
Tài liệu liên quan