CLASSIFICATION MODEL FOR WATER QUALITY IN RESERVOIR USING AN
INTEGRATION OF ONE-AGAINST-ONE STRATEGY AND LEAST SQUARE
SUPPORT VECTOR MACHINES
Thi Phuong Trang Pham
University of Technology and Education - The University of Danang; trangpham3112@gmail.com
Abstract - An inefficient water management system can be a major
obstacle to human-centered sustainable development. A classification
model of reservoir water quality is therefore essential for resolving
environmental problems and is a relevant tool for the sustainable and
harmonious development of populations. This article proposes a model
for classifying reservoir water quality that integrates the
one-against-one (OAO) strategy with the least squares support vector
machine (LSSVM). The paper compares the performance of several
classification models and algorithms and shows that the proposed
model is suitable for classifying water quality, with an accuracy of
up to 92.196%.
Key words - Management system; classification model; reservoir
water quality; one-against-one; least square support vector
machines.
1. Introduction
Water is a primary natural resource that supports human survival
and health through drinking, irrigation, hydroelectricity, fish
farming and recreation. Reservoirs are subject to intense
multi-objective demands on limited resources, and growing water use
draws more attention to water quality. Water quality also affects
other environmental interests, such as fish and wildlife, and can
impair water use. An efficient water management system is therefore
a major goal in contemporary societies, given the importance of water
to health and the need to safeguard and promote its sustainable use.
However, reservoir water quality is currently assessed through
analytical methods, which may be impractical because of the distances
to be covered, the number of parameters to be considered, and the
financial resources spent to obtain such data.
Technological breakthroughs have provided new ways to create and
store information. Many organizations now accumulate large amounts of
information on a daily basis about their activities and processes, on
the assumption that a large volume of data may be a source of
knowledge that can improve their performance and behavior, either by
revealing trends or by accelerating efficient decision-making.
However, conventional tools for data analysis have many drawbacks:
they do not allow the detection of singularities within such massive
data, they are time consuming, they give low accuracy, and they need
a lot of manpower. Some studies [1, 2] used artificial neural
networks (ANN) to evaluate water quality directly, and Liao et al.
(2011) combined a multiclass support vector machine (SVM) with
biomonitoring to assess water quality [3].
Classifying reservoir water quality is a multi-class classification
problem, for which single-machine methods and decomposition
strategies are the most popular approaches. Decomposition strategies
[4] are commonly used to solve classification problems with multiple
classes: they transform a multi-class classification problem into
several binary classification problems [5]. Several studies [6, 7]
have demonstrated that one-against-one (OAO) [8] is one of the most
effective decomposition strategies. Among machine learning
techniques, the least squares support vector machine (LSSVM) is a
highly enhanced technique with many advanced features [9].
Single-machine methods, by contrast, take a significant amount of
computing time to solve large optimization problems and are not
suitable for practical applications [10, 11].
The aim of this research is to propose a suitable multi-class
classification model for reservoir water quality based on the
combination of the OAO approach and the LSSVM model. The resulting
model, named OAO-LSSVM, classifies reservoir water quality into
multiple grades. To verify its effectiveness, this paper compares its
performance with that of other models. On the water quality data, the
proposed model achieves an accuracy of 92.196%, higher than that of
the other models.
The rest of this paper is organized as follows. Section 2 reviews
the LSSVM, the OAO strategy and the classification evaluation
methods. The dataset and analytical results are presented in
Section 3. Conclusions are given in Section 4.
ISSN 1859-1531 - TẠP CHÍ KHOA HỌC VÀ CÔNG NGHỆ ĐẠI HỌC ĐÀ NẴNG, SỐ 11(132).2018, QUYỂN 2 163
2. Methodology
2.1. Least square support vector machines for classification
The LSSVM was developed by Suykens et al. (2002) [12]. For
function estimation, the LSSVM optimization problem is
formulated as
$$ \min_{\omega, b, e} J(\omega, e) = \frac{1}{2}\|\omega\|^2 + \frac{1}{2} C \sum_{k=1}^{N} e_k^2 \qquad (1) $$
Equation (2) is the resulting LSSVM model for
function prediction.
$$ f(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (2) $$
where $\alpha_k$ and $b$ are the Lagrange multipliers and the bias term,
respectively; and $K(x, x_k)$ is the kernel function. In this
study, a radial basis function (RBF) kernel is used.
Equation (3) is the RBF function.
$$ K(x, x_k) = \exp\left(-\|x - x_k\|^2 / 2\sigma^2\right) \qquad (3) $$
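As an illustration of Eqs. (1)-(3), the following is a minimal Python sketch, not the author's Matlab code. It assumes the standard LSSVM dual linear system (sum of alpha_k equal to zero, and K + I/C acting on alpha), which the text above does not spell out, and it assumes binary labels encoded as -1 and +1 so that the sign of f(x) gives the class. The function names, the toy points and the values of C and sigma are hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """Eq. (3): K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - tail) / M[r][r]
    return u

def lssvm_fit(X, y, C=10.0, sigma=1.0):
    """Return (alpha, b) from the linear system implied by Eq. (1)."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = 1.0           # first row: sum of alpha_k equals 0
        A[i + 1][0] = 1.0           # first column: bias term b
        for j in range(n):
            A[i + 1][j + 1] = rbf_kernel(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / C  # ridge term from the C * e_k^2 penalty
    sol = solve_linear(A, [0.0] + list(y))
    return sol[1:], sol[0]

def lssvm_decision(X, alpha, b, x_new, sigma=1.0):
    """Eq. (2): f(x) = sum_k alpha_k K(x, x_k) + b."""
    return sum(a * rbf_kernel(x_new, xk, sigma) for a, xk in zip(alpha, X)) + b

# Hypothetical toy data: two well-separated groups with labels -1 and +1.
X_demo = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
y_demo = [-1.0, -1.0, 1.0, 1.0]
alpha_demo, b_demo = lssvm_fit(X_demo, y_demo)
signs_demo = [1.0 if lssvm_decision(X_demo, alpha_demo, b_demo, x) > 0 else -1.0
              for x in X_demo]
```

Because the LSSVM replaces the inequality constraints of the classical SVM with equalities, training reduces to one linear solve, which is why a plain Gaussian elimination is enough for this sketch.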
2.2. One-against-one strategy
Decomposing the original problem into several sub-problems is a
strategy that has been used extensively to solve multi-class
classification problems with binary classifiers. One-against-one
(OAO) [8] is one of the most effective decomposition strategies
available [6], so the OAO algorithm is used for decomposition herein.
The OAO scheme divides the original problem into as many binary
problems as there are possible pairs of classes. The OAO method thus
constructs k(k - 1)/2 classifiers [5], where k is the number of
classes; the five water quality grades used here (k = 5) give 10
binary classifiers. All classifiers are combined to yield the final
result. Different methods can be used to combine the classifiers
obtained in the OAO scheme; the most common is simple voting [13].
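The pairwise decomposition and simple voting described above can be sketched in Python as follows. This is an illustrative sketch, not the author's implementation: the binary learner is a hypothetical nearest-centroid stand-in (any binary classifier, such as an LSSVM, could be plugged in through the `train_binary` argument), the toy data are invented, and breaking ties in favour of the smallest class label is an assumption.

```python
from collections import Counter
from itertools import combinations

def train_oao(X, y, train_binary):
    """Train one binary classifier per unordered pair of classes."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        # Keep only the samples that belong to class a or class b.
        pair_X = [x for x, label in zip(X, y) if label in (a, b)]
        pair_y = [label for label in y if label in (a, b)]
        models[(a, b)] = train_binary(pair_X, pair_y, a, b)
    return models

def predict_oao(models, x):
    """Simple voting: each pairwise classifier casts one vote."""
    votes = Counter(predict(x) for predict in models.values())
    best = max(votes.values())
    # Assumption: ties go to the smallest class label.
    return min(label for label, count in votes.items() if count == best)

def train_centroid_pair(pair_X, pair_y, a, b):
    """Hypothetical stand-in binary learner: nearest class centroid."""
    def centroid(label):
        rows = [x for x, lab in zip(pair_X, pair_y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    ca, cb = centroid(a), centroid(b)
    def predict(x):
        da = sum((u - v) ** 2 for u, v in zip(x, ca))
        db = sum((u - v) ** 2 for u, v in zip(x, cb))
        return a if da <= db else b
    return predict

# Invented one-dimensional data with five classes (1 = Excellent ... 5 = Poor).
X_toy = [[0.0], [0.2], [1.0], [1.2], [2.0], [2.2], [3.0], [3.2], [4.0], [4.2]]
y_toy = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
oao_models = train_oao(X_toy, y_toy, train_centroid_pair)
n_pairs = len(oao_models)                    # k(k - 1)/2 with k = 5
label_mid = predict_oao(oao_models, [2.1])   # point at the class 3 centroid
```

A class that is truly the best match wins all k - 1 of its own pairwise contests, so it collects the most votes regardless of how the unrelated pairs vote.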
The LSSVM is a useful tool for binary classification. However, an
increasing number of real-world problems involve multiple classes.
For this reason, the author combines the OAO approach with the
LSSVM model. The final model is created by integrating the OAO and
LSSVM codes in Matlab.
2.3. Evaluation
Various approaches have been suggested for evaluating the
performance of multiclass classifiers. This study employs six
evaluation measures: accuracy, precision, sensitivity, specificity,
area under the receiver operating characteristic curve (AUC), and
overall average performance score (S).
Accuracy measures how closely the predictions agree with the true
class labels. The predictive accuracy of a classification algorithm
is calculated as follows
$$ \text{Accuracy} = \frac{tp + tn}{tp + fp + tn + fn} \qquad (4) $$
where the true positive (tp) values (number of correctly recognized
class examples) and true negative (tn) values (number of correctly
recognized examples that do not belong to the class) represent
accurate classifications. The false positive (fp) values (number of
examples that are incorrectly assigned to the class) and false
negative (fn) values (number of examples of the class that are not
assigned to it) represent erroneous classifications.
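To make the four counts concrete, the short Python sketch below treats one grade as the positive class, counts tp, fp, tn and fn against all other grades, and evaluates Eq. (4). The labels are hypothetical and are not data from the study; the function names are illustrative.

```python
def class_counts(y_true, y_pred, target):
    """Count tp, fp, tn, fn for one class treated as the positive class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == target and truth == target:
            tp += 1          # correctly recognized class example
        elif guess == target:
            fp += 1          # wrongly assigned to the class
        elif truth == target:
            fn += 1          # class example that was missed
        else:
            tn += 1          # correctly recognized as not in the class
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Eq. (4): share of correctly recognized examples."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical labels (1 = Excellent ... 5 = Poor).
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4, 5, 3]
counts_class2 = class_counts(y_true, y_pred, target=2)
acc_class2 = accuracy(*counts_class2)
```

For class 2 in this toy example the counts are tp = 2, fp = 1, tn = 7 and fn = 0, so Eq. (4) gives 9/10.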
Two extended versions of accuracy are precision and
sensitivity. Precision measures the reproducibility of a
measurement, whereas sensitivity – also called recall –
measures the completeness. Precision in Eq. (5) is defined
as the number of true positives as a proportion of the total
number of true positives and false positives that are
provided by the classifier. Sensitivity in Eq. (6) is the
number of correctly classified positive examples divided
by the number of positive examples in the data. In
identifying positive labels, sensitivity is useful for
estimating the effectiveness of a classifier.
$$ \text{Precision} = \frac{tp}{tp + fp} \qquad (5) $$
$$ \text{Sensitivity} = \frac{tp}{tp + fn} \qquad (6) $$
Another performance metric is specificity. The specificity of a
test is its ability to correctly identify negative cases. This
metric is estimated as the number of true negatives as a proportion
of the total number of true negatives and false positives.
Equation (7) is the formula for specificity,
$$ \text{Specificity} = \frac{tn}{tn + fp} \qquad (7) $$
The AUC is the area under the receiver operating characteristic
(ROC) curve, which is the most commonly used tool for visualizing
the performance of a classifier, and the AUC is the best way to
capture that performance as a single number. The AUC condenses the
ROC curve into a single value in the analysis of model performance
[14]. The AUC, sometimes referred to as the balanced accuracy [15],
is easily obtained using Eq. (8).
$$ \text{AUC} = \frac{1}{2}\left(\frac{tp}{tp + fn} + \frac{tn}{tn + fp}\right) \qquad (8) $$
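Equations (5)-(8) can be evaluated directly from the four counts, as in the Python sketch below. The counts are hypothetical, the function names are illustrative, and the sketch assumes every denominator is non-zero. It also makes visible that Eq. (8) is the mean of sensitivity and specificity.

```python
def precision(tp, fp):
    """Eq. (5): true positives among all predicted positives."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Eq. (6): true positives among all actual positives (recall)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Eq. (7): true negatives among all actual negatives."""
    return tn / (tn + fp)

def balanced_auc(tp, fp, tn, fn):
    """Eq. (8): mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical counts for one class, not values from the study.
tp, fp, tn, fn = 40, 10, 45, 5
prec_demo = precision(tp, fp)
sens_demo = sensitivity(tp, fn)
spec_demo = specificity(tn, fp)
auc_demo = balanced_auc(tp, fp, tn, fn)
```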
To combine the preceding measures, an overall average performance
score (S) for the distinct classification models is proposed in
Eq. (9).
$$ S = \frac{1}{m} \sum_{i=1}^{m} P_i \qquad (9) $$
where m is the number of distinct performance measures and
$P_i$ is the ith performance measure.
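As a worked check of Eq. (9), the Python sketch below averages the accuracy, precision, sensitivity, specificity and AUC values reported in Table 3. That m = 5 is an inference rather than something the text states: averaging these five measures reproduces the S column of Table 3 to three decimals. The function and variable names are illustrative.

```python
def overall_score(measures):
    """Eq. (9): arithmetic mean of the m performance measures."""
    return sum(measures) / len(measures)

# Accuracy, precision, sensitivity, specificity and AUC (%) from Table 3.
table3 = {
    "SMO": [75.238, 75.200, 77.500, 85.900, 81.705],
    "Multiclass Classifier": [85.397, 85.400, 86.500, 94.900, 90.71],
    "Naive Bayes": [76.000, 76.000, 78.700, 99.500, 89.15],
    "Logistic": [89.580, 89.600, 89.600, 90.600, 90.36],
    "LibSVM": [80.950, 81.000, 81.000, 87.600, 84.306],
    "OAO-LSSVM": [92.196, 90.794, 90.633, 92.078, 91.405],
}

# S for every model, rounded to three decimals as in Table 3.
scores = {name: round(overall_score(vals), 3) for name, vals in table3.items()}

# Models sorted from highest to lowest S.
ranking = sorted(scores, key=scores.get, reverse=True)
```

For OAO-LSSVM the sum of the five measures is 457.106, so S = 457.106 / 5 = 91.421 after rounding, which matches the reported value.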
3. Data preparation and analytical results
3.1. Data preparation
The case study in this paper, from the field of hydroelectric
engineering, involves a dataset on reservoir water quality drawn
from 150 reservoirs in Taiwan. The author collected the data from
the Taiwan water annual report. Water quality plays an important
role because water is a primary natural resource that supports the
survival and health of humans through drinking, irrigation,
hydroelectricity, aquaculture and recreation. In addition,
predicting water quality is critical to water quality management
and enables managers to make better choices. Accurate prediction of
water-related phenomena is essential to the optimal management of
water resources.
Table 1. Statistical attributes of reservoir water quality dataset
No.  Input parameter           No.  Output parameter (reservoir water quality grade)
1    Secchi disk depth (SD)    1    Excellent - Class 1
2    Chlorophyll a (Chla)      2    Good - Class 2
3    Total phosphorus (TP)     3    Average - Class 3
                               4    Fair - Class 4
                               5    Poor - Class 5
Table 1 shows the details of the water quality dataset. Carlson's
Trophic State Index (CTSI) has long been used in the region to
assess eutrophication in reservoirs. Generally, the factors
considered in evaluating reservoir water quality include Secchi disk
depth (SD), chlorophyll a (Chla), total phosphorus (TP), dissolved
oxygen (DO), ammonia (NH3), biochemical oxygen demand (BOD),
temperature (TEMP) and others. In this investigation, SD, Chla and
TP are used to classify reservoir water quality because they are the
most important and most widely used factors in assessing water
quality. The single-indicator water quality differentiations of the
Organization for Economic Cooperation and Development (OECD)
(Table 2) [16] are used to generate five levels for each evaluation
factor: excellent (Class 1), good (Class 2), average (Class 3), fair
(Class 4) and poor (Class 5). The database includes 1576 data points
with three independent inputs (SD, Chla and TP); the output is one
of the five reservoir water quality ratings.
Table 2. Water quality parameter classification
Water quality constituent \ Level      Excellent 1  Good 2   Average 3  Fair 4   Poor 5
Secchi disk depth (SD) *               > 4.5        4.5-3.7  3.7-2.3    2.3-1.7  < 1.7
Chlorophyll a (Chla) * 10
Total phosphorus (TP) * 40
Carlson's Trophic State Index (CTSI) 70
* Trophic state as a function of nutrient levels defined by OECD
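The SD row of Table 2 is the only row whose thresholds survive in full, so the Python sketch below grades a Secchi disk depth reading from that row alone. It is illustrative only: the table lists shared boundaries (for example 3.7 ends one range and starts the next), and assigning a boundary value to the better grade is an assumption, as are the function name and the example readings.

```python
GRADE_NAMES = {1: "Excellent", 2: "Good", 3: "Average", 4: "Fair", 5: "Poor"}

def sd_grade(sd):
    """Map a Secchi disk depth reading to a class using the SD row of Table 2."""
    if sd > 4.5:
        return 1   # Excellent: > 4.5
    if sd >= 3.7:
        return 2   # Good: 4.5-3.7 (boundary handling is an assumption)
    if sd >= 2.3:
        return 3   # Average: 3.7-2.3
    if sd >= 1.7:
        return 4   # Fair: 2.3-1.7
    return 5       # Poor: < 1.7

# Hypothetical readings, one per grade.
sd_examples = [5.0, 4.0, 3.0, 2.0, 1.0]
sd_classes = [sd_grade(v) for v in sd_examples]
sd_labels = [GRADE_NAMES[c] for c in sd_classes]
```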
3.2. Analytical results
To demonstrate the effectiveness of the OAO-LSSVM model, its
predictive performance is compared with that of single multi-class
classification algorithms: Sequential Minimal Optimization (SMO),
the Multiclass Classifier, Naïve Bayes, Logistic and the library for
support vector machines (LibSVM). Performance is evaluated in terms
of accuracy, precision, sensitivity, specificity, AUC and S. Table 3
compares the performance of the SMO, Multiclass Classifier, Naïve
Bayes, Logistic, LibSVM and OAO-LSSVM models when used to predict
reservoir water quality on the test data. The numerical results show
that the OAO-LSSVM model achieved the highest accuracy, precision,
sensitivity, AUC and S values (92.196%, 90.794%, 90.633%, 91.405%
and 91.421%, respectively). Its specificity (92.078%) is lower than
that of the Naïve Bayes (99.500%) and Multiclass Classifier
(94.900%) models, but its accuracy and S value, the indexes most
commonly used when comparing multi-class classification models, are
the highest. Figure 1 ranks the models by overall average
performance score (S).
Figure 1. Comparison of models in terms of overall average
performance score (S)
Table 3. Comparison of other multi-class models and the proposed model
Multi-class model       Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  AUC (%)  S (%)
SMO                     75.238        75.200         77.500           85.900           81.705   79.109
Multiclass Classifier   85.397        85.400         86.500           94.900           90.71    88.581
Naïve Bayes             76.000        76.000         78.700           99.500           89.15    83.870
Logistic                89.580        89.600         89.600           90.600           90.36    89.948
LibSVM                  80.950        81.000         81.000           87.600           84.306   82.971
OAO-LSSVM               92.196        90.794         90.633           92.078           91.405   91.421
[Figure 1: bar chart of S by model. Horizontal axis: Percentage (%), 70 to 100. Vertical axis: Predicted models. Bars: SMO 79.109, Multiclass Classifier 88.581, Naïve Bayes 83.870, Logistic 89.948, LibSVM 82.971, OAO-LSSVM 91.421]
4. Conclusions
This study investigates the effectiveness of a hybrid intelligence
model that integrates the LSSVM algorithm with the OAO decomposition
strategy to improve predictive accuracy in multi-class problems,
here the classification of water quality into five levels:
Excellent, Good, Average, Fair and Poor. The OAO-LSSVM model is
compared with the SMO, Multiclass Classifier, Naïve Bayes, Logistic
and LibSVM models. The proposed model yields a higher predictive
accuracy (92.196%) and overall average performance score (91.421%)
than the other models. The OAO-LSSVM model can therefore be used as
a potential tool for classifying reservoir water quality. In further
study, the author hopes to improve the proposed model to obtain a
more robust classification model for water quality and to handle
more multi-class classification problems in the real world.
REFERENCES
[1] S. Palani, S.-Y. Liong, P. Tkalich, An ANN application for water
quality forecasting, Marine Pollution Bulletin 56 (2008) 1586-1597.
[2] K.P. Singh, A. Basant, A. Malik, G. Jain, Artificial neural network
modeling of the river water quality - A case study, Ecological
Modelling 220(6) (2009) 888-895.
[3] Y. Liao, J. Xu, W. Wang, A Method of Water Quality Assessment
Based on Biomonitoring and Multiclass Support Vector Machine,
Procedia Environmental Sciences 10 (2011) 451-457.
[4] A.C. Lorena, A.C.P.L.F. de Carvalho, J.M.P. Gama, A review on the
combination of binary classifiers in multiclass problems, Artificial
Intelligence Review 30(1) (2009) 19.
[5] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F.
Herrera, An overview of ensemble methods for binary classifiers
in multi-class problems: Experimental study on one-vs-one and
one-vs-all schemes, Pattern Recognition 44(8) (2011) 1761-
1776.
[6] M. Galar, A. Fernández, E. Barrenechea, F. Herrera, DRCW-OVO:
Distance-based relative competence weighting combination for
One-vs-One strategy in multi-class problems, Pattern Recognition
48(1) (2015) 28-42.
[7] S. Kang, S. Cho, P. Kang, Constructing a multi-class classifier using
one-against-one approach with different binary classifiers,
Neurocomput. 149(PB) (2015) 677-682.
[8] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H.
Witten, The WEKA data mining software: an update, SIGKDD
Explor. Newsl. 11(1) (2009) 10-18.
[9] J.-S. Chou, A.-D. Pham, Nature-inspired metaheuristic optimization
in least squares support vector regression for obtaining bridge scour
information, Information Sciences 399 (2017) 64-80.
[10] H. Chih-Wei, L. Chih-Jen, A comparison of methods for multiclass
support vector machines, IEEE Transactions on Neural Networks
13(2) (2002) 415-425.
[11] R. Rifkin, A. Klautau, In Defense of One-Vs-All Classification, J.
Mach. Learn. Res. 5 (2004) 101-141.
[12] J.A.K. Suykens, T.V. Gestel, J.D. Brabanter, B.D. Moor, J.
Vandewalle, Least squares support vector machines, World
Scientific, Singapore, 2002.
[13] N. García-Pedrajas, D. Ortiz-Boyer, An empirical study of binary
classifier fusion methods for multiclass classification, Information
Fusion 12(2) (2011) 111-130.
[14] J.-S. Chou, C.-F. Tsai, Y.-H. Lu, Project dispute prediction by
hybrid machine learning techniques, Journal of Civil Engineering
and Management 19(4) (2013) 505-517.
[15] M. Sokolova, G. Lapalme, A systematic analysis of performance
measures for classification tasks, Information Processing &
Management 45(4) (2009) 427-437.
[16] N.T.U. Hydrotech Research Institute, Reservoir eutrophication
prediction and prevention by using remote sensing technique. Water
Resources Agency (in Chinese) (2005).
(The Board of Editors received the paper on 19/9/2018, its review was completed on 31/10/2018)