FAST GAUSSIAN DISTRIBUTION BASED ADABOOST ALGORITHM 
FOR FACE DETECTION 
Tuan M. Pham1, Hao P. Do2, Danh C. Doan2, Hoang V. Nguyen2 
1University of Science and Technology - The University of Danang; pmtuan@dut.udn.vn 
2Hippo Tech Vietnam; {haodophuc, danhdoan.es, nguyenviethoang.25}@gmail.com 
Abstract - In the past few years, Paul Viola and Michael J. Jones 
have successfully developed a face detection approach which has 
been widely applied in many detection systems. Even though its 
efficiency and robustness are proven in both performance and 
accuracy, there are still a number of improvements that can be 
applied to enhance their algorithm. This paper builds on the 
Viola-Jones face detection framework and introduces two key 
contributions. First, the integral image technique is modified so 
that features become more informative, which helps increase 
detection performance. The second contribution is a new AdaBoost 
variant that uses the Gaussian probability distribution to measure 
how close a feature value lies to the means of the positive and 
negative distributions, and thereby classifies examples more 
efficiently. Furthermore, our experiments show that a small 
fraction of the feature set is sufficient to build a good strong 
classifier instead of the whole feature set. As a result, both the 
required memory and the training time are greatly reduced. 
Key words - face detection; Gaussian distribution; AdaBoost; 
Haar-like pattern; weak classifier 
1. Introduction 
In recent decades, along with rapid advances in technology, 
face detection has become one of the most popular topics, with 
applications in many fields of industry and daily life. Face 
detection algorithms have developed quickly and grown more 
capable, supporting complicated applications such as multi-view 
face detection [1-4], occluded face detection [3, 5], and 
pedestrian detection [6, 7]. In this paper, we build on the work 
of Viola-Jones [8], which has proven successful in both accuracy 
and performance. 
Thanks to their work, a number of practical real-time 
applications and systems have been built for face detection and 
related tasks. We propose new methods for feature extraction and 
implementation so that the system trains and detects faster than 
the original Viola-Jones system while using less memory. In 
addition, new Haar-like patterns are proposed to improve the 
efficiency of detection. 
Our face detection system makes two main contributions; they 
are introduced briefly below and in detail in the following 
sections. 
First, we utilize the integral image representation as the 
main tool to quickly compute feature values. Moreover, it becomes 
even more effective when applied to the non-integer-sized patterns 
described in Section 3; in this way, a given feature yields much 
more of the information needed by the classification process. At 
this step, the Viola-Jones system pre-calculates feature values 
and stores them on the hard drive. This is useful during training 
because all information is already computed, but the calculation 
and the hard-drive I/O take a significantly long time. Instead of 
following their method, we introduce another approach. Given the 
size of the data set, the image size and the feature patterns, a 
lookup table used for feature indexing can be generated separately 
before the training procedure starts, and a specific feature value 
is computed only when needed. This is more efficient than the 
previous work [8] not only in performance but also in memory 
consumption. 
The second enhancement is in how AdaBoost [9] is applied. 
Viola-Jones used thresholds to classify positive and negative 
example images. Multiple weak classifiers are combined to form a 
strong one that can separate the training data well, but may fail 
at test time. The detection quality depends heavily on how 
correctly the AdaBoost function classifies data. When the positive 
and negative distributions overlap, it is clearly difficult to 
choose a good threshold between the two regions. In this paper we 
apply the Gaussian probability distribution to the classification 
task. Why we use this method and how we apply it within the 
AdaBoost process are described in Section 3, where pseudo-code is 
also provided. Moreover, the Gaussian test requires fewer 
operations than thresholding, so both training and detection time 
are reduced. 
a. Overview 
We start by reviewing the Viola-Jones system and pointing out 
the functions that we aim to improve; the review is given in 
Section 2. In Section 3, we describe and explain how we choose and 
implement our new algorithms and why they are effective in a face 
detection system. After that, experiments and results are 
presented in Section 4. 
2. Review of Viola-Jones Algorithms 
a. Haar-like patterns 
In every vision system, efficiency and accuracy depend 
strongly on the features used and their quality. Feature design 
and computation are key to the success of a computer vision or 
machine learning system. To extract features from example images, 
Viola-Jones used the Haar-like rectangle features shown in 
Figure 1. 
Figure 1. Haar-like rectangle patterns in Viola-Jones system 
The feature value of a given pattern is the difference between 
the sums of pixel values under its white and black rectangles. 
Computed naively, this takes O(HW) time, where H and W are the 
height and width of the pattern. 
The integral image representation is one of the contributions 
of their paper; it is used to compute feature values rapidly. It 
is defined as: 

$$ii(x, y) = \sum_{0 \le x' \le x,\; 0 \le y' \le y} i(x', y')$$

where $i(x, y)$ is the image intensity at pixel $(x, y)$ and 
$ii(x, y)$ is the value of the integral image at pixel $(x, y)$. 
Using the integral image, any rectangular sum can be 
calculated from a few pre-computed integral-image values, so the 
complexity is O(1). 
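A minimal Java sketch of this technique is shown below; the class and method names are illustrative, not from the paper, and the padded first row and column avoid boundary checks.

    public final class IntegralImage {
        private final long[][] ii; // ii[y][x] = sum of all pixels with x' < x and y' < y

        public IntegralImage(int[][] image, int h, int w) {
            ii = new long[h + 1][w + 1]; // extra row/column of zeros
            for (int y = 1; y <= h; y++)
                for (int x = 1; x <= w; x++)
                    ii[y][x] = image[y - 1][x - 1]
                             + ii[y - 1][x] + ii[y][x - 1] - ii[y - 1][x - 1];
        }

        // Sum of pixels in the rectangle [x0, x1) x [y0, y1): four lookups, O(1).
        public long rectSum(int x0, int y0, int x1, int y1) {
            return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0];
        }
    }

A pattern's feature value is then a signed combination of such sums: the white-rectangle sums minus the black-rectangle sums.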
In their detection system, they trained and tested on 24x24 
grayscale PNG images. For each image, they applied the five 
rectangle patterns, varying the height, width and position of each 
rectangle. Since the image size and feature patterns are known, 
the number of features applicable to an image can be computed in 
advance: there were 43200, 43200, 27600, 27600 and 20736 features 
for categories (a), (b), (c), (d) and (e) respectively, or 162336 
features in total. 
b. AdaBoost algorithm 
Consequently, a huge number of features correspond to each 
image sub-window, so evaluating all of them remains costly even 
when each feature can be computed quickly and efficiently. 
However, a detection system can form an effective classifier from 
only a very small set of features. 
AdaBoost [9] can be used both to select these essential 
features and to train a strong classifier. AdaBoost constructs a 
strong classifier as a linear combination of weak classifiers: 

$$F(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

where $x$ is an example (a 19x19 image in our system) and $h_t(x)$ 
is a weak or basis classifier. Normally, the set 
$\mathcal{H} = \{h\}$ is finite. 
A weak classifier is a function of a feature $f$, a threshold 
$\theta$ and a polarity $p$ that denotes the direction of the 
inequality: 

$$h(x, f, p, \theta) = \begin{cases} 1 & \text{if } p \cdot f(x) < p \cdot \theta \\ 0 & \text{otherwise} \end{cases}$$
We do not expect the best classification from a weak learner. 
After each round of AdaBoost, the weak classifier with the 
smallest weighted error is chosen: 

$$\hat{h}_t = \arg\min_{h_j \in \mathcal{H}} \varepsilon_j = \sum_i w_i \left| h_j(x_i, f, p, \theta) - y_i \right|$$

where $y_i$ is the correct label for example $x_i$: $y_i$ is 1 if 
$x_i$ represents a face, and 0 otherwise. 
In addition, every example is re-weighted so that it is 
emphasized appropriately in the next training round; an example 
that is incorrectly labeled in the current round receives a 
greater weight than correctly labeled ones: 

$$w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$$

where $e_i = 0$ if $x_i$ is correctly classified, $e_i = 1$ 
otherwise, and $\beta_t = \frac{\varepsilon_t}{1-\varepsilon_t}$, 
with $\varepsilon_t$ the weighted error of round $t$. 
AdaBoost is simple to implement and efficiently extracts good 
features from a very large set. One disadvantage of the algorithm 
is that overfitting can result from choosing an overly complex 
training model, which makes this the key challenge in applying the 
method. 
A graphical visualization of the AdaBoost process after each 
round is shown in Figure 2 below. In this example, we need to 
separate blue dots from red ones; AdaBoost is applied to find the 
best weak classifier for the two regions in each round. 
Figure 2. Visualization for AdaBoost process after t=1 and t=3 
After the first round, one weak classifier is chosen, depicted 
by the black straight line; the overall detection quality is still 
very low. However, by combining three weak classifiers, the 
accuracy improves significantly. 
In the Viola-Jones system, AdaBoost tries to find the optimal 
threshold classification function for each weak learner. Suppose 
there are N image examples and K features per image; there are 
then KN feature/threshold combinations. For the data set used by 
Viola-Jones, K was 162336 features and N was 6977 training images. 
In each training round, the AdaBoost procedure must iterate over 
the whole data set to evaluate the training error of a 
feature/threshold combination, so the complexity per round is 
$O(KN \cdot N) = O(KN^2)$ to find a weak classifier. Setting the 
number of weak classifiers to M, the total training complexity is 
$O(MKN^2)$. With M = 200, at least $1.58 \times 10^{15}$ 
operations must be processed on any training machine. Even on a 
supercomputer, this is a tremendous procedure and takes a 
significantly long time to finish. 
To improve the system and reduce the training time, they 
proposed a modified algorithm. For a specific feature, the optimal 
threshold can be found using the current example weights, without 
generating all combinations of that feature and every image 
example. To apply this algorithm, the examples are first sorted by 
their feature values for each feature, which costs 
$O(N\log_2 N)$; afterwards, only $O(N)$ is needed to find the 
optimal threshold for the current feature, as sketched below. 
Hence, the complexity of this sub-task is 
$O(\max(N, N\log_2 N)) = O(N\log_2 N)$. This algorithm reduces the 
total complexity to $O(MKN\log_2 N)$, a substantial decrease in 
the number of operations, to about $2.89 \times 10^{12}$. 
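The single-pass threshold search can be sketched in Java as follows, assuming the examples are already sorted by feature value. The running sums follow the usual T+/T-/S+/S- bookkeeping of this Viola-Jones training trick; the variable names are ours, not the paper's.

    // Returns the smallest weighted error achievable by any threshold on this
    // feature. values must be sorted ascending; labels are 1 (face) or 0 (non-face).
    static double bestThresholdError(double[] values, int[] labels, double[] weights) {
        double tPos = 0, tNeg = 0;               // total positive / negative weight
        for (int i = 0; i < values.length; i++)
            if (labels[i] == 1) tPos += weights[i]; else tNeg += weights[i];

        double sPos = 0, sNeg = 0, best = Double.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            // Threshold placed just below example i: everything before i is on one side.
            double err = Math.min(sPos + (tNeg - sNeg),   // label examples below as negative
                                  sNeg + (tPos - sPos));  // label examples below as positive
            best = Math.min(best, err);
            if (labels[i] == 1) sPos += weights[i]; else sNeg += weights[i];
        }
        return best;                              // O(N) after the O(N log N) sort
    }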
Pseudo-code for Viola-Jones' algorithm is shown below: 
1. Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where 
   $y_i = 0, 1$ for negative and positive examples respectively. 
2. Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for 
   $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of 
   negatives and positives respectively. 
3. For $t = 1, \ldots, T$: 
   a. Normalize the weights: 
      $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$ 
   b. Select the best weak classifier with respect to the 
      weighted error: 
      $\varepsilon_t = \min_{f,p,\theta} \sum_i w_i \left| h(x_i, f, p, \theta) - y_i \right|$ 
   c. Define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, 
      $p_t$ and $\theta_t$ are the minimizers of $\varepsilon_t$. 
   d. Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, 
      where $e_i = 0$ if example $x_i$ is classified correctly, 
      $e_i = 1$ otherwise, and 
      $\beta_t = \frac{\varepsilon_t}{1-\varepsilon_t}$. 
4. Combine the weak classifiers into the strong classifier. 
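Step 4 is the standard Viola-Jones combination rule: a window is accepted when the weighted vote of the chosen weak classifiers exceeds half the total weight. A minimal Java sketch, with an illustrative WeakClassifier interface of our own:

    // Each trained weak classifier returns 0 or 1 for an image window.
    interface WeakClassifier { int classify(int[][] window); }

    // alpha[t] = ln(1 / beta_t), the weight of the t-th weak classifier.
    static int strongClassify(int[][] window, WeakClassifier[] h, double[] alpha) {
        double score = 0, half = 0;
        for (int t = 0; t < h.length; t++) {
            score += alpha[t] * h[t].classify(window);
            half  += 0.5 * alpha[t];
        }
        return score >= half ? 1 : 0; // 1 = face
    }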
3. Proposed Method to Improve the Viola-Jones System 
The main purpose of this paper is to propose a new method that 
performs detection faster than the traditional Viola-Jones system, 
so we choose the same Haar-like rectangle features used in their 
system. Besides, we introduce new features that are more efficient 
for face detection systems as the complexity of the task 
increases, for instance when detecting rotated faces. 
3.1. New feature selection 
The Viola-Jones system used integer-sized rectangles. In our 
research, we use the same rectangles but with non-integer sizes. 
With this method, features can be more informative, so the 
detection performance is higher than that of the Viola-Jones 
system. Figure 3 shows an example of the new non-integer-sized 
rectangles: a 2x2 sub-window of an image over which the rectangle 
feature has height 3. 
Figure 3. A 2x2 sub-window image with a non-integer-sized feature 
Because the sizes are non-integer numbers, feature values are 
represented by floating-point numbers, which introduces new 
difficulties when systems use complicated feature patterns. For 
this problem, we devised an approach that computes such feature 
values in a few operations with O(1) complexity, so the processing 
time is nearly the same as for the traditional feature 
calculation. 
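One way to realize such an O(1) computation, given here as an illustrative sketch rather than the exact scheme of our system, is to read the integral image at fractional corners by bilinear interpolation; this is exact when each pixel is treated as a constant-intensity cell. ii is the padded table built earlier, and coordinates are assumed to lie strictly inside it.

    // Integral-image value at a possibly fractional coordinate (hypothetical helper).
    static double iiAt(long[][] ii, double x, double y) {
        int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
        double fx = x - x0, fy = y - y0;
        return (1 - fx) * (1 - fy) * ii[y0][x0]     + fx * (1 - fy) * ii[y0][x0 + 1]
             + (1 - fx) * fy       * ii[y0 + 1][x0] + fx * fy       * ii[y0 + 1][x0 + 1];
    }

    // Sum over the fractional rectangle [x0, x1) x [y0, y1): still four corner reads.
    static double fracRectSum(long[][] ii, double x0, double y0, double x1, double y1) {
        return iiAt(ii, x1, y1) - iiAt(ii, x0, y1) - iiAt(ii, x1, y0) + iiAt(ii, x0, y0);
    }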
To apply new rectangle features, users only need to specify 
the new pattern clearly before training their models. Below is an 
example for pattern (d) in Figure 1; the matrix is the color map 
of the feature, with 1 for white and -1 for black: 

     1  1  1 
    -1 -1 -1 
     1  1  1 
Using the color map above, we can quickly compute the feature 
value of each rectangle with non-integer size. The size and 
position of a feature vary within the 19x19 image, and given the 
size of a pattern we can manage the number of resulting features. 
This leads to another improvement in our system: pre-calculating 
and storing feature values is no longer required. In the 
Viola-Jones approach, the hard drive had to store the whole set of 
feature values, which cost a tremendously long time to make the 
training data available and consumed huge storage; neither is 
essential in our system. 
There are 29241, 29241, 23409, 23409 and 29241 features for 
categories (a), (b), (c), (d) and (e) respectively. 
Thanks to the constraints on rectangle sizes, we separately 
build a lookup table from the given information about those 
rectangles. Whenever a single feature value is required, our 
system can immediately look up the exact feature together with its 
size and position, then compute the value by the formula mentioned 
above. To generate this lookup table, we simply apply a 
brute-force algorithm that iterates over all feature placements, 
as sketched below. 
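A hedged sketch of that brute-force enumeration: every valid placement of a pattern inside the window is recorded once, so a feature index later maps straight to its geometry. FeatureInfo and the unit position step are illustrative assumptions, not the paper's exact data layout.

    import java.util.ArrayList;
    import java.util.List;

    // Geometry of one feature: pattern id, top-left corner, width and height.
    record FeatureInfo(int pattern, double x, double y, double w, double h) {}

    static List<FeatureInfo> buildLookupTable(int winSize, int pattern,
                                              double baseW, double baseH) {
        List<FeatureInfo> table = new ArrayList<>();
        for (int mh = 1; mh * baseH <= winSize; mh++)         // height multiples
            for (int mw = 1; mw * baseW <= winSize; mw++)     // width multiples
                for (int y = 0; y + mh * baseH <= winSize; y++)
                    for (int x = 0; x + mw * baseW <= winSize; x++)
                        table.add(new FeatureInfo(pattern, x, y, mw * baseW, mh * baseH));
        return table; // feature index i -> table.get(i); nothing is written to disk
    }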
3.2. Gaussian distribution as classification function 
Even though the training time is significantly reduced, it is 
still long. In our research and experiments, we propose a new way 
to train the detection system by applying the Gaussian probability 
distribution instead of finding an optimal threshold for each 
feature. 
Our starting point is that the positive and negative 
distributions of a Haar-like feature are often very hard to 
separate with a single threshold. Combining multiple weak 
classifiers with many thresholds sharply increases the number of 
operations without guaranteeing higher detection accuracy, and it 
can cause overfitting during training. 
Figure 4. Histogram of a specific feature for face and non-face images 
Figure 4 shows the histograms of feature values computed for a 
specific feature on face and non-face images. The blue region 
denotes feature values of face images; non-faces are drawn in 
orange. The x-axis shows feature values, while the y-axis shows 
the normalized frequency of each value. The two histograms clearly 
overlap, making it difficult to select a threshold. In such 
situations a weak classifier performs poorly, training takes 
longer, and the test results suffer as a consequence. 
In this paper, we propose an AdaBoost algorithm that uses the 
Gaussian distribution of feature values. The Gaussian distribution 
is one of the most important probability distributions for 
continuous variables and is widely used in the natural sciences. 
By the central limit theorem, averages of samples of a variable 
drawn from independent distributions converge in distribution to 
the normal; in other words, a quantity becomes approximately 
normally distributed when we have enough observations. The 
Gaussian distribution of a single real-valued variable $x$ is: 

$$f_g(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\mu$ is the mean and $\sigma^2$ is the variance of the 
sequence of feature values of the data set for a specific feature. 
To overcome this overlapping problem, we apply the Gaussian 
distribution to calculate and compare the distances to the two 
means of the positive and negative distributions. The 
classification function is: 

$$h(x, f, \mu_p, \sigma_p^2, \mu_n, \sigma_n^2) = \begin{cases} 1 & \text{if } f_g(x \mid \mu_p, \sigma_p^2) > f_g(x \mid \mu_n, \sigma_n^2) \\ 0 & \text{otherwise} \end{cases}$$

where $f_g(x \mid \mu, \sigma^2)$ is the Gaussian function 
mentioned above, $x$ is a feature value of an example image, and 
$\mu_p, \sigma_p^2, \mu_n, \sigma_n^2$ are the means and variances 
of the positive and negative distributions respectively. 
Our method proceeds as follows. For a given feature, the 
feature values of all image examples are computed. Next, we 
calculate the means and variances of the two distributions. After 
that, the Gaussian distribution formula is applied to find the 
distances to the means before comparison: the more likely an image 
example is to be a face, the closer it lies to the mean of the 
positive distribution; otherwise, it lies closer to the negative 
distribution. With this method, the difficulty caused by 
overlapping is overcome because the means of the two distributions 
are always separated. 
After the current classification finishes, the error is 
computed to select the best feature of the current AdaBoost round, 
and the weights of the image examples are re-computed to emphasize 
incorrect classifications in later training. 
Our procedure is described in the pseudo-code below. The only 
difference from Viola-Jones' algorithm is the classification 
function, which uses the Gaussian method. 
1. Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where 
   $y_i = 0, 1$ for negative and positive examples respectively. 
2. Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for 
   $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of 
   negatives and positives respectively. 
3. For $t = 1, \ldots, T$: 
   a. Normalize the weights: 
      $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$ 
   b. Select $K'$ features from the full feature set. 
   c. For each feature: 
      i. Compute feature values for every example. 
      ii. Calculate the means and variances of the positive and 
          negative distributions. 
      iii. Select the best weak classifier, i.e. the one 
           minimizing the error: 
           $\varepsilon_t = \min_f \sum_i w_i \left| h(x_i, f, \mu_p, \sigma_p^2, \mu_n, \sigma_n^2) - y_i \right|$ 
      iv. Define $h_t(x) = h(x, f_t)$, where $f_t$ is the 
          minimizer of $\varepsilon_t$. 
   d. Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, 
      where $e_i = 0$ if example $x_i$ is classified correctly, 
      $e_i = 1$ otherwise, and 
      $\beta_t = \frac{\varepsilon_t}{1-\varepsilon_t}$. 
4. Combine the weak classifiers into the strong classifier. 
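A hedged Java sketch of steps c.ii-c.iii for one feature follows. The paper leaves open whether the class statistics are example-weighted, so plain per-class estimates are used here as an assumption; the error itself is weighted exactly as in the pseudo-code.

    // f[i]: value of the current feature on example i; y[i]: label; w[i]: weight.
    static double gaussianWeakError(double[] f, int[] y, double[] w) {
        double mp = 0, mn = 0, vp = 0, vn = 0;
        int np = 0, nn = 0;
        for (int i = 0; i < f.length; i++)            // per-class means
            if (y[i] == 1) { mp += f[i]; np++; } else { mn += f[i]; nn++; }
        mp /= np; mn /= nn;
        for (int i = 0; i < f.length; i++)            // per-class variances
            if (y[i] == 1) vp += (f[i] - mp) * (f[i] - mp);
            else           vn += (f[i] - mn) * (f[i] - mn);
        vp /= np; vn /= nn;

        double err = 0;
        for (int i = 0; i < f.length; i++) {
            int h = gaussianPdf(f[i], mp, vp) > gaussianPdf(f[i], mn, vn) ? 1 : 0;
            err += w[i] * Math.abs(h - y[i]);         // weighted misclassification
        }
        return err;
    }

    static double gaussianPdf(double x, double mu, double var) {
        return Math.exp(-(x - mu) * (x - mu) / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

The direct density comparison above still calls Math.exp once per example; the following paragraphs derive the logarithmic form that removes it.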
In our method, computing the mean and variance of each 
positive or negative distribution takes linear time, O(N), using 
only simple operations, and the total complexity of our algorithm 
is O(MKN). However, the use of the Gaussian distribution makes 
floating-point operations unavoidable, and exponentiation of 
floating-point numbers is far more complicated than simple 
arithmetic. In our system, this operation would be needed to 
evaluate $e^x$ inside the classification function 
$f_g(x \mid \mu_p, \sigma_p^2)$. Although the complexity is 
O(MKN), it can cost even more time than the $O(MKN\log_2 N)$ 
algorithm if those operations are computed naively. Thus, for this 
step, instead of evaluating exponentials we work with the inverse 
operation, the natural logarithm $\ln(x)$: in our experiments, it 
takes much less time to compare logarithms than to evaluate $e^x$. 
By simplifying the Gaussian expressions in this way, the number of 
operations is also remarkably reduced. 
If an image is a face, the ratio of the Gaussian functions of 
the two distributions should be greater than 1: 

$$\frac{\dfrac{1}{\sqrt{2\pi\sigma_p^2}}\, e^{-\frac{(x-\mu_p)^2}{2\sigma_p^2}}}{\dfrac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-\frac{(x-\mu_n)^2}{2\sigma_n^2}}} > 1$$

or, 

$$\frac{\sqrt{\sigma_n^2}}{\sqrt{\sigma_p^2}} > e^{\frac{(x-\mu_p)^2}{2\sigma_p^2} - \frac{(x-\mu_n)^2}{2\sigma_n^2}}$$
Because both sides are positive, the inequality is preserved 
when squaring, which doubles the exponent on the right-hand side: 

$$\frac{\sigma_n^2}{\sigma_p^2} > e^{\frac{(x-\mu_p)^2}{\sigma_p^2} - \frac{(x-\mu_n)^2}{\sigma_n^2}}$$
Moreover, since the natural exponential is a one-to-one, 
increasing function, we can take the natural logarithm of both 
sides: 

$$\ln\!\left(\frac{\sigma_n^2}{\sigma_p^2}\right) > \frac{(x-\mu_p)^2}{\sigma_p^2} - \frac{(x-\mu_n)^2}{\sigma_n^2}$$
At this point, the above inequality can be used directly for 
the comparison, and all of its operations are cheap to compute. 
The number of operations is greatly reduced and the computation is 
now much simpler. 
The following pseudo-code summarizes our approach: 
1. Compute feature values for the whole data set. 
2. Find the mean $\mu$ and variance $\sigma^2$ of the positive 
   and negative distributions. 
3. Use the natural logarithm to find 
   $a = \ln\!\left(\frac{\sigma_n^2}{\sigma_p^2}\right)$. 
4. Find the value of 
   $b = \frac{(x-\mu_p)^2}{\sigma_p^2} - \frac{(x-\mu_n)^2}{\sigma_n^2}$. 
5. Compare $a$ and $b$: return 1 if $a > b$, otherwise 0. 
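A minimal Java sketch of steps 3-5 follows. Since $a$ depends only on the two variances, it can be computed once per feature rather than once per example:

    // Log-form Gaussian comparison: returns 1 (face) iff the density ratio exceeds 1.
    static int classifyLogForm(double x, double mp, double vp, double mn, double vn) {
        double a = Math.log(vn / vp);                          // step 3, once per feature
        double b = (x - mp) * (x - mp) / vp
                 - (x - mn) * (x - mn) / vn;                   // step 4, per example
        return a > b ? 1 : 0;                                  // step 5
    }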
4. Experiments and Results 
a. Experiments 
As mentioned before, to train and test the system we use the 
MIT CBCL face data set [10], which contains 19x19 grayscale PGM 
images. 
In this procedure, we conduct experiments to evaluate and 
compare the processing time of the original method and our 
proposed one. Before preparing the lookup table and training, we 
normalize the whole data set, both training and test images [11], 
so that all images follow the same standard. Samples of images 
before and after normalization are shown below. 
Figure 5. Original images before normalization 
Figure 6. Images after normalization 
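The exact procedure follows [11]; a minimal sketch of one common scheme, per-image zero-mean unit-variance normalization, is given here as an illustrative assumption:

    // Normalize a grayscale image to zero mean and unit variance (our assumption).
    static double[][] normalize(int[][] img, int h, int w) {
        double mean = 0, var = 0;
        for (int[] row : img) for (int v : row) mean += v;
        mean /= (double) (h * w);
        for (int[] row : img) for (int v : row) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / (h * w)) + 1e-9; // guard against flat images
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = (img[y][x] - mean) / std;
        return out;
    }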
In the experiments, we implement our algorithm from scratch in 
Java. The system runs on an ordinary computer with macOS, 8 GB of 
1600 MHz DDR3 memory and a 2.6 GHz Intel Core i5 processor. The 
hard-drive capacity is not specified because our system uses only 
RAM; unlike in Viola-Jones's system, the hard drive is not 
mandatory. 
b. Results 
Theoretically, in each round of the AdaBoost process there are 
in total 134541 candidate features from which to choose the best 
weak classifier. Our analysis and experiments show that it is 
unnecessary to test all of them; instead, a small number suffices 
to reduce training time while maintaining accuracy. From 
experiments with different numbers of features per training round, 
we found that K' = 5000 features are sufficient to satisfy both 
criteria. 
The graph in Figure 7 compares choosing K' = 5000 features 
with using the whole set of 134541 features. Both AdaBoost 
variants are involved: Viola-Jones with thresholds and the 
proposed method with the Gaussian distribution. The Viola-Jones 
approaches are drawn as the blue and gray dashed lines. 
Figure 7. Processing time and accuracy of the two methods 
with different numbers of chosen features on the training image set 
In this experiment, we do not follow the Viola-Jones method of 
storing feature values on the hard drive, because of the long 
processing time of that storage; in our proposed method the lookup 
table is used instead to reduce the training time. 
The x-axis is the processing time in seconds and the y-axis is 
the corresponding accuracy in percent. When the Viola-Jones 
threshold method is applied to classify face/non-face images, a 
weak classifier takes approximately 760 seconds if the whole 
feature set is tested, versus 30 seconds if the procedure uses 
5000 features. With the Gaussian distribution the processing time 
decreases further: it requires 600 seconds for the full feature 
set and 22 seconds if 5000 features are chosen. 
Figure 8. Results of the experiment on the testing image set 
This experiment proves that the detection system can still 
achieve high performance without using the whole feature set, and 
the figure shows that applying the Gaussian distribution beats the 
original Viola-Jones method in both cases. These results were 
obtained on the training image set; a similar result holds for the 
testing image set and is shown in Figure 8. 
In these experiments, we measure detection accuracy under a 
fixed processing time of about one hour. In this period, 
Viola-Jones' AdaBoost produces 5 and 125 weak classifiers for the 
whole set and the 5000-feature set respectively; with the Gaussian 
distribution algorithm, 6 and 160 weak classifiers are chosen. 
We also conduct another experiment that fixes T, the number of 
weak classifiers. For T = 200, a Viola-Jones system that 
pre-computes all feature values, stores them on the hard drive and 
then trains from those values takes 46 seconds to compute and 
choose one weak classifier, so the complete training process 
requires approximately 153 minutes (2 hours 33 minutes). With the 
same T, but still using thresholds in the AdaBoost procedure, our 
new system needs 30 seconds per weak classifier, even though the 
floating-point computation is more involved. The total training 
time is now about 96 minutes (1 hour 36 minutes), roughly two 
thirds of the Viola-Jones training time. 
The training time drops significantly once the Gaussian 
probability distribution is applied: the processing time per weak 
classifier decreases to 22 seconds, so the training process takes 
about half of the original time, 76 minutes (1 hour 16 minutes). 
With the Gaussian probability distribution, the number of 
operations is reduced and training is now twice as fast as in the 
previous work. Moreover, the Viola-Jones method had to use the 
hard drive to pre-compute training data, which took significant 
time as described in the previous section. Taking this factor into 
account, our new method proves far more efficient not only in 
processing time but also in memory usage. 
5. Conclusion 
In this paper, we proposed a new way to apply Haar-like 
patterns and to use the integral image technique for computing 
feature values. For the same feature, much more informative values 
can be extracted, and the detection rate improves accordingly. 
The more important contribution is how we apply the Gaussian 
probability distribution to AdaBoost to improve its performance. 
This avoids the difficulty of choosing optimal thresholds in each 
round of AdaBoost: the classification problem becomes simpler and 
more straightforward, reduced to determining how far a feature 
value lies from the means of the positive and negative 
distributions. Detection is also faster, because classification 
requires fewer operations. 
These two contributions characterize the success of this 
paper. With this system and implementation approach, face 
detection can run on an ordinary computer. Given the performance 
gains, the method can also serve other real-time detection systems 
in practice. 
Acknowledgement 
This research was funded by the Vietnam Ministry of Science 
and Technology Research Project 2017-2018, No. CNTT-10. 
REFERENCES 
[1] Bo Wu, Haizhou Ai, Chang Huang and Shihong Lao, “Fast Rotation 
Invariant Multi-View Face Detection Based on Real AdaBoost”, 
IEEE FGR'04, 2004. 
[2] Paul Viola, Michael J. Jones, “Fast Multi-view Face Detection”, 
Mitsubishi Electric Research Lab TR-2003-96, 2003. 
[3] Shengcai Liao, Anil K. Jain and Stan Z. Li, “A Fast and Accurate 
Unconstrained Face Detector”, 2015. 
[4] T. Mita, T. Kaneko and O. Hori, “Joint Haar-like Features for Face 
Detection”, ICCV, 2005. 
[5] X. P. Burgos-Artizzu, P. Perona, “Robust Face Landmark 
Estimation Under Occlusion”, ICCV, 2013. 
[6] B. Leibe, E. Seemann and B. Schiele, “Pedestrian Detection in 
Crowded Scenes”, CVPR, 2005. 
[7] S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele, 
“Towards Reaching Human Performance in Pedestrian Detection”, 
IEEE PAMI, 2017. 
[8] Paul Viola, Michael J. Jones, “Robust Real-time Face Detection”, 
International Journal of Computer Vision, 2004, pp. 138-143. 
[9] Robert E. Schapire, “Explaining AdaBoost”, in Empirical Inference, 
2013. 
[10] CBCL Face Database. Retrieved from 
datasets/FaceData2.html 
[11] Dwayne Phillips, “Image Processing in C”, 2nd ed., R&D Publications, 
2000. 
(The Board of Editors received the paper on 03/01/2018, its review was completed on 03/4/2018) 