This procedure may lead to misleading results if the trend is not deterministic [7]. A structural time series model decomposes a time series into three components: trend, seasonality, and cycle [8,9]. It is defined by the following equation:
y(t) = g(t) + s(t) + h(t) + ε_t    (1)
where t = 1, …, T, g(t) is a stochastic, non-periodic trend, s(t) is a stationary linear seasonal process with periodic changes (e.g. quarterly or yearly seasonality), and h(t) represents cyclical effects occurring on potentially irregular schedules over one or more days [10].
Many studies supporting the model in practice have been carried out. For instance, Harvey showed that the class of structural models has several advantages over the seasonal ARIMA models commonly adopted and is applicable to modeling cycles in macroeconomic time series [5,11]. Kitagawa and Gersch decomposed time series into trend, seasonal, globally stationary autoregressive, and observation error components with a state space Kalman filter and used Akaike's minimum AIC procedure to select the best of the alternative state space models [12]. Taylor and Letham used structural models for forecasting business time series [10].
The local linear trend is a process that can be regarded as a local approximation to a linear trend. The stochastic linear process can be described as:
y(t) = g(t) + ε_t
g(t) = g(t−1) + β(t−1) + η_t    (2)
β(t) = β(t−1) + ζ_t
where ε_t ~ NID(0, σ_ε²), η_t ~ NID(0, σ_η²), and ζ_t ~ NID(0, σ_ζ²), t = 1, …, T, are white noise disturbance terms with mean zero and variances σ_ε², σ_η², and σ_ζ² respectively, distributed independently of one another [13]. Koopman and Ooms [14] proposed a trend with a stationary drift process that extends the local linear trend by adding a stationary stochastic drift component:
g(t) = g(t−1) + β(t−1) + η_t    (3)
β_t = (1 − φ_β)β̄ + φ_β β_{t−1} + ζ_t
with autoregressive coefficient 0 < φ_β ≤ 1. However, a drawback of this approach is that such drift processes are difficult to identify: they require very large data samples.
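To make the local linear trend of equation (2) concrete, the following minimal NumPy sketch simulates the process; the horizon and the disturbance standard deviations are illustrative assumptions, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 250                                               # number of time steps (illustrative)
sigma_eps, sigma_eta, sigma_zeta = 0.05, 0.02, 0.005  # assumed disturbance scales

g = np.zeros(T)   # level g(t)
b = np.zeros(T)   # slope beta(t)
y = np.zeros(T)   # observed series y(t)

for t in range(1, T):
    b[t] = b[t - 1] + rng.normal(0, sigma_zeta)            # slope follows a random walk
    g[t] = g[t - 1] + b[t - 1] + rng.normal(0, sigma_eta)  # level drifts with the slope
    y[t] = g[t] + rng.normal(0, sigma_eps)                 # observation adds white noise
```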
Taylor and Letham [10] developed new types of trend models. They suggested two types: a saturating growth model and a piecewise linear model (see Figure 1). The saturating growth model is characterized by a growth rate and a limit on population growth, using the nonlinear logistic function:
g(t) = C / (1 + e^{−k(t − m)})    (4)
where e is the natural logarithm base, m is the sigmoid midpoint, C is the maximum capacity value, and k is the growth rate of the curve. In this form, the model cannot capture movement in a dynamic world, because neither the maximum capacity nor the growth rate of the curve is constant. Hence, to overcome these issues, Taylor and Letham defined a time-varying maximum capacity C and growth rate k.
Suppose that we explicitly define S changepoints at times s_j, j = 1, …, S, and a vector of rate adjustments δ ∈ R^S, where δ_j is the change in rate that occurs at time s_j [10]. The saturating growth model is defined as:
g(t) = C(t) / (1 + exp(−(k + a(t)^⊤ δ)(t − (m + a(t)^⊤ γ))))    (5)
where
a_j(t) = 1 if t ≥ s_j, and 0 otherwise
γ_j = (s_j − m − Σ_{l<j} γ_l)(1 − (k + Σ_{l<j} δ_l)/(k + Σ_{l≤j} δ_l))
The maximum capacity C(t) is adopted from an external data source.
From the saturating growth model, we can define a piecewise linear model that does not exhibit saturating growth:
g(t) = (k + a(t)^⊤ δ)t + (m + a(t)^⊤ γ)    (6)
As in the saturating growth model, k is the growth rate, δ holds the rate adjustments, m is the offset parameter, and γ_j is set to −s_j δ_j to make the function continuous.
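The piecewise linear trend of equation (6) can be sketched in a few lines of Python; the changepoint times, rate adjustments, and other parameter values below are made-up illustrations, not the settings used in this study.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, s, delta):
    """Piecewise linear trend g(t) = (k + a(t)^T delta) t + (m + a(t)^T gamma),
    where a_j(t) = 1 if t >= s_j and gamma_j = -s_j * delta_j keeps g continuous."""
    a = (t[:, None] >= s[None, :]).astype(float)  # indicator matrix, shape (T, S)
    gamma = -s * delta                            # offsets keeping the trend continuous
    return (k + a @ delta) * t + (m + a @ gamma)

t = np.arange(100, dtype=float)
s = np.array([30.0, 70.0])        # assumed changepoint times
delta = np.array([0.05, -0.08])   # assumed rate adjustments at the changepoints
g = piecewise_linear_trend(t, k=0.02, m=1.0, s=s, delta=delta)
```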
Figure 1: Fitted daily stock price of Ho Chi Minh City Securities (HCM) stock with the piecewise linear model, from January 3rd, 2017 to February 26th, 2019, in log-scale.
Deep Neural Network
Recurrent Neural Network
Despite the power of deep neural networks, traditional neural networks have two drawbacks [15]. First, the main assumption of standard neural networks is independence among the samples (data points). In other words, traditional neural networks cannot link the current event to previous events to inform later ones, because they preserve no state. In time series analysis, it is widely accepted that the current value depends on past values [4], so the independence assumption fails. Second, traditional neural networks require a fixed-length vector for each sample. Hence, it was critical to develop a new generation of deep neural networks. Rumelhart, Hinton, and Williams (p. 533) introduced a new learning procedure for neural networks with backpropagation, which can capture internal hidden state to "represent important features of the task domain" [16]. Furthermore, with subsequent development, recurrent neural networks can model sequential data with varying length and time dependences. A simple feed-forward recurrent neural network is defined as [17]:
h(t) = σ(W^{hx} x(t) + W^{hh} h(t−1) + b_h)    (7)
ŷ(t) = softmax(W^{yh} h(t) + b_y)    (8)
where h(t) is the hidden state for the input data point at time t. Clearly, h(t) is influenced by h(t−1), the network's previous state. The output ŷ(t) at each time t is calculated from the hidden node values h(t) at time t. W^{hx} is the weight matrix between the input and hidden layer, W^{hh} is the matrix of the hidden-to-hidden transition, and W^{yh} maps the hidden state to the output. In most contexts, h(0) is initialized to zero. Haykin, Principe, Sejnowski, and Mcwhirter suggested that an RNN can achieve stability and higher performance with nonzero initialization [18]. Compared to a traditional fully connected feedforward network, a recurrent neural network takes advantage of sharing parameters across the model, so it does not have to learn separately at each position of a sentence or series [19]. Earlier, Jordan had proposed an architecture almost like [17]; however, its context nodes are fed from the output layer instead of from the hidden layer [20]. This means that Jordan's neural network can take the previous predicted output into account to predict the current output.
h(t) = σ(W^{hx} x(t) + W^{hh} ŷ(t−1) + b_h)    (9)
ŷ(t) = softmax(W^{yh} h(t) + b_y)
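A minimal NumPy sketch of one forward step of the simple recurrent network in equations (7)-(8) may help fix ideas; the dimensions and the use of tanh for the hidden activation are illustrative choices.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def elman_step(x_t, h_prev, W_hx, W_hh, W_yh, b_h, b_y):
    """One forward step of a simple recurrent network, equations (7)-(8).
    The Jordan variant of equation (9) would feed the previous output y(t-1)
    (through an appropriately shaped output-to-hidden matrix) in place of h_prev."""
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)  # hidden state update
    y_t = softmax(W_yh @ h_t + b_y)                  # output distribution
    return h_t, y_t

# Illustrative shapes: 3 inputs, 5 hidden units, 2 outputs
rng = np.random.default_rng(0)
W_hx, W_hh, W_yh = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
h, y = elman_step(rng.normal(size=3), np.zeros(5), W_hx, W_hh, W_yh, np.zeros(5), np.zeros(2))
```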
In terms of training, there are two steps to train a recurrent neural network. First, forward propagation produces the outputs ŷ. After that, the loss function value L(ŷ_k, y_k) of the network at each output node k is computed in the backpropagation stage. There are many types of loss function to measure the distance between the output and the actual value in classification problems. To minimize this distance, we update each of the weights iteratively by applying the backpropagation algorithm [16].
The algorithm applies the derivative chain rule to calculate the derivative of the loss function L with respect to each parameter in the network. In addition, the weights of the neural network are updated by the gradient descent algorithm [15].
Hence, the gradient of the error at a neuron is calculated as:
δ_k = (∂L(ŷ_k, y_k) / ∂ŷ_k) · g′_k(a_k)    (10)
where a_k = w ã_k + b is the input to node k, ã_k is the incoming activation of node k, and g′_k is the derivative of the activation function g_k for node k. The first term ∂L(ŷ_k, y_k)/∂ŷ_k expresses how fast the cost changes as a function of the estimated output. The second term g′_k(a_k) gives the rate of change of the activation function g_k at a_k. In vectorized form, we generalize equation (10) to any layer l:
δ^l = ∇_ŷ C ⊙ g′(a^l)    (11)
In addition, from δ^{l+1}, we can compute the error of layer l as:
δ^l = ((w^{l+1})^⊤ δ^{l+1}) ⊙ g′(a^l)    (12)
and the error with respect to any weight or bias in the neural network:
∂C/∂w^l = ŷ^{l−1} δ^l,  ∂C/∂b^l = δ^l    (13)
From the final layer back to the first hidden layer, for each layer of the neural network, we can apply backpropagation and compute the error vector δ^l with the chain rule repeatedly to update the weight and bias vectors. For local minimum optimization, gradient descent is utilized to find the minimum of the cost function by updating the weight and bias vectors. It is computed as:
w^l → w^l − (η/m) Σ_x δ^{x,l} (ŷ^{x,l−1})^⊤    (14)
b^l → b^l − (η/m) Σ_x δ^{x,l}
where m is the number of training examples in a given mini-batch, x indexes the training examples, and η is the step size (learning rate).
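As a small illustration of the update rule in equation (14), the sketch below averages per-example gradients over a mini-batch and takes one descent step; it assumes the gradients were already obtained by backpropagation, and the names are illustrative.

```python
import numpy as np

def sgd_update(weights, biases, grad_w_batch, grad_b_batch, eta):
    """Average the per-example gradients of a mini-batch and take one gradient descent step."""
    m = len(grad_w_batch)                               # mini-batch size
    weights -= (eta / m) * np.sum(grad_w_batch, axis=0)
    biases -= (eta / m) * np.sum(grad_b_batch, axis=0)
    return weights, biases
```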
In practice, many optimizers have been developed to improve on the limitations of mini-batch gradient descent [21]. For instance, momentum [22] and Nesterov's accelerated gradient [23] were developed to ease the problem of navigating ravines with stochastic gradient descent. The recurrent neural network is a breakthrough for temporal sequences, adding an internal state (memory) to each cell to process sequences of inputs. In terms of training, recurrent neural network parameters can be computed and optimized by feed-forward propagation and backpropagation. For a shallow network with a few hidden layers, the algorithm can be trained effectively. However, with many hidden layers, it is hard to train the network due to the vanishing and exploding gradient problem, as derivatives become too small (e.g. between 0 and 1 for the sigmoid activation function) or too large. This only allows the network to learn short-range dependencies and prevents it from learning long-range dependencies. As a result, the long short-term memory network architecture [24], the rectified linear unit activation function [25], and the residual learning framework of He, Zhang, Ren, and Sun [26] were introduced to overcome this limitation.
Long Short-Term Memory Network
As formally identified by Hochreiter through both theoretical and experimental analysis [27], when the data involve long-term dependencies, the backpropagated error of a recurrent neural network tends to explode or vanish through time, which may lead to oscillating weights or an unusable model. Beyond recurrent neural networks, Bengio, Simard, and Frasconi [28] also pointed out that any deep feed-forward neural network with shared weights may have the vanishing gradient problem. Hochreiter and Schmidhuber (p. 6) developed a new approach called Long Short-Term Memory (LSTM) to fill these gaps by introducing an "input gate unit", an "output gate unit", and a "memory cell" [24]. Accordingly, the purpose of the multiplicative input gate unit is to protect memory contents from irrelevant inputs, and the multiplicative output gate unit protects other units from perturbation by currently irrelevant stored memory contents. In other words, with the new LSTM architecture (see Figure 2), each cell can maintain its state over time and adjust the information flowing in or out. Hence, this new type of neural network architecture is able to capture very long-term temporal dependencies effectively and to handle noise and continuous values, with an unlimited number of states in principle.
Since its introduction, and with the revolution in computational power, LSTM has been widely adopted and applied to many difficult problems in practice and academia, including language modeling [28], text classification [30], language translation [30], and speech recognition [31]. Starting from the original LSTM proposed by Hochreiter and Schmidhuber [24], a significant improvement was the introduction of forget gates to reset out-of-date contents of LSTM memory cells [32]. In addition, to achieve a higher capability of learning precise timings, peephole connections that allow the gates to look at the cell state were added to the LSTM network. A forward pass of the LSTM architecture with forget gate and peephole connections is described as [33]:
z̄^t = W_z x^t + R_z y^{t−1} + b_z    (15)
z^t = g(z̄^t)
ī^t = W_i x^t + R_i y^{t−1} + p_i ⊙ c^{t−1} + b_i
i^t = σ(ī^t)
f̄^t = W_f x^t + R_f y^{t−1} + p_f ⊙ c^{t−1} + b_f
f^t = σ(f̄^t)
c^t = z^t ⊙ i^t + c^{t−1} ⊙ f^t
ō^t = W_o x^t + R_o y^{t−1} + p_o ⊙ c^t + b_o
o^t = σ(ō^t)
y^t = h(c^t) ⊙ o^t
Figure 2: Long Short-Term Memory network architecture. Adapted from Harmon, Klabjan [29].
where z^t is the block input, i^t the input gate, f^t the forget gate, c^t the memory cell, o^t the output gate, and y^t the block output. W_z, W_i, W_f, W_o ∈ R^{N×M} are input weights; R_z, R_i, R_f, R_o ∈ R^{N×N} are recurrent weights; p_i, p_f, p_o ∈ R^N are peephole weights; b_z, b_i, b_f, b_o are bias weights; and g, σ, h are activation functions.
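The forward pass of equation (15) can be sketched directly in NumPy for a single layer; the dimensions below are illustrative, and tanh is assumed for the activations g and h.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, y_prev, c_prev, W, R, p, b):
    """One forward step of an LSTM layer with forget gate and peepholes, equation (15).
    W, R, b are dicts keyed by 'z', 'i', 'f', 'o'; p is keyed by 'i', 'f', 'o'."""
    z_t = np.tanh(W['z'] @ x_t + R['z'] @ y_prev + b['z'])                    # block input
    i_t = sigmoid(W['i'] @ x_t + R['i'] @ y_prev + p['i'] * c_prev + b['i'])  # input gate
    f_t = sigmoid(W['f'] @ x_t + R['f'] @ y_prev + p['f'] * c_prev + b['f'])  # forget gate
    c_t = z_t * i_t + c_prev * f_t                                            # memory cell
    o_t = sigmoid(W['o'] @ x_t + R['o'] @ y_prev + p['o'] * c_t + b['o'])     # output gate
    y_t = np.tanh(c_t) * o_t                                                  # block output
    return y_t, c_t

N, M = 4, 3  # 4 LSTM cells, 3 input features (illustrative)
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(N, M)) for k in 'zifo'}
R = {k: rng.normal(size=(N, N)) for k in 'zifo'}
p = {k: rng.normal(size=N) for k in 'ifo'}
b = {k: np.zeros(N) for k in 'zifo'}
y1, c1 = lstm_step(rng.normal(size=M), np.zeros(N), np.zeros(N), W, R, p, b)
```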
Like the RNN, LSTM is trained with gradient descent, as it is a differentiable function estimator [34]. The backpropagation equations of LSTM are detailed as:
δy^t = Δ^t + R_z^⊤ δz̄^{t+1} + R_i^⊤ δī^{t+1} + R_f^⊤ δf̄^{t+1} + R_o^⊤ δō^{t+1}    (16)
δō^t = δy^t ⊙ h(c^t) ⊙ σ′(ō^t)
δc^t = δy^t ⊙ o^t ⊙ h′(c^t) + p_o ⊙ δō^t + p_i ⊙ δī^{t+1} + p_f ⊙ δf̄^{t+1} + δc^{t+1} ⊙ f^{t+1}
δf̄^t = δc^t ⊙ c^{t−1} ⊙ σ′(f̄^t)
δī^t = δc^t ⊙ z^t ⊙ σ′(ī^t)
δz̄^t = δc^t ⊙ i^t ⊙ g′(z̄^t)
δx^t = W_z^⊤ δz̄^t + W_i^⊤ δī^t + W_f^⊤ δf̄^t + W_o^⊤ δō^t
δW_⋆ = Σ_{t=0}^{T} ⟨δ⋆^t, x^t⟩
δR_⋆ = Σ_{t=0}^{T−1} ⟨δ⋆^{t+1}, y^t⟩
δb_⋆ = Σ_{t=0}^{T} δ⋆^t
δp_i = Σ_{t=0}^{T−1} c^t ⊙ δī^{t+1}
δp_f = Σ_{t=0}^{T−1} c^t ⊙ δf̄^{t+1}
δp_o = Σ_{t=0}^{T−1} c^t ⊙ δō^{t+1}
where ⋆ stands for any of z̄, ī, f̄, ō; Δ^t is the vector of deltas passed down from the layer above; and ⟨·,·⟩ denotes the outer product of two vectors.
It is worth noting that, unlike the forget gate, peephole connections are not always implemented, because omitting them simplifies the LSTM and reduces computational cost without significantly sacrificing performance. For instance, Keras does not support peepholes, while CNTK and TensorFlow do [35,36]. There have been many variant versions of the vanilla LSTM architecture with minor changes. Greff et al. found that the vanilla LSTM (with forget gate and peepholes) achieves reasonable performance on various datasets [33]. Despite the effectiveness of LSTM, there have been many efforts to simplify the architecture, since LSTM requires substantial computational power. The Gated Recurrent Unit (GRU), a variant of LSTM with fewer parameters introduced by Cho et al. [37], achieves reasonable accuracy. However, Britz et al. showed that LSTM still significantly outperforms GRU [38]. Van der Westhuizen et al. made another attempt to save computational costs while maintaining model performance by developing a forget-gate-only version of the LSTM with chrono-initialized biases, which achieves slightly higher accuracy [39].
Sequence to Sequence Model
Sequence to Sequence is a learning model that maps an input sequence to a fixed-sized vector using one LSTM and then uses another LSTM to extract an output sequence from that vector. Sequence to Sequence has been widely applied in machine translation [40], video captioning [41], and time series classification for human activity recognition [42]. Bahdanau et al. used an RNN Encoder-Decoder that contains two recurrent neural networks (or long short-term memory networks) to map an input sequence into another sequence of symbols [43]. In other words, the encoder-decoder architecture is used to encode a sequence, decode the encoded sequence, and recreate the sequence. The approach aims to maximize the conditional probability of the output sequence given an input sequence.
Figure 3: Encoder-Decoder architecture.
The encoder neural network transforms an input sequence of variable length X = x_1, x_2, …, x_T into a fixed-length context variable carrying information about the sequence (see Figure 3). An RNN is most often used as the encoder network. However, Sutskever et al. [40] found that a deep LSTM significantly outperformed shallow LSTMs and RNNs. As mentioned, RNN and LSTM use the previous hidden states h_1, h_2, …, h_{t−2}, h_{t−1} to create the current hidden state h_t. Hence, the hidden state of an input sequence is defined as:
h_t = f(x_t, h_{t−1})    (17)
c = k(h_1, h_2, …, h_T)
where h_t is the hidden state at time t, c is the summary hidden state of the whole input sequence, and the function f(·) can be an RNN, LSTM, or GRU network, or an activation function.
With the summary hidden state c and a target output Y = y_1, y_2, …, y_{T′}, instead of computing P(Y | X) directly, the decoder neural network computes the conditional probability of each output using the previous outputs and the summary hidden state. It is formally described as:
P(y_1, …, y_{T′} | x_1, …, x_T) = ∏_{t′=1}^{T′} P(y_{t′} | c, y_1, …, y_{t′−1})    (18)
The trained sequence to sequence model can be used to generate a sequence given an input sequence. In machine translation, reversing the order of the words of the input sequence is necessary because it makes it easier for the optimizer (e.g. stochastic gradient descent) to "establish communication between the input and the output" [40] (p. 3). By their nature, time series prediction problems already have the desired order, since input and output are straightforward sequences.
EMPIRICAL RESULTS
Data
In this study, for liquidity and fairness of trading, we use daily price data of the 21 most traded stocks listed in the VN-Index of the Ho Chi Minh Stock Exchange and the HNX-Index of the Hanoi Stock Exchange (Vietnam) from 05 January 2015 to 19 January 2019, giving 1,010 data points in total. We use the first 965 data points for training and the last 45 data points for testing. Nine window sizes are used for out-of-sample prediction, ranging from 5 to 45 steps ahead in increments of 5. Furthermore, we use the daily price of the VN30F1M contract traded on the Hanoi Stock Exchange from 1 September 2017 to 13 November 2018 for training, and from 14 November 2018 to 15 May 2019 for performance validation (120 trading days).
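The split and the prediction horizons described above can be expressed as a short sketch; the price array here is only a placeholder for the actual data.

```python
import numpy as np

prices = np.random.default_rng(0).lognormal(size=1010)  # placeholder for the 1,010 daily closes

train, test = prices[:965], prices[965:]  # 965 points for training, 45 for testing
window_sizes = list(range(5, 50, 5))      # nine out-of-sample horizons: 5, 10, ..., 45
```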
Data Pre-processing
Beyond algorithm improvement and parameter tuning, one approach to improving the accuracy of a machine learning model is to apply data pre-processing techniques, for instance imputing missing values, encoding categorical data, detecting outliers, transforming data, and scaling data. In this work, we perform logarithm and Box-Cox transforms on the input dataset. The idea behind the logarithm transformation is to turn the probability distribution of the raw input data from skewed into approximately normal, which can improve prediction performance dramatically [44]. However, in some circumstances the logarithm technique does not produce data that are less variable or more normal; on the contrary, it may make them more variable and more skewed [45]. Thus, it is recommended that transformation techniques be applied very cautiously. The output of the transform stage is passed to a data scaler to be normalized. There are many scaling methods (e.g. by maximum absolute value, or to a given feature range). We use a min-max scaler to scale each input feature to the range [0, 1]. This ensures that large input values do not overwhelm smaller ones, which helps to stabilize neural networks [46].
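A minimal sketch of this pre-processing pipeline, assuming SciPy for the Box-Cox transform and scikit-learn for min-max scaling; the input series is a placeholder, not the actual stock data.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import MinMaxScaler

prices = np.random.default_rng(0).lognormal(mean=3.0, size=1010)  # placeholder daily closes

log_prices = np.log(prices)                 # logarithm transform
boxcox_prices, lam = stats.boxcox(prices)   # Box-Cox transform (lambda estimated by MLE)

scaler = MinMaxScaler(feature_range=(0, 1)) # scale each feature to [0, 1]
log_scaled = scaler.fit_transform(log_prices.reshape(-1, 1))
```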
Detailed Results
Structural Time Series
The aim of this step is to create baseline models for comparing the prediction quality of the structural time series and sequence to sequence models with our proposed model. Mean squared error was calculated to measure the performance of each out-of-sample forecast.
We develop structural time series models as a baseline. For this task, we choose the Prophet package developed by Facebook for the Python programming language [10]. In this model, the input data are the stock prices transformed to logarithms. In terms of parameter tuning, we mostly use default settings, except that we add monthly, quarterly, and yearly seasonalities with Fourier orders of 20, 30, and 30 respectively. Because using the Box-Cox transformation as an extra regressor in structural time series models would require future values of the regressor to be known at prediction time, we omit this procedure [47]. Without an extra regressor, the model can generate predictions for the 21 selected tickers over window sizes from 5 to 45 in increments of 5.
As mentioned, the Prophet model is a structural time series model that combines trend g(t), seasonality s(t), and irregular events. Figure 4 describes our procedure for generating out-of-sample predictions for model quality evaluation, and for extracting the trend g(t) of each series as an input feature of the Sequence to Sequence model, using the Prophet model on the logarithm and Box-Cox transformed stock price series.
In detail, for every stock v in the selected list of stocks, we transform the price to a log-scale series LP and a Box-Cox series BC to use as input for the Prophet model P. We set no out-of-sample prediction (W = 0) to extract the trend series T from the in-sample data generated by P, to be used as a feature of the Sequence to Sequence model. For performance comparison, we set w to each of the nine window sizes W for out-of-sample prediction.
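A minimal sketch of this Prophet baseline under the settings described above; the input data frame is a placeholder, and the seasonality periods in days are our assumptions rather than values reported in the paper.

```python
import numpy as np
import pandas as pd
from fbprophet import Prophet  # package name at the time of writing; now published as "prophet"

# Placeholder input: Prophet expects columns "ds" (date) and "y" (value, here the log price)
df = pd.DataFrame({
    "ds": pd.date_range("2015-01-05", periods=1010, freq="B"),
    "y": np.log(np.random.default_rng(0).lognormal(mean=3.0, size=1010)),
})

m = Prophet()
m.add_seasonality(name="monthly", period=30.5, fourier_order=20)     # assumed period in days
m.add_seasonality(name="quarterly", period=91.25, fourier_order=30)  # assumed period in days
m.add_seasonality(name="yearly", period=365.25, fourier_order=30)
m.fit(df)

future = m.make_future_dataframe(periods=45)  # out-of-sample horizon (W = 45 here)
forecast = m.predict(future)
trend = forecast["trend"]                     # trend g(t), later fed to the Seq2Seq model
```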
Sequence to Sequence Model
As another baseline model, we also develop a Sequence to Sequence model with the LSTM architecture. We use Keras with the TensorFlow backend to create the Encoder-Decoder model that solves the sequence to sequence problem [35,36]. To benefit from efficient parallel computation when training the deep neural network, we train the model on a virtual machine with a GPU on Google Cloud Platform.
The Sequence to Sequence model uses the states of the encoder network to generate predictions from the decoder network. Hence, we feed the normalized stock price series to the model and generate predictions. In Figure 5, we describe the approach we use to develop baseline predictions with the sequence to sequence model. Like a vanilla LSTM model for supervised learning, we train on the input data for many iterations. However, we discard the output of the encoder and use its states as the initial states of the decoder. Furthermore, to create predictions for the proposed model, we add the trend series (extracted as in Figure 4) as another input feature.
The implementation is straightforward. First, as in Figure 4, we use scaled Box-Cox (BC) and logarithm (LP) transformed data as input. However, we scale every BC and LP series to the range from 0 to 1 to create x for every scaled list of stock prices X*. Furthermore, we use the logarithm-transformed series as target data. We create an encoder model En with LSTM architecture, extract its hidden states, and initialize a decoder model DE with these hidden states. A main advantage of Sequence to Sequence with LSTM over structural time series models is that it can dynamically predict multiple time steps ahead without requiring extra data. In terms of accuracy, we found that a deeper LSTM model does not outperform a shallow one. Hence, we used an LSTM with a single hidden layer of 64 cells and the rectified linear unit activation function. To prevent over-fitting, we apply both L2 regularization and dropout, with a regularization parameter lambda of 0.0001 and a dropout rate of 0.001, as recommended [48].
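A minimal Keras sketch of an encoder-decoder model with the hyper-parameters described above (a single LSTM layer of 64 cells, ReLU activation, L2 regularization of 0.0001, and dropout of 0.001); the number and layout of input features are assumptions, since the exact implementation is not listed in the text.

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

n_features = 3  # e.g. log price, Box-Cox price, Prophet trend (assumed layout)

# Encoder: keep only the final hidden and cell states
encoder_inputs = Input(shape=(None, n_features))
encoder = LSTM(64, activation="relu", kernel_regularizer=l2(1e-4),
               dropout=0.001, return_state=True)
_, state_h, state_c = encoder(encoder_inputs)

# Decoder: initialized with the encoder states, predicts the target sequence
decoder_inputs = Input(shape=(None, 1))
decoder_lstm = LSTM(64, activation="relu", kernel_regularizer=l2(1e-4),
                    dropout=0.001, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(1)(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="mse")
```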
Sequence to Sequence with Structural Time Series Models
In this step, we combine the sequence to sequence model and the structural time series models. Specifically, we use the output dataset D (with W = 0) from Figure 5 as training data for Figure 6. In other words, we combine the trend component of the structural time series models with the stock price in Box-Cox and logarithm forms. The parameters of these models are defined exactly as in the aforementioned baseline models. We found that the results improve dramatically.
Results Analysis
Structural time series models were used to generate a set of out-of-sample forecasts at multiple window time steps in log-scale (see Table 1). In terms of prediction error, the results show that MSE = 0.087787 (CTG at 45 time steps ahead) is the highest, while MSE = 0.003501 (SSI at 5 time steps ahead) is the lowest. Likewise, results from the Sequence to Sequence model (see Table 2) and the Sequence to Sequence with Structural Time Series Models (see Table 3) show that MSE = 0.231800 (PNJ at 45 time steps ahead) and MSE = 0.046146 (ACB at 45 time steps ahead) are the highest, while MSE = 0.000068 (CII at 5 time steps ahead) and MSE = 0.000006 (CII at 5 time steps ahead) are the lowest. Figure 6 plots the prediction output of the models against the actual data of the HCM stock.
Figure 4: Algorithm for structural time series analysis with the Facebook Prophet library.
Figure 5: Algorithm of the Sequence to Sequence model.
Figure 6: 45-day forecast of HCM.
In terms of back-testing, we found that the proposed model can generate a positive profit (see Figure 7). For simplicity, we do not consider tax and transaction fees. From an initial investment of $1,000, we end the test with $1,159.2. Specifically, we develop a trading environment from real market data with return TRR (in index points) to measure the reward of the test. The agent is built from the proposed model. For every day of the 120 trading days TD, it uses the predicted return PR to choose positions. If the predicted return PR over the next two days (W = 2) is positive, we take a Long position; if it is negative, we take a Short position; if it is around zero, we hold the current position. A position is closed when the profit PFT exceeds one index point or the position has been held for more than a day (T = 2).
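A minimal sketch of this trading rule; the near-zero threshold and the structure of the inputs are illustrative assumptions rather than values from the study.

```python
def choose_position(predicted_return, eps=0.05):
    """Map the two-day-ahead predicted return (W = 2) to a position."""
    if predicted_return > eps:
        return "LONG"
    if predicted_return < -eps:
        return "SHORT"
    return "HOLD"

def should_close(profit_points, days_held, profit_target=1.0, max_days=2):
    """Close when profit exceeds one index point or the position is held more than a day (T = 2)."""
    return profit_points > profit_target or days_held >= max_days
```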
From a univariate time series analysis perspective, we found that the structural time series models of Facebook Prophet generate stable, high-quality out-of-sample predictions without requiring advanced techniques or data assumptions. In addition, we found that they achieve even higher accuracy on in-sample fitted data when we add an extra regressor to the structural time series models. Unfortunately, we cannot create out-of-sample predictions with an extra regressor. In contrast to the structural time series models, the Sequence to Sequence model with an LSTM neural network cannot create stable out-of-sample predictions. As Figure 8 points out, in some cases the Sequence to Sequence model captures the movement of stocks and generates highly accurate predictions with lower error than the structural time series models. However, the model cannot consistently capture the movement of stocks in other cases. In terms of computational performance, the Sequence to Sequence model also takes more time for training and predicting than the structural time series models. This leaves a gap in leveraging the state-of-the-art technique for time series prediction. Fortunately, the results in Table 3 suggest that we can fill the gaps of both the structural time series models and the Sequence to Sequence model by adding the output of the structural time series models to the Sequence to Sequence model. Figure 8 shows that the combined model is stable and that the prediction error of the proposed model is almost always the lowest among the three models.
In terms of benchmark limitations, there are some drawbacks in this benchmark. On the one hand, it lacks residual analysis for each prediction. We only compute the Mean Squared Error (MSE) for performance comparison. As Figure 9 points out, some outlier points mean that the evaluated results are not consistent enough to be fully accurate. On the other hand, although the results are clear and useful when we use MSE as an indicator for forecasting accuracy evaluation, this criterion cannot discriminate between forecasting models when their forecast errors are very close to each other. Thus, the Chong and Hendry encompassing test for nested models [49] should be carried out to evaluate the statistical significance of the forecasting models.
Figure 7: Uncompounded daily cumulative profit of VN30F1M trading.
Table 1: Mean squared error of structural time series model forecast from 5 to 45 window time steps ahead in
log-scale
Ticker 5 10 15 20 25 30 35 40 45
ACB 0.014721 0.014147 0.015933 0.025314 0.028773 0.029434 0.034342 0.048010 0.058335
BID 0.024936 0.023140 0.014819 0.012375 0.014865 0.013050 0.014286 0.021603 0.023775
BVH 0.016722 0.022044 0.037414 0.046589 0.047791 0.047427 0.048169 0.052609 0.053467
CII 0.015892 0.010909 0.017354 0.025683 0.038861 0.049060 0.050928 0.056750 0.058097
CTD 0.020805 0.038501 0.043919 0.048860 0.056231 0.062125 0.070704 0.081036 0.082875
CTG 0.008746 0.030208 0.054718 0.065659 0.067614 0.068867 0.068824 0.084200 0.087787
DHG 0.024497 0.018136 0.037780 0.043572 0.043265 0.042221 0.044850 0.047424 0.051320
EIB 0.027591 0.027747 0.023507 0.023673 0.029305 0.030964 0.029256 0.021934 0.018138
FPT 0.012170 0.009933 0.009485 0.009722 0.010321 0.011376 0.011212 0.010736 0.010837
GAS 0.012885 0.016930 0.035988 0.047679 0.052466 0.057323 0.065196 0.078481 0.080781
HCM 0.047436 0.057021 0.064199 0.071993 0.069983 0.069544 0.070894 0.080751 0.081232
HPG 0.031256 0.030599 0.031586 0.044547 0.047608 0.050945 0.055629 0.067218 0.076217
MBB 0.015255 0.010532 0.027337 0.032361 0.031718 0.031117 0.032678 0.046915 0.052013
MSN 0.029451 0.022273 0.014707 0.011364 0.012848 0.012974 0.014851 0.022448 0.026552
PNJ 0.006993 0.007920 0.013121 0.016369 0.022160 0.028659 0.045313 0.065240 0.077773
PPC 0.030586 0.030464 0.024987 0.022773 0.029191 0.037727 0.045555 0.054872 0.056164
REE 0.019733 0.016948 0.015939 0.014662 0.012662 0.013016 0.011561 0.011009 0.013720
SBT 0.009171 0.009984 0.020123 0.023605 0.026334 0.026340 0.024738 0.013508 0.021442
SSI 0.003501 0.010974 0.025759 0.033560 0.035462 0.038449 0.042812 0.058004 0.059730
VCB 0.043887 0.052832 0.048772 0.038443 0.031976 0.029193 0.024631 0.011838 0.017969
VNM 0.049289 0.055314 0.049015 0.043668 0.047589 0.050864 0.054577 0.048458 0.042112
Figure 8: Error of out-of-sample forecasts in log-scale.
Figure 9: Backtest trading strategy.
Table 2: Mean squared error of sequence to sequence model forecast from 5 to 45 window time steps ahead in log-scale
Ticker 5 10 15 20 25 30 35 40 45
ACB 0.001549 0.010649 0.015936 0.036856 0.066454 0.088031 0.065661 0.023986 0.111370
BID 0.002975 0.023858 0.022093 0.055056 0.045212 0.055175 0.123015 0.111880 0.102072
BVH 0.003339 0.007065 0.022930 0.037389 0.038270 0.048385 0.094943 0.131791 0.118588
CII 0.000068 0.000949 0.007197 0.011593 0.008407 0.015061 0.032799 0.024295 0.019871
CTD 0.003564 0.031081 0.093107 0.072614 0.134139 0.160979 0.193552 0.157608 0.180879
CTG 0.000067 0.001404 0.002087 0.008134 0.003509 0.010062 0.014351 0.011942 0.010831
DHG 0.004307 0.015290 0.027839 0.005491 0.031615 0.040781 0.029965 0.034531 0.036360
EIB 0.000734 0.004151 0.005739 0.015192 0.009920 0.018238 0.014715 0.021097 0.012947
FPT 0.000538 0.004052 0.008909 0.011313 0.022583 0.027503 0.039039 0.028022 0.028253
GAS 0.001004 0.021844 0.031094 0.073882 0.090982 0.066480 0.057572 0.106243 0.162174
HCM 0.000771 0.005351 0.012451 0.006651 0.022035 0.030805 0.048325 0.047193 0.037641
HPG 0.000912 0.016714 0.055040 0.013062 0.073548 0.086221 0.099066 0.055516 0.117556
MBB 0.000955 0.008257 0.016460 0.021182 0.035717 0.054560 0.038501 0.064305 0.077589
MSN 0.001165 0.008558 0.019539 0.023428 0.032733 0.051309 0.039536 0.061719 0.071979
PNJ 0.009910 0.006657 0.061987 0.097721 0.094561 0.124379 0.171758 0.173592 0.231800
PPC 0.001150 0.002104 0.014166 0.012028 0.017004 0.027315 0.031092 0.031779 0.035133
REE 0.000612 0.005939 0.008998 0.008640 0.012904 0.017005 0.007903 0.031858 0.039960
SBT 0.005828 0.017245 0.031245 0.023963 0.067473 0.087509 0.049534 0.072302 0.079048
SSI 0.000589 0.004103 0.003022 0.011251 0.014304 0.013024 0.026790 0.025110 0.035506
VCB 0.003926 0.009320 0.032977 0.021303 0.023619 0.066008 0.039739 0.086166 0.084254
VNM 0.003010 0.021958 0.021868 0.043893 0.065320 0.049399 0.068308 0.087976 0.119454
However, since there is no Python package supporting this test at this time, it was not carried out, so the benchmark is not fully appropriate in statistical terms. In addition, the Diebold-Mariano (DM) test for comparing predictive accuracy (not for comparing models) cannot be applied, as it only works for non-nested models [50,51]. Hence, to relax this limitation, we develop a back-test of the best model according to our benchmark (i.e. the proposed model) with real market data in a different asset class.
Overall, for the same window size, the combination of structural time series models and the Sequence to Sequence model always achieves higher performance than the pure structural time series models and the pure Sequence to Sequence model. However, in some cases the hybrid model cannot capture the movement of a stock when the market is highly volatile.
CONCLUSION AND DISCUSSION
In this work, we discussed a set of procedures to model and predict stock prices in the Vietnam stock market with structural time series models, the Sequence to Sequence model, and the combination of these models. Specifically, we fit the stock price data with structural time series models, then use the fitted data as an input feature of the Sequence to Sequence model and generate out-of-sample predictions. We used the outputs of the models to compare the forecasting accuracy of each model. We found that our proposed model can overcome the limitations of each individual model and generate forecasts with higher accuracy. The proposed model also achieves positive results for derivatives trading with real market data. Hence, the combination of Long Short-Term Memory and structural time series models is applicable to the Vietnam stock market.
Furthermore, deep learning is a powerful approach to addressing time series problems.
Table 3: Mean squared error of proposed model forecast from 5 to 45 window time steps ahead in log-scale
Ticker 5 10 15 20 25 30 35 40 45
ACB 0.000243 0.001528 0.000953 0.002946 0.000294 0.001339 0.002150 0.014070 0.046146
BID 0.002119 0.001047 0.003420 0.002281 0.007414 0.007904 0.039367 0.035262 0.002000
BVH 0.000145 0.007243 0.000876 0.000427 0.011959 0.002539 0.010426 0.017116 0.002917
CII 0.000006 0.002513 0.000787 0.001118 0.001933 0.000599 0.000063 0.002505 0.003436
CTD 0.001580 0.000395 0.000173 0.000329 0.008648 0.000776 0.016954 0.015527 0.002938
CTG 0.001237 0.000776 0.000641 0.002322 0.000283 0.000287 0.000458 0.002171 0.004216
DHG 0.000081 0.002963 0.000692 0.000894 0.005131 0.002980 0.000729 0.000448 0.024931
EIB 0.000022 0.001645 0.000096 0.001451 0.000103 0.001309 0.003419 0.001766 0.014310
FPT 0.000018 0.000009 0.002727 0.000098 0.000629 0.000087 0.004376 0.011860 0.001626
GAS 0.001049 0.000242 0.006162 0.001701 0.007973 0.000451 0.012106 0.007322 0.018421
HCM 0.000120 0.000814 0.000365 0.006616 0.000347 0.000154 0.005450 0.009231 0.003637
HPG 0.001480 0.000214 0.000307 0.000747 0.000886 0.001769 0.005409 0.006507 0.001292
MBB 0.000151 0.000234 0.002842 0.000414 0.003575 0.004369 0.011552 0.002987 0.007829
MSN 0.002896 0.001696 0.000507 0.001222 0.003149 0.006190 0.000697 0.002198 0.000785
PNJ 0.000267 0.003331 0.000503 0.000599 0.010909 0.003138 0.030432 0.022743 0.000275
PPC 0.000106 0.001729 0.001398 0.002341 0.001320 0.000454 0.003241 0.002314 0.000987
REE 0.000349 0.001104 0.000185 0.000304 0.000936 0.000123 0.002099 0.005302 0.011126
SBT 0.000863 0.002671 0.001414 0.000694 0.000526 0.000598 0.000768 0.013324 0.002088
SSI 0.000214 0.000316 0.000056 0.000131 0.001353 0.002919 0.000708 0.000588 0.007224
VCB 0.000439 0.000612 0.000192 0.000652 0.016464 0.007455 0.000140 0.002109 0.026819
VNM 0.000237 0.000130 0.008860 0.005173 0.002549 0.000879 0.000835 0.004840 0.000963
However, without feature engineering, deep learning generates predictions with lower accuracy than structural time series models. In future work, we will improve the model to achieve real-time prediction for quantitative trading. In addition, we believe that Generative Adversarial Networks (GANs) are a promising approach to apply.
ACKNOWLEDGEMENT
This paper was written under Grant number CS/2019-04 funded by the University of Economics and Law, VNU-HCM. Besides, we would also like to thank the John von Neumann Institute for their support throughout the project.
ABBREVIATIONS
ARMA: Auto-regressive–moving-average
GARCH: Generalized Auto-regressive Conditional
Heteroskedasticity
RNN: Recurrent Neural Network
LSTM: Long Short-term Memory
Seq2Seq: Sequence to sequence
GAN: Generative Adversarial Networks
MSE: Mean Squared Error
HOSE: Ho Chi Minh Stock Exchange
HNX: Hanoi Stock Exchange
COMPETING INTERESTS
The authors declare that they have no conflict of in-
terest.
AUTHORS’ CONTRIBUTIONS
Quoc Luu and Uyen Pham initiated the idea, studied relevant models, and sought the data. Quoc Luu and Son Nguyen built the main programs for numerical simulations. All authors checked the simulations, contributed to the interpretation of the results, and edited, revised, and approved the article.
REFERENCES
1. Zhang Y, Tan X, Xi H, Zhao X. Real-time risk management
based on time series analysis. In2008 7th World Congress on
Intelligent Control and Automation Jun 25. IEEE. 2008;p.
2518–2523.
2. Giamouridis D, Vrontos I. Hedge fund portfolio construction:
A comparison of static and dynamic approaches. Journal of
Banking & Finance. 2007;31(1):199–217. Available from: https:
//doi.org/10.1016/j.jbankfin.2006.01.002.
3. Zhou X, Pan Z, Hu G, Tang S, Zhao C. Stock market pre-
diction on high-frequency data using generative adversarial
nets. Mathematical Problems in Engineering. 2018;Available
from: https://doi.org/10.1155/2018/4907423.
4. Box G, Jenkins G, Reinsel G, LjungG. Time series analysis: fore-
casting and control. John Wiley & Sons. 2015;.
5. Harvey A, Todd P. Forecasting economic time series with
structural and Box-Jenkins models: A case study. Journal of
Business & Economic Statistics. 1983;1(4):299–307. Available
from: https://doi.org/10.1080/07350015.1983.10509355.
6. Géron A. Hands-on machine learning with Scikit-Learn and
TensorFlow: concepts, tools, and techniques to build intelli-
gent systems. O’Reilly Media, Inc. 2017;.
7. Nelson C, Kang H. Pitfalls in the Use of Time as an Explanatory
Variable in Regression. Journal of Business & Economic Statis-
tics. 1984;2(1):73–82. Available from: https://doi.org/10.1080/
07350015.1984.10509371.
8. Harvey A, Peters S. Estimation procedures for structural time
seriesmodels. Journal of Forecasting. 1990;9(2):89–108. Avail-
able from: https://doi.org/10.1002/for.3980090203.
9. Jalles J. Structural time series models and the Kalman Filter: a
concise review. 2009;Available from: https://papers.ssrn.com/
sol3/papers.cfm?abstract_id=1496864.
10. Taylor SJ, Letham B. Forecasting at scale. The American Statistician. 2018;72(1):37–45. Available from: https://doi.org/10.1080/00031305.2017.1380080.
11. Harvey A. Trends and cycles in macroeconomic time se-
ries. Journal of Business & Economic Statistics. 1985;3(3):216–
227. Available from: https://doi.org/10.1080/07350015.1985.
10509453.
12. Kitagawa G, Gersch W. A smoothness priors-state space
modeling of time series with trend and seasonality. Jour-
nal of the American Statistical Association. 1984;79(386):378–
389. Available from: https://doi.org/10.1080/01621459.1984.
10478060.
13. Harrison P, Stevens C. Bayesian forecasting. Journal
of the Royal Statistical Society: Series B (Methodological).
1976;38(3):205–228. Available from: https://doi.org/10.1111/
j.2517-6161.1976.tb01586.x.
14. Koopman S, Ooms M. Forecasting economic time series
using unobserved components time series models. 2011;p.
129–162. Available from: https://doi.org/10.1093/oxfordhb/
9780195398649.013.0006.
15. Lipton Z, Berkowitz J, Elkan C. A critical review of recur-
rent neural networks for sequence learning. arXiv preprint
arXiv:150600019. 2015;.
16. Rumelhart D, Hinton G, Williams R. Learning representations
by back-propagating errors. Cognitive modeling. 1988;5(3):1.
17. Elman J. Finding structure in time. Cognitive science.
1990;14(2):179–211. Available from: https://doi.org/10.1207/
s15516709cog1402_1.
18. Haykin S, Principe J, Sejnowski T, Mcwhirter J. Modeling Large
Dynamical Systems with Dynamical Consistent Neural Net-
works;.
19. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT press.
2016;.
20. Jordan MI. Serial order: a parallel distributed processing ap-
proach. Technical report, June 1985-March 1986. California
Univ., San Diego, La Jolla (USA). Inst for Cognitive Science.
1986;.
21. Ruder S. An overview of gradient descent optimization algo-
rithms. arXiv preprint arXiv:160904747. 2016;.
22. Qian N. On themomentum term in gradient descent learning
algorithms. Neural networks. 1999;12(1):145–151. Available
from: https://doi.org/10.1016/S0893-6080(98)00116-6.
23. Yu N. Introductory lectures on convex optimization: a basic
course. Springer US. 2004;.
24. Hochreiter S, Schmidhuber J. Long short-termmemory. Neu-
ral computation. 1997;9(8):1735–1780. PMID: 9377276. Avail-
able from: https://doi.org/10.1162/neco.1997.9.8.1735.
25. Nair V, Hinton G. Rectified linear units improve restricted
boltzmannmachines. InProceedings of the 27th international
conference on machine learning (ICML-10). 2010;p. 807–814.
26. He K, Zhang X, Ren S, Sun J. Deep residual learning for im-
age recognition. InProceedings of the IEEE conference on
computer vision and pattern recognition. 2016;p. 770–778.
Available from: https://doi.org/10.1109/CVPR.2016.90. PMid: 26180094.
27. Hochreiter S. Untersuchungen zu dynamischen neu-
ronalen Netzen. Diploma, Technische Universität München.
1991;91(1).
28. Bengio Y, Simard P, Frasconi P. Learning long-term depen-
dencieswith gradient descent is difficult. IEEE transactions on
neural networks. 1994;5(2):157–166. PMID: 18267787. Avail-
able from: https://doi.org/10.1109/72.279181.
29. Harmon M, Klabjan D. Dynamic prediction length for time
series with sequence to sequence networks. arXiv preprint
arXiv:180700425. 2018;.
30. Zhou C, Sun C, Liu Z, Lau F. A C-LSTM neural network for text
classification. arXiv preprint arXiv:151108630. 2015;.
31. Graves A, Jaitly N, Mohamed A. Hybrid speech recognition
with deep bidirectional LSTM. In 2013 IEEE workshop on au-
tomatic speech recognition and understanding. IEEE. 2013;p.
273–278. Available from: https://doi.org/10.1109/ASRU.2013.
6707742.
32. Gers F, Schmidhuber J, Cummins F. Learning to forget: Con-
tinual prediction with LSTM;.
33. Greff K, Srivastava R, Koutník J, Steunebrink B, Schmidhuber
J. LSTM: A search space odyssey. IEEE transactions on neural
networks and learning systems. 2016;28(10):2222–32. PMID:
27411231. Available from: https://doi.org/10.1109/TNNLS.
2016.2582924.
34. Graves A. Supervised sequence labelling. InSupervised se-
quence labelling with recurrent neural networks. Springer,
Berlin, Heidelberg. 2012;p. 5–13. Available from: https://doi.
org/10.1007/978-3-642-24797-2_2.
35. Chollet FK. Deep learning library for theano and tensorflow.
2015;7(8). Available from: https://keras.io/k.
36. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. Ten-
sorflow: A system for large-scale machine learning. In12th
{USENIX} Symposium on Operating Systems Design and Im-
plementation ({OSDI} 16). 2016;p. 265–283.
37. Cho K, Merriënboer BV, Gulcehre C, Bahdanau D, Bougares F,
Schwenk H, et al. Learning phrase representations using RNN
encoder-decoder for statistical machine translation. arXiv
preprint arXiv:14061078. 2014;Available from: https://doi.org/
10.3115/v1/D14-1179.
38. Britz D, Goldie A, Luong M, Le Q. Massive explo-
ration of neural machine translation architectures. arXiv
preprint arXiv:170303906. 2017;Available from: https://doi.
org/10.18653/v1/D17-1151.
39. van der Westhuizen J, Lasenby J. The unreasonable effective-
ness of the forget gate. arXiv preprint arXiv:180404849. 2018;.
40. Sutskever I, Vinyals O, Le Q. Sequence to sequence learning
with neural networks. In Advances in neural information pro-
cessing systems. 2014;p. 3104–3112.
41. Venugopalan S, Rohrbach M, Donahue J, Mooney R, Darrell
T, Saenko K. Sequence to sequence-video to text. InPro-
ceedings of the IEEE international conference on computer
vision;p. 4534–4542. Available from: https://doi.org/10.1109/
ICCV.2015.515.
42. Tang Y, Xu J, Matsumoto K, Ono C. Sequence-to-sequence
model with attention for time series classification. In2016 IEEE
16th International Conference on Data Mining Workshops
(ICDMW) . IEEE. 2016;p. 503–510. Available from: https://doi.
org/10.1109/ICDMW.2016.0078.
43. Bahdanau D, Cho K, Bengio Y. Neural machine translation
by jointly learning to align and translate. arXiv preprint
arXiv:14090473. 2014;.
44. Yang Y, Linkens D, Mahfouf M. Nonlinear Data Transformation to Improve Flow Stress Prediction Using Neural Networks. IFAC Proceedings Volumes. 2004;37(15):371–376. Available from: https://doi.org/10.1016/S1474-6670(17)31052-2.
45. Changyong F, Hongyue W, Naiji L, Tian C, Hua H, Ying L. Log-
transformation and its implications for data analysis. . Shang-
hai archives of psychiatry. 2014;26(2):105.
46. Kim K. Financial time series forecasting using support vector
machines. Neurocomputing. 2003;55(1-2):307–319. Available
from: https://doi.org/10.1016/S0925-2312(03)00372-2.
47. Facebook. Prophet Notebook;.
48. Hinton G, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R. Improving neural networks by preventing co-adaptation of feature detectors. CoRR. 2012;.
49. Chong Y, Hendry D. Econometric evaluation of linear
macro-economic models. The Review of Economic Studies.
1986;1;53(4):671–690. Available from: https://doi.org/10.2307/
2297611.
50. Vavra M. On a Bootstrap Test for Forecast Evaluations. Re-
search Department, National Bank of Slovakia. 2015;.
51. Diebold F. Comparing predictive accuracy, twenty years later:
A personal perspective on the use and abuse of Diebold-
Mariano tests. Journal of Business & Economic Statis-
tics. 2015;33(1):1. Available from: https://doi.org/10.1080/
07350015.2014.983236.
1 Quantitative and Computational Finance, John von Neumann Institute, Ho Chi Minh City, Vietnam
2 Mathematical Economics, University of Economics and Law, Ho Chi Minh City, Vietnam
Correspondence
Lưu Hoài Thương Quốc, Quantitative and Computational Finance, John von Neumann Institute, Ho Chi Minh City, Vietnam
Email: quoc.luu2015@qcf.jvn.edu.vn
History
Received: 28/6/2019
Accepted: 25/9/2019
Published: 31/3/2020
DOI: 10.32508/stdjelm.v4i1.593
Copyright
© VNU-HCM. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Time series prediction: A combination of Long Short-Term Memory and structural time series models
Lưu Hoài Thương Quốc1,*, Nguyễn Phúc Sơn2, Phạm Hoàng Uyên1,2
ABSTRACT
The stock market is an important capital mobilization channel for the economy. However, the market carries a potential for loss, because stock prices fluctuate to reflect uncertain events such as political news and the supply and demand of daily trading volume. There are various ways to reduce risk, such as constructing and optimizing investment portfolios or developing hedging strategies. Time series forecasting techniques can therefore be very useful for improving return performance in the stock market. Recently, the Vietnam stock market has attracted increasing attention thanks to its improving investment performance and capitalization. In this study, we propose a model that combines the Sequence to Sequence model with the Long Short-Term Memory architecture of deep learning and structural time series models. We use the price data of the 21 most traded listed stocks on the Ho Chi Minh Stock Exchange (HOSE) and the Hanoi Stock Exchange (HNX) of the Vietnam stock market to compare the accuracy of the proposed model with the pure Sequence to Sequence model and the pure structural time series model. In addition, to verify the applicability of the model in a real investment environment, we use the proposed model to decide whether to buy (Long) or sell (Short) the VN30F1M futures contract (the one-month VN30 index futures contract) listed on the HNX. The results show that the proposed combination of Sequence to Sequence with the Long Short-Term Memory architecture and structural time series models achieves higher performance, with smaller errors than the pure models in forecasting stock prices, and is profitable for futures trading. This study contributes to the theoretical foundation of time series forecasting, because the proposed method avoids assumptions that are difficult to satisfy in real financial environments and that existing methods such as the Auto-regressive–moving-average model and Generalized Auto-regressive Conditional Heteroskedasticity require. In terms of application, investors can use the model to develop trading strategies for the Vietnam stock market.
Keywords: LSTM, Seq2Seq, structural models, hybrid model
Cite this article: Hoài Thương Quốc L, Phúc Sơn N, Hoàng Uyên P. Time series prediction: A combination of Long Short-Term Memory and structural time series models. Sci. Tech. Dev. J. - Eco. Law Manag.; 4(1):500-515.