Time series prediction: A combination of long short term memory and structural time series models

ABSTRACT The stock market is an important capital-raising channel for the economy. However, the market carries a potential for loss because stock prices fluctuate to reflect uncertain events such as political news and the supply and demand of daily trading volume. There are various ways to reduce risk, such as constructing and optimizing portfolios or developing hedging strategies. Time series forecasting techniques can therefore be very useful for improving return performance in the stock market. Recently, the Vietnamese stock market has attracted increasing attention thanks to its improving investment performance and capitalization. In this study, we propose a model that combines a Sequence to Sequence model with the Long Short-Term Memory (LSTM) deep learning architecture and structural time series models. We use daily price data of the 21 most actively traded stocks listed on the Ho Chi Minh Stock Exchange (HOSE) and the Hanoi Stock Exchange (HNX) of the Vietnamese stock market to compare the accuracy of the proposed model against the pure Sequence to Sequence model and pure structural time series models. In addition, to verify the applicability of the model in a real investment environment, we use the proposed model to decide whether to go Long or Short on the VN30F1M futures contract (the one-month VN30 index futures contract) listed on HNX. The results show that the proposed combination of Sequence to Sequence with the LSTM architecture and structural time series models achieves better performance, with smaller errors than the pure models in forecasting stock prices, and is profitable when trading the futures contract. This study contributes to the theoretical basis of time series forecasting because the proposed method avoids assumptions of existing methods, such as the Auto-regressive moving-average (ARMA) model and Generalized Auto-regressive Conditional Heteroskedasticity (GARCH), that are difficult to satisfy in a real financial environment. In practice, investors can use the model to develop trading strategies for the Vietnamese stock market.

This procedure may lead to misleading results if the trend is not deterministic7. Structural time series models decompose a time series into three components: trend, seasonality and cycle8,9. The model is defined by the following equation:

y(t) = g(t) + s(t) + h(t) + \varepsilon_t    (1)

where t = 1, ..., T, g(t) is a stochastic, non-periodic trend, s(t) is a stationary seasonal process with periodic changes (e.g. quarterly or yearly seasonality), and h(t) captures cyclical effects occurring on potentially irregular schedules over one or more days10.

Many studies supporting the model in practice have been carried out. For instance, Harvey showed that the class of structural models has several advantages over the commonly adopted seasonal ARIMA models and is applicable to modeling cycles in macroeconomic time series5,11. Kitagawa and Gersch decomposed time series into trend, seasonal, globally stationary autoregressive and observation error components with a state space Kalman filter, and used Akaike's minimum AIC procedure to select the best of the alternative state space models12. Taylor and Letham use structural models for forecasting business time series10.

The local linear trend is a process that can be regarded as a local approximation to a linear trend. The stochastic linear process can be described as:

y(t) = g(t) + \varepsilon_t
g(t) = g(t-1) + \beta(t-1) + \eta_t    (2)
\beta(t) = \beta(t-1) + \zeta_t

where \varepsilon_t ~ NID(0, \sigma_\varepsilon^2), \eta_t ~ NID(0, \sigma_\eta^2) and \zeta_t ~ NID(0, \sigma_\zeta^2), t = 1, ..., T, are mutually independent white noise disturbance terms with mean zero and variances \sigma_\varepsilon^2, \sigma_\eta^2 and \sigma_\zeta^2 respectively13. Koopman and Ooms14 proposed a trend with a stationary drift process to extend the local linear trend by adding a stationary stochastic drift component:

g(t) = g(t-1) + \beta(t-1) + \eta_t    (3)
\beta(t) = (1 - \varphi_\beta)\bar{\beta} + \varphi_\beta \beta(t-1) + \zeta_t

with autoregressive coefficient 0 < \varphi_\beta \le 1. However, a drawback of this approach is that such drift processes are difficult to identify: they require very large data samples.

Taylor and Letham10 developed new types of trend models. They suggested two types: a saturating growth model and a piecewise linear model (see Figure 1). The saturating growth model is characterized by a growth rate and a limit on growth, through the nonlinear logistic function:

g(t) = C / (1 + e^{-k(t - m)})    (4)

where e is the natural logarithm base, m is the value of the sigmoid midpoint, C is the maximum capacity, and k is the growth rate of the curve. In this form, the model cannot capture movements in a dynamic world, because in practice neither the maximum capacity nor the growth rate is constant. Hence, to overcome these issues, Taylor and Letham defined a time-varying maximum capacity C(t) and growth rate k. Suppose that we explicitly define S changepoints at times s_j, j = 1, ..., S, and a vector of rate adjustments \delta \in R^S with \delta_j the change in rate that occurs at time s_j10. The saturating growth model is defined as:

g(t) = C(t) / (1 + exp(-(k + a(t)^T \delta)(t - (m + a(t)^T \gamma))))    (5)

where

\gamma_j = (s_j - m - \sum_{l<j} \gamma_l)(1 - (k + \sum_{l<j} \delta_l) / (k + \sum_{l \le j} \delta_l)),
a_j(t) = 1 if t \ge s_j, and 0 otherwise.

The maximum capacity C(t) is adopted from an external data source. From the saturating growth model, we can define a piecewise linear model that does not exhibit saturating growth:

g(t) = (k + a(t)^T \delta) t + (m + a(t)^T \gamma)    (6)

As in the saturating growth model, k is the growth rate, \delta holds the rate adjustments, m is the offset parameter, and \gamma_j is set to -s_j \delta_j to make the function continuous.
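To make equation (6) concrete, the following minimal Python sketch evaluates a piecewise linear trend with explicit changepoints. It only illustrates the formula (it is not the Prophet implementation), and the changepoint times and rate adjustments in the example are made up.

import numpy as np

def piecewise_linear_trend(t, k, m, s, delta):
    # g(t) = (k + a(t)^T delta) * t + (m + a(t)^T gamma), equation (6)
    # t: array of time indices; k, m: base growth rate and offset
    # s: changepoint times s_j; delta: rate adjustments delta_j
    gamma = -s * delta                                  # gamma_j = -s_j * delta_j keeps g continuous
    a = (t[:, None] >= s[None, :]).astype(float)        # a_j(t) = 1 if t >= s_j, else 0
    return (k + a @ delta) * t + (m + a @ gamma)

# Toy example: the growth rate drops at t = 40 and again at t = 70.
t = np.arange(100.0)
g = piecewise_linear_trend(t, k=0.5, m=10.0,
                           s=np.array([40.0, 70.0]),
                           delta=np.array([-0.3, -0.4]))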
Figure 1: Fitted daily stock price of Ho Chi Minh City Securities (HCM) stock with a piecewise linear model from January 3rd, 2017 to February 26th, 2019, in log-scale.

Deep Neural Network

Recurrent Neural Network

Despite the power of deep neural networks, traditional neural networks have two drawbacks15. Firstly, the main assumption of standard neural networks is independence among the samples (data points). In other words, traditional neural networks cannot link the current event to previous events to inform later ones, because they preserve no state. In time series analysis, it is widely accepted that the current value depends on past values4, so the independence assumption fails. Secondly, traditional neural networks require a fixed-length vector for each sample. Hence, it was critical to develop a new generation of deep neural networks. Rumelhart, Hinton and Williams (p. 533) introduced a new learning procedure for neural networks with backpropagation which can capture an internal hidden state to "represent important features of the task domain"16. Furthermore, with later developments, recurrent neural networks can model sequential data with varying length and time dependences. A simple feed-forward recurrent neural network is defined as17:

h^{(t)} = \sigma(W^{hx} x^{(t)} + W^{hh} h^{(t-1)} + b_h)    (7)
\hat{y}^{(t)} = softmax(W^{yh} h^{(t)} + b_y)    (8)

where h^{(t)} is the hidden state for the input data point at time t. Clearly, h^{(t)} is influenced by h^{(t-1)}, the network's previous state. The output \hat{y}^{(t)} at each time t is calculated from the hidden node values h^{(t)} at time t. W^{hx} is the input-to-hidden weight matrix, W^{hh} is the hidden-to-hidden transition matrix, and W^{yh} is the hidden-to-output weight matrix. In most contexts, h^{(0)} is initialized to zero. Haykin, Principe, Sejnowski and Mcwhirter suggested that an RNN can achieve stability and higher performance with nonzero initialization18. Compared to a traditional fully connected feedforward network, a recurrent neural network shares parameters across the model, so it does not have to learn separately at each position of a sentence or series19. Earlier, Jordan proposed an almost identical architecture17; however, its context nodes are fed from the output layer instead of from the hidden layers20. This means that Jordan's network can take the previous predicted output into account when predicting the current output:

h^{(t)} = \sigma(W^{hx} x^{(t)} + W^{hh} \hat{y}^{(t-1)} + b_h)    (9)
\hat{y}^{(t)} = softmax(W^{yh} h^{(t)} + b_y)

In terms of training, there are two steps to train a recurrent neural network. First, forward propagation produces the outputs \hat{y}. After that, the loss value L(\hat{y}_k, y_k) at each output node k is computed in the backpropagation stage. There are many types of loss functions to measure the distance between the output and the actual value in classification problems. To minimize this distance, we update each of the weights iteratively by applying the backpropagation algorithm16. The algorithm applies the derivative chain rule to calculate the derivative of the loss function L with respect to each parameter in the network, and the weights of the neural network are then updated by the gradient descent algorithm15. Hence, the error gradient of a neuron is calculated as:

\delta_k = \frac{\partial L(\hat{y}_k, y_k)}{\partial \hat{y}_k} \, g'_k(a_k)    (10)

where a_k = w e_{a_k} + b is the input to node k, e_{a_k} is the incoming activation feeding a_k, and g_k is the activation function of node k.
The first term, \partial L(\hat{y}_k, y_k) / \partial \hat{y}_k, expresses how fast the cost changes as a function of the estimated output. The second term, g'_k(a_k), gives the rate of change of the activation function g_k at a_k. In vectorized form, we generalize equation (10) to any layer l:

\delta^l = \nabla_{\hat{y}} C \odot g'(a^l)    (11)

In addition, given \delta^{l+1}, we can compute the error of layer l as:

\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot g'(a^l)    (12)

and the error with respect to any weight and bias in the neural network:

\partial C / \partial w^l = \hat{y}^{l-1} \delta^l,    \partial C / \partial b^l = \delta^l    (13)

From the final layer back to the first hidden layer, we can apply backpropagation and compute the error vector \delta^l with the chain rule repeatedly to update the weight and bias vectors. For local minimum optimization, gradient descent is used to find the minimum of the cost function by updating the weight and bias vectors:

w^l \to w^l - \frac{\eta}{m} \sum_x \delta^{x,l} (\hat{y}^{x,l-1})^T    (14)
b^l \to b^l - \frac{\eta}{m} \sum_x \delta^{x,l}

where m is the number of training examples in a given mini-batch, x indexes the training examples, and \eta is the step size. In practice, many optimizers have been developed to address the limitations of mini-batch gradient descent21. For instance, momentum22 and accelerated gradient23 methods were developed to ease the problem of navigating ravines in stochastic gradient descent. The recurrent neural network is a breakthrough for temporal sequences, adding an internal state (memory) to each cell to process sequences of inputs. Its parameters can be computed and optimized by forward propagation and backpropagation. For a shallow network with a few hidden layers, the algorithm can be trained effectively. However, with many hidden layers it is hard to train the network due to the vanishing and exploding gradient problem, as derivatives become too small (e.g. between 0 and 1 for the sigmoid activation function) or too large. This only allows the network to learn short-range dependencies and prevents it from learning long-range dependencies. As a result, the long short-term memory network architecture24, the rectified linear unit activation function25, and the residual learning framework of He, Zhang, Ren and Sun26 were introduced to overcome the limitation.

Long Short-Term Memory Network

Hochreiter formally showed, through both theoretical and experimental approaches27, that when the data involve long-term dependencies, the backpropagated error of a recurrent neural network tends to explode or vanish through time, which may lead to oscillating weights or an unusable model. Beyond recurrent neural networks, Bengio, Simard and Frasconi28 also pointed out that any deep feed-forward neural network with shared weights may have the vanishing gradient problem. Hochreiter and Schmidhuber (p. 6) developed a new approach called Long Short-Term Memory (LSTM) to fill these gaps by introducing an "input gate unit", an "output gate unit", and a "memory cell"24. The purpose of the multiplicative input gate unit is to protect the memory contents from irrelevant inputs, and the multiplicative output gate unit protects other units from perturbation by currently irrelevant stored memory contents. In other words, with the LSTM architecture (see Figure 2), each cell can maintain its state over time and regulate the incoming and outgoing information. Hence, this type of neural network architecture is able to capture very long-term temporal dependencies effectively and, in principle, handle noise and continuous values with an unlimited number of states.
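As a minimal illustration of the recurrent update in equations (7)-(8) and the gradient-descent step in equation (14), the NumPy sketch below runs an Elman-style cell over a toy sequence. The layer sizes, the tanh nonlinearity and the zero-gradient placeholder are arbitrary choices for the example, not part of the models used later in the paper.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2                          # arbitrary sizes for the sketch

W_hx = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights
W_yh = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden-to-output weights
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

x_seq = rng.normal(size=(7, n_in))                       # a toy input sequence of length 7
h = np.zeros(n_hidden)                                   # h(0) initialized to zero, as in the text
for x_t in x_seq:
    h = np.tanh(W_hx @ x_t + W_hh @ h + b_h)             # equation (7), with sigma = tanh
    y_hat = softmax(W_yh @ h + b_y)                      # equation (8)

# One gradient-descent update, equation (14): theta <- theta - eta * (mini-batch gradient).
eta = 0.01
grad_W_yh = np.zeros_like(W_yh)                          # placeholder; obtained by backpropagation
W_yh -= eta * grad_W_yh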
Since its introduction, and with the growth of computational power, LSTM has been widely adopted and applied to many difficult problems in practice and academia, including language modeling28, text classification30, language translation30, and speech recognition31. Starting from the original LSTM proposed by Hochreiter and Schmidhuber24, a significant improvement was the introduction of forget gates to reset out-of-date contents of the LSTM memory cells32. In addition, to better learn timings, peephole connections that allow the gates to look at the cell state were added to the LSTM network. A forward pass of the LSTM architecture with forget gate and peephole connections is described as33:

\bar{z}^t = W_z x^t + R_z y^{t-1} + b_z,    z^t = g(\bar{z}^t)    (block input)
\bar{i}^t = W_i x^t + R_i y^{t-1} + p_i \odot c^{t-1} + b_i,    i^t = \sigma(\bar{i}^t)    (input gate)
\bar{f}^t = W_f x^t + R_f y^{t-1} + p_f \odot c^{t-1} + b_f,    f^t = \sigma(\bar{f}^t)    (forget gate)    (15)
c^t = z^t \odot i^t + c^{t-1} \odot f^t    (memory cell)
\bar{o}^t = W_o x^t + R_o y^{t-1} + p_o \odot c^t + b_o,    o^t = \sigma(\bar{o}^t)    (output gate)
y^t = h(c^t) \odot o^t    (block output)

Figure 2: Long Short-Term Memory network architecture. Adopted from Harmon, Klabjan29.

where z^t is the block input, i^t the input gate, f^t the forget gate, c^t the memory cell, o^t the output gate, and y^t the block output. W_z, W_i, W_f, W_o \in R^{N \times M} are input weights; R_z, R_i, R_f, R_o \in R^{N \times N} are recurrent weights; p_i, p_f, p_o \in R^N are peephole weights; b_z, b_i, b_f, b_o are bias weights; and g, \sigma, h are activation functions.

Like the RNN, the LSTM is trained with gradient descent, as it is a differentiable function estimator34. The backpropagation equations of the LSTM are:

\delta y^t = \Delta^t + R_z^T \delta \bar{z}^{t+1} + R_i^T \delta \bar{i}^{t+1} + R_f^T \delta \bar{f}^{t+1} + R_o^T \delta \bar{o}^{t+1}
\delta \bar{o}^t = \delta y^t \odot h(c^t) \odot \sigma'(\bar{o}^t)
\delta c^t = \delta y^t \odot o^t \odot h'(c^t) + p_o \odot \delta \bar{o}^t + p_i \odot \delta \bar{i}^{t+1} + p_f \odot \delta \bar{f}^{t+1} + \delta c^{t+1} \odot f^{t+1}    (16)
\delta \bar{f}^t = \delta c^t \odot c^{t-1} \odot \sigma'(\bar{f}^t)
\delta \bar{i}^t = \delta c^t \odot z^t \odot \sigma'(\bar{i}^t)
\delta \bar{z}^t = \delta c^t \odot i^t \odot g'(\bar{z}^t)
\delta x^t = W_z^T \delta \bar{z}^t + W_i^T \delta \bar{i}^t + W_f^T \delta \bar{f}^t + W_o^T \delta \bar{o}^t
\delta W_* = \sum_{t=0}^{T} \langle \delta *^t, x^t \rangle,    \delta R_* = \sum_{t=0}^{T-1} \langle \delta *^{t+1}, y^t \rangle,    \delta b_* = \sum_{t=0}^{T} \delta *^t
\delta p_i = \sum_{t=0}^{T-1} c^t \odot \delta \bar{i}^{t+1},    \delta p_f = \sum_{t=0}^{T-1} c^t \odot \delta \bar{f}^{t+1},    \delta p_o = \sum_{t=0}^{T} c^t \odot \delta \bar{o}^t

where * can be any of \bar{z}, \bar{i}, \bar{f}, \bar{o}, and \langle \cdot, \cdot \rangle is the outer product of two vectors.

It is worth noting that the peephole connections are not always implemented, unlike the forget gate, because omitting them simplifies the LSTM and reduces computational cost without significantly sacrificing performance. For instance, Keras35 does not support peepholes, while CNTK and TensorFlow do35,36. There have been many variants of the vanilla LSTM architecture with minor changes. Greff et al. found that the vanilla LSTM (with forget gate and peepholes) achieves reasonable performance on various datasets33. Despite the effectiveness of LSTM, there have been many efforts to simplify the architecture, as LSTM requires considerable computational power. The Gated Recurrent Unit (GRU), a variant of LSTM with fewer parameters obtained by simplifying the gating, introduced by Cho et al.37, has reasonable accuracy; however, Britz et al. show that LSTM still significantly outperforms GRU38. Van der Westhuizen et al. made another attempt to save computational cost while maintaining performance by developing a forget-gate-only version of the LSTM with chrono-initialized biases that achieves slightly higher accuracy39.

Sequence to sequence model

Sequence to Sequence is a learning model that maps an input sequence to a fixed-sized vector using one LSTM and then uses another LSTM to extract an output sequence from it. Sequence to Sequence has been widely applied in machine translation40, video captioning41, and time series classification for human activity recognition42.
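Before turning to the encoder-decoder construction, the sketch below spells out one step of the gated forward pass in equation (15), with peephole connections. Parameter shapes follow the N x M / N x N convention above, and the usual sigmoid/tanh choices are assumed for sigma, g and h; setting the peephole weights to zero recovers the plain LSTM cell used by Keras.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, y_prev, c_prev, p):
    # One forward step of an LSTM cell with forget gate and peepholes, equation (15).
    # p holds the parameters: W_* (N x M), R_* (N x N), p_* (N,), b_* (N,).
    z = np.tanh(p["W_z"] @ x_t + p["R_z"] @ y_prev + p["b_z"])                        # block input
    i = sigmoid(p["W_i"] @ x_t + p["R_i"] @ y_prev + p["p_i"] * c_prev + p["b_i"])    # input gate
    f = sigmoid(p["W_f"] @ x_t + p["R_f"] @ y_prev + p["p_f"] * c_prev + p["b_f"])    # forget gate
    c = z * i + c_prev * f                                                            # memory cell
    o = sigmoid(p["W_o"] @ x_t + p["R_o"] @ y_prev + p["p_o"] * c + p["b_o"])         # output gate
    y = np.tanh(c) * o                                                                # block output
    return y, c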
Bahdanau et al. used an RNN Encoder-Decoder that contains two recurrent neural networks (or long short-term memory networks) to map an input sequence into another sequence of symbols43. In other words, the encoder-decoder architecture is used to encode a sequence, decode the encoded representation, and recreate a sequence. The approach aims to maximize the conditional probability of the output sequence given an input sequence.

Figure 3: Encoder-Decoder architecture.

The encoder neural network transforms an input sequence of variable length X = (x_1, x_2, ..., x_T) into a fixed-length context variable that summarizes the sequence (see Figure 3). An RNN is most often used as the encoder network; however, Sutskever et al.40 found that deep LSTMs significantly outperformed shallow LSTMs and RNNs. As mentioned, RNNs and LSTMs use the previous hidden states h_1, h_2, ..., h_{t-1} to create the current hidden state h_t. Hence, the hidden state of an input sequence and its summary are defined as:

h_t = f(x_t, h_{t-1})    (17)
c = k(h_1, h_2, ..., h_T)

where h_t is the hidden state at time t, c is the summary hidden state of the whole input sequence, and f(.) can be an RNN, LSTM or GRU network, or an activation function. With the summary hidden state c and a target output Y = (y_1, y_2, ..., y_{T'}), instead of computing P(Y | X) directly, the decoder neural network computes the conditional probability using the previous outputs and the summary hidden state. It is formally described as:

P(y_1, ..., y_{T'} | x_1, ..., x_T) = \prod_{t'=1}^{T'} P(y_{t'} | c, y_1, ..., y_{t'-1})    (18)

The trained sequence to sequence model can be used to generate a sequence given an input sequence. In machine translation, reversing the order of the words of the input sequence is necessary because it makes it easy for the optimizer (e.g. stochastic gradient descent) to "establish communication between the input and the output"40 (p. 3). By their nature, time series prediction problems already have the desired order, as input and output are straightforward sequences.

EMPIRICAL RESULTS

Data

In this study, for liquidity and fairness of trading, we use daily price data of the 21 most traded stocks listed in the VN-Index of the Ho Chi Minh Stock Exchange and the HNX-Index of the Hanoi Stock Exchange (Vietnam) from 05 January 2015 to 19 January 2019, 1010 data points in total. We use the first 965 data points for training and the last 45 data points for testing. Nine window sizes are used for out-of-sample prediction, varying from 5 to 45 steps ahead in increments of 5. Furthermore, we use daily prices of the VN30F1M contract traded on the Hanoi Stock Exchange from 1 September 2017 to 13 November 2018 for training, and from 14 November 2018 to 15 May 2019 for performance validation (120 trading days).

Data Pre-processing

Beyond algorithm improvements and parameter tuning, one approach to improve the accuracy of a machine learning model is to apply data pre-processing techniques, for instance imputing missing values, encoding categorical data, detecting outliers, transforming data, and scaling data. In this work, we apply logarithm and Box-Cox transforms to the input dataset. The idea behind the logarithm transformation is to turn the probability distribution of the raw input data from skewed into approximately normal, which can improve prediction performance dramatically44. However, in some circumstances the logarithm technique does not generate new data that is less variable or more normal.
In contrast, it may produce data that is more variable and more skewed45. Thus, it is recommended that transformation techniques be applied very cautiously. The output of the transform stage is passed to a data scaler to be normalized. There are many scaling methods (e.g. scaling by the maximum absolute value, or to a given feature range). We use the min-max scaler, scaling the input features to the range [0, 1]. This ensures that large input values do not overwhelm smaller ones, which helps to stabilize neural networks46.

Detailed Results

Structural Time Series

The aim of this step is to create baseline models for comparing the prediction quality of structural time series and sequence to sequence models with our proposed model. Mean squared error was calculated to measure the performance of each out-of-sample forecast. We develop structural time series models as a baseline. For this task, we choose the Prophet package developed by Facebook for the Python programming language10. In this model, the input is the stock price transformed with the logarithm. In terms of parameter tuning, we mostly use default settings, except that we add monthly, quarterly and yearly seasonalities with Fourier orders of 20, 30 and 30 respectively. Because using the Box-Cox transformation as an extra regressor in structural time series models would require the future values of that regressor to be known at prediction time, we omit this transformation47. Without the extra regressor, the model can generate predictions for the 21 selected tickers for window sizes from 5 to 45 in increments of 5.

As mentioned, the Prophet model is a structural time series model that combines trend g(t), seasonality s(t), and irregular events. Figure 4 describes our procedure to generate out-of-sample predictions for model quality evaluation and to extract the trend g(t) of each series as an input feature of the Sequence to Sequence model, using Prophet on the logarithm- and Box-Cox-transformed stock price series. In detail, for every stock v in the selected list, we transform the price into a log-scale series LP and a Box-Cox series BC to use as inputs for the Prophet model P. We set no out-of-sample prediction (W = 0) to extract the trend series T from the in-sample data generated by P as a feature of the Sequence to Sequence model. For performance comparison, we set w to each of the nine window sizes W for out-of-sample prediction.

Sequence to Sequence Model

As another baseline, we develop a Sequence to Sequence model with the LSTM architecture. We use Keras with the TensorFlow backend to create an Encoder-Decoder model for the sequence to sequence problem35,36. To benefit from efficient parallel computation when training the deep neural network, we train the model on a virtual machine with a GPU on Google Cloud Platform. The sequence to sequence model uses the states of the encoder network to generate predictions from the decoder network. Hence, we feed normalized stock price series to the model and generate predictions. Figure 5 describes the approach we use to develop the baseline prediction with the sequence to sequence model. Like a vanilla LSTM model for supervised learning, we train on the input data over many iterations. However, we discard the output of the encoder and use its states as input for the decoder. Furthermore, to create predictions for the proposed model, we add the trend series (extracted in Figure 4) as another input feature.
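A minimal sketch of the Prophet baseline just described (log prices, extra monthly, quarterly and yearly seasonalities with Fourier orders 20, 30 and 30, and trend extraction with W = 0) is given below. The ds/y column names are Prophet's convention; the function name, the seasonality periods in days and the other variable names are illustrative assumptions rather than the authors' exact code.

import numpy as np
import pandas as pd
from fbprophet import Prophet          # `from prophet import Prophet` in newer releases

def fit_prophet_trend(dates, prices, horizon=0):
    # Fit Prophet on log-transformed prices and return the fitted trend and the forecast.
    df = pd.DataFrame({"ds": pd.to_datetime(dates), "y": np.log(prices)})
    m = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False)
    m.add_seasonality(name="monthly", period=30.5, fourier_order=20)
    m.add_seasonality(name="quarterly", period=91.25, fourier_order=30)
    m.add_seasonality(name="yearly", period=365.25, fourier_order=30)
    m.fit(df)
    future = m.make_future_dataframe(periods=horizon)   # horizon = W (0 extracts the in-sample trend)
    forecast = m.predict(future)
    return forecast["trend"], forecast["yhat"]

# Example: trend, yhat = fit_prophet_trend(px["date"], px["close"], horizon=45)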
The implementation is straightforward. First, as in Figure 4, we use the Box-Cox series BC and the logarithm-transformed series LP as input data. We scale every BC and LP to the range [0, 1] to create x for every entry of the scaled list of stock prices X*. Furthermore, we use the logarithm-transformed series as target data. We create and extract the hidden states of the encoder model En with LSTM architecture and initialize the decoder model DE with these hidden states. A main advantage of Sequence to Sequence with LSTM over structural time series models is that it can dynamically perform prediction over multiple time steps without requiring extra data. In terms of accuracy, we found that a deeper LSTM model does not outperform a shallow one. Hence, we use an LSTM with a single hidden layer of 64 cells and the rectified linear unit activation function. To prevent over-fitting, we apply both L2 regularization and dropout, with 0.0001 for the regularization parameter lambda and a dropout rate of 0.001, as recommended48.

Sequence to Sequence with Structural Time Series Models

In this step, we combine the sequence to sequence model and the structural time series models. Specifically, we use the output dataset D (with W = 0) from Figure 5 as training data for Figure 6. In other words, we combine the trend component of the structural time series models with the stock price in Box-Cox and logarithm forms. The parameters of these models are exactly the same as in the aforementioned baseline models. We found that the results improve dramatically.

Results Analysis

Structural time series models were used to generate a set of out-of-sample forecasts over multiple window time steps in log-scale (see Table 1). In terms of prediction error, the results show that MSE = 0.087787 (CTG at 45 time steps ahead) is the highest and MSE = 0.003501 (SSI at 5 time steps ahead) is the lowest. Likewise, results from the Sequence to Sequence model (see Table 2) and from Sequence to Sequence with Structural Time Series Models (see Table 3) show that MSE = 0.231800 (PNJ at 45 time steps ahead) and MSE = 0.046146 (ACB at 45 time steps ahead) are the highest, while MSE = 0.000068 (CII at 5 time steps ahead) and MSE = 0.000006 (CII at 5 time steps ahead) are the lowest, respectively. Figure 6 plots the prediction output of the models against the actual data of HCM stock.

Figure 4: Algorithm of structural time series analysis with the Facebook Prophet library.

Figure 5: Algorithm of Sequence to Sequence.

Figure 6: 45-day forecast of HCM.

In terms of back testing, applying the proposed model (Figure 6), we found that it can create a positive profit (see Figure 7). For simplicity, we do not consider tax and transaction fees. From an initial investment of $1000, we obtain $1159.2 at the end of the test. Specifically, we develop a trading environment from real market data with the return TRR (in index points) to measure the reward of the test. An agent is built from the proposed model. For every day of the 120 trading days TD, it uses the predicted return PR to choose positions. If the predicted return PR over the next two days (W = 2) is positive, we take a Long position; if it is negative, we take a Short position; if it is around zero, we hold the current position. A position is closed when the profit PFT is bigger than one point or the position has been held for more than a day (T = 2).
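For reference, the single-hidden-layer LSTM encoder-decoder described in this section (64 cells, rectified linear unit activation, L2 regularization of 0.0001 and a dropout rate of 0.001) can be sketched in Keras roughly as follows. The input dimensions, layer names and exact wiring are illustrative assumptions rather than the authors' code.

from tensorflow.keras import layers, models, regularizers

n_features, latent_dim = 3, 64            # e.g. log price, Box-Cox price, Prophet trend
reg = regularizers.l2(1e-4)

# Encoder: discard the sequence output and keep only the final hidden and cell states.
enc_in = layers.Input(shape=(None, n_features), name="encoder_input")
_, state_h, state_c = layers.LSTM(latent_dim, activation="relu", dropout=0.001,
                                  kernel_regularizer=reg, return_state=True)(enc_in)

# Decoder: initialized with the encoder states, predicts the target sequence step by step.
dec_in = layers.Input(shape=(None, 1), name="decoder_input")
dec_seq = layers.LSTM(latent_dim, activation="relu", dropout=0.001,
                      kernel_regularizer=reg, return_sequences=True)(
                      dec_in, initial_state=[state_h, state_c])
dec_out = layers.TimeDistributed(layers.Dense(1))(dec_seq)

model = models.Model([enc_in, dec_in], dec_out)
model.compile(optimizer="adam", loss="mse")
# model.fit([X_encoder, X_decoder], y_target, epochs=..., batch_size=...)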
From a univariate time series analysis perspective, we found that the structural time series models of Facebook Prophet generate stable, high-quality out-of-sample predictions without requiring advanced techniques or strong data assumptions. In addition, we found that they achieve even higher in-sample accuracy when we add an extra regressor to the structural time series models; unfortunately, we cannot create out-of-sample predictions with the extra regressor. In contrast to structural time series models, the Sequence to Sequence model with an LSTM neural network does not produce stable out-of-sample predictions. As Figure 8 shows, in some cases the Sequence to Sequence model captures the movement of stocks and generates highly accurate predictions with lower error than the structural time series models; in other cases, however, it fails to capture the movement consistently. In terms of computational performance, the Sequence to Sequence model also takes more time for training and prediction than the structural time series models. This leaves a gap in leveraging this state-of-the-art technique for time series prediction. Fortunately, the results in Table 3 suggest that we can fill the gaps of both the structural time series models and the Sequence to Sequence model by adding the output of the structural time series models to the Sequence to Sequence model. Figure 8 shows that the proposed model is stable and that its prediction error is almost always the lowest among the three models.

In terms of benchmark limitations, there are some drawbacks. On the one hand, the benchmark lacks a residual analysis for each prediction; we only compute the Mean Squared Error (MSE) for performance comparison. The evaluated results are not consistent enough to be fully conclusive, as some outlier points in Figure 9 show. On the other hand, although the results are clear and useful when MSE is used as an indicator of forecasting accuracy, such evaluation criteria cannot discriminate between forecasting models when the forecast errors are very close to each other. Thus, the Chong and Hendry encompassing test for nested models49 should be carried out to evaluate the statistical significance of the forecasting models.

Figure 7: Uncompounded daily cumulative profit of VN30F1M trading.
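The Long/Short rule behind the back test of Figure 7 can be sketched as follows. The thresholds, the position handling and all variable names are simplified assumptions based on the description above (Long if the predicted two-day return is positive, Short if negative, hold otherwise, and close after a one-point profit or after holding for more than a day); taxes and fees are ignored, as in the paper.

def backtest(predicted_returns, realized_moves, profit_target=1.0, max_hold_days=1):
    # Toy back test of the Long/Short rule on daily VN30F1M moves (index points).
    # predicted_returns: predicted two-day-ahead return for each trading day
    # realized_moves:    realized daily change of the futures price
    cumulative, position, open_profit, held = 0.0, 0, 0.0, 0   # position: +1 Long, -1 Short, 0 flat
    for pred, move in zip(predicted_returns, realized_moves):
        if position == 0:
            if pred > 0:
                position = 1             # go Long when the predicted return is positive
            elif pred < 0:
                position = -1            # go Short when it is negative
            open_profit, held = 0.0, 0   # stay flat when the prediction is around zero
        else:
            open_profit += position * move
            held += 1
            # Close when the profit exceeds one point or the position was held more than a day.
            if open_profit > profit_target or held > max_hold_days:
                cumulative += open_profit
                position = 0
    return cumulative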
Table 1: Mean squared error of structural time series model forecast from 5 to 45 window time steps ahead in log-scale

Ticker  5         10        15        20        25        30        35        40        45
ACB     0.014721  0.014147  0.015933  0.025314  0.028773  0.029434  0.034342  0.048010  0.058335
BID     0.024936  0.023140  0.014819  0.012375  0.014865  0.013050  0.014286  0.021603  0.023775
BVH     0.016722  0.022044  0.037414  0.046589  0.047791  0.047427  0.048169  0.052609  0.053467
CII     0.015892  0.010909  0.017354  0.025683  0.038861  0.049060  0.050928  0.056750  0.058097
CTD     0.020805  0.038501  0.043919  0.048860  0.056231  0.062125  0.070704  0.081036  0.082875
CTG     0.008746  0.030208  0.054718  0.065659  0.067614  0.068867  0.068824  0.084200  0.087787
DHG     0.024497  0.018136  0.037780  0.043572  0.043265  0.042221  0.044850  0.047424  0.051320
EIB     0.027591  0.027747  0.023507  0.023673  0.029305  0.030964  0.029256  0.021934  0.018138
FPT     0.012170  0.009933  0.009485  0.009722  0.010321  0.011376  0.011212  0.010736  0.010837
GAS     0.012885  0.016930  0.035988  0.047679  0.052466  0.057323  0.065196  0.078481  0.080781
HCM     0.047436  0.057021  0.064199  0.071993  0.069983  0.069544  0.070894  0.080751  0.081232
HPG     0.031256  0.030599  0.031586  0.044547  0.047608  0.050945  0.055629  0.067218  0.076217
MBB     0.015255  0.010532  0.027337  0.032361  0.031718  0.031117  0.032678  0.046915  0.052013
MSN     0.029451  0.022273  0.014707  0.011364  0.012848  0.012974  0.014851  0.022448  0.026552
PNJ     0.006993  0.007920  0.013121  0.016369  0.022160  0.028659  0.045313  0.065240  0.077773
PPC     0.030586  0.030464  0.024987  0.022773  0.029191  0.037727  0.045555  0.054872  0.056164
REE     0.019733  0.016948  0.015939  0.014662  0.012662  0.013016  0.011561  0.011009  0.013720
SBT     0.009171  0.009984  0.020123  0.023605  0.026334  0.026340  0.024738  0.013508  0.021442
SSI     0.003501  0.010974  0.025759  0.033560  0.035462  0.038449  0.042812  0.058004  0.059730
VCB     0.043887  0.052832  0.048772  0.038443  0.031976  0.029193  0.024631  0.011838  0.017969
VNM     0.049289  0.055314  0.049015  0.043668  0.047589  0.050864  0.054577  0.048458  0.042112

Figure 8: Error of out-of-sample forecasts in log-scale.

Figure 9: Backtest trading strategy.
Table 2: Mean squared error of sequence to sequence model forecast from 5 to 45 window time steps ahead in log-scale

Ticker  5         10        15        20        25        30        35        40        45
ACB     0.001549  0.010649  0.015936  0.036856  0.066454  0.088031  0.065661  0.023986  0.111370
BID     0.002975  0.023858  0.022093  0.055056  0.045212  0.055175  0.123015  0.111880  0.102072
BVH     0.003339  0.007065  0.022930  0.037389  0.038270  0.048385  0.094943  0.131791  0.118588
CII     0.000068  0.000949  0.007197  0.011593  0.008407  0.015061  0.032799  0.024295  0.019871
CTD     0.003564  0.031081  0.093107  0.072614  0.134139  0.160979  0.193552  0.157608  0.180879
CTG     0.000067  0.001404  0.002087  0.008134  0.003509  0.010062  0.014351  0.011942  0.010831
DHG     0.004307  0.015290  0.027839  0.005491  0.031615  0.040781  0.029965  0.034531  0.036360
EIB     0.000734  0.004151  0.005739  0.015192  0.009920  0.018238  0.014715  0.021097  0.012947
FPT     0.000538  0.004052  0.008909  0.011313  0.022583  0.027503  0.039039  0.028022  0.028253
GAS     0.001004  0.021844  0.031094  0.073882  0.090982  0.066480  0.057572  0.106243  0.162174
HCM     0.000771  0.005351  0.012451  0.006651  0.022035  0.030805  0.048325  0.047193  0.037641
HPG     0.000912  0.016714  0.055040  0.013062  0.073548  0.086221  0.099066  0.055516  0.117556
MBB     0.000955  0.008257  0.016460  0.021182  0.035717  0.054560  0.038501  0.064305  0.077589
MSN     0.001165  0.008558  0.019539  0.023428  0.032733  0.051309  0.039536  0.061719  0.071979
PNJ     0.009910  0.006657  0.061987  0.097721  0.094561  0.124379  0.171758  0.173592  0.231800
PPC     0.001150  0.002104  0.014166  0.012028  0.017004  0.027315  0.031092  0.031779  0.035133
REE     0.000612  0.005939  0.008998  0.008640  0.012904  0.017005  0.007903  0.031858  0.039960
SBT     0.005828  0.017245  0.031245  0.023963  0.067473  0.087509  0.049534  0.072302  0.079048
SSI     0.000589  0.004103  0.003022  0.011251  0.014304  0.013024  0.026790  0.025110  0.035506
VCB     0.003926  0.009320  0.032977  0.021303  0.023619  0.066008  0.039739  0.086166  0.084254
VNM     0.003010  0.021958  0.021868  0.043893  0.065320  0.049399  0.068308  0.087976  0.119454

However, since no Python package supports this test at the time of writing, it was not carried out, so an appropriate statistical benchmark could not be conducted. In addition, the Diebold-Mariano (DM) test for comparing predictive accuracy (not for comparing models) cannot be applied, as it only works for non-nested models50,51. Hence, we develop a back test of the best model suggested by our benchmark (i.e. the proposed model) with real market data in a different asset class to relax this limitation. Overall, for the same window size, the combination of structural time series models and the Sequence to Sequence model always achieves higher performance than the pure structural time series and Sequence to Sequence models. However, in some cases, the hybrid model cannot capture the movement of a stock when the market is highly volatile.

CONCLUSION AND DISCUSSION

In this work, we discussed a set of procedures to model and predict stock prices in the Vietnam stock market with structural time series models, a Sequence to Sequence model, and the combination of these models. Specifically, we fit stock price data with structural time series models, then use the fitted data as an input feature of the Sequence to Sequence model and generate out-of-sample predictions. We used the outputs of the models to compare the accuracy of each model. We found that our proposed model can overcome the limitations of each model and generate forecasts with higher accuracy.
The proposed model also achieves positive results for derivatives trading with real market data. Hence, the combination of Long Short-Term Memory and structural time series models is applicable to the Vietnam stock market. Furthermore, deep learning is a powerful approach to address time series problems. However, without feature engineering, deep learning generates predictions with lower accuracy than structural time series models. In future work, we will improve the model to achieve real-time prediction for quantitative trading. In addition, we believe that Generative Adversarial Networks (GAN) are a promising approach to apply.

Table 3: Mean squared error of proposed model forecast from 5 to 45 window time steps ahead in log-scale

Ticker  5         10        15        20        25        30        35        40        45
ACB     0.000243  0.001528  0.000953  0.002946  0.000294  0.001339  0.002150  0.014070  0.046146
BID     0.002119  0.001047  0.003420  0.002281  0.007414  0.007904  0.039367  0.035262  0.002000
BVH     0.000145  0.007243  0.000876  0.000427  0.011959  0.002539  0.010426  0.017116  0.002917
CII     0.000006  0.002513  0.000787  0.001118  0.001933  0.000599  0.000063  0.002505  0.003436
CTD     0.001580  0.000395  0.000173  0.000329  0.008648  0.000776  0.016954  0.015527  0.002938
CTG     0.001237  0.000776  0.000641  0.002322  0.000283  0.000287  0.000458  0.002171  0.004216
DHG     0.000081  0.002963  0.000692  0.000894  0.005131  0.002980  0.000729  0.000448  0.024931
EIB     0.000022  0.001645  0.000096  0.001451  0.000103  0.001309  0.003419  0.001766  0.014310
FPT     0.000018  0.000009  0.002727  0.000098  0.000629  0.000087  0.004376  0.011860  0.001626
GAS     0.001049  0.000242  0.006162  0.001701  0.007973  0.000451  0.012106  0.007322  0.018421
HCM     0.000120  0.000814  0.000365  0.006616  0.000347  0.000154  0.005450  0.009231  0.003637
HPG     0.001480  0.000214  0.000307  0.000747  0.000886  0.001769  0.005409  0.006507  0.001292
MBB     0.000151  0.000234  0.002842  0.000414  0.003575  0.004369  0.011552  0.002987  0.007829
MSN     0.002896  0.001696  0.000507  0.001222  0.003149  0.006190  0.000697  0.002198  0.000785
PNJ     0.000267  0.003331  0.000503  0.000599  0.010909  0.003138  0.030432  0.022743  0.000275
PPC     0.000106  0.001729  0.001398  0.002341  0.001320  0.000454  0.003241  0.002314  0.000987
REE     0.000349  0.001104  0.000185  0.000304  0.000936  0.000123  0.002099  0.005302  0.011126
SBT     0.000863  0.002671  0.001414  0.000694  0.000526  0.000598  0.000768  0.013324  0.002088
SSI     0.000214  0.000316  0.000056  0.000131  0.001353  0.002919  0.000708  0.000588  0.007224
VCB     0.000439  0.000612  0.000192  0.000652  0.016464  0.007455  0.000140  0.002109  0.026819
VNM     0.000237  0.000130  0.008860  0.005173  0.002549  0.000879  0.000835  0.004840  0.000963

ACKNOWLEDGEMENT

This paper was written under Grant number CS/2019-04, funded by the University of Economics and Law, VNU-HCM. We would also like to thank the John von Neumann Institute for their support throughout the project.

ABBREVIATIONS

ARMA: Auto-regressive moving-average
GARCH: Generalized Auto-regressive Conditional Heteroskedasticity
RNN: Recurrent Neural Network
LSTM: Long Short-term Memory
Seq2Seq: Sequence to sequence
GAN: Generative Adversarial Networks
MSE: Mean Squared Error
HOSE: Ho Chi Minh Stock Exchange
HNX: Hanoi Stock Exchange

COMPETING INTERESTS

The authors declare that they have no conflict of interest.

AUTHORS' CONTRIBUTIONS

Quoc Luu and Uyen Pham initiated the idea, studied relevant models and sought the data. Quoc Luu and Son Nguyen built the main programs for numerical simulations.
All authors checked the simulations, contributed to the interpretation of the results, and edited, revised and approved the article.

REFERENCES

1. Zhang Y, Tan X, Xi H, Zhao X. Real-time risk management based on time series analysis. In: 2008 7th World Congress on Intelligent Control and Automation. IEEE; 2008. p. 2518–2523.
2. Giamouridis D, Vrontos I. Hedge fund portfolio construction: A comparison of static and dynamic approaches. Journal of Banking & Finance. 2007;31(1):199–217. Available from: https://doi.org/10.1016/j.jbankfin.2006.01.002.
3. Zhou X, Pan Z, Hu G, Tang S, Zhao C. Stock market prediction on high-frequency data using generative adversarial nets. Mathematical Problems in Engineering. 2018. Available from: https://doi.org/10.1155/2018/4907423.
4. Box G, Jenkins G, Reinsel G, Ljung G. Time series analysis: forecasting and control. John Wiley & Sons; 2015.
5. Harvey A, Todd P. Forecasting economic time series with structural and Box-Jenkins models: A case study. Journal of Business & Economic Statistics. 1983;1(4):299–307. Available from: https://doi.org/10.1080/07350015.1983.10509355.
6. Géron A. Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. O'Reilly Media, Inc.; 2017.
7. Nelson C, Kang H. Pitfalls in the Use of Time as an Explanatory Variable in Regression. Journal of Business & Economic Statistics. 1984;2(1):73–82. Available from: https://doi.org/10.1080/07350015.1984.10509371.
8. Harvey A, Peters S. Estimation procedures for structural time series models. Journal of Forecasting. 1990;9(2):89–108. Available from: https://doi.org/10.1002/for.3980090203.
9. Jalles J. Structural time series models and the Kalman Filter: a concise review. 2009. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1496864.
10. Taylor S, Letham B. Forecasting at scale. The American Statistician. 2018;72(1):37–45. Available from: https://doi.org/10.1080/00031305.2017.1380080.
11. Harvey A. Trends and cycles in macroeconomic time series. Journal of Business & Economic Statistics. 1985;3(3):216–227. Available from: https://doi.org/10.1080/07350015.1985.10509453.
12. Kitagawa G, Gersch W. A smoothness priors-state space modeling of time series with trend and seasonality. Journal of the American Statistical Association. 1984;79(386):378–389. Available from: https://doi.org/10.1080/01621459.1984.10478060.
13. Harrison P, Stevens C. Bayesian forecasting. Journal of the Royal Statistical Society: Series B (Methodological). 1976;38(3):205–228. Available from: https://doi.org/10.1111/j.2517-6161.1976.tb01586.x.
14. Koopman S, Ooms M. Forecasting economic time series using unobserved components time series models. 2011. p. 129–162. Available from: https://doi.org/10.1093/oxfordhb/9780195398649.013.0006.
15. Lipton Z, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019. 2015.
16. Rumelhart D, Hinton G, Williams R. Learning representations by back-propagating errors. Cognitive Modeling. 1988;5(3):1.
17. Elman J. Finding structure in time. Cognitive Science. 1990;14(2):179–211. Available from: https://doi.org/10.1207/s15516709cog1402_1.
18. Haykin S, Principe J, Sejnowski T, Mcwhirter J. Modeling Large Dynamical Systems with Dynamical Consistent Neural Networks.
19. Goodfellow I, Bengio Y, Courville A.
Deep learning. MIT Press; 2016.
20. Jordan MI. Serial order: a parallel distributed processing approach. Technical report, June 1985–March 1986. California Univ., San Diego, La Jolla (USA), Inst. for Cognitive Science; 1986.
21. Ruder S. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747. 2016.
22. Qian N. On the momentum term in gradient descent learning algorithms. Neural Networks. 1999;12(1):145–151. Available from: https://doi.org/10.1016/S0893-6080(98)00116-6.
23. Yu N. Introductory lectures on convex optimization: a basic course. Springer US; 2004.
24. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735–1780. PMID: 9377276. Available from: https://doi.org/10.1162/neco.1997.9.8.1735.
25. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010. p. 807–814.
26. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 770–778. Available from: https://doi.org/10.1109/CVPR.2016.90. PMid: 26180094.
27. Hochreiter S. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München. 1991;91(1).
28. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks. 1994;5(2):157–166. PMID: 18267787. Available from: https://doi.org/10.1109/72.279181.
29. Harmon M, Klabjan D. Dynamic prediction length for time series with sequence to sequence networks. arXiv preprint arXiv:1807.00425. 2018.
30. Zhou C, Sun C, Liu Z, Lau F. A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630. 2015.
31. Graves A, Jaitly N, Mohamed A. Hybrid speech recognition with deep bidirectional LSTM. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE; 2013. p. 273–278. Available from: https://doi.org/10.1109/ASRU.2013.6707742.
32. Gers F, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with LSTM.
33. Greff K, Srivastava R, Koutník J, Steunebrink B, Schmidhuber J. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems. 2016;28(10):2222–32. PMID: 27411231. Available from: https://doi.org/10.1109/TNNLS.2016.2582924.
34. Graves A. Supervised sequence labelling. In: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Berlin, Heidelberg; 2012. p. 5–13. Available from: https://doi.org/10.1007/978-3-642-24797-2_2.
35. Chollet F. Keras: Deep learning library for Theano and TensorFlow. 2015;7(8). Available from: https://keras.io/.
36. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016. p. 265–283.
37. Cho K, Merriënboer BV, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014. Available from: https://doi.org/10.3115/v1/D14-1179.
38. Britz D, Goldie A, Luong M, Le Q. Massive exploration of neural machine translation architectures. arXiv preprint arXiv:1703.03906. 2017. Available from: https://doi.org/10.18653/v1/D17-1151.
39. van der Westhuizen J, Lasenby J. The unreasonable effectiveness of the forget gate.
arXiv preprint arXiv:1804.04849. 2018.
40. Sutskever I, Vinyals O, Le Q. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems. 2014. p. 3104–3112.
41. Venugopalan S, Rohrbach M, Donahue J, Mooney R, Darrell T, Saenko K. Sequence to sequence - video to text. In: Proceedings of the IEEE International Conference on Computer Vision. p. 4534–4542. Available from: https://doi.org/10.1109/ICCV.2015.515.
42. Tang Y, Xu J, Matsumoto K, Ono C. Sequence-to-sequence model with attention for time series classification. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE; 2016. p. 503–510. Available from: https://doi.org/10.1109/ICDMW.2016.0078.
43. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. 2014.
44. Yang Y, Linkens D, Mahfouf M. Nonlinear Data Transformation to Improve Flow Stress Prediction Using Neural Networks. IFAC Proceedings Volumes. 2004;37(15):371–376. Available from: https://doi.org/10.1016/S1474-6670(17)31052-2.
45. Changyong F, Hongyue W, Naiji L, Tian C, Hua H, Ying L. Log-transformation and its implications for data analysis. Shanghai Archives of Psychiatry. 2014;26(2):105.
46. Kim K. Financial time series forecasting using support vector machines. Neurocomputing. 2003;55(1-2):307–319. Available from: https://doi.org/10.1016/S0925-2312(03)00372-2.
47. Facebook. Prophet Notebook.
48. Hinton G, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R. Improving neural networks by preventing co-adaptation of feature detectors. CoRR. 2012.
49. Chong Y, Hendry D. Econometric evaluation of linear macro-economic models. The Review of Economic Studies. 1986;53(4):671–690. Available from: https://doi.org/10.2307/2297611.
50. Vavra M. On a Bootstrap Test for Forecast Evaluations. Research Department, National Bank of Slovakia; 2015.
51. Diebold F. Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests. Journal of Business & Economic Statistics. 2015;33(1):1. Available from: https://doi.org/10.1080/07350015.2014.983236.

Science & Technology Development Journal - Economics - Law and Management, 4(1):500-515. Open Access Research Article.

Authors: Lưu Hoài Thương Quốc (1,*), Nguyễn Phúc Sơn (2), Phạm Hoàng Uyên (1,2)
1 Quantitative and Computational Finance, John von Neumann Institute, Ho Chi Minh City, Vietnam
2 Mathematical Economics, University of Economics and Law, Ho Chi Minh City, Vietnam

Correspondence: Lưu Hoài Thương Quốc, Quantitative and Computational Finance, John von Neumann Institute, Ho Chi Minh City, Vietnam. Email: quoc.luu2015@qcf.jvn.edu.vn

History: Received 28/6/2019; Accepted 25/9/2019; Published 31/3/2020
DOI: 10.32508/stdjelm.v4i1.593
Copyright © VNU-HCM. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Keywords: LSTM, Seq2Seq, structural models, hybrid model

Cite this article: Hoài Thương Quốc L, Phúc Sơn N, Hoàng Uyên P. Time series prediction: A combination of Long Short-Term Memory and structural time series models. Sci. Tech. Dev. J. - Eco. Law Manag.; 4(1):500-515.
