Stock Movement Prediction from Tweets and Historical Prices
Yumo Xu and Shay B. Cohen
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
yumo.xu@ed.ac.uk, scohen@inf.ed.ac.uk

Abstract

Stock movement prediction is a challenging problem: the market is highly stochastic, and we make temporally-dependent predictions from chaotic data. We treat these three complexities and present a novel deep generative model jointly exploiting text and price signals for this task. Unlike the case with discriminative or topic modeling, our model introduces recurrent, continuous latent variables for a better treatment of stochasticity, and uses neural variational inference to address the intractable posterior inference. We also provide a hybrid objective with temporal auxiliary to flexibly capture predictive dependencies. We demonstrate the state-of-the-art performance of our proposed model on a new stock movement prediction dataset which we collected.[1]

1 Introduction

Stock movement prediction has long attracted both investors and researchers (Frankel, 1995; Edwards et al., 2007; Bollen et al., 2011; Hu et al., 2018). We present a model to predict stock price movement from tweets and historical stock prices.

In natural language processing (NLP), public news and social media are two primary content resources for stock market prediction, and the models that use these sources are often discriminative. Among them, classic research relies heavily on feature engineering (Schumaker and Chen, 2009; Oliveira et al., 2013). With the prevalence of deep neural networks (Le and Mikolov, 2014), event-driven approaches were studied with structured event representations (Ding et al., 2014, 2015). More recently, Hu et al. (2018) proposed to mine news sequences directly from text with hierarchical attention mechanisms for stock trend prediction.

However, stock movement prediction is widely considered difficult due to the high stochasticity of the market: stock prices are largely driven by new information, resulting in a random-walk pattern (Malkiel, 1999). Instead of using only deterministic features, generative topic models were extended to jointly learn topics and sentiments for the task (Si et al., 2013; Nguyen and Shirai, 2015). Compared to discriminative models, generative models have the natural advantage of depicting the generative process from market information to stock signals and of introducing randomness. However, these models underrepresent chaotic social texts with bag-of-words features and employ simple discrete latent variables.

In essence, stock movement prediction is a time series problem. The significance of the temporal dependency between movement predictions is not addressed in existing NLP research. For instance, when a company suffers a major scandal on a trading day d_1, its stock price will generally trend downward over the coming trading days until some day d_2, i.e. over the interval [d_1, d_2].[2] If a stock predictor can recognize this decline pattern, it is likely to benefit all the movement predictions within [d_1, d_2]; otherwise, the accuracy in this interval might be harmed. This predictive dependency results from the fact that public information, e.g. a company scandal, needs time to be absorbed into movements (Luss and d'Aspremont, 2015), and is thus largely shared across temporally-close predictions.

Aiming to tackle the above research gaps in modeling high market stochasticity, chaotic market information,
and temporally-dependent prediction, we propose StockNet, a deep generative model for stock movement prediction.

[1] https://github.com/yumoxu/stocknet-dataset
[2] We use the notation [a, b] to denote the interval of integer numbers between a and b.
To better incorporate stochastic factors, we generate stock movements from latent driven factors modeled with recurrent, continuous latent variables. Motivated by Variational Auto-Encoders (VAEs; Kingma and Welling, 2013; Rezende et al., 2014), we propose a novel decoder with a variational architecture and derive a recurrent variational lower bound for end-to-end training (Section 5.2). To the best of our knowledge, StockNet is the first deep generative model for stock movement prediction.

To fully exploit market information, StockNet learns directly from data without pre-extracting structured events. We build market sources by referring to both fundamental information, e.g. tweets, and technical features, e.g. historical stock prices (Section 5.1).[3] To accurately depict predictive dependencies, we assume that the movement prediction for a stock can benefit from learning to predict its historical movements in a lag window. We propose trading-day alignment as the framework basis (Section 4), and further provide a novel multi-task learning objective (Section 5.3).

We evaluate StockNet on a stock movement prediction task with a new dataset that we collected. Compared with strong baselines, our experiments show that StockNet achieves state-of-the-art performance by incorporating both data from Twitter and historical stock price listings.

2 Problem Formulation

We aim at predicting the movement of a target stock s in a pre-selected stock collection S on a target trading day d. Formally, we use the market information comprising relevant social media corpora M, i.e. tweets, and historical prices in the lag [d - Δd, d - 1], where Δd is a fixed lag size. We estimate the binary movement, where 1 denotes rise and 0 denotes fall,

    y = 1(p^c_d > p^c_{d-1})    (1)

where p^c_d denotes the adjusted closing price, adjusted for corporate actions affecting stock prices, e.g. dividends and splits.[4] The adjusted closing price is widely used for predicting stock price movement (Xie et al., 2013) or financial volatility (Rekabsaz et al., 2017).

[3] To a fundamentalist, stocks have intrinsic values that can be derived from the behavior and performance of their company. On the contrary, technical analysis considers only the trends and patterns of the stock price.
[4] Technically, d - 1 may not be an eligible trading day and thus has no available price information. In the rest of this paper, the problem is solved by keeping notational consistency with our recurrent model and using its time step t to index trading days; details are provided in Section 4. We use d here to make the formulation easier to follow.
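As a concrete illustration of Eq. (1), the sketch below computes movement labels from a series of adjusted closing prices. The function name and the NumPy representation are our own choices for illustration, not part of the released code.

```python
import numpy as np

def movement_labels(adj_close: np.ndarray) -> np.ndarray:
    """Eq. (1): y = 1 if the adjusted closing price rises from
    trading day d-1 to trading day d, and 0 otherwise."""
    # adj_close[d] holds p^c_d for consecutive trading days d = 0..D-1
    return (adj_close[1:] > adj_close[:-1]).astype(int)

# Example: 10.0 -> 10.2 is a rise (1); 10.2 -> 10.1 is a fall (0).
print(movement_labels(np.array([10.0, 10.2, 10.1])))  # [1 0]
```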
3 Data Collection

In finance, stocks are categorized into 9 industries: Basic Materials, Consumer Goods, Healthcare, Services, Utilities, Conglomerates, Financial, Industrial Goods and Technology.[5] Since high-trade-volume stocks tend to be discussed more on Twitter, we select the two-year price movements from 01/01/2014 to 01/01/2016 of 88 stocks, comprising all 8 stocks in Conglomerates and the top 10 stocks by capital size in each of the other 8 industries (see supplementary material).

We observe that a number of targets have exceptionally minor movement ratios. In a three-way stock trend prediction task, a common practice is to assign such movements to a third "preserve" class by setting upper and lower thresholds on the stock price change (Hu et al., 2018). Since we aim at the binary classification of stock changes identifiable from social media, we set two particular thresholds, -0.5% and 0.55%, and simply remove the 38.72% of the selected targets whose movement percentages fall between the two thresholds. Samples with movement percentages ≤ -0.5% and > 0.55% are labeled 0 and 1, respectively. The two thresholds are selected to balance the two classes, resulting in 26,614 prediction targets in the whole dataset, with 49.78% and 50.22% of them in the two classes. We split them temporally: 20,339 movements between 01/01/2014 and 01/08/2015 are for training, 2,555 movements from 01/08/2015 to 01/10/2015 are for development, and 3,720 movements from 01/10/2015 to 01/01/2016 are for test.

There are two main components in our dataset:[6] a Twitter dataset and a historical price dataset. We access Twitter data under the official license of Twitter, then retrieve stock-specific tweets by querying regexes made up of NASDAQ ticker symbols, e.g. "\$GOOG\b" for Google Inc. We preprocess tweet texts using the NLTK package (Bird et al., 2009) with its Twitter mode, including tokenization and treatment of hyperlinks, hashtags and the "@" identifier. To alleviate sparsity, we further filter samples by ensuring there is at least one tweet for each corpus in the lag. We extract historical prices for the 88 selected stocks from Yahoo Finance to build the historical price dataset.[7]

[5] https://finance.yahoo.com/industries
[6] Our dataset is available at https://github.com/yumoxu/stocknet-dataset.
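To make the ticker-regex retrieval concrete, here is a minimal sketch of the matching rule described above; the helper name and sample tweets are illustrative, and the actual collection pipeline is not shown.

```python
import re

def ticker_regex(symbol: str) -> re.Pattern:
    # e.g. "\$GOOG\b": the literal cashtag "$GOOG" followed by a word
    # boundary, so the longer ticker "$GOOGL" does not match.
    return re.compile(r"\$" + re.escape(symbol) + r"\b")

tweets = ["$GOOG hits a new high", "$GOOGL diverges today", "google rumors"]
pattern = ticker_regex("GOOG")
print([t for t in tweets if pattern.search(t)])
# ['$GOOG hits a new high']
```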
4 Model Overview

We provide an overview of data alignment, model factorization and model components.

As explained in Section 1, we assume that predicting the movement on trading day d can benefit from predicting the movements on its former trading days. However, due to the general principle of sample independence, building connections directly across samples with temporally-close target dates is problematic for model training. As an alternative, we notice that within a sample with a target trading day d, there are likely to be multiple trading days whose movement information is also contained in the sample. We therefore use the trading day, instead of the calendar day used in existing research, as the basic unit for building samples. To this end, we first find all the T eligible trading days referred to in a sample, in other words, those in the time interval [d - Δd + 1, d]. For clarity, within the scope of one sample, we index these trading days with t ∈ [1, T],[8] and each of them maps to an actual (absolute) trading day d_t. We then propose trading-day alignment: we reorganize our inputs, including the tweet corpora and historical prices, by aligning them to these T trading days. Specifically, on the t-th trading day, we recognize market signals from the corpus M_t in [d_{t-1}, d_t) and the historical prices p_t on d_{t-1}, for predicting the movement y_t on d_t. We provide an aligned sample for illustration in Figure 2. As a result, every single unit in a sample is a trading day, and we can predict a sequence of movements y = [y_1, ..., y_T]. The main target is y_T, while the remainder y* = [y_1, ..., y_{T-1}] serves as the temporal auxiliary target. We use these in addition to the main target to improve prediction accuracy (Section 5.3); a sketch of the alignment follows below.

[Figure 1: Illustration of the generative process from observed market information X to stock movements y via the latent factor Z, over |D| samples. Solid lines denote the generation process and dashed lines the variational approximation to the intractable posterior.]

We model the generative process shown in Figure 1. We encode observed market information as a random variable X = [x_1; ...; x_T], from which we generate the latent driven factor Z = [z_1; ...; z_T] for our prediction task. For the aforementioned multi-task learning purpose, we aim at modeling the conditional probability distribution p_θ(y|X) = ∫_Z p_θ(y, Z|X) instead of p_θ(y_T|X). We write the following factorization for generation,

    p_θ(y, Z|X) = p_θ(y_T|X, Z) p_θ(z_T|z_{<T}, X) Π_{t=1}^{T-1} p_θ(y_t|x_{≤t}, z_{≤t}) p_θ(z_t|z_{<t}, x_{≤t})    (2)
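Returning to trading-day alignment, the sketch below illustrates the reorganization for a single sample. All names are ours, and the lookups (trading_days, tweets_by_day, price_by_day) are assumed inputs rather than the paper's data structures.

```python
from datetime import date, timedelta

def align_sample(target: date, lag: int, trading_days, tweets_by_day, price_by_day):
    """Align corpora and prices to the eligible trading days in
    [target - lag + 1, target], per the trading-day alignment above."""
    window = [day for day in sorted(trading_days)
              if target - timedelta(days=lag - 1) <= day <= target]
    aligned = []
    for prev, cur in zip(window, window[1:]):  # consecutive pairs (d_{t-1}, d_t)
        corpus = [tweet for day, tweets in tweets_by_day.items()
                  if prev <= day < cur                # M_t covers [d_{t-1}, d_t)
                  for tweet in tweets]
        aligned.append({"corpus": corpus,
                        "price": price_by_day[prev],  # p_t, observed on d_{t-1}
                        "movement_day": cur})         # y_t is the movement on d_t
    return aligned
```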
[Figure 2: The architecture of StockNet, with panels (a) Variational Movement Decoder (VMD), (b) Market Information Encoder (MIE), (c) Attentive Temporal Auxiliary (ATA) and (d) the VAE structure with Bi-GRUs and the Message Embedding Layer. We use the main target of 07/08/2012 and a lag size of 5 for illustration. Since 04/08/2012 and 05/08/2012 are not trading days (a weekend), trading-day alignment helps StockNet organize message corpora and historical prices for the other three trading days in the lag. Dashed lines denote auxiliary components; red points denoting temporal objectives are integrated with a temporal attention mechanism to acquire the final training objective.]

As shown in Figure 2, StockNet comprises three primary components in a bottom-up fashion:

1. Market Information Encoder (MIE), which encodes tweets and prices to X;
2. Variational Movement Decoder (VMD), which infers Z with X, y and decodes stock movements y from X, Z;
3. Attentive Temporal Auxiliary (ATA), which integrates the temporal loss through an attention mechanism for model training.

5 Model Components

We next detail the components of our model (MIE, VMD, ATA) and the way we estimate model parameters.

5.1 Market Information Encoder

MIE encodes information from social media and stock prices to enhance market information quality, and outputs the market information input X for VMD. Each temporal input is defined as

    x_t = [c_t, p_t]    (3)

where c_t and p_t are the corpus embedding and the historical price vector, respectively.

The basic strategy for acquiring c_t is to first feed messages into the Message Embedding Layer to obtain low-dimensional representations, and then to selectively gather them according to their quality. To handle the circumstance that multiple stocks are discussed in one single message, we incorporate, in addition to text information, the position information of the stock symbols mentioned in messages. Specifically, the layer consists of a forward GRU and a backward GRU for the preceding and following contexts of a stock symbol s, respectively. Formally, in the message corpus of the t-th trading day, we denote the word sequence of the k-th message, k ∈ [1, K], as W, where W_{ℓ*} = s, ℓ* ∈ [1, L], and its word embedding matrix as E = [e_1; e_2; ...; e_L]. We run the two GRUs as follows,

    →h_f = GRU(e_f, →h_{f-1})    (4)
    ←h_b = GRU(e_b, ←h_{b+1})    (5)
    m = (→h_{ℓ*} + ←h_{ℓ*}) / 2    (6)

where f ∈ [1, ..., ℓ*] and b ∈ [ℓ*, ..., L]. The stock symbol is regarded as the last unit in both the preceding and the following contexts, where the hidden values →h_{ℓ*} and ←h_{ℓ*} are averaged to acquire the message embedding m. Gathering all message embeddings for the t-th trading day, we obtain a message embedding matrix M_t ∈ R^{d_m × K}.
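As a minimal sketch of Eqs. (4)-(6), the following PyTorch module embeds one message around its stock-symbol position. The authors' implementation is in TensorFlow, and the class name and dimension defaults here are our assumptions.

```python
import torch
import torch.nn as nn

class MessageEmbedder(nn.Module):
    """Bidirectional GRU message embedding, per Eqs. (4)-(6)."""
    def __init__(self, emb_dim: int = 50, hid_dim: int = 100):
        super().__init__()
        self.fwd = nn.GRUCell(emb_dim, hid_dim)  # preceding context, Eq. (4)
        self.bwd = nn.GRUCell(emb_dim, hid_dim)  # following context, Eq. (5)
        self.hid_dim = hid_dim

    def forward(self, E: torch.Tensor, sym_pos: int) -> torch.Tensor:
        """E: (L, emb_dim) word embeddings; sym_pos: symbol index l*."""
        h = torch.zeros(1, self.hid_dim)
        for f in range(sym_pos + 1):                     # left-to-right up to l*
            h = self.fwd(E[f:f + 1], h)
        h_fwd = h
        h = torch.zeros(1, self.hid_dim)
        for b in range(E.size(0) - 1, sym_pos - 1, -1):  # right-to-left down to l*
            h = self.bwd(E[b:b + 1], h)
        return ((h_fwd + h) / 2).squeeze(0)              # Eq. (6): average at l*
```

Stacking the embeddings m of the K messages of day t then yields the matrix M_t used next.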
In practice, the layer takes a five-rank tensor as input for a mini-batch, and yields all M_t in the batch with shared parameters.

Tweet quality varies drastically. Inspired by the news-level attention of Hu et al. (2018), we weight messages by their respective salience in measuring collective intelligence. Specifically, we first project M_t non-linearly to u_t, the normalized attention weights over the corpus,

    u_t = ζ(w_u^T tanh(W_{m,u} M_t))    (7)

where ζ(·) is the softmax function and W_{m,u} ∈ R^{d_m × d_m}, w_u ∈ R^{d_m × 1} are model parameters. We then compose the messages accordingly to acquire the corpus embedding,

    c_t = M_t u_t^T    (8)

5.2 Variational Movement Decoder

VMD infers the latent driven factor Z with X and y, and decodes the stock movements y from X and Z. Neural approximation aims at minimizing the Kullback-Leibler divergence between the approximate posterior q_φ(Z|X, y) and the true posterior p_θ(Z|X, y). Instead of optimizing this directly, we observe that the following equation naturally holds,

    log p_θ(y|X) = D_KL[q_φ(Z|X, y) ‖ p_θ(Z|X, y)]
                 + E_{q_φ(Z|X,y)}[log p_θ(y|X, Z)]
                 - D_KL[q_φ(Z|X, y) ‖ p_θ(Z|X)]    (10)

where D_KL[q ‖ p] is the Kullback-Leibler divergence between the distributions q and p. Therefore, we equivalently maximize the following variational recurrent lower bound, obtained by plugging Eqs. (2) and (9) into Eq. (10),

    L(θ, φ; X, y) = Σ_{t=1}^{T} ( E_{q_φ(z_t|z_{<t}, x_{≤t}, y_{≤t})}[log p_θ(y_t|x_{≤t}, z_{≤t})]
                 - D_KL[q_φ(z_t|z_{<t}, x_{≤t}, y_{≤t}) ‖ p_θ(z_t|z_{<t}, x_{≤t})] )    (11)
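To make Eq. (11) concrete, here is a sketch of a single summand: the reconstruction term minus the KL divergence between two diagonal Gaussians (the variational posterior q and the prior p). The variable names, the log-variance parameterization and the optional annealing weight are our assumptions, not the released implementation.

```python
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL[N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p)))]."""
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0
    )

def lower_bound_term(log_p_y, mu_q, logvar_q, mu_p, logvar_p, kl_weight=1.0):
    """One summand of Eq. (11): E_q[log p(y_t|.)] - KL(q || p), with the KL
    optionally scaled by an annealing weight such as the lambda used in training."""
    return log_p_y - kl_weight * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

The closed-form Gaussian KL avoids sampling for the regularization term; only the reconstruction term requires a reparameterized sample of z_t.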
We parameterize the approximate posterior q_φ(z_t|z_{<t}, x_{≤t}, y_{≤t}) as a diagonal Gaussian N(µ_t, δ_t^2 I), computing its mean and standard deviation, and the shared hidden representation h^z_t, as

    µ_t = W^φ_{z,µ} h^z_t + b^φ_µ
    log δ_t^2 = W^φ_{z,δ} h^z_t + b^φ_δ
    h^z_t = tanh(W^φ_z [z_{t-1}, x_t, h^s_t, y_t] + b^φ_z)    (16)

where W^φ_{z,µ}, W^φ_{z,δ}, W^φ_z are weight matrices and b^φ_µ, b^φ_δ, b^φ_z are biases. Since the Gaussian distribution belongs to the "location-scale" distribution family, we can further reparameterize z_t as

    z_t = µ_t + δ_t ⊙ ε    (17)

where ⊙ denotes the element-wise product. The noise term ε ~ N(0, I) naturally injects stochastic signals into our model. Similarly, we let the prior p_θ(z_t|z_{<t}, x_{≤t}) be a diagonal Gaussian whose parameters are computed analogously, but without access to the targets y.
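The reparameterization in Eq. (17) is mechanical to implement; below is a minimal sketch, assuming the network predicts log δ_t^2 (the log-variance form is our assumption):

```python
import torch

def sample_z(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Eq. (17): z_t = mu_t + delta_t * eps with eps ~ N(0, I), so gradients
    flow through mu_t and delta_t while z_t remains stochastic."""
    delta = torch.exp(0.5 * logvar)  # standard deviation delta_t
    eps = torch.randn_like(mu)       # noise term eps ~ N(0, I)
    return mu + delta * eps
```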
5.3 Attentive Temporal Auxiliary

ATA integrates the temporal objectives into the final training objective through an attention mechanism.

[Figure 3: The temporal attention in our model, combining an information score and a dependency score over g_1, ..., g_T to produce the training objective for ỹ_T. Squares are the non-linear projections of g_t.]

In building the temporal objectives, we adopt the KL term annealing trick (Bowman et al., 2016; Semeniuta et al., 2017) and add a linearly-increasing KL term weight λ ∈ (0, 1] to gradually release the KL regularization effect during training. We then reuse the temporal attention weights v* to build the final temporal weight vector v ∈ R^{1×T},

    v = [αv*, 1]    (28)

where the final 1 is for the main prediction and the auxiliary weight α ∈ [0, 1] controls the overall effect of the auxiliary objectives on model training. α is tuned on the development set and its effects are discussed at length in Section 6.5. Finally, we write the training objective F by recomposition,

    F(θ, φ; X, y) = (1/N) Σ_{n=1}^{N} v^{(n)} f^{(n)}    (29)

where f^{(n)} collects the temporal objectives of the n-th sample and the model learns to generalize through the selective attendance of the temporal auxiliary. We take the derivative of F with respect to all model parameters {θ, φ} through backpropagation for the update.

6 Experiments

In this section, we detail our experimental setup and results.

6.1 Training Setup

We use a 5-day lag window for sample construction and 32 shuffled samples per batch.[9] The maximal number of tokens in a message and the maximal number of messages on a trading day are empirically set to 30 and 40, respectively, with the excess clipped. Since all tweets in the batched samples are fed into the model simultaneously, we set the word embedding size to 50 instead of a larger size to control memory costs and make model training feasible on a single GPU (11GB memory). We set the hidden size of the Message Embedding Layer to 100 and that of VMD to 150. All weight matrices in the model are initialized with the fan-in trick, and biases are initialized to zero. We train the model with the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.001. Following Bowman et al. (2016), we use an input dropout rate of 0.3 to regularize the latent variables. TensorFlow (Abadi et al., 2016) is used to construct the computational graph of StockNet, and hyper-parameters are tuned on the development set.

6.2 Evaluation Metrics

Following previous work on stock prediction (Xie et al., 2013; Ding et al., 2015), we adopt the standard measure of accuracy and the Matthews Correlation Coefficient (MCC) as evaluation metrics. MCC avoids bias due to data skew. Given the confusion matrix

    ( tp  fn )
    ( fp  tn )

containing the numbers of samples classified as true positive, false negative, false positive and true negative, MCC is calculated as

    MCC = (tp × tn - fp × fn) / √((tp + fp)(tp + fn)(tn + fp)(tn + fn))    (30)
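A direct transcription of Eq. (30) is given below; returning 0 when a factor of the denominator vanishes is a common convention and our choice, as the paper does not specify this edge case.

```python
from math import sqrt

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient, per Eq. (30)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(tp=50, fp=0, tn=50, fn=0))  # 1.0 for a perfect classifier
```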
6.3 Baselines and Proposed Models

We construct the following five baselines of different genres:[10]

• RAND: a naive predictor making random guesses of up or down.
• ARIMA: Autoregressive Integrated Moving Average, an advanced technical analysis method using only price signals (Brown, 2004).
• RANDFOREST: a discriminative Random Forest classifier using Word2vec text representations (Pagolu et al., 2016).
• TSLDA: a generative topic model jointly learning topics and sentiments (Nguyen and Shirai, 2015).
• HAN: a state-of-the-art discriminative deep neural network with hierarchical attention (Hu et al., 2018).

To analyze all the primary components of StockNet in detail, in addition to HEDGEFUNDANALYST, the fully-equipped StockNet, we also construct the following four variations:

• TECHNICALANALYST: the generative StockNet using only historical prices.
• FUNDAMENTALANALYST: the generative StockNet using only tweet information.
• INDEPENDENTANALYST: the generative StockNet without temporal auxiliary targets.
• DISCRIMINATIVEANALYST: the discriminative StockNet directly optimizing the likelihood objective. Following Zhang et al. (2016), we set z_t = µ'_t to remove the effects of the KL term.

[9] Typically the lag size is set between 3 and 10. As introduced in Section 4, trading days are treated as basic units in StockNet, and 3 calendar days are thus too short to guarantee the existence of more than one trading day in a lag, e.g. for predicting the movement of a Monday. We also experimented with lag sizes of 7 and 10, but they did not yield better results than 5.
[10] We do not treat event-driven models as comparable methods since our model uses no event pre-extraction tool.
    Baseline models                      Acc.   MCC
    RAND                                 50.89  -0.002266
    ARIMA (Brown, 2004)                  51.39  -0.020588
    RANDFOREST (Pagolu et al., 2016)     53.08   0.012929
    TSLDA (Nguyen and Shirai, 2015)      54.07   0.065382
    HAN (Hu et al., 2018)                57.64   0.051800

    StockNet variations                  Acc.   MCC
    TECHNICALANALYST                     54.96   0.016456
    FUNDAMENTALANALYST                   58.23   0.071704
    INDEPENDENTANALYST                   57.54   0.036610
    DISCRIMINATIVEANALYST                56.15   0.056493
    HEDGEFUNDANALYST                     58.23   0.080796

Table 1: Performance of baselines and StockNet variations in accuracy and MCC.

[Figure 4: (a) Performance of HEDGEFUNDANALYST with varied α, see Eq. (28). (b) Performance of DISCRIMINATIVEANALYST with varied α. Both panels plot accuracy and MCC against α ∈ {0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0}.]

6.4 Results

Since stock prediction is a challenging task and a minor improvement usually leads to large potential profits, an accuracy of 56% is generally reported as a satisfying result for binary stock movement prediction (Nguyen and Shirai, 2015). We show the performance of the baselines and our proposed models in Table 1. TSLDA is the best baseline in MCC, while HAN is the best baseline in accuracy. Our model, HEDGEFUNDANALYST, achieves the best performance of 58.23 in accuracy and 0.080796 in MCC, outperforming TSLDA and HAN by 4.16 and 0.59 in accuracy, and by 0.015414 and 0.028996 in MCC, respectively.

Though slightly better than random guessing, classic technical analysis, e.g. ARIMA, does not yield satisfying results. While similar in using only historical prices, TECHNICALANALYST shows an obvious advantage in this task compared to ARIMA. We believe there are two major reasons: (1) TECHNICALANALYST learns from training data and incorporates more flexible non-linearity; (2) our test set contains a large number of stocks, while ARIMA is more sensitive to peculiar sequence stationarity. It is worth noting that FUNDAMENTALANALYST gains exceptionally competitive results, with an MCC only 0.009092 lower than that of HEDGEFUNDANALYST. The performances of FUNDAMENTALANALYST and TECHNICALANALYST confirm the positive effects of tweets and historical prices, respectively, on stock movement prediction. As an effective ensemble of the two sources of market information, HEDGEFUNDANALYST gains even better performance. Compared with DISCRIMINATIVEANALYST, the performance improvements of HEDGEFUNDANALYST do not come from enlarging the networks, demonstrating that explicitly modeling the underlying market status with latent driven factors indeed benefits stock movement prediction. The comparison with INDEPENDENTANALYST also shows the effectiveness of capturing temporal dependencies between predictions with the temporal auxiliary. However, the effects of the temporal auxiliary are more complex and are analyzed further in the next section.

6.5 Effects of Temporal Auxiliary

We provide a detailed discussion of how the temporal auxiliary affects model performance. As introduced in Eq. (28), the temporal auxiliary weight α controls the overall effect of the objective-level temporal auxiliary on our model. Figure 4 presents how the performance of HEDGEFUNDANALYST and DISCRIMINATIVEANALYST fluctuates with α.
As shown in Figure 4, enhanced by the temporal auxiliary, HEDGEFUNDANALYST approaches its best performance at α = 0.5, while DISCRIMINATIVEANALYST achieves its maximum at α = 0.7. In fact, the objective-level auxiliary can be regarded as a denoising regularizer: for a sample with a specific movement as the main target, the market sources in the lag can be heterogeneous, e.g. affected by bad news, tweets on earlier days are negative but turn positive due to timely crisis management. Without temporal auxiliary tasks, the model tries to identify positive signals on earlier days solely for a main target of a rise movement, which is likely to result in pure noise. In such cases, the temporal auxiliary tasks help to filter market sources in the lag according to their respective aligned auxiliary movements. Besides, from the perspective of training variational models, the temporal auxiliary helps HEDGEFUNDANALYST encode more useful information into the latent driven factor Z, which is consistent with recent research on VAEs (Semeniuta et al., 2017). Compared with HEDGEFUNDANALYST, whose KL term performs dynamic regularization, DISCRIMINATIVEANALYST requires stronger regularization effects, i.e. a bigger α, to achieve its best performance.

Since y* is also involved in generating y_T through the temporal attention, tweaking α acts as a trade-off between focusing on the main target and generalizing by denoising. Therefore, as shown in Figure 4, our models do not benefit linearly from incorporating the temporal auxiliary. In fact, the two models follow a similar pattern of performance change: the curves first drop as α increases, except for the MCC curve of DISCRIMINATIVEANALYST, which rises temporarily at 0.3. After that, the curves ascend abruptly to their maxima, then keep descending until α = 1. Though the initial phase of increasing α can even hurt performance, once auxiliary effects are properly introduced, the two models ultimately achieve better results than those with no auxiliary effects at all, e.g. INDEPENDENTANALYST.

7 Conclusion

We demonstrated the effectiveness of deep generative approaches for stock movement prediction from social media data by introducing StockNet, a neural network architecture for this task. We tested our model on a new comprehensive dataset and showed it performs better than strong baselines, including implementations of previous work. Our comprehensive dataset is publicly available at https://github.com/yumoxu/stocknet-dataset.

Acknowledgments

The authors would like to thank the three anonymous reviewers and Miles Osborne for their helpful comments. This research was supported by a grant from Bloomberg and by the H2020 project SUMMA, under grant agreement 688139.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.

Johan Bollen, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2(1):1–8.

Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Berlin, Germany, pages 10–21.

Robert Goodell Brown. 2004. Smoothing, Forecasting and Prediction of Discrete Time Series. Courier Corporation.

Rich Caruana. 1998. Multitask learning. In Learning to Learn, Springer, pages 95–133.

Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2014. Using structured events to predict stock price movement: An empirical investigation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, pages 1415–1425.

Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction. In Proceedings of the 24th International Conference on Artificial Intelligence. Buenos Aires, Argentina, pages 2327–2333.

Robert D. Edwards, W.H.C. Bassetti, and John Magee. 2007. Technical Analysis of Stock Trends. CRC Press.

Jeffrey A. Frankel. 1995. Financial Markets and Monetary Policy. MIT Press.

Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, Los Angeles, California, USA, pages 261–269.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980.

Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning. JMLR.org, Beijing, China, pages 1188–1196.

Piji Li, Wai Lam, Lidong Bing, and Zihao Wang. 2017. Deep recurrent generative decoder for abstractive text summarization. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, pages 2081–2090.

Ronny Luss and Alexandre d'Aspremont. 2015. Predicting abnormal returns from news using text classification. Quantitative Finance 15(6):999–1012.

Burton Gordon Malkiel. 1999. A Random Walk Down Wall Street: Including a Life-Cycle Guide to Personal Investing. W.W. Norton & Company.

Thien Hai Nguyen and Kiyoaki Shirai. 2015. Topic modeling based sentiment analysis on social media for stock market prediction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China, volume 1, pages 1354–1364.

Nuno Oliveira, Paulo Cortez, and Nelson Areal. 2013. Some experiments on modeling stock market behavior using investor sentiment analysis and posting volume from Twitter. In Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics. ACM, Madrid, Spain, page 31.

Venkata Sasank Pagolu, Kamal Nayan Reddy, Ganapati Panda, and Babita Majhi. 2016. Sentiment analysis of Twitter data for predicting stock market movements. In Proceedings of the 2016 International Conference on Signal Processing, Communication, Power and Embedded System. IEEE, Rajaseetapuram, India, pages 1345–1350.

Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Alexander Dür, Linda Andersson, and Allan Hanbury. 2017. Volatility prediction using financial disclosures sentiments with word embedding-based IR models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada, volume 1, pages 1712–1721.

Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning. Beijing, China, pages 1278–1286.

Robert P. Schumaker and Hsinchun Chen. 2009. Textual analysis of stock market prediction using breaking financial news: The AZFinText system. ACM Transactions on Information Systems 27(2):12.

Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. 2017. A hybrid convolutional variational autoencoder for text generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, pages 627–637.

Jianfeng Si, Arjun Mukherjee, Bing Liu, Qing Li, Huayi Li, and Xiaotie Deng. 2013. Exploiting topic based Twitter sentiment for stock prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Sofia, Bulgaria, volume 2, pages 24–29.

Boyi Xie, Rebecca J. Passonneau, Leon Wu, and Germán G. Creamer. 2013. Semantic frames to predict stock price movement. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria, volume 1, pages 873–883.

Biao Zhang, Deyi Xiong, Hong Duan, Min Zhang, et al. 2016. Variational neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas, USA, pages 521–530.