The p-value is zero to within machine precision, indicating that the relationship is highly statistically significant, so the index should act as a good predictor of stock market returns and as a signal for my model. To determine this, we can look at a t-test between the returns and the Google Trends index.

Data: tot_df$AAPL.Adjusted and tot_df$index
Alternative hypothesis: true difference in means is not equal to 0

But the important question is whether it is a good predictor of stock market returns. There appears to be some time-dependence, with spikes roughly twice per year. We can see that there is a clear time-dependent trend in the data, with some seasonality. To get a feel for what we are trying to predict, we can plot the adjusted stock price of Apple as a function of time.

The time series model I will use is an autoregressive integrated moving average (ARIMA) model. This model takes \(x\) days of time series data and uses it to forecast a given number of days ahead. I will quantify these inputs and use them as signals to increase the performance of my model. I test the hypothesis on Apple stock, since it is a very popular and frequently searched term on Google and news stories are plentiful. I will use Google Trends, since it reports the number of searches for a term relative to the total search volume on Google, and I will text-mine Google News headlines for sentiment; these are my "signals" to indicate buy-sell positions.

I don't think any of the aforementioned techniques would be useful, since the data points seem to always be big numbers, but I am open to ideas.

Here I hypothesize that trends in stock market returns can be predicted by relative changes in how often terms related to the stock index are searched on search engines, and by sentiment derived from news headlines related to publicly traded companies.

There simply is too little data for this term, and even after multiple days of scraping the results are single spikes that usually have a high value (80-100) and tend to overlap.
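The t-test between the price series and the Google Trends index mentioned above can be sketched in a few lines. This is only an illustration: it assumes an unpaired Welch test (the variant the original analysis used is not stated), the function name `welch_t` is hypothetical, and the numbers are made up rather than taken from `tot_df`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Made-up stand-ins for an adjusted price column and a trend index column.
prices = [1, 2, 3, 4, 5]
index = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(prices, index)
print(t_stat, dof)
```

A test of this kind only asks whether the two series have different means. Since a share price and a 0-100 index sit on different scales, a tiny p-value is to be expected, and a check of predictive power would more usually compare returns with lagged changes in the index.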
Very weak terms that cannot be salvaged.

Would a better approach here be to take the average or the median? Also, should the samples that produce 0 be considered when calculating those metrics? In theory, if a certain data point has appeared as 0 many times, does that mean the data for that week is scarcer than for other weeks where it is 0 less frequently? I can't prove it, but I feel that any non-zero data should be included in the combined dataframe rather than lost to the median and average metrics (the median would be 0 and the average would be 1). Perhaps it will be interesting to experiment with average/median values and see how they perform (see next point).

As we can see, despite looking funny with many data points at 50 and 100, the product is quite decent, and some of the resulting data sets perform better than a random sample. In fact, this worsens the quality of the data for the purposes for which it is needed. Here there is no point in doing this, since each data set is robust and has a value for each day.

Popular terms where the data is consistent.

The left graph is simply a plot of all the data sets, and the right shows the results of the rudimentary combination I did. I have identified three different categories of search terms, as detailed below. I have done a quick test with the most rudimentary way of combining the data sets possible: just picking the maximum value for each week from the available data sets. We want to avoid 0s; anything is better than a 0.
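The per-week maximum described above, and the average/median alternatives asked about, can be compared with a minimal sketch like the following. The function names and the scrape values are hypothetical, and it assumes every scrape covers the same weeks in the same order.

```python
from statistics import mean, median

def combine_weekly(samples, how="max"):
    """Combine several scrapes of the same term into one weekly series.

    samples: list of equal-length lists, one list per daily scrape,
             each holding the 0-100 Trends value for every week.
    how:     "max", "mean" or "median".
    """
    reducers = {"max": max, "mean": mean, "median": median}
    reducer = reducers[how]
    # zip(*samples) groups the values that belong to the same week.
    return [reducer(week) for week in zip(*samples)]

def zero_share(samples):
    """Fraction of scrapes in which each week came back as 0."""
    return [sum(1 for v in week if v == 0) / len(week) for week in zip(*samples)]

# Hypothetical scrapes of a sparse term (made-up numbers).
scrapes = [
    [0, 50, 0, 100],
    [0, 0, 0, 100],
    [100, 0, 0, 0],
]

print(combine_weekly(scrapes, "max"))
print(combine_weekly(scrapes, "median"))
print(zero_share(scrapes))
```

With these made-up scrapes the median drops the single non-zero value in the first week while the maximum keeps it, which is the concern raised above. The share of zeros per week is one rough way to look at whether some weeks are scarcer than others.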
This is not an issue for Western countries, but I am trying to conduct research based on search frequency in developing countries, and here the lack of data makes some of the search samples very small and the data very scarce (sometimes as few as 0-2 data points for a period of 18 months).

As other ways of acquiring Trends data from Google have failed, I have resorted to scraping data daily to investigate whether it can be combined in a way that would create the most representative sample possible (the closest to the raw data we don't have). As you might know, Google Trends works by normalising a random sample of the search term data, with the sample changing at least once per day, in my experience.
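To illustrate why very sparse terms tend to come back as isolated spikes at values such as 50 and 100, here is a simplified sketch of the normalisation step. It assumes that values are rescaled so the peak of the sampled period equals 100, and it ignores the division by total search volume; the function name and the counts are hypothetical.

```python
def normalise_to_100(counts):
    """Rescale raw sample counts so that the largest value becomes 100."""
    peak = max(counts)
    if peak == 0:
        # Nothing was sampled in the whole period.
        return [0] * len(counts)
    return [round(100 * c / peak) for c in counts]

# Hypothetical weekly counts from one random sample of a rare term.
sample = [0, 1, 0, 2, 0, 1]
print(normalise_to_100(sample))  # [0, 50, 0, 100, 0, 50]
```

Under this assumption, a sample that contains only a handful of searches can only produce a few distinct values, and a different random sample on another day can move the spikes to other weeks.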