class: center, middle, inverse, title-slide

# Time series models

### Dr. Olanrewaju Michael Akande

### Nov 14, 2019

---

## Announcements

- Final lab tomorrow.

- Survey on slots for final project presentations over the next few days.

- Guidelines for final project presentations next week.

## Outline

- Introduction

- Stationarity

- AR models

- MA models

- ARMA, ARIMA??

---

## Introduction

- When data are ordered in time, responses and errors from one period may influence responses and errors from another period.

--

- For example, it is reasonable to expect the unemployment rate in a month to be correlated with the unemployment rate in previous month(s).

--

- Another example: weather events in the current time period may depend on weather events in previous time periods.

--

- These are called .hlight[time series] data.

--

- Correlation due to time is called serial correlation or autocorrelation.

--

- We will only scratch the surface in this course.

---

## Goals of time series analysis

- .hlight[Forecasting outcomes]

--

  + Given a series of outcomes ordered in time, predict the values of the outcomes in the future.

--

  + Examples:
      - forecasting the future price of oil given historical oil prices.
      - predicting the future price of a particular stock given past prices of the same stock.

--

  + When forecasting, it is important to also report an interval estimate to incorporate uncertainty about future values.

---

## Goals of time series analysis

- .hlight[Forecasting outcomes]

--

  + Forecasting outcomes using predictors may involve building a model for the predictors as well, since we can't observe them in the future.

--

  + For example, predicting the inflation rate given the employment rate requires estimating future values of the employment rate as well.

--

- .hlight[Learning relationships with data ordered in time].

--

  + How are outcomes correlated over time? Are there periodic relationships in outcomes?

--

  + Regressions of outcomes on predictors, accounting for correlated errors due to time series.

---

## Motivating example: FTSE 100

- The FTSE (Financial Times Stock Exchange) 100 Index is a share index of the 100 companies listed on the London Stock Exchange with the highest market capitalization.

--

- A share index is essentially a form of weighted average of prices of selected stocks.

--

- To motivate our discussions on time series, let's look at data for FTSE 100 returns in 2018.

```r
ftse100 <- read.csv("data/ftse2018.csv", header = T)
head(ftse100)
```

```
##         Date    Open    High     Low   Close
## 1  11/7/2018 7040.68 7136.75 7040.68 7117.28
## 2  11/6/2018 7103.84 7117.50 7027.45 7040.68
## 3  11/5/2018 7094.12 7140.37 7077.40 7103.84
## 4  11/2/2018 7114.66 7196.39 7094.12 7094.12
## 5  11/1/2018 7128.10 7165.61 7085.74 7114.66
## 6 10/31/2018 7035.85 7161.54 7035.85 7128.10
```

- Can we forecast closing prices for the next five days from 11/7/2018?

---

## Motivating example: FTSE 100

Notice that the data go from latest to earliest date, so let's invert the order of the rows to make the time series increasing in date.
--

```r
ftse100 <- ftse100[nrow(ftse100):1,]
dim(ftse100)
```

```
## [1] 211   5
```

```r
head(ftse100)
```

```
##          Date    Open    High     Low   Close
## 211 1/10/2018 7731.02 7756.11 7716.21 7748.51
## 210 1/11/2018 7748.51 7768.96 7734.64 7762.94
## 209 1/12/2018 7762.94 7792.56 7752.63 7778.64
## 208 1/15/2018 7778.64 7783.61 7763.43 7769.14
## 207 1/16/2018 7769.14 7791.83 7740.55 7755.93
## 206 1/17/2018 7755.93 7755.93 7711.11 7725.43
```

---

## Motivating example: FTSE 100

Plot the closing prices to see what a simple time series looks like.

```r
tsClose <- ts(ftse100$Close); ts.plot(tsClose,col="red3")
```

<img src="22-time-series-I_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

--

- It is reasonable to expect closing prices for a particular day to be correlated with closing prices for previous days.

--

- How many of the previous days? We will have to investigate!

---

## Motivating example: Sunspots and melanoma

- We will revisit that data, but let's look at a different example, where we also have a predictor.

--

- Incidence of melanoma (skin cancer) may be related to solar radiation.

--

- Annual data from the Connecticut tumor registry on age-adjusted melanoma incidence rates (per 100,000 people). Treat these rates as measured without error.

--

- We also have annual data on relative sunspot activity (sunspots are dark spots on the sun caused by intense magnetic activity).

--

- Data go from 1936 to 1972.

---

## Motivating example: Sunspots and melanoma

```r
cancersun <- read.csv("data/melanoma.csv", header = T)
names(cancersun) = c("year", "melanoma", "sunspot")
str(cancersun)
```

```
## 'data.frame':	37 obs. of  3 variables:
##  $ year    : int  1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 ...
##  $ melanoma: num  1 0.9 0.8 1.4 1.2 1 1.5 1.9 1.5 1.5 ...
##  $ sunspot : num  40 115 100 80 60 40 23 10 10 25 ...
```

```r
head(cancersun)
```

```
##   year melanoma sunspot
## 1 1936      1.0      40
## 2 1937      0.9     115
## 3 1938      0.8     100
## 4 1939      1.4      80
## 5 1940      1.2      60
## 6 1941      1.0      40
```

---

## Motivating example: Sunspots and melanoma

```r
ggplot(cancersun, aes(x=sunspot, y=melanoma)) +
  geom_point(alpha = .5,colour="blue4") +
  geom_smooth(method="lm",col="red3") +
  labs(title="Melanoma Incidence Rate vs Sunspots")
```

<img src="22-time-series-I_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

Weak positive (maybe!) relationship between them.

---

## Motivating example: Sunspots and melanoma

Let's look at the melanoma incidence rate over time.

```r
tsMelanoma <- ts(cancersun$melanoma); ts.plot(tsMelanoma,col="blue4")
```

<img src="22-time-series-I_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

There is a trend in time, some of which we might be able to explain using `sunspot`.

---

## Motivating example: Sunspots and melanoma

Let's fit a linear model to the relationship between the two variables.

```r
regmelanoma = lm(melanoma ~ sunspot, data = cancersun)

ggplot(cancersun, aes(x=sunspot, y=regmelanoma$residual)) +
  geom_point(alpha = .5,colour="blue4") +
  geom_smooth(method="lm",col="red3") +
  labs(title="Residuals vs Sunspots")
```

<img src="22-time-series-I_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

Residuals look fine here.
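---

## Motivating example: Sunspots and melanoma

The residuals look fine when plotted against the predictor, but these data are ordered in time, so it is also worth checking the residuals directly for serial correlation. A minimal sketch, using only the `regmelanoma` fit above and base R's `acf` (which is introduced more formally later in these slides):

```r
# Sketch: sample autocorrelations of the regression residuals.
# Large spikes at low lags would suggest serially correlated errors,
# which ordinary linear regression assumes away.
acf(resid(regmelanoma), main = "ACF of residuals: melanoma ~ sunspot")
```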
---

## Motivating example: Sunspots and melanoma

Let's plot the residuals versus year.

```r
ggplot(cancersun, aes(x=year, y=regmelanoma$residual)) +
  geom_point(alpha = .5,colour="blue4") +
  geom_smooth(method="lm",col="red3") +
  labs(title="Residuals vs Year")
```

<img src="22-time-series-I_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

Huge trend! What to do???

---

## Stationarity

- The most common time series models usually assume what is known as .hlight[stationarity].

--

- .hlight[Stationarity] of a time series process means that the marginal distribution of any part of the series does not depend on time.

--

- Basically, the locations in time themselves do not matter in the marginal and joint distributions; however, the differences in locations, that is, the lags, do matter!

--

- Put a different way, a stationary time series is one whose properties do not depend on the particular time at which the series is observed.

--

- Examples of non-stationary series:
  + Steadily increasing trend (like the melanoma example).
  + Known seasonal trends, like the increase in sales before Christmas.
  + Break in trend due to some external event.

---

## Stationarity

- Denote the time series for the outcome by `\(y_t\)`.

--

- Stationarity `\(\Rightarrow\)`

  + `\(\mathbb{Pr}(y_t)\)` is the same for all `\(t\)`,

--

  + `\(\mathbb{Pr}(y_t,y_{t+1})\)` is the same for all `\(t\)`, and so on...

--

- .hlight[Weak stationarity] requires only that the moments, that is, the means, variances and covariances, are the same across time.

--

- Stationarity `\(\Rightarrow\)` weak stationarity, but the converse need not hold.

--

- For a normal distribution, the mean and variance completely characterize the distribution, so that stationarity and weak stationarity will be equivalent.

--

- Why does that matter? When dealing with linear models, what distribution do we assume?...

---

## Popular stationary models

- We will mainly focus on two types of stationary time series models:

--

  + Autoregressive models (AR models)
      - The value of the outcome at time `\(t\)` is correlated with values at previous times.

--

  + Moving average models (MA models)
      - The value of the outcome at time `\(t\)` is correlated with the values of prediction errors at previous times.

--

- Side note: autoregressive moving average models (ARMA models)
  + Combination of AR and MA.

--

- There are many more types of time series models (see STA 642/942).

---

## Autocorrelation

- .hlight[Autocorrelation] (serial correlation) measures the strength of the linear relationship between `\(y_t\)` and its lagged values.

--

- The lag `\(k\)` autocorrelation `\(\rho_k\)` measures the correlation between outcomes at time `\(t\)` and at time `\(t-k\)`, where `\(k\)` indicates how far back to go; `\(k\)` is called a lag.

--

- Recall that for two random variables `\(X\)` and `\(Y\)`, we have
.block[
.small[
`$$\mathbb{Corr}[X,Y] = \dfrac{\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]}{\sqrt{\mathbb{V}[X]\mathbb{V}[Y]}}.$$`
]
]

--

- The sample autocorrelation `\(r_k\)` can be calculated using
.block[
.small[
`$$r_k = \dfrac{\sum_{t=k+1}^T(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^T(y_t - \bar{y})^2}.$$`
]
]

--

- .question[Why do we only have an estimate of the variance of y at time t in the denominator?] Think stationarity...
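---

## Autocorrelation

As a quick illustration of the formula for `\(r_k\)`, here is a minimal sketch that computes the lag 1 sample autocorrelation of the FTSE closing prices directly and compares it to what `acf` reports (it assumes the `tsClose` object created earlier is still in the workspace):

```r
# Sketch: lag 1 sample autocorrelation of the FTSE closing prices,
# computed from the formula on this slide, then via acf()
y <- as.numeric(tsClose)
ybar <- mean(y); n <- length(y); k <- 1
r1_manual <- sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) /
  sum((y - ybar)^2)
r1_acf <- acf(y, plot = FALSE)$acf[k + 1]  # first element is lag 0
c(manual = r1_manual, acf = r1_acf)
```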
---

## Partial autocorrelation

- The autocorrelation `\(\rho_k\)` (and the sample version `\(r_k\)`) between `\(y_t\)` and `\(y_{t-k}\)` will also include the linear relationships between `\(y_t\)` and each of `\(y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}\)`.

--

- As you will see, we will need to be able to assess the correlation between `\(y_t\)` and `\(y_{t-k}\)` without interference from the other lags.

--

- .hlight[Partial autocorrelation] lets us do just that. It is the autocorrelation between `\(y_t\)` and `\(y_{t-k}\)`, with all the linear relationships between `\(y_t\)` and each of `\(y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}\)` removed.

--

- In `R`, use `acf` to compute and plot autocorrelations and `pacf` to compute and plot partial autocorrelations.

---

## AR models

- The most common time series model is called the .hlight[autoregressive (AR)] model.

--

- When only one lag matters, the zero-mean AR(1) model is
.block[
.small[
`$$y_t = \phi y_{t-1} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- With a non-zero mean, we have
.block[
.small[
`$$y_t = \mu + \phi y_{t-1} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- When the mean is non-zero, we can choose to de-mean (mean-center) the series and model that instead.

--

- In both cases, for the AR(1) we basically have a linear regression where the value of the outcome at time `\(t\)` depends on the value of the outcome at time `\(t-1\)`.

--

- `\(\phi\)` is the lag 1 autocorrelation.

---

## AR models

- For the zero-mean AR(1) model,

--

  + `\(|\phi|<1\)` represents a stationary time series.

--

  + `\(\phi=1\)` is a random walk.

--

  + `\(|\phi|>1\)` implies non-stationary, "explosive" models.

--

- A stationary AR(1) series varies around its mean, randomly wandering away from the mean in response to the "input" values of the random `\(\epsilon_t\)` series, but always returning to near the mean, and never "exploding" away for more than a short time.

--

- AR(1) series with `\(0<\phi<1\)` represent short-term, positive correlations that would damp out exponentially if `\(\epsilon_t\)` were zero.

--

- Negative values of `\(\phi\)` represent short-term, negative correlations.

---

## AR models

- Let's explore what AR(1) models look like via simulations.

--

- Move to the R script [here](https://ids-702-f19.github.io/Course-Website/slides/lec-slides/TS_simulations.R).

--

- Note that

--

  + autocorrelations decay steadily with lags.

--

  + partial autocorrelations go to zero after lag `\(p\)`.

---

## AR models

- For a zero-mean AR(p) model, we have
.block[
.small[
`$$y_t = \sum_{k=1}^{p} \phi_k y_{t-k} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- So that for a non-zero mean AR(p) model, we have
.block[
.small[
`$$y_t = \mu + \sum_{k=1}^{p} \phi_k y_{t-k} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- AR(p) models are capable of adequately representing a wide range of observed behaviors in time series for large enough `\(p\)`.

---

## AR models: how many lags?

- There are several ways to decide how many lags to include.

--

- Use graphical techniques
  + Look at partial autocorrelation plots.

--

  + Set `\(p\)` at the lag where the partial autocorrelations become small enough not to be important.

--

- Use a model selection criterion like BIC.

--

- See section 8.6 of the assigned readings.

--

- Sometimes in time series data, the partial autocorrelations are small even at lag 1.

--

- In this case, it can be reasonable to skip autoregressive models and just use the usual linear regression modeling approaches.
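---

## AR models: a simulated example

To see these patterns outside the course R script, here is a minimal sketch that simulates a stationary AR(2) series with `arima.sim` (the coefficients are arbitrary choices for illustration, not estimates from any data) and plots its ACF and PACF. The partial autocorrelations dropping to near zero after lag 2 is exactly the pattern we use to choose `\(p\)`.

```r
# Sketch: simulate a stationary AR(2) series and inspect its ACF/PACF
set.seed(42)
y_ar2 <- arima.sim(model = list(ar = c(0.6, 0.25)), n = 500)
par(mfrow = c(1, 3))
ts.plot(y_ar2, main = "Simulated AR(2)")
acf(y_ar2, main = "ACF: decays steadily")
pacf(y_ar2, main = "PACF: near zero after lag 2")
par(mfrow = c(1, 1))
```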
---

## What if the series is not stationary?

- Sometimes transformations can make stationarity a reasonable assumption.

--

- Differencing (subtracting lagged values from the outcome at time `\(t\)`) also often helps; changes over time are more likely to be stationary than the raw values.

--

- Including predictors can also help, as we will see later with the melanoma example.

--

- There are other models for non-stationary time series.

---

## AR(p): including predictors

- We also might want to account for serial correlation in regression modeling.

--

- Linear regression assumes independent errors across observations.

--

- As we have already seen with the melanoma example, this may not be reasonable with time series data.

--

- With a single predictor `\(x_t\)`, we have
.block[
.small[
`$$y_t = \mu + \sum_{k=1}^{p} \phi_k y_{t-k} + x_t + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- That is, the value of the outcome at time `\(t\)` depends on the values of the outcome at times `\(t-1, t-2, \ldots, t-p\)`, but also on the predictor `\(x\)` at time `\(t\)`.

--

- It is easy to extend the model to multiple predictors.

---

## Model assumptions: stationarity

- Coefficients and the regression variance do not change with time.

--

  + Apart from changes in explanatory variables, the behavior of the time series is the same at different segments of time.

--

  + Generally, no predictable patterns in the long term.

--

- Diagnostics: check if patterns in residuals are similar across time.

--

- Tests:
  + Ljung-Box
  + Augmented Dickey-Fuller (ADF)
  + Kwiatkowski-Phillips-Schmidt-Shin (KPSS)

--

- Remedies:
  + Sometimes transformations (e.g., using logs) can make stationarity more reasonable.
  + Use time series models that allow drift.

---

## Model assumptions: others

- Other assumptions
  1. Linearity
  2. Independence of errors
  3. Equal variance
  4. Normality

--

- Diagnose using the same methods we used for linear regression.

--

- Remedies include transformations and model changes as we had before.

---

## MA models

- The zero-mean MA(1) model is
.block[
.small[
`$$y_t = \phi \epsilon_{t-1} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- With a non-zero mean, we have
.block[
.small[
`$$y_t = \mu + \phi \epsilon_{t-1} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- The value of the outcome at time `\(t\)` depends on the value of the deviation from the mean (the error term) at time `\(t-1\)`.

--

- For a zero-mean MA(p) model, we have
.block[
.small[
`$$y_t = \sum_{k=1}^{p} \phi_k \epsilon_{t-k} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

--

- So that for a non-zero mean MA(p) model, we have
.block[
.small[
`$$y_t = \mu + \sum_{k=1}^{p} \phi_k \epsilon_{t-k} + \epsilon_t; \ \ \epsilon_t \sim N(0, \sigma^2).$$`
]
]

---

## MA models

- Let's explore what MA(1) models look like via simulations. Move back to the same R script.

--

- Note that

--

  + Autocorrelations die off almost immediately after lag 1.

--

  + In an MA(p) model, autocorrelations (mostly!) die off after lag `\(p\)`. This may not be exact, since the sample autocorrelation measures correlation between the actual outcomes at different time points.

--

  + Partial autocorrelations are not particularly useful.

<!-- - Contrast to AR(p) models -->
<!--   + Autocorrelations tend to decrease over time smoothly. -->
<!--   + Partial autocorrelations die off after lag `\(p\)`. -->

--

- It is possible to write any stationary AR(p) model as an `\(\textrm{MA}(\infty)\)` model. The reverse result holds under some constraints on the MA parameters. See the reading material.
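---

## MA models: a simulated example

For contrast with the AR(2) sketch earlier, here is a minimal sketch of a simulated MA(1) series (again with an arbitrary coefficient chosen only for illustration):

```r
# Sketch: simulate an MA(1) series; its ACF should be (roughly) zero
# after lag 1, unlike the AR case where it decays gradually
set.seed(42)
y_ma1 <- arima.sim(model = list(ma = 0.7), n = 500)
par(mfrow = c(1, 2))
acf(y_ma1, main = "ACF: cuts off after lag 1")
pacf(y_ma1, main = "PACF: decays/oscillates")
par(mfrow = c(1, 1))
```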
---

## Deciding models?

- Use autocorrelations and partial autocorrelations to help decide on a model.

--

- Steady decay in the autocorrelations often implies AR.

--

- Non-zero autocorrelations up to lag `\(p\)` and zero after lag `\(p\)` often imply MA.

--

- Sometimes we use both AR and MA error structures, called an .hlight[ARMA] model.

--

- Whenever we take differences of the `\(y\)` values to ensure stationarity before fitting an ARMA model, we have an .hlight[ARIMA] model.
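---

## Back to the FTSE 100

Returning to the motivating question, here is one possible sketch of a five-day-ahead forecast of the FTSE closing prices, with interval estimates. The order `(1, 1, 0)` (an AR(1) fit to the differenced series) is only an assumption for illustration; in practice, choose it using the ACF/PACF plots, a criterion like BIC, and residual diagnostics.

```r
# Sketch: fit a simple ARIMA model to the closing prices (tsClose from
# earlier) and forecast the next five trading days with 95% intervals
fit <- arima(tsClose, order = c(1, 1, 0))
fc <- predict(fit, n.ahead = 5)
cbind(forecast = fc$pred,
      lower95  = fc$pred - 1.96 * fc$se,
      upper95  = fc$pred + 1.96 * fc$se)
```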