What’s a time series anyway ?
“Is it recorded or produced with temporal information ?”
frequently encountered in day to day life
frequently encountered in research
Data fit for classic time series analysis in base R
The Time Variable/Axis
The Response Variable should be continuous or pseudo continuous
Learn some basic R tools for exploring series
Understand and quantify Serial correlation in time
Preparation for further analysis
Explaining Observations - Multiple regression models are helpful, but they are not really designed to handle real time-series data
For Real time series - Sample Summary Statistics (e.g.. Sample Mean and Variance ) are potentially misleading by themselves.
Let’s load some sample data and examine its structure
blowfly <- read.csv("blowfly.csv", header = T) blowfly
Time for a first look at Nicholson’s flies
plot(blowfly$total)
We can see some patterns already…
and using a time series plotting method:
plot.ts(blowfly$total)
Multiple Time Series
plot.ts(blowfly[1:3])
ts
classplot.ts
plots the object as if it were of the special class of R objects known simply as ts
the ts()
function is used to convert objects to the ts class
flies <- ts(blowfly$total) eggs<-ts(blowfly$eggs) nonemerging<- ts(blowfly$nonemerging) emerging<- ts(blowfly$emerging) class(flies)
## [1] "ts"
typeof(flies)
## [1] "integer"
setting the start
, end
, and frequency
parameters and functions
start(flies) ; end(flies) ; frequency(flies)
## [1] 1 1
## [1] 361 1
## [1] 1
Add a start year with start
or st
flies<-ts(blowfly$total, start=1954) plot(flies)
print(flies,calendar=T)
## 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 ## 948 942 911 858 801 676 504 397 248 146 1801 6235 ## 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 ## 5974 8921 6610 5973 5673 3875 2361 1352 1226 912 521 363 ## 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ## 229 142 82 542 939 2431 3687 4543 4535 5441 4412 3022 ## 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 ## 2656 1967 1295 915 551 313 167 95 93 60 68 5259 ## 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 ## 6673 5441 3987 2952 3648 4222 3889 2295 1509 928 739 566 ## 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 ## 383 274 192 226 519 1224 2236 3818 6208 5996 5789 6652 ## 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 ## 7939 4868 3952 2712 1734 1224 703 508 366 279 243 343 ## 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 ## 761 1025 1221 1600 2267 3290 3471 3637 3703 4876 5364 4890 ## 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 ## 3029 1950 1225 1076 905 772 628 473 539 825 1702 2868 ## 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 ## 4473 5221 6592 6400 4752 3521 2719 1931 1500 1082 849 774 ## 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 ## 864 1308 1624 2224 2423 2959 3547 7237 5218 5311 4273 3270 ## 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 ## 2281 1549 1091 796 610 445 894 1454 2262 2363 3847 3876 ## 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 ## 3935 3479 3415 3861 3571 3113 2319 1630 1297 861 761 659 ## 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 ## 701 762 1188 1778 2428 3806 4519 5646 4851 5374 4713 7367 ## 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 ## 7236 5245 3636 2417 1258 766 479 402 248 254 604 1346 ## 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 ## 2342 3328 3599 4081 7643 7919 6098 6896 5634 5134 4188 3469 ## 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 ## 2442 1931 1790 1722 1488 1416 1369 1666 2627 3840 4044 4929 ## 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 ## 5111 3152 4462 4082 3026 1589 2075 1829 1388 1149 968 1170 ## 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 ## 1465 1676 3075 3815 4639 4424 2784 5860 5781 4897 3920 3835 ## 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 ## 3618 3050 3772 3517 3350 3018 2625 2412 2221 2619 3203 2706 ## 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 ## 2717 2175 1628 2388 3677 3156 4272 3771 4955 5584 3891 3501 ## 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 ## 4436 4369 3394 3869 2922 1843 2837 4690 5119 5839 5389 4993 ## 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 ## 4446 4851 4243 4620 4849 3664 3016 2881 3821 4300 4168 5446 ## 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 ## 5477 8579 7533 6884 4127 5546 6313 6650 6304 4842 4352 3215 ## 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 ## 2652 2330 3123 3955 4494 4780 5753 5555 5712 4786 4066 2891 ## 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 ## 3270 4404 4398 4112 4401 5779 6597 8091 11282 12446 13712 11017 ## 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 ## 14683 7258 6195 5962 4213 2775 1781 936 898 1160 3158 3386 ## 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 ## 4547 4823 4970 4940 5793 7836 4457 6901 8191 6766 5165 2919 ## 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 ## 3415 3431 3162 2525 2290 1955 1936 2384 4666 7219 8306 8027 ## 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 ## 7010 8149 8949 6105 5324 5766 6214 7007 8154 9049 6883 8103 ## 2314 ## 6803
Set time units and frequency with start
and frequency
the frequency will be the number of columns
example: set a quarterly frequency
flies<-ts(blowfly$total, start=c(1954,1),frequency = 4) plot(flies)
print(flies,calendar=T)
## Qtr1 Qtr2 Qtr3 Qtr4 ## 1954 948 942 911 858 ## 1955 801 676 504 397 ## 1956 248 146 1801 6235 ## 1957 5974 8921 6610 5973 ## 1958 5673 3875 2361 1352 ## 1959 1226 912 521 363 ## 1960 229 142 82 542 ## 1961 939 2431 3687 4543 ## 1962 4535 5441 4412 3022 ## 1963 2656 1967 1295 915 ## 1964 551 313 167 95 ## 1965 93 60 68 5259 ## 1966 6673 5441 3987 2952 ## 1967 3648 4222 3889 2295 ## 1968 1509 928 739 566 ## 1969 383 274 192 226 ## 1970 519 1224 2236 3818 ## 1971 6208 5996 5789 6652 ## 1972 7939 4868 3952 2712 ## 1973 1734 1224 703 508 ## 1974 366 279 243 343 ## 1975 761 1025 1221 1600 ## 1976 2267 3290 3471 3637 ## 1977 3703 4876 5364 4890 ## 1978 3029 1950 1225 1076 ## 1979 905 772 628 473 ## 1980 539 825 1702 2868 ## 1981 4473 5221 6592 6400 ## 1982 4752 3521 2719 1931 ## 1983 1500 1082 849 774 ## 1984 864 1308 1624 2224 ## 1985 2423 2959 3547 7237 ## 1986 5218 5311 4273 3270 ## 1987 2281 1549 1091 796 ## 1988 610 445 894 1454 ## 1989 2262 2363 3847 3876 ## 1990 3935 3479 3415 3861 ## 1991 3571 3113 2319 1630 ## 1992 1297 861 761 659 ## 1993 701 762 1188 1778 ## 1994 2428 3806 4519 5646 ## 1995 4851 5374 4713 7367 ## 1996 7236 5245 3636 2417 ## 1997 1258 766 479 402 ## 1998 248 254 604 1346 ## 1999 2342 3328 3599 4081 ## 2000 7643 7919 6098 6896 ## 2001 5634 5134 4188 3469 ## 2002 2442 1931 1790 1722 ## 2003 1488 1416 1369 1666 ## 2004 2627 3840 4044 4929 ## 2005 5111 3152 4462 4082 ## 2006 3026 1589 2075 1829 ## 2007 1388 1149 968 1170 ## 2008 1465 1676 3075 3815 ## 2009 4639 4424 2784 5860 ## 2010 5781 4897 3920 3835 ## 2011 3618 3050 3772 3517 ## 2012 3350 3018 2625 2412 ## 2013 2221 2619 3203 2706 ## 2014 2717 2175 1628 2388 ## 2015 3677 3156 4272 3771 ## 2016 4955 5584 3891 3501 ## 2017 4436 4369 3394 3869 ## 2018 2922 1843 2837 4690 ## 2019 5119 5839 5389 4993 ## 2020 4446 4851 4243 4620 ## 2021 4849 3664 3016 2881 ## 2022 3821 4300 4168 5446 ## 2023 5477 8579 7533 6884 ## 2024 4127 5546 6313 6650 ## 2025 6304 4842 4352 3215 ## 2026 2652 2330 3123 3955 ## 2027 4494 4780 5753 5555 ## 2028 5712 4786 4066 2891 ## 2029 3270 4404 4398 4112 ## 2030 4401 5779 6597 8091 ## 2031 11282 12446 13712 11017 ## 2032 14683 7258 6195 5962 ## 2033 4213 2775 1781 936 ## 2034 898 1160 3158 3386 ## 2035 4547 4823 4970 4940 ## 2036 5793 7836 4457 6901 ## 2037 8191 6766 5165 2919 ## 2038 3415 3431 3162 2525 ## 2039 2290 1955 1936 2384 ## 2040 4666 7219 8306 8027 ## 2041 7010 8149 8949 6105 ## 2042 5324 5766 6214 7007 ## 2043 8154 9049 6883 8103 ## 2044 6803
set a monthly frequency
flies<-ts(blowfly$total, start=c(1954,1), frequency = 12) print(flies,calendar=T)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov ## 1954 948 942 911 858 801 676 504 397 248 146 1801 ## 1955 5974 8921 6610 5973 5673 3875 2361 1352 1226 912 521 ## 1956 229 142 82 542 939 2431 3687 4543 4535 5441 4412 ## 1957 2656 1967 1295 915 551 313 167 95 93 60 68 ## 1958 6673 5441 3987 2952 3648 4222 3889 2295 1509 928 739 ## 1959 383 274 192 226 519 1224 2236 3818 6208 5996 5789 ## 1960 7939 4868 3952 2712 1734 1224 703 508 366 279 243 ## 1961 761 1025 1221 1600 2267 3290 3471 3637 3703 4876 5364 ## 1962 3029 1950 1225 1076 905 772 628 473 539 825 1702 ## 1963 4473 5221 6592 6400 4752 3521 2719 1931 1500 1082 849 ## 1964 864 1308 1624 2224 2423 2959 3547 7237 5218 5311 4273 ## 1965 2281 1549 1091 796 610 445 894 1454 2262 2363 3847 ## 1966 3935 3479 3415 3861 3571 3113 2319 1630 1297 861 761 ## 1967 701 762 1188 1778 2428 3806 4519 5646 4851 5374 4713 ## 1968 7236 5245 3636 2417 1258 766 479 402 248 254 604 ## 1969 2342 3328 3599 4081 7643 7919 6098 6896 5634 5134 4188 ## 1970 2442 1931 1790 1722 1488 1416 1369 1666 2627 3840 4044 ## 1971 5111 3152 4462 4082 3026 1589 2075 1829 1388 1149 968 ## 1972 1465 1676 3075 3815 4639 4424 2784 5860 5781 4897 3920 ## 1973 3618 3050 3772 3517 3350 3018 2625 2412 2221 2619 3203 ## 1974 2717 2175 1628 2388 3677 3156 4272 3771 4955 5584 3891 ## 1975 4436 4369 3394 3869 2922 1843 2837 4690 5119 5839 5389 ## 1976 4446 4851 4243 4620 4849 3664 3016 2881 3821 4300 4168 ## 1977 5477 8579 7533 6884 4127 5546 6313 6650 6304 4842 4352 ## 1978 2652 2330 3123 3955 4494 4780 5753 5555 5712 4786 4066 ## 1979 3270 4404 4398 4112 4401 5779 6597 8091 11282 12446 13712 ## 1980 14683 7258 6195 5962 4213 2775 1781 936 898 1160 3158 ## 1981 4547 4823 4970 4940 5793 7836 4457 6901 8191 6766 5165 ## 1982 3415 3431 3162 2525 2290 1955 1936 2384 4666 7219 8306 ## 1983 7010 8149 8949 6105 5324 5766 6214 7007 8154 9049 6883 ## 1984 6803 ## Dec ## 1954 6235 ## 1955 363 ## 1956 3022 ## 1957 5259 ## 1958 566 ## 1959 6652 ## 1960 343 ## 1961 4890 ## 1962 2868 ## 1963 774 ## 1964 3270 ## 1965 3876 ## 1966 659 ## 1967 7367 ## 1968 1346 ## 1969 3469 ## 1970 4929 ## 1971 1170 ## 1972 3835 ## 1973 2706 ## 1974 3501 ## 1975 4993 ## 1976 5446 ## 1977 3215 ## 1978 2891 ## 1979 11017 ## 1980 3386 ## 1981 2919 ## 1982 8027 ## 1983 8103 ## 1984
plot(flies)
explore different persepctives according to frequency with aggregate
and boxplot
par(mfrow=c(2,2)) plot(flies) ag.flies<-aggregate(flies) ; plot(ag.flies) cyc.flies<- cycle(flies) ; boxplot(flies ~ cyc.flies) par(mfrow=c(1,1))
cycle
was used to pull up the monthly interval as a categorical variable
Draw 2 series that have the same time axis together, with cbind
eggs2<-ts(blowfly$eggs, start=c(1954,1), frequency = 12) plot(cbind(flies , eggs2 ))
set a frequency of every 2-3 days
flies<-ts(blowfly$total, start=(c(1954,1)) , freq = 150) print(flies,calendar=T)
## p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 ## 1954 948 942 911 858 801 676 504 397 248 146 1801 ## 1955 2319 1630 1297 861 761 659 701 762 1188 1778 2428 ## 1956 3270 4404 4398 4112 4401 5779 6597 8091 11282 12446 13712 ## p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 p22 ## 1954 6235 5974 8921 6610 5973 5673 3875 2361 1352 1226 912 ## 1955 3806 4519 5646 4851 5374 4713 7367 7236 5245 3636 2417 ## 1956 11017 14683 7258 6195 5962 4213 2775 1781 936 898 1160 ## p23 p24 p25 p26 p27 p28 p29 p30 p31 p32 p33 ## 1954 521 363 229 142 82 542 939 2431 3687 4543 4535 ## 1955 1258 766 479 402 248 254 604 1346 2342 3328 3599 ## 1956 3158 3386 4547 4823 4970 4940 5793 7836 4457 6901 8191 ## p34 p35 p36 p37 p38 p39 p40 p41 p42 p43 p44 ## 1954 5441 4412 3022 2656 1967 1295 915 551 313 167 95 ## 1955 4081 7643 7919 6098 6896 5634 5134 4188 3469 2442 1931 ## 1956 6766 5165 2919 3415 3431 3162 2525 2290 1955 1936 2384 ## p45 p46 p47 p48 p49 p50 p51 p52 p53 p54 p55 ## 1954 93 60 68 5259 6673 5441 3987 2952 3648 4222 3889 ## 1955 1790 1722 1488 1416 1369 1666 2627 3840 4044 4929 5111 ## 1956 4666 7219 8306 8027 7010 8149 8949 6105 5324 5766 6214 ## p56 p57 p58 p59 p60 p61 p62 p63 p64 p65 p66 ## 1954 2295 1509 928 739 566 383 274 192 226 519 1224 ## 1955 3152 4462 4082 3026 1589 2075 1829 1388 1149 968 1170 ## 1956 7007 8154 9049 6883 8103 6803 ## p67 p68 p69 p70 p71 p72 p73 p74 p75 p76 p77 ## 1954 2236 3818 6208 5996 5789 6652 7939 4868 3952 2712 1734 ## 1955 1465 1676 3075 3815 4639 4424 2784 5860 5781 4897 3920 ## 1956 ## p78 p79 p80 p81 p82 p83 p84 p85 p86 p87 p88 ## 1954 1224 703 508 366 279 243 343 761 1025 1221 1600 ## 1955 3835 3618 3050 3772 3517 3350 3018 2625 2412 2221 2619 ## 1956 ## p89 p90 p91 p92 p93 p94 p95 p96 p97 p98 p99 ## 1954 2267 3290 3471 3637 3703 4876 5364 4890 3029 1950 1225 ## 1955 3203 2706 2717 2175 1628 2388 3677 3156 4272 3771 4955 ## 1956 ## p100 p101 p102 p103 p104 p105 p106 p107 p108 p109 p110 ## 1954 1076 905 772 628 473 539 825 1702 2868 4473 5221 ## 1955 5584 3891 3501 4436 4369 3394 3869 2922 1843 2837 4690 ## 1956 ## p111 p112 p113 p114 p115 p116 p117 p118 p119 p120 p121 ## 1954 6592 6400 4752 3521 2719 1931 1500 1082 849 774 864 ## 1955 5119 5839 5389 4993 4446 4851 4243 4620 4849 3664 3016 ## 1956 ## p122 p123 p124 p125 p126 p127 p128 p129 p130 p131 p132 ## 1954 1308 1624 2224 2423 2959 3547 7237 5218 5311 4273 3270 ## 1955 2881 3821 4300 4168 5446 5477 8579 7533 6884 4127 5546 ## 1956 ## p133 p134 p135 p136 p137 p138 p139 p140 p141 p142 p143 ## 1954 2281 1549 1091 796 610 445 894 1454 2262 2363 3847 ## 1955 6313 6650 6304 4842 4352 3215 2652 2330 3123 3955 4494 ## 1956 ## p144 p145 p146 p147 p148 p149 p150 ## 1954 3876 3935 3479 3415 3861 3571 3113 ## 1955 4780 5753 5555 5712 4786 4066 2891 ## 1956
plot(flies)
par(mfrow=c(2,1)) blowfly$eggs %>% ts(frequency= 1) %>% plot blowfly$eggs %>% ts(frequency= 52) %>% plot
par(mfrow=c(1,1))
the ‘ts’ class allows R to understand
many base/stats functions are actually designed for the ‘ts’ class
but using them on a non- ts object may or may not result in an error
Warning: the ts
class is designed only for equidistant series. other packages have other classes and functions which extend capabilities
We can see some clearer cyclical behaviour of a section if we zoom in and explore a bit with window
:
flies<-ts(blowfly$total) plot(window(flies, 1, 100, 1))
more zooming with deltat
:
par(mfrow=c(3,1)) plot(window(flies, 1, 100, deltat = 5)) plot(window(flies, 1, 100, deltat = 10)) plot(window(flies, 1, 100, deltat = 20))
par(mfrow=c(1,1))
plot(window(flies, 1, 100, deltat = 20))
We finally lose the cyclical behaviour at ~deltat=20
. let’s remember this…
The flies series is unusually clear, and still - its cyclical behaviour is only one possible pattern.
other series may represent processes that behave in a way that is not immediately recognizable, and have many things going at once.
examples
We naturally want tools that will describe the entire series’ and parts of the series’ attributes, be executable fast in R, and guide us in further steps (such as model fitting and forecasting).
for that, we need to understand how a given series evolves through time - via autocorrelation and related attributes of time series
The Lag Plot
Auto-correlation
Partial Auto-correlation
Cross-Correlation
how does my data behave in relation to itself ? or ’how does this week’s population relate to last week’s population ?
plot(flies[1:350], flies[2:351])
Clearly, there’s a pattern at lag =1, at least in the smaller population sizes
now let’s look at lags 2 and 3:
par(mfrow=c(1,2)) plot(flies[1:350], flies[3:352]) plot(flies[1:350], flies[4:353]) par(mfrow=c(1,1))
we start losing the pattern gradually, as we go to higher lags…
more lags, lag = 1 to lag =8, using lag.plot()
lag.plot(flies, lag = 8, layout = c(2, 4), diag = T)
however, at lags 9-16:
lag.plot(flies, lags = 8, set.lags = 9:16, layout = c(2, 4), diag = T)
summary
acf()
acf(flies)
plot(flies)
This summarizes and charts the autocorrelation structure in the series, up to a default lag size, which is 25 in this case.
The vertical axis is the magnitude of the Sample Autocorrelation Coefficients. they measure the relative correlation between observations at different distances apart (lags)
and the horizontal axis is simply the lag size.
let’s have a look at these Sample Autocorrelation Coefficients
a <- acf(flies, plot = F)
a$acf %>% head(10) # for the actual factors
## [1] 1.00000000 0.88266919 0.74218182 0.57299028 0.39789881 ## [6] 0.22191632 0.06209902 -0.06770406 -0.16982318 -0.21847342
there’s no better evidence of cycles, the flies exhibit strong positive AND negative correlation at regular intervals throughout the series, with a cycle period of 19 weeks.
let’s look at the entire series
acf(flies, lag.max = 360, col = "red")
We can see
acf(flies, lag.max = 360, col = "red")
warning: there is bias here with increasing lag size. why ?
there’s some kind of a diminishing but long term “memory” in the system, up to a lag of 100 weeks.
clear cycles with a period of 19 weeks
The main point :there’s a LOT of non-randomness in the sequence
This last point is especially important for linear and other modelling, which assume that samples are independent. we may want to stop and think before going on to perform a conventional lm()
! , and at the very least be aware of series’ behaviour.
and this illustrates why we need to explore time series internally before we go on to the other, external parameters.
The Sample Auto-covariance Coefficients
acf(flies, type = "covariance")
acf(flies, type = "covariance", plot = F)$acf %>% head(10)
## [1] 5908371.3 5215137.3 4385085.8 3385439.3 2350933.9 1311164.0 ## [7] 366904.1 -400020.7 -1003378.4 -1290822.1
Correlograms produced using the sample autocorrelation coefficients do not account for the fact that for a given lag size there may be correlation between internal points, (e.g.. values that are 4 points apart were correlated, but so do values that are 2 points apart).
We sometimes want to control for the internal correlations inside the lag, or in other words, to check what would have been the correlation coefficients, had all the internal lags’ coefficients were forced to zero.
This helps us later on, when selecting a model
the Partial Autocorrelation is the relationship between this week’s population and the population at lag n when we have already controlled for the correlations between all of the successive weeks between this week and week n
We only need to use pacf
or acf(type="partial")
instead of the default
acc <- a$acf %>% head(12) %>% round(2) partial.acc <- pacf(flies, plot = F) %>% .$acf %>% head(12) %>% round(2) acc
## [1] 1.00 0.88 0.74 0.57 0.40 0.22 0.06 -0.07 -0.17 -0.22 -0.23 ## [12] -0.20
partial.acc
## [1] 0.88 -0.17 -0.21 -0.12 -0.12 -0.06 -0.01 -0.03 0.09 0.05 0.04 ## [12] 0.16
plot(1:12, acc, xlab = "Lag", ylab = "Coefficient") points(1:12, partial.acc, col = "blue") lines(1:12, rep(0, 12)) legend("topright", c("ac coefficients", "Partial ac coefficients"), col = c("black", "green"), pch = c(1, 1))
pacf(flies)
what can we take from this ?
the maximal negative and maximal positive lags are important places in the cycle.
clues for possible underlying processes in lags 3 and 12
the PACF can be thought of as analogous to a derivative of ACF - thus finding points driving change in pitch or direction.
Summary till now, using just Correlograms for the entire series
Lags 2-4, 12-16 are suspected: we should look for biological mechanisms there, when we construct an explanatory model.
Note: these are clues generated only from the time series itself, before considering any other data or knowledge in biology
The total number of flies is not the only one we have…
we could always scatter-plot the data frame variables against each other
plot(blowfly, lower.panel = NULL)
we get a strong correlation between total, deaths, and emerging. but there’s an inherent problem in this analysis…
flies <- ts(blowfly$total) ccf(nonemerging, flies)
ccf(emerging, flies)
ccf(eggs, flies)
Reminder: Lags 2-4 (negative), 12-16 (positive) are suspected from pacf()
of the total of “flies”
from ccf()
example, eggs vs. total-
it can be difficult to construct an explanation from this alone, but at least we now have clues of lead times and delay times between every two variables.
This is a first step in time series stats, towards modelling.
findings: cycle length = still 19
one possible explanation: larval processes appear to drive the cycles when they reach a maximum (density dependent competition ?)
an alternative , complementary approach to analysing fluctuations is to analyse how the variance evolves through time, by distributing it to frequencies.
This method can detect important processes that may not be found with acf()
to to too much noise in certain lags
spectrum
horizontal axis - frequencies instead of time vertical - variance
spectrum(flies, main = "")
in this analysis we look for frequency peaks. in this case:
spec<-spectrum(flies, main = "")
spec$spec %>% which.max() -> maxloc # where's the maximal value ? max.freq <- (spec$freq)[maxloc] # what's the maximal value ? max.freq
## [1] 0.05333333
now let’s plot and find the exact value in time units
plot(spec, main = "") abline(v = max.freq, col = "red", lwd = 2)
1 / max.freq
## [1] 18.75
suggests again cycles of ~19 years.
the information may or may not be easier to interpret than a correlogram.
it’s best to use both when exploring.
No definitive pattern in lags beyond 0 or a dominant frequency
typical of a Pure White Noise process, which is Normal errors around a steady mean.
No definitive pattern in lags, but high correlation for all lags
typical of a Random Walk process, where values are correlated to (mostly) only the latest value (the values in the previous step), plus a small white noise
Super Important
Is there a real trend here ? the time plot and our brain say there is. but the underlying process is probably not a real trend or combination of trends, but a kind of random ‘drift’.
the next value is more likely to be a minor deviation from the last value, with equal odds for moving up or down, than a result of a another external driver !
random walks are notorious in deceiving our brains. actually, the long term mean of the formula used to create this series, rw<-cumsum(rnorm(100,0,1))
is 0 !
repeat:
The same ACF even though the time plot is completely different
repeated negative followed by positive: return to equilibrium following random departures from it.
This is called an Autoregressive Process of order 1 or ‘AR(1)’ (the order can be determined from the pacf()
) in this case a negative AR(1) with rapid response.
ecological models of equilibrium population dynamics are typically Autoregressive.
Simplest Approach:
visually identify a breakpoint
apply the same steps on parts of the series to find their attributes and identify discontinuities
now let’s adapt the analysis for the two parts, separately
split the series:
first <- ts(flies[1:200]) second <- ts(flies[201:361])
what are the notable differences between the 2 parts ?
same overall shapes, same cycle length
Important Lags in the second parts are shifted 1 or 2, compared with the first - change points are somewhat closer to each other
what if we wanted to analyse the two with a commom baseline ?
lm()
let’s de-trend using linear regression, and have look at the results.
firstX <- 1:length(first) first.lm <- lm(first ~ firstX) first.detrended <- first - predict(first.lm) secondX <- 1:length(second) second.lm <- lm(second ~ secondX) second.detrended <- second - predict(second.lm)
par(mfrow = c(2, 2)) plot(first) ; lines(first.lm$fitted.values, col = "red") plot(second) ; lines(second.lm$fitted.values, col = "blue") plot.ts(first.detrended) # reminder Q: why plot.ts ? plot.ts(second.detrended)
par(mfrow = c(1, 1))
first.lm$coefficients
## (Intercept) firstX ## 2112.433116 4.882556
second.lm$coefficients
## (Intercept) secondX ## 2827.57461 21.94437
note: the pitch of of ‘second’ is 5 times that of first
now, let’s compare again
par(mfrow = c(2, 2)) acf(first.detrended) ; acf(second.detrended) pacf(first.detrended); pacf(second.detrended) par(mfrow = c(1, 1))
comparing the de-trended versions: acf()
- results similar, but pacf()
are quite different
conclusions:
diff()
par(mfrow = c(2, 1)) plot(flies) flies %>% diff() %>% plot()
par(mfrow = c(1, 1))
De-trending is the goal of many methods in time series analysis that are not presented here
+first order differencing will usual be enough coerce the series to become stationary, and will do it while still keeping the attributes of the signal. de-trending by a lm()
may create noise that was not there.
it has several uses in model selection, and the advantage that parameters do not need to be estimated (e.g. from a linear model)
let’s return to some noisier series
par(mfrow=c(2,2)) plot(w) plot(rwd) plot(sine.wn) plot(Y)
par(mfrow=c(1,1))
last time we used window()
par(mfrow = c(2, 2)) plot(window(w, 100, 200)) plot(window(w, 100, 150)) plot(window(w, 100, 125)) par(mfrow = c(1, 1))
this approach doesn’t work
we should probably start looking by smoothing first
filter()
filter
allows many methods for smoothing
this is the simplest technique we use the filter argument with a weights vector, giving equal weights
par(mfrow = c(3, 1)) v <- filter(w, sides = 2, filter = rep(1, 3) / 3) plot(w) lines(v, col = "red") rwd.s <- filter(rwd, sides = 2, filter = rep(1, 3) / 3) plot(rwd) lines(rwd.s, col = "red") sine.wn.s <- filter(sine.wn, sides = 2, filter = rep(1, 3) / 3) plot(sine.wn) lines(sine.wn.s, col = "red") Y.s <- filter(Y, sides = 2, filter = rep(1, 3) / 3) plot(Y) lines(Y.s, col = "red") par(mfrow = c(1, 1))
let’s try a 9 point average
v <- filter(w, sides = 2, filter = rep(1, 9) / 9) plot(w) lines(v, col = "red") rwd.s <- filter(rwd, sides = 2, filter = rep(1, 9) / 9) plot(rwd) lines(rwd.s, col = "red") sine.wn.s <- filter(sine.wn, sides = 2, filter = rep(1, 9) / 9) plot(sine.wn) lines(sine.wn.s, col = "red") Y.s <- filter(Y, sides = 2, filter = rep(1, 9) / 9) plot(Y) lines(Y.s, col = "red")
let’s try a 15 point average
v <- filter(w, sides = 2, filter = rep(1, 15) / 15) plot(w) lines(v, col = "red") rwd.s <- filter(rwd, sides = 2, filter = rep(1, 15) / 15) plot(rwd) lines(rwd.s, col = "red") sine.wn.s <- filter(sine.wn, sides = 2, filter = rep(1, 15) / 15) plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) Y.s <- filter(Y, sides = 2, filter = rep(1, 15) / 15) plot(Y) lines(Y.s, col = "red")
increase the averaging period to receive a smoother results and (maybe) reveal patterns in the noise
a moving average is a local smoother, meaning it allows a change of series behaviour through out the series.
3### smoothing via ksmooth()
t <- 1:length(sine.wn) plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) lines(ksmooth(t, sine.wn, "normal", bandwidth = 5), col = "blue", lwd = 2)
plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) lines(ksmooth(t, sine.wn, "normal", bandwidth = 10), col = "blue", lwd = 2)
plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) lines(ksmooth(t, sine.wn, "normal", bandwidth = 20), col = "blue", lwd = 2)
The wider the bandwidth, the smoother the result
smooth.spline()
t <- 1:length(sine.wn) # smoothing paraeter=0.1 plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) lines(smooth.spline(t, sine.wn, spar = 0.1), col = "blue", lwd = 2)
# smoothing paraeter=1 plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) lines(smooth.spline(t, sine.wn, spar = 1), col = "blue", lwd = 2)
# smoothing paraeter=0.7 plot(sine.wn) lines(sine.wn.s, col = "red", lwd = 3) lines(smooth.spline(t, sine.wn, spar = 0.7), col = "blue", lwd = 2)
t <- 1:length(flies) plot(flies) lines(ksmooth(t, flies, "normal", bandwidth = 5), col = "blue", lwd = 2)
# spline smoothing smoothing paraeter=0.4 plot(flies) lines(smooth.spline(t, flies, spar = 0.4), col = "blue", lwd = 2)
The description of a given time series is usually a cyclical process by itself
some methods that are covered here are also useful in other contexts (e,g. model fitting, control)
function | Purpose |
---|---|
ts | construct a time series object |
ts.plot | plot a vector or matrix as if they were of the special ‘ts’ class. |
window | have a look at sections of series |
lag.plot | create a scatter-plot matrix by lags |
acf , pacf , ccf | study serial correlation |
Diff | recode the series by differencing |
lm | construct a Linear Model, such a simple univariate regression to remove a trend |
filter, ksmooth, smooth.spline | applying linear and non linear filters. in the context of this lesson - for smoothing |
Process - a mathematical description of observations as a function of time (and perhaps other variables)
Stationary Process (intuitive definition, non formal) :
In other words, The properties of various sections of the data are constant
NOTE: No real process is strictly Stationary !
Term | Refers to |
---|---|
Continuous | Observations are made continuously through time (even if the variable of interest can only take certain) |
Deterministic | a time series that can be predicted exactly, by a well defined mathematical process |
Discrete | observations are taken at specific times |
Equidistant | A series where the variable is recorded at equal intervals (e.g every month) |
Lag | The time- distance between points in the series, also used for delay lengths in process/series descriptions |
Stationary | See above |
Stochastic | The future is only partly determined by the past/ exact predictions are impossible and are replaced by the idea that future values have a probability distribution |
Term | Refers to |
---|---|
Point Process | Binary Observations occur randomly at non equal interval |
Binary Process | Observations can take only “0” or “1”, Yes/No and so on |
Term | Refers to |
---|---|
ACF or ac.f | Autocorrelation Function |
pACF | Partial (corr. coefficient) ACF |
AR | Autoregressive |
MA | Moving Average |
CRAN Task View: Time Series Analysis https://cran.r-project.org/web/views/TimeSeries.html
Time Series Modelling basics, explained simply albeit with some theory mistakes https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modelling/
Basic Data Analysis for Time Series with R https://library.books24x7.com/toc.aspx?bookid=63461 (requires a library login + a books 24/7 account)
Looking for real ecological time series ?
BIOTIME - a Global database of biodiversity time series http://biotime.st-andrews.ac.uk/
NEON - a repository for LTERs
https://data.neonscience.org/browse-data?showAllDates=true&showAllSites=true&showTheme=org
Heavily based on some Textbooks and R manuals
There are many others…