Intro

What’s a time series anyway ?

“Is it recorded or produced with temporal information ?”

frequently encountered in day to day life

  • Marketing: Prices, Sales, economic indices
  • Central Beauro of Statistics: Demographic information
  • Physical: such as Weather history

frequently encountered in research

  • Observation Counts: Pop. & community ecology
  • Environmental
  • recording instrument output

Data fit for classic time series analysis in base R

  • The Time Variable/Axis

    • No Missing Values
    • Equidistant
  • The Response Variable should be continuous or pseudo continuous

Goals For today

  1. Learn some basic R tools for exploring series

  2. Understand and quantify Serial correlation in time

  3. Preparation for further analysis

Why bother ?

  • Variables in Time almost always behave differently than in space
  • Explaining Observations - Multiple regression models are helpful, but they are not really designed to handle real time-series data

  • For Real time series - Sample Summary Statistics (e.g.. Sample Mean and Variance ) are potentially misleading by themselves.

    • They do not have their usual properties
    • we can compute them (also for Time Series Statistics), but to describe populations we better remove systematic components from the samples first.

Objectives of Time Series Analysis

  • Description - what we are here for today
  • Explanation
  • Predictions, both with and without external variables
  • Control
  • Detection of anomalies

Let’s load some sample data and examine its structure

blowfly <- read.csv("blowfly.csv", header = T) 
blowfly

Time for a first look at Nicholson’s flies

plot(blowfly$total)

We can see some patterns already…

and using a time series plotting method:

plot.ts(blowfly$total)

Multiple Time Series

plot.ts(blowfly[1:3])

The ts class

plot.ts plots the object as if it were of the special class of R objects known simply as ts

the ts() function is used to convert objects to the ts class

flies <- ts(blowfly$total)
eggs<-ts(blowfly$eggs)

nonemerging<- ts(blowfly$nonemerging)
emerging<- ts(blowfly$emerging)


class(flies)
## [1] "ts"
typeof(flies)
## [1] "integer"

setting the start, end, and frequency parameters and functions

start(flies) ; end(flies) ; frequency(flies)
## [1] 1 1
## [1] 361   1
## [1] 1

Add a start year with start or st

flies<-ts(blowfly$total, start=1954)
plot(flies)

print(flies,calendar=T) 
##  1954  1955  1956  1957  1958  1959  1960  1961  1962  1963  1964  1965 
##   948   942   911   858   801   676   504   397   248   146  1801  6235 
##  1966  1967  1968  1969  1970  1971  1972  1973  1974  1975  1976  1977 
##  5974  8921  6610  5973  5673  3875  2361  1352  1226   912   521   363 
##  1978  1979  1980  1981  1982  1983  1984  1985  1986  1987  1988  1989 
##   229   142    82   542   939  2431  3687  4543  4535  5441  4412  3022 
##  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000  2001 
##  2656  1967  1295   915   551   313   167    95    93    60    68  5259 
##  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013 
##  6673  5441  3987  2952  3648  4222  3889  2295  1509   928   739   566 
##  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  2025 
##   383   274   192   226   519  1224  2236  3818  6208  5996  5789  6652 
##  2026  2027  2028  2029  2030  2031  2032  2033  2034  2035  2036  2037 
##  7939  4868  3952  2712  1734  1224   703   508   366   279   243   343 
##  2038  2039  2040  2041  2042  2043  2044  2045  2046  2047  2048  2049 
##   761  1025  1221  1600  2267  3290  3471  3637  3703  4876  5364  4890 
##  2050  2051  2052  2053  2054  2055  2056  2057  2058  2059  2060  2061 
##  3029  1950  1225  1076   905   772   628   473   539   825  1702  2868 
##  2062  2063  2064  2065  2066  2067  2068  2069  2070  2071  2072  2073 
##  4473  5221  6592  6400  4752  3521  2719  1931  1500  1082   849   774 
##  2074  2075  2076  2077  2078  2079  2080  2081  2082  2083  2084  2085 
##   864  1308  1624  2224  2423  2959  3547  7237  5218  5311  4273  3270 
##  2086  2087  2088  2089  2090  2091  2092  2093  2094  2095  2096  2097 
##  2281  1549  1091   796   610   445   894  1454  2262  2363  3847  3876 
##  2098  2099  2100  2101  2102  2103  2104  2105  2106  2107  2108  2109 
##  3935  3479  3415  3861  3571  3113  2319  1630  1297   861   761   659 
##  2110  2111  2112  2113  2114  2115  2116  2117  2118  2119  2120  2121 
##   701   762  1188  1778  2428  3806  4519  5646  4851  5374  4713  7367 
##  2122  2123  2124  2125  2126  2127  2128  2129  2130  2131  2132  2133 
##  7236  5245  3636  2417  1258   766   479   402   248   254   604  1346 
##  2134  2135  2136  2137  2138  2139  2140  2141  2142  2143  2144  2145 
##  2342  3328  3599  4081  7643  7919  6098  6896  5634  5134  4188  3469 
##  2146  2147  2148  2149  2150  2151  2152  2153  2154  2155  2156  2157 
##  2442  1931  1790  1722  1488  1416  1369  1666  2627  3840  4044  4929 
##  2158  2159  2160  2161  2162  2163  2164  2165  2166  2167  2168  2169 
##  5111  3152  4462  4082  3026  1589  2075  1829  1388  1149   968  1170 
##  2170  2171  2172  2173  2174  2175  2176  2177  2178  2179  2180  2181 
##  1465  1676  3075  3815  4639  4424  2784  5860  5781  4897  3920  3835 
##  2182  2183  2184  2185  2186  2187  2188  2189  2190  2191  2192  2193 
##  3618  3050  3772  3517  3350  3018  2625  2412  2221  2619  3203  2706 
##  2194  2195  2196  2197  2198  2199  2200  2201  2202  2203  2204  2205 
##  2717  2175  1628  2388  3677  3156  4272  3771  4955  5584  3891  3501 
##  2206  2207  2208  2209  2210  2211  2212  2213  2214  2215  2216  2217 
##  4436  4369  3394  3869  2922  1843  2837  4690  5119  5839  5389  4993 
##  2218  2219  2220  2221  2222  2223  2224  2225  2226  2227  2228  2229 
##  4446  4851  4243  4620  4849  3664  3016  2881  3821  4300  4168  5446 
##  2230  2231  2232  2233  2234  2235  2236  2237  2238  2239  2240  2241 
##  5477  8579  7533  6884  4127  5546  6313  6650  6304  4842  4352  3215 
##  2242  2243  2244  2245  2246  2247  2248  2249  2250  2251  2252  2253 
##  2652  2330  3123  3955  4494  4780  5753  5555  5712  4786  4066  2891 
##  2254  2255  2256  2257  2258  2259  2260  2261  2262  2263  2264  2265 
##  3270  4404  4398  4112  4401  5779  6597  8091 11282 12446 13712 11017 
##  2266  2267  2268  2269  2270  2271  2272  2273  2274  2275  2276  2277 
## 14683  7258  6195  5962  4213  2775  1781   936   898  1160  3158  3386 
##  2278  2279  2280  2281  2282  2283  2284  2285  2286  2287  2288  2289 
##  4547  4823  4970  4940  5793  7836  4457  6901  8191  6766  5165  2919 
##  2290  2291  2292  2293  2294  2295  2296  2297  2298  2299  2300  2301 
##  3415  3431  3162  2525  2290  1955  1936  2384  4666  7219  8306  8027 
##  2302  2303  2304  2305  2306  2307  2308  2309  2310  2311  2312  2313 
##  7010  8149  8949  6105  5324  5766  6214  7007  8154  9049  6883  8103 
##  2314 
##  6803

Set time units and frequency with start and frequency the frequency will be the number of columns

example: set a quarterly frequency

flies<-ts(blowfly$total, start=c(1954,1),frequency = 4)
plot(flies)

print(flies,calendar=T)
##       Qtr1  Qtr2  Qtr3  Qtr4
## 1954   948   942   911   858
## 1955   801   676   504   397
## 1956   248   146  1801  6235
## 1957  5974  8921  6610  5973
## 1958  5673  3875  2361  1352
## 1959  1226   912   521   363
## 1960   229   142    82   542
## 1961   939  2431  3687  4543
## 1962  4535  5441  4412  3022
## 1963  2656  1967  1295   915
## 1964   551   313   167    95
## 1965    93    60    68  5259
## 1966  6673  5441  3987  2952
## 1967  3648  4222  3889  2295
## 1968  1509   928   739   566
## 1969   383   274   192   226
## 1970   519  1224  2236  3818
## 1971  6208  5996  5789  6652
## 1972  7939  4868  3952  2712
## 1973  1734  1224   703   508
## 1974   366   279   243   343
## 1975   761  1025  1221  1600
## 1976  2267  3290  3471  3637
## 1977  3703  4876  5364  4890
## 1978  3029  1950  1225  1076
## 1979   905   772   628   473
## 1980   539   825  1702  2868
## 1981  4473  5221  6592  6400
## 1982  4752  3521  2719  1931
## 1983  1500  1082   849   774
## 1984   864  1308  1624  2224
## 1985  2423  2959  3547  7237
## 1986  5218  5311  4273  3270
## 1987  2281  1549  1091   796
## 1988   610   445   894  1454
## 1989  2262  2363  3847  3876
## 1990  3935  3479  3415  3861
## 1991  3571  3113  2319  1630
## 1992  1297   861   761   659
## 1993   701   762  1188  1778
## 1994  2428  3806  4519  5646
## 1995  4851  5374  4713  7367
## 1996  7236  5245  3636  2417
## 1997  1258   766   479   402
## 1998   248   254   604  1346
## 1999  2342  3328  3599  4081
## 2000  7643  7919  6098  6896
## 2001  5634  5134  4188  3469
## 2002  2442  1931  1790  1722
## 2003  1488  1416  1369  1666
## 2004  2627  3840  4044  4929
## 2005  5111  3152  4462  4082
## 2006  3026  1589  2075  1829
## 2007  1388  1149   968  1170
## 2008  1465  1676  3075  3815
## 2009  4639  4424  2784  5860
## 2010  5781  4897  3920  3835
## 2011  3618  3050  3772  3517
## 2012  3350  3018  2625  2412
## 2013  2221  2619  3203  2706
## 2014  2717  2175  1628  2388
## 2015  3677  3156  4272  3771
## 2016  4955  5584  3891  3501
## 2017  4436  4369  3394  3869
## 2018  2922  1843  2837  4690
## 2019  5119  5839  5389  4993
## 2020  4446  4851  4243  4620
## 2021  4849  3664  3016  2881
## 2022  3821  4300  4168  5446
## 2023  5477  8579  7533  6884
## 2024  4127  5546  6313  6650
## 2025  6304  4842  4352  3215
## 2026  2652  2330  3123  3955
## 2027  4494  4780  5753  5555
## 2028  5712  4786  4066  2891
## 2029  3270  4404  4398  4112
## 2030  4401  5779  6597  8091
## 2031 11282 12446 13712 11017
## 2032 14683  7258  6195  5962
## 2033  4213  2775  1781   936
## 2034   898  1160  3158  3386
## 2035  4547  4823  4970  4940
## 2036  5793  7836  4457  6901
## 2037  8191  6766  5165  2919
## 2038  3415  3431  3162  2525
## 2039  2290  1955  1936  2384
## 2040  4666  7219  8306  8027
## 2041  7010  8149  8949  6105
## 2042  5324  5766  6214  7007
## 2043  8154  9049  6883  8103
## 2044  6803

set a monthly frequency

flies<-ts(blowfly$total, start=c(1954,1), frequency = 12)
print(flies,calendar=T)
##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov
## 1954   948   942   911   858   801   676   504   397   248   146  1801
## 1955  5974  8921  6610  5973  5673  3875  2361  1352  1226   912   521
## 1956   229   142    82   542   939  2431  3687  4543  4535  5441  4412
## 1957  2656  1967  1295   915   551   313   167    95    93    60    68
## 1958  6673  5441  3987  2952  3648  4222  3889  2295  1509   928   739
## 1959   383   274   192   226   519  1224  2236  3818  6208  5996  5789
## 1960  7939  4868  3952  2712  1734  1224   703   508   366   279   243
## 1961   761  1025  1221  1600  2267  3290  3471  3637  3703  4876  5364
## 1962  3029  1950  1225  1076   905   772   628   473   539   825  1702
## 1963  4473  5221  6592  6400  4752  3521  2719  1931  1500  1082   849
## 1964   864  1308  1624  2224  2423  2959  3547  7237  5218  5311  4273
## 1965  2281  1549  1091   796   610   445   894  1454  2262  2363  3847
## 1966  3935  3479  3415  3861  3571  3113  2319  1630  1297   861   761
## 1967   701   762  1188  1778  2428  3806  4519  5646  4851  5374  4713
## 1968  7236  5245  3636  2417  1258   766   479   402   248   254   604
## 1969  2342  3328  3599  4081  7643  7919  6098  6896  5634  5134  4188
## 1970  2442  1931  1790  1722  1488  1416  1369  1666  2627  3840  4044
## 1971  5111  3152  4462  4082  3026  1589  2075  1829  1388  1149   968
## 1972  1465  1676  3075  3815  4639  4424  2784  5860  5781  4897  3920
## 1973  3618  3050  3772  3517  3350  3018  2625  2412  2221  2619  3203
## 1974  2717  2175  1628  2388  3677  3156  4272  3771  4955  5584  3891
## 1975  4436  4369  3394  3869  2922  1843  2837  4690  5119  5839  5389
## 1976  4446  4851  4243  4620  4849  3664  3016  2881  3821  4300  4168
## 1977  5477  8579  7533  6884  4127  5546  6313  6650  6304  4842  4352
## 1978  2652  2330  3123  3955  4494  4780  5753  5555  5712  4786  4066
## 1979  3270  4404  4398  4112  4401  5779  6597  8091 11282 12446 13712
## 1980 14683  7258  6195  5962  4213  2775  1781   936   898  1160  3158
## 1981  4547  4823  4970  4940  5793  7836  4457  6901  8191  6766  5165
## 1982  3415  3431  3162  2525  2290  1955  1936  2384  4666  7219  8306
## 1983  7010  8149  8949  6105  5324  5766  6214  7007  8154  9049  6883
## 1984  6803                                                            
##        Dec
## 1954  6235
## 1955   363
## 1956  3022
## 1957  5259
## 1958   566
## 1959  6652
## 1960   343
## 1961  4890
## 1962  2868
## 1963   774
## 1964  3270
## 1965  3876
## 1966   659
## 1967  7367
## 1968  1346
## 1969  3469
## 1970  4929
## 1971  1170
## 1972  3835
## 1973  2706
## 1974  3501
## 1975  4993
## 1976  5446
## 1977  3215
## 1978  2891
## 1979 11017
## 1980  3386
## 1981  2919
## 1982  8027
## 1983  8103
## 1984
plot(flies)

explore different persepctives according to frequency with aggregate and boxplot

par(mfrow=c(2,2))
plot(flies)
ag.flies<-aggregate(flies) ; plot(ag.flies)
cyc.flies<- cycle(flies) ; boxplot(flies ~ cyc.flies)
par(mfrow=c(1,1))

cycle was used to pull up the monthly interval as a categorical variable

Draw 2 series that have the same time axis together, with cbind

eggs2<-ts(blowfly$eggs, start=c(1954,1), frequency = 12)
plot(cbind(flies , eggs2 ))

set a frequency of every 2-3 days

flies<-ts(blowfly$total, start=(c(1954,1)) , freq = 150)
print(flies,calendar=T)
##         p1    p2    p3    p4    p5    p6    p7    p8    p9   p10   p11
## 1954   948   942   911   858   801   676   504   397   248   146  1801
## 1955  2319  1630  1297   861   761   659   701   762  1188  1778  2428
## 1956  3270  4404  4398  4112  4401  5779  6597  8091 11282 12446 13712
##        p12   p13   p14   p15   p16   p17   p18   p19   p20   p21   p22
## 1954  6235  5974  8921  6610  5973  5673  3875  2361  1352  1226   912
## 1955  3806  4519  5646  4851  5374  4713  7367  7236  5245  3636  2417
## 1956 11017 14683  7258  6195  5962  4213  2775  1781   936   898  1160
##        p23   p24   p25   p26   p27   p28   p29   p30   p31   p32   p33
## 1954   521   363   229   142    82   542   939  2431  3687  4543  4535
## 1955  1258   766   479   402   248   254   604  1346  2342  3328  3599
## 1956  3158  3386  4547  4823  4970  4940  5793  7836  4457  6901  8191
##        p34   p35   p36   p37   p38   p39   p40   p41   p42   p43   p44
## 1954  5441  4412  3022  2656  1967  1295   915   551   313   167    95
## 1955  4081  7643  7919  6098  6896  5634  5134  4188  3469  2442  1931
## 1956  6766  5165  2919  3415  3431  3162  2525  2290  1955  1936  2384
##        p45   p46   p47   p48   p49   p50   p51   p52   p53   p54   p55
## 1954    93    60    68  5259  6673  5441  3987  2952  3648  4222  3889
## 1955  1790  1722  1488  1416  1369  1666  2627  3840  4044  4929  5111
## 1956  4666  7219  8306  8027  7010  8149  8949  6105  5324  5766  6214
##        p56   p57   p58   p59   p60   p61   p62   p63   p64   p65   p66
## 1954  2295  1509   928   739   566   383   274   192   226   519  1224
## 1955  3152  4462  4082  3026  1589  2075  1829  1388  1149   968  1170
## 1956  7007  8154  9049  6883  8103  6803                              
##        p67   p68   p69   p70   p71   p72   p73   p74   p75   p76   p77
## 1954  2236  3818  6208  5996  5789  6652  7939  4868  3952  2712  1734
## 1955  1465  1676  3075  3815  4639  4424  2784  5860  5781  4897  3920
## 1956                                                                  
##        p78   p79   p80   p81   p82   p83   p84   p85   p86   p87   p88
## 1954  1224   703   508   366   279   243   343   761  1025  1221  1600
## 1955  3835  3618  3050  3772  3517  3350  3018  2625  2412  2221  2619
## 1956                                                                  
##        p89   p90   p91   p92   p93   p94   p95   p96   p97   p98   p99
## 1954  2267  3290  3471  3637  3703  4876  5364  4890  3029  1950  1225
## 1955  3203  2706  2717  2175  1628  2388  3677  3156  4272  3771  4955
## 1956                                                                  
##       p100  p101  p102  p103  p104  p105  p106  p107  p108  p109  p110
## 1954  1076   905   772   628   473   539   825  1702  2868  4473  5221
## 1955  5584  3891  3501  4436  4369  3394  3869  2922  1843  2837  4690
## 1956                                                                  
##       p111  p112  p113  p114  p115  p116  p117  p118  p119  p120  p121
## 1954  6592  6400  4752  3521  2719  1931  1500  1082   849   774   864
## 1955  5119  5839  5389  4993  4446  4851  4243  4620  4849  3664  3016
## 1956                                                                  
##       p122  p123  p124  p125  p126  p127  p128  p129  p130  p131  p132
## 1954  1308  1624  2224  2423  2959  3547  7237  5218  5311  4273  3270
## 1955  2881  3821  4300  4168  5446  5477  8579  7533  6884  4127  5546
## 1956                                                                  
##       p133  p134  p135  p136  p137  p138  p139  p140  p141  p142  p143
## 1954  2281  1549  1091   796   610   445   894  1454  2262  2363  3847
## 1955  6313  6650  6304  4842  4352  3215  2652  2330  3123  3955  4494
## 1956                                                                  
##       p144  p145  p146  p147  p148  p149  p150
## 1954  3876  3935  3479  3415  3861  3571  3113
## 1955  4780  5753  5555  5712  4786  4066  2891
## 1956
plot(flies)

par(mfrow=c(2,1))
blowfly$eggs %>% ts(frequency= 1) %>% plot
blowfly$eggs %>% ts(frequency= 52) %>% plot

par(mfrow=c(1,1))

the ‘ts’ class allows R to understand

  • that a vector is based on a uni-directional time sequence
  • convert frequencies easily

many base/stats functions are actually designed for the ‘ts’ class

but using them on a non- ts object may or may not result in an error

Warning: the ts class is designed only for equidistant series. other packages have other classes and functions which extend capabilities

Time series exploration & Descriptive techniques

  • relatively clear cyclical process, at least at the beginning (points 1:200).
  • the second part (200:361) looks less regular and with an upward trend.

Exploratory Questions

  • what are the cycle periods ?
  • is there any hidden periodical behaviour that is not immediately apparent ?
  • Do these 2 parts behave similarly or is there a fundamental change ? which attributes of the first part are conserved in the second part

We can see some clearer cyclical behaviour of a section if we zoom in and explore a bit with window:

flies<-ts(blowfly$total)
plot(window(flies, 1, 100, 1))

more zooming with deltat:

par(mfrow=c(3,1))
plot(window(flies, 1, 100, deltat = 5))
plot(window(flies, 1, 100, deltat = 10))
plot(window(flies, 1, 100, deltat = 20))

par(mfrow=c(1,1))

plot(window(flies, 1, 100, deltat = 20))

We finally lose the cyclical behaviour at ~deltat=20. let’s remember this…

The flies series is unusually clear, and still - its cyclical behaviour is only one possible pattern.

other series may represent processes that behave in a way that is not immediately recognizable, and have many things going at once.

examples

We naturally want tools that will describe the entire series’ and parts of the series’ attributes, be executable fast in R, and guide us in further steps (such as model fitting and forecasting).

for that, we need to understand how a given series evolves through time - via autocorrelation and related attributes of time series

Measures of serial dependence

The Lag Plot

Auto-correlation

Partial Auto-correlation

Cross-Correlation

Background on Autocorrelation - The lag plot.

how does my data behave in relation to itself ? or ’how does this week’s population relate to last week’s population ?

plot(flies[1:350], flies[2:351])

Clearly, there’s a pattern at lag =1, at least in the smaller population sizes

now let’s look at lags 2 and 3:

par(mfrow=c(1,2))
plot(flies[1:350], flies[3:352])
plot(flies[1:350], flies[4:353])
par(mfrow=c(1,1))

we start losing the pattern gradually, as we go to higher lags…

more lags, lag = 1 to lag =8, using lag.plot()

lag.plot(flies, lag = 8, layout = c(2, 4), diag = T)

however, at lags 9-16:

lag.plot(flies, lags = 8, set.lags = 9:16, layout = c(2, 4), diag = T)

summary

  • we begin with a pattern, lose it, and then find it again
  • at smaller populations, we begin with a positive correlation, lose it, then a negative correlation which peaks at about lag = 10, lose it, and so the cycle goes.

The Correlogram, using acf()

acf(flies)

plot(flies)

This summarizes and charts the autocorrelation structure in the series, up to a default lag size, which is 25 in this case.

The vertical axis is the magnitude of the Sample Autocorrelation Coefficients. they measure the relative correlation between observations at different distances apart (lags)

and the horizontal axis is simply the lag size.

let’s have a look at these Sample Autocorrelation Coefficients

a <- acf(flies, plot = F)
a$acf %>% head(10) # for the actual factors
##  [1]  1.00000000  0.88266919  0.74218182  0.57299028  0.39789881
##  [6]  0.22191632  0.06209902 -0.06770406 -0.16982318 -0.21847342

there’s no better evidence of cycles, the flies exhibit strong positive AND negative correlation at regular intervals throughout the series, with a cycle period of 19 weeks.

let’s look at the entire series

acf(flies, lag.max = 360, col = "red")

We can see

  • the (sort of regular) cyclical pattern, especially at the beginning
  • the diminishing positive correlation
  • some alternation between positive and negative correlations, which is typical of cycles.

acf(flies, lag.max = 360, col = "red")

warning: there is bias here with increasing lag size. why ?

  • there’s some kind of a diminishing but long term “memory” in the system, up to a lag of 100 weeks.

    • events that happened 50 weeks ago may still have an influence.
  • clear cycles with a period of 19 weeks

The main point :there’s a LOT of non-randomness in the sequence

This last point is especially important for linear and other modelling, which assume that samples are independent. we may want to stop and think before going on to perform a conventional lm() ! , and at the very least be aware of series’ behaviour.

and this illustrates why we need to explore time series internally before we go on to the other, external parameters.

The Sample Auto-covariance Coefficients

acf(flies, type = "covariance")

acf(flies, type = "covariance", plot = F)$acf %>% head(10)
##  [1]  5908371.3  5215137.3  4385085.8  3385439.3  2350933.9  1311164.0
##  [7]   366904.1  -400020.7 -1003378.4 -1290822.1

Partial Autocorrelation

Correlograms produced using the sample autocorrelation coefficients do not account for the fact that for a given lag size there may be correlation between internal points, (e.g.. values that are 4 points apart were correlated, but so do values that are 2 points apart).

We sometimes want to control for the internal correlations inside the lag, or in other words, to check what would have been the correlation coefficients, had all the internal lags’ coefficients were forced to zero.

This helps us later on, when selecting a model

the Partial Autocorrelation is the relationship between this week’s population and the population at lag n when we have already controlled for the correlations between all of the successive weeks between this week and week n

We only need to use pacf or acf(type="partial") instead of the default

acc <- a$acf %>% head(12) %>% round(2)

partial.acc <- pacf(flies, plot = F) %>% .$acf %>% head(12) %>% round(2)

acc
##  [1]  1.00  0.88  0.74  0.57  0.40  0.22  0.06 -0.07 -0.17 -0.22 -0.23
## [12] -0.20
partial.acc
##  [1]  0.88 -0.17 -0.21 -0.12 -0.12 -0.06 -0.01 -0.03  0.09  0.05  0.04
## [12]  0.16

plot(1:12, acc, xlab = "Lag", ylab = "Coefficient")
points(1:12, partial.acc, col = "blue")
lines(1:12, rep(0, 12))
legend("topright", c("ac coefficients", "Partial ac coefficients"), col = c("black", "green"), pch = c(1, 1))

pacf(flies)

what can we take from this ?

  • the maximal negative and maximal positive lags are important places in the cycle.

  • clues for possible underlying processes in lags 3 and 12

the PACF can be thought of as analogous to a derivative of ACF - thus finding points driving change in pitch or direction.

Summary till now, using just Correlograms for the entire series

  • cycle length - about 19
  • Lags 2-4, 12-16 are suspected: we should look for biological mechanisms there, when we construct an explanatory model.

  • Note: these are clues generated only from the time series itself, before considering any other data or knowledge in biology

Cross Correlation

The total number of flies is not the only one we have…

we could always scatter-plot the data frame variables against each other

plot(blowfly, lower.panel = NULL)

we get a strong correlation between total, deaths, and emerging. but there’s an inherent problem in this analysis…

flies <- ts(blowfly$total)

ccf(nonemerging, flies)

ccf(emerging, flies)

ccf(eggs, flies)

Reminder: Lags 2-4 (negative), 12-16 (positive) are suspected from pacf() of the total of “flies”

from ccf()

example, eggs vs. total-

  • negative at lags 2-3
  • negative at (-17 - -16)
  • positive at 12-13

it can be difficult to construct an explanation from this alone, but at least we now have clues of lead times and delay times between every two variables.

This is a first step in time series stats, towards modelling.

findings: cycle length = still 19

one possible explanation: larval processes appear to drive the cycles when they reach a maximum (density dependent competition ?)

Spectral Analysis

an alternative , complementary approach to analysing fluctuations is to analyse how the variance evolves through time, by distributing it to frequencies.

This method can detect important processes that may not be found with acf() to to too much noise in certain lags

The Periodogram, using spectrum

horizontal axis - frequencies instead of time vertical - variance

spectrum(flies, main = "")

in this analysis we look for frequency peaks. in this case:

spec<-spectrum(flies, main = "")

spec$spec %>% which.max() -> maxloc # where's the maximal value ?
max.freq <- (spec$freq)[maxloc] # what's the maximal value ?
max.freq
## [1] 0.05333333

now let’s plot and find the exact value in time units

plot(spec, main = "")
abline(v = max.freq, col = "red", lwd = 2)

1 / max.freq
## [1] 18.75

suggests again cycles of ~19 years.

the information may or may not be easier to interpret than a correlogram.

it’s best to use both when exploring.

typical series and interpretations

No definitive pattern in lags beyond 0 or a dominant frequency

typical of a Pure White Noise process, which is Normal errors around a steady mean.

No definitive pattern in lags, but high correlation for all lags

typical of a Random Walk process, where values are correlated to (mostly) only the latest value (the values in the previous step), plus a small white noise

Super Important

Is there a real trend here ? the time plot and our brain say there is. but the underlying process is probably not a real trend or combination of trends, but a kind of random ‘drift’.

the next value is more likely to be a minor deviation from the last value, with equal odds for moving up or down, than a result of a another external driver !

random walks are notorious in deceiving our brains. actually, the long term mean of the formula used to create this series, rw<-cumsum(rnorm(100,0,1)) is 0 !

repeat:

The same ACF even though the time plot is completely different

repeated negative followed by positive: return to equilibrium following random departures from it.

This is called an Autoregressive Process of order 1 or ‘AR(1)’ (the order can be determined from the pacf()) in this case a negative AR(1) with rapid response.

ecological models of equilibrium population dynamics are typically Autoregressive.

analysing parts of a TS + de-trending

now let’s adapt the analysis for the two parts, separately

split the series:

first <- ts(flies[1:200])
second <- ts(flies[201:361])

what are the notable differences between the 2 parts ?

same overall shapes, same cycle length

Important Lags in the second parts are shifted 1 or 2, compared with the first - change points are somewhat closer to each other

what if we wanted to analyse the two with a commom baseline ?

de-trending using lm()

let’s de-trend using linear regression, and have look at the results.

firstX <- 1:length(first)
first.lm <- lm(first ~ firstX)
first.detrended <- first - predict(first.lm)

secondX <- 1:length(second)
second.lm <- lm(second ~ secondX)
second.detrended <- second - predict(second.lm)

par(mfrow = c(2, 2))
plot(first) ; lines(first.lm$fitted.values, col = "red")
plot(second) ; lines(second.lm$fitted.values, col = "blue")
plot.ts(first.detrended) # reminder Q: why plot.ts ?
plot.ts(second.detrended)

par(mfrow = c(1, 1))

first.lm$coefficients
## (Intercept)      firstX 
## 2112.433116    4.882556
second.lm$coefficients
## (Intercept)     secondX 
##  2827.57461    21.94437

note: the pitch of of ‘second’ is 5 times that of first

now, let’s compare again

par(mfrow = c(2, 2))
acf(first.detrended) ; acf(second.detrended)
pacf(first.detrended); pacf(second.detrended)
par(mfrow = c(1, 1))

comparing the de-trended versions: acf() - results similar, but pacf() are quite different

conclusions:

  • suspected trend change
  • continuity in cycle period at the same time as a trend change,
  • quite a difference in the underlying change points
  • also: a new negative partial coefficient at lag 18

de-trending using diff()

par(mfrow = c(2, 1))
plot(flies)
flies %>% diff() %>% plot()

par(mfrow = c(1, 1))

De-trending Remarks

De-trending is the goal of many methods in time series analysis that are not presented here

+first order differencing will usual be enough coerce the series to become stationary, and will do it while still keeping the attributes of the signal. de-trending by a lm() may create noise that was not there.

it has several uses in model selection, and the advantage that parameters do not need to be estimated (e.g. from a linear model)

Smoothing

let’s return to some noisier series

par(mfrow=c(2,2))
plot(w)
plot(rwd)
plot(sine.wn)
plot(Y)

par(mfrow=c(1,1))

last time we used window()

par(mfrow = c(2, 2))

plot(window(w, 100, 200))
plot(window(w, 100, 150))
plot(window(w, 100, 125))

par(mfrow = c(1, 1))

this approach doesn’t work

we should probably start looking by smoothing first

smoothing via filter()

filter allows many methods for smoothing

filtering with a moving average

this is the simplest technique we use the filter argument with a weights vector, giving equal weights

par(mfrow = c(3, 1))

v <- filter(w, sides = 2, filter = rep(1, 3) / 3)
plot(w)
lines(v, col = "red")

rwd.s <- filter(rwd, sides = 2, filter = rep(1, 3) / 3)
plot(rwd)
lines(rwd.s, col = "red")

sine.wn.s <- filter(sine.wn, sides = 2, filter = rep(1, 3) / 3)
plot(sine.wn)
lines(sine.wn.s, col = "red")

Y.s <- filter(Y, sides = 2, filter = rep(1, 3) / 3)
plot(Y)
lines(Y.s, col = "red")

par(mfrow = c(1, 1))

let’s try a 9 point average

v <- filter(w, sides = 2, filter = rep(1, 9) / 9)
plot(w)
lines(v, col = "red")

rwd.s <- filter(rwd, sides = 2, filter = rep(1, 9) / 9)
plot(rwd)
lines(rwd.s, col = "red")

sine.wn.s <- filter(sine.wn, sides = 2, filter = rep(1, 9) / 9)
plot(sine.wn)
lines(sine.wn.s, col = "red")


Y.s <- filter(Y, sides = 2, filter = rep(1, 9) / 9)
plot(Y)
lines(Y.s, col = "red")

let’s try a 15 point average

v <- filter(w, sides = 2, filter = rep(1, 15) / 15)
plot(w)
lines(v, col = "red")

rwd.s <- filter(rwd, sides = 2, filter = rep(1, 15) / 15)
plot(rwd)
lines(rwd.s, col = "red")

sine.wn.s <- filter(sine.wn, sides = 2, filter = rep(1, 15) / 15)
plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)


Y.s <- filter(Y, sides = 2, filter = rep(1, 15) / 15)
plot(Y)
lines(Y.s, col = "red")

increase the averaging period to receive a smoother results and (maybe) reveal patterns in the noise

a moving average is a local smoother, meaning it allows a change of series behaviour through out the series.

3### smoothing via ksmooth()

t <- 1:length(sine.wn)

plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)
lines(ksmooth(t, sine.wn, "normal", bandwidth = 5), col = "blue", lwd = 2)

plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)
lines(ksmooth(t, sine.wn, "normal", bandwidth = 10), col = "blue", lwd = 2)

plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)
lines(ksmooth(t, sine.wn, "normal", bandwidth = 20), col = "blue", lwd = 2)

The wider the bandwidth, the smoother the result

smoothing via smooth.spline()

t <- 1:length(sine.wn)

# smoothing paraeter=0.1
plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)
lines(smooth.spline(t, sine.wn, spar = 0.1), col = "blue", lwd = 2)

# smoothing paraeter=1
plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)
lines(smooth.spline(t, sine.wn, spar = 1), col = "blue", lwd = 2)

# smoothing paraeter=0.7
plot(sine.wn)
lines(sine.wn.s, col = "red", lwd = 3)
lines(smooth.spline(t, sine.wn, spar = 0.7), col = "blue", lwd = 2)

smoothing via several methods

t <- 1:length(flies)

plot(flies)
lines(ksmooth(t, flies, "normal", bandwidth = 5), col = "blue", lwd = 2)

# spline smoothing smoothing paraeter=0.4
plot(flies)
lines(smooth.spline(t, flies, spar = 0.4), col = "blue", lwd = 2)

Conclusion & Remarks

The description of a given time series is usually a cyclical process by itself

some methods that are covered here are also useful in other contexts (e,g. model fitting, control)

Further Steps

  • Model fitting or Comparison
  • Forecasting
  • Discussion of outliers
  • Statistical Theory
  • Statistical Methodology

General Warnings

  • many methods, not described here, do have some underlying assumption. most commonly the assumption of Stationarity
  • Most of the methodology of time series covers only discrete time and continuous (or pseudo-continuous) variable (e.g. binary and point process are not covered)
  • many of the methods do not describe short and relatively stable series ( like a 10 points richness series) well enough for analysis

Terminology & Abbreviations

R functions

function Purpose
ts construct a time series object
ts.plot plot a vector or matrix as if they were of the special ‘ts’ class.
window have a look at sections of series
lag.plot create a scatter-plot matrix by lags
acf , pacf , ccf study serial correlation
Diff recode the series by differencing
lm construct a Linear Model, such a simple univariate regression to remove a trend
filter, ksmooth, smooth.spline applying linear and non linear filters. in the context of this lesson - for smoothing

Process Terminology

Process - a mathematical description of observations as a function of time (and perhaps other variables)

Stationary Process (intuitive definition, non formal) :

  • No systemic change in mean (no trend)
  • No systemic change in variance
  • Strictly periodic variations have been removed

In other words, The properties of various sections of the data are constant

  • Usually in the context of model fitting
  • Non Stationarity components (trend, cycles) may be the focus, or we may want to remove them prior to analysis
  • much of the probability theory of time series analysis concerns only stationary series, so we may need to transform a given series to stationary

NOTE: No real process is strictly Stationary !

Series Terminology

Term Refers to
Continuous Observations are made continuously through time (even if the variable of interest can only take certain)
Deterministic a time series that can be predicted exactly, by a well defined mathematical process
Discrete observations are taken at specific times
Equidistant A series where the variable is recorded at equal intervals (e.g every month)
Lag The time- distance between points in the series, also used for delay lengths in process/series descriptions
Stationary See above
Stochastic The future is only partly determined by the past/ exact predictions are impossible and are replaced by the idea that future values have a probability distribution

Process Types Terminology

Term Refers to
Point Process Binary Observations occur randomly at non equal interval
Binary Process Observations can take only “0” or “1”, Yes/No and so on

Abbreviations and Notation

Term Refers to
ACF or ac.f Autocorrelation Function
pACF Partial (corr. coefficient) ACF
AR Autoregressive
MA Moving Average

Source Material and useful links

Useful Links

Bibliography

Heavily based on some Textbooks and R manuals

  • Chatfield - Chapters 1, 2 & 6 Chatfield, Christopher. 2004. The Analysis of Time Series: An Introduction. 6th ed. Texts in Statistical Science. Boca Raton, FL: Chapman & Hall/CRC.
  • Shumway & Stoffer - Chapters 1, 2 & 4 Shumway, Robert H., and David S. Stoffer. 2006. Time Series Analysis and Its Applications: With R Examples. 2nd [updated] ed. Springer Texts in Statistics. New York: Springer.
  • The R book by Crawley - Chapter 24 Crawley, Michael J. 2013. The R Book. Second edition. Chichester, West Sussex, United Kingdom: Wiley.

There are many others…