Identifying secondary series for stepwise common singular spectrum analysis

Common singular spectrum analysis is a technique which can be used to forecast a primary time series by using the information from a secondary series. Not all secondary series, however, provide useful information. A first contribution of this paper is to identify the properties which a secondary series should have in order to improve the forecast accuracy of the primary series. The second contribution is a procedure for selecting a secondary series from several candidate series. Empirical studies suggest that the procedure performs well.


Introduction
Common singular spectrum analysis (CSSA) was introduced by Viljoen & Nel (2010). It is a multiple time series analysis procedure, combining singular value decomposition of an unfolding matrix with the concept of common principal components introduced by Flury (1988). A modification of CSSA was proposed by Viljoen & Steel (2012); it replaces the original maximum likelihood approach with a stepwise procedure introduced by Trendafilov (2010). Another popular procedure for multiple time series analysis is multi-channel singular spectrum analysis (MSSA), an extension of singular spectrum analysis (Golyandina et al. 2001, Golyandina & Stepanov 2005). The primary focus in this paper is on (stepwise) CSSA, but MSSA and SSA are also considered for comparison purposes in the simulation study discussed in §3.
Although CSSA caters for an arbitrary number of time series, we restrict our discussion to the case of two series. In such a context it frequently makes sense to distinguish between a primary and a secondary series. The main interest is in forecasting the primary series, and in improving the forecast accuracy by using the information provided by the secondary series. An important question that arises concerns the properties required for a secondary series to provide useful information for improving forecast accuracy. This question is addressed in §3, where a simulation study is described in which the improvement in forecast accuracy achieved by different secondary series was investigated for CSSA and MSSA. In practical applications one often has multiple candidate secondary series, and a further important question is whether a strategy can be proposed for identifying a good secondary series. In §4 an answer is provided to this question; the proposed strategy is based on the residuals obtained by combining the primary series with every candidate secondary series in a pairwise CSSA. In §5 the proposal is illustrated with a practical example. Section 6 contains concluding remarks and ideas for further research.
In the next section background information on MSSA as well as stepwise CSSA is given.

MSSA and stepwise CSSA
Consider two time series of equal length N, y_1t and y_2t, where t = 1, . . . , N. A signal-plus-noise structure is assumed for both series, i.e.

y_jt = f_jt + ε_jt,  j = 1, 2,  t = 1, . . . , N.  (1)

Here, f_jt is the systematic (signal) component for series j, j = 1, 2, and ε_jt is the random (noise) component. Later it is assumed that ε_1 and ε_2 are Gaussian random variables. Our primary interest is in forecasting f_1, and we intend using information supplied by the second time series to improve the accuracy of our forecasts.
Both MSSA and stepwise CSSA consider these time series in terms of trajectory matrices. Let L, called the window length or embedding dimension, be an integer satisfying 1 < L < N/2, and set K = N − L + 1. We form the L × K trajectory matrices Y_1 and Y_2, whose columns are the lagged vectors (y_j,t, . . . , y_j,t+L−1)^T for t = 1, . . . , K, and stack them into the L × 2K matrix Y = [Y_1 : Y_2]. MSSA is based on the singular value decomposition of Y^T. Let Y^T = U A V^T denote this decomposition. In this expression, U is a 2K × L matrix, the columns of which span the column space of Y^T, V is an L × L matrix, the columns of which span the row space of Y^T, and A is an L × L diagonal matrix of singular values a_1 ≥ a_2 ≥ · · · ≥ a_L ≥ 0. Also, the columns of U and V are orthonormal, i.e. U^T U = V^T V = I_L. The matrix Y can be expressed in terms of the singular values and the columns u_i of U and v_i of V, viz. Y = Σ_{i=1}^{L} a_i v_i u_i^T, and with d denoting the number of positive singular values, this becomes Y = Σ_{i=1}^{d} Y^(i), where Y^(i) = a_i v_i u_i^T, i = 1, . . . , d. The matrices Y^(1), . . . , Y^(d) are bi-orthogonal and of rank 1. MSSA proceeds by splitting the matrices Y^(1), . . . , Y^(d) into two groups, Y^(1), . . . , Y^(r) and Y^(r+1), . . . , Y^(d), with the first group corresponding to estimated signal and the second group to noise or residual. The parameter r has a strong effect on the results of MSSA, as will be seen in the simulation results later.
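The embedding and decomposition steps can be sketched as follows. This is a toy NumPy example; the series, window length and random seed are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def trajectory_matrix(y, L):
    """Form the L x K trajectory (Hankel) matrix of a series, K = N - L + 1."""
    N = len(y)
    K = N - L + 1
    return np.column_stack([y[t:t + L] for t in range(K)])

# Two toy series of length N = 20 with window length L = 5.
rng = np.random.default_rng(0)
y1 = np.sin(0.3 * np.arange(20)) + 0.1 * rng.standard_normal(20)
y2 = np.cos(0.3 * np.arange(20)) + 0.1 * rng.standard_normal(20)

L = 5
Y1, Y2 = trajectory_matrix(y1, L), trajectory_matrix(y2, L)
Y = np.hstack([Y1, Y2])                    # L x 2K stacked trajectory matrix

# SVD of Y^T: U is 2K x L, a holds the singular values, Vt is L x L.
U, a, Vt = np.linalg.svd(Y.T, full_matrices=False)

# Rank-1 components Y^(i) = a_i v_i u_i^T; their sum recovers Y.
components = [a[i] * np.outer(Vt[i], U[:, i]) for i in range(len(a))]
assert np.allclose(sum(components), Y)
```

Grouping then amounts to summing the first r components as the estimated signal and treating the remainder as residual.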
The estimated signal matrix is F = Σ_{i=1}^{r} Y^(i) = [F_1 : F_2]. This matrix may be seen as a smoothed version of Y, with F_1 corresponding to Y_1 and F_2 to Y_2. The matrices F_1 and F_2 will typically not have a Hankel structure, and we therefore require a further step, reverse diagonal averaging, to obtain estimates of the signals corresponding to the two time series (Golyandina et al. 2001, p. 17). Applying this to each of the matrices F_1 and F_2, we obtain estimates f̂_1t and f̂_2t, t = 1, . . . , N, of the signal series corresponding to the two observed time series.
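Reverse diagonal averaging replaces each anti-diagonal of an L × K matrix by its average, yielding a series of length N = L + K − 1. A minimal NumPy sketch; the example checks that a matrix which already has Hankel structure is returned unchanged as a series.

```python
import numpy as np

def diagonal_average(F):
    """Reverse diagonal averaging: average each anti-diagonal of an L x K
    matrix to recover a series of length N = L + K - 1."""
    L, K = F.shape
    N = L + K - 1
    out = np.zeros(N)
    for t in range(N):
        # entries F[s, t - s] with valid indices lie on anti-diagonal t
        vals = [F[s, t - s] for s in range(max(0, t - K + 1), min(L, t + 1))]
        out[t] = np.mean(vals)
    return out

# A Hankel matrix has constant anti-diagonals, so averaging recovers the series.
y = np.arange(1.0, 8.0)                                # series of length 7
H = np.column_stack([y[t:t + 3] for t in range(5)])    # 3 x 5 trajectory matrix
assert np.allclose(diagonal_average(H), y)
```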
Turning our attention to forecasting, the following approach was proposed by Venter (2000) and is equivalent to the approach proposed by Golyandina et al. (2001). We consider the matrix F_1 = [f̂^(1)_1, . . . , f̂^(1)_K] of columns obtained for the first series. A one-step-ahead forecast can be obtained by appending another column f̂^(1)_{K+1} to this matrix. This column has components f̂_1,N−L+2, . . . , f̂_1,N, f̂_1,N+1, where f̂_1,N+1 is the forecast value which has to be determined. The forecast value f̂_1,N+1 is found by minimizing the distance between f̂^(1)_{K+1} and the signal subspace spanned by the leading r singular vectors. Once f̂_1,N+1 has been computed, the process can be iterated to obtain recurrent forecasts for longer horizons.
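For the single-series case, the minimum-distance completion of the appended column reduces to a linear recurrence in the last L − 1 values (Golyandina et al. 2001). A NumPy sketch, with r denoting the number of signal components; the test series is a noise-free sine, whose trajectory matrix has rank 2, so the forecast is exact.

```python
import numpy as np

def ssa_forecast_one_step(y, L, r):
    """One-step-ahead recurrent SSA forecast of a single series."""
    N = len(y)
    K = N - L + 1
    Y = np.column_stack([y[t:t + L] for t in range(K)])   # L x K trajectory matrix
    U, _, _ = np.linalg.svd(Y, full_matrices=False)       # left singular vectors
    P = U[:, :r]                                          # leading r vectors
    pi = P[-1, :]                                         # their last components
    nu2 = np.sum(pi ** 2)                                 # verticality coefficient
    R = (P[:-1, :] @ pi) / (1.0 - nu2)                    # recurrence coefficients
    return R @ y[N - L + 1:]                              # apply to last L-1 values

t = np.arange(50)
y = np.sin(0.2 * t)
forecast = ssa_forecast_one_step(y, L=10, r=2)
assert abs(forecast - np.sin(0.2 * 50)) < 1e-8
```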
Stepwise CSSA is based on the common principal components (CPC) model of Flury (1988), under which k population covariance matrices Σ_1, . . . , Σ_k are diagonalized by one and the same orthogonal matrix Ψ, i.e. Ψ^T Σ_i Ψ = Λ_i with Λ_i diagonal for i = 1, . . . , k. In practice, data have to be used to estimate Σ_1, . . . , Σ_k. Let S_1, . . . , S_k denote the usual sample covariance matrices. Estimating the CPCs entails finding an L × L orthogonal matrix Q (an estimate of the matrix Ψ) and diagonal matrices D_1^2, . . . , D_k^2 (containing the sample eigenvalues) such that Q^T S_i Q ≈ D_i^2 holds simultaneously for i = 1, . . . , k as closely as possible in some sense. Flury (1988) derived a maximum likelihood approach for estimating the CPCs, while Trendafilov (2010) introduced a stepwise algorithm for this purpose; the interested reader is referred to Trendafilov (2010) for more details.
The stepwise approach to CPCs can easily be used in a singular spectrum analysis framework. This entails centering the trajectory matrices Y_1 and Y_2 by subtracting the column averages, thereby obtaining the centred matrices Ỹ_1 and Ỹ_2. The symmetric L × L matrices S_j = Ỹ_j Ỹ_j^T, j = 1, 2, are formed and can now be provided as input to the stepwise CPC algorithm. The remainder of the analysis proceeds as described for MSSA.
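The preparation of the stepwise CPC inputs can be sketched as follows. This assumes the cross-product form S_j = Ỹ_j Ỹ_j^T for the symmetric input matrices (a possible scaling constant is omitted); the series and window length are illustrative.

```python
import numpy as np

def cssa_inputs(series_list, L):
    """Centre each trajectory matrix column-wise and form the L x L symmetric
    matrices passed to the stepwise CPC algorithm."""
    mats = []
    for y in series_list:
        K = len(y) - L + 1
        Y = np.column_stack([y[t:t + L] for t in range(K)])
        Y = Y - Y.mean(axis=0, keepdims=True)     # subtract column averages
        mats.append(Y @ Y.T)                      # L x L symmetric input
    return mats

rng = np.random.default_rng(1)
y1 = np.sin(0.3 * np.arange(60)) + 0.1 * rng.standard_normal(60)
y2 = np.cos(0.3 * np.arange(60)) + 0.1 * rng.standard_normal(60)
S1, S2 = cssa_inputs([y1, y2], L=12)
assert S1.shape == (12, 12) and np.allclose(S1, S1.T)
```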

Secondary series properties
In this section the properties which render a secondary series useful for inclusion in a multiple time series forecasting scheme, such as CSSA or MSSA, are determined. For this purpose, stepwise CSSA, MSSA and SSA were applied in several simulation scenarios in order to determine when, and to what extent, the first two procedures improved upon SSA.
Consider two time series y_1t and y_2t for t = 1, . . . , N. The signal-plus-noise structure given in (1) is assumed, with (ε_1t, ε_2t) following a bivariate Gaussian distribution with mean zero. Five covariance matrices for the noise, listed in Table 1, were considered by varying the noise variance of the secondary series and the correlation between the two noise variables. The primary signal series is assumed to be given by equation (2), whereas the secondary signal is given by equation (3). The trend in the secondary signal is determined by a, the amplitude of the signal by b and the period by c. Using different values for a, b and c, different secondary signals are created and then used in MSSA and stepwise CSSA to forecast the primary signal. The noise structures in Table 1 are used with each pair of primary and secondary series. This leads to the thirty cases of secondary series shown in Table 2. For Cases 1-5 the secondary series signal is identical to that of the primary series. In Cases 6-10 the secondary signal is stationary, with amplitude and period equal to those of the primary series. In Cases 11-15 and 16-20 the signals of the primary and secondary series differ with respect to period, while in Cases 21-25 and 26-30 the difference is with respect to amplitude.
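The bivariate noise design can be illustrated as follows; the secondary variance s2 and correlation rho below are hypothetical values, not entries reproduced from Table 1.

```python
import numpy as np

def simulate_noise(N, s2, rho, rng):
    """Draw N pairs (eps_1t, eps_2t) of bivariate Gaussian noise with unit
    primary variance, secondary variance s2 and correlation rho."""
    cov = np.array([[1.0, rho * np.sqrt(s2)],
                    [rho * np.sqrt(s2), s2]])
    C = np.linalg.cholesky(cov)               # requires a positive definite cov
    return C @ rng.standard_normal((2, N))    # rows: eps_1t and eps_2t

rng = np.random.default_rng(42)
eps = simulate_noise(100_000, s2=0.5, rho=-0.7, rng=rng)
# Empirical moments match the design up to Monte Carlo error.
print(np.var(eps[0]), np.var(eps[1]), np.corrcoef(eps)[0, 1])
```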
In the simulation study the time series length N was set at 100 and the number of simulation repetitions was taken as 500. The six-step-ahead root mean squared error (RMSE) was calculated as a measure of forecast accuracy. Note that the RMSE is calculated as the square root of the average (over the six-step-ahead forecast period) squared difference between the forecast values (f̂_t, which is the same as ŷ_t) and the true signal values f_t. This can be done since the values of f_t are known in a simulation study. The forecast errors were calculated for L = 12, L = 24 and L = 48. The results for L = 12 are omitted since there was no improvement; taking L = 24 or L = 48, however, leads to considerably better performance for all three techniques. For L = 24 and L = 48 we investigated r = 2, 3, . . . , 16 and r = 2, 3, . . . , 25, respectively. Table 3 contains the smallest six-step-ahead forecast RMS errors and the corresponding values of r for each of the three techniques and the thirty cases for L = 24.
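The accuracy measure is a one-line computation; the numerical example below is purely illustrative.

```python
import numpy as np

def rmse(forecasts, signal_true):
    """Root mean squared error over the forecast horizon, computed against
    the known signal values (available in a simulation study)."""
    forecasts = np.asarray(forecasts, dtype=float)
    signal_true = np.asarray(signal_true, dtype=float)
    return np.sqrt(np.mean((forecasts - signal_true) ** 2))

# errors are (0, 0, -2), so the RMSE is sqrt(4/3)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```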
What conclusions can be drawn from Table 3? Overall, it is clear that CSSA is best in all cases. There are several cases where SSA outperforms MSSA, namely when the secondary series period differs from that of the primary series. MSSA performs particularly poorly when the secondary series period is larger than that of the primary series (see Cases 16-20). Recall that successive groups of five cases in Table 2 correspond to specific assumptions regarding the signals: for example, in Cases 1-5 the two signals are identical, in Cases 6-10 the two signals differ with respect to trend, etc. The five cases within each of these groups correspond, respectively, to the noise covariance structures in Table 1. If we first consider the different groups, we see that the smallest forecast errors for CSSA are for Cases 1-5 and 26-30. These correspond to the scenarios where the two signals are identical (Cases 1-5) and where the amplitude of the secondary series is larger than that of the primary series (Cases 26-30). Looking in Table 3 within each of the groups, the most favourable cases for CSSA are typically the second, fourth and fifth. These correspond to a smaller secondary noise variance, a positive correlation between the two noise variables and a negative correlation between the two noise variables, respectively. It would seem that CSSA benefits especially from the information provided by the secondary series if the secondary series has the same signal structure as the primary one and/or the secondary noise variable is correlated with the primary noise variable and/or has a smaller variance. The best r-values for CSSA are equal to 3, except for Cases 11-20, where r = 5 is best. For MSSA the optimal r-value is 4 in all cases except Cases 11-20, where r = 6. The best r-value for SSA is r = 4.
The L = 48 results show that the forecast errors for CSSA remain largely the same, whereas those of MSSA and SSA are slightly smaller. Overall, CSSA is still best in all cases. The performance of MSSA in Cases 16-20 is considerably better than with L = 24 and is more or less the same as that of SSA. The best scenarios for CSSA are once again Cases 1-5 and 26-30. Finally, within each group of five cases it is again the second, fourth and fifth cases that perform best. The r-values are the same as for L = 24. Since the conclusions for L = 48 are essentially the same as for L = 24, we do not include the table for L = 48.

Procedures for selecting a secondary series
Consider a practical situation of forecasting a primary time series. Incorporating the information provided by a secondary series may improve the forecast accuracy of the primary series. If several secondary series are available, the question arises as to which, and how many, of these should be included in the analysis. In this study an attempt to provide a partial answer is presented. Attention is restricted to the possibility of including a single secondary series, and since CSSA was found best in the first simulation study, only this technique is used in the second simulation study. More specifically, a primary time series together with nine secondary series which are candidates for inclusion in the analysis are simulated. Based on the results of the first simulation study, four procedures for selecting a secondary series are considered. For expository purposes the primary series is referred to as Series 1 and the nine secondary series as Series 2 to Series 10. The first step is to perform CSSA with Series 1 and each of Series 2 to Series 10. This gives nine sets of pairs of residuals (residuals for Series 1 paired with residuals for each of Series 2-10). Selecting a secondary series makes use of these residuals. Procedure 1 first identifies the three (out of nine) secondary series with the smallest residual variances, and then selects, from these three, the one with residuals having the largest negative correlation with the primary residuals. Procedure 2 selects, from among the nine available, the secondary series with the smallest residual variance. Similarly, Procedure 3 selects the secondary series with residuals having the largest negative correlation with the primary residuals. Finally, Procedure 4 randomly selects a secondary series from the nine options. This procedure is included to see whether using a criterion to select the secondary series does better than simply randomly adding one.
SSA is also included in order to verify the value of using a secondary series.
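The four selection rules can be sketched as follows, assuming the pairwise CSSA residuals have already been computed; all names and the toy data are illustrative. The last toy candidate is constructed to be strongly negatively correlated with the primary residuals, so Procedure 3 selects it.

```python
import numpy as np

def select_secondary(primary_res, cand_res, procedure, rng=None):
    """Select the index of a candidate secondary series from CSSA residuals."""
    variances = np.array([np.var(r) for r in cand_res])
    correlations = np.array([np.corrcoef(primary_res, r)[0, 1] for r in cand_res])
    if procedure == 1:
        # three smallest residual variances, then the most negative correlation
        top3 = np.argsort(variances)[:3]
        return int(top3[np.argmin(correlations[top3])])
    if procedure == 2:
        return int(np.argmin(variances))        # smallest residual variance
    if procedure == 3:
        return int(np.argmin(correlations))     # most negative correlation
    rng = rng or np.random.default_rng()
    return int(rng.integers(len(cand_res)))     # random choice (Procedure 4)

rng = np.random.default_rng(7)
primary = rng.standard_normal(100)
cands = [s * rng.standard_normal(100) for s in (1.0, 0.5, 2.0)]
cands.append(-primary + 0.1 * rng.standard_normal(100))
assert select_secondary(primary, cands, procedure=3) == 3
```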
Seven cases were investigated in the simulation study. In all of these the primary series signal is given by equation (2) and the secondary series signal by equation (3). Table 4 summarizes the choices of a, b and c in the seven cases.
In each of these cases, observations of the primary and the nine candidate secondary series were simulated by adding Gaussian noise to the signal values. The noise values were simulated from a ten-dimensional Gaussian distribution with mean zero and a specified covariance matrix. Regarding the variances of the noise of the candidate series, we consider cases having the same noise variance as the primary series, cases having a smaller noise variance and cases having a larger noise variance. As far as the correlation between the primary noise series and the secondary noise series is concerned, we have three uncorrelated cases, three cases with positive correlation and three cases with negative correlation. The specific numerical values in the matrix were chosen to ensure that the covariance matrix remains positive definite. The results discussed below are for L = 24, N = 100 and 800 simulation repetitions. Root mean squared errors (RMSE) are given for the four procedures and seven cases in Table 5. Figures 1 and 2 illustrate the series that were selected by the different procedures in the seven cases. Given the best r-value, Procedure 1 almost always selects Series 8. This is not unexpected, since Procedure 1 is designed to select a secondary series simultaneously having a small residual variance and residuals negatively correlated with the primary series residuals, and Series 8 was simulated to have both these properties. As expected, the residuals from the pairwise CSSA analyses to a large extent reflect the covariance structure used in generating the noise of the series (provided that the correct value of r is chosen). Similar remarks can be made for Procedures 2 and 3: Procedure 2 chooses either Series 2, 5 or 8 (corresponding to medium residual variances equal to 1), while Procedure 3 chooses, with high probability, Series 9 (the series with residuals having the largest negative correlation with the primary series residuals).
As expected, Procedure 4 almost uniformly selects any of the series.
In Figure 3 the choice of secondary series for Procedure 1 and Case 4 is shown for different values of r. We see that initially there is no clear pattern. From r = 4 onwards there is a very strong preference for Series 8. Note that r = 5 is the best choice of r for this case.

An example
In this section an application of the procedures to a practical dataset is described. The data are from Franses & Van Dijk (2013) and consist of eight daily stock market indices: Amsterdam (EOE), Frankfurt (DAX), Hong Kong (Hang Seng), London (FTSE100), New York (S&P 500), Paris (CAC40), Singapore (Singapore All Shares) and Tokyo (Nikkei), covering the period 6 January 1986 until 31 December 1997. The data from 1 January 1988 until 28 December 1997 are considered, giving a total of 2 606 observations. The study entailed 26 repetitions. In each repetition 100 consecutive observations were analysed using each of the five procedures (the four selection procedures and SSA), and six-step-ahead forecasts were computed. The eight indices differ substantially in magnitude, varying from approximately 200 up to approximately 22 000. The error measure in equation (4) was therefore modified to a mean absolute relative percentage error measure. In Tables 6 and 7 the results are summarized for L = 24 and L = 48, respectively.
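The exact form of the modified measure is not reproduced above; a natural mean absolute relative percentage error (an assumed form) would be:

```python
import numpy as np

def marpe(forecasts, actual):
    """Mean absolute relative percentage error (assumed form): the average of
    |forecast - actual| / |actual|, expressed as a percentage."""
    forecasts = np.asarray(forecasts, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * np.mean(np.abs((forecasts - actual) / actual))

# mean of the 10% and 5% relative errors
print(marpe([110.0, 95.0], [100.0, 100.0]))
```

Dividing by the actual index level makes the measure comparable across indices whose magnitudes differ by two orders of magnitude.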
The results in these tables were obtained by using the Amsterdam index as primary series, with the other seven indices acting as candidate secondary series. We see that all procedures performed better for L = 24 than for L = 48. The best results were obtained for r = 5, with Procedure 3 giving the lowest error, closely followed by Procedure 1. We also found that the Singapore All Shares and Tokyo Nikkei indices were by far the most frequently selected by Procedures 1 to 3. All in all, it seems that these indices provide useful information when trying to predict the Amsterdam index.

Conclusions and questions for further research
In this paper the possibility of selecting a secondary time series, from a set of available candidates, for use in stepwise CSSA was investigated. Four procedures reflecting different selection strategies were studied and compared to SSA, where no secondary series is involved. It was found that selecting a secondary series with residuals negatively correlated with those of the primary series gave the best results, outperforming SSA in all cases. A question which deserves further investigation is how to select a value of r from the data.