Daily peak electricity load forecasting in South Africa using a multivariate non-parametric regression approach

Accurate prediction of daily peak load demand is very important for decision makers in the energy sector, since it helps in determining consistent and reliable supply schedules during peak periods. Accurate short-term load forecasts enable effective load shifting between transmission substations, scheduling of startup times of peak stations, load flow analysis and power system security studies. A multivariate adaptive regression splines (MARS) modelling approach towards daily peak electricity load forecasting in South Africa is presented in this paper for the period 2000 to 2009. MARS is a non-parametric multivariate regression method which handles high-dimensional problems with complex model structures, such as nonlinearities, interactions and missing data, in a straightforward manner and produces results which may easily be explained to management. The models developed in this paper consist of components that represent calendar and meteorological data. The performances of the models are evaluated by comparing them to a piecewise linear regression model. The results from the study show that the MARS models achieve better forecast accuracy.


Introduction
One of the most weather-sensitive sectors of any economy is the energy sector. In this sector accurate prediction of daily peak electricity demand is very important. It provides short-term forecasts which are required for dispatching and economic grid management of electric energy [1,2,3,8,16,19,21,22]. The most important weather factor which affects daily peak demand (DPD) is temperature. Changing weather conditions represent the major source of variation in peak demand forecasting and the inclusion of temperature has a significant effect due to the fact that in winter heating systems are used, whilst in summer air conditioning appliances are used [6,10,11,13,14,15,17,18]. Other weather factors include relative humidity, wind speed and cloud cover. Electricity demand forecasting has received extensive attention in the literature using various techniques ranging from classical time series methods and neural networks to regression methods. In this paper a multivariate adaptive regression splines (MARS) model is developed and used to predict daily peak electricity demand for South Africa. An updated review of different forecasting methods may be found in [7,12].
The remainder of the paper is organised as follows. In Section 2 the data are described and a preliminary data analysis is carried out. The piecewise linear regression and the MARS models are presented in Section 3. A discussion of the results is presented in Section 4, and the paper closes in Section 5.

Definitions and data
The data considered in this paper are on net energy sent out (NESO) in response to some demand for electrical power. NESO (measured in megawatts) is defined as the rate at which electrical energy is delivered to customers. In this paper NESO is used as a proxy of electrical demand after adjusting for energy losses. The data are for the period 2000 to 2009.
This definition of electrical demand has its weaknesses. Electrical demand is bounded by the power plants' capacity to provide supply at any time of the day, including the need for reserve capacity. Demand cannot exceed supply and there are no market forces acting to influence electricity prices and hence reducing demand in the short run. Prices are generally fixed. If demand were to exceed supply, intervention takes place in the form of, for example, load shedding. Load shedding is the last resort used to prevent a system-wide blackout. This NESO definition excludes the demand from people, companies, etc. who are willing (or unwilling) and able (or unable) to pay for electricity, but currently do not have access to electrical power. Despite the weakness in the NESO definition of electrical demand, it is still a good and measurable proxy for electrical demand.
The daily peak demand (DPD) is the maximum hourly demand in a 24-hour period. Aggregated DPD data were used for the industrial, commercial and domestic sectors of South Africa. Historical data on temperature were also collected from 22 meteorological stations from all the provinces of the country. The data were aggregated to obtain average daily, maximum and minimum temperatures for the entire country.
The time series plot of DPD in Figure 1 shows a positive linear trend and a strong seasonal fluctuation. The trend is mainly due to economic development of the country. Figures 2 and 3 show daily and monthly index plots, respectively. The basis for each index is 100. The seasonal peak is in July, which is a winter month. There is another small summer peak in October. The daily index plot shows that demand for electricity during week days is above the average consumption and decreases significantly on Saturdays and Sundays.
A better representation of the relationship between DPD and temperature is shown in Figure 4. The peak temperature is the temperature recorded during the hour of peak demand on day $t$. The relationship is nonlinear. The demand for electricity is highly sensitive to temperature fluctuations in winter and less sensitive in summer. DPD increases sharply as temperature decreases.
The nonlinear relationship between temperature and DPD calls for the derivation of two functions: one for cooling degree-days and the other for heating degree-days. Cooling degree-days ($\mathrm{CDD}_t$) and heating degree-days ($\mathrm{HDD}_t$) are estimated on the basis of the two linear functions
$$\mathrm{CDD}_t = \max(T_t - T_{\mathrm{ref}},\, 0) \quad \text{and} \quad \mathrm{HDD}_t = \max(T_{\mathrm{ref}} - T_t,\, 0),$$
respectively, as defined in [12], where $T_{\mathrm{ref}}$ represents the temperature which separates the winter and summer periods and where $T_t$ represents the peak temperature on day $t$. The reference temperature ($T_{\mathrm{ref}}$) has been selected to be equal to 20.5 °C; this appears to be the temperature at which the minimum demand for electricity occurs. Above this temperature, electricity demand tends to rise slightly and below this temperature electricity demand increases significantly.
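The degree-day construction described above may be illustrated with a short Python sketch. It assumes the usual hinge form of degree-days (the positive part of the deviation of the peak temperature from the reference temperature) and uses the reference temperature of 20.5 °C quoted in the text. The function names and the sample temperatures are hypothetical and serve only as an illustration.

```python
T_REF = 20.5  # reference temperature in °C, as selected in the text


def cooling_degree_days(peak_temp: float, t_ref: float = T_REF) -> float:
    """Degrees by which the peak temperature exceeds the reference (0 if it does not)."""
    return max(peak_temp - t_ref, 0.0)


def heating_degree_days(peak_temp: float, t_ref: float = T_REF) -> float:
    """Degrees by which the peak temperature falls below the reference (0 if it does not)."""
    return max(t_ref - peak_temp, 0.0)


# Hypothetical peak temperatures (°C), for illustration only.
sample_temps = [12.0, 20.5, 27.0]
for temp in sample_temps:
    print(temp, cooling_degree_days(temp), heating_degree_days(temp))
```

On any given day at most one of the two quantities is non-zero, which is what allows separate winter and summer slopes to be estimated.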

The models
A piecewise linear regression model and a MARS model are presented in this section. These models are used later for out-of-sample predictions of DPD. In both models DPD is taken as the dependent variable.

The piecewise linear regression model
Regression-based methods have been used extensively in load demand forecasting [4,10,20,22]. These methods range from simple linear to multivariate linear regression models and work very well when the relationship between the dependent variable and the predictor variables is linear. They are usually fast, reliable and easy to implement with relatively robust solutions.
However, the relationship between electricity demand and temperature is nonlinear, as shown in Figure 4. This calls for the use of a multivariate linear regression model with three piecewise linear regression functions representing the winter-sensitive, non-weather-sensitive and summer-sensitive components.
The piecewise linear regression model used in this paper may be written as (1), where $x_{pt}$ represents peak temperature (in degrees Celsius), that is, the temperature recorded at the hour of peak demand on day $t$; $z_t$ denotes the DPD (in megawatts) observed on day $t$; $t_w$ denotes the temperature where the winter-sensitive portion of demand joins the non-weather-sensitive demand component; $t_s$ denotes the temperature where the summer-sensitive portion of demand joins the non-weather-sensitive demand component; and $\beta_0$ represents the mean DPD observed during the non-weather-sensitive period ($t_w \le x_{pt} \le t_s$). It should be noted that DPD during non-weather-sensitive days does not depend on temperature ($x_{pt}$). Furthermore, $R_t$ is a stochastic disturbance term and $\varepsilon_t$ is the innovation in the disturbance.
The model in (1) accounts for any residual correlation that may occur as a result of the week-to-week variation in peak demand and also for the day-to-day variation. The model is based on the following theoretical assumptions:
1. Peak demand on day $t$ is highly correlated with peak demand on day $t + 1$.
2. There may be significant correlation between demand 2 days, 5 days and/or 7 days apart.
The derivations of the equations of the three demand-temperature lines are presented in the appendix at the end of the paper.

The multivariate adaptive regression splines (MARS) model
MARS is a non-parametric multivariate regression method which was developed in [9] and has been used to solve high-dimensional problems with complex model structures, such as nonlinearities, interactions, multicollinearity and missing values [3,6,17,27]. The method does not make any assumptions about the functional relationship between the response variable and the predictor variables. The MARS modelling approach overcomes the major drawbacks of using artificial neural networks, which have long training processes, interpretive difficulties and an inability to determine the relative importance of potential input variables.
In the MARS paradigm the modelling space is divided into subregions and a simple linear regression model is fitted in each subregion. The model building process occurs in two steps: the forward stepwise algorithm and the backward stepwise algorithm. In the forward stepwise step the MARS algorithm constructs a large number of basis functions which over-fit the data. In the backward stepwise step basis functions are deleted in order of least contribution using the generalised cross validation (GCV) criterion [5]. The general MARS model may be written as
$$f(x) = \alpha_0 + \sum_{m=1}^{M} \alpha_m B_m(x),$$
where
$$B_m(x) = \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_{+}.$$
The GCV criterion is a measure of the goodness of fit which takes into account the residual error and the model complexity. In its simplest form the GCV criterion may be written as
$$\mathrm{GCV}(M) = \frac{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{f}_M(x_i) \right)^2}{\left( 1 - C(M)/N \right)^2},$$
where $y_i$ is the $i$th observed response, $N$ is the sample size, and $C(M)$ is the cost-penalty measure of a model containing $M$ basis functions. The numerator measures the lack of fit of the $M$ basis function model $\hat{f}_M(x_i)$ and the denominator represents the penalty for the model complexity $C(M)$. As in [9], the complexity cost function may be written as
$$C(M) = \operatorname{trace}\left( B (B^{T} B)^{-1} B^{T} \right) + 1,$$
where $B$ is the $M \times N$ data matrix of the $M$ (nonconstant) basis functions ($B_{ij} = B_i(x_j)$). The best model is one with the lowest GCV criterion value.
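The role of the GCV criterion in the backward step may be illustrated with a minimal Python sketch. It assumes the standard form of the criterion, namely the mean squared residual divided by $(1 - C(M)/N)^2$, and takes the complexity cost $C(M)$ as a number supplied by the caller rather than computing it from the basis matrix. The function name and the sample values are hypothetical.

```python
def gcv(actual, fitted, complexity_cost):
    """Mean squared residual inflated by a penalty for model complexity.

    actual, fitted  : equal-length sequences of observed and fitted responses
    complexity_cost : the cost-penalty C(M); must satisfy 0 <= C(M) < N
    """
    n = len(actual)
    if n == 0 or len(fitted) != n:
        raise ValueError("actual and fitted must be non-empty and of equal length")
    if not 0 <= complexity_cost < n:
        raise ValueError("complexity cost must lie in [0, N)")
    mean_sq_residual = sum((y - f) ** 2 for y, f in zip(actual, fitted)) / n
    penalty = (1.0 - complexity_cost / n) ** 2
    return mean_sq_residual / penalty


# Hypothetical values: every residual has magnitude 1, so the unpenalised
# mean squared residual is 1.0.
actual = [10.0, 12.0, 14.0, 16.0]
fitted = [11.0, 11.0, 15.0, 15.0]
print(gcv(actual, fitted, complexity_cost=0.0))
print(gcv(actual, fitted, complexity_cost=2.0))
```

With the same residuals, a larger complexity cost yields a larger GCV value, so a more complex model is only preferred if it reduces the lack of fit enough to offset the penalty.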
The three general MARS models used in this paper are given in (4)-(6), where $a_0$, $\omega_0$, $\beta_0$ and $c_1, \ldots, c_6$ are constants, and where the parameters have the meanings declared in §3.1.

Results and discussion
The forecast results obtained via the piecewise linear regression model and the MARS models are presented in this section.

Piecewise linear regression model
Three different piecewise linear functions for modelling the relationship between peak demand ($z_t$) and peak temperature ($x_{pt}$) were proposed in (1). The values of $t_w$ and $t_s$ are taken as 17.5 °C and 24 °C, respectively. These values were determined from a visual inspection of the graph in Figure 4. Piecewise linear regression models were fitted for various reference temperatures in the interval 17 °C to 24 °C, without any significant improvements in the results. The reference temperature ($T_{\mathrm{ref}}$) has been selected as 20.5 °C, as mentioned before. The resulting piecewise linear function is given in (8).

The MARS models
The values of the various parameters in the MARS models in (4)-(6) are estimated from the data in this section. The forecasting results obtained via the models are also interpreted.

Model 1
The model in (1) is a simple MARS model which was used to determine the reference temperature separating the winter periods from the summer periods of the DPD-temperature relationship. The DPD is the dependent variable in this model with the peak temperature as the regressor variable. The best MARS model achieved a GCV value of $8.66744 \times 10^{6}$ and the reference temperature was found to be 20.9 °C. The complete model may be written as
$$z_t = 27833.6 - 125.423 \max\{0,\, x_{pt} - 20.9\} + 384.209 \max\{0,\, 20.9 - x_{pt}\}.$$
A plot of the value of $z_t$ as a function of $x_{pt}$ may be found in Figure 5. If temperature decreases by a degree from 20.9 °C, the DPD increases by 384.209 MW. Similarly, an increase by one degree above 20.9 °C results in a DPD decrease of 125.423 MW. This shows that the DPD is more sensitive to low temperatures. This model was used to determine the number of heating degree days and also the number of cooling degree days.
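The interpretation of the two hinge terms may be checked numerically. The following Python sketch evaluates the fitted Model 1 with the coefficients and the knot reported above; the function name is hypothetical and the evaluation temperatures are chosen only for illustration.

```python
KNOT = 20.9  # reference temperature (°C) found by the MARS algorithm


def model1_dpd(peak_temp: float) -> float:
    """Fitted daily peak demand (MW) of Model 1 at a given peak temperature (°C)."""
    return (27833.6
            - 125.423 * max(0.0, peak_temp - KNOT)
            + 384.209 * max(0.0, KNOT - peak_temp))


base = model1_dpd(KNOT)
# One degree below and one degree above the knot.
print(model1_dpd(KNOT - 1.0) - base)
print(model1_dpd(KNOT + 1.0) - base)
```

The two printed differences correspond to the increase of 384.209 MW per degree below the knot and the decrease of 125.423 MW per degree above it.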

Model 2
Out of the 24 predictor variables, the MARS algorithm selected eight variables as the most important. These variables are shown in Table 2 in order of their importance. The coefficient of basis function 1 is negative, meaning that if the trend is above 2 375, electricity demand decreases at a rate of 1.74 MW and when it is below this knot, it decreases at a rate of 2.99 MW. The trend component shows that the DPD is increasing at a decreasing rate. The coefficient of basis function 3 is negative, implying that the DPD decreases when the peak temperature increases above the corresponding knot.
The piecewise linear GCV value was $3.82422 \times 10^{9}$. The value achieved by the model is displayed in the graphical plot of average daily energy sent out (ADESO) against average daily temperature (ADT), shown in Figure 6.
For this model the modelling space is divided into three subspaces, separated by two knots at 16 °C and 22 °C, as shown in Figure 7. The first subspace covers average daily temperatures less than or equal to 16 °C.

Evaluating the goodness of fit of the models
The root mean squared error (RMSE) was used to evaluate the goodness of fit achieved by the piecewise regression model and the MARS model for peak load demand forecasting in the out-of-sample predictions for the period 1 November to 14 December 2009. As mentioned, the training period was 1 January 2000 to 31 October 2009. The RMSE was calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( z_{at} - z_{ft} \right)^2},$$
where $n$ is the number of out-of-sample forecast data points and $z_{at} - z_{ft}$ represents the forecast errors. The terms $z_{at}$ and $z_{ft}$ are the actual DPD and its forecast, respectively. The goodness of fit results for the piecewise linear and MARS models are shown in Table 3. It may be seen from the table that the MARS models outperform the piecewise linear regression model convincingly.
Table 3: Goodness of fit results for the piecewise and MARS models.
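The RMSE computation may be sketched in Python as follows, assuming the usual definition as the square root of the mean squared forecast error. The function name and the demand values are hypothetical; they are not taken from the data used in this paper.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between actual values and their forecasts."""
    n = len(actual)
    if n == 0 or len(forecast) != n:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


# Hypothetical actual and forecast peak demands (MW); every error is +/- 300 MW.
actual_dpd = [30000.0, 31000.0, 29500.0, 30500.0]
forecast_dpd = [29700.0, 31300.0, 29200.0, 30800.0]
print(rmse(actual_dpd, forecast_dpd))
```

Because the errors are squared before averaging, large forecast errors on individual days weigh more heavily than they would under a mean absolute error.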
The forecasts using MARS model 2, approximate 95% prediction intervals and the actual DPD values for the first seven days of November 2009 are given in Table 4. The actual peak demand falls within the prediction interval for all seven days. The MARS model seems to be useful for making short-term forecasts of daily peak demand.

Conclusions
A MARS model was developed for predicting daily electricity peak demand and the performance of the model was compared to that of a piecewise linear regression model. There were 3 636 data points, spanning the period 1 January 2000 to 14 December 2009. Of these, 3 592 data points were used for developing the models, while the remaining 44 observations were reserved for validation purposes. The MARS model outperformed the piecewise linear regression model convincingly and is easy to explain to management. The model is capable of clustering together categories of variables that have similar effects on the dependent variable.
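The sample sizes quoted above follow directly from the stated dates, assuming one observation per calendar day with no gaps. The short Python sketch below reproduces the counts; the variable names are hypothetical.

```python
from datetime import date

start = date(2000, 1, 1)        # first day of the training period
train_end = date(2009, 10, 31)  # last day of the training period
test_end = date(2009, 12, 14)   # last day of the validation period

# Both end points are included, hence the "+ 1".
n_total = (test_end - start).days + 1
n_train = (train_end - start).days + 1
n_test = n_total - n_train

print(n_total, n_train, n_test)  # 3636 3592 44
```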
Future research may include a sensitivity analysis with respect to daily and seasonal peak

Figure 4: Scatter plot of daily peak demand against peak temperature (in °C).

In the general MARS model, $B_m(x)$ is a basis function, $\alpha_0$ and $\alpha_m$ are parameters, $M$ is the number of basis functions, $K_m$ is the number of knots, $s_{km}$ takes on values of either 1 or −1 indicating the right or left sense of the associated step function, $v(k, m)$ is the label of the independent variable and $t_{km}$ indicates the knot location. The MARS algorithm selects variables and values of those variables for knots of the hinge functions.
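The way these symbols combine into a basis function may be sketched in Python. The sketch assumes the first-order truncated linear (hinge) form, in which each factor is the positive part of $s_{km}(x_{v(k,m)} - t_{km})$ and a basis function is the product of its factors. The function name, the zero-based variable indexing and the sample values are hypothetical.

```python
def basis_function(x, factors):
    """Evaluate one MARS basis function at the predictor vector x.

    x       : sequence of predictor values
    factors : list of (sign, variable_index, knot) triples, one per knot,
              where sign is +1 (right hinge) or -1 (left hinge)
    """
    value = 1.0  # an empty product gives the constant basis function
    for sign, var_index, knot in factors:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        value *= max(0.0, sign * (x[var_index] - knot))
    return value


# Hypothetical predictor vector: x[0] is a temperature, x[1] a second variable.
x = [18.0, 5.0]
print(basis_function(x, [(1, 0, 16.0)]))                 # single right hinge at 16
print(basis_function(x, [(1, 0, 16.0), (-1, 1, 6.0)]))   # two-factor interaction
print(basis_function(x, [(1, 0, 20.0)]))                 # hinge inactive below its knot
```

A basis function with a single factor describes a main effect, while a product of factors on different variables describes an interaction that is non-zero only where all of its hinges are active.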

Figure 5: DPD as a function of temperature according to MARS model 1.

Figure 6: Scatter plot of average daily energy sent out against average daily temperature.
In the fitted piecewise linear function (8), the coefficient of $t$ is positive, showing a positive linear trend. The coefficient of the dummy variable $x_{1t}$ is negative, showing that if the peak temperature decreases by one degree from 17.5 °C, electricity demand increases by 232.8 MW. The coefficient of $x_{2t}$ in (1) is positive, showing that if the temperature increases by one degree from 24 °C, electricity demand increases by 21 MW. This shows that electricity demand is more sensitive to winter conditions than to summer conditions. All the coefficients of the dummy variables representing Friday, Saturday, Sunday, holiday, day before holiday and day after holiday are negative. This shows that there is a decrease in demand during these periods. Of the three days of the week, the largest decrease occurs on a Sunday. During holidays, demand for electricity decreases significantly compared to a day before and after a holiday. The smallest decrease is experienced on days after holidays.
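The temperature effects described above may be illustrated with a short Python sketch. It isolates only the temperature component of the piecewise linear model, using the joining temperatures of 17.5 °C and 24 °C and the reported changes of 232.8 MW and 21 MW per degree; the trend, the calendar dummy variables and the mean non-weather-sensitive demand are omitted. The function and constant names are hypothetical.

```python
T_W = 17.5            # winter joining temperature (°C)
T_S = 24.0            # summer joining temperature (°C)
WINTER_SLOPE = 232.8  # MW added per degree below T_W
SUMMER_SLOPE = 21.0   # MW added per degree above T_S


def temperature_effect(peak_temp: float) -> float:
    """Extra demand (MW) relative to the non-weather-sensitive level."""
    if peak_temp < T_W:
        return WINTER_SLOPE * (T_W - peak_temp)
    if peak_temp > T_S:
        return SUMMER_SLOPE * (peak_temp - T_S)
    return 0.0  # demand does not depend on temperature between T_W and T_S


for temp in (15.5, 20.0, 26.0):
    print(temp, temperature_effect(temp))
```

Two degrees below the winter joining temperature add roughly eleven times as much demand as two degrees above the summer joining temperature, which reflects the stronger winter sensitivity noted in the text.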

Table 2: Important predictor variables according to the MARS models, in order of their importance.
The piecewise linear GCV value was $9.02477 \times 10^{5}$.
If the peak temperature increases by 1 °C from 19.2 °C, the DPD decreases by 235.381 MW and if the peak temperature decreases by 1 °C below this knot, the DPD increases by 411 MW. The coefficient for basis function 5 is positive, meaning that if the day of the week is not a Sunday, then the DPD increases by 2598.11 MW, but if the day is Saturday, there is a decrease in the DPD of 2367.12 MW. The DPD increases by 2722.32 MW if day $t$ is not a holiday and increases by 500.434 MW if it is not a day before a holiday. There is one bivariate interaction between a day before and after a holiday. If day $t$ is not a day before or after a holiday, the DPD increases by 1280.82 MW. If day $t$ is not a Friday, there is an increase in DPD of 924.862 MW.
The third model is a MARS model for the Average Daily Energy Sent Out (ADESO) with average daily temperature (ADT) as the predictor variable. The model identifies the winter-sensitive, weather-neutral and summer-sensitive periods. The resulting model is
$$\mathrm{ADESO} = 564863 + 7332.94 \max\{0,\, 22 - \mathrm{ADT}\} + 3714.8 \max\{0,\, \mathrm{ADT} - 16\}.$$
If the temperature decreases by 1 °C below 16 °C (e.g. from 16 °C to 15 °C), electricity demand increases by 7333 MW, which is about a 1.2% increase. If the temperature decreases by 1 °C in the range 16 °C to 22 °C (e.g. from 22 °C to 21 °C), electricity demand increases by 3618 MW, which is about a 0.6% increase. If the temperature increases by 1 °C above 22 °C (e.g. from 22 °C to 23 °C), electricity demand increases by 3715 MW, which is about a 0.6% increase. The MARS plot is shown in Figure 7.
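The three regimes of the third model may be verified with a short Python sketch that evaluates the fitted equation ADESO = 564863 + 7332.94 max{0, 22 − ADT} + 3714.8 max{0, ADT − 16} at the temperatures used in the examples. The function name is hypothetical.

```python
def model3_adeso(adt: float) -> float:
    """Fitted average daily energy sent out at average daily temperature adt (°C)."""
    return (564863
            + 7332.94 * max(0.0, 22.0 - adt)
            + 3714.8 * max(0.0, adt - 16.0))


# Change in fitted demand for a one-degree move in each of the three subspaces.
below_16 = model3_adeso(15.0) - model3_adeso(16.0)   # cooling from 16 to 15
between = model3_adeso(21.0) - model3_adeso(22.0)    # cooling from 22 to 21
above_22 = model3_adeso(23.0) - model3_adeso(22.0)   # warming from 22 to 23

print(round(below_16, 2), round(between, 2), round(above_22, 2))
```

Between the two knots both hinge terms are active, so the net effect of a one-degree decrease is the difference of the two coefficients, 7332.94 − 3714.8 = 3618.14, which matches the rounded figure of 3618 MW quoted in the text.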