A note on flow-based formulations for solving resource constrained scheduling problems

The resource constrained scheduling problem involves the scheduling of a number of activities over time, where each activity consumes one or more resources per time period. For a feasible solution to exist, the total resource consumption per time period must not exceed the available resources. In addition, the order in which activities may be scheduled is determined by a precedence graph. In this paper, valid inequalities proposed for the resource flow-based formulation in previous studies are investigated to determine what effect they may have on computing times. It is shown empirically that improved computing times may be obtained if these valid inequalities are, in fact, omitted from the resource flow-based formulation. In addition, a heuristic is proposed for the generation of initial starting solutions and for estimating the extent of the scheduling horizon which, in turn, is required to calculate the latest starting times of activities. The computational results are based on well-known problem test instances as well as new randomly generated problem instances.


Introduction
A solution to the resource constrained scheduling problem (RCSP) prescribes starting times of activities such that the total resource usage by the activities per time period is within a pre-specified resource capacity. Furthermore, the activities have to be scheduled such that all of the precedence constraints are satisfied.
The use of mixed integer linear programming (MILP) as a modelling approach is well suited for the formulation of the RCSP due to the logical decision-making nature of the problem. Several different mathematical formulations may, however, exist for the same problem. These different formulations may be equivalent in terms of representing the feasible region and the objective function of the RCSP, but they may differ in the number of variables and constraints. This may, in turn, have an impact on the efficiency with which the underlying algorithm finds solutions to these models. In the literature, three main classes of RCSP formulations can be found, namely time-indexed formulations [10,12], resource flow-based formulations [3,4] and event-based formulations [9,16].
In a resource flow formulation, the resource consumption of activities is modelled as a network flow problem. That is, continuous variables represent the flow of a resource from one activity to the next. The start time of an activity is modelled as a continuous variable, while binary decision variables are required to fix the ordering of the activities. In this paper it is shown that the valid inequalities applied to the resource flow formulation in Koné et al. [9] may actually have a detrimental effect on computing times. Empirical results provided below show that improved computing times are obtained if these valid inequalities are, in fact, excluded from the RCSP formulation. In addition, a resource graph expansion (RGE) heuristic is proposed for the purpose of estimating the scheduling horizon which, in turn, is required to calculate the latest starting times of activities. The resulting solution from the RGE heuristic may also be applied as a starting solution of the overall RCSP.
In the following section a resource flow-based formulation of the RCSP is provided. The valid inequalities of interest are also identified. These inequalities may, however, be omitted from the formulation of the RCSP in order to speed up computing times. Details of the proposed RGE heuristic are presented in Section 3, and this is followed by a description of an iterative linear programming approach for calculating the earliest and latest allowable starting times of activities. Computational results are presented in Section 4 based on well-known RCSP problem instances and instances generated randomly. The paper closes in Section 5 with a brief summary and some ideas for future work.

A resource flow formulation of the RCSP
The earliest resource flow MILP formulation of the RCSP is due to Artigues et al. [3], who proposed a polynomial insertion algorithm for solving the RCSP. This formulation is, however, driven by an algorithmic approach and is not formulated for the purpose of solving it with an MILP solver. Koné et al. [9] were the first to provide numerical results for a resource flow RCSP formulation solved using an off-the-shelf, commercial MILP solver.
In order to facilitate a formulation of the RCSP, the following notation is required. Let R denote the set of resources and let A denote the index set of all activities. Furthermore, let d_i be the duration of activity i ∈ A, measured in days, and let v_ri be the quantity of resource r ∈ R being consumed by activity i ∈ A over its entire duration. Also, let E_i be the earliest start time and let L_i be the latest start time of activity i ∈ A. The earliest and latest start times of an activity are functions of a so-called precedence graph (of which the nodes represent the various activities and each directed edge a precedence relationship) and the planning horizon. An approach toward calculating E_i and L_i is provided in Section 3. Moreover, let P(i) ⊆ A denote the set of immediate predecessor activities of activity i ∈ A (that is, all incident predecessor activities according to the precedence graph). Finally, let S(i) ⊆ A denote the set of immediate successor activities of activity i ∈ A (that is, all incident successor activities according to the precedence graph), and let U_r be the upper limit on the quantity of resource r ∈ R that may be consumed per day.
In order to facilitate the formulation of resource flow constraints below, two artificial activities are introduced, both with a duration of zero. A source activity i+ is introduced with P(i+) = ∅ and S(i+) = {i ∈ A : P(i) = ∅}, and a sink activity i− is introduced with S(i−) = ∅ and P(i−) = {i ∈ A : S(i) = ∅}. Furthermore, for the source and sink activities, v_{r,i+} = U_r and v_{r,i−} = U_r, respectively, for all r ∈ R.
The primary decision variables are the starting times s_i ≥ 0 for each of the activities i ∈ A. In order to formulate the resource requirement constraints, resource flow variables f_rij ≥ 0 are introduced to denote the flow of a resource r ∈ R from activity i ∈ A to j ∈ A. Binary variables z_ij ∈ {0, 1}, called the linear ordering variables, are used to indicate the ordering of activities. That is, if z_ij = 1 it indicates that activity j is scheduled to start only after completion of activity i. Consequently, the linear ordering variables also indicate whether the transfer of a resource is permitted from activity i ∈ A to j ∈ A.
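To make the notation concrete, the following minimal sketch checks whether a given vector of starting times is feasible with respect to precedence and daily resource capacity. It is only an illustration and not part of the formulation: the function name `check_schedule`, the dictionary layout and the small example data are assumptions, and integer start days and strictly positive durations are assumed.

```python
from collections import defaultdict


def check_schedule(start, dur, total_use, cap, preds):
    """Return a list of violations (empty if the schedule is feasible).

    start[i]     : start day s_i of activity i (integer, assumed)
    dur[i]       : duration d_i in days (assumed > 0)
    total_use[i] : {resource r: v_ri}, quantity consumed over the whole duration
    cap[r]       : daily capacity U_r
    preds[i]     : immediate predecessors P(i)
    """
    violations = []
    # Precedence: an activity may start only once all predecessors are complete.
    for j, ps in preds.items():
        for i in ps:
            if start[j] < start[i] + dur[i]:
                violations.append(("precedence", i, j))
    # Resources: accumulate the daily requirement v_ri / d_i over each active day.
    usage = defaultdict(float)
    for i, s in start.items():
        for r, v in total_use[i].items():
            for day in range(s, s + dur[i]):
                usage[(r, day)] += v / dur[i]
    for (r, day), used in sorted(usage.items()):
        if used > cap[r] + 1e-9:
            violations.append(("capacity", r, day))
    return violations


# Hypothetical example data: three activities sharing one resource.
dur = {"a": 2, "b": 2, "c": 1}
total_use = {"a": {"crew": 4}, "b": {"crew": 4}, "c": {"crew": 3}}
cap = {"crew": 4}
preds = {"c": ["a"]}

feasible = check_schedule({"a": 0, "b": 0, "c": 2}, dur, total_use, cap, preds)
infeasible = check_schedule({"a": 0, "b": 1, "c": 1}, dur, total_use, cap, preds)
```

In the second schedule activity c starts before a has finished, and on day 1 the three activities together require 7 units against a capacity of 4, so both kinds of violation are reported.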
The objective of the resource flow RCSP is to minimise s_{i−} (1), subject to the constraint sets (2)-(8). The objective function (1) minimises the makespan of the schedule by minimising the starting time of the sink activity i−, while constraint set (2) is required to ensure feasibility in terms of activity precedence. According to constraint set (4), the flow of resources from activity i to j is permitted only if activity j is scheduled to start after the completion of activity i, that is, when z_ij = 1.
The resource requirements are imposed by constraint sets (5) and (6), stating that all the flow of resources into an activity (5) and all the flow of resources out of an activity (6) should match the daily resource requirement v_ri / d_i of an activity i, for any resource r ∈ R.
According to [9], constraint set (7), which is collectively referred to as directional constraints, ensures that resource flow is either in one direction or the other, or that activities i and j are being processed in parallel, i.e. z_ij = 0 and z_ji = 0. Constraints (8) are called transitivity constraints which, according to [2], are responsible for ensuring that there are no cycles in the permutations.
Constraint sets (7) and (8) are redundant valid inequalities [11]. No evidence of improvement in computing times is, however, provided in any of the computational results where these valid inequalities have been included in the flow-based RCSP formulation; see e.g.
The computational results reported in this paper show that constraint sets (7) and (8) may, in fact, have a detrimental effect on computing times for some problem instances when included in the formulation of resource flow RCSP models.
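For orientation, one way of writing a model that matches the verbal descriptions of (1)-(8) is sketched below. It is a reconstruction under assumptions rather than the exact formulation: the big-M form of (3), the constant M_ij (for instance L_i + d_i − E_j), the index sets, and the symbol u_ri for the daily requirement (u_ri = v_ri / d_i for i ∈ A, and u_ri = U_r for the source and sink) are all introduced here for illustration.

```latex
\begin{align}
\min\ & s_{i^-} \tag{1}\\
\text{s.t.}\ & s_j - s_i \ge d_i, && i \in P(j) \tag{2}\\
& s_j - s_i \ge d_i - M_{ij}\,(1 - z_{ij}), && i \ne j \tag{3}\\
& f_{rij} \le \min\{u_{ri}, u_{rj}\}\, z_{ij}, && r \in R,\ i \ne j \tag{4}\\
& \sum_{j \ne i} f_{rji} = u_{ri}, && r \in R,\ i \in A \cup \{i^-\} \tag{5}\\
& \sum_{j \ne i} f_{rij} = u_{ri}, && r \in R,\ i \in A \cup \{i^+\} \tag{6}\\
& z_{ij} + z_{ji} \le 1, && i < j \tag{7}\\
& z_{ij} + z_{jk} - z_{ik} \le 1, && i, j, k \text{ distinct} \tag{8}
\end{align}
```

Under this reading, (3) already forbids z_ij = z_ji = 1 whenever d_i + d_j > 0, since both orderings cannot hold at once, which is consistent with (7) being described as redundant.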

The resource graph expansion (RGE) heuristic
In addition to the precedence graph, implicitly defined by the predecessor and successor sets P(i) and S(i), a resource flow graph is implied by the resource flow variables f_rij and the linear ordering variables z_ij. The basic idea behind the newly proposed RGE heuristic is to incrementally add activities to the resource flow graph while successively generating partial solutions. Once a starting time for an activity i ∈ A has been calculated, the corresponding start-time variable s_i is fixed to this value in subsequent iterations. The initial RCSP formulation of the RGE heuristic comprises the constraint sets (2)-(4), which are formulated by taking the entire set of activities into account. The resource flow requirement constraint sets (5)-(6) are initially formulated for a subset of activities A′ ⊆ A, which includes only the source activity i+ and its set of immediate successors S(i+). That is, A′ = {i+} ∪ S(i+). Solving this relaxed version of the RCSP yields a solution that is feasible with respect to the subset of activities A′.
For the following iteration of the heuristic the start-time variables s_i, for all i ∈ A′, are fixed to the solutions s*_i obtained during the previous step. It should be noted, however, that the fixing of a variable s_i = s*_i is only allowed if the variables of its predecessors have already been fixed. Next, the subset of activities A′ is augmented with the successors of all of the activities in A′, that is, A′ = A′ ∪ ⋃_{i ∈ A′} S(i). The resource flow requirement constraint sets (5)-(6) are updated each time the subset A′ is augmented. A formal outline of the RGE heuristic is provided in Algorithm 1.
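The expansion and fixing steps described above can be sketched as follows. This is a skeleton under assumptions, not the paper's Algorithm 1: the MILP solve is replaced by a hypothetical callback `solve_relaxed(subset, fixed)`, the function names are illustrative, and the fixing rule is interpreted as "fix an activity once all of its immediate predecessors are fixed".

```python
def rge_subsets(source, succ):
    """Yield the successive activity subsets A' of the expansion.

    succ[i] is the set of immediate successors S(i); `source` is i+.
    """
    current = {source} | set(succ.get(source, ()))
    while True:
        yield set(current)
        expanded = set(current)
        for i in current:
            expanded |= set(succ.get(i, ()))
        if expanded == current:  # nothing new to add: all activities covered
            return
        current = expanded


def rge(source, succ, preds, solve_relaxed):
    """Fix-and-expand loop around a user-supplied relaxed solve.

    `solve_relaxed(subset, fixed)` is a hypothetical stand-in for the MILP
    solve: it returns start times for the activities in `subset` while
    respecting the values already stored in `fixed`.
    """
    fixed = {}
    for subset in rge_subsets(source, succ):
        solution = solve_relaxed(subset, fixed)
        changed = True
        while changed:  # repeat until no further activity can be fixed
            changed = False
            for i in subset:
                ready = all(p in fixed for p in preds.get(i, ()))
                if i not in fixed and ready:
                    fixed[i] = solution[i]
                    changed = True
    return fixed
```

For a precedence graph in which the source precedes a and b, a and b precede c, b precedes d, and c and d precede the sink, the generator produces three subsets of sizes 3, 5 and 6, so the relaxed problem is solved three times.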
The purpose of solving the RGE heuristic is two-fold. Firstly, it provides a starting solution for the RCSP which may result in a speed-up of the MILP solver, and secondly, it provides an estimate of the scheduling horizon which, in turn, is required to calculate the latest starting times of activities. Specifying MILP-specific stopping criteria provides several variations on the RGE heuristic. For instance, by specifying a gap limit when solving the relaxed RCSP problem during each iteration, a speed-up of the RGE heuristic may be achieved since the branch-and-bound process will be terminated once the current optimality gap is less than the gap limit. This may, of course, be to the detriment of the quality of the final solution obtained by the heuristic. On the other hand, this may result in the successful computation of feasible solutions within the overall time limit specified for solving the RCSP. The notation RGE(γ) is used in the remainder of this paper to refer to the RGE heuristic where a gap limit of γ is applied during each successive solution of the relaxed RCSP. A gap limit of γ = 0 implies that the relaxed RCSP is solved to optimality.
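As a small illustration of the stopping rule, the sketch below computes a relative optimality gap and tests it against γ. The gap definition used here (absolute difference between best bound and incumbent, divided by the absolute incumbent value) is an assumption based on the form commonly used by MILP solvers; the function names are hypothetical.

```python
def relative_gap(incumbent, best_bound, eps=1e-10):
    """Relative gap between the incumbent objective and the best bound."""
    return abs(best_bound - incumbent) / (eps + abs(incumbent))


def should_stop(incumbent, best_bound, gamma):
    """True once an incumbent exists and its gap is within the limit gamma."""
    if incumbent is None:
        return False
    return relative_gap(incumbent, best_bound) <= gamma
```

For a makespan incumbent of 100 days and a lower bound of 60 days the gap is 40%, so the search would stop under γ = 50% but continue under γ = 0.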
Recall, from the above discussion, that E_i and L_i are the earliest and latest start times of an activity i ∈ A, respectively. Conceptually, the approach toward determining E_i involves solving an optimisation problem in which the objective is to minimise the start time s_i of activity i, subject to the precedence constraints of the RCSP. Similarly, an optimisation problem that maximises the start time s_i of an activity i is solved to determine L_i. It should be noted, however, that some upper bound, say T, is required on s_i in order to prevent an unbounded solution in the case of solving the maximisation problem. An estimate of T is provided by the solution of the RGE heuristic.
The optimisation problem for determining E_i and L_i involves minimising / maximising s_i (9) subject to the precedence constraints and the upper bound T, for each activity i ∈ A.
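When only precedence constraints and a horizon bound are present, the minimisation and maximisation problems above have the same solutions as a forward and a backward longest-path pass over the precedence graph. The sketch below uses that pass in place of the linear programmes, which is a substitution made here for illustration. It assumes that the bound T applies to completion times (s_i + d_i ≤ T); the function name and example data are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_time_windows(dur, preds, horizon):
    """Return (earliest, latest) start times per activity.

    dur[i]   : duration d_i
    preds[i] : immediate predecessors P(i)
    horizon  : upper bound T on completion times (assumed interpretation)
    """
    graph = {i: set(preds.get(i, ())) for i in dur}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: an activity starts once its latest predecessor finishes.
    earliest = {}
    for j in order:
        earliest[j] = max((earliest[i] + dur[i] for i in graph[j]), default=0)

    # Successor sets S(i), derived from the predecessor sets.
    succ = {i: set() for i in dur}
    for j, ps in graph.items():
        for i in ps:
            succ[i].add(j)

    # Backward pass: an activity must finish before its earliest-due successor.
    latest = {}
    for i in reversed(order):
        finish_by = min((latest[j] for j in succ[i]), default=horizon)
        latest[i] = finish_by - dur[i]
    return earliest, latest


# Hypothetical example: a and b precede c, which precedes d; T = 10 days.
E, L = start_time_windows(
    {"a": 3, "b": 2, "c": 4, "d": 1},
    {"c": {"a", "b"}, "d": {"c"}},
    horizon=10,
)
```

Here the longest chain a, c, d takes 8 days, leaving 2 days of slack within the horizon, which is why every latest start time on that chain exceeds its earliest start time by 2.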

Computational results
All of the empirical tests reported in this section were performed on an HP Compaq Elite 8300, with eight cores and 32GB of RAM. SuSE Linux was used as operating system and the IBM product, CPLEX v12.6 [7], was used as MILP solver.
Several data sets for the RCSP and its variants are available in the research community for the purpose of testing algorithmic ideas. For instance, the project scheduling problem library (PSPLIB) [13] is a repository of RCSP problem instances which has been referenced extensively over the years. The PSPLIB comprises the data sets J30, J60, J90 and J120, which are sets of RCSP instances with respectively 30, 60, 90 and 120 activities. Each data set consists of 480 different problem instances, except for the J120 data set which has 600 problem instances. Details on how these problem instances were created can be found in [8]. Other well-known data sets are the 39 problem instances of Baptiste and Le Pape [5], henceforth referred to as the BL instances, and the Pack instances by Carlier and Néron [6]. In order to test the efficiency of event-based formulations, Koné et al. [9] created the problem instances KSD15 d and Pack d, which are based on the J30 and Pack instances, respectively. These newly created instances are characterised by activities having longer durations.
The main objective in this section is to reproduce some of the results reported by Koné et al. [9] and to evaluate the effect that the valid inequalities (7) and (8) have on computing times. For this purpose the same data sets used in Koné et al. [9] are considered in this paper, with the addition of the J60 data set. Further data sets were generated randomly using the software RanGen2 [14]. Details on the design of RanGen2 can be found in [15]. The major benefit of using RanGen2 is that it allows for the specification of several input parameters which influence the properties of the randomly generated problem instances. For instance, one of the parameters in RanGen2, called I_2, is used to specify the level of serialisation that the resulting precedence graph of the generated problem instance should possess. More specifically, if the value I_2 = 1 is specified by the user, a random problem instance is created for which all the activities are serial according to the precedence graph.
On the other hand, if I_2 = 0, all the activities are in parallel. For the purpose of this study, instances containing 50 or 100 activities were generated randomly using RanGen2. The problem instances in the data sets RG50 L and RG100 L were generated by specifying a low degree of serialisation, that is I_2 = 0.1, while the problem instances in RG50 H and RG100 H were generated by specifying a high degree of serialisation, that is I_2 = 0.5. The data sets RG50 L, RG50 H, RG100 L and RG100 H each contain 50 problem instances.
An important collective contribution by the research community has been the characterisation of problem instances according to various indicators. Some of the indicators used to distinguish between "easy" and "hard" instances are briefly described below.
Order strength (OS) is a measure of parallelism of the underlying precedence graph. That is, a problem instance for which OS = 0 indicates that all activities are in parallel, whereas OS = 1 indicates that all activities are ordered in series. The hardness of problem instances increases with a decrease in OS.
Network complexity (NC) is the average number of incident arcs per node in the precedence graph. Higher levels of NC are associated with harder problem instances.
Resource factor (RF) measures the average number of resources required per activity. It has been observed empirically that the hardness of problem instances increases with an increase in RF.
Resource strength (RS) is a measure which combines resource requirements per activity and peak resource demand due to a precedence feasible schedule based on the earliest start times of activities. Problem instances for which RS is close to zero are considered much harder than problem instances for which RS is close to one.
Disjunction ratio (DR) provides an indication of how many activities may be scheduled in parallel by taking resource requirements and precedence relations into account. Highly disjunctive problem instances are considered to be easier than cumulative instances that have a lower disjunction ratio.
Table 1 contains a summary of the statistics for the above indicators calculated for all of the problem instances considered in this paper. The hardest set of instances, according to the DR indicator, are the BL instances followed by the Pack d, J60, Pack, KSD15 d and J30 instances. The DR values for the randomly generated data sets RG50 L, RG50 H, RG100 L and RG100 H are much higher. Although it may appear that all of these instances are easy, it should be noted that none of the above indicators take the number of activities into account. Furthermore, the RG50 L and RG100 L data sets, which were generated according to a low degree of serialisation, exhibit relatively low OS values, which may suggest a higher degree of difficulty.
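As an illustration of how such an indicator may be computed, the sketch below evaluates the order strength under the commonly used definition, assumed here: the number of precedence-related activity pairs (including transitive relations) divided by the n(n − 1)/2 possible pairs. The function name is hypothetical, and the artificial source and sink are assumed to be excluded from the activity list.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def order_strength(activities, preds):
    """Order strength of a precedence graph (assumed standard definition).

    activities : iterable of activity identifiers
    preds[i]   : immediate predecessors P(i)
    """
    graph = {i: set(preds.get(i, ())) for i in activities}
    # Collect all direct and transitive predecessors of every activity.
    ancestors = {}
    for j in TopologicalSorter(graph).static_order():
        found = set(graph[j])
        for i in graph[j]:
            found |= ancestors[i]
        ancestors[j] = found
    n = len(graph)
    if n < 2:
        return 0.0
    related_pairs = sum(len(found) for found in ancestors.values())
    return related_pairs / (n * (n - 1) / 2)


serial = order_strength("abcd", {"b": {"a"}, "c": {"b"}, "d": {"c"}})
parallel = order_strength("abcd", {})
diamond = order_strength("abcd", {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}})
```

The fully serial chain gives OS = 1 and the fully parallel case gives OS = 0, in line with the description above; the diamond-shaped graph lies in between with five related pairs out of six.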
For the purpose of reporting the computational results, the resource flow formulation of the RCSP given by (1)-(8) is denoted by RF. The abbreviation RFX is used to refer to the resource flow formulation that excludes the valid inequalities, that is, the formulation given by (1)-(6). In order to measure the effect of employing the RGE heuristic, the notation RFX+RGE(γ) and RF+RGE(γ) is used to refer to the combination of the RFX formulation and the RF formulation with the use of the RGE heuristic, respectively. Gap limits of γ = 0%, γ = 50% and γ = 100% are considered.
The first set of results is provided in Table 2 and is for the same problem instances that were considered in [9]. The results for the other instances follow later. For the J30 data set, both the RFX and the RF formulations were successful in solving 75% of the problem instances to optimality.
... and the KSD15 problem instances, and a marginal improvement in the average gap was obtained for the Pack d instances.
Table 3: The effect of directional and transitivity inequalities (7)-(8) on computing times for the remaining problem instances.
The main conclusion drawn from the first set of results provided in Table 2 is that there is merit in excluding the directional and transitivity inequalities (7)-(8) from the resource flow-based RCSP formulation. This may have an effect on the conclusions made by, for instance, Koné et al. [9] with respect to the success of event-based formulations over resource-flow formulations which include the directional and transitivity inequalities (7)-(8). The results that follow for the remaining problem instances considered in this paper are even more convincing in this regard.
The significance of excluding the directional and transitivity inequalities (7)-(8) from the RCSP problem formulation is clearly demonstrated by the results for the J60 data set in Table 3. Adopting the RFX formulation, a total of 67% of the J60 problem instances were solved to optimality, compared to 23% for the RF formulation. Further improvements were achieved by employing RFX+RGE(50%), and an average gap of 9.4% was achieved by generating feasible solutions to all of the problem instances, compared to 9.7% in the case of the plain RFX formulation.
Although the RFX formulation is outperformed by the RF formulation in the case of the "easier" RG50 H and RG100 H problem instances, very promising results are obtained for the "harder" RG50 L and RG100 L problem instances. The RFX allowed for the solution of 46% of the RG50 L problem instances, whereas the RF could not facilitate the solution of any of the problem instances to optimality. Improvements were once again achieved through RFX+RGE(50%) and an average gap of 5.2% was obtained by generating feasible solutions to all of the problem instances, compared to 6.9% in the case of the plain RFX formulation. Although none of the RG100 L problem instances could be solved to optimality, the benefits of applying RFX are still clearly visible by considering that an average gap of 32.6% was obtained for all of the cases for which at least one feasible solution could be computed. Feasible solutions could be computed for only 96% of the RG100 L problem instances according to the RF formulation.
Recall that the distinction between "easy" and "hard" instances for the RG50 and RG100 data sets is based only on the level of OS. That is, the easier problem instances RG50 H and RG100 H are associated with higher OS levels, while the harder instances RG50 L and RG100 L are associated with lower OS levels. From the above results it is therefore reasonable to assume that improvements in computing times may be expected for problem instances characterised by low OS values if directional and transitivity inequalities (7)-(8) are excluded. As indicated in Table 1, all of the RG50 and RG100 instances are, however, considered easy when measured according to the DR tractability indicator. The question raised here is whether the RFX is effective for problem instances characterised by a low DR value. The results of Table 2 may hint at the contrary considering that the RF outperformed the RFX in the case of the BL data set, which is considered to be the hardest since it has the lowest average DR value. In order to explore this further, attention is drawn to the positive results obtained for the J60 problem instances. Although the average DR value for the J60 data set is higher than that of the BL data set (0.4 vs. 0.34), the J60 problem instances contain more activities and they have a higher average RS value (0.6 vs. 0.34). In this respect the J60 data set may be considered to be harder than the BL data set. In order to analyse the effectiveness of the RFX with respect to ...
Positive results are reported in Table 4 for the J60 L and J60 H problem instances. By making use of the RFX formulation, which excludes the directional and transitivity inequalities (7)-(8), 73% of the total number of J60 L instances are solved to optimality. For the RF formulation, which includes the directional and transitivity inequalities (7)-(8), only 44% of the instances are solved to optimality.
It is also encouraging to note that the application of RFX+RGE(50%) improves the average gap, managing to generate feasible solutions for all of the problem instances with an average gap of 6.2% vs. the 6.8% gap achieved according to the plain RFX formulation without the heuristic. Results for the J60 H data set are also positive, showing that 49% of the instances are solved to optimality according to the RFX vs. 30% when applying RF.
As a final analysis, Table 5 provides the average performance measures for the RFX and heuristic combinations over all of the problem instances. At first glance the use of the RGE heuristic does not appear to be beneficial since the RFX formulation without the heuristic achieved the highest percentage of instances solved to optimality. It should be noted, however, that these are averages over all of the problem instances. On closer inspection it is observed that the application of the heuristic is especially useful when considering harder problem instances. For instance, the application of RFX+RGE(50%) resulted in improved average gaps for the harder instances, such as J60, RG50 L and RG100 L, but not necessarily for the easier problem instances. Table 5 also indicates that, on average, γ = 50% may be a good parameter choice since RFX+RGE(50%) solved the most instances to optimality, compared to other choices of γ.

Summary and conclusion
The primary concern of this paper has been an investigation into how directional and transitivity valid inequalities may influence computing times when included in the formulation of the flow-based RCSP. The trend observed from the computational results shows that improved computing times may be expected when excluding these valid inequalities, especially when considering "harder" problem instances. This warrants a re-evaluation of the results presented by others which suggested that event-based RCSP formulations may perform better than resource flow-based formulations.
As a secondary contribution, a heuristic was proposed for the purpose of generating initial feasible solutions and estimating the scheduling horizon necessary for the computation of the latest start dates of the activities. Positive results were reported specifically for harder problem instances.