Australian National Electricity Market Model - Version 1.10

This working paper provides details of the Australian National Electricity Market (ANEM) model version 1.10 used in the research project titled: An investigation of the impacts of increased power supply to the national grid by wind generators on the Australian electricity industry. The paper provides a comprehensive reference on the ANEM model for the other project publications that use it to analyse the sensitivity of four factors to increasing wind power penetration: (1) transmission line congestion, (2) wholesale spot prices, (3) carbon dioxide emissions and (4) energy dispatch. The sensitivity of these four factors to wind power penetration is considered in conjunction with sensitivity to weather conditions, electricity demand growth and a major augmentation of the transmission grid of the Australian National Electricity Market (NEM) called NEMLink (AEMO 2010a, 2010b, 2011a, 2011b). The sensitivity analyses use five levels of wind power penetration, ranging from zero wind power to enough wind power to meet the original 41 TWh 2020 Large-scale Renewable Energy Target. The sensitivity to weather is developed by using half-hourly electricity demand profiles by node from three calendar years: 2010, 2011 and 2012. The sensitivity to demand growth is developed by incrementing the nodal demand profiles over the projection years 2014 to 2025.


Introduction
This working paper's primary aim is to provide a detailed reference to the Australian National Electricity Market (ANEM) model version 1.10 for other publications developed during the research project titled: An investigation of the impacts of increased power supply to the national grid by wind generators on the Australian electricity industry.
Section 2 provides diagrams of the generation and load serving entity nodes and the transmission lines that the ANEM model uses. The model has 52 nodes and 68 transmission lines; this level of spatial detail makes the ANEM model more realistic than many other models of the NEM, which are highly aggregated.
Section 3 describes the ANEM model in detail and provides additional information on the assumptions made about the change in the generation fleet in the NEM during the lifetime of the research project.

Australian National Electricity Market Network
This section provides network diagrams of the nodes discussed in this report. These nodes are also referred to as load serving entities or demand regions, although two of the nodes are supply-only nodes without associated demand. Figure 1 shows the interconnectors between the states, giving an overview of the more detailed state network diagrams in the following figures. In Figures 2 to 6 below, if the node number and demand region number are the same, we place just one number on the node. If they differ, we place both numbers on the node in the form (node number, demand region number); for instance, (10, 11) appears on the node at North Morton. The red transmission lines indicate those lines whose capacity is increased in our NEMLink project report. NEMLink is a conceptual major augmentation of the transmission grid outlined in the National Transmission Network Development Plan (AEMO 2010a, 2010b, 2011a, 2011b). Table 2 and Table 3 compare the thermal capacity of these lines before and after augmentation for NEMLink.
Electronic copy available at: https://ssrn.com/abstract=2724867
Table 3 shows two transmission lines augmented for NEMLink within the ANEM model, which were outside the scope of the original NEMLink proposal. We added these two lines to improve the linkage between the high-capacity backbone of NEMLink and the high-demand regions of Sydney and Melbourne.

Australian National Electricity Market Model
This section discusses the Australian National Electricity Market (ANEM) model. The ARC Linkage project uses the ANEM model to study how increased penetration of wind generation in the NEM affects wholesale spot prices, generator dispatch patterns, transmission branch congestion and carbon emissions from electricity generation.
The ANEM model uses the node and transmission line topology presented in Section 2. ANEM is an agent-based model whose agents include demand-side and supply-side participants as well as a network operator. The nodes and transmission lines shown in Section 2 constrain the behaviour of these agents. The following sections outline the ANEM model and present the principal features of its agents. We discuss the algorithm ANEM uses to calculate generation production levels, wholesale prices and power flows on transmission lines, and we also discuss practical implementation considerations.

Outline of the ANEM model
The methodology underpinning the ANEM model involves the operation of wholesale power markets by an Independent System Operator (ISO) using Locational Marginal Pricing (LMP), which prices energy according to the location of its injection into, or withdrawal from, the transmission grid. ANEM is a modified and extended version of the American Agent-Based Modelling of Electricity Systems (AMES) model developed by Sun and Tesfatsion (2007a, 2007b) and uses the emerging computational tools of Agent-based Computational Economics (ACE). This type of modelling combines a realistic representation of the network structure with high-frequency behavioural interactions, made possible by the availability of powerful computing resources. The important differences between the institutional structures of the Australian and US wholesale electricity markets are also fully reflected in the modelling and are outlined more fully in Wild, Bell and Foster (2012, Sec. 1).
Understanding the impacts of increased penetration of wind generation in the NEM requires a realistic model containing many of the salient features of the NEM. These features include realistic transmission network pathways; competitive dispatch of all generation technologies, with prices determined by variable costs and branch congestion; and intra-regional and inter-state trade.
In the ANEM model, we use a Direct Current Optimal Power Flow (DC OPF) algorithm to determine optimal dispatch of generation plant, power flows on transmission branches and wholesale prices. This framework accommodates many of the features mentioned above including: intra-state and inter-state power flows; regional location of generators and load centres; demand bid information and the following unit commitment features:

Principal features of the ANEM model
The ANEM model is programmed in Java using Repast (2014), a Java-based toolkit designed specifically for agent-based modelling in the social sciences. The core elements of the model are:
• The wholesale power market includes an ISO and energy traders, comprising demand-side agents called Load-Serving Entities (LSEs) and generators distributed across the nodes of the transmission grid.
• The transmission grid is an alternating current (AC) grid modelled as a balanced three-phase network.
• The ANEM wholesale power market operates in half-hour increments.
• The ANEM model's ISO undertakes daily operation of the transmission grid within a single settlement system, which consists of a real-time market settled using LMP.
• For each half-hour of the day, the ANEM model's ISO determines power commitments and LMPs for the spot market based on generators' supply offers and LSEs' demand bids; these commitments and LMPs are used to settle financially binding contracts.
• The inclusion of congestion components in the LMP helps price and manage transmission grid congestion.

Transmission grid characteristics in the ANEM model
The transmission grid utilised in the ANEM model is an AC grid modelled as a balanced three-phase network. In common with the design features outlined in Sun and Tesfatsion (2007a), we make the following additional assumptions:
• The reactance on each branch is assumed to be a total branch reactance, meaning that branch length has been taken into account in determining reactance values;
• All transformer phase angle shifts are assumed to be 0;
• All transformer tap ratios are assumed to be 1; and
• All line-charging capacitances are assumed to be 0.
To implement the DC OPF framework used in the ANEM model, two additional electrical concepts are required: base apparent power, measured in three-phase megavolt-amperes (MVA), and base voltage, measured in line-to-line kilovolts (kV). We use these quantities to derive the conventional per unit (PU) normalisations used in the DC OPF solution and to convert between International System of Units (SI) and PU conventions.
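The PU normalisation can be illustrated with the standard relations Z_base = V_base² / S_base (kV and MVA give ohms), X_pu = X_ohm / Z_base and P_pu = P_MW / S_base. The sketch below assumes the 100 MVA base apparent power adopted later in this section and, purely for illustration, a hypothetical 330 kV base voltage and 10-ohm branch; the function names are hypothetical and not taken from ANEM's Java code.

```python
"""Per-unit (PU) conversions for a DC OPF model (illustrative sketch)."""

S_BASE_MVA = 100.0  # base apparent power (three-phase MVA)


def base_impedance_ohm(v_base_kv: float, s_base_mva: float = S_BASE_MVA) -> float:
    """Base impedance Z_base = V_base^2 / S_base; kV and MVA give ohms."""
    return v_base_kv ** 2 / s_base_mva


def reactance_to_pu(x_ohm: float, v_base_kv: float, s_base_mva: float = S_BASE_MVA) -> float:
    """Convert a branch reactance from ohms (SI) to per-unit."""
    return x_ohm / base_impedance_ohm(v_base_kv, s_base_mva)


def mw_to_pu(p_mw: float, s_base_mva: float = S_BASE_MVA) -> float:
    """Convert real power in MW to per-unit on the MVA base."""
    return p_mw / s_base_mva


def pu_to_mw(p_pu: float, s_base_mva: float = S_BASE_MVA) -> float:
    """Convert per-unit real power back to MW."""
    return p_pu * s_base_mva


if __name__ == "__main__":
    # Hypothetical 10-ohm branch on an assumed 330 kV base voltage.
    z_base = base_impedance_ohm(330.0)      # 1089.0 ohms
    x_pu = reactance_to_pu(10.0, 330.0)     # about 0.00918 pu
    print(f"Z_base = {z_base:.1f} ohm, X = {x_pu:.5f} pu")
    print(f"500 MW = {mw_to_pu(500.0):.2f} pu")
```

Keeping the base values as default arguments mirrors the fact that a single system-wide base apparent power is used, while base voltage varies with the voltage level of each part of the grid.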
The ANEM model views the transmission grid as a commercial network consisting of pricing locations for the purchase and sale of electrical power. The pricing locations, at which market transactions are settled using publicly available LMPs, coincide with the transmission grid nodes. Because the ANEM model uses the DC OPF framework, it models the High Voltage DC (HVDC) interconnectors DirectLink, MurrayLink and BassLink as 'quasi AC' links whose power flows are determined from reactance and thermal MW rating values.
The major power flow pathways in the model reflect the major transmission pathways associated with the 275, 500/330, 500/330/220, 275 and 220 kV transmission branches in QLD, NSW, VIC, SA and TAS, respectively. Key transmission data required for the model's transmission grid are an assumed base voltage value, the base apparent power, branch connection and direction-of-flow information, the maximum thermal rating of each transmission branch (in MW) and an estimate of its reactance (in ohms). Base apparent power is set to 100 MVA, an internationally recognised value. Thermal ratings of transmission lines were constructed from data contained in AEMO (2013b), using the detailed grid diagrams in AEMO (2013a) to identify the transmission infrastructure relevant to the grid structure used in the ANEM model. We obtained reactance and load flow data from AEMO on a confidential basis.
AEMO defines the thermal rating of equipment, including transmission lines, in terms of MVA. We convert these values to MW assuming a power factor of unity, so ANEM's MW values equal the MVA values in the source AEMO data files. We also use information in the AEMO equipment ratings files to accommodate differences in maximum thermal ratings between summer and winter. Typically, the maximum MW thermal capacity rating of a transmission line is greater in winter than in summer because lower ambient temperatures are more common in winter than in summer, so ANEM uses different thermal MW capacity values in summer and winter. We also assume that the alloy of the transmission lines determines their reactance and that reactance is unaffected by temperature. These assumptions permit the use of a constant reactance value on each branch.
In Section 2, we define the direction of flow on a transmission branch (e.g. line) connecting two nodes as 'positive' if the power flows from the lower numbered node to the higher numbered node. For example, for line 1 connecting Far North QLD (node 1) and the Ross node (node 2), power flowing from Far North QLD to Ross on line 1 would have a positive sign, while power flowing on line 1 from Ross to Far North QLD would have a negative sign. The latter type of power flow is termed 'reverse' direction flow. In the ANEM model, it is possible to accommodate power flows in the positive and reverse direction having different thermal limits and different capacities for summer and winter.
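The sketch below shows how the sign convention and the direction-specific seasonal ratings fit together: a flow is positive from the lower-numbered to the higher-numbered node, and its magnitude is checked against the 'positive' or 'reverse' rating for the season, with MVA ratings converted to MW at unity power factor as described above. The class, field names and numeric ratings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Branch:
    """Hypothetical branch record with direction- and season-specific ratings."""
    from_node: int                             # lower-numbered node
    to_node: int                               # higher-numbered node
    rating_mva: Dict[Tuple[str, str], float]   # {(season, "positive"/"reverse"): MVA}

    def limit_mw(self, season: str, direction: str, power_factor: float = 1.0) -> float:
        # MW = MVA x power factor; at unity power factor the two are equal.
        return self.rating_mva[(season, direction)] * power_factor


def signed_flow(sending_node: int, receiving_node: int, mw: float) -> float:
    """Apply the sign convention: positive when power flows lower -> higher node."""
    return mw if sending_node < receiving_node else -mw


def within_limit(branch: Branch, flow_mw: float, season: str) -> bool:
    """Check a signed flow against the seasonal positive or reverse thermal limit."""
    if flow_mw >= 0:
        return flow_mw <= branch.limit_mw(season, "positive")
    return -flow_mw <= branch.limit_mw(season, "reverse")


if __name__ == "__main__":
    # Line 1: Far North QLD (node 1) to Ross (node 2); ratings are hypothetical.
    line1 = Branch(1, 2, {("summer", "positive"): 400.0, ("summer", "reverse"): 350.0,
                          ("winter", "positive"): 460.0, ("winter", "reverse"): 400.0})
    f = signed_flow(2, 1, 380.0)                   # Ross -> Far North QLD
    print(f, within_limit(line1, f, "summer"))     # -380.0 False
    print(f, within_limit(line1, f, "winter"))     # -380.0 True
```

The same 380 MW reverse flow breaches the summer limit but not the winter one, which is the kind of seasonal difference the ratings files are used to capture.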

Demand-side agents in the ANEM model: LSEs
An LSE is an electric utility that has an obligation to provide electrical power to end-use consumers (residential, commercial or industrial). The LSE agents purchase bulk power in the wholesale power market each day to service customer demand (called load) in the downstream retail market, thereby linking the wholesale power market and retail market. We assume that the downstream retail demands serviced by the LSEs exhibit negligible price sensitivity, so that they reduce to daily load profiles representing the real power demand (in MW) that the LSE must service in its downstream retail market for each half-hour of the day. LSEs are also modelled as passive entities that submit daily load profiles to the ISO without strategic considerations (Sun & Tesfatsion 2007b).
The revenue received by LSEs for servicing these load obligations is regulated as a simple 'dollar mark-up' based retail tariff. For example, in QLD, the state government regulates the retail tariffs payable by most residential customers. Prior to July 2009, this tariff amounted to 14.4 c/kWh exclusive of GST, which translates into a retail tariff of $144/MWh. Thus, in the current set-up, we assume that LSEs have no incentive to submit price-sensitive demand bids into the market.
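The tariff conversion is a pure change of units, using 100 c = $1 and 1 MWh = 1000 kWh:

$$14.4\ \frac{\text{c}}{\text{kWh}} \times \frac{\$1}{100\ \text{c}} \times \frac{1000\ \text{kWh}}{1\ \text{MWh}} = \frac{\$144}{\text{MWh}}$$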
ANEM requires half-hourly regional load data. We derived this load data for QLD and NSW using regional load traces supplied by Powerlink and Transgrid. This data was then rebased to the state load totals published by AEMO (2014a) for the 'QLD1' and 'NSW1' markets. For the other three states, the regional shares were determined from terminal station load forecasts associated with summer peak demand (and winter peak demand, if available) contained in the annual planning reports published by the transmission companies Transend (TAS), Vencorp (VIC) and ElectraNet (SA). These regional load shares were then interpolated to a monthly based time series using a cubic spline technique and these time series of monthly shares were then multiplied by the 'TAS1', 'VIC1' and 'SA1' state load time series published by AEMO (2014a) in order to derive the regional load profiles for TAS, VIC and SA.
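A minimal sketch of the share-interpolation step follows. It assumes a natural cubic spline (the text does not say which spline variant was used), month indices as the interpolation variable, and hypothetical knot months and share values; the function names are hypothetical. Because spline interpolation is linear in the data, and a natural spline through constant data is that constant, regional shares that sum to one at every knot also sum to one at every interpolated month, so the regional profiles add back up to the state load.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (xs, ys); xs strictly increasing, len >= 2."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends: m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for j in range(1, n - 1):            # forward sweep
            w = sub[j] / diag[j - 1]
            diag[j] -= w * sup[j - 1]
            rhs[j] -= w * rhs[j - 1]
        sol = [0.0] * (n - 1)                # back substitution
        sol[-1] = rhs[-1] / diag[-1]
        for j in range(n - 3, -1, -1):
            sol[j] = (rhs[j] - sup[j] * sol[j + 1]) / diag[j]
        m[1:n] = sol

    def evaluate(x: float) -> float:
        # Find the segment containing x (the end segments extrapolate).
        i = 0
        while i < n - 1 and x > xs[i + 1]:
            i += 1
        t1, t2 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t1 ** 3 + m[i + 1] * t2 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t1
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t2)

    return evaluate


def regional_load_profiles(knot_months: Sequence[float],
                           shares_by_region: Dict[str, Sequence[float]],
                           state_load: List[Tuple[int, float]]) -> Dict[str, List[float]]:
    """Split a state load series into regional series using interpolated monthly shares.

    state_load holds (month_index, state_mw) pairs, e.g. one per half-hour.
    """
    splines = {r: natural_cubic_spline(knot_months, s) for r, s in shares_by_region.items()}
    return {r: [f(month) * mw for month, mw in state_load] for r, f in splines.items()}


if __name__ == "__main__":
    knots = [0, 6, 12, 18]                         # hypothetical months with known shares
    shares = {"A": [0.60, 0.55, 0.62, 0.57], "B": [0.40, 0.45, 0.38, 0.43]}
    load = [(0, 1000.0), (3, 900.0), (9, 950.0)]   # hypothetical (month, MW) observations
    out = regional_load_profiles(knots, shares, load)
    print([round(a + b, 6) for a, b in zip(out["A"], out["B"])])  # [1000.0, 900.0, 950.0]
```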
Additionally, the demand concept underpinning the state totals published by AEMO and used in the modelling is a net demand concept, related conceptually to the output of scheduled and semi-scheduled generation, transmission losses and large independent loads directly connected to the transmission grid. This demand concept is termed 'scheduled demand' (AEMO 2012b) and is elsewhere termed 'total' demand in this report. As such, this net demand can be viewed as being calculated from gross demand after the contributions from small-scale solar PV and from both small-scale and large-scale non-scheduled generation (including wind, hydro and bagasse generation) have been netted out.
The actual demand concept employed in the modelling is a grossed-up form of scheduled demand, which we obtained by adding the output of large-scale non-scheduled generation to the scheduled demand data. We obtained five-minute non-scheduled generation output data for the period 2007 to 2013 from AEMO and averaged each block of six consecutive five-minute intervals to obtain half-hourly output traces. We then summed these traces across all non-scheduled generators located within a node and added the result to the nodal scheduled demand to determine the augmented nodal demand used in the modelling. Therefore, the demand concept employed in the modelling equates to the sum of the output of scheduled and semi-scheduled generation, non-scheduled generation, transmission losses and large independent loads directly connected to the transmission grid. It does not include the contributions from small-scale solar PV and WTG and, as such, still represents a net demand concept.
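The aggregation steps just described can be sketched as follows: average each block of six five-minute non-scheduled generation readings to a half-hourly value, sum those traces across the non-scheduled generators at a node, and add the result to the node's half-hourly scheduled demand. The data layout (plain lists keyed by generator name), the function names and the example numbers are hypothetical.

```python
from typing import Dict, List


def five_min_to_half_hour(trace_5min: List[float]) -> List[float]:
    """Average consecutive blocks of six five-minute values into half-hourly values."""
    if len(trace_5min) % 6 != 0:
        raise ValueError("five-minute trace length must be a multiple of 6")
    return [sum(trace_5min[i:i + 6]) / 6.0 for i in range(0, len(trace_5min), 6)]


def augmented_nodal_demand(scheduled_mw: List[float],
                           non_scheduled_5min: Dict[str, List[float]]) -> List[float]:
    """Scheduled demand plus half-hourly output of all non-scheduled generators at the node."""
    total = list(scheduled_mw)
    for trace in non_scheduled_5min.values():
        half_hourly = five_min_to_half_hour(trace)
        if len(half_hourly) != len(total):
            raise ValueError("traces must cover the same half-hours as scheduled demand")
        total = [d + g for d, g in zip(total, half_hourly)]
    return total


if __name__ == "__main__":
    scheduled = [500.0, 520.0]                   # two half-hours of scheduled demand (MW)
    wind = [30, 32, 34, 36, 38, 40] + [20] * 6   # hypothetical non-scheduled wind farm
    hydro = [5.0] * 12                           # hypothetical small hydro unit
    print(augmented_nodal_demand(scheduled, {"wind": wind, "hydro": hydro}))
    # [540.0, 545.0]
```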

Supply-side agents in the ANEM model: generators
We assume that generators produce and sell electrical power in bulk at the wholesale level. Each generator agent is configured with a production technology with assumed attributes relating to feasible production interval, total cost function, total variable cost function, fixed costs [pro-rated to a dollar per hour basis] and a marginal cost function. Depending upon plant type, a generator may also have start-up costs. Each generator also faces MW ramping constraints that determine the extent to which real power production levels can be increased or decreased over the next half-hour within the half hourly dispatch horizon. Production levels determined from the ramp-up and ramp-down constraints must fall within the minimum and maximum thermal MW capacity limits confronting each generator.
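For a single generator, the ramping and capacity limits described above combine into a feasible output interval for the next half-hour. The sketch below intersects the ramp window around the previous half-hour's output with the minimum and maximum thermal MW capacity limits; the function name and numbers are hypothetical, and ramp limits are assumed to be expressed in MW per half-hour.

```python
from typing import Tuple


def feasible_interval(p_prev_mw: float, ramp_down_mw: float, ramp_up_mw: float,
                      cap_min_mw: float, cap_max_mw: float) -> Tuple[float, float]:
    """Lower and upper MW output bounds for the next half-hour.

    Ramp limits restrict movement away from the previous output; the
    minimum and maximum thermal capacity limits then bound the result.
    """
    lower = max(cap_min_mw, p_prev_mw - ramp_down_mw)
    upper = min(cap_max_mw, p_prev_mw + ramp_up_mw)
    if lower > upper:
        raise ValueError("ramp and capacity limits are inconsistent")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical 700 MW unit with a 280 MW minimum and 60 MW per half-hour ramps.
    print(feasible_interval(300.0, 60.0, 60.0, 280.0, 700.0))  # (280.0, 360.0)
    print(feasible_interval(680.0, 60.0, 60.0, 280.0, 700.0))  # (620.0, 700.0)
```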
The MW production and ramping constraints are defined in terms of 'energy sent out', i.e. the energy available to service demand. In contrast, variable costs and carbon emissions are calculated from the 'energy generated' production concept, which is defined to include energy sent out plus a typically small amount of additional energy that is produced internally as part of the power production process. ANEM models the variable cost of each generator as a quadratic function of the half-hourly real energy it produces. The marginal cost function is the partial derivative of this quadratic variable cost function with respect to energy produced and is therefore linear (upward sloping) in each generator's real energy production (Sun & Tesfatsion 2007b).
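With a variable cost of the form VC(E) = a·E + b·E², the marginal cost is MC(E) = a + 2b·E, which is linear and upward sloping for b > 0. The sketch below evaluates both after grossing energy sent out up to energy generated. The coefficients, the 7 per cent auxiliary fraction and the function names are hypothetical, and the proportional gross-up is an assumption, since the text does not give ANEM's exact mapping from sent-out to generated energy.

```python
def generated_from_sent_out(sent_out_mwh: float, aux_fraction: float) -> float:
    """Energy generated = energy sent out grossed up for internal (auxiliary) use.

    Assumes auxiliary use is a fixed fraction of energy generated (hypothetical).
    """
    return sent_out_mwh / (1.0 - aux_fraction)


def variable_cost(energy_mwh: float, a: float, b: float) -> float:
    """Quadratic variable cost VC = a*E + b*E^2 ($ per half-hour)."""
    return a * energy_mwh + b * energy_mwh ** 2


def marginal_cost(energy_mwh: float, a: float, b: float) -> float:
    """Derivative of the quadratic variable cost: MC = a + 2*b*E ($/MWh)."""
    return a + 2.0 * b * energy_mwh


if __name__ == "__main__":
    a, b = 20.0, 0.01                           # hypothetical $/MWh and $/MWh^2
    g = generated_from_sent_out(186.0, 0.07)    # about 200 MWh generated
    print(round(variable_cost(g, a, b), 2), round(marginal_cost(g, a, b), 2))
```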
The variable cost concept underpinning each generator's variable cost incorporates fuel, variable operation and maintenance (VO&M) costs and carbon cost components. The fuel, VO&M and carbon emissions/cost parameterisation was determined using data published in ACIL Tasman (2009) for thermal plant and from information sourced from hydro generation companies for hydro generation units. Wild, Bell and Foster (2012, App. A) provide a formal derivation of the various cost components in detail.
Additionally, we averaged the 2014-20 gas prices from a gas pricing model called ATESHGAH (Wagner 2004; Wagner, Molyneaux & Foster 2014) to obtain this report's 2014 gas prices for the reference gas price scenario. Both these 2014 gas prices and the ANEM model assume an inflation rate of 2.5 per cent per annum, indexed to 2014.
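The two gas-price steps can be sketched as: take the mean of a plant's 2014-20 model prices to obtain its 2014 price, then index that price at 2.5 per cent per annum from 2014. The price series in the example is hypothetical, and the sketch assumes annual compounding, which the text does not state explicitly.

```python
from typing import Sequence

INFLATION = 0.025   # 2.5 per cent per annum
BASE_YEAR = 2014


def base_year_price(prices_2014_to_2020: Sequence[float]) -> float:
    """Average the 2014-20 model prices to give a single 2014 price ($/GJ)."""
    return sum(prices_2014_to_2020) / len(prices_2014_to_2020)


def indexed_price(price_base: float, year: int, rate: float = INFLATION) -> float:
    """Price in `year`, indexed at `rate` per annum from BASE_YEAR (annual compounding assumed)."""
    return price_base * (1.0 + rate) ** (year - BASE_YEAR)


if __name__ == "__main__":
    hypothetical = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]   # $/GJ, 2014 to 2020
    p2014 = base_year_price(hypothetical)                # 7.5
    print(p2014, round(indexed_price(p2014, 2020), 3))
```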

Passive hedging strategy incorporated in the ANEM model
Both theory and observation suggest that financial settlements based on market structures similar to those implemented in the NEM expose market participants to extreme spot price volatility, encompassing price spikes (typically of short duration) or sustained periods of low spot prices. These outcomes pose a significant danger to the bottom line of LSEs and generators, respectively, requiring both types of agents to hold long hedge cover to protect their financial viability. In the ANEM model, a key decision for both types of agents is when to activate long cover to protect their bottom lines from consistently high (low) spot prices, the key determinants of 'excessively' high costs ('excessively' low revenues) faced by LSEs (generators). Failure to do so could seriously threaten the continued financial solvency of market participants. The form of protection adopted in the model is a 'collar' instrument between LSEs and generators, which ANEM activates whenever spot prices rise above a price ceiling (for LSEs) or fall below a price floor (for generators). If the price floor applicable to generators is set equal to the generators' long-run marginal cost, ANEM can implement generator long-run revenue recovery through the hedge instrument.
ANEM assumes that both LSEs and generators pay a small fee (per MWh of energy demanded or supplied) for this long hedge cover. This fee is payable irrespective of whether long cover is actually activated, so it acts like a conventional premium payment in real options theory. If the spot price is above the price floor applicable to generator long cover and below the price ceiling applicable to LSE long cover, then neither type of agent activates long cover, although both still pay the fee.
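A stylised per-MWh settlement of the collar is sketched below. It assumes that activated cover pays the difference between the spot price and the relevant trigger price, and that the fee is a flat per-MWh charge; these payoff forms, the function names and the numbers are assumptions for illustration, since the text specifies only the trigger conditions and that the fee is always payable.

```python
def lse_hedge_settlement(spot: float, ceiling: float, fee: float) -> float:
    """Net $/MWh received by an LSE from its long cover (negative = net payment).

    Cover activates only when the spot price exceeds the ceiling; the fee is always paid.
    """
    payout = max(spot - ceiling, 0.0)
    return payout - fee


def generator_hedge_settlement(spot: float, floor: float, fee: float) -> float:
    """Net $/MWh received by a generator from its long cover (negative = net payment).

    Cover activates only when the spot price falls below the floor (for example
    the generator's long-run marginal cost); the fee is always paid.
    """
    payout = max(floor - spot, 0.0)
    return payout - fee


if __name__ == "__main__":
    floor, ceiling, fee = 40.0, 300.0, 0.5      # hypothetical $/MWh values
    for spot in (25.0, 80.0, 1000.0):
        print(spot, lse_hedge_settlement(spot, ceiling, fee),
              generator_hedge_settlement(spot, floor, fee))
```

At a spot price between the floor and the ceiling (80 $/MWh here), both parties simply pay the fee, which is the premium-like behaviour described above.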

DC OPF solution algorithm used in the ANEM model
Optimal dispatch, wholesale prices and power flows on transmission lines are determined in the ANEM model by a DC OPF algorithm. The DC OPF algorithm utilised in the model is that developed in Sun and Tesfatsion (2007a) and involves representing the standard DC OPF problem as an augmented strictly convex quadratic programming (SCQP) problem, involving the minimization of a positive definite quadratic form subject to linear equality and inequality constraints. The augmentation entails utilising an objective function that contains quadratic and linear variable cost coefficients and branch connection and bus admittance coefficients. The solution values are the real power injections and branch flows associated with the energy production levels for each generator and voltage angles for each node.
We use the Mosek (2014) optimisation software, which exploits direct sparse matrix methods and uses an interior-point algorithm for convex quadratic programming, to solve the DC OPF problem. Equation 1 shows the objective function and the equality and inequality constraints of the DC OPF problem as implemented in ANEM and solved with Mosek.
The ANEM model solves the following optimisation problem for every half-hour. Equation 1(a) shows the objective function, which is minimised over the real-power production levels $P_{G_i}$ of all generators i = 1, …, I and the voltage angles $\delta_k$ at nodes k = 2, …, K, subject to the constraints in Equation 1(b), (c) and (d).

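Equation 1: ANEM's objective function and constraints

(a) Objective function: minimise generator-reported total variable cost and nodal angle differences

$$\min_{P_{G_i},\,\delta_k}\ \sum_{i=1}^{I}\left[A_i P_{G_i} + B_i P_{G_i}^{2}\right] + \pi \sum_{km \in BR}\left(\delta_k - \delta_m\right)^{2}$$

(b) Nodal balance (equality) constraint at each node k = 1, …, K:

$$\sum_{j \in J_k} P_{L_j} = \sum_{i \in I_k} P_{G_i} + \sum_{mk \in BR} P_{mk} - \sum_{km \in BR} P_{km}, \qquad P_{km} = \frac{\delta_k - \delta_m}{x_{km}}$$

(c) Branch thermal limits (inequality constraints):

$$-F^{UR}_{km} \le P_{km} \le F^{UN}_{km}, \qquad km \in BR$$

(d) Generator production limits (inequality constraints):

$$P^{L}_{G_i} \le P_{G_i} \le P^{U}_{G_i}, \qquad i = 1, \dots, I$$

The lower limit $P^{L}_{G_i}$ is the lower half-hourly thermal ramping limit, bounded below by the lower thermal MW capacity limit, and the upper limit $P^{U}_{G_i}$ is the upper half-hourly thermal ramping limit, bounded above by the upper thermal MW capacity limit. $A_i$ and $B_i$ are the linear and quadratic cost coefficients from the variable cost function. $\delta_k$ and $\delta_m$ are the voltage angles (in radians) at nodes k and m, with $\delta_1 = 0$ at the reference node. The parameter $\pi$ is a positive soft penalty weight on the sum of squared voltage angle differences. $F^{UN}_{km}$ and $F^{UR}_{km}$ are the (positive) MW thermal limits on real power flows in the 'normal' and 'reverse' directions on each connected transmission branch $km \in BR$. $I_k$ and $J_k$ denote the generators and LSEs located at node k, $P_{L_j}$ is the load of LSE j, and $x_{km}$ is the reactance of branch km in PU, so that $P_{km}$ is the real power flow from node k to node m (converted to MW using the base apparent power).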
The linear equality constraint is a nodal balance condition, which requires that, at each node, power take-off (by LSEs located at that node) equals power injection (by generators located at that node) plus net power transfers from other nodes on connected transmission branches. On a node-by-node basis, the shadow price associated with this constraint gives the LMP (i.e. regional wholesale spot price) for that node. The linear inequality constraints ensure that real power transfers on connected transmission branches remain within the permitted 'normal' and 'reverse' direction thermal limits, and that the real power produced by each generator remains within its permitted lower and upper thermal MW capacity limits while also meeting its MW ramp-up and ramp-down production limits.
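To illustrate how the nodal balance shadow prices become LMPs and how congestion separates them, the sketch below solves a hypothetical two-node case in closed form: a generator at node 1, a second generator and all load at node 2, and one branch with a thermal limit from node 1 to node 2. It uses quadratic costs of the form A_i P + B_i P² but, to keep the algebra closed-form, omits the angle penalty, generator capacity and ramping limits and the reverse-direction limit. It is an illustration only, not the Mosek-based solver used in ANEM.

```python
from typing import NamedTuple


class TwoNodeResult(NamedTuple):
    p1: float    # MW from the generator at node 1
    p2: float    # MW from the generator at node 2
    flow: float  # MW on the branch from node 1 to node 2
    lmp1: float  # $/MWh at node 1
    lmp2: float  # $/MWh at node 2


def two_node_dispatch(a1: float, b1: float, a2: float, b2: float,
                      demand2: float, limit12: float) -> TwoNodeResult:
    """Least-cost dispatch with costs a*P + b*P**2 (b > 0) and all load at node 2.

    Each LMP is the cost of serving one more MW at that node, i.e. the shadow
    price of that node's balance constraint.
    """
    def mc1(p: float) -> float:
        return a1 + 2.0 * b1 * p

    def mc2(p: float) -> float:
        return a2 + 2.0 * b2 * p

    # Uncongested optimum equates marginal costs: mc1(p1) = mc2(demand2 - p1).
    p1 = (a2 - a1 + 2.0 * b2 * demand2) / (2.0 * (b1 + b2))
    p1 = min(max(p1, 0.0), demand2)        # neither generator can run negative
    if p1 > limit12:
        # Branch congested: node 1 output capped at the limit, so prices separate.
        p1 = limit12
        return TwoNodeResult(p1, demand2 - p1, p1, mc1(p1), mc2(demand2 - p1))
    # Uncongested: a single price set by the marginal unit.
    price = mc2(demand2 - p1) if p1 < demand2 else mc1(p1)
    return TwoNodeResult(p1, demand2 - p1, p1, price, price)


if __name__ == "__main__":
    # Hypothetical coefficients: 1000 MW of load at node 2.
    print(two_node_dispatch(20.0, 0.02, 30.0, 0.01, 1000.0, 800.0))  # uncongested
    print(two_node_dispatch(20.0, 0.02, 30.0, 0.01, 1000.0, 300.0))  # congested
```

With an 800 MW limit both nodes price at 40 $/MWh; with a 300 MW limit the branch binds and the LMPs separate to 32 $/MWh at node 1 and 44 $/MWh at node 2.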
The ANEM model differs in significant ways from many of the wholesale electricity market models used to investigate the Australian electricity industry. First, ANEM has a more disaggregated nodal structure than many of the other wholesale market models. The ANEM model contains 52 nodes and 68 transmission branches, including eight inter-state interconnectors and 60 intra-state transmission branches as depicted in Section 2. In contrast, other wholesale market models often involve five or six nodes, corresponding to each state region in the NEM, and six or seven inter-state interconnectors. Second, the solution algorithm used in the ANEM model is very different conceptually from the linear programming algorithms used in many of the other wholesale market models. ANEM uses quadratic programming to minimise both nodal angle differences and generator variable costs subject to network limits on transmission branches and generation. Optimal power flows on transmission branches are determined from optimised nodal angle differences, which, in turn, depend on transmission branch adjacency and bus admittance properties determined from the transmission grid's structure and branch reactance data (Sun & Tesfatsion 2007a, Sec. 4). Accounting for power flows in the equality constraints of the DC OPF algorithm allows the incorporation of congestion components in regional wholesale spot prices, which can produce divergence in regional spot prices associated with congestion on intra-state transmission branches.
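The dependence of branch flows on nodal angles, branch adjacency and reactance can be illustrated with a plain DC power-flow calculation: build the bus susceptance matrix from the branch list, fix the reference angle at node 1 to zero, solve for the remaining angles given net nodal injections, and recover each flow as (δ_k − δ_m)/x_km, where x_km is the branch reactance in PU. This shows only the power-flow step for fixed injections, not the full DC OPF; the three-node network, reactances and injections are hypothetical.

```python
from typing import Dict, List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def dc_power_flow(n_nodes: int, branches: List[Tuple[int, int, float]],
                  injections_pu: List[float]) -> Dict[Tuple[int, int], float]:
    """Branch flows (pu) for net injections, with node 1 as the zero-angle reference.

    branches: (k, m, x_km) with 1-based node numbers k < m and reactance in pu.
    injections_pu: net injection (generation minus load) at nodes 1..n; sums to 0.
    """
    b = [[0.0] * n_nodes for _ in range(n_nodes)]
    for k, m, x in branches:             # bus susceptance matrix from adjacency
        i, j = k - 1, m - 1
        b[i][i] += 1.0 / x
        b[j][j] += 1.0 / x
        b[i][j] -= 1.0 / x
        b[j][i] -= 1.0 / x
    # Drop the reference node's row and column, then solve for angles at nodes 2..n.
    reduced = [row[1:] for row in b[1:]]
    angles = [0.0] + solve_linear(reduced, injections_pu[1:])
    # Positive flow runs from the lower- to the higher-numbered node.
    return {(k, m): (angles[k - 1] - angles[m - 1]) / x for k, m, x in branches}


if __name__ == "__main__":
    # Hypothetical 3-node loop; 1 pu injected at node 1 and withdrawn at node 3.
    lines = [(1, 2, 0.1), (1, 3, 0.1), (2, 3, 0.1)]
    flows = dc_power_flow(3, lines, [1.0, 0.0, -1.0])
    print({k: round(v, 4) for k, v in flows.items()})
    # {(1, 2): 0.3333, (1, 3): 0.6667, (2, 3): 0.3333}
```

The direct path carries two-thirds of the transfer because its reactance is half that of the two-branch path, which is why flows, and hence congestion, depend on the grid's structure and reactance data rather than on a fixed transfer route.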
In contrast, the linear programming algorithms do not explicitly optimise power flows as part of the optimisation process, do not directly capture the impact of branch congestion on spot prices and do not account for any impact associated with congestion on intra-state transmission branches. Moreover, these models typically do not produce intra-state regional spot prices. They also typically use limit equations published by AEMO to incorporate thermal and other system constraints. One potential limitation of this approach is that, when many new proposed wind farms are included in the modelling, correct limit equations for the resulting system do not exist. Modellers who use existing limit equations would typically have to assume that they remain valid for the new analysis, which is unlikely in practice: the introduction of a number of new wind farms would be expected to change dispatch patterns and power flows, especially as wind penetration levels increase, so new limit equations would be needed. This observation is also supported by the fact that whenever a new generator is added to the existing generation fleet, AEMO typically publishes a new set of limit equations.

Practical implementation considerations
The solution algorithm employed in all simulations applies the 'competitive equilibrium' solution. This means that all generators submit their true marginal cost coefficients without strategic bidding, which permits assessment of the true cost of generation and dispatch. The methodological approach underpinning the modelling is therefore to produce 'as if' scenarios. In particular, we do not try to emulate actual historical generator bidding patterns or strategic bidding based upon monopolistic competition or game theoretic approaches. Instead, our objective is to investigate, in an ideal setting, how the proposed expansion in wind generation interacts with other generation in the NEM from the perspective of least-cost dispatch. As such, the analytic framework is a conventional DC OPF analysis with generator supply offers based upon Short Run Marginal Cost (SRMC) coefficients.
We also assume that all thermal generators are available to supply power during the whole period under investigation, except for the assumed refurbishment or replacement programmes, plant retirements and temporary plant closures specified below. This assumption excludes unscheduled outages of thermal generators. Allowing for such outages would be expected to raise costs and prices above those obtained when all relevant thermal plant is available, because outages constrain the least-cost supply response available to meet prevailing demand.
In order to make the model response to the various scenarios more realistic, we have taken account of the fact that baseload and intermediate coal and gas plant typically have 'nonzero' must run MW capacity levels termed minimum stable operating levels. These plants cannot run below these specified MW capacity levels without endangering the long-term productive and operational viability of the plant itself or violating statutory limitations relating to the production of pollutants and other toxic substances.
Because of the significant run-up time needed to go from start-up to a position where coal-fired power stations can actually begin supplying power to the grid, all coal plant was assumed to be synchronised with the grid and able to supply power. Thus, their minimum stable operating limits were assumed to apply for the whole period being investigated during which they are operational, and they do not face start-up costs. Gas plant, however, has very quick start-up characteristics and can typically be synchronised with the grid and ready to supply power within a half-hour of the decision to start up. Therefore, in this case, the start-up decision and fixed start-up costs can occur within the dispatch period being investigated.
Two approaches to modelling gas plant were adopted, depending upon whether the plant could reasonably be expected to meet base-load and intermediate production duties or only peak-load production duties. If the gas plant was capable of meeting base-load or intermediate production duties, it was assigned a non-zero minimum stable operating capacity. In contrast, peak-load gas plant was assumed to have a zero minimum stable operating capacity. Because domestic gas prices in the reference gas price scenario are high compared with historically low domestic gas prices, all OCGT plant is modelled as peak-load plant. On the other hand, gas thermal and Combined Cycle Gas Turbine (CCGT) plant are generally modelled as base-load or intermediate gas plant. Base-load plant is assumed to offer to supply power for a complete 24-hour period; thus, its minimum stable operating capacity applies for the whole 24-hour period and it does not face start-up costs. In contrast, some gas thermal plant is assumed to fulfil intermediate production duties and to offer to supply power only during the day. In this case, the minimum stable operating capacities apply only to those particular half-hours of the day, and these plants pay fixed start-up costs upon start-up.
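The three gas-plant treatments can be summarised as a minimum stable operating level for each half-hour of the day plus a start-up-cost flag. The sketch below encodes them; the duty labels, the 07:00 to 22:00 daytime window for intermediate plant and the MW and dollar values are hypothetical, since the actual operating times and start-up costs are those listed in Table 4 and Table 5. The text does not say whether peak-load plant pays start-up costs; this sketch assumes it does not.

```python
from dataclasses import dataclass
from typing import List

HALF_HOURS_PER_DAY = 48


@dataclass
class GasPlant:
    """Hypothetical gas plant record with one of three production duties."""
    name: str
    duty: str                   # "peak", "baseload" or "intermediate"
    min_stable_mw: float        # minimum stable operating level when running
    start_up_cost: float = 0.0  # $ per start (hypothetical value)
    day_start: int = 14         # first daytime half-hour (index 14 = 07:00), assumed
    day_end: int = 44           # daytime ends before index 44 (22:00), assumed

    def min_stable_profile(self) -> List[float]:
        """Minimum stable MW applicable in each half-hour of the day."""
        if self.duty == "peak":
            # Peak-load plant: zero minimum stable operating capacity.
            return [0.0] * HALF_HOURS_PER_DAY
        if self.duty == "baseload":
            # Offers for the full 24 hours, so the minimum applies all day.
            return [self.min_stable_mw] * HALF_HOURS_PER_DAY
        if self.duty == "intermediate":
            # Offers only during the day, so the minimum applies only then.
            return [self.min_stable_mw if self.day_start <= h < self.day_end else 0.0
                    for h in range(HALF_HOURS_PER_DAY)]
        raise ValueError(f"unknown duty: {self.duty!r}")

    def pays_start_up_cost(self) -> bool:
        # Base-load plant does not face start-up costs; intermediate plant does.
        # Peak-load plant is not specified in the text; assumed not to pay here.
        return self.duty == "intermediate"


if __name__ == "__main__":
    steam = GasPlant("hypothetical gas thermal", "intermediate", 100.0, start_up_cost=20000.0)
    print(steam.min_stable_profile()[12:16], steam.pays_start_up_cost())
    # [0.0, 0.0, 100.0, 100.0] True
```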
Details of the minimum stable operating capacities assumed for operational coal plant and for base-load and intermediate gas-fired plant are listed in Table 4 and Table 5, together with their assumed operating times, whether start-up costs were liable and, if so, the values assumed for these costs. We assumed that the following generation is decommissioned:
• MacKay Gas Turbine (from 2017);
• Torrens Island A (from 2017); and
• Mt Stuart (from 2023).
We have also included some recently announced temporary plant closures associated with:
Recall that all OCGT plant is assumed to operate as peak-load plant and, as such, does not have any specified non-zero minimum stable operating levels or must-run production configurations. Gas thermal generation is treated as base-load generation operating with both non-zero minimum stable operating levels and the must-run production configurations defined in Table 5. However, in summer one unit of Torrens Island A and B are not run as base-load plant but, instead, as peak-load plant with a zero minimum stable operating level and no must-run production configuration. Furthermore, given the lower demand typically prevailing in winter, together with higher wind generation output, especially in SA and VIC, Newport, Torrens Island A and one unit of Torrens Island B are no longer run as base-load plant but, instead, as peak-load plant. Tamar Valley CCGT plant is also operated in this mode during winter.
We have broadly fixed the generation structure used in simulations to the structure listed in Section 2 (after accounting for the plant de-commissioning mentioned above). In particular, we did not attempt to include any future proposed projects in the analysis because there is currently too much uncertainty over both the status and timing of many proposed projects.
This uncertainty principally reflects three factors. The first relates to financial uncertainty over future gas prices once the eastern seaboard CSG/LNG projects begin to operate from 2014-15. The second factor relates to the fall in average demand experienced widely throughout the NEM over the last couple of years, which affects the viability of baseload generation proposals as well as the future commissioning date of new project proposals. Specifically, the August 2014 Electricity Statement of Opportunities (AEMO 2014b) medium reserve deficit projection is zero until 2023-24 for all states. This implies an oversupply of generation capacity to meet demand, requiring no investment in new thermal plant until at least 2023-24. The third source of uncertainty is regulatory and political uncertainty about the future of carbon pricing and policy support for renewable energy. Therefore, given the generation set available for the ANEM model simulations, our modelling focuses on the interaction of proposed expansions in wind generation with the other generators currently in the NEM.
ANEM assumes that all thermal generators are available to supply power but imposes restrictions on the availability of hydro generation units. The dispatch of thermal plant is optimised around the assumed availability patterns of the hydro generation units. In determining the availability patterns for hydro plant, we assumed that water supply for hydro plant was not an issue. If water supply or hydro unit availability were a constraining factor, as was the case in 2007, for example, this would increase the costs and prices obtained from simulations, because the supply offers of hydro plant would be expected to increase significantly.
Because of the prominence of hydro generation in TAS, some hydro units were assumed to offer capacity over the whole year, with account taken of the ability of hydro plant to meet base-load, intermediate or peak-load production duties. For pumped-storage hydro units such as Wivenhoe and Shoalhaven, the pump mode was activated by setting up pseudo LSEs located at the Morton North and Wollongong nodes. The pumping load requirements of all Wivenhoe and Shoalhaven hydro units were combined into a single load block, determined by the model from the unit dispatch records of these generators for the previous day, and placed in the relevant pseudo LSEs. In both cases, pumping is assumed to occur in off-peak periods, when the price (cost to hydro units) of electricity is lowest.
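A minimal sketch of the pump-load block described above, under assumptions that are not stated in the source: the day's pumping energy is taken to be the previous day's generation (MWh) divided by a hypothetical round-trip efficiency, and it is spread evenly across the `nPeriods` lowest-priced half-hours. The function `pumpLoad` and its parameters are illustrative only.

```haskell
import Data.List (sortOn)

-- Half-hourly pumping load (MW) for a pseudo LSE.
--   prevDayGenMWh : energy generated by the pumped-storage units yesterday (MWh)
--   efficiency    : hypothetical round-trip efficiency (an assumption, e.g. 0.75)
--   nPeriods      : number of lowest-priced half-hours used for pumping
--   prices        : price for each half-hour of the day ($/MWh)
pumpLoad :: Double -> Double -> Int -> [Double] -> [Double]
pumpLoad prevDayGenMWh efficiency nPeriods prices =
  [ if i `elem` cheapest then mw else 0 | i <- [0 .. length prices - 1] ]
  where
    -- clamp the number of pumping periods to the length of the day
    k = max 1 (min nPeriods (length prices))
    energyMWh = prevDayGenMWh / efficiency
    -- indices of the k cheapest half-hours (ties keep their original order)
    cheapest = take k (map fst (sortOn snd (zip [(0 :: Int) ..] prices)))
    -- each half-hour lasts 0.5 h, so MW = MWh / (0.5 * k)
    mw = energyMWh / (0.5 * fromIntegral k)
```

With hypothetical inputs, `pumpLoad 30 0.75 2 [50, 20, 30, 10]` places 40 MW of pumping load in the second and fourth periods, which are the two cheapest, and 0 MW elsewhere.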
For all hydro plant, supply offers were based on Long Run Marginal Cost (LRMC) coefficients. These coefficients take into account the need to meet fixed costs, including capital and operational expenses, and are often significantly larger in magnitude than the corresponding SRMC coefficients. For mainland hydro plant, supply was tailored to peak-load production. Thus, LRMC estimates were obtained for much lower annual capacity factors (ACF) than would be associated with hydro plant fulfilling base-load or intermediate production duties, producing higher LRMC coefficients. Moreover, the ACF was reduced for each successive hydro turbine making up a hydro plant, resulting in an escalating series of marginal cost coefficient bids for successive turbines. In general, the lowest marginal cost coefficient shadowed peak-load OCGT plant, while the supply offers of other turbines could significantly exceed the cost coefficients associated with more expensive peak-load gas or diesel plant. This approach essentially priced the social cost of water used by successive turbines of a hydro power station as an increasingly scarce commodity.
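The escalating bid structure can be illustrated with a simplified LRMC formula, assumed here rather than taken from the ANEM documentation: the annualised fixed cost per MW spread over the energy produced per MW at a given ACF, plus a variable cost. Lowering the ACF for each successive turbine raises its bid. All parameter values used below are hypothetical.

```haskell
-- Simplified long-run marginal cost ($/MWh) at a given annual capacity factor.
lrmc :: Double -- ^ annualised fixed cost (capital + fixed O&M), $/MW/year
     -> Double -- ^ variable cost, $/MWh
     -> Double -- ^ annual capacity factor, 0 < acf <= 1
     -> Double
lrmc fixedPerMWYear variable acf = fixedPerMWYear / (8760 * acf) + variable

-- Bids for successive turbines of one hydro station, each assumed to run at a
-- lower ACF than the previous one, so each successive bid is higher.
turbineBids :: Double -> Double -> [Double] -> [Double]
turbineBids fixedPerMWYear variable = map (lrmc fixedPerMWYear variable)
```

For example, `turbineBids 175200 5 [0.2, 0.1, 0.05]` gives bids of roughly 105, 205 and 405 $/MWh: halving the ACF doubles the fixed-cost component of the bid.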
A key consideration behind the decision to use LRMC coefficients to underpin the supply offers of hydro generation plant is the predominance of such generators in TAS. Given the absence of other major forms of thermal generation in TAS, and its limited native load demand and export capability into VIC, nodal pricing based on SRMC would likely not be sufficient to cover operational and capital costs. Supply offers based on LRMC, however, ensure that average price levels are sufficient to cover these costs over the lifetime of a hydro plant's operation. We also assumed that the minimum stable operating capacity of all hydro plant is zero and that no start-up costs are incurred when hydro plants begin supplying power to the grid. Hydro plant is also assumed to have a very fast ramping capability.
Various wind penetration scenarios are included in the modelling undertaken for this report. Section 4 presents the derivation of the output traces for all categories of wind farms. Section 5 presents the various wind penetration scenarios for use within the ANEM model.
In the ANEM model simulations performed for this project, we have adopted an 'n' transmission configuration scenario. This approach sets the MW thermal limit between two nodes to the sum of the thermal ratings of all individual transmission lines in the group connecting them. It effectively assumes that no line outages occur and that all transmission lines are in good working condition, so that each line can be loaded up to its rated capacity even when all other lines are also operating at their maximum capacity. As such, from the perspective of the operational constraints of the transmission network, this approach represents an ideal setting, matching the approach we adopted for thermal and hydro generation unit availability.
The approach adopted in this project can be contrasted with the more realistic 'n-1' transmission configuration scenario, which typically involves subtracting the rating of the largest individual line from the total for the group connecting two nodes. This latter approach is linked to reliability considerations, ensuring that things do not go 'pear-shaped' if the largest single line is lost; as such, it is a more realistic operational setting.
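The difference between the 'n' and 'n-1' configurations for one group of parallel lines can be shown in a few lines of Haskell. The ratings in the example are hypothetical, and a real 'n-1' assessment can involve more than subtracting the largest line; this sketch follows only the simple rule stated above.

```haskell
-- 'n' limit (MW): sum of the thermal ratings of all lines in the group.
nLimit :: [Double] -> Double
nLimit = sum

-- 'n-1' limit (MW): the 'n' limit less the rating of the largest single line.
nMinus1Limit :: [Double] -> Double
nMinus1Limit [] = 0
nMinus1Limit ratings = sum ratings - maximum ratings
```

For a hypothetical group rated at 500, 500 and 1000 MW, `nLimit` gives 2000 MW and `nMinus1Limit` gives 1000 MW.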
The main reason we adopted the 'n' transmission configuration scenario was the length of the project's time horizon, which extends to 2025. We are therefore sacrificing some operational realism in the near term, while recognising that the current 'n' scenario might well become an 'n-1' scenario towards the end of the simulation horizon if additional transmission lines were added.

Projecting Regional Demand Profiles to the year 2025
The ANEM model requires demand projections out to 2025. These projections are predicated on baseline 2010, 2011 and 2012 load profiles for each demand node (that is, a node that contains demand), representing the half-hourly electricity load at that node for a given year. A forecast load profile can be projected using the methods outlined in Simshauser and Wild (2009) by reference to the peak load point estimate reported in AEMO (2012a). Once the peak load point estimate has been set, a schedule of half-hourly load is determined by applying a block load factor (intercept term) and a load scaling factor (slope term) to one of the existing (historical) load profile schedules, extrapolating the projected schedule on the basis of expected future changes in peak load demand characteristics.
where blf is the block load factor and lsf is the load scaling factor calculated immediately above. Given the above parameter settings, we obtain a block load factor (blf) of 0.0049 for projecting the 2010 base-year profile to 2020.
The projected load duration curve for year 2020 can then be obtained from the following relation:
In this study, we use three base-year load profiles to test how sensitive our conclusions are to uncertainty surrounding the future shape of the load demand projections. However, we only calculate the demand profiles for the 'Slow Rate of Change 50% POE Scenario' described in AEMO (2012a). This scenario is the closest available scenario to the low demand growth environment currently confronting generators in the NEM.
In constructing state-based nodal demand projections, we apply the aggregate state-based peak load demand and energy usage projections contained in AEMO (2012a). As such, the load scaling and block load factors are the same for each demand node in a state, although they vary from state to state. It should be noted, however, that we do not use the regional peak demand or energy usage projections published by transmission service providers to fine-tune regional demand projections.
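The equations for the block load factor (blf) and load scaling factor (lsf) are not reproduced in this section. As an illustration only, assume the projected half-hourly load takes the linear form blf + lsf·L(t), where L(t) is the base-year load, and that the two factors are chosen so that the projected profile matches a target peak demand and a target annual energy. This is a hypothetical reconstruction, not necessarily the exact formulation of Simshauser and Wild (2009).

```haskell
-- Hypothetical reconstruction: projected load = blf + lsf * base-year load.
-- Solve for (blf, lsf) so that the projected half-hourly profile reproduces a
-- target peak demand (MW) and a target annual energy (MWh).
-- Requires a non-flat base-year profile (mean load /= peak load).
projectionFactors :: Double -> Double -> [Double] -> (Double, Double)
projectionFactors targetPeak targetEnergy loads = (blf, lsf)
  where
    n = fromIntegral (length loads)
    peak = maximum loads
    meanLoad = sum loads / n
    -- each half-hour lasts 0.5 h, so mean MW = energy / (0.5 * n)
    targetMean = targetEnergy / (0.5 * n)
    -- two conditions: blf + lsf * peak     = targetPeak
    --                 blf + lsf * meanLoad = targetMean
    lsf = (targetMean - targetPeak) / (meanLoad - peak)
    blf = targetPeak - lsf * peak

-- Apply the factors to a base-year half-hourly profile (MW).
projectProfile :: (Double, Double) -> [Double] -> [Double]
projectProfile (blf, lsf) = map (\l -> blf + lsf * l)
```

The paper applies the same factors to every demand node in a state; how the intercept is scaled for individual nodes is not stated here, so this sketch works with a single aggregate profile.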

Weather Model Data
The Climate Research Group (CRG) of The University of Queensland (UQ) School of Geography, Planning and Environment Management was tasked with providing a five-year (2008 to 2012) wind speed climatology at 90 metres above ground level (AGL) for every operating and planned wind farm site in New South Wales, Victoria, South Australia, Tasmania and Queensland. The temporal resolution of the instantaneous wind speed data set was specified as 5-minute intervals. The wind speed climatology was one of the major variable inputs into the Australian National Electricity Market Model.
Owing to the limitations of the computing infrastructure available to the CRG, described further below, the climatology output was constrained to three years (2010 to 2012) for all operating wind farm locations and the majority of planned wind farm locations. The reasons for this adjustment to the data output were discussed with stakeholders.
The following paragraphs describe the weather model, its configuration and the wind speed data produced, including a regression analysis of modelled and measured wind speed at wind turbine hub height.

Weather Model Configuration
CRG used version 3.5a of the Weather Research & Forecasting Model (WRF 2015), a mesoscale weather model that has evolved through predecessors over some twenty years (WRF 2015). WRF is a "numerical weather prediction and atmospheric simulation system for both research and operational applications" (Skamarock et al. 2008, p. 1) widely used for meteorology research throughout the world. WRF is a community weather model; that is, model components are contributed by many model users, principally academic meteorology research institutions. WRF operational support, software engineering and geographic boundary condition data sets are maintained by the Mesoscale and Microscale Meteorology Division of the National Center for Atmospheric Research (NCAR), headquartered in Boulder, Colorado, United States of America (US). WRF version 3.5 corrected a known topographic effect on wind bias that existed in earlier versions of WRF (Jiménez & Dudhia 2013). WRF is used extensively by the European Union (EU) and US wind energy industries to forecast winds in their energy demand planning processes.
The WRF three-dimensional spatial schemes used in this project were based on a three-nested-domain design considered standard by WRF guidance documentation (Wang et al. 2012). The WRF nested domains had grid-spacing ratios of 5 : 1 (outer : middle) and 3 : 1 (middle : inner), as shown by the example in Figure 7. The outer domain had a spatial resolution of 15 km, the middle domain 3 km and the inner domain 1 km. The domain design, particularly the size of the outer domain, made optimal use of the six-hourly Global Forecast System (GFS) reanalysed meteorological data archive. The inner domain resolution of 1 km was chosen to resolve terrain features and so account for topographic effects on wind flow, as recommended by Horvath et al. (2012). The best spatial resolution of the WRF earth surface database is 30 arc-seconds, so the inner domain spatial resolution matches the geographical initial condition data.
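The nesting arithmetic can be checked with a short sketch: dividing the 15 km outer grid spacing by the 5 : 1 and 3 : 1 ratios gives the 3 km and 1 km domains, and 30 arc-seconds of latitude corresponds to just under 1 km on the ground. The spherical Earth of mean radius 6371 km is a simplifying assumption, and the function names are illustrative.

```haskell
-- Grid spacing (km) of each nested domain, from the outer spacing and the
-- successive parent-to-child nesting ratios.
nestedSpacings :: Double -> [Int] -> [Double]
nestedSpacings = scanl (\dx ratio -> dx / fromIntegral ratio)

-- North-south ground distance (km) spanned by an angle in arc-seconds,
-- assuming a spherical Earth with mean radius 6371 km.
arcSecondsToKm :: Double -> Double
arcSecondsToKm s = 6371 * (s / 3600) * pi / 180
```

`nestedSpacings 15 [5, 3]` yields spacings of 15, 3 and 1 km, and `arcSecondsToKm 30` is about 0.93 km. East-west distances shrink further with the cosine of latitude, which this sketch ignores.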
With stakeholder consent, not all planned wind farm sites were included in the domain design, as including them all would have required some 37 domains and (as noted later) a computational overhead that could not be sustained by the available resources. The inner domains were of different sizes to ensure that wind farms were located well away from domain boundaries, reducing any model discontinuity issues. The WRF inner domains were typically around 140 × 180 grid squares, greater than the minimum of 100 × 100 grid squares recommended by WRF user guidance (Wang et al. 2012).
The model tropospheric depth was divided into 30 sigma levels. Sigma levels are a common weather model vertical coordinate system used to represent scaled pressure levels. The lowest levels are terrain following, which means that wind data extracted from WRF at a given model level is at a known height above ground level.
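For a classic terrain-following sigma coordinate, σ = (p − p_top)/(p_s − p_top), where p_s is the surface pressure, so σ is 1 at the ground and 0 at the model top, and a given σ level sits at a lower pressure (greater altitude) over elevated terrain. WRF's own vertical coordinate (η) is defined analogously on hydrostatic pressure. The minimal sketch below uses pressures in Pa and a hypothetical model top of 50 hPa in its example; the names `sigma` and `pressureAt` are illustrative.

```haskell
-- Classic terrain-following sigma coordinate (pressures in Pa):
-- 1 at the ground (p = pSurface), 0 at the model top (p = pTop).
sigma :: Double -> Double -> Double -> Double
sigma pTop pSurface p = (p - pTop) / (pSurface - pTop)

-- Inverse: the pressure of a given sigma level in a column whose
-- surface pressure is pSurface.
pressureAt :: Double -> Double -> Double -> Double
pressureAt pTop pSurface s = pTop + s * (pSurface - pTop)
```

With `pTop = 5000`, the level σ = 0.99 lies at 99050 Pa over a column with surface pressure 100000 Pa, but at 89150 Pa over elevated terrain with surface pressure 90000 Pa, so the level follows the terrain.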
Since the inception of WRF, model users select from a range of physics schemes; geographic boundary condition data; and meteorology initialising data to suit the weather output specification sought. The GFS, 6 hourly, reanalysed meteorology data is provided by the US National Centre for Environmental Prediction (NCEP) as final operational global tropospheric analyses maintained in near real time. Since this data incorporates weather observations at one degree latitude/longitude a large outer domain is needed so that sufficient meteorology is provided to the model boundary and initial conditions. Each WRF model domain set was first configured with boundary and initial conditions by running a series of processing commands collectively known as the WRF Pre-processing System (WPS). WPS utilities distribute geographical and meteorological data throughout the lateral and vertical design of each domain set. The geographical data sets used for underpinning WRF in this project were:  United States Geological Survey (USGS) topography at 30 arc-seconds (for the inner domain);  USGS 24 land use categories;  USGS 16 soil categories; and  Standard WRF provided albedo, soil temperature, sea surface temperature and green fraction data sets.
WRF physics scheme configurations were chosen based on those found optimal for wind modelling (

Weather Model Runs
Each WRF model domain package was set up in Linux "name-list" files to process one calendar month of wind data per parallel computer processor batch job. For twenty domains, that meant packaging up 240 individual WRF model runs for each year of wind data produced. Each model run was allocated 64 central processing units (CPUs) in parallel, 5 gigabytes of random access memory (RAM) and 90 hours of "wall time" (the batch queue time after which the job would terminate). On average, each WRF batch job took around 74 hours to run when it ran without interruption or error. The RCC resource allocation to the project was calculated to allow CRG staff to run four WRF batch jobs in parallel at any given time, some 256 CPUs running 24 hours a day.
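The throughput expectation can be checked with simple arithmetic from the figures above: 240 runs per year of data (20 domains × 12 months), 64 CPUs per run and an average of about 74 hours per run. The sketch assumes that jobs run back to back with no queue waits or restarts, an idealisation that the following paragraphs show did not hold in practice. The function names are illustrative.

```haskell
-- Wall-clock days to finish `jobs` equally long batch jobs when at most
-- `parallel` (> 0) run at once; assumes back-to-back scheduling with no
-- queue waits, failures or restarts.
wallClockDays :: Int -> Double -> Int -> Double
wallClockDays jobs hoursPerJob parallel =
  fromIntegral rounds * hoursPerJob / 24
  where
    rounds = (jobs + parallel - 1) `div` parallel -- ceiling division

-- Total CPU-hours consumed by the jobs.
cpuHours :: Int -> Int -> Double -> Double
cpuHours jobs cpusPerJob hoursPerJob = fromIntegral (jobs * cpusPerJob) * hoursPerJob
```

At four concurrent jobs, `wallClockDays 240 74 4` gives 185 days per year of wind data (about 925 days for five years); at two concurrent jobs it doubles to 370 days. `cpuHours 240 64 74` gives 1,136,640 CPU-hours per year of data.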
Taken together, the model computational environment configuration specifications (CPUs, RAM, wall-time) were an optimal trade-off of WRF spatial and temporal accuracy against computing processing resources available to the project. This allocation was expected to allow five years of wind output data to be processed and made available to the Australian National Electricity Market Model. Unfortunately this computational throughput was not achieved in practice.
CRG planned to run all the WRF models in the UQ Research Computing Centre (RCC) "Barrine", a Linux high performance computing environment of the Queensland Computing Infrastructure Foundation (QCIF). During 2013 it became clear that Barrine was heavily used by researchers at UQ and elsewhere; in practice, at best only two of the project's WRF batch jobs ran in parallel at any one time. Further, many of the WRF batch jobs failed at run time and required restarting; some were restarted from the time of failure and some had to be re-run from the original model start time. It is estimated that some 10% of the WRF restarts were required as a result of WRF mathematical discontinuities encountered mid-run. Some 90% of the failures were a result of computing infrastructure issues, particularly disk storage transfer and operating system problems on Barrine. Taken together, less than half the required rate of model output was being achieved in 2013. WRF restarts required far more manual operator intervention than expected. Many of the model restarts had to begin at the initial model time step because the RCC had to deal with perennial problems associated with transferring high volumes of WRF output to long-term tape storage, since computational disk space was not sufficient to archive WRF data.
After around six months of model operation in 2013, RCC staff provided some batch queue system script interventions that went some way towards solving the problem of batch job failure. However, there was no relief from the oversubscribed usage of Barrine during the WRF processing undertaken in 2013 and 2014. CRG learnt that the computing resource allocation calculated as necessary for the project to run four WRF jobs in parallel was not guaranteed by RCC operations, but was instead considered the maximum resource available to the project, and only if RCC resources were not being used by others. Barrine suffered a number of periods of down time due to system failures, but the main issue for the project was the inability to utilise the requested resources in the RCC.
In late 2013 it became clear that compiling the five-year wind climatology requested by project stakeholders was not possible, and a lesser target of three years of wind data was set in consultation with project stakeholders. Since it was unclear whether this target could be reached on RCC resources alone, CRG staff sought and received a high performance computing allocation at the National Computational Infrastructure (NCI) hosted at the Australian National University. In general, WRF was found to run in half the time on the NCI compared with the RCC, using the same computational resources. In contrast to the RCC, NCI batch queue operations were allocated to user projects on a strict negotiated resource allocation basis. The batch job resource management at NCI gave some assurance that the three-year wind climatology could be achieved during 2014. Further, there were very few WRF model run-time failures compared with Barrine. When compiling WRF wind data for:
• 2010, 13.2% of Barrine batch queue runs had to be restarted at some point, compared with 1.2% of NCI runs;
• 2011, 25.8% of Barrine batch queue runs had to be restarted at some point, compared with 0% of NCI runs; and
• 2012, 4.8% of Barrine batch queue runs had to be restarted at some point, compared with 0% of NCI runs.
WRF wind data were processed in the order 2011, 2010, then 2012, showing that over time Barrine became a more dependable platform in terms of WRF run-time failure, particularly after RCC staff implemented work-arounds in the batch queue submission script files. The issue of WRF model throughput on Barrine persisted throughout the project. In total, 720 individual WRF runs were completed using both Barrine and the NCI.

Weather Model Wind Output
The WRF models were configured to output weather parameters every 5 minutes during run time, in netCDF format. The 1 km spatial resolution inner domain model data files were retained on RCC storage systems so that the 90 m AGL wind data at specific wind turbine locations could subsequently be extracted. The WRF model runs were configured so that the 4th terrain-following level above the ground was set at 90 m AGL. Using wind data at the height that project stakeholders described as an average wind turbine hub height was important, as wind speed and direction differ depending on where they are measured relative to the ground (Banta et al. 2013). The effect of terrain friction on wind speed is shown in Figure 9, comparing 10 m and 90 m AGL wind speed and direction for one of the WRF domains. Note that the more laminar wind flow at height generally has a higher wind speed.
WRF wind output is retained as two Cartesian coordinate parameters, the U and V wind vector components. "U" wind is air motion in the "x" direction and "V" wind is air motion in the "y" direction (Stull 2000). These Cartesian wind vector components were converted to wind speed and direction during the wind data preparation processes, prior to ingestion into the Australian National Electricity Market Model, as described further in this report.
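The conversion from U and V to wind speed and direction follows standard meteorological conventions: speed is the magnitude of the vector, and direction is the bearing, clockwise from north, from which the wind blows. The sketch assumes U is eastward and V is northward; WRF's U and V are relative to the model grid, so on a projected grid they would first need rotating to Earth-relative components (a step not shown, which does not change the speed). The function names are illustrative.

```haskell
-- Horizontal wind speed (m/s) from the U (x, eastward) and V (y, northward)
-- components.
windSpeed :: Double -> Double -> Double
windSpeed u v = sqrt (u * u + v * v)

-- Meteorological wind direction in degrees clockwise from north, giving the
-- direction the wind blows FROM (0 = northerly, 90 = easterly, 270 = westerly).
windDirection :: Double -> Double -> Double
windDirection u v
  | u == 0 && v == 0 = 0 -- calm: direction undefined, reported as 0 here
  | otherwise =
      let d = atan2 (-u) (-v) * 180 / pi -- in (-180, 180]
      in if d < 0 then d + 360 else d
```

For example, `windSpeed 3 4` is 5 m/s, and `windDirection 5 0` (air moving eastward) is 270 degrees, a westerly.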
Infigen Energy Ltd provided some automatic weather station measured wind speed data from a site in the Woakwine Range region of South Australia at 37.49098 degrees South and 140.14863 degrees East. The data were 10-minute averaged wind speeds at 82 m AGL. We assume that the Infigen meteorological instruments were serviceable and calibrated during the period in which the wind speed data were collected. These measured wind speeds are compared with 5-minute interval instantaneous wind speeds extracted from a WRF model run at the same location. It is therefore important to note that the compared wind speed parameters are not the same quantity. Graphs and statistics for three five-day time series in January, April and July 2011 are shown in the following figures and table:
1. Figure 10: Regression analysis scatter plots for the three time series;
2. Table 6: Summary statistics for the three time series;
3. Figure 11: 14 to 18 January 2011, 5-day time series graph;
4. Figure 12: 12 to 16 April 2011, 5-day time series graph; and
5. Figure 13: 12 to 16 July 2011, 5-day time series graph.
The diurnal fluctuations of modelled and measured wind speed are similar on a trend basis, as seen in Figure 11 to Figure 13, although disparities in wind speed, sometimes as a "spike" and sometimes lasting a few hours, are evident in the graphs. The summary statistics comparing modelled with measured wind speed in Table 6 describe the extent of that variance across the time series.
Despite the modelled and measured wind speed parameters not being the same quantity, Figure 10 shows strong wind speed correlations for the five-day periods in January and July 2011 and a moderate correlation for April 2011.

Table 6: Summary statistics for the three time series, WRF 5-minute interval instantaneous U, V vector and wind speed compared with Infigen Energy Ltd provided measured 10-minute average wind speed at a location in the Woakwine Range region of South Australia. RMSE: root mean square error; R²: coefficient of determination (square of Pearson's correlation coefficient); MAPE: mean absolute percentage error.
The mean absolute percentage error (MAPE) shown in Table 6 measures the model's accuracy relative to the measured wind speed values, expressed as a percentage. The RMSE (root mean square error) values in Table 6 indicate that the WRF model is more accurate with respect to measured wind speed for January and July 2011, and less so for April 2011. Generally speaking, the 5-day time series graphs (Figure 11 to Figure 13) indicate that the WRF wind speed is consistent with the measured wind speed, at times overestimating and at times underestimating it.
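The three statistics in Table 6 can be computed as follows. This is a minimal sketch using the textbook definitions; it assumes the modelled and measured series have already been paired in time, a step that is not shown and that is not trivial here because 5-minute instantaneous values are being compared with 10-minute averages. The function names are illustrative.

```haskell
-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Root mean square error between modelled and measured series.
rmse :: [Double] -> [Double] -> Double
rmse model obs = sqrt (mean (zipWith (\m o -> (m - o) ^ (2 :: Int)) model obs))

-- Mean absolute percentage error (%) relative to the measured values;
-- undefined when any measured value is zero.
mape :: [Double] -> [Double] -> Double
mape model obs = 100 * mean (zipWith (\m o -> abs ((m - o) / o)) model obs)

-- Pearson's correlation coefficient r; R² is its square.
pearson :: [Double] -> [Double] -> Double
pearson xs ys = cov / sqrt (varX * varY)
  where
    mx = mean xs
    my = mean ys
    cov = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    varX = sum [(x - mx) ^ (2 :: Int) | x <- xs]
    varY = sum [(y - my) ^ (2 :: Int) | y <- ys]
```

R² is then the square of `pearson model obs`.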

Figure 11: Time series graphs, 14 to 18 January 2011. WRF 5-minute interval instantaneous wind speed compared with Infigen Energy Ltd provided measured 10-minute average wind speed at a location in the Woakwine Range region of South Australia

Figure 12: Time series graphs, 12 to 16 April 2011. WRF 5-minute interval instantaneous wind speed compared with Infigen Energy Ltd provided measured 10-minute average wind speed at a location in the Woakwine Range region of South Australia

Figure 13: Time series graphs, 12 to 16 July 2011. WRF 5-minute interval instantaneous wind speed compared with Infigen Energy Ltd provided measured 10-minute average wind speed at a location in the Woakwine Range region of South Australia
The time series graphs are in general agreement with Zhang, Pu and Zhang (2013), who examined near-surface winds for different WRF physics schemes and noted that errors in wind speed were largest at night and smallest in the afternoon. The authors state that, generally, WRF-modelled wind speed has no systematic biases from a long-term perspective.

Weather Model Data Extraction
The 5-minute interval, 90 m height wind speed data were extracted from the WRF inner domain netCDF output files at selected wind farm positions within each of the 20 different model geographic domains. For each wind farm site, data were extracted from between one and nine locations, depending on the size of the wind farm site and its geographic spread, to account for the wind speed variability likely to be experienced across a large wind farm. The average number of data extraction points across the whole data set is around three.
Data extraction routines were written to recover the 90 m AGL winds at each latitude/longitude point every five minutes. Since the WRF inner domain data files were over 20 GB in size, these routines took two days to run on a single Barrine CPU. Figure 14 is a map showing the mean geographic centre location of each wind farm in an example WRF inner domain from north-west Tasmania. In this case, three wind farms exist inside the WRF inner domain: two operating wind farms and one with planning permission. Figure 15 zooms into a section of Figure 14 to illustrate the actual number of wind data extraction points for those three wind farm sites, again as an example.

Figure 14: An example WRF inner domain (brown square), this one for NW Tasmania, with operating wind farm locations (red dots) and a planned wind farm location (blue dot)

Figure 15: A zoom into the example WRF inner domain of Figure 14, with data extraction points for the operating (red triangles) and planned (black dots) wind farms
For the 20 WRF inner domains, covering 37 operational and 78 planned wind farm locations with 305 data extraction points, 10,980 monthly data extraction files were created for the three-year period 2010 to 2012.
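The file count and the average number of extraction points per wind farm location follow directly from these figures, with one file per extraction point per month over three years:

```latex
305 \times 12 \times 3 = 10\,980 \ \text{files}, \qquad
\frac{305}{37 + 78} = \frac{305}{115} \approx 2.65 \ \text{points per wind farm location}
```

This is consistent with the average of around three extraction points per site noted earlier.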
Keeping only the WRF model inner domain files created a data store of some 180 TB.
Unfortunately, the project could keep only 90 TB of that collection due to the limits of the Barrine data storage facility. As a consequence, around half the WRF output files had to be deleted after wind data had been extracted from them.

Preparation of Wind Speed Data for ANEM Assimilation
The following paragraphs describe how the post-WRF 5-minute interval wind speed vectors were prepared for assimilation into the Australian National Electricity Market Model.
The latitude and longitude coordinates of representative clusters of Wind Turbine Generators (WTGs), identified from wind farm layouts in planning approval documents, formed the basis of the geographical locations for WRF data extraction points within each selected wind farm site. WRF data extracted at these coordinates relate to wind climatology results at 90 metres above ground level (AGL), taking into account the elevation and nature of the terrain surrounding these coordinates.

Nature of the Data
The wind speed data output by WRF consists of files organised into directories for each year, with separate directories for planned and operational wind farms. The years 2010 to 2012 inclusive were computed by the model and made available. Approximately 8 GB of data was provided in total, being 1 to 1.5 GB per directory (per year, operational or planned). For each year there were approximately 1,500 planned and 2,100 operational files (varying slightly from year to year).
Within the directories, one for planned and one for operational wind farms per year, separate files exist for each distinct combination of:
• month,
• location (i.e. the name of the wind farm), and
• point (there can be multiple points within a wind farm's location, corresponding to different latitude/longitude coordinates within the wind farm).
This information is encoded in the file names, which also include the year and state/model codes (information that is redundant given the directory and location names).
Each file contains 5-minute interval data consisting of wind speed vectors in metres per second (m/s). The data in the files is encoded in ASCII (i.e. plain text) and laid out in space-separated columns, with a first line giving the column names. The remaining 8,000 to 9,000 lines of each file contain 5-minute interval wind speed vector data, the number of lines depending on the number of 5-minute intervals in that month. Each line of data encodes:
• a human-readable timestamp in the current time zone,
• X, Y and Z location information,
• U and V wind speed vectors, and
• optionally, T (temperature) and P (pressure).
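A minimal sketch of a line parser for the layout described above, keeping only the fields that are used. It assumes, hypothetically, that the timestamp is a single space-free token (for example `2011-01-14_00:05:00`); if the real timestamps contain spaces, the pattern would need an extra field. The type `WindRecord` and the functions `parseLine` and `parseFile` are illustrative names, not the project's code.

```haskell
import Text.Read (readMaybe)

-- The fields of one data line that the processing actually uses.
data WindRecord = WindRecord
  { timestamp :: String -- kept as text; converting it to a time is a separate step
  , uWind :: Double     -- U component, m/s
  , vWind :: Double     -- V component, m/s
  } deriving (Show, Eq)

-- Parse one line laid out as: timestamp X Y Z U V [T P].
-- X, Y, Z and any trailing T, P columns are ignored. Garbage or incomplete
-- lines give Nothing, so they can be reported instead of aborting the run.
parseLine :: String -> Maybe WindRecord
parseLine line = case words line of
  (ts : _x : _y : _z : u : v : rest)
    | length rest `elem` [0, 2] -> WindRecord ts <$> readMaybe u <*> readMaybe v
  _ -> Nothing

-- Skip the header line, then parse every data line.
parseFile :: String -> [Maybe WindRecord]
parseFile = map parseLine . drop 1 . lines
```

Lines that do not parse yield `Nothing`, so garbage or incomplete lines can be counted and reported rather than silently dropped.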
The X, Y and Z location information is ignored, since this is implicit in the file name (and these values are invariant, on a per file basis). The P and T values are also ignored when present. However, the wind speed vectors U and V are used, as described below.

Problems in the Data
As initially provided, the data contained many inconsistencies and problems:
1) Problems with file names, such as:
a) files expected but missing,
b) extra files not expected, and
c) incorrectly named files (e.g. a 'typo' in a file name).
In some cases, issues (a) and (b) came in pairs due to issue (c).
2) Problems in the file contents, including:
a) garbage data or incomplete lines;
b) problems with data ranges:
i) start/stop time wrong,
ii) gaps in the data (not consecutive 5-minute intervals),
iii) data for the wrong time period entirely;
c) incorrectly formed timestamps (e.g. "201" ":06" instead of ":00").
The model developer reported that the problems with the file contents were in most cases due to WRF model restarts: premature termination of the WRF model software is relatively common because of the very long run times it requires. Restarts of the model, in which WRF continues writing to a file containing existing data from a previously aborted run, were the cause of many of the issues above.
Most of the issues above had to be fixed by re-generating the data using WRF in order to generate the desired wind speed averages. In particular, 5-minute interval data needed to be present in each file for at least the entire time period of interest, without any gaps; otherwise, some interpolation strategy would be required. Similarly, all the required files had to exist over the time range of interest; that is, all per-month files needed to exist for each location and point, otherwise there would be gaps in the data for that year. Some problems were more innocuous, with simple fixes or workarounds, such as 2(c) above, where it was clear that ":00" was intended in all cases.
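Checks of the kind needed to detect these problems can be sketched compactly. Assuming, hypothetically, that each timestamp has already been converted to minutes since an arbitrary origin, the function below classifies every step between consecutive records as a gap, an overlap (time repeating or going backwards, as when WRF resumes writing after a restart), or a misaligned step off the 5-minute grid (as with ":06" timestamps). The names are illustrative, not the project's code.

```haskell
-- A problem found between two consecutive records (times in minutes).
data Issue
  = Gap Int Int        -- one or more 5-minute intervals missing between these times
  | Overlap Int Int    -- time repeats or goes backwards (e.g. after a WRF restart)
  | Misaligned Int Int -- step is not a multiple of 5 minutes
  deriving (Show, Eq)

-- Check that timestamps advance in exact 5-minute steps.
checkSteps :: [Int] -> [Issue]
checkSteps ts = concat (zipWith classify ts (drop 1 ts))
  where
    classify a b
      | b - a == 5 = []
      | b <= a = [Overlap a b]
      | (b - a) `mod` 5 == 0 = [Gap a b]
      | otherwise = [Misaligned a b]

-- Expected number of 5-minute records in a month of the given length.
expectedRecords :: Int -> Int
expectedRecords days = days * 24 * 12
```

`expectedRecords` gives the line count to expect for a complete month: 8,064 records for a 28-day month and 8,928 for a 31-day month, consistent with the 8,000 to 9,000 lines per file noted earlier.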

Data Processing -Software Tools
Two categories of tools were used to manipulate the WRF data:
• standard text file manipulation tools, and
• custom software written in the Haskell 2010 programming language (Marlow 2010).
The use of commonly available text manipulation tools such as text editors and text search tools (e.g. 'grep') hardly requires explanation; one benefit of working with text files is that these well-engineered and familiar tools can be applied.
Custom software was also required in this instance because (a) no pre-existing software performs the exact computations required and, more importantly, (b) the large size of the data set demands careful attention to execution performance. Achieving good performance with large data sets requires detailed control over memory and other resource usage, i.e. the full generality of a programming language in which memory allocation behaviour can be controlled.
Whenever a dataset exceeds the size of random access memory, processing it as a single unit is infeasible; instead it must be processed progressively, bit-by-bit, as a 'stream'. The satellite dataset will also grow over time, since a new year's worth of satellite data can be generated easily enough using WRF, which makes the streaming approach necessary. Where possible, the computation should function correctly regardless of the size of the data and the physical memory available (within sensible limits, of course), and its resource consumption should not grow with the size of the input.
When processing data as a 'stream', the data should ideally be processed in a single pass, i.e. read only once. In this case the nature of the calculations required (both to compute wind speed averages and for the various checking algorithms) allowed this.
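The single-pass, constant-memory idea can be illustrated with a minimal sketch. The paper's tools relied on Haskell's laziness; this Python version uses generators instead, and the one-value-per-line input format and the function names are hypothetical. Memory use stays constant because only running totals are kept, however long the input is.

```python
import io
from typing import Iterable, Iterator, Tuple


def wind_values(lines: Iterable[str]) -> Iterator[float]:
    """Lazily parse one number per line; nothing is accumulated in memory."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield float(line)


def streaming_summary(values: Iterable[float]) -> Tuple[int, float, float, float]:
    """Count, mean, min and max in a single pass using O(1) memory."""
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for v in values:
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    if count == 0:
        raise ValueError("empty stream")
    return count, total / count, lo, hi


# io.StringIO stands in for a large file opened with open(path); a file object
# is itself a lazy iterator over lines, so the whole file is never loaded.
print(streaming_summary(wind_values(io.StringIO("4.0\n6.0\n\n8.0\n"))))
# (3, 6.0, 4.0, 8.0)
```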
For the custom software development, Haskell 2010 was chosen as the language, using the industry-standard Glasgow Haskell Compiler (GHC) (HaskellWiki 2015). Haskell is:
• cross-platform: available on Windows, Mac OS X, Linux and other operating systems, allowing access to a wide array of computational resources,
• compiled, for high execution performance,
• lazy: ideally suited to the elegant processing of arbitrarily large data sets using low and constant memory,
• (advanced) statically typed, which allows rapid development of robust, correct software, and
• extremely concise, making the writing, reading and editing of code particularly efficient (i.e. code is substantially shorter than would typically be the case in other languages).
Haskell and the GHC have been in development since the late 1980s as the focus of much academic interest in lazy, functional languages. In recent years, Haskell has been increasingly used in general research as well as in settings outside academia, for a variety of commercial and industrial applications, motivated by the benefits above, as well as others.

Data Processing - Hardware
The large WRF data set required appropriate selection of computer hardware for running data checks, data extraction and calculations. In this project these processes are not especially computationally (i.e. CPU) intensive; rather, computation involving the WRF data is for the most part input/output (I/O) bound. Thus, standard commodity Intel processors were used, one core at a time, on both Windows and OS X platforms, as convenient.
Selection of the storage medium for the WRF data was a more performance-critical consideration, so the data was stored and manipulated on a solid state drive (SSD) in all cases. SSD technology provides substantially better performance than traditional rotating hard disk drives (HDDs), and at 8 GB the WRF data set was well within the capacity of typical SSDs.

Data Processing - Data Checks
Given the problems in the data identified above in section 4.3, it was necessary to develop custom tools formalising a number of checks that could be applied efficiently to successive versions of the WRF data files. The sheer number of files, as well as the number of lines per file, made any systematic manual inspection of the dataset impractical. Even if such an inspection could be completed in reasonable time, human error rates would result in a significant number of errors. Automation was essential: only with the aid of the automated checking tools developed could the problems listed above be precisely characterised and quantified.
These tools not only helped identify both the nature and the extent of the problems enumerated in section 4.3, but also provided a gatekeeping function through formalised, automated acceptance testing of a WRF data set as trouble-free and ready for further processing. The checking tools were designed to produce reports with meaningful error messages that clearly identify the cause and are readable by the WRF model developer.
The automated checks implemented were:
a. all files are present as expected, no extra files are found, and the file names are structured as expected,
b. the start and stop times for the files are correct, or at least inclusive of the time range of interest,
c. timestamps are contiguous, without gaps, and
d. the contents of the wind files are formatted as expected, i.e. each file is parsed down to the individual character level using a formally specified grammar for the file format, to ensure exact compliance with the expected format.
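Checks (b) and (c) can be sketched as follows. This is an illustration in Python rather than the Haskell actually used; the function name, the message wording and the assumption that timestamps arrive already parsed as `datetime` values are all hypothetical. Because it consumes any iterable, it fits the single-pass streaming approach, and it also catches the overlaps that a WRF restart appending to an existing file would create.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

STEP = timedelta(minutes=5)


def check_timestamps(stamps: Iterable[datetime], start: datetime, stop: datetime) -> List[str]:
    """Return readable problem reports; an empty list means the file passes.

    Check (b): the series covers [start, stop].
    Check (c): consecutive timestamps are exactly 5 minutes apart
    (gaps and overlaps/duplicates are both reported).
    """
    problems: List[str] = []
    first = prev = None
    for lineno, cur in enumerate(stamps, start=1):
        if prev is None:
            first = cur
        elif cur - prev != STEP:
            problems.append(f"line {lineno}: expected {prev + STEP}, found {cur}")
        prev = cur
    if first is None:
        return ["no timestamps found"]
    if first > start:
        problems.append(f"starts late: first timestamp {first} is after {start}")
    if prev < stop:
        problems.append(f"ends early: last timestamp {prev} is before {stop}")
    return problems


t0 = datetime(2010, 1, 1)
good = [t0 + k * STEP for k in range(4)]      # 00:00 .. 00:15
print(check_timestamps(good, t0, good[-1]))    # []
gappy = good[:2] + good[3:]                    # 00:10 missing
print(check_timestamps(gappy, t0, good[-1]))
# ['line 3: expected 2010-01-01 00:10:00, found 2010-01-01 00:15:00']
```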
The advantage of using a formally specified grammar within a full parser in (d) above is that errors are reported completely consistently:
a. correctly formed files always check out as OK, and
b. incorrectly formed files always report a readable, well-located and helpful error.
By design there can be no false negatives or false positives, which typically occur when working with ad hoc or generic code for reading specific file formats.
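The grammar-based check (d) can be sketched as a small character-level parser. The actual WRF output format is not given here, so the line grammar below ("YYYY-MM-DD HH:MM:00,U,V") is a hypothetical assumption, as are all names; the point is that every line either parses exactly or fails with the line and column of the first unexpected character.

```python
import re
from typing import NoReturn, Tuple

# Hypothetical line grammar, assumed for illustration only:
#   line   := date ' ' time ',' number ',' number
#   date   := d d d d '-' d d '-' d d
#   time   := d d ':' d d ':00'
#   number := ['-'] d+ '.' d+
NUMBER = re.compile(r"-?[0-9]+\.[0-9]+")
DIGITS = "0123456789"
TIMESTAMP = (4, "-", 2, "-", 2, " ", 2, ":", 2, ":00")  # ints = digit counts


class ParseError(Exception):
    pass


def parse_line(text: str, lineno: int) -> Tuple[str, float, float]:
    """Parse one line (without its newline) exactly, or raise a located error."""
    pos = 0

    def fail(expected: str) -> NoReturn:
        found = repr(text[pos]) if pos < len(text) else "end of line"
        raise ParseError(f"line {lineno}, column {pos + 1}: expected {expected}, found {found}")

    def digits(n: int) -> None:
        nonlocal pos
        for _ in range(n):
            if pos >= len(text) or text[pos] not in DIGITS:
                fail("a digit")
            pos += 1

    def literal(s: str) -> None:
        nonlocal pos
        for ch in s:  # character by character, so errors point at the exact column
            if pos >= len(text) or text[pos] != ch:
                fail(repr(ch))
            pos += 1

    def number() -> float:
        nonlocal pos
        m = NUMBER.match(text, pos)
        if m is None:
            fail("a decimal number")
        pos = m.end()
        return float(m.group())

    for part in TIMESTAMP:
        if isinstance(part, int):
            digits(part)
        else:
            literal(part)
    stamp = text[:pos]
    literal(",")
    u = number()
    literal(",")
    v = number()
    if pos != len(text):
        fail("end of line")
    return stamp, u, v


print(parse_line("2010-01-01 00:05:00,3.5,-1.25", 1))
# ('2010-01-01 00:05:00', 3.5, -1.25)
```

Applied to a malformed timestamp such as "2010-01-01 00:05:06,3.5,-1.25" on line 7, the sketch raises `ParseError` with the message "line 7, column 19: expected '0', found '6'".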

Data Processing - Data Extraction
Once the syntax of the data files had been rigorously checked, a much faster reading algorithm could be used for extraction, since it can assume well-formed input. This required significantly less CPU time than using a parser, or any other approach involving checking or error tolerance.
The raw WRF data extraction proper produced a stream of pairs of double-precision floating-point numbers (the U and V wind components), one pair for each five-minute interval, for each point within each location, on a monthly and yearly basis, i.e. structured exactly like the WRF data.
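The fast extraction step can be sketched as a reader that does only the work needed to pull out the two numbers. The "timestamp,U,V" line layout and the function name are hypothetical; the design choice is that this reader has no error recovery at all, which is acceptable only because every file has already passed the strict checks.

```python
from typing import Iterable, Iterator, Tuple


def fast_pairs(lines: Iterable[str]) -> Iterator[Tuple[float, float]]:
    """Yield the (U, V) pair from each line of an already-validated file.

    Assumes a hypothetical 'timestamp,U,V' layout. Malformed input is NOT
    detected here; float() tolerates the trailing newline on the last field.
    """
    for line in lines:
        _, u, v = line.split(",")
        yield float(u), float(v)


lines = ["2010-01-01 00:00:00,3.0,4.0\n", "2010-01-01 00:05:00,-1.5,2.0\n"]
print(list(fast_pairs(lines)))  # [(3.0, 4.0), (-1.5, 2.0)]
```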

Computation of Average Wind Speeds
The calculation of average wind speed per windfarm involved two conceptually simple steps: (1) conversion of the U and V wind vector components to a wind speed (i.e. a straightforward application of Pythagoras' theorem), and (2) averaging of the wind speeds over all selected points (i.e. latitude and longitude coordinates) within the windfarm site boundary.
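The two steps can be written down directly. In this minimal sketch (function names hypothetical), each point's components are converted to a speed before averaging; averaging the U and V components first would understate the speed whenever points have different wind directions, as the last example shows.

```python
import math
from typing import Sequence, Tuple


def wind_speed(u: float, v: float) -> float:
    """Horizontal wind speed (m/s) from U and V components: sqrt(u^2 + v^2)."""
    return math.hypot(u, v)


def farm_average(components: Sequence[Tuple[float, float]]) -> float:
    """Mean wind speed over a windfarm's selected points for one 5-minute interval."""
    if not components:
        raise ValueError("windfarm has no selected points")
    return sum(wind_speed(u, v) for u, v in components) / len(components)


print(wind_speed(3.0, 4.0))                      # 5.0
print(farm_average([(3.0, 4.0), (6.0, 8.0)]))    # 7.5
print(farm_average([(5.0, 0.0), (-5.0, 0.0)]))   # 5.0 (speeds, not components, are averaged)
```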
Recall that the latitude and longitude coordinates represent both the locations of representative WTG clusters within the windfarm, determined from WTG layout plans, and the pre-selected extraction points for WRF simulation purposes. Because the averaging occurs across different pre-selected clusters of WTGs within each windfarm, the average wind speeds represent a five-minute averaged wind climatology profile; they clearly do not reflect or replicate wind prospecting as conventionally conceived in windfarm development.
While the calculations involved are mathematically trivial, the software implementation required to perform them was not: some subtlety of design was needed to achieve adequate performance with the large WRF dataset. To keep memory consumption low by 'streaming' wind speeds per 5-minute interval, and to keep run-time reasonable by making just a single pass over the data, the averages had to be computed interval by interval, incrementally accessing whichever WRF data files were required for the various points of a location.
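The interval-by-interval design can be sketched by advancing one stream per point in lockstep. This is an illustration under assumptions: the names are hypothetical, and in a real pipeline each inner iterable would be a lazy reader over one per-month file rather than the small in-memory lists used here. Only one interval per point is held at a time, each stream is read once, and streams of unequal length (a symptom of missing data) raise an error instead of being silently truncated.

```python
import math
from itertools import chain, zip_longest
from typing import Iterable, Iterator, Sequence, Tuple

Pair = Tuple[float, float]
_MISSING = object()  # sentinel marking an exhausted stream


def interval_averages(point_streams: Sequence[Iterable[Pair]]) -> Iterator[float]:
    """Yield a location's mean wind speed, one 5-minute interval at a time."""
    for step in zip_longest(*point_streams, fillvalue=_MISSING):
        if any(p is _MISSING for p in step):
            raise ValueError("point streams have different lengths")
        yield sum(math.hypot(u, v) for u, v in step) / len(step)


def point_stream(monthly_files: Iterable[Iterable[Pair]]) -> Iterator[Pair]:
    """Hypothetical helper: lazily concatenate one point's per-month sources."""
    return chain.from_iterable(monthly_files)


p1 = point_stream([[(3.0, 4.0)], [(0.0, 2.0)]])  # January, February
p2 = point_stream([[(6.0, 8.0)], [(0.0, 4.0)]])
print(list(interval_averages([p1, p2])))          # [7.5, 3.0]
```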
The average wind speed data output was organised in two different ways: (1) wind speeds by windfarm (i.e. WRF locations according to latitude and longitude coordinates within the windfarm), and (2) wind speeds by AEMO DUID (for both non-scheduled and semi-scheduled windfarms).
The data was output in CSV (comma-separated values) file format, one file per year. Each CSV file has a first column of consecutive 5-minute interval timestamps, followed by one column per WRF location (or AEMO DUID). Wind speeds are double-precision floating-point numbers in metres per second (m/s). Since the desired average wind speeds were at 5-minute intervals, no change to the time base was required.
WRF windfarm names and AEMO DUIDs do not correspond one-to-one. In some instances, more than one WRF windfarm location (reflecting different WTG clusters within a windfarm boundary) corresponds to a single AEMO DUID. Consequently, an explicit one-to-many mapping from each DUID to WRF farm names was constructed and applied, with wind speeds averaged across multiple WRF locations where needed for a particular AEMO windfarm DUID.
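The DUID mapping and the per-year CSV layout described above can be sketched together. The DUIDs, WRF location names and the "timestamp" header below are hypothetical placeholders, not real AEMO identifiers. When writing to a real file, Python's `csv` documentation recommends opening it with `newline=""`.

```python
import csv
import io
from typing import Dict, Iterable, List, Mapping, Sequence, TextIO


def duid_speeds(row: Mapping[str, float],
                duid_to_locations: Mapping[str, Sequence[str]]) -> Dict[str, float]:
    """Average one interval's per-location speeds onto each DUID (1:many map)."""
    out = {}
    for duid, locations in duid_to_locations.items():
        speeds = [row[loc] for loc in locations]  # KeyError exposes a bad mapping
        out[duid] = sum(speeds) / len(speeds)
    return out


def write_year_csv(out: TextIO, timestamps: Iterable[str],
                   rows: Iterable[Mapping[str, float]], columns: List[str]) -> None:
    """One row per 5-minute interval: timestamp, then one column per DUID."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", *columns])
    for ts, row in zip(timestamps, rows):
        writer.writerow([ts, *(row[c] for c in columns)])


# Hypothetical mapping: DUID_A covers two WRF locations (WTG clusters).
mapping = {"DUID_A": ["FarmA_North", "FarmA_South"], "DUID_B": ["FarmB"]}
wrf_row = {"FarmA_North": 7.0, "FarmA_South": 9.0, "FarmB": 6.5}
by_duid = duid_speeds(wrf_row, mapping)
print(by_duid)  # {'DUID_A': 8.0, 'DUID_B': 6.5}

buf = io.StringIO()
write_year_csv(buf, ["2010-01-01 00:05"], [by_duid], ["DUID_A", "DUID_B"])
print(buf.getvalue(), end="")
# timestamp,DUID_A,DUID_B
# 2010-01-01 00:05,8.0,6.5
```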

Calculating Wind Turbine Generator (WTG) Power from Average Wind Speeds
The average wind speeds calculated in the previous section form the basis for calculating the MW output of each windfarm. The initial calculation involves determining the MW output of a single representative WTG in a windfarm for each five-minute average wind speed in each consecutive five-minute period of the years 2010, 2011 and 2012. The MW output is read off an appropriate WTG power curve for a given average wind speed in meters per second (m/s). Because the choice of WTG can differ from windfarm to windfarm, and even within a single windfarm, different WTG power curves were used to calculate WTG output traces.

WTG Power Curves and Output
Information on WTG power curves was sourced from several resources. The first was published power curves in Excel files available from Idaho National Laboratory (INL 2015). The second was power curves available with the Windpower Program (Bradbury 2015). The third was power curves available with the WASP Wind Flow Modelling Program (Jacobsen 2015). Finally, for any WTG not listed in these three sources, searches of the WTG manufacturers' web sites usually yielded power curves in sales and technical documents outlining the technical characteristics of the WTG.
The source power curves typically express the kW output of the WTG on the vertical axis against wind speed on the horizontal axis, in increments of one meter per second. The kW output typically becomes positive at about 3 m/s and then increases in a nonlinear but smooth manner up to about 13 m/s. The output then plateaus at the rated kW capacity for wind speeds between 13 and 25 m/s, before the WTG shuts down at higher wind speeds to protect it from damage due to excessively high winds (for reference, 25 m/s corresponds to 90 km/h). The rated output is typically maintained by varying the pitch of the blades for wind speeds above 13 m/s. A typical WTG power curve is presented in Figure 16. Note the smooth increase in WTG output over wind speeds from 3 to 13 m/s (read along the horizontal axis) before the output plateaus at 2000 kW (2 MW) for wind speeds between 13 and 25 m/s.
The WTG power curves are 'standard' power curves developed assuming that the WTGs run at standard air pressure and temperature, that is, at sea-level atmospheric pressure and 15 degrees Celsius, which gives the standard dry air density of 1.225 kilograms per cubic metre (kg m⁻³) used in calculating the energy content of the wind. Deviations from these standard conditions change the energy content of the wind and hence WTG output. In particular, air density decreases as temperature and altitude increase, so both the energy content of the wind and WTG output decline relative to the standard power curves. Thus, at the above-sea-level altitudes typical in Australia, the standard power curves would tend to overestimate WTG output. Furthermore, to the extent that temperatures exceed 15 degrees Celsius (e.g. in summer), the standard power curves would also tend to overestimate WTG output, especially at wind speeds between 3 and 13 m/s. In this context, higher temperatures (above 15 degrees Celsius) and higher altitudes (above sea level) reinforce each other, increasing the overestimation. Conversely, to the extent that temperatures are below 15 degrees Celsius (e.g. in winter), the power curves would tend to underestimate WTG output, especially at wind speeds between 3 and 13 m/s, although this might be partially or even fully offset at altitudes significantly above sea level. In this latter case, the temperature and altitude effects clearly work against each other.
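The direction and rough size of these effects follow from two standard physical relations (textbook results, not taken from this study): the ideal gas law for dry air, with specific gas constant R_d, and the kinetic power of wind at speed v through a rotor swept area A. The first reproduces the standard density quoted above; the second shows that, below rated output, the energy content of the wind scales linearly with density.

```latex
\rho = \frac{p}{R_d\,T}, \qquad
\rho_0 = \frac{101\,325\ \mathrm{Pa}}{287.05\ \mathrm{J\,kg^{-1}\,K^{-1}} \times 288.15\ \mathrm{K}}
       \approx 1.225\ \mathrm{kg\,m^{-3}}

P_{\mathrm{wind}} = \tfrac{1}{2}\,\rho\,A\,v^{3}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{wind}}(\rho)}{P_{\mathrm{wind}}(\rho_0)} = \frac{\rho}{\rho_0}
```

For example, at sea-level pressure and 25 degrees Celsius (T = 298.15 K) the density is about 1.184 kg m⁻³, roughly 3% below standard, so the energy content of the wind at a given speed is about 3% lower. These relations only make the bias concrete; they are not a correction applied in this study.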
A cubic spline technique was used to disaggregate the horizontal axis of the standard power curves so that it was incremented in steps of 0.05 m/s instead of the more aggregated 1 m/s steps displayed in the standard power curve reproduced in Figure 16. An Excel add-in called XlXtrFun™ (ASDD 2015) was used to perform the cubic spline interpolations. This meant extending the wind speed axis from 31 bins (i.e. from 0 to 30 m/s in increments of 1 m/s) to 601 bins (i.e. from 0 to 30 m/s in increments of 0.05 m/s) in the interpolated power curve. An example of an interpolated power curve for the Vestas V90-2000 WTG is contained in Figure 17. The output of a representative WTG in each windfarm was calculated for each five-minute averaged wind speed using the Excel 'VLOOKUP' function to read the kW output value corresponding to that wind speed. If the five-minute average wind speed did not exactly match a wind speed on the interpolated power curve, the next lowest interpolated wind speed was chosen. This approach is conservative, since it produces a slightly lower WTG output, especially for wind speeds in the range 3 to 13 m/s, than would be obtained if the actual and interpolated wind speed values matched exactly.
c. Advanced planning (+all the windfarms above)
d. Less advanced planning (+all the windfarms above)
e. Least advanced planning (+all the windfarms above)
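The spline disaggregation and the VLOOKUP-style "next lowest" lookup described in the power curve paragraph above can be sketched in Python under several stated assumptions. The spline end conditions used by XlXtrFun™ are not given here, so a natural cubic spline (zero second derivative at both ends) is assumed; clamping outputs to [0, rated] to suppress spline overshoot and setting output to zero above cut-out are choices of this sketch; and the 1 m/s curve values are hypothetical, not the curves shown in Figures 16 and 17. `lookup_kw` mimics VLOOKUP's approximate match with `bisect`, taking the largest grid speed not exceeding the observed speed.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (xs, ys); xs strictly increasing, len >= 3."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends m[0] = m[n] = 0
    # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    for k in range(1, n - 1):  # forward elimination
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    sol = [0.0] * (n - 1)
    sol[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):  # back substitution
        sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
    m[1:n] = sol

    def evaluate(x: float) -> float:
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # segment containing x
        a, b, hi = xs[i + 1] - x, x - xs[i], h[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * a
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * b)
    return evaluate


def interpolated_curve(speeds: Sequence[float], kw: Sequence[float], cut_out: float,
                       step: float = 0.05, max_speed: float = 30.0) -> Tuple[List[float], List[float]]:
    """Tabulate the curve on a fine grid (601 bins for 0-30 m/s at 0.05 m/s)."""
    spline = natural_cubic_spline(speeds, kw)
    rated = max(kw)
    n_bins = int(round(max_speed / step)) + 1
    grid = [round(k * step, 2) for k in range(n_bins)]  # 2 decimals suit 0.05 steps
    table = [0.0 if v > cut_out else min(max(spline(v), 0.0), rated) for v in grid]
    return grid, table


def lookup_kw(grid: Sequence[float], table: Sequence[float], wind_speed: float) -> float:
    """Approximate match: output at the largest grid speed <= wind_speed."""
    i = bisect_right(grid, wind_speed) - 1
    return table[i] if i >= 0 else 0.0


# Hypothetical 2000 kW curve at 1 m/s steps from 0 to 25 m/s (cut-out).
speeds = list(range(26))
kw = [0, 0, 0, 0, 70, 180, 340, 560, 840, 1170, 1500, 1760, 1920] + [2000] * 13
grid, table = interpolated_curve(speeds, kw, cut_out=25.0)
print(len(grid))                              # 601
print(round(lookup_kw(grid, table, 10.03)))   # 1500 (uses the 10.00 m/s bin)
print(lookup_kw(grid, table, 26.0))           # 0.0 (above cut-out)
```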

Wind generation scenarios
Classifying windfarms into Scenarios C, D or E takes into account planning approval, agreements with transmission companies, and whether the windfarm is planned by one of the three large vertically integrated retailers: AGL Energy, Energy Australia or Origin Energy. For these three vertically integrated retailers, the power purchase agreements (PPAs) required for financial closure are more readily obtained. Finally, windfarms that met the first two conditions and were located with little or no other generation nearby were also deemed to be in a more solid financial position. Scenario E includes a small set of larger windfarms added by the authors, either because their proposed size made obtaining PPAs potentially more difficult (Ceres, Liverpool Range and Cattle Hill windfarms) or because they were judged to be further behind in planning approval (Crow's Nest and Yass Valley windfarms). A key reason for their inclusion was their interesting nodal locations and the relatively small presence at these nodes of existing or proposed wind generation in the categories above. Examples include the Tarong node in QLD, the Wellington and Yass nodes in NSW, the Ceres windfarm in the Adelaide node in SA and the Cattle Hill windfarm in the Liapootah node in southern TAS.

Scenario C: Advanced planning (+all the windfarms above)
Ararat: VESTAS V112-3000
Barn Hill: VESTAS V112-3000
Boco Rock 2: GE 1.6-100 (7)
Note that in Table 7, if a windfarm has more than one type of WTG, the number of each type is shown beside each WTG type; see, for example, the Cullerin Range and Boco Rock Stage 1 and 2 windfarms. We assume a Vestas V112-3000 WTG for all planned windfarms in Scenarios C to E. Exceptions include Mortlake South and Ceres, whose developers are WTG manufacturers. Additionally, the WTGs for Boco Rock 2 reflect those installed at Boco Rock 1, which is nearing completion.
In the last column of Table 7, we document where other windfarm sites were used as a proxy for the wind climatology at a particular windfarm site. We adopted this approach for two reasons: (1) there was a trade-off between the time and memory required both to run WRF simulations and to store their results; and (2) some windfarms appeared less likely at the time of selecting WRF sites than is currently the case. Of the cases listed in Table 7, the proxies used for Boco Rock, Cattle Hill, Forsayth, Keyneton and Port Lincoln are the most questionable, given their relatively large distance from the proxied windfarms listed in column 3.

Calculating Total Windfarm Output
Total windfarm output was calculated by aggregating the representative WTG output discussed in Section 5.2 across the number of WTGs in a windfarm, giving the total average five-minute MW output of the windfarm. In doing so, account was taken of whether more than one WTG type was installed at a windfarm. If so, the output of each type of representative WTG (read off its own power curve) was multiplied by the number of WTGs of that type, and these per-type totals were added together to derive the total output of the windfarm. Finally, when performing the aggregations, we assumed that all WTGs were operational and available to supply power.
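The aggregation amounts to a count-weighted sum over WTG types for each five-minute interval. A minimal sketch, with hypothetical type names and values, and assuming (as in the text) that every WTG is available:

```python
from typing import Mapping


def windfarm_mw(kw_per_wtg: Mapping[str, float], wtg_counts: Mapping[str, int]) -> float:
    """Total windfarm output (MW) for one 5-minute interval.

    kw_per_wtg: output (kW) of one representative WTG of each type, read off
    that type's power curve; wtg_counts: number of WTGs of each type.
    """
    total_kw = sum(count * kw_per_wtg[wtg] for wtg, count in wtg_counts.items())
    return total_kw / 1000.0  # kW -> MW


# Hypothetical mixed farm: 60 WTGs of one type and 7 of another.
print(windfarm_mw({"TypeA": 1500.0, "TypeB": 1200.0}, {"TypeA": 60, "TypeB": 7}))  # 98.4
```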
As a logical check on total windfarm output, the capacity factor of each windfarm was calculated by summing the five-minute total output of the windfarm and dividing this by the product of its maximum MW capacity and the number of five-minute intervals in each year (i.e. 105120 for 2010 and 2011, and 105408 for the leap year 2012). If the calculated capacity factors seemed unreasonably high or low, the average wind speed profile was decreased or increased by multiplying it by a scalar less than or greater than unity, respectively, until the capacity factor became more reasonable. This operation shifted the whole yearly five-minute average wind speed profile downwards or upwards by a constant proportion, without affecting the relative distribution of 'windy' and 'non-windy' periods over the year as determined by the synoptic patterns modelled by WRF. That distribution reflects the underlying weather patterns and dynamics captured by the WRF model itself, which determine the meteorological features that produce the 'windy' and 'non-windy' periods over the year.
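The capacity factor check and the effect of the scalar can be sketched as follows. The toy 100 MW farm, its linear ramp and the constant 8 m/s year are hypothetical inputs chosen so that the arithmetic is easy to follow. The sketch only evaluates a given factor; in the study the factor was adjusted by judgement until the capacity factor looked reasonable, and no automated search is implied.

```python
from typing import Callable, Iterable

INTERVALS_PER_DAY = 24 * 12  # 288 five-minute intervals


def intervals_in_year(year: int) -> int:
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return INTERVALS_PER_DAY * (366 if leap else 365)


def capacity_factor(mw_series: Iterable[float], capacity_mw: float, year: int) -> float:
    """Sum of 5-minute MW outputs / (capacity x number of intervals in the year)."""
    return sum(mw_series) / (capacity_mw * intervals_in_year(year))


def scaled_capacity_factor(speeds: Iterable[float], factor: float,
                           farm_mw: Callable[[float], float],
                           capacity_mw: float, year: int) -> float:
    """Capacity factor after multiplying every wind speed by `factor`.

    Scaling is proportional, so the timing of windy and calm periods is kept.
    """
    return capacity_factor((farm_mw(factor * v) for v in speeds), capacity_mw, year)


def toy_farm_mw(speed: float) -> float:
    """Hypothetical 100 MW farm: linear ramp 3-13 m/s, rated up to 25 m/s cut-out."""
    if speed < 3.0 or speed > 25.0:
        return 0.0
    return 100.0 * min((speed - 3.0) / 10.0, 1.0)


print(intervals_in_year(2010), intervals_in_year(2012))  # 105120 105408
speeds_2010 = [8.0] * intervals_in_year(2010)
print(scaled_capacity_factor(speeds_2010, 1.0, toy_farm_mw, 100.0, 2010))            # 0.5
print(round(scaled_capacity_factor(speeds_2010, 0.9, toy_farm_mw, 100.0, 2010), 4))  # 0.42
```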
Heuristically, the above approach can be viewed as mimicking a Bayesian updating methodology. We attach a high degree of confidence to the ability of the WRF model to forecast the broad-scale synoptic features, and their paths, that produce periods with significant wind resources, such as low-pressure fronts. Thus we have high confidence in the ability of the model to predict the time periods when 'windy' conditions prevail at each of the WRF sites throughout the year. However, predicting actual wind speeds at particular WRF latitude and longitude coordinates is a more complex problem, because of the much finer geographic scale involved and the more complex interactions between local and broader-scale synoptic features and with local terrain. As such, there is more scope for over- or under-estimation of actual wind speeds at such fine scales, and the scaling methodology proposed above can be used to fine-tune the finer-scale wind climatology results to produce more reasonable windfarm output projections. In this context, the capacity factor results act as a type of prior that is used to inform the actual wind speed predictions generated by the WRF model at the WRF extraction points.
More generally, in applying this scaling operation, information about the actual capacity factors of operational windfarms was taken into account where possible to judge whether the capacity factors from the WRF wind climatology results seemed reasonable. Thus, the capacity factors of operational windfarms were generally scaled to match their actual capacity factors, obtained from their 2009-2014 AEMO dispatch records. This approach is further supported by the general belief that transmission bottlenecks did not lead to significant wind spillage over this period. This information was also used to inform decisions about proposed windfarms located near existing windfarms, and it provided upper bounds for proposed windfarms located in other regions.
The capacity factors of proposed windfarms are expected to be slightly higher on average than those of existing operational ones, because the greater height of the V112-3000 (and Ceres 3.4 MW) WTGs, in particular, allows more power to be produced at given wind speeds than is the case with the older, shorter WTG models installed at operational windfarms. Finally, the last column displays annual GWh production values. These were determined by multiplying the average capacity factor by the MW capacity of each windfarm and the number of hours in a year (assumed to be 8760), and then dividing the result by 1000 to convert from MWh to GWh. Table 8 shows that the average capacity factors for the proposed windfarms listed in Scenarios C to E are above those of the operational windfarms in Scenario B. For example, the average capacity factor for operational windfarms was determined to be 0.3658, rising to 0.3939 for the newer windfarms under construction. For the windfarms listed in Scenarios C, D and E the average capacity factor was 0.3958, 0.4075 and 0.3931, respectively, and the overall average across all windfarms in the study was 0.3874. Recall that one reason for the increase in capacity factor was the trend towards choosing taller WTG types, which have a larger 'wind field fetch' and can produce more power at a given wind speed. There is also a tendency to install WTGs with a higher rated capacity than was the case, especially, with older operational windfarms. More generally, technological innovations also improve the efficiency of later WTG models relative to earlier ones, especially over wind speeds between 3 and 13 m/s, increasing capacity factors.
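The GWh conversion is a one-line formula. As illustrative arithmetic only, the sketch applies it to the Waterloo capacity factor (0.3465) and capacity (111 MW) quoted elsewhere in this section; the printed figure is computed from those inputs, not read from Table 8.

```python
HOURS_PER_YEAR = 8760  # as in the text, used for every year including 2012


def annual_gwh(capacity_factor: float, capacity_mw: float, hours: float = HOURS_PER_YEAR) -> float:
    """Annual energy (GWh) = capacity factor x MW capacity x hours / 1000."""
    return capacity_factor * capacity_mw * hours / 1000.0  # MWh -> GWh


print(round(annual_gwh(0.3465, 111), 1))  # 336.9
```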

Assessment of Windfarm Capacity Factors
Inspection of the average capacity factors of proposed windfarms in new regions also highlights some interesting outcomes. First, Far North QLD seems to have very good wind resources, with average capacity factors of between 0.3997 (Forsayth) and 0.4495 (Mt Emerald). The only operational windfarm in this region, Windy Hill 1, had an average capacity factor of 0.3620, based on much smaller WTGs than would be installed at the other proposed sites in this region. It is also emphasised that the wind climatology used for Forsayth is an average of those for Mt Emerald and High Road. Both of these windfarms, however, are located quite some distance from the Forsayth site and, as such, provide only an extremely rough approximation of the wind climatology for the Forsayth windfarm.
In QLD, the other region of interest is the Tarong
The results for the proposed windfarms at the Marulan and Canberra nodes remain generally close to those obtained for the Crookwell 1 and Woodlawn windfarms, respectively, that is, generally in the range 0.34 to 0.39. It is also emphasised that the wind climatology used for the Boco Rock 1 and 2 windfarms is based on the Woodlawn wind climatology results. However, Woodlawn is a considerable distance from the Boco Rock windfarm and, as such, provides only an extremely rough approximation of the wind climatology for Boco Rock.
Another new region in NSW is the area of the large Silverton windfarm proposed near Broken Hill (which falls within the Tumut node). This windfarm has capacity factors of 0.4309 and 0.4408 for Stages 1 and 2, respectively, with the difference reflecting different WTG clusters within the site boundary of this large proposed windfarm. Once again, there are no operational windfarms in this area, nor publicly available results from wind prospecting studies, so we cannot be sure of the extent of any over-estimation of the average capacity factor for this windfarm. However, both results would again appear to be at the upper end of expectations.
It should be noted that the higher average capacity factors mentioned above, especially for windfarms at the Far North QLD, Tarong and Tumut nodes, were not artefacts of any inflation of the average five-minute wind speed profiles obtained from the WRF model. In all these cases the average five-minute wind speed profiles were actually deflated somewhat (i.e. scaled downwards), indicating that the WRF simulation results, which were quite bullish about the wind resource at these sites, were responsible for the relatively high capacity factors achieved at these nodes.
The average capacity factor calculated for the Bald Hills windfarm in the Morwell node (VIC) was 0.3970, similar to that calculated for the Wonthaggi windfarm in the same node. The average five-minute wind speed profiles of both windfarms were deflated slightly, indicating that the WRF simulations suggested excellent wind resources at these two locations. This outcome also extended to the Winchelsea windfarm in the neighbouring Melbourne node, which had an average capacity factor of 0.3956, again with a slight deflation of its average five-minute wind speed profile.
The two key VIC nodes for wind power generation are node 33 (South West VIC) and node 34 (Regional VIC). Both of these nodes had operational windfarms, and the observed capacity factors of those windfarms were used to inform decisions about reasonable capacity factors for other proposed windfarms slated for construction at these two nodes.
In the case of the South West VIC node, average capacity factors for proposed windfarms ranged from 0.3990 (Hawkesdale) to 0.4187 (Portland Stage 4). The operational windfarms in this node had average capacity factors between 0.37 and 0.39, except for Oaklands Hills, which had a lower average capacity factor of 0.3240. Furthermore, windfarms located on top of coastal bluffs, such as Portland Stages 2, 3 and 4 and Yambuk, tended to perform marginally better. Note also that the Woakwine Range windfarm received special treatment in the modelling: its wind resource is centred on the Woakwine Range in South East SA, but its developer, Infigen Energy, intends to transport the power directly into South West VIC, bypassing South East SA. Finally, these average capacity factor outcomes were produced against the backdrop of some deflation of each windfarm's five-minute average wind speed profile. Thus, measured against the raw wind speed projections generated by the WRF model itself, the wind speed profiles and average capacity factors adopted in the modelling are conservative.
capacity factor values reported in Table 8 for all other windfarms in node 34 were calculated after their respective five-minute average wind speed profiles had been deflated slightly. In the case of the former three windfarms, no deflation or inflation was applied to the wind speed profiles derived from WRF model simulations.
In the case of the four operational windfarms at the South East SA node, the five-minute averaged wind speed profiles were deflated until the average capacity factors broadly matched the average operational values determined from AEMO dispatch data. This produced average capacity factors ranging from 0.3177 (Lake Bonney 1) to 0.3281 (Lake Bonney 2). These results can be contrasted with those for the proposed Woakwine Range windfarm, which is located in the same node but further north than the four operational windfarms. The Woakwine Range result is also predicated on some deflation of the average five-minute wind speed profile, although to a slightly lesser extent than for the four operational windfarms. However, the difference in deflation factor is not large enough to explain the differences in capacity factors between the windfarms. At Woakwine Range the turbines tend to be located on more pronounced ridges at higher elevation, and the windfarm is assumed to use taller WTGs with greater wind field fetch than those installed at the operational windfarms, which are located closer to the shore of Lake Bonney and at lower elevation. Thus, the high average capacity factor outcomes at this node were not artefacts of any deliberate inflation of windfarm wind climatology results over and above what was produced by the WRF model simulations.
The results for the two windfarms in the study at the Adelaide node are quite different. The operational Starfish Hill windfarm had an average capacity factor of 0.3005, which closely matches its operational performance, whereas the large proposed Ceres windfarm has a much higher capacity factor of 0.4447. In both cases some deflation of the average five-minute wind speed profiles was required to produce the capacity factors listed in Table 8. Ceres is located on the Yorke Peninsula, with its power intended to be transported under Gulf St Vincent directly into the Adelaide region; hence its placement in the Adelaide node instead of the Mid-North SA node. This windfarm is reputed to have an excellent wind resource, and given the larger capacity and taller WTGs proposed for this site compared with those installed at Starfish Hill (see Table 1 for details), the higher capacity factor for Ceres relative to Starfish Hill is not unexpected.
In terms of both operational and proposed windfarms, the most important node for wind power generation in SA is the Mid-North SA node (i.e. node 39). Operational windfarms at this node have some of the highest capacity factors in the NEM: Hallett 1 and 2, North Brown Hill and Snowtown 1 have capacity factors between 0.4196 and 0.4372. Most of the proposed windfarms at this node also have capacity factors of similar magnitude to these four operational windfarms.
In general, some inflation of the average five-minute wind speed profiles was required to achieve the capacity factors reported in Table 8 for windfarms at this node. Thus, the WRF model struggled to produce wind speed profiles consistent with the high capacity factors experienced by many windfarms at this node. In fact, the only windfarms to which deflation factors were applied were the Wattle Point, Hornsdale and Mt Bryan windfarms. No deflation or inflation was applied to the Waterloo and Keyneton windfarms (with Keyneton using the Waterloo wind climatology as a proxy). All other windfarms at this node required some inflation of their wind speed profiles, with inflation factors typically in the range 1.02 to 1.1. However, for Snowtown 1 and 2, and for Clements Gap (for 2010 only), slightly larger inflation factors in the range 1.12 to 1.17 were required to achieve the average capacity factors reported in Table 8.
As an example of the role that taller WTGs could play in increasing windfarm capacity factors, consider the Waterloo and Keyneton windfarms, which share the same wind climatology (with Keyneton proxied by Waterloo) and the same inflation factor of unity. The WTGs installed or proposed for installation at both sites also have the same maximum rated capacity of 3 MW. However, their heights differ: the WTGs at Waterloo are 90 meters (the Vestas V90-3000), while those at Keyneton were assumed to be 112 meters (the Vestas V112-3000). The two windfarms are also of comparable size, with Waterloo containing 37 WTGs and a maximum capacity of 111 MW, while Keyneton has 42 WTGs with a maximum capacity of 126 MW. Nevertheless, the two capacity factors listed in Table 8 are quite different: 0.3465 for Waterloo and 0.4359 for Keyneton. The only difference between the two windfarms that can explain this gap is the choice and height of the WTGs.
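One common way to reason about the height effect is the power-law wind profile, v(h) = v_ref (h / h_ref)^α, combined with the roughly cubic dependence of turbine output on wind speed below rated power. The text does not state that this law was used in the study; the shear exponent α = 1/7 below is a conventional textbook assumption and the function name is hypothetical. The sketch isolates the height effect only and does not model the other factor the text cites, the choice of WTG.

```python
def extrapolate_speed(v_ref: float, h_ref: float, h: float, alpha: float = 1 / 7) -> float:
    """Power-law wind shear: scale a wind speed measured at h_ref (m) to height h (m).

    alpha = 1/7 is an assumed, commonly quoted value for open terrain.
    """
    return v_ref * (h / h_ref) ** alpha


# Heights quoted in the text: 90 m (Waterloo) and 112 m (Keyneton).
speed_ratio = extrapolate_speed(1.0, 90.0, 112.0)
# Below rated power, output scales roughly with the cube of wind speed.
power_ratio = speed_ratio ** 3

# Installed capacities quoted in the text (3 MW WTGs).
waterloo_mw = 37 * 3
keyneton_mw = 42 * 3

print(f"speed +{(speed_ratio - 1) * 100:.1f}%, output (cubic region) +{(power_ratio - 1) * 100:.1f}%")
print(waterloo_mw, keyneton_mw)  # 111 126
```

Under these assumptions the extra 22 meters of height gives roughly 3% more wind speed and roughly 10% more output wherever the turbine runs below rated power. That suggests height alone accounts for only part of the gap between 0.3465 and 0.4359, consistent with the text attributing the difference to both the choice and the height of the WTGs.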
There is a smaller wind power presence on the Eyre Peninsula (node 41), where the operational windfarms Cathedral Rocks and Mt Millar have average capacity factors of 0.3278 and 0.3173, respectively. By design, these capacity factors closely match the average capacity factors of those windfarms' actual dispatch in the NEM. Some deflation of the average five-minute wind speed profiles was required to achieve these target capacity factors, more so in the case of Cathedral Rocks.
The final windfarm in SA is the proposed Lincoln Gap windfarm, located in the Upper North SA node (node 40) a relatively short distance south-west of Port Augusta. Unfortunately, when suitable sites were selected for the WRF model simulations, no site corresponding to this windfarm was chosen. We therefore imposed a proxy wind climatology for this site, composed of an average climatology drawn from the geographically nearest windfarms. As specified in Table 1, we chose Mt Millar, Clements Gap, Hallett 1, Hallett 2, North Brown Hill and The Bluff as candidate windfarms for approximating the wind climatology at Lincoln Gap. Note further that we did not impose any deflation or inflation on the average wind speed profile for Lincoln Gap.
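The proxy climatology for Lincoln Gap (and for Low Head and Cattle Hill in TAS) is an interval-by-interval average of the candidate windfarms' five-minute average wind speed profiles. A minimal sketch, assuming each profile is a list of equal length aligned on the same timestamps; the function name and the toy wind speeds are hypothetical.

```python
from statistics import fmean
from typing import Mapping, Sequence


def proxy_climatology(profiles: Mapping[str, Sequence[float]],
                      candidates: Sequence[str]) -> list:
    """Average the wind speed profiles of the candidate windfarms, interval by interval.

    Wind speeds (not outputs) are averaged, as described in the text. Because the
    power curve is non-linear, this is not the same as averaging the candidates' outputs.
    """
    series = [profiles[name] for name in candidates]
    n = len(series[0])
    if any(len(s) != n for s in series):
        raise ValueError("candidate profiles must cover the same intervals")
    return [fmean(values) for values in zip(*series)]


# Toy five-minute wind speeds (m/s), purely illustrative.
toy_profiles = {"Woolnorth": [8.0, 9.0, 10.0], "Musselroe": [6.0, 7.0, 12.0]}
print(proxy_climatology(toy_profiles, ["Woolnorth", "Musselroe"]))  # [7.0, 8.0, 11.0]
```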
However, these candidate windfarms, being in the Eyre Peninsula and Mid-North SA nodes, are a considerable distance from the Lincoln Gap site and, as such, represent very rough proxies. The capacity factor for Lincoln Gap reported in Table 8 is 0.4390, which is quite high given that the capacity factors of Mt Millar, Clements Gap and The Bluff lie between 0.3173 and 0.3599, while those of the other candidate windfarms lie between 0.4196 and 0.4373. Normally, we would expect the capacity factor to fall somewhere between these two sets of values. However, given the close correlation of the wind climatologies of the selected Mid-North SA windfarms and the probably different profile emerging over the Eyre Peninsula, the averaged wind climatology for Lincoln Gap might capture enhanced elements of both regions' underlying wind climatologies.
Furthermore, the WTG assumed for Lincoln Gap is taller than those at the other candidate windfarms, which is likely to increase its capacity factor relative to them. Notwithstanding the results for Lincoln Gap, however, it is emphasised that the assumed average five-minute wind speed profile used for this windfarm is an extremely rough guide to its probable wind climatology.
The final state in the NEM with wind power generation is TAS. This state has two operational windfarms: Woolnorth [in the Burnie node (node 44)] and Musselroe [in the Hadspen node (node 46)]. The capacity factors reported in Table 8 for these two windfarms were constrained to match their historical performance, producing capacity factors of 0.4092 for Woolnorth and 0.4300 for Musselroe. While some deflation of the average five-minute wind speed profile was required to achieve the reported capacity factor for Woolnorth, no deflation or inflation of wind speeds was applied for Musselroe. Note also that the historical record for Musselroe is much shorter than that for Woolnorth, because the former was commissioned in mid-2013 while the latter was fully commissioned by mid-2007.
Three proposed TAS windfarms are included in the study: Low Head [George Town node (node 42)], Granville Harbour [Farrell node (node 45)] and Cattle Hill [Liapootah node (node 50)]. Unfortunately, when suitable sites were selected for the WRF model simulations, no sites corresponding to these three windfarms were chosen, so proxy wind climatologies subsequently had to be imposed. For Low Head and Cattle Hill, the average of the five-minute average wind speeds for Woolnorth and Musselroe was used; for Granville Harbour, the Woolnorth wind climatology was adopted. For Low Head, the justification was that the site lies roughly halfway between the Woolnorth and Musselroe sites in northern TAS, along a similar latitude. Granville Harbour, like Woolnorth, is on the western coastline of TAS, although some distance to the south; thus, while it broadly shares a similar longitude with Woolnorth, it lies at a different latitude. Cattle Hill broadly shares a similar longitude with Low Head but borders the eastern edge of Lake Echo in southern TAS. It lies further south than Granville Harbour and is separated from Woolnorth and Musselroe by the mountain ranges of central TAS. Overall, therefore, the adopted proxy climatology would be the poorest approximation for Cattle Hill and a somewhat better (although by no means ideal) approximation for Granville Harbour and Low Head. In all three cases, some deflation of the average wind speeds was required to achieve the outcomes reported in Table 8: Low Head had an average capacity factor of 0.4342, Granville Harbour 0.4093 and Cattle Hill 0.4039.
Finally, the slightly lower average capacity factors reported for Granville Harbour and Cattle Hill, compared with Low Head and Woolnorth, reflect the slightly higher deflation factors applied to these two windfarms.

Wind Generation Penetration Scenarios, Aggregate GWh Production and LRET Target Compliance
The currently legislated LRET targets for the period 2014 to 2021, culminating in the 41,000 GWh target for 2020, are listed in Column 2 of Table 9. These targets apply to Australia as a whole, covering Western Australia and the Northern Territory as well as the NEM states of QLD, NSW, VIC, SA and TAS. In this study, we assume that 87% of the target's value in each year would be met by wind generation within the NEM. The remaining 13% is assumed to come from other eligible renewable generation in the NEM and from renewable generation in other jurisdictions, including windfarms currently installed in Western Australia. Implicit in this assumption is that further windfarm construction in Western Australia is likely to be limited, and that the contribution of hydro generation in the NEM is constrained for most generators (apart from Bogong hydro in VIC) by the need to exceed pre-determined base production levels before additional production becomes eligible to generate Large-scale Generation Certificates (LGCs). Thus, most of the LRET target will have to be met by other second-generation renewable energy technologies such as wind and large-scale solar farms. Of these technologies, however, wind is the cheapest and most developed in the Australian context, with an existing and substantial pipeline of approved projects that could be 'run out' to meet the target over the period 2015 to 2020. The resulting 87% target path corresponding to the 2020 41 TWh LRET target is listed in Column 3 of Table 9. Examination of the last column of Table 8 indicates that the 'Operational and Under Construction' windfarms would not be capable of meeting the 2014 target value of 14,747 GWh, given the average capacity factors listed in Table 8; in fact, a deficit of around 2,530 GWh would exist, as shown in Table 8.
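The Column 3 figures are 87% of the Column 2 LRET values. The sketch below reproduces the three Column 3 values quoted in this section. The Column 2 entries used for 2014 and 2017 (16,950 GWh and 26,031 GWh) are the legislated annual LRET values for those years, supplied here rather than read from Table 9, and the names are hypothetical.

```python
NEM_WIND_SHARE = 0.87  # share of the LRET assumed to be met by NEM windfarms

# Column 2 of Table 9 (GWh) for the years discussed in the text.
LRET_GWH = {2014: 16_950, 2017: 26_031, 2020: 41_000}


def nem_wind_target(lret_gwh: float, share: float = NEM_WIND_SHARE) -> float:
    """Column 3 of Table 9: the part of the national LRET assigned to NEM wind."""
    return share * lret_gwh


for year, lret in LRET_GWH.items():
    print(year, f"{nem_wind_target(lret):,.1f}")
```

The printed values (14,746.5, about 22,647.0 and 35,670.0 GWh) round to the 14,747, 22,647 and 35,670 GWh figures used in the text.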
Thus, the addition of some of the proposed 'Type 5 and 4' windfarms would be required to meet the 2014 target value, if we were depending upon production to fully meet LRET obligations in 2014.
More generally, adding the proposed 'Type 5 and 4' windfarms would generate enough output almost to meet the 2017 target value reported in Column 3 of Table 9, leaving a deficit of around 482 GWh relative to the 2017 target of 22,647 GWh, as shown in the last column of Table 8. Therefore, given the average capacity factors, a small number of windfarms from the 'Type 3 Base' category would most likely be needed to satisfy the 2017 target listed in Column 3 of Table 9.
Adding all of the proposed 'Type 3 Base' windfarms would generate enough output, given the average capacity factors listed in Table 8, to satisfy the 2020 target of 35,670 GWh, leaving a surplus of 1,609 GWh as indicated in the last column of Table 8. Furthermore, adding the proposed 'Type 3 Max' windfarms produces a significant surplus of approximately 10,237 GWh relative to the 2020 target, as also shown in the last column of Table 8.
The results reported above are predicated on the average capacity factors listed in Table 8. Recall from the discussion in the previous section that some of these appeared to be towards the upper end of expectations. To the extent that the reported capacity factors exaggerate the wind resource at some windfarms, additional windfarms from the 'Type 5 and 4' category might be required to meet the 2014 target. Furthermore, additional windfarms from the 'Type 3 Base' category might be needed to satisfy the 2017 target, and even some from the 'Type 3 Max' category might be required to meet the 2020 target if the 1,609 GWh surplus identified above for the 'Type 3 Base' category were completely absorbed by lower observed capacity factors. However, even with lower capacity factors, the complete list of windfarms included in the study would be more than sufficient to fully exhaust the 2020 41,000 GWh LRET target.
It should be noted that there is currently a surplus of LGCs, which is expected to be exhausted by 2017. As such, 2017 and 2018 are key years in which the current configuration of the LRET target would most likely have to be met fully from renewable energy production to avoid triggering the scheme's penalty provisions. Under the current settings, this would require the 'Type 5 and 4' windfarms and a fairly significant proportion of the 'Type 3 Base' windfarms to be constructed and supplying eligible power to the electricity network by 2018. This implies the construction of around 4,913 MW of additional wind capacity in a little over two to three years: more than doubling the MW capacity of the current 'Operational and Under Construction' windfarm category, which, from Table 8, is approximately 3,682 MW.
More recently, a revised 2020 LRET target of 33,500 GWh has been proposed by business groups, representatives of the clean energy sector and the Federal Opposition in response to the loss of bipartisan political support for the Renewable Energy Target and the resulting regulatory uncertainty. This uncertainty has, in turn, led to investment in the large-scale renewable energy sector in Australia drying up completely over the last year and a half. An 87% target path was calculated for the lower proposed 33,500 GWh 2020 target by using the yearly shares of the original scheme to extrapolate a path over the years 2014 to 2020. That is, we divided each year's LRET target in Column 2 of Table 9 by 41,000 and multiplied the resulting proportion by 33,500 to produce the annual targets listed in Column 4. We then multiplied these figures by 0.87 to obtain the values listed in Column 5 of Table 9.
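The two-step procedure just described (rescale Column 2 by 33,500/41,000, then apply the 87% share) can be sketched in a few lines, with each year keeping its original share of the 2020 target as in the text. The 2018 Column 2 value of 30,631 GWh is the legislated LRET value for that year, supplied here rather than read from Table 9, and the function and constant names are hypothetical.

```python
ORIGINAL_2020_GWH = 41_000
REVISED_2020_GWH = 33_500
NEM_WIND_SHARE = 0.87


def revised_targets(lret_gwh: float) -> tuple:
    """Return (Column 4, Column 5) of Table 9 for one year's Column 2 value."""
    column4 = lret_gwh / ORIGINAL_2020_GWH * REVISED_2020_GWH  # same yearly share, lower endpoint
    column5 = NEM_WIND_SHARE * column4                         # part assigned to NEM wind
    return column4, column5


for year, lret in {2018: 30_631, 2020: 41_000}.items():
    col4, col5 = revised_targets(lret)
    print(year, f"{col4:,.0f}", f"{col5:,.0f}")
```

For 2020 this gives 33,500 GWh in Column 4 and the 29,145 GWh Column 5 target used below.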
Examination of Column 5 of Table 9 indicates that, under the new target regime, the 'Operational and Under Construction' capacity is sufficient to meet the 2014 target. From the last column of Table 8, some additional windfarms from the 'Type 5 and 4' category would have to be constructed to meet the 2015 target. Adding all the 'Type 5 and 4' windfarms would be sufficient to meet the reduced LRET target out to 2018, with a slight surplus of around 391 GWh. Adding all the 'Type 3 Base' windfarms would be more than sufficient to meet the 2020 target of 29,145 GWh, with a surplus of around 8,134 GWh, up from the 1,609 GWh surplus calculated under the original 41 TWh LRET target as discussed earlier in this section. In this case, even allowing for some (slight) exaggeration of wind resources at some windfarms, the 'Type 3 Max' category of windfarms would not seem to be needed to exhaust the new reduced LRET target in 2020.
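The jump in the 'Type 3 Base' surplus from 1,609 GWh to 8,134 GWh follows directly from the lower 2020 target: cumulative production is unchanged, so the surplus rises by 0.87 × (41,000 − 33,500) = 6,525 GWh. The sketch below checks this arithmetic, together with the 2018 surplus of around 391 GWh. It uses cumulative production figures implied by the text (each target plus or minus its quoted surplus or deficit) rather than values read from Table 8; the 2017 and 2018 Column 2 values (26,031 and 30,631 GWh) are the legislated LRET values, and all names are hypothetical.

```python
SHARE = 0.87
SCALE = 33_500 / 41_000  # revised / original 2020 LRET


def original_target(lret_gwh: float) -> float:
    """87% of the original LRET value (Column 3)."""
    return SHARE * lret_gwh


def revised_target(lret_gwh: float) -> float:
    """87% of the rescaled LRET value (Column 5)."""
    return SHARE * lret_gwh * SCALE


# Cumulative production implied by the text (GWh).
type3_base_prod = original_target(41_000) + 1_609   # 2020 target + quoted surplus
type54_prod = original_target(26_031) - 482         # 2017 target - quoted deficit

surplus_2020 = type3_base_prod - revised_target(41_000)
surplus_2018 = type54_prod - revised_target(30_631)
shift = original_target(41_000) - revised_target(41_000)

print(round(surplus_2020), round(surplus_2018), round(shift))  # 8134 391 6525
```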

Calculating Half-Hourly Windfarm Output Traces
The five-minute windfarm output traces form the basis of the windfarm output traces used in ANEM model simulations. A half-hourly output trace is produced for each windfarm by averaging six consecutive five-minute output values calculated in the previous section. These half-hourly traces are calculated for every half-hourly period in 2010, 2011 and 2012. Note that there are 17,520 half-hours in 2010 and 2011, and 17,568 half-hours in 2012 (a leap year).
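A minimal sketch of this aggregation step, assuming each windfarm's five-minute output trace is a list in MW aligned to the start of a half-hour; the function names are hypothetical. It also confirms the half-hour counts quoted above.

```python
import calendar
from statistics import fmean
from typing import Sequence

INTERVALS_PER_HALF_HOUR = 6  # six five-minute values per half-hour


def half_hourly_trace(five_minute_mw: Sequence[float]) -> list:
    """Average each block of six consecutive five-minute outputs (MW)."""
    if len(five_minute_mw) % INTERVALS_PER_HALF_HOUR:
        raise ValueError("trace length must be a multiple of six five-minute intervals")
    return [fmean(five_minute_mw[i:i + INTERVALS_PER_HALF_HOUR])
            for i in range(0, len(five_minute_mw), INTERVALS_PER_HALF_HOUR)]


def half_hours_in_year(year: int) -> int:
    """48 half-hours per day; 2012 is a leap year."""
    return (366 if calendar.isleap(year) else 365) * 48


print([half_hours_in_year(y) for y in (2010, 2011, 2012)])  # [17520, 17520, 17568]
print(half_hourly_trace([1, 2, 3, 4, 5, 6, 10, 10, 10, 10, 10, 10]))  # [3.5, 10.0]
```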
Nodal based half-hourly windfarm output traces are then calculated by summing together the half-hourly output traces of each windfarm located in a particular node. These nodal based windfarm output traces are then used in ANEM model wholesale market simulations to investigate the impact of different levels of wind generation in the NEM.
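The nodal aggregation is an element-wise sum over the windfarms mapped to each node. A minimal sketch, assuming the half-hourly traces are equal-length lists in MW and that a separate mapping gives each windfarm's node; the function name, node labels and toy outputs are hypothetical.

```python
from typing import Mapping, Sequence


def nodal_traces(farm_traces: Mapping[str, Sequence[float]],
                 farm_node: Mapping[str, str]) -> dict:
    """Sum the half-hourly traces (MW) of all windfarms located in each node."""
    totals: dict = {}
    for farm, trace in farm_traces.items():
        node = farm_node[farm]
        if node not in totals:
            totals[node] = list(trace)
        elif len(totals[node]) != len(trace):
            raise ValueError(f"trace for {farm} has a different length")
        else:
            totals[node] = [a + b for a, b in zip(totals[node], trace)]
    return totals


# Toy half-hourly outputs (MW); node assignments follow the SA discussion above.
traces = {"Starfish Hill": [10.0, 20.0], "Ceres": [100.0, 200.0], "Cathedral Rocks": [30.0, 5.0]}
nodes = {"Starfish Hill": "Adelaide", "Ceres": "Adelaide", "Cathedral Rocks": "Eyre Peninsula"}
print(nodal_traces(traces, nodes))
# {'Adelaide': [110.0, 220.0], 'Eyre Peninsula': [30.0, 5.0]}
```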

Implementation in ANEM
Wind generators are assumed to construct supply offers for their output based upon their variable costs. As such, they are assumed to operate as semi-scheduled plant. In this study, we assume that 85% of windfarms' total operating costs are fixed costs, while the remaining 15% are variable costs. This assumption places windfarm supply offers towards the bottom of the merit order of dispatch, ensuring a high probability of dispatch relative to other competing types of generation. In general, the windfarm supply offers used in the modelling ranged from $2.76/MWh to $4.69/MWh, making them amongst the cheapest forms of generation incorporated in the modelling.
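Read literally, the 85/15 split means a windfarm's offer is 15% of its total operating cost per MWh, so the quoted offer range can be turned back into an implied total operating cost. A minimal sketch under that reading, which is an interpretation rather than something the text states explicitly; the function names are hypothetical.

```python
VARIABLE_SHARE = 0.15  # 15% of total operating costs treated as variable (85% fixed)


def supply_offer(total_operating_cost_per_mwh: float,
                 variable_share: float = VARIABLE_SHARE) -> float:
    """Offer ($/MWh) built from the variable part of total operating cost only."""
    return variable_share * total_operating_cost_per_mwh


def implied_total_cost(offer_per_mwh: float,
                       variable_share: float = VARIABLE_SHARE) -> float:
    """Invert supply_offer: total operating cost ($/MWh) implied by an offer."""
    return offer_per_mwh / variable_share


for offer in (2.76, 4.69):  # range of windfarm offers quoted in the text
    print(offer, round(implied_total_cost(offer), 2))
```

Under this reading, the quoted offers correspond to total operating costs of roughly $18.40/MWh to $31.27/MWh.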
The default setting adopted for modelling purposes is for wind generation not to be dispatched. This is implemented by specifying a default supply offer for each node's wind generation that has a zero minimum stable operating level, a maximum MW capacity equal to the sum of the maximum MW capacities of all windfarms located in the node, and a supply offer coefficient equal to the Value of Lost Load (VOLL). Under these default settings, the nodal wind generation source would not be dispatched, because pricing it at VOLL makes it one of the most expensive types of available generation.
This default setting is overridden when the output of the nodal wind generation source exceeds 5 MW. The only parameter changes from the default setting are then a new (lower) maximum MW capacity limit equal to the calculated output of the node's windfarms, and the use of short run marginal cost coefficients obtained by averaging the equivalent coefficients of all windfarms located in the node, instead of VOLL. Recall that these coefficient values lie in the range $2.76/MWh to $4.69/MWh. The minimum stable operating level remains unchanged at zero MW.
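The default-and-override rule can be written as a small function returning one node's offer for a half-hour. A minimal sketch: ANEM's actual offer structure is not reproduced here, so the single price coefficient, the `SupplyOffer` class, the function name and the VOLL placeholder value are all assumptions.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence

VOLL = 10_000.0            # $/MWh; placeholder only, not the value used in ANEM
DISPATCH_THRESHOLD_MW = 5.0


@dataclass
class SupplyOffer:
    min_stable_mw: float   # minimum stable operating level
    max_mw: float          # maximum MW capacity offered
    price: float           # single $/MWh offer coefficient (simplification)


def nodal_wind_offer(output_mw: float,
                     farm_max_mw: Sequence[float],
                     farm_srmc: Sequence[float],
                     voll: float = VOLL) -> SupplyOffer:
    """Build one node's wind supply offer for a half-hour."""
    if output_mw > DISPATCH_THRESHOLD_MW:
        # Override: cap at the calculated output, price at the nodal average SRMC.
        return SupplyOffer(0.0, output_mw, fmean(farm_srmc))
    # Default: full nodal capacity priced at VOLL, so it is effectively never dispatched.
    return SupplyOffer(0.0, sum(farm_max_mw), voll)


print(nodal_wind_offer(3.0, [50.0, 100.0], [3.0, 4.0]))
print(nodal_wind_offer(60.0, [50.0, 100.0], [3.0, 4.0]))
```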
Because dispatch of nodal output below the 5 MW threshold is excluded, there will be a marginal reduction in aggregate output dispatched during ANEM market simulations, which only becomes pronounced at nodes containing one or two very small windfarms, such as Far North QLD under the 'Operational and Under Construction' scenario. As more and larger windfarms are added, this issue largely disappears, because nodal wind generation output is then aggregated across a number of windfarms located in the node.
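The size of this effect can be measured from a nodal trace by summing the energy in half-hours that fall at or below the threshold. A minimal sketch, assuming a half-hourly nodal trace in MW and treating output that does not exceed 5 MW as undispatched (following the 'exceeds 5 MW' override rule); the function name and both toy traces are hypothetical.

```python
from typing import Sequence

THRESHOLD_MW = 5.0
HOURS_PER_INTERVAL = 0.5  # half-hourly trace


def excluded_share(nodal_trace_mw: Sequence[float], threshold_mw: float = THRESHOLD_MW) -> float:
    """Fraction of a node's wind energy lost because output <= threshold is not dispatched."""
    total = sum(nodal_trace_mw) * HOURS_PER_INTERVAL
    if total == 0:
        return 0.0
    excluded = sum(p for p in nodal_trace_mw if p <= threshold_mw) * HOURS_PER_INTERVAL
    return excluded / total


small_node = [0.0, 2.0, 4.0, 6.0, 8.0]      # one or two very small windfarms
large_node = [0.0, 2.0, 4.0, 160.0, 240.0]  # more and larger windfarms
print(round(excluded_share(small_node), 3), round(excluded_share(large_node), 3))
```

The same low-output half-hours that remove a large share of a small node's energy become negligible once the node's output is spread across more and larger windfarms.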