Energy Storage Valuation/Benchmarking

From EPRI Storage Wiki

This section quantitatively compares the results from a few selected energy storage valuation tools on a single use case to highlight their differences and inform tool selection. Underperformance in this comparison should not be taken as a blanket statement against any tool, since no root cause analysis was performed. This analysis will be expanded over time to incorporate more use cases and more tools, with the goal of improving the field of energy storage modeling tools and providing more robust tool selection guidance.

To support comparison on a level playing field, only tools that support the use case were tested.

Use Case

This comparison relies on the results from a single use case, which will be expanded in future publications to span the breadth of energy storage use cases more fully. The use case is a simple customer-side-of-the-meter case in which a fixed-size battery energy storage system is used to reduce the energy and demand charges experienced by a large commercial customer in Southern California. The customer's electric demand profile is consistent across tools, as are the size of the storage system and the customer's tariff. The differences between the models should appear in the expected demand and energy charge savings from operating the storage system and in the associated battery operational profiles.

Interval Demand Data

Hourly interval data is used to represent the customer. The tariff specifies that demand charges are calculated on 15-minute data, but hourly data is used for all tools so that they can be compared on equal footing (most do not go below hourly resolution).
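The resolution difference matters because a short demand spike is diluted when averaged into an hourly interval. A minimal sketch with illustrative numbers (not the actual site data):

```python
# Sketch: a 15-minute demand spike diluted by hourly averaging.
# Load values are illustrative, not the actual site data.

def peak_demand(intervals_kw):
    """Billing demand is the maximum interval reading (kW)."""
    return max(intervals_kw)

# Four 15-minute readings within one hour (kW), including one short spike
quarter_hour = [480.0, 480.0, 620.0, 480.0]
hourly_avg = sum(quarter_hour) / 4  # the same hour at hourly resolution

print(peak_demand(quarter_hour))  # 620.0 kW billed on 15-minute data
print(hourly_avg)                 # 515.0 kW seen by an hourly model
```

For this site, with little intra-hour volatility, the understatement is small, but it is one reason the hourly results here should not be read as exact bill predictions.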

This site exhibits very little seasonal variation in the scale and timing of its electricity use.

The provided converter spreadsheet was used to extract the monthly average weekday, average weekend, and peak days to use in DER-CAM.

Tools Tested

This section selects EPRI's DER-VET, Sandia National Laboratories' QuESt, the National Renewable Energy Laboratory's System Advisor Model (SAM) and REopt Lite, and Lawrence Berkeley National Laboratory's DER-CAM for quantitative comparison based on their status as free-to-use models with somewhat overlapping capabilities.

While none of these tools are perfectly comparable, the use case selected for this initial comparison is fully covered by all the selected tools. Many of the differences between the tools will not be highlighted by this use case, so future work will involve expanding the set of comparison cases to provide a comprehensive exploration of quantitative results for standard use cases across valuation tools.


Tariff

This comparison uses the TOU-8 Option D (<2kV) rate for 2020 from Southern California Edison (SCE). No storage valuation tool can model every tariff in existence, but all the tools selected for this comparison support the relatively simple but high-cost TOU-8 tariff. This tariff is a time of use (TOU) tariff that defines three periods each for summer and winter.
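A TOU structure like this is essentially a lookup from hour and day type to a rate period. The sketch below is illustrative only: the 4pm-9pm on-peak window matches the one discussed in this comparison, but the mid-peak boundary and period names are assumptions, so consult SCE's published tariff sheet for the actual definitions.

```python
# Sketch of a three-period TOU lookup. The on-peak window (4pm-9pm) matches
# this comparison; the mid-peak hours and period names are assumptions.

ON_PEAK = range(16, 21)    # 4pm-9pm
MID_PEAK = range(21, 23)   # assumed shoulder hours

def tou_period(hour, is_weekday):
    """Map an hour of the day to a summer rate period."""
    if is_weekday and hour in ON_PEAK:
        return "on_peak"
    if is_weekday and hour in MID_PEAK:
        return "mid_peak"
    return "off_peak"

print(tou_period(17, True))   # on_peak
print(tou_period(17, False))  # off_peak: weekend hours stay off-peak
```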

Some of the tools (REopt Lite, SAM, QuESt, and DER-CAM) have built-in connections to a utility rate database, allowing the user to search for the rate above without manual data entry. DER-VET is currently under development, and its publicly available beta release does not have this feature, though it is planned for the first full release in Q1 2021. However, many of the database connections returned out-of-date or flawed information about the tariff, or retrieved data for a different voltage connection level, and most required manual correction before they matched the tariff from SCE's website. These features are a good place to start and reduce the amount of work that goes into tariff entry, but they require careful validation and correction, which can be almost as time-consuming as manual entry.

File:SAM Tariff Entry.PNG
SAM Tariff Entry
File:SAM Demand Charges.PNG
SAM Demand Charges
File:DER CAM Tariff Entry.PNG
DER-CAM Tariff Entry
File:QuESt Tariff Entry.PNG
QuESt Tariff Entry

The QuESt tariff entry process is smoothed by the OpenEI United States Utility Rate Database connection, but editing the tariff is time-consuming when the timing needs correction. I found it easier to edit the raw data files located in ./data/rate_structures (.json format) using a text editor than to use the graphical interface.
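A timing fix of that kind can also be scripted rather than done by hand. In the sketch below, the "weekday_schedule" field is hypothetical — inspect a real file in ./data/rate_structures for QuESt's actual schema — and a stand-in file is created so the example runs anywhere:

```python
# Sketch of scripting a timing fix to a rate-structure .json file.
# The "weekday_schedule" field name is hypothetical, not QuESt's real schema.
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example_rate.json")
with open(path, "w") as f:
    json.dump({"weekday_schedule": [0] * 24}, f)  # stand-in rate structure

with open(path) as f:
    rate = json.load(f)

# Correct the timing: assign hours 16-20 (4pm-9pm) to period 2 (on-peak)
for h in range(16, 21):
    rate["weekday_schedule"][h] = 2

with open(path, "w") as f:
    json.dump(rate, f, indent=2)

print(rate["weekday_schedule"][17])  # 2
```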

I could not get the exact tariff format to work with QuESt, so I had to sacrifice one of the energy charges.

File:DER VET Tariff Entry.PNG
DER-VET Tariff Entry

DER-VET can accept a file storing tariff data, like the above, or the GUI can guide the user through the tariff construction process and can connect to the OpenEI tariff database for automated tariff generation.

File:REopt Lite Tariff Entry.PNG
REopt Lite Tariff Entry
File:REopt Lite Demand Charge Entry.PNG
REopt Lite Demand Charge Entry

The tools with heatmap-style tariff entries do not all allow for overlapping demand charge periods, as is the case with SCE’s TOU-8 rate. This rate has a facilities all-hours demand charge in addition to an on-peak demand charge. By specifying the facilities demand charge to apply to every hour except the on-peak hours, we do not fully capture the calculation. If the all-hours demand peak occurs during on-peak hours, then the demand charge calculation using these tools will not be correct.
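The gap between the two treatments can be made concrete with a small calculation (rates and loads are hypothetical, not TOU-8's actual values):

```python
# Sketch of the overlap problem. Rates and loads are hypothetical.

def overlapping_bill(load, on_peak_hours, fac_rate, peak_rate):
    """Correct: the facilities charge applies to the peak over ALL hours."""
    all_hours_peak = max(load)
    on_peak_demand = max(load[h] for h in on_peak_hours)
    return all_hours_peak * fac_rate + on_peak_demand * peak_rate

def heatmap_bill(load, on_peak_hours, fac_rate, peak_rate):
    """Heatmap entry: facilities charge only covers hours OUTSIDE on-peak."""
    off_peak_demand = max(kw for h, kw in enumerate(load) if h not in on_peak_hours)
    on_peak_demand = max(load[h] for h in on_peak_hours)
    return off_peak_demand * fac_rate + on_peak_demand * peak_rate

# A day where the all-hours peak (500 kW at hour 18) falls inside on-peak
load = [400.0] * 24
load[18] = 500.0
on_peak_hours = set(range(16, 21))

print(overlapping_bill(load, on_peak_hours, 15.0, 25.0))  # 20000.0
print(heatmap_bill(load, on_peak_hours, 15.0, 25.0))      # 18500.0 (understated)
```

Whenever the all-hours peak lands inside the on-peak window, the heatmap-style entry bills the facilities charge against a lower off-peak demand and understates the bill.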

Storage Technical Specifications

A generic Li-ion based energy storage system was constructed in each tool with minimal complexity.

File:Storage Technical Specs.PNG
Storage Technical Specifications

Comparison, Results, and Discussion

The main comparisons are between the storage system's year-1 financial performance in each tool and how the storage is operated during critical times, such as the peak load day in a month.

Financial Results

The year-1 financial results, shown in the table below, indicate some agreement between tools, reveal some of the differences in how data can be entered into each tool, and identify a critical problem encountered when using one of the tools.

File:Year 1 Bill Savings.PNG
Year 1 Bill Savings

DER-VET and DER-CAM realize extremely similar results despite their different approaches – DER-VET uses a full year of data and DER-CAM uses 3 days per month. One of the design days in DER-CAM is the peak day in each month, which will very likely set the demand charge before and after storage. Because most of the benefit from the storage comes from demand charges and there are no constraints on how the storage can be used to shift energy during normal operation, DER-CAM’s approach does not impact the results very much. In other scenarios with long-duration storage or in cases with unusual load profiles, there might be a larger difference between the two tools.

QuESt also achieved similar results, though its inability to accept the full tariff without errors results in somewhat higher bill savings. QuESt's approach is very similar to DER-VET's for cases like this, so it should calculate near-identical results to DER-VET when given the same inputs.

REopt Lite, which was used to set up a purely financial battery-only case, was not able to operate the battery as effectively as the other tools despite using a similar perfect foresight optimization method. REopt Lite reports the battery's state of charge (SOC) in 10% increments, so it is difficult to diagnose exactly why the tool gave sub-optimal results.

SAM’s battery operational profile resulted in strongly negative savings (the battery increased the site’s electricity bills). SAM does not use perfect foresight dispatch optimization; instead, it operates the battery through a set of rules. In this case, the battery controller calculated a maximum peak load to allow over the next 24 hours based on a 24-hour look-ahead. In most months this approach yielded demand charge savings that would have been modest relative to the perfect foresight result. However, in three months the battery charged quickly at inopportune times, increasing those months’ demand charges by more than it saved in the other nine, resulting in overall negative savings. Working properly, SAM’s approach would be expected to yield smaller savings than the perfect foresight models, not negative ones.
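A heavily simplified sketch of this style of rule-based control — not SAM's actual algorithm, with illustrative numbers and efficiency losses ignored — is to pick the lowest peak target the battery could hold over the next 24 hours, then discharge whenever load exceeds it:

```python
# Heavily simplified sketch of rule-based peak shaving with a 24-hour
# look-ahead -- not SAM's actual algorithm. Numbers are illustrative and
# efficiency losses are ignored.

def dispatch_day(load_24h, batt_kw, batt_kwh):
    """Pick the lowest peak target the battery can hold, then shave to it."""
    target = max(load_24h)
    # Lower the target in 1 kW steps until energy or power limits bind
    while target > 0:
        shave = sum(max(0.0, kw - (target - 1)) for kw in load_24h)
        if shave > batt_kwh or max(load_24h) - (target - 1) > batt_kw:
            break
        target -= 1
    # Discharge (kW) whenever load exceeds the target
    return [min(batt_kw, max(0.0, kw - target)) for kw in load_24h]

load = [400.0] * 20 + [480.0, 500.0, 470.0, 400.0]
discharge = dispatch_day(load, batt_kw=100.0, batt_kwh=200.0)
print(max(l - d for l, d in zip(load, discharge)))  # 417.0: the shaved peak
```

Note that a rule like this only targets the overall peak; it has no notion of a separate, more expensive on-peak demand window, which is relevant to the dispatch results below.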

Interestingly, the tools did not even agree perfectly on what the base case (no storage) energy and demand charges should be. All the tools calculated similar base case electricity bills, but small discrepancies remain that carry through to the savings results.

Dispatch Results

All the tools in this comparison model the operation of the storage chronologically and can, with varying degrees of user effort, export the operational profile of the battery.

To compare all tools evenly, the plot below shows the net load at the site (customer load minus battery power) for July's peak load day and compares these results with the original site load in black. Because this day is one of DER-CAM's design days, its operational results are available in DER-CAM even though the rest of July's are not. July happens to be the month that sets the highest demand charges of the year, which is why it is used for the dispatch comparison.

File:July Operational Comparisons.PNG
July Operational Comparisons

The operation of the batteries in each tool over this day reveals the source of the financial mismatches. DER-VET, DER-CAM, and QuESt all operate the battery very similarly, reducing the customer's net demand to around the same value during the 4pm-9pm peak, when the strongest demand charges and highest energy prices apply. These tools also all charge the battery early in the day, filling the energy reserve for the afternoon peak without increasing the all-hours demand charge. This supports the idea that, given the same tariff information, QuESt would have produced very similar financial results to DER-VET and DER-CAM.
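The behavior these three tools converge on can be illustrated with a hand-rolled sketch — not the optimization the tools actually solve; loads, window, and battery size are illustrative and efficiency losses are ignored: shave the 4pm-9pm window as low as the battery allows, then recharge in low-load hours without raising the site's pre-storage peak.

```python
# Hand-rolled sketch of the converged dispatch behavior -- not the
# optimization these tools actually solve. All numbers are illustrative.

def perfect_foresight_day(load, on_peak, batt_kw, batt_kwh):
    # Lowest on-peak level the battery's energy and power can hold (1 kW steps)
    target = max(load[h] for h in on_peak)
    while target > 0:
        need = sum(max(0.0, load[h] - (target - 1)) for h in on_peak)
        if need > batt_kwh or max(load[h] for h in on_peak) - (target - 1) > batt_kw:
            break
        target -= 1
    net = list(load)
    for h in on_peak:
        net[h] = min(net[h], target)
    # Recharge in the lowest-load hours, capped at the pre-storage peak
    to_charge = sum(load[h] - net[h] for h in on_peak)
    cap = max(load)
    for h in sorted(range(24), key=lambda i: load[i]):
        if h in on_peak or to_charge <= 0:
            continue
        add = min(batt_kw, cap - load[h], to_charge)
        net[h] = load[h] + add
        to_charge -= add
    return net

load = [300.0] * 16 + [480.0, 500.0, 490.0, 470.0, 450.0] + [350.0] * 3
net = perfect_foresight_day(load, on_peak=range(16, 21), batt_kw=150.0, batt_kwh=500.0)
print(max(net[h] for h in range(16, 21)))  # 378.0: flattened on-peak demand
print(max(net))                            # 450.0: below the old 500 kW peak
```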

REopt produces a battery operational profile that appears shifted from the other results by one hour despite tariff inputs that match the other tools. It also finds a result that is much less optimal than DER-VET, DER-CAM, or QuESt. The source of this difference is still unknown, as REopt should be solving a similar perfect foresight dispatch optimization.

SAM does not find a near-optimal operational profile for this month given the load and tariff information available. The tariff has an all-hours demand charge, which SAM's algorithm is very good at reducing (you can see SAM's daily peak load target ratchet up over the month), but also a much more expensive 4pm-9pm weekday demand charge that SAM's algorithm does not reduce very much. SAM's algorithm considers nothing besides reducing the overall monthly peak, so it does not use the battery as efficiently as the other approaches, which reduce the all-hours demand charge only slightly and focus the battery's stored energy on reducing the peak demand charge.

File:SAM July Operation.PNG
SAM July Operation
File:DER VET July Operation.PNG
DER VET July Operation

In other months, such as August, the SAM algorithm yields an inexplicable, short-lived charging event that both somewhat exceeds the battery's power capacity and enormously increases the demand charges for the site. In the plot below, you can again see SAM's algorithm trying to reduce the peak load on day 1 before giving up after the spike on day 2.

File:SAM August Operation.PNG
SAM August Operation


While the approaches adopted by most of the tools show strong alignment (even DER-CAM’s design days approach uses perfect foresight dispatch optimization within the design days), the numerical results do not necessarily show the same alignment. Differences between implementations can result in differences in scope and applicability that limit a true apples-to-apples comparison (e.g. between DER-VET and QuESt in this use case). Also, these implementation differences can cause significant quantitative misalignment between tools despite similar modeling approaches (e.g. between DER-VET and REopt).

In future publications, the set of use cases used in the benchmarking exercise will grow to cover more of each tool’s feature set, will encompass more tools, and will be standardized to allow for consistent benchmarking. For now, the limited scope of the quantitative comparison limits broad conclusions but shows enough to highlight the need for standard benchmarking.