Online pricing of demand response based on long short-term memory and reinforcement learning
Published in: Applied Energy, 2020-08, Vol. 271, p. 114945, Article 114945
Main Authors: , , , ,
Format: Article
Language: English
Summary:
• Propose a quick online pricing method for the case where customers' response is unknown.
• Reinforcement learning is used as the price-decision framework.
• LSTM networks are used to predict customers' response.
• Combine LSTM networks and reinforcement learning to realize virtual exploration.
• Optimize the total profit to avoid the negative effects of myopic optimization.
Incentive-based demand response plays an increasingly important role in ensuring the safe operation of the power grid and reducing system costs, and advances in information and communications technology have made it possible to implement it online. However, in regions where incentive-based demand response has never been implemented, customers' response behavior is unknown; in this case, quickly and accurately setting the incentive price is a challenge for service providers. This paper proposes a pricing method that combines long short-term memory (LSTM) networks and reinforcement learning to solve the service provider's pricing problem when customers' response behavior is unknown. Taking the total profit over all response time slots in one day as the optimization goal, LSTM networks are used to learn the relationship between customers' response behavior and the incentive price, and reinforcement learning is used to explore and determine the optimal price. The results show that combining these two methods enables virtual exploration of the optimal price, overcoming the drawback that reinforcement learning alone can rely only on delayed rewards from interaction with the real scene, thereby speeding up the search for the optimal price. In addition, because the influence of the combination of incentive prices across different time slots on the service provider's profit is considered, the negative effect of myopic optimization is avoided.
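The pricing loop the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: a tabular Q-learning agent picks a discretized incentive price for each time slot, and a stand-in `predicted_response` function (a hypothetical linear model, playing the role of the paper's trained LSTM predictor) supplies the "virtual" reward, so exploration does not have to wait for delayed real-world rewards. All prices, constants, and the response form are assumptions for the sketch.

```python
import random

PRICES = [0.5, 1.0, 1.5, 2.0]  # candidate incentive prices, hypothetical units
SLOTS = 4                      # response time slots in one day (toy value)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2
WHOLESALE_SAVING = 3.0         # provider's value per unit of load reduced (assumed)

def predicted_response(slot, price):
    """Stand-in for the LSTM predictor: estimated load reduction of
    customers in `slot` at incentive `price`. Hypothetical linear form."""
    return max(0.0, 2.0 * price - 0.1 * slot)

def profit(slot, price):
    """Provider profit in one slot: value of the reduction minus incentive paid."""
    reduction = predicted_response(slot, price)
    return (WHOLESALE_SAVING - price) * reduction

# Q[slot][action]: estimated total profit from this slot to the end of the day
Q = [[0.0] * len(PRICES) for _ in range(SLOTS)]

random.seed(0)
for episode in range(2000):          # "virtual exploration" against the predictor
    for slot in range(SLOTS):
        if random.random() < EPS:    # epsilon-greedy exploration
            a = random.randrange(len(PRICES))
        else:
            a = max(range(len(PRICES)), key=lambda i: Q[slot][i])
        reward = profit(slot, PRICES[a])
        nxt = max(Q[slot + 1]) if slot + 1 < SLOTS else 0.0
        Q[slot][a] += ALPHA * (reward + GAMMA * nxt - Q[slot][a])

# Greedy price schedule for the whole day after virtual exploration
best = [PRICES[max(range(len(PRICES)), key=lambda i: Q[s][i])] for s in range(SLOTS)]
print("price schedule:", best)
```

Because the reward comes from the predictor rather than from waiting on real customer responses, the agent can run thousands of simulated days before committing to a schedule, which is the speed-up the abstract attributes to virtual exploration; optimizing the whole day's Q-values jointly, rather than each slot in isolation, is what avoids the myopic per-slot optimum.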
ISSN: 0306-2619, 1872-9118
DOI: 10.1016/j.apenergy.2020.114945