Loading…

Transformer Network-Based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM)

In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design to maximize the reduction of...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on microwave theory and techniques 2022-11, Vol.70 (11), p.4772-4786
Main Authors: Park, Hyunwook, Kim, Minsu, Kim, Seongguk, Kim, Keunwoo, Kim, Haeyeon, Shin, Taein, Son, Keeyoung, Sim, Boogyo, Kim, Subin, Jeong, Seungtaek, Hwang, Chulsoon, Kim, Joungho
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design to maximize the reduction of PDN self- and transfer impedances seen at multiple ports. An attention-based transformer network is implemented to directly parameterize decap optimization policy. The optimality performance is significantly improved since the attention mechanism has powerful expression to explore massive combinatorial space for decap assignments. Moreover, it can capture sequential relationships between the decap assignments. The computing time for optimization is dramatically reduced due to the reusable network on the positions of probing ports and decap assignment candidates. This is because the transformer network has a context embedding process to capture meta-features including probing ports positions. In addition, the network is trained with randomly generated datasets. The computing time for training and data cost are critically decreased due to the scalability of the network. Due to its shared weight property and the context embedding process, the network can adapt to a larger scale of problems without additional training. For verification, the results are compared with conventional genetic algorithm (GA), random search (RS), and all the previous RL-based methods. As a result, the proposed method outperforms in all the following aspects: optimality performance, computing time, and data efficiency.
ISSN:0018-9480
1557-9670
DOI:10.1109/TMTT.2022.3202221