
Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA

Temporal Graph Neural Networks (TGNNs) are powerful models to capture temporal, structural, and contextual information on temporal graphs. The generated temporal node embeddings outperform other methods in many downstream tasks. Real-world applications require high performance inference on real-time streaming dynamic graphs.

Bibliographic Details
Published in: arXiv.org, 2022-03
Main Authors: Zhou, Hongkuan; Zhang, Bingyi; Kannan, Rajgopal; Prasanna, Viktor; Busart, Carl
Format: Article
Language: English
Subjects:
Online Access: Get full text
container_title arXiv.org
creator Zhou, Hongkuan
Zhang, Bingyi
Kannan, Rajgopal
Prasanna, Viktor
Busart, Carl
description Temporal Graph Neural Networks (TGNNs) are powerful models to capture temporal, structural, and contextual information on temporal graphs. The generated temporal node embeddings outperform other methods in many downstream tasks. Real-world applications require high performance inference on real-time streaming dynamic graphs. However, these models usually rely on complex attention mechanisms to capture relationships between temporal neighbors. In addition, maintaining vertex memory suffers from intrinsic temporal data dependency that hinders task-level parallelism, making it inefficient on general-purpose processors. In this work, we present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs. The key modeling optimizations we propose include a lightweight method to compute attention scores and a related temporal neighbor pruning strategy to further reduce computation and memory accesses. These are holistically coupled with key hardware optimizations that leverage FPGA hardware. We replace the temporal sampler with an on-chip FIFO-based hardware sampler and the time encoder with a look-up table. We train our simplified models using knowledge distillation to ensure similar accuracy vis-à-vis the original model. Taking advantage of the model optimizations, we propose a principled hardware architecture using batching, pipelining, and prefetching techniques to further improve performance. We also propose a hardware mechanism to ensure chronological vertex updates without sacrificing computation parallelism. We evaluate the performance of the proposed hardware accelerator on three real-world datasets.
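The abstract mentions replacing the temporal neighbor sampler with an on-chip FIFO and the time encoder with a look-up table. The sketch below is a minimal software illustration of those two ideas, not the authors' implementation: a fixed-depth FIFO that keeps only the most recent neighbors of each vertex, and a table that replaces a functional (cosine) time encoder with a lookup over quantized time deltas. The FIFO depth, the bucket count, and the cosine form of the baseline encoder are assumptions, not details taken from the paper.

```python
# Minimal software sketch of two optimizations named in the abstract
# (assumed forms; the actual FPGA design is described in the paper).
from collections import deque
import numpy as np


class FIFONeighborBuffer:
    """Keep the `depth` most recent (neighbor, timestamp) pairs per vertex."""

    def __init__(self, num_vertices: int, depth: int = 16):
        self.buffers = [deque(maxlen=depth) for _ in range(num_vertices)]

    def insert_edge(self, src: int, dst: int, t: float) -> None:
        # A streaming edge updates the neighbor lists of both endpoints.
        self.buffers[src].append((dst, t))
        self.buffers[dst].append((src, t))

    def neighbors(self, v: int):
        # "Sampling" reduces to reading the FIFO: no random access into
        # the full interaction history is needed.
        return list(self.buffers[v])


class LUTTimeEncoder:
    """Replace cos(dt * w + b) with a table lookup over quantized time deltas."""

    def __init__(self, w: np.ndarray, b: np.ndarray,
                 max_dt: float, num_buckets: int = 1024):
        self.max_dt = max_dt
        self.num_buckets = num_buckets
        centers = (np.arange(num_buckets) + 0.5) * (max_dt / num_buckets)
        # Precompute one encoding vector per bucket: shape (num_buckets, dim).
        self.table = np.cos(centers[:, None] * w[None, :] + b[None, :])

    def __call__(self, dt: np.ndarray) -> np.ndarray:
        idx = np.clip((dt / self.max_dt * self.num_buckets).astype(int),
                      0, self.num_buckets - 1)
        return self.table[idx]


if __name__ == "__main__":
    buf = FIFONeighborBuffer(num_vertices=100, depth=4)
    buf.insert_edge(0, 1, t=1.0)
    buf.insert_edge(0, 2, t=2.5)
    print(buf.neighbors(0))                  # [(1, 1.0), (2, 2.5)]

    rng = np.random.default_rng(0)
    enc = LUTTimeEncoder(w=rng.normal(size=8), b=rng.normal(size=8),
                         max_dt=100.0)
    print(enc(np.array([0.3, 42.0])).shape)  # (2, 8)
```

In this reading, both structures trade exactness for predictable memory access: the FIFO bounds per-vertex neighbor storage, and the table turns a trigonometric evaluation into a single indexed read, which is what makes a hardware (BRAM/LUT) realization straightforward.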
format article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2022-03
issn 2331-8422
language eng
recordid cdi_proquest_journals_2638169907
source Publicly Available Content (ProQuest)
subjects Co-design
Coders
Computation
Computer architecture
Distillation
Field programmable gate arrays
Graph neural networks
Graphs
Hardware
Inference
Parallel processing
Performance enhancement
Performance evaluation
Weight reduction
title Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA