Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon. By decomposing coarse and fine structures in the inputs, wavelets provide an eloquent and compact language for time series forecasting that simplifies learning.
Published in: | arXiv.org, 2024-12 |
---|---|
Main Authors: | Masserano, Luca; Ansari, Abdul Fatir; Boran Han; Zhang, Xiyuan; Faloutsos, Christos; Mahoney, Michael W; Andrew Gordon Wilson; Park, Youngsuk; Rangapuram, Syama; Maddix, Danielle C; Wang, Yuyang |
Format: | Article |
Language: | English |
Subjects: | Autoregressive models; Datasets; Decomposition; Deep learning; Forecasting; Time series |
Online Access: | Get full text |
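The abstract above outlines a three-step tokenization pipeline: scale and wavelet-decompose the input series, threshold and quantize the coefficients into a discrete vocabulary (1024 tokens), and pre-train an autoregressive model over the resulting token sequences. As an illustration only, here is a minimal sketch of such a wavelet-based tokenizer. It is not the authors' implementation: the Haar wavelet, the decomposition level, the threshold, the clipping range, and the mean-absolute scaling are all assumptions made for this example.

```python
# Illustrative sketch of a wavelet-based tokenizer for time series,
# loosely following the pipeline described in the abstract.
# NOT the WaveToken implementation: wavelet choice ('haar'), the
# uniform 1024-bin quantizer, the threshold, and the mean-absolute
# scaling are assumptions made for this example.
import numpy as np
import pywt

VOCAB_SIZE = 1024  # vocabulary size reported in the abstract


def tokenize(series, wavelet="haar", level=3, threshold=0.05, c_max=4.0):
    """Scale a series, wavelet-decompose it, then threshold and
    quantize the coefficients into integer tokens in [0, VOCAB_SIZE)."""
    scale = np.mean(np.abs(series)) + 1e-8                        # 1) scale
    coeffs = pywt.wavedec(series / scale, wavelet, level=level)   # 2) decompose
    flat = np.concatenate(coeffs)
    flat[np.abs(flat) < threshold] = 0.0                          # 3) threshold
    clipped = np.clip(flat, -c_max, c_max)                        # 4) quantize
    tokens = np.round((clipped + c_max) / (2 * c_max) * (VOCAB_SIZE - 1))
    lengths = [len(c) for c in coeffs]  # needed to invert the flattening
    return tokens.astype(np.int64), scale, lengths


def detokenize(tokens, scale, lengths, wavelet="haar", c_max=4.0):
    """Map tokens back to coefficient values and reconstruct the series."""
    flat = tokens / (VOCAB_SIZE - 1) * (2 * c_max) - c_max
    coeffs, i = [], 0
    for n in lengths:
        coeffs.append(flat[i:i + n])
        i += n
    return pywt.waverec(coeffs, wavelet) * scale


t = np.linspace(0, 4 * np.pi, 256)
x = np.sin(t) + 0.1 * np.random.randn(256)
tok, s, ln = tokenize(x)
x_hat = detokenize(tok, s, ln)  # approximate reconstruction of x
```

Under this sketch, an autoregressive model would be pre-trained over the integer token sequences, and its predicted tokens for the forecast horizon mapped back through the inverse transform to produce forecasts, mirroring the forecast-coefficients step described in the abstract.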
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Masserano, Luca; Ansari, Abdul Fatir; Boran Han; Zhang, Xiyuan; Faloutsos, Christos; Mahoney, Michael W; Andrew Gordon Wilson; Park, Youngsuk; Rangapuram, Syama; Maddix, Danielle C; Wang, Yuyang |
description | How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon. By decomposing coarse and fine structures in the inputs, wavelets provide an eloquent and compact language for time series forecasting that simplifies learning. Empirical results on a comprehensive benchmark, including 42 datasets for both in-domain and zero-shot settings, show that WaveToken: i) provides better accuracy than recently proposed foundation models for forecasting while using a much smaller vocabulary (1024 tokens), and performs on par or better than modern deep learning models trained specifically on each dataset; and ii) exhibits superior generalization capabilities, achieving the best average rank across all datasets for three complementary metrics. In addition, we show that our method can easily capture complex temporal patterns of practical relevance that are challenging for other recent pre-trained models, including trends, sparse spikes, and non-stationary time series with varying frequencies evolving over time. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3142375028 |
source | Publicly Available Content Database |
subjects | Autoregressive models; Datasets; Decomposition; Deep learning; Forecasting; Time series |
title | Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization |