
Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version

Bibliographic Details
Published in: arXiv.org, 2024-10
Main Authors: Miao, Hao; Liu, Ziqiao; Zhao, Yan; Guo, Chenjuan; Yang, Bin; Zheng, Kai; Jensen, Christian S.
Format: Article
Language: English
Description: The expanding instrumentation of processes throughout society with sensors yields a proliferation of time series data that may in turn enable important applications, e.g., related to transportation infrastructures or power grids. Machine-learning-based methods are increasingly being used to extract value from such data. We provide means of reducing the resulting considerable computational and data storage costs. We achieve this by providing means of condensing large time series datasets such that models trained on the condensed data achieve performance comparable to those trained on the original, large data. Specifically, we propose a time series dataset condensation framework, TimeDC, that employs two-fold modal matching, encompassing frequency matching and training trajectory matching. Thus, TimeDC performs time series feature extraction and decomposition-driven frequency matching to preserve complex temporal dependencies in the reduced time series. Further, TimeDC employs curriculum training trajectory matching to ensure effective and generalized time series dataset condensation. To avoid memory overflow and to reduce the cost of dataset condensation, the framework includes an expert buffer storing pre-computed expert trajectories. Extensive experiments on real data offer insight into the effectiveness and efficiency of the proposed solutions.
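To make the abstract's two matching objectives concrete, the sketch below illustrates the general idea behind the two-fold modal matching described above. Note that this is not the authors' implementation: the function names, the plain FFT amplitude comparison standing in for decomposition-driven frequency matching, and the flat parameter vectors standing in for full training trajectories are all hypothetical simplifications for illustration only.

```python
import numpy as np

def frequency_matching_loss(real_batch, synthetic_batch):
    """Distance between the frequency spectra of a real and a condensed
    (synthetic) batch of time series, shaped (batch, length).
    Illustrative stand-in for decomposition-driven frequency matching:
    compares mean FFT amplitude spectra across the batch."""
    real_spec = np.abs(np.fft.rfft(real_batch, axis=-1))
    syn_spec = np.abs(np.fft.rfft(synthetic_batch, axis=-1))
    diff = real_spec.mean(axis=0) - syn_spec.mean(axis=0)
    return float(np.mean(diff ** 2))

def trajectory_matching_loss(expert_params, student_params):
    """Normalized distance between the parameters reached by an expert
    trained on the full data (here, a hypothetical flat parameter
    vector taken from a pre-computed expert buffer) and the parameters
    reached by training on the condensed data."""
    expert = np.asarray(expert_params, dtype=float)
    student = np.asarray(student_params, dtype=float)
    return float(np.sum((expert - student) ** 2) / (np.sum(expert ** 2) + 1e-12))
```

In a condensation loop of this kind, the synthetic series would be treated as learnable parameters and updated by gradient descent on a weighted sum of the two losses; both losses are zero exactly when the spectra (respectively, parameter vectors) coincide.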
EISSN: 2331-8422
Source: Publicly Available Content (ProQuest)
Subjects: Condensation; Data storage; Datasets; Effectiveness; Machine learning; Matching; Time series