
Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version

Bibliographic Details
Published in: arXiv.org, 2024-10
Main Authors: Miao, Hao; Liu, Ziqiao; Zhao, Yan; Guo, Chenjuan; Yang, Bin; Zheng, Kai; Jensen, Christian S.
Format: Article
Language: English
Description: The expanding instrumentation of processes throughout society with sensors yields a proliferation of time series data that may in turn enable important applications, e.g., related to transportation infrastructures or power grids. Machine-learning-based methods are increasingly being used to extract value from such data. We provide means of reducing the resulting considerable computational and data storage costs. We achieve this by providing means of condensing large time series datasets such that models trained on the condensed data achieve performance comparable to those trained on the original, large data. Specifically, we propose a time series dataset condensation framework, TimeDC, that employs two-fold modal matching, encompassing frequency matching and training trajectory matching. Thus, TimeDC performs time series feature extraction and decomposition-driven frequency matching to preserve complex temporal dependencies in the reduced time series. Further, TimeDC employs curriculum training trajectory matching to ensure effective and generalized time series dataset condensation. To avoid memory overflow and to reduce the cost of dataset condensation, the framework includes an expert buffer storing pre-computed expert trajectories. Extensive experiments on real data offer insight into the effectiveness and efficiency of the proposed solutions.
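To make the abstract's two matching objectives concrete, the sketch below illustrates the general idea behind the two-fold modal matching described above. Note that this is not the authors' implementation: the function names, the plain FFT amplitude comparison standing in for decomposition-driven frequency matching, and the flat parameter vectors standing in for full training trajectories are all hypothetical simplifications for illustration only.

```python
import numpy as np

def frequency_matching_loss(real_batch, synthetic_batch):
    """Distance between the frequency spectra of a real and a condensed
    (synthetic) batch of time series, shaped (batch, length).
    Illustrative stand-in for decomposition-driven frequency matching:
    compares mean FFT amplitude spectra across the batch."""
    real_spec = np.abs(np.fft.rfft(real_batch, axis=-1))
    syn_spec = np.abs(np.fft.rfft(synthetic_batch, axis=-1))
    diff = real_spec.mean(axis=0) - syn_spec.mean(axis=0)
    return float(np.mean(diff ** 2))

def trajectory_matching_loss(expert_params, student_params):
    """Normalized distance between the parameters reached by an expert
    trained on the full data (here, a hypothetical flat parameter
    vector taken from a pre-computed expert buffer) and the parameters
    reached by training on the condensed data."""
    expert = np.asarray(expert_params, dtype=float)
    student = np.asarray(student_params, dtype=float)
    return float(np.sum((expert - student) ** 2) / (np.sum(expert ** 2) + 1e-12))
```

In a condensation loop of this kind, the synthetic series would be treated as learnable parameters and updated by gradient descent on a weighted sum of the two losses; both losses are zero exactly when the spectra (respectively, parameter vectors) coincide.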
EISSN: 2331-8422
Source: Publicly Available Content (ProQuest)
Subjects: Condensation; Data storage; Datasets; Effectiveness; Machine learning; Matching; Time series