Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version
Published in: | arXiv.org 2024-10 |
---|---|
Main Authors: | Miao, Hao; Liu, Ziqiao; Zhao, Yan; Guo, Chenjuan; Yang, Bin; Zheng, Kai; Jensen, Christian S |
Format: | Article |
Language: | English |
Subjects: | Condensation; Data storage; Datasets; Effectiveness; Machine learning; Matching; Time series |
container_title | arXiv.org |
---|---|
creator | Miao, Hao; Liu, Ziqiao; Zhao, Yan; Guo, Chenjuan; Yang, Bin; Zheng, Kai; Jensen, Christian S |
description | The expanding instrumentation of processes throughout society with sensors yields a proliferation of time series data that may in turn enable important applications, e.g., related to transportation infrastructures or power grids. Machine-learning based methods are increasingly being used to extract value from such data. We provide means of reducing the resulting considerable computational and data storage costs. We achieve this by providing means of condensing large time series datasets such that models trained on the condensed data achieve performance comparable to those trained on the original, large data. Specifically, we propose a time series dataset condensation framework, TimeDC, that employs two-fold modal matching, encompassing frequency matching and training trajectory matching. Thus, TimeDC performs time series feature extraction and decomposition-driven frequency matching to preserve complex temporal dependencies in the reduced time series. Further, TimeDC employs curriculum training trajectory matching to ensure effective and generalized time series dataset condensation. To avoid memory overflow and to reduce the cost of dataset condensation, the framework includes an expert buffer storing pre-computed expert trajectories. Extensive experiments on real data offer insight into the effectiveness and efficiency of the proposed solutions. |
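The two matching objectives named in the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's actual implementation: the function names, array shapes, and the normalized trajectory-distance form are assumptions. Frequency matching is shown as a comparison of mean amplitude spectra, and trajectory matching as a distance between expert and student parameter snapshots.

```python
import numpy as np

def frequency_matching_loss(real_batch, synthetic_batch):
    # Hypothetical frequency matching: compare mean amplitude spectra of the
    # real and condensed series (shape: [num_series, series_length]).
    real_spec = np.abs(np.fft.rfft(real_batch, axis=-1)).mean(axis=0)
    syn_spec = np.abs(np.fft.rfft(synthetic_batch, axis=-1)).mean(axis=0)
    return float(np.mean((real_spec - syn_spec) ** 2))

def trajectory_matching_loss(expert_traj, student_traj):
    # Hypothetical trajectory matching: squared distance between expert and
    # student parameter snapshots, normalized by the expert's own movement
    # (a common form in trajectory-matching condensation methods).
    num = sum(np.sum((e - s) ** 2) for e, s in zip(expert_traj, student_traj))
    den = sum(np.sum((e0 - e1) ** 2)
              for e0, e1 in zip(expert_traj[:-1], expert_traj[1:])) + 1e-12
    return float(num / den)
```

In a setup like the one the abstract describes, the condensed series would be optimized against a weighted sum of both losses, with expert trajectories read from a pre-computed buffer rather than re-trained each iteration.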
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3121798336 |
source | Publicly Available Content (ProQuest) |
subjects | Condensation; Data storage; Datasets; Effectiveness; Machine learning; Matching; Time series |
title | Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version |