Self-supervised representation learning for robust fine-grained human hand action recognition in industrial assembly lines

Bibliographic Details
Published in: Machine vision and applications 2025-01, Vol.36 (1), p.19, Article 19
Main Authors: Sturm, Fabian, Trat, Martin, Sathiyababu, Rahul, Allipilli, Harshitha, Menz, Benjamin, Hergenroether, Elke, Siegel, Melanie
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c156t-3a7599c28f234095076e2b04115c632771b5954adfe4e0f88af59272b62356983
container_end_page
container_issue 1
container_start_page 19
container_title Machine vision and applications
container_volume 36
creator Sturm, Fabian
Trat, Martin
Sathiyababu, Rahul
Allipilli, Harshitha
Menz, Benjamin
Hergenroether, Elke
Siegel, Melanie
description Humans are still indispensable on industrial assembly lines, but in the event of an error, they need support from intelligent systems. In addition to the objects to be observed, it is equally important to understand the fine-grained hand movements of a human in order to track the entire process. However, deep-learning-based hand action recognition methods are very label intensive, and not all industrial companies can afford the associated labeling costs. This work therefore presents a self-supervised learning approach for industrial assembly processes that allows a spatio-temporal transformer architecture to be pre-trained on a variety of information from real-world video footage of daily life. Subsequently, this deep learning model is adapted to the industrial assembly task at hand using only a few labels. The work outlines which well-known real-world datasets are best suited for representation learning of such hand actions in a regression task, and to what extent they improve the subsequent supervised classification task. This subsequent fine-tuning is supplemented by concept drift detection, which makes the resulting models deployed in production more robust against concept drift and future changes in assembly movements.
doi_str_mv 10.1007/s00138-024-01638-9
format article
publisher New York: Springer Nature B.V
rights Copyright Springer Nature B.V. Jan 2025
fulltext fulltext
identifier ISSN: 0932-8092
ispartof Machine vision and applications, 2025-01, Vol.36 (1), p.19, Article 19
issn 0932-8092
1432-1769
language eng
recordid cdi_proquest_journals_3143455575
source Springer Link
subjects Activity recognition
Assembly lines
Corporate learning
Deep learning
Hand (anatomy)
Labels
Machine learning
Moving object recognition
Regression models
Representations
Robustness
Self-supervised learning
title Self-supervised representation learning for robust fine-grained human hand action recognition in industrial assembly lines
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T10%3A01%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Self-supervised%20representation%20learning%20for%20robust%20fine-grained%20human%20hand%20action%20recognition%20in%20industrial%20assembly%20lines&rft.jtitle=Machine%20vision%20and%20applications&rft.au=Sturm,%20Fabian&rft.date=2025-01&rft.volume=36&rft.issue=1&rft.spage=19&rft.pages=19-&rft.artnum=19&rft.issn=0932-8092&rft.eissn=1432-1769&rft_id=info:doi/10.1007/s00138-024-01638-9&rft_dat=%3Cproquest_cross%3E3143455575%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c156t-3a7599c28f234095076e2b04115c632771b5954adfe4e0f88af59272b62356983%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3143455575&rft_id=info:pmid/&rfr_iscdi=true