
Self-supervised Vision Transformers for Land-cover Segmentation and Classification

Transformer models have recently approached or even surpassed the performance of ConvNets on computer vision tasks like classification and segmentation. To a large degree, these successes have been enabled by the use of large-scale labelled image datasets for supervised pre-training. This poses a significant challenge for the adaptation of vision Transformers to domains where datasets with millions of labelled samples are not available. In this work, we bridge the gap between ConvNets and Transformers for Earth observation by self-supervised pre-training on large-scale unlabelled remote sensing data. We show that self-supervised pre-training yields latent task-agnostic representations that can be utilized for both land-cover classification and segmentation tasks, where they significantly outperform the fully supervised baselines. Additionally, we find that subsequent fine-tuning of Transformers for specific downstream tasks performs on par with commonly used ConvNet architectures. An ablation study further illustrates that the labelled dataset size can be reduced to one-tenth after self-supervised pre-training while still maintaining the performance of the fully supervised approach.

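The abstract does not spell out which self-supervised objective the authors use, so the following is only a minimal sketch of one common setup: a SimCLR-style contrastive loss over two augmented views of the same unlabelled scene, with a small ViT encoder in PyTorch. The PatchViT module, the NT-Xent loss, and all hyperparameters here are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch (not the paper's code): contrastive self-supervised
# pre-training of a small ViT encoder on unlabelled remote-sensing tiles.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchViT(nn.Module):
    """Minimal ViT encoder: patchify -> embed -> Transformer -> mean-pool."""
    def __init__(self, in_ch=3, img=224, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, (img // patch) ** 2, dim))  # learned positions
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)    # (B, n_patches, dim)
        return self.encoder(tokens + self.pos).mean(dim=1)   # task-agnostic embedding

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent loss: each view's positive is the other view of the same tile."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = (z @ z.t()) / tau
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])            # index of each positive
    return F.cross_entropy(sim, targets)

# One illustrative optimisation step on random stand-in "augmented views".
encoder = PatchViT()
opt = torch.optim.AdamW(encoder.parameters(), lr=1e-4)
view1, view2 = torch.rand(8, 3, 224, 224), torch.rand(8, 3, 224, 224)
opt.zero_grad()
loss = nt_xent(encoder(view1), encoder(view2))
loss.backward()
opt.step()
```

In a real pipeline, view1 and view2 would be augmentations of actual satellite tiles rather than random tensors, and the loop would run over the full unlabelled corpus.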

Bibliographic Details
Main Authors: Scheibenreif, Linus; Hanna, Joelle; Mommert, Michael; Borth, Damian
Format: Conference Proceeding
Language: English
Subjects: Computational modeling; Computer vision; Conferences; Earth; Image segmentation; Training; Transformers
Online Access: Request full text
cited_by
cites
container_end_page 1430
container_issue
container_start_page 1421
container_title
container_volume
creator Scheibenreif, Linus
Hanna, Joelle
Mommert, Michael
Borth, Damian
description Transformer models have recently approached or even surpassed the performance of ConvNets on computer vision tasks like classification and segmentation. To a large degree, these successes have been enabled by the use of large-scale labelled image datasets for supervised pre-training. This poses a significant challenge for the adaptation of vision Transformers to domains where datasets with millions of labelled samples are not available. In this work, we bridge the gap between ConvNets and Transformers for Earth observation by self-supervised pre-training on large-scale unlabelled remote sensing data. We show that self-supervised pre-training yields latent task-agnostic representations that can be utilized for both land-cover classification and segmentation tasks, where they significantly outperform the fully supervised baselines. Additionally, we find that subsequent fine-tuning of Transformers for specific downstream tasks performs on par with commonly used ConvNet architectures. An ablation study further illustrates that the labelled dataset size can be reduced to one-tenth after self-supervised pre-training while still maintaining the performance of the fully supervised approach.
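The closing ablation claim (matching the fully supervised baseline with one-tenth of the labels) corresponds to a simple label-efficiency protocol: fine-tune the pre-trained encoder on a random 10% subset and compare against training on the full labelled set. Below is a hedged sketch of that protocol, reusing the hypothetical PatchViT encoder from the snippet above; the dataset, subset size, head, and hyperparameters are stand-in assumptions.

```python
# Hypothetical label-efficiency ablation (not the paper's code): fine-tune on a
# random 10% subset of the labelled data, then compare against full supervision.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

N_CLASSES = 10                                  # illustrative number of land-cover classes
images = torch.rand(1000, 3, 224, 224)          # stand-in for a labelled dataset
labels = torch.randint(0, N_CLASSES, (1000,))
full_set = TensorDataset(images, labels)

# Keep a random 10% of the labelled samples, mirroring the ablation setting.
perm = torch.randperm(len(full_set))
subset_10pct = Subset(full_set, perm[: len(full_set) // 10].tolist())

encoder = PatchViT()                            # as defined in the pre-training sketch above
# encoder.load_state_dict(torch.load("ssl_encoder.pt"))  # hypothetical pre-trained weights
head = nn.Linear(256, N_CLASSES)                # lightweight classification head
opt = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=1e-5)

# One fine-tuning epoch over the reduced labelled subset.
for x, y in DataLoader(subset_10pct, batch_size=16, shuffle=True):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(encoder(x)), y)
    loss.backward()
    opt.step()
```

For the segmentation task the same idea applies, except the encoder would keep the per-patch token grid instead of mean-pooling, with a decoder that upsamples it to a per-pixel land-cover map.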
doi_str_mv 10.1109/CVPRW56347.2022.00148
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2160-7516; EISBN: 1665487399, 9781665487399; CODEN: IEEPAD
ispartof 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022, p.1421-1430
issn 2160-7516
language eng
recordid cdi_ieee_primary_9857009
source IEEE Xplore All Conference Series
subjects Computational modeling
Computer vision
Conferences
Earth
Image segmentation
Training
Transformers
title Self-supervised Vision Transformers for Land-cover Segmentation and Classification