Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection

We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent s...

Bibliographic Details
Published in: The Astronomical journal 2023-08, Vol.166 (2), p.75
Main Authors: Liang, Yan, Melchior, Peter, Lu, Sicong, Goulding, Andy, Ward, Charlotte
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63
cites cdi_FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63
container_end_page
container_issue 2
container_start_page 75
container_title The Astronomical journal
container_volume 166
creator Liang, Yan
Melchior, Peter
Lu, Sicong
Goulding, Andy
Ward, Charlotte
description We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.
doi_str_mv 10.3847/1538-3881/ace100
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_fbc3ccb893ce413e9e681d1a7bc433b5</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_fbc3ccb893ce413e9e681d1a7bc433b5</doaj_id><sourcerecordid>2843094773</sourcerecordid><originalsourceid>FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63</originalsourceid><addsrcrecordid>eNp1kEtPwzAQhC0EEqVw5xiJK2nt2EmcY1WgBCFV4nG27M26uApxcVJE_z0pQeXEaVejmdnVR8gloxMuRT5lKZcxl5JNNSCj9IiMDtIxGVFKRZwlaXZKztp2TSljkooReZhtO48N-Mo1q2iha_21i543CF3Qk6gsJ9ETVu2bs11UNp86ON0ARrqpouW2qx2G6Aa73u18c05OrK5bvPidY_J6d_syv48fl4tyPnuMQbCsizNuIakMZCK13FBINIeEg8DEaMtsAobl0qTWZoUtaL-KNE0rQEtlgWAyPibl0Ft5vVab4N512CmvnfoRfFgpHToHNSprgAMYWXBAwTgWmElWMZ0bEJybtO-6Gro2wX9sse3U2m9D07-vEik4LUSe895FBxcE37YB7eEqo2pPX-1Rqz1qNdDvI9dDxPnNX-e_9m9eh4W2</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2843094773</pqid></control><display><type>article</type><title>Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection</title><source>Directory of Open Access Journals</source><creator>Liang, Yan ; Melchior, Peter ; Lu, Sicong ; Goulding, Andy ; Ward, Charlotte</creator><creatorcontrib>Liang, Yan ; Melchior, Peter ; Lu, Sicong ; Goulding, Andy ; Ward, Charlotte</creatorcontrib><description>We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender , which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. 
However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.</description><identifier>ISSN: 0004-6256</identifier><identifier>EISSN: 1538-3881</identifier><identifier>DOI: 10.3847/1538-3881/ace100</identifier><language>eng</language><publisher>Madison: The American Astronomical Society</publisher><subject>Astronomy ; Astrostatistics ; Data analysis ; Galaxies ; Invariants ; Outliers (statistics) ; Probability distribution ; Red shift ; Spectra ; Spectroscopy ; Stars &amp; galaxies</subject><ispartof>The Astronomical journal, 2023-08, Vol.166 (2), p.75</ispartof><rights>2023. The Author(s). Published by the American Astronomical Society.</rights><rights>2023. The Author(s). Published by the American Astronomical Society. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63</citedby><cites>FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63</cites><orcidid>0000-0002-4557-6682 ; 0000-0002-8873-5065 ; 0000-0002-1001-1235 ; 0000-0002-8814-1670 ; 0000-0003-4700-663X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,860,2096,27901,27902</link.rule.ids></links><search><creatorcontrib>Liang, Yan</creatorcontrib><creatorcontrib>Melchior, Peter</creatorcontrib><creatorcontrib>Lu, Sicong</creatorcontrib><creatorcontrib>Goulding, Andy</creatorcontrib><creatorcontrib>Ward, Charlotte</creatorcontrib><title>Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection</title><title>The Astronomical journal</title><addtitle>AJ</addtitle><addtitle>Astron. J</addtitle><description>We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender , which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. 
We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.</description><subject>Astronomy</subject><subject>Astrostatistics</subject><subject>Data analysis</subject><subject>Galaxies</subject><subject>Invariants</subject><subject>Outliers (statistics)</subject><subject>Probability distribution</subject><subject>Red shift</subject><subject>Spectra</subject><subject>Spectroscopy</subject><subject>Stars &amp; galaxies</subject><issn>0004-6256</issn><issn>1538-3881</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNp1kEtPwzAQhC0EEqVw5xiJK2nt2EmcY1WgBCFV4nG27M26uApxcVJE_z0pQeXEaVejmdnVR8gloxMuRT5lKZcxl5JNNSCj9IiMDtIxGVFKRZwlaXZKztp2TSljkooReZhtO48N-Mo1q2iha_21i543CF3Qk6gsJ9ETVu2bs11UNp86ON0ARrqpouW2qx2G6Aa73u18c05OrK5bvPidY_J6d_syv48fl4tyPnuMQbCsizNuIakMZCK13FBINIeEg8DEaMtsAobl0qTWZoUtaL-KNE0rQEtlgWAyPibl0Ft5vVab4N512CmvnfoRfFgpHToHNSprgAMYWXBAwTgWmElWMZ0bEJybtO-6Gro2wX9sse3U2m9D07-vEik4LUSe895FBxcE37YB7eEqo2pPX-1Rqz1qNdDvI9dDxPnNX-e_9m9eh4W2</recordid><startdate>20230801</startdate><enddate>20230801</enddate><creator>Liang, Yan</creator><creator>Melchior, Peter</creator><creator>Lu, Sicong</creator><creator>Goulding, Andy</creator><creator>Ward, Charlotte</creator><general>The American Astronomical Society</general><general>IOP 
Publishing</general><scope>O3W</scope><scope>TSCCA</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TG</scope><scope>8FD</scope><scope>H8D</scope><scope>KL.</scope><scope>L7M</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4557-6682</orcidid><orcidid>https://orcid.org/0000-0002-8873-5065</orcidid><orcidid>https://orcid.org/0000-0002-1001-1235</orcidid><orcidid>https://orcid.org/0000-0002-8814-1670</orcidid><orcidid>https://orcid.org/0000-0003-4700-663X</orcidid></search><sort><creationdate>20230801</creationdate><title>Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection</title><author>Liang, Yan ; Melchior, Peter ; Lu, Sicong ; Goulding, Andy ; Ward, Charlotte</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Astronomy</topic><topic>Astrostatistics</topic><topic>Data analysis</topic><topic>Galaxies</topic><topic>Invariants</topic><topic>Outliers (statistics)</topic><topic>Probability distribution</topic><topic>Red shift</topic><topic>Spectra</topic><topic>Spectroscopy</topic><topic>Stars &amp; galaxies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liang, Yan</creatorcontrib><creatorcontrib>Melchior, Peter</creatorcontrib><creatorcontrib>Lu, Sicong</creatorcontrib><creatorcontrib>Goulding, Andy</creatorcontrib><creatorcontrib>Ward, Charlotte</creatorcontrib><collection>Open Access: IOP Publishing Free Content</collection><collection>IOPscience (Open Access)</collection><collection>CrossRef</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - 
Academic</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Directory of Open Access Journals</collection><jtitle>The Astronomical journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liang, Yan</au><au>Melchior, Peter</au><au>Lu, Sicong</au><au>Goulding, Andy</au><au>Ward, Charlotte</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection</atitle><jtitle>The Astronomical journal</jtitle><stitle>AJ</stitle><addtitle>Astron. J</addtitle><date>2023-08-01</date><risdate>2023</risdate><volume>166</volume><issue>2</issue><spage>75</spage><pages>75-</pages><issn>0004-6256</issn><eissn>1538-3881</eissn><abstract>We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender , which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. 
We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.</abstract><cop>Madison</cop><pub>The American Astronomical Society</pub><doi>10.3847/1538-3881/ace100</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-4557-6682</orcidid><orcidid>https://orcid.org/0000-0002-8873-5065</orcidid><orcidid>https://orcid.org/0000-0002-1001-1235</orcidid><orcidid>https://orcid.org/0000-0002-8814-1670</orcidid><orcidid>https://orcid.org/0000-0003-4700-663X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0004-6256
ispartof The Astronomical journal, 2023-08, Vol.166 (2), p.75
issn 0004-6256
1538-3881
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_fbc3ccb893ce413e9e681d1a7bc433b5
source Directory of Open Access Journals
subjects Astronomy
Astrostatistics
Data analysis
Galaxies
Invariants
Outliers (statistics)
Probability distribution
Red shift
Spectra
Spectroscopy
Stars & galaxies
title Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T00%3A03%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Autoencoding%20Galaxy%20Spectra.%20II.%20Redshift%20Invariance%20and%20Outlier%20Detection&rft.jtitle=The%20Astronomical%20journal&rft.au=Liang,%20Yan&rft.date=2023-08-01&rft.volume=166&rft.issue=2&rft.spage=75&rft.pages=75-&rft.issn=0004-6256&rft.eissn=1538-3881&rft_id=info:doi/10.3847/1538-3881/ace100&rft_dat=%3Cproquest_doaj_%3E2843094773%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c416t-63fc2dbc645f3b0c2a3c23c4e2baf1f2cb178b5ff69f9078b4555dcef089ecb63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2843094773&rft_id=info:pmid/&rfr_iscdi=true