
Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, 2023-04, Vol. 49 (4), p. 1-16
Main Authors: Hageman, Kaspar, Feal, Alvaro, Gamba, Julien, Girish, Aniketh, Bleier, Jakob, Lindorfer, Martina, Tapiador, Juan, Vallina-Rodriguez, Narseo
Format: Article
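Language: English
Subjects: Android; Applications programs; Attribution; Attribution graph; Availability; Certificates; Companies; Ecosystems; Empirical analysis; Internet; Metadata; Mobile apps; Mobile operating systems; Operating systems; Software; Validity; Web and internet services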
container_end_page 16
container_issue 4
container_start_page 1
container_title IEEE transactions on software engineering
container_volume 49
creator Hageman, Kaspar
Feal, Alvaro
Gamba, Julien
Girish, Aniketh
Bleier, Jakob
Lindorfer, Martina
Tapiador, Juan
Vallina-Rodriguez, Narseo
description The ability to identify the author responsible for a given software object is critical for many research studies and for enhancing software transparency and accountability. However, as opposed to other application markets like Apple's iOS App Store, attribution in the Android ecosystem is known to be hard. Prior research has leveraged market metadata and signing certificates to identify software authors without questioning the validity and accuracy of these attribution signals. However, Android application (app) authors can, either intentionally or by mistake, hide their true identity due to: (1) the lack of policy enforcement by markets to ensure the accuracy and correctness of the information disclosed by developers in their market profiles during the app release process, and (2) the use of self-signed certificates for signing apps instead of certificates issued by trusted CAs. In this paper, we perform the first empirical analysis of the availability, volatility and overall aptness of publicly available market and app metadata for author attribution in Android markets. To that end, we analyze a dataset of over 2.5 million market entries and apps extracted from five Android markets for over two years. Our results show that widely used attribution signals are often missing from market profiles and that they change over time. We also invalidate the general belief about the validity of signing certificates for author attribution. For instance, we find that apps from different authors share signing certificates due to the proliferation of app building frameworks and software factories. Finally, we introduce the concept of an attribution graph and we apply it to evaluate the validity of existing attribution signals on the Google Play Store. Our results confirm that the lack of control over publicly available signals can confuse automatic attribution processes.
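The abstract's point about shared signing certificates and the attribution graph can be illustrated with a small sketch. The Python below is a hypothetical example, not the construction used by the authors: it assumes a toy record layout of (package name, developer name, certificate fingerprint, developer email), invented sample data, and a union-find over a bipartite app-to-signal graph. Apps that share any selected signal value end up in the same presumed-author cluster, so one certificate reused by an app-building service merges two unrelated developers, while the name check flags both a genuinely shared certificate and a same-author name variation.

```python
from collections import defaultdict

# Hypothetical market records, invented for illustration only:
# (package name, developer name in market profile, signing-cert fingerprint, developer email)
RECORDS = [
    ("com.example.bakery", "Sweet Bakery LLC", "cert:aa11", "info@sweetbakery.example"),
    # Reuses cert:aa11, e.g. built by the same app-building service (assumption).
    ("com.example.gym", "City Gym", "cert:aa11", "hello@citygym.example"),
    ("com.example.notes", "Note Labs", "cert:bb22", "dev@notelabs.example"),
    # Same author, inconsistent developer name across listings (assumption).
    ("com.example.notes.pro", "NoteLabs Inc.", "cert:bb22", "dev@notelabs.example"),
]

# Position of each hypothetical signal inside a record tuple.
SIGNAL_INDEX = {"cert": 2, "email": 3}


class UnionFind:
    """Minimal disjoint-set structure used to merge nodes that share a signal."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def attribution_clusters(records, signals=("cert", "email")):
    """Link every app node to a node for each selected signal value, then
    return the connected components as sorted lists of package names."""
    uf = UnionFind()
    for rec in records:
        app = ("app", rec[0])
        uf.find(app)  # register apps that may share no signal with others
        for sig in signals:
            uf.union(app, (sig, rec[SIGNAL_INDEX[sig]]))
    clusters = defaultdict(set)
    for rec in records:
        clusters[uf.find(("app", rec[0]))].add(rec[0])
    return sorted(sorted(c) for c in clusters.values())


def certs_with_multiple_names(records):
    """Return certificates that sign apps listed under more than one developer
    name: a hint, not proof, that the certificate does not identify one author."""
    names = defaultdict(set)
    for _pkg, dev, cert, _email in records:
        names[cert].add(dev)
    return {cert: sorted(devs) for cert, devs in names.items() if len(devs) > 1}


if __name__ == "__main__":
    print(attribution_clusters(RECORDS))
    print(attribution_clusters(RECORDS, signals=("email",)))
    print(certs_with_multiple_names(RECORDS))
```

With both signals the sketch yields two clusters, one of which joins `Sweet Bakery LLC` and `City Gym` solely through the shared certificate; restricting it to the email signal separates them. This mirrors, in miniature, how uncontrolled public signals can mislead automatic attribution.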
doi_str_mv 10.1109/TSE.2023.3236582
format article
date 2023-04-01
publisher New York: IEEE
rights Copyright IEEE Computer Society 2023
coden IESEDJ
toplevel peer_reviewed
orcidid 0000-0002-4573-3967 ; 0000-0003-4554-8291 ; 0000-0002-5420-6835 ; 0000-0002-6658-1800 ; 0000-0002-2895-125X
linktohtml https://ieeexplore.ieee.org/document/10015797
identifier ISSN: 0098-5589
ispartof IEEE transactions on software engineering, 2023-04, Vol.49 (4), p.1-16
issn 0098-5589
1939-3520
language eng
recordid cdi_proquest_journals_2803047642
source IEEE Electronic Library (IEL) Journals
subjects Android
Applications programs
Attribution
Attribution graph
Availability
Certificates
Companies
Ecosystems
Empirical analysis
Internet
Metadata
Mobile apps
Mobile operating systems
Operating systems
Software
Validity
Web and internet services
title Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T00%3A15%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mixed%20Signals:%20Analyzing%20Software%20Attribution%20Challenges%20in%20the%20Android%20Ecosystem&rft.jtitle=IEEE%20transactions%20on%20software%20engineering&rft.au=Hageman,%20Kaspar&rft.date=2023-04-01&rft.volume=49&rft.issue=4&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=0098-5589&rft.eissn=1939-3520&rft.coden=IESEDJ&rft_id=info:doi/10.1109/TSE.2023.3236582&rft_dat=%3Cproquest_ieee_%3E2803047642%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c245t-111540c43048cb29784a50ab37655bef99847d8009e4324042306fbee06c2a6c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2803047642&rft_id=info:pmid/&rft_ieee_id=10015797&rfr_iscdi=true