Normalized Affymetrix expression data are biased by G-quadruplex formation

Bibliographic Details
Published in: Nucleic acids research 2012-04, Vol.40 (8), p.3307-3315
Main Authors: Shanahan, Hugh P; Memon, Farhat N; Upton, Graham J G; Harrison, Andrew P
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c414t-47dfa078fea6dc6d053a7e2534a44a0bdf024655871b058dfc89916eff9cd0cd3
cites cdi_FETCH-LOGICAL-c414t-47dfa078fea6dc6d053a7e2534a44a0bdf024655871b058dfc89916eff9cd0cd3
container_end_page 3315
container_issue 8
container_start_page 3307
container_title Nucleic acids research
container_volume 40
creator Shanahan, Hugh P
Memon, Farhat N
Upton, Graham J G
Harrison, Andrew P
description Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The elimination of these probes from any analysis in such affected data sets outweighs the increase of noise in the signal.
doi_str_mv 10.1093/nar/gkr1230
format article
publisher England: Oxford University Press
pmid 22199258
date 2012-04-01
rights The Author(s) 2011. Published by Oxford University Press.
oa free_for_read
tpages 9
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333884/
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2012-04, Vol.40 (8), p.3307-3315
issn 0305-1048
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3333884
source Oxford Open; PubMed Central
subjects Computational Biology
Data processing
DNA Probes - chemistry
Farms
G-Quadruplexes
Gene Expression Profiling
Guanine
Humans
Hydrogen bonding
mRNA
Oligonucleotide Array Sequence Analysis
Probes
title Normalized Affymetrix expression data are biased by G-quadruplex formation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T10%3A11%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Normalized%20Affymetrix%20expression%20data%20are%20biased%20by%20G-quadruplex%20formation&rft.jtitle=Nucleic%20acids%20research&rft.au=Shanahan,%20Hugh%20P&rft.date=2012-04-01&rft.volume=40&rft.issue=8&rft.spage=3307&rft.epage=3315&rft.pages=3307-3315&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkr1230&rft_dat=%3Cproquest_pubme%3E1014104350%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c414t-47dfa078fea6dc6d053a7e2534a44a0bdf024655871b058dfc89916eff9cd0cd3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1009530572&rft_id=info:pmid/22199258&rfr_iscdi=true
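
The description above defines a G-stack as a run of four or more guanines in a probe sequence and recommends removing such probes before analysis. A minimal Python sketch of that criterion follows. It is not the authors' code: the function names, the `(probe_set_id, sequence)` input layout, and the example sequences are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# G-stack as defined in the abstract: four or more consecutive guanines.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(probe_sequence):
    """Return True if the probe sequence contains a run of >= 4 guanines."""
    return G_STACK.search(probe_sequence.upper()) is not None


def g_stack_counts(probes):
    """Count G-stack probes per probe set.

    `probes` is an iterable of (probe_set_id, sequence) pairs; this layout
    is an assumption, not a format taken from the paper.
    """
    counts = Counter()
    for probe_set_id, sequence in probes:
        # Adding 0 keeps probe sets without any G-stack probe in the result.
        counts[probe_set_id] += int(has_g_stack(sequence))
    return counts


def drop_g_stack_probes(probes):
    """Return only the probes that contain no G-stack."""
    return [(ps, seq) for ps, seq in probes if not has_g_stack(seq)]


# Hypothetical 25-mer probes, made up for this example.
EXAMPLE_PROBES = [
    ("set_A", "ACGTGGGGTACCATGCATGCATGCA"),  # contains GGGG
    ("set_A", "ACGTGGGTACCATGCATGCATGCAT"),  # only GGG, not a G-stack
    ("set_B", "TTTTCCCCAAAATTTTCCCCAAAAT"),  # no guanines at all
]

EXAMPLE_COUNTS = g_stack_counts(EXAMPLE_PROBES)
EXAMPLE_FILTERED = drop_g_stack_probes(EXAMPLE_PROBES)
```

Counting per probe set mirrors the abstract's observation that the bias grows with the number of G-stack probes in a probe set; how filtered probes would then be fed into a normalization pipeline such as RMA is outside what this sketch covers.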