Normalized Affymetrix expression data are biased by G-quadruplex formation
Published in: | Nucleic acids research 2012-04, Vol.40 (8), p.3307-3315 |
---|---|
Main Authors: | Shanahan, Hugh P; Memon, Farhat N; Upton, Graham J G; Harrison, Andrew P |
Format: | Article |
Language: | English |
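Subjects: | Computational Biology; Data processing; DNA Probes - chemistry; Farms; G-Quadruplexes; Gene Expression Profiling; Guanine; Humans; Hydrogen bonding; mRNA; Oligonucleotide Array Sequence Analysis; Probes |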
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333884/ |
field | value |
---|---|
container_end_page | 3315 |
container_issue | 8 |
container_start_page | 3307 |
container_title | Nucleic acids research |
container_volume | 40 |
creator | Shanahan, Hugh P; Memon, Farhat N; Upton, Graham J G; Harrison, Andrew P |
description | Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression level of the mRNA they are intended to measure. This is most likely caused by the formation of G-quadruplexes, structures in which guanines on different probes form Hoogsteen hydrogen bonds and which probes containing G-stacks are capable of forming. We demonstrate that, for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization, there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines; two of them, FARMS and PLIER, reduced the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. Including G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. In such data sets, the benefit of eliminating these probes from any analysis outweighs the resulting increase in signal noise. |
doi_str_mv | 10.1093/nar/gkr1230 |
format | article |
pmid | 22199258 |
publisher | Oxford University Press, England |
rights | The Author(s) 2011. Published by Oxford University Press. |
abbreviated_title | Nucleic Acids Res |
pages_total | 9 |
peer_reviewed | yes |
open_access | free to read |
linktohtml | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333884/ |
linktopdf | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333884/pdf/ |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/22199258 |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2012-04, Vol.40 (8), p.3307-3315 |
issn | 0305-1048 (print); 1362-4962 (electronic) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3333884 |
source | Oxford Open; PubMed Central |
subjects | Computational Biology; Data processing; DNA Probes - chemistry; Farms; G-Quadruplexes; Gene Expression Profiling; Guanine; Humans; Hydrogen bonding; mRNA; Oligonucleotide Array Sequence Analysis; Probes |
title | Normalized Affymetrix expression data are biased by G-quadruplex formation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T10%3A11%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Normalized%20Affymetrix%20expression%20data%20are%20biased%20by%20G-quadruplex%20formation&rft.jtitle=Nucleic%20acids%20research&rft.au=Shanahan,%20Hugh%20P&rft.date=2012-04-01&rft.volume=40&rft.issue=8&rft.spage=3307&rft.epage=3315&rft.pages=3307-3315&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkr1230&rft_dat=%3Cproquest_pubme%3E1014104350%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c414t-47dfa078fea6dc6d053a7e2534a44a0bdf024655871b058dfc89916eff9cd0cd3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1009530572&rft_id=info:pmid/22199258&rfr_iscdi=true |
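The abstract defines a G-stack probe as one whose sequence contains a run of four or more guanines, reports that bias grows with the number of such probes in a probe set, and recommends eliminating them from affected data sets. The Python sketch below shows one way that criterion could be applied before summarization. It is not the authors' pipeline. The function names, the probe-set IDs (`set_A`, `set_B`, `set_C`) and the sequences are hypothetical placeholders rather than HG_U133A annotation data, and loading real probe sequences is not shown.

```python
import re

# Assumption for this sketch: a "G-stack" is any run of four or more
# consecutive guanines, following the definition in the abstract.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(sequence: str) -> bool:
    """Return True if the probe sequence contains a run of >= 4 guanines."""
    return G_STACK.search(sequence.upper()) is not None


def count_g_stack_probes(probes):
    """Count G-stack probes per probe set (hypothetical helper).

    `probes` is an iterable of (probe_set_id, probe_sequence) pairs.
    Probe sets with no G-stack probes are reported with a count of 0.
    """
    counts = {}
    for probe_set_id, sequence in probes:
        counts[probe_set_id] = counts.get(probe_set_id, 0) + int(has_g_stack(sequence))
    return counts


def drop_g_stack_probes(probes):
    """Return only the probes without a G-stack, preserving input order."""
    return [(ps, seq) for ps, seq in probes if not has_g_stack(seq)]


# Hypothetical probe-set IDs and made-up sequences, for illustration only.
example_probes = [
    ("set_A", "ACGTGGGGTACCATGCATGCATGCA"),  # contains GGGG
    ("set_A", "ACGTGGGTACCATGCATGCATGCAT"),  # only GGG, so no G-stack
    ("set_B", "TTGGGGGCATCAGTACGATCAGTCA"),  # contains GGGGG
    ("set_B", "GGGGACGTACGTACGTACGTACGTA"),  # contains GGGG
    ("set_C", "CATCATCATCATCATCATCATCATC"),  # no guanines at all
]

print(count_g_stack_probes(example_probes))
# {'set_A': 1, 'set_B': 2, 'set_C': 0}
print(drop_g_stack_probes(example_probes))
```

Dropping probes shrinks each probe set, which is the noise cost the abstract weighs against the bias. In the toy data the hypothetical `set_B` loses every probe, so a real pipeline would need a rule for such sets before running a summarization method such as RMA.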