Selecting the Right Correlation Measure for Binary Data

Bibliographic Details
Published in: ACM Transactions on Knowledge Discovery from Data, 2014-11, Vol. 9 (2), p. 1-28
Main Authors: Duan, Lian; Street, W. Nick; Liu, Yanchi; Xu, Songhua; Wu, Brook
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3
cites cdi_FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3
container_end_page 28
container_issue 2
container_start_page 1
container_title ACM transactions on knowledge discovery from data
container_volume 9
creator Duan, Lian
Street, W. Nick
Liu, Yanchi
Xu, Songhua
Wu, Brook
description Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Although there are numerous measures available for evaluating correlations, different correlation measures provide drastically different results. Piatetsky-Shapiro provided three mandatory properties for any reasonable correlation measure, and Tan et al. proposed several properties to categorize correlation measures; however, it is still hard for users to choose the desirable correlation measures according to their needs. In order to solve this problem, we explore the effectiveness problem in three ways. First, we propose two desirable properties and two optional properties for correlation measure selection and study the property satisfaction for different correlation measures. Second, we study different techniques to adjust correlation measures and propose two new correlation measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we analyze the upper and lower bounds of different measures and categorize them by the bound differences. Combining these three directions, we provide guidelines for users to choose the proper measure according to their needs.
doi_str_mv 10.1145/2637484
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1718933474</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1718933474</sourcerecordid><originalsourceid>FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3</originalsourceid><addsrcrecordid>eNo1kMtOwzAURC0EEqUgfsE72ARy7etHl1DKQypC4iGxixznug1Kk2K7C_6eopbVzOLMaDSMnUN5BYDqWmhp0OIBG4FSukAjPg__vbZwzE5S-ipLpQDEiJk36sjntl_wvCT-2i6WmU-HGKlzuR16_kwubSLxMER-2_Yu_vA7l90pOwquS3S21zH7uJ-9Tx-L-cvD0_RmXngpMBfSNVALr32JwWiHrvEAdUAZrAEh68YJhV6DtlTbCTXQoKZtAO3EK2qCHLPLXe86Dt8bSrlatclT17mehk2qwICdSIkGt-jFDvVxSClSqNaxXW0HV1BWf9dU-2vkL181VPM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1718933474</pqid></control><display><type>article</type><title>Selecting the Right Correlation Measure for Binary Data</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Duan, Lian ; Street, W. Nick ; Liu, Yanchi ; Xu, Songhua ; Wu, Brook</creator><creatorcontrib>Duan, Lian ; Street, W. Nick ; Liu, Yanchi ; Xu, Songhua ; Wu, Brook</creatorcontrib><description>Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Although there are numerous measures available for evaluating correlations, different correlation measures provide drastically different results. Piatetsky-Shapiro provided three mandatory properties for any reasonable correlation measure, and Tan et al. proposed several properties to categorize correlation measures; however, it is still hard for users to choose the desirable correlation measures according to their needs. In order to solve this problem, we explore the effectiveness problem in three ways. 
First, we propose two desirable properties and two optional properties for correlation measure selection and study the property satisfaction for different correlation measures. Second, we study different techniques to adjust correlation measures and propose two new correlation measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we analyze the upper and lower bounds of different measures and categorize them by the bound differences. Combining these three directions, we provide guidelines for users to choose the proper measure according to their needs.</description><identifier>ISSN: 1556-4681</identifier><identifier>EISSN: 1556-472X</identifier><identifier>DOI: 10.1145/2637484</identifier><language>eng</language><subject>Adjustment ; Binary data ; Continuity ; Correlation ; Correlation analysis ; Guidelines ; Lower bounds ; Medical</subject><ispartof>ACM transactions on knowledge discovery from data, 2014-11, Vol.9 (2), p.1-28</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3</citedby><cites>FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Duan, Lian</creatorcontrib><creatorcontrib>Street, W. Nick</creatorcontrib><creatorcontrib>Liu, Yanchi</creatorcontrib><creatorcontrib>Xu, Songhua</creatorcontrib><creatorcontrib>Wu, Brook</creatorcontrib><title>Selecting the Right Correlation Measure for Binary Data</title><title>ACM transactions on knowledge discovery from data</title><description>Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Although there are numerous measures available for evaluating correlations, different correlation measures provide drastically different results. Piatetsky-Shapiro provided three mandatory properties for any reasonable correlation measure, and Tan et al. proposed several properties to categorize correlation measures; however, it is still hard for users to choose the desirable correlation measures according to their needs. In order to solve this problem, we explore the effectiveness problem in three ways. First, we propose two desirable properties and two optional properties for correlation measure selection and study the property satisfaction for different correlation measures. Second, we study different techniques to adjust correlation measures and propose two new correlation measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we analyze the upper and lower bounds of different measures and categorize them by the bound differences. 
Combining these three directions, we provide guidelines for users to choose the proper measure according to their needs.</description><subject>Adjustment</subject><subject>Binary data</subject><subject>Continuity</subject><subject>Correlation</subject><subject>Correlation analysis</subject><subject>Guidelines</subject><subject>Lower bounds</subject><subject>Medical</subject><issn>1556-4681</issn><issn>1556-472X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNo1kMtOwzAURC0EEqUgfsE72ARy7etHl1DKQypC4iGxixznug1Kk2K7C_6eopbVzOLMaDSMnUN5BYDqWmhp0OIBG4FSukAjPg__vbZwzE5S-ipLpQDEiJk36sjntl_wvCT-2i6WmU-HGKlzuR16_kwubSLxMER-2_Yu_vA7l90pOwquS3S21zH7uJ-9Tx-L-cvD0_RmXngpMBfSNVALr32JwWiHrvEAdUAZrAEh68YJhV6DtlTbCTXQoKZtAO3EK2qCHLPLXe86Dt8bSrlatclT17mehk2qwICdSIkGt-jFDvVxSClSqNaxXW0HV1BWf9dU-2vkL181VPM</recordid><startdate>20141101</startdate><enddate>20141101</enddate><creator>Duan, Lian</creator><creator>Street, W. Nick</creator><creator>Liu, Yanchi</creator><creator>Xu, Songhua</creator><creator>Wu, Brook</creator><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20141101</creationdate><title>Selecting the Right Correlation Measure for Binary Data</title><author>Duan, Lian ; Street, W. 
Nick ; Liu, Yanchi ; Xu, Songhua ; Wu, Brook</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Adjustment</topic><topic>Binary data</topic><topic>Continuity</topic><topic>Correlation</topic><topic>Correlation analysis</topic><topic>Guidelines</topic><topic>Lower bounds</topic><topic>Medical</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Duan, Lian</creatorcontrib><creatorcontrib>Street, W. Nick</creatorcontrib><creatorcontrib>Liu, Yanchi</creatorcontrib><creatorcontrib>Xu, Songhua</creatorcontrib><creatorcontrib>Wu, Brook</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>ACM transactions on knowledge discovery from data</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Duan, Lian</au><au>Street, W. 
Nick</au><au>Liu, Yanchi</au><au>Xu, Songhua</au><au>Wu, Brook</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Selecting the Right Correlation Measure for Binary Data</atitle><jtitle>ACM transactions on knowledge discovery from data</jtitle><date>2014-11-01</date><risdate>2014</risdate><volume>9</volume><issue>2</issue><spage>1</spage><epage>28</epage><pages>1-28</pages><issn>1556-4681</issn><eissn>1556-472X</eissn><abstract>Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Although there are numerous measures available for evaluating correlations, different correlation measures provide drastically different results. Piatetsky-Shapiro provided three mandatory properties for any reasonable correlation measure, and Tan et al. proposed several properties to categorize correlation measures; however, it is still hard for users to choose the desirable correlation measures according to their needs. In order to solve this problem, we explore the effectiveness problem in three ways. First, we propose two desirable properties and two optional properties for correlation measure selection and study the property satisfaction for different correlation measures. Second, we study different techniques to adjust correlation measures and propose two new correlation measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we analyze the upper and lower bounds of different measures and categorize them by the bound differences. Combining these three directions, we provide guidelines for users to choose the proper measure according to their needs.</abstract><doi>10.1145/2637484</doi><tpages>28</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1556-4681
ispartof ACM transactions on knowledge discovery from data, 2014-11, Vol.9 (2), p.1-28
issn 1556-4681
1556-472X
language eng
recordid cdi_proquest_miscellaneous_1718933474
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Adjustment
Binary data
Continuity
Correlation
Correlation analysis
Guidelines
Lower bounds
Medical
title Selecting the Right Correlation Measure for Binary Data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-23T05%3A01%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Selecting%20the%20Right%20Correlation%20Measure%20for%20Binary%20Data&rft.jtitle=ACM%20transactions%20on%20knowledge%20discovery%20from%20data&rft.au=Duan,%20Lian&rft.date=2014-11-01&rft.volume=9&rft.issue=2&rft.spage=1&rft.epage=28&rft.pages=1-28&rft.issn=1556-4681&rft.eissn=1556-472X&rft_id=info:doi/10.1145/2637484&rft_dat=%3Cproquest_cross%3E1718933474%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c324t-3ad1b2c6c04f76a4adc11bf43f87123bda254c6168eb89ed1d46eb2c489c5edf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1718933474&rft_id=info:pmid/&rfr_iscdi=true