Analysis of messy data with heteroscedastic in mean models
In data analysis, we are often faced with data that do not meet some of the assumptions of the analysis. Such data are often called messy data. The problem arises when the data contain outliers, which bias the estimates or inflate the estimation error. To analyze messy data, there...
Main Authors: | Trianasari, Nurvita ; Sumarni, Cucu |
---|---|
Format: | Conference Proceeding |
Language: | English |
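Subjects: | Computer simulation ; Data analysis ; Data simulation ; Error analysis ; Heterogeneity ; Methods ; Outliers (statistics) ; Test procedures |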
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | 1 |
container_start_page | |
container_title | |
container_volume | 1716 |
creator | Trianasari, Nurvita ; Sumarni, Cucu |
description | In data analysis, we are often faced with data that do not meet some of the assumptions of the analysis. Such data are often called messy data. The problem arises when the data contain outliers, which bias the estimates or inflate the estimation error. There are three approaches to analyzing messy data: standard analysis, data transformation, and nonstandard analysis methods. Simulations were conducted to determine the performance of three procedures for comparing means when the variance in the model is not homogeneous. Each scenario was simulated 500 times. The means were then compared using three methods: the Welch test, mixed models, and the Welch-r test. The data were generated in R version 3.1.2. Based on the simulation results, the three methods can be used for both normal and non-normal data in the homoscedastic case. The three methods work very well on balanced or unbalanced data when the assumption of homogeneity of variance is not violated.
For balanced data, the three methods still performed excellently when the assumption of homogeneity of variance was violated and the degree of heterogeneity was high. This is shown by test power above 90 percent, with the best results for the Welch method (98.4%) and the Welch-r method (97.8%). For unbalanced data, the Welch method performs very well in the case of moderate heterogeneity with positive pairing, with 98.2% power. The mixed models method performs very well, in terms of power, in the case of high heterogeneity with negative pairing. The Welch-r method works very well in both cases.
However, if the level of heterogeneity of variance is very high, the power of all methods decreases, especially for the mixed models method. With balanced data, the methods that still work reasonably well (power above 50%) are the Welch-r method (62.6%) and the Welch method (58.6%). If the data are unbalanced, the Welch-r method works reasonably well in the highly heterogeneous case with either positive or negative pairing, with power of 68.8% and 51%, respectively. The Welch method performs reasonably well only in the highly heterogeneous case with positive pairing, with power of 64.8%, while the mixed models method does well in the very heterogeneous case with negative pairing, with 54.6% power. So in general, when the variances are not homogeneous, the Welch method applied to the ranks of the data (Welch-r) has better performance than the other methods. |
doi_str_mv | 10.1063/1.4942992 |
format | conference_proceeding |
contributor | Naiborhu, Janson ; Kania, Adhe |
publisher | Melville: American Institute of Physics |
date | 2016-02-29 |
coden | APCPCS |
tpages | 6 |
rights | 2016 AIP Publishing LLC. |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 0094-243X |
ispartof | AIP conference proceedings, 2016, Vol.1716 (1) |
issn | 0094-243X ; 1551-7616 |
language | eng |
recordid | cdi_scitation_primary_10_1063_1_4942992 |
source | American Institute of Physics:Jisc Collections:Transitional Journals Agreement 2021-23 (Reading list) |
subjects | Computer simulation ; Data analysis ; Data simulation ; Error analysis ; Heterogeneity ; Methods ; Outliers (statistics) ; Test procedures |
title | Analysis of messy data with heteroscedastic in mean models |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T05%3A35%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_scita&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Analysis%20of%20messy%20data%20with%20heteroscedastic%20in%20mean%20models&rft.btitle=AIP%20conference%20proceedings&rft.au=Trianasari,%20Nurvita&rft.date=2016-02-29&rft.volume=1716&rft.issue=1&rft.issn=0094-243X&rft.eissn=1551-7616&rft.coden=APCPCS&rft_id=info:doi/10.1063/1.4942992&rft_dat=%3Cproquest_scita%3E2121899623%3C/proquest_scita%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-p288t-eaf269878fb50746ec4006abe468af7e0daac3112748c519e221c59aebe1c76a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2121899623&rft_id=info:pmid/&rfr_iscdi=true |