
Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets

The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object which have to be identified using complex algorithms to interrogate the data. T...

Bibliographic Details
Published in: Journal of grid computing, 2020-09, Vol.18 (3), p.507-527
Main Authors: Ghorbani, M., Swift, S., Taylor, S. J. E., Payne, A. M.
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c286t-e85015fcd41e13d990e4e63792cfe4581b82d2bac2119e4d23e0ddfb31da15b33
container_end_page 527
container_issue 3
container_start_page 507
container_title Journal of grid computing
container_volume 18
creator Ghorbani, M.
Swift, S.
Taylor, S. J. E.
Payne, A. M.
description The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object, which has to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different, distributed data sources. Thus, for non-computing experts, the generation of such matrices proves a barrier to employing machine learning techniques. Further, since datasets are becoming larger, this barrier is compounded by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing possible in most institutions. The system makes use of the Grid and Cloud User Support Environment (gUSE), combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE), to create workflow-based science gateways that allow users to submit work to distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical and text data.
doi_str_mv 10.1007/s10723-020-09518-y
format article
fullrecord <record><control><sourceid>crossref_sprin</sourceid><recordid>TN_cdi_crossref_primary_10_1007_s10723_020_09518_y</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1007_s10723_020_09518_y</sourcerecordid><originalsourceid>FETCH-LOGICAL-c286t-e85015fcd41e13d990e4e63792cfe4581b82d2bac2119e4d23e0ddfb31da15b33</originalsourceid><addsrcrecordid>eNp9kMFOwzAMQCMEEmPwA5zyAQScpF3T49joQBriADtHaeNOmbp2SjJp_Xu6lTOSJVu2n2U9Qh45PHOA7CVwyIRkIIBBnnLF-isy4WkmWM5Vcn2pgWUqk7fkLoQdgEgViAnplhjctqVdTQ0tGjy5ssEnugnoaeEdtrbpaYEmHj3STxO9O9EVtuhNdF1Lv_sQcU9Na6mLgc4Ph8ZV42iIV9ft0Q6Nhi5NNAFjuCc3tWkCPvzlKdkUbz-Ld7b-Wn0s5mtWCTWLDFUKPK0rm3Dk0uY5YIIzmeWiqjFJFS-VsKI0leA8x8QKiWBtXUpuDU9LKadEjHcr34XgsdYH7_bG95qDPivTozI9KNMXZbofIDlCYVhut-j1rjv6dvjzP-oXslZw6A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets</title><source>Springer Nature</source><creator>Ghorbani, M. ; Swift, S. ; Taylor, S. J. E. ; Payne, A. M.</creator><creatorcontrib>Ghorbani, M. ; Swift, S. ; Taylor, S. J. E. ; Payne, A. M.</creatorcontrib><description>The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object which have to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus for non-computing experts the generation of such matrices prove a barrier to employing machine learning techniques. 
Further since datasets are becoming larger this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally by making use of The Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be speeded up using distributed volunteer computing possible in most institutions. The system makes use of a combination of the Grid and Cloud User Support Environment (gUSE), combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories, however we propose that this system could be used to analyse a wide variety of feature sets including image, numerical and text data.</description><identifier>ISSN: 1570-7873</identifier><identifier>EISSN: 1572-9184</identifier><identifier>DOI: 10.1007/s10723-020-09518-y</identifier><language>eng</language><publisher>Dordrecht: Springer Netherlands</publisher><subject>Computer Science ; Management of Computing and Information Systems ; Processor Architectures ; User Interfaces and Human Computer Interaction</subject><ispartof>Journal of grid computing, 2020-09, Vol.18 (3), p.507-527</ispartof><rights>The Author(s) 
2020</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c286t-e85015fcd41e13d990e4e63792cfe4581b82d2bac2119e4d23e0ddfb31da15b33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,778,782,27907,27908</link.rule.ids></links><search><creatorcontrib>Ghorbani, M.</creatorcontrib><creatorcontrib>Swift, S.</creatorcontrib><creatorcontrib>Taylor, S. J. E.</creatorcontrib><creatorcontrib>Payne, A. M.</creatorcontrib><title>Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets</title><title>Journal of grid computing</title><addtitle>J Grid Computing</addtitle><description>The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object which have to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus for non-computing experts the generation of such matrices prove a barrier to employing machine learning techniques. Further since datasets are becoming larger this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally by making use of The Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be speeded up using distributed volunteer computing possible in most institutions. 
The system makes use of a combination of the Grid and Cloud User Support Environment (gUSE), combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories, however we propose that this system could be used to analyse a wide variety of feature sets including image, numerical and text data.</description><subject>Computer Science</subject><subject>Management of Computing and Information Systems</subject><subject>Processor Architectures</subject><subject>User Interfaces and Human Computer Interaction</subject><issn>1570-7873</issn><issn>1572-9184</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9kMFOwzAMQCMEEmPwA5zyAQScpF3T49joQBriADtHaeNOmbp2SjJp_Xu6lTOSJVu2n2U9Qh45PHOA7CVwyIRkIIBBnnLF-isy4WkmWM5Vcn2pgWUqk7fkLoQdgEgViAnplhjctqVdTQ0tGjy5ssEnugnoaeEdtrbpaYEmHj3STxO9O9EVtuhNdF1Lv_sQcU9Na6mLgc4Ph8ZV42iIV9ft0Q6Nhi5NNAFjuCc3tWkCPvzlKdkUbz-Ld7b-Wn0s5mtWCTWLDFUKPK0rm3Dk0uY5YIIzmeWiqjFJFS-VsKI0leA8x8QKiWBtXUpuDU9LKadEjHcr34XgsdYH7_bG95qDPivTozI9KNMXZbofIDlCYVhut-j1rjv6dvjzP-oXslZw6A</recordid><startdate>20200901</startdate><enddate>20200901</enddate><creator>Ghorbani, M.</creator><creator>Swift, S.</creator><creator>Taylor, S. J. E.</creator><creator>Payne, A. M.</creator><general>Springer Netherlands</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20200901</creationdate><title>Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets</title><author>Ghorbani, M. ; Swift, S. ; Taylor, S. J. E. ; Payne, A. 
M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c286t-e85015fcd41e13d990e4e63792cfe4581b82d2bac2119e4d23e0ddfb31da15b33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science</topic><topic>Management of Computing and Information Systems</topic><topic>Processor Architectures</topic><topic>User Interfaces and Human Computer Interaction</topic><toplevel>online_resources</toplevel><creatorcontrib>Ghorbani, M.</creatorcontrib><creatorcontrib>Swift, S.</creatorcontrib><creatorcontrib>Taylor, S. J. E.</creatorcontrib><creatorcontrib>Payne, A. M.</creatorcontrib><collection>Springer Open Access</collection><collection>CrossRef</collection><jtitle>Journal of grid computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ghorbani, M.</au><au>Swift, S.</au><au>Taylor, S. J. E.</au><au>Payne, A. M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets</atitle><jtitle>Journal of grid computing</jtitle><stitle>J Grid Computing</stitle><date>2020-09-01</date><risdate>2020</risdate><volume>18</volume><issue>3</issue><spage>507</spage><epage>527</epage><pages>507-527</pages><issn>1570-7873</issn><eissn>1572-9184</eissn><abstract>The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object which have to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. 
Thus for non-computing experts the generation of such matrices prove a barrier to employing machine learning techniques. Further since datasets are becoming larger this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally by making use of The Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be speeded up using distributed volunteer computing possible in most institutions. The system makes use of a combination of the Grid and Cloud User Support Environment (gUSE), combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories, however we propose that this system could be used to analyse a wide variety of feature sets including image, numerical and text data.</abstract><cop>Dordrecht</cop><pub>Springer Netherlands</pub><doi>10.1007/s10723-020-09518-y</doi><tpages>21</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1570-7873
ispartof Journal of grid computing, 2020-09, Vol.18 (3), p.507-527
issn 1570-7873
1572-9184
language eng
recordid cdi_crossref_primary_10_1007_s10723_020_09518_y
source Springer Nature
subjects Computer Science
Management of Computing and Information Systems
Processor Architectures
User Interfaces and Human Computer Interaction
title Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T18%3A57%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Design%20of%20a%20Flexible,%20User%20Friendly%20Feature%20Matrix%20Generation%20System%20and%20its%20Application%20on%20Biomedical%20Datasets&rft.jtitle=Journal%20of%20grid%20computing&rft.au=Ghorbani,%20M.&rft.date=2020-09-01&rft.volume=18&rft.issue=3&rft.spage=507&rft.epage=527&rft.pages=507-527&rft.issn=1570-7873&rft.eissn=1572-9184&rft_id=info:doi/10.1007/s10723-020-09518-y&rft_dat=%3Ccrossref_sprin%3E10_1007_s10723_020_09518_y%3C/crossref_sprin%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c286t-e85015fcd41e13d990e4e63792cfe4581b82d2bac2119e4d23e0ddfb31da15b33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true