A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction
The reprogrammable CRISPR/Cas9 genome editing tool's growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity...
Published in: | Biomolecules (Basel, Switzerland), 2022-08, Vol.12 (8), p.1123 |
---|---|
Main Authors: | Vora, Dhvani Sandip; Verma, Yugesh; Sundar, Durai |
Format: | Article |
Language: | English |
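Subjects: | Algorithms; binding energy; CRISPR; CRISPR-Cas Systems - genetics; CRISPR/Cas9; Datasets; Deep learning; Design; DNA; Efficiency; Energy; Gene Editing; Genome editing; Genomes; gRNA; Learning algorithms; Machine Learning; Methods; off-targets; Predictions; Proteins; RNA, Guide, CRISPR-Cas Systems - genetics; SHAP values; Thermodynamics |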
cited_by | cdi_FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3 |
---|---|
cites | cdi_FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3 |
container_end_page | |
container_issue | 8 |
container_start_page | 1123 |
container_title | Biomolecules (Basel, Switzerland) |
container_volume | 12 |
creator | Vora, Dhvani Sandip ; Verma, Yugesh ; Sundar, Durai |
description | The reprogrammable CRISPR/Cas9 genome editing tool's growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. Based on sequence features, previous machine learning models performed poorly on new datasets, thus there is a need for the incorporation of novel features. The binding energy estimation of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed generating better performing machine learning models for the prediction of Cas9 activity. The analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau, in the performance on unseen datasets, of current machine learning models could be overcome by incorporating novel features, such as binding energy, among others. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA). |
doi_str_mv | 10.3390/biom12081123 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_3493fa8fdd5242a8b8db8b1b4c805342</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A726894312</galeid><doaj_id>oai_doaj_org_article_3493fa8fdd5242a8b8db8b1b4c805342</doaj_id><sourcerecordid>A726894312</sourcerecordid><originalsourceid>FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3</originalsourceid><addsrcrecordid>eNpdks9v2yAUx61p01p1ve08Ie2yw9Ly0-DLpChaV0vZVnU97IYwPCdENnjYiZT_fqTpqnSABDw-fOH79IriPcFXjFX4uvGxJxQrQih7VZxTStSMSvb79cn6rLgcxw3OTeVB2dvijJUYV5jI86Kbo-_Grn0AtASTgg8rNB-GFHMQTRHVDsLk2z2a1oDqfohpMsECii36EXfQoRsw0zbBiNqY0OK-_nV3f70wY4XmdvI7P-3RXQLn8yaGd8Wb1nQjXD7NF8XDzdeHxe1s-fNbvZgvZ5ZzOc1K02AJGBqQlROMKsxFYw0pBWdOSMEpNhVpjSJNti1Kgg1rwXBXSWKVZRdFfZR10Wz0kHxv0l5H4_VjIKaVNmnytgPNeMWyUOucoJwa1SjXqIY03CosGKdZ68tRa9g2PTibs5FM90L05Unwa72KO11xLEomssCnJ4EU_2xhnHTvRwtdZwLE7aipxLLERMgqox__Qzdxm0LO1IHKNklOQaaujtTKZAM-tDG_a3N30HsbA7Q-x-eSlqrijBwsfD5esCmOY4L2-fcE60MV6dMqyviHU8fP8L-aYX8BhtHBRA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2706101165</pqid></control><display><type>article</type><title>A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Vora, Dhvani Sandip ; Verma, Yugesh ; Sundar, Durai</creator><creatorcontrib>Vora, Dhvani Sandip ; Verma, Yugesh ; Sundar, Durai</creatorcontrib><description>The reprogrammable CRISPR/Cas9 genome editing tool's growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. 
Based on sequence features, previous machine learning models performed poorly on new datasets, thus there is a need for the incorporation of novel features. The binding energy estimation of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed generating better performing machine learning models for the prediction of Cas9 activity. The analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau, in the performance on unseen datasets, of current machine learning models could be overcome by incorporating novel features, such as binding energy, among others. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA).</description><identifier>ISSN: 2218-273X</identifier><identifier>EISSN: 2218-273X</identifier><identifier>DOI: 10.3390/biom12081123</identifier><identifier>PMID: 36009017</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Algorithms ; binding energy ; CRISPR ; CRISPR-Cas Systems - genetics ; CRISPR/Cas9 ; Datasets ; Deep learning ; Design ; DNA ; Efficiency ; Energy ; Gene Editing ; Genome editing ; Genomes ; gRNA ; Learning algorithms ; Machine Learning ; Methods ; off-targets ; Predictions ; Proteins ; RNA, Guide, CRISPR-Cas Systems - genetics ; SHAP values ; Thermodynamics</subject><ispartof>Biomolecules (Basel, Switzerland), 2022-08, Vol.12 (8), p.1123</ispartof><rights>COPYRIGHT 2022 MDPI AG</rights><rights>2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2022 by the authors. 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3</citedby><cites>FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3</cites><orcidid>0000-0002-6549-6663</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2706101165/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2706101165?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25751,27922,27923,37010,37011,44588,53789,53791,74896</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36009017$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Vora, Dhvani Sandip</creatorcontrib><creatorcontrib>Verma, Yugesh</creatorcontrib><creatorcontrib>Sundar, Durai</creatorcontrib><title>A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction</title><title>Biomolecules (Basel, Switzerland)</title><addtitle>Biomolecules</addtitle><description>The reprogrammable CRISPR/Cas9 genome editing tool's growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. 
Based on sequence features, previous machine learning models performed poorly on new datasets, thus there is a need for the incorporation of novel features. The binding energy estimation of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed generating better performing machine learning models for the prediction of Cas9 activity. The analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau, in the performance on unseen datasets, of current machine learning models could be overcome by incorporating novel features, such as binding energy, among others. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA).</description><subject>Algorithms</subject><subject>binding energy</subject><subject>CRISPR</subject><subject>CRISPR-Cas Systems - genetics</subject><subject>CRISPR/Cas9</subject><subject>Datasets</subject><subject>Deep learning</subject><subject>Design</subject><subject>DNA</subject><subject>Efficiency</subject><subject>Energy</subject><subject>Gene Editing</subject><subject>Genome editing</subject><subject>Genomes</subject><subject>gRNA</subject><subject>Learning algorithms</subject><subject>Machine Learning</subject><subject>Methods</subject><subject>off-targets</subject><subject>Predictions</subject><subject>Proteins</subject><subject>RNA, Guide, CRISPR-Cas Systems - genetics</subject><subject>SHAP 
values</subject><subject>Thermodynamics</subject><issn>2218-273X</issn><issn>2218-273X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpdks9v2yAUx61p01p1ve08Ie2yw9Ly0-DLpChaV0vZVnU97IYwPCdENnjYiZT_fqTpqnSABDw-fOH79IriPcFXjFX4uvGxJxQrQih7VZxTStSMSvb79cn6rLgcxw3OTeVB2dvijJUYV5jI86Kbo-_Grn0AtASTgg8rNB-GFHMQTRHVDsLk2z2a1oDqfohpMsECii36EXfQoRsw0zbBiNqY0OK-_nV3f70wY4XmdvI7P-3RXQLn8yaGd8Wb1nQjXD7NF8XDzdeHxe1s-fNbvZgvZ5ZzOc1K02AJGBqQlROMKsxFYw0pBWdOSMEpNhVpjSJNti1Kgg1rwXBXSWKVZRdFfZR10Wz0kHxv0l5H4_VjIKaVNmnytgPNeMWyUOucoJwa1SjXqIY03CosGKdZ68tRa9g2PTibs5FM90L05Unwa72KO11xLEomssCnJ4EU_2xhnHTvRwtdZwLE7aipxLLERMgqox__Qzdxm0LO1IHKNklOQaaujtTKZAM-tDG_a3N30HsbA7Q-x-eSlqrijBwsfD5esCmOY4L2-fcE60MV6dMqyviHU8fP8L-aYX8BhtHBRA</recordid><startdate>20220816</startdate><enddate>20220816</enddate><creator>Vora, Dhvani Sandip</creator><creator>Verma, Yugesh</creator><creator>Sundar, Durai</creator><general>MDPI 
AG</general><general>MDPI</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7T5</scope><scope>7TM</scope><scope>7TO</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6549-6663</orcidid></search><sort><creationdate>20220816</creationdate><title>A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction</title><author>Vora, Dhvani Sandip ; Verma, Yugesh ; Sundar, Durai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>binding energy</topic><topic>CRISPR</topic><topic>CRISPR-Cas Systems - genetics</topic><topic>CRISPR/Cas9</topic><topic>Datasets</topic><topic>Deep learning</topic><topic>Design</topic><topic>DNA</topic><topic>Efficiency</topic><topic>Energy</topic><topic>Gene Editing</topic><topic>Genome editing</topic><topic>Genomes</topic><topic>gRNA</topic><topic>Learning algorithms</topic><topic>Machine 
Learning</topic><topic>Methods</topic><topic>off-targets</topic><topic>Predictions</topic><topic>Proteins</topic><topic>RNA, Guide, CRISPR-Cas Systems - genetics</topic><topic>SHAP values</topic><topic>Thermodynamics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vora, Dhvani Sandip</creatorcontrib><creatorcontrib>Verma, Yugesh</creatorcontrib><creatorcontrib>Sundar, Durai</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Immunology Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>ProQuest Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research 
Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>Biomolecules (Basel, Switzerland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vora, Dhvani Sandip</au><au>Verma, Yugesh</au><au>Sundar, Durai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction</atitle><jtitle>Biomolecules (Basel, Switzerland)</jtitle><addtitle>Biomolecules</addtitle><date>2022-08-16</date><risdate>2022</risdate><volume>12</volume><issue>8</issue><spage>1123</spage><pages>1123-</pages><issn>2218-273X</issn><eissn>2218-273X</eissn><abstract>The reprogrammable CRISPR/Cas9 genome editing tool's growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. Based on sequence features, previous machine learning models performed poorly on new datasets, thus there is a need for the incorporation of novel features. 
The binding energy estimation of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed generating better performing machine learning models for the prediction of Cas9 activity. The analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau, in the performance on unseen datasets, of current machine learning models could be overcome by incorporating novel features, such as binding energy, among others. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA).</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>36009017</pmid><doi>10.3390/biom12081123</doi><orcidid>https://orcid.org/0000-0002-6549-6663</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2218-273X |
ispartof | Biomolecules (Basel, Switzerland), 2022-08, Vol.12 (8), p.1123 |
issn | 2218-273X 2218-273X |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_3493fa8fdd5242a8b8db8b1b4c805342 |
source | Publicly Available Content Database; PubMed Central |
subjects | Algorithms ; binding energy ; CRISPR ; CRISPR-Cas Systems - genetics ; CRISPR/Cas9 ; Datasets ; Deep learning ; Design ; DNA ; Efficiency ; Energy ; Gene Editing ; Genome editing ; Genomes ; gRNA ; Learning algorithms ; Machine Learning ; Methods ; off-targets ; Predictions ; Proteins ; RNA, Guide, CRISPR-Cas Systems - genetics ; SHAP values ; Thermodynamics |
title | A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T20%3A51%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Machine%20Learning%20Approach%20to%20Identify%20the%20Importance%20of%20Novel%20Features%20for%20CRISPR/Cas9%20Activity%20Prediction&rft.jtitle=Biomolecules%20(Basel,%20Switzerland)&rft.au=Vora,%20Dhvani%20Sandip&rft.date=2022-08-16&rft.volume=12&rft.issue=8&rft.spage=1123&rft.pages=1123-&rft.issn=2218-273X&rft.eissn=2218-273X&rft_id=info:doi/10.3390/biom12081123&rft_dat=%3Cgale_doaj_%3EA726894312%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c447t-6ab07e0ebe79d5328045bca16543d575420a91fa81b1235610a3fea4d971c8c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2706101165&rft_id=info:pmid/36009017&rft_galeid=A726894312&rfr_iscdi=true |