An Encrypted Speech Retrieval Method Based on Deep Perceptual Hashing and CNN-BiLSTM
A convolutional neural network (CNN) can only extract local features, while a long short-term memory (LSTM) network involves a large amount of learning computation, a long processing time, and an increasing degree of information loss as the length of the speech grows. Exploiting the autonomous feature-extraction capability of deep learning, a CNN and a bidirectional long short-term memory (BiLSTM) network are combined to present an encrypted speech retrieval method based on deep perceptual hashing and CNN-BiLSTM. First, the proposed method extracts the Log-Mel spectrogram/MFCC features of the original speech and feeds them into the CNN and BiLSTM networks in turn for model training. Second, the trained fusion network model is used to learn deep perceptual features and generate deep perceptual hashing sequences. Finally, the normalized Hamming distance algorithm is used for matching and retrieval. To protect speech security in the cloud, a speech encryption algorithm based on a 4D hyperchaotic system is proposed. Experimental results show that the proposed method offers good discrimination, robustness, recall, and precision compared with existing methods, as well as good retrieval efficiency and accuracy for longer speech. Meanwhile, the proposed speech encryption algorithm has a large key space to resist exhaustive attacks.
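As a rough, non-authoritative illustration of the hashing pipeline the abstract describes, the sketch below wires a small CNN into a BiLSTM that maps a Log-Mel spectrogram to a binary perceptual hash. It is not the authors' architecture: the layer sizes, the 200 x 64 input shape, the 128-bit hash length, and the sigmoid-plus-threshold binarization are all assumptions made only for illustration.

```python
# Minimal CNN + BiLSTM perceptual-hash sketch (assumed shapes, not the paper's model).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_FRAMES, N_MELS, HASH_BITS = 200, 64, 128        # assumed input/hash sizes

def build_hash_model() -> tf.keras.Model:
    inp = layers.Input(shape=(N_FRAMES, N_MELS, 1))             # Log-Mel "image"
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Fold the frequency axis into the channel axis so each remaining
    # time step becomes one BiLSTM input vector.
    x = layers.Reshape((N_FRAMES // 4, (N_MELS // 4) * 64))(x)
    x = layers.Bidirectional(layers.LSTM(128))(x)               # temporal context
    out = layers.Dense(HASH_BITS, activation="sigmoid")(x)      # soft hash bits
    return models.Model(inp, out)

model = build_hash_model()
soft_bits = model.predict(np.random.rand(1, N_FRAMES, N_MELS, 1), verbose=0)
hash_bits = (soft_bits > 0.5).astype(np.uint8)                  # binarize to {0, 1}
print(hash_bits.shape)                                          # (1, 128)
```

In a real system the soft bits would first be trained against a retrieval or classification objective before binarization; the untrained network here only demonstrates the data flow.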
| Published in: | IEEE Access, 2020-01, Vol. 8, p. 1-1 |
|---|---|
| Main Authors: | Zhang, Qiuyu; Li, Yuzhou; Hu, Yinjie; Zhao, Xuejiao |
| Format: | Article |
| Language: | English |
| Subjects: | 4D hyperchaotic system; Algorithms; Artificial neural networks; CNN-BiLSTM; Deep perceptual hashing; Encrypted speech retrieval; Encryption; Feature extraction; Filter banks; Hidden Markov models; Machine learning; Mel frequency cepstral coefficient; Neural networks; Retrieval; Short term; Spectrogram; Speech; Speech feature extraction |
container_end_page | 1 |
container_issue | |
container_start_page | 1 |
container_title | IEEE access |
container_volume | 8 |
creator | Zhang, Qiuyu; Li, Yuzhou; Hu, Yinjie; Zhao, Xuejiao |
description | A convolutional neural network (CNN) can only extract local features, while a long short-term memory (LSTM) network involves a large amount of learning computation, a long processing time, and an increasing degree of information loss as the length of the speech grows. Exploiting the autonomous feature-extraction capability of deep learning, a CNN and a bidirectional long short-term memory (BiLSTM) network are combined to present an encrypted speech retrieval method based on deep perceptual hashing and CNN-BiLSTM. First, the proposed method extracts the Log-Mel spectrogram/MFCC features of the original speech and feeds them into the CNN and BiLSTM networks in turn for model training. Second, the trained fusion network model is used to learn deep perceptual features and generate deep perceptual hashing sequences. Finally, the normalized Hamming distance algorithm is used for matching and retrieval. To protect speech security in the cloud, a speech encryption algorithm based on a 4D hyperchaotic system is proposed. Experimental results show that the proposed method offers good discrimination, robustness, recall, and precision compared with existing methods, as well as good retrieval efficiency and accuracy for longer speech. Meanwhile, the proposed speech encryption algorithm has a large key space to resist exhaustive attacks. |
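The matching step named in the description, normalized Hamming distance over binary hash sequences, is simple enough to sketch directly. This is a toy illustration, not the paper's code: the 128-bit hashes, the in-memory index, and the 0.25 retrieval threshold are assumptions (the paper selects its threshold experimentally).

```python
# Toy normalized-Hamming-distance retrieval over binary perceptual hashes.
import numpy as np

def normalized_hamming(h1: np.ndarray, h2: np.ndarray) -> float:
    """Fraction of differing bits between two equal-length binary hashes."""
    return np.count_nonzero(h1 != h2) / h1.size

def retrieve(query_hash, hash_index, threshold=0.25):
    """Return (clip_id, distance) pairs under the threshold, closest first."""
    hits = [(clip_id, normalized_hamming(query_hash, h))
            for clip_id, h in hash_index.items()]
    return sorted((hit for hit in hits if hit[1] <= threshold), key=lambda t: t[1])

# Toy usage with random 128-bit hashes standing in for real hash sequences.
rng = np.random.default_rng(0)
index = {f"clip_{i}": rng.integers(0, 2, 128, dtype=np.uint8) for i in range(5)}
query = index["clip_3"].copy()
query[:4] ^= 1                       # simulate a slight perceptual distortion
print(retrieve(query, index))        # clip_3 matches with distance 4/128
```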
doi_str_mv | 10.1109/ACCESS.2020.3015876 |
format | article |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2020-01, Vol.8, p.1-1 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_2b3d2c6822bd46829ff2e9727e94a994 |
source | IEEE Xplore Open Access Journals |
subjects | 4D hyperchaotic system; Algorithms; Artificial neural networks; CNN-BiLSTM; Deep perceptual hashing; Encrypted speech retrieval; Encryption; Feature extraction; Filter banks; Hidden Markov models; Machine learning; Mel frequency cepstral coefficient; Neural networks; Retrieval; Short term; Spectrogram; Speech; Speech feature extraction |
title | An Encrypted Speech Retrieval Method Based on Deep Perceptual Hashing and CNN-BiLSTM |