Secure Count Query on Encrypted Genomic Data

Bibliographic Details
Published in: arXiv.org, 2017-03
Main Authors: Mohammad Zahidul Hasan, Md Safiur Rahman Mahdi, Noman Mohammed
Format: Article
Language: English
Description
Summary: Capturing the vast amount of meaningful information encoded in the human genome is a fascinating research problem. The outcomes of this research have significant influence on a number of health-related fields: personalized medicine, paternity testing, and disease susceptibility testing, to name a few. Facilitating large-scale biomedical research projects of this kind often requires disparate organizations to share the genomic and clinical data they have collected. In that case, it is of utmost importance to ensure that sharing, managing, and analyzing the data does not reveal the identity of the individuals who contribute their genomic samples. The storage of, and computation on, the shared data can be delegated to third-party cloud infrastructure equipped with large storage and high-performance computing resources. Outsourcing sensitive genomic data to third-party cloud storage carries the risk of loss, theft, or misuse of the data, because the server administrator cannot be completely trusted and there is no guarantee that the security of the server will not be breached. In this paper, we provide a model for secure sharing of, and computation on, genomic data in a semi-honest third-party cloud server. The security of the shared data is guaranteed through encryption, while the overall computation remains fast and scalable enough for real-life large-scale biomedical applications. We evaluated the efficiency of our proposed model on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that a query of 50 SNPs in a database of 50,000 records, where each record contains 300 SNPs, takes approximately 6 seconds.
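The summary does not spell out the query semantics, so the following is only a minimal plaintext sketch of what a count query over SNP records might look like. It assumes a query is a set of (SNP position, genotype) pairs and that the answer is the number of records matching all of them. The names `count_query`, `database`, and `query`, the genotype encoding, and the toy data are hypothetical. The paper's encryption scheme and its cloud protocol are not reproduced here.

```python
from typing import Dict, List

# Hypothetical plaintext encoding: each record is a list of genotype
# strings, one entry per SNP position. The paper's encrypted
# representation is NOT shown here.
Record = List[str]


def count_query(database: List[Record], query: Dict[int, str]) -> int:
    """Count records whose genotype matches the query at every queried SNP position."""
    return sum(
        all(record[pos] == genotype for pos, genotype in query.items())
        for record in database
    )


# Toy data (made up for illustration): 4 records with 3 SNPs each.
database = [
    ["AG", "CC", "TT"],
    ["AG", "CT", "TT"],
    ["AA", "CC", "TT"],
    ["AG", "CC", "CT"],
]

# Query: genotype "AG" at SNP 0 and "CC" at SNP 1.
query = {0: "AG", 1: "CC"}
print(count_query(database, query))
```

In the setting the summary describes, the same count would be computed by the cloud server over encrypted records, so that neither the records nor the query are exposed in the clear.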
ISSN:2331-8422