MEFold: Memory-Efficient Optimization for Protein Language Models via Chunk and Quantization
Format: Conference Proceeding
Language: English
Summary: Protein language models are currently experiencing a surge in demand owing to their remarkable accuracy in protein structure prediction. Nevertheless, their applications are hindered by significant computation and memory requirements. Existing optimization strategies primarily focus on computational efficiency while often neglecting memory optimization, thereby restricting their suitability for devices with limited resources. In this paper, we propose MEFold, a novel memory-efficient optimization framework for protein language models that enables efficient inference on resource-constrained devices. MEFold consists of Look-up Table Chunk and Fine-grained Quantization. Look-up Table Chunk reduces the memory of intermediate activations by chunking, and avoids the overhead of searching for the optimal chunk-size configuration through pre-computation. To reduce the memory of model parameters, Fine-grained Quantization carefully controls the scope of quantization so that memory reduction is achieved without degrading accuracy or computational speed. Experimental results show that, compared to the original model, for protein sequences ranging in length from 74 to 1024, our method significantly reduces peak memory during inference from 14.7-54.2 GB to 6.0-14.4 GB, while minimizing the impact on inference latency. On the CASP14 and CAMEO datasets, the accuracy loss compared to the original model is below 1%. Moreover, our optimization provides various memory-saving alternatives. Our code is available at https://github.com/llwx593/MEFold.
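The abstract names two techniques: chunking to cap intermediate-activation memory, and fine-grained weight quantization. A minimal NumPy sketch of the general ideas follows; this is an illustration under my own assumptions (simple row-wise chunked matmul and symmetric per-channel int8 quantization), not the authors' MEFold implementation, and all function names are hypothetical.

```python
import numpy as np

def chunked_matmul(x, w, chunk_size):
    """Compute x @ w one row-chunk at a time, so the live intermediate
    activation covers at most chunk_size rows instead of all of x."""
    outs = []
    for start in range(0, x.shape[0], chunk_size):
        outs.append(x[start:start + chunk_size] @ w)
    return np.concatenate(outs, axis=0)

def quantize_per_channel(w):
    """Symmetric int8 quantization with one scale per output channel
    (column of w), a common fine-grained scheme."""
    scale = np.abs(w).max(axis=0) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # guard all-zero channels
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float weight matrix from int8 + scales."""
    return q.astype(np.float32) * scale
```

Chunking trades a single large intermediate buffer for several small ones at no cost in numerical result; per-channel scales keep the quantization error of each output channel bounded by half a quantization step.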
ISSN: 2161-4407
DOI: 10.1109/IJCNN60899.2024.10651470