Abstract 5364: Language modeling of peptide-HLA interactions achieves state-of-the-art performance on prediction of peptide presentation by HLA Class II

Bibliographic Details
Published in: Cancer research (Chicago, Ill.), 2023-04, Vol.83 (7_Supplement), p.5364-5364
Main Authors: Sprague, Daniel; Klein, Joshua; Faria do Valle, Italo; Petrillo, Olivia; Rotunno, Melissa; Davis, Matthew; Lane, Monica; Jooss, Karin; Dhanik, Ankur
Format: Article
Language:English
Description
Summary: Precise and sensitive prediction of neoantigen presentation to the immune system via human leukocyte antigen (HLA) class II molecules remains a challenge despite the early success of neural networks applied to HLA class I. However, it is necessary to address this modeling challenge because presentation of a neoantigen epitope by both classes of HLA molecules may be valuable to induce a sustained immune response with therapeutic cancer vaccines. Previously we have developed a machine-learning based platform, EDGE™, that provides a state-of-the-art model to predict presentation of peptides by HLA Class I. Here we propose a new addition to our EDGE™ platform: a model that leverages structural information of putative epitopes and HLA class II alleles from their in-situ context to predict presentation of peptides by HLA class II. Our model achieves this by leveraging the Evolutionary Scale Model pre-trained protein language model (LM), which has been demonstrated to embed protein sequences with rich structural information. The input to the model is a linear peptide consisting of an epitope and its flanking amino acids, concatenated with structurally relevant amino acids from each HLA allele. This allows our model to treat the modeling problem entirely as a natural language processing task, which minimizes imputation of covariates found in prior approaches when performing inference in the context of vaccine design, while maximizing the richness of the LM embeddings on longer linear peptides. Crucially, this also allows our model to generalize to any allele that has a known sequence. Additionally, DR-, DP-, and DQ-specific immunoaffinity purified mass spectrometry multi-allelic (MA) presentation data were generated per tumor or cell line sample, spanning 89 alleles in aggregate.
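The input construction described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_model_input`, the ordering of the parts, the absence of separator tokens, and the example strings are all assumptions made for illustration, and the abstract does not say which HLA residues count as structurally relevant. Tokenization and embedding by the protein language model are not shown.

```python
from typing import Sequence

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_model_input(
    n_flank: str,
    epitope: str,
    c_flank: str,
    allele_residues: Sequence[str],
) -> str:
    """Build one linear sequence: flank + epitope + flank, then allele residues.

    Hypothetical helper. `allele_residues` holds one string of pre-selected
    residues per HLA class II allele; how those residues are chosen is not
    specified in the abstract and is left to the caller.
    """
    parts = [n_flank + epitope + c_flank, *allele_residues]
    for part in parts:
        unknown = set(part) - AMINO_ACIDS
        if unknown:
            raise ValueError(f"non-standard residues: {sorted(unknown)}")
    # Plain concatenation is an assumption; a real model may insert separators.
    return "".join(parts)


# Placeholder strings only; these are not real HLA residues or flanks.
example = build_model_input("GS", "PKYVKQNTLKLAT", "RE", ["ACDEF", "GHIKL"])
print(example)  # GSPKYVKQNTLKLATREACDEFGHIKL
```

Because the allele enters the input only as a residue string, a sketch like this accepts any allele with a known sequence, which mirrors the generalization property stated in the summary.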
We demonstrate that incrementally decreasing HLA class II allele MA resolution during training results in substantially improved predictions for situations where MA presentation data has completely ambiguous epitope presentation across DR/DP/DQ alleles. Our model achieves an Average Precision (AP) of 0.92 and ROC-AUC of 0.98 on the same benchmark validation data as the current state-of-the-art model BERTMHC, which achieved an AP of 0.81 and ROC-AUC of 0.95. These are the best AP and ROC-AUC for an HLA Class II presentation model on this benchmark dataset to the best of our knowledge. Our model is a significant advancement in HLA class II epitope prediction that allows our EDGE™ platfor
ISSN:1538-7445
DOI:10.1158/1538-7445.AM2023-5364