
HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddi...


Bibliographic Details
Published in:	arXiv.org 2024-12
Main Authors:	Bhattarai, Manish; Barron, Ryan; Eren, Maksim; Vu, Minh; Grantcharov, Vesselin; Boureima, Ismael; Stanev, Valentin; Matuszek, Cynthia; Valtchinov, Vladimir; Rasmussen, Kim; Alexandrov, Boian
Format:	Article
Language:	English
Subjects:	Alignment; Clustering; Cybersecurity; Documents; Embedding; Hierarchies; Large language models; Machine learning; Retrieval

container_title	arXiv.org
creator	Bhattarai, Manish; Barron, Ryan; Eren, Maksim; Vu, Minh; Grantcharov, Vesselin; Boureima, Ismael; Stanev, Valentin; Matuszek, Cynthia; Valtchinov, Vladimir; Rasmussen, Kim; Alexandrov, Boian
description	Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
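
The abstract describes HEAL as computing level/depth-wise contrastive losses combined with hierarchical penalties. The record gives no formulas, so the following is only a rough sketch of that general idea, not the authors' implementation. It assumes the hierarchy labels are already available (the paper derives them via hierarchical fuzzy clustering with matrix factorization, which is not shown here), uses a standard supervised contrastive loss at each level, and treats the "hierarchical penalty" as a simple per-level weight. The function names, the `temperature` parameter, and the weighting scheme are all hypothetical.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length (assumes a non-zero input)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def level_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss for one hierarchy level.

    Samples sharing a label at this level are positives for each other.
    Anchors with no positive are skipped. This is a generic formulation,
    assumed here for illustration.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    anchor_losses = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in others
        }
        # Numerically stable log-sum-exp over all other samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        mean_pos_log_prob = sum(sims[j] - log_denom for j in positives) / len(positives)
        anchor_losses.append(-mean_pos_log_prob)
    return sum(anchor_losses) / len(anchor_losses) if anchor_losses else 0.0


def hierarchical_loss(embeddings, labels_per_level, level_weights, temperature=0.1):
    """Weighted sum of per-level losses; the weights stand in for the
    hierarchical penalties mentioned in the abstract (an assumption)."""
    return sum(
        weight * level_contrastive_loss(embeddings, labels, temperature)
        for labels, weight in zip(labels_per_level, level_weights)
    )


# Toy data: two tight pairs of 2-D embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
# Level 0 groups the pairs; level 1 gives every sample its own leaf label.
matching_labels = [[0, 0, 1, 1], [0, 1, 2, 3]]
# Same leaves, but level 0 now cuts across the geometric pairs.
crossing_labels = [[0, 1, 0, 1], [0, 1, 2, 3]]
weights = [1.0, 0.5]

loss_matching = hierarchical_loss(toy_embeddings, matching_labels, weights)
loss_crossing = hierarchical_loss(toy_embeddings, crossing_labels, weights)
print(loss_matching, loss_crossing)
```

On this toy data the loss is lower when the level-0 labels agree with the geometry of the embeddings than when they cut across it, which is the behaviour a hierarchy-aware alignment objective is meant to reward. In a training setting the same quantity would be computed on differentiable tensors and minimized with respect to the embedding model.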
format	article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3142373699</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3142373699</sourcerecordid><originalsourceid>FETCH-proquest_journals_31423736993</originalsourceid><addsrcrecordid>eNqNjM0KgkAURocgSMp3GGgt2Iw_2U7CMHAVrZNJrzaiM3ZHff4m6AFafRy-w1kRh3F-8I4BYxviGtP5vs-imIUhd8gjz9LiRHMJKLB6yUr0NBueUNdStTTtZasGUBMttDG00Uivw4h6gZreYEIJi_WF-tKIYKwpJqkVLUCgsoUdWTeiN-D-dkv2l-x-zj0bec9gprLTMyp7lfwQMB7zKEn4f9YH7eVDxA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3142373699</pqid></control><display><type>article</type><title>HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning</title><source>Publicly Available Content Database</source><creator>Bhattarai, Manish ; Barron, Ryan ; Eren, Maksim ; Vu, Minh ; Grantcharov, Vesselin ; Boureima, Ismael ; Stanev, Valentin ; Matuszek, Cynthia ; Valtchinov, Vladimir ; Rasmussen, Kim ; Alexandrov, Boian</creator><creatorcontrib>Bhattarai, Manish ; Barron, Ryan ; Eren, Maksim ; Vu, Minh ; Grantcharov, Vesselin ; Boureima, Ismael ; Stanev, Valentin ; Matuszek, Cynthia ; Valtchinov, Vladimir ; Rasmussen, Kim ; Alexandrov, Boian</creatorcontrib><description>Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. 
This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Alignment ; Clustering ; Cybersecurity ; Documents ; Embedding ; Hierarchies ; Large language models ; Machine learning ; Retrieval</subject><ispartof>arXiv.org, 2024-12</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/3142373699?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>776,780,25732,36991,44569</link.rule.ids></links><search><creatorcontrib>Bhattarai, Manish</creatorcontrib><creatorcontrib>Barron, Ryan</creatorcontrib><creatorcontrib>Eren, Maksim</creatorcontrib><creatorcontrib>Vu, Minh</creatorcontrib><creatorcontrib>Grantcharov, Vesselin</creatorcontrib><creatorcontrib>Boureima, Ismael</creatorcontrib><creatorcontrib>Stanev, Valentin</creatorcontrib><creatorcontrib>Matuszek, Cynthia</creatorcontrib><creatorcontrib>Valtchinov, Vladimir</creatorcontrib><creatorcontrib>Rasmussen, Kim</creatorcontrib><creatorcontrib>Alexandrov, Boian</creatorcontrib><title>HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning</title><title>arXiv.org</title><description>Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. 
HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.</description><subject>Alignment</subject><subject>Clustering</subject><subject>Cybersecurity</subject><subject>Documents</subject><subject>Embedding</subject><subject>Hierarchies</subject><subject>Large language models</subject><subject>Machine learning</subject><subject>Retrieval</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNjM0KgkAURocgSMp3GGgt2Iw_2U7CMHAVrZNJrzaiM3ZHff4m6AFafRy-w1kRh3F-8I4BYxviGtP5vs-imIUhd8gjz9LiRHMJKLB6yUr0NBueUNdStTTtZasGUBMttDG00Uivw4h6gZreYEIJi_WF-tKIYKwpJqkVLUCgsoUdWTeiN-D-dkv2l-x-zj0bec9gprLTMyp7lfwQMB7zKEn4f9YH7eVDxA</recordid><startdate>20241205</startdate><enddate>20241205</enddate><creator>Bhattarai, Manish</creator><creator>Barron, Ryan</creator><creator>Eren, Maksim</creator><creator>Vu, Minh</creator><creator>Grantcharov, Vesselin</creator><creator>Boureima, Ismael</creator><creator>Stanev, Valentin</creator><creator>Matuszek, Cynthia</creator><creator>Valtchinov, Vladimir</creator><creator>Rasmussen, Kim</creator><creator>Alexandrov, Boian</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20241205</creationdate><title>HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning</title><author>Bhattarai, Manish ; Barron, Ryan ; Eren, Maksim ; Vu, Minh ; Grantcharov, Vesselin ; Boureima, Ismael ; Stanev, Valentin ; Matuszek, Cynthia ; Valtchinov, Vladimir ; Rasmussen, Kim ; Alexandrov, Boian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_31423736993</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Alignment</topic><topic>Clustering</topic><topic>Cybersecurity</topic><topic>Documents</topic><topic>Embedding</topic><topic>Hierarchies</topic><topic>Large language models</topic><topic>Machine learning</topic><topic>Retrieval</topic><toplevel>online_resources</toplevel><creatorcontrib>Bhattarai, Manish</creatorcontrib><creatorcontrib>Barron, Ryan</creatorcontrib><creatorcontrib>Eren, Maksim</creatorcontrib><creatorcontrib>Vu, Minh</creatorcontrib><creatorcontrib>Grantcharov, Vesselin</creatorcontrib><creatorcontrib>Boureima, Ismael</creatorcontrib><creatorcontrib>Stanev, Valentin</creatorcontrib><creatorcontrib>Matuszek, Cynthia</creatorcontrib><creatorcontrib>Valtchinov, Vladimir</creatorcontrib><creatorcontrib>Rasmussen, Kim</creatorcontrib><creatorcontrib>Alexandrov, Boian</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest 
Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Databases</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bhattarai, Manish</au><au>Barron, Ryan</au><au>Eren, Maksim</au><au>Vu, Minh</au><au>Grantcharov, Vesselin</au><au>Boureima, Ismael</au><au>Stanev, Valentin</au><au>Matuszek, Cynthia</au><au>Valtchinov, Vladimir</au><au>Rasmussen, Kim</au><au>Alexandrov, Boian</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning</atitle><jtitle>arXiv.org</jtitle><date>2024-12-05</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. 
This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-12
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_3142373699
source	Publicly Available Content Database
subjects	Alignment; Clustering; Cybersecurity; Documents; Embedding; Hierarchies; Large language models; Machine learning; Retrieval
title	HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T19%3A37%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=HEAL:%20Hierarchical%20Embedding%20Alignment%20Loss%20for%20Improved%20Retrieval%20and%20Representation%20Learning&rft.jtitle=arXiv.org&rft.au=Bhattarai,%20Manish&rft.date=2024-12-05&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3142373699%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_31423736993%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3142373699&rft_id=info:pmid/&rfr_iscdi=true