
An efficient multimodal 2D + 3D feature-based approach to automatic facial expression recognition


Bibliographic Details
Published in: Computer vision and image understanding, 2015-11, Vol. 140, p. 83-92
Main Authors: Li, Huibin, Ding, Huaxiong, Huang, Di, Wang, Yunhong, Zhao, Xi, Morvan, Jean-Marie, Chen, Liming
Format: Article
Language:English
Description:
• We propose a feature-based 2D + 3D multimodal facial expression recognition method.
• It is fully automatic, benefiting from a large set of automatically detected landmarks.
• The complementarity between 2D and 3D features is comprehensively demonstrated.
• Our method achieves the best accuracy reported on the BU–3DFE database so far.
• Good generalization ability is shown on the Bosphorus database.

We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU–3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors to achieve efficiency and robustness. First, a large set of fiducial facial landmarks is localized on 2D face images and their corresponding 3D face scans using a novel algorithm, the incremental Parallel Cascade of Linear Regression (iPar–CLR). Then, a novel Histogram of Second Order Gradients (HSOG)-based local image descriptor, in conjunction with the widely used first-order gradient-based SIFT descriptor, is used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed from first-order and second-order surface differential geometry quantities, i.e., the Histogram of mesh Gradients (meshHOG) and the Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM)-based recognition results of all 2D and 3D descriptors are fused at both feature level and score level to further improve accuracy. Comprehensive experimental results demonstrate impressive complementary characteristics between the 2D and 3D descriptors. On the BU–3DFE benchmark, our multimodal feature-based approach outperforms state-of-the-art methods, achieving an average recognition accuracy of 86.32%. Moreover, good generalization ability is shown on the Bosphorus database.
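The two fusion strategies mentioned in the abstract (feature-level and score-level fusion of per-descriptor SVMs) can be sketched as follows. This is a hypothetical illustration using scikit-learn and synthetic random features, not the authors' implementation; the descriptor names and dimensions are assumptions for demonstration only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_classes = 120, 6  # six prototypic expressions, as in BU-3DFE

# Hypothetical stand-ins for the four local descriptors (SIFT, HSOG, meshHOG, meshHOS)
descriptors = {name: rng.normal(size=(n_samples, 32))
               for name in ["sift", "hsog", "meshhog", "meshhos"]}
y = rng.integers(0, n_classes, size=n_samples)

# Feature-level fusion: concatenate all descriptors and train a single SVM
X_feat = np.hstack(list(descriptors.values()))
clf_feat = SVC(kernel="rbf", probability=True).fit(X_feat, y)
pred_feat = clf_feat.predict(X_feat)

# Score-level fusion: train one SVM per descriptor, then average posterior scores
clfs = {name: SVC(kernel="rbf", probability=True).fit(X, y)
        for name, X in descriptors.items()}

def score_level_predict(samples):
    # Average the class-probability matrices of all per-descriptor SVMs
    probs = np.mean([clfs[name].predict_proba(samples[name]) for name in clfs], axis=0)
    return probs.argmax(axis=1)

pred_score = score_level_predict(descriptors)
print(pred_score.shape)  # one predicted expression label per sample
```

Because every per-descriptor SVM is trained on the same label vector, their `classes_` orderings agree, so the probability matrices can be averaged column-wise before taking the argmax.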
DOI: 10.1016/j.cviu.2015.07.005
ISSN: 1077-3142
EISSN: 1090-235X
Subjects: Computer Science
Computer Vision and Pattern Recognition
Facial expression recognition
Image Processing
Local shape descriptor
Local texture descriptor
Multimedia
Multimodal fusion
Neural and Evolutionary Computing
Signal and Image Processing