
A Comprehensive Pipeline for Complex Text-to-Image Synthesis

Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition,...

Bibliographic Details
Published in: Journal of computer science and technology 2020-05, Vol.35 (3), p.522-537
Main Authors: Fang, Fei; Luo, Fei; Zhang, Hong-Pan; Zhou, Hua-Jian; Chow, Alix L. H.; Xiao, Chun-Xia
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583
cites cdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583
container_end_page 537
container_issue 3
container_start_page 522
container_title Journal of computer science and technology
container_volume 35
creator Fang, Fei
Luo, Fei
Zhang, Hong-Pan
Zhou, Hua-Jian
Chow, Alix L. H.
Xiao, Chun-Xia
description Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.
doi_str_mv 10.1007/s11390-020-0305-9
format article
fullrecord <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_jsjkxjsxb_e202003004</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A719038398</galeid><wanfj_id>jsjkxjsxb_e202003004</wanfj_id><sourcerecordid>jsjkxjsxb_e202003004</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</originalsourceid><addsrcrecordid>eNp1kUtLAzEUhQdRsFZ_gLsBt6bevCYTcFOKj0JBwboO6fROO2ObqclU239v6ghdSciD3O8k53KS5JrCgAKou0Ap10CAxclBEn2S9GieARFK6NN4BgCi43KeXIRQA3AFQvSS-2E6atYbj0t0ofrC9LXa4KpymJaN_y2tcJdOcdeStiHjtV1g-rZ37RJDFS6Ts9KuAl797f3k_fFhOnomk5en8Wg4IQXXrCWF1qjEDCRTStJslhfcMppzACtFJuZ5xoTiykrMYt3aOQNdliBnQAtGZc77yW337rd1pXULUzdb7-KPpg71x64Ou5lBFnuPrYOI-E2Hb3zzucXQHnmmaS5BUSkjNeiohV2hqVzZtN4WccxxXRWNw7KK90NFNfCc64ML2gkK34TgsTQbX62t3xsK5hCC6UIw0Yg5hGB01LBOEyLrFuiPVv4X_QAuSYZC</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918507155</pqid></control><display><type>article</type><title>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</title><source>ABI/INFORM global</source><source>Springer Nature</source><creator>Fang, Fei ; Luo, Fei ; Zhang, Hong-Pan ; Zhou, Hua-Jian ; Chow, Alix L. H. ; Xiao, Chun-Xia</creator><creatorcontrib>Fang, Fei ; Luo, Fei ; Zhang, Hong-Pan ; Zhou, Hua-Jian ; Chow, Alix L. H. ; Xiao, Chun-Xia</creatorcontrib><description>Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. 
The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.</description><identifier>ISSN: 1000-9000</identifier><identifier>EISSN: 1860-4749</identifier><identifier>DOI: 10.1007/s11390-020-0305-9</identifier><language>eng</language><publisher>Singapore: Springer Singapore</publisher><subject>Artificial Intelligence ; Computational linguistics ; Computer Science ; Computer vision ; Cost function ; Data Structures and Information Theory ; Datasets ; Generative adversarial networks ; Image quality ; Information Systems Applications (incl.Internet) ; Language processing ; Machine vision ; Markov chains ; Markov processes ; Monte Carlo method ; Natural language interfaces ; Natural language processing ; Object recognition ; Pipe lines ; Regular Paper ; Retrieval ; Software Engineering ; Synthesis ; Theory of Computation</subject><ispartof>Journal of 
computer science and technology, 2020-05, Vol.35 (3), p.522-537</ispartof><rights>Institute of Computing Technology, Chinese Academy of Sciences 2020</rights><rights>COPYRIGHT 2020 Springer</rights><rights>Institute of Computing Technology, Chinese Academy of Sciences 2020.</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</citedby><cites>FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/jsjkxjsxb-e/jsjkxjsxb-e.jpg</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2918507155?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,11667,27901,27902,36037,44339</link.rule.ids></links><search><creatorcontrib>Fang, Fei</creatorcontrib><creatorcontrib>Luo, Fei</creatorcontrib><creatorcontrib>Zhang, Hong-Pan</creatorcontrib><creatorcontrib>Zhou, Hua-Jian</creatorcontrib><creatorcontrib>Chow, Alix L. H.</creatorcontrib><creatorcontrib>Xiao, Chun-Xia</creatorcontrib><title>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</title><title>Journal of computer science and technology</title><addtitle>J. Comput. Sci. Technol</addtitle><description>Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. 
To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. 
The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.</description><subject>Artificial Intelligence</subject><subject>Computational linguistics</subject><subject>Computer Science</subject><subject>Computer vision</subject><subject>Cost function</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Generative adversarial networks</subject><subject>Image quality</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Language processing</subject><subject>Machine vision</subject><subject>Markov chains</subject><subject>Markov processes</subject><subject>Monte Carlo method</subject><subject>Natural language interfaces</subject><subject>Natural language processing</subject><subject>Object recognition</subject><subject>Pipe lines</subject><subject>Regular Paper</subject><subject>Retrieval</subject><subject>Software Engineering</subject><subject>Synthesis</subject><subject>Theory of Computation</subject><issn>1000-9000</issn><issn>1860-4749</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp1kUtLAzEUhQdRsFZ_gLsBt6bevCYTcFOKj0JBwboO6fROO2ObqclU239v6ghdSciD3O8k53KS5JrCgAKou0Ap10CAxclBEn2S9GieARFK6NN4BgCi43KeXIRQA3AFQvSS-2E6atYbj0t0ofrC9LXa4KpymJaN_y2tcJdOcdeStiHjtV1g-rZ37RJDFS6Ts9KuAl797f3k_fFhOnomk5en8Wg4IQXXrCWF1qjEDCRTStJslhfcMppzACtFJuZ5xoTiykrMYt3aOQNdliBnQAtGZc77yW337rd1pXULUzdb7-KPpg71x64Ou5lBFnuPrYOI-E2Hb3zzucXQHnmmaS5BUSkjNeiohV2hqVzZtN4WccxxXRWNw7KK90NFNfCc64ML2gkK34TgsTQbX62t3xsK5hCC6UIw0Yg5hGB01LBOEyLrFuiPVv4X_QAuSYZC</recordid><startdate>20200501</startdate><enddate>20200501</enddate><creator>Fang, Fei</creator><creator>Luo, Fei</creator><creator>Zhang, Hong-Pan</creator><creator>Zhou, 
Hua-Jian</creator><creator>Chow, Alix L. H.</creator><creator>Xiao, Chun-Xia</creator><general>Springer Singapore</general><general>Springer</general><general>Springer Nature B.V</general><general>School of Computer Science,Wuhan University,Wuhan 430072,China%Xiaomi Technology Co. LTD,Beijing 100085,China</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PKEHL</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20200501</creationdate><title>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</title><author>Fang, Fei ; Luo, Fei ; Zhang, Hong-Pan ; Zhou, Hua-Jian ; Chow, Alix L. H. 
; Xiao, Chun-Xia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Artificial Intelligence</topic><topic>Computational linguistics</topic><topic>Computer Science</topic><topic>Computer vision</topic><topic>Cost function</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Generative adversarial networks</topic><topic>Image quality</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Language processing</topic><topic>Machine vision</topic><topic>Markov chains</topic><topic>Markov processes</topic><topic>Monte Carlo method</topic><topic>Natural language interfaces</topic><topic>Natural language processing</topic><topic>Object recognition</topic><topic>Pipe lines</topic><topic>Regular Paper</topic><topic>Retrieval</topic><topic>Software Engineering</topic><topic>Synthesis</topic><topic>Theory of Computation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fang, Fei</creatorcontrib><creatorcontrib>Luo, Fei</creatorcontrib><creatorcontrib>Zhang, Hong-Pan</creatorcontrib><creatorcontrib>Zhou, Hua-Jian</creatorcontrib><creatorcontrib>Chow, Alix L. 
H.</creatorcontrib><creatorcontrib>Xiao, Chun-Xia</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM global</collection><collection>Computing Database</collection><collection>ProQuest Engineering Database</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied &amp; Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering collection</collection><collection>ProQuest Central Basic</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Journal of computer science and technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fang, Fei</au><au>Luo, Fei</au><au>Zhang, Hong-Pan</au><au>Zhou, Hua-Jian</au><au>Chow, Alix L. H.</au><au>Xiao, Chun-Xia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</atitle><jtitle>Journal of computer science and technology</jtitle><stitle>J. Comput. Sci. 
Technol</stitle><date>2020-05-01</date><risdate>2020</risdate><volume>35</volume><issue>3</issue><spage>522</spage><epage>537</epage><pages>522-537</pages><issn>1000-9000</issn><eissn>1860-4749</eissn><abstract>Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. 
The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.</abstract><cop>Singapore</cop><pub>Springer Singapore</pub><doi>10.1007/s11390-020-0305-9</doi><tpages>16</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1000-9000
ispartof Journal of computer science and technology, 2020-05, Vol.35 (3), p.522-537
issn 1000-9000
1860-4749
language eng
recordid cdi_wanfang_journals_jsjkxjsxb_e202003004
source ABI/INFORM global; Springer Nature
subjects Artificial Intelligence
Computational linguistics
Computer Science
Computer vision
Cost function
Data Structures and Information Theory
Datasets
Generative adversarial networks
Image quality
Information Systems Applications (incl.Internet)
Language processing
Machine vision
Markov chains
Markov processes
Monte Carlo method
Natural language interfaces
Natural language processing
Object recognition
Pipe lines
Regular Paper
Retrieval
Software Engineering
Synthesis
Theory of Computation
title A Comprehensive Pipeline for Complex Text-to-Image Synthesis
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-23T23%3A24%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comprehensive%20Pipeline%20for%20Complex%20Text-to-Image%20Synthesis&rft.jtitle=Journal%20of%20computer%20science%20and%20technology&rft.au=Fang,%20Fei&rft.date=2020-05-01&rft.volume=35&rft.issue=3&rft.spage=522&rft.epage=537&rft.pages=522-537&rft.issn=1000-9000&rft.eissn=1860-4749&rft_id=info:doi/10.1007/s11390-020-0305-9&rft_dat=%3Cwanfang_jour_proqu%3Ejsjkxjsxb_e202003004%3C/wanfang_jour_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2918507155&rft_id=info:pmid/&rft_galeid=A719038398&rft_wanfj_id=jsjkxjsxb_e202003004&rfr_iscdi=true