A Comprehensive Pipeline for Complex Text-to-Image Synthesis
Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition,...
Published in: | Journal of computer science and technology 2020-05, Vol.35 (3), p.522-537 |
---|---|
Main Authors: | Fang, Fei; Luo, Fei; Zhang, Hong-Pan; Zhou, Hua-Jian; Chow, Alix L. H.; Xiao, Chun-Xia |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583 |
---|---|
cites | cdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583 |
container_end_page | 537 |
container_issue | 3 |
container_start_page | 522 |
container_title | Journal of computer science and technology |
container_volume | 35 |
creator | Fang, Fei; Luo, Fei; Zhang, Hong-Pan; Zhou, Hua-Jian; Chow, Alix L. H.; Xiao, Chun-Xia |
description | Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images. |
doi_str_mv | 10.1007/s11390-020-0305-9 |
format | article |
fullrecord | <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_jsjkxjsxb_e202003004</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A719038398</galeid><wanfj_id>jsjkxjsxb_e202003004</wanfj_id><sourcerecordid>jsjkxjsxb_e202003004</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</originalsourceid><addsrcrecordid>eNp1kUtLAzEUhQdRsFZ_gLsBt6bevCYTcFOKj0JBwboO6fROO2ObqclU239v6ghdSciD3O8k53KS5JrCgAKou0Ap10CAxclBEn2S9GieARFK6NN4BgCi43KeXIRQA3AFQvSS-2E6atYbj0t0ofrC9LXa4KpymJaN_y2tcJdOcdeStiHjtV1g-rZ37RJDFS6Ts9KuAl797f3k_fFhOnomk5en8Wg4IQXXrCWF1qjEDCRTStJslhfcMppzACtFJuZ5xoTiykrMYt3aOQNdliBnQAtGZc77yW337rd1pXULUzdb7-KPpg71x64Ou5lBFnuPrYOI-E2Hb3zzucXQHnmmaS5BUSkjNeiohV2hqVzZtN4WccxxXRWNw7KK90NFNfCc64ML2gkK34TgsTQbX62t3xsK5hCC6UIw0Yg5hGB01LBOEyLrFuiPVv4X_QAuSYZC</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918507155</pqid></control><display><type>article</type><title>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</title><source>ABI/INFORM global</source><source>Springer Nature</source><creator>Fang, Fei ; Luo, Fei ; Zhang, Hong-Pan ; Zhou, Hua-Jian ; Chow, Alix L. H. ; Xiao, Chun-Xia</creator><creatorcontrib>Fang, Fei ; Luo, Fei ; Zhang, Hong-Pan ; Zhou, Hua-Jian ; Chow, Alix L. H. ; Xiao, Chun-Xia</creatorcontrib><description>Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. 
The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.</description><identifier>ISSN: 1000-9000</identifier><identifier>EISSN: 1860-4749</identifier><identifier>DOI: 10.1007/s11390-020-0305-9</identifier><language>eng</language><publisher>Singapore: Springer Singapore</publisher><subject>Artificial Intelligence ; Computational linguistics ; Computer Science ; Computer vision ; Cost function ; Data Structures and Information Theory ; Datasets ; Generative adversarial networks ; Image quality ; Information Systems Applications (incl.Internet) ; Language processing ; Machine vision ; Markov chains ; Markov processes ; Monte Carlo method ; Natural language interfaces ; Natural language processing ; Object recognition ; Pipe lines ; Regular Paper ; Retrieval ; Software Engineering ; Synthesis ; Theory of Computation</subject><ispartof>Journal of 
computer science and technology, 2020-05, Vol.35 (3), p.522-537</ispartof><rights>Institute of Computing Technology, Chinese Academy of Sciences 2020</rights><rights>COPYRIGHT 2020 Springer</rights><rights>Institute of Computing Technology, Chinese Academy of Sciences 2020.</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</citedby><cites>FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/jsjkxjsxb-e/jsjkxjsxb-e.jpg</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2918507155?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,11667,27901,27902,36037,44339</link.rule.ids></links><search><creatorcontrib>Fang, Fei</creatorcontrib><creatorcontrib>Luo, Fei</creatorcontrib><creatorcontrib>Zhang, Hong-Pan</creatorcontrib><creatorcontrib>Zhou, Hua-Jian</creatorcontrib><creatorcontrib>Chow, Alix L. H.</creatorcontrib><creatorcontrib>Xiao, Chun-Xia</creatorcontrib><title>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</title><title>Journal of computer science and technology</title><addtitle>J. Comput. Sci. Technol</addtitle><description>Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. 
To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. 
The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.</description><subject>Artificial Intelligence</subject><subject>Computational linguistics</subject><subject>Computer Science</subject><subject>Computer vision</subject><subject>Cost function</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Generative adversarial networks</subject><subject>Image quality</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Language processing</subject><subject>Machine vision</subject><subject>Markov chains</subject><subject>Markov processes</subject><subject>Monte Carlo method</subject><subject>Natural language interfaces</subject><subject>Natural language processing</subject><subject>Object recognition</subject><subject>Pipe lines</subject><subject>Regular Paper</subject><subject>Retrieval</subject><subject>Software Engineering</subject><subject>Synthesis</subject><subject>Theory of Computation</subject><issn>1000-9000</issn><issn>1860-4749</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp1kUtLAzEUhQdRsFZ_gLsBt6bevCYTcFOKj0JBwboO6fROO2ObqclU239v6ghdSciD3O8k53KS5JrCgAKou0Ap10CAxclBEn2S9GieARFK6NN4BgCi43KeXIRQA3AFQvSS-2E6atYbj0t0ofrC9LXa4KpymJaN_y2tcJdOcdeStiHjtV1g-rZ37RJDFS6Ts9KuAl797f3k_fFhOnomk5en8Wg4IQXXrCWF1qjEDCRTStJslhfcMppzACtFJuZ5xoTiykrMYt3aOQNdliBnQAtGZc77yW337rd1pXULUzdb7-KPpg71x64Ou5lBFnuPrYOI-E2Hb3zzucXQHnmmaS5BUSkjNeiohV2hqVzZtN4WccxxXRWNw7KK90NFNfCc64ML2gkK34TgsTQbX62t3xsK5hCC6UIw0Yg5hGB01LBOEyLrFuiPVv4X_QAuSYZC</recordid><startdate>20200501</startdate><enddate>20200501</enddate><creator>Fang, Fei</creator><creator>Luo, Fei</creator><creator>Zhang, Hong-Pan</creator><creator>Zhou, 
Hua-Jian</creator><creator>Chow, Alix L. H.</creator><creator>Xiao, Chun-Xia</creator><general>Springer Singapore</general><general>Springer</general><general>Springer Nature B.V</general><general>School of Computer Science,Wuhan University,Wuhan 430072,China%Xiaomi Technology Co. LTD,Beijing 100085,China</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PKEHL</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20200501</creationdate><title>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</title><author>Fang, Fei ; Luo, Fei ; Zhang, Hong-Pan ; Zhou, Hua-Jian ; Chow, Alix L. H. 
; Xiao, Chun-Xia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Artificial Intelligence</topic><topic>Computational linguistics</topic><topic>Computer Science</topic><topic>Computer vision</topic><topic>Cost function</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Generative adversarial networks</topic><topic>Image quality</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Language processing</topic><topic>Machine vision</topic><topic>Markov chains</topic><topic>Markov processes</topic><topic>Monte Carlo method</topic><topic>Natural language interfaces</topic><topic>Natural language processing</topic><topic>Object recognition</topic><topic>Pipe lines</topic><topic>Regular Paper</topic><topic>Retrieval</topic><topic>Software Engineering</topic><topic>Synthesis</topic><topic>Theory of Computation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fang, Fei</creatorcontrib><creatorcontrib>Luo, Fei</creatorcontrib><creatorcontrib>Zhang, Hong-Pan</creatorcontrib><creatorcontrib>Zhou, Hua-Jian</creatorcontrib><creatorcontrib>Chow, Alix L. 
H.</creatorcontrib><creatorcontrib>Xiao, Chun-Xia</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Database (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM global</collection><collection>Computing Database</collection><collection>ProQuest Engineering Database</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied & Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering collection</collection><collection>ProQuest Central Basic</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Journal of computer science and technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fang, Fei</au><au>Luo, Fei</au><au>Zhang, Hong-Pan</au><au>Zhou, Hua-Jian</au><au>Chow, Alix L. H.</au><au>Xiao, Chun-Xia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Comprehensive Pipeline for Complex Text-to-Image Synthesis</atitle><jtitle>Journal of computer science and technology</jtitle><stitle>J. Comput. Sci. 
Technol</stitle><date>2020-05-01</date><risdate>2020</risdate><volume>35</volume><issue>3</issue><spage>522</spage><epage>537</epage><pages>522-537</pages><issn>1000-9000</issn><eissn>1860-4749</eissn><abstract>Synthesizing a complex scene image with multiple objects and background according to text description is a challenging problem. It needs to solve several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground objects and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of foreground objects’ positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step. 
The synthesized results and comparison results based on Microsoft COCO dataset prove that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in visual quality of generated scene images.</abstract><cop>Singapore</cop><pub>Springer Singapore</pub><doi>10.1007/s11390-020-0305-9</doi><tpages>16</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1000-9000 |
ispartof | Journal of computer science and technology, 2020-05, Vol.35 (3), p.522-537 |
issn | 1000-9000; 1860-4749 |
language | eng |
recordid | cdi_wanfang_journals_jsjkxjsxb_e202003004 |
source | ABI/INFORM global; Springer Nature |
subjects | Artificial Intelligence; Computational linguistics; Computer Science; Computer vision; Cost function; Data Structures and Information Theory; Datasets; Generative adversarial networks; Image quality; Information Systems Applications (incl. Internet); Language processing; Machine vision; Markov chains; Markov processes; Monte Carlo method; Natural language interfaces; Natural language processing; Object recognition; Pipe lines; Regular Paper; Retrieval; Software Engineering; Synthesis; Theory of Computation |
title | A Comprehensive Pipeline for Complex Text-to-Image Synthesis |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-23T23%3A24%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comprehensive%20Pipeline%20for%20Complex%20Text-to-Image%20Synthesis&rft.jtitle=Journal%20of%20computer%20science%20and%20technology&rft.au=Fang,%20Fei&rft.date=2020-05-01&rft.volume=35&rft.issue=3&rft.spage=522&rft.epage=537&rft.pages=522-537&rft.issn=1000-9000&rft.eissn=1860-4749&rft_id=info:doi/10.1007/s11390-020-0305-9&rft_dat=%3Cwanfang_jour_proqu%3Ejsjkxjsxb_e202003004%3C/wanfang_jour_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c392t-c99e74b05277516b8c3a218300a5464d8624737a5e6751aad209ff05b01c21583%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2918507155&rft_id=info:pmid/&rft_galeid=A719038398&rft_wanfj_id=jsjkxjsxb_e202003004&rfr_iscdi=true |
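
The description above mentions a cost function minimized with a Markov Chain Monte Carlo (MCMC) optimizer to place foreground objects. This record does not give the paper's actual cost terms or proposal moves, so the following is only a minimal sketch of the general idea, assuming a hypothetical cost made of pairwise overlap area plus area falling outside the canvas, and a Metropolis acceptance rule with symmetric Gaussian moves. The names `Box`, `cost`, `propose`, and `mcmc_layout`, along with the `temperature` and `step` parameters, are illustrative and not taken from the article.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one foreground object (hypothetical type)."""
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width in pixels
    h: float  # height in pixels


def overlap_area(a, b):
    """Area of the intersection of two boxes (0 if they do not intersect)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def out_of_bounds_area(box, width, height):
    """Area of the box that lies outside a width x height canvas."""
    ix = min(box.x + box.w, width) - max(box.x, 0.0)
    iy = min(box.y + box.h, height) - max(box.y, 0.0)
    inside = max(ix, 0.0) * max(iy, 0.0)
    return box.w * box.h - inside


def cost(layout, width, height):
    """Assumed cost: out-of-canvas area plus pairwise overlap area."""
    total = sum(out_of_bounds_area(b, width, height) for b in layout)
    for i in range(len(layout)):
        for j in range(i + 1, len(layout)):
            total += overlap_area(layout[i], layout[j])
    return total


def propose(layout, rng, step):
    """Symmetric proposal: shift one randomly chosen box by Gaussian noise."""
    i = rng.randrange(len(layout))
    b = layout[i]
    moved = replace(b, x=b.x + rng.gauss(0.0, step), y=b.y + rng.gauss(0.0, step))
    return layout[:i] + [moved] + layout[i + 1:]


def mcmc_layout(layout, width, height, iters=5000, temperature=50.0,
                step=20.0, seed=0):
    """Metropolis sampling of exp(-cost / temperature); returns the best state seen."""
    rng = random.Random(seed)
    current = list(layout)
    current_cost = cost(current, width, height)
    best, best_cost = current, current_cost
    for _ in range(iters):
        candidate = propose(current, rng, step)
        candidate_cost = cost(candidate, width, height)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse states with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Usage: two overlapping objects on a 640 x 480 background.
start = [Box(100, 100, 80, 80), Box(120, 110, 80, 80)]
final_layout, final_cost = mcmc_layout(start, 640, 480)
print(cost(start, 640, 480), final_cost)
```

Because the sampler tracks the best state it has visited, the returned cost can never exceed the starting cost. A fuller version in the spirit of the abstract would presumably also propose size changes and add terms for the spatial relations parsed from the text, but those details are not available in this record.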