Smoothing Convolutional Factorizes Inception V3 Labels and Transformers for Image Feature Extraction into Text Segmentation
In computer vision, object detection alone cannot give video understanding a contextual picture in the form of a semantic description of the video or image. For this reason, a mechanism for object detection and feature extraction is needed, together with a technique for converting video and images into text, using the Inception-V3 and Transformer methods. Inception-V3 is a deep convolutional architecture developed from GoogLeNet (Inception-V1). It improves performance by adding factorization at the convolution stage, reducing connections and parameters without shrinking the network, and is used here to extract image features from inputs of 299 x 299 x 3 pixels. The Transformer architecture uses a multi-head self-attention mechanism to predict words, recovering them sequentially with an RNN encoder-decoder architecture. The research used 5-minute videos, which produced a TensorFlow dataset of 1000 images and 5000 sentence captions. The model was evaluated with BLEU (Bilingual Evaluation Understudy); comparing predicted captions against real captions gave average BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.418, 0.367, 0.245, and 0.165.
Main Authors: Triana Indah, Komang Ayu; Darma Putra, I Ketut Gede; Sudarma, Made; Hartati, Rukmi Sari
Format: Conference Proceeding
Language: English
Subjects: BLEU; Computer architecture; Encoder-Decoder RNN; Feature extraction; Inception-V3; Object detection; Smoothing methods; System performance; Testing; Transformer; Transformers
Online Access: Request full text
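The pipeline the abstract describes starts by running each frame through Inception-V3 with the classification head removed. Below is a minimal TensorFlow sketch of that step, assuming Keras's stock ImageNet weights and JPEG frames; the `extract_features` helper is illustrative, not taken from the paper:

```python
import tensorflow as tf

# Inception-V3 pretrained on ImageNet, classification head removed so the
# network serves purely as an image feature extractor.
extractor = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")

def extract_features(image_path):
    # Decode and resize to the 299 x 299 x 3 input size the paper specifies.
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    # The head-less network emits an 8 x 8 x 2048 feature map; flatten the
    # spatial grid into 64 vectors of length 2048 for the decoder to attend over.
    features = extractor(tf.expand_dims(img, 0))
    return tf.reshape(features, (64, 2048))
```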
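The abstract pairs a multi-head self-attention decoder with an RNN encoder-decoder, but the record does not give the exact layering, so the block below sketches only the attention side. The layer sizes (d_model=512, 8 heads) and the input shape (batch, seq, 512) are assumptions:

```python
import tensorflow as tf

# One caption-decoder block in the style the abstract describes: masked
# multi-head self-attention over the words generated so far, followed by
# cross-attention over the 64 Inception-V3 feature vectors.
class CaptionDecoderBlock(tf.keras.layers.Layer):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.self_attn = tf.keras.layers.MultiHeadAttention(num_heads, d_model)
        self.cross_attn = tf.keras.layers.MultiHeadAttention(num_heads, d_model)
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()

    def call(self, tokens, image_features):
        # Causal mask: each position may only attend to earlier words, so the
        # model predicts the next word from the words recovered so far.
        x = self.norm1(tokens + self.self_attn(tokens, tokens, use_causal_mask=True))
        # Ground each word in the image by attending over the feature grid.
        return self.norm2(x + self.cross_attn(x, image_features))
```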
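The BLEU-1 through BLEU-4 scores reported in the abstract weight 1- to 4-gram precision between each predicted caption and its reference captions. A short sketch of that evaluation with NLTK; the smoothing method is an assumption, since the record does not state one:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# BLEU-1..4 for one predicted caption against its reference captions.
def bleu_scores(references, prediction):
    refs = [r.split() for r in references]
    hyp = prediction.split()
    smooth = SmoothingFunction().method1  # avoids zero scores on short captions
    weights = [(1, 0, 0, 0), (0.5, 0.5, 0, 0),
               (1 / 3, 1 / 3, 1 / 3, 0), (0.25, 0.25, 0.25, 0.25)]
    return [sentence_bleu(refs, hyp, weights=w, smoothing_function=smooth)
            for w in weights]

# Hypothetical example pair, not from the paper's dataset.
print(bleu_scores(["a man rides a red motorbike"], "a man rides a motorbike"))
```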
Field | Value
---|---
cited_by |
cites |
container_end_page | 144
container_issue |
container_start_page | 139
container_title | 2023 International Conference on Smart-Green Technology in Electrical and Information Systems (ICSGTEIS)
container_volume |
creator | Triana Indah, Komang Ayu; Darma Putra, I Ketut Gede; Sudarma, Made; Hartati, Rukmi Sari
description | In computer vision, object detection alone cannot give video understanding a contextual picture in the form of a semantic description of the video or image. For this reason, a mechanism for object detection and feature extraction is needed, together with a technique for converting video and images into text, using the Inception-V3 and Transformer methods. Inception-V3 is a deep convolutional architecture developed from GoogLeNet (Inception-V1). It improves performance by adding factorization at the convolution stage, reducing connections and parameters without shrinking the network, and is used here to extract image features from inputs of 299 x 299 x 3 pixels. The Transformer architecture uses a multi-head self-attention mechanism to predict words, recovering them sequentially with an RNN encoder-decoder architecture. The research used 5-minute videos, which produced a TensorFlow dataset of 1000 images and 5000 sentence captions. The model was evaluated with BLEU (Bilingual Evaluation Understudy); comparing predicted captions against real captions gave average BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.418, 0.367, 0.245, and 0.165.
doi_str_mv | 10.1109/ICSGTEIS60500.2023.10424317
format | conference_proceeding
fulltext | fulltext_linktorsrc
identifier | EISSN: 2831-400X; EISBN: 9798350382822
ispartof | 2023 International Conference on Smart-Green Technology in Electrical and Information Systems (ICSGTEIS), 2023, p.139-144
issn | 2831-400X
language | eng
recordid | cdi_ieee_primary_10424317
source | IEEE Xplore All Conference Series
subjects | BLEU; Computer architecture; Encoder-Decoder RNN; Feature extraction; Inception-V3; Object detection; Smoothing methods; System performance; Testing; Transformer; Transformers
title | Smoothing Convolutional Factorizes Inception V3 Labels and Transformers for Image Feature Extraction into Text Segmentation