A Multimodal Learning Approach for Translating Live Lectures into MOOCs Materials
Main Authors: | Huang, Tzu-Chia; Chang, Chih-Yuan; Tsai, Hung-I; Tao, Han-Si |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Accuracy; Extractive summarization; Feature extraction; Generative MOOCs; Instructional videos; Large language models; Lips; multimodal; Production; Skeleton; Skeleton-based motion classification; Training |
cited_by | |
---|---|
cites | |
container_end_page | 688 |
container_issue | |
container_start_page | 687 |
container_title | 2024 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan) |
container_volume | |
creator | Huang, Tzu-Chia; Chang, Chih-Yuan; Tsai, Hung-I; Tao, Han-Si |
description | This paper introduces an AI-based solution for the automatic generation of MOOCs, aiming to efficiently create highly realistic instructional videos while ensuring high-quality content. The generated content strives to preserve content accuracy, video fluidity, and vivacity. The paper employs a multimodal model to understand text, images, and sound simultaneously, enhancing the accuracy and realism of video generation. The process involves three stages. First, the preprocessing stage employs OpenAI's Whisper for audio-to-text conversion, supplemented by FuzzyWuzzy and Large Language Models (LLMs) to enhance content accuracy and detect thematic sections. In the second stage, speaker motion prediction begins with skeleton tags; based on these labels, the speaker's motions are classified into different categories. A multimodal model comprising BERT and a CNN then extracts features from the text and the voice diagrams, respectively, learns the speaker's motion categories from the skeleton labels, and so can predict the classes of the speaker's motions. The final stage generates the MOOC audiovisuals, converting text into subtitles using LLMs and applying the predicted speaker motions; a well-known lip-synchronization tool is then used to ensure accurate voice and lip synchronization. Based on these approaches, the proposed mechanism guarantees seamless alignment and consistency of the video elements, ensuring that the generated MOOCs are realistic and up to date. (Illustrative, hedged code sketches of the preprocessing and motion-classification stages appear after the record fields below.) |
doi_str_mv | 10.1109/ICCE-Taiwan62264.2024.10674579 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2575-8284 |
ispartof | 2024 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), 2024, p.687-688 |
issn | 2575-8284 |
language | eng |
recordid | cdi_ieee_primary_10674579 |
source | IEEE Xplore All Conference Series |
subjects | Accuracy; Extractive summarization; Feature extraction; Generative MOOCs; Instructional videos; Large language models; Lips; multimodal; Production; Skeleton; Skeleton-based motion classification; Training |
title | A Multimodal Learning Approach for Translating Live Lectures into MOOCs Materials |
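As a rough illustration of the pipeline summarized in the description field, the following is a minimal sketch of the preprocessing stage: OpenAI's Whisper converts lecture audio to text, and FuzzyWuzzy repairs misrecognized technical terms against a course glossary. The glossary contents, the audio file name, and the score threshold are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the preprocessing stage: Whisper ASR plus fuzzy-matching
# correction. GLOSSARY, the threshold, and "lecture.wav" are assumptions.
import whisper                        # pip install openai-whisper
from fuzzywuzzy import process, fuzz  # pip install fuzzywuzzy[speedup]

# Hypothetical domain glossary used to repair misrecognized technical terms.
GLOSSARY = ["backpropagation", "convolution", "transformer", "overfitting"]

def transcribe(audio_path: str) -> str:
    """Convert lecture audio to raw text with a small Whisper model."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

def correct_terms(text: str, threshold: int = 85) -> str:
    """Replace tokens that closely match a glossary entry (fuzzy matching)."""
    corrected = []
    for word in text.split():
        match = process.extractOne(word, GLOSSARY, scorer=fuzz.ratio)
        corrected.append(match[0] if match and match[1] >= threshold else word)
    return " ".join(corrected)

if __name__ == "__main__":
    print(correct_terms(transcribe("lecture.wav")))
```

For the second stage, a similarly hedged sketch of the multimodal motion classifier: BERT encodes a transcript segment, a small CNN encodes an audio spectrogram, and the concatenated features are mapped to motion categories derived from the skeleton labels. Layer sizes, the number of classes, and the fusion scheme are assumptions for illustration; the paper does not disclose the exact architecture.

```python
# Hedged sketch of a BERT + CNN fusion classifier over skeleton-derived
# motion labels. All dimensions and the class count are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class MotionClassifier(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Small CNN over a (1, 64, 64) spectrogram patch; shapes are assumed.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(768 + 32, num_classes)  # fuse text + audio features

    def forward(self, input_ids, attention_mask, spectrogram):
        text_feat = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        audio_feat = self.cnn(spectrogram)
        return self.head(torch.cat([text_feat, audio_feat], dim=-1))

# Illustrative usage with dummy inputs.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["the speaker points at the slide"], return_tensors="pt",
                  padding=True, truncation=True)
logits = MotionClassifier()(batch["input_ids"], batch["attention_mask"],
                            torch.randn(1, 1, 64, 64))
print(logits.shape)  # expected: torch.Size([1, 5])
```

Training such a classifier would typically use a cross-entropy loss against the skeleton-derived motion labels; fusion by concatenation is only one simple option among several the authors could have used.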