
3D Scene Grammar for Parsing RGB-D Pointclouds

We pose 3D scene understanding as a problem of parsing in a grammar. A grammar helps us capture the compositional structure of real-world objects, e.g., a chair is composed of a seat, a back-rest and some legs. Having multiple rules for an object helps us capture structural variations in objects, e.g., a chair can optionally also have arm-rests. Finally, having rules that capture composition at different levels lets us formulate the entire scene-processing pipeline as a single problem of finding the most likely parse tree: small segments combine to form parts of objects, parts combine into objects, and objects into a scene. We attach a generative probability model to our grammar by giving every rule a feature-dependent probability function. We evaluated the model by extracting labels for every segment and comparing the results with a state-of-the-art segment-labeling algorithm. Our algorithm was outperformed by the state-of-the-art method, but our model can be trained very efficiently (within seconds) and scales linearly with the number of rules in the grammar. Because we believe this is an important problem for the 3D vision community, we are releasing our dataset and related code.
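The compositional structure described in the abstract lends itself to a short illustration. The following is a minimal sketch, not the authors' released code: the Rule class, the symbol names, and the particular rules are assumptions chosen to mirror the chair example above.

```python
# Minimal sketch of a multi-level scene grammar (hypothetical names,
# not the authors' released code). Each rule rewrites a composite
# symbol into its constituent symbols; multiple rules with the same
# head capture structural variation.

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Rule:
    head: str                  # composite symbol, e.g. "chair"
    parts: Tuple[str, ...]     # constituents, e.g. ("seat", "back_rest", "legs")

SCENE_GRAMMAR = [
    # Objects combine into a scene.
    Rule("scene", ("chair", "table")),
    # Parts combine into objects; two rules for "chair" capture the
    # structural variation of optional arm-rests.
    Rule("chair", ("seat", "back_rest", "legs")),
    Rule("chair", ("seat", "back_rest", "legs", "arm_rests")),
    Rule("table", ("table_top", "legs")),
    # Small point-cloud segments combine into parts.
    Rule("seat", ("segment",)),
    Rule("back_rest", ("segment",)),
    Rule("legs", ("segment", "segment")),
]
```

Because composition is expressed at every level (segments to parts, parts to objects, objects to a scene), labeling a scene reduces to finding the most likely parse tree over these rules.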

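The abstract also attaches a generative probability model by giving every rule a feature-dependent probability. Below is a hedged sketch of how such a model could score a parse tree; it continues the Rule sketch above, and the logistic form of the rule probability and the feature vectors are assumptions, since the abstract does not specify them.

```python
import math

def rule_log_prob(rule, features, weights):
    """Feature-dependent rule probability: here, an assumed logistic
    function of a linear score over geometric features of the
    candidate composition (e.g. relative placement of seat and legs)."""
    score = sum(w * f for w, f in zip(weights[rule.head], features))
    return -math.log1p(math.exp(-score))  # log of sigmoid(score)

def parse_log_prob(tree, weights):
    """Score a parse tree as the sum of the log-probabilities of the
    rules applied at its nodes, where a tree is (rule, features, children).
    The most likely parse of a scene maximizes this quantity."""
    rule, features, children = tree
    return rule_log_prob(rule, features, weights) + sum(
        parse_log_prob(child, weights) for child in children
    )
```

Training such a model amounts to fitting one weight vector per rule, which is consistent with the abstract's claim that training takes seconds and scales linearly with the number of rules in the grammar.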

Bibliographic Details
Published in: arXiv.org, 2012-11
Main Authors: Anand, Abhishek; Li, Sherwin
Format: Article
Language: English
Subjects: Algorithms; Feature extraction; Labels
EISSN: 2331-8422
Source: Publicly Available Content Database
Online Access: Get full text