Loading…

Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming

One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in no...

Full description

Saved in:

Bibliographic Details
Published in:	Proceedings of the ... AAAI Conference on Artificial Intelligence 2013-06, Vol.27 (1), p.782-788
Main Authors:	Pazis, Jason, Parr, Ronald
Format:	Article
Language:	English
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

cited_by
cites
container_end_page	788
container_issue	1
container_start_page	782
container_title	Proceedings of the ... AAAI Conference on Artificial Intelligence
container_volume	27
creator	Pazis, Jason Parr, Ronald
description	One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP), have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.
doi_str_mv	10.1609/aaai.v27i1.8696
format	article
fullrecord	<record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1609_aaai_v27i1_8696</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1609_aaai_v27i1_8696</sourcerecordid><originalsourceid>FETCH-crossref_primary_10_1609_aaai_v27i1_86963</originalsourceid><addsrcrecordid>eNqVj81qwzAQhEVJoSHNudd9ATtW_KtjYhp6CMXQnnoRiy0bFUsyK7ckb1859AUyl5lhmMPH2AtPYl4kYoeIOv7dl5rHVSGKB7bep2UWpVlRrULmuYjyVIgntvX-OwnKBOe8XLOvDzTTqKB2i130fAW0HTSKekcGbavg6H5s5yF0eHc2apDQqJl0C4dpInfRBmcFZ20VEjTkhrAbbYdn9tjj6NX23zdsd3r9rN-ilpz3pHo5UfjSVfJELhRyoZA3CrlQpPc__gDrG1Ll</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming</title><source>Freely Accessible Science Journals</source><creator>Pazis, Jason ; Parr, Ronald</creator><creatorcontrib>Pazis, Jason ; Parr, Ronald</creatorcontrib><description>One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP), have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.</description><identifier>ISSN: 2159-5399</identifier><identifier>EISSN: 2374-3468</identifier><identifier>DOI: 10.1609/aaai.v27i1.8696</identifier><language>eng</language><ispartof>Proceedings of the ... AAAI Conference on Artificial Intelligence, 2013-06, Vol.27 (1), p.782-788</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Pazis, Jason</creatorcontrib><creatorcontrib>Parr, Ronald</creatorcontrib><title>Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming</title><title>Proceedings of the ... AAAI Conference on Artificial Intelligence</title><description>One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP), have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.</description><issn>2159-5399</issn><issn>2374-3468</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNqVj81qwzAQhEVJoSHNudd9ATtW_KtjYhp6CMXQnnoRiy0bFUsyK7ckb1859AUyl5lhmMPH2AtPYl4kYoeIOv7dl5rHVSGKB7bep2UWpVlRrULmuYjyVIgntvX-OwnKBOe8XLOvDzTTqKB2i130fAW0HTSKekcGbavg6H5s5yF0eHc2apDQqJl0C4dpInfRBmcFZ20VEjTkhrAbbYdn9tjj6NX23zdsd3r9rN-ilpz3pHo5UfjSVfJELhRyoZA3CrlQpPc__gDrG1Ll</recordid><startdate>20130630</startdate><enddate>20130630</enddate><creator>Pazis, Jason</creator><creator>Parr, Ronald</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20130630</creationdate><title>Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming</title><author>Pazis, Jason ; Parr, Ronald</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-crossref_primary_10_1609_aaai_v27i1_86963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Pazis, Jason</creatorcontrib><creatorcontrib>Parr, Ronald</creatorcontrib><collection>CrossRef</collection><jtitle>Proceedings of the ... AAAI Conference on Artificial Intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pazis, Jason</au><au>Parr, Ronald</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming</atitle><jtitle>Proceedings of the ... AAAI Conference on Artificial Intelligence</jtitle><date>2013-06-30</date><risdate>2013</risdate><volume>27</volume><issue>1</issue><spage>782</spage><epage>788</epage><pages>782-788</pages><issn>2159-5399</issn><eissn>2374-3468</eissn><abstract>One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP), have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.</abstract><doi>10.1609/aaai.v27i1.8696</doi></addata></record>
fulltext	fulltext
identifier	ISSN: 2159-5399
ispartof	Proceedings of the ... AAAI Conference on Artificial Intelligence, 2013-06, Vol.27 (1), p.782-788
issn	2159-5399 2374-3468
language	eng
recordid	cdi_crossref_primary_10_1609_aaai_v27i1_8696
source	Freely Accessible Science Journals
title	Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T10%3A13%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sample%20Complexity%20and%20Performance%20Bounds%20for%20Non-Parametric%20Approximate%20Linear%20Programming&rft.jtitle=Proceedings%20of%20the%20...%20AAAI%20Conference%20on%20Artificial%20Intelligence&rft.au=Pazis,%20Jason&rft.date=2013-06-30&rft.volume=27&rft.issue=1&rft.spage=782&rft.epage=788&rft.pages=782-788&rft.issn=2159-5399&rft.eissn=2374-3468&rft_id=info:doi/10.1609/aaai.v27i1.8696&rft_dat=%3Ccrossref%3E10_1609_aaai_v27i1_8696%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-crossref_primary_10_1609_aaai_v27i1_86963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true