Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method.

Senda, Kei; Hattori, Suguru; Hishinuma, Toru; Kohda, Takehisa

http://hdl.handle.net/2433/192769

Files in this item:

File	Description	Size	Format
TCYB.2014.2313655.pdf		495.86 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Senda, Kei	en
dc.contributor.author	Hattori, Suguru	en
dc.contributor.author	Hishinuma, Toru	en
dc.contributor.author	Kohda, Takehisa	en
dc.contributor.alternative	泉田, 啓	ja
dc.date.accessioned	2014-12-25T07:46:20Z	-
dc.date.available	2014-12-25T07:46:20Z	-
dc.date.issued	2014-12	-
dc.identifier.issn	2168-2275	-
dc.identifier.uri	http://hdl.handle.net/2433/192769	-
dc.description.abstract	Typical methods for solving reinforcement learning problems iterate two steps: policy evaluation and policy improvement. This paper proposes algorithms for policy evaluation that improve learning efficiency. The proposed algorithms are based on the Krylov Subspace Method (KSM), which is a nonstationary iterative method. The algorithms based on KSM are tens to hundreds of times more efficient than existing algorithms based on stationary iterative methods. Algorithms based on KSM are far more efficient than has generally been expected. This paper clarifies what makes algorithms based on KSM more efficient, using numerical examples and theoretical discussions.	en
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	IEEE	en
dc.rights	© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en
dc.rights	This paper is not the publisher's version. When citing, please check and use the publisher's version.	ja
dc.rights	This is not the published version. Please cite only the published version.	en
dc.title	Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method.	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	IEEE transactions on cybernetics	en
dc.identifier.volume	44	-
dc.identifier.issue	12	-
dc.identifier.spage	2696	-
dc.identifier.epage	2705	-
dc.relation.doi	10.1109/TCYB.2014.2313655	-
dc.textversion	author	-
dc.identifier.pmid	24733037	-
dcterms.accessRights	open access	-
Appears in Collections:	Journal Articles
