Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.
The majority of value function approximation based reinforcement learning algorithms available today, focus on approximating the state (V) or state-action (Q) value function and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to real-world problems. In this paper we present a unified view of V and Q functions and arrive at a new space-efficient representation, where action selection can be done exponentially faster, without the use of a model. We then describe how to calculate this new value function efficiently via approximate linear programming and provide experimental results that demonstrate the effectiveness of the proposed approach.
Questions and AnswersYou need to be logged in to be able to post here.