Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.

Description

To explore or exploit? In this paper, we discuss the long-standing exploration-exploration dilemma in context of designing a learning controller for stunt-style driving with scarce samples. By making an efficient use of a single demonstration by an expert, our algorithm leverages our intuitive understanding of driving to extract a coarse dynamics model from the collected driving data, then formulate the policy search in a setting of gradient update with a specially designed cost function. Both theoretical and empirical results are detailed and discussed.

Questions and Answers

You need to be logged in to be able to post here.