Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.
To explore or exploit? In this paper, we discuss the long-standing exploration-exploration dilemma in context of designing a learning controller for stunt-style driving with scarce samples. By making an efficient use of a single demonstration by an expert, our algorithm leverages our intuitive understanding of driving to extract a coarse dynamics model from the collected driving data, then formulate the policy search in a setting of gradient update with a specially designed cost function. Both theoretical and empirical results are detailed and discussed.
Questions and AnswersYou need to be logged in to be able to post here.