My work focuses on developing machine learning methods and algorithms that allow robots to discover and learn complex, intelligent behavior. Recently, I have been interested in representation learning and model-based reinforcement learning.


Yevgen Chebotar*, Karol Hausman*, Marvin Zhang*, Gaurav Sukhatme, Stefan Schaal, Sergey Levine. Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning. ICML 2017. arXiv 1703.03078.

In this paper, we devise an algorithm for training time-varying linear-Gaussian controllers that combines a sample-efficient model-based method with a corrective model-free method. We show that our algorithm, which we call PILQR, retains the sample efficiency of model-based methods while suffering less from modeling errors, which we demonstrate through several simulated and real robot experiments. We also combine PILQR with guided policy search to train expressive and general deep neural network policies. A video of our experimental results is available from the project website.
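
For readers unfamiliar with the controller class mentioned above, here is a minimal sketch of what a time-varying linear-Gaussian controller looks like: at each time step t, the action is drawn from a Gaussian whose mean is linear in the state, u_t ~ N(K_t x_t + k_t, C_t). This only illustrates the controller's form, not the PILQR update itself. The function names, dimensions, and random parameter values are assumptions made for illustration.

```python
import numpy as np


def tvlg_mean(K, k, x, t):
    """Mean action of the controller at time step t: K_t x + k_t."""
    return K[t] @ x + k[t]


def sample_tvlg_action(K, k, C, x, t, rng):
    """Sample u_t ~ N(K_t x + k_t, C_t) for a single time step."""
    return rng.multivariate_normal(tvlg_mean(K, k, x, t), C[t])


# Hypothetical sizes: horizon T, state dimension dx, action dimension du.
T, dx, du = 5, 3, 2
rng = np.random.default_rng(0)

# One gain matrix, bias vector, and covariance per time step (illustrative values).
K = rng.normal(size=(T, du, dx))
k = rng.normal(size=(T, du))
C = np.stack([0.01 * np.eye(du) for _ in range(T)])

x = np.zeros(dx)
u = sample_tvlg_action(K, k, C, x, t=0, rng=rng)
```

Because the parameters are indexed by time step, such a controller describes behavior along a single trajectory rather than a general state-to-action mapping, which is why the abstract mentions guided policy search for training neural network policies.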

Marvin Zhang*, Xinyang Geng*, Jonathan Bruce*, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, Sergey Levine. Deep Reinforcement Learning for Tensegrity Robot Locomotion. ICRA 2017. arXiv 1609.09049.

In this paper, we explore the challenges associated with learning stable and efficient periodic locomotion, and we develop novel extensions to the mirror descent guided policy search algorithm to better handle this type of domain. Our method is able to learn successful locomotion for the NASA SUPERball, a tensegrity robot with a number of properties that make it a promising candidate for future planetary exploration missions. Videos and supplementary materials are available from the project website.

Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel. Learning Deep Neural Network Policies with Continuous Memory States. ICRA 2016. arXiv 1507.01273.

In this paper, we train control policies with continuous memory states, which can be understood as a type of recurrent neural network. We show that these policies can successfully and efficiently complete several simulated tasks that either require memory or can be made easier by the use of memory. We compare our method to several baselines and show that it outperforms all of them.
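
As a rough illustration of why a policy with continuous memory states behaves like a recurrent network, the sketch below augments the state with a memory vector m and the action with a write vector w, so that what the policy writes at one step is fed back to it at the next. This is not the paper's implementation: the write rule (memory is simply overwritten by w), the linear policy, the toy double-integrator dynamics, and all names and sizes are assumptions chosen to keep the example small.

```python
import numpy as np


def step_with_memory(dynamics, x, m, u_aug, du):
    """Advance the augmented system [x; m] under the augmented action [u; w]."""
    u, w = u_aug[:du], u_aug[du:]
    x_next = dynamics(x, u)
    m_next = w  # assumed write rule: the memory is overwritten by w
    return x_next, m_next


def rollout(policy, dynamics, x0, m0, du, T):
    """Run the policy for T steps, recording the memory state at every step."""
    x, m = x0, m0
    memories = [m0]
    for _ in range(T):
        u_aug = policy(x, m)
        x, m = step_with_memory(dynamics, x, m, u_aug, du)
        memories.append(m)
    return x, np.array(memories)


# Hypothetical sizes: state, action, and memory dimensions, and horizon.
dx, du, dm, T = 2, 1, 1, 4
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(du + dm, dx + dm))
b = np.zeros(du + dm)


def linear_policy(x, m):
    """Toy policy: reads [x; m] and outputs [u; w]."""
    return W @ np.concatenate([x, m]) + b


def double_integrator(x, u, dt=0.1):
    """Toy dynamics: position and velocity driven by a scalar force."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])


x_final, memories = rollout(
    linear_policy, double_integrator,
    x0=np.array([1.0, 0.0]), m0=np.zeros(dm), du=du, T=T,
)
```

The memory at step t depends on the policy's output at step t-1, which in turn depended on the earlier memory. That feedback loop is what makes the overall system recurrent, even though the policy itself is a feedforward map.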