Abstract: This paper presents a safe off-policy reinforcement learning (RL) scheme for designing optimal controllers for systems with uncertain dynamics. The utility function whose optimization ...