Mujoco1M Comparison


bmrun     user  mean     HalfCheetah  Hopper   InvertedPendulum  Swimmer  InvertedDoublePendulum  Reacher  Walker2d  commit
trpo_mpi  cron  1896.01  1289.7       1912.9   905.1             94.96    6731.63                 -4.82    2342.63   ea68f3b
ppo2      cron  2203.79  1668.58      2316.16  809.43            111.19   7102.91                 -6.71    3424.95   ea68f3b
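
The mean column appears to be the unweighted average of the seven per-environment scores in each row. The source does not state this; it is inferred from the numbers. The sketch below recomputes that average from the table values. The names `scores`, `reported_mean`, and `unweighted_mean` are illustrative only and do not come from the benchmark tooling.

```python
# Per-environment scores copied from the table above.
scores = {
    "trpo_mpi": {
        "HalfCheetah": 1289.7,
        "Hopper": 1912.9,
        "InvertedPendulum": 905.1,
        "Swimmer": 94.96,
        "InvertedDoublePendulum": 6731.63,
        "Reacher": -4.82,
        "Walker2d": 2342.63,
    },
    "ppo2": {
        "HalfCheetah": 1668.58,
        "Hopper": 2316.16,
        "InvertedPendulum": 809.43,
        "Swimmer": 111.19,
        "InvertedDoublePendulum": 7102.91,
        "Reacher": -6.71,
        "Walker2d": 3424.95,
    },
}

# Values of the "mean" column as reported in the table.
reported_mean = {"trpo_mpi": 1896.01, "ppo2": 2203.79}


def unweighted_mean(per_env):
    """Average the per-environment scores with equal weight (an assumption)."""
    return sum(per_env.values()) / len(per_env)


for name, per_env in scores.items():
    computed = round(unweighted_mean(per_env), 2)
    print(f"{name}: computed {computed}, reported {reported_mean[name]}")
```

Because the average is unweighted and the score scales differ widely between environments, InvertedDoublePendulum (scores near 7000) dominates the mean, while Reacher (scores near -5) contributes almost nothing.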

Learning Curves

X-axis: timesteps (0 to 1,000,000). Y-axis: reward (average over 6 seeds).
[Figure: one learning-curve panel per environment: HalfCheetah, Hopper, InvertedDoublePendulum, InvertedPendulum, Reacher, Swimmer, Walker2d]