cddd97b | Peter Zhokhov | 09 January 2019, 19:21:22 UTC | remove noop code | 09 January 2019, 19:21:22 UTC |
146bbf8 | Timothy Lee | 30 November 2018, 01:28:09 UTC | Removed code that prevented changes to actor loss when training with demos (#740) | 30 November 2018, 01:28:08 UTC |
f3a5aba | pzhokhov | 27 November 2018, 01:57:25 UTC | added smoke tests of ddpg (#734) | 27 November 2018, 01:57:25 UTC |
97e0391 | pzhokhov | 27 November 2018, 01:56:41 UTC | Fix ppo2 with MPI bug, other minor fixes (#735) * joshim5 changes (width and height to WarpFrame wrapper) * match network output with action distribution via a linear layer only if necessary (#167) * support color vs. grayscale option in WarpFrame wrapper (#166) * support color vs. grayscale option in WarpFrame wrapper * Support color in other wrappers * Updated per Peters suggestions * fixing test failures * ppo2 with microbatches (#168) * pass microbatch_size to the model during construction * microbatch fixes and test (#169) * microbatch fixes and test * tiny cleanup * added assertions to the test * vpg-related fix * Peterz joshim5 subclass ppo2 model (#170) * microbatch fixes and test * tiny cleanup * added assertions to the test * vpg-related fix * subclassing the model to make microbatched version of model WIP * made microbatched model a subclass of ppo2 Model * flake8 complaint * mpi-less ppo2 (resolving merge conflict) * flake8 and mpi4py imports in ppo2/model.py * more un-mpying * merge master * updates to the benchmark viewer code + autopep8 (#184) * viz docs and syntactic sugar wip * update viewer yaml to use persistent volume claims * move plot_util to baselines.common, update links * use 1Tb hard drive for results viewer * small updates to benchmark vizualizer code * autopep8 * autopep8 * any folder can be a benchmark * massage games image a little bit * fixed --preload option in app.py * remove preload from run_viewer.sh * remove pdb breakpoints * update bench-viewer.yaml * fixed bug (#185) * fixed bug it's wrong to do the else statement, because no other nodes would start. * changed the fix slightly | 27 November 2018, 01:56:41 UTC |
25ecb64 | pzhokhov | 27 November 2018, 00:30:37 UTC | fixed issue with wrong output layer variable names in ddpg (#733) | 27 November 2018, 00:30:37 UTC |
7dc6bc7 | Prabhat Nagarajan | 27 November 2018, 00:19:09 UTC | fixes typo (#732) * fixes typo * adds apostrophe | 27 November 2018, 00:19:09 UTC |
7139a66 | Christopher Hesse | 21 November 2018, 23:00:51 UTC | Merge pull request #728 from openai/christopherhesse-patch-1 Update README.md | 21 November 2018, 23:00:51 UTC |
8607dca | Christopher Hesse | 21 November 2018, 22:57:10 UTC | Update README.md | 21 November 2018, 22:57:10 UTC |
9f9835f | pzhokhov | 21 November 2018, 20:51:15 UTC | Update __init__.py | 21 November 2018, 20:51:15 UTC |
d3fed18 | sedand | 14 November 2018, 22:50:59 UTC | Fixed comment on example usage in jupyter-notebook (#396) Cause of error: Import name must be results_plotter, not log_viewer. | 14 November 2018, 22:50:59 UTC |
339d564 | Roman Ring | 14 November 2018, 20:22:42 UTC | add docs for layer_norm param in DQN baseline (#107) | 14 November 2018, 20:22:42 UTC |
a75bc37 | Buck Shlegeris | 14 November 2018, 20:20:55 UTC | fix typo in a comment (#161) | 14 November 2018, 20:20:55 UTC |
87b3a04 | Peter Zhokhov | 14 November 2018, 20:16:53 UTC | autopep8 | 14 November 2018, 20:16:53 UTC |
c5b1a1b | Brent Komer | 13 November 2018, 21:08:32 UTC | typo fix (#230) | 13 November 2018, 21:08:32 UTC |
c59a109 | JohannesAck | 13 November 2018, 21:03:48 UTC | Parameter documentation for tf_util.function (#349) * Added parameter documentation This parameter was thus far not documented and is non-intuitive when unfamiliar with tf. * Added parameter documentation | 13 November 2018, 21:03:48 UTC |
5cd6601 | James Alan Preiss | 13 November 2018, 19:09:11 UTC | case-insensitive sort for human-readable logger (#289) | 13 November 2018, 19:09:11 UTC |
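Commit 5cd6601 makes the human-readable logger sort its keys case-insensitively. By default Python places every capitalized key before every lowercase one. The sketch below assumes the logger keeps its values in a plain dict. `sort_keys` and `format_kvs` are hypothetical names, not the baselines logger API.

```python
def sort_keys(kvs):
    """Return the dict's keys in case-insensitive alphabetical order."""
    # str.lower as the sort key interleaves 'Loss' with lowercase keys
    # instead of putting all capitalized keys first.
    return sorted(kvs, key=str.lower)


def format_kvs(kvs):
    """Render key/value pairs as a simple two-column text table."""
    keys = sort_keys(kvs)
    width = max(len(k) for k in keys)
    return "\n".join("| %s | %s |" % (k.ljust(width), kvs[k]) for k in keys)


stats = {"time_elapsed": 5, "Loss": 0.1, "approxkl": 0.01}
print(sorted(stats))     # ['Loss', 'approxkl', 'time_elapsed']
print(sort_keys(stats))  # ['approxkl', 'Loss', 'time_elapsed']
print(format_kvs(stats))
```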
0a13da8 | Xiaoquan Kong | 13 November 2018, 19:08:21 UTC | Change variable name from `inpt` to `input_` (#297) | 13 November 2018, 19:08:21 UTC |
18b6390 | Vladislav Zavadskyy | 13 November 2018, 19:03:55 UTC | Typo fix (#287) | 13 November 2018, 19:03:55 UTC |
52255be | pzhokhov | 09 November 2018, 19:18:05 UTC | microbatches in ppo2, custom frame size in WarpFrame, matching fc layer only when needed (#707) * joshim5 changes (width and height to WarpFrame wrapper) * match network output with action distribution via a linear layer only if necessary (#167) * support color vs. grayscale option in WarpFrame wrapper (#166) * support color vs. grayscale option in WarpFrame wrapper * Support color in other wrappers * Updated per Peters suggestions * fixing test failures * ppo2 with microbatches (#168) * pass microbatch_size to the model during construction * microbatch fixes and test (#169) * microbatch fixes and test * tiny cleanup * added assertions to the test * vpg-related fix * Peterz joshim5 subclass ppo2 model (#170) * microbatch fixes and test * tiny cleanup * added assertions to the test * vpg-related fix * subclassing the model to make microbatched version of model WIP * made microbatched model a subclass of ppo2 Model * flake8 complaint * mpi-less ppo2 (resolving merge conflict) * flake8 and mpi4py imports in ppo2/model.py * more un-mpying | 09 November 2018, 19:18:05 UTC |
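Commit 52255be adds microbatches to ppo2. Each minibatch is processed in smaller slices, and their gradients are combined before a single update, which lowers peak memory. The sketch below shows only the arithmetic. It assumes a scalar linear model with a mean-squared-error loss and equal-sized microbatches. `grad_mse` and `microbatched_grad` are hypothetical helpers that do not mirror the ppo2 model code.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def microbatched_grad(w, xs, ys, microbatch_size):
    """Average per-microbatch gradients (assumes equal-sized microbatches)."""
    assert len(xs) % microbatch_size == 0, "equal-sized microbatches assumed"
    grads = []
    for start in range(0, len(xs), microbatch_size):
        end = start + microbatch_size
        grads.append(grad_mse(w, xs[start:end], ys[start:end]))
    # With equal sizes, the mean of the microbatch means is the full-batch mean.
    return sum(grads) / len(grads)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_mse(0.5, xs, ys)
micro = microbatched_grad(0.5, xs, ys, microbatch_size=2)
print(full, micro)  # equal up to floating-point rounding
```

If microbatch sizes differed, the per-microbatch gradients would need to be weighted by their sizes to reproduce the full-batch gradient.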
d80acbb | AurelianTactics | 08 November 2018, 18:13:07 UTC | Removing print spam from Wrapper (#705) * DDPG has unused 'seed' argument DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for: ``` from baselines.common import set_global_seeds ... def learn(...): ... set_global_seeds(seed) ``` DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds. * DDPG: duplicate variable assignment variable nb_actions assigned same value twice in space of 10 lines nb_actions = env.action_space.shape[-1] * DDPG: noise_type 'normal_x' and 'ou_x' cause assert noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2' cause an assert message and DDPG not to run. Issue is noise following block: ''' if self.action_noise is not None and apply_noise: noise = self.action_noise() assert noise.shape == action.shape action += noise ''' noise is not nested: [number_of_actions] actions is nested: [[number_of_actions]] Can either nest noise or unnest actions * Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert" * DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block: ''' if self.action_noise is not None and apply_noise: noise = self.action_noise() assert noise.shape == action.shape action += noise ''' noise is not nested: [number_of_actions] action is nested: [[number_of_actions]] Hence the shapes do not pass the assert line even though the action += noise line is correct * Removing Print Spam from Wrapper Prints a line every time a video is saved or not saved. Seems unnecessary. | 08 November 2018, 18:13:07 UTC |
556b198 | pzhokhov | 08 November 2018, 18:11:45 UTC | Internal minifixes (#694) * joshim5 changes (width and height to WarpFrame wrapper) * match network output with action distribution via a linear layer only if necessary (#167) * support color vs. grayscale option in WarpFrame wrapper (#166) * support color vs. grayscale option in WarpFrame wrapper * Support color in other wrappers * Updated per Peters suggestions * fixing test failures | 08 November 2018, 18:11:45 UTC |
cc88804 | pzhokhov | 08 November 2018, 01:20:52 UTC | Update viz.ipynb | 08 November 2018, 01:20:52 UTC |
c14d307 | pzhokhov | 08 November 2018, 01:19:42 UTC | move viz docs to a notebook entirely (#704) * viz docs * writing vizualization docs * documenting plot_util * docstrings in plot_util * autopep8 and flake8 * spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc) * rephrased viz.md a little bit * more examples of viz code usage in the docs * replaced vizualization doc with notebook | 08 November 2018, 01:19:42 UTC |
0b71d4c | pzhokhov | 08 November 2018, 01:19:25 UTC | remove unused args of DDPG class (#702) | 08 November 2018, 01:19:25 UTC |
7bb405c | pzhokhov | 07 November 2018, 22:25:35 UTC | Update viz.md | 07 November 2018, 22:25:35 UTC |
8b95576 | pzhokhov | 07 November 2018, 01:02:20 UTC | more viz + build fixes (#703) * viz docs * writing vizualization docs * documenting plot_util * docstrings in plot_util * autopep8 and flake8 * spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc) * rephrased viz.md a little bit * more examples of viz code usage in the docs | 07 November 2018, 01:02:20 UTC |
9d4fb76 | Peter Zhokhov | 06 November 2018, 17:58:43 UTC | making num_envs and video length smaller in test_video_recorder to prevent hanging on travis | 06 November 2018, 17:58:43 UTC |
664ec6f | Peter Zhokhov | 06 November 2018, 03:19:39 UTC | catch bugfixes in gym | 06 November 2018, 03:19:39 UTC |
3917321 | Peter Zhokhov | 06 November 2018, 01:00:40 UTC | revert over-spellchecking | 06 November 2018, 01:00:40 UTC |
6e607ef | coord.e | 05 November 2018, 22:32:17 UTC | Add video recorder (#666) * Fix: Return the result of rendering from dummyvecenv * Add: Add a video recorder wrapper for vecenv * Change: Use VecVideoRecorder with --video_monitor flag * Change: Overwrite the metadata only when it isn't defined * Add: Define __del__ to make the file correctly closed in exit * Fix: Bump epidode_id in reset() * Fix: Use hasattr to check the existence of .metadata * Fix: Make directory when it doesn't exist * Change: Kepp recording for `video_length` steps, then close Because reset() is not what it is in normal gym.Env * Add: Enable to specify video_length from command line argument * Delete: Delete default value, None, of video_callable * Change: Use self.recorded_frames and self.recording to manage intervals * Add: Log the status of video recording * Fix: Fix saving path * Change: Place metadata in the base VecEnv * Delete: Delete unused imports * Fix: epidode_id => step_id * Fix: Refine the flag name * Change: Unify the flag name folloing to previous change * [WIP] Add: Add a test of VecVideoRecorder * Fix: Use PongNoFrameskip-v0 because SimpleEnv doesn't have render() * Change; Use TemporaryDirectory * Fix: minimal successful test * Add: Test against parallel environments * Add: Test against different type of VecEnvs * Change: Test against different length and interval of video capture * Delete: Reduce the number of tests * Change: Test if the output video is not empty * Add: Add some comments * Fix: Fix the flag name * Add: Add docstrings * Fix: Install ffmpeg in testing container for VecVideoRecorder's test * Fix: Delete unused things * Fix: Replace `video_callable` with `record_video_trigger` * Fix: Improve the explanation of `record_video_trigger` argument * Fix: Close owning vecenv in VecVideoRecorder.close to resolve memory leak | 05 November 2018, 22:32:17 UTC |
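Commit 6e607ef adds VecVideoRecorder. In it, a `record_video_trigger` callable decides on which step a recording starts, and `video_length` sets how many steps are captured before the file is closed. The sketch below simulates only that scheduling logic. It assumes the trigger receives the global step count. `recorded_steps` is a hypothetical helper, not part of the wrapper's API.

```python
def recorded_steps(total_steps, record_video_trigger, video_length):
    """Return the step indices that would be captured, grouped per video."""
    videos = []
    current = None  # frames of the video being recorded, or None when idle
    for step in range(total_steps):
        if current is None and record_video_trigger(step):
            current = []  # trigger fired while idle: start a new video
        if current is not None:
            current.append(step)
            if len(current) == video_length:
                # Keep recording for video_length steps, then close the video.
                videos.append(current)
                current = None
    if current:
        videos.append(current)  # flush a video still open when the run ends
    return videos


# Start a 3-step clip every 10 steps.
print(recorded_steps(25, lambda step: step % 10 == 0, video_length=3))
# [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
```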
c74ce02 | pzhokhov | 05 November 2018, 22:31:15 UTC | visualization code docs / bugfixes (#701) * viz docs * writing vizualization docs * documenting plot_util * docstrings in plot_util * autopep8 and flake8 * spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc) * rephrased viz.md a little bit | 05 November 2018, 22:31:15 UTC |
ab59de6 | pzhokhov | 31 October 2018, 18:15:41 UTC | mpi-less baselines (#689) * make baselines run without mpi wip * squash-merged latest master * further removing MPI references where unnecessary * more MPI removal * syntax and flake8 * MpiAdam becomes regular Adam if Mpi not present * autopep8 * add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole * mpiless ddpg | 31 October 2018, 18:15:41 UTC |
a071fa7 | Mathieu Poliquin | 30 October 2018, 17:17:46 UTC | Add retro to ppo2 defaults (#682) * Adds retro to ppo2 defaults Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes * ppo2 retro defaults to atari | 30 October 2018, 17:17:46 UTC |
637bf55 | Mathieu Poliquin | 30 October 2018, 17:16:15 UTC | Use deepmind wrapper for retro (#685) * Use deepmind wrapper for retro * moved wrap_deepmind_retro after Monitor wrapper | 30 October 2018, 17:16:15 UTC |
165c622 | AurelianTactics | 30 October 2018, 17:13:39 UTC | DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError (#680) * DDPG has unused 'seed' argument DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for: ``` from baselines.common import set_global_seeds ... def learn(...): ... set_global_seeds(seed) ``` DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds. * DDPG: duplicate variable assignment variable nb_actions assigned same value twice in space of 10 lines nb_actions = env.action_space.shape[-1] * DDPG: noise_type 'normal_x' and 'ou_x' cause assert noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2' cause an assert message and DDPG not to run. Issue is noise following block: ''' if self.action_noise is not None and apply_noise: noise = self.action_noise() assert noise.shape == action.shape action += noise ''' noise is not nested: [number_of_actions] actions is nested: [[number_of_actions]] Can either nest noise or unnest actions * Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert" * DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block: ''' if self.action_noise is not None and apply_noise: noise = self.action_noise() assert noise.shape == action.shape action += noise ''' noise is not nested: [number_of_actions] action is nested: [[number_of_actions]] Hence the shapes do not pass the assert line even though the action += noise line is correct | 30 October 2018, 17:13:39 UTC |
93c7cc2 | Peter Zhokhov | 29 October 2018, 22:25:38 UTC | Merge branch 'master' of github.com:openai/baselines | 29 October 2018, 22:25:38 UTC |
de36116 | Peter Zhokhov | 29 October 2018, 22:25:31 UTC | update tensorflow version check regex to parse version like 1.2.3rc4 (previously only 1.2.3-rc4) | 29 October 2018, 22:25:31 UTC |
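Commit de36116 widens the TensorFlow version check so that release-candidate strings without a hyphen, such as `1.2.3rc4`, are accepted alongside `1.2.3-rc4`. The pattern below is a hypothetical reconstruction of that idea, not the regex actually committed to baselines. Making the hyphen optional with `-?` is what lets both forms parse.

```python
import re

# Hypothetical pattern: accepts "1.2.3", "1.2.3-rc4" and "1.2.3rc4".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-?rc(\d+))?$")


def parse_tf_version(version):
    """Return (major, minor, patch, rc), with rc=None for final releases."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch, rc = match.groups()
    return int(major), int(minor), int(patch), (int(rc) if rc is not None else None)


print(parse_tf_version("1.2.3rc4"))   # (1, 2, 3, 4)
print(parse_tf_version("1.2.3-rc4"))  # (1, 2, 3, 4)
print(parse_tf_version("1.12.0"))     # (1, 12, 0, None)
```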
e2b4182 | Mathieu Poliquin | 29 October 2018, 20:30:41 UTC | Set 'cnn' as default network for retro (#683) | 29 October 2018, 20:30:41 UTC |
8e56dde | pzhokhov | 24 October 2018, 18:01:59 UTC | Multidiscrete action space compatibility for policy gradient-based methods (#677) * multidiscrete space compatibility * flake8 and syntax | 24 October 2018, 18:01:59 UTC |
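Commit 8e56dde makes the policy-gradient methods compatible with MultiDiscrete action spaces. A common approach, assumed here, is to emit one flat logit vector and split it into one categorical distribution per sub-action, sized by the space's `nvec`. The stdlib sketch below computes the most likely action. `split_logits` and `multidiscrete_mode` are hypothetical names.

```python
def split_logits(flat_logits, nvec):
    """Split one flat logit vector into one segment per sub-action."""
    assert len(flat_logits) == sum(nvec), "need sum(nvec) logits in total"
    segments, start = [], 0
    for n in nvec:
        segments.append(flat_logits[start:start + n])
        start += n
    return segments


def multidiscrete_mode(flat_logits, nvec):
    """Most likely action: an independent argmax within each segment."""
    return [max(range(len(seg)), key=seg.__getitem__)
            for seg in split_logits(flat_logits, nvec)]


# nvec=[3, 2]: the first sub-action has 3 choices, the second has 2.
logits = [0.1, 2.0, -1.0, 0.5, 0.3]
print(split_logits(logits, [3, 2]))        # [[0.1, 2.0, -1.0], [0.5, 0.3]]
print(multidiscrete_mode(logits, [3, 2]))  # [1, 0]
```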
c3bd8ce | Juliano Laganá | 24 October 2018, 17:00:31 UTC | Adds description of param_noise parameter in deepq.learn method (#675) | 24 October 2018, 17:00:31 UTC |
84ea7aa | AurelianTactics | 24 October 2018, 16:59:46 UTC | DDPG has unused 'seed' argument (#676) DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for: ``` from baselines.common import set_global_seeds ... def learn(...): ... set_global_seeds(seed) ``` DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds. | 24 October 2018, 16:59:46 UTC |
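Commit 84ea7aa makes DDPG call `set_global_seeds(seed)` like the other algorithms, so that its `seed` argument actually takes effect. The stdlib sketch below illustrates the effect by seeding only Python's `random` module. The baselines helper additionally seeds the NumPy and TensorFlow generators. `set_seeds` and `seeded_rollout` are hypothetical stand-ins for the helper and a training run.

```python
import random


def set_seeds(seed):
    """Hypothetical stand-in for set_global_seeds: seeds Python's RNG only."""
    if seed is not None:
        random.seed(seed)


def seeded_rollout(seed, n=5):
    """Draw n Gaussian 'exploration noise' samples after seeding."""
    set_seeds(seed)
    return [random.gauss(0.0, 0.2) for _ in range(n)]


print(seeded_rollout(0) == seeded_rollout(0))  # True: same seed, same samples
```

Without the seeding call, the `seed` argument would be accepted but ignored, so two runs with the same seed could differ.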
88300ed | Peter Zhokhov | 24 October 2018, 16:57:57 UTC | fix raise NotImplemented() complaints of latest flake8 | 24 October 2018, 16:57:57 UTC |
583ba08 | pzhokhov | 23 October 2018, 18:22:27 UTC | Update cmd_util.py | 23 October 2018, 18:22:27 UTC |
014a559 | pzhokhov | 23 October 2018, 17:01:25 UTC | refactor ACER (#664) * make acer use vecframestack * acer passes mnist test with 20k steps * acer with non-image observations and tests * flake8 * test acer serialization with non-recurrent policies | 23 October 2018, 17:01:25 UTC |
4ed1350 | Isaac Poulton | 23 October 2018, 17:00:09 UTC | Fixed TypeError on creating atari vec envs (#671) | 23 October 2018, 17:00:09 UTC |
8513d73 | Rishabh Jangir | 23 October 2018, 02:04:40 UTC | HER : new functionality, enables demo based training (#474) * Add, initialize, normalize and sample from a demo buffer * Modify losses and add cloning loss * Add demo file parameter to train.py * Introduce new params in config.py for demo based training * Change logger.warning to logger.warn in rollout.py;bug * Add data generation file for Fetch environments * Update README file | 23 October 2018, 02:04:40 UTC |
c28acb2 | Xingdong Zuo | 23 October 2018, 02:01:26 UTC | [Clean-up]: delete `running_stat` and `filters` as they are replaced by `running_mean_std` and not used anymore (#614) * Delete filters.py * Delete running_stat.py | 23 October 2018, 02:01:26 UTC |
c5d9c4a | pzhokhov | 23 October 2018, 01:36:39 UTC | wrap retro envs correctly for other (non-deepq) algorithms (#669) * wrap retro envs correctly for other (non-deepq) algorithms * flake and csh comments * flake and csh comments | 23 October 2018, 01:36:39 UTC |
c0fa11a | pzhokhov | 22 October 2018, 16:15:04 UTC | minor fixes from internal (#665) * sync internal changes. Make ddpg work with vecenvs * B -> nenvs for consistency with other algos, small cleanups * eval_done[d]==True -> eval_done[d] * flake8 and numpy.random.random_integers deprecation warning * Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch | 22 October 2018, 16:15:04 UTC |
bd390c2 | Peter Zhokhov | 20 October 2018, 00:50:54 UTC | updated docstring for deepq | 20 October 2018, 00:50:54 UTC |
d0cc325 | pzhokhov | 19 October 2018, 15:54:21 UTC | store session at policy creation time (#655) * sync internal changes. Make ddpg work with vecenvs * B -> nenvs for consistency with other algos, small cleanups * eval_done[d]==True -> eval_done[d] * flake8 and numpy.random.random_integers deprecation warning * store session at policy creation time * coexistence tests * fix a typo * autopep8 * ... and flake8 * updated todo links in test_serialization | 19 October 2018, 15:54:21 UTC |
fc7f9ce | pzhokhov | 18 October 2018, 23:07:14 UTC | disable gym subpackages in setup.py (#661) * disable gym subpackages in setup.py * include gym[atari] in test requirements * gym[atari] -> atari-py in test requirements | 18 October 2018, 23:07:14 UTC |
3677dc1 | Matthew Rahtz | 18 October 2018, 20:54:39 UTC | Set allow_growth=True for MuJoCo session (#643) | 18 October 2018, 20:54:39 UTC |
ef96f38 | Matthew Rahtz | 16 October 2018, 23:28:23 UTC | Drop S and M args so that --play works (#636) | 16 October 2018, 23:28:23 UTC |
a03dacd | pzhokhov | 16 October 2018, 23:26:46 UTC | sync internal changes. Make ddpg work with vecenvs (#654) * sync internal changes. Make ddpg work with vecenvs * B -> nenvs for consistency with other algos, small cleanups * eval_done[d]==True -> eval_done[d] * flake8 and numpy.random.random_integers deprecation warning | 16 October 2018, 23:26:46 UTC |
e57f81b | Tianhong Dai | 16 October 2018, 23:22:06 UTC | revise the readme of ddpg (#653) | 16 October 2018, 23:22:06 UTC |
28aca63 | Peter Zhokhov | 09 October 2018, 16:48:31 UTC | update benchmark results | 09 October 2018, 16:48:31 UTC |
7bfbcf1 | Erik Doffagne | 04 October 2018, 17:31:22 UTC | Fixed typos in README (#635) | 04 October 2018, 17:31:22 UTC |
394339d | pzhokhov | 04 October 2018, 03:53:58 UTC | Update README.md | 04 October 2018, 03:53:58 UTC |
10c205c | pzhokhov | 02 October 2018, 23:33:19 UTC | Debug codegen ppo (#123) * disabled tests, running benchmarks only * dummy commit to RUN BENCHMARKS * benchmark ppo_metal; disable all but Bullet benchmarks * ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS * run benchmarks on Roboschool instead RUN BENCHMARKS * run ppo_metal on Roboschool as well RUN BENCHMARKS * install roboschool in cron rcall user_config * dummy commit to RUN BENCHMARKS * import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS * re-enable tests, flake8 * get entropy from a distribution in Pred RUN BENCHMARKS * gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS * provide default value for cg2/bmv_net_ops.py * dummy commit to RUN BENCHMARKS * make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS * syntax error in run-benchmarks-new.py RUN BENCHMARKS * syntax error in run-benchmarks-new.py RUN BENCHMARKS * path relative to codegen/training for gin files RUN BENCHMARKS * another reconcilliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS * value_network=copy for ppo2 on roboschool RUN BENCHMARKS * make None seed work with torch seeding RUN BENCHMARKS * try sequential batches with ppo2 RUN BENCHMARKS * try ppo without advantage normalization RUN BENCHMARKS * use Distribution to compute ema NLL RUN BENCHMARKS * autopep8 * clip gradient norm in algo_agent RUN BENCHMARKS * try ppo2 without vfloss clipping RUN BENCHMARKS * trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS * set gamma=0 in ppo2 RUN BENCHMARKS * try with ppo2 with single minibatch RUN BENCHMARKS * try with nminibatches=4, value_network=copy RUN BENCHMARKS * try with nminibatches=1 take two RUN BENCHMARKS * try initialization for vf=0.01 RUN BENCHMARKS * fix the problem with min_istart >= max_istart * i have no idea RUN BENCHMARKS * fix non-shared variance between old and new RUN BENCHMARKS * restored baselines.common.policies * 16 minibatches in ppo_roboschool.gin * fixing results of merge * cleanups * cleanups * fix run-benchmarks-new RUN BENCHMARKS Roboschool8M * fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M * fix test failures * moved gin requirement to codegen/setup.py * remove duplicated build_softq in get_algo.py * linting * run softq on continuous action spaces RUN BENCHMARKS Roboschool8M | 03 October 2018, 21:38:32 UTC |
62fe7c4 | pzhokhov | 02 October 2018, 22:54:14 UTC | disable async acktr (#129) * disable async acktr * linting * linting * linting | 03 October 2018, 21:38:32 UTC |
fbdf55f | Xingyou Song | 01 October 2018, 18:39:14 UTC | Xsong lqr ddpg (#125) * allows vec_envs to work * allows vec_envs to work * fixed branch with correct ddpg * running experiments jointly now * changed to subproc * changed to subproc * changed to subproc * small fix md * removed placeholder * removed placeholder * added ppotest * probably fixed ddpg hyperparam issues * checkpoint * edited readme * added orthogonal * added orthogonal * added ddpg-vecenv * reverted ddpg to old baselines | 03 October 2018, 21:38:32 UTC |
9ee804c | Christopher Hesse | 01 October 2018, 17:38:07 UTC | minor change to install.py and baselines run.py (#121) | 03 October 2018, 21:38:32 UTC |
4cf7dc9 | John Schulman | 30 September 2018, 21:54:44 UTC | Big refactor (#124) * massive revision inspired by soup: algo folder works * porting rl commands, WIP * various * git subrepo push --remote=git@github.com:openai/codegen.git --branch=refactor codegen subrepo: subdir: "codegen" merged: "aa27e069" upstream: origin: "git@github.com:openai/codegen.git" branch: "refactor" commit: "aa27e069" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8" * various * rewrite RL stuff in new framework * fix almost everything * woohoo tests pass * more tests * reformatting * fixes * write tests for embeddings * re-remove cg2 * pylint * minor * move smooth_helpers import; seems to cause nondeterministic failure in parallel pytest | 03 October 2018, 21:38:32 UTC |
e820b86 | Xingyou Song | 27 September 2018, 20:11:11 UTC | ppo2 now has eval stats (#120) * ppo2 now has eval stats * fixed spaces * fixed kwargs ordering * whitespace fix | 03 October 2018, 21:38:32 UTC |
858afa8 | pzhokhov | 26 September 2018, 22:28:52 UTC | Refactor DDPG (#111) * run ddpg on Mujoco benchmark RUN BENCHMARKS * autopep8 * fixed all syntax in refactored ddpg * a little bit more refactoring * autopep8 * identity test with ddpg WIP * enable test_identity with ddpg * refactored ddpg RUN BENCHMARKS * autopep8 * include ddpg into style check * fixing tests RUN BENCHMARKS * set default seed to None RUN BENCHMARKS * run tests and benchmarks in separate buildkite steps RUN BENCHMARKS * cleanup pdb usage * flake8 and cleanups * re-enabled all benchmarks in run-benchmarks-new.py * flake8 complaints * deepq model builder compatible with network functions returning single tensor * remove ddpg test with test_discrete_identity * make ppo_metal use make_vec_env instead of make_atari_env * make ppo_metal use make_vec_env instead of make_atari_env * fixed syntax in ppo_metal.run_atari | 03 October 2018, 21:38:32 UTC |
4121d9c | pzhokhov | 03 October 2018, 21:37:40 UTC | fix DQN learning bug (#632) * Update run.py * Update utils.py * Update utils.py | 03 October 2018, 21:37:40 UTC |
34ae319 | Peter Zhokhov | 27 September 2018, 19:51:43 UTC | add a note about DQN algorithms not performing well | 27 September 2018, 19:51:43 UTC |
4402b8e | Thomas Simonini | 24 September 2018, 16:54:41 UTC | Updated A2C and PPO2 comments (#612) * Updated A2C and PPO2 comments * Fixed format errors to respect PEP 8 style guide | 24 September 2018, 16:54:41 UTC |
555a5cb | ahuhn | 22 September 2018, 00:22:56 UTC | Adding num_env to readme example (#609) * Adding num_env to readme example * Updated readme example fix | 22 September 2018, 00:22:56 UTC |
8158f35 | Thomas Simonini | 21 September 2018, 20:12:31 UTC | Wrote some comments to explain the A2C and PPO2 implementation (#607) * added comments in A2C and PPO2 * Fixed format errors to respect PEP 8 style guide | 21 September 2018, 20:12:31 UTC |
a7fd8a4 | cclauss | 20 September 2018, 23:40:03 UTC | Run flake8 to find syntax errors and undefined names (#439) __E901,E999,F821,F822,F823__ are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. The other flake8 issues are merely "style violations" -- useful for readability but they do not effect runtime safety. This PR therefore recommends a flake8 run of those tests on the entire codebase. * F821: undefined name `name` * F822: undefined name `name` in `__all__` * F823: local variable `name` referenced before assignment * E901: SyntaxError or IndentationError * E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree | 20 September 2018, 23:40:03 UTC |
e791565 | John Schulman | 20 September 2018, 20:31:25 UTC | Codegen more abstract abstract classes 3a (#106) * Soup code, arch search on CIFAR-10 * Oh I understood how choice_sequence() worked * Undo some pointless changes * Some beautification 1 * Some beautification 2 * An attempt to debug test_get_algo_outputs() number 70, unsuccessful. * Code style warning * Code style warnings, more * wip * wip * wip * fix almost everything; soup machine still broken * revert mpi_eda changes * minor fixes | 20 September 2018, 23:19:07 UTC |
7859f60 | XFFXFF | 20 September 2018, 23:16:44 UTC | prioritized experience replay bug (#527) | 20 September 2018, 23:16:44 UTC |
0f4ae2f | pzhokhov | 20 September 2018, 23:05:26 UTC | refactor acktr (#560) * refactor acktr * setup.cfg now tests style/syntax in acktr as well * flake8 complaints * added note about continuous action spaces for acktr into the README.md | 20 September 2018, 23:05:26 UTC |
0e7048b | pzhokhov | 19 September 2018, 22:04:54 UTC | Update README.md | 19 September 2018, 22:04:54 UTC |
75983ba | pzhokhov | 19 September 2018, 22:04:01 UTC | Update README.md | 19 September 2018, 22:04:01 UTC |
85be745 | Alfredo Canziani | 19 September 2018, 16:43:45 UTC | Add possibility of plotting timesteps vs episodes (#578) * Add possibility of plotting timesteps vs episodes * Remove leftover from personal project patch * Auto plt.tight_layout() on resize window event Calls `plt.tight_layout()` if a `resize_event` is issued. This means that the plot will look good even after the user has resized the plotting window. | 19 September 2018, 16:43:45 UTC |
115b59d | Geoffrey Irving | 18 September 2018, 22:52:57 UTC | Merge pull request #598 from openai/irving-rc Fix setup.py for tensorflow -rc versions | 18 September 2018, 22:52:57 UTC |
d34049c | Xingdong Zuo | 18 September 2018, 21:14:38 UTC | Update running_mean_std.py (#585) | 18 September 2018, 21:14:38 UTC |
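Commit d34049c touches running_mean_std.py, which replaced the older `running_stat` and `filters` modules (see #614). Its core operation merges running statistics with a new batch's mean, variance and count using the parallel variance formula. The stdlib sketch below handles scalar data with population variance. `moments` and `merge_moments` are hypothetical names, not the module's API.

```python
def moments(xs):
    """Population mean, variance and count of a non-empty list."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var, len(xs)


def merge_moments(mean, var, count, batch_mean, batch_var, batch_count):
    """Combine running statistics with a new batch (parallel variance formula)."""
    delta = batch_mean - mean
    total = count + batch_count
    new_mean = mean + delta * batch_count / total
    # Sum of squared deviations of the union, then renormalize by the total count.
    m2 = var * count + batch_var * batch_count + delta ** 2 * count * batch_count / total
    return new_mean, m2 / total, total


a, b = [1.0, 2.0, 3.0], [10.0, 20.0]
merged = merge_moments(*moments(a), *moments(b))
print(merged)          # matches moments(a + b) up to rounding
print(moments(a + b))
```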
59662ff | pzhokhov | 18 September 2018, 21:13:05 UTC | rename entcoeff to ent_coef in trpo_mpi for compatibility with other algos (#581) | 18 September 2018, 21:13:05 UTC |
a42c4eb | Geoffrey Irving | 18 September 2018, 18:35:43 UTC | Fix setup.py for tensorflow -rc versions | 18 September 2018, 18:35:43 UTC |
68a29d0 | R1ckF | 17 September 2018, 21:33:39 UTC | --play now works with LSTM (#595) | 17 September 2018, 21:33:39 UTC |
0c6f357 | Xingdong Zuo | 17 September 2018, 16:53:34 UTC | Delete identity_env.py (#588) | 17 September 2018, 16:53:34 UTC |
4dc697e | pzhokhov | 14 September 2018, 01:18:45 UTC | codegen test fixes (#95) * fix discovered test failures * autopep8 * test indices up to 123 * testing from index 124 on * add scope to logstd * fix flakiness in test_train_mle * autopep8 | 14 September 2018, 22:43:50 UTC |
e790f52 | Peter Zhokhov | 13 September 2018, 22:37:04 UTC | define mean for CategoricalPd (as softmax of logits) | 14 September 2018, 22:43:50 UTC |
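Commit e790f52 defines the mean of CategoricalPd as the softmax of its logits. If the categorical variable is viewed as a one-hot vector, its expectation is exactly the vector of class probabilities. The stdlib sketch below subtracts the largest logit before exponentiating, which avoids overflow and leaves the result unchanged. The function names are illustrative only.

```python
import math


def softmax(logits):
    """Class probabilities from logits; subtracting the max avoids overflow."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_mean(logits):
    """Mean of the one-hot encoded categorical variable = its probabilities."""
    return softmax(logits)


probs = categorical_mean([1000.0, 1000.0, 999.0])  # naive exp(1000) would overflow
print(probs)
```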
fe06c6b | pzhokhov | 12 September 2018, 17:14:41 UTC | continuous action spaces for codegen + some benchmarking (#82) * add some docstrings * start making big changes * state machine redesign * sampling seems to work * some reorg * fixed sampling of real vals * json conversion * made it possible to register new commands got nontrivial version of Pred working * consolidate command definitions * add more macro blocks * revived visualization * rename Userdata -> CmdInterpreter make AlgoSmInstance subclass of SmInstance that uses appropriate userdata argument * replace userdata by ci when appropriate * minor test fixes * revamped handmade dir, can run ppo_metal * seed to avoid random test failure * implement AlgoAgent * Autogenerated object that performs all ops and macros * more CmdRecorder changes * move files around * move MatchProb and JtftProb * remove obsolete * fix tests involving AlgoAgent (pending the next commit on ppo_metal code) * ppo_metal: reduce duplication in policy_gen, make sess an attribute of PpoAgent and StochasticPolicy instead of using get_default_session everywhere. * maze_env reformatting, move algo_search script (but stil broken) * move agent.py * fix test on handcrafted agents * tuning/fixing ppo_metal baseline * minor * Fix ppo_metal baseline * Don’t set epcount, tcount unless they’re being used * get rid of old ppo_metal baseline * fixes for handmade/run.py tuning * fix codegen ppo * fix handmade ppo hps * fix test, go back to safe_div * switch to more complex filtering * make sure all handcrafted algos have finite probability * train to maximize logprob of provided samples Trex changes to avoid segfault * AlgoSm also includes global hyperparams * don’t duplicate global hyperparam defaults * create generic_ob_ac_space function * use sorted list of outkeys * revive tsne * todo changes * determinism test * todo + test fix * remove a few deprecated files, rename other tests so they don’t run automatically, fix real test failure * continuous control with codegen * continuous control with codegen * implement continuous action space algodistr * ppo with trex RUN BENCHMARKS * wrap trex in a monitor * dummy commit to RUN BENCHMARKS * adding monitor to trex env RUN BENCHMARKS * adding monitor to trex RUN BENCHMARKS * include monitor into trex env RUN BENCHMARKS * generate nll and predmean using Distribution node * dummy commit to RUN BENCHMARKS * include pybullet into baselines optional dependencies * dummy commit to RUN BENCHMARKS * install games for cron rcall user RUN BENCHMARKS * add --yes flag to install.py in rcall config for cron user RUN BENCHMARKS * both continuous and discrete versions seem to run * fixes to monitor to work with vecenv-like info and rewards RUN BENCHMARKS * dummy commit to RUN BENCHMARKS * removed shape check from one-hot encoding logic in distributions.CategoricalPd * reset logger configuration in codegen/handmade/run.py to be in-line with baselines RUN BENCHMARKS * merged peterz_codegen_benchmarks RUN BENCHMARKS * skip tests RUN BENCHMARKS * working on test failures * save benchmark dicts RUN BENCHMARK * merged peterz_codegen_benchmark RUN BENCHMARKS * add get_git_commit_message to the baselines.common.console_util * dummy commit to RUN BENCHMARKS * merged fixes from peterz_codegen_benchmark RUN BENCHMARKS * fixing failure in test_algo_nll WIP * test_algo_nll passes with both ppo and softq * re-enabled tests * run trex on gpus for 100k total (horizon=100k / 16) RUN BENCHMARKS * merged latest peterz_codegen_benchmarks RUN BENCHMARKS * fixing codegen test failures (logging-related) * fixed name collision in run-benchmarks-new.py RUN BENCHMARKS * fixed name collision in run-benchmarks-new.py RUN BENCHMARKS * fixed import in node_filters.py * test_algo_search passes * some cleanup * dummy commit to RUN BENCHMARKS * merge fast fail for subprocvecenv RUN BENCHMARKS * use SubprocVecEnv in sonic_prob * added deprecation note to shmem_vec_env * allow indexing of distributions * add timeout to pipeline.yaml * typo in pipeline.yml * run tests with --forked option * resolved merge conflict in rl_algs.bench.benchmarks * re-enable parallel tests * fix remaining merge conflicts and syntax * Update trex_prob.py * fixes to ResultsWriter * take baselines/run.py from peterz_codegen branch * actually save stuff to file in VecMonitor RUN BENCHMARKS * enable parallel tests * merge stricter flake8 * merge peterz_codegen_benchmark, resolve conflicts * autopep8 * remove traces of Monitor from trex env, check shapes before encoding in CategoricalPd * asserts and warnings to make q -> distribution change more explicit * fixed assert in CategoricalPd * add header to vec_monitor output file RUN BENCHMARKS * make VecMonitor write header to the output file * remove deprecation message from shmem_vec_env RUN BENCHMARKS * autopep8 * proper shape test in distributions.py * ResultsWriter can take dict headers * dummy commit to RUN BENCHMARKS * replace assert len(qs)==1 with warning RUN BENCHMARKS * removed pdb from ppo2 RUN BENCHMARKS | 14 September 2018, 22:43:49 UTC |
1f99a56 | Peter Zhokhov | 11 September 2018, 20:21:52 UTC | autopep8 | 11 September 2018, 20:21:52 UTC |
4e2a888 | Peter Zhokhov | 11 September 2018, 20:19:39 UTC | Merge commit 'refs/subrepo/baselines/fetch' into subrepo/baselines | 11 September 2018, 20:19:39 UTC |
c5b2918 | Peter Zhokhov | 11 September 2018, 19:48:16 UTC | git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "2742f819" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "5c5a9f4b" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8" | 11 September 2018, 20:18:43 UTC |
3bf31a4 | Peter Zhokhov | 11 September 2018, 19:42:47 UTC | git subrepo commit (merge) baselines subrepo: subdir: "baselines" merged: "0846932a" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "c5d6f299" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8" | 11 September 2018, 20:18:43 UTC |
9070ee7 | pzhokhov | 11 September 2018, 18:01:51 UTC | tighten flake8, autopep8 to fix trailing whitespaces and blank lines with whitespaces (#87) | 11 September 2018, 20:18:43 UTC |
e568034 | Peter Zhokhov | 10 September 2018, 19:50:51 UTC | git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "5c6a1fd9" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "23b23332" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8" | 11 September 2018, 20:18:42 UTC |
b3bc25d | pzhokhov | 10 September 2018, 18:58:22 UTC | add fast failure when calling methods on a closed subprocvecenv (#84) | 11 September 2018, 20:18:42 UTC |
5183fa9 | Peter Zhokhov | 11 September 2018, 19:47:50 UTC | autopep8 on deepq/experiments | 11 September 2018, 19:47:50 UTC |
5c5a9f4 | Peter Zhokhov | 11 September 2018, 19:47:50 UTC | autopep8 on deepq/experiments | 11 September 2018, 19:47:50 UTC |
3bf35cb | Peter Zhokhov | 11 September 2018, 19:44:51 UTC | added peterz to baselines authorlist | 11 September 2018, 19:44:51 UTC |
5c62f5c | Peter Zhokhov | 11 September 2018, 19:44:51 UTC | added peterz to baselines authorlist | 11 September 2018, 19:44:51 UTC |
29bf587 | Peter Zhokhov | 11 September 2018, 19:40:29 UTC | Merge branch 'master' of github.com:openai/baselines | 11 September 2018, 19:40:29 UTC |
c5d6f29 | Peter Zhokhov | 11 September 2018, 19:40:29 UTC | Merge branch 'master' of github.com:openai/baselines | 11 September 2018, 19:40:29 UTC |