3c77161 | John | 01 April 2019, 08:46:09 UTC | fixed path in dockerfile | 01 April 2019, 08:46:09 UTC |
a7bd618 | John | 01 April 2019, 03:38:33 UTC | updated build/deploy scripts | 01 April 2019, 03:38:33 UTC |
33cab49 | John | 01 April 2019, 03:32:42 UTC | changed names in run-tests | 01 April 2019, 03:32:42 UTC |
81ec25f | John | 01 April 2019, 02:25:44 UTC | fixed python tests | 01 April 2019, 02:25:44 UTC |
6b9d8ff | John | 31 March 2019, 21:20:18 UTC | changed name in dockerfile | 31 March 2019, 21:20:18 UTC |
a1ad349 | John | 31 March 2019, 20:51:47 UTC | fixed hyphen typo | 31 March 2019, 20:51:47 UTC |
e023ab1 | John | 31 March 2019, 16:59:25 UTC | removed bayesianoptimization names | 31 March 2019, 16:59:25 UTC |
68aa24e | John | 20 March 2019, 14:05:50 UTC | Merge branch 'python_migration' of https://github.com/jdplatt/katib into python_migration | 20 March 2019, 14:05:50 UTC |
5970d5f | John | 20 March 2019, 14:04:21 UTC | updated grid and random images in e2e tests | 20 March 2019, 14:04:21 UTC |
2aad698 | jdplatt | 20 March 2019, 03:16:23 UTC | Update suggestion-config-hyb.yml | 20 March 2019, 03:16:23 UTC |
b479bd2 | John | 20 March 2019, 03:13:30 UTC | removed logging statements | 20 March 2019, 03:13:30 UTC |
5f48775 | John | 20 March 2019, 03:10:45 UTC | fixed tests | 20 March 2019, 03:10:45 UTC |
fbf0908 | John | 19 March 2019, 23:28:45 UTC | working on test client | 19 March 2019, 23:28:45 UTC |
a99fa58 | John Platt | 18 March 2019, 21:24:15 UTC | fixed setup files | 18 March 2019, 21:24:15 UTC |
7e03787 | John Platt | 18 March 2019, 18:01:35 UTC | renamed packages and split into different setup files | 18 March 2019, 18:01:35 UTC |
a96c4e2 | John Platt | 18 March 2019, 15:01:04 UTC | added random search go code back to fix hyperband | 18 March 2019, 15:01:04 UTC |
5195b19 | John Platt | 18 March 2019, 13:59:36 UTC | removed test code for old images | 18 March 2019, 13:59:36 UTC |
39ad7e2 | John Platt | 16 March 2019, 19:58:40 UTC | removed old files | 16 March 2019, 19:58:40 UTC |
7e6936e | John Platt | 16 March 2019, 19:56:42 UTC | reverted changes for local development | 16 March 2019, 19:56:42 UTC |
73fddfc | John Platt | 16 March 2019, 19:50:09 UTC | Merge branch 'master' into python_migration | 16 March 2019, 19:50:09 UTC |
44e804f | John Platt | 16 March 2019, 19:49:29 UTC | Merge branch 'master' of https://github.com/kubeflow/katib | 16 March 2019, 19:49:29 UTC |
ddd0af3 | John Platt | 16 March 2019, 15:59:11 UTC | moved parsing inside algorithms | 16 March 2019, 15:59:11 UTC |
a417908 | John Platt | 16 March 2019, 14:34:23 UTC | updated manifests and dockerfiles | 16 March 2019, 14:34:23 UTC |
a95a6cb | John Platt | 16 March 2019, 14:34:10 UTC | added grid search code | 16 March 2019, 14:34:10 UTC |
acc3376 | John Platt | 16 March 2019, 12:59:43 UTC | fixed minor issues | 16 March 2019, 12:59:43 UTC |
887f356 | Julian Qian | 15 March 2019, 23:42:57 UTC | fix demo link (#434) * fix demo link change to correct link README.md * The link should say README.md as well. The link should say README.md as well. | 15 March 2019, 23:42:57 UTC |
06f955b | Jinan Zhou | 14 March 2019, 01:58:22 UTC | Add fault tolerance support for trial failure (#424) * add fault tolerance for trial failure * fix a small typo * fix a typo * improve fault processing strategy * add an important TODO * fix typo * add some more TODOs | 14 March 2019, 01:58:22 UTC |
c87d583 | jdplatt | 11 March 2019, 19:54:38 UTC | Test for Bayesian Optimization Algo (#406) * added tests for acquisition function and models * added tests for global_optimizer * added tests for boa * minor linting * tests for algorithm manager * added discrete parameter to study config * covered all parameter types * moved python script to testing folder * added python tests to unit tests * remembered to uncomment existing tests * fixed path to test script * moved python tests to separate job in workflow * added run command to test script | 11 March 2019, 19:54:38 UTC |
61451ef | Richard Liu | 08 March 2019, 02:23:33 UTC | Katib v1alpha2 API for CRDs (#381) * v1alpha2 API proposal * Fix comments round 1 * Refactor into Experiment and Trial * Incorporate feedback from meeting * Rename * Minor edits | 08 March 2019, 02:23:33 UTC |
57dd5c5 | John Platt | 07 March 2019, 14:09:53 UTC | added run command to test script | 07 March 2019, 14:09:53 UTC |
9f75f05 | John Platt | 07 March 2019, 13:24:57 UTC | moved python tests to separate job in workflow | 07 March 2019, 13:24:57 UTC |
0217ace | John Platt | 07 March 2019, 12:31:28 UTC | fixed path to test script | 07 March 2019, 12:31:28 UTC |
936708c | John Platt | 06 March 2019, 14:32:00 UTC | made fetching observations optional | 06 March 2019, 14:32:00 UTC |
86bd27a | Andrey Velichkevich | 06 March 2019, 05:27:59 UTC | Add NAS team as reviewers (#419) * Add NAS team in reviewers * Update reviewers | 06 March 2019, 05:27:59 UTC |
feee2f9 | Jinan Zhou | 06 March 2019, 01:48:01 UTC | Multiple Trials for Reinforcement Learning Suggestion (#416) * support multiple trials * adjust To Do * language improvement in README.md * fix several problems * fix a potential problem * handle the GetEvaluationResult() return None problem | 06 March 2019, 01:48:01 UTC |
3a705a1 | Jinan Zhou | 06 March 2019, 00:40:03 UTC | Fix the package version in training container (#418) * fix the version of tf and keras * fix a typo | 06 March 2019, 00:40:03 UTC |
8f89ad4 | Andrey Velichkevich | 05 March 2019, 23:47:59 UTC | Add validation for NAS job in Katib controller (#398) * Initial commit * Add validation for NAS config * Fix validation * Add algorithmType in NasConfig validation * Add Discrete ParameterType to validation * Move validation to webhook Change GetJobType function Make a list with NAS algorithms * Add ValidateSuggestionParameters function in Katib API * Fix api * Add ValidateSuggestionParameters to Suggestion service * Change isValid to int32 * Create Validation function in NAS RL Suggestion service * Fix small problems * Reduce code inside Validation function * Add empty ValidateSuggestionParameters function in each HP service written in GO * Fix logging * Add ValidateSuggestionParameters to mock * Handle Unavailable error | 05 March 2019, 23:47:59 UTC |
a393d4b | John Platt | 05 March 2019, 21:48:29 UTC | updated manifests | 05 March 2019, 21:48:29 UTC |
a338fb2 | John Platt | 05 March 2019, 20:32:34 UTC | expanded parameter object and streamlined tests | 05 March 2019, 20:32:34 UTC |
b62c941 | John Platt | 05 March 2019, 16:34:58 UTC | Merge branch 'master' into python_migration | 05 March 2019, 16:34:58 UTC |
f4ae52d | John Platt | 05 March 2019, 15:39:00 UTC | remembered to uncomment existing tests | 05 March 2019, 15:39:00 UTC |
41d19db | John Platt | 05 March 2019, 15:34:19 UTC | added python tests to unit tests | 05 March 2019, 15:34:19 UTC |
89fcf56 | John Platt | 03 March 2019, 23:52:44 UTC | added random search | 04 March 2019, 00:27:08 UTC |
1c3401b | John Platt | 03 March 2019, 23:27:48 UTC | started testing service | 03 March 2019, 23:27:48 UTC |
dbc8b30 | John Platt | 03 March 2019, 17:37:57 UTC | pushed parameter validation and defaults inside algorithm | 03 March 2019, 17:37:57 UTC |
429c0c4 | John Platt | 02 March 2019, 01:22:14 UTC | broke out parsing tests to cover individual functions | 02 March 2019, 01:39:31 UTC |
d825a46 | John Platt | 02 March 2019, 00:56:31 UTC | eliminated algorithm_manager.py | 02 March 2019, 00:56:31 UTC |
abe8e6e | John Platt | 01 March 2019, 21:43:37 UTC | converted methods called in init for algorithm manager into functions | 01 March 2019, 21:43:37 UTC |
8620509 | John Platt | 01 March 2019, 19:45:08 UTC | linted bayesian_service.py | 01 March 2019, 20:33:22 UTC |
76eb49f | jdplatt | 01 March 2019, 14:31:35 UTC | Merge remote-tracking branch 'upstream/master' | 01 March 2019, 14:31:35 UTC |
db6b83b | Andrey Velichkevich | 01 March 2019, 02:54:21 UTC | Fix path to api protobuf (#415) | 01 March 2019, 02:54:21 UTC |
4d8c599 | Jinan Zhou | 27 February 2019, 03:05:45 UTC | Add support for parallel studyjobs (#404) * Add support for parallel studyjobs * fix a typo * Reorganize the program a little bit * fix a typo * fix a typo | 27 February 2019, 03:05:45 UTC |
87a31f3 | Jinan Zhou | 27 February 2019, 01:33:48 UTC | Add separable/depthwise convolution, data augmentation and multiple GPU support (#393) * add separable/depthwise convolution in operation library * add ENAS example StudyJob yaml * remove ENAS example, add data augmentation, add multiple GPU support | 27 February 2019, 01:33:48 UTC |
4d031e7 | Andrey Velichkevich | 27 February 2019, 00:32:45 UTC | Add create time to Trial API (#410) * Add create time to Trial API * Add Trial create time information * Fix UT for db | 27 February 2019, 00:32:45 UTC |
f5a3860 | John Platt | 21 February 2019, 17:24:20 UTC | moved python script to testing folder | 26 February 2019, 20:13:20 UTC |
afe1874 | John Platt | 21 February 2019, 17:17:04 UTC | covered all parameter types | 26 February 2019, 20:13:20 UTC |
9dd5e6a | John Platt | 21 February 2019, 17:07:10 UTC | added discrete parameter to study config | 26 February 2019, 20:13:20 UTC |
26f9106 | John Platt | 21 February 2019, 16:50:14 UTC | tests for algorithm manager | 26 February 2019, 20:13:20 UTC |
3e745ab | John Platt | 20 February 2019, 19:05:47 UTC | minor linting | 26 February 2019, 20:13:20 UTC |
eeb76e5 | John Platt | 20 February 2019, 16:24:12 UTC | added tests for boa | 26 February 2019, 20:13:19 UTC |
dd3d563 | John Platt | 20 February 2019, 15:36:00 UTC | added tests for global_optimizer | 26 February 2019, 20:13:19 UTC |
75e9886 | John Platt | 19 February 2019, 21:26:20 UTC | added tests for acquisition function and models | 26 February 2019, 20:13:19 UTC |
26da3ea | Johnu George | 26 February 2019, 04:32:34 UTC | Metric collector must fail on error (#405) * Fail when unable to collect logs * Set backlimit to 0 for jobs | 26 February 2019, 04:32:34 UTC |
6b75138 | Hougang Liu | 25 February 2019, 17:37:16 UTC | add latest tag for katib images (#409) | 25 February 2019, 17:37:16 UTC |
46d2dc7 | Hougang Liu | 22 February 2019, 02:35:03 UTC | add build and test for suggestion nasrl (#401) | 22 February 2019, 02:35:03 UTC |
d6a67ea | Akado2009 | 21 February 2019, 01:37:09 UTC | Database APIs for NAS updated (#394) * FINAL PUSH * FIX TESTS * new lock * new lock * small fi * DELETE SPACE * deleted unused function | 21 February 2019, 01:37:09 UTC |
3bb8b54 | Jinan Zhou | 21 February 2019, 00:53:59 UTC | Suggestion for Neural Architecture Search with Reinforcement Learning (#339) * Suggestion for Neural Architecture Search with Reinforcement Learning * Add NAS RL Suggestion * Fix new line * set json format for GetSuggestion() * finish trial return in GetSuggestion(), finish GetEvaluationHistory, and fix bugs * fix a bug in GetEvaluationResult() * fix bugs in GetEvaluationResult * fix an error in GetEvaluationResult * Add python Katib api * Remove unnecessary requirements * add about for suggestion * rename to README * Add picture explanations; make the printouts more organized * fix typos * fix some small problems * Fix several problems * Fix a typo * fix some problems * small fixes * Suggestion do not need to handle uncompleted trials * fix a small problem | 21 February 2019, 00:53:59 UTC |
5a1a791 | Hougang Liu | 20 February 2019, 17:40:23 UTC | add validating webhook for studyJob (#383) * add validating webhook for studyJob If create/update a studyJob with bad CR manifest or invalid configuration, k8s api server will reject the request. Fixes: #314 * add test * allow check "kubectl" error code | 20 February 2019, 17:40:23 UTC |
8a89b9e | Johnu George | 20 February 2019, 06:19:50 UTC | Removing Operator specific handling during a StudyJob run (#387) * Removing Operator specific handling during a StudyJob run * Return empty in error | 20 February 2019, 06:19:50 UTC |
edecd39 | Andrey Velichkevich | 20 February 2019, 00:41:30 UTC | Delete modeldb from unit tests (#391) * Delete modeldb from unit tests * Add library to interface test | 20 February 2019, 00:41:30 UTC |
c0f2f07 | Hougang Liu | 19 February 2019, 03:21:42 UTC | show studyjob condition when run kubectl get (#389) | 19 February 2019, 03:21:42 UTC |
ee62c33 | Jinan Zhou | 15 February 2019, 02:23:48 UTC | Training Container with Model Constructor for cifar10 (#345) * Training Container with Model Constructor for cifar10 * fix a small bug * make num_epochs a parameter | 15 February 2019, 02:23:48 UTC |
3706fce | Hougang Liu | 14 February 2019, 18:15:03 UTC | add studyjob python client (#379) | 14 February 2019, 18:15:03 UTC |
1de9307 | Hougang Liu | 14 February 2019, 18:14:53 UTC | fix wrong example (#378) | 14 February 2019, 18:14:53 UTC |
03ca08f | Johnu George | 14 February 2019, 16:36:02 UTC | Upgrading and controller runtime k8s to 1.11.2 (#376) | 14 February 2019, 16:36:02 UTC |
a5c8e02 | IWAMOTO Toshihiro | 14 February 2019, 05:32:03 UTC | Properly initialize CI cluster credential (#360) It has been using the cluster where argo ran | 14 February 2019, 05:32:03 UTC |
41a5a2e | Alexandra Johnson | 13 February 2019, 19:27:47 UTC | Include go dependencies in developer-guide.md (#369) Looks like Google protobufs might also be a dependency? | 13 February 2019, 19:27:47 UTC |
d6ea2d5 | Hougang Liu | 12 February 2019, 02:47:44 UTC | fix invalid memory address (#368) | 12 February 2019, 02:47:44 UTC |
421cbff | Richard Liu | 08 February 2019, 06:07:13 UTC | Fix presubmits (#363) * Fix typo * Fix gcloud builds submit command * Use printf() instead of print() | 08 February 2019, 06:07:13 UTC |
0ea34b1 | Richard Liu | 01 February 2019, 04:28:57 UTC | Katib 2019 Roadmap (#348) * roadmap * Fixing format * Add links to github issues * Fix comments | 01 February 2019, 04:28:57 UTC |
afee0c3 | Richard Liu | 29 January 2019, 05:43:25 UTC | Update OWNERS (#350) | 29 January 2019, 05:43:25 UTC |
f11c13e | Andrey Velichkevich | 29 January 2019, 01:23:06 UTC | Extend Katib API for NAS jobs (#327) * Add fields to studyjob structure * Change nasjob yaml file * Change parameter type * Add Parameter Type=range * Change API * Change input size * Reset API structure * Change StudyJob API structure * Remove Range parameter * Fix api.proto * Fix gopkg.toml * Remove old nasjob file * Fix nasjob.yaml * Add custom suggestion * Add blank NAS suggestion Change Katib API to process yaml file for NAS * Add correct YAML file for NAS example * Fix newline * Change StudyID to 1 * Add jobType parameter in Parsing * Remove changes in manager * Add NasConfig inside Yaml file * Fix name in nasConfig * Fix get StudyConfig in NAS * Add JobType in all services * Add job_type in bayesian_service * Add pointers in NasConfig structure * Fix Pointer in API * Add consts for jobType Remove return from populateCommonConfigFields * Move const jobType to const file * Remove Range parameter * Modify YAML file for NAS jobs * Add getStudyJobType function in GRPC server * Add blank GetStudyJobType func in manager * Fix metrics collector * Remove jobType from getStudy * Remove getStudyJobType from manager * Add NAS RL yaml deployment * Change worker to GPU * Clean nasrl suggestion * Add -u inside training-container * Fix namespace in worker template | 29 January 2019, 01:23:06 UTC |
f4026e4 | Hougang Liu | 25 January 2019, 00:26:22 UTC | ignore tfjob/pytorch job if corresponding CRD not created (#335) * ignore tfjob/pytorch job if corresponding CRD not created * update log message * only ignore NoMatchError when watch CRD * refactor func name for watch error | 25 January 2019, 00:26:22 UTC |
c67892f | Guang Ya Liu | 24 January 2019, 16:50:31 UTC | Clarify the example UI is generated by random-example. (#333) | 24 January 2019, 16:50:31 UTC |
a777721 | Hougang Liu | 24 January 2019, 01:16:26 UTC | only try to delete study info in db when in need (#342) | 24 January 2019, 01:16:26 UTC |
8545970 | Hougang Liu | 22 January 2019, 19:40:17 UTC | omit empty fields for studyjob status (#336) | 22 January 2019, 19:40:17 UTC |
15bbcae | Tim Zaman | 18 January 2019, 19:07:12 UTC | Update pytorch example with latest image (#329) * Update pytorch example with latest image * Update pytorch example docker image | 18 January 2019, 19:07:12 UTC |
a24c428 | Richard Liu | 18 January 2019, 00:00:00 UTC | Fix typo (#330) | 18 January 2019, 00:00:00 UTC |
d41f8e8 | Andrey Velichkevich | 16 January 2019, 15:30:01 UTC | Add information how to run TFjob and Pytorch examples in Katib (#321) * Add doc for tfjob and pytorch examples in Katib * Add contents * Fix README * Fix link to examples in README * Fix README * Add information about Katib UI and status of StudyJob * Add Ambassador information | 16 January 2019, 15:30:01 UTC |
0ed361c | Richard Liu | 15 January 2019, 23:22:00 UTC | Add xgboost example using Bayesian optimization (#320) * Add xgboost example * Add comments for ames example | 15 January 2019, 23:22:00 UTC |
4a69776 | Hougang Liu | 15 January 2019, 02:32:07 UTC | katib should be able to be deployed in any namespace (#324) | 15 January 2019, 02:32:07 UTC |
3c37f31 | Johnu George | 08 January 2019, 10:56:59 UTC | Adding distributed pytorch example for katib (#309) | 08 January 2019, 10:56:59 UTC |
9aa90fa | Johnu George | 08 January 2019, 10:22:02 UTC | minor fixes (#307) | 08 January 2019, 10:22:02 UTC |
f78a108 | Hougang Liu | 07 January 2019, 15:22:39 UTC | delete obsolete data in db (#315) * delete obsolete data in db * add delete study test * make sure trials and workers deleted when study deleted in ut test | 07 January 2019, 15:22:39 UTC |
fae6aa5 | Hougang Liu | 03 January 2019, 03:42:45 UTC | add bestTrialId to statusJob status (#312) * add bestTrialId to statusJob status * generate mock and add bestworkerid | 03 January 2019, 03:42:45 UTC |
f24889c | oshima | 25 December 2018, 17:22:57 UTC | Add api doc (#303) * add api doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add instructions for update api files and docs Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 25 December 2018, 17:22:57 UTC |
1295f45 | Hougang Liu | 23 December 2018, 02:11:11 UTC | validate studyJob when first reconcile it (#308) * validate studyJob when first reconcile it Fixes: #297 * use 3rd-party uuid instead of self-define one k8s.io/apimachinery/pkg/util/uuid is used in kubernetes source code | 23 December 2018, 02:11:11 UTC |
cbe91f8 | Hougang Liu | 22 December 2018, 05:51:27 UTC | add hougangliu as a reviewer (#310) | 22 December 2018, 05:51:27 UTC |
9baabbf | Johnu George | 21 December 2018, 13:57:09 UTC | Adding to OWNERS file (#304) * Adding to OWNERS file * adding to reviewers | 21 December 2018, 13:57:09 UTC |
b11b81d | Hougang Liu | 20 December 2018, 04:53:31 UTC | sync up worker status all the time (#299) Fixes: #298 | 20 December 2018, 04:53:31 UTC |