ac2dd76 | andreyvelich | 26 January 2019, 02:44:13 UTC | Add -u inside training-container | 26 January 2019, 02:44:13 UTC |
55c924e | andreyvelich | 26 January 2019, 01:12:19 UTC | Clean nasrl suggestion | 26 January 2019, 01:12:19 UTC |
b0fb3cd | andreyvelich | 26 January 2019, 01:09:11 UTC | Change worker to GPU | 26 January 2019, 01:09:11 UTC |
4df7151 | andreyvelich | 25 January 2019, 22:33:38 UTC | Add NAS RL yaml deployment | 25 January 2019, 22:33:38 UTC |
c362703 | andreyvelich | 25 January 2019, 20:00:25 UTC | Remove getStudyJobType from manager | 25 January 2019, 20:00:25 UTC |
8766eda | andreyvelich | 25 January 2019, 19:59:04 UTC | Remove jobType from getStudy | 25 January 2019, 19:59:04 UTC |
2324741 | andreyvelich | 24 January 2019, 19:28:56 UTC | Fix metrics collector | 24 January 2019, 19:28:56 UTC |
8f0d206 | andreyvelich | 24 January 2019, 02:24:53 UTC | Merge remote-tracking branch 'upstream/master' into 293-extend-sj-structure | 24 January 2019, 02:24:53 UTC |
a777721 | Hougang Liu | 24 January 2019, 01:16:26 UTC | only try to delete study info in db when in need (#342) | 24 January 2019, 01:16:26 UTC |
9db4a81 | andreyvelich | 24 January 2019, 01:10:26 UTC | Add blank GetStudyJobType func in manager | 24 January 2019, 01:10:26 UTC |
437b614 | andreyvelich | 24 January 2019, 00:31:24 UTC | Add getStudyJobType function in GRPC server | 24 January 2019, 00:31:24 UTC |
7ca5f0c | andreyvelich | 23 January 2019, 01:48:57 UTC | Modify YAML file for NAS jobs | 23 January 2019, 01:48:57 UTC |
4fa52ae | andreyvelich | 22 January 2019, 22:03:55 UTC | Remove Range parameter | 22 January 2019, 22:03:55 UTC |
8545970 | Hougang Liu | 22 January 2019, 19:40:17 UTC | omit empty fields for studyjob status (#336) | 22 January 2019, 19:40:17 UTC |
794e7cc | andreyvelich | 19 January 2019, 01:33:27 UTC | Move const jobType to const file | 19 January 2019, 01:33:27 UTC |
2802c6f | andreyvelich | 18 January 2019, 20:08:52 UTC | Add consts for jobType Remove return from populateCommonConfigFields | 18 January 2019, 20:08:52 UTC |
15bbcae | Tim Zaman | 18 January 2019, 19:07:12 UTC | Update pytorch example with latest image (#329) * Update pytorch example with latest image * Update pytorch example docker image | 18 January 2019, 19:07:12 UTC |
d653643 | andreyvelich | 18 January 2019, 05:58:40 UTC | Fix Pointer in API | 18 January 2019, 05:58:40 UTC |
f1dac5c | andreyvelich | 18 January 2019, 05:54:23 UTC | Add pointers in NasConfig structure | 18 January 2019, 05:54:23 UTC |
a01710c | andreyvelich | 18 January 2019, 05:48:09 UTC | Add job_type in bayesian_service | 18 January 2019, 05:48:09 UTC |
f5a5d83 | andreyvelich | 18 January 2019, 05:45:40 UTC | Add JobType in all services | 18 January 2019, 05:45:40 UTC |
04cdfbf | andreyvelich | 18 January 2019, 05:35:27 UTC | Fix get StudyConfig in NAS | 18 January 2019, 05:35:27 UTC |
3318467 | andreyvelich | 18 January 2019, 02:51:10 UTC | Merge remote-tracking branch 'upstream/master' into 293-extend-sj-structure | 18 January 2019, 02:51:10 UTC |
a18466e | andreyvelich | 18 January 2019, 02:34:24 UTC | Fix name in nasConfig | 18 January 2019, 02:34:24 UTC |
9546fef | andreyvelich | 18 January 2019, 02:25:12 UTC | Add NasConfig inside Yaml file | 18 January 2019, 02:25:12 UTC |
a24c428 | Richard Liu | 18 January 2019, 00:00:00 UTC | Fix typo (#330) | 18 January 2019, 00:00:00 UTC |
7ed6b0a | andreyvelich | 17 January 2019, 23:57:17 UTC | Remove changes in manager | 17 January 2019, 23:57:17 UTC |
2f8dc49 | andreyvelich | 17 January 2019, 05:29:28 UTC | Add jobType parameter in Parsing | 17 January 2019, 05:29:28 UTC |
7c15d64 | andreyvelich | 17 January 2019, 01:49:59 UTC | Change StudyID to 1 | 17 January 2019, 01:49:59 UTC |
f5498e4 | andreyvelich | 17 January 2019, 00:37:41 UTC | Fix newline | 17 January 2019, 00:37:41 UTC |
e8c9636 | andreyvelich | 17 January 2019, 00:29:43 UTC | Add correct YAML file for NAS example | 17 January 2019, 00:29:43 UTC |
1ff6840 | andreyvelich | 17 January 2019, 00:24:19 UTC | Add blank NAS suggestion Change Katib API to process yaml file for NAS | 17 January 2019, 00:24:19 UTC |
d41f8e8 | Andrey Velichkevich | 16 January 2019, 15:30:01 UTC | Add information how to run TFjob and Pytorch examples in Katib (#321) * Add doc for tfjob and pytorch examples in Katib * Add contents * Fix README * Fix link to examples in README * Fix README * Add information about Katib UI and status of StudyJob * Add Ambassador information | 16 January 2019, 15:30:01 UTC |
0ed361c | Richard Liu | 15 January 2019, 23:22:00 UTC | Add xgboost example using Bayesian optimization (#320) * Add xgboost example * Add comments for ames example | 15 January 2019, 23:22:00 UTC |
ac63ac0 | andreyvelich | 15 January 2019, 22:28:22 UTC | Add custom suggestion | 15 January 2019, 22:28:22 UTC |
972cf30 | andreyvelich | 15 January 2019, 02:36:01 UTC | Fix nasjob.yaml | 15 January 2019, 02:36:01 UTC |
945d3fd | andreyvelich | 15 January 2019, 02:34:12 UTC | Remove old nasjob file | 15 January 2019, 02:34:12 UTC |
de9a6d7 | andreyvelich | 15 January 2019, 02:33:25 UTC | Fix gopkg.toml | 15 January 2019, 02:33:25 UTC |
4a69776 | Hougang Liu | 15 January 2019, 02:32:07 UTC | katib should be able to be deployed in any namespace (#324) | 15 January 2019, 02:32:07 UTC |
6edb518 | andreyvelich | 15 January 2019, 02:31:55 UTC | Fix api.proto | 15 January 2019, 02:31:55 UTC |
fdb51d1 | andreyvelich | 15 January 2019, 02:26:51 UTC | Remove Range parameter | 15 January 2019, 02:26:51 UTC |
f673386 | andreyvelich | 14 January 2019, 23:28:18 UTC | Change StudyJob API structure | 14 January 2019, 23:28:18 UTC |
4342a8b | andreyvelich | 14 January 2019, 23:13:03 UTC | Reset API structure | 14 January 2019, 23:13:03 UTC |
3c37f31 | Johnu George | 08 January 2019, 10:56:59 UTC | Adding distributed pytorch example for katib (#309) | 08 January 2019, 10:56:59 UTC |
9aa90fa | Johnu George | 08 January 2019, 10:22:02 UTC | minor fixes (#307) | 08 January 2019, 10:22:02 UTC |
abd564e | andreyvelich | 07 January 2019, 19:25:09 UTC | Change input size | 07 January 2019, 19:25:09 UTC |
3e5a371 | andreyvelich | 07 January 2019, 18:57:18 UTC | Change api.proto | 07 January 2019, 18:57:18 UTC |
743265b | andreyvelich | 07 January 2019, 18:56:16 UTC | Change API | 07 January 2019, 18:56:16 UTC |
f78a108 | Hougang Liu | 07 January 2019, 15:22:39 UTC | delete obsolete data in db (#315) * delete obsolete data in db * add delete study test * make sure trials and workers deleted when study deleted in ut test | 07 January 2019, 15:22:39 UTC |
fae6aa5 | Hougang Liu | 03 January 2019, 03:42:45 UTC | add bestTrialId to statusJob status (#312) * add bestTrialId to statusJob status * generate mock and add bestworkerid | 03 January 2019, 03:42:45 UTC |
f24889c | oshima | 25 December 2018, 17:22:57 UTC | Add api doc (#303) * add api doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add instructions for update api files and docs Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 25 December 2018, 17:22:57 UTC |
1295f45 | Hougang Liu | 23 December 2018, 02:11:11 UTC | validate studyJob when first reconcile it (#308) * validate studyJob when first reconcile it Fixes: #297 * use 3rd-party uuid instead of self-define one k8s.io/apimachinery/pkg/util/uuid is used in kubernetes source code | 23 December 2018, 02:11:11 UTC |
cbe91f8 | Hougang Liu | 22 December 2018, 05:51:27 UTC | add hougangliu as a reviewer (#310) | 22 December 2018, 05:51:27 UTC |
9baabbf | Johnu George | 21 December 2018, 13:57:09 UTC | Adding to OWNERS file (#304) * Adding to OWNERS file * adding to reviewers | 21 December 2018, 13:57:09 UTC |
f6eb5ce | andreyvelich | 20 December 2018, 22:15:01 UTC | Add Parameter Type=range | 20 December 2018, 22:15:01 UTC |
b11b81d | Hougang Liu | 20 December 2018, 04:53:31 UTC | sync up worker status all the time (#299) Fixes: #298 | 20 December 2018, 04:53:31 UTC |
bca0b58 | Hougang Liu | 19 December 2018, 18:02:49 UTC | studyJob with non-kubeflow namespace cannot work (#302) | 19 December 2018, 18:02:49 UTC |
8e89813 | Johnu George | 19 December 2018, 15:03:34 UTC | Adding master pod check for default metric collector (#300) | 19 December 2018, 15:03:34 UTC |
07e0fd2 | Hougang Liu | 19 December 2018, 01:24:56 UTC | reduce some redundant code (#296) | 19 December 2018, 01:24:56 UTC |
9b8764d | andreyvelich | 18 December 2018, 02:17:41 UTC | Merge remote-tracking branch 'upstream/master' into 293-extend-sj-structure | 18 December 2018, 02:17:41 UTC |
7fe9b7d | andreyvelich | 18 December 2018, 02:15:35 UTC | Change parameter type | 18 December 2018, 02:15:35 UTC |
28c5b1c | Andrey Velichkevich | 16 December 2018, 15:43:49 UTC | Extend studyjob client API (#288) * Add namespace parameter to studyJob client API * Change if statement for namespace * Create func getNamespace | 16 December 2018, 15:43:49 UTC |
4be865e | ytetra | 16 December 2018, 15:43:43 UTC | fix deploy (#284) | 16 December 2018, 15:43:43 UTC |
eb4a35b | Hougang Liu | 16 December 2018, 15:34:39 UTC | update Readme (#295) A trial can be corresponds to a k8s job, TFJob and PyTorchJob now. Not only k8s job any more. | 16 December 2018, 15:34:39 UTC |
5a7977d | Hougang Liu | 14 December 2018, 15:14:46 UTC | fix studyJob status suggestionCount mismatch error (#290) Fixes: #289 | 14 December 2018, 15:14:46 UTC |
41e8f7d | Hougang Liu | 14 December 2018, 01:18:22 UTC | fix invalid worker kind issue (#287) * fix invalid worker kind issue studyJob should go to 'Failed' status when worker kind is invalid * add PyTorchJob as valid worker job kind | 14 December 2018, 01:18:22 UTC |
67ce5ee | andreyvelich | 14 December 2018, 00:41:19 UTC | Change nasjob yaml file | 14 December 2018, 00:41:19 UTC |
89e56a3 | andreyvelich | 14 December 2018, 00:34:58 UTC | Add fields to studyjob structure | 14 December 2018, 00:34:58 UTC |
33b2e58 | oshima | 13 December 2018, 20:00:04 UTC | get metricscollector by API (#292) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 13 December 2018, 20:00:04 UTC |
f16aecc | Johnu George | 13 December 2018, 16:32:46 UTC | Support Pytorch job in Katib (#283) * Pytorch support in Katib * Adding pytorch worker kind to metrics collector * Updating Gopkg * Adding sleep * Changing the worker name * Adding gcr image | 13 December 2018, 16:32:46 UTC |
5527e34 | Johnu George | 12 December 2018, 17:01:34 UTC | Update k8s cluster version to 1.10 (#286) | 12 December 2018, 17:01:34 UTC |
67eca98 | oshima | 11 December 2018, 07:22:12 UTC | Enrich GUI (#264) * allow to create studyjob from UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * show success alert Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add rbac for ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix bug Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * rebase master Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add metrics collector manager to UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 11 December 2018, 07:22:12 UTC |
86cddd3 | Hougang Liu | 11 December 2018, 06:46:34 UTC | update README (#281) | 11 December 2018, 06:46:34 UTC |
1c707dc | Hougang Liu | 11 December 2018, 00:30:40 UTC | fix typo error for MinikubeDemo (#282) | 11 December 2018, 00:30:40 UTC |
f8590e0 | Hougang Liu | 10 December 2018, 06:17:24 UTC | fix typo error (#280) | 10 December 2018, 06:17:24 UTC |
edf6cb5 | ytetra | 09 December 2018, 13:47:06 UTC | add e2eTest of each suggestion algorithm (#265) * random&grid * hyperband * add hyperband test * add grid case check | 09 December 2018, 13:47:06 UTC |
f4913b3 | Richard Liu | 09 December 2018, 12:52:52 UTC | Allow studyjobcontroller to delete pods (#278) | 09 December 2018, 12:52:52 UTC |
c8efb35 | Richard Liu | 07 December 2018, 16:42:11 UTC | Fix katib ui resource paths (#277) | 07 December 2018, 16:42:11 UTC |
36d8d25 | Koichiro Den | 05 December 2018, 09:12:00 UTC | Implement gRPC Health Checking Protocol + add readiness/liveness probes to vizier-core (#270) * Ensure vizier-core never been stuck too long waiting for DB conn Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add standard Health gRPC service Signed-off-by: Koichiro Den <den@valinux.co.jp> * Change db.New to return error instead of exit(1) with log.Fatal Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add SelectOne() to VizierDBInterface Signed-off-by: Koichiro Den <den@valinux.co.jp> * Rename import for later convenience Signed-off-by: Koichiro Den <den@valinux.co.jp> * Implement and register Health Server for Katib manager Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add readiness/liveness probes to vizier-core Signed-off-by: Koichiro Den <den@valinux.co.jp> * Update test codebase Fixes: 61ac5607353 ("Add SelectOne() to VizierDBInterface") Signed-off-by: Koichiro Den <den@valinux.co.jp> | 05 December 2018, 09:12:00 UTC |
3516dda | Richard Liu | 05 December 2018, 08:33:36 UTC | POC: Katib integration with tf-operator (#267) * TF operator part 1 * Add consts * Fix * Update worker; fix schemes * Change example * Add rbac rules * Add crd * Add sleep for debugging * Log cluster name * Remove unrelated change * use katibapi.State | 05 December 2018, 08:33:36 UTC |
55f125c | ytetra | 05 December 2018, 07:02:33 UTC | fix make timing (#271) | 05 December 2018, 07:02:33 UTC |
f863b87 | IWAMOTO Toshihiro | 05 December 2018, 05:13:30 UTC | Add Update{Study,Trial} (#269) Only tested with unit tests. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 05 December 2018, 05:13:30 UTC |
0e3e890 | oshima | 04 December 2018, 02:57:06 UTC | add Richard Liu to OWNERS (#274) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 04 December 2018, 02:57:06 UTC |
211c6ba | oshima | 04 December 2018, 01:58:23 UTC | fix uncompleted value in ui (#238) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 04 December 2018, 01:58:23 UTC |
1104524 | oshima | 04 December 2018, 01:24:06 UTC | fix bayesian optimization suggestion (#251) * fix bayse optimization suggestion Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add bayseopt-example Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * reset x_train in burn-in Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * validate parameters Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 04 December 2018, 01:24:06 UTC |
72a0fc0 | Koichiro Den | 30 November 2018, 12:41:56 UTC | Prevent pod restarts caused by slow db boot (#261) * Add readinessProbe for vizier-db Signed-off-by: Koichiro Den <den@valinux.co.jp> * Fix MYSQL_ROOT_PASSWORD Fixes: 67e94c7697bd ("Set MYSQL_ROOT_PASSWORD via Secret (#253)") Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add simple loop to wait for DB connection successfully opened Signed-off-by: Koichiro Den <den@valinux.co.jp> | 30 November 2018, 12:41:56 UTC |
3f5462d | ytetra | 30 November 2018, 12:06:00 UTC | add UT of each suggestion algorithm (#237) * add random algorithm UT * add grid algorithm UT * add hyperband algorithm UT * fix typo * fix typo * add some tests * change various ParameterType pattern * add gengrid() test * fix significant figure | 30 November 2018, 12:06:00 UTC |
24160cb | Richard Liu | 28 November 2018, 06:52:51 UTC | Downgrade kubernetes dependency to 1.10.1 (#256) * downgrade to 1.10.1 * Delete pods * Fix job-name * Set successfulJobsHistoryLimit to 0 * Add comments | 28 November 2018, 06:52:51 UTC |
b7145b3 | Koichiro Den | 26 November 2018, 10:04:51 UTC | Fix incorrectly set namespace (#260) Commit b6f8e07d26a ("Update manifests (#246)") has just changed the namespace as a whole. This new manifest should be updated as well. Fixes: 67e94c7697b ("Set MYSQL_ROOT_PASSWORD via Secret (#253)") Signed-off-by: Koichiro Den <den@valinux.co.jp> | 26 November 2018, 10:04:51 UTC |
67e94c7 | Koichiro Den | 22 November 2018, 05:59:22 UTC | Set MYSQL_ROOT_PASSWORD via Secret (#253) * Set randomly generated MYSQL_ROOT_PASSWORD via Secret Signed-off-by: Koichiro Den <den@valinux.co.jp> * Seperate manifest for MYSQL_ROOT_PASSWORD, "test" being set by default Signed-off-by: Koichiro Den <den@valinux.co.jp> * Update run-tests.sh Fixes: 5312459c28f7 ("Set randomly generated MYSQL_ROOT_PASSWORD via Secret") Signed-off-by: Koichiro Den <den@valinux.co.jp> | 22 November 2018, 05:59:22 UTC |
63dc070 | oshima | 20 November 2018, 23:57:25 UTC | update UI (#255) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 20 November 2018, 23:57:25 UTC |
e5e2dcd | Richard Liu | 20 November 2018, 23:18:57 UTC | Refactor studyjobcontroller (#254) * Refactor studyjob controller * Refactor * Go format files * More refactor * Rename studyjobcontroller to studyjob | 20 November 2018, 23:18:57 UTC |
597064a | Andrey | 20 November 2018, 08:19:24 UTC | Change deploy.sh for Minikube example (#252) * Change deploy for Minikube Example * Change namespace to kubeflow in Minikube example * Delete lines about modeldb from deploy | 20 November 2018, 08:19:24 UTC |
206bcaa | IWAMOTO Toshihiro | 20 November 2018, 01:43:06 UTC | Add mysql based unit tests (#243) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 20 November 2018, 01:43:06 UTC |
b6f8e07 | oshima | 19 November 2018, 04:58:32 UTC | Update manifests (#246) * change namespace katib -> kubeflow Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * change namespace of tfevent-mc | 19 November 2018, 04:58:32 UTC |
f7aff4a | Michelle Casbon | 16 November 2018, 03:05:02 UTC | Add texasmichelle as reviewer (#247) | 16 November 2018, 03:05:02 UTC |
94b138a | oshima | 16 November 2018, 01:26:56 UTC | Tf event mc (#235) * add tf-event mc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add tfevent mc ci Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add tfeventmc doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add comment and use logger Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 16 November 2018, 01:26:56 UTC |
9d59a10 | IWAMOTO Toshihiro | 14 November 2018, 06:14:39 UTC | Fix typos for json and objective (#242) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 14 November 2018, 06:14:39 UTC |
29e53b8 | Richard Liu | 13 November 2018, 02:11:21 UTC | Add richardsliu to OWNERS/reviewer (#239) * Add richardsliu to OWNERS * Add richardsliu as reviewer | 13 November 2018, 02:11:21 UTC |
a01f482 | wukong1992 | 08 November 2018, 08:55:46 UTC | add starttime and completiontime to worker (#236) | 08 November 2018, 08:55:46 UTC |