https://github.com/kubeflow/katib

Revision Message Commit Date
adb9d1d Update main.go 24 January 2019, 01:21:05 UTC
243e3ce Update interface.go 24 January 2019, 01:20:53 UTC
a92a9d6 Update main.go 22 January 2019, 23:28:50 UTC
6431ed1 Update interface.go 22 January 2019, 23:28:26 UTC
de311dd Update interface.go 22 January 2019, 23:21:44 UTC
4151390 Code refactored + several new functions introduced 22 January 2019, 22:27:21 UTC
f0d92b7 small fix 22 January 2019, 21:30:08 UTC
10238cf db test fixed 22 January 2019, 19:31:50 UTC
f90580c HP/NAS Separations, TIMESTAMP added to Trials 22 January 2019, 19:21:07 UTC
3c52221 ALMOST FINJAL FIX 18 January 2019, 22:51:03 UTC
7b14c28 Update main.go 18 January 2019, 02:05:08 UTC
049a90e Update main.go 18 January 2019, 00:23:30 UTC
c010b0e Update interface.go 18 January 2019, 00:18:50 UTC
0850c33 asdasd 18 January 2019, 00:16:42 UTC
6289424 Manager fixed 18 January 2019, 00:01:34 UTC
74f431d INIT AND APIS ADDED 17 January 2019, 12:01:40 UTC
80d6158 small fix 17 January 2019, 03:46:10 UTC
ee8ee09 Merge pull request #1 from kubeflow/master ss 17 January 2019, 02:00:34 UTC
d41f8e8 Add information how to run TFjob and Pytorch examples in Katib (#321) * Add doc for tfjob and pytorch examples in Katib * Add contents * Fix README * Fix link to examples in README * Fix README * Add information about Katib UI and status of StudyJob * Add Ambassador information 16 January 2019, 15:30:01 UTC
0ed361c Add xgboost example using Bayesian optimization (#320) * Add xgboost example * Add comments for ames example 15 January 2019, 23:22:00 UTC
4a69776 katib should be able to be deployed in any namespace (#324) 15 January 2019, 02:32:07 UTC
3c37f31 Adding distributed pytorch example for katib (#309) 08 January 2019, 10:56:59 UTC
9aa90fa minor fixes (#307) 08 January 2019, 10:22:02 UTC
f78a108 delete obsolete data in db (#315) * delete obsolete data in db * add delete study test * make sure trials and workers deleted when study deleted in ut test 07 January 2019, 15:22:39 UTC
fae6aa5 add bestTrialId to statusJob status (#312) * add bestTrialId to statusJob status * generate mock and add bestworkerid 03 January 2019, 03:42:45 UTC
f24889c Add api doc (#303) * add api doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add instructions for update api files and docs Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 25 December 2018, 17:22:57 UTC
1295f45 validate studyJob when first reconcile it (#308) * validate studyJob when first reconcile it Fixes: #297 * use 3rd-party uuid instead of self-define one k8s.io/apimachinery/pkg/util/uuid is used in kubernetes source code 23 December 2018, 02:11:11 UTC
cbe91f8 add hougangliu as a reviewer (#310) 22 December 2018, 05:51:27 UTC
9baabbf Adding to OWNERS file (#304) * Adding to OWNERS file * adding to reviewers 21 December 2018, 13:57:09 UTC
b11b81d sync up worker status all the time (#299) Fixes: #298 20 December 2018, 04:53:31 UTC
bca0b58 studyJob with non-kubeflow namespace cannot work (#302) 19 December 2018, 18:02:49 UTC
8e89813 Adding master pod check for default metric collector (#300) 19 December 2018, 15:03:34 UTC
07e0fd2 reduce some redundant code (#296) 19 December 2018, 01:24:56 UTC
28c5b1c Extend studyjob client API (#288) * Add namespace parameter to studyJob client API * Change if statement for namespace * Create func getNamespace 16 December 2018, 15:43:49 UTC
4be865e fix deploy (#284) 16 December 2018, 15:43:43 UTC
eb4a35b update Readme (#295) A trial can be corresponds to a k8s job, TFJob and PyTorchJob now. Not only k8s job any more. 16 December 2018, 15:34:39 UTC
5a7977d fix studyJob status suggestionCount mismatch error (#290) Fixes: #289 14 December 2018, 15:14:46 UTC
41e8f7d fix invalid worker kind issue (#287) * fix invalid worker kind issue studyJob should go to 'Failed' status when worker kind is invalid * add PyTorchJob as valid worker job kind 14 December 2018, 01:18:22 UTC
33b2e58 get metricscollector by API (#292) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 13 December 2018, 20:00:04 UTC
f16aecc Support Pytorch job in Katib (#283) * Pytorch support in Katib * Adding pytorch worker kind to metrics collector * Updating Gopkg * Adding sleep * Changing the worker name * Adding gcr image 13 December 2018, 16:32:46 UTC
5527e34 Update k8s cluster version to 1.10 (#286) 12 December 2018, 17:01:34 UTC
67eca98 Enrich GUI (#264) * allow to create studyjob from UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * show success alert Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add rbac for ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix bug Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * rebase master Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add metrics collector manager to UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 11 December 2018, 07:22:12 UTC
86cddd3 update README (#281) 11 December 2018, 06:46:34 UTC
1c707dc fix typo error for MinikubeDemo (#282) 11 December 2018, 00:30:40 UTC
f8590e0 fix typo error (#280) 10 December 2018, 06:17:24 UTC
edf6cb5 add e2eTest of each suggestion algorithm (#265) * random&grid * hyperband * add hyperband test * add grid case check 09 December 2018, 13:47:06 UTC
f4913b3 Allow studyjobcontroller to delete pods (#278) 09 December 2018, 12:52:52 UTC
c8efb35 Fix katib ui resource paths (#277) 07 December 2018, 16:42:11 UTC
36d8d25 Implement gRPC Health Checking Protocol + add readiness/liveness probes to vizier-core (#270) * Ensure vizier-core never been stuck too long waiting for DB conn Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add standard Health gRPC service Signed-off-by: Koichiro Den <den@valinux.co.jp> * Change db.New to return error instead of exit(1) with log.Fatal Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add SelectOne() to VizierDBInterface Signed-off-by: Koichiro Den <den@valinux.co.jp> * Rename import for later convenience Signed-off-by: Koichiro Den <den@valinux.co.jp> * Implement and register Health Server for Katib manager Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add readiness/liveness probes to vizier-core Signed-off-by: Koichiro Den <den@valinux.co.jp> * Update test codebase Fixes: 61ac5607353 ("Add SelectOne() to VizierDBInterface") Signed-off-by: Koichiro Den <den@valinux.co.jp> 05 December 2018, 09:12:00 UTC
3516dda POC: Katib integration with tf-operator (#267) * TF operator part 1 * Add consts * Fix * Update worker; fix schemes * Change example * Add rbac rules * Add crd * Add sleep for debugging * Log cluster name * Remove unrelated change * use katibapi.State 05 December 2018, 08:33:36 UTC
55f125c fix make timing (#271) 05 December 2018, 07:02:33 UTC
f863b87 Add Update{Study,Trial} (#269) Only tested with unit tests. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 05 December 2018, 05:13:30 UTC
0e3e890 add Richard Liu to OWNERS (#274) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 04 December 2018, 02:57:06 UTC
211c6ba fix uncompleted value in ui (#238) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 04 December 2018, 01:58:23 UTC
1104524 fix bayesian optimization suggestion (#251) * fix bayse optimization suggestion Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add bayseopt-example Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * reset x_train in burn-in Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * validate parameters Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 04 December 2018, 01:24:06 UTC
72a0fc0 Prevent pod restarts caused by slow db boot (#261) * Add readinessProbe for vizier-db Signed-off-by: Koichiro Den <den@valinux.co.jp> * Fix MYSQL_ROOT_PASSWORD Fixes: 67e94c7697bd ("Set MYSQL_ROOT_PASSWORD via Secret (#253)") Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add simple loop to wait for DB connection successfully opened Signed-off-by: Koichiro Den <den@valinux.co.jp> 30 November 2018, 12:41:56 UTC
3f5462d add UT of each suggestion algorithm (#237) * add random algorithm UT * add grid algorithm UT * add hyperband algorithm UT * fix typo * fix typo * add some tests * change various ParameterType pattern * add gengrid() test * fix significant figure 30 November 2018, 12:06:00 UTC
24160cb Downgrade kubernetes dependency to 1.10.1 (#256) * downgrade to 1.10.1 * Delete pods * Fix job-name * Set successfulJobsHistoryLimit to 0 * Add comments 28 November 2018, 06:52:51 UTC
b7145b3 Fix incorrectly set namespace (#260) Commit b6f8e07d26a ("Update manifests (#246)") has just changed the namespace as a whole. This new manifest should be updated as well. Fixes: 67e94c7697b ("Set MYSQL_ROOT_PASSWORD via Secret (#253)") Signed-off-by: Koichiro Den <den@valinux.co.jp> 26 November 2018, 10:04:51 UTC
67e94c7 Set MYSQL_ROOT_PASSWORD via Secret (#253) * Set randomly generated MYSQL_ROOT_PASSWORD via Secret Signed-off-by: Koichiro Den <den@valinux.co.jp> * Seperate manifest for MYSQL_ROOT_PASSWORD, "test" being set by default Signed-off-by: Koichiro Den <den@valinux.co.jp> * Update run-tests.sh Fixes: 5312459c28f7 ("Set randomly generated MYSQL_ROOT_PASSWORD via Secret") Signed-off-by: Koichiro Den <den@valinux.co.jp> 22 November 2018, 05:59:22 UTC
63dc070 update UI (#255) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 20 November 2018, 23:57:25 UTC
e5e2dcd Refactor studyjobcontroller (#254) * Refactor studyjob controller * Refactor * Go format files * More refactor * Rename studyjobcontroller to studyjob 20 November 2018, 23:18:57 UTC
597064a Change deploy.sh for Minikube example (#252) * Change deploy for Minikube Example * Change namespace to kubeflow in Minikube example * Delete lines about modeldb from deploy 20 November 2018, 08:19:24 UTC
206bcaa Add mysql based unit tests (#243) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 20 November 2018, 01:43:06 UTC
b6f8e07 Update manifests (#246) * change namespace katib -> kubeflow Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * change namespace of tfevent-mc 19 November 2018, 04:58:32 UTC
f7aff4a Add texasmichelle as reviewer (#247) 16 November 2018, 03:05:02 UTC
94b138a Tf event mc (#235) * add tf-event mc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add tfevent mc ci Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add tfeventmc doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add comment and use logger Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 16 November 2018, 01:26:56 UTC
9d59a10 Fix typos for json and objective (#242) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 14 November 2018, 06:14:39 UTC
29e53b8 Add richardsliu to OWNERS/reviewer (#239) * Add richardsliu to OWNERS * Add richardsliu as reviewer 13 November 2018, 02:11:21 UTC
a01f482 add starttime and completiontime to worker (#236) 08 November 2018, 08:55:46 UTC
5e51974 Fix typo (#233) * correct "purse" to "parse" * correct "Doubel" to "Double" * Update push-model.go fix lowercase * Update push-study.go use lowercase 05 November 2018, 20:31:38 UTC
04837a4 More DB unit tests (#234) * Fix EarlyStopParam and SuggestionParam DB methods GetEarlyStopParamList and GetSuggestionParamList mixed up the column order and they returned nothing. Also, SetEarlyStopParam didn't return an ID properly. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Add more DB UTs Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 05 November 2018, 07:47:01 UTC
8e90513 Fix the build script after #208 (#231) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 02 November 2018, 05:44:46 UTC
9f87fd8 Only retry an INSERT operation on unique constraint violation (#229) The retry logic is used to generate an unique ID, but if there is another error the DB code can fall into an infinite loop. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 01 November 2018, 06:00:42 UTC
0bc5182 New UI for Katib (#208) * add ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update test and doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * remove modelDB Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * refactor Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add loading img Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * Add loading image Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * refactor Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add root redirection Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add latestLog flag to GetWorkerFullInfo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 29 October 2018, 04:20:23 UTC
7eeea12 fix slice range (#226) 28 October 2018, 09:46:16 UTC
13373d2 More db tests (#225) * Remove obsolete comments and an import Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Add Worker UTs Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 25 October 2018, 03:22:52 UTC
106235b Fix storelogs (#222) * Fix StoreWorkerLogs The function has been storing into worker_metrics with duplicates and wrong timestamps for some time. The fix changes the worker_lastlogs DB table definition. DBs must be recreated. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Add foreign key constraints to worker log DB tables and tidy up formatting This patch make sure worker_* rows have matching row in the worker table. Also changes multi-line string formatting for readability. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 24 October 2018, 04:00:15 UTC
4dc1aed Check errors in order to avoid SEGV (#219) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 19 October 2018, 07:34:49 UTC
1e14d3c Fix reqest count (#214) * fix manifest examples Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix MetricsCollector instance Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * eval req count after status check Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix cont check when ReqestCount is not set Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 17 October 2018, 06:30:02 UTC
44bd27e enlarge max of check goal grpc (#200) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 17 October 2018, 06:04:00 UTC
eb12212 fix manifest examples (#213) * fix manifest examples Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix MetricsCollector instance Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 17 October 2018, 05:35:43 UTC
7ad8cfc Use camel-case instead of snake-case (#204) * Use camel-case instead of snake-case * Capitalize abbreviations in variables 15 October 2018, 17:02:23 UTC
1598256 Point to the example version of ConfigMap from MinikubeDemo.md (#202) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 15 October 2018, 17:02:13 UTC
1c95029 Fix CRD validation (#191) While CustomResourceDefinition.spec.scope defaults to Namespaced, omitting this generates a validation error. Just supply the default. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> 15 October 2018, 06:32:04 UTC
3dce496 Bayesian Suggestion Algorithm Fixes (#188) * update requirements.txt for bayesian * add bayesian suggestion algorithm to deploy script * separate out python proto compiler command * update PYTHONPATH * update autogenerated python protobuf and grpc code * Update run-tests.sh 15 October 2018, 03:42:34 UTC
cbe5fee Fix deadlock condition in ReconcileStudyJobController#Reconcile (#201) 11 October 2018, 04:19:16 UTC
6609587 support request count (#193) 10 October 2018, 05:10:25 UTC
6195461 add building metrics-collector to CI (#190) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 10 October 2018, 03:34:21 UTC
0bfa23b Fix CI (#194) * add -o xtrace Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * use client-cert instead password Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * delete get-credentials Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * delete unnecessary line Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 10 October 2018, 01:12:58 UTC
b502fd5 Add Katib logo (#189) * add logo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix logo size Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix logo size 04 October 2018, 08:12:54 UTC
d404ee5 fix random-example (#181) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 01 October 2018, 02:47:02 UTC
8332d6e fix-MinikubeDemodox (#171) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 01 October 2018, 02:46:57 UTC
c9028e1 Add Retain flag (#176) * update vendors Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add retain flags to study job controller Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix vendor Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix unchange status Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add handling for failed status Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 01 October 2018, 02:41:19 UTC
9133042 Add pv example to katibDB (#178) * add pv to katibDB Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add pv to MinikubeDemo/deploy.sh Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 01 October 2018, 00:29:24 UTC
74b833a Update `checkStatus` return value orders (#185) Make `error` to the last postion. 29 September 2018, 05:38:15 UTC
b46378c Fix jsonnet so we override registry in image builds. (#177) * Fix jsonnet so we override registry in image builds. * The overrides for parameters isn't being passed through to subcommands like the image build template; as a result we don't actually override the parameters in the image step templates. * Make overrides in parts a required parameter so we don't accidentally forget to exclude it in the future. * Related to #79 releaser for Katib. * Fix prow_config.yaml 21 September 2018, 17:35:54 UTC
31d2e10 Postsubmit run should auto-push images to kubeflow-images-public (#174) * Related to #141 katib releaser * Related to kubeflow/kubeflow#1574 use prow to build our images * We are moving to using prow to run our release workflows and treating them just like regular workflows. * We are doing this because we need to get regular signal about whether the image builds are succeeding by running on postsubmit. * We also want to run them on presubmit so that we can verify any changes to the workflwo don't break the workflow. * Rather than define a new workflow to build the images; we can just reuse the existing E2E workflow which already builds all the images. We just change postsubmit to push to kubeflow-images-public. * Delete the releaser app; we will just the existing E2E test workflow and have that push to gcr.io/kubeflow-images-public on postsubmit. 21 September 2018, 09:17:41 UTC
81f2b74 Add REST API using grpc gateway (#142) * dep ensure * add grpc-gateway via dep * update protobuf via dep ensure * update compiled go code, add reverse proxy * add REST entrypoint for manager * update API build script * use build script to generate code * remove binary file * update build, deploy scripts for REST API * change name * add manifests for core-rest * remove deploy * add comments * remove vendor * use Gopkg files from master * update Gopkg files * update Gopkg files * update proto files and protobufs * update build scripts and tests * copy vendor for tests * uncomment deploy * update image name * ignore vizier-core-rest for port forwarding * update build script * update manifests * Add docs for REST API * core review changes * remove service account 21 September 2018, 06:27:53 UTC
f4887a6 add mutex to studyjob controller (#170) * add mutex to studyjob controller Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * use sync.Map Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update only when the instance was changed Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 17 September 2018, 09:20:56 UTC