529fe1a | Julian Qian | 15 March 2019, 20:27:06 UTC | The link should say README.md as well. | 15 March 2019, 20:27:06 UTC |
2bd52c5 | Julian Qian | 15 March 2019, 20:05:43 UTC | fix demo link change to correct link README.md | 15 March 2019, 20:05:43 UTC |
06f955b | Jinan Zhou | 14 March 2019, 01:58:22 UTC | Add fault tolerance support for trial failure (#424) * add fault tolerance for trial failure * fix a small typo * fix a typo * improve fault processing strategy * add an important TODO * fix typo * add some more TODOs | 14 March 2019, 01:58:22 UTC |
c87d583 | jdplatt | 11 March 2019, 19:54:38 UTC | Test for Bayesian Optimization Algo (#406) * added tests for acquisition function and models * added tests for global_optimizer * added tests for boa * minor linting * tests for algorithm manager * added discrete parameter to study config * covered all parameter types * moved python script to testing folder * added python tests to unit tests * remembered to uncomment existing tests * fixed path to test script * moved python tests to separate job in workflow * added run command to test script | 11 March 2019, 19:54:38 UTC |
61451ef | Richard Liu | 08 March 2019, 02:23:33 UTC | Katib v1alpha2 API for CRDs (#381) * v1alpha2 API proposal * Fix comments round 1 * Refactor into Experiment and Trial * Incorporate feedback from meeting * Rename * Minor edits | 08 March 2019, 02:23:33 UTC |
86bd27a | Andrey Velichkevich | 06 March 2019, 05:27:59 UTC | Add NAS team as reviewers (#419) * Add NAS team in reviewers * Update reviewers | 06 March 2019, 05:27:59 UTC |
feee2f9 | Jinan Zhou | 06 March 2019, 01:48:01 UTC | Multiple Trials for Reinforcement Learning Suggestion (#416) * support multiple trials * adjust To Do * language improvement in README.md * fix several problems * fix a potential problem * handle the GetEvaluationResult() return None problem | 06 March 2019, 01:48:01 UTC |
3a705a1 | Jinan Zhou | 06 March 2019, 00:40:03 UTC | Fix the package version in training container (#418) * fix the version of tf and keras * fix a typo | 06 March 2019, 00:40:03 UTC |
8f89ad4 | Andrey Velichkevich | 05 March 2019, 23:47:59 UTC | Add validation for NAS job in Katib controller (#398) * Initial commit * Add validation for NAS config * Fix validation * Add algorithmType in NasConfig validation * Add Discrete ParameterType to validation * Move validation to webhook Change GetJobType function Make a list with NAS algorithms * Add ValidateSuggestionParameters function in Katib API * Fix api * Add ValidateSuggestionParameters to Suggestion service * Change isValid to int32 * Create Validation function in NAS RL Suggestion service * Fix small problems * Reduce code inside Validation function * Add empty ValidateSuggestionParameters function in each HP service written in Go * Fix logging * Add ValidateSuggestionParameters to mock * Handle Unavailable error | 05 March 2019, 23:47:59 UTC |
db6b83b | Andrey Velichkevich | 01 March 2019, 02:54:21 UTC | Fix path to api protobuf (#415) | 01 March 2019, 02:54:21 UTC |
4d8c599 | Jinan Zhou | 27 February 2019, 03:05:45 UTC | Add support for parallel studyjobs (#404) * Add support for parallel studyjobs * fix a typo * Reorganize the program a little bit * fix a typo * fix a typo | 27 February 2019, 03:05:45 UTC |
87a31f3 | Jinan Zhou | 27 February 2019, 01:33:48 UTC | Add separable/depthwise convolution, data augmentation and multiple GPU support (#393) * add separable/depthwise convolution in operation library * add ENAS example StudyJob yaml * remove ENAS example, add data augmentation, add multiple GPU support | 27 February 2019, 01:33:48 UTC |
4d031e7 | Andrey Velichkevich | 27 February 2019, 00:32:45 UTC | Add create time to Trial API (#410) * Add create time to Trial API * Add Trial create time information * Fix UT for db | 27 February 2019, 00:32:45 UTC |
26da3ea | Johnu George | 26 February 2019, 04:32:34 UTC | Metric collector must fail on error (#405) * Fail when unable to collect logs * Set backlimit to 0 for jobs | 26 February 2019, 04:32:34 UTC |
6b75138 | Hougang Liu | 25 February 2019, 17:37:16 UTC | add latest tag for katib images (#409) | 25 February 2019, 17:37:16 UTC |
46d2dc7 | Hougang Liu | 22 February 2019, 02:35:03 UTC | add build and test for suggestion nasrl (#401) | 22 February 2019, 02:35:03 UTC |
d6a67ea | Akado2009 | 21 February 2019, 01:37:09 UTC | Database APIs for NAS updated (#394) * FINAL PUSH * FIX TESTS * new lock * new lock * small fi * DELETE SPACE * deleted unused function | 21 February 2019, 01:37:09 UTC |
3bb8b54 | Jinan Zhou | 21 February 2019, 00:53:59 UTC | Suggestion for Neural Architecture Search with Reinforcement Learning (#339) * Suggestion for Neural Architecture Search with Reinforcement Learning * Add NAS RL Suggestion * Fix new line * set json format for GetSuggestion() * finish trial return in GetSuggestion(), finish GetEvaluationHistory, and fix bugs * fix a bug in GetEvaluationResult() * fix bugs in GetEvaluationResult * fix an error in GetEvaluationResult * Add python Katib api * Remove unnecessary requirements * add about for suggestion * rename to README * Add picture explanations; make the printouts more organized * fix typos * fix some small problems * Fix several problems * Fix a typo * fix some problems * small fixes * Suggestion does not need to handle uncompleted trials * fix a small problem | 21 February 2019, 00:53:59 UTC |
5a1a791 | Hougang Liu | 20 February 2019, 17:40:23 UTC | add validating webhook for studyJob (#383) * add validating webhook for studyJob If a studyJob is created or updated with a bad CR manifest or invalid configuration, the k8s API server will reject the request. Fixes: #314 * add test * allow check "kubectl" error code | 20 February 2019, 17:40:23 UTC |
8a89b9e | Johnu George | 20 February 2019, 06:19:50 UTC | Removing Operator specific handling during a StudyJob run (#387) * Removing Operator specific handling during a StudyJob run * Return empty in error | 20 February 2019, 06:19:50 UTC |
edecd39 | Andrey Velichkevich | 20 February 2019, 00:41:30 UTC | Delete modeldb from unit tests (#391) * Delete modeldb from unit tests * Add library to interface test | 20 February 2019, 00:41:30 UTC |
c0f2f07 | Hougang Liu | 19 February 2019, 03:21:42 UTC | show studyjob condition when running kubectl get (#389) | 19 February 2019, 03:21:42 UTC |
ee62c33 | Jinan Zhou | 15 February 2019, 02:23:48 UTC | Training Container with Model Constructor for cifar10 (#345) * Training Container with Model Constructor for cifar10 * fix a small bug * make num_epochs a parameter | 15 February 2019, 02:23:48 UTC |
3706fce | Hougang Liu | 14 February 2019, 18:15:03 UTC | add studyjob python client (#379) | 14 February 2019, 18:15:03 UTC |
1de9307 | Hougang Liu | 14 February 2019, 18:14:53 UTC | fix wrong example (#378) | 14 February 2019, 18:14:53 UTC |
03ca08f | Johnu George | 14 February 2019, 16:36:02 UTC | Upgrading and controller runtime k8s to 1.11.2 (#376) | 14 February 2019, 16:36:02 UTC |
a5c8e02 | IWAMOTO Toshihiro | 14 February 2019, 05:32:03 UTC | Properly initialize CI cluster credential (#360) It has been using the cluster where argo ran | 14 February 2019, 05:32:03 UTC |
41a5a2e | Alexandra Johnson | 13 February 2019, 19:27:47 UTC | Include go dependencies in developer-guide.md (#369) Looks like Google protobufs might also be a dependency? | 13 February 2019, 19:27:47 UTC |
d6ea2d5 | Hougang Liu | 12 February 2019, 02:47:44 UTC | fix invalid memory address (#368) | 12 February 2019, 02:47:44 UTC |
421cbff | Richard Liu | 08 February 2019, 06:07:13 UTC | Fix presubmits (#363) * Fix typo * Fix gcloud builds submit command * Use printf() instead of print() | 08 February 2019, 06:07:13 UTC |
0ea34b1 | Richard Liu | 01 February 2019, 04:28:57 UTC | Katib 2019 Roadmap (#348) * roadmap * Fixing format * Add links to github issues * Fix comments | 01 February 2019, 04:28:57 UTC |
afee0c3 | Richard Liu | 29 January 2019, 05:43:25 UTC | Update OWNERS (#350) | 29 January 2019, 05:43:25 UTC |
f11c13e | Andrey Velichkevich | 29 January 2019, 01:23:06 UTC | Extend Katib API for NAS jobs (#327) * Add fields to studyjob structure * Change nasjob yaml file * Change parameter type * Add Parameter Type=range * Change API * Change input size * Reset API structure * Change StudyJob API structure * Remove Range parameter * Fix api.proto * Fix gopkg.toml * Remove old nasjob file * Fix nasjob.yaml * Add custom suggestion * Add blank NAS suggestion Change Katib API to process yaml file for NAS * Add correct YAML file for NAS example * Fix newline * Change StudyID to 1 * Add jobType parameter in Parsing * Remove changes in manager * Add NasConfig inside Yaml file * Fix name in nasConfig * Fix get StudyConfig in NAS * Add JobType in all services * Add job_type in bayesian_service * Add pointers in NasConfig structure * Fix Pointer in API * Add consts for jobType Remove return from populateCommonConfigFields * Move const jobType to const file * Remove Range parameter * Modify YAML file for NAS jobs * Add getStudyJobType function in GRPC server * Add blank GetStudyJobType func in manager * Fix metrics collector * Remove jobType from getStudy * Remove getStudyJobType from manager * Add NAS RL yaml deployment * Change worker to GPU * Clean nasrl suggestion * Add -u inside training-container * Fix namespace in worker template | 29 January 2019, 01:23:06 UTC |
f4026e4 | Hougang Liu | 25 January 2019, 00:26:22 UTC | ignore tfjob/pytorch job if corresponding CRD not created (#335) * ignore tfjob/pytorch job if corresponding CRD not created * update log message * only ignore NoMatchError when watch CRD * refactor func name for watch error | 25 January 2019, 00:26:22 UTC |
c67892f | Guang Ya Liu | 24 January 2019, 16:50:31 UTC | Clarify the example UI is generated by random-example. (#333) | 24 January 2019, 16:50:31 UTC |
a777721 | Hougang Liu | 24 January 2019, 01:16:26 UTC | only try to delete study info in db when needed (#342) | 24 January 2019, 01:16:26 UTC |
8545970 | Hougang Liu | 22 January 2019, 19:40:17 UTC | omit empty fields for studyjob status (#336) | 22 January 2019, 19:40:17 UTC |
15bbcae | Tim Zaman | 18 January 2019, 19:07:12 UTC | Update pytorch example with latest image (#329) * Update pytorch example with latest image * Update pytorch example docker image | 18 January 2019, 19:07:12 UTC |
a24c428 | Richard Liu | 18 January 2019, 00:00:00 UTC | Fix typo (#330) | 18 January 2019, 00:00:00 UTC |
d41f8e8 | Andrey Velichkevich | 16 January 2019, 15:30:01 UTC | Add information how to run TFjob and Pytorch examples in Katib (#321) * Add doc for tfjob and pytorch examples in Katib * Add contents * Fix README * Fix link to examples in README * Fix README * Add information about Katib UI and status of StudyJob * Add Ambassador information | 16 January 2019, 15:30:01 UTC |
0ed361c | Richard Liu | 15 January 2019, 23:22:00 UTC | Add xgboost example using Bayesian optimization (#320) * Add xgboost example * Add comments for ames example | 15 January 2019, 23:22:00 UTC |
4a69776 | Hougang Liu | 15 January 2019, 02:32:07 UTC | katib should be able to be deployed in any namespace (#324) | 15 January 2019, 02:32:07 UTC |
3c37f31 | Johnu George | 08 January 2019, 10:56:59 UTC | Adding distributed pytorch example for katib (#309) | 08 January 2019, 10:56:59 UTC |
9aa90fa | Johnu George | 08 January 2019, 10:22:02 UTC | minor fixes (#307) | 08 January 2019, 10:22:02 UTC |
f78a108 | Hougang Liu | 07 January 2019, 15:22:39 UTC | delete obsolete data in db (#315) * delete obsolete data in db * add delete study test * make sure trials and workers deleted when study deleted in ut test | 07 January 2019, 15:22:39 UTC |
fae6aa5 | Hougang Liu | 03 January 2019, 03:42:45 UTC | add bestTrialId to statusJob status (#312) * add bestTrialId to statusJob status * generate mock and add bestworkerid | 03 January 2019, 03:42:45 UTC |
f24889c | oshima | 25 December 2018, 17:22:57 UTC | Add api doc (#303) * add api doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add instructions for update api files and docs Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 25 December 2018, 17:22:57 UTC |
1295f45 | Hougang Liu | 23 December 2018, 02:11:11 UTC | validate studyJob when first reconcile it (#308) * validate studyJob when first reconcile it Fixes: #297 * use 3rd-party uuid instead of self-define one k8s.io/apimachinery/pkg/util/uuid is used in kubernetes source code | 23 December 2018, 02:11:11 UTC |
cbe91f8 | Hougang Liu | 22 December 2018, 05:51:27 UTC | add hougangliu as a reviewer (#310) | 22 December 2018, 05:51:27 UTC |
9baabbf | Johnu George | 21 December 2018, 13:57:09 UTC | Adding to OWNERS file (#304) * Adding to OWNERS file * adding to reviewers | 21 December 2018, 13:57:09 UTC |
b11b81d | Hougang Liu | 20 December 2018, 04:53:31 UTC | sync up worker status all the time (#299) Fixes: #298 | 20 December 2018, 04:53:31 UTC |
bca0b58 | Hougang Liu | 19 December 2018, 18:02:49 UTC | studyJob with non-kubeflow namespace cannot work (#302) | 19 December 2018, 18:02:49 UTC |
8e89813 | Johnu George | 19 December 2018, 15:03:34 UTC | Adding master pod check for default metric collector (#300) | 19 December 2018, 15:03:34 UTC |
07e0fd2 | Hougang Liu | 19 December 2018, 01:24:56 UTC | reduce some redundant code (#296) | 19 December 2018, 01:24:56 UTC |
28c5b1c | Andrey Velichkevich | 16 December 2018, 15:43:49 UTC | Extend studyjob client API (#288) * Add namespace parameter to studyJob client API * Change if statement for namespace * Create func getNamespace | 16 December 2018, 15:43:49 UTC |
4be865e | ytetra | 16 December 2018, 15:43:43 UTC | fix deploy (#284) | 16 December 2018, 15:43:43 UTC |
eb4a35b | Hougang Liu | 16 December 2018, 15:34:39 UTC | update Readme (#295) A trial can now correspond to a k8s job, a TFJob, or a PyTorchJob, not only a k8s job. | 16 December 2018, 15:34:39 UTC |
5a7977d | Hougang Liu | 14 December 2018, 15:14:46 UTC | fix studyJob status suggestionCount mismatch error (#290) Fixes: #289 | 14 December 2018, 15:14:46 UTC |
41e8f7d | Hougang Liu | 14 December 2018, 01:18:22 UTC | fix invalid worker kind issue (#287) * fix invalid worker kind issue studyJob should go to 'Failed' status when worker kind is invalid * add PyTorchJob as valid worker job kind | 14 December 2018, 01:18:22 UTC |
33b2e58 | oshima | 13 December 2018, 20:00:04 UTC | get metricscollector by API (#292) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 13 December 2018, 20:00:04 UTC |
f16aecc | Johnu George | 13 December 2018, 16:32:46 UTC | Support Pytorch job in Katib (#283) * Pytorch support in Katib * Adding pytorch worker kind to metrics collector * Updating Gopkg * Adding sleep * Changing the worker name * Adding gcr image | 13 December 2018, 16:32:46 UTC |
5527e34 | Johnu George | 12 December 2018, 17:01:34 UTC | Update k8s cluster version to 1.10 (#286) | 12 December 2018, 17:01:34 UTC |
67eca98 | oshima | 11 December 2018, 07:22:12 UTC | Enrich GUI (#264) * allow to create studyjob from UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * show success alert Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add rbac for ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix bug Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * rebase master Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add metrics collector manager to UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 11 December 2018, 07:22:12 UTC |
86cddd3 | Hougang Liu | 11 December 2018, 06:46:34 UTC | update README (#281) | 11 December 2018, 06:46:34 UTC |
1c707dc | Hougang Liu | 11 December 2018, 00:30:40 UTC | fix typo error for MinikubeDemo (#282) | 11 December 2018, 00:30:40 UTC |
f8590e0 | Hougang Liu | 10 December 2018, 06:17:24 UTC | fix typo error (#280) | 10 December 2018, 06:17:24 UTC |
edf6cb5 | ytetra | 09 December 2018, 13:47:06 UTC | add e2eTest of each suggestion algorithm (#265) * random&grid * hyperband * add hyperband test * add grid case check | 09 December 2018, 13:47:06 UTC |
f4913b3 | Richard Liu | 09 December 2018, 12:52:52 UTC | Allow studyjobcontroller to delete pods (#278) | 09 December 2018, 12:52:52 UTC |
c8efb35 | Richard Liu | 07 December 2018, 16:42:11 UTC | Fix katib ui resource paths (#277) | 07 December 2018, 16:42:11 UTC |
36d8d25 | Koichiro Den | 05 December 2018, 09:12:00 UTC | Implement gRPC Health Checking Protocol + add readiness/liveness probes to vizier-core (#270) * Ensure vizier-core is never stuck too long waiting for DB conn Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add standard Health gRPC service Signed-off-by: Koichiro Den <den@valinux.co.jp> * Change db.New to return error instead of exit(1) with log.Fatal Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add SelectOne() to VizierDBInterface Signed-off-by: Koichiro Den <den@valinux.co.jp> * Rename import for later convenience Signed-off-by: Koichiro Den <den@valinux.co.jp> * Implement and register Health Server for Katib manager Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add readiness/liveness probes to vizier-core Signed-off-by: Koichiro Den <den@valinux.co.jp> * Update test codebase Fixes: 61ac5607353 ("Add SelectOne() to VizierDBInterface") Signed-off-by: Koichiro Den <den@valinux.co.jp> | 05 December 2018, 09:12:00 UTC |
3516dda | Richard Liu | 05 December 2018, 08:33:36 UTC | POC: Katib integration with tf-operator (#267) * TF operator part 1 * Add consts * Fix * Update worker; fix schemes * Change example * Add rbac rules * Add crd * Add sleep for debugging * Log cluster name * Remove unrelated change * use katibapi.State | 05 December 2018, 08:33:36 UTC |
55f125c | ytetra | 05 December 2018, 07:02:33 UTC | fix make timing (#271) | 05 December 2018, 07:02:33 UTC |
f863b87 | IWAMOTO Toshihiro | 05 December 2018, 05:13:30 UTC | Add Update{Study,Trial} (#269) Only tested with unit tests. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 05 December 2018, 05:13:30 UTC |
0e3e890 | oshima | 04 December 2018, 02:57:06 UTC | add Richard Liu to OWNERS (#274) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 04 December 2018, 02:57:06 UTC |
211c6ba | oshima | 04 December 2018, 01:58:23 UTC | fix uncompleted value in ui (#238) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 04 December 2018, 01:58:23 UTC |
1104524 | oshima | 04 December 2018, 01:24:06 UTC | fix bayesian optimization suggestion (#251) * fix Bayesian optimization suggestion Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add bayseopt-example Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * reset x_train in burn-in Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * validate parameters Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 04 December 2018, 01:24:06 UTC |
72a0fc0 | Koichiro Den | 30 November 2018, 12:41:56 UTC | Prevent pod restarts caused by slow db boot (#261) * Add readinessProbe for vizier-db Signed-off-by: Koichiro Den <den@valinux.co.jp> * Fix MYSQL_ROOT_PASSWORD Fixes: 67e94c7697bd ("Set MYSQL_ROOT_PASSWORD via Secret (#253)") Signed-off-by: Koichiro Den <den@valinux.co.jp> * Add simple loop to wait for DB connection successfully opened Signed-off-by: Koichiro Den <den@valinux.co.jp> | 30 November 2018, 12:41:56 UTC |
3f5462d | ytetra | 30 November 2018, 12:06:00 UTC | add UT of each suggestion algorithm (#237) * add random algorithm UT * add grid algorithm UT * add hyperband algorithm UT * fix typo * fix typo * add some tests * change various ParameterType pattern * add gengrid() test * fix significant figure | 30 November 2018, 12:06:00 UTC |
24160cb | Richard Liu | 28 November 2018, 06:52:51 UTC | Downgrade kubernetes dependency to 1.10.1 (#256) * downgrade to 1.10.1 * Delete pods * Fix job-name * Set successfulJobsHistoryLimit to 0 * Add comments | 28 November 2018, 06:52:51 UTC |
b7145b3 | Koichiro Den | 26 November 2018, 10:04:51 UTC | Fix incorrectly set namespace (#260) Commit b6f8e07d26a ("Update manifests (#246)") has just changed the namespace as a whole. This new manifest should be updated as well. Fixes: 67e94c7697b ("Set MYSQL_ROOT_PASSWORD via Secret (#253)") Signed-off-by: Koichiro Den <den@valinux.co.jp> | 26 November 2018, 10:04:51 UTC |
67e94c7 | Koichiro Den | 22 November 2018, 05:59:22 UTC | Set MYSQL_ROOT_PASSWORD via Secret (#253) * Set randomly generated MYSQL_ROOT_PASSWORD via Secret Signed-off-by: Koichiro Den <den@valinux.co.jp> * Separate manifest for MYSQL_ROOT_PASSWORD, "test" being set by default Signed-off-by: Koichiro Den <den@valinux.co.jp> * Update run-tests.sh Fixes: 5312459c28f7 ("Set randomly generated MYSQL_ROOT_PASSWORD via Secret") Signed-off-by: Koichiro Den <den@valinux.co.jp> | 22 November 2018, 05:59:22 UTC |
63dc070 | oshima | 20 November 2018, 23:57:25 UTC | update UI (#255) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 20 November 2018, 23:57:25 UTC |
e5e2dcd | Richard Liu | 20 November 2018, 23:18:57 UTC | Refactor studyjobcontroller (#254) * Refactor studyjob controller * Refactor * Go format files * More refactor * Rename studyjobcontroller to studyjob | 20 November 2018, 23:18:57 UTC |
597064a | Andrey | 20 November 2018, 08:19:24 UTC | Change deploy.sh for Minikube example (#252) * Change deploy for Minikube Example * Change namespace to kubeflow in Minikube example * Delete lines about modeldb from deploy | 20 November 2018, 08:19:24 UTC |
206bcaa | IWAMOTO Toshihiro | 20 November 2018, 01:43:06 UTC | Add mysql based unit tests (#243) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 20 November 2018, 01:43:06 UTC |
b6f8e07 | oshima | 19 November 2018, 04:58:32 UTC | Update manifests (#246) * change namespace katib -> kubeflow Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * change namespace of tfevent-mc | 19 November 2018, 04:58:32 UTC |
f7aff4a | Michelle Casbon | 16 November 2018, 03:05:02 UTC | Add texasmichelle as reviewer (#247) | 16 November 2018, 03:05:02 UTC |
94b138a | oshima | 16 November 2018, 01:26:56 UTC | Tf event mc (#235) * add tf-event mc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add tfevent mc ci Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add tfeventmc doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add comment and use logger Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 16 November 2018, 01:26:56 UTC |
9d59a10 | IWAMOTO Toshihiro | 14 November 2018, 06:14:39 UTC | Fix typos for json and objective (#242) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 14 November 2018, 06:14:39 UTC |
29e53b8 | Richard Liu | 13 November 2018, 02:11:21 UTC | Add richardsliu to OWNERS/reviewer (#239) * Add richardsliu to OWNERS * Add richardsliu as reviewer | 13 November 2018, 02:11:21 UTC |
a01f482 | wukong1992 | 08 November 2018, 08:55:46 UTC | add starttime and completiontime to worker (#236) | 08 November 2018, 08:55:46 UTC |
5e51974 | ytetra | 05 November 2018, 20:31:38 UTC | Fix typo (#233) * correct "purse" to "parse" * correct "Doubel" to "Double" * Update push-model.go fix lowercase * Update push-study.go use lowercase | 05 November 2018, 20:31:38 UTC |
04837a4 | IWAMOTO Toshihiro | 05 November 2018, 07:47:01 UTC | More DB unit tests (#234) * Fix EarlyStopParam and SuggestionParam DB methods GetEarlyStopParamList and GetSuggestionParamList mixed up the column order and they returned nothing. Also, SetEarlyStopParam didn't return an ID properly. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Add more DB UTs Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 05 November 2018, 07:47:01 UTC |
8e90513 | IWAMOTO Toshihiro | 02 November 2018, 05:44:46 UTC | Fix the build script after #208 (#231) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 02 November 2018, 05:44:46 UTC |
9f87fd8 | IWAMOTO Toshihiro | 01 November 2018, 06:00:42 UTC | Only retry an INSERT operation on unique constraint violation (#229) The retry logic is used to generate a unique ID, but if there is another error the DB code can fall into an infinite loop. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 01 November 2018, 06:00:42 UTC |
0bc5182 | oshima | 29 October 2018, 04:20:23 UTC | New UI for Katib (#208) * add ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update test and doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * remove modelDB Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * refactor Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add loading img Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * Add loading image Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * refactor Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add root redirection Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add latestLog flag to GetWorkerFullInfo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> | 29 October 2018, 04:20:23 UTC |
7eeea12 | ytetra | 28 October 2018, 09:46:16 UTC | fix slice range (#226) | 28 October 2018, 09:46:16 UTC |
13373d2 | IWAMOTO Toshihiro | 25 October 2018, 03:22:52 UTC | More db tests (#225) * Remove obsolete comments and an import Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Add Worker UTs Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 25 October 2018, 03:22:52 UTC |
106235b | IWAMOTO Toshihiro | 24 October 2018, 04:00:15 UTC | Fix storelogs (#222) * Fix StoreWorkerLogs The function has been storing into worker_metrics with duplicates and wrong timestamps for some time. The fix changes the worker_lastlogs DB table definition. DBs must be recreated. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Add foreign key constraints to worker log DB tables and tidy up formatting This patch makes sure worker_* rows have a matching row in the worker table. It also changes multi-line string formatting for readability. Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 24 October 2018, 04:00:15 UTC |
4dc1aed | IWAMOTO Toshihiro | 19 October 2018, 07:34:49 UTC | Check errors in order to avoid SEGV (#219) Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> | 19 October 2018, 07:34:49 UTC |