https://github.com/kubeflow/katib

Revision Message Commit Date
8ec484d Move common types to its own package 09 May 2019, 01:22:23 UTC
ed22e55 Move common types 09 May 2019, 00:20:56 UTC
55a1faa Move ObjectiveSpec definition to Trial CRD 08 May 2019, 20:38:36 UTC
105044c Merge branch 'master' into trial_mc 08 May 2019, 17:04:55 UTC
6853960 V1alpha2 Metrics collector (part 1) (#484) * Add metrics collector parser * Metrics collector implementation * Add metrics controller configmap * Metrics collector script and rbac * rename tmpValues * Fix for comments * Fix comments 08 May 2019, 04:10:33 UTC
6f557fe Fix e2e test 08 May 2019, 03:51:25 UTC
5d4e704 Merge branch 'master' into trial_mc 08 May 2019, 02:38:13 UTC
23deb5d Add metrics collector spec to Trial spec 08 May 2019, 02:25:49 UTC
a586bca enable test for katib-manager (#478) * enable test for katib-manager * add pv/pvc for v1alpha2 test * install dependency of test client * pip install by requirement file 08 May 2019, 00:18:36 UTC
6d95830 Remove outdated TODOs in README.md (#468) 07 May 2019, 14:49:52 UTC
823fa9f Get experiment config from the instance (#474) * Get experiment config from instance * Add parsing * Move getExperiment to util * Change objectmeta.name to name 07 May 2019, 00:33:40 UTC
df67741 Fix KatibClient name (#483) 03 May 2019, 01:36:18 UTC
709d97c Add Katib Client in v1alpha2 (#480) * Init commit * Add Katib Client * Add GetConfigMap func Move templates const * Change folder for Katib client * Delete old client * Change name for default templates 01 May 2019, 22:14:18 UTC
fd4c21c Add metrics collector spec to v1alpha2 API (#481) * Add metrics collector spec to v1alpha2 API * Delete metricsCollectorType * Fix * Fix unit test 01 May 2019, 03:26:55 UTC
70c3ccd vizier-core does not need any role (#482) 30 April 2019, 08:15:40 UTC
c93eb1f katib manager db error (#476) * katib manager db error condition is keyword of mysql, we need escape it in sql * fix test case error * use status to replace condition as column name 29 April 2019, 13:42:22 UTC
2b55c69 share one grpc-health-probe (#477) 29 April 2019, 04:32:19 UTC
6f5c5c7 validation and mutating webhook for experiment (#473) * validation and mutating webhook for experiment * add test for webhook * use controller-runtime client instead of client-go * use existing objectivetype const * fillback default TrialTemplate * validate if record for the new experiment exists in DB 27 April 2019, 23:22:20 UTC
78a4563 enable test for v1alpha2 (#465) * enable test for v1alpha2 * add KATIB-CORE-NAMESPACE env for controller * update example filed * share same image for two version controller * add status in subresources of crd 26 April 2019, 06:04:26 UTC
bc57a6d Add serviceAccountName in UI deployment (#469) 25 April 2019, 02:22:46 UTC
728b37b chore: Skip test when code is not changed (#467) Signed-off-by: Ce Gao <gaoce@caicloud.io> 24 April 2019, 23:38:45 UTC
5a7a144 Adding initial v1alpha2 API controller (#457) * Adding initial v1alpha2 controller * Adding logs * Adding comments * Adding template functions for experiment * Adding error checks 23 April 2019, 23:22:00 UTC
b886768 v1alpha2 api server implementation (#456) * add v1-alpha2 api server implementation Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add test Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add filter argument to GetTrialList Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * rename filter to filter_by_name Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * revert filter_by_name to filter Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 19 April 2019, 03:25:55 UTC
f5c59f6 fix(readme): Merge image directory (#455) Signed-off-by: Ce Gao <gaoce@caicloud.io> 15 April 2019, 08:12:03 UTC
33e8e30 Update REAME example links for v1alpha1 (#452) * Update REAME example links for v1alpha1 * pkg/api/api.proto -> pkg/api/v1alpha1/api.proto * pkg/api/gen-doc/api.md -> pkg/api/v1alpha1/gen-doc/api.md * Links in pkg/api/README.md need to be doubled up for alpha1 and alpha2 * manifests -> manifests/v1alpha1 * Fix another examples link * No more relative link * rename header * update scripts link 10 April 2019, 20:00:09 UTC
05569bc fix py client import error (#453) 10 April 2019, 08:46:15 UTC
04b3051 ClusterRoleBinding doesn't need namespace field (#451) 03 April 2019, 08:53:43 UTC
7ef5594 Update API for NAS in v1alpha2 (#450) * Update API for NAS in v1alpha2 * Fix name * Fix name in input size 03 April 2019, 07:35:42 UTC
b25422a Restructuring test scripts for v1alpha1 and v1alpha2 (#449) * Restructing test scripts for v1alpha1 and v1alpha2 * Fix package location 02 April 2019, 21:25:19 UTC
3d4cd04 Code restructuring to support V1alpha1 and V1alpha2 API (#448) * Code restructuring to support V1alpha1 and V1alpha2 API * Adding comments * Test package changes * Moving requirements file * Fix the package location * Renaming studyjobcontroller to katib-controller 01 April 2019, 21:56:33 UTC
4ab3dbd Fix labels matching the job operator implementation (#447) 29 March 2019, 18:26:14 UTC
de7323c Updating the pytorch example image (#446) 29 March 2019, 04:50:12 UTC
21855a1 Remove redundant lock (#444) 29 March 2019, 02:02:12 UTC
1316bad add v1alpha2 grpc api (#427) * add v1alpha2 grpc api Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update gRPC API Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add v1alpha2 DB IF Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo, add doc and add todo for nasconfig Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * apply comments Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update proto Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * update Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 26 March 2019, 17:07:21 UTC
e260258 Remove katibcli (#436) * removed cmd folder for CLI * removed docs folder for CLI * final cleanup of cli removal * removed cli from build script * added go packages into unit-test script 20 March 2019, 21:32:58 UTC
28eb81d Change datadir for avoid failure due to lost+found (#432) 18 March 2019, 00:57:10 UTC
887f356 fix demo link (#434) * fix demo link change to correct link README.md * The link should say README.md as well. The link should say README.md as well. 15 March 2019, 23:42:57 UTC
06f955b Add fault tolerance support for trial failure (#424) * add fault tolerance for trial failure * fix a small typo * fix a typo * improve fault processing strategy * add an important TODO * fix typo * add some more TODOs 14 March 2019, 01:58:22 UTC
c87d583 Test for Bayesian Optimization Algo (#406) * added tests for acquisition function and models * added tests for global_optimizer * added tests for boa * minor linting * tests for algorithm manager * added discrete parameter to study config * covered all parameter types * moved python script to testing folder * added python tests to unit tests * remembered to uncomment existing tests * fixed path to test script * moved python tests to separate job in workflow * added run command to test script 11 March 2019, 19:54:38 UTC
61451ef Katib v1alpha2 API for CRDs (#381) * v1alpha2 API proposal * Fix comments round 1 * Refactor into Experiment and Trial * Incorporate feedback from meeting * Rename * Minor edits 08 March 2019, 02:23:33 UTC
86bd27a Add NAS team as reviewers (#419) * Add NAS team in reviewers * Update reviewers 06 March 2019, 05:27:59 UTC
feee2f9 Multiple Trials for Reinforcement Learning Suggestion (#416) * supoort multiple trials * adjust To Do * language improvement in README.md * fix several problems * fix a potential problem * handle the GetEvaluationResult() return None problem 06 March 2019, 01:48:01 UTC
3a705a1 Fix the package version in training container (#418) * fix the version of tf and keras * fix a typo 06 March 2019, 00:40:03 UTC
8f89ad4 Add validation for NAS job in Katib controller (#398) * Initial commit * Add validation for NAS config * Fix validation * Add algorithmType in NasConfig validation * Add Discrete ParameterType to validation * Move validation to webhook Change GetJobType function Make a list with NAS algorithms * Add ValidateSuggestionParameters function in Katib API * Fix api * Add ValidateSuggestionParameters to Suggestion service * Change isValid to int32 * Create Validation function in NAS RL Suggestion service * Fix small problems * Reduce code inside Validation function * Add empty ValidateSuggestionParameters function in each HP service written in GO * Fix logging * Add ValidateSuggestionParameters to mock * Handle Unvailable error 05 March 2019, 23:47:59 UTC
db6b83b Fix path to api protobuf (#415) 01 March 2019, 02:54:21 UTC
4d8c599 Add support for parallel studyjobs (#404) * Add support for parallel studyjobs * fix a typo * Reorganize the program a little bit * fix a typo * fix a typo 27 February 2019, 03:05:45 UTC
87a31f3 Add separable/depthwise convolution, data augmentation and multiple GPU support (#393) * add separable/depthwise convolution in operation library * add ENAS example StudyJob yaml * remove ENAS example, add data augmentation, add multiple GPU support 27 February 2019, 01:33:48 UTC
4d031e7 Add create time to Trial API (#410) * Add create time to Trial API * Add Trial create time information * Fix UT for db 27 February 2019, 00:32:45 UTC
26da3ea Metric collector must fail on error (#405) * Fail when unable to collect logs * Set backlimit to 0 for jobs 26 February 2019, 04:32:34 UTC
6b75138 add latest tag for katib images (#409) 25 February 2019, 17:37:16 UTC
46d2dc7 add build and test for suggestion nasrl (#401) 22 February 2019, 02:35:03 UTC
d6a67ea Database APIs for NAS updated (#394) * FINAL PUSH * FIX TESTS * new lock * new lock * small fi * DELET SPACE * deleted ununsed function 21 February 2019, 01:37:09 UTC
3bb8b54 Suggestion for Neural Architecture Search with Reinforcement Learning (#339) * Suggestion for Neural Architecture Search with Reinforcement Learning * Add NAS RL Suggestion * Fix new line * set json format for GetSuggestion() * finish trial return in GetSuggestion(), finish GetEvaluationHistory, and fix bugs * fix a bug in GetEvaluationResult() * fix bigs in GetEvaluationResult * fix an error in GetEvaluatinResult * Add python Katib api * Remove unnecessary requirements * add about for suggestion * rename to README * Add picture explanations; make the printouts more organized * fix typos * fix some small problems * Fix several problems * Fix a typo * fix some problems * small fixes * Suggestion do not need to handle uncompleted trials * fix a small problem 21 February 2019, 00:53:59 UTC
5a1a791 add validating webhook for studyJob (#383) * add validating webhook for studyJob If create/update a studyJob with bad CR manifest or invalid configuration, k8s api server will reject the request. Fixes: #314 * add test * allow check "kubectl" error code 20 February 2019, 17:40:23 UTC
8a89b9e Removing Operator specific handling during a StudyJob run (#387) * Removing Operator specific handling during a StudyJob run * Return empty in error 20 February 2019, 06:19:50 UTC
edecd39 Delete modeldb from unit tests (#391) * Delete modeldb from unit tests * Add library to interface test 20 February 2019, 00:41:30 UTC
c0f2f07 show studyjob condition when run kubectl get (#389) 19 February 2019, 03:21:42 UTC
ee62c33 Training Container with Model Constructor for cifar10 (#345) * Training Container with Model Constructor for cifar10 * fix a small bug * make num_epochs a parameter 15 February 2019, 02:23:48 UTC
3706fce add studyjob python client (#379) 14 February 2019, 18:15:03 UTC
1de9307 fix wrong example (#378) 14 February 2019, 18:14:53 UTC
03ca08f Upgrading and controller runtime k8s to 1.11.2 (#376) 14 February 2019, 16:36:02 UTC
a5c8e02 Properly initialize CI cluster credential (#360) It has been using the cluster where argo ran 14 February 2019, 05:32:03 UTC
41a5a2e Include go dependencies in developer-guide.md (#369) Looks like Google protobufs might also be a dependency? 13 February 2019, 19:27:47 UTC
d6ea2d5 fix invalid memory address (#368) 12 February 2019, 02:47:44 UTC
421cbff Fix presubmits (#363) * Fix typo * Fix gcloud builds submit command * Use printf() instead of print() 08 February 2019, 06:07:13 UTC
0ea34b1 Katib 2019 Roadmap (#348) * roadmap * Fixing format * Add links to github issues * Fix comments 01 February 2019, 04:28:57 UTC
afee0c3 Update OWNERS (#350) 29 January 2019, 05:43:25 UTC
f11c13e Extend Katib API for NAS jobs (#327) * Add fields to studyjob structure * Change nasjob yaml file * Change parameter type * Add Parameter Type=range * Change API * Change input size * Reset API structure * Change StudyJob API structure * Remove Range parameter * Fix api.proto * Fix gopkg.toml * Remove old nasjob file * Fix nasjob.yaml * Add custom suggestion * Add blank NAS suggestion Change Katib API to process yaml file for NAS * Add correct YAML file for NAS example * Fix newline * Change StudyID to 1 * Add jobType parameter in Parsing * Remove changes in manager * Add NasConfig inside Yaml file * Fix name in nasConfig * Fix get StudyConfig in NAS * Add JobType in all services * Add job_type in bayesian_service * Add pointers in NasConfig structure * Fix Pointer in API * Add consts for jobType Remove return from populateCommonConfigFields * Move const jobType to const file * Remove Range parameter * Modify YAML file for NAS jobs * Add getStudyJobType function in GRPC server * Add blank GetStudyJobType func in manager * Fix metrics collector * Remove jobType from getStudy * Remove getStudyJobType from manager * Add NAS RL yaml deployment * Change worker to GPU * Clean nasrl suggestion * Add -u inside training-container * Fix namespace in worker template 29 January 2019, 01:23:06 UTC
f4026e4 ignore tfjob/pytorch job if corresponding CRD not created (#335) * ignore tfjob/pytorch job if corresponding CRD not created * update log message * only ignore NoMatchError when watch CRD * refactor func name for watch error 25 January 2019, 00:26:22 UTC
c67892f Clarify the example UI is generated by random-example. (#333) 24 January 2019, 16:50:31 UTC
a777721 only try to delete study info in db when in need (#342) 24 January 2019, 01:16:26 UTC
8545970 omit empty fields for studyjob status (#336) 22 January 2019, 19:40:17 UTC
15bbcae Update pytorch example with latest image (#329) * Update pytorch example with latest image * Update pytorch example docker image 18 January 2019, 19:07:12 UTC
a24c428 Fix typo (#330) 18 January 2019, 00:00:00 UTC
d41f8e8 Add information how to run TFjob and Pytorch examples in Katib (#321) * Add doc for tfjob and pytorch examples in Katib * Add contents * Fix README * Fix link to examples in README * Fix README * Add information about Katib UI and status of StudyJob * Add Ambassador information 16 January 2019, 15:30:01 UTC
0ed361c Add xgboost example using Bayesian optimization (#320) * Add xgboost example * Add comments for ames example 15 January 2019, 23:22:00 UTC
4a69776 katib should be able to be deployed in any namespace (#324) 15 January 2019, 02:32:07 UTC
3c37f31 Adding distributed pytorch example for katib (#309) 08 January 2019, 10:56:59 UTC
9aa90fa minor fixes (#307) 08 January 2019, 10:22:02 UTC
f78a108 delete obsolete data in db (#315) * delete obsolete data in db * add delete study test * make sure trials and workers deleted when study deleted in ut test 07 January 2019, 15:22:39 UTC
fae6aa5 add bestTrialId to statusJob status (#312) * add bestTrialId to statusJob status * generate mock and add bestworkerid 03 January 2019, 03:42:45 UTC
f24889c Add api doc (#303) * add api doc Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add instructions for update api files and docs Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 25 December 2018, 17:22:57 UTC
1295f45 validate studyJob when first reconcile it (#308) * validate studyJob when first reconcile it Fixes: #297 * use 3rd-party uuid instead of self-define one k8s.io/apimachinery/pkg/util/uuid is used in kubernetes source code 23 December 2018, 02:11:11 UTC
cbe91f8 add hougangliu as a reviewer (#310) 22 December 2018, 05:51:27 UTC
9baabbf Adding to OWNERS file (#304) * Adding to OWNERS file * adding to reviewers 21 December 2018, 13:57:09 UTC
b11b81d sync up worker status all the time (#299) Fixes: #298 20 December 2018, 04:53:31 UTC
bca0b58 studyJob with non-kubeflow namespace cannot work (#302) 19 December 2018, 18:02:49 UTC
8e89813 Adding master pod check for default metric collector (#300) 19 December 2018, 15:03:34 UTC
07e0fd2 reduce some redundant code (#296) 19 December 2018, 01:24:56 UTC
28c5b1c Extend studyjob client API (#288) * Add namespace parameter to studyJob client API * Change if statement for namespace * Create func getNamespace 16 December 2018, 15:43:49 UTC
4be865e fix deploy (#284) 16 December 2018, 15:43:43 UTC
eb4a35b update Readme (#295) A trial can be corresponds to a k8s job, TFJob and PyTorchJob now. Not only k8s job any more. 16 December 2018, 15:34:39 UTC
5a7977d fix studyJob status suggestionCount mismatch error (#290) Fixes: #289 14 December 2018, 15:14:46 UTC
41e8f7d fix invalid worker kind issue (#287) * fix invalid worker kind issue studyJob should go to 'Failed' status when worker kind is invalid * add PyTorchJob as valid worker job kind 14 December 2018, 01:18:22 UTC
33b2e58 get metricscollector by API (#292) Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 13 December 2018, 20:00:04 UTC
f16aecc Support Pytorch job in Katib (#283) * Pytorch support in Katib * Adding pytorch worker kind to metrics collector * Updating Gopkg * Adding sleep * Changing the worker name * Adding gcr image 13 December 2018, 16:32:46 UTC
5527e34 Update k8s cluster version to 1.10 (#286) 12 December 2018, 17:01:34 UTC
67eca98 Enrich GUI (#264) * allow to create studyjob from UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * show success alert Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add rbac for ui Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix bug Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * rebase master Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * add metrics collector manager to UI Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> * fix typo Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com> 11 December 2018, 07:22:12 UTC
86cddd3 update README (#281) 11 December 2018, 06:46:34 UTC
1c707dc fix typo error for MinikubeDemo (#282) 11 December 2018, 00:30:40 UTC