https://github.com/kubeflow/katib

Revision Author Date Message Commit Date
85a44bd Update tensorflow-gpu for v1beta1 Change tensorflow-gpu version to 1.15.4 for enas-cnn-cifar10 in v1beta1 30 October 2020, 02:06:44 UTC
746185f Bump tensorflow-gpu in /examples/v1alpha3/nas/enas-cnn-cifar10 Bumps [tensorflow-gpu](https://github.com/tensorflow/tensorflow) from 1.15.2 to 1.15.4. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v1.15.2...v1.15.4) Signed-off-by: dependabot[bot] <support@github.com> 30 October 2020, 01:03:44 UTC
6be88cd Merge pull request #1349 from ChenjunZou/expr_crd_typo fix typo in crd-experiment 30 October 2020, 00:57:58 UTC
7e48498 UI: Add resume policy to submit Experiment by parameters (#1362) 27 October 2020, 11:29:59 UTC
3153836 UI: Support metrics strategies to submit Experiment by parameters (#1364) * Add metric strategies to submit by params Experiment page * Fix additional metric names * Add checkbox style 27 October 2020, 11:25:59 UTC
f75f9fe UI: Support new parameters in Trial template (#1363) * Extend submit Experiment using parameters in UI. Add all parameters from Trial template. Add YAML template submission. * Fix Trial spec * Add changes for NAS submit * Add primary pod labels to NAS 27 October 2020, 11:21:59 UTC
3a7f45e Add Tekton Pipeline example (#1339) * Tekton example Add README for Tekton examples Add yaml with PipelineRun * Fix README * Remove istio annotation * Fix comment 27 October 2020, 06:43:58 UTC
614f9f8 Remove istio sidecar annotation from all examples (#1338) * Manually add istio sidecar annotation to all examples * Revert pytorch namespace * Add istio to random * Remove istio annotation from examples * Remove istio from metadata example 27 October 2020, 03:15:59 UTC
4450611 Clean up code for custom CRD (#1355) * Clean up code for custom CRD * Fix few comments * Rename experiment controller files 26 October 2020, 21:27:59 UTC
64be948 Switch to AWS CI/CD (#1356) * Add changes for AWS test infra * Remove comment from resume e2e * Change worker image * Refactor e2e test script * Replace create and delete cluster with testing scripts * Fix cluster name * Fix delete cluster * Test without folder for GOPATH * Add AWS creds to env * Comment creds * Delete v1alpha3 workflow from prow Add ECR env * Add AWS cred * Get other build for all images * Attach volume to create and delete cluster * Fix path for NAS suggestions * Change make deploy * Fix path * Change deploy * Move create cluster to build * Fix path to valid exp * Remove * Change command * Fix region * Fix run e2e go path * Add backoff * Add github.com to folder * Add github to src folder * Fix Katib path * Print CRDs in e2e Experiment * Show known types * Trigger CI * Set TypeMeta for experiment * Print known types * Build binary e2e * manually create experiment * Print ns list * Remove GCP auth * Set kube config * Add other e2e tests * Remove bin * Fix template name * Return exp in case of error * Deploy TF and PyTorch controllers * Create Kubeflow namespace * Remove v1alpha3 tests * Remove Katib client changes * Add ttl seconds after finished * Change to 5 hours * Increase activeDeadlineSeconds * Add comments * Add release workflow 26 October 2020, 18:33:02 UTC
35150fd Improve Katib README (#1361) 22 October 2020, 02:47:34 UTC
8ebba43 Add MPI operator horovod example (#1342) * Add MPI Job horovod example * Add link to dockerimage * Change mpi example docker hub registry * Remove istio sidecar annotation 17 October 2020, 00:02:13 UTC
85fc7d0 Enhancement for Custom CRD (#1333) * Init commit * Modify Insert function Add retry on empty observation * Fix mutate volume test * Fix validate experiment test * Fix invalid experiment * Don't get deployed job status when trial is completed * Not send Trial with unavailable metrics to Suggestion * Refactor requeue If objective metric value is not reported metrics collector reports unavailable value to the DB Controller reconciles Trial until DB is empty * Add condition before change trial status * Remove prints * Fix tfevent parser 13 October 2020, 13:02:27 UTC
f74c6a3 fix typo in crd-experiment 25 September 2020, 07:32:17 UTC
6a07daa Add trial metadata substitution example (#1319) * Add trial metadata example * Change description * Add istio sidecar false to annotation 18 September 2020, 03:02:45 UTC
6aa4ec9 fix(metrics-collector): allow user to nuke ephemeral-storage requests (#1312) * fix(metrics-collector): allow user to nuke ephemeral-storage requests * chore(gofmt): fan formatting * chore(gofmt): undo formatting on auto generated api.pb files 17 September 2020, 22:44:45 UTC
74e6e5b feat: Ignore pb files in update gofmt (#1340) Update travis nvm version to 12.18.1 Ignore .pb files in gofmt 17 September 2020, 02:38:46 UTC
721a382 Upload python SDK version (#1335) * Upload 0.0.4 SDK version * Fix doc links * Fix links in README * Modify tables * Fix link * Remove * Modify client and gen script * Update version to 0.0.5 * Run CI * Add Katib client to init 15 September 2020, 09:42:07 UTC
4b11f80 Add SDK examples for v1beta1 (#1337) * Add SDK examples for v1beta1 * Modify import 15 September 2020, 02:24:07 UTC
e99c77d Run post-submit image build in kubeflow-ci project (#1326) * Change registry for presubmit * Add prow_config to workflows * Add project to gcloud auth * Test manager * Add kubeflow-ci project for build in post-submit 14 September 2020, 15:34:58 UTC
6b7142f Custom CRD: Wait for all processes before running metrics collector (#1313) * Enable to wait all in metrics collectors * Rename metricsFilePath * Fix tfevent * Fix pns py * Fix comment 09 September 2020, 06:57:52 UTC
7b797e1 Custom CRD: Support dynamic Trial's jobs conditions (#1307) * Custom Job conditions implementation * Fix prints * Fix status * Fix test * Clean event msg * Run gofmt * Fix few comments * Generate clients * Fix comment * Add newline 08 September 2020, 21:45:52 UTC
2580186 Custom CRD: Add primary container name (#1308) * Add primary container name * Resolve * Generate clients * Add newline 08 September 2020, 19:01:52 UTC
fc8d522 [Adopters] change adopter of Ant Group (#1327) 05 September 2020, 16:25:41 UTC
d5c5e95 Update generate script with SDK (#1323) * Update generate script * Capitalise API * Remove verbose 04 September 2020, 20:47:42 UTC
a072156 Switch test from kubeflow-ci to automl-ci project. (#1321) * Change project to automl-ci * Change registry to automl-ci for presubmit * Add cluster role to sa * Add print * Modify user * Remove user change * Add separate scripts to build metrics-collectors * Move goveralls to after success * Update doc * Trigger CI 04 September 2020, 15:31:41 UTC
cba4560 Fix Pod's ownership to inject metrics collector (#1303) * Refactor get Katib job * Get trial after func * Remove trialName * return error * Remove error * Resolve 03 September 2020, 14:09:41 UTC
36aef5f Fix problem with Hyperopt Out of Range error (#1315) 03 September 2020, 10:59:41 UTC
d58b6a1 Custom CRD: Add primary pod labels (#1305) * Add primary pod labels * Generate swagger * Generate SDK * Trigger CI 03 September 2020, 02:41:40 UTC
ef6557a Custom CRD: Set dynamic watch from controller flags (#1302) 02 September 2020, 15:13:06 UTC
ced8496 Fix restart check in controller for completed experiments (#1306) * Add check for experiment restart in controller * Change comment 02 September 2020, 13:49:07 UTC
0b7a5f2 Update CI test cluster version to 1.16 (#1316) * Update CI cluster version to 1.16 * Add retry strategy * Remove backoff 01 September 2020, 13:03:50 UTC
2ceed7d Update docs for v1beta1 SDK (#1304) * Update docs for v1beta1 SDK * Fix samples in v1alpha3 19 August 2020, 11:05:11 UTC
1d18594 [python sdk] add v1beta1 models (#1252) * [python sdk] add v1beta1 models * upgrade version of python SDK to 0.0.3 * remove v1Alpha3 python sdk * add some python models manually: v1Time and V1UnstructuredUnstructured * bring back v1alpha3 * create separate python sdk for v1alpha3 and v1beta1 * move on * release pkg on pypi.org * remove dist files * refine 18 August 2020, 10:11:32 UTC
051d1de Proposal: Support custom CRD in Trial Job (#1273) * Add proposal for custom CRD in Trial Template * Fix * Modify doctoc * Doc fixes * Rename header * Fixes * Change doc * Remove comma * Fix Implementation 14 August 2020, 15:20:21 UTC
282b71e Support volume settings in Katib config (#1291) * Support volume settings in config * Set default path 12 August 2020, 12:03:45 UTC
77dd34e Refactor Trial controller unit test (#1299) * Refactor Trial controller unit test Move prometheus to util * Change import 11 August 2020, 06:06:16 UTC
cca0358 Use Logger in suggestion controller util (#1298) * add-logger-suggestion-controller-util * Change log message 10 August 2020, 19:43:59 UTC
2c4ad15 Log update object status error (#1297) * Info instead of Error when update status is failed * Log generate name instead of name * Test. Expect that suggestion is succeeded when experiment is updating 06 August 2020, 19:07:42 UTC
33832a7 Verify that Trials were successfully deleted (#1288) * Verify that trials were deleted Update suggestion status * Update suggestion requests * Fix tests * Fix comment * Add recorder to test controllers * Travis test * Change resume exp trial condition * Modify e2e for from volume experiment * Fix IsRestarting check * Fix comment 06 August 2020, 12:32:55 UTC
88eb798 Set number of epochs to decrease e2e tests time (#1290) * Add epoch for mnist e2e examples * Replace batch-size with epochs * Remove file * Remove changes from hyperband * Remove epoch from pytorch 06 August 2020, 01:20:54 UTC
9cf4544 Unit test for resuming Experiment in controller reconcilers (#1281) * Add unit test for Experiment and Suggestion controller reconcile * Delete buildTrialMetaForRunSpec from controller * Modify condition check * Fix format * Run mock for new version * Refactor experiment reconcile test * Remove comment 03 August 2020, 18:43:42 UTC
329b22e Validate restart Experiment parameters (#1287) * Validate resume experiment in webhook * Fix restart check * Fix test 03 August 2020, 11:21:40 UTC
6b9e914 Get metrics collector config data refactor (#1285) * Refactor get metrics collector config Fix PV name in validation webhook * Add test in validation for Katib config 31 July 2020, 16:11:07 UTC
ce89cbf feat(experiment): Add a check before deletion (#1223) * feat(experiment): Add a check before deletion Signed-off-by: Ce Gao <gaoce@caicloud.io> * fix: Delete all trials Signed-off-by: Ce Gao <gaoce@caicloud.io> * feat: Implement in v1beta1 Signed-off-by: Ce Gao <gaoce@caicloud.io> 31 July 2020, 10:35:07 UTC
ac1dc24 Add e2e test for FromVolume ResumePolicy (#1284) * Add e2e test for from volume resume * Resume experiment after completion * Print controller logs * Remove test prints * Remove controller logs 31 July 2020, 01:27:06 UTC
a42d8a9 Refactor suggestion config and add Composer unit test (#1282) * Init commit * Add test for deployment * Refactor suggestion config * Switch mock to 1.4.3 version * Fix empty map * Fix comments * Fix gofmt * Move package 30 July 2020, 17:28:31 UTC
c33da9d support trial meta injection in trial template rendering (#1259) * support trial meta injection in trial template rendering * use trialSpec.metadata as prefix of trialMeta reference * solve conflicts * add some comments on consts * apply gofmt * refactor * fix mock test * refine 27 July 2020, 17:24:17 UTC
9a7d43c Resume Experiment from Volume (#1275) * Resume experiment from the PV * Add comment * Remove old api comments * Change reason for Running suggestion * Fix few comments * Rename volume name like suggestion deployment * Add corev1 to const 27 July 2020, 15:32:18 UTC
27658a7 GRPC: Rename Manager to DBManager service (#1279) * Rename Manager to DBManager in gRPC * Update git ignore 25 July 2020, 01:36:16 UTC
9320b4e Add status to experiment CRD manifest (#1276) 23 July 2020, 02:29:39 UTC
ac091db Fix few API comments typos (#1274) * Fix few typos in API comments * Generate open API 20 July 2020, 01:26:50 UTC
b5465bd Add FPGA accelerated examples (#1269) * Add instructions for FPGA accelerated Experiments * XGBoost FPGA accelerated example * Omit unnecessary quotes * Add the new example in the list of training container images * Omit explicit declaration of the metrics collector Co-authored-by: Vaggelis Gkiastas <vaggelisgkia@hotmail.com> 16 July 2020, 21:35:03 UTC
f565047 Modify documentation for v1beta1 (#1267) * Change doc for v1beta1 * Fix 16 July 2020, 02:38:34 UTC
c199867 Add e2e test for DARTS (#1268) * Add e2e for darts * Remove todo 16 July 2020, 01:24:35 UTC
50fc911 UI: Add new ConfigMap with Trial Templates (#1265) * Add new configMap with Trial Templates * Enable to view all namespaces * Fix log 15 July 2020, 01:32:37 UTC
cbe0f40 Fix examples to run on OpenShift (#1241) 13 July 2020, 13:50:32 UTC
226c99c UI: Add Trial table pages (#1262) 10 July 2020, 01:56:36 UTC
f1393b9 UI: Delete ConfigMap with no Trial Templates (#1260) * Delete ConfigMap if there are no Trial Templates Add snack box for Templates * Not add empty ConfigMaps * Fix e2e test 10 July 2020, 01:16:35 UTC
4145c4f Fix paths in prow config (#1257) 08 July 2020, 01:29:08 UTC
42dbb56 UI: Update Material UI version to V4 (#1254) * Init commit for material UI v4 * Rebase * Add label to all selects * Remove changes from v1alpha3 07 July 2020, 01:53:57 UTC
c3b38d8 Adding retries for gRPC calls (#1248) 06 July 2020, 18:13:57 UTC
2e9d676 UI: Remove update button from Experiments view page (#1253) * Update experiments without button Move monitor components to common * Move fetch experiments to common rename job to experiment * Modify vars 06 July 2020, 17:24:55 UTC
ea1bb06 UI: Sorting for Trials information table (#1251) * Enable sort in Trials table * Remove console log * Modify cell width 03 July 2020, 01:28:47 UTC
29b797c String type for metric values (#1245) * Change metric type from float to string * Check unavailable latest objective metric in isTrialObservationAvailable * Fix e2e test * Delete consts.UnavailableMetricValue from e2e 30 June 2020, 12:38:06 UTC
f7cea41 Add hints to obtain Kubeflow and Minikube version (#1230) 30 June 2020, 02:05:58 UTC
c387843 chore: Update OWNERS (#1235) Signed-off-by: Ce Gao <gaoce@caicloud.io> 28 June 2020, 11:36:13 UTC
acd9e43 Validation for parameters in HP Experiments (#1243) * Add spec.parameters validation * Fix print * Change parameter index in test * Change reflect to equality.Semantic 28 June 2020, 01:40:17 UTC
3377a48 Fix Trial parameter name for v1beta1 DARTS example (#1246) * Fix Trial parameter in darts example * Fix description 27 June 2020, 02:02:15 UTC
6b2a8d7 Update unit tests for Suggestion client (#1238) * Update unit test for suggestion client * Add test for GetRPCClient 26 June 2020, 01:59:30 UTC
913d55a Delete fake suggestion interfaces (#1240) 26 June 2020, 01:17:30 UTC
2626dcf Rename Ant Financial to Ant Group (#1237) 25 June 2020, 20:52:16 UTC
5ea1746 UI: Optimisation and enhancements for v1beta1 Katib UI (#1232) * UI improvements * Add db-manager-addr flag Modify README * Fix graphviz * Modify README * Remove ID * Add zoom to graphviz * Remove unused scripts from index * Fix doc * Change npm install to npm ci Commit package-lock * fix npm version * Increase max_old_space_size * Set react-scripts to 3.2.0 * Modify doc 24 June 2020, 15:13:17 UTC
ad3f3b0 Not validate Trial template resources (#1231) 24 June 2020, 01:35:17 UTC
fd1dd0d Fix default metrics strategy (#1226) 22 June 2020, 16:16:40 UTC
1f92669 Fix pytorch job in trial template (#1225) 22 June 2020, 02:20:39 UTC
12ed391 UI: Add Trial Parameters to submit experiment page (#1224) * Modify trial template editor and trial template configMap in submit Experiment by parameters * Add trial parameters * Change trialParam when configMap has been changed * Remove comment * Fix submit nas job 20 June 2020, 14:08:39 UTC
20d5708 Show all namespaces in experiment list (#1219) 18 June 2020, 01:24:04 UTC
f34ea32 Modify metric strategies example for new Trial template (#1218) 17 June 2020, 12:16:40 UTC
0be190a Add more test cases for Generator (#1216) * Init commit * Add test to generator for unstructured * Add test to generator for configmap * Fix log * Add comment 16 June 2020, 11:32:04 UTC
94ae58f New Trial Template validation (#1215) * Add validation for new Trial Template * Create patch to validate trial template * Remove print * Change fmt to log * Add test description for validator test * Move imports 15 June 2020, 14:10:03 UTC
a918e08 extracting metric value in multiple ways (#1140) * migrate this PR to v1beta1 * fix * fix * fix * add example for metric strategies * modify MetricStrategy specification * fix 12 June 2020, 18:33:55 UTC
917164a add clientset/lister/informer generation (#1194) * adapt v1beta1 * add resources for v1alpha3 * fix * refine update-codegen script * add katib apiVersion as prefix of swagger version 10 June 2020, 15:08:51 UTC
88da808 New Trial Template API controller implementation (#1202) * Add new Trial Template API * Add TrialSource in TrialTemplate Remove GoTemplate from Experiment API * Add init logic for new trial template * Change examples Implement new Trial Template to controller * Modify trial template configMap for new version of Trial Template Fix experiment defaults change valid/invalid experiment * Run gofmt * Fix hyperband * Remove num-epochs from grid example * Add tag to file mc example * Modify create Trials loop 10 June 2020, 14:24:51 UTC
a6cf636 Add citation information (#1210) * Add citation information * Update README.md * Update README.md 06 June 2020, 02:17:45 UTC
c2c5288 Python SDK for katib (#1177) * python SDK for katib with docs and examples * Update README.md * Update README.md * Update bayesianoptimization-katib-sdk.ipynb * Update bayesianoptimization-katib-sdk.ipynb * Update tfjob-katib-sdk.ipynb * Create OWNERS * Update bayesianoptimization-katib-sdk.ipynb * Update tfjob-katib-sdk.ipynb * Update bayesianoptimization-katib-sdk.ipynb * Update tfjob-katib-sdk.ipynb * Update bayesianoptimization-katib-sdk.ipynb * Update OWNERS * Update README.md * updated changes as per review comments * Update README.md * Update README.md * Added pip installation for katib sdk and removed status from get_optimal_hyperparamater API * Update README.md * Updated changes for delete_exp and removed unused imports 04 June 2020, 02:34:43 UTC
2179c16 Rename algorithm_setting to algorithm_settings in manager (#1204) 03 June 2020, 15:10:20 UTC
1039299 Add link to darts in training container images list doc (#1201) 02 June 2020, 01:42:15 UTC
5df1bd2 Re: Support string metrics values in Controller (#1200) * Support string metrics * Fix error in e2e tests 29 May 2020, 03:23:15 UTC
6ae79ca Modify new algorithm service doc (#1198) * Add step to new algorithm service doc to modify check Katib ready script * Add link to script 28 May 2020, 01:58:02 UTC
6e7a1aa Katib v1beta1 version (#1197) * Add v1beta1 version * Swagger for v1alpha3 and v1beta1 versions Fix format in bash scripts * Change make build to make buildv1alpha3 * Add folder path to python test * Fix folder in python test * Add goptuna and darts suggestions to check-katib-ready * Disable custom metrics collector e2e in v1beta1 28 May 2020, 01:54:02 UTC
b4dd083 Add more algorithm settings to DARTS (#1195) * Add more algorithm settings to DARTS * Delete data 22 May 2020, 01:26:38 UTC
b842f45 Fix split metrics in tf event metrics collector (#1191) 19 May 2020, 02:33:36 UTC
7a3a6cb UI: Fix comparison of metric values in Metrics Info Plot (#1192) * Fix compare in trial info plot * Remove parentheses 19 May 2020, 01:43:36 UTC
e77a49f Support one and two NN layers in DARTS (#1185) * Enable to run DARTS for 1 and 2 NN layers * Update prow 15 May 2020, 06:30:58 UTC
5c4624f Revert 1176 PR (String metrics values) (#1189) 15 May 2020, 05:50:59 UTC
8aaf864 Fix Never Resume Policy for Experiment (#1184) * Fix Never Resume Suggestion Add e2e test for never resume * Fix name for never resume in e2e * Add permission on run never resume 14 May 2020, 13:10:23 UTC
131f378 Change scikit-learn version to 0.22.0 (#1187) 14 May 2020, 01:44:22 UTC
99de8f8 DARTS documentation (#1180) * README for DARTS * Fix docs 08 May 2020, 01:43:42 UTC
581562a Unittest for DARTS Suggestion (#1179) * Add unittest for darts * Fix pip * Change chocolate className Change timeout for chocolate test 07 May 2020, 15:11:10 UTC