# How Katib v1beta1 tunes hyperparameter automatically in a Kubernetes native way See the following guides in the Kubeflow documentation: * [Concepts](https://www.kubeflow.org/docs/components/hyperparameter-tuning/overview/) in Katib, hyperparameter tuning, and neural architecture search. * [Getting started with Katib](https://kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/). * Detailed guide to [configuring and running a Katib experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/). ## Example and Illustration After install Katib v1beta1, you can run `kubectl apply -f katib/examples/v1beta1/random-example.yaml` to try the first example of Katib. Then you can get the new `Experiment` as below. Katib concepts will be introduced based on this example. ```yaml $ kubectl get experiment random-example -n kubeflow -o yaml apiVersion: kubeflow.org/v1beta1 kind: Experiment metadata: ... name: random-example namespace: kubeflow ... spec: algorithm: algorithmName: random maxFailedTrialCount: 3 maxTrialCount: 12 metricsCollectorSpec: collector: kind: StdOut objective: additionalMetricNames: - Train-accuracy goal: 0.99 metricStrategies: - name: Validation-accuracy value: max - name: Train-accuracy value: max objectiveMetricName: Validation-accuracy type: maximize parallelTrialCount: 3 parameters: - feasibleSpace: max: "0.03" min: "0.01" name: lr parameterType: double - feasibleSpace: max: "5" min: "2" name: num-layers parameterType: int - feasibleSpace: list: - sgd - adam - ftrl name: optimizer parameterType: categorical resumePolicy: LongRunning trialTemplate: trialParameters: - description: Learning rate for the training model name: learningRate reference: lr - description: Number of training model layers name: numberLayers reference: num-layers - description: Training model optimizer (sdg, adam or ftrl) name: optimizer reference: optimizer trialSpec: apiVersion: batch/v1 kind: Job spec: template: spec: containers: - command: - python3 - /opt/mxnet-mnist/mnist.py - --batch-size=64 - --lr=${trialParameters.learningRate} - --num-layers=${trialParameters.numberLayers} - --optimizer=${trialParameters.optimizer} image: docker.io/kubeflowkatib/mxnet-mnist name: training-container restartPolicy: Never status: ... ``` #### Experiment When you want to tune hyperparameters for your machine learning model before training it further, you just need to create an `Experiment` CR like above. To learn what fields are included in the `Experiment.spec`, see the detailed guide to [configuring and running a Katib experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/). #### Trial For each set of hyperparameters, Katib will internally generate a `Trial` CR with the hyperparameters key-value pairs, job manifest string with parameters instantiated and some other fields like below. `Trial` CR is used for internal logic control, and end user can even ignore it. ```yaml $ kubectl get trial -n kubeflow NAME TYPE STATUS AGE random-example-58tbx6xc Succeeded True 14m random-example-5nkb2gz2 Succeeded True 21m random-example-88bdbkzr Succeeded True 20m random-example-9tgjl9nt Succeeded True 17m random-example-dqzjb2r9 Succeeded True 19m random-example-gjfdgxxn Succeeded True 20m random-example-nhrx8tb8 Succeeded True 15m random-example-nkv76z8z Succeeded True 18m random-example-pcnmzl76 Succeeded True 21m random-example-spmk57dw Succeeded True 14m random-example-tvxz667x Succeeded True 16m random-example-xpw8wnjc Succeeded True 21m $ kubectl get trial random-example-gjfdgxxn -o yaml -n kubeflow apiVersion: kubeflow.org/v1beta1 kind: Trial metadata: ... name: random-example-gjfdgxxn namespace: kubeflow ownerReferences: - apiVersion: kubeflow.org/v1beta1 blockOwnerDeletion: true controller: true kind: Experiment name: random-example uid: 34349cb7-c6af-11ea-90dd-42010a9a0020 ... spec: metricsCollector: collector: kind: StdOut objective: additionalMetricNames: - Train-accuracy goal: 0.99 metricStrategies: - name: Validation-accuracy value: max - name: Train-accuracy value: max objectiveMetricName: Validation-accuracy type: maximize parameterAssignments: - name: lr value: "0.012171302435678337" - name: num-layers value: "3" - name: optimizer value: adam runSpec: apiVersion: batch/v1 kind: Job metadata: name: random-example-gjfdgxxn namespace: kubeflow spec: template: spec: containers: - command: - python3 - /opt/mxnet-mnist/mnist.py - --batch-size=64 - --lr=0.012171302435678337 - --num-layers=3 - --optimizer=adam image: docker.io/kubeflowkatib/mxnet-mnist name: training-container restartPolicy: Never status: completionTime: "2020-07-15T15:29:00Z" conditions: - lastTransitionTime: "2020-07-15T15:25:16Z" lastUpdateTime: "2020-07-15T15:25:16Z" message: Trial is created reason: TrialCreated status: "True" type: Created - lastTransitionTime: "2020-07-15T15:29:00Z" lastUpdateTime: "2020-07-15T15:29:00Z" message: Trial is running reason: TrialRunning status: "False" type: Running - lastTransitionTime: "2020-07-15T15:29:00Z" lastUpdateTime: "2020-07-15T15:29:00Z" message: Trial has succeeded reason: TrialSucceeded status: "True" type: Succeeded observation: metrics: - latest: "0.959594" max: "0.960490" min: "0.940585" name: Validation-accuracy - latest: "0.959022" max: "0.959188" min: "0.921658" name: Train-accuracy startTime: "2020-07-15T15:25:16Z" ``` #### Suggestion Katib will internally create a `Suggestion` CR for each `Experiment` CR. `Suggestion` CR includes the hyperparameter algorithm name by `algorithmName` field and how many sets of hyperparameter Katib asks to be generated by `requests` field. The CR also traces all already generated sets of hyperparameter in `status.suggestions`. Same as `Trial`, `Suggestion` CR is used for internal logic control and end user can even ignore it. ```yaml $ kubectl get suggestion random-example -n kubeflow -o yaml apiVersion: kubeflow.org/v1beta1 kind: Suggestion metadata: ... name: random-example namespace: kubeflow ownerReferences: - apiVersion: kubeflow.org/v1beta1 blockOwnerDeletion: true controller: true kind: Experiment name: random-example uid: 34349cb7-c6af-11ea-90dd-42010a9a0020 ... spec: algorithmName: random requests: 12 status: suggestionCount: 12 suggestions: ... - name: random-example-gjfdgxxn parameterAssignments: - name: lr value: "0.012171302435678337" - name: num-layers value: "3" - name: optimizer value: adam - name: random-example-88bdbkzr parameterAssignments: - name: lr value: "0.013408352284328112" - name: num-layers value: "4" - name: optimizer value: ftrl - name: random-example-dqzjb2r9 parameterAssignments: - name: lr value: "0.028873905258692753" - name: num-layers value: "3" - name: optimizer value: adam ... ``` ## What happens after an `Experiment` CR created When a user created an `Experiment` CR, Katib controllers including experiment controller, trial controller and suggestion controller will work together to achieve hyperparameters tuning for user Machine learning model.