swh:1:snp:2f03ce5544681ba840560cccc1370b7b35f2566e

d21f0ed Merge pull request #3661 from MaciekPytel/ca-1.18.3 Cluster Autoscaler 1.18.3 02 November 2020, 16:06:54 UTC
0760cdb Cluster Autoscaler 1.18.3 02 November 2020, 13:55:45 UTC
9dcf938 Merge pull request #3655 from detiber/backportEM-1.18 [cluster-autoscaler] Backport fixes for packet provider to release-1.18 02 November 2020, 13:54:52 UTC
e629617 Add price support in Packet 30 October 2020, 19:10:05 UTC
7304a81 Add support for multiple nodepools in Packet 30 October 2020, 19:08:19 UTC
39c0540 Add support for scaling up/down from/to 0 nodes in Packet 30 October 2020, 19:07:30 UTC
c33b738 add Packet cloudprovider owners Signed-off-by: Marques Johansson <marques@packet.com> 30 October 2020, 17:31:04 UTC
61507f1 Merge pull request #3625 from nilo19/cleanup/cherry-pick-3532-1.18 Cherry-pick #3532 onto 1.18: Azure: support allocatable resources overrides via VMSS tags 19 October 2020, 03:54:14 UTC
02306ad Azure: support allocatable resources overrides via VMSS tags 19 October 2020, 02:55:34 UTC
b2962ef Merge pull request #3598 from ryaneorth/cherry-pick-3570-1.18 Merge pull request #3570 from towca/jtuznik/scale-down-after-delete-fix 15 October 2020, 08:36:24 UTC
221d032 Merge remote-tracking branch 'upstream/cluster-autoscaler-release-1.18' into cherry-pick-3570-1.18 14 October 2020, 21:14:40 UTC
d282d74 Merge pull request #3612 from ryaneorth/cherry-pick-3441-1.18 Merge pull request #3441 from detiber/fixCAPITests 14 October 2020, 20:53:50 UTC
48c9d68 Merge pull request #3441 from detiber/fixCAPITests Improve Cluster API tests to work better with constrained resources 14 October 2020, 20:23:32 UTC
4e0c2ff Merge pull request #3570 from towca/jtuznik/scale-down-after-delete-fix Remove ScaleDownNodeDeleted status since we no longer delete nodes synchronously 09 October 2020, 21:22:20 UTC
f926cf5 Merge pull request #3581 from nitrag/cherry-pick-3308-1.18 Cherry pick 3308 onto 1.18 - Fix priority expander falling back to random although higher priority matches 07 October 2020, 22:36:16 UTC
51379c2 Merge pull request #3308 from bruecktech/fix-fallback Fix priority expander falling back to random although higher priority matches 05 October 2020, 13:53:31 UTC
486f7b2 Merge pull request #3551 from benmoss/capi-backports-1.18 [CA-1.18] CAPI backports for autoscaling workload clusters 01 October 2020, 13:12:53 UTC
ecbead8 Merge pull request #3560 from marwanad/cherry-pick-3558-1.18 Cherry pick #3558 onto 1.18 - Add missing stable labels in the azure template 30 September 2020, 08:50:26 UTC
f04cd2e fix imports 30 September 2020, 05:12:16 UTC
4c6137f add stable labels to the azure template 30 September 2020, 03:39:26 UTC
f3cfe3d move template-related code to its own file 30 September 2020, 03:38:55 UTC
1e0fe7c Update group identifier to use for Cluster API annotations - Also add backwards compatibility for the previously used deprecated annotations 28 September 2020, 17:31:37 UTC
09accf6 [cluster-autoscaler] Support using --cloud-config for clusterapi provider - Leverage --cloud-config to allow for providing a separate kubeconfig for Cluster API management and workload cluster resources - Allow for fallback to previous behavior when --cloud-config is not specified for backward compatibility - Provides a --clusterapi-cloud-config-authoritative flag to disable the above fallback behavior and allow for both the management and workload cluster clients to use the in-cluster config 28 September 2020, 17:31:37 UTC
5753f3f Add node autodiscovery to cluster-autoscaler clusterapi provider 28 September 2020, 17:31:36 UTC
9dc30d5 Convert clusterapi provider to use unstructured Remove internal types for Cluster API and replace with unstructured access 28 September 2020, 17:31:35 UTC
72526e5 Update vendor to pull in necessary new paths for client-go 28 September 2020, 17:31:34 UTC
6c243fe Merge pull request #2950 from enxebre/skip-machinedeployment Let the controller move on if machineDeployments are not available 28 September 2020, 15:28:45 UTC
f4ba55e Merge pull request #3523 from marwanad/cherry-pick-3440-1.18 Cherry-pick #3440 onto 1.18 - optional jitter on initial VMSS VM cache refresh 17 September 2020, 01:16:45 UTC
1e35781 Azure: optional jitter on initial VMSS VM cache refresh On (re)start, cluster-autoscaler refreshes all VMSS instance caches at once and sets those caches' TTL to 5min. All VMSS VM List calls (for VMSS discovered at boot) will then continuously hit the ARM API at the same time, potentially causing regular throttling bursts. Exposing an optional jitter subtracted from the first scheduled refresh delay will splay those calls (except for the first one, at start), while keeping the predictable refresh interval (max. 5min, unless the VMSS changed) after the first refresh. 17 September 2020, 00:55:50 UTC
b24a5be Merge pull request #3519 from marwanad/cherry-pick-3484-1.18 Cherry pick #3484 onto 1.18: Serve stale on ongoing throttling 17 September 2020, 00:10:45 UTC
3e51cc7 Merge pull request #3521 from marwanad/cherry-pick-3437-1.18 Cherry pick #3437 onto 1.18 - Avoid unwanted VMSS VMs caches invalidation 17 September 2020, 00:08:45 UTC
e146e3e call in the nodegroup API to avoid type assertion errors 16 September 2020, 19:17:56 UTC
c153a63 Avoid unwanted VMSS VMs cache invalidations `fetchAutoAsgs()` is called at regular intervals, fetches a list of VMSS, then calls `Register()` to cache each of those. That registration function tells the caller whether that VMSS' cache is outdated (when the provided VMSS, supposedly fresh, differs from the one held in cache) and replaces the existing cache entry with the provided VMSS (which in effect forces a refresh, since that ScaleSet struct is passed by fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache). To detect changes, `Register()` uses `reflect.DeepEqual()` between the provided and the cached VMSS, which always finds them different: cached VMSS were enriched with instance lists (while the provided one is blank, fresh from a simple vmss.list call). That DeepEqual is also fragile, because the compared structs contain mutexes (that may be held or not) and refresh timestamps, attributes that shouldn't be relevant to the comparison. As a consequence, every Register() call causes an indirect cache invalidation and a costly refresh (VMSS VMs List). The number of Register() calls is directly proportional to the number of VMSS attached to the cluster, and can easily trigger ARM API throttling. With a large number of VMSS, that throttling prevents `fetchAutoAsgs` from ever succeeding (and cluster-autoscaler from starting).
i.e.:
```
I0807 16:55:25.875907 153 azure_scale_set.go:344] GetScaleSetVms: starts
I0807 16:55:25.875915 153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: []
E0807 16:55:25.875919 153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"}
E0807 16:55:25.875928 153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
F0807 16:55:25.875934 153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
goroutine 28 [running]:
```
From the [`ScaleSet` struct attributes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go#L74-L89) (manager, sizes, mutexes, refresh timestamps), only sizes are relevant to that comparison. `curSize` is not strictly necessary, but comparing it provides early instance cache refreshes. 16 September 2020, 19:17:46 UTC
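The fix described above amounts to replacing the brittle struct-wide `reflect.DeepEqual()` with a comparison of only the size attributes. A minimal Go sketch of that idea; the struct and field names are simplified stand-ins for the real `ScaleSet` type:

```go
package main

import "fmt"

// scaleSet keeps only the fields relevant here; the real ScaleSet struct also
// carries a manager, mutexes, and refresh timestamps that make
// reflect.DeepEqual unsuitable for change detection.
type scaleSet struct {
	name    string
	minSize int
	maxSize int
	curSize int64
}

// changed compares only the size attributes, so a freshly listed (blank) VMSS
// no longer invalidates a cached entry that was merely enriched with an
// instance list. Hypothetical sketch, not the actual patch.
func changed(cached, discovered scaleSet) bool {
	return cached.minSize != discovered.minSize ||
		cached.maxSize != discovered.maxSize ||
		cached.curSize != discovered.curSize
}

func main() {
	cached := scaleSet{name: "a-testvmss-10", minSize: 1, maxSize: 10, curSize: 3}
	fresh := scaleSet{name: "a-testvmss-10", minSize: 1, maxSize: 10, curSize: 3}
	fmt.Println(changed(cached, fresh)) // same sizes: no cache invalidation
	fresh.maxSize = 20
	fmt.Println(changed(cached, fresh)) // size change: refresh needed
}
```

Comparing `curSize` is optional per the commit message, but keeping it yields an early instance-cache refresh when the set's size drifts.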
6eca014 Azure: serve stale on ongoing throttling k8s Azure clients keep track of previous HTTP 429 responses and Retry-After cool-down periods. On subsequent calls, they will notice the ongoing throttling window and return a synthetic error (without HTTPStatusCode) rather than submitting a throttled request to the ARM API: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssvmclient/azure_vmssvmclient.go#L154-L158 https://github.com/kubernetes/autoscaler/blob/a5ed2cc3fe0aabd92c7758e39f1a9c9fe3bd6505/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/retry/azure_error.go#L118-L123 Some CA components can cope with a temporarily outdated object view when throttled. They call into `isAzureRequestsThrottled()` on client errors to return stale objects from cache (if any) and extend the object's refresh period (if any). But this only works for the first API call (returning HTTP 429). Subsequent calls in the same throttling window (per the Retry-After header) won't be identified as throttled by `isAzureRequestsThrottled` due to their null `HTTPStatusCode`. This can make the CA panic during startup due to a failing cache init, when more than one VMSS call hits throttling. We've seen this cause early restart loops, re-scanning every VMSS due to cold caches on start, keeping the subscription throttled. Practically, this change allows the 3 call sites (`scaleSet.Nodes()`, `scaleSet.getCurSize()`, and `AgentPool.getVirtualMachinesFromCache()`) to serve from cache (and extend the object's next refresh deadline) as they would on the first HTTP 429 hit, rather than returning an error. 16 September 2020, 18:20:11 UTC
45b905c Merge pull request #3452 from nilo19/bug/cherry-pick-3418-1-18 Cherry pick the bug fix in #3418 onto 1.18 23 August 2020, 12:53:41 UTC
3ecf85c Merge pull request #3450 from DataDog/backoff-needs-retries-release-1.18 Cherry-pick onto 1.18: Backoff needs retries 23 August 2020, 12:51:41 UTC
8352911 Fix the bug where nicName declared in the if block shadows its counterpart outside. 23 August 2020, 10:03:53 UTC
09a08c2 Azure cloud provider: backoff needs retries When `cloudProviderBackoff` is configured, `cloudProviderBackoffRetries` must also be set to a value > 0, otherwise the cluster-autoscaler will instantiate a vmssclient with 0 Steps retries, which will cause `doBackoffRetry()` to return a nil response and nil error on requests. The ARM client can't cope with those and will then segfault. See https://github.com/kubernetes/kubernetes/pull/94078 The README.md needed a small update, because the documented defaults are a bit misleading: they don't apply when the cluster-autoscaler is provided a config file, due to: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_manager.go#L299-L308 ... which is also causing all environment variables to be ignored when a configuration file is provided. 23 August 2020, 09:05:23 UTC
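The constraint above boils down to a simple config validation: backoff enabled implies retries > 0. A hypothetical sketch; the field names mirror the Azure provider's config keys, but the validation helper itself is illustrative:

```go
package main

import "fmt"

// azureConfig carries just the two backoff-related fields discussed above.
type azureConfig struct {
	CloudProviderBackoff        bool
	CloudProviderBackoffRetries int
}

// validateBackoff rejects the dangerous combination: backoff enabled with zero
// retries would build a retry helper that returns a nil response and nil error,
// which the ARM client cannot handle.
func validateBackoff(cfg azureConfig) error {
	if cfg.CloudProviderBackoff && cfg.CloudProviderBackoffRetries <= 0 {
		return fmt.Errorf("cloudProviderBackoffRetries must be > 0 when cloudProviderBackoff is enabled")
	}
	return nil
}

func main() {
	fmt.Println(validateBackoff(azureConfig{CloudProviderBackoff: true, CloudProviderBackoffRetries: 0}))
	fmt.Println(validateBackoff(azureConfig{CloudProviderBackoff: true, CloudProviderBackoffRetries: 6}))
}
```

Validating at load time turns a runtime segfault into an immediate, explicit configuration error.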
a86950d Merge pull request #3444 from marwanad/cherry-pick-new-instances Cherry-pick #3311: Add various azure instance types now available 21 August 2020, 00:27:39 UTC
07036f9 Add various azure instance types now available 20 August 2020, 23:57:18 UTC
72178ad Merge pull request #3345 from detiber/backport3177 [CA-1.18] #3177 cherry-pick: Fix stale replicas issue with cluster-autoscaler CAPI provider 29 July 2020, 11:55:48 UTC
0baddce Merge pull request #3346 from detiber/backport3034 [CA-1.18] #3034 cherry-pick: Improve delete node mechanisms for cluster-api autoscaler provider #3034 29 July 2020, 11:35:49 UTC
bc7b29e Merge pull request #3361 from MaciekPytel/1_18_2 Cluster Autoscaler 1.18.2 27 July 2020, 13:26:17 UTC
7906c7f Cluster Autoscaler 1.18.2 27 July 2020, 12:32:56 UTC
f05cec0 Merge pull request #3359 from johanneswuerbach/automated-cherry-pick-of-#3040-upstream-cluster-autoscaler-release-1.18 Automated cherry pick of #3040: Ignore AWS NodeWithImpairedVolumes taint 27 July 2020, 09:36:17 UTC
6438180 Merge pull request #3356 from babilen5/cherry-pick-3124-to-1.18 [CA-1.18] Cherry-pick #3124: Allow small tolerance on memory capacity when comparing nodegroups 27 July 2020, 09:32:17 UTC
2a17e33 Ignore AWS NodeWithImpairedVolumes taint 27 July 2020, 07:35:05 UTC
0e78e0a Cherry-pick #3124: Allow small tolerance on memory capacity when comparing nodegroups 26 July 2020, 14:02:19 UTC
b8c214e Merge pull request #3332 from MaciekPytel/18_vendor_update Updating vendor against git@github.com:kubernetes/kubernetes.git:rele… 24 July 2020, 15:46:22 UTC
11aef35 Updating vendor against git@github.com:kubernetes/kubernetes.git:release-1.18 (ec73e191f47b7992c2f40fadf1389446d6661d6d) 24 July 2020, 15:22:29 UTC
30a4cd3 Merge pull request #3353 from MaciekPytel/cp_override_18 Allow overriding go version when updating vendor 24 July 2020, 15:12:22 UTC
e607eb9 Allow overriding go version when updating vendor This is required because Kubernetes 1.17 lists go1.12 in go.mod, but it doesn't actually compile using go1.12. 24 July 2020, 14:00:09 UTC
f527b39 remove redundant error checks in mark/unmark deletion functions This change removes a few nil checks against resources returned in the Mark and Unmark deletion functions of the cluster-autoscaler CAPI provider. These checks look to see if the returned value for a resource is nil, but the function will not return a nil value unless it also returns an error[0]. We only need to check the error return, as discussed here[1]. [0] https://github.com/kubernetes/client-go/blob/master/dynamic/simple.go#L234 [1] https://github.com/openshift/kubernetes-autoscaler/pull/141/files#r414480960 23 July 2020, 19:16:30 UTC
60151cd Improve delete node mechanisms in cluster-autoscaler CAPI provider This change adds a function to remove the annotations associated with marking a node for deletion. It also adds logic to unmark a node in the event that an error is returned after the node has been annotated but before it has been removed. In the case where a node cannot be removed (e.g. due to minimum size), the node is unmarked before we return from the error condition. 23 July 2020, 19:16:22 UTC
a46a6d1 Rewrite DeleteNodesTwice test to check API not TargetSize for cluster-autoscaler CAPI provider 23 July 2020, 19:15:32 UTC
05ae2be Compare against minSize in deleteNodes() in cluster-autoscaler CAPI provider When calling deleteNodes() we should fail early if the operation could delete nodes below the nodeGroup minSize(). This is one in a series of PRs to mitigate kubernetes#3104 23 July 2020, 19:15:29 UTC
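The fail-early guard above can be sketched as: reject the deletion request outright when it would take the group below its minimum. Names and signatures are illustrative, not the provider's actual API:

```go
package main

import "fmt"

// nodeGroup models just the size bookkeeping needed for the guard.
type nodeGroup struct {
	size    int
	minSize int
}

// deleteNodes fails before doing any work if removing count nodes would
// shrink the group below minSize, so no node gets marked for deletion and
// then rolled back.
func (ng *nodeGroup) deleteNodes(count int) error {
	if ng.size-count < ng.minSize {
		return fmt.Errorf("deleting %d nodes would take group size %d below minSize %d",
			count, ng.size, ng.minSize)
	}
	ng.size -= count
	return nil
}

func main() {
	ng := &nodeGroup{size: 3, minSize: 2}
	fmt.Println(ng.deleteNodes(2)) // rejected: would leave 1 < minSize 2
	fmt.Println(ng.deleteNodes(1)) // allowed: leaves exactly minSize
}
```

Checking before any mutation is what makes the unmark/rollback path in the related commit a rare exception rather than the common case.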
bcdc272 Get replicas always from API server for cluster-autoscaler CAPI provider When getting Replicas() the local struct in the scalable resource might be stale. To mitigate possible side effects, we always want to get fresh replicas. This is one in a series of PRs to mitigate kubernetes#3104 23 July 2020, 19:15:25 UTC
143877b Add mutex to DeleteNodes in cluster-autoscaler CAPI provider This change adds a mutex to the MachineController structure which is used to gate access to the DeleteNodes function. This is one in a series of PRs to mitigate kubernetes#3104 23 July 2020, 19:15:22 UTC
13a6e8e Merge pull request #3337 from gjtempleton/automated-cherry-pick-of-#3185-#3222-upstream-cluster-autoscaler-release-1.18 Automated cherry pick of #3185: cluster-autoscaler: use generated instance types #3222: Fix AWS CA tests for InstanceType generation changes 21 July 2020, 22:54:05 UTC
090e8aa Fix AWS CA tests for InstanceType generation changes 21 July 2020, 14:22:46 UTC
c10b201 cluster-autoscaler: use generated instance types Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com> 21 July 2020, 14:22:44 UTC
86d70ea Merge pull request #3306 from marwanad/1.18-user-agent Fix user-agent string in azure clients 09 July 2020, 07:40:02 UTC
bbb4ab9 fix user agent string 09 July 2020, 06:23:09 UTC
02f033e add UserAgent parameter to clients 09 July 2020, 05:21:53 UTC
a00c70a move more clients out of vendor 09 July 2020, 05:13:44 UTC
fd3a834 Merge pull request #3305 from marwanad/1.18-cherry-pick-3296 Cherry-pick #3296: Fix possible lock issue and add timeouts for queuing of long running operations 09 July 2020, 04:54:55 UTC
d409a7f fix potential lock issue 09 July 2020, 04:12:26 UTC
5a9de63 fix err in log and cleanup other logs 09 July 2020, 04:09:43 UTC
7a8613a add context timeouts for the sync part of long running operations 09 July 2020, 04:07:09 UTC
fd1a370 Merge pull request #3292 from marwanad/1.18-cherry-pick-3221-and-3284 Cherry-pick #3221, #3284 - Async Deletions and 409 Conflicts 06 July 2020, 08:10:51 UTC
8476c5d synchronize instance deletions to avoid 409 conflict errors 06 July 2020, 06:30:51 UTC
05236b9 update tests 06 July 2020, 06:30:51 UTC
fe8b90e update to use async clients 06 July 2020, 06:30:51 UTC
0a252f9 add async clients for deletion 06 July 2020, 06:30:51 UTC
0470c39 Merge pull request #3285 from marwanad/cherry-pick-3277-1.18 Cherry pick #3277: Decrement curSize by the number of instances to be deleted 06 July 2020, 05:56:51 UTC
62ba5b7 decrement cache by the proper amount 05 July 2020, 06:08:15 UTC
f6071d3 move lock to the get method 05 July 2020, 06:07:48 UTC
41890e3 Merge pull request #3281 from marwanad/1.18-cherry-pick-3278 Cherry-pick #3278: use contexts with timeouts in scale set GET calls 05 July 2020, 02:12:49 UTC
571adaa use contexts with timeouts in scale set GET calls 04 July 2020, 21:04:55 UTC
8cd624d Merge pull request #3262 from marwanad/reduce-lock-scope-1.18 Cherry-pick #3261: Reduce instance lock scope in scale sets 02 July 2020, 02:02:02 UTC
c7172ba reduce instance mutex lock scope since it's used by the Nodes() call to refresh cache 01 July 2020, 23:48:12 UTC
6e1eb04 Merge pull request #3244 from nilo19/bug/cherry-pick-3242-1.18 Cherry-pick #3242: Disable increaseSize when the node group is under initialization. 25 June 2020, 17:20:38 UTC
4bf3e47 Disable increaseSize when the node group is under initialization. 25 June 2020, 13:37:19 UTC
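The guard above can be sketched as rejecting scale-ups while provisioning is still in flight. A hypothetical Go sketch; the state values and type names are illustrative, not the Azure provider's actual code:

```go
package main

import "fmt"

// vmssNodeGroup models a scale set with an ARM-style provisioning state.
type vmssNodeGroup struct {
	state string // e.g. "Creating", "Updating", "Succeeded"
	size  int
}

// increaseSize refuses to act while the node group is still initializing, so
// concurrent provisioning and scale-up requests don't race on the target size.
func (ng *vmssNodeGroup) increaseSize(delta int) error {
	if ng.state != "Succeeded" {
		return fmt.Errorf("cannot increase size: node group under initialization (state %q)", ng.state)
	}
	ng.size += delta
	return nil
}

func main() {
	ng := &vmssNodeGroup{state: "Creating", size: 1}
	fmt.Println(ng.increaseSize(2)) // rejected while initializing
	ng.state = "Succeeded"
	fmt.Println(ng.increaseSize(2), ng.size)
}
```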
adb7b42 Merge pull request #3175 from detiber/backport-3057-v1.18 [CA-1.18] #3057 cherry-pick: CAPI: Do not normalize Node IDs outside of CAPI provider 04 June 2020, 07:54:44 UTC
f7aa0e9 Merge pull request #3190 from marwanad/fix-bad-cherry-pick-3141-1.18 1.18: fix bad cherry-pick for #3141 04 June 2020, 00:20:44 UTC
8f880ac fix bad cherry-pick for #3141 03 June 2020, 17:30:10 UTC
9738adb Do not normalize Node IDs outside of CAPI provider 03 June 2020, 14:28:57 UTC
c6d8539 Merge pull request #3172 from detiber/backport-2983-v1.18 [CA-1.18] #2983 cherry-pick: ClusterAPI Provider: Provide fake provider IDs for failed Machines 03 June 2020, 07:42:17 UTC
6d7a013 Merge pull request #3178 from marwanad/cherry-pick-3141-1.18 Cherry-pick #3141: Avoid sending extra deletion calls for in-progress deletions 03 June 2020, 06:46:16 UTC
2162cee add unit test for in progress deletion cases 03 June 2020, 05:39:35 UTC
a948e5c avoid sending unnecessary delete requests if delete is in progress 03 June 2020, 05:39:26 UTC
1a6bb0e Add testing for fake provider IDs 02 June 2020, 16:48:17 UTC
b898c1f Provide fake provider IDs for failed machines 02 June 2020, 16:48:14 UTC
38df5c6 Merge pull request #3171 from detiber/backport-2940-v1.18 [CA-1.18] #2940 cherry-pick CAPI: Stop panicking in newMachineController 02 June 2020, 16:46:15 UTC
fd59e23 CAPI: Stop panicking in newMachineController 01 June 2020, 19:43:38 UTC
9cd7d97 Merge pull request #3105 from xmudrii/cherry-pick-2934 [CA-1.18] #2934 cherry-pick: Fixes 2932: let the CAPI version be discovered 04 May 2020, 14:46:27 UTC
4e108b7 Add the ability to override CAPI group via env variable and discover API version. This change adds detection for an environment variable to specify the group for the clusterapi resources. If the environment variable `CAPI_GROUP` is specified, then it will be used instead of the default. This also decouples the API group from the version and lets the latter be discovered dynamically. 30 April 2020, 17:03:08 UTC
9f9e6ef Merge pull request #3056 from ydye/gpu-label-cherrypick-1.18 Cherry-pick #3019: Correct cloudprovider/azure's GPULabel to "accelerator" 15 April 2020, 08:26:03 UTC
8009f1d Correct cloudprovider/azure's GPULabel to "accelerator" 15 April 2020, 05:33:52 UTC
f6489c5 Merge pull request #3045 from marwanad/cherry-pick-3036-1.18 Cherry-pick #3036: Proactively decrement scale set count during deletion operations 14 April 2020, 10:01:11 UTC