https://github.com/kubernetes/autoscaler

Revision Author Date Message Commit Date
1ba66f5 Merge pull request #3660 from MaciekPytel/ca-1.19.1 Cluster Autoscaler 1.19.1 02 November 2020, 14:52:56 UTC
3ad3393 Cluster Autoscaler 1.19.1 02 November 2020, 13:49:12 UTC
cae2f2f Merge pull request #3654 from detiber/backportEM-1.19 [cluster-autoscaler] Backport fixes for packet provider to release-1.19 02 November 2020, 13:38:52 UTC
1e86266 Add price support in Packet 30 October 2020, 16:42:44 UTC
a86274c Add support for multiple nodepools in Packet 30 October 2020, 16:41:08 UTC
3ce6ab8 Add support for scaling up/down from/to 0 nodes in Packet 30 October 2020, 16:41:04 UTC
4991afb add Packet cloudprovider owners Signed-off-by: Marques Johansson <marques@packet.com> 30 October 2020, 16:36:58 UTC
dbca528 Merge pull request #3627 from nilo19/cleanup/cherry-pick-3532-1.19 Cherry-pick #3532 onto 1.19: Azure: support allocatable resources overrides via VMSS tags 19 October 2020, 03:54:14 UTC
816a742 Azure: support allocatable resources overrides via VMSS tags 19 October 2020, 03:02:40 UTC
814935b Merge pull request #3597 from ryaneorth/cherry-pick-3570-1.19 Merge pull request #3570 from towca/jtuznik/scale-down-after-delete-fix 15 October 2020, 08:34:24 UTC
a02e9a5 Merge remote-tracking branch 'upstream/cluster-autoscaler-release-1.19' into cherry-pick-3570-1.19 14 October 2020, 21:16:55 UTC
fad6360 Merge pull request #3613 from ryaneorth/cherry-pick-3441-1.19 Merge pull request #3441 from detiber/fixCAPITests 14 October 2020, 20:53:50 UTC
e110878 Merge pull request #3441 from detiber/fixCAPITests Improve Cluster API tests to work better with constrained resources 14 October 2020, 20:24:47 UTC
2b98e02 Merge pull request #3570 from towca/jtuznik/scale-down-after-delete-fix Remove ScaleDownNodeDeleted status since we no longer delete nodes synchronously 09 October 2020, 21:21:16 UTC
1529a20 Merge pull request #3550 from benmoss/capi-backports-1.19 [CA-1.19] CAPI backports for autoscaling workload clusters 01 October 2020, 13:10:53 UTC
550e6ef Merge pull request #3557 from benmoss/backport-3416 [CA-1.19] Backport #3416 30 September 2020, 21:48:53 UTC
ce448be Merge pull request #3559 from marwanad/cherry-pick-3558 Cherry pick #3558 onto 1.19 - Add missing stable labels in the azure template 30 September 2020, 08:50:26 UTC
f3f3772 gofmt 30 September 2020, 03:35:51 UTC
a76685e add stable labels to the azure template 30 September 2020, 03:23:02 UTC
ec4bf55 move template-related code to its own file 30 September 2020, 03:22:01 UTC
bafb887 Remove go.mod from local copy of gophercloud Replacing the module path in go.mod did not solve the issue preventing hack/update-vendor.sh from running properly, so it will have to be deleted. 29 September 2020, 16:19:35 UTC
520b117 Merge pull request #3161 from detiber/fixCAPIAnnotations Update group identifier to use for Cluster API annotations 28 September 2020, 15:07:11 UTC
60376f9 Merge pull request #3203 from detiber/configSplit2 [cluster-autoscaler] Support using --cloud-config for clusterapi provider 28 September 2020, 15:06:20 UTC
d28f118 Merge pull request #3314 from detiber/autoDiscovery [cluster-autoscaler][clusterapi] Add support for node autodiscovery to clusterapi provider 28 September 2020, 15:03:32 UTC
4eb203a Merge pull request #3312 from detiber/unstructured [cluster-autoscaler][clusterapi] Remove internal types in favor of unstructured 28 September 2020, 15:02:54 UTC
e1979a9 Merge pull request #3522 from marwanad/cherry-pick-3440-1.19 Cherry pick #3440 onto 1.19 - optional jitter on initial VMSS VM cache refresh 17 September 2020, 00:32:45 UTC
7dfcf2d Azure: optional jitter on initial VMSS VM cache refresh On (re)start, cluster-autoscaler will refresh all VMSS instance caches at once, and set their TTL to 5 min. All VMSS VM List calls (for VMSS discovered at boot) will then continuously hit the ARM API at the same time, potentially causing regular throttling bursts. Exposing an optional jitter, subtracted from the delay before the first scheduled refresh, will splay those calls (except for the first one, at start), while keeping the predictable (max. 5 min, unless the VMSS changed) refresh interval after the first refresh. 17 September 2020, 00:11:21 UTC
1e90d80 Merge pull request #3518 from marwanad/cherry-pick-3484-1.19 Cherry pick #3484 onto 1.19: Serve stale on ongoing throttling 17 September 2020, 00:10:45 UTC
bba156c Merge pull request #3520 from marwanad/cherry-pick-3437-1.19 Cherry pick #3437 onto 1.19 - Avoid unwanted VMSS VMs caches invalidations 17 September 2020, 00:08:45 UTC
c8d9d01 call in the nodegroup API to avoid type assertion errors 16 September 2020, 19:02:14 UTC
06974e7 Avoid unwanted VMSS VMs caches invalidations `fetchAutoAsgs()` is called at regular intervals, fetches a list of VMSS, then calls `Register()` to cache each of those. That registration function will tell the caller whether that VMSS's cache is outdated (when the provided VMSS, supposedly fresh, is different from the one held in cache) and will replace the existing cache entry with the provided VMSS (which in effect will require a forced refresh, since that ScaleSet struct is passed by fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache). To detect changes, `Register()` uses a `reflect.DeepEqual()` between the provided and the cached VMSS, which always finds them different: cached VMSS were enriched with instance lists (while the provided one is blank, fresh from a simple vmss.list call). That DeepEqual is also fragile because the compared structs contain mutexes (that may be held or not) and refresh timestamps, attributes that shouldn't be relevant to the comparison. As a consequence, every Register() call causes an indirect cache invalidation and a costly refresh (VMSS VMS List). The number of Register() calls is directly proportional to the number of VMSS attached to the cluster, and can easily trigger ARM API throttling. With a large number of VMSS, that throttling prevents `fetchAutoAsgs` from ever succeeding (and cluster-autoscaler from starting). 
For example: ``` I0807 16:55:25.875907 153 azure_scale_set.go:344] GetScaleSetVms: starts I0807 16:55:25.875915 153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: [] E0807 16:55:25.875919 153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"} E0807 16:55:25.875928 153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled" F0807 16:55:25.875934 153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled" goroutine 28 [running]: ``` From [`ScaleSet` struct attributes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go#L74-L89) (manager, sizes, mutexes, refresh timestamps) only sizes are relevant to that comparison. `curSize` is not strictly necessary, but comparing it will provide early instance cache refreshes. 16 September 2020, 19:01:46 UTC
37b9378 Azure: serve stale on ongoing throttling k8s Azure clients keep track of previous HTTP 429 and Retry-After cool down periods. On subsequent calls, they will notice the ongoing throttling window and will return a synthetic error (without HTTPStatusCode) rather than submitting a throttled request to the ARM API: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/clients/vmssvmclient/azure_vmssvmclient.go#L154-L158 https://github.com/kubernetes/autoscaler/blob/a5ed2cc3fe0aabd92c7758e39f1a9c9fe3bd6505/cluster-autoscaler/vendor/k8s.io/legacy-cloud-providers/azure/retry/azure_error.go#L118-L123 Some CA components can cope with a temporarily outdated object view when throttled. They call in to `isAzureRequestsThrottled()` on client errors to return stale objects from cache (if any) and extend the object's refresh period (if any). But this only works for the first API call (returning HTTP 429). Subsequent calls in the same throttling window (per Retry-After header) won't be identified as throttled by `isAzureRequestsThrottled` due to their zero `HTTPStatusCode`. This can make the CA panic during startup due to a failing cache init, when more than one VMSS call hits throttling. We've seen this causing early restart loops, re-scanning every VMSS due to cold cache on start, keeping the subscription throttled. Practically, this change allows the 3 call sites (`scaleSet.Nodes()`, `scaleSet.getCurSize()`, and `AgentPool.getVirtualMachinesFromCache()`) to serve from cache (and extend the object's next refresh deadline) as they would do on the first HTTP 429 hit, rather than returning an error. 16 September 2020, 18:02:21 UTC
0971e3c Merge pull request #3453 from nilo19/bug/cherry-pick-3418-1.19 Cherry pick the bug fix in #2418 onto 1.19 23 August 2020, 12:53:40 UTC
4a87180 Merge pull request #3451 from DataDog/backoff-needs-retries-release-1.19 Cherry-pick onto 1.19: Backoff needs retries 23 August 2020, 12:51:41 UTC
fcd4122 Fix the bug that nicName in the if block shadows the counterpart outside. 23 August 2020, 10:06:30 UTC
e81c026 Azure cloud provider: backoff needs retries When `cloudProviderBackoff` is configured, `cloudProviderBackoffRetries` must also be set to a value > 0, otherwise the cluster-autoscaler will instantiate a vmssclient with 0 Steps retries, which will cause `doBackoffRetry()` to return a nil response and nil error on requests. The ARM client can't cope with those and will then segfault. See https://github.com/kubernetes/kubernetes/pull/94078 The README.md needed a small update, because the documented defaults are a bit misleading: they don't apply when the cluster-autoscaler is provided a config file, due to: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_manager.go#L299-L308 ... which is also causing all environment variables to be ignored when a configuration file is provided. 23 August 2020, 09:02:52 UTC
db76938 Merge pull request #3381 from vivekbagade/cluster-autoscaler-release-1.19 Cluster Autoscaler 1.19.0 30 July 2020, 14:58:31 UTC
c753e45 Cluster Autoscaler 1.19.0 30 July 2020, 14:36:47 UTC
bfd3bc1 Merge pull request #3380 from krzysied/alt_name_fix VPA - Adding altName to e2e webhook cert 30 July 2020, 12:04:30 UTC
d4a1755 adding altName to e2e webhook cert 30 July 2020, 11:35:08 UTC
56e2bb5 Merge pull request #3339 from ysy2020/cluster-autoscaler-huaweicloud Add huaweicloud to list of supported cloud providers 30 July 2020, 09:54:31 UTC
d2afd1c Merge pull request #3374 from bskiba/fix-cert Add subjectAltName to VPA webhook cert 29 July 2020, 11:25:47 UTC
38da3c0 Add subjectAltName to VPA webhook cert 29 July 2020, 09:30:44 UTC
106a822 Merge pull request #3344 from DataDog/jb/vmss-min-max cluster-autoscaler: ignore nodegroups with min/max tag issues 28 July 2020, 22:55:47 UTC
9c7d8a6 Merge pull request #3363 from ellistarn/kubemark [cluster-autoscaler] Fixes to Kubemark integration. 28 July 2020, 13:13:08 UTC
c7483d9 cluster-autoscaler: ignore nodegroups with min/max tag issues Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com> 28 July 2020, 09:15:23 UTC
2d8ba32 Merge pull request #3367 from bskiba/new-webhook Switch VPA admission controller to v1 admissionregistration API 28 July 2020, 08:39:08 UTC
e119e8e Testing local kubemark changes 27 July 2020, 23:50:35 UTC
b7922d7 Switch VPA admission controller to v1 admissionregistration API 27 July 2020, 19:42:58 UTC
4882a2a Merge pull request #3227 from povilasv/namespaced VPA: Allow limiting VPA deployment to a namespace 27 July 2020, 08:20:17 UTC
774390e Merge pull request #3351 from towca/jtuznik/upcoming-annotation Add an annotation identifying upcoming nodes 24 July 2020, 13:54:22 UTC
48fbe6a Merge pull request #3350 from MaciekPytel/go_version_override Allow overriding go version when updating vendor 24 July 2020, 13:42:22 UTC
1ffd20e Merge pull request #3352 from towca/jtuznik/FitsAnyNodeMatching Add the ability to check if a pod fits onto any node matching a function 24 July 2020, 13:40:22 UTC
3958c66 Add an annotation identifying upcoming nodes 24 July 2020, 13:20:34 UTC
b56af8a Allow overriding go version when updating vendor This is required because Kubernetes 1.17 lists go1.12 in go.mod, but it doesn't actually compile using go1.12. 24 July 2020, 13:19:16 UTC
6e38b2a Add the ability to check if a pod fits onto any node matching a function 24 July 2020, 13:15:23 UTC
8c5a0b7 fixes after review 24 July 2020, 06:35:13 UTC
c4592cf fixes after review 24 July 2020, 06:32:40 UTC
a87e70c fixes after review 24 July 2020, 06:31:45 UTC
0e5f460 Merge pull request #3343 from bskiba/master Fix for VPA e2e vendor update 23 July 2020, 09:55:39 UTC
99f1df5 Fix for VPA e2e vendor update 23 July 2020, 09:22:03 UTC
3850cc5 Merge pull request #3225 from bskiba/vpa-to-k8s-1-18-3 Update VPA k8s dependencies to 1.18.3 22 July 2020, 15:52:05 UTC
ebbc578 Merge pull request #3340 from bskiba/fix-limit-capping Only cap to limit in RequestOnly mode 22 July 2020, 13:10:05 UTC
2d189bb Merge pull request #3155 from tghartland/magnum-nodegroups Support Magnum node groups 22 July 2020, 09:36:05 UTC
d4c5acc Merge pull request #3328 from gjtempleton/CA-AWS-Static-Instance-List-Update CA - AWS CloudProvider - Static Instance List update 21 July 2020, 22:16:06 UTC
aa3008c fixed errors of travis-ci test 21 July 2020, 19:58:04 UTC
b0a64f9 Only cap to limit in RequestOnly mode 21 July 2020, 17:13:10 UTC
da09d22 Add huaweicloud to list of supported cloud providers 21 July 2020, 14:51:57 UTC
3edbe0c Fixes needed after e2e deps update 21 July 2020, 14:32:08 UTC
428ddcc Update VPA e2e k8s dependencies to 1.18.3 21 July 2020, 14:31:49 UTC
4ae09e6 Update magnum README 21 July 2020, 13:21:55 UTC
f65c9cd Update magnum tests 21 July 2020, 13:21:55 UTC
8b70591 Update magnum utils 21 July 2020, 13:21:55 UTC
7328895 Update magnum cloud provider 21 July 2020, 13:21:55 UTC
04a6084 Update magnum node group to use new manager 21 July 2020, 13:21:55 UTC
fa9efa2 Merge pull request #3330 from uswitch/fix_boundary_panic VPA: fix nil pointer in getBoundaryRecommendation when no limits set 21 July 2020, 11:59:15 UTC
fc9f92d Merge pull request #3326 from marwanad/fix-azure-useragent Configure user agent properly for azure clients 21 July 2020, 11:57:15 UTC
bfc9b09 Merge pull request #3331 from MaciekPytel/revert_huawei Revert "Merge pull request #3099 from ysy2020/cluster-autoscaler-huaw… 21 July 2020, 11:33:14 UTC
f35dcfa Revert "Merge pull request #3099 from ysy2020/cluster-autoscaler-huaweicloud" This reverts commit 84cbb3bc7923d56c6cffe1b117cb89cb91820243, reversing changes made to 6d6903f2f9e680ec50d41bf15af2df1ca0160ca0. 21 July 2020, 11:00:56 UTC
0a3cd33 Fixes after updating k8s.io deps 21 July 2020, 09:31:09 UTC
56b2a10 Update VPA generated code 21 July 2020, 09:31:00 UTC
221b2ab Update VPA Kubernetes dependencies to 1.18.3 21 July 2020, 09:29:07 UTC
b324bde fix nil pointer in getBoundaryRecommendation when no limits set 21 July 2020, 09:27:51 UTC
6b3dd68 Update VPA to go1.14 21 July 2020, 09:12:00 UTC
65ae5e4 CA - AWS CloudProvider - Static Instance List update Adds a number of new instance types/families to the static list 20 July 2020, 23:19:30 UTC
2d2215e configure user agent properly for azure clients 20 July 2020, 16:44:21 UTC
84cbb3b Merge pull request #3099 from ysy2020/cluster-autoscaler-huaweicloud Add huaweicloud to list of supported cloud providers 20 July 2020, 15:32:55 UTC
6d6903f Merge pull request #3327 from vivekbagade/master Bump CA version to 1.19.0-beta.1 20 July 2020, 12:08:51 UTC
540ef7c Bump CA version to 1.19.0-beta.1 20 July 2020, 11:20:38 UTC
b8da58b Merge pull request #3325 from vivekbagade/master Candidate for 1.19-beta 20 July 2020, 10:42:51 UTC
671a2fd Updating vendor against git@github.com:kubernetes/kubernetes.git:master (23b66eaabd3a535dbee6474638c5bf51e78fbcfa) 17 July 2020, 09:16:05 UTC
10b3f01 Merge pull request #3320 from benmoss/sample-manifest Add sample deployment/service account manifest for CAPI 15 July 2020, 19:52:37 UTC
d97e3dc Add sample deployment/service account manifest Based on https://notes.elmiko.dev/2020/05/22/kubernetes-autoscaler-capd.html 15 July 2020, 19:31:13 UTC
465d02a Merge pull request #3028 from johanneswuerbach/configurable-limit-scaling VPA: Configurable container limit scaling 15 July 2020, 07:56:37 UTC
ddb8ec8 Merge pull request #3311 from nkiraly/generate-azure-instance-types-2020-07 Add various azure instance types now available 10 July 2020, 05:53:48 UTC
ea4c12a Add various azure instance types now available 10 July 2020, 03:43:45 UTC
c452eee Merge pull request #3307 from nilo19/cleanup/enrich-cloudprovider-test Azure: Enrich unit tests for azure_cloud_provider 09 July 2020, 07:42:03 UTC
4decfcd Enrich unit tests for azure_cloud_provider 09 July 2020, 06:37:10 UTC
ad95c9d Merge pull request #3301 from MorrisLaw/add-morrislaw-to-approvers Add MorrisLaw to list of approvers for DigitalOcean 08 July 2020, 22:18:35 UTC
b83461d Add MorrisLaw to list of approvers for DigitalOcean 08 July 2020, 21:55:01 UTC