Revision Author Date Message Commit Date
7eb0c29 monitoring/base: add temperature monitoring via NRPE Adds an Icinga check on all non-virtual hosts, to check the temperature via IPMI and NRPE. The check_ipmi_sensors plugin can check different IPMI sensor types. This limits it to temperature. [-T <sensor type>] limit sensors to query based on IPMI sensor type. Examples for IPMI sensor types are 'Fan', 'Temperature', 'Voltage', ... See the output of the FreeIPMI command 'ipmi-sensors -L' and chapter '42.2 Sensor Type Codes and Data' of the IPMI 2.0 spec for a full list of possible sensor types. The available types depend on your particular server and the available sensors there. Bug: T125205 Change-Id: Iefd4e699302a7adc15537cdbdb71bcaa0dced18c 31 May 2017, 01:28:32 UTC
1c6330f admin: add herron to ops group Keith Herron is a new Operations Engineer with root access. Bug: T166587 Change-Id: I82d2e65c291081d647108d14fc71f016fa692ef3 30 May 2017, 22:08:47 UTC
13f218c admins: create shell account for Keith Herron Creates a new shell account for new ops member Keith Herron. Key as pasted on P5508 / T166587#3302076. UID matching the user in LDAP with @wikimedia.org email. Adding the user to groups will be a follow-up change. Bug: T166587 Change-Id: Ib0b6b75a66a99d41ff2a1116c01e27194fcc33bc 30 May 2017, 21:28:41 UTC
71cc1cd Add ezachte's new ssh key We verified this key over a google video hangout Change-Id: I05668bbe7a0200664b3761f90d51c3a3d8f5aed4 30 May 2017, 20:31:06 UTC
1ceb82e Horizon sudo panel: Add policy checks Change-Id: I8aa4fa042945e2ba5068c6684ea8c493ff4a8b73 30 May 2017, 18:34:22 UTC
f9a7b04 Horizon sudo panel: Better distinguish between 'create' and 'modify' actions - Creation now properly reports if you try to create a duplicate rule - Modify has a read-only 'name' field to avoid accidental renames Bug: T162097 Change-Id: I0fcfd599431c94ee09e7da9389e644e5aba19d0d 30 May 2017, 18:34:11 UTC
bd7efd6 Horizon: Add sudo policy panel This works but the error messages are cryptic... we should at least detect naming conflicts and report those failures correctly. Bug: T162097 Change-Id: I383456e8d94bbd9488b09550ef4dc9155e0bc69f 30 May 2017, 18:33:57 UTC
858bbf5 mariadb: Remove db2048 from the list of full reimages, add db2044 db2048 has just been reinstalled with jessie, db2044 will be next. Change-Id: I42358e1acd478f878d5953b2166fb0b632b46c91 30 May 2017, 16:22:53 UTC
f75600f Add jdittrich to analytics-users Bug: T165943 Change-Id: I986f56320c3be0a60826fa5edd172d67db8a6200 30 May 2017, 16:13:56 UTC
9455ff3 install_server: reimage ms-be2001 with stretch This machine is up for decom and no longer in swift anyway, try stretch reimage. Bug: T162609 Change-Id: Ib68a62816c9888f1ea7511b24e7cf234761e2585 30 May 2017, 16:11:10 UTC
9b07973 ChangeProp: Add Redis/Nutcracker connection info Bug: T161710 Change-Id: Iab3c0d43879d79cb39f8e77594f0f110a3d95ebf 30 May 2017, 14:33:41 UTC
6b68236 prometheus: add hwmon collector to default set The hwmon collector reports hardware monitoring and sensor data, including thresholds for temperatures to be considered higher than normal. For example: node_hwmon_temp_celsius{chip="platform_coretemp_1",sensor="temp11"} 30 node_hwmon_temp_crit_celsius{chip="platform_coretemp_1",sensor="temp11"} 95 Add it to the default set. Bug: T125205 Change-Id: I8ec5eb8a4f5b99d8f14e0604df136a2773694b1e 30 May 2017, 14:14:11 UTC
b853497 interface-rps: optional NUMA awareness Change-Id: I2a65c71644bcaec2ec4d4f025ec45abfa176c69a 30 May 2017, 14:01:50 UTC
957297d interface-rps: refactor opts handling a bit Change-Id: I3ddb8adc1a7fe66fcd9663461ae99d7f96853364 30 May 2017, 14:00:59 UTC
b919724 interface-rps: clean up typing/format-string issues Change-Id: I23f76548b5681d60ec692a2ce661d2e6f75f27ca 30 May 2017, 14:00:39 UTC
71d5309 interface-rps: update documentation Change-Id: I58a1f17ef602b72c47deed2a45581008f30b7bcf 30 May 2017, 14:00:07 UTC
826c4d6 calico: Organize the required per DC hieradata Split the DC specific configuration in the respective locations. Also set the codfw configuration correctly Change-Id: I7ab562c9d8c0f176a93a437bc5f6b491e53236b0 30 May 2017, 13:13:28 UTC
99a1a13 Also strip rpcbind/nfs-common deps on jessie installs On a freshly installed jessie system we have two packages which got pulled in during base install by nfs-common/rpcbind, so strip these as well: root@kubetcd2001:~# apt-get autoremove Reading package lists... Done Building dependency tree Reading state information... Done The following packages will be REMOVED: libnfsidmap2 libtirpc1 Bug: T106477 Change-Id: I871a4530fa5164daba5a131425c619b8a2415104 30 May 2017, 12:53:20 UTC
227c20c hieradata: set max_body_size for swift::proxy Bug: T166482 Change-Id: Ie4df1817f5bd1bc031582c12af1de44d32590fec 30 May 2017, 10:46:38 UTC
5607a3d tlsproxy: add support to change max_body_size mediawiki to/from swift communications now happen on https and thus through nginx, we'll need to bump the max body size nginx will let through. Bug: T166482 Change-Id: Ied6a8ef5600925e7621380db97ddf8a551411f09 30 May 2017, 10:43:44 UTC
4b37c7d calico-node: Enable IPv6 support For calico to enable IPv6 support the IP6 variable needs to be defined to a valid IPv6 address and in fact the one present on the host. Use the fact ipaddress6 for that Change-Id: I6c2b230cee52ddfb2b276648a331e52b3673c7f7 30 May 2017, 10:26:28 UTC
1adfab1 Blacklist the parallel port kernel modules stretch's kernel seems to be emitting a warning in QEMU virtual machines (Debian bug #857624) and we don't use nor need parallel ports anyway :) Change-Id: I916da9d05ef5db7c9ab46749ad930c1ff759734b 30 May 2017, 10:10:20 UTC
c1abc0e elasticsearch - correct naming of curator config files Bug: T166154 Change-Id: Ib2099494ed2cc0203e6d5a67cb0f3e8c0d35eb5a 30 May 2017, 09:29:44 UTC
ef76ec6 elasticsearch - configure logging for elasticsearch-curator curator logs are sent to /var/log/elasticsearch/curator.log. All logs in /var/log/elasticsearch are already managed by logrotate, no need for additional configuration. Bug: T166154 Change-Id: Ie4d6b02267fd616b84dee7b1341e32fe92251e7c 30 May 2017, 09:27:06 UTC
ce19f12 Don't symlink systemd service instances In stretch systemctl will explicitly refuse to operate on symlink'ed units. For templated services the symlink isn't necessary because systemctl already DTRTs Bug: T166389 Change-Id: I034eb945189f3aec693a15994e9b29e1b9dcb2a6 30 May 2017, 09:05:31 UTC
0fb1c3f elasticsearch - ignore some warnings related to 5.3.2 upgrade This actually fixes a configuration error. Bug: T163708 Change-Id: I33a30ef46b88e76f4b3039c549097aa117f366c8 29 May 2017, 19:20:02 UTC
c5b262a raid/hpssacli: allow NRPE to execute all commands Bug: T163998 Change-Id: Icb11ea112db468986c4aac48defc1dc7efbd21f5 29 May 2017, 17:12:56 UTC
368f49a elasticsearch - silence some loggers for elastic 5.3 upgrade The ParseField deprecation will be fixed by https://gerrit.wikimedia.org/r/#/c/353100, the RestController deprecation needs an Elastica upgrade. Pending those fixes, we need to hide those deprecations. Change-Id: I37b4368d15c6f1e931d24dae00531f9105066c0d 29 May 2017, 17:05:12 UTC
09bb665 raid/hpssacli: check for cable errors/no batteries T163777 had a case where the following "show status" resulted in an OK: Controller Status: OK Cache Status: Permanently Disabled We already fixed the check to emit a WARNING when Cache Status is not "OK" or "Not Configured" in a previous commit, but it seems there's another thing we could check: "controller slot=N show detail". This had a few more values we could check, and specifically: Cache Status Details: Cable Error Battery/Capacitor Count: 0 Emit CRITICAL for the former, and CRITICAL for the latter if the count is 0 and the argument --no-battery hasn't been passed to us. This is untested on systems that have no battery by design -- hopefully, Cable Error won't be reported on these. Bug: T163998 Change-Id: Iaa099ec825a86445f6e79cfad895e9aec757725c 29 May 2017, 16:37:33 UTC
71d8691 raid/hpssacli: WARN on permanently disabled cache Commit 94f371f1f400a636694eead5a228285314a48db3 skipped the Cache Status line of "controller slot=N show status", in an effort to make it not warn on HP SSD Smart Path configurations where Cache Status was set to "Not Configured". We were checking instead the LD Acceleration Status for each logical drive, which was indicative in the past. We have now seen a case out in the wild where Cache Status is set to "Permanently Disabled", indicative of an issue with the hardware's battery, but LDs all report "LD Acceleration Status: Controller Cache" and "Caching: Enabled". Change the Cache Status check to explicitly check for "OK" or "Not Configured" and otherwise emit a WARNING. Bug: T163998 Change-Id: Ib186044a8d349ca82f112c4f90da0ca20ccad96f 29 May 2017, 16:37:33 UTC
7bdea41 pmacct: update config, push output to Kafka Update our pmacct (nfacctd) config, to be able to push to a Kafka topic with the name of "netflow". This is in turn used by the Druid pipeline (via Tranquillity) and eventually shown into Pivot dashboards. While at it, remove the GeoIP stuff that we weren't using and set bgp_peer_as_skip_subas to true. Also set bgp_aspath_radius to 2, which is lossy (trims the AS path to 2, losing the rest) but which OTOH gives us a better visibility of the amount of traffic exchanged with the hop right after our transits. Change-Id: I3892851c723eb6c0c29fc6df8feb606e0ca88c43 29 May 2017, 16:32:37 UTC
bb43a2a pmacct: fix firewall rules We want routers to be able to communicate to pmacct (netflows and BGP), but router loopbacks aren't in $PRODUCTION_NETWORKS which is what the firewall rules currently allow. Add a list of loopbacks per site, hardcoded within the pmacct manifest. Adding it to the network data Hiera was considered but rejected as an option, since it would fit poorly into that structure and (depending on where they would be added) open up connections to random services from the networking equipment. Hardcoding it into pmacct should be OK for now, and we can think about factoring it out when we'll need to reuse this information elsewhere. Change-Id: I898b7df9d502b842e00676e6efd5a0b2a0467669 29 May 2017, 16:30:07 UTC
34130f5 aptrepo: fix elastic.co's update filter The current grep-dctrl hook emits this error: grep-dctrl: inconsistent modifiers of simple filters. Fix this by moving -e in the sub-filter and only in the first one where it's actually needed. Change-Id: I6e8c9642f62f11042d21b442ba2e0ecf4cb1038a 29 May 2017, 16:23:58 UTC
124142a profile::nutcracker: correctly handle monitoring, refactoring * Add a monitoring_port parameter to set which port to monitor, if any * Add base settings for both redis and memcached, allowing only to override those, allowing a more DRY approach. * Add a parser function to merge the hashes, this could be avoided with either the future parser or a better base class. Change-Id: I35ff9fecc6b342de640e6e37e5b7764797080603 29 May 2017, 16:20:37 UTC
161cb67 Remove obsolete mediawiki::migrate class terbium reimage is complete Change-Id: I27e37985c4d6255840ab812d7c0ff0f1675f52c9 29 May 2017, 15:53:12 UTC
55d7e5d role::scb: fix redis host list for changeprop Change-Id: Ied6c0c8e76ceb1418f63565fd6ca6bf0d2ff0716 29 May 2017, 14:58:01 UTC
7988dea role::scb: add nutcracker for changeprop Also add a more generic profile::nutcracker to use in the general case. Change-Id: I910c095c46ccacaf9f9e52699d9c7373a5a3fb07 29 May 2017, 14:50:45 UTC
7395217 Specify the correct service IPs for kubernetes clusters eqiad and codfw were mistakenly swapped Bug: T165732 Change-Id: I491b278a0643d2b3ccbdc0594568eb50867d9e08 29 May 2017, 14:23:28 UTC
f383987 Utilize the allocated service ips in kubernetes Since we now have allocated service IP ranges in T165732, utilize them. While these IPs are expected to never be found on the wire due to all the NATs done by kubernetes, it is still considered good to allocate some IPs we know and allocate instead of having 192.168/16 IPs configured there for uniformity purposes Bug: T165732 Change-Id: I68982912dce8ddc659d76029bd9078312f7e068c 29 May 2017, 13:51:54 UTC
09265a7 elasticsearch - introduce curator configs to enable / disable replication Bug: T166154 Change-Id: I9a074b31fba7764838a6a7114c47ebc1110e59d3 29 May 2017, 12:52:52 UTC
07e4a05 Kubernetes: Add IPv6 mapped addresses While these boxes are not required to have IPv6, it is prudent to start usage having it configured nicely in order to test and find limitations if any Change-Id: I718748cf52c11b70ae7eb3a111676749545bba5b 29 May 2017, 12:17:37 UTC
8d9224f debug_proxy: increase client_max_body_size The default maximum client request body size is 1M, bump it to 100M. Bug: T165324 Change-Id: Ia4c372a3ccaa66a37bf5d6adc6014166567d1ae4 29 May 2017, 10:05:56 UTC
c9bdc24 prometheus: report puppet agent stats Introduce prometheus::node_puppet_agent to parse puppet state directory and report status (disabled, failed, etc) as metrics. Change-Id: I0eeb80cf68addacdd651b8eb80c159c711498a1f 29 May 2017, 09:31:45 UTC
f1a6942 Record extended account expiry date for piccari Change-Id: I9df9963a04e86b5c820339aead861be9e14819a9 29 May 2017, 09:15:11 UTC
578ba51 Fix spec for various modules base::service seems to now invoke logstash::conf, add 'logstash' to the list of fixtures. nrpe specs were failing with 'unknown init system'. Set the fact to 'systemd'. Change-Id: I0a4e9c41329bf71c9a5dc11ef3c033fb8ce0c74f 29 May 2017, 09:08:31 UTC
fe9f9f1 mariadb: Increase the binlog_cache_size to 10M There seem to be very large transactions that go over the 1M limit and cause slowdown due to the binlog having to be written to disk: https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=codfw%20prometheus%2Fops&var-server=db2049&panelId=19&fullscreen&from=1495793965043&to=1495974377121 Increase the size 10x (from 1M to 10M) to accommodate those large transactions. The transaction size, however, should go down. Change-Id: I9c834a64cfe9b35ab40f58c419fd4fdcd15f8084 29 May 2017, 08:46:55 UTC
ab66323 mysql-client: Install colordiff (neodymium & sarin) The colordiff utility will make it easier to detect differences in SQL output, one of the things done on the client servers. diff --color was only added in diffutils 3.4 (debian >= stretch). Change-Id: I600b582ce5089c23630cd93ad0ee82cc03dfb989 29 May 2017, 08:45:01 UTC
bc525a9 decom ms-be2001 - ms-be2012 Leave systems as role::spare::system to get puppet updates until physical decom happens. Bug: T162785 Change-Id: I5546db2058c15299d4c87b0d5e322744417d1d20 29 May 2017, 08:37:33 UTC
62169d8 mariadb: Decommission db1023 Bug: T166486 Change-Id: I2effec0d29d6b8c79e70bb49e65825eee52b21cc 29 May 2017, 08:33:57 UTC
61e66ef Monitoring: remove spaces from list of interfaces - When generating the list of interfaces in check_eth, several whitespaces are left behind for each filtered interface. This cleans the list without leaving whitespaces. Bug: T166372 Change-Id: Ie642af5146b0273a5fa92d74218029ebcce1d915 29 May 2017, 07:52:59 UTC
5629a48 role::kubernetes::worker: upgrade calico everywhere Bug: T165024 Change-Id: Iaeea2f641a162908bd637aac126814277fb4da77 28 May 2017, 15:25:05 UTC
08e09a2 admins: revoke ladsgroups key temporarily Change-Id: I7fe7544fefbd2cf24be827e965480332ca4ce96a 27 May 2017, 17:33:27 UTC
8affb59 phabricator: reactivate exim ganglia stats I had this commented as a stop-gap for the spam-from-labs issue we had recently. Since the mail alias on the affected labs instance has been fixed and we don't have a replacement for these graphs yet, re-activate it as it was before. Change-Id: I6aa730330cbb3f6846b65723db1a8f7a566bb37f 26 May 2017, 23:14:52 UTC
13b9791 wikistats: install php7.0-xml if on stretch Bug: T165879 Change-Id: I03f04886dbdcfd616cd199d5fd2b5131f76fdd93 26 May 2017, 19:07:05 UTC
535989e planet: add Wikimedia Cloud Services blog feed Change-Id: I781259e180976709f4226b52fac17fdf37dd291a 26 May 2017, 17:52:48 UTC
3778bdf jobrunner: bump up the number of htmlCacheUpdate jobs temporarily We have a long backlog right now, and room for growth in terms of used resources. Change-Id: I9968c2c8d5401e0ca08285d91bd86abe3d431c0e 26 May 2017, 11:48:46 UTC
becb413 profile::redis::multidc_instace: fix template Change-Id: Iebe4d73723a5d6b1ec0af2081ccd285c043d4cce 26 May 2017, 10:22:10 UTC
bddae83 role::jobqueue_redis: add redis instances for changeprop Since we don't need replication given CP is active-active, also allow to define replica: false in redis::shards in order to control such a case. For now these instances will be used for RB automatic blacklisting. Bug: T161710 Change-Id: I4ce64011e546cb94ad104e7abeee39ffc4c0f301 26 May 2017, 10:10:17 UTC
61b1893 Revert "Revert "mariadb: allow reimage of db2048 for upgrade to jessie"" This reverts commit 8e26212a25f7233306eb4136edf4629a3065806a. Change-Id: I4e1fadc7c4352cecf4ec677bd0793d622a66c4aa 26 May 2017, 09:44:06 UTC
32903b5 check_private_data_report: Add Jaime's email Change-Id: Ic2e5fe913d1bbf592909c8c7350d6a4528d8aff4 26 May 2017, 08:57:50 UTC
1a7233b check_cpufreq: various cleanups (typos, shell etc.) Change-Id: Iac6570e9ee2cc4c5b8720352e8d8b165625c0661 26 May 2017, 08:31:35 UTC
4a079a1 install_server: Reimage db2049 as jessie Bug: T165739 Change-Id: Id952289d96750b6c0250c567eed2685786539ca6 26 May 2017, 08:18:46 UTC
cad024c jynus-vimrc: Disable mouse input & enable syntax highlighting Say no to stretch defaults! Change-Id: Ib28cc8f9f778dd967abd1476b74762f848902d42 26 May 2017, 08:06:51 UTC
e755fca wikistats: ensure Apache PHP module is installed before site With the current setup puppet runs and libapache2-mod-php-* gets installed but the PHP site won't work until a manual 'dpkg-reconfigure libapache2-mod-php7.0'. This is because it gets installed before the rest of the Apache packages get pulled in. Add missing Require to fix that, also move things around a bit and make more obvious how the PHP version influences package names. Change-Id: Idb4205bee82051a36c3b22588a2fce98c975e89f 25 May 2017, 21:31:51 UTC
88b1e1d Add kafka_version parameter, s/java_packaage/java_home/ in confluent::kafka::client This allows for configuration of which Java is used, and which Kafka version will be installed. Change-Id: I577ed3f0fd9c95ba7ba5df1b49ce1e1680754dfb 25 May 2017, 19:39:12 UTC
9c86a4c Update kafka.sh wrapper script for Kafka 0.10+ This is safe to be merged now, but it will break some uses of the kafka wrapper script in older Kafka versions. Specifically, kafka console-consumer will not work, as it requires --zookeeper in the older Kafka version, but the newer one expects --bootstrap-server. If you need to use the console-consumer, you can bypass this wrapper script to use /usr/bin/kafka-console-consumer and manually pass --zookeeper $KAFKA_ZOOKEEPER_URL directly. Bug: T166164 Change-Id: Ibd24d7f7dd1d897493ed6a61db4704b72954aa25 25 May 2017, 18:51:18 UTC
42d7e37 phabricator: set alt_host in redirector to "phab" instead of "fab" fab.wmfusercontent.org does not exist in DNS. phab.wmfusercontent.org does. Code in the redirector has: if ($_SERVER['HTTP_HOST'] !== '<%= @phab_host %>' && $_SERVER['HTTP_HOST'] !== '<%= @alt_host %>') { .. redirect .. So this means redirects also happen if HTTP_HOST is actually phab.wmfusercontent.org, but that is not intended. Change-Id: Ic36ea7a9664a9e1f003cefdfe6170cb8210a4f9d 25 May 2017, 16:53:51 UTC
e9958ac r::c::perf - +rxring and +txqueuelen These values have proven useful on 10G bnx2x cards in LVSes, copying over to the caches as well. txqueuelen bump - can only be a good thing (or at worst neutral) under BBR/FQ control. rxring bump - All current text/upload caches show at least some count of receive overruns (the only zeros are on the lower-traffic clusters). At present, all live cache boxes as well as the known upcoming new installs in ulsfo+asia have bnx2x as primary in eth0, so the config basically assumes this for now, until some point in the future when that changes. Change-Id: Id1514abef02ad08a4a982d5c79883907434ae47f 25 May 2017, 15:50:19 UTC
918cb40 Horizon: Add some local_settings for ldap access Change-Id: I15307dbf4a9f489ad95734ca3779bc0d79834288 25 May 2017, 02:14:08 UTC
6da3083 role::aqs: use profile::cassandra Also add a switch to profile::cassandra to allow opening connections to the analytics network in case of need, and fix handling of the case we have no TLS encryption. Change-Id: I727779606ff6c83f95e2dd308df3fa34f14e3f77 25 May 2017, 12:00:05 UTC
39b64f2 role::kubernetes::worker: upgrade calico on one host Let's first test the upgrade on one host only. Bug: T165024 Change-Id: I9866a3148a298f5d0e6501b1781fc5aa0b8816df 25 May 2017, 10:26:25 UTC
6d160c6 admins: add kaldari to analytics-privatedata-users Access to the Hadoop Cluster ("Data Lake") via host stat1002, as requested on ticket. Bug: T166165 Change-Id: I373087f36e00128306af6821cddd94ffe790262c 25 May 2017, 09:54:58 UTC
6a5a14e calico: add new version 2.2.0 Bug: T165024 Change-Id: I667d376f151c33e86dfab3de546a9ee502778e66 25 May 2017, 07:07:43 UTC
24b5e2d phab: comment out include of exim4::ganglia This caused LOTS of cron spam since today. The chain of events was like: < paladox> !log phabricator upgrading mariadb to 10.2 -> /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18: symbolic link to libmariadb.so.3 -> /usr/sbin/exim: /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18: no version information available (required by /usr/sbin/exim) -> /usr/local/bin/exim-to-gmetric" fails But this cron is running every minute and then tries to send mail to root. /etc/aliases says that root should be root@wikimedia.org and then when it _tries_ to mail that sender verification fails: 550-Verification failed for <root@phabricator.phabricator.eqiad.wmflabs>" which then causes "mail delivery failed" messages which get delivered to all of ops. Since Ganglia is deprecated and i wanted to find a quick stop gap before talking about this some more, i am just commenting that class right now. Change-Id: Iff1ce9cdd059a4239309ab93d367e1bfa790b91d 25 May 2017, 01:06:20 UTC
2760d4c authdns::server: move 'include standard' to role Change-Id: Ic63b1f5fdffd81a1df0153d9c6054f0f7dd6a591 25 May 2017, 00:13:44 UTC
a6a2ca2 dnsrecursor: move 'include standard' to role Change-Id: I6fa652760aeafc846e6ab8541449f41f42b621a4 25 May 2017, 00:02:07 UTC
82af211 confd: use logrotate::conf for logrotate Change-Id: I772ffaa22542ca42b4bcd763895e562de7954e30 24 May 2017, 23:44:44 UTC
a79acf1 Gerrit: Remove "" around T\\d+ in gerrit.config When I did a gerrit upgrade and ran this java -jar gerrit.war init -d review_site It removed the quotes around match = "T\\d+" But in the puppet repo we have quotes around it so puppet re-added them - match = T\\d+ + match = "T\\d+" That's a puppet run so it added "" Change-Id: If2c073e5f6bad030672f5eba88156844cdee9b6a 24 May 2017, 21:00:08 UTC
89b277c DHCP/partman: Add DHCP and partman entries for ores200[1-9] Bug:T165170 Change-Id: I2bca295a6baf0fbbe588be6e8fc00d5ea36ebbf1 24 May 2017, 20:42:08 UTC
837db31 Use is_not_bot filter function for eventlogging mysql consumer Bug: T67508 Change-Id: Ib5156dcf582c9829e336e4c58fe7184cc4e590d2 24 May 2017, 19:13:37 UTC
269f8dd Revert "Changes needed for upgrading to Druid 0.10" We have to roll back to 0.9. Hadoop indexing jobs don't work due to java 8 vs java 7 issues. This reverts commit 6fbadbad2658c964e46203dd447707676016f83e. Change-Id: Ide4e009ca1b4534649462f44aa823e356bf89a6a 24 May 2017, 17:42:31 UTC
1fefeeb r::c::perf - FQ outbound flow rate cap @ 1Gbps This should help clamp down brief rate spikes in local intercache and cache:app flows (aka TCP conns) to no more than 10% of interface bandwidth, and throttle some of the higher-traffic remote proxies (e.g. opera, zscaler, google, etc) to more-reasonable fractions of our available outbound host and transit bandwidth as well. Bug: T147569 Change-Id: If26405f59d6646842e7d6d459eb1a12a046c179b 24 May 2017, 17:07:35 UTC
9403053 novastats: Update some reports to use more up-to-date code. The old 'novastats.py' library is no longer needed since mwopenstackclients does most of the querying we want. This patch replaces some old scripts with a few newer example scripts that use mwopenstackclients. Change-Id: I8e054c973dcdaa731688788b5087043308779f96 24 May 2017, 16:12:23 UTC
fef24b7 openstackclients: add an optional project arg to allinstances() Change-Id: I5b18939213dc9a1d4879c1309ff14664e22755a5 24 May 2017, 16:07:39 UTC
fd9a126 Extend expiry dates for two accounts Change-Id: I66a8ffd574fa5cfdf3482438f9cb1b52b6920d7a 24 May 2017, 14:41:50 UTC
aedd882 tools: have maintain-kubeusers chown $HOME/.kube Chown the $HOME/.kube directory to the tool account. Also protect the generate credentials with chattr. Bug: T165875 Change-Id: I37bc5517ab4bd8b646e0c63d9140c1cc2633e9ea 24 May 2017, 14:20:38 UTC
7b4e9f8 Puppet: more reliable run-puppet-agent - when using the --failed-only option, consider last run failed if the summary yaml file is not valid. Change-Id: I886646b4a250b5e7ca30804837d4d73d6fa7dda2 24 May 2017, 14:02:30 UTC
ed6a7e5 Tidy up tools node motd Change-Id: I795ceb01adf3d0e6b0d839b88530ca7b75f840b2 24 May 2017, 13:58:32 UTC
6fbadba Changes needed for upgrading to Druid 0.10 Druid 0.10 depends on Java 8. The package should install it as a dependency when installing, but it will not be the default Java. Setting JAVA_HOME explicitly in daemon env.sh files. Bug: T164008 Change-Id: I83a36bdfbcbe13f1c06471446a9bc5a0aa3e0941 24 May 2017, 13:43:37 UTC
9d0bfe7 logrotate: Fix uwsgi postrotate script Using invoke-rc.d on jessie systems is plain wrong. For now using the service command wrapper and reevaluate in the future if we should switch to systemctl Change-Id: I584532cf9f8c25d858b3035b78de2a3163fdc25e 24 May 2017, 11:39:14 UTC
aa116a7 Specify the correct host for wikidata icinga config Since we changed it to www.wikidata.org, we also need to change the services referencing it Change-Id: I5633a0010ffef232d991de52126c1f3486767d56 24 May 2017, 11:24:12 UTC
2c7089d Decouple wikidata monitoring from the IP address Using the IP address of text-lb.esams was plain wrong as it would alert even when esams was depooled. Change the definition to use the FQDN, qualify the definition correctly and stop using the -I on the command but rather rely on the correct DNS resolution done by -H of check_http Change-Id: I2efa53ea6dc8677cdc3b4769dfdb26a81b09cd33 24 May 2017, 10:53:19 UTC
01bff75 Revert "Use gdb from jessie-backports on jessie" This reverts commit 026c81a83071fca6a16939e3a405d0487e00b8c1. Change-Id: Ia159081fed2e7e6e3e438a85c930d8af386ae710 24 May 2017, 10:44:24 UTC
026c81a Use gdb from jessie-backports on jessie Fixes access to TLS variables in threaded programs (in particular HHVM) (among other changes in four major releases). It's a low level debugging tool, so shouldn't have any impact on existing setups. Change-Id: I7c7540660368846bd9486c83208f0032e4974b49 24 May 2017, 10:40:30 UTC
ab02b19 prometheus: enable qdisc collector on cache hosts Enable queueing discipline (qdisc) metric collection on cache hosts. Traffic control statistics are useful, among other things, to monitor BBR's behavior. Bug: T147569 Change-Id: I490f72235fd873903fdf1baf322cda602d21ae0a 24 May 2017, 10:25:53 UTC
7a3cb90 beta: set profile::etcd::tlsproxy::read_only=false 79cfdefd50 added a new setting so we can switch etcd readonly whenever doing a switchover (T159687). In production the setting is applied via the hieradata role hierarchy: hieradata/role/common/configcluster.yaml hieradata/role/eqiad/configcluster.yaml However on deployment-prep the role hierarchy is not looked up causing on deployment-etcd-01: Could not find data item profile::etcd::tlsproxy::read_only in any Hiera data file... Set the value to false, assuming on beta we want etcd to be writable. Change-Id: I99f0bf11112de2a81bbbf131ece01eaea7871227 24 May 2017, 10:04:31 UTC
776b1d5 raid: Implement the option to check write cache policies Enable all databases to enforce WriteBack policy, throwing an error (not a warning) if they go down to WriteThrough; they will probably cause performance problems and have something anomalous like a BBU problem or something else. Ignore those alerts on the handle-raid scripts to avoid creating failed disk tasks. This currently only works on megacli systems, although the plan is to deploy it to all systems where the feature makes sense (megacli are the ones that are older and cause daily issues). Bug: T166108 Change-Id: I36fc6fb115c2d9e9d88391ea5a9230d6389f781c 24 May 2017, 09:17:05 UTC
b182709 Enable memcache-based Thumbor broken thumbnail throttling Bug: T151065 Makes memcache run on the thumbor machines, as well as nutcracker to pool connections going to both. This colocation makes sense because the memcache use is minimal and thumbor's throttling feature is tolerant of memcache being down. Change-Id: Idc5f10324e1c0877901393949acccc618b340dde 24 May 2017, 08:34:16 UTC
ef3e76b Puppet: run-puppet-agent improvements - fix bug when --failed-only is set - do not imply --quiet with --failed-only Change-Id: Iaf441764d3893f560cbcc330a182d2eb2ad264ce 23 May 2017, 22:11:30 UTC
92b5ba2 r::c::perf - raise fq flow_limit to 300 Still seeing some rare flow_plimit drops, generally on a single queue and only affecting ~20% of cache boxes. Hopefully this will make them truly-rare (there will probably always be bad edge cases where these drops are the most appropriate course of action). Bug: T147569 Change-Id: Ib0b19da2cdd53f8b8928165d770c49f16a6126bc 23 May 2017, 21:26:41 UTC
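Note on revisions 71d8691 and 09bb665 above: the severity rules those two messages describe for the hpssacli RAID check can be sketched roughly as follows. This is only an illustration of the stated rules, not the actual check script. The function names (parse_fields, evaluate), the "Key: Value" parsing, and the choice to ignore a missing Cache Status line are all assumptions made for this sketch.

```python
# Hypothetical sketch of the rules in commits 71d8691 and 09bb665.
# Names and parsing are assumptions, not the real check's code.
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios/Icinga exit codes


def parse_fields(text):
    """Turn 'Key: Value' lines into a dict (assumed output shape)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def evaluate(status_output, detail_output, no_battery=False):
    """Return (level, messages) for one controller slot.

    status_output: text of 'controller slot=N show status'
    detail_output: text of 'controller slot=N show detail'
    no_battery:    mirrors the --no-battery argument from the commit message
    """
    status = parse_fields(status_output)
    detail = parse_fields(detail_output)
    level, messages = OK, []

    # 71d8691: anything other than "OK" or "Not Configured" is a WARNING.
    # Assumption: a missing Cache Status line is simply ignored here.
    cache_status = status.get("Cache Status")
    if cache_status is not None and cache_status not in ("OK", "Not Configured"):
        level = max(level, WARNING)
        messages.append("Cache Status: " + cache_status)

    # 09bb665: a cable error is CRITICAL.
    if detail.get("Cache Status Details") == "Cable Error":
        level = CRITICAL
        messages.append("Cache Status Details: Cable Error")

    # 09bb665: zero batteries is CRITICAL unless --no-battery was passed.
    if not no_battery and detail.get("Battery/Capacitor Count") == "0":
        level = CRITICAL
        messages.append("Battery/Capacitor Count: 0")

    return level, messages
```

With the sample output quoted in 09bb665 ("Controller Status: OK" plus "Cache Status: Permanently Disabled"), this sketch yields WARNING from the status output alone, and CRITICAL once the detail output reports a cable error or a battery count of 0.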