Revision Author Date Message Commit Date
603a5b6 check_ipmi_temp: turn off sel checking Call check_ipmi_sensor with --nosel to turn off system event log checking. We do not want to get criticals on old events. Bug: T125205 Change-Id: Ia97fa0bf112a1d64e56ed7778108007500f4683f 06 June 2017, 10:27:45 UTC
0c95f53 Scap3: deploy jobrunner with scap3 Bug: T129148 Change-Id: I9b60db40222172da63f4e29cc64bbbf49567a960 06 June 2017, 09:03:33 UTC
c795a00 network: Add kubernetes pod/service IPs We will probably want various services to be accessed by kubernetes pods. Hence add the IP ranges in network::subnets data for eqiad, codfw production as well as the staging cluster. Also add the kubernetes service IPs. These are actually mostly informational as no traffic having a destination or source kubernetes service IP is ever expected to be on the physical wire, but it's probably useful to have those there as well. Change-Id: I4b6c1b2f5f06f10a1adac76221c5790061e438ef 06 June 2017, 08:22:57 UTC
8f16c4b Correct pageview_hourly loading scheme on pivot home Bug:T167068 Change-Id: I4fd6f60344c27bda086f937b85d793036e300e62 06 June 2017, 08:14:49 UTC
d0ea43b Do not confine LLDP fact to physical/non-VMs It's actually useful to establish the VM->host relationship and there is no reason why it should be confined anyway. However, only set the (automatic) monitoring parent for the physical hosts (resulting in no changed behavior for those). The rationale behind this is that we want to alert for each individual VM when the Ganeti hosts running them die, as: a) just a host DOWN alert for the VM node is too inconspicuous, b) it's usually the case that VMs can be relocated to other nodes. Change-Id: I8203558efc05e5fb88bff619f3b893d16de6db62 06 June 2017, 08:04:07 UTC
f8c7814 dynamicproxy: Centralise error page template and use it This was originally based on the Varnish errorpage, but I'm starting with dynamicproxy first because it's a simple case where the file is just on disk. Later patches will re-use some of this inside the VCL use cases. This change also affects toollabs proxy, since it uses dynamicproxy. Defining 'mediawiki::errorpage' to abstract default parameters and simplify usage in multiple places. Bug: T113114 Change-Id: Id8576df7ca03823256ce824f31dd99e3466ae226 05 June 2017, 21:18:30 UTC
4ad1c8d Revert "dynamicproxy: Centralise error page template and use it" This reverts commit f3a1e04bb5a1cebf80ac93676afcde84c8cfb2df. Change-Id: I7a1e92fe019d22f38b50f4a56a39db1ee42ff19a 05 June 2017, 20:43:52 UTC
6640d21 Revert "dynamic proxy errorpage: s/title/pagetitle/" This reverts commit 581fbf6981d537c31935e8da6c336bff385a4433. Change-Id: I95a3dcdc2f875784a53436d0e71013c3b331ab94 05 June 2017, 20:39:40 UTC
581fbf6 dynamic proxy errorpage: s/title/pagetitle/ Change-Id: I9540d64503e9723d54227b7447bb2a22fba84689 05 June 2017, 20:29:53 UTC
f3a1e04 dynamicproxy: Centralise error page template and use it This was originally based on the Varnish errorpage, but I'm starting with dynamicproxy first because it's a simple case where the file is just on disk. Later patches will re-use some of this inside the VCL use cases. This change also affects toollabs proxy, since it uses dynamicproxy. Defining 'mediawiki::errorpage' to abstract default parameters and simplify usage in multiple places. Bug: T113114 Change-Id: I764d00c7b40ad0931590f04ed2f76ecbd84b33ba 05 June 2017, 20:01:12 UTC
2246a55 flake8 fixes for E305 Fix "E305 expected 2 blank lines after class or function definition" warnings in preparation for flake8 upgrade in tox tests. Change-Id: I75425fc791f8745e02f5663aa0c73a6f057ecf2e 05 June 2017, 19:48:29 UTC
50cdac9 planet: remove "ja" and "ca" (empty), add link to new "el" Change-Id: Idda7f0213e0778ecbbc7af494a698b34c13d2d98 05 June 2017, 19:10:28 UTC
3c593da jenkins: Install java 8 on stretch and greater If on stretch or greater, set Java version to 8, otherwise keep installing version 7 as before. Bug: T166611 Bug: T162828 Change-Id: If6a134bbaa3bb879a11921b6c667932c198da9a2 05 June 2017, 17:49:19 UTC
0b66168 Phabricator: Fix colour for Unbreak Now tasks It seems that prod is not affected; most likely someone edited it through the web UI. But this can be seen on https://phab-01.wmflabs.org/T145 . Changes indigo to pink, which is the colour used on prod. Change-Id: I15bfbeca0a3664d4e179c8c392a650395c6d1c7f 05 June 2017, 17:06:05 UTC
1e5b3e6 planet: add Wikimedia Scoring Platform blog feed Change-Id: Ia87168968f89dd67eb3a920219390c4e97c9e539 05 June 2017, 17:03:15 UTC
ff7a8d5 hieradata: turn off nginx proxy_request_buffering Bug: T166806 Change-Id: I5a8d365263c35e33b4d703cff3f9baa19d27708b 05 June 2017, 16:19:56 UTC
369f35d tlsproxy: selectively disable request buffering This is needed to completely turn off request buffering to disk. Bug: T166806 Change-Id: I8334714809112f5959fbc250d44d5ef1f9136e7d 05 June 2017, 16:14:42 UTC
2ea1a97 Icinga: skip another NRPE error in raid handler Bug: T166962 Change-Id: I3a75502dfa40357dc1633a34af7353bc95892de6 05 June 2017, 15:27:53 UTC
4b4bda0 hieradata: set nginx client_max_body_size 0 for swift Bug: T166806 Change-Id: I1557982ebdb1db280bf510b69d473731e27baa60 05 June 2017, 13:53:00 UTC
d3996d3 etcd: big old roles/auth cleanup We decided not to use the builtin auth system anymore for etcd version 2, and most of what we did here wouldn't apply to etcd v3 anyways, so remove the old cruft as well as our managing tools and classes as they're mostly useless now. When we move to etcd 3, we might re-evaluate our stance, but all commands and outputs will be changing as well. So we will need to re-do the work anyways. For now, let's remove some files from the puppet repository! Change-Id: I1013c7ae2a8643472df28fc8b51527181520bce7 05 June 2017, 12:18:49 UTC
8b3ffd0 Add webrequest dataset to pivot configuration Sampled webrequest is now loaded in Druid. This patch updates pivot configuration to show the dataset with a proper name and comments. Bug: T166967 Change-Id: I8d1df2de39e56c93695414171bb24f4d6dec45b8 05 June 2017, 08:35:39 UTC
9528f42 check_ipmi_temp: set check timeout to 60 seconds In a few cases checks are still timing out. Increase timeout from 30 to 60 seconds. Bug: T125205 Change-Id: I0b2edf162477de25a887bbfb7f18ab3900617555 04 June 2017, 10:20:20 UTC
cbd09cd ircecho: notify service on config change Fix the need for a manual restart of the ircecho service when the configuration is changed by making the config file notify the Service resource that is created by Base::Service_unit. Also change the require for Base::Service_unit to wait on provisioning of the actual python script. There is no need to require the config anymore because of the notify relationship. Change-Id: I2621d741f44278edd9ff428fd2d864b4d4ebc292 03 June 2017, 15:20:30 UTC
a9cc2c0 Fix indentation of Gerrit downtime page Failing in style. Change-Id: I01e56c390cab94b4684db1e03c5ee9641602aee6 03 June 2017, 00:15:10 UTC
1497603 Fix typo on Gerrit downtime page This message sure looks like it was written in panic. Change-Id: I1fd52be400f28df568bba5aaaf77b3ea0d04fb52 03 June 2017, 00:13:27 UTC
2480598 Gerrit: Fix wrong syntax in ~/.gitconfig I mistakenly wrote gc.<config>=<value> instead of [gc] <config> = <value> Change-Id: I825a7c9f0d7f708ba3a838bf4a59aa0bc7b8ba63 02 June 2017, 22:40:11 UTC
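A minimal Python sketch of the syntax difference that 2480598 fixes. The dotted form `gc.<config>=<value>` is how a key is addressed on the git command line, whereas the config file itself expects a `[gc]` section header followed by `<config> = <value>` lines. The helper name `render_gitconfig_section` is hypothetical and is not part of the puppet repository; the two keys and their values are the ones named in b3b647e.

```python
def render_gitconfig_section(section, settings):
    """Render one section in git's config-file syntax.

    Hypothetical helper for illustration only.
    """
    lines = [f"[{section}]"]
    for key, value in settings.items():
        # Each key sits on its own indented line under the section header.
        lines.append(f"\t{key} = {value}")
    return "\n".join(lines) + "\n"


# Keys and values taken from commit b3b647e.
snippet = render_gitconfig_section("gc", {"auto": 0, "autopacklimit": 0})
print(snippet, end="")
```

Writing the section header once and listing the keys beneath it is what the file parser expects; a literal `gc.auto=0` line in the file is not the section syntax.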
b3b647e Gerrit: Set gc.auto and gc.autopacklimit to 0 in ~/.gitconfig This is for gerrit 2.14 and won't affect gerrit 2.13. Apparently the newer jgit release that gerrit 2.14 ships added support for gc.auto; we have gc switched off, but that doesn't seem to affect gc.auto. So we have to switch it off globally in a gitconfig file. See discussion at https://groups.google.com/forum/#!topic/repo-discuss/lVR37Pm4G3c Bug: T151676 Bug: T156120 Change-Id: Icbcee9c080a3ee618104a5bf2c1b7c579ca33b5f 02 June 2017, 22:03:37 UTC
dc715db Gerrit: Increase packedGitOpenFiles to 6000 Matches changes done in cf3fd766857ee5bf57b23840efb1bca4a10e2c90 <paladox> "Increasing core.packedGitOpenFiles to 6000 so that it is over the average and not significantly under it. It also matches the systemd version where we set the value to 6000" chad says "we currently only use about 4800-4900 on average" Change-Id: I693bf289fb00634604d1485dba27e70c8e3c8ec3 02 June 2017, 22:01:53 UTC
76440f6 planet: add Wikimedia Performance Team blog feed Change-Id: I7677c3510437884ad739277af7e865c563b337b2 02 June 2017, 18:57:29 UTC
6171036 add admin group releasers-mediawiki to mwreleases1001 This should be a temporary step to give existing MW releasers shell access on the new mwreleases1001 host before a new puppet role has been written for it. Once we have a new role and mwreleases1001 has it applied, this should move to role/common. Bug: T164030 Change-Id: I1a6c72982816bfc8e956c6daa098ab65ad450a65 02 June 2017, 18:48:00 UTC
880715c [Planet Wikimedia] Add some hackathon-related blogs, add Greek planet Covered in the Wikimedia blog already, but some of them missing: https://blog.wikimedia.org/2017/05/31/vienna-hackathon-learnings/ Also add the Greek planet, with translations from translatewiki.net users. Change-Id: I88c74bb402ad8878d7d9717e3badbec7149b824f 02 June 2017, 17:30:30 UTC
9cd3333 Drop gerrit2001.yaml only includes temp admin permissions Already handled by role now that it's all setup Change-Id: I9216351e12bed3e470dc3bc0a126fe3e2dab2572 02 June 2017, 17:19:22 UTC
d474b99 raid: switch from stringified fact to array Now that we have non-stringified facts, we can drop the whole join with comma (on the fact side) and split with comma (on the puppet side) dance and just pass a regular array instead. This is partially a revert of Ia421806c4fbedf2da4a02ba804fef990e647b38c. Bug: T166372 Change-Id: I0a8b5a07aad58e01405ceaf8dd2e8dc2d9ebd190 02 June 2017, 14:45:19 UTC
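The join-with-comma and split-with-comma dance that d474b99 removes can be sketched in Python as follows. This is an illustration only, not the Facter or Puppet code from the commit; the function names and the controller names are hypothetical.

```python
def stringify(values):
    """Fact side: flatten a list into one comma-separated string."""
    return ",".join(values)


def unstringify(text):
    """Consumer side: split the string back into a list."""
    return text.split(",")


controllers = ["megacli", "md"]  # hypothetical RAID controller names
round_trip = unstringify(stringify(controllers))

# With Python's str.split, an empty list does not survive the round trip.
empty_round_trip = unstringify(stringify([]))
print(round_trip)
print(empty_round_trip)  # [''] rather than []
```

Passing a real array avoids both the extra code and this kind of edge case. Whether the empty-list case ever mattered for the actual raid fact is not stated in the commit message.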
fb95ed1 Remove str2bool from is_virtual facts Now that we have disabled stringified facts and that we run Facter >= 2 everywhere, $facts['is_virtual'] is guaranteed to be a boolean and doesn't need to be wrapped by str2bool(). This is partially a revert of Ie82b739b5927f43b08826ce6adb33a8b91ae81eb. Bug: T166372 Change-Id: I0cbbb46b1d598217075920b07b457c076b783d79 02 June 2017, 14:39:40 UTC
1927a94 LVS refactor: service IPs and sparing out lvs101[12] Bug: T150256 Bug: T165765 Change-Id: Ifd085f4e2869ad3703fe9080335bea4576c46329 02 June 2017, 13:52:18 UTC
25ef545 puppet: disable stringified facts in Labs as well Now that they've been tested in prod and that facter was upgraded in Labs, it's time to remove this realm guard and make the two environments consistent again. Change-Id: I3075ed559af083c1ac3e0a44156abcc887cecfec 02 June 2017, 13:12:21 UTC
34cb8c9 LVS: new redundancy layout for new eqiad+ulsfo hosts This sets up the "new" eqiad LVSes in the new style, with 4x machines providing N+1 for 3x classes. Ditto the upcoming new ulsfo LVSes with 3x machines providing N+1 for 2x classes. The regexes cover the expected future similar change in esams as well because it was simpler, but esams/codfw node lists at the top are not updated, as those machines aren't actually purchased yet. Bug: T150256 Bug: T164327 Bug: T165765 Change-Id: I81f70d801770707b526c183efc6f881b8bc0ba3f 02 June 2017, 12:54:19 UTC
79b5c5f admin: add ema's Yubikey Change-Id: Id47e4b084cd8604fa8a73e9bd3e319fdaf0b2199 02 June 2017, 12:05:19 UTC
610d1cd Re-enable temperature monitoring via NRPE The ACPI log lines flood mentioned in f707810565 has been fixed by blacklisting the acpi_power_meter module (597b9b1af8). Re-enable temperature monitoring. This reverts commit f707810565cae62677ebaa6044d31f2ae221e4c3. Bug: T125205 Change-Id: I95c5a1b00f2b876ea4d12db973b2b346cc78a7cb 02 June 2017, 12:00:14 UTC
10b076f swift: make swift-dispersion-stats policy-aware Required for swift 2.10 upgrade, swift-dispersion defaults to Policy-0 Bug: T151648 Change-Id: Idf7cb2a01eb1df8956fda5eb0c5b787084cb7f42 02 June 2017, 10:29:10 UTC
f750ffe logstash - curator connects only to localhost Since elasticsearch on the logstash cluster only exposes its API to localhost curator should only use localhost as well. Change-Id: I2421dc967087f27f9d312c716021b0194783530d 02 June 2017, 10:17:53 UTC
23a4ebe mariadb: Allow full reimage of db2041,38,37,35 (still on trusty) Add the other pending trusty hosts on codfw that are not yet in jessie, all at once, to avoid useless git puppet spam. Change-Id: I738f7806843111db660d7ab4542327de61abed3f 02 June 2017, 09:29:30 UTC
dcf02b4 logstash - cleanup dead code Cleanup related to https://gerrit.wikimedia.org/r/#/c/356063/ Bug: T166154 Change-Id: I904c24bea5161f3f4ad413423a02848e3f0f19de 02 June 2017, 09:15:30 UTC
4be76bf Add Save Timing alerts to Icinga Bug: T153170 Change-Id: If5d7d14cbd1a01317da5fa825df2bfe8a4b4ff60 02 June 2017, 09:02:38 UTC
4045358 Add Navigation Timing alerts to Icinga Bug: T153169 Change-Id: I94d8a243db7af6f9f393b54d1bb4e9abbd2b723f 02 June 2017, 08:58:51 UTC
69dc16c calico: Supply a calicoctl.cfg file Supply a calicoctl.cfg file so we don't have to use ENV variables to configure calicoctl Change-Id: I0229ea89d1a39453f11e320906312ed238b58dab 02 June 2017, 07:39:43 UTC
f3b8ee0 labs: Direct people to #wikimedia-cloud for support Bug: T166420 Change-Id: I70aa070d7655e4caa2d3903e00471e8bbfadafc2 01 June 2017, 21:00:41 UTC
69e5336 shinken, icinga: direct bots to #wikimedia-cloud Bug: T166420 Change-Id: I51cd498d8dcaf68300e7e5ec491f8c4d3b6e2ec0 01 June 2017, 21:00:09 UTC
6415ad0 Labvirt2003: Switch to xfs All the other labvirts use xfs, it's easier to re-image this box than make the puppet code handle the difference. Change-Id: I8b8d24e797035aca2db303aa9b3cf4be8fabb050 01 June 2017, 17:55:10 UTC
ccfd2a7 Add hiera file for labtestvirt2003 In particular, specify where the /var/lib/nova/instances partition is. Change-Id: Ic156efbe9db1a16c32164c94e065f0dcdeccad62 01 June 2017, 17:40:23 UTC
ccdc21b Labtestvirt2003: Add to site.pp Change-Id: Id3cd04ca79e2917b9b479c90adeb6ee35d7a6d12 01 June 2017, 16:55:17 UTC
1a31e3c varnish: don't use $name as a parameter name $name is reserved and special in puppet and using it as parameter name causes a puppet 4 parser validate error. Change-Id: I3606c7c921e5b889d5c627d38514a1a383e127ed 01 June 2017, 17:38:47 UTC
0ea2d64 phabricator: don't assign a new hash key Hashes are immutable in puppet, so trying to assign a value to a key is invalid code that works right now because of legacy bugs. These are fixed in puppet 4 and the new parser errors out. Do this in a different way. Change-Id: I2a3c31acc558b876cc14ca06409033d4c7f4ade3 01 June 2017, 17:38:47 UTC
cf1e831 restbase: don't define parameter $hosts twice The value of the two definitions was the same, so fairly obvious. Errors with Puppet 4's parser validate. Change-Id: I5c55899a155b7968f3a0789aa2a2ccd88d0187d3 01 June 2017, 17:22:46 UTC
43f5233 Revert "setting labtestvirt2003 into site.pp" This is showing a catalog error for some of the labs specific items, removing the site.pp entry to get a basic system puppet run done (and user keys on system for debugging). This reverts commit f68c4a380b4e134462b721bc9622f66445c77d6b. Change-Id: I785b8eeeb5bc09038baa08007e807d57ed161806 01 June 2017, 16:43:41 UTC
f68c4a3 setting labtestvirt2003 into site.pp just expanding the stanza for labtestvirt200 in codfw Bug:T166237 Change-Id: Iaadfe5fadcf1cca46231cd7a9468236adaf7acfe 01 June 2017, 16:31:30 UTC
368c648 Add partman recipe and DHCP entries for labtestvirt2003 Bug:T166237 Change-Id: Ib458ca6f4d9d0afe95098a28d54772a68359b4be 01 June 2017, 15:53:43 UTC
af303f5 celery: use SyslogIdentifier Separate log lines for different workers in syslog Bug: T146581 Change-Id: I158dff3aedc678b824f019eda04ab70e396094bf 01 June 2017, 15:40:21 UTC
09e7272 varnish: add explicit guards around upload-specific VCL The conditional around CORS and commons redirect code in cluster_fe_err_synth is not strictly necessary as we are only returning synth 666/667 in cluster_fe_recv for upload. However, it seems like a good idea, if anything to avoid confusion. In upload-backend instead, the X-MediaWiki-Original code is definitely upload-only. Note that we need to use bereq.http.Host there instead of req.http.Host given that req is not available in vcl_backend_response. Bug: T164608 Change-Id: I66db324fe507e98096b89e0731404fb5ca389436 01 June 2017, 15:11:14 UTC
79068d4 profile: introduce swift::storage::labs Labs uses lvm for its virtual disks, thus introduce a labs-specific profile to create the required LV and symlink it into /dev and simulate a real disk/partition. Bug: T162247 Change-Id: If292ad0069ec5083959e963620bd25ad4127bf37 01 June 2017, 14:32:00 UTC
b80ddec role: report instance disk full % in beta prometheus Change-Id: Ic83a8074c22f5aa7271ff926364ec849a6f85632 01 June 2017, 14:26:39 UTC
c8ccae1 hieradata: default retention for prometheus/ops eqiad Running on baremetal with enough space now Change-Id: Ie79dc16d67ae8139463cba5de624fff9ca8bc4d0 01 June 2017, 14:13:33 UTC
68e0106 Labs: Include 'bikeshed' package on Trusty VMs. I'm installing this for the useful 'purge-old-kernels' script. Change-Id: Iaa7f95329b6c85264425453d6fa1634affb8ab2d 01 June 2017, 14:01:00 UTC
b714cf2 site: add prometheus global instance in eqiad ATM prometheus 'global' instance is only in codfw, add eqiad too Change-Id: Ie12a22ce38e9c752d863f1b9bb3800cd6b3cb945 01 June 2017, 13:52:50 UTC
1449591 Rewrite the LLDP fact(s) Rewrite the LLDP fact, removing all of the existing string facts and adding the following instead: * "lldp", a structured fact resulting in a hash that contains interfaces, each of which contain a neighbor, a port and a VLAN. Note that we are not returning port/id (lldpswportid) anymore, as this was of limited usefulness (e.g. on Junipers it returned the internal port number). * "lldp_parent", a fact returning a string with the LLDP neighbor of the primary interface (interface_primary, in our setup). * "lldp_neighbors", a fact returning an array with the LLDP neighbors across all interfaces. Change-Id: Ia156007346d7938a3f5a065fd8953f853748c314 01 June 2017, 12:05:47 UTC
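The relationship between the three facts described in 1449591 can be sketched in Python. This is an assumption-laden illustration, not the Ruby fact code: the key names `neighbor`, `port` and `vlan` follow the commit message, while the interface names, switch names, port names and VLAN IDs are invented sample data.

```python
# Hypothetical "lldp" structured fact: one entry per interface.
lldp = {
    "eth0": {"neighbor": "switch-a", "port": "ge-0/0/1", "vlan": 100},
    "eth1": {"neighbor": "switch-b", "port": "ge-0/0/2", "vlan": 200},
}
interface_primary = "eth0"  # stands in for the interface_primary fact


def lldp_parent(lldp_fact, primary):
    """Return the LLDP neighbor of the primary interface, or None."""
    return lldp_fact.get(primary, {}).get("neighbor")


def lldp_neighbors(lldp_fact):
    """Return the distinct LLDP neighbors across all interfaces, sorted."""
    return sorted({entry["neighbor"] for entry in lldp_fact.values()
                   if "neighbor" in entry})


parent = lldp_parent(lldp, interface_primary)
neighbors = lldp_neighbors(lldp)
print(parent, neighbors)
```

Deriving `lldp_parent` and `lldp_neighbors` from the one structured hash keeps the three facts consistent with each other; the sorting and de-duplication here are design choices of this sketch, not something the commit message specifies.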
a17f083 Maps cache: fix parameters stripped away Bug: T164608 Bug: T166735 Change-Id: Id1200aebf2ed42fd0a64453dee46a43040129483 01 June 2017, 11:43:29 UTC
7c16bad varnish: do not chmod VSM files There is no need to chmod VSM files anymore since we added a patch to fix VSM file permissions in 4.1.5-1wm4. Change-Id: I150bca099ebb6d59f4ba0465b8769e475146942c 01 June 2017, 10:11:05 UTC
597b9b1 base: blacklist acpi_power_meter This is known to cause spurious kernel messages when certain unimplemented parts of sysfs are read, e.g: May 31 17:07:57 db1077 kernel: [549206.257458] ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20150930/exfield-418) May 31 17:07:57 db1077 kernel: [549206.257478] ACPI Error: Method parse/execution failed [\_SB.PMI0._PMM] (Node ffff883f7f89ce28), AE_AML_BUFFER_LIMIT (20150930/psparse-542) May 31 17:07:57 db1077 kernel: [549206.257493] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20150930/power_meter-338) Bug: T125205 Change-Id: I4c0a7275fd2dd0ebcff97390e5f2bfb02c998983 01 June 2017, 09:55:42 UTC
b786dd5 Record extended account dates for two researchers Change-Id: Ib5fe141f8a7997e6a24e74be47e29234c9b3d8a6 01 June 2017, 07:01:22 UTC
17972e4 Add Druid hosts to network constants Initially to be used for revised zookeeper ferm rules restricting access to those hosts which actually access it. But mid-term the same definition can be reused by other ferm rules/puppet code as well. Change-Id: I36c797779e172709689cb72a2bf4e52c3954fa7e 01 June 2017, 06:39:42 UTC
b160069 contint: Fix stretch support in package_builder Causing puppet to fail on stretch Error: Could not retrieve catalog from remote server: Error 400 on SERVER: OS debian == jessie required. at /etc/puppet/modules/contint/manifests/package_builder.pp:8 on node jenkins-slave-01.git.eqiad.wmflabs Bug: T166611 Change-Id: I7d1f1e84017b7afad3e18bd74e2c03fc08c3f407 01 June 2017, 01:35:37 UTC
1164f06 icinga: give permissions to run commands to herron Let Keith Herron execute commands for all hosts and services in Icinga, like other Operations team members. This is needed to schedule downtimes, ACK Icinga alerts and similar things that can be selected from dropdown menus in the web UI. The Icinga-internal "user" (contact) needs to match the LDAP "cn" field because Apache LDAP auth is in front of Icinga but it's unrelated to the actual permissions from Icinga's point of view. Caveat is the LDAP login allows both capitalized and non-capitalized user name, so you can be "logged in as: " but still not have the permissions if using the wrong case. Bug: T166587 Change-Id: I2bc06fd8cd9df7a23e6c7b362846c9ec4ab92e1f 31 May 2017, 22:33:57 UTC
43afa24 Revert "admins: revoke ladsgroups key temporarily" This reverts commit 08e09a2a7a9c954ead3145522fb911c4afa15a44. Change-Id: I2ed108c8d2a7b9968a8868382cb86913c3a3a2a7 31 May 2017, 22:26:35 UTC
fba8e0c raid: bump hpssacli timeout Non-ideal, but loaded (i.e. user traffic + rebalance) swift machines can take up to one minute for check_hpssacli to complete. Change-Id: Idb0e7f79eaea77347f87971475bf27f2013cdb8b 31 May 2017, 18:27:43 UTC
b3cc36d adding gwicke and ppchelko to analytics-privatedata-users this has a three day wait and a checklist to be done on the task before the actual merge. 3 day wait also ends on 2017-05-31 Bug:T166391 Change-Id: I10d374ffcd43af3a0f6a16632a33639b1edc3eae 31 May 2017, 17:22:49 UTC
e49b155 Add Hadoop masters to network constants Initially to be used for revised zookeeper ferm rules restricting access to those hosts which actually access it. But mid-term the same definition can be reused by other ferm rules/puppet code as well. Change-Id: I4ba4682431ad0d1e808f8d0708de21799aca0458 31 May 2017, 17:08:02 UTC
28285f1 logstash - fix minor typo in documentation Change-Id: I0dd3a22fb7471771b8fd6aef9304de9fe5f824f9 31 May 2017, 17:02:43 UTC
f08d95a Add Kafka analytics brokers to network constants Initially to be used for revised zookeeper ferm rules restricting access to those hosts which actually access it. But mid-term the same definition can be reused by other ferm rules/puppet code as well. Change-Id: I533676d33cb3bb2c4ef498dc4c5426b9eb677adf 31 May 2017, 16:53:35 UTC
a985078 Add Kafka main brokers to network constants Initially to be used for revised zookeeper ferm rules restricting access to those hosts which actually access it. But mid-term the same definition can be reused by other ferm rules/puppet code as well. Change-Id: Ida313db0a6a8a0424e7861f66f8ce4e14d257b0b 31 May 2017, 16:47:12 UTC
0039fca prometheus: allow user 'prometheus' to export metrics too Change-Id: I8e50c77dddf0143c6fca60f23afc0040056beeac 31 May 2017, 16:34:26 UTC
42b1bd0 Icinga: improve raid handler message - Since we're alarming now also for faulty BBU and wrong Write Policy, the raid handler output might not be useful to diagnose the issue. As a quick workaround for now add the status reported by Icinga in the message to help recognize the problem in all cases. Bug: T166519 Change-Id: I2f2abc677307807f08c6c0baeeafc78639e9f224 31 May 2017, 16:02:38 UTC
42b54d9 maps->upload: move LVS IPs This needs careful deployment: 1. Disable puppet on high-traffic2 LVSes globally 2. Disable puppet on existing cache_maps hosts 3. Merge this change 4. Run puppet on all cache_upload (adds new LVS IPs) 5. Step through high-traffic2 LVS puppet-agent -> restart (which will rm maps cluster, add maps IPs to upload cluster) note cache_maps hosts left puppet-disabled intentionally above If something goes badly at this stage, we can revert similarly and the cache_maps hosts remain in a ready state to take traffic back. The revert process would look like: 1. Re-disable puppet on high-traffic2 LVS 2. Disable puppet on cache_upload 3. Merge revert of this change 4. Step through high-traffic2 LVS puppet-agent -> restart (which will re-create the maps cluster and move maps IPs to it) 5. Re-enable puppet on cache_upload and run agent (rm maps IPs) 6. Re-enable puppet on cache_maps (no-op) Bug: T164608 Change-Id: I5c2ba9fd165fe344861fdf22e4914af163759728 31 May 2017, 15:33:08 UTC
bcf993e Update puppet_compiler to 0.1.9 Change-Id: Ic8eab62e1c6e6a84ed70ae275915920f7d4468c4 31 May 2017, 15:02:03 UTC
a10db58 cache_upload: rebalance storage bins Pushing this will modify the systemd unit file immediately, but the changes won't take effect until the next varnish-backend-restart (which wipes the storage files to let them be recreated). Bug: T145661 Change-Id: I82d56a5514faffad1d03aa5e4e1e4b6996abbb62 31 May 2017, 14:12:43 UTC
acfae66 Use reprepro from jessie-backports While running the repository on jessie, use 5.1.1 from backports which supports dbgsym packages and buildinfo files and provides several other bugfixes. Bug: T158583 Change-Id: If59dcfc31cbcbed9bda3c4fa520565326181331a 31 May 2017, 13:53:29 UTC
f707810 Disable temperature monitoring via NRPE This has caused ~670k log lines related to ACPI errors in 274 hosts across the fleet just today (in the past ~6 hours). This reverts commits: - 7eb0c2996c2940b6d5cc92b95183911e7e6799be - 638f84368ce5fa5259ade8330d7ceb4045cdfb48 - 683969938940808ea731399d2d4a7181ab000977 Bug: T125205 Change-Id: I77828ffe62367218d6cf6e844bd7264b416ff681 31 May 2017, 13:14:21 UTC
80ca65f Puppet: disable stringified facts in prod Bug: T166372 Change-Id: Icb6b8d1ef277f783dedb4565b278dd2dca749f69 31 May 2017, 11:57:12 UTC
6004d43 Upgrade puppet-compiler to version 0.1.8 Change-Id: I65ab89266b4dcb2b307c9a9409a86c20936c170c 31 May 2017, 10:41:16 UTC
6839699 check_ipmi_temp: bump check/retry intervals and timeout BMCs are known to misbehave if queried often: check once every 30 minutes and retry after 10 minutes on failure. Also we have seen a bunch of check timeouts with the default setting of 10 seconds. Bump it to 30. Bug: T125205 Change-Id: I2c9da2c740dadd68ac6b35b0e600b07a0782296f 31 May 2017, 10:08:21 UTC
d66cd6b admin: remove my old SSH key (volans) Change-Id: Ief4508f36d9db233f68d648ab99269e9caaef887 31 May 2017, 10:06:10 UTC
ef6c385 puppet-compiler: Upgrade to version 0.1.7 Upgrade to version 0.1.7 of puppet-compiler Change-Id: Ib45c3e9f24874ce0aa58d825a0a4015b5a61abd9 31 May 2017, 09:33:36 UTC
f1d13aa etcd: set eqiad in r-w mode Change-Id: Ie076e29c62003ada9f765d8b2c1b8c8c67191a9b 31 May 2017, 09:22:47 UTC
4be4b13 Force fact stringification in servermon reporter We want to enable structured facts in puppet. However, servermon, while being generally agnostic to fact structure, is not developed with structured facts in mind. For now, and while the future of the project is still under discussion, force fact stringification in the reporter Bug: T166203 Change-Id: I24413825342c4926b24eba530c8feefcee24790f 31 May 2017, 09:19:53 UTC
b096855 etcd: enable replication eqiad => codfw Change-Id: I02dffef3d33844b69fd98eb09eaabc40c8c430e8 31 May 2017, 09:12:19 UTC
e83e318 etcd: set codfw to read-only too. And at the same time, block replication in eqiad. Change-Id: I6eaecf1b20da46e9c41a52194441f3d3ce911c06 31 May 2017, 09:08:19 UTC
4c10a68 logstash - fix typo in template name Bug: T166154 Change-Id: I8d8380a420bbaba9bc54cfe28f43ea2efd84200c 31 May 2017, 08:50:16 UTC
53802fe logstash - start using elasticsearch-curator for indices cleanup This also reintroduces merging older indices. Bug: T166154 Change-Id: I9d742f7f366c0a0dfbb2c58c416330019bb2b9d0 31 May 2017, 08:45:52 UTC
450cd89 beta: keep less mysql bin logs The beta cluster database configuration has expire_logs_days=30, which causes the disk to fill up. Lower it to 7 days based on Manuel's suggestion. deployment-db03> show global variables like 'expire_logs_days'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | expire_logs_days | 30 | +------------------+-------+ 1 row in set (0.00 sec) Bug: T166060 Change-Id: I7aa521a934bcb5f36b30d863755a354c9f977f75 31 May 2017, 07:47:07 UTC
7311fcf ganglia: remove dup define in postgresql check Found using newer version of flake8. Change-Id: I30350a4e0cc74e95c09692d3dc78088a543e1c3f 31 May 2017, 05:53:29 UTC
638f843 monitoring/base: add nagios sudo privs for IPMI sensors The nagios user needs to be able to run ipmi-sel and ipmi-sensors with root privs to be able to run IPMI temperature checks, otherwise Iefd4e699302a7adc155 would not work yet and would return UNKNOWNs. From check_ipmi_sensors: For \"-H localhost\" or if no host is specified (local computer) the Nagios/Icinga user must be allowed to run ipmimonitoring/ipmi-sensors/ipmi-sel/[ipmi-fru] with root privileges or via sudo (ipmimonitoring/ipmi-sensors/ipmi-sel/[ipmi-fru] must be able to access the IPMI devices via the IPMI system interface). This mirrors some other existing nagios checks in base, such as check_puppet_run. Bug: T125205 Change-Id: Ifbefd75a3f82654f8fd0f6b6917cb111b81a6a2a 31 May 2017, 02:35:57 UTC