Revision history - refs/changes/00/300100/1 - origin: https://github.com/wikimedia/operations-puppet

Revision	Author	Date	Message	Commit Date
1bef213	Gabriel Wicke	20 July 2016, 19:57:14 UTC	Lower trickle_fsync_interval to 8mb In production, we still see relatively bursty IO, with no or few writes followed by a burst every five seconds or so. This causes short spikes of iowait up into the dozens of percent, which in turn do impact p99 read latencies. Our current trickle fsync interval is 30mb, which is rather large. I think it's worth lowering this significantly, perhaps to 8mb. All of those values are significantly above typical SSD erase block sizes of 256k. Bug: T140825 Change-Id: I9d6d2517f9ea90f99cd52c94483064ced9074324	20 July 2016, 19:59:00 UTC
51ae260	Eric Evans	20 July 2016, 17:42:46 UTC	Disable 1013-c instance We're experiencing stream timeouts, so putting this on hold until that's been addressed (r/300059). Bug: T134016 Change-Id: Iae35e61398e0664fdf7ced43c8eb43d2a592465e	20 July 2016, 19:52:27 UTC
d8b0f44	YuviPanda	20 July 2016, 18:33:21 UTC	grafana: Expand edit access in labs grafana This is anyone with shell access to any project on labs. Bug: T120295 Change-Id: Idc8c17f57bf80160ed908c2ce59c00b5c5a5011c	20 July 2016, 18:37:02 UTC
e959a80	Brandon Black	20 July 2016, 12:23:39 UTC	ssl_ciphersuite: drop non-FS AES256 options This is slightly controversial, but I think we should move forward with it for now, and allow reverting if there's any legitimate complaint. The non-forward-secret set of ciphers are the least-secure ones, which we need to eliminate first in the long term. The AES128 and 3DES non-FS ciphers are in this list for pragmatic reasons: there are simply too many clients still connecting with them for us to remove them at this time. The AES256 options here are not pragmatic. By eliminating them, we change the nature of the "compat" non-forward-secret list. It changes from a list of "anything we can reasonably support" to "things we still have to support because of minority (but still significant) real client traffic". Stepping through the rationale and data on why the AES256 options aren't pragmatic: 1) Fundamentally, most agree that AES256 doesn't offer any pragmatic crypto-strength benefit over AES128 today. 2) Any software which implements AES256 would also implement AES128, which we already prefer over it for efficiency. 3) Therefore any real client choosing AES256 has actually disabled AES128 for some security policy reason, with the (questionable) rationale that more bits is stronger. However, these same clients apparently do not support basic forward secrecy. It seems ridiculous to consider oneself in a position to set non-default client cipher policy for supposed security reasons while ignoring much more important factors like forward secrecy. (Note: We do continue to support AES256 in our forward-secret cipher lists). 4) We've been gathering detailed stats on our primary clusters for all ciphersuite selections for a year now, and none of these see any significant traffic. Breaking that down by each cipher's history: AES256-GCM-SHA384: Basically no data, except for one isolated, tiny spike of connections back on 2016-02-16. AES256-SHA256: Had a small non-ignorable population in mid-2015, but the rate abruptly fell to near-zero on 2015-10-08 and never recovered. Prior to that date, logging and sampling showed the bulk was from a single group of US Military proxies, which presumably finally got their software upgraded. It's now intermittent (often days with zero), and the long-term average rate since the dropoff has been roughly 0.005 reqs/sec. AES256-SHA: Also often goes days without stats. Its average rate over the past year is roughly 0.015 reqs/sec. Bug: T118181 Change-Id: I56cadda5211706ff7040b0e968e0ecb80d22245b	20 July 2016, 18:23:44 UTC
2b9bd89	Brandon Black	20 July 2016, 17:22:42 UTC	ssl_ciphersuite: auto-downgrade to compat when necc This eliminates some of the confusing caveats about using 'mid' or 'strong'. If 'mid' or 'strong' is used on a host that can't support them properly (apache on trusty/precise), a warning is emitted via the agent and the output is auto-downgraded to 'compat'. This will let us set many of our independent services to 'mid' now and have the actual change pend on their upgrade to jessie (and remind us to get that done). Bug: T118181 Change-Id: I123172ba9e289d15d884caefb9d9b2a5e17d21d7	20 July 2016, 17:55:00 UTC
d7a0d0f	Giuseppe Lavagetto	20 July 2016, 09:54:46 UTC	service::node: add git as deployment method This is also used in role::parsoid::testing where we don't want deployments via scap/trebuchet. Bug: T90668 Change-Id: Idc1d352b6f5d4aabb499b8208114e78de35f9ab1	20 July 2016, 15:39:39 UTC
97b325b	YuviPanda	20 July 2016, 15:09:17 UTC	redirector: Pass along request_uri to new location as well Allows redirecting domains while preserving links Change-Id: I5dc1d63984e51f4fbc634f93070da4317234b080	20 July 2016, 15:37:33 UTC
173cafc	YuviPanda	20 July 2016, 14:02:41 UTC	cache: Add labs grafana behind misc varnish Bug: T120295 Change-Id: I3a5a9415a95fd2e8f1538c2a102659341526ab73	20 July 2016, 15:03:19 UTC
d1afdf0	Ariel T. Glenn	20 July 2016, 13:27:24 UTC	set up proper dump monitor role and add to snapshot1007 Change-Id: Ibf89f1754c26390f675741cc3d70137a0857954f	20 July 2016, 14:20:43 UTC
b2d6869	YuviPanda	20 July 2016, 13:57:55 UTC	grafana: Add and provision labs grafana role Bug: T120295 Change-Id: I24036bd3298f3e6f679d37ee9ed1a69ce39fa1ad	20 July 2016, 14:00:20 UTC
17357ee	YuviPanda	20 July 2016, 13:19:14 UTC	grafana: Refactor production role into base role In preparation for introducing a labs grafana role Bug: T120295 Change-Id: I362e58ebe0bfee41fa97ed989a5de9c8085a05e8	20 July 2016, 13:47:07 UTC
a1e4f74	YuviPanda	20 July 2016, 13:07:33 UTC	grafana: Mark role explicitly as production Bug: T120295 Change-Id: I245e877a68540eb61fe35b37bd1c9329c8213a88	20 July 2016, 13:44:37 UTC
d74b49b	YuviPanda	20 July 2016, 13:03:32 UTC	grafana: Make role explicitly reference production secrets Bug: T120295 Change-Id: I77b96657edf2c97c274c2d09f745e36d40870f30	20 July 2016, 13:42:57 UTC
4425b42	Filippo Giunchedi	20 July 2016, 11:15:24 UTC	prometheus: use DOMAIN_NETWORKS not INTERNAL Change-Id: I35023b932a10354ec9cc795b0802b1bce4276a6f	20 July 2016, 13:18:33 UTC
8cf2ea2	Ariel T. Glenn	20 July 2016, 13:08:28 UTC	remove dump monitor from snapshot1004 Change-Id: I7f11b5941b7a01fb1908570fdcda5e40e8bc02d7	20 July 2016, 13:08:28 UTC
d80e11c	YuviPanda	20 July 2016, 12:41:19 UTC	tools: Add a kubernetes diamond collector Bug: T140887 Change-Id: I4822045fb8c4a127fec4106658f6cec3180048fd	20 July 2016, 13:05:31 UTC
f237e90	Giuseppe Lavagetto	19 July 2016, 09:25:08 UTC	parsoid: move to role::parsoid for all production nodes This is part of I3f4a5 that is being split into multiple commits Bug: T90668 Change-Id: Ie97015083ba56a7acaf5d40043dc04fcd0c33f3a	20 July 2016, 12:27:23 UTC
2d5f24f	andrewbogott	16 July 2016, 05:07:10 UTC	Include nova mysql password in novaenv.sh Bug: T139272 Change-Id: I148e6cf1c95ec6ec9bbf6e21172c500478d52a0a	20 July 2016, 11:32:37 UTC
601cb54	Chad Horohoe	19 July 2016, 18:17:51 UTC	add-ldap-user: Don't use sillyshell, it's silly (and doesn't exist anymore) Bug: T86668 Change-Id: I09da9e50ad06be54b7947e62171ffb10c1b3a2d7	20 July 2016, 11:31:37 UTC
eec4295	YuviPanda	20 July 2016, 11:18:43 UTC	tools: Make toolschecker webservice actions non silent So we can see why they are failing Change-Id: I6cdccf64c507166e5f8e012a0270d769180e0150	20 July 2016, 11:23:22 UTC
80e385f	YuviPanda	19 July 2016, 19:35:28 UTC	tools: Fix webservice toolschecker check - Use a separate tool to prevent racing with kubernetes webservice check - Fail if any of the steps fail, not only if all of them fail Change-Id: I96715d90e7e569d8527ba11cbeb2aa8cc73b903d	20 July 2016, 11:18:34 UTC
f8937a4	Jaime Crespo	01 April 2016, 15:48:12 UTC	New user for prometheus monitoring Prerequisites: * Setting a password on the non-public repo (DONE) * Setting a fake password on the public repo (DONE) Current grants given are the same as nagios. More can be added if needed. I will add it manually to a single host in order to test it first. Bug: T128185 Change-Id: I85152efcedc70f68ce345aa21b4dcce95663abd6	20 July 2016, 11:13:55 UTC
dae57fa	Filippo Giunchedi	20 July 2016, 09:47:03 UTC	prometheus: fix ferm::service and include node_exporter Change-Id: I17c2cbf0b80a0e32c41348e32e594905d459bf73	20 July 2016, 09:57:31 UTC
d2f8d49	elukey	20 July 2016, 09:31:11 UTC	Add special Cassandra compaction configs for aqs100[456] The new AQS nodes need to be loaded with data but we are facing performance issues while compacting SSTables. This is an attempt to leverage better hardware to speed up computations. Change-Id: I9e7a43af5dafabd5160b55c3ad386693d97451f1	20 July 2016, 09:32:37 UTC
7197062	Filippo Giunchedi	15 July 2016, 13:45:23 UTC	nutcracker: default verbosity to 4 It looks like we're lowering verbosity to 4 across mediawiki machines anyways. Also verbosity 5 has created problems in the past with filling disks given our usage. Bug: T136078 Bug: T139786 Change-Id: Iaca2acde397ef276ab2636db9487fb8f349a0444	20 July 2016, 09:25:08 UTC
366b06d	addshore	14 July 2016, 09:31:30 UTC	Add more to stats:wmde config This will allow further portability of the scripts. Particularly this means that the db cred file will no longer be hard coded and thus these scripts can also be run on other user accounts (for example for one off runs) Change-Id: I890e08fe7bbdd38857540eafbe02469c6023c64b	20 July 2016, 09:17:41 UTC
dca8f91	Ariel T. Glenn	20 July 2016, 07:11:36 UTC	make sure directory creation uses the dump cron job startdate, part 2 The previous change made sure we end creation loop after all dirs show the appropriate date. This change makes sure we create the directory with the appropriate date in the first place. Change-Id: Icc7340e9cf17bfbdf0bc6af58da87206ab6dc18a	20 July 2016, 07:15:36 UTC
d37bf5a	Giuseppe Lavagetto	14 July 2016, 10:19:18 UTC	mediawiki::conftool: add mw-pool This is a specialized script that you should use when pooling an appserver for the first time or repooling it after it was set to "inactive". Change-Id: I36420cd1533940baf4e0cd7960d37cc6c04e1cbc	20 July 2016, 07:07:12 UTC
28f0354	Ariel T. Glenn	20 July 2016, 07:03:49 UTC	make sure directory creation uses the dump cron job startdate Bug: T126339 Change-Id: I2a2844411aab5df1e4578eebb0a908e613c35964	20 July 2016, 07:03:49 UTC
d99f8a8	Ariel T. Glenn	18 July 2016, 12:25:37 UTC	extend dumps cron job to run partial dumps as well This prepares the dump cron script to run full and partial dumps as specified, and sets up the new cron job for the partial dumps later in the month. Bug: T126339 Change-Id: If58ef377a253fd9f689a619eb40428417c9c7370	20 July 2016, 06:55:48 UTC
fe6dbfa	Giuseppe Lavagetto	14 July 2016, 09:49:17 UTC	role::mediawiki::webserver: add conftool scripts This will allow pooling/depooling from the appservers themselves. Change-Id: Id7e0d130a5b39d986239eeadc6a5a3945f178ea3	20 July 2016, 06:40:21 UTC
4dbcff8	Daniel Zahn	20 July 2016, 00:27:03 UTC	osmium: delete temp. migration class Change-Id: Iddfc4e37299a10249b6a3a666e0a2cc7059d3f96	20 July 2016, 04:49:15 UTC
16b2ed7	Daniel Zahn	20 July 2016, 00:25:19 UTC	osmium: copy data back from hafnium after upgrade Bug:T132530 Change-Id: Ibd53ed831bbe3ac06f42fc85679dc55d53ef820e	20 July 2016, 01:22:12 UTC
07eb092	Daniel Zahn	19 July 2016, 21:27:39 UTC	planet: add phabricator releng blog feed The blogs from the phame app in Phabricator also produce RSS/Atom feeds. These can be added to the config of planet.wikimedia.org (https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org) so the content will be featured on https://en.planet.wikimedia.org/ currently there are 2 other blogs, but they are not ready yet / still in beta https://phabricator.wikimedia.org/phame/blog/ Change-Id: Ib574e4b415fb597979c387cf0da0c678740e50d7	20 July 2016, 00:56:28 UTC
3d8779c	Daniel Zahn	20 July 2016, 00:14:31 UTC	osmium: also copy /srv for migration Besides /home, also copy /srv (except /srv/mediawiki which we will delete) over to hafnium for reinstall, then copy it back later. Bug:T132530 Change-Id: I0ddd1896589d71c209ddbb44c11e44b8bdcc02a7	20 July 2016, 00:14:31 UTC
49486ed	Daniel Zahn	19 July 2016, 22:31:20 UTC	osmium: rsync home dirs to hafnium for migration Bug:T132530 Change-Id: I1ac35a7ac273191e4421a4df8efcb7a900b91607	19 July 2016, 23:37:21 UTC
ee4be79	Daniel Zahn	19 July 2016, 22:14:52 UTC	install_server: let osmium use mw-raid1 partman Bug:T132530 Bug:T136562 Change-Id: Ia8af9707b1f0adc81164d997e3616cb14467be37	19 July 2016, 23:15:34 UTC
17b37be	Daniel Zahn	19 July 2016, 22:05:38 UTC	install_server: let osmium use jessie-installer Bug:T132530 Change-Id: Ie7556a531654c6c0a62725b125124739a6296670	19 July 2016, 23:08:58 UTC
b060324	Ori Livneh	19 July 2016, 23:06:09 UTC	Fix-up for I39d2d7db576: move log_format directive to top level Change-Id: I05c4f1f42cc7c118d5a9c8a1f4066cff78c74c71	19 July 2016, 23:06:09 UTC
08deb80	Ori Livneh	19 July 2016, 22:47:01 UTC	rcstream: log X-Forwarded-Proto Right now the Nginx logs on rcsNNNN hosts are useless for determining which clients are using HTTPS because TLS termination is done on the Varnishes. So declare a log format that is identical to 'common' except it includes X-Forwarded-Proto as an additional field. Bug: T140128 Change-Id: I39d2d7db5767995ee66dd6481cc57ed891bff274	19 July 2016, 22:48:58 UTC
cd3fb01	Brandon Black	18 July 2016, 12:34:15 UTC	insecure post: 100% failure, loophole closed Bug: T136674 Bug: T105794 Change-Id: Ie2db01e1c05dc793e3350ba1111bbd30c50edb35	19 July 2016, 22:19:44 UTC
95d3902	Eric Evans	19 July 2016, 16:47:50 UTC	Enable instance restbase2003-c for bootstrap Bug: T134016 Change-Id: Iac655841514ba16e1aaf51813c0949d67103da03	19 July 2016, 20:57:23 UTC
f468e12	Jcrespo	19 July 2016, 20:17:47 UTC	Correct ip for db1043: 10.64.16.32, not 10.64.16.33 Bug: T138460 Change-Id: I6f88554163bf69d542a48328f4ac7f2a2ef2f42f	19 July 2016, 20:50:45 UTC
f878f4c	andrewbogott	19 July 2016, 19:12:52 UTC	Change the case of the rabbitmq collector class ...in case that matters Change-Id: Id225cfe0d2829de7a9419f726e5a762fc23f2f48	19 July 2016, 19:13:37 UTC
8874916	Eric Evans	19 July 2016, 16:42:08 UTC	Enable instance restbase1013-c for bootstrap Bug: T134016 Change-Id: I694c22231aff617957eab0764f2df63469bf4b8f	19 July 2016, 19:03:54 UTC
c6134d3	Chad Horohoe	15 July 2016, 15:21:00 UTC	Gerrit: Don't install defaults file, package provides it Change-Id: Ia7b4b19a6ec7f6d9437529651f470deea4964fec	19 July 2016, 18:34:26 UTC
be2da29	Jcrespo	19 July 2016, 18:09:34 UTC	Set db1048 as the primary master on the m3 proxy (not yet in use) Set db1043 as the backup primary. dbproxy1003 is not yet in use, but it will be when the dns for m3-master is updated. Bug: T138460 Change-Id: I52b146c9d5fb524f1bf65e638a8cbe9c911f27b1	19 July 2016, 18:31:43 UTC
e2269be	Daniel Zahn	08 July 2016, 18:33:35 UTC	typos file: add 'mariabd' and 'eqad' Add new strings as detectable typos. Fix 2 typos in mariadb (or i can't add it to the typos file). It's just the name of the ferm rule resource that is being changed here. Change-Id: I60cc54c78038c7f094c9e9b403ecba8b979d00c8	19 July 2016, 18:14:45 UTC
4bac87e	Daniel Zahn	19 July 2016, 01:47:28 UTC	admin: create shell account for mpany Creating a shell user for Maximilian Pany, per T135392 and T140399. He is a fundraising analytics consultant. key copied from ticket. UID matches newly created wikitech user. Bug:T140399 Change-Id: I705309d1754705b6b55827ea498a220122aae8ec	19 July 2016, 18:00:42 UTC
c92c44d	Chad Horohoe	19 July 2016, 02:50:07 UTC	Gerrit: Enable proper backups from new hosts Change-Id: Ifea8ef77c790c2c34a0bed6f13425af595096362	19 July 2016, 17:54:35 UTC
bc93fd1	Jcrespo	19 July 2016, 17:16:46 UTC	Changes on m3 grants to include unpuppetized users & dbproxy1003 Bug: T138460 Change-Id: I207628fe87d0684f45e6b58cfdeceeeab4d6d60f	19 July 2016, 17:40:18 UTC
309f084	Chad Horohoe	19 July 2016, 16:32:25 UTC	Gerrit: Go ahead and ensure lets_encrypt everywhere other than ytterbium Change-Id: I356a39715c858c0aec538ef37d57f729472b468b	19 July 2016, 17:27:40 UTC
e7aa74e	Chad Horohoe	19 July 2016, 03:01:10 UTC	Gerrit: Fix redirect to commit-msg, vary on $host Change-Id: I19e65477da24f6f5954a44bac296a9c65d53fea3	19 July 2016, 17:21:57 UTC
0c08329	Marko Obrovac	19 July 2016, 16:48:49 UTC	Parsoid: testreduce: correct gitRepoPath Parsoid's path was moved to the standard deployment location on ruthenium, so update the path in the config to reflect that. Bug: T90668 Change-Id: I72498a53deae4742f7ad15f953ff36e094da56bf	19 July 2016, 16:48:49 UTC
3fd35d2	Giuseppe Lavagetto	19 July 2016, 09:23:15 UTC	parsoid::testing: move to use the parsoid class The parsoid class is using service::node and the service-runner based version of parsoid. This is part of I3f4a5 that is being split into multiple commits Bug: T90668 Change-Id: I863cf7cd7cb2a0d67e27647e032a8a357f91df28	19 July 2016, 16:35:35 UTC
9f29145	YuviPanda	19 July 2016, 16:28:20 UTC	tools: Use a different tool for k8s webservice checks Avoids race conditions with the gridengine webservice check Change-Id: I0ab62f3b03a015683785266ff24f213352a23333	19 July 2016, 16:31:25 UTC
5b53549	YuviPanda	19 July 2016, 16:31:11 UTC	tools: Fix spacing Change-Id: Ib412196c1656a4e16c67d224bf98cfc4107b7502	19 July 2016, 16:31:25 UTC
d29cda1	Stanislav Malyshev	13 July 2016, 19:55:42 UTC	Move updater logs config to /etc/wdqs Change-Id: Ia2460828a04901b9f08d6343dae3ee7ec8ed26dc Bug: T139434	19 July 2016, 16:20:52 UTC
50c3aa8	YuviPanda	19 July 2016, 15:40:23 UTC	tools: Add limited sudo capabilities to toolschecker account Can sudo as a couple other toolschecker accounts that are doing webservice specific things Change-Id: I240450bb0fbb45c668acd4b171bf083f678a39fb	19 July 2016, 16:16:14 UTC
893542f	Marko Obrovac	19 July 2016, 16:10:30 UTC	Parsoid: Lower the heartbeat timeout to 3 mins The timeout for killing a worker with too much memory is 3 minutes, so bring down CPU clogging to the same value, since if the CPU is busy, there is no way the memory can be freed by the GC. Bug: T90668 Change-Id: I98de7a7488a37ee2a31d580dd4044f67c235aefc	19 July 2016, 16:10:30 UTC
86e0e96	Marko Obrovac	19 July 2016, 16:05:03 UTC	Parsoid: Increase heap limit to 600 MB Bug: T90668 Change-Id: I65894df520f0511efa9957ec39166d944c761943	19 July 2016, 16:06:38 UTC
e5fe9f8	andrewbogott	16 July 2016, 13:47:13 UTC	Remove a typo space Change-Id: I458d523507c2e0f018694e9523724ef46489b3ec	16 July 2016, 13:47:13 UTC
beab284	andrewbogott	15 July 2016, 19:03:02 UTC	Add diamond collector for rabbitmq stats Change-Id: I450c088980804ff5ef58e71f204a08b52abb1fd6	19 July 2016, 15:42:16 UTC
9e2d508	Giuseppe Lavagetto	19 July 2016, 15:28:02 UTC	parsoid: add realserver ips to role::parsoid as well Change-Id: I555ae6341f01073999baf2d25ecf196134090353	19 July 2016, 15:28:02 UTC
27c477e	Giuseppe Lavagetto	19 July 2016, 09:21:01 UTC	parsoid: add role based on service::node, apply to two hosts Add a new parsoid class and role that use service::node to install parsoid and run it from the service-runner based deployment This is part of I3f4a5 that is being split into multiple commits Bug: T90668 Change-Id: I9236292664abaa649c8748ce3cf8d9ea237ffd1c	19 July 2016, 15:15:37 UTC
9ec4f2d	cpettet	19 July 2016, 13:29:33 UTC	tools: set cgred application to trusty only Precise has some compatibility issues and we are limping along on one historical bastion for legacy use cases. Jessie is untested and will need some investigation. Relevant hosts are intentionally Trusty. Bug: T140696 Change-Id: Ib1398c6c95091515ccb2c7f0f9192972c139617f	19 July 2016, 14:33:57 UTC
8cf9fc3	YuviPanda	19 July 2016, 14:03:22 UTC	tools: Add check for all nodes in Ready condition If they aren't, they should've been marked as unschedulable with the kubectl drain / cordon commands. Bug: T140248 Change-Id: I93d732a8de62ce2970af94e862ebf936f6a9067d	19 July 2016, 14:32:43 UTC
e5c323a	Alex Monk	17 July 2016, 21:48:09 UTC	labs dnsrecursor metaldns: Resolve PTR records too Bug: T139438 Change-Id: I690ecf6dcffdcf669b03c535594f0f4f44b09c65	19 July 2016, 14:26:12 UTC
0684e20	elukey	19 July 2016, 14:09:00 UTC	Add addshore and zfilipin back to the deployment group The usernames were removed by accident from the deployment group while working on T140422. Change-Id: I42edd3a1de2ea5c93d8e4c1d8592e72c6622ed5a	19 July 2016, 14:19:33 UTC
b2bec38	Alex Monk	17 July 2016, 21:46:38 UTC	labs dnsrecursor metaldns: use hiera's labs_tld instead of assuming its value Change-Id: Ic3558b5535e1e7ae2cda502694b7e2b961577dd2	19 July 2016, 14:13:24 UTC
a4c24c8	Alex Monk	17 July 2016, 01:51:31 UTC	labs dnsrecursor: tidy up paths Change-Id: If2391b815d726cb4f068a5096ce251825d466f50	19 July 2016, 14:13:16 UTC
6a7e9fc	Filippo Giunchedi	19 July 2016, 11:16:22 UTC	puppetmaster: show commit hash, not tree hash Looks more intuitive this way since all output afterwards will reference commits just merged. Change-Id: Ie830d6590f08a38fb95933458ba2848e7ae9acc4	19 July 2016, 14:05:01 UTC
f1dd0ca	Filippo Giunchedi	19 July 2016, 11:09:39 UTC	site: use 'include' for role::prometheus::node_exporter Apparently using 'role' did nothing (no errors, no actions) Bug: T140646 Change-Id: I4b12917a664e43d32bb94f7a708dd8c478eb983b	19 July 2016, 13:59:20 UTC
97e929c	Jcrespo	19 July 2016, 13:18:58 UTC	Install jessie by default on all dbproxies Bug: T125027 Change-Id: I870f183ec2adceb1fa442066c3f3ca107250e53e	19 July 2016, 13:38:18 UTC
990ef09	Jcrespo	19 July 2016, 13:16:01 UTC	Repool db1001 as m1 secondary (passive) host Bug: T125027 Change-Id: Iec35f3307fc05eb87c58e785131de38221f0c33e	19 July 2016, 13:20:02 UTC
0a15670	Max Semenik	15 July 2016, 23:27:36 UTC	Delete maps-team hiera Has been migrated to https://wikitech.wikimedia.org/wiki/Hiera:Maps-team Change-Id: I44433fb3b03cb995c2615ccdcce316b1a4586974	19 July 2016, 13:16:22 UTC
89a3491	Jcrespo	19 July 2016, 11:32:16 UTC	Revert "Regenerate haproxy defaults on reload, in addition to on start" This doesn't work on reload, because only a signal is sent to the original process, so the execution options cannot be changed on reload. This has been documented on: https://wikitech.wikimedia.org/wiki/Service_restarts#Haproxy (but it is not a wmf-specific limitation, but an upstream one). This reverts commit 1364f20e870f67b6c0d832896a64c6afcf6647bc. Change-Id: I53e242513d56df285f1f51e74c20741f9a79e004	19 July 2016, 13:03:56 UTC
b457c82	Bryan Davis	19 July 2016, 01:20:26 UTC	logstash: update logstash_optimize_index.sh for ES 2.x The _optimize API was renamed to _forcemerge in Elasticsearch 2.1.0. Change-Id: I43535bd8e26a40874ea51f993fd5279c4599df02	19 July 2016, 12:55:00 UTC
1364f20	Jcrespo	19 July 2016, 11:18:33 UTC	Regenerate haproxy defaults on reload, in addition to on start It worked when starting the proxy, but we want generate_haproxy_default.sh to also run on reload (as most of the time, the proxy configuration changes will only be applied dynamically). Bug: T125027 Change-Id: Ia271ee9813a06ab1440cc536a531ba90626c44d5	19 July 2016, 11:18:33 UTC
766c183	Jcrespo	19 July 2016, 11:09:44 UTC	Fixing typo on systemd haproxy unit (extra newline) Bug: T125027 Change-Id: I627e2f08a2e4b5555c4f9a4245a4a86ea50d862a	19 July 2016, 11:10:52 UTC
4fbcf2e	Jaime Crespo	29 February 2016, 19:17:35 UTC	Update haproxy default file, as it cannot be dynamic in jessie As in jessie (systemd) the default config file cannot be a script, but only a set of key-value pairs, we need to generate it automatically. This has been added to the systemd unit pre script and tested successfully on dbproxy1005. The "if" and the duplicate defaults file will be dropped as soon as all proxies are upgraded to jessie. Bug: T125027 Change-Id: If2dd396fc7bfc38201337b9d318f12e69dc2519e	19 July 2016, 11:04:26 UTC
5da0367	Filippo Giunchedi	18 July 2016, 15:27:26 UTC	site: add node_exporter for prometheus machines Bug: T140646 Change-Id: Ie1d0a87ea9d23fc7346d1ee4df7eaaab29c0c51e	19 July 2016, 10:40:22 UTC
d775eb6	Giuseppe Lavagetto	19 July 2016, 09:12:59 UTC	service::node: add 'entrypoint' and 'heartbeat_to' As we want to use service::node for deploying parsoid, we need to add two configuration parameters that will be used for it. This is part of I3f4a5 that is being split into multiple commits Bug: T90668 Change-Id: I57a656d97587101e1f9862bb6e1e56479741d307	19 July 2016, 09:47:12 UTC
66fb386	Giuseppe Lavagetto	19 July 2016, 09:44:09 UTC	parsoid: add transition cleanup role Should be used on servers after they've migrated to role::parsoid from role::parsoid::production. Some directories are not removed willingly. Bug: T90668 Change-Id: Id3d931e332c224870b037a3caf7836736534e650	19 July 2016, 09:44:09 UTC
c547247	elukey	19 July 2016, 08:19:43 UTC	Remove analytics-deploy user/group since they will be created by scap:target. Bug: T129151 Change-Id: Id16bab21926b307f95faabe5682d4d44131fa9c7	19 July 2016, 08:19:43 UTC
45e752e	Giuseppe Lavagetto	19 July 2016, 06:45:12 UTC	puppetmaster::web_test: add NameVirtualHost directive It was not present in the original apache config as only one virtual host was available. So add it, and also add a lower-priority virtual host for performing the tests. Change-Id: I190659667798b955e35346ab0443944858345791	19 July 2016, 08:12:35 UTC
ce0c4e0	Petr Pchelko	23 June 2016, 13:01:30 UTC	Change-Prop: Ignore certain errors on page_delete and null_edit. For page_delete event it's completely valid for RESTBase to respond with a 404, so we need to ignore that error and not send a message to the error topic. For null_edit event some of the titles that commonly receive null edits are explicitly blacklisted in RESTBase, so we need to ignore 403 errors for this event. Change-Id: I7553f16dcfd849af2226c5770f6e20ce476974d9	19 July 2016, 07:25:08 UTC
3e4c301	Giuseppe Lavagetto	15 July 2016, 13:42:28 UTC	puppetmaster: add test site to palladium The easiest way to validate that a puppet master works as expected is to compare catalogs generated from it and from the other backends; we thus create a 'puppet.test' virtualhost that points all its traffic to rhodium, the new puppetmaster we want to test right now. This will allow to run catalog compilations against the standard backends and against rhodium and check for errors/differences. Bug: T98173 Change-Id: Ie40e9608047cc834528cffe7f7b861c1f7a62085	19 July 2016, 06:18:17 UTC
8ec3d50	Paladox	18 July 2016, 23:22:41 UTC	[gerrit] Use HEAD instead of master for branch Since some repos use a different branch to master this will break the link for those repos. With this patch this fixes the link. Change-Id: I78c78d8a2968f59e6813f6d7b71617242c1ece32	19 July 2016, 02:32:47 UTC
7a40fdd	Chad Horohoe	19 July 2016, 01:27:28 UTC	Gerrit: Introduce comment of $maint_mode for the web UI This allows Gerrit to disappear with 503s while providing a useful error message back to the user. Don't just rely on ErrorDocument, as Gerrit might be going away and coming back several times. Should be a no-op (config-wise) on both ytterbium and lead until we actually use $maint_mode. Change-Id: Ideebea72ddbd8a9b919ffa319724ef1caecde7df	19 July 2016, 02:21:56 UTC
31ec1a8	Paladox	17 July 2016, 16:39:56 UTC	Add some colors to the site table on changes In the update to gerrit 2.12 it will remove the colors from the review table on changes making it harder to notice. Add some colors to it including making it a table with faint lines. I got the css from https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/files/gerrit/GerritSite.css Change-Id: I93dd695cedc4c456f50982f64f6defcfe0a85cb9	19 July 2016, 02:20:10 UTC
5138d7c	Paladox	18 July 2016, 18:46:13 UTC	Add css to turn repo links into blue again in gerrit 2.12 it seems the links are blank but in gerrit 2.8 it is blue. Let's not let the links look plain but actual links. Change-Id: Ibcb8054fcb44a00ac186bca6c606a17d19fef97c	19 July 2016, 02:08:25 UTC
fc26872	addshore	14 July 2016, 08:51:14 UTC	Introduce wmde-analytics-users group This group will allow members to manually run the crons associated with the stats:wmde role as well as individual scripts in the case of failed runs and or back filling of data. Bug: T140342 Change-Id: I090b49634aa594fb2006f44edee03bac5b86bed0	19 July 2016, 01:29:39 UTC
a04cc29	Daniel Zahn	19 July 2016, 00:10:05 UTC	gerrit: fix ssh port monitoring follow-up to If23443bca48f45e3 The command definition is: /usr/lib/nagios/plugins/check_ssh -p '$ARG1$' '$HOSTADDRESS$' so the host address is already automatic and there is just one arg, the port. Without this we are getting "check_ssh: Port number must be a positive integer" Change-Id: Ibe2048713bfe5bc2c83489aa00eec229220e1369	19 July 2016, 00:18:35 UTC
0ed4828	Chad Horohoe	18 July 2016, 23:04:27 UTC	Nitpick: Point to Phab tasks directly instead of BZ redirs Change-Id: I4fffee2c01665145e4cae9442f5283dfd1946962	18 July 2016, 23:57:20 UTC
62aee3a	Chad Horohoe	18 July 2016, 23:24:19 UTC	Gerrit: monitoring conflicts with system ssh Change-Id: I0d90baae8421ae1211a77ae1a7b34ade520e8a06	18 July 2016, 23:24:19 UTC
6cb4aab	Chad Horohoe	18 July 2016, 23:20:54 UTC	Gerrit: Follow-up I8455189c, use monitoring::service not nrpe::monitor_service Change-Id: I2417fb54acd84a9fb39b0ca6649b7218eea3804f	18 July 2016, 23:20:56 UTC
b0c36be	Chad Horohoe	18 July 2016, 23:14:25 UTC	Gerrit: Follow-up I8455189c, isn't needed now Change-Id: I194f7c1e5300876caf53eafcc8a9bfce0915c18a	18 July 2016, 23:14:25 UTC
c36f942	Chad Horohoe	09 July 2016, 00:47:13 UTC	Gerrit: Remove SSH public key and last user of it Gerrit now supports doing its own garbage cleaning. Stop using this hack to trigger it. I'm pretty sure the SSH key isn't being used anyway since it's in Gerrit itself. Change-Id: I8455189c964cfd2f49521db88980073f1f0b9b5b	18 July 2016, 23:09:21 UTC
b8d510d	Chad Horohoe	18 July 2016, 19:35:17 UTC	Gerrit: Add icinga check for Gerrit SSH access Change-Id: If23443bca48f45e33335ac580f6efcbd5d8585b1	18 July 2016, 22:53:35 UTC
