https://github.com/wikimedia/operations-puppet

Revision Message Commit Date
91ecae9 Merge "decom: ms6" into production 09 April 2014, 15:46:33 UTC
cf0af38 RT #7246 add Filippo Giunchedi to icinga cgi allowed commands Change-Id: I4d12616aff90dc7899a203204738e5594cc809d0 09 April 2014, 15:26:54 UTC
37406c7 Moving sqstat to analytics1003 It looks like it is working Change-Id: I98de7e30daddd5efe8f459d7c02a724a6f7c5b14 09 April 2014, 15:24:39 UTC
8b36418 RT #7246 add fgiunchedi to icinga contact groups Change-Id: I41a53683d842657cbccb88342b3b97258e904345 09 April 2014, 15:18:16 UTC
b3f371d Fix graphite-web cronjob. We should use shutil.move instead of os.rename, which does not work across filesystems. Change-Id: I55db9f76f9cc78a3a1ae6d6f505d19395a893fdf Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 09 April 2014, 14:56:49 UTC
bfbb946 Merge "RT #7243 add filippo to admins::roots" into production 09 April 2014, 14:56:44 UTC
5c24c1d Merge "Check the proper cert on the cache boxes" into production 09 April 2014, 14:56:38 UTC
d26b70e Check the proper cert on the cache boxes Change-Id: Iad22593c95a43c851c95526140504a0f32ef970d 09 April 2014, 14:38:18 UTC
25f593c RT #7243 add filippo to admins::roots Change-Id: I45226234db9e2c62cc2536d9fd77934e1e086ca1 09 April 2014, 14:33:35 UTC
332e3ed Merge "Some extra improvements for check_eth" into production 09 April 2014, 14:31:52 UTC
1cb337d Merge "This is still wrong but slightly better!" into production 09 April 2014, 14:29:35 UTC
198bc06 Some extra improvements for check_eth * Don't check virtual interfaces * Only complain if the autonegotiated speed is below 1Gbit * Don't bail out but rather continue on unknown interfaces * Clean up a bit Change-Id: Iae1f380705233a2fba4765049065665f6ba85bd4 09 April 2014, 14:25:25 UTC
b5342b3 fixing ticket.wikimedia.org apache vhost file discovered apache vhost typo/issue when replacing the key/cert Change-Id: I381c7e97ed51e075a90e930adb0e60ca26739051 RT: 7230 09 April 2014, 14:15:32 UTC
024418f This is still wrong but slightly better! Change-Id: I7147e506e9db6df3588512dd1c548618ee4bd8ae 09 April 2014, 14:11:05 UTC
2ddf301 Fix another bug introduced in I4a7a2c71be Call split was obviously wrong, gsub was originally intended Change-Id: I27e0aa6008aae959a536ca23b12a3dd0a45830dc 09 April 2014, 13:58:31 UTC
6edfa15 replace ticket.wikimedia.org certificate heartbleed cert replacement Change-Id: Ica03b7c1653c738dc6a5e8abda0112c90ddb614b RT: 7230 09 April 2014, 13:54:33 UTC
d7932c7 Fix bug introduced in I4a7a2c71be Obviously base module, not nrpe module Change-Id: I43e60fe5cc80e29c650df7a48e70cf06cb8c6d19 09 April 2014, 13:48:01 UTC
bbf5942 Merge "Improve the check_eth check" into production 09 April 2014, 13:44:57 UTC
ef01517 Merge "create shell account for Filippo Giunchedi" into production 09 April 2014, 13:43:10 UTC
387f343 Improve the check_eth check This improves the check_eth script introduced in 700cb5f6512aa151f99a99c672395d3b09d5a2dd by adding the ideas presented in Id5130db48c14b87752197ce8f0b022ef13e40e4b The idea is to have a single check and not two and at the same time make sure the entire infrastructure is working as configured Change-Id: I4a7a2c71be40941f812c4bcfa49c7ccb25ab36db 09 April 2014, 13:38:51 UTC
d6a4d8e decom: ms6 RT #7237 Change-Id: I3f57b6d7fe78b93de1aee6372626790f421fca96 09 April 2014, 13:35:39 UTC
1081e6e create shell account for Filippo Giunchedi new ops member RT #7243 Change-Id: Ibb34f4480be58fb339ae596443bf253f6cec47d2 09 April 2014, 13:33:38 UTC
ca4a1da Revert "Make (graphite|gdash).wm.o go through misc-eqiad-lb" It turns out that all of the magic this patch tries to do (renaming tests so they run on one box but report for another) doesn't work, and trying to do it makes icinga sad. For instance: Error: Could not find any host matching 'graphite.wikimedia.org' (config file '/etc/icinga/puppet_services.cfg', starting on line 131098) This reverts commit a87afea676375aba4f0ec28228e28df0502e5321. Change-Id: I58b4fb3de75e357dbac9a89f1fb4cd84efbff545 09 April 2014, 13:20:07 UTC
5a32066 Merge "Partial revert of a87afea676375aba4f0ec28228e28df0502e5321" into production 09 April 2014, 13:12:07 UTC
2e65b7b Merge "beta: reenable fatalmonitor script on eqiad" into production 09 April 2014, 12:52:11 UTC
22c9517 Partial revert of a87afea676375aba4f0ec28228e28df0502e5321 Previously we were setting hostname to $::realm which was intentionally bogus... that didn't do what was hoped, though, and caused icinga to bail out and skip a bunch of tests. Change-Id: If8102cffd4364639522546211690797dae8133a6 09 April 2014, 12:51:49 UTC
5b7895d Merge "remove all Tampa appservers from DHCP" into production 09 April 2014, 09:52:54 UTC
144dcd4 Merge "Hammer down a few more bogus https failures." into production 09 April 2014, 09:38:40 UTC
53d9a14 Merge "Create symlink for compile-wikiversions in /usr/local/bin" into production 09 April 2014, 09:36:03 UTC
4f11fcc Hammer down a few more bogus https failures. The new check_ssl_cert test is more sensitive to cert name in the unified cert... this patch changes a few more names to settle down the tests. RT 4725 Change-Id: I0fd3fa73c20e2f810dba52964660b26def0c8b70 09 April 2014, 09:35:20 UTC
3d387bc Merge "Make (graphite|gdash).wm.o go through misc-eqiad-lb" into production 09 April 2014, 03:41:37 UTC
a87afea Make (graphite|gdash).wm.o go through misc-eqiad-lb The service checks for graphite.wikimedia.org and gdash.wikimedia.org use HTTPS, but tungsten doesn't serve HTTPS requests -- it gets free SSL termination from misc-eqiad. We could convert the check to a 'check_http' instead, but it is better to have the check actually go through the load balancer. That way, the check really does check that the site is up. Also sets the 'host' arg for the 5xx check to $::realm, so that tungsten isn't cited in alerts that have nothing to do with it. Change-Id: If4e8f3f17374d4b994f7f04e3f96d49e51eb6e88 09 April 2014, 03:33:41 UTC
bdf7afd uWSGI service: specify --autoload This was previously set by /usr/share/uwsgi/conf/default.ini, but we are no longer loading that file because it tells uWSGI to daemonize, and we want Upstart to manage it instead. Change-Id: Iae0b0196724f922a21762a9401baf56c6f8d7bb1 09 April 2014, 02:38:51 UTC
3984214 Add service checks for mwprof, uwsgi, graphite-web & gdash * Adds checks for https://(graphite|gitblit).wikimedia.org * Adds checks for uWSGI and mwprof services Change-Id: Ib0bc4a95217a2c97ed16448990674ef3065cb9db 09 April 2014, 02:30:25 UTC
417545f Add 'uwsgictl' tool for managing services Provisions /sbin/uwsgictl, a simple wrapper around the uWSGI Upstart task configuration that makes managing the service easy. This is a pattern I've used elsewhere in our Puppet repository and that I find works well. Change-Id: I58414f30061a99e58d16dedd7edf4faba1919867 09 April 2014, 02:20:09 UTC
ecf16f6 Merge "Update graphite::web for I8c214e0fd" into production 09 April 2014, 02:03:19 UTC
dae5783 Update graphite::web for I8c214e0fd I8c214e0fd changed the service resource name for uWSGI from 'uwsgi' to 'uwsgi/init'. Change-Id: I6eb62158aa1dc34e0690857ce94c2aee276d1367 09 April 2014, 02:02:15 UTC
3b3503d Merge "Replace uWSGI's broken init.d scripts with Upstart job def" into production 09 April 2014, 02:00:12 UTC
158dec5 Replace uWSGI's broken init.d scripts with Upstart job def The init.d scripts for uWSGI leave processes hanging around and start new processes before these lingering processes have been terminated. They're also long and complicated and difficult to reason about. This patch replaces them with an Upstart task, which is a pattern I've used elsewhere (EventLogging, mwprof, Carbon) and that I find works very well. Change-Id: I8c214e0fd18c0f12768f016272a261b0f4476aa8 09 April 2014, 01:51:56 UTC
c2db9b6 Create symlink for compile-wikiversions in /usr/local/bin Symlink the new compile-wikiversions scap helper script to /usr/local/bin. Bug: 63659 Needed by: I143fc53 Change-Id: Ie6bb742cda33e1807ffb508cc5f54da42bfb6ead 08 April 2014, 22:47:34 UTC
97957ce Moving sqstat back to emery :/ This isn't working yet, and I don't have time to figure it out today. Change-Id: I105d6d697ee6f859f8579bf9e9c895dcd2d8bcd0 08 April 2014, 17:11:36 UTC
7ff103e Removing ethtool package from other places 94a1ff12c278c38b498949f57290acee61873eeb added it in base causing duplicate definitions Change-Id: I033dac1b341df6ddd4aece8042ae53a767bdd1bc 08 April 2014, 16:48:57 UTC
89ec3fb replace misc-web-lb cert replacement of ssl cert/keypairs Change-Id: Idc634111cd39c8afe29ee19e77b9291236e29668 RT: 7243 08 April 2014, 16:45:34 UTC
d58a956 invalid MariaDB variable name: user_stat Change-Id: I1c699a1269ddbd3d5118b97879b0acc3a1b1a603 08 April 2014, 16:37:56 UTC
46be863 Putting sqstat back on analytics1003 Change-Id: If488809a6cf2f52093d70d634ad0f59aedab611d 08 April 2014, 16:36:31 UTC
0b755f8 beta: reenable fatalmonitor script on eqiad Was protected with a $::site == pmtpa, dropping the guard. Change-Id: I25dc9289d7768183c6dae2bdd43084d1795e86a3 08 April 2014, 16:22:59 UTC
0e8cf01 Disabling statistics roles on stat1 Change-Id: I148731e478e86f31e077c0c7fb1cbbced1b39c36 08 April 2014, 16:20:03 UTC
2e686e4 When checking unified certs, check for *.wikipedia.org Our new test freaks out over *.wikimedia.org but it's all the same for monitoring purposes, both are in the unified cert. Change-Id: I3b72dafd03a6324d325ec5bb534498876200c7aa 08 April 2014, 15:44:58 UTC
c17b56d Merge "enable base monitoring for ALL hosts" into production 08 April 2014, 15:22:39 UTC
cf33aec enable base monitoring for ALL hosts remove the restriction that limited all of base monitoring (check DPKG, disk space, RAID etc) to just internal hosts. instead have it on all, since we also have NRPE on all hosts now since I1e25a532db02474a2d8105b25ae844fdb4e8e72b RT #80 Change-Id: I6f347c731493cda7836b181cf0fe81c1993151ab 08 April 2014, 15:14:57 UTC
4bd7059 Install and use check_ssl_cert tool to validate certs. rt 4725 Change-Id: Ic8fcea3207a2e51c40e414d72320a5be08b2fc13 08 April 2014, 14:54:03 UTC
fcea9a2 reprepro/updates - upgrading elasticsearch to 1.1 Change-Id: I1ea9450b45eb22046bef9830f8e4712c05231ba2 08 April 2014, 14:18:58 UTC
2e017aa Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" Broken for a month, group doesn't exist. This reverts commit be929bf21574f9530c886755f0724dfb3c5a7951. Change-Id: I76f374c28e4bef236dc21c5978b54dd602610dd8 08 April 2014, 13:42:10 UTC
e5afbe1 Merge "replace blog.wikimedia.org certificate" into production 08 April 2014, 13:37:33 UTC
4083802 replace blog.wikimedia.org certificate cert reissued Change-Id: If4ab7e875f4e35f41c39eef2752a2a0c5bb73af0 RT: 7216 08 April 2014, 13:35:20 UTC
3ee5c85 Merge "Tool Labs: forcibly upgrade libssl" into production 08 April 2014, 13:31:42 UTC
cd5116e add nrpe to base - add nrpe to base to have it on all hosts (RT #80) - remove it where base is already included - add base where it was not included yet Change-Id: I1e25a532db02474a2d8105b25ae844fdb4e8e72b 08 April 2014, 13:12:46 UTC
843433f Tool Labs: forcibly upgrade libssl Fix for CVE-2014-0160 Change-Id: I7ca170fb8fa712dd451d56851799acd761e52493 08 April 2014, 13:09:12 UTC
29cd201 Merge "adding ethtool to standard-packages.pp to be able to monitor interface speed" into production 08 April 2014, 13:05:08 UTC
94a1ff1 adding ethtool to standard-packages.pp to be able to monitor interface speed Change-Id: Ic8d6c392e723f47a6ac8b00652e8ffbf8adb0730 08 April 2014, 12:45:17 UTC
b01073a Remove unused db1014 block. db1014 was renamed tungsten rt5871. Change-Id: Iff516353a4e0fdd96ee7198e4a7dda5a33a813f3 08 April 2014, 10:16:04 UTC
700cb5f Add eth1 checks to nova compute hosts. RT ticket: 7220 Change-Id: If92cee98f0f69786c0176aeb87250bd62444da8b 08 April 2014, 10:07:59 UTC
48a4b71 Replacing the unified certificate unified cert replacement/revocation/reissue Change-Id: I8d3c83b84981ac05532c0ecb9f63dd684e42bc32 RT: 7223 08 April 2014, 09:08:07 UTC
282111f base: add debian-goodies checkrestart is actually very useful, so install debian-goodies across the fleet. Change-Id: Ie37daa5d69f3c1bded6a865a26123c8d4d7bf5b1 08 April 2014, 07:29:55 UTC
55e7610 Merge "webperf: couple lint issues in python scripts" into production 07 April 2014, 21:52:59 UTC
1b2b68b Support eqiad labs secondary disk Use the labs_lvm::volume class to mount varnish cache disk for labs instances. Mount it as XFS to solve bug 46359 Bug: 46359 Change-Id: I52403cd1e32770d3c41c930dd6c836b0637e7a08 07 April 2014, 20:32:48 UTC
ad8f382 webperf: couple lint issues in python scripts navtiming.py raises an ArgumentError which comes from the argparse module. Fully qualify it. ve.py imported logging without ever using it Change-Id: I1dd3109fe53e5c5d263af7f2ef16b1d440af1d85 07 April 2014, 20:28:23 UTC
15211d9 Adding + 2 Hadoop journalnodes in Row D RT 7206 This brings us to 5 journalnodes, which is a fine quorum. We could attempt to just add one and remove one, but adding is easier than removing, and having 5 should be fine. Change-Id: Iaadbb44cfbbe31b07efcdacd0ce33d58cbb1d35d 07 April 2014, 19:06:39 UTC
7a1fb19 Merge "Add Icinga checks for important sysctl params" into production 07 April 2014, 18:38:17 UTC
73f39a5 l10nupdate: add support for --verbose flag Calling `l10nupdate --verbose` or `l10nupdate-1 --verbose` will log additional progress messages as the l10n cache is updated. Also use the new --verbose flag when running from cron. Change-Id: If5ce751ba9d22c8584e942aae901c743911ebfe3 07 April 2014, 17:13:31 UTC
278ed64 Adding swalling, maryana and jforrester to bast1001 They have confirmed that they have read https://wikitech.wikimedia.org/wiki/Server_access_responsibilities These users will need bastion access to access stat1003 when base::firewall is turned back on this week Change-Id: I4e675fb6e011b4fcd6613d9f7bfd1230a1be3db0 07 April 2014, 17:11:21 UTC
6c64d69 Moving sqstat filter back to erbium until analytics can talk to statsd Change-Id: I5ea18eed8aaa9dd5613fce024632fc68ed4b0777 07 April 2014, 17:10:08 UTC
4701dcb Not monitoring logs on analytics1003 misc udp2log instance Change-Id: Ie4ac78525ad417c9d70c25861e686a1ec289ccf5 07 April 2014, 16:37:10 UTC
4b008e1 Including standard and admins::roots on analytics1003 Change-Id: Ic7a42b196faf6bdbf3cb83b1f7b1adc8016ad4e9 07 April 2014, 16:06:02 UTC
ad45441 Need to include udp2log class since ::misc doesn't inherit Change-Id: I80ff6b4724a132d7604a7023c87df4c04081a577 07 April 2014, 16:00:44 UTC
1b18df1 Setting up misc udp2log instance on analytics1003 to test moving sqstat there Removing sqstat from emery Change-Id: I28c366112cb4e9a884731bdc7f467db446b8090f 07 April 2014, 15:49:46 UTC
e0cc9e7 ganglia: address selector in a define Make puppet-lint happier. Change-Id: Icbe3df57073469b567d20fb811f6d1d58911a3a3 07 April 2014, 13:28:43 UTC
8c4748b ganglia: lint manifest! Pass puppet-lint on manifests/ganglia.pp. No logical changes hopefully. Change-Id: I68c174d6dc14b0af03ed9be347583e37c688cc82 07 April 2014, 13:27:04 UTC
a6594e3 ganglia: class defined in class moved to top Let's avoid classes defined inside a class, makes it hard to grep for them plus puppet-lint complains. Change-Id: I81b73f922ac8a8f47cf9e5c124b7e4f3daa08e7b 07 April 2014, 13:24:06 UTC
c6ee102 Merge "mail :lint" into production 07 April 2014, 13:23:24 UTC
e3a6075 Purge /etc/apt/apt.conf /etc/apt/apt.conf gets created during the installation process, either via d-i or labs vmbuilder. It is no longer needed after the install is over and given it is unpuppetized it causes various problems during migrations This was found during RT: 6133 Change-Id: I8c0031851f2f3cf5aa0b558a564e31de70cfbfc8 07 April 2014, 11:36:37 UTC
14ecc89 Add an account for subbu on Parsoid / Cassandra test hosts Change-Id: I9fc17b7dbf6df3dadba08f4699dc4c5fd4b66151 07 April 2014, 10:11:50 UTC
6a52c94 removing spetrea's ssh key per RT #7203 Change-Id: If404e2675ed05539e9cd6447ffa3596617ad3980 07 April 2014, 06:25:58 UTC
d8a9e00 Merge "Jon Robson access to fluorine" into production 07 April 2014, 06:03:00 UTC
5f07e69 Add Icinga checks for important sysctl params Add 'check_sysctl', an Icinga plug-in that issues alerts whenever the actual and expected values of sysctl parameters are not the same. I made two additional changes that were prompted by rereading the README of the Debian procps package: "In general, files in the 10-*.conf range come from the procps package and serve as system defaults. Other packages install their files in the 30-*.conf range, to override system defaults. End-users can use 60-*.conf and above, or use /etc/sysctl.conf directly, which overrides anything in this directory." Thus: * Make the default priority of sysctl::conffile and sysctl::parameters resources 60, rather than 10. Any additional sysctl::parameters resources won't come from the procps package itself, so the default priority of 10 is neither sensible nor safe. * Change priority of Sysctl::Parameters['lvs'] to 60 so that it conforms with the priority namespacing policy of the package. (The 50 range is not defined.) Change-Id: If004835fb369810bba553dce2d2b8df7b8d364ac 06 April 2014, 02:36:34 UTC
5bd04c0 Ensure that status is always defined in deploy.checkout Bug: 60068 Change-Id: I6db4da1c9d899f4729f5f37eca857b30aa886f39 05 April 2014, 18:52:27 UTC
152e141 base::firewall always needs defs, but main-input should be absent if ensure => absent Change-Id: I153b175f558944a203e9bc923c7baab742f6a82c 04 April 2014, 19:50:48 UTC
223354e No need to remove ferm confs on base::firewall ensure => absent Ferm gets all confused if this happens! We only need to ensure that ferm::rules are absent Change-Id: Ic522f5c67b8ccf7693ded9b74021022a5e0df0f0 04 April 2014, 19:43:50 UTC
9037355 Disabling base::firewall on stat1003 This is a temporary change, until we grant bastion access to all stat1003 users. They will have to first each confirm that they have read the Server_access_responsibilities document before being given bastion access. Change-Id: I2517511b8c1b64724d4fcd9df49b32bf957c978d 04 April 2014, 19:36:01 UTC
f0bd148 Reverting previous change, we will figure this out next week. This reverts https://gerrit.wikimedia.org/r/#/c/123895/ Change-Id: I17bec11875b172473080f347c2f36f73cdb26b20 04 April 2014, 19:33:13 UTC
9c062f2 Merge "Tool Labs: Fixes for webgrid" into production 04 April 2014, 19:28:16 UTC
1edbce0 Adding stat1003 accounts to bast1001 Since stat1003 doesn't have direct ssh access anymore, accounts there need to be present on bastion so that ssh proxying will work. We need to find a better way to grant just bastion access. admins::restricted gives access to more places than just bastions. Change-Id: I98f0b203b3a21d07b9fe2b44774ecfd834e81de1 04 April 2014, 18:57:02 UTC
cec4c16 Tool Labs: Fixes for webgrid - /var/run/lighttpd needs to be +t - php5-cli is required Change-Id: Id01a00b087b2f2e51dea0cfb82443c2d5d6987ac 04 April 2014, 16:50:15 UTC
e8a07be Labs: -pmtpa support from role::labs::instance Change-Id: I8b669b8f0a20aac69114ead970db71208def9132 04 April 2014, 16:29:52 UTC
0f426c7 Including base::firewall on stat1003, allowing rsyncd traffic within internal network Change-Id: Ib90075da150372a2ebe9b40bd749b0591f131f23 04 April 2014, 14:41:43 UTC
00d3f06 Adding $ensure parameter to base::firewall This makes it possible to remove base::firewall via puppet Change-Id: I2ec0d72a44597804c42052114e5cf31713f871ae 04 April 2014, 14:19:27 UTC
d67ecf4 Adding $srange parameter to ferm::service This uses the already existing R_SERVICE ferm def. Change-Id: I47ff32faf94bb4251bb6223d0f7068c48e204661 04 April 2014, 14:17:28 UTC
e654382 remove all Tampa appservers from DHCP those are down. the reason to keep DHCP so far was that it was considered to maybe use that to mass-boot them into a DBAN image for wiping (RobH/Chris) letting them decide if we do or not, if these will never boot and use DHCP again, please merge RT #6099 Change-Id: I73e523787a21e56a1ae1e2b179b22e90f57b7ce7 04 April 2014, 13:47:55 UTC
9c746e9 Update docu around analytics rsync source Public datasets get rsynced from stat1003 since 185c6cb7aed6f379dd0585312916bb2117d64fa3. Change-Id: Ib8f0b15b96905b858598356dbe057453897c4a7a 04 April 2014, 13:18:23 UTC
67bc247 Remove group writability for analytics files /a/squid, and /a/log Due to the group writability of files underneath /a/squid, and /a/log, they sometimes accidentally get modified by people. Hence, we remove the group writability flag, because it is not needed there anyways. Bug: 63505 Change-Id: Ie32df433472a6ecca47bde8d694e0eca771185c1 04 April 2014, 13:17:44 UTC
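
Note on b3f371d (Fix graphite-web cronjob): the message says os.rename does not work across filesystems and shutil.move should be used instead. The following is a minimal Python sketch of that difference, not the cronjob's actual code. The helper name move_file and the file name report.txt are hypothetical and chosen only for illustration.

```python
import os
import shutil
import tempfile


def move_file(src: str, dst: str) -> str:
    """Move src to dst, even when the two paths are on different filesystems.

    os.rename(src, dst) raises OSError (errno EXDEV) when src and dst are on
    different mount points. shutil.move tries a rename first and falls back
    to copying the file and deleting the source when a rename is not possible.
    """
    return shutil.move(src, dst)


if __name__ == "__main__":
    # Hypothetical demo: both temp dirs are usually on the same filesystem,
    # so this only shows the call shape, not the cross-device fallback.
    with tempfile.TemporaryDirectory() as src_dir, \
            tempfile.TemporaryDirectory() as dst_dir:
        src = os.path.join(src_dir, "report.txt")
        dst = os.path.join(dst_dir, "report.txt")
        with open(src, "w") as handle:
            handle.write("data\n")
        move_file(src, dst)
        print(os.path.exists(src), os.path.exists(dst))
```

The trade-off is that the copy-and-delete fallback is not atomic, unlike a same-filesystem rename, so a reader of the destination could observe a partially written file during the copy.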