https://github.com/wikimedia/operations-puppet

Revision Message Commit Date
75ccc63 icinga: Move config into module - No need for $static_files, since service refresh is triggered by notify in the class declaration - No need for individual service notifications inside naggen, are handled in the class declaration Change-Id: I11cdd1817a3c14ff4bd0a31eef2a0e2d2e7f2ccb 26 September 2014, 19:43:54 UTC
9b84ffa Merge "setting install params for ms-fe2001-2004" into production 26 September 2014, 19:42:57 UTC
ecd4e4b setting install params for ms-fe2001-2004 setting the install params for codfw ms-fe servers Change-Id: If1d26057fbb37240d3a1fb54e251d77fed3acf37 RT: 8295 26 September 2014, 19:38:20 UTC
5e9b551 icinga: Move initscript into module Ideally we should just use the one that comes with icinga, but not in this refactor Change-Id: Ided256110a32beb6507e64fcb87d5b5f0cbe7ad0 26 September 2014, 19:23:22 UTC
cdcdfcf nagios_common: Move check_ganglia into module Change-Id: Id04fa056ddbed5e94be45ddd3c08773e521afdab 26 September 2014, 19:22:57 UTC
7426412 icinga: Move global monitoring hostgroups into module Change-Id: I90fb5b8fa2cb964cf68f352753765bb2fad42e14 26 September 2014, 19:22:42 UTC
8d866c3 icinga: Move naggen into module - Setup service notification in the file {} definitions, and setup a dependency to make sure it executes in order. This is more consistent than using subscribes - Remove the 'permissions fix' code, since all the files seem to be generated with appropriate permissions anyway Change-Id: Icc37c98309b0106de3174a8c39de2c77f50d5652 26 September 2014, 19:22:15 UTC
12dc741 Merge "fix duplicate declaration of http check" into production 26 September 2014, 19:19:09 UTC
94f5cdf fix duplicate declaration of http check We have to use more specific names than just 'http' for those checks; otherwise, if we combine the roles on one host, as I just did, Puppet fails with: Error 400 on SERVER: Duplicate declaration: Monitor_service[http] is already declared ... Change-Id: Id6d8bf8b99407cc82bc2e5b4cc561cdd8476cf85 26 September 2014, 19:18:01 UTC
d8502ce terbium - include misc::noc-wikimedia to move noc.wikimedia.org away from fenari and over to terbium Change-Id: I3939a435b8238f6e51168bc618c4d6ecb4082ccc RT: 6145 26 September 2014, 19:06:31 UTC
4125a9a Use $::instanceproject as Hadoop user group in labs Change-Id: I2265897b312bfc62fc7eb909bf3368f68722bb66 26 September 2014, 18:59:29 UTC
968cc80 Increase OCG warning/critical space thresholds. We're working on tracking down a space leak here. But in the interim, there's no reason to have these thresholds so low: the machines in question have 500G disk, of which 453G is currently free. (And 32G ram disk.) Bug: 71341 Bug: 71260 Change-Id: I094fcd5d46a1a97acafd75484d0b5959c08448e7 26 September 2014, 18:49:56 UTC
c56e0c8 Move hhvm backend choice back to backend templates ... because the common include file doesn't have access to the variable to distinguish tiers, and I don't have a cleaner idea to fix this in one shot right now... Change-Id: I1d5ff97d8f24847f65032eb3e62c9bc505286080 26 September 2014, 18:43:39 UTC
ad72c04 hhvm_api varnish fixup Change-Id: Ib399d1132f894d77ac3f2c0b464e3a36ad4b455f 26 September 2014, 18:33:48 UTC
4a7f826 hhvm: serve API as well Change-Id: Ib9d4a6d25573569236facee6ea7c3f20f8dbe605 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 26 September 2014, 18:28:02 UTC
1128c2f Include CA certs everywhere rather than just when we need them. Because... can't hurt. Change-Id: I87a5a81791242843f04644432f66c4ab70b9eb71 26 September 2014, 06:12:28 UTC
c3473c6 We need ldap certs on some production boxes as well. Change-Id: I9e56e18c18ec652c7f8ec12859cb49fb5b02a68e 26 September 2014, 15:58:02 UTC
6ab3d25 Merge "Update labs instances to use the new ldap-eqiad server" into production 26 September 2014, 15:28:03 UTC
e5615cd icinga: Move packages into module Change-Id: I6547e2b983d4d188fe0f7575d398db6ea4f56d1b 26 September 2014, 15:24:44 UTC
5407de0 Merge "attempt to move ocg logs back to /var/log/ocg.log" into production 26 September 2014, 15:22:28 UTC
0fd1cb9 icinga: Remove resource.cfg from refresh list Handled by nagios_common::user_macros Change-Id: I4b7833b1203efc024d4da827a33cacaf08af5739 26 September 2014, 15:12:08 UTC
63311cb shinken: Specify config_dir for contacts Change-Id: I7bf3da4c5af1ef3038e7e999f504ea35152ee04d 26 September 2014, 15:11:47 UTC
2246941 attempt to move ocg logs back to /var/log/ocg.log Change-Id: I8aa178f71bf285f714ea576f889d7a612e861489 26 September 2014, 15:10:44 UTC
16f9712 Setup ldap client in terbium With this, silver (according to puppet, at least) has no more functions, and can be decommissioned Change-Id: I68c9cd9c4e636d4ca98f12f619edfaffe7462657 26 September 2014, 15:05:52 UTC
c76f4ae Remove all vumi related code This is all hosted externally now and unused internally. We might want to figure out if we should add ::decomission code to kill the things that were running, or do that manually on silver. This also removes most unmodularized code under 'mobile/', so yay Change-Id: I56746045a53a6f8c2f9e7e41c0fceecf17943373 26 September 2014, 15:05:20 UTC
5b10cb2 puppet-compiler: fix typos Change-Id: I8ab22da0e0bd88cb223849d83256bb06f75e05eb Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 26 September 2014, 14:34:11 UTC
794dbdb Merge "Send all ariel's mail to Google Apps" into production 26 September 2014, 13:58:43 UTC
0a341fd authdns: Add trailing comma on extra listeners fixup for 0bfdf1712536321f7cfebe953be16cad502d4051 Change-Id: Id11510d18ba9f75dc3526c995f7139f8804b0f08 26 September 2014, 13:55:43 UTC
984e471 authdns: set up extra listeners for codfw transitions rubidium (eqiad) now also listens on the addrs for mexia (pmtpa) and baham (codfw). baham (codfw) listens on the addr for mexia (pmtpa) as well. Change-Id: I29d6ebadbc60aae7f52a6a58e268fab04c4ab98a 26 September 2014, 13:52:21 UTC
8930c23 dsh: Move dsh related code into a module Removed the dsh_groups define since that wasn't used anywhere. It was introduced in I0b86c6939821434ff40e5360e17d21f6e0a3ee3f with a note saying 'not included anywhere', and it has been more than a year and it still is not used anywhere. Change-Id: Ic1bacaae8b3ef6fec2065ab8b63e72f3dcf6b2d5 26 September 2014, 13:51:26 UTC
c92cf94 Send all ariel's mail to Google Apps Ariel seems to have the same issue I have with different primary account names on the two systems. Change-Id: Ib4e7b9ef50518237ef74e4832f9662a54f675602 26 September 2014, 13:28:37 UTC
bafa768 mediawiki: allow additional pools on one mediawiki host Change-Id: I80db47e22b3ced3577cfe8e27a23384816c4354f Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 26 September 2014, 11:32:34 UTC
b856e72 convert mathoid from high-traffic to low-traffic Bad assumption on my part to put it in high-traffic anyway Add service IP for mathoid in LVS balancer IPs Change-Id: I9ac218a56e4794d968058d401fe329a449ea5307 26 September 2014, 11:22:04 UTC
6d40953 Add service ip for apis under hhvm Change-Id: Iacc83804687cb48de25b3ba9e0ef71dff907a4f9 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 26 September 2014, 09:22:07 UTC
0c7d4ce lvs: add hhvm-api.svc.eqiad.wmnet Change-Id: I777e640dd22413c7f18b367448a938ab518b6d1f Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 26 September 2014, 09:09:46 UTC
c4a28b8 install-server: install lldpd early rationale being that this way a newly-installed machine will be listed in 'show lldp neigh' command on the switches, confirming where the machine is and the link status Change-Id: I4ab7606b99f4c16c0c9d518a6c36b5952b21ff28 26 September 2014, 08:32:13 UTC
846cb58 hhvm: do not install a specific version anymore Change-Id: I6ab798693e0f04cf1d43355aa7311a0771bbe0b9 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 26 September 2014, 08:19:30 UTC
c79f865 Update labs instances to use the new ldap-eqiad server Change-Id: I0a68dd9c69179e51cdfeded7b9336f6ee378a5af 26 September 2014, 03:21:34 UTC
132be05 Merge "beta: Remove Apache::Conf['hhvm_catchall'] from mediawiki::web::beta_sites" into production 25 September 2014, 23:54:38 UTC
6e7fae0 beta: more deps for beta cluster jobrunners Taken from local commit made to deployment-salt on 2014-09-10. Change-Id: Ia3ce27a3b3539a48bd29d4582961f43d928755c0 25 September 2014, 23:24:16 UTC
b02284b nagios_common: Remove duplicate paging definition Change-Id: I4de3bd9eab92565706f02f6e95ceed43a9bd7d53 25 September 2014, 23:12:37 UTC
bb9c35c nagios_common: Fix stupid copy paste error Change-Id: I31640565705f1712ff21837a11b519269877cfd9 25 September 2014, 22:57:25 UTC
cd12d8c beta: Remove Apache::Conf['hhvm_catchall'] from mediawiki::web::beta_sites role::beta::appserver includes both mediawiki::web and mediawiki::web::beta_sites. mediawiki::web includes mediawiki::web::modules which declares Apache::Conf['hhvm_catchall'] since Ie7c0dc64f7c6268af79b774e8653f5d4da1710d9. Beta has had a local hack for this created by jeremyb on 2014-09-10 and I just got around to tracking down the source of the conflict. Change-Id: Ibed9bb8df9f1bfbf316720ea0ddc3a91939b32d0 25 September 2014, 22:57:02 UTC
d75416c nagios_common: Use macro to expand path to wikidata check RT: 8456 Change-Id: I063e3609c4964275fb1d606fe8dc2cfbefc30bc0 25 September 2014, 22:25:29 UTC
0832c72 Ensure icedtea-7-jre-jamvm is absent on analytics::clients Change-Id: Ib4979757cfeaf2585d1a6579fdaadaa6ce8dccd4 25 September 2014, 22:03:07 UTC
55baa9e Mount HDFS at /mnt/hdfs read only on role::analytics::clients (stat1002 and analytics1027) Change-Id: I4c70f573f6fb23c64e469064af5a0b5632972bef 25 September 2014, 21:04:44 UTC
0d9476a Configure archive table name Depends on: https://gerrit.wikimedia.org/r/#/c/162292/ (NOTE: I've seen people express this dependency in gerrit, some pointers on how to do that would be appreciated) Change-Id: I9fa340666a58cd4b7d2c3be934fa3db0d4f817df 25 September 2014, 20:43:31 UTC
a05b98a temp fix for virt100X ntp Change-Id: Ib4071da20e690f9649a7751d74f5880eff958c78 25 September 2014, 19:17:39 UTC
5d9b8d9 switch debian installer to new NTP aliases Change-Id: If3e2f3afddf6bded68624a0fb7257d4fc4230261 25 September 2014, 18:59:21 UTC
fedd077 Merge "Remove 'epic' from the notice message for check_puppetrun" into production 25 September 2014, 18:16:32 UTC
658134f Remove 'epic' from the notice message for check_puppetrun These are not 'epic' failures. If they were epic, we would tell our children and grandchildren about them. Change-Id: I3fcdd0cbd29f5a16b52c4234330fed75f1a22523 25 September 2014, 18:14:16 UTC
94a4446 Switch all clients to new NTP servers ... and remove the old puppet classes ... and leave dobson/linne NTP servers running unmanaged Change-Id: Iaad6e127d0efed1c624772c07b9528627796c47a 25 September 2014, 17:47:12 UTC
1948be0 ntp: another eu s2 list improvement Change-Id: Ibe3f5eba614de17aed14b7e3e5b7a842641cbe58 25 September 2014, 17:37:36 UTC
9cd5ac1 fix one of the EU ntp s2 Change-Id: Ie50ea0fe21a6de641296c85d295cefc3d6d1442a 25 September 2014, 17:32:04 UTC
856e5cb ntp: use explicit S2 upstreams As noted in the ntp.pp comments this is less than ideal. However, if we want to maintain "restrict default ignore" (which is probably a good idea), we can't use pools until we get the "restrict source" feature, which isn't in a stable NTP release (much less distro package) yet. Change-Id: I629b3de0d3424e21b6441a31bcb5bac11f87286e 25 September 2014, 17:17:05 UTC
7b4f2ba Graphite: set Access-Control-Allow-Credentials for Grafana ..and exempt OPTIONS requests from the auth requirement, so CORS preflight works. Per http://grafana.org/docs/#graphite-server-config Change-Id: I9303834fbfee8292a4343a39cee842b546e3b45b 25 September 2014, 17:10:52 UTC
0bff6c4 nagios_common: Move contacts managing into module Change-Id: Ic0f38f1c304bc6fea1073be155f84dd5ad953b9d 25 September 2014, 16:53:19 UTC
5d2c83d Merge "nagios_common: Move check_paging into module" into production 25 September 2014, 16:52:40 UTC
a286a7b Merge "icinga: Move wikidata monitoring into module" into production 25 September 2014, 16:46:13 UTC
19f4475 Merge "icinga: Remove analytics.cfg according to TEMP: message" into production 25 September 2014, 16:46:03 UTC
69d2774 Merge "icinga: Move NSCA code into module" into production 25 September 2014, 16:36:34 UTC
f538916 HHVM: update JIT settings * Update configuration key names for <https://github.com/facebook/hhvm/commit/ca99ef1>. * Keep a_size, a_cold, and a_frozen at a ratio of 1 : 0.33 : 1, per Brett's advice. Change-Id: I192efa726a76bcc799490a164cf9bed0e56d5552 25 September 2014, 16:27:27 UTC
3d33020 Merge "icinga: Move user / group setup into module" into production 25 September 2014, 16:26:15 UTC
39f3e70 Merge "icinga: Move logrotate into module" into production 25 September 2014, 16:26:03 UTC
95d5eb2 Merge "icinga: Move icinga web into module" into production 25 September 2014, 16:21:45 UTC
450debf nagios_common: Move check_paging into module Also move the actual service into icinga module Change-Id: If9ccfbe1278d8d7fa8037a1f42c265a9141bdf63 25 September 2014, 16:19:35 UTC
49b3249 icinga: Move wikidata monitoring into module Change-Id: I001f9356ca565b54d5523204610c150ed9196a6f 25 September 2014, 16:19:35 UTC
b0cb191 icinga: Move user / group setup into module Change-Id: Ib2607d3502cad47d6f375603079faac69428531e 25 September 2014, 16:19:35 UTC
b7e9289 icinga: Move icinga web into module Created new icinga module, moved the bits that serve the web UI into the module Change-Id: I6b85a7d9d7954f02ba13ae82289d8515295109ef 25 September 2014, 16:19:35 UTC
bd07f52 icinga: Remove analytics.cfg according to TEMP: message Should've been removed by now Change-Id: I69252549a31533969c2b920c8f7c9a231bc56903 25 September 2014, 16:19:35 UTC
e2f2006 icinga: Move NSCA code into module - nsca::client does not seem to be used anywhere except the hadoop nodes - the firewall rules misspelt NSCA as NCSA, corrected (I assume, since the port numbers match NSCA, and NCSA seems totally unrelated) Change-Id: I4c6e84da376796253689a980a382ff62c8e29f56 25 September 2014, 16:19:35 UTC
e3324e4 icinga: Move logrotate into module Change-Id: I76006a6118e6a27ff9ae70454a1a91589fe0ace8 25 September 2014, 16:19:35 UTC
fb29fca Temporarily add the ldap-codfw cert to neptunium. Change-Id: Iad0c8b859c98cd5cbe6280139c2e503774e1bc04 25 September 2014, 16:14:32 UTC
ec97254 nagios_common: Move notification_commands into own class Change-Id: I43471fd6f7822b603b69ee167a760622d39bd718 25 September 2014, 16:11:06 UTC
b482299 nagios_common: Move timeperiods definition into module Change-Id: Ieeab484c64434b6f14c24deaf90a78f2fcdef051 25 September 2014, 16:09:23 UTC
e6a9480 nagios_common: Temp. move notify commands into check_commands This actually populates the file, so icinga will run. Will refactor out soon Change-Id: I0409c040e23f44de0370a52be07418acfb7cb95c 25 September 2014, 15:47:48 UTC
a6f89d1 Fix for new ntp.conf template Change-Id: Ib6df7eaabf2e780812583a3bd2147ebcbf168c24 25 September 2014, 15:37:06 UTC
73da7ac NTP config refactoring + updates Puppet-level stuff: * Does not touch existing NTP classes * Replaces the former classes ntp::client and ntp::server with a single define named 'ntp::daemon' that covers both cases equally via configfile template variables. * Adds class role::ntp (to be included on virtually every node via base/standard), which includes our full NTP layout/configuration for WMF and sorts out the server and client variables within itself. * Only on the new recdns boxes in site.pp for now (and a test client in codfw), will manually manage the push to those and fix up any issues before deploying wider and reaping the old classes/template NTP-level stuff: * Our new NTP servers are the 5x recdns servers (2x eqiad, 2x codfw, 1x esams). * All our NTP servers peer with each other, and upstream to the public NTP pool using a local zone (europe or us). * All our regular clients use the internal servers from 2x datacenters as their servers, and do not peer. * Access controls tightened/fixed. Change-Id: I797235f7605d274ef26f922f6e48546b0f1e6082 25 September 2014, 15:32:11 UTC
9be08e8 nagios_common: Move notification commands into module This is IRC and Email event handlers and such. Change-Id: I4e7a356d589512baad3af4ae38db9cf07231b234 25 September 2014, 15:27:11 UTC
1068abe nagios_common: move vsz into module Change-Id: Ie2a01656c185dc020dd48a1fe9e918f701d4d032 25 September 2014, 15:27:04 UTC
e83a621 nagios_common: move procs into module Change-Id: I12390db071a200d5be01121679f5df987bf94fc1 25 September 2014, 15:26:57 UTC
e72d724 nagios_common: move ping into module Change-Id: I18004176ee53e8d0071b6ea57f75ab36a369c181 25 September 2014, 15:26:46 UTC
5cd755a nagios_common: move pgsql into module Change-Id: I9662c4e1f9b6dcbd0ac38e8ec8dc6e076de9cb89 25 September 2014, 15:15:33 UTC
9749a13 nagios_common: move ntp into module Change-Id: I4a48506f08cb52f98886ba1ff8bd51ef8305caef 25 September 2014, 15:15:27 UTC
18c0e3f nagios_common: move nt into module Change-Id: Ib6c68bfa17e39db1084132789bbe95e62f59c8eb 25 September 2014, 15:15:22 UTC
f3dc160 nagios_common: move news into module Change-Id: Ifb50318779ec88bec8f581a8aa2bf113936ae174 25 September 2014, 15:15:15 UTC
082d5a0 nagios_common: move netware into module Change-Id: I8b3258c8f5a0a0e8c3a94653f7ee76038f238a31 25 September 2014, 15:15:10 UTC
4a25fc5 nagios_common: move mysql into module Change-Id: Ic7d169480b4cc8f649428420c24c8759a3aec31e 25 September 2014, 15:15:03 UTC
5710b91 nagios_common: move mrtg into module Change-Id: Ieb71ba1ca0f676798f0d286157618185c2003c7b 25 September 2014, 15:14:33 UTC
3976e5f Merge "nagios_common: Move mail into module" into production 25 September 2014, 15:10:11 UTC
dfce672 Merge "nagios_common: Move load into module" into production 25 September 2014, 15:10:05 UTC
f30ade8 Merge "nagios_common: Move ldap into module" into production 25 September 2014, 15:09:58 UTC
83c71f9 Merge "nagios_common: Move ifstatus into module" into production 25 September 2014, 15:09:49 UTC
2de9a55 Merge "nagios_common: Move http into module" into production 25 September 2014, 15:09:42 UTC
390f103 Merge "icinga: Remove hppjd check" into production 25 September 2014, 15:09:34 UTC
b98cfe4 Merge "nagios_common: Move ftp into module" into production 25 September 2014, 15:02:27 UTC
55fb658 hiera: use structured data in the private repo as well. Change-Id: I6896d3de64dcff5ffb27ee1e5412cd46e2b136ee Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 25 September 2014, 14:45:55 UTC
bc898af nagios_common: Move mail into module Change-Id: I3275271c26dfb5416065ed45becb643b9e73b0fa 25 September 2014, 14:45:40 UTC
76fd031 nagios_common: Move load into module Change-Id: I3de4d3daf8a43ff00a5de8edc217154853317d53 25 September 2014, 14:45:35 UTC
0a4ad91 nagios_common: Move ldap into module Change-Id: I0babc3be8bed6adc87af1ef6a42e9c7cca94ee95 25 September 2014, 14:45:29 UTC
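
Commit 94f5cdf above fixes a "Duplicate declaration" error that appears when two roles declaring a resource with the same title land on one host. The following is a minimal sketch of that situation and its fix, not the repository's actual code: the define `example_check`, its `description` parameter, and the classes `role_a` and `role_b` are hypothetical stand-ins, assumed here only for illustration.

```puppet
# Hypothetical stand-in for a monitoring define. This is NOT the
# repository's Monitor_service; it only emits a notice so the
# manifest can be applied on its own.
define example_check($description) {
    notify { "check ${title}: ${description}": }
}

# Each (hypothetical) role gives its check a role-specific title.
# If both roles used the bare title 'http', a catalog containing both
# classes would fail to compile with a "Duplicate declaration" error,
# because resource titles must be unique per type within a catalog.
class role_a {
    example_check { 'http_role_a':
        description => 'HTTP check for role A',
    }
}

class role_b {
    example_check { 'http_role_b':
        description => 'HTTP check for role B',
    }
}

# Combining both roles on one host now compiles cleanly.
include role_a
include role_b
```

Prefixing or suffixing the title with something role-specific keeps each declaration unique without changing what the check does.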