https://github.com/wikimedia/operations-puppet

Revision Author Date Message Commit Date
485a697 Comment out uses of researchdb password classes Change-Id: I3576a0989b63cc4b27af84fa32631548b576cbb2 23 January 2015, 21:50:45 UTC
4c98f40 Comment out manual include of misc::statistics::packages::python in geowiki classes I will refactor this to use require_packages when geowiki is merged into its own module Change-Id: Ia7ca1e989a8f4ab04b0f2b9f84876d99ddcf2714 23 January 2015, 21:46:44 UTC
726dbfd Explicit provider for ganglia_new::monitor::service This seems to be an emerging ugly pattern as we migrate to jessie: For services which ship init files for other systems (e.g. /etc/init.d/foo and/or /etc/init/foo) as well as a systemd unit, the OS gives precedence to the systemd unit, but puppet (in the context of the Service definition) tends to believe the non-systemd variant without an explicit "provider => systemd". Thus in this case (prior to this patch) ganglia-monitor is actually running as a systemd service and functioning correctly there, but puppet keeps failing at status checks and then failing to start it (when it's already running), because it's explicitly invoking /etc/init.d/ganglia-monitor Change-Id: I748ea8ee28e9cde7ea1c2551f98485de1fe1913a 23 January 2015, 21:44:48 UTC
057aecf Apply new statistics::compute class to stat1003 Change-Id: Icecf9ba5813ac40772ced48627bd56374c621268 23 January 2015, 21:38:40 UTC
7d0f945 mediawiki: define pcre_cache_type for canary appservers As the new HHVM package we're installing has the new PCRE engine that needs to be configured, add this configuration there. Change-Id: If4700e5f160582504b9a1a3c8e05e289fbe75447 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 23 January 2015, 21:30:19 UTC
413ee9b Duplicate variable definition during migration to stat module Change-Id: I903a2aac6941cf0e3a9097c846999f54dc074b43 23 January 2015, 21:22:11 UTC
3522723 Temporarily use local variables in misc statistics classes to satisfy existing class references Change-Id: I130a955ca83917ef0cbf56ff20e6f76a0c56c6e5 23 January 2015, 21:17:10 UTC
5991c01 Just make misc::statistics::base depend on ::statistics Change-Id: Id1d77298021cb94f2bb0137ea9102fb5a71afe72 23 January 2015, 21:11:54 UTC
7b8ce19 Make misc::statistics::user and misc::statistics::base include module classes This is to fix duplicate resource definitions. T87450 Change-Id: I95d004f7c8fe64a9a78f2d786722c4465bfd7da1 23 January 2015, 21:07:37 UTC
f4b51bb Merge "cleaning out holmium entries" into production 23 January 2015, 21:04:08 UTC
2106b5f cleaning out holmium entries death to the old blog server T84263 Change-Id: I549ec19181d35e63bdcd90687d66ecf39273341f 23 January 2015, 21:03:11 UTC
79ff75d Globally qualify statistics module include Change-Id: I3930844e410ccdae95341b3d686582fb324012bb 23 January 2015, 21:03:01 UTC
691c03a Fix typo in class name include Change-Id: I2770fece69f84112e67ea59acb3ce40d673d97b9 23 January 2015, 21:01:19 UTC
2d1fa59 First commit in a multi-commit effort to move misc/statistics.pp into modules/ T87450 Change-Id: I49140d85ddea99f5d4d9a3c71e60cf7fa57d49b6 23 January 2015, 20:19:43 UTC
6378084 use /run/ directly for nginx pidfile this fixes "reload" under systemd w/ jessie, and also works for current precise/trusty because they all have /var/run -> /run anyways. Change-Id: Ia6fb7ba1da72f5cae87b825a3befd7341c37673d 23 January 2015, 19:25:16 UTC
c86c2ab snapshot: unify node declarations, use role, hiera See T86774 Change-Id: If95c91749d6290ba9dc2e44251a6344f1641c687 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 23 January 2015, 17:41:59 UTC
020892f mediawiki: Introduce feature flag to enable/disable lvs Since labs doesn't support LVS yet Bug: T87210 Change-Id: I0b84083645ae14b4479e1b5ed094b666b76ac43d 22 January 2015, 21:26:08 UTC
06408a2 mediawiki: Load prod/beta apache configuration based on realm Is a hack, should be replaced with something nicer for managing the different apache configuration Bug: T87210 Change-Id: Iff85982d7a3f887933e928e770b089295194a01b 22 January 2015, 21:10:12 UTC
1451dc5 Merge "let bastion hosts have base::firewall" into production 21 January 2015, 23:56:49 UTC
e7e3a75 keyholder: ensure ssh-agent-proxy transmits whole messages Periodically, deployers will see the following error on tin: "Error reading response length from authentication socket." These errors started appearing with the introduction of keyholder. I suspect it happens when the recv() call on the socket that is connected to the actual ssh-agent reads less than a full message. So instead of copying bytes from the proxy to the client as they come, only send complete messages. Bug: T86545 Change-Id: I68b5e1c2f1d80ef7585e97d8820513ee062968be 21 January 2015, 02:30:54 UTC
1d47e5e Merge "Scrap the /var/log lvm partition; make the / 20G by default." into production 21 January 2015, 00:44:05 UTC
4ae028a Scrap the /var/log lvm partition; make the / 20G by default. Bug: T87003 Change-Id: Idaeafb72aa367aea0abdcec1fb9116717e23a3e2 21 January 2015, 00:42:39 UTC
4333cf4 varnish systemd unit file: iterate on fixups #4 Change-Id: I233819b0d0fa004b39cbd19deb5fa866d950d35b 20 January 2015, 23:54:01 UTC
e21cd85 varnish systemd unit file: iterate on fixups #3 Change-Id: I0126e8d99679b467021171d641da9b8fedc774b6 20 January 2015, 23:50:39 UTC
9f97e3c varnish systemd unit file: iterate on fixups #2 Change-Id: I618088f129e1babc3ee320dfb30f98b8fd3923dd 20 January 2015, 23:27:54 UTC
a57ede4 varnish systemd unit file: iterate on fixups Change-Id: I94f13d91a008fc3c2f490f17ccefe42032ab2373 20 January 2015, 23:20:09 UTC
943bdc8 Bump alert thresholds for EventLogging's overall events/s Since EventLogging volume recently outgrew the 350 events/s and EventLogging is known to be able to handle more events/s, we bump the threshold to avoid getting unneeded Icinga warnings for EventLogging. The 450 events/s threshold is arbitrary, but EventLogging is known to currently handle 450 events/s amount of traffic, and that threshold will silence the false alarms for now. The Analytics team has to come up with more realistic thresholds (T86244). Change-Id: I2312d62fff1ad851640c2f2fced646478833b7a4 20 January 2015, 22:02:04 UTC
fd52585 varnish systemd service stuff Change-Id: I459ea096e260b49037bf934ff9d63ef7372c7120 20 January 2015, 21:24:19 UTC
24e3a61 contint: Don't include base firewall by default Applying this class by itself should not cause the inclusion of base::firewall - only the opening up of a firewall hole *if* a firewall already exists. This will remove the base firewall from most beta instances, and we can enable them on a case by case basis if required Change-Id: I94d09b49726297e451777655b5acc614527daa5f 20 January 2015, 08:11:16 UTC
62971fb Update servermon settings Update RSpecs, Rakefile and update the settings and stop managing urls.py Change-Id: I4f1b41d18c454d0226b8bd3bc70af1c1175cbc6b 20 January 2015, 02:46:12 UTC
2cd2202 Update servermon's service_name So that code deployments actually restart the service Change-Id: I5d24c9e9eb9b9f1c951dbafbadec3c74ede3d797 20 January 2015, 02:46:12 UTC
2356ee6 update role::ganglia::config $data_sources ganglia::collector::config $data_sources seems to be the same, or overridden, or... something. Change-Id: I743c5f80c455f91e3e209142dcac4af6a375865e 20 January 2015, 02:27:33 UTC
55e5777 mediawiki: Explicitly open port 80 on mediawiki webservers No-op in prod since base::firewall isn't applied Change-Id: I2df3ae0f53e2c7913539938045bfcea1854d3c0f 20 January 2015, 02:17:25 UTC
2f32217 beta: Kill beta specific mediawiki logging role Bug: T87210 Change-Id: I5aa9392c2c1f9afd6c6625bbccde70c035eb50be 20 January 2015, 02:07:25 UTC
816acdc beta: Kill beta specific jobrunner class Hiera data appropriately set! Bug: T87210 Change-Id: Ibaebde2b9206ae5e7b606bede3e004d6da70fef7 20 January 2015, 02:07:25 UTC
51ee252 beta: Kill videoscaler role Bug: T87210 Change-Id: Ie1f80509d7022c1017d1758108ddceafa14d392e 20 January 2015, 02:07:25 UTC
bb1c8e3 mediawiki: Move jobrunner config into hiera Shared across jobrunner and videoscaler machines Bug: T87210 Change-Id: I21396d68e91344d3f269a75667dc5e1cfcdc7dfe 20 January 2015, 02:03:48 UTC
75d6dcb mediawiki: Fix incredibly annnoyyying stray space Change-Id: I78bcc5bb79067a2d9ec41289f536257569bca586 20 January 2015, 01:41:07 UTC
457d585 redis: use role, hiera See T86774 Change-Id: Ice9de3907a7d2cd90d95c1bddffbdbfbed24d1cd Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 20 January 2015, 01:33:36 UTC
633df0b memcached: use role, hiera See T86774 Change-Id: I1f2aa1fb0320be4e89dc4b869b0786e8fde96dfc Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 20 January 2015, 01:11:20 UTC
a93f85c Tools: Install valgrind on bastions Bug: T87117 Change-Id: I680bdd8a04b6800ed105e27f5a05a2981c63a56a 20 January 2015, 01:03:53 UTC
6d17f31 correct mysql ganglia aggregators for eqiad T87209 Change-Id: I502667045e0ba3723d70f23dadc607b9d668ce51 20 January 2015, 00:29:55 UTC
f11eee6 deploy dbstore2001 dbstore2002 Change-Id: Ib7ff6916b58831969b65566f7a1981bd4a8ba2d8 20 January 2015, 00:20:52 UTC
e6095a2 lvs: move the hiera file to the correct location Change-Id: Ifa0f5e4ed56a90c58c3676352ed5a2b815b9c6eb 20 January 2015, 00:06:10 UTC
2681cdc lvs: use role, hiera See T86774 Change-Id: I49a77b24d6e747460ad35b547c2c00a70ed99836 19 January 2015, 23:54:15 UTC
54deab4 beta: Clean up remnants of older apache-config setup Comes from modules/mediawiki/files/apache/beta now Change-Id: Iab695a98b37a7270da97dd722e1f17e59c75b02e 19 January 2015, 23:01:10 UTC
b2a7747 beta: Kill fatal_monitor.rb script Hasn't worked for a long time (HHVM has no fatal.log) Change-Id: Ied5c098bfb5c1ceac6936c0ea4e67eee3c4b3705 19 January 2015, 22:56:09 UTC
193f8a3 restbase: use role, hiera As per T86774 Change-Id: Idff78403ac860a7ae8b8a48d9dd418fed9b4e26a Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 19 January 2015, 22:27:42 UTC
ab526aa logstash: Experimental logstash irc logging on production logstash SAL is terrible. Let's attempt to replace it! Only on deployment-prep logstash to begin with, because prod logstash is incredibly overloaded atm Change-Id: Ic2f32e5e34d438b83506b60deda428001b18330e 19 January 2015, 22:17:44 UTC
188dc37 Followup commit to 0333eab Fix a missing s typo Change-Id: I6da41d169e3535f895f354d0450d0b8dd0761f80 19 January 2015, 21:02:50 UTC
0333eab monitoring::service: mimic monitoring::host group handling We ended up having different ways of handling servicegroups and hostgroups in icinga. Be consistent and handle the $group parameter the same way Change-Id: I9e8f354ec68339433ae1efd6f5f45dd05cfc67d9 19 January 2015, 20:52:49 UTC
153985a parsoid: use hiera, role This commit cleans up site.pp and moves most class parameters to hiera, for the rationale see T86774 Change-Id: I2c4db8cad5f27e586eaa89b700618aa153cee141 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 19 January 2015, 20:10:54 UTC
8b5e4cc parsoid: Include base::firewall on parsoid hosts Beta has base::firewall included on all *oid hosts (see T86951) and sca* has base::firewall included for all *oid services (see I13f8e75dd0ba61671bf9f8acd075333c497b4435). This adds base::firewall to parsoid hosts as well so that everything is consistent (prod vs beta as well as *oid hosts). A ferm rule for parsoid's port has been added in Ia312a73d1ab329a22aae26ee851ed584363017b3 Bug: T87105 Change-Id: I5d32c8f3c60d4903d58e850e3507fffb959e4245 19 January 2015, 17:34:37 UTC
037b766 Do not delete the Wikidata dump we just created. Nor the recent ones. Instead delete the older ones. Change-Id: I05634acbe759427399e99d77cf2b9e47215363df 19 January 2015, 16:38:19 UTC
dc23197 shinken: Check wikitech on https rather than http Since http just redirects to https Change-Id: Icfed53d2bb5082037f3613fb2578e74eecc74d8d 19 January 2015, 10:33:25 UTC
fa29bab shinken: Add wikitech check for labs infra Just a basic check to see if main page is up Change-Id: Ie52016c85c0c3a1a887562250368ad854af6c42f 19 January 2015, 00:54:32 UTC
58f34e3 upgrade db1054 to trusty and mariadb 10 Change-Id: I54040963731614a56ae859ca73ad1607876ff0cb 18 January 2015, 20:26:09 UTC
c9e66a4 deployment: Open up redis port on deployment masters base::firewall is applied on deployment-bastion, so things can't seem to deploy in beta because the redis port isn't accessible. Adding in prod too since it is a good practice, even if prod machines don't have base::firewall yet. In a glorious future, these two roles would be unified Change-Id: I617b780a9cc7c5859e93b2b1d506909582106e62 17 January 2015, 09:52:02 UTC
4f154e7 ocg: Don't include admin class in labs Admin doesn't work on labs yet, and should be guarded. Change-Id: Iba680504cb073c2a682a6d268bcba702e6620520 17 January 2015, 09:40:08 UTC
26a4757 mediawiki: Don't include ::admin in labs Is used in jobrunners on beta, causing puppet failures Change-Id: I3d2efe68b7faaf4eb62962c23922690a42dcbf54 17 January 2015, 09:29:27 UTC
fb72fb7 Add IP mapping for toolserver.org Bug: T87086 Change-Id: Ideb7afc3c46a350928decef13ef240365f23491f 17 January 2015, 06:05:09 UTC
e6f720b beta: Monitor cxserver, mathoid, citoid and apertium services Bug: T87087 Change-Id: I661de431dc9c0c9728f560a455088620619e07d1 17 January 2015, 05:51:36 UTC
bea8b80 beta: Remove dup of /home/mwdeploy/.ssh This is handled by mediawiki::users now. Change-Id: I8dfe51fb3654d126e18837959e295b0511d75703 17 January 2015, 05:16:12 UTC
37fb792 parsoid: Open port 8000 with ferm rule Since https://gerrit.wikimedia.org/r/185429 sca* hosts have base::firewall enabled. BetaLabs hosts also have it enabled (via role::ci::slave::labs::common) - for both sca* services and parsoid. Parsoid in production *doesn't* have base::firewall enabled, but we perhaps should at some point to unify parsoid with other *oids (or remove them from everywhere). For now, add a ferm rule to open up port 8000 so betalabs parsoid works fine. Bug: T86951 Change-Id: Ia312a73d1ab329a22aae26ee851ed584363017b3 17 January 2015, 05:13:57 UTC
94fdf1b beta: Add shinken monitoring for parsoid Bug: T87063 Change-Id: I30a7c02ba0044db74f1e3926b3f27b4ab5c49e48 17 January 2015, 04:45:42 UTC
eab16cd admin: update my deployment script Change-Id: I1ceabd8612155ac79147aeba01d6f69071cf20d8 17 January 2015, 02:03:01 UTC
87dcb09 Temporarily override DNS CNAME entries for hadoop masters This is a sorta emergency effort to keep jobs from failing. Change-Id: Ide6834fd670e45b6b3e9ae9b9b0dd9e9d0c9ed8c 16 January 2015, 21:59:26 UTC
31ce406 Add subversion to Phabricator Change-Id: If2d67e501f1c43a675785d263cc91ca0f4c6ce1a 16 January 2015, 20:23:28 UTC
533404d setting rbf2001/2002 base install params setting mac info for dhcp and partition info for installer T86897 Change-Id: I2e6a2e20251048124986499762a93e12b5023fd3 16 January 2015, 18:44:59 UTC
5e66eab Add my .bashrc Basically the default but with my PS1 instead and some aliases I never use removed Change-Id: I34942b9ace3fe09118ace5a01ac225b8fce52c1a 16 January 2015, 18:39:04 UTC
c3cc82a amssq42 -> jessie Change-Id: I46fb8b82241bbc801aa58e6b48c59274a4a3510a 16 January 2015, 17:54:46 UTC
fcc2134 disable amssq42 esams text cache backend Change-Id: I83f9287971b17c62012be5e389875bb93ed66e5c 16 January 2015, 17:53:58 UTC
8f2a2c1 Removing hadoop::standby role from analytics1004 Change-Id: Ia482ac1f33231262da60685aadd93b70d661933f 16 January 2015, 17:51:00 UTC
2465d3a analytics1001 and analytics1002 are now the hadoop namenodes analytics1001 is the hadoop master Change-Id: Ib80100304686b2cdda6832e9f13b0056409b36d7 16 January 2015, 16:21:33 UTC
fabb452 Revert "VCL: Use header.append() in more places." This reverts commit bdb9186b9fa3281b7a9b23729fcc77adffa29dc1. header.append() does not do what it seems to do! Change-Id: I8b2c4d07cfd4f39fe19f6c45003da82ba3ebf076 16 January 2015, 15:50:41 UTC
4317cc5 Prep for migrating Hadoop namenodes to analytics1001 and analytics1002 Change-Id: Ic6abb379a88f0ac9e3b60e10d60b8b0d493b8040 16 January 2015, 15:48:40 UTC
dbfb1a4 Followup commit for c914851 Adding the role:: prefix Change-Id: Iec80c59608b3a88ed182dcbcb4eaf84dfbdd3338 16 January 2015, 14:21:24 UTC
c914851 Setup holmium as a backup::host To take one last backup of the blog host Change-Id: I836ee98e8def19705da9f1a0206452a907d2d4dc 16 January 2015, 13:41:26 UTC
9328237 Merge "*oid: Remove useless ferm declarations" into production 16 January 2015, 13:20:22 UTC
de961e8 Merge "monitoring: allow host to check based on the fqdn of a host" into production 16 January 2015, 13:19:23 UTC
6fa3e5c *oid: Remove useless ferm declarations While investigating T86143 several things were discovered: - Beta had 'natfix' applied earlier, which included base::firewall, which had a default DROP policy - To work around this, individual ports for the *oids were opened up via ferm rules for ::beta roles. During unification + some confusion these rules ended up on prod too, where they were noops due to base::firewall not being included. At least for Service Cluster A services (mathoid, citoid, cxserver) this was fixed in I13f8e75dd0ba61 - This 'natfix' was a kludge that was removed in I6be9ab57b9ac92973b5568c8db522ae97537551e. This also removed base::firewall - However the residual effects of applying base::firewall were still in place even though the class was no longer applied. - The residual effects were removed manually via salt, to fix T86143 - That means that some of the ferm rules are no longer required! Change-Id: I4e5f91eceba3d4894430ba5fbdb9f3945b99d2de 16 January 2015, 13:13:58 UTC
f59dfc5 Add base::firewall to Service Cluster A It should be done anyway at some point, do it now that it does not really have any traffic. Kill some comments in the various roles used in Service Cluster A that no longer make sense. Change-Id: I13f8e75dd0ba61671bf9f8acd075333c497b4435 16 January 2015, 12:53:29 UTC
1e1c382 monitoring: allow host to check based on the fqdn of a host Given the 'address' directive in a nagios host definition can take either an ip address or a fqdn, allow monitoring::host to use the fqdn whenever it is explicitly provided (and still defaulting to the current host's $::ipaddress in any other cases). We also fix the google host check not to use a hardcoded ip address. Change-Id: I606901663e90bb1c731a3f0aa2bcdd2c31fd352f Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 16 January 2015, 12:18:04 UTC
967aaf4 shinken: Add ssh checks for all monitored hosts Bug: T86027 Change-Id: I3545247abdc0ac8278ae9fe298cc1397d7bed40a 16 January 2015, 12:00:27 UTC
4a1c9aa require at least one haproxy proxy, but allow multiple. When haproxy is gracefully reloaded, a persistent DB connection like eventlogging will keep the old haproxy process alive for its lifetime, though only the new process will proxy new tcp connections. Using a maximum range for check_procs just causes icinga noise. Change-Id: I73319bb3f3789a8680f17c9fa261596a48d5137f 16 January 2015, 11:03:30 UTC
3e89cbc mediawiki: retab of the virtualhost for search.w.o Change-Id: Ic48b5ab7b0dbe86b56b5bb98d0e36648b0fd0ea7 Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 16 January 2015, 10:24:49 UTC
d33b0a3 beta: Ensure that mw related users are present in scap targets This is just what is left of I1e8a28d576cd67348625125238cda47628fa476a after the other issues have been fixed. In a glorious future, the beta module would not exist. Bug: T67591 Change-Id: I4089b70b1fea56f2f294d02788669126b17595fc 16 January 2015, 09:52:33 UTC
cc9cd92 mailman: move into a new, separate module Move all the mailman manifests & files into a separate module. This is fairly Wikimedia-specific by design. There are some grey lines between this and the role class that could probably be improved in a later patchset. Change-Id: I38660ba8533c47caf823a77890dc66b38c870957 16 January 2015, 09:18:06 UTC
caf6b2b Merge "admin: Add dotfiles for kartik" into production 16 January 2015, 09:13:18 UTC
158364f Merge "ldap: cleanup unused role classes" into production 16 January 2015, 09:08:42 UTC
1f565b3 Merge "Remove decom'ed server "sanger" from site.pp" into production 16 January 2015, 09:08:41 UTC
3d6b508 ldap: cleanup unused role classes The following classes are unreferenced/dead code: - ldap::role::client::corp - ldap::client::wmf-test-cluster - ldap::role::server::production - ldap::role::server::corp Remove them. Change-Id: If04faa38363f480514ab2a7a0b1b3f411f7e9f84 16 January 2015, 09:03:16 UTC
767b730 Remove decom'ed server "sanger" from site.pp It was a pmtpa host, shouldn't have been left here in the first place. Change-Id: I22a36542cb66e9977eae7b76b8a8f101d60df5b0 16 January 2015, 09:03:16 UTC
2c01f02 Merge "cxserver: Move comment to correct place" into production 16 January 2015, 09:02:49 UTC
32bbcdd admin: Add dotfiles for kartik Change-Id: I47d59a51d14eec17acc4476a386546647844716b 16 January 2015, 09:02:04 UTC
da0188a cxserver: Move comment to correct place Change-Id: Ieb539fde22223f5940fb5b86705ffc33404b0412 16 January 2015, 08:59:26 UTC
0bf2a32 mediawiki: use HHVM for the apple search dictionary Change-Id: I101bb5056b19f191c5ca3cf99e67b261848432cd Signed-off-by: Giuseppe Lavagetto <glavagetto@wikimedia.org> 16 January 2015, 08:21:25 UTC
0f84dda Merge "Exclude most udp2log messages from logstash" into production 16 January 2015, 06:32:51 UTC
ae69ead db1020 is primary Change-Id: I6ccbb22816297caba4e8b920b82ceb0d82766a9a 16 January 2015, 03:15:48 UTC
cdfdda2 etherpad: add Varnish misc config Add etherpad.wikimedia.org to misc Varnish config to set backend to zirconium. Bug: T85788 Change-Id: Ied6b8fa8f08c022407ee7cefb9ef93d6774c3070 16 January 2015, 00:03:04 UTC
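
The keyholder change e7e3a75 above describes forwarding only complete messages instead of copying bytes as they arrive. A minimal sketch of that idea follows. It is not the keyholder code itself: it assumes the ssh-agent wire framing of a 4-byte big-endian length prefix followed by the payload, and the function names (recv_exact, recv_message, forward_message) are hypothetical, chosen only for illustration.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from sock; return None if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        # recv() may return fewer bytes than requested, so keep looping.
        chunk = sock.recv(remaining)
        if not chunk:
            return None
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one length-prefixed message (assumed framing: uint32 length + body)."""
    header = recv_exact(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    body = recv_exact(sock, length)
    if body is None:
        return None
    return header + body


def forward_message(src, dst):
    """Relay one whole message from src to dst; return False on a closed peer."""
    message = recv_message(src)
    if message is None:
        return False
    dst.sendall(message)
    return True
```

A proxy built this way would call forward_message once per request and once per response, so the client never sees a partial message even when a single recv() on the agent socket returns less than the full payload.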