https://github.com/postgres/postgres

sort by:
Revision Author Date Message Commit Date
8c894c5 Stamp 9.5.9. 28 August 2017, 21:24:28 UTC
e13c30c Doc: adjust release-note credit for parallel pg_restore fix. Discussion: https://postgr.es/m/CAFcNs+pJ6_Ud-zg3vY_Y0mzfESdM34Humt8avKrAKq_H+v18Cg@mail.gmail.com 28 August 2017, 15:40:48 UTC
dbe1736 Translation updates Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: bb30ba75db8403a9ce4fb8ba6b7c3fe42ac4069e 28 August 2017, 14:15:52 UTC
0601662 Release notes for 9.6.5, 9.5.9, 9.4.14, 9.3.19, 9.2.23. 27 August 2017, 21:35:04 UTC
6bd9c1b Fix outdated comment Author: Thomas Munro <thomas.munro@enterprisedb.com> 23 August 2017, 18:20:34 UTC
6bf4dca Fix translation marker This was erroneously removed in 55a70a023c3daefca9bbd68bfbe6862af10ab479. 23 August 2017, 13:59:00 UTC
d778a77 Backpatch introduction of TupleDescAttr(tupdesc, i). 2cd70845240 / c6293249d change the way individual attributes in a TupleDesc are stored / accessed. To reduce the effort of making extensions compatible with postgresql 11, and to ease future backpatching, backpatch introduction of TupleDescAttr() to all releases. Do not backpatch change in storage, as that'd be a breaking change for existing and working extensions. Author: Andres Freund Discussion: https://postgr.es/m/20170820181723.tdswdinzptbcwhrr@alap3.anarazel.de Backpatch: 9.2- 22 August 2017, 14:47:46 UTC
258aac0 Fix possible core dump in parallel restore when using a TOC list. Commit 3eb9a5e7c unintentionally introduced an ordering dependency into restore_toc_entries_prefork(). The existing coding of reduce_dependencies() contains a check to skip moving a TOC entry to the ready_list if it wasn't initially in the pending_list. This used to suffice to prevent reduce_dependencies() from trying to move anything into the ready_list during restore_toc_entries_prefork(), because the pending_list stayed empty throughout that phase; but it no longer does. The problem doesn't manifest unless the TOC has been reordered by SortTocFromFile, which is how I missed it in testing. To fix, just add a test for ready_list == NULL, converting the call with NULL from a poor man's sanity check into an explicit command not to touch TOC items' list membership. Clarify some of the comments around this; in particular, note the primary purpose of the check for pending_list membership, which is to ensure that we can't try to restore the same item twice, in case a TOC list forces it to be restored before its dependency count goes to zero. Per report from Fabrízio de Royes Mello. Back-patch to 9.3, like the previous commit. Discussion: https://postgr.es/m/CAFcNs+pjuv0JL_x4+=71TPUPjdLHOXA4YfT32myj_OrrZb4ohA@mail.gmail.com 19 August 2017, 17:39:38 UTC
bff216d Further tweaks to compiler flags for PL/Perl on Windows. It now emerges that we can only rely on Perl to tell us we must use -D_USE_32BIT_TIME_T if it's Perl 5.13.4 or later. For older versions, revert to our previous practice of assuming we need that symbol in all 32-bit Windows builds. This is not ideal, but inquiring into which compiler version Perl was built with seems far too fragile. In any case, we had not previously had complaints about these old Perl versions, so let's assume this is Good Enough. (It's still better than the situation ante commit 5a5c2feca, in that at least the effects are confined to PL/Perl rather than the whole PG build.) Back-patch to all supported versions, like 5a5c2feca and predecessors. Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com 17 August 2017, 17:14:11 UTC
f2ed2d0 Changed ecpg parser to allow RETURNING clauses without attached C variables. 16 August 2017, 11:28:37 UTC
eff9ef7 Initialize replication_slot_catalog_xmin in procarray Although not confirmed and probably rare, if the newly allocated memory is not already zero, this could possibly have caused some problems. Also reorder the initializations slightly so they match the order of the struct definition. Author: Wong, Yi Wen <yiwong@amazon.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> 16 August 2017, 01:06:22 UTC
dbeefe6 Include foreign tables in information_schema.table_privileges This appears to have been an omission in the original commit 0d692a0dc9f. All related information_schema views already include foreign tables. Reported-by: Nicolas Thauvin <nicolas.thauvin@dalibo.com> 15 August 2017, 23:32:00 UTC
51684ba Handle elog(FATAL) during ROLLBACK more robustly. Stress testing by Andreas Seltenreich disclosed longstanding problems that occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are trying to execute a ROLLBACK of an already-failed transaction. In such a case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction would skip AbortTransaction and go straight to CleanupTransaction. This led to an assert failure in an assert-enabled build (due to the ROLLBACK's portal still having a cleanup hook) or without assertions, to a FATAL exit complaining about "cannot drop active portal". The latter's not disastrous, perhaps, but it's messy enough to want to improve it. We don't really want to run all of AbortTransaction in this code path. The minimum required to clean up the open portal safely is to do AtAbort_Memory and AtAbort_Portals. It seems like a good idea to do AtAbort_Memory unconditionally, to be entirely sure that we are starting with a safe CurrentMemoryContext. That means that if the main loop in AbortOutOfAnyTransaction does nothing, we need an extra step at the bottom to restore CurrentMemoryContext = TopMemoryContext, which I chose to do by invoking AtCleanup_Memory. This'll result in calling AtCleanup_Memory twice in many of the paths through this function, but that seems harmless and reasonably inexpensive. The original motivation for the assertion in AtCleanup_Portals was that we wanted to be sure that any user-defined code executed as a consequence of the cleanup hook runs during AbortTransaction not CleanupTransaction. That still seems like a valid concern, and now that we've seen one case of the assertion firing --- which means that exactly that would have happened in a production build --- let's replace the Assert with a runtime check. If we see the cleanup hook still set, we'll emit a WARNING and just drop the hook unexecuted. This has been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu 14 August 2017, 19:43:20 UTC
1621a75 Absorb -D_USE_32BIT_TIME_T switch from Perl, if relevant. Commit 3c163a7fc's original choice to ignore all #define symbols whose names begin with underscore turns out to be too simplistic. On Windows, some Perl installations are built with -D_USE_32BIT_TIME_T, and we must absorb that or we get the wrong result for sizeof(PerlInterpreter). This effectively re-reverts commit ef58b87df, which injected that symbol in a hacky way, making it apply to all of Postgres not just PL/Perl. More significantly, it did so on *all* 32-bit Windows builds, even when the Perl build to be used did not select this option; so that it fails to work properly with some newer Perl builds. By making this change, we would be introducing an ABI break in 32-bit Windows builds; but fortunately we have not used type time_t in any exported Postgres APIs in a long time. So it should be OK, both for PL/Perl itself and for third-party extensions, if an extension library is built with a different _USE_32BIT_TIME_T setting than the core code. Patch by me, based on research by Ashutosh Sharma and Robert Haas. Back-patch to all supported branches, as commit 3c163a7fc was. Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com 14 August 2017, 15:48:59 UTC
425be3a Remove AtEOXact_CatCache(). The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com 13 August 2017, 20:15:14 UTC
d1c1d90 Fix handling of container types in find_composite_type_dependencies. find_composite_type_dependencies correctly found columns that are of the specified type, and columns that are of arrays of that type, but not columns that are domains or ranges over the given type, its array type, etc. The most general way to handle this seems to be to assume that any type that is directly dependent on the specified type can be treated as a container type, and processed recursively (allowing us to handle nested cases such as ranges over domains over arrays ...). Since a type's array type already has such a dependency, we can drop the existing special case for the array type. The very similar logic in get_rels_with_domain was likewise a few bricks shy of a load, as it supposed that a directly dependent type could *only* be a sub-domain. This is already wrong for ranges over domains, and it'll someday be wrong for arrays over domains. Add test cases illustrating the problems, and back-patch to all supported branches. Discussion: https://postgr.es/m/15268.1502309024@sss.pgh.pa.us 09 August 2017, 21:03:09 UTC
a784d5f Prevent passing down MAKELEVEL/MAKEFLAGS from non-GNU make to GNU make. FreeBSD's make, for one, sets the MAKELEVEL environment variable when invoking commands. In the special Makefile we provide to hand off control from a non-GNU make to GNU make, this causes GNU make to think it is a child make invocation rather than top-level. That interferes with the hack added in commit dcae5facc to cause the temp-install tree to be made only by the top-level invocation of gmake. Unset the variable to prevent that. Likewise unset MAKEFLAGS, which FreeBSD's make also sets, and which could easily confuse gmake. There are no reports of actual trouble from that, but it seems better to be proactive. Back-patch to 9.5 where dcae5facc came in. Thomas Munro, hacked a bit more by me Discussion: https://postgr.es/m/CAEepm=1ueww35AXTkt1A3gyzZUqv5XCzh8RUNvJZAQAW=eOhVw@mail.gmail.com 09 August 2017, 16:06:14 UTC
9cc510f Reword some unclear comments 08 August 2017, 22:48:25 UTC
029386c Stamp 9.5.8. 07 August 2017, 21:13:41 UTC
caada7c Translation updates Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: d316c7f205275603a833ab9758ce51a76846ec58 07 August 2017, 17:43:43 UTC
16f4297 Last-minute updates for release notes. Security: CVE-2017-7546, CVE-2017-7547, CVE-2017-7548 07 August 2017, 15:46:20 UTC
873741c Require update permission for the large object written by lo_put(). lo_put() surely should require UPDATE permission, the same as lowrite(), but it failed to check for that, as reported by Chapman Flack. Oversight in commit c50b7c09d; backpatch to 9.4 where that was introduced. Tom Lane and Michael Paquier Security: CVE-2017-7548 07 August 2017, 14:19:21 UTC
36f9f60 Again match pg_user_mappings to information_schema.user_mapping_options. Commit 3eefc51053f250837c3115c12f8119d16881a2d7 claimed to make pg_user_mappings enforce the qualifications user_mapping_options had been enforcing, but its removal of a longstanding restriction left them distinct when the current user is the subject of a mapping yet has no server privileges. user_mapping_options emits no rows for such a mapping, but pg_user_mappings includes full umoptions. Change pg_user_mappings to show null for umoptions. Back-patch to 9.2, like the above commit. Reviewed by Tom Lane. Reported by Jeff Janes. Security: CVE-2017-7547 07 August 2017, 14:09:31 UTC
127835d Don't allow logging in with empty password. Some authentication methods allowed it, others did not. In the client-side, libpq does not even try to authenticate with an empty password, which makes using empty passwords hazardous: an administrator might think that an account with an empty password cannot be used to log in, because psql doesn't allow it, and not realize that a different client would in fact allow it. To clear that confusion and to be be consistent, disallow empty passwords in all authentication methods. All the authentication methods that used plaintext authentication over the wire, except for BSD authentication, already checked that the password received from the user was not empty. To avoid forgetting it in the future again, move the check to the recv_password_packet function. That only forbids using an empty password with plaintext authentication, however. MD5 and SCRAM need a different fix: * In stable branches, check that the MD5 hash stored for the user does not not correspond to an empty string. This adds some overhead to MD5 authentication, because the server needs to compute an extra MD5 hash, but it is not noticeable in practice. * In HEAD, modify CREATE and ALTER ROLE to clear the password if an empty string, or a password hash that corresponds to an empty string, is specified. The user-visible behavior is the same as in the stable branches, the user cannot log in, but it seems better to stop the empty password from entering the system in the first place. Secondly, it is fairly expensive to check that a SCRAM hash doesn't correspond to an empty string, because computing a SCRAM hash is much more expensive than an MD5 hash by design, so better avoid doing that on every authentication. We could clear the password on CREATE/ALTER ROLE also in stable branches, but we would still need to check at authentication time, because even if we prevent empty passwords from being stored in pg_authid, there might be existing ones there already. Reported by Jeroen van der Ham, Ben de Graaff and Jelte Fennema. Security: CVE-2017-7546 07 August 2017, 14:04:00 UTC
4f7807b Release notes for 9.6.4, 9.5.8, 9.4.13, 9.3.18, 9.2.22. 06 August 2017, 21:57:06 UTC
869a586 Fix thinko introduced in 2bef06d516460 et al. The callers for GetOldestSafeDecodingTransactionId() all inverted the argument for the argument introduced in 2bef06d516460. Luckily this appears to be inconsequential for the moment, as we wait for concurrent in-progress transaction when assembling a snapshot. Additionally this could only make a difference when adding a second logical slot, because only a pre-existing slot could cause an issue by lowering the returned xid dangerously much. Reported-By: Antonin Houska Discussion: https://postgr.es/m/32704.1496993134@localhost Backport: 9.4-, where 2bef06d516460 was backpatched to. 06 August 2017, 21:21:22 UTC
062291f Add regression test for wide REPLICA IDENTITY FULL updates. This just contains the regression tests added by a fix for a 9.4 specific bug regarding $subject. Author: Andres Freund Backpatch: 9.5- 05 August 2017, 21:44:18 UTC
bebee33 Disallow SSL session tickets. We don't actually support session tickets, since we do not create an SSL session identifier. But it seems that OpenSSL will issue a session ticket on-demand anyway, which will then fail when used. This results in reconnection failures when using ticket-aware client-side SSL libraries (such as the Npgsql .NET driver), as reported by Shay Rojansky. To fix, just tell OpenSSL not to issue tickets. At some point in the far future, we might consider enabling tickets instead. But the security implications of that aren't entirely clear; and besides it would have little benefit except for very short-lived database connections, which is Something We're Bad At anyhow. It would take a lot of other work to get to a point where that would really be an exciting thing to do. While at it, also tell OpenSSL not to use a session cache. This doesn't really do anything, since a backend would never populate the cache anyway, but it might gain some micro-efficiencies and/or reduce security exposures. Patch by me, per discussion with Heikki Linnakangas and Shay Rojansky. Back-patch to all supported versions. Discussion: https://postgr.es/m/CADT4RqBU8N-csyZuzaook-c795dt22Zcwg1aHWB6tfVdAkodZA@mail.gmail.com 04 August 2017, 15:07:10 UTC
f2878a6 Add missing ALTER USER variants ALTER USER ... SET did not support all the syntax variants of ALTER ROLE ... SET. Reported-by: Pavel Golub <pavel@microolap.com> 04 August 2017, 00:55:44 UTC
65048cf Fix pg_dump/pg_restore to emit REFRESH MATERIALIZED VIEW commands last. Because we push all ACL (i.e. GRANT/REVOKE) restore steps to the end, materialized view refreshes were occurring while the permissions on referenced objects were still at defaults. This led to failures if, say, an MV owned by user A reads from a table owned by user B, even if B had granted the necessary privileges to A. We've had multiple complaints about that type of restore failure, most recently from Jordan Gigov. The ideal fix for this would be to start treating ACLs as dependency- sortable objects, rather than hard-wiring anything about their dump order (the existing approach is a messy kluge dating to commit dc0e76ca3). But that's going to be a rather major change, and it certainly wouldn't lead to a back-patchable fix. As a short-term solution, convert the existing two-pass hack (ie, normal objects then ACLs) to a three-pass hack, ie, normal objects then ACLs then matview refreshes. Because this happens in RestoreArchive(), it will also fix the problem when restoring from an existing archive-format dump. (Note this means that if a matview refresh would have failed under the permissions prevailing at dump time, it'll fail during restore as well. We'll define that as user error rather than something we should try to work around.) To avoid performance loss in parallel restore, we need the matview refreshes to still be parallelizable. Hence, clean things up enough so that both ACLs and matviews are handled by the parallel restore infrastructure, instead of reverting back to serial restore for ACLs. There is still a final serial step, but it shouldn't normally have to do anything; it's only there to try to recover if we get stuck due to some problem like unresolved circular dependencies. Patch by me, but it owes something to an earlier attempt by Kevin Grittner. Back-patch to 9.3 where materialized views were introduced. Discussion: https://postgr.es/m/28572.1500912583@sss.pgh.pa.us 03 August 2017, 21:36:42 UTC
12f1e52 Fix build on zlib-less environments Commit 4d57e8381677 added support for getting I/O errors out of zlib, but it introduced a portability problem for systems without zlib. Repair by wrapping the zlib call inside #ifdef and restore the original code in the other branch. This serves to illustrate the inadequacy of the zlib abstraction in pg_backup_archiver: there is no way to call gzerror() in that abstraction. This means that the several places that call GZREAD and GZWRITE are currently doing error reporting wrongly, but ENOTIME to get it fixed before next week's release set. Backpatch to 9.4, like the commit that introduced the problem. 03 August 2017, 18:55:17 UTC
f3142c0 Fix pg_dump's errno checking for zlib I/O Some error reports were reporting strerror(errno), which for some error conditions coming from zlib are wrong, resulting in confusing reports such as pg_restore: [compress_io] could not read from input file: Success which makes no sense. To correctly extract the error message we need to use gzerror(), so let's do that. This isn't as comprehensive or as neat as I would like, but at least it should improve things in many common cases. The zlib abstraction in compress_io does not seem to be applied consistently enough; we could perhaps improve that, but it seems master-only material, not a bug fix for back-patching. This problem goes back all the way, but I decided to apply back to 9.4 only, because older branches don't contain commit 14ea89366 which this change depends on. Authors: Vladimir Kunschikov, Álvaro Herrera Discussion: https://postgr.es/m/1498120508308.9826@infotecs.ru 02 August 2017, 22:26:58 UTC
331bf9d Add pgtcl back to the list of externally-maintained client interfaces. FlightAware is still maintaining this, and indeed is seemingly being more active with it than the pgtclng fork is. List both, for the time being anyway. In the back branches, also back-port commit e20f679f6 and other recent updates to the client-interfaces list. I think these are probably of current interest to users of back branches. I did not touch the list of externally maintained PLs in the back branches, though. Those are much more likely to be server version sensitive, and I don't know which of these PLs work all the way back. Discussion: https://postgr.es/m/20170730162612.1449.58796@wrigleys.postgresql.org 02 August 2017, 20:55:03 UTC
af1f182 Silence warning from modern perl about unescaped braces Back-patch commit 76a1c97bf21c301f61bb41345e0cdd0d365b2086 into the older branches (9.5 and before). This is needed because perl 5.26 and later treats the case as an error not just a warning. Original patch by Andrew Dunstan, need for back-patch noted by Ashutosh Sharma Discussion: https://postgr.es/m/CAE9k0PkfNcmj9pA7Yx4qQ=K=3aY4TuiRhp7KYpayDWm9MYsnjg@mail.gmail.com 02 August 2017, 19:07:20 UTC
ca99f5b Fix comment. XLByteToSeg and XLByteToPrevSeg calculate only a segment number. The definition of these macros were modified by commit dfda6ebaec6763090fb78b458a979b558c50b39b but the comment remain unchanged. Patch by Yugo Nagata. Back patched to 9.3 and beyond. 31 July 2017, 23:08:47 UTC
d6f87a2 Doc: specify that the minimum supported version of Perl is 5.8.3. Previously the docs just said "5.8 or later". Experimentation shows that while you can build on Unix from a git checkout with 5.8.0, compiling recent PL/Perl requires at least 5.8.1, and you won't be able to run the TAP tests with less than 5.8.3 because that's when they added "prove". (I do not have any information on just what the MSVC build scripts require.) Since all these versions are quite ancient, let's not split hairs in the docs, but just say that 5.8.3 is the minimum requirement. Discussion: https://postgr.es/m/16894.1501392088@sss.pgh.pa.us 31 July 2017, 17:42:48 UTC
df52739 PL/Perl portability fix: absorb relevant -D switches from Perl. Back-patch of commit 3c163a7fc76debbbdad1bdd3c43721cffe72f4db, which see for more info. Also throw in commit b4cc35fbb709bd6fcae8998f041fd731c9acbf42, so Coverity doesn't whine about the back branches. Ashutosh Sharma, some adjustments by me Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com 31 July 2017, 16:38:35 UTC
99eea89 PL/Perl portability fix: avoid including XSUB.h in plperl.c. Back-patch of commit bebe174bb4462ef079a1d7eeafb82ff969f160a4, which see for more info. Patch by me, with some help from Ashutosh Sharma Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com 31 July 2017, 16:10:36 UTC
d90d5a1 Add missing comment in postgresql.conf. current_source requires to restart server to reflect the new value. Per Yugo Nagata and Masahiko Sawada. Back patched to 9.2 and beyond. 31 July 2017, 02:28:19 UTC
36bda39 Add missing comment in postgresql.conf. dynamic_shared_memory_type requires to restart server to reflect the new value. Per Yugo Nagata and Masahiko Sawada. Back pached to 9.4 and beyond. 31 July 2017, 02:11:28 UTC
62228ff Fix psql tab completion for CREATE USER MAPPING. After typing CREATE USER M..., it would not fill in MAPPING FOR, even though that was clearly intended behavior. Jeff Janes Discussion: https://postgr.es/m/CAMkU=1wo2iQ6jWnN=egqOb5NxEPn0PpANEtKHr3uPooQ+nYPtw@mail.gmail.com 27 July 2017, 18:13:15 UTC
3095609 Clean up SQL emitted by psql/describe.c. Fix assorted places that had not bothered with the convention of prefixing catalog and function names with "pg_catalog.". That could possibly result in query failure when running with a nondefault search_path. Also fix two places that weren't quoting OID literals. I think the latter hasn't mattered much since about 7.3, but it's still a bad idea to be doing it in 99 places and not in 2 others. Also remove a useless EXISTS sub-select that someone had stuck into describeOneTableDetails' queries for child tables. We just got the OID out of pg_class, so I hardly see how checking that it exists in pg_class was doing anything helpful. In passing, try to improve the emitted formatting of a couple of these queries, though I didn't work really hard on that. And merge unnecessarily duplicative coding in some other places. Much of this was new in HEAD, but some was quite old; back-patch as appropriate. 26 July 2017, 23:36:20 UTC
0d4604a Fix concurrent locking of tuple update chain If several sessions are concurrently locking a tuple update chain with nonconflicting lock modes using an old snapshot, and they all succeed, it may happen that some of them fail because of restarting the loop (due to a concurrent Xmax change) and getting an error in the subsequent pass while trying to obtain a tuple lock that they already have in some tuple version. This can only happen with very high concurrency (where a row is being both updated and FK-checked by multiple transactions concurrently), but it's been observed in the field and can have unpleasant consequences such as an FK check failing to see a tuple that definitely exists: ERROR: insert or update on table "child_table" violates foreign key constraint "fk_constraint_name" DETAIL: Key (keyid)=(123456) is not present in table "parent_table". (where the key is observably present in the table). Discussion: https://postgr.es/m/20170714210011.r25mrff4nxjhmf3g@alvherre.pgsql 26 July 2017, 21:24:16 UTC
5146ca3 Fix race condition in predicate-lock init code in EXEC_BACKEND builds. Trading a little too heavily on letting the code path be the same whether we were creating shared data structures or only attaching to them, InitPredicateLocks() inserted the "scratch" PredicateLockTargetHash entry unconditionally. This is just wrong if we're in a postmaster child, which would only reach this code in EXEC_BACKEND builds. Most of the time, the hash_search(HASH_ENTER) call would simply report that the entry already existed, causing no visible effect since the code did not bother to check for that possibility. However, if this happened while some other backend had transiently removed the "scratch" entry, then that other backend's eventual RestoreScratchTarget would suffer an assert failure; this appears to be the explanation for a recent failure on buildfarm member culicidae. In non-assert builds, there would be no visible consequences there either. But nonetheless this is a pretty bad bug for EXEC_BACKEND builds, for two reasons: 1. Each new backend would perform the hash_search(HASH_ENTER) call without holding any lock that would prevent concurrent access to the PredicateLockTargetHash hash table. This creates a low but certainly nonzero risk of corruption of that hash table. 2. In the event that the race condition occurred, by reinserting the scratch entry too soon, we were defeating the entire purpose of the scratch entry, namely to guarantee that transaction commit could move hash table entries around with no risk of out-of-memory failure. The odds of an actual OOM failure are quite low, but not zero, and if it did happen it would again result in corruption of the hash table. The user-visible symptoms of such corruption are a little hard to predict, but would presumably amount to misbehavior of SERIALIZABLE transactions that'd require a crash or postmaster restart to fix. To fix, just skip the hash insertion if IsUnderPostmaster. I also inserted a bunch of assertions that the expected things happen depending on whether IsUnderPostmaster is true. That might be overkill, since most comparable code in other functions isn't quite that paranoid, but once burnt twice shy. In passing, also move a couple of lines to places where they seemed to make more sense. Diagnosis of problem by Thomas Munro, patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us 24 July 2017, 20:45:46 UTC
f579580 Ensure that pg_get_ruledef()'s output matches pg_get_viewdef()'s. Various cases involving renaming of view columns are handled by having make_viewdef pass down the view's current relation tupledesc to get_query_def, which then takes care to use the column names from the tupledesc for the output column names of the SELECT. For some reason though, we'd missed teaching make_ruledef to do similarly when it is printing an ON SELECT rule, even though this is exactly the same case. The results from pg_get_ruledef would then be different and arguably wrong. In particular, this breaks pre-v10 versions of pg_dump, which in some situations would define views by means of emitting a CREATE RULE ... ON SELECT command. Third-party tools might not be happy either. In passing, clean up some crufty code in make_viewdef; we'd apparently modernized the equivalent code in make_ruledef somewhere along the way, and missed this copy. Per report from Gilles Darold. Back-patch to all supported versions. Discussion: https://postgr.es/m/ec05659a-40ff-4510-fc45-ca9d965d0838@dalibo.com 24 July 2017, 19:16:31 UTC
bef96e5 MSVC: Accept tcl86.lib in addition to tcl86t.lib. ActiveTcl8.6.4.1.299124-win32-x86_64-threaded.exe ships just tcl86.lib. Back-patch to 9.2, like the commit recognizing tcl86t.lib. 24 July 2017, 06:53:41 UTC
7debd9f Fix pg_dump's handling of event triggers. pg_dump with the --clean option failed to emit DROP EVENT TRIGGER commands for event triggers. In a closely related oversight, it also did not emit ALTER OWNER commands for event triggers. Since only superusers can create event triggers, the latter oversight is of little practical consequence ... but if we're going to record an owner for event triggers, then surely pg_dump should preserve it. Per complaint from Greg Atkins. Back-patch to 9.3 where event triggers were introduced. Discussion: https://postgr.es/m/20170722191142.yi4e7tzcg3iacclg@gmail.com 23 July 2017, 00:20:09 UTC
ed367be pg_rewind: Fix some problems when copying files >2GB. When incrementally updating a file larger than 2GB, the old code could either fail outright (if the client asked the server for bytes beyond the 2GB boundary) or fail to copy all the blocks that had actually been modified (if the server reported a file size to the client in excess of 2GB), resulting in data corruption. Generally, such files won't occur anyway, but they might if using a non-default segment size or if there the directory contains stray files unrelated to PostgreSQL. Fix by a more prudent choice of data types. Even with these improvements, this code still uses a mix of different types (off_t, size_t, uint64, int64) to represent file sizes and offsets, not all of which necessarily have the same width or signedness, so further cleanup might be in order here. However, at least now they all have the potential to be 64 bits wide on 64-bit platforms. Kuntal Ghosh and Michael Paquier, with a tweak by me. Discussion: http://postgr.es/m/CAGz5QC+8gbkz=Brp0TgoKNqHWTzonbPtPex80U0O6Uh_bevbaA@mail.gmail.com 22 July 2017, 02:05:07 UTC
90877d2 Stabilize postgres_fdw regression tests. The new test cases added in commit 8bf58c0d9 turn out to have output that can vary depending on the lc_messages setting prevailing on the test server. Hide the remote end's error messages to ensure stable output. This isn't a terribly desirable solution; we'd rather know that the connection failed for the expected reason and not some other one. But there seems little choice for the moment. Per buildfarm. Discussion: https://postgr.es/m/18419.1500658570@sss.pgh.pa.us 21 July 2017, 18:20:43 UTC
c54e4a5 pg_rewind: Fix busted sanity check. As written, the code would only fail the sanity check if none of the columns returned by the server were of the expected type, but we want it to fail if even one column is not of the expected type. Discussion: http://postgr.es/m/CA+TgmoYuY5zW7JEs+1hSS1D=V5K8h1SQuESrq=bMNeo0B71Sfw@mail.gmail.com 21 July 2017, 17:04:18 UTC
72318ea Re-establish postgres_fdw connections after server or user mapping changes. Previously, postgres_fdw would keep on using an existing connection even if the user did ALTER SERVER or ALTER USER MAPPING commands that should affect connection parameters. Teach it to watch for catcache invals on these catalogs and re-establish connections when the relevant catalog entries change. Per bug #14738 from Michal Lis. In passing, clean up some rather crufty decisions in commit ae9bfc5d6 about where fields of ConnCacheEntry should be reset. We now reset all the fields whenever we open a new connection. Kyotaro Horiguchi, reviewed by Ashutosh Bapat and myself. Back-patch to 9.3 where postgres_fdw appeared. Discussion: https://postgr.es/m/20170710113917.7727.10247@wrigleys.postgresql.org 21 July 2017, 16:51:38 UTC
315ca7f Doc: clarify description of degenerate NATURAL joins. Claiming that NATURAL JOIN is equivalent to CROSS JOIN when there are no common column names is only strictly correct if it's an inner join; you can't say e.g. CROSS LEFT JOIN. Better to explain it as meaning JOIN ON TRUE, instead. Per a suggestion from David Johnston. Discussion: https://postgr.es/m/CAKFQuwb+mYszQhDS9f_dqRrk1=Pe-S6D=XMkAXcDf4ykKPmgKQ@mail.gmail.com 20 July 2017, 16:41:37 UTC
c2bbec9 Fix dumping of outer joins with empty qual lists. Normally, a JoinExpr would have empty "quals" only if it came from CROSS JOIN syntax. However, it's possible to get to this state by specifying NATURAL JOIN between two tables with no common column names, and there might be other ways too. The code previously printed no ON clause if "quals" was empty; that's right for CROSS JOIN but syntactically invalid if it's some type of outer join. Fix by printing ON TRUE in that case. This got broken by commit 2ffa740be, which stopped using NATURAL JOIN syntax in ruleutils output due to its brittleness in the face of column renamings. Back-patch to 9.3 where that commit appeared. Per report from Tushar Ahuja. Discussion: https://postgr.es/m/98b283cd-6dda-5d3f-f8ac-87db8c76a3da@enterprisedb.com 20 July 2017, 15:29:36 UTC
d6dfa5f Doc: add missing note about permissions needed to change log_lock_waits. log_lock_waits is PGC_SUSET, but config.sgml lacked the standard boilerplate sentence noting that. Report: https://postgr.es/m/20170719100838.19352.16320@wrigleys.postgresql.org 19 July 2017, 16:58:59 UTC
6b62312 Doc: explain dollar quoting in the intro part of the pl/pgsql chapter. We're throwing people into the guts of the syntax with not much context; let's back up one step and point out that this goes inside a literal in a CREATE FUNCTION command. Per suggestion from Kurt Kartaltepe. Discussion: https://postgr.es/m/CACawnnyWAmH+au8nfZhLiFfWKjXy4d0kY+eZWfcxPRnjVfaa_Q@mail.gmail.com 17 July 2017, 20:43:30 UTC
45c7825 Merge large_object.sql test into largeobject.source. It seems pretty confusing to have tests named both largeobject and large_object. The latter is of very recent vintage (commit ff992c074), so get rid of it in favor of merging into the former. Also, enable the LO comment test that was added by commit 70ad7ed4e, since the later commit added the then-missing pg_upgrade functionality. The large_object.sql test case is almost completely redundant with that, but not quite: it seems like creating a user-defined LO with an OID in the system range might be an interesting case for pg_upgrade, so let's keep it. Like the earlier patch, back-patch to all supported branches. Discussion: https://postgr.es/m/18665.1500306372@sss.pgh.pa.us 17 July 2017, 19:28:17 UTC
f13e48b fix typo 16 July 2017, 16:00:47 UTC
7eb4124 Fix vcregress.pl PROVE_FLAGS bug in commit 93b7d9731f This change didn't adjust the publicly visible taptest function, causing buildfarm failures on bowerbird. Backpatch to 9.4 like previous change. 16 July 2017, 15:27:07 UTC
f20d58a Fix pg_basebackup output to stdout on Windows. When writing a backup to stdout with pg_basebackup on Windows, put stdout to binary mode. Any CR bytes in the output will otherwise be output incorrectly as CR+LF. In the passing, standardize on using "_setmode" instead of "setmode", for the sake of consistency. They both do the same thing, but according to MSDN documentation, setmode is deprecated. Fixes bug #14634, reported by Henry Boehlert. Patch by Haribabu Kommi. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/20170428082818.24366.13134@wrigleys.postgresql.org 14 July 2017, 13:03:12 UTC
3fbebd7 Fix dumping of FUNCTION RTEs that contain non-function-call expressions. The grammar will only accept something syntactically similar to a function call in a function-in-FROM expression. However, there are various ways to input something that ruleutils.c won't deparse that way, potentially leading to a view or rule that fails dump/reload. Fix by inserting a dummy CAST around anything that isn't going to deparse as a function (which is one of the ways to get something like that in there in the first place). In HEAD, also make use of the infrastructure added by this to avoid emitting unnecessary parentheses in CREATE INDEX deparsing. I did not change that in back branches, thinking that people might find it to be unexpected/unnecessary behavioral change. In HEAD, also fix incorrect logic for when to add extra parens to partition key expressions. Somebody apparently thought they could get away with simpler logic than pg_get_indexdef_worker has, but they were wrong --- a counterexample is PARTITION BY LIST ((a[1])). Ignoring the prettyprint flag for partition expressions isn't exactly a nice solution anyway. This has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/10477.1499970459@sss.pgh.pa.us 13 July 2017, 23:24:44 UTC
cbf3e65 Fix race between GetNewTransactionId and GetOldestActiveTransactionId. The race condition goes like this: 1. GetNewTransactionId advances nextXid e.g. from 100 to 101 2. GetOldestActiveTransactionId reads the new nextXid, 101 3. GetOldestActiveTransactionId loops through the proc array. There are no active XIDs there, so it returns 101 as the oldest active XID. 4. GetNewTransactionid stores XID 100 to MyPgXact->xid So, GetOldestActiveTransactionId returned XID 101, even though 100 only just started and is surely still running. This would be hard to hit in practice, and even harder to spot any ill effect if it happens. GetOldestActiveTransactionId is only used when creating a checkpoint in a master server, and the race condition can only happen on an online checkpoint, as there are no backends running during a shutdown checkpoint. The oldestActiveXid value of an online checkpoint is only used when starting up a hot standby server, to determine the starting point where pg_subtrans is initialized from. For the race condition to happen, there must be no other XIDs in the proc array that would hold back the oldest-active XID value, which means that the missed XID must be a top transaction's XID. However, pg_subtrans is not used for top XIDs, so I believe an off-by-one error is in fact inconsequential. Nevertheless, let's fix it, as it's clearly wrong and the fix is simple. This has been wrong ever since hot standby was introduced, so backport to all supported versions. Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi 13 July 2017, 12:48:40 UTC
2346f1c Fix ruleutils.c for domain-over-array cases, too. Further investigation shows that ruleutils isn't quite up to speed either for cases where we have a domain-over-array: it needs to be prepared to look past a CoerceToDomain at the top level of field and element assignments, else it decompiles them incorrectly. Potentially this would result in failure to dump/reload a rule, if it looked like the one in the new test case. (I also added a test for EXPLAIN; that output isn't broken, but clearly we need more test coverage here.) Like commit b1cb32fb6, this bug is reachable in cases we already support, so back-patch all the way. 12 July 2017, 22:00:04 UTC
209970d Reduce memory usage of tsvector type analyze function. compute_tsvector_stats() detoasted and kept in memory every tsvector value in the sample, but that can be a lot of memory. The original bug report described a case using over 10 gigabytes, with statistics target of 10000 (the maximum). To fix, allocate a separate copy of just the lexemes that we keep around, and free the detoasted tsvector values as we go. This adds some palloc/pfree overhead, when you have a lot of distinct lexemes in the sample, but it's better than running out of memory. Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to all supported versions. Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org 12 July 2017, 19:03:58 UTC
e7213fe Avoid integer overflow while sifting-up a heap in tuplesort.c. If the number of tuples in the heap exceeds approximately INT_MAX/2, this loop's calculation "2*i+1" could overflow, resulting in a crash. Fix it by using unsigned int rather than int for the relevant local variables; that shouldn't cost anything extra on any popular hardware. Per bug #14722 from Sergey Koposov. Original patch by Sergey Koposov, modified by me per a suggestion from Heikki Linnakangas to use unsigned int not int64. Back-patch to 9.4, where tuplesort.c grew the ability to sort as many as INT_MAX tuples in-memory (commit 263865a48). Discussion: https://postgr.es/m/20170629161637.1478.93109@wrigleys.postgresql.org 12 July 2017, 17:24:16 UTC
4458d60 Fix variable and type name in comment. Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/20170711.163441.241981736.horiguchi.kyotaro@lab.ntt.co.jp 12 July 2017, 14:09:11 UTC
9c32c29 Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure. Commit 14e8803f1 removed the locking in SyncRepWaitForLSN, but that introduced a race condition, where SyncRepWaitForLSN might see syncRepState already set to SYNC_REP_WAIT_COMPLETE, but the process was not yet removed from the queue. That tripped the assertion, that the process should no longer be in the uqeue. Reorder the operations in SyncRepWakeQueue to remove the process from the queue first, and update syncRepState only after that, and add a memory barrier in between to make sure the operations are made visible to other processes in that order. Fixes bug #14721 reported by Const Zhang. Analysis and fix by Thomas Munro. Backpatch down to 9.5, where the locking was removed. Discussion: https://www.postgresql.org/message-id/20170629023623.1480.26508%40wrigleys.postgresql.org 12 July 2017, 12:35:36 UTC
2c71c88 Remove unnecessary braces, to match the surrounding style. Mostly in the new subscription-related commands. Backport the few that were also present in older versions. Thomas Munro Discussion: https://www.postgresql.org/message-id/CAEepm=3CyW1QmXcXJXmqiJXtXzFDc8SvSfnxkEGD3Bkv2SrkeQ@mail.gmail.com 12 July 2017, 09:31:16 UTC
56076b8 Fix multiple assignments to a column of a domain type. We allow INSERT and UPDATE commands to assign to the same column more than once, as long as the assignments are to subfields or elements rather than the whole column. However, this failed when the target column was a domain over array rather than plain array. Fix by teaching process_matched_tle() to look through CoerceToDomain nodes, and add relevant test cases. Also add a group of test cases exercising domains over array of composite. It's doubtless accidental that CREATE DOMAIN allows this case while not allowing straight domain over composite; but it does, so we'd better make sure we don't break it. (I could not find any documentation mentioning either side of that, so no doc changes.) It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us 11 July 2017, 20:48:59 UTC
afd1415 On Windows, retry process creation if we fail to reserve shared memory. We've heard occasional reports of backend launch failing because pgwin32_ReserveSharedMemoryRegion() fails, indicating that something has already used that address space in the child process. It's not very clear what, given that we disable ASLR in Windows builds, but suspicion falls on antivirus products. It'd be better if we didn't have to disable ASLR, anyway. So let's try to ameliorate the problem by retrying the process launch after such a failure, up to 100 times. Patch by me, based on previous work by Amit Kapila and others. This is a longstanding issue, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com 10 July 2017, 15:00:09 UTC
6c1860d Doc: clarify wording about tool requirements in sourcerepo.sgml. Original wording had confusingly vague antecedent for "they", so replace that with a more repetitive but clearer formulation. In passing, make the link to the installation requirements section more specific. Per gripe from Martin Mai, though this is not the fix he initially proposed. Discussion: https://postgr.es/m/CAN_NWRu-cWuNaiXUjV3m4H-riWURuPW=j21bSaLADs6rjjzXgQ@mail.gmail.com 10 July 2017, 04:08:41 UTC
fb2d385 Fix potential data corruption during freeze Fix oversight in 3b97e6823b94 bug fix. Bitwise AND is used instead of OR and it cleans all bits in t_infomask heap tuple field. Backpatch to 9.3 06 July 2017, 14:20:17 UTC
90630a6 Treat clean shutdown of an SSL connection same as the non-SSL case. If the client closes an SSL connection, treat it the same as EOF on a non-SSL connection. In particular, don't write a message in the log about that. Michael Paquier. Discussion: https://www.postgresql.org/message-id/CAB7nPqSfyVV42Q2acFo%3DvrvF2gxoZAMJLAPq3S3KkjhZAYi7aw@mail.gmail.com 03 July 2017, 11:53:06 UTC
446914f Fix walsender to exit promptly if client requests shutdown. It's possible for WalSndWaitForWal to be asked to wait for WAL that doesn't exist yet. That's fine, in fact it's the normal situation if we're caught up; but when the client requests shutdown we should not keep waiting. The previous coding could wait indefinitely if the source server was idle. In passing, improve the rather weak comments in this area, and slightly rearrange some related code for better readability. Back-patch to 9.4 where this code was introduced. Discussion: https://postgr.es/m/14154.1498781234@sss.pgh.pa.us 30 June 2017, 16:00:03 UTC
63f5db8 Second try at fixing tcp_keepalives_idle option on Solaris. Buildfarm evidence shows that TCP_KEEPALIVE_THRESHOLD doesn't exist after all on Solaris < 11. This means we need to take positive action to prevent the TCP_KEEPALIVE code path from being taken on that platform. I've chosen to limit it with "&& defined(__darwin__)", since it's unclear that anyone else would follow Apple's precedent of spelling the symbol that way. Also, follow a suggestion from Michael Paquier of eliminating code duplication by defining a couple of intermediate symbols for the socket option. In passing, make some effort to reduce the number of translatable messages by replacing "setsockopt(foo) failed" with "setsockopt(%s) failed", etc, throughout the affected files. And update relevant documentation so that it doesn't claim to provide an exhaustive list of the possible socket option names. Like the previous commit (f0256c774), back-patch to all supported branches. Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org 28 June 2017, 16:30:16 UTC
d16e96f Support tcp_keepalives_idle option on Solaris. Turns out that the socket option for this is named TCP_KEEPALIVE_THRESHOLD, at least according to the tcp(7P) man page for Solaris 11. (But since that text refers to "SunOS", it's likely pretty ancient.) It appears that the symbol TCP_KEEPALIVE does get defined on that platform, but it doesn't seem to represent a valid protocol-level socket option. This leads to bleats in the postmaster log, and no tcp_keepalives_idle functionality. Per bug #14720 from Andrey Lizenko, as well as an earlier report from Dhiraj Chawla that nobody had followed up on. The issue's been there since we added the TCP_KEEPALIVE code path in commit 5acd417c8, so back-patch to all supported branches. Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org 27 June 2017, 22:47:57 UTC
9a27065 Re-allow SRFs and window functions within sub-selects within aggregates. check_agg_arguments_walker threw an error upon seeing a SRF or window function, but that is too aggressive: if the function is within a sub-select then it's perfectly fine. I broke the SRF case in commit 0436f6bde by copying the logic for window functions ... but that was broken too, and had been since commit eaccfded9. Repair both cases in HEAD, and the window function case back to 9.3. 9.2 gets this right. 27 June 2017, 21:51:11 UTC
dc311b5 Don't lose walreceiver start requests due to race condition in postmaster. When a walreceiver dies, the startup process will notice that and send a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new walreceiver to be launched. There's a race condition, which at least in HEAD is very easy to hit, whereby the postmaster might see that signal before it processes the SIGCHLD from the walreceiver process. In that situation, sigusr1_handler() just dropped the start request on the floor, reasoning that it must be redundant. Eventually, after 10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a fresh request --- but that's a long time if the connection could have been re-established almost immediately. Fix it by setting a state flag inside the postmaster that we won't clear until we do launch a walreceiver. In cases where that results in an extra walreceiver launch, it's up to the walreceiver to realize it's unwanted and go away --- but we have, and need, that logic anyway for the opposite race case. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there in test cases where a master server is stopped and restarted while leaving streaming slaves active. This logic has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us 26 June 2017, 21:31:56 UTC
5a6b95c Ignore old stats file timestamps when starting the stats collector. The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all. This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless. In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.) This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches. Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us 26 June 2017, 20:17:06 UTC
d4702b2 Fix typo in comment Once upon a time, WAL pointers could be NULL, but no longer. We talk about "valid" now. Reported-by: Amit Langote Discussion: https://postgr.es/m/33e9617d-27f1-eee8-3311-e27af98eaf2b@lab.ntt.co.jp 22 June 2017, 20:42:38 UTC
7775eba Fix possibility of creating a "phantom" segment after promotion. When promoting a standby just after a XLOG_SWITCH record was replayed, and next segment(s) are already are locally available (via walsender, restore_command + trigger/recovery target), that segment could accidentally be recycled onto the past of the new timeline. Later checkpointer would create a .ready file for it, assuming there was an error during creation, and it would get archived. That causes trouble if another standby is later brought up from a basebackup from before the timeline creation, because it would try to read the segment, because XLogFileReadAnyTLI just tries all possible timelines, which doesn't have valid contents. Thus replay would fail. The problem, if already occurred, can be fixed by removing the segment and/or having restore_command filter it out. The reason for the creation of such "phantom" segments was, that after an XLOG_SWITCH record the EndOfLog variable points to the beginning of the next segment, and RemoveXlogFile() used XLByteToPrevSeg(). Normally RemoveXlogFile() doing so is harmless, because the last segment will still exist preventing InstallXLogFileSegment() from causing harm, but just after promotion there's no previous segment on the new timeline. Fix that by using XLByteToSeg() instead of XLByteToPrevSeg(). Author: Andres Freund Reported-By: Greg Burek Discussion: https://postgr.es/m/20170619073026.zcwpe6mydsaz5ygd@alap3.anarazel.de Backpatch: 9.2-, bug older than all supported versions 21 June 2017, 21:14:29 UTC
26bef29 Fix typo in comment. Etsuro Fujita 21 June 2017, 08:55:29 UTC
f4f867b pg_upgrade: start/stop new server after pg_resetwal When commit 0f33a719fdbb5d8c43839ea0d2c90cd03e2af2d2 removed the instructions to start/stop the new cluster before running rsync, it was now possible for pg_resetwal/pg_resetxlog to leave the final WAL record at wal_level=minimum, preventing upgraded standby servers from reconnecting. This patch fixes that by having pg_upgrade unconditionally start/stop the new cluster after pg_resetwal/pg_resetxlog has run. Backpatch through 9.2 since, though the instructions were added in PG 9.5, they worked all the way back to 9.2. Discussion: https://postgr.es/m/20170620171844.GC24975@momjian.us Backpatch-through: 9.2 20 June 2017, 17:20:02 UTC
525ee8f doc: adjust wal_level pg_upgrade patch Since 9.5 has two WAL levels that apply to standby upgrades, archive and hot_standby, adjust the docs for that version to say, basically, "restore old cluster wal_level value in the new cluster". This is a follow-on patch to fd376afc9863dd8ea3eba95edfa79961173e706f. Backpatch-through: 9.5 only 20 June 2017, 15:04:31 UTC
54d4d81 Fix materialized-view documentation oversights. When materialized views were added, psql's \d commands were made to treat them as a separate object category ... but not everyplace in the documentation or comments got the memo. Noted by David Johnston. Back-patch to 9.3 where matviews came in. Discussion: https://postgr.es/m/CAKFQuwb27M3VXRhHErjCpkWwN9eKThbqWb1=trtoXi9_ejqPXQ@mail.gmail.com 19 June 2017, 22:32:22 UTC
4fc274d On Windows, make pg_dump use binary mode for compressed plain text output. The combination of -Z -Fp and output to stdout resulted in corrupted output data, because we left stdout in text mode, resulting in newline conversion being done on the compressed stream. Switch stdout to binary mode for this case, at the same place where we do it for non-text output formats. Report and patch by Kuntal Ghosh, tested by Ashutosh Sharma and Neha Sharma. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw@mail.gmail.com 19 June 2017, 15:03:07 UTC
1ba1adf Fix leaking of small spilled subtransactions during logical decoding. When, during logical decoding, a transaction gets too big, it's contents get spilled to disk. Not just the top-transaction gets spilled, but *also* all of its subtransactions, even if they're not that large themselves. Unfortunately we didn't clean up such small spilled subtransactions from disk. Fix that, by keeping better track of whether a transaction has been spilled to disk. Author: Andres Freund Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello Discussion: https://postgr.es/m/1457621358.355011041@f382.i.mail.ru https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com Backpatch: 9.4-, where logical decoding was introduced 19 June 2017, 02:13:50 UTC
501e6f8 Fix dependency, when changing a function's argument/return type. When a new base type is created using the old-style procedure of first creating the input/output functions with "opaque" in place of the base type, the "opaque" argument/return type is changed to the final base type, on CREATE TYPE. However, we did not create a pg_depend record when doing that, so the functions were left not depending on the type. Fixes bug #14706, reported by Karen Huddleston. Discussion: https://www.postgresql.org/message-id/20170614232259.1424.82774@wrigleys.postgresql.org 16 June 2017, 08:44:00 UTC
c7e17ce Fix low-probability leaks of PGresult objects in the backend. We had three occurrences of essentially the same coding pattern wherein we tried to retrieve a query result from a libpq connection without blocking. In the case where PQconsumeInput failed (typically indicating a lost connection), all three loops simply gave up and returned, forgetting to clear any previously-collected PGresult object. Since those are malloc'd not palloc'd, the oversight results in a process-lifespan memory leak. One instance, in libpqwalreceiver, is of little significance because the walreceiver process would just quit anyway if its connection fails. But we might as well fix it. The other two instances, in postgres_fdw, are somewhat more worrisome because at least in principle the scenario could be repeated, allowing the amount of memory leaked to build up to something worth worrying about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS calls, as well as other calls that could potentially elog(ERROR), providing another way to exit without having cleared the PGresult. Here we need to add PG_TRY logic similar to what exists in quite a few other places in postgres_fdw. Coverity noted the libpqwalreceiver bug; I found the other two cases by checking all calls of PQconsumeInput. Back-patch to all supported versions as appropriate (9.2 lacks postgres_fdw, so this is really quite unexciting for that branch). Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us 15 June 2017, 19:03:55 UTC
85f2f54 doc: remove mention of Windows junction points by pg_upgrade pg_upgrade never used Windows junction points but instead always used Windows hard links. Reported-by: Adrian Klaver Discussion: https://postgr.es/m/6a638c60-90bb-4921-8ee4-5fdad68f8b09@aklaver.com Backpatch-through: 9.3, where the mention first appeared 15 June 2017, 17:25:45 UTC
fd376af docs: Fix pg_upgrade standby server upgrade docs It was unsafe to instruct users to start/stop the server after pg_upgrade was run but before the standby servers were rsync'ed. The new instructions avoid this. RELEASE NOTES: This fix should be mentioned in the minor release notes. Reported-by: Dmitriy Sarafannikov and Sergey Burladyan Discussion: https://postgr.es/m/87wp8o506b.fsf@seb.koffice.internal Backpatch-through: 9.5, where standby server upgrade instructions first appeared 15 June 2017, 16:30:02 UTC
7efc08d Fix document bug regarding read only transactions. It was explained that read only transactions (not in standby) allow to update sequences. This had been wrong since the commit: 05d8a561ff85db1545f5768fe8d8dc9d99ad2ef7 Discussion: https://www.postgresql.org/message-id/20170614.110826.425627939780392324.t-ishii%40sraoss.co.jp 15 June 2017, 01:08:38 UTC
df6896c Assert that we don't invent relfilenodes or type OIDs in binary upgrade. During pg_upgrade's restore run, all relfilenode choices should be overridden by commands in the dump script. If we ever find ourselves choosing a relfilenode in the ordinary way, someone blew it. Likewise for pg_type OIDs. Since pg_upgrade might well succeed anyway, if there happens not to be a conflict during the regression test run, we need assertions here to keep us on the straight and narrow. We might someday be able to remove the assertion in GetNewRelFileNode, if pg_upgrade is rewritten to remove its assumption that old and new relfilenodes always match. But it's hard to see how to get rid of the pg_type OID constraint, since those OIDs are embedded in user tables in some cases. Back-patch as far as 9.5, because of the risk of back-patches breaking something here even if it works in HEAD. I'd prefer to go back further, but 9.4 fails both assertions due to get_rel_infos()'s use of a temporary table. We can't use the later-branch solution of a CTE for compatibility reasons (cf commit 5d16332e9), and it doesn't seem worth inventing some other way to do the query. (I did check, by dint of changing the Asserts to elog(WARNING), that there are no other cases of unwanted OID assignments during 9.4's regression test run.) Discussion: https://postgr.es/m/19785.1497215827@sss.pgh.pa.us 13 June 2017, 00:04:33 UTC
11aa362 Take PROVE_FLAGS from the command line but not the environment This reverts commit 56b6ef893fee9e9bf47d927a02f4d1ea911f4d9c and instead makes vcregress.pl parse out PROVE_FLAGS from a command line argument when doing a TAP test, thus making it consistent with the makefile treatment. Discussion: https://postgr.es/m/c26a7416-2fb9-34ab-7991-618c922f896e%402ndquadrant.com Backpatch to 9.4 like previous patch. 10 June 2017, 14:23:21 UTC
b7665f0 postgres_fdw: Allow cancellation of transaction control commands. Commit f039eaac7131ef2a4cf63a10cf98486f8bcd09d2, later back-patched with commit 1b812afb0eafe125b820cc3b95e7ca03821aa675, allowed many of the queries issued by postgres_fdw to fetch remote data to respond to cancel interrupts in a timely fashion. However, it didn't do anything about the transaction control commands, which remained noninterruptible. Improve the situation by changing do_sql_command() to retrieve query results using pgfdw_get_result(), which uses the asynchronous interface to libpq so that it can check for interrupts every time libpq returns control. Since this might result in a situation where we can no longer be sure that the remote transaction state matches the local transaction state, add a facility to force all levels of the local transaction to abort if we've lost track of the remote state; without this, an apparently-successful commit of the local transaction might fail to commit changes made on the remote side. Also, add a 60-second timeout for queries issue during transaction abort; if that expires, give up and mark the state of the connection as unknown. Drop all such connections when we exit the local transaction. Together, these changes mean that if we're aborting the local toplevel transaction anyway, we can just drop the remote connection in lieu of waiting (possibly for a very long time) for it to complete an abort. This still leaves quite a bit of room for improvement. PQcancel() has no asynchronous interface, so if we get stuck sending the cancel request we'll still hang. Also, PQsetnonblocking() is not used, which means we could block uninterruptibly when sending a query. There might be some other optimizations possible as well. Nonetheless, this allows us to escape a wait for an unresponsive remote server quickly in many more cases than previously. Report by Suraj Kharage. Patch by me and Rafia Sabih. Review and testing by Amit Kapila and Tushar Ahuja. Discussion: http://postgr.es/m/CAF1DzPU8Kx+fMXEbFoP289xtm3bz3t+ZfxhmKavr98Bh-C0TqQ@mail.gmail.com 07 June 2017, 19:33:32 UTC
7d46a67 Fix docs to not claim ECPG's SET CONNECTION is not thread-aware. Changed by: Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> 07 June 2017, 14:14:28 UTC
739cb7f Clear auth context correctly when re-connecting after failed auth attempt. If authentication over an SSL connection fails, with sslmode=prefer, libpq will reconnect without SSL and retry. However, we did not clear the variables related to GSS, SSPI, and SASL authentication state, when reconnecting. Because of that, the second authentication attempt would always fail with a "duplicate GSS/SASL authentication request" error. pg_SSPI_startup did not check for duplicate authentication requests like the corresponding GSS and SASL functions, so with SSPI, you would leak some memory instead. Another way this could manifest itself, on version 10, is if you list multiple hostnames in the "host" parameter. If the first server requests Kerberos or SCRAM authentication, but it fails, the attempts to connect to the other servers will also fail with "duplicate authentication request" errors. To fix, move the clearing of authentication state from closePGconn to pgDropConnection, so that it is cleared also when re-connecting. Patch by Michael Paquier, with some kibitzing by me. Backpatch down to 9.3. 9.2 has the same bug, but the code around closing the connection is somewhat different, so that this patch doesn't apply. To fix this in 9.2, I think we would need to back-port commit 210eb9b743 first, and then apply this patch. However, given that we only bumped into this in our own testing, we haven't heard any reports from users about this, and that 9.2 will be end-of-lifed in a couple of months anyway, it doesn't seem worth the risk and trouble. Discussion: https://www.postgresql.org/message-id/CAB7nPqRuOUm0MyJaUy9L3eXYJU3AKCZ-0-03=-aDTZJGV4GyWw@mail.gmail.com 07 June 2017, 11:03:27 UTC
641a60b Unify SIGHUP handling between normal and walsender backends. Because walsender and normal backends share the same main loop it's problematic to have two different flag variables, set in signal handlers, indicating a pending configuration reload. Only certain walsender commands reach code paths checking for the variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT ... LOGICAL, notably not base backups). This is a bug present since the introduction of walsender, but has gotten worse in releases since then which allow walsender to do more. A later patch, not slated for v10, will similarly unify SIGHUP handling in other types of processes as well. Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de Backpatch: 9.2-, bug is present since 9.0 06 June 2017, 02:18:16 UTC
50581f2 Prevent possibility of panics during shutdown checkpoint. When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit. Author: Andres Freund, based on an earlier patch by Michael Paquier Reported-By: Fujii Masao, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de Backpatch: 9.4, where logical decoding was introduced 06 June 2017, 02:18:16 UTC
e1319f6 Have walsenders participate in procsignal infrastructure. The non-participation in procsignal was a problem for both changes in master, e.g. parallelism not working for normal statements run in walsender backends, and older branches, e.g. recovery conflicts and catchup interrupts not working for logical decoding walsenders. This commit thus replaces the previous WalSndXLogSendHandler with procsignal_sigusr1_handler. In branches since db0f6cad48 that can lead to additional SetLatch calls, but that only rarely seems to make a difference. Author: Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de Backpatch: 9.4, earlier commits don't seem to benefit sufficiently 06 June 2017, 02:18:16 UTC
b2482ab Fix thinko in previous openssl change 06 June 2017, 00:40:11 UTC
back to top