https://github.com/torvalds/linux

sort by:
Revision Author Date Message Commit Date
85f90cf x86: OLPC: switch over to using new EC driver on x86 This uses the new EC driver framework in drivers/platform/olpc. The XO-1 and XO-1.5-specific code is still in arch/x86, but the generic stuff (including a new workqueue; no more running EC commands with IRQs disabled!) can be shared with other architectures. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> 01 August 2012, 03:27:30 UTC
d278b7a Platform: OLPC: add a suspended flag to the EC driver A problem we've noticed on XO-1.75 is when we suspend in the middle of an EC command. Don't allow that. In the process, create a private object for the generic EC driver to use; we have a framework for passing around a struct, use that rather than a proliferation of global variables. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> 01 August 2012, 03:27:30 UTC
ac25041 Platform: OLPC: turn EC driver into a platform_driver The 1.75-based OLPC EC driver already does this; let's do it for all EC drivers. This gives us nice suspend/resume hooks, amongst other things. We want to run the EC's suspend hooks later than other drivers (which may be setting wakeup masks or be running EC commands). We also want to run the EC's resume hooks earlier than other drivers (which may want to run EC commands). Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> 01 August 2012, 03:27:30 UTC
3d26c20 Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it This provides a new API allows different OLPC architectures to override the EC driver. x86 and ARM OLPC machines use completely different EC backends. The olpc_ec_cmd is synchronous, and waits for the workqueue to send the command to the EC. Multiple callers can run olpc_ec_cmd() at once, and they will by serialized and sleep while only one executes on the EC at a time. We don't provide an unregister function, as that doesn't make sense within the context of OLPC machines - there's only ever 1 EC, it's critical to functionality, and it certainly not hotpluggable. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> 01 August 2012, 03:27:30 UTC
3bf9428 drivers: OLPC: update various drivers to include olpc-ec.h Switch over to using olpc-ec.h in multiple steps, so as not to break builds. This covers every driver that calls olpc_ec_cmd(). Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> 01 August 2012, 03:27:29 UTC
392a325 Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver The OLPC EC driver has outgrown arch/x86/platform/. It's time to both share common code amongst different architectures, as well as move it out of arch/x86/. The XO-1.75 is ARM-based, and the EC driver shares a lot of code with the x86 code. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> 01 August 2012, 03:27:29 UTC
08843b7 Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ... 31 July 2012, 21:42:28 UTC
cc8362b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ... 31 July 2012, 21:35:28 UTC
2e3ee61 Merge tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback updates from Wu Fengguang: "Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices." Fix up trivial conflict in comment in fs/sync.c * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Fix some comment errors block: Convert BDI proportion calculations to flexible proportions lib: Fix possible deadlock in flexible proportion code lib: Proportions with flexible period 31 July 2012, 05:14:04 UTC
1fad1e9 Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ... 31 July 2012, 02:16:57 UTC
bbeb0af Merge tag 'mfd-for-linus-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFD fix from Samuel Ortiz: "This one fixes an s5m8767 regulator build breakage due to a merge conflict caused by the MFD s5m API changes." * tag 'mfd-for-linus-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: regulator: Fix an s5m8767 build failure 31 July 2012, 02:06:25 UTC
6df419e Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: "This is the first part of the media patches for v3.6. This patch series contain: - new DVB frontend: rtl2832 - new video drivers: adv7393 - some unused files got removed - a selection API cleanup between V4L2 and V4L2 subdev API's - a major redesign at v4l-ioctl2, in order to clean it up - several driver fixes and improvements." * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (174 commits) v4l: Export v4l2-common.h in include/linux/Kbuild media: Revert "[media] Terratec Cinergy S2 USB HD Rev.2" [media] media: Use pr_info not homegrown pr_reg macro [media] Terratec Cinergy S2 USB HD Rev.2 [media] v4l: Correct conflicting V4L2 subdev selection API documentation [media] Feature removal: V4L2 selections API target and flag definitions [media] v4l: Unify selection flags documentation [media] v4l: Unify selection flags [media] v4l: Common documentation for selection targets [media] v4l: Unify selection targets across V4L2 and V4L2 subdev interfaces [media] v4l: Remove "_ACTUAL" from subdev selection API target definition names [media] V4L: Remove "_ACTIVE" from the selection target name definitions [media] media: dvb-usb: print mac address via native %pM [media] s5p-tv: Use module_i2c_driver in sii9234_drv.c file [media] media: gpio-ir-recv: add allowed_protos for platform data [media] s5p-jpeg: Use module_platform_driver in jpeg-core.c file [media] saa7134: fix spelling of detach in label [media] cx88-blackbird: replace ioctl by unlocked_ioctl [media] cx88: don't use current_norm [media] cx88: fix a number of v4l2-compliance violations ... 31 July 2012, 02:03:41 UTC
1fe5e99 rbd: create rbd_refresh_helper() Create a simple helper that handles the common case of calling __rbd_refresh_header() while holding the ctl_mutex. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:46 UTC
b813623 rbd: return obj version in __rbd_refresh_header() Add a new parameter to __rbd_refresh_header() through which the version of the header object is passed back to the caller. In most cases this isn't needed. The main motivation is to normalize (almost) all calls to __rbd_refresh_header() so they are all wrapped immediately by mutex_lock()/mutex_unlock(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:46 UTC
ccece23 rbd: fixes in rbd_header_from_disk() This fixes a few issues in rbd_header_from_disk(): - There is a check intended to catch overflow, but it's wrong in two ways. - First, the type we don't want to overflow is size_t, not unsigned int, and there is now a SIZE_MAX we can use for use with that type. - Second, we're allocating the snapshot ids and snapshot image sizes separately (each has type u64; on disk they grouped together as a rbd_image_header_ondisk structure). So we can use the size of u64 in this overflow check. - If there are no snapshots, then there should be no snapshot names. Enforce this, and issue a warning if we encounter a header with no snapshots but a non-zero snap_names_len. - When saving the snapshot names into the header, be more direct in defining the offset in the on-disk structure from which they're being copied by using "snap_count" rather than "i" in the array index. - If an error occurs, the "snapc" and "snap_names" fields are freed at the end of the function. Make those fields be null pointers after they're freed, to be explicit that they are no longer valid. - Finally, move the definition of the local variable "i" to the innermost scope in which it's needed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:46 UTC
913d2fd rbd: always pass ops array to rbd_req_sync_op() All of the callers of rbd_req_sync_op() except one pass a non-null "ops" pointer. The only one that does not is rbd_req_sync_read(), which passes CEPH_OSD_OP_READ as its "opcode" and, CEPH_OSD_FLAG_READ for "flags". By allocating the ops array in rbd_req_sync_read() and moving the special case code for the null ops pointer into it, it becomes clear that much of that code is not even necessary. In addition, the "opcode" argument to rbd_req_sync_op() is never actually used, so get rid of that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:46 UTC
d67d4be rbd: pass null version pointer in add_snap() rbd_header_add_snap() passes the address of a version variable to rbd_req_sync_exec(), but it ignores the result. Just pass a null pointer instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:46 UTC
57cfc10 rbd: make rbd_create_rw_ops() return a pointer Either rbd_create_rw_ops() will succeed, or it will fail because a memory allocation failed. Have it just return a valid pointer or null rather than stuffing a pointer into a provided address and returning an errno. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:46 UTC
4e891e0 rbd: have __rbd_add_snap_dev() return a pointer It's not obvious whether the snapshot pointer whose address is provided to __rbd_add_snap_dev() will be assigned by that function. Change it to return the snapshot, or a pointer-coded errno in the event of a failure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:21:45 UTC
6139919 libceph: recheck con state after allocating incoming message We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:19:45 UTC
4740a62 libceph: change ceph_con_in_msg_alloc convention to be less weird This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:19:30 UTC
8636ea6 libceph: avoid dropping con mutex before fault The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:17:13 UTC
7b862e0 libceph: verify state after retaking con lock after dispatch We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:16:56 UTC
4f471e4 libceph: revoke mon_client messages on session restart Revoke all mon_client messages when we shut down the old connection. This is mostly moot since we are re-using the same ceph_connection, but it is cleaner. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:16:40 UTC
8007b8d libceph: fix handling of immediate socket connect failure If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:16:16 UTC
09d9032 ceph: update MAINTAINERS file * shiny new inktank.com email addresses * add include/linux/crush directory (previous oversight) Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:16:03 UTC
756a16a libceph: be less chatty about stray replies There are many (normal) conditions that can lead to us getting unexpected replies, include cluster topology changes, osd failures, and timeouts. There's no need to spam the console about it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:16:03 UTC
43c7427 libceph: clear all flags on con_close Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:16:02 UTC
4a86169 libceph: clean up con flags Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:16:01 UTC
8dacc7d libceph: replace connection state bits with states Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:16:00 UTC
d7353dd libceph: drop unnecessary CLOSED check in socket state change callback If we are CLOSED, the socket is closed and we won't get these. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:15:59 UTC
ee76e07 libceph: close socket directly from ceph_con_close() It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:15:58 UTC
2e8cb10 libceph: drop gratuitous socket close calls in con_work If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:15:58 UTC
a59b55a libceph: move ceph_con_send() closed check under the con mutex Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:15:57 UTC
0065093 libceph: move msgr clear_standby under con mutex protection Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:15:56 UTC
3b5ede0 libceph: fix fault locking; close socket on lossy fault If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex. We should also take the con mutex before printing out the state bits in the debug output. Signed-off-by: Sage Weil <sage@inktank.com> 31 July 2012, 01:15:55 UTC
070c633 rbd: drop "object_name" from rbd_req_sync_unwatch() rbd_req_sync_unwatch() only ever uses rbd_dev->header_name as the value of its "object_name" parameter, and that value is available within the function already. So get rid of the parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:54 UTC
7f0a24d rbd: drop "object_name" from rbd_req_sync_notify_ack() rbd_req_sync_notify_ack() only ever uses rbd_dev->header_name as the value of its "object_name" parameter, and that value is available within the function already. So get rid of the parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:53 UTC
4cb1625 rbd: drop "object_name" from rbd_req_sync_notify() rbd_req_sync_notify() only ever uses rbd_dev->header_name as the value of its "object_name" parameter, and that value is available within the function already. So get rid of the parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:53 UTC
0e6f322 rbd: drop "object_name" from rbd_req_sync_watch() rbd_req_sync_watch() is only called in one place, and in that place it passes rbd_dev->header_name as the value of the "object_name" parameter. This value is available within the function already. Having the extra parameter leaves the impression the object name could take on different values, but it does not. So get rid of the parameter. We can always add it back again if we find we want to watch some other object in the future. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:52 UTC
14e7085 rbd: drop rbd_dev parameter in snap functions Both rbd_register_snap_dev() and __rbd_remove_snap_dev() have rbd_dev parameters that are unused. Remove them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:51 UTC
ed63f4f rbd: drop rbd_header_from_disk() gfp_flags parameter The function rbd_header_from_disk() is only called in one spot, and it passes GFP_KERNEL as its value for the gfp_flags parameter. Just drop that parameter and substitute GFP_KERNEL everywhere within that function it had been used. (If we find we need the parameter again in the future it's easy enough to add back again.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:50 UTC
9a5d690 rbd: snapc is unused in rbd_req_sync_read() The "snapc" parameter to in rbd_req_sync_read() is not used, so get rid of it. Reported-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:49 UTC
de71a29 rbd: rename rbd_device->id The "id" field of an rbd device structure represents the unique client-local device id mapped to the underlying rbd image. Each rbd image will have another id--the image id--and each snapshot has its own id as well. The simple name "id" no longer conveys the information one might like to have. Rename the device "id" field in struct rbd_dev to be "dev_id" to make it a little more obvious what we're dealing with without having to think more about context. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:49 UTC
8e94af8 rbd: encapsulate header validity test If an rbd image header is read and it doesn't begin with the expected magic information, a warning is displayed. This is a fairly simple test, but it could be extended at some point. Fix the comparison so it actually looks at the "text" field rather than the front of the structure. In any case, encapsulate the validity test in its own function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:48 UTC
aa711ee ceph: define snap counts as u32 everywhere There are two structures in which a count of snapshots are maintained: struct ceph_snap_context { ... u32 num_snaps; ... } and struct ceph_snap_realm { ... u32 num_prior_parent_snaps; /* had prior to parent_since */ ... u32 num_snaps; ... } These fields never take on negative values (e.g., to hold special meaning), and so are really inherently unsigned. Furthermore they take their value from over-the-wire or on-disk formatted 32-bit values. So change their definition to have type u32, and change some spots elsewhere in the code to account for this change. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:47 UTC
bd919d4 rbd: clean up a few dout() calls There was a dout() call in rbd_do_request() that was reporting the reporting the offset as the length and vice versa. While fixing that I did a quick scan of other dout() calls and fixed a couple of other minor things. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:46 UTC
a059329 rbd: simplify __rbd_remove_all_snaps() This just replaces a while loop with list_for_each_entry_safe() in __rbd_remove_all_snaps(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:45 UTC
a66f8c9 rbd: drop extra header_rwsem init In commit c666601a there was inadvertently added an extra initialization of rbd_dev->header_rwsem. This gets rid of the duplicate. Reported-by: Guangliang Zhao <gzhao@suse.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:45 UTC
9e15dc7 rbd: kill rbd_image_header->snap_seq The snap_seq field in an rbd_image_header structure held the value from the rbd image header when it was last refreshed. We now maintain this value in the snapc->seq field. So get rid of the other one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:44 UTC
505cbb9 rbd: set snapc->seq only when refreshing header In rbd_header_add_snap() there is code to set snapc->seq to the just-added snapshot id. This is the only remnant left of the use of that field for recording which snapshot an rbd_dev was associated with. That functionality is no longer supported, so get rid of that final bit of code. Doing so means we never actually set snapc->seq any more. On the server, the snapshot context's sequence value represents the highest snapshot id ever issued for a particular rbd image. So we'll make it have that meaning here as well. To do so, set this value whenever the rbd header is (re-)read. That way it will always be consistent with the rest of the snapshot context we maintain. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:43 UTC
78dc447 rbd: preserve snapc->seq in rbd_header_set_snap() In rbd_header_set_snap(), there is logic to make the snap context's seq field get set to a particular snapshot id, or 0 if there is no snapshot for the rbd image. This seems to be an artifact of how the current snapshot id for an rbd_dev was recorded before the rbd_dev->snap_id field began to be used for that purpose. There's no need to update the value of snapc->seq here any more, so stop doing it. Tidy up a few local variables in that function while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:42 UTC
75fe9e1 rbd: don't use snapc->seq that way In what appears to be an artifact of a different way of encoding whether an rbd image maps a snapshot, __rbd_refresh_header() has code that arranges to update the seq value in an rbd image's snapshot context to point to the first entry in its snapshot array if that's where it was pointing initially. We now use rbd_dev->snap_id to record the snapshot id--using the special value CEPH_NOSNAP to indicate the rbd_dev is not mapping a snapshot at all. There is therefore no need to check for this case, nor to update the seq value, in __rbd_refresh_header(). Just preserve the seq value that rbd_read_header() provides (which, at the moment, is nothing). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 31 July 2012, 01:15:41 UTC
a71b891 rbd: send header version when notifying Previously the original header version was sent. Now, we update it when the header changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:40 UTC
d1d2564 rbd: use reference counting for the snap context This prevents a race between requests with a given snap context and header updates that free it. The osd client was already expecting the snap context to be reference counted, since it get()s it in ceph_osdc_build_request and put()s it when the request completes. Also remove the second down_read()/up_read() on header_rwsem in rbd_do_request, which wasn't actually preventing this race or protecting any other data. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:40 UTC
93a24e0 rbd: set image size when header is updated The image may have been resized. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:39 UTC
a51aa0c rbd: expose the correct size of the device in sysfs If an image was mapped to a snapshot, the size of the head version would be shown. Protect capacity with header_rwsem, since it may change. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:38 UTC
474ef7c rbd: only reset capacity when pointing to head Snapshots cannot be resized, and the new capacity of head should not be reflected by the snapshot. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:37 UTC
e88a36e rbd: return errors for mapped but deleted snapshot When a snapshot is deleted, the OSD will return ENOENT when reading from it. This is normally interpreted as a hole by rbd, which will return zeroes. To minimize the time in which this can happen, stop requests early when we are notified that our snapshot no longer exists. [elder@inktank.com: updated __rbd_init_snaps_header() logic] Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:36 UTC
048a9d2 libceph: trivial fix for the incorrect debug output This is a trivial fix for the debug output, as it is inconsistent with the function name so may confuse people when debugging. [elder@inktank.com: switched to use __func__] Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:36 UTC
21ec6ff ceph: fix potential double free We re-run the loop but we don't re-set the attrs pointer back to NULL. Signed-off-by: Alan Cox <alan@linux.intel.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:35 UTC
85effe1 libceph: reset connection retry on successfully negotiation We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect. Fix this by resetting the backoff counter after a successful negotiation/ connection with the remote node. Fixes ceph issue #2802. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:34 UTC
5469155 libceph: protect ceph_con_open() with mutex Take the con mutex while we are initiating a ceph open. This is necessary because the may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> 31 July 2012, 01:15:33 UTC
a53aab6 ceph: close old con before reopening on mds reconnect When we detect a mds session reset, close the old ceph_connection before reopening it. This ensures we clean up the old socket properly and keep the ceph_connection state correct. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> 31 July 2012, 01:15:32 UTC
a410702 libceph: (re)initialize bio_iter on start of message receive Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like: - start reading message - initialize bio_iter - read half a message - messenger fault, reconnect - restart reading message - ** bio_iter now non-NULL, not reinitialized ** - read past end of bio, crash Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> 31 July 2012, 01:15:31 UTC
6194ea8 libceph: resubmit linger ops when pg mapping changes The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies. To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering. This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> 31 July 2012, 01:15:31 UTC
8c50c81 libceph: fix mutex coverage for ceph_con_close Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make not of why we cannot close the socket directly. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> 31 July 2012, 01:15:30 UTC
3a140a0 libceph: report socket read/write error message We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than libceph: osd0 192.168.106.220:6801 (null) Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> 31 July 2012, 01:15:29 UTC
546f04e libceph: support crush tunables The server side recently added support for tuning some magic crush variables. Decode these variables if they are present, or use the default values if they are not present. Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5. Signed-off-by: caleb miles <caleb.miles@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> 31 July 2012, 01:15:23 UTC
27c1ee3 Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's first set of patches: "Non-MM patches: - lots of misc bits - tree-wide have_clk() cleanups - quite a lot of printk tweaks. I draw your attention to "printk: convert the format for KERN_<LEVEL> to a 2 byte pattern" which looks a bit scary. But afaict it's solid. - backlight updates - lib/ feature work (notably the addition and use of memweight()) - checkpatch updates - rtc updates - nilfs updates - fatfs updates (partial, still waiting for acks) - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc - new fault-injection feature work" * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) drivers/misc/lkdtm.c: fix missing allocation failure check lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() fault-injection: add tool to run command with failslab or fail_page_alloc fault-injection: add selftests for cpu and memory hotplug powerpc: pSeries reconfig notifier error injection module memory: memory notifier error injection module PM: PM notifier error injection module cpu: rewrite cpu-notifier-error-inject module fault-injection: notifier error injection c/r: fcntl: add F_GETOWNER_UIDS option resource: make sure requested range is included in the root range include/linux/aio.h: cpp->C conversions fs: cachefiles: add support for large files in filesystem caching pps: return PTR_ERR on error in device_create taskstats: check nla_reserve() return sysctl: suppress kmemleak messages ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION ipc: compat: use signed size_t types for msgsnd and msgrcv ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC ipc: add COMPAT_SHMLBA support ... 31 July 2012, 00:25:34 UTC
086ff4b drivers/misc/lkdtm.c: fix missing allocation failure check Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44691 Reported-by: <rucsoftsec@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
e04f228 lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() We are seeing a lot of sg_alloc_table allocation failures using the new drm prime infrastructure. We isolated the cause to code in __sg_alloc_table that was re-writing the gfp_flags. There is a comment in the code that suggest that there is an assumption about the allocation coming from a memory pool. This was likely true when sg lists were primarily used for disk I/O. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Cong Wang <amwang@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Rob Clark <rob.clark@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Inki Dae <inki.dae@samsung.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Sonny Rao <sonnyrao@chromium.org> Cc: Olof Johansson <olofj@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
c24aa64 fault-injection: add tool to run command with failslab or fail_page_alloc This adds tools/testing/fault-injection/failcmd.sh to run a command while injecting slab/page allocation failures via fault injection. Example: Run a command "make -C tools/testing/selftests/ run_tests" with injecting slab allocation failure. # ./tools/testing/fault-injection/failcmd.sh \ -- make -C tools/testing/selftests/ run_tests Same as above except to specify 100 times failures at most instead of one time at most by default. # ./tools/testing/fault-injection/failcmd.sh --times=100 \ -- make -C tools/testing/selftests/ run_tests Same as above except to inject page allocation failure instead of slab allocation failure. # env FAILCMD_TYPE=fail_page_alloc \ ./tools/testing/fault-injection/failcmd.sh --times=100 \ -- make -C tools/testing/selftests/ run_tests Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
d89dffa fault-injection: add selftests for cpu and memory hotplug This adds two selftests * tools/testing/selftests/cpu-hotplug/on-off-test.sh is testing script for CPU hotplug 1. Online all hot-pluggable CPUs 2. Offline all hot-pluggable CPUs 3. Online all hot-pluggable CPUs again 4. Exit if cpu-notifier-error-inject.ko is not available 5. Offline all hot-pluggable CPUs in preparation for testing 6. Test CPU hot-add error handling by injecting notifier errors 7. Online all hot-pluggable CPUs in preparation for testing 8. Test CPU hot-remove error handling by injecting notifier errors * tools/testing/selftests/memory-hotplug/on-off-test.sh is doing the similar thing for memory hotplug. 1. Online all hot-pluggable memory 2. Offline 10% of hot-pluggable memory 3. Online all hot-pluggable memory again 4. Exit if memory-notifier-error-inject.ko is not available 5. Offline 10% of hot-pluggable memory in preparation for testing 6. Test memory hot-add error handling by injecting notifier errors 7. Online all hot-pluggable memory in preparation for testing 8. Test memory hot-remove error handling by injecting notifier errors Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
08dfb4d powerpc: pSeries reconfig notifier error injection module This provides the ability to inject artifical errors to pSeries reconfig notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/pSeries-reconfig If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
9579f5b memory: memory notifier error injection module This provides the ability to inject artifical errors to memory hotplug notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/memory If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Example: Inject memory hotplug offline error (-12 == -ENOMEM) # cd /sys/kernel/debug/notifier-error-inject/memory # echo -12 > actions/MEM_GOING_OFFLINE/error # echo offline > /sys/devices/system/memory/memoryXXX/state bash: echo: write error: Cannot allocate memory Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
048b9c3 PM: PM notifier error injection module This provides the ability to inject artifical errors to PM notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/pm Each of the files in "error" directory represents an event which can be failed and contains the error code. If the notifier call chain should be failed with some events notified, write the error code to the files. If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Example: Inject PM suspend error (-12 = -ENOMEM) # cd /sys/kernel/debug/notifier-error-inject/pm # echo -12 > actions/PM_SUSPEND_PREPARE/error # echo mem > /sys/power/state bash: echo: write error: Cannot allocate memory Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
f5a9f52 cpu: rewrite cpu-notifier-error-inject module Rewrite existing cpu-notifier-error-inject module to use debugfs based new framework. This change removes cpu_up_prepare_error and cpu_down_prepare_error module parameters which were used to specify error code to be injected. We could keep these module parameters for backward compatibility by module_param_cb but it seems overkill for this module. This provides the ability to inject artifical errors to CPU notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/cpu If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Example1: inject CPU offline error (-1 == -EPERM) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -1 > actions/CPU_DOWN_PREPARE/error # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted Example2: inject CPU online error (-2 == -ENOENT) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -2 > actions/CPU_UP_PREPARE/error # echo 1 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: No such file or directory Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
8d43828 fault-injection: notifier error injection This patchset provides kernel modules that can be used to test the error handling of notifier call chain failures by injecting artifical errors to the following notifier chain callbacks. * CPU notifier * PM notifier * memory hotplug notifier * powerpc pSeries reconfig notifier Example: Inject CPU offline error (-1 == -EPERM) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -1 > actions/CPU_DOWN_PREPARE/error # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted The patchset also adds cpu and memory hotplug tests to tools/testing/selftests These tests first do simple online and offline test and then do fault injection tests if notifier error injection module is available. This patch: The notifier error injection provides the ability to inject artifical errors to specified notifier chain callbacks. It is useful to test the error handling of notifier call chain failures. This adds common basic functions to define which type of events can be fail and to initialize the debugfs interface to control what error code should be returned and which event should be failed. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:22 UTC
1d151c3 c/r: fcntl: add F_GETOWNER_UIDS option When we restore file descriptors we would like them to look exactly as they were at dumping time. With help of fcntl it's almost possible, the missing snippet is file owners UIDs. To be able to read their values the F_GETOWNER_UIDS is introduced. This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise returning -EINVAL. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
65fed8f resource: make sure requested range is included in the root range When the requested range is outside of the root range the logic in __reserve_region_with_split will cause an infinite recursion which will overflow the stack as seen in the warning bellow. This particular stack overflow was caused by requesting the (100000000-107ffffff) range while the root range was (0-ffffffff). In this case __request_resource would return the whole root range as conflict range (i.e. 0-ffffffff). Then, the logic in __reserve_region_with_split would continue the recursion requesting the new range as (conflict->end+1, end) which incidentally in this case equals the originally requested range. This patch aborts looking for an usable range when the request does not intersect with the root range. When the request partially overlaps with the root range, it ajust the request to fall in the root range and then continues with the new request. When the request is modified or aborted errors and a stack trace are logged to allow catching the errors in the upper layers. [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89() [ 5.975150] Modules linked in: [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46 [ 5.985324] Call Trace: [ 5.987759] [<c1039dfc>] ? console_unlock+0x17b/0x18d [ 5.992891] [<c1039620>] warn_slowpath_common+0x48/0x5d [ 5.998194] [<c1031758>] ? sub_preempt_count+0x63/0x89 [ 6.003412] [<c1039644>] warn_slowpath_null+0xf/0x13 [ 6.008453] [<c1031758>] sub_preempt_count+0x63/0x89 [ 6.013499] [<c14d60c4>] _raw_spin_unlock+0x27/0x3f [ 6.018453] [<c10c6349>] add_partial+0x36/0x3b [ 6.022973] [<c10c7c0a>] deactivate_slab+0x96/0xb4 [ 6.027842] [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241 [ 6.034456] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.039842] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.045232] [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0 [ 6.050710] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.056100] [<c103f78f>] kzalloc.constprop.5+0x29/0x38 [ 6.061320] [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1 [ 6.067230] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 ... [ 7.179057] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 [ 7.184970] [<c17b4779>] reserve_region_with_split+0x30/0x42 [ 7.190709] [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9 [ 7.196623] [<c17c9526>] pcibios_resource_survey+0x23/0x2a [ 7.202184] [<c17cad8a>] pcibios_init+0x23/0x35 [ 7.206789] [<c17ca574>] pci_subsys_init+0x3f/0x44 [ 7.211659] [<c1002088>] do_one_initcall+0x72/0x122 [ 7.216615] [<c17ca535>] ? pci_legacy_init+0x3d/0x3d [ 7.221659] [<c17a27ff>] kernel_init+0xa6/0x118 [ 7.226265] [<c17a2759>] ? start_kernel+0x334/0x334 [ 7.231223] [<c14d7482>] kernel_thread_helper+0x6/0x10 Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
f7e1bec include/linux/aio.h: cpp->C conversions Convert init_sync_kiocb() from a nasty macro into a nice C function. The struct assignment trick takes care of zeroing all unmentioned fields. Shrinks fs/read_write.o's .text from 9857 bytes to 9714. Also demacroize is_sync_kiocb() and aio_ring_avail(). The latter fixes an arg-referenced-multiple-times hand grenade. Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
98c350c fs: cachefiles: add support for large files in filesystem caching Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
668f06b pps: return PTR_ERR on error in device_create We should return PTR_ERR if the call to the device_create function fails. Without this patch we instead return the value from a successful call to cdev_add if the call to device_create fails. Signed-off-by: Emil Goode <emilgoode@gmail.com> Acked-by: Devendra Naga <devendra.aaru@gmail.com> Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
25353b3 taskstats: check nla_reserve() return Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44621 Reported-by: <rucsoftsec@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
fd4b616 sysctl: suppress kmemleak messages register_sysctl_table() is a strange function, as it makes internal allocations (a header) to register a sysctl_table. This header is a handle to the table that is created, and can be used to unregister the table. But if the table is permanent and never unregistered, the header acts the same as a static variable. Unfortunately, this allocation of memory that is never expected to be freed fools kmemleak in thinking that we have leaked memory. For those sysctl tables that are never unregistered, and have no pointer referencing them, kmemleak will think that these are memory leaks: unreferenced object 0xffff880079fb9d40 (size 192): comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8146b590>] kmemleak_alloc+0x73/0x98 [<ffffffff8110a935>] kmemleak_alloc_recursive.constprop.42+0x16/0x18 [<ffffffff8110b852>] __kmalloc+0x107/0x153 [<ffffffff8116fa72>] kzalloc.constprop.8+0xe/0x10 [<ffffffff811703c9>] __register_sysctl_paths+0xe1/0x160 [<ffffffff81170463>] register_sysctl_paths+0x1b/0x1d [<ffffffff8117047d>] register_sysctl_table+0x18/0x1a [<ffffffff81afb0a1>] sysctl_init+0x10/0x14 [<ffffffff81b05a6f>] proc_sys_init+0x2f/0x31 [<ffffffff81b0584c>] proc_root_init+0xa5/0xa7 [<ffffffff81ae5b7e>] start_kernel+0x3d0/0x40a [<ffffffff81ae52a7>] x86_64_start_reservations+0xae/0xb2 [<ffffffff81ae53ad>] x86_64_start_kernel+0x102/0x111 [<ffffffffffffffff>] 0xffffffffffffffff The sysctl_base_table used by sysctl itself is one such instance that registers the table to never be unregistered. Use kmemleak_not_leak() to suppress the kmemleak false positive. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
c1d7e01 ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
05ba3f1 ipc: compat: use signed size_t types for msgsnd and msgrcv The msgsnd and msgrcv system calls use size_t to represent the size of the message being transferred. POSIX states that values of msgsz greater than SSIZE_MAX cause the result to be implementation-defined. On Linux, this equates to returning -EINVAL if (long) msgsz < 0. For compat tasks where !CONFIG_ARCH_WANT_OLD_COMPAT_IPC and compat_size_t is smaller than size_t, negative size values passed from userspace will be interpreted as positive values by do_msg{rcv,snd} and will fail to exit early with -EINVAL. This patch changes the compat prototypes for msg{rcv,snd} so that the message size is represented as a compat_ssize_t, which we cast to the native ssize_t type for the core IPC code. Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
b610c04 ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC Commit 48b25c43e6ee ("ipc: provide generic compat versions of IPC syscalls") added a new ARCH_WANT_OLD_COMPAT_IPC config option for architectures to select if their compat target requires the old IPC syscall interface. For architectures (such as AArch64) that do not require the internal calling conventions provided by this option, but have a compat target where the C library passes the IPC_64 flag explicitly, compat_ipc_parse_version no longer strips out the flag before calling the native system call implementation, resulting in unknown SHM/IPC commands and -EINVAL being returned to userspace. This patch separates the selection of the internal calling conventions for the IPC syscalls from the version parsing, allowing architectures to select __ARCH_WANT_COMPAT_IPC_PARSE_VERSION if they want to use version parsing whilst retaining the newer syscall calling conventions. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:21 UTC
079a96a ipc: add COMPAT_SHMLBA support If the SHMLBA definition for a native task differs from the definition for a compat task, the do_shmat() function would need to handle both. This patch introduces COMPAT_SHMLBA, which is used by the compat shmat syscall when calling the ipc code and allows architectures such as AArch64 (where the native SHMLBA is 64k but the compat (AArch32) definition is 16k) to provide the correct semantics for compat IPC system calls. Cc: David S. Miller <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
63dca8d kdump: append newline to the last lien of vmcoreinfo note The last line of vmcoreinfo note does not end with \n. Parsing all the lines in note becomes easier if all lines end with \n instead of trying to special case the last line. I know at least one tool, vmcore-dmesg in kexec-tools tree which made the assumption that all lines end with \n. I think it is a good idea to fix it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
f19b9f7 fork: fix error handling in dup_task() The function dup_task() may fail at the following function calls in the following order. 0) alloc_task_struct_node() 1) alloc_thread_info_node() 2) arch_dup_task_struct() Error by 0) is not a matter, it can just return. But error by 1) requires releasing task_struct allocated by 0) before it returns. Likewise, error by 2) requires releasing task_struct and thread_info allocated by 0) and 1). The existing error handling calls free_task_struct() and free_thread_info() which do not only release task_struct and thread_info, but also call architecture specific arch_release_task_struct() and arch_release_thread_info(). The problem is that task_struct and thread_info are not fully initialized yet at this point, but arch_release_task_struct() and arch_release_thread_info() are called with them. For example, x86 defines its own arch_release_task_struct() that releases a task_xstate. If alloc_thread_info_node() fails in dup_task(), arch_release_task_struct() is called with task_struct which is just allocated and filled with garbage in this error handling. This actually happened with tools/testing/fault-injection/failcmd.sh # env FAILCMD_TYPE=fail_page_alloc \ ./tools/testing/fault-injection/failcmd.sh --times=100 \ --min-order=0 --ignore-gfp-wait=0 \ -- make -C tools/testing/selftests/ run_tests In order to fix this issue, make free_{task_struct,thread_info}() not to call arch_release_{task_struct,thread_info}() and call arch_release_{task_struct,thread_info}() implicitly where needed. Default arch_release_task_struct() and arch_release_thread_info() are defined as empty by default. So this change only affects the architectures which implement their own arch_release_task_struct() or arch_release_thread_info() as listed below. arch_release_task_struct(): x86, sh arch_release_thread_info(): mn10300, tile Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Salman Qazi <sqazi@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
87bec58 revert "sched: Fix fork() error path to not crash" To make way for "fork: fix error handling in dup_task()", which fixes the errors more completely. Cc: Salman Qazi <sqazi@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
b2412b7 fork: use vma_pages() to simplify the code The current code can be replaced by vma_pages(). So use it to simplify the code. [akpm@linux-foundation.org: initialise `len' at its definition site] Signed-off-by: Huang Shijie <shijie8@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
bc452b4 proc: do not allow negative offsets on /proc/<pid>/environ __mem_open() which is called by both /proc/<pid>/environ and /proc/<pid>/mem ->open() handlers will allow the use of negative offsets. /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ. Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open() to allow negative offsets only on /proc/<pid>/mem. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
e8905ec proc: environ_read() make sure offset points to environment address range Currently the following offset and environment address range check in environ_read() of /proc/<pid>/environ is buggy: int this_len = mm->env_end - (mm->env_start + src); if (this_len <= 0) break; Large or negative offsets on /proc/<pid>/environ converted to 'unsigned long' may pass this check since '(mm->env_start + src)' can overflow and 'this_len' will be positive. This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since (mm->env_start + src) will point and read from another VMA. There are two fixes here plus some code cleaning: 1) Fix the overflow by checking if the offset that was converted to unsigned long will always point to the [mm->env_start, mm->env_end] address range. 2) Remove the truncation that was made to the result of the check, storing the result in 'int this_len' will alter its value and we can not depend on it. For kernels that have commit b409e578d ("proc: clean up /proc/<pid>/environ handling") which adds the appropriate ptrace check and saves the 'mm' at ->open() time, this is not a security issue. This patch is taken from the grsecurity patch since it was just made available. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
108ceeb coredump: fix wrong comments on core limits of pipe coredump case In commit 898b374af6f7 ("exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit"), the core limits recursive check value was changed from 0 to 1, but the corresponding comments were not updated. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
0f20784 kmod: avoid deadlock from recursive kmod call The system deadlocks (at least since 2.6.10) when call_usermodehelper(UMH_WAIT_EXEC) request triggers call_usermodehelper(UMH_WAIT_PROC) request. This is because "khelper thread is waiting for the worker thread at wait_for_completion() in do_fork() since the worker thread was created with CLONE_VFORK flag" and "the worker thread cannot call complete() because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper thread cannot start processing UMH_WAIT_PROC request because the khelper thread is waiting for the worker thread at wait_for_completion() in do_fork()". The easiest example to observe this deadlock is to use a corrupted /sbin/hotplug binary (like shown below). # : > /tmp/dummy # chmod 755 /tmp/dummy # echo /tmp/dummy > /proc/sys/kernel/hotplug # modprobe whatever call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a module. do_execve("/tmp/dummy") triggers a call to request_module("binfmt-0000") from search_binary_handler() which in turn calls call_usermodehelper(UMH_WAIT_PROC). In order to avoid deadlock, as a for-now and easy-to-backport solution, do not try to call wait_for_completion() in call_usermodehelper_exec() if the worker thread was created by khelper thread with CLONE_VFORK flag. Future and fundamental solution might be replacing singleton khelper thread with some workqueue so that recursive calls up to max_active dependency loop can be handled without deadlock. [akpm@linux-foundation.org: add comment to kmod_thread_locker] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
79c743d kernel/kmod.c: document call_usermodehelper_fns() a bit This function's interface is, uh, subtle. Attempt to apologise for it. Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
deb8274 fat: refactor shortname parsing Nearly identical shortname parsing is performed in fat_search_long() and __fat_readdir(). Extract this code into a function that may be called by both. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 31 July 2012, 00:25:20 UTC
back to top