Revision 9e0af23764344f7f1b68e4eefbe7dc865018b63d authored by Liu Bo on 15 August 2014, 15:36:53 UTC, committed by Chris Mason on 24 August 2014, 14:17:02 UTC
This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() <--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happens
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker->current_work = arg;  <-- arg is A->normal_work
	worker->current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A->func()
		     A->ordered_func()
		     A->ordered_free()  <-- A gets freed

		     B->ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   <-- (the above readhead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B->ordered_free()

As if work A has a high priority in wq->ordered_list and there are more ordered
works queued after it, such as B->ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker->current_work pointer to be work
A->normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C->normal_work
and work A->normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C->normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work->func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
1 parent f6dc45c
Raw File
.mailmap
#
# This list is used by git-shortlog to fix a few botched name translations
# in the git archive, either because the author's full name was messed up
# and/or not always written the same way, making contributions from the
# same person appearing not to be so or badly displayed.
#
# repo-abbrev: /pub/scm/linux/kernel/git/
#

Aaron Durbin <adurbin@google.com>
Adam Oldham <oldhamca@gmail.com>
Adam Radford <aradford@gmail.com>
Adrian Bunk <bunk@stusta.de>
Alan Cox <alan@lxorguk.ukuu.org.uk>
Alan Cox <root@hraefn.swansea.linux.org.uk>
Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Al Viro <viro@ftp.linux.org.uk>
Al Viro <viro@zenIV.linux.org.uk>
Andreas Herrmann <aherrman@de.ibm.com>
Andrew Morton <akpm@osdl.org>
Andrew Vasquez <andrew.vasquez@qlogic.com>
Andy Adamson <andros@citi.umich.edu>
Archit Taneja <archit@ti.com>
Arnaud Patard <arnaud.patard@rtp-net.org>
Arnd Bergmann <arnd@arndb.de>
Axel Dyks <xl@xlsigned.net>
Axel Lin <axel.lin@gmail.com>
Ben Gardner <bgardner@wabtec.com>
Ben M Cahill <ben.m.cahill@intel.com>
Björn Steinbrink <B.Steinbrink@gmx.de>
Brian Avery <b.avery@hp.com>
Brian King <brking@us.ibm.com>
Christoph Hellwig <hch@lst.de>
Corey Minyard <minyard@acm.org>
Damian Hobson-Garcia <dhobsong@igel.co.jp>
David Brownell <david-b@pacbell.net>
David Woodhouse <dwmw2@shinybook.infradead.org>
Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Domen Puncer <domen@coderock.org>
Douglas Gilbert <dougg@torque.net>
Ed L. Cashin <ecashin@coraid.com>
Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Felipe W Damasio <felipewd@terra.com.br>
Felix Kuhling <fxkuehl@gmx.de>
Felix Moeller <felix@derklecks.de>
Filipe Lautert <filipe@icewall.org>
Franck Bui-Huu <vagabon.xyz@gmail.com>
Frank Zago <fzago@systemfabricworks.com>
Greg Kroah-Hartman <greg@echidna.(none)>
Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman <greg@kroah.com>
Henk Vergonet <Henk.Vergonet@gmail.com>
Henrik Kretzschmar <henne@nachtwindheim.de>
Herbert Xu <herbert@gondor.apana.org.au>
Jacob Shin <Jacob.Shin@amd.com>
James Bottomley <jejb@mulgrave.(none)>
James Bottomley <jejb@titanic.il.steeleye.com>
James E Wilson <wilson@specifix.com>
James Ketrenos <jketreno@io.(none)>
Jean Tourrilhes <jt@hpl.hp.com>
Jeff Garzik <jgarzik@pretzel.yyz.us>
Jens Axboe <axboe@suse.de>
Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
John Stultz <johnstul@us.ibm.com>
<josh@joshtriplett.org> <josh@freedesktop.org>
<josh@joshtriplett.org> <josh@kernel.org>
<josh@joshtriplett.org> <josht@linux.vnet.ibm.com>
<josh@joshtriplett.org> <josht@us.ibm.com>
<josh@joshtriplett.org> <josht@vnet.ibm.com>
Juha Yrjola <at solidboot.com>
Juha Yrjola <juha.yrjola@nokia.com>
Juha Yrjola <juha.yrjola@solidboot.com>
Kay Sievers <kay.sievers@vrfy.org>
Kenneth W Chen <kenneth.w.chen@intel.com>
Koushik <raghavendra.koushik@neterion.com>
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Leonid I Ananiev <leonid.i.ananiev@intel.com>
Linas Vepstas <linas@austin.ibm.com>
Mark Brown <broonie@sirena.org.uk>
Matthieu CASTET <castet.matthieu@free.fr>
Mayuresh Janorkar <mayur@ti.com>
Michael Buesch <m@bues.ch>
Michel Dänzer <michel@tungstengraphics.com>
Mitesh shah <mshah@teja.com>
Morten Welinder <terra@gnome.org>
Morten Welinder <welinder@anemone.rentec.com>
Morten Welinder <welinder@darter.rentec.com>
Morten Welinder <welinder@troll.com>
Mythri P K <mythripk@ti.com>
Nguyen Anh Quynh <aquynh@gmail.com>
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Patrick Mochel <mochel@digitalimplant.org>
Peter A Jonsson <pj@ludd.ltu.se>
Peter Oruba <peter@oruba.de>
Peter Oruba <peter.oruba@amd.com>
Praveen BP <praveenbp@ti.com>
Rajesh Shah <rajesh.shah@intel.com>
Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Rémi Denis-Courmont <rdenis@simphalempin.com>
Rudolf Marek <R.Marek@sh.cvut.cz>
Rui Saraiva <rmps@joel.ist.utl.pt>
Sachin P Sant <ssant@in.ibm.com>
Sam Ravnborg <sam@mars.ravnborg.org>
Sascha Hauer <s.hauer@pengutronix.de>
S.Çağlar Onur <caglar@pardus.org.tr>
Shiraz Hashim <shiraz.linux.kernel@gmail.com> <shiraz.hashim@st.com>
Simon Kelley <simon@thekelleys.org.uk>
Stéphane Witzmann <stephane.witzmann@ubpmes.univ-bpclermont.fr>
Stephen Hemminger <shemminger@osdl.org>
Sumit Semwal <sumit.semwal@ti.com>
Tejun Heo <htejun@gmail.com>
Thomas Graf <tgraf@suug.ch>
Tony Luck <tony.luck@intel.com>
Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Uwe Kleine-König <ukl@pengutronix.de>
Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Viresh Kumar <viresh.linux@gmail.com> <viresh.kumar@st.com>
Takashi YOSHII <takashi.yoshii.zj@renesas.com>
Yusuke Goda <goda.yusuke@renesas.com>
Gustavo Padovan <gustavo@las.ic.unicamp.br>
Gustavo Padovan <padovan@profusion.mobi>
back to top