directory listing hangs in "ufs" state
Jeremy Chadwick
freebsd at jdc.parodius.com
Wed Dec 14 18:22:57 UTC 2011
On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:
> Hi Jeremy,
>
> This is not a hardware problem, I've already checked that. I also ran
> fsck today and got no errors.
>
> After some more exploration of how mongodb works, I found that when
> the listing hangs, one of the mongodb threads is in "biowr" state for
> a long time. It periodically calls msync(MS_SYNC), according to the
> ktrace output.
>
> If I remove the msync() calls from mongodb, how often will the data
> be synced by the OS?
>
> --
> Andrey Zonov
>
> On 14.12.2011 2:15, Jeremy Chadwick wrote:
> >On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:
> >>
> >>Do you have any idea what is going on, or how to catch the problem?
> >
> >Assuming this isn't a file on the root filesystem, try booting the
> >machine in single-user mode and using "fsck -f" on the filesystem in
> >question.
> >
> >Can you verify there's no problems with the disk this file lives on as
> >well (smartctl -a /dev/disk)? I'm doubting this is the problem, but
> >thought I'd mention it.
I have no real answer, I'm sorry. The msync(2) man page indicates the
call is effectively deprecated (see BUGS). It looks like it is
effectively an mmap version of fsync(2).
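
To make the fsync(2) comparison concrete, here is a minimal sketch of the two ways a program can push a change out to a file. It is written in Python only for brevity and is not how mongodb does it. The function names and the scratch file are hypothetical, and it assumes that Python's mmap.flush() is carried out with msync(2) on Unix, which is my understanding of CPython but not something verified here.

```python
import mmap
import os
import tempfile


def update_with_write(path, payload):
    """Classic path: write into the file, then fsync(2) the descriptor."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, payload, 0)   # data lands in the buffer cache
        os.fsync(fd)                # block until it has been pushed to disk
    finally:
        os.close(fd)


def update_with_mmap(path, payload):
    """mmap path: store into the shared mapping, then flush it."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        m[:len(payload)] = payload  # only dirties the page in memory
        m.flush()                   # assumed to call msync(2) on Unix


# Hypothetical scratch file standing in for a database data file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)
os.close(fd)

update_with_write(path, b"fsync")
with open(path, "rb") as f:
    after_write = f.read(5)

update_with_mmap(path, b"msync")
with open(path, "rb") as f:
    after_mmap = f.read(5)

os.remove(path)
```

Reading the file back only shows that both paths change the same file. With a coherent buffer cache the read would succeed even without the flush, so this does not prove the data reached the disk.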
I'm extremely confused by this problem. What you're describing above is
that the process is "stuck in biowr state for a long time", but what you
stated originally was that the process was "stuck in ufs state for a
few minutes":
> I've got STABLE-8 (r221983) with mongodb-1.8.1 installed on it. A
> couple of days ago I observed that a listing of the mongodb directory
> was stuck for a few minutes in "ufs" state.
Can we narrow down what we're talking about here? Does the process
actually deadlock? Or are you concerned about performance implications?
I know nothing about this "mongodb" software, but the reason it's
calling msync() is that it wants to ensure the data it changed in an
mmap()-mapped page is reflected (fully written) on the disk. This
behaviour is fairly common within database software, but "how often"
the software chooses to do this is entirely a design and
implementation choice by the authors.
Meaning: if mongodb is either 1) continually calling msync(), or 2)
waiting for too long a period of time before calling msync(),
performance within the process will suffer. #1 could result in overall
bad performance, while #2 could result in a process that's spending a
lot of time doing I/O (flushing to disk) and therefore appears
"deadlocked" when in fact the kernel/subsystems are doing exactly what
they were told to do.
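
One rough way to get a feel for case #2 is to time a single flush after dirtying different numbers of pages. The sketch below does that against a scratch file. It is in Python for brevity, time_flush and its parameters are hypothetical names, and it again assumes mmap.flush() maps onto msync(2) on Unix. The numbers depend entirely on the disk, the filesystem and the system load, so treat them as illustrative only.

```python
import mmap
import os
import tempfile
import time


def time_flush(num_pages, dirty_every):
    """Dirty every `dirty_every`-th page of a scratch mapping, time one flush."""
    size = num_pages * mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)                  # scratch file of the wanted size
        with mmap.mmap(fd, size) as m:
            for page in range(0, num_pages, dirty_every):
                m[page * mmap.PAGESIZE] = 1     # touch one byte to dirty the page
            start = time.monotonic()
            m.flush()                           # assumed to be msync(2) on Unix
            return time.monotonic() - start
    finally:
        os.close(fd)
        os.remove(path)


few = time_flush(num_pages=256, dirty_every=64)   # 4 dirty pages
many = time_flush(num_pages=256, dirty_every=1)   # 256 dirty pages
print(f"4 dirty pages: {few:.6f}s, 256 dirty pages: {many:.6f}s")
```

If the real process shows long waits in msync() under ktrace, comparing how much data is dirty between calls would be the next thing to look at.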
Removing the msync() call could result in inconsistent data (possibly
non-recoverable) if the mongodb software crashes or if some other piece
(thread or child? Not sure) expects to open a new fd on that file which
has mmap()'d data.
This is about all I know. I would love to be able to tell you "consider
a different database" but that seems like an excuse rather than an
actual solution. I guess if all you're seeing is the process "stall"
for long periods of time, but recover normally, then I would open up a
support ticket with the mongodb folks to discuss performance.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |