thr3ads.net - Btrfs devel - Is the checkpoint interval adjustable? [Jul 2013]

Mike Audia

2013-Jul-31 20:02 UTC

Is the checkpoint interval adjustable?

I believe 30 sec is the default for the checkpoint interval.  Is this
adjustable?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zach Brown

2013-Jul-31 20:54 UTC

Re: Is the checkpoint interval adjustable?

On Wed, Jul 31, 2013 at 04:02:29PM -0400, Mike Audia wrote:
> I believe 30 sec is the default for the checkpoint interval.  Is this
> adjustable?

It doesn't look like it.  It looks like it's implemented with raw '30's
in the code.

                delay = HZ * 30;
...
                    (now < cur->start_time || now - cur->start_time < 30)) {

If you want more frequent forced commits you could always syncfs()
regularly from userspace, I suppose.
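
A minimal userspace sketch of the syncfs() suggestion, assuming Linux 2.6.39 or later with glibc 2.14 or later. The helper names sync_filesystem() and sync_periodically() are hypothetical and exist only for illustration; only open(), syncfs(), close() and sleep() are real library calls.

```c
#define _GNU_SOURCE   /* syncfs() is a Linux-specific GNU extension */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush the filesystem that contains `path` (hypothetical helper).
 * Returns 0 on success, -1 on failure with errno set by open() or syncfs().
 */
int sync_filesystem(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = syncfs(fd);     /* syncs only the filesystem behind fd */
    int saved_errno = errno;  /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return ret;
}

/*
 * Hypothetical helper: sync `rounds` times, sleeping `interval_sec`
 * seconds between rounds. Returns the number of successful syncs.
 */
int sync_periodically(const char *path, unsigned interval_sec, int rounds)
{
    int ok = 0;
    for (int i = 0; i < rounds; i++) {
        if (sync_filesystem(path) == 0)
            ok++;
        if (i + 1 < rounds)
            sleep(interval_sec);
    }
    return ok;
}
```

Calling sync_periodically("/mnt/data", 10, 6) would request a sync of that (assumed) mount point every 10 seconds for about a minute. This only makes commits more frequent than the kernel's interval; it cannot make them less frequent.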

- z

Mike Audia

2013-Jul-31 22:10 UTC

Re: Is the checkpoint interval adjustable?

> On Wed, Jul 31, 2013 at 04:02:29PM -0400, Mike Audia wrote:
> > I believe 30 sec is the default for the checkpoint interval.  Is this
> > adjustable?
>
> It doesn't look like it.  It looks like it's implemented with raw '30's
> in the code.
>
>  delay = HZ * 30;
> ...
>  (now < cur->start_time || now - cur->start_time < 30)) {
>
> If you want more frequent forced commits you could always syncfs()
> regularly from userspace, I suppose.

Thank you kindly for the prompt reply.  My goal is to make them _less_
frequent.  I am NO programmer by any stretch.  Let's say I want them to be
once every 5 min (300 sec).  Is the attached patch sane to achieve this?
Are there any unforeseen effects of doing this?  Thank you for the
consideration.

[Attachment: 10_minute_checkpoints.patch]

--- a/fs/btrfs/disk-io.c	2013-07-31 18:05:22.581062955 -0400
+++ b/fs/btrfs/disk-io.c	2013-07-31 18:06:15.243201652 -0400
@@ -1713,7 +1713,7 @@
 
 	do {
 		cannot_commit = false;
-		delay = HZ * 30;
+		delay = HZ * 300;
 		mutex_lock(&root->fs_info->transaction_kthread_mutex);
 
 		spin_lock(&root->fs_info->trans_lock);
@@ -1725,7 +1725,7 @@
 
 		now = get_seconds();
 		if (!cur->blocked &&
-		    (now < cur->start_time || now - cur->start_time < 30)) {
+		    (now < cur->start_time || now - cur->start_time < 300)) {
 			spin_unlock(&root->fs_info->trans_lock);
 			delay = HZ * 5;
 			goto sleep;
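
The two constants touched by the patch work together, which a small userspace model may make clearer. This is only a sketch of the check visible in the quoted hunk, not the kernel code: decide_commit() and struct commit_decision are hypothetical names, `interval` stands in for the hardcoded 30 (or 300), and the model assumes a running transaction already exists.

```c
#include <stdbool.h>

/* Outcome of one pass of the modelled transaction kthread loop. */
struct commit_decision {
    bool commit_now;        /* true: commit the running transaction */
    unsigned sleep_seconds; /* how long to sleep before the next pass */
};

/*
 * Userspace model of the check in the quoted disk-io.c hunk.
 * Times are in seconds, like get_seconds() in the original.
 */
struct commit_decision decide_commit(unsigned long now,
                                     unsigned long start_time,
                                     bool blocked,
                                     unsigned interval)
{
    /* Default path: commit, then sleep for the full interval. */
    struct commit_decision d = { .commit_now = true, .sleep_seconds = interval };

    /* Young, unblocked transaction: skip the commit, re-check in 5 s. */
    if (!blocked && (now < start_time || now - start_time < interval)) {
        d.commit_now = false;
        d.sleep_seconds = 5;
    }
    return d;
}
```

Under this model, changing only one of the two constants would leave the other as the effective limit, which is why the patch edits both lines.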


Zach Brown

2013-Jul-31 22:56 UTC

Re: Is the checkpoint interval adjustable?

> Thank you kindly for the prompt reply.  My goal is to make them _less_
> frequent.

I assumed as much.  I should have added some sympathy smileys :).

>  I am NO programmer by any stretch.  Let's say I want them to be once
> every 5 min (300 sec).  Is the attached patch sane to achieve this?

I think it's a reasonable patch to try, yeah.

>  Are there any unforeseen effects of doing this?  Thank you for
> the consideration.

I don't *think* that there should be.  One way of looking at it is that
both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
Any trouble that you could get into in 300 seconds some other machine
could trivially get into in 30 with beefier hardware.

But I reserve the right to be wrong.

- z

Duncan

2013-Aug-01 03:11 UTC

Re: Is the checkpoint interval adjustable?

Zach Brown posted on Wed, 31 Jul 2013 15:56:40 -0700 as excerpted:

[Mike Audia wrote...]
>>  I am NO programmer by any stretch.  Let's say I want them to be once
>> every 5 min (300 sec).  Is the attached patch sane to achieve this?
>>  Are there any unforeseen effects of doing this?
>
> I don't *think* that there should be.  One way of looking at it is that
> both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> Any trouble that you could get into in 300 seconds some other machine
> could trivially get into in 30 with beefier hardware.

As a sysadmin (not a programmer) that has messed around with, for
example, vm.dirty_bytes/ratio, vm.dirty_writeback_centisecs, etc, the
concern I'd have is that longer commit periods and larger commit buffers
increase the possibility of writeback storms.  While I've not tweaked
btrfs and I probably need to reexamine my current settings since I've
switched to SSD and btrfs, for spinning rust and reiserfs, I ended up
tweaking vm.dirty_* here.

The files are /proc/sys/vm/* and the kernel documentation for them is in
Documentation/sysctl/vm.txt.  Most distros have an initscript that writes
any custom values at boot, using values set in /etc/sysctl.conf and/or
/etc/sysctl.d/*, so that's where you'd normally set them once you've
settled on values that work for you.

The following are the defaults and what I settled on for a wall-powered
system.

vm.dirty_ratio defaults to 10 (percent of RAM). I've read and agree with
opinions that 10% of RAM when RAM is say half a gig (so 10% is ~50 MB)
isn't too bad on spinning rust, but it can be MUCH worse when RAM is say
my current 16 gig (so 10% is ~1.6 gig), as that's several seconds of
writeback on spinning rust.  I reset that to 3% (~half a gig), here.

vm.dirty_background_ratio similarly, 5 (% of RAM) by default, reset to 1 
(~160 MB).

(The vm.dirty_(background_)bytes knobs parallel the above "ratio" knobs
and may be easier to set for those thinking in terms of writeback backlog
size and corresponding system responsiveness or lack thereof during that
writeback, instead of percentage of memory dirty.  Set one set or the
other.)
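
The ratio-to-size arithmetic above can be checked with a few lines of C. This is a rough sketch: dirty_ratio_to_bytes() is a hypothetical helper, and it treats the ratio as a share of total RAM as the post does, whereas the kernel documentation describes it as a percentage of available (free plus reclaimable) memory, so real thresholds will be somewhat lower.

```c
#include <stdint.h>

/*
 * Bytes corresponding to `percent` of `ram_bytes`, rounded down.
 * Simplification: uses total RAM, not the kernel's "available memory".
 */
uint64_t dirty_ratio_to_bytes(uint64_t ram_bytes, unsigned percent)
{
    return ram_bytes * percent / 100;
}
```

For 16 GiB of RAM this gives about 1.6 GiB at 10%, about 0.48 GiB at 3% and about 0.16 GiB at 1%, matching the figures quoted. The result is also the kind of value one would write to vm.dirty_bytes or vm.dirty_background_bytes instead of using the ratio knobs.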

OTOH, vm.dirty_expire_centisecs defaults to 2999 (30 seconds, this is the 
high priority foreground value and might well be the reason btrfs is 
coded for a 30 second commit time as well) and 
vm.dirty_writeback_centisecs defaults to 499 (5 seconds, this is the 
lower priority background value).  I left expire where it was, but 
decided with the stricter ratio settings, writeback could be 10 seconds, 
doubling the background writeback time.


Before tuning btrfs' hardcoded defaults, I'd suggest tuning these values
if you haven't already done so, and keeping them in mind if you do decide
to tune btrfs as well.

For battery powered systems, also take a look at laptop mode (and
laptop-mode-tools), which I use here on my laptop (which I don't have at
hand to check what I set for vm.dirty_* on it).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


David Sterba

2013-Aug-01 15:40 UTC

Re: Is the checkpoint interval adjustable?

On Wed, Jul 31, 2013 at 03:56:40PM -0700, Zach Brown wrote:
> >  I am NO programmer by any stretch.  Let's say I want them to be once
> > every 5 min (300 sec).  Is the attached patch sane to achieve this?
>
> I think it's a reasonable patch to try, yeah.

There were a few requests to tune the interval. This finally made me
finish the patch, and I will send it in a second.

> >  Are there any unforeseen effects of doing this?  Thank you for
> > the consideration.
>
> I don't *think* that there should be.  One way of looking at it is that
> both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> Any trouble that you could get into in 300 seconds some other machine
> could trivially get into in 30 with beefier hardware.

That's a good point and lowers my worries a bit, though it would be
interesting to see in what way a beefy machine blows up with 300 seconds
set.

david

Zach Brown

2013-Aug-01 17:59 UTC

Re: Is the checkpoint interval adjustable?

> There were a few requests to tune the interval. This finally made me
> finish the patch, and I will send it in a second.

Great, thanks.

> That's a good point and lowers my worries a bit, though it would be
> interesting to see in what way a beefy machine blows up with 300 seconds
> set.

Agreed.  Ideally the transaction machinery decides at some point that a
transaction is sufficiently huge that it'll saturate the storage
pipeline and kicks it off.

- z

Mike Audia

2013-Aug-02 20:58 UTC

Re: Is the checkpoint interval adjustable?

> From: David Sterba
> There were a few requests to tune the interval. This finally made me
> finish the patch, and I will send it in a second.

Thank you, David, and the others who kindly replied to my post.  I will try
your patch rather than modifying the code.

> > >  Are there any unforeseen effects of doing this?  Thank you for
> > > the consideration.
> >
> > I don't *think* that there should be. One way of looking at it is that
> > both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> > Any trouble that you could get into in 300 seconds some other machine
> > could trivially get into in 30 with beefier hardware.
>
> That's a good point and lowers my worries a bit, though it would be
> interesting to see in what way a beefy machine blows up with 300 seconds
> set.

I have my system booting to a BTRFS root partition.  Let's say I'm using a
value of 300 for my checkpoint interval.  Suppose I do a TON of filesystem
writes (say I update my system, which pulls down a bunch of system file
updates, and I copy over several gigs of data from a backup), all _between_
checkpoints, and for some reason my system freezes, forcing me to
ungracefully restart.  Is EVERYTHING since the last checkpoint lost?  Upon a
reboot, will BTRFS just mount up to the last good checkpoint automatically,
or will I have a broken system and need to add the `-o recovery` option
while I mount it manually from a chroot?

Another naive question: if I shut down the system between checkpoints,
systemd should umount my partitions.  Does the syncing of cached data occur
after the graceful umount?

Duncan

2013-Aug-03 08:33 UTC

Re: Is the checkpoint interval adjustable?

Mike Audia posted on Fri, 02 Aug 2013 16:58:42 -0400 as excerpted:

>> From: David Sterba
>> There were a few requests to tune the interval. This finally made me
>> finish the patch, and I will send it in a second.
>
> Thank you, David and to others who kindly replied to my post.  I will
> try your patch rather than modifying the code
>
>> > >  Are there any unforeseen effects of doing this?  Thank you for
>> > > the consideration.
>> >
>> > I don't *think* that there should be. One way of looking at it is
>> > that both 30 and 300 seconds are an *eternity* for cpu, memory, and
>> > storage.
>> > Any trouble that you could get into in 300 seconds some other
>> > machine could trivially get into in 30 with beefier hardware.
>>
>> That's a good point and lowers my worries a bit, though it would be
>> interesting to see in what way a beefy machine blows up with 300 seconds
>> set.
>
> I have my system booting to a BTRFS root partition.  Let's say I'm using
> a value of 300 for my checkpoint interval.  Does this mean that if I do
> a TON of filesystem writes (say I update my system which pulls down a
> bunch of system file updates for example), and I copy over several gigs
> of data from a backup, all _between_ checkpoints and for some reason, my
> system freezes forcing me to ungracefully restart... is EVERYTHING since
> the last checkpoint lost?

When I tried btrfs on faulty hardware a bit over a year ago, yes.  And
yes, that's the way a btree filesystem such as btrfs generally works,
too, because when a change happens, it recurses up the tree until finally
the master node is updated.  Until the master node is updated, the old
master node remains effective.

During the time between the first change and the master node update,
additional changes may occur, making the final master node update and
likely several below it more "efficient", since that single write now
covers more than a single change.  However, if the system goes belly up in
the meantime, that means you lose everything since the last master node
update.
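
The master-node behaviour described above can be pictured with a deliberately tiny toy model. It is a counter, not a filesystem: struct toy_fs and the toy_*() functions are hypothetical names invented for this sketch, and nothing here reflects the actual btrfs on-disk format.

```c
/*
 * Toy model of a copy-on-write commit. `committed` stands in for what
 * the on-disk master node points at; `working` for the in-memory tree
 * that collects changes between commits.
 */
struct toy_fs {
    int committed;  /* number of changes reachable after a crash */
    int working;    /* number of changes made so far */
};

/* Record one change: only the working tree advances. */
void toy_write(struct toy_fs *fs)
{
    fs->working++;
}

/* Commit: the master node now covers everything written so far. */
void toy_commit(struct toy_fs *fs)
{
    fs->committed = fs->working;
}

/* Crash and remount: uncommitted changes vanish. Returns how many were lost. */
int toy_crash(struct toy_fs *fs)
{
    int lost = fs->working - fs->committed;
    fs->working = fs->committed;
    return lost;
}
```

Two writes, a commit, three more writes and then a crash leave the model at two changes with three lost. A longer commit interval simply allows more writes to pile up between toy_commit() calls.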

Here's my experience from last year.  I had some failing hardware, which
turned out to be the mobo, but before I ultimately figured out the
problem, I thought it was the disks.  Thus, I bought a new one and
attempted to replace what I thought was a failing one, copying everything
over, and thinking I'd try the new-to-me btrfs while I was at it.

But what was really happening hardware-wise was that my then 8-year-old
mobo had some capacitors going bad (I found several bulging and others
burst when I finally figured out it was the mobo).  That was triggering
intermittent I/O errors that I had (wrongly) attributed to the disks
dying, thus the replacement attempt.  The symptom was SATA retries,
downgrading the speed and retrying again, and eventually timing out and
resetting the SATA interface.  Only sometimes the whole system would
lock up before a successful reset, or it would time out and reset enough
times that I'd give up and do a full system reset.  The one thing I /did/
notice was that if I kept things cold enough (by the time I was done I
had the AC turned down so far I was sitting here in a full winter jacket,
long underwear, and a knit hat... in a Phoenix summer with temps of
40-45C/100-115F outside!!), the system would work better, so that's what
I was trying to do.

It was in this environment that I was attempting to copy all my old data
from what I /thought/ was a failing disc drive (or drives, I was running
md/raid1 for most of the system), initially blaming the copy failures on
what I thought was the failing drive(s), until I had enough data on the
new drive to try disconnecting the old drives and copying data around on
the new drive.  When that acted up with the old drives entirely
disconnected, I realized it wasn't the drives after all, and eventually
found the problem.

But meanwhile, when I'd have to reset, what I'd find is that on btrfs,
the whole tree I had been trying to copy over, and that I /thought/ had
mostly copied fine, was gone.

Or worse, part of the metadata had copied, the filesystem tree or at
least part of it, and was still there after a reboot, but all or most of
the files were zeroed out!!  At least if nothing at all copied I knew
right away where I was at.  With the zeroed out files, I'd have to figure
out how much actual data had copied and remained on the new drive, and
where it had gone from saving everything to only saving the metadata,
with the actual files zeroed out.  Then I could delete them and try again.

My previous filesystem (and the one I returned to for a year after I gave
up on btrfs for the time being, I'm back on btrfs, with new SSDs, now)
was reiserfs.  It has actually been *IMPRESSIVELY* reliable for me, even
thru various hardware failures, at least since the reiserfs data=ordered
by default mode was introduced back in kernel 2.6.6 or some such.  (As it
turns out, it was the same Chris Mason working on that after Hans Reiser
and Namesys basically abandoned reiserfs in favor of working on reiser4,
that's behind btrfs now, so he knows his filesystems!)

What I found is that with properly tuned vm.dirty_* as explained in my
earlier post, or with repeatedly hitting the magic-SRQ emergency sync
hotkey (alt-srq-s), reiserfs had a chance to lose *ONLY* the data that
hadn't yet been synced, while btrfs would tend to either entirely lose or
zero out the files for entire freshly copied trees, since the start of
the copy operation, EVEN WHEN I HAD BEEN EMERGENCY SYNCING EVERY FEW
SECONDS AS THE COPY PROGRESSED!!

Obviously, then, btrfs simply wasn't going to work with my at the time
bad hardware, certainly not for the massive data transfers I was trying
to do, since with btrfs after a crash I had lost pretty much all of the
current copy I had been doing, while reiserfs would reliably actually
sync when I hit the emergency sync sequence, so after a crash I'd lose
ONLY the few files since the last sync a few seconds before.  Thus, even
with failing hardware I could make reasonable progress with the copy when
the destination was reiserfs, where with btrfs, in most cases I was back
at square one, as if I'd never done that copy at all, or worse yet, with
a bunch of zeroed out files where the metadata was retained but not the
actual data.

So I switched back to reiserfs for a year, and only tried btrfs again
from a couple months ago now, when I upgraded to SSD and thus had both
brand new and much faster hardware to work with, AND needed a filesystem
more suited to SSD than reiserfs.  (I had found reiserfs MUCH better and
more robust for my needs than ext3/4 back on spinning rust, and didn't
really want to go ext4 on SSD either, tho I probably would have without
btrfs where it is now.)

In that year btrfs has GREATLY matured as well, and to be honest I'm not
sure whether it's btrfs' increased maturity and stability over that year or
the fact that I'm on actually GOOD hardware now that makes the btrfs
experience so much better for me now, but regardless, btrfs IS still
experimental, and even when that label comes off, I expect it'll take
quite some time to reach the stability of current reiserfs, just as it
took time for reiserfs to reach that.  But of course btrfs is far more
flexible than reiserfs as well, both SSD-wise and in general.  Still, in
a crash and DEFINITELY in the failing hardware scenario, I'd definitely
put a lot more trust in reiserfs than in btrfs, and I expect it'll be
that way for some time to come.

> Upon a reboot, will BTRFS just mount up to
> the last good checkpoint automatically or will I have a broken system
> and need to add the `-o recovery` option while I mount it manually from a
> chroot?

In general btrfs should simply mount the last checkpoint automatically.
And with a recently created filesystem I think it'll do pretty well at
that.  However, btrfs IS still experimental, and particularly with older
filesystems that have had a lot of use before some of the recent bugfixes,
that's not always a given, and recovery (or restore from backups, which
given btrfs' experimental status, are even MORE important than they'd be
on a filesystem considered stable, basically, consider all your data on
btrfs as "throw away" if it comes to it -- keep your primary copy as well
as its backups on something other than btrfs at least until that
experimental label comes off) is occasionally necessary.

> Another naive question: if I shutdown the system between checkpoints,
> systemd should umount my partitions.  Does the syncing of cached data
> occur after the graceful umount?

As is normally the case on Linux, once the graceful umounts (or
remount-read-onlys) have fully happened, you should be good to go.  Syncing
should be completed before the umount (or remount read-only) is
completed, so you should be safe after that.

There have however been a few bugs, to my knowledge now all fixed, where
the umount wouldn't complete (livelock), and even a few where it would
appear to complete but the filesystem was continuing to do stuff in the
background, such that shutting it down before that was complete would
result in corruption.  AFAIK a shutdown after initiating a btrfs balance,
before it completed, used to be one such situation, but btrfs now
properly suspends the balance and quiesces the filesystem before umount,
resuming it at remount read/write.  On a slow multi-terabyte "spinning
rust" filesystem, a full balance can take quite some time (hours to tens
of hours), so not being able to properly gracefully suspend that balance
for umount and resume after a remount was a BIG problem, now fixed, as I
said.  But the caveat about btrfs' experimental/developmental status
remains -- there ARE still bugs being found and fixed; choose a fully
stable filesystem, not the still experimental btrfs, if you're not
willing to keep backups and consider everything you put on btrfs for the
time being subject to potential loss if the worst should happen.

And of course as the wiki[1] recommends, if you do choose to run btrfs,
keep current on your kernels, as they really ARE fixing bugs in real
time, and if you're running a kernel older than the latest Linus stable
series, you *ARE* going to be missing bugfixes that just /might/ save you
from serious btrfs problems.

---
[1] Btrfs wiki: https://btrfs.wiki.kernel.org/

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Kai Krakow

2013-Aug-03 17:28 UTC

Re: Is the checkpoint interval adjustable?

Mike Audia <mikey_a@gmx.com> wrote:
> I believe 30 sec is the default for the checkpoint interval.  Is this
> adjustable?

Just curious: What would be the benefit of increasing the checkpoint
interval?

As far as I understand, it would not decrease write load on the drives,
because a checkpoint will only update a few pointers and probably increase
the generation number...

Greetings,
Kai


Torbjørn

2013-Aug-03 17:37 UTC

Re: Is the checkpoint interval adjustable?

On 08/03/2013 07:28 PM, Kai Krakow wrote:
> Mike Audia <mikey_a@gmx.com> wrote:
>
>> I believe 30 sec is the default for the checkpoint interval.  Is this
>> adjustable?
>
> Just curious: What would be the benefit of increasing the checkpoint
> interval?

Laptops typically spin down disks to save power. If btrfs forces a write
every 30 seconds, you have to spin it back up.

--
Torbjørn

> As far as I understood it would not decrease write load on the drives
> because it will only update a few pointers and probably increase the
> generation number...
>
> Greetings,
> Kai

Kai Krakow

2013-Aug-04 00:58 UTC

Re: Is the checkpoint interval adjustable?

Torbjørn <lists@skagestad.org> wrote:
>> Just curious: What would be the benefit of increasing the checkpoint
>> interval?
> Laptops typically spin down disks to save power. If btrfs forces a write
> every 30 seconds, you have to spin it back up.

I'd expect btrfs not to write to the disk when a checkpoint is reached and
no writes occurred to the filesystem meanwhile... Could some developer shed
some light on this?

IMHO if this is true, there is no point in increasing the checkpoint 
interval... Thoughts?

Regards,
Kai
