thr3ads.net - freebsd stable - PATCH: Forcible delaying of UFS (soft)updates [Apr 2003]

Marko Zec

2003-Apr-11 19:01 UTC

PATCH: Forcible delaying of UFS (soft)updates

Here's a patch against the 4.8-RELEASE kernel that allows disk writes on
softupdates-enabled filesystems to be delayed for (theoretically)
arbitrarily long periods of time. The motivation for such an updating
policy is surprisingly not purely suicidal - it can allow disks on
laptops to spin down immediately after I/O operations and stay idle for
longer periods of time, thus saving a considerable amount of battery
power.

The patch introduces a new sysctl tunable vfs.sync_extdelay which
controls the delay duration in seconds. If the variable is set to 0, the
standard UFS synching policy is restored. The tunable can be either
modified by hand or controlled by the APM daemon using the attached
rc.syncdelay script.

When enabled, the extended delaying policy introduces some additional
changes:

- fsync() no longer flushes the buffers to disk, but returns immediately
instead;
- invoking sync() causes flushing of softupdates buffers to follow
immediately, which was not the case before;
- if one of the mounted filesystems becomes low on free space, which can
happen if a lot of data is written to the FS but FS metadata buffers are
not updated to disk, flushing of all softupdates buffers is scheduled
automatically;
- if an I/O operation (typically a read request) on an ATA disk is performed,
which is likely to cause the disk to be spun up, the pending buffers
are immediately flushed to the disk, but only if they were pending
longer than what would be the case with normal updating policy.

As I'm virtually clueless in FS concepts and theory, I'm not sure whether
the above model shakes the foundations of UFS operation, so I'd
appreciate comments on the patch from more knowledgeable people.
Nevertheless, my laptop has run without glitches for the last two weeks
with the extra delaying enabled, while happily achieving 5-10% longer
battery operated periods, depending on disk utilization patterns.

Cheers,

Marko
-------------- next part --------------
--- /usr/src/sys.org/dev/ata/ata-disk.c	Thu Jan 30 08:19:59 2003
+++ dev/ata/ata-disk.c	Sat Apr 12 00:31:26 2003
@@ -294,6 +294,7 @@ adstrategy(struct buf *bp)
     struct ad_softc *adp = bp->b_dev->si_drv1;
     int s;
 
+    stratcalls++;
     if (adp->device->flags & ATA_D_DETACHING) {
 	bp->b_error = ENXIO;
 	bp->b_flags |= B_ERROR;
--- /usr/src/sys.org/kern/vfs_subr.c	Sun Oct 13 18:19:12 2002
+++ kern/vfs_subr.c	Sat Apr 12 01:56:16 2003
@@ -116,6 +116,10 @@ SYSCTL_INT(_vfs, OID_AUTO, reassignbufme
 static int nameileafonly = 0;
 SYSCTL_INT(_vfs, OID_AUTO, nameileafonly, CTLFLAG_RW, &nameileafonly, 0, "");
 
+int stratcalls = 0;
+int sync_extdelay = 0;
+SYSCTL_INT(_vfs, OID_AUTO, sync_extdelay, CTLFLAG_RW, &sync_extdelay, 0, "");
+
 #ifdef ENABLE_VFS_IOOPT
 int vfs_ioopt = 0;
 SYSCTL_INT(_vfs, OID_AUTO, ioopt, CTLFLAG_RW, &vfs_ioopt, 0, "");
@@ -137,7 +141,7 @@ static vm_zone_t vnode_zone;
  * The workitem queue.
  */
 #define SYNCER_MAXDELAY		32
-static int syncer_maxdelay = SYNCER_MAXDELAY;	/* maximum delay time */
+int syncer_maxdelay = SYNCER_MAXDELAY;	/* maximum delay time */
 time_t syncdelay = 30;		/* max time to delay syncing data */
 time_t filedelay = 30;		/* time to delay syncing files */
 SYSCTL_INT(_kern, OID_AUTO, filedelay, CTLFLAG_RW, &filedelay, 0, "");
@@ -145,7 +149,7 @@ time_t dirdelay = 29;		/* time to delay 
 SYSCTL_INT(_kern, OID_AUTO, dirdelay, CTLFLAG_RW, &dirdelay, 0, "");
 time_t metadelay = 28;		/* time to delay syncing metadata */
 SYSCTL_INT(_kern, OID_AUTO, metadelay, CTLFLAG_RW, &metadelay, 0, "");
-static int rushjob;			/* number of slots to run ASAP */
+int rushjob;			/* number of slots to run ASAP */
 static int stat_rush_requests;	/* number of times I/O speeded up */
 SYSCTL_INT(_debug, OID_AUTO, rush_requests, CTLFLAG_RW, &stat_rush_requests, 0, "");
 
@@ -177,6 +181,7 @@ vntblinit()
 {
 
 	desiredvnodes = maxproc + cnt.v_page_count / 4;
+	TUNABLE_INT_FETCH("kern.maxvnodes", &desiredvnodes);
 	minvnodes = desiredvnodes / 4;
 	simple_lock_init(&mntvnode_slock);
 	simple_lock_init(&mntid_slock);
@@ -1119,7 +1124,7 @@ sched_sync(void)
 {
 	struct synclist *slp;
 	struct vnode *vp;
-	long starttime;
+	time_t starttime;
 	int s;
 	struct proc *p = updateproc;
 
@@ -1127,8 +1132,6 @@ sched_sync(void)
 	    SHUTDOWN_PRI_LAST);   
 
 	for (;;) {
-		kproc_suspend_loop(p);
-
 		starttime = time_second;
 
 		/*
@@ -1198,8 +1201,25 @@ sched_sync(void)
 		 * matter as we are just trying to generally pace the
 		 * filesystem activity.
 		 */
-		if (time_second == starttime)
+		if (time_second != starttime)
+			continue;
+
+		if (sync_extdelay >= syncer_maxdelay)
+			while (syncer_delayno == 0 && rushjob == 0 &&
+	    		    abs(time_second - starttime) < sync_extdelay) {
+				stratcalls = 0;
+				tsleep(&lbolt, PPAUSE, "syncer", 0);
+				kproc_suspend_loop(p);
+				if (stratcalls != 0 && syncer_maxdelay <
+				    abs(time_second - starttime)) {
+					rushjob = syncer_maxdelay;
+					break;
+				}
+			}
+		else {
 			tsleep(&lbolt, PPAUSE, "syncer", 0);
+			kproc_suspend_loop(p);
+		}
 	}
 }
 
--- /usr/src/sys.org/kern/vfs_syscalls.c	Thu Jan  2 18:26:18 2003
+++ kern/vfs_syscalls.c	Sat Apr 12 01:55:48 2003
@@ -563,6 +563,9 @@ sync(p, uap)
 	register struct mount *mp, *nmp;
 	int asyncflag;
 
+	/* Notify sched_sync() to try flushing syncer_workitem_pending[*] */
+	rushjob += syncer_maxdelay; 
+
 	simple_lock(&mountlist_slock);
 	for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) {
 		if (vfs_busy(mp, LK_NOWAIT, &mountlist_slock, p)) {
@@ -2627,6 +2630,10 @@ fsync(p, uap)
 	struct file *fp;
 	vm_object_t obj;
 	int error;
+
+	/* Just return if we are artificially delaying disk syncs */
+	if (sync_extdelay)
+		return (0);
 
 	if ((error = getvnode(p->p_fd, SCARG(uap, fd), &fp)) != 0)
 		return (error);
--- /usr/src/sys.org/ufs/ffs/ffs_alloc.c	Fri Sep 21 21:15:21 2001
+++ ufs/ffs/ffs_alloc.c	Sat Apr 12 00:06:20 2003
@@ -125,6 +125,10 @@ ffs_alloc(ip, lbn, bpref, size, cred, bn
 #endif /* DIAGNOSTIC */
 	if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0)
 		goto nospace;
+	/* Speedup flushing of syncer_workitem_pending[*] if low on freespace */
+	if (rushjob == 0 &&
+	    freespace(fs, fs->fs_minfree + 2) - numfrags(fs, size) < 0)
+		rushjob = syncer_maxdelay;
 	if (cred->cr_uid != 0 &&
 	    freespace(fs, fs->fs_minfree) - numfrags(fs, size) < 0)
 		goto nospace;
@@ -195,6 +199,10 @@ ffs_realloccg(ip, lbprev, bpref, osize, 
 	if (cred == NOCRED)
 		panic("ffs_realloccg: missing credential");
 #endif /* DIAGNOSTIC */
+	/* Speedup flushing of syncer_workitem_pending[*] if low on freespace */
+	if (rushjob == 0 &&
+	    freespace(fs, fs->fs_minfree + 2) - numfrags(fs, nsize - osize) < 0)
+		rushjob = syncer_maxdelay;
 	if (cred->cr_uid != 0 &&
 	    freespace(fs, fs->fs_minfree) -  numfrags(fs, nsize - osize) < 0)
 		goto nospace;
--- /usr/src/sys.org/sys/buf.h	Sat Jan 25 20:02:23 2003
+++ sys/buf.h	Sat Apr 12 00:30:48 2003
@@ -478,6 +478,7 @@ extern char	*buffers;		/* The buffer con
 extern int	bufpages;		/* Number of memory pages in the buffer pool. */
 extern struct	buf *swbuf;		/* Swap I/O buffer headers. */
 extern int	nswbuf;			/* Number of swap I/O buffer headers. */
+extern int	stratcalls;		/* I/O ops since last buffer sync */
 extern TAILQ_HEAD(swqueue, buf) bswlist;
 extern TAILQ_HEAD(bqueues, buf) bufqueues[BUFFER_QUEUES];
 
--- /usr/src/sys.org/sys/vnode.h	Sun Dec 29 19:19:53 2002
+++ sys/vnode.h	Sat Apr 12 00:06:20 2003
@@ -294,6 +294,9 @@ extern	struct vm_zone *namei_zone;
 extern	int prtactive;			/* nonzero to call vprint() */
 extern	struct vattr va_null;		/* predefined null vattr structure */
 extern	int vfs_ioopt;
+extern	int rushjob;
+extern	int syncer_maxdelay;
+extern	int sync_extdelay;
 
 /*
  * Macro/function to check for client cache inconsistency w.r.t. leasing.



-------------- next part --------------
# apmd Configuration File
#
# $FreeBSD: src/etc/apmd.conf,v 1.2.2.1 2000/12/12 22:48:18 dannyboy Exp $
#

apm_event POWERSTATECHANGE {
	exec "/etc/rc.syncdelay";
}

apm_event SUSPENDREQ {
	exec "/etc/rc.suspend";
}

apm_event USERSUSPENDREQ {
	exec "sync && sync && sync";
	#exec "sleep 1";
	exec "apm -z";
}

apm_event NORMRESUME, STANDBYRESUME {
	exec "/etc/rc.resume";
	exec "/etc/rc.syncdelay";
}

# resume event configuration for serial mouse users by
# reinitializing a moused(8) connected to a serial port.
#
#apm_event NORMRESUME {
#	exec "kill -HUP `cat /var/run/moused.pid`";
#}

# suspend request event configuration for ATA HDD users:
# execute standby instead of suspend.
#
#apm_event SUSPENDREQ {
#	reject;
#	exec "sync && sync && sync";
#	exec "sleep 1";
#	exec "apm -Z";
#}

# Sample entries for battery state monitoring
#apm_battery 5% discharging {
#	exec "logger -p user.emerg battery status critical!";
#	exec "echo T250L8CE-GE-C >/dev/speaker";
#}
#apm_battery 1% discharging {
#	exec "logger -p user.emerg battery low - emergency suspend";
#	exec "echo T250L16B+BA+AG+GF+FED+DC+CC >/dev/speaker";
#	exec "apm -z";
#}
#apm_battery 99% charging {
#	exec "logger -p user.notice battery fully charged";
#}

# apmd Configuration ends here



-------------- next part --------------
#!/bin/sh
#
# Copyright (c) 2003 Marko Zec
#
#include /usr/share/examples/bsd-style-copyright
#

# 
# /etc/rc.syncdelay
#
# Adjust disk syncing policy and delay on battery powered systems.
# Invoked automatically by apmd(8) when power state change or resume
# events occur.
#

AC_DELAY=0	# no delayed syncing
BAT_DELAY=600	# sync every 10 minutes

if [ `apm -a` -eq 1 ]; then
	# AC powered mode
	sysctl vfs.sync_extdelay=$AC_DELAY
else
	# Battery powered mode
	# Allow delayed syncing only if enough battery capacity is available
	if [ `apm -l` -gt 3 ]; then
		sysctl vfs.sync_extdelay=$BAT_DELAY
	else
		sysctl vfs.sync_extdelay=0
	fi
fi

exit 0

Alfred Perlstein

2003-Apr-11 20:33 UTC

PATCH: Forcible delaying of UFS (soft)updates

* Marko Zec <zec@tel.fer.hr> [030411 19:01] wrote:
> 
> When enabled, the extended delaying policy introduces some additional
> changes:
> 
> - fsync() no longer flushes the buffers to disk, but returns immediately
> instead;

This is really the only bad thing I can see here. What about introducing
a slight delay and seeing if one can coalesce the writes?  Is this
part really needed?  Making fsync() not work is a good way to make
any sort of userland based transactional system break badly.

Otherwise, way cool!

-Alfred

Jan Grant

2003-Apr-12 02:12 UTC

PATCH: Forcible delaying of UFS (soft)updates

On Sat, 12 Apr 2003, Marko Zec wrote:
> When enabled, the extended delaying policy introduces some additional
> changes:
>
> - fsync() no longer flushes the buffers to disk, but returns immediately
> instead;
This is bad; the rest looks very interesting.


-- 
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287088 Fax +44 (0)117 9287112 http://ioctl.org/jan/
Rereleasing dolphins into the wild since 1998.

Oliver Fromme

2003-Apr-12 07:39 UTC

PATCH: Forcible delaying of UFS (soft)updates

Marko Zec <zec@tel.fer.hr> wrote:
 > Here's a patch against 4.8-RELEASE kernel that allows disk writes on
 > softupdates-enabled filesystems to be delayed for (theoretically)
 > arbitrarily long periods of time. The motivation for such updating
 > policy is surprisingly not purely suicidal - it can allow disks on
 > laptops to spin down immediately after I/O operations and stay idle for
 > longer periods of time, thus saving considerable amount of battery
 > power.

It would be very cool if you could have different delay
settings per filesystem.  That would enable you to have
a large delay on /tmp, a medium delay on /var, and the
standard delay (i.e. more safety) on everything else.

 > - fsync() no longer flushes the buffers to disk, but returns immediately
 > instead;

I see some issues with that.  Better make that tunable
separately (and probably default to off).

 > - invoking sync() causes flushing of softupdates buffers to follow
 > immediately, which was not the case before;

That's cool.  I always disliked the fact I had to type
sync several times and still couldn't be sure that
everything was really synced.  (Yeah, I know, it's the
way it works, it always worked like that, and it's
documented to work like that ...  but I still dislike
it.)

 > - if one of the mounted filesystems becomes low on free space, which can
 > happen if lot of data is written to the FS but FS metadata buffers are
 > not updated to disk, flushing of all softupdates buffers is scheduled
 > automatically;

That's cool, too.  I've been bitten several times by the
bogus "no space left on device", due to soft-updates
delaying the freeing of file data.

I assume that buffered data is also flushed to disk when
the system runs low on RAM, right?  (I'm not a VFS/VM
expert, so that might be a stupid question.)

 > Nevertheless, my laptop runs without glitches for the last two weeks
 > with the extra delaying enabled, while happily achieving 5-10% longer
 > battery operated periods, depending on disk utilization patterns.

Awesome.  That would mean about 45 minutes more mobility
with my laptop.  :)

Regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"Clear perl code is better than unclear awk code; but NOTHING
comes close to unclear perl code"  (taken from comp.lang.awk FAQ)

Dave Hart

2003-Apr-12 09:58 UTC

PATCH: Forcible delaying of UFS (soft)updates

Marko Zec said:
> Alfred Perlstein wrote:
> 
> > * Marko Zec <zec@tel.fer.hr> [030411 19:01] wrote:
> > >
> > > When enabled, the extended delaying policy introduces 
> > > some additional changes:
> > >
> > > - fsync() no longer flushes the buffers to disk, but 
> > > returns immediately instead;
[...]
> > Making fsync() not work is a good way to make any sort
> > of userland based transactional system break badly.
[...]
> If the disk would start spinning every now and then,
> the whole patch would then become pointless...

As I feared.

> [...] the fact that the modified fsync() just returns 
> without doing anything useful doesn't mean the data will be
> lost - it will simply be delayed until the next coalesced
> updating occurs.

Unless, of course, your system or power happens to fail.
Imagine you have a database program keeping track of banking
transactions.  This program uses fsync() to ensure its
transaction logs are committed to reliable storage before
indicating the transaction is completed.  Suppose the moment
after I withdraw $500 from an ATM, the operating system or
hardware fails at the bank.

With your change to fsync() to not commit to stable storage,
I may have just won $500 courtesy of you.  That is, the
database software did all it could to ensure the $500
transaction was actually written to disk before authorizing
the ATM to dispense cash, yet fsync() has decided it's not
that important to do right away, so the transaction might
well have not hit the disk before the catastrophe.
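
The commit pattern described above looks roughly like this in C. This is
a minimal sketch using only the POSIX open/write/fsync/close calls;
log_commit() and the record format are invented for the illustration and
do not come from any real database. The point is only that the caller is
told "done" after fsync() has returned successfully, and not before.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Append one record to a log file and report success only after fsync()
 * says the data reached stable storage.
 * Returns 0 on success, -1 on any failure.
 */
int
log_commit(const char *path, const char *record)
{
    size_t len, done = 0;
    ssize_t n;
    int fd;

    assert(path != NULL && record != NULL);
    len = strlen(record);

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than asked for, so loop. */
    while (done < len) {
        n = write(fd, record + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* The transaction counts as committed only if this succeeds. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd) == 0 ? 0 : -1;
}
```

A caller would authorize the cash only when log_commit() returns 0. If
fsync() returns immediately without writing, that return value no longer
says anything about what is on the disk.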

For a perspective from the Windows world on the same sort
of capability, check out the Win32 FlushFileBuffers spec:

http://makeashorterlink.com/?E26B12F24

which is an alias for:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/flushfilebuffers.asp

From that page:  "The FlushFileBuffers function writes all of the
buffered information for the specified file to disk."

Such is the world of writing OS code -- optimizing for one
situation may well break other important uses of the same
code.

Regards,
Dave Hart
davehart@davehart.com
(who spent more time formatting text than writing, sigh)

Michael Sierchio

2003-Apr-12 11:38 UTC

PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> - fsync() no longer flushes the buffers to disk, but returns immediately
> instead;
Any system that does this should be flushed down the toilet.  Softupdates
already breaks transaction semantics of FFS by breaking link()/unlink()/rename()
etc.

Don't make it worse.  Many programs rely on fsync().

Ian Dowse

2003-Apr-12 13:02 UTC

PATCH: Forcible delaying of UFS (soft)updates

In message <3E976EBD.C3E66EF8@tel.fer.hr>, Marko Zec writes:
>Here's a patch against 4.8-RELEASE kernel that allows disk writes on
>softupdates-enabled filesystems to be delayed for (theoretically)
>arbitrarily long periods of time. The motivation for such updating
>policy is surprisingly not purely suicidal - it can allow disks on
>laptops to spin down immediately after I/O operations and stay idle for
>longer periods of time, thus saving considerable amount of battery
>power.
Looks interesting. A while ago I was reading the spec of some IBM
ATA hard disk, and discovered that there is a "delayed write" feature
built into most ATA disks that is extremely useful for keeping a
laptop disk spun down.

When the feature is enabled, the disk behaves normally until it
spins down due to the standard ATA spindown timeout. Then it enters
the delayed write mode, and all further writes to the disk go just
to the disk cache and the disk is not spun up. Finally, when for
any reason the disk needs to be spun up (cache is full, or a read
of an uncached sector occurs), the cache is flushed as soon as the
disk spins up. Assuming this is what happens (it's mostly based on
observation rather than documentation), you get a much smaller
window where the disk is potentially inconsistent, and automatic
triggering of the writes only when they are necessary.

Below is a simple script I use to turn on the feature when running
on battery power (using ACPI), and the -CURRENT patches that allow
the spindown delay and delayed write features to be controlled with
atacontrol (I mailed the patches to Soren a while ago).

Ian


#!/bin/sh

oacline=""
while :; do
	sleep 5

	acline=`sysctl -n hw.acpi.acline`
	if [ "X$acline" = "X$oacline" ]; then
		continue
	fi
	oacline="$acline";

	case "$acline" in
	1)
		atacontrol standby 0 0 300
		atacontrol delayed_write 0 0 0
		;;
	0)
		atacontrol standby 0 0 20
		atacontrol delayed_write 0 0 1
		;;
	esac
	
done


Index: sys/sys/ata.h
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/sys/ata.h,v
retrieving revision 1.17
diff -u -r1.17 ata.h
--- sys/sys/ata.h	22 Mar 2003 12:18:20 -0000	1.17
+++ sys/sys/ata.h	28 Mar 2003 02:42:27 -0000
@@ -370,6 +370,7 @@
 #define ATARAIDSTATUS		11
 #define ATAENCSTAT		12
 #define ATAGMAXCHANNEL		13
+#define ATACMD			14
 
     union {
 	struct {
@@ -409,6 +410,20 @@
 	    int			v05;
 	    int			v12;
 	} enclosure;
+	struct {
+	    int			flags;		/* info about the request */
+#define ATA_CMD_CTRL			0x00
+#define ATA_CMD_READ			0x01
+#define ATA_CMD_WRITE			0x02
+
+	    u_int8_t		command;	/* command code */
+	    u_int64_t		lba;		/* lba address */
+	    u_int16_t		count;		/* sector count */
+	    u_int8_t		feature;	/* feature modifier bits */
+
+	    caddr_t		databuf;	/* I/O data buffer */
+	    int			datalen;	/* length of data buffer */
+	} ata;
 	struct {
 	    char		ccb[16];
 	    caddr_t		data;
Index: sys/dev/ata/ata-all.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/dev/ata/ata-all.c,v
retrieving revision 1.175
diff -u -r1.175 ata-all.c
--- sys/dev/ata/ata-all.c	30 Mar 2003 09:27:59 -0000	1.175
+++ sys/dev/ata/ata-all.c	1 Apr 2003 12:27:07 -0000
@@ -355,6 +355,28 @@
 		      sizeof(struct ata_params));
 	    return 0;
 
+	case ATACMD: {
+	    struct ata_device *atadev;
+
+	    if (!device || !(ch = device_get_softc(device)))
+		return ENXIO;
+	    if (!(atadev = &ch->device[iocmd->device]) ||
+		!(ch->devices & (iocmd->device == MASTER ?
+				 ATA_ATA_MASTER : ATA_ATA_SLAVE)))
+		return ENXIO;
+	    if (iocmd->u.ata.flags != ATA_CMD_CTRL)
+		return EOPNOTSUPP;
+
+	    error = 0;
+	    ATA_SLEEPLOCK_CH(ch, ATA_CONTROL);
+	    if (ata_command(atadev, iocmd->u.ata.command, iocmd->u.ata.lba,
+			    iocmd->u.ata.count, iocmd->u.ata.feature,
+			    ATA_WAIT_INTR) != 0)
+		error = EIO;
+	    ATA_UNLOCK_CH(ch);
+	    return error;
+	}
+
 	case ATAENCSTAT: {
 	    struct ata_device *atadev;
 
Index: sbin/atacontrol/atacontrol.8
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sbin/atacontrol/atacontrol.8,v
retrieving revision 1.22
diff -u -r1.22 atacontrol.8
--- sbin/atacontrol/atacontrol.8	23 Dec 2002 15:30:40 -0000	1.22
+++ sbin/atacontrol/atacontrol.8	31 Jan 2003 00:57:52 -0000
@@ -25,7 +25,7 @@
 .\"
 .\" $FreeBSD: src/sbin/atacontrol/atacontrol.8,v 1.22 2002/12/23 15:30:40 ru Exp $
 .\"
-.Dd May 17, 2001
+.Dd August 18, 2002
 .Dt ATACONTROL 8
 .Os
 .Sh NAME
@@ -72,6 +72,21 @@
 .Ar channel device
 .Nm
 .Ic list
+.Nm
+.Ic idle
+.Ar channel device
+.Op seconds
+.Nm
+.Ic standby
+.Ar channel device
+.Op seconds
+.Nm
+.Ic sleep
+.Ar channel device
+.Nm
+.Ic delayed_write
+.Ar channel device
+.Op 0 | 1
 .Sh DESCRIPTION
 The
 .Nm
@@ -208,6 +223,27 @@
 Fan RPM speed, enclosure temperature, 5V and 12V levels are shown.
 .It Ic list
 Show info about all attached devices on all active controllers.
+.It Ic idle
+Set the idle timeout on the specified device.
+If no timeout is given, put the device into the idle state immediately.
+.It Ic standby
+Set the standby timeout on the specified device.
+If no timeout is given, put the device into the standby state immediately.
+.It Ic sleep
+Put the device into sleep mode.
+Since this effectively powers down the device, settings configured by
+the driver are lost, so this should not be used on an active drive.
+Use
+.Nm
+.Ic reinit
+to reinitialize the device for later use.
+.It Ic delayed_write
+Enable or disable the delayed write feature on the specified device.
+When delayed writes are enabled on devices that support this feature,
+writes that occur while the disk is spun down are stored in the
+drive cache only.
+Once the cache becomes full or the disk is spun up (e.g. for a read
+operation), the cached writes are immediately flushed to the disk.
 .El
 .Sh EXAMPLES
 To see the devices' current access modes, use the command line:
Index: sbin/atacontrol/atacontrol.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sbin/atacontrol/atacontrol.c,v
retrieving revision 1.20
diff -u -r1.20 atacontrol.c
--- sbin/atacontrol/atacontrol.c	22 Mar 2003 12:18:20 -0000	1.20
+++ sbin/atacontrol/atacontrol.c	1 Apr 2003 13:26:51 -0000
@@ -249,7 +249,7 @@
 main(int argc, char **argv)
 {
 	struct ata_cmd iocmd;
-	int fd, maxunit, unit;
+	int enable, fd, idle, maxunit, unit;
 
 	if ((fd = open("/dev/ata", O_RDWR)) < 0)
 		err(1, "control device not found");
@@ -427,6 +427,43 @@
 				mode2str(iocmd.u.mode.mode[0]), 
 				mode2str(iocmd.u.mode.mode[1]));
 		}
+	}
+	else if ((!strcmp(argv[1], "idle") || !strcmp(argv[1], "standby") ||
+	    !strcmp(argv[1], "sleep")) && argc == 4) {
+		iocmd.cmd = ATACMD;
+		iocmd.device = atoi(argv[3]);
+		iocmd.u.ata.flags = ATA_CMD_CTRL;
+		iocmd.u.ata.command = !strcmp(argv[1], "idle") ? 0xe1 :
+		    !strcmp(argv[1], "standby") ? 0xe0 : 0xe6;
+		if (ioctl(fd, IOCATA, &iocmd) < 0)
+			err(1, "ioctl(ATACMD)");
+	}
+	else if ((!strcmp(argv[1], "idle") || !strcmp(argv[1], "standby")) &&
+	    argc == 5) {
+		idle = atoi(argv[4]);
+		if (idle > 19800)
+			errx(1, "Maximum idle time is 19800 seconds");
+		if (idle <= 240*5)
+			iocmd.u.ata.count = (idle + 4) / 5;
+		else
+			iocmd.u.ata.count = idle / (30*60) + 240;
+
+		iocmd.cmd = ATACMD;
+		iocmd.device = atoi(argv[3]);
+		iocmd.u.ata.flags = ATA_CMD_CTRL;
+		iocmd.u.ata.command = !strcmp(argv[1], "idle") ? 0xe3 : 0xe2;
+		if (ioctl(fd, IOCATA, &iocmd) < 0)
+			err(1, "ioctl(ATACMD)");
+	}
+	else if (!strcmp(argv[1], "delayed_write") && argc == 5) {
+		enable = atoi(argv[4]);
+		iocmd.cmd = ATACMD;
+		iocmd.device = atoi(argv[3]);
+		iocmd.u.ata.feature = enable ? 0x07 : 0x87;
+		iocmd.u.ata.flags = ATA_CMD_CTRL;
+		iocmd.u.ata.command = 0xfa;
+		if (ioctl(fd, IOCATA, &iocmd) < 0)
+			err(1, "ioctl(ATACMD)");
 	}
 	else
 	    	usage();
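
For reference, the timeout arithmetic used by the idle/standby branch
above can be pulled out into a small function. This is only a restatement
of the calculation in the patch; the name ata_encode_timeout() is made up
here, and the reading of the count byte (5-second units up to 240, then
30-minute steps above that) is an assumption based on the ATA standby
timer definition, so check the drive documentation before relying on it.

```c
#include <assert.h>

/*
 * Encode a timeout in seconds as the sector-count value passed with the
 * ATA idle/standby commands, using the same arithmetic as the patch.
 * Returns -1 if the timeout exceeds the 19800 second maximum.
 */
int
ata_encode_timeout(int seconds)
{
    /* A negative timeout is a caller bug, not a runtime condition. */
    assert(seconds >= 0);

    if (seconds > 19800)
        return -1;
    if (seconds <= 240 * 5)
        return (seconds + 4) / 5;       /* 5-second units, rounded up */
    return seconds / (30 * 60) + 240;   /* 30-minute units, rounded down */
}
```

With this encoding, the 300-second and 20-second values used in the
script earlier in this message become counts of 60 and 4, and the
19800-second maximum becomes 251.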

Alfred Perlstein

2003-Apr-12 13:32 UTC

PATCH: Forcible delaying of UFS (soft)updates

* Michael Sierchio <kudzu@tenebras.com> [030412 11:38] wrote:
> Marko Zec wrote:
> 
> >- fsync() no longer flushes the buffers to disk, but returns immediately
> >instead;
> 
> Any system that does this should be flushed down the toilet.  Softupdates
> already breaks transaction semantics of FFS by breaking 
> link()/unlink()/rename()
> etc.
How does it do that?

Tony Finch

2003-Apr-12 16:02 UTC

PATCH: Forcible delaying of UFS (soft)updates

Marko Zec <zec@tel.fer.hr> wrote:
>
>Here's a patch against 4.8-RELEASE kernel that allows disk writes on
>softupdates-enabled filesystems to be delayed for (theoretically)
>arbitrarily long periods of time. The motivation for such updating
>policy is surprisingly not purely suicidal - it can allow disks on
>laptops to spin down immediately after I/O operations and stay idle for
>longer periods of time, thus saving considerable amount of battery
>power.
I've used a much simpler patch for a number of years, which allows
you to increase the 30s syncer interval. See
http://dotat.at/prog/buildworld/patches.src/04.syncdelay.patch.disabled

Tony.
-- 
f.a.n.finch  <dot@dotat.at>  http://dotat.at/
LANDS END TO ST DAVIDS HEAD INCLUDING THE BRISTOL CHANNEL: EAST OR SOUTHEAST
4, INCREASING 7, LOCALLY GALE 8, EASING SOUTHEAST 4 OR 5 LATER. FAIR, THEN
RAIN, WITH RISK OF MIST. GOOD, BECOMING MODERATE OR POOR. SLIGHT OR MODERATE,
LOCALLY ROUGH LATER.

Michael Collette

2003-Apr-13 02:16 UTC

PATCH: Forcible delaying of UFS (soft)updates

Jon Hamilton wrote:
> Dave Hart <davehart@davehart.com>, said on Sat Apr 12, 2003 [04:58:13 PM]:
> } Marko Zec said:
> [...]
> } > If the disk would start spinning every now and than,
> } > the whole patch would than become pointless...
> }
> } As I feared.
> }
> } > [...] the fact that the modified fsync() just returns
> } > without doing anything useful doesn't mean the data will be
> } > lost - it will  simply be delayed until the next coalesced
> } > updating occurs.
> }
> } Unless, of course, your system or power happens to fail.
> } Imagine you have a database program keeping track of banking
> } transactions.  This program uses fsync() to ensure its
> } transaction logs are committed to reliable storage before
> } indicating the transaction is completed.  Suppose the moment
> } after I withdraw $500 from an ATM, the operating system or
> } hardware fails at the bank.
> 
> Right.  So in such a situation, the admin for that system would not
> enable this optional behavior.  There probably aren't too many cases
> where mission critical financial transaction systems run on a laptop
> on which the desire is maximal battery life, which is the case from
> which this whole patch/discussion derives.  It's a conscious tradeoff. 

Despite criticism of Dave's comments, I'd also be a little concerned about
what had been written to the drive prior to unexpected power loss.  I'm 
saying this as a person who uses a laptop as my primary desktop machine.

Real world laptop scenario.  I just finished downloading my E-Mail.  I then
put this machine into suspend mode.  Upon awakening, the system glitches 
for some reason, forcing an unexpected system shutdown.

(Note: I am having this problem now with a Thinkpad T23)

Did my mail get written to the drive prior to suspending?  I'll grant you that
this isn't in the same league as moving cash around, but to me that mail is 
absolutely mission critical.

I'd love to get 10% more battery life from my laptop, but not at the expense
of having a file system that loses data on any unclean shutdown, be it 
moving $500, storing E-Mail, or just saving a document I had been working on.  
With this patch in play, when I tell an app to save a document, will it?

Later on,
-- 
"Outside of a dog, a book is man's best friend. Inside of a dog, it's too dark
to read."
 - Groucho Marx

Kirk McKusick

2003-Apr-13 10:55 UTC

PATCH: Forcible delaying of UFS (soft)updates

I am of the opinion that fsync should work. Applications like
`vi' use fsync to ensure that the write of the new file is on
stable store before removing the old copy. If that semantic
is broken, it would be possible to have neither the old nor
the new copy of your file after a crash. I do not consider
that acceptable behavior. Further, the fsync call is used
to ensure that link/unlink/rename have been completed. So
more than just fsync is being affected by your change. Lastly,
I often write out a file when I am about to suspend my laptop
(for low battery or other reasons) and I really want that file
on the disk now. I do not want to have to wait for it to decide
at some future time to spin up the disk.
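
The save sequence described above (new copy on stable store before the
old one goes away) is commonly written as write-to-temporary, fsync(),
then rename(). The sketch below shows that pattern with plain POSIX
calls. It is not vi's actual code, and save_file_safely() and the ".tmp"
suffix are invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the file at `path` with `data` so that a crash leaves either the
 * old contents or the new contents in place.
 * Returns 0 on success, -1 on failure.
 */
int
save_file_safely(const char *path, const char *data)
{
    char tmp[1024];
    size_t len, done = 0;
    ssize_t n;
    int fd, r;

    assert(path != NULL && data != NULL);

    r = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (r < 0 || r >= (int)sizeof(tmp))
        return -1;
    len = strlen(data);

    /* Step 1: write the new copy under a temporary name. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    while (done < len) {
        n = write(fd, data + done, len - done);
        if (n < 0)
            goto fail;
        done += (size_t)n;
    }

    /* Step 2: make sure the new copy is on disk before touching the old one. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Step 3: only now replace the old file. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

If fsync() returns without writing anything, step 2 gives no guarantee
at all, and the sequence can no longer promise that one intact copy
survives a crash.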

I suggest that you make the disabling of fsync a separate
option from the rest of your change so that people can
decide for themselves whether they want partial savings
with working semantics, or greater savings with broken
semantics. I am also intrigued by the changes proposed by
Ian Dowse that may better accomplish the same goals with
less breakage.

	Kirk McKusick

David Schultz

2003-Apr-14 03:19 UTC

PATCH: Forcible delaying of UFS (soft)updates

On Sat, Apr 12, 2003, Marko Zec wrote:
> Here's a patch against 4.8-RELEASE kernel that allows disk writes on
> softupdates-enabled filesystems to be delayed for (theoretically)
> arbitrarily long periods of time. The motivation for such updating
> policy is surprisingly not purely suicidal - it can allow disks on
> laptops to spin down immediately after I/O operations and stay idle for
> longer periods of time, thus saving considerable amount of battery
> power.
Very nice!  I have been thinking about doing something like this
for a long time, but I never managed to find the time.  Some
comments:

- As others have mentioned, the fsync-disabling feature is questionable
  and ought to be separate.  You can make it somewhat more useful by at
  least guaranteeing transactional consistency, i.e. by treating every
  fsync() call as a write barrier.  You would need to ensure this for
  both data and metadata, which I expect would be devilishly hard to do
  within the softupdates framework.  However, you might be able to
  accomplish it at the disk buffer level.  For instance, you could
  have fsync() push the appropriate dirty buffers out to a separate
  cache, then commit the contents of the cache in the order of the
  fsyncs when the disk is next active.

- The fiddling with rushjob seems rather arbitrary.  You can probably
  just let the existing code increment it as necessary and force a sync
  if the value gets too high.

- Patches against -CURRENT would be nice.  (Sorry, that will be a doozy.)

- It looks like you have a few separate changes in there, such as
	+	TUNABLE_INT_FETCH("kern.maxvnodes", &desiredvnodes);
  and
	-	long starttime;
	+	time_t starttime;
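
The separate-cache idea in the first item above can be pictured as a FIFO
of deferred buffers. The sketch below is purely illustrative user-space
C: struct barrier_queue and both functions are hypothetical, buffers are
reduced to integer ids, and nothing here touches softupdates or the real
buffer cache. It preserves the order of fsync() calls, but it does not
make the data durable at the time fsync() returns.

```c
#include <assert.h>
#include <stddef.h>

#define BARRIER_QUEUE_MAX 64

/* Hypothetical side cache: buffers queued by fsync(), oldest first. */
struct barrier_queue {
    int    bufs[BARRIER_QUEUE_MAX];  /* illustrative buffer identifiers */
    size_t head;                     /* index of the oldest entry */
    size_t count;                    /* number of queued entries */
};

/*
 * Called where fsync() would otherwise write to disk.
 * Returns 0 if the buffer was queued, -1 if the queue is full and the
 * caller has to flush for real.
 */
int
barrier_defer(struct barrier_queue *q, int buf_id)
{
    assert(q != NULL);
    if (q->count == BARRIER_QUEUE_MAX)
        return -1;
    q->bufs[(q->head + q->count) % BARRIER_QUEUE_MAX] = buf_id;
    q->count++;
    return 0;
}

/*
 * Called when the disk is next active: hands back the next buffer to
 * write, strictly in the order the fsync() calls were made.
 * Returns 1 and stores the id in *buf_id, or 0 if the queue is empty.
 */
int
barrier_next(struct barrier_queue *q, int *buf_id)
{
    assert(q != NULL && buf_id != NULL);
    if (q->count == 0)
        return 0;
    *buf_id = q->bufs[q->head];
    q->head = (q->head + 1) % BARRIER_QUEUE_MAX;
    q->count--;
    return 1;
}
```

A zero-initialized struct barrier_queue is an empty queue; the flush path
would call barrier_next() in a loop until it returns 0.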

Marko Zec

2003-Apr-15 11:43 UTC

PATCH: Forcible delaying of UFS (soft)updates

Michael Collette wrote:
> Real world laptop scenario.  I just finish downloading my E-Mail.  I then take
> and put this machine into a suspend mode.  Upon awakening the system glitches
> for some reason forcing an unexpected system shutdown.
>
> (Note: I am having this problem now with a Thinkpad T23)
>
> Did my mail get written to the drive prior to suspending?

It should get synched, since the APM daemon by default calls a global sync (not
fsync!) a couple of times before suspending, to ensure FS consistency in case
the system never resumes successfully.

Marko

Marko Zec

2003-Apr-15 12:19 UTC

PATCH: Forcible delaying of UFS (soft)updates

David Schultz wrote:
>   For instance, you could
>   have fsync() push the appropriate dirty buffers out to a separate
>   cache, then commit the contents of the cache in the order of the
>   fsyncs when the disk is next active.

Huh... such a concept would still break fsync() semantics. Note that the
original patch also ensures dirty buffers get flushed if / when the disk spins
up, even before the delay timer expires.

> - The fiddling with rushjob seems rather arbitrary.  You can probably
>   just let the existing code increment it as necessary and force a sync
>   if the value gets too high.

If rushjob were not used for forcing prompt synching, the original code
could not guarantee the sync to occur immediately. Instead, the synching could
be further delayed for up to 30 seconds, which is not desirable if our major
design goal is to do as much disk I/O as possible in a small time interval and
leave the disk idle otherwise.

Marko

Oliver Fromme

2003-Apr-16 12:27 UTC

PATCH: Forcible delaying of UFS (soft)updates

Chris Dillon <cdillon@wolves.k12.mo.us> wrote:
 > On Wed, 16 Apr 2003, Terry Lambert wrote:
 > > [Flash memory]
 > > The life expectancy of these devices is really, really
 > > underestimated.  In practice, I've seen two million write cycles
 > > from some of these in lab machines which get rewritten pretty often.
 > 
 > I realize they have what looks like a really big number of writes on a
 > human scale, but to a computer which does things methodically day in
 > and day out without stopping, those writes can add up relatively
 > quickly.  Even with a life of two million write cycles, the
 > "occasional" 30-second round of updates that happen to write the same
 > bits over and over

The controller in things such as CompactFlash cards will
_not_ write the same physical bits over and over.
Those beasts are clever enough to remap logical blocks
to different physical blocks upon each write access, so
that the written-to flash cells are evenly distributed
over the whole physical range.

You can probably update the atime of files 100 million
times and more without any problems, because all of those
100 million writes will end up on all different flash
blocks.  Of course, that's provided that there are also
areas in your filesystem which are less frequently written
to, but that's usually the case (how often do you rewrite
binaries and libs?).

So I agree with Terry that the life expectancy of flash
devices is really underestimated.

Regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"If you do things right, people won't be sure you've done
anything at all." -- God in Futurama season 4 episode 8
