I think I found a bug affecting btrfs filesystems and users invoking fstrim to discard unused blocks: if I execute `fstrim -v /` twice, the amount trimmed does not change on the 2nd invocation AND it takes just as long as the first.

Why do I think this is a bug? When I do the same on an ext4 partition I get different behavior: the output shows 0 B trimmed, and it finishes instantaneously when I run it a 2nd time.

After contacting the fstrim developer, he stated that the userspace part (fstrim) does only one thing, which is to invoke an ioctl (FITRIM); it is the job of the filesystem to properly implement this.

Supporting data
----------------

Example on a btrfs partition:

The 1st time:

    % time sudo fstrim -v /
    /: 5.2 GiB (5575192576 bytes) trimmed
    sudo fstrim -v /  0.00s user 0.05s system 2% cpu 2.084 total

The 2nd time:

    % time sudo fstrim -v /
    /: 5.2 GiB (5575192576 bytes) trimmed
    sudo fstrim -v /  0.00s user 0.06s system 2% cpu 2.107 total

If I run the command twice on an ext4 filesystem, the amount does go to zero and the 2nd invocation is instantaneous:

The 1st time:

    % time sudo fstrim -v /
    /: 15.4 GiB (16481087488 bytes) trimmed
    sudo fstrim -v /  0.00s user 0.08s system 1% cpu 6.268 total

The 2nd time:

    % time sudo fstrim -v /
    /: 0 B (0 bytes) trimmed
    sudo fstrim -v /  0.00s user 0.00s system 48% cpu 0.007 total

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Audia posted on Thu, 10 Oct 2013 06:20:42 -0400 as excerpted:

> I think I found a bug affecting btrfs filesystems and users invoking
> fstrim to discard unused blocks: if I execute `fstrim -v /` twice, the
> amount trimmed does not change on the 2nd invocation AND it takes just
> as long as the first. Why do I think this is a bug? When I do the same
> on an ext4 partition I get different behavior: the output shows 0 B
> trimmed, and it finishes instantaneously when I run it a 2nd time. After
> contacting the fstrim developer, he stated that the userspace part
> (fstrim) does only one thing, which is to invoke an ioctl (FITRIM); it is
> the job of the filesystem to properly implement this.

This behavior is documented in the fstrim manpage under -v/--verbose:

>>> When [--verbose is] specified fstrim will output the number of bytes
>>> passed from the filesystem down the block stack to the device for
>>> potential discard. This number is a maximum discard amount from the
>>> storage device's perspective, because FITRIM ioctl called repeated
>>> will keep sending the same sectors for discard repeatedly.
>>>
>>> fstrim will report the same potential discard bytes each time, but
>>> only sectors which had been written to between the discards would
>>> actually be discarded by the storage device.

Why ext4 behavior doesn't conform to that fstrim documentation I can't
say (except by stating the obvious: the ext4 filesystem implementation
of that ioctl does it differently; as for why, you'd have to either ask
the ext4 folks or read its docs/sources), but given that fstrim
documentation, the btrfs behavior is certainly NOTABUG, as it's simply
conforming to the documentation.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
  Richard Stallman
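The manpage's distinction between bytes *reported* and sectors *actually* discarded can be made concrete with a toy model. The classes below are hypothetical, not kernel code: a filesystem that passes every free extent down on every call (as Mike's btrfs numbers suggest), and a device that only has real work to do for sectors written since the last discard.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical SSD model: tracks which sectors currently hold data."""
    mapped: set[int] = field(default_factory=set)

    def write(self, sector: int) -> None:
        self.mapped.add(sector)

    def discard(self, sectors: range) -> int:
        """Unmap the given sectors; return how many were actually still mapped."""
        hit = self.mapped.intersection(sectors)
        self.mapped -= hit
        return len(hit)


def stateless_trim(free_extents: list[range], dev: ToyDevice, sector_size: int = 512) -> tuple[int, int]:
    """Pass every free extent down on every call, keeping no 'already trimmed' state.

    Returns (bytes reported to fstrim, bytes the device actually unmapped).
    """
    reported = actually = 0
    for extent in free_extents:
        reported += len(extent) * sector_size
        actually += dev.discard(extent) * sector_size
    return reported, actually
```

In this model the first element of the result never changes between calls, matching the constant 5.2 GiB above, while the second element drops to zero unless free sectors were written in between.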
On 10/10/13 6:39 AM, Duncan wrote:
> Mike Audia posted on Thu, 10 Oct 2013 06:20:42 -0400 as excerpted:
>
>> I think I found a bug affecting btrfs filesystems and users invoking
>> fstrim to discard unused blocks: if I execute `fstrim -v /` twice, the
>> amount trimmed does not change on the 2nd invocation AND it takes just
>> as long as the first. Why do I think this is a bug? When I do the same
>> on an ext4 partition I get different behavior: the output shows 0 B
>> trimmed, and it finishes instantaneously when I run it a 2nd time. After
>> contacting the fstrim developer, he stated that the userspace part
>> (fstrim) does only one thing, which is to invoke an ioctl (FITRIM); it is
>> the job of the filesystem to properly implement this.
>
> This behavior is documented in the fstrim manpage under -v/--verbose:
>
>>>> When [--verbose is] specified fstrim will output the number of bytes
>>>> passed from the filesystem down the block stack to the device for
>>>> potential discard. This number is a maximum discard amount from the
>>>> storage device's perspective, because FITRIM ioctl called repeated
>>>> will keep sending the same sectors for discard repeatedly.
>>>>
>>>> fstrim will report the same potential discard bytes each time, but
>>>> only sectors which had been written to between the discards would
>>>> actually be discarded by the storage device.
>
> Why ext4 behavior doesn't conform to that fstrim documentation I can't
> say (except by stating the obvious: the ext4 filesystem implementation
> of that ioctl does it differently; as for why, you'd have to either ask
> the ext4 folks or read its docs/sources), but given that fstrim
> documentation, the btrfs behavior is certainly NOTABUG, as it's simply
> conforming to the documentation.

ext4 is conforming just fine.

"fstrim will output the number of bytes passed from the filesystem down
the block stack to the device for potential discard."
It reports the number of bytes passed *from the filesystem* to the block
device for discard, not the total range requested by the user.

If the filesystem is clever enough to know that the range in question has
not been written to since the last discard, then it takes no action, and
reports zero bytes.

So it sounds like btrfs doesn't maintain this "already discarded" state,
and will "re-discard" unused regions every time fstrim is issued.

Not a bug per se, but not really optimized.

-Eric
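The "already discarded" state Eric describes can be sketched as a per-group flag. The toy model below is loosely inspired by ext4's per-block-group trimmed bit, but `GroupTrimmer` and its methods are hypothetical and the real ext4 code has further conditions that this sketch omits. A group is skipped once trimmed; in this sketch, freeing blocks in a group clears its flag so the next trim revisits that group.

```python
class GroupTrimmer:
    """Hypothetical allocator model: free space per block group plus an
    in-memory 'already trimmed' flag per group (loosely ext4-inspired)."""

    def __init__(self, free_blocks_per_group: list[int], block_size: int = 4096):
        self.free = list(free_blocks_per_group)
        self.block_size = block_size
        self.was_trimmed = [False] * len(self.free)

    def free_blocks(self, group: int, count: int) -> None:
        """Return blocks to a group; the new free space has never been discarded."""
        self.free[group] += count
        self.was_trimmed[group] = False

    def trim(self) -> int:
        """Model one FITRIM call; return the bytes passed down to the block layer."""
        passed_down = 0
        for group, nfree in enumerate(self.free):
            if self.was_trimmed[group]:
                continue  # nothing freed here since the last trim: skip the group
            passed_down += nfree * self.block_size
            self.was_trimmed[group] = True
        return passed_down
```

With `GroupTrimmer([100, 200, 50])`, the first `trim()` passes down all 350 free blocks, the second passes down nothing (the "0 B trimmed" seen on ext4), and after `free_blocks(1, 10)` only group 1 is revisited.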
> If the filesystem is clever enough to know that the range in question has
> not been written to since the last discard, then it takes no action, and
> reports zero bytes.

File system images can be written onto new media, so there is a
drawback to that.

Best Regards
-Emil
On 10/11/13 10:14 AM, Emil Karlson wrote:
>> If the filesystem is clever enough to know that the range in question has
>> not been written to since the last discard, then it takes no action, and
>> reports zero bytes.
>
> File system images can be written onto new media, so there is a
> drawback to that.

It's in-memory for the mounted filesystem, not on disk. It checks the
EXT4_GROUP_INFO_WAS_TRIMMED_BIT flag stored in bb_state in the
ext4_group_info structure. So when you mount a dd'd copy, it takes a
fresh look and does the right thing (DTRT).

-Eric
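Eric's answer to the dd'd-image concern can be illustrated with a small model: if the trimmed bits exist only in state built at mount time, every new mount starts with all of them clear. In this hypothetical sketch, `DiskImage` stands for what is stored on disk (free space per group, no trim bits), `dd_copy` for copying the image to new media, and `Mount` for the per-mount, in-memory state.

```python
import copy
from dataclasses import dataclass


@dataclass
class DiskImage:
    """Hypothetical on-disk state: free blocks per block group. No trim bits are stored."""
    free_per_group: list[int]


def dd_copy(image: DiskImage) -> DiskImage:
    """Stand-in for `dd` onto new media: an exact copy of the on-disk state."""
    return copy.deepcopy(image)


class Mount:
    """Hypothetical per-mount, in-memory state built when the image is mounted."""

    def __init__(self, image: DiskImage, block_size: int = 4096):
        self.image = image
        self.block_size = block_size
        # Every mount starts with no group marked as trimmed ("a fresh look").
        self.was_trimmed = [False] * len(image.free_per_group)

    def trim(self) -> int:
        """Pass down the free space of every group not yet trimmed in this mount."""
        passed_down = 0
        for group, nfree in enumerate(self.image.free_per_group):
            if not self.was_trimmed[group]:
                passed_down += nfree * self.block_size
                self.was_trimmed[group] = True
        return passed_down
```

In this model, once the original mount has trimmed everything it reports zero, yet `Mount(dd_copy(img)).trim()` reports the full free space again, so a copied image never inherits stale "already trimmed" state.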