Two patches follow this message. One fixes the default implementation of SEEK_HOLE/DATA. This patch applies atop Josef's last posted patch. The second patch implements the same on ocfs2. The test tool for the same is available here. http://oss.oracle.com/~smushran/seek_data/seek_test.c It is improved since the last post. It runs cleanly on zfs, ocfs2 and ext3 (default behavior). Users testing on zfs will need to flip the values of SEEK_HOLE/DATA. The results on ext4 and btrfs are also available here. http://oss.oracle.com/~smushran/seek_data/ Thanks Sunil
Sunil Mushran
2011-May-19 02:44 UTC
[Ocfs2-devel] [PATCH] fs: Fix default SEEK_DATA and SEEK_HOLE implementation
In the default SEEK_DATA implementation, as the entire file is treated as data, all supplied offsets less than the size of the file are valid. For empty files, SEEK_DATA and SEEK_HOLE returns -ENXIO irrespective of the input offset. Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com> --- fs/read_write.c | 9 ++++----- 1 files changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index af9cc51..4b991e9 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -66,11 +66,10 @@ generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin) break; case SEEK_DATA: /* - * In the generic case the entire file is data, so data only - * starts at position 0 provided the file has an i_size, - * otherwise it's an empty file and will always be ENXIO. + * In the generic case the entire file is data. So all offsets + * less than the file size are valid. */ - if (offset != 0 || inode->i_size == 0) + if (inode->i_size == 0 || offset >= inode->i_size) return -ENXIO; break; case SEEK_HOLE: @@ -78,7 +77,7 @@ generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin) * There is a virtual hole at the end of the file, so as long as * offset isn't i_size or larger, return i_size. */ - if (offset >= inode->i_size) + if (inode->i_size == 0 || offset >= inode->i_size) return -ENXIO; offset = inode->i_size; break; -- 1.7.4.1
ocfs2 implements its own llseek() to provide the SEEK_HOLE/SEEK_DATA functionality. SEEK_HOLE sets the file pointer to the start of either a hole or an unwritten (preallocated) extent, that is greater than or equal to the supplied offset. SEEK_DATA sets the file pointer to the start of an allocated extent (not unwritten) that is greater than or equal to the supplied offset. If the supplied offset is on a desired region, then the file pointer is set to it. Offsets greater than or equal to the file size return -ENXIO. Unwritten (preallocated) extents are considered holes because the file system treats reads to such regions in the same way as it does to holes. Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com> --- fs/ocfs2/extent_map.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/extent_map.h | 2 + fs/ocfs2/file.c | 53 ++++++++++++++++++++++++++- 3 files changed, 150 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c index 23457b4..6942c21 100644 --- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -832,6 +832,103 @@ out: return ret; } +int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin) +{ + struct inode *inode = file->f_mapping->host; + int ret; + unsigned int is_last = 0, is_data = 0; + u16 cs_bits = OCFS2_SB(inode->i_sb)->s_clustersize_bits; + u32 cpos, cend, clen, hole_size; + u64 extoff, extlen; + struct buffer_head *di_bh = NULL; + struct ocfs2_extent_rec rec; + + BUG_ON(origin != SEEK_DATA && origin != SEEK_HOLE); + + ret = ocfs2_inode_lock(inode, &di_bh, 0); + if (ret) { + mlog_errno(ret); + goto out; + } + + down_read(&OCFS2_I(inode)->ip_alloc_sem); + + if (inode->i_size == 0 || *offset >= inode->i_size) { + ret = -ENXIO; + goto out_unlock; + } + + if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) { + if (origin == SEEK_HOLE) + *offset = inode->i_size; + goto out_unlock; + } + + clen = 0; + cpos = *offset >> cs_bits; + cend = ocfs2_clusters_for_bytes(inode->i_sb, inode->i_size); + + while (cpos < cend && !is_last) { + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, &hole_size, + &rec, &is_last); + if (ret) { + mlog_errno(ret); + goto out_unlock; + } + + extoff = cpos; + extoff <<= cs_bits; + + if (rec.e_blkno == 0ULL) { + clen = hole_size; + is_data = 0; + } else { + BUG_ON(cpos < le32_to_cpu(rec.e_cpos)); + clen = le16_to_cpu(rec.e_leaf_clusters) - + (cpos - le32_to_cpu(rec.e_cpos)); + is_data = (rec.e_flags & OCFS2_EXT_UNWRITTEN) ? 0 : 1; + } + + if ((!is_data && origin == SEEK_HOLE) || + (is_data && origin == SEEK_DATA)) { + if (extoff > *offset) + *offset = extoff; + goto out_unlock; + } + + if (!is_last) + cpos += clen; + } + + if (origin == SEEK_HOLE) { + extoff = cpos; + extoff <<= cs_bits; + extlen = clen; + extlen <<= cs_bits; + + if ((extoff + extlen) > inode->i_size) + extlen = inode->i_size - extoff; + extoff += extlen; + if (extoff > *offset) + *offset = extoff; + goto out_unlock; + } + + ret = -ENXIO; + +out_unlock: + + brelse(di_bh); + + up_read(&OCFS2_I(inode)->ip_alloc_sem); + + ocfs2_inode_unlock(inode, 0); +out: + if (ret && ret != -ENXIO) + ret = -ENXIO; + return ret; +} + int ocfs2_read_virt_blocks(struct inode *inode, u64 v_block, int nr, struct buffer_head *bhs[], int flags, int (*validate)(struct super_block *sb, diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h index e79d41c..67ea57d 100644 --- a/fs/ocfs2/extent_map.h +++ b/fs/ocfs2/extent_map.h @@ -53,6 +53,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno, int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 map_start, u64 map_len); +int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin); + int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, u32 *p_cluster, u32 *num_clusters, struct ocfs2_extent_list *el, diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index cce8c2b..5349a4b 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2572,6 +2572,55 @@ bail: return ret; } +/* Refer generic_file_llseek_unlocked() */ +static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int origin) +{ + struct inode *inode = file->f_mapping->host; + int ret = 0; + + mutex_lock(&inode->i_mutex); + + switch (origin) { + case SEEK_END: + offset += inode->i_size; + break; + case SEEK_CUR: + if (offset == 0) { + offset = file->f_pos; + goto out; + } + offset += file->f_pos; + break; + case SEEK_DATA: + case SEEK_HOLE: + ret = ocfs2_seek_data_hole_offset(file, &offset, origin); + if (ret) + goto out; + break; + default: + ret = -EINVAL; + goto out; + } + + if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) + ret = -EINVAL; + if (!ret && offset > inode->i_sb->s_maxbytes) + ret = -EINVAL; + if (ret) + goto out; + + if (offset != file->f_pos) { + file->f_pos = offset; + file->f_version = 0; + } + +out: + mutex_unlock(&inode->i_mutex); + if (ret) + return ret; + return offset; +} + const struct inode_operations ocfs2_file_iops = { .setattr = ocfs2_setattr, .getattr = ocfs2_getattr, @@ -2594,7 +2643,7 @@ const struct inode_operations ocfs2_special_file_iops = { * ocfs2_fops_no_plocks and ocfs2_dops_no_plocks! */ const struct file_operations ocfs2_fops = { - .llseek = generic_file_llseek, + .llseek = ocfs2_file_llseek, .read = do_sync_read, .write = do_sync_write, .mmap = ocfs2_mmap, @@ -2642,7 +2691,7 @@ const struct file_operations ocfs2_dops = { * the cluster. */ const struct file_operations ocfs2_fops_no_plocks = { - .llseek = generic_file_llseek, + .llseek = ocfs2_file_llseek, .read = do_sync_read, .write = do_sync_write, .mmap = ocfs2_mmap, -- 1.7.4.1
On Wed, May 18, 2011 at 07:44:42PM -0700, Sunil Mushran wrote:> It is improved since the last post. It runs cleanly on zfs, ocfs2 and ext3 > (default behavior). Users testing on zfs will need to flip the values of > SEEK_HOLE/DATA.sounds like we should switch the around, just to cause the least amount of confusion.
Apparently Analagous Threads
- SEEK_DATA/HOLE on ocfs2 - v2
- FIEMAP problem
- Re: [PATCH nbdkit v5 FINAL 15/19] file: Implement extents.
- [Bug 8918] New: Use fiemap to quickly detect zero ranges of source file
- [PATCH] Btrfs: return EUCLEAN rather than ENXIO once internal error has occurred for SEEK_DATA/SEEK_HOLE inquiry