Benjamin Kingston
2017-May-12 08:35 UTC
[Gluster-users] Reliability issues with Gluster 3.10 and shard
Hello all,

I'm trying to take advantage of the shard xlator, but I've found it causes a number of issues that I hope are easily resolvable:

1) Large file operations work well (e.g. copying a file from folder a to folder b).
2) Seek operations and list operations frequently fail (ls on a directory, reading bytes xyz at offset 235567).

Turning off the shard feature resolves this for new files created in the volume. The volume is mounted using the gluster FUSE mount.
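For concreteness, here is roughly how I hit the failures; the server name, mount point, and file paths below are just examples:

    # mount the volume with the gluster FUSE client
    mount -t glusterfs server1:/storage2 /mnt/storage2

    # directory listings frequently fail on sharded files
    ls -l /mnt/storage2/somedir

    # offset reads also fail intermittently; dd seeks in units of bs,
    # so this reads 4096 bytes starting at byte offset 235567
    dd if=/mnt/storage2/somedir/bigfile of=/dev/null bs=1 skip=235567 count=4096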
Here are my volume settings; please let me know if there are changes I can make:

Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm diff
cluster.eager-lock enable
disperse.eager-lock on
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local on
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 1GB
performance.io-thread-count 64
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 32
performance.least-prio-threads 1
performance.enable-least-priority on
performance.cache-size 1GB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 2GB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs on
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio disable
client.event-threads 3
network.ping-timeout 42
network.tcp-window-size (null)
network.inode-lru-limit 90000
auth.allow *
auth.reject (null)
transport.keepalive on
server.allow-insecure on
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout 10
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 3
ssl.own-cert (null)
ssl.private-key (null)
ssl.ca-list (null)
ssl.crl-path (null)
ssl.certificate-depth (null)
ssl.cipher-list (null)
ssl.dh-param (null)
ssl.ec-curve (null)
transport.address-family inet6
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache on
performance.quick-read off
performance.open-behind on
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.quota-timeout 0
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off

-ben
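For reference, the shard toggle mentioned above is just the following; the volume name is an example, and note that disabling shard only affects files created afterwards, since existing files keep their shards:

    gluster volume set storage2 features.shard off

The same full option dump can be pulled on any volume with:

    gluster volume get storage2 all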
Benjamin Kingston
2017-May-13 06:46 UTC
[Gluster-users] Fwd: Reliability issues with Gluster 3.10 and shard
Hello all,

I'm trying to take advantage of the shard xlator, but I've found it causes a number of issues that I hope are easily resolvable:

1) Large file operations work well (e.g. copying a file from folder a to folder b).
2) Seek operations and list operations frequently fail (ls on a directory, reading bytes xyz at offset 235567).
3) Samba shares through samba-vfs show all files as 4MB. I've also seen this when mounting with FUSE; nfs-ganesha, however, always reflects the correct file sizes.

Turning off the shard feature resolves this for new files created in the volume. The volume is mounted using the gluster FUSE mount.

Here are my volume settings; please let me know if there are changes I can make:

Volume Name: storage2
Type: Distributed-Replicate
Volume ID: adaabca5-25ed-4e7f-ae86-2f20fc0143a8
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: fd00:0:0:3::6:/mnt/gluster/storage/brick0/glusterfs2
Brick2: fd00:0:0:3::8:/mnt/gluster/storage/brick0/glusterfs2
Brick3: fd00:0:0:3::10:/mnt/gluster/storage/brick0/glusterfs (arbiter)
Brick4: fd00:0:0:3::6:/mnt/gluster/storage/brick1/glusterfs2
Brick5: fd00:0:0:3::8:/mnt/gluster/storage/brick1/glusterfs2
Brick6: fd00:0:0:3::10:/mnt/gluster/storage/brick1/glusterfs (arbiter)
Brick7: fd00:0:0:3::6:/mnt/gluster/storage/brick2/glusterfs2
Brick8: fd00:0:0:3::8:/mnt/gluster/storage/brick2/glusterfs2
Brick9: fd00:0:0:3::10:/mnt/gluster/storage/brick2/glusterfs (arbiter)
Options Reconfigured:
features.ctr-enabled: on
features.shard-block-size: 4MB
network.inode-lru-limit: 90000
features.cache-invalidation: on
performance.readdir-ahead: on
client.event-threads: 3
performance.cache-ima-xattrs: on
cluster.data-self-heal-algorithm: diff
network.remote-dio: disable
cluster.use-compound-fops: on
cluster.read-freq-threshold: 2
cluster.write-freq-threshold: 2
features.record-counters: on
disperse.shd-max-threads: 4
performance.parallel-readdir: on
performance.client-io-threads: on
server.event-threads: 3
cluster.lookup-optimize: on
performance.open-behind: on
performance.stat-prefetch: on
performance.quick-read: off
performance.io-cache: on
performance.read-ahead: off
performance.write-behind: on
features.scrub: Active
features.bitrot: on
features.leases: on
features.shard: off
transport.address-family: inet6
nfs.disable: on
server.allow-insecure: on
cluster.shd-max-threads: 8
performance.low-prio-threads: 32
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
user.cifs: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.tier-compact: on
storage.linux-aio: on
transport.keepalive: on
performance.write-behind-window-size: 2GB
performance.flush-behind: on
performance.cache-size: 1GB
cluster.choose-local: on
performance.io-thread-count: 64
cluster.brick-multiplex: off
cluster.enable-shared-storage: enable
nfs-ganesha: enable
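In case it helps with reproducing the file-size issue, the mismatch is easy to see from a client; the paths below are examples, and the volume name comes from the dump above:

    # through the FUSE (or samba-vfs) mount, every sharded file reports 4MB
    stat -c '%n %s' /mnt/storage2/somedir/*

    # the configured shard block size, which matches the 4MB the files report
    gluster volume get storage2 features.shard-block-size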