Displaying 6 results from an estimated 6 matches for "ckpt".
2018 May 23
0
Rebalance state stuck or corrupted
...55.777149] I [MSGID: 109022]
[dht-rebalance.c:1703:dht_migrate_file] 0-gv0-dht: completed migration of
/pnrsy/v-zhli2/generated/ende_with_teacher/model/translate_ende_wmt32k_distill/transformer_nat-transformer_nat_base_v1-id016_lr0.1_4000_reg5.0_neighbor_hinge0.5_exp_distill_2.0_no_average_kl/model.ckpt-50724.data-00002-of-00003
from subvolume gv0-replicate-0 to gv0-replicate-3
[2018-05-23 17:32:55.782048] W [dht-rebalance.c:2826:gf_defrag_process_dir]
0-gv0-dht: Found error from gf_defrag_get_entry
[2018-05-23 17:32:55.782358] E [MSGID: 109111]
[dht-rebalance.c:3123:gf_defrag_fix_layout] 0-gv0-dh...
2010 Apr 12
0
ocfs2/o2cb problem with openais/pacemaker
...id: 569559765
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
569559765 is now known as app1a.xlhost.de
1271072439 setup_stack at 168: Cluster connection established. Local node
id: 569559765
1271072439 setup_stack at 172: Added Pacemaker as client 1 with fd 5
1271072439 setup_ckpt at 609: Initializing CKPT service (try 1)
1271072439 setup_ckpt at 615: Connected to CKPT service with handle
0x327b23c600000000
1271072439 call_ckpt_open at 160: Opening checkpoint
"ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld:...
2017 Nov 16
0
Missing files on one of the bricks
...-e hex
/mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore\
architecture\ capacity/Explore\ architecture\
capacity\(projection_size\=32\;mixing_depth\=0\;num_filters\=64\;filter_size\=3\;block_depth\=3\)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
getfattr: Removing leading '/' from absolute path names
# file:
mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore
architecture capacity/Explore ar...
2017 Nov 16
2
Missing files on one of the bricks
On 11/16/2017 04:12 PM, Nithya Balachandran wrote:
>
>
> On 15 November 2017 at 19:57, Frederic Harmignies
> <frederic.harmignies at elementai.com> wrote:
>
> Hello, we have two files that are missing from one of the bricks.
> We have no idea how to fix this.
>
> Details:
>
> # gluster volume
2009 Apr 08
1
ocfs2_controld.cman
If I start ocfs2_controld.cman in parallel on a few nodes, only one of them
starts up; the others exit with one of these errors:
call_section_read at 370: Reading from section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
call_section_read at 387: Checkpoint "ocfs2:controld" does not have a section named "daemon_protocol"
call_section_read at
2013 Dec 10
4
Structure needs cleaning on some files
Hi All,
When reading some files we get this error:
md5sum: /path/to/file.xml: Structure needs cleaning
In /var/log/glusterfs/mnt-sharedfs.log we see these errors:
[2013-12-10 08:07:32.256910] W
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote
operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W
[client-rpc-fops.c:526:client3_3_stat_cbk]