Displaying 6 results from an estimated 6 matches for "ckpt".

2018 May 23
0
Rebalance state stuck or corrupted
...55.777149] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-gv0-dht: completed migration of /pnrsy/v-zhli2/generated/ende_with_teacher/model/translate_ende_wmt32k_distill/transformer_nat-transformer_nat_base_v1-id016_lr0.1_4000_reg5.0_neighbor_hinge0.5_exp_distill_2.0_no_average_kl/model.ckpt-50724.data-00002-of-00003 from subvolume gv0-replicate-0 to gv0-replicate-3 [2018-05-23 17:32:55.782048] W [dht-rebalance.c:2826:gf_defrag_process_dir] 0-gv0-dht: Found error from gf_defrag_get_entry [2018-05-23 17:32:55.782358] E [MSGID: 109111] [dht-rebalance.c:3123:gf_defrag_fix_layout] 0-gv0-dh...
2010 Apr 12
0
ocfs2/o2cb problem with openais/pacemaker
...id: 569559765 ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de 1271072439 setup_stack at 168: Cluster connection established. Local node id: 569559765 1271072439 setup_stack at 172: Added Pacemaker as client 1 with fd 5 1271072439 setup_ckpt at 609: Initializing CKPT service (try 1) 1271072439 setup_ckpt at 615: Connected to CKPT service with handle 0x327b23c600000000 1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld:...
2017 Nov 16
0
Missing files on one of the bricks
...-e hex /mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore\ architecture\ capacity/Explore\ architecture\ capacity\(projection_size\=32\;mixing_depth\=0\;num_filters\=64\;filter_size\=3\;block_depth\=3\)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704 getfattr: Removing leading '/' from absolute path names # file: mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore architecture capacity/Explore ar...
2017 Nov 16
2
Missing files on one of the bricks
On 11/16/2017 04:12 PM, Nithya Balachandran wrote: > On 15 November 2017 at 19:57, Frederic Harmignies <frederic.harmignies at elementai.com> wrote: > > Hello, we have 2x files that are missing from one of the bricks. No idea how to fix this. > > Details: > > # gluster volume
2009 Apr 08
1
ocfs2_controld.cman
If I start ocfs2_controld.cman in parallel on a few nodes, only one of them starts up, the others exit with one of these errors: call_section_read at 370: Reading from section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1) call_section_read at 387: Checkpoint "ocfs2:controld" does not have a section named "daemon_protocol" call_section_read at
2013 Dec 10
4
Structure needs cleaning on some files
Hi All, When reading some files we get this error: md5sum: /path/to/file.xml: Structure needs cleaning in /var/log/glusterfs/mnt-sharedfs.log we see these errors: [2013-12-10 08:07:32.256910] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote operation failed: No such file or directory [2013-12-10 08:07:32.257436] W [client-rpc-fops.c:526:client3_3_stat_cbk]