Yonatan Shtarkman
2023-Feb-12 13:31 UTC
[Libguestfs] Issue with downloading files whose path contains multi-byte utf-8 characters
Hey, When downloading a file whose path contains multi-byte utf-8, libguestfs sometimes crashes. This reproduces when using python, and not when using guestfish. Code to reproduce: for i in range(2000): g.download ('/xxx?', '/tmp/1') #0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007ffff7fac140 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>) at /usr/include/python3.9/object.h:408 #3 guestfs_int_py_event_callback_wrapper (g=<optimized out>, flags=<optimized out>, array_len=0, array=0x0, buf_len=47, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by", event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137 #4 guestfs_int_py_event_callback_wrapper (g=<optimized out>, callback=0x7ffff2516600, event=16, event_handle=0, flags=<optimized out>, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by", buf_len=47, array=0x0, array_len=0) at handle.c:104 #5 0x00007ffff6e0076a in guestfs_int_call_callbacks_message (g=0xf31290, event=16, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by", buf_len=47) at events.c:117 #6 0x00007ffff6e1702e in guestfs_int_log_message_callback (g=g at entry=0xf31290, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by", len=len at entry=47) at proto.c:145 #7 0x00007ffff6dfb759 in handle_log_message (g=g at entry=0xf31290, conn=conn at entry=0x110e280) at conn-socket.c:395 #8 0x00007ffff6dfbd63 in read_data (len=4, bufv=<optimized out>, connv=<optimized out>, g=<optimized out>) at conn-socket.c:179 #9 read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>, len=4) at conn-socket.c:142 #10 0x00007ffff6e1764a in recv_from_daemon (buf_rtn=0x7fffffffd858, size_rtn=0x7fffffffd854, g=0xf31290) at proto.c:545 #11 guestfs_int_recv_from_daemon (g=g at entry=0xf31290, size_rtn=size_rtn at entry=0x7fffffffd854, buf_rtn=buf_rtn at entry=0x7fffffffd858) at proto.c:623 #12 0x00007ffff6e17a5a in guestfs_int_recv (g=g at entry=0xf31290, fn=fn at entry=0x7ffff6e3b3e8 "download", hdr=hdr at entry=0x7fffffffd920, err=err at entry=0x7fffffffd8f0, xdrp=xdrp at entry=0x0, ret=ret at entry=0x0) at proto.c:668 I debugged this issue and noticed that the appliance logs from commandrvf are truncated, leading to parse failure (missing utf-8 additional bytes): https://github.com/libguestfs/libguestfs/blob/master/python/handle.c#L134 UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://listman.redhat.com/archives/libguestfs/attachments/20230212/973ad8ab/attachment.htm>
Richard W.M. Jones
2023-Feb-13 18:07 UTC
[Libguestfs] Issue with downloading files whose path contains multi-byte utf-8 characters
On Sun, Feb 12, 2023 at 03:31:08PM +0200, Yonatan Shtarkman wrote:> Hey, > When downloading a file whose path contains multi-byte utf-8, libguestfs > sometimes crashes. > This reproduces when using python, and not when using guestfish. > > Code to reproduce: > for i in range(2000): > ? ? g.download ('/xxx?', '/tmp/1')'i' is not used inside the loop? Or is this error intermittent?> #0 ?raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 ?0x00007ffff7fac140 in <signal handler called> () at /lib/x86_64-linux-gnu/ > libpthread.so.0 > #2 ?0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>) at /usr/include/ > python3.9/object.h:408 > #3 ?guestfs_int_py_event_callback_wrapper > ? ? (g=<optimized out>, flags=<optimized out>, array_len=0, array=0x0, buf_len> 47, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by", > event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137 > #4 ?guestfs_int_py_event_callback_wrapper > ? ? (g=<optimized out>, callback=0x7ffff2516600, event=16, event_handle=0, > flags=<optimized out>, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug > settle -E \303by", buf_len=47, array=0x0, array_len=0) at handle.c:104 > #5 ?0x00007ffff6e0076a in guestfs_int_call_callbacks_message (g=0xf31290, event > =16, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by", > buf_len=47) > ? ? at events.c:117 > #6 ?0x00007ffff6e1702e in guestfs_int_log_message_callback > ? ? (g=g at entry=0xf31290, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug > settle -E \303by", len=len at entry=47) at proto.c:145 > #7 ?0x00007ffff6dfb759 in handle_log_message (g=g at entry=0xf31290, conn> conn at entry=0x110e280) at conn-socket.c:395 > #8 ?0x00007ffff6dfbd63 in read_data (len=4, bufv=<optimized out>, connv> <optimized out>, g=<optimized out>) at conn-socket.c:179 > #9 ?read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>, len=4) at > conn-socket.c:142 > #10 0x00007ffff6e1764a in recv_from_daemon (buf_rtn=0x7fffffffd858, size_rtn> 0x7fffffffd854, g=0xf31290) at proto.c:545 > #11 guestfs_int_recv_from_daemon (g=g at entry=0xf31290, size_rtn=size_rtn at entry> 0x7fffffffd854, buf_rtn=buf_rtn at entry=0x7fffffffd858) at proto.c:623 > #12 0x00007ffff6e17a5a in guestfs_int_recv > ? ? (g=g at entry=0xf31290, fn=fn at entry=0x7ffff6e3b3e8 "download", hdr=hdr at entry> 0x7fffffffd920, err=err at entry=0x7fffffffd8f0, xdrp=xdrp at entry=0x0, ret> ret at entry=0x0) > ? ? at proto.c:668 > > I debugged this issue and noticed that the appliance logs from?commandrvf?are > truncated, leading to parse failure (missing utf-8 additional bytes): > https://github.com/libguestfs/libguestfs/blob/master/python/handle.c#L134 > UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid > start byteSo I thought we'd fixed this in: https://github.com/libguestfs/libguestfs/commit/0ee02e0117527b86a31b2a88a14994ce7f15571f This is specifically a Python API problem or would it affect the C API too? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Possibly Parallel Threads
- Issue with downloading files whose path contains multi-byte utf-8 characters
- Issue with downloading files whose path contains multi-byte utf-8 characters
- [PATCH 0/3] protocol: Abstract out socket operations.
- [PATCH 0/4] Small refactorings of the protocol layer.
- [PATCH v2] Add progress bars