Sam Eiderman
2020-Jun-30 08:53 UTC
Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)
Hey Pino,

Can you search for the previous patches I submitted? I had some
discussions regarding this with Daniel and Nir.

Thanks!

On Tue, Jun 30, 2020 at 11:43 AM Pino Toscano <ptoscano@redhat.com> wrote:

> On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:
> > The python3 bindings create PyUnicode objects from application strings
> > on the guest (i.e. installed rpm, deb packages).
> > It is documented that rpm package fields such as description should be
> > utf8 encoded - however in some cases they are not a valid unicode
> > string, on SLES11 SP4 the encoding of the description of the following
> > packages is latin1 and they fail to be converted to unicode using
> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
>
> Sorry, I wanted to reach our resident Python maintainers to get their
> feedback, and so far had no time for it. Will do it shortly.
>
> BTW do you have a reproducer I can actually try freely?
>
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..fe89dc58a 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
> >  #endif
> >  }
> >
> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  PyObject *s = PyUnicode_FromString (str);
> > +  if (s == NULL) {
> > +    PyErr_Clear ();
> > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
>
> Minor nit: space between "strlen" and the opening bracket.
>
> Also, isn't there any error we can check as a way to detect this
> situation, rather than always attempting to decode it as latin1?
>
> Thanks,
> --
> Pino Toscano
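On the question of whether a specific error can be checked: at the Python level, a failed UTF-8 decode raises UnicodeDecodeError, which records where the offending byte sits. The sketch below only illustrates that behaviour and is not code from the patch. The helper name and the sample description are hypothetical. A C-level analogue would presumably test the pending exception with PyErr_ExceptionMatches (PyExc_UnicodeDecodeError) before clearing it, but that is an assumption about how the patch could be refined, not something proposed in the thread.

```python
from typing import Optional, Tuple

# Hypothetical sample: a description stored as Latin-1 rather than UTF-8.
# The byte 0xE8 ("è" in Latin-1) is followed by an ASCII byte, so the
# sequence is not valid UTF-8.
LATIN1_DESCRIPTION = "Bibliothèque".encode("latin-1")


def find_utf8_failure(raw: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first undecodable byte, or None.

    Hypothetical helper: it only shows that the failure is reported as a
    UnicodeDecodeError carrying the position of the bad byte.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte the codec rejected.
        return err.start, raw[err.start]
    return None


if __name__ == "__main__":
    print(find_utf8_failure(LATIN1_DESCRIPTION))
    print(find_utf8_failure("Bibliothèque".encode("utf-8")))
```

Catching only this exception type, instead of any failure, would leave unrelated errors (for example memory errors) visible to the caller.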
Sam Eiderman
2020-Jun-30 08:58 UTC
Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)
Regarding reproducing this - if possible, simply install SLES11 SP4 from CD.
I'm not sure how easy it will be for you nowadays since SUSE just removed
SLES11 from their Downloads page.

Sam

On Tue, Jun 30, 2020 at 11:53 AM Sam Eiderman <sameid@google.com> wrote:

> Hey Pino,
>
> Can you search for the previous patches I submitted? I had some
> discussions regarding this with Daniel and Nir.
>
> Thanks!
>
> On Tue, Jun 30, 2020 at 11:43 AM Pino Toscano <ptoscano@redhat.com> wrote:
>
>> On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:
>> > The python3 bindings create PyUnicode objects from application strings
>> > on the guest (i.e. installed rpm, deb packages).
>> > It is documented that rpm package fields such as description should be
>> > utf8 encoded - however in some cases they are not a valid unicode
>> > string, on SLES11 SP4 the encoding of the description of the following
>> > packages is latin1 and they fail to be converted to unicode using
>> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
>>
>> Sorry, I wanted to reach our resident Python maintainers to get their
>> feedback, and so far had no time for it. Will do it shortly.
>>
>> BTW do you have a reproducer I can actually try freely?
>>
>> > diff --git a/python/handle.c b/python/handle.c
>> > index 2fb8c18f0..fe89dc58a 100644
>> > --- a/python/handle.c
>> > +++ b/python/handle.c
>> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
>> >  #if PY_MAJOR_VERSION < 3
>> >    return PyString_FromString (str);
>> >  #else
>> > -  return PyUnicode_FromString (str);
>> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
>> >  #endif
>> >  }
>> >
>> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
>> >  #if PY_MAJOR_VERSION < 3
>> >    return PyString_FromStringAndSize (str, size);
>> >  #else
>> > -  return PyUnicode_FromStringAndSize (str, size);
>> > +  PyObject *s = PyUnicode_FromString (str);
>> > +  if (s == NULL) {
>> > +    PyErr_Clear ();
>> > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
>>
>> Minor nit: space between "strlen" and the opening bracket.
>>
>> Also, isn't there any error we can check as a way to detect this
>> situation, rather than always attempting to decode it as latin1?
>>
>> Thanks,
>> --
>> Pino Toscano
Pino Toscano
2020-Jun-30 09:00 UTC
Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)
On Tuesday, 30 June 2020 10:53:54 CEST Sam Eiderman wrote:
> Hey Pino,
>
> Can you search for the previous patches I submitted? I had some discussions
> regarding this with Daniel and Nir.

Sure, I did read those, and I took it into account. What I said does not
invalidate nor contradict that.

--
Pino Toscano
Sam Eiderman
2020-Jun-30 09:10 UTC
Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)
I see, well the problem is that for some reason SUSE11 did not encode some
of their packages as UTF8 but rather used Latin-1.

There are multiple possible solutions here:

1. Do not decode application description as a string, but rather as a byte
   array
   - I am not sure about bindings other than Python; in C everything is a
     byte array anyway (as long as it is null terminated)
   - This will make this field less usable and require more decoding to be
     done by the user
2. Fallback decode "application description" specifically as latin1
   (something closer to the first patch I submitted)
3. Fallback decode every string returned to Python as latin1 from any
   libguestfs API - the current patch

Sam

On Tue, Jun 30, 2020 at 12:00 PM Pino Toscano <ptoscano@redhat.com> wrote:
>
> On Tuesday, 30 June 2020 10:53:54 CEST Sam Eiderman wrote:
> > Hey Pino,
> >
> > Can you search for the previous patches I submitted? I had some discussions
> > regarding this with Daniel and Nir.
>
> Sure, I did read those, and I took it into account. What I said does not
> invalidate nor contradict that.
>
> --
> Pino Toscano
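The trade-off between the options listed in this message can be sketched in Python roughly as follows. This is a minimal illustration, not libguestfs code: the function names and the sample bytes are hypothetical, and the real change lives in the C helpers in python/handle.c. Options 2 and 3 share the same decode logic and differ only in where it is applied (one field versus every string returned to Python).

```python
# Hypothetical sample: a package description stored as Latin-1 on the guest.
RAW_DESCRIPTION = "Bibliothèque".encode("latin-1")


def description_as_bytes(raw: bytes) -> bytes:
    """Option 1: return the field undecoded; the caller must decode it."""
    return raw


def decode_with_latin1_fallback(raw: bytes) -> str:
    """Options 2 and 3: try UTF-8 first, then fall back to Latin-1.

    Latin-1 maps every byte value 0x00-0xFF to a code point, so the
    fallback itself cannot fail. The result may still be the wrong text
    if the data was really in some third encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


if __name__ == "__main__":
    print(description_as_bytes(RAW_DESCRIPTION))
    print(decode_with_latin1_fallback(RAW_DESCRIPTION))
    # Valid UTF-8 input is unaffected by the fallback path.
    print(decode_with_latin1_fallback("Bibliothèque".encode("utf-8")))
```

Because the Latin-1 fallback always succeeds, applying it to every string (option 3) means callers no longer see a decode error at all, which is the behaviour change the review questions are probing.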