Peter Wu
2014-Aug-10 20:19 UTC
[Libguestfs] New Python API? (was: Re: About the return value of value_value)
(renaming subject as I am partially getting off-topic)

On Sunday 10 August 2014 16:26:07 Richard W.M. Jones wrote:
> > The next issue I see now is about the value_value function. This is
> > briefly documented as: "return data length, data type and data of a
> > value".
> >
> > For Perl, Python and OCaml, this is not true. A tuple is returned
> > for both without the length (as this can be calculated from the data
> > value). Ruby is the outlier here that uses a dictionary with three
> > keys. I am not familiar with Ruby and neither do I know Ruby users of
> > hivex.
> >
> > The documentation should likely be fixed to exclude the length, but
> > what about the Ruby API? Is it correct or should a documentation
> > note be added that Ruby differs
>
> Note that the documentation applies to the C API (where length is
> returned). Since the same generated documentation is used for the
> other languages too, that can make it a bit patchy.

The Python documentation is scarce on the types of the various parameters
and return values. Moreover, it states "Read the hivex(3) man page to find
out how to use the API." Perhaps a second API should be created that is
more pythonic (read: easier to use)?
I mean, right now you have to use this (with some patches[0][1], also
available in git[2]):

    import hivex
    from hivex.hive_types import *

    h = hivex.Hivex("system", write=True)
    ccs_name = "ControlSet001"
    ccs = h.node_get_child(h.root(), ccs_name)
    services = h.node_get_child(ccs, "Services")
    svc_viostor = h.node_get_child(services, "viostor")
    start_id = h.node_get_value(svc_viostor, "Start")
    #node_type, node_value = h.value_value(start_id)
    dword_value = h.value_dword(start_id)
    if dword_value != 4:
        new_value = {
            "key": "Start",
            # constant from hivex.hive_types
            "t": REG_DWORD,
            # alternative of b'\4\0\0\0'
            "value": 4
        }
        h.node_set_value(svc_viostor, new_value)
    h.commit()

It would be great if something like this could be done instead:

    import hivex

    hive = hivex.Hivex2("system", write=True)
    ccs_name = "ControlSet001"
    svc_viostor = hive.root()[ccs_name].Services.viostor

    if svc_viostor.Start != 4:
        # Automatically detect that int '4' is a DWORD
        svc_viostor.Start = 4

    hive.commit()

I (ab)use the __getattr__ methods of an object to allow this kind of
modification. See also the RegistryHandle helper class at
https://github.com/Lekensteyn/qemu-tools/blob/master/vbox-to-qemu.py
(_import_callback at line 216 may also be interesting).

This is a quick implementation with not much thought put into it, but what
do you think of the idea of offering an easier API next to the current one?

In the current implementation, Python 3 bytes (Python 2 strings) are
treated as plain bytes(*). That is fine, but Unicode is not handled
correctly. This might also be an opportunity to treat Unicode strings as
UTF-16 (LE) strings which must be nul-terminated. So u'Bar' should become
b'B\0a\0r\0\0\0'.

(*) Actually, Hivex 1.3.10 is broken in Python 3: it tries to convert all
strings from UTF-8 to bytes, which does not work for UTF-16 strings, and
it segfaults on other input[0].

> In Ruby it seems as if the length could be calculated from the string.
> On the other hand, I'm not sure there is any point in intentionally
> removing the length from the return value, as that might break callers
> for no particular reason.
>
> The best plan here is probably to add a note to the Ruby documentation
> for RLenTypeVal saying what the hash contains on Ruby.

... and mention that all other language bindings return a tuple / list /
array with just two elements as the length can be found from the value?

Kind regards,
Peter
https://lekensteyn.nl

[0]: https://www.redhat.com/archives/libguestfs/2014-August/msg00050.html
[1]: https://www.redhat.com/archives/libguestfs/2014-August/msg00053.html
[2]: https://github.com/Lekensteyn/hivex/compare/master...develop
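
The attribute-style access proposed in this message can be sketched roughly
as follows. This is only an illustration under stated assumptions, not the
patch series or the RegistryHandle class referenced above: FakeHive is a
hypothetical in-memory stand-in whose method names mirror the low-level
calls used in the message (root, node_get_child, node_get_value,
value_dword, node_set_value), and Key and add_child are made-up names for
the example. Only DWORD values are handled.

```python
REG_DWORD = 4  # Windows registry type number for a 32-bit integer


class FakeHive:
    """Hypothetical in-memory stand-in for a low-level hive handle."""

    def __init__(self):
        # node id -> {"children": {name: node id}, "values": {key: int}}
        self.nodes = {0: {"children": {}, "values": {}}}

    def root(self):
        return 0

    def add_child(self, parent, name):
        # Helper for building the example tree; not a low-level API call.
        node = len(self.nodes)
        self.nodes[node] = {"children": {}, "values": {}}
        self.nodes[parent]["children"][name] = node
        return node

    def node_get_child(self, node, name):
        return self.nodes[node]["children"].get(name)

    def node_get_value(self, node, key):
        # The value "handle" in this stand-in is just a (node, key) pair.
        return (node, key) if key in self.nodes[node]["values"] else None

    def value_dword(self, value):
        node, key = value
        return self.nodes[node]["values"][key]

    def node_set_value(self, node, val):
        self.nodes[node]["values"][val["key"]] = val["value"]


class Key:
    """Attribute-style view of one registry key (hypothetical helper)."""

    def __init__(self, hive, node):
        # Bypass our own __setattr__ when storing internal state.
        object.__setattr__(self, "_hive", hive)
        object.__setattr__(self, "_node", node)

    def __getitem__(self, name):
        child = self._hive.node_get_child(self._node, name)
        if child is None:
            raise KeyError(name)
        return Key(self._hive, child)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._hive.node_get_child(self._node, name)
        if child is not None:
            return Key(self._hive, child)
        value = self._hive.node_get_value(self._node, name)
        if value is None:
            raise AttributeError(name)
        return self._hive.value_dword(value)

    def __setattr__(self, name, number):
        # Refuse anything that does not fit in a DWORD.
        if not isinstance(number, int) or not 0 <= number <= 0xFFFFFFFF:
            raise TypeError("only DWORD-sized ints are handled in this sketch")
        self._hive.node_set_value(
            self._node, {"key": name, "t": REG_DWORD, "value": number})


# Build a tiny tree, then use it the way the message proposes.
hive = FakeHive()
ccs = hive.add_child(hive.root(), "ControlSet001")
services = hive.add_child(ccs, "Services")
viostor = hive.add_child(services, "viostor")
hive.node_set_value(viostor, {"key": "Start", "t": REG_DWORD, "value": 0})

svc_viostor = Key(hive, hive.root())["ControlSet001"].Services.viostor
if svc_viostor.Start != 4:
    svc_viostor.Start = 4
```

Because the wrapper only calls methods of the handle it is given, the
low-level object stays untouched, and a subkey that shares a name with a
value wins the lookup in this sketch.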
Richard W.M. Jones
2014-Aug-10 21:18 UTC
Re: [Libguestfs] New Python API? (was: Re: About the return value of value_value)
On Sun, Aug 10, 2014 at 10:19:39PM +0200, Peter Wu wrote:
> The Python documentation is scarce on the types of the various parameters and
> return values. Moreover, it states
>
> "Read the hivex(3) man page to find out how to use the API."
>
> Perhaps a second API should be created that is more pythonic (read:
> easier to use)?

One (negative) thing we learned doing libvirt is that unless you
generate the language bindings and C API together, the language
bindings inevitably get out of date, or (worse) contain non-systematic
errors which are difficult to discover and correct.

Therefore you're welcome to create a more Pythonic hivex API either on
top of the existing Python API or talking directly to C, but we
couldn't accept it upstream (well, unless it was fully generated and
included in generator.ml, but that seems unlikely to be possible).

Having said that ...

> hive = hivex.Hivex2("system", write=True)
> ccs_name = "ControlSet001"
> svc_viostor = hive.root()[ccs_name].Services.viostor
> if svc_viostor.Start != 4:
>     # Automatically detect that int '4' is a DWORD
>     svc_viostor.Start = 4
> hive.commit()

... a possible exception would be if it just involves adding some
extra code to the existing hivex.py file, e.g. adding just the extra
classes with __setattr__ and __getattr__ functions.

> I (ab)use the __getattr__ methods of an object to allow this kind of
> modification. See also the RegistryHandle helper class at
> https://github.com/Lekensteyn/qemu-tools/blob/master/vbox-to-qemu.py
> (_import_callback at line 216 may also be interesting)

Noted. If this could be added to the existing hivex.py ...

[...]

> In the current implementation, Python 3 bytes (Python 2 strings) are treated
> as plain bytes(*). That is fine, but Unicode is not handled correctly. This
> might also be an opportunity to treat Unicode strings as UTF-16 (LE) strings
> which must be nul-terminated. So u'Bar' should become b'B\0a\0r\0\0\0'.

It's worth saying that the encoding in the registry itself is not always
UTF-16LE. It's sometimes UTF-8, ASCII or (in a case I found last
week) an NLS like ISO-8859-1 or Big5. Essentially the consuming app
always has to know what encoding to use. Doing "clever" stuff in the
bindings is therefore almost always going to be wrong in some case.
(This is also why the C functions like hivex_value_string are
deprecated).

> (*) Actually, Hivex 1.3.10 is broken in Python 3 and tries to convert all
> strings from UTF-8 to bytes and segfaults on other input which does not work
> for UTF-16 strings[0].

> > In Ruby it seems as if the length could be calculated from the string.
> > On the other hand, I'm not sure there is any point in intentionally
> > removing the length from the return value, as that might break callers
> > for no particular reason.
> >
> > The best plan here is probably to add a note to the Ruby documentation
> > for RLenTypeVal saying what the hash contains on Ruby.
>
> ... and mention that all other language bindings return a tuple /
> list / array with just two elements as the length can be found from
> the value?

Yup.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
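
The point about encodings can be illustrated with plain Python, without
hivex at all. The sketch below assumes the caller names the encoding
explicitly; encode_string and decode_string are hypothetical helper names,
not part of any binding. It reproduces the b'B\0a\0r\0\0\0' example from
the thread and shows that one and the same byte string reads as different
text under Big5 and ISO-8859-1, so the bytes alone do not reveal which
encoding was meant.

```python
def encode_string(text, encoding):
    """Encode text and append one NUL character in the same encoding."""
    return text.encode(encoding) + "\0".encode(encoding)


def decode_string(raw, encoding):
    """Decode raw bytes and drop everything from the first NUL onwards."""
    return raw.decode(encoding).split("\0", 1)[0]


# UTF-16LE with terminator, as in the example from the thread.
bar_utf16 = encode_string("Bar", "utf-16-le")
bar_roundtrip = decode_string(bar_utf16, "utf-16-le")

# The same bytes interpreted under two of the encodings mentioned above.
raw = "\u4e2d".encode("big5")           # one CJK character stored as Big5
as_big5 = decode_string(raw, "big5")    # the original character
as_latin1 = decode_string(raw, "iso-8859-1")  # two unrelated characters
```

Neither decoding raises an error, which is why a binding that guesses the
encoding cannot detect when it has guessed wrong.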
On Sunday 10 August 2014 22:18:04 Richard W.M. Jones wrote:
> On Sun, Aug 10, 2014 at 10:19:39PM +0200, Peter Wu wrote:
> > The Python documentation is scarce on the types of the various parameters
> > and return values. Moreover, it states
> >
> > "Read the hivex(3) man page to find out how to use the API."
> >
> > Perhaps a second API should be created that is more pythonic (read:
> > easier to use)?
>
> One (negative) thing we learned doing libvirt is that unless you
> generate the language bindings and C API together, the language
> bindings inevitably get out of date, or (worse) contain non-systematic
> errors which are difficult to discover and correct.
>
> Therefore you're welcome to create a more Pythonic hivex API either on
> top of the existing Python API or talking directly to C, but we
> couldn't accept it upstream (well, unless it was fully generated and
> included in generator.ml, but that seems unlikely to be possible).

I was thinking of basing the more Pythonic API on top of the current
hivex.Hivex class, not adding more functionality to that wrapper. If
someone would like to create a broken registry, (s)he then has the full
power of the low-level API. If on the other hand one is looking for a way
to access a registry without breaking it, a friendlier API would be nice:
something that prevents a programmer from writing 1 byte to a DWORD type,
for example, and that makes traversing registry keys easier (as
demonstrated before).

Would there be interest in including such an API in hivex? Since it only
uses the existing Python methods, it should not be able to break unless a
change also breaks other programs relying on those methods.

> Having said that ...
>
> > hive = hivex.Hivex2("system", write=True)
> > ccs_name = "ControlSet001"
> > svc_viostor = hive.root()[ccs_name].Services.viostor
> >
> > if svc_viostor.Start != 4:
> >     # Automatically detect that int '4' is a DWORD
> >     svc_viostor.Start = 4
> >
> > hive.commit()
>
> ... a possible exception would be if it just involves adding some
> extra code to the existing hivex.py file, e.g. adding just the extra
> classes with __setattr__ and __getattr__ functions.

Yes, the low-level binding is left intact, it's just a new Hivex2 class
that is being added. No more changes are needed in libhivexmod.

> > In the current implementation, Python 3 bytes (Python 2 strings) are
> > treated as plain bytes(*). That is fine, but Unicode is not handled
> > correctly. This might also be an opportunity to treat Unicode strings as
> > UTF-16 (LE) strings which must be nul-terminated. So u'Bar' should become
> > b'B\0a\0r\0\0\0'.
>
> It's worth saying that encoding in the registry itself is not always
> UTF-16LE. It's sometimes UTF-8, ASCII or (in a case I found last
> week) an NLS like ISO-8859-1 or Big5. Essentially the consuming app
> always has to know what encoding to use. Doing "clever" stuff in the
> bindings is therefore almost always going to be wrong in some case.
> (This is also why the C functions like hivex_value_string are
> deprecated).

When doing a registry export (.reg), all strings like "Key"="Value" appear
to be UTF-16 strings. Trying to push a UTF-8 string into the registry
results in Chinese characters (the bytes being read as UTF-16?). Could you
confirm/reject this against the exports of your keys? Also, when the
trailing NUL byte is missing in the services values, a BSOD can be
observed.

If it is necessary to support other encodings, it may be worth adding a
class to wrap the encoding, (type?) and value:

    UTF_16_LE = "utf-16-le"

    class RegistryString(object):
        def __init__(self, type, value, encoding=UTF_16_LE):
            self.type = type
            self.text = value
            self.encoding = encoding

        def value(self):
            # Encoded text plus a NUL terminator in the same encoding
            return self.text.encode(self.encoding) + u"\0".encode(self.encoding)

(maybe introduce a wrapper function for this to avoid long lines)

Strings are always NUL-terminated, right? I recall reading something like
that in the MSDN documentation.

Kind regards,
Peter
https://lekensteyn.nl
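
Two of the points in this message can be sketched in a few lines of
standalone Python. The helper names dword_value and string_value are
hypothetical; the dictionary layout with "key", "t" and "value" follows
the node_set_value example earlier in the thread, and storing "value" as
bytes is an assumption of this sketch. The first helper makes it
impossible to build a DWORD that is not exactly four bytes long; the last
lines show what happens when UTF-8 bytes are read back as UTF-16LE, which
matches the "Chinese characters" observation.

```python
import struct

REG_SZ = 1      # Windows registry type number for a string
REG_DWORD = 4   # Windows registry type number for a 32-bit integer


def dword_value(key, number):
    """Build a value dict whose data is always exactly 4 bytes."""
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError("%r does not fit in a DWORD" % (number,))
    # "<I" packs an unsigned 32-bit integer in little-endian order.
    return {"key": key, "t": REG_DWORD, "value": struct.pack("<I", number)}


def string_value(key, text, encoding="utf-16-le"):
    """Build a string value dict with a NUL terminator in the same encoding."""
    data = text.encode(encoding) + "\0".encode(encoding)
    return {"key": key, "t": REG_SZ, "value": data}


start = dword_value("Start", 4)
name = string_value("DisplayName", "Bar")

# UTF-8 bytes misread as UTF-16LE: each pair of ASCII bytes fuses into
# one 16-bit code unit, which lands in the CJK range for typical letters.
mojibake = "Hell".encode("utf-8").decode("utf-16-le")
```

Here "He" becomes U+6548 and "ll" becomes U+6C6C, both CJK ideographs,
even though no Chinese text was ever written.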