Instead of malloc'ing a C buffer, nbd_pread()ing into it, then copying
it into an immutable Python bytes object, we can instead pre-create a
correctly-sized Python bytearray object, then nbd_pread() directly
into that object's underlying bytes.

While the data copying might not be the bottleneck compared to the
networking costs, it does have noticeable results; on my machine:

$ export script='
m=1024*1024
size=h.get_size()
for i in range(size // m):
    buf = h.pread(m, m*i)
'
$ time ./run nbdkit -U - pattern 10G --run 'nbdsh -u $uri -c "$script"'

sped up from about 7.8s pre-patch to about 7.1s post-patch,
approximately a 10% speedup.

Note that this slightly changes the python API: h.pread[_structured]
now returns a mutable 'bytearray' object, rather than an immutable
'bytes' object. This makes it possible to modify the just-read string
in place, rather than having to create yet another memory buffer for
any modifications, which offers even more speedups when writing a
read-modify-write paradigm in python. But the change is
backwards-compatible - python already states that a read-write buffer
may be returned instead of readonly for any client that already
operated only on a buffer in a readonly manner.
---

Note that h.pread is more like Python read() semantics in creating a
buffer, while h.aio_pread is more like Python readinto() semantics in
modifying a passed-in buffer. But now that both code paths have a
python object prior to calling into the C API, my next task is to
improve the h.*pread_structured callback function to pass its buffer
as a slice of the Python input buffer, rather than doing yet another
round of pointless memcpy from C into python objects.

 generator/Python.ml | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/generator/Python.ml b/generator/Python.ml
index 4ab18f6..1c4446e 100644
--- a/generator/Python.ml
+++ b/generator/Python.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: Python bindings
- * Copyright (C) 2013-2021 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -293,7 +293,7 @@ let
     | BytesIn (n, _) ->
        pr "  Py_buffer %s = { .obj = NULL };\n" n
     | BytesOut (n, count) ->
-       pr "  char *%s = NULL;\n" n;
+       pr "  PyObject *%s = NULL;\n" n;
        pr "  Py_ssize_t %s;\n" count
     | BytesPersistIn (n, _)
     | BytesPersistOut (n, _) ->
@@ -432,8 +432,8 @@ let
     | Bool _ -> ()
     | BytesIn _ -> ()
     | BytesOut (n, count) ->
-       pr "  %s = malloc (%s);\n" n count;
-       pr "  if (%s == NULL) { PyErr_NoMemory (); goto out; }\n" n
+       pr "  %s = PyByteArray_FromStringAndSize (NULL, %s);\n" n count;
+       pr "  if (%s == NULL) goto out;\n" n
     | BytesPersistIn (n, _) | BytesPersistOut (n, _) ->
        pr "  %s_buf = nbd_internal_py_get_aio_buffer (%s);\n" n n;
        pr "  if (!%s_buf) goto out;\n" n;
@@ -479,7 +479,7 @@ let
     function
     | Bool n -> pr ", %s" n
     | BytesIn (n, _) -> pr ", %s.buf, %s.len" n n
-    | BytesOut (n, count) -> pr ", %s, %s" n count
+    | BytesOut (n, count) -> pr ", PyByteArray_AS_STRING (%s), %s" n count
     | BytesPersistIn (n, _)
     | BytesPersistOut (n, _) -> pr ", %s_buf->data, %s_buf->len" n n
     | Closure { cbname } -> pr ", %s" cbname
@@ -524,8 +524,9 @@ let
   let use_ret = ref true in
   List.iter (
     function
-    | BytesOut (n, count) ->
-       pr "  py_ret = PyBytes_FromStringAndSize (%s, %s);\n" n count;
+    | BytesOut (n, _) ->
+       pr "  py_ret = %s;\n" n;
+       pr "  %s = NULL;\n" n;
        use_ret := false
     | Bool _
     | BytesIn _
@@ -572,7 +573,7 @@ let
     | BytesIn (n, _) ->
        pr "  if (%s.obj)\n" n;
        pr "    PyBuffer_Release (&%s);\n" n
-    | BytesOut (n, _) -> pr "  free (%s);\n" n
+    | BytesOut (n, _) -> pr "  Py_XDECREF (%s);\n" n
     | BytesPersistIn _ | BytesPersistOut _ -> ()
     | Closure { cbname } ->
       pr "  free_user_data (%s_user_data);\n" cbname
-- 
2.36.1
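[Editor's note: the read-modify-write speedup mentioned in the commit
message above can be pictured with a short libnbd Python snippet. This
is only an illustrative sketch, not part of the patch: the NBD URI is
hypothetical, and it assumes the usual libnbd Python entry points
(nbd.NBD, connect_uri, pread, pwrite, shutdown).]

import nbd

# Hypothetical server URI; any writable NBD export would do.
URI = "nbd://localhost"

h = nbd.NBD()
h.connect_uri(URI)

# h.pread() now returns a mutable bytearray, so the just-read block can
# be patched in place instead of being copied into a fresh buffer.
buf = h.pread(512, 0)          # read the first 512-byte block
buf[0:8] = b"new-sig\0"        # modify it in place (8 bytes)
h.pwrite(buf, 0)               # write the modified block back

h.shutdown()

The same object that nbd_pread() filled is what the caller mutates and
hands back to pwrite, so no extra copy sits between the read and the
modification.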
On Fri, May 27, 2022 at 04:44:16PM -0500, Eric Blake wrote:
[...]
>  generator/Python.ml | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)

I've been previously reminded (commit e0953c for example) that showing
the diff of the generated output is also useful:

--- python/methods.c.bak	2022-05-27 17:08:43.035658709 -0500
+++ python/methods.c	2022-05-27 17:08:50.678669864 -0500
@@ -2295,7 +2295,7 @@
   struct nbd_handle *h;
   int ret;
   PyObject *py_ret = NULL;
-  char *buf = NULL;
+  PyObject *buf = NULL;
   Py_ssize_t count;
   uint64_t offset_u64;
   unsigned long long offset; /* really uint64_t */
@@ -2309,19 +2309,20 @@
   h = get_handle (py_h);
   if (!h) goto out;
   flags_u32 = flags;
-  buf = malloc (count);
-  if (buf == NULL) { PyErr_NoMemory (); goto out; }
+  buf = PyByteArray_FromStringAndSize (NULL, count);
+  if (buf == NULL) goto out;
   offset_u64 = offset;
 
-  ret = nbd_pread (h, buf, count, offset_u64, flags_u32);
+  ret = nbd_pread (h, PyByteArray_AS_STRING (buf), count, offset_u64, flags_u32);
   if (ret == -1) {
     raise_exception ();
     goto out;
   }
-  py_ret = PyBytes_FromStringAndSize (buf, count);
+  py_ret = buf;
+  buf = NULL;
 
  out:
-  free (buf);
+  Py_XDECREF (buf);
   return py_ret;
 }
 
@@ -2332,7 +2333,7 @@
   struct nbd_handle *h;
   int ret;
   PyObject *py_ret = NULL;
-  char *buf = NULL;
+  PyObject *buf = NULL;
   Py_ssize_t count;
   uint64_t offset_u64;
   unsigned long long offset; /* really uint64_t */
@@ -2350,8 +2351,8 @@
   h = get_handle (py_h);
   if (!h) goto out;
   flags_u32 = flags;
-  buf = malloc (count);
-  if (buf == NULL) { PyErr_NoMemory (); goto out; }
+  buf = PyByteArray_FromStringAndSize (NULL, count);
+  if (buf == NULL) goto out;
   offset_u64 = offset;
   chunk.user_data = chunk_user_data = alloc_user_data ();
   if (chunk_user_data == NULL) goto out;
@@ -2364,16 +2365,17 @@
   Py_INCREF (py_chunk_fn);
   chunk_user_data->fn = py_chunk_fn;
 
-  ret = nbd_pread_structured (h, buf, count, offset_u64, chunk, flags_u32);
+  ret = nbd_pread_structured (h, PyByteArray_AS_STRING (buf), count, offset_u64, chunk, flags_u32);
   chunk_user_data = NULL;
   if (ret == -1) {
     raise_exception ();
     goto out;
   }
-  py_ret = PyBytes_FromStringAndSize (buf, count);
+  py_ret = buf;
+  buf = NULL;
 
  out:
-  free (buf);
+  Py_XDECREF (buf);
   free_user_data (chunk_user_data);
   return py_ret;
 }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
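[Editor's note: as an aside on the read()/readinto() analogy in the
patch notes above, the two styles look roughly like this from Python.
This is a sketch only; the URI is hypothetical, and the asynchronous
names (nbd.Buffer, aio_pread, aio_command_completed, poll,
to_bytearray) are quoted from memory rather than from the patch, so
check the generated libnbd Python documentation before relying on
them.]

import nbd

h = nbd.NBD()
h.connect_uri("nbd://localhost")     # hypothetical server

# read()-style: h.pread allocates and returns a (now mutable) buffer.
first = h.pread(4096, 0)

# readinto()-style: h.aio_pread fills a caller-supplied nbd.Buffer.
buf = nbd.Buffer(4096)
cookie = h.aio_pread(buf, 4096)
while not h.aio_command_completed(cookie):
    h.poll(-1)                       # wait for the command to finish
second = buf.to_bytearray()

h.shutdown()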
Richard W.M. Jones
2022-May-28 17:16 UTC
[Libguestfs] [libnbd PATCH] python: Speed up pread
On Fri, May 27, 2022 at 04:44:16PM -0500, Eric Blake wrote:
> Instead of malloc'ing a C buffer, nbd_pread()ing into it, then copying
> it into an immutable Python bytes object, we can instead pre-create a
> correctly-sized Python bytearray object, then nbd_pread() directly
> into that object's underlying bytes.
>
> While the data copying might not be the bottleneck compared to the
> networking costs, it does have noticeable results; on my machine:
>
> $ export script='
> m=1024*1024
> size=h.get_size()
> for i in range(size // m):
>     buf = h.pread(m, m*i)
> '
> $ time ./run nbdkit -U - pattern 10G --run 'nbdsh -u $uri -c "$script"'
>
> sped up from about 7.8s pre-patch to about 7.1s post-patch,
> approximately a 10% speedup.
>
> Note that this slightly changes the python API: h.pread[_structured]
> now returns a mutable 'bytearray' object, rather than an immutable
> 'bytes' object. This makes it possible to modify the just-read string
> in place, rather than having to create yet another memory buffer for
> any modifications, which offers even more speedups when writing a
> read-modify-write paradigm in python. But the change is
> backwards-compatible - python already states that a read-write buffer
> may be returned instead of readonly for any client that already
> operated only on a buffer in a readonly manner.

Although this is not an API change, in general we have no obligation
to maintain backwards compat for the Python API. (The C API is quite a
different matter of course.) Nevertheless, it's nice not to break
things.

> Note that h.pread is more like Python read() semantics in creating a
> buffer, while h.aio_pread is more like Python readinto() semantics in
> modifying a passed-in buffer. But now that both code paths have a
> python object prior to calling into the C API, my next task is to
> improve the h.*pread_structured callback function to pass its buffer
> as a slice of the Python input buffer, rather than doing yet another
> round of pointless memcpy from C into python objects.

You might want to have a look at the nbdkit Python bindings for
inspiration because Nir made those either zero- or low-copy (I forget
exactly which).

>  generator/Python.ml | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
[...]
Looks good,

Reviewed-by: Richard W.M. Jones <rjones at redhat.com>

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
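[Editor's note: a final illustration of the backwards-compatibility
point discussed in the patch and in the review above. Python code that
only ever treated the result of h.pread() as a read-only buffer behaves
identically whether it receives bytes or bytearray; the parse_header()
helper below is hypothetical, purely to demonstrate that.]

import struct

def parse_header(buf):
    # Read-only consumer: slicing, comparison and struct unpacking work
    # the same way on bytes and on bytearray.
    magic, version = struct.unpack_from(">4sI", buf, 0)
    return magic == b"QFI\xfb", version

hdr = b"QFI\xfb\x00\x00\x00\x03" + b"\x00" * 8
assert parse_header(hdr) == (True, 3)               # old return type: bytes
assert parse_header(bytearray(hdr)) == (True, 3)    # new return type: bytearray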