thr3ads.net - Xen devel - oxenstored memory leak? seems related with XSA-38 [Jul 2013]

If this information is useful, please help other people find it:
Share via:

Liuqiming (John)

2013-Jul-01 13:47 UTC

oxenstored memory leak? seems related with XSA-38

Hi all,

I test starting vm using xen-4.2.2 release with oxenstored, and got a problem
may be related with XSA-38
(http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).

When vm started, oxenstored memory usage keep increasing, and it took 1.5G
memory at last. Vm hanged at loading OS screen.

Here is the output of top:

top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63, 0.32
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14 oxenstored

  
top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67, 0.34
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03 oxenstored

  

top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80, 0.44
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10 oxenstored

And oxenstored log got these over and over again:

[20130701T12:27:14.290Z] D8           invalid     device/suspend/event-channel  
..
[20130701T12:27:14.290Z] D8.1937077039 invalid   /event-channel                 
..
[20130701T12:27:14.290Z] D8.1852727656 invalid                                  
..
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug

My vm is a windows guest and has GPL PVDriver installed. This problem is hard to
reproduce, and after a hard reboot, everything looks normal.

I guess it''s something wrong with the xenbus IO Ring, so I investigated
the code:

1) oxenstored and xenbus in vm using a shared page to communicate with each
other
  struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

2) xenbus in vm put request in req and increase req_prod, then send a event to
oxenstored
3) oxenstored calculates how many to read using req_cons and req_prod, and after
read oxenstored increase req_cons to make it equals req_prod which means no
request pending.
4) oxenstored put responds in rsp and increase rsp_prod, then send a event to
vm, xenbus in vm using similar logic to handle the rsp ring.

 Am I correct?

So, I''m curious about what happened when req_cons larger than req_prod
(this can be caused by buggy PV Driver or malicious guest user), it seems
oxenstored will fell in a endless loop.

Is this what XSA-38 talk about?

I built a pvdriver which will set req_prod to 0 after several xenstore
operation, and test it on xen-unstable.hg make sure all XSA-38 patches applied.
It seems that the problem I first met reproduced. Oxenstored will took a lot of
memory eventually.

Could anyone help me about this issue?

Liuqiming (John)

2013-Jul-04 02:48 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

Hi all,

Continue my test about oxenstored:

I switch to original C xenstored and test my "broken" vm. The
cxenstored do not have the "memory leak" issue.
So I compared the IO ring handler logic between cxenstored and oxenstored and
find out the difference:

In Cxenstord, after got the cons and prod value, a index check will be performed

	if (!check_indexes(cons, prod)) {
		errno = EIO;
		return -1;
	}

static bool check_indexes(XENSTORE_RING_IDX cons, XENSTORE_RING_IDX prod)
{
	return ((prod - cons) <= XENSTORE_RING_SIZE);
}

So any connection has prod - cons > XENSTORE_RING_SIZE will be treated as
"bad client", and cxenstored will not handle its IO ring msg any more.

But in oxenstored, there is just a simple comparison between prod and cons
    if (prod == cons)
		return 0;

so there leaves a security hole to a guest vm user who can manipulate prod and
cons to make oxenstored increasing memory usage and out of service.

I managed to create a patch to fix this and I''m testing it on xen4.2.2.
Will send out soon.
> -----Original Message-----
> From: Liuqiming (John)
> Sent: Monday, July 01, 2013 9:47 PM
> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
> ''ian.campbell@citrix.com''
> Cc: Yanqiangjun
> Subject: oxenstored memory leak? seems related with XSA-38
> 
> Hi all,
> 
> I test starting vm using xen-4.2.2 release with oxenstored, and got a
problem
> may be related with XSA-38
> (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> 
> When vm started, oxenstored memory usage keep increasing, and it took
> 1.5G memory at last. Vm hanged at loading OS screen.
> 
> Here is the output of top:
> 
> top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63, 0.32
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si, 
0.0
> st
> KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916
> buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932
> cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+
> COMMAND
>   806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14
> oxenstored
> 
> 
> top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67, 0.34
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si, 
0.0
> st
> KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964
> buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380
> cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+
> COMMAND
>   806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03
> oxenstored
> 
> 
> 
> top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80, 0.44
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si, 
0.0
> st
> KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208
> buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968
> cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+
> COMMAND
>   806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10
> oxenstored
> 
> And oxenstored log got these over and over again:
> 
> [20130701T12:27:14.290Z] D8           invalid
> device/suspend/event-channel
> 
> ..
> [20130701T12:27:14.290Z] D8.1937077039 invalid
> /event-channel
> 
>   ..
> [20130701T12:27:14.290Z] D8.1852727656
> invalid
> 
>             ..
> [20130701T12:27:14.290Z] D8           debug
> [20130701T12:27:14.290Z] D8           debug
> [20130701T12:27:14.290Z] D8           debug
> [20130701T12:27:14.290Z] D8           debug
> [20130701T12:27:14.290Z] D8           debug
> [20130701T12:27:14.290Z] D8           debug
> 
> My vm is a windows guest and has GPL PVDriver installed. This problem is
> hard to reproduce, and after a hard reboot, everything looks normal.
> 
> I guess it''s something wrong with the xenbus IO Ring, so I
investigated the
> code:
> 
> 1) oxenstored and xenbus in vm using a shared page to communicate with
> each other
>   struct xenstore_domain_interface {
>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>     XENSTORE_RING_IDX req_cons, req_prod;
>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
> };
> 
> 2) xenbus in vm put request in req and increase req_prod, then send a event
> to oxenstored
> 3) oxenstored calculates how many to read using req_cons and req_prod,
> and after read oxenstored increase req_cons to make it equals req_prod
> which means no request pending.
> 4) oxenstored put responds in rsp and increase rsp_prod, then send a event
> to vm, xenbus in vm using similar logic to handle the rsp ring.
> 
>  Am I correct?
> 
> So, I''m curious about what happened when req_cons larger than
req_prod
> (this can be caused by buggy PV Driver or malicious guest user), it seems
> oxenstored will fell in a endless loop.
> 
> Is this what XSA-38 talk about?
> 
> I built a pvdriver which will set req_prod to 0 after several xenstore
> operation, and test it on xen-unstable.hg make sure all XSA-38 patches
> applied.
> It seems that the problem I first met reproduced. Oxenstored will took a
lot
> of memory eventually.
> 
> Could anyone help me about this issue?
> 
> 
> 
> 
>

Andrew Cooper

2013-Jul-04 08:52 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

On 04/07/13 03:48, Liuqiming (John) wrote:> Hi all,
>
> Continue my test about oxenstored:
>
> I switch to original C xenstored and test my "broken" vm. The
cxenstored do not have the "memory leak" issue.
> So I compared the IO ring handler logic between cxenstored and oxenstored
and find out the difference:
>
> In Cxenstord, after got the cons and prod value, a index check will be
performed
>
> 	if (!check_indexes(cons, prod)) {
> 		errno = EIO;
> 		return -1;
> 	}
>
> static bool check_indexes(XENSTORE_RING_IDX cons, XENSTORE_RING_IDX prod)
> {
> 	return ((prod - cons) <= XENSTORE_RING_SIZE);
> }
>
> So any connection has prod - cons > XENSTORE_RING_SIZE will be treated
as "bad client", and cxenstored will not handle its IO ring msg any
more.
>
> But in oxenstored, there is just a simple comparison between prod and cons
>     if (prod == cons)
> 		return 0;
>
> so there leaves a security hole to a guest vm user who can manipulate prod
and cons to make oxenstored increasing memory usage and out of service.
>
> I managed to create a patch to fix this and I''m testing it on
xen4.2.2. Will send out soon.
Are you aware of
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dcaa787230f519a19de1cbc22ad8
which is a second patch for XSA-38 ?

~Andrew
>
>> -----Original Message-----
>> From: Liuqiming (John)
>> Sent: Monday, July 01, 2013 9:47 PM
>> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
>> ''ian.campbell@citrix.com''
>> Cc: Yanqiangjun
>> Subject: oxenstored memory leak? seems related with XSA-38
>>
>> Hi all,
>>
>> I test starting vm using xen-4.2.2 release with oxenstored, and got a
problem
>> may be related with XSA-38
>>
(http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
>>
>> When vm started, oxenstored memory usage keep increasing, and it took
>> 1.5G memory at last. Vm hanged at loading OS screen.
>>
>> Here is the output of top:
>>
>> top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63,
0.32
>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,
0.0
>> st
>> KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916
>> buffers
>> KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932
>> cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+
>> COMMAND
>>   806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14
>> oxenstored
>>
>>
>> top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67,
0.34
>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,
0.0
>> st
>> KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964
>> buffers
>> KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380
>> cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+
>> COMMAND
>>   806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03
>> oxenstored
>>
>>
>>
>> top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80,
0.44
>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,
0.0
>> st
>> KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208
>> buffers
>> KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968
>> cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+
>> COMMAND
>>   806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10
>> oxenstored
>>
>> And oxenstored log got these over and over again:
>>
>> [20130701T12:27:14.290Z] D8           invalid
>> device/suspend/event-channel
>>
>> ..
>> [20130701T12:27:14.290Z] D8.1937077039 invalid
>> /event-channel
>>
>>   ..
>> [20130701T12:27:14.290Z] D8.1852727656
>> invalid
>>
>>             ..
>> [20130701T12:27:14.290Z] D8           debug
>> [20130701T12:27:14.290Z] D8           debug
>> [20130701T12:27:14.290Z] D8           debug
>> [20130701T12:27:14.290Z] D8           debug
>> [20130701T12:27:14.290Z] D8           debug
>> [20130701T12:27:14.290Z] D8           debug
>>
>> My vm is a windows guest and has GPL PVDriver installed. This problem
is
>> hard to reproduce, and after a hard reboot, everything looks normal.
>>
>> I guess it''s something wrong with the xenbus IO Ring, so I
investigated the
>> code:
>>
>> 1) oxenstored and xenbus in vm using a shared page to communicate with
>> each other
>>   struct xenstore_domain_interface {
>>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>>     XENSTORE_RING_IDX req_cons, req_prod;
>>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
>> };
>>
>> 2) xenbus in vm put request in req and increase req_prod, then send a
event
>> to oxenstored
>> 3) oxenstored calculates how many to read using req_cons and req_prod,
>> and after read oxenstored increase req_cons to make it equals req_prod
>> which means no request pending.
>> 4) oxenstored put responds in rsp and increase rsp_prod, then send a
event
>> to vm, xenbus in vm using similar logic to handle the rsp ring.
>>
>>  Am I correct?
>>
>> So, I''m curious about what happened when req_cons larger than
req_prod
>> (this can be caused by buggy PV Driver or malicious guest user), it
seems
>> oxenstored will fell in a endless loop.
>>
>> Is this what XSA-38 talk about?
>>
>> I built a pvdriver which will set req_prod to 0 after several xenstore
>> operation, and test it on xen-unstable.hg make sure all XSA-38 patches
>> applied.
>> It seems that the problem I first met reproduced. Oxenstored will took
a lot
>> of memory eventually.
>>
>> Could anyone help me about this issue?
>>
>>
>>
>>
>>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Liuqiming (John)

2013-Jul-05 03:14 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

Hi Andrew,

Yes, I know the second patch commit by Ian Campbell. 

Actually I have applied all the patch from
http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_incorrect_handling_of_certain_Xenbus_ring_states

But this issue still exists.
> On 04/07/13 03:48, Liuqiming (John) wrote:
> > Hi all,
> >
> > Continue my test about oxenstored:
> >
> > I switch to original C xenstored and test my "broken" vm.
The cxenstored
> do not have the "memory leak" issue.
> > So I compared the IO ring handler logic between cxenstored and
> oxenstored and find out the difference:
> >
> > In Cxenstord, after got the cons and prod value, a index check will be
> performed
> >
> > 	if (!check_indexes(cons, prod)) {
> > 		errno = EIO;
> > 		return -1;
> > 	}
> >
> > static bool check_indexes(XENSTORE_RING_IDX cons,
> XENSTORE_RING_IDX prod)
> > {
> > 	return ((prod - cons) <= XENSTORE_RING_SIZE);
> > }
> >
> > So any connection has prod - cons > XENSTORE_RING_SIZE will be
treated
> as "bad client", and cxenstored will not handle its IO ring msg
any more.
> >
> > But in oxenstored, there is just a simple comparison between prod and
> cons
> >     if (prod == cons)
> > 		return 0;
> >
> > so there leaves a security hole to a guest vm user who can manipulate
prod
> and cons to make oxenstored increasing memory usage and out of service.
> >
> > I managed to create a patch to fix this and I''m testing it on
xen4.2.2. Will
> send out soon.
> 
> Are you aware of
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
> a787230f519a19de1cbc22ad8
> which is a second patch for XSA-38 ?
> 
> ~Andrew
> 
> >
> >> -----Original Message-----
> >> From: Liuqiming (John)
> >> Sent: Monday, July 01, 2013 9:47 PM
> >> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
> >> ''ian.campbell@citrix.com''
> >> Cc: Yanqiangjun
> >> Subject: oxenstored memory leak? seems related with XSA-38
> >>
> >> Hi all,
> >>
> >> I test starting vm using xen-4.2.2 release with oxenstored, and
got a
> problem
> >> may be related with XSA-38
> >>
> (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> >>
> >> When vm started, oxenstored memory usage keep increasing, and it
took
> >> 1.5G memory at last. Vm hanged at loading OS screen.
> >>
> >> Here is the output of top:
> >>
> >> top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99,
0.63, 0.32
> >> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0
zombie
> >> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi, 
0.0 si,
> 0.0
> >> st
> >> KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916
> >> buffers
> >> KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932
> >> cached
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> TIME+
> >> COMMAND
> >>   806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14
> >> oxenstored
> >>
> >>
> >> top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99,
0.67, 0.34
> >> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0
zombie
> >> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi, 
0.0 si,
> 0.0
> >> st
> >> KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964
> >> buffers
> >> KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380
> >> cached
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> TIME+
> >> COMMAND
> >>   806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03
> >> oxenstored
> >>
> >>
> >>
> >> top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00,
0.80, 0.44
> >> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0
zombie
> >> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi, 
0.0 si,
> 0.0
> >> st
> >> KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208
> >> buffers
> >> KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968
> >> cached
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> TIME+
> >> COMMAND
> >>   806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10
> >> oxenstored
> >>
> >> And oxenstored log got these over and over again:
> >>
> >> [20130701T12:27:14.290Z] D8           invalid
> >> device/suspend/event-channel
> >>
> >> ..
> >> [20130701T12:27:14.290Z] D8.1937077039 invalid
> >> /event-channel
> >>
> >>   ..
> >> [20130701T12:27:14.290Z] D8.1852727656
> >> invalid
> >>
> >>             ..
> >> [20130701T12:27:14.290Z] D8           debug
> >> [20130701T12:27:14.290Z] D8           debug
> >> [20130701T12:27:14.290Z] D8           debug
> >> [20130701T12:27:14.290Z] D8           debug
> >> [20130701T12:27:14.290Z] D8           debug
> >> [20130701T12:27:14.290Z] D8           debug
> >>
> >> My vm is a windows guest and has GPL PVDriver installed. This
problem is
> >> hard to reproduce, and after a hard reboot, everything looks
normal.
> >>
> >> I guess it''s something wrong with the xenbus IO Ring, so
I investigated the
> >> code:
> >>
> >> 1) oxenstored and xenbus in vm using a shared page to communicate
with
> >> each other
> >>   struct xenstore_domain_interface {
> >>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon.
> */
> >>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch
events.
> */
> >>     XENSTORE_RING_IDX req_cons, req_prod;
> >>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
> >> };
> >>
> >> 2) xenbus in vm put request in req and increase req_prod, then
send a
> event
> >> to oxenstored
> >> 3) oxenstored calculates how many to read using req_cons and
req_prod,
> >> and after read oxenstored increase req_cons to make it equals
req_prod
> >> which means no request pending.
> >> 4) oxenstored put responds in rsp and increase rsp_prod, then send
a
> event
> >> to vm, xenbus in vm using similar logic to handle the rsp ring.
> >>
> >>  Am I correct?
> >>
> >> So, I''m curious about what happened when req_cons larger
than
> req_prod
> >> (this can be caused by buggy PV Driver or malicious guest user),
it seems
> >> oxenstored will fell in a endless loop.
> >>
> >> Is this what XSA-38 talk about?
> >>
> >> I built a pvdriver which will set req_prod to 0 after several
xenstore
> >> operation, and test it on xen-unstable.hg make sure all XSA-38
patches
> >> applied.
> >> It seems that the problem I first met reproduced. Oxenstored will
took a
> lot
> >> of memory eventually.
> >>
> >> Could anyone help me about this issue?
> >>
> >>
> >>
> >>
> >>
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

Liuqiming (John)

2013-Jul-05 09:07 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

Here is my patch that try to fix this issue.

The whole idea is: add check logic when read from IO ring, and if error happens
mark the reading connection as "bad",
Unless vm reboot, oxenstored will not handle message from this connection any
more.

diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
b/tools/ocaml/libs/xb/xs_ring_stubs.c
index fdd9983..a9ca456 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -45,6 +45,10 @@
 	cons = *(volatile uint32*)&intf->req_cons;
 	prod = *(volatile uint32*)&intf->req_prod;
 	xen_mb();
+
+	if ((prod - cons) > XENSTORE_RING_SIZE)
+	    return -1;
+
 	if (prod == cons)
 		return 0;
 	cons = MASK_XENSTORE_IDX(cons);
@@ -94,7 +98,7 @@
 	res = xs_ring_read(GET_C_STRUCT(interface),
 	                   String_val(buffer), Int_val(len));
 	if (res == -1)
-		caml_failwith("huh");
+		caml_failwith("bad client");
 	result = Val_int(res);
 	CAMLreturn(result);
 }
diff --git a/tools/ocaml/xenstored/connection.ml
b/tools/ocaml/xenstored/connection.ml
index 32e2f2e..e862c79 100644
--- a/tools/ocaml/xenstored/connection.ml
+++ b/tools/ocaml/xenstored/connection.ml
@@ -38,6 +38,11 @@
 	mutable perm: Perms.Connection.t;
 }
 
+let mark_as_bad con = 
+	match con.dom with
+	| None -> ()
+	| Some domain -> Domain.mark_as_bad domain
+
 let get_path con  Printf.sprintf "/local/domain/%i/" (match con.dom
with None -> 0 | Some d -> Domain.get_id d)
 
diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
index 85ab282..444069d 100644
--- a/tools/ocaml/xenstored/domain.ml
+++ b/tools/ocaml/xenstored/domain.ml
@@ -27,6 +27,7 @@
 	interface: Xenmmap.mmap_interface;
 	eventchn: Event.t;
 	mutable port: Xeneventchn.t option;
+	mutable bad_client: bool;
 }
 
 let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id)
@@ -34,6 +35,9 @@
 let get_interface d = d.interface
 let get_mfn d = d.mfn
 let get_remote_port d = d.remote_port
+
+let is_bad_domain domain = domain.bad_client
+let mark_as_bad domain = domain.bad_client <- true
 
 let string_of_port = function
 | None -> "None"
@@ -68,7 +72,8 @@
 	remote_port = remote_port;
 	interface = interface;
 	eventchn = eventchn;
-	port = None
+	port = None;
+	bad_client = false
 }
 
 let is_dom0 d = d.id = 0
diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
index a4ff741..2267ddc 100644
--- a/tools/ocaml/xenstored/process.ml
+++ b/tools/ocaml/xenstored/process.ml
@@ -374,7 +374,17 @@
 	Logging.xb_answer ~ty ~tid ~con:(Connection.get_domstr con) data
 
 let do_input store cons doms con -	if Connection.do_input con then (
+	let newpacket = 
+		try
+			Connection.do_input con
+		with Failure exp ->
+			error "caught exception %s" exp;
+			error "got a bad client %s" (sprintf "%-8s"
(Connection.get_domstr con));
+			Connection.mark_as_bad con;
+			false
+	in
+	
+	if newpacket then (
 		let packet = Connection.pop_in con in
 		let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
 		(* As we don''t log IO, do not call an unnecessary sanitize_data 
diff --git a/tools/ocaml/xenstored/xenstored.ml
b/tools/ocaml/xenstored/xenstored.ml
index 4045aed..cfc2a4b 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -50,9 +50,10 @@
 
 let process_domains store cons domains  	let do_io_domain domain -		let con =
Connections.find_domain cons (Domain.get_id domain) in
-		Process.do_input store cons domains con;
-		Process.do_output store cons domains con in
+		if not (Domain.is_bad_domain domain) then
+			let con = Connections.find_domain cons (Domain.get_id domain) in
+			Process.do_input store cons domains con;
+			Process.do_output store cons domains con in
 	Domains.iter domains do_io_domain
 
 let sigusr1_handler store 
> -----Original Message-----
> From: Liuqiming (John)
> Sent: Friday, July 05, 2013 11:14 AM
> To: ''andrew.cooper3@citrix.com''
> Cc: Yanqiangjun; ''xen-devel@lists.xen.org''
> Subject: RE: [Xen-devel] oxenstored memory leak? seems related with
> XSA-38
> 
> Hi Andrew,
> 
> Yes, I know the second patch commit by Ian Campbell.
> 
> Actually I have applied all the patch from
> http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_inc
> orrect_handling_of_certain_Xenbus_ring_states
> 
> But this issue still exists.
> 
> > On 04/07/13 03:48, Liuqiming (John) wrote:
> > > Hi all,
> > >
> > > Continue my test about oxenstored:
> > >
> > > I switch to original C xenstored and test my "broken"
vm. The cxenstored
> > do not have the "memory leak" issue.
> > > So I compared the IO ring handler logic between cxenstored and
> > oxenstored and find out the difference:
> > >
> > > In Cxenstord, after got the cons and prod value, a index check
will be
> > performed
> > >
> > > 	if (!check_indexes(cons, prod)) {
> > > 		errno = EIO;
> > > 		return -1;
> > > 	}
> > >
> > > static bool check_indexes(XENSTORE_RING_IDX cons,
> > XENSTORE_RING_IDX prod)
> > > {
> > > 	return ((prod - cons) <= XENSTORE_RING_SIZE);
> > > }
> > >
> > > So any connection has prod - cons > XENSTORE_RING_SIZE will be
> treated
> > as "bad client", and cxenstored will not handle its IO ring
msg any more.
> > >
> > > But in oxenstored, there is just a simple comparison between prod
and
> > cons
> > >     if (prod == cons)
> > > 		return 0;
> > >
> > > so there leaves a security hole to a guest vm user who can
manipulate
> prod
> > and cons to make oxenstored increasing memory usage and out of
service.
> > >
> > > I managed to create a patch to fix this and I''m testing
it on xen4.2.2. Will
> > send out soon.
> >
> > Are you aware of
> >
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
> > a787230f519a19de1cbc22ad8
> > which is a second patch for XSA-38 ?
> >
> > ~Andrew
> >
> > >
> > >> -----Original Message-----
> > >> From: Liuqiming (John)
> > >> Sent: Monday, July 01, 2013 9:47 PM
> > >> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
> > >> ''ian.campbell@citrix.com''
> > >> Cc: Yanqiangjun
> > >> Subject: oxenstored memory leak? seems related with XSA-38
> > >>
> > >> Hi all,
> > >>
> > >> I test starting vm using xen-4.2.2 release with oxenstored,
and got a
> > problem
> > >> may be related with XSA-38
> > >>
> >
> (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> > >>
> > >> When vm started, oxenstored memory usage keep increasing, and
it
> took
> > >> 1.5G memory at last. Vm hanged at loading OS screen.
> > >>
> > >> Here is the output of top:
> > >>
> > >> top - 20:18:32 up 1 day,  3:09,  5 users,  load average:
0.99, 0.63,
> 0.32
> > >> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0
zombie
> > >> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0
hi,  0.0 si,
> > 0.0
> > >> st
> > >> KiB Mem:  46919428 total, 46699012 used,   220416 free,   
36916
> > >> buffers
> > >> KiB Swap:  2103292 total,        0 used,  2103292 free,
44260932
> > >> cached
> > >>
> > >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> > TIME+
> > >> COMMAND
> > >>   806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14
> > >> oxenstored
> > >>
> > >>
> > >> top - 20:19:05 up 1 day,  3:09,  5 users,  load average:
0.99, 0.67,
> 0.34
> > >> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0
zombie
> > >> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0
hi,  0.0 si,
> > 0.0
> > >> st
> > >> KiB Mem:  46919428 total, 46708564 used,   210864 free,   
36964
> > >> buffers
> > >> KiB Swap:  2103292 total,        0 used,  2103292 free,
44168380
> > >> cached
> > >>
> > >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> > TIME+
> > >> COMMAND
> > >>   806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03
> > >> oxenstored
> > >>
> > >>
> > >>
> > >> top - 20:21:35 up 1 day,  3:12,  5 users,  load average:
1.00, 0.80,
> 0.44
> > >> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0
zombie
> > >> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0
hi,  0.0 si,
> > 0.0
> > >> st
> > >> KiB Mem:  46919428 total, 46703052 used,   216376 free,   
37208
> > >> buffers
> > >> KiB Swap:  2103292 total,        0 used,  2103292 free,
43682968
> > >> cached
> > >>
> > >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> > TIME+
> > >> COMMAND
> > >>   806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10
> > >> oxenstored
> > >>
> > >> And oxenstored log got these over and over again:
> > >>
> > >> [20130701T12:27:14.290Z] D8           invalid
> > >> device/suspend/event-channel
> > >>
> > >> ..
> > >> [20130701T12:27:14.290Z] D8.1937077039 invalid
> > >> /event-channel
> > >>
> > >>   ..
> > >> [20130701T12:27:14.290Z] D8.1852727656
> > >> invalid
> > >>
> > >>             ..
> > >> [20130701T12:27:14.290Z] D8           debug
> > >> [20130701T12:27:14.290Z] D8           debug
> > >> [20130701T12:27:14.290Z] D8           debug
> > >> [20130701T12:27:14.290Z] D8           debug
> > >> [20130701T12:27:14.290Z] D8           debug
> > >> [20130701T12:27:14.290Z] D8           debug
> > >>
> > >> My vm is a windows guest and has GPL PVDriver installed. This
problem
> is
> > >> hard to reproduce, and after a hard reboot, everything looks
normal.
> > >>
> > >> I guess it''s something wrong with the xenbus IO
Ring, so I investigated
> the
> > >> code:
> > >>
> > >> 1) oxenstored and xenbus in vm using a shared page to
communicate
> with
> > >> each other
> > >>   struct xenstore_domain_interface {
> > >>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore
> daemon.
> > */
> > >>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch
> events.
> > */
> > >>     XENSTORE_RING_IDX req_cons, req_prod;
> > >>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
> > >> };
> > >>
> > >> 2) xenbus in vm put request in req and increase req_prod,
then send a
> > event
> > >> to oxenstored
> > >> 3) oxenstored calculates how many to read using req_cons and
> req_prod,
> > >> and after read oxenstored increase req_cons to make it equals
req_prod
> > >> which means no request pending.
> > >> 4) oxenstored put responds in rsp and increase rsp_prod, then
send a
> > event
> > >> to vm, xenbus in vm using similar logic to handle the rsp
ring.
> > >>
> > >>  Am I correct?
> > >>
> > >> So, I''m curious about what happened when req_cons
larger than
> > req_prod
> > >> (this can be caused by buggy PV Driver or malicious guest
user), it seems
> > >> oxenstored will fell in a endless loop.
> > >>
> > >> Is this what XSA-38 talk about?
> > >>
> > >> I built a pvdriver which will set req_prod to 0 after several
xenstore
> > >> operation, and test it on xen-unstable.hg make sure all
XSA-38 patches
> > >> applied.
> > >> It seems that the problem I first met reproduced. Oxenstored
will took a
> > lot
> > >> of memory eventually.
> > >>
> > >> Could anyone help me about this issue?
> > >>
> > >>
> > >>
> > >>
> > >>
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xen.org
> > > http://lists.xen.org/xen-devel

David Scott

2013-Jul-15 12:13 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

Hi,

On 05/07/13 10:07, Liuqiming (John) wrote:>
> Here is my patch that try to fix this issue.
>
> The whole idea is: add check logic when read from IO ring, and if error
happens mark the reading connection as "bad",
> Unless vm reboot, oxenstored will not handle message from this connection
any more.
I think detecting a bad client and avoiding wasting CPU time on it is a 
good idea. Is this patch working well for you in your testing?

In future I wonder whether we should add some form of request 
rate-limiting in addition to the per-domain quotas, to prevent one 
domain from taking more than it''s fair share of xenstored time. I 
imagine that a non-malicious domain can still keep xenstored busy with 
''legitimate'' traffic (although hopefully it still provides
services to
other guests, albeit more slowly)

Cheers,
Dave
>
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index fdd9983..a9ca456 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -45,6 +45,10 @@
>   	cons = *(volatile uint32*)&intf->req_cons;
>   	prod = *(volatile uint32*)&intf->req_prod;
>   	xen_mb();
> +
> +	if ((prod - cons) > XENSTORE_RING_SIZE)
> +	    return -1;
> +
>   	if (prod == cons)
>   		return 0;
>   	cons = MASK_XENSTORE_IDX(cons);
> @@ -94,7 +98,7 @@
>   	res = xs_ring_read(GET_C_STRUCT(interface),
>   	                   String_val(buffer), Int_val(len));
>   	if (res == -1)
> -		caml_failwith("huh");
> +		caml_failwith("bad client");
>   	result = Val_int(res);
>   	CAMLreturn(result);
>   }
> diff --git a/tools/ocaml/xenstored/connection.ml
b/tools/ocaml/xenstored/connection.ml
> index 32e2f2e..e862c79 100644
> --- a/tools/ocaml/xenstored/connection.ml
> +++ b/tools/ocaml/xenstored/connection.ml
> @@ -38,6 +38,11 @@
>   	mutable perm: Perms.Connection.t;
>   }
>
> +let mark_as_bad con > +	match con.dom with
> +	| None -> ()
> +	| Some domain -> Domain.mark_as_bad domain
> +
>   let get_path con >   Printf.sprintf "/local/domain/%i/"
(match con.dom with None -> 0 | Some d -> Domain.get_id d)
>
> diff --git a/tools/ocaml/xenstored/domain.ml
b/tools/ocaml/xenstored/domain.ml
> index 85ab282..444069d 100644
> --- a/tools/ocaml/xenstored/domain.ml
> +++ b/tools/ocaml/xenstored/domain.ml
> @@ -27,6 +27,7 @@
>   	interface: Xenmmap.mmap_interface;
>   	eventchn: Event.t;
>   	mutable port: Xeneventchn.t option;
> +	mutable bad_client: bool;
>   }
>
>   let get_path dom = "/local/domain/" ^ (sprintf "%u"
dom.id)
> @@ -34,6 +35,9 @@
>   let get_interface d = d.interface
>   let get_mfn d = d.mfn
>   let get_remote_port d = d.remote_port
> +
> +let is_bad_domain domain = domain.bad_client
> +let mark_as_bad domain = domain.bad_client <- true
>
>   let string_of_port = function
>   | None -> "None"
> @@ -68,7 +72,8 @@
>   	remote_port = remote_port;
>   	interface = interface;
>   	eventchn = eventchn;
> -	port = None
> +	port = None;
> +	bad_client = false
>   }
>
>   let is_dom0 d = d.id = 0
> diff --git a/tools/ocaml/xenstored/process.ml
b/tools/ocaml/xenstored/process.ml
> index a4ff741..2267ddc 100644
> --- a/tools/ocaml/xenstored/process.ml
> +++ b/tools/ocaml/xenstored/process.ml
> @@ -374,7 +374,17 @@
>   	Logging.xb_answer ~ty ~tid ~con:(Connection.get_domstr con) data
>
>   let do_input store cons doms con > -	if Connection.do_input con then (
> +	let newpacket > +		try
> +			Connection.do_input con
> +		with Failure exp ->
> +			error "caught exception %s" exp;
> +			error "got a bad client %s" (sprintf "%-8s"
(Connection.get_domstr con));
> +			Connection.mark_as_bad con;
> +			false
> +	in
> +	
> +	if newpacket then (
>   		let packet = Connection.pop_in con in
>   		let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
>   		(* As we don''t log IO, do not call an unnecessary
sanitize_data
> diff --git a/tools/ocaml/xenstored/xenstored.ml
b/tools/ocaml/xenstored/xenstored.ml
> index 4045aed..cfc2a4b 100644
> --- a/tools/ocaml/xenstored/xenstored.ml
> +++ b/tools/ocaml/xenstored/xenstored.ml
> @@ -50,9 +50,10 @@
>
>   let process_domains store cons domains >   	let do_io_domain domain
> -		let con = Connections.find_domain cons (Domain.get_id domain) in
> -		Process.do_input store cons domains con;
> -		Process.do_output store cons domains con in
> +		if not (Domain.is_bad_domain domain) then
> +			let con = Connections.find_domain cons (Domain.get_id domain) in
> +			Process.do_input store cons domains con;
> +			Process.do_output store cons domains con in
>   	Domains.iter domains do_io_domain
>
>   let sigusr1_handler store >
>
>> -----Original Message-----
>> From: Liuqiming (John)
>> Sent: Friday, July 05, 2013 11:14 AM
>> To: ''andrew.cooper3@citrix.com''
>> Cc: Yanqiangjun; ''xen-devel@lists.xen.org''
>> Subject: RE: [Xen-devel] oxenstored memory leak? seems related with
>> XSA-38
>>
>> Hi Andrew,
>>
>> Yes, I know the second patch commit by Ian Campbell.
>>
>> Actually I have applied all the patch from
>> http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_inc
>> orrect_handling_of_certain_Xenbus_ring_states
>>
>> But this issue still exists.
>>
>>> On 04/07/13 03:48, Liuqiming (John) wrote:
>>>> Hi all,
>>>>
>>>> Continue my test about oxenstored:
>>>>
>>>> I switch to original C xenstored and test my "broken"
vm. The cxenstored
>>> do not have the "memory leak" issue.
>>>> So I compared the IO ring handler logic between cxenstored and
>>> oxenstored and find out the difference:
>>>>
>>>> In Cxenstord, after got the cons and prod value, a index check
will be
>>> performed
>>>>
>>>> 	if (!check_indexes(cons, prod)) {
>>>> 		errno = EIO;
>>>> 		return -1;
>>>> 	}
>>>>
>>>> static bool check_indexes(XENSTORE_RING_IDX cons,
>>> XENSTORE_RING_IDX prod)
>>>> {
>>>> 	return ((prod - cons) <= XENSTORE_RING_SIZE);
>>>> }
>>>>
>>>> So any connection has prod - cons > XENSTORE_RING_SIZE will
be
>> treated
>>> as "bad client", and cxenstored will not handle its IO
ring msg any more.
>>>>
>>>> But in oxenstored, there is just a simple comparison between
prod and
>>> cons
>>>>      if (prod == cons)
>>>> 		return 0;
>>>>
>>>> so there leaves a security hole to a guest vm user who can
manipulate
>> prod
>>> and cons to make oxenstored increasing memory usage and out of
service.
>>>>
>>>> I managed to create a patch to fix this and I''m
testing it on xen4.2.2. Will
>>> send out soon.
>>>
>>> Are you aware of
>>>
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
>>> a787230f519a19de1cbc22ad8
>>> which is a second patch for XSA-38 ?
>>>
>>> ~Andrew
>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Liuqiming (John)
>>>>> Sent: Monday, July 01, 2013 9:47 PM
>>>>> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
>>>>> ''ian.campbell@citrix.com''
>>>>> Cc: Yanqiangjun
>>>>> Subject: oxenstored memory leak? seems related with XSA-38
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I test starting vm using xen-4.2.2 release with oxenstored,
and got a
>>> problem
>>>>> may be related with XSA-38
>>>>>
>>>
>>
(http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
>>>>>
>>>>> When vm started, oxenstored memory usage keep increasing,
and it
>> took
>>>>> 1.5G memory at last. Vm hanged at loading OS screen.
>>>>>
>>>>> Here is the output of top:
>>>>>
>>>>> top - 20:18:32 up 1 day,  3:09,  5 users,  load average:
0.99, 0.63,
>> 0.32
>>>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,  
0 zombie
>>>>> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0
hi,  0.0 si,
>>> 0.0
>>>>> st
>>>>> KiB Mem:  46919428 total, 46699012 used,   220416 free,   
36916
>>>>> buffers
>>>>> KiB Swap:  2103292 total,        0 used,  2103292 free,
44260932
>>>>> cached
>>>>>
>>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
>>> TIME+
>>>>> COMMAND
>>>>>    806 root      20   0  955m 926m 1068 R  99.9  2.0  
4:54.14
>>>>> oxenstored
>>>>>
>>>>>
>>>>> top - 20:19:05 up 1 day,  3:09,  5 users,  load average:
0.99, 0.67,
>> 0.34
>>>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,  
0 zombie
>>>>> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0
hi,  0.0 si,
>>> 0.0
>>>>> st
>>>>> KiB Mem:  46919428 total, 46708564 used,   210864 free,   
36964
>>>>> buffers
>>>>> KiB Swap:  2103292 total,        0 used,  2103292 free,
44168380
>>>>> cached
>>>>>
>>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
>>> TIME+
>>>>> COMMAND
>>>>>    806 root      20   0 1048m 1.0g 1068 R 100.2  2.2  
5:27.03
>>>>> oxenstored
>>>>>
>>>>>
>>>>>
>>>>> top - 20:21:35 up 1 day,  3:12,  5 users,  load average:
1.00, 0.80,
>> 0.44
>>>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,  
0 zombie
>>>>> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0
hi,  0.0 si,
>>> 0.0
>>>>> st
>>>>> KiB Mem:  46919428 total, 46703052 used,   216376 free,   
37208
>>>>> buffers
>>>>> KiB Swap:  2103292 total,        0 used,  2103292 free,
43682968
>>>>> cached
>>>>>
>>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
>>> TIME+
>>>>> COMMAND
>>>>>    806 root      20   0 1551m 1.5g 1068 R 100.2  3.3  
7:56.10
>>>>> oxenstored
>>>>>
>>>>> And oxenstored log got these over and over again:
>>>>>
>>>>> [20130701T12:27:14.290Z] D8           invalid
>>>>> device/suspend/event-channel
>>>>>
>>>>> ..
>>>>> [20130701T12:27:14.290Z] D8.1937077039 invalid
>>>>> /event-channel
>>>>>
>>>>>    ..
>>>>> [20130701T12:27:14.290Z] D8.1852727656
>>>>> invalid
>>>>>
>>>>>              ..
>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>
>>>>> My vm is a windows guest and has GPL PVDriver installed.
This problem
>> is
>>>>> hard to reproduce, and after a hard reboot, everything
looks normal.
>>>>>
>>>>> I guess it''s something wrong with the xenbus IO
Ring, so I investigated
>> the
>>>>> code:
>>>>>
>>>>> 1) oxenstored and xenbus in vm using a shared page to
communicate
>> with
>>>>> each other
>>>>>    struct xenstore_domain_interface {
>>>>>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore
>> daemon.
>>> */
>>>>>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async
watch
>> events.
>>> */
>>>>>      XENSTORE_RING_IDX req_cons, req_prod;
>>>>>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
>>>>> };
>>>>>
>>>>> 2) xenbus in vm put request in req and increase req_prod,
then send a
>>> event
>>>>> to oxenstored
>>>>> 3) oxenstored calculates how many to read using req_cons
and
>> req_prod,
>>>>> and after read oxenstored increase req_cons to make it
equals req_prod
>>>>> which means no request pending.
>>>>> 4) oxenstored put responds in rsp and increase rsp_prod,
then send a
>>> event
>>>>> to vm, xenbus in vm using similar logic to handle the rsp
ring.
>>>>>
>>>>>   Am I correct?
>>>>>
>>>>> So, I''m curious about what happened when req_cons
larger than
>>> req_prod
>>>>> (this can be caused by buggy PV Driver or malicious guest
user), it seems
>>>>> oxenstored will fell in a endless loop.
>>>>>
>>>>> Is this what XSA-38 talk about?
>>>>>
>>>>> I built a pvdriver which will set req_prod to 0 after
several xenstore
>>>>> operation, and test it on xen-unstable.hg make sure all
XSA-38 patches
>>>>> applied.
>>>>> It seems that the problem I first met reproduced.
Oxenstored will took a
>>> lot
>>>>> of memory eventually.
>>>>>
>>>>> Could anyone help me about this issue?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

Liuqiming (John)

2013-Jul-16 05:19 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

Hi,

This patch works fine in my test, it can protect oxenstored from malicious
domain(which I made), but I''m not sure whether the check logic can
cover all conditions.

As oxenstored is a shared resource, I think it''s a good idea to add
some QOS mechanism.
> -----Original Message-----
> From: David Scott [mailto:dave.scott@eu.citrix.com]
> Sent: Monday, July 15, 2013 8:13 PM
> To: Liuqiming (John)
> Cc: andrew.cooper3@citrix.com; xen-devel@lists.xen.org; Yanqiangjun
> Subject: Re: [Xen-devel] oxenstored memory leak? seems related with
> XSA-38
> 
> Hi,
> 
> On 05/07/13 10:07, Liuqiming (John) wrote:
> >
> > Here is my patch that try to fix this issue.
> >
> > The whole idea is: add check logic when read from IO ring, and if
error
> happens mark the reading connection as "bad",
> > Unless vm reboot, oxenstored will not handle message from this
> connection any more.
> 
> I think detecting a bad client and avoiding wasting CPU time on it is a
> good idea. Is this patch working well for you in your testing?
> 
> In future I wonder whether we should add some form of request
> rate-limiting in addition to the per-domain quotas, to prevent one
> domain from taking more than it''s fair share of xenstored time. I
> imagine that a non-malicious domain can still keep xenstored busy with
> ''legitimate'' traffic (although hopefully it still
provides services to
> other guests, albeit more slowly)
> 
> Cheers,
> Dave
> 
> >
> > diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> > index fdd9983..a9ca456 100644
> > --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> > +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> > @@ -45,6 +45,10 @@
> >   	cons = *(volatile uint32*)&intf->req_cons;
> >   	prod = *(volatile uint32*)&intf->req_prod;
> >   	xen_mb();
> > +
> > +	if ((prod - cons) > XENSTORE_RING_SIZE)
> > +	    return -1;
> > +
> >   	if (prod == cons)
> >   		return 0;
> >   	cons = MASK_XENSTORE_IDX(cons);
> > @@ -94,7 +98,7 @@
> >   	res = xs_ring_read(GET_C_STRUCT(interface),
> >   	                   String_val(buffer), Int_val(len));
> >   	if (res == -1)
> > -		caml_failwith("huh");
> > +		caml_failwith("bad client");
> >   	result = Val_int(res);
> >   	CAMLreturn(result);
> >   }
> > diff --git a/tools/ocaml/xenstored/connection.ml
> b/tools/ocaml/xenstored/connection.ml
> > index 32e2f2e..e862c79 100644
> > --- a/tools/ocaml/xenstored/connection.ml
> > +++ b/tools/ocaml/xenstored/connection.ml
> > @@ -38,6 +38,11 @@
> >   	mutable perm: Perms.Connection.t;
> >   }
> >
> > +let mark_as_bad con > > +	match con.dom with
> > +	| None -> ()
> > +	| Some domain -> Domain.mark_as_bad domain
> > +
> >   let get_path con > >   Printf.sprintf
"/local/domain/%i/" (match con.dom with None -> 0 |
> Some d -> Domain.get_id d)
> >
> > diff --git a/tools/ocaml/xenstored/domain.ml
> b/tools/ocaml/xenstored/domain.ml
> > index 85ab282..444069d 100644
> > --- a/tools/ocaml/xenstored/domain.ml
> > +++ b/tools/ocaml/xenstored/domain.ml
> > @@ -27,6 +27,7 @@
> >   	interface: Xenmmap.mmap_interface;
> >   	eventchn: Event.t;
> >   	mutable port: Xeneventchn.t option;
> > +	mutable bad_client: bool;
> >   }
> >
> >   let get_path dom = "/local/domain/" ^ (sprintf
"%u" dom.id)
> > @@ -34,6 +35,9 @@
> >   let get_interface d = d.interface
> >   let get_mfn d = d.mfn
> >   let get_remote_port d = d.remote_port
> > +
> > +let is_bad_domain domain = domain.bad_client
> > +let mark_as_bad domain = domain.bad_client <- true
> >
> >   let string_of_port = function
> >   | None -> "None"
> > @@ -68,7 +72,8 @@
> >   	remote_port = remote_port;
> >   	interface = interface;
> >   	eventchn = eventchn;
> > -	port = None
> > +	port = None;
> > +	bad_client = false
> >   }
> >
> >   let is_dom0 d = d.id = 0
> > diff --git a/tools/ocaml/xenstored/process.ml
> b/tools/ocaml/xenstored/process.ml
> > index a4ff741..2267ddc 100644
> > --- a/tools/ocaml/xenstored/process.ml
> > +++ b/tools/ocaml/xenstored/process.ml
> > @@ -374,7 +374,17 @@
> >   	Logging.xb_answer ~ty ~tid ~con:(Connection.get_domstr con)
> data
> >
> >   let do_input store cons doms con > > -	if Connection.do_input
con then (
> > +	let newpacket > > +		try
> > +			Connection.do_input con
> > +		with Failure exp ->
> > +			error "caught exception %s" exp;
> > +			error "got a bad client %s" (sprintf "%-8s"
> (Connection.get_domstr con));
> > +			Connection.mark_as_bad con;
> > +			false
> > +	in
> > +
> > +	if newpacket then (
> >   		let packet = Connection.pop_in con in
> >   		let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
> >   		(* As we don''t log IO, do not call an unnecessary
sanitize_data
> > diff --git a/tools/ocaml/xenstored/xenstored.ml
> b/tools/ocaml/xenstored/xenstored.ml
> > index 4045aed..cfc2a4b 100644
> > --- a/tools/ocaml/xenstored/xenstored.ml
> > +++ b/tools/ocaml/xenstored/xenstored.ml
> > @@ -50,9 +50,10 @@
> >
> >   let process_domains store cons domains > >   	let do_io_domain
domain > > -		let con = Connections.find_domain cons (Domain.get_id
domain) in
> > -		Process.do_input store cons domains con;
> > -		Process.do_output store cons domains con in
> > +		if not (Domain.is_bad_domain domain) then
> > +			let con = Connections.find_domain cons (Domain.get_id
> domain) in
> > +			Process.do_input store cons domains con;
> > +			Process.do_output store cons domains con in
> >   	Domains.iter domains do_io_domain
> >
> >   let sigusr1_handler store > >
> >
> >> -----Original Message-----
> >> From: Liuqiming (John)
> >> Sent: Friday, July 05, 2013 11:14 AM
> >> To: ''andrew.cooper3@citrix.com''
> >> Cc: Yanqiangjun; ''xen-devel@lists.xen.org''
> >> Subject: RE: [Xen-devel] oxenstored memory leak? seems related
with
> >> XSA-38
> >>
> >> Hi Andrew,
> >>
> >> Yes, I know the second patch commit by Ian Campbell.
> >>
> >> Actually I have applied all the patch from
> >>
> http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_inc
> >> orrect_handling_of_certain_Xenbus_ring_states
> >>
> >> But this issue still exists.
> >>
> >>> On 04/07/13 03:48, Liuqiming (John) wrote:
> >>>> Hi all,
> >>>>
> >>>> Continue my test about oxenstored:
> >>>>
> >>>> I switch to original C xenstored and test my
"broken" vm. The
> cxenstored
> >>> do not have the "memory leak" issue.
> >>>> So I compared the IO ring handler logic between cxenstored
and
> >>> oxenstored and find out the difference:
> >>>>
> >>>> In Cxenstord, after got the cons and prod value, a index
check will be
> >>> performed
> >>>>
> >>>> 	if (!check_indexes(cons, prod)) {
> >>>> 		errno = EIO;
> >>>> 		return -1;
> >>>> 	}
> >>>>
> >>>> static bool check_indexes(XENSTORE_RING_IDX cons,
> >>> XENSTORE_RING_IDX prod)
> >>>> {
> >>>> 	return ((prod - cons) <= XENSTORE_RING_SIZE);
> >>>> }
> >>>>
> >>>> So any connection has prod - cons > XENSTORE_RING_SIZE
will be
> >> treated
> >>> as "bad client", and cxenstored will not handle its
IO ring msg any more.
> >>>>
> >>>> But in oxenstored, there is just a simple comparison
between prod and
> >>> cons
> >>>>      if (prod == cons)
> >>>> 		return 0;
> >>>>
> >>>> so there leaves a security hole to a guest vm user who can
manipulate
> >> prod
> >>> and cons to make oxenstored increasing memory usage and out of
> service.
> >>>>
> >>>> I managed to create a patch to fix this and I''m
testing it on xen4.2.2.
> Will
> >>> send out soon.
> >>>
> >>> Are you aware of
> >>>
> >>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
> >>> a787230f519a19de1cbc22ad8
> >>> which is a second patch for XSA-38 ?
> >>>
> >>> ~Andrew
> >>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Liuqiming (John)
> >>>>> Sent: Monday, July 01, 2013 9:47 PM
> >>>>> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
> >>>>> ''ian.campbell@citrix.com''
> >>>>> Cc: Yanqiangjun
> >>>>> Subject: oxenstored memory leak? seems related with
XSA-38
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I test starting vm using xen-4.2.2 release with
oxenstored, and got a
> >>> problem
> >>>>> may be related with XSA-38
> >>>>>
> >>>
> >>
> (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> >>>>>
> >>>>> When vm started, oxenstored memory usage keep
increasing, and it
> >> took
> >>>>> 1.5G memory at last. Vm hanged at loading OS screen.
> >>>>>
> >>>>> Here is the output of top:
> >>>>>
> >>>>> top - 20:18:32 up 1 day,  3:09,  5 users,  load
average: 0.99, 0.63,
> >> 0.32
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0
> zombie
> >>>>> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa, 
0.0 hi,  0.0 si,
> >>> 0.0
> >>>>> st
> >>>>> KiB Mem:  46919428 total, 46699012 used,   220416
free,
> 36916
> >>>>> buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292
free,
> 44260932
> >>>>> cached
> >>>>>
> >>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> >>> TIME+
> >>>>> COMMAND
> >>>>>    806 root      20   0  955m 926m 1068 R  99.9  2.0
> 4:54.14
> >>>>> oxenstored
> >>>>>
> >>>>>
> >>>>> top - 20:19:05 up 1 day,  3:09,  5 users,  load
average: 0.99, 0.67,
> >> 0.34
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0
> zombie
> >>>>> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa, 
0.0 hi,  0.0 si,
> >>> 0.0
> >>>>> st
> >>>>> KiB Mem:  46919428 total, 46708564 used,   210864
free,
> 36964
> >>>>> buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292
free,
> 44168380
> >>>>> cached
> >>>>>
> >>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> >>> TIME+
> >>>>> COMMAND
> >>>>>    806 root      20   0 1048m 1.0g 1068 R 100.2  2.2
> 5:27.03
> >>>>> oxenstored
> >>>>>
> >>>>>
> >>>>>
> >>>>> top - 20:21:35 up 1 day,  3:12,  5 users,  load
average: 1.00, 0.80,
> >> 0.44
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0
> zombie
> >>>>> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa, 
0.0 hi,  0.0 si,
> >>> 0.0
> >>>>> st
> >>>>> KiB Mem:  46919428 total, 46703052 used,   216376
free,
> 37208
> >>>>> buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292
free,
> 43682968
> >>>>> cached
> >>>>>
> >>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> >>> TIME+
> >>>>> COMMAND
> >>>>>    806 root      20   0 1551m 1.5g 1068 R 100.2  3.3
> 7:56.10
> >>>>> oxenstored
> >>>>>
> >>>>> And oxenstored log got these over and over again:
> >>>>>
> >>>>> [20130701T12:27:14.290Z] D8           invalid
> >>>>> device/suspend/event-channel
> >>>>>
> >>>>> ..
> >>>>> [20130701T12:27:14.290Z] D8.1937077039 invalid
> >>>>> /event-channel
> >>>>>
> >>>>>    ..
> >>>>> [20130701T12:27:14.290Z] D8.1852727656
> >>>>> invalid
> >>>>>
> >>>>>              ..
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>>
> >>>>> My vm is a windows guest and has GPL PVDriver
installed. This
> problem
> >> is
> >>>>> hard to reproduce, and after a hard reboot, everything
looks normal.
> >>>>>
> >>>>> I guess it''s something wrong with the xenbus
IO Ring, so I investigated
> >> the
> >>>>> code:
> >>>>>
> >>>>> 1) oxenstored and xenbus in vm using a shared page to
communicate
> >> with
> >>>>> each other
> >>>>>    struct xenstore_domain_interface {
> >>>>>      char req[XENSTORE_RING_SIZE]; /* Requests to
xenstore
> >> daemon.
> >>> */
> >>>>>      char rsp[XENSTORE_RING_SIZE]; /* Replies and
async watch
> >> events.
> >>> */
> >>>>>      XENSTORE_RING_IDX req_cons, req_prod;
> >>>>>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> >>>>> };
> >>>>>
> >>>>> 2) xenbus in vm put request in req and increase
req_prod, then send a
> >>> event
> >>>>> to oxenstored
> >>>>> 3) oxenstored calculates how many to read using
req_cons and
> >> req_prod,
> >>>>> and after read oxenstored increase req_cons to make it
equals
> req_prod
> >>>>> which means no request pending.
> >>>>> 4) oxenstored put responds in rsp and increase
rsp_prod, then send a
> >>> event
> >>>>> to vm, xenbus in vm using similar logic to handle the
rsp ring.
> >>>>>
> >>>>>   Am I correct?
> >>>>>
> >>>>> So, I''m curious about what happened when
req_cons larger than
> >>> req_prod
> >>>>> (this can be caused by buggy PV Driver or malicious
guest user), it
> seems
> >>>>> oxenstored will fell in a endless loop.
> >>>>>
> >>>>> Is this what XSA-38 talk about?
> >>>>>
> >>>>> I built a pvdriver which will set req_prod to 0 after
several xenstore
> >>>>> operation, and test it on xen-unstable.hg make sure
all XSA-38
> patches
> >>>>> applied.
> >>>>> It seems that the problem I first met reproduced.
Oxenstored will took
> a
> >>> lot
> >>>>> of memory eventually.
> >>>>>
> >>>>> Could anyone help me about this issue?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@lists.xen.org
> >>>> http://lists.xen.org/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> >

Ian Campbell

2013-Jul-16 09:46 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

On Mon, 2013-07-15 at 13:13 +0100, David Scott wrote:> Hi,
> 
> On 05/07/13 10:07, Liuqiming (John) wrote:
> >
> > Here is my patch that try to fix this issue.
> >
> > The whole idea is: add check logic when read from IO ring, and if
error happens mark the reading connection as "bad",
> > Unless vm reboot, oxenstored will not handle message from this
connection any more.
> 
> I think detecting a bad client and avoiding wasting CPU time on it is a 
> good idea. Is this patch working well for you in your testing?
> 
> In future I wonder whether we should add some form of request 
> rate-limiting in addition to the per-domain quotas, to prevent one 
> domain from taking more than it''s fair share of xenstored time. I 
> imagine that a non-malicious domain can still keep xenstored busy with 
> ''legitimate'' traffic (although hopefully it still
provides services to
> other guests, albeit more slowly)
Is this an Ack for this patch, or (leaving aside future work) would you
like to see changes before it gets applied?

Ian.
> 
> Cheers,
> Dave
> 
> >
> > diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
b/tools/ocaml/libs/xb/xs_ring_stubs.c
> > index fdd9983..a9ca456 100644
> > --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> > +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> > @@ -45,6 +45,10 @@
> >   	cons = *(volatile uint32*)&intf->req_cons;
> >   	prod = *(volatile uint32*)&intf->req_prod;
> >   	xen_mb();
> > +
> > +	if ((prod - cons) > XENSTORE_RING_SIZE)
> > +	    return -1;
> > +
> >   	if (prod == cons)
> >   		return 0;
> >   	cons = MASK_XENSTORE_IDX(cons);
> > @@ -94,7 +98,7 @@
> >   	res = xs_ring_read(GET_C_STRUCT(interface),
> >   	                   String_val(buffer), Int_val(len));
> >   	if (res == -1)
> > -		caml_failwith("huh");
> > +		caml_failwith("bad client");
> >   	result = Val_int(res);
> >   	CAMLreturn(result);
> >   }
> > diff --git a/tools/ocaml/xenstored/connection.ml
b/tools/ocaml/xenstored/connection.ml
> > index 32e2f2e..e862c79 100644
> > --- a/tools/ocaml/xenstored/connection.ml
> > +++ b/tools/ocaml/xenstored/connection.ml
> > @@ -38,6 +38,11 @@
> >   	mutable perm: Perms.Connection.t;
> >   }
> >
> > +let mark_as_bad con > > +	match con.dom with
> > +	| None -> ()
> > +	| Some domain -> Domain.mark_as_bad domain
> > +
> >   let get_path con > >   Printf.sprintf
"/local/domain/%i/" (match con.dom with None -> 0 | Some d ->
Domain.get_id d)
> >
> > diff --git a/tools/ocaml/xenstored/domain.ml
b/tools/ocaml/xenstored/domain.ml
> > index 85ab282..444069d 100644
> > --- a/tools/ocaml/xenstored/domain.ml
> > +++ b/tools/ocaml/xenstored/domain.ml
> > @@ -27,6 +27,7 @@
> >   	interface: Xenmmap.mmap_interface;
> >   	eventchn: Event.t;
> >   	mutable port: Xeneventchn.t option;
> > +	mutable bad_client: bool;
> >   }
> >
> >   let get_path dom = "/local/domain/" ^ (sprintf
"%u" dom.id)
> > @@ -34,6 +35,9 @@
> >   let get_interface d = d.interface
> >   let get_mfn d = d.mfn
> >   let get_remote_port d = d.remote_port
> > +
> > +let is_bad_domain domain = domain.bad_client
> > +let mark_as_bad domain = domain.bad_client <- true
> >
> >   let string_of_port = function
> >   | None -> "None"
> > @@ -68,7 +72,8 @@
> >   	remote_port = remote_port;
> >   	interface = interface;
> >   	eventchn = eventchn;
> > -	port = None
> > +	port = None;
> > +	bad_client = false
> >   }
> >
> >   let is_dom0 d = d.id = 0
> > diff --git a/tools/ocaml/xenstored/process.ml
b/tools/ocaml/xenstored/process.ml
> > index a4ff741..2267ddc 100644
> > --- a/tools/ocaml/xenstored/process.ml
> > +++ b/tools/ocaml/xenstored/process.ml
> > @@ -374,7 +374,17 @@
> >   	Logging.xb_answer ~ty ~tid ~con:(Connection.get_domstr con) data
> >
> >   let do_input store cons doms con > > -	if Connection.do_input
con then (
> > +	let newpacket > > +		try
> > +			Connection.do_input con
> > +		with Failure exp ->
> > +			error "caught exception %s" exp;
> > +			error "got a bad client %s" (sprintf "%-8s"
(Connection.get_domstr con));
> > +			Connection.mark_as_bad con;
> > +			false
> > +	in
> > +	
> > +	if newpacket then (
> >   		let packet = Connection.pop_in con in
> >   		let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
> >   		(* As we don''t log IO, do not call an unnecessary
sanitize_data
> > diff --git a/tools/ocaml/xenstored/xenstored.ml
b/tools/ocaml/xenstored/xenstored.ml
> > index 4045aed..cfc2a4b 100644
> > --- a/tools/ocaml/xenstored/xenstored.ml
> > +++ b/tools/ocaml/xenstored/xenstored.ml
> > @@ -50,9 +50,10 @@
> >
> >   let process_domains store cons domains > >   	let do_io_domain
domain > > -		let con = Connections.find_domain cons (Domain.get_id
domain) in
> > -		Process.do_input store cons domains con;
> > -		Process.do_output store cons domains con in
> > +		if not (Domain.is_bad_domain domain) then
> > +			let con = Connections.find_domain cons (Domain.get_id domain) in
> > +			Process.do_input store cons domains con;
> > +			Process.do_output store cons domains con in
> >   	Domains.iter domains do_io_domain
> >
> >   let sigusr1_handler store > >
> >
> >> -----Original Message-----
> >> From: Liuqiming (John)
> >> Sent: Friday, July 05, 2013 11:14 AM
> >> To: ''andrew.cooper3@citrix.com''
> >> Cc: Yanqiangjun; ''xen-devel@lists.xen.org''
> >> Subject: RE: [Xen-devel] oxenstored memory leak? seems related
with
> >> XSA-38
> >>
> >> Hi Andrew,
> >>
> >> Yes, I know the second patch commit by Ian Campbell.
> >>
> >> Actually I have applied all the patch from
> >>
http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_inc
> >> orrect_handling_of_certain_Xenbus_ring_states
> >>
> >> But this issue still exists.
> >>
> >>> On 04/07/13 03:48, Liuqiming (John) wrote:
> >>>> Hi all,
> >>>>
> >>>> Continue my test about oxenstored:
> >>>>
> >>>> I switch to original C xenstored and test my
"broken" vm. The cxenstored
> >>> do not have the "memory leak" issue.
> >>>> So I compared the IO ring handler logic between cxenstored
and
> >>> oxenstored and find out the difference:
> >>>>
> >>>> In Cxenstord, after got the cons and prod value, a index
check will be
> >>> performed
> >>>>
> >>>> 	if (!check_indexes(cons, prod)) {
> >>>> 		errno = EIO;
> >>>> 		return -1;
> >>>> 	}
> >>>>
> >>>> static bool check_indexes(XENSTORE_RING_IDX cons,
> >>> XENSTORE_RING_IDX prod)
> >>>> {
> >>>> 	return ((prod - cons) <= XENSTORE_RING_SIZE);
> >>>> }
> >>>>
> >>>> So any connection has prod - cons > XENSTORE_RING_SIZE
will be
> >> treated
> >>> as "bad client", and cxenstored will not handle its
IO ring msg any more.
> >>>>
> >>>> But in oxenstored, there is just a simple comparison
between prod and
> >>> cons
> >>>>      if (prod == cons)
> >>>> 		return 0;
> >>>>
> >>>> so there leaves a security hole to a guest vm user who can
manipulate
> >> prod
> >>> and cons to make oxenstored increasing memory usage and out of
service.
> >>>>
> >>>> I managed to create a patch to fix this and I''m
testing it on xen4.2.2. Will
> >>> send out soon.
> >>>
> >>> Are you aware of
> >>>
> >>
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
> >>> a787230f519a19de1cbc22ad8
> >>> which is a second patch for XSA-38 ?
> >>>
> >>> ~Andrew
> >>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Liuqiming (John)
> >>>>> Sent: Monday, July 01, 2013 9:47 PM
> >>>>> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
> >>>>> ''ian.campbell@citrix.com''
> >>>>> Cc: Yanqiangjun
> >>>>> Subject: oxenstored memory leak? seems related with
XSA-38
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I test starting vm using xen-4.2.2 release with
oxenstored, and got a
> >>> problem
> >>>>> may be related with XSA-38
> >>>>>
> >>>
> >>
(http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> >>>>>
> >>>>> When vm started, oxenstored memory usage keep
increasing, and it
> >> took
> >>>>> 1.5G memory at last. Vm hanged at loading OS screen.
> >>>>>
> >>>>> Here is the output of top:
> >>>>>
> >>>>> top - 20:18:32 up 1 day,  3:09,  5 users,  load
average: 0.99, 0.63,
> >> 0.32
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0 zombie
> >>>>> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa, 
0.0 hi,  0.0 si,
> >>> 0.0
> >>>>> st
> >>>>> KiB Mem:  46919428 total, 46699012 used,   220416
free,    36916
> >>>>> buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292
free, 44260932
> >>>>> cached
> >>>>>
> >>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> >>> TIME+
> >>>>> COMMAND
> >>>>>    806 root      20   0  955m 926m 1068 R  99.9  2.0  
4:54.14
> >>>>> oxenstored
> >>>>>
> >>>>>
> >>>>> top - 20:19:05 up 1 day,  3:09,  5 users,  load
average: 0.99, 0.67,
> >> 0.34
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0 zombie
> >>>>> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa, 
0.0 hi,  0.0 si,
> >>> 0.0
> >>>>> st
> >>>>> KiB Mem:  46919428 total, 46708564 used,   210864
free,    36964
> >>>>> buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292
free, 44168380
> >>>>> cached
> >>>>>
> >>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> >>> TIME+
> >>>>> COMMAND
> >>>>>    806 root      20   0 1048m 1.0g 1068 R 100.2  2.2  
5:27.03
> >>>>> oxenstored
> >>>>>
> >>>>>
> >>>>>
> >>>>> top - 20:21:35 up 1 day,  3:12,  5 users,  load
average: 1.00, 0.80,
> >> 0.44
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0 zombie
> >>>>> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa, 
0.0 hi,  0.0 si,
> >>> 0.0
> >>>>> st
> >>>>> KiB Mem:  46919428 total, 46703052 used,   216376
free,    37208
> >>>>> buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292
free, 43682968
> >>>>> cached
> >>>>>
> >>>>>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM
> >>> TIME+
> >>>>> COMMAND
> >>>>>    806 root      20   0 1551m 1.5g 1068 R 100.2  3.3  
7:56.10
> >>>>> oxenstored
> >>>>>
> >>>>> And oxenstored log got these over and over again:
> >>>>>
> >>>>> [20130701T12:27:14.290Z] D8           invalid
> >>>>> device/suspend/event-channel
> >>>>>
> >>>>> ..
> >>>>> [20130701T12:27:14.290Z] D8.1937077039 invalid
> >>>>> /event-channel
> >>>>>
> >>>>>    ..
> >>>>> [20130701T12:27:14.290Z] D8.1852727656
> >>>>> invalid
> >>>>>
> >>>>>              ..
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>> [20130701T12:27:14.290Z] D8           debug
> >>>>>
> >>>>> My vm is a windows guest and has GPL PVDriver
installed. This problem
> >> is
> >>>>> hard to reproduce, and after a hard reboot, everything
looks normal.
> >>>>>
> >>>>> I guess it''s something wrong with the xenbus
IO Ring, so I investigated
> >> the
> >>>>> code:
> >>>>>
> >>>>> 1) oxenstored and xenbus in vm using a shared page to
communicate
> >> with
> >>>>> each other
> >>>>>    struct xenstore_domain_interface {
> >>>>>      char req[XENSTORE_RING_SIZE]; /* Requests to
xenstore
> >> daemon.
> >>> */
> >>>>>      char rsp[XENSTORE_RING_SIZE]; /* Replies and
async watch
> >> events.
> >>> */
> >>>>>      XENSTORE_RING_IDX req_cons, req_prod;
> >>>>>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> >>>>> };
> >>>>>
> >>>>> 2) xenbus in vm put request in req and increase
req_prod, then send a
> >>> event
> >>>>> to oxenstored
> >>>>> 3) oxenstored calculates how many to read using
req_cons and
> >> req_prod,
> >>>>> and after read oxenstored increase req_cons to make it
equals req_prod
> >>>>> which means no request pending.
> >>>>> 4) oxenstored put responds in rsp and increase
rsp_prod, then send a
> >>> event
> >>>>> to vm, xenbus in vm using similar logic to handle the
rsp ring.
> >>>>>
> >>>>>   Am I correct?
> >>>>>
> >>>>> So, I''m curious about what happened when
req_cons larger than
> >>> req_prod
> >>>>> (this can be caused by buggy PV Driver or malicious
guest user), it seems
> >>>>> oxenstored will fell in a endless loop.
> >>>>>
> >>>>> Is this what XSA-38 talk about?
> >>>>>
> >>>>> I built a pvdriver which will set req_prod to 0 after
several xenstore
> >>>>> operation, and test it on xen-unstable.hg make sure
all XSA-38 patches
> >>>>> applied.
> >>>>> It seems that the problem I first met reproduced.
Oxenstored will took a
> >>> lot
> >>>>> of memory eventually.
> >>>>>
> >>>>> Could anyone help me about this issue?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@lists.xen.org
> >>>> http://lists.xen.org/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> >
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

David Scott

2013-Jul-16 09:56 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

On 16/07/13 10:46, Ian Campbell wrote:> On Mon, 2013-07-15 at 13:13 +0100, David Scott wrote:
>> Hi,
>>
>> On 05/07/13 10:07, Liuqiming (John) wrote:
>>>
>>> Here is my patch that try to fix this issue.
>>>
>>> The whole idea is: add check logic when read from IO ring, and if
error happens mark the reading connection as "bad",
>>> Unless vm reboot, oxenstored will not handle message from this
connection any more.
>>
>> I think detecting a bad client and avoiding wasting CPU time on it is a
>> good idea. Is this patch working well for you in your testing?
>>
>> In future I wonder whether we should add some form of request
>> rate-limiting in addition to the per-domain quotas, to prevent one
>> domain from taking more than it''s fair share of xenstored
time. I
>> imagine that a non-malicious domain can still keep xenstored busy with
>> ''legitimate'' traffic (although hopefully it still
provides services to
>> other guests, albeit more slowly)
>
> Is this an Ack for this patch, or (leaving aside future work) would you
> like to see changes before it gets applied?
I''m happy with this current patch for now, so

Acked-by: David Scott <dave.scott@eu.citrix.com>

Cheers,
Dave
>
> Ian.
>
>>
>> Cheers,
>> Dave
>>
>>>
>>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
b/tools/ocaml/libs/xb/xs_ring_stubs.c
>>> index fdd9983..a9ca456 100644
>>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>>> @@ -45,6 +45,10 @@
>>>    	cons = *(volatile uint32*)&intf->req_cons;
>>>    	prod = *(volatile uint32*)&intf->req_prod;
>>>    	xen_mb();
>>> +
>>> +	if ((prod - cons) > XENSTORE_RING_SIZE)
>>> +	    return -1;
>>> +
>>>    	if (prod == cons)
>>>    		return 0;
>>>    	cons = MASK_XENSTORE_IDX(cons);
>>> @@ -94,7 +98,7 @@
>>>    	res = xs_ring_read(GET_C_STRUCT(interface),
>>>    	                   String_val(buffer), Int_val(len));
>>>    	if (res == -1)
>>> -		caml_failwith("huh");
>>> +		caml_failwith("bad client");
>>>    	result = Val_int(res);
>>>    	CAMLreturn(result);
>>>    }
>>> diff --git a/tools/ocaml/xenstored/connection.ml
b/tools/ocaml/xenstored/connection.ml
>>> index 32e2f2e..e862c79 100644
>>> --- a/tools/ocaml/xenstored/connection.ml
>>> +++ b/tools/ocaml/xenstored/connection.ml
>>> @@ -38,6 +38,11 @@
>>>    	mutable perm: Perms.Connection.t;
>>>    }
>>>
>>> +let mark_as_bad con >>> +	match con.dom with
>>> +	| None -> ()
>>> +	| Some domain -> Domain.mark_as_bad domain
>>> +
>>>    let get_path con >>>    Printf.sprintf
"/local/domain/%i/" (match con.dom with None -> 0 | Some d ->
Domain.get_id d)
>>>
>>> diff --git a/tools/ocaml/xenstored/domain.ml
b/tools/ocaml/xenstored/domain.ml
>>> index 85ab282..444069d 100644
>>> --- a/tools/ocaml/xenstored/domain.ml
>>> +++ b/tools/ocaml/xenstored/domain.ml
>>> @@ -27,6 +27,7 @@
>>>    	interface: Xenmmap.mmap_interface;
>>>    	eventchn: Event.t;
>>>    	mutable port: Xeneventchn.t option;
>>> +	mutable bad_client: bool;
>>>    }
>>>
>>>    let get_path dom = "/local/domain/" ^ (sprintf
"%u" dom.id)
>>> @@ -34,6 +35,9 @@
>>>    let get_interface d = d.interface
>>>    let get_mfn d = d.mfn
>>>    let get_remote_port d = d.remote_port
>>> +
>>> +let is_bad_domain domain = domain.bad_client
>>> +let mark_as_bad domain = domain.bad_client <- true
>>>
>>>    let string_of_port = function
>>>    | None -> "None"
>>> @@ -68,7 +72,8 @@
>>>    	remote_port = remote_port;
>>>    	interface = interface;
>>>    	eventchn = eventchn;
>>> -	port = None
>>> +	port = None;
>>> +	bad_client = false
>>>    }
>>>
>>>    let is_dom0 d = d.id = 0
>>> diff --git a/tools/ocaml/xenstored/process.ml
b/tools/ocaml/xenstored/process.ml
>>> index a4ff741..2267ddc 100644
>>> --- a/tools/ocaml/xenstored/process.ml
>>> +++ b/tools/ocaml/xenstored/process.ml
>>> @@ -374,7 +374,17 @@
>>>    	Logging.xb_answer ~ty ~tid ~con:(Connection.get_domstr con)
data
>>>
>>>    let do_input store cons doms con >>> -	if
Connection.do_input con then (
>>> +	let newpacket >>> +		try
>>> +			Connection.do_input con
>>> +		with Failure exp ->
>>> +			error "caught exception %s" exp;
>>> +			error "got a bad client %s" (sprintf "%-8s"
(Connection.get_domstr con));
>>> +			Connection.mark_as_bad con;
>>> +			false
>>> +	in
>>> +	
>>> +	if newpacket then (
>>>    		let packet = Connection.pop_in con in
>>>    		let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
>>>    		(* As we don''t log IO, do not call an unnecessary
sanitize_data
>>> diff --git a/tools/ocaml/xenstored/xenstored.ml
b/tools/ocaml/xenstored/xenstored.ml
>>> index 4045aed..cfc2a4b 100644
>>> --- a/tools/ocaml/xenstored/xenstored.ml
>>> +++ b/tools/ocaml/xenstored/xenstored.ml
>>> @@ -50,9 +50,10 @@
>>>
>>>    let process_domains store cons domains >>>    	let
do_io_domain domain >>> -		let con = Connections.find_domain cons
(Domain.get_id domain) in
>>> -		Process.do_input store cons domains con;
>>> -		Process.do_output store cons domains con in
>>> +		if not (Domain.is_bad_domain domain) then
>>> +			let con = Connections.find_domain cons (Domain.get_id domain)
in
>>> +			Process.do_input store cons domains con;
>>> +			Process.do_output store cons domains con in
>>>    	Domains.iter domains do_io_domain
>>>
>>>    let sigusr1_handler store >>>
>>>
>>>> -----Original Message-----
>>>> From: Liuqiming (John)
>>>> Sent: Friday, July 05, 2013 11:14 AM
>>>> To: ''andrew.cooper3@citrix.com''
>>>> Cc: Yanqiangjun; ''xen-devel@lists.xen.org''
>>>> Subject: RE: [Xen-devel] oxenstored memory leak? seems related
with
>>>> XSA-38
>>>>
>>>> Hi Andrew,
>>>>
>>>> Yes, I know the second patch commit by Ian Campbell.
>>>>
>>>> Actually I have applied all the patch from
>>>>
http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_inc
>>>> orrect_handling_of_certain_Xenbus_ring_states
>>>>
>>>> But this issue still exists.
>>>>
>>>>> On 04/07/13 03:48, Liuqiming (John) wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Continue my test about oxenstored:
>>>>>>
>>>>>> I switch to original C xenstored and test my
"broken" vm. The cxenstored
>>>>> do not have the "memory leak" issue.
>>>>>> So I compared the IO ring handler logic between
cxenstored and
>>>>> oxenstored and find out the difference:
>>>>>>
>>>>>> In Cxenstord, after got the cons and prod value, a
index check will be
>>>>> performed
>>>>>>
>>>>>> 	if (!check_indexes(cons, prod)) {
>>>>>> 		errno = EIO;
>>>>>> 		return -1;
>>>>>> 	}
>>>>>>
>>>>>> static bool check_indexes(XENSTORE_RING_IDX cons,
>>>>> XENSTORE_RING_IDX prod)
>>>>>> {
>>>>>> 	return ((prod - cons) <= XENSTORE_RING_SIZE);
>>>>>> }
>>>>>>
>>>>>> So any connection has prod - cons >
XENSTORE_RING_SIZE will be
>>>> treated
>>>>> as "bad client", and cxenstored will not handle
its IO ring msg any more.
>>>>>>
>>>>>> But in oxenstored, there is just a simple comparison
between prod and
>>>>> cons
>>>>>>       if (prod == cons)
>>>>>> 		return 0;
>>>>>>
>>>>>> so there leaves a security hole to a guest vm user who
can manipulate
>>>> prod
>>>>> and cons to make oxenstored increasing memory usage and out
of service.
>>>>>>
>>>>>> I managed to create a patch to fix this and
I''m testing it on xen4.2.2. Will
>>>>> send out soon.
>>>>>
>>>>> Are you aware of
>>>>>
>>>>
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
>>>>> a787230f519a19de1cbc22ad8
>>>>> which is a second patch for XSA-38 ?
>>>>>
>>>>> ~Andrew
>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Liuqiming (John)
>>>>>>> Sent: Monday, July 01, 2013 9:47 PM
>>>>>>> To: ''xen-devel@lists.xen.org'';
''ian.jackson@eu.citrix.com'';
>>>>>>> ''ian.campbell@citrix.com''
>>>>>>> Cc: Yanqiangjun
>>>>>>> Subject: oxenstored memory leak? seems related with
XSA-38
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I test starting vm using xen-4.2.2 release with
oxenstored, and got a
>>>>> problem
>>>>>>> may be related with XSA-38
>>>>>>>
>>>>>
>>>>
(http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
>>>>>>>
>>>>>>> When vm started, oxenstored memory usage keep
increasing, and it
>>>> took
>>>>>>> 1.5G memory at last. Vm hanged at loading OS
screen.
>>>>>>>
>>>>>>> Here is the output of top:
>>>>>>>
>>>>>>> top - 20:18:32 up 1 day,  3:09,  5 users,  load
average: 0.99, 0.63,
>>>> 0.32
>>>>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0 zombie
>>>>>>> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0
wa,  0.0 hi,  0.0 si,
>>>>> 0.0
>>>>>>> st
>>>>>>> KiB Mem:  46919428 total, 46699012 used,   220416
free,    36916
>>>>>>> buffers
>>>>>>> KiB Swap:  2103292 total,        0 used,  2103292
free, 44260932
>>>>>>> cached
>>>>>>>
>>>>>>>     PID USER      PR  NI  VIRT  RES  SHR S  %CPU
%MEM
>>>>> TIME+
>>>>>>> COMMAND
>>>>>>>     806 root      20   0  955m 926m 1068 R  99.9 
2.0   4:54.14
>>>>>>> oxenstored
>>>>>>>
>>>>>>>
>>>>>>> top - 20:19:05 up 1 day,  3:09,  5 users,  load
average: 0.99, 0.67,
>>>> 0.34
>>>>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0 zombie
>>>>>>> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0
wa,  0.0 hi,  0.0 si,
>>>>> 0.0
>>>>>>> st
>>>>>>> KiB Mem:  46919428 total, 46708564 used,   210864
free,    36964
>>>>>>> buffers
>>>>>>> KiB Swap:  2103292 total,        0 used,  2103292
free, 44168380
>>>>>>> cached
>>>>>>>
>>>>>>>     PID USER      PR  NI  VIRT  RES  SHR S  %CPU
%MEM
>>>>> TIME+
>>>>>>> COMMAND
>>>>>>>     806 root      20   0 1048m 1.0g 1068 R 100.2 
2.2   5:27.03
>>>>>>> oxenstored
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> top - 20:21:35 up 1 day,  3:12,  5 users,  load
average: 1.00, 0.80,
>>>> 0.44
>>>>>>> Tasks:   1 total,   1 running,   0 sleeping,   0
stopped,   0 zombie
>>>>>>> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0
wa,  0.0 hi,  0.0 si,
>>>>> 0.0
>>>>>>> st
>>>>>>> KiB Mem:  46919428 total, 46703052 used,   216376
free,    37208
>>>>>>> buffers
>>>>>>> KiB Swap:  2103292 total,        0 used,  2103292
free, 43682968
>>>>>>> cached
>>>>>>>
>>>>>>>     PID USER      PR  NI  VIRT  RES  SHR S  %CPU
%MEM
>>>>> TIME+
>>>>>>> COMMAND
>>>>>>>     806 root      20   0 1551m 1.5g 1068 R 100.2 
3.3   7:56.10
>>>>>>> oxenstored
>>>>>>>
>>>>>>> And oxenstored log got these over and over again:
>>>>>>>
>>>>>>> [20130701T12:27:14.290Z] D8           invalid
>>>>>>> device/suspend/event-channel
>>>>>>>
>>>>>>> ..
>>>>>>> [20130701T12:27:14.290Z] D8.1937077039 invalid
>>>>>>> /event-channel
>>>>>>>
>>>>>>>     ..
>>>>>>> [20130701T12:27:14.290Z] D8.1852727656
>>>>>>> invalid
>>>>>>>
>>>>>>>               ..
>>>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>>> [20130701T12:27:14.290Z] D8           debug
>>>>>>>
>>>>>>> My vm is a windows guest and has GPL PVDriver
installed. This problem
>>>> is
>>>>>>> hard to reproduce, and after a hard reboot,
everything looks normal.
>>>>>>>
>>>>>>> I guess it''s something wrong with the
xenbus IO Ring, so I investigated
>>>> the
>>>>>>> code:
>>>>>>>
>>>>>>> 1) oxenstored and xenbus in vm using a shared page
to communicate
>>>> with
>>>>>>> each other
>>>>>>>     struct xenstore_domain_interface {
>>>>>>>       char req[XENSTORE_RING_SIZE]; /* Requests to
xenstore
>>>> daemon.
>>>>> */
>>>>>>>       char rsp[XENSTORE_RING_SIZE]; /* Replies and
async watch
>>>> events.
>>>>> */
>>>>>>>       XENSTORE_RING_IDX req_cons, req_prod;
>>>>>>>       XENSTORE_RING_IDX rsp_cons, rsp_prod;
>>>>>>> };
>>>>>>>
>>>>>>> 2) xenbus in vm put request in req and increase
req_prod, then send a
>>>>> event
>>>>>>> to oxenstored
>>>>>>> 3) oxenstored calculates how many to read using
req_cons and
>>>> req_prod,
>>>>>>> and after read oxenstored increase req_cons to make
it equals req_prod
>>>>>>> which means no request pending.
>>>>>>> 4) oxenstored put responds in rsp and increase
rsp_prod, then send a
>>>>> event
>>>>>>> to vm, xenbus in vm using similar logic to handle
the rsp ring.
>>>>>>>
>>>>>>>    Am I correct?
>>>>>>>
>>>>>>> So, I''m curious about what happened when
req_cons larger than
>>>>> req_prod
>>>>>>> (this can be caused by buggy PV Driver or malicious
guest user), it seems
>>>>>>> oxenstored will fell in a endless loop.
>>>>>>>
>>>>>>> Is this what XSA-38 talk about?
>>>>>>>
>>>>>>> I built a pvdriver which will set req_prod to 0
after several xenstore
>>>>>>> operation, and test it on xen-unstable.hg make sure
all XSA-38 patches
>>>>>>> applied.
>>>>>>> It seems that the problem I first met reproduced.
Oxenstored will took a
>>>>> lot
>>>>>>> of memory eventually.
>>>>>>>
>>>>>>> Could anyone help me about this issue?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-devel mailing list
>>>>>> Xen-devel@lists.xen.org
>>>>>> http://lists.xen.org/xen-devel
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
>

Ian Campbell

2013-Jul-19 11:56 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

On Tue, 2013-07-16 at 10:56 +0100, David Scott wrote:> On 16/07/13 10:46, Ian Campbell wrote:
> > On Mon, 2013-07-15 at 13:13 +0100, David Scott wrote:
> >> Hi,
> >>
> >> On 05/07/13 10:07, Liuqiming (John) wrote:
> >>>
> >>> Here is my patch that try to fix this issue.
> >>>
> >>> The whole idea is: add check logic when read from IO ring, and
if error happens mark the reading connection as "bad",
> >>> Unless vm reboot, oxenstored will not handle message from this
connection any more.
> >>
> >> I think detecting a bad client and avoiding wasting CPU time on it
is a
> >> good idea. Is this patch working well for you in your testing?
> >>
> >> In future I wonder whether we should add some form of request
> >> rate-limiting in addition to the per-domain quotas, to prevent one
> >> domain from taking more than it''s fair share of xenstored
time. I
> >> imagine that a non-malicious domain can still keep xenstored busy
with
> >> ''legitimate'' traffic (although hopefully it
still provides services to
> >> other guests, albeit more slowly)
> >
> > Is this an Ack for this patch, or (leaving aside future work) would
you
> > like to see changes before it gets applied?
> 
> I''m happy with this current patch for now, so
> 
> Acked-by: David Scott <dave.scott@eu.citrix.com>
Thanks.

John -- Please could you resubmit with a proper changelog and a
Signed-off-by indicating that you agree to the Developer''s Certificate
of Origin (described in
http://wiki.xen.org/wiki/Submitting_Xen_Patches). Please include Dave''s
ack as well (after your S-o-b).

Thanks,
Ian.

Liuqiming (John)

2013-Jul-22 12:08 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

Here is my patch, please let me know if I did anything wrong. (this is my first
patch for xen :)

Thanks.
> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Friday, July 19, 2013 7:57 PM
> To: David Scott
> Cc: Liuqiming (John); andrew.cooper3@citrix.com; Yanqiangjun;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] oxenstored memory leak? seems related with
> XSA-38
> 
> On Tue, 2013-07-16 at 10:56 +0100, David Scott wrote:
> > On 16/07/13 10:46, Ian Campbell wrote:
> > > On Mon, 2013-07-15 at 13:13 +0100, David Scott wrote:
> > >> Hi,
> > >>
> > >> On 05/07/13 10:07, Liuqiming (John) wrote:
> > >>>
> > >>> Here is my patch that try to fix this issue.
> > >>>
> > >>> The whole idea is: add check logic when read from IO
ring, and if error
> happens mark the reading connection as "bad",
> > >>> Unless vm reboot, oxenstored will not handle message from
this
> connection any more.
> > >>
> > >> I think detecting a bad client and avoiding wasting CPU time
on it is a
> > >> good idea. Is this patch working well for you in your
testing?
> > >>
> > >> In future I wonder whether we should add some form of request
> > >> rate-limiting in addition to the per-domain quotas, to
prevent one
> > >> domain from taking more than it's fair share of xenstored
time. I
> > >> imagine that a non-malicious domain can still keep xenstored
busy with
> > >> 'legitimate' traffic (although hopefully it still
provides services to
> > >> other guests, albeit more slowly)
> > >
> > > Is this an Ack for this patch, or (leaving aside future work)
would you
> > > like to see changes before it gets applied?
> >
> > I'm happy with this current patch for now, so
> >
> > Acked-by: David Scott <dave.scott@eu.citrix.com>
> 
> Thanks.
> 
> John -- Please could you resubmit with a proper changelog and a
> Signed-off-by indicating that you agree to the Developer's Certificate
> of Origin (described in
> http://wiki.xen.org/wiki/Submitting_Xen_Patches). Please include Dave's
> ack as well (after your S-o-b).
> 
> Thanks,
> Ian.
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Ian Campbell

2013-Jul-22 21:37 UTC

head link

Re: oxenstored memory leak? seems related with XSA-38

On Mon, 2013-07-22 at 12:08 +0000, Liuqiming (John)
wrote:> Here is my patch, please let me know if I did anything wrong. (this is my
first patch for xen :)
Applied, thanks.

Generally a fresh email with the patch and the changelog inline in the
body is preferred, http://wiki.xen.org/wiki/Submitting_Xen_Patches has
some hints and descriptions of handy tools...

Ian.

Xen devel - Jul 2013 - oxenstored memory leak? seems related with XSA-38

oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38

Re: oxenstored memory leak? seems related with XSA-38