Michael S. Tsirkin
2017-Jul-26 16:08 UTC
[PATCH net] Revert "vhost: cache used event for better performance"
On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:> > > On 2017?07?26? 21:18, Jason Wang wrote: > > > > > > On 2017?07?26? 20:57, Michael S. Tsirkin wrote: > > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote: > > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it > > > > was reported to break vhost_net. We want to cache used event and use > > > > it to check for notification. We try to valid cached used event by > > > > checking whether or not it was ahead of new, but this is not correct > > > > all the time, it could be stale and there's no way to know about this. > > > > > > > > Signed-off-by: Jason Wang<jasowang at redhat.com> > > > Could you supply a bit more data here please? How does it get stale? > > > What does guest need to do to make it stale? This will be helpful if > > > anyone wants to bring it back, or if we want to extend the protocol. > > > > > > > The problem we don't know whether or not guest has published a new used > > event. The check vring_need_event(vq->last_used_event, new + vq->num, > > new) is not sufficient to check for this. > > > > Thanks > > More notes, the previous assumption is that we don't move used event back, > but this could happen in fact if idx is wrapper around.You mean if the 16 bit index wraps around after 64K entries. Makes sense.> Will repost and add > this into commit log. > > Thanks
K. Den
2017-Jul-30 06:26 UTC
[PATCH net] Revert "vhost: cache used event for better performance"
On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote:> On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote: > > > > > > On 2017?07?26? 21:18, Jason Wang wrote: > > > > > > > > > On 2017?07?26? 20:57, Michael S. Tsirkin wrote: > > > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote: > > > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it > > > > > was reported to break vhost_net. We want to cache used event and use > > > > > it to check for notification. We try to valid cached used event by > > > > > checking whether or not it was ahead of new, but this is not correct > > > > > all the time, it could be stale and there's no way to know about this. > > > > > > > > > > Signed-off-by: Jason Wang<jasowang at redhat.com> > > > > > > > > Could you supply a bit more data here please???How does it get stale? > > > > What does guest need to do to make it stale???This will be helpful if > > > > anyone wants to bring it back, or if we want to extend the protocol. > > > > > > > > > > The problem we don't know whether or not guest has published a new used > > > event. The check vring_need_event(vq->last_used_event, new + vq->num, > > > new) is not sufficient to check for this. > > > > > > Thanks > > > > More notes, the previous assumption is that we don't move used event back, > > but this could happen in fact if idx is wrapper around. > > You mean if the 16 bit index wraps around after 64K entries. > Makes sense. > > > Will repost and add > > this into commit log. > > > > ThanksHi, I am just curious but I have got a question: AFAIU, if you wanted to keep the caching mechanism alive in the code base, the following two changes could clear off the issue, or not?: (1) Always fetch the latest event value from guest when signalled_used event is invalid, which includes last_used_idx wraps-around case. Otherwise we might need changes which would complicate too much the logic to properly decide whether or not to skip signalling in the next vhost_notify round. (2) On top of that, split the signal-postponing logic to three cases like: * if the interval of vq.num is [2^16, UINT_MAX]: any cached event is in should-postpone-signalling interval, so paradoxically must always do signalling. * else if the interval of vq.num is [2^15, 2^16): the logic in the original patch (809ecb9bca6a9) suffices * else (= less than 2^15) (optional): checking only (vring_need_event(vq->last_used_event, new + vq->num, new) would suffice. Am I missing something, or is this irrelevant? I would appreciate if you could elaborate a bit more how the situation where event idx wraps around and moves back would make trouble. Thanks.
Jason Wang
2017-Aug-09 02:38 UTC
[PATCH net] Revert "vhost: cache used event for better performance"
On 2017?07?30? 14:26, K. Den wrote:> On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote: >> On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote: >>> >>> On 2017?07?26? 21:18, Jason Wang wrote: >>>> >>>> On 2017?07?26? 20:57, Michael S. Tsirkin wrote: >>>>> On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote: >>>>>> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it >>>>>> was reported to break vhost_net. We want to cache used event and use >>>>>> it to check for notification. We try to valid cached used event by >>>>>> checking whether or not it was ahead of new, but this is not correct >>>>>> all the time, it could be stale and there's no way to know about this. >>>>>> >>>>>> Signed-off-by: Jason Wang<jasowang at redhat.com> >>>>> Could you supply a bit more data here please? How does it get stale? >>>>> What does guest need to do to make it stale? This will be helpful if >>>>> anyone wants to bring it back, or if we want to extend the protocol. >>>>> >>>> The problem we don't know whether or not guest has published a new used >>>> event. The check vring_need_event(vq->last_used_event, new + vq->num, >>>> new) is not sufficient to check for this. >>>> >>>> Thanks >>> More notes, the previous assumption is that we don't move used event back, >>> but this could happen in fact if idx is wrapper around. >> You mean if the 16 bit index wraps around after 64K entries. >> Makes sense. >> >>> Will repost and add >>> this into commit log. >>> >>> Thanks > Hi,Hi, sorry for the late reply, was on vacation last week.> > I am just curious but I have got a question: > AFAIU, if you wanted to keep the caching mechanism alive in the code base, > the following two changes could clear off the issue, or not?: > (1) Always fetch the latest event value from guest when signalled_used event is > invalid, which includes last_used_idx wraps-around case. Otherwise we might need > changes which would complicate too much the logic to properly decide whether or > not to skip signalling in the next vhost_notify round. > (2) On top of that, split the signal-postponing logic to three cases like: > * if the interval of vq.num is [2^16, UINT_MAX]: > any cached event is in should-postpone-signalling interval, so paradoxically > must always do signalling.I think don't think current code can work well if vq.num is grater than 2^15. Since all cached idx is u16. This looks like a bug which needs to be fixed.> * else if the interval of vq.num is [2^15, 2^16): > the logic in the original patch (809ecb9bca6a9) suffices > * else (= less than 2^15) (optional): > checking only (vring_need_event(vq->last_used_event, new + vq->num, new) > would suffice. > > Am I missing something, or is this irrelevant?Looks not, I think this may work. Let me do some test. Thanks> I would appreciate if you could elaborate a bit more how the situation where > event idx wraps around and moves back would make trouble. > > Thanks. >
Possibly Parallel Threads
- [PATCH net] Revert "vhost: cache used event for better performance"
- [PATCH net] Revert "vhost: cache used event for better performance"
- [PATCH V2 net] Revert "vhost: cache used event for better performance"
- [PATCH V2 net] Revert "vhost: cache used event for better performance"
- [PATCH net] Revert "vhost: cache used event for better performance"