Si-Wei Liu
2022-Aug-16 23:14 UTC
[PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
On 8/16/2022 2:08 AM, Zhu, Lingshan wrote:> > > On 8/16/2022 3:58 PM, Si-Wei Liu wrote: >> >> >> On 8/15/2022 6:58 PM, Zhu, Lingshan wrote: >>> >>> >>> On 8/16/2022 7:32 AM, Si-Wei Liu wrote: >>>> >>>> >>>> On 8/15/2022 2:26 AM, Zhu Lingshan wrote: >>>>> Some fields of virtio-net device config space are >>>>> conditional on the feature bits, the spec says: >>>>> >>>>> "The mac address field always exists >>>>> (though is only valid if VIRTIO_NET_F_MAC is set)" >>>>> >>>>> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ >>>>> or VIRTIO_NET_F_RSS is set" >>>>> >>>>> "mtu only exists if VIRTIO_NET_F_MTU is set" >>>>> >>>>> so we should read MTU, MAC and MQ in the device config >>>>> space only when these feature bits are offered. >>>>> >>>>> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are >>>>> not set, the virtio device should have >>>>> one queue pair as default value, so when userspace querying queue >>>>> pair numbers, >>>>> it should return mq=1 than zero. >>>>> >>>>> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read >>>>> MTU from the device config sapce. >>>>> RFC894 <A Standard for the Transmission of IP Datagrams over >>>>> Ethernet Networks> >>>>> says:"The minimum length of the data field of a packet sent over an >>>>> Ethernet is 1500 octets, thus the maximum length of an IP datagram >>>>> sent over an Ethernet is 1500 octets.? Implementations are encouraged >>>>> to support full-length packets" >>>> Noted there's a typo in the above "The *maximum* length of the data >>>> field of a packet sent over an Ethernet is 1500 octets ..." and the >>>> RFC was written 1984. >>> the spec RFC894 says it is 1500, see <a >>> href="https://urldefense.com/v3/__https://www.rfc-editor.org/rfc/rfc894.txt__;!!ACWV5N9M2RV99hQ!MdgxZjw5sp5Qz-GKfwT1IWcw_L4Jo1-UekuJPFz1UrG3YuqirKz7P9ksdJFh1vB6zHJ7z8Q04fpT0-9jWXCtlWM$">https://urldefense.com/v3/__https://www.rfc-editor.org/rfc/rfc894.txt__;!!ACWV5N9M2RV99hQ!KVwfun0b1Q59Ajp6O7JrB-BuEBSLyQ9e95oGq1cVG_sQIPDL0whI5frx1EGoQFznmm67RsEeJTrUdfYrmZPRFaM$ >>> </a> >>>> >>>> Apparently that is no longer true with the introduction of Jumbo >>>> size frame later in the 2000s. I'm not sure what is the point of >>>> mention this ancient RFC. It doesn't say default MTU of any >>>> Ethernet NIC/switch should be 1500 in either? case. >>> This could be a larger number for sure, we are trying to find out >>> the min value for Ethernet here, to support 1500 octets, MTU should >>> be 1500 at least, so I assume 1500 should be the default value for MTU >>>> >>>>> >>>>> virtio spec says:"The virtio network device is a virtual ethernet >>>>> card", >>>> Right, >>>>> so the default MTU value should be 1500 for virtio-net. >>>> ... but it doesn't say the default is 1500. At least, not in >>>> explicit way. Why it can't be 1492 or even lower? In practice, if >>>> the network backend has a MTU higher than 1500, there's nothing >>>> wrong for guest to configure default MTU more than 1500. >>> same as above >>>> >>>>> >>>>> For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set, >>>>> the configuration space mac entry indicates the ?physical? address >>>>> of the network card, otherwise the driver would typically >>>>> generate a random local MAC address." So there is no >>>>> default MAC address if VIRTIO_NET_F_MAC not set. >>>>> >>>>> This commits introduces functions vdpa_dev_net_mtu_config_fill() >>>>> and vdpa_dev_net_mac_config_fill() to fill MTU and MAC. >>>>> It also fixes vdpa_dev_net_mq_config_fill() to report correct >>>>> MQ when _F_MQ is not present. >>>>> >>>>> These functions should check devices features than driver >>>>> features, and struct vdpa_device is not needed as a parameter >>>>> >>>>> The test & userspace tool output: >>>>> >>>>> Feature bit VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ >>>>> and VIRTIO_NET_F_MAC can be mask out by hardcode. >>>>> >>>>> However, it is challenging to "disable" the related fields >>>>> in the HW device config space, so let's just assume the values >>>>> are meaningless if the feature bits are not set. >>>>> >>>>> Before this change, when feature bits for RSS, MQ, MTU and MAC >>>>> are not set, iproute2 output: >>>>> $vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false mtu >>>>> 1500 >>>>> ?? negotiated_features >>>>> >>>>> without this commit, function vdpa_dev_net_config_fill() >>>>> reads all config space fields unconditionally, so let's >>>>> assume the MAC and MTU are meaningless, and it checks >>>>> MQ with driver_features, so we don't see max_vq_pairs. >>>>> >>>>> After applying this commit, when feature bits for >>>>> MQ, RSS, MAC and MTU are not set,iproute2 output: >>>>> $vdpa dev config show vdpa0 >>>>> vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500 >>>>> ?? negotiated_features >>>>> >>>>> As explained above: >>>>> Here is no MAC, because VIRTIO_NET_F_MAC is not set, >>>>> and there is no default value for MAC. It shows >>>>> max_vq_paris = 1 because even without MQ feature, >>>>> a functional virtio-net must have one queue pair. >>>>> mtu = 1500 is the default value as ethernet >>>>> required. >>>>> >>>>> This commit also add supplementary comments for >>>>> __virtio16_to_cpu(true, xxx) operations in >>>>> vdpa_dev_net_config_fill() and vdpa_fill_stats_rec() >>>>> >>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu at intel.com> >>>>> --- >>>>> ? drivers/vdpa/vdpa.c | 60 >>>>> +++++++++++++++++++++++++++++++++++---------- >>>>> ? 1 file changed, 47 insertions(+), 13 deletions(-) >>>>> >>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c >>>>> index efb55a06e961..a74660b98979 100644 >>>>> --- a/drivers/vdpa/vdpa.c >>>>> +++ b/drivers/vdpa/vdpa.c >>>>> @@ -801,19 +801,44 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct >>>>> sk_buff *msg, struct netlink_callba >>>>> ????? return msg->len; >>>>> ? } >>>>> ? -static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev, >>>>> -?????????????????????? struct sk_buff *msg, u64 features, >>>>> +static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64 >>>>> features, >>>>> ???????????????????????? const struct virtio_net_config *config) >>>>> ? { >>>>> ????? u16 val_u16; >>>>> ? -??? if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0) >>>>> -??????? return 0; >>>>> +??? if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 && >>>>> +??????? (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0) >>>>> +??????? val_u16 = 1; >>>>> +??? else >>>>> +??????? val_u16 = __virtio16_to_cpu(true, >>>>> config->max_virtqueue_pairs); >>>>> ? -??? val_u16 = le16_to_cpu(config->max_virtqueue_pairs); >>>>> ????? return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, >>>>> val_u16); >>>>> ? } >>>>> ? +static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, >>>>> u64 features, >>>>> +??????????????????? const struct virtio_net_config *config) >>>>> +{ >>>>> +??? u16 val_u16; >>>>> + >>>>> +??? if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0) >>>>> +??????? val_u16 = 1500; >>>> As said, there's no virtio spec defined value for MTU. Please leave >>>> this field out if feature VIRTIO_NET_F_MTU is not negotiated. >>> same as above >>>>> +??? else >>>>> +??????? val_u16 = __virtio16_to_cpu(true, config->mtu); >>>>> + >>>>> +??? return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16); >>>>> +} >>>>> + >>>>> +static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 >>>>> features, >>>>> +??????????????????? const struct virtio_net_config *config) >>>>> +{ >>>>> +??? if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0) >>>>> +??????? return 0; >>>>> +??? else >>>>> +??????? return? nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, >>>>> +??????????????? sizeof(config->mac), config->mac); >>>>> +} >>>>> + >>>>> + >>>>> ? static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, >>>>> struct sk_buff *msg) >>>>> ? { >>>>> ????? struct virtio_net_config config = {}; >>>>> @@ -822,18 +847,16 @@ static int vdpa_dev_net_config_fill(struct >>>>> vdpa_device *vdev, struct sk_buff *ms >>>>> ? ????? vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config)); >>>>> ? -??? if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, >>>>> sizeof(config.mac), >>>>> -??????????? config.mac)) >>>>> -??????? return -EMSGSIZE; >>>>> +??? /* >>>>> +???? * Assume little endian for now, userspace can tweak this for >>>>> +???? * legacy guest support. >>>> You can leave it as a TODO for kernel (vdpa core limitation), but >>>> AFAIK there's nothing userspace needs to do to infer the >>>> endianness. IMHO it's the kernel's job to provide an abstraction >>>> rather than rely on userspace guessing it. >>> we have discussed it in another thread, and this comment is >>> suggested by MST. >> Can you provide the context or link? It shouldn't work like this, >> otherwise it is breaking uABI. E.g. how will a legacy/BE supporting >> kernel/device be backward compatible with older vdpa tool (which has >> knowledge of this endianness implication/assumption from day one)? > https://urldefense.com/v3/__https://www.spinics.net/lists/netdev/msg837114.html__;!!ACWV5N9M2RV99hQ!KVwfun0b1Q59Ajp6O7JrB-BuEBSLyQ9e95oGq1cVG_sQIPDL0whI5frx1EGoQFznmm67RsEeJTrUdfYrGq7Vwjk$ > > The challenge is that the status filed is virtio16, not le16, so > le16_to_cpu(xxx) is wrong anyway. However we can not tell whether it > is a LE or BE device from struct vdpa_device, so for most cases, we > assume it is LE, and leave this comment.While the fix is fine, the comment is misleading in giving readers false hope. This is in vdpa_dev_net_config_fill() the vdpa tool query path, instead of calls from the VMM dealing with vhost/virtio plumbing specifics. I think what's missing today in vdpa core is the detection of guest type (legacy, transitional, or modern) regarding endianness through F_VERSION_1 and legacy interface access, the latter of which would need some assistance from VMM for sure. However, the presence of information via the vdpa tool query is totally orthogonal. I don't get a good reason for why it has to couple with endianness. How vdpa tool users space is supposed to tweak it? I don't get it... -Siwei> > Thanks >> >> -Siwei >> >>>> >>>>> +???? */ >>>>> +??? val_u16 = __virtio16_to_cpu(true, config.status); >>>>> ? ????? val_u16 = __virtio16_to_cpu(true, config.status); >>>>> ????? if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16)) >>>>> ????????? return -EMSGSIZE; >>>>> ? -??? val_u16 = __virtio16_to_cpu(true, config.mtu); >>>>> -??? if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16)) >>>>> -??????? return -EMSGSIZE; >>>>> - >>>>> ????? features_driver = vdev->config->get_driver_features(vdev); >>>>> ????? if (nla_put_u64_64bit(msg, >>>>> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver, >>>>> ??????????????????? VDPA_ATTR_PAD)) >>>>> @@ -846,7 +869,13 @@ static int vdpa_dev_net_config_fill(struct >>>>> vdpa_device *vdev, struct sk_buff *ms >>>>> ??????????????????? VDPA_ATTR_PAD)) >>>>> ????????? return -EMSGSIZE; >>>>> ? -??? return vdpa_dev_net_mq_config_fill(vdev, msg, >>>>> features_driver, &config); >>>>> +??? if (vdpa_dev_net_mac_config_fill(msg, features_device, &config)) >>>>> +??????? return -EMSGSIZE; >>>>> + >>>>> +??? if (vdpa_dev_net_mtu_config_fill(msg, features_device, &config)) >>>>> +??????? return -EMSGSIZE; >>>>> + >>>>> +??? return vdpa_dev_net_mq_config_fill(msg, features_device, >>>>> &config); >>>>> ? } >>>>> ? ? static int >>>>> @@ -914,6 +943,11 @@ static int vdpa_fill_stats_rec(struct >>>>> vdpa_device *vdev, struct sk_buff *msg, >>>>> ????? } >>>>> ????? vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config)); >>>>> ? +??? /* >>>>> +???? * Assume little endian for now, userspace can tweak this for >>>>> +???? * legacy guest support. >>>>> +???? */ >>>>> + >>>> Ditto. >>> same as above >>> >>> Thanks >>>> >>>> Thanks, >>>> -Siwei >>>>> ????? max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs); >>>>> ????? if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp)) >>>>> ????????? return -EMSGSIZE; >>>> >>> >> >