Steffen Weiberle
2006-Sep-15 23:48 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
I was able to get access to a system with bge interfaces, a v210. Installed nv47 on it and the bfu for crossbow.

I am consistently panicking the system when pinging the IP address of a second zone with its own stack. Both zones are using the same bge3 interface. I thought I had it working if bge3 had a real IP address on it, or if I pinged outside addresses first, but that turns out not to be the case.

Thanks
Steffen

I configure things via:

ifconfig bge3 plumb
ifconfig bge3 10.1.14.158 netmask + broadcast + up
dladm create-vnic -d bge3 -m 0:9:7:8:9:1 -b 1000 1
dladm create-vnic -d bge3 -m 0:9:7:8:9:2 -b 1000 2
dladm create-vnic -d bge3 -m 0:9:7:8:9:3 -b 1000 3
dladm show-vnic

zoneadm -z z1 boot
zoneadm -z z2 boot

Both zones have similar configs, here is z1:

# zonecfg -z z1 info
zonename: z1
zonepath: /export/zones/z1
autoboot: false
stacktype: exclusive
bootargs:
pool:
limitpriv:
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        physical: vnic1
        address: 10.1.14.151/26
        af not specified
        restrict not specified
Erik Nordmark
2006-Sep-16 01:16 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Steffen Weiberle wrote:
> I was able to get access to a system with bge interfaces, a v210.
> Installed nv47 on it and the bfu for crossbow.
>
> I am consistently panicking the system when pinging the IP address of a
> second zone with its own stack. Both zones are using the same bge3
> interface. I thought I had it working if bge3 had a real IP address on
> it, or if I pinged outside addresses first, but that turns out not to be
> the case.

Do you have a core file or at least a stack trace for the panic?

    Erik
Steffen Weiberle
2006-Sep-16 10:55 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
I have two core files that were collected (because I entered OK at the prompt). I can provide access info to the system as well.

Steffen

# /usr/bin/mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci sd pcisch ip sctp
mdb: ld.so.1: mdb: fatal: relocation error: file /usr/lib/mdb/kvm/sparcv9/arp.so: symbol mdb_mac_addr: referenced symbol not found
usba nca lofs zfs random ptm md cpc fcip fctl sppp nfs ]
> ::stack
vpanic(12b93a0, 7bb40bb0, 7bb40bd0, 3698, 237fffd8, 0)
assfail+0x7c(7bb40bb0, 7bb40bd0, 3698, 1854000, 12b9000, 0)
ip_input+0x188(600006a8ca8, 7bb40800, 0, 0, 0, 0)
i_dls_link_rx+0x1e0(60000319888, 3001f02a030, 0, 6000005b7a0, 2a100c43840, 2a100c43848)
mac_rx+0xe4(600003155d8, 3001f02a030, 0, 600003154d8, 7be5f08c, 0)
mac_soft_ring_drain+0xc8(60000315978, 60000ad5e00, 7b2e0e08, 1291800, fffe, 0)
mac_soft_ring_worker+0x68(1913800, 6000005b7a0, 0, 60000ad5e4c, 60000315978, 60000ad5e00)
thread_start+4(60000ad5e00, 0, 0, 0, 0, 0)
> ::status
debugging crash dump vmcore.0 (64-bit) from rahul
operating system: 5.11 crossbow_0806 (sun4u)
panic message: assertion failed: mp->b_datap->db_ref == 1, file: ../../common/inet/ip/ip.c, line: 13976
dump content: kernel pages only
>

Erik Nordmark wrote On 09/15/06 21:16,:
> Steffen Weiberle wrote:
>
>> I was able to get access to a system with bge interfaces, a v210.
>> Installed nv47 on it and the bfu for crossbow.
>>
>> I am consistently panicking the system when pinging the IP address of a
>> second zone with its own stack. Both zones are using the same bge3
>> interface. I thought I had it working if bge3 had a real IP address on
>> it, or if I pinged outside addresses first, but that turns out not to be
>> the case.
>
> Do you have a core file or at least a stack trace for the panic?
>
> Erik
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
Kais Belgaied
2006-Sep-18 04:55 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Steffen Weiberle wrote:
> I have two core files that were collected (because I entered OK at the
> prompt). I can provide access info to the system as well.

CR# 6471727 (ip.c line 13976 ASSERT(mp->b_datap->db_ref == 1) hit) has been filed to track this assertion failure.

Thanks Steffen for catching this.
Could you please provide a pointer to the cores offline?

Kais.

> Steffen
Sunay Tripathi
2006-Sep-18 08:42 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
This is interesting. Steffen, what driver are you using? All Nemo based drivers should have db_ref == 1, while things like ce should come in via ip_rput where a copymsg can be done.

So it seems like either we have a broken Nemo driver or something weird is going on with ce.

Cheers,
Sunay

> Steffen Weiberle wrote:
>
>> I have two core files that were collected (because I entered OK at the
>> prompt). I can provide access info to the system as well.
>
> CR# 6471727 (ip.c line 13976 ASSERT(mp->b_datap->db_ref == 1) hit)
> has been filed to track this assertion failure.
>
> Thanks Steffen for catching this.
> Could you please provide a pointer to the cores offline?
>
> Kais.

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
Erik Nordmark
2006-Sep-18 16:27 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Steffen Weiberle wrote:
> I was able to get access to a system with bge interfaces, a v210.
> Installed nv47 on it and the bfu for crossbow.
>
> I am consistently panicking the system when pinging the IP address of a
> second zone with its own stack. Both zones are using the same bge3
> interface. I thought I had it working if bge3 had a real IP address on
> it, or if I pinged outside addresses first, but that turns out not to be
> the case.

I looked at your core files and the stack trace is:

rahul# mdb -k *.0
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci sd pcisch ip sctp
mdb: ld.so.1: mdb: fatal: relocation error: file /usr/lib/mdb/kvm/sparcv9/arp.so: symbol mdb_mac_addr: referenced symbol not found
usba nca lofs zfs random ptm md cpc fcip fctl sppp nfs ]
> $c
vpanic(12b93a0, 7bb40bb0, 7bb40bd0, 3698, 237fffd8, 0)
assfail+0x7c(7bb40bb0, 7bb40bd0, 3698, 1854000, 12b9000, 0)
ip_input+0x188(600006a8ca8, 7bb40800, 0, 0, 0, 0)
i_dls_link_rx+0x1e0(60000319888, 3001f02a030, 0, 6000005b7a0, 2a100c43840, 2a100c43848)
mac_rx+0xe4(600003155d8, 3001f02a030, 0, 600003154d8, 7be5f08c, 0)
mac_soft_ring_drain+0xc8(60000315978, 60000ad5e00, 7b2e0e08, 1291800, fffe, 0)
mac_soft_ring_worker+0x68(1913800, 6000005b7a0, 0, 60000ad5e4c, 60000315978, 60000ad5e00)
thread_start+4(60000ad5e00, 0, 0, 0, 0, 0)
> 7bb40bb0/s
0x7bb40bb0:     mp->b_datap->db_ref == 1
>

Can somebody verify if this is a known bug?
    Erik

> Thanks
> Steffen
>
> I configure things via:
>
> ifconfig bge3 plumb
> ifconfig bge3 10.1.14.158 netmask + broadcast + up
> dladm create-vnic -d bge3 -m 0:9:7:8:9:1 -b 1000 1
> dladm create-vnic -d bge3 -m 0:9:7:8:9:2 -b 1000 2
> dladm create-vnic -d bge3 -m 0:9:7:8:9:3 -b 1000 3
> dladm show-vnic
>
> zoneadm -z z1 boot
> zoneadm -z z2 boot
>
> Both zones have similar configs, here is z1:
> # zonecfg -z z1 info
> zonename: z1
> zonepath: /export/zones/z1
> autoboot: false
> stacktype: exclusive
> bootargs:
> pool:
> limitpriv:
> inherit-pkg-dir:
>         dir: /lib
> inherit-pkg-dir:
>         dir: /platform
> inherit-pkg-dir:
>         dir: /sbin
> inherit-pkg-dir:
>         dir: /usr
> net:
>         physical: vnic1
>         address: 10.1.14.151/26
>         af not specified
>         restrict not specified
peter.memishian at sun.com
2006-Sep-18 18:11 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> This is interesting. Steffen, what driver are you using? All Nemo
> based drivers should have db_ref == 1 while things like ce should
> come in via ip_rput where a copymsg can be done.
>
> So it seems like either we have a broken Nemo driver or something
> weird is going on with ce.

I still don't understand the rationale for the 'db_ref == 1' assumption in ip_input(); it means that we must always use copymsg() instead of dupmsg() inside GLDv3 when passing the same message up multiple streams. Among other things, this leads to a substantial waste of memory and time when passing messages up passive streams (e.g., to snoop), since each message must be copied in full for no reason. Is all that worth it to save one comparison in ip_input()? (Did TSOL make ip_input() measurably slower when they added the is_system_labeled() check?)

--
meem
Thirumalai Srinivasan
2006-Sep-18 21:06 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
The idea is that ip_rput is the slower input path, and will deal with the case of copying messages if needed, and eventually call ip_input(). If snoop is being used, and if a nemo driver wants to use dupmsg() instead of copymsg(), then it can switch the inbound path to go through dld and ip_rput(). But currently, we don't use dupmsg() in i_dls_link_rx_promisc().

Also dupmsg() is not friendly to zero-copy. I can't get the bugids or recall the details (Erik may be able to recall this), but there were some deadlock issues with NFS zero-copy in conjunction with snoop, and using copymsg() instead was a solution.

Thirumalai

peter.memishian at sun.com wrote:
> I still don't understand the rationale for the 'db_ref == 1' assumption in
> ip_input(); it means that we must always use copymsg() instead of dupmsg()
> inside GLDv3 when passing the same message up multiple streams. Among
> other things, this leads to a substantial waste of memory and time when
> passing messages up passive streams (e.g., to snoop), since each message
> must be copied in full for no reason. Is all that worth it to save one
> comparison in ip_input()? (Did TSOL make ip_input() measurably slower
> when they added the is_system_labeled() check?)
>
> --
> meem
Steffen Weiberle
2006-Sep-18 21:21 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Any speculation whether this is due to crossbow bfu'ed code or code in b47? Would moving to 48 or back to 46 get around this? 46 is the oldest stuff I can get access to right now.

Thanks!

Erik Nordmark wrote On 09/18/06 12:27,:
> Steffen Weiberle wrote:
>
>> I was able to get access to a system with bge interfaces, a v210.
>> Installed nv47 on it and the bfu for crossbow.
>>
>> I am consistently panicking the system when pinging the IP address of a
>> second zone with its own stack. Both zones are using the same bge3
>> interface. I thought I had it working if bge3 had a real IP address on
>> it, or if I pinged outside addresses first, but that turns out not to be
>> the case.
>
> I looked at your core files and the stack trace is:
>
> rahul# mdb -k *.0
> Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci sd
> pcisch ip sctp
> mdb: ld.so.1: mdb: fatal: relocation error: file
> /usr/lib/mdb/kvm/sparcv9/arp.so: symbol mdb_mac_addr: referenced symbol
> not found
> usba nca lofs zfs random ptm md cpc fcip fctl sppp nfs ]
> > $c
> vpanic(12b93a0, 7bb40bb0, 7bb40bd0, 3698, 237fffd8, 0)
> assfail+0x7c(7bb40bb0, 7bb40bd0, 3698, 1854000, 12b9000, 0)
> ip_input+0x188(600006a8ca8, 7bb40800, 0, 0, 0, 0)
> i_dls_link_rx+0x1e0(60000319888, 3001f02a030, 0, 6000005b7a0, 2a100c43840, 2a100c43848)
> mac_rx+0xe4(600003155d8, 3001f02a030, 0, 600003154d8, 7be5f08c, 0)
> mac_soft_ring_drain+0xc8(60000315978, 60000ad5e00, 7b2e0e08, 1291800, fffe, 0)
> mac_soft_ring_worker+0x68(1913800, 6000005b7a0, 0, 60000ad5e4c, 60000315978, 60000ad5e00)
> thread_start+4(60000ad5e00, 0, 0, 0, 0, 0)
> > 7bb40bb0/s
> 0x7bb40bb0:     mp->b_datap->db_ref == 1
>
> Can somebody verify if this is a known bug?
>
> Erik
>
>> Thanks
>> Steffen
>>
>> I configure things via:
>>
>> ifconfig bge3 plumb
>> ifconfig bge3 10.1.14.158 netmask + broadcast + up
>> dladm create-vnic -d bge3 -m 0:9:7:8:9:1 -b 1000 1
>> dladm create-vnic -d bge3 -m 0:9:7:8:9:2 -b 1000 2
>> dladm create-vnic -d bge3 -m 0:9:7:8:9:3 -b 1000 3
>> dladm show-vnic
>>
>> zoneadm -z z1 boot
>> zoneadm -z z2 boot
>>
>> Both zones have similar configs, here is z1:
>> # zonecfg -z z1 info
>> zonename: z1
>> zonepath: /export/zones/z1
>> autoboot: false
>> stacktype: exclusive
>> bootargs:
>> pool:
>> limitpriv:
>> inherit-pkg-dir:
>>         dir: /lib
>> inherit-pkg-dir:
>>         dir: /platform
>> inherit-pkg-dir:
>>         dir: /sbin
>> inherit-pkg-dir:
>>         dir: /usr
>> net:
>>         physical: vnic1
>>         address: 10.1.14.151/26
>>         af not specified
>>         restrict not specified
Kais Belgaied
2006-Sep-18 23:22 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Erik Nordmark wrote On 09/18/06 09:27,:
> > 7bb40bb0/s
> 0x7bb40bb0:     mp->b_datap->db_ref == 1
>
> Can somebody verify if this is a known bug?

Not known in the crossbow-gate thus far, and a full text search in bugster reported no existing bug in stock onnv. The stack has functions that were introduced or significantly modified by Crossbow.

Kais.

> Erik
peter.memishian at Sun.COM
2006-Sep-19 06:18 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> The idea is that ip_rput is the slower input path, and will deal with
> the case of copying messages if needed, and eventually call ip_input().
> If snoop is being used, and if a nemo driver wants to use dupmsg()
> instead of copymsg(), then it can switch the inbound path to go through
> dld and ip_rput().

How would a driver switch to an alternate entrypoint? Further, do we really want that complexity in each driver, and all of the inconsistencies across drivers that come with it? Moreover, what does this have to do with snoop? Seems quite likely to me that an application that opens a DLPI device won't push any modules that modify the packet, and thus that we will still pay the memory and performance tax of needless copymsg() calls for those applications -- and all to save one comparison against a piece of memory that's already in-cache.

I also still want to know how much this questionable optimization actually saves us in the ideal (single-listener) case. Can we even measure it?

> Also dupmsg() is not friendly to zero-copy. I can't get the bugids or
> recall the details (Erik may be able to recall this), but there were
> some deadlock issues with NFS zero-copy in conjunction with snoop, and
> using copymsg() instead was a solution.

I'd like to hear more about this. Sounds related to the fact that snf_segmap() uses desballoca(), but that causes deadlocks in non-GLDv3 cases (e.g., see 6459866) and has to be changed.

--
meem
Thirumalai Srinivasan
2006-Sep-19 17:13 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> I also still want to know how much this questionable optimization actually
> saves us in the ideal (single-listener) case. Can we even measure it?

This is subjective and we can keep on arguing about this endlessly :) Yes, the time taken to execute each individual check is not measurable on a benchmark, but the aggregate of all these checks is measurable. The real solution is to rearchitect the data paths as a separate project.

> How would a driver switch to an alternate entrypoint? Further, do we
> really want that complexity in each driver, and all of the inconsistencies

Not the driver, but the nemo code. It is already doing that -- the switch from i_dls_link_rx to i_dls_link_rx_promisc etc. The mechanism of switching callbacks is already there. In any case the nemo code currently uses copymsg(); I should have said, "if the nemo DLS code wanted to use dupmsg() instead of copymsg()".

> across drivers that come with it? Moreover, what does this have to do
> with snoop? Seems quite likely to me that an application that opens a
> DLPI device won't push any modules that modify the packet, and thus that
> we will still pay the memory and performance tax of needless copymsg()
> calls for those applications -- and all to save one comparison against a
> piece of memory that's already in-cache.

Which is the common case that needs to be optimized? To me that is IP and ARP plumbed over a device, and not snoop or apps running on raw DLPI devices.

> > Also dupmsg() is not friendly to zero-copy. I can't get the bugids or
> > recall the details (Erik may be able to recall this), but there were
> > some deadlock issues with NFS zero-copy in conjunction with snoop, and
> > using copymsg() instead was a solution.
>
> I'd like to hear more about this. Sounds related to the fact that
> snf_segmap() uses desballoca(), but that causes deadlocks in non-GLDv3
> cases (e.g., see 6459866) and has to be changed.

I will let you know if I can dig it up, otherwise check with Erik or some NFS old timer.

Thirumalai

> --
> meem
peter.memishian at sun.com
2006-Sep-19 17:33 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> This is subjective and we can keep on arguing about this endlessly :)

The panic isn't subjective -- and enhancing Nemo to switch the callbacks around when the number of listener streams changes will undoubtedly make the code more complex. Complexity also ends up in the Clearview softmac driver, which has to ensure that messages passed upstream from legacy drivers have a db_ref of 1 (or do the copymsg()). Further, the memory overhead associated with copymsg() is certainly measurable -- as is its performance overhead. All of this to avoid one check in ip_input() does not seem a fair balance.

> > across drivers that come with it? Moreover, what does this have to do
> > with snoop? Seems quite likely to me that an application that opens a
> > DLPI device won't push any modules that modify the packet, and thus that
> > we will still pay the memory and performance tax of needless copymsg()
> > calls for those applications -- and all to save one comparison against a
> > piece of memory that's already in-cache.
>
> Which is the common case that needs to be optimized?

See above. I understand the notion of "a death by a thousand cuts", but I just don't see how the elimination of this one cut justifies the above.

--
meem
James Carlson
2006-Sep-19 17:50 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Peter.Memishian at Sun.COM writes:
> > This is subjective and we can keep on arguing about this endlessly :)
>
> The panic isn't subjective -- and enhancing Nemo to switch the callbacks
> around when the number of listener streams changes will undoubtedly make
> the code more complex. Complexity also ends up in the Clearview softmac
> driver, which has to ensure that messages passed upstream from legacy
> drivers have a db_ref of 1 (or do the copymsg()). Further, the memory
> overhead associated with copymsg() is certainly measurable -- as is its
> performance overhead. All of this to avoid one check in ip_input() does
> not seem a fair balance.

Indeed. If we need to optimize the heck out of it, we could take the same tack as DBLK_RTFU_WORD() and do something like:

#ifdef _BIG_ENDIAN
#define DB_REF_1_VAL ((1<<8) | M_DATA)
#else
#define DB_REF_1_VAL (1 | (M_DATA<<8))
#endif
#define DBLK_DATA_REF_1(dbp) (*(uint16_t *)&dbp->db_ref == DB_REF_1_VAL)

That way, you can force all the non-data and non-ref-1 blocks into the slow path right off the bat in a single test.

Because the chain mblk support is still internal, you could also redefine that interface so that only the first buffer in a chain needs to be tested: require the caller to place any dupb'd or non-M_DATA blocks at the head of independent chains.

I think there are potentially quite a few performance-related improvements that could be made here without making the code brittle.

--
James Carlson, KISS Network                    <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Thirumalai Srinivasan
2006-Sep-19 18:01 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Peter.Memishian at sun.com wrote:
> > This is subjective and we can keep on arguing about this endlessly :)
>
> The panic isn't subjective -- and enhancing Nemo to switch the callbacks

I didn't think we were arguing about the panic. If you are referring to 6471727, and if you have the root cause, please share it with us.

> around when the number of listener streams changes will undoubtedly make
> the code more complex. Complexity also ends up in the Clearview softmac
> driver, which has to ensure that messages passed upstream from legacy

Why is the "complexity" of checking for db_ref of 1 and potentially doing a copymsg() greater when done in the softmac than when done in IP?

> drivers have a db_ref of 1 (or do the copymsg()). Further, the memory
> overhead associated with copymsg() is certainly measurable -- as is its
> performance overhead. All of this to avoid one check in ip_input() does
> not seem a fair balance.

Agreed, the copymsg overhead is substantial, but to me that is not the important case; if the team feels otherwise, I defer to the team's opinion.

Thirumalai
peter.memishian at sun.com
2006-Sep-19 18:34 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> > The panic isn't subjective -- and enhancing Nemo to switch the callbacks
>
> I didn't think we were arguing about the panic. If you are referring to
> 6471727, and if you have the root cause, please share it with us.

I'm simply saying that the code does not cause only subjective problems.

> > Complexity also ends up in the Clearview softmac
> > driver, which has to ensure that messages passed upstream from legacy
>
> Why is the "complexity" of checking for db_ref of 1 and potentially doing
> a copymsg() greater when done in the softmac than when done in IP?

It isn't itself more complex, but putting the check in IP addresses everyone's needs; softmac can only handle the check for legacy drivers. It also means that one of the undocumented requirements for polling mode ends up in softmac. That said, I don't think this is a major issue -- I was pointing it out mainly to illustrate that the problem isn't localized.

> > drivers have a db_ref of 1 (or do the copymsg()). Further, the memory
> > overhead associated with copymsg() is certainly measurable -- as is its
> > performance overhead. All of this to avoid one check in ip_input() does
> > not seem a fair balance.
>
> Agreed, the copymsg overhead is substantial, but to me that is not the
> important case; if the team feels otherwise, I defer to the team's
> opinion.

Do you have any thoughts on Jim's proposed approach?

--
meem
Thirumalai Srinivasan
2006-Sep-19 19:33 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
I agree, these are possible optimizations. But there is the other dimension where, instead of funneling all packets through a single entry point and doing these optimized checks serially, we may be able to send them through parallel paths (at least in some cases). For example, after packet classification at the lowest level we may be able to bypass a lot of IP processing for TCP or UDP packets that satisfy some constraints, and with some assumptions (such as db_ref, no snoop etc.).

Thirumalai

James Carlson wrote:
> Indeed. If we need to optimize the heck out of it, we could take the
> same tack as DBLK_RTFU_WORD() and do something like:
>
> #ifdef _BIG_ENDIAN
> #define DB_REF_1_VAL ((1<<8) | M_DATA)
> #else
> #define DB_REF_1_VAL (1 | (M_DATA<<8))
> #endif
> #define DBLK_DATA_REF_1(dbp) (*(uint16_t *)&dbp->db_ref == DB_REF_1_VAL)
>
> That way, you can force all the non-data and non-ref-1 blocks into the
> slow path right off the bat in a single test.
>
> Because the chain mblk support is still internal, you could also
> redefine that interface so that only the first buffer in a chain needs
> to be tested: require the caller to place any dupb'd or non-M_DATA
> blocks at the head of independent chains.
>
> I think there are potentially quite a few performance-related
> improvements that could be made here without making the code brittle.
Thirumalai Srinivasan
2006-Sep-19 20:15 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Peter.Memishian at sun.com wrote:
> It isn't itself more complex, but putting the check in IP addresses
> everyone's needs; softmac can only handle the check for legacy drivers.
> It also means that one of the undocumented requirements for polling mode
> ends up in softmac. That said, I don't think this is a major issue -- I
> was pointing it out mainly to illustrate that the problem isn't localized.

OK.

> Do you have any thoughts on Jim's proposed approach?

Yes, I just responded to Jim's email.

Thirumalai
James Carlson
2006-Sep-19 20:24 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Thirumalai Srinivasan writes:
> I agree, these are possible optimizations. But there is the other
> dimension where instead of funneling all packets through a single entry
> point, and doing these optimized checks serially,

The idea here was to avoid the serial checks in the case where you get "expected" data. That way, you don't encounter the complexity of dealing with multiple entry points and the problems that these fob off onto the callers, and you don't get reduced performance in the cases you care about.

> we may be able to send them through parallel paths
> (at least in some cases).

Parallel paths have their own costs. In particular, they increase I$ pressure, which is likely a non-trivial issue in our one-packet-run-to-completion-block-or-queue processing model.

> For eg. after packet classification at the lowest level
> we may be able to bypass a lot of IP processing for TCP or UDP packets
> that satisfy some constraints, and with some assumptions (such as db_ref,
> no snoop etc..)

Yes. But if you can do those with a minimum of push-back on the caller, you've got something good.

--
James Carlson, KISS Network                    <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Erik Nordmark
2006-Sep-20 02:07 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Peter.Memishian at sun.com wrote:
> Further, the memory
> overhead associated with copymsg() is certainly measurable -- as is its
> performance overhead. All of this to avoid one check in ip_input() does
> not seem a fair balance.

I must be missing something.

What is the difference in memory overhead between GLD doing a copymsg, and GLD doing a dupmsg followed by ip_rput/ip_input/whatever checking db_ref > 1 and doing a copymsg?

Erik
Thirumalai Srinivasan
2006-Sep-20 02:42 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Assuming you have many snoops running, only IP has to do a copymsg, whereas if GLD did it, it would have to copymsg() on all streams.

But this is really a moot point. Only IP enables direct callback in the input path bypassing DLD (through a capability negotiation). Callbacks into other consumers will be directed through DLD. So GLD can potentially figure out when and where to do the copymsg.

Thirumalai

Erik Nordmark wrote:
> I must be missing something.
>
> What is the difference in memory overhead between GLD doing a copymsg,
> and GLD doing a dupmsg followed by ip_rput/ip_input/whatever checking
> db_ref > 1 and doing a copymsg?
>
> Erik
Mike Ditto
2006-Sep-20 02:46 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
Erik Nordmark wrote:
> What is the difference in memory overhead between GLD doing a copymsg,
> and GLD doing a dupmsg followed by ip_rput/ip_input/whatever checking
> db_ref > 1 and doing a copymsg?

There is a chance (perhaps infinitesimal) that one of the dup'd-message consumers will dispose of the message before the other consumer sees it. There is a chance (perhaps infinitesimal) that the IP code will get smarter someday and not need to copy every packet.

I've bumped into the IP restriction twice in recent months, once while writing crossbow/gld code and again with the bridge/loopback code. It seems to me that IP is imposing a burden on lower layers that should be handled in IP so that it's localized. Whatever design quirk in IP causes it to need its own copy of packets should be handled above the data link interface so that it could be fixed someday entirely within the IP implementation. If IP can distinguish some packets that won't need to be copied, even if it's only odd cases like packets to be dropped for some reason, and do the copy only when it's actually needed, it could make a significant difference in some cases.

I think the VNIC code has changed quite a bit since I looked at it, but it seemed at the time that the requirement, imposed by IP, to copy instead of dup certain packets was likely to have a negative performance impact.

-=] Mike [=-
peter.memishian at sun.com
2006-Sep-20 04:32 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> > Further, the memory
> > overhead associated with copymsg() is certainly measurable -- as is its
> > performance overhead. All of this to avoid one check in ip_input() does
> > not seem a fair balance.
>
> I must be missing something.
>
> What is the difference in memory overhead between GLD doing a copymsg,
> and GLD doing a dupmsg followed by ip_rput/ip_input/whatever checking
> db_ref > 1 and doing a copymsg?

For example, imagine you run a snoop on a machine running DHCP. GLD will create three copies -- one for the IP stream, one for the DHCP stream[1], and one for the snoop stream. Only two copies were needed. This case is not unique -- any time you have more than one read-only DLPI consumer, we end up consuming needless memory. There's also the transmit loopback path, which currently creates a copy but doesn't need to AFAIK.

In addition, as Mike has pointed out, it's not clear why IP always needs to make its own copy. By making the copy inside IP, we can further optimize it in the future -- but if it spreads to IP's consumers, then it's harder to change.

[1] Yes, we'd like the DHCP client to not use DLPI -- and we're working on that as part of Clearview. But unless we're going to ban the use of DLPI by multiple clients, that use shouldn't impose a needless memory tax.

--
meem
Erik Nordmark
2006-Sep-22 03:05 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
peter.memishian at sun.com wrote:
> For example, imagine you run a snoop on a machine running DHCP. GLD will
> create three copies -- one for the IP stream, one for the DHCP stream[1],
> and one for the snoop stream. Only two copies were needed. This case is
> not unique -- any time you have more than one read-only DLPI consumer, we
> end up consuming needless memory. There's also the transmit loopback
> path, which currently creates a copy but doesn't need to AFAIK.

I guess I'd suppressed the state of the DHCP client.

But does the DHCP client really hold a DLPI stream attached and bound even when it is not waiting for a response from the DHCP servers? That seems suboptimal even if GLD does a dupmsg; the dupmsg + freemsg aren't free, even if they are less expensive than copymsg + freemsg.

Do we know for sure that the use of dupmsg in GLD doesn't make IP do a copymsg in any case? I would think IP would see db_ref = 2 when GLD does a dupmsg, hence it would copy.

> In addition, as Mike has pointed out, it's not clear why IP always needs
> to make its own copy. By making the copy inside IP, we can further
> optimize it in the future -- but if it spreads to IP's consumers, then
> it's harder to change.

I think we had attempts at doing it selectively in the past. But there are lots of places where a packet is modified (IP option processing, ICMP handling, raw sockets), plus we don't know for sure if kernel code above TCP has been built with the assumption that the packet can be modified.

We can try to change this once the IP datapaths are in better shape, as long as the kernel code above TCP doesn't make bad assumptions.

Erik
peter.memishian at sun.com
2006-Sep-22 08:36 UTC
[crossbow-discuss] panicing b47 pinging between exclusive zones
> But does the DHCP client really hold a DLPI stream attached and bound
> even when it is not waiting for a response from the DHCP servers?
> That seems suboptimal even if GLD does a dupmsg; the dupmsg + freemsg
> aren't free even if they are less expensive than copymsg + freemsg.

Yes it does, and yes it is; see 4863327.

> Do we know for sure that the use of dupmsg in GLD doesn't make IP do a
> copymsg in any case? I would think IP would see db_ref = 2 when GLD does
> a dupmsg, hence it would copy.

It will copy -- my point was that there will only be two copies (instead of three) if snoop (or another DLPI application) is run.

> > In addition, as Mike has pointed out, it's not clear why IP always needs
> > to make its own copy. By making the copy inside IP, we can further
> > optimize it in the future -- but if it spreads to IP's consumers, then
> > it's harder to change.
>
> I think we had attempts at doing it selectively in the past.
> But there are lots of places where a packet is modified (IP option
> processing, ICMP handling, raw sockets), plus we don't know for sure if
> kernel code above TCP has been built with the assumption that the packet
> can be modified.

Sendfile causes dblk_t's to flow through TCP that cannot be modified, no? In any case, it will still be harder to optimize this in the future if the limitation becomes part of IP's interface, rather than only part of its implementation.

--
meem
David Edmondson
2006-Sep-22 09:05 UTC
[crossbow-discuss] Re: panicing b47 pinging between exclusive zones
* Peter.Memishian at Sun.COM [2006-09-22 09:36:32]
> Sendfile causes dblk_t's to flow through TCP that cannot be
> modified, no?

Isn't it only the (bottom of the) receive path which modifies the packets?

The Xen dom0 network driver also suffers due to the packets being modified: by default, the pages containing packets passed from domU are mapped read-only in dom0.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org