Hi,

I've been searching around on the Internet to find some help with this, but have been unsuccessful so far.

I have some performance issues with my file server. It is an OpenSolaris server with a Pentium D 3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1.5TB SATA drives.

If I compile, or even just unpack a tar.gz archive with source code (or any archive with lots of small files), from my Linux client onto an NFS-mounted disk on the OpenSolaris server, it's extremely slow compared to unpacking the same archive locally on the server. A 22MB .tar.gz file containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS. Unpacking the same file locally on the server takes just under 2 seconds.

Between the server and client I have a gigabit network, which at the time of testing had no other significant load. My NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".

Any suggestions as to why this is?

Regards,
Sigbjorn
That's because NFS adds synchronous writes to the mix (e.g. the client needs to know that certain transactions made it to nonvolatile storage in case the server restarts, etc.). The simplest safe solution, although not cheap, is to add an SSD log device to the pool.

On 23 Jul 2010, at 08:11, "Sigbjorn Lie" <sigbjorn at nixtra.com> wrote:
> [original message quoted in full; snipped]
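A minimal sketch of what adding a log device looks like (assuming a pool named "tank" and a spare SSD at c1t5d0 - your pool and device names will differ):

    zpool add tank log c1t5d0
    zpool status tank

The second command should then show the SSD under a separate "logs" section of the pool.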
On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie <sigbjorn at nixtra.com> wrote:
> [original message quoted in full; snipped]

As someone else said, adding an SSD log device can help hugely. I saw about a 500% NFS write increase by doing this. I've heard of people getting even more.
Thomas Burgess wrote:
> [original problem report snipped]
>
> as someone else said, adding an ssd log device can help hugely. I saw
> about a 500% nfs write increase by doing this.
> I've heard of people getting even more.

Another option, if you don't care quite so much about data security in the event of an unexpected system outage, would be to use Robert Milkowski and Neil Perrin's zil synchronicity [PSARC/2010/108] changes with sync=disabled, once the changes work their way into an available build.

The risk is that if the file server goes down unexpectedly, it might come back up having lost some seconds' worth of changes which it had told the client (lied) were committed to disk when they weren't, and this violates the NFS protocol. That might be OK if you are using it to hold source that's being built, where you can kick off the build again if the server did go down in the middle of it. It wouldn't be a good idea for some other applications, though (although Linux ran this way for many years, seemingly without many complaints).

Note that there's no increased risk of the zpool going bad - it's just that after the reboot, filesystems with sync=disabled will look like they were rewound by some seconds (possibly up to 30 seconds).

--
Andrew Gabriel
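For reference, once a build containing PSARC/2010/108 is available, this is a per-dataset property (a sketch, assuming the NFS-exported dataset is called tank/export - adjust for your layout):

    zfs set sync=disabled tank/export
    zfs get sync tank/export

Unlike the older system-wide ZIL tunable, this can be limited to just the dataset holding disposable data such as build trees.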
I agree, I get appalling NFS speeds compared to CIFS/Samba - i.e. CIFS/Samba of 95-105MB/s and NFS of 5-20MB/s.

Not to hijack the thread, but I assume an SSD ZIL will similarly improve an iSCSI target, as I am getting 2-5MB/s on that too.
I see I have already received several replies - thanks to all!

I would not like to risk losing any data, so I believe a ZIL device would be the way to go for me. I see these exist at different prices. Any reason why I should not buy a cheap one, like the Intel X25-V SSD 40GB 2.5"?

What size of ZIL device would be recommended for my pool consisting of 4 x 1.5TB drives? Any brands I should stay away from?

Regards,
Sigbjorn

On Fri, July 23, 2010 09:48, Phil Harman wrote:
> That's because NFS adds synchronous writes to the mix (e.g. the client needs to know that certain
> transactions made it to nonvolatile storage in case the server restarts, etc.). The simplest safe
> solution, although not cheap, is to add an SSD log device to the pool.
>
> [rest of quoted thread snipped]
On 23 Jul 2010, at 09:18, Andrew Gabriel <Andrew.Gabriel at oracle.com> wrote:
> [earlier thread snipped]
>
> Another option, if you don't care quite so much about data security in the event of an unexpected
> system outage, would be to use Robert Milkowski and Neil Perrin's zil synchronicity
> [PSARC/2010/108] changes with sync=disabled [...] The risk is that if the file server goes down
> unexpectedly, it might come back up having lost some seconds' worth of changes which it had told
> the client (lied) were committed to disk when they weren't, and this violates the NFS protocol.
> That might be OK if you are using it to hold source that's being built, where you can kick off
> the build again if the server did go down in the middle of it.

That's assuming you know it happened and that you need to restart the build (ideally with a make clean). All the NFS client knows is that the NFS server went away for some time; it still assumes nothing was lost. I can imagine cases where the build might continue to completion but with partially corrupted files. It's unlikely, but conceivable. Of course, databases like dbm, MySQL or Oracle would go blithely on up the swanee with silent data corruption.

The fact that people run unsafe systems seemingly without complaint for years assumes that they know silent data corruption when they see^H^H^Hhear it ... which, of course, they didn't ... because it is silent ... or, having encountered corrupted data, that they have the faintest idea where it came from. In my day-to-day work I still find many people who have been (apparently) very lucky.

Feel free to play fast and loose with your own data, but I won't with mine, thanks! ;)
On Fri, July 23, 2010 10:42, tomwaters wrote:
> I agree, I get appalling NFS speeds compared to CIFS/Samba - i.e. CIFS/Samba of 95-105MB/s and NFS
> of 5-20MB/s.
>
> Not to hijack the thread, but I assume an SSD ZIL will similarly improve an iSCSI target, as I am
> getting 2-5MB/s on that too.

These are exactly the numbers I'm getting as well.

What's the reason for such a low rate when using iSCSI?
Sent from my iPhone

On 23 Jul 2010, at 09:42, tomwaters <tomwaters at chadmail.com> wrote:
> I agree, I get appalling NFS speeds compared to CIFS/Samba - i.e. CIFS/Samba of 95-105MB/s and NFS of 5-20MB/s.
>
> Not to hijack the thread, but I assume an SSD ZIL will similarly improve an iSCSI target, as I am getting 2-5MB/s on that too.

Yes, it generally will. I've seen some huge improvements with iSCSI, but YMMV depending on your config, application and workload.
On 23/07/2010 10:02, Sigbjorn Lie wrote:
> On Fri, July 23, 2010 10:42, tomwaters wrote:
>> Not to hijack the thread, but I assume an SSD ZIL will similarly improve an iSCSI target, as I am
>> getting 2-5MB/s on that too.
>
> These are exactly the numbers I'm getting as well.
>
> What's the reason for such a low rate when using iSCSI?

The filesystem or application using the iSCSI target may be requesting regular cache flushes. These will require synchronous writes to disk. An SSD doesn't remove the sync writes; it just makes them a lot faster. Other sensible storage servers typically use NVRAM caches to solve this problem. Others just play fast and loose with your data.
On Fri, Jul 23, 2010 at 5:00 AM, Sigbjorn Lie <sigbjorn at nixtra.com> wrote:
> I would not like to risk losing any data, so I believe a ZIL device would be the way to go for me.
> I see these exist at different prices. Any reason why I should not buy a cheap one, like the
> Intel X25-V SSD 40GB 2.5"?
>
> What size of ZIL device would be recommended for my pool consisting of 4 x 1.5TB drives? Any
> brands I should stay away from?

Like I said, I bought a 50 GB OCZ Vertex Limited Edition. It's like 200 dollars, up to 15,000 random IOPS (IOPS is what you want for a fast ZIL).

I've gotten excellent performance out of it.
On Fri, July 23, 2010 11:21, Thomas Burgess wrote:
> Like I said, I bought a 50 GB OCZ Vertex Limited Edition. It's like 200 dollars, up to 15,000
> random IOPS (IOPS is what you want for a fast ZIL).
>
> I've gotten excellent performance out of it.

The X25-V has up to 25k random read IOPS and up to 2.5k random write IOPS, so that would seem okay for approx $80. :)

What about mirroring? Do I need mirrored ZIL devices in case of a power outage?
On 23/07/2010 10:53, Sigbjorn Lie wrote:
> The X25-V has up to 25k random read IOPS and up to 2.5k random write IOPS, so that would seem
> okay for approx $80. :)
>
> What about mirroring? Do I need mirrored ZIL devices in case of a power outage?

Note there is no such thing as a "ZIL device"; there is a slog device. Every pool has one or more ZILs; it may or may not have a slog device used to hold ZIL contents. Whether a ZIL is on the slog or not depends on a lot of factors, including the logbias property.

You don't need to mirror the slog device to protect against a power outage. You need to mirror the slog if you want to protect against losing synchronous writes (but not pool consistency on disk) on a power outage *and* failure of your slog device at the same time (i.e. a double fault).

--
Darren J Moffat
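To make the terminology concrete, a hedged sketch (pool, dataset and device names are made up): a mirrored slog is added with

    zpool add tank log mirror c1t5d0 c1t6d0

and the logbias property is per dataset, e.g.

    zfs set logbias=throughput tank/db

which tells ZFS to bypass the slog for that dataset's ZIL traffic and write its log blocks to the main pool instead; the default, logbias=latency, uses the slog if one is present.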
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Phil Harman
>
> The fact that people run unsafe systems seemingly without complaint for
> years assumes that they know silent data corruption when they
> see^H^H^Hhear it ... which, of course, they didn't ... because it is
> silent ... In my day-to-day work I still find many people who have been
> (apparently) very lucky.

Running with sync disabled, or the ZIL disabled, you could call "unsafe" if you want to use a generalization and a stereotype.

Just like people say "writeback" is unsafe. If you apply a little more intelligence, you'll know it's safe in some conditions and not in others. For example, if you have a BBU, you can use your writeback cache safely. And if you're not sharing stuff across the network, you're guaranteed the disabled ZIL is safe. Even when you are sharing stuff across the network, the disabled ZIL can still be safe under the following condition: if you are only doing file sharing (NFS, CIFS) and you are willing to reboot/remount all of your clients after an ungraceful shutdown of your server, then it's safe to run with the ZIL disabled.

If you're unsure, then adding an SSD nonvolatile log device, as people have said, is the way to go.
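For completeness, on builds that pre-date the per-dataset sync property, what people usually mean by "running with the ZIL disabled" is the system-wide zil_disable tunable - a sketch, not a recommendation:

    # in /etc/system, followed by a reboot
    set zfs:zil_disable = 1

This applies to every pool and dataset on the host, so the caveats above apply everywhere at once.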
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Sigbjorn Lie
>
> What size of ZIL device would be recommended for my pool consisting of

Get the smallest one. Even an unrealistically high-performance scenario cannot come close to using 32G. I am sure you'll never reach even 4G of usage.
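A rough back-of-the-envelope to show why (assumptions: the slog only ever needs to hold synchronous writes that have not yet been committed to the main pool, and transaction groups flush at most every ~30 seconds on builds of this era): a fully saturated gigabit link is about 125 MB/s, and 125 MB/s x 30 s is under 4 GB, which is the absolute worst case for this setup. A realistic small-file NFS workload will use far less.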
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Sigbjorn Lie
>
> What about mirroring? Do I need mirrored ZIL devices in case of a power
> outage?

You don't need mirroring for the sake of a *power outage*, but you *do* need mirroring for the sake of preventing data loss when one of the SSD devices fails. There is some gray area here:

If you have zpool < 19, then you do not have "log device removal", which means you lose your whole zpool in the event of a failed unmirrored log device. (Techniques exist to recover, but it's not always easy.)

If you have zpool >= 19, then the danger is much smaller. If you have a failed unmirrored log device, and the failure is detected, then the log device is simply marked "failed", the system slows down, and everything is fine. But if you have an undetected failure *and* an ungraceful reboot (which is more likely than it seems), then you risk up to 30 seconds of data that was intended to be written immediately before the crash.

None of that is a concern if you have a mirrored log device.
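A quick sketch of how to check where you stand (again assuming a pool named "tank"; the device path is made up):

    zpool get version tank      # 19 or later means log devices can be removed
    zpool remove tank c1t5d0    # removes a (possibly failed) log device on v19+

On pools older than version 19, zpool remove will refuse to remove a log device, which is exactly the gray area described above.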
Edward Ned Harvey wrote:
> If you are only doing file sharing (NFS, CIFS) and you are willing to
> reboot/remount all of your clients after an ungraceful shutdown of your
> server, then it's safe to run with the ZIL disabled.

No, that's not safe. The client can still lose up to 30 seconds of data, which could be, for example, an email message which is received and foldered on the server, and is then lost. It's probably *safe enough* for most home users, but you should be fully aware of the potential implications before embarking on this route.

(As I said before, the zpool itself is not at any additional risk of corruption; it's just that you might find the zfs filesystems with sync=disabled appear to have been rewound by up to 30 seconds.)

> If you're unsure, then adding an SSD nonvolatile log device, as people have
> said, is the way to go.

--
Andrew Gabriel
Phil Harman wrote:
>> Not to hijack the thread, but I assume an SSD ZIL will similarly improve
>> an iSCSI target, as I am getting 2-5MB/s on that too.
>
> Yes, it generally will. I've seen some huge improvements with iSCSI,
> but YMMV depending on your config, application and workload.

Sorry this isn't completely ZFS-related, but with all this expert storage knowledge here...

On a related note - all other things being equal, is there any reason to choose NFS over iSCSI, or vice versa? I'm currently looking at this decision. We have a NetApp (I wish it were a ZFS-based appliance!) and need to remotely mount a filesystem from it. It will share the filesystem either as NFS or iSCSI.

Some of my colleagues say it would be better to use NFS. Their reasoning is basically: "That's the way it's always been done." I'm leaning towards iSCSI. My reasoning is that it removes a whole extra layer of complexity - as I understand it, the remote client just treats the remote mount like any other physical device. And I've had MAJOR headaches over the years fixing/tweaking NFS. Even though version 4 seems better, I'd still rather bypass it completely. I believe in the "keep it simple, stupid" philosophy.

I do realize that NFS is probably better for remote filesystems that have multiple simultaneous users, but we won't be doing that in this case.

Any major arguments for/against one over the other? Thanks for any suggestions.

Doug Linder
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Linder, Doug
>
> On a related note - all other things being equal, is there any reason
> to choose NFS over iSCSI, or vice versa? I'm currently looking at this

iSCSI and NFS are completely different technologies. If you use iSCSI, then the initiators (clients) are the things which format and control the filesystem, so the limitations of the filesystem are determined by whichever clustering filesystem you've chosen to implement. It probably won't do snapshots and so forth. Although the ZFS filesystem could make a snapshot, it wouldn't be automatically mounted or made available without the clients doing explicit mounts.

With NFS, the filesystem is formatted and controlled by the server. Both WAFL and ZFS do some pretty good things with snapshotting, and with making snapshots available to users without any effort.
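As a concrete (hypothetical) illustration of the NFS side on a ZFS server - dataset names invented:

    zfs set sharenfs=on tank/home
    zfs snapshot tank/home@before-upgrade

Clients with tank/home mounted can then typically browse the snapshot read-only under .zfs/snapshot/before-upgrade inside the mount, with no extra administration on the client side. With iSCSI, the equivalent zvol snapshot exists on the server but is just another block device that a client would have to be pointed at and mount explicitly.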
Fundamentally, my recommendation is to choose NFS if your clients can use it. You'll get a lot of potential advantages from the NFS/ZFS integration, and so better performance. Plus you can serve multiple clients, etc.

The only reason to use iSCSI is when you don't have a choice, IMO. You should only use iSCSI with a single initiator at any point in time unless you have some higher-level contention management in place.

- Garrett

On Fri, 2010-07-23 at 22:20 -0400, Edward Ned Harvey wrote:
> [previous message snipped]
> From: Garrett D'Amore [mailto:garrett at nexenta.com]
>
> The only reason to use iSCSI is when you don't have a choice, IMO. You
> should only use iSCSI with a single initiator at any point in time
> unless you have some higher level contention management in place.

So ... you don't think filesystems like GFS etc. should ever be used?
On Sat, 2010-07-24 at 19:54 -0400, Edward Ned Harvey wrote:
> So ... you don't think filesystems like GFS etc. should ever be used?

GFS provides such higher-level contention management. I can't speak for it myself, but my gut reaction is that unless you have a need for the features of GFS, you are probably better served by NFS. Running a more traditional filesystem (one that does not allow concurrent block device access) with multiple initiators is almost certainly a bad idea unless you have special needs.

- Garrett
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Sigbjorn Lie
>>
>> What about mirroring? Do I need mirrored ZIL devices in case of a power
>> outage?
>
> You don't need mirroring for the sake of a *power outage*, but you *do* need
> mirroring for the sake of preventing data loss when one of the SSD devices
> fails.
>
> [rest snipped]

Ah, I see! Thanks.
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Garrett D'Amore
> Sent: Friday, July 23, 2010 11:46 PM
>
> Fundamentally, my recommendation is to choose NFS if your clients can
> use it. You'll get a lot of potential advantages from the NFS/ZFS
> integration, and so better performance. Plus you can serve multiple
> clients, etc.
>
> The only reason to use iSCSI is when you don't have a choice, IMO. You
> should only use iSCSI with a single initiator at any point in time
> unless you have some higher level contention management in place.
>
> - Garrett

I think there may be a very good reason to use iSCSI: if you're limited to gigabit but need to be able to handle higher throughput for a single client. I may be wrong, but I believe iSCSI to/from a single initiator can take advantage of multiple links in an active-active multipath scenario, whereas NFS is only going to be able to take advantage of one link (at least until pNFS).

-Will
On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> I think there may be a very good reason to use iSCSI: if you're limited
> to gigabit but need to be able to handle higher throughput for a
> single client. I may be wrong, but I believe iSCSI to/from a single
> initiator can take advantage of multiple links in an active-active
> multipath scenario, whereas NFS is only going to be able to take
> advantage of one link (at least until pNFS).

There are other ways to get multiple paths. First off, there is IP multipathing (IPMP), which offers some of this at the IP layer. There is also 802.3ad link aggregation (trunking). So you can still get high performance beyond a single link with NFS. (It works with iSCSI too, btw.)

-- Garrett
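For reference, a hedged sketch of the link-aggregation side on an OpenSolaris-era box (interface names are examples; Solaris 10 uses -d and a numeric key instead of -l and a name):

    dladm create-aggr -P L4 -l e1000g0 -l e1000g1 aggr0

The -P L4 policy hashes outbound traffic on the layer-4 headers, which matters for the point made in the next message: each TCP connection still rides a single physical link, so load only spreads when there are several connections to hash.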
On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore <garrett at nexenta.com> wrote:
> There are other ways to get multiple paths. First off, there is IP
> multipathing (IPMP), which offers some of this at the IP layer. There is
> also 802.3ad link aggregation (trunking). So you can still get high
> performance beyond a single link with NFS. (It works with iSCSI too, btw.)

With both IPMP and link aggregation, each TCP session will go over the same wire. There is no guarantee that load will be evenly balanced between links when there are multiple TCP sessions. As such, any scalability you get from these configurations will depend on having a complex enough workload, wise configuration choices, and a bit of luck.

Note that with Sun Trunking there was an option to load balance using a round-robin hashing algorithm. When pushing high network loads this may cause performance problems with reassembly.

--
Mike Gerdts
http://mgerdts.blogspot.com/
On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
> With both IPMP and link aggregation, each TCP session will go over the
> same wire. There is no guarantee that load will be evenly balanced
> between links when there are multiple TCP sessions. As such, any
> scalability you get from these configurations will depend on having a
> complex enough workload, wise configuration choices, and a bit of luck.

If you're really that concerned, you could use UDP instead of TCP. But that may have other detrimental performance impacts; I'm not sure how bad they would be in a data center with generally lossless ethernet links.

Btw, I am not certain that the multiple initiator support (mpxio) is necessarily any better as far as guaranteed performance/balancing. (It may be; I've not looked closely enough at it.) I should look more closely at NFS as well - if multiple applications on the same client access the same filesystem, do they use a single common TCP session, or can they each have separate instances open? Again, I'm not sure.

> Note that with Sun Trunking there was an option to load balance using
> a round-robin hashing algorithm. When pushing high network loads this
> may cause performance problems with reassembly.

Yes. Reassembly is Evil for TCP performance.

Btw, the iSCSI balancing act that was described does seem a bit contrived - a single initiator and a COMSTAR server, both client *and server* with multiple ethernet links instead of a single 10GbE link. I'm not saying it doesn't happen, but I think it happens infrequently enough that it's reasonable this scenario wasn't one that popped immediately into my head. :-)

- Garrett
On Mon, Jul 26, 2010 at 1:27 AM, Garrett D'Amore <garrett at nexenta.com> wrote:
> If you're really that concerned, you could use UDP instead of TCP. But
> that may have other detrimental performance impacts; I'm not sure how
> bad they would be in a data center with generally lossless ethernet
> links.

Heh. My horror story with reassembly was actually with connectionless transports (LLT, then UDP). Oracle RAC's cache fusion sends 8 KB blocks via UDP by default, or LLT when used in the Veritas + Oracle RAC certified configuration from 5+ years ago. The use of Sun Trunking with round-robin hashing, and the lack of jumbo frames, made every cache fusion block turn into 6 LLT or UDP packets that had to be reassembled on the other end. This was on a 15K domain with the NICs spread across I/O boards. I assume that interrupts for a NIC are handled by a CPU on the closest system board (Solaris 8, FWIW); if that assumption is true, then there would also be a flurry of inter-system-board chatter to put the block back together. In any case, performance was horrible until we got rid of round robin and enabled jumbo frames.

> Btw, I am not certain that the multiple initiator support (mpxio) is
> necessarily any better as far as guaranteed performance/balancing. (It
> may be; I've not looked closely enough at it.)

I haven't paid close attention to how mpxio works. The Veritas analog, vxdmp, does a very good job of balancing traffic down multiple paths, even when only a single LUN is accessed. The exact mode that dmp will use depends on the capabilities of the array it is talking to - many arrays work in an active/passive mode. As such, I would expect that with vxdmp or mpxio the balancing with iSCSI would be at least partially dependent on what the array said to do.

> I should look more closely at NFS as well - if multiple applications on
> the same client access the same filesystem, do they use a single
> common TCP session, or can they each have separate instances open?
> Again, I'm not sure.

It's worse than that. A quick experiment with two different automounted home directories from the same NFS server suggests that both home directories share one TCP session to the NFS server.
The latest version of Oracle's RDBMS supports a userland NFS client option. It would be very interesting to see if this does a separate session per data file, possibly allowing for better load spreading.

> Btw, the iSCSI balancing act that was described does seem a bit
> contrived - a single initiator and a COMSTAR server, both client *and
> server* with multiple ethernet links instead of a single 10GbE link.
>
> I'm not saying it doesn't happen, but I think it happens infrequently
> enough that it's reasonable this scenario wasn't one that popped
> immediately into my head. :-)

It depends on whether the people who control the network gear are the same ones who control the servers. My experience suggests that if there is a disconnect, it seems rather likely that each group's standardization efforts, procurement cycles, and capacity plans will work against any attempt to have an optimal configuration.

Also, it is rather common to have multiple 1 Gb links to servers going to disparate switches so as to provide resilience in the face of switch failures. This is not unlike (at a block diagram level) the architecture that you see in pretty much every SAN. In such a configuration, it is reasonable for people to expect that load balancing will occur.

--
Mike Gerdts
http://mgerdts.blogspot.com/
> -----Original Message-----
> From: Garrett D'Amore [mailto:garrett at nexenta.com]
> Sent: Monday, July 26, 2010 2:27 AM
>
> If you're really that concerned, you could use UDP instead of TCP. But
> that may have other detrimental performance impacts; I'm not sure how
> bad they would be in a data center with generally lossless ethernet
> links.

UDP is an advantage for NFS in this regard.

> Btw, I am not certain that the multiple initiator support (mpxio) is
> necessarily any better as far as guaranteed performance/balancing. (It
> may be; I've not looked closely enough at it.)

I'm not sure I'm referring to multi-initiator. iSCSI can have multiple sessions between an initiator and a target, or multiple sessions by virtue of connections to different targets presenting the same LUN (this is the multipathing I am talking about). I'm not sure about the multiple-sessions-between-a-single-initiator-and-target scenario, but the single initiator/multiple target config can work in an IPMP scenario to get you more usable capacity between your initiator and target(s) over multiple links, using a variety of algorithms to balance load amongst the sessions.

> I should look more closely at NFS as well - if multiple applications on
> the same client access the same filesystem, do they use a single
> common TCP session, or can they each have separate instances open?
> Again, I'm not sure.

This is probably going to depend on the software, but in the scenario I am personally interested in (VMware) it doesn't really matter: it's a single application. VMware says they create two sessions - one for control and one for data - so I assume the maximum speed available for data transfer to/from a particular mount is going to be the speed of one link.

I guess this is getting way off topic for the list, but VMware also computes a unique ID for each NFS mount. The ID is computed somehow from the mount configuration. If NFS datastore IDs are not identical, then VMware thinks they are different datastores regardless of their contents.
I have had a situation where some clients thought a particular datastore was different from the same datastore on some other clients, which prevented VMs hosted on that datastore from migrating between those clients. I traced the problem to inconsistent NFS mount configs; I'd used the FQDN for the configuration on most of the hosts but an IP address on the others. Reconfiguration resolved the issue.

This would suggest that, at least for this client/server combo, it would not be possible to do manual load balancing by pointing some clients at one IP and some at another IP for the same export. It would have to be balanced per export instead, which is a lot less convenient. VMware could also be more intelligent about generating their ID.

> Btw, the iSCSI balancing act that was described does seem a bit
> contrived - a single initiator and a COMSTAR server, both client *and
> server* with multiple ethernet links instead of a single 10GbE link.
>
> I'm not saying it doesn't happen, but I think it happens infrequently
> enough that it's reasonable this scenario wasn't one that popped
> immediately into my head. :-)

I don't agree that it's contrived, but I do agree that it's reasonable you didn't think of it :). I don't want to have to create a bunch of custom initiator/target configurations to spread load. I want to have a target with some particular configuration and a bunch of initiators configured identically to each other, and I want load to be spread across the available gigabit links.

My understanding is that the way to do this is to have multiple targets configured per LUN on the storage server, with each target set up to be available only on a specific network. Each initiator/client is set up with interfaces in these networks and pointed at these targets, and an appropriate load-balancing algorithm is chosen to spread load between the sessions. The configuration can use individual links and/or an 802.3ad configuration if the aggregates are also 802.1q trunks. This is obviously dependent on client/initiator support, but I think initiators that claim to support MPIO or multipathing implement something like this to get it done.

I'm pretty sure Solaris/COMSTAR permits this also, with multiple targets configured per LUN and each target able to be pinned to specific IP addresses. I haven't actually done this, though, so I guess I'm not 100% certain.

-Will
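On the VMware datastore-ID point above, a hedged illustration (hostnames and share paths are invented, and the exact flags may vary by ESX release): the fix is simply to add the datastore with an identical server string and share path on every host, e.g.

    esxcfg-nas -a -o filer.example.com -s /export/vmfs01 vmfs01

run the same way on each ESX host, rather than mixing the FQDN on some hosts and the raw IP on others.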
>>>>> "mg" == Mike Gerdts <mgerdts at gmail.com> writes: >>>>> "sw" == Saxon, Will <Will.Saxon at sage.com> writes:sw> I think there may be very good reason to use iSCSI, if you''re sw> limited to gigabit but need to be able to handle higher sw> throughput for a single client. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6817942 look at it now before it gets pulled back inside the wall. :( I think this bug was posted on zfs-discuss earlier. Please see the comments because he is not using lagg''s: even with a single 10Gbit/s NIC, you cannot use the link well unless you take advantage of the multiple MSI''s and L4 preclass built into the NIC. You need multiple TCP circuits between client and server so that each will fire a different MSI. He got about 3x performance using 8 connections. It sounds like NFS is already fixed for this, but requires manual tuning of clnt_max_conns and the number of reader and writer threads. mg> it is rather common to have multiple 1 Gb links to mg> servers going to disparate switches so as to provide mg> resilience in the face of switch failures. This is not unlike mg> (at a block diagram level) the architecture that you see in mg> pretty much every SAN. In such a configuation, it is mg> reasonable for people to expect that load balancing will mg> occur. nope. spanning tree removes all loops, which means between any two points there will be only one enabled path. An L2-switched network will look into L4 headers for splitting traffic across an aggregated link (as long as it''s been deliberately configured to do that---by default probably only looks to L2), but it won''t do any multipath within the mesh. Even with an L3 routing protocol it usually won''t do multipath unless the costs of the paths match exactly, so you''d want to build the topology to achieve this and then do all switching at layer 3 by making sure no VLAN is larger than a switch. There''s actually a cisco feature to make no VLAN larger than a *port*, which I use a little bit. It''s meant for CATV networks I think, or DSL networks aggregated by IP instead of ATM like maybe some European ones? but the idea is not to put edge ports into vlans any more but instead say ''ip unnumbered loopbackN'', and then some black magic they have built into their DHCP forwarder adds /32 routes by watching the DHCP replies. If you don''t use DHCP you can add static /32 routes yourself, and it will work. It does not help with IPv6, and also you can only use it on vlan-tagged edge ports (whaaaaat? arbitrary!) but neat that it''s there at all. http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html The best thing IMHO would be to use this feature on the edge ports, just as I said, but you will have to teach the servers to VLAN-tag their packets. not such a bad idea, but weird. You could also use it one hop up from the edge switches, but I think it might have problems in general removing the routes when you unplug a server, and using it one hop up could make them worse. I only use it with static routes so far, so no mobility for me: I have to keep each server plugged into its assigned port, and reconfigure switches if I move it. Once you have ``no vlan larger than 1 switch,'''' if you actually need a vlan-like thing that spans multiple switches, the new word for it is ''vrf''. so, yeah, it means the server people will have to take over the job of the networking people. 
Once you have "no VLAN larger than one switch," if you actually need a VLAN-like thing that spans multiple switches, the new word for it is 'vrf'.

So, yeah, it means the server people will have to take over the job of the networking people. The good news is that networking people don't like spanning tree very much because it's always going wrong, so AFAICT most of them who are paying attention are already moving in this direction.
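A hedged sketch of the /etc/system side of the clnt_max_conns tuning mentioned above (the value 8 matches the 8-connection experiment in the bug report; treat it as an example, not a recommendation):

    set rpcmod:clnt_max_conns = 8

followed by a reboot. The companion reader/writer thread counts are the ones discussed in the bug report's comments, which is where I'd look for values that matched the 10GbE testing.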
On Mon, Jul 26, 2010 at 2:56 PM, Miles Nordin <carton at ivy.net> wrote:
>     mg> it is rather common to have multiple 1 Gb links to
>     mg> servers going to disparate switches so as to provide
>     mg> resilience in the face of switch failures. [...] In such a
>     mg> configuration, it is reasonable for people to expect that load
>     mg> balancing will occur.
>
> Nope. Spanning tree removes all loops, which means between any two
> points there will be only one enabled path. An L2-switched network
> will look into L4 headers for splitting traffic across an aggregated
> link (as long as it's been deliberately configured to do that - by
> default it probably only looks at L2), but it won't do any multipath
> within the mesh.

I was speaking more of IPMP, which is at layer 3.

> Even with an L3 routing protocol it usually won't do multipath unless
> the costs of the paths match exactly, so you'd want to build the
> topology to achieve this and then do all switching at layer 3 by
> making sure no VLAN is larger than a switch.

By default, IPMP does outbound load spreading. Inbound load spreading is not practical with a single (non-test) IP address. If you have multiple virtual IPs, you can spread them across all of the NICs in the IPMP group and get some degree of inbound spreading as well. This is the default behavior of the OpenSolaris IPMP implementation, last I looked. I've not seen any examples (although I can't say I've looked real hard either) of the Solaris 10 IPMP configuration set up with multiple IPs to encourage inbound load spreading as well.

> There's actually a Cisco feature to make no VLAN larger than a *port*,
> which I use a little bit. [...] If you don't use DHCP you can add static
> /32 routes yourself, and it will work. It does not help with IPv6, and
> you can only use it on VLAN-tagged edge ports (whaaaaat? arbitrary!), but
> it's neat that it's there at all.
>
> http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

Interesting... however this seems to limit you to < 4096 edge ports per VTP domain, as the VID field in the 802.1q header is only 12 bits. It is also unclear how this works when you have one physical host with many guests. And then there is the whole thing that I don't really see how this helps with resilience in the face of a switch failure. Cool technology, but I'm not certain that it addresses what I was talking about.

> The best thing IMHO would be to use this feature on the edge ports,
> just as I said, but you will have to teach the servers to VLAN-tag
> their packets - not such a bad idea, but weird.
>
> You could also use it one hop up from the edge switches, but I think
> it might have problems in general removing the routes when you unplug
> a server, and using it one hop up could make them worse. I only use
> it with static routes so far, so no mobility for me: I have to keep
> each server plugged into its assigned port, and reconfigure switches
> if I move it.
> Once you have "no VLAN larger than one switch," if you
> actually need a VLAN-like thing that spans multiple switches, the new
> word for it is 'vrf'.

There was some other Cisco dark magic that our network guys were touting a while ago that would make each edge switch look like a blade in a 6500 series. This would then allow them to do link aggregation across edge switches. At least two of "organizational changes", "personnel changes", and "roadmap changes" happened, so I've not seen this in action.

> So, yeah, it means the server people will have to take over the job of
> the networking people. The good news is that networking people don't
> like spanning tree very much because it's always going wrong, so
> AFAICT most of them who are paying attention are already moving in
> this direction.

--
Mike Gerdts
http://mgerdts.blogspot.com/