Monday, 22 November 2010

IP over FC

A few notes about my experiments with IP over FC.

We have Windows, Mac and Linux clients. We have QLogic fibre switches and mostly QLogic HBAs. On one of the vendor's visits to fix the SAN, all the LSI HBAs were pulled from the Macs and replaced with QLogic ones. So during my research, I've been mostly hacking with these spare LSI cards. They are interesting from a Linux point of view since the drivers are in the default kernel on just about every distro I've tried - including the IP over FC driver 'mptlan.ko' - all as part of the 'Fusion' driver.

When I first plugged these cards into some spare machines - up popped the network interfaces as fc0 etc, and standard 'ifconfig' got them all working. Before long I had NFS working quite nicely over the connection. But the interfaces come up as:

fc0 Link encap:16/4 Mbps Token Ring HWaddr xx:xx:xx:xx:xx:xx
inet addr: Bcast: Mask:
RX packets:72250 errors:0 dropped:0 overruns:0 frame:0
TX packets:66105 errors:20 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:8192
RX bytes:15940069 (15.9 MB) TX bytes:564886898 (564.8 MB)

...and this '16/4 Mbps Token Ring' accurately describes the performance I'm getting. I've tried all sorts of things to force the card into a different type of encapsulation, but nothing works. I thought it might be because I was connecting two machines point to point - but the same thing happens if I bring up the card when it's plugged into a switch.
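As far as I understand it, ifconfig derives that 'Link encap' string from the hardware type the kernel reports for the interface, so the label may be cosmetic rather than proof that the card is really doing Token Ring framing. I haven't confirmed that for mptlan. A minimal sketch of how to check, assuming a kernel that exposes /sys/class/net/<iface>/type (the function names here are my own, and the table is only a subset of the ARPHRD_* constants from linux/if_arp.h):

```python
from pathlib import Path

# Subset of ARPHRD_* hardware type constants from <linux/if_arp.h>.
ARPHRD_NAMES = {
    1: "ARPHRD_ETHER",
    6: "ARPHRD_IEEE802",
    784: "ARPHRD_FCPP",
    785: "ARPHRD_FCAL",
    786: "ARPHRD_FCPL",
    787: "ARPHRD_FCFABRIC",
    800: "ARPHRD_IEEE802_TR",
}


def hw_type_name(raw):
    """Translate the text of a sysfs 'type' file into an ARPHRD_* name."""
    value = int(raw.strip())
    return ARPHRD_NAMES.get(value, "unknown (%d)" % value)


def read_hw_type(iface, sysfs_root="/sys/class/net"):
    """Read the hardware type the kernel reports for one interface."""
    type_file = Path(sysfs_root) / iface / "type"
    return hw_type_name(type_file.read_text())


if __name__ == "__main__":
    import sys

    # Defaults to fc0, the interface name from the output above.
    iface = sys.argv[1] if len(sys.argv) > 1 else "fc0"
    print(iface, read_hw_type(iface))
```

If this reports one of the generic IEEE 802 types rather than an FC-specific one, the Token Ring wording would just be how ifconfig names that type, and the slow throughput would need a separate explanation.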

So I've tried to repeat the experiment with the QLogic cards we have. I've managed to get hold of the driver from before IP over FC support was removed, and at the moment I'm trying to find a distro that is close enough to RHEL or SLES to get it compiled, while at the same time being up to date enough to support the software stack I need to run on top (realtime 3D/video/audio software) - but I have concerns about the practicalities of deploying this on visualisation clusters, and the brittleness of relying on out-of-date drivers...

So I'm wondering what my next move is - do I tell the University to jump ship to StorNext (another unknown quantity), treat the arrays as JBODs and let each machine do what it wants with its slice, or pay for some contractors to come and re-configure MetaSAN to work? (I have no faith in the product now, and a couple of days of hacking has taken the mystery out of FC, to the point where I'm not convinced there is anything left to configure)...