https://github.com/torvalds/linux
Revision 85746e429f8e5dc8c5c0beadc0f099cb1feab93e authored by Linus Torvalds on 07 July 2011, 20:16:21 UTC, committed by Linus Torvalds on 07 July 2011, 20:16:21 UTC
ipoib.txt
IP OVER INFINIBAND

  The ib_ipoib driver is an implementation of the IP over InfiniBand
  protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
  working group.  It is a "native" implementation in the sense of
  setting the interface type to ARPHRD_INFINIBAND and the hardware
  address length to 20 (earlier proprietary implementations
  masqueraded to the kernel as ethernet interfaces).

Partitions and P_Keys

  When the IPoIB driver is loaded, it creates one interface for each
  port using the P_Key at index 0.  To create an interface with a
  different P_Key, write the desired P_Key into the main interface's
  /sys/class/net/<intf name>/create_child file.  For example:

    echo 0x8001 > /sys/class/net/ib0/create_child

  This will create an interface named ib0.8001 with P_Key 0x8001.  To
  remove a subinterface, use the "delete_child" file:

    echo 0x8001 > /sys/class/net/ib0/delete_child

  The P_Key for any interface is given by its "pkey" file, and the
  main interface for a subinterface is given by its "parent" file.
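
  The steps above can be wrapped in a few shell helpers.  This is a
  minimal sketch, not part of the driver: the function names and the
  SYSFS_NET override variable are hypothetical, and the child name is
  derived by assuming the ib0 + 0x8001 -> ib0.8001 pattern of the
  example holds for other P_Keys.

```shell
# Root of the net class in sysfs.  SYSFS_NET is a hypothetical override
# so the helpers can be tried against a scratch directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the expected child interface name for a parent and a P_Key,
# assuming the "<parent>.<4 hex digits>" pattern from the example.
ipoib_child_name() {
    printf '%s.%04x\n' "$1" "$(($2))"
}

# Create a child interface by writing the P_Key to create_child.
ipoib_create_child() {
    printf '0x%04x\n' "$(($2))" > "$SYSFS_NET/$1/create_child"
}

# Print the P_Key and the parent of a child interface.
ipoib_show_child() {
    printf 'pkey=%s parent=%s\n' \
        "$(cat "$SYSFS_NET/$1/pkey")" "$(cat "$SYSFS_NET/$1/parent")"
}

# Example (needs root and an IPoIB port named ib0):
#   ipoib_create_child ib0 0x8001
#   ipoib_show_child "$(ipoib_child_name ib0 0x8001)"
```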

Datagram vs Connected modes

  The IPoIB driver supports two modes of operation: datagram and
  connected.  The mode is set and read through an interface's
  /sys/class/net/<intf name>/mode file.
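
  Reading and setting the mode can be sketched as follows.  The
  function names and the SYSFS_NET override variable are hypothetical;
  the sketch assumes the mode file takes the strings "datagram" and
  "connected".

```shell
# Root of the net class in sysfs.  SYSFS_NET is a hypothetical override
# so the helpers can be tried against a scratch directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the current mode of an interface.
ipoib_get_mode() {
    cat "$SYSFS_NET/$1/mode"
}

# Set the mode of an interface, rejecting anything other than the two
# mode names before touching the mode file.
ipoib_set_mode() {
    case "$2" in
        datagram|connected)
            printf '%s\n' "$2" > "$SYSFS_NET/$1/mode"
            ;;
        *)
            echo "mode must be 'datagram' or 'connected'" >&2
            return 1
            ;;
    esac
}

# Example (needs root and an IPoIB interface named ib0):
#   ipoib_set_mode ib0 connected
#   ipoib_get_mode ib0
```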

  In datagram mode, the IB UD (Unreliable Datagram) transport is used,
  so the interface MTU is equal to the IB L2 MTU minus the
  IPoIB encapsulation header (4 bytes).  For example, in a typical IB
  fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.
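
  The same arithmetic as a small shell helper.  The function and
  variable names are hypothetical; only the 4-byte header size comes
  from the text above.

```shell
# Size of the IPoIB encapsulation header in bytes.
IPOIB_ENCAP_LEN=4

# Print the datagram-mode IPoIB MTU for a given IB L2 MTU.
ipoib_ud_mtu() {
    echo "$(($1 - $IPOIB_ENCAP_LEN))"
}

# Example:
#   ipoib_ud_mtu 2048    # prints 2044
```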

  In connected mode, the IB RC (Reliable Connected) transport is used.
  Connected mode takes advantage of the connected nature of the IB
  transport and allows an MTU up to the maximal IP packet size of 64K,
  which reduces the number of IP packets needed for handling large UDP
  datagrams, TCP segments, etc., and increases the performance for large
  messages.

  In connected mode, the interface's UD QP is still used for multicast
  and for communication with peers that don't support connected mode.
  For such peers, RX emulation of ICMP PMTU packets is used to cause
  the networking stack to use the smaller UD MTU.

Stateless offloads

  If the IB HW supports IPoIB stateless offloads, IPoIB advertises
  TCP/IP checksum and/or Large Send (LSO) offloading capability to the
  network stack.

  Large Receive (LRO) offloading is also implemented and may be turned
  on/off using ethtool calls.  Currently LRO is supported only for
  checksum offload capable devices.

  Stateless offloads are supported only in datagram mode.  
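
  Inspecting the offloads and toggling LRO with ethtool can be
  sketched as below.  The function names and the ETHTOOL override
  variable are hypothetical; the -k and -K options and the "lro"
  keyword are standard ethtool usage, but whether a given setting
  takes effect depends on the device.

```shell
# ethtool binary to run.  ETHTOOL is a hypothetical override that makes
# it possible to substitute a stub when trying the helpers out.
ETHTOOL="${ETHTOOL:-ethtool}"

# Show the offload settings currently reported for an interface.
ipoib_show_offloads() {
    "$ETHTOOL" -k "$1"
}

# Turn LRO on or off for an interface.
ipoib_set_lro() {
    case "$2" in
        on|off)
            "$ETHTOOL" -K "$1" lro "$2"
            ;;
        *)
            echo "usage: ipoib_set_lro <intf> on|off" >&2
            return 1
            ;;
    esac
}

# Example (needs root and an IPoIB interface named ib0):
#   ipoib_show_offloads ib0
#   ipoib_set_lro ib0 off
```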

Interrupt moderation

  If the underlying IB device supports CQ event moderation, one can
  use ethtool to set interrupt mitigation parameters and thus reduce
  the overhead incurred by handling interrupts.  The main code path of
  IPoIB doesn't use events for TX completion signaling so only RX
  moderation is supported.
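
  A minimal sketch of setting RX moderation through ethtool follows.
  The function names, the ETHTOOL override variable and the numbers in
  the example are hypothetical; rx-usecs and rx-frames are standard
  ethtool coalescing parameters, and suitable values depend on the
  device and workload.

```shell
# ethtool binary to run.  ETHTOOL is a hypothetical override that makes
# it possible to substitute a stub when trying the helpers out.
ETHTOOL="${ETHTOOL:-ethtool}"

# Show the current interrupt coalescing settings of an interface.
ipoib_show_coalesce() {
    "$ETHTOOL" -c "$1"
}

# Set RX moderation: $2 = microseconds to wait, $3 = frames to collect
# before an RX interrupt is raised.
ipoib_set_rx_moderation() {
    "$ETHTOOL" -C "$1" rx-usecs "$2" rx-frames "$3"
}

# Example (needs root; the values are illustrative only):
#   ipoib_set_rx_moderation ib0 16 44
#   ipoib_show_coalesce ib0
```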

Debugging Information

  By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
  to 'y', tracing messages are compiled into the driver.  They are
  turned on by setting the module parameters debug_level and
  mcast_debug_level to 1.  These parameters can be controlled at
  runtime through files in /sys/module/ib_ipoib/.
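
  Changing the parameters at runtime can be sketched as below.  The
  function name and the IPOIB_PARAMS override variable are
  hypothetical, and the sketch assumes the parameter files live in the
  usual "parameters" subdirectory of the module's sysfs directory.

```shell
# Directory holding the module parameter files (assumed location).
# IPOIB_PARAMS is a hypothetical override for trying the helper out.
IPOIB_PARAMS="${IPOIB_PARAMS:-/sys/module/ib_ipoib/parameters}"

# Set one of the two tracing parameters named above to 0 or 1.
ipoib_set_debug() {
    case "$1" in
        debug_level|mcast_debug_level)
            ;;
        *)
            echo "unknown parameter: $1" >&2
            return 1
            ;;
    esac
    case "$2" in
        0|1)
            printf '%s\n' "$2" > "$IPOIB_PARAMS/$1"
            ;;
        *)
            echo "value must be 0 or 1" >&2
            return 1
            ;;
    esac
}

# Example (needs root and a driver built with
# CONFIG_INFINIBAND_IPOIB_DEBUG):
#   ipoib_set_debug debug_level 1
#   ipoib_set_debug mcast_debug_level 1
```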

  CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
  virtual filesystem.  By mounting this filesystem, for example with

    mount -t debugfs none /sys/kernel/debug

  it is possible to get statistics about multicast groups from the
  files /sys/kernel/debug/ipoib/ib0_mcg and so on.
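
  Reading the multicast group file for an interface can be sketched as
  below.  The function names and the DEBUGFS override variable are
  hypothetical; the file name follows the ib0_mcg pattern given above.

```shell
# Mount point of debugfs.  DEBUGFS is a hypothetical override for
# systems that mount it elsewhere.
DEBUGFS="${DEBUGFS:-/sys/kernel/debug}"

# Print the path of the multicast group file for an interface.
ipoib_mcg_file() {
    printf '%s/ipoib/%s_mcg\n' "$DEBUGFS" "$1"
}

# Dump the multicast group statistics for an interface, with a hint
# about the likely cause if the file cannot be read.
ipoib_show_mcg() {
    mcg_file="$(ipoib_mcg_file "$1")"
    if [ -r "$mcg_file" ]; then
        cat "$mcg_file"
    else
        echo "$mcg_file is not readable; check that debugfs is" \
             "mounted and CONFIG_INFINIBAND_IPOIB_DEBUG is set" >&2
        return 1
    fi
}

# Example (needs root):
#   ipoib_show_mcg ib0
```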

  The performance impact of this option is negligible, so it
  is safe to enable this option with debug_level set to 0 for normal
  operation.

  CONFIG_INFINIBAND_IPOIB_DEBUG_DATA enables even more debug output in
  the data path when data_debug_level is set to 1.  However, even with
  the output disabled, enabling this configuration option will affect
  performance, because it adds tests to the fast path.

References

  Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
    http://ietf.org/rfc/rfc4391.txt 
  IP over InfiniBand (IPoIB) Architecture (RFC 4392)
    http://ietf.org/rfc/rfc4392.txt 
  IP over InfiniBand: Connected Mode (RFC 4755)
    http://ietf.org/rfc/rfc4755.txt