Skip to content

Infiniband

IB简介

RDMA - Remote Direct Memory Access 远程直接内存存取。

InfiniBand是一种高性能计算机网络通信标准,它具有极高的吞吐量和极低的延迟。如果您需要使用InfiniBand进行编程,您需要使用支持InfiniBand的编程语言(如C++)来编写代码。

机构和组织:

OFA: Open Fabrics Alliance.

IBTA: InfiniBand Trade Association.

概念

  • CQ - Complete Queue 完成队列
  • WQ - Work Queue 工作队列
  • WR - Work Request 工作请求
  • QP - Queue Pairs 队列对(Send-Receive)
  • SQ - Send Queue 发送队列
  • RQ - Receive Queue 接收队列
  • PD - Protection Domain 保护域,将QP和MR结合在一起
  • MR - Memory Region 内存区域。一块经注册过的且本地网卡可以读写的内存区域。包含R_Key和L_Key。
  • SGE - Scatter/Gather Elements 分散/聚集元素。
  • R_Key - Remote Key
  • L_Key - Local Key
  • CA - (Host) Channel Adapter, an inifiniband network interface card.
  • NIC - Network Interface Card 网卡。
  • LID - Local Identifier.
  • CM - Connection Manager.

其他常见缩写:

  • RC - reliable connected.
  • SCSI - Small Computer System Interface 小型计算机系统接口。
  • SRP - SCSI RDMA Protocol. / Secure Remote Password.

博客:https://blog.51cto.com/liangchaoxi/4044818

安装

InfiniBand 和 RDMA 相关软件包

sudo apt-get install infiniband-diags
sudo apt install ibverbs-utils

API

以下是一些支持InfiniBand的C++库:

Infinity:这是一个轻量级的C++ RDMA库,用于InfiniBand网络。它提供了对两侧(发送/接收)和单侧(读/写/原子)操作的支持,并且是一个简单而强大的面向对象的ibVerbs抽象。该库使用户能够构建使用RDMA的复杂应用程序,而不会影响性能1

OFED:这是一个开放式Fabrics Enterprise Distribution,它提供了对InfiniBand和RoCE(RDMA over Converged Ethernet)技术的支持。OFED提供了一组用户空间库和驱动程序,可用于构建支持RDMA的应用程序2

以下是使用Infinity库编写支持InfiniBand的C++代码示例:

// 创建新上下文
infinity::core::Context *context = new infinity::core::Context();

// 创建队列对
infinity::queues::QueuePairFactory *qpFactory = new infinity::queues::QueuePairFactory(context);
infinity::queues::QueuePair *qp = qpFactory->connectToRemoteHost(SERVER_IP, PORT_NUMBER);

// 创建并向网络注册缓冲区
infinity::memory::Buffer *localBuffer = new infinity::memory::Buffer(context, BUFFER_SIZE);

// 从远程缓冲区读取(单向)并等待完成
infinity::memory::RegionToken *remoteBufferToken = new infinity::memory::RegionToken(REMOTE_BUFFER_INFO);
infinity::requests::RequestToken requestToken(context);
qp->read(localBuffer, remoteBufferToken, &requestToken);
requestToken.waitUntilCompleted();

// 将本地缓冲区的内容写入远程缓冲区(单向)并等待完成
qp->write(localBuffer, remoteBufferToken, &requestToken);
requestToken.waitUntilCompleted();

// 将本地缓冲区的内容通过队列对发送(双向)并等待完成
qp->send(localBuffer, &requestToken);
requestToken.waitUntilCompleted();

// 关闭连接
delete remoteBufferToken;
delete localBuffer;
delete qp;
delete qpFactory;
delete context;

以下是使用OFED库编写支持InfiniBand的C++代码示例:

// 创建新上下文
struct ibv_context *context = ibv_open_device(*device);

// 创建完成端口
struct ibv_pd *pd = ibv_alloc_pd(context);

// 创建队列对
struct ibv_qp_init_attr qp_init_attr;
memset(&qp_init_attr, 0, sizeof(qp_init_attr));
qp_init_attr.send_cq = cq;
qp_init_attr.recv_cq = cq;
qp_init_attr.qp_type = IBV_QPT_RC;
qp_init_attr.cap.max_send_wr = 1;
qp_init_attr.cap.max_recv_wr = 1;
qp_init_attr.cap.max_send_sge = 1;
qp_init_attr.cap.max_recv_sge = 1;
struct ibv_qp *qp = ibv_create_qp(pd, &qp_init_attr);

// 创建并向网络注册缓冲区
char *localBuffer = (char *)malloc(BUFFER_SIZE);
struct ibv_mr *mr = ibv_reg_mr(pd, localBuffer, BUFFER_SIZE, IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE);

// 连接到远程主机
struct sockaddr_in remoteAddress;
memset(&remoteAddress, 0, sizeof(remoteAddress));
remoteAddress.sin_family = AF_INET;
remoteAddress.sin_port = htons(PORT_NUMBER);
inet_pton(AF_INET, SERVER_IP, &remoteAddress.sin_addr);
struct rdma_cm_id *cmId;
rdma_create_id(*eventChannel, &cmId, NULL, RDMA_PS_TCP);
rdma_resolve_addr(cmId, NULL, (struct sockaddr *)&remoteAddress, RESOLVE_TIMEOUT_MS);

// 等待连接完成
rdma_wait_event(*eventChannel, RDMA_CM_EVENT_ESTABLISHED);
rdma_ack_cm_event(cmEvent);

// 获取远程缓冲区信息
struct ibv_wc wc;
ibv_post_recv(qp, &recvWr, &badRecvWr);
do {
    ibv_poll_cq(cq, 1, &wc);
} while (wc.status != IBV_WC_SUCCESS || wc.opcode != IBV_WC_RECV_RDMA_WITH_IMM || wc.imm_data != htonl(IMM_DATA));
remoteBufferInfo.rkey = ntohl(wc.imm_data >> 8);
remoteBufferInfo.vaddr = wc.wr_id;

// 将本地缓冲区的内容写入远程缓冲区(单向)
struct ibv_send_wr sendWr;
memset(&sendWr, 0, sizeof(sendWr));
sendWr.wr_id = 0;
sendWr.opcode = IBV_WR_RDMA_WRITE_WITH_IMM;
sendWr.sg_list = &localSge;
sendWr.num_sge = 1;
sendWr.send_flags = IBV_SEND_SIGNALED;
sendWr.wr.rdma.remote_addr = remoteBufferInfo.vaddr;
sendWr.wr.rdma.rkey = remoteBufferInfo.rkey;
localSge.addr = (uintptr_t)localBuffer;
localSge.length = BUFFER_SIZE;
localSge.lkey = mr->lkey;
ibv_post_send(qp, &sendWr, &badSendWr);

// 关闭连接
ibv_dereg_mr(mr);
free(localBuffer);
ibv_destroy_qp(qp);
ibv_dealloc_pd(pd);
ibv_close_device(context);
  • 入门级文档:https://zhuanlan.zhihu.com/p/337461037
  • 文档:https://docs.kernel.org/infiniband/index.html
  • 文档:http://blog.foool.net/wp-content/uploads/linuxdocs/infiniband.pdf
  • 文档:https://support.bull.com/documentation/byproduct/infra/sw-extremcomp/sw-extremcomp-com/g
  • 文档:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_infiniband_and_rdma_networks/index
  • 博客:https://zhuanlan.zhihu.com/p/337461037
  • 文档:https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_infiniband_and_rdma_networks
  • 知乎专栏:https://www.zhihu.com/column/c_1231181516811390976
  • 知乎专栏:https://www.zhihu.com/column/rdmatechnology

Linux manual page

Command-Line

文档:https://docs.nvidia.com/networking/pages/viewpage.action?pageId=43719572

  • ibstat
  • ibhosts - 查看所有的IB hosts。
  • ibnetdiscover - discover InfiniBand topology.
  • ibv_devices - list RDMA devices.
  • ibv_devinof - Print information about RDMA devices available for use from userspace.
  • ibv_rc_pingpong - Run a simple ping-pong test over InfiniBand via the reliable connected (RC) transport.
  • targetcli - administration shell for storage targets

    targetcli is a shell for viewing, editing, and saving the configuration of the kernel's target subsystem, also known as LIO. It enables the administrator to assign local storage resources backed by either files, volumes, local SCSI devices, or ramdisk, and export them to remote systems via network fabrics, such as iSCSI or FCoE.

  • srp_daemon - Discovers and connects to InfiniBand SCSI RDMA Protocol (SRP) targets in an IB fabric.

  • ibsrpdm - List InfiniBand SCSI RDMA Protocol (SRP) targets on an IB fabric.