测来测去 22: Steering one flow type by different input_sets with DPDK i40e fdir + rss + reta

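Problem

On an i40e NIC, packets of a single flow type, for example RTE_ETH_FLOW_NONFRAG_IPV4_UDP, sometimes need to be steered by different fields: packets with one specific Dst Port go to rx queue 1, packets with one specific Src IP + Dst IP pair go to rx queue 2, or some similar requirement within the same flow type.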

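Approach

With fdir alone, each flow type can have only one fixed input_set. The two kinds of packets need different input_sets, so fdir by itself cannot do the job.

However, the i40e matches packets in this order: fdir rules are checked first, and a packet that matches is handled according to the fdir rule. A packet that does not match goes on to rss; if rss applies, the packet is handled by rss, and otherwise it is delivered to the default rx queue 0.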
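The matching order described above can be sketched as a small toy model. This is only an illustration of the decision sequence, not driver or hardware code: every type and function name here (toy_pkt, toy_nic, toy_select_queue) is hypothetical, and the model assumes a single fdir rule on the UDP destination port and a 512-entry redirection table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these types are illustrative, not DPDK or i40e APIs. */
struct toy_pkt {
    bool     is_ipv4_udp;   /* does the packet belong to the flow type? */
    uint16_t udp_dst_port;  /* host byte order, for readability */
    uint32_t rss_hash;      /* hash the NIC would compute over the rss input_set */
};

struct toy_nic {
    bool     fdir_rule_valid;
    uint16_t fdir_dst_port;   /* the only fdir input_set field in this model */
    uint16_t fdir_queue;
    bool     rss_enabled_udp; /* rss is enabled for UDP packets */
    uint16_t reta[512];       /* redirection table: entry -> rx queue */
};

/* Returns the rx queue the packet would land on in this model. */
static uint16_t toy_select_queue(const struct toy_nic *nic,
                                 const struct toy_pkt *p)
{
    /* 1. fdir is consulted first. */
    if (nic->fdir_rule_valid && p->is_ipv4_udp &&
        p->udp_dst_port == nic->fdir_dst_port)
        return nic->fdir_queue;

    /* 2. Otherwise fall back to rss: index the table with the low hash bits. */
    if (nic->rss_enabled_udp && p->is_ipv4_udp)
        return nic->reta[p->rss_hash & 511u];

    /* 3. Neither matched: default rx queue 0. */
    return 0;
}
```

In this model a UDP packet to the fdir port never reaches the rss step, which is the property the rest of the post relies on.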

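So fdir can take one case with its own input_set, for example Dst Port only, while rss takes the other case with a different input_set, and together they cover both kinds of packets. One problem remains: rss places a packet by hashing the values of its input_set fields, so the packet does not necessarily land on the specific rx queue we want.

Reconfiguring the rss redirection table solves this. For the specific Src IP + Dst IP in the requirement, the rss hash is a fixed value (as long as the rss key stays the same), and that value always looks up the same redirection table entry to get the destination rx queue. Pointing that entry at the desired queue is enough.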

Code

First enable the fdir and rss features on the port:

static struct rte_eth_conf port_conf = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,          /* enable rss on rx */
        .max_rx_pkt_len = ETHER_MAX_LEN,
        .split_hdr_size = 0,
        .ignore_offload_bitfield = 1,
        .offloads = (DEV_RX_OFFLOAD_CRC_STRIP |
                     DEV_RX_OFFLOAD_CHECKSUM),
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,               /* use the driver's default key */
            .rss_hf = ETH_RSS_UDP,
        },
    },
    .txmode = {
        .mq_mode = ETH_MQ_TX_NONE,
    },
    .fdir_conf = {
        .mode = RTE_FDIR_MODE_PERFECT,     /* exact-match fdir rules */
        .pballoc = RTE_FDIR_PBALLOC_64K,
        .status = RTE_FDIR_REPORT_STATUS,
        .drop_queue = 127,
    },
};

Then configure an fdir rule for the first requirement: UDP packets with Dst Port 4096 go to rx queue 1:

struct rte_eth_fdir_filter arg_udpport = {
    .soft_id = 1,
    .input = {
        .flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP,
        .flow = {
            .udp4_flow = {
                /* Port 4096 = 0x1000. The field is big-endian, so on a
                 * little-endian host the stored value reads as 0x0010. */
                .dst_port = 0x10,
            },
        },
    },
    .action = {
        .rx_queue = 1,
        .behavior = RTE_ETH_FDIR_ACCEPT,
        .report_status = RTE_ETH_FDIR_REPORT_ID,
    },
};
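The 0x10 literal in the rule above can look like a mistake at first glance. Here is a minimal sketch of why it matches port 4096, assuming a little-endian host such as x86; to_be16 is a hypothetical stand-in written for this example, not a DPDK function.

```c
#include <stdint.h>

/* Convert a 16-bit value from host byte order to big-endian (network) order.
 * Hypothetical helper for illustration; it detects endianness at run time. */
static uint16_t to_be16(uint16_t host)
{
    const uint16_t probe = 1;
    const int little_endian = (*(const uint8_t *)&probe == 1);

    /* On a little-endian host the two bytes must be swapped. */
    return little_endian ? (uint16_t)((host << 8) | (host >> 8)) : host;
}
```

On a little-endian host, to_be16(4096) swaps 0x1000 into 0x0010, which is the value used in the rule. DPDK ships its own byte-order helpers in rte_byteorder.h, for example rte_cpu_to_be_16(), which is probably a clearer way to express the port than a hand-swapped literal.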

Then configure the input_set for fdir and for rss separately.

First set the rss input_set to Src IP + Dst IP:

struct rte_eth_hash_filter_info info; 
memset(&info, 0, sizeof(info));
info.info_type = RTE_ETH_HASH_FILTER_INPUT_SET_SELECT;
info.info.input_set_conf.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP;
info.info.input_set_conf.field[0] = RTE_ETH_INPUT_SET_L3_DST_IP4;
info.info.input_set_conf.field[1] = RTE_ETH_INPUT_SET_L3_SRC_IP4;
info.info.input_set_conf.inset_size = 2;
info.info.input_set_conf.op = RTE_ETH_INPUT_SET_SELECT;

Then set the fdir input_set to Dst Port:

struct rte_eth_fdir_filter_info fdir_filter_info;
memset(&fdir_filter_info, 0, sizeof(fdir_filter_info));
fdir_filter_info.info_type = RTE_ETH_FDIR_FILTER_INPUT_SET_SELECT;
fdir_filter_info.info.input_set_conf.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP;
fdir_filter_info.info.input_set_conf.field[0] = RTE_ETH_INPUT_SET_L4_UDP_DST_PORT;
fdir_filter_info.info.input_set_conf.inset_size = 1;
fdir_filter_info.info.input_set_conf.op = RTE_ETH_INPUT_SET_SELECT;

Then apply the rss and fdir configuration and add the fdir rule:

/* Port 0 throughout. Each call returns 0 on success, so check ret after it. */
ret = rte_eth_dev_filter_ctrl(0, RTE_ETH_FILTER_HASH, RTE_ETH_FILTER_SET, &info);

ret = rte_eth_dev_filter_ctrl(0, RTE_ETH_FILTER_FDIR, RTE_ETH_FILTER_SET, &fdir_filter_info);

ret = rte_eth_dev_filter_ctrl(0, RTE_ETH_FILTER_FDIR, RTE_ETH_FILTER_ADD, &arg_udpport);

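Next comes the redirection table, which requires the rss hash of the specific packet. The mbuf structure has a field that records the rss hash, and gdb can show it.

Start the DPDK application, send a UDP packet that matches the requirement, and set a breakpoint anywhere the mbuf is visible. In my case the mbuf printed as follows: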

(gdb) p *$2
$3 = {cacheline0 = 0x7f9c7ed64d40, buf_addr = 0x7f9c7ed64dc0, {buf_iova = 68699966912, buf_physaddr = 68699966912}, rearm_data = 0x7f9c7ed64d50, data_off = 128, {
refcnt_atomic = {cnt = 1}, refcnt = 1}, nb_segs = 1, port = 0, ol_flags = 386, rx_descriptor_fields1 = 0x7f9c7ed64d60, {packet_type = 657, {l2_type = 1, l3_type = 9,
l4_type = 2, tun_type = 0, {inner_esp_next_proto = 0 '\000', {inner_l2_type = 0 '\000', inner_l3_type = 0 '\000'}}, inner_l4_type = 0}}, pkt_len = 60,
data_len = 60, vlan_tci = 0, hash = {rss = 2719877416, fdir = {{{hash = 2344, id = 41502}, lo = 2719877416}, hi = 0}, sched = {lo = 2719877416, hi = 0},
usr = 2719877416}, vlan_tci_outer = 0, buf_len = 2176, timestamp = 0, cacheline1 = 0x7f9c7ed64d80, {userdata = 0x0, udata64 = 0}, pool = 0x7f9c7fc36cc0, next = 0x0, {
tx_offload = 0, {l2_len = 0, l3_len = 0, l4_len = 0, tso_segsz = 0, outer_l3_len = 0, outer_l2_len = 0}}, priv_size = 0, timesync = 0, seqn = 0, shinfo = 0x0}

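This shows that the packet's rss hash is 2719877416.

With this number in hand, open the venerable X710 datasheet and read what section 7.1.8 says about the Queue Index LUT. That is the redirection table under another name; consistent naming really does matter :)

It says:

The LUT in each PF gets the 9 LS bits of the hash output having either 128 or 512 entries

OK, the 9 LS bits of this rss hash are 100101000 in binary, which is 296 in decimal (2719877416 is 0xA21E0928, and 0xA21E0928 & 0x1FF = 0x128 = 296).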
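The bit extraction can also be done in code instead of by hand. This is a minimal sketch; lut_index is a hypothetical helper name, not a DPDK API, and it assumes the table size is a power of two, as with the 128 and 512 sizes in the datasheet quote.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: which LUT entry a given rss hash selects.
 * For a 512-entry table this keeps the 9 least significant bits. */
static uint32_t lut_index(uint32_t rss_hash, uint32_t lut_size)
{
    /* The mask trick below only works for power-of-two sizes. */
    assert(lut_size != 0 && (lut_size & (lut_size - 1)) == 0);
    return rss_hash & (lut_size - 1);
}
```

For the hash captured above, lut_index(2719877416u, 512) evaluates to 296.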

With this value, first look at struct rte_eth_rss_reta_entry64, the structure needed to configure the redirection table:

struct rte_eth_rss_reta_entry64 {
    uint64_t mask;
    /**< Mask bits indicate which entries need to be updated/queried. */
    uint16_t reta[RTE_RETA_GROUP_SIZE];
    /**< Group of 64 redirection table entries. */
};

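The comments are fairly clear: mask marks which entries to modify, here the entry that corresponds to 296, and reta holds the queue each entry should point to. Because mask is a 64-bit value, describing 512 entries takes an array of this structure: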

struct rte_eth_rss_reta_entry64 reta_conf[8];

Eight elements, because 64 x 8 = 512.

The array index and the offset to change are calculated like this:

296 / 64 = 4
296 % 64 = 40
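The same arithmetic works for any entry, so it can be wrapped in a small helper. This is a sketch under the assumption that groups hold 64 entries, as RTE_RETA_GROUP_SIZE does; reta_pos and reta_locate are hypothetical names introduced only for this example.

```c
#include <stdint.h>

#define GROUP_SIZE 64u /* assumed equal to RTE_RETA_GROUP_SIZE */

/* Hypothetical type: where a table entry lives in an array of 64-entry groups. */
struct reta_pos {
    uint32_t group; /* index into the reta_conf array */
    uint32_t slot;  /* index into reta[] and bit number in mask */
};

/* Split a flat table entry number into (group, slot). */
static struct reta_pos reta_locate(uint32_t entry)
{
    struct reta_pos pos = { entry / GROUP_SIZE, entry % GROUP_SIZE };
    return pos;
}
```

Calling reta_locate(296) yields group 4 and slot 40, the two numbers used in the update that follows.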

So the code can be written as:

memset(reta_conf, 0, sizeof(reta_conf));
reta_conf[4].mask |= (1ULL << 40);   /* select entry 4 * 64 + 40 = 296 */
reta_conf[4].reta[40] = 2;           /* point that entry at rx queue 2 */

ret = rte_eth_dev_rss_reta_update(0, reta_conf, 512);

At this point the requirement from the first section is met.
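To see why setting a single mask bit leaves the rest of the table alone, the update can be modeled in plain C. This is only a model of the mask semantics described in the structure's comments, not the driver's implementation: reta_entry64 is a local stand-in that mirrors the layout of struct rte_eth_rss_reta_entry64, and model_reta_update is a hypothetical function.

```c
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 64u /* assumed equal to RTE_RETA_GROUP_SIZE */

/* Local stand-in mirroring the layout of struct rte_eth_rss_reta_entry64. */
struct reta_entry64 {
    uint64_t mask;
    uint16_t reta[GROUP_SIZE];
};

/* Model of an update: copy only the entries whose mask bit is set,
 * and leave every other entry of the table untouched. */
static void model_reta_update(uint16_t *table, size_t table_size,
                              const struct reta_entry64 *conf)
{
    for (size_t i = 0; i < table_size; i++) {
        size_t group = i / GROUP_SIZE;
        size_t slot = i % GROUP_SIZE;

        if (conf[group].mask & (1ULL << slot))
            table[i] = conf[group].reta[slot];
    }
}
```

With a zeroed 8-element configuration in which only bit 40 of element 4 is set and reta[40] is 2, the model changes entry 296 to queue 2 and nothing else, whatever the table held before.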

© 2020 DecodeZ All Rights Reserved.