We've looked at this. The problem is that NICs want to read in TCP MSS-sized chunks (1448 bytes, for example), while storage devices are highly optimized for block-aligned (4K) chunks. So you need to buffer the storage reads someplace, and for now the only practical answer is host memory. There are NVMe technologies that could help, but they are either too small or come at too large a price premium. CXL memory looks promising, but it's not ready yet.
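To make the size mismatch concrete, here is a small Python sketch. It assumes the 1448-byte MSS from the example and a 4096-byte block, walks a block-aligned region in MSS-sized segments, and reports which segments straddle a block boundary. The helper names are hypothetical, for illustration only.

```python
# Hypothetical illustration: which MSS-sized TCP segments cross a 4K block boundary?
MSS = 1448    # example MSS: 1500-byte MTU minus 20 B IP, 20 B TCP, 12 B timestamp option
BLOCK = 4096  # assumed storage block size


def blocks_touched(offset: int, length: int, block: int = BLOCK) -> range:
    """Block indices covering the bytes [offset, offset + length)."""
    return range(offset // block, (offset + length - 1) // block + 1)


def straddling_segments(total_bytes: int, mss: int = MSS, block: int = BLOCK) -> list[int]:
    """Indices of segments (cut every `mss` bytes from offset 0) that span more than one block."""
    out = []
    for i, off in enumerate(range(0, total_bytes, mss)):
        length = min(mss, total_bytes - off)
        if len(blocks_touched(off, length, block)) > 1:
            out.append(i)
    return out


if __name__ == "__main__":
    total = 64 * 1024  # one block-aligned 64 KiB region of a file
    crossing = straddling_segments(total)
    n_segments = -(-total // MSS)  # ceiling division
    print(f"{len(crossing)} of {n_segments} segments span two 4K blocks")
```

Each segment that spans two blocks needs the tail of one block and the head of the next, so a whole block has to be read and kept somewhere until the following segment consumes the rest of it.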
Does it? I thought with segmentation offloads the NIC basically gets TCP stream data in more or less arbitrary sizes, and then segments it into MTU-sized packets on its own?
We do fairly sophisticated TCP pacing, which requires sending down some small multiple of MSS to the NIC, so it doesn't always have the freedom to pull 4K at a time.
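A rough way to see why pacing limits the NIC's freedom: if each paced burst is a fixed small multiple of MSS (two segments here, purely an assumption), burst start offsets rarely land on a 4K boundary, because 1448 = 8 × 181 shares only a factor of 8 with 4096. This sketch, with hypothetical names, lists the burst starts that happen to be block-aligned and the spacing between them.

```python
from math import gcd

MSS = 1448    # example MSS from the discussion
BLOCK = 4096  # assumed storage block size


def aligned_burst_starts(total_bytes: int, burst: int, block: int = BLOCK) -> list[int]:
    """Stream offsets of paced bursts (each `burst` bytes) that begin on a block boundary."""
    return [off for off in range(0, total_bytes, burst) if off % block == 0]


def alignment_period(burst: int, block: int = BLOCK) -> int:
    """Bytes between successive block-aligned burst starts: lcm(burst, block)."""
    return burst * block // gcd(burst, block)


if __name__ == "__main__":
    burst = 2 * MSS  # assumed pacing quantum: 2 x MSS = 2896 bytes
    period = alignment_period(burst)
    print(f"only one burst start in every {period} bytes is 4K-aligned")
    print(aligned_burst_starts(2 * period, burst))
```

Under these assumptions nearly every burst begins partway through a block, so the NIC cannot simply pull one whole 4K block per send; the leftover part of each block has to be held in a buffer until a later burst uses it.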