investigate zero-copy socket writes #1299
This commit changes a few lwIP configuration options, following the hints at https://www.nongnu.org/lwip/2_1_x/optimization.html. The lwip_htons() and lwip_htonl() macros have been defined so that byte order inversion operations are executed inline instead of as function calls (in #1299, the lwip_htons() function shows up as taking a non-negligible share of CPU time). Instead of the default checksum algorithm #2, algorithm #3 is now being used, because it is faster on 64-bit platforms (checksum calculation on 20-byte data is around 35% faster).
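As a rough illustration of the kind of port-level overrides described above, the sketch below defines the two byte-order macros and selects checksum algorithm #3. It is not the actual commit: the use of the GCC/Clang `__builtin_bswap16`/`__builtin_bswap32` builtins and the little-endian target are assumptions, and in a real lwIP port these definitions would normally live in the port's `arch/cc.h` or `lwipopts.h`.

```c
#include <stdint.h>

/* Inline byte-order inversion for a little-endian target (assumption).
   lwIP only provides lwip_htons()/lwip_htonl() as functions when the port
   has not already defined them as macros. */
#define lwip_htons(x) ((uint16_t)__builtin_bswap16((uint16_t)(x)))
#define lwip_htonl(x) ((uint32_t)__builtin_bswap32((uint32_t)(x)))

/* Use checksum algorithm #3 instead of the default #2. */
#define LWIP_CHKSUM_ALGORITHM 3
```

With macros like these the compiler can typically reduce each swap to a single instruction at the call site, which is what removes the function-call overhead seen in the profile.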
I did some more testing with current master, and copying user data to kernel buffers when doing a socket send is still more efficient than the zero-copy approach (with Nanos running on qemu and sending TCP data to the local host). Even though retrieving the physical address from a kernel virtual address is now almost free, this doesn't apply to user buffers, so with zero-copy every network packet being sent incurs a physical_from_virtual() call that involves a table lookup.

non-zero-copy: [profile not preserved]
zero-copy: [profile not preserved]

The reduction of runtime_memcpy time is comparable to the increase in physical_from_virtual time, plus there is a significant increase in the time taken by mcache and objcache alloc/dealloc functions, which stems from the fact that when sending socket data with zero-copy we have to call |
From a cursory look it appears that we could potentially implement zero-copy on socket writes by eliminating the TCP_WRITE_FLAG_COPY flag on calls to tcp_write when SO_ZEROCOPY / MSG_ZEROCOPY is specified. User pages not under the domain of the pagecache are, in a sense, pinned already, and pages within the pagecache could be pinned by taking an extra refcount on the pagecache_page. Implementation of socket error queues would also be necessary to allow completion notification to the application.
This could potentially yield a significant performance benefit in cases such as large static page loads when the service supports zero-copy (which requires that user buffers remain unmodified until the sent TCP data has been acknowledged), but some further exploration might be necessary to verify that the zero-copy path, from lwIP through our existing PV nic drivers, will in fact work as expected. Furthermore, note that SO_ZEROCOPY is a hint to the kernel to use zero-copy if available, with a guarantee that completion notifications will be returned; it is not a guarantee that copying will be avoided (so a non-compliant driver could result in use of TCP_WRITE_FLAG_COPY with completion notifications).
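The per-send decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `plan_send`, `struct send_plan`, and the `driver_zc_ok` parameter are hypothetical names, and the `SKETCH_` constants are local stand-ins that mirror the values of lwIP's `TCP_WRITE_FLAG_COPY` and Linux's `MSG_ZEROCOPY` so the block compiles on its own. Requiring both the socket option and the per-call flag follows the Linux behaviour and is an assumption about how Nanos would handle it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring lwIP's TCP_WRITE_FLAG_COPY and Linux's
   MSG_ZEROCOPY, redefined so the sketch is self-contained. */
#define SKETCH_TCP_WRITE_FLAG_COPY 0x01
#define SKETCH_MSG_ZEROCOPY        0x4000000

/* Hypothetical result of the per-send decision. */
struct send_plan {
    uint8_t tcp_write_flags; /* apiflags that would be passed to tcp_write() */
    bool notify_completion;  /* post a completion to the socket error queue */
};

static struct send_plan plan_send(bool so_zerocopy, int msg_flags,
                                  bool driver_zc_ok)
{
    /* Default: copy user data into kernel buffers, no notification. */
    struct send_plan plan = { SKETCH_TCP_WRITE_FLAG_COPY, false };

    bool requested = so_zerocopy && (msg_flags & SKETCH_MSG_ZEROCOPY) != 0;
    if (!requested)
        return plan;

    /* Once zero-copy is requested, a completion notification is always
       owed, whether or not the copy is actually avoided. */
    plan.notify_completion = true;

    /* The copy is dropped only if the driver path can send from user
       memory; otherwise fall back to copying. */
    if (driver_zc_ok)
        plan.tcp_write_flags &= (uint8_t)~SKETCH_TCP_WRITE_FLAG_COPY;

    return plan;
}
```

Keeping the notification decision separate from the copy decision reflects the hint semantics: the application sees the same completion behaviour on either path.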
https://www.kernel.org/doc/html/v4.15/networking/msg_zerocopy.html
https://blogs.oracle.com/linux/zero-copy-networking-in-uek6