EXS AND ROEE FROM UNH-IOL - BASED IN PART UPON DR. ROBERT RUSSELL'S SLIDES PRESENTER: MIKKEL HAGEN
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
EXS and RoEE from UNH-IOL Based in part upon Dr. Robert Russell's slides Presenter: Mikkel Hagen www.openfabrics.org
Outline ¾InterOperability Laboratory (IOL) ¾Extended Sockets API (EXS-API) ¾EXS in user space ¾Mapping EXS onto RDMA ¾Performance ¾RDMA over Enhanced Ethernet (RoEE) www.openfabrics.org 2
UNH-IOL ¾Neutral 3rd party testing ¾Hosts the OFA-IWG interoperability cluster ~30 hosts, with targets and switches from many different vendors Both iWARP and IB ¾Maintains the OFW-IWG Logo List ¾Related testing through: iWARP, 10Gig, IB/10G Cable Testing and Data Center Bridging Consortia www.openfabrics.org 3
Extended Sockets (EXS-API) ¾Published by the Open Group ¾Extensions to “standard” sockets ¾Two major new features: Memory Registration for “zero-copy” I/O Event-based Asynchronous I/O ¾Useful for high-level access to RDMA www.openfabrics.org 4
IOL EXS Stack for OFA Application Program user space IOL EXS Library OFED user library OFED kernel modules kernel space IWARP / IB Driver iWARP RNIC / IB HCA hardware Cable www.openfabrics.org 5
IOL EXS ¾Runs entirely in user space ¾Utilizes OFED library to access RDMA ¾Requires no change to OFED or Linux ¾Extends Open Group ES-API specifications ES-API designed to run in kernel ES-API requires modifications to existing socket functions www.openfabrics.org 6
EXS Programming ¾Two major differences from “normal” sockets Memory registration • To lock memory regions for “zero-copy” I/O Event queues • To determine asynchronous I/O completion www.openfabrics.org 7
Memory Registration ¾exs_mregister() Registers a region of user virtual memory for “zero-copy” I/O ¾exs_mderegister() Unregisters previously registered memory region ¾Memory regions are used in I/O operations exs_send() exs_recv() www.openfabrics.org 8
Event Queues ¾exs_qcreate() Creates a new queue object ¾exs_qdelete() Deletes an existing queue object ¾exs_qdequeue() Blocks until event is posted to queue object ¾Event posted to a queue when an I/O completes www.openfabrics.org 9
Parameters to exs_send() ¾ Socket file descriptor – normal ¾ Buffer containing data to send – normal ¾ Number of bytes of data to send – normal ¾ Flags – normal ¾ Event queue for posting completion event – new ¾ User-defined request identification – new ¾ Registered memory region for data buffer - new www.openfabrics.org 10
Parameters to exs_recv() ¾ Socket file descriptor – normal ¾ Buffer into which data is received – normal ¾ Max number of bytes of data to read – normal ¾ Flags – normal ¾ Event queue for posting completion event – new ¾ User-defined request identification – new ¾ Registered memory region for data buffer - new www.openfabrics.org 11
Other EXS functions ¾ exs_init() ¾ exs_fcntl() - IOL Extension ¾ exs_socket() - IOL Extension ¾ exs_bind() - IOL Extension ¾ exs_listen() - IOL Extension ¾ exs_connect() ¾ exs_accept() ¾ exs_close() - IOL Extension www.openfabrics.org 12
Mapping IOL-EXS onto RDMA ¾ Transfer controlled by recipient of data ¾ exs_send() translates into RDMA “send” to advertise sender's data buffer to receiver ¾ exs_recv() sets up asynchronous wait for advertisement, then issues RDMA “read_request” to asynchronously transfer data directly from sender's memory into receiver's memory ¾ Asynchronous completion of transfer translates into short RDMA “send” of ACK back to sender www.openfabrics.org 13
Typical EXS Data Transfer user EXS OFED RNIC wire RNIC OFED EXS user exs_send RDMA Send exs_recv advertisement RDMA Read- Request RDMA Read- Response RDMA Send acknowledgment exs_qdequeue exs_qdequeue www.openfabrics.org 14
IOL EXS Internal Flow Control ¾Each side maintains local “send credit” value ¾Initial credits negotiated when connection established ¾exs_send() decrements local “send credit” by 1, blocks if none remaining ¾Receipt of ACK increments local “send credit” by 1, unblocks any waiting www.openfabrics.org 15
IOL-EXS Small Packets ¾Size limit configured by user with exs_fcntl() prior to connection establishment ¾Causes “small” amounts of data in exs_send() to travel as “immediate data” in the advertisement ¾Saves extra RDMA “read_request” / “read_response” exchange (hidden from user) ¾Costs extra data copy on receiver when data advertisement is matched with exs_recv() www.openfabrics.org 16
Initial Performance Measurements ¾Platform configuration 4 64-bit Intel 2.66 GHz processors 4 Gigabytes memory 1 NetEffect (Intel) 10 Gigabit/second RNIC ¾Test “blasting” data from one workstation to other ¾Metrics measured User-level throughput in Megabits/second CPU utilization as a percentage of 1 CPU www.openfabrics.org 17
EXS Throughput in Mbits/sec www.openfabrics.org 18
Bandwidth Utilization ¾ Standard Ethernet Frame 1538 bytes Ethernet gap, preamble, header and CRC 38 bytes IP and TCP headers 40 bytes MPA, DDP, RDMAP headers and MPA CRC 20 bytes ¾ Available user payload data 1440 bytes ¾ Percentage available for data 93.63% ¾ Percentage actually utilized 93.29% www.openfabrics.org 19
Percent CPU Utilization www.openfabrics.org 20
EXS Conclusions ¾Application programs using EXS can attain High bandwidth utilization Low CPU overhead ¾Application programs using EXS can tune their performance ¾Programmers don't need to learn verbs in order to make use of RDMA technology www.openfabrics.org 21
EXS Future Work ¾Compare various RNIC hardware ¾Compare standard and jumbo Ethernet frames ¾Compare EXS over IB and iWARP ¾Compare various applications ¾Look at multiple outstanding transmissions ¾Look at multiple simultaneous connections www.openfabrics.org 22
RoEE ¾With the introduction of Data Center Bridging standards into Ethernet, applications sensitive to loss can now be placed directly upon Ethernet such as Fibre Channel and RDMA ¾DCB has four new technologies being introduced: PFC, ETS, CN, DCBX www.openfabrics.org 23
Priority-based Flow Control (PFC) ¾Introduces a new MAC Control Frame that extends existing PAUSE mechanism ¾New frame has fields to define pause time for all eight defined priorities ¾This allows end nodes to inhibit only the traffic classes that are over utilizing their fair share of bandwidth www.openfabrics.org 24
Enhanced Transmission Selection (ETS) ¾ Adds another field, Priority Group ID (PGID), to frames ¾ Priorities are added to different groups and the bandwidth of the link is divided up between the groups. ¾ This allows network admins to provide minimum QoS bandwidth requirements to different traffic classes ¾ Rate limiters on the transmitter maintains the BW allocations www.openfabrics.org 25
Congestion Notification (CN) ¾Adds another field, FlowID, to frames ¾Provides end-to-end flow control ¾PFC causes back pressure congestion cascades that can pull in nodes and flows not directly involved in the congestion ¾CN allows devices to limit congestion pro- actively and hopefully prevent this phenomenon www.openfabrics.org 26
DCB Exchange (DCBX) ¾Allows end nodes to communicate configuration and even actively configure each other ¾Designed to provide a means of identifying config problems and limited means of fixing them www.openfabrics.org 27
DCB and UNH-IOL ¾ UNH-IOL is currently testing all of the above technologies in the new Data Center Bridging Consortium ¾ Future events are already planned with EA and FCIA to test DCB and FCoE (FC over DCB) ¾ Future events may also include Enterprise iSCSI (iSCSI over DCB) ¾ First test plan is complete: http://www.iol.unh.edu/services/testing/dcb/testsuites.php www.openfabrics.org 28
RoEE and UNH-IOL ¾ UNH-IOL has over 20 years of experience working with standards bodies including IEEE, IETF, ISO, ITU, etc. ¾ UNH-IOL has worked with RDMA technologies including iWARP, IB and OFA for several years – testing, documenting and developing applications ¾ UNH-IOL has worked on DCB from the beginning and will lead the way with testing and developing this new technology ¾ UNH-IOL is uniquely qualified with experience in all areas touching RoEE www.openfabrics.org 29
Contacts ¾Mikkel Hagen Research and Development Phone: 1 (603) 862 5083 Email: mhagen@iol.unh.edu ¾Bob Noseworthy Chief Engineer Phone: 1 (603) 862 0090 Email: ren@iol.unh.edu ¾Website: http://www.iol.unh.edu www.openfabrics.org 30
You can also read