Binary CAM Search v2.2 - LogiCORE IP Product Guide Vivado Design Suite - Xilinx
Table of Contents
Chapter 1: Introduction
  Features
  IP Facts
Chapter 2: Overview
  Navigating Content by Design Process
  Licensing and Ordering
Chapter 3: Product Specification
  Scheduling
  Lookup Operation
  Table Management
  Error Correction Coding (ECC)
  Performance
  Port Descriptions
  Register Address Space
Chapter 4: Designing with the Core
  General Design Guidelines
  Clocking
  Resets
Chapter 5: Design Flow Steps
  Customizing and Generating the Core
  Constraining the Core
  Simulation
  Synthesis and Implementation
Chapter 6: Example Design
  Simulating the Example Design
Chapter 7: Software Application Example Design
  Running the Software Application Example Design
Appendix A: Upgrading
Appendix B: Debugging
  Finding Help on Xilinx.com
  Debug Tools
  Simulation Debug
  Hardware Debug
  General Checks
Appendix C: Additional Resources and Legal Notices
  Xilinx Resources
  Documentation Navigator and Design Hubs
  References
  Training Resources
  Revision History
  Please Read: Important Legal Notices

PG317 (v2.2) July 27, 2021 www.xilinx.com Binary CAM Search IP
Chapter 1: Introduction

The Binary CAM Search LogiCORE IP (BCAM) implements an associative array data structure, also known as a content-addressable memory (CAM). The BCAM stores (key, response) entries with arbitrary key and response bit strings and retrieves the response when all bits of the search key exactly match all bits of a stored key.

Features
• Associative array containing arbitrary (key, response) entries.
• Exact-match key lookup returns a hit/miss result and, on a hit, the associated response value.
• High throughput: one lookup per clock cycle at up to 600 MHz.
  Note: Achievable clock frequencies depend on the device being used, the resources used by the CAM configuration, and the congestion in the device.
• Flexible: supports a wide range of key widths, response widths, and lookup rates with optimized resource utilization.
• Supports key widths up to 992 bits and response widths up to 1024 bits.
• Supports both UltraRAM (URAM) and block RAM implementations.
• Scalable: supports one or multiple BCAM instances, and each instance can use all block RAM/URAM within an SLR, allowing very large BCAMs.
• High storage efficiency: 95% of the RAM bits are transformed into CAM bits.
• Supports error correction coding (ECC). Single-bit errors are corrected dynamically during lookups, and permanently by patrol scrubbing.
• Supports the Vivado® IP integrator.
• Supports entry insert, delete, and update using standard TCAM-like software APIs.
• Can be inferred from within P4 code using the Vitis™ Networking P4 (VitisNetP4) tool.
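To make the exact-match semantics concrete, the following is a small behavioral reference model in C. It is only a sketch under stated assumptions: keys and responses are limited to 64 bits, entries are kept in a linear array, and the names bcam_model, model_insert and model_lookup are hypothetical. The hardware uses RAM-based algorithmic matching rather than a linear scan, and the delivered API is different. The model also reflects two rules described later in Lookup Operation: duplicate keys are rejected, and a miss returns a default response.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical behavioral model of BCAM exact-match semantics.
 * Keys and responses are limited to 64 bits for brevity. */
#define MODEL_CAPACITY 16

typedef struct {
    uint64_t key[MODEL_CAPACITY];
    uint64_t response[MODEL_CAPACITY];
    size_t count;
    uint64_t default_response; /* returned when no entry matches */
} bcam_model;

/* Insert a (key, response) entry; rejects duplicate keys and a full table. */
static bool model_insert(bcam_model *m, uint64_t key, uint64_t response) {
    for (size_t i = 0; i < m->count; i++)
        if (m->key[i] == key) return false; /* identical keys are not allowed */
    if (m->count == MODEL_CAPACITY) return false;
    m->key[m->count] = key;
    m->response[m->count] = response;
    m->count++;
    return true;
}

/* Exact match on all key bits: returns the match flag and writes the response
 * of the matching entry, or the default response on a miss. */
static bool model_lookup(const bcam_model *m, uint64_t key, uint64_t *response) {
    for (size_t i = 0; i < m->count; i++) {
        if (m->key[i] == key) {
            *response = m->response[i];
            return true;
        }
    }
    *response = m->default_response;
    return false;
}
```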
IP Facts

LogiCORE™ IP Facts Table

Core Specifics
Supported Device Family (1): UltraScale™, UltraScale+™, Zynq® UltraScale+™, Versal™ ACAP
Supported User Interfaces: AXI4-Stream and AXI4-Lite Interfaces
Resources: See the CAM Configuration Information section in the Main Tab topic.

Provided with Core
Design Files: Encrypted Verilog RTL
Example Design: Verilog
Test Bench: Verilog
Constraints File: Xilinx Design Constraints (XDC)
Simulation Model: Verilog source code
Supported S/W Driver (2): Standalone
Software Example Design Application: Standalone, Vivado® IP integrator

Tested Design Flows (3)
Design Entry (4): Standalone, VitisNetP4, Vivado® IP integrator
Simulation (5): For supported simulators, see the Xilinx Design Tools: Release Notes Guide.
Synthesis: Xilinx Vivado Synthesis

Support
Release Notes and Known Issues: Master Answer Record: 59718
All Vivado IP Change Logs: Master Vivado IP Change Logs: 72775
Xilinx Support web page

Notes:
1. For a complete list of supported devices, see the Vivado® IP catalog.
2. Standalone driver details can be found online.
3. For the supported versions of the tools, see the Xilinx Design Tools: Release Notes Guide.
4. The CAM IP is only supported in the Vivado IP Catalog running on a Linux operating system (not supported on Windows).
5. ModelSim, Questa, VCS, Xcelium, and Xsim are supported. Refer to Vivado Design Suite User Guide: Release Notes, Installation, and Licensing (UG973) for information on version compatibility.
Chapter 2: Overview

The Binary CAM Search IP core (BCAM) is a member of the family of CAMs provided by Xilinx®. The family consists of the following members:
• Binary CAM (BCAM): Described in this document. Used for exact matching. Entry storage is provided in UltraRAM or block RAM.
• Semi TCAM (STCAM): The STCAM is fully flexible in terms of the number, size, and position of wildcard (don't care) fields. Every key bit has a corresponding mask bit. However, the number of allowed unique masks is limited, which allows considerable memory and logic optimizations. See the Semi-Ternary CAM Search LogiCORE IP Product Guide (PG319).
• Ternary CAM (TCAM): The primary use of the TCAM is for tables requiring full flexibility in terms of the size and position of wildcard (don't care) fields. Every key bit has a corresponding mask bit stored together with the key, and all entries can have different masks. TCAMs are used for Access Control List (ACL) type lookups, which require a large number of different masks. See the Ternary CAM Search LogiCORE IP Product Guide (PG318).

One or multiple instances of each type can be used inside the same FPGA, and different types can be mixed inside the same FPGA. Each CAM type is optimized for its specific task in terms of hardware resource usage and can be flexibly configured using VitisNetP4 or the IP integrator.

The BCAM stores {key, response} entries in either URAM or block RAM. This makes efficient use of FPGA resources, in contrast with basic BCAM implementations that store the keys in flip-flops and use logic resources for parallel key comparison.

The Lookup interface of the BCAM receives a lookup key and outputs a result containing a match flag that indicates whether the lookup key matches the key of any entry in the BCAM. If a BCAM entry matches, the response value of the matching entry is output. The BCAM is pipelined so that it can process a Lookup Request every clock cycle.
The entries are read and written using a set of high-level Application Programming Interface (API) functions. The API functions are written in C and delivered as part of the IP. The API encapsulates the details of memory management and register access and provides a simple and efficient management interface. The API software, with detailed documentation, is available on the CAM IP product page.

The user provides the API with the functions for basic hardware reads and writes. This allows flexible hardware mapping: the communication link between the API software and the hardware is designed to the user's specifications and could, for instance, be AXI4-Lite or PCIe.
The BCAM design is highly configurable at compile time, making it suitable for a wide variety of applications. The following table lists the configuration parameters.

Table 1: Configuration Parameters (Parameter Name, Valid Range: Description)

KEY_WIDTH, 10-992 bits: The width of the lookup key. KEY_WIDTH + RESPONSE_WIDTH + 1 cannot exceed 1536/1024 [block RAM/URAM].

RESPONSE_WIDTH, 1-1024 bits: The width of the lookup response. KEY_WIDTH + RESPONSE_WIDTH + 1 cannot exceed 1536/1024 [block RAM/URAM].

NUM_ENTRIES, 1 - 1.25M: The number of usable entries (depth). To generate a BCAM with a certain memory depth (for example, 4K), specify 95% of the target: NUM_ENTRIES = 0.95 x 4096 = 3891.

MEMORY_PRIMITIVE, BLOCK or ULTRA or AUTO: The compiler automatically selects the best-suited type. This can be overridden as a user preference.

LOOKUP_RATE, 15-600 Mlps: The supported lookup rate of the instance, expressed in million lookups per second. To save resources, do not set the lookup rate higher than required.

LOOKUP_INTERFACE_FREQ, 15-600 MHz: The clock frequency of the Lookup Request and Lookup Response interfaces. LOOKUP_INTERFACE_FREQ >= LOOKUP_RATE.

RAM_FREQ, 15-600 MHz: The clock frequency of the memories and the internal datapath. An optional, high-frequency RAM clock enables time division of the hardware resources, leading to significant savings; see the TDM_FACTOR parameter. RAM_FREQ >= LOOKUP_INTERFACE_FREQ.

TDM_FACTOR, 1, 2, or 4: Calculated from the ratio RAM_FREQ / LOOKUP_RATE. The ratio is rounded down to the nearest power of two and capped based on NUM_ENTRIES. This is further described in Resource Time Sharing.

CLOCKING_MODE, SINGLE_CLOCK or DUAL_CLOCK: The use of a separate RAM clock is optional. If RAM_FREQ = LOOKUP_INTERFACE_FREQ, single clock mode is enabled. In single clock mode, only the lookup interface clock is used, for the lookup interfaces, the RAM, and the match logic.
When the BCAM is used from P4, all of these parameters are extracted from the P4 code by the VitisNetP4 tool during compilation. If the BCAM is used without P4, these parameters are set in the IP Generator before generating the BCAM hardware and the software BCAM API. VitisNetP4 ensures that the parameters used to generate the hardware BCAM and those used to create the software BCAM are synchronized. For standalone usage, the user must guarantee that the parameters used to generate the hardware BCAM and the parameters used to call the software BCAM are identical.

Related Information: Resource Time Sharing
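For standalone use, where keeping the hardware and software parameters consistent is the user's responsibility, a small self-check along the following lines may help. This C sketch applies the rules stated in Table 1 and the TDM rule (integer ratio, rounded down to a power of two, capped at 4). All function and type names are hypothetical and not part of the delivered API, and the additional NUM_ENTRIES-dependent cap on TDM_FACTOR (see Table 7) is deliberately not modeled.

```c
#include <stdbool.h>

typedef enum { MEM_BLOCK, MEM_ULTRA } mem_primitive;         /* hypothetical */
typedef enum { SINGLE_CLOCK, DUAL_CLOCK } clocking_mode_t;   /* hypothetical */

/* Table 1 width rules: KEY_WIDTH 10-992, RESPONSE_WIDTH 1-1024, and
 * KEY_WIDTH + RESPONSE_WIDTH + 1 (valid bit) <= 1536 (block RAM) or 1024 (URAM). */
static bool widths_ok(unsigned key_width, unsigned response_width, mem_primitive mem) {
    unsigned limit = (mem == MEM_BLOCK) ? 1536u : 1024u;
    if (key_width < 10u || key_width > 992u) return false;
    if (response_width < 1u || response_width > 1024u) return false;
    return key_width + response_width + 1u <= limit;
}

/* NUM_ENTRIES for a target memory depth: 95% of the addresses, rounded down. */
static unsigned num_entries_for_depth(unsigned depth) {
    return depth * 95u / 100u;
}

/* Clock rules: 15-600 range and RAM_FREQ >= LOOKUP_INTERFACE_FREQ >= LOOKUP_RATE. */
static bool clocks_ok(double lookup_rate, double lookup_if_freq, double ram_freq) {
    return lookup_rate >= 15.0 && ram_freq <= 600.0 &&
           lookup_if_freq >= lookup_rate && ram_freq >= lookup_if_freq;
}

/* TDM_FACTOR: integer ratio RAM_FREQ / LOOKUP_RATE, rounded down to a power
 * of two and capped at 4. The NUM_ENTRIES-dependent cap is NOT modeled. */
static unsigned tdm_factor(double ram_freq, double lookup_rate) {
    unsigned ratio = (unsigned)(ram_freq / lookup_rate);
    unsigned tdm = 1u;
    while (tdm < 4u && tdm * 2u <= ratio) tdm *= 2u;
    return tdm;
}

/* Single clock mode when the RAM clock equals the lookup interface clock. */
static clocking_mode_t clocking_mode(double lookup_if_freq, double ram_freq) {
    return (ram_freq == lookup_if_freq) ? SINGLE_CLOCK : DUAL_CLOCK;
}
```

Fed with the rows of Table 8 (lookup rate, lookup interface frequency, RAM clock frequency), tdm_factor and clocking_mode reproduce that table's TDM Factor and Clocking Mode columns; with the 100GbE parameters of Table 3 (RAM_FREQ = 600, LOOKUP_RATE = 148.8), tdm_factor returns 4.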
Navigating Content by Design Process

Xilinx® documentation is organized around a set of standard design processes to help you find relevant content for your current development task. All Versal™ ACAP design process Design Hubs can be found on the Xilinx.com website. This document covers the following design processes:
• Hardware, IP, and Platform Development: Creating the PL IP blocks for the hardware platform, creating PL kernels, functional simulation, and evaluating the Vivado® timing, resource use, and power closure. Also involves developing the hardware platform for system integration. Topics in this document that apply to this design process include:
  • Port Descriptions
  • Clocking
  • Customizing and Generating the Core

Licensing and Ordering

This Xilinx® LogiCORE™ IP module is provided under the terms of the Xilinx Core License Agreement. For full access to all core functionalities in simulation and in hardware, you must purchase a license for the core. To generate a full license, visit the product licensing web page. Evaluation licenses and hardware timeout licenses might be available for this core. Contact your local Xilinx sales representative for information about pricing and availability.

Note: To verify that you need a license, check the License column of the IP Catalog. Included means that a license is included with the Vivado® Design Suite; Purchase means that you have to purchase a license to use the core.

For more information about this core, visit the Binary CAM Search IP product web page. Information about other Xilinx® LogiCORE™ IP modules is available at the Xilinx Intellectual Property page. For information about pricing and availability of other Xilinx LogiCORE IP modules and tools, contact your local Xilinx sales representative. Licensing information is available on the CAM IP product page.

License Checkers

If the IP requires a license key, the key must be verified.
The Vivado® design tools have several license checkpoints for gating licensed IP through the flow. If the license check succeeds, the IP can continue generation. Otherwise, generation halts with an error. License checkpoints are enforced by the following tools:
• Vivado Synthesis
• Vivado Implementation
• write_bitstream (Tcl command)

IMPORTANT! IP license level is ignored at checkpoints. The test confirms that a valid license exists. It does not check the IP license level.
Chapter 3: Product Specification

The functional block diagram of the core is shown in the following figure.

Figure 1: Core Block Diagram (Lookup Request and Lookup Response interfaces (AXI4-Stream slave and master), Lookup Interface Frequency and RAM Frequency FIFOs, Strict Priority Scheduler, CAM Database (registers and memory), AXI4-Lite Frequency FIFOs, Management Request and Management Response interfaces (AXI4-Lite), API software)
The previous figure contains the following sub-blocks:
• CAM Database: Memory and registers for storage of rules, and logic for algorithmic lookup matching.
• AXI4-Stream Slave: Protocol handling, including flow control, for Lookup Requests.
• AXI4-Stream Master: Protocol handling, including flow control, for Lookup Responses.
• Lookup Request FIFO (optional): Transfers Lookup Requests from the Lookup Interface Frequency domain to the RAM Frequency domain.
• Lookup Response FIFO (optional): Transfers Lookup Responses from the RAM Frequency domain to the Lookup Interface Frequency domain.
• Management Request FIFO: Buffering queue for incoming Management Requests.
• Management Response FIFO: Buffering queue for Management Responses.
• Strict Priority Scheduler: Schedules Lookup and Management Requests.
• AXI4-Lite interface slave: Protocol handling for accepting read/write requests and generating responses. The AXI4-Lite interface uses 13 bits of address and 32 bits of data.

The following clock domains are depicted in the block diagram:
• AXI4-Stream Lookup Interface Frequency: The clock frequency of the Lookup Request/Response interfaces. Depending on the configuration parameters, a new Lookup Request can be processed every cycle, every second cycle, every fourth cycle, and so on.
• RAM Frequency: The clock frequency of the internal RAM and match logic.
• AXI4-Lite Frequency: The clock frequency of the AXI4-Lite bus.

The clocking of the CAM can be set to one of two modes:
• Single Clock Mode: If the Lookup Interface Frequency and RAM Frequency are equal, the RAM clock is not used and the Lookup Request/Response FIFOs are removed completely.
• Dual Clock Mode: If the RAM Frequency is higher than the Lookup Interface Frequency, shallow FIFOs are inserted to bridge the clock domains.

Scheduling

The Strict Priority Scheduler always selects a request from the Lookup Request FIFO first.
A request can only be scheduled from the Management Request FIFO when the Lookup Request FIFO is empty.
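The strict priority rule can be sketched as a per-slot arbitration function. This is a simplified model under stated assumptions: one grant per scheduling slot, FIFO occupancy tracked as plain counters, TDM timing and FIFO depths ignored, and the names sched_state and schedule are hypothetical. Slots that grant nothing are the idle cycles the ECC scrubber uses.

```c
/* Hypothetical model of the Strict Priority Scheduler: one grant per slot. */
typedef enum { GRANT_LOOKUP, GRANT_MGMT, GRANT_IDLE } grant;

typedef struct {
    unsigned lookup_pending; /* entries waiting in the Lookup Request FIFO */
    unsigned mgmt_pending;   /* entries waiting in the Management Request FIFO */
} sched_state;

/* Lookups always win; a Management Request is granted only when the Lookup
 * Request FIFO is empty; otherwise the slot is idle (usable for ECC scrubbing). */
static grant schedule(sched_state *s) {
    if (s->lookup_pending > 0) {
        s->lookup_pending--;
        return GRANT_LOOKUP;
    }
    if (s->mgmt_pending > 0) {
        s->mgmt_pending--;
        return GRANT_MGMT;
    }
    return GRANT_IDLE;
}
```

With two lookups and one Management Request queued, the model grants both lookups before the Management Request, and only then reports an idle slot.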
When the Management Request FIFO becomes full, the AXI4-Slave stops accepting new Management Requests to prevent Management Requests from being lost.

For every Lookup Request processed by the CAM database, a Lookup Response is generated and sent to the Lookup Response Interface. For a read Management Request, the response is sent to the Management Response Interface. Write Management Requests do not generate a Management Response. The Management Response Interface can buffer only one Management Response. The AXI4-Slave blocks the CPU read operation until the Management Response is available and has been sent on the AXI4-Lite interface.

Lookup Operation

The CAM Database stores the (key, response) entries and performs matching by processing a Lookup Request. If the lookup key matches the key of any entry, there is a match and the response of that entry is output. The API software ensures that two entries with identical keys cannot be inserted.

If the lookup key does not match the key of any entry, a no-match indication is output together with a default response. The default response entry is a special entry without a key part. No key part is needed because this entry always matches when no other entry matches.

Table Management

In addition to processing Lookup Requests, the CAM Database also processes Management Requests. The CAM Database is fully pipelined. If the TDM factor is one, a new Lookup Request is processed every RAM clock cycle. If the TDM factor is two, a Lookup Request takes two RAM clock cycles to complete, and so on. The CAM Database can process one Management Request in a single RAM clock cycle, independent of the TDM factor.

Error Correction Coding (ECC)

All memories in the CAM database are ECC protected. There are eight additional ECC bits for every address in a 64-bit wide memory. These eight bits are used only for ECC and cannot be used for storage.
This document always refers to these memories as 64 bits wide, even though 72 bits are used.
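Eight check bits per 64 data bits is exactly what a single-error-correcting, double-error-detecting (SECDED) extended Hamming code needs: seven Hamming parity bits cover 71 positions, and one overall parity bit distinguishes single from double errors. The C sketch below is a textbook (72,64) extended Hamming code, shown only to illustrate the correct/detect behavior described in this section. The bit layout used inside the IP is not documented here and may differ, and the names are hypothetical.

```c
#include <stdint.h>

#define CW_BITS 72 /* 64 data bits + 7 Hamming parity bits + 1 overall parity bit */

typedef enum { ECC_OK, ECC_CORRECTED, ECC_DOUBLE } ecc_status;

static int is_pow2(int x) { return x != 0 && (x & (x - 1)) == 0; }

/* Encode: data bits go to the non-power-of-two positions 3, 5, 6, 7, 9, ...;
 * parity bit p (1, 2, 4, ..., 64) makes the XOR over all positions with bit p set
 * equal to zero; cw[0] holds the overall parity of positions 1..71. */
static void secded_encode(uint64_t data, uint8_t cw[CW_BITS]) {
    int d = 0;
    for (int pos = 0; pos < CW_BITS; pos++)
        cw[pos] = (pos > 0 && !is_pow2(pos)) ? (uint8_t)((data >> d++) & 1u) : 0;
    for (int p = 1; p < CW_BITS; p <<= 1) {
        uint8_t parity = 0;
        for (int pos = p + 1; pos < CW_BITS; pos++)
            if (pos & p) parity ^= cw[pos];
        cw[p] = parity;
    }
    uint8_t overall = 0;
    for (int pos = 1; pos < CW_BITS; pos++) overall ^= cw[pos];
    cw[0] = overall;
}

/* Decode in place: the syndrome (XOR of the positions of all 1 bits) points at a
 * single flipped bit; the overall parity separates odd (single) from even (double)
 * error counts. On ECC_DOUBLE the data output is left untouched. */
static ecc_status secded_decode(uint8_t cw[CW_BITS], uint64_t *data) {
    int syndrome = 0;
    uint8_t overall = 0;
    for (int pos = 0; pos < CW_BITS; pos++) {
        if (cw[pos]) syndrome ^= pos;
        overall ^= cw[pos];
    }
    ecc_status status = ECC_OK;
    if (overall) {                                   /* odd flip count: treat as single-bit */
        if (syndrome >= CW_BITS) return ECC_DOUBLE;  /* not a valid position: uncorrectable */
        cw[syndrome] ^= 1u;                          /* syndrome 0 means cw[0] flipped */
        status = ECC_CORRECTED;
    } else if (syndrome != 0) {
        return ECC_DOUBLE;                           /* even flip count, nonzero syndrome */
    }
    uint64_t out = 0;
    int d = 0;
    for (int pos = 1; pos < CW_BITS; pos++)
        if (!is_pow2(pos)) out |= (uint64_t)cw[pos] << d++;
    *data = out;
    return status;
}
```

Correcting in place mirrors the scrubbing behavior described next, where a corrected word is written back; a lookup that only needs the corrected data could decode from a copy instead.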
A scrubbing mechanism starts at regular intervals (approximately every 1 ms) and reads every memory address of the CAM in the background using idle cycles. If a single-bit error is detected during scrubbing, the error is corrected permanently by writing the corrected data back to the memory.

Single-bit errors detected during lookup operations are corrected dynamically. If a double-bit error is detected during a lookup, the lookup returns no match. In general, if double-bit errors are detected during lookups, it is recommended to drop the packet that issued the lookup.

There are two statistics counters for ECC:
• Single-bit errors: This counter increments for errors detected and corrected during scrubbing.
• Double-bit errors: This counter increments for double-bit errors detected during scrubbing.

The address of the first failing RAM location is stored in a register for diagnostic purposes.

There are two status outputs related to ECC: single-bit error detected and double-bit error detected. The outputs are valid for one cycle (Lookup Interface clock domain) whenever the ECC scrubber detects an error, and can be used to trigger a CPU interrupt. When the CPU reads the statistics counters, the counters are cleared.

The software API provides a debug function to enable insertion of single-bit/double-bit errors during write operations. With error insertion enabled, subsequent insert/update/delete operations store data in memory with errors. Note that it can take up to 2 ms before the ECC scrubber detects the errors and they become observable on the status outputs and error counters.

Performance

Maximum Frequencies

The Binary CAM Search IP is designed to run at up to 600 MHz in UltraScale+™ -2 speed grade devices.

Latency

The BCAM lookup latency depends on the size of the BCAM, the RAM_FREQ / LOOKUP_RATE ratio, and the memory type. The lookup latency is constant; some examples are shown in the following table.
Table 2: Lookup Latency [Block RAM / UltraRAM]
Entries | RAM Clock 1x | RAM Clock 4x | RAM Clock 16x | RAM Clock 32x
256 | 16/17 | 13/14 | 9/9 | 8/8
1K | 16/17 | 14/14 | 9/9 | 8/8
4K | 18/17 | 14/14 | 9/9 | 8/8
16K | 21/18 | 14/14 | 9/9 | 8/8
Note: Latency values are measured in Lookup Interface Frequency cycles.
Note: KEY_WIDTH = 32, RESPONSE_WIDTH = 16, LOOKUP_RATE = LOOKUP_INTERFACE_FREQ.

Throughput

The lookup throughput corresponds to the LOOKUP_RATE parameter. The highest possible lookup throughput is achieved when LOOKUP_RATE equals the RAM_FREQ parameter; one Lookup Request can then be issued per RAM clock cycle.

Management Requests have strictly lower priority than Lookup Requests; consequently, the Management Request throughput is:

Management Request Rate = RAM_FREQ - LOOKUP_RATE * TDM_FACTOR

The ECC scrubbing process has the lowest priority. A memory read followed by a potential corrective write is only executed if both the Lookup Request and Management Request FIFOs are empty, so neither the lookup throughput nor the Management Request throughput is affected. ECC scrubbing of a new address is only initiated if both FIFOs are empty and any pending corrective write has been executed.

All read and write Management Requests are 32 bits wide. The only exception is write Management Requests of entry data, which might be wider, as described below. The Management Request width for entry data is essential for correctly dimensioning the lookup rate and RAM frequency so that there is throughput headroom for Management Requests.

To perform management updates while maintaining a correct state for lookups, the management operations need to be atomic. This means that a complete entry must be written to the CAM Database before the entry is made active (valid). To accomplish wide writes, an entry is written using multiple Management Requests, where the last Management Request sets the valid bit. When an existing entry is being updated, the valid bit is already set. This means that the response data must be written using a single Management Request.
For this reason, a Management Request writes at least (response + valid) bits of data, rounded up to the next 64-bit boundary.

For example, for a 160-bit key with 72 bits of response + valid, the following is written:
• Key, 160 bits
• Response + Valid, 72 bits

In total, 232 bits are written. The response + valid width of 72 bits is rounded up to 128 bits, so each Management Request writes 128 bits. Dividing 232 by 128 and rounding up gives 2 write Management Requests.

The AXI4-Lite bus uses 13 bits of address and 32 bits of data, so for every Management Request the API software issues multiple AXI4-Lite writes, which the AXI4-Slave assembles into a single Management Request. The AXI4-Lite interface is a standard AXI4-Lite interface; refer to the AXI4-Lite IPIF LogiCORE IP Product Guide (PG155).
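The throughput formula and the entry-write example can be checked with a few lines of arithmetic. In this C sketch the function names are hypothetical; rates are in MHz and Mlps, so the result is in millions of Management Requests per second. With the 100GbE parameters used in the following table (RAM_FREQ = 600, LOOKUP_RATE = 148.8, TDM_FACTOR = 4), the formula gives 4.8, which matches the update rate listed there for 64-bit Management Requests.

```c
/* Management Request Rate = RAM_FREQ - LOOKUP_RATE * TDM_FACTOR [M requests/s]. */
static double mgmt_request_rate(double ram_freq_mhz, double lookup_rate_mlps, unsigned tdm) {
    return ram_freq_mhz - lookup_rate_mlps * (double)tdm;
}

/* Width of one entry-data Management Request: (response + valid) rounded up
 * to the next 64-bit boundary. */
static unsigned mgmt_request_bits(unsigned response_width) {
    return ((response_width + 1u + 63u) / 64u) * 64u;
}

/* Number of write Management Requests for a complete entry (key + response + valid). */
static unsigned mgmt_requests_per_entry(unsigned key_width, unsigned response_width) {
    unsigned entry_bits = key_width + response_width + 1u;
    unsigned request_bits = mgmt_request_bits(response_width);
    return (entry_bits + request_bits - 1u) / request_bits; /* ceiling division */
}
```

For the example above (160-bit key, 71-bit response plus the valid bit), mgmt_request_bits(71) is 128 and mgmt_requests_per_entry(160, 71) is 2.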
The following table shows an example calculation for a 100Gb Ethernet rate. Keep in mind that the calculated update rate reflects only the hardware limit; the final update rate is most likely limited by the table management software.

Table 3: 100GbE Update Rate Example Calculation (hardware limit)
Management Request Size [bits] | AXI4-Lite Write Operations [min / max] | Management Update Rate [M updates/s]
64 | 1 / 3 | 4.8
128 | 2 / 5 | 2.4
256 | 4 / 9 | 1.2
512 | 8 / 17 | 0.6
1024 | 16 / 33 | 0.3

Note: Parameters used in this example: LOOKUP_RATE = 148.8, RAM_FREQ = 600, TDM_FACTOR = 4.
Note: The AXI4-Lite minimum values apply when the write data is constant (for example, initializing to zero). The maximum values apply for a general write data pattern.

Port Descriptions

System Interface Ports

Table 4: System Interface
Port Name | I/O | Clock | Description
rstn | I | - | Asynchronous reset (active-Low). The reset input is synchronized internally to both the ram_clk and key_clk domains.
rst_busy | O | s_axi_aclk | Reset Busy is an active-High indicator that the core is currently in reset.
ram_clk | I | - | The RAM clock used for the internal RAM and match logic.
sbiterr | O | key_clk | Single-bit error output status. A single-bit error has been detected and corrected by the ECC scrubbing mechanism.
dbiterr | O | key_clk | Double-bit error output status. A double-bit error has been detected by the ECC scrubbing mechanism.
debug_status[31:0] | O | - | Debug status port.
Lookup Interface Ports

Table 5: Lookup Interface
Port Name | I/O | Clock | Description
s_axis_lkup_tdata[S_LKUP_WIDTH-1:0] | I | key_clk | The Lookup Request key. tdata is padded with zeros to a whole number of bytes.
s_axis_lkup_tready | O | key_clk | tready indicates that the core is ready to accept Lookup Requests. The ratio of LOOKUP_INTERFACE_FREQ to LOOKUP_RATE determines how frequently new lookups can be started. If LOOKUP_INTERFACE_FREQ equals LOOKUP_RATE, lookups can be started every clock cycle without gaps.
s_axis_lkup_tvalid | I | key_clk | Indicates that the Lookup Request key is valid. A lookup is initiated when tready and tvalid are both high in the same clock cycle.
m_axis_lkup_tdata[M_LKUP_WIDTH-1:0] | O | key_clk | The Lookup Response, including the response value associated with the matching entry and the match flag (see Figure 3).
m_axis_lkup_tready | I | key_clk | tready for the m_axis_lkup channel. Tie high if the downstream slave is always able to accept data from the m_axis_lkup channel.
m_axis_lkup_tvalid | O | key_clk | tvalid for the m_axis_lkup channel.

Figure 2: s_axis_lkup_data (from MSB to bit 0: zero padding, then LookupKey of KEY_WIDTH bits starting at S_LKUP_KEY_POS; total width S_LKUP_WIDTH)

S_LKUP_KEY_POS = 0
S_LKUP_WIDTH = 8 x floor((KEY_WIDTH + 7) / 8)
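Because S_LKUP_KEY_POS is 0, the key's least significant bit sits on tdata bit 0 and any bits above KEY_WIDTH are zero padding. The C sketch below builds the tdata bytes for a Lookup Request under two assumptions: the key is supplied as a byte array in which key bit n is stored in key[n / 8], bit n % 8, and tdata byte i carries tdata[8i+7:8i] (the usual AXI4-Stream byte-lane order). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* S_LKUP_WIDTH = 8 x floor((KEY_WIDTH + 7) / 8): the key rounded up to whole bytes. */
static unsigned s_lkup_width(unsigned key_width) {
    return 8u * ((key_width + 7u) / 8u);
}

/* Pack a key into tdata bytes: the key starts at S_LKUP_KEY_POS = 0 and the
 * padding bits above KEY_WIDTH in the top byte are forced to zero. */
static void pack_lookup_key(const uint8_t *key, unsigned key_width, uint8_t *tdata) {
    size_t nbytes = s_lkup_width(key_width) / 8u;
    memcpy(tdata, key, nbytes);
    unsigned spare = (unsigned)(nbytes * 8u) - key_width; /* zero-padding bits */
    if (spare != 0u)
        tdata[nbytes - 1] &= (uint8_t)(0xFFu >> spare);
}
```

For a 10-bit key, S_LKUP_WIDTH is 16, and the six padding bits at the top of the second byte are cleared even if the caller's buffer held stray ones there.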
Figure 3: m_axis_lkup_data (from MSB to bit 0: Double-bit Error, zero padding, LookupKey of KEY_WIDTH bits starting at M_LKUP_KEY_POS, zero padding, MatchFlag of 1 bit at M_LKUP_FLAG_POS, zero padding, ResponseValue of RESPONSE_WIDTH bits starting at M_LKUP_RESP_POS; total width M_LKUP_WIDTH)

M_LKUP_RESP_POS = 0
M_LKUP_FLAG_POS = 8 x floor((RESPONSE_WIDTH + 7) / 8)
M_LKUP_KEY_POS = 8 x (1 + floor((RESPONSE_WIDTH + 7) / 8))
M_LKUP_WIDTH = 8 x (floor((KEY_WIDTH + 7) / 8) + 2 + floor((RESPONSE_WIDTH + 7) / 8))

The Double-bit Error field is a new response flag for lookup ECC double-bit errors, located after the MatchFlag.

Figure 4: Lookup Interface Timing Diagram

Register Address Space

You must provide a hardware write function and a hardware read function for the API software to call whenever it needs to access the hardware. The hardware write and read functions take an address, data, and a user-specified context as arguments. The data is 32 bits wide and the address is 13 bits wide. The context can be used to differentiate multiple instances, which allows the same functions to be used for all of them. The user of the API software must map the 13-bit address space used by each BCAM instance to a hardware base address in both the hardware write and read functions.

The BCAM is accessed through the API, and the register space is fully abstracted and therefore not listed. You do not need to access the registers in the IP directly; the driver supplied with the IP handles all hardware accesses.
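The base-address mapping described above can be sketched as a pair of callbacks that share a per-instance context. This is only an illustration under stated assumptions: the 13-bit address is treated as a byte offset, the system bus is simulated by an array of 32-bit words so the example runs anywhere (real code would use a volatile pointer to an AXI4-Lite or PCIe BAR window), and hw_context, hw_write and hw_read are hypothetical names. The exact callback prototypes expected by the delivered API are documented with the API software and may differ.

```c
#include <stdint.h>

#define BCAM_ADDR_BITS 13u
#define BCAM_ADDR_MASK ((1u << BCAM_ADDR_BITS) - 1u)

/* Hypothetical per-instance context: where this BCAM's 13-bit address space
 * sits in the system address map. The bus is simulated as 32-bit words. */
typedef struct {
    uint32_t *bus;      /* simulated system address space */
    uint32_t base_addr; /* byte base address of this BCAM instance */
} hw_context;

/* Hypothetical write callback: 13-bit BCAM address, 32-bit data, user context. */
static void hw_write(void *ctx, uint32_t addr, uint32_t data) {
    hw_context *c = (hw_context *)ctx;
    uint32_t sys_addr = c->base_addr + (addr & BCAM_ADDR_MASK);
    c->bus[sys_addr / 4u] = data;
}

/* Hypothetical read callback using the same base-address mapping. */
static uint32_t hw_read(void *ctx, uint32_t addr) {
    hw_context *c = (hw_context *)ctx;
    uint32_t sys_addr = c->base_addr + (addr & BCAM_ADDR_MASK);
    return c->bus[sys_addr / 4u];
}
```

Two instances placed at base addresses 0x0000 and 0x2000 can then share the same functions and differ only in the context pointer passed to them.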
Chapter 4: Designing with the Core

This chapter includes guidelines and additional information to facilitate designing with the core.

General Design Guidelines

Implementation in Block RAM vs. UltraRAM

The BCAM requires at least four RAM read accesses regardless of the BCAM size. For small BCAMs, it is therefore beneficial to use block RAM. Each RAM has associated logic; therefore, as the BCAM size increases, fewer logic resources are used if the total number of RAMs is kept low. This can be accomplished by using deeper RAMs such as UltraRAM (URAM), or depth-cascaded RAMs. The API software calculates and automatically selects the optimum solution in terms of hardware resources.

In addition to automatic optimization, the user can also direct the configuration to minimize either logic resources or RAM resources. If the target device has more spare capacity for a certain RAM type, the MEMORY_PRIMITIVE parameter can be forced to this RAM type even if it is less efficient, because this provides better overall device utilization.

Quantification Loss

When an instance is generated, the total width of an entry is calculated:

Entry Width = KEY_WIDTH + RESPONSE_WIDTH + Valid

The valid bit adds one extra bit to the entry width. The total entry width is then mapped to the number of block RAMs or URAMs necessary to read the entire entry in parallel. Block RAM and URAM are allocated in data width increments of 64 bits. To minimize quantification losses, the entry width should be just below or exactly on a 64-bit boundary. For example, if the total entry width is 308 bits, five 64-bit increments are needed and the quantification loss is 12 bits per entry:

5*64 - 308 = 12
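The width part of the quantification loss is a round-up to the next multiple of 64 bits. The following C sketch (function names hypothetical) reproduces the 308-bit example: the entry occupies 320 bits of RAM width, leaving 12 bits unused per entry.

```c
/* Entry Width = KEY_WIDTH + RESPONSE_WIDTH + 1 (valid bit). */
static unsigned entry_width(unsigned key_width, unsigned response_width) {
    return key_width + response_width + 1u;
}

/* RAM data width actually allocated: the entry rounded up to 64-bit increments. */
static unsigned ram_width_64(unsigned width) {
    return ((width + 63u) / 64u) * 64u;
}

/* Unused bits per entry caused by the 64-bit allocation granularity. */
static unsigned width_loss_bits(unsigned width) {
    return ram_width_64(width) - width;
}
```

An entry that already lands on a 64-bit boundary, such as 320 bits, has no width loss.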
Quantification loss occurs not only when the entry width does not align to a 64-bit boundary, but also when NUM_ENTRIES does not match the RAM address space exactly. For example, consider TDM_FACTOR = 1, MEMORY_PRIMITIVE = BLOCK, NUM_ENTRIES = 1500. This BCAM requires four block RAMs, which provide 2048 × 0.95 ≈ 1945 usable entries:
Loss = (1945 − 1500) / 1945 ≈ 23%
The maximum supported entry widths are listed in the following table.
Table 6: Maximum Supported Entry Widths
DEPTH | MEMORY_PRIMITIVE | ENTRY_WIDTH (bits)
512 | BLOCK | 1536
1024 | BLOCK | 768 (1)
2048 | BLOCK | 384 (1)
2048 | ULTRA | 2048
4096 | ULTRA | 1024
8192 | ULTRA | 512
16384 | ULTRA | 256 (1)
Notes:
1. Not available for RAM_FREQ over 400 MHz.
Storage Efficiency
Up to 95% of the RAM addresses can be used for storage.
Resource Time Sharing
The CAM database is designed to run at a clock frequency of up to 600 MHz and to process one Lookup Request every clock cycle. A search operation requires a minimum of four RAM accesses. The RAM accesses can be performed in parallel using multiple RAM instances, sequentially by performing multiple accesses to the same RAM, or by a combination of both. For example, if the RAM frequency is 600 MHz and the lookup rate is 300 million lookups per second, two accesses per lookup can be performed in the same RAM.
The possible amount of RAM time sharing is calculated using integer division:
TDM_FACTOR = RAM_FREQ / LOOKUP_RATE
The TDM_FACTOR is rounded down to the nearest power of two and capped at four, so the allowed values are 1, 2, and 4. The CAM depth also determines how much time sharing is necessary, as detailed in the following table.
Table 7: TDM_FACTOR Calculations
NUM_ENTRIES [BRAM/URAM] | LOOKUP_RATE [Mlps] | RAM_CLOCK [MHz] | TDM_FACTOR
973/7783 | Up to 150 | Minimum 4 x LOOKUP_RATE | 4
2919/23348 | Up to 300 | Minimum 2 x LOOKUP_RATE | 2
Up to 1.25M | Up to 600 | LOOKUP_IF_FREQ | 1
• Lookup rates above 300 Mlps: Time sharing of the RAM is not possible because it would require a RAM_FREQ higher than the maximum supported 600 MHz.
• Lookup rates up to 300 Mlps and NUM_ENTRIES up to 2919/23348 [BRAM/URAM]: Performing sequential accesses in the same RAM (TDM_FACTOR 2 or 4) saves hardware logic resources.
• NUM_ENTRIES larger than 2919/23348 [BRAM/URAM]: Time sharing is not required to provide the minimum of four RAM accesses.
Table 8: Clock Configuration Examples
Ethernet Speed | Lookup Rate [Mlps] | Lookup Interface Frequency [MHz] | RAM Clock Frequency [MHz] | Clocking Mode | TDM Factor
10G | 15 | 15 | 15 | SINGLE_CLOCK | 1
10G | 15 | 30 | 30 | SINGLE_CLOCK | 2
10G | 15 | 30 | 60 | DUAL_CLOCK | 4
100G | 150 | 150 | 150 | SINGLE_CLOCK | 1
100G | 150 | 300 | 300 | SINGLE_CLOCK | 2
100G | 150 | 300 | 600 | DUAL_CLOCK | 4
400G | 600 | 600 | 600 | SINGLE_CLOCK | 1
Registering Signals
To simplify timing and increase system performance in a programmable device design, keep all inputs and outputs registered between the user application and the core. This means that all inputs and outputs from the user application should come from, or connect to, a flip-flop. While registering signals might not be possible for all paths, it simplifies timing analysis and makes it easier for the Xilinx® tools to place and route the design.
Make Only Allowed Modifications
You should not modify the core. Any modifications can have adverse effects on system timing and protocol compliance. Supported user configurations of the core can only be made by selecting options in the IP customization dialog box when the core is generated.
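The sizing rules from this chapter can be checked with a short Python sketch. It assumes the rules are applied exactly as stated: up to 95% of RAM addresses hold entries, TDM_FACTOR comes from integer division of RAM_FREQ by LOOKUP_RATE rounded down to a power of two and capped at 4, single clock mode means equal interface and RAM frequencies, and the RAM clock is limited to 600 MHz. All function names are hypothetical helpers, not part of the CAM API; the API software and the Vivado CAM compiler make the real choices. The sketch reproduces the 23% depth-loss example and checks every Table 8 row against these rules.

```python
from typing import NamedTuple

MAX_RAM_FREQ_MHZ = 600    # maximum supported RAM clock
USABLE_PERCENT = 95       # up to 95% of RAM addresses can hold entries


def usable_entries(ram_addresses: int) -> int:
    """Usable BCAM entries for a RAM address space (rounded down)."""
    return ram_addresses * USABLE_PERCENT // 100


def depth_loss(num_entries: int, ram_addresses: int) -> float:
    """Fraction of usable entries left unused for the requested NUM_ENTRIES."""
    usable = usable_entries(ram_addresses)
    return (usable - num_entries) / usable


def tdm_factor(ram_freq: float, lookup_rate: float) -> int:
    """RAM_FREQ / LOOKUP_RATE (integer division), rounded down to a power of two, capped at 4."""
    ratio = int(ram_freq // lookup_rate)
    if ratio < 1:
        raise ValueError("RAM_FREQ must be at least LOOKUP_RATE")
    power_of_two = 1 << (ratio.bit_length() - 1)
    return min(power_of_two, 4)


class ClockExample(NamedTuple):
    ethernet: str
    lookup_rate: float      # Mlps
    lookup_if_freq: float   # MHz
    ram_freq: float         # MHz
    clocking_mode: str
    tdm: int


# Rows transcribed from Table 8.
TABLE_8 = [
    ClockExample("10G", 15, 15, 15, "SINGLE_CLOCK", 1),
    ClockExample("10G", 15, 30, 30, "SINGLE_CLOCK", 2),
    ClockExample("10G", 15, 30, 60, "DUAL_CLOCK", 4),
    ClockExample("100G", 150, 150, 150, "SINGLE_CLOCK", 1),
    ClockExample("100G", 150, 300, 300, "SINGLE_CLOCK", 2),
    ClockExample("100G", 150, 300, 600, "DUAL_CLOCK", 4),
    ClockExample("400G", 600, 600, 600, "SINGLE_CLOCK", 1),
]


def is_consistent(row: ClockExample) -> bool:
    """Check the RAM clock limit, the clocking mode, and the TDM factor of one row."""
    mode = "SINGLE_CLOCK" if row.lookup_if_freq == row.ram_freq else "DUAL_CLOCK"
    return (
        row.ram_freq <= MAX_RAM_FREQ_MHZ
        and mode == row.clocking_mode
        and tdm_factor(row.ram_freq, row.lookup_rate) == row.tdm
    )


if __name__ == "__main__":
    print(usable_entries(2048))                     # 1945
    print(round(depth_loss(1500, 2048) * 100))      # 23
    print(tdm_factor(600, 300))                     # 2
    print(all(is_consistent(r) for r in TABLE_8))   # True
```

Note that a ratio of 3 (for example, 600 MHz RAM for a 200 Mlps lookup rate) still yields a TDM_FACTOR of 2, because only powers of two are allowed.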
Update Performance
The API software is single-threaded and uses one CPU core for its processing. Changes to the CAM database are made using the insert, update, or delete functions. The processing required for these operations increases with the fill level of the CAM database and with the CAM width, but does not increase with the CAM depth. The insert function is twice as fast as the update and delete functions.
Configuration Example
The configuration listed below achieves 200k inserts per second.
• Key size = 32, response size = 16
• CAM database = 100,000 keys
• CAM database is empty and is then filled with 100,000 keys
• CPU: Intel Core i7-7600U at 2.80 GHz, 4 MB 16-way set-associative shared cache
Clocking
Table 9: Clocks
Clock | Description
s_axi_aclk | The AXI clock is used for table management. The AXI management interface is fully asynchronous to the lookup interface.
key_clk | AXI4-Stream clock for the Lookup Request/Response interfaces.
ram_clk | The ram_clk is optional. It provides an option to clock the internal RAM and match logic on a separate high-frequency clock. In most cases this saves logic and memory resources. Guideline: key_clk 300 MHz, ram_clk is not used
Related Information: Performance
Single Clock Mode
Single clock mode is activated by specifying the same value for both LOOKUP_INTERFACE_FREQ and RAM_FREQ. The ram_clk port is not used in single clock mode; the Lookup interfaces, internal RAM, and match logic are all clocked on key_clk.
Note: Single clock mode is recommended for lookup rates above 300 Mlps.
Dual Clock Mode
In dual clock mode, the internal RAM and match logic are clocked on a separate high-frequency clock, ram_clk. This allows a higher TDM_FACTOR to be used without increasing the frequency of the Lookup interface.
Note: Both ram_clk and key_clk must be derived from the same PLL to avoid clock drift.
TDM Factor Settings for Clock Modes
The following table shows the difference in LOOKUP_INTERFACE_FREQ between SINGLE_CLOCK and DUAL_CLOCK modes.
Table 10: TDM Factor Settings for Clock Modes
TDM Factor | LOOKUP RATE [Mlps] | LOOKUP INTERFACE FREQ [MHz] | RAM CLOCK [MHz] | CLOCKING MODE
4 | Up to 150 | Minimum 4 x LOOKUP_RATE | LOOKUP_IF_FREQ | SINGLE CLOCK
4 | Up to 150 | Minimum LOOKUP_RATE | Minimum 4 x LOOKUP_RATE | DUAL CLOCK
2 | Up to 300 | Minimum 2 x LOOKUP_RATE | LOOKUP_IF_FREQ | SINGLE CLOCK
2 | Up to 300 | Minimum 2 x LOOKUP_RATE | Minimum 2 x LOOKUP_RATE | DUAL CLOCK
1 | Up to 600 | LOOKUP_IF_FREQ | LOOKUP_IF_FREQ | SINGLE CLOCK
Note: Dual clock mode is recommended for lookup rates up to 300 Mlps.
Resets
At startup, the AXI reset and the main reset must both be asserted at the same time for at least four cycles of the slower of the two clocks (s_axi_aclk and key_clk). As long as this assertion time is met, either reset can be asserted or deasserted first. The core is not ready for use until the reset phase has finished, as indicated by the rst_busy output. The rst_busy output stays high for approximately 30 cycles of the slowest clock.
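The reset timing above can be converted to wall-clock time with a small Python sketch. It assumes that both the four-cycle assertion window and the approximately 30-cycle rst_busy period are measured on the slower of s_axi_aclk and key_clk. The 100 MHz and 300 MHz frequencies and all function names are hypothetical, chosen only for illustration.

```python
RESET_ASSERT_CYCLES = 4      # minimum overlap of AXI reset and main reset
RST_BUSY_CYCLES_APPROX = 30  # approximate time rst_busy stays high


def slower_clock_period_ns(s_axi_aclk_mhz: float, key_clk_mhz: float) -> float:
    """Period of the slower of s_axi_aclk and key_clk, in nanoseconds."""
    return 1000.0 / min(s_axi_aclk_mhz, key_clk_mhz)


def min_reset_assertion_ns(s_axi_aclk_mhz: float, key_clk_mhz: float) -> float:
    """Minimum time both resets must be asserted together."""
    return RESET_ASSERT_CYCLES * slower_clock_period_ns(s_axi_aclk_mhz, key_clk_mhz)


def approx_rst_busy_ns(s_axi_aclk_mhz: float, key_clk_mhz: float) -> float:
    """Rough estimate of how long rst_busy stays high after reset."""
    return RST_BUSY_CYCLES_APPROX * slower_clock_period_ns(s_axi_aclk_mhz, key_clk_mhz)


if __name__ == "__main__":
    # Hypothetical clocks: 100 MHz AXI management clock, 300 MHz lookup clock.
    print(min_reset_assertion_ns(100, 300))  # 40.0
    print(approx_rst_busy_ns(100, 300))      # 300.0
```

Because the rst_busy duration is only approximate, a testbench or controller should still wait for rst_busy to go low rather than rely on the estimate.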
Chapter 5: Design Flow Steps
This chapter describes customizing and generating the core, constraining the core, and the simulation, synthesis, and implementation steps that are specific to this IP core. More detailed information about the standard Vivado® design flows and the IP integrator can be found in the following Vivado Design Suite user guides:
• Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
• Vivado Design Suite User Guide: Designing with IP (UG896)
• Vivado Design Suite User Guide: Getting Started (UG910)
• Vivado Design Suite User Guide: Logic Simulation (UG900)
Customizing and Generating the Core
This section includes information about using Xilinx® tools to customize and generate the core in the Vivado Design Suite.
The following instructions describe how to deploy the CAMs using the standalone CAM IP. The CAMs can also be deployed as part of VitisNetP4 systems; for instructions, see the Vitis Networking P4 User Guide (UG1308) (registration required).
CAM IP version 2.2 (cam_v2_2) is released as part of Vivado 2021.1. The CAM IP can be instantiated in Vivado in two ways: using the IP integrator or using the IP Catalog. CAM IP versions must be used with the matching version of the Vivado tool; for example, CAM IP v2.2 must be paired with Vivado 2021.1.
IP Integrator
If you are customizing and generating the core in the Vivado IP integrator, see the Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994) for detailed information. IP integrator auto-computes certain configuration values when validating or generating the design. To check whether a value changes, see the description of the parameter in this chapter. To view the parameter value, run the validate_bd_design command in the Tcl Console.
To instantiate the CAM IP in Vivado Design Suite using IP integrator, follow these steps:
1. Open a project in the Vivado tool.
2. Click Create Block Design in the panel on the left.
3. Right-click in the Diagram window and select Add IP.
4. Search for "cam" in the search box of the window that appears and select CAM IP.
5. The CAM IP can be configured by double-clicking the instantiated IP.
IP Catalog
You can customize the IP for use in your design by specifying values for the various parameters associated with the IP core using the following steps:
1. Select the IP from the IP catalog.
2. Double-click the selected IP or select the Customize IP command from the toolbar or right-click menu.
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) and the Vivado Design Suite User Guide: Getting Started (UG910).
To instantiate the CAM IP using the IP Catalog, follow these steps:
1. Open a project in Vivado.
2. Click IP Catalog in the panel on the left. Search for "cam" in the search box of the IP Catalog window and select CAM IP.
3. The CAM IP can be configured by double-clicking the instantiated IP.
The CAM IP core in the Vivado Integrated Design Environment (IDE) has several fields used to set parameter values for the particular instantiation required. The following section provides descriptions for each field.
• Component Name: The name of the core component to be instantiated. The name must begin with a letter and be composed of the following characters: a to z, A to Z, 0 to 9, and '_'.
Main Tab
• Lookup Mode: Options available in this pull-down menu are BCAM, STCAM, and TCAM. The available configuration parameters change depending on the option selected.
  • BCAM: Selects the Binary Content Addressable Memory. Figure 5: BCAM Configuration Parameters shows the configuration parameters when BCAM is selected. BCAM is described in this document.
  • STCAM: Selects the Semi-Ternary Content Addressable Memory. Figure 6: STCAM Configuration Parameters shows the configuration parameters when STCAM is selected. Refer to the Semi-Ternary CAM Search LogiCORE IP Product Guide (PG319).
  • TCAM: Selects the Ternary Content Addressable Memory. Figure 7: TCAM Configuration Parameters shows the configuration parameters when TCAM is selected. Refer to the Ternary CAM Search LogiCORE IP Product Guide (PG318).
Figure 5: BCAM Configuration Parameters
Figure 6: STCAM Configuration Parameters
Figure 7: TCAM Configuration Parameters
• MEMORY_PRIMITIVE: Options available in this pull-down menu are AUTO, BLOCK, and ULTRA.
  • AUTO: The Vivado CAM compiler automatically selects the best-suited memory type, either block RAM or URAM. The resulting block RAM/URAM resource usage is shown in the CAM Configuration Information section of the GUI.
  • BLOCK: Specifies block RAM as the memory primitive. The block RAM resource usage is shown in the CAM Configuration Information section of the GUI.
  • ULTRA: Specifies URAM as the memory primitive. The URAM resource usage is shown in the CAM Configuration Information section of the GUI.
• NUM_ENTRIES: The supported number of entries (depth). The number of entries is limited by the number of RAM instances used:
  • RAM_FREQ
  • RAM_FREQ > 480 MHz: 320 BRAM or 160 URAM
  Note: (BCAM only) NUM_ENTRIES represents the number of usable entries. When targeting a certain memory depth for a BCAM, specify 95% of the target. For example, to get a BCAM with a memory depth of 4K, specify NUM_ENTRIES = 0.95 x 4096 ≈ 3891.
• NUM_MASKS: The number of unique masks. The Vivado CAM compiler generates an STCAM that supports both the specified number of unique masks and the specified number of entries at the same time.
  Note: This parameter is only available for STCAM.
• KEY_WIDTH: The width of the lookup key.
  Note: This parameter is only available for BCAM and STCAM.
• FORMAT_STRING:
  Note: This parameter is only available for TCAM.
  The format string describes the format of all keys and masks. A key consists of several fields, and the format string specifies the location and size of the fields. In the format string, fields are separated by a colon (':'). The format string is read left to right, and the first field corresponds to the least significant bit of the key (the key is specified as little-endian). Each field is specified with a field length and a field type. There are no alignment restrictions for the fields; the location, type, number, and size of fields are fully flexible. The following field types and lengths are supported:
  • b - bit field: The field mask is either all zeroes or all ones; field_len = 1..128 bits, complexity = 2
  • t - ternary field: The field mask can have any value; field_len = 2..16 bits, complexity = 2^field_len
  • c - constant bit field: The mask is all ones; field_len = 1..128 bits, complexity = 1
  • u - unused bit field: The mask is all zeroes; field_len = 1..128 bits, complexity = 1
  • p - prefix field: The mask has a pattern of consecutive zeroes and ones; field_len = 1..128 bits, complexity = field_len + 1
  • r - range field: Matching is performed by logic comparators against start/end values.
• RESPONSE_WIDTH: The width of the lookup response.
• DEFAULT_RESPONSE_VALUE: The response value returned when a lookup has no match, specified in hexadecimal. The valid range is 1 to 1024 bits. The value is zero by default.
• TEST_FEATURE: This is a test feature and should be left at the default value of zero.
  Note: This parameter is only available for BCAM.
• OPTIMIZATION: Options available in this pull-down menu are AUTO, RAM, and LOGIC. Logic usage is proportional to the number of Physical Memory Units. The RAM usage and the number of Memory Units are shown in the CAM Configuration Information section of the GUI:
  • AUTO: The Vivado CAM generator balances the usage of logic and RAM resources.
  • RAM: Use this option to reduce RAM usage at a potentially higher logic usage.
  • LOGIC: Use this option to reduce logic usage at a potentially higher RAM usage.
• OPTIMIZE_ENTRIES: When this check box is selected, the available memory is used to store entries beyond the specified NUM_ENTRIES limit. A side effect for STCAM is that the specified NUM_MASKS value is no longer guaranteed; it becomes an 'up to' value instead. Refer to the BRAM36/URAM Utilization row in the Figure 8: CAM Configuration Information table for the resulting values based on this selection.
  Note: This parameter is only available for BCAM and STCAM.
• OPTIMIZE_MASKS: When this check box is selected, the available memory is used to store more masks than the specified NUM_MASKS limit. A side effect is that the specified NUM_ENTRIES value is no longer guaranteed; it becomes an 'up to' value instead. Refer to the BRAM36/URAM Utilization and Number of Masks rows in the Figure 8: CAM Configuration Information table for the resulting values based on this selection.
  Note: This parameter is only available for STCAM.
• LOGICAL_MEM_UNITS: Forces the number of logical memory units; the value is rounded up to a multiple of the TDM_FACTOR. This is an advanced feature, and using the AUTO selection is recommended.
  • AUTO: When this check box is selected, the Vivado CAM compiler automatically calculates the appropriate value for LOGICAL_MEM_UNITS. You can override this by deselecting the check box and specifying the appropriate value.
• PRIORITY_WIDTH:
  Note: This parameter is only available for STCAM and TCAM.
  The priority is usually defined wide enough to support one unique priority value per entry. The width can be larger to make STCAM and TCAM management easier, or narrower if entries are order-independent and guaranteed not to overlap.
  • AUTO: When this check box is selected, the Vivado CAM compiler automatically calculates the appropriate value for PRIORITY_WIDTH based on NUM_ENTRIES for TCAM and NUM_MASKS for STCAM. You can override this by deselecting the check box and specifying the appropriate value. For STCAM, PRIORITY_WIDTH is calculated to accommodate NUM_MASKS different priorities; for TCAM, it is calculated to accommodate NUM_ENTRIES different priorities.
• LOOKUP_RATE: The supported lookup rate of the CAM instance, in million lookups per second. To save resources, do not set the lookup rate higher than required.
• CLOCKING_MODE: Options available in this pull-down menu are SINGLE CLOCK and DUAL CLOCK. When SINGLE CLOCK is selected, the RAM_FREQ configuration parameter is not available.
• LOOKUP_INTERFACE_FREQ: The clock frequency for the Lookup Request and Response interfaces, in MHz.
• RAM_FREQ: The clock frequency for the memories (block RAM/URAM) and the internal datapath, in MHz. Specifying a higher-frequency RAM clock enables time division of the hardware resources, leading to significant resource savings. This configuration parameter is only available when CLOCKING_MODE = DUAL CLOCK is selected.
  Note: When CLOCKING_MODE = SINGLE CLOCK, RAM_FREQ = LOOKUP_INTERFACE_FREQ.
• CAM Configuration Information: This table provides interactive information based on the configuration parameter values entered, giving a quick way to converge on the CAM settings required for the design. The information provided is shown in the following figure.
Figure 8: CAM Configuration Information
  • Memory Depth: The depth of the resulting memory based on the value of the NUM_ENTRIES configuration parameter.
  • Memory Width: The width of the resulting memory based on the configuration parameter values entered. The contribution of each individual component to the width (the components vary depending on the Lookup Mode selected) is also provided. For example, a non-zero unused component indicates that the other component sizes can be increased with no additional memory resource penalty.
  • BRAM36/URAM Utilization: The memory utilization efficiency.
  • BRAM36 Usage: The number of BRAM36s used to implement the required memory size. A value of 0 indicates that no BRAM36s are used.
  • URAM Usage: The number of resulting URAMs used to implement the required memory size. A value of 0 indicates no URAMs are used.
  • Memory Units: Each Physical Memory Unit requires logic resources. The number of Physical Memory Units can be used to compare the relative logic resource cost of different configurations.
  • Lookup Latency: The lookup latency, in LOOKUP_INTERFACE_FREQ clock cycles.
  • Lookup Interface Frequency: The LOOKUP_INTERFACE_FREQ frequency and the number of lookups per LOOKUP_INTERFACE_FREQ clock cycle.
  • RAM Clock Frequency: The RAM_FREQ frequency (when CLOCKING_MODE = SINGLE CLOCK, RAM_FREQ = LOOKUP_INTERFACE_FREQ), and how the memory bandwidth is split between Lookup Requests and Management Requests.
  • TDM Factor: The time division multiplexing of hardware resources, expressed as the number of memory accesses per Lookup Request. When the value is indicated as "capped", a lower RAM_FREQ can be chosen without additional memory resource penalty.
  • Number of Masks: The number of masks used (STCAM only).
• Debug Flags: The debug flags can be enabled or disabled by selecting the tick-boxes shown in the following figure, which enables or disables the respective debug functions. The debug functions are controlled by the DEBUG_FLAGS parameter, a 32-bit integer whose value is also shown in grey as "Debug Flags Value". Details of each debug function can be found in the Debugging section.
Figure 9: Debug Flags
Note: During HDL simulation, the DEBUG_FLAGS values can be modified dynamically as required.
Related Information: Debugging
User Parameters
The following table shows the relationship between the fields in the Vivado IDE and the User Parameters (which can be viewed in the Tcl Console).
Table 11: Vivado IDE Parameter to User Parameter Relationship
Vivado IDE Parameter | User Parameter | Default Value
Lookup Mode | MODE | BCAM
MEMORY_PRIMITIVE | MEM_TYPE | AUTO
NUM_ENTRIES | NUM_ENTRIES | 256
NUM_MASKS | NUM_MASKS | 2
KEY_WIDTH | KEY_WIDTH | 104
FORMAT_STRING | FORMAT_STRING | 32p:32p:16r:16r:8b
RESPONSE_WIDTH | RESP_WIDTH | 16
PRIORITY_WIDTH | PRIO_WIDTH | 8
DEFAULT_RESPONSE_VALUE | DEFAULT_RESPONSE | 0x0000
AUTO | PRIO_WIDTH_AUTO_SEL | true
LOOKUP_RATE | LOOKUP_RATE | 15.0
CLOCKING_MODE | CLOCKING_MODE | DUAL CLOCK
LOOKUP_INTERFACE_FREQ | LOOKUP_INTERFACE_FREQ | 15.0
RAM_FREQ | RAM_FREQ | 480.0
OPTIMIZATION | OPTIMIZATION | AUTO
OPTIMIZATION_ENTRIES | OPTIMIZATION_ENTRIES | false
OPTIMIZATION_MASKS | OPTIMIZATION_MASKS | false
LOGICAL_MEM_UNITS | L_MEM_UNITS | 4
AUTO | L_MEM_UNITS_AUTO_SEL | true
Debug Flags | DEBUG_FLAGS | 262242
Output Generation
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896).
Note: When the CAM IP is generated, a SystemVerilog simulation package file is also created with the name _sim_pkg.sv. This file contains the SystemVerilog task create, which holds the configuration settings used for this particular CAM IP instance. You can use this package file, and the SystemVerilog tasks in it, in your own testbench if desired.
Constraining the Core
This section contains information about constraining the core in the Vivado Design Suite.