High-Performance IP Routing Table Lookup Using CPU Caching
Department of Computer Science, SUNY at Stony Brook
Abstract: Wire-speed IP (Internet Protocol) routers require very fast routing-table lookup for every incoming IP packet. Routing-table lookup is time-consuming because only part of the destination IP address, the network address, participates in the search, and its length varies from entry to entry. This paper presents the routing-table lookup algorithm used in Suez, a cluster-based parallel IP router. The novelty of the algorithm is that it uses the CPU caching hardware directly to perform routing-table lookups, by carefully mapping IP addresses into virtual addresses. Using a detailed simulation model of the CPU memory hierarchy, driven by packet traces collected from a major network router, we show that the algorithm achieves an overall lookup throughput of 870,000 lookups per second on a 500-MHz Alpha processor with a 16-KByte primary cache and a 1- to 2-MByte secondary cache. This is one to two orders of magnitude faster than previously reported software-based routing-table lookup implementations. The paper also reports how the scheme's performance varies with architectural parameters, its storage cost, and measurements of an implementation running on a Pentium II machine under Linux.
I. INTRODUCTION
Each logical IP routing table entry includes the following fields: a network mask, a destination network address, and an output port identifier. Given a packet's destination host IP address, the network mask of an entry is first applied to the destination host address to extract its network part, which is then compared with the entry's destination network address field. If the two match, the entry is considered a potential candidate. Logically, the destination host address is compared in this way with every routing table entry; among the candidates, the entry whose network address is longest wins, and the packet is routed through the output port specified by that entry toward the corresponding next-hop router. If no routing table entry matches the incoming packet's destination host address, the packet is forwarded to a default router. Because each entry's network mask extracts a different number of most-significant bits as the network address, IP routing-table lookup is essentially a search for the longest prefix of a given destination IP address that matches a routing table entry (a minimal code sketch of this procedure is given at the end of this section). To avoid visiting irrelevant entries, existing IP routing software such as BSD Unix builds an index tree over the routing table to speed up the search. However, even with such indexing, software-based routing-table lookup cannot keep up with line speed. For example, with 1000-bit packets on a 1-Gbit/sec link, wire-speed routing-table lookup requires each input port to sustain one million lookups per second.

This paper describes the IP routing-table lookup algorithm of Suez, a high-performance software-based IP router project. Because the target hardware platform is a general-purpose CPU, the algorithm is designed as a highly efficient pure-software scheme. The key observation behind the algorithm is that routing-table lookup maps a destination IP address to an output port, and the CPU cache of a modern microprocessor is built to perform a very similar mapping. Therefore, by treating destination IP addresses as virtual memory addresses, the CPU cache can serve as hardware support that speeds up routing-table lookup. Ideally, each routing-table lookup corresponds to a single virtual address reference and thus a single cache access, so a lookup completes in one cycle. In practice, using the CPU cache as a routing cache requires a more elaborate mechanism, because the "tag" of an IP address, its network prefix, is variable in length, whereas existing CPU cache hardware supports only fixed-length tags. Suez's route lookup algorithm therefore maps IP addresses to virtual addresses carefully, so that lookup performance comes close to the best the underlying processor can deliver.

The remainder of this paper is organized as follows. Section 2 reviews previous work on IP routing-table lookup and caching. Section 3 describes in detail the Suez routing-table lookup algorithm and how it exploits the CPU cache. Section 4 describes the methodology used to evaluate the proposed routing-table lookup algorithm. Section 5 presents the simulation and measurement results together with a detailed analysis. Section 6 describes the structure and operation of a prototype implementation. Section 7 concludes the paper with a summary of the main results and a brief overview of ongoing work.
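To make the logical lookup procedure described above concrete, the following C sketch shows the naive linear-scan form of longest-prefix matching. It only illustrates the semantics of the lookup, not the Suez algorithm; the structure and function names are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* One logical routing table entry: network mask, destination
 * network address, and output port (names are illustrative). */
struct route_entry {
    uint32_t mask;      /* e.g. 0xFFFFFF00 for a 24-bit prefix */
    uint32_t network;   /* destination network address          */
    int      port;      /* output port identifier               */
};

/* Naive longest-prefix match: apply each entry's mask to the
 * destination host address and keep the matching entry with the
 * longest prefix. Returns the default port when nothing matches. */
int lookup_output_port(const struct route_entry *table, size_t n,
                       uint32_t dst_addr, int default_port)
{
    int      best_port = default_port;
    uint32_t best_mask = 0;

    for (size_t i = 0; i < n; i++) {
        if ((dst_addr & table[i].mask) == table[i].network &&
            table[i].mask >= best_mask) {
            best_mask = table[i].mask;
            best_port = table[i].port;
        }
    }
    return best_port;
}
```

With contiguous CIDR masks, a numerically larger mask corresponds to a longer prefix, which is why comparing masks selects the longest matching prefix.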
II. RELATED WORK
The most popular data structure for longest-prefix matching is the Patricia trie, a binary trie with path compression; a similar radix-tree technique is implemented in the routing code of 4.3BSD Unix. Other work has compared the VLSI implementation cost and performance of content-addressable memory (CAM) based solutions with trie-based routing-table lookup. McAuley and Francis [4] used binary and ternary CAMs for fast lookup of variable-length network addresses. Knox and Panchanathan [5] describe a multiprocessor-based routing-table lookup scheme built around a linear array of processing elements. More recently, Waldvogel et al. developed a scheme that performs binary search over hash tables organized by prefix length, where each hash table uses a perfect or semi-perfect hash function on the prefixes of that length. Degermark et al. developed a compact representation of the routing table that fits entirely in a typical secondary cache; they estimate that each lookup completes in about 100 instructions with at most 8 memory references. Compared with these schemes, Suez's routing table organization is much simpler, and its lookups are correspondingly more efficient. Unlike tree-based search algorithms, Suez's routing-table lookup algorithm does not need backtracking to support longest-prefix matching. Varghese et al. presented controlled prefix expansion, a general technique that reduces the number of distinct prefix lengths in a search data structure by expanding prefixes into a smaller set of lengths; the smaller set is chosen by dynamic programming, with the size of the search data structure as the objective function. Another effort implements routing-table lookup in hardware, using large amounts of inexpensive DRAM to hold a two-level lookup table whose two levels are pipelined. In contrast, Suez's route lookup algorithm relies entirely on a standard processor and its cache memory; it is a hardware/software design built on the addressing and page-management machinery of commodity PCs. None of the previously reported works gives detailed measurements that account for CPU cache delays.

Another way to speed up routing-table lookup is to cache lookup results. Feldmeier studied routing-table cache management policies and showed that caching can reduce routing-table lookup time by 65%. He also examined the effectiveness of lookup caching in a multimedia traffic environment. Estrin and Mitzel calculated the storage requirements for maintaining router state and lookup information, and demonstrated the presence of locality through trace-driven simulation of an LRU routing-table lookup cache at different conversation granularities. To our knowledge, no previous work uses the CPU cache itself for this purpose, as Suez does. In addition, we believe the Suez scheme is the first to integrate result caching and full routing-table lookup in a single algorithm. Our work also includes detailed simulations and measurements of the algorithm, driven by routing tables and real traffic traces collected from production network routers.
III. ROUTING TABLE LOOKUP ALGORITHM
A major design goal of the Suez project is to show that a general-purpose CPU can serve as a powerful platform for high-performance IP routing. The Suez routing-table lookup algorithm therefore takes full advantage of the CPU cache hierarchy. The algorithm is based on two data structures, a destination host address cache (HAC) and a destination network address routing table (NART), both designed to use the CPU cache efficiently. The algorithm first consults the HAC to check whether a given destination IP address has been seen before and is cached there. If so, the lookup succeeds and the corresponding output port is used to route the packet. If not, the algorithm proceeds to a full NART lookup.
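The two-stage control flow just described can be sketched as follows. This is only an outline under assumed interfaces: hac_lookup, hac_insert, and nart_lookup are hypothetical helpers, and inserting the result into the HAC on a miss is an assumption about ordinary cache behavior; in Suez the HAC is actually realized through the CPU's L1 cache rather than an explicit software table.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical interfaces for the two data structures described above. */
bool hac_lookup(uint32_t dst_ip, int *port);  /* destination Host Address Cache           */
void hac_insert(uint32_t dst_ip, int port);
int  nart_lookup(uint32_t dst_ip);            /* destination Network Address Routing Table */

/* Two-stage lookup: consult the HAC first; on a miss, fall back to the
 * full NART lookup and cache the result for subsequent packets. */
int route_lookup(uint32_t dst_ip)
{
    int port;

    if (hac_lookup(dst_ip, &port))   /* HAC hit: lookup completes here */
        return port;

    port = nart_lookup(dst_ip);      /* HAC miss: full longest-prefix lookup */
    hac_insert(dst_ip, port);
    return port;
}
```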
A. Host Address Cache
Typically, a network connection transmits multiple packets during its lifetime, and the set of active connections at a router is relatively stable. The destination IP addresses seen in a router's traffic therefore exhibit strong temporal locality. In other words, most routing-table lookups should be satisfied directly from the HAC.
Consequently, reducing the HAC hit access time is critical to overall routing-table lookup performance. Unlike a pure software data structure such as a hash table, Suez's HAC is required to reside in the first-level (L1) cache most of the time and to exploit the cache's hardware lookup function directly. As a first cut, a 32-bit IP address could simply be treated as a 32-bit virtual memory address and looked up through the L1 cache: if the reference hits in the L1 cache, the entire lookup completes in one CPU cycle; otherwise a NART lookup is required. However, this approach has several drawbacks. First, compared with typical memory reference streams, destination host address streams lack spatial locality. As a result, the portion of the address space actually touched is sparse, which can lead to a very large page table, a high TLB miss rate, and/or an excessive page fault rate. Second, without special measures there is no guarantee that the HAC remains cached at all times; in particular, an uncoordinated virtual-to-physical address mapping can cause unnecessary conflict misses between the HAC and the other data structures used by the lookup algorithm. Third, the cache block size of modern L1 caches is too large for the HAC: because the packet address stream lacks spatial locality, a single cache block is often underutilized, holding no useful HAC entry or only one most of the time. The overall caching efficiency is therefore lower than that of a fixed-size buffer of the same capacity.
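For illustration, the naive first cut described above would look roughly like the sketch below: the destination IP address is used directly as an offset into a huge virtual region, so a lookup is a single load that the L1 cache can satisfy. The names and the byte-per-address layout are assumptions made for this example, not part of Suez; the drawbacks listed above are exactly why Suez does not use this scheme as-is.

```c
#include <stdint.h>

/* Illustrative only: a 2^32-byte virtual region holding one output-port
 * byte per possible destination address. Reserving such a region (e.g.
 * with an anonymous mmap) is assumed here, and it is what makes the
 * naive scheme impractical: the touched pages are scattered, so page
 * tables grow large and TLB misses and page faults become frequent. */
extern uint8_t *naive_hac_base;

static inline int naive_route_lookup(uint32_t dst_ip)
{
    /* A single memory reference: an L1 hit completes the lookup in one
     * access; a miss means a full NART lookup is needed. Each cache
     * block ends up holding at most one useful entry because destination
     * addresses have little spatial locality. */
    return naive_hac_base[dst_ip];
}
```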