Dual Pai-Sigma Segment Matchline for Low-Power Ternary CAM

S. Karthikeyan, Assistant Professor, ECE, SNS College of Technology, Coimbatore, Tamilnadu
D. Deepika, Assistant Professor, ECE, SNS College of Technology, Coimbatore, Tamilnadu

Abstract- This paper proposes a Dual Pai-Sigma Segment Matchline (DPSML) scheme to reduce the search (compare) power of a Ternary Content Addressable Memory (TCAM). The proposed matchline does not suffer from the short-circuit current and charge-sharing problems that typically exist in the hybrid NAND-NOR matchline. The scheme is implemented with 6T NAND and NOR TCAM cells and with 4T NAND and NOR TCAM cells, and the performance of the two implementations is compared. Results show that the search power of the TCAM is reduced by 60% compared with the conventional Single Pai-Sigma Matchline. The search delay is also evaluated, and the power-delay product of the proposed scheme with 6T cells is about 80% lower than that of the conventional method.

Keywords- Dual Pai-Sigma Segment Matchline (DPSML), Content Addressable Memory (CAM), Ternary CAM (TCAM), Matchline (ML)

I. Introduction

A Content Addressable Memory (CAM) is accessed through its data rather than through an address, as a conventional RAM is. The output of a CAM is the location where the matching content is stored. In a CAM, the search data are compared with the stored data, and when they match, a matchline indicates the match. Because of its fast parallel matching capability, the CAM is mainly used in applications such as StrongARM processors and ATM switches.
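The search-by-content behavior described above can be modeled in a few lines. This is a behavioral sketch only: the hardware compares all words in parallel in one cycle, whereas the loop below is sequential, and the `first_match` helper is a hypothetical stand-in for the priority encoder that CAMs commonly use to report a single address.

```python
from typing import List, Optional


def cam_search(table: List[str], key: str) -> List[int]:
    """Return every address whose stored word equals the search key.

    In hardware every word drives its own matchline in parallel;
    this list comprehension only models the result.
    """
    return [addr for addr, word in enumerate(table) if word == key]


def first_match(table: List[str], key: str) -> Optional[int]:
    # Hypothetical priority-encoder stage: report the lowest matching address.
    hits = cam_search(table, key)
    return hits[0] if hits else None


table = ["1010", "0110", "1010", "0001"]
print(cam_search(table, "1010"))   # [0, 2]
print(first_match(table, "1111"))  # None
```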

CAMs are classified as Binary CAMs (BCAMs) and Ternary CAMs (TCAMs). Both can store logic 0 and logic 1. In addition, a TCAM can store a don't-care (X) bit, which matches both "0" and "1" and thus allows wildcard operations.
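The effect of the don't-care bit on a comparison can be sketched as follows, assuming stored words are written as strings over `'0'`, `'1'` and `'X'` and the search key is binary. A BCAM word is simply the special case with no `'X'` bits.

```python
def ternary_match(stored: str, key: str) -> bool:
    """True if a binary search key matches a ternary stored word.

    'X' in the stored word is a don't-care bit that matches both '0' and '1'.
    """
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same width")
    return all(s == "X" or s == k for s, k in zip(stored, key))


print(ternary_match("10XX", "1011"))  # True
print(ternary_match("10XX", "1111"))  # False
```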

Many low-power design techniques for TCAMs have been reported [2]-[8]. In [3], the authors proposed a current-race matchline sensing scheme that reduces power consumption by minimizing the switching activity of the search lines and shrinking the voltage swing of the matchlines. In [4], a hybrid CAM is designed by replacing 9T CAM cells with 4T CAM cells. In [5], a selective precharge matchline scheme was proposed: a NOR-type matchline is partitioned into two parts, and the comparison result of the first part determines whether the second part is precharged, which reduces the matchline power consumption. In [6], a matchline is divided into two parts, the first a NAND-type matchline and the second a NOR-type matchline; again, the comparison result of the first part determines whether the second part is precharged.
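Why selective precharge saves power can be illustrated by counting the matchline bits that get precharged during one search. This is a rough sketch, not the circuit of [5]: it assumes binary stored words, a hypothetical split point `k` for the first part, and that matchline energy scales with the number of precharged bits.

```python
from typing import List, Tuple


def selective_precharge_bits(stored: List[str], key: str, k: int) -> Tuple[int, int]:
    """Count matchline bits precharged for one search.

    Returns (selective, conventional). Conventional precharge charges every
    bit of every word; selective precharge always charges the first k bits
    and charges the remaining bits of a word only if its first part matched.
    """
    n = len(key)
    conventional = n * len(stored)
    selective = 0
    for word in stored:
        selective += k  # first part is always precharged
        if word[:k] == key[:k]:
            selective += n - k  # second part precharged only on a partial match
    return selective, conventional


stored = ["10110010", "10111111", "01010101", "00000000"]
print(selective_precharge_bits(stored, "10110010", k=3))  # (22, 32)
```

Because most stored words mismatch early in a typical search, the second part is rarely charged, which is the saving the scheme exploits.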

This paper presents the DPSML scheme, built from TCAM cells, for low-power, fast search operation with a search delay of less than 20 ns.

II. Existing System

Single Pai-Sigma Matchline Structure

Fig. 2(a) shows the transistor-level diagram of the Single Pai-Sigma matchline scheme; only the comparison logic of the TCAM cells is shown. The Pai segment realizes a NAND function and the Sigma segment realizes a NOR function. The comparison logic of cell$_{p-1}$ is merged with the interface logic between the Pai segment and the Sigma segment. Fig. 2(b) shows the timing diagram of the control and match signals of the Pai-Sigma matchline during a Compare operation. In the Pai segment, i.e., cell$_0$ to cell$_{p-2}$, the comparison logic of each cell consists of two nMOS transistors in parallel and two pMOS transistors in series. Each pMOS/nMOS pair is controlled by S$_i$ and M$_i$. In the Sigma segment, i.e., cell$_p$ to cell$_{p+1}$, the comparison logic of each cell consists of two nMOS transistors in series.

In the precharge phase (Pre=0), all internal nodes of the Pai segment between two adjacent comparison logics (i.e., I$_0$, I$_1$, ..., I$_{p-3}$, ML$_{NAND}$) can be charged to Vdd or Vdd-Vt regardless of the states of S$_i$ and M$_i$. The reason is that if (S$_i$, M$_i$) = (1,1), (1,0), or (0,1), at least one of the two parallel nMOS transistors conducts, so I$_{i+1}$ = I$_i$. On the other hand, if (S$_i$, M$_i$) = (0,0), the pMOS transistors controlled by S$_i$ and M$_i$ are turned on and the corresponding internal node I$_i$ is precharged to logic 1. Therefore, the search lines of cell$_0$ to cell$_{p-2}$ do not need to be reset to guarantee that all internal nodes are charged to logic 1, which eliminates the dynamic power of resetting the search lines. Note that node ML$_{NAND}$ is charged to Vdd through the precharge pMOS transistor, which guarantees that no DC current flows in the interface logic.

In the evaluation phase (Pre=1), if S$_i$ and M$_i$ are not both logic 0 for every cell with 0≤i≤p-2, i.e., the search result is a match, then ML$_{NAND}$ = I$_{p-3}$ = I$_{p-4}$ = ... = I$_0$ = 0. If any (S$_i$, M$_i$) of cell$_0$ to cell$_{p-2}$ is (0,0), i.e., the search result is a mismatch, the corresponding pMOS transistors are turned on and internal node I$_{i+1}$ is pulled up to logic 1. This logic 1 is then propagated to the upstream internal nodes, up to ML$_{NAND}$, through the nMOS pass transistors. To avoid short-circuit current, the Pai-Sigma matchline uses a static CMOS gate (i.e., the interface logic) to cascade the NAND matchline and the NOR matchline. As Fig. 2(a) shows, the inputs of this static CMOS gate are ML$_{NAND}$, M$_{p-1}$, and S$_{p-1}$, so the gate also serves as the comparison logic of cell$_{p-1}$. If the search result of cell$_0$ to cell$_{p-1}$ is a match, the interface logic generates a logic 1 at the ML$_{pai}$ node and the precharge pMOS of the NOR matchline is turned off. Consequently, if the search result of the NOR matchline is a miss and the line discharges, no short-circuit path exists. The final search result of a matchline is ML. Fig. 2(c) shows the truth table of ML with respect to ML$_{NAND}$, ML$_{pai}$, and ML$_{sigma}$.
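The evaluation-phase behavior described above can be summarized in a logic-level sketch. It rests on several assumptions: a Pai cell mismatches exactly when (S$_i$, M$_i$) = (0,0), as stated in the text; cell$_{p-1}$ uses the same encoding; a Sigma cell mismatches when both gates of its series nMOS pair are high (inferred from the series topology, the actual S/M encoding of the Sigma cells is assumed); and ML = ML$_{pai}$ AND ML$_{sigma}$ stands in for the truth table of Fig. 2(c). Timing, charge sharing and Vdd-Vt levels are ignored, and all function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]  # (S_i, M_i) gate signals of one cell's comparison logic


def pai_cell_matches(s: int, m: int) -> bool:
    # NAND-type cell: the two series pMOS conduct (mismatch) only for (0, 0);
    # otherwise at least one parallel nMOS connects adjacent internal nodes.
    return not (s == 0 and m == 0)


def sigma_cell_matches(s: int, m: int) -> bool:
    # NOR-type cell (assumed encoding): the two series nMOS discharge the
    # matchline only when both gates are high.
    return not (s == 1 and m == 1)


def single_pai_sigma(pai: Sequence[Pair], cell_p_minus_1: Pair,
                     sigma: Sequence[Pair]) -> Dict[str, int]:
    """Evaluation-phase (Pre = 1) logic of one Single Pai-Sigma matchline."""
    # ML_NAND = 0 only if every Pai cell (cell_0 .. cell_{p-2}) matches.
    ml_nand = 0 if all(pai_cell_matches(s, m) for s, m in pai) else 1
    # Static CMOS interface gate, also acting as cell_{p-1}'s comparison logic.
    ml_pai = int(ml_nand == 0 and pai_cell_matches(*cell_p_minus_1))
    if ml_pai:
        # Precharge pMOS off: the NOR line is evaluated against the Sigma cells.
        ml_sigma = int(all(sigma_cell_matches(s, m) for s, m in sigma))
    else:
        # Sigma line is not evaluated and keeps its precharged logic 1.
        ml_sigma = 1
    # Assumed final combination: the word matches only if both segments match.
    return {"ML_NAND": ml_nand, "ML_pai": ml_pai,
            "ML_sigma": ml_sigma, "ML": ml_pai & ml_sigma}


# p = 8: cells 0..6 form the Pai chain, cell 7 is merged with the interface gate.
print(single_pai_sigma([(1, 0)] * 7, (0, 1), [(0, 0)] * 8)["ML"])  # 1
```

Note how a Pai mismatch leaves ML$_{sigma}$ at its precharged 1 without evaluating the Sigma cells; the final ML is still 0 because ML$_{pai}$ is 0.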

III. Proposed System

Dual Pai-Sigma Segment Matchline

The number of bits p in the Pai segment has a great influence on the speed and power of the Compare operation. To design a low-energy TCAM, the product of delay and power (the power-delay product) must be minimized. Here we select $p=8$ as an example for designing a low-power TCAM.

The delay contributed by the Pai segment can be reduced further by partitioning the Pai path into multiple subsegments. For example, an 8-bit Pai segment can be implemented with 1, 2, 4, or 8 subsegments (Fig. 4), whose comparison results are then combined by interface logic circuits. Fig. 5 shows the delay of the Pai segment for different numbers of subsegments. The simulation results show that the two-subsegment Pai segment has the lowest delay, so it is selected to realize the Dual Pai-Sigma matchline (DPSML) scheme. The Sigma segment is also partitioned into two subsegments to reduce delay and power consumption.
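The trade-off behind Fig. 5 can be reasoned about with a first-order Elmore-delay model: a chain of n pass-transistor stages with equal R and C per stage has a far-end delay of RC·n(n+1)/2, so shorter subsegments are much faster, while combining k subsegments adds interface-gate delay. The sketch below assumes the interface logic is a tree of 2-input gates with ceil(log2 k) levels; `rc` and `t_gate` are hypothetical placeholder values, not extracted from the 180 nm design, so this illustrates the shape of the trade-off rather than reproducing Fig. 5.

```python
from typing import Dict


def elmore_chain(n: int, rc: float) -> float:
    # Elmore delay at the far end of an n-stage RC ladder with equal R and C:
    # sum over nodes j = 1..n of C * (j * R) = R*C*n*(n+1)/2.
    return rc * n * (n + 1) / 2


def pai_delay(p: int, k: int, rc: float, t_gate: float) -> float:
    """Delay of a p-bit Pai segment split into k equal subsegments (first-order model)."""
    if p % k:
        raise ValueError("k must divide p")
    levels = (k - 1).bit_length()  # ceil(log2 k) levels of 2-input interface gates
    # Subsegments of p/k cells evaluate in parallel, then the gate tree combines them.
    return elmore_chain(p // k, rc) + levels * t_gate


# Hypothetical unit values; a different t_gate/rc ratio moves the optimum.
delays: Dict[int, float] = {k: pai_delay(8, k, rc=1.0, t_gate=10.0) for k in (1, 2, 4, 8)}
best = min(delays, key=delays.get)
print(delays, best)
```

With these placeholder values the minimum happens to fall at k = 2; the quadratic chain term dominates for k = 1, while the gate-tree term dominates for k = 4 and 8.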

Fig. 3. Dual Pai-Sigma matchline (DPSML) scheme.

Fig. 3 shows the proposed DPSML scheme, in which the Pai segment is split into PaiA and PaiB, and the Sigma segment is split into SigmaC and SigmaD. The precharge of SigmaC and SigmaD is controlled by ML$_{pai}$. If the comparison result of the Pai segment is a mismatch, ML$_{pai}$ = 0 and the matchlines of SigmaC and SigmaD are precharged to Vdd. ML$_{pai}$ is the AND of ML$_{paiA}$ and ML$_{paiB}$.

Fig. 4. Block diagrams of the different Pai-segment configurations

Fig. 5. Critical delay of the different Pai-segment configurations

Thus, if either ML$_{paiA}$ or ML$_{paiB}$ is at logic 0, ML$_{pai}$ = 0 and the matchlines of SigmaC and SigmaD can be precharged to Vdd. Therefore, a precharge control circuit is needed only in the PaiA segment, where it guarantees that ML is set to logic 0 in the precharge phase. This also reduces the power consumed by the precharge signal. When a Compare operation is performed, the precharge control signal Pre is set to 0 and ML$_{paiA}$ becomes logic 0, so the matchlines of SigmaC and SigmaD are precharged to Vdd during the precharge phase.

In the evaluation phase, if the comparison result of the Pai segment (i.e., cell$_0$ to cell$_{p-1}$) is a match, ML$_{paiA}$ and ML$_{paiB}$ are logic 1, so ML$_{pai}$ becomes logic 1. The nMOS transistors N1 and N2 are therefore turned on and the matchlines of SigmaC and SigmaD enter the evaluation phase. If the comparison results of SigmaC and SigmaD also match, ML$_{SigmaC}$ and ML$_{SigmaD}$ remain at logic 1 and ML is logic 1. If the comparison result of the Pai segment is a mismatch, the Sigma segment is not evaluated, so its matchlines are not discharged and remain at logic 1.
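The two-phase DPSML operation described above can be captured in a logic-level sketch. Assumptions: each segment is given as per-cell match flags (True = the cell matches) rather than transistor-level S/M signals; the final ML is modeled as ML$_{pai}$ AND ML$_{SigmaC}$ AND ML$_{SigmaD}$; the 4/4/4/4 split of a 16-bit word in the usage lines is hypothetical; and `dpsml_compare` and the `sigma_evaluated` flag are illustrative names. The flag marks whether N1 and N2 turn on, i.e., whether the Sigma matchlines can spend discharge energy at all.

```python
from typing import Dict, Sequence


def dpsml_compare(pai_a: Sequence[bool], pai_b: Sequence[bool],
                  sigma_c: Sequence[bool], sigma_d: Sequence[bool],
                  pre: int) -> Dict[str, int]:
    """Logic-level sketch of one Dual Pai-Sigma matchline for one phase."""
    if pre == 0:
        # Precharge phase: the control circuit in PaiA forces ML_paiA = 0,
        # hence ML_pai = 0, SigmaC/SigmaD are precharged and ML is held at 0.
        return {"ML_pai": 0, "ML_SigmaC": 1, "ML_SigmaD": 1,
                "ML": 0, "sigma_evaluated": 0}
    # Evaluation phase (Pre = 1): each Pai subsegment reports 1 only on a full match.
    ml_pai = int(all(pai_a)) & int(all(pai_b))  # AND of ML_paiA and ML_paiB
    if ml_pai:
        # N1 and N2 on: a Sigma matchline discharges if any of its cells mismatches.
        ml_c, ml_d = int(all(sigma_c)), int(all(sigma_d))
    else:
        # Pai mismatch: Sigma matchlines keep their precharged logic 1.
        ml_c = ml_d = 1
    return {"ML_pai": ml_pai, "ML_SigmaC": ml_c, "ML_SigmaD": ml_d,
            "ML": ml_pai & ml_c & ml_d, "sigma_evaluated": ml_pai}


hit = [True] * 4
print(dpsml_compare(hit, hit, hit, hit, pre=1)["ML"])  # 1
print(dpsml_compare(hit, [False] + hit[1:], hit, hit, pre=1)["sigma_evaluated"])  # 0
```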

IV. Analysis & Results

The proposed scheme is designed to search a 16-bit address. Four matchlines (ML0, ML1, ML2 and ML3) are implemented, each storing a different 16-bit entry of the routing table, and the stored data are compared with the search data.

Fig. 6 shows the output waveform of the DPSML using 6T TCAM cells. In the precharge phase (Pre=0), all matchlines are at 0. In the evaluation phase (Pre=1), the ML whose stored data match the search data goes high. Here the ML goes high with a delay of 7.47 ns, and the power consumption is 5.23 x e^2 watts.

Fig. 7 shows the output waveform of the DPSML using 4T TCAM cells. In the precharge phase (Pre=0), all matchlines are at 0. In the evaluation phase (Pre=1), the ML whose stored data match the search data goes high. Here all MLs go high because of the don't-care entries, with a delay of 16.2 ns. The power consumption of this structure is 1.89 x e^4 watts.

B. Comparison Results

The low-power TCAMs reported in [4], [5], and [8] are designed for specific applications. For general applications, the proposed TCAM with Dual Pai-Sigma Segment Matchlines has the lowest power-delay product for the search operation.

Table I summarizes the comparison between the existing system and the proposed low-power TCAM using 6T and 4T TCAM cells. Table I shows that the proposed system with 4T TCAM cells reduces the power consumption by about half compared with the existing method, but because of its higher delay (16.2 ns), its power-delay product is only about 15% lower than that of the existing method. The proposed scheme with 6T TCAM cells reduces the power-delay product by about 80% compared with the existing Single Pai-Sigma ML.

Table I

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Single Pai-Sigma Matchline</th>
<th>Dual Pai-Sigma Matchline using 4T cells</th>
<th>Dual Pai-Sigma Matchline using 6T cells</th>
</tr>
</thead>
<tbody>
<tr>
<td>Average power consumed (in watts)</td>
<td>202.73</td>
<td>103.19</td>
<td>54.90</td>
</tr>
<tr>
<td>Delay (in ns)</td>
<td>9.73</td>
<td>16.2</td>
<td>7.47</td>
</tr>
<tr>
<td>Total Time for execution (in sec)</td>
<td>11.40</td>
<td>7.23</td>
<td>7.79</td>
</tr>
<tr>
<td>Power-delay product (power × delay)</td>
<td>1972.56</td>
<td>1671.67</td>
<td>410.103</td>
</tr>
</tbody>
</table>
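The power-delay product row of Table I is the product of the power and delay rows. The short check below recomputes it and the percentage reductions relative to the Single Pai-Sigma matchline, using the values exactly as reported in the table (power column in the table's stated units, delay in ns); the dictionary keys are illustrative labels.

```python
# Values copied from Table I.
power = {"Single Pai-Sigma": 202.73, "DPSML 4T": 103.19, "DPSML 6T": 54.90}
delay_ns = {"Single Pai-Sigma": 9.73, "DPSML 4T": 16.2, "DPSML 6T": 7.47}

# Power-delay product = average power x delay.
pdp = {name: power[name] * delay_ns[name] for name in power}
base = pdp["Single Pai-Sigma"]

for name, value in pdp.items():
    reduction = 100.0 * (1.0 - value / base)  # % PDP reduction vs. existing scheme
    print(f"{name}: PDP = {value:.2f}, reduction = {reduction:.1f}%")
```

The recomputed reductions are roughly 15% for the 4T design and 79% for the 6T design, consistent with the comparison in the text.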

V. Conclusion

In this paper, we have presented a low-power TCAM using the Dual Pai-Sigma matchline scheme, which does not suffer from charge sharing or short-circuit current. The switching activity of the search lines is also low. We have implemented the proposed low-power TCAM in 180 nm CMOS technology. The delay of the proposed system is only about 7.5 ns. Results show that the proposed TCAM achieves about 80% reduction in power-delay product compared with the conventional Single Pai-Sigma Matchline.

REFERENCES


