Design and Implementation of Fast Locking and Harmonic Free in Multiphase Digital DLL – Robust to Process Variations

T.S. Karthik¹, V. Jawahar Senthilkumar²
Department of Electronics and Communication Engineering
College of Engineering, Guindy
Anna University
Chennai - 600 025, INDIA
Email: writetotsk@annauniv.edu¹, veerajawahar@annauniv.edu²

Abstract: An all-digital multiphase delay-locked loop (ADMDLL) that is harmonic free, low power, low jitter and immune to simultaneous switching noise (SSN) is presented. Harmonic-free operation and SSN immunity are achieved by a Narrow-Wide Coarse Lock Detector (NWCLD) and a Time-to-Digital Converter (TDC) which, together with the main Phase Frequency Detector (PFD), keep the delay between the reference clock and the output clock within a suitable range. The NWCLD also monitors the coarse locking range, which reduces jitter. Low power is achieved by a Frequency Estimation Selector (FES) that reuses the delay lines. Together, the FES and the NWCLD provide fast locking and enhance the flexibility of the ADMDLL. Finally, an edge combiner acts as a Duty Cycle Corrector (DCC); it uses a double-edge synchronization (DES) technique, on both rising and falling edges, to give a 50% output clock duty cycle irrespective of the input clock duty cycle. The proposed ADMDLL has better immunity to process, voltage, and temperature (PVT) variations than most conventional DLLs, which do not update the control word after locking, and it is suitable for SoCs and low-power systems that need to eliminate clock skew. Measurement results show that, when operated at 1.2 V, the ADMDLL covers a frequency range from 800 MHz to 1.2 GHz with 7.3 mW power consumption, 14.8 ps peak-to-peak jitter and a lock-in time of less than 200 cycles.

Keywords: Coarse Lock Detector (CLD), Digital Delay Locked Loop, Edge Combiner, Frequency Estimation Selector, Harmonic Locking/Stuck Problems, SSN.

I. INTRODUCTION

Generating and distributing clock signals is one of the most important problems in the design of VLSI systems. Because of process variations and interconnect parasitics, clock delays differ from path to path. Ideally the clock signals should have zero skew, that is, all clock signals should arrive at the register inputs at the same time. For the circuit to operate correctly, these delay differences should be reduced, ideally to zero. Phase-locked loops (PLLs) and delay-locked loops (DLLs) are extensively used in VLSI circuits to decrease skew in clock networks. In particular, digital DLLs are commonly used for clock synchronization in modern ICs because they offer better stability and process portability than analog DLLs. A digital DLL suitable for use in a DDR-SDRAM has been reported; it has a graduated coarse delay line and a phase-interpolating fine delay line, allowing it to cover the operating range and alignment requirements of all currently defined DDR-SDRAM families while halving the number of delay elements compared with a traditional digital DLL. A digital DLL is obtained by substituting a digitally controlled delay line (DCDL) for the voltage-controlled delay line of a conventional analog DLL and by replacing the charge pump and loop filter with digital control circuits, as shown in Fig. 1, in which the numbers in the DCDL block indicate its relative delay time.

![Digital DLL Block Diagram](image)

Fig. 1. Digital DLL Block Diagram

DLLs are widely used, and their design faces trade-offs among low jitter, fast locking, wide frequency range, and low power consumption. The low voltage headroom associated with supply voltages in scaled technologies makes it difficult for the analog control loop of a DLL to achieve a very wide locking range. This limitation is addressed by digital control loops, which can in principle use longer word lengths to extend the dynamic locking range [1]–[5]. To maximize timing margin, a DDR interface adopts a 90° phase-shift DLL, which delays the data strobe (DQS) signal by 90°.

Ring-oscillator and counter-based techniques for preventing harmonic locking have been demonstrated in [1], but they suffer from a long locking time. Duty-cycle correction with a loop-embedded DCC has been proposed in [2], but its duty-cycle operation is unstable. A wide operating frequency range and low jitter without harmonic locking have been achieved in [3], at the cost of area overhead and higher power consumption. To address these shortcomings, the proposed DLL includes a Narrow-Wide Coarse Lock Detector (NWCLD) and a Time-to-Digital Converter (TDC) for harmonic-free, SSN-immune operation, and an edge combiner with the DES technique for stable duty-cycle operation. The ASIC needs a high-precision TDC for each detector element to reach the required time resolution with good stability, and to choose a reasonable delay range rather than relying on a self-correcting circuit. Adding the TDC module also reduces the lock-in time of the proposed ADMDLL. Its operation is therefore robust and avoids the false lock that can occur in conventional designs.

This paper begins with a brief description of the proposed architecture in Section II. Section III discusses the harmonic locking problem, duty cycle correction, process variation modeling and the control schemes proposed in this paper. Section IV presents the simulation results, and conclusions follow in Section V.

II. ARCHITECTURE OF THE PROPOSED MULTIPHASE DLL

The block diagram of the proposed all-digital multiphase DLL is depicted in Fig. 2; like the designs in [4] and [5], it includes an FES and a TDC. It consists of five major blocks: a digitally controlled delay line (DCDL), a frequency-estimation selector (FES), a Narrow-Wide Coarse Lock Detector (NWCLD), a Time-to-Digital Converter (TDC) and an edge combiner. The TDC provides good time resolution and stability, while the NWCLD, together with the FES, monitors mismatches among the harmonics of the clock signals, which results in harmonic-free and fast locking behavior. The DCDL is a circuit whose delay is set by digital control signals. It has both coarse and fine delay units so that it can apply an effective delay to the clock signals according to the NWCLD outputs. The FES gives the DLL a wide operating frequency range; because it reuses the delay lines, it also reduces circuit area and power dissipation.

![Block diagram of the proposed digital multiphase DLL with FES and TDC](image1)

Fig. 2. Block diagram of the proposed digital multiphase DLL with FES and TDC

First, the NWCLD activates the FES and aligns the DCDL to the corresponding delay range. The FES then takes the multiphase outputs of the DCDL as inputs, estimates the input frequency range and generates the digital codes (F1, F0). The TDC shown in Fig. 3 converts the period of the reference clock into a delay expressed in multiphase range delay units (RDUs). After the TDC encoder, the DCDL range selection control code is sent to the NWCLD. The NWCLD then uses (F1, F0) and the signals from the PFD to carry out the locking steps of the digital multiphase DLL. Once the DCDL clock signal is aligned with the reference clock signal, the ADMDLL reaches the lock state. After frequency acquisition between the input clock and the delayed clock in the NWCLD using the DCDL multiphase outputs, one-cycle phase lock occurs in the PFD. The NWCLD thus not only makes the best use of the DCDL range but also makes the DLL immune to SSN and free of harmonic-lock and stuck problems.

![Architecture of the time-to-digital converter (TDC)](image2)

Fig. 3. Architecture of the time-to-digital converter (TDC)

As discussed in [4], [5], and [6], to avoid false lock, the DCDL should always operate in the delay range \(0.5 \times T_{\text{REF}} < T_{\text{DCDL}} < 1.5 \times T_{\text{REF}}\), where \(T_{\text{REF}}\) is the period of the reference clock and \(T_{\text{DCDL}}\) is the delay of the digitally controlled delay line. The proposed DCDL consists of equal delay stages, and the architecture of one delay stage is shown in Fig. 4. The delay of one stage is controlled by three cascaded stages: a range selection stage, a coarse-tuning stage, and a fine-tuning stage. The fine-tuning delay cell is added after the coarse-tuning stage to improve the phase resolution.
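
As a rough illustration of the delay-range constraint, the following Python sketch quantizes the reference period into a whole number of range delay units, as a TDC would, and checks that the resulting DCDL delay falls inside the false-lock-free window. The function names and the RDU delay of 120 ps are hypothetical and are not taken from the design.

```python
def tdc_code(t_ref_ps: float, t_rdu_ps: float) -> int:
    """Number of whole range delay units (RDUs) that fit in one reference period."""
    if t_ref_ps <= 0 or t_rdu_ps <= 0:
        raise ValueError("periods must be positive")
    return int(t_ref_ps // t_rdu_ps)


def in_safe_window(t_dcdl_ps: float, t_ref_ps: float) -> bool:
    """True when 0.5 * T_REF < T_DCDL < 1.5 * T_REF."""
    return 0.5 * t_ref_ps < t_dcdl_ps < 1.5 * t_ref_ps


if __name__ == "__main__":
    t_ref = 1000.0  # 1 GHz reference clock, in ps
    t_rdu = 120.0   # assumed RDU delay, in ps (illustrative value only)
    code = tdc_code(t_ref, t_rdu)
    t_dcdl = code * t_rdu
    print(code, t_dcdl, in_safe_window(t_dcdl, t_ref))
```

With these assumed numbers the quantized delay is 8 RDUs (960 ps), which lies between 500 ps and 1500 ps, so the range selection would start the loop inside the safe window.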

III. CIRCUIT DESIGN

A. Harmonic Locking Analysis in DLL

Harmonic locking is essentially a wrong phase match (lock) between the reference clock and the output clock. To avoid locking to a harmonic, the following inequalities should hold in general:

\[
\frac{1}{2} T_{\text{clk}} < T_{\text{delay(min)}} < T_{\text{clk}} < T_{\text{delay(max)}} < 3/2 T_{\text{clk}} \quad \text{i.e.,}
\]

\[
0.5 T_{\text{cc}} < T_{\text{vcdl_min}} < T_{\text{cc}} \quad \text{and} \quad T_{\text{cc}} < T_{\text{vcdl_max}} < 1.5 T_{\text{cc}}
\]

\[
\text{Max}\left(T_{\text{vcdl_min}},\ \tfrac{2}{3} T_{\text{vcdl_max}}\right) < T_{\text{cc}} < \text{Min}\left(2 T_{\text{vcdl_min}},\ T_{\text{vcdl_max}}\right)
\]

where \(T_{\text{cc}}\) is the clock period, \(T_{\text{vcdl_min}}\) is the minimum delay of the VCDL, and \(T_{\text{vcdl_max}}\) is the maximum delay of the VCDL.

Initial locking starts from \(T_{\text{vcdl_max}}\) or \(T_{\text{vcdl_min}}\), and the VCDL delay is then increased or decreased until the DLL is locked. A properly designed PD/PFD is needed to prevent stuck and harmonic-lock problems.
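
The bounds on the clock period can be evaluated directly. The sketch below is a minimal Python helper, with a hypothetical function name and illustrative delay values, that returns the interval of clock periods satisfying both delay-range conditions for a given delay line.

```python
def lockable_clock_range(t_vcdl_min: float, t_vcdl_max: float):
    """Return (low, high) such that low < T_cc < high, or None if no period fits.

    Combines 0.5*T_cc < T_vcdl_min < T_cc and T_cc < T_vcdl_max < 1.5*T_cc.
    """
    low = max(t_vcdl_min, 2.0 * t_vcdl_max / 3.0)
    high = min(2.0 * t_vcdl_min, t_vcdl_max)
    return (low, high) if low < high else None


if __name__ == "__main__":
    # Illustrative delays in ps, not values from the paper.
    print(lockable_clock_range(700.0, 1200.0))
    print(lockable_clock_range(300.0, 1200.0))
```

For the assumed 700 ps to 1200 ps delay line, clock periods between 800 ps and 1200 ps satisfy the inequality. In the second call the tuning range is too wide for any single period to satisfy both conditions, which is the situation where an additional coarse lock detector is needed.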

Harmonic locking is not a problem when the DLL is employed as a zero-delay buffer, but it must be avoided when the DLL is used as a multiphase generator. For example, suppose a DLL that is required to create 8 uniform phases \((0^\circ, 45^\circ, 90^\circ, \ldots \text{and} \ 315^\circ)\) undergoes second harmonic locking, that is, the \(0^\circ\) clock signal is locked to the \(720^\circ\) clock signal instead of the \(360^\circ\) clock signal. Then a \(180^\circ\) clock signal is obtained instead of a \(90^\circ\) clock signal, and so on. Therefore, harmonic locking must be avoided in multiphase applications. Fig. 5 shows the difference between the normal and harmonic locking behavior of a DLL; from the figure, we can conclude that the rising edge of \(\text{out}_{\text{clk}}\) should be located within one clock cycle of \(\text{ref}_{\text{clk}}\) for the circuit to operate under the normal lock condition.
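
The 8-phase example can be reproduced numerically. The following Python sketch assumes an ideal delay line with equally spaced taps whose total delay spans `harmonic` reference periods; the function name is hypothetical.

```python
def tap_phases(num_taps: int, harmonic: int = 1):
    """Phase in degrees (modulo 360) at each tap of an ideal delay line.

    harmonic = 1 is the normal lock (total delay of one period);
    harmonic = 2 is a second-harmonic lock (total delay of two periods).
    """
    total = 360 * harmonic
    return [(i * total / num_taps) % 360 for i in range(num_taps)]


if __name__ == "__main__":
    print(tap_phases(8, 1))  # normal lock
    print(tap_phases(8, 2))  # second-harmonic lock
```

Under normal lock the taps give eight distinct phases spaced 45° apart. Under second-harmonic lock the spacing becomes 90°, so the tap intended for 90° delivers 180° and only four distinct phases remain.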

Fig. 4. Architecture of the delay stage

Fig. 5. DLL Different Locking Conditions

The proposed DLL includes a phase frequency detector (PFD), which helps to prevent harmonic-lock and stuck problems in the DLL. Fig. 6 shows the block diagram of the PFD.

Fig. 6. Block Diagram of Phase Frequency Detector (PFD)

B. Simultaneous Switching Noise (SSN) Analysis

The major noise sources of the DLL circuit include simultaneous switching noise (SSN) from external circuits, phase detection noise, VCDL internal noise from the substrate, external input noise and VCDL control voltage noise. Among these, SSN from external circuits accounts for a large portion of the increased jitter, and the relation between the noise voltage and the increase in DLL jitter is very difficult to estimate. An SSN-to-jitter transfer function can be an effective method of jitter estimation. The SSN-to-jitter transfer function of a DLL can be defined as the conversion ratio between the SSN voltage and the amount of output clock jitter. SSN is a voltage fluctuation between power and ground that occurs when multiple output drivers switch simultaneously.

The data rate of current DDR3/4 systems is expected to move from 2 Gbps to 4 Gbps. At such high data rates, simultaneous switching output (SSO) noise introduced by the output drivers becomes the major bottleneck in designing memory channels. GDDR3/4 is based on single-ended pseudo open drain logic (PODL) signaling, which generates substantial AC current peaks when the output drivers switch simultaneously. These current peaks generate a large amount of SSO noise in the system if the impedance of the power distribution network (PDN) is not sufficiently low. Co-simulation of the PDN and the channel model requires long simulation time and often runs into convergence issues. Furthermore, supply noise in the system strongly depends on the switching data pattern. The worst-case data pattern for supply noise is a function of the PDN resonance, whereas the worst-case pattern for signal noise, such as crosstalk and inter-symbol interference (ISI), depends on the channel transfer function. Therefore, finding the worst-case system voltage and timing margin while considering both signal and power integrity effects is a challenging task.

A bias filter capacitor is inserted to reduce the output jitter. Each filter capacitance can be changed by an external control switch. In general, the SSN amplitude must remain below the threshold voltage.

![Fig. 7. SSN-to-Jitter Transfer Function of general DLL Circuit](image)

Thus, the NWCLD not only makes the best use of the DCDL range but also makes the DLL immune to SSN and free of harmonic-lock and stuck problems. Fig. 8 shows the narrow and wide output signals from the NWCLD to the DCDL.

![Fig. 8.1. Wide Up Signal from NWCLD to DCDL](image)

![Fig. 8.2. Narrow Up Signal from NWCLD to DCDL](image)

![Fig. 8.3. Wide Down Signal from NWCLD to DCDL](image)

![Fig. 8.4. Narrow Down Signal from NWCLD to DCDL](image)
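
The benefit of having both wide and narrow up/down signals can be illustrated with a simple behavioral model. The Python sketch below is not the NWCLD logic of this paper; it only assumes a loop that applies a wide delay step while the phase error exceeds a window and a narrow step afterwards. The function name, step sizes and window are hypothetical.

```python
def lock_cycles(t_ref, t_delay, wide_step, narrow_step, window, max_cycles=1000):
    """Count update cycles until |t_ref - t_delay| is at most narrow_step / 2.

    A wide step is applied while the error exceeds `window`, a narrow step
    otherwise. Returns (cycles, final_delay), or (None, final_delay) if the
    loop does not settle within max_cycles.
    """
    for cycle in range(max_cycles):
        error = t_ref - t_delay
        if abs(error) <= narrow_step / 2:
            return cycle, t_delay
        step = wide_step if abs(error) > window else narrow_step
        t_delay += step if error > 0 else -step  # "up" or "down" request
    return None, t_delay


if __name__ == "__main__":
    # Illustrative values in ps.
    print(lock_cycles(1000, 600, wide_step=100, narrow_step=10, window=100))
    print(lock_cycles(1000, 600, wide_step=10, narrow_step=10, window=100))
```

With these assumed values, the two-step loop settles in 13 updates, whereas a loop restricted to the narrow step needs 40 updates to cover the same 400 ps initial error.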
C. Duty Cycle Correction

A clock with a 50% duty cycle is very important in many applications such as DDR-SDRAMs and double-sampling analog-to-digital converters. To double the data rate, both the positive and negative transition edges of the clock are utilized. However, duty-cycle distortion occurs owing to unmatched rise and fall times in the clocking paths, so a duty-cycle corrector (DCC) is needed. In general, every clock synchronization system needs a 50% output clock duty cycle in order to maximize the data valid window. The edge combiner is essentially a DCC that gives a 50% output clock duty cycle irrespective of the input clock duty cycle.

![Image of Edge Combiner](image.png)

Fig. 9. Edge Combiner

The edge combiner is also used to multiply the output clock frequency by two. Fig. 9 shows the basic edge combiner, which generates an output clock with a 50% duty cycle.
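
One way to see why an edge combiner can give a 50% duty cycle is to model it as a toggle driven only by the rising edges of equally spaced DLL phases. The Python sketch below makes that assumption explicitly; it is an idealized model with hypothetical function names, not the circuit of Fig. 9.

```python
def combine_edges(period, phases_deg, n_cycles=2):
    """Toggle an output on the rising edge of each listed phase tap.

    Returns a list of (time, level) transitions, starting from level 0.
    Only rising edges of the taps are used, so the duty cycle of the
    input clock never enters the calculation.
    """
    events = sorted(
        k * period + period * p / 360.0
        for k in range(n_cycles)
        for p in phases_deg
    )
    level, transitions = 0, []
    for t in events:
        level ^= 1
        transitions.append((t, level))
    return transitions


def duty_cycle(transitions):
    """High time divided by the output period, measured on the first output cycle."""
    rises = [t for t, lv in transitions if lv == 1]
    falls = [t for t, lv in transitions if lv == 0]
    return (falls[0] - rises[0]) / (rises[1] - rises[0])


if __name__ == "__main__":
    same_rate = combine_edges(1000.0, [0, 180])           # 1x output
    doubled = combine_edges(1000.0, [0, 90, 180, 270])    # 2x output
    print(duty_cycle(same_rate), duty_cycle(doubled))
```

Using the 0° and 180° taps gives an output at the input frequency, and using the 0°, 90°, 180° and 270° taps halves the output period. In both cases the modeled duty cycle is 0.5, because it is set by the tap spacing rather than by the input waveform.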

D. Process Variation Modeling

Process variations are fluctuations in the values of parameters (process, voltage, temperature) observed after fabrication. These variations affect the performance characteristics of devices as well as interconnects. Parametric yield, timing yield and power yield are the fractions of samples that meet the corresponding constraint. Process variation can be considered at two levels: the chip level, which we call inter-die variation, and the transistor level, which we call intra-die variation. Inter-die variation is a parameter variation that has the same value across a single die, and hence captures variations that occur from die to die, wafer to wafer and lot to lot. Intra-die variation is the component of variation that causes device parameters to vary across different locations within a single die. Two basic models illustrate the nature of process variations across a VLSI design:

1) Pelgrom’s Model - This model captures the mismatch between transistors that arises from variations in process parameters. The approach analyzes the impact of variations (both random and correlated) in the frequency domain and abstracts the key features of both intra-die and inter-die variation. Consider a parameter P that varies over the surface of a die in the x-y plane due to process variations. Variations in P with the coordinates (x, y) result in mismatch between transistors that were designed to have the same characteristics.

\[ \Delta P = \frac{1}{A_0} \left( \int_{t_1} P(x,y) \ dx \ dy - \int_{t_2} P(x,y) \ dx \ dy \right) \]

The relation between the nominal and correlated components of the variation of P across the die can be expressed as

\[ \Delta P_{\text{corr}} = \Delta P_{\text{nom}} + a_1 \ x + a_2 \ y \]

where \( P_{\text{nom}} \) is the nominal value of the correlated variation, which can be estimated only if the origin on the wafer is known and deterministic, and \( a_1 \) and \( a_2 \) are process parameters that depend on the coordinate system of the die.

The expressions for the variance of a number of key parameters, such as threshold voltage (\( V_{\text{th}} \)) and gate oxide thickness (\( T_{\text{ox}} \)), can be expressed in terms of overall variation in parameter \( P \) as,

\[ \sigma^2 (V_{\text{th}}) = [A^2_{V_{\text{th}}} / W L] + S^2_{V_{\text{th}}} D^2 \]

\[ \sigma^2 (T_{\text{ox}}) = [A^2_{T_{\text{ox}}} / W L] + S^2_{T_{\text{ox}}} D^2 \]

where \( A_{V_{\text{th}}} \), \( S_{V_{\text{th}}} \), \( A_{T_{\text{ox}}} \) and \( S_{T_{\text{ox}}} \) are process-dependent constants, \( W \) and \( L \) are the device width and length, and \( D \) is the distance between the devices. From these equations, process variation metrics such as device leakage current can be determined.
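
The variance expressions can be evaluated with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, and the coefficient values in the example are placeholders rather than data from any particular process.

```python
import math


def pelgrom_sigma(a_p, s_p, width, length, distance):
    """Standard deviation of the mismatch of a parameter P between two devices.

    Implements sigma^2(P) = A_P^2 / (W * L) + S_P^2 * D^2.
    """
    variance = a_p ** 2 / (width * length) + (s_p ** 2) * distance ** 2
    return math.sqrt(variance)


if __name__ == "__main__":
    # Placeholder numbers: A in mV*um, W and L in um, distance term ignored.
    print(pelgrom_sigma(5.0, 0.0, 1.0, 0.25, 0.0))
    print(pelgrom_sigma(5.0, 0.0, 2.0, 0.5, 0.0))
```

Doubling both W and L quadruples the gate area and halves the area-dependent mismatch, which is the relationship used when sizing the transistors of a delay cell for lower sensitivity to random variation.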

2) Quad-Tree Based Modeling - This approach models process variations by partitioning the overall die area into a number of regions. Instead of adopting a covariance-matrix-based model for the correlated components of variation, it uses an additive approach to capture the spatial dependence of process parameters. The value of the process parameter for a device \( i \) can be expressed as

\[ X_{\text{total},i} = X_{\text{nom}} + \Delta X_{\text{inter}} + \Delta X_{\text{intra},i} \]

which is the sum of the nominal value of the process parameter \( X_{\text{nom}} \), the inter-die variation \( \Delta X_{\text{inter}} \), which is the same for all gates, and the intra-die variation of that particular gate \( \Delta X_{\text{intra},i} \). Quad-tree partitioning is achieved by recursively dividing the die area into four similar parts; each additional level of partitioning increases the number of regions by a factor of four.
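
A minimal Python sketch of the additive quad-tree idea follows. It assumes that every region at every level carries an independent zero-mean Gaussian sample and that the intra-die term of a device is the sum of the samples of all regions that contain it; the function names, the normalized die coordinates and the Gaussian assumption are illustrative choices.

```python
import random


def build_quadtree(levels, sigma, rng):
    """One zero-mean Gaussian sample per region.

    Level l (l = 1..levels) divides the die into 2**l x 2**l = 4**l regions.
    """
    return [
        [[rng.gauss(0.0, sigma) for _ in range(2 ** l)] for _ in range(2 ** l)]
        for l in range(1, levels + 1)
    ]


def parameter_value(x_nom, dx_inter, tree, x, y):
    """X_total = X_nom + dX_inter + dX_intra for a device at (x, y) in [0, 1)."""
    dx_intra = 0.0
    for l, grid in enumerate(tree, start=1):
        n = 2 ** l
        dx_intra += grid[int(y * n)][int(x * n)]  # region covering (x, y)
    return x_nom + dx_inter + dx_intra


if __name__ == "__main__":
    rng = random.Random(0)
    tree = build_quadtree(3, sigma=0.01, rng=rng)
    near_a = parameter_value(1.0, 0.02, tree, 0.10, 0.10)
    near_b = parameter_value(1.0, 0.02, tree, 0.11, 0.11)
    far = parameter_value(1.0, 0.02, tree, 0.90, 0.90)
    print(near_a, near_b, far)
```

Two devices that fall in the same region at every level share all of their samples and therefore get the same value, while devices far apart share none, which is how the additive scheme captures spatial correlation without a covariance matrix.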

The proposed DCDL uses Pelgrom’s model rather than the quad-tree model to capture mismatch and to size the transistors of the standard delay cell using the equations above, so that the effect of process variation is minimized.

IV. SIMULATION RESULTS

The post-layout simulation waveforms of the proposed ADMDLL are shown in Figs. 10 and 11. To confirm that the proposed design does not lock to a harmonic of the reference clock, a ‘Done’ signal is used in this simulation. Table 1 and Fig. 12 summarize the supply current and jitter performance of the DLL across three corners as a function of the reference clock period. When operating at 1.1 V, the ADMDLL locks in less than 200 clock cycles. Finally, Table 2 compares the performance of conventional DLLs with that of the proposed ADMDLL.

![Fig. 10. Simulation Output of 90° Locking](image1)

![Fig. 11. Simulation Output of All Digital Multiphase DLL (ADMDLL)](image2)

![Fig. 12. DLL Jitter Performance Vs Clock Period](image3)

The power consumption characteristics of the DLL across PVT, expressed as supply current, are tabulated in Table 1. The jitter calculations become especially important as the reference clock frequency decreases. The jitter is in an acceptable range of less than 500 ps when the reference clock period is 5 ns (200 MHz). For a reference period of 10 ns, the peak-to-peak jitter jumps to a worst-case value of almost 2 ns, which would be unacceptable for many DLL applications. This can be avoided by using faster reference clock frequencies.

<table>
<thead>
<tr>
<th>Clock Period (tCK)</th>
<th>Slow Corner Locked</th>
<th>Typical Corner Locked</th>
<th>Fast Corner Locked</th>
</tr>
</thead>
<tbody>
<tr>
<td>5 ns</td>
<td>2.3 mA</td>
<td>1.8 mA</td>
<td>1.7 mA</td>
</tr>
<tr>
<td>7.5 ns</td>
<td>1.4 mA</td>
<td>1.0 mA</td>
<td>1.2 mA</td>
</tr>
<tr>
<td>10 ns</td>
<td>0.8 mA</td>
<td>0.7 mA</td>
<td>0.9 mA</td>
</tr>
</tbody>
</table>

Table 1: Power Supply Requirements Across PVT and tCK

Table 2: ADMDLL Comparative Performance Summary

V. CONCLUSION

In this paper, an all-digital multiphase DLL clock generator architecture is presented. The proposed TDC and NWCLD overcome the false-lock and harmonic-locking problems of conventional designs. The proposed DLL is also immune to SSN and consumes low power. To measure its performance, the circuit was simulated across PVT corners and over the full operating range. The DLL locks in less than 200 cycles, operates over the frequency range from 800 MHz to 1.2 GHz and consumes 7.3 mW at 800 MHz. The post-layout simulated peak-to-peak jitter is 14.8 ps in an SSN environment. In summary, the proposed DLL matches well with future technology trends such as technology scaling, low supply voltage and high frequency, and it offers good absolute stability, fast response and low output jitter.

Owing to its low-jitter multiphase clock generation, the DLL is well suited to Positron Emission Tomography (PET) imaging applications with time-of-flight (TOF) capability. As future work, the single DLL can be extended to an array of DLLs to reduce the nonlinearity of the system.

VI. REFERENCES


