# Université de Lille 1 SCIENCES ET TECHNOLOGIES École Doctorale Sciences pour l'Ingénieur

# THESIS

presented to obtain the degree of

## DOCTOR

# in Micro and Nanotechnology, Acoustics and Telecommunications

publicly defended on 14 December 2015 by

# ILIAS SOURIKOPOULOS

# TECHNIQUES DE TRAITEMENT NUMÉRIQUE EN TEMPS CONTINU APPLIQUÉES À L'ÉGALISATION DE CANAL POUR COMMUNICATIONS MILLIMÉTRIQUES À FAIBLE CONSOMMATION

# CONTINUOUS-TIME DIGITAL PROCESSING TECHNIQUES APPLIED TO CHANNEL EQUALIZATION FOR LOW-POWER MILLIMETER-WAVE COMMUNICATIONS

Before the examination committee:

- President: Alain CAPPY
- Reviewer: Dominique MORCHE
- Reviewer: Andrei VLADIMIRESCU
- Examiner: Andreia CATHELIN
- Examiner: Yannis TSIVIDIS
- Thesis advisor: Antoine FRAPPÉ
- Thesis co-director: Laurent CLAVIER
- Thesis director: Andreas KAISER



#### ABSTRACT

Receivers for 60GHz wireless communications have been profiting from innovation in wired links in order to meet a power budget that will enable integration in next-generation high-speed portable wireless terminals. Mixed-signal implementations of the Decision Feedback Equalizer (DFE) have been proposed to reduce overall system consumption. In this thesis, power minimization is pursued by removing the clock from the feedback path of the DFE. Inspired by recent developments in Continuous-Time Digital Signal Processing, a continuous-time digital delay line is used. The design aims at mitigating wireless channel impairments caused by signal reflections in typical Line-of-Sight, indoor deployment conditions. The system is shown theoretically to achieve channel-dependent power consumption while maintaining acceptable Bit Error Rate performance for decoding. Moreover, a programmable digital delay element is proposed as the granular element of the delay line; it exploits body biasing to achieve a coarse/fine functionality. Prototype DFE and delay lines have been fabricated and characterized in 28nm Fully Depleted Silicon On Insulator (FDSOI) technology.

#### RÉSUMÉ

Receivers for very-high-data-rate 60 GHz wireless communications draw on innovations from wired links to reduce the power budget, which will enable their integration in the next generation of portable wireless terminals. A mixed-signal implementation of a decision feedback channel equalizer is proposed to lower overall system consumption. In this thesis, the reduction in consumption is achieved by removing the clock from the feedback path of the equalizer. Inspired by recent developments in continuous-time digital signal processing, a digital delay line is also introduced. The designed system aims to mitigate the effects caused by signal reflections in line-of-sight transmission between the transmitter and the receiver. The theoretical results thus show a power consumption that depends on the channel realization. In addition, a programmable digital delay element is proposed as the granular element of the delay line; it exploits transistor body biasing to achieve extremely fine delay tuning. Silicon demonstrators were fabricated and characterized in 28 nm FDSOI (Fully Depleted Silicon On Insulator) technology to demonstrate the concepts proposed in this thesis.

Ilias Sourikopoulos IEMN/ISEN Dpt. SMART/CCI 41, Bd. Vauban 59046 Lille CEDEX ilias.sourikopoulos@isen.fr ilias.sourikopoulos@ed.univ-lille1.fr

Notice for manuscript – library archival version, May 2016:

This manuscript version contains previously unpublished circuits and measurement results. It is released exclusively for library archiving and internal registering.

"Choose to be happy"

- Ilíana

#### ACKNOWLEDGEMENTS

This thesis was carried out working around exquisite people. If only I could enumerate how each one of them helped me along this four-year path, I would need a tome. I feel genuinely lucky to have had the opportunity to work around them.

I wish to extend my gratitude to my thesis directors Andreas Kaiser and Laurent Clavier for their leadership and insight that brought this thesis to fruition. The same goes for my ISEN advisor Antoine Frappé, whose guidance and commitment to this project were inexhaustible. He worked hard to keep me firmly oriented towards the goal and took a fair part in the mental rollercoaster involved, especially during the peak workloads of taping-out and measuring.

I would also like to thank the reporters and members of my jury for accepting my director's invitation to review my work. Their remarks, comments and corrections are an exceptional opportunity for me to improve this work.

Special thanks go to my industrial partner STMicroelectronics and personally Andreia Cathelin who not only managed access to high-end fabrication technology but was always there to receive the pitch and calibrate the ideas.

Bruno Stefanelli's contribution to this work was also crucial. He helped in so many different ways in design, assembly and test. He was an unequivocal voice of reference.

Axel Flament and Jean-Marc Capron provided various comments and advice in the process. I used Axel's DAC design on my chip and Jean-Marc's next door design solidarity.

Emmanuel Dubois and David Delcroix have been my assembly wizards. Die laser cutting on a droplet of water(!) and pad-side-wedge-land-side-ball bonding(!) are what actually made possible the measured silicon results presented in this thesis. (For a moment process engineering seemed like a decent day job.)

I would like to thank Valerie Vandenhende for her warmth and attention especially through my early days in France and her friendship throughout the years.

Florence Alberti has acted at times as my psychotherapist as well as administration companion across the hall. I'm thankful she has usually been the first person I say hello to.

I would also like to thank Olivier Irrmann for after-hours humor and for opening a door to the dark side, and Marta for a great message of encouragement.

Thanks to Jean-Charles Caillez for arranging funding to attend an unforgettable MOSAIC workshop in Lille.

Thanks to the colleagues and organizing committee in #Doctoriales2014, where I had a burnout week building an award-winning Toy Company, and in maThèse180 for glorious fun and stardom exposure. Special thanks to the scientific director of the Collège Doctoral, Laurence Duchien, and my "Prise de parole en Public" coach Geraldine BESSON.

Thanks to the Centre National de la Recherche Scientifique (CNRS) for funding this work as well as an encouraging cuisine.

I would like to thank my friends, the skin of the earth yesterday, leaders of industry and academia tomorrow (or whatever they choose to be, for that matter), my fellow PhD students past and present. Hani was my anchor point during the first months. I miss working with him around global politics and current affairs. I shared a lot of interesting discussions with Fawzi; I regard him as a brilliant man and I was sad to see him leave. I appreciated the humor of Arnaud and the always cheerful demeanor of Baptiste. Pietro assigned me my first tape-out deal; it was evident this guy wouldn't escape from academia. With Stéphane, we had high-spirited debates over channel equalization and estimation for a year; I thank him for making me part of his work. Camillo saved the day at the end of his tour of duty at ISEN, and I wish him well overlooking the lake. Matteo is always a pleasure to have around; I might follow in his footsteps soon.

My greetings to the newly welcomed member of the lab, Dipal, and all my support and "Bon Courage" to Fikre and Cristian on their way to the top of their efforts. Thank you for keeping me sane.

I cannot thank you enough, Iliana, for your patience and your support through this. This goes out to you, my family and the friends I miss.

### CONTENTS

| Section   | Title                                                               |
|-----------|---------------------------------------------------------------------|
| CHAPTER 1 | INTRODUCTION                                                        |
| 1.1       | A gun that fired 100 years later                                    |
| 1.2       | The never ending revolution                                         |
| 1.3       | All is digital, but what about timing?                              |
| 1.4       | Thesis contribution                                                 |
| 1.5       | Thesis organization                                                 |
| 1.6       | Chapter Bibliography                                                |
| CHAPTER 2 | CONNECTED WITHIN CLOSED DOORS                                       |
| 2.1       | Communicating through the 60GHz channel                             |
| 2.1.1     | Free space loss and beamforming                                     |
| 2.1.2     | Penetration and reflectivity                                        |
| 2.1.3     | Multipath propagation and Inter-Symbol Interference                 |
| 2.1.4     | Features of channel modeling                                        |
| 2.1.5     | Simulating 60GHz channels                                           |
| 2.2       | Equalization for 60GHz channel receivers                            |
| 2.2.1     | OFDM and Frequency domain equalization                              |
| 2.2.2     | Time domain equalization                                            |
| 2.3       | Decision feedback equalization                                      |
| 2.3.1     | Employing the DFE: from Gigabit wired to Gigabit wireless           |
| 2.4       | State of the art mixed-signal DFEs in 60GHz receiver basebands      |
| 2.5       | Chapter Bibliography                                                |
| CHAPTER 3 | A 60GHZ BASEBAND DFE WITH CHANNEL DEPENDENT POWER CONSUMPTION       |
| 3.1       | Critical tap cancellation                                           |
| 3.2       | A DFE with channel dependent power consumption                      |
| 3.2.1     | Continuous-time digital feedback filtering for the DFE              |
| 3.2.2     | Continuous-time digital delay element with two-stage control        |
| 3.2.3     | Power consumption comparison against the clocked approach           |
| 3.2.4     | A DFE consumption profile linked to the channel realization         |
| 3.3       | Chapter Bibliography                                                |
| CHAPTER 4 | CONTINUOUS-TIME DIGITAL DELAY-LINE IN 28NM FDSOI                    |
| 4.1       | General delay line specification                                    |
| 4.2       | Digital delay elements - state of the art                           |
| 4.2.1     | Cascaded inverters                                                  |
| 4.2.2     | Capacitive Shunting                                                 |
| 4.2.3     | Semi-static approach                                                |
| 4.2.4     | Current-starving                                                    |
| 4.2.5     | Thyristor-based delay element                                       |
| 4.2.6     | Discussion                                                          |
| 4.3       | Prototype back-gate driven thyristor based delay line in 28nm FDSOI |
| 4.3.1     | Proposed Delay element design                                       |
| 4.3.2     | Delay versus control                                                |
| 4.4       | Delay line prototyping                                              |
| 4.4.1     | Post-layout simulation results                                      |
| 4.4.2     | Chip design                                                         |
| 4.5       | Measurement results                                                 |
| 4.6       | Chapter bibliography                                                |
| CHAPTER 5 | A 5-TAP MW/GBPS DFE FOR 60GHZ BASEBANDS                             |
| 5.1       | Specification of the proposed DFE                                   |
| 5.2       | Prototype DFE circuit design                                        |
| 5.2.1     | Principle of operation                                              |
| 5.2.2     | Transistor level design                                             |
| 5.2.2.1   | Analog summer                                                       |
| 5.2.2.2   | Comparator                                                          |
| 5.2.2.3   | Comment on first-tap implementation                                 |
| 5.3       | Chip design                                                         |
| 5.4       | Measurement Results                                                 |
| 5.5       | Chapter Bibliography                                                |

### **LIST OF FIGURES**

| Figure |
|--------|
| Fig. 1-1. The first phonautograph |
| Fig. 1-2. A 37,000-year-old "hello world" |
| Fig. 1-3. "Télégraphe de Chappe" named after the inventor |
| Fig. 1-4 FD-SOI transistor cross-section [STMicro]. |
| Fig. 1-5 Continuous-time digital signal processor block diagram [Tsividis, 2006] |
| Fig. 2-1 Average atmospheric absorption of millimeter waves [FCC, 1997 and Lai, 2008] |
| Fig. 2-2 Beamforming principle of operation [Ruckus, 2013] |
| Fig. 2-3 3D model of conference room highlighting reception through $1^{st}$ and $2^{nd}$ order reflections from the ceiling and walls [Maltsev10] |
| Fig. 2-4 Multipath propagation and channel impulse response from [Molisch, 2011]. Echoes of the signal arrive through different paths spreading the channel impulse response in time |
| Fig. 2-5 Demonstrating ISI generated by the superposition of three components at the receiver input [Molisch, 2011] |
| Fig. 2-6 60GHz channel impulse response structure from [Maltsev, 2014] |
| Fig. 2-7 Impulse response realizations created by the 802.11ad model for a living room setting with (a1, a2) different Tx-Rx distance and (b1, b2) different Rx HPBW. The total power of all cluster rays is normalized to one |
| Fig. 2-8 Tapped delay line channel model in [Proakis, 2007] |
| Fig. 2-9 By applying an MMSE criterion the error between the transmitted signal and the output of the equalizer is minimized |
| Fig. 2-10 Eye diagram with (i) no multipath distortion (ii) one path arriving within symbol time (iii) one path arriving one symbol time later than the previous case |
| Fig. 2-11 Principle of a SC-FDE linear equalizer as seen in [Saito, 2013] |
| Fig. 2-12 Time-domain linear equalizer diagram from [Boccuzzi, 2008] |
| Fig. 2-13 Typical structure of a decision feedback equalizer, presented along with a channel impulse response. The feedback filter (FBF) can correct only post-cursor ISI |
| Fig. 2-14 Backplane channel frequency and impulse response. At very high data rates such as 25Gbps, post-cursor ISI spans 15 taps [Bulzacchelli, 2013]. For a rate of ~2 Gbps one or two taps would suffice |
| Fig. 2-15 Evolution of DFE summers in wired links. (a) Resistively loaded CML summer. (b) Current-integrating summer. (c) Sampled current integrating summer. (d) Peaking current-integrating summer from [Bulzacchelli, 2013] |
| Fig. 2-16 Mixed signal equalization relieves the power consumption of the dominating DSP block |
| Fig. 2-17 Mixed-signal baseband in [Sobel10]. |
| Fig. 2-18 Cascode summation in [Thakkar, 2012] |
| Fig. 2-19 Flexible tap allocation in [Sobel, 2009] |
| Fig. 3-1 Propagation of two paths within a room and Line-Of-Sight channel impulse response |
| Fig. 3-2 A typical 60GHz LOS channel impulse response. We remark that ISI is strongly impacted by the indicated, high amplitude components |
| Fig. 3-3 Worst-case BERs with an increasing number of critical impulse response components cancelled, under Eb/N0=5dB |
| Fig. 3-4 Worst case BER with and without equalization of 5 critical components alongside the theoretical AWGN channel at 1Gb/s BPSK |
| Fig. 3-5 BER degrades as coefficient cancellation is performed with less resolution |
| Fig. 3-6 The proposed DFE with variable coefficients and delays |
| Fig. 3-7 DFE feedback path implementation in [Sobel, 2009] left, and approach in [Park, 2011] right. |
| Fig. 3-8 The DFE delay element and the granular delay circuit used in simulation |
| Fig. 3-9 Power efficiency comparison for a delay element |
| Fig. 3-10 Consumption comparison between a discrete-time, static delay-line approach and the continuous-time configurable delay architecture |
| Fig. 4-1 Typical definitions of rise/fall time and rising/falling edge propagation delay |
| Fig. 4-2 Programmable delay line in [Li, 2006] |
| Fig. 4-3 Output slope varies with load capacitance from [Rabaey, 2003] |
| Fig. 4-4 Capacitive shunting control from [Nejad, 2003] |
| Fig. 4-5 Semi-static approach relieves short-circuit currents in [Jung, 2011] but adds a static consumption overhead |
| Fig. 4-6 Current starved inverter delay element with analog control from [Nejad, 2003] |
| Fig. 4-7 Current starved delay element with digital control [Nejad, 2005] |
| Fig. 4-8 Schmitt trigger output stage on current starved delay from [Mahapatra, 2002] |
| Fig. 4-9 CMOS thyristor concept in [Kim, 1996] |
| Fig. 4-10 Complete thyristor delay element with static triggering in [Kim, 1996] |
| Fig. 4-11 Enhancing positive feedback with a current source (M9) |
| Fig. 4-12 Delay element in [Kurchuk, 2012] |
| Fig. 4-13 FDSOI transistor cross-section |
| Fig. 4-14 The delay element circuit proposed by this work |
| Fig. 4-15 Vc node evolution for a charging (red) or discharging (blue) event. |
| Fig. 4-16 Simulation of rising edge delay variation vs. control |
| Fig. 4-17 Simulation of falling edge delay vs. control |
| Fig. 4-18 Delay coarse/fine control with gate/body biasing |
| Fig. 4-19 Well arrangement for the granular delay cell |
| Fig. 4-20 Delay line block diagram |
| Fig. 4-21 The 'lead' cell between groups enables the output from the previous group and shuts down the rest of the delay line |
| Fig. 4-22 Simulation of the programmable range for rising edge delay times | 88 |
| Fig. 4-23 Delay line prototype built for characterization and die detail with layout inset. The delay line size is 140 µm × 7 µm | 89 |
| Fig. 4-24 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn}$ = 0-1 V under $V_{Bn}$ = 0, (ii) body voltage variation: $V_{Bn}$ = 0-0.8 V under $V_{Gn}$ = 0.3 | 91 |
| Fig. 4-25 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn}$ = 0-1 V under $V_{Bn}$ = 0, (ii) body voltage variation: $V_{Bn}$ = 0-0.8 V under $V_{Gn}$ = 0.4 | 91 |
| Fig. 4-26 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn}$ = 0-1 V under $V_{Bn}$ = 0, (ii) body voltage variation: $V_{Bn}$ = 0-0.8 V under $V_{Gn}$ = 0.5 | 92 |
| Fig. 4-27 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn}$ = 0-1 V under $V_{Bn}$ = 0, (ii) body voltage variation: $V_{Bn}$ = 0-0.8 V under $V_{Gn}$ = 0.6 | 92 |
| Fig. 4-28 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp}$ = 0-1 V under $V_{Bp}$ = 1, (ii) body voltage variation: $V_{Bp}$ = 1-0.2 V under $V_{Gp}$ = 0.4 | 93 |
| Fig. 4-29 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp}$ = 0-1 V under $V_{Bp}$ = 1, (ii) body voltage variation: $V_{Bp}$ = 1-0.2 V under $V_{Gp}$ = 0.5 | 93 |
| Fig. 4-30 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp}$ = 0-1 V under $V_{Bp}$ = 1, (ii) body voltage variation: $V_{Bp}$ = 1-0.2 V under $V_{Gp}$ = 0.6
                                                  | 94                 |
| Fig. 4-31 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp}$ =0-1V under V                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                  | <sub>Bp</sub> =1,  |
| (ii) body voltage variation: $V_{Bp}$ =1-0.2V under $V_{Gp}$ =0.7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                  | 94                 |
| Fig. 4-32 Rising edge delay vs. biasing gate/body (39 elements) Body curve: $V_{Gp}=0V$ , $V_{Bp}=1$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                  | V, V <sub>Gn</sub> |
| $=500$ mV, $V_{Bn} = -800$ mV to $800$ mV. Gate curve: $V_{Gp} = 0$ , $V_{Bp} = 1$ , $V_{Bn} = 0$ , $V_{Gn} = 0.5 - 1$ V                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                  | 95                 |
| Fig. 4-33 Falling edge delay vs. biasing gate/body (39 elements) Body curve: $V_{Gp}$ =500V, $V_{Bp}$ =20                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                  | 0mV-               |
| 1.8V, $V_{Gn} = 1V$ , $V_{Bn} = 0V$ Gate curve: $V_{Gp} = 0V - 0.5V$ , $V_{Bp} = 1$ , $V_{Bn} = 0$ , $V_{Gn} = 1V$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                  | 95                 |
| Fig. 4-34 Delay line programmability measurement                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                  | 96                 |
| Fig. 4-35 Single, granular delay element rising edge delay vs. biasing gate/body. Body curve: V                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                  | <sub>Bn</sub> =0-  |
| 0.8V, $V_{Gn}$ =500mV, $V_{Gp}$ =0V, $V_{Bp}$ =1V Gate curve: $V_{Gn}$ =0.2V-1V, $V_{Bn}$ =0, $V_{Gp}$ =0V, $V_{Bp}$ =1. The sensitive sensitive sense is the sense of the sens | vity is            |
| 40ps/800mV=50fs/mV                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                  | 97                 |
| Fig. 4-36 Total delay line variation with supply voltage for $V_{Gn} = V_{Bn} = 0.5$ , $V_{Gp} = 0$ , $V_{Bp} = 1$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                  | 98                 |
| Fig. 5-1 Block level circuit design for the proposed DFE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                  | 104                |
| Fig. 5-2 Summer and coefficients                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                  | 107                |

| Fig. 5-3 Voltage to current converter used for implementing coefficient values             | 108 |
|--------------------------------------------------------------------------------------------|-----|
| Fig. 5-4 Differential Cascode Voltage Switch XOR gate connected to current-steering switch | 108 |
| Fig. 5-5 Comparator with preamp and trimmer function                                       | 109 |
| Fig. 5-6 Typical PCB (left) vs. 60GHz wireless LOS indoor (right) IRs (from Chapter 3)     | 110 |
| Fig. 5-7 Chip photograph with DFE layout inset                                             | 110 |

#### LIST OF TABLES

| Table 2-I Typical penetration losses [Xu, 2002], [Manabe, 1996], [Sato, 1997], [Langen, 1994] |     |
|---------------------------------------------------------------------------------------------|-----|
| Table 2-II State of the art Frequency Domain Equalizers                                     |     |
| Table 2-III State of the art mixed-signal DFEs for 60GHz receivers                          | 50  |
| Table 4-I Qualitative comparison of delay element types                                     |     |
| Table 4-II Summary of performance for the reviewed delay elements                           |     |
| Table 4-III Summary of simulation results                                                   |     |
| Table 4-IV Coarse/fine delay measurements for complete delay line enabled                   |     |
| Table 4-V Rising edge delay measurements for $V_{Gn}$ = $V_{Bn}$ =500mV                     |     |
| Table 4-VI Supply voltage sensitivity derived                                               |     |
| Table 4-VII Power consumption for 100MHz input $V_{Gn} = V_{Bn} = 500 \text{mV}$            |     |
| Table 5-I Table of specifications for the DFE                                               | 105 |
| Table 5-II Coefficient summing logic.                                                       | 106 |
| Table 5-III DFE extrapolated performance from simulations against state of the art          | 111 |

#### ABBREVIATIONS

| ADC    | Analog to Digital Converter                         |
|--------|-----------------------------------------------------|
| BER    | Bit Error Rate                                      |
| BPSK   | Binary Phase Shift Keying                           |
| CT-DSP | Continuous-Time Digital Signal Processing/Processor |
| DFE    | Decision Feedback Equalizer                         |
| DIBL   | Drain Induced Barrier Lowering                      |
| DSP    | Digital Signal Processing/Processor                 |
| FCC    | Federal Communications Commission                   |
| FDE    | Frequency Domain Equalization                       |
| FFF    | Feed-Forward Filter                                 |
| FFT    | Fast Fourier Transform                              |
| FIR    | Finite Impulse Response                             |
| FDSOI  | Fully Depleted Silicon Over Insulator               |
| HPBW   | Half-Power Beam-Width                               |
| IR     | Impulse Response realization                        |
| ISI    | Inter-Symbol Interference                           |
| ITU    | International Telecommunication Union               |
| ITU-R  | ITU Radiocommunication Sector                       |
| LOS    | Line-of-Sight                                       |
| LTE    | Long Term Evolution                                 |
| NLOS   | Non-Line-Of-Sight                                   |
| NMOS   | N-channel Metal Oxide Semiconductor                 |
| OFDM   | Orthogonal Frequency-Division Multiplexing          |
| PMOS   | P-channel Metal Oxide Semiconductor                 |
| QAM    | Quadrature Amplitude Modulation                     |
| QPSK   | Quadrature Phase Shift Keying                       |
| Rx     | Receiver                                            |
| SCE    | Short-Channel Effects                               |
| SNR    | Signal to Noise Ratio                               |
| TG     | Task Group                                          |
| Tx     | Transmitter                                         |
| WLANs  | Wireless Local Area Networks                        |
| WPANs  | Wireless Personal Area Networks                     |

# Chapter 1 Introduction

Our very existence relies on communication. We receive information from the environment through our senses and at the same time we emit ourselves, consciously or not, by action or emotion. We establish inter-personal relations and have evolved our societies. We use codes to get through and we continue to nurture and evolve them. Our reach has extended. We have cooperated to surpass the inhibiting constraints of our biological limits and we continue to build wonders. Our voices and images now resonate across the planet as we probe our cosmos to find meaning.

Audiovisual information has been the most comprehensive for our understanding and has been communicated since the depths of time. Unfortunately, for audio, it took a great deal of human evolution to reach the invention of the *phonautograph*<sup>1</sup> (Figure 1-1) [Crandall, 1925]. This explains the profound lack of empirical data on the age and origin of language and the ongoing debate between theories of language evolution [Ulbaek, 1998]. On the other hand, visual information has been carefully preserved for us to find under thin layers of calcite. Hand stencils (Figure 1-2) in the El Castillo cave in Spain are among the oldest findings of prehistoric painting, dating back at least 37,000 years [Pike, 2012]. This is perhaps the first proof of abstract thinking, symbolism or even art. We are still trying to reconnect with the messages of our ancestors to define who we are.



Fig. 1-1. The first phonautograph.



Fig. 1-2. A 37,000-year-old "hello world".

<sup>1</sup> Invented by the Parisian inventor Édouard-Léon Scott de Martinville in 1857.

The need to relay messages over distance gave rise to what we today call telecommunications<sup>2</sup>. As societies grew in numbers and populations spread geographically, the scope of such messaging increased and it became of great importance in administration, as well as in times of war. Information based on pre-agreed codes and arrangements has been transmitted and received since ancient times using fire and smoke. In the tragic trilogy Oresteia [Aeschylus, 458BC], it is mentioned that the message of the fall of Troy was relayed back to Mycenae by mountain-top fire beacons (φρυκτός). The exact positions of the beacons are recited, covering a mostly over-sea distance of hundreds of miles. A similar example of communication is the use of smoke signals along the Great Wall of China [Sterling, 2008]. These can be described today as examples of 1-bit Line-of-Sight communication with coding.

Many such techniques were used in the pre-electric era to achieve long-distance relays. As a precursor to the postal service, couriers or messengers, on horseback or running, relayed vital information. In the romantic example of Browning's poem *Pheidippides*, the eponymous runner is depicted as the bearer of the news of the Athenian victory in the battle of Marathon.

In a more recent example, in 18<sup>th</sup> century France, methods of messaging notably included the *telegraph semaphore*, a visual long-distance sign system used during Napoleonic times (Figure 1-3), and the *heliograph*, a system based on flashes of sunlight that was still in use in the 20<sup>th</sup> century.



Fig. 1-3. "Télégraphe de Chappe" named after the inventor.

<sup>2</sup> Compound of the Greek prefix *tele-* (τηλε-), meaning "distant", and the Latin *communicare*, meaning "to share"; coined by the French engineer and novelist Édouard Estaunié.

With the evolution of electromagnetic theory by a number of scientists around the end of the 19th century, electricity, wired and wireless telecommunications were established. The first *electric telegraph* line was set in 1844 between Washington and Baltimore. In 1876 the *telephone* was presented by Bell and in 1887 the transmission of electromagnetic waves was demonstrated by Hertz.

The turn of the century found many scientists and entrepreneurs capitalizing on the scientific achievements and striving to advance the technology of electronic components. Up to now, little has changed in regard to the momentum of this movement. This continuous effort is constantly changing multiple aspects of our everyday life, from communications and healthcare to commerce, education and entertainment. This thesis joins this effort. The work carried out targets high-speed (Gb/s) communications by means of millimeter-wave (mm-Wave) wireless transmission.

In the rest of this introduction we highlight the main influencers of this work, namely, mm-Wave communications, circuit design with FD-SOI technology, and continuous-time digital signal processing. These will lead to stating the contribution of this thesis. The chapter ends by outlining the organization of this manuscript.

## 1.1 A gun that fired 100 years later

On March 6, 1899, Lord Rayleigh communicated to the Royal Society in London the results of experiments by Jagadish Chandra Bose, who was performing millimeter-wave research and had invented a novel detection device. Bose had also developed multiple components to accommodate millimeter-wave transmission [Sen, 1997], including a millimeter-wave spark transmitter and the above-mentioned self-recovering coherer detector. All these were used in a demonstration of remotely firing a gun. According to [Bondyopadhyay, 1998], which attempts to correct a "century-old misinformation", it is J.C. Bose who is responsible for the "mercury coherer with a telephone" that G. Marconi used for the transatlantic radio transmission in 1901, while Marconi's patent features a trivially modified version of Bose's detector.

Almost a century later (in 1994), the United States Federal Communications Commission (FCC) put mm-Wave research on the map by establishing an unlicensed band at 59-64GHz. Soon, multiple regulatory bodies around the world allocated overlapping bands. For instance, in Europe and Japan the 57-66GHz band was allocated. The 9GHz bandwidth was partitioned by the International Telecommunication Union [ITU] into four channels of 2.16GHz, starting an era of Gb/s wireless data rates.

As mentioned, applied research in millimeter-wave communications dates back to the turn of the last century. Nevertheless, it has taken center stage only during the last 20 years, promptly following regulation. In the early 1990s, the first efforts to build radios around the 60GHz band targeted III-V semiconductors [Ninomiya, 1996], because their inherent characteristics enable high-frequency operation. However, as transistor sizes kept shrinking and cut-off frequencies increased, silicon CMOS adoption was pursued, notably from the 130nm node [Razavi, 2006]. A comprehensive summary of the state of the art in 60GHz integrated circuits and systems can be found in [Rappaport, 2011].

As far as standardization is concerned, the preliminary work of Task Group 3c on IEEE 802.15 soon evolved and currently mm-Wave communications are identified in the IEEE 802.11ad-2012 amendment as the forthcoming generation of Wireless Local Area Networks. Though products and dedicated solutions have been primarily presented by the television manufacturers of the WirelessHD consortium [WiHD], it is expected that full commercial adoption of the technology will be triggered by the computer manufacturers' alliance. Indeed, recently, the standard has been promoted by merging the so-called WiGig Alliance with the Wi-Fi Alliance [Wi-Fi, 2013]. With unique standardization and marketing put in place, major computer and chip manufacturers are expected to populate the first generation of consumer products very soon. Mm-Wave communications are currently taking off.

At this opportune moment, this thesis takes mm-Wave communications as its application field. It focuses on mobile device integration, which is expected to take some more time to be deployed commercially, due to overhead software development and power consumption issues.

## 1.2 The never ending revolution

Since the achievements of the pioneers at the beginning of the last century, an ever-growing torrent of scientific advancement has been turning science fiction into reality. From the days of the radio and the transmission of sound, we have gone to satellites, the internet, and ubiquitous connectivity. The transmission of image and sound with unprecedented definition is common in our modern systems. Moreover, nanotechnology promises to extend information exchange to more senses in the next few years [Toko, 2013]. This relentless boom of technology has been mainly fueled by growing information processing capabilities.



Fig. 1-4 FD-SOI transistor cross-section [STMicro].

Set off by the invention of the integrated circuit, the miniaturization efforts have gone the distance from microns to nanometers in a scaling frenzy. Processing power doubled every couple of years and promptly rendered virtually any "new" technology obsolete. The circuit design workhorse radically changed our way of living: the way we communicate and the way we process information.

Throughout this scaling process, although multiple innovations have been applied over the years to address the physical phenomena encountered, the principle of operation of transistor devices has remained the same. As this thesis is written, transistor sizing in production has reached 14nm. The phenomena involved, namely short-channel effects (SCE) and drain-induced barrier lowering (DIBL) [SOI, 2014], have pushed major manufacturers to diverge from the typical bulk CMOS paradigm in order to keep shrinking device sizes unobstructed. At the time of writing, bulk CMOS is being replaced by process technologies such as 3D Tri-gate [Intel, 2015], FinFET [TSMC, 2015] or FDSOI [STMicro] for high-end nodes below 32nm.

For the prototyping purposes of this thesis, 28nm Fully Depleted Silicon over Insulator (FD-SOI) offered by STMicroelectronics was available. FD-SOI features a very thin un-doped silicon film implementing the transistor channel. Beneath the channel, a thin buried oxide is formed, as shown in Figure 1-4, which effectively enables modulation of the transistor threshold by regulating the voltage on the transistor body. The body biasing option serves effectively as a knob for optimizing performance and power. When the substrate bias is positive, creating a field in the same direction as that of the gate (Forward Body Bias), the transistor can be switched faster. As will be seen later, being able to work with this technology enabled unique research directions for this work.

## 1.3 All is digital, but what about timing?

The term "digital system" is very commonly used not only to describe how a system processes information, but also to imply the system's timing characteristics. This is because most digital systems operate in the discrete-time domain. However, demonstrated continuous-time digital signal processors (CT-DSP) [Kurchuk, 2012] [Vezyrtzis, 2014] perform digital data processing without the use of a clock reference. This enables unique performance characteristics. In clocked systems, processing operations go on even in the absence of an input signal. In CT-DSP, it is the presence and dynamic characteristics of that signal that essentially drive the processing of data. It has been shown that this leads to activity-dependent power consumption and a less noisy output spectrum, because the absence of sampling mitigates the presence of quantization noise.

The main features of this methodology are notably level-crossing quantizers and continuous-time digital delay elements, used to implement digital filters similar to the one seen in Figure 1-5. During this thesis, the design of continuous-time delay elements has been undertaken using ideas from the works that followed this principle.
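To make the event-driven idea concrete, the following is a minimal Python sketch of a level-crossing quantizer. It rests on simplifying assumptions: continuous time is emulated by a dense time grid, the levels are uniformly spaced, and the function name `level_crossing_events` and the test signals are hypothetical illustrations, not taken from the cited designs. It only illustrates that the number of generated events, and therefore the downstream digital activity, follows the activity of the input.

```python
import numpy as np

def level_crossing_events(t, x, step):
    """Return a list of (time, level) pairs, one per crossing of a level k*step.

    Continuous time is emulated by the dense grid t; crossing instants are
    linearly interpolated between neighbouring grid points.
    """
    idx = np.floor(x / step).astype(int)  # index of the level just below each sample
    events = []
    for n in range(1, len(x)):
        if idx[n] == idx[n - 1]:
            continue  # no level crossed in this interval: no event, no activity
        if idx[n] > idx[n - 1]:
            crossed = range(idx[n - 1] + 1, idx[n] + 1)  # rising input
        else:
            crossed = range(idx[n - 1], idx[n], -1)      # falling input
        for k in crossed:
            level = k * step
            frac = (level - x[n - 1]) / (x[n] - x[n - 1])
            events.append((t[n - 1] + frac * (t[n] - t[n - 1]), level))
    return events

# Hypothetical inputs: a constant (idle) signal and two sinusoids.
t = np.linspace(0.0, 1.0, 10001)
idle = np.full_like(t, 0.1)
slow = 0.9 * np.sin(2 * np.pi * 1.0 * t + 0.1)
fast = 0.9 * np.sin(2 * np.pi * 2.0 * t + 0.1)

for name, sig in (("idle", idle), ("slow", slow), ("fast", fast)):
    print(name, len(level_crossing_events(t, sig, step=0.25)))
```

In this sketch the idle input produces no events at all, and doubling the input frequency doubles the event count, which is the qualitative behaviour that makes the power consumption of such systems activity dependent.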



Fig. 1-5 Continuous-time digital signal processor block diagram [Tsividis, 2006].

## 1.4 Thesis contribution

As the aforementioned fields fused during the time of this thesis, the research results, demonstrated through this manuscript, are the following:

- An approach dealing with the problem of baseband equalization in mm-Wave communications. Line-of-Sight (LOS) deployment is suggested in order to motivate the development of low-power equalization for mobile devices. An efficient critical-tap cancellation scheme is proposed, which entails digital filtering implemented with variable coefficients in magnitude as well as delay. The use of continuous-time delays is considered, in order to mitigate overhead consumption due to clock routing and establish a channel dependent power consumption profile [Sourikopoulos, 2014].
- The design of a tunable digital continuous-time delay element in 28nm FD-SOI. The introduction of body-biasing is attempted for the first time in developing such circuits in order to validate the fact that gate/body biasing can enable a coarse/fine control scheme for the output delay value.
- The design of a flexible low-power mixed-signal decision feedback equalizer for 60GHz channels in 28nm FD-SOI. The topology features multiple continuous-time delay lines composed of the aforementioned delay elements. The results reveal that the power consumption of this circuit: (i) is lower than that of a traditional implementation with clocked elements and (ii) depends on the channel realization targeted for equalization.

## 1.5 Thesis organization

The remainder of this manuscript is organized as follows:

**Chapter 2** is a crash course in 60GHz channel equalization. The problem of Inter-Symbol Interference (ISI) is elaborated and the different approaches to equalization are reviewed along with the state of the art.

**Chapter 3** describes our proposed approach in the context of mixed-signal equalization. The theoretical/simulated results presented here have motivated the prototyping work.

**Chapter 4** deals with the design of the proposed delay element. After going through relevant state of the art and simulated results, silicon prototyping for a delay line is detailed and measurement results are presented.

**Chapter 5** presents the design of the proposed equalizer. It highlights key design choices and concludes with the demonstration of the characterization test-bench and measurement results.

**Chapter 6** concludes this manuscript by reflecting on the outcome of this effort and proposing directions for further research.

## 1.6 Chapter Bibliography

[Aeschylus, 458BC] Aeschylus - Agamemnon, verses 280-330

[Bondyopadhyay, 1998] Bondyopadhyay, P.K. "Sir J.C. Bose Diode Detector Received Marconi's First Transatlantic Wireless Signal of December 1901 (the 'Italian Navy Coherer' Scandal Revisited)." *Proceedings of the IEEE* 86, no. 1 (January 1998): 259–85. doi:10.1109/5.658778.

[Crandall, 1925] Crandall, Irving B. "The Sounds of Speech." *Bell System Technical Journal* 4, no. 4 (October 1925): 586–626. doi:10.1002/j.1538-7305.1925.tb03969.x.

[Intel, 2015] "Intel®22 nm Technology." http://www.intel.com/content/www/us/en/silicon-innovations/intel-22nm-technology.html.

[ITU] "ITU: Committed to Connecting the World." http://www.itu.int/en/Pages/default.aspx.

[Kurchuk, 2012] Kurchuk, M., C. Weltin-Wu, D. Morche, and Y. Tsividis. "Event-Driven GHz-Range Continuous-Time Digital Signal Processor With Activity-Dependent Power Dissipation." *IEEE Journal of Solid-State Circuits* 47, no. 9 (September 2012): 2164–73. doi:10.1109/JSSC.2012.2203459.

[Ninomiya, 1996] Ninomiya, T., T. Saito, Y. Ohashi, and H. Yatsuka. "60-GHz Transceiver for High-Speed Wireless LAN System," 2:1171–74. IEEE, 1996. doi:10.1109/MWSYM.1996.511238.

[Pike, 2012] Pike, A. W. G., D. L. Hoffmann, M. Garcia-Diez, P. B. Pettitt, J. Alcolea, R. De Balbin, C. Gonzalez-Sainz, et al. "U-Series Dating of Paleolithic Art in 11 Caves in Spain." *Science* 336, no. 6087 (June 15, 2012): 1409–13. doi:10.1126/science.1219957.

[Rappaport, 2011] Rappaport, T.S., J.N. Murdock, and F. Gutierrez. "State of the Art in 60-GHz Integrated Circuits and Systems for Wireless Communications." *Proceedings of the IEEE*, vol. 99, no. 8 (August 2011): 1390–1436. doi:10.1109/JPROC.2011.2143650.

[Razavi, 2006] Razavi, B. "A 60-GHz CMOS Receiver Front-End." *IEEE Journal of Solid-State Circuits* 41, no. 1 (January 2006): 17–22. doi:10.1109/JSSC.2005.858626.

[Sen, 1997] Sen, A.K. "Sir J.C. Bose and Radio Science," 2:557–60. IEEE, 1997. doi:10.1109/MWSYM.1997.602854.

[SOI, 2014] Kononchuk, Oleg, and Bich-Yen Nguyen, eds. *Silicon-on-Insulator (SOI) Technology: Manufacture and Applications*. Woodhead Publishing Series in Electronic and Optical Materials 58. Amsterdam: WP, Woodhead Publ./Elsevier, 2014.

[Sourikopoulos, 2014] Sourikopoulos, Ilias, Antoine Frappe, Andreas Kaiser, and Laurent Clavier. "A Decision Feedback Equalizer with Channel-Dependent Power Consumption for 60-GHz Receivers," *International Symposium in Circuits and Systems, Melbourne, Australia* 1484–87. IEEE, 2014. doi:10.1109/ISCAS.2014.6865427.

[Sterling, 2008] Sterling, Christopher H., ed. *Military Communications: From Ancient Times to the 21st Century*. Santa Barbara, Calif: ABC-CLIO, 2008.

[STMicro] http://www.st.com/web/en/about\_st/fd-soi.html

[Tsividis, 2006] Tsividis, Y. "Mixed-Domain Systems and Signal Processing Based on Input Decomposition." *IEEE Transactions on Circuits and Systems I: Regular Papers* 53, no. 10 (October 2006): 2145–56. doi:10.1109/TCSI.2006.882822.

[TSMC, 2015] "Taiwan Semiconductor Manufacturing Company Limited." http://www.tsmc.com/english/dedicatedFoundry/services/reference\_flow.htm.

[Toko, 2013] Toko, Kiyoshi, Takeshi Onodera, and Yusuke Tahara. "Nano-Biosensors for Mimicking Gustatory and Olfactory Senses." In *Bio-Nanotechnology*, edited by Debasis Bagchi, Manashi Bagchi, Hiroyoshi Moriyama, and Fereidoon Shahidi, 270–91. Oxford, UK: Blackwell Publishing Ltd., 2013. http://doi.wiley.com/10.1002/9781118451915.ch15.

[Ulbaek, 1998] Ib Ulbaek: The origin of language and cognition. In Hurford, James R., ed. *Approaches to the Evolution of Language: Social and Cognitive Bases*. Repr. Cambridge: Cambridge Univ. Press, 2001.

[Vezyrtzis, 2014] Vezyrtzis, C., W. Jiang, S.M. Nowick, and Y. Tsividis. "A Flexible, Event-Driven Digital Filter With Frequency Response Independent of Input Sample Rate." *IEEE Journal of Solid-State Circuits* Early Access Online (2014). doi:10.1109/JSSC.2014.2336532.

[WiHD] http://www.wirelesshd.org/

[Wi-Fi, 2013] "Wi-Fi Alliance® and Wireless Gigabit Alliance to Unify | Wi-Fi Alliance." http://www.wi-fi.org/news-events/newsroom/wi-fi-alliance-and-wireless-gigabit-allianceto-unify.

# Chapter 2 Connected within closed doors

Thanks to the continuous effort to deliver ever higher information throughput, digital communications are at the doorstep of commercialization for the 60GHz era and multi-Gbps wireless transmission. After more than 20 years of active research, major stakeholder consortia now drive the industry towards a paradigm shift in wireless connectivity and Wi-Fi experience. This was sparked by regulatory bodies putting the millimeter-wave communication scenario on the map. After multinational coordination, 5GHz of continuous bandwidth have been made available worldwide for general-use (industrial, medical and scientific) wireless and mobile communications.

Wide adoption of this technology has been set off by complementing currently popular wireless and mobile standards. Major telecommunication companies and research institutions are grooming this technology for what will be 5G (or beyond, for that matter) in the mobile evolution [Rappaport, 2013], while the standardization of a new 60GHz generation of Wi-Fi, as in [802.11ad, 2012], has been in place since late 2012. The first commercial transceiver chip and the first personal computer featuring 60GHz technology have already been showcased and, as major computer and telecommunication companies are beginning to get involved, wide proliferation is expected soon. A recent financial report [MarketsAndMarkets, 2015] forecasts a 1.7 billion market for 60GHz products in the next five years.

Concerning the technical challenges, multiple innovations have enabled progress. The leap to the 60GHz band has introduced a series of challenges, starting with front-end technology, owing to the special characteristics of the wireless channel. During signal propagation, high attenuation and weak diffraction are experienced. This is why the technology largely depends on high-gain antennas with beamforming to establish the communication link.

This work focuses on the baseband part of a 60GHz receiver, namely the equalizer block. This block is responsible for the removal of Inter-Symbol Interference (ISI) after signal down-conversion.

This chapter provides a foundation for the contribution of this thesis. Key channel characteristics precede the presentation of state-of-the-art equalization methods and the evaluation of the equalizer in the receiver baseband chain. The mixed-signal equalization approach has recently been put forward as the most energy-efficient one, so the chapter concludes with the presentation of state-of-the-art mixed-signal DFEs. These developments lead to the essence of this research effort: the system-level and circuit-level contributions presented in the chapters to follow.

## 2.1 Communicating through the 60GHz channel

Establishing a communication channel at 60GHz has revealed multiple intricacies that not only drastically change the relevant transceiver front-end and PHY architectures, but also suggest new use models for future systems. Generally, in comparison with established standards in the 2-5GHz band, one can expect 20-30dB more attenuation, a quasi-optical character and confinement by building materials. Though the intricacies of channel modeling are beyond the scope of this thesis, it is important for system architecture design to have insight into the main features of transmission. This readily helps identify the applicable communication scenarios and equipment for deployment. Therefore, as a foundation, basic features of propagation through the 60GHz channel are presented in this section.

#### 2.1.1 Free space loss and beamforming

It is well known from electromagnetism that, as the transmission frequency increases, establishing a communication link demands more power. For 1 meter of free-space propagation, according to [Friis, 1946], the 60GHz channel exhibits almost 28dB higher loss than the 2.4GHz channel. Apart from free-space path loss, millimeter-wave propagation suffers extra attenuation due to atmospheric absorption by oxygen, water vapor or rainfall. Figure 2-1 displays the attenuation of electromagnetic signals in air. These conditions undermine any overall low-power specification for the transceiver. This is because the most power-hungry component of the system, the power amplifier, needs more power in order to compensate for the increased path loss. In addition, the amplifier is less efficient than in a lower-frequency transmission, because it operates closer to its maximum frequency of operation. Evidently, this mode does not encourage long-distance communications. This has led to a substantial relaxation of the regulation for transmitted power. For example, in the United States, as per [FCCpart15, 2009], the limit for field strength in the 57-64GHz band is around 165 times the limit for emissions in the 2.4-5GHz band. The high attenuation, however, has some positive ramifications as a communications scenario, because interception becomes more difficult due to the weaker signal. This is attractive in terms of data security. In fact, 60GHz transmission was first proposed as a battlefield communications mode exactly for this reason [Agilent, 2013]. Furthermore, from a spectrum utilization standpoint, spectrum decongestion can be envisioned, as the established link is easily confined spatially and there is still ample bandwidth in comparison with the lower GHz bands.

Fig. 2-1 Average atmospheric absorption of millimeter waves [FCC, 1997 and Lai, 2008].
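The 28dB figure quoted for 1 meter of free-space propagation can be checked with a short Python sketch of the Friis free-space path loss between isotropic antennas, $20\log_{10}(4\pi d f / c)$. The function name `fspl_db` is a hypothetical helper introduced here for illustration; atmospheric absorption and antenna gains are deliberately left out.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas (Friis)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)

loss_60g = fspl_db(1.0, 60e9)
loss_2g4 = fspl_db(1.0, 2.4e9)
print(f"60 GHz: {loss_60g:.1f} dB, 2.4 GHz: {loss_2g4:.1f} dB, "
      f"difference: {loss_60g - loss_2g4:.1f} dB")
```

Since the distance cancels out in the difference, the extra loss depends only on the frequency ratio, $20\log_{10}(60/2.4) \approx 28$ dB, at any range.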



Fig. 2-2 Beamforming principle of operation [Ruckus, 2013].

In order to compensate for the path loss, beamforming has been regarded as a key enabling technology. Beamforming refers to using multiple antenna elements in order to enhance antenna directivity. The beamforming gain is realized by introducing a phase shift between the different antenna elements, as seen in Figure 2-2. This results in maxima of the total radiated energy towards a certain direction, due to constructive interference. Therefore, signal power and consequently SNR are increased and higher data rates are achieved. This technique is better suited to millimeter-wave communication than to the 2.4-5GHz band, because multiple antenna elements can occupy a smaller area. For example, a square 4x4 antenna array with 16 antenna elements could be packed in 1cm<sup>2</sup> with adjacent elements separated by half the wavelength [Perahia, 2010].
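As an illustration of the phase-shift principle, the sketch below evaluates the array factor of a uniform linear array, assuming ideal isotropic elements and a progressive phase shift between neighbours. For simplicity it models a single row of four elements rather than the full 4x4 planar array. The function `array_factor` and the 30 degree steering angle are hypothetical choices made for this example.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s

def array_factor(theta_rad, n_elements, spacing_wl, steer_rad):
    """Magnitude of the array factor of a uniform linear array.

    spacing_wl is the element spacing in wavelengths. Element n is fed with a
    phase of -n * 2*pi*spacing_wl*sin(steer_rad), which steers the main beam
    towards steer_rad (angles measured from broadside).
    """
    n = np.arange(n_elements)
    psi = 2 * np.pi * spacing_wl * (np.sin(theta_rad)[:, None] - np.sin(steer_rad))
    return np.abs(np.exp(1j * n * psi).sum(axis=1))

theta_deg = np.arange(-90, 91)
af = array_factor(np.radians(theta_deg), n_elements=4, spacing_wl=0.5,
                  steer_rad=np.radians(30.0))
peak_deg = theta_deg[np.argmax(af)]

# Physical size of one row of four elements at 60 GHz with half-wavelength spacing.
half_wavelength_mm = 1e3 * C / 60e9 / 2
row_length_mm = 3 * half_wavelength_mm

print("main beam at", peak_deg, "degrees, peak |AF| =", af.max())
print("half wavelength:", half_wavelength_mm, "mm, row length:", row_length_mm, "mm")
```

The four contributions add in phase only towards the steered direction, where the field magnitude is four times that of a single element. At 60GHz the half-wavelength spacing is about 2.5mm, so a row of four elements spans roughly 7.5mm, consistent with a 4x4 array fitting in 1cm<sup>2</sup>.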

#### 2.1.2 Penetration and reflectivity

One of the most prominent facts about signal propagation at 60GHz is that it generally exhibits high penetration losses and poor reflectivity on common building materials. Some penetration losses from the literature are presented in Table 2-I. Also, as mentioned in [Maltsev, 2014], propagation through diffraction is a practically unviable mechanism. The transmitted power reaches the receiver through the Line-of-Sight path or through signal reflections, as seen in Figure 2-3. The latter are expected to sustain substantial attenuation depending on the reflectivity of the material. As derived from the measurements of [Langen, 1994], the reflection loss is associated with the thickness as well as the roughness of the material surface and could reach a few tens of decibels for common building materials. Therefore, it is expected that for indoor environments, the practical use model will be receiving the signal from the direct path, or worse, by locking on to a low-order reflection (i.e. after one or two consecutive interactions with the surroundings). The latter will demand a more austere link budget. For this reason, in order to minimize losses, the antenna beam should be steered towards the transmitter or the point of reflection.

Fig. 2-3 3D model of conference room highlighting reception through 1<sup>st</sup> and 2<sup>nd</sup> order reflections from the ceiling and walls [Maltsev, 2010].

Table 2-I Penetration losses of common building materials, from the literature.

| Building Material | Penetration Loss |
|-------------------|------------------|
| Glass             | 1.7-4.5 dB       |
| Plasterboard wall | 5.4-8.1 dB       |
| Wooden panel      | 7 dB             |
| Limestone         | >30 dB           |
| Concrete          | >30 dB           |
| Metalized glass   | >30 dB           |
| Granite           | >30 dB           |

#### 2.1.3 Multipath propagation and Inter-Symbol Interference

From the previous descriptions we can surmise that the power reaching the receiver will be the sum of radiation coming from different directions. This *multipath propagation* is a general characteristic of wireless channels and refers to the existence of a multitude of propagation paths from the transmitter to the receiver. The signal can be reflected, diffracted or scattered along its way [Molisch, 2011]. It is already clear that signal propagation at 60GHz tends to be spatially confined, especially indoors, so a large number of *multipath components* (essentially echoes of the signal) might arrive at the receiver. These components, following different trajectories, arrive at the receiver with different delay times. The extent of these arrivals in time constitutes the *delay spread* of the channel.



Fig. 2-4 Multipath propagation and channel impulse response from [Molisch, 2011]. Echoes of the signal arrive through different paths spreading the channel impulse response in time.

We can instantiate exactly this situation by taking a snapshot of the amplitude and phase of the received components after transmitting an impulse to provide the channel impulse response realization (IR throughout this text) as seen in Figure 2-4. In an ideal channel case the response should be a single impulse, however, what we have is a series of multipath components corresponding to the different propagation paths, as depicted. These multipath components are the result of scaling and phase shifting of the original impulse, sustained throughout propagation.
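A minimal sketch of such an impulse response realization is a list of complex gains and delays, one pair per multipath component. The tap values below are hypothetical, chosen only for illustration and not taken from any measurement or channel model. Besides the extent of the arrivals in time, the sketch also computes the power-weighted RMS delay spread, a commonly used summary metric that is added here as an assumption and is not part of the definition given in the text.

```python
import numpy as np

def delay_spread(delays_ns, gains):
    """Return (extent, mean excess delay, RMS delay spread), all in ns."""
    delays_ns = np.asarray(delays_ns, dtype=float)
    power = np.abs(np.asarray(gains)) ** 2      # power of each multipath component
    weights = power / power.sum()               # normalised power delay profile
    mean_delay = np.sum(weights * delays_ns)
    rms = np.sqrt(np.sum(weights * (delays_ns - mean_delay) ** 2))
    extent = delays_ns.max() - delays_ns.min()  # first to last arrival
    return extent, mean_delay, rms

# Hypothetical realization: a LOS component followed by three scaled,
# phase-shifted echoes.
delays_ns = [0.0, 3.0, 7.5, 12.0]
gains = [1.0,
         0.40 * np.exp(1j * 2.1),
         0.25 * np.exp(-1j * 0.7),
         0.10 * np.exp(1j * 1.3)]

extent, mean_delay, rms = delay_spread(delays_ns, gains)
print(f"extent {extent:.1f} ns, mean {mean_delay:.2f} ns, RMS {rms:.2f} ns")
```

An ideal single-impulse channel gives zero spread under both measures, whereas any echo with non-negligible power increases them.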

If there is visual contact between the receiver and the transmitter, the maximum amplitude component corresponds to what is known as the LOS path, which always arrives ahead of all others. This is because it covers the shortest distance and sustains minimum interaction with the environment. In a situation where no visual contact is established, the highest amplitude component corresponds to the most powerful reflection, due to the signal bouncing off a reflective surface. In either case, this component represents the reference with which the receiver will synchronize. This component is called *the cursor*.

When receiving a symbol, if the delay spread is longer than the symbol duration, Inter-Symbol Interference is observed, which potentially leads to the loss of data. Figure 2-5 details the situation. A data stream is sent by the transmitter (Tx) in the top waveform. The data propagate through the multipath channel and arrive arbitrarily scaled, delayed and phase-shifted at the receiver. The examples of three path waveforms are displayed.



Fig. 2-5 Demonstrating ISI generated by the superposition of three components at the receiver input [Molisch, 2011].

These arriving waveforms, noted as Paths 1-3, are summed at the receiver. Upon reception the receiver should take a decision about the symbol arriving. The simplest way is applying an amplitude threshold for comparison. So, as the amplitude of the received symbol is compromised by interfering symbols, the decision could be erroneous, resulting in data loss.
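The effect can be reproduced with a short noise-free simulation. The sketch below assumes binary antipodal symbols, three real-valued paths whose delays are exact multiples of the symbol period, and a decision threshold at zero. These are simplifications for illustration, and the tap values are hypothetical.

```python
import numpy as np

def receive(bits, taps):
    """Send +/-1 symbols through a symbol-spaced multipath channel and
    decide each received sample with a simple amplitude threshold at zero."""
    symbols = 2 * bits - 1                              # map {0, 1} to {-1, +1}
    rx = np.convolve(symbols, taps)[:len(symbols)]      # superposition of the paths
    return (rx > 0).astype(int)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=10_000)

ideal_taps = [1.0, 0.0, 0.0]   # single path: the cursor only
isi_taps = [1.0, 0.7, 0.5]     # cursor plus two strong delayed echoes

errors_ideal = int(np.sum(receive(bits, ideal_taps) != bits))
errors_isi = int(np.sum(receive(bits, isi_taps) != bits))
print("errors without ISI:", errors_ideal)
print("errors with ISI:   ", errors_isi)
```

With these taps the two echoes together (0.7 + 0.5) outweigh the cursor, so whenever the two preceding symbols both oppose the current one the threshold decision is wrong even though no noise is present.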

The use of directive antennas and beamforming, however, shortens the delay spread, because it permits only a part of the multipath components to arrive at the receiver with sufficient power. Unfortunately, in the case of the 60GHz channel, as seen in experimental results (e.g. [Xu, 2002]), multipath spreads of several tens of nanoseconds are expected for common indoor environments (living room, conference room, cubicle), even with beamforming applied at both transmitter and receiver.

ISI is the cause of *irreducible errors* that cannot be remedied by simply increasing the power in the transmitter. Therefore, dedicated functions are needed to confront the situation. These measures can be the application of OFDM (Orthogonal Frequency-Division Multiplex) or the employment of equalizers. We will review these further, after summarizing some channel modeling aspects.



Fig. 2-6 60GHz channel impulse response structure from [Maltsev, 2014].

#### 2.1.4 Features of channel modeling

Although the impulse response is sufficient to describe the behavior of linear time-invariant systems, this is not the case for a 60GHz channel. As already mentioned, we refer to impulse response *realizations* to hint at the fact that the communication channel depends on the environment. It is commonly modeled as a linear time-variant system to reflect relatively fast changes in the environment, for instance in the case of mobility.

Analyzing the different impairments that the signal undergoes in a deterministic way is usually complicated and yields little accuracy. So, most of the time, the goal is to describe the probability of a certain parameter attaining a given value. This is done by describing the related phenomena with statistical distributions. For instance, the Rayleigh and the Rician distributions are used to describe the fluctuations for *multipath fading* in the 60GHz channel [Molisch, 2011].
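As a hedged illustration of this statistical view, the sketch below draws fading envelopes from the textbook Rayleigh and Rician models: a fixed dominant component plus a complex Gaussian scattered part, normalized to unit mean power. The function `fading_envelope`, the K-factor of 10 and the deep-fade threshold of 0.3 are assumptions chosen for the example, not values from the cited channel models.

```python
import numpy as np

def fading_envelope(n, k_factor, rng):
    """Draw n envelope samples with unit mean power.

    k_factor is the linear ratio of dominant (e.g. LOS) power to scattered
    power; k_factor = 0 yields a Rayleigh envelope, k_factor > 0 a Rician one.
    """
    dominant = np.sqrt(k_factor / (k_factor + 1.0))
    sigma = np.sqrt(1.0 / (2.0 * (k_factor + 1.0)))
    scattered = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return np.abs(dominant + scattered)

rng = np.random.default_rng(1)
rayleigh = fading_envelope(200_000, k_factor=0.0, rng=rng)
rician = fading_envelope(200_000, k_factor=10.0, rng=rng)

for name, env in (("Rayleigh", rayleigh), ("Rician K=10", rician)):
    print(name,
          "mean power:", round(float(np.mean(env ** 2)), 3),
          "P(envelope < 0.3):", float(np.mean(env < 0.3)))
```

Both envelopes carry the same average power, but the presence of a dominant component makes deep fades far less likely in the Rician case.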

Besides its multipath character, the 60GHz channel has also been reported to exhibit a clustered impulse response structure, seen in Figure 2-6. In an ideal reflection, a single ray associated with the reflection should be received. However, the multipath components corresponding to reflection paths appear to form clusters. In channel characterization experiments [Sawada, 2010], *clustering* has been attributed to the fine structure of reflecting surfaces. Statistical distributions are used to describe intra-cluster and inter-cluster characteristics such as cluster amplitude and timing.

Total signal amplitude may vary because of *shadowing* from objects preventing the signal from reaching the receiver, or even because of slight movement. For a better understanding of what the latter entails, a movement of the receiver by a few millimeters might turn interference from constructive to destructive in the 60GHz band. At 2GHz, the corresponding distance would be around 10cm. Additionally, when signal propagation is blocked, some components of the impulse response could be removed, or even more clusters could be added because of additional reflections. This effect is taken into account by the introduction of the *cluster blockage probability* associated with each type of cluster.
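The orders of magnitude quoted above follow from the free-space wavelength. The short sketch below is an illustration only: it assumes two rays that initially add in phase, so that a change of half a wavelength in their path difference makes them cancel. The function names are hypothetical.

```python
# Speed of light in vacuum, in metres per second.
C = 299_792_458.0

def wavelength_m(freq_hz):
    """Free-space wavelength for a carrier frequency given in Hz."""
    return C / freq_hz

def flip_distance_m(freq_hz):
    """Path-difference change that turns constructive into destructive interference.

    Two rays that add in phase cancel once their path difference changes by
    half a wavelength (a phase shift of pi).
    """
    return wavelength_m(freq_hz) / 2.0

for f_ghz in (60.0, 2.0):
    f_hz = f_ghz * 1e9
    print(f"{f_ghz:5.1f} GHz: wavelength = {wavelength_m(f_hz) * 1e3:6.1f} mm, "
          f"half wavelength = {flip_distance_m(f_hz) * 1e3:5.1f} mm")
```

At 60GHz the half wavelength is about 2.5 mm, while at 2GHz it is about 7.5 cm, which is the same order of magnitude as the figure quoted in the text.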

Another key point is that the channel has been verified by multiple measurements to have a *quasi-optical nature*. For this reason ray-tracing can be employed as a technique to predict the channel behavior and aid in spatial and temporal modeling [Maltsev, 2014].

In addition to the above, propagation has been found to be sensitive to antenna *polarization mismatches* between the receiver and the transmitter. Polarization is a property of EM waves describing the orientation of the electric and magnetic fields in space and time. Experimental proof of the strong polarization impact on 60 GHz WLAN systems was given in [802.11ad\_model, 2010]. In [Maltsev, 2010], losses as large as 10-20dB are attributed to polarization mismatches. The most robust arrangement was measured when horizontal polarization is used in the transmitter and circular in the receiver (or vice versa). This approach limits the degradation due to polarization mismatch to moderate values (2–3 dB).

#### 2.1.5 Simulating 60GHz channels

Models for the simulation of communication over the 60GHz band have recently become available under the auspices of the standardization efforts for the 802.15.3c Wireless Personal Area Networks (WPANs) [TG3c, 2009] as well as the 802.11ad Wireless Local Area Networks (WLAN) [TGad\_model\_doc, 2010]. Although both models have been used for simulation throughout this thesis, the results presented in this and later chapters have been produced with the release of the 802.11ad model, which was issued during this thesis work period and is considered the state of the art for reference purposes.


Fig. 2-7 Impulse response realizations created by the 802.11ad model for a living room setting with (a1, a2) different Tx-Rx distance and (b1, b2) different Rx HPBW. The total power of all cluster rays is normalized to one.

In the 802.11ad model used for simulations [TGad\_model, 2010] there are three available environment options: living room, cubicle and conference room. Prior to generating impulse responses, a configuration file is set up. This file enables manual tuning of various parameters such as sampling rate, distance, probability of reflection blockers, antenna type and polarization. In Figure 2-7(a1, a2) the effect of distance is shown. As the distance increases, the cursor diminishes due to free space path loss. In the examples (b1, b2) the half-power beam-width (HPBW) is varied. With the smaller angle, spatial filtering is essentially performed, so the energy received from other directions due to reflections is close to zero.



Fig. 2-8 Tapped delay line channel model in [Proakis, 2007].

In order to further investigate system performance, a simulation test-bench is built in which the channel is represented as a tapped-delay line. To emulate multipath propagation, the coefficients are set by the channel impulse response realizations produced by the model in [TGad\_model, 2010]. Variation of the Signal to Noise Ratio (SNR) is achieved by adding noise to the output<sup>3</sup>. This setup, displayed in Figure 2-8, was used to perform Bit Error Rate simulations and evaluate receiver performance.
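A test-bench of this kind can be sketched in a few lines. The example below is a minimal illustration and not the simulation environment used in this work: it assumes BPSK symbols, symbol-spaced taps, a plain sign detector without equalization, and an SNR defined relative to unit symbol power. The tap values are hypothetical placeholders rather than realizations of the 802.11ad model.

```python
import math
import random

def tapped_delay_line(symbols, taps):
    """Convolve the symbol stream with symbol-spaced channel taps."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * symbols[n - k]
        out.append(acc)
    return out

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise; SNR is taken relative to unit symbol power."""
    sigma = math.sqrt(10.0 ** (-snr_db / 10.0))
    return [s + rng.gauss(0.0, sigma) for s in signal]

def ber_bpsk(taps, snr_db, n_bits=20000, seed=1):
    """Bit error rate of an unequalized BPSK link with a sign detector."""
    rng = random.Random(seed)
    bits = [rng.choice((-1.0, 1.0)) for _ in range(n_bits)]
    rx = add_awgn(tapped_delay_line(bits, taps), snr_db, rng)
    errors = sum(1 for b, r in zip(bits, rx) if (r >= 0.0) != (b > 0.0))
    return errors / n_bits

# Hypothetical channels: a single path, and a cursor followed by two echoes.
for name, taps in (("flat", [1.0]), ("multipath", [1.0, 0.6, 0.3])):
    print(name, ber_bpsk(taps, snr_db=10.0))
```

Sweeping `snr_db` for a fixed set of taps yields a BER curve for one channel realization; a receiver under test would replace the sign detector.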

# 2.2 Equalization for 60GHz channel receivers

Multipath fading is considered a source of irreducible errors in signal transmission. This means that it cannot be mitigated by increasing the transmit power or reducing the receiver noise figure. So, the receiver should be built accordingly in order to combat the delay-dispersive environment that the 60GHz channel imposes. The measures usually taken are the appropriate choice of a modulation scheme or the use of equalization.

Equalization in its most straightforward form (the linear equalizer) involves the application of a filter in the receiver which aims at inverting the channel impulse response. Effectively, this removes all impairments that stem from the channel. The situation, however, becomes more complicated in the presence of noise. As shown in Figure 2-9, when compensating for a fade, applying exactly the inverse transfer function, which is known as the zero forcing (ZF) criterion, will amplify any noise as well. The situation worsens when frequency nulls are present: an inverting filter applies its largest gain, and hence its strongest noise amplification, at the nulls [Proakis, 2007]. A solution is to apply a minimum mean square error (MMSE) criterion. This minimizes the combined contribution of noise and residual ISI at the receiver, rather than trying to eliminate ISI entirely by inverting the channel.

<sup>3</sup> This can also be carried out by passing the signal through an Additive White Gaussian Noise (AWGN) channel.

*Fig. 2-9 By applying an MMSE criterion the error between the transmitted signal and the output of the equalizer is minimized.*
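The difference between the two criteria can be illustrated at a single frequency. The sketch below uses the textbook per-frequency forms, 1/H for ZF and H*/(|H|² + 1/SNR) for MMSE, under the assumption of unit-power symbols; the channel gains and the SNR are hypothetical values chosen only to show the trend.

```python
import math

def zf_gain(h):
    """Zero-forcing gain: the exact inverse of the channel gain h."""
    return 1.0 / h

def mmse_gain(h, snr_linear):
    """MMSE gain for channel gain h, assuming unit-power symbols."""
    return h.conjugate() / (abs(h) ** 2 + 1.0 / snr_linear)

def noise_gain_db(w):
    """Noise power amplification caused by an equalizer gain w, in dB."""
    return 20.0 * math.log10(abs(w))

snr = 100.0  # 20 dB, hypothetical operating point
for h in (1.0, 0.3, 0.1, 0.01):  # hypothetical channel gains, deepening fade
    print(f"|H| = {h:5.2f}: ZF {noise_gain_db(zf_gain(h)):6.1f} dB, "
          f"MMSE {noise_gain_db(mmse_gain(h, snr)):6.1f} dB")
```

As the fade deepens, the ZF gain grows without bound, whereas the MMSE gain saturates and then decreases, accepting some residual ISI in exchange for less noise enhancement.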

Equalization can be performed either in the time or in the frequency domain. Time-domain implementations using mixed-signal design have recently been suggested as a low power solution suitable for mobile terminal integration in 60GHz systems [Sobel, 2009]. In the rest of this section, after reviewing equalization in general and reflecting on system integration in terms of power consumption, we will review the relevant state of the art in mixed-signal equalization.

#### 2.2.1 OFDM and Frequency domain equalization

Dealing with the effect of channel multipath and ISI in the receiver can be done simply by lowering the data rate. This would require establishing a symbol duration that is substantially longer than the delay spread of the channel. The transmitted symbol would then last longer than the effect of its last echo. This way the distorting effect of this echo would only affect the symbol currently transmitted and not any other. To bring some insight, let us visualize the eye diagrams in Figure 2-10 coming from three cases: (i) no interference, (ii) one single echo sized 0.3x the cursor arriving during symbol time, and (iii) one single echo arriving one symbol time later. Distortion by components within symbol time might alter the eye shape, but a simple shift in the decision phase could still maximize performance. Things are not as straightforward with the inter-symbol interference of case (iii). The cumulative effect of multiple paths contributing to the distortion will cause the eye to shut completely. Extra countermeasures, namely dedicated equalizers, then need to be taken.

Fig. 2-10 Eye diagram with (i) no multipath distortion (ii) one path arriving within symbol time (iii) one path arriving one symbol time later than the previous case.

From the above, it is easy to understand that multi-carrier schemes, which inherently use long symbol times in order to achieve multiplexing, are resistant to the effect of multipath propagation. The known drawbacks, however, have to do with power consumption. OFDM is costly for the transmitter power amplifier due to its high *Peak to Average Power Ratio*. Additionally, it requires more complex processing in the digital baseband than simpler, single carrier schemes. In the work of [Mitomo, 2012], 320mW are needed for the OFDM/QPSK digital baseband implementing a 64-point I/FFT block for a short range/one-to-one transceiver design. A similar proximity transceiver demonstrated in [Saigusa, 2014] integrates a smaller 16-point I/FFT, reporting a total of 125mW for the baseband. Nonetheless, by combining OFDM with very short distance links, the 60GHz channel multipath is substantially reduced and the need for elaborate equalization schemes is obviated.

On the contrary, when choosing to apply a single carrier modulation scheme, to benefit from the less stringent linearity constraints in the front-end, frequency domain equalization is a choice that can actually coexist with OFDM. Single Carrier Frequency Domain Equalization (SC-FDE) delivers similar performance to that of an OFDM system with essentially equivalent complexity [Falconer, 2002]. However, the power amplifier of an SC transmitter requires a smaller linear range to support a given average power. This is why in mobile radios (LTE) the common practice is to use OFDM for the downlink and SC-FDMA for the uplink, in order to save battery power [Korhonen, 2014].

Fig. 2-11 Principle of a SC-FDE linear equalizer as seen in [Saito, 2013].

The principle of a linear, frequency domain equalizer is presented in Figure 2-11. The aim of inverting the channel's frequency response is implemented in the frequency domain, so an FFT is performed on the received signal and the output is scaled accordingly to perform the inversion. This is followed by an inverse FFT operation to return to the time domain and detect the symbol. State of the art FDEs are summarized in Table 2-II. The SC-FDE equalizer in [Hsiao, 2011] consumes 208mW at 7Gbps with 16QAM, while in [Saito, 2013] the FDE consumes 116 mW for the same rate.

*Table 2-II State of the art Frequency Domain Equalizers*

|                           | [Saito, 2013]         | [Hsiao, 2011] | [Yeh, 2011] |
|---------------------------|-----------------------|---------------|-------------|
| Process [nm]              | 40                    | 65            | 65          |
| Symbol Rate [GS/s]        | 1.76                  | 1.76          | 1.76        |
| FDE [GS/s]                | 3.52 (x2 oversampling) | 1.76          | 1.76        |
| Core clock [MHz]          | 220                   | 440           | 330         |
| FFT                       | 128pt (50% overlap)   | 512pt         | 512pt       |
| IFFT                      | 64pt                  | 512pt         | 512pt       |
| Gate count (Data path)    | 284k                  | 522k          | 1723k       |
| Gate count (Channel Est.) | 403k                  | 76k           |             |
| Power (Data path) [mW]    | 116                   | 208           | 211         |
| Max. excess delay [ns]    | 18                    | 36            | 36          |
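The FFT, scaling and inverse FFT chain can be sketched as follows. This is a simplified illustration and not the architecture of the cited designs: it assumes a cyclic prefix so that the channel acts as a circular convolution on each block, a noiseless link, a perfectly known channel, and a plain zero-forcing scaling. A direct DFT is used instead of a fast algorithm to keep the example self-contained, and the two-path channel is hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_channel(block, taps):
    """Channel acting on one block, assuming a cyclic prefix makes it circular."""
    n = len(block)
    return [sum(h * block[(i - k) % n] for k, h in enumerate(taps))
            for i in range(n)]

def fde_zero_forcing(rx_block, taps):
    """FFT, division by the channel frequency response, inverse FFT."""
    n = len(rx_block)
    h_freq = dft(list(taps) + [0.0] * (n - len(taps)))
    rx_freq = dft(rx_block)
    eq_freq = [r / h for r, h in zip(rx_freq, h_freq)]
    return [v.real for v in dft(eq_freq, inverse=True)]

tx = [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0]  # one BPSK block
taps = [1.0, 0.5]                                  # hypothetical two-path channel
rx = circular_channel(tx, taps)
print([round(v, 6) for v in fde_zero_forcing(rx, taps)])
```

With noise present, the division by the channel response would be replaced by an MMSE scaling to limit noise enhancement at spectral fades.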

#### 2.2.2 Time domain equalization

Time domain equalization on the receiver side involves cancelling the channel echoes by directly filtering incoming symbols. In Figure 2-12 a linear time-domain equalizer is seen, which is implemented as an FIR filter. Such filters usually work with symbol-time delays (symbol-spaced equalizer). The channel is non-static, so the coefficients, $b_i(k)$, are variable and follow the channel changes. This is usually done by interposing a training mode, which sets the coefficients by comparing incoming data with a pilot, i.e. a priori known data. By using an adaptation algorithm the coefficients are periodically updated in order to follow the channel changes.
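As an illustration of the training mode, the sketch below adapts the coefficients of a symbol-spaced FIR equalizer on a known pilot. The text does not name a specific adaptation algorithm; the least mean squares (LMS) update is used here as one common choice, and the step size, number of taps and two-path channel are hypothetical.

```python
import random

def lms_train(rx, pilot, n_taps=5, mu=0.02):
    """Adapt symbol-spaced FIR coefficients b_i on a known pilot sequence."""
    b = [0.0] * n_taps
    for n in range(len(rx)):
        # Most recent received samples, newest first, zero-padded at the start.
        window = [rx[n - i] if n - i >= 0 else 0.0 for i in range(n_taps)]
        y = sum(bi * ri for bi, ri in zip(b, window))   # equalizer output
        err = pilot[n] - y                              # error against the pilot
        b = [bi + mu * err * ri for bi, ri in zip(b, window)]
    return b

rng = random.Random(0)
pilot = [rng.choice((-1.0, 1.0)) for _ in range(4000)]
# Hypothetical noiseless channel: cursor plus one echo of relative amplitude 0.4.
rx = [pilot[n] + (0.4 * pilot[n - 1] if n > 0 else 0.0) for n in range(len(pilot))]
print([round(c, 3) for c in lms_train(rx, pilot)])
```

For this channel the coefficients settle close to the truncated inverse 1, -0.4, 0.16, ..., which shows how the filter length must grow with the duration of the echoes to be cancelled.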

Traditionally this kind of equalization has been regarded as inefficient when having to deal with the long delay spreads that are common in broadband wireless communications.



Fig. 2-12 Time-domain linear equalizer diagram from [Boccuzzi, 2008].

The reason is mainly the increasing complexity, because the number of operations per unit interval grows linearly with respect to both the delay spread and the symbol rate [Pancaldi, 2008]. This means that as the delay spread or the symbol rate increases, a larger number of multipliers and summers will be needed to implement the required filter length.

Straightforward equalizer implementations of high speed FIR filters have been very common in the equalization of read channels in magnetic and optical disk drives [Rylov, 2001], [Tierno, 2002]. In [Singh, 2010] a 6-bit 10-tap FIR equalizer is implemented with distributed arithmetic as shown in [Pearson, 1995]. The design features an asynchronous pipeline, is built in 0.18µm and consumes 500mW at its maximum throughput of 1.3GS/s.

As for the 60GHz communications realm, an 8-tap FIR filter is reported to be used as an equalizer for the transceiver in [Okada, 2013]. Working with a symbol rate of 1.6GS/s, it can correct a delay spread of 5 ns, which, as described above, implies a very short link distance.
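The relation between filter length, delay spread and symbol rate used in this subsection can be checked with a short calculation. The sketch assumes a symbol-spaced, direct-form FIR filter with one multiply-accumulate per tap and per symbol; the function names are hypothetical.

```python
import math

def taps_needed(delay_spread_ns, symbol_rate_gsps):
    """Symbol-spaced taps needed to span a given delay spread."""
    # A small tolerance guards against floating-point round-off at exact multiples.
    return math.ceil(delay_spread_ns * symbol_rate_gsps - 1e-9)

def gmacs_per_second(delay_spread_ns, symbol_rate_gsps):
    """Multiply-accumulate rate of a direct-form FIR, in units of 1e9 per second."""
    return taps_needed(delay_spread_ns, symbol_rate_gsps) * symbol_rate_gsps

# The 8-tap, 1.6GS/s case quoted above, then a hypothetical 32 ns spread at 1.76GS/s.
for spread_ns, rate in ((5.0, 1.6), (32.0, 1.76)):
    print(f"{spread_ns:4.1f} ns at {rate} GS/s: {taps_needed(spread_ns, rate)} taps, "
          f"{gmacs_per_second(spread_ns, rate):.1f} GMAC/s")
```

The first case reproduces the 8 taps quoted for a 5 ns delay spread at 1.6GS/s, while the second shows how quickly the filter length grows for longer delay spreads.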

# 2.3 Decision feedback equalization

As mentioned in [Belfiore, 1979], the decision feedback equalizer (DFE) came out of the interest to provide bit-by-bit detection with significant performance advantages over the linear equalizer. It has been largely employed since its original conception and many sorts of implementations ranging from fully analog to fully digital and mixed signal have been proposed. The DFE has been one of the standard choices for time domain equalization in digital communications, especially due to its straightforward design and performance.

The typical structure of a DFE is shown in Figure 2-13. The concept of using past decisions for received bits in order to combat ISI was initially introduced in [Austin, 1967]. The received signal, r(t), is processed with a linear feed-forward filter (FFF). The output is summed with a filtered version of previous symbol decisions before the decision is taken with a threshold device.

The DFE works under the assumption that the channel is known and that the decisions on incoming symbols are correct. With the use of the feedback filter the impact of *current* decisions on *future* bits is cancelled. This is modeled as canceling the *post-cursor* components of the channel impulse response. To bring more insight, in a situation of visual contact, namely the Line-Of-Sight (LOS) response, we would only have to deal with post-cursor ISI, because the useful signal, i.e. the cursor, would only be impacted by previous decisions. In Non-Line-Of-Sight (NLOS) the link is established through a reflection, and the time reference is shifted. This reference shift actually enables filtering out pre-cursor interference, which would otherwise be achievable only with a non-causal filter.

Fig. 2-13 Typical structure of a decision feedback equalizer, presented along with a channel impulse response. The feedback filter (FBF) can correct only post-cursor ISI.

Performance enhancement comes from the fact that the removal of post-cursor ISI is carried out by applying a decision on the incoming data, so there is no additive noise in the feedback. That is the reason the DFE exhibits a lower error probability than the linear equalizer. The forward filter in the DFE is basically set up to remove only the pre-cursor interference, which cannot be handled by the feedback filter.

A problem associated with the DFE, however, is that each wrong decision further increases the probability of subsequent errors, which leads to bursts of errors. This phenomenon, known as error propagation, is exacerbated by increasing the number of taps and the associated tap weights. However, it has been shown not to play a significant role, especially when equalization is assisted by coding [Molisch, 2011].

Regarding tap assignment for the DFE, minimizing the mean square error will lead to a different implementation than the one in a linear filter. That is because in this case, the post-cursor interference is removed with no additive noise contributions. In the absence of pre-cursor interference, subtracting ISI can be performed with the ZF criterion in a straightforward manner, by simply setting the feedback taps so that they subtract the post-cursor coefficients of the channel. When pre-cursor interference exists, as in an NLOS condition, a noise whitening filter is the optimum forward filter to establish a purely causal equivalent channel [Proakis, 2007].
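The post-cursor cancellation described in this section can be sketched for the case without pre-cursor interference. The example is an illustration under simplifying assumptions: BPSK symbols, a noiseless symbol-spaced channel normalized to a unit cursor, perfect channel knowledge and no feed-forward filter. The channel coefficients are hypothetical and chosen so that the eye is closed without equalization.

```python
def channel(symbols, taps):
    """Noiseless symbol-spaced channel; taps[0] is the cursor."""
    return [sum(h * symbols[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(symbols))]

def slicer(rx):
    """Threshold device alone, with no equalization."""
    return [1.0 if r >= 0.0 else -1.0 for r in rx]

def dfe_detect(rx, post_cursor):
    """Feedback-only DFE with ZF taps.

    post_cursor[k] is the channel coefficient k+1 symbols after the cursor.
    """
    decisions = []
    for sample in rx:
        isi = 0.0
        for k, c in enumerate(post_cursor):
            if len(decisions) > k:
                isi += c * decisions[-1 - k]   # echo of a past decision
        corrected = sample - isi               # subtract post-cursor ISI
        decisions.append(1.0 if corrected >= 0.0 else -1.0)
    return decisions

bits = [-1.0, -1.0, 1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0]
rx = channel(bits, [1.0, 0.7, 0.5])         # echoes sum to 1.2 > cursor
print(slicer(rx) == bits)                   # False: the eye is closed
print(dfe_detect(rx, [0.7, 0.5]) == bits)   # True: post-cursor ISI removed
```

Because the subtraction uses decided symbols rather than noisy samples, no noise is fed back; the price is that a wrong decision would be fed back as well, which is the error propagation mechanism mentioned above.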

#### 2.3.1 Employing the DFE: from Gigabit wired to Gigabit wireless

The decision feedback equalizer has been employed in communication systems for many years and has been especially popular in the implementation of equalizers for high-speed wired links. Wired channels such as PCB traces exhibit a much smaller delay spread than the 60GHz channel. This means that managing the post-cursor distortion of the wired link channel (Figure 2-14) demands a smaller number of taps. This is suited to the error probability performance characteristics of the DFE, as mentioned in the previous section.



Fig. 2-14 Backplane channel frequency and impulse response. At very high data rates such as 25Gbps, post-cursor ISI spans 15 taps [Bulzacchelli, 2013]. For a rate of ~2 Gbps one or two taps would suffice.



Fig. 2-15 Evolution of DFE summers in wired links. (a) Resistively loaded CML summer. (b) Current-integrating summer. (c) Sampled current-integrating summer. (d) Peaking current-integrating summer from [Bulzacchelli, 2013].

The most straightforward DFE implementation approach is the digital one. The DFE is implemented directly in a Digital Signal Processor (DSP) after digitization. The fully digital paradigm in wired links includes works such as [Spagna, 2010], with a 78mW, 4-tap, 11GS/s DFE implemented in 32nm CMOS, and [Toifl, 2014], in which a low power 16GS/s system is presented, featuring an 8-tap quarter-rate DFE with 1.6pJ/bit efficiency (~25mW).

High efficiency wire-line DFEs, which are commonly realized in mixed-signal topologies, have set an example for 60GHz channel equalization. In such implementations the common strategy is to implement the summation part of the topology with an analog circuit. As bandwidths have risen over the years, many techniques have been used, as seen in Figure 2-15, starting from conventional current summation and moving to current integration, sampling and finally peaking. A common rule of thumb for efficiency in wire-line designs is around 1mW/Gbps [O'Mahony, 2010]. The latest DFE designs targeting multi-Gbps rates have showcased rates such as 28Gbps in [Bulzacchelli, 2012] or the 25Gbps, 0.232mW/Gbps charge steering two-tap DFE in [Jung, 2014].



Fig. 2-16 Mixed signal equalization relieves the power consumption of the dominating DSP block.

The order of magnitude of data rates in wired-links matches the expected order of magnitude in 60GHz wireless links. So the DFE, as we will see in the next section, has drawn attention for potential low-power applications in wireless receivers.

# 2.4 State of the art mixed-signal DFEs in 60GHz receiver basebands

Evaluating the current state of the art of the equalization solutions seen so far, it becomes apparent that OFDM/FDE solutions will most likely require a power consumption in the order of multiple hundreds of mW. This is added to an already power-hungry baseband PHY chain, especially when using LDPC decoders<sup>4</sup>. Overall, as mentioned in [Thakkar, 2012], the DSP consumption is likely to dominate the baseband, given that the power of the high speed, high resolution ADCs needed in the front-line (Figure 2-16) has recently been scaled down to a few mW. Following the efficiency paradigm of wired links, mixed-signal DFEs have lately been proposed as low power alternatives aiming to off-load consumption from the DSP part. However, as [Park, 2011] acknowledges, historically, blocks requiring high speed are implemented in analog until device scaling catches up and digital implementations dominate.

<sup>4</sup> Low Density Parity Check decoders are highly demanding in terms of power consumption and they are commonly designed with aggressive supply scaling and, more recently, by applying body bias techniques to further reduce leakage power [Weiner14].



Fig. 2-17 Mixed-signal baseband in [Sobel, 2009].

In [Sobel, 2009], a single carrier system was advocated in order to achieve low power operation. It is an intuitive choice, because the 60GHz band provides ample bandwidth to employ low order modulation techniques without the need for the circuit complexity of more spectrally efficient schemes. This has also been validated by the [802.11ad, 2012] standard, which prescribes QPSK and BPSK in its low power modes. On the receiver side, the paradigm of software defined radio dictates direct digitization after antenna reception. Nevertheless, the benefits of digital flexibility come at the cost of more demanding circuit requirements. In the aforementioned work, the mixed-signal receiver baseband of Figure 2-17 is proposed, where analog synchronization coexists with digital control. This scheme aims at alleviating the complexity of the ADC by preconditioning the demodulated signal with the synchronization circuit. The included DFE is implemented in current mode. Each tap is represented as a switchable source that draws current from a common rail. The design achieves a data rate of 1Gb/s and consumes 14mW for the DFE.

In [Park, 2011] another attempt to employ an analog summer is reported. The proposed 1.76GS/s DFE was designed with a resistively loaded CML summer in the input as in Figure 2-15, case (a). The output was connected to six current steering switches implementing the summation for the DFE feedback taps.



Fig. 2-18 Cascode summation in [Thakkar, 2012].

In the work of [Thakkar, 2012], the problem of capacitive loading in the current mode summer was addressed. Evidently, as the number of taps (current steering switches) increases, the associated capacitance hanging from the summation node increases as well. This lengthens the settling time and limits the maximum rate. The solution given was to interpose a cascode transistor as seen in Figure 2-18. The capacitance of the cascode transistor is much smaller than the one related to the sum of tap coefficients, so the loading of the summing amplifier is reduced. This technique enabled low-power designs with large numbers of taps: a 40-tap design at 4GS/s with 1.4pJ/bit efficiency in [Thakkar, 2012] and a current integrating 100-tap DFE design with reported 6.9 pJ/bit efficiency in [Thakkar, 2014].

Reviewing the aforementioned approaches in terms of dealing with the relatively long delay spread of the 60GHz channel, we can comment on the two different ways in which taps are allocated. [Sobel, 2009] aims for a low number of taps with flexible delay assignment (Figure 2-19), whereas [Thakkar, 2012] adopts a non-flexible multi-tap design.



Fig. 2-19 Flexible tap allocation in [Sobel, 2009]

|                                   | [Sobel, 2009]        | [Thakkar, 2012] | [Thakkar, 2014]     |
|-----------------------------------|----------------------|-----------------|---------------------|
| Technology                        | 90nm                 | 65nm            | 65nm                |
| Sample Rate                       | 500MS/s              | 5GS/s           | 1.76GS/s            |
| Modulation                        | MSK                  | QPSK            | QPSK                |
| Targeted BER (coverage analysis)  | 10<sup>-3</sup>      | N/A             | 10<sup>-3</sup>     |
| Connected DFE coeffs / channel    | 8                    | 20              | 50                  |
| Variable delay taps               | Yes                  | No              | No                  |
| Delay spread coverage, total [ns] | 32                   | 4               | 28                  |
| Tap distribution                  | 8 taps/32ns          | 1 tap/200ps     | 1 tap/560ps         |
| Analog summation type             | Current buffer + TIA | Resistive       | Current integrating |
| Power [mW]                        | 14                   | 7               | 7.5                 |
| Efficiency [pJ/bit]               | 28                   | 1.4             | 4.26                |

*Table 2-III State of the art mixed-signal DFEs for 60GHz receivers*

A synopsis of the state of the art performance is found in Table 2-III.
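The efficiency figures of Table 2-III can be cross-checked against the listed power and sample rate. The sketch below assumes that the efficiency is simply the power divided by the sample rate, which is our reading of the table and is not stated explicitly; it relies on the identity 1 mW / (1 GS/s) = 1 pJ.

```python
def energy_per_sample_pj(power_mw, sample_rate_gsps):
    """Energy per sample in pJ: 1 mW / (1 GS/s) = 1 pJ."""
    return power_mw / sample_rate_gsps

designs = {  # (power in mW, sample rate in GS/s), as listed in Table 2-III
    "[Sobel, 2009]": (14.0, 0.5),
    "[Thakkar, 2012]": (7.0, 5.0),
    "[Thakkar, 2014]": (7.5, 1.76),
}
for name, (power_mw, rate_gsps) in designs.items():
    print(f"{name}: {energy_per_sample_pj(power_mw, rate_gsps):.2f} pJ")
```

Under this assumption the three values come out as 28, 1.4 and 4.26 pJ, matching the efficiency row of the table. The same identity makes the wire-line rule of thumb of 1mW/Gbps equivalent to 1 pJ/bit.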

After analyzing the conditions of system deployment for 60GHz communications, in the following chapter, a LOS deployment scenario is proposed. As mentioned, LOS entails only post-cursor inter-symbol interference, so equalization can be performed by the DFE without a forward filter. We propose a flexible methodology with a small number of taps, which makes use of a continuous-time digital delay line and aims at very low power operation.

# 2.5 Chapter Bibliography

[Austin, 1967] Austin, M.E. "Decision-Feedback Equalization for Digital Communication over Dispersive Channels." M.I.T. Research Laboratory of Electronics, August 11, 1967. http://dspace.mit.edu/bitstream/handle/1721.1/4282/RLE-TR-461-04744914.pdf?sequence=1.

[Belfiore, 1979] Belfiore, C.A., and J.H. Park Jr. "Decision Feedback Equalization." *Proceedings of the IEEE* 67, no. 8 (August 1979): 1143–56. doi:10.1109/PROC.1979.11409.

[Boccuzzi, 2008] Boccuzzi, Joseph. Signal Processing for Wireless Communications. New York, NY: McGraw-Hill, 2008. ISBN :978-0-07-148905-8

[Bulzacchelli, 2013] Bulzacchelli, J.F. "Design Techniques for CMOS Backplane Transceivers Approaching 30-Gb/s Data Rates." In *2013 IEEE Custom Integrated Circuits Conference (CICC)*, 1–8, 2013. doi:10.1109/CICC.2013.6658405.

[Bulzacchelli, 2012] Bulzacchelli, J.F., C. Menolfi, T.J. Beukema, D.W. Storaska, J. Hertle, D.R. Hanson, Ping-Hsuan Hsieh, et al. "A 28-Gb/s 4-Tap FFE/15-Tap DFE Serial Link Transceiver in 32-Nm SOI CMOS Technology." *IEEE Journal of Solid-State Circuits* 47, no. 12 (December 2012): 3232–48. doi:10.1109/JSSC.2012.2216414.

[Falconer, 2002] Falconer, David, Sirikiat Lek Ariyavisitakul, Anader Benyamin-Seeyar, and Brian Eidson. "Frequency Domain Equalization for Single-Carrier Broadband Wireless Systems." *Communications Magazine, IEEE* 40, no. 4 (April 2002): 58–66. doi:10.1109/35.995852.

[FCCpart15, 2009] Federal Communications Commission. "CFR-2009-title47-vol1part15.pdf." http://www.gpo.gov/fdsys/pkg/CFR-2009-title47-vol1/pdf/CFR-2009-title47vol1-part15.pdf.

[FCC, 1997] Federal Communications Commission, Office of Engineering and Technology, and New Technology Development Division. "Millimeter Wave Propagation: Spectrum Management Implications." *Bulletin Number 70*, 1997. https://transition.fcc.gov/Bureaus/Engineering\_Technology/Documents/bulletins/oet70/oet70a.pdf.

[Friis, 1946] Friis, H.T. "A Note on a Simple Transmission Formula." *Proceedings of the IRE* 34, no. 5 (May 1946): 254–56. doi:10.1109/JRPROC.1946.234568.

[Hsiao, 2011] Hsiao, Frank, Adrian Tang, Derek Yang, Mike Pham, and Mau-Chung Frank Chang. "A 7Gb/s SC-FDE/OFDM MMSE Equalizer for 60GHz Wireless Communications," 293– 96. IEEE, 2011. doi:10.1109/ASSCC.2011.6123570.

[TG3c, 2009] "IEEE 802.15.3c WPAN Task Group." http://www.ieee802.org/15/pub/TG3c\_CFPdoc&Proposals.html.

[802.11ad, 2012] "IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems–Local and Metropolitan Area Networks–Specific Requirements–Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 3: Enhancements for Very High Throughput in the 60 GHz Band." *IEEE Std 802.11ad-2012 (Amendment to IEEE Std 802.11-2012, as Amended by IEEE Std 802.11ae-2012 and IEEE Std 802.11aa-2012)*, 2012, 1–628. doi:10.1109/IEEESTD.2012.6392842.

[Jung, 2014] Jung, Jun Won, and B. Razavi. "2.4 A 25Gb/s 5.8mW CMOS Equalizer." In *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International,* 44–45, 2014. doi:10.1109/ISSCC.2014.6757330.

[Korhonen, 2014] Korhonen, Juha. *Introduction to 4G Mobile Communications*. Artech House Mobile Communications Series. Boston: Artech House, 2014. ISBN: 978-1-60807-699-4

[Lai, 2008] Lai, Ivan Chee-Hong, and Minoru Fujishima. *Design and Modeling of Millimeter-Wave CMOS Circuits for Wireless Transceivers: Era of Sub-100nm Technology*. Dordrecht: Springer, 2008.

[Langen, 1994] Langen, B., G. Lober, and W. Herzig. "Reflection and Transmission Behaviour of Building Materials at 60 GHz," 505–9. IOS Press, 1994. doi:10.1109/WNCMF.1994.529141.

[Maltsev, 2009] Maltsev, A., R. Maslennikov, A. Sevastyanov, A. Khoryaev, and A. Lomayev. "Experimental Investigations of 60 GHz WLAN Systems in Office Environment." *IEEE Journal on Selected Areas in Communications* 27, no. 8 (2009): 1488–99. doi:10.1109/JSAC.2009.091018.

[Maltsev, 2010] Maltsev, A., E. Perahia, R. Maslennikov, A. Sevastyanov, A. Lomayev, and A. Khoryaev. "Impact of Polarization Characteristics on 60-GHz Indoor Radio Communication Systems." *IEEE Antennas and Wireless Propagation Letters* 9 (2010): 413–16. doi:10.1109/LAWP.2010.2048410.

[Maltsev, 2014] Maltsev, Alexander, Andrey Pudeyev, Ingolf Karls, Ilya Bolotin, Gregory Morozov, Richard Weiler, Michael Peter, and Wilhelm Keusgen. "Quasi-Deterministic Approach to mmWave Channel Modeling in a Non-Stationary Environment," 966–71. IEEE, 2014. doi:10.1109/GLOCOMW.2014.7063558.

[Manabe, 1996] Manabe, T., Y. Miura, and T. Ihara. "Effects of Antenna Directivity and Polarization on Indoor Multipath Propagation Characteristics at 60 GHz." *IEEE Journal on Selected Areas in Communications* 14, no. 3 (April 1996): 441–48. doi:10.1109/49.490229.

[MarketsAndMarkets, 2015] MarketsAndMarkets. "Millimeter Wave Technology Market by Frequency Band & Components - 2020 | MarketsandMarkets." http://www.marketsandmarkets.com/Market-Reports/millimeter-wave-technology-market-981.html.

[Mitomo, 2012] Mitomo, Toshiya, Yukako Tsutsumi, Hiroaki Hoshino, Masahiro Hosoya, Tong Wang, Yuta Tsubouchi, Ryoichi Tachibana, et al. "A 2Gb/s-Throughput CMOS Transceiver Chipset with in-Package Antenna for 60GHz Short-Range Wireless Communication," 266–68. IEEE, 2012. doi:10.1109/ISSCC.2012.6177010.

[Molisch, 2011] Molisch, Andreas F. *Wireless Communications*. 2nd ed. Chichester, West Sussex, U.K: Wiley: IEEE, 2011.

[Okada, 2013] Okada, K., K. Kondou, M. Miyahara, M. Shinagawa, H. Asada, R. Minami, T. Yamaguchi, et al. "Full Four-Channel 6.3-Gb/s 60-GHz CMOS Transceiver With Low-Power Analog and Digital Baseband Circuitry." *IEEE Journal of Solid-State Circuits* 48, no. 1 (January 2013): 46–65. doi:10.1109/JSSC.2012.2218066.

[O'Mahony, 2010] O'Mahony, Frank, James E. Jaussi, Joseph Kennedy, Ganesh Balamurugan, Mozhgan Mansuri, Clark Roberts, Sudip Shekhar, Randy Mooney, and Bryan Casper. "A 47x 10 Gb/s 1.4 mW/Gb/s Parallel Interface in 45 Nm CMOS." *IEEE Journal of Solid-State Circuits* 45, no. 12 (December 2010): 2828–37. doi:10.1109/JSSC.2010.2076214.

[Pancaldi, 2008] Pancaldi, Fabrizio, Giorgio M. Vitetta, Reza Kalbasi, Naofal Al-Dhahir, Murat Uysal, and Hakam Mheidat. "Single-Carrier Frequency Domain Equalization." *Signal Processing Magazine, IEEE* 25, no. 5 (September 2008): 37–56. doi:10.1109/MSP.2008.926657.

[Park, 2011] Park, Ji-Hoon. "Power-Efficient Design of Multi-Gbps Wireless Baseband," 2011. http://escholarship.org/uc/item/8631t4mq.pdf.

[Pearson, 1995] Pearson, Dale J., Scott K. Reynolds, Andrew C. Megdanis, Sudhir Gowda, Kevin R. Wrenner, Michael Immediato, Richard L. Galbraith, and Hyun J. Shin. "Digital FIR Filters for High Speed PRML Disk Read Channels." *Solid-State Circuits, IEEE Journal of* 30, no. 12 (December 1995): 1517–23. doi:10.1109/4.482200.

[Perahia, 2010] Perahia, E., Carlos Cordeiro, Minyoung Park, and L.L. Yang. "IEEE 802.11ad: Defining the Next Generation Multi-Gbps Wi-Fi." In *2010 7th IEEE Consumer Communications and Networking Conference (CCNC)*, 1–5, 2010. doi:10.1109/CCNC.2010.5421713.

[Proakis, 2007] Proakis, John G., and Masoud Salehi. *Digital Communications*. 5th ed. Boston: McGraw-Hill, 2008. ISBN: 978-0-07-295716-7

[Rappaport, 2013] Rappaport, T. S., Shu Sun, R. Mayzus, Hang Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez. "Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!" *IEEE Access* 1 (2013): 335–49. doi:10.1109/ACCESS.2013.2260813.

[Rylov, 2001] Rylov, S., A. Rylyakov, J. Tierno, M. Immediato, M. Beakes, M. Kapur, P. Ampadu, and D. Pearson. "A 2.3 GSample/s 10-Tap Digital FIR Filter for Magnetic Recording Read Channels," 190–91. IEEE, 2001. doi:10.1109/ISSCC.2001.912599.

[Saigusa, 2014] Saigusa, S., Toshiya Mitomo, Hidenori Okuni, M. Hosoya, Akihide Sai, Shigeaki Kawai, Tao Wang, et al. "20.4 A Fully Integrated Single-Chip 60GHz CMOS Transceiver with Scalable Power Consumption for Proximity Wireless Communication," 348–49. IEEE, 2014. doi:10.1109/ISSCC.2014.6757464.

[Saito, 2013] Saito, Nobuo, T. Tsukizawa, N. Shirakata, Takahito Morita, Kiyoshi Tanaka, Jun Sato, Yu Morishita, et al. "A Fully Integrated 60-GHz CMOS Transceiver Chipset Based on WiGig/IEEE 802.11ad With Built-In Self Calibration for Mobile Usage." *Solid-State Circuits, IEEE Journal of* 48, no. 12 (December 2013): 3146–59. doi:10.1109/JSSC.2013.2279573.

[Sato, 1997] Sato, K., T. Manabe, T. Ihara, H. Saito, S. Ito, T. Tanaka, K. Sugai, et al. "Measurements of Reflection and Transmission Characteristics of Interior Structures of Office Building in the 60-GHz Band." *IEEE Transactions on Antennas and Propagation* 45, no. 12 (December 1997): 1783–92. doi:10.1109/8.650196.

[Sawada, 2010] Sawada, Hirokazu, Hiroyuki Nakase, Shuzo Kato, Masahiro Umehira, Katsuyoshi Sato, and Hiroshi Harada. "Impulse Response Model and Parameters for Indoor Channel Modeling at 60GHz," 1–5. IEEE, 2010. doi:10.1109/VETECS.2010.5494126.

[Singh, 2010] Singh, Montek, Jose A. Tierno, Alexander Rylyakov, Sergey Rylov, and Steven M. Nowick. "An Adaptively Pipelined Mixed Synchronous-Asynchronous Digital FIR Filter Chip Operating at 1.3 Gigahertz." *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on* 18, no. 7 (July 2010): 1043–56. doi:10.1109/TVLSI.2009.2019660.

[Sobel, 2009] Sobel, D.A., and R.W. Brodersen. "A 1 Gb/s Mixed-Signal Baseband Analog Front-End for a 60 GHz Wireless Receiver." *Solid-State Circuits, IEEE Journal of* 44, no. 4 (April 2009): 1281–89. doi:10.1109/JSSC.2009.2014731.

[Spagna, 2010] Spagna, Fulvio, Lidong Chen, Mamatha Deshpande, Yongping Fan, Doug Gambetta, Sujatha Gowder, Sitaraman Iyer, et al. "A 78mW 11.8Gb/s Serial Link Transceiver with Adaptive RX Equalization and Baud-Rate CDR in 32nm CMOS," 366–67. IEEE, 2010. doi:10.1109/ISSCC.2010.5433823.

[TGad\_model\_doc, 2010] "TGad Channel Models for 60 Ghz WLAN Systems r8," 2010. https://mentor.ieee.org/802.11/dcn/09/11-09-0334-08-00ad-channel-models-for-60-ghzwlan-systems.doc.

[TGad\_model, 2010] "TGad Implementation of 60GHz WLAN Channel Model r3," 2010. https://mentor.ieee.org/802.11/dcn/09/11-09-0854-03-00ad-implementation-of-60ghzwlan-channel-model.doc.

[Thakkar, 2012] Thakkar, C., Lingkai Kong, Kwangmo Jung, A. Frappe, and E. Alon. "A 10 Gb/s 45 mW Adaptive 60 GHz Baseband in 65 Nm CMOS." *Solid-State Circuits, IEEE Journal of* 47, no. 4 (April 2012): 952–68. doi:10.1109/JSSC.2012.2184651.

[Tierno, 2002] Tierno, J., A. Rylyakov, S. Rylov, M. Singh, P. Ampadu, S. Nowick, M. Immediato, and S. Gowda. "A 1.3 GSample/s 10-Tap Full-Rate Variable-Latency Self-Timed FIR Filter with Clocked Interfaces," 1:60–444 vol.1. IEEE, 2002. doi:10.1109/ISSCC.2002.992938.

[Toifl] Toifl, Thomas, Peter Buchmann, Troy Beukema, Michael Beakes, Matthias Brandli, Pier Andrea Francese, Christian Menolfi, Marcel Kossel, Lukas Kull, and Thomas Morf. "A 3.5pJ/bit 8-Tap-Feed-Forward 8-Tap-Decision Feedback Digital Equalizer for 16Gb/s I/Os," 455–58. IEEE, 2014. doi:10.1109/ESSCIRC.2014.6942120.

[Ruckus, 2013] "Using All the Tools You Can." Application Note, Ruckus Wireless Inc. http://c541678.r78.cf2.rackcdn.com/wp/wp-using-all-the-tools-you-can.pdf.

[Weiner, 2014] Weiner, M., Marjan Blagojevic, Sergey Skotnikov, Andreas Burg, Philippe Flatresse, and B. Nikolic. "27.7 A Scalable 1.5-to-6Gb/s 6.2-to-38.1mW LDPC Decoder for 60GHz Wireless Networks in 28nm UTBB FDSOI," 464–65. IEEE, 2014. doi:10.1109/ISSCC.2014.6757515.

[Agilent, 2013] "Wireless LAN at 60 GHz - IEEE 802.11ad Explained." http://cp.literature.agilent.com/litweb/pdf/5990-9697EN.pdf.

[Xu, 2002] Xu, Hao, V. Kukshya, and T.S. Rappaport. "Spatial and Temporal Characteristics of 60-GHz Indoor Channels." *Selected Areas in Communications, IEEE Journal on* 20, no. 3 (April 2002): 620–30. doi:10.1109/49.995521.

[Yeh, 2011] Yeh, Fu-Chun, Tai-Yang Liu, Ting-Chen Wei, Wei-Chang Liu, and Shyh-Jye Jou. "A SC/OFDM Dual Mode Frequency-Domain Equalizer for 60GHz Multi-Gbps Wireless Transmission," 1–4. IEEE, 2011. doi:10.1109/VDAT.2011.5783559.

# Chapter 3 A 60GHz baseband DFE with channel dependent power consumption

In the previous chapter, after reviewing the basic challenges of the 60GHz channel, we focused on innovation in the area of channel equalization and more specifically on the decision feedback equalizer. Though the traditional approaches have leveraged the robustness of digital signal processing, recent research efforts support the idea of a non-strictly digital implementation. As shown, in the past five years mixed-signal DFEs have been proposed as part of the baseband processing chain, inspired by wire-line equalization techniques. Such schemes perform equalization before signal digitization. This is beneficial because it off-loads the already power-hungry baseband DSP processor. Additionally, it relaxes ADC design requirements, because less resolution is needed after equalization. Finally, in terms of system integration, multi-mode/multi-rate specifications, as in coexisting Wi-Fi/WiGig, create the need for power reduction, as design specifications target the worst case of each mode [Tsigabu, 2015].

The targeted application scenario for this work involves wireless connectivity with Gb/s data-rates in the frame of mobile device integration. Visual contact between the communication terminals limits the effect of ISI to only post-cursor interference, so Line-Of-Sight indoors deployment is selected as the use model.

An implementation of a DFE with a small number of taps is envisioned. In order to cover the needed delay spread, a flexible, critical tap cancellation scheme is assumed. The proposed scheme exhibits a small Bit Error Rate, because a flexible tap allocation permits targeting the reflections that create the most interference. Furthermore, the circuit implementation aims at low power consumption by avoiding extensive use of clocked hardware. Indeed, mixed-signal DFE implementations report high percentages of total power consumption owing to the presence of clocked parts (50% in [Thakkar, 2014]). With this mindset, and motivated by developments in continuous-time digital signal processing, implementing the feedback filter of the DFE as a continuous-time digital tapped delay line is hereby proposed. The top-level circuit simulations validate the existence of a channel dependent power consumption profile. This study motivates the silicon implementation that follows.

This chapter is an elaboration of the presentation in [Sourikopoulos, 2014].

# 3.1 Critical tap cancellation

In the previous chapter, modeling for the 60GHz channel was suggested to be carried out by a tapped delay line [Proakis, 2008]. The related frequency response is not flat and this accommodates the frequency selective character of the medium. When in Line-of-Sight with the transmitter, the signal at the receiver arrives in a direct path, but signal power is also received via reflections due to surrounding surfaces. This means that the channel's impulse response is directly associated with these surroundings. We can visualize the situation in the basic arrangement of Figure 3-1. Assuming an indoor environment, there are two more paths received by reflections on the room walls, besides the Line-Of-Sight path. The extra signal components arrive with different delays and amplitudes after following trajectories of different length and different attenuation. As mentioned before, the time extent of these arrivals constitutes the *multipath spread*.

The receiver should exhibit a highly spatially selective character in order to filter out as much out-of-bounds energy as possible. This is realized through the use of beamforming techniques, which modify the radiation pattern of an antenna in order to constructively add the phases of signals coming from a certain direction.



Fig. 3-1 Propagation of two paths within a room and Line-Of-Sight channel impulse response.



Fig. 3-2 A typical 60GHz LOS channel impulse response. We remark that ISI is strongly impacted by the indicated, high amplitude components.

This way, beamforming substantially reduces the multipath spread, but it does not completely eliminate the need for a receiver equalizer, which is assigned to remove the remaining ISI.

Following the quasi-optical character of the channel, an intuitive observation comes from the fact that the signal reflections which create ISI are more likely to come from one or two consecutive reflections before reaching the receiver (namely 1<sup>st</sup> or 2<sup>nd</sup> order reflections). Indeed, measurement campaigns validate the fact that the signal loses around 10dB of its power in each reflection due to associated diffraction effects [Niknejad, 2010]. Reviewing a typical channel impulse response realization (Figure 3-2) shows that performance will be dominated by the high amplitude components stemming from these reflections. So, one could attempt to design a high efficiency equalization system that targets the cancellation of only these critical components. The fact remains however, that the channel is unknown; and furthermore, it is time variant. Design redundancy is demanded, but one can explore an efficient specification by restricting the deployment scenario as shown below.
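The idea of targeting only the critical components can be illustrated with a minimal sketch. It assumes a symbol-spaced impulse response whose first sample is the cursor; the function names and the example response are hypothetical and are not taken from the TGad channel model.

```python
import numpy as np

def select_critical_taps(h, symbol_period_ns, max_spread_ns=20.0, n_taps=5):
    """Return (index, coefficient) pairs for the strongest post-cursor components.

    Assumption: h[0] is the cursor (Line-of-Sight path) and h[k] is the
    symbol-spaced post-cursor sample at delay k * symbol_period_ns.
    """
    h = np.asarray(h, dtype=float)
    # Only post-cursor samples inside the covered delay spread are candidates.
    last = min(len(h) - 1, int(round(max_spread_ns / symbol_period_ns)))
    candidates = np.arange(1, last + 1)
    # Rank candidates by magnitude, strongest first, and keep n_taps of them.
    strongest = candidates[np.argsort(-np.abs(h[candidates]))][:n_taps]
    return [(int(k), float(h[k])) for k in np.sort(strongest)]

def residual_isi(h, taps):
    """Sum of |h[k]| over the post-cursor samples that are left uncancelled."""
    h = np.asarray(h, dtype=float)
    cancelled = {k for k, _ in taps}
    return float(sum(abs(h[k]) for k in range(1, len(h)) if k not in cancelled))

# Hypothetical impulse response, normalized to the cursor amplitude.
h = [1.0, 0.05, -0.30, 0.02, 0.0, 0.20, 0.01, -0.10]
taps = select_critical_taps(h, symbol_period_ns=1.0, n_taps=3)
print(taps)
print(residual_isi(h, taps))
```

In this toy case the three selected taps remove most of the post-cursor amplitude, which is the behavior the critical tap cancellation scheme relies on.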

In order to quantify the needed number of impulse response components, BER simulations are performed. The channel model used for the results below is the one developed by the 802.11ad working group, available from [TGad, 2010]. The deployment scenario settings refer to a typical Line-Of-Sight indoors setup with a 5m link distance. We have assumed beamforming with a half power beam-width of 60° and no blocking of 1<sup>st</sup> and 2<sup>nd</sup> order reflections. Using the statistical model, a set of probable channel impulse response realizations was generated. These were cascaded with an AWGN (Additive White Gaussian Noise) channel of Eb/N0=5dB. For each channel realization the Bit Error Rate was simulated repeatedly, each time cancelling one more of the highest amplitude components within a delay spread of



*Fig. 3-3 Worst-case BERs with an increasing number of critical impulse response components cancelled, under Eb/N0=5dB.* 

20ns. The transmission mode was assumed to be BPSK modulation at 1 Gb/s. In Figure 3-3 the three worst (from a BER standpoint) channel realizations of the set are shown versus the number of critical components cancelled. The curves saturate as more components of lesser impact are cancelled. If more than five components are cancelled, the BER changes by less than 5%. So, choosing the case of five taps for further study, we simulate the BER again for the same realizations, this time versus the bit SNR. The simulation results are in Figure 3-4. The BER was simulated for each realization twice: once with equalization (labeled "eq" in Figure 3-4) and once without. Alongside, there is



Fig. 3-4 Worst case BER with and without equalization of 5 critical components alongside with the theoretical AWGN channel at 1Gb/s BPSK.

the theoretical BER performance of the AWGN channel for reference. The above results demonstrate that for the complete set under test, a BER better than $10^{-2}$ is expected even when the noise amplitude is assumed to be half that of the signal (6dB point). The $10^{-2}$ BER is a usual performance threshold that can be considered sufficient for modern coding schemes [Weiner, 2014].
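For reference, the theoretical AWGN curve for BPSK follows the textbook expression $P_b = Q(\sqrt{2E_b/N_0}) = \frac{1}{2}\,\mathrm{erfc}(\sqrt{E_b/N_0})$. The short sketch below evaluates it at a few bit SNR values; the function name is ours and the snippet covers the ideal AWGN channel only, not the equalized multipath case.

```python
import math

def bpsk_awgn_ber(ebn0_db):
    """Theoretical BPSK bit error rate over an ideal AWGN channel."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)          # dB to linear ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))  # equals Q(sqrt(2 * Eb/N0))

for ebn0_db in (3, 5, 6):
    print(f"Eb/N0 = {ebn0_db} dB -> BER = {bpsk_awgn_ber(ebn0_db):.2e}")
```

The ideal curve lies below the $10^{-2}$ threshold at both 5dB and 6dB, so it acts as the lower bound against which the equalized realizations are compared.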

A realistic equalizer cannot match the required coefficient values exactly, because they are quantized. The resulting error is residual ISI that cumulatively deteriorates performance. For this purpose, we investigate the quantization of the tap coefficients assigned to cancel the critical multipath components. Repeated BER simulations with an increasing number of quantization bits, displayed in Figure 3-5, show that 6 bits are sufficient to assure the aforementioned performance.
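The effect of coefficient resolution can be sketched as follows. This is a minimal illustration that assumes uniform sign-magnitude quantization with one sign bit; the full-scale value and the coefficient set are hypothetical, and whether the 6 bits quoted above include the sign is not specified here.

```python
import numpy as np

def quantize_coefficients(coeffs, n_bits, full_scale):
    """Uniform sign-magnitude quantization (assumption: 1 sign bit + magnitude)."""
    coeffs = np.asarray(coeffs, dtype=float)
    levels = 2 ** (n_bits - 1) - 1          # number of non-zero magnitude steps
    step = full_scale / levels
    clipped = np.clip(coeffs, -full_scale, full_scale)
    return np.round(clipped / step) * step

# Hypothetical tap coefficients, normalized to the cursor amplitude.
coeffs = np.array([-0.30, 0.20, -0.10, 0.05, 0.02])
for n_bits in (3, 4, 6, 8):
    q = quantize_coefficients(coeffs, n_bits, full_scale=0.5)
    residual = np.sum(np.abs(coeffs - q))   # uncancelled ISI amplitude
    print(n_bits, residual)
```

Each additional bit halves the worst-case error per tap, so the residual ISI left by quantization quickly falls below the contribution of the uncancelled components.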

Envisioning an implementation, the metrics of interest are the dynamic range of a tap as well as the total amount of ISI amplitude with respect to the cursor. The statistical set revealed that the highest amplitude component of any realization did not exceed 0.4 times the amplitude of the cursor, with a total induced ISI of no more than 1.8x the cursor. These findings agree with the corresponding ones in [Thakkar, 2012], which predict $\sim$0.5x and $\sim$2x respectively. These values also matter for the circuit design of the following chapters, as they set the specification for the summation operating point and the dynamic range of the coefficient implementation.



Fig. 3-5 BER degrades as coefficient cancellation is performed with less resolution.



Fig. 3-6 The proposed DFE with variable coefficients and delays.

# 3.2 A DFE with channel dependent power consumption

The above results build the motivation to further study the case of a 5-tap DFE. The BER analysis suggests that the taps must be able to be assigned in any delay arrangement up to 20ns. So, in essence the block diagram of the envisioned system can be represented by the one in Figure 3-6. Apart from the traditionally variable tap coefficients, the delay elements are also configurable. As we will see in Chapter 5, in a mixed-signal implementation the system requires a clocked comparator to follow the output of an analog summer, which is usually implemented in current mode. To realize the sign, the DFE's adaptable coefficients are implemented as current-steering switches, while the magnitude of the tap is set by digital-to-analog converters.

#### 3.2.1 Continuous-time digital feedback filtering for the DFE

Concerning the delay line design, the above specification could be supported by a flexible scheme as the one in [Sobel, 2009], repeated here in Figure 3-7(left). It is a clocked delay-line with each delay element output connected to 8 multiplexers. In this case, assuming a targeted rate of 2GS/s and the aforementioned multipath spread of 20ns, such a delay line would require five 40-to-1 multiplexers and 40 flip-flops.
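The sizing quoted above follows directly from the product of sample rate and covered delay spread. A minimal sketch of this arithmetic is given below; the function name and the returned dictionary keys are ours.

```python
def clocked_delay_line_size(sample_rate_gsps, spread_ns, n_taps):
    """Size a fully clocked, fully flexible tapped delay line.

    One flip-flop is needed per sample period inside the delay spread, and
    each tap needs a multiplexer that can select any flip-flop output.
    """
    depth = round(sample_rate_gsps * spread_ns)  # GS/s * ns = samples
    return {"flip_flops": depth, "multiplexers": n_taps, "mux_inputs": depth}

print(clocked_delay_line_size(sample_rate_gsps=2.0, spread_ns=20.0, n_taps=5))
```

All of these flip-flops toggle with the clock regardless of where the taps are placed, which is why the consumption of such a line does not depend on the channel.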

The standard-cell discrete-time digital implementation, seen in Figure 3-7(right), introduces distributed multiplexing with redundant delay elements to accommodate clustering. This leads to somewhat simpler overhead routing [Park, 2011]. Nonetheless, all possible tap delay arrangements should be able to be addressed so further control was introduced in the form of multiplexer delay offsets.

This thesis suggests the use of configurable continuous-time digital delays for the implementation of the DFE's feedback path. Admittedly, the timing characteristics of clocked delays are less dependent on process, voltage or temperature (PVT) variations, thanks to the use of a clock reference. However, this robustness in creating precise delays for the tap coefficients is of lesser importance in view of the channel's non-static character. Adaptation is an integral part of an equalizer system and this work suggests extending it to control the delays together with the coefficient values. This strategy does not burden the adaptation hardware with a computational overhead, because the total number of variables is kept very low, especially when compared to the adaptation for multi-coefficient schemes such as the 100-tap DFE in [Thakkar, 2014].

Building a continuous-time digital delay line offers consumption scaling. This can be carried out by introducing a control scheme that determines not only the number of active granular delay elements in the delay line, but also their output delay value. In the previous section we specified five flexible taps that cover a delay range of 20ns. This means that each tap should be associated with a delay element that can produce a delay value ranging from 500ps to 18ns<sup>5</sup>. As we will see further, this consumption scaling, due to the continuous-time digital delay-line implementation, can offer channel dependent power consumption. Power will vary depending on the delay value settings that are needed to equalize the different channel realizations.



Fig. 3-7 DFE feedback path implementation in [Sobel, 2009] left, and approach in [Park, 2011] right.

<sup>5</sup> For a 2Gb/s case and 5 taps on the edge of the 20ns delay spread, the tap delay values should be set as: 1 x 18ns + 4 x 500ps = 20ns.



Fig. 3-8 The DFE delay element and the granular delay circuit used in simulation.

This operation could be loosely associated with the activity dependent consumption character of Continuous Time Digital Signal Processing (CT-DSP) [Schell, 2008]. Nevertheless, there is a distinct difference that refers to the quantization block. In the proposed case of Figure 3-6 the comparator is clocked and this means that there will be indeed a consumption floor for the DFE. In the next section, we further elaborate on the power savings of this approach against the typical clocked delay-line implementation. In order to project performance benefits of a potential circuit realization, simulations were carried out on the 28nm FDSOI technology node.

#### 3.2.2 Continuous-time digital delay element with two-stage control

The system was simulated with delay elements such as the ones in Figure 3-8. The tap delay elements of Figure 3-6,  $D_i$ , are actually designed as delay lines themselves made up of variable granular delay circuits,  $d_k$ . As commented in the previous section we choose to design a delay element with two distinct controlling mechanisms: (i) the number of active granular delay elements  $d_k$  (organized in sub-groups) and (ii) the delay value of these granular elements (kept the same, throughout each  $D_i$  element). For our initial system simulation purposes we used a modified version of the cell proposed in [Kurchuk, 2010]. In the next chapter of this thesis we will present an improved delay element based on combined gate/body biasing.

The granular delay grouping (2d, 3d, 5d etc.) for each delay element D<sub>i</sub> was optimized to minimize inter-group consumption discontinuities up to 10ns. This explains the non-increasing group size of the penultimate group. This 10ns optimization was based on observing the majority of delay spreads for the given impulse response test set.

The targeted range is 500ps-18ns for each equalizer tap as discussed. We assumed a receiver scenario of BPSK demodulation with a rate of 2Gb/s. Such a rate imposes a



Fig. 3-9 Power efficiency comparison for a delay element.

maximum granular delay of 500ps to avoid any data collision. After accounting for a 20% margin against PVT variation, we have defined the range of variability to be from 200ps to 400ps<sup>6</sup>.
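The two-stage control can be illustrated with a minimal sketch. It assumes that the coarse stage enables the fewest granular elements able to reach the target and that the fine stage then sets their common delay; it ignores the sub-group organization of the actual element, and the function name is hypothetical.

```python
import math

D_MIN_PS = 200.0  # minimum granular delay (from the text)
D_MAX_PS = 400.0  # maximum granular delay after the 20% PVT margin

def two_stage_setting(target_ps, d_min=D_MIN_PS, d_max=D_MAX_PS):
    """Pick (active element count, granular delay) so that n * d == target.

    Assumption: the coarse stage uses as few active elements as possible,
    and the fine stage tunes the common granular delay of those elements.
    """
    n = max(1, math.ceil(target_ps / d_max))  # coarse: number of active elements
    d = target_ps / n                         # fine: common granular delay
    if d < d_min:
        raise ValueError("target delay is below the reachable range")
    return n, d

for target in (500.0, 2000.0, 10000.0, 18000.0):
    print(target, two_stage_setting(target))
```

Under these assumptions the 500ps lower bound of the tap range maps to two active elements at 250ps each, while the 18ns upper bound requires 45 elements at the 400ps maximum.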

#### 3.2.3 Power consumption comparison against the clocked approach

Figure 3-9 represents a first perspective on the comparison with the discrete-time delay element approach. The energy consumption of a discrete-time digital implementation of the delay element ($D_i$) is noted as "Static approach". It refers to an implementation comprised of low-leakage, standard-cell, D-type flip-flops with overhead multiplexing to enable the desired delay variation. Such consumption is fixed by the clock rate and remains constant regardless of the output delay value.

In the case of the suggested continuous-time digital implementation, denoted as "This work", consumption depends on the active part. It refers only to the number of elements enabled and it also depends on the granular delay value configured each time. This explains the graph discontinuities, which indicate how granular delay groups are enabled consecutively. The widest discontinuity appears when enabling the last stage because, as mentioned, inter-group consumption optimization was carried out up to 10ns. The "optimization reference" that was used was a delay line of granular cells with fixed 500ps delays.

<sup>6</sup> The simulations of a similar cell in [Kurchuk, 2010] have shown 3.3% (1-$\sigma$) of delay variation for random device mismatch and less than 6% for a (-60°C, 120°C) temperature range. Standard deviation delay jitter is in the order of 15ps.

Even in the worst case of maximum delay, the continuous-time digital approach consumes half the power. To look further into this, comparing the energy consumed by a *granular* delay element (within the 200ps-400ps range) with the energy dissipated by a standard-cell D-flip-flop, we find an average 60% reduction. Also, we note that the energy consumed on the continuous-time element varies almost linearly, but not directly proportionally to the delay value. These remarks are reflected in the overall consumption profiles.

In our simulations, unselected stages were disabled by switching off the current-starving transistors, as seen in Figure 3-8 (right). The total leakage represents, in general, less than 1% of the overall consumption. It is also important to underline that the discrete-time approach does not include power due to clock routing and driving; it only refers to the flip-flops and multiplexer.

#### 3.2.4 A DFE consumption profile linked to the channel realization

From the above it becomes clear that the power consumed by the continuous-time digital feedback filter of the DFE depends on the tap delay settings. These settings are imposed by the channel realization that is targeted for equalization. On the stacked histogram of Figure 3-10, this energy consumption is presented against the constant consumption of the fully clocked counterpart as a normalized sum. The results for the set of 100 impulse responses yield an overall power reduction of 3 to 4 times when compared with the clocked approach. This comes on top



*Fig. 3-10 Consumption comparison between a discrete-time, static delay-line approach and the continuous-time configurable delay architecture.* 

of alleviating the system of any clock driving needed, so the situation is expected to be even more favorable for the proposed approach.

Summarizing the chapter, we have shown an energy efficient approach to designing a DFE for a 60GHz deployment scenario. Initially, as motivated by the previous chapter, we have assumed the use of a mixed-signal DFE. The BER simulations performed reveal that supporting a critical-tap cancellation scheme could be an adequate solution against ISI for LOS indoor conditions. Moreover, introducing a continuous-time digital delay line leads to channel-dependent power consumption. This feature highlights the efficiency of the circuit, especially considering the non-static character of the 60GHz channel and its dependence on the environmental settings. A clocked approach disregards this fact by exhibiting a single energy consumption rate. This architecture was targeted for implementation using the 28nm FDSOI process, in order to validate this study experimentally. In the following chapters we present the delay-line and complete DFE prototyping effort.

# 3.3 Chapter Bibliography

[Kurchuk, 2010] Kurchuk, M., and Y. Tsividis. "Energy-Efficient Asynchronous Delay Element with Wide Controllability." In *Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS)*, 3837–40. IEEE, 2010. doi:10.1109/ISCAS.2010.5537714.

[Niknejad, 2010] Niknejad, A.M. "Siliconization of 60 GHz." *IEEE Microwave Magazine* 11, no. 1 (2010): 78–85. doi:10.1109/MMM.2009.935209.

[Park, 2011] Park, Ji-Hoon. "Power-Efficient Design of Multi-Gbps Wireless Baseband," 2011. http://escholarship.org/uc/item/8631t4mq.pdf.

[Proakis, 2008] Proakis, John G., and Masoud Salehi. *Digital Communications*. 5th ed. Boston: McGraw-Hill, 2008.

[Schell, 2008] Schell, B., and Y. Tsividis. "A Continuous-Time ADC/DSP/DAC System With No Clock and With Activity-Dependent Power Dissipation." *IEEE Journal of Solid-State Circuits* 43, no. 11 (November 2008): 2472–81. doi:10.1109/JSSC.2008.2005456.

[Sobel, 2009] Sobel, D.A., and R.W. Brodersen. "A 1 Gb/s Mixed-Signal Baseband Analog Front-End for a 60 GHz Wireless Receiver." *Solid-State Circuits, IEEE Journal of* 44, no. 4 (April 2009): 1281–89. doi:10.1109/JSSC.2009.2014731.

[Sourikopoulos, 2014] Sourikopoulos, Ilias, Antoine Frappe, Andreas Kaiser, and Laurent Clavier. "A Decision Feedback Equalizer with Channel-Dependent Power Consumption for 60-GHz Receivers," 1484–87. Melbourne, Australia: IEEE, 2014. doi:10.1109/ISCAS.2014.6865427.

[TGad, 2010] "TGad Implementation of 60GHz WLAN Channel Model r3," 2010. https://mentor.ieee.org/802.11/dcn/09/11-09-0854-03-00ad-implementation-of-60ghzwlan-channel-model.doc.

[Thakkar, 2012] Thakkar, C., Lingkai Kong, Kwangmo Jung, A. Frappe, and E. Alon. "A 10 Gb/s 45 mW Adaptive 60 GHz Baseband in 65 Nm CMOS." *Solid-State Circuits, IEEE Journal of* 47, no. 4 (April 2012): 952–68. doi:10.1109/JSSC.2012.2184651.

[Thakkar, 2014] Thakkar, C., N. Narevsky, C.D. Hull, and E. Alon. "Design Techniques for a Mixed-Signal I/Q 32-Coefficient Rx-Feedforward Equalizer, 100-Coefficient Decision Feedback Equalizer in an 8 Gb/s 60 GHz 65 Nm LP CMOS Receiver." *IEEE Journal of Solid-State Circuits* 49, no. 11 (November 2014): 2588–2607. doi:10.1109/JSSC.2014.2360917.

[Tsigabu, 2015] Gebreyohannes, Fikre Tsigabu, Antoine Frappe, and Andreas Kaiser. "A Configurable Transmitter Architecture for IEEE 802.11ac and 802.11ad Standards." *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2015, 1–1. doi:10.1109/TCSII.2015.2468920.

[Weiner, 2014] Weiner, M., Marjan Blagojevic, Sergey Skotnikov, Andreas Burg, Philippe Flatresse, and B. Nikolic. "27.7 A Scalable 1.5-to-6Gb/s 6.2-to-38.1mW LDPC Decoder for 60GHz Wireless Networks in 28nm UTBB FDSOI," 464–65. IEEE, 2014. doi:10.1109/ISSCC.2014.6757515.

# Chapter 4 Continuous-Time Digital Delay-Line in 28nm FDSOI

In the previous chapter, a decision feedback equalizer (DFE) was introduced in order to remove the 60GHz channel ISI. It features a feedback filter of variable coefficients and a continuous-time digital delay line. The delay-line specified for this approach exhibits scaled power consumption because of the introduction of two-stage control. Equalizer design with a fully discrete-time approach involves a high consumption floor; instead the proposed approach associates the power consumption of the equalizer with the channel impulse response realization. Power consumption is therefore sensitive to the surrounding settings of transmission (reflective surfaces, shadowing obstacles, etc.), which means the better the conditions, the less power is spent.

In this chapter, the focus turns to the implementation details of the delay line. The chapter starts by highlighting the requirements and existing techniques in producing digital delay, summarizing the state of the art. Subsequently, a topology assessment is presented based on the specified performance metrics. The proposed delay-cell, which was designed, fabricated and characterized during this thesis is then described. Specifically, the proposed design is based on a topology with low supply noise sensitivity and low jitter. Functionality is extended to support coarse/fine control for the output delay value, without the need for additional hardware. This is made possible by taking advantage of the body biasing capabilities available in the FDSOI technology. The proposed delay element presents unique performance characteristics in terms of the achieved delay resolution and delay dynamic range.

The chapter concludes with the demonstration of a delay line prototype, fabricated in 28nm FDSOI technology. After presenting the topology description and focusing on the major design aspects, the measurement results are presented and a discussion follows on the characterization findings.

# 4.1 General delay line specification

Previously, we have identified the need for a continuous-time digital delay line comprised of discrete digital-delay elements to act as the feedback loop of the decision feedback equalizer. Targeting throughputs in the order of Gb/s means that the delay line should be comprised of elements whose granularity - their maximum delay capability - should not exceed the order of hundreds of picoseconds.

At this point, it is important to stress that incorporating continuous-time delays in the design benefits from the fact that DFE operation is usually based on an adaptation scheme, due to the non-static character of the 60GHz channel. Extending the use of an adaptation algorithm to vary delay values makes it important to implement only the required granularity and variability of the delay-line, instead of calibrating the line to produce exact delay values.

From the previous chapter, a multi-nanosecond span was shown appropriate for an increased coverage of probable Line-of-Sight channel realizations. Arbitrarily allocating 5 taps within this range led to a 500ps-18ns specification for the delay element, targeting a rate of 2GS/s. A range of 200ps-400ps is chosen for the granular delay taking into account a 20% margin against PVT variations.

# 4.2 Digital delay elements - state of the art

Delay manipulation is a major concern for the reliable implementation of circuits whose purpose is timing. For example, clock phase generation, deskewing, data alignment and hazard mitigation are just some signal processing functions that could entail digital delays. However, once one acknowledges that all sorts of signal processing steps intrinsically produce some delay, designing for delay as a performance metric is straightforward. It is this intrinsic non-ideality, namely *propagation delay*, that is usually manipulated to build digital delay circuits (Figure 4-1).

Depending on the application, the range and the importance of accurately controlling a delay value differ. On the one hand, a simple operation, such as avoiding setup and hold violations in a datapath, can be handled with some extra buffering. On the other hand, producing reliable clock signals comes along with designing elaborate systems with multiple control loops. The reason for the required complexity is that they require high precision in implementing accurate timing operations in order to achieve nominal values.

Common examples of systems that employ controlled delay-lines include delay locked loops (DLL) [Hossain, 2014], digitally controlled oscillators [DCO], phase locked loops (PLL) [Xu, 2010], asynchronous pipelines [Chang, 2010] and time-to-digital converters (TDC) [Jansson, 2004]. Delay circuits are used to realize pulse-width control circuits with output programmable duty cycles used in ADCs or DACs [Su, 2012] and more recently continuous-time digital filters [Vezytzis, 2013]. Besides, it is not uncommon to employ dedicated delay elements for relatively simpler signal processing operations as well, like phase shifting, interpolation or non-overlapping clock generation.

Manipulating the mechanisms involved in the generation of delay has led to the introduction of multiple ideas throughout the years. Apart from the straightforward cascade of inverters, or any logic gate for that matter, there has been specific research activity targeted at building power-efficient, variable delay elements. Below, we present the main techniques that spawned many variants over recent years: capacitive shunting, current starving and thyristor-based design. After the presentation, a short discussion follows on the topology selected for further study.



Fig. 4-1 Typical definitions of rise/fall time and rising/falling edge propagation delay.



Fig. 4-2 Programmable delay line in [Li, 2006].

#### 4.2.1 Cascaded inverters

One of the simplest ways to introduce a digital delay has been the cascade of inverters. In this case, delay is created by the finite slopes of charging and discharging the loaded inverter outputs. These charging and discharging slopes can be modeled based on the time constants set by the effective resistance of the switching transistors and the output capacitance. A scheme of cascaded inverters interleaved with multiplexers is a typical example implementation. A recent realization was proposed in [Li, 2006], where high effort stages of cascaded inverters were used to produce nanosecond-order delay values (Figure 4-2). Evidently, any sort of variation in process, supply or temperature readily translates into variation in delay [Rabaey, 2003]. Moreover, the delay is inversely proportional to the switching slope, so larger delay values come with slower edges and therefore with more power consumption.
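As a rough illustration of the time-constant argument above, the following Python sketch uses the textbook first-order approximation $t_p \approx \ln(2) \cdot R \cdot C$ for each stage. The resistance, capacitance and stage count are hypothetical values chosen for illustration only; they are not parameters of [Li, 2006], and identical loading of all stages is assumed.

```python
import math


def stage_delay(r_eff_ohm, c_load_f):
    """First-order propagation delay of one inverter stage: ln(2) * R * C."""
    return math.log(2) * r_eff_ohm * c_load_f


def chain_delay(n_stages, r_eff_ohm, c_load_f):
    """Delay of a cascade of identical stages (equal loading assumed)."""
    return n_stages * stage_delay(r_eff_ohm, c_load_f)


# Hypothetical values: 10 kOhm effective resistance, 5 fF load per stage.
R_EFF = 10e3
C_LOAD = 5e-15

single = stage_delay(R_EFF, C_LOAD)
print(f"one stage : {single * 1e12:.1f} ps")
print(f"8 stages  : {chain_delay(8, R_EFF, C_LOAD) * 1e12:.1f} ps")
```

In this simplified model the delay scales linearly with both the effective resistance and the load, which is why supply or temperature shifts that alter the transistor resistance translate directly into delay variation.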

#### 4.2.2 Capacitive Shunting

As seen above, for the general arrangement of cascades of inverters or in the case of inverters driving capacitive loads such as the one in Figure 4-3, the output signal slope varies with the time constant. This represents the main mechanism for creating delay. Actually, delay can be varied by modifying output capacitance directly [Bazes, 1985] or the charge flow to it, through control voltage Vctrl as seen in Figure 4-4 taken from the proposed cell in [Nejad, 2003].





Fig. 4-3 Output slope varies with load capacitance from [Rabaey, 2003].

Fig. 4-4 Capacitive shunting control from [Nejad, 2003].

#### 4.2.3 Semi-static approach

A semi-static approach to ensure minimum short-circuit current during transitions in a delay element is proposed by [Jung, 2013]. The topology is shown in Figure 4-5 for producing rising edge delays. The input is connected to two separate arrangements, in each of which the complementary transistor is controlled by a biasing current mirror. Thus, a static consumption overhead is introduced. The two preliminary outputs are combined in a single stage that mitigates the short-circuit currents.



*Fig. 4-5 Semi-static approach relieves short-circuit currents in [Jung, 2011] but adds a static consumption overhead.*

#### 4.2.4 Current-starving

The current-starving technique is realized by adding extra MOS devices in series with the ones of the inverter. This effectively reduces (starves) the current associated with the switching events, which directly impacts the propagation delay.

As shown in Figure 4-6, delay control can be established by directly modulating the voltage on the gates of the starving transistors. Alternatively, a digital approach switches in transistors in parallel with the starving ones, as seen in Figure 4-7.
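The link between the starved current and the delay can be sketched with a constant-current approximation: the node capacitance is moved by half the supply before the next stage switches, so $t \approx C \cdot (V_{DD}/2) / I$. This is a simplified model assumed here for illustration (mid-rail switching point, ideal current source), and all numeric values are hypothetical.

```python
def starved_delay(c_node_f, vdd_v, i_starve_a):
    """Time for a constant current to move the node by VDD/2.

    Assumes the following stage switches at mid-rail and that the
    starving transistor behaves as an ideal current source.
    """
    return c_node_f * (vdd_v / 2.0) / i_starve_a


# Hypothetical operating points: 10 fF node, 1 V supply.
C_NODE = 10e-15
VDD = 1.0

for current in (1e-6, 2e-6, 4e-6):
    delay = starved_delay(C_NODE, VDD, current)
    print(f"I = {current * 1e6:.0f} uA -> delay = {delay * 1e9:.2f} ns")
```

Under these assumptions, halving the available current doubles the delay, which is the handle that both the analog and the digital control schemes act upon. The model ignores the parasitic capacitance that digitally switched parallel devices add.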



Fig. 4-6 Current starved inverter delay element with analog control from [Nejad, 2003]



Fig. 4-7 Current starved delay element with digital control [Nejad, 2005].

In the work of [Nejad, 2005] it was shown that the topology of Figure 4-7 leads to a nonmonotonic increase of the delay with an ascending binary input pattern. This is due to parasitic capacitance being added, which counteracts the reduction of effective resistance. In order to achieve a monotonic behavior, it is proposed to control only the current of the starving transistors (actually reverting to the scheme in Figure 4-6). This could be done by mirroring the output of a current DAC at the cost of extra static consumption. This approach is proven to increase tolerance to PVT variations because the digital code reflects only the control current.

On another note, the current-starving technique produces a less sharp transition slope, which means more short-circuit current. A way to limit this side-effect was proposed in [Yang, 2006] through the use of series diodes to limit the output swing, which effectively reduces power consumption.

Another interesting modification to the current starved inverter is a topology resembling the Schmitt trigger. The idea was proposed in [Sekiyama, 1992] in the context of an SRAM cell. The slow transition of the starved inverter is remedied with a cascaded inverter. This idea was extended to Figure 4-8 in the work of [Mahapatra, 2002], where the Schmitt trigger is presented with positive feedback action from the output signal, therefore improving the output transitions. Additionally, the publication suggests this topology as the best in terms of signal integrity and delay, based on a comparison with a set of rudimentary delay elements (current starved, transmission-gate load based, cascaded inverter based). The idea of positive feedback for transition slope modification draws from thyristor-based elements presented in the next section.



Fig. 4-8 Schmitt trigger output stage on current starved delay from [Mahapatra, 2002].

#### 4.2.5 Thyristor-based delay element

The basic principle of a thyristor device is the activation of the device when a certain conduction threshold is crossed. In a thyristor-based delay element this operation is replicated by a positive feedback mechanism that completes a delay event after crossing a threshold. The delay event could be a capacitor that is slowly charged or discharged and after a threshold voltage across its terminals is reached, the (dis-)charging is forcefully accelerated through positive feedback action.

The concept is illustrated in Figure 4-9, using a complementary transistor pair. In order to describe functionality, we will start from a steady state where both transistors are off and the gate voltages are $V_{DD}$ for gate P and zero for gate N. As we lower the voltage on P, in the vicinity of $V_{DD}$-$V_{THp}$, the PMOS begins to turn on. However, as the PMOS turns on, the voltage on its gate is kept driven towards the ground, not only because of our triggering action, but also because the NMOS is now beginning to turn on as well. Therefore, superimposed on the action initially causing the switch comes an additive force, which further accelerates the switch, hence, a positive feedback loop.

Fig. 4-9 CMOS thyristor concept in [Kim, 1996].

To accommodate another switching cycle, the thyristor must be pre-charged again, as there is no way to return to the prior state. This can be realized by adding a pre-charge circuit. The complete topology of the cell, where sequencing of the delaying and pre-charging events takes place, is shown in Figure 4-10. It works with two similar parts for supporting rising and falling edges, where the delayed rising output pre-charges the part that delays the falling edge. For example, before any rising edge of D, $\overline{Q}$ has already reached the high state through the preceding high to low transition of D. So, the left side thyristor is ready to delay the rising edge of D.

The thyristor-based methodology exhibits attractive characteristics in terms of power consumption and supply sensitivity. Power is consumed during switching with a small shunt current.



Fig. 4-10 Complete thyristor delay element with static triggering in [Kim, 1996].

Primarily, there is no static consumption apart from any control current generation mechanism. Also, supply sensitivity is generally low because the delay is composed of two components: the controlled current part and the switching part. The switching part is the only part depending on the supply value, because this is when a charge cycle is completed through positive feedback. If this duration is negligible in terms of the desired delay value, sensitivity to the supply voltage is obviously minimal.
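The two-component view of the delay can be illustrated with a simplified constant-current model, assumed here only for the sake of the argument. In the slow part, the control current moves the node by one threshold voltage from the rail; in the fast part, positive feedback adds a much larger current and completes the remaining swing. The capacitance, threshold and current values below are hypothetical, and the feedback current is treated as supply-independent, which a real circuit would not strictly satisfy.

```python
def thyristor_delay(c_f, vdd_v, v_th_v, i_ctrl_a, i_fb_a):
    """Return (t_slow, t_fast) for one delay event.

    t_slow: control current alone moves the node by v_th (supply-independent).
    t_fast: control plus feedback current covers the remaining vdd - v_th.
    """
    t_slow = c_f * v_th_v / i_ctrl_a
    t_fast = c_f * (vdd_v - v_th_v) / (i_ctrl_a + i_fb_a)
    return t_slow, t_fast


# Hypothetical values: 10 fF, 0.4 V threshold, 1 uA control, 100 uA feedback.
for vdd in (0.9, 1.0, 1.1):
    t_slow, t_fast = thyristor_delay(10e-15, vdd, 0.4, 1e-6, 100e-6)
    total = t_slow + t_fast
    print(f"VDD = {vdd:.1f} V: total = {total * 1e9:.3f} ns, "
          f"fast part = {100 * t_fast / total:.2f} % of total")
```

In this sketch only the fast part changes with the supply, so as long as it is a small fraction of the total, the overall delay barely moves when the supply varies.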

The thyristor-based delay element has attracted researchers' attention due to its special characteristics and various works have been proposed based on it. In [Zhang, 2004], the fact that the thyristor-based topology can suffer from charge sharing is acknowledged. The parasitic capacitance tied to  $\overline{Q}$  is shared with the source capacitances of the input transistors. The work proposed the addition of switches to pre-charge the output nodes, prior to switching the input.

In the work of [Schell, 2008], the thyristor-based topology was used to accommodate a delay cell that would support narrowly spaced bursts of asynchronous pulses. The delay element featured in this work produces delayed pulses of determined width. So, with respect to Figure 4-10, only one side of the circuit was needed. The circuit made use of an additional capacitor element, which can be slowly discharged through the current source. The same concept is followed for a similar circuit in the work of [Vezyrtzis, 2013]. The design comprised two capacitors and two current sources with independent biases for the same delay cell, in order to support further configurability.

The same general methodology is used for the thyristor delay topology in [Saft, 2014]. The work proposed two ideas on the basic scheme: (i) the addition of an extra current source to accommodate a sharper edge and reduce the transition shunt current (M9 in Figure 4-11) and (ii) a series diode connected with the output transistor pair in order to modify the thyristor activation point (not shown).

Fig. 4-11 Enhancing positive feedback with a current source (M9).

Another thyristor topology, proposed in [Kurchuk, 2012], consolidates the two stages of the Schmitt trigger idea of Figure 4-8. For the design shown in Figure 4-12 the principle of operation of the thyristor is still valid; the internal node, Vc, is (dis-)charged progressively up to a point where positive feedback forces the phenomenon to complete rapidly. This topology is very efficient in terms of power consumption as it comprises only a few inverter stages. Buffering the output renders the delay independent of the output load.



Fig. 4-12 Delay element in [Kurchuk, 2012].

Table 4-I Qualitative comparison of delay element types.

| Type                    | Inverter chain | Capacitive shunt | Semi-static | Current-starved | Thyristor |
|-------------------------|----------------|------------------|-------------|-----------------|-----------|
| Static power            | no             | no               | yes         | yes             | no        |
| Energy/toggle           | ∝ to delay     | ∝ to delay       | ∝ to delay  | ∝ to delay      | constant  |
| Supply sensitivity      | high           | medium           | medium      | medium          | low       |
| Temperature sensitivity | high           | medium           | medium      | medium          | very low  |

#### 4.2.6 Discussion

Table 4-I presents a qualitative comparison of the characteristics of the delay element types that were presented above. Evidently, there have been many ideas for producing digital delay elements, but out of the main topology classes described above, the one that seems most attractive for advanced CMOS technology nodes is the thyristor-based one. This is because of its lower sensitivity to supply and temperature as well as the fact that there is no static consumption.

All the reviewed implementations of the thyristor-based delay element are based on a usual control mechanism during transitions by modulating the (dis)charge current. As different technology nodes are targeted and delay ranges differ, it is difficult to quantify the efficiency of the topologies. To overcome this issue, simulations have been reported for comparisons. But even so, it is not clear what amount of optimization has gone into the comparisons and yet, no variant has proposed a remarkable improvement over the main topology. The table below presents a performance comparison of the most important works.

| Ref.        | [Jung, 2011] | [Nejad, 2005]   | [Kim, 1996] | [Li, 2006]         | [Schell, 2008] | [Kurchuk, 2010] |
|-------------|--------------|-----------------|-------------|--------------------|----------------|-----------------|
| Node        | 0.35µm       | 0.18µm          | 0.8µm       | 0.25µm             | 90nm           | 65nm            |
| Supply      | 3.3V         | 1.8V            | 2V          | 2.5V               | 1V             | 1V              |
| Type        | semi-static  | current starved | thyristor   | cascaded inverters | thyristor      | thyristor       |
| Delay range | 3-10ns       | <2.5ns          | 3ns-76.3ms  | -                  | 5ns-1µs        | 95-250ps        |
| Power       | 40µW         | 150-300µW       | 10nW @1MHz  | 12pJ/event         | 50fJ/event     | 14fJ/event      |

Table 4-II Summary of performance for the reviewed delay elements.

It is clear that selecting the proper delay cell depends mainly on its intended use. The decision feedback equalizer specification from the previous chapter targets speeds of 1-2Gbps. This sets the delay range to a few hundred picoseconds. The variant of [Kurchuk, 2012] is realized on a sub-100nm node and has reported the minimum power consumption for a fabricated circuit in that delay range. In the next section we build upon this design to present our proposed topology.

# 4.3 Prototype back-gate driven thyristor based delay line in 28nm FDSOI

According to STMicroelectronics the establishment of FDSOI technology during the recent years has been a key enabler of the continuation of Moore's law. FDSOI has provided a roadmap to evolve further chip integration by maintaining a planar technology and therefore maintaining simplicity in the manufacturing process. The FDSOI technology is implemented with a non-doped silicon thin film to realize the transistor channel, featuring an ultra-thin buried oxide underneath as seen in Figure 4-13. The advantage is an additional control mechanism to modify the transistor's behavior by applying bias to the body terminal. In traditional bulk CMOS technology body-biasing is in most cases an option compromising the transistor's operation due to the presence of leakage currents from the substrate. On the contrary, in FDSOI transistors the buried oxide provides the needed isolation from the substrate, so that a gate-like (namely, the back-gate) control mechanism is introduced. As seen further, this mechanism provides a means of performance fine-tuning.



Fig. 4-13 FDSOI transistor cross-section.



Fig. 4-14 The delay element circuit proposed by this work.

#### 4.3.1 Proposed Delay element design

Based on the above technology facts, combined with the conclusions on the current state of the art, a delay cell was implemented that benefits from the unique technology traits of FDSOI. The proposed cell topology is displayed in Figure 4-14 and can be readily divided into three parts. The first stage is a current-starved inverter, with the novelty of both gate and body bias control. The second stage is an inverter gated by the complement of the input signal, which also produces a feedback signal. Finally, there is a buffering stage, which generates complementary output signals.

The thyristor based mechanism, as detailed above, remains intact. In this case the delaying mechanism is the (dis)charging of the output of the starved-inverter. Each of the two biased transistors exists in parallel with one that becomes activated only by the feedback action of node V<sub>F</sub> from the next stage. Specifically as seen in Figure 4-15, during a charge or discharge cycle, the first stage output, node V<sub>c</sub>, slowly reaches the flipping point of the second stage inverter. By the time it flips, the corresponding parallel transistor on the feedback path shorts the "starvation" mechanism, forcefully completing the cycle. That is, fully charging node V<sub>c</sub> through the PMOS transistors, or discharging it completely through their NMOS counterparts. The slowly changing part refers to the current-starved part of the cycle, which will gradually trigger the positive feedback action. Due to this action, the cycle ends with a steep slope to the rail.



Fig. 4-15 Vc node evolution for a charging(red) or discharging(blue) event.

This compact topology presents a way to delay both rising and falling edges of the input signal with direct control over the gates of the starved transistor. The order of delay is principally set by the size of the input capacitance of the second stage inverter and the effective resistance of the starved transistor.

Evidently, the cell is composed of inverter structures so there is no static power consumption. Moreover, potential leakage currents are minimized because fabrication is carried out in FDSOI technology. As shown on the transistor cross-section in Figure 4-13, the FDSOI substrate is separated from the source and drain areas by a dielectric, resulting in a well-confined path for the current. On the contrary, in bulk technology, isolation from the substrate relies on reverse biased p-n junctions, so current leakage is more pronounced. Therefore, the element's consumption profile is dominated by (i) the current-starved (dis)charging part, which produces the desired delay, and (ii) the short-circuit currents during switching. In an attempt to minimize the latter, the cell is designed with complementary inputs, ensuring that the *in\_* signal *precedes* the *in* signal. This is easy to realize if the delay cell is cascaded with a similar one. Arriving directly at the second stage inverter, the input complement, the *in\_* signal, plays a preparatory role for the upcoming switch. When the *in* signal follows, say in a high state, in order to slowly discharge the second stage input, *in\_* is already set to a low state and has effectively shut the path to the ground. This action enables the second stage output to be raised high with a reduced short-circuit current loss. The corresponding phenomena take place on the falling edge of the input. Additionally, the delay cell output is buffered, an option that renders the delay value independent of the load of the delay cell. Buffering also steepens the output signal transitions in order to keep short-circuit current generation to a minimum for subsequent cells, but it also requires extra power. The required buffer size was set by the delay range and layout considerations.

The final feature of the designed delay element to comment on is the application of body biasing. This capability provides a second control terminal for the transistor current. Therefore, an extra way to modulate delay can be established. Summarizing the control nodes of the delay element, input rising edge delay control is handled by the NMOS gate ( $V_{Gn}$ ) and body ( $V_{Bn}$ ) voltages, whereas falling edge delay is controlled by the PMOS gate ( $V_{Gp}$ ) and body ( $V_{Bp}$ ) voltages. As will be presented in the next section, this biasing scheme enables a coarse/fine control mechanism for the delay with no additional hardware.

#### 4.3.2 Delay versus control

During the design phase, transistor sizing for this delay element was optimized for low power operation to support a delay range from the order of picoseconds to nanoseconds. In Figure 4-16, rising edge delay is simulated against the respective control voltages. We assume that the PMOS voltages are fixed at  $V_{Gp}$  =500mV and  $V_{Bp}$  =1V. We are varying  $V_{Bn}$  body bias (in green) for different  $V_{Gn}$  values to produce the family of curves below. Also shown is the variation of  $V_{Gn}$  (in blue) for a fixed  $V_{Bn}$  =0V, which represents a classic bulk technology approach.

Correspondingly, in Figure 4-17, for the falling edge delay variation, it is the NMOS voltages that are set to  $V_{Gn}$  =500mV and  $V_{Bn}$  =0V. We are varying  $V_{Bp}$  voltages (in yellow) for different  $V_{Gp}$  values for the counterpart to create the new family of curves. The figure also displays the curve for the delay when varying  $V_{Gp}$  (in red) for the typical fixed body bias of  $V_{Bp}$  =1V.

After reviewing the two figures, it is evident that the choice of the targeted technology and the utilization of body bias drastically widen the capability to create and control delays. Approaching delay control only through the typical way of setting the transistor gate voltage would narrow down control flexibility to what is represented by the red and blue curves. However, body biasing results in the delay value becoming dependent on both controls. The exponential shape of the curves reflects the dependence of the delay on the transistor current. Besides, the fact that the curve type is retained, regardless of the voltage varied (gate or body), is a reminder that the same mechanism is always involved. What differs between the two control paths is the thickness of the interposed oxides and the contact-channel distances.



Fig. 4-16 Simulation of rising edge delay variation vs. control



Fig. 4-17 Simulation of falling edge delay vs. control.

The range of body biasing, constrained by the fabrication design directives, permits the establishment of an application-dependent control strategy that can potentially ease the specification of the control mechanism. To demonstrate this, Figure 4-17 shows that gate-only control (red curve) can be realized for the shown extent of the delay with a dynamic range of around 800mV. Alternatively, establishing a constant gate voltage of 700mV (yellow curve in the middle) and varying the body bias yields the same extent of delay with more than double the dynamic range for the control. This could directly relax the design requirements for a controlling DAC by reducing the required resolution.

Clearly, providing a second degree of delay control enriches a system designer's repertoire of choosing the most fitting solution to the application at hand. Another remark that can be made concerns the slope of the delay variation that's increasing with the gate voltage under body bias conditions. The impact of the body bias follows the gate as the current slope increases above the threshold of the device.

This effect can be exploited when biasing the gate and can lead to obtaining a segment of the delay range that corresponds to a mostly linear part of the curve. To elaborate on this, still referring to Figure 4-17, we can assume a gate voltage in the vicinity of 500mV, where the delay versus body bias variation correlates best with a linear function. Figure 4-18 shows this situation in detail. This demonstrates a very fine resolution of delay values achieved over the applicable delay range. In this example the delay can be varied by about 100ps by modifying the body bias from 200mV to 1V.



Fig. 4-18 Delay coarse/fine control with gate/body biasing.

The gate bias is held constant as it is the control most sensitive to variations. This design strategy enables a control function of very high precision. In the example simulation, a very low sensitivity to the bias voltage of 100ps/800mV = 125fs/mV can be achieved.
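The sensitivity figure quoted above is simply the delay span divided by the body-bias span. The short Python sketch below reproduces that arithmetic and, as an illustration only, converts it into a delay step per DAC code. The DAC resolution values are hypothetical, and a linear delay-versus-bias curve and an ideal DAC covering the full 800mV span are assumed.

```python
def sensitivity_fs_per_mv(delay_span_ps, bias_span_mv):
    """Average delay sensitivity in fs/mV over a bias sweep."""
    return delay_span_ps * 1e3 / bias_span_mv


def delay_step_ps(delay_span_ps, dac_bits):
    """Delay change per DAC LSB, assuming a linear curve and an ideal DAC."""
    return delay_span_ps / (2 ** dac_bits - 1)


DELAY_SPAN_PS = 100.0   # from the example: about 100 ps
BIAS_SPAN_MV = 800.0    # body bias swept from 200 mV to 1 V

print(f"sensitivity: {sensitivity_fs_per_mv(DELAY_SPAN_PS, BIAS_SPAN_MV):.1f} fs/mV")

# Hypothetical DAC resolutions, for illustration only.
for bits in (4, 6, 8):
    print(f"{bits}-bit DAC -> {delay_step_ps(DELAY_SPAN_PS, bits):.3f} ps per code")
```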

# 4.4 Delay line prototyping

After acknowledging the characteristics and the flexibility of the proposed delay cell, the design of a programmable delay line is motivated. It should serve as the flexible delay element needed to implement the feedback filter of the proposed Decision Feedback Equalizer that was described in the previous chapter. For the remainder of this section we will focus on the implementation features of the fabricated delay-line.

In an effort to fully investigate the topology, control flexibility was targeted. Besides, this was the first time of characterizing this type of delay circuit, so producing delays by varying the gate or body biases on both NMOS and PMOS was envisioned. The granular delay transistors were carefully laid out in twin and triple-well arrangements in order to ensure isolation and proper biasing. The well arrangement is shown in Figure 4-19.

A block diagram of the implemented delay-line topology is presented in Figure 4-20. The design strategy involves a cascade of granular cells such as the one presented in the previous section.

The cascade of granular elements is organized in groups with the biggest group having eleven granular delay elements. After post-layout simulations, the choice for the increment of group size was based on the range of delay for the granular cell and ensures minimum overlapping when programming a delay value.

Fig. 4-19 Well arrangement for the granular delay cell.

Fig. 4-20 Delay line block diagram.

The delay line is designed with multiple output tap points, so programmable delay values can be produced. This is realized by connecting all tap output nodes to a common bus. Control of the bus is carried out with a thermometer-coded signal (*en*) that isolates the selected output.

Moreover, in order to minimize power consumption, a shut-down scheme is established for the delay groups that are not participating in the output delay. A function cell (*the lead*) is placed between groups, which can enable the output of the previous group while shutting down the subsequent groups.



Fig. 4-21 The 'lead' cell between groups enables the output from the previous group and shuts down the rest of the delay line.

The *lead*'s functionality is presented in Figure 4-21. The logic uses the preceding control bit to identify the thermometer code extent and routes the output of the previous group to the bus by activating a tri-state buffer. At the same time it isolates the input of the subsequent granular cell and disables it by pulling it up to the positive rail. The corresponding pull-down is performed for the complementary granular cell input. This effectively propagates a steady state through the granular elements that do not contribute to the delay function, with no further consumption except for that coming from leakage currents.

Apart from the programmable delay line length, the four gate/body bias voltages are connected in parallel to all granular delay cells of the line. This way, a general two-step configuration is permitted. The delay value at the output is given as the product of the common granular delay multiplied by the number of the enabled elements.
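The two-step configuration can be summarized with a small behavioral model: the thermometer code selects how many groups are enabled, and the output delay is the number of enabled granular cells times the common granular delay. The Python sketch below is only an illustration; the group sizes and the granular delay value are hypothetical (they are not the fabricated grouping), and the delay added by the *lead* cells and the bus driver is ignored.

```python
def enabled_groups(en):
    """Count the leading ones of a thermometer code.

    Raises ValueError if a 1 appears after a 0, since that is not a
    valid thermometer code.
    """
    count = 0
    seen_zero = False
    for bit in en:
        if bit not in (0, 1):
            raise ValueError("control bits must be 0 or 1")
        if bit == 1:
            if seen_zero:
                raise ValueError("not a thermometer code")
            count += 1
        else:
            seen_zero = True
    return count


def line_delay_ps(en, group_sizes, granular_delay_ps):
    """Output delay: enabled granular cells times the common granular delay."""
    if len(en) != len(group_sizes):
        raise ValueError("one control bit per group is expected")
    n_cells = sum(group_sizes[:enabled_groups(en)])
    return n_cells * granular_delay_ps


# Hypothetical grouping and granular delay, for illustration only.
GROUP_SIZES = (1, 2, 4, 8)
GRANULAR_DELAY_PS = 200.0

for code in ([1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]):
    print(code, "->", line_delay_ps(code, GROUP_SIZES, GRANULAR_DELAY_PS), "ps")
```

Coarse programming corresponds to changing the code, while the bias voltages shared by all cells change the granular delay that multiplies the cell count.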

#### 4.4.1 Post-layout simulation results

Figure 4-22 shows simulated<sup>7</sup> rising edge delay times versus the number of enabled granular delay groups. The values for the "starved" NMOS/PMOS gate biases were held constant at 400mV and 600mV respectively with the PMOS body on 1V. With each group enabled, the NMOS body bias was set on either 0V or 800mV to account for the extent of variability under the common 1V supply.

The range of variability for the granular cell can accommodate Gb/s rates and was taken into account when choosing the increment in group size, as shown in the previous chapter. Furthermore, body biasing enables a very fine tuning of delay. A sensitivity of 142fs/mV is achievable over a range of 113ps, and 5.52ps/mV over 4.42ns with the last group; somewhat less than an order of magnitude. The granular delay cell alone has been simulated to consume 5.16fJ/bit. Compared to the 14fJ/bit cell of [Kurchuk, 2010], the reduction comes from technology scaling and a somewhat different shutdown mechanism. The simulated results are summarized in Table 4-III.

<sup>7</sup> Post-layout simulations for the granular cell included resistive and capacitive parasitics.



*Fig. 4-22 Simulation of the programmable range for rising edge delay times.*

*Table 4-III Summary of simulation results.*

| Parameter                            | Value                  |
|--------------------------------------|------------------------|
| Range of delay (complete line)       | 300ps - 12.8ns         |
| Range of delay (granular)            | 113ps - 442ps          |
| Delay line average sensitivity       | 142fs/mV - 5.52ps/mV   |
| Bit efficiency - granular cell       | 5.16fJ/bit             |
| Bit efficiency - complete delay line | 35fJ/bit - 260fJ/bit   |

#### 4.4.2 Chip design

The fabricated delay line prototype design is summarized in Figure 4-23. It features external control for the gate voltages of the granular cells and two 1V referenced R-string DACs for body-biasing. A second variant was also fabricated to facilitate characterization with external control and to characterize the DAC separately. The *en* control signals are provided through control registers that are externally loaded through a serial programming interface. The measurements are taken by subtracting the dummy output delay from the delay line output to de-embed the impact of pad drivers, external routing and cabling.

Fig. 4-23 Delay line prototype built for characterization and die detail with layout inset. The delay line size is 140µm x 7µm.

# 4.5 Measurement results

Within a strict time budget for documenting results, this version of the manuscript was written as the experimental test-bench had just been assembled. Nonetheless, the results below verify the good functionality of the delay line prototypes.

#### 4.5.1 Coarse/fine character

The measurements were made with Register #2 of Figure 4-23 in initialization mode, which enabled all 39 granular delay elements in the line. The input was a 1 kHz square wave with a rise time of 27.5ns. These measurements were taken using a 1GHz oscilloscope.

The measurements refer to rising and falling edge delay times between the line output and the dummy output. The sweeps involve:

- (i) varying the body bias of the NMOS transistor ( $V_{Bn}$ ) in the 0-800mV range for different values of the NMOS gate biases ( $V_{Gn}$ )
- (ii) varying the body bias of the PMOS transistor  $(V_{Bp})$  in the 200mV-1V range for different values of the PMOS gate biases  $(V_{Gp})$ .

The gate voltages were also varied for the typical cases of  $V_{Bn}=0$  (rising edge measurements) or  $V_{Bp} = 1$  (falling edge measurements) and they are kept as a reference for all curves to show the coarse delay tuning, which is achieved through gate-only control.

Simulation results are presented with the measurements of the rising edge for completeness. The simulations refer to schematic representations for the delay line, with RC parasitics for the granular cells only. They do not include the delay induced by front drivers, overhead routing, the *lead* blocks and the common bus rail driver.

The coarse/fine character that body biasing enables is observed as expected. Sensitivity in the order of fs/mV is attainable. This fine character is modified by varying the gate bias, which trades off delay range against sensitivity. Table 4-IV below summarizes the measured results seen in Figures 4-24 to 4-31.

| Delay type         | Control type | Constant bias    | Varying bias         | Delay range (ns) | Sensitivity (ps/mV) |
|--------------------|--------------|------------------|----------------------|------------------|---------------------|
| Rising edge delay  | Gate biasing | $V_{Bn}$ = 0V    | $V_{Gn}$ = 0.3 - 1V  | 5.63-80.18       | 106.5               |
| Rising edge delay  | Body biasing | $V_{Gn}$ = 300mV | $V_{Bn}$ = 0 - 0.8V  | 27.9-79.75       | 74.07               |
| Rising edge delay  | Body biasing | $V_{Gn}$ = 400mV | $V_{Bn}$ = 0 - 0.8V  | 10.87-18.09      | 10.31               |
| Rising edge delay  | Body biasing | $V_{Gn}$ = 500mV | $V_{Bn}$ = 0 - 0.8V  | 7.41-8.92        | 2.16                |
| Rising edge delay  | Body biasing | $V_{Gn}$ = 600mV | $V_{Bn}$ = 0 - 0.8V  | 6.31-6.84        | 0.76                |
| Falling edge delay | Gate biasing | $V_{Bp}$ = 1V    | $V_{Gp}$ = 0 - 0.7V  | 5.02-171.4       | 237.68              |
| Falling edge delay | Body biasing | $V_{Gp}$ = 400mV | $V_{Bp}$ = 0.2 - 1V  | 6.39-7.44        | 1.5                 |
| Falling edge delay | Body biasing | $V_{Gp}$ = 500mV | $V_{Bp}$ = 0.2 - 1V  | 8.57-12.17       | 5.14                |
| Falling edge delay | Body biasing | $V_{Gp}$ = 600mV | $V_{Bp}$ = 0.2 - 1V  | 16.36-32.9       | 23.63               |
| Falling edge delay | Body biasing | $V_{Gp}$ = 700mV | $V_{Bp}$ = 0.2 - 1V  | 53.6-160.2       | 152.29              |

Table 4-IV Coarse/fine delay measurements for complete delay line enabled
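The sensitivity column of Table 4-IV can be cross-checked from the delay ranges. The following is a minimal sketch, not the procedure used for the measurements: it divides each delay span by a voltage span. The 700mV span is an assumption inferred from the numbers. It matches the stated gate sweeps (0.3-1V and 0-0.7V), whereas the body sweeps are stated as 800mV wide, yet the tabulated body sensitivities are also reproduced with 700mV. The names `ROWS`, `SPAN_MV` and `sensitivity_ps_per_mv` are illustrative.

```python
# Delay endpoints (ns) and tabulated sensitivities (ps/mV) copied from Table 4-IV.
ROWS = [
    # (label, delay_a_ns, delay_b_ns, tabulated_ps_per_mV)
    ("rising, gate",               5.63,  80.18, 106.5),
    ("rising, body, VGn=300mV",   27.9,   79.75,  74.07),
    ("rising, body, VGn=400mV",   18.09,  10.87,  10.31),
    ("rising, body, VGn=500mV",    7.41,   8.92,   2.16),
    ("rising, body, VGn=600mV",    6.31,   6.84,   0.76),
    ("falling, gate",              5.02, 171.4,  237.68),
    ("falling, body, VGp=400mV",   6.39,   7.44,   1.5),
    ("falling, body, VGp=500mV",   8.57,  12.17,   5.14),
    ("falling, body, VGp=600mV",  16.36,  32.9,   23.63),
    ("falling, body, VGp=700mV",  53.6,  160.2,  152.29),
]

# ASSUMPTION: a 700 mV span, inferred from the tabulated values (not stated in the text).
SPAN_MV = 700.0


def sensitivity_ps_per_mv(delay_a_ns, delay_b_ns, span_mv=SPAN_MV):
    """Average sensitivity: delay span (converted to ps) over the voltage span (mV)."""
    return abs(delay_b_ns - delay_a_ns) * 1000.0 / span_mv


for label, d_a, d_b, tabulated in ROWS:
    computed = sensitivity_ps_per_mv(d_a, d_b)
    print(f"{label:26s} computed {computed:8.2f} ps/mV, tabulated {tabulated:8.2f} ps/mV")
```

This is an average slope over the sweep. The measured curves are not linear, so the local sensitivity at a given bias point can differ from it.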



Fig. 4-24 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn} = 0-1V$ under $V_{Bn} = 0$, (ii) body voltage variation: $V_{Bn} = 0-0.8V$ under $V_{Gn} = 0.3V$



Fig. 4-25 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn} = 0-1V$ under $V_{Bn} = 0$, (ii) body voltage variation: $V_{Bn} = 0-0.8V$ under $V_{Gn} = 0.4V$



Fig. 4-26 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn} = 0-1V$ under $V_{Bn} = 0$, (ii) body voltage variation: $V_{Bn} = 0-0.8V$ under $V_{Gn} = 0.5V$



Fig. 4-27 Rising edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gn} = 0-1V$ under $V_{Bn} = 0$, (ii) body voltage variation: $V_{Bn} = 0-0.8V$ under $V_{Gn} = 0.6V$



Fig. 4-28 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp} = 0-1V$ under $V_{Bp} = 1V$, (ii) body voltage variation: $V_{Bp} = 1-0.2V$ under $V_{Gp} = 0.4V$



Fig. 4-29 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp} = 0-1V$ under $V_{Bp} = 1V$, (ii) body voltage variation: $V_{Bp} = 1-0.2V$ under $V_{Gp} = 0.5V$



Fig. 4-30 Falling edge delay vs. gate/body biasing: (i) gate voltage variation: $V_{Gp} = 0-1V$ under $V_{Bp} = 1V$, (ii) body voltage variation: $V_{Bp} = 1-0.2V$ under $V_{Gp} = 0.6V$



Fig. 4-31 Falling edge delay vs. gate/body biasing: (i) gate voltage variation:  $V_{Gp}$ =0-1V under  $V_{Bp}$  =1, (ii) body voltage variation:  $V_{Bp}$  =1-0.2V under  $V_{Gp}$ =0.7

#### 4.5.2 Extended body biasing

The presence of the buried oxide under the FDSOI transistor channel enables body biasing beyond the 0-1V range. Indicative delay measurements were taken by extending the applied voltages by 800mV. More specifically, for the rising edge delay, $V_{Bn}$ was varied from -0.8V to 0.8V, and for the falling edge delay, $V_{Bp}$ was varied from 200mV to 1.8V. The results displayed in Figures 4-32 and 4-33 reveal no change in the character of the delay variation and indeed advocate the use of extended body biasing as an effective means of extending the delay range.



Fig. 4-32 Rising edge delay vs. biasing gate/body (39 elements) Body curve:  $V_{Gp}=0V$ ,  $V_{Bp}=1V$ ,  $V_{Gn}=500mV$ ,  $V_{Bn}=-800mV$  to 800mV. Gate curve:  $V_{Gp}=0$ ,  $V_{Bp}=1$ ,  $V_{Bn}=0$ ,  $V_{Gn}=0.5-1V$ 



Fig. 4-33 Falling edge delay vs. biasing gate/body (39 elements). Body curve: V<sub>Gp</sub>=500mV, V<sub>Bp</sub>=200mV-1.8V, V<sub>Gn</sub>=1V, V<sub>Bn</sub>=0V. Gate curve: V<sub>Gp</sub>=0V-0.5V, V<sub>Bp</sub>=1V, V<sub>Bn</sub>=0V, V<sub>Gn</sub>=1V

#### 4.5.3 Delay line programmability

The functionality and programmability of the control were verified by enabling consecutive delay line groups. A series of rising edge delay measurements was performed under constant gate biasing with $V_{Gn}$ = 400mV for an increasing number of active granular element groups. The rising edge delay values were measured for $V_{Bn}$ values of 0 and 800mV. The PMOS bias voltages were set to $V_{Gp}$ = 600mV and $V_{Bp}$ = 1V throughout the set. The measured results are displayed in Figure 4-34 against simulated results.

The delay line under these gate conditions achieves a range of 530ps to 16.13ns. All bias conditions are common throughout the control vector sweep, so the resolution and the delay range of each constituent granular element remain constant. The shape of the curve reflects the compounded effect of the incremental group size.



Fig. 4-34 Delay line programmability measurement

#### 4.5.4 Granular delay element performance

Multiple delay group measurements were performed to extrapolate the performance of a single granular delay element. The goal is to de-embed the delay of the *lead* circuit as well as any overhead buffering before the pad driver<sup>8</sup>. The scenario involved the measurement of rising edge delays for all delay line groups with $V_{Gn} = V_{Bn} = 500$mV, $V_{Gp} = 0$V and $V_{Bp} = 1$V. This resulted in Table 4-V. Solving the set of equations, we get 226.75ps for the bus driver delay, 73.7ps for the lead and 175.2ps for the granular delay element. Hence, the delay variation for a single granular element is deduced (Figure 4-35).
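The de-embedding step can be sketched as a small linear model. It assumes that configuration $k$ places one bus driver, $L_k$ *lead* blocks and $N_k$ granular elements in the signal path, as in the group structure listed in Table 4-VII, and that their delays simply add. The symbols $T_k$, $L_k$, $N_k$, $t_{driver}$, $t_{lead}$ and $t_{gran}$ are introduced here for illustration only:

$$T_k = t_{driver} + L_k\,t_{lead} + N_k\,t_{gran}$$

Any three configurations whose rows $(1, L_k, N_k)$ are linearly independent determine the three unknowns. The values quoted above correspond to $t_{driver} = 226.75$ps, $t_{lead} = 73.7$ps and $t_{gran} = 175.2$ps.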



Table 4-V Rising edge delay measurements for $V_{Gn} = V_{Bn} = 500$mV

Fig. 4-35 Single, granular delay element rising edge delay vs. biasing gate/body. Body curve: V<sub>Bn</sub> =0-0.8V, V<sub>Gn</sub> =500mV, V<sub>Gp</sub>=0V, V<sub>Bp</sub> =1V Gate curve: V<sub>Gn</sub> =0.2V-1V, V<sub>Bn</sub> =0, V<sub>Gp</sub>=0V, V<sub>Bp</sub> =1. The sensitivity is 40ps/800mV=50fs/mV

<sup>8</sup> The remaining pad driver delay is removed by measuring the delay with respect to the dummy output.

#### 4.5.5 Supply voltage sensitivity

A series of rising edge delay measurements was performed for different supply voltages. The results displayed in Figure 4-36 involved $V_{Gn} = V_{Bn} = 500$mV with $V_{Gp} = 0$V and $V_{Bp} = 1$V and the complete line enabled. As the delay is produced by components with different sensitivity to the supply voltage (granular delay cell, lead cell, bus driver), we repeat the previous methodology by measuring the delay for different supply voltages. The results are shown in Table 4-VI.



Fig. 4-36 Total delay line variation with supply voltage for  $V_{Gn} = V_{Bn} = 0.5$ ,  $V_{Gp} = 0$ ,  $V_{Bp} = 1$ 

|              | 1.1V   | 1V     | 0.9V    |
|--------------|--------|--------|---------|
| Granular(ps) | 157.94 | 174.64 | 204.67  |
| Driver(ps)   | 157.35 | 190.19 | 243.21  |
| Lead(ps)     | 62.51  | 75.24  | 95.6057 |

Table 4-VI Supply voltage sensitivity derived
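To make the different supply sensitivities of the three components easier to compare, the entries of Table 4-VI can be expressed as relative changes with respect to the 1V column. The sketch below is illustrative: the names `DELAYS_PS` and `relative_change` are not part of the original work, and the ratio to the 1V value is one possible figure of merit among several.

```python
# Component delays (ps) copied from Table 4-VI, keyed by supply voltage (V).
DELAYS_PS = {
    "granular": {1.1: 157.94, 1.0: 174.64, 0.9: 204.67},
    "driver":   {1.1: 157.35, 1.0: 190.19, 0.9: 243.21},
    "lead":     {1.1: 62.51,  1.0: 75.24,  0.9: 95.6057},
}


def relative_change(component, vdd, vref=1.0):
    """Fractional delay change of a component at supply vdd relative to vref."""
    delays = DELAYS_PS[component]
    return delays[vdd] / delays[vref] - 1.0


for name in DELAYS_PS:
    low = 100.0 * relative_change(name, 0.9)
    high = 100.0 * relative_change(name, 1.1)
    print(f"{name:9s} {low:+.1f}% at 0.9V, {high:+.1f}% at 1.1V")
```

With these numbers, the granular cell delay changes by a smaller fraction than the driver and lead delays for both the 0.9V and the 1.1V supply.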

#### 4.5.6 Dynamic Power consumption

A separate supply pin used for both delay lines of Figure 4-23 enabled measurement of power consumption. During measurements, one delay line was always completely shut down using the shut-down mechanism of the first lead. On the delay line under test, in order to de-embed the consumption of the leads and bus driver, the same methodology as in the previous paragraphs was used, by consecutively enabling delay groups. Initially, the total leakage power consumption floor was identified as 9.7µW. Power consumption was measured with an input frequency of 100MHz. The results for an increasing number of active delay groups are seen in Table 4-VII. Solving the system of equations for the power consumption of the granular element, the bus driver and the lead, we get: $P_{elem}=1.2679\mu$W, $P_{driver}=11.1375\mu$W and $P_{lead}=0.5774\mu$W. The results signify a dynamic consumption of 12.68fJ/bit for the granular delay element. The measurements were verified by repeating the procedure with the other delay line disabled.

| Power simulated (µW) | Power measured (µW) | Elements enabled | # Bus drivers | # Leads |
|----------------------|---------------------|------------------|---------------|---------|
| 3.5                  | 13                  | 1                | 1             | 1       |
| 4.2                  | 14.8                | 2                | 1             | 2       |
| 5                    | 16.7                | 3                | 1             | 3       |
| 5.7                  | 18.5                | 4                | 1             | 4       |
| 6.5                  | 20.3                | 5                | 1             | 5       |
| 7.8                  | 23.6                | 7                | 1             | 6       |
| 9.8                  | 27.8                | 10               | 1             | 7       |
| 11.9                 | 33.5                | 14               | 1             | 8       |
| 15.3                 | 41.7                | 20               | 1             | 9       |

Table 4-VII Power consumption for 100MHz input, $V_{Gn} = V_{Bn} = 500$mV

|                          | P<sub>elem</sub>   | P<sub>driver</sub> | P<sub>lead</sub> |
|--------------------------|--------------------|--------------------|------------------|
| Power (µW) @100MHz input | 1.2679 (0.517 sim) | 11.1375            | 0.5774           |
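The power de-embedding can be illustrated with the measured column of Table 4-VII. The sketch below assumes the additive model $P = N\,P_{elem} + P_{driver} + L\,P_{lead}$ and solves it exactly from three rows by Cramer's rule. Which rows were used in the original extraction is not stated, so the row selection here is an assumption and the solved values only approximate the reported ones. The first five rows have equal element and lead counts, so at least one later row is needed for a solvable system. All function and variable names are illustrative.

```python
# Rows of Table 4-VII: (elements enabled, leads, measured power in uW).
# One bus driver is active in every row.
MEASURED = [
    (1, 1, 13.0), (2, 2, 14.8), (3, 3, 16.7), (4, 4, 18.5), (5, 5, 20.3),
    (7, 6, 23.6), (10, 7, 27.8), (14, 8, 33.5), (20, 9, 41.7),
]

# Values reported in the text (uW).
REPORTED = {"elem": 1.2679, "driver": 11.1375, "lead": 0.5774}


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_three_rows(rows):
    """Solve P = N*P_elem + P_driver + L*P_lead from three rows (Cramer's rule)."""
    a = [[float(n), 1.0, float(leads)] for n, leads, _ in rows]
    b = [p for _, _, p in rows]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rows are linearly dependent")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = b[i]  # replace one column by the measurements
        solution.append(det3(m) / d)
    return dict(zip(("elem", "driver", "lead"), solution))


def model_power(n_elem, n_lead, p):
    """Additive power model evaluated for one delay line configuration (uW)."""
    return n_elem * p["elem"] + p["driver"] + n_lead * p["lead"]


# ASSUMPTION: first, sixth and last rows chosen for illustration.
fit = solve_three_rows([MEASURED[0], MEASURED[5], MEASURED[8]])

# Largest deviation between the reported coefficients and the measured column.
worst_residual = max(abs(model_power(n, l, REPORTED) - p) for n, l, p in MEASURED)

# Energy per input cycle of one granular element at 100 MHz, in fJ.
energy_fj = REPORTED["elem"] * 1e-6 / 100e6 * 1e15

print(fit, worst_residual, energy_fj)
```

Dividing the reported $P_{elem}$ by the 100MHz input frequency gives the energy figure quoted in the text, under the assumption of one bit per input cycle.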


# Chapter 5 A 5-tap mW/Gbps DFE for 60GHz basebands

From the system-level analysis in Chapter 3, a highly configurable delay element was specified to support a critical tap cancellation scheme for the highest-magnitude components of the 60GHz channel impulse response. After reviewing the most favorable design options, the choice was to employ a cascade of novel, finely tunable delay elements. Capitalizing upon the merits of substrate isolation and body biasing, some unique characteristics were highlighted through the characterization described in the previous chapter. The delay line under test proved to be finely tunable over a wide delay range. It is therefore an excellent candidate for the programmable delay element serving each tap in the DFE's feedback loop.

In this chapter, the main hardware demonstrator of this thesis is presented, which was also fabricated in 28nm FDSOI. It features a flexible-tap analog DFE with 5 taps, serving a 60GHz receiver baseband system. Its role is to mitigate ISI, which mainly stems from signal reflections reaching the receiver along with the useful signal. The implemented topology follows a common mixed-signal approach, adopted recently for 60GHz basebands, that draws inspiration from wireline DFEs. The DFE is based on analog summation in current-mode logic, with variable current sources implementing the tap coefficients. The sliced data are driven to an un-clocked, continuous-time digital delay line. The novelty of the fabricated circuit lies in the absence of a clock in the feedback branch, where the delay line would otherwise typically be implemented as a long clocked shift register. The power savings are not confined to the removal of clock drivers; they also come from the fact that the continuous-time feedback approach exhibits a much smaller consumption floor than the typical static, fully clocked approach. In contrast to the clocked approach, power consumption now depends on the channel's impulse response, because each channel realization requires different delay values and hence the delay-line consumption varies accordingly. The remainder of this chapter highlights the key design aspects of the proposed DFE and presents the characterization results, followed by a short discussion and comparison with the relevant state of the art that was presented previously.

## 5.1 Specification of the proposed DFE

The basic block diagram of the fabricated prototype DFE is shown in Figure 5-1. The channel input is summed with the weighted and delayed versions of previous bit decisions made by the clocked slicer. Assuming these decisions are correct, this alleviates the ISI caused by these bits (post-cursor ISI). Coefficients are programmable to reflect the need for equalization of multiple 60GHz channel realizations, thus following the non-static character of the channel. The prototype is realized with five variable delay elements attached to the respective coefficients. Under this arrangement, each of the five coefficients covers a certain delay range instead of a specific value. This functionality is realized with a continuous-time digital delay line. The desired rate of operation imposes a granularity constraint on the delay line: the granular delay cannot be greater than the clock period applied to the slicer. This requirement dictates the implementation of the delay elements as delay lines themselves.

A data rate of 2Gbps is proposed, which is reasonable for the realization of a low-power 60GHz wireless system. The proposed system can be employed with a single-carrier modulation, such as BPSK or QPSK, as suggested by the IEEE 802.11ad standard for single-carrier operation<sup>9</sup>. As mentioned in the previous chapter, the delay line can support a delay spread coverage of 13ns, and the granularity required for the above bit rate should be below 500ps. Moreover, the channel realizations studied in Chapter 3 set the requirement for coefficient amplitude to 0.4 times the amplitude of the cursor. Consequently, with the five-tap arrangement, this results in a total ISI-correcting capability of twice the cursor amplitude. Furthermore, regarding the specification for the quantization of the coefficients, the study has revealed that a minimum resolution of 5 bits does not deteriorate the system's BER. These general specifications reflect the motivation for the circuit design effort presented ahead. The specifications are summarized in Table 5-I.

Fig. 5-1 Block level circuit design for the proposed DFE.

<sup>9</sup> The Single Carrier PHY option of the standard specifies rates of 385-4620 Mb/s.

#### Table 5-I Table of specifications for the DFE.

| Technology                      | 28nm FDSOI |
|---------------------------------|------------|
| Sample Rate                     | 2GS/s      |
| Modulation                      | BPSK       |
| Targeted BER                    | $10^{-3}$  |
| # DFE coeff. / IQ-channel       | 5          |
| Programmable delay coefficients | Yes        |
| Maximum Delay Spread Coverage   | 65ns       |
| Nominal delay spread coverage   | 300ps-13ns |
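The headline numbers of this specification follow from simple arithmetic, sketched below. The constants are taken from the text. The variable names are illustrative, and the count of unit intervals spanned by the nominal coverage is a derived figure that is not stated in the text.

```python
BIT_RATE_BPS = 2e9          # proposed data rate
N_TAPS = 5                  # number of DFE coefficients per channel
MAX_COEFF = 0.4             # maximum coefficient, relative to the cursor amplitude
NOMINAL_COVERAGE_S = 13e-9  # upper end of the nominal delay spread coverage

# One unit interval bounds the delay granularity from above.
unit_interval_ps = 1e12 / BIT_RATE_BPS

# Worst case: all taps at maximum amplitude.
total_isi_correction = N_TAPS * MAX_COEFF

# Derived: number of unit intervals spanned by the nominal coverage.
unit_intervals_covered = NOMINAL_COVERAGE_S * BIT_RATE_BPS

print(f"unit interval: {unit_interval_ps:.0f} ps")
print(f"total ISI correction: {total_isi_correction:.1f} x cursor")
print(f"nominal coverage: {unit_intervals_covered:.0f} unit intervals")
```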

## 5.2 Prototype DFE circuit design

#### 5.2.1 Principle of operation

The specified block level diagram for the DFE can be seen in Figure 5-2. The topology represents an established scheme for mixed signal DFEs, where summation is carried out in current-mode for simplicity. All that is required to implement summation is to drive the relevant signals to the same node. When differential signaling is used, summation is carried out over two rails. These rails are connected to the comparator input.

The differential input voltage is applied to a transconductance block implemented as a resistively loaded differential pair. The weighting of the sliced and delayed data to be summed is carried out through programmable current sources. Current-steering switches implement the multiplication of the coefficients with the digital data by drawing current from the respective rail.

The implementation of the coefficient multiplication also takes into account the sign of the coefficient. In the differential scheme, the four possible combinations of sign and bit value reduce to drawing current from either one rail or the other. Addition is implemented by drawing current from the rail dubbed "negative" and subtraction by drawing current from the "positive" rail. This functionality is realized by an XOR function of the bit value and the coefficient sign, as seen in Table 5-II.

| Bit value (differential) | Coefficient | Required function | Implemented function | Positive rail switch | Negative rail switch |
|--------------------------|-------------|-------------------|----------------------|----------------------|----------------------|
| 1                        | $a$         | $(V_+ - V_-) + a$ | $V_+ - (V_- - a)$    | Off                  | On                   |
| 1                        | $-a$        | $(V_+ - V_-) - a$ | $(V_+ - a) - V_-$    | On                   | Off                  |
| -1                       | $a$         | $(V_+ - V_-) - a$ | $(V_+ - a) - V_-$    | On                   | Off                  |
| -1                       | $-a$        | $(V_+ - V_-) + a$ | $V_+ - (V_- - a)$    | Off                  | On                   |

Table 5-II Coefficient summing logic.
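The switch selection of Table 5-II can be written as a short behavioural model. This is a sketch of the logic only, not of the transistor-level driver: the function names are illustrative, and both the bit and the coefficient sign are encoded as +1 or -1.

```python
def rail_switches(bit, coeff_sign):
    """Return (positive_rail_on, negative_rail_on) for one tap.

    bit and coeff_sign take the values +1 or -1. Current is drawn from the
    positive rail (subtraction) when the two signs differ, which is an XOR
    of the two sign bits, and from the negative rail (addition) otherwise.
    """
    if bit not in (1, -1) or coeff_sign not in (1, -1):
        raise ValueError("bit and coeff_sign must be +1 or -1")
    subtract = (bit < 0) != (coeff_sign < 0)  # XOR of the sign bits
    return subtract, not subtract


def differential_contribution(bit, coeff_sign, magnitude):
    """Signed amount added to (V+ - V-) by one tap, per the table."""
    positive_on, _ = rail_switches(bit, coeff_sign)
    return -magnitude if positive_on else magnitude


for bit in (1, -1):
    for sign in (1, -1):
        pos, neg = rail_switches(bit, sign)
        print(f"bit={bit:+d} coeff sign={sign:+d} -> positive rail "
              f"{'On' if pos else 'Off'}, negative rail {'On' if neg else 'Off'}")
```

The contribution returned by `differential_contribution` equals the product of the bit, the coefficient sign and the magnitude, which is the multiplication that the current-steering switches implement.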

The currents representing the cursor input and the coefficient scaling inputs are summed over the rails which are connected to the comparator input. On each clock cycle, a decision generates a binary symbol which is propagated along the delay line.

#### 5.2.2 Transistor level design

#### 5.2.2.1 Analog summer



Fig. 5-2 Summer and coefficients.

A more detailed view of the summation circuit is shown in Figure 5-2. Generally speaking, DFE design for a specific data rate essentially sets the settling time for coefficient multiplication and therefore the bandwidth of the summer. In such a topology, the bandwidth is determined primarily by the resistive load of the input differential pair and the capacitances added by the switches connected to the rails. Settling time is chosen to be around 4 time constants, which translates to reaching more than 98% of the final value.
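The 98% figure follows from the step response of a first-order system. The sketch below assumes a single-pole RC response at the summing rails, which is a simplification. The settling window of one unit interval used for the time-constant bound is a hypothetical value chosen for illustration; it is not a constraint stated for this design.

```python
import math


def settled_fraction(n_tau):
    """Fraction of the final value reached after n_tau time constants
    (single-pole step response, an assumption of this sketch)."""
    return 1.0 - math.exp(-n_tau)


def max_time_constant(window_s, n_tau=4.0):
    """Largest RC time constant that still allows n_tau time constants
    of settling within the given window."""
    return window_s / n_tau


UNIT_INTERVAL_S = 1.0 / 2e9  # 500 ps at 2 Gb/s

# HYPOTHETICAL: the settling window is taken as one full unit interval.
tau_max_s = max_time_constant(UNIT_INTERVAL_S)

print(f"settled after 4 tau: {100 * settled_fraction(4):.2f}%")
print(f"tau bound for a one-UI window: {tau_max_s * 1e12:.0f} ps")
```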

Besides bandwidth, the resistance value should be chosen bearing in mind the rail common mode. The common mode is also applied to the input of the comparator and should accommodate the rail voltage swing. This swing has been chosen to be 120mV following a typical RF front end link-budget as in the work of [Thakkar, 2012].

The common-mode depends on the current of the differential input pair, which practically determines the power consumption of the summer circuit. This current also sets the gain of the input pair, which in turn determines the swing at the input of the DFE. This input swing can be guaranteed by a variable gain amplifier stage that usually precedes the DFE.

The tap count is low and the technology node implies relatively small capacitances. The required data rate of 2Gb/s is also not too extreme<sup>10</sup>, so a relatively large resistance value can be used for the required bias point. However, attention should be paid to the generated thermal noise, which, along with all other noise sources, should not exceed the quantization step of the rail voltage. A peak-to-peak noise of 860µV was simulated at the input of the comparator.

<sup>10</sup> Referring to the Chapter 2 state of the art in wired links, multi-GS/s rates have been achieved.

The current source for the input pair ($I_{tap}$ in Figure 5-3) is provided from an external reference, and the transistors implementing the tail were sized according to [Pelgrom, 1989] with a view to achieving current matching better than 1%.

As for the coefficient values, the current sources on the taps are set through 8-bit voltage DACs. Although this precision exceeds the requirement discussed in Chapter 3, the design of the delay line voltage DAC was reused from another project. However, distributing voltage from the DACs to the coefficients was avoided due to probable asymmetries in the respective ground references. Besides, the DACs were placed away from the DFE core. Instead, current-mode distribution to the coefficient current sources was implemented using linear voltage-to-current converters. The classic converter topology that was used can be seen in Figure 5-3.



Fig. 5-3 Voltage to current converter used for implementing coefficient values.



Fig. 5-4 Differential Cascode Voltage Switch XOR gate connected to current-steering switch.

The drivers for the switches displayed in Figure 5-4 were designed as XOR/XNOR gates in differential cascode voltage switch logic. They also feature tri-state disabling with pull-down capability, so that the drivers can be turned off for general calibration purposes.

Fig. 5-5 Comparator with preamp and trimmer function.

#### 5.2.2.2 Comparator

The clocked comparator is a typical implementation of the established StrongARM latch [Kobayashi, 1996]<sup>11</sup>. A low-gain preamp was added with a trimmer function. The preamp is implemented as a resistively loaded differential pair with an extra transistor pair on the output (Figure 5-5). This circuit can create an imbalance in the common mode, which can effectively cancel mismatches that may exist at the input of the comparator. The reference voltage of the comparator, *vg*, is provided externally, and the gate voltage of the trimming transistor, *Vtrim*, is controlled by an internal DAC to aid in comparator calibration. An SR latch is placed at the output in order to keep the data valid for a full clock period<sup>12</sup>.

#### 5.2.2.3 Comment on first-tap implementation

Compared to typical implementations of mixed-signal DFEs for wired applications, this design is more relaxed in the sense that no first-tap scaling is pursued. This is a usual constraint in wireline design, notably for high-speed PCB traces. A wired channel's impulse response (Figure 5-6) exhibits a tail that follows the cursor from the next unit interval onwards. Therefore, the first loop, which consists of the comparator decision and the first-tap coefficient summation, should settle in less than one clock period. This imposes a stringent bandwidth constraint on the summer.

<sup>11</sup> A concise overview of the StrongARM latch can also be found in [Razavi, 2015].

<sup>12</sup> The clocked PMOS transistors of the comparator pull up the output for half the clock period (*reset* phase).



Fig. 5-6 Typical PCB (left) vs. 60GHz wireless LOS indoor (right) IRs (from Chapter 3).

Our study of 60GHz channel modeling has revealed that the 60GHz channel impulse response realizations do not demand cancellation of components immediately following the cursor. The main source of ISI is channel reflections that arrive at the receiver after multiple unit intervals. For this reason, a delay element has been placed right after the comparator in this prototype design.

## 5.3 Chip design

A die photograph with a layout inset is presented in Figure 5-7. The DFE core includes the analog summer, the tap drivers and switches, and the clocked comparator. The tap delays were implemented with the design presented in the previous chapter, using a common external gate bias and an internal DAC for the body bias of each delay line. Extra DACs were also used to produce the coefficient current values and the trimmer voltage. The master control register is used to program DAC values, set delay line lengths, enable current-steering switches and set coefficient signs.



Fig. 5-7 Chip photograph with DFE layout inset

## 5.4 Measurement results

At the time this version of the manuscript was written, chip assembly had not yet been performed for the DFE part. Extrapolated power performance based on simulations, together with a comparison with the state of the art, is given in Table 5-III below. The targeted metric for this DFE implementation is power consumption, as this topology benefits from its special continuous-time digital delay line design.

| Results                           | This work                | [Thakkar, 2012]   | [Thakkar, 2014]   | [Sobel, 2009]      |
|-----------------------------------|--------------------------|-------------------|-------------------|--------------------|
| Technology                        | 28nm FDSOI               | 65nm CMOS         | 65nm CMOS         | 90nm CMOS          |
| Sample rate                       | 1.76GS/s                 | 5GS/s             | 1.76GS/s          | 500MS/s            |
| Modulation                        | BPSK                     | QPSK              | QPSK              | MSK                |
| Targeted BER                      | $10^{-3}$                | N/A               | $10^{-3}$         | $10^{-3}$          |
| Connected DFE coeffs / IQ channel | 5                        | 20                | 50                | 8                  |
| Floating taps                     | Yes                      | No                | No                | Yes                |
| Max. delay spread                 | 65ns                     | 4ns               | 28ns              | 32ns               |
| Coverage                          | 5 taps within 300ps-13ns | 1 tap every 200ps | 1 tap every 560ps | 8 taps within 32ns |
| Analog summer                     | Resistive                | Resistive         | Current Integ     | Current Buffer+TIA |
| Power/channel (mW)                | 0.7 - 2.7                | 7                 | 7.5               | 14                 |
| Efficiency (pJ/bit)               | 0.42 - 1.8               | 1.4               | 6.9               | 28                 |

Table 5-III DFE extrapolated performance from simulations against state of the art.

## 5.5 Chapter bibliography

[Kobayashi, 1996] Kobayashi, T., K. Nogami, T. Shirotori, and Y. Fujimoto. "A Current-Controlled Latch Sense Amplifier and a Static Power-Saving Input Buffer for Low-Power Architecture." *IEEE Journal of Solid-State Circuits* 28, no. 4 (April 1993): 523–27. doi:10.1109/4.210039.

[Pelgrom, 1989] Pelgrom, M.J.M., A.C.J. Duinmaijer, and A.P.G. Welbers. "Matching Properties of MOS Transistors." *IEEE Journal of Solid-State Circuits* 24, no. 5 (October 1989): 1433–39. doi:10.1109/JSSC.1989.572629.

[Razavi, 2015] Razavi, Behzad. "The StrongARM Latch [A Circuit for All Seasons]." *IEEE Solid-State Circuits Magazine* 7, no. 2 (2015): 12–17. doi:10.1109/MSSC.2015.2418155.

[Sobel, 2009] Sobel, D.A., and R.W. Brodersen. "A 1 Gb/s Mixed-Signal Baseband Analog Front-End for a 60 GHz Wireless Receiver." *IEEE Journal of Solid-State Circuits* 44, no. 4 (April 2009): 1281–89. doi:10.1109/JSSC.2009.2014731.

[Thakkar, 2012] Thakkar, C., Lingkai Kong, Kwangmo Jung, A. Frappe, and E. Alon. "A 10 Gb/s 45 mW Adaptive 60 GHz Baseband in 65 nm CMOS." *IEEE Journal of Solid-State Circuits* 47, no. 4 (April 2012): 952–68. doi:10.1109/JSSC.2012.2184651.

[Thakkar, 2014] Thakkar, C., N. Narevsky, C.D. Hull, and E. Alon. "Design Techniques for a Mixed-Signal I/Q 32-Coefficient Rx-Feedforward Equalizer, 100-Coefficient Decision Feedback Equalizer in an 8 Gb/s 60 GHz 65 nm LP CMOS Receiver." *IEEE Journal of Solid-State Circuits* 49, no. 11 (November 2014): 2588–2607. doi:10.1109/JSSC.2014.2360917.