# THESIS

Presented at the Université de Lille, Sciences et Technologies, École Doctorale Sciences pour l'Ingénieur

To obtain the degree of

## DOCTOR

Discipline: Electronics, microelectronics, nanoelectronics and microwaves

by

## **Răzvan-Cristian MARIN**

Transmetteurs Radiofréquences Numériques Fortement Parallélisés avec Amplificateur de Puissance Commuté et Filtre de Bande Embarqués en Technologie 28nm FD-SOI CMOS

Highly Parallel Digital RF Transmitter with Switch-Mode Power Amplifier and Embedded Band Filter in 28nm FD-SOI CMOS

Publicly defended on 23 November 2017 before the examination committee

Jury members:

| Name | Role |
|------|------|
| Nathalie DELTIMPLE | Reviewer |
| Jussi RYYNÄNEN | Reviewer |
| Henrik SJÖLAND | Examiner |
| Cristophe LOYEZ | Examiner |
| Andreia CATHELIN | Examiner |
| Antoine FRAPPÉ | Co-supervisor |
| Andreas KAISER | Thesis supervisor |



## Abstract

The present PhD work covers the study, design and demonstration of all-digital transmitters targeting advanced communication standards for mobile applications in the context of the Internet of Things (IoT). The key innovations are time-interleaved Delta-Sigma modulators (DSM) and a power- and area-efficient switched-capacitor (SC) finite impulse response power amplifier (FIR-PA).

The proposed transmitter architecture comprises a single-bit 8-channel time-interleaved (TI) DSM, which simplifies the operation of the output stage so that it serves as both power amplifier and FIR filter. The common FIR-PA block uses exclusively inverters and capacitors in a switched-capacitor configuration, making it fully compatible with advanced CMOS technology nodes. Particular attention is paid to the complexity and power consumption of the output stage, by reducing switching redundancy and co-designing the band filter together with an output RLC filter.
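The principle of a 1-bit modulator followed by an FIR filter can be illustrated with a minimal sketch. This is not the architecture of this work: it assumes a first-order modulator and a 16-tap equal-weight (boxcar) filter, both far simpler than the time-interleaved DSM and the 109-tap band filter described here. The names `dsm1_one_bit` and `fir_filter` are hypothetical.

```python
def dsm1_one_bit(samples):
    """First-order delta-sigma modulator with a 1-bit (+1/-1) quantizer.

    Assumption: inputs lie in (-1, 1), which keeps the integrator bounded.
    """
    integrator = 0.0
    previous_output = 0.0
    bits = []
    for x in samples:
        # Accumulate the difference between the input and the fed-back output.
        integrator += x - previous_output
        y = 1.0 if integrator >= 0 else -1.0
        bits.append(y)
        previous_output = y
    return bits


def fir_filter(bits, taps):
    """Direct-form FIR filter, normalized by the sum of the taps."""
    norm = sum(taps)
    out = []
    for n in range(len(bits)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * bits[n - k]
        out.append(acc / norm)
    return out


# Hypothetical test signal: a constant input of 0.5.
dc_input = [0.5] * 64
bits = dsm1_one_bit(dc_input)
# Equal-weight taps stand in for identical unit cells (illustrative only).
smoothed = fir_filter(bits, [1.0] * 16)
```

The 1-bit stream only ever takes the values +1 and -1, yet after filtering the output settles on the input level, which is the property that lets a single-bit output stage combined with a filter represent a multi-level signal.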

The prototype is integrated in 28nm FD-SOI CMOS technology with 10 metal layers and body-bias fine-tuning features. The integrated circuit (IC) is packaged in a custom Ball-Grid Array (BGA) package, including additional passive components. The proposed digital RF transmitter, based on 1-bit delta-sigma modulators and a switched-capacitor power amplifier with an embedded 109-tap FIR band filter, achieves 13.5 in-band effective number of bits (ENOB) and is LTE-compliant in the 900 MHz band.

The overall power consumption is 35 mW at 2.9 dBm peak output power and 1 V supply. LO and image rejection are >55 dBc thanks to FD-SOI body-bias V<sub>t</sub> fine-tuning. With respect to the relevant state of the art, at similar output power levels, the FIR-PA (at 1 V) consumes 7 times less power than a 10-bit DSM-based DAC (at 1.5 V) and 25% less than a 12-bit resistive DAC (at 0.9 V). The total active area is 0.047 mm<sup>2</sup>, at least 4 times smaller than the smallest previously published work.
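To put the headline figures on a linear scale, the short sketch below converts the peak output power from dBm to mW and the ENOB to an equivalent signal-to-noise-and-distortion ratio using the standard relation SNDR = 6.02·ENOB + 1.76 dB. The helper names are hypothetical. The final ratio of peak output power to overall consumption is only an illustrative derived quantity, not a figure reported in this work.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def enob_to_sndr_db(enob):
    """Standard ideal-quantizer relation between ENOB and SNDR in dB."""
    return 6.02 * enob + 1.76


# Values quoted in the abstract.
peak_out_mw = dbm_to_mw(2.9)        # peak output power, 2.9 dBm
sndr_db = enob_to_sndr_db(13.5)     # 13.5 in-band ENOB
total_power_mw = 35.0               # overall power consumption

# Illustrative only: fraction of the overall consumption delivered at peak output.
output_fraction = peak_out_mw / total_power_mw

print(f"peak output power: {peak_out_mw:.2f} mW")
print(f"equivalent in-band SNDR: {sndr_db:.2f} dB")
print(f"peak output / overall consumption: {output_fraction:.1%}")
```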

Consequently, this work stands out for its low power consumption, thanks to the single-bit core solution combined with band filtering, and for its low area, achieved with a multi-layer FIR-PA cell structure. It demonstrates the transition from traditional analog to highly integrated digital-intensive transmitters targeting the future of mobile applications.

**Keywords**: Delta-sigma modulation (DSM), Time-interleaving (TI), Finite impulse response filter (FIR), Switched-capacitor (SC) power amplifier (PA), FIR-PA, 28nm FD-SOI, Body-bias, V<sub>t</sub> fine-tuning, All-digital transmitter.

## Résumé

This thesis covers the study, design and demonstration of all-digital transmitters targeting advanced communication standards for mobile applications in the context of the Internet of Things (IoT). The key innovations are the time-interleaved Delta-Sigma modulator (DSM) and a finite impulse response power amplifier (FIR-PA) based on an efficient switched-capacitor (SC) structure.

The proposed transmitter architecture comprises an 8-channel time-interleaved (TI) DSM, which allows simplified operation of the output stage with a dual function of power amplifier and FIR filter. The FIR-PA block uses only CMOS inverters and capacitors in an SC configuration, which is fully compatible with advanced CMOS technology nodes. Particular attention is paid to the complexity and power consumption of the output stage, by reducing switching redundancy and co-designing the band filter with an output RLC filter.

The prototype is implemented in a 28nm FD-SOI CMOS technology with 10 metal layers and enhanced control of the body voltage. The integrated circuit (IC) is mounted on a BGA-type substrate, with additional passive components. The digital RF transmitter, based on 1-bit delta-sigma modulators and the switched-capacitor power amplifier embedding a band filter with 109 coefficients, achieves an effective number of bits (ENOB) of 13.5 in the useful signal band and is compatible with the 900 MHz LTE standard.

The circuit consumes 35 mW at a peak output power of 2.9 dBm and a 1 V supply. Rejection of the local oscillator (LO) and image components is > 55 dBc, thanks to V<sub>t</sub> fine-tuning through body biasing. Compared with the state of the art, at similar output power levels, the FIR-PA (at 1 V) consumes 7 times less than a 10-bit DAC incorporating delta-sigma modulators (at 1.5 V) and 25% less than a 12-bit resistive DAC (at 0.9 V). The total active area is 0.047 mm<sup>2</sup>, that is, 4 times smaller than the smallest previously published circuit.

Consequently, this work stands out for its low power consumption, thanks to the 1-bit architecture combined with band filtering, and for the reduced area obtained through the efficient integration of the FIR-PA cells. It demonstrates the transition from the traditional analog transmitter to the integrated digital transmitter targeting the future of mobile applications.

**Keywords**: Delta-sigma modulation (DSM), Time-interleaving (TI), Finite impulse response filter (FIR), Switched-capacitor (SC) power amplifier (PA), FIR-PA, 28nm FD-SOI, Body-bias, V<sub>t</sub> fine-tuning, Digital transmitter.

## **List of publications**

**[Marin15]** R.-C. Marin, A. Frappé, A. Kaiser, and A. Cathelin, "Considerations for high-speed configurable-bandwidth time-interleaved digital delta-sigma modulators and synthesis in 28 nm UTBB FD-SOI," in *2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS)*, Grenoble, pp. 1–4, June 2015.

This paper presents the design and simulation of a time-interleaved delta-sigma modulator as part of a digital transmitter chain. The architecture is chosen based on a critical path analysis in order to reach very high frequency operation. The modulator's configurability allows it to target signal bandwidths from 20 MHz up to 160 MHz with an SNR greater than 67 dB. Finally, the modulator is synthesized using standard cells in 28nm FD-SOI CMOS from STMicroelectronics and simulated for different numbers of time-interleaved channels, reaching a sample rate of up to 6 GS/s. An optimum number of channels can be found based on a trade-off between operating frequency, supply voltage, power consumption and area.

**[Marin16]** R.-C. Marin, A. Frappé, A. Kaiser, "Delta-Sigma Based Digital Transmitters with Low-Complexity Embedded-FIR Digital to RF Mixing," *23rd IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, Monte Carlo, pp. 237-240, Dec. 2016.

The focus of this contribution is to review delta-sigma based all-digital transmitters and to discuss issues related to large out-of-band quantization noise and possible coexistence problems. Low-complexity embedded-FIR filters are attractive for relaxing the filtering constraints while keeping systems as digital as possible, in order to benefit from integration in advanced CMOS nodes. In this paper we propose a single-bit digital to RF mixer with embedded-FIR, which provides noise level reduction at specific frequencies in order to target multi-standard coexistence. This architecture introduces simple logic operating at low frequency, which enables a single-bit output and avoids the use of an additional delayed DAC, thus considerably reducing the power consumption and area of the output stage. Finally, we introduce an asymmetric unbalanced FIR architecture to provide a complementary solution for out-of-band noise reduction.

**[Marin17a]** R.-C. Marin, A. Frappé, A. Kaiser, "Considerations for Complex Digital Delta-Sigma Modulators for Standard Coexistence in Digital Wireless Transmitters," **accepted** to *IEEE Trans. Circuits Syst. I, Reg. Papers*, June 2017.

This paper presents a Complex Delta-Sigma Modulator (CDSM) designed for integration in a digital transmitter chain targeting multi-standard coexistence with nearby receivers. A review of known design methods for CDSM revealed limitations regarding the poles/zeros optimization and the configurability of the complex zeros placement. The proposed architecture introduces two additional cross-couplings from the I and Q quantizer outputs in order to decouple the zeros placement from the poles optimization problem. Hence, the improved CDSM can be implemented using existing optimization tools, which considerably reduces the number of iterations and the computational effort. In addition, the resulting modulator can target different coexistence scenarios without the need for redesign, unlike other known methods. Simulation results show a noise level reduction of approximately 20-30 dB near specific frequency bands with the proposed CDSM scheme with respect to a standard DSM. Finally, we show an efficient coarse/fine configurability mechanism, which is obtained by introducing additional delays in the cross-coupling paths.

**[Marin17b]** R.-C. Marin, A. Frappé, B. Stefanelli, P. Cathelin, A. Cathelin, A. Kaiser, "A 28nm FD-SOI CMOS Digital RF Transmitter with Switched-Capacitor Pre-Power Amplifier and Embedded Band Filter," **to be submitted** to *Journal of Solid-State Circuits*.

This paper introduces a 900 MHz LTE-compliant digital RF TX based on 1-bit $\Delta\Sigma M$ and SC pre-PA with embedded 109-tap FIR (FIR-PPA) Band Filter achieving 13.5 in-band ENOB. The FIR-PPA is built with CMOS inverters and Metal-Oxide-Metal capacitors, integrated directly under flip-chip RF pads in 28nm FD-SOI CMOS. The overall power consumption is 35 mW at 2.9 dBm peak output power and 1 V supply. LO and image rejection are >55 dBc thanks to FD-SOI body-bias V<sub>t</sub> tuning. The total active area is 0.047 mm<sup>2</sup>.

## Acknowledgments

It was a real adventure, having alongside great people to whom I wish to extend my deepest gratitude. I will always cherish these moments.

Andreas, your always challenging comments, questions, remarks have added a great value to this work. It was a pleasure working with you on defining and expressing clear concepts and ideas, and learning from one of the best.

Andreia and Philippe, your excellent guidance allowed a perfect combination of theory and practice, academia and industry, to explore the full circuit design flow from idea to prototype and see the "bigger picture". Thank you for the access to high-end technology and the opportunity to discover and work briefly at STMicroelectronics, which were definitive in reaching my goals.

Antoine, thank you for your full support throughout this period and the "always open-door" policy. Thank you for the passionate discussions, brainstorming and idea definition; it was an exciting ride, and you were there all the way.

Nathalie Deltimple, Jussi Ryynänen, Henrik Sjöland and Cristophe Loyez, thank you for accepting the invitation to report and examine my work and I'm looking forward to your future comments, questions, remarks which represent an excellent opportunity for improvement.

I wish to thank the Hauts-de-France Region (formerly Nord-Pas-de-Calais) and STMicroelectronics for funding this PhD work through CNRS. Also, I wish to thank the Research Council of the Catholic University of Lille, for research funding and rewarding me with the 2<sup>nd</sup> prize in the PhD Research Contest 2016.

Bruno Stefanelli, your involvement, availability, attention to details and design experience, were indispensable in the success of this work.

Axel Flament, thank you for taking the time to pass some of your experience, especially for transmission line design.

Jean-Marc Capron, thank you for your involvement in the project, raising questions, and clarifying key-points. In addition, thank you for giving me the opportunity to teach at ISEN, it was a great experience which helped me gain confidence in public speech and develop social skills.

Cristophe Denoyelle, thank you for mastering fine BGA soldering which allowed fast measurement setup and debug, in order to obtain the best possible performance. Your availability and curiosity played a defining role.

Emmanuel Dubois, thank you for the institutional and experimental support at IEMN and ISEN.

Rédha Kassi and Laurent Bigot, thank you for the access to measurement equipment and setup support at IRCICA.

To Didier Campos and the packaging integration team at STMicroelectronics, special thanks for the BGA design, packaging and follow-up on measurements.

Florence Alberti, thank you for your availability, help and prompt solution to any administrative situation.

Nora Benbahlouli, thank you for the administrative support at IEMN.

Anne-Marie, thank you for your support in entering the PhD research contest, for your kindness and for the occasional discussions on everything and anything which usually started in French and finished in Romanian.

Evelyne Litton, thank you for welcoming me wholeheartedly at ISEN 4 years ago and for your warm and friendly attitude. The same goes for the entire SMART department at ISEN. This made the adjustment to France and Lille very smooth, thank you for your hospitality. I would also like to thank my fellow PhD students and postdocs, past and present, for their advice, company and friendship. Hani, thank you for sharing some of your knowledge and experience during my stay at ST, and for a comforting word on a sorrowful day. Reda and Nassim, thank you for interesting discussions on chip integration and for providing a friendly environment at ST. Camillo and Matteo, you guys spent too little time at ISEN and you were/are definitely missed, I wish you all the best. Ilias, you brought a mix of maturity and cheerful attitude to the group, thank you for your advice both professionally and personally. Fikre, thank you for providing a calm, friendly working environment in the office, during TP or at conferences. Now, the senior PhD role is passed to Dipal, always cheerful, you are great, just "keep calm and tape-out" (as Ilias used to say). Greetings to Angel, you are on the right path.

Miruna Niţescu and Florin Constantinescu, thank you for your guidance during Bachelor and Master studies, for the important role you played in the opportunity to study in France, and generally for being there and supporting me both professionally and personally.

Cornel Stănescu, thank you for giving me the opportunity to explore industry analog IC design at a high-level, and inspiring the desire to go further on towards PhD.

Finally, I wish to thank my family, my mom and dad for everything they did for me, love, cherish, sacrifices, the possibility to go and pursue my dreams via the best investment -in education-, my brother, my best friend, always supportive, listening and advising, Typhaine for her love and indefinite support, Valérie and Denis for their warmth, hospitality, trust and consideration.

## **Table of contents**

| Contents |
|----------|
| Abstract |
| Résumé |
| List of publications |
| Acknowledgments |
| Table of contents |
| List of figures |
| List of tables |
| List of acronyms |
| Chapter 1. Introduction |
| 1.1. Motivation |
| 1.2. Research considerations |
| 1.3. Main contributions |
| 1.4. Manuscript organization |
| 1.5. Chapter Bibliography |
| Chapter 2. WLAN Transmitters |
| 2.1. IEEE 802.11 Standard Specifications |
| 2.1.1. Channelization |
| 2.1.2. Transmit Spectral Mask |
| 2.1.3. Transmitter Measurements |
| 2.2. State-of-the-art in wireless transmitter architectures |
| 2.2.1. Analog transmitters |
| 2.2.2. Digital Transmitters |
| 2.2.2.1. Polar architecture |
| 2.2.2.2. Cartesian architecture |
| 2.3. Conclusion |
| 2.4. Chapter Bibliography |
| Chapter 3. Transmitter architecture description |
| 3.1. Introduction |
| 3.2. Digital Delta-Sigma Modulators |
| 3.2.1. Basic concepts |
| 3.2.1.1. Quantization Noise |
| 3.2.1.2. Oversampling |
| 3.2.1.3. 1<sup>st</sup> order noise shaping |
| 3.2.1.4. Higher-order noise shaping |
| 3.2.2. Increased effective operating frequency |
| 3.2.2.1. Time-interleaving methods |
| 3.2.2.2. Critical path study |
| 3.2.3. CIFB architecture |
| 3.2.3.1. CIFB DSM design specifications |
| 3.2.3.2. Architecture coefficients |
| 3.2.4. DSM synthesis |
| 3.3. FIR-PA |
| 3.3.1. FIR Filter |
| 3.3.2. Power amplifier |
| 3.3.2.1. Design specifications |
| 3.3.2.2. Filter optimization using the matching network |
| 3.3.3. Efficiency |
| 3.3.3.1. Single-ended FIR-PA efficiency |
| 3.3.3.2. Differential FIR-PA efficiency |
| 3.3.3.2.1. Optimization of the switching activity |
| 3.3.3.2.2. Half-SC FIR-PA |
| 3.3.3.2.3. Comparison with ideal high-resolution DAC |
| 3.4. FIR-PA non-idealities |
| 3.4.1. Non-ideal switch model |
| 3.4.1.1. Power spectral density estimation |
| 3.4.1.2. Model of the rise/fall time |
| 3.4.1.3. Model of the low-to-high/high-to-low delay |
| 3.4.1.4. Combined model of rise/fall time, LH/HL delay and jitter |
| 3.4.1.5. Results |
| 3.4.2. Non-ideal FIR filter coefficients |
| 3.4.2.1. Estimation |
| 3.4.2.2. Comparison with non-ideal high-resolution DAC |
| 3.4.2.3. Compensation |
| 3.5. Voltage supply variation |
| 3.6. Conclusion |
| 3.7. Chapter Bibliography |
| Chapter 4. All-digital transmitter circuit design |
| 4.1. Transmitter IC description |
| 4.1.1. IC block diagram |
| 4.1.2. IC configuration and physical implementation |
| 4.2. Switched-capacitor FIR-PA design |
| 4.2.1. Configurable FIR-PA |
| 4.2.1.1. Initial estimation of power cell distribution |
| 4.2.1.2. Actual power cell distribution |
| 4.2.2. Unitary power cell |
| 4.2.2.1. CMOS inverter design |
| 4.2.2.1.1. Inverter sizing |
| 4.2.2.1.2. MOM capacitor design |
| 4.2.2.1.3. Power consumption estimation |
| 4.2.2.1.4. Effect of the CMOS inverter on-resistance |
| 4.2.2.2. FIR-PA cells overview |
| 4.2.2.3. Decoupling capacitors |
| 4.2.2.4. Power efficiency estimation |
| 4.2.3. FIR-PA structure |
| 4.2.3.1. Ideal PA performance |
| 4.2.3.2. Layout parasitic extraction |
| 4.3. Digital block design |
| 4.3.1. Input signals generation |
| 4.3.2. FIR-PA SIGGEN block |
| 4.3.3. Digital to RF mixer |
| 4.3.4. Power consumption estimation |
| 4.4. Conclusion |
| 4.5. Chapter Bibliography |
| Chapter 5. Measurements |
| 5.1. IC Packaging |
| 5.2. PCB design |
| 5.2.1. High-speed signal line design |
| 5.2.2. RF differential outputs |
| 5.3. Measurement setup and test cases |
| 5.3.1. Case 1: Functional |
| 5.3.2. Case 2: FIR filter with transmission line adaptation |
| 5.3.3. Case 3: Higher frequency operation |
| 5.4. Experimental IC validation on LTE standard at 900 MHz band |
| 5.5. Conclusion |
| 5.6. Chapter Bibliography |
| Chapter 6. Conclusion |
| 6.1. Research conclusion |
| 6.2. Future directions |
| 6.3. Chapter Bibliography |
| Additional work. Multi-standard coexistence |
| A.1. Complex Delta-Sigma Modulators |
| A.2. Embedded-FIR digital to RF mixing |
| A.3. Chapter Bibliography |

## **List of figures**

### Chapter 1

| Figure | Caption |
|--------|---------|
| Fig. 1 | Battery - smartphone |

### Chapter 2

| Figure | Caption |
|--------|---------|
| Fig. 2.1 | Overview of overlapping channels in 2.4 GHz band [Tektronix13] |
| Fig. 2.2 | Overview of non-overlapping channels in 2.4 GHz band [Tektronix13] |
| Fig. 2.3 | Available non-overlapping channels in 5 GHz band [Cisco13] |
| Fig. 2.5 | OFDM spectral mask for 802.11a/g/n/ac [Tektronix13] |
| Fig. 2.6 | Traditional analog transmitter architecture |
| Fig. 2.7 | Transmitter block diagram with mixer-PA interface [Kumar13] |
| Fig. 2.8 | Block diagram of the dual-band 3-stream MIMO WLAN radio [He14] |
| Fig. 2.9 | Transmitter block diagram [Chen14] |
| Fig. 2.10 | General representation of a data point |
| Fig. 2.11 | Polar representation bandwidth expansion |
| Fig. 2.12 | Digital Polar transmitter block diagram [Zheng13] |
| Fig. 2.13 | Proposed outphasing TX architecture [Ravi12] |
| Fig. 2.14 | Digitally modulated polar TX: Architecture (a); Drain efficiency (b) [Ye13] |
| Fig. 2.15 | Class-G SC PA: Architecture (left); Ideal efficiency (right) [Yoo13] |
| Fig. 2.16 | Spectrum Cartesian representation |
| Fig. 2.17 | TX block diagram (left); QAM spectrum including replicas (right) [Alavi14] |
| Fig. 2.18 | All-digital quadrature transmitter architecture [Wang14] |
| Fig. 2.19 | Digital PWM transmitter system level view [Hezar14] |

### Chapter 3

| Figure | Caption |
|--------|---------|
| Fig. 3.1 | Traditional Analog Transmitter (left); Digital Transmitter (right) |
| Fig. 3.2 | Digital I/Q modulation block diagram [Alavi14] |
| Fig. 3.3 | D-RF architecture based on single-bit DSM [Frappe09] |
| Fig. 3.4 | Digital TX: Image replicas [Alavi14] (left); Out-of-band noise [Frappe09] (right) |
| Fig. 3.5 | FIR-DAC [Gebreyohannes16]: TX block diagram (a); 1-bit N-length FIR DAC (b) |
| Fig. 3.6 | Proposed concept of all-digital transmitter with embedded-FIR PA |
| Fig. 3.7 | Delta-Sigma Modulation concept |
| Fig. 3.8 | Quantizer (left) and its linear model (right) [Kozak03] |
| Fig. 3.9 | 1<sup>st</sup> order DSM architecture |
| Fig. 3.10 | Discrete-time integrator (left) and resonator (right) |
| Fig. 3.11 | 3<sup>rd</sup> order CIFB architecture |
| Fig. 3.12 | MASH 1-1 architecture |
| Fig. 3.13 | Pipelining using 5-bit and 6-bit ripple carry adders (RCA5/RCA6) [Schmidt11] |
| Fig. 3.14 | Equivalent block filtering structure for SISO transfer function $H(z)$ [KP93] |
| Fig. 3.15 | 1<sup>st</sup> order 2-channel TI DSM: poly-phase (left); node equations (right) [Marin15] |
| Fig. 3.16 | 3<sup>rd</sup> order CIFB architecture with 1-bit bandwidth control [Marin15] |
| Fig. 3.17 | NTF of the 3<sup>rd</sup> order CIFB DSM: $g_1 = 0$; $g_1 = 2^{-7}$ [Marin15] |
| Fig. 3.18 | TI DSM estimated maximum operating frequency |
| Fig. 3.19 | 1-bit DSM output spectrum (Amplitude scaled to $P_{out} = 20$ dBm, $BW = 20$ MHz) |
| Fig. 3.20 | Simulated output spectrum: FIR filtered DSM output ($BW = 20$ MHz) |
| Fig. 3.21 | Switching-mode class-D PA: active devices (left); ideal switch (right) |
| Fig. 3.22 | Switching-mode class-D PA with embedded N-length FIR filter |
| Fig. 3.23 | Transfer function 2<sup>nd</sup> order BPF filter: {R, L, C}, {2*R, L, C}, and {R, 2*L, C/2} |
| Fig. 3.24 | Simulated output spectrum: FIR 5-bit and RLC vs. FIR 8-bit ($BW = 20$ MHz) |
| Fig. 3.25 | Digital Cartesian transmitter based on single-ended FIR-PA |
| Fig. 3.26 | 2-bit SC PA: at $t_0$ (left) and $t_1$ (right) |
| Fig. 3.27 | Schematic of an ideal unitary power cell in Cadence |
| Fig. 3.28 | ISI with asymmetric fronts (a); resulting output spectra (b) [Frappe07] |
| Fig. 3.29 | FIR-PA differential architecture |
| Fig. 3.30 | Half-SC FIR-PA differential architecture |
| Fig. 3.31 | Power efficiency characteristics |
| Fig. 3.32 | Output spectrum: comparison Half-SC FIR-PA with ideal DAC architectures |
| Fig. 3.33 | Ideal switch (left); CMOS inverter switch (right) |
| Fig. 3.34 | Input / Output waveforms: Ideal switch (left); CMOS inverter switch (right) |
| Fig. 3.35 | Jitter with Gaussian distribution [Smilkstein07] |
| Fig. 3.36 | Input / Output waveforms of CMOS inverter switch: Jitter effect |
| Fig. 3.37 | Successive capture averaging [Bishop10] |
| Fig. 3.38. Ideal switch (blue) and CMOS inverter (green and red) with variable $t_r$ and $t_f$         | 77 |
| Fig. 3.39. Ideal switch (blue) and CMOS inverter (green and red) with variable $t_{PLH}$ and $t_{PHL}$ | 79 |
| Fig. 3.40. Output waveform: ideal switch and CMOS inverter (combined non-ideal effects)                | 80 |
| Fig. 3.41. Output spectrum: ideal (black); variable $t_r$ and $t_f$ (blue)                             | 81 |
| Fig. 3.42. Output spectrum: ideal (black); variable $t_{PLH}$ and $t_{PHL}$ (blue)                     | 81 |
| Fig. 3.43. Output spectrum: ideal (black); variable jitter <i>t<sub>j,var</sub></i> (blue)             | 82 |
| Fig. 3.44. Output spectrum: ideal (black); non-ideal effects (blue)                                    | 83 |
| Fig. 3.45. Output spectrum: non-ideal coefficients $\sigma_c = 0.05$ (35 random realizations)          | 84 |
| Fig. 3.46. Output spectrum: non-ideal coefficients $\sigma_c = 0.1$ (35 random realizations)           | 85 |
| Fig. 3.47. Half-SC FIR-PA vs. segmented DACs - non-ideal coefficients $\sigma_c = 0.1$                 | 86 |
| Fig. 3.48. Output spectrum: Compensated coefficients $\sigma_c = 0.1$ (35 random realizations)         | 87 |
| Fig. 3.49. Output spectrum: Non-ideal supply voltage $\sigma_{vdd} = 3\%$ (35 random realizations)     | 88 |
| Fig. 3.50. Output spectrum: Non-ideal supply voltage $\sigma_{vdd} = 10\%$ (35 random realizations)    | 89 |
| Fig. 3.51. Output spectrum: Effect of switching current over the non-ideal supply voltage              | 89 |

| Fig. 4.1. Proposed all-digital transmitter implementation                        | 97  |
|----------------------------------------------------------------------------------|-----|
| Fig. 4.2. IC block diagram                                                       | 98  |
| Fig. 4.3. IC layout view                                                         | 99  |
| Fig. 4.4. Pad matrix (left); Color code (right)                                  | 100 |
| Fig. 4.5. FD-SOI UTBB transistor cross-section [Cathelin17]                      | 101 |
| Fig. 4.6. FIR filter coefficients for multi-standard: BW = {20, 40, 80, 160} MHz | 105 |
| Fig. 4.7. Flip-Chip Pad dimensions in 28 nm CMOS FD-SOI                          | 106 |
| Fig. 4.8. CMOS inverter with nMOS back-gate control                              | 107 |

| Fig. 4.9. Layout MOM capacitor: unitary cell (left); Group of 4 capacitors (right)     | 109 |
|----------------------------------------------------------------------------------------|-----|
| Fig. 4.10. Group of 4 power cells: schematic (left); layout (right)                    | 110 |
| Fig. 4.11. Short-circuit currents during transients [Rabaey03]                         | 111 |
| Fig. 4.12. Matching cell: nMOS switch (a); nMOS switch-on (b); nMOS switch-off (c)     | 113 |
| Fig. 4.13. Decap with active devices                                                   | 114 |
| Fig. 4.14. FIR-PA power cells: fixed and coexistence (a); matching network (b)         | 115 |
| Fig. 4.15. Power efficiency characteristics                                            | 117 |
| Fig. 4.16. Layout FIR-PA: unitary cell 3D view (left); FIR-PA under signal PAD (right) | 118 |
| Fig. 4.17. Maximum output signal amplitude: ideal and extracted                        | 120 |
| Fig. 4.18. Overview of output power and drain efficiency performance                   | 122 |
| Fig. 4.19. Signal generator digital circuits block diagram: PA test                    | 124 |
| Fig. 4.20. Signal generator digital circuits block diagram: DSM test                   | 125 |
| Fig. 4.21. Timing diagram: DSM input signals for 8-channel TI DSM test                 | 125 |
| Fig. 4.22. Signal generator digital circuits block diagram: complete TX test           | 126 |
| Fig. 4.23. Timing diagram: DSM input signals for complete TX test                      | 127 |
| Fig. 4.24. FIR-PA SIGGEN block diagram                                                 | 128 |
| Fig. 4.25. 2-step MUX DRFM                                                             | 130 |

| Fig. 5.1. BGA substrate with underfill view TOP: layout (left); assembled (right)                         | 137 |
|-----------------------------------------------------------------------------------------------------------|-----|
| Fig. 5.2. PCB version <i>functional</i> at 2.4 GHz - View TOP                                             | 138 |
| Fig. 5.3. Coplanar Waveguide w/Ground [Hartley17]                                                         | 139 |
| Fig. 5.4. TX-LINE user interface example                                                                  | 140 |
| Fig. 5.5. ADS schematic – Example of adapted transmission line 2 $\Omega$ to 50 $\Omega$ at 2.4 GHz       | 141 |
| Fig. 5.6. PCB versions <i>functional</i> (left) and <i>power</i> (right) - View TOP                       | 141 |
| Fig. 5.7. Measurement setup                                                                               | 142 |
| Fig. 5.8. RLC BPF transfer function: measured vs. theoretical (<i>functional</i>)                         | 143 |
| Fig. 5.9. Power consumption vs. center frequency                                                          | 144 |
| Fig. 5.10. Case 1: Measured spectrum 9 MHz BB single-tone, $f_{CK} = 1.8$ GHz                             | 145 |
| Fig. 5.11. RLC BPF transfer function: measured vs. theoretical (<i>power</i> 2 $\Omega$ to 50 $\Omega$)   | 146 |
| Fig. 5.12. Measured spectrum 9 MHz BB single-tone: RF output vs. DSM input                                | 148 |
| Fig. 5.12. Measured spectrum 9 MHz BB single-tone: RF output vs. DSM input                                | . 148 |

| Fig. 5.13. 6 dB back-off fine-tuning with FD-SOI body-bias                   | 149 |
|------------------------------------------------------------------------------|-----|
| Fig. 5.14. Output power and ACLR for LTE 10 MHz and 20 MHz                   | 150 |
| Fig. 5.15. Measured performance summary and comparison with state-of-the-art | 151 |

| Fig. 6.1. M-TI mixer array: diagram (left); rectangular pulse trains (right) [Koh14] | 158 |
|--------------------------------------------------------------------------------------|-----|
| Fig. 6.2. Amplitude (a) and phase information (b) in a PWPM signal [Walling09]       | 158 |

#### **Additional work**

| Fig. A.1. 4 <sup>th</sup> order CDSM architecture (a); Simulated output spectrum (b) [Marin17a] | 163 |
|-------------------------------------------------------------------------------------------------|-----|
| Fig. A.2. Symmetric embedded-FIR (a); asymmetric unbalanced embedded-FIR (b)                    | 165 |
| Fig. A.3. Output spectrum: Asymmetric unbalanced/Symmetric embedded-FIR ( $k = 5$ )             | 165 |

## List of tables

#### Chapter 2

| Table 2.1. Overview of IEEE 802.11 [Wiki802]                    | 13 |
|-----------------------------------------------------------------|----|
| Table 2.2. Frequency offset values for spectral mask definition | 15 |
| Table 2.3. State-of-the-art in analog and digital transmitters  | 29 |

#### Chapter 3

| Table 3.1. Hardware comparison for L-th order M-channel TI DSM [Kozak03]    | 46 |
|-----------------------------------------------------------------------------|----|
| Table 3.2. Peak SNR 3 <sup>rd</sup> order CIFB DSM [Marin15]                | 50 |
| Table 3.3. FIR filter initial design specifications                         | 54 |
| Table 3.4. BPF signal-band attenuation                                      | 58 |
| Table 3.5. Optimized FIR filter design specifications                       | 59 |
| Table 3.6. Single-ended FIR-PA drain efficiency                             | 66 |
| Table 3.7. Possible values of $V_{out,diff}$ depending on $in_1$ and $in_N$ | 67 |
| Table 3.8. Differential FIR-PA drain efficiency                             | 70 |
| Table 3.9. FIR-PA architectures performance summary                         | 72 |
| Table 3.10. Simulated FIR-PA performance                                    | 72 |

| Table 4.1. Coefficient adjustment for 80 MHz and 160 MHz                                            | 105 |
|-----------------------------------------------------------------------------------------------------|-----|
| Table 4.2. CMOS inverter design results                                                             | 110 |
| Table 4.3. Differential FIR-PA drain efficiency - CMOS inverter and MOM capacitor                   | 117 |
| Table 4.4. Differential FIR-PA drain efficiency – configurability, parasitics extraction            | 122 |
| Table 4.5. Differential FIR-PA drain efficiency - $R_{on}$ , configurability, parasitics extraction | 122 |

| Table 4.6. Digital blocks power consumption breakdown [mW]                          | 132 |
|-------------------------------------------------------------------------------------|-----|
| Table 4.7. Post-layout circuit-level simulations - expected transmitter performance | 132 |

| Table 5.1. Test cases description | 147 |
|-----------------------------------|-----|

# List of acronyms

| A/D  | Analog-to-digital                                      |
|------|--------------------------------------------------------|
| ACW  | Amplitude-control word                                 |
| ADS  | Advanced design system                                 |
| AP   | Access point                                           |
| AWG  | Arbitrary waveform generator                           |
| BB   | Baseband                                               |
| BE   | Back-end                                               |
| BGA  | Ball grid array                                        |
| BPF  | Band-pass filter                                       |
| BS   | Borrow-save                                            |
| BW   | Bandwidth                                              |
| CCK  | Complementary code keying                              |
| CDSM | Complex delta-sigma modulator                          |
| CIFB | Cascade-of-integrators feedback form                   |
| CIFF | Cascade-of-integrators feedforward form                |
| CIM3 | Counter 3 <sup>rd</sup> -order intermodulation product |
| CPWG | Coplanar waveguide with ground                         |
| CRFB | Cascade-of-resonators feedback form                    |
| CRFF | Cascade-of-resonators feedforward form                 |
| D/A  | Digital-to-analog                                      |

| DAC    | Digital to analog converter         |
|--------|-------------------------------------|
| DBFS   | Decibel relative to full-scale      |
| DDRM   | Direct digital to RF modulator      |
| DEM    | Dynamic element matching            |
| DFE    | Digital front-end                   |
| DFS    | Dynamic frequency selection         |
| DPA    | Digital power amplifier             |
| D-RF   | Digital to RF                       |
| DRFC   | Direct-RF Conversion                |
| DRFM   | Digital to RF mixer                 |
| DSM    | Delta-sigma modulator               |
| DSO    | Digital sampling oscilloscope       |
| DSP    | Digital signal processing           |
| DSSS   | Direct-sequence spread spectrum     |
| DUT    | Device under test                   |
| ENOB   | Effective number of bits            |
| EVM    | Error vector magnitude              |
| FAA    | Federal Aviation Administration     |
| FCC    | Federal Communications Commission   |
| FD-IQ  | Frequency dependent IQ              |
| FD-SOI | Fully-depleted silicon-on-insulator |
| FE     | Front-end                           |
| FEC    | Forward error correction            |
| FFT    | Fast Fourier Transform              |
| FHSS   | Frequency-hopping spread spectrum   |
| FIR    | Finite impulse response             |

| FIR-PA  | Finite impulse response power amplifier     |
|---------|---------------------------------------------|
| GPS     | Global Positioning System                   |
| GLONASS | Global Navigation Satellite System          |
| HL      | High-to-low                                 |
| I/Q     | In-/Quadrature-phase                        |
| IC      | Integrated circuit                          |
| IoT     | Internet of Things                          |
| ISI     | Inter-symbol interference                   |
| LH      | Low-to-high                                 |
| LO      | Local oscillator                            |
| LPDSM   | Low-pass delta-sigma modulator              |
| LPF     | Low-pass filter                             |
| LSB     | Least significant bit                       |
| LTE     | Long-Term Evolution                         |
| LVT     | Low Voltage Threshold                       |
| MASH    | Multi-stage-noise-shaping                   |
| MIM     | Metal-insulator-metal                       |
| MIMO    | Multiple input multiple output              |
| MOM     | Metal-oxide-metal                           |
| MSB     | Most significant bit                        |
| MUX     | Multiplexer                                 |
| NTF     | Noise transfer function                     |
| NTIA    | National Telecommunications and Information Administration |
| OFDM    | Orthogonal frequency-division multiplexing  |

| OOB   | Out-of-band                           |
|-------|---------------------------------------|
| OSR   | Oversampling ratio                    |
| PA    | Power amplifier                       |
| PAE   | Power added efficiency                |
| PAPR  | Peak-to-average power ratio           |
| PCB   | Printed circuit board                 |
| PSD   | Power spectral density                |
| PWM   | Pulse-width modulation                |
| PWPM  | Pulse-width pulse-position modulation |
| QAM   | Quadrature amplitude modulation       |
| RC    | Resistance/Capacitance                |
| RCA   | Ripple carry adder                    |
| RF    | Radio frequency                       |
| RLC   | Resistance/Inductance/Capacitance     |
| RMS   | Root mean square                      |
| RX    | Receiver                              |
| S/P   | Serial-to-parallel                    |
| SC    | Switched-capacitor                    |
| SC PA | Switched-capacitor power amplifier    |
| SDR   | Software defined radio                |
| SE    | Single-ended                          |
| SISO  | Single input single output            |
| SMD   | Surface-Mounted Device                |
| SMPA  | Switching-mode power amplifier        |
| SNDR  | Signal to noise and distortion ratio  |
| SNR   | Signal to noise ratio                 |

| SPI  | Serial Peripheral Interface                                      |
|------|------------------------------------------------------------------|
| TDWR | Terminal Doppler Weather Radar                                   |
| TI   | Time-interleaving                                                |
| TL   | Transmission line                                                |
| TX   | Transmitter                                                      |
| UAS  | Unmanned aircraft systems                                        |
| UNII | Unlicensed National Information Infrastructure                   |
| UTBB | Ultra Thin Body & Box                                            |
| VHDL | Very High Speed Integrated Circuit Hardware Description Language |
| WLAN | Wireless local area networks                                     |
| ZOH  | Zero-order hold                                                  |

## **1. Introduction**

The growing demand for ever higher-performance mobile communications opens interesting research perspectives on highly integrated communication systems. However, in order to support this evolution, we need to overcome the associated challenges: high data rates, low power consumption, reduced area, and configurability.

#### **1.1. Motivation**

First of all, we need higher data rates because we would like to access (send or receive) more information, faster. Technology advancements try to answer this demand and follow the trend of increased signal bandwidths and data rates. This is also the case of the IEEE 802.11 communication standard, which has seen an exponential evolution during the past 20 years.

Secondly, if we were able to offer performance while reducing the power consumption, we could propose a mobile device with either increased battery autonomy or a reduced battery size, thus offering flexible solutions to market demand. Fig. 1.1 considers the real case of a personal mobile phone (smartphone) available on the market, after nine months of daily use. The battery statistics show that the consumption due to Internet connections (e-mail, web browsing) and communications (social networks) represents almost one third of the total power consumption of the mobile device.

| Battery usage (33%, not charging; 2d 5h 1m 46s on battery) | Share |
|------------------------------------------------------------|-------|
| Mobile standby                                             | 32%   |
| Wi-Fi                                                      | 32%   |
| Phone idle                                                 | 30%   |
| Screen                                                     | 3%    |
| Android System                                             | 2%    |

Fig. 1.1. Battery - smartphone

Furthermore, a size reduction would allow the integration of more functions on the same device at lower fabrication costs. This drives the research towards advanced technology nodes, which benefit from reduced transistor sizes, and/or improved control (gate and back-gate), such as CMOS 28nm FD-SOI from STMicroelectronics [FDSOI16]. Still, the design of radios becomes more challenging when scaling with technology, thus requiring innovative solutions to optimize on-chip integration.

Finally, we target configurable solutions, in order to comply with, and keep up with, the evolution of multi-standard communications and to connect to existing networks without constraints, e.g. accessing mobile Internet in the mountains. In addition, most traditional analog solutions include circuit blocks specific to a given communication standard, whereas a configurable architecture could enable the reuse of functions within the same circuit and further reduce device area.

Therefore, recent developments in mobile communications have seen a transition from analog to digital processing in transmitter architectures in order to provide configurable high-speed solutions that still take advantage of continuous technology scaling, down to the 5nm process, which is expected to be fully implemented by the end of 2020 [Zafar16].

Hence, this thesis proposes the system design and integrated circuit (IC) implementation of an all-digital transmitter, which is optimized for low power consumption and reduced area in CMOS 28nm FD-SOI, in order to target cellular communications and support emerging Internet of Things (IoT) applications.

#### **1.2. Research considerations**

The present PhD work covers the complete design of an all-digital transmitter from initial design specifications to fabrication and validation, considering the state-of-the-art study, system level, integrated circuit (IC), and printed circuit board (PCB) design.

The design methodology proposed throughout this work is based on the simplification of both the general system and the circuit architectures through innovative engineering design techniques. The general system is centered on single-bit signal processing thanks to Delta-Sigma Modulation, which can be reduced to a basic two-level switching function [Frappe09]. Hence, the operation of the output stage can also be reduced to a simple basic function, which is implemented at circuit level in the form of a switching-mode power amplifier (SMPA) built with a CMOS inverter switching capacitors on and off.

Consequently, I proposed to use existing simple functions, and move the complexity towards the way these functions are implemented to work together in a highly-efficient architecture. For example, the switched-capacitor network in the output stage is also used to obtain a finite impulse response (FIR) filtering function thanks to the constant-level driving signals provided by the Delta-Sigma Modulators.
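The idea that a single-bit stream turns FIR filtering into a sum of fixed weights can be illustrated with a minimal sketch. It is not the modulator or filter of this work: it assumes a first-order low-pass modulator and a 16-tap moving-average filter as placeholders, and the names `first_order_dsm` and `fir_on_bits` are hypothetical.

```python
def first_order_dsm(samples):
    """1-bit first-order delta-sigma modulator: returns a list of +1/-1 values."""
    state = 0.0      # integrator holding the accumulated quantization error
    previous = 0     # last 1-bit output, fed back to the input
    bits = []
    for x in samples:
        state += x - previous
        previous = 1 if state >= 0 else -1
        bits.append(previous)
    return bits


def fir_on_bits(bits, taps):
    """FIR filter on a +1/-1 stream: each tap adds or subtracts its coefficient."""
    out = []
    for n in range(len(taps) - 1, len(bits)):
        acc = 0.0
        for k, c in enumerate(taps):
            # No multiplier is needed: the bit only selects the sign of the tap.
            acc += c if bits[n - k] > 0 else -c
        out.append(acc)
    return out


level = 0.25                           # constant test input, |level| < 1
bits = first_order_dsm([level] * 4000)
taps = [1.0 / 16] * 16                 # placeholder moving-average coefficients
filtered = fir_on_bits(bits, taps)
print(sorted(set(bits)), sum(bits) / len(bits), filtered[-1])
```

Because every filter input is either +1 or -1, each tap reduces to a constant weight that is switched in with one of two polarities, which is the property the switched-capacitor output stage relies on.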

The initial research was focused on Delta-Sigma Modulators (DSM) and the way to operate multiple DSMs in a time-interleaved (TI) scheme, which demonstrated the feasibility of the solution. Furthermore, it was seen that TI DSM design is compatible with automatic synthesis and layout tools (reduced time of design) to achieve low complexity (single-bit) at lower operating frequency per TI channel (reduced design constraints).

Therefore, the study of the possibilities and feasibility of digital design, notably single-bit DSM, played a major role in setting the research directions after the state-of-the-art review, towards all-digital transmitters combining low-complexity digital processing with an efficient power amplifier (PA) stage and FIR filtering.

An extensive analysis was performed for each system block (theoretical concepts, circuit implementation) considering both the block itself and the complete transmitter chain (co-design), which enabled numerous simplifications and improvements (at system, circuit schematic/layout levels) to avoid redundancy and over-design.

Finally, it is noted that this research work was greatly facilitated by the use of programming, whether it was MATLAB for system level design, VHDL and Verilog for digital synthesis, Tcl for digital synthesis and automatic place and route, or SKILL for iterative layouts in Cadence.

#### **1.3. Main contributions**

The main contributions of this work are summarized below:

- Extensive theoretical analysis of a complete all-digital transmitter chain designed for WLAN applications. The proposed design methodology relies on the co-design of the constitutive blocks to fit design specifications, using the advantages of one block to compensate for the disadvantages of others. Hence, single-bit Delta-Sigma modulation is used to enable a simplified switching scheme in the output stage. The **time-interleaving** concept is applied to single-bit Delta-Sigma modulators in order to increase the maximum operating frequency and support WLAN applications [Marin15]. Taking advantage of DSM constant-level switching, the output stage is implemented as a switching-mode class-D PA. Each power cell (switching element and capacitor) is used to create the coefficients of a digital FIR filter, thus obtaining a **digital FIR-PA** which meets out-of-band noise requirements and supports multi-standard coexistence. Finally, the FIR-PA architecture is optimized using extended digital configurability to avoid redundant, switching-dependent power dissipation and reduce consumption [Marin17b].
- On-chip integration of the proposed transmitter. The complete differential FIR-PA stage is integrated under two signal pads, resulting in zero effective additional area. This was made possible by the low complexity of the unitary cell (inverter and capacitor) and by the technology option with 10 metal layers and Flip-Chip pads. Furthermore, I used the body-bias feature of the 28nm FD-SOI technology to reduce switching non-idealities due to the CMOS inverter (simultaneously equalizing the rise and fall times, and the low-to-high and high-to-low delays) and improve output performance.
- Validation of the differential FIR-PA when transmitting Delta-Sigma modulated sinewave and LTE 10 MHz/20 MHz (LTE10/LTE20) signals (6 dB PAPR) at a center frequency ( $f_c$ ) of 900 MHz. The peak output power obtained on a 50 Ω load at 1 V supply voltage in the sinewave case is 2.9 dBm for a total power consumption of 35 mW, out of which the FIR-PA consumes only 10.8 mW (useful output power and dissipated power). The maximum output power and ACLR for LTE10 are -2.8 dBm and -33/-41 dBc, whereas for LTE20 we obtained -3.2 dBm and -33.5/-40 dBc, respectively.
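To put the sinewave figures on a common scale, the short sketch below converts the quoted peak output power from dBm to mW and relates it to the two quoted consumption values. The ratios are simple arithmetic on the numbers above, not efficiency figures taken from the measurement chapter, and the variable names are illustrative.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


p_out_mw = dbm_to_mw(2.9)         # peak output power in the sinewave case
p_fir_pa_mw = 10.8                # FIR-PA consumption quoted above
p_total_mw = 35.0                 # total consumption quoted above

pa_ratio = p_out_mw / p_fir_pa_mw     # output power over FIR-PA consumption
total_ratio = p_out_mw / p_total_mw   # output power over total consumption

print(f"P_out = {p_out_mw:.2f} mW")
print(f"P_out / P_FIR-PA = {100 * pa_ratio:.1f} %")
print(f"P_out / P_total  = {100 * total_ratio:.1f} %")
```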

#### **1.4. Manuscript organization**

This manuscript is organized as follows:

Chapter 2 details the evolution of modern communication standards, such as IEEE 802.11, and sets the system specifications in terms of signal bandwidth, operating frequency, and multi-standard compatibility. An extensive state-of-the-art review of WLAN transmitter architectures is presented next, focusing on the advantages and disadvantages of analog and digital (polar or Cartesian) implementations. The conclusion of this study implies a trade-off between two main digital architectures, one based on multi-bit digital to analog converters (DAC), and the other based on low complexity structures (reduced number of bits) combined with additional filtering.

Chapter 3 analyzes the research directions derived from the state-of-the-art and introduces the proposed transmitter architecture. Each constitutive block is then described to ensure the feasibility of the proposed implementation and identify possible design limitations. Moreover, I introduce innovative solutions at block and full-architecture levels to improve efficiency and avoid operation redundancy and over-design, namely a configurable time-interleaved DSM [Marin15], the co-design of FIR and RLC bandpass filters, and a switched-capacitor FIR-PA. The extensive study of non-idealities due to switching, coefficient implementation, and supply voltage variation completes the system-level analysis and sets the main design specifications for the circuit (schematic and layout) design.

Chapter 4 describes the circuit design of the proposed all-digital transmitter and highlights the innovative implementation of the FIR-PA under the RF signal pads, with **zero** effective additional area cost. Each circuit block is rigorously analyzed and optimized considering area, power consumption, or additional non-ideal effects. The architecture takes advantage of the advanced technology, 28nm CMOS FD-SOI from STMicroelectronics, to implement fast-switching inverters with reduced non-idealities (using the FD-SOI body-bias feature) in the PA, and high-performance digital circuits using standard library cells to generate the digital FIR driving signals based on quadrature time-interleaved DSMs. Finally, I identify possible solutions and directions to enhance the overall performance of the all-digital transmitter in a second IC version.

Chapter 5 presents the measurement setup and results that validate the theoretical analysis of the proposed all-digital transmitter. The IC is placed on a Ball Grid Array (BGA) substrate (custom designed at STMicroelectronics), which connects to a dedicated Printed Circuit Board (PCB). Both BGA and PCB designs are thoroughly described. Finally, the measurement results are presented and compared with relevant state-of-the-art publications [Marin17b].

Chapter 6 concludes this work and offers future directions for transmitters in wireless communication systems.

In the last chapter on Multi-standard coexistence, I describe two innovative schemes based on Complex Delta Sigma Modulators (CDSM) [Marin17a] and digital to RF mixing with embedded-FIR filtering [Marin16], which are seen as possible solutions to support and improve multi-standard coexistence. Such architectures could be integrated (using automatic tools for synthesis and layout) in a second version of the proposed all-digital transmitter to further reduce complexity (due to stringent filtering constraints) and improve system efficiency.

#### **1.5. Chapter Bibliography**

**[FDSOI16]** FD-SOI STMicroelectronics (online), http://www.st.com/web/en/about\_st/fd-soi.html, Oct. 2016.

**[Frappe09]** A. Frappé, A. Flament, B. Stefanelli, A. Kaiser and A. Cathelin, "An All-Digital RF Signal Generator Using High-Speed  $\Delta\Sigma$  Modulators," *IEEE J. Solid-State Circuits*, vol. 44, no. 10, pp. 2722-2732, Oct. 2009.

**[Marin15]** R.-C. Marin, A. Frappé, A. Kaiser, and A. Cathelin, "Considerations for high-speed configurable-bandwidth time-interleaved digital delta-sigma modulators and synthesis in 28 nm UTBB FD-SOI," in *2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS)*, Grenoble, pp. 1–4, June 2015.

**[Marin16]** R.-C. Marin, A. Frappé, A. Kaiser, "Delta-Sigma Based Digital Transmitters with Low-Complexity Embedded-FIR Digital to RF Mixing," *23rd IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, Monte Carlo, pp. 237-240, Dec. 2016.

**[Marin17a]** R.-C. Marin, A. Frappé, A. Kaiser, "Considerations for Complex Digital Delta-Sigma Modulators for Standard Coexistence in Digital Wireless Transmitters," **accepted** to *IEEE Trans. Circuits Syst. I, Reg. Papers,* June 2017.

**[Marin17b]** R.-C. Marin, A. Frappé, B. Stefanelli, P. Cathelin, A. Cathelin, A. Kaiser, "A 28nm FD-SOI CMOS Digital RF Transmitter with Switched-Capacitor Pre-Power Amplifier and Embedded Band Filter," **to be submitted** to *Journal of Solid-State Circuits*.

**[Zafar16]** R. Zafar, "TSMC To Fully Adopt EUV For 5nm By 2020; 10nm To Be Profitable By End Of 2017," WCCFTECH (online), http://wccftech.com/tsmc-5nm/ (October 2016).

## **2. WLAN Transmitters**

The first part of this chapter presents the evolution of the IEEE 802.11 standard and an overview of the standard amendments and channel specifications. This is followed by a review of the state-of-the-art in wireless transmitter (TX) architectures, both traditional analog and digital, focusing on the advantages, disadvantages and main performance figures of each implementation studied. Finally, based on this study, we identify two possible directions for the implementation of a complete transmitter system and we set the main design specifications.

# 2.1. IEEE 802.11 Standard Specifications

This work is motivated by the prospect of highly integrated communication systems that support fast and easy access to information. Since its introduction in 1997, the IEEE 802.11 communication standard has evolved rapidly in order to address the demand for larger signal bandwidths (BW) and higher data rates, through the use of complex modulation schemes and multiple antennas [Wiki802].

The original version of the standard specified a signal with a bandwidth of 22 MHz transmitted over the Industrial, Scientific and Medical (ISM) frequency band at 2.4 GHz. However, the data rate was limited to 2 Mbit/s, due to the use of spread spectrum techniques, such as frequency-hopping spread spectrum (FHSS) and direct-sequence spread spectrum (DSSS).

FHSS is based on a signal spread over rapidly changing frequencies in a predetermined sequence known by both transmitter and receiver (RX). In contrast, the DSSS adds pseudorandom noise to the data, by multiplying the signal with a sequence of "1" and "-1" values at a frequency much higher than that of the original signal.
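The DSSS principle described above can be sketched in a few lines of Python. This is a minimal illustration, not the 802.11 baseband: the 11-chip sequence and the function names `spread` and `despread` are assumptions chosen for the example. Each data bit is mapped to +1 or -1 and multiplied by the chip sequence, and the receiver recovers the bit by correlating with the same sequence.

```python
# Example 11-chip +/-1 spreading sequence (an assumption for illustration).
CHIPS = [1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1]

def spread(bits, chips=CHIPS):
    """Map bits to +/-1 symbols and multiply each symbol by the chip sequence."""
    symbols = [1 if b else -1 for b in bits]
    return [s * c for s in symbols for c in chips]

def despread(signal, chips=CHIPS):
    """Correlate each chip-length block with the known sequence to recover bits."""
    n = len(chips)
    bits = []
    for start in range(0, len(signal), n):
        block = signal[start:start + n]
        corr = sum(x * c for x, c in zip(block, chips))
        bits.append(1 if corr > 0 else 0)
    return bits

data = [1, 0, 1, 1, 0]
tx = spread(data)        # 5 bits become 55 chips
rx = despread(tx)        # correlation restores the original bits
```

Because the decision is taken on a sum over all chips, a few corrupted chips per bit do not change the sign of the correlation, which is the robustness that spreading buys at the cost of bandwidth.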

Later, DSSS was used in 802.11b for an increased data rate up to 11 Mbit/s, thanks to the introduction of complementary code keying (CCK) based on multiple sequences with shorter length. However, the 802.11b devices present interference issues with other products operating in the 2.4 GHz band, such as microwave ovens, cordless phones or Bluetooth devices.

The introduction of 802.11a marked a transition from spread-spectrum transmission to orthogonal frequency-division multiplexing (OFDM), with increased data rates up to 54 Mbit/s. In OFDM, the digital data is encoded on multiple carrier frequencies which are chosen to be orthogonal to each other, thus allowing high spectral efficiency and at the same time eliminating cross-talk between sub-channels. However, the orthogonality is degraded by any frequency deviation between the transmitter and receiver, which causes inter-carrier interference. In addition, 802.11a was targeted to operate in the 5 GHz band, where it takes advantage of the larger number of usable channels and of reduced interference from other devices.
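The orthogonality argument can be checked numerically. The sketch below is an illustration under assumed parameters (64 samples per symbol, subcarrier indices 3 and 5, a hypothetical offset of 0.1 subcarrier spacings); it correlates two complex subcarriers over one symbol, with and without a frequency offset.

```python
import cmath

def subcarrier(k, n_samples, offset=0.0):
    """Complex exponential at subcarrier index k (plus an optional fractional offset)."""
    return [cmath.exp(2j * cmath.pi * (k + offset) * n / n_samples)
            for n in range(n_samples)]

def correlation(a, b):
    """Normalized magnitude of the inner product of two sample sequences."""
    return abs(sum(x * y.conjugate() for x, y in zip(a, b))) / len(a)

N = 64
aligned = correlation(subcarrier(3, N), subcarrier(5, N))              # ideally zero
shifted = correlation(subcarrier(3, N, offset=0.1), subcarrier(5, N))  # leakage
```

With aligned frequencies the correlation vanishes up to rounding error, whereas the offset case leaves a residual of a few percent, which is the inter-carrier interference mentioned above.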

In early 2003, the 802.11g standard was adopted, combining operation in the 2.4 GHz band (802.11b) with OFDM-based transmission schemes (802.11a) for a maximum data rate of 54 Mbit/s exclusive of forward error correction (FEC) codes. The 802.11g devices are fully compatible with 802.11b, although the presence of both device types in the same network significantly reduces the overall speed [Tektronix13].

Finally, the latest amendments 802.11n and 802.11ac combine OFDM with multiple-input, multiple-output (MIMO) technology, which enables the transmission of multiple signals over multiple antennas in order to increase the capacity of the radio link. In particular, 802.11ac supports signal bandwidths of 20 to 160 MHz, using up to 8 spatial streams with a data rate per stream as high as 866 Mbit/s in the 160 MHz case, making it the first Wi-Fi standard to reach the gigabit-per-second range [Std11ac].

Table 2.1 summarizes the performance and targeted applications of IEEE 802.11.

| 802.11<br>Protocol | Release<br>year | Frequency<br>[GHz] | Bandwidth<br>[MHz] | Data rate per<br>stream [Mbit/s] | MIMO<br>streams | Modulation |
|--------------------|-----------------|--------------------|--------------------|----------------------------------|-----------------|------------|
| 802.11             | 1997            | 2.4                | 22                 | 1, 2                             | -               | DSSS, FHSS |
| a                  | 1999            | 5                  | 20                 | 6, 9, 12, 18, 24,<br>36, 48, 54  | -               | OFDM       |
| b                  | 1999            | 2.4                | 22                 | 1, 2, 5.5, 11                    | -               | DSSS       |
| g                  | 2003            | 2.4                | 20                 | 6, 9, 12, 18, 24,<br>36, 48, 54  | -               | OFDM       |
| n                  | 2009            | 2.4/5              | 20                 | up to 288.8                      | 4               | MIMO-OFDM  |
|                    |                 |                    | 40                 | up to 600                        | 4               | MIMO-OFDM  |
| ac                 | 2013            | 5                  | 20                 | up to 346.8                      | 8               | MIMO-OFDM  |
|                    |                 |                    | 40                 | up to 800                        | 8               | MIMO-OFDM  |
|                    |                 |                    | 80                 | up to 1733.2                     | 8               | MIMO-OFDM  |
|                    |                 |                    | 160                | up to 3466.8                     | 8               | MIMO-OFDM  |

 Table 2.1. Overview of IEEE 802.11 [Wiki802]

# **2.1.1.** Channelization

In the 2.4 - 2.5 GHz band there are 14 overlapping channels with center frequencies spaced 5 MHz apart, except for a spacing of 12 MHz between channels 13 and 14 (Fig. 2.1).



Fig. 2.1. Overview of overlapping channels in 2.4 GHz band [Tektronix13]

In order to avoid interference caused by overlapping adjacent channels, it is recommended to leave 3 or 4 channels clear between used channels. This is highlighted in Fig. 2.2 for different modulation schemes (DSSS and OFDM) and different signal bandwidths (20/22/40 MHz).

In the US, there are only three non-overlapping usable channels with a 25 MHz separation (channels 1, 6, 11), whereas in Europe, the separation is 20 MHz [Tektronix13], thus allowing the use of four channels (1, 5, 9, and 13).
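The channel spacings quoted above can be reproduced with a short sketch. The absolute center frequencies used here (channel 1 at 2412 MHz, channel 14 at 2484 MHz) follow the commonly used 2.4 GHz channel plan and are an assumption with respect to this text, which only states the spacings; the function names are illustrative.

```python
def center_frequency_mhz(channel):
    """Center frequency of a 2.4 GHz channel (assumed plan: ch. 1 at 2412 MHz)."""
    if 1 <= channel <= 13:
        return 2407 + 5 * channel      # 5 MHz raster for channels 1 to 13
    if channel == 14:
        return 2484                    # 12 MHz above channel 13
    raise ValueError("2.4 GHz channels are numbered 1 to 14")

def spacings_mhz(channels):
    """Separation between consecutive channels of a given channel plan."""
    freqs = [center_frequency_mhz(c) for c in channels]
    return [b - a for a, b in zip(freqs, freqs[1:])]

us_plan = spacings_mhz([1, 6, 11])         # 25 MHz separation
eu_plan = spacings_mhz([1, 5, 9, 13])      # 20 MHz separation
last_gap = spacings_mhz([13, 14])          # the 12 MHz exception
```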



Fig. 2.2. Overview of non-overlapping channels in 2.4 GHz band [Tektronix13]

On the other hand, the number of non-overlapping channels in the 5 GHz spectrum is larger thanks to the increased bandwidth availability. All channels are spaced 20 MHz apart and grouped into three bands, called Unlicensed National Information Infrastructure (UNII). UNII-1 is allowed for indoor use only, while UNII-2 and UNII-3 can be used both indoors and outdoors (Fig. 2.3).



Fig. 2.3. Available non-overlapping channels in 5 GHz band [Cisco13]

Finally, all the 802.11 devices working either at 2.4 GHz or 5 GHz bands are required to share the available bandwidth, which limits the use of wider bandwidths according to the current channel utilization.

# **2.1.2. Transmit Spectral Mask**

The transmit spectral mask defines the allowed power distribution across the channel and the required signal attenuation outside the channel in order to reduce interferences with transmitters on other channels.

The required spectral mask for the OFDM encoding schemes used in the 802.11a/g/n/ac standards is shown in Fig. 2.4, and the frequency offsets with respect to the center frequency for signal bandwidths of 20 MHz up to 160 MHz are given in Table 2.2.



Fig. 2.4. OFDM spectral mask for 802.11a/g/n/ac [Tektronix13]

Table 2.2. Frequency offset with respect to the center frequency for spectral mask definition

| Signal Bandwidth | A      | B      | C       | D       |
|------------------|--------|--------|---------|---------|
| 20 MHz           | 9 MHz  | 11 MHz | 20 MHz  | 30 MHz  |
| 40 MHz           | 19 MHz | 21 MHz | 40 MHz  | 60 MHz  |
| 80 MHz           | 39 MHz | 41 MHz | 80 MHz  | 120 MHz |
| 160 MHz          | 79 MHz | 81 MHz | 160 MHz | 240 MHz |
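The offsets in Table 2.2 follow a simple pattern in the signal bandwidth BW: A = BW/2 - 1 MHz, B = BW/2 + 1 MHz, C = BW and D = 1.5 BW. The sketch below reproduces the table and interpolates a mask limit between the breakpoints. The relative levels attached to A, B, C and D (0, -20, -28 and -40 dBr) are assumptions taken as typical OFDM mask values; they are not stated in the text and should be checked against Fig. 2.4 and the relevant amendment.

```python
# Assumed relative levels (dBr) at offsets A, B, C, D; verify against Fig. 2.4.
ASSUMED_LEVELS_DBR = (0.0, -20.0, -28.0, -40.0)

def mask_offsets_mhz(bandwidth_mhz):
    """Breakpoints A, B, C, D of Table 2.2 for a given signal bandwidth."""
    half = bandwidth_mhz / 2
    return (half - 1, half + 1, bandwidth_mhz, 1.5 * bandwidth_mhz)

def mask_limit_dbr(offset_mhz, bandwidth_mhz, levels=ASSUMED_LEVELS_DBR):
    """Piecewise-linear mask limit at a given offset from the center frequency."""
    f = abs(offset_mhz)
    points = list(zip(mask_offsets_mhz(bandwidth_mhz), levels))
    if f <= points[0][0]:
        return levels[0]               # flat in-band region
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        if f <= f1:
            return l0 + (l1 - l0) * (f - f0) / (f1 - f0)
    return levels[-1]                  # floor beyond offset D

offsets_20 = mask_offsets_mhz(20)      # (9, 11, 20, 30) MHz
offsets_160 = mask_offsets_mhz(160)    # (79, 81, 160, 240) MHz
```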

# 2.1.3. Transmitter Measurements

Transmitter measurements should be performed using a 100 kHz resolution bandwidth and a 30 kHz video bandwidth. The transmit power limit is stated in the amendment, with typical values between 100 mW (20 dBm) and 1000 mW (30 dBm) depending on regulatory classes and geographical region [Std11n].

Furthermore, the requirements in terms of spectral flatness correspond to the maximum deviation in dB of the average energy of the constellations in each of the subcarriers, typically ±4 dB.

Finally, the transmitter center frequency tolerance is limited to  $\pm 25$  ppm for the 2.4 GHz band and  $\pm 20$  ppm for the 5 GHz band, respectively. The same limits apply to the symbol clock frequency tolerance.
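For orientation, the power limits and frequency tolerances above can be converted into absolute quantities. The helper names below are illustrative; the 2.4 GHz and 5 GHz carrier values are nominal band frequencies used as an assumption for the example.

```python
import math

def mw_to_dbm(power_mw):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(power_mw)

def dbm_to_mw(power_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)

def tolerance_hz(carrier_hz, ppm):
    """Absolute frequency deviation allowed by a tolerance given in ppm."""
    return carrier_hz * ppm * 1e-6

limit_low_dbm = mw_to_dbm(100)              # 20 dBm
limit_high_dbm = mw_to_dbm(1000)            # 30 dBm
dev_2g4_hz = tolerance_hz(2.4e9, 25)        # about 60 kHz
dev_5g_hz = tolerance_hz(5.0e9, 20)         # about 100 kHz
```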

## **2.2.** State-of-the-art in wireless transmitter architectures

In the following section, state-of-the-art transmitter architectures are presented with a focus on the advantages, disadvantages and main performance figures of each implementation studied. The literature distinguishes two main categories of transmitters: analog and digital. The first implementation studied is the standard analog transmitter. Next, the digital transmitter with the Cartesian and polar topologies is presented. Finally, this study is summarized in a table comprising the most important transmitter parameters, which is then used to set the main design specifications.

# 2.2.1. Analog transmitters

The role of a transmitter is to perform modulation, frequency translation and power amplification of a signal, before it is transmitted by the antenna. Fig. 2.5 presents the architecture of a traditional direct-conversion analog transmitter, in which we may identify three main blocks: the digital baseband, which deals with the signal modulation; an analog baseband, comprising the digital-to-analog converter (DAC) and an anti-aliasing low-pass filter (LPF); and the RF front-end, which performs signal up-conversion and amplification.



Fig. 2.5. Traditional analog transmitter architecture

Most recent works are based on a direct-conversion structure, where the signal upconversion is performed in one step directly to RF, meaning the transmitted carrier frequency is equal to the local oscillator (LO) frequency [He14] [Chen14]. The main drawback of such architectures is the fact that the output of the power amplifier (PA) will corrupt the local oscillator spectrum, since they both work at the same center frequency. This phenomenon is called "injection pulling" and may be alleviated by using offset LO generators [Chung12], or by employing calibration algorithms which correct the VCO control voltage in order to counterbalance the pulling effect [Mirzaei14].

The RF core of an 802.11n MIMO WLAN SoC occupying 3.8 mm<sup>2</sup> in 45nm CMOS was presented in [Kumar13]. The transmitter consists of high-speed DACs, Op-Amp/RC-based filters and mixers with a transformer as load, making it possible to achieve the required PA input voltage swing using only passive components (Fig. 2.6).

A standard fractional-N PLL generates the LO, in which a  $\Delta\Sigma$  modulator is embedded in order to limit the phase noise. P<sub>sat</sub> is shown to be 29 dBm when combining two PA outputs, while the PA drain efficiency is approximately 31%.

Furthermore, this implementation reports one of the highest output powers in the case of an OFDM signal at a data rate of 54 Mbit/s (OFDM54), namely  $P_{out,b/g}$  = 22.3 dBm for the *b/g* band and  $P_{out,a}$  = 18.7 dBm for the *a* band. The reported power consumption is 110-122 mW at 1.35 V supply voltage. However, this value does not include the 4 PAs and drivers, and we estimate the power consumption of the full chip to be at least 1 W.



Fig. 2.6. Transmitter block diagram with mixer-PA interface [Kumar13]

A more complete solution integrating all of the functions of an 802.11a/b/g/n/ac WLAN in a 3-stream MIMO SoC was introduced in [He14]. A 5<sup>th</sup>-order Chebyshev LPF with programmable gain and bandwidth is used to support up to 80MHz signal bandwidth (Fig. 2.7). The LO signals for 2.4 GHz and 5 GHz transceivers are generated by an all-digital PLL, having the best reported Figure of Merit with a consumption as low as 12.9 mW.



Fig. 2.7. Block diagram of the dual-band 3-stream MIMO WLAN radio [He14]

The error vector magnitude (EVM) is shown to be lower than -37 dB at  $P_{out} = -5$  dBm (w/o PA) for signal bandwidths of 40 MHz @ 2.4 GHz and 80 MHz @ 5 GHz, respectively. The 40nm chip consumes 1-1.5 W in transmitter (TX) mode (w/o PA), while the analog and RF circuits occupy ~21.5 mm<sup>2</sup> out of the total 46 mm<sup>2</sup> die area. The circuit achieves an over-the-air TCP/IP throughput rate of 1.1 Gb/s, at the cost of high power consumption and a very large die area.

Furthermore, [Chen14] proposes a way to increase the signal BW to 160 MHz for 802.11ac by identifying the frequency-dependent IQ imbalance (FD-IQ) as a design constraint and cancelling it. A pair of 10-bit DACs operating at 960 MS/s followed by a first-order RC filter is used to meet the stringent VHT80 256QAM EVM requirement of -32 dB (Fig. 2.8). This structure achieves the highest output power of 17.5 dBm for 80 MHz channel bandwidth, with output power results for OFDM54 similar to those presented in [Kumar13]. The overall power consumption (~1.6-1.7 W) is larger than in [He14], but comparable since [He14] does not include a PA. The analog and RF circuits occupy 7.7 mm<sup>2</sup> out of a total of 27.8 mm<sup>2</sup> in 55nm CMOS, i.e. about one third of the analog and RF area reported in [He14].



Fig. 2.8. Transmitter block diagram [Chen14]

These references offer important information about the main design parameters of transmitters, such as 17 to 20 dBm output power, EVM of -32 dB and targeted signal bandwidths up to 80 MHz in 802.11ac.

However, we note that the analog transmitters consume more than 1 W, while occupying a large area between 4 and 46 mm<sup>2</sup>. This is partially due to the limited flexibility of the analog solution, which requires separate paths for each standard with specific passive components.

### 2.2.2. Digital Transmitters

The growing demand for higher-performance mobile communication systems drives research in this field, in order to overcome the associated challenges, such as higher data rates, multi-standard configurability, and reduced area and power consumption. By taking advantage of advanced CMOS technologies, transmitter architectures consisting primarily of digital functions have been proposed and proven feasible for high data rates [Wang14].

Mainly, the transmitted signal can be represented in two different forms, Cartesian and polar, which determine the nature of the digital transmitter architecture. On the one hand, the Cartesian coordinates of a data point A (Fig. 2.9) are obtained by orthogonal projection on the axis Ox and Oy and expressed in a complex form

$$IQ_A = I_A + jQ_A \tag{2.1}$$

where  $I_A$  is the in-phase and  $Q_A$  the quadrature component, respectively.

On the other hand, the polar representation defines the same point A by the angle

$$\theta_A = \arctan \frac{Q_A}{I_A} \tag{2.2}$$

and the radius

$$r_{A} = \sqrt{I_{A}^{2} + Q_{A}^{2}} \tag{2.3}$$

Finally

$$IQ_A = r_A \cdot e^{j\theta_A} \tag{2.4}$$
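Equations (2.1) to (2.4) can be exercised with a short sketch. One implementation detail is an assumption beyond the text: the angle is computed with `math.atan2(Q, I)` rather than a plain arctangent of Q/I, so that points in all four quadrants are resolved and the case I = 0 is handled. The function names are illustrative.

```python
import cmath
import math

def to_polar(i, q):
    """Radius (Eq. 2.3) and angle (Eq. 2.2, quadrant-aware) of the point I + jQ."""
    return math.hypot(i, q), math.atan2(q, i)

def to_cartesian(r, theta):
    """Recover I and Q from r * exp(j * theta) (Eq. 2.4)."""
    z = cmath.rect(r, theta)
    return z.real, z.imag

r_a, theta_a = to_polar(3.0, 4.0)          # radius 5, angle about 0.927 rad
i_a, q_a = to_cartesian(r_a, theta_a)      # back to (3, 4) up to rounding

# Second-quadrant point: atan2 returns 3*pi/4, where atan(Q/I) alone gives -pi/4.
_, theta_b = to_polar(-1.0, 1.0)
```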

Furthermore, [Ravi12] uses the outphasing signal representation derived from the polar form, where the signal IQ(t) is obtained as a sum of two signals with the same amplitude, variable angle  $\varphi(t)$  to encode the amplitude information a(t), and common-mode phase  $\theta(t)$  to control the desired output phase.

$$IQ(t) = \max[a(t)] \cdot \cos[\omega t + \theta(t)] \cdot \cos[\varphi(t)] \tag{2.5}$$

$$\varphi(t) = \arccos \frac{a(t)}{\max[a(t)]} \tag{2.6}$$
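The outphasing decomposition can be verified numerically. The sketch assumes that the outphasing angle satisfies cos φ(t) = a(t)/max[a(t)], which is the condition Eq. (2.5) requires for the output amplitude to equal a(t). The two constant-amplitude branches are given the phases θ + φ and θ - φ; the function names are illustrative.

```python
import math

def outphasing_phases(a, a_max, theta):
    """Phases of the two constant-amplitude branches (assumes cos(phi) = a / a_max)."""
    phi = math.acos(a / a_max)
    return theta + phi, theta - phi

def recombine(a_max, phase_1, phase_2, wt):
    """Sum of the two branches, each with constant amplitude a_max / 2."""
    s1 = 0.5 * a_max * math.cos(wt + phase_1)
    s2 = 0.5 * a_max * math.cos(wt + phase_2)
    return s1 + s2

a, a_max, theta = 0.3, 1.0, 0.7
p1, p2 = outphasing_phases(a, a_max, theta)
# At any carrier phase wt, the sum equals the amplitude-modulated signal a*cos(wt + theta).
error = max(abs(recombine(a_max, p1, p2, wt) - a * math.cos(wt + theta))
            for wt in (0.0, 0.5, 2.0, 4.0))
```

Both branches keep the amplitude max[a(t)]/2 whatever the value of a(t), so only their phases carry information.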



Fig. 2.9. General representation of a data point

#### 2.2.2.1. Polar architecture

Recent digital polar architectures have demonstrated output power comparable to traditional analog architectures, around 20 dBm, while consuming 4-5 times less than their analog counterparts, namely around 250-300 mW. However, they are mainly used in applications targeting small-bandwidth signals, 20 to 40 MHz, due to the inherent bandwidth expansion caused by the nonlinear Cartesian-to-polar conversion, as depicted in Fig. 2.10. Here, both the magnitude and phase spectra extend beyond the sharply limited Cartesian signal spectrum, and a strong DC component is visible in the envelope spectrum, because the envelope is always non-negative. For example, in [Walling13] it is shown that in order to meet the IEEE 802.11a standard, the bandwidths of the amplitude and phase must be two and three times larger, respectively, than the bandwidth of the original Cartesian signal.



Fig. 2.10. Polar representation bandwidth expansion

A digital polar transmitter with on-chip power amplifier in 65nm CMOS (Fig. 2.11) is presented in [Zheng13], providing an output power of 20 dBm and a power added efficiency, PAE = 32.3% for WCDMA and 20 MHz WLAN applications. A PLL is used to modulate the output phase of the carrier and a switched-capacitor digital polar modulator is used to switch on/off PA cells according to a 9-bit digital amplitude-control word (ACW).

The circuit occupies an active area of 0.77 mm<sup>2</sup> and consumes 55 mW and 302 mW for an output power of 0 dBm and 20 dBm, respectively. Distortions in polar architectures due to amplitude variations have been reduced in [Zheng13] using an AM-adaptive bias technique for an AM-AM INL error of 3.2% at peak  $P_{out} = 13.7$  dBm. This involved adding a PA replica which senses the RF envelope and regulates the PA bias accordingly.



Fig. 2.11. Digital Polar transmitter block diagram [Zheng13]

However, if the processed signals had a constant envelope, the PA replica would no longer be necessary. This idea is implemented in the outphasing technique, where the transmitted signal is obtained through the summation of two signals with constant amplitude and time-varying phases.

A well-known drawback of modern communication standards (Wi-Fi, WiMAX) is the large peak-to-average power ratio (PAPR), ~13 dB, which means that the signal spends most of the time well below its peak output power. In order to reduce power consumption, the transmitter needs to be efficient not only at peak power but also at back-off. For example, a linear PA has a low efficiency at low output power, since the output voltage varies in time while the DC power remains relatively constant.
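Both effects can be illustrated with a short sketch. It first computes the PAPR of a sum of 16 equal-amplitude, phase-aligned tones (an assumed worst-case test signal, not an 802.11 waveform), and then estimates the efficiency of a PA whose DC power is taken as strictly constant, an idealization of the "relatively constant" behaviour described above. The function names are illustrative.

```python
import cmath
import math

def papr_db(samples):
    """Peak-to-average power ratio of a complex sample sequence, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def multitone(n_tones, n_samples):
    """Sum of n_tones unit-amplitude, phase-aligned complex tones (worst case)."""
    return [sum(cmath.exp(2j * cmath.pi * k * n / n_samples) for k in range(n_tones))
            for n in range(n_samples)]

def efficiency_at_backoff(peak_efficiency, backoff_db):
    """Efficiency when the output power drops but the DC power stays constant."""
    return peak_efficiency * 10 ** (-backoff_db / 10)

papr_16 = papr_db(multitone(16, 64))            # 10*log10(16), about 12 dB
eta_avg = efficiency_at_backoff(0.5, 13.0)      # 50% peak efficiency at 13 dB back-off
```

Under this constant-DC assumption, a PA with 50% peak efficiency operates at roughly 2.5% efficiency at 13 dB back-off, which motivates the back-off efficiency enhancement techniques reviewed in this section.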

Ravi (Intel/2012) demonstrated the use of a delay-based approach in a digital outphasing transmitter for 20-40 MHz channel WLAN [Ravi12]. Here, the information is encoded in the time location of the clock edges which are dynamically delayed using two 8-bit phase modulators (Fig. 2.12).



Fig. 2.12. Proposed outphasing TX architecture [Ravi12]

This architecture can target bandwidths up to 40 MHz, i.e. larger than in other polar architectures [Zheng13]. However, the power consumption excluding the PAs is 2 times higher than in [Zheng13] at similar output power levels.

A way to improve efficiency at back-off power in digital polar transmitters is identified in [Ye13] and relies on efficient impedance modulation. The architecture employs two 8-bit switching current-mode class-D PAs for high-peak efficiency and a dual-section power combiner realized by two transformers with series connected secondaries (Fig. 2.13). At back-off power, the PA efficiency is improved by ~50% (Efficiency ratio in Fig. 2.13 right), when disabling the second PA and shorting the primary of the second transformer.

The 8-bit amplitude and 9-bit phase resolution are chosen for an acceptable out-of-band noise of -125 dBm/Hz at 200 MHz offset, which is 15 dB better than in [Ravi12]. The reported output power and PAE for the WLAN 802.11g 54 Mbps mode are lower than in [Zheng13], namely 16.8 dBm and 21.8%, respectively.



Fig. 2.13. Digitally modulated polar TX: Architecture (a); Drain efficiency (b) [Ye13]

Furthermore, it is shown in [Yoo13] that the average power added efficiency can be increased up to 33% for OFDM signals, at an average output power of 16.8 dBm, when introducing a polar class-G Switched-Capacitor PA topology. This is achieved by applying different voltages to different unit capacitors simultaneously, using enhanced digital coding to control bottom-plate switches (Fig. 2.14). The unitary cell relies on a cascode switch structure to support higher supply voltages ( $2*V_{DD}$ ) and additional output series pMOS-nMOS transistors to reduce gate stress and leakage.



Fig. 2.14. Class-G SC PA: Architecture (left); Ideal efficiency (right) [Yoo13]

#### 2.2.2.2. Cartesian architecture

The main advantage of the Cartesian architecture is that the signal bandwidth remains constant (Fig. 2.15), thus being able to address larger bandwidth signals [Alavi14] [Wang14] than its polar counterpart. Ideally, the I and Q paths are identical, which makes the design simpler, as it can be done for one path and reproduced for the other. However, this redundancy directly impacts power consumption and area. Moreover, gain imbalance and offset of the two paths can stretch and rotate the signal constellation, which may degrade the EVM performances.





First, an all-digital wideband 2x13-bit RF-DAC (Fig. 2.16) is implemented in [Alavi14], which increases the modulation bandwidth to 154 MHz.



Fig. 2.16. TX block diagram (left); QAM spectrum including replicas (right) [Alavi14]

Simple digital pre-distortion is enough to obtain an EVM under -28 dB. A high output power of 22.8 dBm is reported, with the drain and system efficiency of 42% and 34%, respectively, resulting in a power consumption of ~600 mW including PA. In addition, there is no filtering for the out-of-band noise. Thus, the replicas of the ZOH operation can be seen at ±300 MHz from the center frequency  $f_c = 2.4$  GHz, which affects especially the larger bandwidth signals.
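The ~600 mW figure can be cross-checked from the reported output power and efficiencies. The sketch assumes the usual definitions, i.e. efficiency equals RF output power divided by the DC power of the PA (drain efficiency) or of the whole system (system efficiency); the function names are illustrative.

```python
def dbm_to_mw(power_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)

def dc_power_mw(p_out_dbm, efficiency):
    """DC power implied by an output power and an efficiency (P_out / P_dc)."""
    return dbm_to_mw(p_out_dbm) / efficiency

p_out_mw = dbm_to_mw(22.8)              # about 190 mW of RF output power
p_system_mw = dc_power_mw(22.8, 0.34)   # about 560 mW for the whole system
p_drain_mw = dc_power_mw(22.8, 0.42)    # about 454 mW drawn by the PA alone
```

The system-level value of about 560 mW is consistent with the ~600 mW quoted above, and the difference between the two DC figures, roughly 100 mW, is attributable to the circuitry other than the PA output stage.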

Furthermore, a multi-band, multi-mode all-digital quadrature transmitter is presented in [Wang14], addressing the 80 MHz signal bandwidth of 802.11ac with an output power of 15.7 dBm. The proposed circuit comprises a digital front-end (DFE), a 13-bit RF power DAC, an LO chain and an interface between DFE and DAC working at 800 MHz (Fig. 2.17). The PSD level measured in the GPS (Global Positioning System) band around 1575 MHz is reported to be -130 dBm/Hz, when transmitting an 802.11g 54 Mbps signal on Channel 1 at an output power of 16.35 dBm. Therefore, both Cartesian architectures [Alavi14] [Wang14] can target large-bandwidth applications, though different trade-offs were considered for power consumption and EVM.



Fig. 2.17. All-digital quadrature transmitter architecture [Wang14]

On the one hand, [Alavi14] uses 13-bit DACs working at 300 MHz to reduce consumption, but image replicas appear very close to the signal. In addition, the duty cycle of the upsampling clocks is set to 25%, in order to avoid correlation between the I and Q paths when performing orthogonal summation. This allows the use of simple pre-distortion to fulfill EVM requirements, while consuming an additional 5.5 mW for the clock generation [He10]. On the other hand, in [Wang14] the signal images are pushed further away using a sampling frequency of 800 MHz, but the 13-bit DACs consume more due to the higher operating frequency. The clocks are easier to generate, having a duty cycle of 50%, but the interaction between the I and Q paths can degrade the EVM. In order to compensate for IQ mismatch, DC offset, and DPA nonlinearity simultaneously, a sophisticated 2D adaptive algorithm is included in the DFE, which consumes 22 mW in total.
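The effect of the sampling frequency on the image replicas can be estimated with a simple zero-order-hold (ZOH) model. The sketch assumes an ideal ZOH, whose magnitude response is |sin(πf/fs)/(πf/fs)| at an offset f from the carrier, and ignores any other filtering in the actual circuits. The function names are illustrative.

```python
import math

def nearest_image_edge_hz(bandwidth_hz, fs_hz):
    """Offset from the carrier at which the first image replica begins."""
    return fs_hz - bandwidth_hz / 2

def zoh_attenuation_db(offset_hz, fs_hz):
    """Attenuation of an ideal zero-order hold at a given offset (sinc response)."""
    x = offset_hz / fs_hz
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(math.pi * x) / (math.pi * x)))

# 154 MHz signal sampled at 300 MHz: the first image starts 223 MHz from the carrier.
edge_300 = nearest_image_edge_hz(154e6, 300e6)
att_300 = zoh_attenuation_db(edge_300, 300e6)      # only about -10 dB

# 80 MHz signal sampled at 800 MHz: the first image starts 760 MHz away.
edge_800 = nearest_image_edge_hz(80e6, 800e6)
att_800 = zoh_attenuation_db(edge_800, 800e6)
```

Under this model, the nearest image edge in the 300 MHz case is attenuated by only about 10 dB, whereas the 800 MHz case places the image both further from the signal and closer to the sinc null at the sampling frequency.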

In contrast, [Hezar14] proposes a highly efficient 45nm CMOS digital transmitter solution for high output power without using DACs. The baseband I/Q signals are up-sampled and noise-shaped by a low-pass delta-sigma modulator (LPDSM) [Hezar10] with a 4-bit output, which is converted to a time-quantized window via pulse-width modulation (PWM) (Fig. 2.18). This allows an efficient switching of the PAs based on the signal amplitude, with an efficiency of 47% at peak power and 23% at back-off power. Normally, the PWM needs to work at  $4*f_c$ , but due to the high complexity the data is read from a lookup table (LUT) at a much lower speed. Even so, the logic still needs to work at  $2*f_c$  for signal mixing. Furthermore, the lower resolution used here increases the out-of-band noise floor by approximately 6 dB per bit removed, but this problem is not addressed.



Fig. 2.18. Digital PWM transmitter system level view [Hezar14]

# **2.3.** Conclusion

A performance summary of state-of-the-art analog and digital transmitters is presented in Table 2.3. It is seen that digital transmitters consume around 250-300 mW (including digital-to-analog conversion, filtering, signal mixing and power amplification) and occupy less than 1 mm<sup>2</sup> depending on technology node, whereas their analog counterparts can consume 4-5 times more and occupy an area 4-10 times larger.

Digital transmitters are much easier to migrate with technology downscaling and represent a very good solution for systems with high constraints on power consumption and area. Furthermore, the digital Cartesian architecture can be preferred in applications targeting large bandwidth signals, as it can address bandwidths up to 154 MHz [Alavi14], and is not affected by bandwidth expansion inherent to polar architectures.

Moreover, the PA efficiency is identified as an important parameter of digital transmitters, since it directly impacts the overall power consumption. The switched-capacitor class-G PA presented in [Yoo13] proved to be a good solution to improve the average PA efficiency up to 33%, roughly 11 percentage points more than the 21.8% of the next best solution based on switching current-mode PAs [Ye13]. In addition, it is shown in [Hezar14] that the class-G SC PA is also compatible with Cartesian architectures, achieving a peak PAE of 47%.

Two important issues which concern digital transmitters are the image replicas and the out-of-band noise. On the one hand, image replicas appear very close to the signal when the up-sampling frequency is not high enough. For instance, in [Alavi14], for a bandwidth of 154 MHz, image replicas are located at ±300 MHz from the center frequency. These images are pushed further away by using a higher sampling frequency, 800 to 1000 MHz in [Ye13] [Wang14], at the cost of increased overall power consumption. On the other hand, the out-of-band (OOB) noise (at 200 MHz offset and above) injected into a nearby receiver within another band should ideally be below the thermal noise floor, i.e. -174 dBm/Hz at 300 K, in order to allow coexistence of multiple radios.
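The thermal noise floor quoted above follows from the product kT expressed in dBm/Hz. The sketch below is a simple cross-check; the -130 dBm/Hz reference is the value reported in [Wang14] earlier in this chapter, and the function name is illustrative.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23

def thermal_noise_dbm_per_hz(temperature_k):
    """Thermal noise power spectral density kT, expressed in dBm/Hz."""
    power_w_per_hz = BOLTZMANN_J_PER_K * temperature_k
    return 10 * math.log10(power_w_per_hz / 1e-3)

floor_300k = thermal_noise_dbm_per_hz(300.0)   # about -173.8 dBm/Hz
margin_db = -130.0 - floor_300k                # gap between -130 dBm/Hz and the floor
```

A transmitter noise level of -130 dBm/Hz therefore still sits about 44 dB above the thermal floor, which illustrates how demanding the coexistence target is.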

The best result in terms of OOB noise can be found in [Wang14], ~-130 dBm/Hz in the GPS band when transmitting an 802.11g 54 Mbps signal. However, [Hezar14] proposes an architecture which replaces the 13-bit DAC @800 MHz from [Wang14] with a 4-bit DSM and 1-bit PWM. This solution enables an efficient switching scheme for the PA with reduced complexity. Nevertheless, reducing the number of bits translates to higher quantization noise, approximately 6 dB for each bit [Kozak03], hence additional filtering may be necessary in order to improve the out-of-band noise floor.
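The 6 dB per bit rule can be quantified for the two resolutions compared above. The sketch uses the textbook signal-to-quantization-noise ratio of an ideal uniform quantizer driven by a full-scale sine wave; it deliberately ignores oversampling and noise shaping, which redistribute this noise in a DSM. The function name is illustrative.

```python
import math

def ideal_sqnr_db(n_bits):
    """SQNR of an ideal uniform quantizer for a full-scale sine wave."""
    return 20 * math.log10(2) * n_bits + 10 * math.log10(1.5)

sqnr_13 = ideal_sqnr_db(13)         # about 80 dB
sqnr_4 = ideal_sqnr_db(4)           # about 26 dB
penalty_db = sqnr_13 - sqnr_4       # 9 bits fewer, about 54 dB more quantization noise
```

Going from 13 bits to 4 bits thus raises the total quantization noise by about 54 dB in this idealized view, which is why a low-resolution modulator has to be combined with noise shaping and additional filtering.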

As a result, a trade-off can be made between a solution implementing high-speed DACs with large number of bits, or one which finds a way to combine a reduced number of bits with additional filtering and efficient switching for the PA.

|                           | [Kumar13]                   | [He14]                               | [Zheng13]                  | [Ravi12]                    | [Ye13]                             | [Alavi14]                | [Wang14]                                       |
|---------------------------|-----------------------------|--------------------------------------|----------------------------|-----------------------------|------------------------------------|--------------------------|------------------------------------------------|
| Architecture              | Direct                      | Direct                               | Polar                      | Polar                       | Polar                              | I/Q                      | I/Q                                            |
| Process                   | 45nm                        | 40nm                                 | 65nm                       | 32nm                        | 65nm                               | 65nm                     | 40nm                                           |
| Standard                  | 11a/b/g<br>20M              | 11ac<br>80M                          | 11g<br>20M                 | 11g/n<br>20/40M             | 11g<br>20M                         | -<br>154M                | 11 g/n/ac 80M                                  |
| No. bits                  | -                           | -                                    | 9                          | 8                           | 9                                  | 13                       | 13                                             |
| PA                        | yes                         | no                                   | yes                        | yes                         | yes                                | yes                      | yes                                            |
| Peak Pout<br>[dBm]        | 22.3/20.8<br>11b/g          | -5<br>(No PA)<br>11n/ac              | 20.4<br>Single tone        | <b>20(20M)</b><br>12(40M)   | 16.8<br>11g                        | 22.8<br>Single tone      | 18.8 (g)<br>17.1(n 40M)<br><b>15.7(ac 80M)</b> |
| Drain eff.<br>[%]         | 31.3<br>at 33.9 dBm         | -                                    | -                          | -                           | 24.5                               | 42<br>19@6dB<br>back-off | -                                              |
| System eff.<br>[%]        | -                           | -                                    | -                          | 18.6                        | 19.3                               | 34<br>14@6dB<br>back-off | -                                              |
| PAE [%]                   | -                           | -                                    | 32.3                       | 22                          | 21.8                               | -                        | 17 (g)                                         |
| EVM [dB]<br>@Peak Pout    | -25/-28                     | -41<br>(2.4GHz)<br>-37<br>(5GHz)     | -31<br>WCDMA<br>-27.7 WLAN | -25(20M)<br>-28(40M)        | -28                                | -28                      | -25 (g)<br>-30.8(n 40M)<br>-33(ac 80M)         |
| Core supply               | 1.35 V                      | -                                    | 1.2 V                      | 1.05V                       | 1.2 V                              | 1.3V                     | -                                              |
| TX Power<br>Cons.<br>[mW] | 110.7 <sup>a</sup><br>122.8 | 1.08W<br>(2.4GHz)<br>1.52W<br>(5GHz) | 55@0dBm<br>302@20dBm       | 82 <sup>a</sup>             | 27 <sup>a</sup><br>248@16.8<br>dBm | -                        | 22<br>DFE                                      |
| Noise floor<br>[dBm/Hz]   | -                           | -                                    | -                          | -110<br>@200MHz             | -125<br>@200MHz                    | -                        | -130<br>@GPS                                   |
| Area                      | 3.8 mm <sup>2</sup>         | ~11mm <sup>b</sup>                   | 1.1 mm <sup>2</sup>        | 0.9mm <sup>2</sup><br>PM+PA | ~1.5mm <sup>2</sup>                | 0.45mm <sup>2</sup>      | 0.18mm <sup>2</sup><br>DFE                     |

 Table 2.3. State-of-the-art in analog and digital transmitters

<sup>a</sup>: excluding PA & driver

Consequently, we can derive the main architecture specifications which will be further used in our design, namely:

- Cartesian architecture
- Peak  $P_{out} \approx 15-20 \text{ dBm}$
- PAE ≈ 30-32%
- Power consumption: < 250-300 mW
- V<sub>DD</sub> = 1 V
- Noise floor @GPS: ≤ -130 dBm/Hz
- Area: < 1 mm<sup>2</sup>

# **2.4.** Chapter Bibliography

**[Alavi14]** M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, J. R. Long, "A Wideband 2x13-bit All-Digital I/Q RF-DAC," *IEEE Transactions On Microwave Theory And Techniques*, vol. 62, no. 4, pp. 732-752, Apr. 2014.

**[Blank13]** R. Blank, L. E. Strickling, "Evaluation of the 5350-5470 MHz and 5850-5925 MHz Bands Pursuant to Section 6406(B) of the Middle Class Tax Relief and Job Creation Act of 2012," U.S. Department of Commerce, https://www.ntia.doc.gov/files/ntia/publications/ntia\_5\_ghz\_report\_01-25-2013.pdf, 2013 (last accessed 24/05/2016).

**[Chen14]** T.-M. Chen et al., "A 2x2 MIMO 802.11 abgn/ac WLAN SoC with integrated T/R switch and on-chip PA delivering VHT80 256QAM 17.5dBm in 55nm CMOS," *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, pp. 226-228, June 2014.

**[Chung12]** Y.-H. Chung, M. Chen et al., "A 4-in-1 (WiFi/BT/FM/GPS) Connectivity SoC with Enhanced Co-Existence Performance in 65nm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 172-173, Feb. 2012.

**[Cisco13]** Cisco Systems, "High Density Experience (HDX) Deployment Guide," http://www.cisco.com/c/en/us/td/docs/wireless/controller/technotes/8-1/HDX-DG/b\_hdx\_dg\_final.pdf, 2013 (last accessed 22/05/2016).

**[He10]** X. He, J. van Sinderen, R. Rutten, "A 45 nm WCDMA transmitter using direct quadrature voltage modulator with high oversampling digital front-end," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 62–63, 2010.

**[He14]** M. He, R. Winoto, X. Gao et al., "A 40nm Dual-Band 3-Stream 802.11a/b/g/n/ac MIMO WLAN SoC with 1.1Gb/s Over-the-Air Throughput," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 350-352, Feb. 2014.

**[Hezar10]** R. Hezar, L. Risbo et al., "A 110dB SNR and 0.5mW Current-Steering Audio DAC Implemented in 45nm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 304-306, Feb. 2010.

**[Hezar14]** R. Hezar, L. Ding, J. Hur, B. Haroun, "A 23dBm Fully Digital Transmitter using  $\Sigma\Delta$  and Pulse-width Modulation for LTE and WLAN Applications in 45nm CMOS," *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, pp. 217-220, June 2014.

**[Kozak03]** M. Kozak, I. Kale, "Oversampled Delta-Sigma Modulators," Kluwer Academic Publishing, Dordrecht, The Netherlands, 2003.

**[Kumar13]** R. Kumar, T. Krishnaswamy, G. Rajendran et al., "A Fully Integrated 2×2 b/g and 1×2 a-Band MIMO WLAN SoC in 45nm CMOS for Multi-Radio IC," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 328-330, Feb. 2013.

**[Mirzaei14]** A. Mirzaei, M. Mikhemar, H. Darabi, "A Pulling Mitigation Technique for Direct-Conversion Transmitters," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 374-375, Feb. 2014.

**[Ravi12]** A. Ravi, P. Madoglio, H. Xu et al., "A 2.4-GHz 20–40-MHz Channel WLAN Digital Outphasing Transmitter Utilizing a Delay-Based Wideband Phase Modulator in 32-nm CMOS," *IEEE Journal Of Solid-State Circuits*, vol. 47, no. 12, pp. 3184-3196, Dec. 2012.

**[Std11ac]** IEEE Std 802.11ac<sup>™</sup>-2013, Part 11, Amendment 4, Dec. 2013.

**[Std11n]** IEEE Std 802.11n<sup>™</sup>-2009, Part 11, Amendment 5, Oct. 2009.

**[Tektronix13]** Tektronix, "Wi-Fi: Overview of the 802.11 Physical Layer and Transmitter Measurements," http://www.cnrood.com/public/docs/WiFi\_Physical\_Layer\_and\_Transm\_Meas.pdf, 2013 (last accessed 26/05/2016).

**[Walling13]** J. S. Walling, D. J. Allstot, "Design Considerations for Supply Modulated EER Power Amplifiers," *IEEE Wireless and Microwave Technology Conference (WAMICON)*, Orlando, FL, pp. 1-4, Apr. 2013.

**[Wang14]** H. Wang, C.-H. Peng, Y. Chang et al., "A Highly-Efficient Multi-Band Multi-Mode All-Digital Quadrature Transmitter," *IEEE Transactions on Circuits and Systems-I: Regular Papers*, vol. 61, no. 5, pp. 1321-1330, May 2014.

**[Wiki802]** Wikipedia, "IEEE 802.11," https://en.wikipedia.org/wiki/IEEE\_802.11 (last accessed 24/05/2016).

**[Ye13]** L. Ye, J. Chen, L. Kong, E. Alon, A. M. Niknejad, "Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation," *IEEE Journal Of Solid-State Circuits*, vol. 48, no. 12, pp. 3160-3177, Dec. 2013.

**[Yoo11]** S.-M. Yoo, J. S. Walling, E. C. Woo, D. J. Allstot, "A switched-capacitor power amplifier for EER/Polar transmitters," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 428–430, Feb. 2011.

**[Yoo13]** S.-M. Yoo et al., "A Class-G Switched-Capacitor RF Power Amplifier," *IEEE Journal Of Solid-State Circuits*, vol. 48, no. 5, pp. 1212-1224, May 2013.

**[Zheng13]** S. Zheng, H. C. Luong, "A CMOS WCDMA/WLAN Digital Polar Transmitter With AM Replica Feedback Linearization," *IEEE Journal Of Solid-State Circuits*, vol. 48, no. 7, pp. 1701-1709, July 2013.

# 3. Transmitter architecture description

This chapter presents the proposed transmitter architecture, with a focus on the advantages, disadvantages and main performance figures of each constitutive block. Based on the literature study, the digital transmitter architecture has proven to be a very good solution for systems with tight constraints on power consumption and area, such as mobile devices. Furthermore, if the targeted application involves large-bandwidth signals, the digital Cartesian architecture can be preferred to its polar counterpart, since it is not affected by the bandwidth expansion inherent to polar architectures. Finally, a trade-off can be made between a solution implementing high-speed digital-to-analog converters (DAC) with a large number of bits, and one which combines a reduced number of bits with additional filtering and efficient switching for the power amplifier (PA).

# **3.1. Introduction**

The general architectures of a traditional analog transmitter and a digital implementation can be compared in Fig. 3.1. On the one hand, in the case of an analog implementation, the digital input signals are converted to the analog domain where they are low-pass filtered in order to remove image replicas and reduce the out-of-band noise.



Fig. 3.1. Traditional Analog Transmitter (left); Digital Transmitter (right)

Next, the in-phase I and quadrature-phase Q signals are up-converted to RF and fed to the power amplifier (PA) stage. Finally, the RF output is band-pass filtered before reaching the antenna.

On the other hand, the digital architecture is based on a direct up-conversion from digital to RF (D-RF), as in the I/Q RF-DAC [Alavi14], where the mixer and a 2×13-bit digital PA (DPA) are combined in order to obtain the RF output (Fig. 3.2). The digital baseband signals are up-sampled and interpolated at 300 MHz and directly fed to the combined mixer and DPA, which also performs a zero-order hold (ZOH) operation to match the rate of the up-sampled baseband signals to the LO clock.



Fig. 3.2. Digital I/Q modulation block diagram [Alavi14]

An all-digital RF signal generator is introduced in [Frappe09], based on single-bit LPDSM and a simplified digital mixer, which can be used in a digital Cartesian transmitter (Fig. 3.3). Hence, reducing the number of bits in the DSM, down to 1-bit output, allows the use of a simple mixer stage and ensures linearity and improved efficiency of the succeeding switching PA stage, at the cost of an increased operating frequency, that is, nevertheless, compatible with implementations in advanced CMOS nodes.



Fig. 3.3. D-RF architecture based on single-bit DSM [Frappe09]

However, two important issues which concern digital transmitters, namely the image replicas and the out-of-band noise, can affect the coexistence of multiple radios. First, the ZOH operation in [Alavi14] leads to replicas which can be seen at ±300 MHz from the center frequency, $f_c = 2.4$ GHz (Fig. 3.4 left). Secondly, the DSM-based architecture in [Frappe09] is less affected by image replicas, thanks to the higher sampling frequency, $2 f_c$, but it is affected by out-of-band noise, since the main function of the DSM is to push noise away from the band of interest (Fig. 3.4 right).





Hence, additional filtering is needed in order to reduce the out-of-band noise and ensure the coexistence of multiple radios within the same device. Recently, [Gebreyohannes16] has proposed a differential current-steering FIR DAC architecture with symmetric coefficients targeting –60 dBr at the FIR DAC stopband. The digital signal processor is based on a 1-bit DSM working at 3.52 GHz which is used to drive the FIR-DAC current cells based on the chosen FIR coefficients (Fig. 3.5). This architecture has the advantage of shifting the filtering function from the analog to the digital domain, thus obtaining a semi-digital transmitter architecture.

Following the concepts in [Alavi14] [Frappe09] [Gebreyohannes16] considering digital filtering and mixing, we can envision an architecture which pushes the digital domain operation up to the PA stage, thus obtaining a complete all-digital transmitter (Fig. 3.6) which performs exactly the same functions as a traditional analog transmitter.



Fig. 3.5. FIR-DAC [Gebreyohannes16]: TX block diagram (a); 1-bit N-length FIR DAC (b)

Conceptually, this architecture is Cartesian, to support large-bandwidth applications, and is based on DSM, to reduce the number of bits. It adds low-frequency configurability (FIR-PA SIGGEN) and a simple digital-to-RF mixer (DRFM) with time-interleaved operation (sampling the *I*-path on odd periods and the *Q*-path on even periods), which enables the use of a common PA for the *I* and *Q* paths with additional FIR filtering (Fig. 3.6).

With respect to [Alavi14], the proposed implementation uses a 1-bit DSM instead of a 13-bit DAC, is not affected by image replicas, thanks to the high sampling frequency, and provides additional FIR filtering. Moreover, the architecture is all-digital and uses a common I/Q block working at $4 f_c$, instead of two blocks at $2 f_c$ as in [Gebreyohannes16], thus trading off complexity and area against higher-frequency operation. Finally, the digital transmitter builds upon the all-digital RF signal generator based on DSM from [Frappe09] and additionally integrates the PA output stage with embedded filtering (highlighted in Fig. 3.6 with a simplified view of the spectrum at different points in the system), thus providing a complete all-digital solution. Still, the DSM in [Frappe09] targets the UMTS standard at a carrier frequency $f_c = 1$ GHz, so in order to target more demanding communication standards and to reduce the quantization noise in the signal band, the operating frequency of the modulator should be increased. Nevertheless, this also tightens the timing constraints, which are hard to meet in a traditional DSM.

In the following sections of this chapter, each constitutive block of the proposed architecture will be presented in detail, including relevant literature study, associated design constraints, proposed improvements and performances overview.



Fig. 3.6. Proposed concept of all-digital transmitter with embedded-FIR PA

## 3.2. Digital Delta-Sigma Modulators

This section introduces the concept of Delta-Sigma Modulation. Next, time interleaving [KP93] [Kozak03] is introduced as a solution in order to improve the maximum effective operating frequency. These methods are applied to different DSM configurations in order to choose the most efficient architecture for high-speed applications based on a critical path study.

The DSM is designed in MATLAB with the Delta-Sigma Toolbox [Schreier16] for large signal bandwidths (20 to 160 MHz) and a high sampling frequency (~4.8 GHz) to enable WLAN 802.11. The resulting modulator is described in synthesizable VHDL, which eases its integration using automatic design tools (IC or FPGA). Part of this study was published in 2015 [Marin15].

Moreover, two possible improvement schemes, that can be employed to reduce the quantization noise produced by a DSM at specific frequencies, have been proposed [Marin16] [Marin17a]. A detailed description of these methods is provided in the chapter on Multi-standard coexistence.

## **3.2.1. Basic concepts**

The Delta-Sigma Modulation is based on three main concepts: quantization noise, oversampling and noise shaping (Fig. 3.7). First of all, in order to reduce complexity, the output signal can be quantized on a low number of bits which causes quantization noise.

Next, oversampling (for example four times oversampling in Fig. 3.7) is performed in order to reduce the level of the quantization noise, by distributing it over a larger frequency domain.

Finally, the noise shaping is added in order to further reduce the signal-band quantization noise and push it out of band. In Fig. 3.7,  $S_e(f)$  denotes the power spectral density,  $f_b$  the signal bandwidth and  $f_s$  the sampling frequency, respectively.



Fig. 3.7. Delta-Sigma Modulation concept

### 3.2.1.1. Quantization Noise

Quantization is a non-linear process used in analog-to-digital (A/D) and digital-to-analog (D/A) conversions which maps any input amplitude to the closest output level. However, a lower number of output levels causes a larger loss of information about the exact input amplitude, leading to quantization errors.

In order to simplify the analysis, the quantizer can be linearized by using an additive white noise model, independent of the input sequence v[n] (Fig. 3.8). This leads to a number of assumptions on the error process and its statistics [Kozak03]:

- the quantization error sequence *e*[*n*] is uncorrelated with the input sequence *v*[*k*], for all *n*, *k*;
- *e*[*n*] is uniformly distributed over the range $[-\Delta/2, \Delta/2]$, where $\Delta$ is called bin width and represents the difference between two consecutive output levels;
- *e*[*n*] is a white noise with an average power $\sigma_e^2 = \Delta^2/12$.

Finally, it is concluded in [Kozak03], that the signal-to-noise ratio (SNR) increases by approximately 6 dB for each additional bit in the quantizer resolution.
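The 6 dB per bit rule can be checked numerically. The following is a minimal sketch, assuming a full-scale sine test tone and a uniform mid-rise quantizer over $[-1, 1]$; the function name `quantization_snr_db` and the tone frequency are illustrative choices, not taken from [Kozak03]. The measured SNR is compared with the well-known approximation $6.02 N + 1.76$ dB for an $N$-bit quantizer.

```python
import numpy as np

def quantization_snr_db(n_bits, n_samples=1 << 16):
    """Measured SNR of a full-scale sine after uniform n_bits quantization."""
    n = np.arange(n_samples)
    x = np.sin(2 * np.pi * 0.1234567 * n)      # full-scale test tone (assumed)
    delta = 2.0 / (1 << n_bits)                 # bin width over [-1, 1]
    half = 1 << (n_bits - 1)
    idx = np.clip(np.floor(x / delta), -half, half - 1)
    q = (idx + 0.5) * delta                     # mid-rise output levels
    e = q - x                                   # quantization error sequence
    return 10 * np.log10(np.mean(x ** 2) / np.mean(e ** 2))

for bits in (6, 8, 10):
    measured = quantization_snr_db(bits)
    rule = 6.02 * bits + 1.76                   # textbook approximation
    print(f"{bits} bits: measured {measured:.1f} dB, rule of thumb {rule:.1f} dB")
```

The small deviations from the rule of thumb come from the fact that the error of a quantized sine is only approximately uniform and white.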



Fig. 3.8. Quantizer (left) and its linear model (right) [Kozak03]

#### 3.2.1.2. Oversampling

Oversampling is achieved by sampling the input signal at a faster rate than two times the signal bandwidth (Nyquist rate). Hence, the oversampling ratio (OSR) is given in (3.1) as the ratio between the sampling frequency and the Nyquist rate.

$$OSR = \frac{f_s}{2f_b} \tag{3.1}$$

The advantage of this technique is that the total quantization error power of an oversampled signal remains the same as for a Nyquist-rate converter, but is distributed over a larger frequency range, so the noise level within the signal band is reduced (Fig. 3.7 center).

Finally, it is shown in [Kozak03], that for each doubling of OSR, the signal-to-noise ratio (SNR) increases by 3 dB, which is equivalent to an improvement of 0.5 bit of resolution. However, this is limited by the maximum operating frequency, which is mainly dependent on the technology node and the choice of architecture.
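As a short illustration, the gain from oversampling alone (no noise shaping) follows from the white-noise model: only a fraction $1/OSR$ of the total error power falls inside the signal band. The sketch below assumes this model; the helper names are hypothetical.

```python
import math

def oversampling_snr_gain_db(osr):
    """SNR gain when white quantization noise is spread over OSR Nyquist bands."""
    return 10 * math.log10(osr)

def equivalent_bits(gain_db):
    """Resolution gain, using roughly 6.02 dB per bit."""
    return gain_db / 6.02

for osr in (2, 4, 30, 240):
    gain = oversampling_snr_gain_db(osr)
    print(f"OSR = {osr}: +{gain:.2f} dB, about {equivalent_bits(gain):.2f} bit")
```

Each doubling of the OSR adds about 3 dB, that is, half a bit, which is why noise shaping is needed to reach high resolution with a single-bit quantizer at practical sampling frequencies.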

# **3.2.1.3.** 1<sup>st</sup> order noise shaping

The diagram of a 1<sup>st</sup> order DSM incorporating a single-bit quantizer along with a discrete-time integrator in negative feedback is displayed in Fig. 3.9. The output in the z-domain is given by

$$Y(z) = X(z) \cdot z^{-1} + E(z) \cdot (1 - z^{-1}) \tag{3.2}$$

where X(z), Y(z) and E(z) are the z-transforms of the input, output and the quantization error, respectively.

It can be seen that the input signal is filtered by the signal transfer function  $STF(z) = z^{-1}$ , while the quantization error is high-pass filtered by the noise transfer function  $NTF(z) = (1-z^{-1})$ . Thus, the quantization noise is pushed away from the signal-band to the out-of-band in order to improve the SNR.
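Relation (3.2) can be verified sample by sample in the time domain, where it reads $y[n] = x[n-1] + e[n] - e[n-1]$. The sketch below is a behavioral model under stated assumptions (delaying integrator, two-level quantizer with outputs ±1, zero initial state); the function name `dsm1` and the test signal are illustrative.

```python
import numpy as np

def dsm1(x):
    """Behavioral 1st order single-bit DSM with a delaying integrator (sketch)."""
    v = 0.0                                   # integrator state
    y = np.empty_like(x)
    e = np.empty_like(x)
    for n in range(len(x)):
        y[n] = 1.0 if v >= 0 else -1.0        # single-bit quantizer
        e[n] = y[n] - v                       # quantization error
        v = v + x[n] - y[n]                   # integrate input minus fed-back output
    return y, e

n = np.arange(4096)
x = 0.5 * np.sin(2 * np.pi * n / 512)         # slow sine, half of full scale
y, e = dsm1(x)

# Time-domain form of (3.2): delayed input plus first difference of the error
print(np.allclose(y[1:], x[:-1] + e[1:] - e[:-1]))
```

The first difference applied to the error is the high-pass NTF $(1 - z^{-1})$, while the input only sees a one-sample delay.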



Fig. 3.9. 1<sup>st</sup> order DSM architecture

## 3.2.1.4. Higher-order noise shaping

Higher-order DSM can be realized using either the single-stage approach [Schreier16] or multi-stage architectures [Matsuya87].

The **single-stage** architecture is based on a single quantizer in the feedback loop, which means that single-bit quantization can still be performed. However, increasing the loop order can cause instability problems. In [Schreier16] it is shown that the stability issue can be overcome through a custom choice of the modulator's coefficients and a restriction of the maximum amplitude of the input signal. Furthermore, the algorithm used in [Schreier16] is based on Lee's criterion [Chao90], which suggests, as a general rule of thumb, that the peak gain of the noise transfer function (NTF) should be less than 2, i.e. $\max\{|NTF(f)|\} \le 2$, where $f \in [-f_s/2, f_s/2]$. In addition, this architecture is also called error-feedback and can be found in different configurations, such as Cascade-of-integrators feedback form (CIFB) [Frappe09], Cascade-of-integrators feedforward form (CIFF) [Hatami14], Cascade-of-resonators feedback form (CRFB) [Ebrahimi11], Cascade-of-resonators feedforward form (CRFF) [Silva12].

The discrete-time integrator and resonator are depicted in Fig. 3.10. The output in the z-domain of the integrator is

$$V(z) = U(z) \cdot \frac{1}{z - 1} \tag{3.3}$$

whereas for the resonator we obtain

$$V(z) = U(z) \cdot \frac{z}{z - 1} \tag{3.4}$$



Fig. 3.10. Discrete-time integrator (left) and resonator (right)

For example, a 3<sup>rd</sup> order DSM in a CIFB configuration (Fig. 3.11) is built with three integrator stages (shown in Fig. 3.10 left) with the z-domain output

$$Y(z) = STF(z) \cdot X(z) + NTF(z) \cdot E(z) \tag{3.5}$$

$$STF(z) = \frac{1}{den(z)} \tag{3.6}$$

$$NTF(z) = (z-1) \cdot ((z-1)^2 + g_1) \cdot \frac{1}{den(z)} \tag{3.7}$$

$$den(z) = a_1 + a_2 \cdot (z - 1) + a_3 \cdot (z - 1)^2 + (z - 1) \cdot ((z - 1)^2 + g_1) \tag{3.8}$$

where the  $g_1$  coefficient (feedback path) can be used to place optimized zeros (not at DC) for bandwidth configurability (depicted in Section 3.2.3.2), since the distance between the position of optimized zeros and DC increases with  $g_1$ .
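To illustrate how $g_1$ moves the optimized zeros, note that the numerator of (3.7) vanishes at $z = 1$ and at $(z-1)^2 = -g_1$, that is, $z = 1 \pm j\sqrt{g_1}$. The sketch below converts the angle of this zero into a frequency. It assumes a sampling frequency of 4.8 GHz (the target value quoted earlier in this chapter); the function names and the list of $g_1$ values are illustrative.

```python
import cmath
import math

FS = 4.8e9  # sampling frequency in Hz (assumed, from the ~4.8 GHz target)

def ntf_numerator(z, g1):
    """Numerator of (3.7): (z - 1) * ((z - 1)^2 + g1)."""
    return (z - 1) * ((z - 1) ** 2 + g1)

def optimized_zero(g1):
    """Upper zero of the complex-conjugate pair, solving (z - 1)^2 = -g1."""
    return complex(1.0, math.sqrt(g1))

def zero_frequency_hz(g1, fs=FS):
    """Frequency corresponding to the angle of the optimized zero."""
    return fs * cmath.phase(optimized_zero(g1)) / (2 * math.pi)

for exponent in (9, 8, 7, 6):
    g1 = 2.0 ** -exponent
    f_mhz = zero_frequency_hz(g1) / 1e6
    print(f"g1 = 2^-{exponent}: optimized zero near {f_mhz:.1f} MHz")
```

Under these assumptions, $g_1 = 2^{-7}$ places the zero pair at roughly 67 MHz from DC, and a larger $g_1$ moves it further out.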



Fig. 3.11. 3<sup>rd</sup> order CIFB architecture

In the **multi-stage** approach, several low-order (1<sup>st</sup> and 2<sup>nd</sup>) DSMs are connected in cascade to ensure the stability of the complete system, thanks to the inherent stability of 1<sup>st</sup> and 2<sup>nd</sup> order modulators (Fig. 3.12). The input signal is fed to the first stage, while the negative of the quantization error from stage *k* is fed to the input of stage *k*+1, where $k = \{1, \dots, n-1\}$ and *n* is the total number of stages.

Finally, the outputs from each stage are combined together, which results in the cancellation of the quantizer error terms from all stages, except the last one. In addition, this architecture is also called multi-stage-noise-shaping (MASH) and was first presented in [Uchimura88].

Nevertheless, the MASH architecture has a multi-bit output which requires the use of an additional Digital-to-Analog Converter (DAC) and does not allow bandwidth configurability, since all the zeros of the noise shaping function are located at DC. Hence, error feedback DSMs have been preferred and will be studied further.
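The error cancellation and the multi-bit output of a MASH 1-1 modulator can be illustrated with a behavioral sketch. It assumes two identical 1st order stages with delaying integrators and the usual digital recombination $Y = z^{-1} Y_1 + (1 - z^{-1}) Y_2$; these choices and the names `dsm1` and `mash11` are assumptions for illustration, not a description of the implementations cited above.

```python
import numpy as np

def dsm1(x):
    """Behavioral 1st order single-bit DSM with a delaying integrator (sketch)."""
    v = 0.0
    y = np.empty_like(x)
    e = np.empty_like(x)
    for n in range(len(x)):
        y[n] = 1.0 if v >= 0 else -1.0        # single-bit quantizer
        e[n] = y[n] - v                       # quantization error
        v = v + x[n] - y[n]
    return y, e

def delay(u):
    """Unit delay z^-1 with zero initial condition."""
    return np.concatenate(([0.0], u[:-1]))

def mash11(x):
    """Two cascaded 1st order stages with digital error cancellation."""
    y1, e1 = dsm1(x)
    y2, e2 = dsm1(-e1)                        # stage 2 processes the negated error
    y = delay(y1) + (y2 - delay(y2))          # z^-1 * Y1 + (1 - z^-1) * Y2
    return y, e2

n = np.arange(4096)
x = 0.5 * np.sin(2 * np.pi * n / 512)
y, e2 = mash11(x)

# Only the last-stage error remains, shaped by (1 - z^-1)^2
shaped = e2[2:] - 2 * e2[1:-1] + e2[:-2]
print(np.allclose(y[2:], x[:-2] + shaped))
print(np.unique(y))                           # output levels of the combined signal
```

The combined output is no longer restricted to two levels, which is the reason an additional DAC is required after a MASH modulator.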



Fig. 3.12. MASH 1-1 architecture

## **3.2.2. Increased effective operating frequency**

It was stated previously that the SNR performance of a DSM improves with the oversampling ratio, which means that the main design goal is to increase the maximum effective operating frequency of the DSM. This can be achieved using various techniques, such as time-interleaving [Ebrahimi11] [Madoglio10], pipelining [Bhide13a] [Schmidt11], or look-ahead [Bhide15].

Time-interleaving is a simple form of parallelism, where a high data rate input signal ($R_{in}$) is distributed over multiple stages ($n_{TI}$), each working at a slower rate which depends on the number of stages ($R_{TI} = R_{in} / n_{TI}$) [Kozak03]. For example, [Ebrahimi11] presents a TI poly-phase DSM in a 2<sup>nd</sup> order CRFB configuration on 8 channels which can target WiMAX signals of bandwidth BW = 8 MHz with an OSR = 100 and an effective sampling frequency of 800 MHz. The modulator is implemented on FPGA and achieves an SNDR $\approx$ 60 dB measured with a logic analyzer. This method can also be applied to the MASH architecture, as in [Madoglio10], where a 2<sup>nd</sup> order MASH DSM with 8 TI channels, integrated in 45nm CMOS, works at an effective sampling frequency of 2.5 GHz while consuming 6.9 mW.

Furthermore, pipelining distributes arithmetic operations into smaller operations, separated by additional registers, which leads to an increased overall clock rate at the expense of higher latency. This is visible in [Bhide13a], where a MASH pipelined architecture with 2 TI channels achieves a maximum sampling frequency around 8 GHz, for a power consumption which is ten times larger than in [Madoglio10]. Regarding error-feedback architectures, pipelining is generally not applicable to the full architecture due to the coefficients present in the feedback paths. However, it is shown in [Schmidt11] that if the quantizer thresholds are large powers of 2, the least significant bits (LSB) can be pipelined as they are outside the feedback loop, which further leads to a shorter critical path. Thus, the 3-level architecture enables the use of smaller effective word widths in the feedback loop, down to 5 most significant bits (MSB) out of 16 bits in total (Fig. 3.13).



Fig. 3.13. Pipelining using 5-bit and 6-bit ripple carry adders (RCA5/RCA6) [Schmidt11]

Finally, [Bhide15] presents a look-ahead technique in order to decouple the 2 TI channels in a MASH DSM, so that part of the computation is moved out from the integrator feedback loop to before and after the integrator. Using this method, the DSM achieves an effective rate of 11.05 GS/s, which represents a 37 ps improvement in the delay with respect to a 2-bit TI DSM pipeline [Bhide13b].

In the following sub-sections, time-interleaving in Delta-Sigma modulators will be discussed in detail, focusing on a comparison between the methods in [KP93] [Kozak03] and a critical path study to determine the most suitable error-feedback architecture for high-speed applications.

#### 3.2.2.1. Time-interleaving methods

Most of the time-interleaved DSM proposed in literature, MASH or error-feedback [Ebrahimi11] [Madoglio10] [Bhide13a] [Bhide15] [Bhide13b] use the popular block digital filtering method for time-interleaving, introduced over 20 years ago in [KP93].

This method is based on poly-phase components and results in an effective sampling frequency of $M f_s$, where $M$ is the number of cross-coupled DSMs operating at the sampling frequency $f_s$. The equivalent system with the input-output transfer function $Y(z) = H(z) \cdot X(z)$ is presented in Fig. 3.14, where $\overline{H}(z)$ is an $M \times M$ transfer function matrix and $\overline{H}_{ij}$ is the contribution of the j-th input into the i-th output.



Fig. 3.14. Equivalent block filtering structure for SISO transfer function H(z) [KP93]

The general structure of $\overline{H}(z)$ is given in (3.9)

$$\overline{H}(z) = \begin{bmatrix} E_0(z) & E_1(z) & \dots & E_{M-1}(z) \\ z^{-1}E_{M-1}(z) & E_0(z) & \dots & E_{M-2}(z) \\ z^{-1}E_{M-2}(z) & z^{-1}E_{M-1}(z) & \dots & E_{M-3}(z) \\ \vdots & \vdots & \ddots & \vdots \\ z^{-1}E_1(z) & z^{-1}E_2(z) & \dots & E_0(z) \end{bmatrix} \tag{3.9}$$

where the first row elements are the type 1 poly-phase components of H(z)

$$H(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^M) \tag{3.10}$$

Let us assume the simple case of a resonator stage with the transfer function $H(z) = z / (z - 1)$. The poly-phase decomposition of H(z) for 2 time-interleaved channels, M = 2, is

$$H(z) = E_0(z^2) + z^{-1}E_1(z^2) = \frac{1+z^{-1}}{1-z^{-2}} \tag{3.11}$$

hence $E_0(z)$ is equal to $E_1(z)$

$$\begin{cases} E_0(z^2) = \frac{1}{1 - z^{-2}} \\ E_1(z^2) = \frac{1}{1 - z^{-2}} \end{cases} \tag{3.12}$$

and

$$\overline{H}(z) = \begin{bmatrix} E_0(z) & E_1(z) \\ z^{-1}E_1(z) & E_0(z) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ z^{-1} & 1 \end{bmatrix} \cdot \frac{1}{1 - z^{-1}} \tag{3.13}$$
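The 2-channel decomposition in (3.13) can be checked against the full-rate filter $H(z) = 1/(1 - z^{-1})$, which is a running sum. The sketch below is a numerical check under one assumption on the block ordering, which the text does not specify: channel 0 carries the later sample of each input pair and channel 1 the earlier one. With this ordering the rows of (3.13) reproduce the full-rate output exactly. The function names are illustrative.

```python
import numpy as np

def delay(u):
    """Unit delay z^-1 at the channel (slow) rate, zero initial condition."""
    return np.concatenate(([0.0], u[:-1]))

def interleaved_accumulator(x):
    """2-channel block implementation of H(z) = 1 / (1 - z^-1), following (3.13).

    The input length is assumed to be even.
    """
    u0 = x[1::2]                          # channel 0: later sample of each pair (assumed)
    u1 = x[0::2]                          # channel 1: earlier sample of each pair
    acc = np.cumsum                       # E0(z) = E1(z) = 1 / (1 - z^-1) at the slow rate
    w0 = acc(u0) + acc(u1)                # first row of (3.13):  [1, 1]
    w1 = delay(acc(u0)) + acc(u1)         # second row of (3.13): [z^-1, 1]
    y = np.empty_like(x)
    y[1::2] = w0                          # re-interleave the two output channels
    y[0::2] = w1
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y_ref = np.cumsum(x)                      # full-rate reference output
print(np.allclose(interleaved_accumulator(x), y_ref))
```

Each channel runs at half the input rate, at the cost of the cross-connections between channels that appear as the off-diagonal terms of $\overline{H}(z)$.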

Moreover, [Kozak03] has introduced a new method with reduced complexity in terms of hardware requirements which is derived directly from the time-domain behavior of the DSM. This approach includes the following steps [Kozak03]: (a) write the time-domain node equations for *M* consecutive time slots, (b) re-label the nodes within the modulator based on the number of channels *M* and re-write the time-domain equations, (c) combine the resulting equations into a single equation corresponding to one time slot, and (d) derive the architecture corresponding to the time-domain behavior.

The comparison between these 2 methods in terms of hardware requirements for an L-th order M-channel DSM (Table 3.1), shows a highly reduced complexity for the method in [Kozak03], especially when increasing the number of TI channels.

|                       | Block digital filtering [KP93]: general | Block digital filtering [KP93]: 3<sup>rd</sup> order 8-channel | TI reduced complexity [Kozak03]: general | TI reduced complexity [Kozak03]: 3<sup>rd</sup> order 8-channel |
|-----------------------|------------|-----|----------|----|
| No. integrators       | $LM$       | 24  | none     | 0  |
| No. cross-connections | $L(M^2-M)$ | 168 | $(L+1)M$ | 32 |
| No. delay elements    | $M$        | 8   | $L$      | 3  |
| No. two-input adders  | $L(M^2-M)$ | 168 | $LM$     | 24 |

**Table 3.1.** Hardware comparison for L-th order M-channel TI DSM [Kozak03]
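The general expressions of Table 3.1 can be evaluated for any order and number of channels. The following sketch simply encodes the formulas of the table; the function and key names are illustrative.

```python
def block_filtering_cost(order, channels):
    """Hardware count for the block digital filtering method (Table 3.1)."""
    L, M = order, channels
    return {
        "integrators": L * M,
        "cross_connections": L * (M * M - M),
        "delay_elements": M,
        "two_input_adders": L * (M * M - M),
    }

def node_equation_cost(order, channels):
    """Hardware count for the reduced-complexity method (Table 3.1)."""
    L, M = order, channels
    return {
        "integrators": 0,
        "cross_connections": (L + 1) * M,
        "delay_elements": L,
        "two_input_adders": L * M,
    }

for m in (2, 4, 8, 16):
    kp = block_filtering_cost(3, m)["two_input_adders"]
    ne = node_equation_cost(3, m)["two_input_adders"]
    print(f"M = {m}: {kp} adders (block filtering) vs {ne} adders (node equations)")
```

The adder count grows quadratically with the number of channels for the block filtering method and only linearly for the node equations method, which explains the widening gap at large $M$.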

Nevertheless, this method has been used only recently in a 4-channel TI DSM implementation on FPGA, working at a maximum sampling frequency of 400 MHz with a narrow signal bandwidth of 1.25 MHz [Podsiadlik14].

### **3.2.2.2.** Critical path study

In the literature, we may find four main error-feedback architectures, as detailed in [Schreier16], namely CIFB, CIFF, CRFB and CRFF. A critical path study for time-interleaved error-feedback DSM is proposed to determine the DSM design specifications for high-speed operation.

The architectures of a 1<sup>st</sup> order 2-channel DSM using the poly-phase and node equations methods are displayed in Fig. 3.15. Assuming that the addition of two *n*-bit signals introduces a delay proportional to *n*, and that the addition of an *n*-bit signal and a $d_a$-bit signal ($d_a \le n$) introduces a reduced delay, we can estimate the delay factor of the critical path for the poly-phase method, $D_{pf} = 4n + 2d_a$, and for the node equations method, $D_{neq} = 2n + 2d_a$.





We note that the value  $d_a$  depends on the value of the coefficient a, namely if  $a = 2^0$ , the summation will involve only the most significant bit (MSB) and  $d_a = 1$ , whereas in the worst case ( $a = 2^{-n+1}$ ), the summation is performed on all n bits, hence  $d_a = n$ .
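The two delay factors can be compared for a given word length. The sketch below encodes $D_{pf}$ and $D_{neq}$ as given above; the mapping from a coefficient $a = 2^{-k}$ to $d_a = k + 1$ is an assumption that interpolates linearly between the two end points stated in the text ($d_a = 1$ for $a = 2^0$ and $d_a = n$ for $a = 2^{-n+1}$). The function names and the 16-bit word length are illustrative.

```python
def coefficient_adder_width(k, n):
    """d_a for a coefficient a = 2**-k on n-bit signals (assumed linear mapping)."""
    if not 0 <= k <= n - 1:
        raise ValueError("k must lie between 0 and n - 1")
    return k + 1

def delay_polyphase(n, d_a):
    """Critical path delay factor of the poly-phase method, D_pf = 4n + 2*d_a."""
    return 4 * n + 2 * d_a

def delay_node_equations(n, d_a):
    """Critical path delay factor of the node equations method, D_neq = 2n + 2*d_a."""
    return 2 * n + 2 * d_a

n_bits = 16  # illustrative word length
for k in (0, 4, 15):
    d_a = coefficient_adder_width(k, n_bits)
    d_pf = delay_polyphase(n_bits, d_a)
    d_neq = delay_node_equations(n_bits, d_a)
    print(f"a = 2^-{k}: d_a = {d_a}, D_pf = {d_pf}, D_neq = {d_neq}, ratio = {d_pf / d_neq:.2f}")
```

The advantage of the node equations method is largest when the coefficient is a large power of 2 and shrinks as $d_a$ approaches $n$.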

This study is extended in [Marin15] to known architectures of error-feedback DSM and shows that the complexity of the critical path of the CIFB architecture is almost half of the next best one, CIFF, and almost a third of the poly-phase CRFB implementation used in [Ebrahimi11]. Furthermore, increasing the modulator order in the case of the CIFB architecture does not affect the critical path length, while providing improved SNR.

In conclusion, the CIFB architecture, time-interleaved using the node equations method in [Kozak03], is better suited for high-speed applications, since it employs less hardware with relaxed timing constraints.

## **3.2.3. CIFB architecture**

A 3<sup>rd</sup> order CIFB DSM (Fig. 3.11) was designed in MATLAB using the Delta-Sigma Toolbox [Schreier16], which enables the synthesis of the noise transfer functions for single-path and quadrature Delta-Sigma modulators. The available options for single-path DSM are the following: the order of the NTF, the oversampling ratio (OSR), optimized zeros, out-of-band gain of the NTF and the center frequency of the modulator. Thus, we only need to set the requirements of the DSM and perform a few iterations in order to obtain an optimum and stable architecture, which simplifies the design and computational effort. A detailed description of this tool and the associated functions can be found in [Schreier05].

# 3.2.3.1. CIFB DSM design specifications

The specifications of the CIFB DSM have been set with respect to the design parameters from [Schreier16] (order, OSR, zeros optimization, NTF gain, center frequency) and the targeted application.

First of all, the order of the DSM should be i) low, since fewer stages mean reduced implementation complexity, ii) odd, to allow the placement of both a DC zero and optimized zeros, and iii) high enough to meet the SNR requirements of the targeted application. Thus, we set the order to 3, which is the smallest value that meets the aforementioned conditions. Secondly, in order to determine the OSR, we need to set the sampling frequency and the signal bandwidth (3.1).

The modulator is proposed as a part of a digital transmitter chain, targeting the IEEE 802.11 standard, with the signal bandwidth $BW = \{20, 40, 80, 160\}$ MHz at a center frequency $f_c = 2.4$ GHz. Hence, for a sampling frequency of approximately 4.8 GHz, we obtain the corresponding oversampling ratio, OSR = {240, 120, 60, 30}. Furthermore, the optimized zeros (zeros not at DC) option refers to the possibility of adding bandwidth configurability, namely targeting multiple signal bandwidths while meeting the SNR requirements. Next, the out-of-band gain of the NTF is determined according to Lee's criterion [Chao90], namely $\max\{|NTF(f)|\} = 1.6 \le 2$, where $f \in [-f_s/2, f_s/2]$. Finally, the center frequency of the modulator is $f_0 = 0$, since the proposed architecture is based on low-pass DSM.
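The OSR values quoted above can be reproduced from (3.1). The sketch assumes a sampling frequency of 4.8 GHz, i.e. $2 f_c$, and a baseband bandwidth $f_b = BW/2$ for each path; both are assumptions chosen because they match the quoted values, and the function name is illustrative.

```python
FS_MHZ = 4800.0  # assumed sampling frequency, 2 * f_c with f_c = 2.4 GHz

def oversampling_ratio(bw_mhz, fs_mhz=FS_MHZ):
    """OSR = f_s / (2 * f_b), with f_b = BW / 2 assumed for the baseband signal."""
    f_b = bw_mhz / 2.0
    return fs_mhz / (2.0 * f_b)

for bw in (20, 40, 80, 160):
    print(f"BW = {bw} MHz: OSR = {oversampling_ratio(bw):.0f}")
```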

#### 3.2.3.2. Architecture coefficients

The coefficients  $\{a_1, a_2, a_3, b_1, g_1\}$  of the DSM (as shown in Fig. 3.11) are obtained based on the design specifications using the Delta-Sigma Toolbox functions *synthesizeNTF* and *realizeNTF*.

The implementation of the coefficients influences the delay factor of the critical path, which in the end determines the maximum sampling frequency of the modulator. In order to simplify the operations, all the coefficients have been quantized to negative powers of 2 or to sums of negative powers of 2, thus avoiding the use of dedicated multipliers.
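The idea of multiplier-free coefficients can be sketched as follows. The greedy routine below is only one possible way to approximate a coefficient by a sum of negative powers of 2 (it always truncates, so it is not necessarily the closest approximation) and is not the procedure used for the actual design; the function names, the limit of two terms and the minimum exponent are hypothetical. The second function shows how such a coefficient is applied with shifts and additions only.

```python
import math

def quantize_pow2(value, max_terms=2, min_exp=-12):
    """Greedy approximation of 0 < value < 1 by a sum of negative powers of 2.

    Returns the list of exponents k such that value is close to sum(2**k).
    """
    if not 0.0 < value < 1.0:
        raise ValueError("value must lie strictly between 0 and 1")
    exponents = []
    residual = value
    for _ in range(max_terms):
        if residual < 2.0 ** min_exp:
            break                                   # remaining error is negligible
        k = math.floor(math.log2(residual))         # largest power of 2 not above residual
        exponents.append(k)
        residual -= 2.0 ** k
    return exponents

def shift_add_multiply(x, exponents):
    """Multiply the integer x by sum(2**k), k < 0, using right shifts and additions."""
    return sum(x >> -k for k in exponents)

print(quantize_pow2(0.75))                 # exponents for 2^-1 + 2^-2
print(quantize_pow2(2.0 ** -7, 1))         # a single power of 2
print(shift_add_multiply(64, [-1, -2]))    # 64 * 0.75 using shifts only
```

Right shifts truncate the discarded bits, so in a fixed-point datapath the word length has to be chosen such that this truncation stays below the targeted noise level.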

Furthermore, we observe that the bandwidth configurability is determined only by the  $g_1$  coefficient, whereas the other coefficients are almost equal for different OSR values. In addition, [Marin15] introduces simple configurability using only a 1-bit control signal (*ctrl*) to activate or deactivate the  $g_1$  coefficient, thus obtaining close to ideal SNR for the targeted signal bandwidths (Fig. 3.16).





Hence, the optimized zero placement ( $g_1 = 2^{-7}$ ) is suited for signal bandwidths larger than 80 MHz, as shown in the NTF plot from Fig. 3.17.



Fig. 3.17. NTF of the  $3^{rd}$  order CIFB DSM:  $g_1 = 0$ ;  $g_1 = 2^{-7}$  [Marin15]

Finally, the values of the peak SNR for the targeted signal bandwidths can be obtained using the function *simulateSNR* from the Delta-Sigma Toolbox. The results are displayed in Table 3.2 for the *ideal* case (ideal coefficients) and *optimized* case (quantized coefficients with  $g_1$  de-/activation as shown in Fig. 3.16). In addition, the *simulateSNR* function can be used to determine the stability of a DSM, namely the maximum amplitude of the input signal in dBFS (decibel relative to a full-scale sinewave), which in this case was calculated to be approximately -2.8 dBFS.

| BW [MHz] | OSR | Peak SNR [dB] (ideal) | Peak SNR [dB] (optimized) |
|----------|-----|-----------------------|---------------------------|
| 20       | 240 | 134.1                 | 127.4 <sup>a</sup>        |
| 40       | 120 | 112.3                 | 104 <sup>a</sup>          |
| 80       | 60  | 91.4                  | 82.4 <sup>a</sup>         |
| 160      | 30  | 69.2                  | 67.2 <sup>b</sup>         |

Table 3.2. Peak SNR of the 3<sup>rd</sup> order CIFB DSM [Marin15]

<sup>a.</sup> $g_1$ deactivated

<sup>b.</sup> $g_1$ activated

#### **3.2.4. DSM synthesis**

The final step of the design flow is the physical implementation of the proposed modulator, targeting either integrated circuit (IC) or FPGA applications. Hence, the architecture has been described in VHDL based on the signal precision and the amplitude ratios between different stages (to avoid saturation), both obtained through simulation. In order to ensure good SNR performance, the input signal *x* is coded on 12 bits (11 data bits, 1 sign bit), which corresponds to 16-bit accuracy considering the coefficient $b_1 \approx 2^{-4}$. Furthermore, there is a direct relationship between the NTF order and the signal precision: if the NTF order is increased, the ratio between the output of the last stage and the input signal also increases, so a larger number of bits is needed for signal coding.

The proposed DSM was synthesized in 28nm FD-SOI CMOS from STMicroelectronics using standard-cell libraries for a set of three supply voltages, {0.8, 0.9, 1} V. The results in terms of critical path slack are used to estimate the maximum sampling frequency of the modulator with respect to the supply voltage, reaching up to 6 GHz for 5 TI channels operating at 1 V (Fig. 3.18).

Hence, we note that time-interleaving can be used, either to increase the maximum sampling frequency for a given supply voltage, or to lower the supply voltage for a given sampling frequency, whereas the power consumption/GHz remains almost constant.



Fig. 3.18. TI DSM estimated maximum operating frequency

In conclusion, an optimum number of time-interleaving channels (between five and ten) can be found based on a trade-off between operating frequency, supply voltage, power consumption and area.

## **3.3. FIR-PA**

This section describes the proposed FIR-PA implementation for high-speed all-digital transmitters targeting multi-standard coexistence. An FIR filter is designed in MATLAB in order to meet the out-of-band noise requirements when transmitting at the center frequency $f_c \approx 2.4$ GHz. Next, we perform a theoretical study of a switched-capacitor (SC) PA with embedded FIR filter in order to evaluate its performance in terms of power efficiency and area. This performance is further improved by introducing different optimization steps that reduce the switching activity and the complexity of the output stage.

## 3.3.1. FIR Filter

The proposed FIR filter has been designed in MATLAB in order to attenuate the quantization noise produced by the 1-bit DSM (described in the previous Section) and meet the out-of-band noise specifications (OOB mask) depicted in Fig. 3.19. The values are given at the PA output, assuming a noise reduction of around 40 dB thanks to antenna coupling losses and a coexistence band-pass filter (BPF). This is a generally valid assumption for chipsets coexisting in a single device and is implemented here as a brick-wall filter.

We remark that the most stringent noise specifications can be found near-band, between 2.3 – 2.7 GHz at 3GPP band, and out-of-band between 1.57 – 1.61 GHz at GPS and GLONASS (Global Navigation Satellite System) bands.

Based on these requirements, we propose the design of a digital FIR filter, taking into consideration that the near-band noise will determine the number of filter coefficients, and the out-of-band the resolution of the coefficients, respectively.



Fig. 3.19. 1-bit DSM output spectrum (Amplitude scaled to  $P_{out} = 20$  dBm, BW = 20 MHz)

The Filter Design & Analysis Tool in MATLAB (fdatool) was used to design the FIR filter with different windows (under the same conditions in terms of filter length and coefficient quantization), to obtain a trade-off between near-band and out-of-band attenuation based on the design specifications shown in Fig. 3.19. It is found that the required performance trade-off can be achieved by an FIR filter using a Hann window:

$$w(n) = \frac{1}{2} \left( 1 - \cos\left(\frac{2\pi n}{N-1}\right) \right) \tag{3.14}$$

where *N* represents the width, in samples, of a discrete-time symmetrical window function $w(n)$ and $0 \le n \le N - 1$.
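For reference, Eq. (3.14) can be evaluated directly. The short sketch below is only a numerical illustration of the window definition (the function name is an assumption, and the design itself was carried out with fdatool); it uses the filter length of 115 samples discussed in this section.

```python
import math


def hann_window(n_samples):
    """Symmetric Hann window of Eq. (3.14), for 0 <= n <= N - 1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_samples - 1)))
            for n in range(n_samples)]


w = hann_window(115)
print(w[0], w[57], w[114])  # end samples are (numerically) zero, centre sample is 1
```

The window is symmetric around its centre sample, which is the property exploited later when pairing the coefficients $C_i$ and $C_{N-i+1}$.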

Next, we derive the limit values of the FIR filter parameters, namely 115 symmetric coefficients quantized on 8 bits, which enable the implementation of a transfer function that meets the design specifications, when transmitting a 20 MHz bandwidth signal on Channel 1 centered at 2.412 GHz (Fig. 3.20). The 8-bit coefficient quantization corresponds to the ratio between the largest (*coeff<sub>i</sub>*) and the smallest (*coeff<sub>s</sub>*) filter coefficient. Hence, in this case, *coeff<sub>s</sub>* = 1 and *coeff<sub>i</sub>* = 256. Knowing *coeff<sub>s</sub>* = 1, we can obtain the number of unitary filter taps by summing all the quantized coefficients, resulting in a total number of 14540.
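The way the number of unitary taps follows from the quantized coefficients can be sketched as below. This is a minimal illustration under explicit assumptions: the coefficient set is a plain 115-sample Hann window used as a stand-in (the actual filter coefficients are not reproduced here, so the resulting count is not expected to equal 14540), and the helper names are hypothetical.

```python
import math


def hann_window(n_samples):
    """Symmetric Hann window, Eq. (3.14)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_samples - 1)))
            for n in range(n_samples)]


def quantize_coefficients(coeffs, largest=256):
    """Scale so that the largest coefficient equals `largest`, then round.

    The smallest representable non-zero coefficient is then 1 (one unit cell).
    """
    peak = max(coeffs)
    return [round(c / peak * largest) for c in coeffs]


def count_unit_taps(quantized):
    """Number of unitary taps (unit power cells) = sum of integer coefficients."""
    return sum(quantized)


# Hypothetical stand-in coefficient set, NOT the filter of Table 3.3.
prototype = hann_window(115)
quantized = quantize_coefficients(prototype)
print(max(quantized), count_unit_taps(quantized))
```

Since every unit of coefficient value maps to one unit power cell, both the filter length and the coefficient resolution directly set the size of the output stage.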

Finally, the filter design requirements corresponding to the targeted signal bandwidths, {20, 40, 80, 160} MHz, are shown in Table 3.3, where we can notice that the number of coefficients decreases for larger bandwidths, 80 MHz and 160 MHz, thanks to the relaxed near-band noise requirements (the large bandwidth signals overlap the 3GPP band). In conclusion, in order to facilitate the PA design in terms of power consumption and area, the maximum number of unitary filter taps (14540) needs to be greatly reduced, meaning a smaller number of coefficients with lower quantization.



Fig. 3.20. Simulated output spectrum: FIR filtered DSM output (BW = 20 MHz)

 Table 3.3. FIR filter initial design specifications

| BW [MHz] | No. coefficients | Quantization<br>No. of bits | No. unitary taps |
|----------|------------------|-----------------------------|------------------|
| 20       | 115              | 8                           | 14540            |
| 40       | 115              | 8                           | 14540            |
| 80       | 75               | 8                           | 8390             |
| 160      | 65               | 8                           | 5380             |

## **3.3.2. Power amplifier**

The main idea of the proposed transmitter architecture is to shift processing from the analog to the digital domain in order to improve transmitter performance, adding digital functionality while taking advantage of the advanced CMOS technology node, 28nm FD-SOI CMOS from STMicroelectronics. In return, the design constraints of the PA output stage are greatly relaxed, so we can target a simple and efficient PA stage with low power consumption. This is enabled here by the 1-bit output DSM, which allows the use of a linear and highly efficient switching-mode power amplifier (SMPA), such as the class-D PA (Fig. 3.21 left). The SMPA is based in principle on complementary active devices which can be represented as an ideal switch, where the voltage and current waveforms across it are alternately minimized to reduce their overlap (Fig. 3.21 right). However, most of the PA stages found in the literature are used in polar architectures as a combination of several power cells in order to target different amplitude levels [Zheng13] [Ye13].



Fig. 3.21. Switching-mode class-D PA: active devices (left); ideal switch (right)

Since the architecture is based on 1-bit output, the amplitude is constant (ideally  $V_{DD}$ ), so it does not need the power cells combination. Therefore, we propose to use this feature in order to obtain a FIR filter (of length *N*) for multi-standard coexistence, where the filter coefficients are implemented as class-D power cells (Fig. 3.22).



Fig. 3.22. Switching-mode class-D PA with embedded N-length FIR filter

#### **3.3.2.1. Design specifications**

The value of the load resistance $R_L$ depends on the targeted maximum output power, which is $P_{out,max} = 20 \text{ dBm} = 100 \text{ mW}$. Generally, the output power of a linear RF PA is:

$$P_{out} = 0.5 \cdot \frac{V_{out}^2}{R} \tag{3.15}$$

where *V*<sub>out</sub> represents the peak value of the output voltage.

Next, the capacitance seen from the output in the node $C_x$ is equal to the sum of all capacitances $C_{1...N}$. The total capacitance C and the inductance L are chosen in order to obtain resonance at the carrier frequency, $f_c = 2.4$ GHz. We note that these values are not uniquely defined, provided that a sufficiently high quality factor is obtained, which can be achieved by using off-chip inductors.

$$2\pi f_c = \frac{1}{\sqrt{L \cdot C}} \tag{3.16}$$

Furthermore, assuming a train of pulses at the carrier frequency, we can write:

$$V_{out} = \frac{2}{\pi} \cdot \left(\frac{n_{on}}{n_t}\right) \cdot V_{DD} \tag{3.17}$$

where $2/\pi$ is the first coefficient of the Fourier series, and $n_{on}/n_t$ represents the ratio between the switched-on capacitance and the total capacitance. Finally, by substituting (3.17) in (3.15), we obtain

$$P_{out} = \frac{2}{\pi^2} \cdot \left(\frac{n_{on}}{n_t}\right)^2 \cdot \frac{V_{DD}^2}{R} \tag{3.18}$$

The value of the resistance can be obtained by substituting $P_{out} = 100 \text{ mW}$, $n_{on}/n_t = 1$ and $V_{DD} = 1 \text{ V}$ in (3.18), namely $R \approx 2 \Omega$.
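The numerical value of the load resistance can be checked with a few lines of Python. This is a minimal sketch of Eqs. (3.15), (3.17) and (3.18); the function names are chosen here for illustration only.

```python
import math


def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0


def peak_output_voltage(v_dd, on_ratio=1.0):
    """Peak fundamental voltage of the pulse train, Eq. (3.17)."""
    return 2.0 / math.pi * on_ratio * v_dd


def load_resistance(p_out, v_dd, on_ratio=1.0):
    """Load resistance obtained by solving Eq. (3.18) for R."""
    return 2.0 / math.pi ** 2 * on_ratio ** 2 * v_dd ** 2 / p_out


p_max = dbm_to_watt(20.0)               # 0.1 W
r_load = load_resistance(p_max, 1.0)    # about 2.03 ohm
v_peak = peak_output_voltage(1.0)       # about 0.637 V
print(r_load, v_peak, 0.5 * v_peak ** 2 / r_load)  # last value recovers 0.1 W via Eq. (3.15)
```

The exact solution is slightly above 2 Ω, which is rounded to $R = 2 \Omega$ in the remainder of the chapter.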

#### **3.3.2.2. Filter optimization using the matching network**

Let us recall Table 3.3, where it is shown that the FIR filter (8-bit quantization), designed for {20, 40} MHz signal bandwidths, requires a total number of 14540 unitary power cells, which would result in a very large power consumption and area. In order to target the physical implementation, this number needs to be reduced by lowering the number of coefficients and quantization bits.

Next, we remark that at the output of the PA (Fig. 3.22) we obtain a series RLC network formed by the matching network LC and the load resistance R. Hence, when L and C are resonant at the carrier frequency, a band-pass filter (BPF) is obtained around  $f_c$ , which will relax the design constraints of the proposed FIR filter, with respect to the chosen values of R, L, and C. Consequently, the capacitor used for switching is inherently part of the matching network, and serves two purposes, i.e. D/A conversion and continuous-time filtering.

The general transfer function of an RLC BPF in the s-domain is:

$$tf_{BPF}(s) = \frac{\frac{R}{L}s}{s^{2} + \frac{R}{L}s + \frac{1}{LC}} \tag{3.19}$$

Let us set $R = 2 \Omega$, L = 1 nH, and C = 4.39 pF, where the resistance is chosen based on the maximum output power (Section 3.3.2.1), the inductance based on the available surface-mounted devices (SMD) with a quality factor larger than 50, and the capacitance is obtained with respect to L and $f_c$, so that L and C are resonant at $f_c = 2.4$ GHz.

Next, we can plot the resulting transfer function in order to estimate the signal-band and out-of-band attenuation (Fig. 3.23).



Fig. 3.23. Transfer function  $2^{nd}$  order BPF filter : {R, L, C}, {2\*R, L, C}, and {R, 2\*L, C/2}

The filter is simulated for three sets of values, namely {*R*, *L*, *C*}, {2\**R*, *L*, *C*}, and {*R*, 2\**L*, *C*/2}. The results show that the attenuation is lower either when increasing *R*, or when increasing *C* and decreasing *L*, which is directly linked to the quality (Q) factor of the 2<sup>nd</sup> order band-pass filter. Furthermore, the maximum signal-band attenuation is around -3 dB in the worst case when BW = 160 MHz, as shown in Table 3.4, while still meeting the maximum deviation requirement of ±4 dB for the 802.11ac (detailed in Section 2.1.3).

However, the out-of-band attenuation relaxes the filtering constraints and allows a simplification of the FIR filter. Hence, if we take into consideration the BPF attenuation when designing the FIR filter, we estimate that the number of bits necessary for the coefficient quantization can be reduced down to 5.

| BW [MHz] | BPF<sub>1</sub> {*R*, *L*, *C*} | BPF<sub>2</sub> {2\**R*, *L*, *C*} | BPF<sub>3</sub> {*R*, 2\**L*, *C*/2} |
|----------|---------------------------------|------------------------------------|--------------------------------------|
| 20       | -0.04                           | -0.01                              | -0.1                                 |
| 40       | -0.1                            | -0.015                             | -0.3                                 |
| 80       | -0.26                           | -0.06                              | -1                                   |
| 160      | -0.9                            | -0.3                               | -3                                   |

Table 3.4. BPF signal-band attenuation at ± BW/2 [dB]
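The band-edge attenuation of the RLC band-pass filter can be estimated directly from Eq. (3.19). The sketch below is a simplified illustration: it computes the capacitance from Eq. (3.16) (about 4.40 pF, close to the 4.39 pF used in the text) and evaluates the gain at $f_c + BW/2$ only, so the printed values are expected to be close to, but not necessarily identical with, those of Table 3.4. Function names are assumptions.

```python
import math


def resonant_capacitance(inductance, f_c):
    """Capacitance resonating with `inductance` at f_c, from Eq. (3.16)."""
    return 1.0 / ((2.0 * math.pi * f_c) ** 2 * inductance)


def bpf_gain_db(freq, r, l, c):
    """Magnitude in dB of the series-RLC band-pass transfer function, Eq. (3.19)."""
    s = 1j * 2.0 * math.pi * freq
    h = (r / l) * s / (s * s + (r / l) * s + 1.0 / (l * c))
    return 20.0 * math.log10(abs(h))


F_C = 2.4e9
R, L = 2.0, 1e-9
C = resonant_capacitance(L, F_C)

# The three component sets compared in Fig. 3.23
filters = {
    "BPF1 {R, L, C}": (R, L, C),
    "BPF2 {2R, L, C}": (2.0 * R, L, C),
    "BPF3 {R, 2L, C/2}": (R, 2.0 * L, C / 2.0),
}

for bw in (20e6, 40e6, 80e6, 160e6):
    row = [f"{bpf_gain_db(F_C + bw / 2.0, *rlc):6.2f}" for rlc in filters.values()]
    print(f"BW = {bw / 1e6:5.0f} MHz:", "  ".join(row))
```

The quality factor of the series network is $Q = \sqrt{L/C}/R$, which explains why doubling *R* flattens the response while the set {*R*, 2\**L*, *C*/2} sharpens it.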

Finally, the optimized FIR filter design specifications using 5-bit coefficient quantization are summarized in Table 3.5. We observe that the number of unitary taps is almost eight times lower than in the case with 8-bit quantization. Furthermore, the number of coefficients also decreases thanks to the lower resolution, which leads to the cancellation of the smallest coefficients (up to 24 cancelled coefficients for the 160 MHz case). For instance, a coefficient with the value $c_v \leq 3$ (smaller than $0.5*2^3$) in the 8-bit quantization case is rounded to zero in the case of 5-bit quantization and is no longer taken into consideration.

| BW [MHz] | No. coefficients | Quantization<br>No. of bits | No. unitary taps<br>(Power cells) |
|----------|------------------|-----------------------------|-----------------------------------|
| 20       | 109              | 5                           | 1820                              |
| 40       | 109              | 5                           | 1820                              |
| 80       | 67               | 5                           | 1050                              |
| 160      | 41               | 5                           | 670                               |

**Table 3.5.** Optimized FIR filter design specifications
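The effect of reducing the coefficient resolution from 8 to 5 bits can be sketched as follows. The coefficient list is a small hypothetical example, not the filter of Table 3.5, and the rounding rule (round half up on a step of $2^3$) is an assumption consistent with the statement that values $c_v \leq 3$ are rounded to zero.

```python
def requantize(coeffs, bits_from=8, bits_to=5):
    """Reduce integer coefficient resolution; values below half a step become 0."""
    step = 2 ** (bits_from - bits_to)          # 8 for the 8-bit to 5-bit case
    return [(c + step // 2) // step for c in coeffs]


def active_coefficients(coeffs):
    """Number of coefficients that still require at least one unit cell."""
    return sum(1 for c in coeffs if c > 0)


# Hypothetical symmetric 8-bit coefficient set used only for illustration
coeffs_8bit = [1, 3, 4, 12, 100, 256, 100, 12, 4, 3, 1]
coeffs_5bit = requantize(coeffs_8bit)

print(coeffs_5bit)                                                       # [0, 0, 1, 2, 13, 32, 13, 2, 1, 0, 0]
print(sum(coeffs_8bit), sum(coeffs_5bit))                                # 496 64
print(active_coefficients(coeffs_8bit), active_coefficients(coeffs_5bit))  # 11 7
```

In this toy example the number of unit cells drops by roughly a factor of eight and the four smallest coefficients vanish, which mirrors the trend between Table 3.3 and Table 3.5.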

Consequently, the proposed band filter (FIR + BPF) provides effective noise attenuation with relaxed design constraints, thus facilitating the FIR-PA integration in advanced CMOS technology nodes. The proposed optimized FIR filter with 5-bit quantization and RLC ( $R = 2 \Omega$, L = 1 nH, C = 4.39 pF) is simulated and compared with the 8-bit quantization case without matching network (Fig. 3.24).





Furthermore, a configurability feature is introduced in order to provide a simple mechanism to adapt the filter to the signal bandwidth of the targeted application (simple change between filter functions for 20/40/80/160 MHz). Consequently, each filter coefficient can be changed by ±1, ±2 units, thus ensuring a correct filter definition for all targeted applications. For example, a coefficient value $c_{v,80M}$ = 39 in the 80 MHz case can be obtained from the combination of two coefficients from the 20 MHz case (Section 4.2.1.2), $c_{v1,20M}$ = 15 and $c_{v2,20M}$ = 23, where we activate a +1 unit coefficient change on either $c_{v1,20M}$ or $c_{v2,20M}$ (15+23+1=39). Moreover, the coefficient change may also compensate for possible physical non-idealities, improving the tolerance to the physical variation of unitary capacitances, so that the OOB mask can still be met (Section 3.4.2.3).

To summarize, the proposed filter design approach is based on a co-design of the FIR and RLC band-pass filters, which allows the simplification of the FIR filter specifications in terms of number of coefficients and number of unit cells, thus improving efficiency.

This approach can be divided in the following steps: i) choose the appropriate filter window which provides a good trade-off between near-band and out-of-band attenuation, ii) find the limit value of the number of bits for coefficient quantization (meet OOB mask), iii) reduce the number of quantization bits of the FIR filter, by taking into account the inherent RLC band-pass filter attenuation.

## **3.3.3. Efficiency**

In an RF PA, the power efficiency is one of the most important performance figures and is usually defined by the drain efficiency ( $\eta$ ), or the total power-added efficiency (*PAE*):

$$\eta = P_{out} / P_{DC} \tag{3.20}$$

$$PAE = P_{out} / (P_{in} + P_{DC}) \tag{3.21}$$

where  $P_{in}$ ,  $P_{out}$ , and  $P_{DC}$  are the input power, output power, and DC power dissipation, respectively.

In the case of a switched-capacitor PA (SC PA) with ideal switches, we can rewrite Eq. (3.20) to obtain

$$\eta = P_{out} / \left( P_{out} + P_{SC} \right) \tag{3.22}$$

where *P*<sub>SC</sub> represents the dynamic power required to charge the SC array.

We note that in the case of real switches (described in Chapter 4), the efficiency relation also includes the static and dynamic power dissipated due to parasitic capacitances and to direct-path and leakage (static) currents, as well as the resistive losses due to the switch on-resistance $R_{on}$.

Next, $P_{SC}$ can be expressed as the ratio between the total energy spent during $N_P$ periods and the number of periods $N_P$ times the sampling period $T_s = 1/(4*f_c) = 1/f_s$:

$$P_{SC} = \frac{1}{N_P \cdot T_s} \cdot \sum_{i=1}^{N_P} E_{x,i} \tag{3.23}$$

where $E_{x,i}$ is the energy spent during the $i$-th period $T_s$.

The proposed FIR-PA is designed as a part of a digital Cartesian transmitter (Fig. 3.25) and will be shared alternately by the *I* and *Q* paths. The interconnection network uses the single-bit output of the DSM (for *I* and *Q* paths) in order to obtain the *N*-path input signals for the FIR-PA. Additionally, the digital to RF mixing is included in the interconnection network, where $sel_P$ is used for path sharing (*I* and *Q*) and $sel_S$ for the sign change, corresponding to a general RF output stream {*I*, *Q*, -*I*, -*Q*}. Hence, at one time period $T_s$, $V_{out}$ will correspond to the *I*-path, and at the next one, to the *Q*-path.



Fig. 3.25. Digital Cartesian transmitter based on single-ended FIR-PA

Let us consider the simplified SC PA in Fig. 3.26, which will be used to determine the maximum energy spent during a transition, similar to the approach in [Yoo11]. In this case, the inductor can be seen as an open circuit during fast switching transitions, and the output voltage can be obtained as the ratio between the switched-on capacitances and the total capacitance. Furthermore, we note that in [Yoo11] the switched-capacitor network is used in a polar configuration to activate unitary cells based on the signal amplitude during the first half-period, whereas in the second half, all the cells are switched-off. Hence, the energy spent during one time period depends only on the number of activated and deactivated cells (without considering transitions).

However, in the proposed architecture (Fig. 3.25), the unitary cells are activated or deactivated during a complete time period based on the control signals corresponding to the FIR filter implementation, meaning that the switching scheme model has to take into consideration the previous state (de-/activated) of the cells (transitions).

Initially, at time instant  $t_0$ , there are two capacitors with the bottom plates connected to  $V_{DD}$  and two capacitors with the bottom plates connected to *GND*. The first index of the capacitance corresponds to the connection of the bottom plate at time  $t_0$ , whereas the second index corresponds to time  $t_1$  (1 for  $V_{DD}$  and 0 for *GND*).

Hence, $C_{11}$ is connected to $V_{DD}$ at $t_0$ and $t_1$, $C_{00}$ is connected to GND at $t_0$ and $t_1$, $C_{10}$ is connected to $V_{DD}$ at $t_0$ and is switched off (connected to GND) at $t_1$, and $C_{01}$ is connected to GND at $t_0$ and is switched on (connected to $V_{DD}$) at $t_1$.



Fig. 3.26. 2-bit SC PA: at  $t_0$  (left) and  $t_1$  (right)

The corresponding output voltages at $t_0$ ( $V_0$ ) and $t_1$ ( $V_1$ ) are obtained as a ratio between the switched-on capacitances and the total capacitance $C_T = C_{11} + C_{00} + C_{10} + C_{01}$:

$$V_0 = \frac{C_{11} + C_{10}}{C_T} \cdot V_{DD} \tag{3.24}$$

$$V_1 = \frac{C_{11} + C_{01}}{C_T} \cdot V_{DD} \tag{3.25}$$

We note that the two output voltages are not equal when  $C_{10} \neq C_{01}$ , meaning that energy will also be spent due to the voltage drop difference on  $C_{11}$ , from  $V_{DD} - V_0$  to  $V_{DD} - V_1$ , or the voltage drop difference on  $C_{00}$ , from  $-V_0$  to  $-V_1$ , respectively. Hence, we write

$$E_x = E_{11} + E_{01} = E_{00} + E_{10} \tag{3.26}$$

where $E_{11}$, $E_{01}$, $E_{00}$, and $E_{10}$ correspond to the energy spent during one transition, due to the voltage drop difference on $C_{11}$, $C_{01}$, $C_{00}$, and $C_{10}$.

$$\begin{aligned} E_{11} &= \int_{t_0}^{t_1} u(t) \cdot i(t)\, dt = V_{DD} \cdot \int_{t_0}^{t_1} C_{11} \cdot \frac{du(t)}{dt}\, dt = C_{11} \cdot V_{DD} \cdot \int_{u(t_0)}^{u(t_1)} du(t) \\ &= C_{11} \cdot V_{DD} \cdot (V_{DD} - V_1 - V_{DD} + V_0) = C_{11} \cdot V_{DD} \cdot (V_0 - V_1) \end{aligned} \tag{3.27}$$

Next, we substitute (3.24) and (3.25) in (3.27):

$$E_{11} = C_{11} \cdot \frac{C_{10} - C_{01}}{C_T} \cdot V_{DD}^2 \tag{3.28}$$

Analogously, we obtain $E_{01}$, $E_{00}$, and $E_{10}$:

$$E_{01} = C_{01} \cdot \frac{C_{11} + 2 \cdot C_{10} + C_{00}}{C_T} \cdot V_{DD}^2 \tag{3.29}$$

$$E_{00} = C_{00} \cdot \frac{C_{01} - C_{10}}{C_T} \cdot V_{DD}^2 \tag{3.30}$$

$$E_{10} = C_{10} \cdot \frac{C_{11} + 2 \cdot C_{01} + C_{00}}{C_T} \cdot V_{DD}^2 \tag{3.31}$$

The resulting maximum energy spent during one transition is

$$E_{x} = \frac{C_{10} \cdot (C_{11} + C_{01}) + C_{01} \cdot (C_{00} + C_{10})}{C_{T}} \cdot V_{DD}^{2} \tag{3.32}$$
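The consistency of Eqs. (3.26) and (3.28)-(3.32) can be checked numerically. The following sketch evaluates the four partial energies and the closed-form result for an arbitrary set of capacitances; the numerical values and function names are illustrative assumptions.

```python
def partial_energies(c11, c01, c00, c10, v_dd=1.0):
    """Partial transition energies E11, E01, E00, E10 of Eqs. (3.28)-(3.31)."""
    c_t = c11 + c01 + c00 + c10
    k = v_dd ** 2 / c_t
    e11 = c11 * (c10 - c01) * k
    e01 = c01 * (c11 + 2.0 * c10 + c00) * k
    e00 = c00 * (c01 - c10) * k
    e10 = c10 * (c11 + 2.0 * c01 + c00) * k
    return e11, e01, e00, e10


def transition_energy(c11, c01, c00, c10, v_dd=1.0):
    """Closed-form transition energy of Eq. (3.32)."""
    c_t = c11 + c01 + c00 + c10
    return (c10 * (c11 + c01) + c01 * (c00 + c10)) * v_dd ** 2 / c_t


# Arbitrary example values (normalized capacitances, V_DD = 1 V)
e11, e01, e00, e10 = partial_energies(3.0, 2.0, 4.0, 1.0)
print(e11 + e01, e00 + e10, transition_energy(3.0, 2.0, 4.0, 1.0))  # all three give 1.5 (up to rounding)
```

Note that $E_x$ vanishes when no cell changes state ($C_{10} = C_{01} = 0$), as expected for ideal switches and capacitors.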

This formula is validated through *Spectre RF* simulations based on the ideal unitary power cell displayed in Fig. 3.27 (cells used: *switch* for *sw1* and *sw2* and *cap* for *C1*, from the *analogLib* library) and will be further used for the FIR-PA efficiency study.

The efficiency study will be extended in Chapter 4 to include also the performance of the designed circuit elements (real switch and capacitors).



Fig. 3.27. Schematic of an ideal unitary power cell in Cadence

#### 3.3.3.1. Single-ended FIR-PA efficiency

The drain efficiency characteristic is expressed using (3.22) for an input sine wave of frequency $f_{sin} = 1$ MHz which is sampled at $f_s/2 = 4.8$ GHz. First, the modulator output is obtained in ModelSim (VHDL description) for different values of the input signal peak amplitude $A_{in}$, where the maximum peak amplitude is chosen equal to 640 mV to ensure DSM stability according to Section 3.2.3.2. Next, the rms value of the output voltage $V_{out}$ (Fig. 3.25) is calculated in order to obtain the output power $P_{out}$, when $R = 2 \Omega$:

$$P_{out} = \frac{V_{out,rms}^2}{R} \tag{3.33}$$

Finally, we can combine (3.23) and (3.32) to determine the switching power  $P_{SC}$ .

$$P_{SC} = \alpha_{SC} \cdot C_T \cdot V_{DD}^2 \cdot f_s \tag{3.34}$$

where the factor  $\alpha_{SC}$  is obtained from Eq. (3.32) as an average value of the total energy spent during  $N_P$  periods, independent of  $C_T$  and  $V_{DD}$ 

$$\alpha_{SC} = \frac{\sum_{i=1}^{N_P} \alpha_{10,i} \cdot (\alpha_{11,i} + \alpha_{01,i}) + \alpha_{01,i} \cdot (\alpha_{00,i} + \alpha_{10,i})}{N_P} \tag{3.35}$$

and all the capacitances are normalized to the total capacitance  $C_T$ 

$$\begin{cases} \alpha_{00} = \frac{C_{00}}{C_T} & \alpha_{01} = \frac{C_{01}}{C_T} \\ \alpha_{10} = \frac{C_{10}}{C_T} & \alpha_{11} = \frac{C_{11}}{C_T} \end{cases} \tag{3.36}$$

Let us consider the optimized FIR filter designed for the 20 MHz bandwidth with 109 coefficients corresponding to 1820 unitary power cells. The values of the capacitances $C_{11}$, $C_{01}$, $C_{00}$, and $C_{10}$ in (3.36) are determined as follows: $C_{11}$ is the sum of all the capacitances connected to $V_{DD}$ at $t_0$ and $t_1$, $C_{00}$ the sum of those connected to GND at $t_0$ and $t_1$, $C_{10}$ the sum of those connected to $V_{DD}$ at $t_0$ and switched off at $t_1$, and $C_{01}$ the sum of those connected to GND at $t_0$ and switched on at $t_1$. Hence, we can determine the normalized capacitances $\alpha_{00}$, $\alpha_{01}$, $\alpha_{10}$, $\alpha_{11}$, which are further used to calculate the factor $\alpha_{SC}$ during $N_P$ periods.
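The procedure described above can be sketched as a short routine that classifies every coefficient according to its previous and current state and accumulates Eq. (3.35). This is a simplified illustration under stated assumptions: the three-coefficient filter and the state sequence are hypothetical, the average is taken over the number of transitions, and the function name is not taken from the simulation environment actually used.

```python
def alpha_sc(weights, states):
    """Average normalized switching factor, following Eqs. (3.35) and (3.36).

    weights: number of unit cells per FIR coefficient.
    states:  one tuple of 0/1 values per period, giving for each coefficient
             whether its bottom plates are tied to GND (0) or V_DD (1).
    """
    total = float(sum(weights))
    accumulated = 0.0
    transitions = list(zip(states, states[1:]))
    for previous, current in transitions:
        alpha = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
        for weight, p, c in zip(weights, previous, current):
            alpha[(p, c)] += weight / total        # normalized capacitance, Eq. (3.36)
        accumulated += (alpha[(1, 0)] * (alpha[(1, 1)] + alpha[(0, 1)])
                        + alpha[(0, 1)] * (alpha[(0, 0)] + alpha[(1, 0)]))
    return accumulated / len(transitions)


# Hypothetical 3-coefficient filter (weights 1, 2, 1) observed over three periods
example_weights = [1, 2, 1]
example_states = [(1, 0, 0), (0, 1, 0), (0, 1, 1)]
print(alpha_sc(example_weights, example_states))  # 0.21875
```

In the FIR-PA, the state tuples would be derived from the delayed versions of the DSM output bit stream that drive the individual coefficients.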

The proposed switching model, based on Eq. (3.22) and (3.33-3.36), is used to perform a system level simulation in order to estimate the efficiency of the single-ended architecture ( $\eta_{SE}$ ) for the targeted signal bandwidths (Table 3.6), considering ideal switches and capacitors, where  $C_T$  = 4.39 pF. We recall that the input signal is a sinewave of frequency  $f_{sin}$  = 1 MHz with the peak amplitude  $A_{in}$ , sampled at  $f_s/2$  = 4.8 GHz.

We can note that the drain efficiency increases up to 76.1% (considering ideal switches) as the input signal amplitude increases (the output power $P_{out}$ increases), whereas $P_{SC}$ remains almost constant. Furthermore, it is seen that the power consumption is almost the same for all the targeted bandwidths (factor $\alpha_{SC}$ almost constant), thus being independent of the value and the number of coefficients of the corresponding FIR filters.

Finally, in order to increase the maximum output power, we propose the implementation of a differential FIR-PA architecture, which will be detailed next. Additionally, with respect to a single-ended architecture, the differential structure has also the advantage of removing inter-symbol interference (ISI) due to the rise and fall times of output signal transitions with asymmetric edges (Fig. 3.28), as stated in [Frappe07].

| A<sub>in</sub><br>[mV] | V<sub>out,rms</sub><br>[mV] | P<sub>out</sub><br>[mW] | α<sub>SC</sub><br>20/40 MHz | α<sub>SC</sub><br>80 MHz | α<sub>SC</sub><br>160 MHz | P<sub>SC</sub><br>[mW] | η<sub>SE</sub><br>(%) |
|-------------|------------------------------|--------------------------|------------------------------|---------------------------|----------------------------|-------------------------|------------------------|
| 64          | 18.4                         | 0.17                     | 0.164                        | 0.165                     | 0.166                      | 6.93                    | 2.4                    |
| 128         | 36.7                         | 0.67                     | 0.164                        | 0.165                     | 0.165                      | 6.93                    | 8.9                    |
| 192         | 55                           | 1.51                     | 0.169                        | 0.169                     | 0.17                       | 7.14                    | 17.5                   |
| 256         | 73.4                         | 2.69                     | 0.156                        | 0.157                     | 0.158                      | 6.59                    | 28.9                   |
| 320         | 91.7                         | 4.2                      | 0.163                        | 0.164                     | 0.164                      | 6.88                    | 37.9                   |
| 384         | 110                          | 6.05                     | 0.159                        | 0.16                      | 0.16                       | 6.72                    | 47.4                   |
| 448         | 128.4                        | 8.24                     | 0.163                        | 0.164                     | 0.165                      | 6.88                    | 54.5                   |
| 512         | 146.7                        | 10.76                    | 0.156                        | 0.157                     | 0.157                      | 6.59                    | 62                     |
| 576         | 165                          | 13.61                    | 0.158                        | 0.158                     | 0.159                      | 6.67                    | 67.2                   |
| 640         | 203.6                        | 20.72                    | 0.154                        | 0.155                     | 0.156                      | 6.5                     | 76.1                   |

Table 3.6. Single-ended FIR-PA drain efficiency
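The entries of Table 3.6 can be cross-checked from Eqs. (3.22), (3.33) and (3.34). The sketch below recomputes two rows for the 20/40 MHz filter, taking $f_s = 4 \cdot f_c = 9.6$ GHz from the definition of $T_s$ and $C_T = 4.39$ pF, $V_{DD} = 1$ V, $R = 2 \Omega$ from the text; the function names are illustrative.

```python
def output_power(v_out_rms, r_load):
    """Output power from the rms output voltage, Eq. (3.33)."""
    return v_out_rms ** 2 / r_load


def switching_power(alpha, c_total, v_dd, f_s):
    """Dynamic power needed to charge the SC array, Eq. (3.34)."""
    return alpha * c_total * v_dd ** 2 * f_s


def drain_efficiency(p_out, p_sc):
    """Drain efficiency of an SC PA with ideal switches, Eq. (3.22)."""
    return p_out / (p_out + p_sc)


C_T, V_DD, R_LOAD, F_S = 4.39e-12, 1.0, 2.0, 9.6e9

# (A_in [mV], V_out,rms [V], alpha_SC) taken from the first and last rows of Table 3.6
for a_in, v_rms, alpha in ((64, 18.4e-3, 0.164), (640, 203.6e-3, 0.154)):
    p_out = output_power(v_rms, R_LOAD)
    p_sc = switching_power(alpha, C_T, V_DD, F_S)
    eta = drain_efficiency(p_out, p_sc)
    print(f"A_in = {a_in} mV: P_out = {p_out * 1e3:.2f} mW, "
          f"P_SC = {p_sc * 1e3:.2f} mW, eta = {eta * 100:.1f} %")
```

Since $P_{SC}$ is nearly independent of the input amplitude, the efficiency is dominated by the output power, which explains the steep degradation at power back-off.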



Fig. 3.28. ISI with asymmetric fronts (a); resulting output spectra (b) [Frappe07]

#### **3.3.3.2. Differential FIR-PA efficiency**

The differential FIR-PA architecture is illustrated in Fig. 3.29 and is expected to obtain an increased output power (+ 3 dB) for the same overall efficiency (output power is doubled, power consumption is doubled) with respect to the single-ended FIR-PA.



Fig. 3.29. FIR-PA differential architecture

#### **3.3.3.2.1. Optimization of the switching activity**

Let us recall that the FIR filter is designed with symmetric coefficients, meaning that for example  $C_1 = C_N$ , hence  $in_1$  and  $in_N$  have the same influence  $(C_1/C_T)$  on the differential output voltage value  $V_{out,diff}$ . Table 3.7 comprises the possible values of  $V_{out,diff}$  depending on the switching activity on  $in_1$  and  $in_N$  (the influence of  $in_{2,...,N-1}$  is not taken into account).

| Case | in <sub>1</sub> | in <sub>N</sub> | V <sub>out,pos</sub> | V <sub>out,neg</sub> | $\mathbf{V}_{out,diff}$ |
|------|-----------------|-----------------|----------------------|----------------------|-------------------------|
| 00   | 0               | 0               | 0                    | $2*C_1/C_T$          | $-2*C_1/C_T$            |
| 01   | 0               | 1               | $C_1/C_T$            | $C_1/C_T$            | 0                       |
| 10   | 1               | 0               | $C_1/C_T$            | $C_1/C_T$            | 0                       |
| 11   | 1               | 1               | $2*C_1/C_T$          | 0                    | $2*C_1/C_T$             |

**Table 3.7.** Possible values of  $V_{out,diff}$  depending on  $in_1$  and  $in_N$ 

where

$$V_{out,diff} = V_{out,pos} - V_{out,neg} \tag{3.37}$$

$$V_{out, pos} = in_1 \cdot \frac{C_1}{C_T} + in_N \cdot \frac{C_N}{C_T} = (in_1 + in_N) \cdot \frac{C_1}{C_T} \tag{3.38}$$

$$V_{out,neg} = \overline{in_1} \cdot \frac{C_1}{C_T} + \overline{in_N} \cdot \frac{C_N}{C_T} = \left(\overline{in_1} + \overline{in_N}\right) \cdot \frac{C_1}{C_T} \tag{3.39}$$

The cases "01" ( $in_1 = 0$ and $in_N = 1$ ) and "10" ( $in_1 = 1$ and $in_N = 0$ ) are redundant and determine a zero output value, while producing unnecessary switching activity. In order to improve the efficiency of the differential FIR-PA, this redundancy can be eliminated from Eq. (3.38) and (3.39) by inserting AND (positive side) and NOR (negative side) gates between $in_1$ and $in_N$, as in Eq. (3.40) and (3.41), leading to new driving signals $RFP_1$ and $RFN_1$.

$$V_{out,pos} = \left(in_1 \ \text{AND} \ in_N\right) \cdot \frac{C_1}{C_T} + \left(in_1 \ \text{AND} \ in_N\right) \cdot \frac{C_N}{C_T} = 2 \cdot RFP_1 \cdot \frac{C_1}{C_T} \tag{3.40}$$

$$V_{out,neg} = \left(in_1 \ \text{NOR} \ in_N\right) \cdot \frac{C_1}{C_T} + \left(in_1 \ \text{NOR} \ in_N\right) \cdot \frac{C_N}{C_T} = 2 \cdot RFN_1 \cdot \frac{C_1}{C_T} \tag{3.41}$$

Consequently, based on the optimization in Eq. (3.40) and (3.41) the possible values of  $V_{out,diff}$  depending on the switching activity on  $in_i$  and  $in_{N-i+1}$  are  $\{-2*C_i/C_T, 0, 2*C_i/C_T\}$ . Simulation results show that avoiding unnecessary switching activity with this method reduces the power consumption due to capacitor switching by 15-18%, thus increasing the FIR-PA efficiency by up to 3% at maximum input amplitude.
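The equivalence between the original drive and the gated drive can be checked by enumerating the four input cases of Table 3.7. The following is a minimal sketch, not the thesis implementation: the function names are hypothetical, and all voltages are expressed in units of $C_1/C_T$.

```python
# Sketch: compare the original differential drive (Eq. 3.38/3.39) with the
# AND/NOR-gated drive (Eq. 3.40/3.41) for the symmetric tap pair (in_1, in_N).
# Voltages are in units of C_1/C_T; all names are illustrative assumptions.

def vdiff_original(in1: int, inN: int) -> int:
    v_pos = in1 + inN                    # Eq. (3.38)
    v_neg = (1 - in1) + (1 - inN)        # Eq. (3.39), complemented inputs
    return v_pos - v_neg                 # Eq. (3.37)

def vdiff_gated(in1: int, inN: int) -> int:
    rfp = in1 & inN                      # AND gate, positive side
    rfn = 1 - (in1 | inN)                # NOR gate, negative side
    return 2 * rfp - 2 * rfn             # Eq. (3.40) minus Eq. (3.41)

def units_driven_high(in1: int, inN: int, gated: bool) -> int:
    """Number of C_1-sized capacitors pulled high (both sides together)."""
    if gated:
        return 2 * (in1 & inN) + 2 * (1 - (in1 | inN))
    return (in1 + inN) + ((1 - in1) + (1 - inN))

for in1 in (0, 1):
    for inN in (0, 1):
        print(in1, inN,
              vdiff_original(in1, inN), vdiff_gated(in1, inN),
              units_driven_high(in1, inN, gated=False),
              units_driven_high(in1, inN, gated=True))
```

Both schemes give the same differential value in every case, while the gated scheme drives no capacitor high in the redundant cases "01" and "10".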

#### 3.3.3.2.2. Half-SC FIR-PA

Considering that the possible values of $V_{out,diff}$ depend only on $C_i$ and that the factor 2 is relative to the ratio between the smallest coefficient and the sum of all the FIR filter coefficients, we propose an original solution to improve the efficiency of the output stage by eliminating half of the filter coefficients (with respect to Table 3.5) without affecting the filter transfer function (Fig. 3.30).



Fig. 3.30. Half-SC FIR-PA differential architecture

We note that using the proposed switching scheme with reduced redundancy, the outputs on the positive or the negative sides do not correspond anymore to the output of a single-ended architecture. However, the switching optimization was performed considering a differential structure, leading to identical transfer functions for the differential Half-SC FIR-PA and the initial differential FIR-PA.

Furthermore, the power consumption and area of the proposed Half-SC FIR-PA are reduced by half with respect to the initial differential FIR-PA in Fig. 3.29, and the interface between the outputs of the DSM and the Half-SC FIR-PA inputs needs to drive only half the number of signal lines (N/2+1), thus reducing the overall system power consumption even further. Based on the proposed improvements of the Half-SC FIR-PA (reduced unnecessary switching and half of the FIR coefficients), we can recalculate the factor $\alpha_{HSC}$ using (3.35) and the associated power consumption.

Consequently, we can compare the efficiency of the Half-SC FIR-PA ( $\eta_{HSC}$ ) to the single-ended ( $\eta_{SE}$ ) architecture in Table 3.8. Moreover, the factor  $\alpha_{DSC}$  describes the performances of the differential SC PA before the optimization of the switching activity (Fig. 3.29) and is obtained by doubling the factor  $\alpha_{SC}$  (obtained for the 20/40 MHz filter) from Table 3.6.

| A<sub>in</sub><br>[mV] | V<sub>out,rms</sub><br>[mV] | P<sub>out</sub><br>[mW] | α<sub>DSC</sub> | α<sub>HSC</sub> | P<sub>HSC</sub><br>[mW] | η<sub>HSC</sub><br>[%] | η<sub>SE</sub><br>[%] |
|-------------------------|------------------------------|--------------------------|-------|------------------|--------------------------|-------------------------|------------------------|
| 64                      | 36.8                         | 0.34                     | 0.328 | 0.133            | 5.62                     | 5.7                     | 2.4                    |
| 128                     | 73.4                         | 1.35                     | 0.328 | 0.135            | 5.70                     | 19.1                    | 8.9                    |
| 192                     | 110                          | 3.02                     | 0.338 | 0.139            | 5.87                     | 34                      | 17.5                   |
| 256                     | 146.8                        | 5.39                     | 0.313 | 0.132            | 5.57                     | 49.1                    | 28.9                   |
| 320                     | 183.4                        | 8.41                     | 0.326 | 0.136            | 5.74                     | 59.4                    | 37.9                   |
| 384                     | 220                          | 12.1                     | 0.318 | 0.134            | 5.66                     | 68.1                    | 47.4                   |
| 448                     | 256.8                        | 16.49                    | 0.326 | 0.136            | 5.74                     | 74.1                    | 54.5                   |
| 512                     | 293.4                        | 21.52                    | 0.312 | 0.132            | 5.57                     | 79.4                    | 62                     |
| 576                     | 330                          | 27.22                    | 0.315 | 0.134            | 5.66                     | 82.8                    | 67.2                   |
| 640                     | 407.2                        | 41.45                    | 0.309 | 0.131            | 5.53                     | 88.2                    | 76.1                   |

Table 3.8. Differential FIR-PA drain efficiency
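The efficiency column of Table 3.8 can be cross-checked from the tabulated powers. The sketch below assumes that the drain efficiency is computed as $\eta_{HSC} = P_{out}/(P_{out} + P_{HSC})$; this relation is not restated in this section and is used here as an assumption, which reproduces the tabulated values to within rounding.

```python
import math

# (P_out [mW], P_HSC [mW], eta_HSC [%]) copied from Table 3.8
TABLE_3_8 = [
    (0.34, 5.62, 5.7), (1.35, 5.70, 19.1), (3.02, 5.87, 34.0),
    (5.39, 5.57, 49.1), (8.41, 5.74, 59.4), (12.1, 5.66, 68.1),
    (16.49, 5.74, 74.1), (21.52, 5.57, 79.4), (27.22, 5.66, 82.8),
    (41.45, 5.53, 88.2),
]

def drain_efficiency_percent(p_out_mw: float, p_sc_mw: float) -> float:
    """Assumed definition: output power over output plus switching power."""
    return 100.0 * p_out_mw / (p_out_mw + p_sc_mw)

def mw_to_dbm(p_mw: float) -> float:
    """Convert a power in mW to dBm."""
    return 10.0 * math.log10(p_mw)

for p_out, p_hsc, eta_tab in TABLE_3_8:
    eta = drain_efficiency_percent(p_out, p_hsc)
    print(f"P_out = {p_out:6.2f} mW  eta = {eta:5.1f} %  (table: {eta_tab} %)")

# Peak output power of the last row expressed in dBm
print(f"{mw_to_dbm(41.45):.1f} dBm")
```

The last line converts the peak output power of 41.45 mW to dBm, which agrees with the peak value listed in Table 3.10.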

The efficiency characteristics of the single-ended (*SE*), differential optimized with reduced switching activity (*Diff\_opt*) and differential *Half-SC* architectures are plotted in Fig. 3.31. The major advantage of the proposed Half-SC scheme is the increased power efficiency at back-off output power levels, which is very important when targeting advanced communication standards with large PAPR, such as WLAN (PAPR  $\approx$  13 dB).

Additionally, the Half-SC scheme has been simulated in Cadence, using ideal switches and capacitors (as shown in Fig. 3.27), in order to validate the theoretical analysis and MATLAB simulations.



Fig. 3.31. Power efficiency characteristics

The simulations have been performed for 4 input signal peak amplitudes,  $A_{in2} = 128$  mV,  $A_{in5} = 320$  mV,  $A_{in8} = 512$  mV and  $A_{in10} = 640$  mV, and the resulting efficiency characteristic (Fig. 3.31 Half-SC Cadence Sim, diamond markers) is similar to the one obtained through MATLAB simulations (Fig. 3.31 Half-SC, square markers).

Furthermore, it is seen that when we consider the switched-capacitor (SC) network together with the RLC band-pass filter, the drain efficiency is slightly higher than expected (output power slightly larger, capacitive dissipated power slightly smaller), which may be accounted for by a small current leakage from the SC network towards the RLC filter, during capacitor discharge. However, this effect was not modeled in MATLAB since the two networks (SC and RLC) are simulated independently.

Thus, the number of coefficients in the Half-SC scheme is reduced by half, and we can approximate the new maximum number of unitary power cells (filter taps), namely ~910 (from Table 3.5, 1820 unitary taps for the 20 MHz case), which are driven by a maximum of 55 (109/2 + 1) control signals.

Finally, Table 3.9 presents a performance summary for the single-ended, differential and Half-SC architectures, namely the output power, power consumption, area, number of power cells and number of input drivers.

|                       | Single-ended | Differential | Half-SC    |
|-----------------------|--------------|--------------|------------|
| Output power          | $P_{out}$    | $2*P_{out}$  | $2*P_{out}$ |
| Switching consumption | $P_{SC}$     | $2*P_{SC}$   | $P_{SC}$   |
| Area                  | $A$          | $2*A$        | $A$        |
| No. of power cells    | $N_{pc}$     | $2*N_{pc}$   | $N_{pc}$   |
| No. of input drivers  | $N_{id}$     | $2*N_{id}$   | $N_{id}$   |

Table 3.9. FIR-PA architectures performance summary

In conclusion, the proposed Half-SC method combines the advantages of a differential architecture (twice the output power) with the ones of a single-ended architecture (complexity - power consumption and area) to obtain a highly efficient SC FIR-PA. The expected performance based on system-level simulations is derived in Table 3.10.

 Table 3.10. Simulated FIR-PA performance

| Parameters                        | System simulation results |
|-----------------------------------|---------------------------|
| Bandwidth [MHz]                   | 20 - 160                  |
| Carrier frequency [GHz]           | 2.4                       |
| FIR-PA switching consumption [mW] | 5.6                       |
| Peak output power [dBm]           | 16.2                      |
| Drain efficiency [%]              | 88                        |

#### 3.3.3.2.3. Comparison with ideal high-resolution DAC

A comparison in terms of noise performances is made between the proposed ideal Half-SC architecture and ideal high-resolution DACs (10-bit, 11-bit, 12-bit, 13-bit). The operating frequency of the Half-SC scheme is $4*f_c$ (RF output signal centered at $f_c$), whereas for the ideal DACs we have considered an ideal quadrature operation at $4*f_c$ per signal path (in-phase, quadrature phase) based on a 300 MHz signal processing, as described in [Alavi14]. We note that in [Alavi14] the $4*f_c$ operation is implemented through upconverting quadrature clocks working at $f_c$ with a duty-cycle of 25 % to avoid correlation between the I and Q paths.

The corresponding output spectrum simulations are presented in Fig. 3.32. With respect to an ideal 10-bit DAC, the proposed Half-SC FIR-PA i) presents similar complexity in terms of number of cells, ii) requires fewer control signals, depending on the DAC encoding (18 times fewer than a pure thermometer DAC), iii) eliminates image replicas due to the ZOH operation, and iv) provides highly adapted noise suppression in the most critical bands (for example the GPS band around 1.575 GHz), targeting multi-standard coexistence with nearby receivers.



Fig. 3.32. Output spectrum: comparison Half-SC FIR-PA with ideal DAC architectures

#### 3.4. FIR-PA non-idealities

The proposed FIR-PA is built upon a basic power cell which comprises a simple switched-capacitor scheme. Until now, both the switch and the capacitance have been assumed to be ideal. However, in circuit design, non-idealities can arise either in the switch circuit or in the capacitance value.

In this section, we will first propose a model for the switch non-idealities, such as rise/fall time ($t_r$, $t_f$), low-to-high/high-to-low delay ($t_{PLH}$, $t_{PHL}$) and jitter ($t_j$), and we will investigate the effect of these non-idealities over the output power spectrum. Next, we will study the effect of non-ideal capacitance values (non-ideal FIR coefficients) in order to determine the tolerance of the proposed scheme to coefficient variation.

Furthermore, we also investigate the cycle-to-cycle effect of the voltage supply variation over the output spectrum. Finally, we will derive the specifications for the IC design, taking into consideration the non-ideal effects introduced by the switched-capacitor scheme.

#### 3.4.1. Non-ideal switch model

First of all, let us consider an ideal switch (from Fig. 3.30) and its implementation as a CMOS inverter with the input signal  $in_1$  and the output signal  $out_1$  (Fig. 3.33).



Fig. 3.33. Ideal switch (left); CMOS inverter switch (right)

The resulting output waveform of the ideal switch is shown in Fig. 3.34 (left), where $in_1$ is an ideal square wave. However, as described in Fig. 3.25, the input signal $in_1$ is generated in the digital interconnection network and is therefore subject to system clock non-idealities. Hence, the output of the CMOS inverter can no longer be represented as an ideal waveform and is determined by parameters such as rise/fall time ($t_r$, $t_f$), low-to-high/high-to-low delay ($t_{PLH}$, $t_{PHL}$) and jitter ($t_j$).

The rise time  $t_r$  is defined as the time required for the output to swing from 10% of  $V_{DD}$  to 90% of  $V_{DD}$ . Similarly,  $t_f$  is defined as the time required for the output to swing from 90% of  $V_{DD}$  to 10% of  $V_{DD}$ . The low-to-high delay  $t_{PLH}$  is defined as the difference between the time instants when both the input and the output cross  $V_{DD}/2$  for an output transition low-to-high (LH), whereas the high-to-low delay  $t_{PHL}$  is defined similarly for an output transition high-to-low (HL) [Razavi06].



Fig. 3.34. Input / Output waveforms: Ideal switch (left); CMOS inverter switch (right)

These four parameters can be visualized in Fig. 3.34 (right) when using a CMOS inverter switch. Note that the amplitude is normalized to  $V_{DD}$  and time is normalized for one sampling period  $T_s = 1/(4*f_c)$ .

Moreover, jitter $t_j$ represents the random variation of the arrival time of a signal with respect to its ideal arrival time. Hence, clock signals that are subject to jitter cause the expected signals to arrive early at one time instant and late at another. As shown in [Smilkstein07], jitter can be expressed as a peak-to-peak value $t_{j,pk-pk}$ or as an rms value $t_{j,rms}$ (Fig. 3.35), and its distribution is assumed to be normal.



Fig. 3.35. Jitter with Gaussian distribution [Smilkstein07]

We can visualize in Fig. 3.36 the jitter effect over the output signal  $out_1$  in the case of a CMOS inverter switch. Thus, jitter can be modelled like in the case of a low-to-high/high-to-low delay, with the exception that jitter also varies from cycle to cycle.



Fig. 3.36. Input / Output waveforms of CMOS inverter switch: Jitter effect

#### 3.4.1.1. Power spectral density estimation

The power spectral density (PSD) estimation is based on the Fast Fourier Transform (FFT) algorithm for computing the Fourier Transform of a length-*N* discrete-time sequence x(n) at *N* frequency points (called *FFT bins*), where  $f = \{0, 1, ..., (N-1)\}/N$  [Schreier05]

$$X(f) = \sum_{n=0}^{N-1} x(n) \cdot e^{-j2\pi f n}$$
(3.42)

Moreover, [Schreier05] suggests using an additional window function, such as the Hann window which provides very large high-frequency attenuation and sufficient protection against noise leakage. The PSD estimate is

$$S_{x}(f) = \frac{\left|\sum_{n=0}^{N-1} w(n) \cdot x(n) \cdot e^{-j2\pi f n}\right|^{2}}{\left\|w\right\|_{2}^{2}}$$
(3.43)

where w(n) is the Hann window function given in Eq. (3.14) and

$$\|w\|_{2}^{2} = \sum_{n=0}^{N-1} |w(n)|^{2}$$
(3.44)

is the energy of the window and represents the scaling factor of the FFT for calibrated noise density. A detailed description of the PSD estimation can be found in [Schreier05].
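The estimate in (3.43) and (3.44) can be illustrated with a short script. This is only a sketch under stated assumptions: the Hann window is taken as $w(n) = 0.5 \cdot (1 - \cos(2\pi n/N))$, since Eq. (3.14) is not reproduced in this section, and a direct DFT is used instead of an FFT to keep the example dependency-free. The function names are hypothetical.

```python
import cmath
import math

def hann(n_points: int) -> list:
    """Hann window; the exact form of Eq. (3.14) is assumed here."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_points))
            for n in range(n_points)]

def psd_estimate(x: list) -> list:
    """Windowed PSD estimate of Eq. (3.43) at the N bins f = k/N."""
    n_points = len(x)
    w = hann(n_points)
    window_energy = sum(v * v for v in w)          # Eq. (3.44)
    psd = []
    for k in range(n_points):
        f = k / n_points
        acc = sum(w[n] * x[n] * cmath.exp(-2j * math.pi * f * n)
                  for n in range(n_points))
        psd.append(abs(acc) ** 2 / window_energy)
    return psd

# Usage: a sine wave placed exactly on bin 8 of a 64-point record
N = 64
x = [math.cos(2.0 * math.pi * 8 * n / N) for n in range(N)]
psd = psd_estimate(x)
peak_bin = max(range(N // 2), key=lambda k: psd[k])
print(peak_bin)
```

For long records, the inner sum would normally be replaced by an FFT; the scaling by the window energy is unchanged.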

In our proposed FIR-PA, x(n) represents a discrete value of the output of the ideal switch or CMOS inverter, $out_1(n)$, sampled at $f_s = 1/T_s = 4*f_c$. If we consider the output waveform of the ideal switch (Fig. 3.34 left), then the discrete value $x(n) = out_1(n)$ will take an ideal value of either 0 or 1, depending on the switching activity of the DSM.

However, the CMOS inverter (Fig. 3.34 right) presents physical non-idealities, and the output value during a complete sampling period  $T_s$  is no longer ideally equal to 0 or 1. Hence, this signal should be sampled at a much higher frequency than  $f_s$  ( $K_s*f_s$ ), in order to acquire enough samples, needed to obtain the discrete value of the output signal between [0; 1]. Once the values have been acquired, the signal needs to be decimated in order to return to the initial sampling frequency,  $f_s$ .

In order to decimate the signal, we may first down-sample the signal by the factor  $K_s$ , namely keep only every  $K_s$ <sup>th</sup> sample. Still, this would bring us back to the initial non-representative case. Hence, we propose to decimate the signal through successive capture averaging, a technique which is employed by most digital sampling oscilloscopes (DSO) [Bishop10]. This way,  $K_s$  input samples at  $K_s*f_s$  are averaged together in order to obtain one output sample at  $f_s$  (Fig. 3.37 for  $K_s = 3$ ).

At this point, we mention that the proposed sampling at $K_s*f_s$ and decimation through successive capture averaging at $f_s$ are only performed in simulation in order to be able to estimate the PSD using the FFT. In conclusion, in the case of the CMOS inverter with non-idealities, the discrete value of $x(n) = out_1(n)$ at $f_s$ can be approximated by the mean value of the continuous signal $x(t) = out_1(t)$.



Fig. 3.37. Successive capture averaging [Bishop10]

This is further equivalent to the signed area in the *xy*-plane bounded by the graph of  $out_1(t)$  (Fig. 3.34).

$$x(n) = mean(x(t)) = mean(out_1(t)) = \int_{0}^{T_s} out_1(t) dt$$
(3.45)
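The decimation through successive capture averaging can be sketched as follows. This is an illustrative sketch only: the function name and the sample values are hypothetical, and the input is assumed to be the inverter output sampled at $K_s*f_s$ with amplitude normalized to $V_{DD}$.

```python
def decimate_by_averaging(samples: list, ks: int) -> list:
    """Average each group of ks consecutive samples into one output sample.

    The input is assumed to be sampled at Ks*fs; the output is at fs and
    approximates the mean value of the waveform over each period Ts.
    """
    if ks <= 0 or len(samples) % ks != 0:
        raise ValueError("the record length must be a multiple of ks")
    return [sum(samples[i:i + ks]) / ks
            for i in range(0, len(samples), ks)]

# Usage with Ks = 3 (as in Fig. 3.37): a rising edge followed by a settled
# period; plain down-sampling would keep a single sample per period instead.
oversampled = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
print(decimate_by_averaging(oversampled, 3))
```

Unlike keeping only every $K_s$-th sample, the averaged value reflects the finite transition inside the period.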

#### 3.4.1.2. Model of the rise/fall time

Let us consider the transient outputs of an ideal switch and a CMOS inverter with variable non-zero rise and fall times (Fig. 3.38). When  $x(n)_{ideal} = 1$ , we obtain in the non-ideal case  $x(n)_{NI}$ 

$$x(n)_{NI} = \int_{0}^{T_s} out_1(t) dt = 1 - A_2 - A_4$$
(3.46)

and when  $x(n)_{ideal} = 0$ , we obtain in the non-ideal case  $x(n)_{NI}$ 

$$x(n)_{NI} = \int_{0}^{T_{s}} out_{1}(t) dt = 0 + A_{1} + A_{3}$$
(3.47)

Taking into consideration that the rise and fall times may vary from one path to another we may write the general formulas for  $A_1$ ,  $A_2$ ,  $A_3$ , and  $A_4$ 



Fig. 3.38. Ideal switch (blue) and CMOS inverter (green and red) with variable  $t_r$  and  $t_f$ 

$$\begin{cases} A_{1} = \frac{1}{8} \cdot \frac{t_{rx}^{2}}{(t_{rx} + \Delta t_{r})} \\ A_{2} = \frac{1}{8} \cdot \frac{(t_{rx} + 2 \cdot \Delta t_{r})^{2}}{(t_{rx} + \Delta t_{r})} \\ A_{3} = \frac{1}{8} \cdot \frac{t_{fx}^{2}}{(t_{fx} + \Delta t_{f})} \\ A_{4} = \frac{1}{8} \cdot \frac{(t_{fx} + 2 \cdot \Delta t_{f})^{2}}{(t_{fx} + \Delta t_{f})} \end{cases}$$
(3.48)

where  $t_{rx} = t_r / 0.8$ ,  $t_{fx} = t_f / 0.8$ ,  $\Delta t_r$  represents the variation (red curve Fig. 3.38) of the rise time specific to each path with respect to an initial value  $t_r$  (green curve Fig. 3.38), and  $\Delta t_f$  the variation of the fall time with respect to an initial value  $t_f$ , respectively.
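As a quick sanity check of (3.48), offered here as an illustration under the assumption of a linear transition between the supply rails: since $t_r$ is measured between 10% and 90% of $V_{DD}$, it covers 80% of the swing, so a full linear transition lasts $t_{rx} = t_r / 0.8$ (and similarly for $t_{fx}$). Without path-to-path variation ($\Delta t_r = \Delta t_f = 0$), the areas reduce to

$$A_1 = A_2 = \frac{t_{rx}}{8}, \qquad A_3 = A_4 = \frac{t_{fx}}{8}$$

so that (3.46) and (3.47) give $x(n)_{NI} = 1 - (t_{rx} + t_{fx})/8$ and $x(n)_{NI} = (t_{rx} + t_{fx})/8$, respectively, with time normalized to $T_s$. For a hypothetical $t_r = t_f = 0.08 \cdot T_s$, this yields $t_{rx} = t_{fx} = 0.1$ and non-ideal values of 0.975 and 0.025 instead of 1 and 0.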

#### 3.4.1.3. Model of the low-to-high/high-to-low delay

Let us consider the outputs of an ideal switch and a CMOS inverter switch with fixed non-zero rise and fall times and variable LH/HL delays (Fig. 3.39). The discrete non-ideal output is calculated using (3.46) and (3.47), and the areas $A_{1-4}$ using (3.49).

$$\begin{cases}
A_{1} = \frac{1}{8} \cdot \frac{\left(t_{rx} - 2 \cdot \Delta t_{PLH}\right)^{2}}{t_{rx}} \\
A_{2} = \frac{1}{8} \cdot \frac{\left(t_{rx} + 2 \cdot \Delta t_{PLH}\right)^{2}}{t_{rx}} \\
A_{3} = \frac{1}{8} \cdot \frac{\left(t_{fx} - 2 \cdot \Delta t_{PHL}\right)^{2}}{t_{fx}} \\
A_{4} = \frac{1}{8} \cdot \frac{\left(t_{fx} + 2 \cdot \Delta t_{PHL}\right)^{2}}{t_{fx}}
\end{cases}$$
(3.49)

where  $t_{rx} = t_r / 0.8$ ,  $t_{fx} = t_f / 0.8$ ,  $\Delta t_{PLH}$  represents the variation (red curve Fig. 3.39) of the LH delay specific to each path with respect to a zero initial value (green curve Fig. 3.39), and  $\Delta t_{PHL}$  the variation of the HL delay with respect to a zero initial value, respectively.



Fig. 3.39. Ideal switch (blue) and CMOS inverter (green and red) with variable  $t_{PLH}$  and  $t_{PHL}$ 

#### 3.4.1.4. Combined model of rise/fall time, LH/HL delay and jitter

Let us consider the case of a CMOS inverter with variable non-zero rise and fall times and variable LH/HL delays (Fig. 3.40). In this case, we wish to combine both effects in order to provide a complete model of non-idealities. Therefore, we still use (3.46) and (3.47) to find the discrete non-ideal value of x(n), where each area can be calculated using (3.50).



Fig. 3.40. Output waveform: ideal switch and CMOS inverter (combined non-ideal effects)

Finally, as mentioned before, jitter can be also considered as a variable delay, so we can now describe the complete model for the switching non-idealities. This model is derived in Eq. (3.51) and was used in simulations in order to visualize the effect of the non-idealities over the output spectrum.

$$\begin{cases} A_{1} = \frac{1}{8} \cdot \frac{\left(t_{rx} - 2 \cdot \left(\Delta t_{PLH} + t_{j,var}\right)\right)^{2}}{t_{rx} + \Delta t_{r}} \\ A_{2} = \frac{1}{8} \cdot \frac{\left(t_{rx} + 2 \cdot \left(\Delta t_{PLH} + t_{j,var}\right) + 2 \cdot \Delta t_{r}\right)^{2}}{t_{rx} + \Delta t_{r}} \\ A_{3} = \frac{1}{8} \cdot \frac{\left(t_{fx} - 2 \cdot \left(\Delta t_{PHL} + t_{j,var}\right)\right)^{2}}{t_{fx} + \Delta t_{f}} \\ A_{4} = \frac{1}{8} \cdot \frac{\left(t_{fx} + 2 \cdot \left(\Delta t_{PHL} + t_{j,var}\right) + 2 \cdot \Delta t_{f}\right)^{2}}{t_{fx} + \Delta t_{f}} \end{cases}$$
(3.51)
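The complete model of (3.51), combined with (3.46) and (3.47), can be written as a short routine. The sketch below is illustrative rather than the simulation code used in this work: the function and parameter names are hypothetical, all times are normalized to $T_s$, and the numerical values in the usage example are arbitrary.

```python
import random

def areas(t_r, t_f, d_tr=0.0, d_tf=0.0, d_plh=0.0, d_phl=0.0, t_j=0.0):
    """Areas A1..A4 of Eq. (3.51); all times normalized to Ts."""
    t_rx = t_r / 0.8                 # 10-90% rise time to full-swing ramp
    t_fx = t_f / 0.8
    lh = d_plh + t_j                 # jitter acts as an additional delay
    hl = d_phl + t_j
    a1 = (t_rx - 2 * lh) ** 2 / (8 * (t_rx + d_tr))
    a2 = (t_rx + 2 * lh + 2 * d_tr) ** 2 / (8 * (t_rx + d_tr))
    a3 = (t_fx - 2 * hl) ** 2 / (8 * (t_fx + d_tf))
    a4 = (t_fx + 2 * hl + 2 * d_tf) ** 2 / (8 * (t_fx + d_tf))
    return a1, a2, a3, a4

def x_nonideal(x_ideal, a):
    """Eq. (3.46) for an ideal 1, Eq. (3.47) for an ideal 0."""
    a1, a2, a3, a4 = a
    return 1.0 - a2 - a4 if x_ideal == 1 else a1 + a3

# Usage: fixed per-path slope and delay deviations, jitter redrawn each cycle
rng = random.Random(0)
bits = [1, 0, 1, 1, 0]
out = []
for b in bits:
    t_j = rng.gauss(0.0, 0.01)       # hypothetical jitter, 1% of Ts (rms)
    out.append(x_nonideal(b, areas(0.08, 0.08, d_tr=0.005, d_tf=0.005,
                                   d_plh=0.002, d_phl=0.002, t_j=t_j)))
print(out)
```

Setting the delay and jitter terms to zero recovers (3.48), and setting the slope deviations and jitter to zero recovers (3.49).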

#### 3.4.1.5. Results

In the following sub-section we study the influence of the aforementioned parameters over the resulting spectrum, in order to derive the limit values for which the performance of the system starts to degrade (fail to meet OOB mask). We note that the study is based on a hypothetical random variation of parameters, whereas in reality, this variation depends on the supply voltage and signal generation equipment.

As a methodology, all parameters are specific to each path and normally distributed, and  $\Delta t_r$ ,  $\Delta t_f$ ,  $\Delta t_{PLH}$  and  $\Delta t_{PHL}$  are constant throughout the simulation, whereas  $t_{j,var}$  changes from cycle to cycle. The first simulation shows the resulting output spectrum when varying only the rise and fall times ( $\Delta t_r$  and  $\Delta t_f$ ) against the ideal output spectrum (Fig. 3.41).



Fig. 3.41. Output spectrum: ideal (black); variable  $t_r$  and  $t_f$  (blue)

This confirms the statement in [Frappe07] that a differential architecture is not affected by ISI in the case of non-equal rise and fall times. However, slight differences may still be observed, due to variable rise and fall times from one signal path to another.

Furthermore, Fig. 3.42 highlights the effect of variable delays ( $\Delta t_{PLH}$  and  $\Delta t_{PHL}$ ) over the filtering function. As we may notice, the noise level is higher than the one in the ideal case especially near the signal-band.



Fig. 3.42. Output spectrum: ideal (black); variable  $t_{PLH}$  and  $t_{PHL}$  (blue)

In this case, it is the degradation of the signal-band noise performances which determines the limit value of the standard deviation of the propagation delay  $t_{PLH}$  (or  $t_{PHL}$ ), corresponding to  $\sigma_{PLH/PHL} \approx 0.1 \cdot t_{PLH}$ .

Next, we depict in Fig. 3.43 the effect of jitter over the filtering function. We remark that, with respect to the other parameters, jitter determines an increased noise floor, especially near the signal-band, where the FIR filter zeros are less visible. This is due to the fact that the deviation mostly affects the largest coefficients of the FIR filter and that near the signal-band, the zeros of the FIR function are highly dependent on the paths comprising the largest coefficients.

Still, far out-of-band, the output spectrum meets the OOB mask with a safe margin. Here, the limit value of the standard deviation of variable jitter is found to be  $\sigma_{j,var} \approx 1$  % of the main clock period, which corresponds to  $\sigma_{j,var} \approx 1$  ps, for an operating frequency of around 10 GHz.
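As a worked check of these figures, assuming that the main clock is the sampling clock at about $4 \cdot f_c$, with $f_c = 2.4$ GHz taken from Table 3.10:

$$T_s = \frac{1}{4 \cdot f_c} = \frac{1}{9.6\ \text{GHz}} \approx 104\ \text{ps}, \qquad \sigma_{j,var} \approx 0.01 \cdot T_s \approx 1\ \text{ps}$$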



Fig. 3.43. Output spectrum: ideal (black); variable jitter $t_{j,var}$ (blue)

Finally, we have simulated all the non-ideal effects together with the corresponding limit values (Fig. 3.44) and we remark that the resulting output spectrum is very similar to the one in Fig. 3.43. Thus, we can conclude that out of the three switch non-idealities under study, jitter has the largest influence over the FIR-PA output performance.

However, we find it interesting that when all the non-idealities are simulated together, the noise level near the signal-band is lower than in the case when only the effect of jitter was studied. Since the variable delay and jitter have basically the same effect on the output value, we may assume that there is a slight cancellation effect when studying these effects together, which in return determines a slightly lower deviation from the ideal output value.



Fig. 3.44. Output spectrum: ideal (black); non-ideal effects (blue)

#### 3.4.2. Non-ideal FIR filter coefficients

In our proposed FIR-PA architecture, each filter coefficient is obtained through the ratio between the capacitance on each path and the total capacitance of the paths. Until now, the filter coefficients have been assumed to be ideal, namely each coefficient i quantized to 5 bits can be set to an integer value  $c_i$  between [1; 32].

Furthermore, instead of designing one cell per each coefficient with the corresponding  $c_i$ , we choose to design a unitary power cell which is then multiplied by  $c_i$ . Even so, depending on the technology and capacitor choice, the value of the capacitance of each unitary cell may still vary from one to another.

Hence, it is important to study the influence of non-ideal coefficients on the filtering function, in order to determine the tolerance of the proposed architecture to the physical variation of unitary capacitances ( $\Delta C_i$ ).

#### 3.4.2.1. Estimation

The proposed test has the following characteristics:

- for each unitary coefficient, we compute two vectors (one for the positive and one for the negative side of the FIR-PA) of variable coefficients $c_{var,a}$ and $c_{var,b}$ (of length equal to the total number of unitary coefficients) following a normal distribution with mean $m_c = 0$ and standard deviation $\sigma_c$ in the range [0.01; 0.2];
- there are 35 random realizations of the FIR filter for each $\sigma_c$;
- for each realization of the FIR filter, each coefficient is computed as the number of ideal unitary coefficients plus the corresponding variable coefficient values taken randomly from $c_{var,a}$ or $c_{var,b}$.
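The test procedure listed above can be sketched as a small Monte Carlo routine. This is a simplified illustration and not the simulation code of this work: each unit cell is assumed to receive its own independent deviation, the coefficient list is a hypothetical example (not the designed filter), and all names are assumptions.

```python
import random

def realize_coefficients(ideal_units, sigma_c, rng):
    """One realization: every unit cell has capacitance 1 + N(0, sigma_c)."""
    return [sum(1.0 + rng.gauss(0.0, sigma_c) for _ in range(c))
            for c in ideal_units]

def monte_carlo(ideal_units, sigma_c, runs=35, seed=0):
    """Return `runs` pairs (positive side, negative side) of coefficients."""
    rng = random.Random(seed)
    return [(realize_coefficients(ideal_units, sigma_c, rng),
             realize_coefficients(ideal_units, sigma_c, rng))
            for _ in range(runs)]

# Hypothetical 5-bit coefficients in [1; 32], in numbers of unit cells
IDEAL = [1, 3, 8, 17, 32]

realizations = monte_carlo(IDEAL, sigma_c=0.1)
worst = max(abs(real - ideal)
            for pos, neg in realizations
            for side in (pos, neg)
            for real, ideal in zip(side, IDEAL))
print(f"largest coefficient deviation over 35 runs: {worst:.2f} units")
```

Each realization would then be fed to the PSD estimation in order to compare the resulting output spectrum against the transmit and OOB masks.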

It is seen that for $\sigma_c = 0.05$, which translates to a peak deviation of around ±15% of one unitary capacitance, the output spectrum of the designed FIR function meets the out-of-band mask (Fig. 3.45). This tolerance is achieved thanks to i) the large number of coefficients, which provide excellent noise rejection (in the ideal case, a 109-coefficient FIR filter implemented with half the complexity, i.e., 55 coefficients), and ii) the low coefficient quantization to 5 bits, which is already subject to coefficient rounding errors.





However, increasing  $\sigma_c$  up to 0.1 (peak deviation of around ±30%) we notice that the noise level performances near the signal-band as well as out-of-band are slightly degraded (Fig. 3.46), namely 30 realizations out of 35 do not meet the transmit mask near the signal-band, and 6 realizations out of 35 do not meet the OOB mask.



Fig. 3.46. Output spectrum: non-ideal coefficients  $\sigma_c = 0.1$  (35 random realizations)

#### 3.4.2.2. Comparison with non-ideal high-resolution DAC

In this section, we will compare the Half-SC structure with a DAC architecture, assuming in both cases non-ideal coefficients with a standard deviation  $\sigma_c = 0.1$  (peak deviation of around ±30%). Previously, we considered high-resolution DAC architectures with pure thermometer encoding (Section 3.3.3.2.3), even though it is not used in practice, due to the increased complexity, area, interconnect parasitics and power consumption.

This justifies the introduction of segmented approaches in order to trade-off complexity with output performances (avoid non-monotonic behavior and mid-code transition glitches). Such an architecture is proposed in [Alavi14] and uses 8 bits thermometer encoding for the most significant bits (MSB) and 4 bits binary encoding for the least significant bits (LSB) in a 13-bit DAC (1 sign bit), resulting in 256 MSB and 16 LSB units. We further recall that the architecture in [Alavi14] is based on a 300 MHz signal processing which creates image replicas due to the ZOH operation.

In the present comparison, the DAC has the same ratio of number of bits with thermometer and binary encoding as in [Alavi14], namely 2/3 of total number of bits for thermometer, and 1/3 for binary, respectively. The simulations are performed for {11, 12}-bit segmented DACs with {7, 8} bits thermometer encoding and 4 bits binary encoding, considering 35 random realizations for the coefficient mismatch (Fig. 3.47). Furthermore, the Half-SC FIR-PA considers one of the 35 filter realizations in Fig. 3.46.





Consequently, compared to high-resolution segmented DAC architectures, the proposed structure with non-ideal coefficients presents i) a complexity comparable to a 10-bit DAC, with fewer control signals (55 for Half-SC, and 68 for segmented 10-bit DAC), and ii) noise performances better overall than 11-bit and close to 12-bit DACs near the signal-band and in critical bands (e.g., the GPS band), thus enabling multi-standard coexistence.
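The control-signal counts quoted in this comparison can be reproduced by simple counting. The sketch below assumes one control line per thermometer unit ($2^t$ lines for $t$ thermometer bits, following the 256 MSB units quoted for 8 bits) plus one line per binary bit, and a 6 + 4 bit split for the 10-bit case by analogy with the {7, 8} + 4 splits above. This convention is inferred from the quoted numbers and is not stated explicitly; the names are hypothetical.

```python
def segmented_dac_control_signals(thermo_bits: int, binary_bits: int) -> int:
    """Assumed count: 2**t thermometer lines plus one line per binary bit."""
    return 2 ** thermo_bits + binary_bits

# Half-SC FIR-PA: 109 symmetric taps driven by 109/2 + 1 control signals
half_sc_signals = 109 // 2 + 1

segmented_10bit = segmented_dac_control_signals(6, 4)
thermometer_10bit = segmented_dac_control_signals(10, 0)

print(half_sc_signals, segmented_10bit)
print(f"pure thermometer / Half-SC = {thermometer_10bit / half_sc_signals:.1f}")
```

Under this convention, the ratio between a pure thermometer 10-bit DAC and the Half-SC scheme is between 18 and 19, in line with the factor quoted in Section 3.3.3.2.3.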

#### 3.4.2.3. Compensation

Previously, we presented the effect of non-ideal coefficients over the output spectrum and derived the limit standard deviation of each unitary power cell, $\sigma_c \leq 0.1$. Simulations show, however, that the absolute maximum deviation of a coefficient with respect to its ideal value is less than 2 units when $\sigma_c \leq 0.1$ (which corresponds to a peak deviation of ±30% per unitary cell). At first sight, for the maximum coefficient 32, ±30% of 32 would mean a deviation close to ±10 units. The explanation is found in the way the physical coefficient is implemented: the maximum coefficient is divided into 32 unitary cells, each subject to its own deviation, some more than others. In the end, the individual deviations partially compensate each other among cells, so the total deviation of the coefficient is in reality much smaller than 30% of 32.
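This averaging can be quantified under the assumption that the unit-cell deviations are independent and identically distributed with standard deviation $\sigma_c$ (in units of one unitary capacitance). The total deviation of a coefficient built from $c_i$ unit cells then has a standard deviation that grows only as $\sqrt{c_i}$:

$$\sigma_{c_i} = \sqrt{c_i} \cdot \sigma_c, \qquad c_i = 32,\ \sigma_c = 0.1 \ \Rightarrow \ \sigma_{c_i} \approx 0.57, \quad 3 \cdot \sigma_{c_i} \approx 1.7 < 2$$

This is consistent with the simulated maximum deviation of less than 2 units, whereas fully correlated cell deviations would give ±30% of 32, i.e., about ±9.6 units.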

Consequently, if the maximum deviation of a coefficient is less than 2, it could be further reduced using the ±1, ±2 cells feature (introduced in Section 3.3.2.2), which ensures filtering configurability. In order to test this assumption, each new coefficient is computed as the difference between the non-ideal coefficient and the rounded value of the deviation from the ideal coefficient. For example, if the ideal coefficient is  $c_{id} = 32$ , and the non-ideal coefficient is  $c_{nid} = 30.8$ , the new coefficient becomes  $c_{new} = 30.8 + 1 = 31.8$ , meaning that the proposed mechanism does not completely compensate the coefficient deviation, but it will reduce it enough to limit the performance degradation. We note that this mechanism can be applied only to coefficients larger than 2, whereas for the unitary coefficients (single unitary cells) a peak deviation of 30% is rounded to zero (no action taken).
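The compensation rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the correction is limited to ±2 cells to reflect the ±1, ±2 cells feature, the added or removed cells are treated as ideal, and the function name is hypothetical.

```python
def compensate(c_ideal: int, c_nonideal: float) -> float:
    """Subtract the rounded deviation from the non-ideal coefficient.

    Assumptions: at most 2 correction cells are available, correction
    cells are ideal, and coefficients of 2 or less are left untouched.
    """
    if c_ideal <= 2:
        return c_nonideal
    deviation = c_nonideal - c_ideal
    correction = max(-2, min(2, round(deviation)))
    return c_nonideal - correction

# Example from the text: c_id = 32, c_nid = 30.8 gives c_new = 31.8
print(compensate(32, 30.8))
```

Deviations smaller than half a unit are rounded to zero and therefore left uncorrected, which is why the mechanism reduces the deviation without removing it completely.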

Finally, we plot the output spectrum of the compensated FIR filter, when $\sigma_c = 0.1$ (Fig. 3.48). The noise performance is highly improved with respect to the non-compensated filter. We recall that before there were 30 filter realizations out of 35 that did not meet the transmit mask near the signal-band and 6 out of 35 that did not meet the OOB mask around GPS. Now, there are only 4 out of 35 that do not meet the mask near the signal-band and 1 out of 35 for GPS, respectively. This proves an additional advantage of the $\pm 1$, $\pm 2$ cells feature, namely that it allows the compensation of coefficient non-idealities.



Fig. 3.48. Output spectrum: Compensated coefficients  $\sigma_c = 0.1$  (35 random realizations)

#### 3.5. Voltage supply variation

We showed previously that the output power depends on the supply voltage  $V_{DD}$ , which is considered ideal (fixed value). However, in reality the supply voltage is subject to variations that are either external due to its generation (equipment or IC voltage regulators), or internal due to power supply noise (*IR* drop and *Ldi/dt* effects [Meng06]) specific to the integration on chip. Furthermore, these variations are mainly dependent on the load current and become more important in applications involving high-frequency switching, such as the proposed switched-capacitor architecture.

Therefore, we propose to study the effect of the cycle-to-cycle $V_{DD}$ variation ($\Delta V_{DD}$) under two conditions: at each time period, $\Delta V_{DD}$ is computed either from a normal distribution or as a function of the switching current. In the first case, we define a normal distribution with zero mean and standard deviation $\sigma_{vdd}$, and for each value of $\sigma_{vdd}$ we perform 35 simulations.

When $\sigma_{vdd} = 3\% \cdot V_{DD}$ (± 30 mV when $V_{DD} = 1$ V), we observe that the noise specifications are successfully met overall, while still observing an attenuation of the FIR function zeros near-band (Fig. 3.49).



Fig. 3.49. Output spectrum: Non-ideal supply voltage  $\sigma_{vdd} = 3\%$  (35 random realizations)

However, for larger deviations, up to $\sigma_{vdd} = 10\% \cdot V_{DD}$, the near-band performance is slightly degraded, but at the same time the out-of-band noise specifications are still met, thanks to the noise rejection margin taken for the design of the FIR filter (Fig. 3.50).





In the second case, we compute the switching current  $I_{sc}$  at each time period, and we define  $\Delta V_{DD} = k_r \cdot I_{sc}$  ( $k_r$  has the physical significance of a resistance). Simulation results for different values of  $k_r$  ( $k_r = 0,...,10$ ) are presented in Fig. 3.51 and show that the OOB mask is met overall even for a critical value of  $\Delta V_{DD} = 10 \cdot I_{sc}$ , thus proving the robustness of the proposed architecture.

Finally, we note that an increase of $k_r$ ( $k_r > 3$ ) leads to a gradual degradation of the near-band noise performance. This agrees with the previous studies of non-idealities, which show that the near-band is the most vulnerable to physical non-idealities (switching, coefficients, supply variation).



Fig. 3.51. Output spectrum: Effect of switching current over the non-ideal supply voltage
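The two supply-variation models used in this section can be sketched as follows. This is a minimal illustration only: the function names, the random seed and the example current values are hypothetical, and the switching current $I_{sc}$ is taken here as a given per-cycle list rather than computed from the FIR-PA model.

```python
import random
import statistics

VDD = 1.0  # nominal supply voltage in volts, as in the text


def delta_vdd_normal(n_cycles, sigma_rel, seed=0):
    """Cycle-to-cycle supply deviation drawn from N(0, sigma_rel * VDD)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_rel * VDD) for _ in range(n_cycles)]


def delta_vdd_current(i_sc, k_r):
    """Supply deviation proportional to the switching current: k_r * I_sc."""
    return [k_r * i for i in i_sc]


# First case: sigma_vdd = 3 % of VDD, i.e. 30 mV for VDD = 1 V.
dv_normal = delta_vdd_normal(n_cycles=10_000, sigma_rel=0.03)
sigma_est = statistics.pstdev(dv_normal)

# Second case: hypothetical per-cycle switching currents (arbitrary units).
i_sc_example = [0.0, 0.01, 0.02, 0.005]
dv_current = delta_vdd_current(i_sc_example, k_r=3)
```

In a full simulation, each entry of these lists would scale the supply seen by the power cells during the corresponding sampling period.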

# 3.6. Conclusion

The system design of the proposed all-digital transmitter based on DSM and FIR-PA has been covered in this chapter. The methodology focused on a co-design of the constitutive blocks, which enabled the optimization of the design in order to fit the specifications of the 802.11 standard, by using the advantages of one block to compensate the disadvantages of other blocks, namely:

- The low resolution of the DSM (single-bit) is extremely important for the operation of the proposed Half-SC FIR-PA, because the output levels are constant, and the available cells can be used to implement a FIR function; furthermore, it simplifies the FIR-PA architecture, which can be implemented with simple switches and capacitors; moreover, the use of time-interleaving enables a lower operating frequency in the digital circuits, thus allowing the integration of extended configurability with reduced timing constraints;
- In order to compensate the out-of-band noise generated by the DSM, the FIR filter requires 109 coefficients quantized to 8 bits; however, if the attenuation of the output RLC band-pass filter is taken into consideration, the coefficient resolution can be reduced down to 5 bits; in addition, by introducing extended digital configurability, half of the coefficients can be eliminated to obtain the Half-SC operation, which allows a 16 times complexity reduction for the same performance (55 coefficients on 5 bits vs. 109 coefficients on 8 bits).
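The factor of 16 quoted above can be checked with a short calculation. The complexity metric used here (number of coefficients multiplied by $2^{bits}$, i.e. an upper bound on the number of unit cells) is an assumption made for this sketch; the function name `fir_complexity` is hypothetical.

```python
def fir_complexity(n_coeffs, n_bits):
    """Assumed metric: each coefficient may need up to 2**n_bits unit cells."""
    return n_coeffs * 2 ** n_bits


full = fir_complexity(109, 8)   # 109 coefficients on 8 bits
half = fir_complexity(55, 5)    # 55 coefficients on 5 bits (Half-SC)
ratio = full / half             # close to 16
```

Under this assumption, a factor of 8 comes from the 3-bit resolution reduction and a factor of about 2 from halving the number of coefficients.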

To summarize, the advantages of low resolution and low-speed operation of the time-interleaved DSM compensate the complexity of the FIR-PA, whereas the excellent noise rejection of the FIR-PA compensates the out-of-band noise generated by the DSM.

Each constitutive block was presented with advantages and disadvantages and I proposed several techniques in order to achieve the design specifications derived at the end of Chapter 2. These contributions include:

- The design of an 8-channel TI DSM based on a CIFB structure for high-speed operation in order to target WLAN signals with bandwidths up to 160 MHz;
- A configurable FIR filter with 5-bit quantization using an RLC matching network to meet the out-of-band noise specifications when transmitting WLAN signals at a center frequency $f_c \approx 2.4$ GHz;
- An improved Half-SC differential FIR-PA to achieve a peak output power of ~16 dBm at reduced power consumption and area.

Consequently, it is shown that the proposed structure has a maximum complexity of 907 unitary cells and 55 driving signals, and provides highly adapted noise suppression in the most critical bands (for example the GPS band around 1.575 GHz) for excellent multi-standard coexistence with nearby receivers.

Finally, the effects of real circuit non-idealities over the output spectrum were analyzed, in order to set the design specifications for the unitary switch cell (with respect to the sampling period $T_s = 1/(4 f_c) \approx 100$ ps), capacitance, and supply voltage variation tolerance:

- Switch cell:
  - $t_{rx} = 5$ ps, $t_{fx} = 5$ ps;
  - $\Delta t_r$, $\Delta t_f$, $\Delta t_{PLH}$, $\Delta t_{PHL}$ normally distributed, with mean $m_{PLH/PHL} = 0$ and $\sigma_{PLH/PHL} = 0.5$ ps;
  - $t_{j,var}$ normally distributed, with mean $m_{j,var} = 0$ and $\sigma_{j,var} = 1$ ps;
- Capacitance:
  - Total capacitance value $C_T \approx 2$ pF;
  - $\Delta C_i$ normally distributed, with mean $m_c = 0$ and $\sigma_c \le 0.05$;
  - We note that for $\sigma_c \leq 0.1$, we can compensate the non-ideal coefficient values through the configurable $\pm 1, \pm 2$ cells feature;
- Supply voltage:
  - Nominal $V_{DD} = 1$ V;
  - $\Delta V_{DD}$ normally distributed, with mean $m_{vdd} = 0$ and $\sigma_{vdd} \le 0.03 \cdot V_{DD}$;
  - $\Delta V_{DD}$ as a function of the switching current, $\Delta V_{DD} = k_r \cdot I_{sc}$:
    - OOB mask met overall;
    - degradation of the near-band noise performance for $k_r \ge 3$.
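To put the timing specifications above in perspective, they can be expressed relative to the sampling period. The sketch below assumes $f_c = 2.4$ GHz exactly and uses hypothetical dictionary keys that simply mirror the symbols of the list.

```python
F_C = 2.4e9                  # carrier frequency in Hz (assumed exactly 2.4 GHz)
T_S = 1.0 / (4.0 * F_C)      # sampling period, about 104 ps

# Timing specifications from the list above, in picoseconds.
spec_ps = {
    "t_rx": 5.0,
    "t_fx": 5.0,
    "sigma_PLH_PHL": 0.5,
    "sigma_j_var": 1.0,
}

# Each specification as a fraction of the sampling period.
relative = {name: value * 1e-12 / T_S for name, value in spec_ps.items()}
```

With these values, the 5 ps figures correspond to roughly 5 % of $T_s$ and the jitter standard deviation to roughly 1 % of $T_s$.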

# 3.7. Chapter Bibliography

**[Alavi14]** M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, J. R. Long, "A Wideband 2x13-bit All-Digital I/Q RF-DAC," *IEEE Transactions On Microwave Theory And Techniques*, vol. 62, no. 4, pp. 732-752, Apr. 2014.

**[Bhide13a]** A. Bhide, O. E. Najari, B. Mesgarzadeh, A. Alvandpour, "Critical Path Analysis of Two-channel Interleaved Digital MASH  $\Delta\Sigma$  Modulators," *NORCHIP*, Vilnius, pp. 1-4, Nov. 2013.

**[Bhide13b]** A. Bhide, O. Najari, B. Mesgarzadeh, and A. Alvandpour, "An 8-GS/s 200-MHz Bandwidth 68-mW  $\Delta\Sigma$  DAC in 65-nm CMOS," *IEEE Trans. Circuits Syst. II: Express Briefs*, vol. 60, no. 7, pp. 387–391, July 2013.

**[Bhide15]** A. Bhide, A. Alvandpour, "An 11 GS/s 1.1 GHz Bandwidth Interleaved  $\Delta\Sigma$  DAC for 60 GHz Radio in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 10, pp. 2306-2318, Oct. 2015.

**[Bishop10]** C. Bishop, C. Kung, "Effects of averaging to reject unwanted signals in Digital Sampling Oscilloscopes," in *IEEE AUTOTESTCON*, Sept. 2010.

**[Chao90]** K. C. H. Chao, S. Nadeem, W. Lee and G. C. Sodini, "A higher order topology for interpolative modulators for oversampling A/D converters," *IEEE Trans. Circuits and Systems*, vol. 37, pp. 309-318, Mar. 1990.

**[Cipriani10]** E. Cipriani, P. Colantonio, F. Giannini and R. Giofre, "The Switched Mode Power Amplifiers" in "Advanced Microwave and Millimeter Wave Technologies Semiconductor Devices Circuits and Systems," Moumita Mukherjee (Ed.), ISBN: 978-953-307-031-5, InTech, 2010, (online): http://www.intechopen.com/books/advancedmicrowave-and-millimeter-wave-technologies-semiconductordevices-circuits-andsystems/the-switched-mode-power-amplifiers.

**[Ebrahimi11]** M. M. Ebrahimi, M. Helaoui, and F. M. Ghannouchi, "Time-Interleaved Delta-Sigma Modulator For Wideband Digital GHz Transmitters Design and SDR Applications," *J. Progress Electromagnetics Res. B*, vol. 34, pp. 263–281, July 2011.

**[Frappe07]** A. Frappé, "All-Digital RF signal generation using ΔΣ Modulation for Mobile Communication Terminals," PhD Thesis, University of Science and Technology Lille 1, Dec. 2007.

**[Frappe09]** A. Frappé, A. Flament, B. Stefanelli, A. Kaiser and A. Cathelin, "An All-Digital RF Signal Generator Using High-Speed  $\Delta\Sigma$  Modulators," *IEEE J. Solid-State Circuits*, vol. 44, no. 10, pp. 2722-2732, Oct. 2009.

**[Gebreyohannes16]** F. T. Gebreyohannes, A. Frappé, and A. Kaiser, "A Configurable Transmitter Architecture for IEEE 802.11ac and 802.11ad Standards," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 1, pp. 9-13, Jan. 2016.

**[Hatami14]** S. Hatami, M. Helaoui, F. M. Ghannouchi, M. Pedram, "Single-Bit Pseudoparallel Processing Low-Oversampling Delta–Sigma Modulator Suitable for SDR Wireless Transmitters," *IEEE Transactions on VLSI Systems*, vol. 22, no. 4, pp. 922-931, April 2014.

**[KP93]** R. Khoini-Poorfard, D. A. Johns, "Time-interleaved oversampling converters," *Electronics Letters*, pp. 1673-1674, Sept. 1993.

**[Kozak03]** M. Kozak, I. Kale, "Oversampled Delta-Sigma Modulators," Kluwer Academic Publishing, 2003.

**[Madoglio10]** P. Madoglio et al., "A 2.5-GHz, 6.9-mW, 45-nm-LP CMOS, ΔΣ Modulator Based on Standard Cell Design With Time-Interleaving," *IEEE J. Solid-State Circuits*, vol. 45, no. 7, pp. 1410-1420, July 2010.

**[Marin15]** R.-C. Marin, A. Frappé, A. Kaiser, and A. Cathelin, "Considerations for high-speed configurable-bandwidth time-interleaved digital delta-sigma modulators and synthesis in 28 nm UTBB FD-SOI," in *2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS)*, Grenoble, pp. 1–4, June 2015.

**[Marin16]** R.-C. Marin, A. Frappé, A. Kaiser, "Delta-Sigma Based Digital Transmitters with Low-Complexity Embedded-FIR Digital to RF Mixing," *23rd IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, Monte Carlo, pp. 237-240, Dec. 2016.

**[Marin17a]** R.-C. Marin, A. Frappé, A. Kaiser, "Considerations for Complex Digital Delta-Sigma Modulators for Standard Coexistence in Digital Wireless Transmitters," **accepted** to *IEEE Trans. Circuits Syst. I, Reg. Papers*, June 2017.

**[Matsuya87]** Y. Matsuya et al., "A 16-bit oversampling A-to-D conversion technology using triple integration noise shaping," *IEEE J. Solid-State Circuits*, vol. SC-22, pp. 921-929, Dec. 1987.

**[Meng06]** X. Meng, K. Arabi, and R. Saleh, "Novel Decoupling Capacitor Designs for sub-90nm CMOS Technology," *7th International Symposium on Quality Electronic Design (ISQED)*, March 2006.

**[Podsiadlik14]** T. Podsiadlik, R. Farrell, "Time-Interleaved  $\Delta\Sigma$  Modulators for FPGAs," *IEEE Trans. Circuits Syst. II: Express Briefs*, vol. 61, no. 10, pp. 808-812, Oct. 2014.

**[Razavi06]** B. Razavi, "Fundamentals of Microelectronics", Hoboken, NJ: Wiley, pp. 784-785, 2006.

**[Schmidt11]** M. Schmidt, S. Haug, M. Grözing, M. Beroth, "A pipelined 3-level bandpass delta-sigma modulator for class-S power amplifiers," in *2011 IEEE International Symposium on Circuits and Systems (ISCAS)*, Rio de Janeiro, pp. 2757-2760, May 2011.

**[Schreier05]** R. Schreier and G. C. Temes, "Understanding Delta-Sigma Data Converters," Hoboken, NJ, John Wiley & Sons, Inc., 2005.

**[Schreier16]** R. Schreier. (2016, April). "Delta-Sigma Toolbox," [Online]. Available: http://www.mathworks.fr.

**[Silva12]** N. V. Silva, A. S. R. Oliveira, U. Gustavsson, N. B. Carvalho, "A Novel All-Digital Multichannel Multimode RF Transmitter Using Delta-Sigma Modulation," *IEEE Microwave and Wireless Components Letters*, vol. 22, no. 3, pp. 156-158, March 2012.

**[Smilkstein07]** T. H. Smilkstein, "Jitter Reduction on High-Speed Clock Signals," PhD Thesis, Electrical Engineering and Computer Sciences, University of California at Berkeley, Aug. 2007.

**[Uchimura88]** K. Uchimura, T. Hayashi, T. Kimura, A. Iwata, "Oversampling A-to-D and D-to-A converters with multistage noise shaping modulators," *IEEE Trans. Acoust., Speech, Signal Processing*, vol. 36, pp. 1889-1905, Dec. 1988.

**[Ye13]** L. Ye, J. Chen, L. Kong, E. Alon, A. M. Niknejad, "Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation," *IEEE Journal Of Solid-State Circuits*, vol. 48, no. 12, pp. 3160-3177, Dec. 2013.

**[Yoo11]** S.-M. Yoo, J. S. Walling, E. C. Woo, D. J. Allstot, "A switched-capacitor power amplifier for EER/Polar transmitters," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 428–430, Feb. 2011.

**[Zheng13]** S. Zheng, H. C. Luong, "A CMOS WCDMA/WLAN Digital Polar Transmitter With AM Replica Feedback Linearization," *IEEE Journal Of Solid-State Circuits*, vol. 48, no. 7, pp. 1701-1709, July 2013.

# 4. All-digital transmitter circuit design

The circuit design of the proposed all-digital transmitter will be detailed in the present chapter, highlighting the implementation of the functionalities described previously for reduced area and power consumption.

First, we focus on the design of the switched capacitor power amplifier in three directions: i) the design of the switch as a basic CMOS inverter for reduced power consumption, ii) the use of the body-bias functionality offered by the 28 nm FD-SOI CMOS technology from STMicroelectronics to cancel the effects of switching non-idealities such as rise/fall time and low-to-high/high-to-low delay, and iii) the use of custom metal-oxide-metal (MOM) capacitors for minimal area.

Next, we present the digital circuit design of the signal generator block which includes the Delta-Sigma modulators, additional logic operations for configurability, the up-conversion digital mixer and signal drivers.

Finally, we state the main design contributions and identify possible improvements, which could be implemented in a second IC to improve the overall performance of the all-digital transmitter.

# 4.1. Transmitter IC description

This section presents an overview of the transmitter IC, including the block diagram, the IC structure and final layout view and introduces the main processing blocks which will be described in the following sections.

## 4.1.1. IC block diagram

The proposed all-digital transmitter chain (Fig. 4.1) comprises three main parts: the digital signal processing (DSP) unit, which is used to generate the digital inputs of the system, the transmitter IC, and the matching network, which is implemented jointly in the IC package and on the printed circuit board (PCB). This section focuses on the IC description, whereas the DSP and the matching network are described in Chapter 5.



Fig. 4.1. Proposed all-digital transmitter implementation

The IC includes a single-bit 8-channel TI DSM used to provide the input signals to the Half-SC differential FIR-PA. We note that the TI DSM was designed according to the estimation in Section 3.2.4 for the optimum number of time-interleaving channels.

The 8 single-bit outputs of the TI DSM at  $f_c/4$  (carrier frequency,  $f_c \approx 2.4$  GHz) are first passed through four 2:1 serializers in parallel (Serializer 8:4) in order to obtain 4 streams at  $f_c/2$ . Next, the FIR-PA SIGGEN block generates the additional logic needed to enable the Half-SC operation (as shown in Section 3.3.3.2). At the output of this block, we obtain all the signals for the digital up-conversion mixer, which works as an interconnection matrix.
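The 8:4 serialization step can be illustrated with a short behavioral sketch. The pairing of adjacent channels (0 with 1, 2 with 3, and so on) and the function names are assumptions made for the example; the actual channel pairing depends on the implementation of the signal generator block.

```python
def serialize_2to1(a, b):
    """Interleave two equal-rate bit streams into one stream at twice the rate."""
    out = []
    for bit_a, bit_b in zip(a, b):
        out.extend((bit_a, bit_b))
    return out


def serializer_8to4(streams):
    """Four 2:1 serializers in parallel (assumed pairing of adjacent channels)."""
    if len(streams) != 8:
        raise ValueError("expected 8 input streams")
    return [serialize_2to1(streams[i], streams[i + 1]) for i in range(0, 8, 2)]


# Aggregate bit rate is preserved: 8 streams at fc/4 equal 4 streams at fc/2.
F_C = 2.4e9
rate_in = 8 * F_C / 4
rate_out = 4 * F_C / 2

# Toy input: channel k carries the labels k and k + 10.
toy = [[k, k + 10] for k in range(8)]
serialized = serializer_8to4(toy)
```

Each output stream of the toy example contains twice as many samples per unit time as an input stream, which is the behavior expected from a 2:1 serializer.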

Consequently, the resulting streams at  $4*f_c$  are  $RFP = \{IP1, QP1, IP2, QP2, IP3, QP3, IP4, QP4\}$  and  $RFN = \{IN1, QN1, IN2, QN2, IN3, QN3, IN4, QN4\}$ . Finally, we note that the 55 streams of RFP and RFN correspond to the maximum number of coefficients of the FIR filter (Table 3.5 in Section 3.3.2.2) divided by 2 for Half-SC operation.

## 4.1.2. IC configuration and physical implementation

The proposed transmitter has been designed in two versions to enable the possibility of performing multiple tests using only one IC (Side A MUX and Side B FF in Fig. 4.2).





On the one hand, the first version (Side A MUX) uses externally generated Delta-Sigma modulated signals and does not include a DSM block. The mixing of *I* and *Q* signals is performed using multiplexers controlled by a clock signal (*CLK\_2fc*) at a frequency $f_1 = 2*f_c$, which is around 4.8 GHz when transmitting WLAN signals around $f_c = 2.4$ GHz.

Hence, in this version, the digital circuits work at a maximum clock frequency of around 4.8 GHz, to relax timing constraints in the clock tree generation and reduce power consumption (compared to an operation at  $4*f_c \approx 9.6$  GHz, as described in Chapter 3).

On the other hand, the second version (Side B FF) can work with both on-chip DSM (DSM\_I/Q block) and externally generated DSM outputs (DSM\_IN block), whereas the on-chip DSM can also be measured separately through the OUT\_DSM pad. Moreover, this version builds upon the Side A MUX version, and introduces additional D Flip-flops after the multiplexer stages in order to resynchronize the output signals. In addition, the clock signal used for the D Flip-flops (*CLK\_4fc*) is generated at a frequency $f_2 = 4*f_c$, which is around 9.6 GHz for WLAN signals around $f_c = 2.4$ GHz.

The IC is designed in STMicroelectronics 28 nm FD-SOI CMOS LVT process and occupies an area of 1.63 x 1.4 mm<sup>2</sup> (Fig. 4.3), which is mainly determined by the number of Flip-chip pads (36) and the spacing between these pads  $d_{cc} = 0.2$  mm (distance center to center; chosen to relax the physical constraints for the IC package design). Furthermore, the area occupied by the differential power amplifier and the digital blocks (FIR-PA SIGGEN and DRFM) including signal routing is 0.32 x 0.15 mm<sup>2</sup> (~2.1 % of the total area).



Besides, the DSM\_I/Q, DSM\_IN and DSM\_OUT blocks occupy 0.24 x 0.1 mm<sup>2</sup> (~1 % of total area). Thus, the Side A MUX circuit occupies ~2.1 % out of the total area, the Side B FF circuit ~3.1 %, whereas the rest of the area excluding the ESD ring (around 84 %) is occupied by decoupling capacitors.
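The area shares quoted above can be reproduced from the block dimensions. The sketch below only redoes the arithmetic; the variable names are chosen for the example.

```python
# Dimensions in millimetres, taken from the text.
total_area = 1.63 * 1.4        # complete IC
pa_dig_area = 0.32 * 0.15      # differential PA + FIR-PA SIGGEN + DRFM
dsm_area = 0.24 * 0.1          # DSM_I/Q, DSM_IN and DSM_OUT blocks

side_a_share = pa_dig_area / total_area               # Side A MUX
dsm_share = dsm_area / total_area                     # DSM blocks only
side_b_share = (pa_dig_area + dsm_area) / total_area  # Side B FF

shares_percent = {
    "Side A MUX": 100 * side_a_share,
    "DSM blocks": 100 * dsm_share,
    "Side B FF": 100 * side_b_share,
}
```

The computed shares agree with the quoted values of about 2.1 %, 1 % and 3.1 % to within rounding.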

This reveals one of the major contributions of this work, namely the full integration of the FIR-PA under a Flip-chip pad, resulting in **zero** additional area for the PA, since the signal pads are mandatory on chip.

In order to identify the 36 Flip-chip pads, we can visualize in Fig. 4.4 the pad matrix with the corresponding signal names (left) and the associated color code (right). We first note that each circuit has separate 1 V supply voltages for the digital circuits (D1 and D6) and the PA stage (A1 and A6), which determines the separation of the ESD ring into four power supply domains. In addition, the RF output pads (B1, C1 and B6, C6) are surrounded by either *GND* or $V_{DD}$ pads to provide isolation from the rest of the signals.

|   | A        | B        | C       | D         | E        | F        |
|---|----------|----------|---------|-----------|----------|----------|
| 1 | VDD_B    | OUT_P_B  | OUT_N_B | VDD_DIG_B | CLK_4fc  | OUT_DSM  |
| 2 | IN_I1    | GND      | GND     | GND       | GND      | IN_Q1    |
| 3 | IN_I2    | resetb   | csb_pad | lfclk     | VBBP_DIG | IN_Q2    |
| 4 | IN_I3    | sclk_pad | miso    | VBBN_A    | VBBN_DIG | IN_Q3    |
| 5 | IN_I4    | GND      | GND     | GND       | GND      | IN_Q4    |
| 6 | VDD_A    | OUT_P_A  | OUT_N_A | VDD_DIG_A | CLK_2fc  | mosi_pad |

Color code categories: Power supplies and GND; Body-bias voltages; SPI Reset and Chip select; SPI Clock and Signals (< 20 MHz); High-speed signals (1 - 5 GHz); High-speed clocks (max 10 GHz); RF outputs.

Fig. 4.4. Pad matrix (left); Color code (right)

Moreover, we take advantage of the back-gate feature of the FD-SOI technology in order to introduce an additional control mechanism for performance optimization [Sourikopoulos15] [Cathelin17]. This feature enables a large forward body-bias range (0 V to 3 V for LVT nMOS and -3 V to 0 V for LVT pMOS) thanks to the ultra-thin buried oxide beneath the transistor channel which allows the use of the local substrate as a back-gate of the transistor channel. Thus, the threshold voltage can be tuned to allow faster transistor switching and adjust the effective transconductance of the active transistors for improved static and dynamic switch performance. Three body-bias voltages are used in the design, two in the digital circuit block *VBBP\_DIG* and *VBBN\_DIG*, and one in the PA output stage *VBBN\_A*, which is found to be sufficient to ensure the equalization of the inverter switching parameters (as shown in Section 4.2.2.1).





Finally, a Serial Peripheral Interface (SPI) is used to program the digital block for the different modes of operation of the proposed transmitter. The corresponding I/O signals are shown in Fig. 4.4 (B3, C3, D3, B4, C4, F6).
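For reference, the pad matrix of Fig. 4.4 can be captured as a small lookup table, which makes it easy to check which signals sit on the pads listed above. This is only an illustrative data structure: the pad names are written with underscores throughout, and the helper `pad` is hypothetical.

```python
COLUMNS = "ABCDEF"

# Pad matrix of Fig. 4.4, indexed by row number (1 to 6).
PAD_MATRIX = {
    1: ["VDD_B", "OUT_P_B", "OUT_N_B", "VDD_DIG_B", "CLK_4fc", "OUT_DSM"],
    2: ["IN_I1", "GND", "GND", "GND", "GND", "IN_Q1"],
    3: ["IN_I2", "resetb", "csb_pad", "lfclk", "VBBP_DIG", "IN_Q2"],
    4: ["IN_I3", "sclk_pad", "miso", "VBBN_A", "VBBN_DIG", "IN_Q3"],
    5: ["IN_I4", "GND", "GND", "GND", "GND", "IN_Q4"],
    6: ["VDD_A", "OUT_P_A", "OUT_N_A", "VDD_DIG_A", "CLK_2fc", "mosi_pad"],
}


def pad(position):
    """Return the signal name at a position such as 'B3' (column, then row)."""
    column, row = position[0], int(position[1:])
    return PAD_MATRIX[row][COLUMNS.index(column)]


# Pads used by the SPI, as listed in the text.
spi_signals = [pad(p) for p in ("B3", "C3", "D3", "B4", "C4", "F6")]
n_pads = sum(len(row) for row in PAD_MATRIX.values())
```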

# 4.2. Switched-capacitor FIR-PA design

This section presents the circuit design of the proposed switched-capacitor FIR-PA. The first step is to determine the number of unitary cells which will be implemented in order to provide a highly configurable solution in terms of signal bandwidth, FIR filtering function and matching network resonance frequency. This will determine the area constraints of the unitary power cells in order to integrate the complete PA stage under a Flip-chip pad. Next, we will describe the unitary power cell, composed of a CMOS inverter and MOM capacitor, which fit the area and power constraints. Finally, we analyze the performances of the ideal versus layout parasitic extracted designs and complete the power efficiency study presented in Table 3.8 (Section 3.3.3.2).

## 4.2.1. Configurable FIR-PA

The configurability of the proposed FIR-PA is implemented on three levels, namely WLAN signal bandwidth (20 to 160 MHz), improved standard coexistence (configurable FIR coefficients), and WLAN channel selectivity (variable resonance frequency of the matching network). This leads to three types of power cells: the first category is fixed and handles the basic PA function, while the two other sets of cells can be enabled or disabled and provide the increased configurability of the proposed solution.

#### 4.2.1.1. Initial estimation of power cell distribution

First, we can obtain the number of signal streams ( $N_{st}$ ) and minimum number of associated power cells ( $N_{pc}$ ) considering the values given in Table 3.5, which are divided by 2 for Half-SC operation, namely  $N_{st}$  = 55, and  $N_{pc}$  = 910. Moreover, the ±1, ±2 mechanism (Section 3.3.2.2), consisting of adding/removing 1 or 2 cells, is used in order to change the value of a targeted coefficient for improved standard coexistence.

Hence, the total number of configurable cells associated to the filter coefficients is approximately equal to $N_{FIRconf} \approx 2*2*55 = 220$, where the first factor "2" represents the maximum number of cells to be added/removed per coefficient, the second "2" takes into consideration the "±" sign, and "55" is the number of streams ( $N_{st}$ ). Consequently, we obtain $N_{pc,FIRconf} = N_{pc} + N_{FIRconf}/2 = 910+110 = 1020$ power cells which build the total capacitance $C_T$, seen at either of the nodes $C_{x,pos}$ or $C_{x,neg}$ (Fig. 3.30).

Finally, as described in the previous chapter, the resonance frequency of the output matching network is determined by the total capacitance $C_T$ and the inductance L (Fig. 4.1). Furthermore, the inductors are integrated in the IC package and cannot be changed, which means that the inductance value L is fixed. Thus, in order to configure the matching network, we propose to introduce additional capacitor cells ( $N_{MN} = 0.15 * N_{pc,FIRconf}$ ) which can be connected to or disconnected from the output nodes $C_{x,pos}$ and $C_{x,neg}$. We note that this configuration will mainly address the WLAN channel selectivity (for example, the three non-overlapping 20 MHz channels centered at 2.412 GHz, 2.437 GHz, and 2.462 GHz), but it can also compensate for possible implementation non-idealities, such as the tolerance of the inductance value L, or additional parasitic inductance due to routing or packaging.
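The tuning range offered by the additional capacitor cells can be estimated with an ideal LC resonance model. This is a rough sketch under several assumptions: the resonance is taken as $f = 1/(2\pi\sqrt{L C})$, the unit capacitance is set to the 2.2 fF value used for the inverter design, and the inductance `L_ASSUMED` is a hypothetical value chosen so that the base cells resonate at 2.462 GHz.

```python
import math

C_UNIT = 2.2e-15                    # unit cell capacitance in farads (about 2.2 fF)
N_BASE = 1020                       # cells building C_T (N_pc,FIRconf)
N_MN_MAX = round(0.15 * N_BASE)     # additional matching-network cells


def resonance_hz(l_henry, c_farad):
    """Ideal LC resonance frequency."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


# Hypothetical inductance: the base cells alone resonate at 2.462 GHz.
F_TOP = 2.462e9
L_ASSUMED = 1.0 / ((2.0 * math.pi * F_TOP) ** 2 * N_BASE * C_UNIT)


def f_res(n_mn):
    """Resonance frequency with n_mn matching-network cells connected."""
    return resonance_hz(L_ASSUMED, (N_BASE + n_mn) * C_UNIT)


f_min = f_res(N_MN_MAX)             # all extra cells connected
tuning_ratio = f_min / f_res(0)     # equals 1 / sqrt(1.15) in this model
```

In this idealized model, connecting all additional cells lowers the resonance frequency by about 7 %, which is wider than the spread of the three WLAN channels mentioned above and leaves margin for inductance tolerance.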

In conclusion, we can make an initial estimation of the total number of power cells, $N_{pc,total} \approx 1.15 * N_{pc,FIRconf} \approx 1170$, of which 800 (68 % of total) power cells are fixed, 220 (19 % of total) cells are used to adjust the filter coefficients and 150 (13 % of total) cells are used to configure the matching network.
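The initial estimation above is simple bookkeeping and can be reproduced as follows. The variable names are chosen for the example, and the percentages use the rounded totals quoted in the text (1170 cells, of which 150 for the matching network).

```python
N_ST = 55        # number of signal streams (Half-SC)
N_PC = 910       # minimum number of power cells (Half-SC)

n_fir_conf = 2 * 2 * N_ST                    # +/-1, +/-2 cells per stream
n_pc_fir_conf = N_PC + n_fir_conf // 2       # cells building C_T
n_mn = round(0.15 * n_pc_fir_conf)           # matching-network cells
n_total = n_pc_fir_conf + n_mn               # close to 1170
n_fixed = n_pc_fir_conf - n_fir_conf         # cells that are always active

# Shares based on the rounded figures used in the text.
shares_percent = {
    "fixed": round(100 * 800 / 1170),
    "coefficients": round(100 * 220 / 1170),
    "matching network": round(100 * 150 / 1170),
}
```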

#### 4.2.1.2. Actual power cell distribution

First, we determine the filter coefficients corresponding to the four targeted signal bandwidths {20, 40, 80, 160} MHz in the Half-SC scheme. The 20 MHz filter has 55 coefficients with a total number of 907 cells and is further taken as a reference to obtain bandwidth configurability. Using this reference, we can easily obtain the 40 MHz filter, as it also has 55 coefficients with a maximum coefficients difference of ±2.

However, this cannot be directly applied to the 80 MHz and 160 MHz filters, because the number of coefficients is not the same, namely 34 coefficients for 80 MHz and 21 coefficients for 160 MHz. We recall that the number of FIR filter coefficients for larger bandwidths (80 MHz and 160 MHz) is reduced with respect to the reference filter, thanks to the relaxed near-band noise requirements (Section 3.3.1).

Hence, the coefficient values can be adjusted by multiplying them by a factor of $55/34 \approx 1.6$ (80 MHz) and $55/21 \approx 2.6$ (160 MHz), and by combining the 55 coefficients (of the reference filter) in order to obtain either 34 or 21 coefficients (see Table 4.1). For example, in the initial 80 MHz case, the maximum value of a coefficient is 32. After the multiplication with the factor 1.6 we obtain a rounded value of 54, which can be obtained by combining two coefficients from the reference filter (32 + 22 = 54). This feature is implemented in the digital FIR-PA SIGGEN block and will be presented in Section 4.3.2.

The coefficient adjustment (for 80 MHz and 160 MHz) is implemented here in order to keep most of the power cells switching in all bandwidth cases, thus maintaining the maximum possible amplitude of the output signal relative to the supply voltage $V_{DD}$.

Consequently, the resulting sets of coefficients associated to the signal bandwidths {20, 40, 80, 160} MHz for the Half-SC scheme are plotted in Fig. 4.6. The total number of cells is 907 in the 20 MHz case, 866 for 40 MHz, and 885 for 80 MHz and 160 MHz, respectively.

| Coefficients FIR 80 | Coefficients FIR 20/40 | Coefficients FIR 160 | Coefficients FIR 20/40        |
|---------------------|------------------------|----------------------|-------------------------------|
| $c_{1}$             | $c_{1}$                | $c_{1}$              | $c_{9}$                       |
| $c_{2}$             | $c_{2}$                | $c_{2}$              | $c_{12}$                      |
| $c_{3}$             | $c_{5}$                | $c_{3}$              | $c_3 + c_9 + c_{13}$          |
| $c_{4}$             | $c_{8}$                | $c_{4}$              | $c_6 + c_7 + c_{15}$          |
| $c_{5}$             | $c_{9}$                | $c_{5}$              | $c_{11} + c_{20}$             |
| $c_{6}$             | $c_{12}$               | $c_{6}$              | $c_{16} + c_{21}$             |
| $c_{7}$             | $c_{14}$               | $c_{7}$              | $c_{19} + c_{23}$             |
| $c_{8}$             | $c_{15}$               | $c_{8}$              | $c_4 + c_{22} + c_{25}$       |
| $c_{9}$             | $c_{17}$               | $c_{9}$              | $c_8 + c_{26} + c_{27}$       |
| $c_{10}$            | $c_6 + c_{18}$         | $c_{10}$             | $c_{30} + c_{33}$             |
| $c_{11}$            | $c_3 + c_{21}$         | $c_{11}$             | $c_5 + c_{34} + c_{35}$       |
| $c_{12}$            | $c_{25}$               | $c_{12}$             | $c_1 + c_2 + c_{36} + c_{38}$ |
| $c_{13}$            | $c_{26}$               | $c_{13}$             | $c_{14} + c_{37} + c_{41}$    |
| $c_{14}$            | $c_4 + c_{28}$         | $c_{14}$             | $c_{17} + c_{39} + c_{42}$    |
| $c_{15}$            | $c_{10} + c_{29}$      | $c_{15}$             | $c_{18} + c_{43} + c_{46}$    |
| $c_{16}$            | $c_{37}$               | $c_{16}$             | $c_{24} + c_{44} + c_{45}$    |
| $c_{17}$            | $c_{38}$               | $c_{17}$             | $c_{28} + c_{47} + c_{48}$    |
| $c_{18}$            | $c_7 + c_{39}$         | $c_{18}$             | $c_{29} + c_{49} + c_{51}$    |
| $c_{19}$            | $c_{11} + c_{40}$      | $c_{19}$             | $c_{32} + c_{50} + c_{52}$    |
| $c_{20}$            | $c_{13} + c_{41}$      | $c_{20}$             | $c_{31} + c_{53} + c_{54}$    |
| $c_{21}$            | $c_{16} + c_{42}$      | $c_{21}$             | $c_{40} + c_{55}$             |
| $c_{22}$            | $c_{19} + c_{43}$      | X                    | X                             |
| $c_{23}$            | $c_{20} + c_{44}$      | X                    | X                             |
| $c_{24}$            | $c_{23} + c_{45}$      | X                    | X                             |
| $c_{25}$            | $c_{24} + c_{48}$      | X                    | X                             |
| $c_{26}$            | $c_{27} + c_{49}$      | X                    | X                             |
| $c_{27}$            | $c_{30} + c_{46}$      | X                    | X                             |
| $c_{28}$            | $c_{32} + c_{47}$      | X                    | X                             |
| $c_{29}$            | $c_{31} + c_{51}$      | X                    | X                             |
| $c_{30}$            | $c_{34} + c_{50}$      | X                    | X                             |
| $c_{31}$            | $c_{33} + c_{52}$      | X                    | X                             |
| $c_{32}$            | $c_{35} + c_{53}$      | X                    | X                             |
| $c_{33}$            | $c_{36} + c_{54}$      | X                    | X                             |
| $c_{34}$            | $c_{22} + c_{55}$      | X                    | X                             |

Table 4.1. Coefficient adjustment for 80 MHz and 160 MHz
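The combination of reference coefficients can be illustrated for the FIR 80 column of Table 4.1. The sketch below encodes that column as index groups and checks that every one of the 55 reference streams is used exactly once; the function `combine` and the placeholder coefficient values are hypothetical and do not correspond to the actual filter coefficients.

```python
# FIR 80 column of Table 4.1: each group lists the reference (FIR 20/40)
# coefficient indices that are combined into one FIR 80 coefficient.
FIR80_FROM_REF = [
    (1,), (2,), (5,), (8,), (9,), (12,), (14,), (15,), (17,),
    (6, 18), (3, 21), (25,), (26,), (4, 28), (10, 29), (37,), (38,),
    (7, 39), (11, 40), (13, 41), (16, 42), (19, 43), (20, 44), (23, 45),
    (24, 48), (27, 49), (30, 46), (32, 47), (31, 51), (34, 50), (33, 52),
    (35, 53), (36, 54), (22, 55),
]


def combine(ref_coeffs, mapping):
    """Sum reference coefficients per group; ref_coeffs[k - 1] holds c_k."""
    return [sum(ref_coeffs[k - 1] for k in group) for group in mapping]


# Every reference index from 1 to 55 should appear exactly once.
used = sorted(k for group in FIR80_FROM_REF for k in group)

# Placeholder reference values (c_k = k), for illustration only.
placeholder_ref = list(range(1, 56))
fir80_placeholder = combine(placeholder_ref, FIR80_FROM_REF)
```

Since each reference stream appears in exactly one group, all 55 driving signals remain in use in the 80 MHz configuration.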

Finally, we obtain the distribution of power cells, namely 790 fixed cells, 192 coexistence configuration cells ( $\pm 1$ ,  $\pm 2$  mechanism) and 170 matching network configuration cells for a total number of 1152 cells, which are divided into a 32 x 36 matrix.



Fig. 4.6. FIR filter coefficients for multi-standard: BW = {20, 40, 80, 160} MHz

## 4.2.2. Unitary power cell

One of the innovations proposed in this work is to build the FIR-PA circuit based on the signal pad which is mandatory on chip, thus reducing the effective area down to zero. This is made possible using the 10 metal layers with Flip-Chip pads flavor of the 28 nm CMOS FD-SOI technology. Hence, the CMOS transistors of the inverter are built up to Metal1, inter-/intra-cell connections are made on Metal2 and Metal3, the MOM capacitor is built from Metal4 up to Metal10 and finally the Flip-Chip pad is on Alucap.

However, integrating the FIR-PA under the signal pad adds an area constraint on the design of the unitary power cell, and mainly on the MOM capacitor. The Flip-Chip pad used in the technology has an octagonal shape with a pad length of 87  $\mu$ m and a pad opening of 55  $\mu$ m (Fig. 4.7). Hence, the target area of the complete PA is initially set to a square of 55  $\mu$ m \* 55  $\mu$ m. Therefore, we can obtain the target area of the unitary cell by dividing the dimension of the pad opening (55  $\mu$ m) by the number of lines (32) and columns (36) of the power cell matrix, namely approximately 1.7 x 1.5  $\mu$ m<sup>2</sup>.



Fig. 4.7. Flip-Chip Pad dimensions in 28 nm CMOS FD-SOI

## 4.2.2.1. CMOS inverter design

In Chapter 3 we have described the FIR-PA architecture based on unitary power cells, which are built with an ideal switch and a capacitor. In order to maintain the digital nature of the proposed transmitter chain with low complexity, the switch is implemented as a CMOS inverter. A detailed description of the CMOS inverter can be found in [Rabaey03].

In the following subsections, we describe the steps of the CMOS inverter design, considering a trade-off between dynamic performance, area, power consumption and on-resistance efficiency loss: i) the dynamic performance has to meet the design constraints in terms of minimizing the switching non-idealities, ii) the total area of a unitary cell needs to allow the implementation of the PA under a signal pad, iii) increasing the inverter size leads to a larger power consumption, mainly due to larger parasitic capacitances and direct-path peak current, and iv) increasing the inverter size reduces the on-resistance and minimizes the efficiency loss.

#### 4.2.2.1.1. Inverter sizing

The design of a CMOS inverter is concentrated on the sizing of the pMOS and nMOS transistors in order to reduce the switching non-idealities discussed in Section 3.4.1, such as the rise/fall time ( $t_r$ ,  $t_f$ ) and low-to-high/high-to-low delay ( $t_{PLH}$ ,  $t_{PHL}$ ). Furthermore, the inverter cell is designed for the initial capacitance value of around 2.2 fF (Chapter 3), and accounts for the process variation of the integrated capacitor and additional parasitics due to physical implementation.

The main design parameter is the ratio between the widths of the pMOS and the nMOS transistors  $R_W = W_P / W_N$ , for the same channel length  $L_P = L_N = 28$  nm, which was set to minimum to enable the fastest switching possible. The parameter  $W_N$  is used to minimize the rise/fall time and low-to-high/high-to-low delay for the given capacitance value, whereas the ratio  $R_W$  is considered for equalizing either the rise and fall times, or the low-to-high and high-to-low delay, with respect to the study of non-idealities in Section 3.4.1.

However, we wish to perform both equalizations at the same time ( $t_r = t_f$  and  $t_{PLH} = t_{PHL}$ ). To do so, we propose to use the back-gate feature of the 28 nm FD-SOI technology, which reduces switching non-idealities and improves performance.

Consequently, it is seen in simulation that using a variable back-gate voltage only for the nMOS transistor (Fig. 4.8) is sufficient for the targeted equalization with less than 1.5% difference (Table 4.2).



Fig. 4.8. CMOS inverter with nMOS back-gate control

## 4.2.2.1.2. MOM capacitor design

The capacitor architectures available in the technology design kit are Poly-Nwell, metal-oxide-metal (MOM) and metal-insulator-metal (MIM) capacitors. The capacitor should have a capacitance of around 2.2 fF for an area of 2.6  $\mu$m<sup>2</sup> (the capacitance value is given in Section 3.3.2.2, whereas the cell area is discussed in Section 4.2.2), without active devices, in order to allow the integration of a capacitor and a CMOS inverter under the signal pad. Consequently, the MOM capacitor is preferred for the implementation of the power cell, because the Poly-Nwell capacitor cannot be integrated together with a CMOS inverter and MIM capacitors have a typical density larger than 15 fF /  $\mu$m<sup>2</sup>.

The designed MOM capacitor is built with Metal4 up to Metal10 for reduced corner variation (Fig. 4.9 left): the capacitor fingers are implemented on Metal4 to Metal8, and the last two metal layers, Metal9 and Metal10, are used for the output terminal. Thanks to the proposed layout, the MOM capacitor takes advantage of both lateral (between adjacent metal lines) and vertical fields (stacked metal lines) for a value of approximately 1.66 fF (RC extraction - nominal), which depends on the number and the width of metal fingers that fit the desired cell area (~2.6  $\mu$m<sup>2</sup>).

In addition, in order to improve performance (reduce output node parasitics due to layout), we propose a cell grouping 4 by 4 (Fig. 4.9 right) with the following advantages:

- The ring (output terminal) provides isolation between different cell groups;
- There is no inner-group ring, thus reducing the parasitic capacitance seen at the output terminal (maintain PA efficiency);
- Additional free space is available, which can be used for supply decoupling.

Finally, we note that the implemented capacitance value of  $\sim$ 1.66 fF is less than the value used in system-level simulations, meaning that the total capacitance seen in the output node is reduced, which determines less power consumption due to capacitor switching (ideally 25% less) for the same output power.
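The 25% figure follows from the proportionality between capacitive switching power and switched capacitance. A short worked check, assuming the same supply voltage $V_{DD}$ and the same switching frequency $f$ for both capacitance values:

$$\frac{P(C_{u2})}{P(C_{u1})} = \frac{C_{u2} \cdot V_{DD}^2 \cdot f}{C_{u1} \cdot V_{DD}^2 \cdot f} = \frac{1.66\ \text{fF}}{2.2\ \text{fF}} \approx 0.75$$

that is, a reduction of about 25% in the power spent on charging the unit capacitors.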



Fig. 4.9. Layout MOM capacitor: unitary cell (left); Group of 4 capacitors (right)

The results in terms of transistor sizing and switching non-idealities are displayed in Table 4.2 for the initial unitary load capacitance  $C_{u1} = 2.2$  fF and the designed unitary capacitance  $C_{u2} = 1.66$  fF (MOM capacitor). As expected, when the load capacitance is lower, the inverter switches faster, thus obtaining a rise/fall time of ~5.1 ps and a low-to-high/high-to-low delay of ~4.4 ps (for  $C_{u2}$ ).

Table 4.2. CMOS inverter design results

| Design parameters | Switching non-idealities | | | |
|---|---|---|---|---|
| | t<sub>r</sub> [ps] | t<sub>f</sub> [ps] | t<sub>PLH</sub> [ps] | t<sub>PHL</sub> [ps] |
| $C_{u1} = 2.19 \text{ fF}$<br>$W_N = 0.72 \ \mu\text{m}$<br>$W_P = 1.72 \ \mu\text{m}$<br>$V_{bbn} = [0.6, 0.8] \ V$ | 5.71 | 5.71 – 5.74 | 4.88 | 4.83 – 4.94 |
| $C_{u2} = 1.66 \text{ fF}$<br>$W_N = 0.72 \ \mu\text{m}$<br>$W_P = 1.72 \ \mu\text{m}$<br>$V_{bbn} = [0.5, 0.7] \ V$ | 5.11 | 5.09 – 5.1 | 4.47 | 4.42 – 4.52 |

Finally, we can visualize in Fig. 4.10 the group of 4 power cells, based on the designed CMOS inverter and MOM capacitor. It is noticed that the center area of the cell remains empty, thus leaving enough space to add decoupling capacitors (between  $V_{DD}$  and GND) to reduce the power supply noise due to *IR* drop and *Ldi/dt* effects [Meng06].



Fig. 4.10. Group of 4 power cells: schematic (left); layout (right)

#### 4.2.2.1.3. Power consumption estimation

The total power consumption of the CMOS inverter ( $P_{SW}$ ) can be expressed as a sum of three components: capacitive ( $P_{Cpar}$ ), direct-path ( $P_{dp}$ ) and static ( $P_{stat}$ ).

$$P_{SW} = P_{Cpar} + P_{dp} + P_{stat} \tag{4.1}$$

First of all, the capacitive power dissipation is determined by the charging and discharging of parasitic capacitances ( $C_{par}$ ) seen in the inverter output node. Furthermore, [Rabaey03] shows that in order to compute the capacitive power dissipation, we have to consider the switching activity of the device

$$P_{Cpar} = C_{par} \cdot V_{DD}^2 \cdot f_{0 \to 1} \tag{4.2}$$

where  $f_{0\to 1}$  represents the frequency of energy-consuming  $0\to 1$  transitions. It can be expressed as the ratio between the number of switching events  $(N_{0\to 1})$  and the number of periods  $N_P$  times the sampling period  $T_s$ , or as the activity factor  $(\beta_{0\to 1} = N_{0\to 1} / N_P)$  times the sampling frequency  $f_s$ 

$$f_{0\to1} = \frac{N_{0\to1}}{N_P \cdot T_s} = \beta_{0\to1} \cdot f_s \tag{4.3}$$

From the DC operating point simulation of the CMOS inverter (dimensions given in Table 4.2), we can obtain the parasitic capacitance seen in the output node of the unitary power cell as the sum of the gate-drain and drain-body capacitances of the pMOS and nMOS transistors,  $C_{par} \approx 1.12$  fF (as shown in the example from [Rabaey03, p. 190]). This leads to a capacitive power dissipation per unitary cell of  $P_{Cpar} \approx 10.75 \,\mu\text{W}$ , considering a maximum switching activity  $\beta_{0 \rightarrow 1} = 1$  at  $f_s = 9.6$  GHz.
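The estimate above can be reproduced with a few lines of Python. This is a minimal sketch of Eq. (4.2) and Eq. (4.3); the function and constant names are illustrative, and the supply voltage of 1 V is the value stated later in this section.

```python
# Capacitive power of one unitary cell, following Eq. (4.2) and Eq. (4.3).
C_PAR = 1.12e-15   # F, parasitic capacitance at the inverter output node
V_DD = 1.0         # V, supply voltage (stated later in this section)
F_S = 9.6e9        # Hz, sampling frequency
BETA_01 = 1.0      # worst-case activity factor for 0->1 transitions

def capacitive_power(c_par, v_dd, f_s, beta_01):
    """Return P_Cpar in watts for the given activity factor."""
    f_01 = beta_01 * f_s               # Eq. (4.3)
    return c_par * v_dd ** 2 * f_01    # Eq. (4.2)

p_cpar = capacitive_power(C_PAR, V_DD, F_S, BETA_01)
print(f"P_Cpar = {p_cpar * 1e6:.2f} uW")
```

Because the expression is linear in the activity factor, the same function gives the per-cell power for any DSM switching activity below 1.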

Moreover, the direct-path power dissipation concerns the direct current path between supply and ground, when the nMOS and the pMOS transistors are conducting simultaneously during switching (it therefore depends on the activity factor). In order to estimate the average power consumption, [Rabaey03] approximates the resulting current spikes as triangles and assumes symmetric rising and falling responses (Fig. 4.11, where  $C_{par}$  is denoted  $C_L$  as in [Rabaey03])

$$P_{dp} = t_{sc} \cdot V_{DD} \cdot I_{peak} \cdot f_{0 \to 1} \tag{4.4}$$

where *t<sub>sc</sub>* represents the time both devices are conducting

$$t_{sc} \approx \frac{V_{DD} - 2V_T}{V_{DD}} \cdot \frac{t_{r(f)}}{0.8} \tag{4.5}$$

In our case, the supply voltage is set to  $V_{DD} = 1$  V, the threshold voltage obtained from the DC operating point is  $V_T \approx 0.28$  V, and the rise/fall time given in Table 4.2 is  $t_{r(f)} \approx 5.1$  ps. Hence, Eq. (4.5) gives  $t_{sc} \approx (1 - 2 V_T / V_{DD}) \cdot t_{r(f)} / 0.8 \approx 2.87$  ps. Furthermore, the peak current  $I_{peak}$  (as depicted in Fig. 4.11) was obtained through the DC simulation of the CMOS inverter, resulting in  $I_{peak} \approx 96\ \mu$A. Therefore, the direct-path power dissipation can be evaluated as  $P_{dp} \approx 2.87\ \text{ps} \cdot 1\ \text{V} \cdot 96\ \mu\text{A} \cdot 1 \cdot 9.6\ \text{GHz} \approx 2.65\ \mu$W, considering a maximum switching activity  $\beta_{0\to 1} = 1$  at  $f_s = 9.6$  GHz.
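The direct-path estimate can be sketched in Python as follows. The names are illustrative. Note that evaluating Eq. (4.5) with the rounded inputs quoted in the text ($V_T = 0.28$ V, $t_{r(f)} = 5.1$ ps) gives about 2.8 ps, slightly below the quoted 2.87 ps, presumably because of rounding of the simulated values; the last call therefore reuses the quoted $t_{sc}$ directly.

```python
# Direct-path power of one unitary cell, following Eq. (4.4) and Eq. (4.5).
V_DD = 1.0       # V, supply voltage
V_T = 0.28       # V, rounded threshold voltage quoted in the text
T_RF = 5.1e-12   # s, rise/fall time for C_u2 (Table 4.2)
I_PEAK = 96e-6   # A, peak direct-path current
F_S = 9.6e9      # Hz, sampling frequency
BETA_01 = 1.0    # worst-case activity factor

def short_circuit_time(v_dd, v_t, t_rf):
    """Eq. (4.5): time during which both transistors conduct."""
    return (v_dd - 2.0 * v_t) / v_dd * t_rf / 0.8

def direct_path_power(t_sc, v_dd, i_peak, beta_01, f_s):
    """Eq. (4.4) with f_0->1 = beta_01 * f_s."""
    return t_sc * v_dd * i_peak * beta_01 * f_s

t_sc = short_circuit_time(V_DD, V_T, T_RF)
p_dp = direct_path_power(t_sc, V_DD, I_PEAK, BETA_01, F_S)
# Same formula, but starting from the t_sc value quoted in the text.
p_dp_quoted = direct_path_power(2.87e-12, V_DD, I_PEAK, BETA_01, F_S)

print(f"t_sc = {t_sc * 1e12:.2f} ps, P_dp = {p_dp * 1e6:.2f} uW")
print(f"P_dp with quoted t_sc = {p_dp_quoted * 1e6:.2f} uW")
```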





Finally, the static power dissipation is determined by the current ( $I_{stat}$ ) flowing between the supply rails when the CMOS inverter is not switching.

$$P_{stat} = I_{stat} \cdot V_{DD} \tag{4.6}$$

These results are validated through a Cadence simulation of the CMOS inverter, with a total power consumption of  $P_t \approx 13.6 \,\mu\text{W}$  when switching at a frequency  $f_s = 9.6 \,\text{GHz}$  and a static power dissipation of  $P_{stat} = 0.3 \,\mu\text{W}$ . Thus, the activity-dependent power consumption per unitary cell is approximately  $P_{act} = P_{Cpar} + P_{dp} = 13.3 \,\mu\text{W}$  for  $\beta_{0\to 1} = 1$ , with  $P_{Cpar}$  contributing around 80 % and  $P_{dp}$  around 20 %.

#### 4.2.2.1.4. Effect of the CMOS inverter on-resistance

The on-resistance of the inverter ( $R_{on}$ ) is an important design parameter, as the equivalent resistance of all the inverters ( $R_{on}$  divided by the number of inverter cells) will be seen at the output in series with the load resistance, thus creating a voltage divider.

This will determine an overall PA efficiency loss, due to the lower output power  $P_{out,ron}$  (reduced voltage drop on the load resistance), and the power dissipated on the PA equivalent resistance  $P_{c,req}$ . From the DC operating point we can obtain  $R_{on} \approx 460 \Omega$ , which leads to an equivalent resistance  $R_{eq} \approx 0.4 \Omega$  (considering 1152 cells).

Thus, for a 2 $\Omega$  load resistance, the output power can be approximated to  $P_{out,ron} \approx (2 / 2.4)^2 \cdot P_{out,ideal} \approx 0.69 \cdot P_{out,ideal}$  (around 1.6 dB output power loss), and the dissipated power to  $P_{c,req} \approx (0.4 / 2.4)^2 \cdot (2 / 0.4) \cdot P_{out,ideal} \approx 0.14 \cdot P_{out,ideal}$ .
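The voltage-divider argument can be checked numerically. The sketch below is illustrative (the function name and return layout are not from the design flow) and uses the unrounded equivalent resistance, which is why the results differ from the quoted values only in the third decimal.

```python
import math

R_ON = 460.0     # ohm, on-resistance of one inverter (DC operating point)
N_CELLS = 1152   # inverters seen in parallel
R_LOAD = 2.0     # ohm, load resistance

def on_resistance_penalty(r_on, n_cells, r_load):
    """Return (R_eq, P_out,ron / P_out,ideal, P_c,req / P_out,ideal, loss in dB)."""
    r_eq = r_on / n_cells
    # The load sees only a fraction of the ideal voltage swing.
    out_ratio = (r_load / (r_load + r_eq)) ** 2
    # Power burned in R_eq, normalised to the ideal output power.
    diss_ratio = (r_eq / (r_load + r_eq)) ** 2 * (r_load / r_eq)
    loss_db = -10.0 * math.log10(out_ratio)
    return r_eq, out_ratio, diss_ratio, loss_db

r_eq, out_ratio, diss_ratio, loss_db = on_resistance_penalty(R_ON, N_CELLS, R_LOAD)
print(f"R_eq = {r_eq:.2f} ohm, P_out,ron = {out_ratio:.2f} * P_out,ideal, "
      f"P_c,req = {diss_ratio:.2f} * P_out,ideal, loss = {loss_db:.1f} dB")
```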

#### 4.2.2.2. FIR-PA cells overview

We recall that the proposed FIR-PA has three types of power cells, namely fixed, coexistence and matching network cells. On the one hand, the fixed and coexistence cells are identical (CMOS inverter and MOM capacitor); the only difference is the digital signal driving the inverter. On the other hand, the matching network cells are used to change the total value of capacitance seen in the output node, in order to be able to configure the resonance frequency of the RLC bandpass filter.

The latter is further implemented using an nMOS switch, which will connect additional capacitance to *GND*, when the gate input signal is a logic 1 (increase the total output capacitance), and will provide a high impedance node when the gate input signal is a logic 0 (reduce total output capacitance). The associated cell reuses the MOM capacitor and the nMOS transistor, which were previously detailed in Sections 4.2.2.1 and 4.2.2.2, whereas the pMOS transistor is removed (Fig. 4.12 a).



Fig. 4.12. Matching cell: nMOS switch (a); nMOS switch-on (b); nMOS switch-off (c)

When the nMOS is switched-on for  $V_H = V_{DD}$ , the impedance of the parasitic capacitor of the nMOS can be neglected (Fig. 4.12 b) with respect to the output resistance (approximately 460  $\Omega$ ), and the capacitance seen from the output node to ground is equal to  $C_{MOM} = 1.66$  fF.

On the contrary, when the nMOS is switched-off, the output resistance is approximately 14 M $\Omega$  and can be neglected (Fig. 4.12 c), meaning that the equivalent capacitance seen from the output node to ground is  $C_{eq} = (C_{MOM} \text{ series } C_{par}) = C_{MOM} \cdot C_{par} / (C_{MOM} + C_{par}) \approx 0.38 \text{ fF}.$ 
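The parasitic capacitance of the nMOS switch is not quoted explicitly for the matching cell, but it can be backed out from the two values above. As a worked check, inverting the series combination gives

$$C_{par} = \frac{C_{MOM} \cdot C_{eq}}{C_{MOM} - C_{eq}} \approx \frac{1.66 \cdot 0.38}{1.66 - 0.38}\ \text{fF} \approx 0.49\ \text{fF}$$

This value is inferred from the rounded figures in the text and should be read as an estimate, not as a simulated result.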

## 4.2.2.3. Decoupling capacitors

It was previously suggested that the free area in the designed power cell (Fig. 4.10) can be used to add decoupling capacitors (*decap*) between  $V_{DD}$  and *GND* to reduce the power supply noise. However, the free space is found very close to the MOM switch capacitor. Hence, the decoupling capacitor will not be implemented as a MOM, in order to minimize the additional parasitics.

Instead, we choose to implement the *decap* with active devices (pMOS and nMOS transistors), as shown in Fig. 4.13. A detailed description of this cell can be found in [Meng06] [Vazquez04].



Fig. 4.13. Decap with active devices

For a group of 4 power cells (as shown in Fig. 4.10), there is enough space to add 2 *decap* cells, with the dimensions  $W_P$  = 400 nm,  $W_N$  = 350 nm,  $L_P$  =  $L_N$  = 400 nm (Fig. 4.14), resulting in a decoupling capacitance of approximately 5.4 fF per *decap* cell.




Fig. 4.14. FIR-PA power cells: fixed and coexistence (a); matching network (b)

After RC extraction, the total decoupling capacitance between  $V_{DD}$  and GND is estimated to be around 12 fF for each group of 4 cells (10.8 fF thanks to 2 *decap* cells and 1.2 fF due to additional wire parasitics). Hence, we estimate that the total decoupling capacitance included in the FIR-PA cells is approximately  $C_{dec} \approx 12 \cdot 1152 / 4 \approx 3.46$  pF.

## 4.2.2.4. Power efficiency estimation

We can now estimate the power efficiency of the proposed FIR-PA in Table 4.3, including the power consumption of both the real switch and capacitor. The input signal is a sinewave of frequency  $f_{sin} = 1$  MHz with the peak amplitude  $A_{in}$ , sampled at a frequency  $f_s/2 = 4.8$  GHz (the same conditions used in the theoretical analysis in Chapter 3).

Hence, we can update Eq. (3.20)

$$\eta = P_{out} / \left( P_{out} + P_{HSC} + P_{HSW} \right) \tag{4.7}$$

For comparison, we include the theoretical output power  $P_{out}$ , and the corresponding drain efficiency  $\eta_{HSC} = P_{out} / (P_{out} + P_{HSC})$ , where  $P_{HSC}$  represents the capacitive power dissipation for a unitary capacitance cell  $C_{u2} = 1.66$  fF (computed according to Chapter 3).

| A <sub>in</sub><br>[mV] | P <sub>out</sub><br>[mW] | P <sub>HSC</sub><br>[mW] | η <sub>HSC</sub><br>(%) | P <sub>t,Cpar</sub><br>[mW] | P <sub>t,dp</sub><br>[mW] | P <sub>t,stat</sub><br>[mW] | η <sub>HSC1</sub><br>(%) | P <sub>out,ron</sub><br>[mW] | P <sub>c,req</sub><br>[mW] | η <sub>HSC2</sub><br>(%) |
|-------------------------|--------------------------|--------------------------|-------------------------|-----------------------------|---------------------------|-----------------------------|--------------------------|------------------------------|----------------------------|--------------------------|
| 64                      | 0.34                     | 4.25                     | 7.4                     | 2.77                        | 0.69                      | 0.54                        | 3.9                      | 0.23                         | 0.05                       | 2.7                      |
| 128                     | 1.35                     | 4.31                     | 23.9                    | 2.8                         | 0.7                       | 0.54                        | 13.9                     | 0.93                         | 0.19                       | 9.8                      |
| 192                     | 3.02                     | 4.44                     | 40.5                    | 2.77                        | 0.69                      | 0.54                        | 26.3                     | 2.08                         | 0.42                       | 19                       |
| 256                     | 5.39                     | 4.21                     | 56.1                    | 2.63                        | 0.66                      | 0.54                        | 40.1                     | 3.72                         | 0.75                       | 29.7                     |
| 320                     | 8.41                     | 4.34                     | 66                      | 2.89                        | 0.72                      | 0.54                        | 49.7                     | 5.8                          | 1.18                       | 37.5                     |
| 384                     | 12.1                     | 4.28                     | 73.9                    | 2.88                        | 0.72                      | 0.54                        | 58.9                     | 8.35                         | 1.69                       | 45.2                     |
| 448                     | 16.49                    | 4.34                     | 79.2                    | 3.12                        | 0.78                      | 0.54                        | 65.2                     | 11.38                        | 2.31                       | 50.6                     |
| 512                     | 21.52                    | 4.21                     | 83.6                    | 3.25                        | 0.81                      | 0.54                        | 70.9                     | 14.85                        | 3.01                       | 55.7                     |
| 576                     | 27.22                    | 4.28                     | 86.4                    | 3.48                        | 0.87                      | 0.54                        | 74.8                     | 18.78                        | 3.81                       | 59.1                     |
| 640                     | 41.45                    | 4.18                     | 90.8                    | 3.54                        | 0.89                      | 0.54                        | 81.9                     | 28.6                         | 5.8                        | 65.7                     |

Table 4.3. Differential FIR-PA drain efficiency - CMOS inverter and MOM capacitor

We further note that the output power  $P_{out}$  is obtained based on the assumption that all the power cells can switch at a given time period, without taking into consideration any loss due to the implementation of additional configurable cells, CMOS inverter on-resistance, or layout parasitics.

Moreover,  $P_{HSW}$  represents the total power consumption due to the CMOS inverters

$$P_{HSW} = P_{t,Cpar} + P_{t,dp} + P_{t,stat} \tag{4.8}$$

$$P_{t,Cpar} = 2 \cdot 907 \cdot \beta_{HSW} \cdot P_{Cpar} = 1814 \cdot \beta_{HSW} \cdot 0.8 \cdot P_{act} \tag{4.9}$$

$$P_{t,dp} = 2 \cdot 907 \cdot \beta_{HSW} \cdot P_{dp} = 1814 \cdot \beta_{HSW} \cdot 0.2 \cdot P_{act} \tag{4.10}$$

$$P_{t,stat} = 2 \cdot 907 \cdot P_{stat} \tag{4.11}$$

where the factor 2 is used for differential operation, 907 is the number of power cells associated with the reference filter (Section 4.2.1.2),  $P_{act} = 13.3 \mu W$  is the activity-dependent power consumption of a CMOS inverter when switching at a frequency  $f_s$  = 9.6 GHz, and  $P_{stat}$  = 0.3 µW is the static power consumption of a CMOS inverter (Section 4.2.2.2.2).

Moreover, we recall that  $P_{Cpar} = 10.75 \,\mu\text{W}$  and  $P_{dp} = 2.65 \,\mu\text{W}$  depend on the switching activity for  $0 \rightarrow 1$  transitions ( $\beta_{0 \rightarrow 1}$ ) and were derived for  $\beta_{0 \rightarrow 1} = 1$. However, in our case, the switching activity of the DSM output (further noted  $\beta_{HSW}$ ) is less than 1 and was obtained through simulation for the given input signal peak amplitudes. Hence, using Eqs. (4.7) to (4.11) we can estimate the drain efficiency  $\eta_{HSC1}$ , which highlights the additional power consumption due to the switching of CMOS inverters.

Next, we take into consideration the on-resistance of the CMOS inverter, which determines a lower output power,  $P_{out,ron} \approx 0.69 \cdot P_{out}$ , and an additional dissipated power on the on-resistance, namely  $P_{c,req} \approx 0.14 \cdot P_{out}$  (as estimated in Section 4.2.2.2.3), thus affecting the drain efficiency  $\eta_{HSC2}$  which can be computed using

$$\eta = P_{out,ron} / \left( P_{out,ron} + P_{HSC} + P_{HSW} + P_{c,req} \right) \tag{4.12}$$

The drain efficiency characteristics of the differential Half-SC FIR-PA are plotted in Fig. 4.15, to highlight the effects of capacitive power consumption ( $\eta_{HSC}$ ), capacitive and inverter power consumption ( $\eta_{HSC1}$ ), and capacitive and inverter power consumption considering the on-resistance  $R_{on}$  ( $\eta_{HSC2}$ ).
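To make the three efficiency definitions concrete, the following Python sketch evaluates them for the last row of Table 4.3. The function names are illustrative; the numerical inputs are copied from the table and all powers are in mW.

```python
# Drain efficiency definitions, evaluated for the last row of Table 4.3 (mW).
def eta_hsc(p_out, p_hsc):
    """Capacitive switching loss only."""
    return p_out / (p_out + p_hsc)

def eta_hsc1(p_out, p_hsc, p_hsw):
    """Eq. (4.7): capacitive and inverter power consumption."""
    return p_out / (p_out + p_hsc + p_hsw)

def eta_hsc2(p_out_ron, p_hsc, p_hsw, p_c_req):
    """Eq. (4.12): as Eq. (4.7), including the on-resistance effects."""
    return p_out_ron / (p_out_ron + p_hsc + p_hsw + p_c_req)

p_out, p_hsc = 41.45, 4.18
p_t_cpar, p_t_dp, p_t_stat = 3.54, 0.89, 0.54
p_out_ron, p_c_req = 28.6, 5.8

p_hsw = p_t_cpar + p_t_dp + p_t_stat   # Eq. (4.8)

e0 = 100 * eta_hsc(p_out, p_hsc)
e1 = 100 * eta_hsc1(p_out, p_hsc, p_hsw)
e2 = 100 * eta_hsc2(p_out_ron, p_hsc, p_hsw, p_c_req)
print(f"eta_HSC = {e0:.1f} %, eta_HSC1 = {e1:.1f} %, eta_HSC2 = {e2:.1f} %")
```

The three values agree with the efficiency columns of the last table row to within rounding.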



Fig. 4.15. Power efficiency characteristics

## 4.2.3. FIR-PA structure

The coefficients of the FIR filter can be implemented using single-bit unitary power cells based on CMOS inverters and MOM capacitors (3D GDSII view in Fig. 4.16, left). We recall that the FIR coefficients were quantized on five bits, meaning that the smallest coefficient corresponds to 1 power cell and the largest coefficient to 32 power cells.

Consequently, in order to reduce non-idealities which may occur due to long signal lines and a large number of driven power cells, we propose an additional cell grouping of {1, 2, 4, 8} power cells per driving signal, which is complementary to the grouping presented in Fig. 4.14. For instance, using this scheme, the coefficient of 32 is divided into 4 groups of 8 cells, which are driven by 4 equivalent signals. In addition, the FIR-PA structure is integrated under the signal pad (Fig. 4.16 right), where the connection is made on Metal10 (output FIR-PA) and Alucap (signal PAD).

Next, we will present the ideal performances of the proposed FIR-PA using the three types of unitary cells, namely fixed, coexistence, and matching, respectively. Furthermore, we will highlight the influence of the layout extracted parasitics on the maximum amplitude of the output signal. Finally, these results are combined with the ideal power efficiency estimation from Table 4.3 in order to obtain a complete performance estimation of the PA stage.



Fig. 4.16. Layout FIR-PA: unitary cell 3D view (left); FIR-PA under signal PAD (right)

## 4.2.3.1. Ideal PA performance

The performances of the PA in terms of output power depend on the maximum amplitude of the output signal, which can be obtained when switching on all the cells at the same time. Ideally, if we considered switching on only the fixed cells, the maximum amplitude equals  $V_{DD}$ .

However, the proposed architecture includes also configurability cells for coexistence and matching, which are only partially activated based on the required specifications. In order to ensure coexistence in the 20 MHz case, we need to activate 907 power cells, resulting in  $N_{pc,c}$  = 790 + 192 - 907 = 75 coexistence cells which are never activated (always grounded). Thus, the maximum amplitude of the output signal is reduced proportionally to the number of inactive cells, namely  $A_{max,c}$  =  $V_{DD} \cdot 907 / (907+75) = 0.923 \cdot V_{DD}$ .

Next, we recall that the matching cells contain an nMOS switch driving a MOM capacitor (Fig. 4.12). Hence, when all the matching cells are active (nMOS switched-on), the maximum amplitude of the output signal is  $A_{max,mcon} = 0.787 \cdot V_{DD}$ , and when all the matching cells are deactivated (nMOS switched-off) we obtain  $A_{max,mcoff} = 0.888 \cdot V_{DD}$  ( $C_{eq} = 0.38$  fF). Consequently, the increased configurability for coexistence and matching network is achieved with a reduction of the output signal power of around 1-2 dB.
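The amplitude factors quoted above are consistent with a simple capacitive divider between the switched capacitance and the total capacitance on the output node. The Python sketch below assumes exactly this ideal divider model (no layout parasitics); the function name and argument layout are illustrative.

```python
# Ideal capacitive-divider model of the maximum output amplitude (relative to V_DD).
C_MOM = 1.66        # fF, unit MOM capacitor
C_EQ_OFF = 0.38     # fF, matching cell seen from the output when the nMOS is off
N_ACTIVE = 907      # cells switching (20 MHz case)
N_IDLE_COEX = 75    # coexistence cells that stay grounded
N_MATCH = 170       # matching network cells

def amplitude_factor(n_active, n_idle, n_match=0, c_match=0.0, c_unit=C_MOM):
    """Switched capacitance divided by total capacitance on the output node."""
    c_switched = n_active * c_unit
    c_total = (n_active + n_idle) * c_unit + n_match * c_match
    return c_switched / c_total

a_coex = amplitude_factor(N_ACTIVE, N_IDLE_COEX)
a_match_on = amplitude_factor(N_ACTIVE, N_IDLE_COEX, N_MATCH, C_MOM)
a_match_off = amplitude_factor(N_ACTIVE, N_IDLE_COEX, N_MATCH, C_EQ_OFF)

print(f"coexistence only: {a_coex:.3f}")
print(f"matching on:      {a_match_on:.3f}")
print(f"matching off:     {a_match_off:.3f}")
```

Since output power scales with the square of the amplitude, factors of 0.888 and 0.787 correspond to roughly 1 dB and 2 dB of output power reduction, respectively.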

### 4.2.3.2. Layout parasitic extraction

The MOM capacitor structure presents inherent parasitic capacitance to ground, which will be seen in the output node. This will increase the total capacitance, thus reducing the maximum amplitude of the output signal. Consequently, the output power and efficiency of the FIR-PA stage will be lower. The effect of parasitic capacitances can be evaluated when performing simulations of the FIR-PA cell with extracted parasitics considering the same conditions as in the ideal case, namely 907 power cells switched on and 75 coexistence cells switched off.

Hence, when the matching cells are active, the maximum amplitude of the output signal is reduced to  $A_{max,mconext} = 0.662 \cdot V_{DD}$ , and when all the matching cells are deactivated we obtain  $A_{max,mcoffext} = 0.719 \cdot V_{DD}$ . The results for the ideal and parasitics extracted cases are plotted in Fig. 4.17. Based on these results we can estimate the maximum output power and efficiency in Table 4.4, for the 5 cases depicted in Fig. 4.17, where the values for the case "Ideal" are taken from the last line of Table 4.3.



Fig. 4.17. Maximum output signal amplitude: ideal and extracted

We note that the proposed FIR-PA has an ideal output power of almost 16.2 dBm with an efficiency of ~82%. However, this performance will be lower due to additional configurability and chip integration. The output power ( $P_{out}$ ) and the efficiency ( $\eta_{HSC1}$ ) are reduced in the case of i) filter configurability by 0.7 dB and 2.5%, ii) filter and matching network configurability by 2.1 dB and 8.2%, and iii) filter and matching network configurability, and circuit integration by 3.6 dB and 15.5%, respectively.

Moreover, as shown in Section 4.2.2.5, the on-resistance of the CMOS inverter has an important effect over the drain efficiency, namely lower output power and additional dissipated power on the PA equivalent resistance. Thus, this parameter is taken into consideration for the drain efficiency estimation in Table 4.5, whereas the values for the case "Ideal  $R_{on}$ " are taken from the last line of Table 4.3.

|                                       | Ideal | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
|---------------------------------------|-------|--------|--------|--------|--------|--------|
| Output signal<br>Amplitude factor     | 1     | 0.923  | 0.888  | 0.787  | 0.719  | 0.662  |
| Output power<br>P <sub>out</sub> [mW] | 41.45 | 35.31  | 32.68  | 25.67  | 21.37  | 18.11  |
| Efficiency<br>η<sub>HSC1</sub> (%)    | 81.9  | 79.4   | 78.1   | 73.7   | 70     | 66.4   |

Table 4.4. Differential FIR-PA drain efficiency - configurability, parasitics extraction
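The output power row of Table 4.4 can be reproduced by scaling the ideal output power with the square of the amplitude factor. The Python sketch below does this for the five cases; the helper names are illustrative, and small deviations from the table for Cases 4 and 5 (a few hundredths of a mW) are due to rounding of the amplitude factors.

```python
import math

P_OUT_IDEAL_MW = 41.45   # last line of Table 4.3
AMPLITUDE_FACTORS = {
    "Case 1": 0.923, "Case 2": 0.888, "Case 3": 0.787,
    "Case 4": 0.719, "Case 5": 0.662,
}

def scaled_power_mw(p_ideal_mw, amplitude_factor):
    """Output power scales with the square of the output amplitude."""
    return p_ideal_mw * amplitude_factor ** 2

def power_loss_db(amplitude_factor):
    """Output power reduction relative to the ideal case, in dB."""
    return -20.0 * math.log10(amplitude_factor)

def mw_to_dbm(p_mw):
    return 10.0 * math.log10(p_mw)

print(f"Ideal: {P_OUT_IDEAL_MW:.2f} mW = {mw_to_dbm(P_OUT_IDEAL_MW):.1f} dBm")
for case, a in AMPLITUDE_FACTORS.items():
    print(f"{case}: {scaled_power_mw(P_OUT_IDEAL_MW, a):.2f} mW, "
          f"{power_loss_db(a):.1f} dB below ideal")
```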

The proposed FIR-PA has an output power of almost 14.6 dBm with an efficiency of around 66% when considering the CMOS on-resistance, which translates to a power and drain efficiency loss of 1.6 dB and 16 % compared to the ideal case (without  $R_{on}$ ). These performances will be further degraded due to the additional configurability and chip integration.

|                                       | Ideal<br><i>R<sub>on</sub></i> | Case 1<br><i>R<sub>on</sub></i> | Case 2<br><i>R<sub>on</sub></i> | Case 3<br><i>R<sub>on</sub></i> | Case 4<br><i>R<sub>on</sub></i> | Case 5<br><i>R<sub>on</sub></i> |
|---------------------------------------|--------------------------------|---------------------------------|---------------------------------|---------------------------------|---------------------------------|---------------------------------|
| Output signal<br>Amplitude factor     | 1                              | 0.923                           | 0.888                           | 0.787                           | 0.719                           | 0.662                           |
| Output power<br>P <sub>out</sub> [mW] | 28.6                           | 24.52                           | 22.69                           | 17.82                           | 14.88                           | 12.61                           |
| Efficiency<br>η<sub>HSC2</sub> (%)    | 65.7                           | 63.5                            | 62.3                            | 58.3                            | 55.1                            | 51.9                            |

With respect to the ideal case (without  $R_{on}$ ), the output power ( $P_{out}$ ) and the efficiency ( $\eta_{HSC}$ ) are reduced in the case of i) additional filter configurability by 2.3 dB and 18.5%, ii) additional filter configurability and matching network configurability by ~3.7 dB and 23.7%, and iii) additional filter configurability, matching network configurability and integration by 5.2 dB and 30.2%, respectively.

Finally, we synthesize in Fig. 4.18 the results in terms of output power and power efficiency derived in this section, focusing on the influence of the CMOS inverter on-resistance ( $R_{on}$ ), additional configurability (coexistence and matching cells), and integration (layout). We recall from Fig. 4.17 that:

- *Ideal*: FIR-PA cells without configuration (coexistence and matching) cells
- *Case 1*: ideal case with additional coexistence cells
- *Case 2*: ideal case with additional coexistence cells and matching cells-off
- *Case 3*: ideal case with additional coexistence cells and matching cells-on
- *Case 4*: parasitics extracted case with additional coexistence cells and matching cells-off
- *Case 5*: parasitics extracted case with additional coexistence cells and matching cells-on



Fig. 4.18. Overview of output power and drain efficiency performance

## 4.3. Digital block design

The present section describes the implementation of the digital block circuits which provide the input signals for a correct Half-SC FIR-PA operation. This means that, for a given DSM output, the signal generator circuit needs to generate the correct input signals as described in Chapter 3, Eqs. (3.40) and (3.41), in order to enable the reduction of the symmetric filter coefficients. Furthermore, the digital mixing around  $f_c$  is performed by interleaving the I and Q paths in signal streams corresponding to {I, Q, -I, -Q} at a sampling frequency of  $4 \cdot f_c$  (see Section 3.3.3).
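The interleaving can be illustrated with a minimal Python sketch. It assumes that the I and Q paths each deliver one sample per half carrier period (rate $2 \cdot f_c$), so that successive I/Q pairs alternate in sign and the output stream runs at $4 \cdot f_c$. The function name `interleave_iq` and the use of ±1 integers for the 1-bit streams are illustrative choices and are not taken from the implemented VHDL.

```python
def interleave_iq(i_samples, q_samples):
    """Interleave two equal-length streams as I, Q, -I, -Q, I, Q, ...

    Assumption: i_samples[n] and q_samples[n] belong to the same time slot,
    and every second I/Q pair is sign-inverted.
    """
    if len(i_samples) != len(q_samples):
        raise ValueError("I and Q streams must have the same length")
    stream = []
    for n, (i, q) in enumerate(zip(i_samples, q_samples)):
        sign = 1 if n % 2 == 0 else -1
        stream.extend((sign * i, sign * q))
    return stream

# A constant I input with Q = 0 produces a tone that repeats every 4 output
# samples, i.e. at f_c when the output stream runs at 4 * f_c.
carrier = interleave_iq([1, 1, 1, 1], [0, 0, 0, 0])
mixed = interleave_iq([1, 1, -1, 1], [1, -1, -1, 1])
print(carrier)
print(mixed)
```

No multiplier is needed in this scheme: the mixing reduces to sign inversion and multiplexing, which is what keeps the signal generator fully digital.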

In addition, we introduced an interface between the digital signal generator and the FIR-PA stage, whereas different buffer sizes are used on each line according to a line load estimation based on theoretical calculation and RC extraction of the output stage. Moreover, the configurability of the filter coefficients, matching network, different modes of operation and resets of digital blocks is implemented in a Serial Peripheral Interface (SPI) based on a finite state machine with read/write option and 32 8-bit registers.

The digital blocks (except clock tree and buffer stage) have been synthesized using RTL Compiler from Cadence based on the VHDL description. The clock tree and the buffer stage were custom designed using standard cells in order to ensure correct sampling operation and appropriate line driving. Finally, all the digital blocks have been included in a top Verilog source which was used for the automatic place and route in SoC Encounter from Cadence.

## 4.3.1. Input signals generation

It was previously shown that the proposed IC implementation includes two circuit versions, namely Side A MUX, and Side B FF. Side A MUX has only one mode of operation, i.e. feeding externally generated DSM outputs to the four in-phase signals (IN\_I1, IN\_I2, IN\_I3, IN\_I4) and four quadrature signals (IN\_Q1, IN\_Q2, IN\_Q3, IN\_Q4), respectively (Fig. 4.19). The signals are sampled at a frequency $f_c/2$ and correspond to the outputs of 4-channel time-interleaved quadrature DSMs, which are obtained through simulation with MATLAB or ModelSim for various input signals (sinewave, WLAN). Furthermore, these signals are directly used by the FIR-PA SIGGEN blocks to generate the signal streams for the PA output stage.

Side B FF presents three modes of operation, which enable the tests of the PA, the 8-channel time-interleaved DSM, and the complete transmitter chain (including the on-chip DSM). The first test of the PA stage is based on externally generated outputs of quadrature DSMs and is identical to the test on Side A MUX (Fig. 4.19).



Fig. 4.19. Signal generator digital circuits block diagram: PA test

Next, let us consider the test of the 8-channel time-interleaved DSM in the I path (Fig. 4.20). The input signal is coded on 12 bits (11 bits data, 1 sign bit) at a frequency  $2*f_c/8 = f_c/4$  (around 600 MHz when  $f_c = 2.4$  GHz). However, the number of signal pads is limited to four on both I and Q, due to area constraints (the total chip area is determined by the number of pads, whereas the total active area is around 5 % of the total chip area).

Therefore, we send the data on 6 signal pads (IN\_I1, IN\_I2, IN\_I3, IN\_Q1, IN\_Q2, IN\_Q3) at  $f_c/2$  (around 1.2 GHz for  $f_c = 2.4$  GHz) and reconstruct the input signal using one synchronization signal (IN\_Q4) corresponding to a clock signal of frequency  $f_c/4$ .



Fig. 4.20. Signal generator digital circuits block diagram: DSM test

The corresponding timing diagram is presented in Fig. 4.21 and highlights the importance of the synchronization bit, which marks the start of a new input word when IN\_Q4 = 0.



Fig. 4.21. Timing diagram: DSM input signals for 8-channel TI DSM test

For the test of the complete transmitter (Fig. 4.22), the input signals for the I and Q paths are coded on 12 bits (11 bits data, 1 sign bit) at a frequency  $f_c/4$  (around 600 MHz when  $f_c = 2.4$  GHz).

Again, due to the limited number of signal pads (four on both I and Q), we need to encode multiple bits per signal line. Hence, we generate the input signals at a larger sampling rate ( $f_c$  instead of  $f_c/4$ ), in order to include more signal bits per signal line (4 bits per line), and reconstruct the 12-bit input signals afterwards. Thus, the 12 bits are encoded on three signal lines on I (IN\_I1, IN\_I2, IN\_I3) and Q (IN\_Q1, IN\_Q2, IN\_Q3) sides, whereas the two remaining lines (IN\_I4 and IN\_Q4) are used to ensure correct signal reconstruction and correspond to clock signals of frequency  $f_c/2$  (IN\_I4) and  $f_c/4$  (IN\_Q4).



Fig. 4.22. Signal generator digital circuits block diagram: complete TX test

This operation is illustrated in Fig. 4.23 for the in-phase input signals, where we can see that the 12 bits are sent on IN\_I1, IN\_I2, IN\_I3 at a sampling frequency of  $f_c$  during a period of  $4/f_c$ , and they are recovered on IN\_I\_DS during the next period of  $4/f_c$ .

The IN\_I4 and IN\_Q4 signals are used for synchronization, in order to correctly determine the start of a new 12-bit input code, namely when IN\_I4 = 0 and IN\_Q4 = 0. The same process is applied to the quadrature input signals, where 12 bits are sent on IN\_Q1, IN\_Q2, IN\_Q3 at a sampling frequency of $f_c$, and they are recovered on IN\_Q\_DS.



Fig. 4.23. Timing diagram: DSM input signals for complete TX test
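The multiplexing of one 12-bit input word over three signal lines, as described above, can be sketched in Python. This is a minimal illustration and not the on-chip implementation: the bit-to-line mapping (line j carries bits 4j to 4j+3, least significant bit first) and the phase of the two synchronization clocks are assumptions, and the helper names `pack_word`, `unpack_word` and `sync_signals` are hypothetical.

```python
def pack_word(word, n_lines=3, bits_per_line=4):
    """Split a word into n_lines serial streams of bits_per_line bits each.

    Assumed (hypothetical) mapping: line j carries bits
    j*bits_per_line ... (j+1)*bits_per_line - 1, LSB first in time.
    """
    n_bits = n_lines * bits_per_line
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in the available bits")
    return [
        [(word >> (j * bits_per_line + t)) & 1 for t in range(bits_per_line)]
        for j in range(n_lines)
    ]


def unpack_word(lines):
    """Rebuild the word from the serial streams produced by pack_word."""
    bits_per_line = len(lines[0])
    word = 0
    for j, stream in enumerate(lines):
        for t, bit in enumerate(stream):
            word |= bit << (j * bits_per_line + t)
    return word


def sync_signals(n_ticks=4):
    """Synchronization clocks as seen when sampled at f_c (assumed phases)."""
    in_i4 = [t % 2 for t in range(n_ticks)]         # 0,1,0,1 -> clock at f_c/2
    in_q4 = [(t // 2) % 2 for t in range(n_ticks)]  # 0,0,1,1 -> clock at f_c/4
    return in_i4, in_q4


lines = pack_word(0xABC)       # three streams standing in for IN_I1..IN_I3
recovered = unpack_word(lines)
```

With these assumed clock phases, IN\_I4 and IN\_Q4 are simultaneously low only on the first of the four ticks, which is the condition used to mark the start of a new word.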

Furthermore, the outputs of the 8-channel TI quadrature DSMs sampled at  $f_c/4$ , namely {VI1, VI2, VI3, VI4, VI5, VI6, VI7, VI8} for the I path and {VQ1, VQ2, VQ3, VQ4, VQ5, VQ6, VQ7, VQ8} for the Q path, are passed through four 2:1 serializers working at  $f_c/2$  in order to reduce the number of signal streams to four and enable efficient block reuse (FIR-PA SIGGEN) and compatibility with the PA test case using externally generated DSM samples (Fig. 4.19).

## 4.3.2. FIR-PA SIGGEN block

The FIR-PA SIGGEN block uses the output of a single-bit DSM to provide the necessary signals that implement Eq. (3.40-3.41) and allow the FIR filter simplification (Fig. 4.24).





The FIR-PA SIGGEN block is designed as a 4-phase system on both I and Q paths at $f_c/2$. At the input, a shift register holds 112 consecutive samples I/Q[1,...,112] at each time period ($2/f_c$), where 109 is the maximum filter length in the 20/40 MHz case. Thus, I[109] = I1, I[110] = I2, I[111] = I3, I[112] = I4 (from Fig. 4.19), and these signals are further delayed in order to derive I[1,...,108], where I[108] = $z^{-1}$·I[112].

Next, 4 groups of signal streams per path (I/Q) and per side (positive/negative) are derived corresponding to the 4-phase configuration. Thus, we may write for the I path (identical for the Q path)

$$\begin{aligned}
IP1[k] &= (I[k] \ \mathrm{NAND} \ I[109 - k + 1]), & IP1[55] &= I[55] \\
IP2[k] &= (I[k + 1] \ \mathrm{NAND} \ I[110 - k + 1]), & IP2[55] &= I[56] \\
IP3[k] &= (I[k + 2] \ \mathrm{NAND} \ I[111 - k + 1]), & IP3[55] &= I[57] \\
IP4[k] &= (I[k + 3] \ \mathrm{NAND} \ I[112 - k + 1]), & IP4[55] &= I[58]
\end{aligned} \tag{4.13}$$

and

where *k* = {1,...,54}.
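The positive-side signal derivation of Eq. (4.13) can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the synthesized VHDL: samples are single bits stored in a 112-entry list, and the function name `siggen_positive` is hypothetical. Only the positive side is modelled here.

```python
def nand(a, b):
    """Single-bit NAND."""
    return 1 - (a & b)


def siggen_positive(samples):
    """Positive-side streams of Eq. (4.13).

    samples: list of 112 bits holding I[1]..I[112] (samples[0] is I[1]).
    Returns a dict ip with ip[p][k] for phases p = 1..4 and taps k = 1..55.
    """
    if len(samples) != 112:
        raise ValueError("expected 112 samples")
    s = [None] + list(samples)  # shift to 1-based indexing, as in the text
    ip = {}
    for p in range(1, 5):
        phase = {}
        for k in range(1, 55):
            # Phase p pairs I[k + p - 1] with I[(108 + p) - k + 1].
            phase[k] = nand(s[k + p - 1], s[108 + p - k + 1])
        # The centre tap has no symmetric partner.
        phase[55] = s[55 + p - 1]
        ip[p] = phase
    return ip


# Example: only I[1] and I[109] are high.
bits = [0] * 112
bits[0] = 1
bits[108] = 1
ip = siggen_positive(bits)
```

In this example `ip[1][1]` combines I[1] and I[109], while `ip[2][1]` combines I[2] and I[110], which shows how each phase sees the same filter window shifted by one sample.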

Hence, the signal stream $RFP_1$ for the 20/40 MHz case (reference filter) at $4*f_c$ will be {IP1[1], QP1[1], IN2[1], QN2[1], IP3[1], QP3[1], IN4[1], QN4[1]}.

However, this arrangement does not apply to the 80 MHz and 160 MHz filter cases, due to the different number and values of filter coefficients (see Section 4.2.1.2). In these cases, the signal streams  $RFP_{1,...,55}$  are obtained individually based on the chosen coefficient adjustment (reference filter coefficient combination in Table 4.1).

Let us consider, in the 80 MHz case, the filter coefficient $c_{10,80M} = 10$, which is obtained by combining two coefficients of the reference filter, $c_{10,80M} = c_{6,20/40M} + c_{18,20/40M}$. This means that the signal streams $RFP_6$ and $RFP_{18}$ are equivalent and correspond to {IP1[6], QP1[6], IN2[6], QN2[6], IP3[6], QP3[6], IN4[6], QN4[6]}. Thus, we may write

$$\begin{aligned}
IP1[6] &= (I[109 - 65 + 12] \ \mathrm{NAND} \ I[109 - 12 + 1]) \equiv (I[52] \ \mathrm{NAND} \ I[98]) \\
IP2[6] &= (I[53] \ \mathrm{NAND} \ I[99]) \\
IP3[6] &= (I[54] \ \mathrm{NAND} \ I[100]) \\
IP4[6] &= (I[55] \ \mathrm{NAND} \ I[101]) \\
IN1[6] &= (I[109 - 65 + 12] \ \mathrm{OR} \ I[109 - 12 + 1]) \equiv (I[52] \ \mathrm{OR} \ I[98]) \\
IN2[6] &= (I[53] \ \mathrm{OR} \ I[99]) \\
IN3[6] &= (I[54] \ \mathrm{OR} \ I[100]) \\
IN4[6] &= (I[55] \ \mathrm{OR} \ I[101])
\end{aligned} \tag{4.16}$$

where "109-65" represents the difference between the length of the reference (20/40 MHz) and the 80 MHz filters.

## 4.3.3. Digital to RF mixer

The DRFM block (Fig. 4.25) is implemented as a multiplexer (MUX) which reconstructs the RF signal streams $RFP/N_{1,...,55}$ at $4*f_c$, based on the signals derived in the FIR-PA SIGGEN block at $f_c/2$. The signal multiplexing is performed in two steps, namely a 4:1 MUX which creates the signals for the I and Q paths on the positive and negative sides at $2*f_c$, followed by a 2:1 MUX which interleaves the I and Q paths and generates the RF streams at $4*f_c$.

For example, in the case of $RFP_1$, the signals obtained from the FIR-PA SIGGEN block are {IP1[1], QP1[1], IN2[1], QN2[1], IP3[1], QP3[1], IN4[1], QN4[1]}, which are all available during a period of $2/f_c$. The 4:1 MUX then produces two signals at $2*f_c$, $I\_P_1$ = {IP1[1], IN2[1], IP3[1], IN4[1]} and $Q\_P_1$ = {QP1[1], QN2[1], QP3[1], QN4[1]}. Finally, the 2:1 MUX interleaves $I\_P_1$ and $Q\_P_1$ to obtain $RFP_1$ at $4*f_c$.

The digital to RF mixing operation is implemented identically in both circuit versions, Side A MUX and Side B FF. However, in the latter, an additional resampling at the frequency $4*f_c$ is performed with D flip-flops, in order to re-synchronize all the signal streams before feeding them into the PA output stage. This resampling is not performed in the Side A MUX circuit, which uses an input clock frequency of $2*f_c$.



Fig. 4.25. 2-step MUX DRFM
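The two-step multiplexing can be illustrated with a short Python sketch. It models each MUX as an ideal time-interleaver of equal-length sample lists and ignores all timing and clocking aspects of the real circuit; the function names `mux` and `drfm` are hypothetical.

```python
def mux(streams):
    """Ideal N:1 time-interleaver.

    Takes N equal-length streams and returns one stream at N times the rate,
    cycling through the inputs in order.
    """
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")
    return [s[n] for n in range(length) for s in streams]


def drfm(i_phases, q_phases):
    """Two-step MUX: 4:1 per path (f_c/2 -> 2*f_c), then 2:1 (2*f_c -> 4*f_c).

    i_phases, q_phases: four streams each, ordered by phase 1..4.
    """
    i_path = mux(i_phases)       # 4:1 MUX on the I path
    q_path = mux(q_phases)       # 4:1 MUX on the Q path
    return mux([i_path, q_path])  # 2:1 MUX interleaving I and Q


# One sample per phase, labelled so that the output order is visible.
rf = drfm([["I1"], ["I2"], ["I3"], ["I4"]], [["Q1"], ["Q2"], ["Q3"], ["Q4"]])
```

The resulting order alternates between the I and Q paths while stepping through the four phases, which is the interleaving required for the {I, Q, -I, -Q} mixing sequence once the phase streams carry the appropriate positive-side and negative-side signals.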

## 4.3.4. Power consumption estimation

The power consumption of the digital block circuits working in the transmitter chain at  $4*f_c = 9.6$  GHz in the case of a 1 MHz sinewave is estimated through simulation to be around 60 mW, out of which 20 mW are dissipated in the quadrature DSMs (see Section 3.2.4). The power consumption breakdown can be found in Table 4.6.

Slightly more than half of the power (34.3 mW, about 57 %) is consumed by the useful signal functions (DSM, Clock Tree, FIR-PA SIGGEN), whereas the rest (26.1 mW, about 43 %) is dissipated in the signal reconstruction and the interface with the PA output stage (4:1 MUX, 2:1 MUX, Buffer). This means that a future version of the circuit should also concentrate on optimizing the MUX and buffer stages in order to improve the overall system efficiency. For example, in the case of the buffer stages, we could envision a different layout of the stream lines in the PA, which may optimize the line loading and allow a reduction of the buffer sizes to lower the power consumption.

| DSM | Clock Tree | Buffer | FIR-PA SIGGEN | 4:1 MUX | 2:1 MUX | Total |
|-----|------------|--------|---------------|---------|---------|-------|
| 20  | 7.2        | 15.5   | 7.1           | 9.4     | 1.2     | 60.4  |

**Table 4.6.** Digital blocks power consumption breakdown [mW]
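The split between signal functions and reconstruction/interface blocks can be checked directly from the values in Table 4.6. The grouping below follows the text; the variable names are illustrative only.

```python
# Power consumption per digital block, in mW (values from Table 4.6).
breakdown_mw = {
    "DSM": 20.0,
    "Clock Tree": 7.2,
    "Buffer": 15.5,
    "FIR-PA SIGGEN": 7.1,
    "4:1 MUX": 9.4,
    "2:1 MUX": 1.2,
}

total_mw = sum(breakdown_mw.values())

# Blocks that produce the useful signal functions.
signal_blocks = ("DSM", "Clock Tree", "FIR-PA SIGGEN")
signal_mw = sum(breakdown_mw[name] for name in signal_blocks)

# Blocks used for signal reconstruction and interfacing with the PA.
interface_blocks = ("Buffer", "4:1 MUX", "2:1 MUX")
interface_mw = sum(breakdown_mw[name] for name in interface_blocks)

signal_share = 100.0 * signal_mw / total_mw
interface_share = 100.0 * interface_mw / total_mw

print(f"total = {total_mw:.1f} mW")
print(f"signal functions = {signal_mw:.1f} mW ({signal_share:.0f} %)")
print(f"reconstruction and interface = {interface_mw:.1f} mW ({interface_share:.0f} %)")
```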

A summary of expected performance based on post-layout simulations is presented in Table 4.7.

**Table 4.7.** Post-layout circuit-level simulations - expected transmitter performance

| Parameters                      | Post-layout simulation results      |
|---------------------------------|-------------------------------------|
| Bandwidth [MHz]                 | 20 - 160                            |
| Carrier frequency [GHz]         | 2.4                                 |
| Digital blocks consumption [mW] | 60.4                                |
| FIR-PA consumption [mW]         | 24.2                                |
| Peak output power [dBm]         | 11.7 <sup>a</sup> / 11 <sup>b</sup> |
| Drain efficiency [%]            | 55 <sup>a</sup> / 52 <sup>b</sup>   |
| System efficiency [%]           | 17 <sup>a</sup> / 14.9 <sup>b</sup> |

<sup>a</sup> configuration matching cells-off Case 4 (Fig. 4.18)

<sup>b</sup> configuration matching cells-on Case 5 (Fig. 4.18)
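The entries of Table 4.7 can be cross-checked against each other with a short calculation. The sketch below assumes that the FIR-PA consumption equals the peak output power divided by the drain efficiency, and that the system efficiency is the peak output power divided by the sum of FIR-PA and digital consumption. These definitions are assumptions made for illustration; they are not stated in this section.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10 ** (p_dbm / 10)


def pa_consumption_mw(p_out_dbm, drain_efficiency):
    """FIR-PA consumption, assuming drain efficiency = P_out / P_PA."""
    return dbm_to_mw(p_out_dbm) / drain_efficiency


def system_efficiency(p_out_dbm, drain_efficiency, p_digital_mw):
    """System efficiency, assuming P_out / (P_PA + P_digital)."""
    p_out = dbm_to_mw(p_out_dbm)
    p_pa = pa_consumption_mw(p_out_dbm, drain_efficiency)
    return p_out / (p_pa + p_digital_mw)


P_DIGITAL_MW = 60.4  # digital blocks consumption from Table 4.7

# Matching cells off (Case 4) and matching cells on (Case 5).
eff_off = 100 * system_efficiency(11.7, 0.55, P_DIGITAL_MW)
eff_on = 100 * system_efficiency(11.0, 0.52, P_DIGITAL_MW)
pa_on = pa_consumption_mw(11.0, 0.52)

print(f"system efficiency, cells off: {eff_off:.1f} %")
print(f"system efficiency, cells on:  {eff_on:.1f} %")
print(f"FIR-PA consumption, cells on: {pa_on:.1f} mW")
```

Under these assumptions the computed values agree with the tabulated system efficiencies to within rounding, and the FIR-PA consumption listed in the table corresponds to the matching cells-on configuration.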

# 4.4. Conclusion

This chapter presents the circuit design of the all-digital transmitter based on time-interleaved DSM and FIR-PA, according to the theoretical analysis from Chapter 3.

The key point of the proposed architecture is to move the signal processing from the output stage to the digital domain. The configuration of the driving signals for the FIR filter coefficients is implemented in a digital block (block FIR-PA SIGGEN in Fig. 4.19), working at lower frequencies ($f_c/2$) for reduced timing constraints, thanks to the time-interleaving scheme. In addition, the signal reconstruction at $f_s = 4*f_c$ is implemented using two stages of digital multiplexers (4:1 MUX and 2:1 MUX), in order to achieve digital to RF mixing around $f_c$ (Section 4.3.3).

Furthermore, the FIR-PA is built using the proposed differential Half-SC structure, in which the unitary switching cell is formed from a CMOS inverter driving a MOM capacitor, thus maintaining the low-complexity switching scheme in the output stage.

Consequently, we may state that the proposed architecture proves the concept of an all-digital transmitter, and pushes the digital operation domain (digital signal processing and simple CMOS inverter) up to the antenna.

The main contributions presented in this chapter are summarized next:

► FIR-PA:

- the complete differential output stage is integrated under two signal pads, meaning that the effective additional area is **zero**;
- custom design of CMOS inverter
  - inverter size optimization considering area, power consumption, and the efficiency loss due to the on-resistance;
  - reduce switching non-idealities using the body-bias feature of the 28nm FD-SOI technology (equalize simultaneously the rise and fall times, and the low-to-high and high-to-low delays);
- custom design of MOM capacitor
  - grouping of 4 unitary cells which enables a trade-off between the shielding of signal lines and output parasitic capacitances;
  - iterative process using SKILL scripting to further reduce parasitics;
- additional configurability using coexistence cells (for the FIR function) and matching cells (for the matching network); the matching cells are placed on the sides and serve also as dummy cells for the final structure;
- introduce supply decoupling using active devices (efficient use of cell area) to reduce the voltage supply variation;

- evaluation of the FIR-PA performance comprising the output power, dissipated power and drain efficiency, highlighting the influence of the design parameters, such as the unitary MOM capacitance (capacitive dissipated power), inverter size (parasitic capacitances, direct-path current, static current, and on-resistance), and layout extracted parasitics.

► Digital processing:

- the main digital functions are described in VHDL, thus allowing an automatic synthesis in RTL Compiler using standard cells;
- the clock tree and output buffers (interface with the FIR-PA) were custom designed using standard cells to ensure correct signal synchronization, and buffer sizing (based on the estimation of the line loading) to avoid additional switching non-idealities;
- the layout of the complete digital circuit was done in SoC Encounter using only standard cells (automatic place and route process).

Finally, several improvements are identified to enhance the overall performance of the proposed digital transmitter:

► FIR-PA:

- implement a different signal line layout in order to reduce the corresponding parasitic capacitances, thus allowing the reduction of the buffer sizes (in the digital block) to lower the power consumption;
- reduce the number of signal lines through an optimization of the number of power cells per signal line; the load line estimation shows that the parasitic capacitance of the metal line connecting the FIR-PA to the digital circuits is larger than the effective gate capacitance of the driven power cells; this will also determine a different cell placement optimization;
- improve the efficiency of the matching cells
  - the current design implements several activation signals, where each signal drives multiple nMOS switches to pull down MOM capacitors; ideally, when the nMOS is switched off, the MOM capacitor should be disconnected from the output node; still, the MOM capacitance is actually connected in series with the nMOS parasitic capacitance, which lowers the output signal amplitude; this effect could be partially reduced when using one activation signal for one nMOS switch to pull down several MOM capacitors (1:1:*n* activation instead of 1:*n*:*n*).

► Digital processing:

- optimization of the signal reconstruction and buffering blocks, which account for almost half of the power consumption of the digital blocks
  - this would benefit from the reduced number of signal lines in the FIR-PA;
- DSM optimization
  - currently, the switching activity of the DSM is almost constant, regardless of the amplitude of the input signal, which is less efficient for low input signal amplitudes;
  - the peak input signal amplitude (output power) is limited by the stability of the DSM; a way to improve DSM stability for higher amplitudes is proposed in [Basetas17], which implements a class of single-bit multi-step look-ahead modulators, allowing input signal peak amplitudes up to 0.9 (compared to 0.64 in our case);
  - the maximum operating frequency, around 5-6 GHz using time-interleaving, limits the possibility of targeting 802.11ac signals in the 5 GHz frequency band; in order to overcome this limitation, we can investigate pipelining as introduced in [Schmidt11], which can be combined with recent implementations of look-ahead techniques that can also reduce the length of the critical path by half (ideally doubling the maximum operating frequency), as in [Tanio16]; in addition, we can study the possibility of replacing the current 2's complement signal representation with borrow-save (BS) arithmetic, as shown in [Frappe06], to further reduce the critical path delay.

► FIR filter signal lines:

- the number of signal lines used for configuring the FIR filter is equivalent to the length of the FIR filter; furthermore, it was shown that a lower coefficient resolution may lead to the cancellation of the smallest coefficients; however, the number of bits used for the coefficient quantization is based on the out-of-band mask (set to 5 in the current design); in [Marin16] and [Marin17a] we show how we could reduce the noise level in specific frequency bands (especially the GPS band), in order to relax the noise constraints and facilitate WLAN standard coexistence; therefore, taking advantage of the proposed noise reduction, we could envision a further reduction of the number of bits used for the FIR coefficients quantization, thus relaxing the filtering constraints (number of signal lines, total number of power cells);
- study the improvement of the switching activity dependency between the FIR filter signal lines, to achieve a more efficient switching-cell use in the output stage.

# 4.5. Chapter Bibliography

**[Basetas17]** C. Basetas, T. Orfanos, and P. P. Sotiriadis, "A Class of 1-Bit Multi-Step Look-Ahead  $\Sigma$ - $\Delta$  Modulators," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 1, pp. 24-37, Jan. 2017.

**[Cathelin17]** A. Cathelin, "RF/analog and mixed-signal design techniques in FD-SOI technology," in IEEE Custom Integrated Circuits Conf. (CICC), May 2017, pp. 1–53.

**[Frappe06]** A. Frappé, A. Flament, A. Kaiser, B. Stefanelli, A. Cathelin, and R. Daouphars, "Design techniques for very high speed digital delta-sigma modulators aimed at all-digital RF transmitters," 13th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Nice, pp. 1113-1116, Dec. 2006.

**[Marin16]** R.-C. Marin, A. Frappé, A. Kaiser, "Delta-Sigma Based Digital Transmitters with Low-Complexity Embedded-FIR Digital to RF Mixing," *23rd IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, Monte Carlo, pp. 237-240, Dec. 2016.

**[Marin17a]** R.-C. Marin, A. Frappé, A. Kaiser, "Considerations for Complex Digital Delta-Sigma Modulators for Standard Coexistence in Digital Wireless Transmitters," **accepted** to *IEEE Trans. Circuits Syst. I, Reg. Papers*, June 2017.

**[Meng06]** X. Meng, K. Arabi, and R. Saleh, "Novel Decoupling Capacitor Designs for sub-90nm CMOS Technology," *7th International Symposium on Quality Electronic Design (ISQED)*, San Jose, pp. - 271, Mar. 2006.

**[Rabaey03]** J. M. Rabaey, A. Chandrakasan, B. Nikolic, "Digital Integrated Circuits", Second Edition, Prentice Hall Publishing, 2003.

**[Schmidt11]** M. Schmidt, S. Haug, M. Grözing, M. Beroth, "A pipelined 3-level bandpass delta-sigma modulator for class-S power amplifiers," in *2011 IEEE International Symposium on Circuits and Systems (ISCAS)*, Rio de Janeiro, pp. 2757-2760, May 2011.

**[Sourikopoulos15]** I. Sourikopoulos, "Continous-time digital processing techniques applied to channel equalization for low-power millimeter-wave communications," PhD Thesis, University of Science and Technology Lille 1, Dec. 2015.

**[Tanio16]** M. Tanio, S. Hori, N. Tawa, T. Yamase, and K. Kunihiro, "An FPGA-based all-digital transmitter with 28-GHz time-interleaved delta-sigma modulation," *2016 IEEE MTT-S International Microwave Symposium (IMS)*, pp. 1-4, May 2016.

**[Vazquez04]** J. R. Vazquez, M. Meijer, "Modelling the Dynamic Response of On-Chip Decoupling Capacitors," *8<sup>th</sup> IEEE Workshop on Signal Propagation on Interconnects*, Heidelberg, pp. 39-42, May 2004.

# 5. Measurements

This chapter describes the measurement setup and results of the proposed digital RF Transmitter implemented in 28 nm FD-SOI CMOS from STMicroelectronics. The IC is placed on a Ball Grid Array (BGA) substrate with underfill which connects to a dedicated Printed Circuit Board (PCB), to validate the theoretical analysis presented in the previous chapters. The measurements were performed at IRCICA (Institut de Recherche sur les Composants logiciels et matériels pour l'Information et la Communication Avancée), France.

# 5.1. IC Packaging

A specific four metal layer Ball Grid Array (BGA) substrate (5\*5 mm<sup>2</sup>) with underfill was designed under industrial conditions at STMicroelectronics and hosts the flip-chip die and eleven Surface-Mounted Devices (SMD), thus allowing the integration of high quality factor passive inductors and additional decoupling capacitors (Fig. 5.1).



Fig. 5.1. BGA substrate with underfill view TOP: layout (left); assembled (right)

The device under test (DUT) is placed in the center of the BGA substrate, with a slight offset (180  $\mu$ m) on the X-axis, in order to allow symmetric route lines for the differential outputs (routing from DUT to X1 - X2, and from DUT to X3 - X4).

Furthermore, the SMD 0201 components X1 - X2 and X3 - X4 are either 0 $\Omega$ resistors or RF inductors that implement an LC matching network at the desired center frequency. Next, seven SMD 0201 capacitors ($C_{dec} = 10$ pF) are used for supply decoupling, one on each of *VDD\_A*, *VDD\_DIG\_A*, *VDD\_B*, *VDD\_DIG\_B*, *VBBN\_A*, *VBBN\_DIG*, and *VBBP\_DIG*.

Finally, the power plane of the BGA substrate (2<sup>nd</sup> internal plane, closer to BOTTOM) has been equally divided into four sub-planes, one for each of the DUT's supply voltages (*VDD\_A*, *VDD\_DIG\_A*, *VDD\_B*, *VDD\_DIG\_B*).

# 5.2. PCB design

The PCB test versions were designed in Altium Designer and fabricated by Eurocircuits [Euroc1] (Fig. 5.2), based on four metal layers (TOP, INNER1, INNER2, and BOTTOM), similar to the BGA substrate. INNER1 is used as ground plane for TOP, while INNER2 is used partly as supply plane and partly as ground plane for BOTTOM.

The DUT is placed in the middle of the board and can be tested in two operating modes, functional and power. In the power mode the LC matching network and the RF output transmission lines (TL) are designed at the center frequencies {0.9, 1.6, 2.4} GHz.



Fig. 5.2. PCB version *functional* at 2.4 GHz - View TOP

The data inputs I and Q are routed on TOP from the sides, ensuring equal line lengths, whereas the main clock is routed on BOTTOM, due to the limited routing space near the IC. Furthermore, the supply voltages can be used separately for the analog and digital circuits, whereas the body-bias voltages can be generated externally or grounded on the PCB (default operation for LVT transistors).

In addition, the board enables the use of an external SPI controller (Arduino based, MATLAB code) which generates the input signals for the internal SPI present on the IC. However, during measurement, the communication with the on-chip SPI could not be achieved. Hence, the test cases were built on the default state of the SPI, which enabled only the Side A MUX circuit in the 20 MHz FIR filter case, without the configurability of FIR coefficients, or output matching network cells.

Next, we will describe the design of the critical signal lines (high-speed signals, RF outputs in Fig. 4.4) depending on the PCB base material properties.

## 5.2.1. High-speed signal line design

The high-speed signal lines have been designed as transmission lines with the impedance $Z_0 = 50 \Omega$, in order to preserve signal integrity (limit reflections). Furthermore, due to the frequency range of these signals (RF/microwave), the transmission line can be modeled as a coplanar waveguide with ground (CPWG), as shown in Fig. 5.3 [Hartley17].



Fig. 5.3. Coplanar Waveguide w/Ground [Hartley17]

Moreover, the parameters of the transmission line can be calculated using either the mathematical formulas from [Hartley17], or dedicated software such as TX-LINE from National Instruments (free license), which derives the electrical/physical line characteristics depending on the material parameters [NI17].

The base material used for the PCB is RO4350B [Euroc2], with a dielectric constant $\varepsilon_r = 3.66$, loss tangent $\tan\delta = 0.0031$, and height H = 0.254 mm. The gap is set to G = 0.508 mm and the trace thickness to T = 30 µm. Using TX-LINE, we obtain a trace width W = 20 mils = 0.508 mm for an impedance $Z_0 \approx 50 \Omega$ over the frequency range f = [1; 10] GHz (Fig. 5.4).

TX-LINE 2003, CPW Ground tab. Material parameters: dielectric constant 3.66, loss tangent 0.0031, copper conductor (5.88E+07 S/m). Electrical characteristics: impedance 50 Ohms, frequency 10 GHz. Physical characteristics: width W = 20 mil, gap G = 20 mil.

Fig. 5.4. TX-LINE user interface example

## 5.2.2. RF differential outputs

The PCB was designed to allow the DUT measurement in terms of functionality (version *functional*), and maximum output power (version *power*). In the *functional* version, the trace width is set to $W_f = 0.508$ mm with the impedance $Z_0 \approx 50 \Omega$ (as shown in Section 5.2.1). In this case, the LC matching network is set at half of the maximum targeted center frequency, namely 1.2 GHz, in order to test the functionality of the proposed all-digital transmitter in the frequency range [0; 2.4] GHz in terms of high-frequency operation and band filtering transfer function. Moreover, we note that the resulting output power is much lower than the one shown in the theoretical analysis (~14 dB lower), due to the difference between the load resistance values (50 $\Omega$ compared to 2 $\Omega$).
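The order of magnitude of this difference can be recovered with a first-order estimate. Assuming that the voltage swing at the output is the same in both cases, so that the delivered power scales inversely with the load resistance (a simplification that ignores the matching network and parasitic losses), the power ratio is

$$\Delta P \approx 10 \log_{10}\left(\frac{50\ \Omega}{2\ \Omega}\right) = 10 \log_{10}(25) \approx 14\ \mathrm{dB}$$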

In the *power* version, the RF differential outputs are designed as adapted (impedance-matching) transmission lines from 2 $\Omega$ to 50 $\Omega$ and 4 $\Omega$ to 50 $\Omega$ at the test center frequencies, in order to maximize the power transfer towards the load and evaluate the impact of the series parasitic resistance in the signal path. The line adaptation has been designed in ADS (Advanced Design System from Keysight), using the S-parameter ($S_{11}$ and $S_{21}$) representation of a transmission line between two ports, *port1* and *port2* (Fig. 5.5).



Fig. 5.5. ADS schematic – Example of adapted transmission line 2  $\Omega$  to 50  $\Omega$  at 2.4 GHz

The parameter $S_{11}$ is called reflection or return loss and refers to the signal reflected at *port1* when a signal is incident at *port1*, whereas $S_{21}$ is called transmission or insertion loss and refers to the signal transmitted to *port2* when a signal is incident at *port1*. Ideally, $S_{11}$ shows very little reflection at all frequencies, and $S_{21}$ very little loss.

The adapted transmission lines were designed for the power PCB version at three test center frequencies, 900 MHz, 1.6 GHz and 2.4 GHz, to achieve in all three cases a transmission loss  $S_{21} > -0.2$  dB and a return loss  $S_{11} < -20$  dB over a 100 MHz bandwidth.
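To give a feeling for these design targets, the dB values can be converted into power fractions. The short Python sketch below uses the standard relation between a power ratio in dB and its linear value; the function name `db_to_power_ratio` is illustrative.

```python
def db_to_power_ratio(value_db):
    """Convert a power ratio expressed in dB to a linear ratio."""
    return 10 ** (value_db / 10)


# Design targets from the text.
S11_MAX_DB = -20.0   # return loss bound
S21_MIN_DB = -0.2    # transmission loss bound

reflected_fraction = db_to_power_ratio(S11_MAX_DB)
transmitted_fraction = db_to_power_ratio(S21_MIN_DB)

print(f"reflected power:   at most  {100 * reflected_fraction:.1f} %")
print(f"transmitted power: at least {100 * transmitted_fraction:.1f} %")
```

In other words, the targets correspond to at most 1 % of the incident power being reflected and at least about 95.5 % being transmitted over the 100 MHz bandwidth.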

Finally, an example of the difference between the trace widths and length of the RF outputs based on the transmission line design is highlighted in Fig. 5.6. In the *functional* version (Fig. 5.6 left), the trace width is set to  $W_f$  = 0.508 mm with the impedance  $Z_0 \approx 50 \Omega$ , whereas the *power* version (Fig. 5.6 right) corresponds to an adapted transmission line from 2  $\Omega$  to 50  $\Omega$  at the center frequency 900 MHz with the trace width  $W_p$  = 3.048 mm and length  $L_p$  = 43.18 mm.



Fig. 5.6. PCB versions functional (left) and power (right) - View TOP

# 5.3. Measurement setup and test cases

This section presents the measurement setup (Fig. 5.7) and tests performed in order to validate the proposed concept of all-digital transmitter based on single-bit core processing, digital to RF mixing and differential SC PA with embedded band filtering.



Fig. 5.7. Measurement setup

The input I/Q data files are generated in MATLAB and are assigned to the eight synchronous outputs of the arbitrary waveform generators (AWG) used to stream in data to the device under test (DUT) at a sampling frequency of  $f_c/2$  (equal to 450 MHz for the 900 MHz test).

The main clock is generated externally at a frequency of  $2^*f_c$ . A differential RF probe is used to measure the RF output on a 50  $\Omega$  load. The probe is further connected to the PXA signal analyzer to obtain the main performance of the DUT.
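The frequency relations used in this setup can be summarized in a small helper. The sketch below only restates the ratios given in the text for the configuration with an external clock at twice the carrier frequency; the function name and dictionary keys are hypothetical.

```python
def frequency_plan(fc_hz):
    """Rates derived from the carrier frequency f_c.

    - the AWG streams the input data at f_c / 2
    - the external main clock runs at 2 * f_c
    - the RF signal streams are reconstructed at 4 * f_c
    """
    return {
        "awg_sample_rate_hz": fc_hz / 2,
        "main_clock_hz": 2 * fc_hz,
        "rf_stream_rate_hz": 4 * fc_hz,
    }


plan_900mhz = frequency_plan(900e6)
for name, value in plan_900mhz.items():
    print(f"{name}: {value / 1e6:.0f} MHz")
```

For the 900 MHz test this gives 450 MHz at the AWG outputs, a 1.8 GHz main clock and a 3.6 GHz RF stream rate.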

## 5.3.1. Case 1: Functional

The first test highlights the functionality of the proposed transmitter in terms of operating frequency and dissipated power, using the *functional* PCB version (transmission lines with  $Z_0 = 50 \Omega$  at the RF outputs). In this case, the components X1 - X2 are RF inductors used in the LC matching network at a center frequency  $f_c = 1.2$  GHz (as described in Section 5.2.2).

Considering the test bench in Fig. 5.7, when the main clock frequency is  $f_{CK}$ , and all the digital inputs (I1-4, Q1-4) are set to "0", the resulting output signal is a periodic square wave at the frequency  $f_{CK}/2$ , due to the digital mixing function {I, Q, -I, -Q} which translates to {0, 0, 1, 1}. Thus, we can measure the peak amplitude of the fundamental at different frequencies  $f_{CK}/2$ . Next, the amplitudes are normalized in order to plot the resulting RLC BPF transfer function (Fig. 5.8) with respect to the theoretical filter transfer function.
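The square-wave behaviour with all-zero inputs can be reproduced with a few lines of Python. The sketch assumes that the negation in the {I, Q, -I, -Q} sequence is a logical inversion of the single-bit sample, and that the RF stream rate is twice the main clock frequency (i.e. $4*f_c$ with $f_{CK} = 2*f_c$). The function names are hypothetical.

```python
def rf_stream(i_bits, q_bits):
    """Interleave single-bit I/Q samples following {I, Q, -I, -Q}.

    Even-indexed sample pairs are passed as is, odd-indexed pairs are
    logically inverted (assumed model of the sign change).
    """
    out = []
    for n, (i, q) in enumerate(zip(i_bits, q_bits)):
        if n % 2 == 0:
            out += [i, q]
        else:
            out += [1 - i, 1 - q]
    return out


def smallest_period(seq):
    """Smallest p such that seq repeats with period p."""
    for p in range(1, len(seq) + 1):
        if all(seq[n] == seq[n % p] for n in range(len(seq))):
            return p
    return len(seq)


def fundamental_frequency(f_ck_hz, period_samples):
    """Fundamental of a periodic stream sampled at 2 * f_CK."""
    return 2 * f_ck_hz / period_samples


stream = rf_stream([0] * 4, [0] * 4)
period = smallest_period(stream)
f_out = fundamental_frequency(1.8e9, period)
```

With all inputs at zero the stream repeats every four samples, so for a 1.8 GHz main clock the fundamental lies at 900 MHz, i.e. at $f_{CK}/2$.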



Fig. 5.8. RLC BPF transfer function: measured vs. theoretical (functional)

This comparison reveals a degradation of the peak output power for center frequencies above 1.2 GHz (corresponding to a digital core frequency above 4.8 GHz). Post-fabrication extracted simulations show that this limitation is mainly due to the signal generation in the digital core circuit: the clock-tree generation and the signal multiplexing at $4*f_c$ degrade when working at higher frequencies, which leads to a loss of useful samples and non-ideal interleaving of the I and Q paths.


This is also confirmed by the evolution of the power consumption with respect to the center frequency at a fixed supply voltage of 1 V (Fig. 5.9). For a center frequency  $f_c > 1.8$  GHz, the power consumption starts to decrease as a consequence of the non-ideal clock-tree operation, which is in agreement with the observations in [Muller11]. Consequently, due to the digital core limitation, the following measurements were performed in the frequency range [0.3; 1.8] GHz.



Fig. 5.9. Power consumption vs. center frequency

Furthermore, in Fig. 5.10 we highlight the embedded FIR filter performance, when transmitting a 9 MHz baseband single-tone at a carrier frequency  $f_c$  = 900 MHz, whereas the main input clock frequency is  $f_{CK}$  = 2\* $f_c$  = 1.8 GHz. The measured peak output power is  $P_{out,pk}$  = -2.3 dBm, for an overall power consumption of 31.4 mW at 1 V supply voltage, out of which the front-end (FE) output stage consumes 7.4 mW and the back-end (BE) digital core 24 mW, respectively.

The resulting OOB noise attenuation is around -50 dBc and reflects mainly the embedded FIR filtering transfer function. Furthermore, it is seen that the main digital circuit operation at half the center frequency  $f_c/2$  and the  $V_{DD}$  supply noise due to high-frequency switching determine the presence of low amplitude signal images at  $f_c \pm f_c/2$ .

The measured output power of -2.3 dBm is consistent with the theoretical estimation in Chapter 4, considering the output power value in Table 4.4 (Case 5), the load resistance  $R_L = 50 \Omega$, and -1 dB of RLC attenuation (Fig. 5.8).



Fig. 5.10. Case 1: Measured spectrum 9 MHz BB single-tone,  $f_{CK} = 1.8$  GHz

The peak output power and the out-of-band noise attenuation can be further improved, using adapted transmission lines at the carrier frequency  $f_c$  = 900 MHz (from 2  $\Omega$  to 50  $\Omega$  and from 4  $\Omega$  to 50  $\Omega$ ), to lower the load resistance and increase the quality factor of the RLC filter.

### 5.3.2. Case 2: FIR filter with transmission line adaptation

This case proposes to improve the performance obtained in *Case 1* in terms of peak output power and OOB noise attenuation, by introducing an adapted transmission line at the carrier frequency  $f_c$  = 900 MHz. Two line adaptations have been designed, from 2  $\Omega$  to 50  $\Omega$  and from 4  $\Omega$  to 50  $\Omega$ , in order to evaluate the effect of the RLC transfer function, and the parasitic resistances in the signal path due to BGA packaging and PCB.

**Case 2a** corresponds to a PCB *power* with an adapted transmission line from 2  $\Omega$  to 50  $\Omega$ , and an LC matching network at the carrier frequency  $f_c$  = 900 MHz, built with RF inductors on the BGA and PCB.

**Case 2b** corresponds to a PCB *power* with an adapted transmission line from 4  $\Omega$  to 50  $\Omega$ , and the same LC matching network used in *Case 2a*.

The measured peak output power in *Case 2a* is 2.2 dBm, when transmitting a 9 MHz baseband single-tone at a carrier frequency of 900 MHz and 1 V supply voltage. In *Case 2b* the peak output power is 2.9 dBm using a 4  $\Omega$  transmission line adaptation, and corresponds to an improvement of 0.7 dB with respect to *Case 2a* and 5.2 dB with respect to *Case 1*, respectively (as summarized in Table 5.1).

The difference between the measured peak output power in *Case 2(a/b)* and theoretical value (derived in Chapter 4) arises due to the effect of the series parasitic resistance in the signal path. The initial estimation was based on the equivalent on-resistance of the SC PA stage and the inherent resistance of the RF inductors, resulting in an equivalent 0.7  $\Omega$  parasitic resistance.

However, the output power values obtained in measurement are lower than expected, due to external factors related to the test bench, such as BGA soldering, pad access, and the RF output traces on the BGA and PCB. Taking into account the results from *Case 1*, *Case 2a* and *Case 2b*, we can estimate the total series parasitic resistance at approximately 4.6  $\Omega$, composed of the FIR-PA equivalent on-resistance of 0.4  $\Omega$, the parasitic resistance of the RF inductors of 0.3  $\Omega$, and a series parasitic resistance due to the test bench of 3.9  $\Omega$.
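The impact of this estimate can be illustrated with a minimal sketch. It assumes a simple resistive divider, in which an ideal source drives the load $R_L$ through the series resistance $R_{ps}$, so that the power delivered to the load scales as $R_L/(R_L + R_{ps})^2$. This simplified model and the function names are assumptions made for illustration; they do not reproduce the derivation of Section 4.2.3.

```python
import math


def relative_load_power(r_load, r_series):
    """Power delivered to r_load (arbitrary units) through r_series."""
    return r_load / (r_load + r_series) ** 2


def power_drop_db(r_load, r_ps_initial=0.7, r_ps_estimated=4.6):
    """Output power drop [dB] when R_ps rises from the initial to the estimated value."""
    ratio = (relative_load_power(r_load, r_ps_initial)
             / relative_load_power(r_load, r_ps_estimated))
    return 10 * math.log10(ratio)


# Breakdown quoted in the text [ohm]: FIR-PA, RF inductors, test bench
r_ps_total = 0.4 + 0.3 + 3.9

drop_2ohm = power_drop_db(2.0)  # Case 2a, 2-to-50 ohm adaptation
drop_4ohm = power_drop_db(4.0)  # Case 2b, 4-to-50 ohm adaptation

print(f"R_ps = {r_ps_total:.1f} ohm")
print(f"2 ohm load: {drop_2ohm:.2f} dB, 4 ohm load: {drop_4ohm:.2f} dB")
```

Under this assumption, the computed drops are close to the differences between the two theoretical output power values listed for *Case 2a* and *Case 2b* in Table 5.1 (7.8 dB and 5.3 dB, respectively).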

This effect is also visible in Fig. 5.11, which plots the measured RLC transfer function against the theoretical transfer function obtained for a load resistance  $R_L = 2 \Omega$  and a parasitic resistance  $R_{ps} = 0.7 \Omega$  (using the same method as in Fig. 5.8). The OOB noise attenuation of the measured BPF is lower than in the ideal case, because the additional parasitic resistance reduces the filter quality factor.



Fig. 5.11. RLC BPF transfer function: measured vs. theoretical (*power* 2  $\Omega$  to 50  $\Omega$ )

### 5.3.3. Case 3: Higher frequency operation

The circuit was also measured at a center frequency  $f_c = 1.6$  GHz using a PCB *power* version with adapted transmission lines 2  $\Omega$  to 50  $\Omega$ . The peak output power is -0.8 dBm, when transmitting a 16 MHz baseband single-tone at 1 V supply voltage. The output stage consumes 20.6 mW and the digital core 46 mW, respectively.

The digital mixer performance is degraded due to the aforementioned operating frequency limitation (Section 5.3.1), which produces an image component at -17 dBc, because the Side A MUX circuit does not perform signal resynchronization after the I/Q multiplexing function (Section 4.3.3).

The measured results in this case confirm the conclusions of *Case 1*, and provide information regarding future improvements/optimizations of the digital core circuit to enable an efficient operation for center frequencies above 1.8 GHz.

Finally, a summary of the measurement test cases is presented in Table 5.1. The theoretical expected values in terms of peak output power and consumption were derived with respect to Section 4.2.3 for two values of the series parasitic resistance,  $R_{ps} = 0.7 \Omega$  and  $R_{ps} = 4.6 \Omega$ , when operating at a center frequency of 900 MHz. It is seen that the measured results closely match the theoretical expected values when considering  $R_{ps} = 4.6 \Omega$ .

| Test    | PCB   | Transmission line | $f_c$ [GHz] | Peak $P_{out}$ meas. [dBm] | Peak $P_{out}$ th. [dBm]           | Cons. FE meas. [mW] | Cons. FE th. [mW]                    | [%]  |
|---------|-------|-------------------|-------------|----------------------------|------------------------------------|---------------------|--------------------------------------|------|
| Case 1  | func. | 50 Ω              | 0.9         | -2.3                       | -2.3                               | 7.4                 | 4.6 <sup>a</sup>/5.8 <sup>b</sup>    | 7.9  |
| Case 2a | pow.  | 2-50 Ω            | 0.9         | 2.2                        | 10 <sup>a</sup>/2.2 <sup>b</sup>   | 10.2                | 17.3 <sup>a</sup>/9.3 <sup>b</sup>   | 16.2 |
| Case 2b | pow.  | 4-50 Ω            | 0.9         | 2.9                        | 8.2 <sup>a</sup>/2.9 <sup>b</sup>  | 10.8                | 12.7 <sup>a</sup>/10.2 <sup>b</sup>  | 18.1 |
| Case 3  | pow.  | 2-50 Ω            | 1.6         | -0.8                       | 10 <sup>a</sup>/2.2 <sup>b</sup>   | 20.6                | 19.8 <sup>a</sup>/11.8 <sup>b</sup>  | 0.6  |

Table 5.1. Test cases description

 $^{\rm a}$  Equivalent series parasitic resistance of 0.7  $\Omega$ 

 $^{\rm b}$  Equivalent series parasitic resistance of 4.6  $\Omega$ 
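The last column of Table 5.1 can be cross-checked with a short calculation. The sketch below assumes that this column is the front-end efficiency, i.e. the measured peak output power divided by the measured front-end consumption. That definition is an assumption, and *Case 3* is left out because its tabulated value does not follow the same ratio.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10 ** (p_dbm / 10)


# (measured peak Pout [dBm], measured FE consumption [mW]) from Table 5.1
cases = {
    "Case 1": (-2.3, 7.4),
    "Case 2a": (2.2, 10.2),
    "Case 2b": (2.9, 10.8),
}

# Assumed definition: efficiency = Pout / FE consumption
efficiency = {
    name: 100 * dbm_to_mw(p_dbm) / fe_mw
    for name, (p_dbm, fe_mw) in cases.items()
}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.2f} %")
```

For the three 900 MHz cases, the computed ratios agree with the tabulated values to within about 0.1 percentage point.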

In conclusion, during measurement we identified an operating frequency limitation, due to the clock-tree generation and the signal multiplexing at  $4^*f_c$ , which translates to reduced performance in terms of peak output power and distortion in the signal-band for center frequencies higher than 1.8 GHz. In addition, the peak output power is also reduced due to an estimated series parasitic resistance of 4.6  $\Omega$  (~4  $\Omega$  more than estimated) present in the signal path. Therefore, an extensive set of measurements was performed for the test *Case 2b* which presents the best measured peak output power, in order to validate the proposed digital RF transmitter concept for sinewave and LTE signals (LTE 10 MHz and LTE 20 MHz) at the 900 MHz band. The results are further presented and compared with relevant state-of-the-art publications in Section 5.4.

## 5.4. Experimental IC validation on LTE standard at 900 MHz band

The best performance of the proposed digital transmitter in terms of peak output power is obtained in *Case 2b*. The measured output spectrum of a 9 MHz baseband single-tone at a peak output power of 2.9 dBm emphasizes the out-of-band attenuation provided by the digital band filter composed of 109-tap FIR and RLC filtering (Fig. 5.12). The overall power consumption is 34.8 mW at 1 V supply voltage, out of which the front-end consumes 10.8 mW and the back-end 24 mW, respectively. The resulting in-band peak SNR is ~66 dB, which corresponds to 10.5 effective number of bits (ENOB).
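As a quick check of the SNR-to-ENOB conversion, the sketch below applies the usual relation for a full-scale sine wave, $ENOB = (SNR - 1.76)/6.02$. The text does not state which conversion was used, so this relation is an assumption.

```python
def enob_from_snr(snr_db):
    """Effective number of bits for a full-scale sine wave (ideal quantizer relation)."""
    return (snr_db - 1.76) / 6.02


enob = enob_from_snr(66.0)
print(f"ENOB = {enob:.2f} bits")
```

An SNR of exactly 66 dB gives slightly more than 10.5 bits with this relation, which remains compatible with the reported value given that the SNR is quoted as approximate.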



Fig. 5.12. Measured spectrum 9 MHz BB single-tone: RF output vs. DSM input

In terms of out-of-band noise, the proposed architecture achieves around -70 dBc at an offset of  $-f_c/3 = -300$  MHz, when transmitting at the center frequency  $f_c = 900$  MHz. This result is consistent with the system and circuit-level study presented in the previous chapters, which focused on obtaining around -70 dBc noise attenuation at the GPS band, located at  $-f_c/3$, when transmitting at  $f_c = 2.4$  GHz.

The FD-SOI body-bias  $V_t$  tuning feature, described in [Cathelin17], was used in the digital to RF mixer and the output FIR-PA blocks to adjust the CMOS transistors switching operation, and reduce switching non-idealities as detailed in Section 3.4 and Section 4.1.2.

It is confirmed that the DRFM stage is very sensitive to switching non-idealities, due to the clock-tree generation and high-frequency signal multiplexing. The body-bias tuning applied in the digital core has a major impact on finding an optimum operating point, which attenuates the LO and image components to -55 dBc and -63 dBc, respectively, whereas the counter 3<sup>rd</sup>-order intermodulation product (CIM3) remains below -46 dBc (Fig. 5.13). For instance, the image component is reduced by around 30 dB, thanks to the improvement of the I/Q path interleaving achieved through a balanced pMOS-nMOS switching operation (Section 4.1.2).



Fig. 5.13. 6 dB back-off fine-tuning with FD-SOI body-bias

In the case of the output FIR-PA, the effect of body-biasing is shown to be minimal in terms of noise performance. This can be explained by the fact that the impact of non-idealities resulting from the DRFM block is dominant with respect to non-ideal CMOS inverter operation. Hence, if the DRFM switching non-idealities are not reduced locally (as shown in Fig. 5.13), they will be translated further to the FIR-PA driving signals, with a major impact on the output stage performance. In this case, we estimate that the use of a global body-bias tuning in the FIR-PA will not be effective in the multi-path switching-mode structure, as it cannot ensure an optimum CMOS inverter operation (due to non-ideal driving signals) for all the paths at the same time.

Furthermore, we identify the presence of increased even-order harmonics due to the non-ideal differential to single mode conversion. We recall from Chapter 3, that in order to reduce switching redundancy and filter complexity, the positive and negative driving signals of the FIR-PA are not purely differential and present three possible values  $RFP/RFN = \{1/0; 0/0; 0/1\}$ , which may lead to additional common mode non-idealities.

In addition, the performance in terms of third harmonic (Hm3) rejection is degraded due to circuit non-idealities. In Chapter 3, the FIR-PA non-idealities were studied in order to evaluate their effects on the out-of-band noise attenuation. The derived non-ideality models (inverter switching, coefficient mismatch and supply voltage variation) can also be applied to assess the signal-band performance. Applying the same non-ideality design specifications (Section 3.5) results in a degradation of the Hm3 rejection with respect to the ideal case in the presence of switching non-idealities ( $\sim$ 26 dB for variable propagation delays and jitter) and  $V_{DD}$  supply variation (20 to 40 dB). Moreover, switching non-idealities are shown to also affect the even-order harmonics performance.

The output power and ACLR performances for LTE 10 MHz and 20 MHz transmitted signals are evaluated in Fig. 5.14, highlighting the spectral mask compliance.



Fig. 5.14. Output power and ACLR for LTE 10 MHz and 20 MHz

The maximum output power and ACLR for LTE10 are -2.8 dBm and -33/-41 dBc, whereas for LTE20 we obtained -3.2 dBm and -33.5/-40 dBc, respectively, for an overall power consumption of 37 mW at a 1 V supply voltage.

The 28nm FD-SOI transmit chain performance is summarized and further compared with reported digital-intensive architectures in Fig. 5.15.

|                                            | ISSCC'15<br>[Jin15]                         | ISSCC'17<br>[Roverato17]                           | ISSCC'16<br>[PFilho16]                       |           | This work                                   |             |
|--------------------------------------------|---------------------------------------------|----------------------------------------------------|----------------------------------------------|-----------|---------------------------------------------|-------------|
| Architecture                               | Quadrature DRFC<br>(Common IQ)              | 10b ΔΣ + Mismatch<br>Shaping 10b DAC<br>(Dual I/Q) | RQDAC +<br>passive mixer<br>(Dual I/ Dual Q) |           | Digital Mixer +<br>SC FIR-PA<br>(common IQ) |             |
| Carrier Frequency<br>[MHz]                 | 800                                         | 900                                                | 900                                          | 2400      | 900                                         |             |
| Supply voltage [V]<br>Front-end / Back-end | 1.1 / 1.1                                   | 1.5 / 0.9                                          | 0.9 / 1.1                                    |           | 1/1                                         |             |
| DAC Resolution [bits]                      | 6<br>(physical)                             | 10<br>(physical)                                   | 12<br>(physical)                             |           | 1<br>(10.5 ENOB)                            |             |
| Peak P <sub>out</sub> [dBm]                | 13.9                                        | 1.2                                                | 3.5                                          |           | 2.9                                         |             |
| LO feedthrough [dBc]                       | -                                           | -61                                                | < -57                                        |           | < -55                                       |             |
| Image [dBc]                                | -                                           | -36                                                | < -42                                        |           | < -60                                       |             |
| CIM3 [dBc]                                 | -                                           | -67                                                | < -50                                        |           | < -46                                       |             |
| Modulation signal                          | LTE 10MHz                                   | LTE 20MHz                                          | 20 MHz                                       |           | LTE 10MHz                                   | LTE 20MHz   |
| Average P <sub>out</sub> [dBm]             | 6.97                                        | 0.9                                                | -3.5                                         |           | -2.8                                        | -3.2        |
| ACLR [dBc]                                 | -32.4 / -34.9                               | -60.7 / -61.6                                      | -42 / -59                                    | -47 / -59 | -33 / -41                                   | -33.5 / -40 |
| Consumption [mW]<br>Front-end / Back-end   | CW: 60.7 / - (FPGA)<br>LTE: 17.1 / - (FPGA) | LTE 20MHz:<br>75 / 22 (w/o ΔΣ)                     | 20MHz:<br>11.1 / -                           | 20MHz:<br>24.8 / - | CW: 10.8 / 24<br>LTE: 8.2 / 29              |             |
| Active area [mm <sup>2</sup> ]             | 0.24                                        | 0.82                                               | 0.22                                         |           | 0.047 (incl. RF pads)                       |             |
| Matching network                           | Off-chip                                    | On-Chip                                            | Off-chip                                     |           | On BGA                                      |             |
| Technology                                 | 28nm                                        | 28nm                                               | 28nm                                         |           | 28nm FD-SOI                                 |             |

Fig. 5.15. Measured performance summary and comparison with state-of-the-art

First, [Jin15] introduces a 6-bit switched-capacitor I/Q sharing direct-RF conversion (DRFC) structure, implementing a disable-opposite-cell technique to improve drain efficiency. This technique digitally deactivates same-size cell pairs driven by 180 degrees out-of-phase signals before the power stage. Using an LC matching network at 900 MHz, the measured peak output power is 13.9 dBm when performing continuous-wave measurements at 289 points (17 points for each of the I and Q symbols). However, due to the low-resolution scheme, the spectrum mask for LTE 10 MHz is not met, which requires additional sharp filters to reduce RX band noise [Jin17].

Furthermore, [Roverato17] demonstrates a digital Delta-Sigma Modulator mismatch shaping architecture to provide an efficient reduction of out-of-band emissions at a programmable duplex distance. Hence, static mismatch effects are minimized by scrambling the order of conversion cells at each sampling instant which transforms static mismatch nonlinearity into pseudorandom noise. Yet, this architecture is operated at a large supply voltage of 1.5 V, which determines a considerable output stage power consumption of 75 mW at a peak output power of 1.2 dBm, and may lead to integration issues in applications targeting limited supply voltage mobile devices.

Finally, [PFilho16] presents a 12-bit resistive quadrature DAC with dual in-phase/dual quadrature signal paths based on an incremental-charge operation to reduce the power consumption required to drive the RF load. This architecture is demonstrated to work at both the 900 MHz and 2.4 GHz bands and can drive a 50  $\Omega$  load at a peak output power of 3.5 dBm using an off-chip balun. Additionally, the digital input data is read from an integrated memory of 4096 words, and simple predistortion is applied to the baseband signal to compensate for the switch conductance variation with respect to the output voltage.

Our architecture implements a 28nm FD-SOI digital transmit chain combining single-bit processing, SC network and band filtering. This work stands out for out-of-band noise attenuation, reduced SC structure complexity, low power consumption and area. At similar output power levels, the FIR-PA (at 1 V) consumes 7 times less than a 10-bit DSM-based DAC (at 1.5 V) [Roverato17] and 25% less than a 12-bit resistive DAC (at 0.9 V) [PFilho16]. The total active area is at least 4 times lower than that of the smallest previous design, for the same technology node.
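These ratios can be reproduced from the entries of Fig. 5.15. The pairing of values used below (continuous-wave front-end consumption against [Roverato17], LTE front-end consumption against the 900 MHz figure of [PFilho16]) is an assumption about which entries the comparison refers to.

```python
# Front-end consumption [mW] and active area [mm^2] taken from Fig. 5.15
fe_this_work_cw = 10.8
fe_this_work_lte = 8.2
fe_roverato17 = 75.0
fe_pfilho16_900mhz = 11.1
area_this_work = 0.047
area_pfilho16 = 0.22

# Assumed pairings, see the lead-in
ratio_vs_dsm_dac = fe_roverato17 / fe_this_work_cw
saving_vs_resistive_dac = 1 - fe_this_work_lte / fe_pfilho16_900mhz
area_ratio = area_pfilho16 / area_this_work

print(f"{ratio_vs_dsm_dac:.1f}x lower consumption than the 10-bit DSM-based DAC")
print(f"{100 * saving_vs_resistive_dac:.0f} % lower consumption than the 12-bit resistive DAC")
print(f"{area_ratio:.1f}x smaller active area")
```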

## 5.5. Conclusion

This chapter presents the experimental validation of an all-digital transmitter concept combining single-bit processing and extended (109-tap FIR) band filtering for low power and area mobile devices developed to support emerging IoT applications.

The proposed TX is compliant with LTE 900 MHz signals and achieves state-of-the-art performance at similar output power compared to recent relevant work. Furthermore, the circuit operation is proven up to a center frequency of 1.8 GHz. This work stands out for low power consumption thanks to the single-bit DSM-based core solution combined with band filtering (FIR and RLC band-pass) and low area achieved with a multi-layer FIR-PA cell structure.

During measurements, we identified several possible improvements in terms of peak output power, and maximum operating frequency, to enable better operating performance and multi-standard compliance.

On the one hand, the measured peak output power was lower than expected when using low-impedance transmission line adaptation, due to parasitic resistances in the signal path corresponding to packaging and PCB. For a 50  $\Omega$  load, the series parasitic resistance can be neglected and the measured results are consistent with circuit-level simulations (Section 5.3.1). However, when reducing the output load to 2  $\Omega$  or 4  $\Omega$, the value of the series parasitic resistance can be estimated at 4.6  $\Omega$, which is ~4  $\Omega$  larger than the initial estimation based only on the inverter on-resistance and the parasitic resistance of the RF inductors (Section 5.3.2). A possible way to overcome the output power loss due to parasitics would be to integrate the matching network on-chip [Roverato17], which requires a balance between the output power and the increased occupied area.

On the other hand, the maximum operating frequency is limited to around 1.8 GHz by the performance of the digital core circuit. In a second IC version, the digital core would be designed and integrated hierarchically (block by block), paying special attention to the clock-tree generation and signal interfacing. Extensive simulation including extraction would allow the identification of critical design blocks and a better control of the digital design flow. Furthermore, integrating the digital SPI separately would ensure its correct operation and provide the TX with extended configurability in terms of FIR filter transfer function and matching network center frequency, as described in Chapter 4.

In conclusion, the proposed concept of all-digital transmitter demonstrates the transition from traditional analog to highly integrated digital-intensive transmitters and may play an important role in the future of mobile applications.

## 5.6. Chapter Bibliography

**[Cathelin17]** A. Cathelin, "RF/analog and mixed-signal design techniques in FD-SOI technology," *IEEE Custom Integrated Circuits Conference (CICC)*, pp. 1-53, May 2017.

**[Euroc1]** Eurocircuits (online), http://www.eurocircuits.com/, Nov. 2016.

**[Euroc2]** RO4000<sup>®</sup> Series (online), http://www.eurocircuits.com/wp-content/uploads/ec2015/ecImage/document/RO4003%20and%20RO4350B%20Rogers.pdf, Nov. 2016.

**[Hartley17]** R. Hartley, "RF / Microwave PC Board Design and Layout" (online), www.jlab.org/accel/eecad/pdf/050rfdesign.pdf, Nov. 2016.

**[Jin15]** H. Jin et al., "Efficient Digital Quadrature Transmitter Based on IQ Cell Sharing," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 168–169, Feb. 2015.

**[Jin17]** H. Jin, D. Kim, and B. Kim, "Efficient Digital Quadrature Transmitter Based on IQ Cell Sharing," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1345-1357, May 2017.

**[Marin17b]** R.-C. Marin, A. Frappé, B. Stefanelli, P. Cathelin, A. Cathelin, A. Kaiser, "A 28nm FD-SOI CMOS Digital RF Transmitter with Switched-Capacitor Pre-Power Amplifier and Embedded Band Filter," **to be submitted** to *Journal of Solid-State Circuits*.

**[Muller11]** J. Muller, "60 GHz wireless transmitter with SDR capabilities," PhD Thesis, University of Science and Technology Lille 1, Sept. 2011.

**[NI17]** TX-LINE: Transmission Line Calculator (online), http://www.awrcorp.com/products/additional-products/tx-line-transmission-line-calculator, Nov. 2016.

**[PFilho16]** P.E. Paro Filho et al., "A 0.22mm<sup>2</sup> CMOS resistive charge-based direct-launch digital transmitter with -159dBc/Hz out-of-band noise," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 250–252, Feb. 2016.

**[Roverato17]** E. Roverato et al., "All-digital RF transmitter in 28nm CMOS with programmable RX-band noise shaping," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 222–223, Feb. 2017.

# 6. Conclusion

This chapter concludes this work on the study, design and validation of a 28nm FD-SOI CMOS digital RF transmitter with switched-capacitor power amplifier and embedded band filter. Furthermore, we present a view of future concept development to support highly-ambitious emerging Internet of Things (IoT) applications.

## 6.1. Research conclusion

The research work is concentrated on the development of an all-digital transmitter based on a *hybrid* single-bit operation and 109-tap FIR filter to address the challenges of tomorrow's integrated communication systems.

The general idea behind the concept is to reduce the main operation to simple functions. Hence, instead of adding complexity inside a given function, we propose an efficient way to merge and optimize simple functions together. For instance, [Yoo13] implements a class-G switched-capacitor power amplifier using a 3-level supply scheme with *GND*,  $V_{DD}$,  $2*V_{DD}$. In order to support higher supply voltages and reduce gate stress and leakage, the switch is designed in cascode with 2 additional output series pMOS-nMOS transistors.

In our case, the switch is based on a simple CMOS inverter in a switched capacitor scheme which performs a dual function of digital to analog conversion and band filtering. We note that the architecture in [Yoo13] is compatible with the proposed FIR-PA and may be used to provide additional functionality, such as the embedded-FIR digital to RF mixing presented in Section A.2 [Marin16].

Furthermore, the combination of single-bit operation and CMOS inverter switching allows the reduction of the FIR filter complexity by half, while maintaining the attenuation performance of the full FIR filter. This has a major positive impact on area and power consumption and proves the feasibility of the proposed SC PA topology operating at low supply voltages. However, in order to maintain the output power performance, the supply voltage reduction needs to be compensated by the use of lower load resistances, which raises additional issues in terms of circuit integration, e.g. integrated inductors present a limited quality factor.
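The trade-off between supply voltage and load resistance can be illustrated with a minimal sketch. It assumes an ideal rail-to-rail square wave (0 to $V_{DD}$) whose fundamental is delivered to the load through a lossless band-pass filter, which gives $P_{out} = 2V_{DD}^2/(\pi^2 R_L)$. This textbook expression ignores the DSM coding, the FIR coefficients and all losses, so only the $V_{DD}^2/R_L$ scaling is meaningful here; the function names and the example values are hypothetical.

```python
import math


def ideal_pout_w(v_dd, r_load):
    """Fundamental power [W] of an ideal 0..V_DD square wave into r_load."""
    v_fund_peak = 2 * v_dd / math.pi  # peak amplitude of the fundamental
    return v_fund_peak ** 2 / (2 * r_load)


def load_for_same_power(v_dd_old, r_load_old, v_dd_new):
    """Load resistance that keeps the ideal output power when V_DD changes."""
    return r_load_old * (v_dd_new / v_dd_old) ** 2


# Illustrative values only: halving the supply from 2 V to 1 V
r_new = load_for_same_power(2.0, 50.0, 1.0)
print(f"required load: {r_new} ohm")
print(f"{ideal_pout_w(2.0, 50.0) * 1e3:.2f} mW vs {ideal_pout_w(1.0, r_new) * 1e3:.2f} mW")
```

Halving the supply voltage requires a four times lower load resistance to keep the same ideal output power, which is why the series parasitic resistance and the inductor quality factor become critical at low supply voltages.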

This way, the proposed digital transmitter architecture challenges the traditional design based on high-resolution DAC architectures. It presents a structure complexity (number of unitary cells) comparable to a 10-bit DAC with fewer control signals (55 for Half-SC, and 68 for a segmented 10-bit DAC), and an overall noise performance better than 11-bit in the presence of non-idealities.

The circuit measurement results validate the proposed concept of all-digital transmitter in terms of low power, reduced area and large out-of-band attenuation. Furthermore, for each design phase we identified and proposed several improvements to enable extended configurability and better performance in terms of output power and operating frequency to support multi-standard applications. For example, we may recall the optimization of the number of signal lines and the signal interface between digital core and FIR-PA, the introduction of additional noise-shaping features, and the reduction of the series parasitic resistance in the signal path.

In conclusion, the proposed architecture demonstrates the transition from traditional analog to highly integrated digital-intensive transmitters, by pushing the digital domain up to the antenna to open the way towards smaller, faster, *smarter* all-digital transmitters.

## 6.2. Future directions

The proposed architecture combining 28nm FD-SOI CMOS technology and the transition from analog to digital signal processing enables extended configurability for software defined radio (SDR), leading to accessible and efficient solutions for *smart* mobile devices to support highly-ambitious Internet of Things (IoT) applications, such as *Smart City* [Shahrour17].

Let us now imagine the future development of the proposed all-digital transmitter concept, building upon the present achievements to enhance system efficiency and configurability at superior operating frequencies.

The **system efficiency** depends on the output power and dissipated power. Hence, in order to improve the efficiency, we should study the possibility of increasing the output power, while maintaining or reducing the dissipated power. One way to increase output power was already mentioned in the previous chapter and involves the use of an on-chip matching network to reduce parasitic resistances in the signal path due to packaging and PCB. For instance, [Roverato17] introduces an I/Q path combination through an on-chip RF balun to match a 50  $\Omega$  external load. In addition, the low power and reduced area of the proposed PA provide ample design margin, making it possible to replicate the structure and introduce an on-chip power combiner, similar to [Passamani17].

Furthermore, we propose the study of an adaptive power scheme specific to back-off operation. System-level simulations show that reducing the input signal power reduces the maximum number of active PA cells (the output power depends on the ratio between the active and total number of cells), whereas the DSM-driven switching activity (and therefore the dissipated power) remains almost constant. Hence, we envision the implementation of a dynamic adaptive scheme which adjusts the number of active PA cells proportionally (to maintain the FIR filter coefficient ratio), in order to increase the system efficiency at back-off power levels.

In terms of **operating frequency**, the proposed system benefits from extended configurability running at half the carrier frequency ( $f_c/2$ ). However, the digital to RF upconversion mixer implemented as a 2-step multiplexer at the final frequency of  $4^*f_c$ , is found to be critical in terms of dissipated power and frequency limitation. Hence, the objective would be to reduce the basic operating frequency of the mixer, which could be achieved as in the case of the DSM, by introducing a time-interleaved scheme. [Koh14] presents an extended study on time-interleaved RF carrier modulation, based on mixer arrays where the outputs are modulated by a series of low-frequency time-delayed carriers, and interleaved in the time domain to synthesize the final output (Fig. 6.1).



Fig. 6.1. M-TI mixer array: diagram (left); rectangular pulse trains (right) [Koh14]

Moreover, [Walling09] introduces the concept of pulse-width pulse-position modulation (PWPM) in a class-E PA, where the input signal amplitude is mapped to the width of a single pulse, and the phase to the edge timing of the same pulse, respectively (Fig. 6.2). However, the performance of this scheme is degraded when transmitting signals with large PAPR at higher carrier frequencies, since the minimum signal amplitude determines the minimum pulse-width, which becomes a larger fraction of the switching period.
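The narrow-pulse issue can be quantified with a minimal sketch. It relies on the Fourier series of a rectangular pulse train, whose fundamental amplitude is proportional to $\sin(\pi d)$ for a duty cycle $d$, so that a normalized amplitude $a$ requires $d = \arcsin(a)/\pi$. This textbook mapping is an assumption and does not necessarily match the exact scheme of [Walling09]; the function names and the example amplitude are hypothetical.

```python
import math


def duty_cycle_for_amplitude(a_norm):
    """Duty cycle (0..0.5) giving a_norm times the fundamental of a 50 % duty-cycle pulse."""
    if not 0.0 <= a_norm <= 1.0:
        raise ValueError("normalized amplitude must lie in [0, 1]")
    return math.asin(a_norm) / math.pi


def pulse_width_s(a_norm, f_carrier_hz):
    """Absolute pulse width [s] for a given normalized amplitude and carrier frequency."""
    return duty_cycle_for_amplitude(a_norm) / f_carrier_hz


# Illustrative amplitude 20 dB below the peak
width_900mhz = pulse_width_s(0.1, 900e6)
width_2g4hz = pulse_width_s(0.1, 2.4e9)

print(f"900 MHz: {width_900mhz * 1e12:.1f} ps, 2.4 GHz: {width_2g4hz * 1e12:.1f} ps")
```

For the same normalized amplitude, the required pulse width shrinks in proportion to the carrier period, so the shortest pulse that the circuit can generate limits the achievable dynamic range more severely as the carrier frequency increases.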





[Cho16] introduces a technique to overcome the issue of narrow-pulses by employing carrier-switching, which uses the fundamental component of the carrier frequency for large signal amplitudes, and the second harmonic of half of the carrier frequency for small signal amplitudes.

It is now clear that the pulse positioning concept (TI mixer or PWM-based) can be an efficient alternative to 2-step multiplexing, thanks to the reduced timing constraints and power consumption. Hence, the maximum operating frequency would no longer be limited by the high-frequency signal sampling, and would depend on the design of the delay cell used to define the pulse position in time. Recent works show that a granular delay of 110 ps - 500 ps with an energy efficiency of 12.5 fJ/event can be achieved using a 28nm FD-SOI CMOS thyristor-type delay element [Sourikopoulos16]. Additionally, this scheme employs body-bias V<sub>t</sub> fine-tuning to obtain a fine/coarse delay control, with a minimum fine sensitivity of 50 fs/mV and coarse sensitivity of 600 fs/mV.

Consequently, implementing the signal up-conversion using the pulse positioning concept and the delay element in [Sourikopoulos16] would provide a power efficient solution allowing carrier frequencies over 10 GHz.
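As a rough order-of-magnitude check, a delay step can be converted into a carrier phase step with Δφ = 360° · f_c · Δt. The sketch below applies this relation to the fine and coarse sensitivities quoted above; the 1 mV bias step is an assumed value chosen only for illustration.

```python
def phase_step_deg(delay_step_s, f_c):
    """Carrier phase shift (degrees) produced by a delay step at frequency f_c."""
    return 360.0 * f_c * delay_step_s

f_c = 10e9          # carrier frequency targeted in the text
bias_step_mv = 1.0  # ASSUMED bias resolution, not given in the source

fine = phase_step_deg(50e-15 * bias_step_mv, f_c)     # 50 fs/mV fine sensitivity
coarse = phase_step_deg(600e-15 * bias_step_mv, f_c)  # 600 fs/mV coarse sensitivity
print(round(fine, 3), round(coarse, 3))
```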

### **6.3.** Chapter Bibliography

**[Cho16]** K. Cho, and R. Gharpurey, "A Digitally Intensive Transmitter/PA Using RF-PWM With Carrier Switching in 130 nm CMOS," *IEEE Journal Of Solid-State Circuits*, vol. 51, no. 5, pp. 1188-1199, May 2016.

**[Koh14]** K.-J. Koh, S. Y. Mortazavi, S. Afroz, "Time Interleaved RF Carrier Modulations and Demodulations," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 2, pp. 573-586, Feb. 2014.

**[Marin16]** R.-C. Marin, A. Frappé, A. Kaiser, "Delta-Sigma Based Digital Transmitters with Low-Complexity Embedded-FIR Digital to RF Mixing," *23rd IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, Monte Carlo, pp. 237-240, Dec. 2016.

**[Roverato17]** E. Roverato et al., "All-digital RF transmitter in 28nm CMOS with programmable RX-band noise shaping," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 222–223, Feb. 2017.

**[Passamani17]** A. Passamani et al., "A 1.1V, 28.6dBm Fully Integrated Digital Power Amplifier for Mobile and Wireless Applications in 28nm CMOS Technology with 35% PAE," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 232–233, Feb. 2017.

**[Shahrour17]** I. Shahrour, "Smart City: Governance of the Information System", *Global Management Conference (GMC)*, Lille, June 2017.

**[Sourikopoulos16]** I. Sourikopoulos, A. Frappé, A. Cathelin, L. Clavier, A. Kaiser, "A Digital Delay Line with Coarse/Fine tuning through Gate/Body biasing in 28nm FDSOI", in *42<sup>nd</sup> European Solid-State Circuits Conference (ESSCIRC)*, Lausanne, pp. 145-148, Sept. 2016.

**[Walling09]** J. S. Walling et al., "A Class-E PA With Pulse-Width and Pulse-Position Modulation in 65 nm CMOS," *IEEE Journal Of Solid-State Circuits*, vol. 44, no. 6, pp. 1668-1678, June 2009.

**[Yoo13]** S.-M. Yoo, J. S. Walling et al., "A Class-G Switched-Capacitor RF Power Amplifier," *IEEE Journal Of Solid-State Circuits*, vol. 48, no. 5, pp. 1212-1224, May 2013.

## A. Multi-standard coexistence

The use of a Delta-Sigma modulator in digital transmitter architectures has the advantage of improved SNR in the band of interest. However, the resulting out-of-band noise becomes an issue for multi-standard coexistence, which increases the complexity of the succeeding filtering stage. These constraints can be relaxed in the DSM stage [Marin17a] or in the digital to RF mixing stage [Marin16].

### A.1. Complex Delta-Sigma Modulators

For 1-bit high-speed DSMs, a FIR-DAC based mixed-signal stage [Gebreyohannes16] or the proposed Half-SC FIR-PA architecture offers a good alternative to analog filters. Nevertheless, the designed filter requires a large number of signal taps in order to meet the stringent noise specifications, resulting in an area and power consumption penalty due to the implementation of additional coefficient cells and control logic.

These constraints could be relaxed in the DSM stage by placing complex zeros near the frequency bands where low noise levels are needed, depending on the targeted application, thus obtaining a complex Delta-Sigma modulator (CDSM). For example, [Nzeza08] introduces a 5<sup>th</sup> order CDSM used to meet the spurious emissions requirements for the UMTS and DCS 1800 receive bands. Furthermore, the Delta-Sigma Toolbox in [Schreier16] can also be used to synthesize the noise transfer functions for quadrature delta-sigma modulators, with the following parameters: the order of the NTF, the oversampling ratio (OSR), the center frequency, the rms (root mean square) in-band noise gain, the rms image-band noise gain, and the number of image-band zeros. However, both methods have limitations regarding the design of CDSMs. On the one hand, the approach in [Nzeza08] requires a large computational effort (20 different coefficients need to be set) to optimize the poles and zeros, which makes it difficult to apply to other scenarios. On the other hand, the method in [Schreier16] cannot be used to set zeros based on noise level requirements, since it can only target the image band (symmetric to the transmit band with respect to  $f_c$ ).

In order to overcome the aforementioned limitations, we propose to introduce two additional cross-couplings from the outputs of the I and Q quantizers, which cancel the complex coefficients in the characteristic polynomial and decouple the optimization of the poles from the placement of the zeros. Thus, the modified architecture can be obtained using automatic tools with flexible custom zero placement, for improved out-of-band noise performance.

The proposed design method is applied to a 4<sup>th</sup> order CDSM (Fig. A.1 a) as part of a WLAN digital transmitter chain with the following signal and noise transfer functions (when  $c = c_x$ )

$$STF(z) = \frac{1}{den(z)} \tag{6.1}$$

$$NTF(z) = \frac{(z-1) \cdot ((z-1)^2 + g_1) \cdot (z - (r - jc \cdot z^{-0.5}))}{den(z)} \tag{6.2}$$

$$den(z) = a_1 + a_2 \cdot (z-1) + a_3 \cdot (z-1)^2 + (z-r+a_4) \cdot (z-1) \cdot ((z-1)^2 + g_1) \tag{6.3}$$

Compared to other known methods [Nzeza08] [Schreier16], the proposed CDSM can ease multi-standard coexistence using a generic well-known architecture and may achieve a noise level reduction of 20-30 dB at specific frequency bands (e.g. 3G, GPS) (Fig. A.1 b).
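To illustrate how the complex factor of the NTF numerator produces an asymmetric notch, the sketch below evaluates the numerator of Eq. (6.2) on the unit circle and searches for the minimum of the complex factor. The values of g<sub>1</sub>, r and c are hypothetical and were chosen only to make the effect visible; they are not the coefficients used in [Marin17a].

```python
import cmath
import math

def complex_factor(z, r, c):
    """Complex-zero factor of Eq. (6.2): z - (r - j*c*z^-0.5)."""
    return z - (r - 1j * c * z ** -0.5)

def ntf_numerator(z, g1, r, c):
    """Numerator of Eq. (6.2), evaluated as written."""
    return (z - 1) * ((z - 1) ** 2 + g1) * complex_factor(z, r, c)

def notch_angle(r, c, points=4096):
    """Normalized angular frequency in (-pi, pi) minimizing the complex factor."""
    grid = [-math.pi + 2 * math.pi * k / points for k in range(1, points)]
    return min(grid, key=lambda w: abs(complex_factor(cmath.exp(1j * w), r, c)))

g1, r, c = 0.01, 0.9, 0.4  # HYPOTHETICAL coefficients for illustration
w0 = notch_angle(r, c)
at_notch = abs(complex_factor(cmath.exp(1j * w0), r, c))
at_mirror = abs(complex_factor(cmath.exp(-1j * w0), r, c))
print(round(w0, 3), at_notch < at_mirror)
print(abs(ntf_numerator(1 + 0j, g1, r, c)))
```

The notch appears on one side of DC only, while the mirrored frequency is left essentially unattenuated, which is the property exploited to target a specific coexistence band.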

Moreover, we studied a complementary mechanism for zeros placement, by introducing additional delays in the cross-coupling paths of the CDSM block. Therefore, we can obtain a fine/coarse control over the coexistence bandwidth, which simplifies the architecture implementation, thanks to the reduced number of [r, c] coefficient pairs.



Fig. A.1. 4<sup>th</sup> order CDSM architecture (a); Simulated output spectrum (b) [Marin17a]

Finally, we note that the proposed scheme is ideal in the digital domain, but it may suffer from static mismatch when translating to analog/RF, which can reduce the theoretical notch placement efficiency. This issue was identified and solved in [Roverato14] in the case of a DSM-based 10-bit current-steering DAC, used to reduce out-of-band emissions at specific RX bands. In order to minimize the effect of static mismatch, the architecture employs a dynamic element matching (DEM) technique based on a tree-structure encoder, to scramble the order of the conversion cells at each sampling instant, and transform static mismatch nonlinearity into pseudorandom noise. Therefore, an adapted version of this technique to single-bit CDSM may prove highly efficient in compensating the static mismatch of the succeeding output stage, to achieve close to ideal notch placement performance.
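The effect of element scrambling can be illustrated with a deliberately simplified model. The sketch below uses plain random cell selection instead of the tree-structure encoder of [Roverato14], and a generic thermometer-coded array of mismatched unit cells; the cell count, the mismatch level and the function names are assumptions.

```python
import random

def make_cells(n_cells, sigma, rng):
    """Unit cells with nominal weight 1 and a static Gaussian mismatch."""
    return [1.0 + rng.gauss(0.0, sigma) for _ in range(n_cells)]

def convert_fixed(code, cells):
    """Always use the first `code` cells: the error is static and code dependent."""
    return sum(cells[:code])

def convert_scrambled(code, cells, rng):
    """Pick `code` cells at random at every sample (simplified DEM)."""
    return sum(rng.sample(cells, code))

rng = random.Random(1)
cells = make_cells(16, 0.02, rng)   # hypothetical 16-cell array, 2 % mismatch
mean_weight = sum(cells) / len(cells)

code = 5
trials = 20000
average = sum(convert_scrambled(code, cells, rng) for _ in range(trials)) / trials
print(convert_fixed(code, cells) - code * mean_weight)  # static error
print(average - code * mean_weight)                     # averages towards zero
```

With fixed selection, every code carries its own repeatable error, which appears as nonlinearity. With scrambling, the same code averages to the code multiplied by the mean cell weight, so the mismatch is reduced to a gain error plus a noise-like component.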

### A.2. Embedded-FIR digital to RF mixing

Low-complexity embedded-FIR filters are attractive for relaxing the filtering constraints while keeping systems as digital as possible, in order to benefit from integration in advanced CMOS nodes [Gaber11] [Flament08].

A FIR IQ DDRM (direct digital to RF modulator) transmitter is proposed in [Gaber11], in order to enable a 4<sup>th</sup> order RF FIR filter which reduces the quantization noise by more than 22 dB at 900 MHz. Furthermore, in [Flament08] several 1-bit power DACs are combined in order to obtain an RF FIR at high power levels (power gain of 14 dB). In addition, [Pozsgay08] uses an additional delayed DAC which enables a simple FIR function in order to insert 2 notches at ±500 MHz when transmitting an 802.11g signal at  $f_c = 2.45$ GHz.
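The notch placement obtained by adding a delayed copy of the signal can be sketched with a generic two-tap filter H(z) = 1 + e^{jφ}·z^{-d}. This is a textbook model and not the circuit of any of the cited works; the delay d, the phase φ (a hypothetical complex tap used here to shift the notches) and the normalized sampling frequency are assumptions.

```python
import cmath
import math

def fir_response(f, f_s, delay, phi=0.0):
    """Magnitude of H(z) = 1 + exp(j*phi) * z^-delay at frequency f."""
    w = 2 * math.pi * f / f_s
    return abs(1 + cmath.exp(1j * phi) * cmath.exp(-1j * w * delay))

def notch_frequencies(f_s, delay, phi=0.0):
    """Notches in [0, f_s): they satisfy w * delay = pi + phi + 2*pi*m."""
    found = []
    for m in range(-delay, 2 * delay):
        f = (math.pi + phi + 2 * math.pi * m) / (2 * math.pi * delay) * f_s
        if 0 <= f < f_s:
            found.append(f)
    return sorted(found)

f_s = 1.0  # normalized sampling frequency
real_tap = notch_frequencies(f_s, delay=5)
complex_tap = notch_frequencies(f_s, delay=5, phi=math.pi / 2)
print([round(f, 3) for f in real_tap])
print([round(f, 3) for f in complex_tap])
```

A real-valued delayed tap yields a notch grid that is symmetric around DC and around f_s/2, while a phase-rotated tap shifts the whole grid by φ/(2πd)·f_s, which gives the freedom to place notches asymmetrically.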

The architecture proposed in [Marin16] implements symmetric and asymmetric FIR filtering embedded in the digital to RF mixing stage (Fig. A.2), in a similar manner as shown in [Pozsgay08] while avoiding the use of an additional delayed DAC, thanks to simple logic operations (AND/NOR) which eliminate redundancy.





Fig. A.2. Symmetric embedded-FIR (a); asymmetric unbalanced embedded-FIR (b)

The symmetric and asymmetric unbalanced embedded-FIR were simulated in MATLAB in order to show the flexibility of the proposed architecture in the placement of symmetric or asymmetric notches for reduced noise level constraints (Fig. A.3 for k = 5).



Fig. A.3. Output spectrum: Asymmetric unbalanced/Symmetric embedded-FIR (k = 5)

Finally, we suggest the implementation of a fully-digital architecture to provide a highly flexible real/complex FIR solution with configurable FIR order, which could be integrated in a future version of the proposed all-digital transmitter in order to further reduce the complexity of the FIR-PA output stage.

### A.3. Chapter Bibliography

**[Flament08]** A. Flament, A. Frappé, A. Kaiser, B. Stefanelli, A. Cathelin, H. Ezzeddine, "A 1.2 GHz Semi-Digital Reconfigurable FIR Bandpass Filter with Passive Power Combiner," *34th European Solid-State Circuits Conference (ESSCIRC)*, pp. 418-421, Sept. 2008.

**[Gaber11]** W. M. Gaber, P. Wambacq, J. Craninckx, and M. Ingels, "A CMOS IQ direct digital RF modulator with embedded RF FIR-based quantization noise filter," *2011 Proceedings of the ESSCIRC (ESSCIRC)*, pp. 139-142, Sept. 2011.

**[Gebreyohannes16]** F. T. Gebreyohannes, A. Frappé, and A. Kaiser, "A Configurable Transmitter Architecture for IEEE 802.11ac and 802.11ad Standards," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 1, pp. 9-13, Jan. 2016.

**[Marin16]** R.-C. Marin, A. Frappé, A. Kaiser, "Delta-Sigma Based Digital Transmitters with Low-Complexity Embedded-FIR Digital to RF Mixing," *23rd IEEE International Conference on Electronics, Circuits and Systems (ICECS)*, Monte Carlo, pp. 237-240, Dec. 2016.

**[Marin17a]** R.-C. Marin, A. Frappé, A. Kaiser, "Considerations for Complex Digital Delta-Sigma Modulators for Standard Coexistence in Digital Wireless Transmitters," **accepted** to *IEEE Trans. Circuits Syst. I, Reg. Papers,* June 2017.

**[Nzeza08]** C. Nsiala Nzéza et al., "Reconfigurable complex digital Delta-Sigma modulator Synthesis for Digital Wireless Transmitters," in *4<sup>th</sup> European Conference on Circuits and Systems for Communications (ECCSC)*, pp. 320-325, July 2008.

**[Pozsgay08]** A. Pozsgay et al., "A Fully Digital 65nm CMOS Transmitter for the 2.4-to-2.7GHz WiFi/WiMAX Bands using 5.4GHz  $\Delta\Sigma$  RF DACs," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 360-619, Feb. 2008.

**[Roverato14]** E. Roverato, M. Kosunen, J. Lemberg, K. Stadius, and J. Ryynänen, "RX-Band Noise Reduction in All-Digital Transmitters With Configurable Spectral Shaping of Quantization and Mismatch Errors," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 11, pp. 3256-3265, Nov. 2014.

**[Schreier16]** R. Schreier (2016, April). "Delta-Sigma Toolbox," [Online]. Available: http://www.mathworks.fr.