# An Integrated Circuit with Transmit Beamforming and Parallel Receive Channels for Real-Time Three-Dimensional Ultrasound Imaging

Ira Wygant, Hyunjoo Lee, Amin Nikoozadeh, David T. Yeh, Ömer Oralkan, Mustafa Karaman<sup>\*</sup>, and Butrus T. Khuri-Yakub

E. L. Ginzton Laboratory, Stanford University, Stanford, CA 94304. Email: iwygant@stanford.edu

<sup>\*</sup>Electronics Engineering Department, Işik University, Istanbul, Turkey. Email: karaman@isikun.edu.tr

Abstract— We present the design of an integrated circuit (IC) that will be flip-chip bonded to a $16 \times 16$-element capacitive micromachined ultrasonic transducer (CMUT) array. The IC provides 16 receive channels, which can be configured to receive along either of the array diagonals or on any single row of the array. On transmit, all 256 elements can be used to transmit arbitrarily focused beams. Focused transmission with the full array is made possible by on-chip pulsers and memory. A 25-V pulser and an 8-bit shift register are provided for each element of the array. Prior to each transmit, new delay values are loaded into the shift registers. Current-controlled one-shots control the transmit pulse widths. Circuit simulations and the IC layout are presented. Simulations predict that delay values can be loaded in less than 1.3 $\mu$s and show the generation of precisely timed pulses. The IC is being prepared for submission to National Semiconductor for fabrication in a high-voltage BiCMOS process.

## I. INTRODUCTION

We have demonstrated volumetric ultrasound imaging with a $16 \times 16$-element CMUT array which is flip-chip bonded to an integrated circuit that provides the front-end circuitry for the system [1]. To simplify the design for that initial implementation, only a single element can be active at a time. Thus, the images are acquired using classic synthetic aperture (CSA) imaging. CSA imaging is the easiest to implement in hardware but suffers from low signal-to-noise ratio (SNR) and prominent grating lobes in comparison with phased array imaging techniques. We are now developing a new version of the front-end IC which can transmit steered beams and provides 16 parallel receive channels. In [2] and [3] we examine different beamforming strategies for implementation with this new IC.

Here we present the design, simulation, and layout of the IC. A diagram illustrating an ultrasound probe based on the IC is shown in Fig. 1. To transmit steered beams without providing a cable for each element in the array, the IC stores an 8-bit delay value for each element of the array; cables are needed only to serially load the delay values and for the 8-bit count provided by the system. In receive, 16 channels can be configured to receive on either diagonal of the array or on any one row. Full-aperture transmit and a reconfigurable receive aperture allow for the implementation of numerous beamforming algorithms. As described in [2], using the full array for transmit and receiving on an X-shaped aperture provides real-time frame rates and image quality comparable to using the full aperture for both transmit and receive.

Fig. 1. CMUT array flip-chip bonded to an integrated circuit (IC). The IC stores transmit delay information so that the full array can be used in transmit without providing a cable for every element of the array. Sixteen receive channels on the IC can be configured for different receive apertures.

## II. CIRCUIT DESIGN

A top-level diagram of the IC is shown in Fig. 2. Most of the IC consists of a $16 \times 16$-element array of transmit circuit blocks. A two-phase clock and 16 digital lines are used to load delay information into the IC. When transmitting, the 8-bit global count signal counts in Gray code. Each pulser fires when its stored delay value is equal to the global count. A reset signal resets the comparators used for the count comparison. Pulse width is set by one-shots, whose pulse widths are determined by a DC bias current. On receive, a six-bit signal is used to select a receive aperture shape (Fig. 3). The receive aperture consists of sixteen elements, which are connected to a row of preamplifiers and buffers along the bottom of the circuit.
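The Gray-code counting mentioned above can be illustrated with a short Python sketch. It uses the standard reflected-binary Gray code; the paper does not say which Gray code variant the system uses, so that choice, along with the function and variable names, is an assumption made for illustration. The property worth noting is that exactly one count line toggles per step, which plausibly (this is not stated in the paper) avoids transient false matches at the comparators.

```python
def binary_to_gray(n: int) -> int:
    """Return the reflected-binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of the code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# The 8-bit sequence a system could drive onto the count lines (assumed encoding).
GRAY_SEQUENCE = [binary_to_gray(i) for i in range(256)]

# Number of count lines that toggle between consecutive steps of the sequence.
BITS_CHANGED = [bin(a ^ b).count("1")
                for a, b in zip(GRAY_SEQUENCE, GRAY_SEQUENCE[1:])]
```

Because the comparison in each transmit cell is a plain equality test, the stored delay value only needs to use the same encoding as the global count.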

Fig. 2. Top-level diagram of the integrated circuit showing the cables needed between the system and circuit.

Fig. 3. The full array is used for transmit. The 16 receive channels can be configured to receive along either diagonal or along any row.

The transmit circuit is shown in Fig. 5. An 8-bit shift register stores a Gray-code count value. The shift registers of all the transmit cells in a row are connected in series and loaded serially by a single digital line. The circuit used for a single bit of the shift register is shown in Fig. 6(a) [4]. An 8-bit comparator compares the value stored in the shift register with a global count value. The comparator has a precharged output that goes low when the comparison is true. The output stays low until the reset signal is provided. The comparator output connects to a one-shot circuit. The delay circuit used for the one-shot is shown in Fig. 6(b). The one-shot output is connected to the pulser shown in Fig. 6(d), which leads to the flip-chip bond pad and CMUT element.
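The behavior described above (a serially loaded row of shift registers, each followed by a comparator whose output latches until reset) can be modeled in a few lines of Python. This is a behavioral sketch only, not the circuit: the class name `TransmitRow`, the bit order within a word, and the position of cell 0 relative to the serial input are all assumptions that the paper does not specify.

```python
from dataclasses import dataclass, field
from typing import List

BITS = 8             # delay word width per element (from the text)
CELLS_PER_ROW = 16   # transmit cells chained along one row (from the text)


@dataclass
class TransmitRow:
    """One row: a 128-bit serial chain feeding 16 latched comparators."""
    chain: List[int] = field(default_factory=lambda: [0] * (BITS * CELLS_PER_ROW))
    fired: List[bool] = field(default_factory=lambda: [False] * CELLS_PER_ROW)

    def shift_in(self, bit: int) -> None:
        """One load clock: the new bit enters at position 0, the rest move along."""
        self.chain = [bit & 1] + self.chain[:-1]

    def load(self, values: List[int]) -> None:
        """Serially load one word per cell (ASSUMED: cell 0 is nearest the input)."""
        if len(values) != CELLS_PER_ROW:
            raise ValueError("expected one value per cell")
        # The last cell's word goes in first so that it ends up deepest in the chain.
        for value in reversed(values):
            for j in reversed(range(BITS)):      # ASSUMED: MSB shifted first
                self.shift_in((value >> j) & 1)

    def word(self, cell: int) -> int:
        """Read back the 8-bit value currently held by one cell."""
        bits = self.chain[cell * BITS:(cell + 1) * BITS]
        return sum(b << j for j, b in enumerate(bits))

    def reset(self) -> None:
        """Model the reset signal that re-arms every comparator."""
        self.fired = [False] * CELLS_PER_ROW

    def apply_count(self, count: int) -> List[int]:
        """Present a global count value and return the cells that fire now."""
        newly_fired = []
        for cell in range(CELLS_PER_ROW):
            if not self.fired[cell] and self.word(cell) == count:
                self.fired[cell] = True          # output stays latched until reset
                newly_fired.append(cell)
        return newly_fired
```

A typical use would be `row.load(values)`, then `row.reset()`, then one `row.apply_count(c)` call per count value. The latch in `apply_count` mirrors the statement that the comparator output stays low until the reset signal is provided, so each cell fires at most once per reset.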

A high-voltage switch connects the flip-chip bond pad to one of the amplifiers at the bottom of the chip. All of the elements in a column share the same amplifier and buffer. Receive aperture select signals and decoders at the edge of the chip select which element in the column will be connected to the amplifier and buffer.
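Since every column shares one amplifier, each receive aperture amounts to choosing one element per column. The Python sketch below lists the element selected in each column for the three aperture types named in the text. The function name, the string labels, and the zero-based indexing are hypothetical; the actual six-bit select encoding is not given in the paper and is not modeled here.

```python
from typing import List, Tuple

N = 16  # the array is N x N, with one shared amplifier per column


def receive_aperture(kind: str, row: int = 0) -> List[Tuple[int, int]]:
    """Return the (row, column) element connected to each column amplifier.

    kind is "row" (uses the row argument), "diag", or "antidiag".
    """
    if kind == "row":
        if not 0 <= row < N:
            raise ValueError("row must be in 0..15")
        return [(row, col) for col in range(N)]
    if kind == "diag":
        return [(col, col) for col in range(N)]
    if kind == "antidiag":
        return [(N - 1 - col, col) for col in range(N)]
    raise ValueError(f"unknown aperture kind: {kind!r}")
```

The 16 rows and two diagonals give 18 distinct apertures, each using all 16 amplifiers exactly once.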

The pseudocode shown in Fig. 4 describes the image acquisition procedure. First, Gray-code count values are loaded serially into the shift registers of the transmit circuits. The outputs of the comparators are then precharged with a reset signal. To transmit, the global counter counts in Gray code from 0 to 255. To receive, a receive aperture is selected and if not already on, the receiving amplifiers are powered on.
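As an illustration of the delay values that would be loaded before each transmit, the following Python sketch computes geometric focusing delays for a $16 \times 16$ array. It is not taken from the paper: the element pitch (set equal to the 250-$\mu$m cell size reported for the layout), the speed of sound, the coordinate convention, and the function name are all assumptions.

```python
import math
from typing import List

N = 16                 # elements per side (from the text)
PITCH_M = 250e-6       # ASSUMPTION: pitch equal to the 250-um cell size
SOUND_SPEED = 1540.0   # ASSUMPTION: speed of sound in soft tissue, m/s


def focus_delays(fx: float, fy: float, fz: float) -> List[List[float]]:
    """Per-element transmit delays (s) that focus at (fx, fy, fz), in meters.

    The array is centered at the origin in the z = 0 plane. The element
    farthest from the focus fires first (delay 0); nearer elements wait longer.
    """
    centre = (N - 1) / 2.0
    dist = [[math.sqrt(((c - centre) * PITCH_M - fx) ** 2
                       + ((r - centre) * PITCH_M - fy) ** 2
                       + fz ** 2)
             for c in range(N)]
            for r in range(N)]
    farthest = max(max(row) for row in dist)
    return [[(farthest - d) / SOUND_SPEED for d in row] for row in dist]
```

These delays are continuous values in seconds; they would still need to be mapped to 8-bit count values before being loaded into the shift registers.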

    FOR B = 1:Number of transmit beams
        Load transmit focusing delays
        Reset transmit comparators
        Fire all pulsers by incrementing COUNT<0:7> from 0 to 255
        Specify receive aperture
        Enable amplifiers for reception

Fig. 4. Image acquisition procedure.

Fig. 7. Shift register simulated at 100 MHz.

The preamplifiers, output buffers, and pulsers are the same as those used in [1]. One change made here is that a single amplifier is shared among sixteen elements. Switches between each transducer and the amplifier reduce the parasitic capacitance of this shared connection, but the capacitance is still greater than that for a dedicated amplifier. The extra capacitance is expected to be about 500 fF, which in simulation does not significantly increase the amplifier noise. For this design, even with slightly higher input capacitance, the capacitance in parallel with the feedback resistor still has the dominant effect on circuit performance [5].

## III. SIMULATIONS AND LAYOUT

The IC is designed for a National Semiconductor process with high-voltage CMOS, 1.5-$\mu$m low-voltage CMOS, and bipolar devices. Circuit simulations were run to predict the performance of individual circuit blocks and verify top-level functionality.

Fig. 7 shows the simulated loading of a shift register at 100 MHz. At this speed and with 16 parallel lines, it takes about 1.3 $\mu$s to load 256 8-bit registers. Even if a substantially lower clock speed is used, the time needed to load the delay values should have little effect on the imaging frame rate. The delay values can also be loaded while receiving, although in that case switching noise from the digital circuits might affect receive SNR.
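The load time quoted above follows directly from the bit count per serial line. The short Python sketch below reproduces the arithmetic; the function name and parameter names are illustrative, and the defaults are the figures given in the text.

```python
def load_time_s(n_elements: int = 256, bits_per_element: int = 8,
                n_lines: int = 16, clock_hz: float = 100e6) -> float:
    """Time to shift in every delay word when n_lines chains load in parallel."""
    bits_per_line = n_elements * bits_per_element // n_lines  # 128 bits per row
    return bits_per_line / clock_hz


t_100 = load_time_s()               # 128 bits / 100 MHz, about 1.28 us
t_50 = load_time_s(clock_hz=50e6)   # 128 bits / 50 MHz, about 2.56 us
```

Each of the 16 lines carries 128 bits, so halving the clock rate only doubles the load time to a few microseconds, consistent with the statement that load time should have little effect on frame rate.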



Fig. 5. The circuit provided for each element of the array.

Fig. 6. Circuit schematics. (a) A single bit of the shift register. (b) Delay circuit used in one-shot. (c) Transimpedance preamplifier.



Fig. 8. One-shot and pulse generation simulations.

The one-shot pulse width is the same for every element in the array. The width is set based on the center frequency of the transducer array. Fig. 8 shows how the one-shot bias current controls pulse width.

Fig. 9 shows the generation of timed output pulses after the shift registers have been loaded with delay values. For this simulation, delay values were loaded into a row of transmit circuits at 50 MHz. The global counter was then incremented from 135 to 145 (a range which included the stored delay values) causing each of the pulsers to fire. Note that the global count value is incremented only when pulses are desired. In this way, each pulser can be set to fire at an arbitrary time with no limitation on its delay relative to other pulses.
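Because the count is advanced only when pulses are desired, the stored values act as firing-order indices rather than as fixed time steps. One way a system might exploit this is sketched below in Python: distinct delays are ranked, each element stores the rank of its delay, and the system presents count value `k` at the k-th distinct firing time. The function name, the `resolution_s` timing grid, and the use of count value 0 as the first firing slot are assumptions, not details from the paper.

```python
from typing import List, Tuple


def schedule_counts(delays_s: List[float],
                    resolution_s: float) -> Tuple[List[int], List[float]]:
    """Map per-element delays to 8-bit count values and count-step times.

    Returns (counts, step_times): counts[i] is the value to store for element i
    (before Gray encoding), and step_times[k] is when count value k is presented.
    """
    # Quantize each delay to the system's timing grid (hypothetical parameter).
    ticks = [round(d / resolution_s) for d in delays_s]
    distinct = sorted(set(ticks))
    if len(distinct) > 256:
        raise ValueError("more than 256 distinct firing times for an 8-bit count")
    rank = {t: k for k, t in enumerate(distinct)}
    counts = [rank[t] for t in ticks]
    step_times = [t * resolution_s for t in distinct]
    return counts, step_times
```

With 256 elements there can never be more than 256 distinct firing times, so under these assumptions an 8-bit count is always sufficient for one array, whatever the spacing between pulses.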

The total power consumption during receive is expected to be about 150 mW. Roughly one-third of this is consumed by the preamplifiers with the remainder used by the output buffers.

Fig. 9. Simulation showing generation of timed pulses.

Fig. 10 shows the circuit layout, which is complete except for the reconfigurable receive aperture circuitry. Because not all of the circuitry needed for each element fits in a 250-$\mu$m $\times$ 250-$\mu$m space, there are two arrays of circuitry. The pulser, one-shot, flip-chip bond pad, switch, and the associated digital logic fit in a 250-$\mu$m $\times$ 250-$\mu$m space which sits directly under each transducer element. Approximately 30% of that space is consumed by the pulser, 20% by the one-shot, and 10% by the switch. The remainder is consumed by the bond pad and routing. Significant space is required for routing because only two metal layers are available in the process and many connections are made between the two circuit arrays.

The layout of a single pulser and shift-register block occupies 250 $\mu$m $\times$ 380 $\mu$m, with each consuming about half of that area. For the design of the digital blocks, tradeoffs were made between area and reliability. For the shift register, the PMOS transistors in the transmission gates and the feedback inverter at the output could have been removed to reduce the area consumed, but they were preserved for reliability. The comparator was implemented using standard-cell XOR and NAND gates with a precharged NAND at the output. An implementation using only precharged logic would have consumed less area but would have required another clock and might have been less reliable.

The total chip area is approximately 1 cm $\times$ 6 mm. Two sides of the IC are free of wire-bond pads so that a $2 \times 2$ array of the ICs can be tiled together to create a $32 \times 32$-element array.

## IV. CONCLUSION

We have completed the design and simulation of a new IC with transmit beamforming. It is expected that this IC will provide substantially better image quality in comparison with our previous implementation of a front-end IC.



Fig. 10. Circuit layout, with the amplifier and buffer rows labeled.

#### ACKNOWLEDGMENT

We would like to thank National Semiconductor for fabrication of the integrated circuit as well as their help with circuit design and simulations.

Dr. Karaman is supported by TÜBİTAK of Turkey through grant 106M333.

#### REFERENCES

- [1] I. O. Wygant et al., "An endoscopic imaging system based on a two-dimensional CMUT array: real-time imaging results," presented at the 2005 IEEE International Ultrasonics Symposium, Rotterdam, The Netherlands, Sep. 18–21, 2005.
- [2] I. O. Wygant, M. Karaman, Ö. Oralkan, and B. T. Khuri-Yakub, "Beamforming and hardware design for a multichannel front-end integrated circuit for real-time 3D catheter-based ultrasonic imaging," presented at SPIE Medical Imaging 2006, San Diego, USA, Feb. 11–16, 2006.
- [3] I. O. Wygant, M. Karaman, Ö. Oralkan, and B. T. Khuri-Yakub, "Volumetric imaging using fan-beam scanning with reduced redundancy 2D arrays," presented at the 2006 IEEE International Ultrasonics Symposium, Vancouver, Canada, Oct. 3–6, 2006.
- [4] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*. Prentice Hall, 2003.
- [5] J. Graeme, *Photodiode Amplifiers: Op Amp Solutions*. Boston: McGraw Hill, 1996.