This is an author produced version of a paper published in:


DOI: http://dx.doi.org/10.1109/NORCHIP.2005.1596986

Copyright: © 2005 IEEE

Access to the published version may require subscription
Thermal Verification on FPGAs

Eduardo Boemo and Sergio López-Buedo
School of Engineering
Universidad Autónoma de Madrid, Spain
http://www.ii.uam.es
e-mail: eduardo.boemo@uam.es

Abstract
Thermal verification of complex ICs can help the designer detect whether a particular block is working beyond its specifications. A simple method is to extract the output frequencies of an array of ring oscillators previously distributed across the die. The main advantage is that neither external transducers nor analog parts are necessary. Another possibility is to bias one of the clamping diodes usually present in the pads and measure its junction forward voltage. In both cases, temperature can be measured under actual working conditions, that is, with the chip inside the case with its heat sink and fan.

1. Introduction
The well-known formula

\[ T_j = T_A + \theta_{JA} P_D \qquad (1) \]

where \(T_j\) is the junction temperature, \(T_A\) the ambient temperature, \(\theta_{JA}\) the junction-to-ambient thermal resistance of the package and \(P_D\) the dissipated power, is appropriate for characterizing chip packages [1][9], but it is only a coarse model for practical applications. The expression does not take into account factors such as the particular PCB traces, the hard-to-model influence of the heat sink and fan, the board position, or the interaction with hot devices located near the chip. Infrared cameras can be used to obtain a thermal map of a chip, but they require a direct view of the silicon; thus, they cannot be employed under actual working conditions.
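As a worked illustration of Eq. (1), the sketch below evaluates the junction temperature for one operating point. The values of \(T_A\), \(\theta_{JA}\) and \(P_D\) are hypothetical and are not taken from this paper or from any datasheet. Note that the whole board environment collapses into the single lumped constant \(\theta_{JA}\), which is precisely why the formula cannot capture the local effects listed above.

```python
def junction_temperature(t_ambient_c: float, theta_ja_c_per_w: float, power_w: float) -> float:
    """Junction temperature from Eq. (1): T_j = T_A + theta_JA * P_D."""
    return t_ambient_c + theta_ja_c_per_w * power_w

# Hypothetical operating point (assumed values, not measured in this work):
T_A = 25.0       # ambient temperature, °C
THETA_JA = 20.0  # junction-to-ambient thermal resistance, °C/W
P_D = 0.5        # dissipated power, W

print(junction_temperature(T_A, THETA_JA, P_D))  # 35.0
```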

The implementation of on-chip thermal transducers avoids the drawbacks described above. The main techniques for building temperature sensors in CMOS technology exploit analog effects such as the temperature dependence of the junction forward voltage, or the Seebeck effect [2]. In this paper, ring oscillators are employed as temperature transducers (Fig. 1). This type of circuit can be easily implemented using a few logic cells. The approach has several advantages: a) like other on-chip sensors, it measures the junction temperature rather than the package temperature; b) all signals are digital, so they can be routed using the general interconnection network of the board; c) the sensor itself is small: practical circuits use one or two logic blocks, and a minimum-size sensor can fit in a single I/O pad; d) a sensor, or even an array of them, can be placed in virtually any position of the chip, making it possible to build a thermal map of the die; and e) in FPGA technology, the sensor can be dynamically inserted or removed.

Quenot et al. [3] have proposed using ring oscillators to measure both temperature and power-supply fluctuations. The oscillator is enabled during a fixed period, and a counter with a scan path is used to read back the resulting frequency. At the PCB level, a thermal monitoring method based on measuring the resistance of a copper trace has been proposed in [4].
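The gated-counter readout described above can be modeled, in an idealized way, as follows. This is a sketch under stated assumptions: no jitter, no metastability at the gate edges, and a hypothetical oscillator frequency; it is not the circuit of [3]. It shows that the frequency resolution is simply the inverse of the gate time, so a longer window gives finer resolution at the cost of more sensor activity.

```python
def gated_count(freq_hz: float, gate_s: float) -> int:
    """Edges an ideal counter captures while the oscillator is enabled for gate_s seconds."""
    return int(freq_hz * gate_s)

def estimate_frequency(count: int, gate_s: float) -> float:
    """Frequency reconstructed from the counter value read back through the scan path."""
    return count / gate_s

def resolution_hz(gate_s: float) -> float:
    """One count of quantization error corresponds to 1/gate_s Hz."""
    return 1.0 / gate_s

gate = 4e-3             # a 4 ms gate window (the value used in Section 2)
f_true = 14_285_714.0   # hypothetical oscillator frequency, Hz
n = gated_count(f_true, gate)
print(n, estimate_frequency(n, gate), resolution_hz(gate))
```

With a 4 ms gate the estimate is quantized in steps of about 250 Hz, so the calibration slope of the sensor (in Hz per °C) determines the smallest detectable temperature change.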

In this work, an array of ring oscillators and counters is used. In FPGA technology, the method is also compatible with dynamic reconfiguration, as the sensor (ring oscillator plus its associated counters) can be inserted on the fly. The sensors can be operated remotely, which makes them suitable for applications that require high reliability, such as avionics or on-board satellite circuits.

Fig.1: A ring-oscillator scheme
2. Sensor Calibration

Ring oscillators can be constructed manually or by using an automatic partitioning, placement and routing design flow. In the latter case, the designer must include high-level directives to prevent the tools from removing pairs of cascaded inverters (an even number of inverters is logically a buffer) during compilation. In FPGAs, it is also useful to place the inverters in distant LUTs in order to increase wiring delays. Unusually for digital design, the goal here is to decrease the operating frequency in order to minimize self-heating, extra power consumption, and counter size. Fig. 2 depicts a detailed layout of an oscillator in a Xilinx FPGA. The oscillator was manually placed and routed using 4 CLBs. It has an overall loop delay of 35 ns: 23.2 ns corresponding to LUTs and 11.8 ns to wiring. The output is buffered (through another CLB) to prevent different output paths from presenting different capacitive loads.
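The link between loop delay, oscillation frequency and counter size can be made concrete with the 35 ns oscillator of Fig. 2. The sketch assumes the usual ring-oscillator relation in which each half-period is one trip of an edge around the loop, so the period is twice the loop delay; the 4 ms window is the measurement time given later in this section.

```python
import math

def ring_frequency(loop_delay_s: float) -> float:
    """Each edge needs one loop traversal per half-period, so T = 2 * loop delay."""
    return 1.0 / (2.0 * loop_delay_s)

def counter_bits(freq_hz: float, gate_s: float) -> int:
    """Smallest counter width that cannot overflow during a gate window of gate_s seconds."""
    max_count = math.ceil(freq_hz * gate_s)
    return max(1, max_count.bit_length())

loop_delay = 23.2e-9 + 11.8e-9   # LUT delay + wiring delay = 35 ns (Fig. 2)
f = ring_frequency(loop_delay)   # about 14.3 MHz
bits = counter_bits(f, 4e-3)     # 4 ms measurement window
print(f"{f / 1e6:.2f} MHz, {bits}-bit counter")  # 14.29 MHz, 16-bit counter
```

Doubling the loop delay would halve the frequency and save one counter bit, which illustrates why deliberately slow rings are preferred here.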

Fig.2: A ring-oscillator layout on FPGAs

To calibrate the sensors accurately, the FPGA samples must be placed in a temperature-controlled oven. In our case, to save time, a configuration containing all the oscillators was downloaded, but only one of them was enabled during its characterization. The chip temperature was measured by placing an iron-constantan (Fe-CuNi) thermocouple at the center of the package. A set of long cables (nearly one meter) was necessary to configure and control the FPGA externally, as well as to carry the oscillator outputs outside the oven. To prevent excessive power consumption in the sensors due to these large off-chip loads, a driver (74HC125) was inserted near the FPGA to isolate the ring outputs from the cables.
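One way to turn the oven measurements into a usable calibration is a per-sensor straight-line fit of frequency versus temperature, which is then inverted during operation. This is a minimal sketch assuming the response is close to linear over the range of interest; the paper does not state the calibration model, and the oven readings below are hypothetical. The negative slope reflects the fact that CMOS gate delay grows with temperature.

```python
def fit_line(temps_c, freqs_hz):
    """Ordinary least-squares fit f = f0 + k*T; returns (f0, k)."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_f = sum(freqs_hz) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(temps_c, freqs_hz))
    k = sxy / sxx
    return mean_f - k * mean_t, k

def frequency_to_temperature(f_hz, f0, k):
    """Invert the calibration line to convert a measured frequency into °C."""
    return (f_hz - f0) / k

# Hypothetical oven readings for a single sensor (not data from this paper):
oven_t = [20.0, 40.0, 60.0, 80.0]
oven_f = [14.40e6, 14.30e6, 14.20e6, 14.10e6]
f0, k = fit_line(oven_t, oven_f)
print(k, frequency_to_temperature(14.25e6, f0, k))
```

Each sensor would get its own (f0, k) pair, since placement and routing make every ring slightly different.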

A source of error in active thermal transducers is the sensor's own dissipation. However, performing the measurements during a short enable window minimizes this problem. In our case, the oscillator was first left running for 0.2 ms to stabilize the output. Then the frequency was measured for 4 ms, after which the oscillator was stopped. The procedure was repeated every 250 ms. All these parameters were obtained empirically and should be reviewed for other chip technologies.
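The benefit of the short enable window can be quantified as a duty cycle: with the timing above, the ring runs for 4.2 ms out of every 250 ms. The sketch below computes this fraction and the resulting average sensor power for a hypothetical 5 mW running consumption (an assumed value, not measured in this work). It is only an average-power view; the temperature rise within a single window also depends on the thermal time constant, which this simple calculation ignores.

```python
def enable_duty_cycle(settle_s: float, measure_s: float, period_s: float) -> float:
    """Fraction of time the ring oscillator is running, and therefore dissipating."""
    return (settle_s + measure_s) / period_s

def average_power(p_running_w: float, duty: float) -> float:
    """Mean dissipation of a sensor that only runs during its enable window."""
    return p_running_w * duty

duty = enable_duty_cycle(0.2e-3, 4e-3, 250e-3)  # timing values from the text
p_avg = average_power(5e-3, duty)               # 5 mW running power is hypothetical
print(f"duty cycle {duty:.2%}, average power {p_avg * 1e3:.3f} mW")
```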

Fig.3: The effect of self-heating. Short enable window versus continuous operation

3. Testing a Twin-Core Microprocessor

We performed several experiments to verify these ideas [5]-[7]. The last of them measured the thermal behavior of two 32-bit PLASMA microprocessors, compatible with MIPS-I [8], whose VHDL code was obtained from the opencores.org initiative.

The two cores are placed in an XCV800HQ240-4C Virtex FPGA. Using the LOC directive, the first processor is restricted to the 32 leftmost columns, while the second one is placed in the 32 rightmost columns. Each processor has 4 KB of memory available, implemented in BlockRAM (BRAM).

The array of 32 sensors is added in free zones previously reserved in the processor layout. The design was implemented with ISE 5.1i and the XST synthesizer. The final layout of both processors and sensors is presented in Fig.3, where the positions of the two cores can be clearly observed. The temperature sensors use only 5.7% of the die.
Two different benchmark routines were used to produce different power consumptions. The first one, opcodes.asm, executes all the instructions and then stalls in an infinite loop. The second, pi.c, continuously calculates the first 40 digits of π using integer arithmetic. The purpose of opcodes.asm is to check that the CPU is working correctly, but it is also useful for keeping the processor in low-power operation (during the final infinite loop). In contrast, pi.c produces a high consumption, at least compared with opcodes.asm.
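To illustrate the kind of integer-only workload pi.c represents, the sketch below computes the first 40 digits of π with Machin's formula, π = 16 arctan(1/5) - 4 arctan(1/239), using scaled integers and a few guard digits. The source of pi.c is not reproduced in this paper, so the algorithm here is an assumption chosen for clarity; a 32-bit processor would need multi-word or spigot arithmetic, whereas Python's arbitrary-precision integers hide that detail.

```python
def arctan_inv(x: int, one: int) -> int:
    """arctan(1/x) scaled by `one`, from its Taylor series, using integers only."""
    term = one // x
    total = term
    x2 = x * x
    n = 1
    sign = -1
    while term:
        term //= x2            # next odd power of 1/x
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total

def pi_digits(ndigits: int, guard: int = 10) -> str:
    """First `ndigits` digits of pi (no decimal point), via Machin's formula."""
    one = 10 ** (ndigits - 1 + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, one) - arctan_inv(239, one))
    return str(pi_scaled // 10 ** guard)   # drop guard digits that absorb truncation error

print(pi_digits(40))  # 3141592653589793238462643383279502884197
```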

4. Experimental Results

Measurements were carried out in a clean room that guarantees ambient temperature variations smaller than 1 °C. Beforehand, the response of every sensor had been calibrated by placing the FPGA samples in a temperature-controlled oven and measuring the output frequencies at different temperatures.

During operation, the processors were clocked at 10 MHz. The measured power consumption for each core and routine is summarized in Table 1.

<table>
<thead>
<tr>
<th>Routine</th>
<th>Left µP</th>
<th>Right µP</th>
</tr>
</thead>
<tbody>
<tr>
<td>opcodes.asm</td>
<td>92.75 mW</td>
<td>93.75 mW</td>
</tr>
<tr>
<td>pi.c</td>
<td>204 mW</td>
<td>206.25 mW</td>
</tr>
</tbody>
</table>

Table 1: Power consumption of the test routines

Additionally, each processor has an idle consumption of approximately 25 mW. Figures 4 to 7 show the results of some of the experiments. In these thermal maps, the uniform chip temperature rise corresponding to this consumption is removed to make the figures easier to read. Thus, the y-axis represents the deviation of the local temperature with respect to the mean temperature increment caused by activating the microprocessors from the idle (reset) state.
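The normalization used in these maps can be sketched as follows: for each sensor, take the temperature rise from the idle (reset) state to the active state, then subtract the mean rise over all sensors, so only the spatial pattern remains. The four-sensor readings below are hypothetical (the real array has 32 sensors), and the exact processing used for the figures is not detailed in the paper.

```python
def deviation_map(active_c, idle_c):
    """Per-sensor rise over the idle state, minus the mean rise of the whole array."""
    rise = [a - i for a, i in zip(active_c, idle_c)]
    mean_rise = sum(rise) / len(rise)
    return [r - mean_rise for r in rise]

# Hypothetical readings in °C for four sensors:
idle = [30.0, 30.2, 30.1, 29.9]
active = [31.0, 31.4, 31.8, 31.8]
print([round(d, 3) for d in deviation_map(active, idle)])
```

By construction the deviations sum to zero, so hotter-than-average regions appear positive and cooler ones negative, independently of the global temperature rise.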
Fig. 4: Thermograph obtained by interpolating the sensor responses. The left processor is running *opcodes.asm*, and the right one *pi.c*.

Fig. 5: Temperature deviation with respect to the mean value. The left processor is running *opcodes.asm*, and the right one *pi.c*. 
Fig. 6: Thermograph obtained by interpolating the sensor responses. The left processor is running pi.c, and the right one opcodes.asm.

Fig. 7: Temperature deviation with respect to the mean value. The left processor is running pi.c, and the right one opcodes.asm.
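The thermographs are obtained by interpolating the responses of the discrete sensors over the die. The paper does not name the interpolation method, so the sketch below swaps in inverse-distance weighting as one simple, plausible choice; the grid coordinates (for instance CLB column and row indices) and sensor positions are hypothetical.

```python
def idw(sensors, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from [(xs, ys, value), ...]."""
    num = 0.0
    den = 0.0
    for xs, ys, v in sensors:
        d2 = (x - xs) ** 2 + (y - ys) ** 2
        if d2 == 0.0:
            return v                    # exactly on a sensor: use its own reading
        w = 1.0 / d2 ** (power / 2.0)   # weight falls off as 1/distance**power
        num += w * v
        den += w
    return num / den

def thermal_map(sensors, width, height):
    """Evaluate the interpolant on an integer grid of width x height points."""
    return [[idw(sensors, x, y) for x in range(width)] for y in range(height)]

# Two hypothetical sensors on a 3 x 1 strip: deviation 0.0 at x=0 and 1.0 at x=2.
print(thermal_map([(0, 0, 0.0), (2, 0, 1.0)], 3, 1))  # [[0.0, 0.5, 1.0]]
```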
The figures show that the technique is sensitive enough, even considering the small power consumptions involved in the experiments. The case where one processor runs the heavy program while the other is hung (Figs. 4 to 7) is significant, because the temperature gradients can be clearly detected.

Unfortunately, the processor descriptions used in the experiments contain no placement information, so the thermal status of the different functional units cannot be determined. This information would be useful to redesign the blocks that produce hot spots. The only exception is a small difference that can be observed in the BlockRAM area in Figs. 8 and 9 (where both microprocessors are hung). This small activity is caused by accesses to the two memory positions that return the JMP and NOP instruction codes.

5. Conclusions

A new technique, useful for detecting hot spots or thermal gradients in FPGA-based circuits, has been presented. Several thermographs have been obtained from a real, complex system used as a case study. The effects of different routines on the thermal status of the chip have been clearly shown.

As future work, the effects of possible supply-voltage drops must be identified and compensated. This can be done by using two ring oscillators with different sensitivities per sensor; it has been shown that this makes it possible to obtain the temperature and voltage values simultaneously, with errors below ±1 °C and ±5 mV [10].
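The principle behind using two oscillators per sensor can be sketched with a first-order model: each frequency shift is assumed to be a linear combination of the temperature and supply-voltage deviations, and two oscillators with different sensitivities give two independent equations. This is an illustrative assumption, not the method of [10], and the sensitivity coefficients in the usage example are hypothetical.

```python
def solve_temperature_voltage(df1, df2, a1, b1, a2, b2):
    """Solve df_i = a_i*dT + b_i*dV (i = 1, 2) for (dT, dV) with Cramer's rule.

    df_i: frequency shift of oscillator i from its reference point (Hz)
    a_i:  temperature sensitivity (Hz/°C); b_i: supply sensitivity (Hz/V)
    """
    det = a1 * b2 - a2 * b1
    if det == 0.0:
        raise ValueError("proportional sensitivities: T and V cannot be separated")
    d_t = (df1 * b2 - df2 * b1) / det
    d_v = (a1 * df2 - a2 * df1) / det
    return d_t, d_v

# Hypothetical sensitivities for two differently built rings:
a1, b1 = -5000.0, 2.0e6
a2, b2 = -3000.0, 4.0e6
# Shifts that a +10 °C, -50 mV excursion would produce under this model:
df1 = a1 * 10.0 + b1 * -0.05
df2 = a2 * 10.0 + b2 * -0.05
print(solve_temperature_voltage(df1, df2, a1, b1, a2, b2))
```

The more the two rings differ in their ratio of temperature to voltage sensitivity, the larger the determinant and the less the measurement noise is amplified in the solution.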

This work opens the door to future EDA tools that could analyze, on the fly, the activity of the different blocks of a complex circuit. This could be useful for designers of low-power electronics: they could identify which block is consuming more power, and thus has to be redesigned, instead of applying a global power-reduction strategy (for example, lowering the clock frequency). The technique is also useful for determining, point by point across the die, the safety margin with respect to the maximum nominal temperature, currently around 125 °C. In applications where thermal derating [11] is still a requirement, this kind of information can save hundreds of hours of redesign.

6. Acknowledgements

This research is supported by project number 07T/0052/2003-3 of the Consejería de Educación de la Comunidad Autónoma de Madrid, Spain.

7. References