AVR32AP Technical Documentation

Feature Summary

• 32-bit load/store AVR32B RISC architecture

• 15 general-purpose 32-bit registers

• 32-bit Stack Pointer, Program Counter and Link Register reside in register file

• Fully orthogonal instruction set

32-bit AVR® Microcontroller

• Pipelined architecture allows one instruction per clock cycle for most instructions

• Byte, half-word, word and double word memory access

• Shadowed interrupt context for INT3 and multiple interrupt priority levels

• Privileged and unprivileged modes enabling efficient and secure Operating Systems

• Full MMU allows for operating systems with memory protection

• Instruction and data caches

• Innovative instruction set together with variable instruction length ensuring industry-leading code density

• DSP extension with saturating arithmetic, and a wide variety of multiply instructions

• SIMD extension for media applications

AVR32 AP Technical Reference Manual

• Dynamic branch prediction and return address stack for fast change-of-flow

• Powerful On-Chip Debug system

• Coprocessor interface

32001A–AVR32–06/06

1. Introduction

AVR®32 is a new high-performance 32-bit RISC microprocessor core, designed for cost-sensitive embedded applications, with particular emphasis on low power consumption and high code density. In addition, the instruction set architecture has been tuned to allow for a variety of microarchitectures, enabling the AVR32 to be implemented as low-, mid- or high-performance processors.

1.1 The AVR family

The AVR family was launched by Atmel® in 1996 and has had remarkable success in the 8- and 16-bit flash microcontroller market. AVR32 complements the current AVR microcontrollers. Through the AVR32 family, the AVR is extended into a new range of higher performance applications that is currently served by 32- and 64-bit processors.

To truly exploit the power of a 32-bit architecture, the new AVR32 architecture is not binary compatible with earlier AVR architectures. In order to achieve high code density, the instruction format is flexible, providing both compact 16-bit instructions and extended 32-bit instructions. While the instruction length is only 16 bits for most instructions, powerful 32-bit instructions are implemented to further increase performance. Compact and extended instructions can be freely mixed in the instruction stream.

1.2 The AVR32 Microprocessor Architecture

The AVR32 is a new innovative microprocessor architecture. It is a fully synchronous synthesisable RTL design with industry standard interfaces, ensuring easy integration into SoC designs with legacy intellectual property (IP). Through a quantitative approach, a large set of industry recognized benchmarks has been compiled and analyzed to achieve the best code density in its class of microprocessor architectures. In addition to lowering the memory requirements, a compact code size also contributes to the core's low power characteristics. The processor supports byte and half-word data types without penalty in code size and performance.

Memory load and store operations are provided for byte, half-word, word and double word data

with automatic sign- or zero extension of half-word and byte data. The C-compiler is closely

linked to the architecture and is able to exploit code optimization features, both for size and

speed.
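The difference between sign extension and zero extension of sub-word data can be sketched in C. This is only an illustrative model of what happens to the upper bits of a 32-bit register when a byte or half-word is loaded; the function names are hypothetical and are not taken from the AVR32 instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending byte load: bits 31..8 of the result are cleared. */
uint32_t zext8(uint8_t value)
{
    return (uint32_t)value;
}

/* Sign-extending byte load: bit 7 is copied into bits 31..8. */
uint32_t sext8(uint8_t value)
{
    return (value & 0x80u) ? ((uint32_t)value | 0xFFFFFF00u) : (uint32_t)value;
}

/* Zero-extending half-word load: bits 31..16 of the result are cleared. */
uint32_t zext16(uint16_t value)
{
    return (uint32_t)value;
}

/* Sign-extending half-word load: bit 15 is copied into bits 31..16. */
uint32_t sext16(uint16_t value)
{
    return (value & 0x8000u) ? ((uint32_t)value | 0xFFFF0000u) : (uint32_t)value;
}

/* Self-check: the two extensions differ only when the top bit is set. */
void extension_example(void)
{
    assert(zext8(0x80u) == 0x00000080u);
    assert(sext8(0x80u) == 0xFFFFFF80u);
    assert(zext8(0x7Fu) == sext8(0x7Fu));
}
```

The two variants give the same result for values whose top bit is clear, which is why a compiler only needs to choose between them for signed versus unsigned source types.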

In order to reduce code size to a minimum, some instructions have multiple addressing modes. As an example, instructions with immediates often have a compact format with a smaller immediate, and an extended format with a larger immediate. In this way, the compiler is able to use the format giving the smallest code size.

Another feature of the instruction set is that frequently used instructions, like add, have a compact format with two operands as well as an extended format with three operands. The larger format increases performance, allowing an addition and a data move in the same instruction in a single cycle.


Load and store instructions have several different formats in order to reduce code size and

speed up execution:

• Load/store to an address specified by a pointer register

• Load/store to an address specified by a pointer register with postincrement

• Load/store to an address specified by a pointer register with predecrement

• Load/store to an address specified by a pointer register with displacement

• Load/store to an address specified by a small immediate (direct addressing within a small page)

• Load/store to an address specified by a pointer register and an index register.

The register file is organized as 16 32-bit registers and includes the Program Counter, the Link

Register, and the Stack Pointer. In addition, one register is designed to hold return values from

function calls and is used implicitly by some instructions.

The AVR32 architecture defines several microarchitectures in order to capture the entire range

of applications. The microarchitectures are named AVR32A, AVR32B and so on. Different

microarchitectures are suited to different end applications, allowing the designer to select a

microarchitecture with the optimum set of parameters for a specific application.

1.3 Event handling

The AVR32 incorporates a powerful event handling scheme. The different event sources, like “Illegal opcode” and external interrupt requests, have different priority levels, ensuring a well-defined behavior when multiple events are received simultaneously. Additionally, pending events of a higher priority class may preempt handling of ongoing events of a lower priority class. Each priority class has dedicated registers to keep the return address and status register, thereby removing the need to perform time-consuming memory operations to save this information.

There are four levels of external interrupt requests, all executing in their own context. An interrupt controller does the priority handling of the external interrupts and provides the prioritized interrupt vector to the processor core.

1.4 Java Support

The AVR32 architecture defines a Java® hardware acceleration option, in the form of a Java Virtual Machine hardware implementation.


1.5 Microarchitectures

The AVR32 architecture defines different microarchitectures. This enables implementations that are tailored to specific needs and applications. The microarchitectures provide different performance levels at the expense of area and power consumption. The following microarchitectures are defined:

1.5.1 AVR32A

The AVR32A microarchitecture is targeted at cost-sensitive, lower-end applications like smaller microcontrollers. This microarchitecture does not provide dedicated hardware registers for shadowing of register file registers in interrupt contexts. Additionally, it does not provide hardware registers for the return address registers and return status registers. Instead, all this information is stored on the system stack. This saves chip area at the expense of slower interrupt handling.

Upon interrupt initiation, registers R8-R12 are automatically pushed to the system stack. These registers are pushed regardless of the priority level of the pending interrupt. The return address and status register are also automatically pushed to the stack. The interrupt handler can therefore use R8-R12 freely. Upon interrupt completion, the old R8-R12 registers and status register are restored, and execution continues at the return address popped from the stack.

The stack is also used to store the status register and return address for exceptions and scall.

Executing the rete or rets instruction at the completion of an exception or system call will pop

this status register and continue execution at the popped return address.

1.5.2 AVR32B

The AVR32B microarchitecture is targeted at applications where interrupt latency is important.

The AVR32B therefore implements dedicated registers to hold the status register and return

address for interrupts, exceptions and supervisor calls. This information does not need to be

written to the stack, and latency is therefore reduced. Additionally, AVR32B allows hardware

shadowing of the registers in the register file. The INT0 to INT3 contexts may have dedicated

versions of the registers in the register file, allowing the interrupt routine to start executing

immediately.

The scall, rete and rets instructions use the dedicated status register and return address registers in their operation. No stack accesses are performed.


1.6 The AVR32 AP implementation

The first implementation of the AVR32B microarchitecture is designed as an application processor and called AVR32 AP. This implementation targets high-performance applications in the DSP, multimedia and wireless segment, and provides:

• Advanced OCD system.

• Efficient data and instruction caches.

• Full MMU.

• Java acceleration is implemented in hardware.

• Fast interrupt handling is provided through shadowed register banks for interrupt priority 3.

• SIMD extension.

• DSP extension.

• Service Access Port (SAP) that gives an external JTAG controller access to memories and registers inside the AVR32 AP core.

Figure 1-1 on page 5 displays the contents of AVR32 AP:

Figure 1-1. Overview of AVR32 AP.

[Block diagram. Blocks shown: Service Access Port, OCD system, Reset control, Tightly Coupled Bus, BTB RAM interface, AVR32 AP CPU pipeline with Java accelerator, MMU, Dcache controller, Icache controller, Cache RAM interface, and two High Speed bus masters.]


2. Programming Model

This chapter describes the programming model and the set of registers accessible to the user. It

also describes the implementation options in AVR32 AP.

2.1 Architectural compatibility

AVR32 AP is fully compatible with the Atmel AVR32B architecture.

2.2 Implementation options

2.2.1 Memory management

AVR32 AP implements a full MMU as specified by the AVR32 architecture.

2.2.2 Java support

AVR32 AP implements a Java Extension Module (JEM) as defined in the AVR32 architecture.

2.3 Register file configuration

The AVR32B architecture specifies that the exception contexts may have a different number of shadowed registers in different implementations. The following shadow model is used in AVR32 AP.

Figure 2-1. Register file configuration. Shadowed registers are marked in grey in the original figure; they are the entries with the _INT3 suffix below.

Context       R0-R7   R8-R12                SP       LR        PC   SR   Return status / return address
Application   R0-R7   R8-R12                SP_APP   LR        PC   SR
Supervisor    R0-R7   R8-R12                SP_SYS   LR        PC   SR   RSR_SUP / RAR_SUP
INT0          R0-R7   R8-R12                SP_SYS   LR        PC   SR   RSR_INT0 / RAR_INT0
INT1          R0-R7   R8-R12                SP_SYS   LR        PC   SR   RSR_INT1 / RAR_INT1
INT2          R0-R7   R8-R12                SP_SYS   LR        PC   SR   RSR_INT2 / RAR_INT2
INT3          R0-R7   R8_INT3 - R12_INT3    SP_SYS   LR_INT3   PC   SR   RSR_INT3 / RAR_INT3
Exception     R0-R7   R8-R12                SP_SYS   LR        PC   SR   RSR_EX / RAR_EX
NMI           R0-R7   R8-R12                SP_SYS   LR        PC   SR   RSR_NMI / RAR_NMI

Each register is 32 bits wide (bit 31 to bit 0). The figure also labels R7, R6, R5 and R4 with the alternative names INT0PC, INT1PC, FINTPC and SMPC respectively.


2.4 Status register configuration

The Status Register (SR) is split into two halfwords, one upper and one lower. The lower halfword contains the C, Z, N, V and Q condition code flags and the R, T and L bits, while the upper halfword contains information about the mode and state the processor executes in.

Figure 2-2. The Status Register high halfword.

Bit     Name   Initial value   Description
31:30   -      -               Reserved
29      H      0               Java Handle
28      J      0               Java State
27      DM     0               Debug State Mask
26      D      0               Debug State
25      -      -               Reserved
24      M2     0               Mode Bit 2
23      M1     0               Mode Bit 1
22      M0     1               Mode Bit 0
21      EM     1               Exception Mask
20      I3M    0               Interrupt Level 3 Mask
19      I2M    0               Interrupt Level 2 Mask
18      I1M    0               Interrupt Level 1 Mask
17      I0M    0               Interrupt Level 0 Mask
16      GM     1               Global Interrupt Mask

Figure 2-3. The Status Register low halfword.

Bit    Name   Initial value   Description
15     R      0               Register Remap Enable
14     T      0               Scratch
13:6   -      -               Reserved
5      L      0               Lock
4      Q      0               Saturation
3      V      0               Overflow
2      N      0               Sign
1      Z      0               Zero
0      C      0               Carry

H - Java Handle

This bit is included to support different heap types in the Java Virtual Machine. For more details,

see the Java Technical Reference manual. The bit is cleared at reset.

J - Java state

The processor is in Java state when this bit is set. The incoming instruction stream will be

decoded as a stream of Java bytecodes, not RISC opcodes. The bit is cleared at reset. This bit

should not be modified by the user as undefined behaviour may result.

DM - Debug State Mask

If this bit is set, the Debug State is masked and cannot be entered. The bit is cleared at reset,

and can both be read and written by software.


D - Debug state

The processor is in debug state when this bit is set. The bit is cleared at reset and should only be modified by debug hardware, the breakpoint instruction or the retd instruction. Undefined behaviour may result if the user tries to modify this bit manually.

M2, M1, M0 - Execution Mode

These bits show the active execution mode. The different settings for the different modes are

shown in Table 2-1. M2 and M1 are cleared by reset while M0 is set so that the processor is in

supervisor mode after reset. These bits are modified by hardware, or execution of certain

instructions like scall, rets and rete. Undefined behaviour may result if the user tries to modify

these bits manually.

Table 2-1. Mode bit settings

M2   M1   M0   Mode
1    1    1    Non Maskable Interrupt
1    1    0    Exception
1    0    1    Interrupt level 3
1    0    0    Interrupt level 2
0    1    1    Interrupt level 1
0    1    0    Interrupt level 0
0    0    1    Supervisor
0    0    0    Application
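Decoding the execution mode from a saved copy of SR can be sketched as follows. This is a minimal illustration that assumes M2:M0 occupy SR bits 24 to 22 and that the encoding runs from 000 (Application) to 111 (Non Maskable Interrupt); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_MODE_SHIFT 22u   /* assumed: M0 is bit 22, M1 bit 23, M2 bit 24 */
#define SR_MODE_MASK  0x7u

/* Enumerator values equal the M2:M0 bit pattern. */
enum avr32_mode {
    MODE_APPLICATION = 0,
    MODE_SUPERVISOR  = 1,
    MODE_INT0        = 2,
    MODE_INT1        = 3,
    MODE_INT2        = 4,
    MODE_INT3        = 5,
    MODE_EXCEPTION   = 6,
    MODE_NMI         = 7
};

/* Extract the three mode bits from a Status Register value. */
enum avr32_mode sr_mode(uint32_t sr)
{
    return (enum avr32_mode)((sr >> SR_MODE_SHIFT) & SR_MODE_MASK);
}

/* Human-readable name for a mode, e.g. for a debug print. */
const char *mode_name(enum avr32_mode mode)
{
    static const char *const names[8] = {
        "Application", "Supervisor",
        "Interrupt level 0", "Interrupt level 1",
        "Interrupt level 2", "Interrupt level 3",
        "Exception", "Non Maskable Interrupt"
    };
    assert((unsigned)mode < 8u);
    return names[(unsigned)mode];
}
```

Such a helper is only meaningful on a copy of SR, for example a value read from a return status register, since the mode bits themselves should not be modified by software.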

EM - Exception mask

When this bit is set, exceptions are masked. Exceptions are enabled otherwise. The bit is automatically set when exception processing is initiated or Debug Mode is entered. Software may clear this bit after performing the necessary measures if nested exceptions should be supported. This bit is set at reset.

I3M - Interrupt level 3 mask

When this bit is set, level 3 interrupts are masked. If I3M and GM are cleared, INT3 interrupts

are enabled. The bit is automatically set when INT3 processing is initiated. Software may clear

this bit after performing the necessary measures if nested INT3s should be supported. This bit is

cleared at reset.

I2M - Interrupt level 2 mask

When this bit is set, level 2 interrupts are masked. If I2M and GM are cleared, INT2 interrupts

are enabled. The bit is automatically set when INT3 or INT2 processing is initiated. Software

may clear this bit after performing the necessary measures if nested INT2s should be supported.

This bit is cleared at reset.

I1M - Interrupt level 1 mask

When this bit is set, level 1 interrupts are masked. If I1M and GM are cleared, INT1 interrupts are enabled. The bit is automatically set when INT3, INT2 or INT1 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT1s should be supported. This bit is cleared at reset.


I0M - Interrupt level 0 mask

When this bit is set, level 0 interrupts are masked. If I0M and GM are cleared, INT0 interrupts

are enabled. The bit is automatically set when INT3, INT2, INT1 or INT0 processing is initiated.

Software may clear this bit after performing the necessary measures if nested INT0s should be

supported. This bit is cleared at reset.

GM - Global Interrupt Mask

When this bit is set, all interrupts are disabled. This bit overrides I0M, I1M, I2M and I3M. The bit

is automatically set when exception processing is initiated, Debug Mode is entered, or a Java

trap is taken. This bit is automatically cleared when returning from a Java trap. This bit is set

after reset.
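The interaction between GM and the per-level masks can be sketched as a small model. It assumes GM is SR bit 16 and I0M to I3M are bits 17 to 20, and it only models the mask bits, not the mode change that accompanies interrupt entry. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_GM_BIT  16u   /* assumed position of the Global Interrupt Mask */
#define SR_I0M_BIT 17u   /* assumed: I1M, I2M, I3M follow at bits 18..20 */

/* INT<level> is enabled only if both GM and its own level mask are cleared. */
bool int_level_enabled(uint32_t sr, unsigned level)
{
    assert(level <= 3u);
    uint32_t gm  = (sr >> SR_GM_BIT) & 1u;
    uint32_t inm = (sr >> (SR_I0M_BIT + level)) & 1u;
    return gm == 0u && inm == 0u;
}

/*
 * Mask bits after hardware starts processing INT<level>:
 * the masks for that level and all lower levels are set.
 */
uint32_t sr_after_int_entry(uint32_t sr, unsigned level)
{
    assert(level <= 3u);
    uint32_t masks = ((1u << (level + 1u)) - 1u) << SR_I0M_BIT;
    return sr | masks;
}
```

In this model, taking an INT1 leaves INT2 and INT3 enabled, so a higher-priority request can still preempt the handler, while INT0 and further INT1 requests stay masked until software clears the corresponding bits.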

R - Java Register Remap

When this bit is set, the addresses of the registers in the register file are dynamically changed. This allows efficient use of the register file registers as a stack. For more details, see the Java Technical Reference Manual. The R bit is cleared at reset. Undefined behaviour may result if this bit is modified by the user.

T - Scratch bit

Not used by any instruction, but can be manipulated by application software as a scratchpad bit.

This bit is cleared after reset.

L - Lock flag

Used by the conditional store instruction to support atomic memory access. Automatically cleared by rete. This bit is cleared after reset.

Q - Saturation flag

The saturation flag indicates that a saturating arithmetic operation overflowed. The flag is sticky

and once set it has to be manually cleared by a csrf instruction after the desired action has been

taken. See the Instruction set description for details.
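The sticky behaviour of the Q flag can be modelled with a saturating 32-bit addition. This is a sketch of the concept only, not of any particular AVR32 instruction; it assumes Q is SR bit 4, and `clear_q` stands in for the effect of clearing the flag with csrf. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SR_Q_BIT 4u   /* assumed position of the saturation flag in SR */

/*
 * Saturating signed add. If the exact sum does not fit in 32 bits the
 * result is clamped and Q is set; Q is never cleared here (sticky).
 */
int32_t sat_add32(int32_t a, int32_t b, uint32_t *sr)
{
    assert(sr != NULL);
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) {
        *sr |= 1u << SR_Q_BIT;
        return INT32_MAX;
    }
    if (wide < INT32_MIN) {
        *sr |= 1u << SR_Q_BIT;
        return INT32_MIN;
    }
    return (int32_t)wide;
}

/* Models software clearing Q explicitly once the overflow has been handled. */
void clear_q(uint32_t *sr)
{
    assert(sr != NULL);
    *sr &= ~(1u << SR_Q_BIT);
}
```

Because the flag is sticky, a long sequence of saturating operations can be run first and Q checked once at the end, instead of testing after every operation.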

V - Overflow flag

The overflow flag indicates that an arithmetic operation overflowed. See the Instruction set

description for details.

N - Negative flag

The negative flag is modified by arithmetical and logical operations. See the Instruction set

description for details.

Z - Zero flag

The zero flag indicates a zero result after an arithmetic or logic operation. See the Instruction set

description for details.

C - Carry flag

The carry flag indicates a carry after an arithmetic or logic operation. See the Instruction set

description for details.
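The bit descriptions above can be collected into a set of C masks. The bit positions are an assumption read from Figures 2-2 and 2-3 (GM at bit 16 up to H at bit 29, C at bit 0 up to R at bit 15), and the macro names are hypothetical. The reset value is derived only from the reset states stated in the text: GM, EM and M0 set, all other defined bits cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Lower halfword: condition flags and R, T, L bits. */
#define SR_C   (1u << 0)
#define SR_Z   (1u << 1)
#define SR_N   (1u << 2)
#define SR_V   (1u << 3)
#define SR_Q   (1u << 4)
#define SR_L   (1u << 5)
#define SR_T   (1u << 14)
#define SR_R   (1u << 15)

/* Upper halfword: mode and state. */
#define SR_GM  (1u << 16)
#define SR_I0M (1u << 17)
#define SR_I1M (1u << 18)
#define SR_I2M (1u << 19)
#define SR_I3M (1u << 20)
#define SR_EM  (1u << 21)
#define SR_M0  (1u << 22)
#define SR_M1  (1u << 23)
#define SR_M2  (1u << 24)
#define SR_D   (1u << 26)
#define SR_DM  (1u << 27)
#define SR_J   (1u << 28)
#define SR_H   (1u << 29)

/* Supervisor mode with exceptions and all interrupts masked. */
#define SR_RESET_VALUE (SR_GM | SR_EM | SR_M0)

/* Self-check of the derived reset value. */
void sr_reset_example(void)
{
    assert(SR_RESET_VALUE == 0x00610000u);
    assert((SR_RESET_VALUE & (SR_I0M | SR_I1M | SR_I2M | SR_I3M)) == 0u);
}
```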


2.5 System registers

The system registers are placed outside of the virtual memory space, and are only accessible using the privileged mfsr and mtsr instructions. Some of the system registers can be altered automatically by hardware. The table below lists the system registers specified in AVR32 AP. It also identifies their addresses and the pipeline stage in which each register is located. The programmer is responsible for maintaining correct sequencing of any instructions following an mtsr instruction.

Table 2-2. System Registers implemented in AVR32 AP

Reg #   Address   Name       Function                                        Location in pipeline
0       0         SR         Status Register                                 A1
1       4         EVBA       Exception Vector Base Address                   ID
2       8         ACBA       Application Call Base Address
3       12        CPUCR      CPU Control Register
4       16        ECR        Exception Cause Register
5       20        RSR_SUP    Return Status Register for supervisor context
6       24        RSR_INT0   Return Status Register for INT 0 context
7       28        RSR_INT1   Return Status Register for INT 1 context
8       32        RSR_INT2   Return Status Register for INT 2 context
9       36        RSR_INT3   Return Status Register for INT 3 context
10      40        RSR_EX     Return Status Register for Exception context
11      44        RSR_NMI    Return Status Register for NMI context
12      48        RSR_DBG    Return Status Register for Debug Mode
13      52        RAR_SUP    Return Address Register for supervisor context
14      56        RAR_INT0   Return Address Register for INT 0 context
15      60        RAR_INT1   Return Address Register for INT 1 context
16      64        RAR_INT2   Return Address Register for INT 2 context
17      68        RAR_INT3   Return Address Register for INT 3 context
18      72        RAR_EX     Return Address Register for Exception context
19      76        RAR_NMI    Return Address Register for NMI context
20      80        RAR_DBG    Return Address Register for Debug Mode
21      84        JECR       Java Exception Cause Register
22      88        JOSP       Java Operand Stack Pointer
23      92        JAVA_LV0   Java Local Variable 0                           A1
24      96        JAVA_LV1   Java Local Variable 1
25      100       JAVA_LV2   Java Local Variable 2
26      104       JAVA_LV3   Java Local Variable 3
27      108       JAVA_LV4   Java Local Variable 4
28      112       JAVA_LV5   Java Local Variable 5
29      116       JAVA_LV6   Java Local Variable 6                           A1
30      120       JAVA_LV7   Java Local Variable 7                           A1
31      124       JTBA       Java Trap Base Address                          A1
32      128       JBCR       Java Write Barrier Control Register             A1
64      256       CONFIG0    Configuration register 0
65      260       CONFIG1    Configuration register 1
66      264       COUNT      Cycle Counter register
67      268       COMPARE    Compare register
68      272       TLBEHI     TLB Entry High
69      276       TLBELO     TLB Entry Low
70      280       PTBR       Page Table Base Register
71      284       TLBEAR     TLB Exception Address Register
72      288       MMUCR      MMU Control Register
73      292       TLBARLO    TLB Accessed Register Low
74      296       TLBARHI    TLB Accessed Register High
75      300       PCCNT      Performance Clock Counter
76      304       PCNT0      Performance Counter 0
77      308       PCNT1      Performance Counter 1
78      312       PCCR       Performance Counter Control Register
79      316       BEAR       Bus Error Address Register
192     768       SABAL      SAB Address Low Register
193     772       SABAH      SAB Address High Register
194     776       SABD       SAB Data Register
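In Table 2-2 every address equals the register number multiplied by four. The sketch below encodes a few of the register numbers and that relationship; the enumerator and function names are hypothetical, and the code does not show the mfsr or mtsr instructions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* A subset of the register numbers listed in Table 2-2. */
enum sysreg_num {
    SYSREG_SR       = 0,
    SYSREG_EVBA     = 1,
    SYSREG_ACBA     = 2,
    SYSREG_CPUCR    = 3,
    SYSREG_ECR      = 4,
    SYSREG_RSR_SUP  = 5,
    SYSREG_RAR_SUP  = 13,
    SYSREG_JAVA_LV0 = 23,
    SYSREG_JBCR     = 32,
    SYSREG_CONFIG0  = 64,
    SYSREG_COUNT    = 66,
    SYSREG_COMPARE  = 67,
    SYSREG_BEAR     = 79,
    SYSREG_SABD     = 194   /* highest register number in Table 2-2 */
};

/* System register address as listed in the table: 4 bytes per register. */
uint32_t sysreg_address(unsigned reg_num)
{
    assert(reg_num <= (unsigned)SYSREG_SABD);
    return (uint32_t)reg_num * 4u;
}
```

For example, `sysreg_address(SYSREG_COUNT)` yields 264, the address listed for COUNT.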

SR - Status Register

The Status Register is mapped into the system register space. This allows it to be loaded into

the register file to be modified, or to be stored to memory. The Status Register is described in

detail in Section 2.4 on page 7.

EVBA - Exception Vector Base Address

This register contains a pointer to the exception routines. All exception routines start at this address, or at a defined offset relative to the address. Special alignment requirements apply for EVBA, see Section 3.10 ”Event handling” on page 30.

ACBA - Application Call Base Address

Pointer to the start of a table of function pointers. Subroutines residing in this space can be called by the compact acall instruction. This facilitates efficient reuse of code. Keeping this base pointer as a register facilitates multiple application spaces. ACBA is a full 32 bit register, but the lowest bit should be written to zero, making ACBA halfword aligned. Failing to do so may result in erroneous behaviour.

CPUCR - CPU Control Register

Register controlling the configuration and behaviour of the CPU. The following fields are defined:

Table 2-3. CPU control register

Bit     Name              Reset   Access              Description
31-24   COP7EN - COP0EN   0       Read/write          Enable bit for coprocessor 7 to coprocessor 0. The corresponding coprocessor is enabled if this bit is written to one by software. Can be written to one only if the corresponding coprocessor is present in the system. Attempting to issue a coprocessor instruction to a coprocessor whose enable bit is cleared will result in a coprocessor absent exception.
5       IEE               1       Read/write          Imprecise Execution Enable. Required for various OCD features, see Section 9. ”OCD system” on page 86. If cleared, memory operations will require several additional clock cycles.
4       IBE               1       Read/write          Imprecise Breakpoint Enable. Required for various OCD features, see Section 9. ”OCD system” on page 86. If cleared, memory operations will require an additional clock cycle.
3       RE                1       Read/write          If set, the return stack is enabled. Disabling the return stack will empty it, removing all entries.
2       FE                1       Read/write          If set, branch instructions can be folded with other instructions.
1       BE                1       Read/write          If set, branch prediction is enabled.
0       BI                -       Read-0/write-1      BTB invalidate. Writing to 1 will invalidate all entries in the BTB.
Other   -                 -       Read-0/write-0      Unused. Read as 0. Should be written as 0.
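The CPUCR fields can be expressed as C masks, as in the sketch below. It assumes COP0EN is bit 24 and COP7EN is bit 31, following the order in which the names and bit range are listed; the macro and function names are hypothetical, and the helpers only compute the value to be written, they do not access the register.

```c
#include <assert.h>
#include <stdint.h>

#define CPUCR_BI         (1u << 0)   /* BTB invalidate (write 1, reads as 0) */
#define CPUCR_BE         (1u << 1)   /* branch prediction enable */
#define CPUCR_FE         (1u << 2)   /* branch folding enable */
#define CPUCR_RE         (1u << 3)   /* return stack enable */
#define CPUCR_IBE        (1u << 4)   /* imprecise breakpoint enable */
#define CPUCR_IEE        (1u << 5)   /* imprecise execution enable */
#define CPUCR_COP_SHIFT  24u         /* assumed: COP0EN at bit 24 */
#define CPUCR_DEFINED    (0xFF000000u | 0x0000003Fu)

/* Value with the enable bit for coprocessor n (0..7) set. */
uint32_t cpucr_enable_coprocessor(uint32_t cpucr, unsigned n)
{
    assert(n <= 7u);
    return cpucr | (1u << (CPUCR_COP_SHIFT + n));
}

/* Force all unused bits to 0, as the table requires for writes. */
uint32_t cpucr_prepare_write(uint32_t value)
{
    return value & CPUCR_DEFINED;
}

/* Value that, when written, requests invalidation of all BTB entries. */
uint32_t cpucr_request_btb_invalidate(uint32_t cpucr)
{
    return cpucr_prepare_write(cpucr | CPUCR_BI);
}
```

Since BI reads as 0, a read-modify-write sequence that only changes other fields will not trigger a BTB invalidation by accident.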

ECR - Exception Cause Register

This register identifies the cause of the most recently executed exception. This information may

be used to handle exceptions more efficiently in certain operating systems. The register is

updated with a value equal to the EVBA offset of the exception, shifted 2 bit positions to the

right. Only the 9 lowest bits of the EVBA offset are considered. As an example, an ITLB miss

jumps to EVBA+0x50. The ECR will then be loaded with 0x50>>2 == 0x14. The ECR register is

not loaded when a Breakpoint or OCD Stop CPU exception is taken. Note that for interrupts, the

offset is given by the autovector provided by the interrupt controller. The resulting ECR value

may therefore overlap with an ECR value used by a regular exception. This can be avoided by

choosing the autovector offsets so that no such overlaps occur.
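The ECR computation described above can be sketched in a few lines of C. The function names are hypothetical. The second helper illustrates the overlap issue for interrupt autovectors by checking a candidate offset against a caller-supplied list of exception offsets; that list is an input to the example, not data from this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECR value for a handler at EVBA + offset: low 9 bits, shifted right by 2. */
uint32_t ecr_from_evba_offset(uint32_t offset)
{
    return (offset & 0x1FFu) >> 2;
}

/*
 * Returns 1 if an autovector offset would produce the same ECR value as
 * any of the given exception offsets, 0 otherwise.
 */
int ecr_collides(uint32_t autovector_offset,
                 const uint32_t *exception_offsets, size_t count)
{
    assert(exception_offsets != NULL || count == 0u);
    uint32_t ecr = ecr_from_evba_offset(autovector_offset);
    for (size_t i = 0; i < count; i++) {
        if (ecr_from_evba_offset(exception_offsets[i]) == ecr)
            return 1;
    }
    return 0;
}
```

With the ITLB miss offset 0x50 from the text, `ecr_from_evba_offset(0x50)` gives 0x14. An autovector at offset 0x250 would give the same value, because only the 9 lowest bits are considered.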

RSR_SUP, RSR_INT0, RSR_INT1, RSR_INT2, RSR_INT3, RSR_EX, RSR_NMI - Return Status Registers

If a request for a mode change like an interrupt request is accepted when executing in a context C, the Status Register values in context C are automatically stored in the Return Status Register (RSR) associated with the interrupt context I. When the execution in the interrupt state I is finished and the rets / rete instruction is encountered, the RSR associated with I is copied to SR, and the execution continues in the original context C.

RSR_DBG - Return Status Register for Debug Mode

When Debug mode is entered, the status register contents of the original mode are automatically saved in this register. When the debug routine is finished, the retd instruction copies the contents of RSR_DBG into SR.

RAR_SUP, RAR_INT0, RAR_INT1, RAR_INT2, RAR_INT3, RAR_EX, RAR_NMI - Return Address Registers

If a request for a mode change, for instance an interrupt request, is accepted when executing in a context C, the re-entry address of context C is automatically stored in the Return Address Register (RAR) associated with the interrupt context I. When the execution in the interrupt state I is finished and the rets / rete instruction is encountered, a change-of-flow is performed to the address in the RAR associated with I, and the execution continues in the original context C.
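The save and restore behaviour of the return status and return address registers can be modelled in C. This is a simplified software model under stated assumptions: the handler address and the new SR value are passed in by the caller, whereas real hardware derives them from the event; the SR values used with it are arbitrary. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Contexts that own an RSR/RAR pair. */
enum ctx {
    CTX_SUP, CTX_INT0, CTX_INT1, CTX_INT2, CTX_INT3, CTX_EX, CTX_NMI,
    CTX_COUNT
};

struct cpu_model {
    uint32_t sr;
    uint32_t pc;
    uint32_t rsr[CTX_COUNT];   /* one return status register per context */
    uint32_t rar[CTX_COUNT];   /* one return address register per context */
};

/* Event accepted: save SR and re-entry address in the target's RSR/RAR. */
void enter_context(struct cpu_model *cpu, enum ctx target,
                   uint32_t handler_pc, uint32_t new_sr)
{
    assert(cpu != NULL && (unsigned)target < (unsigned)CTX_COUNT);
    cpu->rsr[target] = cpu->sr;
    cpu->rar[target] = cpu->pc;
    cpu->sr = new_sr;
    cpu->pc = handler_pc;
}

/* rets / rete: restore SR from the RSR and jump to the address in the RAR. */
void return_from_context(struct cpu_model *cpu, enum ctx current)
{
    assert(cpu != NULL && (unsigned)current < (unsigned)CTX_COUNT);
    cpu->sr = cpu->rsr[current];
    cpu->pc = cpu->rar[current];
}
```

Because each context has its own register pair, a nested event of a different class does not overwrite the outer context's return information, and no stack access is needed in this model.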

RAR_DBG - Return Address Register for Debug Mode

When Debug mode is entered, the Program Counter contents of the original mode are automatically saved in this register. When the debug routine is finished, the retd instruction copies the contents of RAR_DBG into PC.

JECR - Java Exception Cause Register

This register contains information needed for Java traps. See Java Technical Reference Manual

for details.

JOSP - Java Operand Stack Pointer

This register holds the Java Operand Stack Pointer. See Java Technical Reference Manual for

details. The register is initialized to 0 at reset.

JAVA_LVx - Java Local Variable Registers

The Java Extension Module uses these registers to temporarily store local variables. See Java

Technical Reference Manual for details.

JTBA - Java Trap Base Address

This register contains the base address to the program code for the trapped Java instructions.

See Java Technical Reference Manual for details.

JBCR - Java Write Barrier Control Register

This register is used by the garbage collector in the Java Virtual Machine. See Java Technical

Reference Manual for details.

CONFIG0 / 1 - Configuration Register 0 / 1

Used to describe the processor, its configuration and capabilities. The contents and functionality of these registers are described in detail in Section 2.6 on page 16.

COUNT - Cycle Counter Register

The COUNT register increments once every clock cycle, regardless of pipeline stalls and flushes. The COUNT register can both be read and written. The COUNT register can be used together with the COMPARE register to create a timer with interrupt functionality. The COUNT register is written to zero upon reset. Incrementation of the COUNT register cannot be disabled. The COUNT register will increment even though a compare interrupt is pending.

COMPARE - Cycle Counter Compare Register

The COMPARE register holds a value that the COUNT register is compared against. The COMPARE register can both be read and written. When the COMPARE and COUNT registers match, a compare interrupt request is generated. This interrupt request is routed out to the interrupt controller, which may forward the request back to the processor as a normal interrupt request at a priority level determined by the interrupt controller. Writing a value to the COMPARE register clears any pending compare interrupt requests. The compare and exception generation feature is disabled if the COMPARE register contains the value zero. The COMPARE register is written to zero upon reset.
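The COUNT and COMPARE behaviour can be captured in a small software model. It is a sketch under one modelling assumption: the match is evaluated right after each increment, and the exact cycle timing of real hardware is not represented. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cycle_timer {
    uint32_t count;        /* models COUNT */
    uint32_t compare;      /* models COMPARE; 0 disables the compare feature */
    bool     irq_pending;  /* models the compare interrupt request */
};

/* Both registers are written to zero upon reset. */
void timer_reset(struct cycle_timer *t)
{
    assert(t != NULL);
    t->count = 0u;
    t->compare = 0u;
    t->irq_pending = false;
}

/* One clock cycle: COUNT always increments, even with a request pending. */
void timer_tick(struct cycle_timer *t)
{
    assert(t != NULL);
    t->count++;   /* unsigned arithmetic wraps modulo 2^32 */
    if (t->compare != 0u && t->count == t->compare)
        t->irq_pending = true;
}

/* Advance the model by a number of clock cycles. */
void timer_run(struct cycle_timer *t, uint32_t cycles)
{
    for (uint32_t i = 0; i < cycles; i++)
        timer_tick(t);
}

/* Writing COMPARE clears any pending compare interrupt request. */
void timer_write_compare(struct cycle_timer *t, uint32_t value)
{
    assert(t != NULL);
    t->compare = value;
    t->irq_pending = false;
}
```

A periodic tick can be built on this by having the interrupt handler write COMPARE with the next target value, which both re-arms the timer and acknowledges the pending request.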

TLBEHI - MMU TLB Entry Register High Part

Used to interface the CPU to the TLB. The contents and functionality of the register are described in detail in Section 4. on page 48.

TLBELO - MMU TLB Entry Register Low Part

Used to interface the CPU to the TLB. The contents and functionality of the register are described in detail in Section 4. on page 48.

PTBR - MMU Page Table Base Register

Contains a pointer to the start of the Page Table. The contents and functionality of the register are described in detail in Section 4. on page 48.

TLBEAR - MMU TLB Exception Address Register

Contains the virtual address that caused the most recent MMU error. The contents and functionality of the register are described in detail in Section 4. on page 48.

MMUCR - MMU Control Register

Used to control the MMU and the TLB. The contents and functionality of the register are described in detail in Section 4. on page 48.

TLBARLO/HI - MMU TLB Accessed Register Low/High

Contains the Accessed bits for the TLB. The contents and functionality of the register are described in detail in Section 4. on page 48.

PCCNT - Performance Clock Counter

Clock cycle counter for performance counters. The contents and functionality of the register are described in detail in the AVR32 Architecture Manual.

PCNT0 / PCNT1 - Performance Counter 0 / 1

Counts the events specified by the Performance Counter Control Register. The contents and functionality of the register are described in detail in the AVR32 Architecture Manual.

PCCR - Performance Counter Control Register

Controls and configures the setup of the performance counters. The contents and functionality of the register are described in detail in the AVR32 Architecture Manual.

BEAR - Bus Error Address Register

Physical address that caused a Data Bus Error. This register is Read Only. Writes are allowed, but are ignored.

SABAL - Service Access Bus Address Low

Lower part of address to Service Access Bus used by debug system.

SABAH - Service Access Bus Address High

Higher part of address to Service Access Bus used by debug system.

SABD - Service Access Bus Data

Data to or from Service Access Bus used by debug system.


2.6 Configuration Registers

Configuration registers are used to inform applications and operating systems about the setup and configuration of the processor on which they are running. Some of the fields in the configuration registers are fixed for all implementations using the AVR32 AP platform, while others, like the number of sets in each cache, can be different for each implementation of the platform. Such fields have IMPL in the Value field in the following tables. The programmer should refer to the data sheet for the specific product in order to obtain information on IMPL fields. The AVR32 implements the following read-only configuration registers.

Figure 2-4. Configuration Registers.

CONFIG0
Bits    Field
31:24   Processor ID
23:20   -
19:16   Processor Revision
15:13   AT
12:10   AR
9:7     MMUT
6       F
5       J
4       P
3       O
2       S
1       D
0       R

CONFIG1
Bits    Field
31:26   IMMU SZ
25:20   DMMU SZ
19:16   ISET
15:13   ILSZ
12:10   IASS
9:6     DSET
5:3     DLSZ
2:0     DASS
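Extracting the fields of Figure 2-4 from register values read out of CONFIG0 and CONFIG1 can be sketched as follows. The struct and function names are hypothetical, and the code operates on plain 32-bit values, so it makes no assumption about how the registers are read.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of a 32-bit register value. */
uint32_t bits(uint32_t reg, unsigned hi, unsigned lo)
{
    assert(hi <= 31u && lo <= hi);
    unsigned width = hi - lo + 1u;
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> lo) & mask;
}

struct config0_fields {
    uint32_t processor_id, revision, at, ar, mmut;
    uint32_t f, j, p, o, s, d, r;
};

struct config1_fields {
    uint32_t immu_sz, dmmu_sz, iset, ilsz, iass, dset, dlsz, dass;
};

struct config0_fields decode_config0(uint32_t v)
{
    struct config0_fields c;
    c.processor_id = bits(v, 31, 24);
    c.revision     = bits(v, 19, 16);
    c.at           = bits(v, 15, 13);
    c.ar           = bits(v, 12, 10);
    c.mmut         = bits(v, 9, 7);
    c.f = bits(v, 6, 6);
    c.j = bits(v, 5, 5);
    c.p = bits(v, 4, 4);
    c.o = bits(v, 3, 3);
    c.s = bits(v, 2, 2);
    c.d = bits(v, 1, 1);
    c.r = bits(v, 0, 0);
    return c;
}

struct config1_fields decode_config1(uint32_t v)
{
    struct config1_fields c;
    c.immu_sz = bits(v, 31, 26);
    c.dmmu_sz = bits(v, 25, 20);
    c.iset    = bits(v, 19, 16);
    c.ilsz    = bits(v, 15, 13);
    c.iass    = bits(v, 12, 10);
    c.dset    = bits(v, 9, 6);
    c.dlsz    = bits(v, 5, 3);
    c.dass    = bits(v, 2, 0);
    return c;
}
```

The decoded numbers are field encodings, not final quantities; for instance, the number of entries in the shared MMU is the DMMU SZ field value plus one.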

Table 2-4 shows the CONFIG0 fields.

Table 2-4. CONFIG0 Fields

Name                 Bit     Description

Processor ID         31:24   Specifies the type of processor. This allows the application to distinguish between different processor implementations.

RESERVED             23:20   Reserved for future use.

Processor revision   19:16   Specifies the revision of the processor implementation.

AT                   15:13   Architecture type
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       AVR32B
                             Other   Reserved

AR                   12:10   Architecture Revision
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       Revision 1
                             Other   Reserved

MMUT                 9:7     MMU type
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       Unused in AVR32 AP
                             2       Shared TLB
                             3       Unused in AVR32 AP
                             Other   Reserved

F                    6       Floating-point unit implemented
                             Value   Semantic
                             0       No FPU implemented
                             1       Unused in AVR32 AP

J                    5       Java extension implemented
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       Java extension implemented

P                    4       Performance counters implemented
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       Performance Counters implemented

O                    3       On-Chip Debug implemented
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       OCD implemented

S                    2       SIMD instructions implemented
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       SIMD instructions implemented

D                    1       DSP instructions implemented
                             Value   Semantic
                             0       Unused in AVR32 AP
                             1       DSP instructions implemented

R                    0       Memory Read-Modify-Write instructions implemented
                             Value   Semantic
                             0       No RMW instructions implemented
                             1       Unused in AVR32 AP
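To illustrate the bit positions listed in Table 2-4, the following host-side sketch splits a raw CONFIG0 word into named fields. It is a model only: the helper names (`extract`, `decode_config0`) and the example register value are assumptions made up for illustration, and on a real device the word would first have to be read from the system register.

```python
# Bit positions (msb, lsb) as listed in Table 2-4.
CONFIG0_FIELDS = {
    "processor_id": (31, 24),
    "processor_revision": (19, 16),
    "AT": (15, 13),
    "AR": (12, 10),
    "MMUT": (9, 7),
    "F": (6, 6),
    "J": (5, 5),
    "P": (4, 4),
    "O": (3, 3),
    "S": (2, 2),
    "D": (1, 1),
    "R": (0, 0),
}


def extract(value, msb, lsb):
    """Return bits msb..lsb (inclusive) of a 32-bit register value."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


def decode_config0(value):
    """Split a raw CONFIG0 word into its named fields."""
    return {name: extract(value, msb, lsb)
            for name, (msb, lsb) in CONFIG0_FIELDS.items()}


# Hypothetical register value, invented for this example only.
example = (0x01 << 24) | (1 << 13) | (1 << 10) | (2 << 7) | 0b0111110
fields = decode_config0(example)
print(fields)
```

Keeping the layout in a single dictionary means the decoder stays a direct transcription of the table, which makes it easy to check against the data sheet of a specific product.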

Table 2-5 shows the CONFIG1 fields.

Table 2-5. CONFIG1 Fields

Name      Bit     Description

IMMU SZ   31:26   Not used in single-MMU systems like AVR32 AP.

DMMU SZ   25:20   Indicates the number of entries in the shared MMU in single-MMU systems like AVR32 AP. The number of entries in the MMU equals (DMMU SZ) + 1.

ISET      19:16   Number of sets in ICACHE
                  Value   Semantic
                  0       1
                  1       2
                  2       4
                  3       8
                  4       16
                  5       32
                  6       64
                  7       128
                  8       256
                  9       512
                  10      1024
                  11      2048
                  12      4096
                  13      8192
                  14      16384
                  15      32768

ILSZ      15:13   Line size in ICACHE
                  Value   Semantic
                  0       No ICACHE present
                  1       4 bytes
                  2       8 bytes
                  3       16 bytes
                  4       32 bytes
                  5       64 bytes
                  6       128 bytes
                  7       256 bytes

IASS      12:10   Associativity of ICACHE
                  Value   Semantic
                  0       Direct mapped
                  1       2-way
                  2       4-way
                  3       8-way
                  4       16-way
                  5       32-way
                  6       64-way
                  7       128-way

DSET      9:6     Number of sets in DCACHE
                  Value   Semantic
                  0       1
                  1       2
                  2       4
                  3       8
                  4       16
                  5       32
                  6       64
                  7       128
                  8       256
                  9       512
                  10      1024
                  11      2048
                  12      4096
                  13      8192
                  14      16384
                  15      32768

DLSZ      5:3     Line size in DCACHE
                  Value   Semantic
                  0       No DCACHE present
                  1       4 bytes
                  2       8 bytes
                  3       16 bytes
                  4       32 bytes
                  5       64 bytes
                  6       128 bytes
                  7       256 bytes

DASS      2:0     Associativity of DCACHE
                  Value   Semantic
                  0       Direct mapped
                  1       2-way
                  2       4-way
                  3       8-way
                  4       16-way
                  5       32-way
                  6       64-way
                  7       128-way
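The encodings in Table 2-5 are all powers of two, so they can be turned into real cache parameters with shifts. The sketch below is a host-side model under stated assumptions: the function names and the example register value are hypothetical, and the total-size formula (sets × line size × ways) is the usual set-associative relation rather than something stated in this manual.

```python
def field(value, msb, lsb):
    """Return bits msb..lsb (inclusive) of a 32-bit register value."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def cache_geometry(sets_code, line_code, assoc_code):
    """Translate the SET/LSZ/ASS encodings of Table 2-5 into numbers.

    Returns None when the line-size code is 0 (no cache present).
    """
    if line_code == 0:
        return None
    sets = 1 << sets_code          # code 0 -> 1 set, code 15 -> 32768 sets
    line_bytes = 2 << line_code    # code 1 -> 4 bytes, code 7 -> 256 bytes
    ways = 1 << assoc_code         # code 0 -> direct mapped, code 7 -> 128-way
    return {"sets": sets, "line_bytes": line_bytes, "ways": ways,
            # Assumed standard relation, not taken from the manual.
            "total_bytes": sets * line_bytes * ways}


def decode_config1(value):
    """Decode the fields that are used in single-MMU systems."""
    return {
        "mmu_entries": field(value, 25, 20) + 1,   # (DMMU SZ) + 1
        "icache": cache_geometry(field(value, 19, 16),
                                 field(value, 15, 13),
                                 field(value, 12, 10)),
        "dcache": cache_geometry(field(value, 9, 6),
                                 field(value, 5, 3),
                                 field(value, 2, 0)),
    }


# Hypothetical register value, invented for this example only.
example = (31 << 20) | (7 << 16) | (4 << 13) | (2 << 10) | (7 << 6) | (4 << 3) | 2
info = decode_config1(example)
print(info["mmu_entries"], info["icache"], info["dcache"])
```

Since most of these fields are IMPL-dependent, a decoder like this is mainly useful for start-up code or diagnostics that must adapt to the product at hand.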

3. Pipeline

3.1 Overview

AVR32 AP is a pipelined processor with seven pipeline stages. The pipeline has three subpipes,

namely the Multiply pipe, the Execute pipe and the Data pipe. These pipelines may execute dif-

ferent instructions in parallel. Instructions are issued in order, but may complete out of order

(OOO) since the subpipes may be stalled individually, and certain operations may use a subpipe

for several clock cycles.

The following figure shows an overview of the AVR32 AP pipeline stages.

Figure 3-1. The AVR32 AP pipeline stages.

Prefetch unit   Decode unit
                            +-> M1 -> M2 -+        Multiply pipe
IF1 -> IF2  ->  ID -> IS  --+-> A1 -> A2 -+-> WB   ALU pipe
                            +-> DA -> D  -+        Load-store pipe

The following abbreviations are used in the figure:

• IF1, IF2 - Instruction Fetch 1 and 2

• ID - Instruction Decode

• IS - Instruction Issue

• A1, A2 - ALU stage 1 and 2

• M1, M2 - Multiply stage 1 and 2

• DA - Data Address calculation stage

• D - Data cache access

• WB - Writeback

3.2 Prefetch unit

The prefetch unit comprises the IF1 and IF2 pipestages, and is responsible for feeding instruc-

tions to the decode unit. The prefetch unit fetches 32 bits at a time from the instruction cache

and places them in a FIFO prefetch buffer. At the same time, one instruction, either RISC

extended or compact, or Java, is fed to the decode stage.

The instruction fetches are probed for the presence of change-of-flow instructions. If such

instructions are found, the prefetch unit will try to determine the destination of the instruction and

continue fetching instructions from there. The branch penalty will be eliminated if the prefetch

unit correctly predicts the destination of a change-of-flow instruction. When possible, the

prefetch unit will remove the change-of-flow instruction from the pipeline and replace it with the

target instruction. This is called branch folding.

In Java mode, the prefetch unit is able to recognize certain Java instruction pairs and merge

them together to one merged instruction. These merged instructions are passed on to ID as one

instruction.

Details about the prefetch unit are given in Chapter 5.

3.3 Decode unit

The decode unit generates the necessary signals in order for the instruction to execute correctly.

The ID stage accepts one instruction each clock cycle from the prefetch unit. This instruction is

then decoded, and control signals and register file addresses are generated. If the instruction

cannot be decoded, an illegal instruction or unimplemented instruction exception is issued. The

ID stage also contains a state machine required for controlling multicycle instructions.

The ID stage performs the remapping of register file addresses from logical to physical

addresses. This is used both for remapping register address into the different contexts, and for

remapping registers to the Java operand stack if the R bit in the status register is set. The ID

stage also contains the Java Operand Stack Pointer (JOSP) register which is used to address

the Java operand stack if the CPU is running in Java mode.

The IS stage performs register file reads and keeps track of data hazards in the pipeline. If haz-

ards exist, pipelines are frozen as needed in order to resolve the hazard.

3.4 ALU pipeline

The ALU pipeline performs most of the data manipulation instructions, like arithmetical and logi-

cal operations. The A1 stage performs the following tasks:

• Target address calculation and condition check for change-of-flow instructions. The A1

pipestage checks if the branch prediction performed by the prefetch unit was correct. If not,

the prefetch unit is notified so that the pipeline can be flushed, the correct instruction can be

fetched and the BTB can be updated.

• Condition code checking for conditional instructions.

• Address calculation for indexed memory accesses.

• Writeback address calculation for the LS pipeline.

• All flag setting for arithmetical and logical instructions.

The A2 stage performs the following tasks:

• The saturation needed by satadd and satsub.

• The operation and flag setting needed by satrnds, satrndu, sats and satu.

3.5 Multiply pipeline

All multiply instructions execute in the multiply pipeline. This pipeline contains a 32 by 16 multiplier array, and 16x16 and 32x16 multiplications therefore have an issue latency of one cycle. Multiplication of 32 by 32 bits requires two iterations through the multiplier array, and therefore needs several cycles to complete. Additional cycles may be needed if an accumulation is to be performed. This will stall the multiply pipeline until the instruction is complete.

A special accumulator cache is implemented in the MUL pipeline. This cache saves the multiply-

accumulate result in dedicated registers in the MUL pipeline, as well as writing them back to the

register file. This allows subsequent MAC instructions to read the accumulator value from the

cache, instead of from the register file. This will speed up MAC operations by one clock cycle. If

a MAC instruction targets a register not found in the cache, one clock cycle is added to the MAC

operation, loading the accumulator value from the register file into the cache. In the next cycle,

the MAC operation is restarted automatically by hardware. If a multiply (not MAC) instruction is

executed with target address equal to that of a valid cached register, the multiply instruction will

update the cache. All multiply and divide instructions update the cache with their results, so that a subsequent MAC to the same register will not have to preload the cache.

The accumulator cache can hold one doubleword accumulator value, or one word accumulator

value. Hardware ensures that the accumulator cache is kept consistent. If another pipeline

updates one of the registers kept in the accumulator cache, the cache is invalidated. The cache

is automatically invalidated after reset.
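The behaviour of the accumulator cache described above can be summarised in a small host-side model. This is a sketch only, not a description of the hardware implementation: the class and method names are hypothetical, registers are represented as plain strings, and only the extra preload cycle is counted, not the full instruction latency.

```python
class AccumulatorCache:
    """Toy model of the single-entry accumulator cache in the MUL pipeline."""

    def __init__(self):
        # The cache is invalid after reset.
        self.cached = None

    def mac(self, dest):
        """Return the extra cycles a MAC targeting `dest` pays for a miss.

        dest: tuple of register names, one for a word accumulator or
              two for a doubleword accumulator.
        """
        extra = 0 if self.cached == dest else 1   # preload from register file
        self.cached = dest
        return extra

    def mul_or_div(self, dest):
        """Multiply and divide results are also placed in the cache."""
        self.cached = dest

    def other_pipe_write(self, register):
        """A write from another pipeline invalidates a matching entry."""
        if self.cached is not None and register in self.cached:
            self.cached = None


acc = AccumulatorCache()
costs = [acc.mac(("r0", "r1")), acc.mac(("r0", "r1"))]   # miss, then hit
acc.other_pipe_write("r1")                                # invalidates
costs.append(acc.mac(("r0", "r1")))                       # miss again
acc.mul_or_div(("r4",))
costs.append(acc.mac(("r4",)))                            # hit after a multiply
print(costs)
```

The model suggests why keeping a MAC loop on a single accumulator, without intervening writes to that register from other pipelines, avoids the preload cycle.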

Some of the multiply instructions, machh.d, macwh.d, mulwh.d and mulnwh.d, produce a 48-bit

result that is to be placed in two registers. These instructions all have an issue latency of 1, even

though the MUL pipe only has one writeback port and two results are produced. This is handled

by delaying the writeback of the low register until the MUL pipeline is idle. Then, the low register

can be written back without stalling the MUL pipe. The high register is written back to the register

file when the instruction leaves the M2 stage. This scheme allows several of these instructions to

be issued consecutively, with no stalls due to writeback port congestion. This will increase per-

formance in MUL-intensive applications such as DSP algorithms. The MUL pipe can only hold

one delayed register for writeback, so a MUL instruction writing to another register will have to

stall one cycle in IS if a writeback is pending in the MUL pipe. Hazard detection is performed on

the pending writeback register, so any instruction reading a register pending writeback will stall

in IS until the value is forwardable in M2.

The multiply pipeline also contains a divider, performing multicycle 32-by-32 signed and

unsigned division with both quotient and remainder outputs.

In general, the MUL instructions do not set any flags. However, some of the MUL instructions

may set the saturate (Q) flag. No hazard detection is performed on this setting of the Q flag. The

programmer must ensure that such a Q flag update has propagated to the status register

before using the Q flag.

3.6 Load-store pipeline

The load-store (LS) pipeline is able to read or write up to two registers per clock cycle, if the data is 64-bit aligned. The address is calculated by the A1 pipe stage for indexed and load-extracted-index accesses; the DA stage performs all other address calculations. Thereafter the address is passed on to the LS pipe and output to the cache, together with the data to write if the access is a write. If the access is a read, the read data is returned from the cache in the D stage. If the read data requires typecasting or other manipulation, such as that performed by ldins or ldswp, this manipulation is performed in the WB stage.

The LS pipeline also contains hardware for performing load and store multiple instructions

decoupled from the rest of the core. For such instructions, the A1 stage calculates the pointer

writeback address if needed. The load or store is then decoupled from the integer unit, and the

integer unit may execute sequential instructions if no hazards occur. Load and store of multiple

registers are performed by accessing 2 words at a time. If the first address is not 64-bit aligned,

the first access is performed as a single word. The rest of the transfer is then performed as 64 bit

accesses. The last transfer may need to be performed as a 32 bit access, depending on the

number of registers to load or store.
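The way a load or store of multiple registers is broken into accesses can be sketched as follows. The function name is hypothetical, and the sketch assumes a word-aligned start address and ascending addresses, which the text above does not spell out.

```python
def split_multiple_transfer(start_address, register_count):
    """Return (address, size_in_bytes) accesses for a transfer of
    `register_count` 32-bit registers starting at `start_address`."""
    if start_address % 4 != 0:
        raise ValueError("sketch assumes a word-aligned start address")
    accesses = []
    address, remaining = start_address, register_count
    # First address not 64-bit aligned: begin with a single word.
    if remaining and address % 8 != 0:
        accesses.append((address, 4))
        address += 4
        remaining -= 1
    # The bulk of the transfer is performed two words at a time.
    while remaining >= 2:
        accesses.append((address, 8))
        address += 8
        remaining -= 2
    # An odd register left over needs a final 32-bit access.
    if remaining:
        accesses.append((address, 4))
    return accesses


print(split_multiple_transfer(0x1004, 4))
print(split_multiple_transfer(0x1000, 4))
```

Under these assumptions, four registers take two accesses from a 64-bit aligned address but three from a merely word-aligned one, which is one reason to keep stack frames and register save areas doubleword aligned.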

For code efficiency purposes, the programmer should always try to rearrange the instructions in

the code in such a way that no data stalls will occur.

3.6.1 Support for unaligned addresses

The LS pipeline is able to perform certain word-sized load and store instructions of any align-

ment, and word-aligned st.d and ld.d. Any other unaligned memory access will cause an MMU

address exception. All coprocessor memory access instructions require word-aligned pointers.

Doubleword-sized accesses with word-aligned pointers will automatically be performed as two

word-sized accesses.

The following table shows the instructions with support for unaligned addresses. All other instructions require aligned addresses. Accessing an unaligned address may require several clock cycles; refer to Section 10 on page 154 for details.

Table 3-1. Instructions with unalignment support

Instruction                                  Supported alignment
ld.w                                         Any
st.w                                         Any
lddsp                                        Any
lddpc                                        Any
stdsp                                        Any
ld.d                                         Word
st.d                                         Word
All coprocessor memory access instructions   Word
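The alignment rules of Table 3-1 amount to a lookup followed by a modulo check. The following sketch is illustrative only: the function name is hypothetical, coprocessor instructions are left out, and instructions not in the table are assumed to require alignment to their own access size, which the caller passes in as `natural_size`.

```python
# Required pointer alignment in bytes, per Table 3-1.
# A value of 1 means that any alignment is accepted.
UNALIGNED_SUPPORT = {
    "ld.w": 1, "st.w": 1, "lddsp": 1, "lddpc": 1, "stdsp": 1,
    "ld.d": 4, "st.d": 4,
}


def access_is_allowed(instruction, address, natural_size):
    """Return True if the access proceeds, False if it would cause an
    address exception.

    natural_size: access size in bytes, used as the required alignment
                  for instructions that are not listed in the table.
    """
    required = UNALIGNED_SUPPORT.get(instruction, natural_size)
    return address % required == 0


print(access_is_allowed("ld.w", 0x1003, 4))    # word load, any alignment
print(access_is_allowed("ld.d", 0x1004, 8))    # word-aligned doubleword
print(access_is_allowed("ld.d", 0x1002, 8))    # not word aligned
print(access_is_allowed("ld.sh", 0x1001, 2))   # not in the table
```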

3.7 Writeback

The three subpipes share a writeback (WB) stage with three register file write ports. If the three subpipes produce four results at the same time, the MUL pipeline is temporarily stalled until a writeback port is available. The WB stage also contains logic for:

• Sign- or zero-extension of data loaded from cache.

• Execution of ldins and ldswp.

• Output formatting of data loaded from unaligned addresses.

3.8 Forwarding hardware and hazard detection

The pipeline is implemented in such a way that the programmer in most cases will not have to

consider hazards between instructions when writing code. Efficient operand forwarding mecha-

nisms are implemented in order to minimize pipeline stalls due to data dependencies. When

dependencies exist, the hardware will stall the affected parts of the pipeline in order to guaran-

tee correct execution. Data forwarding is done automatically and is invisible to the user. This

ensures that all code will execute correctly, even though the pipeline may have to be stalled in

some cases. The user should be aware of these stalls and try to rewrite the code so that no such

dependencies arise. This will result in faster execution.

Since instructions are allowed to complete out of order, Write-After-Read (WAR), Write-After-Write (WAW) and Read-After-Write (RAW) hazards may all occur. If an instruction is affected

by a hazard, or will provoke a hazard, it is frozen in the IS stage until the hazard is resolved. This

will also freeze all upstream pipeline stages. All downstream stages are allowed to continue exe-

cution. Instructions storing data to memory will read the data to store from the register file in the

D pipeline stage. This pipeline stage has a dedicated hazard detection and forwarding unit. If the

data to store to memory is not available in the D stage, the LS pipe will have to stall. Newer

instructions may still start executing in the other pipelines.

3.8.1 IS stage forwarding

The IS stage is able to forward data from the register file inputs to the register file outputs. If data

to write is present at the write ports of the register file at the same time as the register is read,

the data not yet written will be read. This ensures that data from the writeback stages is forwarded to the register file outputs. This is illustrated in Figure 3-2:

Figure 3-2. Forwarding inside the IS stage

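[Figure: the register file with read port n and write port m (write address, write data). A comparator checks Read address_n == Write address_m and, on a match, the forwarded data is taken from the write data on write port m.]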
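The forwarding inside the IS stage behaves like a register file with a write-to-read bypass. The sketch below models one clock cycle at a time; the class name, the 16-entry size and the dictionary-based write ports are assumptions chosen for illustration, not details of the hardware.

```python
class ForwardingRegisterFile:
    """Register file model in which a read sees data being written
    in the same cycle."""

    def __init__(self, size=16):   # size is an arbitrary assumption
        self.regs = [0] * size

    def cycle(self, reads, writes):
        """Perform one clock cycle.

        reads:  list of register indices presented on the read ports.
        writes: dict {register index: data} presented on the write ports.
        Returns the values seen on the read ports.
        """
        # Bypass: when a read address equals a write address, return the
        # data on the write port instead of the stale stored value.
        results = [writes.get(index, self.regs[index]) for index in reads]
        for index, data in writes.items():
            self.regs[index] = data
        return results


rf = ForwardingRegisterFile()
same_cycle = rf.cycle(reads=[3, 4], writes={3: 0xABCD})
next_cycle = rf.cycle(reads=[3], writes={})
print(same_cycle, next_cycle)
```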

3.8.2 Forwarding sources

All operations that produce valid results are forwarded. All data are forwarded directly from the inputs of pipeline registers. The following figure shows the forwarding sources, and the names of the forwarded signals. Each of the forwarded signals carries a word-sized value. Pipeline registers are illustrated as thick black lines; the load modification unit is illustrated as a gray box.

Figure 3-3. Forwarding sources

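[Figure: block diagram of the integer unit (Multiply pipe M1, M2; ALU pipe A1, A2; WB with the load modification unit) and the load-store unit (Data pipe DA, D). Forwarded signals shown: fwd_mul, fwd_a1, fwd_a2, fwd_dataA, fwd_dataB.]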

3.8.3 Forwarding destinations

The forwarded data is input to the IS stage. The IS stage has logic deciding whether the value

read from the register file is valid, or if a forwarded value should be used. This is illustrated in

Figure 3-4. Forwarded data is shown with bent arrows, and data from the previous pipeline

stage is shown in straight arrows. The forwarded value really consists of all the possible forward

values described in Figure 3-3, but is shown as a single value for simplicity. The prefetch unit

also receives forwarded data. This data is used for calculating an instruction fetch address for

change-of-flow instructions. Target addresses for change-of-flow instructions are produced

either by the A1 stage, or the WB stage.

Figure 3-4. Forwarding destinations

[Figure: block diagram showing the prefetch unit (IF), the Issue stage with the register file, the integer unit (M1, M2, A1, A2, WB) and the load-store unit (Pointer, DA, D, output to cache).]

3.9 Hazards not handled by the hardware

All hazards occurring between normal arithmetical, logical, load-store and change-of-flow

instructions are handled automatically by hardware. There are, however, a few instruction

sequences which must be sequenced by the user. These sequences are described in this chap-

ter. The programmer can assume that any instruction sequence other than the sequences

explicitly mentioned in this chapter will work without any special consideration.

3.9.1 Accessing system registers with mtsr and mfsr

The mtsr instruction writes the contents of a register into a system register. The system registers

control the behaviour of the CPU. The programmer must make sure that any mtsr instruction has

committed and has altered the state of the system in the desired way before issuing any new

instructions that depend on this new state. This can be done by inserting nop instructions, or

other instructions that do not depend on the new state generated by the mtsr instruction.

Table 2-2, “System Registers implemented in AVR32 AP,” on page 10 details the timing for

writes into the different system registers. The system registers are written as the mtsr instruction

leaves the pipeline stage described in the table. The system registers are read as the mfsr

instruction leaves the pipeline stage described in the table. As soon as a system register is read

by mfsr, it can be forwarded as any regular register file register.

Some of the system registers are located inside modules on the TCB bus. These are written

when the mtsr instruction leaves the D pipeline stage. Instructions depending on a mtsr to these system registers being committed must therefore wait in the IS stage until the effects of the mtsr are guaranteed to be visible to the instruction. The following code demonstrates a write to the

ASID field of TLBEHI, followed by a rete to an address which requires the new ASID to be visi-

ble. A nop is inserted to guarantee that the mtsr leaves the D stage at the same time as rete

leaves the A1 stage. In the following cycle, the icache will start fetching at the specified address

and observe the newly updated ASID. Register r0 is assumed to contain the correct value to

write into TLBEHI.

mtsr TLBEHI, r0

nop

rete

3.9.2 Writing to the status register with ssrf and csrf

These instructions have the same timing as a mtsr to the system register.

3.9.3 Writing to and using the JOSP register

The JOSP register is used to determine which register file register to access when in Java

mode. This is needed because the 8 elements on top of the Java operand stack are located in

the register file. Since the register addresses are generated in the ID stage, JOSP is located

here.

JOSP is automatically updated to the correct value when executing Java bytecodes in Java

mode. One may also need to update the JOSP register manually, either with the incjosp instruc-

tion, or using mtsr/mfsr for reading/writing JOSP.

When updating JOSP with incjosp, JOSP is updated with the new value when incjosp has left ID.

The incjosp instruction reads the value of JOSP when it is in ID, and writes the new value as it

leaves ID. If the incjosp instruction is flushed from the pipe before being committed for some rea-

son like an interrupt or a taken change-of-flow instruction, hardware automatically restores the

correct value to the JOSP register. The JOSP register will be restored to the value it had after

the last completed instruction.

When updating JOSP with mtsr, JOSP is updated with the new value when mtsr has left A1.

The user is responsible for not letting any instruction that uses JOSP leave ID before mtsr has

written the new JOSP value. This may require inserting nop instructions between mtsr and any

instruction using JOSP.

The following assembly code illustrates coding to avoid hazards when accessing JOSP. Two

nop instructions are inserted to make sure that the new value of JOSP written by mtsr as mtsr

leaves A1 is visible to the incjosp instruction when it enters ID. A mfsr instruction may follow

immediately after incjosp, as incjosp writes the new JOSP value when it leaves ID, while mfsr

reads JOSP while it is in A1.

mtsr JOSP, r0

nop

nop

incjosp -2

mfsr r1, JOSP

The following assembly code is another illustration of coding to avoid hazards when accessing

JOSP. The two sets of code perform identical operations. This code sets the R bit in the status

register in order to enable remapping of the register file to a Java operand stack. This effectively

remaps r0 to r7 into a Java operand stack, where the mapping from logical register to physical

register is dependent on the value of JOSP. Note that use of the second code example is strongly discouraged in practice, since no JOSP over/underflow detection is performed. The code is presented only to show the differences in timing between the two ways of writing to JOSP.

In the first code, incjosp changes the value of JOSP when it is in ID. The new value of JOSP is

therefore visible when the add instruction enters ID.

In the second code, mtsr writes the new value of JOSP as it leaves A1. As the add instruction

needs JOSP to be updated when it enters ID because of the register remapping, two nop

instructions must be inserted.

First code example, using incjosp:

ssrf R

incjosp -2

add r0, r0

Second code example, using mtsr:

ssrf R

mfsr r8, JOSP

sub r8, 2

mtsr JOSP, r8

nop

nop

add r0, r0
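The nop counts quoted in this section follow from simple stage arithmetic. The sketch below is a simplified model, not a cycle-accurate description: it assumes that instructions advance one stage per cycle through ID, IS and A1 without stalls, and the function name is hypothetical. If the writer updates JOSP as it leaves stage w, a reader that is k + 1 instructions behind occupies stage r exactly (k + 1) + r − w cycles later, and that delay must be at least one cycle, which gives k ≥ w − r.

```python
# In-order front of the pipeline, as named in the text.
STAGES = ["ID", "IS", "A1"]


def nops_needed(write_stage, read_stage):
    """Minimum number of unrelated instructions between a JOSP writer
    and a JOSP reader, under the no-stall assumption above.

    write_stage: stage the writer is leaving when JOSP is updated.
    read_stage:  stage in which the reader uses JOSP.
    """
    w = STAGES.index(write_stage)
    r = STAGES.index(read_stage)
    return max(0, w - r)


print(nops_needed("A1", "ID"))   # mtsr followed by an instruction using JOSP in ID
print(nops_needed("ID", "A1"))   # incjosp followed by mfsr
print(nops_needed("ID", "ID"))   # incjosp followed by a remapped add
```

Under this model only the mtsr case needs padding, which matches the two nop instructions the text calls for.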

3.9.4 Execution of TLB instructions

The TLB instructions tlbr, tlbw and tlbs are used to maintain the data in the TLB. They use the

TCB bus to access the MMU, and the instruction is dispatched to the MMU when the instruction

is in the D pipeline stage. The programmer must make sure that any writes to the TLB with the

tlbw instructions are completed before the TLB entry is used in an icache or dcache memory

access. This is handled automatically for any dcache memory access, since any load/store

instructions flow through the same pipeline as the tlbw instruction, and the tlbw instruction will

have left the D stage before any load/store instruction enters it. Any icache access that is to use

the page table entry written by tlbw must wait until the tlbw instruction is in the D pipeline stage.

This may require inserting a nop or another unrelated instruction, as illustrated in the code below, which shows a part of an ITLB miss handler. The rete instruction needs the page table entry written by tlbw to generate the physical address of the instruction to return to.

tlbw

nop

rete

3.9.5 Execution of cache instructions

The cache instruction performs various cache-related operations, like invalidation of lines. Some of these operations are harmless, and need no sequencing or hazard consideration. Other operations, like invalidation, require more concern. The programmer must make sure that any invalidation is committed before any instruction that depends on the invalidation already being performed is allowed to execute.

The cache instruction uses the TCB bus to access the caches, and the instruction is dispatched

to the caches when the instruction is in the D pipeline stage. The programmer must make sure

that any cache instructions are completed before any icache or dcache memory access that

depends on the cache instruction is executed. This is handled automatically for any dcache

memory access, since any load/store instructions flow through the same pipeline as the cache

instruction, and the cache instruction will have left the D stage before any load/store instruction

enters it. Any icache access that is dependent on the cache instruction must wait until the cache

instruction is in the D pipeline stage. This may require inserting a nop or another unrelated

instruction, as illustrated in the code below. The rjmp instruction wishes to jump to a location

labeled flushedaddress that must be flushed from the cache. INVALIDATEI is a macro that is

defined to be the command for invalidation of the icache.

cache INVALIDATEI

nop

rjmp flushedaddress

3.9.6 Hazards on the Q flag

Some of the instructions in the instruction set update the status register Q flag. Many of these instructions, like satadd, generate the new Q flag after a single cycle so no hazards are present between these instructions and other instructions. The sats, satu, satrnds and satrndu instructions, and some multiply instructions, require several cycles before updating the Q flag. The required Q flag latency for each of these instructions is listed in Section 10 on page 154. The user must make

sure that any of these instructions have completed and updated the Q flag before using the Q

flag in any computations. In the following example, a satrnds instruction is followed by a branch-

if-q-set instruction. A nop is needed in order to guarantee correct execution.

satrnds r0>>0, 5

nop

brqs targetaddress

3.10 Event handling

The CPU is able to respond to different events. An event can be either an interrupt or an excep-

tion. Interrupts are requests from external modules and are routed through the interrupt

controller. Exceptions are system events that require handling outside normal program flow.

Different types of exceptions can occur during execution of an instruction. Some exceptions are

instruction-address related, and occur during instruction fetch. Other exceptions occur during

decode, like unimplemented instruction and illegal opcode. Data access instructions can cause

data-address related exceptions, like DTLB miss. Exceptions can occur in different pipe stages,

depending on the type of exception. Several exceptions can be related to the same instruction.

Mechanisms must therefore be implemented so that several exceptions associated with the

same instruction can be handled correctly. The exception priorities are defined in Table 3-2 on

page 34. An instruction that has caused an exception request is called a contaminated

instruction.

Each pipeline stage has a pipeline register that holds the exception requests associated with the

instruction in that pipeline stage. This allows the exception request to follow the contaminated

instruction through the pipeline.

Events are detected in two different pipeline stages. The D stage detects all data-address

related exceptions (DTLB multiple hit, DTLB miss, DTLB protection and DTLB modified). All

other exceptions and interrupts are detected in the A1 stage. Data breakpoints are also detected

in A1.

A complication occurs with the event detection in the A1 stage: The instruction tagged as con-

taminated may be part of a folded branch. In this case, the event is taken only if the branch

prediction was correct. Otherwise, the entire folded branch instruction is flushed.

Data-address related exceptions are detected in the D stage. The address boundary check unit

ensures that no sequential instructions are issued unless it can be guaranteed that the data

access will not generate an exception.

Generally, all exceptions, including breakpoint, have the failing instruction as restart address.

This allows a fixup exception routine to correct the error and restart the instruction. Interrupts

(INT0-3, NMI) have the address of the first non-completed instruction as restart address. When

an event is accepted, the A1 stage and all upstream stages are flushed.

Branch folding complicates exception handling. If a folded instruction fails the condition check in

the A1 stage, the address of the folded instruction should be used as restart address. This is

implemented by passing the address of the folded instruction in the PC pipeline register.

When folding branches, both the branch and the folded instruction can be contaminated. How do

we determine which of the two instructions caused the exception? The fetch stage is responsible

for not folding instructions if the branch instruction is contaminated. The branch instruction can

be contaminated only due to instruction-address related exceptions, as it must already have

been decoded and recognized in order to have been placed in the BTB. This contamination is

known already in IF. If folding has occurred, it is guaranteed that the contamination was not in

the branch instruction, and it must therefore be in the folded instruction. Therefore, the folded

instruction should be restarted.

3.10.1 Event priority

Several instructions may be in the pipeline at the same time, and several events may be issued

in each pipeline stage. This implies that several pending exceptions may be in the pipeline

simultaneously. Priorities must therefore be imposed, ensuring that the correct event is serviced

first. The priority scheme obeys the following rules:

1. If several instructions trigger events, the instruction furthest down the pipeline is ser-

viced first, even if upstream instructions have pending events of higher priority.

2. If this instruction has several pending events, the event with the highest priority is ser-

viced first. After this event has been serviced, all pending events are cleared and the

instruction is restarted.
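The two rules above can be expressed as a short selection function. This is a sketch with hypothetical names: events are given as tuples, a larger `pipeline_depth` means an instruction further down the pipeline, and a smaller `priority` number is assumed to mean a higher priority. The event names and priority values in the example are invented and are not taken from Table 3-2.

```python
def select_event(pending):
    """Pick the event to service from a list of pending events.

    pending: list of (pipeline_depth, priority, name) tuples.
    Returns the name of the selected event, or None if nothing is pending.
    """
    if not pending:
        return None
    # Rule 1: the instruction furthest down the pipeline wins, even if
    # upstream instructions have pending events of higher priority.
    deepest = max(depth for depth, _, _ in pending)
    candidates = [event for event in pending if event[0] == deepest]
    # Rule 2: among that instruction's events, the highest priority wins.
    return min(candidates, key=lambda event: event[1])[2]


pending = [
    (2, 1, "upstream_high_priority"),
    (5, 7, "downstream_event_a"),
    (5, 9, "downstream_event_b"),
]
chosen = select_event(pending)
print(chosen)
```

After the selected event has been serviced, the remaining events of that instruction are dropped and the instruction is restarted, so they are raised again only if their cause persists.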

3.10.2 Exceptions and interrupt requests

When an event other than scall or debug request is received by the core, the following actions

are performed atomically:

1. The pending event will not be accepted if it is masked. The I3M, I2M, I1M, I0M, EM and

GM bits in the Status Register are used to mask different events. Not all events can be

masked. A few critical events (NMI, Unrecoverable Exception, TLB Multiple Hit and Bus Error) cannot be masked. When an event is accepted, hardware automatically sets the mask bits corresponding to all sources with equal or lower priority. This inhibits acceptance of other events of the same or lower priority, except for the critical events listed above. Software may choose to clear some or all of these bits after saving the necessary state if other priority schemes are desired. It is each event source's responsibility to ensure that its events are left pending until accepted by the CPU.

2. When a request is accepted, the Status Register and Program Counter of the current context are stored in the Return Status Register and Return Address Register corresponding to the new context. Saving the Status Register ensures that the core is

returned to the previous execution mode when the current event handling is completed.

When exceptions occur, both the EM and GM bits are set, and the application may

manually enable nested exceptions if desired by clearing the appropriate bit. Each

exception handler has a dedicated handler address, and this address uniquely identi-

fies the exception source.

3. The Mode bits are set correctly to reflect the priority of the accepted event, and the cor-

rect register file banks are selected. The address of the event handler, as shown in

Table 3-2, is loaded into the Program Counter.

The execution of the event routine then continues from the effective address calculated.

The rete instruction signals the end of the event. When encountered, the values in the Return

Status Register and Return Address Register corresponding to the event context are restored to

the Status Register and Program Counter. The restored Status Register contains information

allowing the core to resume operation in the previous execution mode. This concludes the event

handling.
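The entry and return sequence described above can be sketched as a small software model. This is only an illustration under stated assumptions: the struct types, field names and function names below are hypothetical, only the Status Register fields relevant to an exception are modelled, and the register bank switch is not represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the Status Register fields used in this sketch. */
typedef struct {
    uint8_t mode;   /* SR[M2:M0] */
    bool    gm;     /* SR[GM], global interrupt mask */
    bool    em;     /* SR[EM], exception mask */
} sr_model_t;

/* Hypothetical model of the state touched by exception entry and rete. */
typedef struct {
    sr_model_t sr;
    uint32_t   pc;
    uint32_t   evba;
    sr_model_t rsr_ex;  /* Return Status Register of the exception context */
    uint32_t   rar_ex;  /* Return Address Register of the exception context */
} cpu_model_t;

enum { MODE_EXCEPTION = 0x6 };  /* B'110 */

/* Steps 2 and 3: save the old context, set the masks and mode, load the vector. */
void enter_exception(cpu_model_t *cpu, uint32_t offset)
{
    assert(cpu != NULL);
    cpu->rsr_ex  = cpu->sr;
    cpu->rar_ex  = cpu->pc;
    cpu->sr.mode = MODE_EXCEPTION;
    cpu->sr.em   = true;
    cpu->sr.gm   = true;
    cpu->pc      = cpu->evba | offset;
}

/* rete: restore SR and PC from the return registers of the event context. */
void rete(cpu_model_t *cpu)
{
    assert(cpu != NULL);
    cpu->sr = cpu->rsr_ex;
    cpu->pc = cpu->rar_ex;
}
```

Because the whole Status Register is saved and restored, the mode bits and mask bits of the interrupted context come back unchanged after rete, which is what returns the core to the previous execution mode.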

3.10.3

Supervisor calls

The AVR32 instruction set provides a supervisor mode call instruction. The scall instruction is

designed so that privileged routines can be called from any context. This facilitates sharing of

code between different execution modes. The scall mechanism is designed so that a minimal

execution cycle overhead is experienced when performing supervisor routine calls from time-

critical event handlers.

The scall instruction behaves differently depending on which mode it is called from. The behav-

iour is detailed in the Instruction Set Reference in the Architecture Manual. In order to allow the

scall routine to return to the correct context, a return from supervisor call instruction, rets, is

implemented.

3.10.4

Debug requests

The AVR32 architecture defines a dedicated debug mode. When a debug request is received by

the core, Debug mode is entered. Entry into Debug mode can be masked by the DM bit in the

status register. Upon entry into Debug mode, hardware sets the SR[D] bit and jumps to the

Debug Exception handler. By default, debug mode executes in the exception context, but with

dedicated Return Address Register and Return Status Register. These dedicated registers

remove the need for storing this data to the system stack, thereby improving debuggability. The

mode bits in the status register can freely be manipulated in Debug mode, to observe registers

in all contexts, while retaining full privileges.

Debug mode is exited by executing the retd instruction. This returns to the previous context.

3.11 Entry points for events

Several different event handler entry points exist. For AVR32B, the reset routine entry address

is always fixed to 0xA000_0000. This address resides in unmapped, uncached space in order to

ensure well-defined resets.

TLB miss exceptions and scall have a dedicated space relative to EVBA where their event han-

dler can be placed. This speeds up execution by removing the need for a jump instruction placed

at the program address jumped to by the event hardware. All other exceptions have a dedicated

event routine entry point located relative to EVBA. The handler routine address identifies the

exception source directly.
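To make the spacing of the entry points concrete, the following sketch computes how many bytes lie between a fixed entry point and the next one above it, using the EVBA-relative offsets listed in Table 3-2. The function name and the table layout in C are hypothetical, and autovectored interrupt entries are not considered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed EVBA-relative entry points from Table 3-2, in ascending order. */
static const uint32_t entry_offsets[] = {
    0x00, 0x04, 0x08, 0x0C, 0x10, 0x14, 0x18, 0x1C, 0x20, 0x24, 0x28,
    0x2C, 0x30, 0x34, 0x38, 0x3C, 0x40, 0x44, 0x50, 0x60, 0x70, 0x100
};
#define ENTRY_COUNT (sizeof entry_offsets / sizeof entry_offsets[0])

static_assert(sizeof entry_offsets / sizeof entry_offsets[0] == 22,
              "one entry per fixed EVBA-relative handler address");

/*
 * Bytes between an entry point and the next fixed entry point above it.
 * Returns 0 if the offset is not a fixed entry point, or if it is the
 * highest one (its room is bounded only by whatever is placed after it).
 */
uint32_t bytes_before_next_entry(uint32_t offset)
{
    for (size_t i = 0; i + 1 < ENTRY_COUNT; i++) {
        if (entry_offsets[i] == offset)
            return entry_offsets[i + 1] - offset;
    }
    return 0;
}
```

For the DTLB read miss entry at 0x60 the function returns 16, while for an ordinary exception such as Illegal Opcode at 0x20 it returns 4, which leaves room for little more than a jump to the real handler.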

All external interrupt requests have entry points located at an offset relative to EVBA. This

autovector offset is specified by an external Interrupt Controller. The programmer must make

sure that none of the autovector offsets interfere with the placement of other code. The autovec-

tor offset has 14 address bits, giving an offset of maximum 16384 bytes.

Special considerations should be made when loading EVBA with a pointer. Due to security con-

siderations, the event handlers should be located in the privileged address space, or in a

privileged memory protection region. In a segmented AVR32B system, some segments of the

virtual memory space may be better suited than others for holding event handlers. This is due to

differences in translateability and cacheability between segments. A cacheable, non-translated

segment may offer the best performance for event handlers, as this will eliminate any TLB

misses and speed up instruction fetch. The user may also consider locking the event handlers in

the instruction cache.

If several events occur on the same instruction, they are handled in a prioritized way. The priority

ordering is presented in Table 3-2. If events occur on several instructions at different locations in

the pipeline, the events on the oldest instruction are always handled before any events on any

younger instruction, even if the younger instruction has events of higher priority than the oldest

instruction. An instruction B is younger than an instruction A if it was sent down the pipeline later

than A.
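The ordering rule can be expressed as a two-level comparison: instruction age first, Table 3-2 priority second. The sketch below is a hypothetical software illustration of that rule, not a description of the hardware arbiter; the record type and its fields are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one pending event. */
typedef struct {
    unsigned age;       /* issue order of the instruction: lower = older */
    unsigned priority;  /* Table 3-2 priority: lower number = higher priority */
} pending_event_t;

/* Returns the index of the event that is handled first. */
size_t select_event(const pending_event_t *ev, size_t n)
{
    assert(ev != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        /* An older instruction always wins; priority only breaks ties. */
        if (ev[i].age < ev[best].age ||
            (ev[i].age == ev[best].age && ev[i].priority < ev[best].priority))
            best = i;
    }
    return best;
}
```

An event of priority 28 on the oldest instruction is therefore selected ahead of an event of priority 3 on a younger instruction.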

The addresses and priority of simultaneous events are shown in Table 3-2 on page 34.

The interrupt system requires that an interrupt controller is present outside the core in order to

prioritize requests and generate a correct offset if more than one interrupt source exists for each

priority level. An interrupt controller generating different offsets depending on interrupt request

source is referred to as autovectoring. Note that the interrupt controller should generate

autovector addresses that do not conflict with addresses in use by other events or regular pro-

gram code.

The addresses of the interrupt routines are calculated by adding the address on the autovector

offset bus to the value of the Exception Vector Base Address (EVBA). In AVR32 AP, the actual

autovector address is formed by bitwise OR-ing the autovector offset to EVBA. Using bitwise-OR

instead of an adder saves hardware. The programmer must consider this when setting up

EVBA.
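The practical consequence of OR-ing instead of adding can be shown with a few lines of C. This is a minimal sketch: the function names are hypothetical, and the alignment test shown is a sufficient condition derived from the 14-bit offset width stated above, not a requirement quoted from the hardware specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AUTOVECTOR_BITS 14u
#define AUTOVECTOR_MASK ((1u << AUTOVECTOR_BITS) - 1u)   /* 0x3FFF */

/* Handler address as formed in AVR32 AP: bitwise OR, not addition. */
uint32_t autovector_address(uint32_t evba, uint32_t offset)
{
    assert(offset <= AUTOVECTOR_MASK);
    return evba | offset;
}

/* True when OR-ing equals adding for every possible 14-bit offset. */
bool evba_is_or_safe(uint32_t evba)
{
    return (evba & AUTOVECTOR_MASK) == 0;
}
```

With EVBA = 0x80004000 and offset 0x0120 the result is 0x80004120, the same as an addition. With EVBA = 0x80001000 and offset 0x1000 the OR yields 0x80001000 rather than 0x80002000, so the interrupt would vector to the base address itself. Aligning EVBA to a 16 KB boundary avoids this class of error.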

Table 3-2. Priority and handler addresses for events

Priority  Handler Address         Name                         Event source     Stored Return Address
1         0xA000_0000             Reset                        External input   Undefined
2         Provided by OCD system  OCD Stop CPU                 OCD system       First non-completed instruction
3         EVBA+0x00               Unrecoverable exception      Internal         PC of offending instruction
4         EVBA+0x04               TLB multiple hit             Internal signal  PC of offending instruction
5         EVBA+0x08               Bus error data fetch         Data bus         First non-completed instruction
6         EVBA+0x0C               Bus error instruction fetch  Data bus         First non-completed instruction
7         EVBA+0x10               NMI                          External input   First non-completed instruction
8         Autovectored            Interrupt 3 request          External input   First non-completed instruction
9         Autovectored            Interrupt 2 request          External input   First non-completed instruction
10        Autovectored            Interrupt 1 request          External input   First non-completed instruction
11        Autovectored            Interrupt 0 request          External input   First non-completed instruction
12        EVBA+0x14               Instruction Address          ITLB             PC of offending instruction
13        EVBA+0x50               ITLB Miss                    ITLB             PC of offending instruction
14        EVBA+0x18               ITLB Protection              ITLB             PC of offending instruction
15        EVBA+0x1C               Breakpoint                   OCD system       First non-completed instruction
16        EVBA+0x20               Illegal Opcode               Instruction      PC of offending instruction
17        EVBA+0x24               Unimplemented instruction    Instruction      PC of offending instruction
18        EVBA+0x28               Privilege violation          Instruction      PC of offending instruction
19        EVBA+0x2C               Floating-point               -                Unused in AVR32 AP
20        EVBA+0x30               Coprocessor absent           Instruction      PC of offending instruction
21        EVBA+0x100              Supervisor call              Instruction      PC(Supervisor Call) +2
22        EVBA+0x34               Data Address (Read)          DTLB             PC of offending instruction
23        EVBA+0x38               Data Address (Write)         DTLB             PC of offending instruction
24        EVBA+0x60               DTLB Miss (Read)             DTLB             PC of offending instruction
25        EVBA+0x70               DTLB Miss (Write)            DTLB             PC of offending instruction
26        EVBA+0x3C               DTLB Protection (Read)       DTLB             PC of offending instruction
27        EVBA+0x40               DTLB Protection (Write)      DTLB             PC of offending instruction
28        EVBA+0x44               DTLB Modified                DTLB             PC of offending instruction

3.11.1

Description of events in AVR32 AP

3.11.1.1

Reset Exception

The Reset exception is generated when the reset input line to the CPU is asserted. The Reset

exception can not be masked by any bit. The Reset exception resets all synchronous elements

and registers in the CPU pipeline to their default value, and starts execution of instructions at

address 0xA000_0000.

SR = reset_value_of_SREG;

PC = 0xA000_0000;

All other system registers are reset to their reset value, which may or may not be defined. Refer

to “Programming Model” on page 6 for details.

3.11.1.2

OCD Stop CPU Exception

The OCD Stop CPU exception is generated when the OCD Stop CPU input line to the CPU is

asserted. The OCD Stop CPU exception can not be masked by any bit. This exception is identi-

cal to a non-maskable, high priority breakpoint. Any subsequent operation is controlled by the

OCD hardware. The OCD hardware will take control over the CPU and start to feed instructions

directly into the pipeline.

RSR_DBG = SR;

RAR_DBG = PC;

SR[M2:M0] = B’110;

SR[R] = 0;

SR[J] = 0;

SR[D] = 1;

SR[DM] = 1;

SR[EM] = 1;

SR[GM] = 1;

3.11.1.3

Unrecoverable Exception

The Unrecoverable Exception is generated when an exception request is issued when the

Exception Mask (EM) bit in the status register is asserted. The Unrecoverable Exception can not

be masked by any bit. The Unrecoverable Exception is generated when a condition has

occurred that the hardware cannot handle. The system will in most cases have to be restarted if

this condition occurs.

RSR_EX = SR;

RAR_EX = PC of offending instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x00;

3.11.1.4

TLB Multiple Hit Exception

TLB Multiple Hit exception is issued when multiple address matches occur in the TLB, causing

an internal inconsistency.

This exception signals a critical error where the hardware is in an undefined state. All interrupts

are masked, and PC is loaded with EVBA | 0x04. MMU-related registers are updated with infor-

mation in order to identify the failing address and the failing TLB if multiple TLBs are present.

TLBEHI[ASID] is unchanged after the exception, and therefore identifies the ASID that caused

the exception.

RSR_EX = SR;

RAR_EX = PC of offending instruction;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 0/1, depending on which TLB caused the error;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x04;

3.11.1.5

Bus Error Exception on Data Access

The Bus Error on Data Access exception is generated when the data bus detects an error condi-

tion. This exception is caused by events unrelated to the instruction stream, or by data written to

the cache write-buffers many cycles ago. Therefore, execution can not be resumed in a safe

way after this exception. The value placed in RAR_EX is unrelated to the operation that caused

the exception. The exception handler is responsible for performing the appropriate action.

RSR_EX = SR;

RAR_EX = PC of first non-issued instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x08;

3.11.1.6

Bus Error Exception on Instruction Fetch

The Bus Error on Instruction Fetch exception is generated when the data bus detects an error

condition. This exception is caused by events related to the instruction stream. Therefore, exe-

cution can be restarted in a safe way after this exception, assuming that the condition that

caused the bus error is dealt with.

RSR_EX = SR;

RAR_EX = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x0C;

3.11.1.7

NMI Exception

The NMI exception is generated when the NMI input line to the core is asserted. The NMI excep-

tion can not be masked by the SR[GM] bit. However, the core ignores the NMI input line when

processing an NMI Exception (the SR[M2:M0] bits are B’111). This guarantees serial execution

of NMI Exceptions, and simplifies the NMI hardware and software mechanisms.

Since the NMI exception is unrelated to the instruction stream, the instructions in the pipeline are

allowed to complete. After finishing the NMI exception routine, execution should continue at the

instruction following the last completed instruction in the instruction stream.

RSR_NMI = SR;

RAR_NMI = Address of first noncompleted instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’111;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x10;

3.11.1.8

INT3 Exception

The INT3 exception is generated when the INT3 input line to the core is asserted. The INT3

exception can be masked by the SR[GM] bit, and the SR[I3M] bit. Hardware automatically sets

the SR[I3M] bit when accepting an INT3 exception, inhibiting new INT3 requests when process-

ing an INT3 request.

The INT3 Exception handler address is calculated by adding EVBA to an interrupt vector offset

specified by an interrupt controller outside the core. The interrupt controller is responsible for

providing the correct offset.

Since the INT3 exception is unrelated to the instruction stream, the instructions in the pipeline

are allowed to complete. After finishing the INT3 exception routine, execution should continue at

the instruction following the last completed instruction in the instruction stream.

RSR_INT3 = SR;

RAR_INT3 = Address of first noncompleted instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’101;

SR[I3M] = 1;

SR[I2M] = 1;

SR[I1M] = 1;

SR[I0M] = 1;

PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.9

INT2 Exception

The INT2 exception is generated when the INT2 input line to the core is asserted. The INT2

exception can be masked by the SR[GM] bit, and the SR[I2M] bit. Hardware automatically sets

the SR[I2M] bit when accepting an INT2 exception, inhibiting new INT2 requests when process-

ing an INT2 request.

The INT2 Exception handler address is calculated by adding EVBA to an interrupt vector offset

specified by an interrupt controller outside the core. The interrupt controller is responsible for

providing the correct offset.

Since the INT2 exception is unrelated to the instruction stream, the instructions in the pipeline

are allowed to complete. After finishing the INT2 exception routine, execution should continue at

the instruction following the last completed instruction in the instruction stream.

RSR_INT2 = SR;

RAR_INT2 = Address of first noncompleted instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’100;

SR[I2M] = 1;

SR[I1M] = 1;

SR[I0M] = 1;

PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.10

INT1 Exception

The INT1 exception is generated when the INT1 input line to the core is asserted. The INT1

exception can be masked by the SR[GM] bit, and the SR[I1M] bit. Hardware automatically sets

the SR[I1M] bit when accepting an INT1 exception, inhibiting new INT1 requests when process-

ing an INT1 request.

The INT1 Exception handler address is calculated by adding EVBA to an interrupt vector offset

specified by an interrupt controller outside the core. The interrupt controller is responsible for

providing the correct offset.

Since the INT1 exception is unrelated to the instruction stream, the instructions in the pipeline

are allowed to complete. After finishing the INT1 exception routine, execution should continue at

the instruction following the last completed instruction in the instruction stream.

RSR_INT1 = SR;

RAR_INT1 = Address of first noncompleted instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’011;

SR[I1M] = 1;

SR[I0M] = 1;

PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.11

INT0 Exception

The INT0 exception is generated when the INT0 input line to the core is asserted. The INT0

exception can be masked by the SR[GM] bit, and the SR[I0M] bit. Hardware automatically sets

the SR[I0M] bit when accepting an INT0 exception, inhibiting new INT0 requests when process-

ing an INT0 request.

The INT0 Exception handler address is calculated by adding EVBA to an interrupt vector offset

specified by an interrupt controller outside the core. The interrupt controller is responsible for

providing the correct offset.

Since the INT0 exception is unrelated to the instruction stream, the instructions in the pipeline

are allowed to complete. After finishing the INT0 exception routine, execution should continue at

the instruction following the last completed instruction in the instruction stream.

RSR_INT0 = SR;

RAR_INT0 = Address of first noncompleted instruction;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’010;

SR[I0M] = 1;

PC = EVBA | INTERRUPT_VECTOR_OFFSET;
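The four INTn descriptions follow one pattern: a request at level n is accepted only when neither SR[GM] nor SR[InM] is set, and acceptance sets the mask bits for level n and every lower level. The sketch below models that pattern in C. The struct and function names are hypothetical, and saving of SR and PC to the RSR_INTn and RAR_INTn registers is left out to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the Status Register fields used by INT0..INT3. */
typedef struct {
    uint8_t mode;     /* SR[M2:M0] */
    bool    gm;       /* SR[GM] */
    bool    im[4];    /* SR[I0M]..SR[I3M], indexed by interrupt level */
} sr_model_t;

/* An INTn request is accepted only if neither SR[GM] nor SR[InM] is set. */
bool int_accepted(const sr_model_t *sr, unsigned level)
{
    assert(sr != NULL && level <= 3);
    return !sr->gm && !sr->im[level];
}

/* Status Register updates performed when INTn is accepted. */
void int_enter(sr_model_t *sr, unsigned level)
{
    assert(sr != NULL && level <= 3);
    sr->mode = (uint8_t)(2u + level);   /* INT0 = B'010 ... INT3 = B'101 */
    for (unsigned i = 0; i <= level; i++)
        sr->im[i] = true;               /* mask this level and all lower ones */
}
```

After an INT1 request has been accepted, INT0 and INT1 requests are held off while INT2 and INT3 can still preempt the handler, which matches the mask bits set in the register-transfer descriptions above.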

3.11.1.12

Instruction Address Exception

The Instruction Address Error exception is generated if the generated instruction memory

address has an illegal alignment.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x14;

3.11.1.13

ITLB Miss Exception

The ITLB Miss exception is generated when no TLB entry matches the instruction memory

address, or if the Valid bit in a matching entry is 0.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 1;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x50;

3.11.1.14

ITLB Protection Exception

The ITLB Protection exception is generated when the instruction memory access violates the

access rights specified by the protection bits of the addressed virtual page.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 1;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x18;

3.11.1.15

Breakpoint Exception

The Breakpoint exception is issued when a breakpoint instruction is executed, or the OCD

breakpoint input line to the CPU is asserted, and SREG[DM] is cleared.

An external debugger can optionally assume control of the CPU when the Breakpoint Exception

is executed. The debugger can then issue individual instructions to be executed in Debug mode.

Debug mode is exited with the retd instruction. This passes control from the debugger back to

the CPU, resuming normal execution.

RSR_DBG = SR;

RAR_DBG = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[D] = 1;

SR[DM] = 1;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x1C;

3.11.1.16

Illegal Opcode

This exception is issued when the core fetches an unknown instruction, or when a coprocessor

instruction is not acknowledged. When entering the exception routine, the return address on

stack points to the instruction that caused the exception.

RSR_EX = SR;

RAR_EX = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x20;

3.11.1.17

Unimplemented Instruction

This exception is issued when the core fetches an instruction supported by the instruction set

but not by the current implementation. This allows software implementations of unimplemented

instructions. When entering the exception routine, the return address on stack points to the

instruction that caused the exception.

RSR_EX = SR;

RAR_EX = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x24;

3.11.1.18

Data Read Address Exception

The Data Read Address Error exception is generated if the address of a data memory read has

an illegal alignment.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x34;

3.11.1.19

Data Write Address Exception

The Data Write Address Error exception is generated if the address of a data memory write has

an illegal alignment.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x38;

3.11.1.20

DTLB Read Miss Exception

The DTLB Read Miss exception is generated when no TLB entry matches the data memory

address of the current read operation, or if the Valid bit in a matching entry is 0.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 0;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x60;

3.11.1.21

DTLB Write Miss Exception

The DTLB Write Miss exception is generated when no TLB entry matches the data memory

address of the current write operation, or if the Valid bit in a matching entry is 0.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 0;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x70;

3.11.1.22

DTLB Read Protection Exception

The DTLB Protection exception is generated when the data memory read violates the access

rights specified by the protection bits of the addressed virtual page.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 0;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x3C;

3.11.1.23

DTLB Write Protection Exception

The DTLB Protection exception is generated when the data memory write violates the access

rights specified by the protection bits of the addressed virtual page.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 0;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x40;

3.11.1.24

Privilege Violation Exception

If the application tries to execute privileged instructions, this exception is issued. The complete

list of privileged instructions is shown in Table 3-3. When entering the exception routine, the

address of the instruction that caused the exception is stored as the stacked return address.

RSR_EX = SR;

RAR_EX = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x28;

Table 3-3. List of instructions which can only execute in privileged modes.

Privileged Instructions                                  Comment
csrf - clear status register flag                        Privileged only when accessing upper half of status register
cache - perform cache operation
tlbr - read addressed TLB entry into TLBEHI and TLBELO
tlbw - write TLB entry registers into TLB
tlbs - search TLB for entry matching TLBEHI[VPN]
mtsr - move to system register                           Unprivileged when accessing JOSP and JECR
mfsr - move from system register                         Unprivileged when accessing JOSP and JECR
mtdr - move to debug register
mfdr - move from debug register
rete - return from exception
rets - return from supervisor call
retd - return from debug mode
sleep - sleep
ssrf - set status register flag                          Privileged only when accessing upper half of status register

3.11.1.25

DTLB Modified Exception

The DTLB Modified exception is generated when a data memory write hits a valid TLB entry, but

the Dirty bit of the entry is 0. This indicates that the page is not writable.

RSR_EX = SR;

RAR_EX = PC;

TLBEAR = FAILING_VIRTUAL_ADDRESS;

TLBEHI[VPN] = FAILING_PAGE_NUMBER;

TLBEHI[I] = 0;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x44;
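The DTLB-related exceptions can be read as a decision sequence over the result of a TLB lookup. The sketch below orders the checks by the priorities in Table 3-2 and returns the EVBA-relative handler offset. It is an illustration under stated assumptions: the lookup summary struct and the NO_EXCEPTION marker are hypothetical, the evaluation of the protection bits is reduced to a single flag, and the data address alignment exceptions are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a DTLB lookup for one data access. */
typedef struct {
    unsigned matches;    /* number of TLB entries matching the address */
    bool     valid;      /* Valid bit of the matching entry */
    bool     permitted;  /* access allowed by the page's protection bits */
    bool     dirty;      /* Dirty bit of the matching entry */
} dtlb_lookup_t;

#define NO_EXCEPTION 0xFFFFFFFFu   /* marker value used only by this sketch */

/* Returns the EVBA-relative handler offset, or NO_EXCEPTION. */
uint32_t dtlb_exception_offset(const dtlb_lookup_t *r, bool is_write)
{
    assert(r != NULL);
    if (r->matches > 1)
        return 0x04;                        /* TLB Multiple Hit */
    if (r->matches == 0 || !r->valid)
        return is_write ? 0x70 : 0x60;      /* DTLB Write / Read Miss */
    if (!r->permitted)
        return is_write ? 0x40 : 0x3C;      /* DTLB Write / Read Protection */
    if (is_write && !r->dirty)
        return 0x44;                        /* DTLB Modified */
    return NO_EXCEPTION;
}
```

A write to a valid, permitted page whose Dirty bit is 0 yields offset 0x44, while a read from the same page completes without an exception.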

3.11.1.26

Floating-point Exception

The Floating-point exception is generated when the optional Floating-Point Hardware signals

that an IEEE^®exception occurred, or when another type of error from the floating-point hardware

occurred. Unused in AVR32 AP since it has no FP hardware.

RSR_EX = SR;

RAR_EX = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x2C;

3.11.1.27

Coprocessor Exception

The Coprocessor exception occurs when the addressed coprocessor does not acknowledge an

instruction. This permits software implementation of coprocessors.

RSR_EX = SR;

RAR_EX = PC;

SR[R] = 0;

SR[J] = 0;

SR[M2:M0] = B’110;

SR[EM] = 1;

SR[GM] = 1;

PC = EVBA | 0x30;

3.11.1.28

Supervisor call

Supervisor calls are signalled by the application code executing a supervisor call (scall) instruc-

tion. The scall instruction behaves differently depending on which context it is called from. This

allows scall to be called from other contexts than Application.

When the exception routine is finished, execution continues at the instruction following scall. The

rets instruction is used to return from supervisor calls.

If ( SR[M2:M0] == {B’000 or B’001} )

RAR_SUP ← PC + 2;

RSR_SUP ← SR;

PC ← EVBA | 0x100;

SR[M2:M0] ← B’001;

else

LR_Current Context ← PC + 2;

PC ← EVBA | 0x100;
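The two branches of the scall description can be modelled in C as follows. This is a sketch under assumptions: the struct and its field names are hypothetical, only the mode bits of the Status Register are kept, and the mode encodings B'000 and B'001 are taken to be the Application and Supervisor contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state touched by scall. */
typedef struct {
    uint8_t  mode;      /* SR[M2:M0]; other SR fields are omitted */
    uint32_t pc;
    uint32_t evba;
    uint32_t lr;        /* Link Register of the current context */
    uint32_t rar_sup;   /* Return Address Register, supervisor context */
    uint8_t  rsr_sup;   /* Return Status Register (mode bits only here) */
} cpu_model_t;

void scall(cpu_model_t *cpu)
{
    assert(cpu != NULL);
    if (cpu->mode == 0x0 || cpu->mode == 0x1) {
        /* B'000 or B'001: use the supervisor return registers. */
        cpu->rar_sup = cpu->pc + 2;
        cpu->rsr_sup = cpu->mode;
        cpu->mode    = 0x1;
    } else {
        /* Any other context: the return address goes to the current LR. */
        cpu->lr = cpu->pc + 2;
    }
    cpu->pc = cpu->evba | 0x100;
}
```

When scall is issued from an event handler, the mode bits are left untouched and only the Link Register of the current context receives the return address, which is why the call costs so little from time-critical handlers.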

3.12 Interrupt latencies

The following features in AVR32 AP ensure low and deterministic interrupt latency:

• Four different interrupt levels and an NMI ensure that the user can efficiently prioritize the

interrupt sources.

• Interrupts are autovectored, allowing the CPU to jump directly to the interrupt handler.

• A shadowed interrupt context for INT3 is provided so that critical interrupt handlers can start

directly without having to stack registers.

• Interrupt handler code can be locked in the icache, and the corresponding page table

information can be locked in the TLB.

The calculations below make the following assumptions:

• The interrupt handler code is present in the icache and fetching handler instructions does not

cause any MMU exceptions.

• The pending interrupt is of higher priority than any executing interrupts, so that it can be

handled immediately.

• Any instructions in DA or D do not cause a cache miss. Any interrupts will wait until

instructions in DA or D have left these pipeline stages. If the instruction in DA or D causes a

cache miss, the time for the cache line to be loaded so that the instruction can complete will

depend on the timing of the memory the data will be loaded from. Any time spent reloading a

cache line must be added to the maximum interrupt latency calculated below.

3.12.1

Maximum interrupt latency

The maximum interrupt latency occurs when a long-running instruction is present in DA. Any

instruction must have left DA and D before interrupt handling will commence. The latency can be

calculated as follows:

Table 3-4. Maximum interrupt latency

Source                                                         Delay
Wait for the slowest instruction (ldm/stm) to leave DA and D   10
Wait for autovector target instruction to be fetched           4
TOTAL                                                          14

3.12.2

Minimum interrupt latency

The minimum interrupt latency can be calculated as follows:

Table 3-5. Minimum interrupt latency

Source                                                 Delay
DA and D are empty                                     0
Wait for autovector target instruction to be fetched   4
TOTAL                                                  4
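The latency figures combine by simple addition, and any cache line reload time has to be added on top of the worst case. The sketch below encodes that arithmetic. It assumes the delays in the two tables are counted in clock cycles, which the tables do not state explicitly, and the constant and function names are hypothetical.

```c
#include <assert.h>

enum {
    LDM_STM_DRAIN_CYCLES    = 10,  /* slowest instruction leaving DA and D */
    AUTOVECTOR_FETCH_CYCLES = 4    /* fetch of the autovector target */
};

static_assert(LDM_STM_DRAIN_CYCLES + AUTOVECTOR_FETCH_CYCLES == 14,
              "matches the worst-case total of 14");

/* Worst-case latency, plus any time spent reloading a cache line. */
unsigned max_interrupt_latency(unsigned cache_reload_cycles)
{
    return LDM_STM_DRAIN_CYCLES + AUTOVECTOR_FETCH_CYCLES + cache_reload_cycles;
}

/* Best case: DA and D are empty, only the handler fetch remains. */
unsigned min_interrupt_latency(void)
{
    return AUTOVECTOR_FETCH_CYCLES;
}
```

The cache_reload_cycles argument depends entirely on the external memory system, so it has to be supplied by the system designer.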

3.13 Processor consistency

Special hardware is implemented ensuring strict processor consistency, despite the use of OOO

completion. No instruction is allowed to change the state of the processor if there is a possibility

that an older, uncommitted instruction may not complete. In such a case, the younger instruction

is frozen in the IS stage until it can be guaranteed that the older instruction will commit. In prac-

tice, it is only memory access instructions that can cause a recoverable exception after they

have left the IS stage. Such address-related exceptions are always detected at the end of the D

stage. All other exceptions occurring after an instruction has left the IS stage are unrecoverable,

so processor consistency is unimportant, as a reset will have to be performed anyway.

The following mechanisms ensure processor consistency:

3.13.1

Address boundary checking

If a memory access instruction generates addresses that cross a page boundary, the next

sequential instruction is frozen in the IS stage until the memory access instruction has success-

fully left the D stage. This ensures that no address-related exceptions will occur in the middle of

a memory access instruction. As a consequence, the memory access instruction is guaranteed

to complete, and the following instruction may safely leave the IS stage.

Simple address checking is used to ensure that a memory access instruction cannot cause an

address related exception. This is checked by examining the memory pointer, the size of the

data transfer and the direction of pointer incrementation.
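A check of this kind can be sketched from the three inputs named above. The function below is a hypothetical illustration, not the hardware's actual logic: the page size is passed as a parameter because the text does not say which page size the check assumes, a descending pointer is taken to start at the highest byte address touched, and address wrap-around at the ends of the 32-bit space is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a transfer of `nbytes` starting at `addr` touches more than one
 * page. `descending` selects a pointer that decrements through memory, in
 * which case `addr` is the highest byte address touched.
 */
bool crosses_page(uint32_t addr, uint32_t nbytes, bool descending,
                  uint32_t page_size)
{
    assert(nbytes > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint32_t other_end = descending ? addr - (nbytes - 1)
                                    : addr + (nbytes - 1);
    return (addr / page_size) != (other_end / page_size);
}
```

A 16-byte ascending transfer starting at 0x0FF0 stays inside one 4 Kbyte page, while the same transfer starting at 0x0FF4 ends at 0x1003 and would cause the next instruction to be held in the IS stage.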

3.13.2

Handling contaminated instructions

Contaminated instructions are instructions that are tagged as having caused an exception. The

following rules ensure in-order completion and handling of contaminated instructions.

• A contaminated instruction is frozen in the IS stage until the DA and D stages are empty.

• When a contaminated instruction leaves the IS stage, it is issued to the A1 stage, regardless

of instruction type. All sequential instructions are frozen until the contaminated instruction

has either committed or been flushed from the pipe. This last event can occur only when the

contaminated instruction is folded with a branch.

3.13.3

Handling instructions with PC as destination

Instructions with PC as destination register will cause a change of flow. It must therefore be

ensured that no sequential instructions are allowed to commit before the instruction updating the

PC. When the instruction updating PC has left IS, all upstream stages are frozen until the

instruction updating PC has committed. The new PC value is forwarded directly from the WB

stage to the IF stage.

4. Virtual memory

The AVR32 architecture uses virtual memory in order to support operating systems and large

memory spaces efficiently. Virtual memory simplifies execution of multiple processes and allows

allocation of privileges to different sections of the memory space.

The AVR32 architecture specifies a 32-bit virtual memory space. This virtual space can be

mapped to a 32-bit physical space. How this memory space is used and mapped is defined by

bus controllers and memory controllers on the outside of AVR32 AP.

4.1

Memory map

The memory map has six different segments, named P0 through P4, and U0. The P-segments

are accessible in the privileged modes, while the U-segment is accessible in the unprivileged

mode.

The virtual memory map is specified below.

Figure 4-1. The AVR32 virtual memory space

[Figure: virtual memory map shown side by side for Privileged Modes (segments P0 to P4) and Unprivileged Mode (segment U0 from 0x00000000 up to 0x80000000; the space above it is unaccessible and gives an access error).]

Both the P1 and P2 segments are by default segment translated to the physical address range

0x00000000 to 0x1FFFFFFF. The mapping between virtual addresses and physical addresses

is therefore implemented by clearing the three MSBs of the virtual address. The difference between P1

and P2 is that P1 is cached, while P2 is uncached. Because P1 and P2 are segment translated

and not page translated, code for initialization of MMUs and exception vectors is located in

these segments. P1, being cacheable, offers higher performance than P2.

The P3 space is also by default segment translated to the physical address range 0x00000000

to 0x1FFFFFFF. By enabling and setting up the MMU, the P3 space becomes page translated.

Page translation will override segment translation.

The P4 space is intended for memory mapping special system resources like peripheral mod-

ules. This segment is non-cacheable, non-translated.

The U0 segment is accessible in the unprivileged user mode. This segment is cacheable and

translated, depending upon the configuration of the cache and the memory management unit. If

an access to a memory address outside U0 is made in application mode, an

access error exception is issued.

The virtual address map is summarized in Table 4-1.

Table 4-1. The virtual address map

Virtual address [31:29]  Segment name  Virtual Address Range        Segment size  Accessible from           Default segment translated  Characteristics
111                      P4            0xFFFF_FFFF to 0xE000_0000   512 MB        Privileged                No                          System space; Unmapped, Uncacheable
110                      P3            0xDFFF_FFFF to 0xC000_0000   512 MB        Privileged                Yes                         Mapped, Cacheable
101                      P2            0xBFFF_FFFF to 0xA000_0000   512 MB        Privileged                Yes                         Unmapped, Uncacheable
100                      P1            0x9FFF_FFFF to 0x8000_0000   512 MB        Privileged                Yes                         Unmapped, Cacheable
0xx                      P0 / U0       0x7FFF_FFFF to 0x0000_0000   2 GB          Unprivileged, Privileged  No                          Mapped, Cacheable

The segment translation can be disabled by clearing the S bit in the MMUCR. This will place all

the virtual memory space into a single 4 GB mapped memory space. Segment translation is

enabled by default.

The AVR32 architecture has two translations of addresses.

1. Segment translation (enabled by the MMUCR[S] bit)

2. Page translation (enabled by the MMUCR[E] bit)

Both these translations are performed by the MMU and they can be applied independent of each

other. This means that you can enable:

1. No translation. Virtual and physical addresses are the same.

2. Segment translation only. The virtual and physical addresses are the same for

addresses residing in the P0, P4 and U0 segments. P1, P2 and P3 are mapped to the

physical address range 0x00000000 to 0x1FFFFFFF.

3. Page translation only. All addresses are mapped as described by the TLB entries.

Doing this will give all access permission control to the AP bits in the TLB entry match-

ing the virtual address, and allow all virtual addresses to be translated.

4. Both segment and page translations. P1 and P2 are mapped to the physical address

range 0x00000000 to 0x1FFFFFFF. U0, P0 and P3 are mapped as described by the

TLB entries. The virtual and physical addresses are the same for addresses residing in

the P4 segment.

The segment translation is by default turned on and the page translation is by default turned off

after reset. The segment translation is summarized in Figure 4-2 on page 50.


Figure 4-2. The AVR32 segment translation map

[Figure: the virtual address space (left) is mapped onto the physical address space (right) by segment translation.]

Virtual address space:

0xE0000000 to 0xFFFFFFFF   P4        512MB system space, non-cacheable
0xC0000000 to 0xDFFFFFFF   P3        512MB translated space, cacheable
0xA0000000 to 0xBFFFFFFF   P2        512MB non-translated space, non-cacheable
0x80000000 to 0x9FFFFFFF   P1        512MB non-translated space, cacheable
0x00000000 to 0x7FFFFFFF   P0 / U0   2GB translated space, cacheable

Physical address space: address marks at 0xFFFFFFFF, 0xE0000000, 0x20000000 and 0x00000000; regions labelled "512MB physical address space" and "2GB physical address space".
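The segment-only case (MMUCR[S] set, MMUCR[E] cleared) can be sketched in C as below. This is a minimal illustration rather than a description of the hardware: the type and function names are hypothetical, and the mask 0x1FFFFFFF is simply the 512 MB physical range quoted above for P1, P2 and P3.

```c
#include <stdint.h>

/* Segment names follow Table 4-1; the enum itself is illustrative. */
typedef enum { SEG_P0_U0, SEG_P1, SEG_P2, SEG_P3, SEG_P4 } avr32_segment_t;

/* Classify a virtual address by its three most significant bits. */
static avr32_segment_t avr32_segment_of(uint32_t va)
{
    switch (va >> 29) {
    case 0x7: return SEG_P4;
    case 0x6: return SEG_P3;
    case 0x5: return SEG_P2;
    case 0x4: return SEG_P1;
    default:  return SEG_P0_U0;   /* VA[31] == 0 */
    }
}

/* Segment translation only: P1, P2 and P3 fold onto
 * 0x00000000..0x1FFFFFFF, all other addresses are unchanged. */
static uint32_t avr32_segment_translate(uint32_t va)
{
    switch (avr32_segment_of(va)) {
    case SEG_P1:
    case SEG_P2:
    case SEG_P3:
        return va & 0x1FFFFFFFu;
    default:
        return va;
    }
}
```

With page translation also enabled, P3 would instead be handled by the TLB, so this helper only models the reset configuration.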

4.2

Understanding the MMU

The AVR32 Memory Management Unit (MMU) is responsible for mapping virtual to physical

addresses. When a memory access is performed, the MMU translates the virtual address speci-

fied into a physical address, while checking the access permissions. If an error occurs in the

translation process, or operating system intervention is needed for some reason, the MMU will

issue an exception, allowing the problem to be resolved by software.

The MMU architecture uses paging to map memory pages from the 32-bit virtual address space

to a 32-bit physical address space. Page sizes of 1, 4, 64 Kbytes and 1 Mbyte are supported.

Each page has individual access rights, providing fine protection granularity.

The information needed in order to perform the virtual-to-physical mapping resides in a page

table. Each page has its own entry in the page table. The page table also contains protection

information and other data needed in the translation process. Conceptually, the page table is

accessed for every memory access, in order to read the mapping information for each page.

4.2.1

Virtual Memory Models

The MMU provides two different virtual memory models, selected by the Mode (M) bit in the

MMU Control Register:

• Shared virtual memory, where the same virtual address space is shared between all

processes

• Private virtual memory, where each process has its own virtual address space

In shared virtual memory, the virtual address uniquely identifies which physical address it should

be mapped to. Two different processes addressing the same virtual address will always access


the same physical address. In other words, the Virtual Page Number (VPN) section of the virtual

address uniquely specifies the Physical Frame Number (PFN) section in the physical address.

In private virtual memory, each process has its own virtual memory space. This is implemented

by using both the VPN and the Application Space Identifier (ASID) of the current process when

searching the TLB for a match. Each process has a unique ASID. Therefore, two different pro-

cesses accessing the same VPN won’t hit the same TLB entry, since their ASID is different.

Pages can be shared between processes in private virtual mode by setting the Global (G) bit in the page table entry. This disables the ASID check in the TLB search, so that the VPN section alone uniquely identifies the PFN for that page.

4.2.2

MMU interface registers

The following registers are used to control the MMU, and provide the interface between the MMU and the operating system. Most registers can be altered both by the application software (by writing to them) and by hardware when an exception occurs. All the registers are mapped into the System Register space; their addresses are presented in Section 2.5 ”System registers” on page 10. The MMU interface registers are shown in Figure 4-3 on page 52.


Figure 4-3. The MMU interface registers

TLBEHI:    [31:10] VPN   [9] V   [8] I   [7:0] ASID

TLBELO:    [31:10] PFN   [9] C   [8] G   [7] B   [6:4] AP   [3:2] SZ   [1] D   [0] W

PTBR:      [31:0] PTBR

TLBEAR:    [31:0] TLBEAR

MMUCR:     [31:19] -   [18:14] DRP   [13] -   [12:8] DLA   [7:5] -   [4] S   [3] N   [2] I   [1] M   [0] E

TLBARLO:   [31:0] TLBARLO

4.2.2.1

TLB Entry Register High Part - TLBEHI

The content of the TLBEHI and TLBELO registers is loaded into the TLB when the tlbw instruc-

tion is executed. The TLBEHI register consists of the following fields:

• VPN - Virtual Page Number in the TLB entry. This field contains 22 bits, but the number of bits used depends on the page size. A page size of 1 KB requires 22 bits, while larger page sizes require fewer bits. When preparing to write an entry into the TLB, the virtual page number of the entry to write should be written into VPN. When an MMU-related exception has occurred, the virtual page number of the failing address is written to VPN by hardware.

• V - Valid. Set if the TLB entry is valid, cleared otherwise. This bit is written to 0 by a reset. If an access to a page which is marked as invalid is attempted, a TLB Miss exception is raised. Valid is set automatically by hardware whenever an MMU exception occurs.

• I - Instruction TLB. The I bit is set by hardware when an MMU-related exception occurs,

indicating whether the error was caused by instructions or data. All MMU operations always

use the unified TLB no matter which state the I bit is in.

• ASID - Application Space Identifier. The operating system allocates a unique ASID to each

process. This ASID is written into TLBEHI by the OS, and used in the TLB address match if

the MMU is running in Private Virtual Memory mode and the G bit of the TLB entry is cleared.

ASID is never changed by hardware.
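The TLBEHI fields above can be composed in software as in the following sketch. The bit positions are taken from Figure 4-3 (VPN in bits 31:10, V in bit 9, I in bit 8, ASID in bits 7:0); the macro and function names are hypothetical, and the value is only computed here, not written to the system register.

```c
#include <stdint.h>

#define TLBEHI_VPN_MASK   0xFFFFFC00u  /* VPN, bits 31:10 */
#define TLBEHI_V          (1u << 9)    /* Valid */
#define TLBEHI_I          (1u << 8)    /* Instruction TLB (set by hardware) */
#define TLBEHI_ASID_MASK  0x000000FFu  /* ASID, bits 7:0 */

/* Build a TLBEHI value for a valid entry covering `va` in process `asid`. */
static uint32_t tlbehi_compose(uint32_t va, uint8_t asid)
{
    return (va & TLBEHI_VPN_MASK) | TLBEHI_V | (uint32_t)asid;
}

/* Extract the ASID field from a TLBEHI value. */
static uint8_t tlbehi_asid(uint32_t tlbehi)
{
    return (uint8_t)(tlbehi & TLBEHI_ASID_MASK);
}

/* After an MMU exception: nonzero if the fault came from an instruction fetch. */
static int tlbehi_fault_was_instruction(uint32_t tlbehi)
{
    return (tlbehi & TLBEHI_I) != 0;
}
```

For page sizes larger than 1 KB the low VPN bits are not used in the comparison, so masking to 1 KB granularity as done here is sufficient for every page size.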


4.2.2.2

TLB Entry Register Low Part - TLBELO

The content of the TLBEHI and TLBELO registers is loaded into the TLB when the tlbw instruc-

tion is executed. None of the fields in TLBELO are altered by hardware. The TLBELO register

consists of the following fields:

• PFN - Physical Frame Number to which the VPN is mapped. This field contains 22 bits, but the number of bits used depends on the page size. A page size of 1 KB requires 22 bits, while larger page sizes require fewer bits. When preparing to write an entry into the TLB, the physical frame number of the entry to write should be written into PFN.

• C - Cacheable. Set if the page is cacheable, cleared otherwise.

• G - Global bit used in the address comparison in the TLB lookup. If the MMU is operating in

the Private Virtual Memory mode and the G bit is set, the ASID won’t be used in the TLB

lookup.

• B - Bufferable. Set if the page is bufferable, cleared otherwise.

• AP - Access permissions specifying the privilege requirements to access the page. The

following permissions can be set, see Table 4-2:

Table 4-2. Access permissions implied by the AP bits

AP[2:0]   Privileged mode          Unprivileged mode
000       Read                     None
001       Read / Execute           None
010       Read / Write             None
011       Read / Write / Execute   None
100       Read                     Read
101       Read / Execute           Read / Execute
110       Read / Write             Read / Write
111       Read / Write / Execute   Read / Write / Execute

• SZ - Size of the page. The following page sizes are provided, see Table 4-3:

Table 4-3. Page sizes implied by the SZ bits

SZ[1:0]   Page size   Bits used in VPN   Bits used in PFN
00        1 KB        TLBEHI[31:10]      TLBELO[31:10]
01        4 KB        TLBEHI[31:12]      TLBELO[31:12]
10        64 KB       TLBEHI[31:16]      TLBELO[31:16]
11        1 MB        TLBEHI[31:20]      TLBELO[31:20]

• D - Dirty bit. Set if the page has been written to, cleared otherwise. If the memory access is a

store and the D bit is cleared, an Initial Page Write exception is raised.

• W - Write through. If set, a write-through cache update policy should be used. Write-back

should be used otherwise. The bit is ignored if the cache only supports write-through or write-

back.
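As an illustration of how the TLBELO fields fit together, the sketch below packs them into a 32-bit word. It assumes the bit positions shown for TLBELO in Figure 4-3 (C in bit 9, G in bit 8, B in bit 7, AP in bits 6:4, SZ in bits 3:2, D in bit 1, W in bit 0); the `page_attr_t` structure and the function names are hypothetical.

```c
#include <stdint.h>

/* SZ[1:0] encodings from Table 4-3. */
typedef enum { PAGE_1K = 0, PAGE_4K = 1, PAGE_64K = 2, PAGE_1M = 3 } page_size_t;

/* Illustrative container for the attributes of one page. */
typedef struct {
    uint32_t    phys_addr;   /* any physical address inside the page */
    page_size_t sz;
    unsigned    ap;          /* AP[2:0], see Table 4-2 */
    unsigned    cacheable, global, bufferable, dirty, write_through;
} page_attr_t;

/* PFN bits actually used for each page size (Table 4-3). */
static uint32_t pfn_mask(page_size_t sz)
{
    static const uint32_t mask[4] = {
        0xFFFFFC00u, 0xFFFFF000u, 0xFFFF0000u, 0xFFF00000u
    };
    return mask[sz];
}

/* Pack the attributes into a TLBELO value. */
static uint32_t tlbelo_compose(const page_attr_t *p)
{
    return (p->phys_addr & pfn_mask(p->sz))
         | ((uint32_t)(p->cacheable     & 1u) << 9)
         | ((uint32_t)(p->global        & 1u) << 8)
         | ((uint32_t)(p->bufferable    & 1u) << 7)
         | ((uint32_t)(p->ap            & 7u) << 4)
         | ((uint32_t)p->sz                   << 2)
         | ((uint32_t)(p->dirty         & 1u) << 1)
         |  (uint32_t)(p->write_through & 1u);
}
```

Leaving `dirty` at zero for a freshly mapped page means the first store raises the Initial Page Write exception described under the D bit, which gives the OS a chance to record the page as modified.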


4.2.2.3

Page Table Base Register - PTBR

This register points to the start of the page table structure. The register is not used by hardware,

and can only be modified by software. The register is meant to be used by the MMU-related

exception routines.

4.2.2.4

TLB Exception Address Register - TLBEAR

This register contains the virtual address that caused the most recent MMU-related exception. The register is updated by hardware when such an exception occurs.

4.2.2.5

MMU Control Register - MMUCR

The MMUCR controls the operation of the MMU. The MMUCR has the following fields:

• DRP - Data TLB Replacement Pointer. DRP points to the TLB entry to overwrite when a new

entry is loaded by the tlbw instruction. The DRP field is incremented automatically by

hardware upon every tlbw instruction. If DRP wraps around after such an incrementation,

DRP is set to the value indicated by DLA. The DRP field can also be written by software,

allowing the exception routine to implement a replacement algorithm in software. The DRP

field is 5 bits wide, to support 32 entries in the UTLB.

When a DTLB protection exception, DTLB modified exception, or ITLB protection exception

occurs on a valid page, the DRP is set to the index of that page.

• DLA - Data TLB Lockdown Amount. Specifies the number of locked down TLB entries. All TLB entries from entry 0 to entry (DLA-1) are locked down. If DLA equals zero, no entries are locked down. A DLA setting does not prevent the programmer from modifying an entry in the TLB. DLA is only used when the tlbw autoincrement of DRP causes DRP to wrap.

• S - Segmentation Enable. If set, the segmented memory model is used in the translation

process. If cleared, the memory is regarded as unsegmented. The S bit is set after reset.

• N - Not Found. Set if the entry searched for by the TLB Search instruction (tlbs) was not

found in the TLB.

• I - Invalidate. Writing this bit to one invalidates all TLB entries. The bit is always read as zero.

• M - Mode. Selects whether the shared virtual memory mode or the private virtual memory

mode should be used. The M bit determines how the TLB address comparison should be

performed, see Table 4-4.

Table 4-4. MMU mode implied by the M bit

M   Mode
0   Private Virtual Memory
1   Shared Virtual Memory

• E - Enable. If set, the MMU page translation is enabled. If cleared, no page translation is

performed.
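The interaction between DRP and DLA described above can be sketched as follows. This is a small model of the auto-increment rule only, assuming the 32-entry UTLB stated in the DRP description; the function name is hypothetical.

```c
#define UTLB_ENTRIES 32u

/* Value of DRP after a tlbw instruction: increment, and on wrap-around
 * restart at DLA so that entries 0..DLA-1 are skipped by auto-replacement. */
static unsigned drp_after_tlbw(unsigned drp, unsigned dla)
{
    unsigned next = (drp + 1u) % UTLB_ENTRIES;
    return (next == 0u) ? dla : next;
}
```

With DLA equal to 4, for example, automatic replacement cycles through entries 4 to 31, while software can still overwrite entries 0 to 3 by writing DRP explicitly.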

4.2.2.6

TLB Accessed Register HI - TLBARHI

TLBARHI is not implemented since only 32 TLB entries are present.


4.2.2.7

TLB Accessed Register LO - TLBARLO

The TLBARLO register is a 32-bit register with 32 1-bit fields. Each of these fields contains the Accessed bit for the corresponding UTLB entry. Bit 0 in TLBARLO corresponds to UTLB entry 0, bit 31 in TLBARLO corresponds to UTLB entry 31.

Note: The contents of TLBARLO are reversed to let the Count Leading Zero (CLZ) instruction be

used directly on the contents of the registers. E.g. if CLZ returns the value four on the contents

of TLBARLO, then item four is the first unused item in the TLB.
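The Note can be read as the following sketch. It takes the Note's convention literally, that is, the leading-zero count of TLBARLO is the index of the first unused entry. `clz32` is a portable stand-in for the CLZ instruction, and both function names are hypothetical.

```c
#include <stdint.h>

/* Portable count-leading-zeros; returns 32 for an all-zero word. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Index of the first unused TLB entry per the Note, or -1 if CLZ finds no set bit. */
static int first_unused_tlb_entry(uint32_t tlbarlo)
{
    unsigned n = clz32(tlbarlo);
    return (n < 32u) ? (int)n : -1;
}
```

A replacement routine could write the returned index into MMUCR[DRP] before executing tlbw, falling back to some other policy when -1 is returned.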

4.2.3

Page Table Organization

The MMU leaves the page table organization up to the OS software. Since page table handling and TLB handling are done in software, the OS is free to implement different page table organizations. It is recommended, however, that the page table entries (PTEs) are of the format shown in Figure 4-4. This allows the loaded PTE to be written directly into TLBELO, without the need for reformatting. How the PTEs are indexed and organized in memory is left to the OS.

Figure 4-4. Recommended Page Table Entry format

Bit:     31 .. 10   9   8   7   6 .. 4   3 .. 2   1   0
Field:   PFN        C   G   B   AP       SZ       W   D

4.2.4

TLB organization

The TLB is used as a cache for the page table, in order to speed up the virtual memory translation process. A single TLB is implemented in AVR32 AP, with 32 entries. The TLB is configured as shown in Figure 4-5.

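Figure 4-5. TLB organization

           Address section             Data section
Entry 0    VPN[21:0]  ASID[7:0]  V     PFN[21:0]  C  G  B  AP[2:0]  SZ[1:0]  D  W  A
Entry 1    VPN[21:0]  ASID[7:0]  V     PFN[21:0]  C  G  B  AP[2:0]  SZ[1:0]  D  W  A
Entry 2    VPN[21:0]  ASID[7:0]  V     PFN[21:0]  C  G  B  AP[2:0]  SZ[1:0]  D  W  A
Entry 3    VPN[21:0]  ASID[7:0]  V     PFN[21:0]  C  G  B  AP[2:0]  SZ[1:0]  D  W  A
...
Entry 31   VPN[21:0]  ASID[7:0]  V     PFN[21:0]  C  G  B  AP[2:0]  SZ[1:0]  D  W  A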

The A bit is the Accessed bit. This bit is set when the TLB entry is loaded with a new value using

the tlbw instruction. It is cleared whenever the TLB matching process finds a match in the spe-

cific TLB entry. The A bit is used to implement pseudo-LRU replacement algorithms.

When an address look-up is performed by the TLB, the address section is searched for an entry

matching the virtual address to be accessed. The matching process is described in chapter

4.2.5.


The MMU has a 4-entry micro-ITLB and an 8-entry micro-DTLB connected to the caches. The

caches use the micro-TLBs directly for look-ups. If the desired entry is not found in the small

micro-TLB, the larger common TLB is searched. If the entry is found in the common TLB, it is

copied into the desired micro-TLB and the access is performed. Otherwise, a page miss excep-

tion is issued.

The use of micro-TLBs is completely transparent to the user. Hardware is responsible for replac-

ing entries in the micro-TLB with entries found in the main TLB. Small micro-TLBs are used in

order to increase clock frequency, since performing a look-up in a large TLB is slower than for a

small TLB. If an access misses in the micro-TLB, a clock cycle penalty is imposed for performing

a look-up in the large TLB.

4.2.5

Translation process

The translation process maps addresses from the virtual address space to the physical address

space. The addresses are generated as shown in Table 4-5, depending on the page size

chosen:

Table 4-5. Physical address generation

Page size   Physical address
1 KB        PFN[31:10], VA[9:0]
4 KB        PFN[31:12], VA[11:0]
64 KB       PFN[31:16], VA[15:0]
1 MB        PFN[31:20], VA[19:0]
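The concatenation in Table 4-5 amounts to keeping the upper bits of the matching entry's PFN and the lower bits of the virtual address. A minimal sketch, with hypothetical function names and the SZ encoding of Table 4-3:

```c
#include <stdint.h>

/* Mask of the in-page offset bits for SZ = 0 (1 KB) .. 3 (1 MB). */
static uint32_t page_offset_mask(unsigned sz)
{
    static const uint32_t mask[4] = {
        0x000003FFu, 0x00000FFFu, 0x0000FFFFu, 0x000FFFFFu
    };
    return mask[sz & 3u];
}

/* Physical address = PFN bits of the TLB entry, followed by the VA offset bits. */
static uint32_t physical_address(uint32_t tlbelo, uint32_t va, unsigned sz)
{
    uint32_t off = page_offset_mask(sz);
    return (tlbelo & ~off) | (va & off);
}
```

Passing the whole TLBELO word is safe because the attribute bits in TLBELO[9:0] always fall inside the offset mask and are discarded.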


A data memory access can be described as shown in Table 4-6.

Table 4-6. Data memory access pseudo-code example

    if (Segmentation disabled)
        if ( ! PagingEnabled)
            PerformAccess(cached, write-back);
        else
            PerformPagedAccess(VA);
        endif;
    else
        if (VA in Privileged space)
            if (InApplicationMode)
                SignalException(DTLB Protection, accesstype);
            endif;
        endif;
        if (VA in P4 space)
            PerformAccess(non-cached);
        else if (VA in P2 space)
            PerformAccess(non-cached);
        else if (VA in P1 space)
            PerformAccess(cached, write-back);
        else
            // VA in P0, U0 or P3 space
            if ( ! PagingEnabled)
                PerformAccess(cached, write-back);
            else
                PerformPagedAccess(VA);
            endif;
        endif;
    endif;


The translation process performed by PerformPagedAccess( ) can be described as shown in

Table 4-7.

Table 4-7. PerformPagedAccess( ) pseudo-code example

    match ← 0;
    for (i=0; i<TLBentries; i++)
        if ( Compare(TLB[i]_VPN, VA, TLB[i]_SZ, TLB[i]_V) )
            // VPN and VA match for the given page size and the entry is valid
            if ( SharedVirtualMemoryMode or
                 (PrivateVirtualMemoryMode and ( TLB[i]_G or (TLB[i]_ASID == TLBEHI_ASID) ) ) )
                if (match == 1)
                    SignalException(TLBmultipleHit);
                else
                    match ← 1;
                    TLB[i]_A ← 1;
                    ptr ← i;    // pointer points to the matching TLB entry
                endif;
            endif;
        endif;
    endfor;

    if (match == 0)
        SignalException(DTLBmiss, accesstype);
    endif;

    if (InApplicationMode)
        if (TLB[ptr]_AP[2] == 0)
            SignalException(DTLBprotection, accesstype);
        endif;
    endif;

    if (accesstype == write)
        if (TLB[ptr]_AP[1] == 0)
            SignalException(DTLBprotection, accesstype);
        endif;
        if (TLB[ptr]_D == 0)
            // Initial page write
            SignalException(DTLBmodified);
        endif;
    endif;

    if (TLB[ptr]_C == 1)
        if (TLB[ptr]_W == 1)
            PerformAccess(cached, write-through);
        else
            PerformAccess(cached, write-back);
        endif;
    else
        PerformAccess(non-cached);
    endif;
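For readers who prefer compilable code, the data-side lookup can be modelled in C roughly as below. This is a software model under stated assumptions, not the hardware implementation: the structure layout, the result codes and the function names are hypothetical, `vpn` holds the page-aligned virtual address of the entry, and the update of the A bit and the cache policy selection are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 32

typedef enum {
    ACC_OK, EXC_MULTIPLE_HIT, EXC_MISS, EXC_PROTECTION, EXC_MODIFIED
} mmu_result_t;

typedef struct {
    uint32_t vpn;      /* page-aligned virtual address of the entry */
    uint8_t  asid;
    bool     valid, global, dirty;
    uint8_t  ap;       /* AP[2:0] */
    uint8_t  sz;       /* SZ[1:0] */
} tlb_entry_t;

/* Address bits compared for each page size (Table 4-3). */
static uint32_t page_mask(uint8_t sz)
{
    static const uint32_t mask[4] = {
        0xFFFFFC00u, 0xFFFFF000u, 0xFFFF0000u, 0xFFF00000u
    };
    return mask[sz & 3u];
}

/* Model of the data-side paged lookup; *hit receives the matching index. */
static mmu_result_t data_lookup(const tlb_entry_t tlb[TLB_ENTRIES],
                                uint32_t va, uint8_t asid,
                                bool private_mode, bool app_mode,
                                bool is_write, int *hit)
{
    int match = -1;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        const tlb_entry_t *e = &tlb[i];
        if (!e->valid || ((e->vpn ^ va) & page_mask(e->sz)) != 0)
            continue;                    /* invalid or different page */
        if (private_mode && !e->global && e->asid != asid)
            continue;                    /* belongs to another process */
        if (match >= 0)
            return EXC_MULTIPLE_HIT;     /* second matching entry */
        match = i;
    }
    if (match < 0)
        return EXC_MISS;
    *hit = match;

    const tlb_entry_t *e = &tlb[match];
    if (app_mode && !(e->ap & 0x4u))
        return EXC_PROTECTION;           /* AP[2]: unprivileged access */
    if (is_write) {
        if (!(e->ap & 0x2u))
            return EXC_PROTECTION;       /* AP[1]: write permission */
        if (!e->dirty)
            return EXC_MODIFIED;         /* initial page write */
    }
    return ACC_OK;
}
```

Returning a result code instead of raising an exception keeps the model easy to exercise on a host machine.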


An instruction memory access can be described as shown in Table 4-8.

Table 4-8. Instruction memory access pseudo-code example

    if (Segmentation disabled)
        if ( ! PagingEnabled)
            PerformAccess(cached, write-back);
        else
            PerformPagedAccess(VA);
        endif;
    else
        if (VA in Privileged space)
            if (InApplicationMode)
                SignalException(ITLB Protection, accesstype);
            endif;
        endif;
        if (VA in P4 space)
            PerformAccess(non-cached);
        else if (VA in P2 space)
            PerformAccess(non-cached);
        else if (VA in P1 space)
            PerformAccess(cached, write-back);
        else
            // VA in P0, U0 or P3 space
            if ( ! PagingEnabled)
                PerformAccess(cached, write-back);
            else
                PerformPagedAccess(VA);
            endif;
        endif;
    endif;


The translation process performed by PerformPagedAccess( ) can be described as shown in Table 4-9.

Table 4-9. PerformPagedAccess( ) pseudo-code example

    match ← 0;
    for (i=0; i<TLBentries; i++)
        if ( Compare(TLB[i]_VPN, VA, TLB[i]_SZ, TLB[i]_V) )
            // VPN and VA match for the given page size and the entry is valid
            if ( SharedVirtualMemoryMode or
                 (PrivateVirtualMemoryMode and ( TLB[i]_G or (TLB[i]_ASID == TLBEHI_ASID) ) ) )
                if (match == 1)
                    SignalException(TLBmultipleHit);
                else
                    match ← 1;
                    TLB[i]_A ← 1;
                    ptr ← i;    // pointer points to the matching TLB entry
                endif;
            endif;
        endif;
    endfor;

    if (match == 0)
        SignalException(ITLBmiss);
    endif;

    if (InApplicationMode)
        if (TLB[ptr]_AP[2] == 0)
            SignalException(ITLBprotection);
        endif;
    endif;

    if (TLB[ptr]_AP[0] == 0)
        SignalException(ITLBprotection);
    endif;

    if (TLB[ptr]_C == 1)
        PerformAccess(cached);
    else
        PerformAccess(non-cached);
    endif;
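The permission tests in the data and instruction pseudo-code use one AP bit each: AP[2] for unprivileged access, AP[1] for writes and AP[0] for instruction fetches. The sketch below collects them in one predicate that reproduces Table 4-2; the enum and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXECUTE } access_t;

/* True if a page with access permissions `ap` (AP[2:0]) allows the access. */
static bool ap_permits(unsigned ap, bool privileged, access_t access)
{
    if (!privileged && !(ap & 0x4u))
        return false;                 /* AP[2] clear: privileged mode only */
    switch (access) {
    case ACCESS_WRITE:   return (ap & 0x2u) != 0;   /* AP[1] */
    case ACCESS_EXECUTE: return (ap & 0x1u) != 0;   /* AP[0] */
    default:             return true;               /* reads need no extra bit */
    }
}
```

An OS could use such a predicate when deciding whether a protection exception reflects a genuine violation or a page whose permissions have since been changed in the page table.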

4.3

Operation of the MMU and MMU exceptions

The MMU uses both hardware and software mechanisms in order to perform its memory remap-

ping operations. The following tasks are performed by hardware:

1. The MMU decodes the virtual address and tries to find a matching entry in the TLB.

This entry is used to generate a physical address. If no matching entry is found, a TLB

miss exception is issued.

2. The matching entry is used to determine whether the access has the appropriate

access rights, cacheability, bufferability and so on. If the access is not permitted, a TLB

Protection Violation exception is issued.


3. If any other event arises that requires software intervention, an appropriate exception is

issued.

4. If the correct entry was found in the TLB, and the access permissions were not violated,

the memory access is performed without any further software intervention.

The following tasks must be performed by software:

1. Setup of the MMU hardware by initializing the MMU-related registers and data struc-

tures if needed.

2. Maintenance of the TLB structure. TLB entries are written, invalidated and replaced by

means of software. A tlbw instruction is included in the instruction set to support this.

3. The MMU may generate several exceptions. Software exception handlers must be writ-

ten in order to service these exceptions.

4.3.1

The tlbw instruction

The tlbw instruction is implemented in order to aid in performing TLB maintenance. The instruction copies the contents of TLBEHI and TLBELO into the TLB entry pointed to by the DTLB Replacement Pointer (DRP) in the MMU Control Register. DRP is automatically incremented by hardware in order to implement a TLB replacement algorithm in hardware. Software may update DRP before executing tlbw in order to implement a software replacement algorithm.

4.3.2

TLB synonyms

The caches in the AVR32 AP system are virtually indexed but physically tagged. This allows a

cache access to start before the MMU translation has completed, but puts some restrictions on

which address translations can be performed.

If using pages smaller than one quarter of the cache size, it is possible that the virtual address and the physical address could map to different places in the cache. To avoid unpredictable behaviour, the OS must ensure that no translation changes the address bits below one quarter of the cache size.

This means that all page translations must fulfill the following restriction:

Address_Physical modulo (Cache Size / 4) = Address_Virtual modulo (Cache Size / 4)

For cache sizes up to 16 kB, this is only relevant for 1kB MMU pages. For 32kB caches, this is

also relevant for 4kB pages.

Example 1: On a system with 8kB caches, virtual and physical address must be the same mod-

ulo 2 kB. If using 1kB pages, the OS must ensure that bit 10 of the address is not changed by

the translation.

Example 2: On a system with 16kB caches, virtual and physical address must be the same mod-

ulo 4 kB. If using 1kB pages, the OS must ensure that bit 10 and 11 of the address are not

changed by the translation.
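An OS could check the restriction before installing a mapping, as in the following sketch. The function name is hypothetical, and the cache size is assumed to be given in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if mapping `va` to `pa` keeps the address unchanged
 * modulo one quarter of the cache size. */
static bool translation_is_synonym_safe(uint32_t va, uint32_t pa,
                                        uint32_t cache_size)
{
    uint32_t span = cache_size / 4u;
    return (va % span) == (pa % span);
}
```

For an 8 kB cache, mapping virtual 0x1400 to physical 0x2400 satisfies the restriction (both are 0x400 modulo 2 kB), whereas mapping it to physical 0x2000 changes bit 10 and does not.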

4.3.3

MMU exception handling

This chapter describes the software actions that must be performed for MMU-related excep-

tions. The hardware actions performed by the exceptions are described in detail in Section

3.11.1 ”Description of events in AVR32 AP” on page 35.


4.3.3.1

ITLB / DTLB Multiple Hit

If multiple matching entries are found when searching the TLB, or matching entries are found in

a segment translated area, this exception is issued. This situation is a critical error, since mem-

ory consistency can no longer be guaranteed. The exception hardware therefore jumps to the

reset vector, where software should execute the required reset code. This exception is a sign of

erroneous code and is not normally generated.

The software handler should perform a normal system restart. However, debugging code may

be inserted in the handler.

4.3.3.2

ITLB / DTLB Miss

This exception is issued if no matching entries are found in the TLB, or when a matching entry is found with the Valid bit cleared. The software handler should perform the following steps:

1. Examine the TLBEAR and TLBEHI registers in order to identify the page that caused the fault. Use this to index the page table pointed to by PTBR and fetch the desired page table entry.

2. Use the fetched page table entry to update the necessary bits in TLBEHI and TLBELO. The following bits must be updated (not all bits apply to ITLB entries): V, PFN, C, G, B, AP[2:0], SZ[1:0], W, D.

3. The replacement pointer in MMUCR[DRP] may be written to manually choose which

entry to replace.

4. Execute the tlbw instruction in order to update the TLB entry.

5. Finish the exception handling and return to the application by executing the rete

instruction.
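The steps above can be sketched against a small software model of the MMU registers. Everything here is an assumption made for illustration: the page table is taken to be a flat array of PTEs indexed by VA[31:12] (4 KB pages) in the recommended TLBELO-compatible format, the registers are plain structure fields rather than system registers, DLA is taken as zero, and `model_tlbw` stands in for the tlbw instruction.

```c
#include <stdint.h>

#define TLB_ENTRIES 32u
#define TLBEHI_V    (1u << 9)

/* Hypothetical software model of the MMU interface registers and the TLB. */
typedef struct {
    uint32_t        tlbear, tlbehi, tlbelo;
    const uint32_t *ptbr;                 /* flat page table (assumed layout) */
    unsigned        drp;
    uint32_t        tlb_hi[TLB_ENTRIES];
    uint32_t        tlb_lo[TLB_ENTRIES];
} mmu_model_t;

/* Stand-in for tlbw: copy TLBEHI/TLBELO into entry DRP, then advance DRP. */
static void model_tlbw(mmu_model_t *m)
{
    m->tlb_hi[m->drp] = m->tlbehi;
    m->tlb_lo[m->drp] = m->tlbelo;
    m->drp = (m->drp + 1u) % TLB_ENTRIES;   /* wraps to 0 because DLA == 0 */
}

/* TLB miss handling following steps 1, 2 and 4. */
static void tlb_miss_handler(mmu_model_t *m)
{
    uint32_t pte = m->ptbr[m->tlbear >> 12];  /* step 1: fetch the PTE */
    m->tlbelo = pte;                          /* step 2: PTE goes to TLBELO */
    m->tlbehi |= TLBEHI_V;                    /*         mark the entry valid */
    model_tlbw(m);                            /* step 4: write the TLB entry */
}
```

On real hardware the handler would end with rete, and a production page table would normally be multi-level and would check that the fetched PTE describes a present page before loading it.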

4.3.3.3

ITLB / DTLB Protection Violation

This exception is issued if one of the following occurs:

• Access to a privileged segment in application mode.

• Access to a page translated area, and the access permissions on the matching page do not allow that type of access. MMUCR[DRP] is updated to point to the matching TLB entry.

Software must examine the TLBEAR and TLBEHI registers in order to identify the instruction

and process that caused the error. Corrective measures like terminating the process must then

be performed before returning to normal execution with rete.

4.3.3.4

DTLB Modified

This exception is issued if a valid memory write operation is performed to a page that has never been written before. This is detected by the Dirty bit in the matching TLB entry reading zero. The software handler should perform the following steps:

1. Examine the TLBEAR and TLBEHI registers in order to identify the page that caused the fault. Use this to index the page table pointed to by PTBR and fetch the desired page table entry.

2. Set the Dirty bit in the read page table entry and write this entry back to the page table.

3. Use the fetched page table entry to update the necessary bits in TLBEHI and TLBELO.

The following bits must be updated: V, PFN, C, G, B, AP[2:0], SZ[1:0], W, D.

4. The TLBEHI[I] register is cleared by hardware to indicate that it was a data access, and

MMUCR[DRP] is updated to point to the matching TLB entry.

5. Execute the tlbw instruction in order to update the TLB entry.

6. Finish the exception handling and return to the application by executing the rete

instruction.


5. Prefetch Unit

5.1

Instruction buffer

The instruction buffer is implemented as a 96-bit FIFO queue, holding 12 byte-sized entries. The

buffer can hold either Java or RISC instructions. The instruction at the front of the queue is

issued to the ID stage at each clock cycle. The buffer detects the length of this instruction, and

shifts the queue the appropriate amount. The tail of the queue is filled with instructions from the

instruction cache. Instructions are placed in the buffer as soon as the queue has vacant slots. If

the queue is empty, or the instruction at the head of the queue is only partially fetched, the ID

stage may need to stall until the entire instruction is available.

The queue can contain instructions of different lengths. The instruction at the front of the queue is always assumed to be a valid, aligned and complete instruction. This assumption is necessary in order to separate instructions and to decide where in the buffer an instruction starts and ends; if it did not hold, the hardware would be unable to determine the instruction boundary.

The instruction buffer has the following format:

Figure 5-1. Instruction buffer.

Instruction queue

5.1.1

Instruction buffer fill

If the instruction buffer is non-empty, fetched instructions will always reside at addresses sequential to those of the instructions already in the buffer. In this case, no special handling is needed. If the instruction buffer has been flushed and is empty, the loaded word must be rotated in such a way that the addressed instruction is placed at the front of the instruction queue. If the target address is not word-aligned, the most significant bytes in the fetched word are discarded. In order to efficiently execute branch instructions, the branch targets for extended branch target instructions should be aligned. This will in many cases allow the target instruction to be fetched and executed without pipeline stalls.

A dedicated fetch address adder generates addresses to fetch from the instruction cache for

sequential instruction flow.

5.1.2

Flushing of the instruction buffer

The instruction buffer is flushed in the following circumstances:

• Entry into an exception routine

• Execution of an instruction with PC as target register

• Execution of an unpredicted branch

• Detection of a mispredicted branch

• A procedure return address is popped from the return address stack.


5.1.3

Instruction forwarding

If for some reason the instruction buffer is empty, the fetched instruction will be forwarded past

the fetch queue and into ID as soon as it is fetched. This forwarding is done only when it can be

ensured that the entire instruction is contained in the fetched word. This is ensured if the 2 least

significant bits of the PC of the desired instruction equal 00. If the bits are 10, the fetched instruc-

tion must pass via the instruction buffer. This will impose a 1-cycle penalty on the case where

the addressed instruction is a compact instruction, but is done for critical path simplification.

Instruction forwarding is done only in RISC mode; no instruction forwarding is done in Java mode.

5.2

Branch prediction

5.2.1

Functionality

AVR32 AP implements special hardware in order to minimize the penalty from mispredicted

branches. A branch target buffer (BTB) is used in order to record information about the outcome

of encountered branches. This information is used to predict the outcome of branches based on

the recorded history of the branch. The hardware is able to start fetching instructions from the

predicted path, so that no branch penalty is experienced if the prediction was correct. If the prediction was wrong, the pipeline has to be flushed and execution resumes on the correct path.

If prediction information about an encountered branch is contained in the BTB, hardware will in

many cases be able to fold the branch instruction with the following instruction. In this process,

the branch instruction is removed from the execution stream and its condition codes are passed

on to the following instruction. This instruction is sent down the pipe together with the condition

codes of the branch. If the branch was predicted correctly, the folded instruction is allowed to

complete. Otherwise, the folded instruction is flushed from the pipeline and execution continues

from the alternate branch path.

Branch prediction is enabled or disabled according to the Branch Prediction Enable (BE) bit in

the CPUCR system register. Before enabling or disabling the BTB, it must be invalidated. No

branches should be executed between the BTB invalidate and the BTB enable or disable.

Branch prediction is invisible to the programmer. Hardware makes sure that the program exe-

cutes correctly regardless of a branch being predicted or not, and the correctness of the

prediction.


5.2.2

Predictable instructions

The following instructions are predictable, and may be placed in the BTB.

Table 5-1. Predictable instructions

Instruction    Mode   Conditional   Can fold other instructions   Can merge with other instructions
br k8          RISC   Yes           Yes                           -
br k21         RISC   Yes           Yes                           -
rjmp k10       RISC   No            Yes                           -
rjmp k21       RISC   No            Yes                           -
rcall k10      RISC   No            No                            -
rcall k21      RISC   No            No                            -
if{cond}       Java   Yes           -                             Yes
ifcmp{cond}    Java   Yes           -                             Yes

5.2.3

Foldable instructions

All instructions can be folded into a branch instruction, except for instructions that are predicted

by the BTB. These instructions are listed in Table 5-1. In other words, br {k8, k21} and rjmp {k10,

k21} can not be folded together with any of the instructions listed in Table 5-1. The three call

instructions listed in the table can not fold into any other instructions.

All Java branches can be predicted and can be merged as described in the Java Technical

Reference.

5.2.4

Branch target buffer

The branch target buffer (BTB) is an n-entry direct mapped cache. The indexing function used is index = fetchadr[n+2:2]. This function is expected to present a good hashing function into the cache, distributing competing entries evenly into the cache. Note that bits [n+2:2] in the instruction address are used for BTB lookup. This will map two sequential compact branches to the same BTB entry. This should not reduce performance, as the case where two sequential branches both are predicted taken is meaningless. The other bits in the instruction address are used as cache tag fields.

Each line in the BTB cache has the following format. The Ext field indicates if the branch instruction is an extended instruction.

Figure 5-2. BTB entry.

Tag section:    Branch Address
Data section:   Target address, Valid, History[1:0], Ext

The history bits are implemented as a 2-bit saturating counter. When a new branch is detected,

the counter is initialized to Strong Taken. The FSM has the following coding and transitions:


Figure 5-3. BTB entry.

[Figure: state diagram of the 2-bit history counter. States and encodings: Strong Taken (11), Weak Taken (10), Weak Not Taken (00), Strong Not Taken (01). Transitions between the states are labelled "Branch taken" and "Branch not taken".]
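The history counter and the index function can be modelled as in the sketch below. The state encodings are those given in Figure 5-3; the transitions are assumed to be the conventional ones for a 2-bit saturating counter (one step towards Strong Taken on a taken branch, one step towards Strong Not Taken otherwise). The function names and the `hi_bit` parameter, which selects the upper bit of the index field fetchadr[hi_bit:2], are hypothetical.

```c
#include <stdint.h>

/* History encodings as given in Figure 5-3. */
typedef enum {
    WEAK_NOT_TAKEN   = 0x0,
    STRONG_NOT_TAKEN = 0x1,
    WEAK_TAKEN       = 0x2,
    STRONG_TAKEN     = 0x3
} btb_history_t;

/* Next history state after the branch resolves (assumed transitions). */
static btb_history_t btb_history_update(btb_history_t h, int taken)
{
    switch (h) {
    case STRONG_TAKEN:   return taken ? STRONG_TAKEN   : WEAK_TAKEN;
    case WEAK_TAKEN:     return taken ? STRONG_TAKEN   : WEAK_NOT_TAKEN;
    case WEAK_NOT_TAKEN: return taken ? WEAK_TAKEN     : STRONG_NOT_TAKEN;
    default:             return taken ? WEAK_NOT_TAKEN : STRONG_NOT_TAKEN;
    }
}

/* With these encodings, bit 1 of the history is the predicted direction. */
static int btb_predict_taken(btb_history_t h)
{
    return ((unsigned)h & 0x2u) != 0;
}

/* Direct-mapped index taken from fetchadr[hi_bit:2], with 2 <= hi_bit <= 31. */
static uint32_t btb_index(uint32_t fetchadr, unsigned hi_bit)
{
    uint32_t field_mask = (1u << (hi_bit - 1u)) - 1u;   /* hi_bit - 1 index bits */
    return (fetchadr >> 2) & field_mask;
}
```

Because the two least significant address bits are dropped, two compact branches in the same word (for example at 0x1000 and 0x1002) share one index, as noted in the description of the indexing function.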

5.2.5

BTB update policy

A new entry is stored in the BTB when the following conditions are met:

• The branch instruction or the branch-folded instruction has executed, i.e. left A1.

• The branch was taken

• The branch is not currently in the BTB

• Branch prediction is enabled

Once a branch is stored in the BTB, the history bits are updated upon every execution of the

branch. When a new branch is detected, the counter is initialized to Strong Taken, and the Valid

bit is set.

5.2.6

Reset

Branch prediction is enabled after reset. All valid bits in the BTB are cleared after reset.

5.2.7

Invalidation

The BTB is invalidated when one or more of the following events occur:

• Reset

• The instruction cache is invalidated

• The ASID part of the TLBEHI system register is written

• The BTB Invalidate (BI) bit in the CPUCR system register is written

The application may need to manually invalidate the BTB after executing self-modifying code, in order to avoid false predictions.

Important note:

As mentioned above, the BTB is invalidated when the ASID field in TLBEHI is written. As shown in Table 2-2, “System Registers implemented in AVR32 AP,” on page 10, TLBEHI is accessed through the TCB bus. This implies that several instructions can be present upstream in the pipeline when TLBEHI is written. However, writing to ASID does not cause the pipeline to be flushed. The user must therefore ensure that predicted instructions are not fetched and sent down the pipeline before the write to ASID has invalidated the BTB. Failing to do so may cause UNDEFINED behaviour. This error can be avoided by one of the following methods:

• Scheduling the code executed after the write to TLBEHI, such that no predictable instructions

are fetched before the BTB has been invalidated by the write to TLBEHI, or

• Forcing a pipeline flush by issuing an unpredicted change-of-flow instruction after the write to

the TLBEHI, so that the pipeline is flushed after the BTB has been invalidated. This can be

done by the following code sequence:

mtsr AVR32_TLBEHI, r0 ; Update TLBEHI with new value present in r0

sub pc, -2 ; Not predictable COF insn flushing the pipe

5.2.8

Disabling

Branch prediction is disabled by clearing the Branch Prediction Enable (BE) bit in CPUCR.


5.2.9

Return stack

The return stack is a 4-entry circular buffer, holding the return addresses for call instructions.

The ID stage controls the pushing of return addresses to the return stack. When A1 detects a

rcall, icall, mcall or acall instruction, the address of the instruction following the call (the instruc-

tion in IS) is pushed. This is the return address.

There are two types of return instructions: predicted taken and predicted not taken. The prediction is static and based on the instruction opcode, as shown in Table 5-2. Predicted taken return instructions cause a return stack pop, and execution continues as soon as possible from the new path. Predicted not taken return instructions do not cause a return stack pop until they have reached the A1 stage and the condition has evaluated to true.

When ID detects a predicted taken return instruction, the top-of-stack element is popped and

used as an instruction fetch address. The return stack is circular, and overflow is handled by

hardware by means of a saturating valid-element counter. If a predicted taken return instruction

is encountered and the return stack is empty, the return instruction will still be executed cor-

rectly, but with a cycle penalty.
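The circular behaviour with a saturating valid-element counter can be sketched as follows. This is a behavioural model under assumptions, not a description of the hardware: the class name, the method names and the choice to return None for an empty stack are all hypothetical.

```python
class ReturnStack:
    """4-entry circular return stack with a saturating valid-element counter."""

    DEPTH = 4

    def __init__(self):
        self.slots = [0] * self.DEPTH
        self.top = 0    # index of the next free slot
        self.valid = 0  # number of valid elements, saturates at DEPTH

    def push(self, return_addr):
        # On overflow the oldest entry is silently overwritten.
        self.slots[self.top] = return_addr
        self.top = (self.top + 1) % self.DEPTH
        self.valid = min(self.valid + 1, self.DEPTH)

    def pop(self):
        """Return the predicted return address, or None if the stack is empty."""
        if self.valid == 0:
            return None  # no prediction; the return still executes correctly
        self.top = (self.top - 1) % self.DEPTH
        self.valid -= 1
        return self.slots[self.top]


rs = ReturnStack()
for addr in (0x100, 0x200, 0x300, 0x400, 0x500):  # five nested calls
    rs.push(addr)
popped = [rs.pop() for _ in range(5)]
```

After five pushes only the four most recent return addresses can be predicted; the fifth pop finds the stack empty.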

The instructions in Table 5-2 are considered return-instructions. No other instructions or mecha-

nisms should be used to return from call-instructions. Violation of this rule will place hardware in

an undetermined state. If the user wants to return from call-instructions by other means than the

instructions listed in Table 5-2, the return stack must be disabled by clearing the Return Stack

Enable (RE) bit in CPUCR. Another approach is to flush the return stack before returning by exe-

cuting the flush return stack (frs) instruction.

Table 5-2. Predictable return instructions

Instruction                      Prediction
mov pc, lr                       Taken
ret, cond == AL                  Taken
ret, cond != AL                  Not taken
popm with PC in register list    Taken
ldm with PC in register list     Taken

5.2.10

Reset

The return stack is enabled and empty after reset.

5.2.11

Disabling

The return stack is disabled by clearing the Return Stack Enable (RE) bit in CPUCR. Disabling

the stack will reset the element counter, removing all entries.


6. Instruction Cache

The AVR32 AP uses an instruction cache in order to increase performance and lower power

consumption. The cache has the following features:

• Virtually indexed, physically tagged.

• 4-way Set-associative.

• 32-byte line size.

• Least recently used (LRU) allocate-on-read-miss line replacements.

• Easily portable using standard single and two port RAMs.

• Lockable on a per-line basis.

• Cacheable or uncacheable operation configurable on a per-page basis through the MMU.

• All accesses are subject to MMU protection and translation checks.

• Powerful cache maintenance operations, allowing many common cache operations to be

performed through a single instruction.

The number of sets in the instruction cache is specified in the CONFIG1 register described in

chapter 2.6. The total cache size is (number of sets * line size * associativity), i.e. (128 * number

of sets) bytes.

6.1

Behaviour

Reset invalidates all entries in the ICache.

All instruction fetches will result in an ICache lookup and will return the cached data if the

requested address is found in the cache (a cache hit). If data are found in the cache, they are

returned even if the memory area is marked as uncacheable. If the address is not found (a

cache miss) and the address is in a cacheable area, a burst read access is started on the system bus in order to read an entire cache line of data. The read data will be written to an unlocked line according to an LRU scheme, possibly replacing another line.

If a cache miss is in an uncacheable area, a single non-sequential read is started on the system

bus.

If the MMU is disabled or the fetch is from an unmapped segment, the cacheability of memory

areas is predefined as shown in the architecture manual. If the MMU is enabled and the fetch is

from a mapped segment, the cacheability of memory areas is controlled through the C bit in the

TLB.

If the MMU signals an exception, the cache will abandon the fetch. The exception is handled by

the CPU as soon as it knows if the instruction should really be executed, but is ignored if it turns

out to be a needlessly prefetched instruction.

There is no hardware support for self-modifying code. If any memory that may be cached in the

ICache is modified during execution, the programmer is responsible for ensuring ICache consis-

tency. See chapter 6.3 for details on how to do this.


6.2

Cache operations

All cache operations are initiated through the CACHE instruction. See the Instruction Set

Description for the format of the CACHE instruction.

The following cache operations are defined for the ICache:

Table 6-1. ICache operations

Op[4:3]   Op[2:0]    Operation                    Parameter
00        000        Flush                        Flush mode
00        001        Invalidate                   Virtual Address
00        010        Lock                         Virtual Address
00        011        Unlock                       Virtual Address
00        100        Prefetch                     Virtual Address
00        101-111    Reserved                     N/A
Other     xxx        Reserved for other caches    N/A

6.2.1

The Flush operation

The flush operation is used to reset the contents of the cache. This is done automatically at reset, and may be used at the programmer's discretion at other times. The parameter is used to select one of the following flush modes:

Table 6-2. ICache flush modes

Mode      Name              Description
0         Flush all         All lines are invalidated and unlocked, including locked ones.
1         Flush unlocked    Invalidate all unlocked lines.
2         Unlock all        All locked lines are unlocked, but no lines are invalidated.
Others    Undefined         Should not be used - operation is undefined.

6.2.2

The Invalidate operation

The invalidate operation will try to invalidate the line containing the address given by the parameter. The address is treated as a virtual address, and is translated to a physical address by the MMU. If the line exists in the cache, it is marked as invalid and unlocked. Otherwise nothing is done. If any MMU exceptions happen, the operation is silently aborted.

6.2.3

The Lock operation

The lock operation will try to lock the line containing the address given by the parameter into the

cache. The address is treated as a virtual address, and is translated to a physical address by the

MMU. If the line exists in the cache, the lock bit is set. If the line does not exist in the cache, it will

be fetched from the bus and locked - even if the area is uncacheable. If any exceptions happen

- MMU or bus exceptions - the operation is silently aborted. If the requested address maps to a

set with four lines already locked, the operation is silently aborted.


6.2.4

The Unlock operation

The unlock operation will try to unlock the line containing the address given by the parameter. The address is treated as a virtual address, and is translated to a physical address by the MMU. If the line exists in the cache and is locked, it is unlocked. Otherwise nothing is done. If any MMU exceptions happen, the operation is silently aborted.

6.2.5

The Prefetch operation

The prefetch operation will try to load the line containing the address given by the parameter into

the cache. The address is treated as a virtual address, and is translated to a physical address by

the MMU. If the line exists in the cache, nothing is done. If the line does not exist in the cache, it

will be fetched from the bus - even if the area is uncacheable. If any exceptions happen - MMU

or bus exceptions - the operation is silently aborted.

6.3

Memory coherency

Whenever code is modified in some way, e.g. through self-modifying code, there is a chance

that the instruction cache may hold cached copies of the old instructions. To ensure correct exe-

cution of the new code, the user must manually force the caches to update. The procedures for

doing so are described below.

6.3.1

DMA of program code

If some peripheral updates program code through DMA to memory, the instruction cache may

hold cached copies of the old code. The old code must be manually flushed from the instruction

cache by following these steps:

1. Flush the entire instruction cache, as described in chapter 6.2.1, or

2. Flush only the affected memory areas through one or more Invalidate operations, as

described in chapter 6.2.2.

3. Jump to the new code using an unpredicted branch

Example:

mov R0, 0

cache ICACHE_FLUSH, R0[ICACHE_INVALIDATE_ALL]

mov PC, new_code_label

6.3.2

Self-modifying code

If the CPU updates the code through writing to memory, the updated code may be buffered in

the data cache or the write buffer, and the instruction cache may hold cached copies of the old

code. The caches must be manually updated by following these steps:

1. Clean the entire data cache, as described in chapter 7.3.2, or

2. Clean only the affected memory areas through one or more Clean operations, as described in chapter 7.3.6.

3. Empty the write buffer, as described in chapter 7.5

4. Flush the entire instruction cache, as described in chapter 6.2.1, or

5. Flush only the affected memory areas through one or more Invalidate operations, as

described in chapter 6.2.2.

6. Jump to the new code using an unpredicted branch


Example:

mov R0, 0

cache DCACHE_FLUSH, R0[DCACHE_CLEAN_ALL]

sync 0

cache ICACHE_FLUSH, R0[ICACHE_INVALIDATE_ALL]

mov PC, new_code_label

6.4

Debug access to ICache memories

It is possible to directly access the memories in the ICache through the SAB bus.

The ICache maps read or write requests to the cache memories by decoding the address as

shown below:

Figure 6-1. ICache direct access addressing.

[Figure: 32-bit address. Fields from bit 31 down to bit 0: Ignored, T/D, Way, Set, Byte.]

Table 6-3. ICache direct access address fields

Field    Size (bits)             Description
T/D      1                       Access tag memories if set, data memories if cleared
Way      2                       Selects which line in a set to access.
Set      log2(number of sets)    Selects which set to access.
Byte     5                       Selects which byte of the line to access. Ignored if accessing tag memory.

Note that the ICache only supports word-aligned 32-bit accesses. Other sizes or alignments may

cause undefined behaviour.
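A debugger-side helper for decoding such an address could look like the sketch below. It assumes that the fields are packed contiguously from bit 0 upwards in the order of Figure 6-1 (Byte, Set, Way, T/D) and that the number of sets is a power of two. The function name and the 128-set configuration are hypothetical.

```python
def decode_icache_debug_address(addr: int, num_sets: int) -> dict:
    """Split a direct access address into the fields of Table 6-3."""
    set_bits = num_sets.bit_length() - 1  # log2(number of sets)
    return {
        "byte": addr & 0x1F,                               # 5 bits
        "set": (addr >> 5) & (num_sets - 1),               # log2(sets) bits
        "way": (addr >> (5 + set_bits)) & 0x3,             # 2 bits
        "tag_memory": bool((addr >> (7 + set_bits)) & 1),  # T/D bit
    }


# Hypothetical 128-set cache: T/D = 1, way 2, set 5, byte offset 8.
example = decode_icache_debug_address(
    (1 << 14) | (2 << 12) | (5 << 5) | 8, num_sets=128
)
```

Bits above the T/D field are ignored by the ICache, so they are masked away here as well.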


6.4.1

Format of tag memory

Each word in the tag memory is formatted as shown below:

Figure 6-2. ICache direct access tag format.

[Figure: 32-bit tag word. From bit 31 downwards: Tag, then N/A down to bit 2. Bit 1: Valid. Bit 0: Lock.]

Table 6-4. ICache direct access tag fields

Field    Size (bits)                  Description
Tag      27 - log2(number of sets)    The most significant bits of the physical address of the data in the specified line.
N/A      30 - tag size                Not used. Reads are undefined, writes are ignored.
Lock     1                            Set if the line is locked, cleared otherwise.
Valid    1                            Set if the line is valid, cleared otherwise.


7. Data Cache and Write Buffer

The AVR32 AP uses a data cache and a write buffer in order to increase performance and

reduce power consumption. The data cache has the following features:

• Virtually indexed, physically tagged.

• 4-way set-associative.

• 32-byte line size

• Least recently used allocate-on-read-miss line replacements.

• Synthesizable design using standard single and two port RAMs.

• Lockable on a per-line basis.

• Cacheable or uncacheable operation configurable on a per-page basis through the MMU.

• Write-back or write-through operation configurable on a per-page basis through the MMU.

• All accesses are subject to MMU protection and translation checks.

• Powerful cache maintenance operations, allowing many common cache operations to be

performed through a single instruction.

The number of sets in the data cache is specified in the CONFIG1 register described in chapter

2.6. The total cache size is (number of sets * line size * associativity), i.e. (128 * number of sets)

bytes.

The write buffer has the following features:

• Bufferable or unbufferable writes configurable on a per-page basis through the MMU.

• Does write combining on bufferable writes.

• Holds up to 32 bytes of buffered data, plus up to 32 bytes of data that are about to be written

to the bus.

• Can be manually flushed by using the SYNC instruction.

7.1

Data cache behaviour

Reset invalidates all entries in the DCache.

All reads result in a DCache lookup and will return the cached data if the requested address is

found in the cache (a cache hit). If the address is not found (a cache miss) and the address is in

a cacheable area, a burst read access is started on the system bus in order to read an entire

cache line of data. The read data will be written to an unlocked line according to a round-robin

scheme, possibly replacing another line.

If the cache miss is in an uncacheable area, a single non-sequential read is started on the sys-

tem bus.

All writes result in a DCache lookup and will update the cached data if the requested address is

found in the cache (a cache hit). If the address is not found (a cache miss) or the address is con-

figured as write-through, the write will also update the write buffer (if bufferable) or be written to

the bus (if unbufferable).
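The write handling described above can be summarized as a small decision function. This is a sketch only: the function name and the action strings are hypothetical labels, and the hit, write-through and bufferable attributes are taken as already resolved by the cache lookup and the MMU.

```python
def route_write(hit: bool, write_through: bool, bufferable: bool) -> list:
    """Return the places a write is sent to, in the order described in the text."""
    actions = []
    if hit:
        actions.append("update cache line")
    if (not hit) or write_through:
        # The write also leaves the cache: buffered if allowed, otherwise
        # sent towards the bus.
        actions.append("write buffer" if bufferable else "bus")
    return actions


write_back_hit = route_write(hit=True, write_through=False, bufferable=True)
write_through_hit = route_write(hit=True, write_through=True, bufferable=False)
miss = route_write(hit=False, write_through=False, bufferable=True)
```

Only a hit to a write-back page stays entirely inside the cache; every other combination produces traffic towards the bus.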

If the MMU is disabled or the access is to an unmapped segment, the cacheability, bufferability and write-back/write-through attributes of memory areas are predefined as shown in the architecture manual. If the MMU is enabled and the access is to a mapped segment, the attributes are configured through the C, B and W bits in the TLB.


7.2

Write buffer behaviour

Reset invalidates the entire write buffer.

Writes are separated into "write now" (unbufferable) and "write later" (bufferable) writes. Multiple bufferable writes to the same aligned 32-byte line may be merged for performance, and reads may forward buffered data directly from the buffer. Bufferable writes will stay in the write buffer until forced out when one of the following occurs:

• A buffered write is to another 32-byte line than the one currently in the buffer. The old data are moved to the "write now" part of the buffer, and the new data are kept as "write later".

• A read finds only part of the requested data in the write buffer, e.g. a word read finds one byte of the word in the write buffer. The buffered data are moved to the "write now" part of the buffer, and the read is queued until the write has completed. This ensures data consistency.

• A SYNC instruction is executed. All data in the write buffer are written to the bus.

Unbufferable writes are put in the "write now" part of the buffer. These are written to the bus at the earliest opportunity, in the same order as issued, and without any write combining. These data cannot be forwarded by reads, and are always performed before any read misses.
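The write combining of bufferable writes can be illustrated with a simplified model. It is a sketch under several assumptions: writes are byte-sized, the "write now" part is collapsed into a log of bus transfers, and partial-read handling and unbufferable writes are left out. The class and attribute names are hypothetical.

```python
LINE_SIZE = 32


class WriteBuffer:
    """Holds one 32-byte "write later" line and merges writes into it."""

    def __init__(self):
        self.later_line = None  # base address of the buffered line
        self.later_data = {}    # byte offset within the line -> value
        self.bus_log = []       # (line base, {offset: value}) in bus order

    def _force_out(self):
        if self.later_line is not None:
            self.bus_log.append((self.later_line, self.later_data))
            self.later_line, self.later_data = None, {}

    def write_bufferable(self, addr: int, value: int):
        line = addr - (addr % LINE_SIZE)
        if self.later_line is not None and line != self.later_line:
            self._force_out()  # a write to another line pushes the old data out
        self.later_line = line
        self.later_data[addr % LINE_SIZE] = value

    def sync(self):
        self._force_out()      # SYNC writes all buffered data to the bus


wb = WriteBuffer()
wb.write_bufferable(0x100, 1)
wb.write_bufferable(0x104, 2)  # same line: merged, nothing on the bus yet
wb.write_bufferable(0x120, 3)  # other line: the first line is forced out
wb.sync()
```

In this run the two writes to line 0x100 reach the bus as one combined transfer, followed by the transfer for line 0x120.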

7.3

Cache and write buffer operations

7.3.1

The Cache instruction

The cache instruction can be used to send special commands to the cache with a 32-bit parameter. See the architecture reference manual for a description of the instruction. The following commands are currently defined:

Table 7-1. Cache instruction parameters

Op[4:3]   Op[2:0]    Operation                    Parameter
01        000        Flush                        Flush Mode
01        001        Lock                         Virtual Address
01        010        Unlock                       Virtual Address
01        011        Invalidate                   Virtual Address
01        100        Clean                        Virtual Address
01        101        Clean & Invalidate           Virtual Address
01        110-111    Reserved for future use      Undefined
Other     xxx        Reserved for other caches    N/A


7.3.2

Flush

The flush operation is used to reset the contents of the cache. While a flush is active, all other

operations are stalled. The parameter is used to select one of the following flush modes:

Table 7-2. Flush modes

Mode      Name                           Description
0         Invalidate all                 All lines are invalidated and unlocked. Dirty lines are not cleaned!
1         Invalidate unlocked            All unlocked lines are invalidated. Dirty lines are not cleaned!
2         Clean all                      All lines are cleaned, but no lines are invalidated.
3         Clean Unlocked                 All unlocked lines are cleaned, but no lines are invalidated.
4         Clean & Invalidate all         All lines are cleaned, invalidated and unlocked.
5         Clean & Invalidate unlocked    All unlocked lines are cleaned and invalidated.
6         Unlock all                     All locked lines are unlocked.
Others    Undefined                      Should not be used - operation is undefined.
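The effect of the flush modes on the line state can be sketched as below. This is a behavioural illustration, not the hardware implementation: a line is modelled as a dictionary with hypothetical 'valid', 'locked' and 'dirty' flags, and the return value counts the lines that would be written back.

```python
def flush(lines: list, mode: int) -> int:
    """Apply a DCache flush mode from Table 7-2; return the number of lines cleaned."""
    if mode not in range(7):
        raise ValueError("undefined flush mode")
    cleaned = 0
    for line in lines:
        # Modes 1, 3 and 5 only touch unlocked lines.
        if mode in (1, 3, 5) and line["locked"]:
            continue
        if mode in (2, 3, 4, 5) and line["valid"] and line["dirty"]:
            line["dirty"] = False  # written back
            cleaned += 1
        if mode in (0, 1, 4, 5):
            line["valid"] = False
            line["dirty"] = False  # in modes 0 and 1 dirty data is simply lost
        if mode in (0, 4, 6):
            line["locked"] = False
    return cleaned


cache = [
    {"valid": True, "locked": True, "dirty": True},
    {"valid": True, "locked": False, "dirty": True},
]
written_back = flush(cache, 5)  # Clean & Invalidate unlocked
```

After mode 5 the locked line is untouched, while the unlocked line has been written back and invalidated.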

7.3.3

Lock

The lock operation will try to lock the line containing the address given by the parameter into the cache. If the line exists in the cache, the lock bit is set. If the line does not exist in the cache, it will be fetched from the bus and locked - even if the area is uncacheable. If any MMU exceptions happen, the operation is silently aborted. If the requested address maps to a set with four lines already locked, the operation is silently aborted.

7.3.4

Unlock

The unlock operation will try to unlock the line containing the address given by the parameter. If

the line exists in the cache and is locked, it is unlocked. Otherwise nothing is done. If any MMU

exceptions happen, the operation is silently aborted.

If the line referred to is still being fetched from the bus, the unlock operation will complete but the

line could be locked after the unlock completes. This is only possible if a lock is closely followed

by an unlock to the same line - in this case use the sync instruction to make sure any pending

locks are completed before the unlock is performed.

7.3.5

Invalidate

The invalidate operation will try to invalidate the line containing the address given by the param-

eter. If the line exists in the cache, it is marked as invalid. Otherwise nothing is done. If the line

has dirty data, these updates will be lost! If any MMU exceptions happen, the operation is

silently aborted.

If the line referred to is still being fetched from the bus, the invalidate operation will complete but

the line could become valid after the invalidate completes. This is only possible if a read or

prefetch is closely followed by an invalidate to the same line - in this case use the sync instruc-

tion to make sure any pending reads are completed before the invalidate is performed.

7.3.6

Clean

The clean operation will try to clean the line containing the address given by the parameter. If the line exists in the cache and has dirty data, it will be written to the write buffer and marked as clean. If the write buffer is full the instruction will stall. If the line is not found or the line is clean, nothing is done. If any MMU exceptions happen, the operation is silently aborted.

7.3.7

Clean & Invalidate

This operation performs a clean (as described in chapter 7.3.6) and then an invalidate (as described in chapter 7.3.5).

7.4

Prefetch instruction

The prefetch operation will try to load the line containing the address given by the parameter into

the cache. If the line exists in the cache, nothing is done. If the line does not exist in the cache, it

will be fetched from the bus - even if the area is uncacheable. If any MMU exceptions happen,

the operation is silently aborted.

7.5

Sync instructions

The sync instruction will flush the write buffer, by forcing it to write any dirty data to the bus. This can be used to ensure the data in main memory is consistent.

It will also ensure all pending read operations are completed, so e.g. invalidate and unlock operations are guaranteed to complete correctly.

7.6

Memory mapped cache memories

Both the data and tag memories of the DCache are mapped into the global address space, and can be accessed by programs or through the SAB or OCD system. This can be used for complex cache control, or simply as a very fast scratch RAM. The base address of the memory map is IMPLEMENTATION DEFINED, but will always be in an unmapped privileged segment. The size of the memory mapped area is always twice the cache size.

This mapped area is not available through the instruction cache or from peripherals, so it is not

possible to run programs from this area, or DMA data to or from it.

Note: Incorrect values written to the memory mapped cache memories may cause data corrup-

tion or unpredictable behaviour.

The DCache maps read or write requests to the cache memories by decoding the address as

shown below:

Figure 7-1. DCache direct access addressing.

[Figure: 32-bit address. Fields from bit 31 down to bit 0: Base, T/D, Way, Set, Byte.]


Table 7-3. DCache direct access address fields

Field    Size (bits)                  Description
Base     26 - log2(number of sets)    Base address of the memory mapped area. Memory mapped cache access is only enabled if this field matches the IMPLEMENTATION DEFINED base address.
T/D      1                            Access tag memories if set, data memories if cleared
Way      2                            Selects which line in a set to access.
Set      log2(number of sets)         Selects which set to access. Number of sets is (cache size / (associativity * line size))
Byte     5                            Selects which byte of the line to access.

The DCache data memories support any access that is aligned to the size of the access. The tag memories only support aligned 32-bit accesses; other accesses are undefined.

7.6.1

Format of tag memory

In the tag part of the memory mapped area the following data can be found for each line:

Table 7-4. Memory map tag layout

Word    Data
0       Address, lock and valid bits.
        The valid bit is bit 0, the lock bit is bit 1, the address bits are the upper n bits where n is 27 - log2(number of sets).
        The remaining bits read as zero, and are ignored on writes.
1       Dirty bits.
        The lower 8 bits contain one dirty bit per word of data. The word at address zero corresponds to bit zero.
        The remaining bits read as zero, and are ignored on writes.
2       Empty.
        Read as zero, and is ignored on writes.
3       Replace data.
        The lower 6 bits contain the replace data used by the cache for selecting which line to replace within the set. This data is the same for all lines in a set.
        Bits 5:4 hold the index of the most recently used line within the set.
        Bits 3:2 hold the index of the second most recently used line within the set.
        Bits 1:0 hold the index of the third most recently used line within the set.
        The least used line within the set, i.e. the one selected for replacement, is the one not listed in the above fields.
        The remaining bits read as zero, and are ignored on writes.
        Note: Setting two or more of the above fields to the same index may lead to unpredictable behaviour.
> 3     Words 0-3 are repeated over the entire line.
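The layout of words 0 and 3 can be decoded as in the sketch below. The bit positions are those stated in Table 7-4. The function names, the power-of-two assumption for the number of sets and the 128-set example are hypothetical.

```python
def decode_tag_word0(word: int, num_sets: int) -> dict:
    """Decode word 0: address in the upper n bits, lock in bit 1, valid in bit 0."""
    set_bits = num_sets.bit_length() - 1  # log2(number of sets)
    addr_bits = 27 - set_bits
    return {
        "valid": bool(word & 0x1),
        "locked": bool(word & 0x2),
        "address": (word & 0xFFFFFFFF) >> (32 - addr_bits),
    }


def replacement_way(replace_data: int) -> int:
    """Return the line selected for replacement: the one not listed in word 3."""
    recent = {
        (replace_data >> 4) & 0x3,  # most recently used
        (replace_data >> 2) & 0x3,  # second most recently used
        replace_data & 0x3,         # third most recently used
    }
    if len(recent) != 3:
        raise ValueError("the three fields must name different lines")
    (victim,) = set(range(4)) - recent
    return victim


tag = decode_tag_word0((0xABCDE << 12) | 0x3, num_sets=128)
victim = replacement_way(0b100011)  # MRU = 2, then 0, then 3
```

The check on duplicate indexes mirrors the note in the table: a replace word naming the same line twice does not identify a unique victim.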


8. Coprocessor interface

The coprocessor interface allows custom peripherals such as graphics coprocessors to be

tightly coupled to the CPU. No hazard detection is performed on coprocessor registers, so soft-

ware must schedule instructions with care so that no hazards exist.

Figure 8-1. The coprocessor pipeline.

[Figure: block diagram showing the pipeline stages ID, IS, A1, A2, D, WB and DA, and the TCB BUS.]

Command, operand and address passing to coprocessors is done via a dedicated, pipelined

bus. This bus is called the Tightly Coupled Bus (TCB), and is used by the system register inter-

face as well. Using a bus allows for easy attachment of coprocessors. The simple

synchronization between the coprocessor and the CPU allows the coprocessor clock frequency

to differ from that of the CPU.

The TCB bus is also used for transporting data to and from the external system registers.

Accesses to these registers can be performed by placing an opcode on tcb_cmd, as done for

tlbr, tlbw and tlbs.

8.1

Coprocessor pipeline

The coprocessor interface does not specify any special construction or architecture of the coprocessor pipelines. A coprocessor only needs to comply with the rules of the TCB. Special handshaking signals are implemented so that the coprocessor can stall the CPU pipeline if it is unable to reply to a request from the CPU.

8.2

TCB specification

The TCB bus is implemented as a pipelined bus in order to achieve maximum performance. Additionally, handshaking has been implemented so that slow coprocessors can insert the required number of wait states. The CPU is the single master on the TCB bus, and all coprocessors are slaves. The coprocessors will only respond to transactions issued by the master, and can never initiate a transfer.

Since the TCB is a pipelined bus, a bus transaction consists of an address phase and a data

transfer phase.


Table 8-1. TCB signals

Name: tcb_cmd[7:0]
Dir: Out
Description:

    Value      Semantic      Comment
    0          IDLE          No TCB activity
    1          write.w       CRd on tcb_cprega. Data to coprocessor on tcb_wdataa
    2          write.d       CRd+1:CRd on tcb_cprega:tcb_cpregb. Data to coprocessor on tcb_wdataa:tcb_wdatab
    3          read.w        CRs on tcb_cprega. Data from coprocessor on tcb_rdataa
    4          read.d        CRs+1:CRs on tcb_cprega:tcb_cpregb. Data from coprocessor on tcb_rdataa:tcb_rdatab
    5          mtsr          SysRegNo[3:0] on tcb_cprega. SysRegNo[7:4] on tcb_cpno. Data to system register on tcb_wdataa
    6          mfsr          SysRegNo[3:0] on tcb_cprega. SysRegNo[7:4] on tcb_cpno. Data from system register on tcb_rdataa
    7          mtdr          DebugRegNo[3:0] on tcb_cprega. DebugRegNo[7:4] on tcb_cpno. Data to debug register on tcb_wdataa
    8          mfdr          DebugRegNo[3:0] on tcb_cprega. DebugRegNo[7:4] on tcb_cpno. Data from debug register on tcb_rdataa
    9          tlbr          tcb_cpreg* and tcb_cpno not used
    10         tlbs          tcb_cpreg* and tcb_cpno not used
    11         tlbw          tcb_cpreg* and tcb_cpno not used
    128-255    COP opcode    The opcode of the COP instruction.

Name: tcb_cpno[3:0]
Dir: Out
Description: Used to address coprocessors or system register blocks. 8 different coprocessors and 16 different system register blocks are supported in AVR32 AP. Bits 2:0 are used for carrying the coprocessor number for coprocessor instructions. Bits 3:0 are used for carrying the system register block address for mt(s,d)r and mf(s,d)r, which is given by SysRegNo[7:4].


Table 8-1. TCB signals (continued)

Name: tcb_cprega[3:0], tcb_cpregb[3:0], tcb_cpregc[3:0]
Dir: Out
Description: Address bus for the three operands required by the coprocessor instructions. Also used to carry part of the system register address together with tcb_cpno.

    Operation    tcb_cprega       tcb_cpregb    tcb_cpregc
    COP          CRd              CRx           CRy
    write.w      CRd
    write.d      CRd+1            CRd
    read.w       CRs
    read.d       CRs+1            CRs
    mtsr/mtdr    SysRegNo[3:0]
    mfsr/mfdr    SysRegNo[3:0]

Name: tcb_cpuflushed
Dir: Out
Description: Asserted by the CPU if the CPU LS pipeline was flushed in the previous cycle. This typically occurs if a coprocessor memory operation caused an address exception in the D stage. The coprocessor must be informed that any data output from the data cache will be invalid so that any pending coprocessor load instructions must be flushed. The coprocessor or system register block must take appropriate action in order to ensure the correct semantic on the TCB bus.

Name: tcb_cpustalled
Dir: Out
Description: Asserted by the CPU if the CPU LS pipeline was stalled in the previous cycle. The coprocessor or system register block must take appropriate action in order to ensure the correct semantic on the TCB bus.

Name: tcb_wdataa[31:0], tcb_wdatab[31:0]
Dir: Out
Description: Data to write to coprocessor. For word transfers, tcb_wdataa contains the data to transfer, and tcb_wdatab is UNDEFINED. For doubleword transfers, tcb_wdataa contains the most significant part of the data, and tcb_wdatab contains the least significant part of the data.


Table 8-1. TCB signals (continued)

Name: tcb_ready
Dir: In
Description: Acknowledge signal from the coprocessor slaves. Indicates if the coprocessor or system register was able to process the command from the master. If not, the LS pipeline should stall until the tcb_ready signal is asserted. Each slave connected to the TCB has an individual ready signal, and all these are AND-ed together to form tcb_ready. All slaves connected to the TCB should therefore leave their ready signal HIGH AT ALL TIMES unless they really want to stall the TCB. If one or more slaves drive their ready signal low, the TCB and LS pipe is stalled.
A TCB slave may only start a stall during the address phase of the TCB operation. If tcb_ready stays asserted on the first clock edge after the operation is initiated, tcb_ready must stay asserted until the data is ready.

Name: tcb_present[7:0]
Dir: In
Description: Indicates which coprocessors are present on the TCB bus. AVR32 supports 8 coprocessors, and each of the 8 coprocessor addresses is either in use by a coprocessor or not in use. When attaching a coprocessor on the TCB bus, the system integrator must assert the corresponding bit position in the tcb_present bus to 1. The bit position of unconnected TCB addresses must be disasserted. The tcb_present bus is used by the ID stage when decoding a coprocessor instruction, in order to detect whether a coprocessor absent exception should be triggered.
The tcb_present signal is static, i.e. it should not change during execution.
If bit n is asserted, coprocessor n is present. Otherwise, no coprocessor with address n is present on the TCB bus.

Name: tcb_rdataa[31:0], tcb_rdatab[31:0]
Dir: In
Description: Data read from the coprocessor to the CPU. The supported data widths are word and doubleword. Each slave connected to the TCB has individual tcb_rdataa and tcb_rdatab outputs, and all these are AND-ed together to form the tcb_rdataa and tcb_rdatab that is input to the CPU. All slaves connected to the TCB should therefore leave their tcb_rdataa and tcb_rdatab outputs HIGH AT ALL TIMES unless they really want to perform a write to the TCB.
For word transfers, tcb_rdataa contains the data to transfer, and tcb_rdatab is UNDEFINED. For doubleword transfers, tcb_rdataa contains the most significant part of the data, and tcb_rdatab contains the least significant part of the data.

8.3

Connecting coprocessors to the TCB bus

Connecting new coprocessors to the TCB bus is simple. Just assert the bit number in tcb_present corresponding to the desired coprocessor number, and AND the tcb_ready, tcb_rdataa and tcb_rdatab signals from the coprocessor together with the same signals from the other coprocessors. The TCB signals output from the CPU must also be routed to the corresponding inputs on the coprocessor.

8.4

Execution of coprocessor instructions

All coprocessor instructions flow through the LS pipeline. The state machine in the DA stage is

responsible for correct execution of the coprocessor instructions. The coprocessor data transfer

instructions behave very similarly to their corresponding load/store instructions. The main differ-

ence is that data is written/read from the TCB bus instead of the integer register file. The timing

requirements of the TCB bus must be obeyed. This may require stalling of the LS pipeline for the

required number of clock cycles.

8.4.1

Stalling of the TCB bus and the LS pipeline

An addressed TCB slave may stall the TCB bus and the LS pipeline if it is unable to fulfill the

required TCB timing. The addressed slave can perform this stalling by outputting a logical-0

value on its tcb_ready output. Driving tcb_ready low will cause the LS pipeline

and the data cache to stall until tcb_ready is driven high again. When the CPU LS pipeline

stalls, all outputs from the CPU to the TCB bus will remain unchanged.

If the CPU LS pipeline stalls for some reason, the TCB may need to be informed if the LS stall

affects a TCB transfer. The tcb_cpustalled signal is a registered version of the stall signal in the

LS pipeline. If the CPU was stalled in the previous cycle, tcb_cpustalled is asserted. Otherwise,

tcb_cpustalled is deasserted. Since deassertion of tcb_ready will cause the LS pipeline to stall,

tcb_cpustalled will always be high the cycle following a cycle where tcb_ready was low.
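
The one-cycle delay between tcb_ready going low and tcb_cpustalled going high can be illustrated with a small cycle-by-cycle model. The sketch below is an assumption-laden simplification: it models only stalls caused by tcb_ready, assumes the pipeline is not stalled before the first cycle, and uses a made-up function name.

```c
#include <stdbool.h>
#include <stddef.h>

/* ready[i] is the combined tcb_ready value in cycle i.
   The LS pipeline stalls in any cycle where tcb_ready is low, and
   tcb_cpustalled in cycle i reports the stall state of cycle i-1. */
static void tcb_trace_cpustalled(const bool *ready, bool *cpustalled, size_t n)
{
    bool stalled_prev = false;          /* assumed initial state: not stalled */
    for (size_t i = 0; i < n; i++) {
        cpustalled[i] = stalled_prev;   /* registered, i.e. one cycle delayed */
        stalled_prev = !ready[i];       /* only TCB-induced stalls are modelled */
    }
}
```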

8.4.2

Coprocessor operation

The cop instruction issues a command in the form of an opcode and three operand addresses to

the addressed coprocessor. The coprocessor operation only uses the address phase of the bus

transaction, while the data phase is unused and all signals are considered don’t care.

If the CPU stalls immediately after issuing a coprocessor operation, a coprocessor operation

opcode will be present on tcb_cmd for multiple cycles. If the TCB slave needs to avoid the oper-

ation being executed several times, the TCB slave must qualify any coprocessor operation

opcodes with tcb_cpustalled being low.
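
A slave-side view of this qualification rule might look as follows. The command encoding is hypothetical, since the tcb_cmd values are not listed in this section; the point is only that a command held on the bus during a stall is accepted once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command encoding, for illustration only. */
enum tcb_cmd { TCB_CMD_NONE, TCB_CMD_COP };

/* Count how many times a slave executes a coprocessor operation, given
   per-cycle samples of tcb_cmd and tcb_cpustalled. A COP command is
   accepted only in a cycle where tcb_cpustalled is low. */
static unsigned cop_executions(const enum tcb_cmd *cmd,
                               const bool *cpustalled, size_t n)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; i++)
        if (cmd[i] == TCB_CMD_COP && !cpustalled[i])
            count++;
    return count;
}
```

If the same opcode stays on tcb_cmd for three cycles because of a two-cycle stall, the slave in this model executes it once rather than three times.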

8.4.3

Writing data to coprocessor

There are several instructions that transfer data to coprocessors. These instructions are: ldc.d,

ldc.w, ldcm.d, ldcm.w, mvrc, mtdr and mtsr. Mtdr and mtsr are not coprocessor instructions, but

use the TCB in a manner very similar to mvrc, and are therefore included here. Data can be transferred

to coprocessor registers in sizes of word and doubleword. The data transferred to the

coprocessor register file is either read from memory or from one of the integer registers in the

CPU.

The ldcm instructions behave similarly to the ldm and popm instructions. The hardware always

tries to transfer a doubleword from the cache in order to speed up the data transfer. This is successful

if the memory pointer is doubleword aligned. Otherwise, a word access is performed

first, then the remaining transfers are performed as doubleword accesses.

The FSM in the DA stage decomposes load of multiple registers into a sequence of load of

words and doublewords. These accesses are pipelined according to the TCB rules, allowing a

bus transfer of word or doubleword per clock cycle.
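
The decomposition described above can be sketched as a small function that lists the access sizes for a given start address and register count. The function name is illustrative, and the handling of an odd trailing word as a single word access is an assumption, since the text does not spell it out.

```c
#include <stdint.h>
#include <stddef.h>

/* Split a transfer of `words` 32-bit words starting at byte address `addr`
   (assumed word aligned) into access sizes in bytes (4 or 8).
   Returns the number of accesses written to `sizes`. */
static size_t ldcm_decompose(uint32_t addr, unsigned words,
                             uint8_t *sizes, size_t max)
{
    size_t n = 0;
    if (words > 0 && (addr & 7u) != 0 && n < max) {  /* not doubleword aligned */
        sizes[n++] = 4;                               /* leading word access */
        words--;
    }
    while (words >= 2 && n < max) {                   /* aligned doublewords */
        sizes[n++] = 8;
        words -= 2;
    }
    if (words == 1 && n < max)                        /* assumed: trailing odd word */
        sizes[n++] = 4;
    return n;
}
```

For example, five words starting at an address that is word aligned but not doubleword aligned become one word access followed by two doubleword accesses.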

If the CPU stalls after issuing a TCB write command, tcb_wdataa and tcb_wdatab will contain

the write data for the previous TCB bus cycle. This write data will be present on the bus until the

last cycle where tcb_cpustalled is high. In this cycle, the write data for the stalled TCB write command

will be present on the bus. To prevent the write data of the previous TCB command

from being written to the destination registers of the stalled (current) write,

the TCB slave must not update its destination registers when tcb_cpustalled is high.

8.4.4

Reading data from coprocessor

There are several instructions that transfer data from coprocessors. These instructions are:

stc.d, stc.w, stcm.d, stcm.w, mvcr, mfdr and mfsr. Mfdr and mfsr are not coprocessor instructions,

but use the TCB in a manner very similar to mvcr, and are therefore included here. Data can

be transferred from coprocessor registers in sizes of word and doubleword. The data transferred

from the coprocessor register file is either stored to memory or into one of the integer registers in

the CPU.

The stcm instructions behave similarly to the stm and pushm instructions. The hardware always

tries to transfer a doubleword to the cache in order to speed up the data transfer. This is successful

if the memory pointer is doubleword aligned. Otherwise, a word access is performed first,

then the remaining transfers are performed as doubleword accesses.

The FSM in the DA stage decomposes store of multiple registers into a sequence of store of

words and doublewords. These accesses are pipelined according to the TCB rules, allowing a

bus transfer of word or doubleword per clock cycle.

A TCB slave must take special action if it was read from in the previous cycle, and

tcb_cpustalled gets asserted. In this case, the slave must continue to output the data values it

put on tcb_rdataa and tcb_rdatab in the previous cycle for as long as tcb_cpustalled is asserted,

even though a new command is present on tcb_cmd.

8.5

Timing diagrams

8.5.1

Coprocessor operation

Figure 8-2. COP bus timing.

[Timing diagram not reproduced. Signals shown: clock, tcb_cmd, tcb_cpno, tcb_cprega, tcb_cpregb, tcb_cpregc, tcb_cpustalled, tcb_wdataa, tcb_wdatab, tcb_ready, tcb_rdataa, tcb_rdatab. Three commands are shown (COMMAND A, B and C), each with its coprocessor number (COP #) and its CRd, CRx and CRy register addresses.]

8.5.2

Writes to coprocessor register file

Figure 8-3. Write to CP timing.

[Timing diagram not reproduced. Signals shown: clock, tcb_cmd, tcb_cpno, tcb_cprega, tcb_cpregb, tcb_cpregc, tcb_cpustalled, tcb_wdataa, tcb_wdatab, tcb_ready, tcb_rdataa, tcb_rdatab. Three commands are shown: WRITEW A (COP #, CRd, Data), WRITED B and WRITED C (COP #, CRd+1, CRd, DataMSP, DataLSP).]

8.5.2.1

Reads from coprocessor register file

Figure 8-4. Read from CP timing.

[Timing diagram not reproduced. Signals shown: clock, tcb_cmd, tcb_cpno, tcb_cprega, tcb_cpregb, tcb_cpregc, tcb_cpustalled, tcb_wdataa, tcb_wdatab, tcb_ready, tcb_rdataa, tcb_rdatab. Three commands are shown: READW A (COP #, CRd, Data), READD B and READD C (COP #, CRd+1, CRd, DataMSP, DataLSP).]

9. OCD system

9.1

Overview

The AVR32 CPU is targeted at a wide range of 32-bit applications. The CPU can be delivered in

very different implementations in various ASICs, ASSPs, and standard parts to satisfy requirements

for low-cost as well as high-speed markets. According to the cost sensitivity and

complexity of these applications, a similar span in debug complexity must be expected. While

some users expect very simple debug features, or none at all, others will demand full-speed

trace and RTOS debug support. This also applies to the debug tools: While the simplest devel-

opment takes place on simulators and development boards, most will require basic on-chip

debug emulators, and a few will require complex emulators with full-speed trace.

To match these criteria, the AVR32 AP OCD system is designed in accordance with the Nexus

2.0 standard (IEEE-ISTO 5001™-2003), which is a highly flexible and powerful open on-chip

debug standard for 32-bit microcontrollers.

9.1.1

Features

• Nexus compliant debug solution

• OCD supports any CPU speed

• Execute debug specific CPU instructions (debug code) from program memory monitor or

external debugger

• Debug code can read and write all registers and data memory

• Debug code can communicate with debugger through the debug port

• Debug mode can be entered by external command, breakpoint instruction, or hardware

breakpoints

• Six program counter hardware breakpoints are supported

• Two data breakpoints are supported

• Breakpoints can be configured as watchpoints (flagged to the external debugger)

• Hardware breakpoints can be combined to give break on ranges

• Real-time program counter branch tracing

• Real-time data trace

• Real-time read/write access to data memory and data cache

• Real-time process trace

• ASID-specific breakpoints

9.1.2

OCD controller overview

The OCD system provides the external debugger with access to the on-chip debug

logic through the JTAG port and the Auxiliary (AUX) port, as shown in Figure 9-1. The operation

is described briefly below and in more detail in separate chapters.

9.1.2.1

Host, debugger, and emulator

At the host side, the user debugs the software using a source level debugger, which can read the

compiled and linked object code. The source level debugger accesses features in the emulator

and OCD system through an API (defined by the vendor or based on the Nexus recommenda-

tions), which constitutes the abstract interface between the source level debugger and the

emulator. The API translates high-level functions, such as setting breakpoints or reading mem-

ory areas, to sets of low level commands understood by the OCD controller. Certain operations

(such as reading the register file) may require running sections of debug code on the CPU,

which can also be handled in this level. The emulator translates the communication from the

host into commands transmitted to the target over the JTAG port. If trace is enabled, trace mes-

sages are transmitted from the device on the Nexus-defined auxiliary (AUX) port. The AUX port

can be scaled to the number of output pins needed to sustain the estimated bandwidth require-

ment. The Nexus protocol defines the format of the messages and signals, the pin count options

and pinout of the debug port, and the type of connector used.

Figure 9-1. Block diagram of the OCD system (shaded) and its main connections.

[Block diagram not reproduced. Blocks shown: Host; Debugger; AUX Port; JTAG Port; OCD system with TAP, NanoTrace, Service Access Port (SAP), Service Access Bus (SAB), Transmit Queue (Watchpoint, Data Trace, Debug Status, Branch Trace and Ownership Trace messages), CPU observation units (Message Trigger, Data Trace, Program Trace, Ownership Trace), Flow Control Unit, Memory Interface Unit, and Breakpoint Unit (PC Comparators, Data Comparators); AVR32 AP CPU with I-cache, D-cache, MMU, Bus Masters and Bus Arbiter; High Speed Bus. Connections shown between the CPU and the OCD system: CPU observation (PC, ASID, Data Adr, Data bus), OCD control signals, Debug inst, and Co-processor bus signals.]

9.1.2.2

Accessing the debug features

A number of blocks handle the various debug functions specified by the Nexus standard. The

emulator communicates with registers in these blocks by commands on the JTAG port, as spec-

ified by the Nexus standard. OCD registers are typically used for configuration, control, and

status information. Trace information and debug events can also generate messages to be

transmitted on the AUX port.

Registers are indexed and are accessed through Read Register and Write Register messages

from the emulator. Alternatively, they can be accessed by the CPU through mtdr and mfdr

instructions, which gives a debug monitor in the CPU access to most of the debug features in

the OCD system, as described in “OCD Register Access” on page 98.

9.1.2.3

Transmit Queue

Trace and watchpoint messages are inserted into the Transmit Queue (TXQ) before being transmitted

on the AUX port. This provides some flexibility between the peak rate of trace message

generation and the average rate of message transmission on the AUX port.

9.1.2.4

Flow Control Unit

The Flow Control Unit (FCU) can bring the CPU into and out of Debug Mode, and control the

CPU operation in Debug Mode. The behavior is controlled by accessing OCD registers.

Debug Mode can be configured as OCD Mode or Monitor Mode. In OCD Mode, the CPU

fetches instructions from the Debug Instruction Register. If the register is empty, the CPU is

halted. In Monitor Mode, the CPU fetches debug instructions from a monitor code in the program

memory, and the Debug Instruction Register is not used.

The FCU also handles single stepping by returning the CPU to normal mode, letting the CPU

fetch one instruction from the program memory, and then returning to Debug Mode on the fol-

lowing instruction.

9.1.2.5

Breakpoint modules

A number of instruction and data breakpoint modules can be configured for run-time monitoring

of the instruction fetches and data accesses by the CPU. The modules can report if the monitored

operation matches a predefined address and, optionally, a data value. The modules

operate on virtual addresses.

A breakpoint will bring the CPU into Debug Mode. Watchpoints are reported to the debugger,

but do not affect CPU operation. A watchpoint can also be configured to start or stop data and

program trace.

The breakpoint modules can be combined to produce a watchpoint or breakpoint. Complex

breakpoint/watchpoint conditions are supported, e.g. trigger when a specific procedure writes a

certain variable with a specific value.
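
The example condition above (a specific procedure writing a certain variable with a specific value) can be expressed as the conjunction of a PC range match and a data address/value match. The following is a software model only; the structure and field names are hypothetical and do not correspond to OCD registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* One observed data access, as seen by the comparators (virtual addresses). */
struct access {
    uint32_t pc;      /* address of the instruction performing the access */
    uint32_t addr;    /* data address */
    uint32_t value;   /* data value */
    bool is_write;
};

/* Hypothetical combined condition: PC range AND data address AND data value. */
struct combined_bp {
    uint32_t pc_low, pc_high;   /* inclusive range covering the procedure */
    uint32_t data_addr;
    uint32_t data_value;
};

static bool combined_bp_match(const struct combined_bp *bp, const struct access *a)
{
    /* Two PC comparators used together as a range. */
    bool pc_hit = a->pc >= bp->pc_low && a->pc <= bp->pc_high;
    /* Data comparator: write access to the variable with the given value. */
    bool data_hit = a->is_write && a->addr == bp->data_addr
                    && a->value == bp->data_value;
    return pc_hit && data_hit;
}
```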

9.1.2.6

Program and Data Trace

The Program Trace Unit sends Branch Trace Messages to the debugger, which allows the program

flow to be reconstructed. To keep the amount of debug information low and save bandwidth,

only changes of program flow are reported (such as unconditional branches, taken conditional

branches, interrupts, exceptions, return operations, and load operations with PC as destination),

hence the term "branch tracing". Messages are typically relative to the previously transmitted

message, to be able to compress information as much as possible. The trace messages

are therefore sent out in temporal order, and synchronization messages with uncompressed,

absolute addresses are transmitted regularly in case synchronization is lost.

The Data Trace Unit similarly traces data accesses, for read or write accesses, or both. Similar

relative address compression and synchronization schemes are used for Data Trace Messages.
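
To make the idea of relative addressing concrete, the sketch below assumes an XOR-based scheme of the kind commonly associated with Nexus-style trace: the transmitted value is the XOR of the new and the previous address, so only the low-order bits that changed need to be sent. The exact encoding used by the AVR32 AP OCD system is not given in this section, and all function names are illustrative.

```c
#include <stdint.h>

/* Relative address as the XOR of the previous and the new address (assumed scheme). */
static uint32_t trace_rel_encode(uint32_t prev, uint32_t addr)
{
    return prev ^ addr;
}

/* The receiver recovers the address by XOR-ing with the same previous address. */
static uint32_t trace_rel_decode(uint32_t prev, uint32_t rel)
{
    return prev ^ rel;
}

/* Number of significant bits that would need to be transmitted. */
static unsigned trace_rel_bits(uint32_t rel)
{
    unsigned bits = 0;
    while (rel != 0) {
        bits++;
        rel >>= 1;
    }
    return bits;
}
```

Because decoding depends on the previous address, a lost message breaks every later address until an absolute address is received, which is why synchronization messages are needed.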

Since new trace messages can be generated before the previous ones have been transmitted,

all trace messages are queued before being transmitted by the AUX interface. If the queue over-

flows, the CPU can be halted to avoid losing trace information, or an error message followed by

synchronization trace messages will be transmitted.
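
The two overflow policies can be illustrated with a small bounded queue. This is a software analogy rather than a description of the hardware: the queue depth, type names and return codes are all chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TXQ_DEPTH 4u   /* hypothetical depth, chosen for illustration */

enum txq_result { TXQ_OK, TXQ_HALT_CPU, TXQ_OVERRUN };

struct txq {
    uint32_t msg[TXQ_DEPTH];
    unsigned head, count;
    bool halt_on_full;   /* policy: halt the CPU instead of losing messages */
    bool need_sync;      /* set after an overrun: resynchronize before continuing */
};

static enum txq_result txq_push(struct txq *q, uint32_t m)
{
    if (q->count == TXQ_DEPTH) {
        if (q->halt_on_full)
            return TXQ_HALT_CPU;   /* caller retries once the queue has drained */
        q->need_sync = true;       /* message lost: error, then sync messages */
        return TXQ_OVERRUN;
    }
    q->msg[(q->head + q->count) % TXQ_DEPTH] = m;
    q->count++;
    return TXQ_OK;
}

static bool txq_pop(struct txq *q, uint32_t *m)
{
    if (q->count == 0)
        return false;
    *m = q->msg[q->head];
    q->head = (q->head + 1) % TXQ_DEPTH;
    q->count--;
    return true;
}
```

Halting preserves every message at the cost of intrusiveness, while the overrun policy keeps the CPU running and accepts a gap in the trace.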

9.1.2.7

RTOS debug support

Applications developed on an RTOS platform place special requirements on the OCD controller

and the debug software. For high-level debugging, the user will want to see which process is

running at any time, without having to interrupt the CPU or trace the program flow. This is

accomplished through Ownership Trace Messaging, in which the process ID of the running pro-

cess is reported at every process switch. The CPU writes the process ID to an OCD register in

the Ownership Trace Unit, which in turn generates an Ownership Trace Message.

9.1.2.8

Timestamps

The emulator can tag events with a timestamp when they are extracted from the OCD system

and transmitted to the emulator, to provide timing information for these events when they are

transmitted to the debug host. However, due to the delay of the transmit queue and transmit time

over the AUX port, this timing will have limited accuracy. To compensate for this, the EVTO pin

can be configured to toggle every time a message is inserted into the Transmit Queue, thus indi-

cating very precisely when each event occurs. The emulator would then store a queue of

timestamp tags with each event, and associate each tag with the corresponding message, as

they are extracted on the AUX port.

9.1.2.9

Real-time memory access

Real-time block transfers of data to or from system memory are also possible through the Memory

Interface Unit (MIU). The tool initiates these transfers by writing to OCD registers in the MIU.

Unlike the comparator units, the MIU operates on physical addresses, since no interference with

the operating system can be expected. This means that the debug software must perform the

translation between the virtual and physical address map before accessing the memory. This

mapping is typically specified through page tables located in a privileged, unmapped area of the

RAM, and can be read out by the debugger to calculate the physical address. Since the location

and format of the page table is OS specific, the debugger must be "OS aware" to employ this

feature.
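
As an illustration of the translation step the debugger has to perform, the sketch below looks up a virtual address in a page table that has already been read out of target memory. The page size, entry layout and names are entirely hypothetical, since the real format is defined by the operating system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* hypothetical 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Hypothetical page table entry as the debugger might hold it on the host. */
struct pte {
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
    bool valid;
};

/* Translate a virtual address using a table already read from the target. */
static bool virt_to_phys(const struct pte *table, size_t n,
                         uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (table[i].valid && table[i].vpn == vpn) {
            *paddr = (table[i].pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
            return true;
        }
    }
    return false;   /* unmapped: no physical address to hand to the MIU */
}
```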

The CPU can also use the MIU to perform an efficient transfer of data from user memory to the

tool, without a prior read request from the tool.

9.1.2.10

Java debug features

AVR32 AP has native support for Java bytecode programs, executing on a Java Virtual Machine

(JVM) platform. The OCD features mentioned above are also available in Java mode, enabling

the same debug support for Java programs as for C/C++/assembly programs. The JVM will

implement a debug protocol which the debugger can use to extract key information about tasks

and objects in the execution environment. Alternatively, if the format of the data structures cre-

ated by the JVM is known by the debugger, the debugger can read out all JVM and task

information by block read commands.

9.2

CPU Development Support

The OCD system can bring the CPU into and out of Debug Mode, and control the CPU operation in

Debug Mode. The behavior is controlled by OCD register configuration, stop commands from

the debugger, or breakpoints. The OCD registers can be accessed by Nexus messages or from

the CPU as memory-mapped registers.

9.2.1

Debug Mode

Debug Mode is an execution mode dedicated to application debugging and is not intended for

running application code. Debug Mode can execute a debug code either from an external

debugger through the OCD system (OCD Mode), or from a debug routine in program memory

(Monitor Mode). The debug code will typically read out system registers and information about

the various processes running in the system before restarting.

The Nexus class 3 compliant OCD system contains breakpoint and trace modules, and other

features for debugging code on the CPU. These features are generally accessible both in OCD

Mode and Monitor Mode. In OCD Mode, the debugger accesses the features through messages

over the AUX debug port, and in Monitor Mode, the CPU accesses the features through mtdr

and mfdr instructions. The OCD system runs at system speed to stay synchronous with the CPU

at all times. If the CPU is in a low-power sleep mode, it is woken up before entering Debug

Mode.

9.2.1.1

Operations in Debug Mode

Debug Mode is characterized by the Debug (D) bit in the Status Register (SR) in the CPU.

Debug Mode is a privileged mode, and all legal instructions and memory operations are permitted.

Illegal opcodes or memory operations which would normally cause an exception will be

ignored in Debug Mode.

The Debug Mode has a dedicated Return Address and Return Status Register (RAR_DBG and

RSR_DBG, respectively) but no other masked registers. RAR_DBG and RSR_DBG are not

observable as part of the register file, only as system registers. The register file view is mapped

according to the mode bits in the Status Register (M[2:0]). These bits are set to the exception

context when entering Debug Mode, but can be changed freely within Debug Mode by writing to

SR. In this way, different register contexts can be observed and modified, while maintaining the

execution and access privileges of Debug Mode.

Debug Mode is exited by the retd instruction, both in Monitor Mode and OCD Mode. This

restores PC from RAR_DBG and SR from RSR_DBG.

9.2.1.2

A typical debug session flow

Figure 9-2 shows an example of a typical flow in Debug Mode. A software or hardware break-

point aborts the execution of an instruction and causes Debug Mode to be entered. If the Monitor

Mode (MM) bit in the Development Control (DC) OCD register is set, Monitor Mode is entered,

and the CPU jumps to the software debug monitor starting at EVBA+0x01C. Otherwise, OCD

Mode is entered, and the CPU stalls while waiting for instructions to be entered by the external

debugger through the Debug Instruction (DINST) OCD register. In either case, the D bit in the

CPU Status Register is set during the whole debug session, giving access to all privileged oper-

ations. Any number of instructions can be executed before returning to the breakpointed

instruction by the retd instruction. RAR_DBG stores the address of the breakpointed instruction,

and manipulating RAR_DBG in Debug Mode is useful if a different return address is desired (for

instance, to avoid repeated hits on a breakpoint instruction).

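Figure 9-2. Example of flow in Debug Mode.

[Flow diagram not reproduced. User code reaches a breakpointed instruction (return address labeled LR_DBG in the figure) and enters Debug Mode. The DC:MM bit selects the path: 0 = OCD Mode, where the external debugger supplies instructions to DINST through Write Register commands; 1 = Monitor Mode, where the software debug monitor runs (labeled EVBA+0x300 in the figure). SR:D = 1 in both paths, and both return to user code with retd.]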

9.2.2

Monitor Mode

If the Monitor Mode (MM) bit in the Development Control register (DC) is set, the CPU will enter

Debug Mode in Monitor Mode. Instructions are fetched from the monitor code located in the pro-

gram memory at the Exception Vector Base Address (EVBA) + 0x01C. The monitor code

contains the necessary mechanisms to read and modify CPU and system registers, and memory

areas. All other exceptions and interrupts are masked by default when entering Monitor Mode,

but the monitor code can explicitly unmask interrupts to allow critical interrupts to be serviced

while the system is being debugged.

The monitor code will typically communicate with an external debug tool, or (in cases of

advanced systems like PDAs) a debug tool running within the application (self-hosted debugger).

Communication with the external tool may take place over any communication link present

in that device (e.g. USB, RS232), if such a communication line can be reserved for debug

purposes.

Alternatively, the Debug Communication Mechanism in the OCD system can be used to commu-

nicate between the CPU and emulator over the JTAG port. This is a set of OCD registers which

can be written by the CPU or emulator, allowing a communication protocol to be developed in

software. This mechanism can be used in any privileged CPU mode, including OCD Mode.

Monitor Mode is exited with the retd instruction.

9.2.2.1

Debugging a monitor code

Each execution mode has a mask bit in SR, which indicates if a request to enter that mode will

be taken or masked. The default priority of modes is reflected in these bits: When entering an

execution mode, modes of the same or lower priority are masked. Privileged modes can over-

ride the mask, to dynamically change priorities (e.g. to allow critical interrupts to be serviced).

By default, Debug Mode has priority above all other execution modes. This implies that any

supervisor or user code can be interrupted by Debug Mode. Other modes can be explicitly

unmasked by a monitor code to allow critical interrupts to be serviced. By default, Debug Mode

is masked by the Debug Mask (DM) bit in SR when executing in Monitor Mode. The Monitor

Mode can stack away the RAR_DBG and RSR_DBG and then explicitly clear the DM bit to

enable Debug Mode to be re-entered. If a debug exception occurs in Monitor Mode, the OCD

system will bring the CPU into OCD Mode, even if the MM bit is set. This allows Monitor Mode

programs to be debugged.

9.2.3

OCD Mode

If the Monitor Mode (MM) bit in the Development Control register (DC) is cleared, the CPU will

enter Debug Mode in OCD Mode. When the CPU is in OCD Mode, the Debug Status (DBS) bit

in the Development Status (DS) register is set, in addition to the D bit in SR in the CPU. OCD

Mode is similar to Monitor Mode, except that instructions are fetched from the OCD system.

OCD instructions are loaded by the debug tool by writing the opcode to the Debug Instruction

register (DINST). Once an instruction is written to DINST, the CPU will fetch it, and the Instruc-

tion Complete bit in DS (DS:INC) will be cleared until the CPU has completed the operation. The

CPU is then halted until DINST is written again.

The first instruction entered must be aligned to the MSB of DINST. A sequence of instructions

can be entered to DINST one word at a time, in the same sequence they would appear in pro-

gram memory, i.e. they do not need to be word aligned. If the upper halfword of an extended

instruction is written to the lower halfword of DINST, the lower halfword of the instruction is writ-

ten as the upper halfword of DINST in the next access. If the last instruction in a sequence is

written to the upper halfword of DINST, the lower halfword should be written with a nop opcode.

See Figure 9-3 for an illustration of a sequence of operations used to execute instructions in

OCD Mode.
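
The packing rule for DINST can be sketched as a helper on the debug tool side that takes the instruction stream as 16-bit halfwords and produces the 32-bit words to write. The function name is illustrative, and the nop opcode value 0xD703 is an assumption that should be checked against the instruction set documentation.

```c
#include <stddef.h>
#include <stdint.h>

#define AVR32_NOP 0xD703u   /* assumed 16-bit nop opcode; verify before use */

/* Pack a stream of 16-bit halfwords into 32-bit DINST words, with the first
   halfword in the upper half of the first word. An odd trailing halfword is
   padded with a nop in the lower half. Returns the number of words written,
   or 0 if `max` is too small (or the stream is empty). */
static size_t dinst_pack(const uint16_t *half, size_t n,
                         uint32_t *words, size_t max)
{
    size_t needed = (n + 1) / 2;
    if (needed > max)
        return 0;
    for (size_t i = 0; i < needed; i++) {
        uint16_t hi = half[2 * i];
        uint16_t lo = (2 * i + 1 < n) ? half[2 * i + 1] : AVR32_NOP;
        words[i] = ((uint32_t)hi << 16) | lo;
    }
    return needed;
}
```

Feeding it the halfword stream 0x0E9C, 0x201C, 0x1896, 0xF807, 0x0046, 0xD623 yields the three DINST writes 0x0E9C201C, 0x1896F807 and 0x0046D623 shown in Figure 9-3, with the extended adc instruction split across the second and third writes.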

Any instruction valid in Monitor Mode is also valid in OCD Mode. Memory operations can be con-

ducted without any special synchronization with external hardware.

All OCD units can be configured while the CPU executes in OCD Mode, but the following debug

features are disabled:

• PC breakpoints

• Data breakpoints

• Watchpoints

• Program Trace

• Data Trace

• Nano Trace

OCD Mode is exited by writing the retd instruction to DINST.

Figure 9-3. Executing instructions on the CPU in OCD Mode.

Instructions      Opcode         Written by OCD tool to DINST    Changes in DS
mov r12,r7        0x0E9C         0x0E9C201C                      INC→0→1
sub r12,0x01      0x201C
mov r6,r12        0x1896         0x1896F807                      INC→0→1
adc r6,r12,r7     0xF807 0046    0x0046D623                      DBS→0
retd              0xD623

9.2.4

Entry into Debug Mode

Debug Mode can only be entered when the OCD is enabled, and Debug Mode is not masked.

The following ways of entry are then possible:

• Debug request from the debugger

• Program counter breakpoint

• Data address or value breakpoint

• breakpoint instruction

• Trapping opcode 0x0000

• Single step

• Hardware error

• Event on EVTI pin

• NanoTrace buffer full

• Abort command from the debugger

The debugger can identify the condition which caused entry into Debug Mode by examining the

status bits in the Development Status register (DS). Each cause of entry has a particular bit

associated with it. Several exceptions can trigger simultaneously, causing more than one bit to

be set.

Note that any privileged CPU mode may write the SR:D bit to one directly, but this will not cause

entry to Debug Mode.

9.2.4.1

Debug request

The debugger may want to stop CPU operation, unrelated to current instruction execution, e.g. if

the user presses a "STOP" button in the debug tool GUI. The debugger will then write the Debug

Request (DBR) bit in the Development Control Register (DC). This causes the CPU to enter

Debug Mode on the next instruction to be executed, before execution.

9.2.4.2

Program counter breakpoint

The Program Counter breakpoints can be configured to halt the CPU when executing code at a

specific address, or address range. This will cause the CPU to be halted before the break-

pointed instruction is executed.

The Ignore First Match (IFM) bit in the Development Control (DC) register should be written to

one before exiting Debug Mode, to avoid re-triggering the program breakpoint. This bit only pre-

vents program breakpoints from re-triggering. If the instruction causes a breakpoint for another

reason (e.g. a breakpoint instruction or a data breakpoint), Debug Mode will be re-entered.

9.2.4.3

Data address or value breakpoint

CPU memory accesses can be monitored by data breakpoint comparators in the OCD system. If

the access matches a set of predefined conditions (e.g. address, value, or access type), Debug

Mode is entered after the memory operation completes, but before the next instruction is

executed.

Data breakpoints are precise, halting on the instruction immediately after the memory operation

which caused the breakpoint. The CPU will return to the first non-executed instruction when a

retd is executed.

9.2.4.4

breakpoint instruction

The breakpoint instruction is programmed along with the object code into the program memory

or instruction cache, and is decoded by the CPU. When this instruction is scheduled for execu-

tion and Debug Mode is enabled, the CPU will enter Debug Mode. If Debug Mode is disabled

(e.g. masked by the DM bit in the Status Register, or DBE in DC is zero), the breakpoint instruc-

tion will execute as a nop (no operation).

For devices based on volatile program memory, the breakpoint instruction can be dynamically

inserted into the code by the debug tool, enabling an unlimited number of program breakpoints

in the code. This involves replacing an existing opcode with a breakpoint instruction. The

replaced opcode has to be re-inserted before exiting Debug Mode. Note that this is only possible

in OCD Mode.

For devices based on non-volatile program memory, the breakpoint instruction can be statically

compiled or linked into the code before downloading, marking all points the program can be

halted. Debug Mode will be entered for all breakpoints (if Debug Mode is enabled), and the

debugger would return immediately if it does not want to halt at a particular breakpoint location in

the code.

Alternatively, the Instruction Cache memory can be directly written by the debugger through the

JTAG port. The page containing the software breakpoint can be programmed into a cache page

and locked, to prevent it from being flushed. Every time the CPU executes the breakpointed sec-

tion of the code, it fetches these instructions from the cache instead of the program memory.

This method can only be used to insert software breakpoints in cacheable regions of the mem-

ory space, as defined by the Memory Management Unit.

The breakpoint will be taken before the breakpoint instruction is actually executed. This has the

effect that the CPU will return from Debug Mode to the same breakpoint instruction, re-entering

Debug Mode immediately, unless the OCD system is configured to modify the return address or

replace the breakpoint instruction in the instruction flow. The IFM bit does not have an effect

when Debug Mode returns to a breakpoint instruction.

9.2.4.5

Trapping opcode 0x0000

In Flash-based microcontrollers, the opcode 0x0000 can overwrite any other opcode without

having to erase and reprogram the Flash. Therefore this instruction can enter Debug Mode, as

for the breakpoint instruction. However, the opcode 0x0000 is also a valid part of the instruction

set (ADD R0,R0 in AVR32) and can be part of the software to be debugged. Therefore, the user

must write the DC:TOZ (Trap Opcode Zero) bit to one to enable this feature.

The DS:BOZ bit will be set if Debug Mode is entered due to a trapped 0x0000 instruction. The

debugger must then identify whether this opcode belongs to the original object file or has been

inserted by the debugger as a software breakpoint. If it was part of the object file, the debugger

should use the Instruction Replacement to return to the program, and insert the 0x0000 opcode

in DINST.
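
Two points from this section lend themselves to a short sketch: why 0x0000 can overwrite any opcode, and how a debugger can tell its own breakpoints from a genuine 0x0000 opcode. The first function assumes the usual flash property that programming without erase can only clear bits, which this manual does not describe; the second assumes the debugger keeps a list of the addresses it has patched. Both names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flash behaviour: programming without an erase can only clear
   bits (1 -> 0), so the stored result is the AND of old and new values. */
static uint16_t flash_program_halfword(uint16_t current, uint16_t value)
{
    return current & value;
}

/* Debugger-side check: was the trapped 0x0000 at `pc` inserted as a
   software breakpoint, or is it part of the original object file? */
static bool is_inserted_breakpoint(const uint32_t *bp_addrs, size_t n, uint32_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (bp_addrs[i] == pc)
            return true;
    return false;
}
```

Under the bit-clearing assumption, writing 0x0000 always produces 0x0000 regardless of the previous contents, whereas writing any other opcode over existing code generally does not produce that opcode.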

9.2.4.6

Single stepping

The debugger will typically allow the user to step through the application source or object code,

line by line. This single stepping can be either of step-into or step-over type. Step-into will exe-

cute exactly one instruction and halt the CPU at the start of the next instruction, regardless of

whether this instruction is part of the main program, subroutine, interrupt, or exception. Step-

over will execute the current instruction and any lower-level events generated before the follow-

ing instruction (including subroutines, interrupts, and exceptions).

Step-over in the object code and all single stepping in the source code are implemented by con-

figuring a program breakpoint on the address of the next object code instruction where the

debugger expects to halt.

Step-into is implemented in OCD hardware and is controlled by the Single Step (SS) bit in the

Development Control register. When Debug Mode is exited by retd, exactly one instruction from

the program memory will be executed before Debug Mode is re-entered. This mechanism works

identically for OCD and Monitor Mode.

9.2.4.7

Hardware Error

The CPU might encounter problems which cannot be handled in software. This includes access-

ing a memory area reserved for NanoTrace. These types of errors should never occur in a

correctly written application, and will normally trigger a soft reset.

To ease debugging of these types of errors, the debugger can write the DC:TSR (Trap Soft

Reset) bit to one. The CPU will then enter Debug Mode if a soft reset occurs. This includes any

kind of soft reset in the device, such as watchdog reset. The Hardware Error bit (HWE) in the

Development Status register will be set to indicate that a trapped soft reset caused entry to OCD

mode.

Note that if OCD mode is disabled (including when Monitor Mode is enabled), the soft reset allows the software to restart in a defined manner.

Since the causes of a soft reset may corrupt CPU execution, RAR_DBG and RSR_DBG are undefined when Debug Mode is entered due to a hardware error.

9.2.4.8

Event on EVTI pin

If the Event In Control (EIC) bits in DC are written to 0b01, a high-to-low transition on the EVTI

pin will generate a breakpoint. EVTI must stay low for one CPU clock cycle to guarantee that the

breakpoint will trigger. The External Breakpoint (EXB) bit in DS will be set when a breakpoint is

entered due to an event on the EVTI pin.

9.2.4.9

NanoTrace buffer full

When using NanoTrace to write trace information to memory, the user can configure a break-

point when the buffer becomes full. This will set the NanoTrace Buffer Full (NTBF) bit in DS.

RAR_DBG will point to the last non-executed instruction.

9.2.4.10

Abort command

Some software errors could cause the CPU to get stuck in a state which does not allow Debug

Mode to be entered through the mechanisms described above. An example is if a privileged

mode writes SR:DM to one, without clearing the bit.

To prevent the debugger from hanging indefinitely, the debugger can write the DC:ABORT bit to

one after some timeout period, and force the CPU to enter Debug Mode. The abort command

behaves identically to a debug request, except that the DM bit and any pending exception will be

ignored, regardless of exception priority. The RAR_DBG and RSR_DBG will reflect the last non-

executed instruction, which can aid in locating the error.

If Debug Mode is entered due to an abort command, DS:DBA will be set, as for debug requests.
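The timeout-then-abort behavior can be sketched as below. This is only an illustration under stated assumptions: the `probe` object and its `set_dc_bits` and `in_debug_mode` methods are hypothetical, as are the timeout values. The bit positions of DBR and ABORT are taken from Table 9-10, and DC:DBE is assumed to be set already.

```python
import time

DC_DBR = 1 << 12    # Debug Request, bit 12 of DC (Table 9-10)
DC_ABORT = 1 << 31  # ABORT, bit 31 of DC (Table 9-10)


def halt_cpu(probe, timeout_s=0.5, poll_s=0.01):
    """Request Debug Mode; fall back to the abort command after a timeout.

    `probe` is a hypothetical object offering set_dc_bits(mask) and
    in_debug_mode(); neither name comes from the manual.
    """
    probe.set_dc_bits(DC_DBR)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe.in_debug_mode():
            return "debug request"
        time.sleep(poll_s)
    # The CPU did not respond: ABORT ignores SR:DM and pending exceptions.
    probe.set_dc_bits(DC_ABORT)
    return "abort"


class StuckCpuProbe:
    """Stand-in probe for illustration: the CPU only reacts to ABORT."""

    def __init__(self):
        self.written = []

    def set_dc_bits(self, mask):
        self.written.append(mask)

    def in_debug_mode(self):
        return DC_ABORT in self.written
```

Because DS:DBA is set for both debug requests and abort commands, the return value is the tool's only record of which mechanism halted the CPU.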

9.2.5

Exceptions and Debug Mode

Debug Mode has priority over any execution mode, so that breakpoints can be set in exception

and interrupt routines. However, if a breakpoint is set on an instruction which triggers a critical

exception, the breakpoint is flushed. Critical exceptions are exceptions which are asynchronous

to the CPU (interrupts), exceptions which invalidate the currently fetched instruction (e.g.

instruction address exceptions), and exceptions which indicate that the system has become

unstable and should abort the program flow (e.g. bus error). The complete list of exceptions with

higher priority than Debug Mode is given in the exception chapter of the AVR32 Architecture

Manual.

If a PC breakpoint, a breakpoint instruction, or a trapped 0x0000 opcode is flushed by an excep-

tion, Debug Mode will not be entered. If another type of breakpoint has triggered, Debug Mode

will be entered on the first instruction in the exception handler.

In the rare cases where the first instruction in a critical exception also triggers a critical exception

(e.g. if EVBA is set incorrectly, triggering an infinite loop of instruction address exceptions), the

debugger must write the DC:ABORT bit to one to halt the CPU and enter Debug Mode to identify

the error.

9.2.6

Instruction replacement

A convenient way of implementing an unlimited number of instruction breakpoints is letting the

debugger replace an instruction by a breakpoint instruction. This mechanism is only available in

OCD Mode on devices implemented with writeable program memory or writeable instruction

cache. If this instruction executes, Debug Mode will be entered, and the debugger identifies the

breakpointed location. When returning, the breakpoint instruction must be replaced by the origi-

nal instruction. The debugger will write the Instruction Replace (IRP) bit in DC and the

appropriate instruction in the Debug Instruction Register and its corresponding PC value in the

Debug Program Counter (DPC). When retd is executed, PC and SR are restored, but one more

instruction is fetched from the OCD system before returning to fetching from program memory.

Note that instruction replacement operates on word boundaries. The debugger must store the

whole word containing the replaced opcode before inserting the breakpoint instruction. Also note

that DPC should always be written when performing an instruction replacement to ensure the

correct instruction is executed.

The debugger will then perform the following sequence when exiting OCD Mode. Note that

RAR_DBG is accessed through executing CPU instructions through the Debug Instruction regis-

ter (DINST). The same sequence can be used both for compact and extended instructions,

regardless of whether the extended instruction is unaligned (in which case only the upper halfword of the

instruction is replaced).

1. Write RAR_DBG to the Debug Program Counter.

2. Increment RAR_DBG by 2 or 4, so the register points to the start of the next word in the

program memory.

3. Write 1 to Instruction Replace (IRP) in DC.

4. Write a retd instruction to DINST. The CPU will exit Debug Mode and stall while waiting

for new instructions.

5. Write the stored word to DINST. This instruction is fetched by the CPU, and the CPU

continues normal program execution.
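The five steps above can be sketched as a host-side routine. This is a sketch under assumptions, not a definitive implementation: the `probe` interface (`read_rar_dbg`, `write_rar_dbg`, `write_dpc`, `set_dc_bits`, `write_dinst`) is hypothetical, and in a real tool the RAR_DBG accesses would themselves be CPU instructions fed through DINST. The retd opcode is passed in as a parameter rather than hard-coded. The IRP bit position is taken from Table 9-10.

```python
DC_IRP = 1 << 23  # Instruction Replace, bit 23 of DC (Table 9-10)


def next_word_increment(rar_dbg):
    """Return 2 or 4: the distance from a halfword-aligned address to
    the start of the next word in program memory."""
    return 4 - (rar_dbg & 0x3)


def exit_with_replacement(probe, stored_word, retd_opcode):
    """Leave OCD Mode, executing `stored_word` in place of the breakpoint."""
    rar = probe.read_rar_dbg()
    probe.write_dpc(rar)                                 # step 1
    probe.write_rar_dbg(rar + next_word_increment(rar))  # step 2
    probe.set_dc_bits(DC_IRP)                            # step 3
    probe.write_dinst(retd_opcode)                       # step 4
    probe.write_dinst(stored_word)                       # step 5


class RecordingProbe:
    """Hypothetical probe that only records the accesses it receives."""

    def __init__(self, rar_dbg):
        self.rar_dbg = rar_dbg
        self.log = []

    def read_rar_dbg(self):
        return self.rar_dbg

    def write_rar_dbg(self, value):
        self.rar_dbg = value
        self.log.append(("RAR_DBG", value))

    def write_dpc(self, value):
        self.log.append(("DPC", value))

    def set_dc_bits(self, mask):
        self.log.append(("DC", mask))

    def write_dinst(self, value):
        self.log.append(("DINST", value))
```

With RAR_DBG at 0x000016 the increment is 2, and with a word-aligned RAR_DBG it is 4, matching the "2 or 4" in step 2.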

9.2.6.1

Instruction replacement example

Table 9-1 shows an example code section in which the user wants to insert a breakpoint.

Table 9-1. Example of a user code section

PC value    Opcode        Instruction
0x000010    0x0E9C        mov r12,r7
0x000012    0x201C        sub r12,0x01
0x000014    0xC0AC        rcall label1
0x000016    0xF8070046    adc r6,r12,r7
0x00001A    0x2027        sub r7,0x02

The tool wants to insert a software breakpoint on the instruction "adc r6,r12,r7" on

PC=0x000016. This is an extended instruction, and only the upper halfword needs to be

replaced by the breakpoint instruction.

1. The upper halfword is contained within the word located at 0x000014, and the debug

tool stores this value (0xC0ACF807).

2. The debugger writes a breakpoint instruction (opcode 0xD673) to location 0x000016 in the CPU’s program memory to replace the most significant halfword of the breakpointed instruction.

3. When the breakpoint instruction executes, the CPU will enter OCD Mode, and DS:DBS

and DS:SWB are set, indicating that OCD Mode is entered due to a software

breakpoint.

4. The tool performs a normal sequence of operation in OCD Mode.

5. When the tool is ready to return to normal CPU operation, it reads the RAR_DBG value

to find the return address.

6. The tool inserts CPU instructions to DINST to increment RAR_DBG by 2, so it is

aligned to the next word in the program memory.

7. The tool inserts a "retd" instruction to DINST. The tool will receive a Debug Status mes-

sage, which indicates that the CPU has exited OCD Mode, and is now waiting for one

more instruction from the tool.

8. The tool writes the return address (0x000016) to the Debug Program Counter (DPC).

9. The tool looks up the stored instruction word (based on the return address) and writes

this value (0xC0ACF807) to the Debug Instruction Register (DINST). The CPU now

resumes normal operation.
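The word arithmetic in this example can be checked with a short sketch. The helper names and the dictionary-based memory model are illustrative assumptions. The addresses and opcodes are those of Table 9-1, with the upper halfword of each word stored at the lower address, which is the layout implied by the stored value 0xC0ACF807.

```python
def containing_word_address(pc):
    """Address of the 32-bit word that contains the halfword at `pc`."""
    return pc & ~0x3


def read_word(memory, addr):
    """Read a word from a {halfword address: 16-bit value} mapping,
    upper halfword first."""
    return (memory[addr] << 16) | memory[addr + 2]


def patch_halfword(word, pc, opcode16):
    """Return `word` with the halfword located at `pc` replaced."""
    if pc & 0x2:
        return (word & 0xFFFF0000) | opcode16      # lower halfword
    return (opcode16 << 16) | (word & 0x0000FFFF)  # upper halfword


# Program memory from Table 9-1, one entry per halfword.
memory = {0x10: 0x0E9C, 0x12: 0x201C, 0x14: 0xC0AC,
          0x16: 0xF807, 0x18: 0x0046, 0x1A: 0x2027}

word_addr = containing_word_address(0x16)          # 0x14
stored = read_word(memory, word_addr)              # 0xC0ACF807
patched = patch_halfword(stored, 0x16, 0xD673)     # 0xC0ACD673
```

The value `stored` is what the tool later writes to DINST in step 9, and `patched` is the word as it appears in program memory while the breakpoint is in place.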

9.2.7

Sleep Mode

If the CPU is in sleep mode, it will not receive clocks nor respond to an OCD request from the

debugger. Thus, if the Debug Request bit in DC is written to one while the CPU is in sleep mode,

the CPU will automatically return to active mode. The instruction following the sleep instruction

will be tagged with an OCD exception, and the CPU will jump directly to Debug Mode. The nor-

mal debug procedure can be followed while executing in Debug Mode. If Debug Mode is entered

from sleep mode, the Stop Status (STP) bit in the Development Status register will be set.

When returning from Debug Mode, the CPU will by default return to the instruction following the

sleep instruction. The debugger can handle this situation in two ways:

1. Allow the CPU to wake up from sleep mode on a debug request.

2. Decrement RAR_DBG in Debug Mode to return to the sleep instruction. This places the

CPU back into sleep mode after exiting Debug Mode.

9.2.8

OCD Register Access

The OCD registers control the OCD system. Their specification is based on the Nexus Recommended Registers as outlined in the Nexus Standard Specification [IEEE-ISTO 5001™-2003]. All registers can be accessed through the JTAG interface.

9.2.9

OCD features in Debug Mode

When the CPU executes in Debug Mode, certain OCD features will be disabled. The following

table indicates how the various OCD features will behave in Debug Mode. For more information

on the specific features, please see the indicated page.

Table 9-2. OCD features in Debug Mode

Feature                           Available in Debug Mode?
Program Breakpoints (HW)          Yes, in Monitor Mode when SR:DM is cleared
Software Breakpoints              Yes, in Monitor Mode when SR:DM is cleared
Data Breakpoints                  Yes, in Monitor Mode when SR:DM is cleared
Watchpoints (program and data)    Yes, in Monitor Mode
Program Trace                     No
Data Trace                        No
Ownership Trace                   Yes
NanoTrace                         No
Direct Memory Access              Yes
Debug Communication Mechanism     Yes

9.2.10

OCD Registers Accessed by CPU

A monitor program running on the target can access the OCD registers through mtdr and mfdr

instructions. These instructions transfer data between a register in the register file and an OCD

register, according to the register index given in “OCD Register Summary” on page 152. These

instructions can also be used in OCD mode to transfer information from the register file and sys-

tem registers to the debugger, through the Debug Communication Mechanism.

9.2.11

Runtime write access to OCD registers

The OCD registers can always be accessed by JTAG when the OCD system is not

enabled or the CPU is in OCD Mode. The OCD registers can also be read by JTAG at any time,

and by the CPU in any privileged mode.

When the CPU is in other modes - either running normal code, or executing in Monitor Mode -

the OCD registers can be written by JTAG as specified in Table 9-3. If the registers are

accessed in another way than specified, undefined operation may result.

The OCD Register Protect (ORP) bit in DC defines the allowed write access to OCD registers in

privileged modes. If the ORP bit in DC does not allow CPU access to OCD registers in the cur-

rently executing mode, only PID and DCCPU can be written. Illegal access to the registers will

be ignored with no error reporting.
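The ORP rule can be modeled with a small predicate. This is a simplified sketch that covers only the ORP check for a CPU executing in a privileged mode. It does not model the per-register restrictions of Table 9-3, and the function and parameter names are assumptions made for illustration.

```python
ALWAYS_WRITABLE = {"PID", "DCCPU"}  # writable even when ORP blocks access


def cpu_write_allowed(register, orp, in_debug_mode):
    """Return True if a privileged-mode CPU write passes the ORP check.

    orp = 0: OCD registers can be written by any privileged CPU mode.
    orp = 1: OCD registers can be written only in Debug Mode.
    A write that fails the check is ignored with no error reporting.
    """
    if register in ALWAYS_WRITABLE:
        return True
    if orp == 0:
        return True
    return in_debug_mode
```

A monitor program could use such a check to avoid issuing writes that would be silently dropped.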

Table 9-3.

OCD Register access

Can be written by

CPU in Monitor

Mode?

Can be written by JTAG

while CPU is running?

Register

Development Control (DC)

Yes

No

Yes

Read/Write Access Control/Status (RWCS)

Read/Write Access Address (RWA)

Read/Write Access Data (RWD)

Watchpoint Trigger (WT)

Can be written to disable /

enable trace channels.

Data Trace Control (DTC)

Yes

Data Trace Start Address (DTSA) Channel 1 to

2

Can only be written while

trace channel is disabled

Data Trace End Address (DTEA) Channel 1 to

2

Can only be written while

trace channel is disabled

Can be written to disable /

enable watchpoints /

breakpoints.

PC Breakpoint/Watchpoint Control (BWC)

Data Breakpoint/Watchpoint Control (BWC)

PC Breakpoint/Watchpoint Address (BWA)

Yes, if SR:DM is set.

Can be written to disable /

enable watchpoints /

breakpoints.

Can only be written while

breakpoint / watchpoint is

disabled

Yes, if SR:DM is set

or breakpoint

disabled.

Yes, if SR:DM is set

or breakpoint

disabled.

Can only be written while

breakpoint / watchpoint is

disabled

Data Breakpoint/Watchpoint Address (BWA)

Breakpoint/Watchpoint Data (BWD)

Yes, if SR:DM is set

or breakpoint

disabled.

Can only be written while

breakpoint / watchpoint is

disabled

Ownership Trace Process ID (PID)

Debug Optimization Control (DOC)

Yes

No

Yes

No

Can only be written while

breakpoint / watchpoint is

disabled

Yes, if SR:DM is set

or breakpoint

disabled.

Event Pair Control (EPC)

Debug Instruction Register

No

Debug Program Counter

No

Debug Communication CPU (DCCPU)

Debug Communication Emulator (DCEMU)

Yes

9.2.12

Debugging Java programs

9.2.12.1

Java mode operation

To run Java programs, a Java Virtual Machine (JVM) must be implemented in software. Java

bytecode programs can then be executed natively on the CPU by placing the CPU in Java

mode. This mode is described in the "AVR32 CPU Architecture" document. The Java mode

characteristics include:

• The CPU decodes instructions as Java bytecodes, each consisting of 1 or more bytes.

• Complex Java instructions are trapped and executed as a RISC routine embedded in the

JVM.

• Java programs can execute in Application or Supervisor mode, thus using the Application or

Supervisor register context, respectively.

• The lower half of the register file is remapped to operate as a push/pop stack for operands

• Other register file registers and system registers hold pointers to memory structures created

by the JVM.

9.2.12.2

Java and debug functionality

The operating mode of the CPU is contained in the bits in the upper half of the Status Register

(SR) in the CPU. When Debug Mode (OCD or Monitor Mode) is entered from Java mode, the

CPU switches to RISC mode (SR:D=1, SR:J=0) and the exception register context (SR:M=6).

By changing SR:M to Application or Supervisor mode, the debugger can execute RISC instruc-

tions on the CPU and still read out the register context of the Java program. The SR:J bit should

never be set in Debug Mode, as this can cause undefined behavior of the CPU.

The debug features available in RISC mode are also available for Java mode. Note the following

particularities about Java debugging:

• Software breakpoints are set by the Java breakpoint bytecode instead of the RISC

breakpoint opcode.

• Instruction replacement is possible when exiting from Debug Mode to Java mode. The same

procedure as for RISC mode can be used. The Java bytecode for the return instruction must

be written to the Debug Instruction Register, and the address of this instruction must be

written to the Debug Program Counter.

• If the memory allocation scheme for the JVM implementation is known, memory block read

commands can be used to extract any task information created by the JVM.

• The incjosp RISC instruction is only used in JVM implementations. This instruction cannot be

breakpointed. Single stepping over the instruction will result in stepping over the next

instruction as well.

9.2.13

CPU optimization control

The CPU contains a number of optimization features which could obscure visibility and compli-

cate debugging. These features are normally controlled by the CPUCR system register, but are

automatically disabled by the OCD system according to which OCD features are enabled.

It is possible for the debugger to override the default disabling of CPU optimization features by

writing the CPU Control Mask Register (CPUCM). The debugger can thus manually tune which features

should be disabled to enhance debugging of performance critical code, trading CPU perfor-

mance against debug visibility. CPUCM will retain its written value until written with a new value

or the OCD is reset.

The user should be familiar with the operation of the CPU pipeline and Data Cache to utilize

these optimization features. Changing the CPUCM register is only recommended for advanced

users who require high CPU performance during their debug sessions.

9.2.13.1

Branch prediction

By default, the CPU will optimize branch execution by attempting to predict the target address

for the branch. This has an adverse effect for program trace, since extended branch (br{cond4})

and extended rcall (rcall k21) could generate messages with incorrect target addresses.

The OCD system automatically disables this feature when program trace is enabled. If the code

does not contain extended branch or rcall instructions, or the user accepts incorrect target

addresses for these instructions, the user can write CPUCM:BEM to one.

It is not recommended to write both CPUCM:BEM and CPUCM:FEM to one during program

trace, as this will cause many errors in the program trace output.

9.2.13.2

Branch folding

The CPU pipeline can compress a branch instruction and the instruction following the branch

into one pipeline instruction, to improve instruction throughput. This process is known as branch

folding. If branch folding is enabled, the OCD is unable to observe the PC of a folded branch,

which can cause PC breakpoints on branch instructions to fail to trigger. Branch folding is therefore

disabled by default when the OCD is enabled.

Branch folding can be kept enabled by writing CPUCM:FEM to one. This means that instructions

following a branch cannot always be breakpointed.

9.2.13.3

Return stack

To speed up subroutine and interrupt handling, the CPU can buffer the most recently used

return addresses in the Return Stack instead of having to fetch them from the regular data mem-

ory stack.

If returning from a subroutine using a ld/ldm or popm to PC, and this address is fetched from the

return stack, program trace will not report an Indirect Branch message, as it should.

For this reason, the OCD system disables the return stack when program trace is activated. If

the code does not contain loads or pop to PC, the user can keep the return stack enabled by

writing CPUCM:REM to one.

9.2.13.4

Imprecise breakpoints

The CPU will normally issue more instructions for execution before previous instructions are

completed. When breakpointing memory operations which cause exceptions, the exception may

already have been started. This causes the breakpoint to behave incorrectly, typically triggering

for a later instruction instead. To avoid confusion, the OCD ensures breakpoints are precise

when the OCD is enabled. This forces the CPU to delay the exception check for memory opera-

tions until a possible breakpoint has been resolved. This normally causes one cycle penalty for

memory operations.

It is possible, but not recommended, to allow imprecise breakpoints during debugging by writing

CPUCM:IBEM to one.

9.2.13.5

Imprecise execution

The AVR32 AP CPU contains logic to optimize instruction execution, which implies that instruc-

tions may complete out-of-order. In a debug context, this may lead to imprecise behavior.

Specifically, when a data breakpoint triggers, one or more instructions following the breakpoint

may have been executed. Also, when using DC:OVC to prevent data trace overruns, several

memory operations may have already been started, possibly causing an overrun situation.

For this reason, the OCD system forces the CPU to use precise execution when data break-

points are enabled or DC:OVC prevents data trace overruns. Precise execution will reduce

memory performance significantly. Imprecise execution can be kept enabled by writing

CPUCM:IEEM to one.

9.2.14

Messages

9.2.14.1

Debug Status (DEBS)

This message is output when the CPU enters or exits Debug Mode or a low-power mode. The

message is output whenever the AUX port is enabled. The STATUS field of this message con-

tains the information in the Development Status register. The field will contain these values:

• The CPU enters Debug Mode: STATUS bits indicate cause of entry to Debug Mode. DBS is

set if OCD Mode was entered.

• The CPU exits Debug Mode: STATUS = 0.

• The CPU enters a low-power mode: Only the STP bit is set, while the other bits are zero.

• The CPU exits a low-power mode: STATUS = 0

Table 9-4. Debug Status Message

Packet Size  Packet Name  Packet Type  Description
32           STATUS       Fixed        The contents of the Development Status register.
6            TCODE        Fixed        Value = 0
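A trace tool could interpret the STATUS field of a Debug Status message along the lines of the four cases listed above. The sketch below is illustrative only: the function name and returned strings are assumptions, and the STP bit position (bit 4) is taken from the Development Status register description. A STATUS of zero does not by itself tell whether Debug Mode or a low-power mode was exited, so the sketch reports both possibilities together.

```python
DS_STP = 1 << 4  # Stop Status, bit 4 of the Development Status register


def classify_debs(status):
    """Map the 32-bit STATUS field of a DEBS message to an event name."""
    if status == 0:
        return "exited Debug Mode or low-power mode"
    if status == DS_STP:
        # Only STP is set, all other bits are zero.
        return "entered low-power mode"
    # Any other pattern identifies the cause of entry to Debug Mode.
    return "entered Debug Mode"
```

A tool that needs to distinguish the two zero cases would have to remember which kind of entry message it saw last.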

9.2.15

Registers

9.2.15.1

Device ID Register (DID)

The Device ID Register (DID) provides key attributes to the development tool concerning the

embedded processor. This is the same as the value returned by the JTAG ID instruction.

Table 9-5. DID Register

R/W  Bit Number  Field Name  Init. Val.     Description
R    31:28       RN          Part specific  RN - Revision Number
R    27:12       PN          Part specific  PN - Product Number
R    11:1        MID         0x01F          Manufacturer ID
                                            0x01F = ATMEL
R    0           Reserved    1              Reserved
                                            This bit always reads as 1
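Splitting a DID value into its fields is a matter of shifts and masks. The following sketch uses the bit positions from Table 9-5. The function name is an assumption, and the sample value is made up for illustration and does not identify a real part.

```python
def decode_did(did):
    """Split a 32-bit Device ID value into the fields of Table 9-5."""
    return {
        "RN": (did >> 28) & 0xF,      # bits 31:28, Revision Number
        "PN": (did >> 12) & 0xFFFF,   # bits 27:12, Product Number
        "MID": (did >> 1) & 0x7FF,    # bits 11:1, Manufacturer ID
        "bit0": did & 0x1,            # always reads as 1
    }


# Made-up example value: RN = 1, PN = 0x1234, MID = 0x01F (ATMEL).
fields = decode_did(0x1123403F)
```

A tool can compare `fields["MID"]` against 0x01F before trusting the remaining fields.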

9.2.15.2

Nexus Configuration Register (NXCFG)

The Nexus Configuration Register (NXCFG) provides key information about the specific imple-

mentation of the CPU and OCD architecture, and the configuration of the Nexus development

features on this device. This information is static, and may be used to develop generic Nexus

debuggers which will work across a family of AVR32 devices with different Nexus configurations.

Table 9-6. Nexus Configuration Register

R/W  Bit Number  Field Name  Init. Val.  Description
R    31:29       Reserved    0
R    28          NXDMA       0           Direct Memory Access support
                                         0 = Not supported
                                         1 = Supported
R    27:25       NXDTC                   Data Trace Channels
                                         0 = Not supported
                                         1 = Supported
R    24          NXDRT                   Data Read Trace Support
                                         0 = Not supported
                                         1 = Supported
R    23          NXDWT                   Data Write Trace Support
                                         0 = Not supported
                                         1 = Supported
R    22          NXOT                    Ownership Trace support
                                         0 = Not supported
                                         1 = Supported
R    21          NXPT                    Program Trace support
                                         0 = Not supported
                                         1 = Supported
R    20:17       NXMDO       6           AUX MDO pins
                                         0 = no MDO or MSEO pins
                                         n = n MDO pins, NXMSEO MSEO pins
R    16          NXMSEO      1           AUX MSEO pins
                                         0 = 1 MSEO pin
                                         1 = 2 MSEO pins
R    15:12       NXDB        2           Number of Data breakpoints
R    11:8        NXPCB       6           Number of PC breakpoints
R    7:4         NXOCD       0           OCD Version
                                         0000 = AVR32 AP OCD
                                         Other = Reserved
R    3:0         NXARCH                  Architecture
                                         0000 = AVR32B
                                         0001 = AVR32A
                                         Other = reserved
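A generic debugger can read NXCFG once and adapt to the device. The sketch below decodes the fields using the bit ranges of Table 9-6. The function name and table layout are illustrative assumptions, and the sample value is assembled only from the initial values that the table lists (NXMDO = 6, NXMSEO = 1, NXDB = 2, NXPCB = 6), with every other field left at zero.

```python
# (field name, most significant bit, least significant bit), per Table 9-6
NXCFG_FIELDS = [
    ("NXDMA", 28, 28), ("NXDTC", 27, 25), ("NXDRT", 24, 24),
    ("NXDWT", 23, 23), ("NXOT", 22, 22), ("NXPT", 21, 21),
    ("NXMDO", 20, 17), ("NXMSEO", 16, 16), ("NXDB", 15, 12),
    ("NXPCB", 11, 8), ("NXOCD", 7, 4), ("NXARCH", 3, 0),
]


def decode_nxcfg(value):
    """Return a dict of raw NXCFG field values."""
    fields = {}
    for name, msb, lsb in NXCFG_FIELDS:
        width = msb - lsb + 1
        fields[name] = (value >> lsb) & ((1 << width) - 1)
    return fields


sample = (6 << 17) | (1 << 16) | (2 << 12) | (6 << 8)  # 0x000D2600
config = decode_nxcfg(sample)
```

From `config`, a tool would conclude that the device has two data breakpoints, six PC breakpoints, six MDO pins, and two MSEO pins.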

9.2.15.3

Debug Communication CPU Register (DCCPU)

If the CPU wants to transmit data to the debugger tool, it writes data to the Debug Communica-

tion CPU Register using mtdr. By writing this register, a dirty bit is set in the Debug

Communication Status Register. The emulator should poll the status register and read DCCPU if

the dirty bit is set.

Table 9-7. Debug Communication CPU Register

R/W  Bit Number  Field Name  Init. Val.   Description
R/W  31:0        DATA        0x0000_0000  Data Value
                                          Data written by CPU

9.2.15.4

Debug Communication Emulator Register (DCEMU)

When the emulator writes to this register, a dirty bit is set in the Debug Communication Status

register. The CPU can poll this bit to see if DCEMU contains unread data.

Table 9-8. Debug Communication Emulator Register

R/W  Bit Number  Field Name  Init. Val.   Description
R/W  31:0        DATA        0x0000_0000  Data Value
                                          Data written by Emulator

9.2.15.5

Debug Communication Status Register (DCSR)

To avoid overruns the CPU must poll this register before writing a new value to DCCPU. Note

that the bits in this register are not automatically cleared in OCD mode. This allows a debugger

to update views and observe the system without accidentally modifying the DCSR register.

Table 9-9. Debug Communication Status Register

R/W  Bit Number  Field Name  Init. Val.   Description
R    31:2        Reserved    0x0000_0000  Reserved
                                          These bits are reserved, and will always read as 0
R/W  1           EMUD        0            Emulator Data Dirty
                                          0 = DCEMU has not been written to since last read
                                          from CPU.
                                          1 = DCEMU contains a new data value.
                                          This bit is cleared by reading DCEMU.
R/W  0           CPUD        0            CPU Data Dirty
                                          0 = DCCPU has not been written to since last read
                                          from emulator.
                                          1 = DCCPU contains a new data value.
                                          This bit is cleared by reading DCCPU.
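The dirty-bit handshake of DCCPU, DCEMU, and DCSR can be illustrated with a small software model. This is a behavioral sketch only: the class and method names are assumptions, and the model ignores the special case in which the bits are not cleared automatically in OCD mode.

```python
class DebugCommModel:
    """Toy model of the Debug Communication Mechanism mailboxes."""

    def __init__(self):
        self.dccpu = 0
        self.dcemu = 0
        self.cpud = False  # CPU Data Dirty
        self.emud = False  # Emulator Data Dirty

    def cpu_send(self, value):
        """CPU side: poll CPUD first to avoid overrunning unread data."""
        if self.cpud:
            return False               # emulator has not read DCCPU yet
        self.dccpu = value & 0xFFFFFFFF
        self.cpud = True               # writing DCCPU sets the dirty bit
        return True

    def emulator_receive(self):
        """Emulator side: read DCCPU only if the dirty bit is set."""
        if not self.cpud:
            return None
        self.cpud = False              # reading DCCPU clears CPUD
        return self.dccpu

    def emulator_send(self, value):
        self.dcemu = value & 0xFFFFFFFF
        self.emud = True               # writing DCEMU sets the dirty bit

    def cpu_receive(self):
        if not self.emud:
            return None
        self.emud = False              # reading DCEMU clears EMUD
        return self.dcemu
```

In this model a second `cpu_send` before the emulator has read the first value is refused, which is the overrun the CPU avoids by polling DCSR.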

9.2.15.6

Development Control Register (DC)

DC is used for basic development control of the CPU.

Table 9-10. Development Control Register

R/W  Bit Number  Field Name  Init. Val.  Description
R/W  31          ABORT       0           ABORT
                                         Writing ABORT to one while DBE is asserted causes the
                                         CPU to enter Debug Mode, regardless of SR:DM and any
                                         pending exceptions. If the CPU was in sleep mode, it
                                         will first be woken up before entering Debug Mode. The
                                         ABORT bit is cleared automatically when Debug Mode is
                                         entered.
S    30          RES         0           RES - Application Reset
                                         Writing this bit causes an application reset, which
                                         will reset the CPU and other system modules. The OCD
                                         state machines will be reset and the Transmit Queue
                                         flushed, but the OCD control and configuration
                                         registers will not be cleared.
R/W  29          MM          0           MM - Monitor Mode
                                         1 = The CPU will enter Debug Mode in Monitor Mode
                                         0 = The CPU will enter Debug Mode in OCD Mode
                                         Changing this bit in Debug Mode does not take effect
                                         until the CPU enters Debug Mode the next time.
R/W  28          ORP         0           ORP - OCD Register Protect
                                         0 = OCD registers can be written by any privileged
                                         CPU mode
                                         1 = OCD registers can be written only in Debug Mode
R/W  27          RID         0           RID - Run In Debug
                                         0: Peripherals are frozen in Debug Mode
                                         1: Peripherals keep running in Debug Mode
R/W  26          TSR         0           TSR - Trap Soft Reset
                                         0: A soft reset event causes the CPU to be reset
                                         1: A soft reset event causes the CPU to enter Debug
                                         Mode.
R/W  25          TOZ         0           TOZ - Trap Opcode Zero
                                         0: The opcode 0x0000 is executed as a normal CPU
                                         instruction
                                         1: The opcode 0x0000 causes entry to Debug Mode
R/W  24          IFM         0           IFM - Ignore First Match
                                         When written to one, a PC breakpoint on the first
                                         instruction after exiting Debug Mode with the retd
                                         instruction will not trigger re-entry to Debug Mode.
                                         Typically used when returning from a program
                                         breakpoint. This bit stays one until written to zero.
R/W  23          IRP         0           IRP - Instruction Replace
                                         If IRP is written to one before exiting OCD Mode with
                                         the retd instruction, the first instruction after
                                         exiting OCD Mode will be fetched from the Debug
                                         Instruction Register. This bit is cleared
                                         automatically after this fetch takes place. This bit
                                         will not have any effect if written at the same time
                                         as RES.
R/W  22          SQA         0           SQA - Software Quality Assurance
                                         0: Regular program trace
                                         1: SQA enhanced program trace
R/W  21:20       EOS         0           EOS - Event Out Select
                                         00 = No operation
                                         01 = Emit event out when the CPU enters Debug Mode
                                         10 = Emit event out for breakpoints/watchpoints
                                         11 = Emit event out for message insertion into the
                                         TXQ
R    19:14       Reserved
R/W  13          DBE         0           DBE - Debug Enable
                                         DBE enables Debug Mode and all debug features in the
                                         CPU. DBE must be written to one to enable
                                         breakpoints, debug requests, or single steps.
R/W  12          DBR                     DBR - Debug Request
                                         Writing DBR to one while DBE is asserted causes the
                                         CPU to enter Debug Mode. If the CPU was in sleep
                                         mode, it will first be woken up before entering Debug
                                         Mode. The DBR bit is cleared automatically when Debug
                                         Mode is entered.
     11:9        Reserved
R/W  8           SS          0           SS - Single Step
                                         If SS is written to one before exiting Debug Mode
                                         with the retd instruction, exactly one instruction
                                         will be executed before returning to Debug Mode. SS
                                         stays one until written to zero by the debugger.
R/W  7:5         OVC         0           OVC[2:0] - Overrun Control
                                         OVC controls the action taken if Branch, Data, or
                                         Ownership trace messages are generated while the
                                         Transmit Queue is full. Settings 111 through 100 are
                                         reserved.
                                         000 = Generate overrun messages
                                         001 = Delay CPU to avoid BTM and Ownership Trace
                                         overruns
                                         010 = Delay CPU to avoid DTM and Ownership Trace
                                         overruns
                                         011 = Delay CPU to avoid BTM, DTM, and Ownership
                                         Trace overruns
                                         111-100 = Reserved
R/W  4:3         EIC         0           EIC[1:0] - EVTI Control
                                         The EIC bits control the action performed when the
                                         EVTI pin on the Nexus debug port receives a
                                         high-to-low transition. If trace is enabled, EVTI can
                                         be configured to cause a trace synchronization
                                         message. If Debug Mode is enabled, EVTI can be
                                         configured to cause a breakpoint.
                                         00 = EVTI for program and data trace synchronization
                                         01 = EVTI for breakpoint generation
                                         10 = No operation
                                         11 = Reserved
R/W  2:0         TM          0           TM[2:0] - Trace Mode
                                         The TM bits select which trace modes are enabled.
                                         000 = No Trace
                                         XX1 = OTM Enabled
                                         X1X = DTM Enabled
                                         1XX = BTM Enabled
                                         If Data or Branch tracing is triggered or stopped by
                                         a watchpoint, the DTM and BTM bits are updated
                                         accordingly.
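A debugger usually assembles a DC value from named fields rather than from raw constants. The sketch below is one possible way to do that, using the bit positions and widths given in Table 9-10. The function name and the keyword-argument interface are assumptions made for illustration.

```python
# field name -> (least significant bit, width in bits), per Table 9-10
DC_FIELDS = {
    "ABORT": (31, 1), "RES": (30, 1), "MM": (29, 1), "ORP": (28, 1),
    "RID": (27, 1), "TSR": (26, 1), "TOZ": (25, 1), "IFM": (24, 1),
    "IRP": (23, 1), "SQA": (22, 1), "EOS": (20, 2), "DBE": (13, 1),
    "DBR": (12, 1), "SS": (8, 1), "OVC": (5, 3), "EIC": (3, 2),
    "TM": (0, 3),
}


def encode_dc(**fields):
    """Build a 32-bit DC value from field values, e.g. encode_dc(DBE=1)."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = DC_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        value |= field_value << lsb
    return value


halt_request = encode_dc(DBE=1, DBR=1)       # enable debug, request halt
evti_break = encode_dc(DBE=1, EIC=0b01)      # EVTI generates a breakpoint
single_step = encode_dc(DBE=1, SS=1)         # step one instruction on retd
```

Reserved OVC and EIC encodings are not rejected by this sketch, so a real tool would add that check.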

9.2.15.7

Development Status (DS) register

This register is used to examine the debug state of the CPU and the cause for entering Debug

Mode. Note that multiple sources may trigger Debug Mode simultaneously, causing more than

one bit to be set. The register is read-only. All bits are dynamic and do not require clearing.

This register is undefined when the CPU is not in Debug Mode.

Table 9-11. Development Status register

R/W  Bit Number  Field Name  Init. Val.  Description
R    31:29       Reserved    0
R    28          NTBF        0           NTBF - NanoTrace Buffer Full. This bit is set if Debug Mode was entered due to the NanoTrace buffer being full. This bit is cleared when Debug Mode is exited.
R    27          EXB         0           EXB - External Breakpoint. This bit is set if Debug Mode was entered due to an event on the EVTI pin. This bit is cleared when Debug Mode is exited.
R    26          DBA         0           DBA - Debug Acknowledge. This bit is set if Debug Mode was entered due to setting the Debug Request or ABORT bit in the DC register. This bit is cleared when Debug Mode is exited.
R    25          BOZ         0           BOZ - Break on Opcode Zero. This bit is set if Debug Mode was entered due to opcode 0x0000 being executed. This bit is cleared when Debug Mode is exited.
R    24          INC         0           INC - Instruction Complete. 0: The CPU is executing one or more instructions, or is not in OCD Mode. 1: The CPU is in OCD Mode and is not executing any instructions.
R    23:16       Reserved
R    15:8        BP[7:0]     0           BP - Breakpoint Status. The BP bits identify which hardware breakpoint caused Debug Mode to be entered: BP[0]: BP0A, BP[1]: BP0B, BP[2]: BP1A, BP[3]: BP1B, BP[4]: BP2A, BP[5]: BP2B, BP[6]: BP3A, BP[7]: BP3B. These bits are cleared when Debug Mode is exited.
R    7:6         Reserved    0
R    5           DBS         0           DBS - Debug Status. DBS is set when the CPU is in OCD Mode, otherwise cleared. This bit stays cleared also when the CPU operates in Monitor Mode.
R    4           STP         0           STP - Stop Status. STP is set if OCD Mode is entered from sleep mode. This bit can be used by the debugger to determine the proper return sequence from OCD Mode. This bit is cleared when OCD Mode is exited.
R    3           HWE         0           HWE - Hardware Error. This bit is set if a hardware error has triggered entry to Debug Mode. The debugger should assume that all status information has been lost, and write the RES bit in DC to reset the system. The OCD control and configuration registers should be reconfigured.
R    2           HWB         0           HWB - Hardware Breakpoint Status. This bit is set if Debug Mode was entered due to a hardware breakpoint. The BP[7:0] bits should be examined to determine the breakpoint(s) which triggered. This bit is cleared when Debug Mode is exited.
R    1           SWB         0           SWB - Software Breakpoint Status. This bit is set if Debug Mode was entered due to a breakpoint instruction being executed. Returning from a software breakpoint may require special handling by the debugger. This bit is cleared when Debug Mode is exited.
R    0           SSS         0           SSS - Single Step Status. This bit is set when Debug Mode is entered due to a single step. This bit is cleared when Debug Mode is exited.
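As an illustration of how a debugger might interpret a value read from the Development Status register, the following Python sketch decodes the status flags and the BP[7:0] field using the bit positions of Table 9-11. The function name decode_ds and the way the raw value is obtained are assumptions made for this example; they are not part of the OCD system.

```python
# Bit positions taken from Table 9-11 (Development Status register).
DS_FLAGS = {
    28: "NTBF", 27: "EXB", 26: "DBA", 25: "BOZ", 24: "INC",
    5: "DBS", 4: "STP", 3: "HWE", 2: "HWB", 1: "SWB", 0: "SSS",
}
# BP[0]..BP[7] map to these breakpoint modules.
BP_NAMES = ["BP0A", "BP0B", "BP1A", "BP1B", "BP2A", "BP2B", "BP3A", "BP3B"]


def decode_ds(ds):
    """Return (set flag names, breakpoint modules flagged in BP[7:0])."""
    flags = [name for bit, name in sorted(DS_FLAGS.items(), reverse=True)
             if (ds >> bit) & 1]
    bp_field = (ds >> 8) & 0xFF
    breakpoints = [BP_NAMES[i] for i in range(8) if (bp_field >> i) & 1]
    return flags, breakpoints


# Hypothetical value: OCD Mode entered because of hardware breakpoint BP1A.
example = (1 << 24) | (1 << 5) | (1 << 2) | (1 << (8 + 2))
print(decode_ds(example))
```

If HWB is reported, the second element of the result lists the breakpoint modules that triggered.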

9.2.15.8

Debug Instruction Register (DINST)

The Debug Instruction Register contains the instruction to be executed in OCD Mode. The CPU

fetches and executes instructions faster than they can be written by the Debug port. DINST is

also used to store the instruction to replace the breakpoint instruction.

Table 9-12. Debug Instruction register

R/W  Bit Number  Field Name  Init. Val.  Description
R/W  31:0        DINST       0           DINST - Debug Instruction. The instruction to be executed on the CPU.

9.2.15.9

Debug Program Counter (DPC)

This register contains the PC value of the last executed instruction in any non-debug mode. This

allows a debugger to sample program execution addresses for statistical purposes without

interrupting the CPU.

If this register is read in Debug Mode, it will reflect the last executed instruction before Debug

Mode was entered. Note that several types of breakpoints trigger before an instruction is

executed, so this value is not necessarily identical to RAR_DBG.


When replacing the return instruction from Debug Mode, the CPU will see the DPC value as the

PC value for the executed instruction. The user only needs to write this register when replacing

the return instruction from OCD Mode.

Table 9-13. Debug Program Counter

R/W  Bit Number  Field Name  Init. Val.  Description
R/W  31:0        DPC         0           DPC - Debug Program Counter. PC of the last executed instruction

9.2.15.10

CPU Control Mask Register (CPUCM)

This register prevents the OCD from overriding the operation of the CPU Control Register

(CPUCR). A value written to this register is kept until a new value is written or the OCD is reset.

Table 9-14. CPU Control Mask Register

R/W  Bit Number  Field Name  Init. Val.  Description
R    31:6        Reserved    0
R/W  5           IEEM        0           Imprecise Execution Enable Mask. When set, the OCD will not disable imprecise execution.
R/W  4           IBEM        0           Imprecise Breakpoint Enable Mask. When set, the OCD will not disable imprecise PC breakpoints.
R/W  3           REM         0           Return stack Enable Mask. When set, the OCD will not disable the return stack.
R/W  2           FEM         0           Branch Folding Enable Mask. When set, the OCD will not disable branch folding.
R/W  1           BEM         0           Branch Prediction Enable Mask. When set, the OCD will not disable branch prediction.
     0                                   Reserved


9.3

Debug Port

9.3.1

Overview

The OCD debug port consists of the JTAG port and the AUX port. The low bandwidth JTAG port

handles all register access, while the high bandwidth AUX port transfers all Nexus messages

from the OCD system.

The Nexus standard defines the maximum clock frequency for JTAG to be 33 MHz, and for AUX

200 MHz.

9.3.2

JTAG

Access to the OCD registers is done through an IEEE 1149.1 JTAG port. The JTAG TAP controller is

shared with the rest of the system. In order to enable access to the OCD registers the emulator must

perform the following sequence.

1. Put the TAP controller in the state "test logic reset".

2. Insert the OCD Instruction to prepare the Debug Port to receive OCD register access.

The OCD instruction is inserted using the IR scan path.

3. Use the DR scan path to insert the OCD register address and operation (Read / Write).

4. Use the DR scan path to read / write the data to / from the register.

5. Repeat 3 through 4 for every register operation. The TAP controller will remain in OCD

mode until a test logic reset is detected.
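The TAP navigation behind steps 1 to 4 can be sketched as a small Python model of the standard IEEE 1149.1 state machine (the one drawn in Figure 9-4). The sketch only shows which TMS sequences reach the reset, IR scan and DR scan states. The OCD instruction opcode and the layout of the DR scan chain are not given in this section and are deliberately left out. The names TAP and walk are chosen for this example only.

```python
# Next state for each TAP state, indexed by the TMS value (0 or 1)
# sampled on the rising edge of TCK.
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR"),
}
# The DR and IR columns share the same shape; only the TMS=1 exit of
# the Select state differs.
for reg, select_exit in (("DR", "Select-IR"), ("IR", "Test-Logic-Reset")):
    TAP.update({
        f"Select-{reg}": (f"Capture-{reg}", select_exit),
        f"Capture-{reg}": (f"Shift-{reg}", f"Exit1-{reg}"),
        f"Shift-{reg}": (f"Shift-{reg}", f"Exit1-{reg}"),
        f"Exit1-{reg}": (f"Pause-{reg}", f"Update-{reg}"),
        f"Pause-{reg}": (f"Pause-{reg}", f"Exit2-{reg}"),
        f"Exit2-{reg}": (f"Shift-{reg}", f"Update-{reg}"),
        f"Update-{reg}": ("Run-Test/Idle", "Select-DR"),
    })


def walk(state, tms_bits):
    """Follow a sequence of TMS values and return the resulting state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


# Step 1: five TCK cycles with TMS high reach Test-Logic-Reset from anywhere.
print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP))
# Step 2: path to the IR scan. Steps 3 and 4: path to the DR scan.
print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
```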

To be able to use JTAG-based debug tools for AVR32 without adapters, it is recommended that

a circuit design using an AVR32 device should use a standard 10-pin 50-mil IDC connector with

the pinout shown in Table 9-15. The signals are described in Table 9-16.

Table 9-15. AVR32 standard JTAG connector pinout. All directions relative to processor

Signal  Dir  Pin  Pin  Dir  Signal
TCK     In   1    2         GND
TDO     Out  3    4    Out  VREF
TMS     In   5    6    In   RESET_N
N/C          7    8         N/C
TDI     In   9    10        N/C

Table 9-16. JTAG signals

Signal   Direction  Description
TRST_N   Input      Asynchronous reset for the TAP controller and JTAG registers
TCK      Input      Test Clock. Data is driven on falling edge, sampled on rising edge.
TMS      Input      Test Mode Select
TDI      Input      Test Data In
TDO      Output     Test Data Out
RESET_N  Input      Device reset
VREF     Output     Reference voltage from target. Signals should be driven relative to this voltage level.

Figure 9-4. JTAG TAP controller state diagram.

[State diagram with TMS-labelled transitions between the states Test-Logic-Reset, Run-Test/Idle, Select-DR Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR, Select-IR Scan, Capture-IR, Shift-IR, Exit1-IR, Pause-IR, Exit2-IR and Update-IR.]

9.3.3

AUX port

The Auxiliary (AUX) port and messaging protocol follow the definitions of the Nexus standard.

This standard allows varying the number of signalling pins. The following configuration is

selected for AVR32 AP.

• 6 data output pins (MDO)

• 2 message start/end output pins (MSEO)

• 1 EVTO pin

• 1 EVTI pin

The configuration is based on the presumed needs for bandwidth in a system being traced at

100+ MIPS, balanced against the desire to keep debug pincount low. This configuration can be

changed in future implementations to allow for greater or smaller bandwidth over the AUX port.


The AUX pins may be multiplexed with GPIO in a device. By default, the MCKO, MDO, and

MSEO pins are tristated or used as GPIO, and the Nexus functionality must be explicitly enabled

by the debugger. EVTO, EVTI, and the JTAG pins are always available to the debugger.

If the AUX pins are needed for Nexus functionality in an application, it is recommended not to

use these pins for GPIO purposes, as this can affect the signal integrity required for Nexus

operation.

The complete signal list of the AUX port is shown in Table 9-17.

Table 9-17. Auxiliary pins

Auxiliary pins  Width  Direction  Description
MCKO            1      O          Message Clockout (MCKO) is a free-running output clock to development tools for timing of MDO and MSEO pin functions.
MDO             6      O          Message Data Out (MDO[5:0]) are output pins used for all messages generated by the device. In single datarate mode, external latching of MDO shall occur on rising edge of MCKO. In double datarate mode, external latching of MDO shall occur on both edges of MCKO.
MSEO            2      O          Message Start/End Out (MSEO[1:0]) pins indicate when a message on the MDO pins has started, when a variable length packet has ended, and when the message has ended. In single datarate mode, external latching of MSEO shall occur on rising edge of MCKO. In double datarate mode, external latching of MSEO shall occur on both edges of MCKO.
EVTO            1      O          Event Out (EVTO) is an output pin which can be configured to toggle every time a message is inserted into the Transmit Queue, when the CPU has entered OCD Mode, or when a breakpoint or watchpoint hit has occurred, as configured by the EOS bits in the Development Control register.
EVTI            1      I          Event In (EVTI) is an input pin. A high-to-low transition on EVTI either halts the processor (breakpoint) or causes program and data synchronization messages to be transmitted from the OCD controller, as configured by the EIC bits in the Development Control register.
RESET_N                           System reset


To be able to use AUX-based debug tools for AVR32, a circuit design using an AVR32 device

should use a Mictor38 connector (AMP P/N 767054-1) as defined in the Nexus standard, with

the pinout shown in Table 9-18.

Table 9-18. AVR32 standard Nexus connector pinout. All directions relative to processor

Signal

MSEO0

MSEO1

MCKO

EVTO_N

MDO0

MDO1

MDO2

MDO3

MDO4

MDO5

Dir

38

36

34

32

30

28

26

24

22

20

18

16

14

12

10

8

37

35

33

31

29

27

25

23

21

19

17

15

13

11

9

Dir

Signal

N/C

Out

N/C

Out

In

N/C

In

TRST_N

TDI

TMS

TCK

N/C

VREF

Out

In

TDO

RESET_N

N/C

EVTI_N

N/C

7

6

5

N/C

4

3

N/C

2

1

N/C

9.3.3.1

Reset configuration

Message transmission can be enabled or disabled according to the state of the EVTI pin when

the JTAG TAP controller is reset. When messaging is enabled, output messages are transmitted

normally. If message transmission is disabled, the auxiliary output pins (MCKO, MDO, MSEO)

are tristated, and no messages will be transmitted.

Reset configuration information must be valid on EVTI at least 2 TCK periods prior to negation of

TRST or exit from the TEST-LOGIC-RESET TAP state. The AUX port will be enabled as shown

in Table 9-19.

If the Nexus port is disabled after reset, the debugger can still enable the port by writing to the

AXC:AXE (Auxiliary Enable) bit to enable trace functionality at any time before trace is activated.


Debug functionality based on the JTAG or EVTI or EVTO pins is still available even if the AUX

port is disabled.

Table 9-19. EVTI pin reset configuration

Reset state  Description
0            Message transmission enabled
1            Message transmission disabled (default)

9.3.3.2

Message protocol

The OCD System implements the Auxiliary Port Message Protocol defined in the Nexus

standard. The following section is merely a summary of this protocol. For details, please see the

Nexus standard.

Messages are composed of a Start-of-Message (SOM) token, followed by one or more packets

of information, each of fixed or variable length, and ended by an End-of-Message (EOM) token.

SOM/EOM and End-of-Variable-Length-Packets (EVLP) are signalled by MSEO for transmitted

messages. Packet information is carried by the MDO pins. The number of MDO pins available is

known as the port boundary. The information carried by the MDO and MSEO pins each cycle is

known as a frame.

9.3.3.3

Message rules

MDO is valid whenever MSEO does not indicate "idle".

Fixed length packets are implicitly recognized from the message format, and are not required to

end on a port boundary. Thus, packets may also start within a port boundary if following a fixed

length packet. The end of variable length packets is identified through the MSEO pins, and to

identify the end of the packet uniquely, these packets must end on a port boundary. If

necessary, the packet must be stuffed with zeroes to align the end to a port boundary. Variable length

packets may be truncated by omitting leading zeroes so that the packet ends on the first

possible port boundary.

The MSEO pins behave in the following way ("x" means "don't care"):

• 0b11 followed by 0b00 indicates SOM

• 0b0x followed by 0b11 indicates EOM

• 0b00 followed by 0b01 indicates EVLP

• MSEO is 0b00 at all other clocks during transmission of a message

• MSEO is 0b11 at all clocks when idle.
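A trace tool could apply the MSEO rules above to each pair of consecutive samples. The following Python sketch is one possible reading of those rules. The function name classify_mseo and the label UNLISTED for transitions not mentioned in this summary are assumptions made for the example; the Nexus standard remains the authority for such cases.

```python
def classify_mseo(prev, cur):
    """Classify one MSEO[1:0] sample given the previous sample."""
    if prev == 0b11 and cur == 0b00:
        return "SOM"        # 0b11 followed by 0b00
    if prev in (0b00, 0b01) and cur == 0b11:
        return "EOM"        # 0b0x followed by 0b11
    if prev == 0b00 and cur == 0b01:
        return "EVLP"       # 0b00 followed by 0b01
    if prev == 0b11 and cur == 0b11:
        return "IDLE"       # 0b11 at all clocks when idle
    if prev in (0b00, 0b01) and cur == 0b00:
        return "NORMAL"     # 0b00 at all other clocks of a message
    return "UNLISTED"       # not covered by the rules summarized here


samples = [0b11, 0b00, 0b00, 0b01, 0b11, 0b11]
events = [classify_mseo(p, c) for p, c in zip(samples, samples[1:])]
print(events)
```

MDO carries valid data in every frame whose classification is not IDLE.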

9.3.3.4

Clock and frame rate

In single datarate mode (default), MDO and MSEO should be sampled by an external tool on the

rising edge of MCKO. In double datarate mode, the MCKO clock runs at half frequency, so MDO

and MSEO should be sampled on both edges of MCKO. This is configured by the Double

Datarate bit in the AUX Port Control Register.

It is also possible to reduce the frequency of the AUX port compared to the CPU clock by writing

the AXC:LS and AXC:DIV bits. If LS=1, the DIV value selects the frame rate of the AUX port:

f_AUX= f_CPU/(DIV+1)

If LS=1 and DIV=0, f_AUX= f_CPU/2.


This can be combined with the single or dual datarate mode, as described above. In either

case, the sampling edge will be as close to the middle of the MDO data frame as possible. The

duty cycle of the MCKO clock will stay within the 40-60 duty cycle requirement of the Nexus

standard for all settings apart from DIV=2.
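The frame rate rules of this section can be written down as a short calculation. The Python sketch below follows the text literally: full speed when LS=0, f_CPU/(DIV+1) when LS=1, and f_CPU/2 for the special case LS=1 with DIV=0. The function names and the 120 MHz CPU clock are illustrative assumptions.

```python
def aux_frame_rate(f_cpu_hz, ls, div):
    """Frame rate of the AUX port for the given AXC:LS and AXC:DIV values."""
    if not 0 <= div <= 0xF:
        raise ValueError("DIV is a 4-bit field")
    if not ls:
        return f_cpu_hz            # LS=0: same speed as the CPU
    if div == 0:
        return f_cpu_hz / 2        # special case stated in the text
    return f_cpu_hz / (div + 1)


def mcko_frequency(frame_rate_hz, ddr):
    """MCKO runs at half the frame rate in double datarate mode."""
    return frame_rate_hz / 2 if ddr else frame_rate_hz


f_cpu = 120e6  # hypothetical CPU clock
rate = aux_frame_rate(f_cpu, ls=True, div=3)
print(rate, mcko_frequency(rate, ddr=True))
```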

9.3.3.5

Example

Figure 9-5 shows an example of transmission of a Program Trace Indirect Branch message. The

TCODE is fixed at 6 bits (=4 for PTIB), followed by a fixed-length packet (EVT-ID = 2), and a

variable-length packet (I-CNT = 63). I-CNT is stuffed with zeroes to fit the port boundary. Finally,

the variable packet U-ADDR (=5) is transmitted. Since the leading zeroes of this packet can be

truncated, it fits within a single frame.
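The packing described in this example can be reproduced with a small encoder. The Python sketch below packs packets least significant bit first into 6-bit frames, truncates and zero-stuffs variable-length packets to a port boundary, and marks EVLP and EOM on MSEO. The bit ordering is inferred from the MDO values shown in Figure 9-5, and the 2-bit width of EVT-ID is likewise inferred from those values. The function name encode_message is an assumption for this example.

```python
def encode_message(packets, port_width=6):
    """Encode (value, nbits, is_variable) packets into (MSEO, MDO) frames."""
    frames = []           # each entry is [mseo, mdo]
    cur, used = 0, 0      # frame under construction and bits filled so far
    for value, nbits, is_variable in packets:
        if is_variable:
            # Omit leading zeroes (at least one bit is always sent).
            nbits = max(1, value.bit_length())
        for i in range(nbits):
            cur |= ((value >> i) & 1) << used
            used += 1
            if used == port_width:
                frames.append([0b00, cur])
                cur, used = 0, 0
        if is_variable:
            if used:      # zero stuffing up to the port boundary
                frames.append([0b00, cur])
                cur, used = 0, 0
            frames[-1][0] = 0b01   # end of variable-length packet
    if used:
        frames.append([0b00, cur])
    frames[-1][0] = 0b11           # end of message
    return [tuple(f) for f in frames]


# TCODE = 4 (6 bits, fixed), EVT-ID = 2 (2 bits, fixed),
# I-CNT = 63 (variable), U-ADDR = 5 (variable).
ptib = [(4, 6, False), (2, 2, False), (63, 6, True), (5, 32, True)]
for mseo, mdo in encode_message(ptib):
    print(f"MSEO={mseo:02b} MDO={mdo:06b}")
```

The four frames produced are 000100, 111110, 000011 and 000101, with MSEO values 00, 00, 01 and 11.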

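Figure 9-5. Example of a Nexus message transmission with single and double datarate.

[Timing diagram showing MCKO (DDR=1), MCKO (DDR=0), MSEO[1..0] and MDO[5..0] through the IDLE, SOM, NORMAL, EVLP and EOM phases. MSEO[1..0] values shown: 11, 00, 01, 11. MDO[5..0] frames shown: 000100, 111110, 000011, 000101, annotated TCODE = 4, EVT-ID = 2, I-CNT = 63, zero stuffing, U-ADDR = 5.]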

9.3.3.6

Transmit queue and overruns

Messages from various sources are inserted in a Transmit Queue (TXQ), which stores a number

of frames. This queue acts as a FIFO which allows messages to be inserted more rapidly than

they can be retrieved by the emulator.

The queue holds 16 frames. If more messages are inserted than there is room for in the queue,

information will be lost, and an overrun situation occurs. The TXQ will block any more messages

from being inserted, and allow the queue to be emptied by the emulator before allowing any

more messages to be inserted. The first message to be inserted after the overrun is cleared, is

an Error message, which informs the emulator that an overrun has occurred and which types of

trace messages have been lost. After this, transmission continues as normal.

Alternatively, the user can configure the OCD to halt the CPU to prevent overruns. This can be

done selectively for different message types, and is controlled by writing to the Overrun Control

(OVC) bits in the DC register.

9.3.3.7

Trace and reset

All pending trace messages in the Transmit Queue are flushed if: the OCD is reset by a system

reset; the OCD is disabled; or an application reset is triggered by writing to the DC:RES bit.

Thus, if the CPU is reset, but not the OCD, the program flow can be observed by program trace.

However, if the debugger resets the system, the remaining messages in the queue are of no

value, and expected to be flushed.


Note that if the OCD is disabled (by clearing DC:DBE or by a system reset), trace is suspended

until DC:DBE is written to one. The DC:TM bits must be written simultaneously, and define which

trace features should now be active.

Similarly, when an application reset is triggered by writing DC:RES, the DC:TM bits are written

simultaneously and define which trace features should now be active.

9.3.4

Messages

9.3.4.1

Error

The error message indicates various errors that can occur during trace or debugging. Table 9-21

lists the various errors that can be reported, along with the associated ECODE.

If trace messages are lost because of insufficient space in the Transmit Queue, an error

message is transmitted, followed by a synchronization message, as soon as space is available in the

Transmit Queue.

Table 9-20. Error

Indirect Branch Message with Sync

Direction: From target

Packet Size (bits)  Packet Name  Packet Type  Description
5                   ECODE        Fixed        Error code. Refer to Table 9-21.
6                   TCODE        Fixed        Value = 8

Table 9-21. Error codes

ECODE              Description
0b00000            Ownership trace overrun
0b00001            Program trace overrun
0b00010            Data trace overrun
0b00011 - 0b00101  Reserved
0b00110            Watchpoint overrun.
0b00111            Program and/or data and/or ownership trace overrun.
0b01000            Program trace and/or data and/or ownership trace and/or watchpoint overrun.
0b01001 - 0b11111  Reserved


9.3.5

Registers

9.3.5.1

Auxiliary Port Control Register (AXC)

Table 9-22 shows the description of the Auxiliary Port Control Register. This register allows

greater flexibility in controlling the operation of the AUX port than specified by the Nexus

standard. This includes enabling the AUX port, and controlling the speed of the clock and data

compared to the CPU clock.

Table 9-22. AUX Port Control Register

R/W  Bit Number  Field Name  Init. Val.  Description
R    31:14       Reserved    0           Reserved. These bits are reserved, and will always read as 0
R/W  13          REXTEN      0           This bit is reserved for internal test purposes and should be written to zero.
R/W  12          REX         0           This bit is reserved for internal test purposes and should be written to zero.
R/W  11          LS          0           LS - Low Speed. 0: AUX port runs at the same speed as the CPU. 1: AUX port runs at reduced speed compared to the CPU.
R/W  10          DDR         0           DDR - Double Data Rate. Setting this bit halves the MCKO rate so that MDO data must be sampled on both edges of MCKO. 1 = Double data rate mode. 0 = Single datarate mode.
R/W  9           AXS         0           AXS - Auxiliary Port Select. 0: AUX port is used for GPIO. 1: AUX port is used for Nexus operation. This bit does not need to be written in devices with dedicated AUX pins
R/W  8           AXE         0           AXE - Auxiliary Port Enable. 0: AUX port is tristated. 1: AUX port is used for Nexus operation.
R    7:4         Reserved                Reserved. These bits are reserved, and will always read as 0
R/W  3:0         DIV                     DIV - Division factor. If LS=1, the DIV value selects the frame rate of the AUX port.
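As a hedged illustration of the field layout in Table 9-22, the Python sketch below composes a value for the AXC register from its documented fields. The function name axc_value and its parameters are assumptions for this example. How the value is written to the register over JTAG is outside its scope.

```python
# Field positions from Table 9-22.
AXC_LS = 1 << 11    # Low Speed
AXC_DDR = 1 << 10   # Double Data Rate
AXC_AXS = 1 << 9    # Auxiliary Port Select
AXC_AXE = 1 << 8    # Auxiliary Port Enable


def axc_value(enable, nexus_pins=True, ddr=False, low_speed=False, div=0):
    """Build an AXC value. REXTEN and REX are left at zero as required."""
    if not 0 <= div <= 0xF:
        raise ValueError("DIV is a 4-bit field")
    value = div                      # DIV only takes effect when LS=1
    if low_speed:
        value |= AXC_LS
    if ddr:
        value |= AXC_DDR
    if nexus_pins:
        value |= AXC_AXS
    if enable:
        value |= AXC_AXE
    return value


# Enable the AUX port for Nexus at reduced speed, double datarate, DIV=3.
print(hex(axc_value(True, ddr=True, low_speed=True, div=3)))
```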


9.4

Breakpoints

9.4.1

Overview

The Nexus Recommended Register map supports up to 8 universal breakpoints. However since

the AVR32 AP hardware employs separate instruction and data memories, the OCD system

must also separate program and data breakpoints. Any breakpoint can also be programmed as

a watchpoint. The watchpoint will trigger a Watchpoint Hit message. The OCD system supports

up to six program breakpoint modules and two data breakpoint modules. In addition to this, the

data trace modules can also be used as data address watchpoints. The trace watchpoints result

in a vendor defined Trace Watchpoint Hit message.

Figure 9-6. Breakpoint modules.

[Block diagram: the CPU supplies the PC to the Program BP/WP module, the Data Address and Data Value to the Data BP/WP module, and the Data Address to the Trace BP/WP module.]


Figure 9-7. Breakpoint unit overview.

[Block diagram: PC Breakpoint Unit (Breakpoint Modules 0A, 0B, 1A, 1B, 2A, 2B), Data Breakpoint Unit (Breakpoint Modules 3A, 3B and Event Pair 3 with Address Range and Double Word), Trigger Unit (Start/Stop signals to the Program Trace Unit and the Data Trace Unit), Watchpoint Message Generator (messages to Transmit Queue) and Debug Optimization Unit (CPU control signals). Signal labels in the diagram: 6 PC Breakpoints, 6 PC Watchpoints, 5 PC Watchpoints, 2 Data Breakpoints, 2 Data Watchpoints, 2 Range Data Watchpoints.]

9.4.2

Breakpoint Unit description

The Breakpoint unit consists of the units shown in Figure 9-7. The PC Breakpoint Unit (PBU)

handles the program counter breakpoints. The PBU can have up to 6 PC breakpoint modules

that can match on a single PC. Two modules can be combined to give a match on a range of PC

values, thus up to three ranges can be defined. The PBU is configured with registers Breakpoint

/ Watchpoint Control (BWC) and Breakpoint / Watchpoint Address (BWA) 0A, 0B, 1A, 1B, 2A,

and 2B.

The Data Breakpoint Unit handles data breakpoints. The data breakpoints can be configured

with the BWC / BWA / BWD 3A and 3B registers, as well as EPC3.


The Watchpoint Message Generator (WMG) generates watchpoint messages for all breakpoint

modules and data trace watchpoints.

Optionally, a breakpoint or watchpoint can be signalled by a pulse on the EVTO pin. This

requires the DC:EOS bits to be set to 1 and the EOC bit in the corresponding Breakpoint/Watchpoint

Control Register to be written to one.

9.4.2.1

Program Breakpoints

In order to enable a simple program breakpoint the Breakpoint / Watchpoint Address (BWA) and

Breakpoint / Watchpoint Control (BWC) registers for that breakpoint must be updated.

The BWA register must be written with the address of the instruction where the debugger wants

to halt.

BWA operates on virtual addresses. In order to get a precise match on a virtual address if MMU

is enabled, Address Space Identifier (ASID) matching must be enabled in the BWC, and the

ASID must be written to the ASID field of the BWC. The ASID to match on will be read from the

current ASID in the MMU.

The BWC must have the Breakpoint / Watchpoint Enable (BWE) field set to breakpoint.

Program breakpoints break on the instruction pointed to by BWA. The instruction will cause a

debug exception and the Debug Mode Return Address Register (RAR_DBG) and Debug Mode

Return Status Register (RSR_DBG) will point to the instruction that caused the debug exception.

The Development Status register will also be updated to indicate which breakpoint caused the

exception. In OCD Mode the debug tool can then feed the CPU with debug code to ascertain the

state of the processor. In OCD Mode the breakpoint modules are disabled.

Upon return from Debug Mode, the PC and SR will be restored from the RAR_DBG and

RSR_DBG and the instruction that caused the debug exception will be fetched again. If the

program breakpoint has not been disabled in Debug Mode, the Ignore First Match (IFM) bit in the

Development Control (DC) register must be written to one to avoid triggering another breakpoint

on the first instruction after exiting Debug Mode. The IFM bit prevents any Program Breakpoint

operation on the first instruction after exiting Debug Mode.

The AME bit in the BWCxA registers can be used to enable bitwise address masking. When

AME is enabled the BWA register in the B module is used as a noninverting bitwise mask that is

applied to the PC and the value in the BWA register in the A module. The A breakpoint will thus

trigger when (PC & BWAnB) == (BWAnA & BWAnB). The B breakpoint will never trigger when AME is enabled.
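The masked compare can be illustrated with the function given for the AME bit in Table 9-27, (PC & BWA_B) == (BWA_A & BWA_B). The Python sketch below is only a software model of that comparison. The function name ame_match and the addresses used are hypothetical.

```python
def ame_match(pc, bwa_a, bwa_b):
    """Model of the A module compare when AME is enabled.

    bwa_b acts as a noninverting mask: only bits set in bwa_b take part
    in the comparison between pc and bwa_a.
    """
    return (pc & bwa_b) == (bwa_a & bwa_b)


# Hypothetical setup: match any PC inside the 4 KB page at 0x80001000.
BWA_A = 0x80001000
BWA_B = 0xFFFFF000
print(ame_match(0x80001234, BWA_A, BWA_B))  # inside the page
print(ame_match(0x80002000, BWA_A, BWA_B))  # outside the page
```

Because only power-of-two aligned blocks can be described with a mask, this is complementary to the range matching provided by the Event Pair Control register.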

9.4.2.2

Watchpoints

When enabled in the BWC, a watchpoint message is sent when the instruction address matches

the address stored in BWA. If both a Trace watchpoint and a Watchpoint trigger at the same

time, the Trace watchpoint will be ignored and only a Watchpoint Hit message will be generated.

Note that Program, Data, and Trace watchpoints are generated at different pipeline stages and

will not be synchronized when the messages are generated. A Program Watchpoint on a

load/store instruction will hit before a data watchpoint on the same instruction.


9.4.2.3

Data Breakpoints

Data Breakpoint modules listen on the data address and data value lines between the CPU and

the data cache and can halt the CPU, or send a watchpoint message, if the address and / or

value meets a stored compare value. Unlike program breakpoints, data breakpoints halt on the

next instruction after the load / store instruction that caused the breakpoint has completed.

The BWA register must be written with the address of the data the debugger wants to halt on.

BWA matches on virtual addresses. In order to get a precise match on a virtual address when

MMU is enabled, the Address Space Identifier (ASID) matching must be enabled in the BWC,

and the ASID must be written to the ASID field of the BWC. The ASID to match on will be read

from the current ASID in the MMU.

As shown in Figure 9-8 the data breakpoint modules snoop on the address and data lines

between the CPU and the data cache. This ensures that the data breakpoints only trigger on

actual load / store operations and not on prefetch or other automated cache related accesses.

This is required for data breakpoints to be consistent with the CPU's view of the data memory. It

does, however, have the effect that a write to cached memory will trigger before the write has

been flushed to memory. Uncached writes will trigger as they are written to the memory.

Data breakpoints are not available in Monitor Mode.

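Figure 9-8. Data breakpoint interface.

[Block diagram: the Data Breakpoints module taps the Data Address and Data lines between the CPU and the Data Cache. The Data Cache connects to Memory over the System Bus.]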

9.4.2.4

Data alignment

The AVR32 can read or write data in bytes, halfwords, or words. The same data location can be

accessed through either operation, e.g. a byte location can be accessed as part of a double

word. The data bus operations seen by the OCD system are always aligned, i.e. halfwords start

on halfword boundaries, word accesses start on word boundaries, as illustrated in Figure 9-9. If

the data bus operation is a double word load / store, the breakpoint module will see the word

data value which corresponds to the address in BWA.


One data breakpoint module can only compare 32 bits of data. The data to be matched can

therefore not cross a word boundary if the data breakpoint is to match correctly. When the

debugger wants to match on a byte or halfword, the BWD register must be written with the LSB

aligned, and the BWC:BME bits must be set to mask the upper bits of the BWD register.

For example, if the debugger wants to match against Byte 1 in Figure 9-9, the BWA must be set

to the byte address of Byte 1 and the BWD written with the value to match on aligned to LSB.

Also the BWC:BME must be set to mask the 24 most significant bits of the BWD register (BME =

0xE).
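The byte mask in this example can be checked with a short calculation. The Python sketch below expands a BME value into the 32-bit pattern of ignored bits, using the bit assignments of the BME field (1XXX masks bits 31:24, down to XXX1 masking bits 7:0), and models the masked compare. It assumes that a masked bit is excluded from the comparison. The function names and the bus values are hypothetical.

```python
def bme_to_bitmask(bme):
    """Expand the 4-bit BME field into a 32-bit mask of ignored bits."""
    mask = 0
    for lane in range(4):              # lane 0 = bits 7:0, lane 3 = bits 31:24
        if (bme >> lane) & 1:
            mask |= 0xFF << (8 * lane)
    return mask


def data_value_match(bus_value, bwd, bme):
    """Compare only the bits of BWD that are not masked by BME."""
    compared = ~bme_to_bitmask(bme) & 0xFFFFFFFF
    return (bus_value & compared) == (bwd & compared)


# Byte compare: BME = 0xE masks the 24 most significant bits of BWD.
print(hex(bme_to_bitmask(0xE)))
print(data_value_match(0xDEADBE5A, 0x0000005A, 0xE))
```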

By default, the data breakpoint module will match on the data value regardless of the size of the

access. The data BWC can also be set to match on a specific access size if the SIZE bits are

set. The debugger can for example, set the breakpoint module to match only on byte writes to

byte 1 in Figure 9-9. The BWD register must still be aligned correctly, and the byte mask must be

set, but the data breakpoint will only trigger if a single byte is written to byte 1 and not if, for

example, a whole word is written to byte 0, 1, 2, and 3.

The OCD system can also break on a 64-bit value if both data breakpoint modules are combined

as shown in Figure 9-10. Setting the Double Word Enable (DWE) bit in EPC3 will cause

breakpoint module A to trigger only if both breakpoint modules A and B match. The debugger can

then set the BWA3A and BWD3A to the least significant word in the double word and the

BWA3B and BWD3B to the most significant word of the double word. BWC3A:BWE control the

breakpoint operation of the combined breakpoint, while BWC3B:BWE is disregarded.

For example, to set a breakpoint when the value 0x0123456789ABCDEF is written to

address 0x800C, the registers must be configured as follows:

• EPC3 = (DWE)

• BWC3A = (BWE | BRW | BWO*3 | SIZE*7 )

• BWC3B = (BWE | BRW | BWO*3 | SIZE*7 )

• BWA3A = 0x8010

• BWA3B = 0x800C

• BWD3A = 0x89ABCDEF

• BWD3B = 0x01234567
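The address and data values in the list above follow from splitting the double word into its two words, with the most significant word at the lower address. A minimal Python sketch of that split is given below. The function name dword_breakpoint is an assumption, and the BWC3x and EPC3 settings are not modelled.

```python
def dword_breakpoint(addr, value):
    """Split a 64-bit compare into the A (least significant word) and
    B (most significant word) register values."""
    return {
        "BWA3A": addr + 4,                    # least significant word
        "BWA3B": addr,                        # most significant word
        "BWD3A": value & 0xFFFFFFFF,
        "BWD3B": (value >> 32) & 0xFFFFFFFF,
    }


for name, reg in dword_breakpoint(0x800C, 0x0123456789ABCDEF).items():
    print(f"{name} = 0x{reg:08X}")
```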

Figure 9-9. Memory access data alignment.

[Memory map of word addresses 0x8000, 0x8004, 0x8008, 0x800C and 0x8010 with byte lanes 3 to 0, showing how a Word, Halfword 0 and Halfword 1, Byte 0 to Byte 3, and a Double word (bytes 0 to 3 and bytes 4 to 7) are placed.]


Figure 9-10. Data breakpoint alignment.

[Diagram showing, from MSB to LSB, how a Word, Halfword 0 and Halfword 1, and Byte 0 to Byte 3 align to a single Data Breakpoint Module, and how a Doubleword at addresses 0x800C and 0x8010 is split between Data Breakpoint Module B and Data Breakpoint Module A.]

9.4.2.5

Unaligned accesses

The CPU supports unaligned accesses by breaking the operation down into multiple aligned

accesses. This means that one ld/st instruction from the CPU can be seen as many sequential

operations on the databus, depending on the alignment of the data to be accessed:

• Unaligned double word load / store is seen as a sequence of word loads / stores.

• Unaligned store word is seen as a sequence of stores which may have different sizes, e.g.

st.b, st.h, st.b for a byte-aligned st.w

• Unaligned load word is always done as two load words.

9.4.3

Advanced features

9.4.3.1

Ranges

It is possible to combine both data breakpoint modules to break on a range of data addresses.

Range is enabled using the EPC3 register. Whenever a data breakpoint range is used data

value matching should be disabled.

The debugger can set up an event pair to give a breakpoint or watchpoint on a range of

instruction addresses. The event pair will then configure the A and B address comparator to give a

match if the instruction address is less than or equal to the BWA. Using the Range (RNG) bits of

the Event Pair Control (EPC) register either inclusive or exclusive ranges can then be set up as

shown in Table 9-23.

When ranges are enabled, the BWC:BWE of module A will control the ranged breakpoint, and

module B will be disabled.

Table 9-23. Range settings.

EPC3: RNG  Resulting Range
10         BWA3A < ADDR <= BWA3B
01         ADDR <= BWA3A or ADDR > BWA3B
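The two range settings of Table 9-23 can be modelled directly. The Python sketch below is a software illustration only. The function name range_match and the addresses are hypothetical, and the disabled (00) and reserved (11) settings are rejected rather than modelled.

```python
def range_match(addr, bwa3a, bwa3b, rng):
    """Evaluate the ranged compare for EPC3:RNG = 0b10 or 0b01."""
    if rng == 0b10:                    # inclusive range
        return bwa3a < addr <= bwa3b
    if rng == 0b01:                    # exclusive range
        return addr <= bwa3a or addr > bwa3b
    raise ValueError("RNG must be 0b10 or 0b01 for a ranged compare")


A, B = 0x8000, 0x8010
for addr in (0x8000, 0x8004, 0x8010, 0x8014):
    print(hex(addr), range_match(addr, A, B, 0b10), range_match(addr, A, B, 0b01))
```

Note that the lower bound BWA3A itself falls outside the inclusive range, while the upper bound BWA3B falls inside it.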

9.4.4

Triggering trace

A watchpoint from the program or data breakpoint modules can be used to start or stop program

or data trace. This is done using a trigger unit. The trigger unit can be configured using the

watchpoint trigger register. When the trigger unit is set to start trace upon a watchpoint, DC:TM

will be set accordingly, and trace will then be enabled. If a data watchpoint enables data trace,

the data event is not included in the data trace output, while an event which disables data trace

is included in the data trace output.


9.4.5

Messages

9.4.5.1

Watchpoint Hit (WH)

Table 9-24. Watchpoint Hit

Watchpoint Message

Direction: From target

Packet Size  Packet Name  Packet Type  Description
8            WPHIT        Fixed        XXXXXXX1 = Watchpoint 0 matched; XXXXXX1X = Watchpoint 1 matched; ...; X1XXXXXX = Watchpoint 6 matched; 1XXXXXXX = Watchpoint 7 matched
6            TCODE        Fixed        Value = 15

9.4.5.2

Trace Watchpoint Hit (TWH)

Table 9-25. Trace Watchpoint Hit

Trace Watchpoint Message

Direction: From target

Packet Size  Packet Name  Packet Type  Description
2            WPHIT        Fixed        X1 = Watchpoint 0 matched; 1X = Watchpoint 1 matched
6            TCODE        Fixed        Value = 56
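A trace tool receiving either message could turn the WPHIT packet into a list of watchpoint numbers. The Python sketch below assumes that bit n of WPHIT corresponds to watchpoint n, as listed in Tables 9-24 and 9-25. The function name decode_wphit is an assumption for this example.

```python
def decode_wphit(wphit, width=8):
    """Return the watchpoint numbers whose bit is set in WPHIT.

    width is 8 for Watchpoint Hit and 2 for Trace Watchpoint Hit.
    """
    return [n for n in range(width) if (wphit >> n) & 1]


print(decode_wphit(0b01000001))        # Watchpoint Hit packet
print(decode_wphit(0b10, width=2))     # Trace Watchpoint Hit packet
```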

9.4.6

Registers

9.4.6.1

PC Breakpoint/Watchpoint Address registers (BWA0A, BWA0B, BWA1A, BWA1B, BWA2A, BWA2B)

The 6 BWA registers contain one instruction address each. The address can be used for a

single breakpoint match or used as a bitwise mask to create a range.

Table 9-26. PC BWAnx Register

R/W  Bit Number  Field Name  Init. Val.  Description
R/W  31:0        BWA         0           Breakpoint/Watchpoint Address


9.4.6.2

PC Breakpoint/Watchpoint Control registers - (BWC0A, BWC0B, BWC1A, BWC1B, BWC2A, BWC2B)

Table 9-27. PC BWCnx Register

R/W | Bit Number | Field Name | Init. Val. | Description
RW | 31:30 | BWE | 00 | BWE - Breakpoint / Watchpoint Enable. 00 = Disabled; 01 = Breakpoint enabled; 10 = Reserved; 11 = Watchpoint enabled
R | 29:26 | Reserved | 0 | Reserved
RW | 25 | AME | 0 | AME - Address Mask Enable. This bit is only present in BWCxA registers. 0 = Disabled. 1 = Enabled. BWAxB will be used to bitwise mask the PC compare according to this function: BP A: (PC & BWA_B) == (BWA_A & BWA_B); BP B: Will never trigger
R | 24:15 | Reserved | 0 | Reserved
RW | 14 | EOC | 0 | EOC - EVTO Control. 0 = Breakpoint/watchpoint status indication not output on EVTO; 1 = Breakpoint/watchpoint status indication is output on EVTO
R | 13:9 | Reserved | 0 | Reserved
RW | 8:1 | ASID | 0x00 | ASID - ASID to match. The 8 bit ASID to match when ASID matching is enabled.
RW | 0 | ASIDEN | 0 | ASIDEN - ASID match enable. 0 = Disabled. 1 = Enabled. The breakpoint module will only give a match if the ASID also matches.
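As an illustration of the AME compare function and the BWCnx bit positions listed above, the sketch below models the PC match in software and packs a control word. The function names are hypothetical, and the exact-compare behaviour when AME is 0 is an assumption based on the "single breakpoint match" wording in 9.4.6.1.

```python
def pc_match(pc, bwa_a, bwa_b, ame):
    """Model of the BP A compare for one PC breakpoint pair."""
    if ame:
        # Masked compare from the AME description:
        # (PC & BWA_B) == (BWA_A & BWA_B)
        return (pc & bwa_b) == (bwa_a & bwa_b)
    # Assumption: with AME disabled, BWAxA is compared exactly.
    return pc == bwa_a


def pack_bwc(bwe, ame=0, eoc=0, asid=0, asiden=0):
    """Pack BWCnx fields: BWE[31:30], AME[25], EOC[14], ASID[8:1], ASIDEN[0]."""
    return ((bwe & 0x3) << 30 | (ame & 0x1) << 25 | (eoc & 0x1) << 14
            | (asid & 0xFF) << 1 | (asiden & 0x1))


# Breakpoint on any PC inside the 4 KB block starting at 0x80001000.
print(pc_match(0x80001234, 0x80001000, 0xFFFFF000, ame=1))
print(hex(pack_bwc(bwe=0b01, ame=1)))
```

Bits that are 0 in BWAxB are ignored by the compare, so a mask of 0xFFFFF000 matches every PC that shares the upper 20 bits with BWAxA.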


9.4.6.3

Event Pair Control 3 (EPC3)

Table 9-28. Data Event Pair Control (EPC3) Register

R/W | Bit Number | Field Name | Init. Val. | Description
R | 31:3 | - | - | Reserved
RW | 2 | DWE | 0 | DWE - Enable combined Double Word value compare. 0 = Disabled (Default); 1 = Combine two event units to do 64 bit compare. If enabled, both data breakpoint module units will be combined to do a single 64 bit value compare.
RW | 1:0 | RNG | 0b00 | RNG - Range Enable. 00: disabled; 01: Exclusive range (PC <= Even or PC > Odd); 10: Inclusive range (Even < PC <= Odd); 11: Reserved

9.4.6.4

Data Breakpoint / Watchpoint Address (BWA3A, BWA3B)

Table 9-29. Data Breakpoint/Watchpoint address (BWA3x) register

R/W | Bit Number | Field Name | Init. Val. | Description
RW | 31:0 | BWA | 0x00000000 | Address of data for breakpoint or watchpoint generation.

9.4.6.5

Data Breakpoint / Watchpoint Data (BWD3A, BWD3B)

Table 9-30. Data Breakpoint/Watchpoint data (BWD3x) register

R/W | Bit Number | Field Name | Init. Val. | Description
RW | 31:0 | BWD | 0x00000000 | Data value for breakpoint or watchpoint generation.


9.4.6.6

Data Breakpoint / Watchpoint Control (BWC3A, BWC3B)

Table 9-31. Data Breakpoint / Watchpoint Control (BWC3x)

R/W | Bit Number | Field Name | Init. Val. | Description
RW | 31:30 | BWE | 00 | BWE - Breakpoint / Watchpoint Enable. 00 = Disabled; 01 = Breakpoint enabled; 10 = Reserved; 11 = Watchpoint enabled
RW | 29:28 | BRW | 00 | BRW - Breakpoint/Watchpoint Read/Write Select. 00 = Break on read access; 01 = Break on write access; 10 = Break on any access; 11 = Reserved
R | 27:24 | Reserved | 00 | Reserved
RW | 23:20 | BME | 0x0 | BME - Breakpoint/Watchpoint Data Mask. 1XXX = Mask bits 31:24 in BWD; X1XX = Mask bits 23:16 in BWD; XX1X = Mask bits 15:8 in BWD; XXX1 = Mask bits 7:0 in BWD
R | 19:18 | Reserved | 00 | Reserved
RW | 17:16 | BWO | 000 | BWO - Breakpoint/Watchpoint Operand. 1X = Compare with BWA value; X1 = Compare with BWD value
R | 15:12 | Reserved | 0 | Reserved
R/W | 11:9 | SIZE | 000 | SIZE - Size bits to match. 0xx = Disregard access size (Default); 100 = Byte access; 101 = Halfword access; 110 = Word access; 111 = Reserved
RW | 8:1 | ASID | 0x00 | ASID - ASID to match. The 8 bit ASID to match when ASID matching is enabled.
RW | 0 | ASIDEN | 0 | ASIDEN - ASID match enable. 0 = Disabled. 1 = Enabled. The breakpoint module will only give a match if the ASID also matches.
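To show how the BWC3x fields combine into one 32-bit word, here is a small sketch that packs named fields using the bit positions from Table 9-31. The helper `pack_bwc3` and the field dictionary are illustrative assumptions, not an existing API.

```python
# (shift, width) for each BWC3x field, taken from Table 9-31.
BWC3_FIELDS = {
    "BWE": (30, 2),
    "BRW": (28, 2),
    "BME": (20, 4),
    "BWO": (16, 2),
    "SIZE": (9, 3),
    "ASID": (1, 8),
    "ASIDEN": (0, 1),
}


def pack_bwc3(**fields):
    """Pack named field values into a BWC3x register value."""
    value = 0
    for name, field_value in fields.items():
        shift, width = BWC3_FIELDS[name]
        if field_value < 0 or field_value >> width:
            raise ValueError("%s does not fit in %d bits" % (name, width))
        value |= field_value << shift
    return value


# Watchpoint (BWE=11) on word-sized (SIZE=110) write accesses (BRW=01),
# comparing against the address in BWA3x (BWO=1X).
print(hex(pack_bwc3(BWE=0b11, BRW=0b01, BWO=0b10, SIZE=0b110)))
```

Reserved bits are left at zero, which matches their listed initial values.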


9.4.6.7

Watchpoint Trigger

Table 9-32. WT, Watchpoint Trigger Register

R/W | Bit Number | Field Name | Init. Val. | Description
R/W | 31:29 | PTS | 000 | PTS - Program Trace Start. 000 = Trigger disabled; 001 = Program watchpoint 0b; 010 = Program watchpoint 1a; 011 = Program watchpoint 1b; 100 = Program watchpoint 2a; 101 = Program watchpoint 2b; 110 = Data watchpoint 3a; 111 = Data watchpoint 3b
R/W | 28:26 | PTE | 000 | PTE - Program Trace End. 000 = Trigger disabled; 001 <-> 111 Watchpoint selected as for PTS
R/W | 25:23 | DTS | 000 | DTS - Data Trace Start. 000 = Trigger disabled; 001 <-> 111 Watchpoint selected as for PTS
R/W | 22:20 | DTE | 000 | DTE - Data Trace End. 000 = Trigger disabled; 001 <-> 111 Watchpoint selected as for PTS
R | 19:0 | - | - | Reserved
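The four trigger fields of WT share one selector encoding, so a tool can build the register value from watchpoint names. The sketch below assumes the encoding exactly as listed in Table 9-32; the names `WP_SELECT` and `pack_wt` are hypothetical.

```python
# Selector codes as listed for PTS; None means "trigger disabled".
WP_SELECT = {None: 0b000, "0b": 0b001, "1a": 0b010, "1b": 0b011,
             "2a": 0b100, "2b": 0b101, "3a": 0b110, "3b": 0b111}


def pack_wt(pts=None, pte=None, dts=None, dte=None):
    """Pack WT: PTS[31:29], PTE[28:26], DTS[25:23], DTE[22:20]."""
    return (WP_SELECT[pts] << 29 | WP_SELECT[pte] << 26
            | WP_SELECT[dts] << 23 | WP_SELECT[dte] << 20)


# Start program trace on program watchpoint 1a, stop it on 2b.
print(hex(pack_wt(pts="1a", pte="2b")))
```

Note that the table lists no selector code for program watchpoint 0a, so the sketch does not offer one.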


9.5

Program trace

9.5.1

Program trace overview

The AVR32 OCD system provides program trace support via the debug port. The program trace

feature implements a Program Flow Change Model in which the program trace is synchronized

at each program flow discontinuity. This occurs at taken indirect branches and exceptions. A

record of taken / not taken direct branches is included so that the complete program flow can be

decoded.

The development tool can then interpolate what transpires between each program trace message by correlating information from branch target messaging and static source or object code files. Self-modifying code cannot be traced with the Program Flow Change Model because the source code is not static.

The TM[2] bit in the Development Control register must be set to enable program trace.

9.5.1.1

Branch message summary

Five types of branch messages can be generated:

1. Program Trace, Indirect Branch is transmitted on most subroutine calls, returns, interrupts, exceptions, and any situation where the target address of a branch cannot be determined from the source code. This message contains the instruction count to identify the branch and the target PC to identify the branch target.

2. Program Trace Synchronization is transmitted to indicate the current PC after starting trace or after trace synchronization is lost.

3. Program Trace, Indirect Branch messages with sync contain both instruction count and PC, and are transmitted instead of a Program Trace Synchronization message if a synchronization condition occurs and the current instruction is a taken direct/indirect branch.

4. Program Trace, Resource Full messages are transmitted when an internal buffer overflows. ICNT is transmitted with this message whenever it overflows.

5. Program Trace Correlation. This message is transmitted to synchronize the program trace with an event. It is sent when trace is disabled, debug mode is entered or sleep mode is entered.

The Nexus standard also specifies Program Trace Correction messages to correct for speculatively transmitted trace messages, but these are not implemented in the AVR32, since program trace messages are only transmitted for actually executed instructions. Similarly, the Nexus-specified CANCEL packet of synchronized branch messages is not implemented in AVR32.

Entry into Debug Mode will generate a Program Trace Correlation message, and no trace messages are generated while executing in Debug Mode. A Program Trace Synchronization message is transmitted when Debug Mode is exited.

9.5.2

Branch message packets

The program trace messages contain packets which identify the address of the taken branch,

the target of the branch, and the current program counter value. These packets are discussed

below.


9.5.2.1

Instruction count packet

In several of the program trace messages, an Instruction Count (I-CNT) packet is included, to identify the number of sequentially executed instruction units since the last program trace message. In AVR32, this figure refers to bytes, i.e. compact instructions count as two bytes and extended instructions count as four bytes.

The following rules apply to instruction counts:

• A taken indirect branch which generates a trace message is not included in the instruction

count.

• An indirect branch which is not taken is included in the instruction count.

• Speculatively fetched instructions are not counted until they are actually executed.

• The instruction counter is reset every time a program trace message is generated.
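The counting rules above can be illustrated with a small software model. This is a simplified sketch under stated assumptions: each executed instruction is given as a (size in bytes, generates a trace message) pair, only message-generating taken branches are excluded from the count, and overflow of the 8-bit counter is not modelled. The function name is hypothetical.

```python
def icnt_at_messages(instructions):
    """Return the I-CNT value carried by each program trace message.

    instructions: iterable of (size_in_bytes, generates_message) pairs for
    executed instructions only (speculative fetches are never listed).
    """
    counts = []
    icnt = 0
    for size, generates_message in instructions:
        if generates_message:
            counts.append(icnt)  # the branch itself is not counted
            icnt = 0             # counter is reset at every message
        else:
            icnt += size         # 2 for compact, 4 for extended
    return counts


# compact, extended, compact, then a taken indirect branch;
# then one compact instruction and another taken indirect branch.
trace = [(2, False), (4, False), (2, False), (4, True), (2, False), (2, True)]
print(icnt_at_messages(trace))
```

An indirect branch that is not taken would simply appear in the list with `generates_message` set to False, so its size is added to the count.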

9.5.2.2

Compressed program counter packets

To save bandwidth, the Nexus messages employ compressed versions of the program counter

address. These include:

U-ADDR = StripLeadingZeros (previously sent address xor uncompressed address from pipeline).

F-ADDR = Full target address for a taken branch. Leading zeroes may be truncated.
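A short sketch of the U-ADDR scheme may help: because XOR is its own inverse, the tool recovers the full address by XORing the received U-ADDR with the last address it reconstructed. The function names are illustrative, and `significant_bits` only counts the bits left after leading zeros are stripped; it does not model how the packet is laid out on the port.

```python
def compress_uaddr(previous_address, address):
    """Target side: U-ADDR is the XOR of the previously sent and new address."""
    return previous_address ^ address


def decompress_uaddr(previous_address, uaddr):
    """Tool side: XOR again with the previous address to recover the new one."""
    return previous_address ^ uaddr


def significant_bits(value):
    """Number of bits remaining once leading zeros are stripped (at least 1)."""
    return max(1, value.bit_length())


previous = 0x80001234
target = 0x80001260
uaddr = compress_uaddr(previous, target)
print(hex(uaddr), significant_bits(uaddr))
print(hex(decompress_uaddr(previous, uaddr)))
```

Nearby branch targets share their upper address bits, so the XOR result is small and most of the 32 bits never need to be sent.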

9.5.3

Special cases

9.5.3.1

Debug Mode

When entering Debug Mode, a PTC message is generated with EVCODE = 0.

When exiting Debug Mode, a PTSY message is generated. If the instruction also generates a branch message, the branch message with sync (i.e. PTDBS or PTIBS) is generated instead of PTSY. In this case, the address of the instruction which generated the branch message cannot be explicitly reconstructed from the trace log, but the debugger will normally know which address was returned to when Debug Mode was exited.

If a breakpoint occurs on the first instruction after exiting Debug Mode, a PTC message with

EVCODE = 0 is generated.

9.5.4

Messages

9.5.4.1

Program Trace, Direct Branch

This message is output by the target processor whenever there is a change of program flow

caused by a conditional or unconditional branch. The instruction count (I-CNT) is included to

identify the branch address. The following AVR32 instructions can cause a direct branch:

Table 9-33. Direct branch instructions

Mnemonic

br{cond3}

br{cond4}

rjmp

Description

Compact

Extended

Compact

Branch if condition satisfied.


Table 9-34. Direct Branch message without sync

Direct Branch Message. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
8 | I-CNT | Variable | Number of bytes executed since the last taken branch.
6 | TCODE | Fixed | Value = 3

9.5.4.2

Program Trace, Direct Branch with Target Address

This message is transmitted instead of the Direct Branch message when SQA enhanced pro-

gram trace is enabled by writing DC:SQA to one. This simplifies real-time PC reconstruction in

the emulator for real-time code coverage and performance analysis purposes.

Table 9-35. Direct Branch message with Target Address

Direct Branch Message with Target Address. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
32 | U-ADDR | Variable | The unique portion of the branch target address for a taken indirect branch or exception. Most significant bits that have a value of 0 are truncated.
8 | I-CNT | Variable | Number of bytes executed since the last taken branch.
6 | TCODE | Fixed | Value = 57

9.5.4.3

Program Trace, Indirect Branch

An indirect branch is output by the target processor whenever there is a change of program flow

caused by a subroutine call, return instruction, interrupt, or exception.

Messages for taken indirect branches and exceptions include how many sequential bytes were executed since the last taken branch or exception, and the unique portion of the branch target address or exception vector address. The unique portion of the branch is found by taking the exclusive OR of the branch target and the last sent U-ADDR / F-ADDR. Additionally, the cause of the indirect branch is identified through an Event ID packet. Operations causing indirect branches and their corresponding EVT-ID are shown below.

Table 9-36. Operations causing indirect branch messages

Description | Operation | EVT-ID
Exception entry | Exception, interrupts (0 to 3), NMI, entry to Debug Mode | 3
Subroutine call | acall, icall, mcall, jcall, scall, rcall instruction | 2
Branch via register contents | Any mov (except mov pc, lr) or load (except popm/ldm) with PC as destination. Any arithmetic instruction with PC as destination. | 1
Return | ret{cond4}, rete, rets, retj, (mov pc, lr), popm/ldm loading PC | 0

Note that subroutine returns are often accomplished by a mov pc, lr, popm or ldm instruction with PC included in the argument list. This generates an EVT-ID of 0 instead of 1.


Table 9-37. Indirect branch message without sync

Indirect Branch Message. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
32 | U-ADDR | Variable | The unique portion of the branch target address for a taken indirect branch or exception. Most significant bits that have a value of 0 are truncated.
8 | I-CNT | Variable | Number of bytes executed since the last taken branch.
2 | EVT-ID | Fixed | Cause of indirect branch: 3: Exception entry; 2: Call; 1: Branch via register contents; 0: Return
6 | TCODE | Fixed | Value = 4

9.5.4.4

Program Trace Synchronization

This message is output by the PTU when any of the following conditions occurs:

1. Upon exit from reset. This is required to allow the number of instruction units executed

packet in a subsequent Program Trace Message to be correctly interpreted by the tool.

2. When program trace is enabled during normal execution of the embedded processor.

3. Upon exit from a power-down state. This is required to allow the number of instruction

units executed packet in a subsequent Program Trace Message to be correctly inter-

preted by the tool.

4. Upon exiting from Debug Mode.

5. An overrun condition had previously occurred in which one or more branch trace occurrences were discarded by the target processor’s debug logic. To inform the tool that an overrun condition occurred, the target outputs an Error Message (TCODE = 8) with an ECODE value of 00001 or 00111 immediately prior to the Program Trace Synchronization Message.

6. A debug control register field specifies that EVTI pin action is to generate program trace

synchronization, and the Event-In (EVTI) pin has been asserted.

7. Upon overflow of the sequential instruction unit counter.

8. After 256 branch messages without sync.

Table 9-38. Program Trace Synchronization Message

Program Trace Sync Message. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
32 | PC | Variable | The full current instruction address. Most significant bits that have a value of 0 are truncated.
8 | I-CNT | Variable | Number of bytes executed since the last taken branch.
6 | TCODE | Fixed | Value = 9


9.5.4.5

Program Trace, Direct Branch with Sync

If a Program Trace Synchronization message occurs on an instruction which transmits a direct

branch message, the Direct Branch with Sync message is transmitted instead of the Program

Trace Synchronization message. The Direct Branch with Sync message contains the instruction

count referring to the taken branch, as well as the complete PC value of the branch target.

The format for direct branch messages with sync is shown below. The AVR32 OCD system

never issues speculative branch messages and there is therefore no CANCEL packet.

Table 9-39. Direct Branch message with Sync

Direct Branch Message with Sync. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
32 | F-ADDR | Variable | The full target address for a taken direct branch. Most significant bits that have a value of 0 are truncated.
8 | I-CNT | Variable | Number of bytes executed since the last taken branch.
6 | TCODE | Fixed | Value = 11

9.5.4.6

Program Trace, Indirect Branch with Sync

If a Program Trace Synchronization message occurs on an instruction which transmits an indi-

rect branch message, the Indirect Branch with Sync message is transmitted instead of the

Program Trace Synchronization message. The Indirect Branch with Sync message contains the

instruction count referring to the taken branch, as well as the complete PC value of the branch

target.

The format for indirect branch messages with sync is shown below. The AVR32 OCD system

never issues speculative branch messages and there is therefore no CANCEL packet.

Table 9-40. Indirect Branch message with Sync

Indirect Branch Message with Sync. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
32 | F-ADDR | Variable | The full target address for a taken indirect branch. Most significant bits that have a value of 0 may be truncated.
8 | I-CNT | Variable | Number of bytes executed since the last taken branch.
2 | EVT-ID | Fixed | Cause of indirect branch: 3: Exception entry; 2: Call; 1: Branch via register contents; 0: Return
6 | TCODE | Fixed | Value = 12

9.5.4.7

Program Trace, Resource Full

This message is output whenever an internal resource (sequential instruction counter) has reached its maximum value. To avoid losing information when this resource becomes full, the Resource Full message is transmitted. The information from this message is added to information from subsequent messages to interpret the full picture of what has transpired. Multiple Resource Full messages can occur before the arrival of the message that the information belongs with.

Table 9-41. Resource Full message

Program Trace, Resource Full. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
8 | RDATA | Variable | Number of bytes executed since the last taken branch.
4 | RCODE | Fixed | Resource Code. This code indicates which internal resource has reached its maximum value. Refer to Table 9-42 for details.
6 | TCODE | Fixed | Value = 27

Table 9-42. Resource Code (RCODE) description

Resource Code | Resource | Data Packet Value
0b0000 | Program Trace - Sequential Instruction Counter | Number of instruction units executed since the last taken branch.
0b0001 - 0b1111 | Reserved |

9.5.4.8

Program Trace Correlation

Program Trace Correlation messages are used to correlate events to the program flow that may

not be associated with the instruction stream (e.g. Data Trace Messages). The occurrence of an

event listed in Table 9-43 will cause this message to be transmitted.

Table 9-43. Program Trace Correlation message

Program Trace Correlation. Direction: From target

Packet Size (bits) | Packet Name | Packet Type | Description
8 | I-CNT | Variable | Number of instruction units executed since the last taken branch.
4 | EVCODE | Fixed | Event Code. Refer to Table 9-44.
6 | TCODE | Fixed | Value = 33

Table 9-44. Event Code (EVCODE) description

Event Code (EVCODE) | Event Description
0b0000 | Entry into Debug Mode
0b0001 | Entry into Low Power Mode
0b0010 - 0b0011 | Reserved
0b0100 | Program Trace Disabled
0b0101 - 0b1111 | Reserved

9.5.5

Registers

Program trace is enabled using the TM field in the Development Control register.

9.6

Data Trace

9.6.1

Overview

The AVR32 OCD system provides data trace via the AUX port. The CPU data memory accesses

can be monitored real-time using the Nexus class 3 compliant Data Trace Unit. Both reads and

writes can be traced. Information is traced between the CPU and data cache, which gives imme-

diate access to modified data for cached memory accesses. This provides a direct

correspondence between the CPU program and traced data, even if there may be a delay

before the written cache data is actually flushed to the data memory.

Data Trace information is transmitted through data trace messages, which can be of read or

write type, with or without sync. The messages contain information about the data address and

value which triggered the trace. Data addresses can be complete (with sync), or compressed

relative to the previous transmitted message (without sync). The value contains the data value

read or written from the data cache, and is of the same width as the access size (byte, halfword,

word, or doubleword).

The TM[1] bit in the Development Control register must be set to enable data trace. It is also

possible to trigger data trace using watchpoints. In this case, TM[1] will be set or cleared

automatically.

9.6.2

Using data trace channels as watchpoints

Data Trace is enabled for address ranges (trace channels) specified by pairs of Data Trace Start

and End Address registers (DTSA/DTEA). Each data access within that boundary will generate

an action as specified by the corresponding bits in the Data Trace Control register (DTC). The

AVR32 OCD system currently supports two data trace channels.

While each channel can be used to trigger data trace messages, it is also possible to trigger

watchpoint messages, providing flexibility when using the OCD system. Watchpoints can be

ranged, i.e. trigger on all accesses between DTSA through DTEA, or trigger on a single location,

if DTSA and DTEA are written to the same value.

Writing TnWP to one enables a watchpoint on accesses for data trace channel n. The watch-

point message is sent as a vendor defined trace watchpoint message.

It is possible to enable both trace and watchpoint on the same channel, but typically, only one of

the options will be used.

9.6.3

Messages

The Trace Watchpoint Hit message is described in Section 9.4.5.2.

9.6.3.1

Data Trace, Data Write (DTDW)

This message is output by the target processor when it detects a memory write that matches the

OCD system’s data trace attributes.

Table 9-45. Data Trace, Data Write message

Data Trace, Data Write message. Direction: From target

Packet Size | Packet Name | Packet Type | Description
8 / 16 / 32 | DATA | Variable | The data value written. The size will vary depending on the load / store instruction being traced.
32 | U-ADDR | Variable | The unique portion of the data write address, which is relative to the previous Data Trace Message (read or write).
2 | DSZ | Fixed | Data size: 00 = 8 bits; 01 = 16 bits; 10 = 32 bits
6 | TCODE | Fixed | Value = 5

9.6.3.2

Data Trace, Data Write with Sync (DTDWS)

This message is an alternative to the Data Trace, Data Write Message. It is output instead of a

Data Trace, Data Write Message whenever a memory write occurs that matches the debug

logic’s data trace attributes, and when one of the following conditions has occurred:

1. The processor has exited from reset. This synchronization message is required to allow

the unique portion of the data write address of following Data Trace, Data Write Mes-

sages to be correctly interpreted by the tool.

2. When data trace is enabled during normal execution of the embedded processor.

3. Upon exit from a power-down state. This synchronization message is required to allow

the unique portion of the data write address of following Data Trace, Data Write Mes-

sages to be correctly interpreted by the tool.

4. The Event-In pin has been asserted and a debug control register field specifies that

EVTI pin action is to generate data trace synchronization.

5. An overrun condition had previously occurred in which one or more data trace occurrences were discarded by the target processor’s debug logic. To inform the tool that an overrun condition occurred, the target outputs an Error Message (TCODE = 8) with an ECODE value of 00010 or 00111 immediately prior to the Data Trace, Data Write with Sync Message.

6. The Data Trace Message counter has expired indicating that at most 256 without-sync

versions of Data Trace Messages have been sent since the last with-sync version.

7. A data write is detected following the processor exiting from Debug Mode.


Table 9-46. Data Trace, Data Write with Sync message

Data Trace, Data Write with Sync message. Direction: From target

Packet Size | Packet Name | Packet Type | Description
8 / 16 / 32 | DATA | Variable | The data value written. The size will vary depending on the load / store instruction being traced.
32 | F-ADDR | Variable | The full address of the memory location written. Most significant bits that have a value of 0 are truncated.
2 | DSZ | Fixed | Data size: 00 = 8 bits; 01 = 16 bits; 10 = 32 bits
6 | TCODE | Fixed | Value = 13

9.6.3.3

Data Trace, Data Read (DTDR)

This message is output by the target processor when it detects a memory read that matches the

OCD system’s data trace attributes.

Table 9-47. Data Trace, Data Read message

Data Trace, Data Read message. Direction: From target

Packet Size | Packet Name | Packet Type | Description
8 / 16 / 32 | DATA | Variable | The data value read. The size will vary depending on the load / store instruction being traced.
32 | U-ADDR | Variable | The unique portion of the data read address, which is relative to the previous Data Trace Message (read or write).
2 | DSZ | Fixed | Data size: 00 = 8 bits; 01 = 16 bits; 10 = 32 bits
6 | TCODE | Fixed | Value = 6

9.6.3.4

Data Trace, Data Read with Sync (DTDRS)

This message is an alternative to the Data Trace, Data Read Message. It is output instead of a

Data Trace, Data Read Message whenever a memory read occurs that matches the debug

logic’s data trace attributes, and when one of the following conditions has occurred:

1. The processor has exited from reset. This synchronization message is required to allow the unique portion of the data read address of following Data Trace, Data Read Messages to be correctly interpreted by the tool.

2. When data trace is enabled during normal execution of the embedded processor.

3. Upon exit from a power-down state. This synchronization message is required to allow the unique portion of the data read address of following Data Trace, Data Read Messages to be correctly interpreted by the tool.

4. The Event-In pin has been asserted and a debug control register field specifies that EVTI pin action is to generate data trace synchronization.

5. An overrun condition had previously occurred in which one or more data trace occurrences were discarded by the target processor’s debug logic. To inform the tool that an overrun condition occurred, the target outputs an Error Message (TCODE = 8) with an ECODE value of 00010 or 00111 immediately prior to the Data Trace, Data Read with Sync Message.

6. The periodic Data Trace Message counter has expired indicating that 255 without-sync versions of Data Trace Messages have been sent since the last with-sync version.

7. A data read is detected following the processor exiting from Debug Mode.

Table 9-48. Data Trace, Data Read with Sync message

Data Trace, Data Read with Sync message. Direction: From target

Packet Size | Packet Name | Packet Type | Description
8 / 16 / 32 | DATA | Variable | The data value read. The size will vary depending on the load / store instruction being traced.
32 | F-ADDR | Variable | The full address of the memory location read. Most significant bits that have a value of 0 are truncated.
2 | DSZ | Fixed | Data size: 00 = 8 bits; 01 = 16 bits; 10 = 32 bits
6 | TCODE | Fixed | Value = 14

9.6.4

Registers

9.6.4.1

Data Trace Control register (DTC)

This register controls actions taken on data accesses within all data trace channels.

Table 9-49. Data Trace Control Register

R/W | Bit Number | Field Name | Init. Val. | Description
R/W | 31:30 | RWT0 | 0 | RWT0 - Read/Write Trace channel 0. 00 = No trace enabled; x1 = Enable data read trace; 1x = Enable data write trace
R/W | 29:28 | RWT1 | 0 | RWT1 - Read/Write Trace channel 1. 00 = No trace enabled; x1 = Enable data read trace; 1x = Enable data write trace
R | 27:20 | Reserved | 0 |
R/W | 19:12 | ASID1 | | ASID to match for channel 1
R/W | 11 | ASID1EN | | ASID1EN - ASID 1 enable
R/W | 10:3 | ASID0 | | ASID to match for channel 0
R/W | 2 | ASID0EN | | ASID0EN - ASID 0 enable
R/W | 1 | T1WP | | T1WP - Trace Channel 1 Watchpoint
R/W | 0 | T0WP | | T0WP - Trace Channel 0 Watchpoint

9.6.4.2

Data Trace Start/End Address register (DTSA/DTEA)

DTSAn and DTEAn define the inclusive data access range [DTSAn : DTEAn] for trace channel

n. Each trace channel 0 and 1 has its own DTSA/DTEA register pair. If DTSA=DTEA, the trace

channel will match on accesses to a single location. If DTSA>DTEA, no match will occur for the

trace channel.
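The range rule for a trace channel can be written as a single inclusive comparison. The sketch below is a software model only, with a hypothetical function name; it shows that the single-location and never-match cases fall out of the same test.

```python
def channel_matches(address, dtsa, dtea):
    """True if a data access at `address` lies in the range [DTSA : DTEA]."""
    return dtsa <= address <= dtea


print(channel_matches(0x1004, 0x1000, 0x10FF))  # inside the range
print(channel_matches(0x2000, 0x2000, 0x2000))  # DTSA = DTEA, single location
print(channel_matches(0x1800, 0x2000, 0x1000))  # DTSA > DTEA, never matches
```

Because both bounds are inclusive, setting DTSA above DTEA is a simple way to make a channel inert without touching DTC.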

DTSA0, DTSA1

Table 9-50. Data Trace Start Address Register

R/W | Bit Number | Field Name | Init. Val. | Description
R/W | 31:0 | DTSA | 0 | DTSA - Start address for trace visibility

DTEA0, DTEA1

Table 9-51. Data Trace End Address Register

R/W | Bit Number | Field Name | Init. Val. | Description
R/W | 31:0 | DTEA | 0 | DTEA - End address for trace visibility

9.7

Ownership Trace

9.7.1

Functional description

The AVR32 OCD system implements Ownership Trace in compliance with the Nexus standard.

Ownership trace provides a macroscopic view, such as task flow reconstruction, when debug-

ging software written in a high level (or object oriented) language. It offers the highest level of

abstraction for tracking operating system software execution. This is especially useful when the

developer is not interested in debugging at lower levels.

Ownership trace is especially important for embedded processors with a memory management

unit, in which all processes can use the same virtual program and data spaces. Ownership trace

offers development tools a mechanism to decipher which set of symbolics and sources are

associated for lower levels of visibility and debugging.

Ownership trace information is transmitted out the AUX port using an Ownership Trace Message (OTM). OTM facilitates ownership trace by providing visibility of which process ID or operating system task is activated. An Ownership Trace Message is transmitted to indicate when a new process/task is activated, allowing development tools to trace ownership flow. Additionally, an Ownership Trace Message is transmitted periodically during runtime at a minimum frequency of every 256 Program Trace or Data Trace Messages.

In the AVR32, this feature is supported through an Ownership Trace Register, which automati-

cally produces an Ownership Trace Message when written to. The RTOS scheduler routine

writes the new process ID to this register during process switching using the mtdr instruction.

The TM[0] bit in the Development Control register must be set to enable ownership trace.


9.7.2

Messages

9.7.2.1

Ownership Trace (OT)

The ownership trace message is sent:

• When the Ownership Trace Process ID (PID) register is written.

• When a program trace with sync message is generated due to overflow in the periodic message counter.

• When a data trace with sync message is generated due to overflow in the periodic message counter.

• After a Transmit Queue overrun if the CPU has written to PID when the queue was full.

If there is no room in the Transmit Queue for the message, and the CPU is not halted to prevent

overruns, an error message is produced.

Table 9-52. Ownership Trace Message

Ownership Trace Message. Direction: From target

Packet Size | Packet Name | Packet Type | Description
32 | PROCESS | Fixed | Task / process ID.
6 | TCODE | Fixed | Value = 2
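For a tool that parses the message stream, the TCODE values given in the message tables of Sections 9.4.5 through 9.7.2 can be gathered into one lookup. The sketch below only collects values stated in those tables (plus the Error Message TCODE = 8 mentioned in the synchronization conditions); the dictionary and function names are hypothetical.

```python
TCODE_NAMES = {
    2: "Ownership Trace",
    3: "Program Trace, Direct Branch",
    4: "Program Trace, Indirect Branch",
    5: "Data Trace, Data Write",
    6: "Data Trace, Data Read",
    8: "Error Message",
    9: "Program Trace Synchronization",
    11: "Program Trace, Direct Branch with Sync",
    12: "Program Trace, Indirect Branch with Sync",
    13: "Data Trace, Data Write with Sync",
    14: "Data Trace, Data Read with Sync",
    15: "Watchpoint Message",
    27: "Program Trace, Resource Full",
    33: "Program Trace Correlation",
    56: "Trace Watchpoint Hit",
    57: "Program Trace, Direct Branch with Target Address",
}


def message_name(tcode):
    """Map a 6-bit TCODE to a message name, or flag it as unknown."""
    if not 0 <= tcode <= 0x3F:
        raise ValueError("TCODE is a 6-bit packet")
    return TCODE_NAMES.get(tcode, "unknown TCODE %d" % tcode)


print(message_name(9))
print(message_name(57))
```

TCODE values not covered by these sections are reported as unknown rather than guessed.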

9.7.3

Registers

9.7.3.1

Ownership Trace Process ID (PID)

The CPU should write the current Process ID value to this register whenever the RTOS performs a process switch. This automatically causes an Ownership Trace Message to be transmitted to the tool. This register can be written from any privileged CPU mode.

The tool can read and write this register, although it is recommended that only the CPU writes

this register.

Table 9-53. Ownership Trace Process ID (PID)

R/W | Bit Number | Field Name | Init. Val. | Description
RW | 31:0 | PROCESS | 0 | PROCESS - Process ID. The unique Process ID number of the currently running process.

9.8

Memory Interface

9.8.1

Overview

The Memory Interface provides the debug tool with a mechanism for DMA-like access to memory mapped resources, both at run time and in Debug Mode. Memory mapped resources include main memory and memory mapped peripheral modules. Access to registers in peripheral modules may trigger unintended operation of the module, and the debug tool should generally include a list of access restrictions for the peripherals. The CPU register file is not memory mapped, and is not accessible through the Memory Interface.

Note that the tool uses physical addresses when accessing memory mapped data through the

Memory Interface, as opposed to Breakpoint/Watchpoints, which use virtual addresses.


9.8.2

Memory (Block) Access

The Memory Block Access is controlled by the registers Read/Write Access Data (RWD),

Read/Write Access Address (RWA), and Read/Write Control/Status (RWCS). The tool accesses

these registers via the JTAG port. A Memory Block Access provides the debug tool with a mech-

anism for a DMA-like access to memory mapped resources. In this document, the terms

Memory Block Access and Memory Access refer to a memory block of any length from one sin-

gle location (CNT = 1), to the full range supported by the OCD system (CNT = 16 383). When

using the maximum size of word, the maximum memory that can be transferred in one block is

thus 64KB.
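The 64KB figure follows from the maximum count and the access size. As a worked check, the sketch below multiplies CNT by the bytes per access; the access-size names and the helper are assumptions for illustration, since the SZ encodings are defined elsewhere in the manual.

```python
MAX_CNT = 16383  # largest CNT value supported by the OCD system

# Bytes per access for each access size (assumed names, not SZ encodings).
ACCESS_BYTES = {"byte": 1, "halfword": 2, "word": 4}


def block_bytes(cnt, size="word"):
    """Bytes moved by one Memory Block Access of `cnt` accesses."""
    if not 1 <= cnt <= MAX_CNT:
        raise ValueError("CNT must be in 1..%d" % MAX_CNT)
    return cnt * ACCESS_BYTES[size]


print(block_bytes(MAX_CNT, "word"))  # 16383 * 4 bytes, just under 64 KB
```

Larger transfers have to be split into several block accesses, each started by writing RWCS again.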

9.8.2.1

Memory Read Operations

1. The tool writes RWA with physical address to be read.

2. The tool configures RWCS Register, including CNT, AC=1, and RW=0 to indicate

(block) read.

3. The tool must wait until the data is ready from the MIU. If very slow memory is

accessed the tool can check that the DV bit in RWCS is one before reading RWD.

4. The tool reads RWD. The MIU outputs the data and auto-increments the address.

5. Steps 3 and 4 are repeated until the number of data items specified by the CNT field has been read.

6. When the entire block has been read AC will be cleared by the MIU. RWA will point to

the last location that was read. If the tool wishes to continue reading from this point

RWCS must be written again.
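The read procedure above can be sketched in Python as follows. This is only an illustration: `FakeTarget` is a hypothetical in-memory stand-in for the OCD registers behind the JTAG port (its method names are not part of any real tool API), it models word accesses only, and it makes data available immediately, which real hardware need not do.

```python
AC, RW, ERR, DV = 1 << 31, 1 << 30, 1 << 1, 1 << 0
SZ_WORD = 0b010 << 27


class FakeTarget:
    """Hypothetical model of RWA/RWCS/RWD, for illustration only."""

    def __init__(self, memory):
        self.memory = memory          # dict: physical word address -> value
        self.rwa = 0
        self.rwcs = 0
        self.remaining = 0

    def write_rwa(self, address):
        self.rwa = address

    def write_rwcs(self, value):
        self.rwcs = value
        self.remaining = (value >> 2) & 0x3FFF   # CNT field, bits 15:2
        if value & AC and not value & RW:
            self.rwcs |= DV                      # assumption: data ready at once

    def read_rwcs(self):
        return self.rwcs

    def read_rwd(self):
        data = self.memory[self.rwa]
        self.remaining -= 1
        if self.remaining:
            self.rwa += 4                        # auto-increment (word access)
        else:
            self.rwcs &= ~(AC | DV)              # block done: AC cleared
        return data


def block_read(target, address, count):
    """Read `count` words starting at physical `address`."""
    target.write_rwa(address)                        # step 1
    target.write_rwcs(AC | SZ_WORD | (count << 2))   # step 2, RW=0 means read
    words = []
    for _ in range(count):                           # step 5
        while not target.read_rwcs() & DV:           # step 3
            if target.read_rwcs() & ERR:
                raise IOError("memory read failed (ERR set)")
        words.append(target.read_rwd())              # step 4
    return words
```

After the call, the model leaves RWA at the last location read and AC cleared, matching step 6.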

9.8.2.2

Special cases:

• If the memory read operation results in an error, the ERR bit in RWCS will be set, and the data in RWD will not be valid (DV=0). AC will also be cleared.

• Any write to RWCS will abort an ongoing block access.

9.8.2.3

Memory Write Operations

1. The tool writes RWA with the first physical address to be written.

2. The tool configures the RWCS Register, including CNT, AC=1, and RW=1 to indicate (block) write.

3. The tool writes one data value to RWD.

4. The tool must wait until the MIU is ready to accept more data. This can be done by reading the ready bit in RWCS or by using the NEXUS-STATUS JTAG command.

5. Steps 3 and 4 are repeated until the number of data items specified by the CNT field has been transmitted.

6. When the entire block has been written, AC will be cleared by the MIU. RWA will point to the last location that was written. If the tool wishes to continue writing from this point, RWCS must be written again.

9.8.2.4

Special cases:

• If the memory write operation results in an error, the ERR bit in RWCS will be set and DV

cleared. AC will also be cleared.

• Any write to RWCS will abort an ongoing block access.


9.8.3

Address Space

The SZ field of RWCS determines the word size (data type) of the memory access, while the CNT field of the RWCS register determines the number of accesses of size SZ. Since AVR32 is byte addressed, the accessed address range is from RWA through RWA + k * (CNT+1), where k is given by Table 9-54. The address in RWA must correspond to the Most Significant Byte of the data of size SZ in physical memory.

Note that the AVR32 uses a big-endian memory model. E.g., if the word located at address

0x0000 contains 0x12345678, the byte at 0x0000 will read 0x12, and the halfword at 0x0000 will

read 0x1234.

Table 9-54. Address Increment as a Function of Word Size / Data Type

Access Type        SZ    k = 2^SZ
Byte access        000   1
Half-word access   001   2
Word access        010   4
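The big-endian example and the k = 2^SZ rule can be checked with a few lines of Python. The `memory` buffer and the `read` helper are illustrative names only; they model four bytes of memory starting at address 0x0000 and are not part of the OCD system.

```python
# Four bytes of memory at address 0x0000 holding the word 0x12345678,
# stored most significant byte first (big-endian).
memory = (0x12345678).to_bytes(4, byteorder="big")


def read(address, sz):
    """Return the value of an access of size SZ at `address`."""
    k = 1 << sz                      # k = 2^SZ bytes per access (Table 9-54)
    return int.from_bytes(memory[address:address + k], byteorder="big")
```

With this model, `read(0, 0)` gives the byte 0x12, `read(0, 1)` gives the halfword 0x1234, and `read(0, 2)` gives the full word.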

9.8.4

Error Conditions

Errors during a memory access are indicated to the tool by setting the ERR and clearing the DV

bit of the RWCS register.

There are 3 sources of errors:

1. The system bus signals an error.

2. The tool writes to the RWCS register during a single or block access. The access is terminated and indicated as an error.

3. The tool writes an illegal value to SZ or CNT.

9.8.5

Data Cache operation

Memory accesses from the OCD system are served by the Data Cache. Since the Data Cache

can buffer CPU accesses to memory, there may be an inconsistent data view between the

cache and system memory. The OCD system will specify whether or not the access should be

cached or uncached. By default, reads will be cached, and writes uncached. This can be altered

by writing the Cache Control (CCTRL) bits in RWCS.

Uncached reads return the value in system memory, which may differ from the value in the

cache if the CPU has written to this location. Cached reads return the value in the cache, i.e. the

same as the value seen by the CPU. If the data is not present inside the cache, the Data Cache

accesses the data through the system bus. Unlike a memory access from the CPU, a cache

miss for an OCD access does not update the Data Cache memory or registers.

Uncached writes will change the value both in the cache and system memory. Cached writes will

only change the cache value, and tag the cache line as dirty, ensuring it will be written to mem-

ory on the next cache flush. Cached writes are faster than uncached writes, but can cause

temporary inconsistency between system and cache memory.

Access error indicated by the system bus results in an exception in the CPU if cached operation

is used. When the error is due to an uncached memory access from the OCD System, the

exception is not generated, but the Error bit is set in the Read Write Control Register (RWCS).

The Data Cache is a shared resource between the CPU and the OCD System. This resource is allocated solely to the CPU in normal operation. Hence, accesses from the OCD System will degrade CPU performance. This delay is particularly large for uncached accesses or cached accesses to locations not present in the cache.

9.8.6

NanoTrace

NanoTrace is an AVR32 specific debug feature which allows a JTAG-based emulator to observe

limited trace information without the need for an AUX interface. Instead, the trace messages will

be written to a circular buffer in a reserved space in the data memory, configured by the RWCS

and RWA registers. NanoTrace employs the block write mechanism to write trace data to the

internal SRAM, so block read/write is not available when NanoTrace is enabled. All messages

normally written to the AUX port will be written to memory, so all kinds of trace, as well as watch-

point messages are written to the internal memory and can be reconstructed.

The AUX port does not need to be active for NanoTrace to function.

9.8.6.1

NanoTrace operation

To enable NanoTrace, the RWCS must be written with AC=1, RW=1, SZ=010, NTE=1 and CNT=n, where n is an integer between 0 and 28. The start address of the circular buffer must be written to RWA. The CNT field must be written to log2("size of buffer in bytes">>2). This restricts the size of the NanoTrace buffer to 2ⁿ boundaries, but permits up to 1 GB of trace buffer.
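As a small worked example, the CNT value for a given buffer size could be computed as below. The function name `nanotrace_cnt` is hypothetical, and the sketch assumes that the buffer must hold a power-of-two number of 32-bit words, following the log2 rule above.

```python
def nanotrace_cnt(buffer_bytes):
    """Return CNT = log2(buffer size in bytes >> 2) for a NanoTrace buffer."""
    if buffer_bytes & 3:
        raise ValueError("buffer size must be a whole number of words")
    words = buffer_bytes >> 2
    if words < 1 or words & (words - 1):
        raise ValueError("buffer must hold a power-of-two number of words")
    n = words.bit_length() - 1       # exact log2 for a power of two
    if n > 28:
        raise ValueError("buffer larger than 1 GB")
    return n
```

For instance, a 4096-byte buffer holds 1024 words, so CNT is 10, and the 1 GB upper limit corresponds to CNT = 28.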

Once NanoTrace is enabled, messages are extracted frame by frame from the Transmit Queue

and written to the RWD register. Only valid (i.e. non-idle) frames are extracted. When RWD has

no room for more frames, it is written to the circular buffer in memory, as shown in Figure 9-11.

The buffer is repeatedly overwritten with trace messages until NanoTrace is halted. This occurs

when the NTE bit in RWCS is written to zero. Every time the buffer wraps, the next trace mes-

sage is inserted with sync, to increase the portion of the trace buffer which can be uniquely

reconstructed.

When NanoTrace is halted, the block read/write mechanism can again be used to access mem-

ory locations from the debugger.

Figure 9-11. NanoTrace memory arrangement.

[Figure labels: Oldest message; Newest message; RWA; 2^CNT words; RWA[31:(CNT+2)]; Word]

9.8.6.2

Extracting NanoTrace messages

When NanoTrace is halted, or no more trace messages are generated (e.g. in OCD Mode), the RWA register will point to the word following the last message written to memory. If the circular buffer has been completely filled and thus overwritten at least once, the RWCS:WRAPPED bit will be set. This means that the word pointed to by RWA is part of the oldest message. If RWCS:WRAPPED is cleared, only the messages from RWA[31:(CNT+2)] to RWA-4 contain valid message data.

The trace log can thus be reconstructed by reading words from RWA (or RWA[31:(CNT+2)] if RWCS:WRAPPED is cleared) to RWA-4 in the circular RAM buffer. When reaching the address RWA+CNT*4, the address should be wrapped down to RWA[31:(CNT+2)]. Frames consist of the value of the MSEO pins in the most significant bit positions, and the value of the MDO pins in the least significant bit positions. Frames are aligned to the most significant bit within each word, as shown in Figure 9-12.
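The order in which a debugger would read the buffer words can be sketched as follows. The helper name is hypothetical, and the sketch assumes the buffer spans 2^CNT words starting at RWA[31:(CNT+2)] with the low bits zeroed (as labeled in Figure 9-11), so the wrap point is taken to be the end of that span.

```python
def nanotrace_word_addresses(rwa, cnt, wrapped):
    """Return buffer word addresses from oldest to newest trace data."""
    size = 4 << cnt                  # assumed buffer size in bytes: 2^CNT words
    base = rwa & ~(size - 1)         # RWA[31:(CNT+2)] with low bits zeroed
    if not wrapped:
        # Only base .. RWA-4 holds valid data.
        return list(range(base, rwa, 4))
    # Buffer was overwritten: start at RWA, wrap at the end, stop at RWA-4.
    offset = rwa - base
    return [base + (offset + 4 * i) % size for i in range(size // 4)]
```

With RWA = 0x1234, CNT = 10 and WRAPPED set, the list starts at 0x1234, wraps from 0x1FFC to 0x1000, and ends at 0x1230.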

Since RWD is only written to the buffer when a whole word of data is filled, the last frames of the

last message may not have been transmitted to memory. RWCS:DV will be set to indicate that

RWD contains valid trace data, and these frames can be extracted by reading RWD. Empty

frame positions within RWD are tagged as "Idle", i.e. MSEO = 0b11.
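Splitting a buffer word (or RWD) into frames might look like the sketch below. It assumes 8-bit frames with a 2-bit MSEO field above a 6-bit MDO field; this layout is inferred from the four frames per word and the 2-bit and 6-bit values shown in the figures, and the actual port width may differ on a given device. The function names are illustrative.

```python
IDLE = 0b11  # MSEO value that tags an empty frame position


def decode_frames(word):
    """Split a 32-bit word into four (MSEO, MDO) frames, Frame0 first."""
    frames = []
    for shift in (24, 16, 8, 0):     # Frame0 sits in the most significant byte
        frame = (word >> shift) & 0xFF
        frames.append((frame >> 6, frame & 0x3F))   # assumed 2-bit MSEO, 6-bit MDO
    return frames


def valid_frames(word):
    """Drop frame positions tagged as Idle."""
    return [frame for frame in decode_frames(word) if frame[0] != IDLE]
```

A partially filled RWD such as 0x0345C0C0 would then yield two valid frames followed by two Idle positions that are discarded.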

Figure 9-13 shows an example of a NanoTrace buffer, with RWA starting at 0x1000 and CNT = 10 (i.e. the buffer size is 1024 words, or 4096 frames). When the trace was stopped, RWCS:WRAPPED was set and RWA = 0x1234, so the last word of frame data written to the memory is located at 0x1230, and a partially filled word is in RWD. In this example, the last message (shown in white) in the Transmit Queue was an Indirect Branch message with Sync. The same example was shown for regular AUX port transmission. The last two frames of the message still reside in RWD, which has been only partially filled.

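Figure 9-12. Frame organization within a word.

[Figure: a 32-bit word with bit positions 31, 24, 16, 8 and 0 marked, holding Frame0, Frame1, Frame2 and Frame3; each frame carries an MSEO field and an MDO field.]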

Figure 9-13. Reconstructing a NanoTrace message.

[Figure: buffer words holding Frame0 through Frame4095 (MSEO and MDO per frame) with RWA = 0x1234; Frame4094 has MDO = 000100 and Frame4095 has MDO = 111110. RWD holds Frame4096 (MDO = 000011), Frame4097 (MDO = 000101) and two empty frames (MDO = 000000). MSEO values shown: 00, 01, 11.]

9.8.6.3

NanoTrace access protection

If the CPU attempts to write the data memory reserved for NanoTrace messages, the CPU software or message reconstruction can fail. To automatically detect this source of error, it is possible to write the NanoTrace Access Protection (NTAP) bit in RWCS to one. This will cause a hardware error to be triggered if the CPU attempts to access the protected area. This allows the emulator to abort the program execution and notify the user about the illegal access.

NanoTrace access protection will only function correctly when physical and virtual addresses

are the same for the memory region reserved for NanoTrace. If this is not the case, NTAP

should stay zero to avoid incorrect access error breakpoints.

Note that NanoTrace access protection will never trigger in Monitor Mode.

9.8.6.4

Overrun control

The DC:OVC bits work for NanoTrace as well as for AUX port messages. However, the overrun prevention will not be as efficient for NanoTrace. If the Transmit Queue becomes full, the CPU will not issue any more instructions, but already issued instructions will be allowed to complete. If these instructions generate trace information, the Transmit Queue may overrun even when the CPU is stalled.

9.8.6.5

NanoTrace Buffer Control

By default, the NanoTrace buffer will be repeatedly overwritten until NanoTrace is stopped. How-

ever, by writing the RWCS:NTBC bits, it is possible to control the behavior when the buffer

becomes full. In this case, RWD will not contain trace information, and does not need to be read

out. RWA will point to the first address in the buffer, so RWA does not need to be rewritten if

NanoTrace is restarted.

In some cases, only the first trace messages after NanoTrace is enabled are interesting. In this

case, NanoTrace can be disabled when the buffer is full. The debugger will detect that this has

occurred by observing when RWCS:NTE is negated. RWCS:AC and DV will also be cleared, to

indicate that the memory operation is complete, and no valid trace information exists in RWD. To

restart NanoTrace, RWCS:NTE and AC must be written to one.

Alternatively, Debug Mode can be triggered when the buffer is full. This will set the NanoTrace

Buffer Full bit in the Development Status register (DS:NTBF). RWCS:NTE will stay set, but AC

and DV will be cleared. The debugger can then read out the NanoTrace buffer in Debug Mode,

before restarting execution. To restart NanoTrace when exiting Debug Mode, RWCS:NTE and

RWCS:AC must be written to one.

9.8.6.6

CRC-32 check of a memory block

The memory interface unit can generate a CRC-32 checksum on a memory block.

The standard CRC-32 (802.3) polynomial is used:

x³²+ x²⁶+ x²³+ x²²+ x¹⁶+ x¹²+ x¹¹+ x¹⁰+ x⁸+ x⁷+ x⁵+ x⁴+ x²+ x + 1

To enable this feature, the debugger must set AC=1, CRC=1, SZ=010 (word) and CNT=<number of words in block> in RWCS, and write the start address of the block to RWA. The MIU will then read the memory block and put a CRC-32 of the memory block in RWD when AC is cleared and DV is set. The debugger can continue the CRC generation on a new block by rewriting RWCS with AC, CRC, SZ and CNT when a CRC block is finished and the CRC bit is still set. The CRC in RWD after the second block will be CRC32(block1 + block2).
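The polynomial and the block-chaining property can be illustrated on the host side with Python's `zlib.crc32`, which uses the same 802.3 polynomial. Note the assumption: `zlib.crc32` applies a reflected bit order, an initial value of 0xFFFFFFFF and a final inversion, and this manual does not state which of these conventions (or which byte order within a word) the MIU uses, so the numeric results may not match the value in RWD. The block contents are arbitrary sample data.

```python
import zlib

# Exponents of the CRC-32 (802.3) polynomial as listed above.
POLY_EXPONENTS = (32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0)

# Conventional 32-bit form of the polynomial (the x^32 term is implicit).
polynomial = sum(1 << e for e in POLY_EXPONENTS if e < 32)

block1 = bytes(range(16))        # sample data for the first block
block2 = bytes(range(16, 32))    # sample data for the second block

# Continue the CRC over a second block by passing the running value.
crc_after_block1 = zlib.crc32(block1)
crc_after_block2 = zlib.crc32(block2, crc_after_block1)
```

Here `polynomial` evaluates to 0x04C11DB7, and `crc_after_block2` equals the CRC of the two blocks concatenated, which mirrors the CRC32(block1 + block2) behavior described for RWD.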

9.8.7

Messages

The Memory Interface generates no messages. All features are accessed with regular read/write messages on the JTAG interface.

9.8.8

Registers

9.8.8.1

Read/Write Access Control/Status (RWCS) Register

Table 9-55. Read/Write Access Control/Status (RWCS)

R/W   Bit Number   Field Name   Init. Val.   Description
R/W   31           AC           0            AC - Access
                                             0 = No access ongoing
                                             1 = Start access
R/W   30           RW           0            RW - Memory Access Read/Write
                                             0 = Read
                                             1 = Write
R/W   29:27        SZ           0            SZ - Data Size
                                             000 = Byte
                                             001 = Half-Word
                                             010 = Word
                                             011 = Reserved
                                             1xx = Reserved
R/W   26:25        CCTRL        00           CCTRL - Cache Control
                                             00 = Auto
                                             01 = Always use cached memory view
                                             10 = Always use uncached memory view
                                             11 = Reserved
R/W   24           WRAPPED      0            WRAPPED - NanoTrace Buffer wrapped
                                             Indicates that the RWA pointer to the nanotrace
                                             buffer has wrapped at least once.
R/W   23           NTAP         0            NTAP - NanoTrace Access Protection
                                             Enables NanoTrace access protection.
R/W   22           NTE          0            NTE - NanoTrace Enable
                                             Enables NanoTrace.
R/W   21:20        NTBC         00           NTBC - NanoTrace Buffer Control
                                             00 = Overwrite buffer
                                             01 = Disable trace when buffer full
                                             10 = Trigger breakpoint when buffer full
                                             11 = Reserved
R/W   19           CRC          0            CRC - CRC Enable
                                             Enables CRC of memory area.
R     18:16        Reserved     -
R/W   15:2         CNT          0            CNT - Access Count
                                             Number of accesses of word size SZ. CNT is an
                                             unsigned number.
R     1            ERR          0            Last access generated an error
R     0            DV           0            Data Valid in RWD
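The bit positions in Table 9-55 can be turned into a small packing helper, sketched below. The function name `make_rwcs` and its keyword arguments are illustrative only; the shifts follow the bit numbers in the table, and ERR and DV are left out because they are read-only.

```python
def make_rwcs(ac=0, rw=0, sz=0, cctrl=0, wrapped=0, ntap=0, nte=0,
              ntbc=0, crc=0, cnt=0):
    """Pack RWCS fields into a 32-bit register value (Table 9-55 layout)."""
    if not 0 <= cnt <= 0x3FFF:
        raise ValueError("CNT is a 14-bit field (bits 15:2)")
    if not 0 <= sz <= 0b111:
        raise ValueError("SZ is a 3-bit field (bits 29:27)")
    return (ac << 31 | rw << 30 | sz << 27 | cctrl << 25 | wrapped << 24
            | ntap << 23 | nte << 22 | ntbc << 20 | crc << 19 | cnt << 2)


# Block read of 16 words: AC=1, RW=0, SZ=010, CNT=16.
block_read_16_words = make_rwcs(ac=1, sz=0b010, cnt=16)

# NanoTrace enable with a 1024-word buffer: AC=1, RW=1, SZ=010, NTE=1, CNT=10.
nanotrace_enable = make_rwcs(ac=1, rw=1, sz=0b010, nte=1, cnt=10)
```

The two example values come out as 0x90000040 and 0xD0400028 respectively.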


AC

The tool writes the AC bit to one to initiate an access. The AC field is negated by the MIU upon

completion of the access requested by the tool. Any write operation to the RWCS register will

terminate any access in process, including the remaining of the block access. If the write opera-

tion sets AC=1, the previous (block) access will be terminated, but a new one will be initiated.

SZ

SZ determines the access size. The bits are written to by the tool.

RW

RW determines whether the access is a read or write. The RW bit is written by the tool.

NTE

Enable nanotrace. When NanoTrace is enabled, trace messages will be written to the data

memory.

CRC

When this bit is set, the MIU will read the entire memory area specified with RWA and CNT and place a CRC-32 signature of this area in RWD when AC is cleared and DV is set. NTE and CRC are mutually exclusive, and SZ must be word. When the CRC generation of a block is complete, the CRC-32 will be in RWD. If the tool wishes to continue calculating CRC beyond the first block, it must rewrite RWCS with AC=1, CRC=1, SZ=010 and appropriate CNT.

WRAPPED

This bit is set when the RWA pointer into the NanoTrace buffer has wrapped at least once. The

emulator should reset this bit when a new NanoTrace session is started.

CCTRL

MIU memory access is routed through the Data Cache. There are two ways of accessing the data cache: cached and uncached. The safest way of accessing the memory is using cached reads and uncached writes. The Auto setting of CCTRL automatically uses this configuration. Note that when the Auto setting is used with NanoTrace, the MIU will write to cached memory to improve trace performance.

In the cached memory view, writes will be write back, and any errors will be routed to the CPU as a bus error. The ERR bit will not be set. Reads will access the cache and see the CPU’s view of the memory.

In the uncached memory view, writes will be write through, but they will update the cache to preserve memory consistency. Any bus errors will be reported back to the OCD and the ERR bit will be set. Reads will go straight to the bus and bypass any cache buffers. In this mode the memory view may be different from the CPU’s view of the memory.

CNT

To request a block move, CNT is set by the tool to the number of accesses of data size SZ, zero

is an illegal value. The CNT field is incremented by the OCD system during an in-progress block

move. When CNT wraps to 0, the block move is complete, and the OCD system negates the AC

field. If an error occurs, CNT indicates how far the block access had progressed before the error

occurred.

DV and ERR


If errors occur, the target will terminate the access, including any remaining block accesses,

within one access cycle of the target. In this case, the access in progress when the RWCS Reg-

ister is written is not guaranteed to complete. Errors are either due to errors on the system bus

during an access requested by the tool, triggered by writing the RWCS Register while any sin-

gle or block access is in progress, or attempting a block access with CNT=0. See Table 9-56 for

a description.

Note that for Read Accesses, DV is always cleared when RWD is read, including for the last

access.

Table 9-56. Read/Write Access Status Bit Encoding

DV   ERR   Read Action                           Write Action
0    0     Read Access has not completed         Write Access completed without error
0    1     Read Access error has occurred        Write Access error has occurred
1    0     Read Access completed without error   Write Access has not completed
1    1     Not Allowed                           Not Allowed
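A tool could interpret the DV and ERR bits along the lines of the sketch below. The function name and the returned strings are illustrative; the mapping follows the row order of Table 9-56 as read from this document (DV in bit 0, ERR in bit 1 of RWCS), and should be checked against the original table layout.

```python
# (DV, ERR) -> (meaning for a read access, meaning for a write access)
STATUS = {
    (0, 0): ("Read Access has not completed",
             "Write Access completed without error"),
    (0, 1): ("Read Access error has occurred",
             "Write Access error has occurred"),
    (1, 0): ("Read Access completed without error",
             "Write Access has not completed"),
    (1, 1): ("Not Allowed", "Not Allowed"),
}


def access_status(rwcs, is_read):
    """Decode the DV (bit 0) and ERR (bit 1) fields of an RWCS value."""
    dv = rwcs & 1
    err = (rwcs >> 1) & 1
    read_meaning, write_meaning = STATUS[(dv, err)]
    return read_meaning if is_read else write_meaning
```

For example, an RWCS value with only DV set means a completed read, but a still-pending write.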

9.8.8.2

Read/Write Access Address (RWA) Register

The RWA Register is used by the tool to program the physical address of the memory mapped resource to be accessed, or the lowest physical address (i.e. lowest unsigned value) for a block access (CNT>0). RWA must correspond to the most significant byte of the data of size SZ. Refer to “Address Space” on page 143 for a description of the address range during a Memory Block Access.

Table 9-57. Read/Write Access Address (RWA)

R/W   Bit Number   Field Name   Init. Val.    Description
R/W   31:0         RWA          0x0000_0000   Physical address to be accessed

9.8.8.3

Read/Write Access Data (RWD) Register

The RWD Register contains the data to be written for the next Memory Block Write access, and

the read data for completed memory read accesses.

Note that the data is presented in little-endian format in the RWD register as shown in Table 9-58.

Table 9-58. Organization of RWD for Different Data Sizes

Access

31 24

23 16

15

8

7

0

Byte

Half-word

Word

MS Byte

LS Byte

MS Byte


Table 9-59. Read/Write Data (RWD)

R/W   Bit Number   Field Name   Init. Val.    Description
R/W   31:0         RWD          0x0000_0000   32 bits of data read from a physical address
                                              location or to be written to a physical address
                                              location.

9.9

OCD Message Summary

Table 9-60. Message Summary

TCODE       Message                                            Public / Vendor Defined   Page
0           Debug Status (DEBS)                                Public                    page 102
1           Reserved
2           Ownership Trace (OT)                               Public                    page 141
3           Program Trace, Direct Branch (PTDB)                Public                    page 131
4           Program Trace, Indirect Branch (PTIB)              Public                    page 132
5           Data Trace, Data Write (DTDW)                      Public                    page 137
6           Data Trace, Data Read (DTDR)                       Public                    page 138
7           Reserved
8           Error (ERROR)                                      Public                    page 117
9           Program Trace Synchronization (PTSY)               Public                    page 133
10          Reserved
11          Program Trace, Direct Branch with Sync (PTDBS)     Public                    page 134
12          Program Trace, Indirect Branch with Sync (PTIBS)   Public                    page 134
13          Data Trace, Data Write with Sync (DTDWS)           Public                    page 137
14          Data Trace, Data Read with Sync (DTDRS)            Public                    page 138
15          Watchpoint Hit (WH)                                Public                    page 125
16–26       Reserved
27          Program Trace Resource Full (PTRF)                 Public                    page 134
28–32       Reserved
33          Program Trace Correlation (PTC)                    Public                    page 135
34–55       Reserved
56          Trace Watchpoint Hit (TWH)                         Vendor                    page 125
57          Direct Branch with Target Address (DBTA)           Vendor                    page 131
58–62       Reserved
63 (0x3F)   Vendor Defined Extension Message                   Vendor


Table 9-60 shows the messages which can be transmitted by the target on the AUX port. OCD

registers can be written by the tool using the JTAG mechanism described in “Debug Port” on

page 111.

Table 9-61 shows the format of the transmitted messages. Packets shown in bold are variable length; the others are fixed length. All variable length packets can be truncated by omitting leading zeroes, but will always end on a port boundary.

Table 9-61. Message formats

Message format

TCODE

Nexus Message

Debug Status

Ownership Trace

Error

[5:0]

Packet 1

Packet 2

Packet 3

0

2

8

STATUS[31:0]

PROCESS [31:0]

ECODE[4:0]

-

Program Trace,

Direct Branch

3

-

I-CNT[7:0]

Program Trace,

Direct Branch with

Target Address

57

I-CNT[7:0]

U-ADDR[31:0]

I-CNT[7:0]

PC[31:0]

Program Trace,

Indirect Branch

4

9

EVT-ID[1:0]

U-ADDR[31:0]

Program Trace

Synchronization

-

I-CNT[7:0]

Program Trace,

Direct Branch with

Sync

11

12

-

I-CNT[7:0]

F-ADDR[31:0]

Program Trace,

Indirect Branch with

Sync

EVT-ID[1:0]

I-CNT[7:0]

F-ADDR[31:0]

Program Trace

Resource Full

27

33

5

RCODE[3:0]

EVCODE[3:0]

DSZ[1:0]

RDATA[7:0]

I-CNT[7:0]

Program Trace

Correlation

Data Trace, Data

Write

U-ADDR[31:0]

F-ADDR[31:0]

DATA[31:0]

Data Trace, Data

Read

6

DSZ[1:0]

Data Trace, Data

Write with Sync

13

DSZ[1:0]

Data Trace, Data

Read with Sync

14

15

56

DSZ[1:0]

F-ADDR[31:0]

DATA[31:0]

Watchpoint Hit

WPHIT[7:0]

WPHIT[1:0]

-

Trace Watchpoint

Hit

-


9.10 OCD Register Summary

Use the index shown in the "Register index" column when accessing OCD registers by the Nexus access mechanism (see Section 9.3.2 on page 111). Use the index shown in the "mtdr/mfdr index" column when accessing OCD registers by mtdr/mfdr instructions from the CPU (see Section 9.2.10 on page 98). These indexes are identical to the register index multiplied by 4.
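The relation between the two index columns is simple enough to state as a one-line helper. The function name is hypothetical, and the range check assumes register indexes run from 0 to 255 as in the summary table.

```python
def mtdr_index(register_index):
    """Convert a Nexus register index to the mtdr/mfdr index (index * 4)."""
    if not 0 <= register_index <= 255:
        raise ValueError("register index out of range")
    return register_index * 4
```

For example, register index 34 (BWA2A) maps to mtdr/mfdr index 136, and register index 76 (AXC) maps to 304.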

Table 9-62. OCD Register Summary

Register

Index

mtdr/mf

dr index

Access

Type

Register

Page

0

Device ID (DID)

R

page 103

1

4

Reserved

—

2

8

Development Control (DC)

R/W

—

page 105

page 107

page 147

3

12

Reserved

4

16

Development Status (DS)

R

5-6

7

20-24

28

Reserved

—

Read/Write Access Control/Status (RWCS)

Reserved

R/W

—

8

32

9

36

Read/Write Access Address (RWA)

Read/Write Access Data (RWD)

Watchpoint Trigger (WT)

R/W

—

page 149

page 129

10

11

12

13

14–15

16-17

18–19

20-21

22

23

24

25

26

27

28

29

30

31

32

33

40

44

48

Reserved

52

Data Trace Control (DTC)

R/W

—

page 139

page 140

56-60

64-68

72-76

80-84

88

Data Trace Start Address (DTSA) Channel 0 to 1

Reserved

Data Trace End Address (DTEA) Channel 0 to 1

Reserved

R/W

—

page 140

PC Breakpoint/Watchpoint Control 0A (BWC0A)

PC Breakpoint/Watchpoint Control 0B (BWC0B)

PC Breakpoint/Watchpoint Control 1A (BWC1A)

PC Breakpoint/Watchpoint Control 1B (BWC1B)

PC Breakpoint/Watchpoint Control 2A (BWC2A)

PC Breakpoint/Watchpoint Control 2B (BWC2B)

Data Breakpoint/Watchpoint Control 3A (BWC3A)

Data Breakpoint/Watchpoint Control 3B (BWC3B)

PC Breakpoint/Watchpoint Address 0A (BWA0A)

PC Breakpoint/Watchpoint Address 0B (BWA0B)

PC Breakpoint/Watchpoint Address 1A (BWA1A)

PC Breakpoint/Watchpoint Address 1B (BWA1B)

R/W

page 126

page 128

page 125

92

96

100

104

108

112

116

120

124

128

132


Register

Index

mtdr/mf

dr index

Access

Register

Type

R/W

—

Page

34

136

PC Breakpoint/Watchpoint Address 2A (BWA2A)

PC Breakpoint/Watchpoint Address 2B (BWA2B)

Data Breakpoint/Watchpoint Address 3A (BWA3A)

Data Breakpoint/Watchpoint Address 3B (BWA3B)

Breakpoint/Watchpoint Data 3A (BWD3A)

Breakpoint/Watchpoint Data 3B (BWD3B)

Reserved

page 125

page 127

35

140

36

144

37

148

38

152

39

156

40–65

64

160-260

256

Nexus Configuration (NXCFG)

R

page 103

page 109

65

260

Debug Instruction Register (DINST)

Debug Program Counter (DPC)

R/W

—

66

264

67

268

CPU Control Mask

68

272

Debug Communication CPU Register (DCCPU)

Debug Communication Emulator Register (DCEMU)

Debug Communication Status Register (DCSR)

Ownership Trace Process ID (PID)

Reserved

page 104

page 105

page 141

69

276

70

280

71

284

72-74

75

288-296

300

Event Pair Control 3 (EPC3)

R/W

page 127

page 118

76

304

AUX port Control (AXC)

308-

1020

77– 255

Reserved

—


10. Instruction cycle summary

This chapter presents the grouping of the instructions in the AVR32 architecture. All the instruc-

tions in each group behave similarly in the pipeline, and are discussed as a group in the rest of

this documentation.

10.1 Validity of timing information

This chapter presents information about the timing requirements of each instruction. This infor-

mation should be used together with measurements from cycle-correct simulations. Issues like

branch prediction, data hazards, cache misses and exceptions may cause the cycle require-

ments of real implementations to differ from the theoretical number presented here.

All timing presented here represents best case numbers. The following factors are assumed:

• No data hazards are experienced

• No resource conflicts are encountered in the pipeline

• All data and instruction accesses hit in the caches, and no protection violations are

experienced

10.2 Definitions

The following definitions are used in the tables below:

10.2.1

Issue

An instruction is issued when it leaves the IS stage and enters the M1, A1, or DA stage.

10.2.2

Issue latency

The issue latency represents the number of clock cycles required between the issue of the

instruction and the issue of the following instruction to the same subpipe. Generally, an instruc-

tion has an issue latency of one if the following instruction is issued to another subpipe and no

data hazards exist.

10.2.3

Result latency

The result latency represents the number of cycles between the issue of the instruction and the availability of the result from the forwarding logic. Some instructions, like 64-bit multiplications, produce several results. For these instructions, the result latency for both the first part of the result and the last part of the result are presented. After the result latency period, the data is available for forwarding, and instructions with data dependencies may execute.

10.2.4

Flag latency

The flag latency represents the number of clock cycles required between the issue of an instruction updating the flags and the issue of another instruction using the flags. Note that flags are also forwarded, in most cases making the flags available to the following instruction. As an example, for an add followed by a branch, the branch will read the flags updated by the add. No stall is required between the add and the branch.


10.3 Special considerations

10.3.1

PC as destination register

Most instructions can use PC as destination register. This will result in a jump to the calculated

address. Forwarding is not implemented, so jumping is performed when the target address is

available in WB.

10.3.2

Branch prediction

Branch prediction allows the branch penalty to be removed for correctly predicted branches. For

erroneously predicted branches, a branch delay of four cycles is imposed. For correctly pre-

dicted, folded branches, the branch executes in zero cycles. Erroneously predicted folded

branches execute in four cycles.

Table 10-1. Predicted branch and call cycle requirement

Predicted

correctly

Predicted

erroneously correctly

Folded

erroneously predicted

Not

Instruction

br disp

1

4

0

4

rjmp disp

rcall disp

0

4

NA

10.3.3

Return address stack

A return address stack is implemented, allowing the subprogram return address to be available

early. The return address stack can keep 4 elements. If more elements are pushed, the oldest

element is overwritten. Hardware keeps control over the number of valid elements on the stack.

Stack over- and underflow is handled automatically by hardware, at the cost of performance

loss. When a return is attempted with an empty return address stack, the return instruction is

considered as not predicted.

Table 10-2. Return instruction cycle requirement

Instruction

Predicted correctly

Predicted erroneously Not predicted

ret, cond != AL

1

2

4

-

4

6

ret, cond == AL

mov PC, LR

-

popm with PC in reglist

ldm with PC in reglist

-


10.4 ALU Operations

This group comprises simple single-cycle ALU operations like add and sub. The conditional sub

and mov instructions are also in this group. All instructions in this group take one cycle to exe-

cute, and the result is available for use by the following instruction.

Table 10-3. Timing of ALU operations

Issue

Result

Flag

Mnemonics

Operands

Rd

Description

Absolute value.

Add carry to register.

Add with carry.

Add.

latency latency latency

abs

acr

C

E

C

1

Rd

adc

add

Rd, Rx, Ry

Rd, Rs

Rd, Rx,

E

C

Add shifted.

1

(Ry << sa)

Add signed halfwords.

Rd, Rx<part>,

Ry<part>

addhh.w

(32 ← 16 + 16)

addabs

cp.b

E

C

E

C

E

C

E

Rd, Rx, Ry

Rd, Rs

Add with absolute value.

Compare byte.

1

cp.h

Rd, Rs

Compare halfword.

Rd, Rs

cp.w

Rd, imm

Rd

Compare.

cpc

Compare with carry.

Rd, Rs

max

min

neg

Rd, Rx, Ry

Rd

Return signed maximum.

Return signed minimum.

Two’s Complement.

Rd, Rs

rsub

Reverse subtract.

Subtract with carry.

Rd, Rs, k8

Rd, Rx, Ry

sbc

scr

Subtract carry from

register.

C

E

Rd

1

Rd, Rs

Rd, Rx,

(Ry << sa)

sub

Subtract.

C

E

Rd, imm

1

Rd, imm

Rd, Rs, imm


Subtract signed

halfwords

Rd, Rx<part>,

subhh.w

C

1

Ry<part>

Rd, imm

Rd

(32 ← 16 - 16)

Subtract immediate if

condition satisfied.

sub{cond4}

tnbz

E

C

1

Test no byte equal to

zero.

C

E

C

Rd, Rs

1

and

Rd, Rx, Ry << sa

Rd, Rx, Ry >> sa

Rd, Rs

Logical AND.

andn

Logical AND NOT.

Logical AND High

Halfword, low halfword is

unchanged.

E

Rd, imm

1

andh

andl

Logical AND High

Halfword, clear other

halfword.

Rd, imm, COH

Rd, imm

Logical AND Low

Halfword, high halfword

is unchanged.

Logical AND Low

Halfword, clear other

halfword.

E

C

Rd, imm, COH

Rd

1

One’s Complement

(NOT).

com

eor

C

E

Rd, Rs

1

Rd, Rx, Ry << sa

Rd, Rx, Ry >> sa

Logical Exclusive OR.

Logical Exclusive OR

(High Halfword).

eorh

eorl

E

Rd, imm

1

Logical Exclusive OR

(Low Halfword).

C

E

Rd, Rs

1

or

Rd, Rx, Ry << sa

Rd, Rx, Ry >> sa

Logical (Inclusive) OR.

Logical OR (High

Halfword).

orh

E

Rd, imm

1

Logical OR (Low

Halfword).

orl

E

C

E

Rd, imm

1

tst

Rd, Rs

Test register for zero.

Insert the lower w5 bits of

Rs in Rd at bit-offset o5.

bfins

Rd, Rs, o5, w5


Extract and sign-extend

the w5 bits in Rs starting

at bit-offset o5 to Rd.

bfexts

bfextu

E

Rd, Rs, o5, w5

1

Extract and zero-extend

the w5 bits in Rs starting

at bit-offset o5 to Rd.

bld

E

C

E

Rd, b5

Rd

Bit load.

1

brev

bst

Bit reverse.

Bit store.

Rd, b5

Typecast byte to signed

word.

casts.b

casts.h

castu.b

castu.h

C

Rd

1

Typecast halfword to

signed word.

Typecast byte to

unsigned word.

Typecast halfword to

unsigned word.

cbr

C

E

C

Rd, b5

Rd, Rs

Rd, b5

Rd

Clear bit in register.

Count leading zeros.

Set bit in register.

1

clz

sbr

swap.b

Swap bytes in register.

Swap bytes in each

halfword.

swap.bh

swap.h

C

Rd

1

Swap halfwords in

register.

E

C

E

C

E

C

Rd, Rx, Ry

Rd, Rs, sa

Rd, sa

1

Arithmetic shift right

(signed).

asr

lsl

Rd, Rx, Ry

Rd, Rs, sa

Rd, sa

Logical shift left.

Rd, Rx, Ry

Rd, Rs, sa

Rd, sa

lsr

Logical shift right.

rol

Rd

Rotate left through carry.

Rotate right through

carry.

ror

C

Rd

1

C

E

C

Rd, imm

Rd, Rs

1

Load immediate into

register.

mov

Copy register.


Table 10-3. Timing of ALU operations

Copy register if condition

is true.

E

Rd, Rs

1

mov{cond4}

Load immediate into

register if condition is

true.

E

Rd, imm

csrf

C

b5

Rd

Clear status register flag.

1

Copy status register flag

to C and Z.

csrfcz

ssrf

Set status register flag.

Conditionally set register

to true or false.

sr{cond4}
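The bit-field entries in Table 10-3 (bfextu, bfexts, bfins) can be illustrated with a minimal Python sketch. It assumes that the bit-offset o5 counts from the least significant bit, that w5 is at least 1, and that registers are modelled as 32-bit unsigned integers. The function names simply mirror the mnemonics; they are illustrative and not part of any toolchain.

```python
MASK32 = 0xFFFFFFFF

def bfextu(rs, o5, w5):
    """Extract w5 bits of rs starting at bit-offset o5, zero-extended."""
    return (rs >> o5) & ((1 << w5) - 1)

def bfexts(rs, o5, w5):
    """Extract w5 bits of rs starting at bit-offset o5, sign-extended to 32 bits."""
    field = bfextu(rs, o5, w5)
    sign = 1 << (w5 - 1)              # top bit of the extracted field
    return ((field ^ sign) - sign) & MASK32

def bfins(rd, rs, o5, w5):
    """Insert the lower w5 bits of rs into rd at bit-offset o5."""
    mask = ((1 << w5) - 1) << o5      # bits of rd that get replaced
    return ((rd & ~mask) | ((rs << o5) & mask)) & MASK32
```

For example, under these assumptions bfextu(0xABCD1234, 8, 8) selects the second-lowest byte of the source register.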

10.5 Multiply16 operations

These instructions require one pass through the multiplier array and produce a 32-bit result. For mulrndhh, a rounding value of 0x8000 is added to the product producing the final result. This group does not set any flags, except for the mulsat instructions which set Q if saturation occurred. The Q flag is a sticky flag, so subsequent instructions will not stall due to Q flag dependencies.
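The rounding step and the sticky behaviour of Q can be sketched in Python. This is a hedged illustration, not the hardware definition: the 0x8000 rounding value comes from the text above, while the fractional semantics in mulsat_frac (doubling the product and saturating to 0x7FFFFFFF) are an assumption about Q15 arithmetic, and all names are hypothetical.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def mul_round(a, b):
    """16 x 16 signed product with the 0x8000 rounding value added."""
    return (to_signed16(a) * to_signed16(b) + 0x8000) & 0xFFFFFFFF

class Flags:
    def __init__(self):
        self.q = False  # sticky: this model only ever sets it

def mulsat_frac(a, b, flags):
    """Fractional multiply returning a saturated 32-bit word (assumed semantics)."""
    product = (to_signed16(a) * to_signed16(b)) << 1
    if product > 0x7FFFFFFF:          # only -1.0 * -1.0 overflows
        flags.q = True
        product = 0x7FFFFFFF
    return product & 0xFFFFFFFF
```

Because Q is only ever set by these operations, a later non-saturating multiply leaves it unchanged, which is why no instruction needs to wait for it.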

Table 10-4. Timing of Multiply16 operations

Mnemonics

Operands

Description

Issue latency

Result latency

Flag latency

mul

E

Rd, Rs, imm

Multiply immediate.

1

2

N/A

Signed Multiply of

halfwords.

Rd, Rx<part>,

Ry<part>

mulhh.w

mulnhh.w

mulnwh.d

mulwh.d

(32 ← 16 x 16)

Signed Multiply of

halfwords.

Rd, Rx<part>,

Ry<part>

E

1

2

N/A

(32 ← 16 x 16)

Signed Multiply, word

and halfword.

2 + delayed wb

Rd, Rx, Ry<part>

(48 ← 32 x 16)

Signed Multiply, word

and halfword.

2 + delayed wb

(48 ← 32 x 16)

Fractional signed

multiply with saturation.

Return halfword.

Rd, Rx<part>,

Ry<part>

mulsathh.h

mulsathh.w

E

1

2

Q: 3

(16 ← 16 x 16)

Fractional signed

multiply with saturation.

Return word.

Rd, Rx<part>,

Ry<part>

(32 ← 16 x 16)


Table 10-4. Timing of Multiply16 operations

Fractional signed

multiply with saturation.

Return word.

mulsatwh.w

E

Rd, Rx, Ry<part>

1

2

Q: 3

(32 ← 32 x 16)

Signed multiply with

rounding. Return

halfword.

Rd, Rx<part>,

Ry<part>

mulsatrndhh.h

mulsatrndwh.w

(16 ← 16 x 16)

Signed multiply with

rounding. Return

halfword.

Rd, Rx, Ry<part>

(32 ← 32 x 16)

10.6 Mac16 operations

These instructions require one pass through the multiplier array and produce a 32-bit result. This result is added to an accumulator register. A valid copy of this accumulator may be cached in the accumulator cache. Otherwise, an extra cycle is needed to read the accumulator from the register file. Therefore, issue and result latencies depend on whether the accumulator is cached in the AccCache.

The machh.d and macwh.d instructions use a 48-bit accumulator. The accumulator in the MUL pipeline is wide enough to perform a 48-bit accumulation in a single cycle. The requirements for machh.d and macwh.d are listed separately below. In these two instructions, the high part of the result is written back first, contrary to the other doubleword instructions. The low part of the result is written back when the MUL write port is idle. This implies that other MUL instructions may complete before the low part of a machh.d or macwh.d is written back. Hardware interlocks guarantee correct execution in this case, so no hazards will occur.

This group does not set any flags, except for the macsat instruction, which sets Q if saturation occurred. The Q flag is a sticky flag, so subsequent instructions will not stall due to Q flag dependencies. If saturation occurred, the Q flag is set after 3 or 4 cycles, depending on whether the accumulator cache hits.
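The dependence on the accumulator cache can be summarised in a small Python sketch. The base values (issue 1, result 2, Q flag 3 on a hit, each one cycle longer on a miss) are taken from the text above and Table 10-5; the function name and dictionary layout are hypothetical, and the sketch assumes the saturating variant shares the issue latency of the other Mac16 instructions.

```python
def mac16_latency(acc_cached, saturating=False):
    """Return Mac16 latencies in cycles for an accumulator cache hit or miss."""
    extra = 0 if acc_cached else 1    # one extra cycle to read the register file
    latency = {"issue": 1 + extra, "result": 2 + extra}
    if saturating:
        latency["q_flag"] = 3 + extra  # Q is set after 3 or 4 cycles
    return latency
```

The delayed writeback of the low part of machh.d and macwh.d is not modelled here, since it depends on when the MUL write port becomes idle.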


Table 10-5. Timing of Mac16 operations

| Mnemonics | | Operands | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| machh.w | E | Rd, Rx<part>, Ry<part> | Multiply signed halfwords and accumulate. (32 ← 16x16 + 32) | 1/2 | 2/3 | N/A |
| machh.d | E | Rd, Rx<part>, Ry<part> | Multiply signed halfwords and accumulate. (48 ← 16x16 + 48) | 1/2 | 2/3 + delayed wb | N/A |
| macwh.d | E | Rd, Rx, Ry<part> | Multiply signed word and halfword and accumulate. (48 ← 32 x 16 + 48) | 1/2 | 2/3 + delayed wb | N/A |
| macsathh.w | E | Rd, Rx<part>, Ry<part> | Fractional signed multiply accumulate with saturation. Return word. (32 ← 16 x 16 + 32) | 1/2 | 2/3 | Q: 3/4 |

10.7 MulMac32 operations

These instructions require two passes through the multiplier array to produce a 32-bit result. For mac, a valid copy of this accumulator may be cached in the accumulator cache. Otherwise, an extra cycle is needed to read the accumulator from the register file. Therefore, issue and result latencies depend on whether a valid entry is found in the accumulator cache.

Table 10-6. Timing of MulMac32 operations

| Mnemonics | | Operands | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| mac | E | Rd, Rx, Ry | Multiply accumulate. (32 ← 32x32 + 32) | 2/3 | 3/4 | N/A |
| mul | E | Rd, Rx, Ry | Multiply. (32 ← 32 x 32) | 2 | 3 | N/A |

10.8 MulMac64 operations

These instructions require two passes through the multiplier array to produce a 64-bit result. For macs and macu, a valid copy of this accumulator may be cached in the accumulator cache. Otherwise, an extra cycle is needed to read the accumulator from the register file. Therefore, issue and result latencies depend on whether a valid entry is found in the accumulator cache. The low part of the result is written back 1 cycle before the high part, and the result latencies presented are for the low part of the result.

Table 10-7. Timing of MulMac64 operations

| Mnemonics | | Operands | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| macs.d | E | Rd, Rx, Ry | Multiply signed accumulate. (64 ← 32x32 + 64) | 3/4 | 4/5 | N/A |
| macu.d | E | Rd, Rx, Ry | Multiply unsigned accumulate. (64 ← 32x32 + 64) | 3/4 | 4/5 | N/A |
| muls.d | E | Rd, Rx, Ry | Signed Multiply. (64 ← 32 x 32) | 3 | 4 | N/A |
| mulu.d | E | Rd, Rx, Ry | Unsigned Multiply. (64 ← 32 x 32) | 3 | 4 | N/A |

10.9 Divide operations

These instructions require several cycles in the multiply pipeline to complete. The quotient (Q) is written back 1 cycle before the remainder (R).

Table 10-8. Timing of divide operations

| Mnemonics | | Operands | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| divs | E | Rd, Rx, Ry | Divide signed. (32 ← 32/32) (32 ← 32%32) | 33 | Q: 33, R: 34 | N/A |
| divu | E | Rd, Rx, Ry | Divide unsigned. (32 ← 32/32) (32 ← 32%32) | 33 | Q: 33, R: 34 | N/A |

10.10 Saturate operations

The saturate instructions use both the A1 and A2 stages to produce a valid result. Flags are forwarded so that they are ready for the following instruction to use.
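For readers less familiar with the operation itself, the following Python sketch shows what a saturated add computes, as opposed to a wrapping add. It is an illustration under stated assumptions: operands are treated as signed Python integers, the returned boolean stands in for the saturation indication, and the function names are hypothetical, merely echoing the satadd.w and satadd.h mnemonics.

```python
def _saturate(total, bits):
    """Clamp total to the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clamped = max(lo, min(hi, total))
    return clamped, clamped != total

def satadd_w(rx, ry):
    """Signed 32-bit saturating add: returns (result, saturated)."""
    return _saturate(rx + ry, 32)

def satadd_h(rx, ry):
    """Signed 16-bit saturating add: returns (result, saturated)."""
    return _saturate(rx + ry, 16)
```

The clamp is the extra step that a plain add does not need, which fits the text's statement that these instructions use both the A1 and A2 stages.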

Table 10-9. Timing of saturate operations

| Mnemonics | | Operands | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| satadd.h | E | Rd, Rx, Ry | Saturated add halfwords. | 1 | 2 | 1 |
| satadd.w | E | Rd, Rx, Ry | Saturated add. | 1 | 2 | 1 |
| satsub.h | E | Rd, Rx, Ry | Saturated subtract halfwords. | 1 | 2 | 1 |
| satsub.w | E | Rd, Rx, Ry; Rd, Rs, imm | Saturated subtract. | 1 | 2 | 1 |
| satrnds | E | Rd >> sa, b5 | Signed saturate from bit given by sa after a right shift with rounding of b5 bit positions. | 1 | 2 | 1 |
| satrndu | E | Rd >> sa, b5 | Unsigned saturate from bit given by sa after a right shift with rounding of b5 bit positions. | 1 | 2 | 1 |
| sats | E | Rd >> sa, b5 | Shift sa positions and do signed saturate from bit given by b5. | 1 | 2 | 1 |
| satu | E | Rd >> sa, b5 | Shift sa positions and do unsigned saturate from bit given by b5. | 1 | 2 | 1 |

10.11 Load and store operations

This group includes all the load and store instructions. The LS pipeline has a dedicated adder with an operand shift functionality, which performs all the address calculations except the ones needed for indexed addressing. The additions needed in indexed addressing are performed by the adder in the A1 stage. The A1 adder also performs the writeback address calculation for autoincrement and autodecrement operation.

Loaded word data are available directly after the D pipestage. Byte and halfword data must be extended and rotated before they are valid. This is performed in the WB stage. Ldins and ldswp instructions also require modification in the WB stage before their results are valid. Stswp instructions require modification before their data is output to the cache. This modification is performed in the D stage. All store instructions may experience write-after-read hazards, and therefore subsequent instructions writing to the register to be stored are stalled until the store instruction has left the D stage.
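The extension and byte-swap steps mentioned above can be sketched in Python. This only illustrates the data transformations, not the pipeline stages that perform them; the helper names are hypothetical, and ldswp_sh and ldswp_uh are modelled as operating on a halfword that has already been read from memory.

```python
MASK32 = 0xFFFFFFFF

def zero_extend(value, bits):
    """Keep the low `bits` bits; upper bits of the word become zero."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Replicate bit (bits - 1) into the upper bits of a 32-bit word."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32

def swap_bytes16(value):
    """Exchange the two bytes of a halfword."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

def ldswp_sh(raw_halfword):
    """Swap bytes, then sign-extend to a word."""
    return sign_extend(swap_bytes16(raw_halfword), 16)

def ldswp_uh(raw_halfword):
    """Swap bytes, then zero-extend to a word."""
    return zero_extend(swap_bytes16(raw_halfword), 16)
```

A word load needs none of these steps, which is consistent with word data being available directly after the D pipestage.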

Loads from unaligned word addresses increase the issue and result latency by one or two cycles, depending on the alignment. Stores to unaligned word addresses increase the issue latency by one or two cycles, depending on the alignment. A load of a word-aligned doubleword increases the issue and result latency by one cycle. A store of a word-aligned doubleword increases the issue latency by one cycle.
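The alignment rules above can be collected into one helper. The sketch below is hedged in two ways: the text does not say which alignments cost one cycle and which cost two, so the unaligned word case is returned as a (min, max) range, and "word-aligned doubleword" is assumed to mean an address that is a multiple of 4 but not of 8. The function name and return layout are hypothetical.

```python
def alignment_penalty(op, size, address):
    """Return extra (min, max) cycles for 'issue' and 'result' latency.

    op is "load" or "store"; size is the access size in bytes.
    """
    none = (0, 0)
    if size == 4 and address % 4 != 0:
        extra = (1, 2)            # exact value depends on the alignment
    elif size == 8 and address % 8 != 0 and address % 4 == 0:
        extra = (1, 1)            # word-aligned doubleword
    elif address % size == 0:
        extra = none              # naturally aligned access
    else:
        raise ValueError("case not covered by the text")
    # Only loads have their result latency affected.
    return {"issue": extra, "result": extra if op == "load" else none}
```

The extra cycles would be added to the base latencies listed in Table 10-10.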

Table 10-10. Timing of load and store operations

Mnemonics

Operands

Description

Issue latency

Result latency

Flag latency

Load unsigned byte with

post-increment.

C

Rd, Rp++

1

3

N/A

Load unsigned byte with

pre-decrement.

Rd, --Rp

ld.ub

C

E

Rd, Rp[disp]

1

3

N/A

Load unsigned byte with

displacement.

Indexed Load unsigned

byte.

E

Rd, Rb[Ri<<sa]

1

3

N/A


Table 10-10. Timing of load and store operations (Continued)

Load signed byte with

displacement.

E

C

Rd, Rp[disp]

Rd, Rb[Ri<<sa]

Rd, Rp++

1

3

N/A

ld.sb

Indexed Load signed

byte.

Load unsigned halfword

with post-increment.

Load unsigned halfword

with pre-decrement.

Rd, --Rp

ld.uh

C

E

Rd, Rp[disp]

1

3

N/A

Load unsigned halfword

with displacement.

Indexed Load unsigned

halfword.

E

C

Rd, Rb[Ri<<sa]

Rd, Rp++

1

3

N/A

Load signed halfword

with post-increment.

Load signed halfword

with pre-decrement.

Rd, --Rp

ld.sh

C

E

Rd, Rp[disp]

1

3

N/A

Load signed halfword

with displacement.

Indexed Load signed

halfword.

E

C

Rd, Rb[Ri<<sa]

Rd, Rp++

1

3

2

N/A

Load word with post-

increment.

Load word with pre-

decrement.

Rd, --Rp

C

E

Rd, Rp[disp]

Rd, Rb[Ri<<sa]

1

2

N/A

Load word with

displacement.

ld.w

Indexed Load word.

Rd, Rp[

Load word with extracted

index.

E

C

1

2

N/A

Ri<part> << 2]

Load doubleword with

post-increment.

Rd, Rp++

Load doubleword with

pre-decrement.

C

E

Rd, --Rp

1

2

N/A

ld.d

Rd, Rp

Load doubleword.

Load double with

displacement.

Rd, Rp[disp]

Rd, Rb[Ri<<sa]

Indexed Load double.

Load byte with

Rd<part>,

Rp[disp]

displacement and insert

at specified byte location

in Rd.

ldins.b

E

1

3

N/A


Table 10-10. Timing of load and store operations (Continued)

Load halfword with

Rd<part>,

Rp[disp]

displacement and insert

at specified halfword

location in Rd.

ldins.h

E

1

3

N/A

Load halfword with

displacement, swap

bytes and sign-extend

ldswp.sh

ldswp.uh

ldswp.w

E

1

3

N/A

Load halfword with

displacement, swap

bytes and zero-extend

Rd, Rp[disp]

Load word with

displacement and swap

bytes.

Load with displacement

from PC.

lddpc

lddsp

C

Rd, PC[disp]

Rd, SP[disp]

Rp++, Rs

1

2

1

N/A

Load with displacement

from SP.

Store with post-

increment.

Store with pre-

decrement.

--Rp, Rs

st.b

st.d

st.h

C

E

Rp[disp], Rs

Rb[Ri<<sa], Rs

1

N/A

Store byte with

displacement.

Indexed Store byte.

Store with post-

increment.

C

Rp++, Rs

1

N/A

Store with pre-

decrement.

C

E

C

--Rp, Rs

1

N/A

Rp, Rs

Store doubleword

Store double with

displacement

Rp[disp], Rs

Rb[Ri<<sa], Rs

Rp++, Rs

Indexed Store double.

Store with post-

increment.

Store with pre-

decrement.

C

--Rp, Rs

1

N/A

C

E

Rp[disp], Rs

Rb[Ri<<sa], Rs

1

N/A

Store halfword with

displacement.

Indexed Store halfword.


Table 10-10. Timing of load and store operations (Continued)

Store with post-

increment.

C

Rp++, Rs

--Rp, Rs

1

N/A

Store with pre-

decrement.

st.w

C

E

Rp[disp], Rs

Rb[Ri<<sa], Rs

1

N/A

Store word with

displacement.

Indexed Store word.

Conditional store with

displacement.

stcond

stdsp

E

C

Rp[disp], Rs

SP[disp], Rs

1

N/A

Store with displacement

from SP.

Combine halfwords to

word and store with

displacement

Rp[disp<<2], Rx,

Ry

E

1

N/A

sthh.w

Rb[Ri<<sa], Rx,

Ry

Combine halfwords to

word and store indexed

Swap bytes and store

halfword with

displacement.

stswp.h

stswp.w

Rp[disp], Rs

Swap bytes and store

word with displacement.

10.12 Load and store multiple operations

These instructions perform multiple data accesses. The writeback pointer is calculated by the A1 adder if needed, and the incremental pointer addresses are generated by the DA adder under control of the LS pipeline control FSM. This FSM enables the LS pipe to operate decoupled from the rest of the pipeline.

As the table shows, the updated pointer is available after the instruction has left the A1 stage. If PC is specified for a load, and the return stack is empty, a 6-cycle penalty is taken, as the pipeline must be flushed. If enough registers are specified in the register list, this PC load penalty will be masked by the regular register loads.

Store multiple instructions have the same write-after-write hazard detection as regular store instructions. Subsequent instructions writing to a register that is to be stored are stalled until the store has left the D stage.


LDM and POPM have a flag latency of 2 cycles.

Table 10-11. Timing of load and store multiple operations

Mnemonics

Operands

Description

Pointer update ready

First loaded data ready

Penalty for PC load

Load multiple registers.

R12 is tested if PC is

loaded.

ldm

E

Rp{++}, Reglist16

1

2

+6

Load multiple registers in

application context for

task switch.

ldmts

popjc

popm

pushjc

E

C

Rp{++}, Reglist16

1

2

-

+6

-

Pop Java context from

frame

Pop multiple registers

from stack. R12 is tested

if PC is popped.

Reglist8

+6

-

Push Java context to

frame

Push multiple registers to

stack.

pushm

stm

C

E

Reglist8

1

-

{--}Rp, Reglist16

Store multiple registers.

Store multiple registers in

application context for

task switch.

stmts

E

{--}Rp, Reglist16

1

-

10.13 Branch operations

The branch instructions are subject to branch prediction. This implies that the latencies related to branches depend on whether the prefetch unit correctly predicted the outcome of the branch, and whether it had time to prefetch the branch target. The rjmp instruction is unconditional, and always taken. It can never be predicted incorrectly.

The ret instruction has dedicated return stack hardware. The return address of call instructions is pushed on a 4-entry loop stack. When a ret instruction is encountered and predicted taken, the top stack element is popped and the instruction fetches are redirected to this address. In a way, ret behaves very similarly to branches, except that its target address is fetched from a loop stack when predicted taken.
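The push-on-call, pop-on-ret behaviour can be modelled with a bounded stack. The sketch below is illustrative only: the depth of 4 comes from the text, but the overflow policy (silently dropping the oldest entry) and the use of None for an empty stack are assumptions, and the class name is hypothetical.

```python
from collections import deque

class ReturnStack:
    """Toy model of a fixed-depth return address stack."""

    def __init__(self, depth=4):
        self._entries = deque(maxlen=depth)

    def call(self, return_address):
        # When full, deque(maxlen=...) discards the oldest entry (assumed policy).
        self._entries.append(return_address)

    def ret(self):
        # Pop the most recent return address; None means no prediction is available.
        return self._entries.pop() if self._entries else None
```

With five nested calls, the model predicts the four innermost returns and has nothing left for the outermost one, which then has to be resolved without the return stack.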


Table 10-12. Timing of branch operations

Predicted correctly

Predicted incorrectly

Predictable

Mnemonics

br{cond3}

br{cond4}

rjmp

Operands

disp

Description

C

E

C

Branch if condition

satisfied.

disp

See chapter 10.3.2 and

chapter 10.3.3

disp

Relative jump.

Conditional return from

subroutine with move and

test of return value.

ret{cond4}

C

Rs

10.14 Call operations

Call instructions behave similarly to branches, except that the link register must be updated. Branches can therefore never be reduced to zero cycles. The relative branches and acall are always predicted, and can never be predicted incorrectly. The other call instructions are never predicted, and will therefore have to flow through the pipeline. Mcall and acall will flow through the pipeline, and the loaded target address is not ready until the WB pipestage. All correctly predicted instructions take 1 or 2 cycles, depending on their size and alignment.

Table 10-13. Timing of call operations

Predict Predict

ed

Operands /

Syntax

correct incorre Predict

Mnemonics

acall

Description

ly

ctly

6

able

No

C

E

C

E

C

disp

Application call.

Register indirect call.

Memory call.

-

icall

Rd

-

4

No

mcall

Rp[disp]

disp

-

6

No

1/2

-

4

Yes

No

rcall

Relative call.

disp

4

scall

Supervisor call.

Breakpoint.

4

breakpoint

-

4

No

10.15 Return from exception operations

These instructions are never predicted, but flow through the pipe as regular instructions. The target address is calculated when the instruction is in the A1 stage. In the following cycle, the target instruction is fetched, and the execution stream continues from there.

Table 10-14. Timing of return from exception operations

Operands /

Syntax

Issue

Result

Flag

Mnemonics

retd

Description

latency latency latency

C

Return from debug mode

Return from exception

4

N/A

rete

Return from supervisor

call

rets

C

4

N/A


10.16 Swap operation

The swap instruction performs two atomic memory accesses, first one read and then one write. Write-after-write hazards may arise for the store part of xchg. This will stall subsequent instructions writing to the register to store until the store part of xchg has left D.
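The read-then-write behaviour of the swap can be sketched in Python, with a lock standing in for the atomicity that the hardware provides. The Memory class, its methods, and the default value of 0 for untouched locations are all hypothetical; the sketch assumes the value read is returned to the destination register and the source register value is stored to the same address.

```python
import threading

class Memory:
    """Toy word-addressed memory with an atomic exchange."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()

    def read(self, address):
        with self._lock:
            return self._words.get(address, 0)

    def xchg(self, address, new_value):
        # The read and the write happen as one indivisible step.
        with self._lock:
            old = self._words.get(address, 0)
            self._words[address] = new_value & 0xFFFFFFFF
            return old
```

A typical use of such a primitive is a simple lock: a caller writes 1 with xchg and owns the lock only if the value it got back was 0.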

Table 10-15. Timing of swap operation

| Mnemonics | | Operands / Syntax | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| xchg | E | Rd, Rx, Ry | Exchange register and memory | 2 | 3 | N/A |

10.17 System register operations

This group moves data to and from the system registers. Forwarding and hazard detection are implemented for the system registers. Latencies vary depending on where the system register is placed in the system; refer to Table 2-2 on page 10 for details. Accesses to system registers in A1 take one cycle. Accesses to registers on the TCB bus have a read latency of four cycles, and writes have an issue latency of one cycle. Special care must be taken to avoid hazards when using some of these instructions; refer to Section 3.9 on page 27 for details.

Table 10-16. Timing of system register operations

Operands /

Syntax

Issue

Result

Flag

Mnemonics

Description.

latency latency latency

Move debug register to

Rd.

mfdr

E

C

Rd, SysRegNo

SysRegNo, Rs

Rs

2

4

N/A

Move system register to

Rd.

mfsr

1/2

1

1/4

4

Move Rs to debug

register.

mtdr

mtsr

Move Rs to system

register.

1

4

Move Rs to status

register.

musfr

mustr

1

2

Move status register to

Rd.

Rd

1

2

tlbr

C

Read TLB entry.

Search TLB for entry.

Write TLB entry.

1

3

N/A

tlbs

tlbw


10.18 System control operations

This group contains simple single-cycle instructions that control the behaviour of different parts of the system. Special care must be taken to avoid hazards when using the cache instruction; refer to Section 3.9.5 on page 29 for details.

Table 10-17. Timing of system control operations

Operands /

Syntax

Issue

Result

Flag

Mnemonics

Description.

latency latency latency

cache

E

C

Rp[disp], Op5

Perform cache operation

1

N/A

Invalidate the return

address stack

frs

pref

E

Rp[disp]

Op8

Prefetch cache line

Enter SLEEP mode.

Flush write buffer

1

N/A

sleep

sync

Op8

10.19 Coprocessor operations

Figure 10-1. Timing of coprocessor operations

Operands /

Issue

Result

Flag

Mnemonics

Syntax

Description.

latency

CP#, CRd, CRx,

CRy, Op

cop

E

Coprocessor operation.

1

-

N/A

CP#, CRd,

Rp[disp]

Load coprocessor

register.

Load coprocessor

register with pre-

decrement.

E

CP#, CRd, --Rp

ldc.d

5

Load coprocessor

register with indexed

addressing.

CP#, CRd,

Rb[Ri<<sa]

Load coprocessor 0

register.

ldc0.d

ldc.w

E

CRd, Rp[disp]

CP#, CRd,

Rp[disp]

Load coprocessor

register.

Load coprocessor

register with pre-

decrement.

E

CP#, CRd, --Rp

Load coprocessor

register with indexed

addressing.

CP#, CRd,

Rb[Ri<<sa]

Load coprocessor 0

register.

ldc0.w

ldcm.d

ldcm.w

E

CRd, Rp[disp]

1

5

N/A

CP#, Rp{++},

ReglistCPD8

Load multiple

coprocessor registers.

As LDM

CP#, Rp{++},

ReglistCPH8

Load multiple

coprocessor registers.


Figure 10-1. Timing of coprocessor operations (Continued)

CP#, Rp{++},

ReglistCPL8

Load multiple

coprocessor registers.

ldcm.w

mvcr.d

mvcr.w

mvrc.d

mvrc.w

E

As LDM

N/A

Move from coprocessor

to register.

CP#, Rd, CRs

CP#, CRd, Rs

2

1

4

5

Move from coprocessor

to register.

Move from register to

coprocessor.

Move from register to

coprocessor.

CP#, Rp[disp],

CRs

Store coprocessor

register.

Store coprocessor

register with post-

increment.

E

CP#, Rp++, CRs

stc.d

1

5

N/A

Store coprocessor

register with indexed

addressing.

CP#, Rb[Ri<<sa],

CRs

Store coprocessor 0

register.

stc0.d

stc.w

E

Rp[disp], CRs

CP#, Rp[disp],

CRs

Store coprocessor

register.

Store coprocessor

register with post-

increment.

E

CP#, Rp++, CRs

Store coprocessor

register with indexed

addressing.

CP#, Rb[Ri<<sa],

CRs

Store coprocessor 0

register.

stc0.w

stcm.d

stcm.w

E

Rp[disp], CRs

N/A

CP#, {--}Rp,

ReglistCPD8

Store multiple

coprocessor registers.

As STM

+1

As STM

+1

CP#, {--}Rp,

ReglistCPH8

Store multiple

coprocessor registers.

As STM

+1

As STM

+1

CP#, {--}Rp,

ReglistCPL8

Store multiple

coprocessor registers.

As STM

+1

As STM

+1

10.20 Java return operation

Table 10-18. Timing of Java return operation

| Mnemonics | | Operands / Syntax | Description | Issue latency | Result latency | Flag latency |
|---|---|---|---|---|---|---|
| retj | C | | Return from Java trap. | 4 | N/A | N/A |


10.21 SIMD operations

This group comprises instructions operating on multiple data in parallel. Some instructions in this group take one cycle to execute, and the result is available for use by the following instruction. Other instructions perform saturation in A2, and need two cycles before the result is ready.
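The difference between a plain packed add and a packed add with saturation can be sketched in Python. The lane layout (four unsigned bytes in a 32-bit word) is an assumption made for illustration, and the function names only echo the padd.b and padds.ub mnemonics; the sketch shows the arithmetic, not the pipeline.

```python
def padd_b(rx, ry):
    """Add four byte lanes independently; each lane wraps modulo 256."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((rx >> shift) & 0xFF) + ((ry >> shift) & 0xFF)
        result |= (total & 0xFF) << shift
    return result

def padds_ub(rx, ry):
    """Add four unsigned byte lanes, clamping each lane to 0xFF."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((rx >> shift) & 0xFF) + ((ry >> shift) & 0xFF)
        result |= min(total, 0xFF) << shift
    return result
```

The per-lane clamp in padds_ub is the additional work that the text attributes to the A2 stage for the saturating variants.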

Table 10-19. Timing of SIMD Operations

Operands /

Syntax

Issue

Result

Flag

Mnemonics

pabs.{sb/sh}

packsh.{ub/sb}

Description.

latency latency latency

E

Rd, Rs

Packed Absolute Value.

Pack Halfwords to Bytes.

1

Rd, Rx, Ry

Pack Words to

Halfwords.

packw.sh

E

Rd, Rx, Ry

1

padd.{b/h}

paddh.{ub/sh}

Packed Addition.

Packed Addition with

halving.

padds.{ub/sb/u

h/sh}

Packed Addition with

Saturation.

E

Rd, Rx, Ry

1

2

1

Rd, Rx<part>,

Ry<part>

Packed Halfword

Addition and Subtraction.

paddsub.h

Packed Halfword

Addition and Subtraction

with halving.

Rd, Rx<part>,

Ry<part>

paddsubh.sh

E

1

2

1

Packed Halfword

Addition and Subtraction

with Saturation.

paddsubs.{uh/

sh}

Rd, Rx<part>,

Ry<part>

Packed Halfword

Addition with Crossed

Operand.

paddx.h

Rd, Rx, Ry

Packed Halfword

Addition with Crossed

Operand and Halving.

paddxh.sh

Packed Halfword

paddxs.{uh/sh}

pasr.{b/h}

E

Rd, Rx, Ry

Rd, Rs, {sa}

Addition with Crossed

Operand and Saturation.

1

2

1

Packed Arithmetic Shift

Left.

pavg.{ub/sh}

plsl.{b/h}

E

Rd, Rx, Ry

Rd, Rs, {sa}

Rd, Rx, Ry

Packed Average.

1

Packed Logic Shift Left.

Packed Logic Shift Right.

Packed Maximum Value.

Packed Minimum Value.

plsr.{b/h}

pmax.{ub/sh}

pmin.{ub/sh}

Sum of Absolute

Differences.

psad

E

Rd, Rx, Ry

1

2

1

2

1

psub.{b/h}

psubadd.h

Packed Subtraction.

Rd, Rx<part>,

Ry<part>

Packed Halfword

Subtraction and Addition.


Table 10-19. Timing of SIMD Operations

Packed Halfword

Subtraction and Addition

with halving.

Rd, Rx<part>,

Ry<part>

psubaddh.sh

E

1

2

1

Rd, Rx<part>,

Ry<part>

Packed Halfword

Subtraction and Addition

with Saturation.

psubadds.{uh/

sh}

Packed Subtraction with

halving.

psubh.{ub/sh}

E

Rd, Rx, Ry

1

2

1

psubs.{ub/sb/u

h/sh}

Packed Subtraction with

Saturation.

Packed Halfword

Subtraction with Crossed

Operand.

psubx.h

E

Rd, Rx, Ry

1

Packed Halfword

Subtraction with Crossed

Operand and Halving.

psubxh.sh

psubxs.{uh/sh}

Packed Halfword

Subtraction with Crossed

Operand and Saturation.

E

Rd, Rx, Ry

1

2

1

punpck{ub/sb}.

h

Unpack Bytes to

Halfwords.

Rd, Rs<part>


11. Glossary

The following abbreviations and terms are used in this document.

Recoverable Exception: An exception that saves enough state so that normal program execution can resume after the exception routine has finished.

Processor Consistency: A strict processor consistency is maintained if only recoverable exceptions can occur. Otherwise, the processor has a weak consistency.

Instruction Commit: An instruction is said to be committed when it has updated the processor state.

Processor State: The processor state is comprised of the following modules:

• The register file

• The status register

• The system registers

• The coprocessors

• The MMU

• The debug hardware

Contaminated instruction: An instruction flowing through the pipeline that has somehow caused an exception. If such an instruction is about to commit, the exception routine will be entered. An example of a contaminated instruction is an instruction that caused a protection violation when it was fetched. Contaminated instructions will not always generate exceptions; they may for example be flushed from the pipe by branches further down the pipe.

BHT: Branch History Table

BTB: Branch Target Buffer

HUM: See Hit under miss

Icache: Instruction cache

Dcache: Data cache

Frozen instruction: Instruction halted in a pipeline stage due to some kind of data hazard

Nexus: The IEEE-ISTO 5001™-2003 debug standard for embedded processors.

OCD: On-Chip Debug

AUX: The Nexus-defined Auxiliary port for trace debug information.

API: Application Program Interface

JTAG: Joint Test Action Group, i.e. IEEE1149.1 standard

FCU: Flow Control Unit

MIU: Memory Interface Unit

JVM: Java Virtual Machine

Debug Mode: A CPU mode dedicated to executing instructions for debug purposes.

Monitor Mode: Debug Mode running from instructions fetched from memory.

OCD Mode: Debug Mode running from instructions entered by an external debugger.


12. Revision History

12.1 Rev. 32001A-06/06

1. Initial version.


Table of contents

1

2

3

Introduction .............................................................................................. 2

1.1

1.2

1.3

1.4

1.5

1.6

The AVR family ..............................................................................................2

The AVR32 Microprocessor Architecture ......................................................2

Event handling ...............................................................................................3

Java Support ..................................................................................................3

Microarchitectures ..........................................................................................4

The AVR32 AP implementation .....................................................................5

Programming Model ................................................................................ 6

2.1

2.2

2.3

2.4

2.5

2.6

Architectural compatibility ..............................................................................6

Implementation options ..................................................................................6

Register file configuration ..............................................................................6

Status register configuration ..........................................................................7

System registers ..........................................................................................10

Configuration Registers ...............................................................................16

Pipeline ................................................................................................... 21

3.1

Overview ......................................................................................................21

Prefetch unit .................................................................................................21

Decode unit ..................................................................................................22

ALU pipeline .................................................................................................22

Multiply pipeline ...........................................................................................23

Load-store pipeline ......................................................................................24

Writeback .....................................................................................................25

Forwarding hardware and hazard detection ................................................25

Hazards not handled by the hardware .........................................................27

Event handling .............................................................................................30

Entry points for events .................................................................................32

Interrupt latencies ........................................................................................46

Processor consistency .................................................................................47

3.2

3.3

3.4

3.5

3.6

3.7

3.8

3.9

3.10

3.11

3.12

3.13

4

5

Virtual memory ....................................................................................... 48

4.1

4.2

4.3

Memory map ................................................................................................48

Understanding the MMU ..............................................................................50

Operation of the MMU and MMU exceptions ...............................................60

Prefetch Unit ........................................................................................... 63

i

32001A–AVR32–06/06

5.1

5.2

Instruction buffer ..........................................................................................63

Branch prediction .........................................................................................64

6

7

Instruction Cache ................................................................................... 69

6.1

6.2

6.3

6.4

Behaviour .....................................................................................................69

Cache operations .........................................................................................70

Memory coherency ......................................................................................71

Debug access to ICache memories .............................................................72

Data Cache and Write Buffer ................................................................ 74

7.1

7.2

7.3

7.4

7.5

7.6

Data cache behaviour ..................................................................................74

Write buffer behaviour ..................................................................................75

Cache and write buffer operations ...............................................................75

Prefetch instruction ......................................................................................77

Sync instructions ..........................................................................................77

Memory mapped cache memories ...............................................................77

8 Coprocessor interface
8.1 Coprocessor pipeline
8.2 TCB specification
8.3 Connecting coprocessors to the TCB bus
8.4 Execution of coprocessor instructions
8.5 Timing diagrams

9 OCD system
9.1 Overview
9.2 CPU Development Support
9.3 Debug Port
9.4 Breakpoints
9.5 Program trace
9.6 Data Trace
9.7 Ownership Trace
9.8 Memory Interface
9.9 OCD Message Summary
9.10 OCD Register Summary

10 Instruction cycle summary
10.1 Validity of timing information
10.2 Definitions
10.3 Special considerations
10.4 ALU Operations
10.5 Multiply16 operations
10.6 Mac16 operations
10.7 MulMac32 operations
10.8 MulMac64 operations
10.9 Divide operations
10.10 Saturate operations
10.11 Load and store operations
10.12 Load and store multiple operations
10.13 Branch operations
10.14 Call operations
10.15 Return from exception operations
10.16 Swap operation
10.17 System register operations
10.18 System control operations
10.19 Coprocessor operations
10.20 Java return operation
10.21 SIMD operations

11 Glossary

12 Revision History
12.1 Rev. 32001A-06/06


Atmel Corporation

2325 Orchard Parkway
San Jose, CA 95131, USA
Tel: 1(408) 441-0311
Fax: 1(408) 487-2600

Regional Headquarters

Europe
Atmel Sarl
Route des Arsenaux 41
Case Postale 80
CH-1705 Fribourg
Switzerland
Tel: (41) 26-426-5555
Fax: (41) 26-426-5500

Asia
Room 1219
Chinachem Golden Plaza
77 Mody Road Tsimshatsui
East Kowloon
Hong Kong
Tel: (852) 2721-9778
Fax: (852) 2722-1369

Japan
9F, Tonetsu Shinkawa Bldg.
1-24-8 Shinkawa
Chuo-ku, Tokyo 104-0033
Japan
Tel: (81) 3-3523-3551
Fax: (81) 3-3523-7581

Atmel Operations

Memory
2325 Orchard Parkway
San Jose, CA 95131, USA
Tel: 1(408) 441-0311
Fax: 1(408) 436-4314

Microcontrollers
2325 Orchard Parkway
San Jose, CA 95131, USA
Tel: 1(408) 441-0311
Fax: 1(408) 436-4314

La Chantrerie
BP 70602
44306 Nantes Cedex 3, France
Tel: (33) 2-40-18-18-18
Fax: (33) 2-40-18-19-60

ASIC/ASSP/Smart Cards
Zone Industrielle
13106 Rousset Cedex, France
Tel: (33) 4-42-53-60-00
Fax: (33) 4-42-53-60-01

1150 East Cheyenne Mtn. Blvd.
Colorado Springs, CO 80906, USA
Tel: 1(719) 576-3300
Fax: 1(719) 540-1759

Scottish Enterprise Technology Park
Maxwell Building
East Kilbride G75 0QR, Scotland
Tel: (44) 1355-803-000
Fax: (44) 1355-242-743

RF/Automotive
Theresienstrasse 2
Postfach 3535
74025 Heilbronn, Germany
Tel: (49) 71-31-67-0
Fax: (49) 71-31-67-2340

1150 East Cheyenne Mtn. Blvd.
Colorado Springs, CO 80906, USA
Tel: 1(719) 576-3300
Fax: 1(719) 540-1759

Biometrics/Imaging/Hi-Rel MPU/High Speed Converters/RF Datacom
Avenue de Rochepleine
BP 123
38521 Saint-Egreve Cedex, France
Tel: (33) 4-76-58-30-00
Fax: (33) 4-76-58-34-80

Literature Requests
www.atmel.com/literature

Disclaimer: The information in this document is provided in connection with Atmel products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Atmel products. EXCEPT AS SET FORTH IN ATMEL’S TERMS AND CONDITIONS OF SALE LOCATED ON ATMEL’S WEB SITE, ATMEL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ATMEL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF ATMEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Atmel makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Atmel does not make any commitment to update the information contained herein. Atmel’s products are not intended, authorized, or warranted for use as components in applications intended to support or sustain life.

tered trademarks or trademarks of Atmel Corporation or its subsidiaries. Other terms and product names may be trademarks of others.
