HD6437041 [ETC]

SuperH RISC Engine SH-DSP Software Application Notes/Q&A

Model: HD6437041
Manufacturer: ETC
Description: SuperH RISC Engine SH-DSP Software Application Notes/Q&A
File: 124 pages total (file size: 407K)

To all our customers

Regarding the change of names mentioned in the document, such as Hitachi Electric and Hitachi XX, to Renesas Technology Corp.

The semiconductor operations of Mitsubishi Electric and Hitachi were transferred to Renesas Technology Corporation on April 1st 2003. These operations include microcomputer, logic, analog and discrete devices, and memory chips other than DRAMs (flash memory, SRAMs etc.)

Accordingly, although Hitachi, Hitachi, Ltd., Hitachi Semiconductors, and other Hitachi brand names are mentioned in the document, these names have in fact all been changed to Renesas Technology Corp. Thank you for your understanding. Except for our corporate trademark, logo and corporate statement, no changes whatsoever have been made to the contents of the document, and these changes do not constitute any alteration to the contents of the document itself.

Renesas Technology Home Page: http://www.renesas.com

Renesas Technology Corp.

Customer Support Dept.

April 1, 2003

Cautions

Keep safety first in your circuit designs!

1. Renesas Technology Corporation puts the maximum effort into making semiconductor products better and more reliable, but there is always the possibility that trouble may occur with them. Trouble with semiconductors may lead to personal injury, fire or property damage.

Remember to give due consideration to safety when making your circuit designs, with appropriate measures such as (i) placement of substitutive, auxiliary circuits, (ii) use of nonflammable material or (iii) prevention against any malfunction or mishap.

Notes regarding these materials

1. These materials are intended as a reference to assist our customers in the selection of the Renesas Technology Corporation product best suited to the customer's application; they do not convey any license under any intellectual property rights, or any other rights, belonging to Renesas Technology Corporation or a third party.

2. Renesas Technology Corporation assumes no responsibility for any damage, or infringement of any third-party's rights, originating in the use of any product data, diagrams, charts, programs, algorithms, or circuit application examples contained in these materials.

3. All information contained in these materials, including product data, diagrams, charts, programs and algorithms, represents information on products at the time of publication of these materials, and is subject to change by Renesas Technology Corporation without notice due to product improvements or other reasons. It is therefore recommended that customers contact Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor for the latest product information before purchasing a product listed herein.

The information described here may contain technical inaccuracies or typographical errors. Renesas Technology Corporation assumes no responsibility for any damage, liability, or other loss arising from these inaccuracies or errors.

Please also pay attention to information published by Renesas Technology Corporation by various means, including the Renesas Technology Corporation Semiconductor home page (http://www.renesas.com).

4. When using any or all of the information contained in these materials, including product data, diagrams, charts, programs, and algorithms, please be sure to evaluate all information as a total system before making a final decision on the applicability of the information and products. Renesas Technology Corporation assumes no responsibility for any damage, liability or other loss resulting from the information contained herein.

5. Renesas Technology Corporation semiconductors are not designed or manufactured for use in a device or system that is used under circumstances in which human life is potentially at stake. Please contact Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor when considering the use of a product contained herein for any specific purposes, such as apparatus or systems for transportation, vehicular, medical, aerospace, nuclear, or undersea repeater use.

6. The prior written approval of Renesas Technology Corporation is necessary to reprint or reproduce in whole or in part these materials.

7. If these products or technologies are subject to the Japanese export control restrictions, they must be exported under a license from the Japanese government and cannot be imported into a country other than the approved destination.

Any diversion or reexport contrary to the export control laws and regulations of Japan and/or the country of destination is prohibited.

8. Please contact Renesas Technology Corporation for further details on these materials or the products contained therein.

SuperH RISC Engine

SH-DSP Software

Application Note

ADE-502-069

Rev. 1.0

9/21/1999

Hitachi, Ltd.

Cautions

1. Hitachi neither warrants nor grants licenses of any rights of Hitachi’s or any third party’s patent, copyright, trademark, or other intellectual property rights for information contained in this document. Hitachi bears no responsibility for problems that may arise with third party’s rights, including intellectual property rights, in connection with use of the information contained in this document.

2. Products and product specifications may be subject to change without notice. Confirm that you have received the latest product standards or specifications before final design, purchase or use.

3. Hitachi makes every attempt to ensure that its products are of high quality and reliability. However, contact Hitachi’s sales office before using the product in an application that demands especially high quality and reliability or where its failure or malfunction may directly threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear power, combustion control, transportation, traffic, safety equipment or medical equipment for life support.

4. Design your application so that the product is used within the ranges guaranteed by Hitachi particularly for maximum rating, operating supply voltage range, heat radiation characteristics, installation conditions and other characteristics. Hitachi bears no responsibility for failure or damage when used beyond the guaranteed ranges. Even within the guaranteed ranges, consider normally foreseeable failure rates or failure modes in semiconductor devices and employ systemic measures such as fail-safes, so that the equipment incorporating Hitachi product does not cause bodily injury, fire or other consequential damage due to operation of the Hitachi product.

5. This product is not designed to be radiation resistant.

6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without written approval from Hitachi.

7. Contact Hitachi’s sales office for any questions regarding this document or Hitachi semiconductor products.

Preface

The SH-DSP is a CPU core belonging to the SuperH RISC engine family. It is a 32-bit RISC microcontroller core based on the SH-2 CPU, optimized for signal processing performance, and incorporating a DSP unit.

These application notes contain example code that makes use of the special features of the SH-DSP, as well as explanations of how to use the hardware. We hope that these application notes will be of use to programmers designing applications that make use of the DSP functions.

Note that although the operation of the example code in these application notes has been verified, you must still confirm its operation in your actual implementation.

For more information on the hardware, refer to the hardware manual for the appropriate product.

Please feel free to contact Hitachi for detailed information on development systems.

Rev.1.0, 09/99, page v of 7

SH-DSP Code Samples

These application notes contain example code written to illustrate the special features of the SH-DSP.

Figure 1 shows the format used for source code listings in these application notes. The main program code is transferred to XRAM and then executed in XRAM. This format is written for the SH7612. When using other SH-DSP models, the following modifications and cautions apply:

• XRAM starting address setting ............................................ (1)
• Vector and stack pointer (YRAM ending address + 1 byte) settings .......... (2)
• Instruction usage on other SH-DSP models ................................. (3)
• Because space for the data used by the main program is reserved in XRAM or YRAM, change the XRAM or YRAM address settings to match the microcontroller used ....... (4)

;***************************************************************************
;*  Symbol definition
;***************************************************************************
;
XRAM_TOP    .EQU    H'1000E000          ;XRAM address (SH7612) ------------ (1)
;***************************************************************************
;*  Program transfer routine
;***************************************************************************
            .SECTION VECT,CODE,LOCATE=H'0
;
            .DATA.L _PRES               ;_PRES
            .DATA.L H'10020000          ;SP ------------------------------- (2)
            .SECTION ROM,CODE,LOCATE=H'1000
_PRES:      MOV.L   #XRAM_TOP,R1
            MOV.L   #MAIN,R10
            MOV.L   #MAIN_E,R11
PRG_MOVE:   MOV.W   @R10+,R0
            MOV.W   R0,@R1
            ADD     #2,R1
            CMP/GE  R11,R10
            BF      PRG_MOVE
            MOV.L   #XRAM_TOP,R0
            JMP     @R0                 ;Branch to program starting address
            NOP                         ;at transfer destination
;
;           Main program ------------------------------------------------ (3)
;
;           Data -------------------------------------------------------- (4)
;
            .END

Figure 1 Source Code Format
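As a rough illustration of what the program transfer routine in Figure 1 does, the C sketch below models only its copy loop on host memory: 16-bit words are copied one at a time from MAIN up to, but not including, MAIN_E, and the destination advances by one word per pass. The function name copy_words and its pointer arguments are hypothetical stand-ins for R10, R11 and R1; this is not code from the application note.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of the PRG_MOVE loop in Figure 1.
 *   src : next word to copy      (role of R10, starts at MAIN)
 *   end : first word NOT copied  (role of R11, MAIN_E)
 *   dst : next destination word  (role of R1, starts at XRAM_TOP)
 * Like the assembler loop, the body runs at least once.
 * Returns the number of 16-bit words copied. */
size_t copy_words(uint16_t *dst, const uint16_t *src, const uint16_t *end)
{
    size_t count = 0;
    do {
        *dst++ = *src++;   /* read one word from ROM, write it to RAM, advance both */
        count++;
    } while (src < end);   /* repeat while the source is still below the end address */
    return count;
}
```

After the copy, the real routine jumps to XRAM_TOP; that step has no counterpart in a host-side model.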


Contents

Section 1 Example of Calling Functions (DSP Library) from C Source Code
    1.1 C Source Code Employing Functions (DSP Library)
    1.2 Linking Assignments
        1.2.1 “prglnk1.sub” Subcommand File for Linking
        1.2.2 “ini.bat” Batch File for Creating Absolute Files
        1.2.3 “vect.src” Vector Table for “dsplbr.c” Program, which Uses DSP Library
    1.3 Function Execution Process
Section 2 X/Y Bus Data Access
    2.1 X Memory Read
    2.2 X Memory Write
    2.3 Y Memory Read
    2.4 Y Memory Write
Section 3 16-bit Fixed-point Multiplication
Section 4 Parallel Execution Instruction
Section 5 Repeat Instruction
Section 6 Examples of Arguments Passed Between CPU Instructions and DSP Instructions
Section 7 32-bit Multiplication
Section 8
Section 9 Matrix Operations
Section 10 Inner Product
Section 11 Square Root
Section 12 Square Mean Error
Section 13 Effects of DSP Instructions on Program Performance


Section 1 Example of Calling Functions (DSP Library) from C Source Code

1.1 C Source Code Employing Functions (DSP Library)

The example code below, “dsplbr.c,” illustrates calling the “Mean” function in the DSP library (shdsplib.lib) from C source code.

/* <<SH-DSP Application Notes>>
   -- DSP library usage example --
   "dsplbr.c"                                                            */

#include "ensigdsp.h"               /* Definition of Mean function    (1) */

#define N 6                         /* Input data number                  */

short dat[N] = {45, 61, 516, 3000, -974, 10214};   /* Input data          */

#pragma section X                                                  /* (2) */
static short datx[N];               /* XRAM address                       */
#pragma section Y                                                  /* (3) */
static short daty[N];               /* YRAM address                       */
#pragma section ANS
static short answer;                /* Address for storing mean value     */
#pragma section

void main(void)
{
    short i, output[1];             /* Loop variable i, and output for
                                       the Mean function calculation
                                       result                             */
    int   src_x;                    /* Argument specifying storage area
                                       for input data                     */

    for (i = 0; i < N; i++)
    {
        datx[i] = dat[i];           /* Copy input data to XRAM            */
        daty[i] = dat[i];           /* Copy input data to YRAM            */
    }

    src_x = 1;                      /* Select XRAM: use XRAM area for
                                       Mean function calculation      (4) */

    Mean(output, datx, N, src_x);   /* Pass Mean function arguments and
                                       calculate mean value          (*1) */
    answer = output[0];             /* Store Mean function calculation
                                       result at answer address           */

    while (1);                      /* Processing complete                */
}

*1 Refer to 1.3 Function Execution Process for details.


(1) The prototypes of the functions in the library shdsplib.lib are declared in the header file ensigdsp.h.

(2) For efficient X bus data transfer with the DSP unit, datx[N] must be placed in XRAM, so section X must be assigned to XRAM addresses when linking. (See 1.2 Linking Assignments.)

(3) For efficient Y bus data transfer with the DSP unit, daty[N] must be placed in YRAM, so section Y must be assigned to YRAM addresses when linking. (See 1.2 Linking Assignments.)

(4) If src_x = 1, an area in XRAM is used for Mean function calculations. If src_x = 0, an area in YRAM is used.

1.2 Linking Assignments

When using the DSP library, take great care that the section settings are correct. The example code dsplbr.c shown in section 1.1 has two DSP data sections, X and Y. If XRAM and YRAM addresses are not assigned to these sections, the functions’ internal calculations cannot be performed correctly. These addresses are assigned in the linker subcommand file.

1.2.1 “prglnk1.sub” Subcommand File for Linking

INPUT    vect,dsplbr
START    BX(1000ff00),BANS(1000fff0),BY(1001e000) ------------------ (1)
LIBRARY  shdsplib.lib ---------------------------------------------- (2)
OUTPUT
FORM

dsplbr.map

dsplbr.abs

DEBUG

EXIT

(1) BX(1000ff00) assigns #pragma section X (section X) of dsplbr.c to address H'1000FF00. BY(1001e000) assigns #pragma section Y (section Y) of dsplbr.c to address H'1001E000.

(2) This specifies shdsplib.lib, which includes the Mean function, as the library to be linked.


1.2.2 “ini.bat” Batch File for Creating Absolute Files

asmsh vect.src -cpu=shdsp -debug -lis

shc dsplbr.c -cpu=sh2 -lis -debug -include=ensigdsp.h

lnk -subcommand=prglnk1.sub

1.2.3 “vect.src” Vector Table for “dsplbr.c” Program, which Uses DSP Library

;********************************************************
;   <<SH-DSP Application Notes>>
;   -- DSP library usage example --
;   "vect.src"
;*******************************************************
        .import   _main
        .section  vect,data,locate=h'0
        .data.l   _main               ;Power-on reset: initial PC
        .data.l   h'10020000          ;Power-on reset: initial SP
        .end


1.3 Function Execution Process

Excerpts from the example code dsplbr.c shown in section 1.1, together with the assembler code generated for the function called, are shown below.

src_x = 1;
Mean(output,datx,N,src_x);        Assembler code resulting from function:

    Address     Label    Assembler
    1001e2fc    _Mean    CMP/PZ
    1001e2fe                        @1001E322:8
    1001e300             MOV        #H'01,R1
    1001e302             CMP/GT     R1,R7
       :
    1001e486             NEG        R2,R2
    1001e488             MOV.W      R2,@R4
    1001e48a             RTS
answer = output[0];

In table 1.1, the input data is arranged starting at address H'1000FF00. It is assumed that the data in RAM has been cleared to 0. The data remains the same after the function is executed.

Table 1.1 Memory Map

XRAM Memory
H'1000FF00    002D 003D 0204 0BB8
H'1000FF08    FC32 27E6 0000 0000


Table 1.2 Function Execution Process

Excerpt from dsplbr.c code:  Mean(output,datx,N,src_x);

Before execution:  R4=H'1001FFFC, R5=H'1000FF00, R6=6, R7=1
After execution:   R4=H'1001FFFC, R5=H'1000FF0C, R6=6, R7=H'10000

The function arguments are assigned to registers R4 to R7 in the order in which they are declared, so output=H'1001FFFC, datx=H'1000FF00, N=6, and src_x=1 are passed to the function. The calculation result is stored at the address held in R4 (@R4).

Table 1.3 C Source Code Execution Process (Process Inside Memory Map)

Excerpt from dsplbr.c code:  answer = output[0];

YRAM Memory
Before execution:  H'1001FF00    0000 0000 0000 0000
After execution:   H'1001FF00    0860 0000 0000 0000

The C source code then stores the function calculation result from @R4 in answer (H'1001FF00).

Table 1.4 Mean Function Calculation Result

Input Value   Input Value      Logical Value   Logical Value          Output Value
(decimal)     (hexadecimal)    (decimal)       (hexadecimal)          (hexadecimal)
45            H'2D             2143.666667     H'860 (2144            H'860
61            H'3D                             calculated as a
516           H'204                            decimal value)
3000          H'BB8
–974          H'FC32
10214         H'27E6
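Table 1.4 shows the exact mean 2143.666667 returned as H'860 (2144). The C sketch below reproduces that value under the assumption that the result is rounded to the nearest integer; the library's actual rounding and internal scaling are not documented here, and mean_rounded is a hypothetical helper, not a DSP library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not the library's Mean function):
 * average of n signed 16-bit samples, rounded to the nearest integer,
 * with ties rounded away from zero. Assumes n > 0. */
int16_t mean_rounded(const int16_t *data, size_t n)
{
    int32_t sum = 0;                  /* 32-bit sum so large inputs cannot overflow */
    for (size_t i = 0; i < n; i++)
        sum += data[i];

    int32_t count = (int32_t)n;
    int32_t half  = count / 2;
    int32_t q = (sum >= 0) ? (sum + half) / count
                           : (sum - half) / count;
    return (int16_t)q;
}
```

With the six inputs from dsplbr.c the sum is 12862, and 12862 / 6 = 2143.67, which rounds to 2144 (H'860).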


Section 2 X/Y Bus Data Access

2.1 X Memory Read

Overview

The data at the XRAM_ADD address (H'1000FF00) and the XRAM_ADD+2 address (H'1000FF02) is transferred to registers X0 and X1, respectively.

Description

Table 2.1 shows the types of X memory read instructions and the registers that can be used as operands. Data can be read from X memory using the instructions listed in table 2.1.

When data is read from X memory, the transfer data length is 16 bits, so the data is stored in the upper word (bits 31 to 16) of register X0 or X1, and the lower word (bits 15 to 0) of that register is cleared to 0. Processes (1) and (2) in the flowchart are illustrated below.

Table 2.1 X Memory Read Instruction Types

X Memory Read       Source Register   Destination Register   Index Register
Instruction         (Ax)              (Dx)                   (Ix)
MOVX.W @Ax,Dx       R4, R5            X0, X1
MOVX.W @Ax+,Dx      R4, R5            X0, X1
MOVX.W @Ax+Ix,Dx    R4, R5            X0, X1                 R8
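The 16-bit read described above amounts to placing the memory word in the upper half of a 32-bit register. This host-side C sketch assumes that .XDATA.W encodes a fraction as a 16-bit fixed-point word with 15 fraction bits (so 0.5 becomes H'4000, as in table 3.1); movx_read_model and to_fixed16 are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

/* Convert a fraction in [-1.0, 1.0) to a 16-bit fixed-point word
 * (1 sign bit, 15 fraction bits), e.g. 0.5 -> H'4000.
 * Hypothetical helper; values outside the range are not handled. */
uint16_t to_fixed16(double x)
{
    return (uint16_t)(int16_t)(x * 32768.0);
}

/* Hypothetical model of MOVX.W @Ax,Dx (also @Ax+ and @Ax+Ix):
 * the 16-bit word read from X memory goes to bits 31-16 of X0/X1,
 * and bits 15-0 are cleared to 0. */
uint32_t movx_read_model(uint16_t mem_word)
{
    return (uint32_t)mem_word << 16;
}
```

Under this model, after the main program below runs, X0 would hold H'4000 0000 (0.5) and X1 H'2000 0000 (0.25).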


[Figure: Process (1) and Process (2). XRAM memory map (XRAM_TOP, XRAM_ADD, XRAM_END) and destination register bits 31 to 0. The read data is stored in bits 31 to 16 (X0 in process (1), X1 in process (2)); bits 15 to 0 are cleared to 0.]

Flowchart

Start
    Transfer XRAM address (H'1000FF00) to register R4
(1) Read data (0.5) from R4 address (H'1000FF00) to register X0, then increment R4
(2) Read data (0.25) from R4 address (H'1000FF02) to register X1
End


Main Program

;**********************************************************************
;*  X memory read
;**********************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4        ;XRAM_ADD address -> register R4
        MOVX.W  @R4+,X0             ;(H'1000FF00) -> X0
        MOVX.W  @R4,X1              ;(H'1000FF02) -> X1
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;***************************************************************
;* Data
;***************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
XRAM_ADD:
        .XDATA.W 0.5,0.25


2.2 X Memory Write

Overview

The data at the XRAM_ADD1 address (H'1000FF00) and the XRAM_ADD1+2 address (H'1000FF02) is transferred to the XRAM_ADD2 address and the XRAM_ADD2+2 address, respectively.

Description

Table 2.2 shows the types of X memory write instructions and the registers that can be used as operands. Data can be written to X memory using the instructions listed in table 2.2.

When data is written to X memory, the transfer data length is 16 bits, so only the upper word (bits 31 to 16) of register A0 or A1 is written; the guard bits (bits 39 to 32) and the lower word (bits 15 to 0) are ignored. Because the X memory write instructions can use only registers A0 and A1 as source registers (see table 2.2), the data is first loaded into A0 or A1 by a single data transfer instruction with A0 or A1 as the destination operand. Processes (1) and (2) in the flowchart are illustrated below.

Table 2.2 X Memory Write Instruction Types

X Memory Write      Source Register   Destination Register   Index Register
Instruction         (Da)              (Ax)                   (Ix)
MOVX.W Da,@Ax       A0, A1            R4, R5
MOVX.W Da,@Ax+      A0, A1            R4, R5
MOVX.W Da,@Ax+Ix    A0, A1            R4, R5                 R8
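The write direction can be modeled the same way. In this sketch, a 40-bit accumulator value is held in the low 40 bits of a uint64_t (an assumption made for illustration), and only bits 31 to 16 are returned as the word stored to memory; movx_write_model is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of MOVX.W Da,@Ax with Da = A0 or A1.
 * The 40-bit accumulator is kept in the low 40 bits of a uint64_t:
 *   bits 39-32 guard bits, bits 31-16 upper word, bits 15-0 lower word.
 * Only the upper word is written to memory; the rest is ignored. */
uint16_t movx_write_model(uint64_t a40)
{
    return (uint16_t)((a40 >> 16) & 0xFFFFu);
}
```

For example, an accumulator value of H'FF 4000 1234 stores H'4000: the guard bits H'FF and the lower word H'1234 never reach X memory.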


[Figure: Process (1) and Process (2). XRAM memory map (XRAM_TOP, XRAM_ADD1, XRAM_ADD2, XRAM_END) and source register bits 39 to 0. Bits 31 to 16 of the source register (A0 in process (1), A1 in process (2)) are written to XRAM; the other bits are ignored.]


Flowchart

Start
    Transfer XRAM_ADD1 address (H'1000FF00) to register R2
    Transfer XRAM_ADD2 address (H'1000FF04) to register R4
(1) Transfer data (0.5) from R2 address (H'1000FF00) to register A0, then increment R2
    Transfer register A0 data to R4 address (H'1000FF04), then increment R4
(2) Transfer data (0.25) from R2 address (H'1000FF02) to register A1
    Transfer register A1 data to R4 address (H'1000FF06)
End


Main Program

;**********************************************************************
;*  X memory write
;**********************************************************************
MAIN:   MOV.L   #XRAM_ADD1,R2       ;XRAM_ADD1 -> R2 register
        MOV.L   #XRAM_ADD2,R4       ;XRAM_ADD2 -> R4 register
        MOVS.W  @R2+,A0             ;(H'1000FF00) -> A0 register
        MOVX.W  A0,@R4+             ;A0 register data -> XRAM_ADD2
        MOVS.W  @R2,A1              ;(H'1000FF02) -> A1 register
        MOVX.W  A1,@R4              ;A1 register data -> XRAM_ADD2+2
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;***************************************************************
;* Data
;***************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
XRAM_ADD1:
        .XDATA.W 0.5,0.25
XRAM_ADD2:
        .RES.W   2


2.3 Y Memory Read

Overview

The data at the YRAM_ADD address (H'1001FF00) and the YRAM_ADD+2 address (H'1001FF02) is transferred to registers Y0 and Y1, respectively.

Description

Table 2.3 shows the types of Y memory read instructions and the registers that can be used as operands. Data can be read from Y memory using the instructions listed in table 2.3.

When data is read from Y memory, the transfer data length is 16 bits, so the data is stored in the upper word (bits 31 to 16) of register Y0 or Y1, and the lower word (bits 15 to 0) of that register is cleared to 0. Processes (1) and (2) in the flowchart are illustrated below.

Table 2.3 Y Memory Read Instruction Types

Y Memory Read       Source Register   Destination Register   Index Register
Instruction         (Ay)              (Dy)                   (Iy)
MOVY.W @Ay,Dy       R6, R7            Y0, Y1
MOVY.W @Ay+,Dy      R6, R7            Y0, Y1
MOVY.W @Ay+Iy,Dy    R6, R7            Y0, Y1                 R9
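The three addressing forms in table 2.3 differ only in how the address register is updated after the access. The simplified C sketch below models that update; ay_after_access and the ay_mode values are hypothetical names, and iy stands for the value held in the Y index register.

```c
#include <stdint.h>

/* Hypothetical names for the three MOVY.W addressing forms. */
typedef enum {
    AY_INDIRECT,          /* @Ay    : address register unchanged   */
    AY_POST_INCREMENT,    /* @Ay+   : add 2 (one 16-bit word)      */
    AY_POST_INDEX         /* @Ay+Iy : add the index register value */
} ay_mode;

/* Returns the value of Ay after one word access in the given mode. */
uint32_t ay_after_access(uint32_t ay, ay_mode mode, uint32_t iy)
{
    switch (mode) {
    case AY_POST_INCREMENT:
        return ay + 2u;
    case AY_POST_INDEX:
        return ay + iy;
    case AY_INDIRECT:
    default:
        return ay;
    }
}
```

In the Y memory read program, the first read uses the post-increment form, so R6 moves from H'1001FF00 to H'1001FF02 before the second word is read.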


[Figure: Process (1) and Process (2). YRAM memory map (YRAM_TOP, YRAM_ADD, YRAM_END) and destination register bits 31 to 0. The read data is stored in bits 31 to 16 (Y0 in process (1), Y1 in process (2)); bits 15 to 0 are cleared to 0.]

Flowchart

Start
    Transfer YRAM address (H'1001FF00) to register R6
(1) Read data (0.5) from R6 address (H'1001FF00) to register Y0, then increment R6
(2) Read data (0.25) from R6 address (H'1001FF02) to register Y1
End


Main Program

;**********************************************************************
;*  Y memory read
;**********************************************************************
MAIN:   MOV.L   #YRAM_ADD,R6        ;YRAM_ADD address -> R6 register
        MOVY.W  @R6+,Y0             ;(H'1001FF00) -> Y0
        MOVY.W  @R6,Y1              ;(H'1001FF02) -> Y1
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;***************************************************************
;* Data
;***************************************************************
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
YRAM_ADD:
        .XDATA.W 0.5,0.25


2.4 Y Memory Write

Overview

The data at the YRAM_ADD1 address (H'1001FF00) and the YRAM_ADD1+2 address (H'1001FF02) is transferred to the YRAM_ADD2 address and the YRAM_ADD2+2 address, respectively.

Description

Table 2.4 shows the types of Y memory write instructions and the registers that can be used as operands. Data can be written to Y memory using the instructions listed in table 2.4.

When data is written to Y memory, the transfer data length is 16 bits, so only the upper word (bits 31 to 16) of register A0 or A1 is written; the guard bits (bits 39 to 32) and the lower word (bits 15 to 0) are ignored. Because the Y memory write instructions can use only registers A0 and A1 as source registers (see table 2.4), the data is first loaded into A0 or A1 by a single data transfer instruction with A0 or A1 as the destination operand. Processes (1) and (2) in the flowchart are illustrated below.

Table 2.4 Y Memory Write Instruction Types

Y Memory Write      Source Register   Destination Register   Index Register
Instruction         (Da)              (Ay)                   (Iy)
MOVY.W Da,@Ay       A0, A1            R6, R7
MOVY.W Da,@Ay+      A0, A1            R6, R7
MOVY.W Da,@Ay+Iy    A0, A1            R6, R7                 R9


[Figure: Process (1) and Process (2). YRAM memory map (YRAM_TOP, YRAM_ADD1, YRAM_ADD2, YRAM_END) and source register bits 39 to 0. Bits 31 to 16 of the source register (A0 in process (1), A1 in process (2)) are written to YRAM; the other bits are ignored.]


Flowchart

Start
    Transfer YRAM_ADD1 address (H'1001FF00) to register R3
    Transfer YRAM_ADD2 address (H'1001FF04) to register R6
(1) Transfer data (0.5) from R3 address (H'1001FF00) to register A0, then increment R3
    Transfer register A0 data to R6 address (H'1001FF04), then increment R6
(2) Transfer data (0.25) from R3 address (H'1001FF02) to register A1
    Transfer register A1 data to R6 address (H'1001FF06)
End


Main Program

;**********************************************************************
;*  Y memory write
;**********************************************************************
MAIN:   MOV.L   #YRAM_ADD1,R3       ;YRAM_ADD1 -> R3 register
        MOV.L   #YRAM_ADD2,R6       ;YRAM_ADD2 -> R6 register
        MOVS.W  @R3+,A0             ;(H'1001FF00) -> A0 register
        MOVY.W  A0,@R6+             ;A0 register data -> YRAM_ADD2
        MOVS.W  @R3,A1              ;(H'1001FF02) -> A1 register
        MOVY.W  A1,@R6              ;A1 register data -> YRAM_ADD2+2
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;****************************************************************
;* Data
;****************************************************************
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
YRAM_ADD1:
        .XDATA.W 0.5,0.25
YRAM_ADD2:
        .RES.W   2


Section 3 16-bit Fixed-point Multiplication

Overview

Multiplies the 16-bit data at the XRAM_ADD address (H'1000F000) by the 16-bit data at the YRAM_ADD address (H'1001F000). The result is stored at the ANS address (H'1001F002).

Description

1. Data Transfer

The data at the XRAM_ADD address (H'1000F000) and the YRAM_ADD address (H'1001F000) is transferred using X bus and Y bus data transfers, as described in section 2, X/Y Bus Data Access. In process (1) in the flowchart, the XRAM and YRAM data is read simultaneously, but no contention occurs because the X bus and Y bus are independent of each other. The format is shown below.

The X bus data transfer is written first, followed by the Y bus data transfer. When both are written in a single step, a read and a write may also be combined, as either [X memory read] [Y memory write] or [X memory write] [Y memory read].

Format: MOVX.W @R5,X1  MOVY.W @R7,Y1


2. Fixed-point Multiplication

The PMULS instruction performs the fixed-point multiplication in process (2) in the flowchart. The format is shown below, and the fixed-point multiplication process is shown in figure 3.1. Only the upper words of source 1 and source 2 are valid. For example, if the longword H'12345678 is read from a source, only H'1234 is actually multiplied.

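Format: PMULS Se,Sf,Dg

Source 1 (Se): X0, X1, Y0, A1 (only the upper word is valid)
Source 2 (Sf): Y0, Y1, X0, A1 (only the upper word is valid)
Destination (Dg): M0, M1, A0, A1

[Figure: the upper words of Se and Sf are multiplied in the MAC (multiplier) and the result is stored in Dg; the lower words of the sources are ignored. When Dg is A0 or A1, the guard bits hold the sign extension of the result.]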

Figure 3.1 Fixed-point Multiplication Process
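The multiplication described above can be checked with a host-side C model. It takes only the upper words of two 32-bit source values, multiplies them as signed 16-bit fixed-point numbers, and doubles the product so the binary point stays aligned; this reproduces the results listed in table 3.1. pmuls_model is a hypothetical name, and the result is returned as a 64-bit value so that the model itself cannot overflow.

```c
#include <stdint.h>

/* Hypothetical model of PMULS Se,Sf,Dg.
 * se, sf: 32-bit source register contents; only bits 31-16 are used.
 * The upper words are signed fixed-point values (1 sign bit, 15 fraction
 * bits). Their product is doubled so the result is a fraction with 31
 * fraction bits. */
int64_t pmuls_model(uint32_t se, uint32_t sf)
{
    int16_t a = (int16_t)(se >> 16);   /* upper word of source 1; lower word ignored */
    int16_t b = (int16_t)(sf >> 16);   /* upper word of source 2; lower word ignored */
    return (int64_t)a * b * 2;         /* binary-point alignment */
}
```

Because the lower words are discarded, pmuls_model(0x12345678, ...) gives the same result as pmuls_model(0x12340000, ...), matching the H'1234 example.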


3. Overflow

An overflow can occur during fixed-point multiplication only for the operation H'8000 (–1.0) × H'8000 (–1.0), in which case the calculation result is H'8000 0000 (–1.0) instead of +1.0. This happens only when the destination register has no guard bits, that is, when it is a register other than A0 or A1. If the destination register is A0 or A1, the result of the above calculation is the correct value, H'00 8000 0000 (1.0). Refer to table 3.1 for additional fixed-point multiplication execution examples.

Since the destination register used in the example main program is A0, no overflow occurs.

Table 3.1 Fixed-point Multiplication Execution Examples

State of           Operation Example                      Destination   Operation Result
Operation Result
Positive           H'4000 (0.5) × H'2000 (0.25)           M0, M1        H'1000 0000 (0.125)
                                                          A0, A1        H'00 1000 0000 (0.125)
Negative           H'0800 (0.0625) × H'FC00 (–0.03125)    M0, M1        H'FFC0 0000 (–1.95×10^–3)
                                                          A0, A1        H'FF FFC0 0000 (–1.95×10^–3)
Overflow           H'8000 (–1.0) × H'8000 (–1.0)          M0, M1        H'8000 0000 (–1.0)
                                                          A0, A1        H'00 8000 0000 (1.0)
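The difference between the M and A destinations in table 3.1 comes down to register width. The sketch below computes the exact product and then keeps only the low 32 bits (M0, M1) or the low 40 bits (A0, A1). It assumes the hardware result equals the exact product truncated to the destination width, which matches all three rows of the table; all function names are hypothetical.

```c
#include <stdint.h>

/* Exact product of two 16-bit fixed-point words (1 sign bit, 15 fraction
 * bits), doubled to align the binary point. Hypothetical helper. */
int64_t q15_product(uint16_t x, uint16_t y)
{
    return (int64_t)(int16_t)x * (int16_t)y * 2;
}

/* M0/M1 have 32 bits and no guard bits: keep the low 32 bits.
 * For H'8000 x H'8000 this gives H'8000 0000, which reads as -1.0. */
uint32_t to_m_register(int64_t p)
{
    return (uint32_t)p;
}

/* A0/A1 have 8 guard bits plus 32 bits: keep the low 40 bits, so the
 * sign extends into the guard bits and +1.0 is H'00 8000 0000. */
uint64_t to_a_register(int64_t p)
{
    return (uint64_t)p & 0xFFFFFFFFFFull;
}
```

For the negative row, the exact product is -H'40 0000, which appears as H'FFC0 0000 in 32 bits and H'FF FFC0 0000 in 40 bits.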


Flowchart

Start
    Transfer XRAM_ADD address (H'1000F000) to register R4
    Transfer YRAM_ADD address (H'1001F000) to register R6
    Transfer ANS address (H'1001F002) to register R7
(1) Transfer data from R4 address (H'1000F000) to register X0 and, at the same time, data from R6 address (H'1001F000) to register Y0
(2) Multiply the upper 16 bits of register X0 data and register Y0 data, and store the result in register A0
    Transfer data from register A0 to ANS address (H'1001F002)
End


Main Program

;*******************************************************************************************
;*  16-bit fixed-point multiplication routine
;*******************************************************************************************
MAIN:   MOV.L   #0,R4               ;Clear register R4
        MOV.L   #0,R6               ;Clear register R6
        MOV.L   #XRAM_ADD,R4        ;XRAM address -> register R4
        MOV.L   #YRAM_ADD,R6        ;YRAM address -> register R6
        MOV.L   #ANS,R7             ;ANS address -> register R7
        MOVX.W  @R4,X0  MOVY.W @R6,Y0   ;XRAM and YRAM address data ->
                                        ;registers X0 and Y0
        PMULS   X0,Y0,A0            ;16-bit fixed-point multiplication
        MOVY.W  A0,@R7              ;Store multiplication result
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;**************************************************************
;* Data
;**************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD:
        .XDATA.W 0.0625
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD:
        .XDATA.W 0.03125
ANS:
        .RES.W   1


Section 4 Parallel Execution Instruction

Overview

Four data values are read in sequence, two from the XRAM_ADD address (H'1000F000) and two from the YRAM_ADD address (H'1001F000), and each pair is added and multiplied. The addition results are stored starting at the ANS1 address (H'1000F004), and the multiplication results are stored starting at the ANS2 address (H'1001F004).

Description

1. Structure of Parallel Execution Instruction

The parallel execution instruction transfers data between a DSP register and X memory or Y memory while a DSP operation is being executed. Table 4.1 shows the data transfer and DSP operation structure. The parallel execution instruction comprises a DSP operation portion and a data transfer portion. Table 4.2 lists format examples for the parallel execution instruction. The DSP operation portion is normally a single instruction, such as PAND, PINC, or PSHA. However, as shown in table 4.2, it consists of two instructions when PADD is combined with PMULS or PSUB is combined with PMULS. The data transfer portion consists of two instructions: a data transfer instruction for X memory and a data transfer instruction for Y memory. Either of these data transfer instructions may also be used on its own.

Table 4.1 Data Transfer and DSP Operation Structure

Type of Data Transfer | Bus Used | Data Transfer Length | Parallel Data Transfers | Parallel Processing with DSP Operation | Instruction Length
(1) Double data transfer | X bus, Y bus | 16 bits | No: one or the other data transfer / Yes: data transfer with X memory and Y memory at same time | No | 16 bits
(2) Double data transfer | X bus, Y bus | 16 bits | No: one or the other data transfer / Yes: data transfer with X memory and Y memory at same time | Yes | 32 bits
Single data transfer | C bus*1 | 16 bits or 32 bits | – | – | 16 bits

*1: Note that the name differs depending on the product.


Table 4.2 Parallel Execution Instruction Format Examples

DSP Operation Portion | Data Transfer Portion
PADD X0,Y0,A0 PMULS X0,Y0,A1 | MOVX.W A0,@R4 MOVY.W A1,@R6
PSUB X1,Y1,A1 PMULS X0,Y1,A0 | MOVX.W @R5,X1 MOVY.W @R7,Y1
PADD X0,Y0,A0 PMULS X0,Y0,A1 | MOVX.W A0,@R4
PINC X0,A0 | MOVY.W @R6,Y1
PAND X0,Y0,A0 | MOVX.W A0,@R5
PSHA X0,Y0,A0 | MOVX.W @R4,X1 MOVY.W A1,@R7

2. Parallel Processing of Double Data Transfer and DSP Operation

Process (1) in the flowchart on the following page is a double data transfer with no DSP operation instruction executed in parallel. This is indicated as (1) in table 4.1. Processes (2) and (3) are double data transfers executed in parallel with DSP operation instructions, indicated as (2) in table 4.1. Processes (2) and (3) each consist of four instructions, which is the maximum number that can be written in a single step. Even in this case, only one execution state is used.

3. Effect of DSP Operation Portion Result on Data Transfer Portion

Table 4.3 shows the effect of the DSP operation portion result on the data transfer portion. Instruction 2 (process (3)) uses A0 and A1 both as the destination registers of its DSP operation portion and as the source registers of its data transfer portion. However, the data stored by the data transfer portion is not the result of that DSP operation portion. Instead, the data transfer portion of instruction 2 (process (3)) stores the results of the DSP operation portion of instruction 1 (process (2)).

Figure 4.1 shows the instruction 2 pipeline flow. When instructions are executed in parallel, each instruction is processed independently, as shown in figure 4.1. In this case, the DSP operation result does not become the stored data because the WB/DSP stage, in which the DSP operations (PADD and PMULS) are performed, comes after the MA stage, in which the memory accesses (MOVX.W and MOVY.W) are performed.

Note that after instruction 2 (process (3)) has executed, registers A0 and A1 hold the results of adding and multiplying X1 and Y1.


Table 4.3 Effect of DSP Operation Portion Result on Data Transfer Portion

Excerpts from Main Program
;Instruction 1
        PADD X0,Y0,A0  PMULS X0,Y0,A1  MOVX.W @R4,X1   MOVY.W @R6,Y1
;Instruction 2
        PADD X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+

Content of Registers
Before execution of instruction 2: X1=H'1000 0000, Y1=H'0800 0000, A0=H'6000 0000, A1=H'1000 0000
After execution of instruction 2: X1=H'1000 0000, Y1=H'0800 0000, A0=H'1800 0000, A1=H'0100 0000

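[Figure: parallel slots PADD X1,Y1,A0 / PMULS X1,Y1,A1 / MOVX.W A0,@R5+ / MOVY.W A1,@R7+ flowing through the pipeline; the DSP operations are performed in the WB/DSP stage]

Figure 4.1 Instruction 2 Pipeline Flow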
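The rule that the data transfer portion stores the register values from before the instruction can be illustrated with a small Python model. This is a behavioral sketch only. The helper names (`parallel_step`, `padd`, `pmuls32`, and so on) are hypothetical. Registers are modeled as 32-bit values, and the A-register guard bits and the R5/R7 post-increment are ignored. The pipeline is reduced to one ordering: stores read the old values, and DSP results are written afterwards. The model replays instruction 2 of table 4.3.

```python
MASK32 = 0xFFFF_FFFF


def upper_word(reg: int) -> int:
    """Upper 16 bits of a 32-bit register, the part stored by MOVX.W/MOVY.W."""
    return (reg >> 16) & 0xFFFF


def s16(word: int) -> int:
    """Interpret a 16-bit pattern as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def padd(a: int, b: int) -> int:
    """32-bit addition (guard bits ignored)."""
    return (a + b) & MASK32


def pmuls32(a: int, b: int) -> int:
    """Q15 x Q15 product of the upper words, aligned to the MSB."""
    return (s16(upper_word(a)) * s16(upper_word(b)) * 2) & MASK32


def parallel_step(regs: dict, memory: dict, dsp_ops: list, stores: list) -> None:
    """Execute one parallel instruction.

    Stores read the register values from before the instruction (MA stage);
    the DSP results are computed from those same values and written back
    only afterwards (WB/DSP stage).
    """
    for reg, addr in stores:                           # data transfer portion
        memory[addr] = upper_word(regs[reg])
    results = {dest: op(*(regs[src] for src in srcs))  # DSP operation portion
               for dest, op, srcs in dsp_ops}
    regs.update(results)                               # write back last


ANS1, ANS2 = 0x1000F004, 0x1001F004
regs = {"X1": 0x1000_0000, "Y1": 0x0800_0000, "A0": 0x6000_0000, "A1": 0x1000_0000}
memory = {}
# Instruction 2: PADD X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+
parallel_step(regs, memory,
              dsp_ops=[("A0", padd, ("X1", "Y1")), ("A1", pmuls32, ("X1", "Y1"))],
              stores=[("A0", ANS1), ("A1", ANS2)])
print(f"ANS1=H'{memory[ANS1]:04X} ANS2=H'{memory[ANS2]:04X} "
      f"A0=H'{regs['A0']:08X} A1=H'{regs['A1']:08X}")
```

The stored words are H'6000 (0.75) and H'1000 (0.125), which are the results of instruction 1. A0 and A1 end up as H'1800 0000 and H'0100 0000, as in table 4.3.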


Flowchart

Start
    Transfer XRAM_ADD address (H'1000F000) to register R4
    Transfer ANS1 address (H'1000F004) to register R5
    Transfer YRAM_ADD address (H'1001F000) to register R6
    Transfer ANS2 address (H'1001F004) to register R7
(1) Transfer data (0.5) from R4 address (H'1000F000) to register X0, then increment address
    Transfer data (0.25) from R6 address (H'1001F000) to register Y0, then increment address
(2) Add data in registers X0 and Y0, store result in register A0
    Multiply data in registers X0 and Y0, store result in register A1
    Transfer data (0.125) from R4 address (H'1000F002) to register X1
    Transfer data (0.0625) from R6 address (H'1001F002) to register Y1
(3) Add data in registers X1 and Y1, store result in register A0
    Multiply data in registers X1 and Y1, store result in register A1
    Transfer data in register A0 to ANS1 address (H'1000F004), then increment address
    Transfer data in register A1 to ANS2 address (H'1001F004), then increment address
(1) Transfer data in register A0 to ANS1+2 address (H'1000F006)
    Transfer data in register A1 to ANS2+2 address (H'1001F006)
End


Main Program

;*******************************************************************************************
;* Parallel data transfer routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #ANS1,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #ANS2,R7
                                        MOVX.W @R4+,X0   MOVY.W @R6+,Y0  ;No parallel processing
        PADD X0,Y0,A0  PMULS X0,Y0,A1   MOVX.W @R4,X1    MOVY.W @R6,Y1   ;Parallel processing
        PADD X1,Y1,A0  PMULS X1,Y1,A1   MOVX.W A0,@R5+   MOVY.W A1,@R7+  ;Parallel processing
                                        MOVX.W A0,@R5    MOVY.W A1,@R7   ;No parallel processing
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;**********************************************************************
;* Data(X/YRAM)
;**********************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W 0.5,0.125    ;DSP operation data
ANS1:     .RES.W   2            ;DSP operation result storage area
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W 0.25,0.0625  ;DSP operation data
ANS2:     .RES.W   2            ;DSP operation result storage area


Section 5 Repeat Instruction

Overview

The average of ten data values, five stored in XRAM and five in YRAM, is calculated. The repeat function is used both to transfer the data from XRAM and YRAM to the DSP unit and to add the ten data values.

Description

1. DSP Repeat Control

Repeat control requires three settings:
- I: the start address of the program to be repeated.
- II: the end address of the program to be repeated.
- III: the number of repetitions.

After settings I through III are complete, process IV starts the program to be repeated. At least one instruction is required between III and IV.

The sequence of processes I through IV is shown below.

I    The LDRS instruction sets the repeat start address in the RS register.
II   The LDRE instruction sets the repeat end address in the RE register.
III  The SETRC instruction sets the number of repetitions in the RC counter (in the SR register).
     (At least one instruction is inserted here.)
IV   The program to be repeated is started.

Process (1) in the flowchart on the next page corresponds to I through III above. After the program to be repeated is started (IV), it is repeated within the scope of process (2). The example contains two main programs, and both perform the same function. Program (1) uses the repeat control instructions (LDRS, LDRE, and SETRC). Program (2) uses the extended instruction REPEAT. REPEAT automatically generates the CPU instructions (LDRS, LDRE, and SETRC) that repeat the instructions between the start and end addresses. In the format shown below, if the number of repetitions is omitted, no SETRC instruction is generated.

Format: REPEAT [start address], [end address], [number of repetitions]


In program (1), the addresses set as the repeat start and end addresses differ from the actual start and end addresses of the code to be repeated. This is because the required settings change depending on the number of instructions in the program to be repeated. Table 5.1 shows how the RS and RE settings change according to the number of instructions within the range to be repeated. When these values are set in RS and RE, the program repeats the intended range. Therefore, keep the offsets listed in Table 5.1 in mind when labeling the repeat start and end addresses. The setting method for RS and RE in program (1) is described on the next page.

RPT_S0+N: Address N bytes after the instruction immediately preceding the instruction at the start address of the program to be repeated
RPT_S: Start address of the program to be repeated
RPT_E: End address of the program to be repeated
RPT_E3+4: Address 4 bytes after the instruction located three instructions before the instruction at the end address of the program to be repeated

Table 5.1 RS and RE Setting Values Based on Number of Instructions Within Repeat

Number of Instructions in Program to be Repeated | 1 | 2 | 3 | 4 or more
RS | RPT_S0 + 8 | RPT_S0 + 6 | RPT_S0 + 4 | RPT_S
RE | RPT_S0 + 4 | RPT_S0 + 4 | RPT_S0 + 4 | RPT_E3 + 4
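The selection rule in Table 5.1, which REPEAT applies automatically, can be written as a short Python function. This is a sketch for illustration, not the assembler's implementation. `repeat_rs_re` is a hypothetical name, and the caller supplies byte addresses. The example layout reuses the program in this section with a hypothetical base address of H'1000. Each double data transfer is treated as a 16-bit instruction. Each DSP operation instruction is assumed to be 32 bits long.

```python
def repeat_rs_re(rpt_s0: int, loop_addrs: list[int]) -> tuple[int, int]:
    """Return the (RS, RE) values given by Table 5.1.

    rpt_s0     -- address of the instruction immediately preceding RPT_S
    loop_addrs -- start address of each instruction from RPT_S to RPT_E, in order
    """
    n = len(loop_addrs)
    if n == 0:
        raise ValueError("the repeat range must contain at least one instruction")
    if n == 1:
        return rpt_s0 + 8, rpt_s0 + 4
    if n == 2:
        return rpt_s0 + 6, rpt_s0 + 4
    if n == 3:
        return rpt_s0 + 4, rpt_s0 + 4
    rpt_e3 = loop_addrs[-4]             # three instructions before RPT_E
    return loop_addrs[0], rpt_e3 + 4    # RS = RPT_S, RE = RPT_E3 + 4


# Hypothetical layout of the example loop (base address H'1000):
#   RPT_S0: MOVX.W @R5,X1  MOVY.W @R7,Y1   16-bit, at H'1000
#   RPT_S:  MOVX.W @R4+,X0 MOVY.W @R6+,Y0  16-bit, at H'1002
#   RPT_E0: PADD X0,Y0,M0                  assumed 32-bit, at H'1004
#   RPT_E:  PADD X1,M0,X1                  assumed 32-bit, at H'1008
start, end = repeat_rs_re(0x1000, [0x1002, 0x1004, 0x1008])
print(hex(start), hex(end))
```

For the three-instruction loop, both values come out as H'1004. This is the RPT_E0 address, which matches the setting used in the programs below.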


2. Repeat Control Using CPU Instructions

Example (a) shows the method for setting addresses in RS and RE. If there are three

instructions in the portion to be repeated, RS and RE must be set to the RPT_S0+4 address, as

indicated in Table 5.1. The double data transfer instructions in lines (1) and (2) of this program

have a 16-bit instruction length, so the RPT_S0+4 address corresponds to the RPT_E0 address.

If RS and RE are set to the address RPT_E0, the result is program (b).

        LDRS    RPT_S0+4                ;Repeat start address
        LDRE    RPT_S0+4                ;Repeat end address
        SETRC   #5                      ;Repeat counter setting/5 repetitions
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1  ;(1) Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0 ;(2)
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                ;X1/data total
        PMULS   X1,Y1,A1                ;A1/average value

(a) RS and RE Address Setting Method

        LDRS    RPT_E0                  ;Repeat start address
        LDRE    RPT_E0                  ;Repeat end address
        SETRC   #5                      ;Repeat counter setting/5 repetitions
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1  ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                ;X1/data total
        PMULS   X1,Y1,A1                ;A1/average value

(b) RS and RE Address Setting Method


3. Repeat Control Using Extended Instructions

When the extended instruction REPEAT is used, the complicated labeling required for repeat control with CPU instructions is unnecessary. The following explanation is based on the expanded image of part of a repeat program, shown as (a) below. With REPEAT, you only need to declare labels for the start (RPT_S) and end (RPT_E) addresses of the program to be repeated. The assembler then automatically calculates the address values for the RS and RE settings (RPT_E0 if the code to be repeated contains three instructions). It also generates the LDRS, LDRE, and SETRC instructions. Example (b) below shows the repeat program as actually written with the extended instruction REPEAT.

        REPEAT  RPT_S,RPT_E,#5          ;Expands to the CPU instructions for repeat control below
        LDRS    RPT_E0                  ;RPT_S0+4
        LDRE    RPT_E0                  ;RPT_S0+4
        SETRC   #5
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1
        PMULS   X1,Y1,A1

(a) Expanded Image of Repeat Program

        REPEAT  RPT_S,RPT_E,#5
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1
        PMULS   X1,Y1,A1

(b) Repeat Program Using Extended Instruction REPEAT


Flowchart

Start
    Transfer XRAM_ADD address to R4
    Transfer CLR address to R5
    Transfer YRAM_ADD address to R6
    Transfer DIV address to R7
(1) Set RPT_S address as repeat start address (RS)
    Set RPT_E address as repeat end address (RE)
    Set RC counter in register SR to number of repetitions (5 times)
    Clear register X1 by transferring R5 address (H'1000F00A) data (0) to register X1
    Transfer data (0.1) from R7 address (H'1001F00A) to register Y1
(2) Repeat the following the number of times indicated by the repetitions setting (5 times in this case):
        Transfer R4 address data to register X0 and increment R4 address
        Transfer R6 address data to register Y0 and increment R6 address
        Add data from registers X0 and Y0, and store result in register M0
        Add data from registers X1 and M0, and store result in register X1
    Multiply data from registers X1 and Y1, and store result in register A1
End


Main Program

(1) Repeat Control Using CPU Instructions

;*******************************************************************************************
;* Repeat routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #CLR,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #DIV,R7
        LDRS    RPT_E0                  ;Repeat start address
        LDRE    RPT_E0                  ;Repeat end address
        SETRC   #5                      ;Repeat counter setting/5 repetitions
        MOVX.W  @R5,X1   MOVY.W @R7,Y1  ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                ;X1/data total
        PMULS   X1,Y1,A1                ;A1/average value
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

(2) Repeat Control Using Extended Instruction REPEAT

;*******************************************************************************************
;* Repeat routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #CLR,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #DIV,R7
        MOV.L   #5,R0
        REPEAT  RPT_S,RPT_E,R0          ;CPU instructions for repeat control generated automatically
        MOVX.W  @R5,X1   MOVY.W @R7,Y1  ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
        PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                ;X1/data total
        PMULS   X1,Y1,A1                ;A1/average value
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP


Data

* Same data used by main programs (1) and (2)

;*******************************************************************************************
;* Data (X/YRAM)
;*******************************************************************************************
        .SECTION XRAM,CODE,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W 0.0625,0.125,0.0625,0.0625,0.03125   ;DSP operation data
CLR:      .DATA.W  0                                     ;Zero used to clear X1
        .SECTION YRAM,CODE,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W 0.0625,0.125,0.03125,0.125,0.0625    ;DSP operation data
DIV:      .XDATA.W 0.1                                   ;1/10 used to calculate the average
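The arithmetic performed by either main program can be checked with a short Python model. It is a sketch under stated assumptions, not a cycle-accurate simulation. `q15` and `average_of_ten` are hypothetical helpers. Each .XDATA.W value is assumed to be rounded to the nearest Q15 integer. The X1 accumulation is assumed not to overflow, which holds here because the ten values sum to 0.75.

```python
def q15(value: float) -> int:
    """Round a fraction in [-1.0, 1.0) to a signed Q15 integer (rounding is an assumption)."""
    return max(-0x8000, min(0x7FFF, round(value * 0x8000)))


def average_of_ten(x_data, y_data, divisor=0.1):
    """Model of the repeat routine: X1 accumulates X0 + Y0, then X1 x (1/10)."""
    y1 = q15(divisor)                      # MOVY.W @R7,Y1  (DIV = 0.1)
    x1 = 0                                 # MOVX.W @R5,X1  (CLR = 0)
    for xv, yv in zip(x_data, y_data):     # repeated 5 times (RC = 5)
        m0 = q15(xv) + q15(yv)             # PADD X0,Y0,M0
        x1 += m0                           # PADD X1,M0,X1
    a1 = x1 * y1 * 2                       # PMULS X1,Y1,A1 -> Q31 product
    return x1, a1 / 2**31


x_data = [0.0625, 0.125, 0.0625, 0.0625, 0.03125]   # XRAM_ADD
y_data = [0.0625, 0.125, 0.03125, 0.125, 0.0625]    # YRAM_ADD
total, average = average_of_ten(x_data, y_data)
print(hex(total), average)
```

The accumulated total is H'6000 (0.75). The average comes out close to 0.075, and the small difference is the rounding of 0.1 to Q15.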


Section 6 Examples of Arguments Passed Between CPU Instructions and DSP Instructions

Overview

The two 16-bit fixed-point data values stored at the XRAM_ADD address (H'1000F000) and

YRAM_ADD address (H'1001F000) are multiplied using DSP instructions and CPU instructions.

Description

When data is passed between CPU instructions and DSP instructions, R4, R5, R6, and R7 are used

as pointers and the data is passed via XRAM and YRAM. The procedure when the result of a

calculation performed by the DSP is used by the CPU is described below.

As can be seen in (2-1), (3-1), and (3-2), both the (2) DSP multiplication routine and (3) CPU

multiplication routine of the example main program read data stored in XRAM and YRAM.

Example arguments:
        PADD    X0,Y0,A0        ;Stores result of adding X0 and Y0 in A0
        MOVX.W  A0,@R4          ;Transfers A0 data to R4 address
        MOV.W   @R4,R0          ;Transfers R4 address data to R0

Several points need to be kept in mind when transferring data. Some DSP instructions handle fixed-point data, and the result of a fixed-point multiplication is aligned to the MSB. When multiplication is performed with CPU instructions, however, integer multiplication is performed and the result is aligned to the LSB. This means that the calculation result differs from the one obtained with DSP instructions.

Table 6.1 shows the multiplication process used in (2-1), (3-1), and (3-2) of the (2) DSP multiplication routine and the (3) CPU multiplication routine in the flowchart on the following page. It shows that the results after execution differ even when the source operand data is identical: the DSP result is the integer product shifted left by one bit. When a DSP instruction (PMULS) is used to multiply integer data, the result must therefore be converted from fixed-point format to integer format by shifting it one bit to the right.


Table 6.1 DSP and CPU Multiplication Process

Excerpt from Main Program | Before Execution | After Execution
(2) DSP multiplication routine: PMULS X0,Y0,A0 | X0=H'4000, Y0=H'2000 | A0=H'1000 0000
(3) CPU multiplication routine: MULS.W R0,R1 / STS MACL,R14 | R0=H'4000, R1=H'2000 | R14=H'0800 0000
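Table 6.1 shows a difference in alignment, and a one-bit shift converts a PMULS result back to an integer product. Both can be sketched in Python. The helper names are hypothetical. Operands are treated as 16-bit two's-complement patterns and results as 32-bit patterns. The H'8000 × H'8000 case, whose fixed-point product overflows 32 bits, is not handled.

```python
def s16(word: int) -> int:
    """Interpret a 16-bit pattern as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def dsp_product(x: int, y: int) -> int:
    """Fixed-point (PMULS-style) product: integer product shifted left once, MSB-aligned."""
    return (s16(x) * s16(y) * 2) & 0xFFFF_FFFF


def cpu_product(r0: int, r1: int) -> int:
    """Integer (MULS.W-style) product as left in MACL, LSB-aligned."""
    return (s16(r0) * s16(r1)) & 0xFFFF_FFFF


def fixed_to_integer(reg32: int) -> int:
    """Convert a fixed-point product to an integer product by an arithmetic right shift."""
    signed32 = reg32 - (1 << 32) if reg32 & 0x8000_0000 else reg32
    return (signed32 >> 1) & 0xFFFF_FFFF


a0 = dsp_product(0x4000, 0x2000)    # DSP multiplication routine
r14 = cpu_product(0x4000, 0x2000)   # CPU multiplication routine
print(f"A0=H'{a0:08X} R14=H'{r14:08X} converted=H'{fixed_to_integer(a0):08X}")
```

The two routines give H'1000 0000 and H'0800 0000 for the same operands. After the one-bit arithmetic right shift, the DSP result matches the CPU result.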


Flowchart

Start
(1)
    (1-1) Transfer XRAM_ADD address (H'1000F000) to register R4
    (1-2) Transfer YRAM_ADD address (H'1001F000) to register R6
(2)
    (2-1) Transfer data (H'4000) from R4 address (H'1000F000) to register X0, and data (H'2000) from R6 address (H'1001F000) to register Y0
    (2-2) Multiply data from register X0 and register Y0, store result in register A0
(3)
    (3-1) Transfer data (H'4000) from R4 address (H'1000F000) to register R0
    (3-2) Transfer data (H'2000) from R6 address (H'1001F000) to register R1
    (3-3) Multiply data from register R0 and register R1
    (3-4) Transfer data (multiplication result) from register MACL to register R14
End


Main Program

;*******************************************************************************************
;* Initial setting routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #YRAM_ADD,R6
;*******************************************************************************************
;* DSP multiplication routine
;*******************************************************************************************
        MOVX.W  @R4,X0   MOVY.W @R6,Y0  ;Load 0.5,0.25
        PMULS   X0,Y0,A0                ;A0 = multiplication result
;*******************************************************************************************
;* CPU multiplication routine
;*******************************************************************************************
        MOV.W   @R4,R0                  ;H'4000 load
        MOV.W   @R6,R1                  ;H'2000 load
        MULS.W  R0,R1
        STS     MACL,R14                ;R14 = multiplication result
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;**********************************************************************
;* Data
;**********************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W 0.5          ;DSP operation data
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W 0.25         ;DSP operation data
        .END


Section 7 32-bit Multiplication

Overview

The 32-bit data value stored at the XRAM_ADD address (H'1000F000) is multiplied by the 32-bit data value stored at the YRAM_ADD address (H'1001F000). The 64-bit result is stored in the area from the ANS address (H'1001F100) to the ANS+7 address (H'1001F107).

Description

1. Overview of Calculation Method

Figure 7.1 shows the addresses where the multiplicand and multiplier of a 32-bit multiplication are stored, and the addresses where the result is stored. Figure 7.2 gives an overview of the calculation method for 32-bit multiplication. The 32-bit data values (the multiplicand and the multiplier) are split into upper and lower 16-bit segments, provisionally called A, B, C, and D. These segments are multiplied to produce the 64-bit result. The multiplier interprets the top bit (MSB) of each 16-bit input as the sign bit, which has a weight of –2⁰ = –1. The example program therefore first replaces the top bit (MSB) of each segment with 0 and calculates the products of the segments. It then adds correction terms based on the original top bits to obtain the result of the 32-bit multiplication.

[Figure: input: multiplicand (32-bit) at XRAM_ADD (bits 31–16) and XRAM_ADD+2 (bits 15–0), multiplier (32-bit) at YRAM_ADD and YRAM_ADD+2; output: 64-bit multiplication result at ANS (bits 63–48), ANS+2 (bits 47–32), ANS+4 (bits 31–16), and ANS+6 (bits 15–0)]

Figure 7.1 32-bit Multiplication


[Figure: multiplicand A:B (A: XRAM_ADD address data, B: XRAM_ADD+2 address data) × multiplier C:D (C: YRAM_ADD address data, D: YRAM_ADD+2 address data); the partial products B × D, A × D, B × C, and A × C are aligned on 16-bit boundaries (bits 16/15, 32/31, 48/47) and added]

Figure 7.2 Overview of Calculation Method for 32-bit Multiplication


2. Double-length Calculation Algorithm

If the single-precision number of bits is n, “double-length” refers to 2n bits. A 2n-bit number can therefore be expressed as shown in figure 7.3.

$$E = -e_{2n-1}\cdot 2^{2n-1} + \sum_{i=n}^{2n-2} e_i\cdot 2^i + e_{n-1}\cdot 2^{n-1} + \sum_{i=0}^{n-2} e_i\cdot 2^i \qquad \text{(multiplicand)}$$

$$F = -f_{2n-1}\cdot 2^{2n-1} + \sum_{i=n}^{2n-2} f_i\cdot 2^i + f_{n-1}\cdot 2^{n-1} + \sum_{i=0}^{n-2} f_i\cdot 2^i \qquad \text{(multiplier)}$$

Here e_{2n–1} and f_{2n–1} are the upper MSBs (bit 2n–1), e_{n–1} and f_{n–1} are the lower MSBs (bit n–1), and e_i, f_i = 0 or 1.

Figure 7.3 Structure of 2n-bit Numbers


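Here, define

$$A_0 = \sum_{i=n}^{2n-2} e_i\cdot 2^i,\quad B_0 = \sum_{i=0}^{n-2} e_i\cdot 2^i,\quad C_0 = \sum_{i=n}^{2n-2} f_i\cdot 2^i,\quad D_0 = \sum_{i=0}^{n-2} f_i\cdot 2^i.$$

The double-length multiplication E × F can then be expressed as:

$$
\begin{aligned}
E \times F &= \left(-e_{2n-1}\cdot 2^{2n-1} + A_0 + e_{n-1}\cdot 2^{n-1} + B_0\right)\times\left(-f_{2n-1}\cdot 2^{2n-1} + C_0 + f_{n-1}\cdot 2^{n-1} + D_0\right)\\
&= e_{2n-1}\, f_{2n-1}\cdot 2^{4n-2} && (1)\\
&\quad - e_{2n-1}\cdot 2^{2n-1}\left(C_0 + f_{n-1}\cdot 2^{n-1} + D_0\right) && (2)\\
&\quad - f_{2n-1}\cdot 2^{2n-1}\left(A_0 + e_{n-1}\cdot 2^{n-1} + B_0\right) && (3)\\
&\quad + e_{n-1}\cdot 2^{n-1}\left(C_0 + f_{n-1}\cdot 2^{n-1} + D_0\right) && (4)\\
&\quad + f_{n-1}\cdot 2^{n-1}\left(A_0 + B_0\right) && (5)\\
&\quad + A_0 C_0 + A_0 D_0 + B_0 C_0 + B_0 D_0 && (6)
\end{aligned}
$$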

In the above equation, (6) is the sum of the segment products and (1) through (5) are correction items.

Each correction item is applied by checking whether the relevant sign bit is 0 or 1. If the bit is 1, the item is added to or subtracted from the sum of the segment products.
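The identity can be checked numerically. The following Python sketch is not part of the original program. `split` and `double_length_product` are hypothetical names, and n = 16, so the 32-bit operands are treated as two's-complement integers with the weights of figure 7.3. A0 to D0 keep their place values, as in the definitions above. The function assembles E × F from correction items (1) to (5) plus the segment products (6) and compares the result with Python's exact integer product.

```python
def split(value: int, n: int = 16) -> tuple[int, int, int, int]:
    """Split a 2n-bit two's-complement integer into the pieces of figure 7.3.

    Returns (e_{2n-1}, A0, e_{n-1}, B0); A0 keeps its place value.
    """
    bits = value & ((1 << 2 * n) - 1)
    upper_msb = (bits >> (2 * n - 1)) & 1
    lower_msb = (bits >> (n - 1)) & 1
    a0 = bits & (((1 << (n - 1)) - 1) << n)   # bits n .. 2n-2
    b0 = bits & ((1 << (n - 1)) - 1)          # bits 0 .. n-2
    return upper_msb, a0, lower_msb, b0


def double_length_product(e: int, f: int, n: int = 16) -> int:
    """E x F assembled from correction items (1)-(5) and segment products (6)."""
    e_hi, a0, e_lo, b0 = split(e, n)
    f_hi, c0, f_lo, d0 = split(f, n)
    big, small = 1 << (2 * n - 1), 1 << (n - 1)
    item1 = e_hi * f_hi * big * big                   # 2^(4n-2)
    item2 = -e_hi * big * (c0 + f_lo * small + d0)
    item3 = -f_hi * big * (a0 + e_lo * small + b0)
    item4 = e_lo * small * (c0 + f_lo * small + d0)
    item5 = f_lo * small * (a0 + b0)
    item6 = a0 * c0 + a0 * d0 + b0 * c0 + b0 * d0
    return item1 + item2 + item3 + item4 + item5 + item6


samples = [0, 1, -1, 0x20002000, 0x7FFFFFFF, -0x80000000, 0x12348765, -0x40008001]
mismatches = [(e, f) for e in samples for f in samples
              if double_length_product(e, f) != e * f]
print("mismatches:", mismatches)
```

Every sample pair, including the most negative value and operands whose lower word has its top bit set, reproduces the exact product. The sign-cleared segment products therefore need exactly these five corrections.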

Figure 7.4 shows a 32-bit double-length multiplication algorithm that uses the above equation. The algorithm can be divided into six parts:
- Part (1): the logical product with H'7FFF clears the sign bits of A, B, C, and D to 0, giving A0, B0, C0, and D0.
- Part (2): four segment products are calculated: A0 · C0, A0 · D0, B0 · C0, and B0 · D0.
- Parts (3) through (6): the sum is obtained for each 16-bit digit, and the results are stored at the ANS+6, ANS+4, ANS+2, and ANS addresses, respectively.


[Figure: parts (1) to (6) of the algorithm, steps (1-1) through (6-5); the segment products A0 × C0, A0 × D0, B0 × C0, and B0 × D0 and correction items (1) to (5), with H'8000 as correction item (1), are summed column by column into ANSWER1 (ANS+6), ANSWER2 (ANS+4), ANSWER3 (ANS+2), and ANSWER4 (ANS); *1 S: sign bit; *2: decimal point position]

Figure 7.4 32-bit Double-length Multiplication Algorithm


Flowchart

Start
(1)
    (1-1) To clear sign bit of A, obtain logical product of A and H'7FFF, and designate as A0; determine sign bit
    (1-2) To clear sign bit of B, obtain logical product of B and H'7FFF, and designate as B0; determine sign bit
    (1-3) To clear sign bit of C, obtain logical product of C and H'7FFF, and designate as C0; determine sign bit
    (1-4) To clear sign bit of D, obtain logical product of D and H'7FFF, and designate as D0; determine sign bit
(2)
    (2-1) Multiply A0 and C0, separate upper and lower bits of result, and store in XRAM
    (2-2) Multiply B0 and D0, separate upper and lower bits of result, and store in YRAM
    (2-3) Multiply A0 and D0, separate upper and lower bits of result, and store in XRAM
    (2-4) Multiply B0 and C0, separate upper and lower bits of result, and store in YRAM
(3)
    (3-1) Store lower bits of B0 × D0 multiplication result at ANS+6 address
(4)
    (4-1) Add lower bits of A0 × D0, lower bits of B0 × C0, and upper bits of B0 × D0
    (4-2) Is B sign bit 1? If yes:
    (4-3)     Add lower bits (D) of correction item (4) to result of (4-1)
    (4-4) Is D sign bit 1? If yes:
    (4-5)     Add lower bits (B0) of correction item (5) to result of (4-1) or (4-3)
    (4-6) Store result of (4-1), (4-3) or (4-5) at ANS+4 address
(5)
    (5-1) Add lower bits of A0 × C0, upper bits of B0 × C0, and upper bits of A0 × D0
    (5-2) Is A sign bit 1? If yes:
    (5-3)     Add lower bits (–D) of correction item (2) to result of (5-1)
    (5-4) Is C sign bit 1? If yes:
    (5-5)     Add lower bits (–B) of correction item (3) to result of (5-1) or (5-3)
    (5-6) Is B sign bit 1? If yes:
    (5-7)     Add upper bits (C0) of correction item (4) to result of (5-1), (5-3) or (5-5)
    (5-8) Is D sign bit 1? If yes:
    (5-9)     Add upper bits (A0) of correction item (5) to result of (5-1), (5-3), (5-5) or (5-7)
    (5-10) Store result of (5-1), (5-3), (5-5), (5-7) or (5-9) at ANS+2 address
(6)
    (6-1) Add carry to upper bits of result of (2-1)
    (6-2) Is A sign bit 1? If yes:
    (6-3)     Add upper bits (–C0) of correction item (2) to result of (6-1)
    (6-4) Is C sign bit 1? If yes:
    (6-5)     Add upper bits (–A0) of correction item (3) to result of (6-1) or (6-3)
    (6-6) Are A and C sign bits both 1? If yes:
    (6-7)     Add correction item (1) (H'8000) to result of (6-1), (6-3) or (6-5)
    (6-8) Store result of (6-1), (6-3), (6-5) or (6-7) at ANS address
End


Main Program

;*******************************************************************************************
;* 32-bit fixed-point multiplication routine
;* [A][B] × [C][D]
;*******************************************************************************************

MAIN: MOV.L #XRAM_ADD,R4

MOV.L #WORKX,R5

MOV.L #YRAM_ADD,R6

MOV.L #WORKY,R7

;XRAM for work

;YRAM for work

;Clear sign

MOV.W

PCLR

#H'7FFF,R0

R0,@R7

MOVX.W @R4+,X0 MOVY.W @R7,Y0 ;A,H'7FFF load

MOVY.W @R6+,Y1;A0,C load

PAND

X0,Y0,A0

MOV.W

PSHA

R0,@R5

;H'7FFF -> #WORKX

#1,X0

MOVX.W @R5,X1

MOVX.W A0,@R5+

MOVX.W @R4,X0

;A sign check,H'7FFF load

DCT PINC A1,A1

;A0 store

PAND

MOV.L

PCLR

X1,Y1,A0

;C0,B load

R4,@-R15

#SIGNA,R4

MOVX.W A1,@R4+

PSHA

#1,Y1

A1,A1

X0,Y0,A0

MOVY.W A0,@R7+;C sign check,C0 store

MOVY.W @R6,Y1 ;B sign check,D load

;B0

DCT PINC

PAND

MOVX.W A1,@R4+

MOVX.W A0,@R5

MOVX.W A1,@R4+

PCLR

PSHA

#1,X0

A1,A1

X1,Y1,A0

DCT PINC

PAND

;D0,B0 store

PCLR

PSHA

#1,Y1

A1,A1

DCT PINC

MOVY.W A0,@R7 ;D0 store

MOVX.W A1,@R4

MOV.L

@R15+,R4

;*****************************************************************

;*Segment product calculation routine/ B0×D0,A0×C0,B0×C0,A0×D0

;*****************************************************************

MOV.L

#WORKX,R5

#WORKY,R7

MOVX.W @R5+,X0 MOVY.W @R7+,Y0;A0,C0

PMULS X0,Y0,A1

PMULS X1,Y1,A0

MOVX.W @R5+,X1 MOVY.W @R7+,Y1;A0×C0,B0,D0

MOVX.W A1,@R5+

;B0×D0, (A0×C0)H store

PSHA

#16,A1

MOVY.W A0,@R7+;(A0×C0)L, (B0×D0)H store


PSHA

PMULS X0,Y1,A1

PSHA #16,A1

PMULS X1,Y0,A1

PSHA #16,A1

#16,A0

MOVX.W A1,@R5+

;(B0×D0)L, (A0×C0)L store

MOVY.W A0,@R7+;A0×D0, (B0×D0)L store

;(A0×D0)L, (A0×D0)H store

MOVX.W A1,@R5+

MOVX.W A1,@R5

;B0×C0, (A0×D0)L store

MOVY.W A1,@R7+;(B0×C0)L, (B0×C0)H store

MOVY.W A1,@R7 ;(B0×C0)L store

;******************

;*ANSWER1 STORE

;******************

MOV.L

R7,@-R15

#ANS,R7

#6,R7

;push R7

MOV.L

ADD

MOVY.W A0,@R7+;Store in ANS1

ADD

#-2,R7

MOV.L

R7,R14

;R14=#ANS+2

;pop R7

@R15+,R7

;*******************************************************************************************

;*2-word calculation routine/ R4=#XRAM_ADD+2,R5=#WORKX+10,R6=#YRAM_ADD+2,R7=#WORKY+10

;*******************************************************************************************

PCOPY X1,M1

MOV.L #-6,R9

PCLR

MOVX.W @R5,X1

MOVY.W @R7+R9,Y1 ;(A0×D0)L load,

(B0×C0)L load

PADD

X1,Y1,A0

MOVY.W @R7+,Y1

;(A0×D0)L+(B0×C0)L,

(B0×D0)H load

DCT PINC

PADD

A1,A1

;carry check

A0,Y1,A0

;(A0×D0)L+(B0×C0)

L+(B0×D0)H

DCT PINC

MOV.W

MOV.L

MOV.W

CMP/EQ

A1,A1

;carry check

#H'0,R10

#SIGNB,R0

@R0+,R1

R10,R1

;Is B negative?

HOSEI4_L

MOVY.W @R6,Y1

;Load D

;Add D

PADD

DCT PINC

HOSEI4_L:

MOV.W

A0,Y1,A0

A1,A1

@R0,R1

CMP/EQ

R10,R1

;Is D negative?

;Add B0

HOSEI5_L

PADD

A0,M1,A0

A1,A1

DCT PINC

HOSEI5_L:

MOV.L

R4,@-R15

;push R4


MOV.L

#CARRY,R4

@R15+,R4

MOVX.W A1,@R4

;carry store

;pop R4

MOV.L

;******************

;*ANSWER2 STORE

;******************

MOV.L

R7,@-R15

R14,R7

;push R7

MOV.L

MOVY.W A0,@R7+

;ANS2 store

ADD

#-2,R7

MOV.L

R7,R14

;R14=#ANS+4

;pop R7

@R15+,R7

;*******************************************************************************************

;*3-word calculation routine/ R4=#XRAM_ADD+2,R5=#WORKX+10,R6=#YRAM_ADD+2,R7=#WORKY+6

;*******************************************************************************************

MOV.L #-4,R8

PCOPY X0,A1

MOVX.W @R5+R8,X0 MOVY.W @R7+,Y1 ;dummy load

MOVX.W @R5+,X0

MOVY.W @R7+,Y1 ;(A0×C0)L load,

(B0×C0)H load

PADD

X0,Y1,M1

MOVX.W @R5,X1

;(A0×C0)L+(B0×C0)H,

(A0×D0)H load

DCT PINC

PADD

M0,M0

;carry check

X1,M1,A0

;(A0×C0)L+(B0×C0)

H+(A0×D0)H

DCT PINC

;Correction

MOV.W

M0,M0

;carry check

#H'0,R10

#SIGNA,R0

@R0+,R1

R10,R1

MOV.L

MOV.W

CMP/EQ

;Is A negative?

HOSEI2_L

PSUB

A0,Y1,A0

M0,M0

;Subtract D (correction 2)

DCT PDEC

HOSEI2_L:

MOV.W

@R0+,R1

R10,R1

CMP/EQ

;Is C negative?

HOSEI3_L

MOVX.W @R4,X1

PCOPY X1,M1

PSUB

A0,M1,A0

M0,M0

;Subtract B (correction 3)

DCT PDEC

HOSEI3_L:

MOV.W

CMP/EQ

@R0+,R1

R10,R1

;Is B negative?

HOSEI4_H

PADD

A0,Y0,A0

;Add C0 (correction 4)


DCT PINC

HOSEI4_H:

MOV.W

M0,M0

@R0+,R1

R10,R1

CMP/EQ

;Is D negative?

HOSEI5_H

PCOPY A1,M1

PADD

DCT PINC

A0,M1,A0

M0,M0

;Add A0 (correction 5)

HOSEI5_H:

PCOPY A0,M1

MOV.L

#CARRY,R4

MOVX.W @R4,X1

;Load carry

;Add carry

PADD

X1,M1,A0

M0,M0

DCT PINC

;Check carry

;**************

;*ANSWER3 STORE

;**************

MOV.L

R14,R7

#-2,R7

MOVY.W A0,@R7+;ANS3 store

ADD

;*******************************************************************************************

;*4-word calculation routine/ R4=#XRAM_ADD+2,R5=#WORKX+8,R6=#YRAM_ADD+2,R7=#WORKY+10

;*******************************************************************************************

PCLR

MOVX.W @R5+R8,X1

MOVX.W @R5,X1

;dummy load

;(A0×C0)H load

PADD

X1,M0,A0

M1,M1

DCT PINC

;Correction

MOV.L

#SIGNA,R0

@R0+,R1

R10,R1

MOV.W

CMP/EQ

;Is A negative?

HOSEI2_H

PCOPY A1,M0

PSUB

DCT PDEC

MOV.L

A0,M0,A0

M1,M1

;Subtract C0 (correction 2)

#H'0,R12

#1,R12

ADD

HOSEI2_H:

MOV.W

@R0+,R1

R10,R1

CMP/EQ

;Is C negative?

HOSEI3_H

PSUB

A0,Y0,A0

M1,M1

;Subtract A0 (correction 3)

DCT PDEC

ADD

#1,R12

HOSEI3_H:

Rev. 1.0, 09/99, page 56 of 115

MOV.L

CMP/EQ

#2,R1

R1,R12

FIN

;Are both A and C negative?

MOV.W

#H'8000,R10

R10,@R5

MOVX.W @R5,X0

PCOPY X0,M1

;Add H'8000 (correction 1)

PADD

A0,M1,A0

;**************

;*ANSWER4 STORE

;**************

FIN:

MOVY.W A0,@R7 ;ANS4 store

EXIT: BRA

EXIT

NOP

MAIN_E:

NOP
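The sign corrections in the 3-word and 4-word routines are needed because PMULS treats every 16-bit word as signed, while the lower word of a 32-bit operand is really an unsigned value. The following C sketch shows that general idea on 32-bit integers: it builds the 64-bit product from signed 16 × 16 partial products and then adds back what each negatively read lower word lost. It is an illustration of the technique under that assumption, not a line-by-line model of the listing, whose fixed-point word layout and correction order differ; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 16-bit pattern as signed, as a signed 16 x 16 multiplier would see it. */
static int64_t as_signed16(uint32_t bits)
{
    bits &= 0xFFFFu;
    return bits >= 0x8000u ? (int64_t)bits - 65536 : (int64_t)bits;
}

/* Hypothetical helper: 32 x 32 -> 64-bit signed product built only from signed
   16 x 16 partial products plus sign corrections for the lower words. */
static int64_t mul32_from_signed16(int32_t x, int32_t y)
{
    const int64_t W = 65536;                      /* weight of the upper word */
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    int64_t xh = as_signed16(ux >> 16), xl = as_signed16(ux);
    int64_t yh = as_signed16(uy >> 16), yl = as_signed16(uy);

    /* Partial products with every word read as signed. */
    int64_t p = xh * yh * W * W + (xh * yl + xl * yh) * W + xl * yl;

    /* A lower word with bit 15 set was read as (value - 65536). Writing
       x = xs + 65536*[xl < 0], where xs is x as seen with a signed lower word,
       the missing terms are 65536*ys, 65536*xs and 65536*65536. */
    int64_t xs = xh * W + xl;
    int64_t ys = yh * W + yl;
    if (xl < 0) p += ys * W;
    if (yl < 0) p += xs * W;
    if (xl < 0 && yl < 0) p += W * W;
    return p;
}
```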

Data

;*******************************************************************************************

;* 32-bit multiplication data (XRAM/YRAM)

;*******************************************************************************************

.SECTION XRAM,DATA,LOCATE=H'1000F000

XRAM_ADD:

WORKX:

CARRY:

SIGNA:

SIGNC:

SIGNB:

SIGND:

.XDATA.L

.RES.W

0.25002500 ;Multiplicand

;Work area

;Carry area

;For determining sign of multiplicand upper word A

;For determining sign of multiplier upper word C

;For determining sign of multiplicand lower word B

;For determining sign of multiplier lower word D

.SECTION YRAM,DATA,LOCATE=H'1001F000

YRAM_ADD:

WORKY:

ANS:

.XDATA.L

.RES.W

0.50005000 ;Multiplier

;Work area

.RES.W

;Multiplication result storage area


Section 8 Trigonometric Functions

Overview

Calculating the trigonometric functions SIN(X) and COS(X).

Description

1. Calculating Trigonometric Functions

Figure 8.1 shows the curves of SIN(X) and COS(X). Over the angle range –π ≤ X ≤ π, the relationships in equation (1) hold.

$$\sin(-X) = -\sin(X), \qquad \cos(-X) = \cos(X) \tag{1}$$

Using the relationships expressed in equation (1), the SIN(X) and COS(X) of –π ≤ X ≤ 0 can

be calculated by obtaining the SIN(X) and COS(X) of 0 ≤ X ≤ π and processing the sign.

Next, consider figures 8.2 (a) and (b). With X = π/2 at the center, SIN(X) is symmetric and COS(X) is antisymmetric, as expressed in equation (2).

$$\sin\left(\frac{\pi}{2} + X\right) = \sin\left(\frac{\pi}{2} - X\right), \qquad \cos\left(\frac{\pi}{2} + X\right) = -\cos\left(\frac{\pi}{2} - X\right) \tag{2}$$

Figure 8.1 SIN(X) and COS(X) Curves


Figure 8.2 SIN(X) and COS(X) Curves with X = π/2 at Center: (a) SIN(X), (b) COS(X)

Based on equations (1) and (2), the SIN(X) and COS(X) of –π ≤ X ≤ π can be calculated by obtaining the SIN(X) and COS(X) of 0 ≤ X ≤ π/2 and then processing the sign. The example program divides 0 ≤ X ≤ π/2 into 128 segments of width π/256. If X = n · π/256 + ∆X (n = 0, 1, ..., 128; 0 ≤ ∆X < π/256), the addition theorem of trigonometric functions gives equation (3).

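$$
\begin{aligned}
\sin X &= \sin\left(\frac{n\pi}{256} + \Delta X\right) = \sin\frac{n\pi}{256}\cos\Delta X + \cos\frac{n\pi}{256}\sin\Delta X \\
\cos X &= \cos\left(\frac{n\pi}{256} + \Delta X\right) = \cos\frac{n\pi}{256}\cos\Delta X - \sin\frac{n\pi}{256}\sin\Delta X
\end{aligned} \tag{3}
$$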

If ∆X in equation (3) is small enough to approximate SIN(∆X) ≈ ∆X and COS(∆X) ≈ 1 – (∆X)²/2, the result is equation (4).

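$$
\begin{aligned}
\sin X &\approx \sin\frac{n\pi}{256}\left(1 - \frac{(\Delta X)^2}{2}\right) + \Delta X\cos\frac{n\pi}{256} \\
\cos X &\approx \cos\frac{n\pi}{256}\left(1 - \frac{(\Delta X)^2}{2}\right) - \Delta X\sin\frac{n\pi}{256}
\end{aligned} \tag{4}
$$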

In other words, by evaluating equation (4) with ∆X and the table data SIN(n · π/256) and COS(n · π/256), the SIN(X) and COS(X) of 0 ≤ X ≤ π/2 are obtained. The final result is then obtained by sign processing.


2. Converting Input Values

Instead of the angle X itself, the example program passes the DSP the value a = X/π given by conversion equation (5). In 16-bit fixed-point format, the input range –π ≤ X < π corresponds to –1 ≤ a < 1.

$$X = \pi a, \qquad a = \frac{X}{\pi} \tag{5}$$

X unit: rad; a unit: rad/π

Table 8.1 Relation Between Input Value a and Polarity

Input Value a      Angle X            | a |          SIN(X)      COS(X)
–1 ≤ a < –0.5      –π ≤ X < –π/2      | a | > 0.5    Negative    Negative
–0.5 ≤ a < 0       –π/2 ≤ X < 0       | a | ≤ 0.5    Negative    Positive
0 ≤ a ≤ 0.5        0 ≤ X ≤ π/2        | a | ≤ 0.5    Positive    Positive
0.5 < a < 1        π/2 < X < π        | a | > 0.5    Positive    Negative

The range 0 ≤ X ≤ π/2 corresponds to the range 0 ≤ a ≤ 0.5. The input value a is therefore converted from the range –1 ≤ a < 1 to a value a' in the range 0 ≤ a' ≤ 0.5. Figure 8.3 shows the curves | SIN(X) | and | COS(X) |.

Figure 8.3 Curves | SIN(X) | and | COS(X) |: (a) | SIN(X) |, (b) | COS(X) |


To obtain the SIN(X) and COS(X) of point A in figure 8.3, write A = π/2 + B; correspondingly, a = 0.5 + b. Because | SIN(X) | and | COS(X) | are symmetric about X = π/2 and about X = 0, only the size of the deviation from X = π/2 matters, and | b | is obtained from equation (6).

$$|b| = \bigl|\,|a| - 0.5\,\bigr| \tag{6}$$

Next, since a' = 0.5 – | b |, equation (7) converts input value a in the range –1 ≤ a < 1 to a' in the range 0 ≤ a' ≤ 0.5.

$$a' = \Bigl|\,\bigl|\,|a| - 0.5\,\bigr| - 0.5\,\Bigr| \tag{7}$$
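Equations (6) and (7), together with the sign rules of table 8.1, can be sketched in C on 16-bit fixed-point values (a 16-bit word w standing for w/32768). The sketch widens to 32 bits so that | a | = 1 (a = H'8000) does not overflow; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HALF 16384   /* 0.5 as a 16-bit fixed-point value (w / 32768) */

/* Equations (6) and (7): a' = | | |a| - 0.5 | - 0.5 |, result 0 .. 16384 (0 to 0.5). */
static int32_t reduce_angle(int16_t a)
{
    int32_t abs_a = a < 0 ? -(int32_t)a : a;   /* |a| = 32768 fits in 32 bits */
    int32_t b = abs_a - HALF;                  /* |a| - 0.5 */
    if (b < 0) b = -b;                         /* |b|, equation (6) */
    int32_t ap = b - HALF;
    return ap < 0 ? -ap : ap;                  /* a' = 0.5 - |b|, equation (7) */
}

/* Table 8.1: SIN(X) is negative for a < 0; COS(X) is negative for |a| > 0.5. */
static int sin_is_negative(int16_t a) { return a < 0; }

static int cos_is_negative(int16_t a)
{
    int32_t abs_a = a < 0 ? -(int32_t)a : a;
    return abs_a > HALF;
}
```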

3. a' Table Data

The example program uses tables (TABLE_SIN and TABLE_COS) with 129 entries each, for table addresses n = 0 to 128; in other words, the range 0 ≤ a' ≤ 0.5 is divided into 128 equal segments. The width of each segment in a' is given by equation (8).

$$\frac{0.5}{128} = 0.00390625 = 2^{-8} \tag{8}$$

Table 8.2 shows the correspondence between table address n and a' in decimal notation and as

16-bit fixed-point expressions.

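Table 8.2 Relationship Between Table Address n and a'

Table Address n    a' = n/256 [rad/π] (decimal)    16-bit Fixed-Point Expression (hexadecimal)
0                  0.00000000                      H'0000
1                  0.00390625                      H'0080
2                  0.00781250                      H'0100
3                  0.01171875                      H'0180
4                  0.01562500                      H'0200
:                  :                               :
127                0.49609375                      H'3F80
128                0.50000000                      H'4000

In the fixed-point expression, bit 15 is the sign bit and the decimal point lies between bits 15 and 14.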


4. Method of Calculating ∆X

As table 8.2 shows, when a' is expressed in 16-bit fixed-point format, the upper nine bits (bits 15 to 7) correspond to n, and the lower seven bits (bits 6 to 0) give ∆a', the amount of shift from the table data. Figure 8.4 shows the bit structure of a'. Once a' is known, both the address of the table data used in equation (4) (the entry for n · π/256) and ∆X can be obtained from it at the same time. Finally, table 8.1 is used for sign processing to obtain the SIN(X) and COS(X) of –π ≤ X < π.

Figure 8.4 Bit Structure of a' (bits 15 to 7: table address n; bits 6 to 0: shift from table ∆a'; decimal point between bits 15 and 14)

Figure 8.5 shows ∆X, the amount by which X is shifted from the table value n · π/256 toward (n+1) · π/256. ∆X is obtained from the lower seven bits ∆a' of a' using equation (9).

$$\Delta X = \pi \cdot \Delta a' \tag{9}$$

Figure 8.5 Relation With Amount of Shift Between Table Values (n · π/256, ∆X, (n+1) · π/256)
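How a' yields both the table offset and ∆X can be sketched in C, following steps (3) to (5) of the algorithm in item 6. The upper nine bits are shifted right 6 bits rather than 7, which gives 2n, the byte offset of entry n in a table of 16-bit words, and ∆X is formed as (4∆a') × (π/4) because π itself cannot be represented in 16-bit fixed-point format. The value 25736 for π/4 (0.78540 × 32768, rounded) and the names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t byte_offset;   /* 2n: byte offset of table entry n (16-bit words) */
    uint16_t delta_a;       /* lower 7 bits of a' (fixed point) */
    uint16_t delta_x;       /* pi * delta_a, 16-bit fixed point */
} AngleParts;

/* a_prime: 16-bit fixed-point a', 0 .. 0x4000 (0 to 0.5). */
static AngleParts split_a_prime(uint16_t a_prime)
{
    AngleParts p;
    uint32_t upper = a_prime & 0xFF80u;            /* n/256 = n << 7 */
    p.byte_offset = (uint16_t)(upper >> 6);        /* (n << 7) >> 6 = 2n */
    p.delta_a = (uint16_t)(a_prime & 0x007Fu);
    uint32_t four_da = (uint32_t)p.delta_a << 2;   /* 4 * delta a' (at most 508) */
    const uint32_t pi_over_4 = 25736u;             /* 0.78540, assumed rounding */
    p.delta_x = (uint16_t)((four_da * pi_over_4) >> 15);  /* fixed-point product, truncated */
    return p;
}
```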


5. Overflow Processing

An overflow is judged to have occurred when the values calculated for the reduced angle a' (before sign processing) satisfy equation (10): | SIN(X) | reaches 1, which cannot be represented in 16-bit fixed-point format, or | COS(X) | comes out negative, which is impossible for a magnitude.

$$|\sin X| \ge 1 \quad\text{or}\quad |\cos X| < 0 \tag{10}$$

In such cases the value is corrected using equation (11).

$$|\sin X| = 1 - 2^{-15}, \qquad |\cos X| = 0 \tag{11}$$
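Equation (11) amounts to saturating the magnitudes before the sign is applied. A C sketch, assuming the magnitudes are held as fixed-point values in a wider integer so that 1.0 appears as 32768 (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* |SIN(X)| >= 1 is replaced by 1 - 2^-15 (H'7FFF). Expects a non-negative magnitude. */
static int16_t correct_sin_magnitude(int32_t s)
{
    return s >= 32768 ? INT16_MAX : (int16_t)s;
}

/* A negative |COS(X)| is replaced by 0. Expects a value below 32768. */
static int16_t correct_cos_magnitude(int32_t c)
{
    return c < 0 ? 0 : (int16_t)c;
}
```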

6. Algorithm for Calculating Trigonometric Functions

The algorithm for calculating trigonometric functions is as follows.

(1) Make initial settings.

(2) Load input value a, calculate | | | a | –0.5 | –0.5 | to obtain a'.

(3) Obtain the logical product of a' and #H'FF80 to extract the upper nine bits (n/256) of a'. Then convert this to an integer and set it in the Y bus index register (R9). Shifting right by 6 bits rather than 7 gives 2n, the byte offset of entry n in a table of 16-bit words.

(4) Obtain the logical product of a' and #H'007F to extract the lower seven bits (∆a') of a'.

(5) Calculate ∆X = π∆a'. Because π cannot be represented in 16-bit fixed-point format, this is done as 4∆a' × π/4.

(6) Calculate 1 – (∆X)²/2. Load sin(n × π/256) and cos(n × π/256) from the data tables in YRAM.

(7) Calculate sin(X).

(8) Process the sign of sin(X); store sin(X).

(9) Calculate cos(X).

(10) Process the sign of cos(X); store cos(X).


Execution Example

The sin(X) and cos(X) (OUTPUT) calculation results obtained based on the input value a

(INPUT) are shown in table 8.3.

Table 8.3 sin(X), cos(X) Calculation Results

Input Value          Logical Value (decimal)   Logical Value (hexadecimal)   Output Value (hexadecimal)
Angle X°  a = X/π    sin(X)     cos(X)         sin(X)     cos(X)             sin(X)     cos(X)
0         0          0          1              H'0000     H'7FFF             H'0000     H'7FFF
30        0.16667    0.5        0.86603        H'4000     H'6EDA             H'3FFE     H'6ED9
45        0.25       0.70711    0.70711        H'5A82     H'5A82             H'5A82     H'5A82
89.5      0.49722    0.99996    0.00873        H'7FFE     H'011E             H'7FFD     H'011D
152       0.84444    0.46947    –0.88295       H'3C17     H'8EFC             H'3C19     H'8EFD
179.5     0.99722    0.00873    –0.99996       H'011E     H'8002             H'011C     H'8002
–40       –0.22222   –0.64279   0.76604        H'ADB9     H'620D             H'ADBB     H'620F
–75       –0.41667   –0.96593   0.25882        H'845D     H'2121             H'845D     H'2121
–137      –0.76111   –0.68200   –0.73135       H'A8B4     H'A263             H'A8B5     H'A263
–180      –1         0          –1             H'0000     H'8000             H'0002     H'8001


Flowchart

Start

(1) Initial settings
  (1-1) Transfer the INPUT address to register R4
  (1-2) Transfer the WORK address to register R5
  (1-3) Transfer the TABLE_SIN address to register R6
  (1-4) Transfer the TABLE_COS address to register R7

(2) Calculation of a'
  (2-1) Load input value a
  (2-2) Transfer H'FF80 to the R5 address (WORK area)
  (2-3) To determine the sign later, copy a and store the value in register M1; load 0.5
  (2-4) Calculate | | a | –0.5 |
  (2-5) Calculate | | | a | –0.5 | –0.5 | to obtain a'; load H'FF80 from the R5 address

(3) Calculation of n and setting of R9
  (3-1) Obtain the logical product of a' and H'FF80 to extract the upper 9 bits (n/256) of a'
  (3-2) Convert the fixed-point value n/256 to integer data by shifting it 6 bits to the right (the result is 2n, the byte offset of entry n in a table of 16-bit words)
  (3-3) Transfer the integer data obtained in (3-2) to the R5 address (WORK area)
  (3-4) Zero-extend the integer data passed to the CPU unit via the R5 address to long-word size, and set it in Y index register R9

(4) Calculation of ∆a'
  (4-1) Transfer H'007F to the R5 address (WORK area)
  (4-2) Load H'007F from the R5 address
  (4-3) Obtain the logical product of a' and H'007F to extract the lower seven bits (∆a') of a'

(5) Calculation of ∆X
  (5-1) Calculate 4∆a' by shifting the ∆a' value obtained in (4-3) 2 bits to the left; load π/4
  (5-2) Multiply 4∆a' by π/4 to calculate ∆X

(6) Calculation of 1 – ∆X²/2 and loading of table data
  (6-1) Square the ∆X value obtained in (5-2) (∆X²); load sin(n × π/256) from the data table in YRAM
  (6-2) Shift the ∆X² value obtained in (6-1) 1 bit to the right to obtain ∆X²/2; load –1 from the R4 address
  (6-3) Subtract the ∆X²/2 value obtained in (6-2) from the –1 loaded in (6-2) to calculate 1 – ∆X²/2; load cos(n × π/256) from the data table

(7) Calculation of sin(X)
  (7-1) Set the operation result status (the condition reflected in the DC bit, set in register DSR) to overflow mode
  (7-2) Multiply the ∆X value obtained in (5-2) by the cos(n × π/256) value loaded in (6-3)
  (7-3) Multiply the sin(n × π/256) value loaded in (6-1) by the (1 – ∆X²/2) value obtained in (6-3)
  (7-4) Add the results of (7-2) and (7-3) to calculate sin(X)
  (7-5) Did the (7-4) operation overflow? If yes, go to (7-6); if no, go to (8-1).
  (7-6) Decrement the sin(X) value obtained in (7-4)

(8) Sign processing and storing of sin(X)
  (8-1) Copy input value a from register M1 to register X1
  (8-2) Set the operation result status (DC bit condition in register DSR) to carry/borrow mode
  (8-3) Shift the input value a stored in register X1 in (8-1) 1 bit to the left
  (8-4) Is the sign bit of a 1 (a < 0)? If yes, go to (8-5); if no, go to (8-6).
  (8-5) Reverse the sign of the sin(X) value obtained in (7-4)
  (8-6) Transfer the OUTPUT address to register R6
  (8-7) Store sin(X) at the R6 address (OUTPUT)

(9) Calculation of cos(X)
  (9-1) Set the operation result status (DC bit condition in register DSR) to overflow mode
  (9-2) Multiply the ∆X value obtained in (5-2) by the sin(n × π/256) value loaded in (6-1)
  (9-3) Multiply the 1 – ∆X²/2 and cos(n × π/256) values obtained in (6-3)
  (9-4) Subtract the result of (9-2) from the result of (9-3) to calculate cos(X)
  (9-5) Did the (9-4) operation overflow? If yes, go to (9-6); if no, go to (10-1).
  (9-6) Clear the cos(X) value obtained in (9-4) to 0

(10) Sign processing and storing of cos(X)
  (10-1) Transfer the DAT address to register R4
  (10-2) Load 0.5 from the R4 address
  (10-3) Calculate the absolute value of input value a stored in register M1
  (10-4) Set the operation result status (DC bit condition in register DSR) to negative value mode
  (10-5) Is the value of | a | greater than 0.5? If yes, go to (10-6); if no, go to (10-7).
  (10-6) Reverse the sign of the cos(X) value obtained in (9-4)
  (10-7) Store cos(X) at the R6 address (OUTPUT+2)

End


Main Program

;*******************************************************************************************
;* Trigonometric function routine
;* sinX,cosX
;*******************************************************************************************
;* Initial setting routine
;*******************************************************************************************
MAIN:
        MOV.L    #INPUT,R4
        MOV.L    #WORK,R5
        MOV.L    #TABLE_SIN,R6
        MOV.L    #TABLE_COS,R7
;*******************************************************************************************
;* a calculation routine
;*******************************************************************************************
        MOVX.W   @R4,X0                        ;a load
        MOV.L    #H'FF80,R0                    ;For extracting upper 9 bits of a' (n×π/256)
        MOV.W    R0,@R5
        MOV.L    #DAT,R4
        PCOPY    X0,M1       MOVX.W @R4+,X1    ;Copy a to M1 for sign determination, load 0.5
        PCOPY    X1,Y1
        PSUB     X0,Y1,M0
        PABS     M0,A0                         ;||a|-0.5|
        PSUB     A0,Y1,M0
        PABS     M0,M0                         ;|||a|-0.5|-0.5|
        MOVX.W   @R5,X0                        ;M0 = a', #H'FF80 load

;*******************************************************************************************
;* n calculation, R9 setting routine
;*******************************************************************************************
        PAND     X0,M0,A0                      ;A0 = n/256
        PSHA     #-6,A0                        ;Convert n/256 to integer 2n (byte offset of table entry n)
        MOVX.W   A0,@R5                        ;Pass integer to CPU unit
        MOV.W    @R5,R1
        EXTU.W   R1,R1
        MOV.L    R1,R9
;*******************************************************************************************
;* ∆a' calculation routine
;*******************************************************************************************
        MOV.L    #H'007F,R0                    ;For extracting lower 7 bits of a' (∆a')
        MOV.W    R0,@R5
        MOVX.W   @R5,X1                        ;#H'007F load
        PAND     X1,M0,Y1                      ;∆a'

;*******************************************************************************************
;* ∆X calculation routine
;*******************************************************************************************
        PSHA     #2,Y1       MOVX.W @R4+,X1    ;4∆a', π/4 load
        PMULS    X1,Y1,A1                      ;∆a'×π
;*******************************************************************************************
;* 1 – (∆X²)/2 calculation, sin(n × π/256) and cos(n × π/256) loading routine
;*******************************************************************************************
        PCOPY    A1,X0       MOVY.W @R6+R9,Y0  ;copy, dummy load
        PMULS    A1,X0,M0    MOVY.W @R6,Y0     ;∆X², sin(n×π/256) load
        PSHA     #-1,M0      MOVX.W @R4,X1     MOVY.W @R7+R9,Y1   ;∆X²/2, -1 load, dummy load
        PSUB     X1,M0,A1    MOVY.W @R7,Y1     ;1-∆X²/2, cos(n×π/256) load

;*******************************************************************************************
;* sin(X) calculation routine
;*******************************************************************************************
        MOV.L    #H'6,R0
        LDS      R0,DSR                        ;Set overflow mode
        PMULS    X0,Y1,M0                      ;∆X·cos(n×π/256)
        PMULS    A1,Y0,A0                      ;(1-(∆X²)/2)·sin(n×π/256)
        PABS     A0,A0
        PADD     A0,M0,A0                      ;A0 = sin(X)
        DCT PDEC A0,A0                         ;If overflow occurs, sin(X) – 1
;*******************************************************************************************
;* sin(X) sign processing and storing routine
;*******************************************************************************************
        PCOPY    M1,X1
        MOV.L    #H'0,R0
        LDS      R0,DSR                        ;Carry/borrow mode
        PSHA     #1,X1
        DCT PNEG A0,A0                         ;If a < 0, reverse sign
        MOV.L    #OUTPUT,R6
        MOVY.W   A0,@R6+                       ;Store sin(X)

;*******************************************************************************************
;* cos(X) calculation routine
;*******************************************************************************************
        MOV.L    #H'6,R0
        LDS      R0,DSR                        ;Set overflow mode
        PMULS    X0,Y0,M0                      ;∆X·sin(n×π/256)
        PMULS    A1,Y1,A0                      ;(1-(∆X·∆X)/2)·cos(n×π/256)
        PABS     A0,A0
        PSUB     A0,M0,A0
        DCT PCLR A0                            ;If overflow occurs, clear cos(X) to 0
;*******************************************************************************************
;* cos(X) sign processing and storing routine
;*******************************************************************************************
        MOV.L    #DAT,R4
        MOVX.W   @R4,X0                        ;0.5 load
        PABS     M1,M1                         ;|a|
        MOV.L    #H'2,R0
        LDS      R0,DSR                        ;Set negative value mode
        PCMP     X0,M1
        DCT PNEG A0,A0                         ;If | a | > 0.5, reverse sign
        MOVY.W   A0,@R6                        ;Store cos(X)
EXIT:   BRA      EXIT
        NOP
MAIN_E: NOP


Data

;*******************************************************************************************

;* Trigonometric function data routine

;*******************************************************************************************

.SECTION XRAM,DATA,LOCATE=H'1000FF00

INPUT:

WORK:

DAT:

.RES.W

;External input data storage area

.RES.W

.XDATA.W

0.5,0.78540,-1

;For calculating a', for calculating π/4, (1 – ∆X²/2)

        .SECTION YRAM,DATA,LOCATE=H'1001F800
TABLE_SIN:
        .XDATA.W 0,0.01227,0.02454,0.03681,0.04907,0.06132   ;N/0 - 5
        .XDATA.W 0.07356,0.08580,0.09802,0.11022,0.12241     ;N/6 - 10
        .XDATA.W 0.13458,0.14673,0.15886,0.17096,0.18304     ;N/11 - 15
        .XDATA.W 0.19509,0.20711,0.21910,0.23106,0.24298     ;N/16 - 20
        .XDATA.W 0.25487,0.26671,0.27852,0.29028,0.30201     ;N/21 - 25
        .XDATA.W 0.31368,0.32531,0.33689,0.34842,0.35990     ;N/26 - 30
        .XDATA.W 0.37132,0.38268,0.39400,0.40524,0.41643     ;N/31 - 35
        .XDATA.W 0.42756,0.43862,0.44961,0.46054,0.47140     ;N/36 - 40
        .XDATA.W 0.48218,0.49290,0.50354,0.51410,0.52459     ;N/41 - 45
        .XDATA.W 0.53500,0.54532,0.55557,0.56573,0.57581     ;N/46 - 50
        .XDATA.W 0.58580,0.59570,0.60551,0.61523,0.62486     ;N/51 - 55
        .XDATA.W 0.63439,0.64383,0.65317,0.66242,0.67156     ;N/56 - 60
        .XDATA.W 0.68060,0.68954,0.69838,0.70711,0.71573     ;N/61 - 65
        .XDATA.W 0.72425,0.73265,0.74095,0.74914,0.75721     ;N/66 - 70
        .XDATA.W 0.76517,0.77301,0.78074,0.78835,0.79584     ;N/71 - 75
        .XDATA.W 0.80321,0.81046,0.81758,0.82459,0.83147     ;N/76 - 80
        .XDATA.W 0.83822,0.84485,0.85136,0.85773,0.86397     ;N/81 - 85
        .XDATA.W 0.87009,0.87607,0.88192,0.88764,0.89322     ;N/86 - 90
        .XDATA.W 0.89867,0.90399,0.90917,0.91421,0.91911     ;N/91 - 95
        .XDATA.W 0.92388,0.92851,0.93299,0.93734,0.94154     ;N/96 - 100
        .XDATA.W 0.94561,0.94953,0.95331,0.95694,0.96043     ;N/101 - 105
        .XDATA.W 0.96378,0.96698,0.97003,0.97294,0.97570     ;N/106 - 110
        .XDATA.W 0.97832,0.98079,0.98311,0.98528,0.98730     ;N/111 - 115
        .XDATA.W 0.98918,0.99090,0.99248,0.99391,0.99518     ;N/116 - 120
        .XDATA.W 0.99631,0.99729,0.99812,0.99880,0.99932     ;N/121 - 125
        .XDATA.W 0.99970,0.99992,1                           ;N/126 - 128

TABLE_COS:
        .XDATA.W 1,0.99992,0.99970,0.99932,0.99880,0.99812   ;N/0 - 5
        .XDATA.W 0.99729,0.99631,0.99518,0.99391,0.99248     ;N/6 - 10
        .XDATA.W 0.99090,0.98918,0.98730,0.98528,0.98311     ;N/11 - 15
        .XDATA.W 0.98079,0.97832,0.97570,0.97294,0.97003     ;N/16 - 20
        .XDATA.W 0.96698,0.96378,0.96043,0.95694,0.95331     ;N/21 - 25
        .XDATA.W 0.94953,0.94561,0.94154,0.93734,0.93299     ;N/26 - 30
        .XDATA.W 0.92851,0.92388,0.91911,0.91421,0.90917     ;N/31 - 35
        .XDATA.W 0.90399,0.89867,0.89322,0.88764,0.88192     ;N/36 - 40
        .XDATA.W 0.87607,0.87009,0.86397,0.85773,0.85136     ;N/41 - 45
        .XDATA.W 0.84485,0.83822,0.83147,0.82459,0.81758     ;N/46 - 50
        .XDATA.W 0.81046,0.80321,0.79584,0.78835,0.78074     ;N/51 - 55
        .XDATA.W 0.77301,0.76517,0.75721,0.74914,0.74095     ;N/56 - 60
        .XDATA.W 0.73265,0.72425,0.71573,0.70711,0.69838     ;N/61 - 65
        .XDATA.W 0.68954,0.68060,0.67156,0.66242,0.65317     ;N/66 - 70
        .XDATA.W 0.64383,0.63439,0.62486,0.61523,0.60551     ;N/71 - 75
        .XDATA.W 0.59570,0.58580,0.57581,0.56573,0.55557     ;N/76 - 80
        .XDATA.W 0.54532,0.53500,0.52459,0.51410,0.50354     ;N/81 - 85
        .XDATA.W 0.49290,0.48218,0.47140,0.46054,0.44961     ;N/86 - 90
        .XDATA.W 0.43862,0.42756,0.41643,0.40524,0.39400     ;N/91 - 95
        .XDATA.W 0.38268,0.37132,0.35990,0.34842,0.33689     ;N/96 - 100
        .XDATA.W 0.32531,0.31368,0.30201,0.29028,0.27852     ;N/101 - 105
        .XDATA.W 0.26671,0.25487,0.24298,0.23106,0.21910     ;N/106 - 110
        .XDATA.W 0.20711,0.19509,0.18304,0.17096,0.15886     ;N/111 - 115
        .XDATA.W 0.14673,0.13458,0.12241,0.11022,0.09802     ;N/116 - 120
        .XDATA.W 0.08580,0.07356,0.06132,0.04907,0.03681     ;N/121 - 125
        .XDATA.W 0.02454,0.01227,0                           ;N/126 - 128

OUTPUT:

.RES.W

;External output data storage area


Section 9 Matrix Operations

Overview

Matrix A (3, 3) and matrix B (3, 3) are multiplied to obtain the matrix product C (3, 3) with 32-bit precision. Matrices A and B are set in XRAM and YRAM beforehand. Matrix product C is stored in YRAM beginning at address H'1001FF12 (MATRIXC).

Description

1. Method of Expressing Matrices

Figure 9.1 shows a matrix A with m rows and n columns. Each element a_ij is a component of matrix A. The horizontal lines of components are called rows and are numbered from the top as row 1, row 2, row 3, ..., row i, and so on. The vertical lines of components are called columns and are numbered from the left as column 1, column 2, column 3, ..., column j, and so on. The component located where row i and column j intersect is called component (i,j) and is written a_i,j.

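$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}
$$

Figure 9.1 Matrix A (row i and column j intersect at component a_ij)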

2. Method of Calculating Matrix Product

Figure 9.2 shows the expression of the components of matrix A × matrix B = matrix product C.

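$$
\underbrace{\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}}_{\text{Matrix A}}
\underbrace{\begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}}_{\text{Matrix B}}
=
\underbrace{\begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix}}_{\text{Matrix Product C}}
$$

*1 c_i,j: 32-bit components.

Figure 9.2 Expression of Components of Matrix A × Matrix B = Matrix Product C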


The components c_n,m of matrix product C are obtained using the following equation (three terms for the 3 × 3 matrices of the example).

$$c_{n,m} = \sum_{i=1}^{3} a_{n,i}\, b_{i,m}$$

That is, each component c_n,m is obtained by performing a sum of products calculation on the row components a_n,i of matrix A and the column components b_i,m of matrix B.
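The equation can be checked with a short C sketch that mimics the 16-bit fixed-point arithmetic. It assumes, as the listing's use of PMULS for fixed-point data suggests, that each 16 × 16 product is doubled (shifted left 1 bit) to give a 32-bit fixed-point result, and that the two operands are never both –1. The function names are hypothetical; the test data are the example values from the listing (0.5 and 0.125 for matrix A, 0.25 and 0.0625 for matrix B).

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point product as assumed for PMULS: 16 x 16 -> 32 bits, shifted left 1. */
static int32_t pmuls_q31(int16_t x, int16_t y)
{
    assert(!(x == INT16_MIN && y == INT16_MIN));   /* (-1) x (-1) would overflow */
    return (int32_t)x * y * 2;
}

/* c[n][m] = sum over i of a[n][i] * b[i][m], accumulated in 32 bits. */
static void matmul3_q15(const int16_t a[3][3], const int16_t b[3][3], int32_t c[3][3])
{
    for (int n = 0; n < 3; n++) {
        for (int m = 0; m < 3; m++) {
            int32_t acc = 0;
            for (int i = 0; i < 3; i++)
                acc += pmuls_q31(a[n][i], b[i][m]);
            c[n][m] = acc;
        }
    }
}
```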

3. Method of Storing Matrix A, Matrix B, and Matrix Product C Components

To increase processing speed, the example program stores matrix A row by row in XRAM and matrix B column by column in YRAM, as shown in figure 9.3, so that both operands of each sum of products can be read with post-incremented X and Y pointers. Each 32-bit component of matrix product C is stored in YRAM as two words, upper 16 bits first.

Offset   XRAM: Matrix A (MATRIXA)   YRAM: Matrix B (MATRIXB)   YRAM: Matrix Product C (MATRIXC)
+0       a_1,1  (A₁)                b_1,1  (B₁)                CH1,1  (C₁)
+2       a_1,2                      b_2,1                      CL1,1
+4       a_1,3                      b_3,1                      CH1,2
+6       a_2,1  (A₂)                b_1,2  (B₂)                CL1,2
+8       a_2,2                      b_2,2                      CH1,3
+A       a_2,3                      b_3,2                      CL1,3
+C       a_3,1  (A₃)                b_1,3  (B₃)                CH2,1  (C₂)
+E       a_3,2                      b_2,3                      CL2,1
+10      a_3,3                      b_3,3                      CH2,2
+12                                                            CL2,2
+14                                                            CH2,3
+16                                                            CL2,3
+18                                                            CH3,1  (C₃)
+1A                                                            CL3,1
+1C                                                            CH3,2
+1E                                                            CL3,2
+20                                                            CH3,3
+22                                                            CL3,3

A₁ to A₃: rows 1 to 3 of matrix A; B₁ to B₃: columns 1 to 3 of matrix B; C₁ to C₃: rows 1 to 3 of matrix product C. Offsets are hexadecimal byte offsets from each label.

*1 CHi,j: Upper 16 bits of Ci,j; CLi,j: Lower 16 bits of Ci,j

Figure 9.3 Memory Map with Matrix A, Matrix B, and Matrix Product C Components Stored


4. Algorithm for Calculating Matrix Product C

Figure 9.4 shows the algorithm for calculating matrix product C. The details of the algorithm

are described below.

(1) Clear the counter registers, set the start address of matrix A in the X address register (R4), and set the start address of matrix B and the storage address of matrix product C in the Y address registers (R6 and R7).

(2) Perform a sum of products calculation on the row components a_n,i of matrix A and the column components b_i,m of matrix B.

(3) Store CHn,m (the upper 16 bits of component c_n,m) and then CLn,m (the lower 16 bits) in consecutive words of the matrix product C area: for the k-th component calculated (k = 0 to 8), CH goes to address MATRIXC+4k and CL to MATRIXC+4k+2.

(4) Return the matrix A pointer to the first column of the current row.

(5) Determine whether one row of matrix product C has been calculated. If m is not 3, return to process (2). If m is 3, move to process (6).

(6) Move the matrix A pointer down one row and return the matrix B pointer to the start of matrix B.

(7) Determine whether all three rows of matrix product C have been calculated. If n is not 3, return to process (2). If n is 3, all components c_n,m have been calculated and processing ends.
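Processes (1) to (7) and the memory layout of figure 9.3 can be followed in a C sketch in which plain pointers stand in for R4, R6, and R7, and each 32-bit component is written as two 16-bit words, upper word first. The names are hypothetical, and the doubled product again assumes PMULS-style fixed-point multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* matrixa: 9 words, row by row; matrixb: 9 words, column by column;
   matrixc: 18 words receiving CH, CL for each component in row order. */
static void matrix_product_words(const int16_t *matrixa, const int16_t *matrixb,
                                 uint16_t *matrixc)
{
    const int16_t *r4 = matrixa;              /* X pointer: current row of A */
    uint16_t *r7 = matrixc;                   /* Y pointer: next store address */
    for (int row = 0; row < 3; row++) {       /* R10 counts rows (R13 = 3) */
        const int16_t *r6 = matrixb;          /* Y pointer: back to column 1 of B */
        for (int col = 0; col < 3; col++) {   /* R12 counts components (R11 = 3) */
            int32_t a0 = 0;
            for (int i = 0; i < 3; i++)       /* REPEAT ..., #3 */
                a0 += (int32_t)(*r4++) * (*r6++) * 2;   /* PMULS, PADD */
            r4 -= 3;                          /* ADD #-6,R4: first column of this row */
            uint32_t bits = (uint32_t)a0;
            *r7++ = (uint16_t)(bits >> 16);   /* CH */
            *r7++ = (uint16_t)(bits & 0xFFFFu);   /* CL */
        }
        r4 += 3;                              /* ADD #6,R4: next row of A */
    }
}
```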


(1) Initial setting

(2) Sum of products calculation on row components a_n,i of matrix A and column components b_i,m of matrix B: $c_{n,m} = \sum_{i=1}^{3} a_{n,i}\, b_{i,m}$

(3) Store CHn,m (upper 16 bits of c_n,m) and CLn,m (lower 16 bits) in the matrix product C area

(4) Return matrix A column components to first column

(5) m = 3? No: return to (2). Yes: go to (6).

(6) Shift matrix A row components down one row

(7) n = 3? No: return to (2). Yes: End.

Figure 9.4 Algorithm for Calculating Matrix Product C


Flowchart

Start

(1) Initial setting
  Clear register R10
  Clear register R12
  Transfer the MATRIXA (H'1000FF00) address to register R4
  Transfer the MATRIXB (H'1001FF00) address to register R6
  Transfer the MATRIXC (H'1001FF12) address to register R7

(2) Sum of products calculation
  Use the extended instruction REPEAT to set the repeat start address (LOOP_S), the repeat end address (LOOP_E), and the number of repeats (3 times)
  Clear register M0
  Clear register A0
  Repeat the following the number of times set by REPEAT (3 times in the example program):
    After reading 1 component a_i,j from matrix A, increment the R4 address
    After reading 1 component b_i,j from matrix B, increment the R6 address
    Multiply matrix A component a_i,j by matrix B component b_i,j
    Add the product of a_i,j and b_i,j to the sum from the previous repeat; c_i,j has been calculated when the repeat operation finishes

(3) Storing of c_i,j
  Shift the matrix product c_i,j obtained in process (2) 16 bits to the left
  Store the upper 16 bits of c_i,j (cH_i,j) at the R7 address and increment R7
  Store the lower 16 bits (cL_i,j) at the next address and increment R7

(4) Return the matrix A column components to the first column
  Calculation of 1 component of matrix product C is finished, so increment counter register R12

(5) Is calculation of 1 row of matrix product C finished (R11 = R12)?
  No: return to (2)
  Yes: clear register R12 (clear counter)

(6) Shift the matrix A row components down one row and return R6 to the MATRIXB address
  Calculation of 1 row of matrix product C is finished, so increment counter register R10

(7) Is calculation of 3 rows of matrix product C finished (R13 = R10)?
  No: return to (2)
  Yes: End


Main Program

matrix.src

;*******************************************************************************************
;* Matrix operation routine
;* [A][B]=[C]
;*******************************************************************************************
MAIN:   MOV.L    #0,R10
        MOV.L    #0,R12
        MOV.L    #MATRIXA,R4
        MOV.L    #MATRIXB,R6
        MOV.L    #MATRIXC,R7
;****************************************
;Calculate all components/R10, R13
;****************************************
        MOV.L    #3,R13                 ;Set repeat value (number of rows)
MATORIX:
;**********************************
;Calculate row components of n’th row
;**********************************
        MOV.L    #3,R11                 ;Set repeat value (number of columns)
RETSU:
;****************************
;Calculate 1 component
;****************************
        BSR      SEIBUN
        NOP
        BSR      STORE
        NOP
;****************************
        ADD      #-6,R4                 ;Return address to first column of row i of matrix A
        ADD      #1,R12                 ;Increment counter each time 1 component of 1 row of matrix product C is calculated
        CMP/EQ   R11,R12                ;Is sum of products calculation for 1 row of matrix product C finished?
        BF       RETSU
        MOV.L    #0,R12                 ;Clear counter
;**********************************
        ADD      #6,R4
        MOV.L    #MATRIXB,R6
        ADD      #1,R10                 ;Increment counter when sum of products calculation for 1 row of matrix product C is finished
        CMP/EQ   R13,R10                ;Is sum of products calculation for last row of matrix product C finished?
        BF       MATORIX
;****************************************
EXIT:   BRA      EXIT
        NOP

;*******************************************************************************************
;Matrix C 1 component calculation routine
;*******************************************************************************************
SEIBUN: REPEAT   LOOP_S,LOOP_E,#3       ;Number of columns in matrix [A] (rows in [B]) is number of repeats
        PCLR                            ;Clear for repeat
LOOP_S: MOVX.W   @R4+,X0   MOVY.W @R6+,Y0    ;aij,bij load
        PMULS    X0,Y0,M0
LOOP_E: PADD     A0,M0,A0
        RTS
        NOP
;*******************************************************************************************
;Matrix C 1 component storage routine
;*******************************************************************************************
STORE:  PSHA     #16,A0
        MOVY.W   A0,@R7+                ;Store upper bits of c_i,j
        MOVY.W   A0,@R7+                ;Store lower bits of c_i,j
        RTS
        NOP
;***********************
MAIN_E: NOP

Data

;*********************************************************************************

;* Matrix operation data (XRAM/YRAM)

;*********************************************************************************

.SECTION XRAM,DATA,LOCATE=H'1000FF00

MATRIXA:

.XDATA.W

0.5,0.125,0.5,0.125,0.5,0.125,0.5,0.125,0.5

.SECTION YRAM,DATA,LOCATE=H'1001FF00

MATRIXB:

MATRIXC:

.RES.W

0.25,0.0625,0.25,0.0625,0.25,0.0625,0.25,0.0625,0.25


Section 10 Inner Product

Overview

The inner product (32-bit precision) of two non-zero n-dimensional space vectors a and b, each with 16-bit components, is calculated. Vectors a and b are set in XRAM and YRAM beforehand. The inner product of a and b is stored in YRAM at address IN_PRO (H'1001FF0A).

Description

1. Method of Expressing Space Vectors

Figure 10.1 shows an expression of the components of n-dimensional space vector a. An n-

dimensional space vector can be thought of as a vector consisting of a group of n real numbers.

There are two ways of expressing the components of a vector: as a row vector and as a column

vector.

$$
\text{(a) Row vector: } (a_1, a_2, \ldots, a_n)
\qquad
\text{(b) Column vector: } \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
$$

*1 a_i: 16-bit

Figure 10.1 Expression of Components of n-dimensional Space Vector a


2. Method of Calculating Inner Product

Figure 10.2 shows an expression of the components of the inner product of n-dimensional

space vectors a and b. Here the inner product of vectors a and b is expressed as (a,b).

$$
(a, b) = (a_1, a_2, \ldots, a_i, \ldots, a_n)
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_i \\ \vdots \\ b_n \end{pmatrix}
= a_1 b_1 + a_2 b_2 + \cdots + a_i b_i + \cdots + a_n b_n
$$

Row vector a (n-dimensional space vector) multiplied by column vector b (n-dimensional space vector).

*1 a_i, b_i: 16-bit
*2 (a,b): 32-bit

Figure 10.2 Expression of Components of Inner Product of n-dimensional Space Vectors a and b

The inner product (a,b) is obtained using the following equation.

$$(a, b) = \sum_{i=1}^{n} a_i b_i$$

Using the above equation, the inner product (a,b) is obtained by performing a sum of products calculation on components a_i of space vector a and components b_i of space vector b.
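The sum of products can be sketched in C with the same fixed-point assumption as the matrix example (each 16 × 16 product doubled to a 32-bit fixed-point value), splitting the result into the two words stored at IN_PRO and IN_PRO+2. The function name and test vectors are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a,b) = sum of a_i * b_i for 16-bit fixed-point components, accumulated in
   32 bits and split into (a,b)H (out[0]) and (a,b)L (out[1]). */
static void inner_product_q15(const int16_t *a, const int16_t *b, size_t n, uint16_t out[2])
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i] * 2;       /* doubled product, as assumed for PMULS */
    uint32_t bits = (uint32_t)acc;
    out[0] = (uint16_t)(bits >> 16);            /* (a,b)H, stored at IN_PRO */
    out[1] = (uint16_t)(bits & 0xFFFFu);        /* (a,b)L, stored at IN_PRO+2 */
}
```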


3. Method of Storing Inner Product (a,b) of n-dimensional Space Vectors a and b

Figure 10.3 shows how the components of n-dimensional space vectors a and b are stored in XRAM and YRAM and where their inner product (a,b) is stored.

Address          XRAM          Address          YRAM
VECTORA          a₁            VECTORB          b₁
VECTORA+2        a₂            VECTORB+2        b₂
VECTORA+4        a₃            VECTORB+4        b₃
  :              :               :              :
VECTORA+2n–4     a_n–1         VECTORB+2n–4     b_n–1
VECTORA+2n–2     a_n           VECTORB+2n–2     b_n

Address          YRAM
IN_PRO           (a,b)H
IN_PRO+2         (a,b)L

*1 (a,b)H: Upper 16 bits of (a,b); (a,b)L: Lower 16 bits of (a,b)

Figure 10.3 Method of Storing Inner Product (a,b) of n-dimensional Space Vectors a and b


4. Algorithm for Calculating Inner Product

Figure 10.4 shows the algorithm for calculating the inner product (a,b). The details of the

algorithm are described below.

(1) Set the start addresses of the space vector a and b components in the X address register (R4) and the Y address register (R6), and set the address for storing the inner product of a and b in the Y address register (R7).

(2) Perform a sum of products calculation on components a_i of space vector a and components b_i of space vector b.

(3) Store (a,b)H, the upper 16 bits of inner product (a,b), at the IN_PRO address and (a,b)L, the lower 16 bits of inner product (a,b), at the IN_PRO+2 address. This completes the process.

(1) Initial setting

(2) Sum of products calculation on components a_i of space vector a and components b_i of space vector b: $(a, b) = \sum_{i=1}^{n} a_i b_i$

(3) Store (a,b)H, the upper 16 bits of inner product (a,b), at the IN_PRO address and (a,b)L, the lower 16 bits, at the IN_PRO+2 address

End

Figure 10.4 Algorithm for Calculating Inner Product

Rev. 1.0, 09/99, page 86 of 115

Flowchart

Start
(1) Initial setting
    (1-1) Transfer VECTORA (H'1000FF00) address to register R4
    (1-2) Transfer VECTORB (H'1001FF00) address to register R6
    (1-3) Transfer IN_PRO (H'1001FF0A) address to register R7
(2) Sum of products
    (2-1) Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address (LOOP_E), and number of repeats (n + 2 times)
    (2-2) Clear register M0
    (2-3) Clear register A0
    (2-4) In parallel: after reading 1 component a_i of vector a from XRAM, increment the R4 address; after reading 1 component b_i of vector b from YRAM, increment the R6 address; multiply a_i by b_i; calculate a_i b_i + $\sum_{j=1}^{i-1} a_j b_j$
(3) Store the result
    (3-1) Shift the obtained inner product (a,b) 16 bits to the left to obtain (a,b)L
    (3-2) Store (a,b)H, the upper 16 bits of inner product (a,b), at the IN_PRO address and increment the IN_PRO address
    Store (a,b)L, the lower 16 bits of inner product (a,b), at the IN_PRO+2 address
End

Main Program

This program calculates the inner product of three-dimensional space vectors a and b {a_i, b_i (i = 1, 2, 3)}.

in_pro.src

;*******************************************************
;* Inner product calculation routine
;* (a,b)=a1b1+a2b2+a3b3
;*******************************************************
;* Initial setting routine
;*******************************************************
MAIN:
        MOV.L   #VECTORA,R4
        MOV.L   #VECTORB,R6
        MOV.L   #IN_PRO,R7
;*******************************************************************************************
;* Sum of products calculation routine
;*******************************************************************************************
        REPEAT  LOOP_S,LOOP_S,#5        ;Number of components in vector a + 2 is number of repeats
        PCLR    M0
        PCLR    A0
LOOP_S:
        PADD    A0,M0,A0  PMULS X0,Y0,M0  MOVX.W @R4+,X0  MOVY.W @R6+,Y0  ;ai,bi load
;*******************************************************************************************
;* Inner product storage routine
;*******************************************************************************************
STORE:
        PSHA    #16,A0  MOVY.W A0,@R7+  ;Store upper bits of inner product
        MOVY.W  A0,@R7                  ;Store lower bits of inner product
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP

Data

;*****************************************************************
;* Inner product calculation data (XRAM/YRAM)
;*****************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
VECTORA:
        .XDATA.W 0.5,0.125,0.5,0,0
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
VECTORB:
        .XDATA.W 0.25,0.0625,0.25,0,0
IN_PRO:
        .RES.W   2

Section 11 Square Root

Overview

This program performs a 16-bit fixed-point square root calculation and obtains a square root with 15-bit precision.

Description

1. I/O Value Data Format

Figure 11.1 shows the data format for I/O values. X, the value whose square root is to be determined, is input in 16-bit format with its uppermost bit set to 0. X must also be normalized before the square root is calculated.

The square root, √X, is output in 16-bit (1 word) format with the uppermost bit set to 0.

Figure 11.1 I/O Value Data Format (input value: X, whose square root is to be determined; output value: square root √X; both are 16-bit words with bit 15 set to 0, and the figure marks the decimal point position)
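To make the format concrete, the following C sketch converts between a fraction and the 16-bit format of Figure 11.1 (bit 15 = 0, 15 fraction bits). Rounding to nearest and clamping to H'0000 through H'7FFF are assumptions of this sketch; other rounding rules can differ by one LSB. The function names are hypothetical.

```c
#include <stdint.h>

/* Fraction -> 16-bit word with bit 15 = 0 (value = word / 32768). */
uint16_t to_q15(double x)
{
    if (x <= 0.0)
        return 0x0000;
    if (x >= 1.0)
        return 0x7FFF;
    long v = (long)(x * 32768.0 + 0.5); /* round half up */
    return (uint16_t)(v > 0x7FFF ? 0x7FFF : v);
}

/* 16-bit word with bit 15 = 0 -> fraction. */
double from_q15(uint16_t q)
{
    return (double)q / 32768.0;
}
```

For example, 0.85 maps to H'6CCD and 0.1 maps to H'0CCD, the encodings that appear in Table 11.1 and in the DAT2 constant.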

2. Method of Calculating Square Root

Figure 11.2 illustrates the square root function. The example program first calculates an approximate value for the square root of X using a polyline graph of the sort shown in Figure 11.2. It then uses a gradualization (recurrence) equation to converge on a more accurate value. This is the method used to calculate the square root, √X.

Once X has been normalized, X, the value whose square root is to be calculated, lies in the following range.

$$0 \le X < 1.0 \qquad (\text{H'0000} \le X \le \text{H'7FFF})$$

In the square root function shown in Figure 11.2, the polyline consists of a comparatively gentle segment for X greater than 0.1 and a steep segment for X of 0.1 or less, resulting in approximation equations (1) and (2). These two equations give an approximate square root value (y₀).

Figure 11.2 Square Root Function (horizontal axis: value whose square root is to be determined, X, marked at 0.1, 0.25, 0.5, 0.7 and 1.0; the curve √X is approximated by the line y₀ = 0.58579 × X + 0.41422 for X > 0.1 and by the line y₀ = 3.16228 × X for X ≤ 0.1)

For input value X > 0.1:

$$y_0 = 0.58579 \times X + 0.41422 \qquad (1)$$

For input value X ≤ 0.1:

$$y_0 = 3.16228 \times X \qquad (2)$$

Equation (2) cannot be used without modification in fixed-point calculation, because the coefficient 3.16228 lies outside the range of the fractional format (below 1.0). It is therefore normalized, and the actual program calculates y₀ = 0.79057 × X × 2².
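The two coefficients can be checked against the points marked in Figure 11.2. Assuming, as the figure suggests, that equation (1) is the straight line through (0.5, √0.5) and (1.0, √1.0), and that equation (2) is the line through the origin and (0.1, √0.1):

$$\text{slope}_{(1)} = \frac{\sqrt{1.0} - \sqrt{0.5}}{1.0 - 0.5} = 2 - \sqrt{2} \approx 0.58579, \qquad \text{intercept}_{(1)} = 1 - (2 - \sqrt{2}) = \sqrt{2} - 1 \approx 0.41421$$

$$\text{slope}_{(2)} = \frac{\sqrt{0.1}}{0.1} = \sqrt{10} \approx 3.16228 = 0.79057 \times 2^2$$

The intercept used by the program, 0.41422, is slightly above √2 − 1. As a worked example, at X = 0.7 equation (1) gives 0.58579 × 0.7 + 0.41422 ≈ 0.82427, against √0.7 ≈ 0.83666; the gradualization equation removes this difference.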

Next, the value y₀ obtained with approximation equation (1) or (2) is substituted into gradualization equation (3) to obtain a more accurate square root value, √X.

$$y = \sqrt{X} \approx \frac{1}{2}\left(y_0 + \frac{X}{y_0}\right) \qquad (3)$$

Each result replaces y₀ for the next application of the equation. Because X has been normalized, the second term of equation (3), X/y₀, must also be a normalized value (less than 1.0); this requires y₀ > X after approximation equation (1) or (2) has been applied. In the sample program, gradualization equation (3) is applied three times, resulting in a square root value with 15-bit precision.
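Equation (3) is Newton's method (also known as Heron's method) applied to f(y) = y² − X. The derivation below also shows why every pass keeps y at or above √X, and therefore above X when 0 < X < 1, so that X/y stays a normalized value:

$$y_{k+1} = y_k - \frac{y_k^2 - X}{2y_k} = \frac{1}{2}\left(y_k + \frac{X}{y_k}\right)$$

$$y_{k+1} - \sqrt{X} = \frac{(y_k - \sqrt{X})^2}{2y_k} \ge 0 \qquad (y_k > 0)$$

The error is roughly squared on each pass, which is why only a few applications are needed once y₀ is a reasonable approximation.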

3. Algorithm for Fixed-point Square Root Calculation

The algorithm for fixed-point square root calculation is described below.

(1) Initial settings are performed.

(2) Check whether X, the value whose square root is to be calculated, is 0. If X is 0, the square root, √X, is given as 0 and processing ends.

(3) Check whether X is a negative number. If X is negative, √X is given as H'FFFF and processing ends.

(4) Compare X with H'7FFB. If X > H'7FFB, √X is given as X and processing ends.

(5) Compare X with 0.1. If X > 0.1, processing continues with (6). If X ≤ 0.1, processing continues with (6)'.

(6) Equation (1) is used to calculate approximate square root y₀. Processing continues with (7).

(6)' Equation (2) is used to calculate approximate square root y₀. Processing continues with (7).

(7) Compare approximate square root y₀ with X. If y₀ = X, y₀ is divided by 2 and 0.5 (H'4000) is added; the result is given as √X and processing ends.

(8) If the comparison in (7) shows that X is greater than approximate square root y₀, gradualization equation (3) is not applied, because X/y₀ would not be a normalized value. In this case √X is given as H'FFFF and processing ends.

(9) Otherwise, gradualization equation (3) is used to calculate square root value y, which is given as √X, and processing ends.
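Steps (1) to (9) can be sketched in C as follows. This is an illustration under stated assumptions, not a bit-exact model of rout.src: the constants are the program's coefficients rounded to Q15, multiplication truncates (>> 15), the 15-step DIV1 loop is replaced by an integer division that also yields a 15-bit quotient, and the two halvings are merged into one shift. Step (4) follows the wording of the text (√X = X); the listing instead outputs the constant stored at EX_OUT2. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define K1_SLOPE    19195  /* 0.58579, equation (1) */
#define K1_OFFSET   13573  /* 0.41422, equation (1) */
#define K2_SLOPE    25905  /* 0.79057 = 3.16228 / 4, equation (2) */
#define Q15_POINT_1 0x0CCD /* 0.1 */
#define Q15_HALF    0x4000 /* 0.5 */
#define X_LIMIT     0x7FFB
#define ERR_VALUE   0xFFFF

static int32_t q15_mul(int32_t a, int32_t b)
{
    return (a * b) >> 15; /* truncating Q15 product */
}

/* Square root of a normalized Q15 input, following steps (2) to (9). */
uint16_t sqrt_q15(int16_t x, int iterations)
{
    if (x == 0)
        return 0;                           /* step (2) */
    if (x < 0)
        return ERR_VALUE;                   /* step (3) */
    if (x > X_LIMIT)
        return (uint16_t)x;                 /* step (4), as worded in the text */

    int32_t y;                              /* steps (5), (6), (6)' */
    if (x > Q15_POINT_1)
        y = q15_mul(K1_SLOPE, x) + K1_OFFSET;
    else
        y = q15_mul(K2_SLOPE, x) << 2;      /* 0.79057 * X * 2^2 */

    if (y == x)                             /* step (7) */
        return (uint16_t)((y >> 1) + Q15_HALF);
    if (x > y)                              /* step (8): X/y0 would be >= 1.0 */
        return ERR_VALUE;

    for (int k = 0; k < iterations; ++k) {  /* step (9), equation (3) */
        int32_t q = ((int32_t)x << 15) / y; /* X/y in Q15, below 1.0 since y > X */
        y = (y + q) >> 1;
    }
    return (uint16_t)y;
}
```

With iterations = 3 this sketch returns H'7602 for H'6CCD and H'2F33 for H'1168, the output values listed in Table 11.1. For the small input H'0147 it needs a fourth pass to reach the table's H'0CC9, because equation (2) starts well below √X there.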

Figure 11.3 shows the algorithm used for calculating the square root.

Figure 11.3 Algorithm for Calculating Square Root (flowchart of steps (1) to (9) above)

Flowchart

Start
(1) Initial setting
    (1-1) Transfer INPUT address to register R4
    (1-2) Transfer EX_OUT address to register R5
    (1-3) Transfer KINJI1 address to register R6
    (1-4) Transfer DAT1 address to register R7
(2) Load input value X in register R0. Is the value in register R0 (input value X) 0 (X = 0?)
    Yes: Load H'0 in register X0, copy register X0 data (H'0) to register A0, and go to FIN.
(3) Exchange the lower word of register R0 and the upper word of register R1, then shift register R1 (whose upper word is input value X) 1 bit to the left to determine the sign. Is bit 31 of register R1 1 (X < 0?)
    Yes: Load H'FFFF in register X0, copy register X0 data (H'FFFF) to register A0, and go to FIN.
(4) Load input value X in register R0 and H'7FFB in register R1. Is R0 greater than R1 (X > H'7FFB?)
    Yes: Transfer EX_OUT2 address to register R5, load input value X in register X0, copy register X0 data to register A0, and go to FIN.
(5) Transfer DAT2 address to register R7 and load 0.1 in register R1. Is R0 greater than R1 (X > 0.1?)
    Yes: Go to (6). No: Go to (6)'.
(6) Load input value X in register X1 and the data for approximate square root calculation (0.58579) in register Y0. Load input value X in register R1 and transfer WORK address to register R4. Multiply register X1 by register Y0 (0.58579X) and load the data for approximate square root calculation (0.41422) in register Y1. Add register A1 and register Y1 (0.58579X + 0.41422). Go to (7).
(6)' Transfer KINJI2 address to register R6. Load input value X in register X1 and the data for approximate square root calculation (0.79057) in register Y0. Load input value X in register R1 and transfer WORK address to register R4. Multiply register X1 by register Y0 (0.79057X), then shift 2 bits to the left to multiply 0.79057X by 4.
(7) Load approximate square root y₀ in register R0 via @R4. Is approximate square root y₀ equal to input value X (y₀ = X?)
    Yes: Shift register A0 1 bit to the right to multiply y₀ by 1/2, and load 0.5 in register Y1. Add register A0 and register Y1 (y₀/2 + 0.5), store the result in register A0, and go to FIN.
(8) Is input value X greater than approximate square root y₀ (X > y₀?)
    Yes: Load H'FFFF in register X0, copy register X0 data (H'FFFF) to register A0, and go to FIN.
(9) Set register R14 to 3 (number of times to perform the gradualization equation) and clear register R13 to 0. Then perform the following:
    Increment register R13 (repeat counter).
    Save input value X in register R11 and clear register R12.
    Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address (LOOP_E), and number of repeats (15 times), and initialize for unsigned division.
    Perform a 1-step division of X by y₀, store the T bit in R12, and shift R12 1 bit to the left. This is repeated the specified number of times (15 times in the sample program).
    Transfer X/y₀ to register X0 via @R4.
    Copy register X0 to register Y1.
    Shift register A0 1 bit to the right to multiply y₀ by 1/2, and shift register Y1 1 bit to the right to multiply X/y₀ by 1/2.
    Add the two results to obtain square root y (√X) and store it in register A0.
    Transfer y (√X) to register R0 via @R4.
    Restore input value X in register R1 from register R11.
    Is register R13 greater than register R14? No: Repeat (9). Yes: Go to FIN.
FIN: Store the data in register A0 at the OUTPUT address held in register R7.
End

Main Program

rout.src

;*******************************************************************************************
;* Square root calculation routine
;* √X
;*******************************************************************************************
;* Initial setting routine
;*******************************************************************************************
MAIN:
        MOV.L   #INPUT,R4
        MOV.L   #EX_OUT,R5
        MOV.L   #KINJI1,R6
        MOV.L   #DAT1,R7
;*******************************************************************************************
;* Zero check of value to have square root calculated routine
;*******************************************************************************************
        MOV.W   @R4,R0
        CMP/EQ  #0,R0
        BF      ZERO_CH
                                        ;If zero, do following processing
        MOVX.W  @R4,X0
        PCOPY   X0,A0
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;* Negative value check of value to have square root calculated routine
;*******************************************************************************************
ZERO_CH:
        SWAP.W  R0,R1
        SHAL    R1
        BF      MINUS_CH
                                        ;If negative, do following processing
        MOVX.W  @R5,X0
        PCOPY   X0,A0
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;* Comparison of value to have square root calculated and H'7FFB routine
;*******************************************************************************************
MINUS_CH:
        MOV.W   @R4,R0                  ;X load
        MOV.W   @R7,R1                  ;H'7FFB load
        CMP/GT  R1,R0                   ;R0 > R1 ?
        BF      EQU_SEL
                                        ;If X > H'7FFB, do following processing
        MOV.L   #EX_OUT2,R5
        MOVX.W  @R5,X0                  ;X load
        PCOPY   X0,A0
        BRA     FIN
        NOP
;*******************************************************************************************
;* Approximation equation selection routine
;*******************************************************************************************
EQU_SEL:
        MOV.L   #DAT2,R7
        MOV.W   @R7,R1
        CMP/GT  R1,R0
        BF      Y0_PRO2                 ;If X ≤ 0.1, jump
;*******************************************************************************************
;* Approximate square root y0 calculation routine
;*******************************************************************************************
Y0_PRO1:
        MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root
                                        ;calculated) for use in calculating approximate square root
        MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
        MOV.L   #WORK,R4
        PMULS   X1,Y0,A1  MOVY.W @R6+,Y1  ;0.58579X,0.41422 load
        PADD    A1,Y1,A0                ;0.58579X+0.41422 -> y0
        BRA     HIKAKU
        NOP
;*******************************************************************************************
;* Approximation equation (2) y0 calculation routine
;*******************************************************************************************
Y0_PRO2:
        MOV.L   #KINJI2,R6
        MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root
                                        ;calculated) for use in calculating approximate square root
        MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
        MOV.L   #WORK,R4
        PMULS   X1,Y0,A0                ;0.79057 × X
        PSHA    #2,A0                   ;(0.79057 × X) × 4
;*******************************************************************************************
;* Comparison of approximate square root and value to have square root calculated routine/Part 1
;*******************************************************************************************
HIKAKU:
        MOVX.W  A0,@R4                  ;Pass to CPU unit
        MOV.W   @R4,R0
        CMP/EQ  R0,R1                   ;Approximate square root y0 = input value X (value to have
                                        ;square root calculated)?
        BF      NOT_EQ                  ;If y0 ≠ X, jump
                                        ;If y0 = X, do following processing
        PSHA    #-1,A0  MOVY.W @R6,Y1   ;y0/2,0.5 load
        PADD    A0,Y1,A0                ;y0/2+0.5
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;* Comparison of approximate square root and value to have square root calculated routine/Part 2
;*******************************************************************************************
NOT_EQ:
        CMP/GT  R0,R1
        BF      NOT_GT
                                        ;If y0 < X, do following processing
        MOVX.W  @R5,X0                  ;H'FFFF load
        PCOPY   X0,A0
        BRA     FIN
        NOP
;*******************************************************************************************
;* Square root y calculation using gradualization equation routine
;*******************************************************************************************
NOT_GT:
        MOV.L   #3,R14                  ;Set number of repeats
        MOV.L   #0,R13
LENEAR_LP:
        ADD     #1,R13                  ;Increment counter
        MOV     R1,R11                  ;push X
        MOV.L   #0,R12                  ;Clear register R12
        REPEAT  LOOP_S,LOOP_E,#15
        DIV0U                           ;Unsigned division initialization
LOOP_S:                                 ;R1/R0
        DIV1    R0,R1
LOOP_E:
        ROTCL   R12                     ;Store T bit
        MOV.W   R12,@R4
        MOVX.W  @R4,X0
        PCOPY   X0,Y1
        PSHA    #-1,A0                  ;y0/2
        PSHA    #-1,Y1                  ;(X/y0)/2
        PADD    A0,Y1,A0
        MOVX.W  A0,@R4
        MOV.W   @R4,R0
        MOV     R11,R1                  ;pop X
        CMP/GT  R14,R13
        BF      LENEAR_LP               ;If set number of repeats has been performed, escape
FIN:
        MOV.L   #OUTPUT,R7
        MOVY.W  A0,@R7                  ;Store square root √X
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;*******************************************************************************************
;* Square root calculation data (XRAM/YRAM)
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
INPUT:  .RES.W   1                      ;External input data storage area
WORK:   .RES.W   1                      ;Work area
EX_OUT: .DATA.W  H'FFFF                 ;Output value if input value X < 0
EX_OUT2:.XDATA.W                        ;Output value if input value X > H'7FFB
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
KINJI1: .XDATA.W 0.58579,0.41422,0.5    ;Approximation equation (1)
KINJI2: .XDATA.W 0.79057                ;Approximation equation (2)
DAT1:   .DATA.W  H'7FFB
DAT2:   .XDATA.W 0.1
OUTPUT: .RES.W   1                      ;External output data storage area

Execution Example

The input values for X (INPUT) and the calculated square root values √X (OUTPUT) are shown in Table 11.1.

Table 11.1 Square Root √X Calculation Results (3 Executions of Gradualization Equation)

Input Value X   Input Value X    Logical Value √X   Logical Value √X   Output Value √X
(decimal)       (hexadecimal)    (decimal)          (hexadecimal)      (hexadecimal)
0.9999          H'7FFC           0.99995            H'7FFE             H'7FFF
0.99987         H'7FFB           0.99993            H'7FFD             H'7FFD
0.85            H'6CCD           0.92195            H'7602             H'7602
0.523           H'42F1           0.72319            H'5C91             H'5C90
0.34            H'2BB5           0.5831             H'4AA3             H'4AA2
0.136           H'1168           0.36878            H'2F34             H'2F33
0.087           H'0B23           0.29496            H'25C1             H'25C1
0.01            H'0147           0.1                H'0CCD             H'0CC9
0               H'0000           0                  H'0000             H'0000
-0.7            H'A667           -                  -                  H'FFFF

Section 12 Square Mean Error

Overview

This program calculates the square mean error (root-mean-square error) of two variables, a[i] and b[i] (i = 1, 2, ..., n), each consisting of 16-bit components.

Description

1. Method of Obtaining Square Mean Error

To obtain the square mean error, first consider the error e[i] between the two variables a[i] and b[i], given by equation (1).

$$e[i] = a[i] - b[i] \quad (i = 1, 2, \ldots, n) \qquad (1)^{*1}$$

Next, the error distribution Se² is obtained by dividing the sum of the squares of the errors e[i] by the number of components n:

$$\frac{1}{n}\sum_{i=1}^{n} e[i]^2 = \frac{1}{n}\left\{(a[1]-b[1])^2 + (a[2]-b[2])^2 + \cdots + (a[n]-b[n])^2\right\}$$

The error distribution Se² is therefore given by equation (2).

$$S_e^2 = \frac{1}{n}\sum_{i=1}^{n} (a[i]-b[i])^2 \qquad (2)$$

The square mean error E[Se²] is the square root of the error distribution Se², as shown in equation (3).

$$E[S_e^2] = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a[i]-b[i])^2} \qquad (3)$$

*1 a[i]: 16-bit; b[i]: 16-bit; e[i]: 16-bit
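Equations (1) to (3) can be checked in floating point with the C sketch below before looking at the fixed-point program. The function names are hypothetical, and the square root is taken with the Newton recurrence of Section 11 rather than a math library; that is a choice of this sketch, made so that it needs no extra library.

```c
#include <stddef.h>

/* Equation (2): Se^2 = (1/n) * sum of (a[i] - b[i])^2 */
double error_distribution(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double e = a[i] - b[i]; /* equation (1) */
        sum += e * e;
    }
    return n ? sum / (double)n : 0.0;
}

/* Equation (3): square root of Se^2 via y = (y + s/y) / 2. */
double square_mean_error(const double *a, const double *b, size_t n)
{
    double s = error_distribution(a, b, n);
    if (s <= 0.0)
        return 0.0;
    double y = s > 1.0 ? s : 1.0; /* start at or above sqrt(s) */
    for (int k = 0; k < 200; ++k) {
        double next = 0.5 * (y + s / y);
        if (next >= y)
            break; /* iterates decrease toward sqrt(s); stop when they stall */
        y = next;
    }
    return y;
}
```

With the sample data of squ_ave.src (a = 0.5, 0.125, 0.5 and b = 0.25, 0.0625, 0.25), Se² is 0.04296875 and the square mean error is about 0.20729.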

2. Method of Storing Components of Variables a[i] and b[i]

In order to obtain the square mean error, it is first necessary to calculate the sum of the squares of the errors e[i]. To increase processing speed, the components of a[i] and b[i] are stored in XRAM and YRAM ahead of time, as shown in figure 12.1. Note that 0 is stored at VECTORA+2n and VECTORA+2n+2 in XRAM and at VECTORB+2n and VECTORB+2n+2 in YRAM. The example program will not run properly if zeros are not stored in these locations, because the two extra loop passes described in 3 below read them.

For division by the number of components n, the value 1/n is stored in XRAM. The program does not actually perform a division; instead it multiplies the sum by 1/n.

Figure 12.1 Memory Map of Storage of Variables a[i] and b[i], Etc. (XRAM: a[1], a[2], a[3], …, a[n] at VECTORA, VECTORA+2, VECTORA+4, …, VECTORA+2n-2, with 0 at VECTORA+2n and VECTORA+2n+2, plus the value 1/n; YRAM: b[1], b[2], b[3], …, b[n] at VECTORB, VECTORB+2, VECTORB+4, …, VECTORB+2n-2, with 0 at VECTORB+2n and VECTORB+2n+2)

3. Algorithm for Calculating Square Mean Error

The algorithm used to calculate the square mean error is described below.

(1) Perform initial settings.

(2) Set the number of repeats of items (2) and (3) to the number of elements n + 2. The two extra repeats are needed because the following four operations run in parallel, so the square of each error is added to the running sum only two passes after a[i] and b[i] are loaded: calculate $e[i]^2 + \sum_{j=1}^{i-1} e[j]^2$, calculate e[i]², load a[i], and load b[i].

(3) Calculate the error e[i] for a[i] and b[i].

(4) Divide $\sum_{i=1}^{n} (a[i] - b[i])^2$, which was obtained in processes (2) and (3), by n.

(5) Calculate the square root of the error distribution Se². This yields the square mean error and completes the processing. (For details, see 3. Algorithm for Fixed-point Square Root Calculation in Section 11, Square Root.)
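The reason for the n + 2 repeats and the two trailing zeros can be seen in a software model of the loop. The C sketch below is a hypothetical model, not a bit-exact one: in each pass the sum, the square and the loads use the values left by the previous pass, as the parallel PADD, PMULS, MOVX.W and MOVY.W do, and the error is then formed from the new loads. Errors are Q15 integers and squares are kept exactly as Q30 integers, which is an assumption of the sketch. The caller must supply a and b with two trailing zeros (length n + 2).

```c
#include <stddef.h>
#include <stdint.h>

/* Pipelined sum of squares over n components; a and b have n + 2 entries,
 * the last two being 0. Result is in Q30. */
int64_t pipelined_sum_of_squares(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0; /* running sum of squares */
    int64_t sq = 0;  /* square computed in the previous pass */
    int32_t e = 0;   /* error computed in the previous pass */
    for (size_t k = 0; k < n + 2; ++k) {
        /* parallel part: every right-hand side uses last pass's values */
        int64_t new_sum = sum + sq;
        int64_t new_sq = (int64_t)e * e;
        int16_t xa = a[k];
        int16_t yb = b[k];
        sum = new_sum;
        sq = new_sq;
        e = (int32_t)xa - yb; /* error for the components just loaded */
    }
    return sum;
}
```

With n passes only, the last two squares would still be in the pipeline when the loop ends; the two extra passes flush them while reading the zero entries.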

Figure 12.2 Algorithm for Calculating Square Mean Error (flowchart of steps (1) to (5) above; step (4) is shown as $S_e^2 = \frac{1}{n}\sum_{i=1}^{n} (a[i]-b[i])^2$)

Flowchart

Start
(1) Initial setting
    (1-1) Transfer VECTORA address to register R4
    (1-2) Transfer SEIBUN_N address to register R5
    (1-3) Transfer VECTORB address to register R6
(2) Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address (LOOP_E), and number of repeats (5 times in the sample program), then clear register A1 and register Y0. The program repeats (2) and (3) the specified number of times. In each pass the following are executed in parallel:
    Add e[i]² and $\sum_{j=1}^{i-1} e[j]^2$
    Calculate e[i]²
    After reading a[i] from XRAM, increment the R4 address
    After reading b[i] from YRAM, increment the R6 address
(3) Calculate error e[i] for a[i] and b[i]
(4) Copy the contents of register Y0 to register A1, read 1/n into register X1, and multiply $\sum_{i=1}^{n} e[i]^2$ by 1/n
(5) Transfer the INPUT address to register R4 and store error distribution Se² (register A1) at the input address (INPUT) used for the square root calculation, then calculate the square root (see the flowchart in Section 11, Square Root, for details)
End

Main Program

The example program calculates the square mean error using three components {a[i], b[i] (i = 1, 2, 3)}.

squ_ave.src

;*******************************************************************************************
;* Square mean routine
;* a[i],b[i]
;*******************************************************************************************
;* Initial setting routine
;*******************************************************************************************
MAIN:
        MOV.L   #VECTORA,R4
        MOV.L   #SEIBUN_N,R5
        MOV.L   #VECTORB,R6
;*******************************************************************************************
;* Error distribution calculation routine
;*******************************************************************************************
        REPEAT  LOOP_S,LOOP_E,#5        ;Number of repeats is number of vector a components + 2
        PCLR    A1
        PCLR    Y0
LOOP_S:
        PADD    A0,Y0,Y0  PMULS A1,A1,A0  MOVX.W @R4+,X0  MOVY.W @R6+,Y1  ;a[i],b[i] load
LOOP_E:
        PSUB    X0,Y1,A1
        PCOPY   Y0,A1
        MOVX.W  @R5,X1                  ;1/3 load
        PMULS   X1,A1,A1                ;0.33333 × Σ(a[i] - b[i])²
;*******************************************************************************************
;* Value to have square root calculated storage routine
;*******************************************************************************************
        MOV.L   #INPUT,R4
        MOVX.W  A1,@R4
;*******************************************************************************************
;* Square root calculation routine
;*******************************************************************************************
;* Initial setting routine
;*******************************************************************************************
SEMI_MAIN:
        MOV.L   #EX_OUT,R5
        MOV.L   #KINJI1,R6
        MOV.L   #DAT1,R7
;*******************************************************************************************
;* Zero check of value to have square root calculated routine
;*******************************************************************************************
        MOV.W   @R4,R0
        CMP/EQ  #0,R0
        BF      ZERO_CH
        MOVX.W  @R4,X0                  ;H'0 load
        PCOPY   X0,A0
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;* Negative value check of value to have square root calculated routine
;*******************************************************************************************
ZERO_CH:
        SWAP.W  R0,R1
        SHAL    R1
        BF      MINUS_CH
                                        ;If negative, do following processing
        MOVX.W  @R5,X0                  ;H'FFFF load
        PCOPY   X0,A0
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;* Comparison of value to have square root calculated and H'7FFB routine
;*******************************************************************************************
MINUS_CH:
        MOV.W   @R4,R0                  ;X load
        MOV.W   @R7,R1                  ;H'7FFB load
        CMP/GT  R1,R0                   ;R0 > R1 ?
        BF      EQU_SEL                 ;If X ≤ H'7FFB, jump
        MOV.L   #EX_OUT2,R5
        MOVX.W  @R5,X0                  ;X load
        PCOPY   X0,A0
        BRA     FIN
        NOP
;*******************************************************************************************
;* Approximation equation selection routine
;*******************************************************************************************
EQU_SEL:
        MOV.L   #DAT2,R7
        MOV.W   @R7,R1
        CMP/GT  R1,R0
        BF      Y0_PRO2                 ;If X ≤ 0.1, jump
;*******************************************************************************************
;* Approximation equation (1) y0 calculation routine
;*******************************************************************************************
Y0_PRO1:
        MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root
                                        ;calculated) for use in calculating approximate square root
        MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
        MOV.L   #WORK,R4
        PMULS   X1,Y0,A1  MOVY.W @R6+,Y1  ;0.58579X,0.41422 load
        PADD    A1,Y1,A0                ;0.58579X+0.41422 -> y0
        BRA     HIKAKU
        NOP
;*******************************************************************************************
;* Approximation equation (2) y0 calculation routine
;*******************************************************************************************
Y0_PRO2:
        MOV.L   #KINJI2,R6
        MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root
                                        ;calculated) for use in calculating approximate square root
        MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
        MOV.L   #WORK,R4
        PMULS   X1,Y0,A0                ;0.79057 × X
        PSHA    #2,A0                   ;(0.79057 × X) × 4
;*******************************************************************************************
;* Comparison of approximate square root and value to have square root calculated routine/Part 1
;*******************************************************************************************
HIKAKU:
        MOVX.W  A0,@R4                  ;Pass to CPU unit
        MOV.W   @R4,R0
        CMP/EQ  R0,R1                   ;Approximate square root = input value X (value to have
                                        ;square root calculated)?
        BF      NOT_EQ
        PSHA    #-1,A0  MOVY.W @R6,Y1   ;y0/2,0.5 load
        PADD    A0,Y1,A0                ;y0/2+0.5
        BRA     FIN
        NOP
;*******************************************************************************************
;* Comparison of approximate square root and value to have square root calculated routine/Part 2
;*******************************************************************************************
NOT_EQ:
        CMP/GT  R0,R1
        BF      NOT_GT
        MOVX.W  @R5,X0                  ;H'FFFF load
        PCOPY   X0,A0
        BRA     FIN
        NOP
;*******************************************************************************************
;* Square root y calculation using gradualization equation routine
;*******************************************************************************************
NOT_GT:
        MOV.L   #3,R14                  ;Set number of repeats
        MOV.L   #0,R13
LENEAR_LP:
        ADD     #1,R13                  ;Increment counter
        MOV     R1,R11
        MOV.L   #0,R12
        REPEAT  DIV_S,DIV_E,#15
        DIV0U                           ;Unsigned division initialization
DIV_S:                                  ;R1/R0
        DIV1    R0,R1
DIV_E:
        ROTCL   R12                     ;Store T bit
        MOV.W   R12,@R4
        MOVX.W  @R4,X0
        PCOPY   X0,Y1
        PSHA    #-1,A0                  ;y0/2
        PSHA    #-1,Y1                  ;(X/y0)/2
        PADD    A0,Y1,A0
        MOVX.W  A0,@R4
        MOV.W   @R4,R0
        MOV     R11,R1
        CMP/GT  R14,R13
        BF      LENEAR_LP
FIN:
        MOV.L   #OUTPUT,R7
        MOVY.W  A0,@R7                  ;Store square root √X
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP

Data

;*******************************************************************************************
;* Square mean calculation data (XRAM/YRAM)
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
VECTORA:
        .XDATA.W 0.5,0.125,0.5,0,0
SEIBUN_N:
        .XDATA.W 0.33333                ;1/number of components (n)
;* For calculating square root *
INPUT:  .RES.W   1
WORK:   .RES.W   1
EX_OUT: .DATA.W  H'FFFF
EX_OUT2:.XDATA.W
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
VECTORB:
        .XDATA.W 0.25,0.0625,0.25,0,0
;* For calculating square root *
KINJI1: .XDATA.W 0.58579,0.41422,0.5    ;Approximation equation (1)
KINJI2: .XDATA.W 0.79057                ;Approximation equation (2)
DAT1:   .DATA.W  H'7FFB
DAT2:   .XDATA.W 0.1
OUTPUT: .RES.W   1

Section 13 Effects of DSP Instructions on Program Performance

The number of execution cycles required by each function program file is listed in tables 13.1 and 13.2.

The test conditions for table 13.1 were as follows: an E8000 (SH7612) emulator was used, the main program of each program file was allocated to XRAM, and the data was allocated to XRAM and YRAM.

The test conditions for table 13.2 were as follows: a simulator (SH-DSP) was used, the main program of each program file was allocated to XROM, and the data was allocated to XRAM and YRAM.

Table 13.1 Performance of Programs Employing DSP Instructions

No. of Execution

Program Filename

pmuls32.src

tri_fun.src

Function

Cycles

116

Notes

32-bit multiplication

Trigonometric function

Matrix operation

Inner product

matrix.src

238

3 × 3 matrix operation

in_pro.src

3-dimensional space vectors

rout.src

Square root

104

114

squ_ave.src

Square mean error

n = 3 (3 components)

Table 13.2 Performance of Programs Employing DSP Instructions

No. of Execution

Program Filename

pmuls32.src

tri_fun.src

Function

Cycles

172

Notes

32-bit multiplication

Trigonometric function

Matrix operation

Inner product

matrix.src

378

3 × 3 matrix operation

in_pro.src

3-dimensional space vectors

rout.src

Square root

272

292

squ_ave.src

Square mean error

n = 3 (3 components)


SH-DSP Software Application Note

Publication Date: 1st Edition, September 1999

Published by:

Electronic Devices Sales & Marketing Group

Semiconductor & Integrated Circuits

Hitachi, Ltd.

Edited by:

Technical Documentation Group

UL Media Co., Ltd.
