HD6437041 [ETC]

Model: HD6437041
Manufacturer: ETC
Description: SuperH RISC Engine SH-DSP Software Application Notes/Q&A
File: 124 pages in total (file size: 407K)
To all our customers  
Regarding the change of names mentioned in the document, such as Hitachi  
Electric and Hitachi XX, to Renesas Technology Corp.  
The semiconductor operations of Mitsubishi Electric and Hitachi were transferred to Renesas  
Technology Corporation on April 1st 2003. These operations include microcomputer, logic, analog  
and discrete devices, and memory chips other than DRAMs (flash memory, SRAMs, etc.).  
Accordingly, although Hitachi, Hitachi, Ltd., Hitachi Semiconductors, and other Hitachi brand  
names are mentioned in the document, these names have in fact all been changed to Renesas  
Technology Corp. Thank you for your understanding. Except for our corporate trademark, logo and  
corporate statement, no changes whatsoever have been made to the contents of the document, and  
these changes do not constitute any alteration to the contents of the document itself.  
Renesas Technology Home Page: http://www.renesas.com  
Renesas Technology Corp.  
Customer Support Dept.  
April 1, 2003  
Cautions  
Keep safety first in your circuit designs!  
1. Renesas Technology Corporation puts the maximum effort into making semiconductor products better  
and more reliable, but there is always the possibility that trouble may occur with them. Trouble with  
semiconductors may lead to personal injury, fire or property damage.  
Remember to give due consideration to safety when making your circuit designs, with appropriate  
measures such as (i) placement of substitutive, auxiliary circuits, (ii) use of nonflammable material or  
(iii) prevention against any malfunction or mishap.  
Notes regarding these materials  
1. These materials are intended as a reference to assist our customers in the selection of the Renesas  
Technology Corporation product best suited to the customer's application; they do not convey any  
license under any intellectual property rights, or any other rights, belonging to Renesas Technology  
Corporation or a third party.  
2. Renesas Technology Corporation assumes no responsibility for any damage, or infringement of any  
third-party's rights, originating in the use of any product data, diagrams, charts, programs, algorithms, or  
circuit application examples contained in these materials.  
3. All information contained in these materials, including product data, diagrams, charts, programs and  
algorithms represents information on products at the time of publication of these materials, and are  
subject to change by Renesas Technology Corporation without notice due to product improvements or  
other reasons. It is therefore recommended that customers contact Renesas Technology Corporation  
or an authorized Renesas Technology Corporation product distributor for the latest product information  
before purchasing a product listed herein.  
The information described here may contain technical inaccuracies or typographical errors.  
Renesas Technology Corporation assumes no responsibility for any damage, liability, or other loss  
arising from these inaccuracies or errors.  
Please also pay attention to information published by Renesas Technology Corporation by various  
means, including the Renesas Technology Corporation Semiconductor home page  
(http://www.renesas.com).  
4. When using any or all of the information contained in these materials, including product data, diagrams,  
charts, programs, and algorithms, please be sure to evaluate all information as a total system before  
making a final decision on the applicability of the information and products. Renesas Technology  
Corporation assumes no responsibility for any damage, liability or other loss resulting from the  
information contained herein.  
5. Renesas Technology Corporation semiconductors are not designed or manufactured for use in a device  
or system that is used under circumstances in which human life is potentially at stake. Please contact  
Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor  
when considering the use of a product contained herein for any specific purposes, such as apparatus or  
systems for transportation, vehicular, medical, aerospace, nuclear, or undersea repeater use.  
6. The prior written approval of Renesas Technology Corporation is necessary to reprint or reproduce in  
whole or in part these materials.  
7. If these products or technologies are subject to the Japanese export control restrictions, they must be  
exported under a license from the Japanese government and cannot be imported into a country other  
than the approved destination.  
Any diversion or reexport contrary to the export control laws and regulations of Japan and/or the  
country of destination is prohibited.  
8. Please contact Renesas Technology Corporation for further details on these materials or the products  
contained therein.  
SuperH RISC Engine  
SH-DSP Software  
Application Note  
ADE-502-069  
Rev. 1.0  
9/21/1999  
Hitachi, Ltd.  
Cautions  
1. Hitachi neither warrants nor grants licenses of any rights of Hitachi’s or any third party’s  
patent, copyright, trademark, or other intellectual property rights for information contained in  
this document. Hitachi bears no responsibility for problems that may arise with third party’s  
rights, including intellectual property rights, in connection with use of the information  
contained in this document.  
2. Products and product specifications may be subject to change without notice. Confirm that you  
have received the latest product standards or specifications before final design, purchase or  
use.  
3. Hitachi makes every attempt to ensure that its products are of high quality and reliability.  
However, contact Hitachi’s sales office before using the product in an application that  
demands especially high quality and reliability or where its failure or malfunction may directly  
threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear  
power, combustion control, transportation, traffic, safety equipment or medical equipment for  
life support.  
4. Design your application so that the product is used within the ranges guaranteed by Hitachi  
particularly for maximum rating, operating supply voltage range, heat radiation characteristics,  
installation conditions and other characteristics. Hitachi bears no responsibility for failure or  
damage when used beyond the guaranteed ranges. Even within the guaranteed ranges,  
consider normally foreseeable failure rates or failure modes in semiconductor devices and  
employ systemic measures such as fail-safes, so that the equipment incorporating Hitachi  
product does not cause bodily injury, fire or other consequential damage due to operation of  
the Hitachi product.  
5. This product is not designed to be radiation resistant.  
6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document  
without written approval from Hitachi.  
7. Contact Hitachi’s sales office for any questions regarding this document or Hitachi  
semiconductor products.  
Preface  
The SH-DSP is a CPU core belonging to the SuperH RISC engine family. It is a 32-bit RISC  
microcontroller based on the SH-2 CPU, optimized for signal processing performance, and  
incorporating a DSP unit.  
These application notes contain example code that makes use of the special features of the
SH-DSP, as well as explanations of how to utilize the hardware. We hope that these application
notes will be useful to programmers designing applications that make use of the DSP functions.
Note that although the operation of the example code in these application notes has been
verified, you must still confirm its operation in your actual implementation.
For more information on the hardware, please refer to the hardware manual for the appropriate  
product.  
Please feel free to contact Hitachi for detailed information on development systems.  
Rev.1.0, 09/99, page v of 7  
SH-DSP Code Samples  
These application notes contain example code written to illustrate the special features of the
SH-DSP.
Figure 1 shows the format used for source code listings in the application notes. The main
program code is transferred to XRAM and executed there. This format is compatible with the
SH7612. When using other SH-DSP models, the following modifications and cautions apply:
(1) XRAM starting address setting
(2) Vector and stack pointer (YRAM ending address + 1 byte) settings
(3) Usage of instructions with other SH-DSP models
(4) Since space for the data used by the main program is reserved in XRAM or YRAM, change the
    XRAM and YRAM address settings to match the microcontroller used
;***************************************************************************
;*  Symbol definition
;***************************************************************************
XRAM_TOP  .EQU     H'1000E000      ;XRAM address (SH7612) ---------------- (1)
;***************************************************************************
;*  Program transfer routine
;***************************************************************************
          .SECTION VECT,CODE,LOCATE=H'0
          .DATA.L  _PRES           ;_PRES
          .DATA.L  H'10020000      ;SP ---------------------------------- (2)
          .SECTION ROM,CODE,LOCATE=H'1000
_PRES:    MOV.L    #XRAM_TOP,R1
          MOV.L    #MAIN,R10
          MOV.L    #MAIN_E,R11
PRG_MOVE: MOV.W    @R10+,R0
          MOV.W    R0,@R1
          ADD      #2,R1
          CMP/GE   R11,R10
          BF       PRG_MOVE
          MOV.L    #XRAM_TOP,R0
          JMP      @R0             ;Branch to program starting address
          NOP                      ;at transfer destination

          Main program ---------------------------------------------- (3)

          Data ------------------------------------------------------ (4)

          .END
Figure 1 Source Code Format  
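As a rough illustration of what the PRG_MOVE loop in Figure 1 does, the following C sketch models it on a host machine. The function name prg_move and its parameters are hypothetical stand-ins for the MAIN, MAIN_E, and XRAM_TOP labels; the sketch only mirrors the loop's logic (copy one 16-bit word, then compare and branch back) and does not access real SH-DSP memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of PRG_MOVE.
 * main_start/main_end stand in for MAIN and MAIN_E, xram_top for XRAM_TOP. */
size_t prg_move(uint16_t *xram_top,
                const uint16_t *main_start,
                const uint16_t *main_end)
{
    const uint16_t *src = main_start;   /* R10 */
    uint16_t *dst = xram_top;           /* R1  */
    size_t copied = 0;

    do {
        *dst++ = *src++;                /* MOV.W @R10+,R0 / MOV.W R0,@R1 / ADD #2,R1 */
        copied++;
    } while (src < main_end);           /* CMP/GE R11,R10 ; BF PRG_MOVE */

    return copied;                      /* number of 16-bit words copied */
}
```

Because the assembly copies a word before CMP/GE R11,R10 is evaluated, the sketch uses a do-while loop, so at least one word is always copied.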
Contents
Section 1 Example of Calling Functions (DSP Library) from C Source Code
  1.1 C Source Code Employing Functions (DSP Library)
  1.2 Linking Assignments
    1.2.1 “prglnk1.sub” Subcommand File for Linking
    1.2.2 “ini.bat” Batch File for Creating Absolute Files
    1.2.3 “vect.src” Vector Table for “dsplbr.c” Program, which Uses DSP Library
  1.3 Function Execution Process
Section 2 X/Y Bus Data Access
  2.1 X Memory Read
  2.2 X Memory Write
  2.3 Y Memory Read
  2.4 Y Memory Write
Section 3 16-bit Fixed-point Multiplication
Section 4 Parallel Execution Instruction
Section 5 Repeat Instruction
Section 6 Examples of Arguments Passed Between CPU Instructions and DSP Instructions
Section 7 32-bit Multiplication
Section 8
Section 9 Matrix Operations
Section 10 Inner Product
Section 11 Square Root
Section 12 Square Mean Error
Section 13 Effects of DSP Instructions on Program Performance
Section 1 Example of Calling Functions (DSP Library) from C Source Code
1.1 C Source Code Employing Functions (DSP Library)
The example code below, “dsplbr.c,” illustrates calling the “Mean” function in the DSP library  
(shdsplib.lib) from C source code.  
/*
    <<SH-DSP Application Notes>>
    -- DSP library usage example --
    "dsplbr.c"
*/
#include "ensigdsp.h"                   /* Mean value definition        (1) */
#define N 6                             /* Input data number                */

short dat[6] = {45, 61, 516, 3000, -974, 10214};    /* Input data           */

#pragma section X                                                    /* (2) */
static short datx[N];                   /* XRAM address                     */
#pragma section Y                                                    /* (3) */
static short daty[N];                   /* YRAM address                     */
#pragma section ANS
static short answer;                    /* Address for storing mean value   */
#pragma section

main()
{
    short i, output[1];   /* Loop variable i, and output for storing
                             the Mean function calculation result */
    int   src_x;          /* Argument specifying the storage area
                             for input data */

    for (i = 0; i < N; i++)
    {
        datx[i] = dat[i];              /* Copy input data to XRAM */
        daty[i] = dat[i];              /* Copy input data to YRAM */
    }

    src_x = 1;                         /* Select XRAM *1: use the XRAM area
                                          for the Mean function calculation (4) */

    Mean(output, datx, N, src_x);      /* Pass Mean function arguments and
                                          calculate the mean value */
    answer = output[0];                /* Store Mean function calculation
                                          result at the answer address */

    while (1);                         /* Processing complete */
}
*1 Refer to 1.3 Function Execution Process for details.  
(1) The formats of the functions in the library shdsplib.lib are defined in the header file
ensigdsp.h.
(2) To ensure efficient X bus data transfer with the DSP unit, datx[N] must be placed in XRAM.
When linking, section X must be assigned to addresses in XRAM. (See 1.2 Linking
Assignments.)
(3) To ensure efficient Y bus data transfer with the DSP unit, daty[N] must be placed in YRAM.
When linking, section Y must be assigned to addresses in YRAM. (See 1.2 Linking
Assignments.)
(4) If src_x = 1, an area in XRAM is used for the Mean function calculations. If src_x = 0, an
area in YRAM is used.
1.2 Linking Assignments
When using the DSP library, take the utmost care to ensure that the section settings are
correct. The example code dsplbr.c shown in section 1.1 places its input arrays in two sections,
X and Y. If these sections are not assigned to XRAM and YRAM addresses, respectively, the
library functions cannot perform their internal calculations correctly. These addresses are
assigned in the subcommand file.
1.2.1 “prglnk1.sub” Subcommand File for Linking
INPUT    vect,dsplbr
START    BX(1000ff00),BANS(1000fff0),BY(1001e000) ------------------ (1)
LIBRARY  shdsplib.lib ---------------------------------------------- (2)
PRINT    dsplbr.map
OUTPUT   dsplbr.abs
FORM     A
DEBUG
EXIT
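(1) BX(1000ff00) assigns #pragma section X (section X) of dsplbr.c to address H'1000FF00.
BY(1001e000) assigns #pragma section Y (section Y) of dsplbr.c to address H'1001E000.
BANS(1000fff0) likewise assigns section ANS to address H'1000FFF0. The leading B is the prefix
the compiler gives to the uninitialized-data part of each #pragma section, which is where the
static variables datx, daty, and answer are allocated.
(2) This specifies shdsplib.lib, which contains the Mean function, as the library to be linked.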
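Because a misplaced section silently breaks the library's internal calculations, it can help to check the START addresses against the on-chip RAM ranges. The sketch below does this under assumed SH7612 boundaries: XRAM starting at H'1000E000 (XRAM_TOP in Figure 1) and YRAM ending at H'1001FFFF (the stack pointer H'10020000 is the YRAM ending address + 1), with 8 Kbytes each. The macro names and section_fits are hypothetical and not part of any Hitachi tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SH7612 on-chip RAM ranges (8 Kbytes each); check these
 * against the hardware manual for the device actually used. */
#define XRAM_START 0x1000E000UL  /* XRAM_TOP in Figure 1          */
#define XRAM_END   0x1000FFFFUL
#define YRAM_START 0x1001E000UL
#define YRAM_END   0x1001FFFFUL  /* stack pointer H'10020000 - 1  */

/* Hypothetical helper: true if a section of 'size' bytes starting at
 * 'start' lies entirely inside the inclusive range [lo, hi]. */
bool section_fits(uint32_t start, uint32_t size, uint32_t lo, uint32_t hi)
{
    if (size == 0 || start < lo || start > hi)
        return false;
    return hi - start >= size - 1;
}
```

For example, section X (datx, six shorts = 12 bytes at H'1000FF00) fits in XRAM, whereas the same section placed at H'1001E000 would not.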
1.2.2 “ini.bat” Batch File for Creating Absolute Files
asmsh vect.src -cpu=shdsp -debug -lis  
shc dsplbr.c -cpu=sh2 -lis -debug -include=ensigdsp.h  
lnk -subcommand=prglnk1.sub  
1.2.3 “vect.src” Vector Table for “dsplbr.c” Program, which Uses DSP Library
;********************************************************
;*
;*  <<SH-DSP Application Notes>>
;*  -- DSP library usage example --
;*  "vect.src"
;*
;********************************************************
          .import   _main
          .section  vect,data,locate=h'0
          .data.l   _main          ;Power-on reset: initial PC
          .data.l   h'10020000     ;Power-on reset: initial SP
          .end
1.3 Function Execution Process
The following shows excerpts from the example code dsplbr.c in section 1.1, together with the
assembler code of the Mean function that it calls.
    .
    .
    src_x = 1;
    Mean(output,datx,N,src_x);   ->  Assembler code resulting from function
                                     Address    Label   Assembler
                                     1001e2fc   _Mean   CMP/PZ   R7
                                     1001e2fe           BF       @1001E322:8
                                     1001e300           MOV      #H'01,R1
                                     1001e302           CMP/GT   R1,R7
                                        .
                                        .
                                     1001e486           NEG      R2,R2
                                     1001e488           MOV.W    R2,@R4
                                     1001e48a           RTS
    answer = output[0];
    .
    .
In table 1.1, the input data is arranged starting at address H'1000FF00. It is assumed that the data  
in RAM has been cleared to 0. The data remains the same after the function is executed.  
Table 1.1 Memory Map
XRAM Memory
H'1000FF00   002D 003D 0204 0BB8
H'1000FF08   FC32 27E6 0000 0000
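The words in Table 1.1 are the 16-bit two's complement encodings of the input data, stored two bytes apart starting at H'1000FF00. A minimal C sketch of that mapping follows; the helper names and DATX_BASE are illustrative only.

```c
#include <stdint.h>

#define DATX_BASE 0x1000FF00UL   /* start address of datx[] (Table 1.1) */

/* Hypothetical helper: the 16-bit word stored in memory for a signed
 * short, i.e. its two's complement bit pattern. */
uint16_t stored_word(int16_t value)
{
    return (uint16_t)value;
}

/* Hypothetical helper: address of datx[i]; each short takes 2 bytes. */
uint32_t word_address(unsigned i)
{
    return (uint32_t)(DATX_BASE + 2UL * i);
}
```

For instance, datx[4] = –974 is stored as FC32 at word_address(4) = H'1000FF08, the first word of the second row.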
Table 1.2 Function Execution Process
Excerpt from dsplbr.c Code     Register Contents
Mean(output,datx,N,src_x);     Before execution: R4=H'1001FFFC, R5=H'1000FF00, R6=6, R7=1
                               After execution:  R4=H'1001FFFC, R5=H'1000FF0C, R6=6, R7=H'10000
The function arguments are assigned to registers R4 to R7 in declaration order, so
output=H'1001FFFC, datx=H'1000FF00, N=6, and src_x=1 are passed to the function. The
calculation result is stored at @R4 (that is, in output[0]).
Table 1.3 C Source Code Execution Process (Process Inside Memory Map)
Excerpt from dsplbr.c Code     YRAM Memory
answer = output[0];            Before execution: H'1001FF00  0000 0000 0000 0000
                               After execution:  H'1001FF00  0860 0000 0000 0000
The C source code then stores the function calculation result from @R4 in answer (H'1001FF00).
Table 1.4 Mean Function Calculation Result
Input Value   Input Value     Logical Value   Logical Value         Output Value
(decimal)     (hexadecimal)   (decimal)       (hexadecimal)         (hexadecimal)
45            H'2D            2143.666667     H'860 (2144           H'860
61            H'3D                            calculated as a
516           H'204                           decimal value)
3000          H'BB8
–974          H'FC32
10214         H'27E6
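The output H'860 is the true mean 2143.666667 rounded to the nearest integer; plain truncating division would give 2143 (H'85F). The C sketch below reproduces the table under the assumption that the result is rounded to nearest. The rounding method actually used inside Mean is not stated in this note, and mean_rounded is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical sketch: mean of n signed 16-bit samples, rounded to the
 * nearest integer (halves away from zero).  The rounding rule is an
 * assumption chosen to reproduce Table 1.4.  Requires n > 0. */
int16_t mean_rounded(const int16_t *x, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += x[i];                     /* 45+61+516+3000-974+10214 = 12862 */

    if (sum >= 0)
        return (int16_t)((sum + n / 2) / n);
    return (int16_t)(-((-sum + n / 2) / n));
}
```

With the six input values, the sum is 12862 and (12862 + 3) / 6 = 2144, which is H'860.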
Section 2 X/Y Bus Data Access
2.1 X Memory Read
Overview
The data at the XRAM_ADD address (H'1000FF00) and the XRAM_ADD+2 address
(H'1000FF02) is transferred to registers X0 and X1, respectively.
Description
Table 2.1 shows the types of X memory read instructions and the registers that can be used as
operands. Data can be read from X memory using the instructions listed in table 2.1.
When data is read from X memory, the transfer data length is 16 bits, so the data is stored in
the upper word of register X0 or X1, and the lower word of that register is cleared to 0.
Processes (1) and (2) in the flowchart are illustrated below.
Table 2.1 X Memory Read Instruction Types
X Memory Read Instruction   Source Register (Ax)   Destination Register (Dx)   Index Register (Ix)
MOVX.W @Ax,Dx               R4, R5                 X0, X1                      R8
MOVX.W @Ax+,Dx
MOVX.W @Ax+Ix,Dx
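To make the effect of MOVX.W on the destination register concrete, the following C sketch models it, together with the conversion of the .XDATA.W constants to 16-bit fixed-point words (0.5 becomes H'4000 and 0.25 becomes H'2000, matching the values used in table 3.1). The function names are hypothetical, and the model ignores timing and bus behavior.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit fixed-point word for a fraction in
 * [-1.0, 1.0), truncating toward zero (e.g. 0.5 -> H'4000). */
uint16_t fixed_word(double value)
{
    return (uint16_t)(int16_t)(value * 32768.0);
}

/* Hypothetical model of MOVX.W @Ax,Dx (X0/X1 destination): the word
 * goes to bits 31-16 and bits 15-0 are cleared to 0. */
uint32_t movx_w_read(uint16_t word)
{
    return (uint32_t)word << 16;
}
```

So after the main program runs, X0 should hold H'40000000 and X1 should hold H'20000000.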
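Process (1)
[Figure: XRAM, bits 31-0, from XRAM_TOP to XRAM_END. The 16-bit word at XRAM_ADD is read and
stored in bits 31-16 of register X0; bits 15-0 of X0 are cleared to 0. The other half of the
memory longword (*1) is ignored.]
Process (2)
[Figure: the 16-bit word at XRAM_ADD+2 is read and stored in bits 31-16 of register X1; bits 15-0
of X1 are cleared to 0. The other half of the memory longword (*1) is ignored.]
*1: Ignored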
Flowchart
Start
  Transfer XRAM address (H'1000FF00) to register R4
  (1) Read data (0.5) from the R4 address (H'1000FF00) into register X0, then increment R4
  (2) Read data (0.25) from the R4 address (H'1000FF02) into register X1
End
Main Program
;**********************************************************************
;*  X memory read
;**********************************************************************
MAIN:     MOV.L    #XRAM_ADD,R4    ;XRAM_ADD address -> register R4
          MOVX.W   @R4+,X0         ;(H'1000FF00) -> X0
          MOVX.W   @R4,X1          ;(H'1000FF02) -> X1
EXIT:     BRA      EXIT
          NOP
MAIN_E:   NOP
Data
;***************************************************************
;*  Data
;***************************************************************
          .SECTION XRAM,DATA,LOCATE=H'1000FF00
XRAM_ADD: .XDATA.W 0.5,0.25
2.2 X Memory Write
Overview
The data at the XRAM_ADD1 address (H'1000FF00) and the XRAM_ADD1+2 address
(H'1000FF02) is transferred to the XRAM_ADD2 and XRAM_ADD2+2 addresses.
Description
Table 2.2 shows the types of X memory write instructions and the registers that can be used as
operands. Data can be written to X memory using the instructions listed in table 2.2.
When data is written to X memory, the transfer data length is 16 bits, so the upper word of
register A0 or A1, as specified by the instruction, is stored in X memory; the guard bits and the
lower word of the register are ignored. Because the X memory write instructions can use only
registers A0 and A1 as source registers (see table 2.2), the data is first loaded into register A0
or A1 by a single data transfer instruction (MOVS.W) with A0 or A1 as the destination operand.
Processes (1) and (2) in the flowchart are illustrated below.
Table 2.2 X Memory Write Instruction Types
X Memory Write Instruction   Source Register (Da)   Destination Register (Ax)   Index Register (Ix)
MOVX.W Da,@Ax                A0, A1                 R4, R5                      R8
MOVX.W Da,@Ax+
MOVX.W Da,@Ax+Ix
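The following C sketch models which part of the 40-bit A0/A1 register reaches memory on an X (or Y) memory write: only bits 31-16 are stored, and the guard bits and lower word are dropped. Holding A0/A1 in the low 40 bits of a uint64_t is an assumption of the sketch, and store_word_from_acc is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of MOVX.W Da,@Ax: A0/A1 is a 40-bit register held
 * in the low 40 bits of a uint64_t (bits 39-32 = guard bits).  Only
 * bits 31-16 are written to memory. */
uint16_t store_word_from_acc(uint64_t acc40)
{
    return (uint16_t)((acc40 >> 16) & 0xFFFFu);
}
```

For example, an accumulator image of H'FF 1234 5678 writes only H'1234; the guard byte H'FF and the lower word H'5678 are discarded.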
Process (1)
[Figure: Memory map (XRAM) from XRAM_TOP to XRAM_END, and register A0 (bits 39-0). Bits 31-16
of A0 are written to XRAM at XRAM_ADD2; the guard bits (39-32) and bits 15-0 are ignored.]
Process (2)
[Figure: bits 31-16 of register A1 are written to XRAM at XRAM_ADD2+2; the guard bits (39-32)
and bits 15-0 are ignored.]
Flowchart
Start
  Transfer XRAM_ADD1 address (H'1000FF00) to register R2
  Transfer XRAM_ADD2 address (H'1000FF04) to register R4
  Transfer data (0.5) from the R2 address (H'1000FF00) to register A0, then increment R2
  (1) Transfer register A0 data to the R4 address (H'1000FF04), then increment R4
  Transfer data (0.25) from the R2 address (H'1000FF02) to register A1
  (2) Transfer register A1 data to the R4 address (H'1000FF06)
End
Main Program
;**********************************************************************
;*  X memory write
;**********************************************************************
MAIN:      MOV.L    #XRAM_ADD1,R2   ;XRAM_ADD1 -> R2 register
           MOV.L    #XRAM_ADD2,R4   ;XRAM_ADD2 -> R4 register
           MOVS.W   @R2+,A0         ;(H'1000FF00) -> A0 register
           MOVX.W   A0,@R4+         ;A0 register data -> XRAM_ADD2
           MOVS.W   @R2,A1          ;(H'1000FF02) -> A1 register
           MOVX.W   A1,@R4          ;A1 register data -> XRAM_ADD2+2
EXIT:      BRA      EXIT
           NOP
MAIN_E:    NOP
Data
;***************************************************************
;*  Data
;***************************************************************
           .SECTION XRAM,DATA,LOCATE=H'1000FF00
XRAM_ADD1: .XDATA.W 0.5,0.25
XRAM_ADD2: .RES.W   2
2.3 Y Memory Read
Overview
The data at the YRAM_ADD address (H'1001FF00) and the YRAM_ADD+2 address
(H'1001FF02) is transferred to registers Y0 and Y1, respectively.
Description
Table 2.3 shows the types of Y memory read instructions and the registers that can be used as
operands. Data can be read from Y memory using the instructions listed in table 2.3.
When data is read from Y memory, the transfer data length is 16 bits, so the data is stored in
the upper word of register Y0 or Y1, and the lower word of that register is cleared to 0.
Processes (1) and (2) in the flowchart are illustrated below.
Table 2.3 Y Memory Read Instruction Types
Y Memory Read Instruction   Source Register (Ay)   Destination Register (Dy)   Index Register (Iy)
MOVY.W @Ay,Dy               R6, R7                 Y0, Y1                      R9
MOVY.W @Ay+,Dy
MOVY.W @Ay+Iy,Dy
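The three read forms in table 2.3 differ only in how the address register is updated after the access. A small C model of that update is shown below. It assumes a 16-bit (.W) transfer, so @Ay+ advances by 2 bytes (as R6 does in the main program, from H'1001FF00 to H'1001FF02), and @Ay+Iy adds the contents of index register R9. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the three MOVY.W read forms in table 2.3. */
typedef enum { AY_PLAIN, AY_POSTINC, AY_POSTINDEX } ay_mode;

/* Value of Ay after a 16-bit MOVY.W read; iy is the contents of R9. */
uint32_t next_ay(uint32_t ay, ay_mode mode, uint32_t iy)
{
    switch (mode) {
    case AY_POSTINC:   return ay + 2;    /* @Ay+   : next 16-bit word     */
    case AY_POSTINDEX: return ay + iy;   /* @Ay+Iy : add index register   */
    default:           return ay;        /* @Ay    : address unchanged    */
    }
}
```

The data transfer itself is the same in all three cases; only the pointer arithmetic differs, which is what makes @Ay+Iy useful for stepping through data with a fixed stride.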
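Process (1)
[Figure: YRAM, bits 31-0, from YRAM_TOP to YRAM_END. The 16-bit word at YRAM_ADD is read and
stored in bits 31-16 of register Y0; bits 15-0 of Y0 are cleared to 0. The other half of the
memory longword (*1) is ignored.]
Process (2)
[Figure: the 16-bit word at YRAM_ADD+2 is read and stored in bits 31-16 of register Y1; bits 15-0
of Y1 are cleared to 0. The other half of the memory longword (*1) is ignored.]
*1: Ignored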
Flowchart
Start
  Transfer YRAM address (H'1001FF00) to register R6
  (1) Read data (0.5) from the R6 address (H'1001FF00) into register Y0, then increment R6
  (2) Read data (0.25) from the R6 address (H'1001FF02) into register Y1
End
Main Program
;**********************************************************************
;*  Y memory read
;**********************************************************************
MAIN:     MOV.L    #YRAM_ADD,R6    ;YRAM_ADD address -> R6 register
          MOVY.W   @R6+,Y0         ;(H'1001FF00) -> Y0
          MOVY.W   @R6,Y1          ;(H'1001FF02) -> Y1
EXIT:     BRA      EXIT
          NOP
MAIN_E:   NOP
Data
;***************************************************************
;*  Data
;***************************************************************
          .SECTION YRAM,DATA,LOCATE=H'1001FF00
YRAM_ADD: .XDATA.W 0.5,0.25
2.4 Y Memory Write
Overview
The data at the YRAM_ADD1 address (H'1001FF00) and the YRAM_ADD1+2 address
(H'1001FF02) is transferred to the YRAM_ADD2 and YRAM_ADD2+2 addresses.
Description
Table 2.4 shows the types of Y memory write instructions and the registers that can be used as
operands. Data can be written to Y memory using the instructions listed in table 2.4.
When data is written to Y memory, the transfer data length is 16 bits, so the upper word of
register A0 or A1, as specified by the instruction, is stored in Y memory; the guard bits and the
lower word of the register are ignored. Because the Y memory write instructions can use only
registers A0 and A1 as source registers (see table 2.4), the data is first loaded into register A0
or A1 by a single data transfer instruction (MOVS.W) with A0 or A1 as the destination operand.
Processes (1) and (2) in the flowchart are illustrated below.
Table 2.4 Y Memory Write Instruction Types
Y Memory Write Instruction   Source Register (Da)   Destination Register (Ay)   Index Register (Iy)
MOVY.W Da,@Ay                A0, A1                 R6, R7                      R9
MOVY.W Da,@Ay+
MOVY.W Da,@Ay+Iy
Process (1)
[Figure: Memory map (YRAM) from YRAM_TOP to YRAM_END, and register A0 (bits 39-0). Bits 31-16
of A0 are written to YRAM at YRAM_ADD2; the guard bits (39-32) and bits 15-0 are ignored.]
Process (2)
[Figure: bits 31-16 of register A1 are written to YRAM at YRAM_ADD2+2; the guard bits (39-32)
and bits 15-0 are ignored.]
Flowchart
Start
  Transfer YRAM_ADD1 address (H'1001FF00) to register R3
  Transfer YRAM_ADD2 address (H'1001FF04) to register R6
  Transfer data (0.5) from the R3 address (H'1001FF00) to register A0, then increment R3
  (1) Transfer register A0 data to the R6 address (H'1001FF04), then increment R6
  Transfer data (0.25) from the R3 address (H'1001FF02) to register A1
  (2) Transfer register A1 data to the R6 address (H'1001FF06)
End
Main Program
;**********************************************************************
;*  Y memory write
;**********************************************************************
MAIN:      MOV.L    #YRAM_ADD1,R3   ;YRAM_ADD1 -> R3 register
           MOV.L    #YRAM_ADD2,R6   ;YRAM_ADD2 -> R6 register
           MOVS.W   @R3+,A0         ;(H'1001FF00) -> A0 register
           MOVY.W   A0,@R6+         ;A0 register data -> YRAM_ADD2
           MOVS.W   @R3,A1          ;(H'1001FF02) -> A1 register
           MOVY.W   A1,@R6          ;A1 register data -> YRAM_ADD2+2
EXIT:      BRA      EXIT
           NOP
MAIN_E:    NOP
Data  
;****************************************************************
;*  Data
;****************************************************************
           .SECTION YRAM,DATA,LOCATE=H'1001FF00
YRAM_ADD1: .XDATA.W 0.5,0.25
YRAM_ADD2: .RES.W   2
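As a host-side check of the Y memory write program, the C sketch below traces its effect on the four words starting at YRAM_ADD1: after execution, YRAM_ADD2 and YRAM_ADD2+2 hold copies of the 0.5 and 0.25 words. It assumes that MOVS.W places the word in bits 31-16 of A0/A1 with the sign extended into the guard bits and the lower word cleared; all function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of MOVS.W @As,Ds with Ds = A0/A1 (assumed behavior):
 * word -> bits 31-16, sign extended into guard bits 39-32, bits 15-0 = 0.
 * Returns a 40-bit register image held in a uint64_t. */
uint64_t movs_w_load(uint16_t word)
{
    int64_t v = (int64_t)(int16_t)word * 65536;  /* word << 16, signed */
    return (uint64_t)v & 0xFFFFFFFFFFULL;        /* keep 40 bits       */
}

/* Hypothetical model of MOVY.W Da,@Ay: only bits 31-16 are stored. */
uint16_t movy_w_store(uint64_t acc40)
{
    return (uint16_t)(acc40 >> 16);
}

/* Trace of the main program: yram[0..1] = YRAM_ADD1, yram[2..3] = YRAM_ADD2. */
void y_memory_write(uint16_t yram[4])
{
    const uint16_t *r3 = yram;          /* MOV.L #YRAM_ADD1,R3 */
    uint16_t *r6 = yram + 2;            /* MOV.L #YRAM_ADD2,R6 */
    uint64_t a0, a1;

    a0 = movs_w_load(*r3++);            /* MOVS.W @R3+,A0 */
    *r6++ = movy_w_store(a0);           /* MOVY.W A0,@R6+ */
    a1 = movs_w_load(*r3);              /* MOVS.W @R3,A1  */
    *r6 = movy_w_store(a1);             /* MOVY.W A1,@R6  */
}
```

Because the store takes back exactly the bits the load filled, the round trip preserves every 16-bit value, including negative ones such as H'8000.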
Section 3 16-bit Fixed-point Multiplication
Overview
This routine multiplies the 16-bit data at the XRAM_ADD address (H'1000F000) by the 16-bit data
at the YRAM_ADD address (H'1001F000) and stores the result at the ANS address (H'1001F002).
Description
1. Data Transfer
The data at the XRAM_ADD address (H'1000F000) and the YRAM_ADD address (H'1001F000) is
transferred using X bus data transfer and Y bus data transfer, as described in section 2, X/Y
Bus Data Access. In process (1) in the flowchart the XRAM and YRAM data is read
simultaneously, but no contention occurs because the X bus and Y bus are independent of each
other. The format is shown below.
When both transfers are written as a single instruction, the X bus data transfer is written
first and the Y bus data transfer second. The instructions may also be combined as
[X memory read] [Y memory write] or as [X memory write] [Y memory read].
Format: MOVX.W @R5,X1 MOVY.W @R7,Y1
2. Fixed-point Multiplication
The PMULS instruction is used to perform the fixed-point multiplication in process (2) in the
flowchart. The format is shown below, and the fixed-point multiplication process is shown in
figure 3.1. Only the upper word of source 1 and source 2 is valid. For example, if a source
register holds the longword H'12345678, only H'1234 is actually multiplied.
Format: PMULS Se,Sf,Dg
[Figure: Source 1 (Se: X0, X1, Y0, A1) and source 2 (Sf: Y0, Y1, X0, A1) are registers of which
only the upper word (bits 31-16) is valid; the lower words are ignored. The two upper words are
multiplied in the MAC (multiplier), and the result is stored in the destination (Dg: M0, M1, A0,
A1) with bit 0 set to 0. For A0 and A1 (bits 39-0), the guard bits (39-32) hold the sign
extension.]
Figure 3.1 Fixed-point Multiplication Process
3. Overflow
An overflow can occur during fixed-point multiplication only for the operation H'8000 (–1.0)
× H'8000 (–1.0), in which case the result is H'8000 0000 (–1.0) instead of 1.0. This happens only
when the destination register is M0 or M1, which have no guard bits. If the destination register
is A0 or A1, both of which have guard bits, the result of the above calculation is the correct
value, H'00 8000 0000 (1.0). Refer to table 3.1 for additional fixed-point multiplication
execution examples.
Since the destination register used in the example main program is A0, no overflow occurs.
Table 3.1 Fixed-point Multiplication Execution Examples
State of Operation Result | Operation Example | Result, Destination M0, M1 | Result, Destination A0, A1
Positive | H'4000 (0.5) × H'2000 (0.25) | H'1000 0000 (0.125) | H'00 1000 0000 (0.125)
Negative | H'0800 (0.0625) × H'FC00 (–0.03125) | H'FFC0 0000 (–1.95 × 10^–3) | H'FF FFC0 0000 (–1.95 × 10^–3)
Overflow | H'8000 (–1.0) × H'8000 (–1.0) | H'8000 0000 (–1.0) | H'00 8000 0000 (1.0)
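The PMULS behavior described above, including the overflow row of table 3.1, can be modeled in a few lines. This is a sketch under stated assumptions, not Renesas code: source registers are passed as 32-bit integer images, the destination name only selects a 32-bit (M0, M1) or 40-bit (A0, A1) result width, and the helper names `to_signed`, `pmuls`, and `as_fraction` are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pmuls(se: int, sf: int, dest: str = "A0") -> int:
    """Model of PMULS Se,Sf,Dg on 32-bit source register images.

    Only the upper words (bits 31-16) of Se and Sf are used. The signed
    16 x 16 product is shifted left one bit, so bit 0 of the result is 0.
    A0/A1 hold 40 bits (guard bits 39-32 carry the sign extension);
    M0/M1 hold only 32 bits, so H'8000 x H'8000 wraps to H'8000 0000.
    """
    x = to_signed(se >> 16, 16)
    y = to_signed(sf >> 16, 16)
    width = 40 if dest in ("A0", "A1") else 32
    return ((x * y) << 1) & ((1 << width) - 1)


def as_fraction(result: int, dest: str = "A0") -> float:
    """Interpret a destination register image as a fixed-point fraction (value / 2**31)."""
    width = 40 if dest in ("A0", "A1") else 32
    return to_signed(result, width) / 2**31


# The three rows of table 3.1, for an M register and an A register:
for se, sf in [(0x4000_0000, 0x2000_0000), (0x0800_0000, 0xFC00_0000), (0x8000_0000, 0x8000_0000)]:
    m, a = pmuls(se, sf, "M0"), pmuls(se, sf, "A0")
    print(f"M0: H'{m:08X} ({as_fraction(m, 'M0')})  A0: H'{a:010X} ({as_fraction(a, 'A0')})")
```

Masking to the register width is what produces the wraparound: the same product that fits in 40 bits loses its positive sign when only 32 bits are kept.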
Flowchart
Start
Transfer XRAM_ADD address (H'1000F000) to register R4
Transfer YRAM_ADD address (H'1001F000) to register R6
Transfer ANS address (H'1001F002) to register R7
(1) Transfer data from R4 address (H'1000F000) to register X0, and data from R6 address (H'1001F000) to register Y0
(2) Multiply the upper 16 bits of the register X0 data and the register Y0 data, and store the result in register A0
Transfer data from register A0 to ANS address (H'1001F002)
End
Main Program
;*******************************************************************************************
;*  16-bit fixed-point multiplication routine
;*******************************************************************************************
MAIN:   MOV.L   #0,R4                     ;Clear register R4
        MOV.L   #0,R6                     ;Clear register R6
        MOV.L   #XRAM_ADD,R4              ;XRAM address -> register R4
        MOV.L   #YRAM_ADD,R6              ;YRAM address -> register R6
        MOV.L   #ANS,R7                   ;ANS address -> register R7
        MOVX.W  @R4,X0   MOVY.W @R6,Y0    ;XRAM and YRAM address data -> registers X0 and Y0
        PMULS   X0,Y0,A0                  ;16-bit fixed-point multiplication
        MOVY.W  A0,@R7                    ;Store multiplication result
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP
Data
;**************************************************************
;* Data
;**************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W  0.0625
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W  0.03125
ANS:      .RES.W    1
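As a worked check of this data, the sketch below encodes the .XDATA.W values as 16-bit fixed-point words and computes the word that MOVY.W A0,@R7 should leave at ANS. It assumes round-to-nearest when converting fractions (both values here are exact, so rounding does not matter) and that MOVY.W stores bits 31 to 16 of A0; the helper names `to_word`, `from_word`, and `ans_word` are hypothetical.

```python
def to_word(x: float) -> int:
    """Encode a fraction in [-1.0, 1.0) as a 16-bit fixed-point word (round-to-nearest assumed)."""
    n = round(x * 32768)
    if not -32768 <= n <= 32767:
        raise ValueError(f"{x} is outside the 16-bit fixed-point range")
    return n & 0xFFFF


def signed16(w: int) -> int:
    """Sign-extend a 16-bit word."""
    w &= 0xFFFF
    return w - 0x10000 if w & 0x8000 else w


def from_word(w: int) -> float:
    """Decode a 16-bit fixed-point word back into a fraction."""
    return signed16(w) / 32768


def ans_word(x: float, y: float) -> int:
    """Word expected at ANS after PMULS X0,Y0,A0 and MOVY.W A0,@R7."""
    product = (signed16(to_word(x)) * signed16(to_word(y))) << 1  # PMULS: product shifted left 1
    return (product >> 16) & 0xFFFF                               # upper word (bits 31-16) of A0


word = ans_word(0.0625, 0.03125)
print(f"XRAM_ADD: H'{to_word(0.0625):04X}  YRAM_ADD: H'{to_word(0.03125):04X}  "
      f"ANS: H'{word:04X} = {from_word(word)}")
```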
Section 4 Parallel Execution Instruction  
Overview  
Four data values, two read sequentially from the XRAM_ADD address (H'1000F000) and two from the YRAM_ADD address (H'1001F000), are added and multiplied in pairs. The addition results are stored from the ANS1 address (H'1000F004) and the multiplication results from the ANS2 address (H'1001F004).
Description  
1. Structure of Parallel Execution Instruction  
The parallel execution instruction is used to transfer data between a DSP register and X  
memory or Y memory at the same time a DSP operation is being executed. Table 4.1 shows  
the data transfer and DSP operation structure. The parallel execution instruction comprises a  
DSP operation portion and a data transfer portion. Table 4.2 lists format examples for the  
parallel execution instruction. The DSP operation portion is usually a single instruction, such as PAND, PINC, or PSHA. However, as shown in table 4.2, it can consist of two instructions when PADD and PMULS, or PSUB and PMULS, are combined. The data transfer portion consists of two instructions: a data transfer instruction for X memory and a data transfer instruction for Y memory. Either one of these data transfer instructions may also be used alone.
Table 4.1 Data Transfer and DSP Operation Structure
Type of Data Transfer | Bus Used | Data Transfer Length | Parallel DSP Operation | Parallel Data Transfer | Instruction Length
(1) Double data transfer | X bus, Y bus | 16 bits | No | No: one or the other data transfer; Yes: data transfer with X memory and Y memory at the same time | 16 bits
(2) Double data transfer | X bus, Y bus | 16 bits | Yes | No: one or the other data transfer; Yes: data transfer with X memory and Y memory at the same time | 32 bits
Single data transfer | C bus*1 | 16 bits or 32 bits | No | (not applicable) | 16 bits
*1: Note that the name differs depending on the product.
Table 4.2 Parallel Execution Instruction Format Examples
DSP Operation Portion | Data Transfer Portion
PADD X0,Y0,A0 PMULS X0,Y0,A1 | MOVX.W A0,@R4 MOVY.W A1,@R6
PSUB X1,Y1,A1 PMULS X0,Y1,A0 | MOVX.W @R5,X1 MOVY.W @R7,Y1
PADD X0,Y0,A0 PMULS X0,Y0,A1 | MOVX.W A0,@R4
PINC X0,A0 | MOVY.W @R6,Y1
PAND X0,Y0,A0 | MOVX.W A0,@R5
PSHA X0,Y0,A0 | MOVX.W @R4,X1 MOVY.W A1,@R7
2. Parallel Processing of Double Data Transfer and DSP Operation  
Process (1) in the flowchart on the following page is a double data transfer without parallel DSP operation processing, indicated as (1) in table 4.1. Processes (2) and (3) are double data transfers with parallel processing of DSP operation instructions, indicated as (2) in table 4.1. Processes (2) and (3) each consist of four instructions, the maximum number that can be written in a single step, and each is executed in one state.
3. Effect of DSP Operation Portion Result on Data Transfer Portion  
Table 4.3 shows the effect of the DSP operation portion result on the data transfer portion.  
Instruction 2 (process (3)) uses A0 and A1 both as the destination registers of the DSP operation portion and as the source registers of the data transfer portion. However, the data stored by the data transfer portion is not the result of the DSP operation portion of the same instruction. Instead, the data transfer portion of instruction 2 (process (3)) stores the values that A0 and A1 hold before instruction 2 executes, which are the results of the DSP operation portion of instruction 1 (process (2)).
Figure 4.1 shows the instruction 2 pipeline flow. When instructions are executed in parallel, each instruction is processed independently, as shown in figure 4.1. The DSP operation result does not become the stored data because the DSP operations of PADD and PMULS are performed in the WB/DSP stage, which comes after the MA stage in which MOVX.W and MOVY.W access memory.
Note that after the execution of instruction 2 (process (3)), the X1 and Y1 addition and  
multiplication results are stored in registers A0 and A1.  
Table 4.3 Effect of DSP Operation Portion Result on Data Transfer Portion
Excerpts from Main Program
;Instruction 1
        PADD X0,Y0,A0  PMULS X0,Y0,A1  MOVX.W @R4,X1   MOVY.W @R6,Y1
;Instruction 2
        PADD X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+
Content of Registers
Before execution of instruction 2:
X1=H'1000 0000, Y1=H'0800 0000, A0=H'6000 0000, A1=H'1000 0000
After execution of instruction 2:
X1=H'1000 0000, Y1=H'0800 0000, A0=H'1800 0000, A1=H'0100 0000

Instruction        Pipeline stages (the same slots for all four instructions)
PADD   X1,Y1,A0    IF   ID   EX   MA   WB/DSP
PMULS  X1,Y1,A1    IF   ID   EX   MA   WB/DSP
MOVX.W A0,@R5+     IF   ID   EX   MA   WB/DSP
MOVY.W A1,@R7+     IF   ID   EX   MA   WB/DSP
Figure 4.1 Instruction 2 Pipeline Flow
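The ordering effect in table 4.3 can be imitated with a small behavioral model in which, within one parallel instruction, the data transfer portion reads registers before the DSP operation portion writes them. The sketch is not cycle-accurate; it uses Python floats for the fractional values (all values in this example are exact in binary), and `parallel_step` is a hypothetical helper.

```python
def parallel_step(regs, dsp_ops, loads=None, stores=()):
    """One parallel instruction (behavioral sketch, not cycle-accurate).

    regs:    dict of register name -> value, updated in place
    dsp_ops: dict of destination -> function(old_regs) for the DSP operation portion
    loads:   dict of destination -> value read from memory by the data transfer portion
    stores:  iterable of (memory_dict, address, source_register) for the data transfer portion
    """
    old = dict(regs)                        # register values at the start of the step
    for memory, address, src in stores:     # MA stage: stores read the old values
        memory[address] = old[src]
    results = {dest: fn(old) for dest, fn in dsp_ops.items()}
    regs.update(loads or {})                # loaded data arrives in the registers
    regs.update(results)                    # WB/DSP stage: DSP results are written last


xmem, ymem = {}, {}
regs = {"X0": 0.5, "Y0": 0.25, "X1": 0.0, "Y1": 0.0, "A0": 0.0, "A1": 0.0}

# Instruction 1: PADD X0,Y0,A0  PMULS X0,Y0,A1  MOVX.W @R4,X1  MOVY.W @R6,Y1
parallel_step(regs,
              {"A0": lambda r: r["X0"] + r["Y0"], "A1": lambda r: r["X0"] * r["Y0"]},
              loads={"X1": 0.125, "Y1": 0.0625})

# Instruction 2: PADD X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+
parallel_step(regs,
              {"A0": lambda r: r["X1"] + r["Y1"], "A1": lambda r: r["X1"] * r["Y1"]},
              stores=[(xmem, "ANS1", "A0"), (ymem, "ANS2", "A1")])

print(xmem, ymem, regs["A0"], regs["A1"])
```

In this model the values stored by instruction 2 (0.75 and 0.125) are the instruction 1 results, while A0 and A1 end up holding 0.1875 and 0.0078125, which correspond to H'1800 0000 and H'0100 0000 in table 4.3.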
Flowchart
Start
Transfer XRAM_ADD address (H'1000F000) to register R4
Transfer ANS1 address (H'1000F004) to register R5
Transfer YRAM_ADD address (H'1001F000) to register R6
Transfer ANS2 address (H'1001F004) to register R7
(1) Transfer data (0.5) from R4 address (H'1000F000) to register X0, and increment the address.
    Transfer data (0.25) from R6 address (H'1001F000) to register Y0, and increment the address.
(2) Add data in registers X0 and Y0, and store the result in register A0.
    Multiply data in registers X0 and Y0, and store the result in register A1.
    Transfer data (0.125) from R4 address (H'1000F002) to register X1.
    Transfer data (0.0625) from R6 address (H'1001F002) to register Y1.
(3) Add data in registers X1 and Y1, and store the result in register A0.
    Multiply data in registers X1 and Y1, and store the result in register A1.
    Transfer data from register A0 to ANS1 address (H'1000F004), and increment the address.
    Transfer data from register A1 to ANS2 address (H'1001F004), and increment the address.
(1) Transfer data from register A0 to ANS1+2 address (H'1000F006).
    Transfer data from register A1 to ANS2+2 address (H'1001F006).
End
Main Program
;*******************************************************************************************
;*  Parallel data transfer routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #ANS1,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #ANS2,R7
        MOVX.W  @R4+,X0   MOVY.W @R6+,Y0                                 ;No parallel processing
        PADD    X0,Y0,A0  PMULS X0,Y0,A1  MOVX.W @R4,X1   MOVY.W @R6,Y1   ;Parallel processing
        PADD    X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+  ;Parallel processing
        MOVX.W  A0,@R5    MOVY.W A1,@R7                                  ;No parallel processing
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP
Data
;**********************************************************************
;* Data(X/YRAM)
;**********************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W 0.5,0.125        ;DSP operation data
ANS1:     .RES.W   2                ;DSP operation result storage area
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W 0.25,0.0625      ;DSP operation data
ANS2:     .RES.W   2                ;DSP operation result storage area
Section 5 Repeat Instruction  
Overview  
The average of ten data values stored in XRAM and YRAM is obtained. To accomplish this, the  
repeat function is used for transferring data from XRAM and YRAM to the DSP unit, and for  
adding the ten data values.  
Description  
1. DSP Repeat Control  
Three settings are required to perform repeat control: (I) the start address of the program to be repeated, (II) the end address of the program to be repeated, and (III) the number of repetitions. After settings I through III have been completed, the program to be repeated is started (IV). Note that at least one instruction must be placed between III and IV.
The sequence of processes I through IV is shown below.
I    The LDRS instruction sets the repeat start address in the RS register.
II   The LDRE instruction sets the repeat end address in the RE register.
III  The SETRC instruction sets the number of repetitions in the RC register.
     :  (Minimum of one instruction inserted.)
IV   The program to be repeated is started.
Process (1) in the flowchart on the next page corresponds to I through III above. After the program to be repeated is started (IV), it is repeated within the scope of process (2). The example includes two main programs that perform the same function: program (1) uses the repeat control instructions (LDRS, LDRE, and SETRC), and program (2) uses the extended instruction REPEAT. REPEAT automatically generates the CPU instructions (LDRS, LDRE, and SETRC) that repeat the instructions between the start and end addresses. In the format shown below, if the number of repetitions is omitted, no SETRC instruction is generated.
Format: REPEAT [start address], [end address], [number of repetitions]  
In program (1), the repeat start and end addresses that are set differ from the actual start and end addresses of the program to be repeated, because the required settings change depending on the number of instructions in the program to be repeated. Table 5.1 shows how the RS and RE settings depend on the number of instructions within the range to be repeated; setting RS and RE to these values makes the program repeat exactly the intended range. Therefore, the labels for the repeat start and end addresses must be placed with the offsets listed in table 5.1 in mind. The setting method for RS and RE in program (1) is described on the next page.
RPT_S0+N: Address N bytes after the instruction that precedes the instruction at the start address of the program to be repeated (RPT_S0 labels that preceding instruction)
RPT_S: Start address of the program to be repeated
RPT_E: End address of the program to be repeated
RPT_E3+4: Address 4 bytes after the instruction three instructions before the instruction at the end address of the program to be repeated
Table 5.1 RS and RE Setting Values Based on Number of Instructions Within Repeat
Number of Instructions in Program to be Repeated | 1 | 2 | 3 | 4 or more
RS | RPT_S0 + 8 | RPT_S0 + 6 | RPT_S0 + 4 | RPT_S
RE | RPT_S0 + 4 | RPT_S0 + 4 | RPT_S0 + 4 | RPT_E3 + 4
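The rules in table 5.1 amount to a small address calculation. In this sketch, `repeat_settings` is a hypothetical helper that takes the address of the instruction labeled RPT_S0, the byte length of that instruction, and the byte lengths (2 or 4) of the instructions from RPT_S to RPT_E. It assumes that the last column of the table applies to any range of four or more instructions, and the base address H'1000 in the usage line is an arbitrary example.

```python
def repeat_settings(rpt_s0, s0_length, lengths):
    """Return (RS, RE) for a repeat range, following table 5.1.

    rpt_s0:    address of the instruction just before the repeated code (label RPT_S0)
    s0_length: byte length of that instruction (2 or 4)
    lengths:   byte lengths (2 or 4) of the instructions from RPT_S to RPT_E, in order
    """
    n = len(lengths)
    if n == 0:
        raise ValueError("the repeat range must contain at least one instruction")
    if n == 1:
        return rpt_s0 + 8, rpt_s0 + 4
    if n == 2:
        return rpt_s0 + 6, rpt_s0 + 4
    if n == 3:
        return rpt_s0 + 4, rpt_s0 + 4
    # Four or more instructions (assumed): RS = RPT_S,
    # RE = address of the instruction three before RPT_E, plus 4.
    rpt_s = rpt_s0 + s0_length
    addresses = [rpt_s + sum(lengths[:i]) for i in range(n)]
    return rpt_s, addresses[n - 4] + 4


# Section 5 example: RPT_S0 is a 16-bit double data transfer, and the repeated code is
# MOVX/MOVY (16 bits), PADD (32 bits), PADD (32 bits).
rs, re = repeat_settings(0x1000, 2, [2, 4, 4])
print(hex(rs), hex(re))   # both equal the address of RPT_E0 (RPT_S0 + 4)
```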
2. Repeat Control Using CPU Instructions  
Example (a) shows how the addresses for RS and RE are determined. Because the portion to be repeated contains three instructions, table 5.1 specifies that both RS and RE be set to the RPT_S0+4 address. The double data transfer instructions on lines (1) and (2) of this program are each 16 bits long, so the RPT_S0+4 address is the address of the instruction labeled RPT_E0. Setting RS and RE to the address RPT_E0 gives program (b).
            LDRS    RPT_S0+4 address          ;Repeat start address
            LDRE    RPT_S0+4 address          ;Repeat end address
            SETRC   #5                        ;Repeat counter setting/5 repetitions
(1) RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1    ;Clear X1, Y1 = 1/10
(2) RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
    RPT_E0: PADD    X0,Y0,M0
    RPT_E:  PADD    X1,M0,X1                  ;X1/data total
            PMULS   X1,Y1,A1                  ;A1/average value
(a) RS and RE Address Setting Method
        LDRS    RPT_E0                    ;Repeat start address
        LDRE    RPT_E0                    ;Repeat end address
        SETRC   #5                        ;Repeat counter setting/5 repetitions
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1    ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                  ;X1/data total
        PMULS   X1,Y1,A1                  ;A1/average value
(b) RS and RE Address Setting Method
3. Repeat Control Using Extended Instructions  
When the extended instruction REPEAT is used, the complicated labeling required for repeat control with CPU instructions is unnecessary. The following explanation is based on the expanded image of part of a repeat program, shown as (a) below. With REPEAT, only the labels for the start (RPT_S) and end (RPT_E) addresses of the program to be repeated need to be declared; the assembler automatically calculates the address values for the RS and RE settings (RPT_E0 when the code to be repeated contains three instructions) and generates the LDRS, LDRE, and SETRC instructions. The repeat program actually written with REPEAT is shown in example (b) below.
        REPEAT  RPT_S,RPT_E,#5
        ;Expands to the following CPU instructions for repeat control:
        LDRS    RPT_E0                    ;RPT_S0+4
        LDRE    RPT_E0                    ;RPT_S0+4
        SETRC   #5
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1
        PMULS   X1,Y1,A1
(a) Expanded Image of Repeat Program
        REPEAT  RPT_S,RPT_E,#5
RPT_S0: MOVX.W  @R5,X1   MOVY.W @R7,Y1
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1
        PMULS   X1,Y1,A1
(b) Repeat Program Using Extended Instruction REPEAT
Flowchart
Start
Transfer XRAM_ADD address to R4
Transfer CLR address to R5
Transfer YRAM_ADD address to R6
Transfer DIV address to R7
(1) Set RPT_S address as repeat start address (RS)
    Set RPT_E address as repeat end address (RE)
    Set RC counter in register SR to number of repetitions (5 times)
Clear register X1 by transferring R5 address (H'1000F00A) data (0) to register X1
Transfer data (0.1) from R7 address (H'1001F00A) to register Y1
(2) Repeat the following the number of times indicated by the repetitions setting (5 times in this case):
    Transfer R4 address data to register X0 and increment R4 address
    Transfer R6 address data to register Y0 and increment R6 address
    Add data from registers X0 and Y0, and store result in register M0
    Add data from registers X1 and M0, and store result in register X1
Multiply data from registers X1 and Y1, and store result in register A1
End
Main Program
(1) Repeat Control Using CPU Instructions
;*******************************************************************************************
;*  Repeat routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #CLR,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #DIV,R7
        LDRS    RPT_E0                    ;Repeat start address
        LDRE    RPT_E0                    ;Repeat end address
        SETRC   #5                        ;Repeat counter setting/5 repetitions
        MOVX.W  @R5,X1   MOVY.W @R7,Y1    ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                  ;X1/data total
        PMULS   X1,Y1,A1                  ;A1/average value
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP
(2) Repeat Control Using Extended Instruction REPEAT
;*******************************************************************************************
;* Repeat routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #CLR,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #DIV,R7
        MOV.L   #5,R0
        REPEAT  RPT_S,RPT_E,R0            ;CPU instructions for repeat control generated automatically
        MOVX.W  @R5,X1   MOVY.W @R7,Y1    ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0  MOVY.W @R6+,Y0
        PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                  ;X1/data total
        PMULS   X1,Y1,A1                  ;A1/average value
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP
Data
The same data is used by main programs (1) and (2).
;*******************************************************************************************
;*  Data (X/YRAM)
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W 0.0625,0.125,0.0625,0.0625,0.03125   ;DSP operation data
CLR:      .DATA.W  0                                    ;Zero used to clear X1
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W 0.0625,0.125,0.03125,0.125,0.0625    ;DSP operation data
DIV:      .XDATA.W 0.1                                  ;1/10, used to form the average
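With this data, the expected result can be checked by stepping through the repeat loop in Python. The sketch assumes that .XDATA.W rounds to the nearest 16-bit fixed-point value, that MOVX.W and MOVY.W place the word in bits 31 to 16 of the register with the lower word cleared, and that PADD does not saturate (the sums here stay well below 1.0). The names `word`, `signed`, and `repeat_average` are hypothetical.

```python
def word(x: float) -> int:
    """16-bit fixed-point encoding of a fraction (round-to-nearest assumed)."""
    return round(x * 32768) & 0xFFFF


def signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


X_DATA = [0.0625, 0.125, 0.0625, 0.0625, 0.03125]   # XRAM_ADD
Y_DATA = [0.0625, 0.125, 0.03125, 0.125, 0.0625]    # YRAM_ADD


def repeat_average(x_data, y_data, div=0.1, count=5):
    """Step through the repeat loop of this section (hypothetical model)."""
    x1 = word(0.0) << 16                                       # MOVX.W @R5,X1: CLR (0) clears X1
    y1 = word(div) << 16                                       # MOVY.W @R7,Y1: Y1 = 1/10
    for i in range(count):                                     # repeated count times (SETRC #5)
        x0 = word(x_data[i]) << 16                             # MOVX.W @R4+,X0
        y0 = word(y_data[i]) << 16                             # MOVY.W @R6+,Y0
        m0 = (signed(x0, 32) + signed(y0, 32)) & 0xFFFFFFFF    # PADD X0,Y0,M0
        x1 = (signed(x1, 32) + signed(m0, 32)) & 0xFFFFFFFF    # PADD X1,M0,X1
    a1 = (signed(x1 >> 16, 16) * signed(y1 >> 16, 16)) << 1    # PMULS X1,Y1,A1 (upper words)
    return x1, a1 / 2**31


total, average = repeat_average(X_DATA, Y_DATA)
print(f"X1 = H'{total:08X}, A1 = {average:.6f}")
```

Because 0.1 is not exactly representable in 16 bits, the computed average differs from 0.075 in the sixth decimal place.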
Section 6 Examples of Arguments Passed Between CPU Instructions and DSP Instructions
Overview  
The two 16-bit fixed-point data values stored at the XRAM_ADD address (H'1000F000) and  
YRAM_ADD address (H'1001F000) are multiplied using DSP instructions and CPU instructions.  
Description  
When data is passed between CPU instructions and DSP instructions, R4, R5, R6, and R7 are used  
as pointers and the data is passed via XRAM and YRAM. The procedure when the result of a  
calculation performed by the DSP is used by the CPU is described below.  
As can be seen in (2-1), (3-1), and (3-2), both the (2) DSP multiplication routine and (3) CPU  
multiplication routine of the example main program read data stored in XRAM and YRAM.  
Example arguments:
        PADD    X0,Y0,A0        ;Stores result of adding X0 and Y0 in A0
        MOVX.W  A0,@R4          ;Transfers A0 data to R4 address
        MOV.W   @R4,R0          ;Transfers R4 address data to R0
Some points need to be kept in mind when transferring data. Some DSP instructions handle fixed-point data, and a fixed-point multiplication aligns its result to the MSB. Multiplication using CPU instructions, however, is integer multiplication, and its result is aligned to the LSB. The calculation result therefore differs from the one obtained using DSP instructions.
Table 6.1 shows the multiplication process used in (2-1), (3-1), and (3-2) of the (2) DSP multiplication routine and the (3) CPU multiplication routine in the flowchart on the following page. It shows that the results after execution differ even though the source operand data is identical. When a DSP instruction (PMULS) is used to multiply integer data, the result must be converted from fixed-point format to integer format by a bit shift; in table 6.1, the PMULS result is the integer product shifted left by one bit.
Table 6.1 DSP and CPU Multiplication Process
Excerpt from Main Program | Register Contents Before Execution | Register Contents After Execution
(2) DSP multiplication routine: PMULS X0,Y0,A0 | X0=H'4000, Y0=H'2000 | A0=H'1000 0000
(3) CPU multiplication routine: MULS.W R0,R1 / STS MACL,R14 | R0=H'4000, R1=H'2000 | R14=H'0800 0000
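The difference in table 6.1 comes down to a one-bit shift. The sketch below contrasts the two routines under the assumption that the full PMULS result is kept (as in A0 with its guard bits), so an arithmetic right shift by one bit gives the MULS.W integer product. The names `s16`, `pmuls_result`, and `muls_w` are hypothetical.

```python
def s16(w: int) -> int:
    """Sign-extend a 16-bit word."""
    w &= 0xFFFF
    return w - 0x10000 if w & 0x8000 else w


def pmuls_result(x_word: int, y_word: int) -> int:
    """Full PMULS result (as held in A0 with guard bits): signed 16 x 16 product, shifted left 1."""
    return (s16(x_word) * s16(y_word)) << 1


def muls_w(r0: int, r1: int) -> int:
    """MULS.W R0,R1: signed 16 x 16 integer product of the lower words, as read from MACL."""
    return (s16(r0) * s16(r1)) & 0xFFFFFFFF


dsp = pmuls_result(0x4000, 0x2000)   # DSP multiplication routine in table 6.1
cpu = muls_w(0x4000, 0x2000)         # CPU multiplication routine in table 6.1
as_integer = dsp >> 1                # arithmetic right shift converts to integer format
print(f"A0 = H'{dsp:08X}, R14 = H'{cpu:08X}, A0 >> 1 = H'{as_integer:08X}")
```

Python's `>>` on a negative int is an arithmetic shift, so the same conversion also works for negative products.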
Flowchart
Start
(1) (1-1) Transfer XRAM_ADD address (H'1000F000) to register R4
    (1-2) Transfer YRAM_ADD address (H'1001F000) to register R6
(2) (2-1) Transfer data (H'4000) from R4 address (H'1000F000) to register X0, and data (H'2000) from R6 address (H'1001F000) to register Y0
    (2-2) Multiply data from register X0 and register Y0, and store the result in register A0
(3) (3-1) Transfer data (H'4000) from R4 address (H'1000F000) to register R0
    (3-2) Transfer data (H'2000) from R6 address (H'1001F000) to register R1
    (3-3) Multiply data from register R0 and register R1
    (3-4) Transfer data (multiplication result) from register MACL to register R14
End
Main Program
;*******************************************************************************************
;*  Initial setting routine
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #YRAM_ADD,R6
;*******************************************************************************************
;* DSP multiplication routine
;*******************************************************************************************
        MOVX.W  @R4,X0   MOVY.W @R6,Y0    ;Load 0.5,0.25
        PMULS   X0,Y0,A0                  ;A0 = multiplication result
;*******************************************************************************************
;* CPU multiplication routine
;*******************************************************************************************
        MOV.W   @R4,R0                    ;H'4000 load (16-bit word, sign-extended)
        MOV.W   @R6,R1                    ;H'2000 load (16-bit word, sign-extended)
        MULS.W  R0,R1
        STS     MACL,R14                  ;R14 = multiplication result
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP
Data
;**********************************************************************
;* Data
;**********************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.W 0.5            ;DSP operation data
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.W 0.25           ;DSP operation data
        .END
Section 7 32-bit Multiplication  
Overview  
The 32-bit data value stored at the XRAM_ADD address (H'1000F000) and the 32-bit data value stored at the YRAM_ADD address (H'1001F000) are multiplied, and the 64-bit result is stored in the eight bytes from the ANS address (H'1001F100) to the ANS+7 address (H'1001F107).
Description  
1. Overview of Calculation Method  
The addresses where the multiplier and multiplicand of a 32-bit multiplication are stored, and the address where the result is stored, are shown in figure 7.1. Figure 7.2 shows an overview of the calculation method for 32-bit multiplication. The 32-bit data values (the multiplier and multiplicand) are separated into upper and lower 16-bit segments (here provisionally called A, B, C, and D), which are then multiplied to produce the 64-bit result. The multiplier interprets the top bit (MSB) of each 16-bit input as a sign bit with a weight of –2^0 = –1. Therefore, the example program first replaces the MSB of each segment with 0, calculates the products of the segments, and then adds correction items based on the original MSBs to obtain the result of the 32-bit multiplication.
Input:  Multiplicand (32-bit): bits 31 to 16 at XRAM_ADD, bits 15 to 0 at XRAM_ADD+2
        Multiplier (32-bit): bits 31 to 16 at YRAM_ADD, bits 15 to 0 at YRAM_ADD+2
Output: Multiplication result (64-bit): bits 63 to 48 at ANS, bits 47 to 32 at ANS+2, bits 31 to 16 at ANS+4, bits 15 to 0 at ANS+6
Figure 7.1 32-bit Multiplication
Multiplicand: [A][B]   (A: XRAM_ADD address data, B: XRAM_ADD+2 address data)
Multiplier:   [C][D]   (C: YRAM_ADD address data, D: YRAM_ADD+2 address data)
Product (bits 63 to 0) = B × D (bits 31 to 0) + A × D (bits 47 to 16) + B × C (bits 47 to 16) + A × C (bits 63 to 32)
Figure 7.2 Overview of Calculation Method for 32-bit Multiplication
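Figure 7.2 is the familiar schoolbook decomposition into 16-bit partial products. The sketch below checks the arrangement with unsigned 16-bit words, so it deliberately leaves out the sign handling that the correction items deal with; `mul32_by_segments` and `result_words` are hypothetical helpers.

```python
MASK16 = 0xFFFF


def mul32_by_segments(multiplicand: int, multiplier: int) -> int:
    """Unsigned 32 x 32 -> 64-bit product built from four 16 x 16 partial products."""
    a, b = multiplicand >> 16, multiplicand & MASK16   # A: XRAM_ADD word, B: XRAM_ADD+2 word
    c, d = multiplier >> 16, multiplier & MASK16       # C: YRAM_ADD word, D: YRAM_ADD+2 word
    return (b * d) + ((a * d + b * c) << 16) + ((a * c) << 32)


def result_words(product: int) -> list:
    """Split a 64-bit result into the words stored at ANS, ANS+2, ANS+4, and ANS+6."""
    return [(product >> shift) & MASK16 for shift in (48, 32, 16, 0)]


print([hex(w) for w in result_words(mul32_by_segments(0x12345678, 0x9ABCDEF0))])
```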
2. Double-length Calculation Algorithm  
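If the single-precision number of bits is n, “double-length” refers to 2n bits. Therefore, 2n-bit numbers can be expressed as shown in figure 7.3.
Multiplicand E = [A][B], where A occupies bits 2n–1 to n and B occupies bits n–1 to 0:
$$E = -e_{2n-1}\cdot 2^{2n-1} + \sum_{i=n}^{2n-2} e_i\cdot 2^i + e_{n-1}\cdot 2^{n-1} + \sum_{i=0}^{n-2} e_i\cdot 2^i$$
Multiplier F = [C][D], where C occupies bits 2n–1 to n and D occupies bits n–1 to 0:
$$F = -f_{2n-1}\cdot 2^{2n-1} + \sum_{i=n}^{2n-2} f_i\cdot 2^i + f_{n-1}\cdot 2^{n-1} + \sum_{i=0}^{n-2} f_i\cdot 2^i$$
In each expression the terms are, in order, the upper MSB, the rest of the upper segment (A0 or C0), the lower MSB, and the rest of the lower segment (B0 or D0).
*1: e_i, f_i = 0 or 1
Figure 7.3 Structure of 2n-bit Numbers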
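Here, if $\sum_{i=n}^{2n-2} e_i\cdot 2^i = A_0$, $\sum_{i=0}^{n-2} e_i\cdot 2^i = B_0$, $\sum_{i=n}^{2n-2} f_i\cdot 2^i = C_0$, and $\sum_{i=0}^{n-2} f_i\cdot 2^i = D_0$, the double-length multiplication E × F can be expressed as:
$$
\begin{aligned}
E \times F &= \left(-e_{2n-1}\cdot 2^{2n-1} + A_0 + e_{n-1}\cdot 2^{n-1} + B_0\right)\left(-f_{2n-1}\cdot 2^{2n-1} + C_0 + f_{n-1}\cdot 2^{n-1} + D_0\right)\\
&= e_{2n-1}\, f_{2n-1}\cdot 2^{4n-2} && \text{(1)}\\
&\quad - e_{2n-1}\cdot 2^{2n-1}\left(C_0 + f_{n-1}\cdot 2^{n-1} + D_0\right) && \text{(2)}\\
&\quad - f_{2n-1}\cdot 2^{2n-1}\left(A_0 + e_{n-1}\cdot 2^{n-1} + B_0\right) && \text{(3)}\\
&\quad + e_{n-1}\cdot 2^{n-1}\left(C_0 + f_{n-1}\cdot 2^{n-1} + D_0\right) && \text{(4)}\\
&\quad + f_{n-1}\cdot 2^{n-1}\left(A_0 + B_0\right) && \text{(5)}\\
&\quad + A_0 C_0 + A_0 D_0 + B_0 C_0 + B_0 D_0 && \text{(6)}
\end{aligned}
$$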
In the above equation, (6) is the sum of the segment products, and (1) through (5) are correction items.
Each correction item depends on whether a sign bit is 0 or 1: if the bit is 1, the corresponding item is added to or subtracted from the sum of the segment products; if it is 0, the item is zero.
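The expansion into the segment products and correction items (1) to (5) can be checked numerically. This sketch takes signed 2n-bit integers (n = 16 by default), splits them into the fields of figure 7.3 with A0 to D0 kept at their bit positions (so A0 and C0 include the 2^n weight), and evaluates the six terms. The names `split_fields` and `product_with_corrections` are hypothetical, and the check covers the algebra only, not the DSP's fixed-point alignment.

```python
def split_fields(value: int, n: int = 16):
    """Split a signed 2n-bit integer into the fields of figure 7.3.

    Returns (upper MSB, upper field without its MSB, lower MSB, lower field without its MSB),
    with both fields kept at their bit positions.
    """
    bits = value & ((1 << 2 * n) - 1)                  # two's-complement image
    upper_msb = (bits >> (2 * n - 1)) & 1              # e_{2n-1}
    upper = bits & (((1 << (n - 1)) - 1) << n)         # A0: bits 2n-2 .. n
    lower_msb = (bits >> (n - 1)) & 1                  # e_{n-1}
    lower = bits & ((1 << (n - 1)) - 1)                # B0: bits n-2 .. 0
    return upper_msb, upper, lower_msb, lower


def product_with_corrections(e: int, f: int, n: int = 16) -> int:
    """E x F evaluated as terms (1)-(6) of the double-length expansion."""
    e_hi, a0, e_lo, b0 = split_fields(e, n)
    f_hi, c0, f_lo, d0 = split_fields(f, n)
    half = 1 << (n - 1)        # 2^(n-1)
    top = 1 << (2 * n - 1)     # 2^(2n-1)
    return (e_hi * f_hi * (1 << (4 * n - 2))              # (1)
            - e_hi * top * (c0 + f_lo * half + d0)        # (2)
            - f_hi * top * (a0 + e_lo * half + b0)        # (3)
            + e_lo * half * (c0 + f_lo * half + d0)       # (4)
            + f_lo * half * (a0 + b0)                     # (5)
            + a0 * c0 + a0 * d0 + b0 * c0 + b0 * d0)      # (6)


for e, f in [(-1, 1), (0x12345678, -0x0FEDCBA9), (-0x80000000, -0x80000000)]:
    print(e, f, product_with_corrections(e, f) == e * f)
```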
Figure 7.4 shows a 32-bit double-length multiplication algorithm that uses the above equation. It can be divided into six parts.
In part (1), the sign bits of A, B, C, and D are cleared to 0 by taking the logical product with H'7FFF, giving A0, B0, C0, and D0. In part (2), the products of the following four segment pairs are calculated: A0 · C0, A0 · D0, B0 · C0, and B0 · D0. In parts (3) through (6), the sum is obtained for each 16-bit word of the result, and the results are stored at the ANS+6, ANS+4, ANS+2, and ANS addresses, respectively.
(1) (1-1) to (1-4): Clear the sign bits (S) of A, B, C, and D to obtain A0, B0, C0, and D0.
(2) (2-1) to (2-4): Calculate the segment products A0 × C0, B0 × D0, A0 × D0, and B0 × C0, each split into a High word (bits 31 to 16) and a Low word (bits 15 to 0).
(3) (3-1): ANSWER1 = (B0 × D0) Low
(4) (4-1) to (4-6): ANSWER2 = (A0 × D0) Low + (B0 × C0) Low + (B0 × D0) High + lower word of correction item (4) + lower word of correction item (5); the carry C out of this sum is saved
(5) (5-1) to (5-8): ANSWER3 = (A0 × C0) Low + (B0 × C0) High + (A0 × D0) High + lower words of correction items (2) and (3) + upper words of correction items (4) and (5)
(6) (6-1) to (6-5): ANSWER4 = (A0 × C0) High + carry + upper words of correction items (2) and (3) + correction item (1) (H'8000)
Correction items as 32-bit values: (2) –(C0 + D), (3) –(A0 + B), (4) C0 + D, (5) A0 + B0
*1 S: Sign bit
*2: Decimal point position
Figure 7.4 32-bit Double-length Multiplication Algorithm
Flowchart
At each decision below, if the answer is No, the addition that follows it is skipped.
Start
(1) (1-1) To clear the sign bit of A, obtain the logical product of A and H'7FFF, and designate it as A0. Determine the sign bit.
    (1-2) To clear the sign bit of B, obtain the logical product of B and H'7FFF, and designate it as B0. Determine the sign bit.
    (1-3) To clear the sign bit of C, obtain the logical product of C and H'7FFF, and designate it as C0. Determine the sign bit.
    (1-4) To clear the sign bit of D, obtain the logical product of D and H'7FFF, and designate it as D0. Determine the sign bit.
(2) (2-1) Multiply A0 and C0, separate the upper and lower bits of the result, and store them in XRAM.
    (2-2) Multiply B0 and D0, separate the upper and lower bits of the result, and store them in YRAM.
    (2-3) Multiply A0 and D0, separate the upper and lower bits of the result, and store them in XRAM.
    (2-4) Multiply B0 and C0, separate the upper and lower bits of the result, and store them in YRAM.
(3) (3-1) Store the lower bits of the B0 × D0 result at the ANS+6 address.
(4) (4-1) Add the lower bits of A0 × D0, the lower bits of B0 × C0, and the upper bits of B0 × D0.
    (4-2) Is the B sign bit 1?
    (4-3) Yes: Add the lower bits (D) of correction item (4) to the result of (4-1).
    (4-4) Is the D sign bit 1?
    (4-5) Yes: Add the lower bits (B0) of correction item (5) to the result of (4-1) or (4-3).
    (4-6) Store the result of (4-1), (4-3), or (4-5) at the ANS+4 address.
(5) (5-1) Add the lower bits of A0 × C0, the upper bits of B0 × C0, and the upper bits of A0 × D0.
    (5-2) Is the A sign bit 1?
    (5-3) Yes: Add the lower bits (–D) of correction item (2) to the result of (5-1).
    (5-4) Is the C sign bit 1?
    (5-5) Yes: Add the lower bits (–B) of correction item (3) to the result of (5-1) or (5-3).
    (5-6) Is the B sign bit 1?
    (5-7) Yes: Add the upper bits (C0) of correction item (4) to the result of (5-1), (5-3), or (5-5).
    (5-8) Is the D sign bit 1?
    (5-9) Yes: Add the upper bits (A0) of correction item (5) to the result of (5-1), (5-3), (5-5), or (5-7).
    (5-10) Store the result of (5-1), (5-3), (5-5), (5-7), or (5-9) at the ANS+2 address.
(6) (6-1) Add the carry to the upper bits of the result of (2-1).
    (6-2) Is the A sign bit 1?
    (6-3) Yes: Add the upper bits (–C0) of correction item (2) to the result of (6-1).
    (6-4) Is the C sign bit 1?
    (6-5) Yes: Add the upper bits (–A0) of correction item (3) to the result of (6-1) or (6-3).
    (6-6) Are the A and C sign bits both 1?
    (6-7) Yes: Add correction item (1) (H'8000) to the result of (6-1), (6-3), or (6-5).
    (6-8) Store the result of (6-1), (6-3), (6-5), or (6-7) at the ANS address.
End
Main Program
;*******************************************************************************************
;*  32-bit fixed-point multiplication routine
;*  [A][B] × [C][D]
;*******************************************************************************************
MAIN:   MOV.L   #XRAM_ADD,R4
        MOV.L   #WORKX,R5                                   ;XRAM for work
        MOV.L   #YRAM_ADD,R6
        MOV.L   #WORKY,R7                                   ;YRAM for work
;Clear sign
        MOV.W   #H'7FFF,R0
        MOV.W   R0,@R7
        PCLR    A1        MOVX.W @R4+,X0  MOVY.W @R7,Y0     ;A,H'7FFF load
        PAND    X0,Y0,A0  MOVY.W @R6+,Y1                    ;A0,C load
        MOV.W   R0,@R5                                      ;H'7FFF -> #WORKX
        PSHA    #1,X0     MOVX.W @R5,X1                     ;A sign check,H'7FFF load
        DCT PINC A1,A1    MOVX.W A0,@R5+                    ;A0 store
        PAND    X1,Y1,A0  MOVX.W @R4,X0                     ;C0,B load
        MOV.L   R4,@-R15                                    ;push R4
        MOV.L   #SIGNA,R4
        PCLR    A1        MOVX.W A1,@R4+                    ;A sign store
        PSHA    #1,Y1     MOVY.W A0,@R7+                    ;C sign check,C0 store
        DCT PINC A1,A1
        PAND    X0,Y0,A0                                    ;B0
        PCLR    A1        MOVX.W A1,@R4+                    ;C sign store
        PSHA    #1,X0     MOVY.W @R6,Y1                     ;B sign check,D load
        DCT PINC A1,A1
        PAND    X1,Y1,A0  MOVX.W A0,@R5                     ;D0,B0 store
        PCLR    A1        MOVX.W A1,@R4+                    ;B sign store
        PSHA    #1,Y1                                       ;D sign check
        DCT PINC A1,A1    MOVY.W A0,@R7                     ;D0 store
        MOVX.W  A1,@R4                                      ;D sign store
        MOV.L   @R15+,R4                                    ;pop R4
;*****************************************************************
;*Segment product calculation routine/ B0×D0,A0×C0,B0×C0,A0×D0
;*****************************************************************
        MOV.L   #WORKX,R5
        MOV.L   #WORKY,R7
        MOVX.W  @R5+,X0   MOVY.W @R7+,Y0                    ;A0,C0
        PMULS   X0,Y0,A1  MOVX.W @R5+,X1  MOVY.W @R7+,Y1    ;A0×C0,B0,D0
        PMULS   X1,Y1,A0  MOVX.W A1,@R5+                    ;B0×D0, (A0×C0)H store
        PSHA    #16,A1    MOVY.W A0,@R7+                    ;(A0×C0)L, (B0×D0)H store
        PSHA    #16,A0    MOVX.W A1,@R5+                    ;(B0×D0)L, (A0×C0)L store
        PMULS   X0,Y1,A1  MOVY.W A0,@R7+                    ;A0×D0, (B0×D0)L store
        PSHA    #16,A1    MOVX.W A1,@R5+                    ;(A0×D0)L, (A0×D0)H store
        PMULS   X1,Y0,A1  MOVX.W A1,@R5                     ;B0×C0, (A0×D0)L store
        PSHA    #16,A1    MOVY.W A1,@R7+                    ;(B0×C0)L, (B0×C0)H store
        MOVY.W  A1,@R7                                      ;(B0×C0)L store
;******************
;*ANSWER1 STORE
;******************
        MOV.L    R7,@-R15                          ;push R7
        MOV.L    #ANS,R7
        ADD      #6,R7
                 MOVY.W A0,@R7+                    ;Store in ANS1
        ADD      #-2,R7
        MOV.L    R7,R14                            ;R14=#ANS+2
        MOV.L    @R15+,R7                          ;pop R7
;*******************************************************************************************
;*2-word calculation routine/ R4=#XRAM_ADD+2,R5=#WORKX+10,R6=#YRAM_ADD+2,R7=#WORKY+10
;*******************************************************************************************
        PCOPY    X1,M1
        MOV.L    #-6,R9
        PCLR     A1        MOVX.W @R5,X1  MOVY.W @R7+R9,Y1  ;(A0×D0)L load, (B0×C0)L load
        PADD     X1,Y1,A0  MOVY.W @R7+,Y1          ;(A0×D0)L+(B0×C0)L, (B0×D0)H load
        DCT PINC A1,A1                             ;carry check
        PADD     A0,Y1,A0                          ;(A0×D0)L+(B0×C0)L+(B0×D0)H
        DCT PINC A1,A1                             ;carry check
        MOV.W    #H'0,R10
        MOV.L    #SIGNB,R0
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is B negative?
        BT       HOSEI4_L
                 MOVY.W @R6,Y1                     ;Load D
        PADD     A0,Y1,A0                          ;Add D
        DCT PINC A1,A1
HOSEI4_L:
        MOV.W    @R0,R1
        CMP/EQ   R10,R1                            ;Is D negative?
        BT       HOSEI5_L
        PADD     A0,M1,A0                          ;Add B0
        DCT PINC A1,A1
HOSEI5_L:
        MOV.L    R4,@-R15                          ;push R4
        MOV.L    #CARRY,R4
                 MOVX.W A1,@R4                     ;carry store
        MOV.L    @R15+,R4                          ;pop R4
;******************
;*ANSWER2 STORE
;******************
        MOV.L    R7,@-R15                          ;push R7
        MOV.L    R14,R7
                 MOVY.W A0,@R7+                    ;ANS2 store
        ADD      #-2,R7
        MOV.L    R7,R14                            ;R14=#ANS+4
        MOV.L    @R15+,R7                          ;pop R7
;*******************************************************************************************
;*3-word calculation routine/ R4=#XRAM_ADD+2,R5=#WORKX+10,R6=#YRAM_ADD+2,R7=#WORKY+6
;*******************************************************************************************
        MOV.L    #-4,R8
        PCOPY    X0,A1
                 MOVX.W @R5+R8,X0  MOVY.W @R7+,Y1  ;dummy load
                 MOVX.W @R5+,X0    MOVY.W @R7+,Y1  ;(A0×C0)L load, (B0×C0)H load
        PADD     X0,Y1,M1  MOVX.W @R5,X1           ;(A0×C0)L+(B0×C0)H, (A0×D0)H load
        DCT PINC M0,M0                             ;carry check
        PADD     X1,M1,A0                          ;(A0×C0)L+(B0×C0)H+(A0×D0)H
        DCT PINC M0,M0                             ;carry check
;Correction
        MOV.W    #H'0,R10
        MOV.L    #SIGNA,R0
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is A negative?
        BT       HOSEI2_L
        PSUB     A0,Y1,A0                          ;Subtract D (correction 2)
        DCT PDEC M0,M0
HOSEI2_L:
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is C negative?
        BT       HOSEI3_L
                 MOVX.W @R4,X1
        PCOPY    X1,M1
        PSUB     A0,M1,A0                          ;Subtract B (correction 3)
        DCT PDEC M0,M0
HOSEI3_L:
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is B negative?
        BT       HOSEI4_H
        PADD     A0,Y0,A0                          ;Add C0 (correction 4)
        DCT PINC M0,M0
HOSEI4_H:
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is D negative?
        BT       HOSEI5_H
        PCOPY    A1,M1
        PADD     A0,M1,A0                          ;Add A0 (correction 5)
        DCT PINC M0,M0
HOSEI5_H:
        PCOPY    A0,M1
        MOV.L    #CARRY,R4
                 MOVX.W @R4,X1                     ;Load carry
        PADD     X1,M1,A0                          ;Add carry
        DCT PINC M0,M0                             ;Check carry
;**************
;*ANSWER3 STORE
;**************
        MOV.L    R14,R7
                 MOVY.W A0,@R7+                    ;ANS3 store
        ADD      #-2,R7
;*******************************************************************************************
;*4-word calculation routine/ R4=#XRAM_ADD+2,R5=#WORKX+8,R6=#YRAM_ADD+2,R7=#WORKY+10
;*******************************************************************************************
        PCLR     Y1        MOVX.W @R5+R8,X1        ;dummy load
        PCLR     M1        MOVX.W @R5,X1           ;(A0×C0)H load
        PADD     X1,M0,A0
        DCT PINC M1,M1
;Correction
        MOV.L    #SIGNA,R0
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is A negative?
        BT       HOSEI2_H
        PCOPY    A1,M0
        PSUB     A0,M0,A0                          ;Subtract C0 (correction 2)
        DCT PDEC M1,M1
        MOV.L    #H'0,R12
        ADD      #1,R12
HOSEI2_H:
        MOV.W    @R0+,R1
        CMP/EQ   R10,R1                            ;Is C negative?
        BT       HOSEI3_H
        PSUB     A0,Y0,A0                          ;Subtract A0 (correction 3)
        DCT PDEC M1,M1
        ADD      #1,R12
HOSEI3_H:
        MOV.L    #2,R1
        CMP/EQ   R1,R12                            ;Are both A and C negative?
        BF       FIN
        MOV.W    #H'8000,R10
        MOV.W    R10,@R5
                 MOVX.W @R5,X0
        PCOPY    X0,M1
        PADD     A0,M1,A0                          ;Add H'8000 (correction 1)
;**************
;*ANSWER4 STORE
;**************
FIN:
                 MOVY.W A0,@R7                     ;ANS4 store
EXIT:   BRA      EXIT
        NOP
MAIN_E:
        NOP
Data  
;*******************************************************************************************
;* 32-bit multiplication data (XRAM/YRAM)
;*******************************************************************************************
          .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD: .XDATA.L 0.25002500     ;Multiplicand
WORKX:    .RES.W   6              ;Work area
CARRY:    .RES.W   1              ;Carry area
SIGNA:    .RES.W   1              ;For determining sign of multiplicand upper word A
SIGNC:    .RES.W   1              ;For determining sign of multiplier upper word C
SIGNB:    .RES.W   1              ;For determining sign of multiplicand lower word B
SIGND:    .RES.W   1              ;For determining sign of multiplier lower word D
          .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD: .XDATA.L 0.50005000     ;Multiplier
WORKY:    .RES.W   6              ;Work area
ANS:      .RES.W   4              ;Multiplication result storage area
Section 8 Trigonometric Functions  
Overview  
Calculating the trigonometric functions SIN(X) and COS(X).  
Description  
1. Performing Trigonometric Functions  
Figure 8.1 shows the curves of SIN(X) and COS(X). For angles in the range –π ≤ X ≤ π, the relationships in equation (1) hold.

$$\sin(-X) = -\sin X, \qquad \cos(-X) = \cos X \tag{1}$$
Using the relationships in equation (1), SIN(X) and COS(X) for –π ≤ X < 0 can be calculated by obtaining SIN(X) and COS(X) for 0 ≤ X ≤ π and then processing the sign.
Next, consider figure 8.2 (a) and (b). The symmetry of SIN(X) and COS(X) about X = π/2 is expressed in equation (2).

$$\sin\left(\frac{\pi}{2} + X\right) = \sin\left(\frac{\pi}{2} - X\right), \qquad \cos\left(\frac{\pi}{2} + X\right) = -\cos\left(\frac{\pi}{2} - X\right) \tag{2}$$
[Figure 8.1 SIN(X) and COS(X) Curves: X from –π to π, values from –1 to 1]
[Figure 8.2 SIN(X) and COS(X) Curves with X = π/2 at Center: (a) SIN(X), (b) COS(X), X from 0 to π]
Combining equations (1) and (2), SIN(X) and COS(X) for –π ≤ X ≤ π can be calculated by obtaining SIN(X) and COS(X) for 0 ≤ X ≤ π/2 and, finally, processing the sign. The example program divides the range 0 ≤ X ≤ π/2 into 128 segments. Writing X = n · π/256 + ΔX (n = 0, 1, ..., 128), the addition theorems of trigonometric functions give equation (3).

$$\begin{aligned} \sin X &= \sin\left(\frac{n\pi}{256} + \Delta X\right) = \sin\frac{n\pi}{256}\cos\Delta X + \cos\frac{n\pi}{256}\sin\Delta X \\ \cos X &= \cos\left(\frac{n\pi}{256} + \Delta X\right) = \cos\frac{n\pi}{256}\cos\Delta X - \sin\frac{n\pi}{256}\sin\Delta X \end{aligned} \tag{3}$$

Because ΔX is very small, we can approximate sin ΔX ≈ ΔX and cos ΔX ≈ 1 – (ΔX)²/2. Substituting these into equation (3) gives equation (4).

$$\begin{aligned} \sin X &\approx \sin\frac{n\pi}{256}\left\{1 - \frac{(\Delta X)^2}{2}\right\} + \Delta X \cos\frac{n\pi}{256} \\ \cos X &\approx \cos\frac{n\pi}{256}\left\{1 - \frac{(\Delta X)^2}{2}\right\} - \Delta X \sin\frac{n\pi}{256} \end{aligned} \tag{4}$$

In other words, by evaluating equation (4) with ΔX and the table data sin(n · π/256) and cos(n · π/256), we can obtain SIN(X) and COS(X) for 0 ≤ X ≤ π/2. The final result is then obtained by sign processing.
2. Converting Input Values  
Using conversion equation (5), the example program passes the angle X in the range –π ≤ X < π to the DSP as the angle parameter a in the range –1 ≤ a < 1.

$$X = \pi \cdot a, \qquad a = \frac{X}{\pi} \tag{5}$$

X unit: rad
a unit: rad/π
Table 8.1 Relation Between Input Value a and Polarity

Input value a                       |a|           Result: SIN(X)   Result: COS(X)
–1 ≤ a < –0.5   (–π ≤ X < –π/2)     |a| > 0.5     Negative         Negative
–0.5 ≤ a < 0    (–π/2 ≤ X < 0)      |a| ≤ 0.5     Negative         Positive
0 ≤ a ≤ 0.5     (0 ≤ X ≤ π/2)       |a| ≤ 0.5     Positive         Positive
0.5 < a < 1     (π/2 < X < π)       |a| > 0.5     Positive         Negative
Here the range 0 ≤ X ≤ π/2 corresponds to the range 0 ≤ a ≤ 0.5. The input value a is therefore converted from the range –1 ≤ a < 1 to a' in the range 0 ≤ a' ≤ 0.5. Figure 8.3 shows the curves |SIN(X)| and |COS(X)|.
[Figure 8.3 Curves |SIN(X)| and |COS(X)|: (a) |SIN(X)|, (b) |COS(X)|, X from –π to π, with point A and its deviation B from X = π/2 marked]
When obtaining SIN(X) and COS(X) at point A in figure 8.3, write A = π/2 + B; then a = 0.5 + b. The deviation |b| relative to X = π/2 is therefore given by equation (6).

$$|b| = \bigl|\,|a| - 0.5\,\bigr| \tag{6}$$

Based on the deviation |b|, equation (7) converts the input value a in the range –1 ≤ a < 1 to a' in the range 0 ≤ a' ≤ 0.5.

$$a' = \Bigl|\,\bigl|\,|a| - 0.5\,\bigr| - 0.5\,\Bigr| \tag{7}$$
3. a' Table Data  
The example program uses a table in which the range 0 ≤ a' ≤ 0.5 is divided into 128 equal segments, so the table holds entries for table addresses n = 0 to 128. The step in a' between adjacent table entries is given by equation (8).

$$\frac{0.5}{128} = 0.00390625 \tag{8}$$

Table 8.2 shows the correspondence between table address n and a', in decimal notation and as 16-bit fixed-point expressions.
Table 8.2 Relationship Between Table Address n and a'

Table        a' = n/256 [rad/π]    16-bit fixed-point expression
address n    (decimal notation)    (bit 15 . bits 14 to 0)
0            0.00000000            0.000000000000000
1            0.00390625            0.000000010000000
2            0.00781250            0.000000100000000
3            0.01171875            0.000000110000000
4            0.01562500            0.000001000000000
:            :                     :
127          0.49609375            0.011111110000000
128          0.50000000            0.100000000000000

".": Decimal point position (between bits 15 and 14)
4. Method of Calculating ΔX
As shown in table 8.2, when a' is expressed in fixed-point format its upper nine bits correspond to n and its lower seven bits correspond to the amount of shift Δa' from the table data. Figure 8.4 shows the bit structure of a'. Once a' is known, both the table data address for equation (4) (the value n · π/256) and ΔX can therefore be obtained at the same time. Finally, table 8.1 is used for sign processing to obtain SIN(X) and COS(X) for –π ≤ X ≤ π.
[Figure 8.4 Bit Structure of a': bits 15 to 7 hold table address n, bits 6 to 0 hold the shift Δa' from the table value; the decimal point lies between bits 15 and 14]
Figure 8.5 shows the relationship between the table values and the amount of shift ΔX. The table shift ΔX is obtained from the lower seven bits Δa' of a' using equation (9).

$$\Delta X = \Delta a' \cdot \pi \tag{9}$$
[Figure 8.5 Relation With Amount of Shift Between Table Values: ΔX is measured from the table point n · π/256 toward (n+1) · π/256]
5. Overflow Processing  
An overflow occurs if the calculated result is as shown in equation (10).

$$|\sin X| \ge 1, \qquad |\cos X| < 0 \tag{10}$$

In such cases the value is corrected using equation (11).

$$|\sin X| = 1 - 2^{-15}, \qquad |\cos X| = 0 \tag{11}$$
6. Algorithm for Calculating Trigonometric Functions  
The algorithm for calculating the trigonometric functions is as follows.
(1) Make initial settings.
(2) Load input value a and calculate | | | a | –0.5 | –0.5 | to obtain a'.
(3) Obtain the logical product of a' and H'FF80 to extract the upper nine bits (n/256) of a'. Then calculate n and set the value in the Y bus index register (R9).
(4) Obtain the logical product of a' and H'007F to extract the lower seven bits (Δa') of a'.
(5) Calculate π · Δa' to obtain ΔX.
(6) Calculate 1 – (ΔX)²/2. Load sin(n × π/256) and cos(n × π/256) from the data table in YRAM.
(7) Calculate sin(X).
(8) Process the sign of sin(X) and store sin(X).
(9) Calculate cos(X).
(10) Process the sign of cos(X) and store cos(X).
Execution Example  
The sin(X) and cos(X) calculation results (OUTPUT) obtained for the input value a (INPUT) are shown in table 8.3.

Table 8.3 sin(X), cos(X) Calculation Results

Angle   Input value   Logical value (decimal)   Logical value (hex)   Output value (hex)
X°      a = X/π       sin(X)     cos(X)          sin(X)    cos(X)      sin(X)    cos(X)
0       0             0          1               H'0000    H'7FFF      H'0000    H'7FFF
30      0.16667       0.5        0.86603         H'4000    H'6EDA      H'3FFE    H'6ED9
45      0.25          0.70711    0.70711         H'5A82    H'5A82      H'5A82    H'5A82
89.5    0.49722       0.99996    0.00873         H'7FFE    H'011E      H'7FFD    H'011D
152     0.84444       0.46947    –0.88295        H'3C17    H'8EFC      H'3C19    H'8EFD
179.5   0.99722       0.00873    –0.99996        H'011E    H'8002      H'011C    H'8002
–40     –0.22222      –0.64279   0.76604         H'ADB9    H'620D      H'ADBB    H'620F
–75     –0.41667      –0.96593   0.25882         H'845D    H'2121      H'845D    H'2121
–137    –0.76111      –0.68200   –0.73135        H'A8B4    H'A263      H'A8B5    H'A263
–180    –1            0          –1              H'0000    H'8000      H'0002    H'8001
Flowchart  
Start
(1)  (1-1) Transfer the INPUT address to register R4
     (1-2) Transfer the WORK address to register R5
     (1-3) Transfer the TABLE_SIN address to register R6
     (1-4) Transfer the TABLE_COS address to register R7
(2)  (2-1) Load input value a
     (2-2) Transfer H'FF80 to the R5 address (WORK area)
     (2-3) To determine the sign, copy a into register M1; load 0.5
     (2-4) Calculate | | a | –0.5 |
     (2-5) Calculate | | | a | –0.5 | –0.5 | to obtain a'; load H'FF80 from the R5 address
(3)  (3-1) Obtain the logical product of a' and H'FF80 to extract the upper 9 bits (n/256) of a'
     (3-2) Convert the fixed-point value n/256 to integer data by shifting it 6 bits to the right
     (3-3) Transfer the integer data n obtained in (3-2) to the R5 address (WORK area)
     (3-4) Zero-extend the integer data n, passed to the CPU via the R5 address, to long-word size and set it in Y index register R9
(4)  (4-1) Transfer H'007F to the R5 address (WORK area)
     (4-2) Load H'007F from the R5 address
     (4-3) Obtain the logical product of a' and H'007F to extract the lower seven bits (Δa') of a'
(5)  (5-1) Calculate 4Δa' by shifting the Δa' value obtained in (4-3) 2 bits to the left; load π/4
     (5-2) Multiply 4Δa' by π/4 to calculate ΔX
(6)  (6-1) Square the ΔX value obtained in (5-2) (ΔX²); load sin(n × π/256) from the data table in YRAM
     (6-2) Shift the ΔX² value obtained in (6-1) 1 bit to the right to obtain ΔX²/2; load –1 from the R4 address
     (6-3) Subtract the ΔX²/2 value obtained in (6-2) from the –1 loaded in (6-2) to calculate 1 – ΔX²/2; load cos(n × π/256) from the data table
(7)  (7-1) Set the condition for the DC bit in register DSR to overflow mode
     (7-2) Multiply the ΔX value obtained in (5-2) by the cos(n × π/256) value loaded in (6-3)
     (7-3) Multiply the sin(n × π/256) value loaded in (6-1) by the 1 – ΔX²/2 value obtained in (6-3)
     (7-4) Add the results of (7-2) and (7-3) to calculate sin(X)
     (7-5) Did the (7-4) operation overflow?
             No:  go to (8-1)
             Yes: go to (7-6)
     (7-6) Decrement the sin(X) value obtained in (7-4)
(8)  (8-1) Copy input value a from register M1 to register X1
     (8-2) Set the condition for the DC bit in register DSR to carry/borrow mode
     (8-3) Shift the input value a stored in register X1 in (8-1) 1 bit to the left
     (8-4) Is the sign bit of a 1 (a < 0)?
             No:  go to (8-6)
             Yes: go to (8-5)
     (8-5) Reverse the sign of the sin(X) value obtained in (7-4)
     (8-6) Transfer the OUTPUT address to register R6
     (8-7) Store sin(X) at the R6 address (OUTPUT)
(9)  (9-1) Set the condition for the DC bit in register DSR to overflow mode
     (9-2) Multiply the ΔX value obtained in (5-2) by the sin(n × π/256) value loaded in (6-1)
     (9-3) Multiply the 1 – ΔX²/2 value obtained in (6-3) by the cos(n × π/256) value loaded in (6-3)
     (9-4) Subtract the result of (9-2) from the result of (9-3) to calculate cos(X)
     (9-5) Did the (9-4) operation overflow?
             No:  go to (10-1)
             Yes: go to (9-6)
     (9-6) Clear the cos(X) value obtained in (9-4) to 0
(10) (10-1) Transfer the DAT address to register R4
     (10-2) Load 0.5 from the R4 address
     (10-3) Calculate the absolute value of input value a stored in register M1 to obtain | a |
     (10-4) Set the condition for the DC bit in register DSR to negative value mode
     (10-5) Is | a | greater than 0.5?
             No:  go to (10-7)
             Yes: go to (10-6)
     (10-6) Reverse the sign of the cos(X) value obtained in (9-4)
     (10-7) Store cos(X) at the R6 address (OUTPUT+2)
End
Main Program  
;*******************************************************************************************
;*
;*  Trigonometric function routine
;*  sinX,cosX
;*
;*******************************************************************************************
;*******************************************************************************************
;*  Initial setting routine
;*******************************************************************************************
MAIN:   MOV.L    #INPUT,R4
        MOV.L    #WORK,R5
        MOV.L    #TABLE_SIN,R6
        MOV.L    #TABLE_COS,R7
;*******************************************************************************************
;* a calculation routine
;*******************************************************************************************
                 MOVX.W @R4,X0                  ;a load
        MOV.L    #H'FF80,R0                     ;For extracting upper 9 bits of a' (n/256)
        MOV.W    R0,@R5
        MOV.L    #DAT,R4
        PCOPY    X0,M1     MOVX.W @R4+,X1       ;For determining sign of M1, load 0.5
        PCOPY    X1,Y1
        PSUB     X0,Y1,M0
        PABS     M0,A0                          ;||a|-0.5|
        PSUB     A0,Y1,M0
        PABS     M0,M0                          ;|||a|-0.5|-0.5|
                 MOVX.W @R5,X0                  ;M0 = a', #H'FF80 load
;*******************************************************************************************
;* n calculation, R6 setting routine
;*******************************************************************************************
        PAND     X0,M0,A0                       ;A0 = n/256
        PSHA     #-6,A0                         ;Convert fixed-point n to integer n
                 MOVX.W A0,@R5                  ;Pass integer n to CPU unit
        MOV.W    @R5,R1
        EXTU.W   R1,R1
        MOV.L    R1,R9
;*******************************************************************************************
;*  Δa' calculation routine
;*******************************************************************************************
        MOV.L    #H'007F,R0                     ;For extracting lower 7 bits of a' (Δa')
        MOV.W    R0,@R5
                 MOVX.W @R5,X1                  ;#H'007F load
        PAND     X1,M0,Y1                       ;Δa'
;*******************************************************************************************
;*  ΔX calculation routine
;*******************************************************************************************
        PSHA     #2,Y1     MOVX.W @R4+,X1       ;4Δa', π/4 load
        PMULS    X1,Y1,A1                       ;Δa'×π
;*******************************************************************************************
;*  1 – (ΔX²)/2 calculation, sin(n × π/256) and cos(n × π/256) loading routine
;*******************************************************************************************
        PCOPY    A1,X0     MOVY.W @R6+R9,Y0                ;copy, dummy load
        PMULS    A1,X0,M0  MOVY.W @R6,Y0                   ;ΔX², sin(n×π/256) load
        PSHA     #-1,M0    MOVX.W @R4,X1  MOVY.W @R7+R9,Y1 ;ΔX²/2, -1 load, dummy load
        PSUB     X1,M0,A1  MOVY.W @R7,Y1                   ;1-ΔX²/2, cos(n×π/256) load
;*******************************************************************************************
;* sin(X) calculation routine
;*******************************************************************************************
        MOV.L    #H'6,R0
        LDS      R0,DSR                         ;Set overflow mode
        PMULS    X0,Y1,M0                       ;ΔX·cos(n×π/256)
        PMULS    A1,Y0,A0                       ;(1-(ΔX²)/2)·sin(n×π/256)
        PABS     A0,A0
        PADD     A0,M0,A0                       ;A0 = sin(X)
        DCT PDEC A0,A0                          ;If overflow occurs, sin(X) – 1
;*******************************************************************************************
;* sin(X) sign processing and storing routine
;*******************************************************************************************
        PCOPY    M1,X1
        MOV.L    #H'0,R0
        LDS      R0,DSR                         ;Carry/borrow mode
        PSHA     #1,X1
        DCT PNEG A0,A0                          ;If a < 0, reverse sign
        MOV.L    #OUTPUT,R6
                 MOVY.W A0,@R6+                 ;Store sin(X)
;*******************************************************************************************
;* cos(X) calculation routine
;*******************************************************************************************
        MOV.L    #H'6,R0
        LDS      R0,DSR                         ;Set overflow mode
        PMULS    X0,Y0,M0                       ;ΔX·sin(n×π/256)
        PMULS    A1,Y1,A0                       ;(1-(ΔX²)/2)·cos(n×π/256)
        PABS     A0,A0
        PSUB     A0,M0,A0
        DCT PCLR A0                             ;If overflow occurs, clear cos(X) to 0
;*******************************************************************************************
;* cos(X) sign processing and storing routine
;*******************************************************************************************
        MOV.L    #DAT,R4
                 MOVX.W @R4,X0                  ;0.5 load
        PABS     M1,M1                          ;|a|
        MOV.L    #H'2,R0
        LDS      R0,DSR                         ;Set negative value mode
        PCMP     X0,M1
        DCT PNEG A0,A0                          ;If |a| > 0.5, reverse sign
                 MOVY.W A0,@R6
EXIT:   BRA      EXIT
        NOP
MAIN_E: NOP
Data  
;*******************************************************************************************
;* Trigonometric function data routine
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
INPUT:  .RES.W    1                        ;External input data storage area
WORK:   .RES.W    1
DAT:    .XDATA.W  0.5,0.78540,-1           ;For calculating a', for calculating π/4, (1 – ΔX²/2)
        .SECTION YRAM,DATA,LOCATE=H'1001F800
TABLE_SIN:
        .XDATA.W  0,0.01227,0.02454,0.03681,0.04907,0.06132    ;N/0 - 5
        .XDATA.W  0.07356,0.08580,0.09802,0.11022,0.12241      ;N/6 - 10
        .XDATA.W  0.13458,0.14673,0.15886,0.17096,0.18304      ;N/11 - 15
        .XDATA.W  0.19509,0.20711,0.21910,0.23106,0.24298      ;N/16 - 20
        .XDATA.W  0.25487,0.26671,0.27852,0.29028,0.30201      ;N/21 - 25
        .XDATA.W  0.31368,0.32531,0.33689,0.34842,0.35990      ;N/26 - 30
        .XDATA.W  0.37132,0.38268,0.39400,0.40524,0.41643      ;N/31 - 35
        .XDATA.W  0.42756,0.43862,0.44961,0.46054,0.47140      ;N/36 - 40
        .XDATA.W  0.48218,0.49290,0.50354,0.51410,0.52459      ;N/41 - 45
        .XDATA.W  0.53500,0.54532,0.55557,0.56573,0.57581      ;N/46 - 50
        .XDATA.W  0.58580,0.59570,0.60551,0.61523,0.62486      ;N/51 - 55
        .XDATA.W  0.63439,0.64383,0.65317,0.66242,0.67156      ;N/56 - 60
        .XDATA.W  0.68060,0.68954,0.69838,0.70711,0.71573      ;N/61 - 65
        .XDATA.W  0.72425,0.73265,0.74095,0.74914,0.75721      ;N/66 - 70
        .XDATA.W  0.76517,0.77301,0.78074,0.78835,0.79584      ;N/71 - 75
        .XDATA.W  0.80321,0.81046,0.81758,0.82459,0.83147      ;N/76 - 80
        .XDATA.W  0.83822,0.84485,0.85136,0.85773,0.86397      ;N/81 - 85
        .XDATA.W  0.87009,0.87607,0.88192,0.88764,0.89322      ;N/86 - 90
        .XDATA.W  0.89867,0.90399,0.90917,0.91421,0.91911      ;N/91 - 95
        .XDATA.W  0.92388,0.92851,0.93299,0.93734,0.94154      ;N/96 - 100
        .XDATA.W  0.94561,0.94953,0.95331,0.95694,0.96043      ;N/101 - 105
        .XDATA.W  0.96378,0.96698,0.97003,0.97294,0.97570      ;N/106 - 110
        .XDATA.W  0.97832,0.98079,0.98311,0.98528,0.98730      ;N/111 - 115
        .XDATA.W  0.98918,0.99090,0.99248,0.99391,0.99518      ;N/116 - 120
        .XDATA.W  0.99631,0.99729,0.99812,0.99880,0.99932      ;N/121 - 125
        .XDATA.W  0.99970,0.99992,1                            ;N/126 - 128
TABLE_COS:
        .XDATA.W  1,0.99992,0.99970,0.99932,0.99880,0.99812    ;N/0 - 5
        .XDATA.W  0.99729,0.99631,0.99518,0.99391,0.99248      ;N/6 - 10
        .XDATA.W  0.99090,0.98918,0.98730,0.98528,0.98311      ;N/11 - 15
        .XDATA.W  0.98079,0.97832,0.97570,0.97294,0.97003      ;N/16 - 20
        .XDATA.W  0.96698,0.96378,0.96043,0.95694,0.95331      ;N/21 - 25
        .XDATA.W  0.94953,0.94561,0.94154,0.93734,0.93299      ;N/26 - 30
        .XDATA.W  0.92851,0.92388,0.91911,0.91421,0.90917      ;N/31 - 35
        .XDATA.W  0.90399,0.89867,0.89322,0.88764,0.88192      ;N/36 - 40
        .XDATA.W  0.87607,0.87009,0.86397,0.85773,0.85136      ;N/41 - 45
        .XDATA.W  0.84485,0.83822,0.83147,0.82459,0.81758      ;N/46 - 50
        .XDATA.W  0.81046,0.80321,0.79584,0.78835,0.78074      ;N/51 - 55
        .XDATA.W  0.77301,0.76517,0.75721,0.74914,0.74095      ;N/56 - 60
        .XDATA.W  0.73265,0.72425,0.71573,0.70711,0.69838      ;N/61 - 65
        .XDATA.W  0.68954,0.68060,0.67156,0.66242,0.65317      ;N/66 - 70
        .XDATA.W  0.64383,0.63439,0.62486,0.61523,0.60551      ;N/71 - 75
        .XDATA.W  0.59570,0.58580,0.57581,0.56573,0.55557      ;N/76 - 80
        .XDATA.W  0.54532,0.53500,0.52459,0.51410,0.50354      ;N/81 - 85
        .XDATA.W  0.49290,0.48218,0.47140,0.46054,0.44961      ;N/86 - 90
        .XDATA.W  0.43862,0.42756,0.41643,0.40524,0.39400      ;N/91 - 95
        .XDATA.W  0.38268,0.37132,0.35990,0.34842,0.33689      ;N/96 - 100
        .XDATA.W  0.32531,0.31368,0.30201,0.29028,0.27852      ;N/101 - 105
        .XDATA.W  0.26671,0.25487,0.24298,0.23106,0.21910      ;N/106 - 110
        .XDATA.W  0.20711,0.19509,0.18304,0.17096,0.15886      ;N/111 - 115
        .XDATA.W  0.14673,0.13458,0.12241,0.11022,0.09802      ;N/116 - 120
        .XDATA.W  0.08580,0.07356,0.06132,0.04907,0.03681      ;N/121 - 125
        .XDATA.W  0.02454,0.01227,0                            ;N/126 - 128
OUTPUT: .RES.W    2                        ;External output data storage area
Section 9 Matrix Operations  
Overview  
Matrix A (3, 3) and matrix B (3, 3) are multiplied to obtain matrix product C (3, 3) with 32-bit precision. Matrices A and B are set in XRAM and YRAM beforehand. Matrix product C is stored beginning at YRAM address H'1001FF12 (MATRIXC), immediately after matrix B.
Description  
1. Method of Expressing Matrices
Figure 9.1 shows matrix A (n,m). The element ai,j is a component of matrix A. The horizontal lines of components are called rows and are numbered from the top as row 1, row 2, row 3, ..., row i, and so on. The vertical lines of components are called columns and are numbered from the left as column 1, column 2, column 3, ..., column j, and so on. The component at the intersection of row i and column j is called component (i,j), and component (i,j) of matrix A (n,m) is expressed as ai,j.
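$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn} \end{pmatrix}$$

(The line a_{i1} ... a_{in} is row i; the line a_{1j} ... a_{mj} is column j.)
Figure 9.1 Matrix A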
2. Method of Calculating Matrix Product  
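Figure 9.2 shows the expression of the components of matrix A × matrix B = matrix product C.

$$\underbrace{\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}}_{\text{Matrix A}} \times \underbrace{\begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}}_{\text{Matrix B}} = \underbrace{\begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix}}_{\text{Matrix Product C}^{*1}}$$

*1 ci,j: 32-bit components.
Figure 9.2 Expression of Components of Matrix A × Matrix B = Matrix Product C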
The components cn,m of matrix product C are obtained using the following equation.

$$c_{n,m} = \sum_{i=1}^{3} a_{n,i} \times b_{i,m}$$

In other words, each component cn,m is obtained by a sum of products calculation on the row components an,i of matrix A and the column components bi,m of matrix B.
3. Method of Storing Matrix A, Matrix B, and Matrix Product C Components
Because each component cn,m is a sum of products of row n of matrix A and column m of matrix B, the example subroutine, in order to increase processing speed, stores matrix A row by row in XRAM and matrix B column by column in YRAM, as shown in figure 9.3. The operands of each sum of products then lie at consecutive addresses in XRAM and YRAM.
[A1; A2; A3] (rows of matrix A) × [B1 B2 B3] (columns of matrix B) = [C1; C2; C3] (rows of matrix product C)

XRAM (matrix A)             YRAM (matrix B)             YRAM (matrix product C) *1
Address       Data          Address       Data          Address       Data
#MATRIXA      a1,1  (A1)    #MATRIXB      b1,1  (B1)    #MATRIXC      CH1,1  (C1)
#MATRIXA+2    a1,2          #MATRIXB+2    b2,1          #MATRIXC+2    CL1,1
#MATRIXA+4    a1,3          #MATRIXB+4    b3,1          #MATRIXC+4    CH1,2
#MATRIXA+6    a2,1  (A2)    #MATRIXB+6    b1,2  (B2)    #MATRIXC+6    CL1,2
#MATRIXA+8    a2,2          #MATRIXB+8    b2,2          #MATRIXC+8    CH1,3
#MATRIXA+A    a2,3          #MATRIXB+A    b3,2          #MATRIXC+A    CL1,3
#MATRIXA+C    a3,1  (A3)    #MATRIXB+C    b1,3  (B3)    #MATRIXC+C    CH2,1  (C2)
#MATRIXA+E    a3,2          #MATRIXB+E    b2,3          #MATRIXC+E    CL2,1
#MATRIXA+10   a3,3          #MATRIXB+10   b3,3          #MATRIXC+10   CH2,2
                                                        #MATRIXC+12   CL2,2
                                                        #MATRIXC+14   CH2,3
                                                        #MATRIXC+16   CL2,3
                                                        #MATRIXC+18   CH3,1  (C3)
                                                        #MATRIXC+1A   CL3,1
                                                        #MATRIXC+1C   CH3,2
                                                        #MATRIXC+1E   CL3,2
                                                        #MATRIXC+20   CH3,3
                                                        #MATRIXC+22   CL3,3

*1 CHi,j: Upper 16 bits of Ci,j
   CLi,j: Lower 16 bits of Ci,j
Figure 9.3 Memory Map with Matrix A, Matrix B, and Matrix Product C Components Stored
4. Algorithm for Calculating Matrix Product C  
Figure 9.4 shows the algorithm for calculating matrix product C. The details of the algorithm are described below.
(1) Clear the counter registers, set the start address of matrix A in the X address register (R4) and that of matrix B in the Y address register (R6), and set the address for storing the components of matrix product C (R7).
(2) Perform the sum of products calculation on the row components an,i of matrix A and the column components bi,m of matrix B.
(3) Store CHn,m (upper 16 bits of matrix product Cn,m) at address MATRIXC+2n and CLn,m (lower 16 bits) at address MATRIXC+2n+2.
(4) Return the matrix A address to the first column of the current row.
(5) Determine whether one row of matrix product C has been calculated. If m is not 3, return to process (2). If m is 3, move to process (6).
(6) Shift the matrix A address down one row and return the matrix B address to its first column (MATRIXB).
(7) Determine whether all three rows of matrix product C have been calculated. If n is not 3, return to process (2). If n is 3, all of matrix product C has been calculated and processing ends.
(1) Initial setting
(2) Sum of products calculation on row components an,i of matrix A and column components bi,m of matrix B:
    $c_{n,m} = \sum_{i=1}^{3} a_{n,i} \times b_{i,m}$
(3) Store CHn,m (upper 16 bits of matrix product Cn,m) at address MATRIXC+2n and CLn,m (lower 16 bits) at address MATRIXC+2n+2
(4) Return matrix A column components to first column
(5) m = 3?
        No:  return to (2)
        Yes: go to (6)
(6) Shift matrix A row components down one row
(7) n = 3?
        No:  return to (2)
        Yes: End
Figure 9.4 Algorithm for Calculating Matrix Product C
Flowchart  
Start
(1) Clear register R10
    Clear register R12
    Transfer the MATRIXA (H'1000FF00) address to register R4
    Transfer the MATRIXB (H'1001FF00) address to register R6
    Transfer the MATRIXC (H'1001FF12) address to register R7
(2) Use the extended instruction REPEAT to set the repeat start address (LOOP_S), the repeat end address (LOOP_E), and the number of repeats (3 times)
    Clear register M0
    Clear register A0
    Repeat the following as many times as the number of repeats setting (3 times in the example program):
        After reading one component ai,j from matrix A, increment the R4 address
        After reading one component bi,j from matrix B, increment the R6 address
        Multiply matrix A component ai,j by matrix B component bi,j
        Add the product of ai,j and bi,j to the sum from the previous repeat; ci,j has been calculated once the repeat operation finishes
(3) Shift the matrix product ci,j obtained in process (2) 16 bits to the left
    Store the upper 16 bits of matrix product ci,j (CHi,j) at address MATRIXC+2n
    Store the lower 16 bits (CLi,j) at address MATRIXC+2n+2
(4) Return matrix A column components to the first column
    Calculation of one component of matrix product C is finished, so increment counter register R12
(5) Is calculation of one row of matrix product C finished (R11 = R12)?
        No:  return to (2)
        Yes: clear register R12 (clear counter)
(6) Shift matrix A row components down one row
(7) Calculation of one row of matrix product C is finished, so increment counter register R10
    Is calculation of 3 rows of matrix product C finished (R13 = R10)?
        No:  return to (2)
        Yes: go to End
End
Main Program  
matrix.src
;*******************************************************************************************
;*
;*   Matrix operation routine
;*   [A][B]=[C]
;*
;*******************************************************************************************
MAIN:     MOV.L    #0,R10
          MOV.L    #0,R12
          MOV.L    #MATRIXA,R4
          MOV.L    #MATRIXB,R6
          MOV.L    #MATRIXC,R7
;****************************************
;Calculate all components/R10, R13
;****************************************
          MOV.L    #3,R13            ;Set repeat value (number of rows)
MATORIX:
;**********************************
;Calculate row components of n'th row
;**********************************
          MOV.L    #3,R11            ;Set repeat value (number of columns)
RETSU:
;****************************
;Calculate 1 component
;****************************
          BSR      SEIBUN
          NOP
          BSR      STORE
          NOP
;****************************
          ADD      #-6,R4            ;Return address to first column of row i of matrix A
          ADD      #1,R12            ;Increment counter each time 1 component of 1 row
                                     ;of matrix product C is calculated
          CMP/EQ   R11,R12           ;Is sum of products calculation for 1 row of
                                     ;matrix product C finished?
          BF       RETSU
          MOV.L    #0,R12            ;Clear counter
;**********************************
          ADD      #6,R4             ;Advance to next row of matrix A
          MOV.L    #MATRIXB,R6       ;Return to start of matrix B
          ADD      #1,R10            ;Increment counter when sum of products calculation
                                     ;for 1 row of matrix product C is finished
          CMP/EQ   R13,R10           ;Is sum of products calculation for last row
                                     ;of matrix product C finished?
          BF       MATORIX
;****************************************
EXIT:     BRA      EXIT
          NOP
;*******************************************************************************************
;Matrix C 1 component calculation routine
;*******************************************************************************************
SEIBUN:   REPEAT   LOOP_S,LOOP_E,#3  ;Number of columns in matrix [A] is number of repeats
          PCLR     M0
          PCLR     A0                ;Clear for repeat
LOOP_S:   MOVX.W   @R4+,X0  MOVY.W @R6+,Y0   ;aij,bij load
          PMULS    X0,Y0,M0
LOOP_E:   PADD     A0,M0,A0
          RTS
          NOP
;*******************************************************************************************
;Matrix C 1 component storage routine
;*******************************************************************************************
STORE:    MOVY.W   A0,@R7+           ;Store upper bits of ci,j
          PSHA     #16,A0            ;Move lower bits of ci,j into upper word of A0
          MOVY.W   A0,@R7+           ;Store lower bits of ci,j
          RTS
          NOP
;***********************
MAIN_E:   NOP
Data
;*********************************************************************************
;* Matrix operation data (XRAM/YRAM)
;*********************************************************************************
          .SECTION  XRAM,DATA,LOCATE=H'1000FF00
MATRIXA:  .XDATA.W  0.5,0.125,0.5,0.125,0.5,0.125,0.5,0.125,0.5
          .SECTION  YRAM,DATA,LOCATE=H'1001FF00
MATRIXB:  .XDATA.W  0.25,0.0625,0.25,0.0625,0.25,0.0625,0.25,0.0625,0.25
MATRIXC:  .RES.W    18
Section 10 Inner Product  
Overview  
The inner product (32-bit precision) of two non-zero n-dimensional space vectors, a (16-bit  
components) and b (16-bit components), is calculated. The n-dimensional space vectors a and b  
are set in XRAM and YRAM beforehand. The inner product of a and b is stored in YRAM at  
address H'1001FF00.  
Description  
1. Method of Expressing Space Vectors  
Figure 10.1 shows an expression of the components of n-dimensional space vector a. An n-  
dimensional space vector can be thought of as a vector consisting of a group of n real numbers.  
There are two ways of expressing the components of a vector: as a row vector and as a column  
vector.  
(a) Row vector: (a1, a2, ..., an)
(b) Column vector: a1, a2, ..., an arranged in a single column
*1 ai: 16-bit
Figure 10.1 Expression of Components of n-dimensional Space Vector a
2. Method of Calculating Inner Product  
Figure 10.2 shows an expression of the components of the inner product of n-dimensional  
space vectors a and b. Here the inner product of vectors a and b is expressed as (a,b).  
$$(a_1, a_2, \ldots, a_i, \ldots, a_n) \times \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_i \\ \vdots \\ b_n \end{pmatrix} = a_1b_1 + a_2b_2 + \cdots + a_ib_i + \cdots + a_nb_n$$
Row vector a and column vector b are n-dimensional space vectors.
*1 ai: 16-bit, bi: 16-bit
*2 Result: 32-bit
Figure 10.2 Expression of Components of Inner Product of n-dimensional Space Vectors a and b
The inner product (a,b) is obtained using the following equation.
$$(a,b) = \sum_{i=1}^{n} a_i b_i$$
Using the above equation, the inner product (a,b) is obtained by performing a sum of products  
calculation on components ai of space vector a and components bi of space vector b.  
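For a concrete check of the equation above, the following C sketch computes the inner product of two Q15 vectors to 32-bit (Q31) precision. It is a hypothetical helper (`inner_product_q31` is not part of the example program) that assumes PMULS returns the Q31 product 2·ai·bi and that the sum stays within 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of (a,b) = sum_{i=1..n} a_i * b_i for Q15 components.
 * Assumes the Q31 product 2*a_i*b_i (PMULS) and a sum that fits in 32 bits,
 * the precision of the result stored at IN_PRO and IN_PRO+2. */
static int32_t inner_product_q31(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;                          /* wide accumulator, like A0 */
    for (size_t i = 0; i < n; ++i)
        acc += (int64_t)a[i] * b[i] * 2;      /* Q15 x Q15 -> Q31 */
    return (int32_t)acc;
}
```

For the three-dimensional example data used later in this section (a = 0.5, 0.125, 0.5 and b = 0.25, 0.0625, 0.25), it returns H'21000000, that is, 0.2578125: (a,b)H = H'2100 and (a,b)L = H'0000.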
3. Method of Storing Inner Product (a,b) of n-dimensional Space Vectors a and b
Figure 10.3 shows how the components of n-dimensional space vectors a and b are set in XRAM and YRAM, and where their inner product (a,b) is stored.
Address          XRAM        Address          YRAM
VECTORA          a1          VECTORB          b1
VECTORA+2        a2          VECTORB+2        b2
VECTORA+4        a3          VECTORB+4        b3
   :              :             :              :
VECTORA+2n–4     an–1        VECTORB+2n–4     bn–1
VECTORA+2n–2     an          VECTORB+2n–2     bn

Address          YRAM
IN_PRO           (a,b)H *1
IN_PRO+2         (a,b)L *1
*1 (a,b)H: upper 16 bits of (a,b); (a,b)L: lower 16 bits of (a,b)
Figure 10.3 Method of Storing Inner Product (a,b) of n-dimensional Space Vectors a and b
4. Algorithm for Calculating Inner Product  
Figure 10.4 shows the algorithm for calculating the inner product (a,b). The details of the  
algorithm are described below.  
(1) Set the addresses of the components of space vectors a and b, and the address for storing their inner product, in the X address register (R4) and the Y address registers (R6, R7).
(2) Perform a sum of products calculation on components ai of space vector a and components  
bi of space vector b.  
(3) Store (a,b)H, the upper 16 bits of inner product (a,b) at the IN_PRO address and (a,b)L,  
the lower 16 bits of inner product (a,b), at the IN_PRO+2 address. This completes the  
process.  
(1) Initial setting
(2) Sum of products calculation on components ai of space vector a and components bi of space vector b:
$$(a,b) = \sum_{i=1}^{n} (a_i \times b_i)$$
(3) Store (a,b)H, the upper 16 bits of inner product (a,b), at the IN_PRO address and (a,b)L, the lower 16 bits of inner product (a,b), at the IN_PRO+2 address
End
Figure 10.4 Algorithm for Calculating Inner Product
Flowchart
Start
(1) Initial setting
    (1-1) Transfer the VECTORA address (H'1000FF00) to register R4.
    (1-2) Transfer the VECTORB address (H'1001FF00) to register R6.
    (1-3) Transfer the IN_PRO address (H'1001FF0A) to register R7.
(2) Sum of products
    (2-1) Use the extended instruction REPEAT to set the repeat start address (LOOP_S), the repeat end address (LOOP_E), and the number of repeats (n + 2 times).
    (2-2) Clear register M0.
    (2-3) Clear register A0.
    (2-4) Repeat the following in parallel, n + 2 times:
        After reading one component ai of vector a from XRAM, increment the R4 address.
        After reading one component bi of vector b from YRAM, increment the R6 address.
        Multiply ai by bi.
        Calculate $a_ib_i + \sum_{j=1}^{i-1} a_jb_j$.
(3) Storage
    Store (a,b)H, the upper 16 bits of inner product (a,b), at the IN_PRO address and increment the IN_PRO address.
    Shift the inner product (a,b) 16 bits to the left so that (a,b)L, its lower 16 bits, moves to the upper word.
    Store (a,b)L at the IN_PRO+2 address.
End
Main Program  
This program calculates the inner product of the three-dimensional space vectors a and b {ai, bi (i = 1, 2, 3)}.
in_pro.src
;*******************************************************
;*
;*   Inner product calculation routine
;*   (a,b)=a1b1+a2b2+a3b3
;*
;*******************************************************
;*******************************************************
;*   Initial setting routine
;*******************************************************
MAIN:     MOV.L    #VECTORA,R4
          MOV.L    #VECTORB,R6
          MOV.L    #IN_PRO,R7
;*******************************************************************************************
;* Sum of products calculation routine
;*******************************************************************************************
          REPEAT   LOOP_S,LOOP_S,#5   ;Number of components in vector a + 2 is number of repeats
          PCLR     A0
          PCLR     M0
          PCLR     X0
          PCLR     Y0
LOOP_S:   PADD A0,M0,A0  PMULS X0,Y0,M0  MOVX.W @R4+,X0  MOVY.W @R6+,Y0  ;ai,bi load
;*******************************************************************************************
;* Inner product storage routine
;*******************************************************************************************
STORE:    MOVY.W   A0,@R7+            ;Store upper bits of inner product
          PSHA     #16,A0             ;Move lower bits of inner product into upper word of A0
          MOVY.W   A0,@R7             ;Store lower bits of inner product
EXIT:     BRA      EXIT
          NOP
MAIN_E:   NOP
Data
;*****************************************************************
;* Inner product calculation data (XRAM/YRAM)
;*****************************************************************
          .SECTION  XRAM,DATA,LOCATE=H'1000FF00
VECTORA:  .XDATA.W  0.5,0.125,0.5,0,0
          .SECTION  YRAM,DATA,LOCATE=H'1001FF00
VECTORB:  .XDATA.W  0.25,0.0625,0.25,0,0
IN_PRO:   .RES.W    2
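The single-instruction loop at LOOP_S is software pipelined: the PADD, the PMULS and the two data transfers all read the register contents from before the instruction, so each product ai × bi reaches A0 two repeats after ai and bi are loaded. That is why the repeat count is n + 2 (5 here). The C sketch below is a hypothetical register-level model of that behaviour; the function name and the Q31 PMULS convention are assumptions. It reads n + 2 entries from each array, so the vectors are padded with two zeros, as in the VECTORA and VECTORB data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register-level model of
 *   LOOP_S: PADD A0,M0,A0  PMULS X0,Y0,M0  MOVX.W @R4+,X0  MOVY.W @R6+,Y0
 * repeated n + 2 times. All four operations use the register values from
 * before the instruction. a and b must each hold n + 2 entries. */
static int64_t pipelined_inner_product(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc_a0 = 0, prod_m0 = 0;     /* PCLR A0, PCLR M0 */
    int16_t x0 = 0, y0 = 0;              /* PCLR X0, PCLR Y0 */
    for (size_t k = 0; k < n + 2; ++k) {
        int64_t next_a0 = acc_a0 + prod_m0;          /* PADD  A0,M0,A0 */
        int64_t next_m0 = (int64_t)x0 * y0 * 2;      /* PMULS X0,Y0,M0 (assumed Q31) */
        x0 = a[k];                                   /* MOVX.W @R4+,X0 */
        y0 = b[k];                                   /* MOVY.W @R6+,Y0 */
        acc_a0 = next_a0;
        prod_m0 = next_m0;
    }
    return acc_a0;
}
```

Running the loop only n times would leave the last two products in M0 and X0/Y0 without adding them to A0; the two extra repeats drain the pipeline.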
Section 11 Square Root  
Overview  
A 16-bit fixed-point square root calculation is performed and a square root with 15-bit precision is  
obtained.  
Description  
1. I/O Value Data Format  
Figure 11.1 shows the data format for I/O values. The value X whose square root is to be determined is input in 16-bit format with its uppermost bit set to 0; X must also be normalized before the square root is calculated.
The square root, √X, is output in 16-bit (1-word) format with the uppermost bit set to 0.
Figure 11.1 I/O Value Data Format (input value X, whose square root is to be determined, and output value √X: each a 16-bit word, bits 15 to 0, with bit 15 = 0 and the decimal point positioned immediately below bit 15)
2. Method of Calculating Square Root  
Figure 11.2 illustrates the square root function. The example program first calculates an approximate value for the square root of X from a polyline graph of the sort shown in Figure 11.2, and then uses a gradualization equation (a recurrence formula) to converge on a more accurate value. This is the method used to calculate the square root, √X.
Once X has been normalized, the value whose square root is to be calculated lies in the following range.
$$0 \le X < 1.0 \qquad (\mathrm{H'0000} \le X \le \mathrm{H'7FFF})$$
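The range above follows from the 16-bit fixed-point format of Figure 11.1, in which a word w represents w / 2^15. The C helpers below sketch the conversion in both directions. Round-to-nearest and saturation at H'7FFF are assumptions of this sketch, and `to_q15` and `from_q15` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the 16-bit fixed-point (Q15) format: bit 15 is the sign and the
 * binary point lies just below it, so a word w represents w / 2^15.
 * Round-to-nearest and saturation are assumptions of this sketch. */
static uint16_t to_q15(double v)
{
    double scaled = v * 32768.0;
    if (scaled >= 32767.0)
        return 0x7FFF;                     /* largest value, about 0.99997 */
    if (scaled <= -32768.0)
        return 0x8000;                     /* -1.0 */
    long r = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    return (uint16_t)r;                    /* two's complement bit pattern */
}

static double from_q15(uint16_t w)
{
    int32_t s = (w & 0x8000u) ? (int32_t)w - 65536 : (int32_t)w;
    return s / 32768.0;
}
```

For example, 0.1 converts to H'0CCD and 0.5 to H'4000, while 1.0 itself cannot be represented and saturates to H'7FFF.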
In the square root function shown in Figure 11.2, the polyline consists of a comparatively gentle segment for X > 0.1 and a steep segment for X ≤ 0.1, which give approximation equations (1) and (2) respectively. An approximate square root value (y0) is obtained using these two equations.
Figure 11.2 Square Root Function (√X plotted against X, the value whose square root is to be determined, for 0 ≤ X ≤ 1.0, with the approximating polyline y0 = 3.16228 × X for X ≤ 0.1 and y0 = 0.58579 × X + 0.41422 for X > 0.1)
Input value X > 0.1:
$$y_0 = 0.58579 \times X + 0.41422 \qquad (1)$$
Input value X ≤ 0.1:
$$y_0 = 3.16228 \times X \qquad (2)$$
Equation (2) cannot be used without modification for fixed-point calculation, because its coefficient 3.16228 is not less than 1.0 and so cannot be represented as a 16-bit fixed-point fraction. The coefficient is therefore normalized, and the actual program evaluates equation (2) as y0 = 0.79057 × X × 2^2 (0.79057 × 4 = 3.16228).
Next, the value y0 obtained with approximation equation (1) or (2) is substituted into gradualization equation (3) to obtain a more accurate square root value, √X. Equation (3) is the Newton–Raphson iteration for the square root, and its result y is used as y0 for the next pass.
$$y = \frac{1}{2}\left(y_0 + \frac{X}{y_0}\right) \approx \sqrt{X} \qquad (3)$$
In the second term of equation (3), X/y0, the value X whose square root is being calculated has been normalized, so the quotient X/y0 must also be a normalized value (less than 1.0). This requires y0 > X after equation (1) or (2) has been calculated. In the sample program, gradualization equation (3) is performed three times, resulting in a square root value with 15-bit precision.
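To see how quickly equation (3) converges from the polyline estimate, the following C sketch repeats the same steps in double-precision floating point. It only illustrates the mathematics: the function names are hypothetical, and the fixed-point details of the program are not modeled.

```c
#include <assert.h>

/* Floating-point sketch of the method: a polyline first approximation
 * (equations (1) and (2)) refined with gradualization equation (3).
 * Assumes 0 < x < 1.0; the number of passes is a parameter here. */
static double initial_estimate(double x)
{
    return (x > 0.1) ? 0.58579 * x + 0.41422   /* equation (1) */
                     : 3.16228 * x;            /* equation (2) */
}

static double sqrt_estimate(double x, int passes)
{
    double y = initial_estimate(x);
    for (int k = 0; k < passes; ++k)
        y = 0.5 * (y + x / y);                 /* equation (3) */
    return y;
}
```

Starting from the steep segment, X = 0.01 needs four passes to come within 10^-4 of 0.1, while X = 0.25 is within 10^-6 of 0.5 after three.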
3. Algorithm for Fixed-point Square Root Calculation
The algorithm for fixed-point square root calculation is described below.
(1) Initial settings are performed.
(2) It is determined whether X, the value whose square root is to be calculated, is 0. If X is 0, √X is given as 0 and processing ends.
(3) It is determined whether X is a negative number. If X is negative, √X is given as H'FFFF and processing ends.
(4) X is compared with H'7FFB. If X > H'7FFB, √X is given as the value stored at EX_OUT2 (1.0, output as H'7FFF) and processing ends.
(5) X is compared with 0.1. If X > 0.1, processing continues with (6). If X ≤ 0.1, processing continues with (6)'.
(6) Equation (1) is used to calculate approximate square root y0. Processing continues with (7).
(6)' Equation (2) is used to calculate approximate square root y0. Processing continues with (7).
(7) Approximate square root y0 is compared with X. If y0 = X, y0 is divided by 2 and 0.5 (H'4000) is added; the result is given as √X and processing ends.
(8) If the comparison in (7) shows that X is greater than approximate square root y0, the division X/y0 in gradualization equation (3) cannot be performed. In this case √X is given as H'FFFF and processing ends.
(9) Gradualization equation (3) is used to calculate square root value y, which is given as √X, and processing ends.
Figure 11.3 shows the algorithm used for calculating the square root.
(1) Initial setting
(2) X = 0? Yes: √X = 0, End. No: continue.
(3) X < 0? Yes: √X = H'FFFF, End. No: continue.
(4) X > H'7FFB? Yes: √X = X (approximately 1.0), End. No: continue.
(5) X > 0.1? Yes: go to (6). No: go to (6)'.
(6) Calculate approximate square root y0 using equation (1): y0 = 0.58579 × X + 0.41422
(6)' Calculate approximate square root y0 using equation (2): y0 = 3.16228 × X
(7) y0 = X? Yes: divide approximate square root y0 by 2 and add 0.5, √X = 1/2 (y0 + 1), End. No: continue.
(8) y0 < X? Yes: √X = H'FFFF, End. No: continue.
(9) Calculate square root √X using equation (3): y0 = √X = 1/2 (y0 + X/y0)
End
Figure 11.3 Algorithm for Calculating Square Root
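Steps (1) to (9) can be modeled in C with 16-bit integers, which makes the fixed-point behaviour easy to experiment with off target. The sketch below is hypothetical: the constant and function names are invented, the Q15 coefficients are 0.58579, 0.41422, 0.79057 and 0.5 multiplied by 2^15 and rounded, PMULS is assumed to give the Q31 product 2·x·y, and the 15-step DIV1/ROTCL sequence is assumed to give floor(X·2^15/y0), modeled with ordinary integer division. The loop copies the listing's counter test literally: CMP/GT R14,R13 followed by BF repeats until R13 > R14, which is four passes when R14 = 3. With four passes the model reproduces the Output Value column of Table 11.1 (taking 0.34 as H'2B85).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q15 model of processes (1)-(9). Assumptions: PMULS yields the
 * Q31 product 2*x*y, a 16-bit store of A0 keeps bits 31..16, and the 15
 * DIV1/ROTCL steps give floor(X * 2^15 / y0) (ordinary integer division). */
#define Q15_POINT_1  0x0CCD  /* 0.1     (DAT2)       */
#define Q15_C1       19195   /* 0.58579 (KINJI1)     */
#define Q15_C0       13573   /* 0.41422 (KINJI1+2)   */
#define Q15_C2       25905   /* 0.79057 (KINJI2)     */
#define Q15_HALF     0x4000  /* 0.5     (KINJI1+4)   */
#define X_LIMIT      0x7FFB  /* DAT1                 */

static uint16_t q15_sqrt(uint16_t input)
{
    if (input == 0)
        return 0x0000;                               /* (2) X = 0 */
    if (input & 0x8000u)
        return 0xFFFF;                               /* (3) X < 0 */
    int32_t x = input;
    if (x > X_LIMIT)
        return 0x7FFF;                               /* (4) about 1.0 */

    int32_t y;                                       /* approximate root y0 */
    if (x > Q15_POINT_1)
        y = Q15_C0 + ((2 * x * Q15_C1) >> 16);       /* (6)  equation (1) */
    else
        y = ((2 * x * Q15_C2) << 2) >> 16;           /* (6)' 0.79057X x 2^2 */

    if (y == x)
        return (uint16_t)((y >> 1) + Q15_HALF);      /* (7) y0/2 + 0.5 */
    if (x > y)
        return 0xFFFF;                               /* (8) X/y0 not possible */

    /* (9) Mirrors ADD #1,R13 at the top and CMP/GT R14,R13 + BF at the
     * bottom: the body repeats until R13 > R14. */
    const int r14 = 3;
    int r13 = 0;
    do {
        ++r13;
        int32_t q = (x << 15) / y;                   /* X/y0, below 1.0 since x < y */
        y = (y + q) >> 1;                            /* y0/2 + (X/y0)/2 */
    } while (!(r13 > r14));
    return (uint16_t)y;
}
```

The sketch keeps y as a 16-bit value between passes, whereas the listing leaves extra low-order bits in A0; for the inputs of Table 11.1 this simplification does not change the result.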
Flowchart
Start
(1) Initial setting
    Transfer the INPUT address to register R4.
    Transfer the EX_OUT address to register R5.
    Transfer the KINJI1 address to register R6.
    Transfer the DAT1 address to register R7.
(2) Zero check
    Load input value X into register R0.
    Is input value X in register R0 equal to 0 (X = 0?)? No: go to (3).
    Yes: load H'0 into register X0, copy the X0 data (H'0) to register A0, and go to FIN.
(3) Negative value check
    Swap the upper and lower words of register R0 into register R1, so that the upper word of R1 holds input value X.
    Shift the data in register R1 1 bit to the left to determine the sign.
    Is bit 31 of register R1 1 (X < 0?)? No: go to (4).
    Yes: load H'FFFF into register X0, copy the X0 data (H'FFFF) to register A0, and go to FIN.
(4) Upper limit check
    Load input value X into register R0 and H'7FFB into register R1.
    Is R0 greater than R1 (X > H'7FFB?)? No: go to (5).
    Yes: transfer the EX_OUT2 address to register R5, load the value stored at EX_OUT2 into register X0, copy the X0 data to register A0, and go to FIN.
(5) Selection of approximation equation
    Transfer the DAT2 address to register R7 and load 0.1 into register R1.
    Is R0 greater than R1 (X > 0.1?)? Yes: go to (6). No: go to (6)'.
(6) Approximation equation (1)
    Load input value X into register X1 and the data for approximate square root calculation (0.58579) into register Y0.
    Load input value X into register R1.
    Transfer the WORK address to register R4.
    Multiply register X1 by register Y0 (0.58579X) and load the data for approximate square root calculation (0.41422) into register Y1.
    Add register A1 and register Y1 (0.58579X + 0.41422) to obtain y0 in register A0. Go to (7).
(6)' Approximation equation (2)
    Transfer the KINJI2 address to register R6.
    Load input value X into register X1 and the data for approximate square root calculation (0.79057) into register Y0.
    Load input value X into register R1.
    Transfer the WORK address to register R4.
    Multiply register X1 by register Y0 (0.79057X).
    Shift 2 bits to the left to multiply 0.79057X by 4, giving y0.
(7) Comparison, part 1
    Load approximate square root y0 into register R0 via @R4.
    Is approximate square root y0 equal to input value X (y0 = X?)? No: go to (8).
    Yes: shift the data in register A0 1 bit to the right to multiply y0 by 1/2, load 0.5 into register Y1, add register A0 and register Y1 (y0/2 + 0.5), store the result in register A0, and go to FIN.
(8) Comparison, part 2
    Is input value X greater than approximate square root y0 (X > y0?)? No: go to (9).
    Yes: load H'FFFF into register X0, copy the X0 data (H'FFFF) to register A0, and go to FIN.
(9) Gradualization equation
    Set register R14 to 3 (number of times to perform the gradualization equation) and clear register R13 to 0.
    Increment register R13 (repeat counter).
    Save input value X in register R11 and clear register R12.
    Use the extended instruction REPEAT to set the repeat start address (LOOP_S), the repeat end address (LOOP_E), and the number of repeats (15 times).
    Initialize for unsigned division.
    Repeat the following the number of times set (15 times in the sample program): perform a 1-step division of X by y0, then store the T bit in R12 by shifting R12 1 bit to the left.
    Transfer X/y0 to register X0 via @R4 and copy register X0 to register Y1.
    Shift the data in register A0 1 bit to the right to multiply y0 by 1/2, and shift the data in register Y1 1 bit to the right to multiply X/y0 by 1/2.
    Add the two results to obtain square root y (√X) and store it in register A0.
    Transfer y (√X) to register R0 via @R4.
    Restore input value X in register R1 from register R11.
    Is register R13 greater than register R14? No: return to "Increment register R13".
FIN
    Store the data in register A0 at the address in register R7 (OUTPUT).
End
Main Program  
rout.src
;*******************************************************************************************
;*
;*   Square root calculation routine
;*   √X
;*
;*******************************************************************************************
;*******************************************************************************************
;*   Initial setting routine
;*******************************************************************************************
MAIN:     MOV.L    #INPUT,R4
          MOV.L    #EX_OUT,R5
          MOV.L    #KINJI1,R6
          MOV.L    #DAT1,R7
;*******************************************************************************************
;* Zero check of value to have square root calculated routine
;*******************************************************************************************
          MOV.W    @R4,R0
          CMP/EQ   #0,R0
          BF       ZERO_CH
                                      ;If zero, do following processing
          MOVX.W   @R4,X0
          PCOPY    X0,A0
          BRA      FIN                ;End of processing
          NOP
;*******************************************************************************************
;* Negative value check of value to have square root calculated routine
;*******************************************************************************************
ZERO_CH:  SWAP.W   R0,R1              ;Upper word of R1 = input value X
          SHAL     R1                 ;Sign bit of X -> T bit
          BF       MINUS_CH
                                      ;If negative, do following processing
          MOVX.W   @R5,X0             ;H'FFFF load
          PCOPY    X0,A0
          BRA      FIN                ;End of processing
          NOP
;*******************************************************************************************
;* Comparison of value to have square root calculated and H'7FFB routine
;*******************************************************************************************
MINUS_CH: MOV.W    @R4,R0             ;X load
          MOV.W    @R7,R1             ;H'7FFB load
          CMP/GT   R1,R0              ;R0 > R1 ?
          BF       EQU_SEL
                                      ;If X > H'7FFB, do following processing
          MOV.L    #EX_OUT2,R5
          MOVX.W   @R5,X0             ;Load output value for X > H'7FFB
          PCOPY    X0,A0
          BRA      FIN
          NOP
;*******************************************************************************************
;* Approximation equation selection routine
;*******************************************************************************************
EQU_SEL:  MOV.L    #DAT2,R7
          MOV.W    @R7,R1             ;0.1 load
          CMP/GT   R1,R0              ;X > 0.1 ?
          BF       Y0_PRO2            ;If X ≤ 0.1, jump
;*******************************************************************************************
;* Approximate square root y0 calculation routine
;*******************************************************************************************
Y0_PRO1:  MOVX.W   @R4,X1  MOVY.W @R6+,Y0   ;Load input value X (value to have square root
                                            ;calculated) and 0.58579 for use in calculating
                                            ;approximate square root
          MOV.W    @R4,R1             ;Keep input value X in R1
          MOV.L    #WORK,R4
          PMULS    X1,Y0,A1           ;0.58579X
          MOVY.W   @R6+,Y1            ;0.41422 load
          PADD     A1,Y1,A0           ;0.58579X+0.41422 -> y0
          BRA      HIKAKU
          NOP
;*******************************************************************************************
;* Approximation equation (2) y0 calculation routine
;*******************************************************************************************
Y0_PRO2:  MOV.L    #KINJI2,R6
          MOVX.W   @R4,X1  MOVY.W @R6+,Y0   ;Load input value X (value to have square root
                                            ;calculated) and 0.79057 for use in calculating
                                            ;approximate square root
          MOV.W    @R4,R1             ;Keep input value X in R1
          MOV.L    #WORK,R4
          PMULS    X1,Y0,A0           ;0.79057X
          PSHA     #2,A0              ;0.79057X x 4 -> y0
;*******************************************************************************************
;* Comparison of approximate square root and value to have square root
;* calculated routine/Part 1
;*******************************************************************************************
HIKAKU:   MOVX.W   A0,@R4             ;Pass to CPU unit
          MOV.W    @R4,R0             ;y0 -> R0
          CMP/EQ   R0,R1              ;Approximate square root y0 = input value X (value to
                                      ;have square root calculated)?
          BF       NOT_EQ
                                      ;If y0 = X, do following processing
          MOVY.W   @R6,Y1             ;0.5 load
          PSHA     #-1,A0             ;y0/2
          PADD     A0,Y1,A0           ;y0/2+0.5
          BRA      FIN                ;End of processing
          NOP
;*******************************************************************************************
;* Comparison of approximate square root and value to have square root
;* calculated routine/Part 2
;*******************************************************************************************
NOT_EQ:   CMP/GT   R0,R1              ;X > y0 ?
          BF       NOT_GT
                                      ;If y0 < X, do following processing
          MOVX.W   @R5,X0             ;H'FFFF load
          PCOPY    X0,A0
          BRA      FIN
          NOP
;*******************************************************************************************
;* Square root y calculation using gradualization equation routine
;*******************************************************************************************
NOT_GT:   MOV.L    #3,R14             ;Set number of repeats
          MOV.L    #0,R13             ;Clear counter
LENEAR_LP:
          ADD      #1,R13             ;Increment counter
          MOV      R1,R11             ;push X
          MOV.L    #0,R12             ;Clear register R12
          REPEAT   LOOP_S,LOOP_E,#15
          DIV0U                       ;Signless initialization
LOOP_S:   DIV1     R0,R1              ;R1/R0
LOOP_E:   ROTCL    R12                ;Store T bit
          MOV.W    R12,@R4            ;X/y0 -> WORK
          MOVX.W   @R4,X0             ;X/y0 -> X0
          PCOPY    X0,Y1
          PSHA     #-1,A0             ;y0/2
          PSHA     #-1,Y1             ;(X/y0)/2
          PADD     A0,Y1,A0           ;y = y0/2 + (X/y0)/2
          MOVX.W   A0,@R4             ;y -> WORK
          MOV.W    @R4,R0             ;y -> R0 (divisor for next pass)
          MOV      R11,R1             ;pop X
          CMP/GT   R14,R13
          BF       LENEAR_LP          ;If set number of repeats has been performed, escape
FIN:      MOV.L    #OUTPUT,R7
          MOVY.W   A0,@R7             ;Store square root √X
EXIT:     BRA      EXIT
          NOP
MAIN_E:   NOP
Data
;*******************************************************************************************
;* Square root calculation data (XRAM/YRAM)
;*******************************************************************************************
          .SECTION  XRAM,DATA,LOCATE=H'1000FF00
INPUT:    .RES.W    1                      ;External input data storage area
WORK:     .RES.W    1                      ;Work area
EX_OUT:   .DATA.W   H'FFFF                 ;Output value if input value X < 0
EX_OUT2:  .XDATA.W  1                      ;Output value if input value X > H'7FFB
          .SECTION  YRAM,DATA,LOCATE=H'1001FF00
KINJI1:   .XDATA.W  0.58579,0.41422,0.5    ;Approximation equation (1)
KINJI2:   .XDATA.W  0.79057                ;Approximation equation (2)
DAT1:     .DATA.W   H'7FFB
DAT2:     .XDATA.W  0.1
OUTPUT:   .RES.W    1                      ;External output data storage area
Execution Example  
The input values X (INPUT) and the calculated square roots √X (OUTPUT) are shown in Table 11.1.
Table 11.1 Square Root √X Calculation Results (3 Executions of Gradualization Equation)
Input Value X   Input Value X   Logical Value √X   Logical Value √X   Output Value √X
(decimal)       (hexadecimal)   (decimal)          (hexadecimal)      (hexadecimal)
0.9999          H'7FFC          0.99995            H'7FFE             H'7FFF
0.99987         H'7FFB          0.99993            H'7FFD             H'7FFD
0.85            H'6CCD          0.92195            H'7602             H'7602
0.523           H'42F1          0.72319            H'5C91             H'5C90
0.34            H'2B85          0.5831             H'4AA3             H'4AA2
0.136           H'1168          0.36878            H'2F34             H'2F33
0.087           H'0B23          0.29496            H'25C1             H'25C1
0.01            H'0147          0.1                H'0CCD             H'0CC9
0               H'0000          0                  H'0000             H'0000
–0.7            H'A667          (undefined)        (undefined)        H'FFFF
Section 12 Square Mean Error  
Overview  
The square mean error (root mean square error) of two variables, a[i] (16-bit components) and b[i] (16-bit components), is calculated (i = 1, 2, ..., n).
Description  
1. Method of Obtaining Square Mean Error  
In order to obtain the square mean error, first the error e[i] between the two variables a[i] and b[i] must be considered. It is given by equation (1) below.
$$e[i] = a[i] - b[i] \qquad (i = 1, 2, \ldots, n) \qquad (1)$$
Next, the error distribution Se² is obtained by dividing the sum total of the squares of the errors e[i] by the number of components n:
$$\frac{1}{n}\sum_{i=1}^{n} e[i]^2 = \frac{1}{n}\left\{(a[1]-b[1])^2 + (a[2]-b[2])^2 + \cdots + (a[n]-b[n])^2\right\}$$
The error distribution Se² can therefore be obtained using equation (2) below.
$$S_e^2 = \frac{1}{n}\sum_{i=1}^{n}\left(a[i]-b[i]\right)^2 \qquad (2)$$
The square mean error E[e²] is the square root of the error distribution Se², as shown in equation (3) below.
$$E[e^2] = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(a[i]-b[i]\right)^2} \qquad (3)$$
*1 a[i]: 16-bit, b[i]: 16-bit, e[i]: 16-bit
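Equations (1) to (3) can be checked against a floating-point reference such as the C sketch below. It is hypothetical (not part of the example program) and, to stay self-contained, takes the square root with gradualization equation (3) from Section 11, repeated until the estimate stops decreasing, instead of a library call.

```c
#include <assert.h>
#include <stddef.h>

/* Square root by gradualization equation (3), repeated until the estimate
 * stops decreasing. Starting at or above sqrt(s) makes the sequence fall
 * monotonically, so the loop terminates. Returns 0 for s <= 0. */
static double sqrt_by_iteration(double s)
{
    if (s <= 0.0)
        return 0.0;
    double y = (s > 1.0) ? s : 1.0;
    for (;;) {
        double next = 0.5 * (y + s / y);
        if (next >= y)
            return y;
        y = next;
    }
}

/* Floating-point reference for equations (1)-(3). */
static double square_mean_error(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double e = a[i] - b[i];          /* equation (1) */
        sum += e * e;
    }
    double se2 = sum / (double)n;        /* equation (2): error distribution */
    return sqrt_by_iteration(se2);       /* equation (3) */
}
```

For a = (0.5, 0.25, 0.125) and b = (0.25, 0.25, 0), Se² = 0.078125 / 3, and the square mean error is its square root, about 0.1614.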
2. Method of Storing Components of Variables a[i] and b[i]  
In order to obtain the square mean error, the sum total of the squares of the errors e[i] must first be calculated. To increase processing speed, the components of a[i] and b[i] are stored in XRAM and YRAM in advance, as shown in figure 12.1. Note that 0 is stored at VECTORA+2n and VECTORA+2n+2 in XRAM and at VECTORB+2n and VECTORB+2n+2 in YRAM; the example program will not run properly if zeros are not stored in these locations.
For the division by the number of components n, the value 1/n is stored in XRAM: the program does not divide with a DSP instruction but instead multiplies by 1/n.
XRAM                                YRAM
Address          Bits 15–0          Address          Bits 15–0
VECTORA          a[1]               VECTORB          b[1]
VECTORA+2        a[2]               VECTORB+2        b[2]
VECTORA+4        a[3]               VECTORB+4        b[3]
   :               :                   :               :
VECTORA+2n–4     a[n–1]             VECTORB+2n–4     b[n–1]
VECTORA+2n–2     a[n]               VECTORB+2n–2     b[n]
VECTORA+2n       0                  VECTORB+2n       0
VECTORA+2n+2     0                  VECTORB+2n+2     0

XRAM
Address          Bits 15–0
SEIBUN_N         1/n
Figure 12.1 Memory Map of Storage of Variables a[i] and b[i], Etc.
3. Algorithm for Calculating Square Mean Error  
The algorithm used to calculate the square mean error is described below.  
(1) Perform initial settings.
(2) Set processes (2) and (3) so that the number of repeats is the number of components n + 2. Two extra repeats are needed because the following four operations run in parallel: calculating $e[i]^2 + \sum_{j=1}^{i-1} e[j]^2$, calculating e[i]², loading a[i], and loading b[i].
(3) Calculate the error e[i] for a[i] and b[i].
(4) Divide $\sum_{i=1}^{n} (a[i] - b[i])^2$, which was obtained in processes (2) and (3), by n.
(5) Calculate the square root of the error distribution Se². This yields the square mean error and completes the processing. (For details, see 3. Algorithm for Fixed-point Square Root Calculation in 11. Square Root.)
(1) Initial setting
(2) Execute the following 4 operations in parallel: calculate $e[i]^2 + \sum_{j=1}^{i-1} e[j]^2$, calculate e[i]², load a[i], load b[i]. The number of repeats is the number of components n + 2.
(3) Calculate the error for a[i] and b[i]: e[i] = a[i] – b[i]
(4) Divide Σ(a[i] – b[i])² by n:
$$S_e^2 = \frac{1}{n}\sum_{i=1}^{n}\left(a[i]-b[i]\right)^2$$
(5) Calculate the square root of Se²
End
Figure 12.2 Algorithm for Calculating Square Mean Error
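Because the addition, the squaring and the two loads of process (2) all read the register contents from before the instruction, each e[i]² is added into the sum two repeats after a[i] and b[i] are loaded, which is why processes (2) and (3) repeat n + 2 times. The C sketch below is a hypothetical register-level model of the two-instruction repeat loop in the example program (PADD, PMULS and the two data transfers in one instruction, PSUB in the next), with values held as Q15 integers and PMULS assumed to give the Q31 product 2·x·y. The function names are illustrative, and the result is compared with a direct loop over the same sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PMULS behaviour: Q15 x Q15 -> Q31. */
static int64_t q31_mul(int32_t s, int32_t t)
{
    return (int64_t)s * t * 2;
}

/* Hypothetical model of
 *   LOOP_S: PADD A0,Y0,Y0  PMULS A1,A1,A0  MOVX.W @R4+,X0  MOVY.W @R6+,Y1
 *   LOOP_E: PSUB X0,Y1,A1
 * repeated n + 2 times; a and b must each provide n + 2 entries. */
static int64_t sum_sq_errors_pipelined(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t y0 = 0, a0 = 0;                  /* PCLR Y0, PCLR A0 */
    int32_t a1 = 0;                          /* PCLR A1: holds e[i] */
    int32_t x0 = 0, y1 = 0;
    for (size_t k = 0; k < n + 2; ++k) {
        /* LOOP_S: every operation reads the values from before it */
        int64_t next_y0 = y0 + a0;           /* add previous e^2 to the sum */
        int64_t next_a0 = q31_mul(a1, a1);   /* square previous e */
        x0 = a[k];                           /* load a[i] */
        y1 = b[k];                           /* load b[i] */
        y0 = next_y0;
        a0 = next_a0;
        /* LOOP_E */
        a1 = x0 - y1;                        /* e[i] = a[i] - b[i] */
    }
    return y0;
}

/* Direct form of the same sum, for comparison. */
static int64_t sum_sq_errors_direct(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t e = a[i] - b[i];
        sum += q31_mul(e, e);
    }
    return sum;
}
```

For a = (0.5, 0.25, 0.125) and b = (0.25, 0.25, 0), both functions return H'0A000000, that is, 0.078125 in Q31.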
Flowchart
Start
(1) Initial setting
    Transfer the VECTORA address to register R4.
    Transfer the SEIBUN_N address to register R5.
    Transfer the VECTORB address to register R6.
(2) Sum of squared errors
    Use the extended instruction REPEAT to set the repeat start address (LOOP_S), the repeat end address (LOOP_E), and the number of repeats (5 times).
    Clear registers A1, Y0, and A0.
    Repeat the following the number of times set (5 times in the sample program):
        In parallel: add e[i]² and $\sum_{j=1}^{i-1} e[j]^2$; calculate e[i]²; after reading a[i] from XRAM, increment the R4 address; after reading b[i] from YRAM, increment the R6 address.
(3)     Calculate the error e[i] for a[i] and b[i].
(4) Division by n
    Copy the contents of register Y0 to register A1 and read 1/n into register X1.
    Multiply $\sum_{j=1}^{n} e[j]^2$ by 1/n.
(5) Square root
    Transfer the INPUT address to register R4.
    Store the error distribution Se² (register A1) at the input address (INPUT) used by the square root routine.
    <Square root calculation routine> (see the flowchart in Section 11, Square Root, for details)
End
Main Program  
The example program calculates the square mean error of two vectors with three components each, a[i] and b[i] (i = 1, 2, 3).
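For reference, the value the sample program should approach can be worked out by hand from the data in the Data section (VECTORA = 0.5, 0.125, 0.5 and VECTORB = 0.25, 0.0625, 0.25). The program works in 16-bit fixed point and uses 0.33333 for 1/3, so its stored values may differ from these in the last bits.

```latex
\begin{aligned}
e &= (0.5-0.25,\; 0.125-0.0625,\; 0.5-0.25) = (0.25,\; 0.0625,\; 0.25)\\
\sum_{i=1}^{3} e[i]^2 &= 0.0625 + 0.00390625 + 0.0625 = 0.12890625\\
S_e^2 &= \tfrac{1}{3}\cdot 0.12890625 = 0.04296875\\
S_e &= \sqrt{0.04296875} \approx 0.20729
\end{aligned}
```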
squ_ave.src  
;*******************************************************************************************
;*
;*      Square mean routine
;*      a[i],b[i]
;*
;*******************************************************************************************
;*******************************************************************************************
;*      Initial setting routine
;*******************************************************************************************
MAIN:
        MOV.L   #VECTORA,R4
        MOV.L   #SEIBUN_N,R5
        MOV.L   #VECTORB,R6
;*******************************************************************************************
;* Error distribution calculation routine
;*******************************************************************************************
        REPEAT  LOOP_S,LOOP_E,#5        ;Number of repeats is number of
                                        ;vector a components + 2
        PCLR    A1
        PCLR    Y0
        PCLR    A0
LOOP_S: PADD    A0,Y0,Y0  PMULS A1,A1,A0  MOVX.W @R4+,X0  MOVY.W @R6+,Y1 ;a[i],b[i] load
LOOP_E: PSUB    X0,Y1,A1
        PCOPY   Y0,A1     MOVX.W @R5,X1   ;1/3 load
        PMULS   X1,A1,A1                  ;0.33333 × Σ(a[i] - b[i])^2
;*******************************************************************************************
;* Value to have square root calculated storage routine
;*******************************************************************************************
        MOV.L   #INPUT,R4
        MOVX.W  A1,@R4
;
;*******************************************************************************************
;* Square root calculation routine
;*******************************************************************************************
;*******************************************************************************************
;*      Initial setting routine
;*******************************************************************************************
SEMI_MAIN:
        MOV.L   #EX_OUT,R5
        MOV.L   #KINJI1,R6
        MOV.L   #DAT1,R7
;*******************************************************************************************
;* Zero check of value to have square root calculated routine
;*******************************************************************************************
        MOV.W   @R4,R0
        CMP/EQ  #0,R0
        BF      ZERO_CH
        MOVX.W  @R4,X0                  ;H'0 load
        PCOPY   X0,A0
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;* Negative value check of value to have square root calculated routine
;*******************************************************************************************
ZERO_CH:
        SWAP.W  R0,R1
        SHAL    R1
        BF      MINUS_CH                ;If negative, do following processing
        MOVX.W  @R5,X0                  ;H'FFFF load
        PCOPY   X0,A0
        BRA     FIN                     ;End of processing
        NOP
;*******************************************************************************************
;*      Comparison of value to have square root calculated and H'7FFB routine
;*******************************************************************************************
MINUS_CH:
        MOV.W   @R4,R0                  ;X load
        MOV.W   @R7,R1                  ;H'7FFB load
        CMP/GT  R1,R0                   ;R0 > R1 ?
        BF      EQU_SEL                 ;If R0 <= R1, jump
        MOV.L   #EX_OUT2,R5
        MOVX.W  @R5,X0                  ;EX_OUT2 load
        PCOPY   X0,A0
        BRA     FIN
        NOP
;*******************************************************************************************
;* Approximation equation selection routine
;*******************************************************************************************
EQU_SEL:
        MOV.L   #DAT2,R7
        MOV.W   @R7,R1
        CMP/GT  R1,R0
        BF      Y0_PRO2
;*******************************************************************************************
;* Approximation equation (1) y0 calculation routine
;*******************************************************************************************
Y0_PRO1:
        MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square
                                        ;root calculated) for use in calculating
                                        ;approximate square root
        MOV.W   @R4,R1                  ;Keep input value X (value to have square
                                        ;root calculated) in R1
        MOV.L   #WORK,R4
        PMULS   X1,Y0,A1  MOVY.W @R6+,Y1 ;0.58579X,0.41422 load
        PADD    A1,Y1,A0                ;0.58579X+0.41422 -> y0
        BRA     HIKAKU
        NOP
;*******************************************************************************************
;* Approximation equation (2) y0 calculation routine
;*******************************************************************************************
Y0_PRO2:
        MOV.L   #KINJI2,R6
        MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square
                                        ;root calculated) for use in calculating
                                        ;approximate square root
        MOV.W   @R4,R1                  ;Keep input value X (value to have square
                                        ;root calculated) in R1
        MOV.L   #WORK,R4
        PMULS   X1,Y0,A0                ;0.79057 × X
        PSHA    #2,A0                   ;(0.79057 × X) × 4
;*******************************************************************************************
;*      Comparison of approximate square root and value to have square root
;*      calculated routine/Part 1
;*******************************************************************************************
HIKAKU:
        MOVX.W  A0,@R4                  ;Pass to CPU unit
        MOV.W   @R4,R0
        CMP/EQ  R0,R1                   ;Approximate square root = input value X
                                        ;(value to have square root calculated)?
        BF      NOT_EQ
        PSHA    #-1,A0  MOVY.W @R6,Y1   ;y0/2,0.5 load
        PADD    A0,Y1,A0                ;y0/2+0.5
        BRA     FIN
        NOP
;*******************************************************************************************
;*      Comparison of approximate square root and value to have square root
;*      calculated routine/Part 2
;*******************************************************************************************
NOT_EQ:
        CMP/GT  R0,R1
        BF      NOT_GT
        MOVX.W  @R5,X0                  ;H'FFFF load
        PCOPY   X0,A0
        BRA     FIN
        NOP
;
;*******************************************************************************************
;* Square root y calculation using recurrence formula routine
;*******************************************************************************************
NOT_GT:
        MOV.L   #3,R14                  ;Set number of repeats
        MOV.L   #0,R13                  ;Increment counter
LENEAR_LP:
        ADD     #1,R13
        MOV     R1,R11
        MOV.L   #0,R12
        REPEAT  DIV_S,DIV_E,#15         ;R1/R0
        DIV0U                           ;Unsigned division initialization
DIV_S:  DIV1    R0,R1
DIV_E:  ROTCL   R12                     ;Store T bit
        MOV.W   R12,@R4
        MOVX.W  @R4,X0
        PCOPY   X0,Y1
        PSHA    #-1,A0                  ;y0/2
        PSHA    #-1,Y1                  ;(X/y0)/2
        PADD    A0,Y1,A0
        MOVX.W  A0,@R4
        MOV.W   @R4,R0
        MOV     R11,R1
        CMP/GT  R14,R13
        BF      LENEAR_LP
FIN:
        MOV.L   #OUTPUT,R7
        MOVY.W  A0,@R7                  ;Store square root of X
EXIT:   BRA     EXIT
        NOP
MAIN_E: NOP
Data  
;*******************************************************************************************
;* Square mean calculation data (XRAM/YRAM)
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
VECTORA:  .XDATA.W 0.5,0.125,0.5,0,0
SEIBUN_N: .XDATA.W 0.33333              ;1/number of components (n)
;* For calculating square root *
INPUT:    .RES.W   1
WORK:     .RES.W   1
EX_OUT:   .DATA.W  H'FFFF
EX_OUT2:  .XDATA.W 1
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
VECTORB:  .XDATA.W 0.25,0.0625,0.25,0,0
;* For calculating square root *
KINJI1:   .XDATA.W 0.58579,0.41422,0.5  ;Approximation equation (1)
KINJI2:   .XDATA.W 0.79057              ;Approximation equation (2)
DAT1:     .DATA.W  H'7FFB
DAT2:     .XDATA.W 0.1
OUTPUT:   .RES.W   1
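The approximation constants stored at KINJI1 and KINJI2 agree, to about five decimal places, with the closed forms below. The listing does not say so; interpreting them as straight-line approximations of $\sqrt{X}$ is an inference from the numbers:
- Equation (1) is the chord through $(0.5,\sqrt{0.5})$ and $(1,1)$.
- Equation (2) is the line through the origin and $(0.1,\sqrt{0.1})$. Its slope exceeds 1, which is consistent with the program multiplying by 0.79057 and then shifting left by 2 (×4).

The last line applies the recurrence to the special case $y_0 = X$.

```latex
\begin{aligned}
\text{(1)}\quad y_0 &= (2-\sqrt{2})\,X + (\sqrt{2}-1) \approx 0.58579\,X + 0.41421\\
\text{(2)}\quad y_0 &= \sqrt{10}\,X = 4 \times \tfrac{\sqrt{10}}{4}\,X \approx 4 \times 0.79057\,X\\
y_{k+1} &= \frac{1}{2}\left(y_k + \frac{X}{y_k}\right) = \frac{y_k}{2} + \frac{X/y_k}{2},
\qquad y_0 = X \;\Rightarrow\; y_1 = \frac{y_0}{2} + 0.5
\end{aligned}
```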
Section 13 Effects of DSP Instructions on Program Performance
The number of execution cycles required by each function program file is listed in tables 13.1 and  
13.2.  
The test conditions for table 13.1 were as follows: an E8000 (SH7612) emulator was used, the main program of each program file was allocated to XRAM, and the data was allocated to XRAM and YRAM.
The test conditions for table 13.2 were as follows: an SH-DSP simulator was used, the main program of each program file was allocated to XROM, and the data was allocated to XRAM and YRAM.
Table 13.1 Performance of Programs Employing DSP Instructions

Program Filename   Function                 No. of Execution Cycles   Notes
pmuls32.src        32-bit multiplication    116
tri_fun.src        Trigonometric function   62
matrix.src         Matrix operation         238                       3 × 3 matrix operation
in_pro.src         Inner product            15                        3-dimensional space vectors
rout.src           Square root              104
squ_ave.src        Square mean error        114                       n = 3 (3 components)
Table 13.2 Performance of Programs Employing DSP Instructions

Program Filename   Function                 No. of Execution Cycles   Notes
pmuls32.src        32-bit multiplication    172
tri_fun.src        Trigonometric function   80
matrix.src         Matrix operation         378                       3 × 3 matrix operation
in_pro.src         Inner product            21                        3-dimensional space vectors
rout.src           Square root              272
squ_ave.src        Square mean error        292                       n = 3 (3 components)
SH-DSP Software Application Note  
Publication Date: 1st Edition, September 1999  
Published by:  
Electronic Devices Sales & Marketing Group  
Semiconductor & Integrated Circuits  
Hitachi, Ltd.  
Edited by:  
Technical Documentation Group  
UL Media Co., Ltd.  
Copyright © Hitachi, Ltd., 1999. All rights reserved. Printed in Japan.  
