Computer technology has made incredible progress in the past half century. In 1945, there were no stored-program computers. Today, a few thousand dollars will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1965 for $1 million. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design. While technological improvements have been fairly steady, progress arising from better computer architectures has been much less consistent. During the first 25 years of electronic computers, both forces made a major contribution; but beginning in about 1970, computer designers became largely dependent upon integrated circuit technology. During the 1970s, performance continued to improve at about 25% to 30% per year for the mainframes and minicomputers that dominated the industry.
The late 1970s saw the emergence of the microprocessor. The ability of the microprocessor to ride the improvements in integrated circuit technology more closely than the less integrated mainframes and minicomputers led to a higher rate of improvement: roughly 35% growth per year in performance.
[Figure: instruction formats of the five architectures. The field layout (Op, Opx, Rd, Rs1, Rs2, and Const fields, with bit positions numbered from 31 down to 0) is not recoverable from the extraction. The surviving caption text follows; "…" marks text lost at the margin.]
… register fields are located in similar pieces of the instruction, be aware that the destination and two source fields … Here are the meanings of the abbreviations: Op = the main opcode, Opx = an opcode extension, Rd = the destination register, Rs1 = source register 1, Rs2 = source register 2, and Const = a constant (used as an immediate or as … Version 2.0 of PA-RISC will include a 16-bit add immediate format and a 17-bit address for calls. Note that our … DLX in Chapters 2 and 3 numbers bits from left to right, while this figure uses right-to-left numbering.
C.3 Instructions: The DLX Subset
The similarities of each architecture allow simultaneous descriptions, starting with the operations equivalent to DLX.
DLX Instructions
Almost every instruction found in DLX is found in the other architectures, as Figure C.5 shows. (For reference, definitions of the DLX instructions are found in Figure 2.25 of Chapter 2 and on the back inside cover.) Instructions are listed under four categories: data transfer; arithmetic, logical; control; and floating point. A fifth category in the figure shows conventions for register usage and pseudo-instructions on each architecture. If a DLX instruction requires a short sequence of instructions in other architectures, these instructions are separated by semicolons in Figure C.5. (To avoid confusion, the destination register will always be the leftmost operand in this appendix, independent of the notation normally used with each architecture.)
Every architecture must have a scheme for compare and conditional branch, but despite all the similarities, each of these architectures has found a different way to perform the operation.
Format: instruction category | DLX | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Branch: all | Sign | Sign | Sign | Sign | Sign
Jump/call: all | Sign | — | Sign | Sign | Sign
Register-immediate: data transfer | Sign | Sign | Sign | Sign | Sign
Register-immediate: arithmetic | Sign | Sign | Sign | Sign | Sign
Register-immediate: logical | Sign | Zero | — | Zero | Sign
FIGURE C.4 Summary of constant extension. The constants in the jump and call instructions of MIPS are not sign extended since they only replace the lower 28 bits of the PC, leaving the upper 4 bits unchanged (PA-RISC has no logical immediate instructions).
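The difference between the "Sign" and "Zero" entries in Figure C.4 can be illustrated with a short Python sketch. The helper names `sign_extend` and `zero_extend` and the 16-bit field width are assumptions chosen for illustration; they are not taken from any of the instruction sets.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1          # keep only the immediate field
    sign = 1 << (bits - 1)            # weight of the field's sign bit
    return (value ^ sign) - sign      # negative when the sign bit is set


def zero_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as an unsigned number."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    imm = 0xFFFF  # a 16-bit immediate with every bit set
    print(sign_extend(imm, 16))   # arithmetic-style immediate
    print(zero_extend(imm, 16))   # logical-style immediate on a "Zero" machine
```

The same 16-bit pattern therefore means two different constants depending on whether the instruction class extends with the sign bit or with zeros.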
Appendix C Survey of RISC Architectures
Instruction name | DLX | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Data transfer (instruction formats) | R–I | R–I | R–I, R–R | R–I, R–R | R–I, R–R
Load byte signed | LB | LB | LDB; EXTRS,8,31 | LBZ; EXTSB | LDSB
Load byte unsigned | LBU | LBU | LDB, LDBX, LDBS | LBZ | LDUB
Load half word signed | LH | LH | LDH; EXTRS,16,31 | LHA | LDSH
Load half word unsigned | LHU | LHU | LDH, LDHX, LDHS | LHZ | LDUH
Load word | LW | LW | LDW, LDWX, LDWS | LW | LD
Load SP float | LF | LWC1 | FLDWX, FLDWS | LFS | LDF
Load DP float | LD | LDC1 | FLDDX, FLDDS | LFD | LDDF
Store byte | SB | SB | STB, STBX, STBS | STB | STB
Store half word | SH | SH | STH, STHX, STHS | STH | STH
Store word | SW | SW | STW, STWX, STWS | STW | ST
Store SP float | SF | SWC1 | FSTWX, FSTWS | STFS | STF
Store DP float | SD | SDC1 | FSTDX, FSTDS | STFD | STDF
Read, write special registers | MOVS2I, MOVI2S | MF, MT_ | MFCTL, MTCTL | MFSPR, MF_, MTSPR, MT_ | RD, WR, RDPR, WRPR, LDXFSR, STXFSR
Move int. to FP reg. | MOVI2FP | MTC1 | STW; FLDWX | STW; LDFS | ST; LDF
Move FP to int. reg. | MOVFP2I | MFC1 | FSTWX; LDW | STFS; LW | STF; LD
Arithmetic, logical (instruction formats) | R–R, R–I | R–R, R–I | R–R, R–I | R–R, R–I | R–R, R–I
Add | ADDU, ADDUI | ADDU, ADDIU | ADDL, LDO, ADDI, UADDCM | ADD, ADDI | ADD
Add (trap if overflow) | ADD, ADDI | ADD, ADDI | ADDO, ADDIO | ADDO; MCRXR; BC | ADDcc; TVS
Sub | SUBU, SUBUI | SUBU | SUB, SUBI | SUBF | SUB
Sub (trap if overflow) | SUB, SUBI | SUB | SUBTO, SUBIO | SUBF/oe | SUBcc; TVS
Multiply | MULTU, MULTUI | MULT, MULTU | SHiADD; ...; (i=1,2,3) | MULLW, MULLI | MULX
Multiply (trap if ovf) | MULT, MULTI | — | SHiADDO; ...; | — | —
Divide | DIVU, DIVUI | DIV, DIVU | DS; ...; DS | DIVW | DIVX
Divide (trap if ovf) | DIV, DIVI | — | — | — | —
And | AND, ANDI | AND, ANDI | AND | AND, ANDI | AND
Or | OR, ORI | OR, ORI | OR | OR, ORI | OR
Xor | XOR, XORI | XOR, XORI | XOR | XOR, XORI | XOR
Figure continued below.
Instruction name | DLX | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Arithmetic (continued) (instruction formats) | R–I | R–I | R–I, R–R | R–I, R–R | R–I, R–R
Load high part register | LHI | LUI | LDIL | ADDIS | SETHI (B fmt.)
Shift left logical | SLL, SLLI | SLLV, SLL | ZDEP 31-i, 32-i | RLWINM | SLL
Shift right logical | SRL, SRLI | SRLV, SRL | EXTRU 31, 32-i | RLWINM 32-i | SRL
Shift right arithmetic | SRA, SRAI | SRAV, SRA | EXTRS 31, 32-i | SRAW | SRA
Compare | S_ (<, >, ≤, ≥, =, ≠) | SLT/I, SLT/IU | COMB | CMP(I)CLR | SUBcc r0,...
Control (instruction formats) | B, J/C | B, J/C | B, J/C | B, J/C | B, J/C
Branch on integer compare | BEQ, BNE | BEQ, BNE, B_Z (<, >, ≤, ≥) | COMB, COMIB | BC | BR_Z, BPcc (<, >, ≤, ≥, =, ≠)
Branch on floating-point compare | BFPT, BFPF | BC1T, BC1F | FSTWX f0; LDW t; BB t | BC | FBPfcc (<, >, ≤, ≥, =, ...)
Jump, jump register | J, JR | J, JR | BL r0, BLR r0 | B, BCLR, BCCTR | BA, JMPL r0,...
Call, call register | JAL, JALR | JAL, JALR | BL, BLE | BL, BLA, BCLRL, BCCTRL | CALL, JMPL
Trap | TRAP | BREAK | BREAK | TW, TWI | Ticc, SIR
Return from interrupt | RFE | JR; RFE | RFI, RFIR | RFI | DONE, RETRY, RETURN
Floating point (instruction formats) | R–R | R–R | R–R | R–R | R–R
Add single, double | ADDF, ADDD | ADD.S, ADD.D | FADD, FADD/dbl | FADDS, FADD | FADDS, FADDD
Sub single, double | SUBF, SUBD | SUB.S, SUB.D | FSUB, FSUB/dbl | FSUBS, FSUB | FSUBS, FSUBD
Mult single, double | MULF, MULD | MUL.S, MUL.D | FMPY, FMPY/dbl | FMULS, FMUL | FMULS, FMULD
Div single, double | DIVF, DIVD | DIV.S, DIV.D | FDIV, FDIV/dbl | FDIVS, FDIV | FDIVS, FDIVD
Compare | _F, _D (<, >, ≤, ≥, =, ...) | C_.S, C_.D (<, >, ≤, ≥, =, ...) | FCMP, FCMP/dbl () | FCMP | FCMPS, FCMPD
Move R–R | MOVF | MOV.S | FCPY | FMV | FMOVS/D/Q
Convert (single, double, integer) to (single, double, integer) | CVTF2D, CVTD2F, CVTF2I, CVTD2I, CVTI2F, CVTI2D | CVT.S.D, CVT.D.S, CVT.S.W, CVT.D.W, CVT.W.S, CVT.W.D | FCNVFF,s,d FCNVFF,d,s FCNVXF,s,s FCNVXF,d,d FCNVFX,s,s FCNVFX,d,s | —, FRSP, —, FCTIW, —, — | FSTOD, FDTOS, FSTOI, FDTOI, FITOS, FITOD
Figure continued below.
Instruction name | DLX | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Conventions
Register with value 0 | r0 | r0 | r0 | r0 (addressing) | r0
Return address reg. | r31 | r31 | r2, r31 | link (special) | r31
No-op | ADD r0,r0,r0 | SLL r0,r0,r0 | OR r0,r0,r0 | ORI r0,r0,#0 | SETHI r0,0
Move R–R integer | ADD ...,r0,... | ADD ...,r0,... | OR ...,r0,... | OR rx, ry, ry | OR ...,r0,...
Operand order | OP Rd,Rs1,Rs2 | OP Rd,Rs1,Rs2 | OP Rs1,Rs2,Rd | OP Rd,Rs1,Rs2 | OP Rs1,Rs2,Rd
FIGURE C.5 Instructions equivalent to DLX. Dashes mean the operation is not available in that architecture, or not synthesized in a few instructions. Such a sequence of instructions is shown separated by semicolons. If there are several choices of instructions equivalent to DLX, they are separated by commas. Note that in the “Arithmetic, logical” category all machines but SPARC use separate instruction mnemonics to indicate an immediate operand; SPARC offers immediate versions of these instructions but uses a single mnemonic. (Of course these are separate opcodes!)
Compare and Conditional Branch
SPARC uses the traditional four condition code bits stored in the program status word: negative, zero, carry, and overflow. They can be set on any arithmetic or logical instruction; unlike earlier architectures, this setting is optional on each instruction. An explicit option leads to fewer problems in pipelined implementation. Although condition codes can be set as a side effect of an operation, explicit compares are synthesized with a subtract using r0 as the destination. SPARC conditional branches test condition codes to determine all possible unsigned and signed relations. Floating point uses separate condition codes to encode the IEEE 754 conditions, requiring a floating-point compare instruction. Version 9 expanded SPARC branches in four ways: a separate set of condition codes for 64-bit operations; a branch that tests the contents of a register and branches if the value is =, ≠, <, ≤, >, or ≥ 0 (see MIPS below); three more sets of floating-point condition codes; and branch instructions that encode static branch prediction.
PowerPC also uses four condition codes: less than, greater than, equal, and summary overflow, but it has eight copies of them. This redundancy allows the PowerPC instructions to use different condition codes without conflict, essentially giving PowerPC eight extra 4-bit registers. Any of these eight condition codes can be the target of a compare instruction, and any can be the source of a conditional branch. The integer instructions have an option bit that behaves as if the integer op was followed by a compare to zero that sets the first condition “register.” PowerPC also lets the second “register” be optionally set by floating-point instructions. PowerPC provides logical operations among these eight 4-bit condition code registers (CRAND, CROR, CRXOR, CRNAND, CRNOR, CREQV), allowing more complex conditions to be tested by a single branch.
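As a rough illustration of how four condition code bits are enough to determine every signed and unsigned relation, the following Python sketch models a 32-bit subtract that sets negative, zero, overflow, and carry in the style described for SPARC. The function names are hypothetical, and the carry bit is assumed to mean "borrow" after a subtract.

```python
MASK32 = 0xFFFFFFFF


def subcc(a: int, b: int) -> dict:
    """Compute a - b on 32 bits and return the four condition code bits."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "n": result >> 31,                        # result is negative
        "z": int(result == 0),                    # result is zero
        # signed overflow: operands differ in sign and the result's sign
        # differs from the first operand's sign
        "v": ((a ^ b) & (a ^ result)) >> 31,
        "c": int(a < b),                          # borrow out (assumption)
    }


def signed_less(flags: dict) -> bool:
    """a < b for signed operands: negative XOR overflow."""
    return bool(flags["n"] ^ flags["v"])


def unsigned_less(flags: dict) -> bool:
    """a < b for unsigned operands: a borrow occurred."""
    return bool(flags["c"])


if __name__ == "__main__":
    flags = subcc(0xFFFFFFFF, 1)   # -1 versus 1 signed, 4294967295 versus 1 unsigned
    print(flags, signed_less(flags), unsigned_less(flags))
```

The same subtract result answers both the signed and the unsigned question; only the combination of bits tested by the branch differs.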
MIPS uses the contents of registers to evaluate conditional branches. Any two registers can be compared for equality (BEQ) or inequality (BNE) and then the branch is taken if the condition holds. The set-on-less-than instructions (SLT, SLTI, SLTU, SLTIU) compare two operands and then set the destination register to 1 if less and to 0 otherwise. These instructions are enough to synthesize the full set of relations. Because of the popularity of comparisons to 0, MIPS includes special compare-and-branch instructions for all such comparisons: greater than or equal to zero (BGEZ), greater than zero (BGTZ), less than or equal to zero (BLEZ), and less than zero (BLTZ). Of course, equal and not equal to zero can be synthesized using r0 with BEQ and BNE. Like SPARC, MIPS I uses a condition code for floating point with separate floating-point compare and branch instructions; MIPS IV expands this to eight floating-point condition codes, with the floating-point comparisons and branch instructions specifying the condition to set or test.
PA-RISC has many branch options, which we’ll see in section C.8. The most straightforward is a compare and branch instruction (COMB), which compares two registers, then branches depending on the standard relations, and tests the least-significant bit of the result of the comparison.
Figure C.6 summarizes the four schemes used for conditional branches.
 | DLX | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Number of condition code bits (integer and FP) | 1 FP | 8 FP | 1 FP | 8 × 4 both | 2 × 4 integer, 4 × 2 FP
Basic compare instructions (integer and FP) | 1 integer, 1 FP | 1 integer, 1 FP | 4 integer, 1 FP | 4 integer, 2 FP | 1 FP
Basic branch instructions (integer and FP) | 1 integer, 1 FP | 2 integer, 1 FP | 7 integer | 1 both | 3 integer, 1 FP
Compare register with register/const and branch | =, ≠ | =, ≠ | =, ≠, <, ≤, >, ≥, even, odd | — | —
Compare register to zero and branch | =, ≠ | =, ≠, <, ≤, >, ≥ | =, ≠, <, ≤, >, ≥, even, odd | — | =, ≠, <, ≤, >, ≥
FIGURE C.6 Summary of five approaches to conditional branches. Floating-point branch on PA-RISC is accomplished by copying the FP status register into an integer register and then using the branch on bit instruction to test the FP comparison bit. Integer compare on SPARC is synthesized with an arithmetic instruction that sets the condition codes using r0 as the destination. PA-RISC 2.0 will have eight floating-point condition code bits.
C.4 Instructions: Common Extensions to DLX
Figure C.7 lists instructions not found in Figure C.5 in the same four categories. Instructions are put in this list if they appear in more than one of the four architectures. The instructions are defined using the hardware description language, which is described on the page facing the inside back cover.
Name | Definition | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Data transfer
Atomic swap R/M (for semaphores) | Temp ← Rd; Rd ← Mem[x]; Mem[x] ← Temp | LL; SC | — (see C.8) | LWARX; STWCX | CASA, CASX
Load 64-bit integer | Rd ←64 Mem[x] | LD | (in 2.0) | LD | LDX
Store 64-bit int. | Mem[x] ←64 Rd | SD | (in 2.0) | STD | STX
Load 32-bit int. unsigned | Rd_32..63 ←32 Mem[x]; Rd_0..31 ←32 0 | LWU | (in 2.0) | LWZ | LDUW
Load 32-bit int. signed | Rd_32..63 ←32 Mem[x]; Rd_0..31 ←32 (Mem[x]_0)^32 | LW | (in 2.0) | LWA | LDSW
Prefetch | Cache[x] ← hint | PREF, PREFX | LDWX, LDWS, STWX, STWS | DCBT, DCBTST | PREFETCH
Load coprocessor | Coprocessor ← Mem[x] | LWCi | CLDWX, CLDWS | — | —
Store coprocessor | Mem[x] ← Coprocessor | SWCi | CSTWX, CSTWS | — | —
Endian | (Big/Little Endian?) | Either | Either | Either | Either
Cache flush | (Flush cache block at this address) | CP0op | FDC, FIC | DCBF | FLUSH
Shared memory synchronization | (All prior data transfers complete before next data transfers may start) | SYNC | SYNC | SYNC | MEMBAR
Arithmetic, logical
64-bit integer arithmetic ops | Rd ←64 Rs1 op64 Rs2 | DADD, DSUB, DMULT, DDIV | (in 2.0) | ADD, SUBF, MULLD, DIVD | ADD, SUB, MULX, S/UDIVX
64-bit integer logical ops | Rd ←64 Rs1 op64 Rs2 | AND, OR, XOR | (in 2.0) | AND, OR, XOR | AND, OR, XOR
64-bit shifts | Rd ←64 Rs1 op64 Rs2 | DSLL, DSRA, DSRL | (in 2.0) | SLD, SRAD, SRLD | SLLX, SRAX, SRLX
Conditional move | if (cond) Rd ← Rs | MOVN/Z | SUBc,n; ADD | — | MOVcc, MOVr
Support for multiword integer add | CarryOut,Rd ← Rs1 + Rs2 + OldCarryOut | ADDU; SLTU; ADDU | ADDC | ADDC, ADDE. | ADDcc
Support for multiword integer sub | CarryOut,Rd ← Rs1 – Rs2 + OldCarryOut | SUBU; SLTU; SUBU | SUBB | SUBFC, SUBFE. | SUBcc
And not | Rd ← Rs1 & ~(Rs2) | — | ANDCM | ANDC | ANDN
Or not | Rd ← Rs1 OR ~(Rs2) | — | — | ORC | ORN
Add high immediate | Rd_0..15 ← Rs1_0..15 + (Const<<16) | — | ADDIL (R–I) | ADDIS (R–I) | —
Coprocessor operations | (Defined by coprocessor) | COPi | COPR,i | — | IMPDEPi
Figure continued below.
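MIPS has no carry flag, so the multiword add row of Figure C.7 lists a three-instruction sequence in which SLTU recovers the carry. A minimal Python sketch of that idea follows, assuming 32-bit words and hypothetical helper names (`addu`, `sltu`, `add64`) that only mimic the behavior of the corresponding instructions; it handles one carry between two words, not a longer chain.

```python
MASK32 = 0xFFFFFFFF


def addu(a: int, b: int) -> int:
    """32-bit add that wraps around without trapping."""
    return (a + b) & MASK32


def sltu(a: int, b: int) -> int:
    """Set on less than, unsigned: 1 if a < b, else 0."""
    return int((a & MASK32) < (b & MASK32))


def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple:
    """Add two 64-bit values held as (high word, low word) pairs."""
    lo = addu(a_lo, b_lo)
    # A wrapped sum is smaller than either operand exactly when a carry occurred.
    carry = sltu(lo, a_lo)
    hi = addu(addu(a_hi, b_hi), carry)
    return hi, lo


if __name__ == "__main__":
    print(add64(0, 0xFFFFFFFF, 0, 1))   # low word wraps, carry moves to the high word
```

Architectures with a carry bit (ADDC, ADDE, ADDcc in the figure) get the same effect from the condition code instead of an explicit comparison.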
Name | Definition | MIPS IV | PA-RISC 1.1 | PowerPC | SPARC V9
Control
Optimized delayed branches | (Branch not always delayed) | BEQL, BNEL, B_ZL (<, >, ≤, ≥) | COMBT,n, COMBF,n | — | BPcc,A FPBcc,A
Conditional trap | if (COND) {R31 ← PC; PC ← 0..0#i} | T_, T_I (=, ≠, <, >, ≤, ≥) | SUBc,n; BREAK | TW, TD, TWI, TDI | Tcc
No. control regs. | Misc. regs (virtual memory, interrupts,...) | ≈ 12 | 32 | 33 | 29
Floating point
Multiply & Add | Fd ← (Fs1 × Fs2) + Fs3 | MADD.S/D | — (see C.8) | FMADD/S |
Multiply & Sub | Fd ← (Fs1 × Fs2) – Fs3 | MSUB.S/D | — (see C.8) | FMSUB/S |
Neg Mult & Add | Fd ← –((Fs1 × Fs2) + Fs3) | NMADD.S/D | | FNMADD/S |
Neg Mult & Sub | Fd ← –((Fs1 × Fs2) – Fs3) | NMSUB.S/D | | FNMSUB/S |
Square Root | Fd ← SQRT(Fs) | SQRT.S/D | FSQRTsgl/dbl | FSQRT/S | FSQRTS/D
Conditional Move | if (cond) Fd ← Fs | MOVF/T, MOVF/T.S/D | FTEST; FCPY | — | FMOVcc
Negate | Fd ← Fs ^ x80000000 | NEG.S/D | (in 2.0) | FNEG | FNEGS/D/Q
Absolute value | Fd ← Fs & x7FFFFFFF | ABS.S/D | FABS/dbl | FABS | FABSS/D/Q
FIGURE C.7 Instructions not found in DLX but found in two or more of the four architectures.
Although most of the categories are self-explanatory, a few bear comment:
• The “atomic swap” row means a primitive that can exchange a register with memory without interruption. This is useful for operating system semaphores in a uniprocessor as well as for multiprocessor synchronization (see section 8.5 of Chapter 8).
• The 64-bit data transfer and operation rows show how MIPS, PowerPC, and SPARC define 64-bit addressing and integer operations. SPARC simply defines all register and addressing operations to be 64 bits, adding only special instructions for 64-bit shifts, data transfers, and branches. MIPS includes the same extensions, plus it adds separate 64-bit signed arithmetic instructions. PowerPC added 64-bit right shift, load, store, divide, and compare and has a separate mode determining whether instructions are interpreted as 32- or 64-bit operations; 64-bit operations will not work in a machine that only supports 32-bit mode. PA-RISC is expanded to 64-bit addressing and operations in version 2.0.
• The “prefetch” instruction supplies an address and hint to the implementation about the data. Hints include that the data is likely to be read or written soon, likely to be read or written only once, or likely to be read or written many times. Prefetch does not cause exceptions. MIPS has a version that adds two registers to get the address for floating-point programs, unlike non-floating-point MIPS programs. (See pages 412–414 in Chapter 5 to learn more about prefetching.)
• In the “Endian” row, “Big or Little” means there is a bit in the program status register that allows the processor to act either as Big Endian or Little Endian (see page 73 in Chapter 2). This can be accomplished by simply complementing some of the least-significant bits of the address in data transfer instructions.
• The “shared memory synchronization” helps with cache-coherent multiprocessors: All loads and stores executed before the instruction must complete before loads and stores after it can start. (See section 8.5 of Chapter 8.)
• The “coprocessor operations” row lists several categories that allow for the processor to be extended with special-purpose hardware.
One difference that needs a longer explanation is the optimized branches. Figure C.8 shows the options. The PowerPC offers branches that take effect immediately, like branches on earlier architectures. This avoids executing NOPs when there is no instruction to fill the delay slot; all the rest offer delayed branches. The other three provide a version of delayed branch that makes it easier to fill the delay slot. The SPARC “annulling” branch executes the instruction in the delay slot only if the branch is taken; otherwise the instruction is annulled. This means the instruction at the target of the branch can safely be copied into the delay slot since it will only be executed if the branch is taken. The restrictions are that the target is not another branch and that the target is known at compile time. (SPARC also offers a nondelayed jump because an unconditional branch with the annul bit set does not execute the following instruction.) Recent versions of the MIPS architecture have added a branch likely instruction that also annuls the following instruction if the branch is not taken. PA-RISC allows almost any instruction to annul the next instruction, including branches. Its “nullifying” branch option will execute the next instruction depending on the direction of the branch and whether it is taken (i.e., if a forward branch is not taken or a backward branch is taken). Presumably this choice was made to optimize loops, allowing the instructions following the exit branch and the looping branch to execute in the common case.
Now that we have covered the similarities, we will focus on the unique features of each architecture, ordering them by length of description of the unique features from shortest to longest.
 | (Plain) Branch | Delayed branch | Annulling delayed branch | Annulling delayed branch
Found in architectures | PowerPC | DLX, MIPS, PA-RISC, SPARC | MIPS, SPARC | PA-RISC
Execute following instruction | Only if branch not taken | Always | Only if branch taken | If forward branch not taken or backward branch taken
FIGURE C.8 When the instruction following the branch is executed for three types of branches.
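The rules in Figure C.8 can be restated as a small decision function. This is only a sketch of the table, not of any real pipeline; the `kind` labels are names invented here for the four columns.

```python
def executes_following(kind: str, taken: bool, backward: bool = False) -> bool:
    """Return True if the instruction after the branch is executed.

    kind is one of the illustrative labels:
      "plain"          - PowerPC-style branch with no delay slot
      "delayed"        - ordinary delayed branch (DLX, MIPS, PA-RISC, SPARC)
      "annulling"      - MIPS branch likely / SPARC annulling branch
      "pa_risc_annul"  - PA-RISC nullifying branch
    """
    if kind == "plain":
        return not taken                  # simply falls through when not taken
    if kind == "delayed":
        return True                       # delay slot always executes
    if kind == "annulling":
        return taken                      # slot annulled when branch not taken
    if kind == "pa_risc_annul":
        # executes for a backward branch that is taken (looping case)
        # or a forward branch that is not taken (fall-through case)
        return taken if backward else not taken
    raise ValueError(f"unknown branch kind: {kind}")


if __name__ == "__main__":
    for kind in ("plain", "delayed", "annulling", "pa_risc_annul"):
        print(kind, executes_following(kind, taken=True, backward=True))
```

The PA-RISC rule is the only one that depends on branch direction, which matches the text's remark that it favors the common case for loops.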
C.5 Instructions Unique to MIPS
MIPS has gone through four generations of instruction set evolution, and this evolution has generally added features found in other architectures. Here are the salient unique features of MIPS, the first several of which were found in the original instruction set.
Nonaligned Data Transfers
MIPS has special instructions to handle misaligned words in memory. Misalignment is a rare event in most programs, but the instructions are included for COBOL programs, where the programmer can force misalignment by declarations. Although most RISCs trap if you try to load a word or store a word to a misaligned address, on all architectures misaligned words can be accessed without traps by using four load byte instructions and then assembling the result using shifts and logical ORs. The MIPS load and store word left and right instructions (LWL, LWR, SWL, SWR) allow this to be done in just two instructions: LWL loads the left portion of the register and LWR loads the right portion of the register. SWL and SWR do the corresponding stores. Figure C.9 shows how they work. There are also 64-bit versions of these instructions.
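The following Python sketch models the Big Endian behavior of LWL and LWR as described for Figure C.9. The functions `lwl`, `lwr`, and `load_unaligned_word` are illustrative models that take and return register values; memory is assumed to be a `bytes` object indexed by byte address.

```python
MASK32 = 0xFFFFFFFF


def lwl(reg: int, mem: bytes, addr: int) -> int:
    """Load word left (Big Endian): bytes from addr to the end of its aligned
    word go into the most-significant bytes of reg; the rest is unchanged."""
    n = 4 - (addr % 4)                           # bytes available in this word
    chunk = int.from_bytes(mem[addr:addr + n], "big")
    shift = 8 * (4 - n)
    keep = (1 << shift) - 1                      # low bytes left untouched
    return ((chunk << shift) | (reg & keep)) & MASK32


def lwr(reg: int, mem: bytes, addr: int) -> int:
    """Load word right (Big Endian): bytes from the start of the aligned word
    up to addr go into the least-significant bytes of reg."""
    n = (addr % 4) + 1                           # bytes up to and including addr
    start = addr - (addr % 4)
    chunk = int.from_bytes(mem[start:addr + 1], "big")
    keep = MASK32 & ~((1 << (8 * n)) - 1)        # high bytes left untouched
    return (reg & keep) | chunk


def load_unaligned_word(mem: bytes, addr: int) -> int:
    """Two-instruction unaligned load: LWL at addr, then LWR at addr + 3."""
    return lwr(lwl(0, mem, addr), mem, addr + 3)


if __name__ == "__main__":
    memory = bytes(range(256))                   # byte at address i holds value i
    print(hex(load_unaligned_word(memory, 101)))  # Case 1: bytes 101..104
    print(hex(load_unaligned_word(memory, 203)))  # Case 2: bytes 203..206
```

With the test memory used here each byte's value equals its address, so the result makes it easy to see which bytes each instruction contributed.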
TLB Instructions
TLB misses are handled in software in MIPS, so the instruction set also has instructions for manipulating the registers of the TLB (see pages 455–456 in Chapter 5 for more on TLBs). These registers are considered part of the “system coprocessor” and thus can be accessed by the instructions that move between coprocessor registers and integer registers. The contents of a TLB entry are read by loading via read indexed TLB entry (TLBR) and written using either write indexed TLB entry (TLBWI) or write random TLB entry (TLBWR). The TLB contents are searched using probe TLB for matching entry (TLBP).
Remaining Instructions
Below is a list of the remaining unique details of the MIPS architecture:
• NOR: This logical instruction calculates ~(Rs1 | Rs2).
• Constant shift amount: Non-variable shifts use the 5-bit constant field shown in the register-register format in Figure C.3.
• SYSCALL: This special trap instruction is used to invoke the operating system.
• Move to/from control registers: CTCi and CFCi move between the integer registers and control registers.
FIGURE C.9 MIPS instructions for unaligned word reads. This figure assumes operation in Big Endian mode. Case 1 first loads the 3 bytes 101, 102, and 103 into the left of R2, leaving the least-significant byte undisturbed. The following LWR simply loads byte 104 into the least-significant byte of R2, leaving the other bytes of the register unchanged using LWL. Case 2 first loads byte 203 into the most-significant byte of R4, and the following LWR loads the other 3 bytes of R4 from memory bytes 204, 205, and 206. LWL reads the word with the first byte from memory, shifts to the left to discard the unneeded byte(s), and changes only those bytes in Rd. The byte(s) transferred are from the first byte until the lowest-order byte of the word. The following LWR addresses the last byte, right shifts to discard the unneeded byte(s), and finally changes only those bytes of Rd. The byte(s) transferred are from the last byte up to the highest-order byte of the word. Store word left (SWL) is simply the inverse of LWL, and store word right (SWR) is the inverse of LWR. Changing to Little Endian mode flips which bytes are selected and discarded. (If big-little, left-right, load-store seem confusing, don’t worry, it works!)
[Figure C.9 diagram: memory words at byte addresses 100–107 and 200–207, with the contents of R2 before and after LWL R2, 101 and LWR R2, 104 (Case 1) and the contents of R4 before and after LWL R4, 203 and LWR R4, 206 (Case 2). The individual byte values are not recoverable from the extraction.]
• Jump/call not PC-relative: The 26-bit address of jumps and calls is not added to the PC. It is shifted left 2 bits and replaces the lower 28 bits of the PC. This would only make a difference if the program were located near a 256-MB boundary.
• Load linked/store conditional: This pair of instructions gives MIPS atomic operations for semaphores, allowing data to be read from memory, modified, and stored without fear of interrupts or other machines accessing the data in a multiprocessor (see section 8.5 of Chapter 8). There are both 32- and 64-bit versions of these instructions.
• Reciprocal and reciprocal square root: These instructions, which do not follow IEEE 754 guidelines of proper rounding, are included apparently for applications that value speed of divide and square root more than they value accuracy.
• Conditional procedure call instructions: BGEZAL saves the return address and branches if the content of Rs1 is greater than or equal to zero, and BLTZAL does the same for less than zero. The purpose of these instructions is to get a PC-relative call. (There are “likely” versions of these instructions as well.)
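The jump/call address rule in the list above amounts to a few bit operations. The sketch below is an illustration under the assumption that `pc` is whatever program counter value supplies the upper 4 bits; the function name is hypothetical.

```python
def mips_jump_target(pc: int, const26: int) -> int:
    """Form a jump target: keep the upper 4 bits of the PC and replace the
    lower 28 bits with the 26-bit constant shifted left by 2."""
    upper = pc & 0xF0000000                    # upper 4 bits stay unchanged
    lower = (const26 & 0x03FFFFFF) << 2        # 26-bit field becomes 28 bits
    return upper | lower


if __name__ == "__main__":
    print(hex(mips_jump_target(0x10000004, 0x100)))
    # The reachable region is 2**28 bytes = 256 MB, which is why the rule only
    # matters for code near a 256-MB boundary.
    print(2**28 // (1024 * 1024), "MB")
```

Because the upper bits come from the PC rather than from the instruction, a jump cannot leave the current 256-MB region.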
There is no specific provision in the MIPS architecture for floating-point execution to proceed in parallel with integer execution, but the MIPS implementations of floating point allow this to happen by checking to see if arithmetic interrupts are possible early in the cycle (see Appendix A). Normally interrupts are not possible when integer and floating point operate in parallel.
C.6 Instructions Unique to SPARC
Several features are unique to SPARC.
Register Windows
The primary unique feature of SPARC is register windows, an optimization for reducing register traffic on procedure calls. Several banks of registers are used, with a new one allocated on each procedure call. Although this could limit the depth of procedure calls, the limitation is avoided by operating the banks as a circular buffer, providing unlimited depth. The knee of the cost-performance curve seems to be six to eight banks.
SPARC can have between two and 32 windows, typically using eight registers each for the globals, locals, incoming parameters, and outgoing parameters. (Given each window has 16 unique registers, an implementation of SPARC can have as few as 40 physical registers and as many as 520, although most have 128 to 136, so far.) Rather than tie window changes with call and return instructions, SPARC has the separate instructions SAVE and RESTORE. SAVE is used to “save” the caller’s window by pointing to the next window of registers in addition to performing an add instruction. The trick is that the source registers of the addition operation are from the caller’s window, while the destination register is in the callee’s window. SPARC compilers typically use this instruction for changing the stack pointer to allocate local variables in a new stack frame. RESTORE is the inverse of SAVE, bringing back the caller’s window while acting as an add instruction, with the source registers from the callee’s window and the destination register in the caller’s window. This automatically deallocates the stack frame. Compilers can also make use of it for generating the callee’s final return value.
The danger of register windows is that the larger number of registers could slow down the clock rate. This was not the case for early implementations. The SPARC architecture (with register windows) and the MIPS R2000 architecture (without) have been built in several technologies since 1987. For several generations the SPARC clock rate has not been slower than the MIPS clock rate for implementations in similar technologies, probably because cache-access times dominate register-access times in these implementations. The current generation machines took different implementation strategies (superscalar vs. superpipelining), and it’s unlikely that the number of registers by themselves determined the clock rate in either machine.
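The register counts quoted for register windows follow from simple arithmetic: 8 globals plus 16 unique registers per window. The Python sketch below reproduces those counts and shows one possible mapping from (window, register number) to a physical register in which a caller's outgoing registers overlap the callee's incoming ones. The window numbering and the assumption that a call moves to the next lower window number are illustrative choices, not taken from the text.

```python
GLOBALS = 8
UNIQUE_PER_WINDOW = 16   # 8 outgoing + 8 local; incoming registers are shared


def physical_registers(windows: int) -> int:
    """Total physical registers for an implementation with `windows` windows."""
    if not 2 <= windows <= 32:
        raise ValueError("SPARC allows between 2 and 32 windows")
    return GLOBALS + UNIQUE_PER_WINDOW * windows


def physical_index(window: int, reg: int, windows: int) -> int:
    """Map architectural register `reg` (0..31) in `window` to a physical index.

    r0-r7 are globals, r8-r15 outgoing, r16-r23 local, r24-r31 incoming.
    The banks wrap around, forming a circular buffer.
    """
    if reg < GLOBALS:
        return reg
    offset = (window * UNIQUE_PER_WINDOW + (reg - 8)) % (UNIQUE_PER_WINDOW * windows)
    return GLOBALS + offset


if __name__ == "__main__":
    print(physical_registers(2), physical_registers(32))
    # Caller in window 3 writes outgoing register r8; the callee, assumed to
    # run in window 2, sees the same physical register as incoming r24.
    print(physical_index(3, 8, 8) == physical_index(2, 24, 8))
```

The modulo in `physical_index` is what makes the banks behave as a circular buffer: the incoming registers of the last window wrap around onto the outgoing registers of the first.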
Another data transfer feature is the alternate space option for loads and stores. This simply allows the memory system to identify memory accesses to input/output devices, or to control registers for devices such as the cache and memory-management unit.
Fast Traps
Version 9 SPARC includes support to make traps fast. It expands the single level
of traps to at least four levels, allowing the window overflow and underflow trap
handlers to be interrupted. The extra levels mean the handler does not need to
check for page faults or misaligned stack pointers explicitly in the code, thereby
making the handler faster. Two new instructions were added to return from this
multilevel handler: RETRY (which retries the interrupted instruction) and DONE
(which does not). To support user-level traps, the instruction RETURN will return
from the trap in nonprivileged mode.
Support for LISP and Smalltalk
The primary remaining arithmetic feature is tagged addition and subtraction. The designers of SPARC spent some time thinking about languages like LISP and Smalltalk, and this influenced some of the features of SPARC already discussed: register windows, conditional trap instructions, calls with 32-bit instruction addresses, and multiword arithmetic (see Taylor et al. [1986] and Ungar et al. [1984]). A small amount of support is offered for tagged data types with operations for addition, subtraction, and hence comparison. The two least-significant bits indicate whether the operand is an integer (coded as 00), so TADDcc and TSUBcc set the overflow bit if either operand is not tagged as an integer or if the result is too large. A subsequent conditional branch or trap instruction can decide what to do. (If the operands are not integers, software recovers the operands, checks the types of the operands, and invokes the correct operation based on those types.) It turns out that the misaligned memory access trap can also be put to use for tagged data, since loading from a pointer with the wrong tag can be an invalid access. Figure C.10 shows both types of tag support.
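A minimal Python sketch of the two kinds of tag support follows. It assumes 32-bit words, integers stored with 00 in the two least-significant bits, and pointers tagged 11 as in Figure C.10; the function names `taddcc` and `tagged_load_address` are illustrative models, not real instruction semantics in full.

```python
MASK32 = 0xFFFFFFFF


def taddcc(a: int, b: int) -> tuple:
    """Tagged add: return (result, overflow). Overflow is set if either
    operand is not tagged as an integer (low two bits nonzero) or if the
    32-bit signed addition overflows."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    tag_violation = ((a | b) & 0b11) != 0
    # signed overflow: operands share a sign that differs from the result's
    arithmetic_overflow = (~(a ^ b) & (a ^ result) & 0x80000000) != 0
    return result, tag_violation or arithmetic_overflow


def tagged_load_address(pointer: int, offset: int) -> int:
    """Compute pointer + offset and raise if the result is not word aligned,
    standing in for the misaligned memory access trap."""
    address = (pointer + offset) & MASK32
    if address % 4 != 0:
        raise ValueError("misaligned access: operand was not a valid pointer")
    return address


if __name__ == "__main__":
    five, seven = 5 << 2, 7 << 2                    # integers carry tag 00
    print(taddcc(five, seven))                      # tagged 12, no overflow
    pair = 0x1000 | 0b11                            # pointer carries tag 11
    print(hex(tagged_load_address(pair, -3)))       # even word of the pair
    print(hex(tagged_load_address(pair, +1)))       # odd word of the pair
```

The offsets -3 and +1 cancel the 11 tag, so a correctly tagged pointer yields an aligned address while an integer used as a pointer does not.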
Overlapped Integer and Floating-Point Operations
SPARC allows floating-point instructions to overlap execution with integer instructions. To recover from an interrupt during such a situation, SPARC has a queue of pending floating-point instructions and their addresses. RDPR allows the processor to empty the queue. The second floating-point feature is the inclusion of floating-point square root instructions FSQRTS, FSQRTD, and FSQRTQ.
Remaining Instructions
The remaining unique features of SPARC are
• JMPL uses Rd to specify the return address register, so specifying r31 makes it similar to JALR in DLX and specifying r0 makes it like JR.
• LDSTUB loads the value of the byte into Rd and then stores FF₁₆ into the addressed byte. This version 8 instruction can be used to implement a semaphore.
• CASA (CASXA) atomically compares a value in a processor register to a 32-bit (64-bit) value in memory; if and only if they are equal, it swaps the value in memory with the value in a second processor register. This version 9 instruction can be used to construct wait-free synchronization algorithms that do not require the use of locks.
• XNOR calculates the exclusive or with the complement of the second operand.
FIGURE C.10 SPARC uses the two least-significant bits to encode different data
types for the tagged arithmetic instructions. (a) Integer arithmetic, which takes a single
cycle as long as the operands and the result are integers. (b) The misaligned trap can be
used to catch invalid memory accesses, such as trying to use an integer as a pointer. For
languages with paired data like LISP, an offset of –3 can be used to access the even word of
a pair (CAR) and +1 can be used for the odd word of a pair (CDR).
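[Figure C.10 artwork: (a) Add, sub, or compare integers (coded as 00):
TADDcc r7, r5, r6 operates on R5 and R6, each tagged 00, and produces R7, also
tagged 00. (b) Loading via valid pointer (coded as 11): LD rD, r4, -3 subtracts 3
from R4, whose tag is 11, to form a word address ending in 00.]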
■ BPcc, BPr, and FBPcc include a branch prediction bit so that the compiler can
give hints to the machine about whether a branch is likely to be taken or not.
■ ILLTRAP causes an illegal instruction trap. Muchnick [1988] explains how this
is used for proper execution of aggregate-returning procedures in C.
■ POPC counts the number of bits set to one in an operand.
■ Non-faulting loads allow compilers to move load instructions ahead of
conditional control structures that control their use. Hence, non-faulting loads
will be executed speculatively.
■ Quadruple-precision floating-point arithmetic and data transfer allow the
floating-point registers to act as eight 128-bit registers for floating-point
operations and data transfers.
■ Multiple-precision floating-point results for multiply mean that two
single-precision operands can result in a double-precision product and two
double-precision operands can result in a quadruple-precision product. These
instructions can be useful in complex arithmetic and some models of
floating-point calculations.
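The two synchronization primitives in this list, LDSTUB and CASA, can be sketched as follows. This is a single-threaded Python model, not SPARC code: the Memory class and the function names are assumptions for illustration, and each method stands for one instruction that real hardware would execute atomically. The retry loop shown is the simple lock-free pattern; building a strictly wait-free algorithm takes more care.

```python
class Memory:
    """Single-threaded model; each method stands for one atomic instruction."""

    def __init__(self):
        self.cells = {}

    def ldstub(self, addr):
        """Return the old byte and store 0xFF (models LDSTUB)."""
        old = self.cells.get(addr, 0)
        self.cells[addr] = 0xFF
        return old

    def cas(self, addr, expected, new):
        """Swap in `new` only if memory equals `expected`; return old value."""
        old = self.cells.get(addr, 0)
        if old == expected:
            self.cells[addr] = new
        return old


def try_acquire(mem, lock_addr):
    """Semaphore attempt: the lock was free if the old byte was zero."""
    return mem.ldstub(lock_addr) == 0


def release(mem, lock_addr):
    """Free the lock by writing zero back."""
    mem.cells[lock_addr] = 0


def lock_free_increment(mem, addr):
    """Retry loop built on compare-and-swap; no lock is held."""
    while True:
        seen = mem.cells.get(addr, 0)
        if mem.cas(addr, seen, seen + 1) == seen:
            return seen + 1
```

A second try_acquire on the same address fails until release is called, which is the behavior a semaphore needs.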
C.7 Instructions Unique to PowerPC
PowerPC is the result of several generations of IBM commercial RISC machines:
IBM RT/PC, IBM Power-1, and IBM Power-2.
Branch Registers: Link and Counter
Rather than dedicate one of the 32 general-purpose registers to save the return
address on procedure call, PowerPC puts the address into a special register called
the link register. Since many procedures will return without calling another
procedure, link doesn’t always have to be saved away. Making the return address a
special register makes the return jump faster since the hardware need not go
through the register read pipeline stage for return jumps.
In a similar vein, PowerPC has a count register to be used in for loops where
the program iterates for a fixed number of times. By using a special register the
branch hardware can determine quickly whether a branch based on the count
register is likely to branch, since the value of the register is known early in the
execution cycle. Tests of the value of the count register in a branch instruction
will automatically decrement the count register.
Given that the count register and link register are already located with the
hardware that controls branches, and that one of the problems in branch
prediction is getting the target address early in the pipeline (see Chapter 3,
section 3.5), the PowerPC architects decided to make a second use of these
registers. Either register can hold a target address of a conditional branch. Thus
PowerPC supplements its basic conditional branch with two instructions that get
the target address from these registers (BCLR, BCCTR).
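The decrement-and-test behavior of the count register can be pictured with a short model. The Python sketch below is only an illustration under assumed names (counted_loop, body); it models a loop-closing branch that decrements the count register and falls through when the register reaches zero, and it assumes the loop runs at least once.

```python
def counted_loop(n, body):
    """Run body n times the way a count-register loop would (n >= 1)."""
    assert n >= 1
    ctr = n  # models loading the count register before the loop
    iterations = 0
    while True:
        body(iterations)
        iterations += 1
        ctr -= 1      # the branch's test decrements the count register...
        if ctr == 0:  # ...and the loop falls through once it reaches zero
            break
    return iterations
```

Because the loop count lives in one dedicated register that only the branch modifies, the branch hardware can know the outcome early, which is the point made in the text.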
Remaining Instructions
Unlike the other RISC machines, PowerPC does not hardwire register 0 to the
value 0. Register 0 cannot be used as a base register, but in base+index
addressing it can be used as the index. The other unique features of the PowerPC
are
■ Load multiple and store multiple save or restore up to 32 registers in a single
instruction.
■ LSW and STSW permit fetching and storing of fixed and variable-length strings
that have arbitrary alignment.
■ Rotate with mask instructions support bit field extraction and insertion. One
version rotates the data and then performs logical AND with a mask of ones,
thereby extracting a field. The other version rotates the data but only places the
bits into the destination register where there is a corresponding 1 bit in the
mask, thereby inserting a field.
■ Algebraic right shift sets the carry bit (CA) if the operand is negative and any
one bits are shifted out. Thus a signed divide by any constant power of two that
rounds toward zero can be accomplished with a SRAWI followed by ADDZE,
which adds CA to the register.
■ CNTLZ will count leading zeros.
■ SUBFIC computes (immediate – RA), which can be used to develop a one’s or
two’s complement.
■ Logical shifted immediate instructions shift the 16-bit immediate to the left 16
bits before performing AND, OR, or XOR.
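Two items in this list lend themselves to a worked example: the rotate with mask pair and the SRAWI/ADDZE divide. The Python sketch below models them on 32-bit values. The function names are assumptions chosen for this illustration (they are not PowerPC mnemonics), and the carry bit CA is returned as an ordinary value.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= MASK32
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotate_and_mask(src, n, mask):
    """Extract a field: rotate, then AND with a mask of ones."""
    return rotl32(src, n) & mask


def rotate_and_insert(dest, src, n, mask):
    """Insert a field: rotated bits replace dest only where mask is 1."""
    return (rotl32(src, n) & mask) | (dest & ~mask & MASK32)


def shift_right_algebraic(x, k):
    """Model SRAWI: return (shifted value, CA)."""
    signed = to_signed(x)
    shifted_out = x & ((1 << k) - 1)
    ca = 1 if signed < 0 and shifted_out != 0 else 0
    return (signed >> k) & MASK32, ca


def divide_by_power_of_two(x, k):
    """Signed x / 2**k rounding toward zero: shift, then add CA (ADDZE)."""
    quotient, ca = shift_right_algebraic(x & MASK32, k)
    return to_signed(quotient + ca)
```

For example, dividing -7 by 2 shifts to -4 with CA set, and adding CA gives -3, the quotient rounded toward zero.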
C.8 Instructions Unique to PA-RISC
PA-RISC was expanded slightly in 1990 with version 1.1 and changed
significantly in 2.0 with 64-bit extensions that will be in systems shipped in
1996. PA-RISC perhaps has the most unusual features of any commercial RISC
machine. For example, it has the most addressing modes, instruction formats,
and, as we shall see, several instructions that are really the combination of two
simpler instructions.
Nullification
As shown in Figure C.8 on page C-12, several RISC machines can choose to not
execute the instruction following a delayed branch, in order to improve utilization
of the branch slot. This is called nullification in PA-RISC, and it has been
generalized to apply to any arithmetic-logical instruction as well as to all
branches. Thus an add instruction can add two operands, store the sum, and cause
the following instruction to be skipped if the sum is zero. Like conditional move
instructions, nullification allows PA-RISC to avoid branches in cases where there
is just one instruction in the then part of an if statement.
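A toy interpreter makes the effect of nullification concrete. The Python sketch below is an assumption-laden illustration, not PA-RISC syntax: instructions are tuples, the operation names are invented, and the nullify condition is passed as a function applied to the result.

```python
def run(program, regs):
    """Execute (op, dst, src1, src2, nullify_if) tuples in order."""
    skip_next = False
    for op, dst, src1, src2, nullify_if in program:
        if skip_next:
            skip_next = False  # the nullified instruction has no effect
            continue
        if op == "add":
            result = regs[src1] + regs[src2]
        elif op == "copy":
            result = regs[src1]
        else:
            raise ValueError(f"unknown op {op!r}")
        regs[dst] = result
        # A nullifying instruction skips its successor when the test holds.
        skip_next = nullify_if is not None and nullify_if(result)
    return regs


def is_zero(value):
    return value == 0


# if (a + b != 0) x = y;   expressed without a branch
PROGRAM = [
    ("add", "t", "a", "b", is_zero),
    ("copy", "x", "y", None, None),
]
```

The add stores its sum and, when the sum is zero, nullifies the copy that forms the one-instruction then part.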
A Cornucopia of Conditional Branches
Given nullification, PA-RISC did not need to have separate conditional branch
instructions. The inventors could have recommended that nullifying instructions
precede unconditional branches, thereby simplifying the instruction set. Instead,
PA-RISC has the largest number of conditional branches of any RISC machine.
Figure C.11 shows the conditional branches of PA-RISC. As you can see, several
are really combinations of two instructions.
Name    Instruction                 Notation
COMB    Compare and branch          if (cond(Rs1,Rs2)) {PC ← PC + offset12}
COMIB   Compare imm. and branch     if (cond(imm5,Rs2)) {PC ← PC + offset12}
MOVB    Move and branch             Rs2 ← Rs1, if (cond(Rs1,0)) {PC ← PC + offset12}
MOVIB   Move immediate and branch   Rs2 ← imm5, if (cond(imm5,0)) {PC ← PC + offset12}
ADDB    Add and branch              Rs2 ← Rs1 + Rs2, if (cond(Rs1 + Rs2,0)) {PC ← PC + offset12}
ADDIB   Add imm. and branch         Rs2 ← imm5 + Rs2, if (cond(imm5 + Rs2,0)) {PC ← PC + offset12}
BB      Branch on bit               if (cond(Rs_p,0)) {PC ← PC + offset12}
BVB     Branch on variable bit      if (cond(Rs_sar,0)) {PC ← PC + offset12}
FIGURE C.11 The PA-RISC conditional branch instructions. The 12-bit offset is
called offset12 in this table, and the 5-bit immediate is called imm5. The 16
conditions are =, <, ≤, odd, signed overflow, unsigned no overflow, zero or no
overflow unsigned, never, and their respective complements. The BB instruction
selects one of the 32 bits of the register and branches depending on whether its
value is 0 or 1. The BVB selects the bit to branch using the shift amount register,
a special-purpose register. The subscript notation specifies a bit field.
Synthesized Multiply and Divide
PA-RISC provides several primitives so that multiply and divide can be
synthesized in software. Instructions that shift one operand 1, 2, or 3 bits and
then add, trapping or not on overflow, are useful in multiplies. Divide step
performs the critical step of nonrestoring divide, adding or subtracting depending
on the sign of the prior result. Magenheimer et al. [1988] measured the size of
operands in multiplies and divides to show how well the multiply step would work.
Using these data for C programs, Muchnick [1988] found that by making special
cases the average multiply by a constant takes 6 clock cycles and multiply of
variables takes 24 clock cycles. PA-RISC has 10 instructions for these operations.
The original SPARC architecture used similar optimizations, but with an
increasing number of transistors the instruction set was expanded to include full
multiply and divide operations. PA-RISC gives some support along these lines by
putting a full 32-bit integer multiply in the floating-point unit; however, the
integer data must first be moved to floating-point registers.
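Both software techniques can be sketched briefly. The Python below is a generic illustration under assumed names, not a transcription of PA-RISC instructions: shift_add models a shift by 1, 2, or 3 bits followed by an add, and nonrestoring_divide is the textbook unsigned algorithm whose inner step adds or subtracts depending on the sign of the prior partial remainder.

```python
MASK32 = 0xFFFFFFFF


def shift_add(x, shift, y):
    """Shift x left by 1, 2, or 3 bits, then add y (32-bit wraparound)."""
    assert shift in (1, 2, 3)
    return ((x << shift) + y) & MASK32


def times_10(x):
    five_x = shift_add(x, 2, x)     # 4x + x
    return shift_add(five_x, 1, 0)  # 2 * 5x


def times_25(x):
    five_x = shift_add(x, 2, x)          # 4x + x
    return shift_add(five_x, 2, five_x)  # 4 * 5x + 5x


def nonrestoring_divide(dividend, divisor, bits=32):
    """Unsigned division, one quotient bit per step."""
    remainder, quotient = 0, 0
    for i in reversed(range(bits)):
        bit = (dividend >> i) & 1
        # Subtract after a non-negative partial remainder, add after a
        # negative one; the remainder is never restored inside the loop.
        if remainder >= 0:
            remainder = 2 * remainder + bit - divisor
        else:
            remainder = 2 * remainder + bit + divisor
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor  # single correction at the end
    return quotient, remainder
```

Multiplying by 10 or 25 takes two shift-and-add steps here, which suggests why special-casing constants keeps the average multiply by a constant short.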
Decimal Operations
COBOL programs will compute on decimal values, stored as 4 bits per digit,
rather than converting back and forth between binary and decimal. PA-RISC has
instructions that will convert the sum from a normal 32-bit add into proper
decimal digits. It also provides logical and arithmetic operations that set the
condition codes to test for carries of digits, bytes, or half words. These
operations also test whether bytes or half words are zero. They would be useful in
arithmetic on 8-bit ASCII characters. Five PA-RISC instructions provide decimal
support.
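The idea of turning an ordinary binary add into a decimal result can be shown with a well-known bias-and-correct technique. The Python sketch below is a generic packed-decimal addition, offered as an assumption about how such a correction can work rather than as the definition of any PA-RISC instruction; the names bcd_add, to_bcd, and from_bcd are invented for the example.

```python
def bcd_add(a, b):
    """Add two packed-decimal words (4 bits per digit, up to 8 digits each)."""
    biased = a + 0x66666666       # add 6 to every digit of one operand
    total = biased + b            # ordinary binary add
    carries = total ^ biased ^ b  # bit i is set if a carry entered bit i
    # Digits that produced no carry still hold the bias of 6; find them by
    # looking at the carry into the next digit (bits 4, 8, ..., 32).
    no_carry = ~carries & 0x111111110
    correction = (no_carry >> 2) | (no_carry >> 3)  # 6 per such digit
    return total - correction


def to_bcd(n):
    """Pack a non-negative integer as decimal digits, 4 bits each."""
    return int(str(n), 16)


def from_bcd(word):
    """Unpack a packed-decimal word back into an integer."""
    return int(format(word, "x"))
```

The bias of 6 makes a decimal carry out of a digit coincide with a binary carry out of its 4 bits, so one binary add followed by one correction pass yields proper decimal digits.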
Remaining Instructions
Here are some remaining PA-RISC instructions:
■ Branch vectored shifts an index register left 3 bits, adds it to a base register,
and then branches to the calculated address. It is used for case statements.
■ Extract and deposit instructions allow arbitrary bit fields to be selected from or
inserted into registers. Variations include whether the extracted field is
sign-extended, whether the bit field is specified directly in the instruction or
indirectly in another register, and whether the rest of the register is set to zero
or left unchanged. PA-RISC has 12 such instructions.
■ To simplify use of 32-bit address constants, PA-RISC includes ADDIL, which
adds a left-adjusted 21-bit constant to a register and places the result in register
1. The following data transfer instruction uses offset addressing to add the
lower 11 bits of the address to register 1. This pair of instructions allows
PA-RISC to add a 32-bit constant to a base register, at the cost of changing
register 1.
■ PA-RISC has nine debug instructions that can set breakpoints on instruction or
data addresses and return the trapped addresses.
■ Load and clear instructions provide a semaphore that reads a value from
memory and then writes zero.
■ Store bytes short optimizes unaligned data moves, moving either the leftmost
or the rightmost bytes in a word to the effective address depending on the
instruction options and condition code bits.
■ Loads and stores work well with caches by having options that give hints about
whether to load data into the cache if it’s not already in the cache. For example,
load with a destination of register 0 is defined to be a cache hint.
■ Multiply/add and multiply/subtract are floating-point operations that can
launch two independent floating-point operations in a single instruction.
Version 2.0 of PA-RISC will have fused multiply-add like the PowerPC.
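The address arithmetic behind ADDIL and branch vectored is easy to check numerically. The Python sketch below uses assumed helper names and treats the low 11 bits as an unsigned offset for simplicity; it is a model of the arithmetic described in the list, not of the instruction encodings.

```python
MASK32 = 0xFFFFFFFF


def split_constant(constant):
    """Split a 32-bit constant into a 21-bit high part and 11 low bits."""
    constant &= MASK32
    return constant >> 11, constant & 0x7FF


def addil(base, high21):
    """Model ADDIL: add the left-adjusted 21-bit part; result is register 1."""
    return (base + (high21 << 11)) & MASK32


def offset_address(register1, low11):
    """Model the following data transfer's offset addressing."""
    return (register1 + low11) & MASK32


def branch_vectored_target(base, index):
    """Branch vectored: shift the index left 3 bits and add the base."""
    return (base + (index << 3)) & MASK32
```

Splitting 0x12345678 and applying the two steps to a base register reproduces base + 0x12345678, at the cost of overwriting register 1 with the intermediate sum.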
In addition to instructions, here are a few features that distinguish PA-RISC:
■ The segmented address space above the 2^32 boundary means that there must be
instructions to manipulate the segment registers and branch instructions that
can leave the current segment.
■ The data addressing modes use either a 14-bit offset or a 5-bit offset, and the
sum of the base register and the immediate can be used to update the base
register. Whether the effective address is the base register alone or the sum is
optional. For 5-bit offsets there is a bit in the instruction that makes the
decision, but for 14-bit offsets it depends on the sign bit of the offset: negative
means use the sum, positive means use the register. These options turn the
standard 6 integer data transfers into 20 instructions. PA-RISC 2.0 makes the set
of addressing options more orthogonal.
C.9 Concluding Remarks
This appendix covers the addressing modes, instruction formats, and all
instructions found in four recent RISC architectures. Although the later sections
concentrate on the differences, it would not be possible to cover four
architectures in these few pages if there were not so many similarities. In fact,
we would guess that more than 90% of the instructions executed for any of these
architectures would be found in Figure C.5 on pages C-6–C-8. To contrast this
homogeneity, Figure C.12 gives a summary for four architectures from the 1970s
in a format similar to that shown in Figure C.1 on page C-2. (Imagine trying to
write a single appendix in this style for those architectures.) In the history of
computing, there has never been such widespread agreement on computer
architecture.
This style of architecture cannot remain static, however. Like people,
instruction sets tend to get bigger as they get older. Figure C.13 shows the
genealogy of these instruction sets, and Figure C.14 shows which features were
added to or deleted from generations of machines over time.
                            IBM 360/370       Intel 8086          Motorola 68000      DEC VAX
Date announced              1964/1970         1978                1980                1977
Instruction size(s) (bits)  16, 32, 48        8, 16, 24, 32,      16, 32, 48, 64, 80  8, 16, 24, 32, ..., 432
                                              40, 48
Addressing (size, model)    24 bits, flat/    4+16 bits,          24 bits, flat       32 bits, flat
                            31 bits, flat     segmented
Data aligned?               Yes 360/No 370    No                  16-bit aligned      No
Data addressing modes       2/3               5                   9                   ‡ 14
Protection                  Page              None                Optional            Page
Page size                   2 KB & 4 KB       —                   0.25 to 32 KB       0.5 KB
I/O                         Opcode            Opcode              Memory mapped       Memory mapped
Integer registers (size,    16 GPR × 32 bits  8 dedicated data ×  8 data & 8 address  15 GPR × 32 bits
model, number)                                16 bits             × 32 bits
Separate floating-point     4 × 64 bits       Optional:           Optional:           0
registers                                     8 × 80 bits         8 × 80 bits
Floating-point format       IBM (floating     IEEE 754 single,    IEEE 754 single,    DEC
                            hexadecimal)      double, extended    double, extended
FIGURE C.12 Summary of four 1970s architectures. Unlike the architectures in
Figure C.1 on page C-2, there is little agreement between these architectures in
any category. (See Appendix D for more details on the 8086; in fact, the
description of just this one machine is as long as this whole appendix!)
[Figure C.13 artwork: a timeline running from 1960 to 1995 with the nodes
CDC 6600 (1963), IBM ASC (1968), IBM 801 (1975), CRAY 1 (1976), Berkeley
RISC-1 (1981), Stanford MIPS (1982), America (1985), RT/PC (1986), PA-RISC
(1986), MIPS I (1986), SPARC v8 (1987), MIPS II (1989), Power-1 (1990),
MIPS III (1992), Alpha (1992), PowerPC (1993), Power-2 (1993), SPARC v9 (1994),
and MIPS IV (1994).]
FIGURE C.13 The lineage of RISC instruction sets. Commercial machines are
shown in plain text and research machines in bold. The CDC-6600 and Cray-1 were
load-store machines with register 0 fixed at 0, and separate integer and
floating-point registers. Instructions could not cross word boundaries. An early
IBM research machine led to the 801 and America research projects, with the 801
leading to the unsuccessful RT/PC and America leading to the successful Power
architecture. Some people who worked on the 801 later joined Hewlett Packard to
work on the PA-RISC. The two university projects were the basis of MIPS and
SPARC machines. DEC shipped workstations using MIPS microprocessors for three
years before they brought out their own RISC instruction set, Alpha, which is
very similar to MIPS III.
Columns: PA-RISC 1.0, 1.1, 2.0; SPARC v. 8, v. 9; MIPS I, II, III, IV; Power 1, 2, PC
[Column alignment was lost in extraction; the marks in each row are listed left
to right with empty cells omitted.]
Feature                         Marks
Interlocked loads               √ " " √ " + " " √ " "
Load/store FP double            √ " " √ " + " " √ " "
Semaphore                       √ " " √ " + " " √ " "
Square root                     √ " " √ " + " " + "
Single-precision FP ops         √ " " √ " √ " " " +
Memory synchronization          √ " " √ " + " " √ " "
Coprocessor                     √ " " √ – √ " " "
Base + index addressing         √ " " √ " + √ " "
≈ 32 64-bit FP registers        " " + + " √ " "
Annulling delayed branch        √ " " √ " + " "
Branch register contents        √ " " + √ " " "
Big or Little Endian            + " + √ " " " +
Branch prediction bit           + + " " √ " "
Conditional move                + + √ " –
Prefetch data into cache        + + + √ " "
64-bit addressing/ int. ops     + + + " +
32-bit multiply, divide         + " + √ " " " √ " "
Load/store FP quad              + + –
Fused FP mul/add                + + √ " "
String instructions             √ " " √ " –
FIGURE C.14 Features added to RISC machines. √ means in the original machine,
+ means added later, " means continued from prior machine, and – means removed
from architecture.
C.10 References
BHANDARKAR, D. P. [1995]. Alpha Architecture and Implementations, Digital Press, Newton, Mass.
HEWLETT PACKARD [1994]. PA-RISC 1.1 Architecture Reference Manual, 3rd ed.
IBM [1994]. The PowerPC Architecture, Morgan Kaufmann, San Francisco.
KANE, G. [1988]. MIPS RISC Architecture, Prentice Hall, Englewood Cliffs, N. J.
MAGENHEIMER, D. J., L. PETERS, K. W. PETTIS, AND D. ZURAS [1988]. “Integer multiplication and
division on the HP Precision Architecture,” IEEE Trans. on Computers, 37:8, 980–990.
MUCHNICK, S. S. [1988]. “Optimizing compilers for SPARC,” Sun Technology (Summer) 1:3, 64–77.
SILICON GRAPHICS [1994]. MIPS IV Instruction Set, Revision 2.2.
SITES, R. L. (ED.) [1992]. Alpha Architecture Reference Manual, Digital Press, Newton, Mass.
SUN MICROSYSTEMS [1989]. The SPARC Architectural Manual, Version 8, Part No. 800-1399-09,
August 25, 1989.
TAYLOR, G., P. HILFINGER, J. LARUS, D. PATTERSON, AND B. ZORN [1986]. “Evaluation of the SPUR
LISP architecture,” Proc. 13th Symposium on Computer Architecture (June), Tokyo.
UNGAR, D., R. BLAU, P. FOLEY, D. SAMPLES, AND D. PATTERSON [1984]. “Architecture of SOAR:
Smalltalk on a RISC,” Proc. 11th Symposium on Computer Architecture (June), Ann Arbor, Mich.,
188–197.
WEAVER, D. L. AND T. GERMOND [1994]. The SPARC Architectural Manual, Version 9, Prentice
Hall, Englewood Cliffs, N. J.
WEISS, S. AND J. E. SMITH [1994]. Power and PowerPC, Morgan Kaufmann, San Francisco.