Chapter 10: Pipelined and Parallel
Recursive and Adaptive Filters
Keshab K. Parhi
Chapter 10 2
Outline
• Introduction
• Pipelining in 1st-Order IIR Digital Filters
• Pipelining in Higher-Order IIR Digital Filters
• Parallel Processing for IIR Filters
• Combined Pipelining and Parallel Processing for IIR
Filters
Chapter 10 3
First-Order IIR Filter
• Consider the 1st-order linear time-invariant recursion (see Fig. 1)
      y(n+1) = a·y(n) + b·u(n)                                        (10.1)
• The iteration period of this filter is (T_m + T_a), where {T_m, T_a} represent the word-level multiplication time and addition time
Look-Ahead Computation
• In the look-ahead transformation, the linear recursion is first iterated a few times to create additional concurrency.
• By recasting this recursion, we can express y(n+2) as a function of y(n) to obtain the following expression (see Fig. 2(a))
      y(n+2) = a·[a·y(n) + b·u(n)] + b·u(n+1)                         (10.2)
• The iteration bound of this recursion is 2(T_m + T_a)/2 = (T_m + T_a), the same as that of the original version, because the amount of computation and the number of logical delays inside the recursive loop have both doubled
Chapter 10 4
• Another recursion equivalent to (10.2) is (10.3). Shown in Fig. 2(b), its iteration bound is (T_m + T_a)/2, a factor of 2 lower than before.
      y(n+2) = a^2·y(n) + a·b·u(n) + b·u(n+1)                         (10.3)
• Applying (M-1) steps of look-ahead to the iteration of (10.1), we can obtain an equivalent implementation described by (10.4) (see Fig. 3)
      y(n+M) = a^M·y(n) + Σ_{i=0}^{M-1} a^i·b·u(n+M-1-i)              (10.4)
  – Note: the loop delay is z^{-M} instead of z^{-1}, which means that the loop computation must be completed in M clock cycles (not 1 clock cycle). The iteration bound of this computation is (T_m + T_a)/M, which corresponds to a sample rate M times higher than that of the original filter
  – The terms {a·b, a^2·b, ···, a^{M-1}·b, a^M} in (10.4) can be pre-computed (referred to as pre-computation terms). The second term on the RHS of (10.4) is the look-ahead computation term (referred to as the look-ahead complexity); it is non-recursive and can be easily pipelined
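A small numerical check of (10.4) can be helpful. The sketch below (NumPy; the values a = 0.9, b = 0.5, M = 4 and the random input are arbitrary illustrative choices, not from the slides) verifies that the M-step look-ahead recursion produces the same states as the original recursion (10.1).
```python
import numpy as np

# Sketch: verify the M-step look-ahead recursion (10.4) against the
# original first-order recursion (10.1). Coefficients and input are
# arbitrary illustrative choices.
a, b, M = 0.9, 0.5, 4
rng = np.random.default_rng(0)
u = rng.standard_normal(64)

# Original recursion: y(n+1) = a*y(n) + b*u(n), with y(0) = 0
y = np.zeros(len(u) + 1)
for n in range(len(u)):
    y[n + 1] = a * y[n] + b * u[n]

# Look-ahead recursion: y(n+M) = a^M*y(n) + sum_{i=0}^{M-1} a^i*b*u(n+M-1-i)
y_la = np.zeros(len(u) + 1)
y_la[:M] = y[:M]                      # first M states taken from the original recursion
for n in range(len(u) - M + 1):
    y_la[n + M] = a**M * y_la[n] + sum(a**i * b * u[n + M - 1 - i] for i in range(M))

print(np.allclose(y, y_la))           # True: both recursions produce the same output
```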
Chapter 10 5
Fig. 1: First-order IIR filter of (10.1)
Fig. 2(a): Recursion (10.2), with both the loop computation and the loop delays doubled
Fig. 2(b): Equivalent recursion (10.3) with 2 delays inside the loop
Chapter 10 6
Fig. 3: M-stage Pipelinable 1st-Order IIR Filter (M loop delays; the non-recursive input section uses the pre-computed terms b, a·b, ···, a^{M-1}·b and the loop multiplier a^M)
• Look-ahead computation has allowed a single serial computation to be
transformed into M independent concurrent computations, and to
pipeline the feedback loop to achieve high speed filtering of a single
time series while maintaining full hardware utilization.
• Provided the multiplier and the adder can be conveniently pipelined,
the iteration bound can be achieved by retiming or cutset
transformation (see Chapter 4)
Chapter 10 7
Pipelining in 1st-Order IIR Digital Filters
– Look-ahead techniques add canceling poles and zeros with equal angular spacing at a distance from the origin that is the same as that of the original pole. The pipelined filters are always stable provided that the original filter is stable
– The pipelined realizations require a linear increase in complexity, but decomposition techniques can be used to obtain an implementation with a logarithmic increase in hardware with respect to the number of loop pipeline stages
– Example: Consider the 1st-order IIR filter transfer function
      H(z) = 1/(1 - a·z^{-1})                                         (10.5)
  • The output sample y(n) can be computed using the input sample u(n) and the past output sample y(n-1) as follows:
      y(n) = a·y(n-1) + u(n)                                          (10.6)
  • The sample rate of this recursive filter is limited by the computation time of one multiply-add operation
Chapter 10 8
Pipelining in 1st-Order IIR Digital Filters (continued)
1. Look-Ahead Pipelining for 1st-Order IIR Filters
• Look-ahead pipelining adds canceling poles and zeros to the transfer function such that the coefficients of {z^{-1}, ···, z^{-(M-1)}} in the denominator of the transfer function are zero. Then the output sample y(n) can be computed using the inputs and the output sample y(n-M), so that there are M delay elements in the critical loop, which in turn can be used to pipeline the critical loop by M stages, and the sample rate can be increased by a factor of M
• Example: Consider the 1st-order filter H(z) = 1/(1 - a·z^{-1}), which has a pole at z = a (|a| ≤ 1). A 3-stage pipelined equivalent stable filter can be derived by adding poles and zeros at z = a·e^{±j2π/3}, and is given by
      H(z) = (1 + a·z^{-1} + a^2·z^{-2}) / (1 - a^3·z^{-3})
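As a sketch of the pole-zero cancellation (SciPy; a = 0.9 is an arbitrary stable pole value, not from the slides), the 3-stage pipelined transfer function above and the original 1/(1 - a·z^{-1}) have identical impulse responses:
```python
import numpy as np
from scipy.signal import lfilter

# Sketch: the 3-stage pipelined transfer function equals the original one
# after pole-zero cancellation (a = 0.9 is an arbitrary stable choice).
a = 0.9
num_orig, den_orig = [1.0], [1.0, -a]                 # 1 / (1 - a z^-1)
num_pipe = [1.0, a, a**2]                             # 1 + a z^-1 + a^2 z^-2
den_pipe = [1.0, 0.0, 0.0, -a**3]                     # 1 - a^3 z^-3

impulse = np.zeros(50); impulse[0] = 1.0
h_orig = lfilter(num_orig, den_orig, impulse)
h_pipe = lfilter(num_pipe, den_pipe, impulse)
print(np.allclose(h_orig, h_pipe))                    # True
```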
Chapter 10 9
Pipelining in 1st-Order IIR Digital Filters (continued)
2. Look-Ahead Pipelining with Power-of-2 Decomposition
• With power-of-2 decomposition, an M-stage (for power-of-2 M) pipelined implementation for a 1st-order IIR filter can be obtained by log_2(M) sets of transformations
• Example: Consider a 1st-order recursive filter transfer function described by H(z) = b·z^{-1}/(1 - a·z^{-1}). The equivalent pipelined transfer function can be described using the decomposition technique as follows
      H(z) = b·z^{-1}·Π_{i=0}^{log_2(M)-1} (1 + a^{2^i}·z^{-2^i}) / (1 - a^M·z^{-M})      (10.7)
  – This pipelined implementation is derived by adding (M-1) poles and zeros at identical locations.
  – The original transfer function has a single pole at z = a (see Fig. 4(a)).
Chapter 10 10
– The pipelined transfer function has poles at the following locations (see Fig. 4(b) for M=8):
      {a, a·e^{j2π/M}, ···, a·e^{j2π(M-1)/M}}
– The decomposition of the canceling zeros is shown in Fig. 4(c). The i-th stage of the decomposed non-recursive portion implements 2^i zeros located at:
      z = a·exp(j·2π(2n+1)/2^{i+1}),   n = 0, 1, ···, 2^i - 1        (10.8)
– The i-th stage of the decomposed non-recursive portion requires a single pipelined multiplication operation, independent of the stage number i
– The multiplication complexity of the pipelined implementation is (log_2(M) + 2)
– The finite-precision pipelined filters suffer from inexact pole-zero cancellation, which leads to magnitude and phase errors. These errors can be reduced by increasing the wordlength (see p. 323)
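A short sketch of the decomposition (10.7)-(10.8) for M = 8 (NumPy; the pole value a = 0.95 is an arbitrary choice): the log_2(M) = 3 cascaded sections multiply out to the (M-1)-zero canceling polynomial, so the overall denominator becomes 1 - a^M·z^{-M}.
```python
import numpy as np

# Sketch of the power-of-2 decomposition (10.7) for M = 8 and an arbitrary
# pole a = 0.95: the log2(M) = 3 cascaded sections (1 + a^{2^i} z^{-2^i})
# multiply out so that (1 - a z^-1) * product = 1 - a^M z^-M.
a, M = 0.95, 8
num = np.array([1.0])
for i in range(int(np.log2(M))):
    stage = np.zeros(2**i + 1)
    stage[0], stage[-1] = 1.0, a**(2**i)              # 1 + a^{2^i} z^{-2^i}
    num = np.convolve(num, stage)

den = np.convolve([1.0, -a], num)                     # (1 - a z^-1) * numerator
expected = np.zeros(M + 1); expected[0], expected[-1] = 1.0, -a**M
print(np.allclose(den, expected))                     # True: denominator is 1 - a^M z^-M
```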
Chapter 10 11
Fig. 4: (a) Pole representation of a 1st-order recursive filter; (b) pole/zero representation of a 1st-order IIR with 8 loop stages; (c) decomposition based on the pipelined IIR for M = 8
Chapter 10 12
Pipelining in 1st-Order IIR Digital Filters (continued)
3. Look-Ahead Pipelining with General Decomposition
• The idea of decomposition can be extended to any arbitrary number of loop pipelining stages M. If M = M_1·M_2···M_p, then the non-recursive stages implement (M_1 - 1), M_1·(M_2 - 1), ···, M_1·M_2···M_{p-1}·(M_p - 1) zeros, respectively, totaling (M-1) zeros
• Example (Example 10.3.3, p.325): Consider the 1st-order IIR filter H(z) = 1/(1 - a·z^{-1})
  – A 12-stage pipelined decomposed implementation is given by
      H(z) = Σ_{i=0}^{11} a^i·z^{-i} / (1 - a^{12}·z^{-12})
           = (1 + a·z^{-1})·(1 + a^2·z^{-2} + a^4·z^{-4})·(1 + a^6·z^{-6}) / (1 - a^{12}·z^{-12})
  – This implementation is based on a 2×3×2 decomposition (see Fig. 5)
Chapter 10 13
– The first section implements 1 zero at -a, the second section implements 4 zeros at {a·e^{±jπ/3}, a·e^{±j2π/3}}, and the third section implements 6 zeros at {±ja, a·e^{±jπ/6}, a·e^{±j5π/6}}
– Another decomposed implementation (2×2×3 decomposition) is given by
      H(z) = (1 + a·z^{-1})·(1 + a^2·z^{-2})·(1 + a^4·z^{-4} + a^8·z^{-8}) / (1 - a^{12}·z^{-12})
  • The first section implements 1 zero at -a, the second section implements 2 zeros at ±ja, and the third section implements 8 zeros at {a·e^{±jπ/6}, a·e^{±jπ/3}, a·e^{±j2π/3}, a·e^{±j5π/6}}
– The third decomposition (3×2×2 decomposition) is given by
      H(z) = (1 + a·z^{-1} + a^2·z^{-2})·(1 + a^3·z^{-3})·(1 + a^6·z^{-6}) / (1 - a^{12}·z^{-12})
  • The first section implements 2 zeros at a·e^{±j2π/3}, the second section implements 3 zeros at {-a, a·e^{±jπ/3}}, and the third section implements 6 zeros at {a·e^{±jπ/6}, ±ja, a·e^{±j5π/6}}
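The three factorizations of Example 10.3.3 can be checked numerically. The sketch below (NumPy; a = 0.9 is an arbitrary choice and poly() is a hypothetical helper) expands each decomposition and compares it with Σ_{i=0}^{11} a^i·z^{-i}.
```python
import numpy as np

# Sketch: the three decompositions of Example 10.3.3 (2x3x2, 2x2x3, 3x2x2)
# all expand to the same cancelling numerator sum_{i=0}^{11} a^i z^-i
# (a = 0.9 is an arbitrary choice).
a = 0.9
target = a ** np.arange(12)                          # 1 + a z^-1 + ... + a^11 z^-11

def poly(powers):
    """Polynomial 1 + sum a^p z^-p for the given delay powers p."""
    c = np.zeros(max(powers) + 1)
    c[0] = 1.0
    for p in powers:
        c[p] += a**p
    return c

dec_232 = np.convolve(np.convolve(poly([1]), poly([2, 4])), poly([6]))
dec_223 = np.convolve(np.convolve(poly([1]), poly([2])), poly([4, 8]))
dec_322 = np.convolve(np.convolve(poly([1, 2]), poly([3])), poly([6]))
for d in (dec_232, dec_223, dec_322):
    print(np.allclose(d, target))                    # True, True, True
```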
Chapter 10 14
Fig. 5(a): Pole-zero locations of a 12-stage pipelined 1st-order IIR
Fig. 5(b): Decomposition of the zeros of the pipelined filter for a 2×3×2 decomposition
Chapter 10 15
Pipelining in Higher-order IIR Digital Filters
• Higher-order IIR digital filters can be pipelined by using clustered
look-ahead or scattered look-ahead techniques. (For 1st-order IIR filters,
these two look-ahead techniques reduce to the same form)
– Clustered look-ahead: Pipelined realizations require a linear
complexity in the number of loop pipelined stages and are not
always guaranteed to be stable
– Scattered look-ahead: Can be used to derive stable pipelined IIR
filters
– Decomposition technique: Can also be used to obtain area-efficient implementations of higher-order scattered look-ahead IIR filters
– Constrained filter design techniques: Achieve pipelining without
pole-zero cancellation
Chapter 10 16
• The transfer function of an N-th order direct-form recursive filter is described by
      H(z) = Σ_{i=0}^{N} b_i·z^{-i} / (1 - Σ_{i=1}^{N} a_i·z^{-i})    (10.9)
• Equivalently, the output y(n) can be described in terms of the input sample u(n) and the past input/output samples as follows
      y(n) = Σ_{i=1}^{N} a_i·y(n-i) + Σ_{i=0}^{N} b_i·u(n-i) = Σ_{i=1}^{N} a_i·y(n-i) + z(n)      (10.10)
  – The sample rate of this IIR filter realization is limited by the throughput of 1 multiplication and 1 addition, since the critical loop contains a single delay element
Chapter 10 17
Pipelining in Higher-order IIR Digital Filters (cont’d)
1. Clustered Look-Ahead Pipelining
• The basic idea of clustered look-ahead:
  – Add canceling poles and zeros to the filter transfer function such that the coefficients of {z^{-1}, ···, z^{-(M-1)}} in the denominator of the transfer function are 0, and the output sample y(n) can be described in terms of the cluster of N past outputs {y(n-M), ···, y(n-M-N+1)}
  – Hence the critical loop of this implementation contains M delay elements and a single multiplication. Therefore, this loop can be pipelined by M stages, and the sample rate can be increased by a factor of M. This is referred to as M-stage clustered look-ahead pipelining
Chapter 10 18
• Example (Example 10.4.1, p.327): Consider the all-pole 2nd-order IIR filter with poles at {1/2, 3/4}. The transfer function of this filter is
      H(z) = 1 / (1 - (5/4)·z^{-1} + (3/8)·z^{-2})                    (10.11)
  – A 2-stage pipelined equivalent IIR filter can be obtained by eliminating the z^{-1} term in the denominator (i.e., multiplying both the numerator and denominator by (1 + (5/4)·z^{-1})). The transformed transfer function is given by:
      H(z) = (1 + (5/4)·z^{-1}) / [(1 - (5/4)·z^{-1} + (3/8)·z^{-2})·(1 + (5/4)·z^{-1})]
           = (1 + (5/4)·z^{-1}) / (1 - (19/16)·z^{-2} + (15/32)·z^{-3})      (10.12)
  – From the transfer function, we can see that the coefficient of z^{-1} in the denominator is zero. Hence, the critical loop of this filter contains 2 delay elements and can be pipelined by 2 stages
Chapter 10 19
  – Similarly, a 3-stage pipelined realization can be derived by eliminating the {z^{-1}, z^{-2}} terms in the denominator of (10.11), which can be done by multiplying both numerator and denominator by (1 + (5/4)·z^{-1} + (19/16)·z^{-2}). The new transfer function is given by:
      H(z) = (1 + (5/4)·z^{-1} + (19/16)·z^{-2}) / (1 - (65/64)·z^{-3} + (57/128)·z^{-4})      (10.13)
• Computation complexity: The numerator (non-recursive portion) of this pipelined filter needs (N+M) multiplications, and the denominator (recursive portion) needs N multiplications. Thus, the total complexity of this pipelined implementation is (N+N+M) multiplications
• Stability: The canceling poles and zeros are utilized for pipelining IIR filters. However, when the additional poles lie outside the unit circle, the filter becomes unstable. Note that the filters in (10.12) and (10.13) are unstable.
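The derivation of (10.12) and (10.13) is just polynomial multiplication, as in the sketch below (NumPy); the pole radii confirm the instability noted above.
```python
import numpy as np

# Sketch: derive the clustered look-ahead denominators (10.12) and (10.13)
# from the original denominator 1 - 5/4 z^-1 + 3/8 z^-2 by multiplying in
# the cancelling terms, then check stability via the pole radii.
den = np.array([1.0, -5/4, 3/8])                     # poles at 1/2 and 3/4

den2 = np.convolve(den, [1.0, 5/4])                  # 2-stage: kill the z^-1 term
den3 = np.convolve(den, [1.0, 5/4, 19/16])           # 3-stage: kill z^-1 and z^-2
print(den2)                                          # [1, 0, -19/16, 15/32]
print(den3)                                          # [1, 0, 0, -65/64, 57/128]

for d in (den2, den3):
    print(np.max(np.abs(np.roots(d))))               # > 1 in both cases: unstable
```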
Chapter 10 20
2. Stable Clustered Look-Ahead Filter Design
• If the desired pipeline delay M does not produce a stable filter, M should be increased until a stable pipelined filter is obtained. To obtain the optimal pipelining level M, numerical search methods are generally used
• Example (Example 10.4.3, p.330): Consider a 5-level (M=5) pipelined implementation of the following 2nd-order transfer function
      H(z) = 1 / (1 - 1.5336·z^{-1} + 0.6889·z^{-2})                  (10.14)
  – By the stability analysis, it is shown that M=5 does not meet the stability condition. Thus M is increased to M=6 to obtain the following stable pipelined filter:
      H(z) = (1 + 1.5336·z^{-1} + 1.6630·z^{-2} + 1.4939·z^{-3} + 1.1454·z^{-4} + 0.7275·z^{-5})
             / (1 - 0.3265·z^{-6} + 0.5011·z^{-7})                    (10.15)
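A sketch of this search (NumPy; clustered_lookahead() is a hypothetical helper name, restricted here to a 2nd-order all-pole filter, and it assumes the standard construction in which a monic C(z) zeroes the z^{-1} ... z^{-(M-1)} denominator terms) reproduces the result that M = 5 is unstable while M = 6 is stable for (10.14):
```python
import numpy as np

# Sketch: derive the M-stage clustered look-ahead pipelined filter for the
# 2nd-order all-pole denominator of (10.14) and check its stability.
def clustered_lookahead(den, M):
    d1, d2 = den[1], den[2]
    c = np.zeros(M)                      # C(z): monic, degree M-1
    c[0] = 1.0
    for k in range(1, M):                # zero the z^-k coefficient of den*C
        c[k] = -(d1 * c[k - 1] + (d2 * c[k - 2] if k >= 2 else 0.0))
    return c, np.convolve(den, c)        # numerator C(z), pipelined denominator

den = [1.0, -1.5336, 0.6889]
for M in (5, 6):
    num, den_p = clustered_lookahead(den, M)
    stable = np.max(np.abs(np.roots(den_p))) < 1.0
    print(M, stable)                     # M=5: False, M=6: True
```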
Chapter 10 21
3. Scattered Look-Ahead Pipelining
• Scattered look-ahead pipelining: The denominator of the transfer function in (10.9) is transformed in a way that it contains the N terms {z^{-M}, z^{-2M}, ···, z^{-NM}}. Equivalently, the state y(n) is computed in terms of N past scattered states y(n-M), y(n-2M), ···, and y(n-NM)
• In scattered look-ahead, for each pole in the original filter, we introduce (M-1) canceling poles and zeros with equal angular spacing at a distance from the origin the same as that of the original pole.
  – Example: if the original filter has a pole at z = p, we add (M-1) poles and zeros at z = p·exp(j·2πk/M), k = 1, 2, ···, M-1, to derive a pipelined realization with M loop pipeline stages
• Assume that the denominator of the transfer function can be factorized as follows:
      D(z) = Π_{i=1}^{N} (1 - p_i·z^{-1})                             (10.16)
Chapter 10 22
  – Then, the pipelining process using the scattered look-ahead approach can be described by
      H(z) = N(z)/D(z)
           = N(z)·Π_{i=1}^{N} Π_{k=1}^{M-1} (1 - p_i·e^{j2πk/M}·z^{-1})
             / Π_{i=1}^{N} Π_{k=0}^{M-1} (1 - p_i·e^{j2πk/M}·z^{-1})
           = N'(z) / D'(z^M)                                          (10.17)
• Example (Example 10.4.5, p.332): Consider the 2nd-order filter with complex conjugate poles at z = r·e^{±jθ}. The filter transfer function is given by
      H(z) = 1 / (1 - 2r·cosθ·z^{-1} + r^2·z^{-2})
  – We can pipeline this filter by 3 stages by introducing 4 additional poles and zeros at {z = r·e^{±j(θ+2π/3)}, z = r·e^{±j(θ-2π/3)}} if θ ≠ 2π/3
Chapter 10 23
  – (cont’d) The equivalent pipelined filter is then given by
      H(z) = [1 + 2r·cosθ·z^{-1} + (1 + 2·cos2θ)·r^2·z^{-2} + 2r^3·cosθ·z^{-3} + r^4·z^{-4}]
             / [1 - 2r^3·cos3θ·z^{-3} + r^6·z^{-6}]
  – When θ = 2π/3, only 1 additional pole and zero at z = r is required for 3-stage pipelining, since r·e^{±j(θ-2π/3)} = r and r·e^{±j(θ+2π/3)} = r·e^{∓jθ}. The equivalent pipelined filter is then given by
      H(z) = (1 - r·z^{-1}) / [(1 + r·z^{-1} + r^2·z^{-2})·(1 - r·z^{-1})] = (1 - r·z^{-1}) / (1 - r^3·z^{-3})
• Example (Example 10.4.6, p.332): Consider the 2nd-order filter with real poles at {z = r_1, z = r_2}. The transfer function is given by
      H(z) = 1 / (1 - (r_1 + r_2)·z^{-1} + r_1·r_2·z^{-2})
Chapter 10 24
  – (cont’d) A 3-stage pipelined realization is derived by adding poles (and zeros) at {z = r_1·e^{±j2π/3}, z = r_2·e^{±j2π/3}}. The pipelined realization is given by
      H(z) = [1 + (r_1+r_2)·z^{-1} + (r_1^2 + r_1·r_2 + r_2^2)·z^{-2} + r_1·r_2·(r_1+r_2)·z^{-3} + r_1^2·r_2^2·z^{-4}]
             / [1 - (r_1^3 + r_2^3)·z^{-3} + r_1^3·r_2^3·z^{-6}]
  – The pole-zero locations of a 3-stage pipelined 2nd-order filter with poles at z = 1/2 and z = 3/4 are shown in Fig. 6
• CONCLUSIONS:
  – If the original filter is stable, then the scattered look-ahead approach leads to stable pipelined filters, because the distance of the additional poles from the origin is the same as that of the original poles
  – Multiplication Complexity: (NM+1) for the non-recursive portion in (10.17), and N for the recursive portion. The total pipelined filter multiplication complexity is (NM+N+1)
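A sketch of (10.17) in code (NumPy; the helper name scattered_lookahead() is hypothetical) builds the pipelined filter of Example 10.4.6 directly from the rotated poles; the printed coefficients match the realization above.
```python
import numpy as np

# Sketch of (10.17): build the M-stage scattered look-ahead filter by adding,
# for every pole p, the M-1 cancelling poles/zeros p*exp(j*2*pi*k/M).
# Checked here on Example 10.4.6 (real poles r1 = 1/2, r2 = 3/4, M = 3).
def scattered_lookahead(poles, M):
    num, den = np.array([1.0]), np.array([1.0])
    for p in poles:
        for k in range(M):
            factor = np.array([1.0, -p * np.exp(2j * np.pi * k / M)])  # (1 - p_k z^-1)
            den = np.convolve(den, factor)
            if k > 0:                                # k = 0 is the original pole
                num = np.convolve(num, factor)
    return num.real, den.real

num, den = scattered_lookahead([0.5, 0.75], 3)
print(np.round(num, 4))   # numerator: the added cancelling zeros
print(np.round(den, 4))   # denominator: only z^0, z^-3, z^-6 terms survive
```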
Chapter 10 25
Fig. 6: Pole-zero representation of a 3-stage pipelined equivalent stable filter derived using the scattered look-ahead approach (original poles at z = 1/2 and z = 3/4)
  – (cont’d) The multiplication complexity is linear with respect to M, and is much greater than that of clustered look-ahead
  – Also, the latch complexity is quadratic in M, because each multiplier is pipelined by M stages
Chapter 10 26
4. Scattered Look-Ahead Pipelining with Power-of-2
Decomposition
• This kind of decomposition technique will lead to a logarithmic increase in multiplication complexity (hardware) with respect to the level of pipelining
• Let the transfer function of a recursive digital filter be
      H(z) = Σ_{i=0}^{N} b_i·z^{-i} / (1 - Σ_{i=1}^{N} a_i·z^{-i}) = N(z)/D(z)      (10.18)
  – A 2-stage pipelined implementation can be obtained by multiplying the numerator and denominator by (1 - Σ_{i=1}^{N} (-1)^i·a_i·z^{-i}). The equivalent 2-stage pipelined implementation is described by
      H(z) = [Σ_{i=0}^{N} b_i·z^{-i}]·[1 - Σ_{i=1}^{N} (-1)^i·a_i·z^{-i}]
             / {[1 - Σ_{i=1}^{N} a_i·z^{-i}]·[1 - Σ_{i=1}^{N} (-1)^i·a_i·z^{-i}]}
           = N'(z)/D'(z), where D'(z) contains only even powers of z^{-1}      (10.19)
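The transformation (10.19) can be iterated, as in the sketch below (NumPy; the denominator of (10.11) is reused as an arbitrary 2nd-order example and one_stage() is a hypothetical helper): after log_2(M) = 2 stages only the z^0, z^{-4} and z^{-8} denominator coefficients remain.
```python
import numpy as np

# Sketch of the power-of-2 decomposition (10.18)-(10.19): each stage multiplies
# numerator and denominator by the denominator with its "odd" coefficients
# negated, doubling the spacing of the surviving denominator terms.
def one_stage(num, den, step):
    # Negate coefficients at odd multiples of `step` (den is assumed to be a
    # polynomial in z^{-step}).
    sign = np.where((np.arange(len(den)) // step) % 2 == 1, -1.0, 1.0)
    den_alt = den * sign
    return np.convolve(num, den_alt), np.convolve(den, den_alt)

num, den = np.array([1.0]), np.array([1.0, -5/4, 3/8])   # denominator of (10.11)
step = 1
for _ in range(2):                                       # log2(4) = 2 transformations
    num, den = one_stage(num, den, step)
    step *= 2
print(np.round(den, 6))   # only the z^0, z^-4 and z^-8 coefficients survive
```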
Chapter 10 27
  – (cont’d) Similarly, subsequent transformations can be applied to obtain 4-, 8-, and 16-stage pipelined implementations, respectively
  – Thus, to obtain an M-stage pipelined implementation (for power-of-2 M), log_2(M) sets of such transformations need to be applied in total. By applying the remaining (log_2(M) - 1) sets of such transformations to (10.19), an equivalent transfer function (with M pipelining stages inside the recursive loop) can be derived, which requires a complexity of (N·log_2(M) + 2N + 1) multiplications, a logarithmic complexity with respect to M.
  – Note: the number of delays (or latches) does not grow logarithmically: the total number of delays (or latches) is approximately NM·(log_2(M) + 1), i.e., about NM delays in the non-recursive portion and N·M·log_2(M) delays for pipelining each of the N·log_2(M) multipliers by M stages
  – In the decomposed realization, the 1st stage implements an N-th order non-recursive section, and the subsequent stages respectively implement 2N-, 4N-, ···, (NM/2)-order non-recursive sections
Chapter 10 28
• Example (Example 10.4.7, p.334): Consider a 2nd-order recursive filter described by
      H(z) = Y(z)/U(z) = (b_0 + b_1·z^{-1} + b_2·z^{-2}) / (1 - 2r·cosθ·z^{-1} + r^2·z^{-2})
  – The poles of the system are located at z = r·e^{±jθ} (see Fig. 7(a)).
  – The pipelined filter requires (2·log_2(M) + 5) multiplications and is described by
      H(z) = (b_0 + b_1·z^{-1} + b_2·z^{-2})·Π_{i=0}^{log_2(M)-1} (1 + 2·r^{2^i}·cos(2^i·θ)·z^{-2^i} + r^{2^{i+1}}·z^{-2^{i+1}})
             / (1 - 2·r^M·cos(Mθ)·z^{-M} + r^{2M}·z^{-2M}),      where θ ≠ 2π/M
  – The 2M poles of the transformed transfer function (shown in Fig. 7(b)) are located at
      z = r·e^{±j(θ + 2πi/M)},   i = 0, 1, 2, ···, M-1
Chapter 10 29
Fig. 7: (also see Fig.10.11, p.336) Pole-zero representation for Example 10.4.7; the decomposition of the zeros is shown in (c)
Chapter 10 30
Parallel Processing in IIR Filters
• Parallel processing can also be used in the design of IIR filters
• We first discuss parallel processing for a simple 1st-order IIR filter, and then higher-order filters
• Example (Example 10.5.1, p.339): Consider the transfer function of a 1st-order IIR filter given by
      H(z) = z^{-1} / (1 - a·z^{-1})
  – where |a| ≤ 1 for stability. This filter has only 1 pole, located at z = a. The corresponding input-output relation can be written as y(n+1) = a·y(n) + u(n)
  – Consider the design of a 4-parallel architecture (L=4) for the foregoing filter. Note that in the parallel system, each delay element is referred to as a block delay, where the clock period of the block system is 4 times the sample period. Therefore, the loop update equation should update y(n+4) using inputs and y(n).
Chapter 10 31
  – By iterating the recursion (or by applying the look-ahead technique), we get
      y(n+4) = a^4·y(n) + a^3·u(n) + a^2·u(n+1) + a·u(n+2) + u(n+3)
  – Substituting n = 4k:
      y(4k+4) = a^4·y(4k) + a^3·u(4k) + a^2·u(4k+1) + a·u(4k+2) + u(4k+3)      (10.20)
  – The corresponding architecture is shown in Fig. 8.
Fig. 8: (also see Fig.10.14, p.340)
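A quick check of the block update (10.20) against the serial recursion (a sketch in NumPy; a = 0.9 and the random input are arbitrary choices):
```python
import numpy as np

# Sketch: check the 4-parallel loop update (10.20) against the serial
# recursion y(n+1) = a*y(n) + u(n).
a, L = 0.9, 4
rng = np.random.default_rng(1)
u = rng.standard_normal(40)            # length is a multiple of L

y = np.zeros(len(u) + 1)               # serial reference
for n in range(len(u)):
    y[n + 1] = a * y[n] + u[n]

y_blk = np.zeros(len(u) + 1)           # block update on the decimated state y(4k)
for k in range(len(u) // L):
    n = L * k
    y_blk[n + 4] = (a**4 * y_blk[n] + a**3 * u[n] + a**2 * u[n + 1]
                    + a * u[n + 2] + u[n + 3])

print(np.allclose(y[::L], y_blk[::L])) # True: the states y(4k) agree
```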
Chapter 10 32
  – The pole of the original filter is at z = a, whereas the pole of the parallel system is at z = a^4, which is much closer to the origin since |a| ≤ 1
  – An important implication of this pole movement is the improved robustness of the system to round-off noise
  – A straightforward block-processing structure for L=4, obtained by substituting n = 4k+4, 4k+5, 4k+6 and 4k+7 in (10.20), is shown in Fig. 9.
  – Hardware complexity of this architecture: L^2 multiply-add operations (because L multiply-add operations are required for each output and there are L outputs in total)
  – The square increase in hardware complexity can be reduced by exploiting the concurrency in the computation (the decomposition property in the scattered look-ahead mode cannot be exploited in the block processing mode, because one hardware delay element represents L sample delays)
Chapter 10 33
Fig. 9: (also Fig.10.15, p.341) A 4-parallel 1st-order recursive filter
Chapter 10 34
• Trick:
  – Instead of computing y(4k+1), y(4k+2) & y(4k+3) independently, we can use y(4k) to compute y(4k+1), use y(4k+1) to compute y(4k+2), and use y(4k+2) to compute y(4k+3), at the expense of an increase in the system latency; this leads to a significant reduction in hardware complexity.
  – This method is referred to as incremental block processing; y(4k+1), y(4k+2) and y(4k+3) are computed incrementally.
• Example (Example 10.5.2, p.341): Consider the same 1st-order filter as in the last example. To derive its 4-parallel filter structure with the minimum hardware complexity, instead of simply repeating the hardware 4 times as in Fig. 9, the incremental computation technique can be used to reduce hardware complexity
  – First, design the circuit for computing y(4k) (same as Fig. 8)
  – Then, derive y(4k+1) from y(4k), y(4k+2) from y(4k+1), and y(4k+3) from y(4k+2) by using
      y(4k+1) = a·y(4k) + u(4k)
      y(4k+2) = a·y(4k+1) + u(4k+1)
      y(4k+3) = a·y(4k+2) + u(4k+2)
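A sketch of incremental block processing for Example 10.5.2 (NumPy; a = 0.9 and the input are arbitrary choices):
```python
import numpy as np

# Sketch of incremental block processing (Example 10.5.2): only y(4k+4) is
# computed through the pipelined loop; y(4k+1), y(4k+2), y(4k+3) are then
# obtained serially from it, reducing the multiplier count from L^2 to 2L-1.
a, L = 0.9, 4
rng = np.random.default_rng(2)
u = rng.standard_normal(40)

y_ref = np.zeros(len(u) + 1)
for n in range(len(u)):
    y_ref[n + 1] = a * y_ref[n] + u[n]

y = np.zeros(len(u) + 1)
for k in range(len(u) // L):
    n = L * k
    # loop update (Fig. 8)
    y[n + 4] = a**4 * y[n] + a**3 * u[n] + a**2 * u[n + 1] + a * u[n + 2] + u[n + 3]
    # incremental outputs derived from y(4k)
    y[n + 1] = a * y[n] + u[n]
    y[n + 2] = a * y[n + 1] + u[n + 1]
    y[n + 3] = a * y[n + 2] + u[n + 2]

print(np.allclose(y, y_ref))           # True
```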
Chapter 10 35
  – The complete architecture is shown in Fig. 10
  – The hardware complexity is reduced from L^2 to (2L-1), at the expense of an increase in the computation time for y(4k+1), y(4k+2) and y(4k+3)
Fig.10: (also see Fig.10.16, p.342) Incremental block filter structure with L=4
Chapter 10 36
• Example (Example 10.5.3, p.342): Consider a 2nd-order IIR filter described by the transfer function (10.21). Its pole-zero locations are shown in Fig. 11. Derive a 3-parallel IIR filter where in every clock cycle 3 inputs are processed and 3 outputs are generated
      H(z) = (1 + z^{-1})^2 / (1 - (5/4)·z^{-1} + (3/8)·z^{-2})       (10.21)
  – Since the filter order is 2, 2 outputs need to be updated independently and the 3rd output can be computed incrementally outside the feedback loop using the 2 updated outputs. Assume that y(3k) and y(3k+1) are computed using loop update operations and y(3k+2) is computed incrementally. From the transfer function, we have:
      y(n) = (5/4)·y(n-1) - (3/8)·y(n-2) + f(n),  where f(n) = u(n) + 2·u(n-1) + u(n-2)      (10.22)
  – The loop update process for the 3-parallel system is shown in Fig. 12, where y(3k+3) and y(3k+4) are computed using y(3k) and y(3k+1)
Chapter 10 37
Fig. 11: Pole-zero plot for the transfer function (10.21) (poles at z = 1/2 and z = 3/4, double zero at z = -1)
Fig. 12: Loop update for block size = 3
Chapter 10 38
  – The computation of y(3k+3) using y(3k) & y(3k+1) can be carried out if y(n+3) can be computed using y(n) & y(n+1). Similarly, y(3k+4) can be computed using y(3k) & y(3k+1) if y(n+4) can be expressed in terms of y(n) & y(n+1) (see Fig. 13). These state update operations correspond to clustered look-ahead operations for the M=2 and M=3 cases. The 2-stage and 3-stage clustered look-ahead equations are derived as:
      y(n) = (5/4)·y(n-1) - (3/8)·y(n-2) + f(n)
      y(n) = (19/16)·y(n-2) - (15/32)·y(n-3) + (5/4)·f(n-1) + f(n)
      y(n) = (65/64)·y(n-3) - (57/128)·y(n-4) + (19/16)·f(n-2) + (5/4)·f(n-1) + f(n)      (10.23)
Chapter 10 39
  – Substituting n = 3k+3 and n = 3k+4 into (10.23), we have the following 2 loop update equations:
      y(3k+3) = (19/16)·y(3k+1) - (15/32)·y(3k) + (5/4)·f(3k+2) + f(3k+3)
      y(3k+4) = (65/64)·y(3k+1) - (57/128)·y(3k) + (19/16)·f(3k+2) + (5/4)·f(3k+3) + f(3k+4)      (10.24)
  – The output y(3k+2) can be obtained incrementally as follows:
      y(3k+2) = (5/4)·y(3k+1) - (3/8)·y(3k) + f(3k+2)
  – The block structure is shown in Fig. 14
Fig. 13: Relationship of the recursive outputs y(n), y(n+1), ···, y(n+4)
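The loop-update equations (10.24) and the incremental output can be checked against a direct simulation of (10.21)-(10.22), as in the sketch below (NumPy; the input is an arbitrary random sequence, and the two initial states are seeded from the reference simulation purely for the comparison):
```python
import numpy as np

# Sketch: check the 3-parallel loop-update equations (10.24) and the
# incremental output against a direct simulation of (10.21)/(10.22).
rng = np.random.default_rng(3)
u = rng.standard_normal(60)
f = np.convolve(u, [1.0, 2.0, 1.0])[:len(u)]          # f(n) = u(n) + 2u(n-1) + u(n-2)

y_ref = np.zeros(len(u))
for n in range(len(u)):
    y_ref[n] = 5/4 * (y_ref[n-1] if n >= 1 else 0) - 3/8 * (y_ref[n-2] if n >= 2 else 0) + f[n]

y = np.zeros(len(u))
y[0], y[1] = y_ref[0], y_ref[1]                       # initial states y(0), y(1)
for k in range(len(u) // 3 - 1):
    n = 3 * k
    y[n + 3] = 19/16 * y[n + 1] - 15/32 * y[n] + 5/4 * f[n + 2] + f[n + 3]
    y[n + 4] = 65/64 * y[n + 1] - 57/128 * y[n] + 19/16 * f[n + 2] + 5/4 * f[n + 3] + f[n + 4]
    y[n + 2] = 5/4 * y[n + 1] - 3/8 * y[n] + f[n + 2]  # incremental output
print(np.allclose(y[:-1], y_ref[:-1]))                # True (last sample not produced here)
```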
Chapter 10 40
Fig. 14: Block structure of the 2nd-order IIR filter (L=3)
(also see Fig.10.20, p.344)
Chapter 10 41
• Comments
  – The original sequential system has 2 poles at {1/2, 3/4}. Now consider the pole locations of the new parallel system. Rewrite the 2 state update equations in matrix form: Y(3k+3) = A·Y(3k) + F, i.e.
      [y(3k+3)]   [ -15/32   19/16 ]   [y(3k)  ]   [f_1]
      [y(3k+4)] = [ -57/128  65/64 ] · [y(3k+1)] + [f_2]              (10.25)
  – The eigenvalues of the system matrix A are (1/2)^3 and (3/4)^3, which are the poles of the new parallel system. Thus, the parallel system is more stable. Note: the parallel system has the same number of poles as the original system
  – For a 2nd-order IIR filter (N=2), there are in total 3L + [(L-2)+(L-1)] + 4 + 2(L-2) = 7L-3 multiplications (the numerator part: 3L; the overhead of loop update: (L-2)+(L-1); the loop multiplications: 4; the incremental computation: 2(L-2)). The multiplication complexity is a linear function of the block size L. This multiplication complexity can be further reduced by using fast parallel filter structures and substructure sharing for the incrementally-computed outputs
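A two-line check of this eigenvalue claim (NumPy), using the state matrix A of (10.25):
```python
import numpy as np

# Sketch: eigenvalues of the 3-parallel state-update matrix A in (10.25)
# are the cubes of the original poles 1/2 and 3/4.
A = np.array([[-15/32, 19/16],
              [-57/128, 65/64]])
print(np.sort(np.linalg.eigvals(A)))   # [0.125, 0.421875] = [(1/2)^3, (3/4)^3]
```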
Chapter 10 42
Combined Pipelining and Parallel Processing
For IIR Filters
• Pipelining and parallel processing can also be combined for IIR filters
to achieve a speedup in sample rate by a factor L×M, where L denotes
the levels of block processing and M denotes stages of pipelining, or to
achieve power reduction at the same speed
• Example (Example 10.6.1, p.345) Consider the 1st-order IIR with transfer
function (10.26). Derive the filter structure with 4-level pipelining and
3-level block processing (i.e., M=4, L=3)
  – Because the filter order is 1, only 1 loop update operation is required. The other two outputs can be computed incrementally.
      H(z) = 1 / (1 - a·z^{-1})                                       (10.26)
Chapter 10 43
– Since pipelining level M=4, the loop must contain 4 delay elements
(shown in Fig.15). Since the block size L=3, each delay element
represents a block delay (corresponds to 3 sample delays). Therefore,
y(3k+12) needs to be expressed in terms of y(3k) and inputs (see Fig. 15).
Fig.15: Loop update for the pipelined block system (also see Fig.10.21, p. 346): the loop contains 4 block delays and the multiplier a^{12}, updating y(3k+12) from y(3k)
Chapter 10 44
  – Substituting n = 3k+12 into y(n) = a·y(n-1) + u(n) = ··· = a^{12}·y(n-12) + a^{11}·u(n-11) + ··· + u(n), we get:
      y(3k+12) = a^{12}·y(3k) + a^{11}·u(3k+1) + ··· + u(3k+12)
               = a^{12}·y(3k) + a^6·f_2(3k+6) + a^3·f_1(3k+9) + f_1(3k+12)
    where
      f_1(3k+12) = a^2·u(3k+10) + a·u(3k+11) + u(3k+12)
      f_2(3k+12) = a^3·f_1(3k+9) + f_1(3k+12)
  – Finally, we have:
      y(3k+12) = a^{12}·y(3k) + a^6·f_2(3k+6) + f_2(3k+12)
      y(3k+1)  = a·y(3k) + u(3k+1)
      y(3k+2)  = a·y(3k+1) + u(3k+2)                                  (10.27)
  – The parallel-pipelined filter structure is shown in Fig. 16
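A numerical sketch of the loop update in (10.27) (NumPy; a = 0.9 and the input are arbitrary choices, and the first four block states are seeded from the serial recursion only to start the comparison):
```python
import numpy as np

# Sketch: check the pipelined (M=4) and 3-parallel (L=3) loop update (10.27)
# against the serial recursion y(n) = a*y(n-1) + u(n).
a = 0.9
rng = np.random.default_rng(4)
u = rng.standard_normal(90)

y_ref = np.zeros(len(u))
for n in range(len(u)):
    y_ref[n] = a * (y_ref[n - 1] if n >= 1 else 0.0) + u[n]

f1 = np.convolve(u, [1.0, a, a**2])[:len(u)]   # f1(n) = u(n) + a*u(n-1) + a^2*u(n-2)
f2 = f1.copy()
f2[3:] += a**3 * f1[:-3]                       # f2(n) = f1(n) + a^3*f1(n-3)

y = np.zeros(len(u))
y[:12] = y_ref[:12]                            # the first 4 block states seed the loop
for k in range(len(u) // 3 - 4):
    n = 3 * k
    y[n + 12] = a**12 * y[n] + a**6 * f2[n + 6] + f2[n + 12]   # loop update
print(np.allclose(y[::3], y_ref[::3]))         # True: the states y(3k) agree
```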
Chapter 10 45
Fig. 16: The filter structure of the pipelined block system
with L=3 & M=4 (also see Fig.10.22, p.347)
Chapter 10 46
• Comments
  – The parallel-pipelined filter has 4 poles: {a^3, -a^3, j·a^3, -j·a^3}. Since the pipelining level is 4 and the filter order is 1, there are in total 4 poles in the new system, which are separated by the same angular distance. Since the block size is 3, the distance of the poles from the origin is a^3.
  – Note: The decomposition method is used here in the pipelining phase.
  – The multiplication complexity (assuming the pipelining level M to be a power of 2) can be calculated as (10.28), which is linear with respect to L and logarithmic with respect to M:
      (L-1) + 1 + log_2(M) + (L-1) = 2L - 1 + log_2(M)                (10.28)
• Example (Example 10.6.2, p. 347): Consider the 2nd-order filter in Example 10.5.3 again; design a pipelined block system for L=3 and M=2
      y(n) = (5/4)·y(n-1) - (3/8)·y(n-2) + f(n),  where f(n) = u(n) + 2·u(n-1) + u(n-2)      (10.29)
Chapter 10 47
  – A method similar to clustered look-ahead can be used to update y(3k+6) and y(3k+7) using y(3k) and y(3k+1). Then, by index substitution, the final system of equations can be derived.
  – Suppose the system update matrix is A. Since the poles of the original system are {1/2, 3/4}, the eigenvalues of A can be verified to be {(1/2)^6, (3/4)^6}
  – The poles of the new parallel-pipelined second-order filter are the square roots of the eigenvalues of A, i.e., {(1/2)^3, -(1/2)^3, (3/4)^3, -(3/4)^3}
• Comments: In general, the systematic approach below can be used to compute the pole locations of the new parallel-pipelined system:
  1. Write the loop update equations using LM-level look-ahead, where M and L denote the level of pipelining and parallel processing, respectively.
  2. Write the state-space representation of the parallel-pipelined filter, where the state matrix A has dimension N×N and N is the filter order.
  3. Compute the eigenvalues λ_i, 1 ≤ i ≤ N, of the matrix A.
  4. The NM poles of the new parallel-pipelined system correspond to the M-th roots of the eigenvalues of A, i.e., λ_i^{1/M}.
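A small sketch of this procedure (NumPy) applied to Example 10.6.2, where the eigenvalues of A are {(1/2)^6, (3/4)^6} and M = 2:
```python
import numpy as np

# Sketch of the systematic procedure above: the poles of the parallel-pipelined
# system are the M-th roots of the eigenvalues of the state matrix A obtained
# from LM-level look-ahead. Values below are those of Example 10.6.2 (L=3, M=2).
M = 2
eigenvalues = [(1/2)**6, (3/4)**6]           # eigenvalues of A (original poles ** LM)
poles = []
for lam in eigenvalues:
    for n in range(M):                       # the M complex M-th roots of lam
        poles.append(abs(lam)**(1/M) * np.exp(1j*(np.angle(lam) + 2*np.pi*n)/M))
print(np.round(np.real(poles), 6))           # [0.125, -0.125, 0.421875, -0.421875]
```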