Example-7 (cont’d)
– In this example, the Winograd convolution algorithm requires 5
multiplications and 11 additions compared with 6 multiplications and 2
additions for direct implementation
• Notes:
– The number of multiplications in Winograd algorithm is highly dependent
on the degree of each . Therefore, the degree of m(p) should be as
small as possible.
– More efficient form (or a modified version) of the Winograd algorithm
can be obtained by letting deg[m(p)]=deg[s(p)] and applying the CRT to
50 trang |
Chia sẻ: huyhoang44 | Lượt xem: 715 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Kỹ thuật viễn thông - Chapter 8: Fast convolution, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
Chapter 8: Fast Convolution
Keshab K. Parhi
2Chap. 8
Chapter 8 Fast Convolution
• Introduction
• Cook-Toom Algorithm and Modified Cook-Toom
Algorithm
• Winograd Algorithm and Modified Winograd Algorithm
• Iterated Convolution
• Cyclic Convolution
• Design of Fast Convolution Algorithm by Inspection
3Chap. 8
Introduction
• Fast Convolution: implementation of convolution algorithm using fewer
multiplication operations by algorithmic strength reduction
• Algorithmic Strength Reduction: Number of strong operations (such as
multiplication operations) is reduced at the expense of an increase in the
number of weak operations (such as addition operations). These are best suited
for implementation using either programmable or dedicated hardware
• Example: Reducing the multiplication complexity in complex number
multiplication:
– Assume (a+jb)(c+dj)=e+jf, it can be expressed using the matrix form, which
requires 4 multiplications and 2 additions:
– However, the number of multiplications can be reduced to 3 at the expense of 3
extra additions by using:
ú
û
ù
ê
ë
é
×ú
û
ù
ê
ë
é -
=ú
û
ù
ê
ë
é
b
a
cd
dc
f
e
î
í
ì
-++=+
-+-=-
)()(
)()(
baddcbbcad
baddcabdac
4Chap. 8
– Rewrite it into matrix form, its coefficient matrix can be decomposed as
the product of a 2X3(C), a 3X3(H)and a 3X2(D) matrix:
• Where C is a post-addition matrix (requires 2 additions), D is a pre-addition
matrix (requires 1 addition), and H is a diagonal matrix (requires 2 additions to
get its diagonal elements)
– So, the arithmetic complexity is reduced to 3 multiplications and 3
additions (not including the additions in H matrix)
• In this chapter we will discuss two well-known approaches to the design of
fast short-length convolution algorithms: the Cook-Toom algorithm (based
on Lagrange Interpolation) and the Winograd Algorithm (based on the
Chinese remainder theorem)
xDHC
b
a
d
dc
dc
f
e
s ×××=ú
û
ù
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
+
-
×ú
û
ù
ê
ë
é
=ú
û
ù
ê
ë
é
=
11
10
01
00
00
00
110
101
5Chap. 8
Cook-Toom Algorithm
• A linear convolution algorithm for polynomial multiplication based on
the Lagrange Interpolation Theorem
• Lagrange Interpolation Theorem:
Let nbb ,....,0 be a set of 1+n distinct points, and let )( if b , for i
= 0, 1, , n be given. There is exactly one polynomial )( pf of degree n or less
that has value )( if b when evaluated at ib for i = 0, 1, , n. It is given by:
Õ
Õ
å
¹
¹
= -
-
=
ij
ji
ij
jn
i
i
p
fpf
)(
)(
)()(
0 bb
b
b
6Chap. 8
• The application of Lagrange interpolation theorem into linear
convolution
Consider an N-point sequence { }110 ,...,, -= Nhhhh
and an L-point sequence { }110 ,...,, -= Lxxxx . The linear
convolution of h and x can be expressed in terms of polynomial
multiplication as follows: )()()( pxphps ×= where
01
1
1 ...)( hphphph
N
N +++=
-
-
01
1
1 ...)( xpxpxpx
L
L +++=
-
-
01
2
2 ...)( spspsps
NL
NL +++=
-+
-+
The output polynomial )( ps has degree 2-+ NL and has
1-+ NL
different points.
7Chap. 8
• (continued)
)( ps can be uniquely determined by its values at 1-+ NL
different points. Let { }210 ,...,, -+ NLbbb be 1-+ NL
different real numbers. If )( is b for { }2,...,1,0 -+= NLi are
known, then )( ps can be computed using the Lagrange interpolation
theorem as:
Õ
Õ
å
¹
¹
-+
= -
-
=
ij
ji
ij
jNL
i
i
p
sps
)(
)(
)()(
2
0 bb
b
b
It can be proved that this equation is the unique solution to compute linear
convolution for )( ps given the values of )( is b , for
{ }2,...,1,0 -+= NLi .
8Chap. 8
• Cook-Toom Algorithm (Algorithm Description)
• Algorithm Complexity
– The goal of the fast-convolution algorithm is to reduce the multiplication
complexity. So, if bi `s (i=0,1,,L+N-2) are chosen properly, the
computation in step-2 involves some additions and multiplications by
small constants
– The multiplications are only used in step-3 to compute s(bi). So, only
L+N-1 multiplications are needed
1. Choose 1-+ NL different real numbers 210 ,, -+××× NLbbb
2. Compute )( ih b and )( ix b , for { }2,,1,0 -+×××= NLi
3. Compute )()()( iii xhs bbb ×= , for { }2,,1,0 -+×××= NLi
4. Compute )( ps by using Õ
Õ
å
¹
¹
-+
= -
-
=
ij
ji
ij
jNL
i
i
p
sps
)(
)(
)()(
2
0 bb
b
b
9Chap. 8
– By Cook-Toom algorithm, the number of multiplications is reduced from
O(LN) to L+N-1 at the expense of an increase in the number of additions
– An adder has much less area and computation time than a multiplier. So,
the Cook-Toom algorithm can lead to large savings in hardware (VLSI)
complexity and generate computationally efficient implementation
• Example-1: (Example 8.2.1, p.230) Construct a 2X2 convolution algorithm using
Cook-Toom algorithm with b={0,1,-1}
– Write 2X2 convolution in polynomial multiplication form as
s(p)=h(p)x(p), where
– Direct implementation, which requires 4 multiplications and 1 additions,
can be expressed in matrix form as follows:
2
210
1010
)(
)()(
pspssps
pxxpxphhph
++=
+=+=
ú
û
ù
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
1
0
1
01
0
2
1
0
0
0
x
x
h
hh
h
s
s
s
10Chap. 8
• Example-1 (continued)
– Next we use C-T algorithm to get an efficient convolution implementation
with reduced multiplication number
– Then, s(b0), s(b1), and s(b2) are calculated, by using 3 multiplications, as
– From the Lagrange Interpolation theorem, we get:
1021022
1011011
00000
)(,)(,2
)(,)(,1
)(,)(,0
xxxhhh
xxxhhh
xxhh
-=-==
+=+==
===
bbb
bbb
bbb
)()()()()()()()()( 222111000 bbbbbbbbb xhsxhsxhs ===
2
2
10
21
0
221
0
1202
10
2101
10
1
2010
21
0
)
2
)()()(()
2
)()(()(
))((
))((
)2(
))((
))(()(
))((
))(()()(
sppss
ssspssps
pp
s
ppsppsps
++=
++-+-+=
--
--
+
--
--+
--
--=
bbbbbb
bbbb
bb
b
bbbb
bbb
bbbb
bbb
11Chap. 8
• Example-1 (continued)
– The preceding computation leads to the following matrix form
– The computation is carried out as follows (5 additions, 3 multiplications)
ú
û
ù
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
+×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
1
0
10
10
0
2
1
0
2
1
0
2
1
0
2
1
0
11
11
01
2)(00
02)(0
00
111
110
001
)(
)(
)(
2)(00
02)(0
00)(
111
110
001
2)(
2)(
)(
111
110
001
x
x
hh
hh
h
x
x
x
h
h
h
s
s
s
s
s
s
b
b
b
b
b
b
b
b
b
210221100
222111000
10210100
10
2
10
100
,,.4
,,.3
,,.2
2
,
2
,.1
SSSsSSsSs
XHSXHSXHS
xxXxxXxX
hhHhhHhH
++-=-==
===
-=+==
-=+== (pre-computed)
12Chap. 8
– (Continued): Therefore, this algorithm needs 3 multiplications and 5
additions (ignoring the additions in the pre-computation ), i.e., the number
of multiplications is reduced by 1 at the expense of 4 extra additions
– Example-2, please see Example 8.2.2 of Textbook (p.231)
• Comments
– Some additions in the preaddition or postaddition matrices can be
shared. So, when we count the number of additions, we only count one
instead of two or three.
– If we take h0, h1 as the FIR filter coefficients and take x0, x1 as the signal
(data) sequence, then the terms H0, H1 need not be recomputed each
time the filter is used. They can be precomputed once offline and stored.
So, we ignore these computations when counting the number of
operations
– From Example-1, We can understand the Cook-Toom algorithm as a
matrix decomposition. In general, a convolution can be expressed in
matrix-vector forms as
ú
û
ù
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
1
0
1
01
0
2
1
0
0
0
x
x
h
hh
h
s
s
s
xTs ×=or
13Chap. 8
– Generally, the equation can be expressed as
• Where C is the postaddition matrix, D is the preaddition matrix, and H is a
diagonal matrix with Hi, i = 0, 1, , L+N-2 on the main diagonal.
– Since T=CHD, it implies that the Cook-Toom algorithm provides a way
to factorize the convolution matrix T into multiplication of 1 postaddition
matrix C, 1 diagonal matrix H and 1 preaddition matrix D, such that the
total number of multiplications is determined only by the non-zero
elements on the main diagonal of the diagonal matrix H
– Although the number of multiplications is reduced, the number of
additions has increased. The Cook-Toom algorithm can be modified in
order to further reduce the number of additions
xDHCxTs ×××=×=
14Chap. 8
Modified Cook-Toom Algorithm
• The Cook-Toom algorithm is used to further reduce the number of
addition operations in linear convolutions
• Now consider the modified Cook-Toom Algorithm
Define
2
2)()('
-+
-+-=
NL
NL pSpsps . Notice that the
degree of )(ps is 2-+ NL and 2-+NLS is its highest order
coefficient. Therefore the degree of )(' ps is 3-+ NL .
15Chap. 8
• Modified Cook-Toom Algorithm
1. Choose 2-+ NL different real numbers 310 ,, -+××× NLbbb
2. Compute )( ih b and )( ix b , for { }3,,1,0 -+×××= NLi
3. Compute )()()( iii xhs bbb ×= , for { }3,,1,0 -+×××= NLi
4. Compute
2
2)()('
-+
-+-=
NL
iNLii sss bbb , for { }3,,1,0 -+×××= NLi
5. Compute )(' ps by using Õ
Õ
å
¹
¹
-+
= -
-
=
ij
ji
ij
jNL
i
i
p
sps
)(
)(
)(')('
2
0 bb
b
b
6. Compute
2
2)(')(
-+
-++=
NL
NL pspsps
16Chap. 8
• Example-3 (Example 8.2.3, p.234) Derive a 2X2 convolution algorithm using the
modified Cook-Toom algorithm with b={0,-1}
– and
• Which requires 2 multiplications (not counting the h1x1
multiplication)
– Apply the Lagrange interpolation algorithm, we get:
Consider the Lagrange interpolation for
2
11)()(' pxhpsps -= at
{ }1,0 10 -== bb .
First, find
2
11)()()(' iiii xhxhs bbbb -=
1011011
00000
)(,)(,1
)(,)(,0
xxxhhh
xxhh
-=-=-=
===
bbb
bbb
111010
2
111111
00
2
011000
))(()()()('
)()()('
xhxxhhxhxhs
xhxhxhs
---=-=
=-=
bbbb
bbbb
))(')('()('
)(
)(
)('
)(
)(
)(')('
100
01
0
1
10
1
0
bbb
bb
b
b
bb
b
b
ssps
p
s
p
sps
-+=
-
-
+
-
-
=
17Chap. 8
• Example-3 (cont’d)
– Therefore,
– Finally, we have the matrix-form expression:
– Notice that
– Therefore:
2
210
2
11)(')( pspsspxhpsps ++=+=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
11
1
0
2
1
0
)('
)('
100
011
001
xh
s
s
s
s
s
b
b
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
11
1
0
11
1
0
)(
)(
100
110
001
)('
)('
xh
s
s
xh
s
s
b
b
b
b
ú
û
ù
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é
1
0
1
10
0
11
1
0
2
1
0
10
11
01
00
00
00
100
111
001
)(
)(
100
110
001
100
011
001
x
x
h
hh
h
xh
s
s
s
s
s
b
b
18Chap. 8
• Example-3 (cont’d)
– The computation is carried out as the follows:
– The total number of operations are 3 multiplications and 3 additions.
Compared with the convolution algorithm in Example-1, the number of
addition operations has been reduced by 2 while the number of
multiplications remains the same.
• Example-4 (Example 8.2.4, p. 236 of Textbook)
• Conclusion: The Cook-Toom Algorithm is efficient as measured by the
number of multiplications. However, as the size of the problem increases, it is
not efficient because the number of additions increases greatly if b takes
values other than {0, ±1, ±2, ±4}. This may result in complicated pre-addition
and post-addition matrices. For large-size problems, the Winograd algorithm is
more efficient.
22210100
222111000
1210100
1210100
,,.4
,,.3
,,.2
,,.1
SsSSSsSs
XHSXHSXHS
xXxxXxX
hHhhHhH
=+-==
===
=-==
=-== (pre-computed)
19Chap. 8
Winograd Algorithm
• The Winograd short convolution algorithm: based on the CRT (Chinese
Remainder Theorem) ---It’s possible to uniquely determine a nonnegative integer given
only its remainder with respect to the given moduli, provided that the moduli are relatively
prime and the integer is known to be smaller than the product of the moduli
• Theorem: CRT for Integers
Given [ ]cRc imi = (represents the remainder when c is divided by im ), for
ki ,...,1,0= , where im are moduli and are relatively prime, then
MMNcc
k
i
iii mod
0
÷
ø
ö
ç
è
æ
= å
=
, where Õ ==
k
i i
mM
0 , ii mMM = ,
and iN is the solution of 1),( ==+ iiiiii mMGCDmnMN , provided
that Mc <£0
20Chap. 8
• Theorem: CRT for Polynomials
• Example-5 (Example 8.3.1, p.239): using the CRT for integer, Choose
moduli m0=3, m1=4, m2=5. Then , and .
Then:
– where and are obtained using the Euclidean GCD algorithm. Given
that the integer c satisfying , let .
Given [ ])()()( )()( pcpRpc imi = , for i=0, 1, ,k, where )()( pm i are
relatively prime, then )(mod)()()()(
0
)()()( pMpMpNpcpc
k
i
iii ÷
ø
ö
ç
è
æ
= å
=
, where
Õ ==
k
i
i pmpM
0
)( )()( , )()()( )()( pmpMpM ii = , and )()( pN i is the solution of
1))(),(()()()()( )()()()()()( ==+ pmpMGCDpmpnpMpN iiiiii Provided that the
degree of )( pc is less than the degree of )( pM
60210 == mmmM ii mMM =
15)5(12)2(,12,5
14)4(15)1(,15,4
1)3(720)1(,20,3
22
11
00
=+-==
=+-==
=+-==
Mm
Mm
Mm
iN in
Mc <£0 [ ]cRc imi =
21Chap. 8
• Example-5 (cont’d)
– The integer c can be calculated as
– For c=17,
• CRT for polynomials: The remainder of a polynomial with regard to modulus
, where , can be evaluated by substituting by
in the polynomial
• Example-6 (Example 8.3.2, pp239)
60mod)241520(mod 210
0
cccMMNcc
k
i
iii *-*-*-=÷
ø
ö
ç
è
æ
= å
=
2)17(,1)17(,2)17( 524130 ====== RcRcRc
( ) 1760mod10360mod)224115220( =-=*-*-*-=c
)(pfpi + 1))(deg( -£ ipf ip
)( pf-
[ ]
[ ]
[ ] 5253)2(5535).(
5353)2(5535).(
195)2(3)2(5535).(
2
2
2
2
22
2
2
2
--=++--=++
-=++-=++
=+-+-=++
++
+
+
xxxxxRc
xxxxRb
xxRa
xx
x
x
22Chap. 8
• Winograd Algorithm
– 1. Choose a polynomial with degree higher than the degree of
and factor it into k+1 relatively prime polynomials with real
coefficients, i.e.,
– 2. Let . Use the Euclidean GCD algorithm to
solve for .
– 3. Compute:
– 4. Compute:
– 5. Compute by using:
)()()()( )()1()0( pmpmpmpm k×××=
)()( pxph
)()()( )()( pmpmpM ii =
1)()()()( )()()()( =+ pmpnpMpN iiii )()( pN i
kifor
pmpxpxpmphph iiii
,,1,0
)(mod)()(),(mod)()( )()()()(
×××=
==
kiforpmpxphps iiii ,,1,0),(mod)()()( )()()()( ×××==
å
=
=
k
i
iiii pmpMpNpsps
0
)()()()( )(mod)()()()(
)( pm
)( ps
23Chap. 8
• Example-7 (Example 8.3.3, p.240) Consider a 2X3 linear convolution as in
Example 8.2.2. Construct an efficient realization using Winograd algorithm
with
– Let:
– Construct the following table using the relationships
and
– Compute residues from :
)1)(1()( 2 +-= ppppm
1)(,1)(,)( 2)2()1()0( +=-== ppmppmppm
1)()()()( )()()()( =+ pmpnpMpN iiii
)()()( )()( pmpmpM ii =
2,1,0=ifor
i )()( pm i )()( pM i )()( pn i )()( pN i
0 p 123 -+- ppp 12 +- pp 1-
1 1-p pp +3 ( )2221 ++- pp 21
2 12 +p pp -2 ( )221 -- p ( )121 -p
2
21010 )(,)( pxpxxpxphhph ++=+=
pxxxpxphhph
xxxpxhhph
xpxhph
120)2(10
)2(
210
)1(
10
)1(
0
)0(
0
)0(
)()(,)(
)(,)(
)(,)(
+-=+=
++=+=
==
24Chap. 8
• Example-7 (cont’d)
– Notice, we need 1 multiplication for , 1 for , and 4 for
– However it can be further reduced to 3 multiplications as shown below:
– Then:
psspxxhxhxhxxh
ppxxxphhps
sxxxhhpssxhps
)2(
1
)2(
02011011200
12010
)2(
)1(
021010
)1()0(
000
)0(
))(()(
)1mod()))((()(
))(()(,)(
2
+=-++--=
++-+=
=+++===
)()0( ps )()1( ps )()2( ps
ú
ú
ú
û
ù
ê
ê
ê
ë
é
-
-+
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
+
-×ú
û
ù
ê
ë
é
-
-
=ú
û
ù
ê
ë
é
1
20
210
10
10
0
)2(
1
)2(
0
00
00
00
011
101
x
xx
xxx
hh
hh
h
s
s
[ ]
)mod(
)2()()1)((
)(mod)()()()(
234
23
2
)(3
2
)(23)0(
2
0
)()()()(
)2()1(
pppp
ppppppppps
pmpMpNpsps
pSpS
i
iiii
-+-
+-+++-+--=
= å
=
25Chap. 8
• Example-7 (cont’d)
– Substitute into to obtain the following
table
– Therefore, we have
)(),(),( )2()1()0( pspsps )( ps
0p 1p 2p 3p
)0(
0s
)0(
0s-
)0(
0s
)0(
0s-
0 )1(
02
1 s 0 )1(021 s
0 )2(
02
1 s )2(0s-
)2(
02
1 s
0 )2(
12
1 s 0 )2(12
1 s-
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
--
-
-
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
)2(
12
1
)2(
02
1
)1(
02
1
)0(
0
3
2
1
0
1111
0201
1111
0001
s
s
s
s
s
s
s
s
26Chap. 8
• Example-7 (cont’d)
– Notice that
– So, finally we have:
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
-
-+
++
×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
-
=
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ë
é
+
-
+
1
20
210
210
0
2
2
2
2
0
)2(
12
1
)2(
02
1
)1(
02
1
)0(
0
10
01
0
10
0000
0000
0000
0000
0000
01100
10100
00010
00001
x
xx
xxx
xxx
xh
s
s
s
s
hh
hh
h
hh
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
-
-×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
---
-
--
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
+
-
+
2
1
0
2
2
2
2
0
3
2
1
0
010
101
111
111
001
0000
0000
0000
0000
0000
11001
20201
11211
00001
10
01
0
10
x
x
x
h
s
s
s
s
hh
hh
h
hh
27Chap. 8
• Example-7 (cont’d)
– In this example, the Winograd convolution algorithm requires 5
multiplications and 11 additions compared with 6 multiplications and 2
additions for direct implementation
• Notes:
– The number of multiplications in Winograd algorithm is highly dependent
on the degree of each . Therefore, the degree of m(p) should be as
small as possible.
– More efficient form (or a modified version) of the Winograd algorithm
can be obtained by letting deg[m(p)]=deg[s(p)] and applying the CRT to
)()( pm i
)()()(' 11 pmxhpsps LN ---=
28Chap. 8
Modified Winograd Algorithm
– 1. Choose a polynomial with degree equal to the degree of
and factor it into k+1 relatively prime polynomials with real coefficients,
i.e.,
– 2. Let , use the Euclidean GCD algorithm to
solve for .
– 3. Compute:
– 4. Compute:
– 5. Compute by using:
– 6. Compute
)( pm )( ps
)()()()( )()1()0( pmpmpmpm k×××=
)()()( )()( pmpmpM ii =
1)()()()( )()()()( =+ pmpnpMpN iiii )()( pN i
kifor
pmpxpxpmphph iiii
,,1,0
)(mod)()(),(mod)()( )()()()(
×××=
==
kiforpmpxphps iiii ,,1,0),(mod)()()(' )()()()( ×××==
)(' ps
å
=
=
k
i
iiii pmpMpNpsps
0
)()()()( )(mod)()()(')('
)()(')( 11 pmxhpsps LN --+=
29Chap. 8
• Example-8 (Example 8.3.4, p.243 ): Construct a 2X3 convolution algorithm
using modified Winograd algorithm with m(p)=p(p-1)(p+1)
– Let
– Construct the following table using the relationships
and
– Compute residues from :
1)(,1)(,)( )2()1()0( +=-== ppmppmppm
)()()( )()( pmpmpM ii =
1)()()()( )()()()( =+ pmpnpMpN iiii
i )()( pm i )()( pM i )()( pn i )()( pN i
0 p 12 -p p 1-
1 1-p pp +2 ( )221 +- p 21
2 1+p pp -2 ( )221 -- p 21
2
21010 )(,)( pxpxxpxphhph ++=+=
210)2(10
)2(
210
)1(
10
)1(
0
)0(
0
)0(
)(,)(
)(,)(
)(,)(
xxxpxhhph
xxxpxhhph
xpxhph
+-=-=
++=+=
==
))(()('
,))(()(',)('
21010
)2(
21010
)1(
00
)0(
xxxhhps
xxxhhpsxhps
+--=
+++==
30Chap. 8
• Example-8 (cont’d)
– Since the degree of is equal to 1, is a polynomial of
degree 0 (a constant). Therefore, we have:
– The algorithm can be written in matrix form as:
)()( pm i )(' )( ps i
[ ]
)()'()('
)()()()1('
)()(')(
21
3
2
'
2
')0(2
212
)2('
2
)1(')0(
3
21
2
2
'2
2
'2)0(
21
)2()1(
)2()1(
xhpspxhps
ppxhppppps
pmxhpsps
ssss
SS
+++-+--+=
-+-++++--=
+=
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
-
--
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
21
2
'
2
'
)0(
3
2
1
0
)2(
)1(
'
1000
0111
1110
0001
xh
s
s
s
s
s
s
s
31Chap. 8
• Example-8 (cont’d)
– (matrix form)
– Conclusion: this algorithm requires 4 multiplications and 7 additions
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
-
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
-
--
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
-
+
2
1
0
1
2
2
0
3
2
1
0
100
111
111
001
000
000
000
000
1000
0111
1110
0001
10
10
x
x
x
h
h
s
s
s
s
hh
hh
32Chap. 8
Iterated Convolution
• Iterated convolution algorithm: makes use of efficient short-length
convolution algorithms iteratively to build long convolutions
• Does not achieve minimal multiplication complexity, but achieves a
good balance between multiplications and addition complexity
• Iterated Convolution Algorithm (Description)
– 1. Decompose the long convolution into several levels of short
convolutions
– 2. Construct fast convolution algorithms for short convolutions
– 3. Use the short convolution algorithms to iteratively (hierarchically)
implement the long convolution
– Note: the order of short convolutions in the decomposition affects the
complexity of the derived long convolution
33Chap. 8
• Example-9 (Example 8.4.1, pp.245): Construct a 4X4 linear convolution
algorithm using 2X2 short convolution
– Let
and
– First, we need to decompose the 4X4 convolution into a 2X2 convolution
– Define
– Then, we have:
3
3
2
210
3
3
2
210 )(,)( pxpxpxxpxphphphhph +++=+++=
)()()( pxphps =
pxxpxpxxpx
phhphphhph
321100
321100
)(',)('
)(',)('
+=+=
+=+=
qpxpxqpxpxeippxpxpx
qphphqphpheipphphph
)(')('),()(.,.,)(')(')(
)(')('),()(.,.,)(')(')(
10
2
10
10
2
10
+==+=
+==+=
[ ] [ ]
[ ]
),()(')(')('
)(')(')(')(')(')(')(')('
)(')(')(')('
),(),()()()(
2
210
2
11011000
1010
qpsqpsqpsps
qpxphqpxphpxphpxph
qpxpxqphph
qpxqphpxphps
=++=
+++=
+×+=
==
34Chap. 8
• Example-9 (cont’d)
– Therefore, the 4X4 convolution is decomposed into two levels of nested
2X2 convolutions
– Let us start from the first convolution , we have:
– We have the following expression for the third convolution:
– For the second convolution, we get the following expression:
)(')(')(' 000 pxphps ×=
[ ]1100101021100
10100000
)()(
)()('')(')('
xhxhxxhhppxhxh
pxxphhxhpxph
--+×+++=
+×+=׺×
[ ]3322323223322
323211112
)()(
)()('')(')(')('
xhxhxxhhppxhxh
pxxphhxhpxphps
--+×+++=
+×+=׺×=
[ ]11001010
011001101
'''')''()''(
'''')(')(')(')(')('
xhxhxxhh
xhxhpxphpxphps
×-×-+×+=
×+׺×+×=
: addition: multiplication
35Chap. 8
• Example-9 (Cont’d)
• For , we have the following expression:
– If we rewrite the three convolutions as the following expressions, then we can get
the following table (see the next page):
[ ])''()''( 1010 xxhh +×+
[ ] [ ]
)]()()()(
)()[(
)()()()(
)()()()()''()''(
31312020
32103210
3131
2
2020
312031201010
xxhhxxhh
xxxxhhhhp
xxhhpxxhh
xxpxxhhphhxxhh
+×+-+×+-
+++×++++
+×+++×+=
+++×+++=+×+
( ) ( ) 32211010
3
2
2111
3
2
2100
''''
''
''
cppccxxhh
bppbbxh
appaaxh
++º+×+
++º
++º
This requires 9 multiplications and 11 additions
36Chap. 8
• Example-9 (cont’d)
– Therefore, the total number of operations used in this 4X4 iterated
convolution algorithm is 9 multiplications and 19 additions
0p 1p 2p 3p 4p 5p 6p
1a 2a 3a 1b 2b 3b
1c 2c 3c
1b- 2b- 3b-
1a- 2a- 3a-
Total 8 additions here
37Chap. 8
Cyclic Convolution
• Cyclic convolution: also known as circular convolution
• Let the filter coefficients be , and the data
sequence be .
– The cyclic convolution can be expressed as
– The output samples are given by
• where denotes
• The cyclic convolution can be computed as a linear convolution
reduced by modulo . (Notice that there are 2n-1 different output
samples for this linear convolution). Alternatively, the cyclic
convolution can be computed using CRT with , which
is much simpler.
{ }110 ,,, -×××= nhhhh
{ }110 ,,, -×××= nxxxx
[ ] )1mod()()()( -×=O= nn ppxphxhps
( )( )å
-
=
- -×××==
1
0
1,,1,0,
n
k
kkii nixhs
( ) nki mod-( )( )ki -
1-np
1)( -= nppm
38Chap. 8
• Example-10 (Example 8.5.1, p.246) Construct a 4X4 cyclic convolution
algorithm using CRT with
– Let
– Let
– Get the following table using the relationships
and
– Compute the residues
)1)(1)(1(1)( 24 ++-=-= pppppm
3
3
2
210
3
3
2
210 )(,)( pxpxpxxpxphphphhph +++=+++=
1)(,1)(,1)( 2)2()1()0( +=+=-= ppmppmppm
)()()( )()( pmpmpM ii =
1)()()()( )()()()( =+ pmpnpMpN iiii
i )()( pm i )()( pM i )()( pn i )()( pN i
0 1-p 123 -++ ppp )32( 24
1 ++- pp 4
1
1 1+p 123 -+- ppp ( )32241 +- pp 41-
2 12 +p 12 -p 21 21-
( ) ( ) phhphhhhph
hhhhhph
hhhhhph
)2(
1
)2(
03120
)2(
)1(
03210
)1(
)0(
03210
)0(
)(
,)(
,)(
+=-+-=
=-+-=
=+++=
39Chap. 8
• Example-10 (cont’d)
– Since
– or in matrix-form
– Computations so far require 5 multiplications
( ) ( ) pxxpxxxxpx
xxxxxpx
xxxxxpx
)2(
1
)2(
03120
)2(
)1(
03210
)1(
)0(
03210
)0(
)(
,)(
,)(
+=-+-=
=-+-=
=+++=
[ ]
( ) ( ))2(0)2(1)2(1)2(0)2(1)2(1)2(0)2(0
2)2()2()2(
1
)2(
0
)2(
)1(
0
)1(
0
)1(
0
)1()1()1(
)0(
0
)0(
0
)0(
0
)0()0()0(
)1mod()()()(
,)()()(
,)()()(
xhxhpxhxh
ppxphpssps
sxhpxphps
sxhpxphps
++×-×=
+×=+=
=×=×=
=×=×=
( ) ( )
( ) ( ) ,
,
)2(
0
)2(
0
)2(
1
)2(
1
)2(
0
)2(
0
)2(
0
)2(
1
)2(
1
)2(
0
)2(
1
)2(
1
)2(
1
)2(
0
)2(
1
)2(
0
)2(
0
)2(
1
)2(
1
)2(
0
)2(
0
)2(
0
xhhxxhxhxhs
xhhxxhxhxhs
-++=+=
+-+=-=
ú
ú
ú
û
ù
ê
ê
ê
ë
é +
×
ú
ú
ú
û
ù
ê
ê
ê
ë
é
+
-×ú
û
ù
ê
ë
é -
=ú
û
ù
ê
ë
é
)2(
1
)2(
0
)2(
1
)2(
0
)2(
1
)2(
0
)2(
0
)2(
1
)2(
0
)2(
1
)2(
0
00
00
00
011
101
x
x
xx
hh
hh
h
s
s
: multiplication
40Chap. 8
• Example-10 (cont’d)
– Then
– So, we have
[ ] ( )
( ) ( ) ( )
( ))2(121)1(041)0(0413
244
2
244244
2
1)2(
12
1)2(
04
1)1(
04
1)0(
0
2
0
)()()()(
)2(
0
)1(
0
)0(
0
)2(
1
)1(
0
)0(
0
)2(
0
)1(
0
)0(
0
222323
)()()(
)(mod)()()()(
sssp
pp
pssss
pmpMpNpsps
sssssssss
pppppppp
i
iiii
--+
-+++-+++=
×+++=
=
-
-
-
-
-
-+-+++
=
å
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
--
-
-
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
)2(
12
1
)2(
02
1
)1(
04
1
)0(
04
1
3
2
1
0
1011
0111
1011
0111
s
s
s
s
s
s
s
s
41Chap. 8
• Example-10 (cont’d)
– Notice that:
( )
( ) ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
+×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
-
-
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
-
=
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ë
é
)2(
1
)2(
0
)2(
1
)2(
0
)1(
0
)0(
0
)2(
1
)2(
02
1
)2(
0
)2(
12
1
)2(
02
1
)1(
04
1
)0(
04
1
)2(
12
1
)2(
02
1
)1(
04
1
)0(
04
1
0000
0000
0000
0000
0000
01100
10100
00010
00001
x
x
xx
x
x
hh
hh
h
h
h
s
s
s
s
42Chap. 8
• Example-10 (cont’d)
– Therefore, we have
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
-
-
--
--
×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
---
-
-
-
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
--+
-++-
-
-+-
+++
3
2
1
0
2
2
2
4
4
3
2
1
0
1010
0101
1111
1111
1111
0000
0000
0000
0000
0000
01111
10111
01111
10111
3210
3210
20
3210
3210
x
x
x
x
s
s
s
s
hhhh
hhhh
hh
hhhh
hhhh
43Chap. 8
• Example-10 (cont’d)
– This algorithm requires 5 multiplications and 15 additions
– The direct implementation requires 16 multiplications and 12 additions
(see the following matrix-form. Notice that the cyclic convolution matrix
is a circulant matrix)
• An efficient cyclic convolution algorithm can often be easily extended
to construct efficient linear convolution
• Example-11 (Example 8.5.2, p.249) Construct a 3X3 linear convolution
using 4X4 cyclic convolution algorithm
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
=
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
3
2
1
0
0123
3012
2301
1230
3
2
1
0
x
x
x
x
hhhh
hhhh
hhhh
hhhh
s
s
s
s
44Chap. 8
• Example-11 (cont’d)
– Let the 3-point coefficient sequence be , and the 3-point
data sequence be
– First extend them to 4-point sequences as:
– Then the 3X3 linear convolution of h and x is
– The 4X4 cyclic convolution of h and x, i.e. , is:
{ }210 ,, hhhh ={ }210 ,, xxxx =
{ } { }0,,,,0,,, 210210 xxxxhhhh ==
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
+
++
+
=×
22
2112
201102
1001
00
xh
xhxh
xhxhxh
xhxh
xh
xh
xh 4O
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ë
é
+
++
+
+
=O
2112
201102
1001
2200
4
xhxh
xhxhxh
xhxh
xhxh
xh
45Chap. 8
• Example-11 (cont’d)
– Therefore, we have
– Using the result of Example-10 for , the following convolution
algorithm for 3X3 linear convolution is obtained:
)1()()()( 422 -+O=×= pxhxhpxphps n
xh 4O
×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
---
-
-
--
=
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
100000
001111
010111
001111
110111
4
3
2
1
0
s
s
s
s
s
(continued on the next page)
46Chap. 8
• Example-11 (cont’d)
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ê
ê
ë
é
-
-
-
×
ú
ú
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ê
ê
ë
é
×
-+
++-
-
+-
++
2
1
0
2
2
2
2
4
4
100
010
101
111
111
111
00000
00000
00000
00000
00000
00000
210
210
20
210
210
x
x
x
h
hhh
hhh
hh
hhh
hhh
47Chap. 8
• Example-11 (cont’d)
– So, this algorithm requires 6 multiplications and 16 additions
• Comments:
– In general, an efficient linear convolution can be used to obtain an
efficient cyclic convolution algorithm. Conversely, an efficient
cyclic convolution algorithm can be used to derive an efficient
linear convolution algorithm
48Chap. 8
Design of fast convolution algorithm by
inspection
• When the Cook-Toom or the Winograd algorithms can not generate an
efficient algorithm, sometimes a clever factorization by inspection may
generate a better algorithm
• Example-12 (Example 8.6.1, p.250) Construct a 3X3 fast convolution
algorithm by inspection
– The 3X3 linear convolution can be written as follows, which requires 9
multiplications and 4 additions
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
+
++
+
=
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
22
2112
201102
1001
00
4
3
2
1
0
xh
xhxh
xhxhxh
xhxh
xh
s
s
s
s
s
49Chap. 8
• Example-12 (cont’d)
– Using the following identities:
– The 3X3 linear convolution can be written as:
( ) ( )
( )( )
( )( ) 2211212121123
22110020202011022
1100101010011
xhxhxxhhxhxhs
xhxhxhxxhhxhxhxhs
xhxhxxhhxhxhs
--++=+=
-+-++=++=
--+×+=×+=
×
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
--
--
--
=
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
000100
100110
010111
001011
000001
4
3
2
1
0
s
s
s
s
s
(continued on
the next page)
50Chap. 8
• Example-12 (cont’d)
– Conclusion: This algorithm, which can not be obtained by using
the Cook-Toom or the Winograd algorithms, requires 6
multiplications and 10 additions
ú
ú
ú
û
ù
ê
ê
ê
ë
é
×
ú
ú
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ê
ê
ë
é
×
ú
ú
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ê
ê
ë
é
+
+
+
×
2
1
0
21
20
10
2
1
0
110
101
011
100
010
001
00000
00000
00000
00000
00000
00000
x
x
x
hh
hh
hh
h
h
h
Các file đính kèm theo tài liệu này:
- chap_chap8_1691_1903.pdf