Telecommunications Engineering - Chapter 8: Fast Convolution


Chapter 8: Fast Convolution
Keshab K. Parhi

• Introduction
• Cook-Toom Algorithm and Modified Cook-Toom Algorithm
• Winograd Algorithm and Modified Winograd Algorithm
• Iterated Convolution
• Cyclic Convolution
• Design of Fast Convolution Algorithm by Inspection

Introduction
• Fast convolution: implementation of a convolution algorithm using fewer multiplication operations by algorithmic strength reduction
• Algorithmic strength reduction: the number of strong operations (such as multiplications) is reduced at the expense of an increase in the number of weak operations (such as additions). Such algorithms are best suited for implementation using either programmable or dedicated hardware
• Example: reducing the multiplication complexity of complex number multiplication:
– Assume $(a+jb)(c+jd) = e+jf$. In matrix form, this requires 4 multiplications and 2 additions:
$$\begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} c & -d \\ d & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$$
– However, the number of multiplications can be reduced to 3 at the expense of 3 extra additions by using:
$$\begin{cases} ac - bd = a(c-d) + d(a-b) \\ ad + bc = b(c+d) + d(a-b) \end{cases}$$
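The three-multiplication identity can be checked with a few lines of Python. This is only an illustrative sketch; the function name `complex_mul_3` is an assumption and does not come from the slides.

```python
def complex_mul_3(a, b, c, d):
    """Return (e, f) with (a + jb)(c + jd) = e + jf, using 3 real multiplications."""
    shared = d * (a - b)       # multiplication 1, shared by both outputs
    e = a * (c - d) + shared   # multiplication 2: equals ac - bd
    f = b * (c + d) + shared   # multiplication 3: equals ad + bc
    return e, f


# Compare against Python's built-in complex arithmetic.
a, b, c, d = 3, 4, 5, -2
e, f = complex_mul_3(a, b, c, d)
assert complex(e, f) == complex(a, b) * complex(c, d)
```

If c and d are fixed coefficients, the terms c-d and c+d can be formed once in advance, which leaves 3 multiplications and 3 additions per product.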
– Rewriting the three-multiplication form as a matrix equation, the coefficient matrix can be decomposed as the product of a 2×3 matrix (C), a 3×3 matrix (H), and a 3×2 matrix (D):
$$s = \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} c-d & 0 & 0 \\ 0 & c+d & 0 \\ 0 & 0 & d \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix} = C \cdot H \cdot D \cdot x$$
• Where C is a post-addition matrix (requires 2 additions), D is a pre-addition matrix (requires 1 addition), and H is a diagonal matrix (requires 2 additions to get its diagonal elements)
– So, the arithmetic complexity is reduced to 3 multiplications and 3 additions (not including the additions in the H matrix)
• In this chapter we will discuss two well-known approaches to the design of fast short-length convolution algorithms: the Cook-Toom algorithm (based on Lagrange interpolation) and the Winograd algorithm (based on the Chinese remainder theorem)

Cook-Toom Algorithm
• A linear convolution algorithm for polynomial multiplication based on the Lagrange Interpolation Theorem
• Lagrange Interpolation Theorem: Let $\beta_0, \ldots, \beta_n$ be a set of $n+1$ distinct points, and let $f(\beta_i)$, for $i = 0, 1, \ldots, n$, be given. There is exactly one polynomial $f(p)$ of degree $n$ or less that has value $f(\beta_i)$ when evaluated at $\beta_i$ for $i = 0, 1, \ldots, n$. It is given by:
$$f(p) = \sum_{i=0}^{n} f(\beta_i) \frac{\prod_{j \neq i} (p - \beta_j)}{\prod_{j \neq i} (\beta_i - \beta_j)}$$

• The application of the Lagrange interpolation theorem to linear convolution
Consider an N-point sequence $h = \{h_0, h_1, \ldots, h_{N-1}\}$ and an L-point sequence $x = \{x_0, x_1, \ldots, x_{L-1}\}$. The linear convolution of h and x can be expressed in terms of polynomial multiplication as follows:
$$s(p) = h(p) \cdot x(p)$$
where
$$h(p) = h_{N-1} p^{N-1} + \cdots + h_1 p + h_0$$
$$x(p) = x_{L-1} p^{L-1} + \cdots + x_1 p + x_0$$
$$s(p) = s_{L+N-2}\, p^{L+N-2} + \cdots + s_1 p + s_0$$
The output polynomial $s(p)$ has degree $L+N-2$ and therefore has $L+N-1$ coefficients.

• (continued) $s(p)$ can be uniquely determined by its values at $L+N-1$ different points. Let $\{\beta_0, \beta_1, \ldots, \beta_{L+N-2}\}$ be $L+N-1$ different real numbers.
If $s(\beta_i)$, for $i = 0, 1, \ldots, L+N-2$, are known, then $s(p)$ can be computed using the Lagrange interpolation theorem as:
$$s(p) = \sum_{i=0}^{L+N-2} s(\beta_i) \frac{\prod_{j \neq i} (p - \beta_j)}{\prod_{j \neq i} (\beta_i - \beta_j)}$$
It can be proved that this equation is the unique solution to compute the linear convolution $s(p)$ given the values of $s(\beta_i)$, for $i = 0, 1, \ldots, L+N-2$.

• Cook-Toom Algorithm (Algorithm Description)
1. Choose $L+N-1$ different real numbers $\beta_0, \beta_1, \ldots, \beta_{L+N-2}$
2. Compute $h(\beta_i)$ and $x(\beta_i)$, for $i = 0, 1, \ldots, L+N-2$
3. Compute $s(\beta_i) = h(\beta_i) \cdot x(\beta_i)$, for $i = 0, 1, \ldots, L+N-2$
4. Compute $s(p)$ by using
$$s(p) = \sum_{i=0}^{L+N-2} s(\beta_i) \frac{\prod_{j \neq i} (p - \beta_j)}{\prod_{j \neq i} (\beta_i - \beta_j)}$$
• Algorithm Complexity
– The goal of the fast-convolution algorithm is to reduce the multiplication complexity. So, if the $\beta_i$ ($i = 0, 1, \ldots, L+N-2$) are chosen properly, the computation in step 2 involves only some additions and multiplications by small constants
– The multiplications are only used in step 3 to compute $s(\beta_i)$. So, only $L+N-1$ multiplications are needed
– By the Cook-Toom algorithm, the number of multiplications is reduced from O(LN) to L+N-1 at the expense of an increase in the number of additions
– An adder has much less area and computation time than a multiplier. So, the Cook-Toom algorithm can lead to large savings in hardware (VLSI) complexity and generate computationally efficient implementations

• Example-1: (Example 8.2.1, p.230) Construct a 2×2 convolution algorithm using the Cook-Toom algorithm with $\beta = \{0, 1, -1\}$
– Write the 2×2 convolution in polynomial multiplication form as $s(p) = h(p)x(p)$, where
$$h(p) = h_0 + h_1 p, \quad x(p) = x_0 + x_1 p, \quad s(p) = s_0 + s_1 p + s_2 p^2$$
– Direct implementation, which requires 4 multiplications and 1 addition, can be expressed in matrix form as follows:
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$$
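The four Cook-Toom steps can be sketched generically in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`poly_eval`, `poly_mul_linear`, `cook_toom`) are hypothetical, polynomials are stored as coefficient lists in ascending powers of p, and `Fraction` is used so that the interpolation stays exact for integer inputs. In software the interpolation itself costs arithmetic; the point is that only the L+N-1 products in step 3 depend on both h and x.

```python
from fractions import Fraction


def poly_eval(coeffs, point):
    """Evaluate coeffs[0] + coeffs[1]*p + ... at p = point (Horner's rule)."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * point + c
    return acc


def poly_mul_linear(poly, root):
    """Multiply poly(p) by the linear factor (p - root)."""
    out = [Fraction(0)] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k + 1] += c
        out[k] -= root * c
    return out


def cook_toom(h, x, betas):
    """Linear convolution of h and x by evaluation and Lagrange interpolation."""
    n_out = len(h) + len(x) - 1
    betas = [Fraction(b) for b in betas]          # step 1: the chosen points
    if len(set(betas)) != n_out:
        raise ValueError("need L+N-1 distinct interpolation points")
    # Steps 2 and 3: evaluate h and x, then one multiplication per point.
    s_at = [poly_eval(h, b) * poly_eval(x, b) for b in betas]
    # Step 4: accumulate the Lagrange interpolation sum coefficient by coefficient.
    s = [Fraction(0)] * n_out
    for i, b_i in enumerate(betas):
        basis, denom = [Fraction(1)], Fraction(1)
        for j, b_j in enumerate(betas):
            if j != i:
                basis = poly_mul_linear(basis, b_j)
                denom *= b_i - b_j
        for k, c in enumerate(basis):
            s[k] += s_at[i] * c / denom
    return s


# 2x2 case with beta = {0, 1, -1}: (1 + 2p)(3 + 4p) = 3 + 10p + 8p^2
assert cook_toom([1, 2], [3, 4], [0, 1, -1]) == [3, 10, 8]
```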
• Example-1 (continued)
– Next we use the Cook-Toom algorithm to get an efficient convolution implementation with a reduced number of multiplications:
$$\begin{aligned} \beta_0 &= 0, & h(\beta_0) &= h_0, & x(\beta_0) &= x_0 \\ \beta_1 &= 1, & h(\beta_1) &= h_0 + h_1, & x(\beta_1) &= x_0 + x_1 \\ \beta_2 &= -1, & h(\beta_2) &= h_0 - h_1, & x(\beta_2) &= x_0 - x_1 \end{aligned}$$
– Then, $s(\beta_0)$, $s(\beta_1)$, and $s(\beta_2)$ are calculated, by using 3 multiplications, as
$$s(\beta_0) = h(\beta_0)x(\beta_0), \quad s(\beta_1) = h(\beta_1)x(\beta_1), \quad s(\beta_2) = h(\beta_2)x(\beta_2)$$
– From the Lagrange interpolation theorem, we get:
$$\begin{aligned} s(p) &= s(\beta_0)\frac{(p-\beta_1)(p-\beta_2)}{(\beta_0-\beta_1)(\beta_0-\beta_2)} + s(\beta_1)\frac{(p-\beta_0)(p-\beta_2)}{(\beta_1-\beta_0)(\beta_1-\beta_2)} + s(\beta_2)\frac{(p-\beta_0)(p-\beta_1)}{(\beta_2-\beta_0)(\beta_2-\beta_1)} \\ &= s(\beta_0) + p\left(\frac{s(\beta_1)}{2} - \frac{s(\beta_2)}{2}\right) + p^2\left(-s(\beta_0) + \frac{s(\beta_1)}{2} + \frac{s(\beta_2)}{2}\right) \\ &= s_0 + p\,s_1 + p^2 s_2 \end{aligned}$$

• Example-1 (continued)
– The preceding computation leads to the following matrix form:
$$\begin{aligned} \begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} s(\beta_0) \\ s(\beta_1)/2 \\ s(\beta_2)/2 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} h(\beta_0) & 0 & 0 \\ 0 & h(\beta_1)/2 & 0 \\ 0 & 0 & h(\beta_2)/2 \end{bmatrix} \cdot \begin{bmatrix} x(\beta_0) \\ x(\beta_1) \\ x(\beta_2) \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} h_0 & 0 & 0 \\ 0 & (h_0+h_1)/2 & 0 \\ 0 & 0 & (h_0-h_1)/2 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & -1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \end{aligned}$$
– The computation is carried out as follows (5 additions, 3 multiplications):
1. $H_0 = h_0, \quad H_1 = \dfrac{h_0 + h_1}{2}, \quad H_2 = \dfrac{h_0 - h_1}{2}$ (pre-computed)
2. $X_0 = x_0, \quad X_1 = x_0 + x_1, \quad X_2 = x_0 - x_1$
3. $S_0 = H_0 X_0, \quad S_1 = H_1 X_1, \quad S_2 = H_2 X_2$
4. $s_0 = S_0, \quad s_1 = S_1 - S_2, \quad s_2 = -S_0 + S_1 + S_2$

– (Continued): Therefore, this algorithm needs 3 multiplications and 5 additions (ignoring the additions in the pre-computation), i.e., the number of multiplications is reduced by 1 at the expense of 4 extra additions
– Example-2: please see Example 8.2.2 of the textbook (p.231)
• Comments
– Some additions in the preaddition or postaddition matrices can be shared. So, when we count the number of additions, we only count one instead of two or three.
– If we take $h_0, h_1$ as the FIR filter coefficients and take $x_0, x_1$ as the signal (data) sequence, then the terms $H_0, H_1, H_2$ need not be recomputed each time the filter is used.
They can be precomputed once offline and stored. So, we ignore these computations when counting the number of operations
– From Example-1, we can understand the Cook-Toom algorithm as a matrix decomposition. In general, a convolution can be expressed in matrix-vector form as
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \quad \text{or} \quad s = T \cdot x$$

– Generally, the equation can be expressed as
$$s = T \cdot x = C \cdot H \cdot D \cdot x$$
• Where C is the postaddition matrix, D is the preaddition matrix, and H is a diagonal matrix with $H_i$, $i = 0, 1, \ldots, L+N-2$, on the main diagonal.
– Since T = CHD, the Cook-Toom algorithm provides a way to factorize the convolution matrix T into the product of one postaddition matrix C, one diagonal matrix H, and one preaddition matrix D, such that the total number of multiplications is determined only by the non-zero elements on the main diagonal of H
– Although the number of multiplications is reduced, the number of additions has increased. The Cook-Toom algorithm can be modified in order to further reduce the number of additions

Modified Cook-Toom Algorithm
• The modified Cook-Toom algorithm is used to further reduce the number of addition operations in linear convolutions
• Now consider the modified Cook-Toom algorithm. Define
$$s'(p) = s(p) - S_{L+N-2}\, p^{L+N-2}$$
Notice that the degree of $s(p)$ is $L+N-2$ and $S_{L+N-2}$ is its highest-order coefficient, i.e., $S_{L+N-2} = h_{N-1} x_{L-1}$. Therefore the degree of $s'(p)$ is $L+N-3$.

• Modified Cook-Toom Algorithm
1. Choose $L+N-2$ different real numbers $\beta_0, \beta_1, \ldots, \beta_{L+N-3}$
2. Compute $h(\beta_i)$ and $x(\beta_i)$, for $i = 0, 1, \ldots, L+N-3$
3. Compute $s(\beta_i) = h(\beta_i) \cdot x(\beta_i)$, for $i = 0, 1, \ldots, L+N-3$
4. Compute $s'(\beta_i) = s(\beta_i) - S_{L+N-2}\, \beta_i^{L+N-2}$, for $i = 0, 1, \ldots, L+N-3$
5. Compute $s'(p)$ by using
$$s'(p) = \sum_{i=0}^{L+N-3} s'(\beta_i) \frac{\prod_{j \neq i} (p - \beta_j)}{\prod_{j \neq i} (\beta_i - \beta_j)}$$
6. Compute $s(p) = s'(p) + S_{L+N-2}\, p^{L+N-2}$
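For the smallest case (L = N = 2, so S_2 = h_1 x_1), the six steps above can be sketched as follows. The choice β = {0, -1} and the function name `modified_cook_toom_2x2` are assumptions made for illustration; the interpolation in step 5 is written out by hand for these two points.

```python
def modified_cook_toom_2x2(h, x):
    """2x2 linear convolution with 3 multiplications, points beta = {0, -1}."""
    h0, h1 = h
    x0, x1 = x
    top = h1 * x1                       # S_2 = h_1 x_1, highest-order coefficient
    # Steps 2 and 3: evaluate at beta_0 = 0 and beta_1 = -1, then multiply.
    s_b0 = h0 * x0
    s_b1 = (h0 - h1) * (x0 - x1)
    # Step 4: s'(beta_i) = s(beta_i) - S_2 * beta_i^2.
    sp_b0 = s_b0                        # beta_0^2 = 0
    sp_b1 = s_b1 - top                  # beta_1^2 = 1
    # Step 5: for these points, s'(p) = s'(beta_0) + p * (s'(beta_0) - s'(beta_1)).
    s0 = sp_b0
    s1 = sp_b0 - sp_b1
    # Step 6: add back S_2 * p^2.
    return [s0, s1, top]


# (1 + 2p)(3 + 4p) = 3 + 10p + 8p^2
assert modified_cook_toom_2x2((1, 2), (3, 4)) == [3, 10, 8]
```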
• Example-3 (Example 8.2.3, p.234) Derive a 2×2 convolution algorithm using the modified Cook-Toom algorithm with $\beta = \{0, -1\}$
– Consider the Lagrange interpolation for $s'(p) = s(p) - h_1 x_1 p^2$ at $\{\beta_0 = 0, \beta_1 = -1\}$. First, find $s'(\beta_i) = h(\beta_i)x(\beta_i) - h_1 x_1 \beta_i^2$:
$$\begin{aligned} \beta_0 &= 0, & h(\beta_0) &= h_0, & x(\beta_0) &= x_0 \\ \beta_1 &= -1, & h(\beta_1) &= h_0 - h_1, & x(\beta_1) &= x_0 - x_1 \end{aligned}$$
– and
$$\begin{aligned} s'(\beta_0) &= h(\beta_0)x(\beta_0) - h_1 x_1 \beta_0^2 = h_0 x_0 \\ s'(\beta_1) &= h(\beta_1)x(\beta_1) - h_1 x_1 \beta_1^2 = (h_0 - h_1)(x_0 - x_1) - h_1 x_1 \end{aligned}$$
• Which requires 2 multiplications (not counting the $h_1 x_1$ multiplication)
– Applying the Lagrange interpolation algorithm, we get:
$$s'(p) = s'(\beta_0)\frac{(p - \beta_1)}{(\beta_0 - \beta_1)} + s'(\beta_1)\frac{(p - \beta_0)}{(\beta_1 - \beta_0)} = s'(\beta_0) + p\left(s'(\beta_0) - s'(\beta_1)\right)$$

• Example-3 (cont'd)
– Therefore,
$$s(p) = s'(p) + h_1 x_1 p^2 = s_0 + s_1 p + s_2 p^2$$
– Finally, we have the matrix-form expression:
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} s'(\beta_0) \\ s'(\beta_1) \\ h_1 x_1 \end{bmatrix}$$
– Notice that
$$\begin{bmatrix} s'(\beta_0) \\ s'(\beta_1) \\ h_1 x_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} s(\beta_0) \\ s(\beta_1) \\ h_1 x_1 \end{bmatrix}$$
– Therefore:
$$\begin{aligned} \begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} s(\beta_0) \\ s(\beta_1) \\ h_1 x_1 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} h_0 & 0 & 0 \\ 0 & h_0 - h_1 & 0 \\ 0 & 0 & h_1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \end{aligned}$$

• Example-3 (cont'd)
– The computation is carried out as follows:
1. $H_0 = h_0, \quad H_1 = h_0 - h_1, \quad H_2 = h_1$ (pre-computed)
2. $X_0 = x_0, \quad X_1 = x_0 - x_1, \quad X_2 = x_1$
3. $S_0 = H_0 X_0, \quad S_1 = H_1 X_1, \quad S_2 = H_2 X_2$
4. $s_0 = S_0, \quad s_1 = S_0 - S_1 + S_2, \quad s_2 = S_2$
– The total number of operations is 3 multiplications and 3 additions. Compared with the convolution algorithm in Example-1, the number of addition operations has been reduced by 2 while the number of multiplications remains the same.
• Example-4: please see Example 8.2.4 of the textbook (p.236)
• Conclusion: The Cook-Toom algorithm is efficient as measured by the number of multiplications. However, as the size of the problem increases, it is not efficient because the number of additions increases greatly if $\beta$ takes values other than $\{0, \pm 1, \pm 2, \pm 4\}$. This may result in complicated pre-addition and post-addition matrices. For large-size problems, the Winograd algorithm is more efficient.

Winograd Algorithm
• The Winograd short convolution algorithm is based on the CRT (Chinese Remainder Theorem): it is possible to uniquely determine a nonnegative integer given only its remainders with respect to the given moduli, provided that the moduli are relatively prime and the integer is known to be smaller than the product of the moduli
• Theorem: CRT for Integers
Given $c_i = R_{m_i}[c]$ (the remainder when $c$ is divided by $m_i$), for $i = 0, 1, \ldots, k$, where the moduli $m_i$ are relatively prime, then
$$c = \left( \sum_{i=0}^{k} c_i N_i M_i \right) \bmod M$$
where $M = \prod_{i=0}^{k} m_i$, $M_i = M / m_i$, and $N_i$ is the solution of
$$N_i M_i + n_i m_i = \mathrm{GCD}(M_i, m_i) = 1$$
provided that $0 \le c < M$

• Theorem: CRT for Polynomials
Given $c^{(i)}(p) = R_{m^{(i)}(p)}[c(p)]$, for $i = 0, 1, \ldots, k$, where the $m^{(i)}(p)$ are relatively prime, then
$$c(p) = \left( \sum_{i=0}^{k} c^{(i)}(p) N^{(i)}(p) M^{(i)}(p) \right) \bmod M(p)$$
where $M(p) = \prod_{i=0}^{k} m^{(i)}(p)$, $M^{(i)}(p) = M(p) / m^{(i)}(p)$, and $N^{(i)}(p)$ is the solution of
$$N^{(i)}(p) M^{(i)}(p) + n^{(i)}(p) m^{(i)}(p) = \mathrm{GCD}\left(M^{(i)}(p), m^{(i)}(p)\right) = 1$$
provided that the degree of $c(p)$ is less than the degree of $M(p)$
• Example-5 (Example 8.3.1, p.239): using the CRT for integers, choose moduli $m_0 = 3$, $m_1 = 4$, $m_2 = 5$. Then $M = m_0 m_1 m_2 = 60$ and $M_i = M / m_i$. Then:
$$\begin{aligned} m_0 &= 3, & M_0 &= 20, & (-1)\,20 + 7\,(3) &= 1 \\ m_1 &= 4, & M_1 &= 15, & (-1)\,15 + 4\,(4) &= 1 \\ m_2 &= 5, & M_2 &= 12, & (-2)\,12 + 5\,(5) &= 1 \end{aligned}$$
– where $N_i$ and $n_i$ are obtained using the Euclidean GCD algorithm. Given that the integer $c$ satisfies $0 \le c < M$, let $c_i = R_{m_i}[c]$.
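The integer CRT reconstruction can be sketched in Python with the moduli 3, 4, 5 of Example-5. The function name `crt` is hypothetical. As an assumption of this sketch, `pow(M_i, -1, m_i)` (available from Python 3.8) is used in place of the Euclidean GCD algorithm; it returns a value of N_i that may differ from the one in the slides by a multiple of m_i, which does not change the result modulo M.

```python
from math import prod


def crt(residues, moduli):
    """Recover c with 0 <= c < M from its residues c_i = c mod m_i."""
    M = prod(moduli)
    total = 0
    for c_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        N_i = pow(M_i, -1, m_i)   # N_i * M_i = 1 (mod m_i); moduli must be coprime
        total += c_i * N_i * M_i
    return total % M


# Residues of 17 with respect to 3, 4, 5 are 2, 1, 2.
assert crt([17 % 3, 17 % 4, 17 % 5], [3, 4, 5]) == 17
```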
• Example-5 (cont'd)
– The integer $c$ can be calculated as
$$c = \left( \sum_{i=0}^{k} c_i N_i M_i \right) \bmod M = (-20\,c_0 - 15\,c_1 - 24\,c_2) \bmod 60$$
– For $c = 17$: $c_0 = R_3(17) = 2$, $c_1 = R_4(17) = 1$, $c_2 = R_5(17) = 2$, so
$$c = (-20 \cdot 2 - 15 \cdot 1 - 24 \cdot 2) \bmod 60 = (-103) \bmod 60 = 17$$
• CRT for polynomials: The remainder of a polynomial with regard to the modulus $p^i + f(p)$, where $\deg(f(p)) \le i - 1$, can be evaluated by substituting $p^i$ by $-f(p)$ in the polynomial
• Example-6 (Example 8.3.2, p.239)
$$\begin{aligned} \text{(a)}\quad & R_{x+2}\left[5x^2 + 3x + 5\right] = 5(-2)^2 + 3(-2) + 5 = 19 \\ \text{(b)}\quad & R_{x^2+2}\left[5x^2 + 3x + 5\right] = 5(-2) + 3x + 5 = 3x - 5 \\ \text{(c)}\quad & R_{x^2+x+2}\left[5x^2 + 3x + 5\right] = 5(-x-2) + 3x + 5 = -2x - 5 \end{aligned}$$

• Winograd Algorithm
– 1. Choose a polynomial $m(p)$ with degree higher than the degree of $h(p)x(p)$ and factor it into $k+1$ relatively prime polynomials with real coefficients, i.e.,
$$m(p) = m^{(0)}(p)\, m^{(1)}(p) \cdots m^{(k)}(p)$$
– 2. Let $M^{(i)}(p) = m(p) / m^{(i)}(p)$. Use the Euclidean GCD algorithm to solve $N^{(i)}(p) M^{(i)}(p) + n^{(i)}(p) m^{(i)}(p) = 1$ for $N^{(i)}(p)$.
– 3. Compute:
$$h^{(i)}(p) = h(p) \bmod m^{(i)}(p), \quad x^{(i)}(p) = x(p) \bmod m^{(i)}(p), \quad \text{for } i = 0, 1, \ldots, k$$
– 4. Compute:
$$s^{(i)}(p) = h^{(i)}(p)\, x^{(i)}(p) \bmod m^{(i)}(p), \quad \text{for } i = 0, 1, \ldots, k$$
– 5. Compute $s(p)$ by using:
$$s(p) = \sum_{i=0}^{k} s^{(i)}(p) N^{(i)}(p) M^{(i)}(p) \bmod m(p)$$

• Example-7 (Example 8.3.3, p.240) Consider a 2×3 linear convolution as in Example 8.2.2. Construct an efficient realization using the Winograd algorithm with $m(p) = p(p-1)(p^2+1)$
– Let: $m^{(0)}(p) = p$, $m^{(1)}(p) = p - 1$, $m^{(2)}(p) = p^2 + 1$
– Construct the following table using the relationships $M^{(i)}(p) = m(p) / m^{(i)}(p)$ and $N^{(i)}(p) M^{(i)}(p) + n^{(i)}(p) m^{(i)}(p) = 1$, for $i = 0, 1, 2$:
$$\begin{array}{c|c|c|c|c} i & m^{(i)}(p) & M^{(i)}(p) & n^{(i)}(p) & N^{(i)}(p) \\ \hline 0 & p & p^3 - p^2 + p - 1 & p^2 - p + 1 & -1 \\ 1 & p - 1 & p^3 + p & -\tfrac{1}{2}(p^2 + p + 2) & \tfrac{1}{2} \\ 2 & p^2 + 1 & p^2 - p & -\tfrac{1}{2}(p - 2) & \tfrac{1}{2}(p - 1) \end{array}$$
– Compute residues from $h(p) = h_0 + h_1 p$, $x(p) = x_0 + x_1 p + x_2 p^2$:
$$\begin{aligned} h^{(0)}(p) &= h_0, & x^{(0)}(p) &= x_0 \\ h^{(1)}(p) &= h_0 + h_1, & x^{(1)}(p) &= x_0 + x_1 + x_2 \\ h^{(2)}(p) &= h_0 + h_1 p, & x^{(2)}(p) &= (x_0 - x_2) + x_1 p \end{aligned}$$
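• Example-7 (cont'd)
$$\begin{aligned} s^{(0)}(p) &= h_0 x_0 = s_0^{(0)}, \qquad s^{(1)}(p) = (h_0 + h_1)(x_0 + x_1 + x_2) = s_0^{(1)} \\ s^{(2)}(p) &= (h_0 + h_1 p)\left((x_0 - x_2) + x_1 p\right) \bmod (p^2 + 1) \\ &= h_0(x_0 - x_2) - h_1 x_1 + \left(h_0 x_1 + h_1(x_0 - x_2)\right) p = s_0^{(2)} + s_1^{(2)} p \end{aligned}$$
– Notice, we need 1 multiplication for $s^{(0)}(p)$, 1 for $s^{(1)}(p)$, and 4 for $s^{(2)}(p)$
– However, it can be further reduced to 3 multiplications as shown below:
$$\begin{bmatrix} s_0^{(2)} \\ s_1^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix} \cdot \begin{bmatrix} h_0 & 0 & 0 \\ 0 & h_0 - h_1 & 0 \\ 0 & 0 & h_0 + h_1 \end{bmatrix} \cdot \begin{bmatrix} x_0 + x_1 - x_2 \\ x_0 - x_2 \\ x_1 \end{bmatrix}$$
– Then:
$$\begin{aligned} s(p) &= \sum_{i=0}^{2} s^{(i)}(p) N^{(i)}(p) M^{(i)}(p) \bmod m(p) \\ &= \left[ -s^{(0)}(p)\,(p^3 - p^2 + p - 1) + \frac{s^{(1)}(p)}{2}(p^3 + p) + \frac{s^{(2)}(p)}{2}(p^3 - 2p^2 + p) \right] \bmod (p^4 - p^3 + p^2 - p) \end{aligned}$$

• Example-7 (cont'd)
– Substitute $s^{(0)}(p)$, $s^{(1)}(p)$, $s^{(2)}(p)$ into $s(p)$ to obtain the following table:
$$\begin{array}{c|c|c|c} p^0 & p^1 & p^2 & p^3 \\ \hline s_0^{(0)} & -s_0^{(0)} & s_0^{(0)} & -s_0^{(0)} \\ 0 & \tfrac{1}{2}s_0^{(1)} & 0 & \tfrac{1}{2}s_0^{(1)} \\ 0 & \tfrac{1}{2}s_0^{(2)} & -s_0^{(2)} & \tfrac{1}{2}s_0^{(2)} \\ 0 & \tfrac{1}{2}s_1^{(2)} & 0 & -\tfrac{1}{2}s_1^{(2)} \end{array}$$
– Therefore, we have
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 1 & 1 \\ 1 & 0 & -2 & 0 \\ -1 & 1 & 1 & -1 \end{bmatrix} \cdot \begin{bmatrix} s_0^{(0)} \\ \tfrac{1}{2}s_0^{(1)} \\ \tfrac{1}{2}s_0^{(2)} \\ \tfrac{1}{2}s_1^{(2)} \end{bmatrix}$$

• Example-7 (cont'd)
– Notice that
$$\begin{bmatrix} s_0^{(0)} \\ \tfrac{1}{2}s_0^{(1)} \\ \tfrac{1}{2}s_0^{(2)} \\ \tfrac{1}{2}s_1^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix} \cdot \mathrm{diag}\left( h_0,\ \frac{h_0 + h_1}{2},\ \frac{h_0}{2},\ \frac{h_1 - h_0}{2},\ \frac{h_0 + h_1}{2} \right) \cdot \begin{bmatrix} x_0 \\ x_0 + x_1 + x_2 \\ x_0 + x_1 - x_2 \\ x_0 - x_2 \\ x_1 \end{bmatrix}$$
– So, finally we have:
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 2 & 1 & -1 \\ 1 & 0 & -2 & 0 & 2 \\ -1 & 1 & 0 & -1 & -1 \end{bmatrix} \cdot \mathrm{diag}\left( h_0,\ \frac{h_0 + h_1}{2},\ \frac{h_0}{2},\ \frac{h_1 - h_0}{2},\ \frac{h_0 + h_1}{2} \right) \cdot \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$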
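The five-multiplication factorization of Example-7 can be checked numerically. The sketch below is an illustration rather than part of the slides: the function name `winograd_2x3` is hypothetical, inputs are assumed to be integers, and `Fraction` keeps the halved coefficients exact. The five values `H` play the role of the diagonal matrix and could be precomputed for a fixed filter.

```python
from fractions import Fraction


def winograd_2x3(h, x):
    """2x3 linear convolution using the 5 products of Example-7."""
    h0, h1 = h
    x0, x1, x2 = x
    # Diagonal entries (depend on h only, so they can be precomputed).
    H = [Fraction(h0), Fraction(h0 + h1, 2), Fraction(h0, 2),
         Fraction(h1 - h0, 2), Fraction(h0 + h1, 2)]
    # Pre-additions on the data.
    X = [x0, x0 + x1 + x2, x0 + x1 - x2, x0 - x2, x1]
    # The 5 general multiplications.
    P = [Hi * Xi for Hi, Xi in zip(H, X)]
    # Post-additions (rows of the 4x5 post-addition matrix).
    s0 = P[0]
    s1 = -P[0] + P[1] + 2 * P[2] + P[3] - P[4]
    s2 = P[0] - 2 * P[2] + 2 * P[4]
    s3 = -P[0] + P[1] - P[3] - P[4]
    return [s0, s1, s2, s3]


# (1 + 2p)(3 + 4p + 5p^2) = 3 + 10p + 13p^2 + 10p^3
assert winograd_2x3((1, 2), (3, 4, 5)) == [3, 10, 13, 10]
```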
• Example-7 (cont'd)
– In this example, the Winograd convolution algorithm requires 5 multiplications and 11 additions, compared with 6 multiplications and 2 additions for direct implementation
• Notes:
– The number of multiplications in the Winograd algorithm is highly dependent on the degree of each $m^{(i)}(p)$. Therefore, the degree of $m(p)$ should be as small as possible.
– A more efficient form (or a modified version) of the Winograd algorithm can be obtained by letting $\deg[m(p)] = \deg[s(p)]$ and applying the CRT to
$$s'(p) = s(p) - h_{N-1} x_{L-1}\, m(p)$$

Modified Winograd Algorithm
– 1. Choose a polynomial $m(p)$ with degree equal to the degree of $s(p)$ and factor it into $k+1$ relatively prime polynomials with real coefficients, i.e.,
$$m(p) = m^{(0)}(p)\, m^{(1)}(p) \cdots m^{(k)}(p)$$
– 2. Let $M^{(i)}(p) = m(p) / m^{(i)}(p)$, and use the Euclidean GCD algorithm to solve $N^{(i)}(p) M^{(i)}(p) + n^{(i)}(p) m^{(i)}(p) = 1$ for $N^{(i)}(p)$.
– 3. Compute:
$$h^{(i)}(p) = h(p) \bmod m^{(i)}(p), \quad x^{(i)}(p) = x(p) \bmod m^{(i)}(p), \quad \text{for } i = 0, 1, \ldots, k$$
– 4. Compute:
$$s'^{(i)}(p) = h^{(i)}(p)\, x^{(i)}(p) \bmod m^{(i)}(p), \quad \text{for } i = 0, 1, \ldots, k$$
– 5. Compute $s'(p)$ by using:
$$s'(p) = \sum_{i=0}^{k} s'^{(i)}(p) N^{(i)}(p) M^{(i)}(p) \bmod m(p)$$
– 6. Compute
$$s(p) = s'(p) + h_{N-1} x_{L-1}\, m(p)$$

• Example-8 (Example 8.3.4, p.243): Construct a 2×3 convolution algorithm using the modified Winograd algorithm with $m(p) = p(p-1)(p+1)$
– Let $m^{(0)}(p) = p$, $m^{(1)}(p) = p - 1$, $m^{(2)}(p) = p + 1$
– Construct the following table using the relationships $M^{(i)}(p) = m(p) / m^{(i)}(p)$ and $N^{(i)}(p) M^{(i)}(p) + n^{(i)}(p) m^{(i)}(p) = 1$:
$$\begin{array}{c|c|c|c|c} i & m^{(i)}(p) & M^{(i)}(p) & n^{(i)}(p) & N^{(i)}(p) \\ \hline 0 & p & p^2 - 1 & p & -1 \\ 1 & p - 1 & p^2 + p & -\tfrac{1}{2}(p + 2) & \tfrac{1}{2} \\ 2 & p + 1 & p^2 - p & -\tfrac{1}{2}(p - 2) & \tfrac{1}{2} \end{array}$$
– Compute residues from $h(p) = h_0 + h_1 p$, $x(p) = x_0 + x_1 p + x_2 p^2$:
$$\begin{aligned} h^{(0)}(p) &= h_0, & x^{(0)}(p) &= x_0 \\ h^{(1)}(p) &= h_0 + h_1, & x^{(1)}(p) &= x_0 + x_1 + x_2 \\ h^{(2)}(p) &= h_0 - h_1, & x^{(2)}(p) &= x_0 - x_1 + x_2 \end{aligned}$$
$$s'^{(0)}(p) = h_0 x_0, \quad s'^{(1)}(p) = (h_0 + h_1)(x_0 + x_1 + x_2), \quad s'^{(2)}(p) = (h_0 - h_1)(x_0 - x_1 + x_2)$$

• Example-8 (cont'd)
– Since the degree of $m^{(i)}(p)$ is equal to 1, $s'^{(i)}(p)$ is a polynomial of degree 0 (a constant).
Therefore, we have:
$$\begin{aligned} s(p) &= s'(p) + h_1 x_2\, m(p) \\ &= \left[ -s'^{(0)}(p^2 - 1) + \frac{s'^{(1)}}{2}(p^2 + p) + \frac{s'^{(2)}}{2}(p^2 - p) \right] + h_1 x_2 (p^3 - p) \\ &= s'^{(0)} + p\left( \frac{s'^{(1)}}{2} - \frac{s'^{(2)}}{2} - h_1 x_2 \right) + p^2\left( -s'^{(0)} + \frac{s'^{(1)}}{2} + \frac{s'^{(2)}}{2} \right) + p^3 (h_1 x_2) \end{aligned}$$
– The algorithm can be written in matrix form as:
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & -1 \\ -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} s'^{(0)} \\ \tfrac{1}{2}s'^{(1)} \\ \tfrac{1}{2}s'^{(2)} \\ h_1 x_2 \end{bmatrix}$$

• Example-8 (cont'd)
– (matrix form)
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & -1 \\ -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot \mathrm{diag}\left( h_0,\ \frac{h_0 + h_1}{2},\ \frac{h_0 - h_1}{2},\ h_1 \right) \cdot \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$
– Conclusion: this algorithm requires 4 multiplications and 7 additions

Iterated Convolution
• Iterated convolution algorithm: makes use of efficient short-length convolution algorithms iteratively to build long convolutions
• Does not achieve minimal multiplication complexity, but achieves a good balance between multiplication and addition complexity
• Iterated Convolution Algorithm (Description)
– 1. Decompose the long convolution into several levels of short convolutions
– 2. Construct fast convolution algorithms for the short convolutions
– 3. Use the short convolution algorithms to iteratively (hierarchically) implement the long convolution
– Note: the order of short convolutions in the decomposition affects the complexity of the derived long convolution
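The three steps above can be sketched for a 4×4 linear convolution built from 2×2 short convolutions. This is a minimal illustration under assumptions: the function names `fast_2x2` and `iterated_4x4` are hypothetical, and the short convolution is taken to be the three-multiplication form s1 = (h0+h1)(x0+x1) - h0x0 - h1x1, applied once at the outer level (on pairs of coefficients) and three times at the inner level, for 9 multiplications in total.

```python
def fast_2x2(h, x):
    """2x2 linear convolution with 3 multiplications; returns [s0, s1, s2]."""
    a = h[0] * x[0]
    b = h[1] * x[1]
    c = (h[0] + h[1]) * (x[0] + x[1])
    return [a, c - a - b, b]


def iterated_4x4(h, x):
    """4x4 linear convolution from three nested 2x2 convolutions (9 multiplications)."""
    # Level 1: split h(p) = h_lo(p) + h_hi(p) * q with q = p^2, same for x.
    h_lo, h_hi = h[:2], h[2:]
    x_lo, x_hi = x[:2], x[2:]
    # Level 2: three short convolutions on the halves.
    a = fast_2x2(h_lo, x_lo)
    b = fast_2x2(h_hi, x_hi)
    h_sum = [h_lo[0] + h_hi[0], h_lo[1] + h_hi[1]]
    x_sum = [x_lo[0] + x_hi[0], x_lo[1] + x_hi[1]]
    c = fast_2x2(h_sum, x_sum)
    mid = [c[k] - a[k] - b[k] for k in range(3)]   # coefficient polynomial of q
    # Recombine: s(p) = a(p) + mid(p) * p^2 + b(p) * p^4.
    s = [0] * 7
    for k in range(3):
        s[k] += a[k]
        s[k + 2] += mid[k]
        s[k + 4] += b[k]
    return s


assert iterated_4x4([1, 2, 3, 4], [5, 6, 7, 8]) == [5, 16, 34, 60, 61, 52, 32]
```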
• Example-9 (Example 8.4.1, p.245): Construct a 4×4 linear convolution algorithm using 2×2 short convolutions
– Let $h(p) = h_0 + h_1 p + h_2 p^2 + h_3 p^3$, $x(p) = x_0 + x_1 p + x_2 p^2 + x_3 p^3$, and $s(p) = h(p)\,x(p)$
– First, we need to decompose the 4×4 convolution into a 2×2 convolution
– Define
$$\begin{aligned} h'_0(p) &= h_0 + h_1 p, & h'_1(p) &= h_2 + h_3 p \\ x'_0(p) &= x_0 + x_1 p, & x'_1(p) &= x_2 + x_3 p \end{aligned}$$
– Then, with $q = p^2$, we have:
$$\begin{aligned} h(p) &= h'_0(p) + h'_1(p)\,p^2, \quad \text{i.e.,} \quad h(p) = h(p, q) = h'_0(p) + h'_1(p)\,q \\ x(p) &= x'_0(p) + x'_1(p)\,p^2, \quad \text{i.e.,} \quad x(p) = x(p, q) = x'_0(p) + x'_1(p)\,q \end{aligned}$$
$$\begin{aligned} s(p) &= h(p)\,x(p) = h(p, q)\,x(p, q) \\ &= \left[h'_0(p) + h'_1(p)\,q\right] \cdot \left[x'_0(p) + x'_1(p)\,q\right] \\ &= h'_0(p)x'_0(p) + \left[h'_0(p)x'_1(p) + h'_1(p)x'_0(p)\right] q + h'_1(p)x'_1(p)\,q^2 \\ &= s'_0(p) + s'_1(p)\,q + s'_2(p)\,q^2 = s(p, q) \end{aligned}$$

• Example-9 (cont'd)
– Therefore, the 4×4 convolution is decomposed into two levels of nested 2×2 convolutions
– Let us start from the first convolution $s'_0(p) = h'_0(p) \cdot x'_0(p)$. We have:
$$\begin{aligned} h'_0(p) \cdot x'_0(p) &= (h_0 + h_1 p) \cdot (x_0 + x_1 p) \\ &= h_0 x_0 + h_1 x_1 p^2 + p\left[(h_0 + h_1) \cdot (x_0 + x_1) - h_0 x_0 - h_1 x_1\right] \end{aligned}$$
– We have the following expression for the third convolution:
$$\begin{aligned} s'_2(p) = h'_1(p) \cdot x'_1(p) &= (h_2 + h_3 p) \cdot (x_2 + x_3 p) \\ &= h_2 x_2 + h_3 x_3 p^2 + p\left[(h_2 + h_3) \cdot (x_2 + x_3) - h_2 x_2 - h_3 x_3\right] \end{aligned}$$
– For the second convolution, we get the following expression:
$$\begin{aligned} s'_1(p) &= h'_0(p) \cdot x'_1(p) + h'_1(p) \cdot x'_0(p) \\ &= (h'_0 + h'_1) \cdot (x'_0 + x'_1) - h'_0 \cdot x'_0 - h'_1 \cdot x'_1 \end{aligned}$$

• Example-9 (cont'd)
• For $(h'_0 + h'_1) \cdot (x'_0 + x'_1)$, we have the following expression:
$$\begin{aligned} (h'_0 + h'_1) \cdot (x'_0 + x'_1) &= \left[(h_0 + h_2) + p\,(h_1 + h_3)\right] \cdot \left[(x_0 + x_2) + p\,(x_1 + x_3)\right] \\ &= (h_0 + h_2) \cdot (x_0 + x_2) + p^2 (h_1 + h_3) \cdot (x_1 + x_3) \\ &\quad + p\left[(h_0 + h_1 + h_2 + h_3) \cdot (x_0 + x_1 + x_2 + x_3) - (h_0 + h_2) \cdot (x_0 + x_2) - (h_1 + h_3) \cdot (x_1 + x_3)\right] \end{aligned}$$
This requires 9 multiplications and 11 additions
– If we rewrite the three convolutions as the following expressions, then we can get the table shown on the next page:
$$\begin{aligned} h'_0 x'_0 &\equiv a_1 + p\,a_2 + p^2 a_3 \\ h'_1 x'_1 &\equiv b_1 + p\,b_2 + p^2 b_3 \\ (h'_0 + h'_1)(x'_0 + x'_1) &\equiv c_1 + p\,c_2 + p^2 c_3 \end{aligned}$$
• Example-9 (cont'd)
– Therefore, the total number of operations used in this 4×4 iterated convolution algorithm is 9 multiplications and 19 additions
$$\begin{array}{c|c|c|c|c|c|c} p^0 & p^1 & p^2 & p^3 & p^4 & p^5 & p^6 \\ \hline a_1 & a_2 & a_3 & & & & \\ & & & & b_1 & b_2 & b_3 \\ & & c_1 & c_2 & c_3 & & \\ & & -b_1 & -b_2 & -b_3 & & \\ & & -a_1 & -a_2 & -a_3 & & \end{array}$$
Total: 8 additions here

Cyclic Convolution
• Cyclic convolution: also known as circular convolution
• Let the filter coefficients be $h = \{h_0, h_1, \ldots, h_{n-1}\}$, and the data sequence be $x = \{x_0, x_1, \ldots, x_{n-1}\}$.
– The cyclic convolution can be expressed as
$$s(p) = h \circledast_n x = \left[h(p) \cdot x(p)\right] \bmod (p^n - 1)$$
– The output samples are given by
$$s_i = \sum_{k=0}^{n-1} h_{((i-k))}\, x_k, \quad i = 0, 1, \ldots, n-1$$
• where $((i-k))$ denotes $(i-k) \bmod n$
• The cyclic convolution can be computed as a linear convolution reduced modulo $p^n - 1$. (Notice that there are $2n-1$ different output samples for this linear convolution.) Alternatively, the cyclic convolution can be computed using the CRT with $m(p) = p^n - 1$, which is much simpler.

• Example-10 (Example 8.5.1, p.246) Construct a 4×4 cyclic convolution algorithm using the CRT with
$$m(p) = p^4 - 1 = (p - 1)(p + 1)(p^2 + 1)$$
– Let $h(p) = h_0 + h_1 p + h_2 p^2 + h_3 p^3$, $x(p) = x_0 + x_1 p + x_2 p^2 + x_3 p^3$
– Let $m^{(0)}(p) = p - 1$, $m^{(1)}(p) = p + 1$, $m^{(2)}(p) = p^2 + 1$
– Get the following table using the relationships $M^{(i)}(p) = m(p) / m^{(i)}(p)$ and $N^{(i)}(p) M^{(i)}(p) + n^{(i)}(p) m^{(i)}(p) = 1$:
$$\begin{array}{c|c|c|c|c} i & m^{(i)}(p) & M^{(i)}(p) & n^{(i)}(p) & N^{(i)}(p) \\ \hline 0 & p - 1 & p^3 + p^2 + p + 1 & -\tfrac{1}{4}(p^2 + 2p + 3) & \tfrac{1}{4} \\ 1 & p + 1 & p^3 - p^2 + p - 1 & \tfrac{1}{4}(p^2 - 2p + 3) & -\tfrac{1}{4} \\ 2 & p^2 + 1 & p^2 - 1 & \tfrac{1}{2} & -\tfrac{1}{2} \end{array}$$
– Compute the residues
$$\begin{aligned} h^{(0)}(p) &= h_0 + h_1 + h_2 + h_3 = h_0^{(0)}, \\ h^{(1)}(p) &= h_0 - h_1 + h_2 - h_3 = h_0^{(1)}, \\ h^{(2)}(p) &= (h_0 - h_2) + (h_1 - h_3)\,p = h_0^{(2)} + h_1^{(2)} p \end{aligned}$$
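The definition of cyclic convolution given above, and its equivalence to a linear convolution reduced modulo p^n - 1, can be illustrated with a short Python sketch. The function names are hypothetical, and the code uses the direct O(n²) formulas rather than the CRT-based fast algorithm.

```python
def linear_convolution(h, x):
    """Direct linear convolution; returns len(h) + len(x) - 1 output samples."""
    s = [0] * (len(h) + len(x) - 1)
    for i, h_i in enumerate(h):
        for k, x_k in enumerate(x):
            s[i + k] += h_i * x_k
    return s


def cyclic_from_linear(h, x):
    """Reduce the linear convolution modulo p^n - 1, i.e. fold p^(n+j) onto p^j."""
    n = len(h)
    s = [0] * n
    for idx, value in enumerate(linear_convolution(h, x)):
        s[idx % n] += value
    return s


def cyclic_direct(h, x):
    """s_i = sum over k of h_((i-k)) * x_k, with ((i-k)) = (i-k) mod n."""
    n = len(h)
    return [sum(h[(i - k) % n] * x[k] for k in range(n)) for i in range(n)]


h, x = [1, 2, 3, 4], [5, 6, 7, 8]
assert cyclic_from_linear(h, x) == cyclic_direct(h, x) == [66, 68, 66, 60]
```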
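• Example-10 (cont'd)
$$\begin{aligned} x^{(0)}(p) &= x_0 + x_1 + x_2 + x_3 = x_0^{(0)}, \\ x^{(1)}(p) &= x_0 - x_1 + x_2 - x_3 = x_0^{(1)}, \\ x^{(2)}(p) &= (x_0 - x_2) + (x_1 - x_3)\,p = x_0^{(2)} + x_1^{(2)} p \end{aligned}$$
$$\begin{aligned} s^{(0)}(p) &= h^{(0)}(p) \cdot x^{(0)}(p) = h_0^{(0)} \cdot x_0^{(0)} = s_0^{(0)}, \\ s^{(1)}(p) &= h^{(1)}(p) \cdot x^{(1)}(p) = h_0^{(1)} \cdot x_0^{(1)} = s_0^{(1)}, \\ s^{(2)}(p) &= s_0^{(2)} + s_1^{(2)} p = \left[h^{(2)}(p) \cdot x^{(2)}(p)\right] \bmod (p^2 + 1) \\ &= \left(h_0^{(2)} x_0^{(2)} - h_1^{(2)} x_1^{(2)}\right) + p \left(h_0^{(2)} x_1^{(2)} + h_1^{(2)} x_0^{(2)}\right) \end{aligned}$$
– Since
$$\begin{aligned} s_0^{(2)} &= h_0^{(2)} x_0^{(2)} - h_1^{(2)} x_1^{(2)} = h_0^{(2)}\left(x_0^{(2)} + x_1^{(2)}\right) - \left(h_0^{(2)} + h_1^{(2)}\right) x_1^{(2)}, \\ s_1^{(2)} &= h_0^{(2)} x_1^{(2)} + h_1^{(2)} x_0^{(2)} = h_0^{(2)}\left(x_0^{(2)} + x_1^{(2)}\right) + \left(h_1^{(2)} - h_0^{(2)}\right) x_0^{(2)} \end{aligned}$$
– or in matrix form
$$\begin{bmatrix} s_0^{(2)} \\ s_1^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} h_0^{(2)} & 0 & 0 \\ 0 & h_1^{(2)} - h_0^{(2)} & 0 \\ 0 & 0 & h_0^{(2)} + h_1^{(2)} \end{bmatrix} \cdot \begin{bmatrix} x_0^{(2)} + x_1^{(2)} \\ x_0^{(2)} \\ x_1^{(2)} \end{bmatrix}$$
– Computations so far require 5 multiplications

• Example-10 (cont'd)
– Then
$$\begin{aligned} s(p) &= \sum_{i=0}^{2} s^{(i)}(p) N^{(i)}(p) M^{(i)}(p) \bmod m(p) \\ &= \left[ s^{(0)}(p)\,\frac{p^3 + p^2 + p + 1}{4} + s^{(1)}(p)\,\frac{p^3 - p^2 + p - 1}{-4} + s^{(2)}(p)\,\frac{p^2 - 1}{-2} \right] \bmod (p^4 - 1) \\ &= \left( \frac{s_0^{(0)}}{4} + \frac{s_0^{(1)}}{4} + \frac{s_0^{(2)}}{2} \right) + p \left( \frac{s_0^{(0)}}{4} - \frac{s_0^{(1)}}{4} + \frac{s_1^{(2)}}{2} \right) \\ &\quad + p^2 \left( \frac{s_0^{(0)}}{4} + \frac{s_0^{(1)}}{4} - \frac{s_0^{(2)}}{2} \right) + p^3 \left( \frac{s_0^{(0)}}{4} - \frac{s_0^{(1)}}{4} - \frac{s_1^{(2)}}{2} \right) \end{aligned}$$
– So, we have
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & -1 \end{bmatrix} \cdot \begin{bmatrix} \tfrac{1}{4}s_0^{(0)} \\ \tfrac{1}{4}s_0^{(1)} \\ \tfrac{1}{2}s_0^{(2)} \\ \tfrac{1}{2}s_1^{(2)} \end{bmatrix}$$

• Example-10 (cont'd)
– Notice that:
$$\begin{bmatrix} \tfrac{1}{4}s_0^{(0)} \\ \tfrac{1}{4}s_0^{(1)} \\ \tfrac{1}{2}s_0^{(2)} \\ \tfrac{1}{2}s_1^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix} \cdot \mathrm{diag}\left( \tfrac{1}{4}h_0^{(0)},\ \tfrac{1}{4}h_0^{(1)},\ \tfrac{1}{2}h_0^{(2)},\ \tfrac{1}{2}\left(h_1^{(2)} - h_0^{(2)}\right),\ \tfrac{1}{2}\left(h_0^{(2)} + h_1^{(2)}\right) \right) \cdot \begin{bmatrix} x_0^{(0)} \\ x_0^{(1)} \\ x_0^{(2)} + x_1^{(2)} \\ x_0^{(2)} \\ x_1^{(2)} \end{bmatrix}$$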
• Example-10 (cont'd)
– Therefore, we have
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 & -1 \\ 1 & -1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 & 1 \\ 1 & -1 & -1 & -1 & 0 \end{bmatrix} \cdot \mathrm{diag}\left( \frac{h_0 + h_1 + h_2 + h_3}{4},\ \frac{h_0 - h_1 + h_2 - h_3}{4},\ \frac{h_0 - h_2}{2},\ \frac{-h_0 + h_1 + h_2 - h_3}{2},\ \frac{h_0 + h_1 - h_2 - h_3}{2} \right) \cdot \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

• Example-10 (cont'd)
– This algorithm requires 5 multiplications and 15 additions
– The direct implementation requires 16 multiplications and 12 additions (see the following matrix form; notice that the cyclic convolution matrix is a circulant matrix)
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} h_0 & h_3 & h_2 & h_1 \\ h_1 & h_0 & h_3 & h_2 \\ h_2 & h_1 & h_0 & h_3 \\ h_3 & h_2 & h_1 & h_0 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
• An efficient cyclic convolution algorithm can often be easily extended to construct an efficient linear convolution
• Example-11 (Example 8.5.2, p.249) Construct a 3×3 linear convolution using the 4×4 cyclic convolution algorithm

• Example-11 (cont'd)
– Let the 3-point coefficient sequence be $h = \{h_0, h_1, h_2\}$, and the 3-point data sequence be $x = \{x_0, x_1, x_2\}$
– First extend them to 4-point sequences as:
$$h = \{h_0, h_1, h_2, 0\}, \quad x = \{x_0, x_1, x_2, 0\}$$
– Then the 3×3 linear convolution of h and x is
$$h \cdot x = \begin{bmatrix} h_0 x_0 \\ h_1 x_0 + h_0 x_1 \\ h_2 x_0 + h_1 x_1 + h_0 x_2 \\ h_2 x_1 + h_1 x_2 \\ h_2 x_2 \end{bmatrix}$$
– The 4×4 cyclic convolution of h and x, i.e. $h \circledast_4 x$, is:
$$h \circledast_4 x = \begin{bmatrix} h_0 x_0 + h_2 x_2 \\ h_1 x_0 + h_0 x_1 \\ h_2 x_0 + h_1 x_1 + h_0 x_2 \\ h_2 x_1 + h_1 x_2 \end{bmatrix}$$

• Example-11 (cont'd)
– Therefore, we have
$$s(p) = h(p) \cdot x(p) = h \circledast_4 x + h_2 x_2 (p^4 - 1)$$
– Using the result of Example-10 for $h \circledast_4 x$, the following convolution algorithm for the 3×3 linear convolution is obtained:
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \\ s_4 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 & -1 & -1 \\ 1 & -1 & 1 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 & 1 & 0 \\ 1 & -1 & -1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \cdot$$
(continued on the next page)
• Example-11 (cont'd)
$$\cdot\ \mathrm{diag}\left( \frac{h_0 + h_1 + h_2}{4},\ \frac{h_0 - h_1 + h_2}{4},\ \frac{h_0 - h_2}{2},\ \frac{-h_0 + h_1 + h_2}{2},\ \frac{h_0 + h_1 - h_2}{2},\ h_2 \right) \cdot \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$

• Example-11 (cont'd)
– So, this algorithm requires 6 multiplications and 16 additions
• Comments:
– In general, an efficient linear convolution can be used to obtain an efficient cyclic convolution algorithm. Conversely, an efficient cyclic convolution algorithm can be used to derive an efficient linear convolution algorithm

Design of Fast Convolution Algorithm by Inspection
• When the Cook-Toom or the Winograd algorithms cannot generate an efficient algorithm, sometimes a clever factorization by inspection may generate a better algorithm
• Example-12 (Example 8.6.1, p.250) Construct a 3×3 fast convolution algorithm by inspection
– The 3×3 linear convolution can be written as follows, which requires 9 multiplications and 4 additions
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \\ s_4 \end{bmatrix} = \begin{bmatrix} h_0 x_0 \\ h_1 x_0 + h_0 x_1 \\ h_2 x_0 + h_1 x_1 + h_0 x_2 \\ h_2 x_1 + h_1 x_2 \\ h_2 x_2 \end{bmatrix}$$

• Example-12 (cont'd)
– Using the following identities:
$$\begin{aligned} s_1 &= h_1 x_0 + h_0 x_1 = (h_0 + h_1) \cdot (x_0 + x_1) - h_0 x_0 - h_1 x_1 \\ s_2 &= h_2 x_0 + h_1 x_1 + h_0 x_2 = (h_0 + h_2)(x_0 + x_2) - h_0 x_0 + h_1 x_1 - h_2 x_2 \\ s_3 &= h_2 x_1 + h_1 x_2 = (h_1 + h_2)(x_1 + x_2) - h_1 x_1 - h_2 x_2 \end{aligned}$$
– The 3×3 linear convolution can be written as:
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \\ s_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 1 & 0 & 0 \\ -1 & 1 & -1 & 0 & 1 & 0 \\ 0 & -1 & -1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot \mathrm{diag}\left( h_0,\ h_1,\ h_2,\ h_0 + h_1,\ h_0 + h_2,\ h_1 + h_2 \right) \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$

• Example-12 (cont'd)
– Conclusion: This algorithm, which cannot be obtained by using the Cook-Toom or the Winograd algorithms, requires 6 multiplications and 10 additions
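The identities of Example-12 translate directly into code. The sketch below is illustrative only; the function name `fast_3x3` is hypothetical. It uses the six products on the diagonal (h0, h1, h2, h0+h1, h0+h2, h1+h2 times the matching data sums) followed by the post-additions.

```python
def fast_3x3(h, x):
    """3x3 linear convolution with 6 multiplications (factorization by inspection)."""
    h0, h1, h2 = h
    x0, x1, x2 = x
    # Six multiplications; the sums of h could be precomputed for a fixed filter.
    m0 = h0 * x0
    m1 = h1 * x1
    m2 = h2 * x2
    m3 = (h0 + h1) * (x0 + x1)
    m4 = (h0 + h2) * (x0 + x2)
    m5 = (h1 + h2) * (x1 + x2)
    # Post-additions.
    return [m0,
            m3 - m0 - m1,
            m4 - m0 + m1 - m2,
            m5 - m1 - m2,
            m2]


# (1 + 2p + 3p^2)(4 + 5p + 6p^2) = 4 + 13p + 28p^2 + 27p^3 + 18p^4
assert fast_3x3((1, 2, 3), (4, 5, 6)) == [4, 13, 28, 27, 18]
```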
