Kỹ thuật viễn thông - Chapter 6: Folding

Determine the minimum number of registers using lifetime analysis. • Input each variable at the time step corresponding to the beginning of its lifetime. If multiple variables are input in a given cycle, these are allocated to multiple registers with preference given to the variable with the longest lifetime. • Each variable is allocated in a forward manner until it is dead or it reaches the last register. In forward allocation, if the register i holds the variable in the current cycle, then register i + 1 holds the same variable in the next cycle. If (i + 1)-th register is not free then use the first available forward register. • Being periodic the allocation repeats in each iteration. So hash out the register Rj for the cycle l + N if it holds a variable during cycle l. • For variables that reach the last register and are still alive, they are allocated in a backward manner on a first come first serve basis. • Repeat steps 4 and 5 until the allocation is complete

pdf17 trang | Chia sẻ: huyhoang44 | Lượt xem: 574 | Lượt tải: 0download
Bạn đang xem nội dung tài liệu Kỹ thuật viễn thông - Chapter 6: Folding, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Chapter 6: Folding Keshab K. Parhi 2• Folding is a technique to reduce the silicon area by time- multiplexing many algorithm operations into single functional units (such as adders and multipliers) • Fig(a) shows a DSP program : y(n) = a(n) + b(n) + c(n) . • Fig(b) shows a folded architecture where 2 additions are folded or time-multiplexed to a single pipelined adder One output sample is produced every 2 clock cycles Þ input should be valid for 2 clock cycles. • In general, the data on the input of a folded realization is assumed to be valid for N cycles before changing, where N is the number of algorithm operations executed on a single functional unit in hardware. Chap. 6 3 Folding Transformation : •Nl + u and Nl + v are respectively the time units at which l-th iteration of the nodes U and V are scheduled. • u and v are called folding orders (time partition at which the node is scheduled to be executed) and satisfy 0 £ u,v £ N-1. • N is the folding factor i.e., the number of operations folded to a single functional unit. • Hu and Hv are functional units that execute u and v respectively. • Hu is pipelined by Pu stages and its output is available at Nl + u + Pu. • Edge U®V has w(e) delays Þ the l-th iteration of U is used by (l + w(e)) th iteration of node V, which is executed at N(l + w(e)) + v. So, the result should be stored for : DF(U®V) = [N(l + w(e)) + v] – [Nl + Pu + u] Þ DF(U®V) = Nw(e) - Pu + v – u ( independent of l ) Chap. 6 4 • Folding Set : An ordered set of N operations executed by the same functional unit. The operations are ordered from 0 to N- 1. Some of the operations may be null. For example, Folding set S1={A1,0,A2} is for folding order N=3. A1 has a folding order of 0 and A2 of 2 and are respectively denoted by (S1|0) and (S2|2). • Example: Folding a retimed biquad filter by N = 4. Addition time = 1u.t., Multiplication time = 2u.t., 1 stage pipelined adder and 2 stage pipelined multiplier(i.e., PA=1 and PM=2) The folding sets are S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7} Chap. 6 5 Folding equations for each of the 11 edges are as follows: DF(1®2) = 4(1) – 1 + 1 – 3 = 1 DF(1®5) = 4(1) – 1 + 0 – 3 = 0 DF(1®6) = 4(1) – 1 + 2 – 3 = 2 DF(1®7) = 4(1) – 1 + 3 – 3 = 3 DF(1®8) = 4(2) – 1 + 1 – 3 = 5 DF(3®1) = 4(0) – 1 + 3 – 2 = 0 DF(4®2) = 4(0) – 1 + 1 – 0 = 0 DF(5®3) = 4(0) – 2 + 2 – 0 = 0 DF(6®4) = 4(1) – 2 + 0 – 2 = 0 DF(7®3) = 4(1) – 2 + 2 – 3 = 1 DF(8®4) = 4(1) – 2 + 0 – 1 = 1 Chap. 6 6 • Retiming for Folding : – For a folded system to be realizable DF(UàV) ³ 0 for all edges. – If D’F(UàV) is the folded delays in the edge UàV for the retimed graph then D’F(UàV) ³ 0. So, Nwr(e) – PU + v – u ³ 0 where wr(e) = w(e) + r(V) - r(U) Þ N(w(e) + r(V) – r(U) ) - PU + v – u ³ 0 Þr(U) – r(V) £ DF(UàV) /N Þr(U) – r(V) £ ëDF(UàV) /Nû (since retiming values are integers) 7• Register Minimization Technique : Lifetime analysis is used for register minimization techniques in a DSP hardware. • A ‘data sample or variable’ is live from the time it is produced through the time it is consumed. After that it is dead. • Linear lifetime chart : Represents the lifetime of the variables in a linear fashion. • Example : Note : Linear lifetime chart uses the convention that the variable is not live during the clock cycle when it is produced but live during the clock cycle when it is consumed. Chap. 6 8 • Due to the periodic nature of DSP programs the lifetime chart can be drawn for only one iteration to give an indication of the # of registers that are needed. This is done as follows : Ø Let N be the iteration period Ø Let the # of live variables at time partitions n ³ N be the # of live variables due to 0-th iteration at cycles n-kN for k ³ 0. In the example, # of live variables at cycle 7 ³ N (=6) is the sum of the # of live variables due to the 0-th iteration at cycles 7 and (7 - 1´6) = 1, which is 2 + 1 = 3. • Matrix transpose example : Matrix Transposer i | h | g | f | e | d | c | b | a i | f | c | h | e | b | g | d | a Chap. 6 8à1212088i 7à99-257h 6à66-426g 5à1111275f 4à88044e 3à55-213d 2à1010462c 1à77231b 0à44000a LifeToutTdiffTzloutTinSample vTo make the system causal a latency of 4 is added to the difference so that Tout is the actual output time. Chap. 6 10 • Circular lifetime chart : Useful to represent the periodic nature of the DSP programs. • In a circular lifetime chart of periodicity N, the point marked i (0 £ i £ N - 1) represents the time partition i and all time instances {(Nl + i)} where l is any non-negative integer. • For example : If N = 8, then time partition i = 3 represents time instances {3, 11, 19, }. • Note : Variable produced during time unit j and consumed during time unit k is shown to be alive from ‘j + 1’ to ‘k’. • The numbers in the bracket in the adjacent figure correspond to the # of live variables at each time partition. Chap. 6 11 Forward Backward Register Allocation Technique : Note : Hashing is done to avoid conflict during backward allocation. Chap. 6 12 Steps for Forward-Backward Register allocation : • Determine the minimum number of registers using lifetime analysis. • Input each variable at the time step corresponding to the beginning of its lifetime. If multiple variables are input in a given cycle, these are allocated to multiple registers with preference given to the variable with the longest lifetime. • Each variable is allocated in a forward manner until it is dead or it reaches the last register. In forward allocation, if the register i holds the variable in the current cycle, then register i + 1 holds the same variable in the next cycle. If (i + 1)-th register is not free then use the first available forward register. • Being periodic the allocation repeats in each iteration. So hash out the register Rj for the cycle l + N if it holds a variable during cycle l. • For variables that reach the last register and are still alive, they are allocated in a backward manner on a first come first serve basis. • Repeat steps 4 and 5 until the allocation is complete. Chap. 6 13 • Example : Forward backward Register Allocation Chap. 6 14 • Folded architecture for matrix tranposer : 15 • Register minimization in folded architectures : Ø Perform retiming for folding Ø Write the folding equations Ø Use the folding equations to construct a lifetime table Ø Draw the lifetime chart and determine the required number of registers Ø Perform forward-backward register allocation Ø Draw the folded architecture that uses the minimum number of registers. 3à48 5à67 4à46 2à25 1à14 3à33 --2 4à91 TinàToutNode •Example : Biquad Filter ØSteps 1 & 2 have already been done. ØStep 3:The lifetime table is then constructed. The 2nd row is empty as DF(2àU) is not present. Note : As retiming for folding ensures causality, we need not add any latency. Chap. 6 16 ØStep 4 : Lifetime chart is constructed and registers determined. ØStep 5 : Forward-backward register allocation Chap. 6 17 ØFolded architecture is drawn with minimum # of registers.

Các file đính kèm theo tài liệu này:

  • pdfchap_chap6_6195_7362.pdf