Exploring Optimal Topology and Routing Algorithm for 3D Network on Chip

Problem statement: Network on Chip (NoC) is an appropriate candidate for implementing interconnections in SoCs. An increase in the number of IP blocks in a 2D NoC leads to an increase in chip area, global interconnect, length of the communication channel, number of hops traversed by a packet, latency and difficulty in clock distribution. 3D NoC has evolved to overcome the drawbacks of 2D NoC. Topology, switching mechanism and routing algorithm are the major areas of 3D NoC research. In this study, three topologies (3D-MT, 3D-ST and 3D-RNT) and a routing algorithm for 3D NoC are presented. Approach: Experiments are conducted to evaluate the performance of the topologies and the routing algorithm. The evaluation parameters are latency, drop probability, network diameter and energy dissipation. Results: A comparison of the experimental results demonstrates that 3D-RNT is a suitable candidate for 3D NoC topology. Conclusion: The performance of the topologies and routing algorithm for 3D NoC is analysed. 3D-MT is not a suitable candidate for 3D NoC, 3D-ST is a suitable candidate provided interlayer communications are frequent and 3D-RNT is a suitable candidate when interlayer communications are limited.


INTRODUCTION
According to Moore's law, the number of transistors per chip doubles every two years, which enables Integrated Circuit (IC) manufacturers to provide more powerful electronic gadgets that support multiple applications. Starting with 0.25 µm CMOS technology, wire delay dominates gate delay, and the gap between wire delay and gate delay widens as process technology improves; thus wires, not transistors, determine the performance of chips. The increase in the number of transistors in a chip permits chip designers to integrate the various components of an electronic system on a single IC to implement a complete System on a Chip (SoC). These components are called cores or Intellectual Property (IP) blocks and include microprocessors, DSPs, memory units, I/O controllers and analog signal or Radio Frequency modules (Helali et al., 2006).
Constraints such as very short time to market and test, and the need to exploit reuse, force designers to build SoCs from IP blocks designed by different IP vendors. A major challenge in SoC design is interconnecting a large number of IP blocks. At present, on-chip communications in SoCs are realized by direct crossbar interconnections and shared buses, which are inefficient in terms of performance, cost and reliability.
The Network-on-Chip (NoC) interconnection scheme is an on-chip communication methodology proposed as a unified solution to the interconnection problems faced in SoC design. In the NoC paradigm, IP blocks are connected to a packet-switched network through routers, and the routers are in turn interconnected with each other to accomplish on-chip communications (Helali et al., 2006; Dally and Towles, 2001). NoC applies packet-switching network theories to on-chip communications. A node in a NoC comprises an IP block and a router.
Advantages of NoC over conventional crossbar interconnections and shared buses (Owens et al., 2007):
• Wire segmentation and wire sharing design techniques are used to resolve the performance bottleneck caused by wire delay
• It uses a distributed control mechanism, resulting in a scalable interconnection network architecture
• Flexible and user-defined network topology
• Point-to-point connections and a Globally Asynchronous Locally Synchronous (GALS) implementation decouple the IP blocks
• Creating derivative products by easily adding and removing IP blocks from the network
Research in 3D NoC is now emerging to realize on-chip communications in 3D ICs. 3D integration is achieved by stacking a number of 2D layers. Interconnection of two neighboring 2D layers is accomplished using Through-Silicon-Vias (TSVs), which provide a vertical channel through vertical interconnect links (Loi et al., 2007).
This way, everything remains in 2D, except for the vertical links. These links can be integrated into the communication system by so-called 3D or vertical routers. The number of TSVs in a 3D architecture should be minimized, as TSVs have alignment problems and occupy considerable chip area (Davis et al., 2005). In this study, three 3D network topologies and an on-demand, source-initiated routing algorithm are presented. The topologies and routing algorithm are evaluated using the simulation tool Network Simulator-2 (NS-2). The experimental results are analyzed by comparing various parameters of the three topologies.
This study is organized as follows: 3D NoC and its advantages, the proposed topologies and routing algorithm, materials and method, evaluation metrics, discussion and conclusion.
3D NoC: It is challenging to design mixed-signal chips that combine analog processing IP blocks, such as antenna or pixel arrays, with digital IP blocks, such as microprocessors and memories, in conventional planar chip-making processes. To overcome this challenge, analog IP blocks are kept on one layer, digital IP blocks are placed on one or two other layers and the layers are combined in a single chip, which is termed a 3-D IC (Bernstein et al., 2007; Topol et al., 2006).

Advantages of 3D ICs
Chip area: Minimizing chip area is important because it generally increases yield. Not all manufactured circuits function properly, and the yield is the percentage of circuits that do. Causes of failure, such as crystal defects, defects in the masks and defects due to contact with dust particles, are less likely to affect a chip when its area is smaller.
The major advantage of a 3-D IC is a considerable reduction in chip length, resulting in a decrease in chip area. Total chip area = $x^2$ and network area = $y^2$, where x is the chip length and y is the network length. It is assumed that the length x of the 2D chip is 68 µm, y is 64 µm and the constant length a is 2 µm.
From Fig. 1, the following equations can be derived:

$$b = 2 \cdot x \cdot \frac{x - y}{2} \qquad (1)$$

$$c = 2 \cdot y \cdot \frac{x - y}{2} \qquad (2)$$

But $a = (x - y)/2$, so:

$$P = 2xa + 2ya \qquad (3)$$

$$x^2 = 2xa + 2ya + y^2 \qquad (4)$$

Using Eq. 1 and 2, the areas b and c can be calculated. The total outer periphery of the chip, P, can be calculated using Eq. 3, and Eq. 4 gives the total chip area.
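The relations in Eq. 1 to 4 can be checked numerically with the stated values x = 68 µm, y = 64 µm and a = 2 µm. The following Python lines are a minimal sketch; the function name `border_areas` is hypothetical, and b and c are read here as the two pairs of border strips whose sum is the periphery P.

```python
def border_areas(x: float, y: float) -> tuple[float, float, float]:
    """Return (a, b, c) for a square chip of length x around a square network of length y.

    a is the constant border width, b follows Eq. 1 and c follows Eq. 2.
    """
    a = (x - y) / 2          # border width on each side
    b = 2 * x * a            # Eq. 1, equal to 2xa
    c = 2 * y * a            # Eq. 2, equal to 2ya
    return a, b, c


x_um, y_um = 68.0, 64.0      # values assumed in the text
a_um, b_um2, c_um2 = border_areas(x_um, y_um)
periphery_um2 = b_um2 + c_um2              # Eq. 3: P = 2xa + 2ya
chip_area_um2 = periphery_um2 + y_um ** 2  # Eq. 4: x^2 = 2xa + 2ya + y^2

print(a_um, b_um2, c_um2, periphery_um2, chip_area_um2)
```

With these inputs the periphery plus the network area equals the square of the chip length, which is what Eq. 4 states.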
In order to reduce chip area, a single layer is divided into multiple layers. Both lengths x and y decrease as the number of layers increases. When IP blocks are placed in two layers instead of a single layer, the reduction in chip length x is not 50% but only 47%, because length a is constant. Similarly, there is no proportionate area reduction in a 3D IC when the number of layers is increased. Fig. 2 shows that length x decreases as the number of layers increases. Because the reduction in length x is not proportionate, a trade-off must be made between the number of layers and the chip area reduction in a 3D IC.
Hop count: Pavlidis and Friedman (2007) have shown that the average number of hops a packet traverses from source to destination node in a 3D IC is:

$$\mathrm{Hops} = \frac{n_1 n_2 n_3 (n_1 + n_2 + n_3) - n_3 (n_1 + n_2) - n_1 n_2}{3 (n_1 n_2 n_3 - 1)} \qquad (5)$$

where $n_1 \times n_2$ is the dimension of a layer and $n_3$ is the number of layers. The product of $n_1$ and $n_2$ gives the number of nodes in a layer. Figure 3 shows that the number of hops a packet traverses to reach the destination from the source node reduces when the number of layers is increased.
Energy dissipation: In the NoC paradigm, energy dissipation for the interconnection of IP blocks depends on two independent parameters (Kahng et al., 2009):
• Injected traffic load
• Energy dissipated in the switches and interswitch wire segments
Energy dissipation in switches and interswitch wire segments is considered here (Banerjee et al., 2004).
The following assumptions are made:
• Uniform traffic patterns are used for messages
• The length of the wire segment between two switches is fixed
• Each switch consumes 1 picojoule (pJ) of energy to process a packet
• Each interconnect wire segment consumes 1 picojoule (pJ) of energy to transfer a packet
The energy dissipated by a packet is expressed first for one hop and then for a path of distance D, where D is the distance between the source and destination node expressed in hops. The total energy consumed to move a packet from the source to the destination node is represented by $E_{Packet}$. $E_{switch}$ represents the energy consumed by both the buffering and switching activities of a router, and $E_{wiresegment}$ represents the energy consumed by charging and discharging the link capacitance. Each packet traverses (n+1) switches through n wire segments, thus the hop count is n. It is considered that the number of IP blocks to be placed is 36 and the maximum number of layers is 4. Energy dissipation for a single packet is expressed in Eq. 6. Energy dissipation can also be expressed using Eq. 7, and the total energy dissipated for n packets can be computed using Eq. 8. The average number of hops for a packet is calculated using Eq. 5, from which the average energy dissipation is computed using Eq. 7. From Fig. 3, it is concluded that average energy dissipation is reduced as the number of layers is increased.
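The hop count and energy reasoning above can be illustrated with a short Python sketch. It uses the average hop count expression for an $n_1 \times n_2 \times n_3$ mesh attributed to Pavlidis and Friedman (2007), cross-checks it by brute force, and charges 1 pJ per switch and 1 pJ per wire segment for a packet that crosses (n+1) switches and n wire segments. The function names and the four layouts chosen for 36 IP blocks are assumptions made for illustration, and the energy expression is only a reading of the prose, not necessarily the exact form of Eq. 6 to 8.

```python
from itertools import product

E_SWITCH_PJ = 1.0   # energy per switch traversal, from the stated assumptions
E_WIRE_PJ = 1.0     # energy per wire segment, from the stated assumptions


def average_hops(n1: int, n2: int, n3: int) -> float:
    """Closed-form average hop count between distinct nodes of an n1 x n2 x n3 mesh."""
    nodes = n1 * n2 * n3
    numerator = nodes * (n1 + n2 + n3) - n3 * (n1 + n2) - n1 * n2
    return numerator / (3 * (nodes - 1))


def average_hops_brute_force(n1: int, n2: int, n3: int) -> float:
    """Average Manhattan distance over all ordered pairs of distinct mesh nodes."""
    coords = list(product(range(n1), range(n2), range(n3)))
    total = sum(
        abs(s[0] - d[0]) + abs(s[1] - d[1]) + abs(s[2] - d[2])
        for s in coords
        for d in coords
        if s != d
    )
    return total / (len(coords) * (len(coords) - 1))


def packet_energy_pj(hops: float) -> float:
    """Energy for one packet: (hops + 1) switches and hops wire segments."""
    return (hops + 1) * E_SWITCH_PJ + hops * E_WIRE_PJ


# Hypothetical ways to place 36 IP blocks on 1 to 4 layers.
LAYOUTS = [(6, 6, 1), (6, 3, 2), (4, 3, 3), (3, 3, 4)]

for n1, n2, n3 in LAYOUTS:
    hops = average_hops(n1, n2, n3)
    print(f"{n3} layer(s): {hops:.3f} hops, {packet_energy_pj(hops):.3f} pJ per packet")
```

Under these assumptions the single-layer 6 x 6 layout averages 4 hops, and both the average hop count and the per-packet energy are lower for the stacked layouts.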
In addition to chip area, hop count and energy dissipation, 3-D ICs have the following advantages:
• Layer yield decreases exponentially with increasing layer size, so splitting a single-layer design into two or more layers can save money in the end
• The number of transistors that are within one clock cycle of each other increases
• Maximum global-interconnect length and average global-interconnect length both decrease by a factor equal to the square root of the number of layers being stacked
• Higher packing density
3D NoC is becoming an emerging research area as 3D ICs apply NoC to realize on-chip communications. The three topologies, 3D-MT, 3D-ST and 3D Recursive Network Topology (3D-RNT), are derived by modifying 2D topologies and are presented in Fig. 4-6. Three layers are considered in each topology, and because the topologies have more than one layer, the IP blocks may be either homogeneous or heterogeneous. In each topology, a cluster is formed by grouping four nodes, one of which is identified as the Cluster Head (CH) and acts as both CH and node. A layer has four clusters, so the total number of nodes in a layer is sixteen (Feero and Pande, 2009; Loh, 2008).
IP blocks are connected to routers, and the routers are in turn interconnected using horizontal interconnect links. Vertical interconnect links (TSVs) are used to interconnect the routers of neighboring layers to form a 3D network. Interlayer communications are realized only through CHs. CHs and other nodes are identified by a three-digit ID XYZ. The first digit X represents a layer, the second digit Y represents a cluster and the third digit Z represents either a node or the CH. In 3D-ST, the CHs in a layer are interconnected so that they communicate with each other in a single hop.
Intercluster nodes cannot communicate with each other directly; they communicate only through CHs. In 3D-MT and 3D-RNT, CHs communicate with each other only through intercluster nodes, which are allowed to communicate with each other in a single hop.
Source node ID: 242
242 → layer 2, cluster 4, node 2
241 → layer 2, cluster 4, node 1
141 → layer 1, cluster 4, node 1
041 → layer 0, cluster 4, node 1
031 → layer 0, cluster 3, node 1
032 → layer 0, cluster 3, node 2
Destination node ID: 032
The shortest path between the nodes with IDs 242 and 032 in 3D-MT and 3D-RNT is 242→241→141→041→044→032.
Hierarchy and clustering are two efficient network techniques that provide scalability, higher performance, easy maintainability, manageability and resource reusability, and they are applied in developing the 3D routing algorithm. Routing is on-demand and source-initiated. The hierarchy and clustering of nodes are represented using a tree structure, as shown in Fig. 7. The tree has three levels in its hierarchy: level 1 represents layers, level 2 represents CHs and level 3 represents nodes. A CH and the nodes connected to it form a cluster. Figure 8 shows the packet format used to transfer messages in the 3D network. The header flit has n bits, of which the first bit is End of Message (EOM) and the second bit is Start of Message (SOM). The remaining (n-2) bits indicate the IDs of the source and destination nodes. Payload flits contain a variable-length message of at most 400-n bits, as the packet size is 50 bytes.
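The three-digit ID scheme and the layer, CH and node hierarchy can be illustrated with a short Python sketch for the 3D-ST case. This is only a sketch under stated assumptions, not the routing algorithm as implemented in the study: it assumes that node digit 1 marks the CH (inferred from the IDs 241, 141 and 041 in the example), that vertical links join the CHs of the same cluster in adjacent layers, that CHs within a layer are one hop apart, and that every node reaches other nodes through its own CH. The names `decode`, `encode` and `route_3d_st` are hypothetical.

```python
CH_DIGIT = 1  # assumption: node digit 1 identifies the Cluster Head


def decode(node_id: str) -> tuple[int, int, int]:
    """Split a three-digit ID 'XYZ' into (layer, cluster, node)."""
    if len(node_id) != 3 or not node_id.isdigit():
        raise ValueError(f"expected a three-digit ID, got {node_id!r}")
    return int(node_id[0]), int(node_id[1]), int(node_id[2])


def encode(layer: int, cluster: int, node: int) -> str:
    """Build the three-digit ID 'XYZ' from its parts."""
    return f"{layer}{cluster}{node}"


def route_3d_st(src: str, dst: str) -> list[str]:
    """Return the list of node IDs visited from src to dst (sketch for 3D-ST)."""
    layer, cluster, _ = decode(src)
    dst_layer, dst_cluster, dst_node = decode(dst)
    path = [src]
    if src == dst:
        return path

    def visit(l: int, c: int, n: int) -> None:
        # Append the next node unless the packet is already there.
        node_id = encode(l, c, n)
        if node_id != path[-1]:
            path.append(node_id)

    # Leave the source through the CH of its own cluster.
    visit(layer, cluster, CH_DIGIT)
    # Level 1: reach the destination layer over vertical CH-to-CH links.
    step = 1 if dst_layer > layer else -1
    while layer != dst_layer:
        layer += step
        visit(layer, cluster, CH_DIGIT)
    # Level 2: reach the destination CH (CHs in a layer are one hop apart).
    visit(layer, dst_cluster, CH_DIGIT)
    # Level 3: reach the destination node, which may itself be the CH.
    visit(layer, dst_cluster, dst_node)
    return path


print("→".join(route_3d_st("242", "032")))
```

For the source 242 and destination 032, the sketch visits the IDs decoded in the listing above, in the same order.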

Pseudo code for routing algorithm for 3D-ST:
// Declare source node ID as ABC and destination node ID DEF
// Declare Layer ID as 0, 1 and 2
// Hierarchical level-1 Reaching destination layer
// Level-2 Reaching destination CH
// Level-3 Reaching destination node which may be either a CH or node
// Route packet to hierarchical level-
Simulation output results are observed for latency in two cases:
• Different switch buffer sizes at a fixed injected load
• Different traffic rates at a fixed switch buffer size
Two different traffic rates, 4.5 Kbps and 1 Kbps, are used for each traffic source. The number of packets sent by an individual source is 563 at the rate of 4.5 Kbps and 126 at the rate of 1 Kbps. Table 2 shows simulation results for latency at the traffic rate of 4.5 Kbps with the switch buffer size varying from 5 to 50 packets.
Switch buffer size is fixed as 50 packets and packets are injected into the network at different traffic rates. Simulation results for latency are shown in Table 3.
Evaluation metrics: The following evaluation parameters are selected for performance evaluation of the topologies and routing algorithm:
• Latency
• Drop probability

RESULTS AND DISCUSSION
Latency: Latency is defined as the time taken by a packet to travel along a communication path from its source to its intended sink. Latency is calculated in two cases:
• For different switch buffer sizes at the fixed traffic rates of 4.5 and 1 Kbps
• For different traffic rates at a fixed switch buffer size of 50 packets
Packets wait longer in the switch buffer as the number of packets increases, so the latency of a packet increases. Compared with 3D-MT and 3D-ST, latency is lower in 3D-RNT at the traffic rate of 4.5 Kbps, as shown in Fig. 9, because the topology has fewer links between the five randomly chosen source-sink pairs.
For the three topologies, Fig. 10 shows that latency remains constant as buffer size increases at the traffic rate of 1 Kbps, which indicates that latency is not sensitive to switch buffer size at a communication load of 1 Kbps. The performance of 3D-RNT is superior to that of the other topologies with respect to latency at the traffic rate of 1 Kbps.
When the traffic rate exceeds 2.5 Kbps, there is a rapid change in latency, as shown in Fig. 11. Up to a traffic rate of around 2.5 Kbps, switches use their buffer capacity to the maximum possible extent to avoid packet drops. Packet drops start once the traffic rate exceeds around 2.5 Kbps, owing to the shortage of switch buffer capacity. It is concluded that the performance of 3D-RNT with respect to latency is superior to that of 3D-MT and 3D-ST at any injected traffic rate and switch buffer size.
Drop probability: Drop probability is calculated from the number of packets sent by the source and received by the sink.
It is also calculated for the two cases.
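The two metrics defined above can be computed from per-packet records, as in the following minimal Python sketch. It assumes that drop probability is the fraction of sent packets that never reach the sink and that latency is averaged over delivered packets only; the record layout and the sample timestamps are hypothetical and are not taken from the NS-2 traces used in the study.

```python
from typing import Optional

# One record per packet: (send time in s, receive time in s or None if dropped).
PacketRecord = tuple[float, Optional[float]]


def drop_probability(records: list[PacketRecord]) -> float:
    """Fraction of sent packets that were not received by the sink."""
    if not records:
        raise ValueError("at least one sent packet is required")
    dropped = sum(1 for _, received_at in records if received_at is None)
    return dropped / len(records)


def average_latency(records: list[PacketRecord]) -> float:
    """Mean source-to-sink time over the packets that were delivered."""
    latencies = [
        received_at - sent_at
        for sent_at, received_at in records
        if received_at is not None
    ]
    if not latencies:
        raise ValueError("no packet was delivered")
    return sum(latencies) / len(latencies)


# Hypothetical sample: four packets sent, one dropped.
sample = [(0.0, 0.25), (1.0, 1.5), (2.0, None), (3.0, 3.25)]
print(drop_probability(sample), average_latency(sample))
```

If every sent packet is received, as reported for the three topologies, the drop probability computed this way is zero.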
Case I: The performance of the three topologies is identical with respect to drop probability, as the same number of packets is received in the three topologies at all instances. Fig. 12 shows that drop probability is insensitive to switch buffer size at the traffic rate of 1 Kbps.
It is concluded that 3D-MT is not a suitable candidate, as its total distance d is higher than that of the other two topologies. 3D-ST is a suitable candidate provided interlayer traffic is very frequent, as all other nodes are at a distance of less than 5 from the source node.
Energy dissipation: The distance (D) to transfer a packet is calculated for seven randomly chosen source-sink pairs and given in Table 5. The energy dissipated in transferring a single packet and multiple packets can be calculated using Eq. 7 and Eq. 8 respectively.
From Fig. 14, it is concluded that 3D-ST dissipates less energy when the traffic is interlayer. In intralayer communications, both 3D-MT and 3D-RNT dissipate less energy.

CONCLUSION
In this study, three 3D topologies and a hierarchical, cluster-based routing algorithm for 3D NoC are presented. Simulation results for the performance of the three topologies are analyzed by comparing various parameters of the topologies. As far as drop probability is concerned, the performance of the three topologies is identical, as the same number of packets is received at all instances. The latency results show that 3D-RNT outperforms 3D-MT and 3D-ST. Comparing the performance of the three topologies in terms of latency, network diameter and energy dissipation, it is concluded that 3D-MT is not a suitable candidate. 3D-RNT is a suitable candidate for applications where interlayer communications between IP blocks are very limited. 3D-ST is an appropriate 3D NoC topology for applications where frequent interlayer communications between IP blocks are required.