Trackerless bittorrent adapted to mobile ad hoc networks

Acknowledgements First, I would like give my sincere thank to my instructor, Dr. Dai Tho Nguyen, Department of Networks and Computer Communications - who has give me guidance and support during the time I work on this thesis. Next, I would like to thank the University of Technology and Engineering - Hanoi National University, the school that has given me a perfect environment to study and train myself for four years. Finally, thanks to my family and friends who have always supported and encouraged me a lot throughout my student life. They were, are, and will be the endless source of encouragement, ambition of my life. Ha Noi, June 2010 Khanh Toan Do INTRODUCTION MANETs and P2P file sharing systems are two areas that have long been studied based on a same principle: P2P paradigm. This paradigm aims to build the services of large-scale distributed systems without any infrastructure. In this model, users have an equal role. Global services are guaranteed thanks to the collaboration of users with each other. For wireless ad hoc networks, the network is a collection of wireless nodes without any central control or any base station. The network node acts as both router and host. Multi-hop routing method is used to ensure connectivity between nodes. For P2P file sharing applications, peers work together to share data and multimedia content. Each participating node will share a portion of its upload capacity to serve other nodes. The global productivity of the system increases exponentially with the number of nodes involved. Gnutella, Freenet, and BitTorrent are examples of P2P file sharing applications on the Internet. Both the file-sharing applications and ad hoc networks are all long been studied. They have been studied very much, but are independent of each other. Only very few studies try to find out if they work well together. These studies only focus on finding contents in MANETs without attention to the efficiency of content sharing. Studying the performances of file-sharing applications on mobile ad hoc networks is really challenging because of wireless channels’ diverse constraints. In fact, when the peers play a role of both routers and end users, then the routing overhead must be considered. Moreover, the performance of the transport layer protocol such as TCP decreases markedly when multi-hop paths are used. That is why file-sharing applications are not expected to work well when deployed on mobile ad hoc networks. Designing an efficient solution for the file sharing problem in wireless ad hoc networks is an important part of our study. In this work, we try to develop a file sharing application for wireless ad hoc networks. My aim is to come up with solutions that reduce the content download time while at the same time improving the distribution ratio in the network by enforcing fair sharing among peers. As efficient and fair content sharing is targeted, we choose BitTorrent given its large usage. We are interested in two main parts of the BitTorrent protocol, which are node discovering and downloading mechanisms. 1 The main framework of our study is based on [8], but the solution proposed in this paper still contains some problems with its node discovery mechanism: - - Cannot detect if a node has left the network or not. Node discovery mechanism need too much time. Because of these limitations, we replace the old presence detection method with another one introduced in [3]. The performances of our approach are proven through the ns-2 simulator to be better than the one in [8]. The remaining of this thesis is divided into 4 chapters:  Chapter 1: P2P networks We present an overview of P2P networks, their architecture, benefits and disadvantages.  Chapter 2: BitTorrent file sharing system This chapter describes about BitTorrent protocol and its installation over wired networks and mobile ad hoc networks  Chapter 3: An improvement in BitTorrent on MANETs Here is the main part of this thesis, which offers an innovative solution to BitTorrent over MANETs.  Chapter 4: Simulation results The last chapter shows the results of our simulation, in order to check the correctness arguments made above.  Conclusion and perspectives In this section, we draw a summary of our work and some plans to do in the future. TABLE OF CONTENTS INTRODUCTION 1 CHAPTER 1: PEER TO PEER NETWORKS .3 1.1 Overview .3 1.2 Definitions .3 1.3 Comparison with client/server architecture 4 1.4 Benefits and weaknesses of P2P networks .5 1.5 Classifying P2P networks .6 1.5.1 Pure P2P networks 6 1.5.2 Hybrid P2P networks 7 CHAPTER 2: BITTORRENT FILE SHARING SYSTEM .10 2.1 Overview .10 2.2 Benefits of the BitTorrent protocol 11 2.3 Limitations 13 2.4 Comparison with other file sharing protocols 13 2.5 Original BitTorrent for wired networks 14 2.5.1 Description .14 2.5.2 Operation .15 2.5.3 Creating and publishing torrents .16 2.5.4 Downloading torrents and sharing files 17 2.6 BitTorrent variant for wireless ad hoc networks 18 2.6.1 Trackerless BitTorrent 18 2.6.2 Packets exchanged between peers .18 2.6.3 Downloading mechanism .19 2.6.4 Selecting a neighbor at random .20 2.6.5 Piece selection strategy .21 2.7 Performance metrics 21 CHAPTER 3: AN IMPROVEMENT OF BITTORRENT ADAPTATION TO MANETS 24 3.1 General ideas 24 3.2 Node presence detection mechanism .24 3.2.1 Soft state bloom filter .25 3.2.2 Bloom filter operations .26 3.2.3 Decoupling information decay and BEACON Intervals 27 3.3 Integration of node presence detection mechanism into BitTorrent 28 3.4 Conclusion 30 CHAPTER 4: SIMULATION RESULTS 31 4.1 Network simulator (NS-2) 31 4.2 Main scenario .31 4.2.1 Estimate the average finish time 34 4.2.2 Estimate the average sharing ratio 35 4.2.3 Estimate the network traffic .36 4.2.4 Impact of the number of network nodes .37 CONCLUSIONS AND PERSPECTIVES 38 REFERENCES .39

pdf47 trang | Chia sẻ: banmai | Lượt xem: 2036 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Trackerless bittorrent adapted to mobile ad hoc networks, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
1.5.1 Pure P2P networks In the pure models, nodes have the same roles, capabilities and responsibilities. Communications between the nodes is symmetric in the sense that each node can both play the role act as both client and server. There is not existence of a defined client/server relationship between the nodes. The program runs on each node must perform the functions of the client and the server. Because of no control mechanisms and centralized management so that the most difficult problem pure networks model faces is detecting and locating the file in the network. Gnutella applies the mechanism to promote communication: when a node receives the query and cannot find files on your machine, it will forward the search message to all neighboring node. Obviously, this method will not ensure proper operation in a large network, which has many nodes. With Freenet, file-searching mechanism has been improved through the ability selecting the neighbor node to send a message. However, the algorithm used to select neighbors is not simple. 7 Figure 2: A pure P2P network 1.5.2 Hybrid P2P networks As we know the difficulty in locating, searching for files is the biggest drawback of the pure P2P network. The cause of this difficulty comes from the nature of it completely decentralized. It has proposed a new model to reduce this problem by bringing centralized elements to the network. One or more nodes will act as special. They are used to provide information regarding the location of the files in the network instead of being used to store files directly. The special node called the searching server. When a node wants to find files shared on the network, it will send a query to searching servers. Returned from the searching server for the node will find a list of files associated with the node address on the network. Then the node can connect directly to one another to begin file transfer. There is a combination of client/server and P2P architecture. Interaction with the searching server node comply client/server architecture, while exchange files between the nodes is P2P. Therefore, we call this as hybrid model. Napster and BitTorrent are typically file-sharing applications based on hybrid model. Their server does not directly in storing data, but only contains information about the location of files shared on the network. 8 Figure 3: A hybrid P2P network Because this model is the combination of client/server and P2P architectures, should be able to see that the hybrid model contains both advantages and disadvantages of the two architectures. Centralized calculating in files positioning function can lead to overloading the server when the network is looking to expand, and there are many nodes involvement. In return we have a more quickly and more efficiently searching mechanism. The nature of P2P interaction directly between the nodes that still offers significant advantages through utilization of network resources. But here arises a problem concerning the organization of storage the information about the location of the file shared on network. Usually they organize the information into a database. Each record includes a file with file name and address of the node currently hosting the file. This is a popular way to organize applications using P2P file-sharing hybrid model like Napster or OpenNap. However, limitations of this method is that it only allows users to search by file name (or general file identifiers). This is suitable only for sharing music file or photo libraries. With more complex libraries of documents related to the character (text files, books, articles, ...), demand does not stop from looking at the name or identity criteria. Currently almost all file-sharing applications commonly found on the web do not allow to search features of the content within document files. It can be seen that to put this searching feature function into a file-sharing applications, we found it necessary to apply the methods and techniques used in information retrieval . When that information about the documents not only of 9 the name (identification) of documents but also include further information on the keywords contained in the document. Information on the searching server must be organized as a system indexing by keyword. 10 CHAPTER 2: BITTORRENT FILE SHARING SYSTEM In this chapter, we present an overview of the BitTorrent file sharing system and its implementation over wired networks and mobile ad hoc networks. 2.1 Overview File sharing is very well known in the entire community due to its popular usage. A large amount of bandwidth on the Internet is used by different file sharing applications/protocols. It has been estimated that it may account for as much as 43% of all Internet traffic (depending on geographical location) as of February 2009[14]. BitTorrent is one of the most popular protocols for transferring large files. There are many famous applications using BitTorrent protocol such as uTorrent or Podcast. Besides legal distribution of files, BitTorrent is often used to illegally share and download copyrighted material. The technology within the protocol makes it possible to distribute large amounts of data without the need of a high capacity server, and expensive bandwidth. This is probably the main reason for using this protocol instead of using traditional downloading from HTTP or FTP servers. For those sharing and downloading copyrighted files, BitTorrent is profitable because it is decentralized. 11 Figure 4: An example of the BitTorrent file sharing system 2.2 Benefits of the BitTorrent protocol The BitTorrent protocol helps to improve the previous generation of P2P software with a number of significant initiatives, which shall be discussed in next subsections. High speed downloading Simply put, BitTorrent is quite popular due to its parallel file transfer process. Unlike the typical network behavior, which bottlenecks cause performance to plummet; BitTorrent actually increases the demand to improve the performance of a computer network. This behavior increases the stability ratio for the distribution of large, popular content on the internet, for servers of limited capacity. The protocol achieves this by splitting large files into smaller blocks, such as 128 kilobytes, by splitting into blocks, the source components may be required from many sources, but 12 not need to set the complete file to be able to load up the block, which means downloading the software can quickly start contributing as uploaders for other users Simplicity, easy to use Unlike other complex previous P2P protocols, the BitTorrent protocol works on a web based distribution model built over the HTTP protocol. Users simply download a client, which to a user act as an extension of their internet browser. By opening the files ‘.Torrent’, client will allow the users to save files to a location of their choice on the hard drives in the same fashion as original file transfer applications. Until the user closes the client, BitTorrent will continue sharing the file to other clients. Effective use of limited upload bandwidth Home broadband connections have quite limited upload bandwidth available is a big problem of P2P systems. To solve this, the BitTorrent protocol uses a few mechanisms to mitigate for this problem. When a client has received a few initial blocks, it can then use its upload bandwidth to forward those blocks to other machines. Another aspect is that transfers are limited to four open sessions, and the number of files hosted is limited to a few users; by focusing on a small number of activities, the network can reduces overhead. By reducing the number of uploads per machine, and having considerably more machines uploading content, the time required by a client for downloading a large file is considerably less than tr aditional means. Overcoming the Free-Rider problem In typical P2P systems, the performance of the overall system downs when users trying to obtain service without contributing anything to the system. This is an old problem, and BitTorrent was designed with an approach of local optimization by allocating the most upload bandwidth to peers, which are supplying the greatest amount of content. Because of the design of the current BitTorrent Protocol, free riders can only get 20% of the total system bandwidth. High content integrity BitTorrent employs two mechanisms, which make it more reliable. First, to be accepted on a file tracking server, typically new content must be accepted and managed by moderators who manually inspect it. To reduce the overload for moderators, regular suppliers of content can be promoted to immoderate submitters 13 after gaining enough trust. Once the original content is considered trusted, a validation checksum of file will be presented from the server to guarantee the content integrity. 2.3 Limitations BitTorrent protocol does not help users to hide their identifiers. Because the tracker maintains a list of files being shared, it also contains a list of IP addresses of the computers are downloading the file, and the list of the files that have been downloaded previously. Based on the BitTorrent protocol also ascertains the address of the peer in the swarm and of course, the peer can be attacked. Another disadvantage of the BitTorrent protocol is less encouraging peer into seeder after file downloading is complete. Consequently, seeders and peers in swarm will disappear gradually, which means that more older torrents to download the file, then the probability of success lower. BitTorrent has the advantage in the broadband environment, such as DSL, cable, satellite ... but for dial-up Internet users using the BitTorrent protocol will not work, because dial-up connection or disconnection and speed not load high. 2.4 Comparison with other file sharing protocols The method used to distribute files between eDonkey2000 and BitTorrent networks is similar, but the clients in the eDonkey network usually share and download many files, making the bandwidth per carrier becomes less. In contrast, transporting in BitTorrent is much faster by focusing on a file or a specific group of files. Original eDonkey2000 protocol provides very little resistance to fraud machines (download more, upload very little), the new version of eDonkey2000 client has installed the system which encourages more uploads. For example, systematic program eMule (credits system) have a score system to encourage uploaders. One client will prioritize the shipment to its previous position by moving the client to the top of the queue to wait for less time. This system proved effective because a queue for each client using eMule often has hundreds, even thousand entries. KaZaA is a protocol similar to BitTorrent protocol, but it has another point that is its distinguishing workstations by dedication level (Participation Level). Contribution level increases when you upload and decreases when you download. Once you upload a resource, it devotes to the person with the highest-level first and then this person devote to the lower contribution level one and so on. This model is similar to the pyramid model; the most uploader is at the top position of the pyramid, 14 and the less one is on the bottom. KaZaA is only appropriate model distribution of resources for a large number of users, it has been shown that the file at the bottom of the pyramid can be downloaded faster than on the case by means of HTTP (in the case file is large). Nevertheless, KaZaA has a small downside is that it believes the report of the workstations on the level of dedication, so clients can cheat with a lot of dedication for the workstation which is not official. 2.5 Original BitTorrent for wired networks 2.5.1 Description BitTorrent protocol defines a method to disseminate and share files online. Before having BitTorrent protocol, there were some P2P protocols, which allow a computer in the network to share files with other computers without having a centralized server. BitTorrent is an improvement from the previous P2P protocol. BitTorrent protocol has a principle to work closely with the ability to customize, reliable and cost of maintaining a list of computer file sharing better than the previous P2P protocols. Because of communication following standard TCP/ IP, BitTorrent protocol can operate on regular Internet connection [10]. BitTorrent protocol, which allows users to distribute large amounts of data without depending much on their computers, would be needed for standard Internet hosting. This protocol works as an alternative data distribution method that makes even small devices (e.g. mobile phones) with low bandwidth capable of participating in large data transfers. BitTorrent client is a program operating under the BitTorrent protocol. Each BitTorrent client can compares, requirements, and transports files on the network using the BitTorrent protocol. Files can contain any information, including text, audio, video and encrypted content. First, a user playing the role of file-provider makes a file available to the network. This first user's file that is called a seed allows other users, called peers, to connect and start to download the file. When new peers connect to the network and request the same file, their computer gets a different piece of the data from the seed. Once multiple peers have multiple pieces of the seed, each of them becomes a source for that continuously distributes the file. The effect of this is to reduce the dependent on the initial user and distribute the file download task among the seed and many peers. With BitTorrent, there is no need to provide a computer data quantities, which could jeopardize the task by overwhelming all resources, but the same result is still reached: each peer eventually receiving the entire file. 15 After the file successfully and completely downloaded, the peer can shift roles and become an additional seed, helping the other peers to get the whole file. This eventual change from peers to seeds determined the number of times a file is available in its complete form. This distributed nature of BitTorrent leads to spread the file throughout peers. As more peers join the swarm, the likelihood of a successful download increases. Compared with standard Internet hosting, this reduces the original distributor's hardware and bandwidth resource costs significantly. It also reduces dependence on the original distributor and provides a source for the file, which is generally temporary and therefore more difficult to trace than providing by the long-term availability of a server in standard file distribution techniques. 2.5.2 Operation BitTorrent greatly reduces the load on the server, since users usually download files between them, not the server. As shown by the colored bars below each client, the file is downloaded in a random order, instead of carrying a sequential order. Unlike the systems of traditional file sharing, its main objective is to provide an efficient way to distribute a single file to a large group of people, forcing all those who download a file to share it with others. First is distributing by conventional means a small file with a. torrent. This file is static, so often found on websites or distributed via email. The file 'torrent' contains the address of a "search server", called tracker, which is responsible for locating potential sources for the file or part thereof. This server is really centralized and it provides statistics on the number of transfers, the number of nodes with a full copy of the file and the number of nodes that have only a portion thereof. The library file or files you want to download from the sources found by the tracker, and at the same time as the download, it starts to climb parts of the file available to other sources, using the bandwidth allocated to it. Since the action of sharing begins even before completing the download of a file, each peer will inevitably contribute to the distribution of this file. The system is responsible for rewarding those who share more, the higher the bandwidth the greater the number of connections to discharge peers to be established. When a peer starts downloading a file, BitTorrent does not necessarily come at the beginning of the file, but is low in parts at random, after peers are connected to 16 download the file. If the peers are available online each part of the whole file (even if they are scattered), eventually every peers will get a full copy of it. Of course, initially one must have the complete file to begin the process. This method produces significant improvements in transfer rate when many users connect to download the same file. Where there is not any node with having complete file ("seeder" or "seed") connected to the tracker, the possibility exists that the file cannot be completed. 2.5.3 Creating and publishing torrents The peers that are distributed over the network treat the file as a separation of it in a number of identically sized pieces, typically between 32 KB and 4 MB each. Each peer performs a checksum (checksum) for each part, using the SHA-1 algorithm, and storing it in the torrent file. Parts greater than 512 KB will reduce the size of a torrent file for each payload, but this would reduce the efficiency of the protocol. When another peer later receives a particular piece, the hash of the piece is compared to the recorded hash to check that is free from error. Peers that provide complete files be called seeds (seeders) and the peer that provides the initial copy of the file is called initial seed (initial seeder). The exact information that is contained in the torrent file depends on the BitTorrent protocol’s version. By convention, the name of a torrent file has the extension ".torrent". Torrent files have a section called "announce", which specifies the URL of the tracker, and "Info" section, which contains the names of the files, their sizes, length of piece used, and SHA-1 hash code for each of the pieces, all this information is used by clients to verify the integrity of the data received. Once completed torrents files are posted on a website or elsewhere, and are registered with an origin server, which is called a tracker, it keeps the list of clients who are currently participating on the torrent file. Alternatively, in a decentralized system, each peer acts as a tracker. BitTorrent clients such as μTorrent, BitComet, KTorrent and Deluge, implement this, using methods of Distributed hash table (DHT). Azureus also supports tracing method is incompatible nodes (April 2007) with the DHT that offers its customers [2]. Once DHT has passed, a "private" flag like broadcast flag was unofficially announced, telling clients to restrict the use of decentralized monitoring whether the user desires. Flag is intentionally placed in info section so it cannot be disabled or 17 removed without changing the identity of the torrent. The purpose of the flag is to prevent torrents from being shared with clients without access to tracker. The flag was requested to be included in the formal specification in August, in 2008, but it has not been accepted. 2.5.4 Downloading torrents and sharing files Using any Internet browser, like Firefox, browse the site has a list of torrents, download it then use BitTorrent client open out there. After opening the torrent file, BitTorrent will connect to the trackers, the trackers will give it a list of the peers are downloading this file. A group of members (or peers) of a BitTorrent network which download the same file is called the swarm. When the client connects to the swarm, it will start sharing files with each other. The clients will share the pieces together instead of sharing direct with the trackers, so the number of clients in the swarm can develop very quickly. In BitTorrent network, each of clients shares some of its resources such as data, upload bandwidth with other peers interested in the same content in order to go up the global system capacity. Peers participating to download the same content form a torrent. A peer discovers other peers by contacting a central rendezvous node called tracker. The latter store IP addresses of all peers in the torrent and manage statistics of uploader and downloader on network. To make the replication of the content in the network easier and to guarantee multi-sourcing, a file is subdivided into a set of pieces. Then, each piece is also subdivided into blocks. Seed (seeder) is a peer, which has all pieces of the file. When the peer is still downloading pieces, it is called leecher. The neighbors of a node are contained in a peer list which it can open a TCP-connection to exchange data and information. Only four simultaneous outgoing active TCP connections are accepted by the protocol. These neighbors are called effective neighbors. They are chose by the BitTorrent’s choking algorithm. This algorithm is performed periodically. When the choking period expires, a peer chooses to unchoke the 3 uploaders to him at the highest rate. It is a best slot unchoking. This strategy, called tit-for-tat, ensures reciprocity and enforces collaboration among peers. Now to discover new upload capacities, a peer chooses randomly the fourth peer to unchoke. This slot is called optimistic slot. All other neighbors are left choked. Once unchoked, a peer chooses a piece to download using a specific piece selection strategy that is called local rarest first. Indeed, each peer maintains an update-to-date list of pieces owned by all its neighbors. When choosing a piece, a peer selects the piece with the least redundancy in its neighbors. One of the rarest pieces is selected randomly in case of equality. Rarest first is supposed to grow up the entropy of pieces in the network that enforces cooperation and hence improves global performance [14]. 18 2.6 BitTorrent variant for wireless ad hoc networks Some works have tried to adapt to the BitTorrent network to MANETs (eg, [1] and [13]). They focus only improved phase detection node without resolve to improve the efficiency of content sharing. Michiardi et al [7] study in the performance of a cooperative mechanism for distributing content from a source to a destination of great potential. They propose to implement a small change BitTorrent that allows detection neighbors and local traffic. This is done by selecting only close neighbors as effective neighbors. The result is a decrease in download time and total energy consumption. However, this below solution goes further by focusing not only on download time, but also on the sharing between peers. 2.6.1 Trackerless BitTorrent According to the feature of MANETs that peers themselves linking into a network, a centralized tracker cannot install. To adapt BitTorrent over wireless ad hoc network, we need a new mechanism to replace tracker's role: discovering and identifying the other peers. M.K Sbai gave a method using HELLO message to solve such problem [8]. To discover new peers, a peer floods periodically a HELLO message to the network and waits for HELLO REPLY messages. These HELLO messages are forwarded to the other neighbors with some initial TTL (Time-To-Live) to control the scope of the flood and hence the visibility of a peer. Receiving a HELLO message, a peer decreases the TTL and forwards it to its wireless neighbors, and so on until its TTL reaches zero. 2.6.2 Packets exchanged between peers There are 2 types of packets exchanged in the network: Data packets and control packets. We choose to send data packets via TCP connections because TCP provides reliability and congestion controls needed to transfer blocks of file. Control packets such as HELLO message (or BEACON message in new version) and piece updates are better when using UDP. Here are the control packets exchanged between peers: - HELLO: see section 2.6.1 - HELLO REPLY: see section 2.6.1 - UPDATE PIECE LIST: when a peer receives a new piece, it sends an UPDATE PIECE LIST to all peers with whom it can exchange data. 19 - PIECE OFFER REQUEST: when a peer i unchokes a peer j, it sends a PIECE OFFER REQUEST packet to j. This packet contains the list of pieces that i has already downloaded. - PIECE OFFER REPLY: receiving a PIECE OFFER REQUEST, a peer sends a PIECE OFFER REPLY packet to the sender. A flag included in this packet indicates the offer being accepted or not. 2.6.3 Downloading mechanism To profit from the advantages of the limit neighborhood, M. Sbai gave an idea that modified the original BitTorrent to adapt to MANETs [8]. A few TCP connections are created to distant peers in addition to those with close peers. Pieces can then spread over the network and they will be propagated in different directions, which improve the sharing ratio and the download completion time. Therefore, several zones of the network can be active simultaneously, which makes the network becoming more dynamic. The new choking algorithm is used by utilizing routing information. It distributes optimistic unchokes between distant and close peers and adds a specific neighbor selection mechanism to choose a remote peer. It also applies a new piece selection strategy when the distant peer is offered. Unlike regular BitTorrent with limited neighborhood, this method requires a global knowledge about the identifiers of peers in the network. In the modification BitTorrent, each peer maintains 2 tables: NEARBY NEIGHBOR TABLE (NNT) and FAR NEIGHBOR TABLE (FNT). One and two hops neighbors will are added to NNT, the others belong to FNT. The PIECE UPDATE packets are sent only to neighbors in NNT. Due to the piece selection strategy operates only on the NNTs; peers don’t need to know about all pieces in the network. Indeed, in wireless networks, the copy of pieces is more efficient when it is based on statistics in the nearby neighborhood since this guarantees a faster local replication compared to a large neighborhood. As in BitTorrent, the choking algorithm selects three best uploaders as effective neighbors. These three neighbors are chosen from both nearby and far neighbor tables. The peer then serves these three neighbors during the next choking period. But in addition to these effective neighbors, the fourth random neighbor is selected from one of the two tables (optimistic slot) using a round robin policy that guarantees an optimal balance between the random unchokes locally and the transmission of pieces to remote neighbors in order to improve diversity. In detail, the peer selects a peer one time from FNT, q times from NNT and so on. In this protocol, the ratio of the number of time slots spent on serving nearby neighbors and those for serving far neighbors bases on the quantum q. It is also the number of slots 20 that a peer should wait before unchoking a distant neighbor again. Sbai’s experiments shown that the optimal choice of the quantum q depend on the number of nodes in network, the maximum length of a path between 2 nodes (hm) and the number of pieces that can be sent during a chocking slot to a node: 𝑞 = 𝛼ℎ𝑚 𝛼1 . ℎ𝑚 2 . 𝑁 2 - N: number of nodes in the network - hm: maximum length of a path between 2 nodes - αi: number of pieces that can be sent during a chocking slot to a node located at i hops For example, in our simulation where N = 40, hm = 12, α1 = 100, α1 = 1, the quantum q is 1 (q is a integer) 2.6.4 Selecting a neighbor at random Selecting a far neighbor In a regular BitTorrent client, a random peer is selected by a uniform probability to unchoke optimistically. While in wireless networks however, the gain we get from optimistic unchoking in terms of diversity increases with the number of hops. Therefore, a peer has more interest in unchoking a farther peer than another one closer to it. Thus, in this modification version of BitTorrent, to select a distant peer to unchoke from FNT, the peer starts by selecting the number of hops to that peer with a probability p given by this formula: 𝑝 ℎ = ℎ ℎ𝑚 + (ℎ𝑚 − 1) 𝑖𝑓 ℎ > ℎ𝑚 − 1 0 𝑒𝑙𝑠𝑒 - hm: maximum number of hops seen by the peer - Suppose that FNT contains only peer at hm and hm -1 hops Once the number of hops h is chosen, the peer at h hops, which is chosen randomly, is the optimal node to unchoke. 21 Selecting a nearby neighbor Once the peer needs to select a nearby neighbor, it chooses a node from NNT in a uniform random way. 2.6.5 Piece selection strategy When the offering is far When receiving a PIECE OFFER message from a wireless neighbor, it considers that it is an offer from far neighbor if the number of hops to the offering node is greater than 2. A specific piece selection strategy will be applied, called the absent piece strategy. The peer firstly computes the redundancy of the offered pieces in its NNT and its piece pool. At the opposite of BitTorrent, the candidate pieces will be those with zero redundancy due to the fact that no need to download a piece from a far node if it exists at nearly neighbor table. So, a piece can be accepted only none of its near neighbors has downloaded it before. If there is more than one absent pieces, one is chosen randomly. The absent piece can then be replicated quickly in the near neighborhood. If no absent piece is noticed, the peer sets a REJECT flag in the PIECE OFFER REPLY packet. When the offering is near Once the peer receives a PIECE OFFER from one of its nearby neighbors, local rarest first algorithm will be used. Pieces with the least number of copies in the NNT are selected. Rarest first is supposed to increase the entropy of pieces in the network that enforces collaboration and hence improves global performance. 2.7 Performance metrics Here are the performance metrics of BitTorrent protocol that we use in through our study: – Uij: Total bytes uploaded by peer i to peer j – Dij: Total bytes downloaded by peer i from peer j. (Ui j = Dji) 22 – Rij: Ratio of sharing between peer i and peer j. 𝑅𝑖𝑗 = 𝑀𝑖𝑛(𝑈𝑖𝑗 ,𝐷𝑖𝑗 ) 𝑀𝑎𝑥(𝑈𝑖𝑗 ,𝐷𝑖𝑗 ) – Ni: Number of neighbors j of peer i such that Uij = 0 or Dij = 0 – Ri: Sharing ratio for node i 𝑅𝑖 = 1 𝑁𝑖 . 𝑅𝑖𝑗 𝑗 | 𝑈𝑖𝑗 ≠0 𝑜𝑟 𝐷𝑖𝑗 ≠0 – Fi: The finish time of peer i. It is the time by which i receives all pieces of the file. Following our instruction, there are some additional performance metrics related to topological positions of peers below. The file is supposed to exist at one seed S at the beginning of the session. Our metrics quantify t he quality of service perceived by peers as a function of their relative positions with respect to the seed. – Fh: Average finish time of peers (or nodes) located at h hops from seed S. 𝐹𝑛 = 1 𝑛ℎ . 𝐹𝑖 𝑖 | 𝐻 𝑖 = ℎ Where nh is the number of peers located at h hops from seed S and H(i) is a function that gives the number of hops between any node i and the seed S. – Rh: Average sharing ratio of peers (or nodes) located at h hops from seed S. 23 𝑅𝑛 = 1 𝑛ℎ . 𝑅𝑖 𝑖 | 𝐻(𝑖) = ℎ 24 CHAPTER 3: AN IMPROVEMENT OF BITTORRENT ADAPTATION TO MANETS 3.1 General ideas BitTorrent uses a centralized node (called a tracker) to solve node discovery problems. This means that each node participating in the BitTorrent system must know the address of the tracker and contact the tracker to get the addresses of other nodes in the network. This cannot be done in mobile ad hoc networks because this kind of networks is formed of mobile devices and it may not rely on a tracker as in conventional wired networks. This issue has been mentioned in the article [8], which uses HELLO messages, but it still contains some problems. Recall that to discover new peers, a peer periodically floods the network with a HELLO message and waits for HELLO REPLY messages. HELLO messages are transmitted to mobile neighbors with some initial TTL (Time-To-Live) to control the scope of the flood and hence the visibility of the peer. This process consumes too many resources and increases the network overhead. Besides, one of the most important characteristics of mobile ad hoc networks is the frequent movement of nodes. Therefore, that solution is not optimal for BitTorrent protocol over MANETs. Dr. Tran Thi Minh Chau gives a new mechanism that solve above issues [3]. This method has never ever applied in any P2P systems. It is a motivation for me to do my thesis. To improve original BitTorrent’s performance when it is deployed on MANETs, we replace the old peer discovery technique with the node presence detection mechanism proposed in [3]. To achieve the same stability as the former BitTorrent, we need some modifications, which will be introduced in section 3.3. 3.2 Node presence detection mechanism We assume that each node in the network has a unique ID. This may be a MAC – or statically assigned IP address. In the old node presence detection mechanism, the IDs of the nodes are transmitted via HELLO messages, and these messages will be forwarded to the other nodes. The obvious problem of this approach is that the amount of data distributed to each node would increase linearly with the number of nodes in the network, which increases network load accordingly. 25 In order to avoid this problem, we use another method to discover node presence in the network, which is described in [3]. This method uses a data structure, which contains the node’s presence status: soft state bloom filter. 3.2.1 Soft state bloom filter A soft state bloom filter is a data structure that represents a set S = {s 1, s2 … sn} of n elements to support membership queries. It is an array of m*l bits (l is typically small, e.g. l = 3). Each element of S stores a counter. A node initializes all counters with the maximum value 2 l - 1. This maximum value indicates that the position of the Bloom filter is not set. K independent hash functions h1, h2 … hk are used, each maps every possible item in the set to a uniformly distributed value in the range {1 … m}. Any hash function with good random distribution and outputs long enough for the Bloom filter size can be selected. There are two basic operations in a bloom filter. To add a new element x, which is a node ID in our specific case, the counters at positions h1(x), h2(x)… hk(x) in S are set to 0, as depicted in Figure 5: Figure 5: A soft state Bloom filter In order to determine the presence of node x, the bits at these positions will be checked. If any of these is 2 l – 1, it means x is not in S. Otherwise, it can be assumed that x is in S with some remaining probability of a false positive that occurs when an element is actually not in the set, but all respective bit positions have been set to 0 by adding other elements. 26 Periodically, before sending a beacon message, a node applies the hash functions to its own ID, all counters at h1(id), h2(id)… hk(id) will be set to 0, and the others that are not equal to 2 l - 1 will be incremented by one. The aging of the information means that positions of leaving nodes will be expired. Thus, it can remove nodes, which are no longer present from the Bloom filter. 3.2.2 Bloom filter operations Timeout and refresh Each node performs three following steps during the timeout and refresh operation: (1) Increment each si by one, if it has not reached the limit of 2 l - 1 (2) Refresh the information about the node’s own presence, by setting s hj(ID) = 0 for all j = 1, 2… k where ID is the ID of the local node. (3) Broadcast the updated aggregate to the neighbors. Merge operation When a node receives a beacon message, the Bloom filter information of the sender will be merged into the local array. Each Bloom filter position is set to the minimum of local and received ages. Presence detection To check another node x is present or not, a node u checks its local aggregate at positions h1(x), h2(x)… hk(x). If one of them equal to 2 l - 1, we conclude that x is not present. We can observe the operation of the BF through the following example: For simplicity, we consider 2 node X, Y and 4 events: 27 Figure 6: Update Bloom filter 3.2.3 Decoupling information decay and BEACON Intervals In some situations, the aging of information is slower than the beaconing in a naive way, for example, when the counter values are increased only every b > 1 beaconing intervals. Let A and B be two neighboring nodes. Assume that si is currently at value 5 in both A and B, but no longer refreshed and should be subject to decay. A increases si to 6, but B’s value of si is still 5, and then A will receive another beacon from B with si = 5, thus resetting A’s aggregate. This will happen as same as with B’s value. To solve this problem, we need a small soft state bloom filter implementation: whenever a beacon is sent without previously ageing the counter values in the local aggregate, the sent-out copy of the aggregate must be decayed (and refreshed) prior to sending it to the neighbors. That is, if A’s value of si is currently 5, and it sends a beacon without decaying its local copy of the aggregate, the broadcast value will be si = 6. 28 Figure 7: Problem when updating Bloom filter 3.3 Integration of node presence detection mechanism into BitTorrent The main goal of our variant of BitTorrent is the combination of BitTorrent downloading mechanism with the new peer discovery method described above to achieve better performances. My simulation results described in the next section will prove this. Instead of using HELLO message, a peer floods periodically the network with a BEACON message. This message contains not only its ID but also a Bloom Filter, which maintains the status of peers in the network. Before sending BEACON message, the peer resets the information about its own presence by applying the hash functions, set hi(ID) = 0 to confirm its existence to the other peers. Upon receiving a BEACON message, a peer merges information from this message and its Bloom filter. The other sender’s neighbor situations are also updated by this operation. Each Bloom filter position is set to the minimum of local and received ages. For example, the sender’s positions in Bloom filter are set to 0 - the minimum value. When a peer no longer exists in the network, its positions in Bloom filter will reach the maximum value: 2 l – 1. Therefore, it needs to be removed from the neighbor tables to avoid affecting to the downloading algorithm. 29 The sending and receiving BEACON messages strategies are very important. They not only help to update the routing table, identify peer IDs but also allow peers to detect and add new peers into their neighbor table. When there are enough entries (≥ 4) in the routing table , the BitTorrent neighboring selection and downloading algorithms, which are present in section 3.6, will be deployed. Firstly, four peers in NNT/FNT are chosen as effective neighbors. They have to be guaranteed as regular peers by using Presence detection operation of Bloom filter (see in section 3.2.2). After that, peers start transferring their pieces. This process will be ongoing and new status of peers in the network will be updated again by using BEACON messages. Here are some functions, which are used to implement the new presence detection mechanism: Algorithm 1: update Bloom filter: Algorithm 2: Check presence: Algorithm 3: Update neighbor tables of peer x: merge(BF sbl): for i ← 0 to m x.bloom[i] = Min(x.bloom[i], sbl[i]) checkPresence(Peer p): for i ← 0 to k if (bloom[hi[p.ID]] = 2 l - 1) removeFromNeighbors(p) for i ← 0 to neighbors.size if hopCount(x, neighbors[i]) ≥ 2 FNT.append(neighbors[i]) else NNT.append(neighbors[i]) 30 Figure 8: Flowchart of new implementation 3.4 Conclusion To sum up, in this chapter, we presented an idea to improve the BitTorrent protocol over MANETs using Bloom filters. According to prove our solution, the next chapter will explain our NS-2 simulation and its results. 31 CHAPTER 4: SIMULATION RESULTS 4.1 Network simulator (NS-2) Network Simulator (Version 2), widely known as NS-2 [9], is a discrete event network simulator. NS is popularly used in the simulation of routing and multicast protocols, among others, and is heavily used in networking research. Simulation of wired as well as wireless network functions and protocols (e.g., routing algorithms, TCP, UDP) can be done using NS-2. In general, NS-2 provides users with a way of specifying such network protocols and simulating their corresponding behaviors. NS-2 is popular in academia for its extensibility (because of its open source model) and plentiful online documentation. 4.2 Main scenario To evaluate the performances of our proposition, we conduct simulation experiments using network simulator (NS-2). We develop an NS-2 application implement the one that M. Sbai built based on basic concepts of BitTorrent algorithm presented in section 2.6 and a new peer discovering mechanism using Bloom filters. Base on the simulation results, we will show that new presence detection mechanism can be deployed to replace the old version completely with better performances. The below parameters are the detailed models of the system in which new version simulation is deployed. 32 Table 1: Values of parameters using in the simulation Parameter Value Number of nodes 40 Number of nodes per row 10 Distance 40 m Routing protocol DSDV Starting download time 1500 s Data rate 1 MB/s File size 10 MB Block size 1 KB Piece size 100 blocks Number of seeds 1 Choking period 40s TTL 2 Quantum (q) variable We initialize a network of N nodes (N = 40 in default) distributed in a grid topology (10 nodes per row). The distance between 2 neighbors is set to 40m while the wireless transmission range is 50m. 33 Figure 9: Network topology Every node connects to each other using 802.11 MAC Layer with the RTS/CTS- Data/ACK mechanism enabled. The data rate is set to 1 MB/s. For ad hoc routing, we use the DSDV proactive protocol [4], and the node mobility is not set. At the beginning of simulation, node 0 located at top left is the seed and all other nodes are leechers. To ensure the convergence of the protocol to equilibrium, the file size is set equal to 10 Mbytes. The file is subdivided into 100 pieces, which contains 100 blocks of 1KB. The first 1500s of simulation is the time interval for network stabilizing, the BEACON messages are also sent in this time to calculate routing tables and initialize Bloom filters. After that, all peers start downloading the file using BitTorrent algorithms by first looking for each other then sharing the pieces. BitTorrent chocking algorithm period is taken 40s. To study the performance of our solution, we run both the old simulation with HELLO messages and the new simulation with BEACON messages to compare the results. We change some parameters of the simulation and observe the behavior of the download finish times of peers and their sharing ratios. 34 4.2.1 Estimate the average finish time Figure 10: Average finish time for our enhanced BitTorrent compared to ordinary BitTorrent with limited neighborhood Figure 8 compares the finish time of ordinary BitTorrent with limited neighborhood (TTL = 2) and our version of BitTorrent using Bloom filter. Each curve presents the average finish time Fh as a function of the number of hop to the seed. As expected, the finish time increases as far as we move away from the source. One can notice in the figure that the finish time in almost peers, which use BEACON messages, is better than HELLO version. This is because of the time to discovery new peers using BEACON messages is shorter than using HELLO, beside, in our solution, peers don’t need to send HELLO_REPLY to the HELLO sender, and they only have to process the BEACON messages. Thus, the network loading also is reduced considerably. 35 4.2.2 Estimate the average sharing ratio Figure 11: Average sharing ratio for our enhanced BitTorrent compared to ordinary BitTorrent with limited neighborhood Another important factor in BitTorrent over wireless ad-hoc networks is the sharing ratio of peers. This observation is illustrated in Figure 9 which compares the sharing ratios of the ordinary BitTorrent with limited neighborhood (TTL=2) with our variant of BitTorrent using Bloom filters. We can easily see that the difference between 2 versions is not significant. This is due to the change of using BEACON does not affect much the neighbor tables. Therefore, the neighboring selection mechanism still works almost the same as the ordinary version. That guarantees the completeness of BitTorrent algorithm. 36 4.2.3 Estimate the network traffic Figure 12: Number of control packets sent between 2 versions To continue to evaluate the performance of the solution, we count the control packets exchanged in the network and compare the differences between the two approaches. We can easily see in figure 10 that the number of control packets transmitted when using BEACON method is significantly less than the former method. Especially, in some cases, using BEACON version can reduces the number of packets a half compared with previous one. This is because when using BEACON method, the nodes only send periodically BEACON messages to their neighbors. While using the HELLO message, the nodes must forward HELLO messages to their neighbors based on the TTL parameter. The larger TTL is, the more the number of control packets sent. Besides, when a node receives a HELLO message, it has to send a HELLO REPLY message to the sender; it also increases the network overhead. This result indicates that our method offers better performances. 37 4.2.4 Impact of the number of network nodes Figure 13: Average finish time for N = 30, 40 and 50 When the number of nodes in the network increases, the number of neighbors of a node increases as well, especially in nodes far away from seed. At the same time, the node will be able to exchange more pieces with the others. This reduces the finish time. It is confirmed in figure 11 where the finish time when N = 30 is highest and this number decreases when the number of nodes increases. However, this trend does not affect too much the nearby nodes. That is why we choose a network with 40 nodes as a stable topology. 38 CONCLUSIONS AND PERSPECTIVES The P2P file-sharing applications in wireless ad hoc networks should provide good quality services on time and finished sharing ratio. These applications have great potential, but the nature of MANETs brings many difficulties to deploy these applications. The solution that we proposed in this thesis has partly resolved this problem. The simulation has also shown desirable results on the feasibility of the solution where both finish time as well as sharing ratio are significantly improved. Our future work will be on modifying the mobility of nodes and adapting our solution to mobile applications. High dynamic of such networks will open the way to new interesting problems. 39 REFERENCES [1] A. Nandan, S. Das, G. Pau, M. Gerla, Cooperative downloading in vehicular ad hoc networks, WONS, Washington, USA, 2005. [2] Cohen, Bram. BitTorrent — a new P2P app. Yahoo eGroups. Retrieved 2007-04-15. [3] Tran Thi Minh Chau. Lightweight Detection of Node Presence in MANETs. Elsevier Ad Hoc Networks 7 (7), pp. 1386-1399, September 2009. [4] C.Perkins, P.Bhagwat. Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers, ACM SIGCOMM'93. [5] (Ed) Javed I. Khan and Adam Wierzbicki. Special Issue, Elsevier Journal of Computer Communication, Foundation of Peer-to-Peer Computing. Volume 31, Issue 2, February 2008. [6] LOO Alfred Wai-Sing. Peer-to-Peer Computing: Building Supercomputers with Web Technologies. [7] Michiardi P., Urvoy-Keller G. Performance analysis of cooperative content distribution for wireless ad hoc networks, WONS 2007, Obergurgl. [8] Mohamed Sbai, Chadi Barakat, Jaeyoung Choi, Anwar Al Hamra and Thierry Turletti. Adapting BitTorrent to wireless ad hoc networks. 7th International conference on ad hoc networks and wireless 2008 (AD-HOC NOW), Sophia Antipolis, France, September 2008. [9] NS: The Network Simulator, [10] Peter Antal. An overview of the BitTorrent Protocol. [11] S. Rajagopalan, C-C. Shen, A cross-Layer Decentralized BitTorrent for Mobile Ad hoc Networks, MOBIQUITOUS, San Jose, USA, 2006. [12] Tamilmani, Karthik. Studying and enhancing the Bittorent protocol. Stony Brook University. Retrieved 2006-05-06. [13] Torrentfreak. [14] Urvoy-Keller. Rarest First and Choke Algorithms Are Enough. IMC'2006, 2006, Rio de Janeiro.

Các file đính kèm theo tài liệu này:

  • pdfLUANVANDoKhanhToan.pdf