Acknowledgements
First, I would like give my sincere thank to my instructor, Dr. Dai Tho Nguyen,
Department of Networks and Computer Communications - who has give me guidance
and support during the time I work on this thesis.
Next, I would like to thank the University of Technology and Engineering -
Hanoi National University, the school that has given me a perfect environment to
study and train myself for four years.
Finally, thanks to my family and friends who have always supported and
encouraged me a lot throughout my student life. They were, are, and will be the
endless source of encouragement, ambition of my life.
Ha Noi, June 2010
Khanh Toan Do
INTRODUCTION
MANETs and P2P file sharing systems are two areas that have long been studied
based on a same principle: P2P paradigm. This paradigm aims to build the services of
large-scale distributed systems without any infrastructure. In this model, users have an
equal role. Global services are guaranteed thanks to the collaboration of users with
each other. For wireless ad hoc networks, the network is a collection of wireless nodes
without any central control or any base station. The network node acts as both router
and host. Multi-hop routing method is used to ensure connectivity between nodes. For
P2P file sharing applications, peers work together to share data and multimedia
content. Each participating node will share a portion of its upload capacity to serve
other nodes. The global productivity of the system increases exponentially with the
number of nodes involved. Gnutella, Freenet, and BitTorrent are examples of P2P file
sharing applications on the Internet.
Both the file-sharing applications and ad hoc networks are all long been studied.
They have been studied very much, but are independent of each other. Only very few
studies try to find out if they work well together. These studies only focus on finding
contents in MANETs without attention to the efficiency of content sharing. Studying
the performances of file-sharing applications on mobile ad hoc networks is really
challenging because of wireless channels’ diverse constraints. In fact, when the peers
play a role of both routers and end users, then the routing overhead must be
considered. Moreover, the performance of the transport layer protocol such as TCP
decreases markedly when multi-hop paths are used. That is why file-sharing
applications are not expected to work well when deployed on mobile ad hoc networks.
Designing an efficient solution for the file sharing problem in wireless ad hoc
networks is an important part of our study.
In this work, we try to develop a file sharing application for wireless ad hoc
networks. My aim is to come up with solutions that reduce the content download time
while at the same time improving the distribution ratio in the network by enforcing fair
sharing among peers. As efficient and fair content sharing is targeted, we choose
BitTorrent given its large usage. We are interested in two main parts of the BitTorrent
protocol, which are node discovering and downloading mechanisms.
1
The main framework of our study is based on [8], but the solution proposed in
this paper still contains some problems with its node discovery mechanism:
-
-
Cannot detect if a node has left the network or not.
Node discovery mechanism need too much time.
Because of these limitations, we replace the old presence detection method with
another one introduced in [3]. The performances of our approach are proven through
the ns-2 simulator to be better than the one in [8].
The remaining of this thesis is divided into 4 chapters:
Chapter 1: P2P networks
We present an overview of P2P networks, their architecture, benefits and
disadvantages.
Chapter 2: BitTorrent file sharing system
This chapter describes about BitTorrent protocol and its installation over wired
networks and mobile ad hoc networks
Chapter 3: An improvement in BitTorrent on MANETs
Here is the main part of this thesis, which offers an innovative solution to
BitTorrent over MANETs.
Chapter 4: Simulation results
The last chapter shows the results of our simulation, in order to check the
correctness arguments made above.
Conclusion and perspectives
In this section, we draw a summary of our work and some plans to do in the
future.
TABLE OF CONTENTS
INTRODUCTION 1
CHAPTER 1: PEER TO PEER NETWORKS .3
1.1 Overview .3
1.2 Definitions .3
1.3 Comparison with client/server architecture 4
1.4 Benefits and weaknesses of P2P networks .5
1.5 Classifying P2P networks .6
1.5.1 Pure P2P networks 6
1.5.2 Hybrid P2P networks 7
CHAPTER 2: BITTORRENT FILE SHARING SYSTEM .10
2.1 Overview .10
2.2 Benefits of the BitTorrent protocol 11
2.3 Limitations 13
2.4 Comparison with other file sharing protocols 13
2.5 Original BitTorrent for wired networks 14
2.5.1 Description .14
2.5.2 Operation .15
2.5.3 Creating and publishing torrents .16
2.5.4 Downloading torrents and sharing files 17
2.6 BitTorrent variant for wireless ad hoc networks 18
2.6.1 Trackerless BitTorrent 18
2.6.2 Packets exchanged between peers .18
2.6.3 Downloading mechanism .19
2.6.4 Selecting a neighbor at random .20
2.6.5 Piece selection strategy .21
2.7 Performance metrics 21
CHAPTER 3: AN IMPROVEMENT OF BITTORRENT ADAPTATION TO
MANETS 24
3.1 General ideas 24
3.2 Node presence detection mechanism .24
3.2.1 Soft state bloom filter .25
3.2.2 Bloom filter operations .26
3.2.3 Decoupling information decay and BEACON Intervals 27
3.3 Integration of node presence detection mechanism into BitTorrent 28
3.4 Conclusion 30
CHAPTER 4: SIMULATION RESULTS 31
4.1 Network simulator (NS-2) 31
4.2 Main scenario .31
4.2.1 Estimate the average finish time 34
4.2.2 Estimate the average sharing ratio 35
4.2.3 Estimate the network traffic .36
4.2.4 Impact of the number of network nodes .37
CONCLUSIONS AND PERSPECTIVES 38
REFERENCES .39
47 trang |
Chia sẻ: banmai | Lượt xem: 2036 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Trackerless bittorrent adapted to mobile ad hoc networks, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
1.5.1 Pure P2P networks
In the pure models, nodes have the same roles, capabilities and
responsibilities. Communications between the nodes is symmetric in the sense that
each node can both play the role act as both client and server. There is not existence of
a defined client/server relationship between the nodes. The program runs on each node
must perform the functions of the client and the server. Because of no control
mechanisms and centralized management so that the most difficult problem pure
networks model faces is detecting and locating the file in the network. Gnutella applies
the mechanism to promote communication: when a node receives the query and cannot
find files on your machine, it will forward the search message to all neighboring node.
Obviously, this method will not ensure proper operation in a large network, which has
many nodes. With Freenet, file-searching mechanism has been improved through the
ability selecting the neighbor node to send a message. However, the algorithm used to
select neighbors is not simple.
7
Figure 2: A pure P2P network
1.5.2 Hybrid P2P networks
As we know the difficulty in locating, searching for files is the biggest drawback
of the pure P2P network. The cause of this difficulty comes from the nature of it
completely decentralized. It has proposed a new model to reduce this problem by
bringing centralized elements to the network. One or more nodes will act as special.
They are used to provide information regarding the location of the files in the network
instead of being used to store files directly. The special node called the searching
server. When a node wants to find files shared on the network, it will send a query to
searching servers. Returned from the searching server for the node will find a list of
files associated with the node address on the network. Then the node can connect
directly to one another to begin file transfer. There is a combination of client/server
and P2P architecture. Interaction with the searching server node comply client/server
architecture, while exchange files between the nodes is P2P. Therefore, we call this as
hybrid model. Napster and BitTorrent are typically file-sharing applications based on
hybrid model. Their server does not directly in storing data, but only contains
information about the location of files shared on the network.
8
Figure 3: A hybrid P2P network
Because this model is the combination of client/server and P2P architectures,
should be able to see that the hybrid model contains both advantages and
disadvantages of the two architectures. Centralized calculating in files positioning
function can lead to overloading the server when the network is looking to expand, and
there are many nodes involvement. In return we have a more quickly and more
efficiently searching mechanism. The nature of P2P interaction directly between the
nodes that still offers significant advantages through utilization of network resources.
But here arises a problem concerning the organization of storage the information about
the location of the file shared on network. Usually they organize the information into a
database. Each record includes a file with file name and address of the node currently
hosting the file. This is a popular way to organize applications using P2P file-sharing
hybrid model like Napster or OpenNap. However, limitations of this method is that it
only allows users to search by file name (or general file identifiers). This is suitable
only for sharing music file or photo libraries. With more complex libraries of
documents related to the character (text files, books, articles, ...), demand does not stop
from looking at the name or identity criteria. Currently almost all file-sharing
applications commonly found on the web do not allow to search features of the content
within document files. It can be seen that to put this searching feature function into a
file-sharing applications, we found it necessary to apply the methods and techniques
used in information retrieval . When that information about the documents not only of
9
the name (identification) of documents but also include further information on the
keywords contained in the document. Information on the searching server must be
organized as a system indexing by keyword.
10
CHAPTER 2: BITTORRENT FILE SHARING SYSTEM
In this chapter, we present an overview of the BitTorrent file sharing system and
its implementation over wired networks and mobile ad hoc networks.
2.1 Overview
File sharing is very well known in the entire community due to its popular usage.
A large amount of bandwidth on the Internet is used by different file sharing
applications/protocols. It has been estimated that it may account for as much as 43% of
all Internet traffic (depending on geographical location) as of February 2009[14].
BitTorrent is one of the most popular protocols for transferring large files. There
are many famous applications using BitTorrent protocol such as uTorrent or Podcast.
Besides legal distribution of files, BitTorrent is often used to illegally share and
download copyrighted material.
The technology within the protocol makes it possible to distribute large amounts
of data without the need of a high capacity server, and expensive bandwidth. This is
probably the main reason for using this protocol instead of using traditional
downloading from HTTP or FTP servers. For those sharing and downloading
copyrighted files, BitTorrent is profitable because it is decentralized.
11
Figure 4: An example of the BitTorrent file sharing system
2.2 Benefits of the BitTorrent protocol
The BitTorrent protocol helps to improve the previous generation of P2P
software with a number of significant initiatives, which shall be discussed in next
subsections.
High speed downloading
Simply put, BitTorrent is quite popular due to its parallel file transfer process.
Unlike the typical network behavior, which bottlenecks cause performance to
plummet; BitTorrent actually increases the demand to improve the performance of a
computer network. This behavior increases the stability ratio for the distribution of
large, popular content on the internet, for servers of limited capacity. The protocol
achieves this by splitting large files into smaller blocks, such as 128 kilobytes, by
splitting into blocks, the source components may be required from many sources, but
12
not need to set the complete file to be able to load up the block, which means
downloading the software can quickly start contributing as uploaders for other users
Simplicity, easy to use
Unlike other complex previous P2P protocols, the BitTorrent protocol works on a
web based distribution model built over the HTTP protocol. Users simply download a
client, which to a user act as an extension of their internet browser. By opening the
files ‘.Torrent’, client will allow the users to save files to a location of their choice on
the hard drives in the same fashion as original file transfer applications. Until the user
closes the client, BitTorrent will continue sharing the file to other clients.
Effective use of limited upload bandwidth
Home broadband connections have quite limited upload bandwidth available is a
big problem of P2P systems. To solve this, the BitTorrent protocol uses a few
mechanisms to mitigate for this problem. When a client has received a few initial
blocks, it can then use its upload bandwidth to forward those blocks to other machines.
Another aspect is that transfers are limited to four open sessions, and the number of
files hosted is limited to a few users; by focusing on a small number of activities, the
network can reduces overhead. By reducing the number of uploads per machine, and
having considerably more machines uploading content, the time required by a client
for downloading a large file is considerably less than tr aditional means.
Overcoming the Free-Rider problem
In typical P2P systems, the performance of the overall system downs when users
trying to obtain service without contributing anything to the system. This is an old
problem, and BitTorrent was designed with an approach of local optimization by
allocating the most upload bandwidth to peers, which are supplying the greatest
amount of content. Because of the design of the current BitTorrent Protocol, free riders
can only get 20% of the total system bandwidth.
High content integrity
BitTorrent employs two mechanisms, which make it more reliable. First, to be
accepted on a file tracking server, typically new content must be accepted and
managed by moderators who manually inspect it. To reduce the overload for
moderators, regular suppliers of content can be promoted to immoderate submitters
13
after gaining enough trust. Once the original content is considered trusted, a validation
checksum of file will be presented from the server to guarantee the content integrity.
2.3 Limitations
BitTorrent protocol does not help users to hide their identifiers. Because the
tracker maintains a list of files being shared, it also contains a list of IP addresses of
the computers are downloading the file, and the list of the files that have been
downloaded previously. Based on the BitTorrent protocol also ascertains the address
of the peer in the swarm and of course, the peer can be attacked.
Another disadvantage of the BitTorrent protocol is less encouraging peer into
seeder after file downloading is complete. Consequently, seeders and peers in swarm
will disappear gradually, which means that more older torrents to download the file,
then the probability of success lower. BitTorrent has the advantage in the broadband
environment, such as DSL, cable, satellite ... but for dial-up Internet users using the
BitTorrent protocol will not work, because dial-up connection or disconnection and
speed not load high.
2.4 Comparison with other file sharing protocols
The method used to distribute files between eDonkey2000 and BitTorrent
networks is similar, but the clients in the eDonkey network usually share and
download many files, making the bandwidth per carrier becomes less. In contrast,
transporting in BitTorrent is much faster by focusing on a file or a specific group of
files. Original eDonkey2000 protocol provides very little resistance to fraud machines
(download more, upload very little), the new version of eDonkey2000 client has
installed the system which encourages more uploads. For example, systematic program
eMule (credits system) have a score system to encourage uploaders. One client will
prioritize the shipment to its previous position by moving the client to the top of the
queue to wait for less time. This system proved effective because a queue for each
client using eMule often has hundreds, even thousand entries.
KaZaA is a protocol similar to BitTorrent protocol, but it has another point that
is its distinguishing workstations by dedication level (Participation Level).
Contribution level increases when you upload and decreases when you download.
Once you upload a resource, it devotes to the person with the highest-level first and
then this person devote to the lower contribution level one and so on. This model is
similar to the pyramid model; the most uploader is at the top position of the pyramid,
14
and the less one is on the bottom. KaZaA is only appropriate model distribution of
resources for a large number of users, it has been shown that the file at the bottom of
the pyramid can be downloaded faster than on the case by means of HTTP (in the case
file is large). Nevertheless, KaZaA has a small downside is that it believes the report
of the workstations on the level of dedication, so clients can cheat with a lot of
dedication for the workstation which is not official.
2.5 Original BitTorrent for wired networks
2.5.1 Description
BitTorrent protocol defines a method to disseminate and share files online.
Before having BitTorrent protocol, there were some P2P protocols, which allow a
computer in the network to share files with other computers without having a
centralized server. BitTorrent is an improvement from the previous P2P protocol.
BitTorrent protocol has a principle to work closely with the ability to customize,
reliable and cost of maintaining a list of computer file sharing better than the previous
P2P protocols. Because of communication following standard TCP/ IP, BitTorrent
protocol can operate on regular Internet connection [10].
BitTorrent protocol, which allows users to distribute large amounts of data
without depending much on their computers, would be needed for standard Internet
hosting. This protocol works as an alternative data distribution method that makes
even small devices (e.g. mobile phones) with low bandwidth capable of participating
in large data transfers.
BitTorrent client is a program operating under the BitTorrent protocol. Each
BitTorrent client can compares, requirements, and transports files on the network
using the BitTorrent protocol. Files can contain any information, including text, audio,
video and encrypted content.
First, a user playing the role of file-provider makes a file available to the
network. This first user's file that is called a seed allows other users, called peers, to
connect and start to download the file. When new peers connect to the network and
request the same file, their computer gets a different piece of the data from the seed.
Once multiple peers have multiple pieces of the seed, each of them becomes a source
for that continuously distributes the file. The effect of this is to reduce the dependent
on the initial user and distribute the file download task among the seed and many
peers. With BitTorrent, there is no need to provide a computer data quantities, which
could jeopardize the task by overwhelming all resources, but the same result is still
reached: each peer eventually receiving the entire file.
15
After the file successfully and completely downloaded, the peer can shift roles
and become an additional seed, helping the other peers to get the whole file. This
eventual change from peers to seeds determined the number of times a file is available
in its complete form.
This distributed nature of BitTorrent leads to spread the file throughout peers. As
more peers join the swarm, the likelihood of a successful download increases.
Compared with standard Internet hosting, this reduces the original distributor's
hardware and bandwidth resource costs significantly. It also reduces dependence on
the original distributor and provides a source for the file, which is generally temporary
and therefore more difficult to trace than providing by the long-term availability of a
server in standard file distribution techniques.
2.5.2 Operation
BitTorrent greatly reduces the load on the server, since users usually download
files between them, not the server. As shown by the colored bars below each client, the
file is downloaded in a random order, instead of carrying a sequential order.
Unlike the systems of traditional file sharing, its main objective is to provide an
efficient way to distribute a single file to a large group of people, forcing all those who
download a file to share it with others. First is distributing by conventional means a
small file with a. torrent. This file is static, so often found on websites or distributed
via email. The file 'torrent' contains the address of a "search server", called tracker,
which is responsible for locating potential sources for the file or part thereof.
This server is really centralized and it provides statistics on the number of
transfers, the number of nodes with a full copy of the file and the number of nodes that
have only a portion thereof.
The library file or files you want to download from the sources found by the
tracker, and at the same time as the download, it starts to climb parts of the file
available to other sources, using the bandwidth allocated to it. Since the action of
sharing begins even before completing the download of a file, each peer will inevitably
contribute to the distribution of this file. The system is responsible for rewarding those
who share more, the higher the bandwidth the greater the number of connections to
discharge peers to be established.
When a peer starts downloading a file, BitTorrent does not necessarily come at
the beginning of the file, but is low in parts at random, after peers are connected to
16
download the file. If the peers are available online each part of the whole file (even if
they are scattered), eventually every peers will get a full copy of it. Of course, initially
one must have the complete file to begin the process. This method produces significant
improvements in transfer rate when many users connect to download the same file.
Where there is not any node with having complete file ("seeder" or "seed")
connected to the tracker, the possibility exists that the file cannot be completed.
2.5.3 Creating and publishing torrents
The peers that are distributed over the network treat the file as a separation of it
in a number of identically sized pieces, typically between 32 KB and 4 MB each. Each
peer performs a checksum (checksum) for each part, using the SHA-1 algorithm, and
storing it in the torrent file. Parts greater than 512 KB will reduce the size of a torrent
file for each payload, but this would reduce the efficiency of the protocol. When
another peer later receives a particular piece, the hash of the piece is compared to the
recorded hash to check that is free from error. Peers that provide complete files be
called seeds (seeders) and the peer that provides the initial copy of the file is called
initial seed (initial seeder).
The exact information that is contained in the torrent file depends on the
BitTorrent protocol’s version. By convention, the name of a torrent file has the
extension ".torrent". Torrent files have a section called "announce", which specifies
the URL of the tracker, and "Info" section, which contains the names of the files, their
sizes, length of piece used, and SHA-1 hash code for each of the pieces, all this
information is used by clients to verify the integrity of the data received.
Once completed torrents files are posted on a website or elsewhere, and are
registered with an origin server, which is called a tracker, it keeps the list of clients
who are currently participating on the torrent file. Alternatively, in a decentralized
system, each peer acts as a tracker. BitTorrent clients such as μTorrent, BitComet,
KTorrent and Deluge, implement this, using methods of Distributed hash table (DHT).
Azureus also supports tracing method is incompatible nodes (April 2007) with the
DHT that offers its customers [2].
Once DHT has passed, a "private" flag like broadcast flag was unofficially
announced, telling clients to restrict the use of decentralized monitoring whether the
user desires. Flag is intentionally placed in info section so it cannot be disabled or
17
removed without changing the identity of the torrent. The purpose of the flag is to
prevent torrents from being shared with clients without access to tracker. The flag was
requested to be included in the formal specification in August, in 2008, but it has not
been accepted.
2.5.4 Downloading torrents and sharing files
Using any Internet browser, like Firefox, browse the site has a list of torrents,
download it then use BitTorrent client open out there. After opening the torrent file,
BitTorrent will connect to the trackers, the trackers will give it a list of the peers are
downloading this file. A group of members (or peers) of a BitTorrent network which
download the same file is called the swarm. When the client connects to the swarm, it
will start sharing files with each other. The clients will share the pieces together
instead of sharing direct with the trackers, so the number of clients in the swarm can
develop very quickly.
In BitTorrent network, each of clients shares some of its resources such as data,
upload bandwidth with other peers interested in the same content in order to go up the
global system capacity. Peers participating to download the same content form a
torrent. A peer discovers other peers by contacting a central rendezvous node called
tracker. The latter store IP addresses of all peers in the torrent and manage statistics of
uploader and downloader on network. To make the replication of the content in the
network easier and to guarantee multi-sourcing, a file is subdivided into a set of pieces.
Then, each piece is also subdivided into blocks. Seed (seeder) is a peer, which has all
pieces of the file. When the peer is still downloading pieces, it is called leecher. The
neighbors of a node are contained in a peer list which it can open a TCP-connection to
exchange data and information. Only four simultaneous outgoing active TCP
connections are accepted by the protocol. These neighbors are called effective
neighbors. They are chose by the BitTorrent’s choking algorithm. This algorithm is
performed periodically. When the choking period expires, a peer chooses to unchoke
the 3 uploaders to him at the highest rate. It is a best slot unchoking. This strategy,
called tit-for-tat, ensures reciprocity and enforces collaboration among peers. Now to
discover new upload capacities, a peer chooses randomly the fourth peer to unchoke.
This slot is called optimistic slot. All other neighbors are left choked. Once unchoked,
a peer chooses a piece to download using a specific piece selection strategy that is
called local rarest first. Indeed, each peer maintains an update-to-date list of pieces
owned by all its neighbors. When choosing a piece, a peer selects the piece with the
least redundancy in its neighbors. One of the rarest pieces is selected randomly in case
of equality. Rarest first is supposed to grow up the entropy of pieces in the network
that enforces cooperation and hence improves global performance [14].
18
2.6 BitTorrent variant for wireless ad hoc networks
Some works have tried to adapt to the BitTorrent network to MANETs (eg, [1]
and [13]). They focus only improved phase detection node without resolve to improve
the efficiency of content sharing. Michiardi et al [7] study in the performance of a
cooperative mechanism for distributing content from a source to a destination of great
potential. They propose to implement a small change BitTorrent that allows detection
neighbors and local traffic. This is done by selecting only close neighbors as effective
neighbors. The result is a decrease in download time and total energy consumption.
However, this below solution goes further by focusing not only on download time, but
also on the sharing between peers.
2.6.1 Trackerless BitTorrent
According to the feature of MANETs that peers themselves linking into a
network, a centralized tracker cannot install. To adapt BitTorrent over wireless ad hoc
network, we need a new mechanism to replace tracker's role: discovering and
identifying the other peers. M.K Sbai gave a method using HELLO message to solve
such problem [8]. To discover new peers, a peer floods periodically a HELLO
message to the network and waits for HELLO REPLY messages. These HELLO
messages are forwarded to the other neighbors with some initial TTL (Time-To-Live)
to control the scope of the flood and hence the visibility of a peer. Receiving a HELLO
message, a peer decreases the TTL and forwards it to its wireless neighbors, and so on
until its TTL reaches zero.
2.6.2 Packets exchanged between peers
There are 2 types of packets exchanged in the network: Data packets and control
packets. We choose to send data packets via TCP connections because TCP provides
reliability and congestion controls needed to transfer blocks of file. Control packets
such as HELLO message (or BEACON message in new version) and piece updates are
better when using UDP. Here are the control packets exchanged between peers:
- HELLO: see section 2.6.1
- HELLO REPLY: see section 2.6.1
- UPDATE PIECE LIST: when a peer receives a new piece, it sends an
UPDATE PIECE LIST to all peers with whom it can exchange data.
19
- PIECE OFFER REQUEST: when a peer i unchokes a peer j, it sends a
PIECE OFFER REQUEST packet to j. This packet contains the list of pieces
that i has already downloaded.
- PIECE OFFER REPLY: receiving a PIECE OFFER REQUEST, a peer sends
a PIECE OFFER REPLY packet to the sender. A flag included in this packet
indicates the offer being accepted or not.
2.6.3 Downloading mechanism
To profit from the advantages of the limit neighborhood, M. Sbai gave an idea
that modified the original BitTorrent to adapt to MANETs [8]. A few TCP connections
are created to distant peers in addition to those with close peers. Pieces can then spread
over the network and they will be propagated in different directions, which improve
the sharing ratio and the download completion time. Therefore, several zones of the
network can be active simultaneously, which makes the network becoming more
dynamic. The new choking algorithm is used by utilizing routing information. It
distributes optimistic unchokes between distant and close peers and adds a specific
neighbor selection mechanism to choose a remote peer. It also applies a new piece
selection strategy when the distant peer is offered. Unlike regular BitTorrent with
limited neighborhood, this method requires a global knowledge about the identifiers of
peers in the network.
In the modification BitTorrent, each peer maintains 2 tables: NEARBY
NEIGHBOR TABLE (NNT) and FAR NEIGHBOR TABLE (FNT). One and two
hops neighbors will are added to NNT, the others belong to FNT. The PIECE
UPDATE packets are sent only to neighbors in NNT. Due to the piece selection
strategy operates only on the NNTs; peers don’t need to know about all pieces in the
network. Indeed, in wireless networks, the copy of pieces is more efficient when it is
based on statistics in the nearby neighborhood since this guarantees a faster local
replication compared to a large neighborhood. As in BitTorrent, the choking algorithm
selects three best uploaders as effective neighbors. These three neighbors are chosen
from both nearby and far neighbor tables. The peer then serves these three neighbors
during the next choking period. But in addition to these effective neighbors, the fourth
random neighbor is selected from one of the two tables (optimistic slot) using a round
robin policy that guarantees an optimal balance between the random unchokes locally
and the transmission of pieces to remote neighbors in order to improve diversity. In
detail, the peer selects a peer one time from FNT, q times from NNT and so on. In this
protocol, the ratio of the number of time slots spent on serving nearby neighbors and
those for serving far neighbors bases on the quantum q. It is also the number of slots
20
that a peer should wait before unchoking a distant neighbor again. Sbai’s experiments
shown that the optimal choice of the quantum q depend on the number of nodes in
network, the maximum length of a path between 2 nodes (hm) and the number of
pieces that can be sent during a chocking slot to a node:
𝑞 =
𝛼ℎ𝑚
𝛼1
.
ℎ𝑚
2
.
𝑁
2
- N: number of nodes in the network
- hm: maximum length of a path between 2 nodes
- αi: number of pieces that can be sent during a chocking slot to a node located at
i hops
For example, in our simulation where N = 40, hm = 12, α1 = 100, α1 = 1,
the quantum q is 1 (q is a integer)
2.6.4 Selecting a neighbor at random
Selecting a far neighbor
In a regular BitTorrent client, a random peer is selected by a uniform probability
to unchoke optimistically. While in wireless networks however, the gain we get from
optimistic unchoking in terms of diversity increases with the number of hops.
Therefore, a peer has more interest in unchoking a farther peer than another one closer
to it. Thus, in this modification version of BitTorrent, to select a distant peer to
unchoke from FNT, the peer starts by selecting the number of hops to that peer with a
probability p given by this formula:
𝑝 ℎ =
ℎ
ℎ𝑚 + (ℎ𝑚 − 1)
𝑖𝑓 ℎ > ℎ𝑚 − 1
0 𝑒𝑙𝑠𝑒
- hm: maximum number of hops seen by the peer
- Suppose that FNT contains only peer at hm and hm -1 hops
Once the number of hops h is chosen, the peer at h hops, which is chosen
randomly, is the optimal node to unchoke.
21
Selecting a nearby neighbor
Once the peer needs to select a nearby neighbor, it chooses a node from NNT in a
uniform random way.
2.6.5 Piece selection strategy
When the offering is far
When receiving a PIECE OFFER message from a wireless neighbor, it considers
that it is an offer from far neighbor if the number of hops to the offering node is
greater than 2. A specific piece selection strategy will be applied, called the absent
piece strategy. The peer firstly computes the redundancy of the offered pieces in its
NNT and its piece pool. At the opposite of BitTorrent, the candidate pieces will be
those with zero redundancy due to the fact that no need to download a piece from a far
node if it exists at nearly neighbor table. So, a piece can be accepted only none of its
near neighbors has downloaded it before. If there is more than one absent pieces, one
is chosen randomly. The absent piece can then be replicated quickly in the near
neighborhood. If no absent piece is noticed, the peer sets a REJECT flag in the PIECE
OFFER REPLY packet.
When the offering is near
Once the peer receives a PIECE OFFER from one of its nearby neighbors, local
rarest first algorithm will be used. Pieces with the least number of copies in the NNT
are selected. Rarest first is supposed to increase the entropy of pieces in the network
that enforces collaboration and hence improves global performance.
2.7 Performance metrics
Here are the performance metrics of BitTorrent protocol that we use in through
our study:
– Uij: Total bytes uploaded by peer i to peer j
– Dij: Total bytes downloaded by peer i from peer j. (Ui j = Dji)
22
– Rij: Ratio of sharing between peer i and peer j.
𝑅𝑖𝑗 =
𝑀𝑖𝑛(𝑈𝑖𝑗 ,𝐷𝑖𝑗 )
𝑀𝑎𝑥(𝑈𝑖𝑗 ,𝐷𝑖𝑗 )
– Ni: Number of neighbors j of peer i such that Uij = 0 or Dij = 0
– Ri: Sharing ratio for node i
𝑅𝑖 =
1
𝑁𝑖
. 𝑅𝑖𝑗
𝑗 | 𝑈𝑖𝑗 ≠0 𝑜𝑟 𝐷𝑖𝑗 ≠0
– Fi: The finish time of peer i. It is the time by which i receives all pieces of the
file.
Following our instruction, there are some additional performance metrics related
to topological positions of peers below. The file is supposed to exist at one seed S at
the beginning of the session. Our metrics quantify t he quality of service perceived by
peers as a function of their relative positions with respect to the seed.
– Fh: Average finish time of peers (or nodes) located at h hops from seed S.
𝐹𝑛 =
1
𝑛ℎ
. 𝐹𝑖
𝑖 | 𝐻 𝑖 = ℎ
Where nh is the number of peers located at h hops from seed S and H(i) is a
function that gives the number of hops between any node i and the seed S.
– Rh: Average sharing ratio of peers (or nodes) located at h hops from seed S.
23
𝑅𝑛 =
1
𝑛ℎ
. 𝑅𝑖
𝑖 | 𝐻(𝑖) = ℎ
24
CHAPTER 3: AN IMPROVEMENT OF BITTORRENT
ADAPTATION TO MANETS
3.1 General ideas
BitTorrent uses a centralized node (called a tracker) to solve node discovery
problems. This means that each node participating in the BitTorrent system must know
the address of the tracker and contact the tracker to get the addresses of other nodes in
the network. This cannot be done in mobile ad hoc networks because this kind of
networks is formed of mobile devices and it may not rely on a tracker as in
conventional wired networks. This issue has been mentioned in the article [8], which
uses HELLO messages, but it still contains some problems. Recall that to discover new
peers, a peer periodically floods the network with a HELLO message and waits for
HELLO REPLY messages. HELLO messages are transmitted to mobile neighbors
with some initial TTL (Time-To-Live) to control the scope of the flood and hence the
visibility of the peer. This process consumes too many resources and increases the
network overhead. Besides, one of the most important characteristics of mobile ad hoc
networks is the frequent movement of nodes. Therefore, that solution is not optimal for
BitTorrent protocol over MANETs.
Dr. Tran Thi Minh Chau gives a new mechanism that solve above issues [3].
This method has never ever applied in any P2P systems. It is a motivation for me to do
my thesis. To improve original BitTorrent’s performance when it is deployed on
MANETs, we replace the old peer discovery technique with the node presence
detection mechanism proposed in [3]. To achieve the same stability as the former
BitTorrent, we need some modifications, which will be introduced in section 3.3.
3.2 Node presence detection mechanism
We assume that each node in the network has a unique ID. This may be a MAC –
or statically assigned IP address. In the old node presence detection mechanism, the
IDs of the nodes are transmitted via HELLO messages, and these messages will be
forwarded to the other nodes. The obvious problem of this approach is that the amount
of data distributed to each node would increase linearly with the number of nodes in
the network, which increases network load accordingly.
25
In order to avoid this problem, we use another method to discover node presence
in the network, which is described in [3]. This method uses a data structure, which
contains the node’s presence status: soft state bloom filter.
3.2.1 Soft state bloom filter
A soft state bloom filter is a data structure that represents a set S = {s 1, s2 … sn}
of n elements to support membership queries. It is an array of m*l bits (l is typically
small, e.g. l = 3). Each element of S stores a counter. A node initializes all counters
with the maximum value 2
l
- 1. This maximum value indicates that the position of the
Bloom filter is not set. K independent hash functions h1, h2 … hk are used, each maps
every possible item in the set to a uniformly distributed value in the range {1 … m}.
Any hash function with good random distribution and outputs long enough for the
Bloom filter size can be selected.
There are two basic operations in a bloom filter. To add a new element x, which
is a node ID in our specific case, the counters at positions h1(x), h2(x)… hk(x) in S are
set to 0, as depicted in Figure 5:
Figure 5: A soft state Bloom filter
In order to determine the presence of node x, the bits at these positions will be
checked. If any of these is 2
l
– 1, it means x is not in S. Otherwise, it can be assumed
that x is in S with some remaining probability of a false positive that occurs when an
element is actually not in the set, but all respective bit positions have been set to 0 by
adding other elements.
26
Periodically, before sending a beacon message, a node applies the hash functions
to its own ID, all counters at h1(id), h2(id)… hk(id) will be set to 0, and the others that
are not equal to 2
l
- 1 will be incremented by one.
The aging of the information means that positions of leaving nodes will be
expired. Thus, it can remove nodes, which are no longer present from the Bloom filter.
3.2.2 Bloom filter operations
Timeout and refresh
Each node performs three following steps during the timeout and refresh
operation:
(1) Increment each si by one, if it has not reached the limit of 2
l
- 1
(2) Refresh the information about the node’s own presence, by setting s hj(ID) = 0 for
all j = 1, 2… k where ID is the ID of the local node.
(3) Broadcast the updated aggregate to the neighbors.
Merge operation
When a node receives a beacon message, the Bloom filter information of the
sender will be merged into the local array. Each Bloom filter position is set to the
minimum of local and received ages.
Presence detection
To check another node x is present or not, a node u checks its local aggregate at
positions h1(x), h2(x)… hk(x). If one of them equal to 2
l
- 1, we conclude that x is not
present.
We can observe the operation of the BF through the following example:
For simplicity, we consider 2 node X, Y and 4 events:
27
Figure 6: Update Bloom filter
3.2.3 Decoupling information decay and BEACON Intervals
In some situations, the aging of information is slower than the beaconing in a
naive way, for example, when the counter values are increased only every b > 1
beaconing intervals. Let A and B be two neighboring nodes. Assume that si is currently
at value 5 in both A and B, but no longer refreshed and should be subject to decay. A
increases si to 6, but B’s value of si is still 5, and then A will receive another beacon
from B with si = 5, thus resetting A’s aggregate. This will happen as same as with B’s
value.
To solve this problem, we need a small soft state bloom filter implementation:
whenever a beacon is sent without previously ageing the counter values in the local
aggregate, the sent-out copy of the aggregate must be decayed (and refreshed) prior to
sending it to the neighbors. That is, if A’s value of si is currently 5, and it sends a
beacon without decaying its local copy of the aggregate, the broadcast value will be si
= 6.
28
Figure 7: Problem when updating Bloom filter
3.3 Integration of node presence detection mechanism into BitTorrent
The main goal of our variant of BitTorrent is the combination of BitTorrent
downloading mechanism with the new peer discovery method described above to
achieve better performances. My simulation results described in the next section will
prove this. Instead of using HELLO message, a peer floods periodically the network
with a BEACON message. This message contains not only its ID but also a Bloom
Filter, which maintains the status of peers in the network. Before sending BEACON
message, the peer resets the information about its own presence by applying the hash
functions, set hi(ID) = 0 to confirm its existence to the other peers.
Upon receiving a BEACON message, a peer merges information from this
message and its Bloom filter. The other sender’s neighbor situations are also updated
by this operation. Each Bloom filter position is set to the minimum of local and
received ages. For example, the sender’s positions in Bloom filter are set to 0 - the
minimum value. When a peer no longer exists in the network, its positions in Bloom
filter will reach the maximum value: 2
l
– 1. Therefore, it needs to be removed from the
neighbor tables to avoid affecting to the downloading algorithm.
29
The sending and receiving BEACON messages strategies are very important.
They not only help to update the routing table, identify peer IDs but also allow peers to
detect and add new peers into their neighbor table.
When there are enough entries (≥ 4) in the routing table , the BitTorrent
neighboring selection and downloading algorithms, which are present in section 3.6,
will be deployed. Firstly, four peers in NNT/FNT are chosen as effective neighbors.
They have to be guaranteed as regular peers by using Presence detection operation of
Bloom filter (see in section 3.2.2). After that, peers start transferring their pieces. This
process will be ongoing and new status of peers in the network will be updated again
by using BEACON messages.
Here are some functions, which are used to implement the new presence detection
mechanism:
Algorithm 1: update Bloom filter:
Algorithm 2: Check presence:
Algorithm 3: Update neighbor tables of peer x:
merge(BF sbl):
for i ← 0 to m
x.bloom[i] = Min(x.bloom[i], sbl[i])
checkPresence(Peer p):
for i ← 0 to k
if (bloom[hi[p.ID]] = 2
l
- 1)
removeFromNeighbors(p)
for i ← 0 to neighbors.size
if hopCount(x, neighbors[i]) ≥ 2
FNT.append(neighbors[i])
else NNT.append(neighbors[i])
30
Figure 8: Flowchart of new implementation
3.4 Conclusion
To sum up, in this chapter, we presented an idea to improve the BitTorrent
protocol over MANETs using Bloom filters. According to prove our solution, the next
chapter will explain our NS-2 simulation and its results.
31
CHAPTER 4: SIMULATION RESULTS
4.1 Network simulator (NS-2)
Network Simulator (Version 2), widely known as NS-2 [9], is a discrete event
network simulator. NS is popularly used in the simulation of routing and multicast
protocols, among others, and is heavily used in networking research. Simulation of
wired as well as wireless network functions and protocols (e.g., routing algorithms,
TCP, UDP) can be done using NS-2. In general, NS-2 provides users with a way of
specifying such network protocols and simulating their corresponding behaviors. NS-2
is popular in academia for its extensibility (because of its open source model) and
plentiful online documentation.
4.2 Main scenario
To evaluate the performances of our proposition, we conduct simulation
experiments using network simulator (NS-2). We develop an NS-2 application
implement the one that M. Sbai built based on basic concepts of BitTorrent algorithm
presented in section 2.6 and a new peer discovering mechanism using Bloom filters.
Base on the simulation results, we will show that new presence detection mechanism
can be deployed to replace the old version completely with better performances. The
below parameters are the detailed models of the system in which new version
simulation is deployed.
32
Table 1: Values of parameters using in the simulation
Parameter Value
Number of nodes 40
Number of nodes per row 10
Distance 40 m
Routing protocol DSDV
Starting download time 1500 s
Data rate 1 MB/s
File size 10 MB
Block size 1 KB
Piece size 100 blocks
Number of seeds 1
Choking period 40s
TTL 2
Quantum (q) variable
We initialize a network of N nodes (N = 40 in default) distributed in a grid
topology (10 nodes per row). The distance between 2 neighbors is set to 40m while the
wireless transmission range is 50m.
33
Figure 9: Network topology
Every node connects to each other using 802.11 MAC Layer with the RTS/CTS-
Data/ACK mechanism enabled. The data rate is set to 1 MB/s. For ad hoc routing, we
use the DSDV proactive protocol [4], and the node mobility is not set.
At the beginning of simulation, node 0 located at top left is the seed and all other
nodes are leechers. To ensure the convergence of the protocol to equilibrium, the file
size is set equal to 10 Mbytes. The file is subdivided into 100 pieces, which contains
100 blocks of 1KB. The first 1500s of simulation is the time interval for network
stabilizing, the BEACON messages are also sent in this time to calculate routing tables
and initialize Bloom filters. After that, all peers start downloading the file using
BitTorrent algorithms by first looking for each other then sharing the pieces.
BitTorrent chocking algorithm period is taken 40s.
To study the performance of our solution, we run both the old simulation with
HELLO messages and the new simulation with BEACON messages to compare the
results. We change some parameters of the simulation and observe the behavior of the
download finish times of peers and their sharing ratios.
34
4.2.1 Estimate the average finish time
Figure 10: Average finish time for our enhanced BitTorrent
compared to ordinary BitTorrent with limited neighborhood
Figure 8 compares the finish time of ordinary BitTorrent with limited
neighborhood (TTL = 2) and our version of BitTorrent using Bloom filter. Each curve
presents the average finish time Fh as a function of the number of hop to the seed. As
expected, the finish time increases as far as we move away from the source. One can
notice in the figure that the finish time in almost peers, which use BEACON messages,
is better than HELLO version. This is because of the time to discovery new peers
using BEACON messages is shorter than using HELLO, beside, in our solution, peers
don’t need to send HELLO_REPLY to the HELLO sender, and they only have to
process the BEACON messages. Thus, the network loading also is reduced
considerably.
35
4.2.2 Estimate the average sharing ratio
Figure 11: Average sharing ratio for our enhanced BitTorrent
compared to ordinary BitTorrent with limited neighborhood
Another important factor in BitTorrent over wireless ad-hoc networks is the
sharing ratio of peers. This observation is illustrated in Figure 9 which compares the
sharing ratios of the ordinary BitTorrent with limited neighborhood (TTL=2) with our
variant of BitTorrent using Bloom filters. We can easily see that the difference
between 2 versions is not significant. This is due to the change of using BEACON
does not affect much the neighbor tables. Therefore, the neighboring selection
mechanism still works almost the same as the ordinary version. That guarantees the
completeness of BitTorrent algorithm.
36
4.2.3 Estimate the network traffic
Figure 12: Number of control packets sent between 2 versions
To continue to evaluate the performance of the solution, we count the control
packets exchanged in the network and compare the differences between the two
approaches. We can easily see in figure 10 that the number of control packets
transmitted when using BEACON method is significantly less than the former method.
Especially, in some cases, using BEACON version can reduces the number of packets
a half compared with previous one. This is because when using BEACON method, the
nodes only send periodically BEACON messages to their neighbors. While using the
HELLO message, the nodes must forward HELLO messages to their neighbors based
on the TTL parameter. The larger TTL is, the more the number of control packets sent.
Besides, when a node receives a HELLO message, it has to send a HELLO REPLY
message to the sender; it also increases the network overhead. This result indicates that
our method offers better performances.
37
4.2.4 Impact of the number of network nodes
Figure 13: Average finish time for N = 30, 40 and 50
When the number of nodes in the network increases, the number of neighbors of
a node increases as well, especially in nodes far away from seed. At the same time, the
node will be able to exchange more pieces with the others. This reduces the finish
time. It is confirmed in figure 11 where the finish time when N = 30 is highest and this
number decreases when the number of nodes increases. However, this trend does not
affect too much the nearby nodes. That is why we choose a network with 40 nodes as a
stable topology.
38
CONCLUSIONS AND PERSPECTIVES
The P2P file-sharing applications in wireless ad hoc networks should provide
good quality services on time and finished sharing ratio. These applications have great
potential, but the nature of MANETs brings many difficulties to deploy these
applications. The solution that we proposed in this thesis has partly resolved this
problem. The simulation has also shown desirable results on the feasibility of the
solution where both finish time as well as sharing ratio are significantly improved. Our
future work will be on modifying the mobility of nodes and adapting our solution to
mobile applications. High dynamic of such networks will open the way to new
interesting problems.
39
REFERENCES
[1] A. Nandan, S. Das, G. Pau, M. Gerla, Cooperative downloading in vehicular ad
hoc networks, WONS, Washington, USA, 2005.
[2] Cohen, Bram. BitTorrent — a new P2P app. Yahoo eGroups. Retrieved 2007-04-15.
[3] Tran Thi Minh Chau. Lightweight Detection of Node Presence in MANETs.
Elsevier Ad Hoc Networks 7 (7), pp. 1386-1399, September 2009.
[4] C.Perkins, P.Bhagwat. Highly Dynamic Destination-Sequenced Distance-Vector
Routing (DSDV) for Mobile Computers, ACM SIGCOMM'93.
[5] (Ed) Javed I. Khan and Adam Wierzbicki. Special Issue, Elsevier Journal of
Computer Communication, Foundation of Peer-to-Peer Computing. Volume 31, Issue
2, February 2008.
[6] LOO Alfred Wai-Sing. Peer-to-Peer Computing: Building Supercomputers with
Web Technologies.
[7] Michiardi P., Urvoy-Keller G. Performance analysis of cooperative content
distribution for wireless ad hoc networks, WONS 2007, Obergurgl.
[8] Mohamed Sbai, Chadi Barakat, Jaeyoung Choi, Anwar Al Hamra and Thierry
Turletti. Adapting BitTorrent to wireless ad hoc networks. 7th International conference
on ad hoc networks and wireless 2008 (AD-HOC NOW), Sophia Antipolis, France,
September 2008.
[9] NS: The Network Simulator,
[10] Peter Antal. An overview of the BitTorrent Protocol.
[11] S. Rajagopalan, C-C. Shen, A cross-Layer Decentralized BitTorrent for Mobile
Ad hoc Networks, MOBIQUITOUS, San Jose, USA, 2006.
[12] Tamilmani, Karthik. Studying and enhancing the Bittorent protocol. Stony Brook
University. Retrieved 2006-05-06.
[13] Torrentfreak.
[14] Urvoy-Keller. Rarest First and Choke Algorithms Are Enough. IMC'2006, 2006,
Rio de Janeiro.
Các file đính kèm theo tài liệu này:
- LUANVANDoKhanhToan.pdf