4.7.1 The P2P concept
P2P is shorthand for ‘peer-to-peer’. Strictly speaking, a ‘peer’ is an equal – someone of the same status as oneself. In ordinary life, this is why trial by jury is sometimes described in terms of ‘the judgement of one's peers’. In computer networking, peers are computers of equal status.
To understand P2P, it's helpful to analyse the development of the internet in three phases. Let us call them Internet 1.0, 2.0 and 3.0.
The internet as we know it today dates from January 1983. From then until about 1994 (the phase we are calling Internet 1.0) the entire internet had a single model of connectivity. There were relatively few dial-up (modem) connections. Instead, internet-connected computers were always on (i.e. running 24 hours a day), always connected and had permanent internet (IP) addresses. The Domain Name System (DNS) – the system that relates domain names like www.cnn.com to a specific internet address (in this case 22.214.171.124) – was designed for this environment, where a change in IP address was assumed to be abnormal and rare, and could take days to propagate through the system. Because computers had persistent connections and fixed addresses, every computer on the network was regarded as a ‘peer’ of every other computer: each had the same status and functions. In particular, each computer could both request services from another computer and serve resources (e.g. files) on request. In the jargon of the business, each computer on the early internet could function both as a client and as a server. The system was a true peer-to-peer network.
This situation changed after the World Wide Web appeared. As mentioned previously, the Web was invented at CERN by Tim Berners-Lee in 1990, and the first popular Web browser was Mosaic, created at the NCSA at the University of Illinois in 1993. With the appearance of Mosaic, and the subsequent appearance of the Netscape browser in 1994, Web use began to grow very rapidly and a different connectivity model began to appear. There was suddenly an explosive demand from people outside the academic and research world for access to the internet, mainly because they wanted to use the Web. To run a Web browser, a PC needed to be connected to the internet over a modem, with its own IP address. In order to make this possible on a large scale, the architecture of the original peer-to-peer internet had to be distorted.
Why? Well, basically because the newcomers could not function as ‘peers’. There were three reasons for this:
Personal computers were then fairly primitive devices with primitive operating systems not suited to networking applications like serving files to remote computers.
More importantly, PCs with only dial-up connectivity could not, by definition, have persistent connections and so would enter and leave the network ‘cloud’ frequently and unpredictably.
Thirdly, these dial-up computers could not have permanent IP addresses for the simple reason that there were not enough unique IP addresses available to handle the sudden demand generated by Mosaic and Netscape. (There is a limit to the number of addresses of the form xxx.xxx.xxx.xxx when xxx is limited to numbers between 0 and 255, as stipulated in the original design of the Internet Protocol.) The work-around devised to overcome the addressing limit was to assign Internet Service Providers (ISPs) blocks of IP addresses which they could then assign dynamically (i.e. ‘on the fly’) to their customers, giving each PC a different IP address with each new dial-up session. A subscriber might therefore be assigned a different IP address every time she logged on to the Net. This variability prevented PCs from having DNS entries and therefore precluded PC users from hosting any data or net-facing applications locally, i.e. from functioning as servers. They were essentially clients – computers that requested services (files, web pages, etc.) from servers.
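The addressing limit and the dynamic-assignment work-around can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real ISP's software: the `AddressPool` class, its method names and the subscriber name are hypothetical, and the block 192.0.2.0/29 is taken from a range reserved for documentation.

```python
import ipaddress

# Four fields, each between 0 and 255, give 256**4 possible addresses.
TOTAL_ADDRESSES = 256 ** 4  # the same number as 2**32


class AddressPool:
    """Hypothetical ISP pool that lends an address for one session at a time."""

    def __init__(self, block: str) -> None:
        # hosts() yields the usable addresses in the block, in order.
        self._free = list(ipaddress.ip_network(block).hosts())
        self._in_use = {}  # subscriber name -> address currently on loan

    def dial_up(self, subscriber: str) -> str:
        """Lend the next free address for the duration of a session."""
        if subscriber in self._in_use:
            raise ValueError(f"{subscriber} is already connected")
        if not self._free:
            raise RuntimeError("no free addresses left in the pool")
        address = self._free.pop(0)
        self._in_use[subscriber] = address
        return str(address)

    def hang_up(self, subscriber: str) -> None:
        """Return the subscriber's address to the back of the pool."""
        self._free.append(self._in_use.pop(subscriber))


pool = AddressPool("192.0.2.0/29")
first_session = pool.dial_up("subscriber-1")
pool.hang_up("subscriber-1")
second_session = pool.dial_up("subscriber-1")
print(TOTAL_ADDRESSES, first_session, second_session)
```

Because a released address goes to the back of the queue, the same subscriber receives a different address in the second session, which is exactly why a fixed DNS entry for such a PC was impractical.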
This second phase, which we are calling Internet 2.0, is still basically the model underpinning the internet as we use it today. It is essentially a two-tier networked world, made up of a minority of ‘peers’ – privileged computers (servers within the DNS system with persistent, high-speed connections and fixed IP addresses) – providing services to a vast number of dial-up computers which are essentially second-class citizens because they cannot function as servers and only have an IP address for the duration of their connection to the Net. Such a world is, as we will see later in the unit, potentially vulnerable to governmental and corporate control, for if everything has to happen via a privileged server, and servers are easy to identify, then they can be targeted for legal and other kinds of regulation.
Internet 3.0: a distributed peer-to-peer network?
Internet 2.0 made sense in the early days of the Web. For some years, the connectivity model based on treating PCs as dumb clients worked tolerably well. Indeed it was probably the only model that was feasible. Personal computers had not been designed to be part of the fabric of the internet, and in the early days of the Web the hardware and unstable operating systems of the average PC made it unsuitable for server functions.
But since then the supposedly ‘dumb’ PCs connected to the Net have become steadily more powerful, and the speed and quality of internet connections have steadily improved – at least in the industrialised world. Figures released in September 2004 suggested that 41 per cent of the UK population had broadband internet connections. On the software side, not only have proprietary operating systems (e.g. Microsoft Windows and Apple's Mac OS) improved, but the open-source (i.e. free) software movement has produced increasingly powerful operating systems (for example Linux) and industrial-strength web-server software (for example the Apache Web server, which powers a large proportion of the world's websites). As a result, it has become increasingly absurd to think of PCs equipped in this way as second-class citizens.
It is also wasteful to use such powerful computers simply as ‘life-support systems for Web browser software’. The computing community realised quickly that the unused resources existing behind the veil of second-class connectivity might be worth harnessing. After all, the world's Net-connected PCs have vast amounts of under-utilised processing power and disc storage.
Early attempts to harness these distributed resources were projects like SETI@Home, in which PCs around the globe analysed astronomical data as a background task when they were connected to the Net. More radical attempts to harness the power of the network's second-class citizens have been grouped under the general heading of ‘peer-to-peer’ (P2P) networking. This is an unsatisfactory term because, as we have seen, the servers within the DNS system have always interacted on a peer-to-peer basis, but P2P has been taken up by the mass media and is likely to stick.
The best available definition of P2P is ‘a class of applications that takes advantage of resources – storage, cycles, content, human presence – available at the edges of the internet. Because accessing these decentralised resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system and have significant or total autonomy from central servers’.
The two most widely known P2P applications are Instant Messaging (IM) and Napster. IM is a facility which enables two internet users to detect when they are both online and to exchange messages in ‘real time’ (i.e. without the time lag implicit in email). Napster was a file-sharing system that enabled users to identify other users who were online and willing to share music files. I use the past tense because Napster was eventually closed down by litigation instituted by the record companies, as mentioned in Section 4.3, but the file-sharing habits that it engendered have persisted, so the phenomenon continues.
The problem that both IM and Napster had to solve was that of arranging direct communication between two ‘second-class citizens’ of the internet, i.e. computers that have non-persistent connections and no permanent IP addresses. The solution adopted in both cases was essentially the same. The user registers for the service and downloads and installs a small program, called a ‘client’, on their computer. Thereafter, whenever that computer connects to the Net the client program contacts a central server – which is inside the DNS system and running special database software – and supplies it with certain items of information.
For IM the information consists of:
notification that the user machine is online;
the current assigned IP address of the PC.
The database then checks the list of online ‘buddies’ registered by the user to see if any of them are currently online. If they are, the server notifies them and the user of this fact, enabling them to set up direct communications between one another.
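The IM sign-in exchange described above can be sketched as a small in-memory Python model. It is an illustration under stated assumptions rather than the design of any actual IM service: the `PresenceServer` class, its method names, the user names and the documentation-range IP addresses are all hypothetical.

```python
class PresenceServer:
    """Hypothetical central server that records who is online and where."""

    def __init__(self) -> None:
        self._buddies = {}  # user -> set of buddies that user has registered
        self._online = {}   # user -> IP address for the current session

    def register_buddies(self, user, buddies):
        self._buddies[user] = set(buddies)

    def sign_in(self, user, ip_address):
        """Record that `user` is online and work out who must be notified.

        Returns a dict mapping each recipient to the {peer: address}
        details that recipient needs in order to open a direct connection.
        """
        self._online[user] = ip_address
        online_buddies = [
            b for b in sorted(self._buddies.get(user, ())) if b in self._online
        ]
        # Tell each online buddy where the newly arrived user can be reached.
        notices = {b: {user: ip_address} for b in online_buddies}
        if online_buddies:
            # Tell the user where each online buddy can be reached.
            notices[user] = {b: self._online[b] for b in online_buddies}
        return notices

    def sign_out(self, user):
        self._online.pop(user, None)


server = PresenceServer()
server.register_buddies("alice", ["bob", "carol"])
server.sign_in("bob", "198.51.100.7")
notices = server.sign_in("alice", "203.0.113.5")
print(notices)
```

The server only brokers the introduction. Once both parties hold each other's current address, the messages themselves can travel directly between the two PCs.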
For Napster the information supplied by the client consisted of:
notification that the user computer is online;
the current assigned IP address;
a list of files stored in a special folder on the user's hard drive reserved for files that he or she is willing to share.
A typical Napster session involved the user typing the name of a desired song or track into a special search engine running on the Napster server. The server would then check its records to find the IP addresses of logged-in computers which had that file in their ‘shared’ folders. If any records matching the query were found, the server would notify the user, who could then click on one of the supplied links and initiate a direct file transfer from another Napster user's computer.
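The session just described can be modelled with a toy central index in Python. This is a sketch under stated assumptions and not Napster's actual protocol or data model: the `CentralIndex` class, its method names, the substring matching rule, the user names, file names and documentation-range IP addresses are all invented for illustration.

```python
class CentralIndex:
    """Hypothetical central server that indexes files held on users' PCs."""

    def __init__(self) -> None:
        # user -> (current IP address, set of file names in the shared folder)
        self._peers = {}

    def log_in(self, user, ip_address, shared_files):
        """Store the three items the client reports when it connects."""
        self._peers[user] = (ip_address, set(shared_files))

    def log_out(self, user):
        self._peers.pop(user, None)

    def search(self, title):
        """Return (user, address) for each logged-in peer sharing a match."""
        wanted = title.lower()
        return sorted(
            (user, ip)
            for user, (ip, files) in self._peers.items()
            if any(wanted in name.lower() for name in files)
        )


index = CentralIndex()
index.log_in("dana", "203.0.113.9", ["Blue Moon.mp3", "Summertime.mp3"])
index.log_in("eli", "198.51.100.4", ["blue moon.mp3"])
matches = index.search("blue moon")
print(matches)
```

Note that the index holds only names and addresses. The files themselves never pass through the server: the search result simply tells the requesting client which peer to contact for the direct transfer.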
Watch the short animation, linked below, of how Napster works.