Internet Structure -
Infrastructure
Grocery stores and
department stores are
designed in a very similar
way to one another. The idea
is to push or force the
shopper past certain items.
The internet's main
intelligence comes from
search engines, which try
to archive all of the
information about your site
and make it quickly
accessible to others. This
process unfortunately has
many flaws that leave out
much of your important
information, while allowing
less relevant sites to find
their way above you in
search engines. This is
about to change as many of
the leading search engines
strive to crawl all of the
content on each page of
every website.
The internet works because
of links. For example, if you
have a website and have not
told anyone about it,
submitted it to any search
engines, or promoted it with
link exchanges, then you
will not be found. Plain
and simple, if you have no
links to your site it will
fail. I stated this
information to make a point.
The internet survives because
sites link to each other. If
you find a website that you
enjoy and that you believe
will be interesting to your
audience, go ahead and link
to it from your site.
On websites, specifically
e-commerce sites, you can
find a lot more structure.
Search boxes, drop-down
boxes, and categorized menus
can help you navigate
efficiently through a site
while it suggests products
that others bought. This is
where the grocery store
analogy fits most closely.
The trend towards
connectivity
Clever computer users
started connecting personal
computers together through
Local Area Networks
(sometimes abbreviated LAN)
and through telephone
connections and special
hardware devices. The
personal computers could
then be used again as
communication tools, but
only with other computers
they were directly tied to.
Gradually, LANs became more
powerful, and were often
tied together to make Wide
Area Networks (WANs). Most
businesses today use a
combination of LAN and WAN
technology.
At the same time,
educational and defense
institutions were working on
ways to connect the large
research machines. They had
a special problem. During
the height of the cold war,
these computers were used in
support of nuclear defense
initiatives. It was vital
that there be many paths
between the computers, and
that messages could get
through even if some of the
communications hubs were
brought down by the bad
guys.
An underlying protocol
The earliest form of the
Internet was based on an
ingenious idea called
TCP/IP. This stands for
Transmission Control Protocol
/ Internet Protocol. TCP/IP is
a big name for a simple
idea. Essentially, a message
is automatically broken into
small parts, which are
called 'packets.' A packet
is labeled with its source
and destination, as well as
some other information. Each
packet is routed
independently from the
starting machine to the
destination, and if one
route is blocked, the
network can send the packet
along another. When the
packets arrive at the
destination, they are pieced
back together, and the
message can be read.
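The packet idea can be
sketched in a few lines of
Python. This is only a toy
model, not how TCP/IP is
really implemented: the
function names, the
dictionary fields, and the
8-character packet size are
all assumptions made for
illustration.

```python
import random


def to_packets(message, source, destination, size=8):
    """Break a message into small packets labeled with source and destination."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {
            "src": source,          # where the packet came from
            "dst": destination,     # where it is going
            "seq": number,          # position of this piece in the message
            "total": len(chunks),   # how many pieces make up the message
            "data": chunk,
        }
        for number, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Piece the packets back together, whatever order they arrived in."""
    if not packets:
        raise ValueError("no packets received")
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("some packets are missing")
    return "".join(packet["data"] for packet in ordered)


if __name__ == "__main__":
    packets = to_packets("Anything you see on the Internet", "10.0.0.1", "10.0.0.2")
    random.shuffle(packets)  # stands in for packets taking different routes
    print(reassemble(packets))
```

Shuffling the list stands in
for packets arriving out of
order; the sequence number
is what lets the destination
put them back together.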
The entire Internet, from
email to web pages and
streaming video, is
currently based on TCP/IP
packets. Anything you see or
hear on the Internet was
broken into these packets
and sent to you. The TCP/IP
protocol is invisible and
automatic. Most users never
see it and never have to
know it is there. This has
some interesting side
effects. A message that goes
from one machine to another
in the next room might find
its way to France in the
meantime (not too often, but
it happens). The other side
effect of this is that
messages you send might
temporarily reside on dozens
of computers you will never
see before they get to the
destination.
The 'traffic cops' of the
Internet
As scientists were
developing TCP/IP and
networking technology became
more prevalent among
personal machines, it became
apparent that there ought to
be a way to connect the two.
Essentially, the solution
was a special class of
computer called a router.
The router's job is to sit
between a network and the
rest of the internet, and
act as a kind of mailman to
the network. Any traffic the
network sends to the
Internet goes through the
router, and any messages
destined for sites on the
network only get there
through the router. Routers
are connected through
high-speed cables to even
more powerful machines,
which are eventually
connected to a number of
special high-end machines,
often referred to as the
'Internet Backbone'. (This
network was originally
called the NSF backbone,
after the National Science
Foundation, which provided
much of the original
funding. Currently, the NSF
is backing a brand new
version of the Internet
backbone with a research
focus called 'Internet2'
or the 'Abilene network'.)
Hi, what's your number?
Since there are literally
millions of computers
connected to the Internet,
it could be nearly
impossible to locate just
one. Fortunately, the
original planners of the
Internet had some clever
ideas. Every machine on the
Internet was assigned a
number. The number would be
composed of four smaller
numbers between 0 and 255,
separated by dots. (There
are some wonderful urban
legends about why the
numbers don't go to 999, but
the real answer is related
to the vagaries of base two
mathematics. Let's leave
that for another session.)
The number is called an IP
(for internet protocol)
number. IP numbers work like
zip codes. They are easy for
computers to understand, and
they make it reasonably easy
for packets to be routed to
the appropriate
destinations.
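For readers who do want a
peek at the base two reason,
here is a short Python
sketch. Each of the four
numbers fits in exactly
eight bits (eight bits give
256 values, 0 through 255),
so the whole address is one
32-bit number. The helper
name parse_ip is an
assumption made for
illustration; the ipaddress
module is part of Python's
standard library and is used
here only as a cross-check.

```python
import ipaddress


def parse_ip(text):
    """Split a dotted IP number into its four parts and one 32-bit value."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an IP number needs exactly four parts")
    numbers = [int(part) for part in parts]
    if any(n < 0 or n > 255 for n in numbers):
        raise ValueError("each part must be between 0 and 255")
    value = 0
    for n in numbers:
        value = value * 256 + n  # shift left by 8 bits, then add the next part
    return numbers, value


if __name__ == "__main__":
    numbers, value = parse_ip("192.168.1.10")
    print(numbers)
    print(format(value, "032b"))  # the same address written in base two
    # Cross-check against the standard library's own parser.
    assert value == int(ipaddress.IPv4Address("192.168.1.10"))
```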
The Domain name solution
The problem with IP numbers
is they are, well, numbers.
People tend not to be good
with numbers. They much
prefer characters and words.
For this reason, computer
scientists developed the
Domain Name System (often
called DNS). DNS is just a
big database (actually
several) that contains a
bunch of computer names and
the IP addresses associated
with those names. The
InterNIC (www.internic.net)
is currently the
organization which manages
the assignment of domain
names, although the process
is being privatized, and
others will soon have the
capacity to assign domain
names. There is a
registration fee for a
domain name, which is
currently $70.00 for two
years, but that may change
as competition enters the
marketplace. The good news
is that most of us do not
need to worry about a domain
name. We are usually given
an account by our employer
or some kind of provider,
and the domain name we use
reflects that entity. Part
of your email address is
usually your domain name.
For example, I used to have
an email address like this:
andyharris@aol.com. The part
after the @ sign is the
domain name of my
organization.
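The idea of DNS as a table
of names and numbers can be
sketched in Python. This is
a toy model under loud
assumptions: the table, the
example.com names, and the
192.0.2.x numbers are all
made up for illustration,
and the real DNS is spread
across many servers rather
than held in one dictionary.

```python
# Hypothetical name table; none of these entries are real registrations.
NAME_TABLE = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


def lookup(name):
    """Return the IP number recorded for a name, or None if it is unknown."""
    return NAME_TABLE.get(name.lower().rstrip("."))


def domain_of(email_address):
    """Return the part of an email address after the @ sign."""
    if "@" not in email_address:
        raise ValueError("an email address needs an @ sign")
    return email_address.rpartition("@")[2]


if __name__ == "__main__":
    print(lookup("WWW.Example.COM"))        # names are not case sensitive
    print(domain_of("andyharris@aol.com"))
```

In a real program, Python's
socket.gethostbyname function
asks the actual DNS for the
number behind a name.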
Domain names have a number
of parts, and they can
actually give you a lot of
information about the person
or entity attached to them.
They usually end with a
two- or three-letter code.
The two-letter codes refer
to countries, so .fr means
'France' and .ca means
'Canada.' In the United
States, we generally leave
off the two-letter country
code '.us'. The three-letter
code refers to the type of
organization that owns the
computer. These fall into a
number of standard
categories. Mine ends in
'.com', which stands for
'commercial enterprise'. In
addition, you often see
domain names ending with '.gov'
(government organization),
'.edu' (educational
institution), '.org'
(non-profit organization),
or '.net' (Internet service
provider). The first part of
a domain name (the 'aol'
part in the example above)
is the name of a particular
computer or organization.
Sometimes there are a number
of intermediate words that
can give you more clues. For
example, 'stats.math.indiana.edu'
would most likely refer to
the statistics section of
the math department of
Indiana University. (Such a
machine does exist, but its
name has changed.)
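Reading a domain name from
right to left can be
sketched in Python. The
function name and the two
lookup tables below are
assumptions for
illustration, and they list
only the endings mentioned
in this article, not every
ending in use.

```python
# Only the endings discussed in the article; many more exist.
ORGANIZATION_TYPES = {
    "com": "commercial enterprise",
    "gov": "government organization",
    "edu": "educational institution",
    "org": "non-profit organization",
    "net": "Internet service provider",
}
COUNTRIES = {"fr": "France", "ca": "Canada", "us": "United States"}


def describe_domain(name):
    """Split a domain name into its parts and explain its ending."""
    labels = name.lower().split(".")
    ending = labels[-1]
    if ending in ORGANIZATION_TYPES:
        meaning = ORGANIZATION_TYPES[ending]
    elif ending in COUNTRIES:
        meaning = "country: " + COUNTRIES[ending]
    else:
        meaning = "unknown"
    return {
        "ending": ending,
        "meaning": meaning,
        # most general part first, most specific machine last
        "general_to_specific": labels[::-1],
    }


if __name__ == "__main__":
    print(describe_domain("stats.math.indiana.edu"))
```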
Domain names, as you can
see, are used as part of
email addresses, and they
also make up part of the
address of a web page. When
used in a web address, the
domain name usually comes
near the beginning. We will
look more closely at how web
addresses work in a moment.
When are you here? Is
existence essence?
It is important to determine
what it means for a person
or a computer to be 'on the
internet,' because there is
some potential for
confusion. If you can use a
computer to send email, is
it on the Internet? Is it on
the net because it has a web
browser (like Internet
Explorer or Netscape)
installed? Is a computer
always on the Internet?
Servers and clients
Some computers stay on the
Internet all the time, but
these tend to be large
expensive machines. The
computers that store
information like web pages
should stay on all the time,
and should always have some
kind of connection to the
Internet. Such machines are
called servers. It can be
complicated and expensive to
manage a permanent
connection, and even more
complex to manage a server.
Most ordinary people don't
want to do it, and want to
leave those jobs to a
professional. We would
usually just prefer to
connect our computer to a
server for short periods of
time, and use the services
of a professional to ensure
our connection stays valid
and we have all the right
programs in place. For
example, you probably turn
your home computer off at
night. What if you get an
email at two o'clock in the
morning, when your computer
is not turned on? Likewise,
you might have a small
business and want to host a
homepage. You will want
people to be able to get to
that page any time of the
day, not simply when your
computer is turned on and
'hooked up.'
In addition to servers, the
internet is also full of
clients. You will frequently
hear the term
'client-server' used in
Internet conversation. The
good news is you already
know what this means:
A client-server analogy
Imagine driving up to a
fast-food restaurant. You
get to the speaker and the
sixteen-year-old bored kid
mumbles something
incomprehensible into the
microphone. You then order a
'cholesto-burger supreme'
special, hear something that
resembles a request for some
cash, and you drive to the
window. You then exchange
the money for your meal and
drive off. The cashier
eagerly leaps to his
microphone awaiting the
opportunity to serve another
customer.
In this example, the
customer is the client and
the cashier is the server.
The server sits around
waiting for a client. A
client shows up and makes a
request. The client and
server follow a ritualized
conversation (a protocol) to
make a transaction. Finally,
the transaction is complete,
the client moves on, and the
server prepares to receive
another client.
Your machine is a client.
The Internet programs on
your own machine (like
Netscape, a telnet program,
or an FTP program) are also
considered clients. Clients
exist to talk to servers.
Servers can also be both
machines and special
programs. You will almost
never talk directly to a
server program; instead you
use a client program to
communicate with it.
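The drive-through
conversation can be acted
out with two tiny Python
programs talking over a
TCP/IP connection on a
single machine. This is a
minimal sketch, not a real
Internet service: the
function names and the
made-up one-line protocol
("send an order, get a
reply") are assumptions for
illustration, and it relies
on the messages being short
enough to arrive in one
piece.

```python
import socket
import threading


def serve_one_order(listener):
    """Server: wait for one client, read its request, and send a reply."""
    connection, _address = listener.accept()
    with connection:
        order = connection.recv(1024).decode("utf-8")
        connection.sendall(f"Here is your {order}".encode("utf-8"))


def place_order(port, order):
    """Client: connect to the server, make a request, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(order.encode("utf-8"))
        return client.recv(1024).decode("utf-8")


def demo(order="cholesto-burger supreme"):
    """Run one complete transaction between a client and a server."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the system pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_one_order, args=(listener,))
        worker.start()                   # the server sits around waiting
        reply = place_order(port, order)
        worker.join()                    # transaction complete
    return reply


if __name__ == "__main__":
    print(demo())
```

The server does nothing
until a client shows up, and
both sides have to agree on
who speaks first. That
agreement is the protocol.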
So how do I get my client
talking to a server?
What most people do is
subscribe to some sort of
internet service provider.
There are two main flavors
in common use. One is the
HUGE online services such as
America Online, Prodigy,
CompuServe, and many others.
These guys offer connections
to the internet, and they
also offer customized
content only for members of
the service. They can be a
great choice if you are just
starting out, and you have
probably already gotten some
software from one or more of
them in the mail or when you
purchased your computer. You
can often get free hours to
try out a service, and then
you will need to pay a
monthly service plan, or
perhaps pay by the hour. Be
very careful as you read the
plan to understand its
terms, particularly if you
are sharing an account with
members of your family. If
you are unaware of an hourly
service charge, you could be
in for a big shock when the
bill comes due.
The other main approach to
connecting to the Internet
is through some sort of
commercial Internet Service
Provider (ISP). These have
sprung up all over the
country, and they often
offer cheaper service than
the larger services, but
usually without custom
software or content. Many
experienced Internet users
prefer using an ISP, but it
can often be an intimidating
choice for beginners.
One other source of Internet
access you might pursue is
free access. Often
employers, schools, or
libraries will offer some
kind of limited free
Internet access. Most
universities now include
Internet access as a
standard student perk, like
a library card. Your
employer may have free or
reduced-rate Internet access
available to you. Local
schools, libraries, and
community centers sometimes
also offer some kind of
access. Often these accounts
are limited in some way, but
they can get you started.
Is there a free lunch?
There are a few commercial
ventures that get you on the
Internet for free as well,
but most already require you
to have some kind of access
to begin with. One notable
exception is Juno (www.juno.com),
which is a free email-only
service. This service
includes special software to
connect your machine to the
internet. Of course, you
will have to endure some
advertising in order to
receive this 'free' service,
but it's not a bad
trade-off, particularly if
all you want right now is
email.
The software you might need
You probably already have
some Internet software
(clients) on your machine.
All of these programs 'know'
how to speak one or more of
the protocols and connect to
the appropriate servers.
That's all that internet
programs are!!
Once you are connected, your
machine has an IP number
(and maybe also a domain
name) assigned to it. This
means that you can now send
TCP/IP packets to and from
your machine. Of course,
most of us don't really want
to deal directly with
TCP/IP; we would prefer the
packets to be put together
in a more usable format.
TCP/IP is the most basic of
the Internet protocols, but
it is used to put together
fancier and more powerful
protocols. A protocol is
simply a name for an
agreement about how a
communication will proceed.
Formal meetings have a very
different protocol than
discussions on a basketball
court, for example. There
are a number of protocols in
common use on the Internet,
but you only need to know a
few. In fact, you don't need
to know the protocols at
all, only which clients are
used for them!! We'll
discuss a few anyway, just
in case it comes up on a
quiz show ("Internet
protocols for a thousand,
please.")
The wild, wonderfully wacky
world wide web!!
The protocol most of us know
best is called HTTP (Hyper
Text Transfer Protocol) by
the People Who Like Big
Names For Simple Ideas. The
rest of us call it the
world-wide-web. HTTP is a
truly wonderful protocol,
because it allows us to have
links and images, and gives
us a chance to make much
more interesting documents
than we could have made in
the old 'text-only' days. If
you only have one Internet
client program on your
computer, you should get a
good web browser. Browsers
are powerful because the
HTTP protocol can be used to
handle some other protocols
(although in limited ways)
and because HTTP itself is
just so cool. If your
computer can handle it, you
should definitely have one
of the latest versions of
the big two browsers
(Netscape 4.5 or later, or
Microsoft Internet Explorer
4.0+). For ordinary personal
users, both are free.
This takes us back to the
idea of web addresses.
Addresses on the web are
also called URLs (for
Uniform Resource Locator).
You have probably blindly
typed http:// at the
beginning of every web
address, and you never knew
why. (It's a ritual. Throw
salt over your shoulder,
wave a chicken over the
monitor, and type http://).
Now perhaps you can see why
we type this. HTTP is the
name of the protocol we want
to use. Since web browsers
are primarily for the web,
we almost always type
http:// (Oooooooh!!)
Occasionally you will use a
web browser to use another
protocol, so you sometimes
see other things there (like
news:// or gopher://). These
things are just other
protocols.
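To see how a web address
breaks into a protocol and a
domain name, here is a small
Python sketch using the
standard library's
urllib.parse module. The
helper name take_apart is an
assumption for illustration,
and gopher.example.org is a
made-up name.

```python
from urllib.parse import urlsplit


def take_apart(url):
    """Return the protocol, domain name, and path found in a web address."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


if __name__ == "__main__":
    address = "http://wally.cs.iupui.edu/n241-new/webMag/internetMagic.html"
    protocol, domain, path = take_apart(address)
    print("protocol:", protocol)
    print("domain name:", domain)
    print("path:", path)
    # A different protocol at the front, with a hypothetical domain name.
    print(take_apart("gopher://gopher.example.org/"))
```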
You've got mail
Email is familiar. It
actually uses a number of
protocols. It is an
acceptable simplification to
say that email primarily
uses SMTP (Simple Mail
Transfer Protocol) to send
email messages and POP3
(Post Office Protocol) to
receive
them. (Don't worry, there
won't be a quiz. I'm only
telling you this because you
may run across the terms
some time). Email clients
(like Eudora or the email
clients built into Netscape
and IE) already know how to
read and write the
appropriate protocols, but
sometimes you need to set
them up so they know where
your server is.
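What an email client keeps
track of can be sketched in
Python. The server names and
addresses below are
hypothetical placeholders (a
real provider supplies its
own), and the function name
is an assumption for
illustration. The sketch
only builds a message; it
does not send anything.

```python
from email.message import EmailMessage

# Hypothetical settings; your provider tells you the real server names.
MAIL_SETTINGS = {
    "smtp_server": "smtp.example.com",  # where outgoing mail is handed off
    "pop3_server": "pop.example.com",   # where incoming mail waits for you
}


def build_message(sender, recipient, subject, body):
    """Assemble an email message with the usual headers and a text body."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


if __name__ == "__main__":
    note = build_message(
        "me@example.com", "you@example.com", "Hello", "See you at two o'clock."
    )
    print(note)  # shows the headers followed by the body
```

To actually send or fetch
mail, Python's standard
smtplib and poplib modules
speak SMTP and POP3 to the
servers named in the
settings.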
Don't forget newsgroups
Newsgroups are an important
part of the Internet that
are often overlooked. These
are special communication
forums that are widely
distributed across the Internet.
Most of the browsers have
built-in capability to work
with these newsgroups, but
you might want to
investigate a special
program to do so. Newsgroups
are especially wonderful for
connecting with people who
share your interests. If
you are interested in
something, there is probably
a global discussion going on
about the subject that you
can participate in.
Sometimes you want to
send stuff
The File Transfer
Protocol (FTP) is a
protocol designed for
transferring files
between machines on the
Internet. If you will not be
doing much of this, the
FTP capability of your
web browser will
probably be enough. Some
people like to use
Internet accounts as a
place to back up
important documents, and
an FTP client is a good
way to handle the
transfers between two
accounts you own.
A classic protocol
Telnet is one of the
oldest protocols on the
Internet. What it does
is allow one computer to
act as a 'dumb terminal'
to another. In the
pre-web days of the
Internet, telnet was the
most common way to use
the Internet. It was not
for the faint of heart,
though, because you had
to be able to use
whatever machine you
were connected to, which
often had arcane
operating systems such
as Unix or VMS. It is
still common to use
telnet if you are
operating a web site,
particularly if you are
doing some web
programming, but most
beginners do not need to
worry too much about the
telnet protocol.
Summing it up
The Internet is by any
account an exceptional
thing. It is a complex,
dynamic organism with no
real head that still manages
to work together pretty
well. The core technology
that makes the Internet
possible is the TCP/IP
protocol. This provides an
underlying framework that
can be packaged together in
complex ways to form other
protocols. The Internet
contains two main classes of
computers and software:
clients and servers. Servers
are the machines and
programs that are on all the
time and are run by
professionals. Clients are
the machines and programs
that mere mortals use to
connect to servers. Hooking
up to the Internet entails
enlisting the services of a
server, establishing the
basic TCP/IP connection, and
running one or more client
programs. There is still
plenty of magic left, when
we consider how exactly the
protocols work, how the
communications happen, and
how all the various programs
are written, but it is
possible to understand the
basic workings of the
Internet. One of the most
exciting things about
technology is that when you
understand the magic, it
doesn't go away. The new
insight and ability that you
earn make you appear to be
much more effective as a
user of the technology.
Maybe we could say that when
we take some of the magic
out of the Internet, we
transfer that magic to the
people who have learned the
concepts.
This article can be found in
full at
http://wally.cs.iupui.edu/n241-new/webMag/internetMagic.html