how is the WWW shaped? (long article)
Here's an article from a list-serv I belong to on technology. This article discusses the "shape" of the internet and how it isn't what you think! : )
---------
Keep in mind a previous post which pointed out that hypermedia
has, as its partial basis, a grand design to accommodate both
disparate forms of information and the wide distribution
of that information through links. Part of the fun of the Web,
researchers acknowledge, is the sense of serendipity at finding
yourself in an unexpected place, yet still able to find
information that is both useful and interesting. But what if
the Web did not look like a web or even a spiral at all? What
if it looked like a bow-tie? With really thin tendrils?
Ian Austen's May 18, 2000 article for the New York Times Circuit
section, "Study Reveals Web As Loosely Woven," covers the findings
of a survey that argues just that.
(http://www.nytimes.com/library/tech/00/05/circuits/articles/18webb.html)
The study, "Graph Structure in the Web"
(http://www.almaden.ibm.com/cs/k53/www9.final),
a product of researchers from AltaVista, the IBM Almaden
Research Center, and Compaq Systems Research Center, suggests
the web of related information and links we have grown to imagine
in the past decade, where one page would take you to another
related page easily and quickly, does not exist. In its place,
we have a landscape full of missing files, massive information
disconnect, and links that don't really lead anywhere.
The researchers basically took a sample of all the web pages
thought to exist (roughly 200 million web pages or 1/5 of the
estimated number of total web pages), and then examined all
of the links contained in that sample-- about 1.5 billion! --
to see where they actually led. To process a sample this size,
they used a Compaq AlphaServer 4100 machine equipped with
12 gigabytes (!) of RAM (random access memory), and AltaVista's
private Web crawler (known as Scooter), to gather and index the
links (but not keywords). This way they were able to get a sense
of how many links pointed to particular pages, in turn pointing
out how pages referred to other pages. The experiment was run
three times between Spring 1999 and Winter 2000.
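To make the counting step concrete, here is a minimal sketch of tallying how many links point at each page. It is not the researchers' code: the page names and the `links` list are made up for illustration, and a real crawl of 1.5 billion links would need far more careful storage than a Python list.

```python
from collections import Counter

# Hypothetical toy crawl: each pair is (source page, target page).
links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
    ("c.html", "a.html"),
]


def count_in_links(edges):
    """Return a Counter mapping each target page to the number of links pointing at it."""
    return Counter(target for _source, target in edges)


in_links = count_in_links(links)

# The single most linked-to page and its count.
most_linked = in_links.most_common(1)[0]
print(most_linked)
```

A page that nothing links to (d.html here) simply gets a count of zero, which is the kind of page the study found a lot of.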
The survey basically found that:
-- only 28% of websites are actually connected to the rest of the Web
through links to and from other sites;
-- if you use only links to navigate among web pages, you will fail
to find your destination about three-quarters of the time, and, at
best, it takes about 16 clicks to get between those pages that do
have significant links to and from other pages.
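Both findings above are statements about paths in a directed graph: how often no path exists between two pages, and how many clicks the shortest path takes when one does. The sketch below measures both on a tiny made-up graph using breadth-first search. The `graph` dictionary and the function names are assumptions for illustration only; the study itself worked from samples of a graph far too large to check every pair.

```python
from collections import deque

# Hypothetical toy web: page -> pages it links to.
graph = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A"],  # D links in, but nothing links to D
    "E": [],     # E links to nothing and nothing links to E
}


def clicks_between(graph, start, goal):
    """Breadth-first search: fewest clicks from start to goal, or None if no path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, clicks = queue.popleft()
        if page == goal:
            return clicks
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None


def path_statistics(graph):
    """Check every ordered pair of distinct pages.

    Returns (fraction of pairs with no path, average clicks when a path exists).
    The average is None if no pair is connected at all.
    """
    lengths = []
    pairs = 0
    for start in graph:
        for goal in graph:
            if start == goal:
                continue
            pairs += 1
            clicks = clicks_between(graph, start, goal)
            if clicks is not None:
                lengths.append(clicks)
    unreachable = 1 - len(lengths) / pairs
    average = sum(lengths) / len(lengths) if lengths else None
    return unreachable, average


unreachable, average = path_statistics(graph)
print(f"no path for {unreachable:.0%} of pairs")
print(f"average clicks when a path exists: {average:.2f}")
```

Even in this five-page example, more than half of the ordered pairs have no path at all, because links only run one way.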
But what about this bow-tie shaped "web"? Well, within the realm of
the sample, it looks something like this:
-- there were 56 million pages in the center, or "knot," considered
to be the most connected pages that link directly to each other
and pages at either end or "bow." This is where portals, corporate,
and media websites reside;
-- the right bow has some 44 million pages that were referenced by
the pages in the center, but that did not link back to the center.
This is where you'll find e-commerce and internal intranet sites
that do not link out to other sites;
-- the left bow also consists of 44 million pages, which represent
the most recently created pages in the sample, that link
extensively to the pages in the center, but which are often
not recognized by the center pages given their newness;
-- the "tendrils" around the bow, again some 44 million pages, link only
to either end of the bow, but have no direct links to the center
pages;
-- away from the tendrils, about 5% of the pages sit, totally
isolated from all other content;
-- there were also slightly more sites being added to the center, but
for the most part not in great numbers.
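The bow-tie regions described above can be sketched in code. This is a simplified illustration, not the method from the study: it assumes you already know one page (`seed`) that sits in the knot, and every page name below is invented. The knot is whatever the seed can both reach and be reached from; the left bow can reach the knot but not be reached from it; the right bow is the reverse; tendrils touch the bow only indirectly; everything else is disconnected.

```python
from collections import deque


def reach(adjacency, start):
    """Return the set of pages reachable from start by following adjacency."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in adjacency.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(links, seed, extra_pages=()):
    """Split pages into bow-tie regions, assuming seed is a page in the knot.

    links is a list of (source, target) pairs; extra_pages lists pages that
    appear in no link at all.
    """
    pages = set(extra_pages) | {seed}
    forward, backward, either = {}, {}, {}
    for source, target in links:
        pages.update((source, target))
        forward.setdefault(source, set()).add(target)
        backward.setdefault(target, set()).add(source)
        either.setdefault(source, set()).add(target)
        either.setdefault(target, set()).add(source)

    downstream = reach(forward, seed)   # pages the knot can get to
    upstream = reach(backward, seed)    # pages that can get to the knot
    attached = reach(either, seed)      # linked to the bow-tie in any direction

    knot = downstream & upstream
    left_bow = upstream - knot
    right_bow = downstream - knot
    return {
        "knot": knot,
        "left_bow": left_bow,
        "right_bow": right_bow,
        "tendrils": attached - knot - left_bow - right_bow,
        "disconnected": pages - attached,
    }


# Hypothetical toy web.
links = [
    ("portal1", "portal2"), ("portal2", "portal1"),  # knot pages link to each other
    ("newpage", "portal1"),                          # left bow: links in, no link back
    ("portal2", "shop"),                             # right bow: linked to, no link out
    ("newpage", "hanger1"), ("hanger2", "shop"),     # tendrils off either end
    ("x", "y"),                                      # a pair cut off from the bow-tie
]
parts = bow_tie(links, seed="portal1", extra_pages=["lonely"])
for name, members in parts.items():
    print(name, sorted(members))
```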
There are some good reasons to keep all of this in mind. Search engines
use their indexing tools to rank the placement of a page within
search results based in large part upon the number of pages that
refer to that page. The fewer the links a page has, the less likely
it is to be found, much less highly ranked in search engine results.
This means that users, who in general rely upon search engines to
help them navigate the web, will never find those pages that do not
employ at least some link to the most popularly referenced pages.
Assuming that this is the state, and shape, of the Web for some
time to come, it makes a strong case not only for strategically
linking out to other pages, but for beginning to think about how
to get pages to reference your pages in turn.
Ryan Turner
NPT Project
------------------
Doug
Blue Spade Productions
http://www.bluespade.com