Mudcat Café message #629207

The Mudcat Café™
Thread #43160 Message #629207
Posted By: Jim Dixon
16-Jan-02 - 03:59 PM
Thread Name: Tech: A Benign and Wise Google?
Subject: RE: BS: A Benign and Wise Google?

Here's my understanding of how web sites like Google work:
They have a program called a "spider" or "crawler" which periodically examines each web site. It starts by scanning and indexing the "home" page, that is, http://www.mudcat.org/. Then it follows all the links on that page. That brings it to http://www.mudcat.org/threads.cfm, http://www.mudcat.org/radio.cfm, http://help.mudcat.org/, and so on, and it indexes all those pages. Then it follows the links on those pages, and so on, until it runs out of pages (or it may have some criterion that causes it to skip certain pages or stop before it gets that far).
The trouble is, not all Mudcat threads can be reached by following up links this way. Some threads can only be accessed by using the Filter or the DigiTrad and Forum Search boxes. The crawler isn't smart enough to do that. (I figure it's only a matter of time until someone invents a crawler that IS smart enough to do that.)
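To make that concrete, here is a rough sketch in Python of the kind of link-following crawl I'm describing. It is only my guess at the general idea, not how Google actually does it. It doesn't touch the network: LINKS is a made-up table of which page links to which, the two "example-thread" URLs are invented for illustration, and max_pages stands in for whatever stopping criterion a real crawler might use.

```python
from collections import deque

# Hypothetical link table: each page maps to the pages it links to.
# The two "example-thread" URLs are invented for this illustration.
LINKS = {
    "http://www.mudcat.org/": [
        "http://www.mudcat.org/threads.cfm",
        "http://www.mudcat.org/radio.cfm",
        "http://help.mudcat.org/",
    ],
    "http://www.mudcat.org/threads.cfm": [
        "http://www.mudcat.org/example-thread-linked",
        "http://www.mudcat.org/",  # link back to the home page
    ],
    "http://www.mudcat.org/radio.cfm": [],
    "http://help.mudcat.org/": [],
    "http://www.mudcat.org/example-thread-linked": [],
    # This thread exists, but no page links to it. A person could only
    # find it by typing something into a search box.
    "http://www.mudcat.org/example-thread-search-only": [],
}


def crawl(start, links, max_pages=None):
    """Follow links breadth-first from start; return pages in index order."""
    indexed = []
    seen = {start}          # pages already queued, so nothing is visited twice
    queue = deque([start])
    while queue:
        if max_pages is not None and len(indexed) >= max_pages:
            break           # assumed stopping criterion
        page = queue.popleft()
        indexed.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


if __name__ == "__main__":
    for page in crawl("http://www.mudcat.org/", LINKS):
        print(page)
```

The search-only thread never shows up in the output, because the loop only ever learns about a page by finding a link to it.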
Here's an interesting experiment: Go to Google's Advanced Search page. In the box labeled "Find results with the exact phrase___" type "mudcat discussion forum". In the box labeled "Only return results from the site or domain___" type www.mudcat.org. Then click the "Google Search" button.
The first listing you see will be the familiar Mudcat Discussion Forum page. But DON'T CLICK THE LINK. Instead, click on the word "Cached." You will see an old version of the forum page. When I did it, the first thread I saw (after the PermaThreads) was
BS: Corny lines that we love      49     13-Dec-01 - 01:48 PM
That tells me that the last time Google's crawler examined Mudcat was soon after 01:48 PM on 13-Dec-01.
Try again after another month or so and you'll probably get a different result.
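If you wanted to do that reasoning mechanically, something like the following Python sketch would work. It assumes each listing line on the cached page has three columns separated by runs of spaces (title, a count, and the time of the last post), which is just what the one line above looks like to me. The function names are my own, and the month abbreviation is parsed assuming an English locale. The newest timestamp on the cached page is the earliest the crawler could have taken its snapshot.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the forum listing, e.g. "13-Dec-01 - 01:48 PM".
STAMP_FORMAT = "%d-%b-%y - %I:%M %p"


def parse_listing(line):
    """Split one listing line into (title, count, last_post_time).

    Assumes columns are separated by two or more spaces.
    """
    title, count, stamp = re.split(r"\s{2,}", line.strip())
    return title, int(count), datetime.strptime(stamp, STAMP_FORMAT)


def earliest_possible_crawl(lines):
    """Newest post time on the cached page: the crawl happened after this."""
    return max(parse_listing(line)[2] for line in lines)


if __name__ == "__main__":
    cached = ["BS: Corny lines that we love      49     13-Dec-01 - 01:48 PM"]
    print(earliest_possible_crawl(cached))
```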
And I'll bet that only the threads listed on that page were the ones that got indexed.
p.s. Google also does the same thing with http://dharma.mudcat.org/, http://loki.mudcat.org/, and http://ragtime.mudcat.org/. I'm not sure it always does them on the same day, so it might get different results.