The Mudcat Café TM
Thread #110222   Message #2325693
Posted By: JohnInKansas
25-Apr-08 - 04:22 PM
Thread Name: Tech: Forum search after 2005?
Subject: RE: Tech: Forum search after 2005?
I suppose that a couple of hours of "no comment" means everyone digested the bit about the easy way to do site searches using Google?

Anyone who really wants to see some success with searching on the internet MUST UNDERSTAND what is meant by an "indexed search."

Google does NOT go out to the internet and look for the term you enter.

Google "crawls" the internet and makes LISTS OF WORDS that people might look for, and LISTS THE WEBSITES where each word appears.

When you type a word and hit Go, Google looks at its LISTS OF WORDS (Indexes) and shows you the sites where it already knows that "your word" appears.
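
For anyone who'd rather see it spelled out, here's a rough sketch of that "make the lists first, look things up later" idea in Python. The page names and their text are made up for illustration, and a real search engine's index is vastly more elaborate than this; the only point is that the lookup step consults the prebuilt lists and never re-reads a page.

```python
from collections import defaultdict

# Hypothetical "crawled" pages (invented names and text, for illustration only).
PAGES = {
    "mudcat.example/thread1": "forum search after 2005",
    "mudcat.example/thread2": "digitrad search index",
    "other.example/page": "google search tips",
}

def build_index(pages):
    """The 'crawl' step: for every word, record the pages it appears on."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, word):
    """The 'search' step: consult the lists only; no page is re-read."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(PAGES)
print(lookup(INDEX, "search"))           # all three pages contain "search"
print(lookup(INDEX, "amblyopliatious"))  # never indexed, so nothing comes back
```

A word that never made it into the lists returns nothing, no matter how many pages out there might actually contain it.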

Someone must decide in advance what words you "might look for" and ONLY THE WORDS that were chosen in advance to be added to the lists will appear in any search result.

If you are the first person to look for "a word" you might not find it using Google; but Google is pretty good at adding a "new word" if enough people look for it, so that someone among the first dozen or so searchers may find a result.

(If someone posted here "let's all go look up amblyopliatious" - assuming it's NOT already in a Google index, it is very likely that the Google result would point back to the post here that started it off within about a dozen queries - maybe. Or maybe not. Google knows that one way to appear infallible is to be inscrutable.)

In a couple of actual cases, I've posted to a thread here, and gone immediately to a Google search for something else that was needed in the same thread, and found my post in the result on the first page - so it CAN BE really quick, or sometimes it can be NEVER.

If your search term is already indexed it may instantly appear in the search result; but if the same term appears on a few thousand "more popular" sites, you'll never see the result you want because although Google may facetiously report 28,395,253,422 results (found in 0.0027 seconds), it will only allow you to look at the first hundred or so (in order of "popularity") of the ones it finds.
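
A toy illustration of that "we have them but you can't see them" effect, again in Python. The site names, the popularity scores, and the cap of 100 are all my own assumptions for the sake of the example (the 100 just stands in for "the first hundred or so"); none of it is Google's actual ranking or limit.

```python
# Hypothetical matches for one search term: (site, popularity score).
# Names and scores are invented for illustration.
MATCHES = [("site-%03d.example" % n, n) for n in range(1, 1001)]

def visible_results(matches, cap=100):
    """Sort by popularity, highest first, and show only the first `cap`."""
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    return len(ranked), ranked[:cap]

total, shown = visible_results(MATCHES)
print(total, "results reported;", len(shown), "actually viewable")
```

The least popular site in the list is a perfectly good match, but it sits 900 places below the cutoff and never gets displayed.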

The actual number of "hits" you can look at seems to depend on "how popular" the search term is(?). I've seen very few of my searches that actually will display as many as 500 hits, although it's hard to come up with a term that gets "found" fewer than a million times.

A sort of "key" for using Google might be to look for a less popular term that might be in the result you want, and/or exclude as many extremely popular search terms as you can think of, in order to see a bit more of the otherwise "we have them but you can't see them" results. Penetrating much beyond the top layer can require exceedingly difficult/complex(?) "query construction," and still is unlikely to get very far.
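
As a small sketch of that kind of query construction: Google lets you put a minus sign in front of a word to exclude it, and "site:" to limit results to one site. The helper below just glues such a query string together. The function name, the example terms, and the choice of mudcat.org as the site are mine, for illustration only.

```python
def build_query(include, exclude=(), site=None):
    """Join wanted terms, '-'-prefixed unwanted terms, and an optional site: limit."""
    parts = list(include)
    parts += ["-" + term for term in exclude]
    if site:
        parts.append("site:" + site)
    return " ".join(parts)

# Less common terms to keep, very common ones to throw out.
query = build_query(["shanty", "capstan"], exclude=["lyrics", "video"], site="mudcat.org")
print(query)  # shanty capstan -lyrics -video site:mudcat.org
```

The resulting string is what you would type into the search box; how far it actually gets you past the top layer is another matter.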

Most searches at mudcat are index searches. Perhaps someone else can explain, in simple terms:

1. How the basic Digitrad Search is indexed.
       I don't think that index has been updated recently(?)
2. How the Advanced Search Index differs from the simple search box on the Discussion and Threads pages.
       I believe there was an update of the index for Advanced search since "the big crash." Does someone know for sure?
3. How the "Refresh" with filter word(s) really works.
       Refresh appears to list only a certain maximum number of returns, regardless of the time span you enter.
4. How to use the search by thread number.
       Discussed recently, and returns any thread, but only one at a time - unless someone's hiding a new trick that the rest of us don't know about.

(I've already strained my brain here, so I think I'll take a brief break. I think I've found an "almost textbook" example of a search system - not exactly at mudcat - that illustrates everything that can possibly cause a search engine to fail, but it will take some more study to put it into coherent form.)

John