The Mudcat Café TM
Thread #47837 Message #1109014
Posted By: JohnInKansas
04-Feb-04 - 08:12 AM
Thread Name: Help: search engines
Subject: RE: Help: search engines
Burke
Google searches mainly for links. The reference to not searching .edu and .org sites that I found was in a "privacy policy" statement that I read very thoroughly when some people were complaining about the "hit count" tracking that Google does if you install the toolbar. I didn't save the policy, but my recollection is that you had to go about "three links deep" into the "associated policies," and the whole "policy documentation" was something like 75 pages of fine print. I printed about a third of it, but have since discarded it.
Saying that they don't search these sites does not mean that they don't index links to them that are found anywhere else.
The size of the database has no effect. None of the common search engines reads database information. [A side benefit: they don't (can't) read anything in a "text box" or "frame" on a page that they search, so they don't (usually) index pop-ups.]
There is no specific ability to search .edu locations. There is the ability to limit the search to "links to .edu sites."
If the links were found on any site that has been searched, they will be indexed. The site that has been searched will not normally have a .edu or .org name, although many links to .edu and .org sites may appear on it, and will (maybe) be indexed.
You can search with the limit of finding only "links to .edu sites listed in the Google database."
The "Witcombe" site that I cited previously is fairly easy to find now, because it has been linked many times in the "open" web. The site itself (.edu) has apparently NOT been searched, since almost none of the 7,000(?) or so links there are indexed - or at least they don't appear in the results Google will give you. I had trouble re-finding the page a year ago, because all I had at hand to look for was the content "inside" the page, which had apparently not been searched by Google.
Lots of .edu links appear on .com pages. They will be indexed when the .com page is searched. You can look only for .edu links, and there are quite a lot of them to look at. Links that appear only on .org or .edu pages will not generally be found by Google, or by any of the other common engines.
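For anyone who wants to see the idea laid out, here is a rough sketch of the model described above. It is only a toy illustration of my description, not how Google or any real engine is known to work, and the page addresses, PAGES, build_link_index and links_to_tld are all made-up names for the example. A link gets into the index only if it appears on a page that was actually searched.

```python
from urllib.parse import urlparse

# Hypothetical toy web: page URL -> links that appear on that page.
PAGES = {
    "http://example.com/art-links": [
        "http://college.example.edu/arthistory",
        "http://museum.example.org/collection",
    ],
    "http://college.example.edu/arthistory": [
        "http://gallery.example.edu/deep-page",
    ],
}


def build_link_index(pages, crawled):
    """Map each link target to the set of searched pages that mention it."""
    index = {}
    for page in crawled:
        for target in pages.get(page, []):
            index.setdefault(target, set()).add(page)
    return index


def links_to_tld(index, tld):
    """Return indexed link targets whose host name ends with the given suffix."""
    return sorted(t for t in index if (urlparse(t).hostname or "").endswith(tld))


# Assumption for the sketch: only the .com page is ever searched.
index = build_link_index(PAGES, crawled=["http://example.com/art-links"])
edu_links = links_to_tld(index, ".edu")
```

In this toy case edu_links holds only the college page that the .com page links to. The "deep page" is linked only from the .edu page, which was never searched, so it never gets into the index.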
Many database sites have their own "local" search engines, so once you get into the site, you can sometimes extend the search on the local system. This is how you can search a database, using the database itself - if you can find the front door. (You do run into quite a few sites where you have to be a "member," or otherwise qualified, to use the search.) There are also some "special interest" search engines (like ArtCyclopedia, cited above) that find stuff that Google misses.
An interesting question is how the Google image search works. My own "working thesis" is that Google has no way of finding an "image" as such. (Images are often accessible only through a database, and/or are framed on any page where they're displayed.) It looks like (working hypothesis only) Google can only find links to images, so it can only show you an image if someone has posted a link to the image. (The link may be on the same page, or at least on the same site where the image appears, often as a "click to view"; but "no link = no Google.") If you have to click on a thumbnail to view the "full scale" picture, the "link" is hidden from Google because it can't (or just doesn't) "select" objects on the page to look "inside" them. Do your own tests on this one. I'm still working on it.
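If you want something to start your own tests with, here is a small sketch of the "no link = no Google" hypothesis. It uses Python's standard html.parser. The sample page, the file names and the ImageLinkFinder class are invented for illustration, and the rule it applies is only my working hypothesis, not a known description of what Google does. It collects only the images that an ordinary link points to and ignores images that are merely displayed on the page.

```python
from html.parser import HTMLParser

# File endings treated as images in this sketch (an assumption, not a full list).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


class ImageLinkFinder(HTMLParser):
    """Collect href values of <a> tags that point at image files."""

    def __init__(self):
        super().__init__()
        self.image_links = []

    def handle_starttag(self, tag, attrs):
        # Only ordinary links count; <img> tags are deliberately ignored.
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().endswith(IMAGE_EXTENSIONS):
            self.image_links.append(href)


# Hypothetical page: one image is linked, the other is only displayed.
SAMPLE_PAGE = """
<p><a href="photos/banjo.jpg">click to view</a></p>
<p><img src="photos/fiddle.jpg"></p>
"""

finder = ImageLinkFinder()
finder.feed(SAMPLE_PAGE)
found = finder.image_links
```

Under the hypothesis, the banjo picture would be findable because a link points to it, and the fiddle picture would not, because it is only shown on the page.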
John