The Mudcat Café TM
Thread #32792   Message #2513057
Posted By: JohnInKansas
11-Dec-08 - 06:39 PM
Thread Name: Frequently misspelled/confused names.
Subject: RE: Frequently misspelled/confused names.
Re a comment above about "case sensitivity:"

The older DOS database would be expected not to be case sensitive, because DOS is NOT CASE SENSITIVE. Almost no DOS programs were ever case sensitive, except for a few of the last word processors, which let you "search case sensitive" as a selectable add-in option.

Windows usually is case sensitive, but there are exceptions. In older Windows versions you could often search in non-sensitive mode (the default) or manually select "preserve case." (I can't comment on Office 2007 or Vista since, for me, no built-in search in either of them works usefully in any way I can comprehend.)

Another topic:

or... how's about this... get a GOOD search engine that will allow FUZZY searches

The most common "fuzzy search" method begins with collapsing the search terms to omit all the vowels. There are a few other common "reduction methods" that replace common word endings (...tion, ...ing, etc.) or ignore them entirely.

The problem with this is illustrated by a recent search, in which many 'catters participated, for information on a painting by an artist named "Prinet."

Collapsed by Google's fuzzy search, the term actually used by the search engine is "Prnt".

Note that Print, along with all of its derivatives such as Printer, Printing, etc. once "endings" are dropped, also collapses to "prnt."

And the word "parent" and its derivatives (parenting, parental, etc.) also collapse to fuzzy search "prnt."
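To make the collisions concrete, here is a minimal Python sketch of the kind of reduction described above: drop a common ending, then drop the vowels. The function name `collapse` and the suffix list are made up for illustration, and this is not a claim about how Google or any other real search engine works internally.

```python
# Illustrative only: a toy "reduction" in the spirit of the post.
# SUFFIXES is an assumed, deliberately short list of word endings.
SUFFIXES = ("ing", "tion", "er", "al")
VOWELS = set("aeiou")


def collapse(term):
    """Lower-case the term, drop one common ending, then drop all vowels."""
    word = term.lower()
    for suffix in SUFFIXES:
        # Only strip when a reasonable stem is left behind.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            break
    return "".join(ch for ch in word if ch not in VOWELS)


if __name__ == "__main__":
    for term in ("Prinet", "Print", "Printer", "Printing",
                 "Parent", "Parenting", "Parental"):
        print(term, "->", collapse(term))
```

Under these assumptions every one of the seven terms reduces to the same key, so a search keyed on the collapsed form cannot tell the artist from the printers or the parents.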

The Google report thus returns its standard 237,429,714,811 hits, but will only show you the first ~800 (variable depending on some criteria nobody knows).

If there are 800 Printers in the world(?) or 800 parents(?) you will NEVER SEE a result for the obscure 19th century artist.
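The burying effect can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the titles are invented, the 800 cutoff is just the rough figure mentioned above, and the common items are simply placed ahead of the rare one to stand in for whatever ranking a real engine applies.

```python
# Illustrative only: a toy index searched by vowel-dropped keys,
# with the displayed results capped at a fixed limit.
VOWELS = set("aeiou")


def collapse(term):
    """Lower-case the term and drop all vowels."""
    return "".join(ch for ch in term.lower() if ch not in VOWELS)


def fuzzy_search(query, titles, limit=800):
    """Return (total hit count, the first `limit` hits) for a collapsed match."""
    key = collapse(query)
    hits = [t for t in titles if key in {collapse(w) for w in t.split()}]
    return len(hits), hits[:limit]


# Invented data: many common matches listed before one rare one.
titles = [f"Print shop {i}" for i in range(1000)]
titles += [f"Parent guide {i}" for i in range(1000)]
titles.append("Prinet painting")

total, shown = fuzzy_search("Prinet", titles)
exact = [t for t in titles if "prinet" in t.lower()]

if __name__ == "__main__":
    print("fuzzy hits:", total, "shown:", len(shown))
    print("artist visible in fuzzy results:", "Prinet painting" in shown)
    print("exact-match hits:", exact)
```

In this toy setup the fuzzy search counts the artist among its hits but never displays him, while the exact-spelling search returns him alone.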

Fuzzy search usually "finds more." But when it finds too much, you're back to needing to search within the results (which Google sort of lets you do for some kinds of searches). Doing that successfully often means you have to know what the exact result is, and sometimes exactly where it's located. You're back to needing to "spell it right."

For a database the size of the DT, or maybe even for searching threads here, fuzzy search might be helpful; but it's not really a sure thing. A complicating factor for the DT might be that many of the filenames/records in the database, I believe, still retain their DOS 8-char names, which are already, in effect, pre-fuzzified, sometimes in ways not obvious until you actually find them.

John