The Mudcat Café TM
Thread #91460   Message #1741647
Posted By: GUEST,Jon
16-May-06 - 06:48 AM
Thread Name: Tech: Digital Tradition Programmer Needed
Subject: RE: Tech: Digital Tradition Programmer Needed
Oh well, Dick, for the fun of it, I added what MySQL calls a BOOLEAN FULL TEXT search this morning (documentation at end of post). It seems to be one of the quicker searches.

As you will notice, the queries do vary in time. "All words" is the slowest, I think because it may have to scan the song lyrics once for each word. The worst case on the searches I've tried is 0.2 seconds.

Things speed up considerably when a keyword is combined with "all words", I think because it then only has to search the words of songs with that keyword(s).
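A rough sketch of why the keyword might help, in Python rather than SQL. Everything here is made up for illustration: the SONGS list, its field names and the all_words_search function are hypothetical, and the real search runs inside MySQL, not in a loop like this. The point is only that a keyword filter shrinks the set of lyrics that has to be scanned word by word.

```python
# Hypothetical stand-in for the song table; titles, keywords and lyrics are placeholders.
SONGS = [
    {"title": "Song A", "keywords": {"sea"}, "lyrics": "the ship sailed out across the bay"},
    {"title": "Song B", "keywords": {"sea"}, "lyrics": "the wind blew cold upon the shore"},
    {"title": "Song C", "keywords": {"farm"}, "lyrics": "the ship of the plough crossed the field"},
    {"title": "Song D", "keywords": {"farm"}, "lyrics": "cold wind and the barley grew"},
]


def all_words_search(songs, words, keyword=None):
    """Return (titles, scanned): songs whose lyrics contain every word,
    plus how many lyrics had to be scanned to find them.

    Assumption: "all words" is a plain substring test per word.
    """
    words = [w.lower() for w in words]

    # With a keyword, only songs tagged with it are candidates at all.
    candidates = songs
    if keyword is not None:
        candidates = [s for s in songs if keyword in s["keywords"]]

    hits = []
    scanned = 0
    for song in candidates:
        scanned += 1
        text = song["lyrics"].lower()
        # Each word needs its own pass over the lyrics.
        if all(w in text for w in words):
            hits.append(song["title"])
    return hits, scanned
```

Calling all_words_search(SONGS, ["ship", "the"]) scans all four songs, while adding keyword="sea" scans only the two tagged with it.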

Natural and Boolean use a FULLTEXT index to search. I think that is more MySQL-specific than the other searches. Perhaps the closest MC equivalent would be the Verity search.
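For anyone curious what the two FULLTEXT searches might look like at the SQL level, here is a minimal sketch that only builds the query text. The table name songs and the columns id, title and lyrics are assumptions, not the real schema, and fulltext_query is a hypothetical helper. MATCH() ... AGAINST() and the IN BOOLEAN MODE modifier are MySQL's own syntax; the %s placeholders follow the style used by common Python MySQL drivers.

```python
def fulltext_query(search, boolean=True, table="songs", columns=("title", "lyrics")):
    """Return (sql, params) for a MySQL full-text search.

    Table and column names are hypothetical. The columns listed in MATCH()
    have to correspond to an existing FULLTEXT index on the table.
    """
    cols = ", ".join(columns)
    # Without the modifier MySQL runs a natural language search.
    mode = " IN BOOLEAN MODE" if boolean else ""
    sql = (
        f"SELECT id, title, MATCH({cols}) AGAINST (%s{mode}) AS score "
        f"FROM {table} "
        f"WHERE MATCH({cols}) AGAINST (%s{mode}) "
        f"ORDER BY score DESC"
    )
    # The search string is passed as a parameter (twice), never pasted into the SQL.
    return sql, (search, search)
```

The returned pair would be handed to a driver's cursor.execute(sql, params); passing the search string as a parameter keeps whatever the user typed out of the SQL text itself.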

Anyway, some bits extracted from the MySQL 5.0 documentation on Boolean Full-Text Search:


The boolean full-text search capability supports the following operators:

+ A leading plus sign indicates that this word must be present in each row that is returned.

- A leading minus sign indicates that this word must not be present in any of the rows that are returned.

Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return all rows except those containing any of the excluded terms.

(no operator) By default (when neither + nor - is specified) the word is optional, but the rows that contain it are rated higher. This mimics the behavior of MATCH() ... AGAINST() without the IN BOOLEAN MODE modifier.

( ) Parentheses group words into subexpressions. Parenthesized groups can be nested.

~ A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. This is useful for marking noise words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.

* The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.

" A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words, performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include non-word characters in the phrase.

The following examples demonstrate some search strings that use boolean full-text operators:

'apple banana'

Find rows that contain at least one of the two words.

'+apple +juice'

Find rows that contain both words.

'+apple -macintosh'

Find rows that contain the word apple but not macintosh.

'+apple ~macintosh'

Find rows that contain the word apple, but if the row also contains the word macintosh, rate it lower than if the row does not. This is softer than a search for '+apple -macintosh', for which the presence of macintosh causes the row not to be returned at all.

'apple*'

Find rows that contain words such as apple, apples, applesauce, or applet.

'"some words"'

Find rows that contain the exact phrase some words (for example, rows that contain some words of wisdom but not some noise words). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotes that enclose the search string itself.
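
To make the operator rules above a bit more concrete, here is a toy evaluator in Python. It is not how MySQL does it: it is a simplified sketch that skips parentheses, stopwords, minimum word length and real relevance ranking, and the names tokenize, parse, term_matches and score are all made up for this example. What to do with a search that contains only ~ terms is also just a guess here (it returns nothing).

```python
import re

# operator, then either a quoted phrase or a bare word with an optional trailing *
TERM = re.compile(r'([+\-~]?)(?:"([^"]*)"|([^\s"*]+)(\*?))')


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def parse(search):
    """Split a search string into (operator, kind, value) terms."""
    terms = []
    for op, phrase, word, star in TERM.findall(search):
        if phrase:
            terms.append((op, "phrase", phrase.lower()))
        elif word:
            terms.append((op, "prefix" if star else "word", word.lower()))
    return terms


def term_matches(kind, value, words):
    """Check one term against the list of words in a row."""
    if kind == "phrase":
        target = tokenize(value)
        n = len(target)
        return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))
    if kind == "prefix":
        return any(w.startswith(value) for w in words)
    return value in words


def score(search, text):
    """Return None if the row is not returned, else a crude relevance number."""
    words = tokenize(text)
    total = 0
    matched = False  # has any + or plain term matched?
    for op, kind, value in parse(search):
        hit = term_matches(kind, value, words)
        if op == "+":
            if not hit:
                return None  # required term missing
            matched = True
            total += 1
        elif op == "-":
            if hit:
                return None  # excluded term present
        elif op == "~":
            if hit:
                total -= 1  # rated lower, but not excluded
        elif hit:
            matched = True
            total += 1
    # A search made only of - terms matches nothing, as the note says.
    return total if matched else None
```

With this, score('+apple ~macintosh', 'apple macintosh') comes out lower than score('+apple ~macintosh', 'apple pie'), while score('+apple -macintosh', 'apple macintosh') is None because the row is dropped altogether.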