The Search Query Language
The Mudcat query language provides many operators and
modifiers for composing queries. The following search techniques can be
used in searching a Mudcat collection:
- Word searches
- Proximity searches
- Concept-based searches
- Field searches, in which documents are matched based on predefined
custom attributes
- Scoring operators
Simple query expressions
Simple queries allow end users to enter simple, comma-delimited strings
and use wildcard characters. By default, a simple query searches for words,
not strings. For example, entering the word "All" will find documents containing
the word "all" but not "allegorical." You can use wildcards, however, to
broaden the scope of the search: "All*" will return documents containing
either "all" or "alliterate." Case is ignored.
You can enter multiple words separated by commas: software, Microsoft,
Oracle. The comma in a Simple query expression is treated like a logical
OR. If you omit the commas, the query expression is treated as a phrase,
so documents would be searched for the phrase "software Microsoft Oracle."
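The comma and phrase rules above can be modeled in a few lines. The following Python sketch is illustrative only and is not part of the search engine; the function name parse_simple_query is hypothetical, and the sketch ignores operators, wildcards, and quotation marks.

```python
def parse_simple_query(query):
    """Model how a simple query is read: commas act as OR, no commas means a phrase."""
    # Split on commas and drop empty pieces such as a trailing comma.
    terms = [term.strip() for term in query.split(",") if term.strip()]
    if len(terms) > 1:
        # Comma-delimited terms are alternatives (logical OR).
        return ("OR", terms)
    # Without commas, the words are searched as one phrase, in order.
    return ("PHRASE", terms[0].split() if terms else [])


print(parse_simple_query("software, Microsoft, Oracle"))
print(parse_simple_query("software Microsoft Oracle"))
```

The first call yields an OR over three terms; the second yields a single three-word phrase.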
Ordinarily, operators are employed in explicit query expressions. Operators
are normally surrounded by angle brackets < >. However, you can use
the AND, OR, and NOT operators in a simple query without using angle brackets:
software AND (Microsoft OR Oracle). To search for an operator word as a
literal term, surround it with double quotation marks: software "and" Microsoft.
This expression searches for the phrase "software and Microsoft."
A simple query employs the STEM operator and the MANY modifier. STEM
searches for words that derive from those entered in the query expression,
so that entering "find" will return documents that contain "find," "finding,"
"finds," etc. The MANY modifier forces the documents returned in the search
to be presented in a list based on a relevancy score.
Explicit query expressions
Explicit queries can be constructed using a variety of operators, including
evidence, proximity, relational, concept, and score operators. Most operators
in an explicit query expression are surrounded by angle brackets < >.
You can use the AND, OR, and NOT operators without angle brackets.
Simple and explicit syntax
You can use either simple or explicit syntax when stating a query
expression. The syntax you use determines whether the search words you enter
will be stemmed, and whether the words that are found will contribute to
relevance-ranked scoring.
Simple syntax
When you use simple syntax, the search engine implicitly interprets single
words as if they were modified by the MANY modifier and the STEM operator. By implicitly
applying the MANY modifier, the search engine calculates each document’s
score based on the density of the search term in the searched documents.
The more frequent the occurrence of a word in a document, the higher the
document's score.
As a result, the search engine ranks documents according to word density
as it searches for the word you specify, as well as words that have the
same stem. For example, "films," "filmed," and "filming" are stemmed variations
of the word "film." To search for documents containing the word "film"
and its stem words, you can enter the word "film" without modification.
When documents are ranked by relevance, they appear in a list with the
most relevant documents at the top.
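The density-based ranking described above can be sketched as follows. The exact scoring formula used by the search engine is not given here, so this Python example assumes that density is the number of occurrences divided by the total number of words; the function names and sample documents are hypothetical, and stemming is left out.

```python
import re


def density(term, text):
    """Assumed density measure: occurrences of the term divided by total word count."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


def rank_by_density(term, documents):
    """Return (name, score) pairs for matching documents, most relevant first."""
    scored = [(name, density(term, text)) for name, text in documents.items()]
    hits = [(name, score) for name, score in scored if score > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


documents = {
    "review": "film film review",
    "article": "a long article that mentions film once",
    "other": "no match here",
}
print(rank_by_density("film", documents))
```

The document with the higher share of matching words is listed first, and documents without a match are left out.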
Explicit syntax
When you use explicit syntax, the search engine interprets the search terms
you enter as literals. For example, if you enter the word "film" (including
quotation marks) using explicit syntax, the search engine matches only "film"
and ignores the stemmed variations "films," "filmed," and "filming."
Operator summary
An operator represents logic to be applied to a search element. This logic
defines the qualifications a document must meet to be retrieved. Operator
types are as follows:
- Wildcards
- Evidence operators
- Proximity operators
- Relational operators
- Concept operators
- Score operators
Ordinarily, you use operators in explicit searches. They are used in the
following manner:
"<operator>search_string"
Search operations
The following table shows all operators available for conducting searches
of Mudcat collections.
Query expressions
Query expressions are passed to the search engine in the CRITERIA attribute of
the CFSEARCH tag. Expressions are assembled with a combination of search
words, operators, and modifiers.
Special characters
A number of characters are handled in particular ways by the search engine.
Special Search Characters
| Characters | Description |
|---|---|
| , ( ) [ | These characters end a text token. |
| = > < ! | These characters also end a text token. They are terminated by an associated end character. |
| ' @ ` < { [ ! | These characters signify the start of a delimited token. They are terminated by an associated end character. |
A backslash (\) removes special meaning from whatever character follows
it. To enter a literal backslash in a query, use two in succession. Examples:
<FREETEXT>("\"Hello\", said Packard.")
"backslash (\\)"
Precedence evaluation
The following rules apply for composing search expressions.
Precedence rules
While an expression is read from left to right, some operators carry more
weight than others. For example, AND operators take precedence over OR
operators. To ensure that an OR operator is interpreted prior to an AND
operator, you can use parentheses to enclose the OR operator:
(a OR b) AND c
Terms enclosed by parentheses are read first.
There must be at least one space between operators and words used in
the expression.
When the search engine encounters nested parentheses, it starts with
the innermost term:
(a AND (b OR c)) OR d
This expression means: Look for documents that contain b or c as well as a,
or that contain d.
Prefix and infix notation
Search strings that use any operator other than evidence operators can
be defined in prefix notation or infix notation.
Prefix notation specifies that the operator comes before the search
string:
AND (a,b)
When prefix notation is used, precedence is handled explicitly within the
expression. The following example means: "Look for documents that contain
b and c first, then documents that contain a":
OR (a, AND (b,c))
Infix notation specifies that the operator appears between the terms
within the expression. The following example means: "Look for documents
that contain a and b, or documents that contain c":
a AND b OR c
When infix notation is used, precedence is implicit in the expression.
For example, the AND operator takes precedence over the OR operator.
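To make the implicit precedence visible, the following Python sketch rewrites an infix expression in prefix notation. It is an illustration under simplifying assumptions rather than the search engine's parser: it handles only single-word terms, the AND and OR operators without angle brackets, and parentheses. The function names are hypothetical.

```python
import re


def to_prefix(expression):
    """Rewrite an infix AND/OR expression in prefix notation (AND binds tighter than OR)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_term():
        if peek() is None:
            raise ValueError("unexpected end of expression")
        token = take()
        if token == "(":
            # Parenthesized terms are read first.
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        return token

    def parse_chain(operator, parse_operand):
        # Collect operands joined by the same operator into one prefix call.
        operands = [parse_operand()]
        while peek() == operator:
            take()
            operands.append(parse_operand())
        if len(operands) == 1:
            return operands[0]
        return operator + " (" + ", ".join(operands) + ")"

    def parse_and():
        return parse_chain("AND", parse_term)

    def parse_or():
        # OR is parsed last, so it has the lowest precedence.
        return parse_chain("OR", parse_and)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


print(to_prefix("a AND b OR c"))
print(to_prefix("(a OR b) AND c"))
```

In the first call the AND pair is grouped before the OR is applied; in the second, the parentheses force the OR to be grouped first.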
Commas in expressions
If an expression includes two or more search terms within parentheses,
a comma is required as a separator between each element. The following
example means: Look for documents that contain any combination of a and
b together. Note that in this example, angle brackets are used with the
OR operator.
<OR> (a, b)
Delimiters in expressions
Angle brackets < >, double quotation marks " ", and backslashes \ are
used to delimit various elements in a query expression.
Angle brackets for operators
Left and right angle brackets < > are reserved for designating operators
and modifiers. They are optional for the AND, OR, and NOT operators, but
required for all other operators.
Double quotation marks in expressions
You use double quotation marks to search for a word that is otherwise reserved
as an operator, such as AND, OR, and NOT.
Backslashes in expressions
To include a backslash \ in a search, insert two backslashes for each
backslash character you want to search for.
Wildcards
The following wildcard characters are available for searching Mudcat collections:
Mudcat Wildcard Characters
| Wildcard | Description |
|---|---|
| ? | Question mark. Specifies any single alphanumeric character. |
| * | Asterisk. Specifies zero or more alphanumeric characters. Avoid using the asterisk as the first character in a search string. The asterisk is ignored in a set [ ] or an alternative pattern { }. |
| [ ] | Square brackets. Specifies one of any character in a set, as in "sl[iau]m" which locates "slim," "slam," and "slum." Square brackets indicate an implied OR. |
| { } | Curly braces. Specifies one of each pattern separated by a comma, as in "hoist{s, ing, ed}" which locates "hoists," "hoisting," and "hoisted." Curly braces indicate an implied AND. |
| ^ | Caret. Specifies one of any character not in the set, as in "sl[^ia]m" which locates "slum" but not "slim" or "slam." |
| - | Hyphen. Specifies a range of characters in a set, as in "c[a-r]t" which locates every word beginning with "c," ending with "t," and containing any letter from "a" to "r." |
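One way to reason about the wildcard characters is to translate a wildcard pattern into a regular expression. The Python sketch below does this for the characters listed in the table. It is an approximation, not the search engine's implementation: "alphanumeric" is assumed to mean ASCII letters and digits, backslash and backquote escapes are not handled, the rule that an asterisk is ignored inside a set or alternative pattern is not modeled, and the function names are hypothetical.

```python
import re

ALNUM = "[A-Za-z0-9]"  # assumption: ASCII letters and digits only


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled, case-insensitive regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            parts.append(ALNUM)  # any single alphanumeric character
            i += 1
        elif ch == "*":
            parts.append(ALNUM + "*")  # zero or more alphanumeric characters
            i += 1
        elif ch == "[":
            # A set, including ^ (negation) and - (range), has the same shape in a regex.
            end = pattern.index("]", i)
            parts.append(pattern[i:end + 1])
            i = end + 1
        elif ch == "{":
            # Comma-separated alternatives become a non-capturing group.
            end = pattern.index("}", i)
            options = [re.escape(option.strip()) for option in pattern[i + 1:end].split(",")]
            parts.append("(?:" + "|".join(options) + ")")
            i = end + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


print(wildcard_match("sl[iau]m", "slam"))
print(wildcard_match("hoist{s, ing, ed}", "hoisting"))
print(wildcard_match("sl[^ia]m", "slim"))
```

The examples from the table can be checked this way, for instance that "sl[^ia]m" accepts "slum" but rejects "slim."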
Searching for wildcards as literals
To search for a wildcard character in your collection, you need to escape
the character with a backslash (\). For example:
- To match a literal asterisk, precede the * with two backslashes: "a\\*"
- To match a question mark or other wildcard character, precede it with
one backslash: "Checkers\?"
Searching for special characters as literals
The following non-alphanumeric characters must be preceded by a backslash
character (\) in a search string:
- comma (,)
- left and right parentheses ( )
- double quotation mark (")
- backslash (\)
- at sign (@)
- left curly brace ({)
- left bracket ([)
- less than sign (<)
- backquote (`)
In addition to the backslash character, you can use paired backquotes (` `)
to interpret special characters as literals. For example, to search
for the wildcard string "a{b" you can surround the string with backquotes,
as follows:
`a{b`
To search for a wildcard string that includes the literal backquote character
(`) you must use two backquotes together and surround the whole string
in backquotes:
`*n``t`
Note that you can use either paired backquotes or backslashes to escape
special characters. There is no functional difference in the use of one
or the other. For example, you can query for the term <DDA> in the
following ways:
\<DDA\> or `<DDA>`
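When search text comes from user input, the two escaping styles can be applied programmatically. The following Python sketch is one possible approach; the helper names are hypothetical. The backslash helper escapes only the characters in the list above, so a right angle bracket or right brace is left untouched, and it does not apply the two-backslash rule for a literal asterisk.

```python
# Characters from the list above that need a preceding backslash.
SPECIAL_CHARACTERS = set(',()"\\@{[<`')


def escape_literal(text):
    """Prefix each listed special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


def backquote_literal(text):
    """Wrap text in paired backquotes, doubling any backquote inside it."""
    return "`" + text.replace("`", "``") + "`"


print(escape_literal("a{b"))
print(backquote_literal("a{b"))
print(backquote_literal("*n`t"))
```

Both helpers return a string that can be placed in a query expression; which style to use is a matter of preference.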
Evidence operators
Evidence operators can be used to specify either a basic word search or
an intelligent word search. A basic word search finds documents that contain
only the word or words specified in the query. An intelligent word search
expands the query terms to create an expanded word list so that the search
returns documents that contain variations of the query terms.
Documents retrieved using evidence operators are not ranked by relevance
unless you use the MANY modifier.
Mudcat Evidence Operators
| Operator | Description |
|---|---|
| STEM | Expands the search to include the word you enter and its variations. The STEM operator is automatically implied in any SIMPLE query. Example of an EXPLICIT query: <STEM>believe (this query expression yields the following matches: "believe," "believing," "believer," etc.) |
| WILDCARD | Matches wildcard characters included in search strings. Certain characters automatically indicate a wildcard specification, such as * and ?. Example: spam* (this query expression yields the following matches: "spam," "spammer," "spamming.") |
| WORD | Performs a basic word search, selecting documents that include one or more instances of the specific word you enter. The WORD operator is automatically implied in any SIMPLE query. |
Proximity operators
Proximity operators specify the relative location of specific words in
the document. Specified words must be in the same phrase, paragraph, or
sentence for a document to be retrieved. In the case of NEAR and NEAR/N
operators, retrieved documents are ranked by relevance based on the proximity
of the specified words. Proximity operators can be nested; phrases or words
can appear within SENTENCE or PARAGRAPH operators, and SENTENCE operators
can appear within PARAGRAPH operators.
The following table describes each operator:
Mudcat Proximity Operators
| Operator | Description |
|---|---|
| NEAR | Selects documents containing specified search terms. The closer the search terms are to one another within a document, the higher the document's score. The document with the smallest possible region containing all search terms always receives the highest score. Documents whose search terms are not within 1000 words of each other are not selected. |
| NEAR/N | Selects documents containing two or more search terms within N words of each other, where N is an integer between 1 and 1024. NEAR/1 searches for two words that are next to each other. The closer the search terms are within a document, the higher the document's score. You can specify multiple search terms using multiple instances of NEAR/N as long as the value of N is the same: commute <NEAR/10> bicycle <NEAR/10> train <NEAR/10> |
| PARAGRAPH | Selects documents that include all of the words you specify within the same paragraph. To search for three or more words or phrases, you must use the PARAGRAPH operator between each word or phrase. |
| PHRASE | Selects documents that include a phrase you specify. A phrase is a grouping of two or more words that occur in a specific order. Examples of phrases: mission oak; "mission oak"; mission <PHRASE> oak; <PHRASE> (mission, oak) |
| SENTENCE | Selects documents that include all of the words you specify within the same sentence. Examples: jazz <SENTENCE> musician; <SENTENCE> (jazz, musician) |
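The NEAR/N test can be illustrated with a small Python sketch. It assumes that "within N words of each other" means the word positions differ by at most N, which is consistent with NEAR/1 matching adjacent words; it handles two terms only and does not compute a relevance score. The function names and sample text are hypothetical.

```python
import re


def words_of(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near_n(text, first, second, n):
    """Return True if the two terms occur within n word positions of each other."""
    words = words_of(text)
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(i - j) <= n
        for i in first_positions
        for j in second_positions
        if i != j  # a single occurrence cannot be near itself
    )


text = "I commute by bicycle every day"
print(near_n(text, "commute", "bicycle", 2))
print(near_n(text, "commute", "bicycle", 1))
```

In the sample text, "commute" and "bicycle" are two positions apart, so the terms satisfy a distance of 2 but not a distance of 1.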
Relational operators
Relational operators search document fields that have been defined in the
collection. Documents containing specified field values are returned. Documents
retrieved using relational operators are not ranked by relevance, and you
cannot use the MANY modifier with relational operators.
There are two types of relational operators: numeric and date operators,
which perform numeric and date comparisons, and text comparison operators,
which match words and parts of words.
Numeric and date relational operators
The following operators are used for numeric and date comparisons.
Mudcat Numeric and Date Relational Operators
| Operator | Description |
|---|---|
| >= | Greater than or equal to |
Text comparison operators
The following operators are used for text comparisons.
Mudcat Comparison Operators
| Operator | Description |
|---|---|
| CONTAINS | Selects documents by matching the word or phrase you specify with the values stored in a specific document field. Documents are selected only if the search elements specified appear in the same sequential and contiguous order in the field value. For example, specifying "god" will match "God in heaven," "a god among men," or "good god" but not "godliness" or "gods." |
| MATCHES | Selects documents by matching the query string with values stored in a specific document field. Documents are selected only if the search elements specified match the field value exactly. If a partial match is found, a document is not selected. For example, specifying "god" will match a document field containing only "god" and will not match "gods," "godliness," or "a god among men." |
| STARTS | Selects documents by matching the character string you specify with the starting characters of the values stored in a specific document field. |
| ENDS | Selects documents by matching the character string you specify with the ending characters of the values stored in a specific document field. |
| SUBSTRING | Selects documents by matching the query string you specify with any portion of the strings in a specific document field. For example, specifying "god" will match "godliness," "a god among men," "godforsaken," etc. |
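The differences between the text comparison operators are easiest to see side by side. The following Python sketch models each operator as a function applied to one field value. It is an approximation for illustration: the function names are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def _words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(field, query):
    """Whole words of the query appear in the field in the same contiguous order."""
    field_words, query_words = _words(field), _words(query)
    n = len(query_words)
    return n > 0 and any(
        field_words[i:i + n] == query_words
        for i in range(len(field_words) - n + 1)
    )


def matches_exactly(field, query):
    """The field value is exactly the query string (models MATCHES)."""
    return field.strip().lower() == query.strip().lower()


def starts(field, query):
    """The field value begins with the query string."""
    return field.lower().startswith(query.lower())


def ends(field, query):
    """The field value ends with the query string."""
    return field.lower().endswith(query.lower())


def substring(field, query):
    """The query string appears anywhere in the field value."""
    return query.lower() in field.lower()


for value in ["God in heaven", "godliness", "god"]:
    print(value, contains(value, "god"), matches_exactly(value, "god"), substring(value, "god"))
```

For the field value "godliness," only the SUBSTRING and STARTS models succeed, because CONTAINS compares whole words and MATCHES compares the whole field.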
SUBSTRING example
You can use the SUBSTRING operator to match a character string with data
stored in a specified data source. In the following example, a data source
called TEST1 contains the table YearPlaceText, which itself contains three
columns: Year, Place, and Text. Year and Place make up the primary key.
This is what the table looks like:
Table name: YearPlaceText
| Year | Place | Text |
|---|---|---|
| 1990 | Utah | Text about Utah 1990 |
| 1990 | Oregon | Text about Oregon 1990 |
| 1991 | Utah | Text about Utah 1991 |
| 1991 | Oregon | Text about Oregon 1991 |
| 1992 | Utah | Text about Utah 1992 |
Concept operators
Concept operators combine the meaning of search elements to identify a
concept in a document. Documents retrieved using concept operators are
ranked by relevance. The following table describes each concept operator.
| Operator | Description |
|---|---|
| AND | Selects documents that contain all of the search elements you specify. |
| OR | Selects documents that show evidence of at least one of the search elements you specify. |
| ACCRUE | Selects documents that include at least one of the search elements you specify. Documents are ranked based on the number of search elements found. |
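The selection rules for the three concept operators can be modeled as follows. This Python sketch is illustrative only: it treats each search element as a single word, it uses the count of matched elements as a stand-in for the ACCRUE ranking, and the function names and sample documents are hypothetical.

```python
def words_in(text):
    """Return the set of lowercase words in a document."""
    return set(text.lower().split())


def and_match(text, elements):
    """AND: every search element must be present."""
    words = words_in(text)
    return all(element.lower() in words for element in elements)


def or_match(text, elements):
    """OR: at least one search element must be present."""
    words = words_in(text)
    return any(element.lower() in words for element in elements)


def accrue_rank(documents, elements):
    """ACCRUE: select documents with at least one element, ranked by elements found."""
    counts = []
    for name, text in documents.items():
        words = words_in(text)
        found = sum(1 for element in elements if element.lower() in words)
        if found > 0:
            counts.append((name, found))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


documents = {
    "both": "computers and laptops",
    "one": "laptops only",
    "none": "nothing relevant",
}
print(accrue_rank(documents, ["computers", "laptops"]))
```

OR and ACCRUE select the same documents in this model; the difference is that ACCRUE orders them by how many elements were found.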
Score operators
Score operators govern how the search engine calculates scores for retrieved
documents. The maximum score a returned search element can have is 1. When
a score operator is used, the search engine first calculates a separate
score for each search element found in a document, and then performs a
mathematical operation on the individual element scores to arrive at the
final score for each document.
Note that the document's score is available as a result column. The SCORE
result column can be referenced to trap the relevancy score of any
document retrieved.
| Operator | Description |
|---|---|
| YESNO | Forces the score of an element to 1 if the element's score is non-zero, as in <YESNO>mainframe. If the retrieval result of the search on "mainframe" is 0.75, the YESNO operator forces the result to 1. You can use YESNO to avoid relevance ranking. |
| PRODUCT | Multiplies the scores for documents matching a query, as in <PRODUCT>(computers, laptops). To arrive at a document's score, the search engine calculates a score for each search element and multiplies these scores together. |
| SUM | Adds together the scores for documents matching a query, up to a maximum value of 1, as in <SUM>(computers, laptops). The scores of the individual search elements are added together. |
| COMPLEMENT | Calculates scores for documents matching a query by taking the complement (subtracting from 1) of the scores for the query's search elements, as in <COMPLEMENT>computers. The new score is 1 minus the search element's original score. If the search element's original score is .785, the COMPLEMENT operator recalculates the score as .215. |
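The arithmetic behind the score operators is simple enough to express directly. The following Python sketch applies each operation to element scores between 0 and 1; the function names are hypothetical, and how the search engine computes the element scores themselves is outside the scope of the sketch.

```python
import math


def yesno(score):
    """YESNO: force any non-zero score to 1."""
    return 1.0 if score != 0 else 0.0


def product(scores):
    """PRODUCT: multiply the element scores together."""
    return math.prod(scores)


def capped_sum(scores):
    """SUM: add the element scores, up to a maximum value of 1."""
    return min(1.0, sum(scores))


def complement(score):
    """COMPLEMENT: subtract the element score from 1."""
    return 1.0 - score


print(yesno(0.75))
print(product([0.5, 0.4]))
print(capped_sum([0.7, 0.6]))
print(round(complement(0.785), 3))
```

Because every element score is at most 1, PRODUCT can only lower a document's score, while SUM can raise it until it reaches the cap of 1.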
Search Modifiers
Modifiers are combined with operators to change the standard behavior of
an operator in some way. For example, you can use the CASE modifier with
an operator to specify that you want to match the case of the search word.
Modifiers are as follows:
| Modifier | Description |
|---|---|
| CASE | Specifies a case-sensitive search, as in <CASE>J{AVA, ava}, which searches for "JAVA" and "Java." If a search contains a mixed-case string, the search request will be case-sensitive. |
| MANY | Counts the density of words, stemmed variations, or phrases in a document and produces a relevance-ranked score for retrieved documents. Can be used with the following operators: WORD, WILDCARD, STEM, PHRASE, SENTENCE, PARAGRAPH. Example: <PARAGRAPH><MANY>javascript <AND> vbscript. The MANY modifier cannot be used with the following: AND, OR, ACCRUE, relational operators. |
| NOT | Used to exclude documents that contain the specified word or phrase. Used only with the AND and OR operators. Example: Java <AND> programming <NOT> coffee |
| ORDER | Used to specify that the search elements must occur in the same order in which they were specified in the query. Can be used with the following operators: PARAGRAPH, SENTENCE, NEAR/N. Place the ORDER modifier before any operator: <ORDER><PARAGRAPH>("server", "Java") |