Attribute Query Language

Page history last edited by Jared Peterson 3 wks ago

The Attribute Query Language (AQL) is a simple language for specifying the input attribute sets for a Saffron query (the "q" query parameter).  The syntax of AQL is similar to that used by a search engine.  In general, an AQL query consists of a list of string expressions representing associative relationships of attributes that the query result must possess.

 

Supported URL Parameter(s):

 

Name    Description              Default
q       The AQL query string.    No default
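Because an AQL string can contain spaces, parentheses, quotes and colons, it has to be percent-encoded before it is placed in the "q" parameter. The following is a minimal sketch of that step; the base URL is a hypothetical placeholder and not a documented Saffron endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; substitute your own Saffron URL.
BASE_URL = "http://localhost:8080/saffron/query"


def build_query_url(aql, base_url=BASE_URL):
    """Return base_url with the AQL string percent-encoded as the 'q' parameter."""
    return base_url + "?" + urlencode({"q": aql})


print(build_query_url("mary (bass trout walleye)"))
# http://localhost:8080/saffron/query?q=mary+%28bass+trout+walleye%29
```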

 

Language Constructs:

 

Find results associated with the term 'fish'.  Specifying a single word 'fish' will match results containing attributes noun:fish and verb:fish or any other category that contains the value 'fish'.

 

fish

 

To match more than one term, list the terms separated by one or more whitespace characters.  The query string below finds results containing both terms 'trout' and 'fishing'.

 

trout fishing

 

If you want the parser to treat 'john' and 'mary' as distinct terms, separate the two terms with a comma.  As a result, the system will not try to match the two words together as a phrase such as 'mary john'.  Generally this is done to improve query performance when a long query string is used.

 

mary, john

 

To find results containing all the words, simply list them, as in the examples above.  To widen the search, it is often desirable to find results containing at least one of the terms (like an OR operator).  You can express this condition by wrapping the terms in parentheses.  For example, the query string below finds results containing at least one of the terms bass, trout or walleye.

 

(bass trout walleye)

 

Attribute groups, or AND-Groups as they are referred to, are commonly used to express context themes.  You might imagine forming a query for results associated with a particular entity "in the context of" a theme.  For example, suppose we are interested in cities associated with mary in the context of freshwater fishing.  The query string might look something like:

 

mary (bass trout walleye)

 

To match words in the exact order specified, surround the words with double quotes.

 

"Saffron Technology"

 

If we want to find results that must not contain a term, we just prefix the term with a not-operator '-'.

 

-music -sport

 

The above expression returns results containing neither the term music nor the term sport. To find results with an optional term, prefix the term with the or-operator '?'. The more associations a result has to optional terms, the higher it is rank-ordered in the result list; however, the lack of an association doesn't disqualify the result (as it would for a standard 'AND' term or 'AND-Group').

 

?trout
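As a rough illustration of how the constructs introduced so far combine, the sketch below models a result as a plain set of terms and checks it against required terms, one parenthesized group, excluded terms and optional terms. This is a toy model under that assumption; the function names are hypothetical and it is not how Saffron evaluates or ranks a query internally.

```python
def matches(result_terms, required=(), any_of=(), excluded=()):
    """Toy model: does a result (a set of terms) qualify for the query?"""
    terms = set(result_terms)
    # Plain terms: every one of them must be present.
    if not all(t in terms for t in required):
        return False
    # Parenthesized group: at least one member must be present.
    if any_of and not any(t in terms for t in any_of):
        return False
    # '-' terms: none of them may be present.
    return not any(t in terms for t in excluded)


def optional_score(result_terms, optional=()):
    """Count matched '?' terms; more matches would rank a result higher."""
    return len(set(result_terms) & set(optional))


# 'bass trout ?walleye' against a result that has bass and trout only
print(matches({"bass", "trout"}, required=["bass", "trout"]))   # True
print(optional_score({"bass", "trout"}, optional=["walleye"]))  # 0

# '(bass trout walleye)' accepts a result that has walleye only
print(matches({"walleye"}, any_of=["bass", "trout", "walleye"]))  # True

# 'bass trout ?walleye' rejects that same result
print(matches({"walleye"}, required=["bass", "trout"]))  # False
```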

 

Note that the query 'bass trout ?walleye' is semantically different from '(bass trout walleye)'. The former finds results associated with both terms 'bass' and 'trout' and optionally 'walleye'. The latter finds results associated with at least one term from the list 'bass', 'trout' and 'walleye'.

Sometimes, we want to find results containing a name with different spellings. Although you could group the spellings in parentheses, it is often difficult to list them completely. AQL provides a similarity operator '~' to find results on a fuzzy matched name.

 

~"john smith"

 

The fuzzy name above will match results containing “john smith”, “joe smith”, “john smyth”, “j. smith”, “john samuel”, “smith jackson”, etc.

AQL understands the full concept of an attribute (category and value) and can provide more accurate matching beyond just the term value. For example, we can be more specific to find results associated with a specific category:value like person:virginia rather than state:virginia.

 

person:virginia

 

All the operators, such as "-", "?", and "~", can also be used with an attribute to specify exclusive, optional and fuzzy matching conditions.
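When queries are assembled in code, the operator prefixes can be applied with small string helpers. The helpers below are a hypothetical sketch (none of these function names belong to Saffron); they only concatenate text according to the syntax described above.

```python
def quote(text):
    """Wrap text in double quotes if it contains whitespace."""
    return f'"{text}"' if any(c.isspace() for c in text) else text


def attr(category, value):
    """Build a category:value attribute, quoting either side if needed."""
    return f"{quote(category)}:{quote(value)}"


def exclude(expr):
    """Prefix with '-': results must not match expr."""
    return "-" + expr


def optional(expr):
    """Prefix with '?': matching expr is optional."""
    return "?" + expr


def similar(expr):
    """Prefix with '~': fuzzy-match expr."""
    return "~" + expr


print(exclude(attr("person", "virginia")))             # -person:virginia
print(optional(attr("person", "virginia")))            # ?person:virginia
print(similar(attr("person", "john smith")))           # ~person:"john smith"
print(exclude(similar(attr("person", "john smith"))))  # -~person:"john smith"
```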

 

Finally, if you want to include terms that match a wildcard specification, use the '*' character to indicate a sequence of wildcard characters.

 

fish*

 

The above expression returns results that are associated with any term beginning with fish. In this case, 'fish*' does not need to be put in parentheses; this is done automatically. So if 'fish*' matches both fish and fishtank, the query results would be the same as for (fish fishtank).
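The equivalence between a wildcard and a parenthesized group can be sketched as follows. The vocabulary list is a hypothetical stand-in for the terms known to the system, and Python's fnmatch module is used only as a convenient way to match '*'; it also treats '?' and '[' as special characters, which AQL does not.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, vocabulary):
    """Return the parenthesized group equivalent to a wildcard term."""
    hits = sorted(t for t in vocabulary if fnmatchcase(t, pattern))
    return "(" + " ".join(hits) + ")"


# Hypothetical list of terms known to the system.
vocabulary = ["fish", "fishtank", "trout", "bass"]
print(expand_wildcard("fish*", vocabulary))  # (fish fishtank)
```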

 

Operator Precedence and Usage:

 

Symbol    Precedence    Usage Comment
" "       0             multi word literal
:         1             specify an attribute (literal:literal)
~         2             expand a literal/attribute into a set of similar literals/attributes
()        3             enclosing a set of literals/attributes
-         4             do not match a literal/attribute
?         4             match optional literal/attribute

 

Examples:

 

The following is an attribute with a multi-word category. You can always put double quotes around categories or values that contain whitespace.

 

"junior accountant":joe

 

Returns associations that do not match any person attributes whose values are similar to “john smith”

 

-~person:"john smith"  

 

Returns associations that contain at least one person attribute whose values are similar to “john smith” or the term mary

 

(~person:"john smith" mary)

 

Returns associations that optionally match either one of the terms. It is the same as “?fish ?fishing”.

 

?(fish fishing)

 

Returns associations that do not match one of the terms 'musician' or 'musical'.

 

-(musician musical)

 

The following shows some illegal AQL expressions:

 

(plant (tree shrub))      // nested level not allowed
~-musical                 // literal must follow ~ operator
:joe                      // this syntax is not supported; use 'joe' directly instead
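A client could catch these three forms before sending a query. The checker below is a hypothetical sketch that covers only the three illegal expressions listed above; it is not a full AQL parser and the messages are illustrative.

```python
import re


def find_problems(aql):
    """Return messages for the three illegal forms above (empty list if none)."""
    # Blank out quoted phrases so their contents are not inspected.
    bare = re.sub(r'"[^"]*"', '""', aql)
    problems = []

    # Parentheses may not be nested.
    depth = 0
    nested = False
    for ch in bare:
        if ch == "(":
            depth += 1
            nested = nested or depth > 1
        elif ch == ")":
            depth = max(depth - 1, 0)
    if nested:
        problems.append("nested parentheses are not allowed")

    # '~' must be followed by a literal, not by the '-' operator.
    if "~-" in bare:
        problems.append("a literal must follow the ~ operator")

    # ':' needs a category in front of it.
    if re.search(r"(^|[\s(,?~-]):", bare):
        problems.append("an attribute needs a category before ':'")

    return problems


for query in ["(plant (tree shrub))", "~-musical", ":joe", '-~person:"john smith"']:
    print(query, "->", find_problems(query) or "ok")
```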

 

Performance Consideration Summary:

 

Using wildcards can dramatically increase the number of terms in your query, which can affect query performance. There is a (configurable) practical limit on how many terms are included in a query. The default limit is 1000. If a wildcard specification matches more than the configured limit, a QueryTooLargeException is thrown.

There are two other things to keep in mind when optimizing query performance. The first is to use literal strings and commas, whenever possible, to designate terms. The query string 'North Carolina Basketball' will end up looking for all permutations of the words 'North Carolina Basketball', rather than just the phrase you might have intended, "North Carolina Basketball". Likewise, 'john salley' is better written as 'john, salley' if you intended to match "John" and "Salley" separately rather than "John Salley".

The second is to indicate the category name of the attribute. For example, person:john is better than just john.
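The wildcard term limit described in this section can be pre-checked on the client when a list of known terms is available. The sketch below assumes such a list exists; QueryTooLargeError is a local, hypothetical stand-in for the server-side QueryTooLargeException, and only the default of 1000 comes from the text above.

```python
from fnmatch import fnmatchcase

DEFAULT_TERM_LIMIT = 1000  # default stated above; the real limit is configurable


class QueryTooLargeError(Exception):
    """Local stand-in for the server-side QueryTooLargeException (hypothetical)."""


def check_wildcard(pattern, vocabulary, limit=DEFAULT_TERM_LIMIT):
    """Return the terms a wildcard would expand to, or raise if over the limit."""
    hits = [t for t in vocabulary if fnmatchcase(t, pattern)]
    if len(hits) > limit:
        raise QueryTooLargeError(
            f"{pattern!r} expands to {len(hits)} terms (limit {limit})"
        )
    return hits


# Hypothetical vocabulary with 1001 terms that all start with 'fish'.
vocabulary = [f"fish{i}" for i in range(1001)]
try:
    check_wildcard("fish*", vocabulary)
except QueryTooLargeError as err:
    print("rejected:", err)
```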
