Query Basics
After you have ingested your data, you can begin to look at results using the SMB queries. The data does not need to be completely ingested; you can query while ingesting to watch the associations form. The easiest way to start looking at query results is the Test Harness, a web page installed on the SMB server that lets you issue REST queries interactively. Results are displayed in their native JSON format. Alternatively, you can write a custom program to issue your queries. If you are an enterprise customer who uses Firefox and has disabled API keys, you can also use the REST Client Plugin for Firefox.
If you look at the REST API, you will see a breakdown of the methods into categories. Most of the queries fall in the "Space" methods. Aside from status and metadata, there are 3 basic things that SMB query methods return:
- Resources (documents, db records, etc.) that have been ingested into the system. You can use the /spaces/{space}/resources call to return resources that have been ingested.
- Attributes contained in the resources that have been ingested into the system. You can use the /spaces/{space}/attributes call to return attributes that have been ingested.
- Associations between the attributes that have been ingested into the system. There are several space methods for retrieving attribute association-based results. The most basic of these methods is the /spaces/{space}/connections query. This is the first place to start.
The connections query represents the basic auto-associative or pattern completion operation in SMB. The concept is simple: given a collection of attributes (category/value pairs) as an input vector, return a rank-ordered list of the attributes associated with that input. Most other high-level queries use the connections query under the covers, so it is important to understand its mechanics. The query takes a number of parameters, and it may be easiest to describe them in terms of their counterparts in SQL:
- Select clause: 'c' (output categories) and 'ca' (output attribute candidates)
- From clause: space name in url, 'me' (memories), 'ma' (matrices) and 's' (virtual memory scope)
- Where clause: 'q' (query attribute terms), 'f' (output filters) and 's' (virtual attribute scope)
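To make the SQL analogy concrete, here is a minimal sketch of how a custom program might assemble a connections query URL from those parts. The host name is hypothetical, the helper name is made up for illustration, and the parenthesised, space-separated form of 'q' is taken from the example query later on this page. The 'ml' parameter is the metric level, also described later. Percent-encoding the values is an assumption about what your HTTP client and the server expect.

```python
from urllib.parse import urlencode, quote

# Hypothetical host; only the /ws/{space}/connections path comes from this page.
BASE_URL = "http://localhost:8080"

def build_connections_url(space, query_terms, category,
                          memory=None, matrix=None, metric_level=None):
    """Assemble a connections query URL (illustrative helper, not part of SMB)."""
    # "Where": the input attributes, written as (cat:value cat:value ...)
    params = [("q", "(" + " ".join(query_terms) + ")")]
    # "From": the space is in the path; memory and matrix narrow it further
    if memory is not None:
        params.append(("me", memory))
    if matrix is not None:
        params.append(("ma", matrix))
    # "Select": the output category
    params.append(("c", category))
    if metric_level is not None:
        params.append(("ml", str(metric_level)))
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"{BASE_URL}/ws/{space}/connections?" + urlencode(params, quote_via=quote)

url = build_connections_url(
    "default",
    ["noun:seafood", "noun:quinoa"],
    "hashtag",
    memory="person:joe",
    matrix="01-01-2010",
)
print(url)
```

Keeping the parameters in a list of pairs, rather than a dict, preserves their order and leaves room for parameters that may appear more than once.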
In terms of your attribute space, it is important to understand which are your input attributes and which are your output attributes. In many cases an attribute may be both an input and an output, if you need to track bidirectional associations. For example, sometimes I know a person's name and I want to know what cities they've visited. Other times I may know the name of a city and want to know the people who have visited it. Once you are clear on your inputs and outputs, you need to map them into the network of associative memories. Outputs are usually pretty straightforward... they are generally columns in a matrix. Inputs are not as straightforward. If you remember the other elements of the SMB object hierarchy, an input attribute can be either a space name, memory name, matrix name or row attribute. Let's explore some of the trade-offs of this input mapping.
Suppose you had some Twitter user matrices that contain their tweets. What types of queries can you do, and how are all the association counts used during the query? As an initial query, let's ask the system to return a rank-ordered list of hashtags that are associated with a list of terms. If you were to visualize the user's matrix, it might look something like the following.

The rows are the query terms (in this case: seafood, quinoa, scottjurek, nutrition, advice) and the columns are all the various hashtags person:joe has used in the month of January 2010. The matrix name is default.person.joe.01-01-2010 (space.memory_category.memory_value.matrix). This sub-matrix might be one that is used to query hashtags for person:joe. You have decided to keep the associations for person:joe distinct from the associations for person:mary, for example. This way you have formed a "personal model" for joe. Given the above sub-matrix, if you were to issue the following query:
/ws/default/connections?q=(noun:seafood noun:quinoa person:scottjurek noun:nutrition noun:advice)&me=person:joe&ma=01-01-2010&c=hashtag
we might see the following rank-ordered list:
hashtag:food
hashtag:running
hashtag:travel
hashtag:programming
The results list reflects a rank-ordered list of preferred hashtags for person:joe given the query terms. How is the metric for each hashtag computed? SMB looks at various aspects of an association, but first and foremost it looks at the existence of a connection. The difference between an association count of 0 and 1 is huge, much more than the difference between an association count of 10 and 11, for example. If an association count exists, it means there is a connection. As a result, the default SMB connection calculator separates the concept of a connection (boolean) from that of a count (integer). If you were to issue the above REST call you might see a JSON response similar to:
...
The metric that is used to rank order the results is what we call the "rollup metric". It is the sum effect of processing the connections, weights, information scores, counts, priorities, signals, distances, and anything else we consider when rank-ordering results. It shouldn't be thought of as a probability or some absolute metric. It is strictly a relative metric, where the best answer(s) will always have a rollup metric of 1.0 and the other answers will have a number less than 1.0 but greater than 0. It is a way to compare relevance amongst result elements. The rollup metric should not be used to compare results from separate queries. Let me say that again... the rollup metric should not be used to compare results from separate queries. If one result has a metric of 1.0 and another has a metric of 0.5, the first result is twice as "relevant" as the second one. If you are curious about the raw numbers behind the rollup metric, or you have some sort of visualization where you want to show detail beyond 1-5 star ratings, you can easily dig further into the raw metrics (which is also the way to compare results across multiple queries).
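To see why a relative metric cannot be compared across queries, consider the following sketch. It assumes, purely for illustration, that a relative score is obtained by dividing each raw score by the best raw score in the same result list; this page does not say how SMB actually computes the rollup metric, and the raw scores below are hypothetical.

```python
def to_relative(raw_scores):
    """Scale raw scores so the best result is exactly 1.0 (illustrative only)."""
    best = max(raw_scores.values())
    return {name: score / best for name, score in raw_scores.items()}

# Hypothetical raw scores for two unrelated queries.
query_a = {"hashtag:food": 42.0, "hashtag:running": 21.0}
query_b = {"hashtag:travel": 3.0, "hashtag:music": 1.5}

relative_a = to_relative(query_a)
relative_b = to_relative(query_b)

print(relative_a)
print(relative_b)
```

Both result lists are topped by a 1.0 and followed by a 0.5, even though the evidence behind the first query is far stronger. Only the raw metrics reveal that difference.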
An application can ask SMB to return more than just rollup metrics. The level of metric detail that is returned is controlled by the Metric Level parameter (ml). By default, queries assume metric level zero (ml=0), which just returns the basic rank ordering metric. If you want to know how many association counts there were, for example, you might ask for metric level 1 (ml=1). In the above example, when hashtag:food is returned, along with the 1.0 rollup metric, you would see that there were 30 counts. Looking down the hashtag:food column, SMB adds up 10+1+2+15+2 to get 30. Likewise, hashtag:running would have 18, and so on. Metric level 1 would also tell you that hashtag:food and running have 5 connections, travel has 4 and programming and music both have 1 connection.
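The arithmetic behind those two numbers can be sketched as follows. The hashtag:food counts are the ones quoted above; which query term holds which count is not stated, so the list order is arbitrary. The second column is hypothetical and is there only to show a sparse case.

```python
def column_metrics(counts):
    """Return (connections, total_count) for one output column."""
    connections = sum(1 for count in counts if count > 0)  # boolean view
    total_count = sum(counts)                              # integer view
    return connections, total_count

# Counts down the hashtag:food column, one per query term (from the text).
food_column = [10, 1, 2, 15, 2]

# Hypothetical column where only one query term has any association.
sparse_column = [0, 0, 7, 0, 0]

food_connections, food_total = column_metrics(food_column)
sparse_connections, sparse_total = column_metrics(sparse_column)

print(food_connections, food_total)      # 5 30
print(sparse_connections, sparse_total)  # 1 7
```

Keeping the two values separate matters: a single large count gives a healthy total but still only one connection.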
How do you change the query to get back this additional info? You set the ml param to 1.
/ws/default/connections?q=(noun:seafood noun:quinoa person:scottjurek noun:nutrition noun:advice)&me=person:joe&ma=01-01-2010&c=hashtag&ml=1
Example JSON response:
...
What if you wanted not just the association counts for each output, but the association counts of each input to each output? For this level of detail, you can request metric level 2 (ml=2). With metric level 2, you get all the metric details for every input attribute.
/ws/default/connections?q=(noun:seafood noun:quinoa person:scottjurek noun:nutrition noun:advice)&me=person:joe&ma=01-01-2010&c=hashtag&ml=2
Example JSON response:
...
Increasing Dimensionality
When SMB returns a result, such as hashtag:food, the metric is an aggregation of all the pairwise associations between each input term and the output term for the space/memory/matrix. In the example above, you are capturing 4 dimensions for each association count:
- Memory: person
- Matrix: time frame
- Row: input term (person/noun)
- Column: hashtag
But what if you wanted to increase the dimensionality (resolution) beyond these 4 dimensions? How would you do it? Remember, the space/memory/matrix is essentially a folder hierarchy that can be mapped to almost anything. So how might you retain your current structure, but add another dimension so that 3-way term relationships are captured for every user over a given period of time? What if you were to modify the value of your input attributes? By prepending or appending another dimension (in this case the person dimension) you can effectively add another dimension to your representation. Also, by making everything fall under the category of person, each of the enhanced dimensions forms a separate memory (since your space schema defines "person" as a memory).
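As a rough sketch, folding one attribute into another is just string composition done before ingest and again at query time. The helper name is hypothetical, and the dotted layout follows the person:joe.noun.seafood form used on this page.

```python
def add_dimension(outer, inner):
    """Fold an inner "category:value" attribute into an outer one.

    The result keeps the outer attribute's category, so it lands wherever
    the space schema maps that category (for example, a memory).
    """
    inner_category, inner_value = inner.split(":", 1)
    return f"{outer}.{inner_category}.{inner_value}"

person_first = add_dimension("person:joe", "noun:seafood")
noun_first = add_dimension("noun:seafood", "person:joe")

print(person_first)  # person:joe.noun.seafood
print(noun_first)    # noun:seafood.person.joe
```

Note that values which themselves contain a dot or a colon would make the compound value ambiguous; a real ingest pipeline would need an escaping rule for them.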

So what would be the difference between the attribute structure: person:joe.noun.seafood and noun:seafood.person.joe?
...
Multi-Matrix Queries
Sometimes a memory will "own" multiple matrices. The best example of this happens when you create temporal memories. If a date attribute in your data is mapped to "matrix", data ingested with that date attribute will be placed in the matrix corresponding to the date's value. This is used frequently with news and other time-sensitive information. At query time, the question becomes: how do you query multiple matrices at the same time? SMB is a virtual system in that there is both a physical and a logical concept of memories, matrices, rows and columns. Therefore, at query time, there are ways to put together multiple objects as if they were one. When a memory is "sliced" into multiple temporal matrices, the representation has the additional "scalar" property in that a point along the range may be the most relevant (like the most recent).
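Conceptually, treating several physical matrices as one logical matrix can be pictured as adding their counts cell by cell. The sketch below is only a mental model with hypothetical matrix names and counts; it is not SMB's API, and this page does not describe how SMB combines matrices internally.

```python
from collections import Counter

# Hypothetical temporal slices of one memory: (row, column) -> association count.
physical_matrices = {
    "01-01-2010": {
        ("noun:seafood", "hashtag:food"): 10,
        ("noun:advice", "hashtag:running"): 3,
    },
    "02-01-2010": {
        ("noun:seafood", "hashtag:food"): 4,
        ("noun:quinoa", "hashtag:food"): 1,
    },
}

def logical_view(matrices, names):
    """Sum the selected physical matrices into one logical matrix."""
    combined = Counter()
    for name in names:
        combined.update(matrices[name])  # adds counts for matching cells
    return combined

merged = logical_view(physical_matrices, ["01-01-2010", "02-01-2010"])
print(merged[("noun:seafood", "hashtag:food")])  # 14
```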
Multi-Memory Queries
Temporal Memories
Reading Sub-Matrices
For those who are inspired (or required) to use the raw counts of the associative memory to perform a domain-specific operation, SMB provides an easy API to get access to any sub-matrix. You have seen how to use the connections query to complete a pattern to a set of categories using the 'c' param. This is called "query by category". There is another mode that allows us to "query by candidate". Instead of specifying "&c=person" to get the top associated people for your input vector, you can explicitly list the columns that you would like the system to return. Using the 'ca' param, you list the columns much as you list the rows in your 'q', except that each 'ca' param takes a single "category:value" specification.
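A query-by-candidate string might be assembled as in the sketch below, with one 'ca' parameter per requested column. The helper name is hypothetical, the candidate values are borrowed from the hashtag example earlier on this page, and percent-encoding the values is an assumption about what the server expects.

```python
from urllib.parse import urlencode, quote

def candidate_query_string(query_terms, candidates):
    """Build the query string for a query by candidate (illustrative helper)."""
    # Rows: all input terms go into a single 'q' parameter.
    params = [("q", "(" + " ".join(query_terms) + ")")]
    # Columns: one 'ca' parameter per "category:value" candidate.
    params.extend(("ca", candidate) for candidate in candidates)
    return urlencode(params, quote_via=quote)

query_string = candidate_query_string(
    ["noun:seafood", "noun:quinoa"],
    ["hashtag:food", "hashtag:running"],
)
print(query_string)
```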
Hetero-associative Queries
Reading across matrix rows to output columns is the core of SMB's auto-associative or pattern completion operation. This is what the connections query does. You are essentially asking the system what else is missing from this vector. Hetero-associative queries represent another associative memory operation that takes a pattern and returns matching patterns or pattern classes. In SMB, hetero-associative queries don't have a specific REST method; instead, there are several approaches that can be used.
To illustrate the difference between auto-associative memories and hetero-associative memories, we will see how the above example might be represented as hetero-associative memories.
You will notice that the rows in the auto-associative memory are also columns in the hetero-associative memories. The columns have moved out to become matrix names (note: the temporal matrix slicing has been removed). The gray counts represent association counts between the matrix and the row. The hetero-associative query operation is then to extract all the pair-wise associations corresponding to your query terms and compare them across the 4 matrices. The matrix that contains the "best" set of associations wins! To see an example of how to extract a sub-matrix, see ExampleBase.getSubMatrix() in our sample code.
The interesting thing about a space where all the attributes are configured as memories, rows and columns is that you can use the connections query to effectively perform a hetero-associative query. Let's look at the following example:
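The compare-across-matrices step can be sketched in a few lines. Everything here is hypothetical: the matrix contents, the helper names, and in particular the definition of "best", which this sketch takes to mean most connections first and highest total count second, echoing the connection-before-count idea described earlier. It is not the code behind ExampleBase.getSubMatrix().

```python
from itertools import combinations

def pair_key(a, b):
    """Order-independent key for a pair-wise association."""
    return tuple(sorted((a, b)))

# Hypothetical hetero-associative matrices, one per hashtag.
matrices = {
    "hashtag:food": {
        pair_key("noun:seafood", "noun:quinoa"): 4,
        pair_key("noun:seafood", "noun:nutrition"): 1,
        pair_key("noun:quinoa", "noun:nutrition"): 2,
    },
    "hashtag:running": {
        pair_key("noun:seafood", "noun:nutrition"): 9,
        pair_key("noun:nutrition", "person:scottjurek"): 5,
    },
}

def score(matrix, terms):
    """Return (connections, total_count) over all pairs of query terms."""
    counts = [matrix.get(pair_key(a, b), 0) for a, b in combinations(terms, 2)]
    return sum(1 for count in counts if count > 0), sum(counts)

def best_matrix(all_matrices, terms):
    """Pick the matrix whose sub-matrix for the query terms scores highest."""
    return max(all_matrices, key=lambda name: score(all_matrices[name], terms))

terms = ["noun:seafood", "noun:quinoa", "noun:nutrition"]
winner = best_matrix(matrices, terms)
print(winner)  # hashtag:food
```

Here hashtag:food wins with 3 connections and 7 counts, even though hashtag:running has a higher total of 9 counts from its single connection.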
For more information on query APIs see the REST API Reference Guide.