Introduction
The SMB (SaffronMemoryBase) Store is a different kind of data store. It is sometimes referred to as an information or knowledge store rather than a data store because it stores weighted "associations" or "links" between things, the fundamental representation of knowledge. Similar to how a search (inverted) index stores a link between a keyword and a document, SMB also stores links between terms. However, SMB goes beyond storing links between keywords and documents. Like RDF (triple) stores, SMB can also store associations between 3 things. But again, SMB goes beyond just storing links between 3 things. SMB also stores association counts, i.e. the number of times a particular association has occurred. Associations are typically, but not exclusively, stored as doubles (A-C) and/or triples (A-B-C), where triples provide additional context (B) to the link (A-C). Looking up association counts at query time allows you to use frequency-based statistics very efficiently to compute results. Compared with alternative database methods of computing association counts at query time, a SaffronMemoryBase makes the impractical practical and opens a new world of opportunities for innovative problem solving.
SMB, however, is logically organized like a relational database. The SMB top-level object is called a "Space". A Space is simply a collection of associative memories. Much as relational databases organize tables into databases, SMB organizes its memories into spaces. All REST calls (with the exception of listing the spaces in a system) begin by specifying the space name in the URL, e.g. http://server/ws/spaces/space_name... When comparing a relational database to Saffron’s associative memory store, there are several parallels:
| SMB Concept | RDBMS Concept |
| Space | Database |
| Memories (multiple combined) | View |
| Memory.Matrix | Table |
| Row | Row |
| Column | Column |
Spaces, rows and columns are familiar, straightforward concepts, although the kind of data stored in a row and column is very different from an RDBMS (we’ll get to this later). What is probably new in SMB are the concepts of memories and matrices. It is important to understand how they are related and how they are used. To begin, we will familiarize ourselves with the SMB object hierarchy:
Space
  Memory
    Matrix
      Row
        Col
Notice that Memories contain Matrices. A Memory is a logical concept, while a Matrix is a physical concept. Matrices are physically 2-dimensional structures (made up of rows and columns) that contain association counts:
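The hierarchy can be pictured as nested lookups that end in an association count. The following is a minimal sketch in Python, not SMB's actual storage format; the function name `new_space` and the use of nested dictionaries are assumptions made purely for illustration.

```python
from collections import defaultdict


def new_space():
    """Hypothetical in-memory model: memory -> matrix -> row -> col -> count."""
    return defaultdict(                    # memory, e.g. "author:jaredpeterson"
        lambda: defaultdict(               # matrix, e.g. "default"
            lambda: defaultdict(           # row, e.g. "location:raleigh, nc"
                lambda: defaultdict(int)   # col -> association count
            )
        )
    )


space = new_space()

# Observe the same association twice; the cell holds a count, not a record.
for _ in range(2):
    space["author:jaredpeterson"]["default"]["location:raleigh, nc"]["hashtag:running"] += 1

count = space["author:jaredpeterson"]["default"]["location:raleigh, nc"]["hashtag:running"]
print(count)  # 2
```

Each level of nesting corresponds to one level of the hierarchy, and the innermost value is the cell of a 2-dimensional matrix.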



A memory, like a row or a column, has a name as well as a category. It is the container for triples (memory-row-col). Inside a memory are matrices. A matrix has a name, like “january 2010” or “success” or just "default". In this sense, matrices are subdivisions of the associations for the attribute that the memory represents; you can also think of a matrix as a slice of the memory. For example, you may have a memory that represents all associations to author:jaredpeterson. Depending on the needs of your application, you may choose to partition those associations across time. You can define your space by mapping your date attribute’s value into the matrix name. As a result, every time data is ingested into the system with attribute author:jaredpeterson and date:january 2010, associations will be placed in the matrix ‘january 2010’ under the memory author:jaredpeterson. Over time, you will have formed several matrices, each representing a month in time (in this case).
In the end, Spaces, Memories and Matrices are merely ways of organizing associations between row and column attributes. For matrices, you choose a strategy like "time-based slices", "outcome-based slices" or "region-based slices", something that makes sense to the query process. Practically, this usually involves choosing 1 attribute type, like "publication date", and mapping values of that attribute to matrix names. Frequently, applications don't subdivide matrices at all and there is just 1, the "default" matrix. Memory mapping, on the other hand, is a bit different. You typically have several attributes mapped to memories. For example, in entity analytics, we'll typically map all persons, places and things to memories. You might now be able to see how the triple store begins to form. The memory "author:jaredpeterson" may store an association in a matrix "january 2010" where the row is "location:raleigh, nc" and the column is "hashtag:running". Likewise, the memory "location:raleigh, nc" may store an association in its matrix "january 2010" where the row is "author:jaredpeterson" and the column is "hashtag:running". Through this redundant indexing process, each memory forms its own view of its association space. As a result of storing weights from both perspectives, with little work, SMB provides you with a tool to store and query directional, contextual, weighted graphs.
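The redundant indexing described above can be sketched as follows. This is an illustrative model only: the constants `MEMORY_CATEGORIES`, `MATRIX_CATEGORY` and `CELL_CATEGORIES`, the `ingest` function and the flat `(memory, matrix, row, col)` key are all assumptions, and the sketch assumes a term is never paired with itself or with its own memory.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical attribute mapping for this example.
MEMORY_CATEGORIES = {"author", "location"}           # attributes that own a memory
MATRIX_CATEGORY = "date"                             # attribute whose value names the matrix
CELL_CATEGORIES = {"author", "location", "hashtag"}  # attributes used as rows and columns


def ingest(store, record):
    """Add one record's associations to every memory it mentions."""
    matrix = record.get(MATRIX_CATEGORY, "default")
    terms = [f"{cat}:{val}" for cat, val in record.items() if cat in CELL_CATEGORIES]
    for cat, val in record.items():
        if cat not in MEMORY_CATEGORIES:
            continue
        memory = f"{cat}:{val}"
        # Every ordered pair of the *other* terms becomes a (row, col) cell.
        others = [t for t in terms if t != memory]
        for row, col in permutations(others, 2):
            store[(memory, matrix, row, col)] += 1


store = defaultdict(int)
ingest(store, {
    "author": "jaredpeterson",
    "location": "raleigh, nc",
    "hashtag": "running",
    "date": "january 2010",
})

# The same observation is visible from both memories' points of view.
from_author = store[("author:jaredpeterson", "january 2010",
                     "location:raleigh, nc", "hashtag:running")]
from_location = store[("location:raleigh, nc", "january 2010",
                       "author:jaredpeterson", "hashtag:running")]
print(from_author, from_location)  # 1 1
```

Because each memory keeps its own copy of the counts, a query can start from either "author:jaredpeterson" or "location:raleigh, nc" without a join.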
In addition to the "triple" memories, there can be a double memory in a space. SMB refers to this as the directory memory. It stores all the double associations in the system.
The Directory Memory
The directory memory is the big memory in the sky for a particular space. It contains all the associations contained in all of the explicitly named memories. The directory memory is used in several ways. One use is to understand associations between 2 things (called a double). At the most basic level, one element of the double is the input (row) and the other element is the output (column). Many times, attributes are used as both inputs (row) and outputs (col). In these cases, they would be both rows and columns in the matrix. The directory memory also has a global view of attributes. For example, global metrics, like entropy-based information scores, are computed from the directory memory. Finally, the directory memory contains links to other memories. This feature allows you to drill down into more fine-grained (triple) memories, where the association counts reflect 3-way associations (memory-row-column). Triples, as they are called, represent a contextual link. For example, an application may not be interested in the fact that User A likes the color blue (in general), but rather that User A likes the color blue as a background color. The background color adds “context” to the association between User A and the color blue. In fact, if all attributes were both rows and columns, the representation would have maximum flexibility in terms of how you can ask the question. Any one of the attributes can provide context to the relationship between the other 2.
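To make the difference between a double and a triple concrete, here is a small sketch of a directory-style count of doubles. The function `ingest_doubles`, the `Counter` layout and the attribute names `user`, `color` and `usage` are hypothetical; SMB's real directory memory also holds global metrics and links to other memories, which this sketch does not model.

```python
from collections import Counter
from itertools import permutations


def ingest_doubles(directory, terms):
    """Count every ordered (row, col) pair of terms seen together."""
    for row, col in permutations(terms, 2):
        directory[(row, col)] += 1


directory = Counter()
ingest_doubles(directory, ["user:a", "color:blue", "usage:background"])
ingest_doubles(directory, ["user:a", "color:blue", "usage:text"])

# The double says User A is associated with blue twice, in general.
general = directory[("user:a", "color:blue")]
# It cannot say in which context; that needs a triple such as
# (memory="usage:background", row="user:a", col="color:blue").
background = directory[("user:a", "usage:background")]
print(general, background)  # 2 1
```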
When do I need Triples?
Anytime you want to support a ‘context’-based query, you will probably want triple memories. In entity analytics, for example, all entities (proper nouns) are configured as owning their own memories. This means that every time the system ingests data containing a person, place or thing, a memory is created on behalf of the particular value of the attribute, e.g. author:jaredpeterson, where a context-based association is formed around ‘jaredpeterson’, in this case. There may also exist another attribute, in the group of associated attributes, that is used to specify the physical matrix location. Mapping an attribute to a matrix name provides a 4th dimension to your data. When this 4th dimension is used, it is frequently used to capture time. With an added temporal dimension, you can understand trends of context-based associations over time. Deciding which attributes to map to memories, matrices, rows and columns ultimately depends on how you want to query the data. As a result, it is a good idea to become familiar with the queries and the query language (AQL) before starting your data integration. Shortly you will see how to create a space and how your attributes are defined. There are several settings when creating a space. The key setting when defining your space attributes is ‘role’. The ‘role’ property defines how an attribute is mapped to a memory, matrix, row and/or column.
Rows and Columns
The representation of rows and columns in SMB is where things really start to diverge from RDBMS stores. In a relational model, rows are inserted (or updated) into a table as distinct records. Each row declares a value for each column in the table, e.g.
| Name | City | State | Dept |
| John | Boston | MA | Engineering |
| Bob | Chicago | IL | Marketing |
| Sue | San Francisco | CA | Finance |
| Jane | Los Angeles | CA | Marketing |
Rather than storing the “data” (i.e. records of information), SMB stores association counts contained within the data. For example, if we were to put the above table in SMB, and configure the attribute "name" to be a memory, and the attributes "city", "state" and "dept" to be both rows and columns, it might look like the following:

This provides applications with a pre-joined "association count" representation of their data. If the question is what is related in the data, not just a report of the records that conform to an SQL expression, the determination of relevance eventually comes down to understanding association frequency counts. Now the above example is pretty boring since all the counts are 1, but you can imagine how the counts in the matrices might evolve if any of the attributes were to repeat in subsequent records, with different combinations of values. Because SMB stores association counts rather than data records, determining relevance is a much more efficient process. Instances of associations are not stored as separate records, as in database systems; they are joined with previous associations as the data is ingested. Pre-joining data has the effect of significantly reducing your read times and, therefore, is a more effective format for analytical queries. The trade-off is that you need to describe your memories in a way that satisfies your queries. We will see in a minute that with SMB, this is typically easier than defining a database schema. Imagine if in the above example, "city", "state" and "dept" were also configured as memories. From the 4 database records, SMB would contribute associations to 16, rather than 4, matrices, not to mention the directory memory and other row metadata (such as linear counts) that are stored.
The use of rows and columns by the query process is essential to understanding how data should be mapped. Fortunately the concept is pretty simple: rows are inputs (query terms) and columns are outputs (results). As long as any term that you want to put in a query exists as a row (and optionally as a memory for triples), you will be able to execute the query. Also, as long as the answer you are looking for exists as a column, you can ask for it. For example, if I create a space with "person" as a row and "city" as a column, I can ask for a rank-ordered list of cities for person:john. However, to also answer the question of a rank-ordered list of people for "city:raleigh", I would have needed to make person and city both rows and columns.
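The counting for the 4-record table above can be sketched in a few lines. This is an illustration under stated assumptions, not SMB's ingestion code: `ingest` is a hypothetical helper, each memory is modelled as a single "default" matrix, and a cell is only counted for pairs of distinct attributes. The fifth record at the end is invented solely to show a count growing past 1.

```python
from collections import Counter
from itertools import permutations

RECORDS = [
    {"name": "John", "city": "Boston", "state": "MA", "dept": "Engineering"},
    {"name": "Bob", "city": "Chicago", "state": "IL", "dept": "Marketing"},
    {"name": "Sue", "city": "San Francisco", "state": "CA", "dept": "Finance"},
    {"name": "Jane", "city": "Los Angeles", "state": "CA", "dept": "Marketing"},
]


def ingest(memories, record):
    """'name' selects the memory; city, state and dept are rows and columns."""
    matrix = memories.setdefault(f"name:{record['name']}", Counter())
    terms = [f"{key}:{record[key]}" for key in ("city", "state", "dept")]
    for row, col in permutations(terms, 2):
        matrix[(row, col)] += 1


memories = {}
for record in RECORDS:
    ingest(memories, record)

# One memory per name, and every stored count is 1, as noted above.
all_ones = all(n == 1 for matrix in memories.values() for n in matrix.values())
print(len(memories), all_ones)  # 4 True

# Hypothetical later record: John again, same city and state, different dept.
ingest(memories, {"name": "John", "city": "Boston", "state": "MA", "dept": "Marketing"})
boston_ma = memories["name:John"][("city:Boston", "state:MA")]
print(boston_ma)  # 2
```

The repeated association is merged into the existing cell at ingest time, which is the "pre-joined" property the text describes.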
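The rule that rows are inputs and columns are outputs can be illustrated with a toy matrix. Everything here is an assumption for the sake of the example: the sample records, the `ROW_CATEGORIES` and `COL_CATEGORIES` sets, and the `rank` function, which orders results by raw count only and does not reflect how SMB or AQL actually score results.

```python
from collections import Counter, defaultdict

# Hypothetical space definition: person is only a row, city is only a column.
ROW_CATEGORIES = {"person"}
COL_CATEGORIES = {"city"}


def ingest(matrix, record):
    """Store a count for each (row term, column term) pair in the record."""
    for row_cat, row_val in record.items():
        if row_cat not in ROW_CATEGORIES:
            continue
        for col_cat, col_val in record.items():
            if col_cat in COL_CATEGORIES and col_cat != row_cat:
                matrix[f"{row_cat}:{row_val}"][f"{col_cat}:{col_val}"] += 1


def rank(matrix, row, col_category):
    """Return (column, count) pairs for one row, highest count first."""
    cols = matrix.get(row, {})
    hits = [(c, n) for c, n in cols.items() if c.startswith(col_category + ":")]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


matrix = defaultdict(Counter)
for rec in [
    {"person": "john", "city": "raleigh"},
    {"person": "john", "city": "raleigh"},
    {"person": "john", "city": "boston"},
    {"person": "sue", "city": "raleigh"},
]:
    ingest(matrix, rec)

cities_for_john = rank(matrix, "person:john", "city")
people_for_raleigh = rank(matrix, "city:raleigh", "person")
print(cities_for_john)     # [('city:raleigh', 2), ('city:boston', 1)]
print(people_for_raleigh)  # [] because city was never stored as a row
```

Adding "city" to `ROW_CATEGORIES` and "person" to `COL_CATEGORIES` before ingesting would make the second query answerable, at the cost of storing more cells.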
After ingesting data into an SMB space, you can use the REST API to navigate that space.
Space Operations
There are 2 REST URLs that cover the total set of Space operations:
/spaces
GET: lists spaces on the system
POST: creates a space
/spaces/{space}
GET: returns details about the space
PUT: initializes the space
DELETE: deletes the space
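As a rough sketch, the method and URL combinations listed above could be prepared from Python as shown below. The base URL reuses the placeholder host from the example URL earlier on this page; the helper `space_request` is hypothetical, percent-encoding of the space name is an assumption, and no request body is shown for POST or PUT because this page does not describe one. The requests are only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://server/ws"  # placeholder host, taken from the example URL


def space_request(method, space=None):
    """Build (but do not send) a request for one of the Space operations."""
    allowed = {"GET", "POST"} if space is None else {"GET", "PUT", "DELETE"}
    if method not in allowed:
        raise ValueError(f"{method} is not listed for this URL")
    url = f"{BASE_URL}/spaces"
    if space is not None:
        url += "/" + quote(space, safe="")  # encoding is an assumption
    return Request(url, method=method)


list_spaces = space_request("GET")
delete_space = space_request("DELETE", "my space")

print(list_spaces.get_method(), list_spaces.full_url)
print(delete_space.get_method(), delete_space.full_url)
```

Passing a prepared request to `urllib.request.urlopen` would send it; that step is left out here because it needs a running server.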
Navigation methods
There is a collection of navigation methods that let you traverse the memory hierarchy.
Memory Methods
/spaces/{space}/memories
GET: returns a list of memory categories for the given space.
/spaces/{space}/memories/{category}
GET: returns a list of memory values for the given category.
Matrix Methods
/spaces/{space}/memories/{category}/{value}
GET: returns a list of matrix names for the given memory.
/spaces/{space}/memories/{category}/{value}/{matrix}
GET: returns a list of row categories for the given memory/matrix.
Row Methods
/spaces/{space}/memories/{category}/{value}/{matrix}/{category}
GET: returns a list of row values for the given category.
/spaces/{space}/memories/{category}/{value}/{matrix}/{category}/{value}
GET: returns a list of column categories for the given row.
Column Methods
/spaces/{space}/memories/{category}/{value}/{matrix}/{category}/{value}/{category}
GET: returns a list of column values for the given category.
/spaces/{space}/memories/{category}/{value}/{matrix}/{category}/{value}/{category}/{value}
GET: returns a single association count stored in the cell of the specified matrix.
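Since each navigation URL simply appends one more path segment, the whole family can be generated by one small helper. The sketch below is not part of the SMB API: `navigation_path` and `returns_at` are hypothetical names, the space name "demo" is made up, and percent-encoding each segment is an assumption about what the server expects.

```python
from urllib.parse import quote

# What a GET returns at each depth, following the method list above.
RETURNS = (
    "memory categories",
    "memory values",
    "matrix names",
    "row categories",
    "row values",
    "column categories",
    "column values",
    "a single association count",
)


def navigation_path(space, *segments):
    """Build the path for a space plus 0 to 7 further segments, in order:
    memory category, memory value, matrix, row category, row value,
    column category, column value."""
    if len(segments) > 7:
        raise ValueError("at most 7 segments follow /memories")
    parts = ["spaces", space, "memories", *segments]
    return "/" + "/".join(quote(part, safe="") for part in parts)


def returns_at(*segments):
    """Describe what the GET at this depth returns."""
    return RETURNS[len(segments)]


top = navigation_path("demo")
cell_segments = ("author", "jaredpeterson", "january 2010",
                 "location", "raleigh, nc", "hashtag", "running")
cell = navigation_path("demo", *cell_segments)

print(top)   # /spaces/demo/memories
print(cell)
print(returns_at(*cell_segments))  # a single association count
```

Walking the hierarchy then amounts to calling `navigation_path` with one more segment each time, using a value returned by the previous GET.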


