Spomet: Meteor Full-Text Search in a Nutshell

I recently updated my little full-text search project Spomet and made some major changes to its API. As a consequence, I just finished updating my earlier tutorial post that covers the creation of an example app using Spomet for search. While doing this I realized two things: the tutorial is really long, and it’s too basic for anyone who already has some bonus miles riding the Meteor.

So I decided to write this post: an in-depth, concise and to-the-point guide to using Spomet. It should serve as the main documentation for Spomet.

Get the package

In case you are using Meteorite / Atmosphere you can add Spomet to your app with:

mrt add spomet

If you prefer using plain Meteor you can grab the package code on GitHub and place it in a suitable folder (e.g. packages/spomet). Don’t forget to add the package to Meteor afterwards (e.g. meteor add spomet).

Adding Documents to the Index

Before searching makes any sense, there have to be some documents in the index. To achieve this, there are two functions to choose from. The first is Spomet.add. Its first parameter is a hash and its second a function:

Spomet.add
        text: 'the text that should be found'
        type: 'post'
        base: someRefId
        path: 'description'
    ,
        (result) ->
            # the result is the hash from above 
            # substituted with omitted default values 
            # and the version number given

Spomet.add, well, big surprise, adds a document to the index. The hash keys type and path are optional and will be substituted with default values (‘default’, ‘/’) if they are absent. The final function parameter is optional as well. If it is present, it will be called once the document has been successfully inserted, with the initial document hash as its parameter, extended with the version number used and any substituted default values.

One thing to keep in mind, though: the document gets added even if a document with the same identifying parameters (type, base, path) already exists. Internally, a version number forms the final part of the document’s ID, and this number gets increased. To find out which version number was used, you have to register the callback.
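
The versioning behaviour can be pictured with a small in-memory model. This is a sketch in plain JavaScript, not Spomet’s implementation: the names addToIndex and index are made up for illustration, the starting version of 1 is an assumption, and the docId layout (type-base-path-version) is borrowed from the sample documents further down in this post.

```javascript
// Hypothetical in-memory model of the versioning in Spomet.add (not the package's code).
const index = [];

function addToIndex(doc, callback) {
  // Substitute the documented defaults for omitted keys.
  const full = { type: 'default', path: '/', ...doc };
  // Collect the versions already stored for the same (type, base, path).
  const versions = index
    .filter(d => d.type === full.type && d.base === full.base && d.path === full.path)
    .map(d => d.version);
  // Assumption: numbering starts at 1 and increases by one per add.
  full.version = versions.length ? Math.max(...versions) + 1 : 1;
  // Assumption: the ID is composed as type-base-path-version.
  full.docId = [full.type, full.base, full.path, full.version].join('-');
  index.push(full);
  if (callback) callback(full);
  return full;
}

const first = addToIndex({ text: 'first draft', type: 'post', base: 'abc', path: 'description' });
const second = addToIndex({ text: 'second draft', type: 'post', base: 'abc', path: 'description' });
console.log(first.docId, second.docId);
```

Both calls succeed, and the second document simply receives the next version number instead of overwriting the first.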

If you don’t want different versions of the same document to be findable, there is a second insertion function: Spomet.replace. It adds the document and removes an existing occurrence of the same base document (identified by type, base and path). Spomet.replace optionally takes a version number as its second parameter. If it is present, the specified version gets removed while the new document is added. If it is absent, the document with the highest version number gets removed.

As the final parameter of Spomet.replace you may provide a callback. It gets called with the document specs of the removed document (without the text parameter, though) and the same hash that gets returned from Spomet.add. Both are wrapped in a single object of the following form:

{
added:
    text: 'the text that should be found'
    type: 'post'
    base: someRefId
    path: 'description'
    version: 2
removed:
    type: 'post'
    base: someRefId
    path: 'description'
    version: 1
}

Spomet.replace adds the document even if there is nothing to remove.
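
To make the replace semantics concrete, here is a minimal in-memory sketch in plain JavaScript. It is a model, not the package’s code: replaceInIndex is a made-up name, and reporting removed as null when nothing was removed is an assumption, since the post does not say what the callback receives in that case.

```javascript
// Hypothetical in-memory model of Spomet.replace (not the package's code).
function replaceInIndex(index, doc, version) {
  const same = d => d.type === doc.type && d.base === doc.base && d.path === doc.path;
  const highest = index.filter(same).reduce((max, d) => Math.max(max, d.version), 0);
  // Remove the requested version, or the highest one if none was given.
  const target = version === undefined ? highest : version;
  const position = index.findIndex(d => same(d) && d.version === target);
  let removed = null; // assumption: null when there was nothing to remove
  if (position !== -1) {
    // The removed spec is reported without its text.
    const { text, ...spec } = index.splice(position, 1)[0];
    removed = spec;
  }
  // The new document is added even when nothing was removed.
  const added = { ...doc, version: highest + 1 };
  index.push(added);
  return { added, removed };
}

const docs = [{ text: 'old', type: 'post', base: 'abc', path: 'description', version: 1 }];
const outcome = replaceInIndex(docs, { text: 'new', type: 'post', base: 'abc', path: 'description' });
console.log(outcome.removed.version, outcome.added.version);
```

After the call the model holds exactly one document for this type, base and path, which is the point of using replace instead of add.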

Removing Documents from the Index

There is one function to remove documents from the index: Spomet.remove. It takes a hash parameter as well and removes all matching documents. Any combination of the keys is allowed. For example:

Spomet.remove
    path: 'description'

removes all documents with the path ‘description’.
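
The “any combination” matching can be sketched as a simple equality check over the keys of the hash. The helpers matchesFilter and removeFromIndex below are hypothetical names for an in-memory model in plain JavaScript; how Spomet performs the matching internally is not shown in this post.

```javascript
// Hypothetical matcher: a document matches when every key in the filter is equal.
function matchesFilter(doc, filter) {
  return Object.keys(filter).every(key => doc[key] === filter[key]);
}

// Remove all matching documents in place and return the removed ones.
function removeFromIndex(index, filter) {
  const removed = index.filter(doc => matchesFilter(doc, filter));
  const kept = index.filter(doc => !matchesFilter(doc, filter));
  index.length = 0;
  index.push(...kept);
  return removed;
}

const stored = [
  { type: 'post', base: 'a', path: 'description', version: 1 },
  { type: 'post', base: 'a', path: 'title', version: 1 },
  { type: 'page', base: 'b', path: 'description', version: 1 },
];
const gone = removeFromIndex(stored, { path: 'description' });
console.log(gone.length, stored.length);
```

Note that in this model an empty hash matches everything. Whether the real Spomet.remove behaves the same way is something to check before relying on it.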

Spomet.remove accepts a callback as its final parameter as well. The callback function is called with an array containing the documents that were removed. The documents in that array have the following form:

{
_id: "q6gQuRYayvnGttyyF"
base: "hiexeuv2kbyQ2Jnt8"
created: Mon Nov 04 2013 08:34:36 GMT+0100 (CET)
dlength: 22
docId: "custom-hiexeuv2kbyQ2Jnt8-custom-1"
indexTokens: Array[35]
mostCommonTermCount: 2
path: "custom"
text: "This is some test text"
type: "custom"
version: 1
}

Trigger Searches

You basically have two options. The first is to use the search-box that is provided by the package. The second is to use the search functionality with your own implementation for triggering searches.

Use the Built-In Search Box

The built-in search-box provides autocompletion (using Bootstrap’s Typeahead) and uses the 1,000 most frequent words stored in the fullword index. A configuration hash controls, among other things, the number of words exposed to the client. It’s accessible through Spomet.options on the server and has an attribute called keywordsCount. Set it to the number of keywords you would like to expose to new clients.

To include the search-box you place the following code in your template of choice:

{{> spometSearch}}

This template call accepts a hash parameter to alter the way the textbox with its buttons is rendered. Currently the template tries to access fieldSizeClass, to get information on how wide the search-box should be rendered, and buttonText, to get the text that should be displayed on the search button.

Use the Plain Engine

You can use an arbitrary number of separate searches in your app and you are free to omit the provided search-box. You have to instantiate Spomet.Search and call find on the instance.

mySearch = new Spomet.Search
mySearch.find 'something'

To implement your own autocompletion you might want to access the collection Spomet.CommonTerms. The documents in this collection are ordered primarily by the number of indexed documents containing the keyword in question and secondarily by the keyword’s length, and have the following form:

{
_id: 'tRNE233c45DTrne'
token: 'meteor'
tlength: 6
documentsCount: 1
documents: [{docId: 'type-base-path-version', pos: 7}]
}

As above, the number of keywords (tokens) is constrained by Spomet.options.keywordsCount.
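
A custom autocompletion on top of such documents could look roughly like the following plain JavaScript sketch. The function suggest is a made-up helper that works on a plain array. The sort direction (more documents first, then shorter tokens first) and the lower-casing of the prefix are assumptions, since the post only names the two sort keys.

```javascript
// Hypothetical helper: suggest tokens for a typed prefix from CommonTerms-like documents.
function suggest(terms, prefix, limit = 5) {
  const needle = prefix.toLowerCase();
  return terms
    .filter(t => t.token.startsWith(needle))
    // Assumption: more documents first, then shorter tokens first.
    .sort((a, b) => b.documentsCount - a.documentsCount || a.tlength - b.tlength)
    .slice(0, limit)
    .map(t => t.token);
}

const commonTerms = [
  { token: 'meteorite', tlength: 9, documentsCount: 2 },
  { token: 'meteor', tlength: 6, documentsCount: 2 },
  { token: 'method', tlength: 6, documentsCount: 5 },
  { token: 'search', tlength: 6, documentsCount: 9 },
];
console.log(suggest(commonTerms, 'Met'));
```

In an app the array would presumably come from fetching the Spomet.CommonTerms collection on the client, for example inside a template helper.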

Customization

In case you don’t want to search using all indexes you can explicitly set the indexes to be used during search:

mySearch.setIndexNames ['fullword', 'threegram']

The following indexes are currently implemented: fullword, threegram, wordgroup and custom.

Accessing The Results

Spomet.find doesn’t return anything. Instead, following the main Meteor paradigm, Spomet provides a reactive data source that updates itself and any dependent user interface elements while the search is underway. You access it with:

Spomet.defaultSearch.results()

in case you try to access the results from searching with the provided search-box. And with:

mySearch.results()

in case of a custom search. In either case the 20 documents with the highest score are published, sorted by score. You can alter this globally by changing Spomet.options.resultsCount and Spomet.options.sort.

Besides these global options you can control the sort order, the number of results and the offset (for paging) on every Spomet.Search object. The methods for this are:

mySearch.setSort
    score: 1
mySearch.setLimit 50
mySearch.setOffset 50
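
For paging, limit and offset are usually derived from a page number. The following plain JavaScript sketch shows that arithmetic on an ordinary array. pageWindow and pageOf are hypothetical helpers, not part of Spomet, and sorting by highest score first is an assumption about the desired order.

```javascript
// Hypothetical helper: translate a 1-based page number into limit/offset values.
function pageWindow(page, pageSize) {
  return { limit: pageSize, offset: (page - 1) * pageSize };
}

// Apply sort, offset and limit to a plain array, highest score first.
function pageOf(results, page, pageSize) {
  const { limit, offset } = pageWindow(page, pageSize);
  return [...results].sort((a, b) => b.score - a.score).slice(offset, offset + limit);
}

const scored = [0.1, 0.9, 0.4, 0.7, 0.3].map((score, i) => ({ base: 'doc' + i, score }));
console.log(pageWindow(2, 50));
console.log(pageOf(scored, 2, 2).map(r => r.score));
```

With a page size of 50, the second page corresponds to the values used above: a limit of 50 and an offset of 50.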

The results() function returns a Meteor.Collection.Cursor object and the documents in this collection have the following form:

{
base: "cYeWPQ4s3uc8Aewmr"
type: "custom"
interim: false
phraseHash: "b32d73e56ec99bc5ec8f83871cde708a"
queried: Mon Nov 04 2013 09:33:37 GMT+0100 (CET)
score: 0.43085068485657296
subDocs: 
    "custom-1": 
        docId: "custom-cYeWPQ4s3uc8Aewmr-custom-1"
        path: "custom"
        version: 1
        score: 0.43085068485657296
        hits: [
            {indexName: "threegram", pos: 1, token: "not"},
            {indexName: "threegram", pos: 2, token: "oth"}]
}

The search results are grouped by base reference. In my experience this makes it easier to handle the display of the documents that are referenced in the search. The subDocs hash points to the actual matching documents (the documents that were added to Spomet), with their score and the actual hits.

Each entry in the hits array holds a reference to the index with which it was found (indexName), the offset in the document (pos) and the actual matching (sub-)string (token).
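
When rendering results it can be handy to flatten this nested structure into one row per hit. The sketch below, in plain JavaScript, does that for a single result document shaped like the one above. listHits is a hypothetical helper and the sample object is trimmed to the fields it reads.

```javascript
// Hypothetical helper: flatten one result document into one row per hit.
function listHits(result) {
  return Object.keys(result.subDocs).flatMap(key => {
    const sub = result.subDocs[key];
    return sub.hits.map(hit => ({
      base: result.base,
      path: sub.path,
      indexName: hit.indexName,
      pos: hit.pos,
      token: hit.token,
    }));
  });
}

const sample = {
  base: 'cYeWPQ4s3uc8Aewmr',
  type: 'custom',
  subDocs: {
    'custom-1': {
      docId: 'custom-cYeWPQ4s3uc8Aewmr-custom-1',
      path: 'custom',
      version: 1,
      hits: [
        { indexName: 'threegram', pos: 1, token: 'not' },
        { indexName: 'threegram', pos: 2, token: 'oth' },
      ],
    },
  },
};
console.log(listHits(sample).map(h => h.token + '@' + h.pos));
```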

The attributes base and type are the same as when adding documents. phraseHash is an MD5-hashed representation of the search query. queried is simply the timestamp of when the search was first issued.

Searches are cached to improve performance. This search cache is flushed as soon as documents are added to or removed from Spomet.
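
The caching idea (reuse results per query, drop everything when the index changes) can be sketched in a few lines of plain JavaScript. SearchCache is a made-up class for illustration and not Spomet’s cache. It keys entries by the plain search phrase, whereas the result documents carry an MD5 phraseHash.

```javascript
// Hypothetical search cache: results are kept per phrase and dropped on any index change.
class SearchCache {
  constructor(runSearch) {
    this.runSearch = runSearch; // function that performs the real search
    this.entries = new Map();
  }

  find(phrase) {
    // Only run the real search when this phrase has not been seen since the last flush.
    if (!this.entries.has(phrase)) this.entries.set(phrase, this.runSearch(phrase));
    return this.entries.get(phrase);
  }

  // Call this whenever a document is added or removed.
  flush() {
    this.entries.clear();
  }
}

let searchRuns = 0;
const cache = new SearchCache(phrase => { searchRuns += 1; return [phrase.length]; });
cache.find('meteor');
cache.find('meteor'); // served from the cache
cache.flush();        // simulates an add or remove
cache.find('meteor'); // runs the search again
console.log(searchRuns);
```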

The interim flag signals whether this result was created by a plain lookup in the local fullword index (true) or whether it was processed (or is currently being processed) on the server (false).

Controlling Index Usage

You have seen above that you can control the indexes used while searching. Additionally, you might want to disable some indexes globally (for searching and indexing). You can achieve this by calling the corresponding Meteor methods:

Meteor.call 'disableFullWordIndex'
Meteor.call 'disableThreeGramIndex'
Meteor.call 'disableCustomIndex'
Meteor.call 'disableWordGroupIndex'

There exist corresponding enable methods, in case you want to re-enable an index later.

Adding Your Own Index

The implementation of indexes is pretty straightforward, actually. I haven’t tested it, but it should be possible to include your own index pretty easily. The following gist shows the threegram index. The inline comments should help you along. Keep in mind, though, that indexes live exclusively on the server side.

[gist https://gist.github.com/Crenshinibon/7303306]
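
For readers who just want the gist of three-gram tokenizing without opening the link, here is a minimal plain JavaScript sketch of the general technique. It is not the code from the gist: threeGrams is a hypothetical function, and the lower-casing and the handling of whitespace are assumptions.

```javascript
// Hypothetical three-gram tokenizer (a sketch of the technique, not the code from the gist).
function threeGrams(text) {
  const normalized = text.toLowerCase(); // assumption: matching is case-insensitive
  const tokens = [];
  // Slide a window of three characters over the text, one position at a time.
  for (let pos = 0; pos + 3 <= normalized.length; pos += 1) {
    tokens.push({ token: normalized.slice(pos, pos + 3), pos });
  }
  return tokens;
}

console.log(threeGrams('another').map(t => t.token + '@' + t.pos));
```

Each token keeps its offset, which is the kind of information that ends up as pos in the hits of a search result.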