Thursday, December 31, 2009

Introduction to Hubble.net

 

Homepage at CodePlex

 

Abstract

Hubble.net is an open-source, free full-text search database component built on the .NET Framework and released under the Apache 2.0 license. It provides a SQL-based full-text search interface, so users who already know SQL can learn to use it quickly. Hubble.net supports full-text indexing and querying, multi-field searching and sorting, grouping statistics, distinct, classification, clustering, multi-table queries, and a range of other full-text search and data mining features. It also provides a database adapter interface, which lets it integrate with various databases to add data mining and full-text search to them. Its concurrency control allows inserts, deletes, updates, and queries to run on multiple threads without conflict. Hubble.net also provides caching and memory management designed to help users get the best query performance. I think that in the next few years Hubble.net will become the most popular full-text search component in the .NET development environment.

 

 

Physical view

 

image

 

 

Hubble.net integrates full-text search with relational databases, so a database can be searched by full text through SQL. The Hubble.net component itself is responsible for the full-text inverted index, which is stored in the directory specified by the table definition in Hubble.net. The relational database is responsible for storing the data.
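To make the split between index and storage concrete, here is a minimal Python sketch. It is not Hubble.net code: the class name TinyFullTextIndex, the whitespace tokenizer, and the dict that stands in for the relational database are all assumptions made for illustration.

```python
from collections import defaultdict


class TinyFullTextIndex:
    """Toy inverted index: term -> set of document ids (hypothetical, not Hubble.net code)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Whitespace tokenization is an assumption; a real analyzer does much more.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query term (AND semantics).
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# A plain dict stands in for the relational database that stores the rows.
rows = {
    1: "Hubble.net provides full-text search",
    2: "Relational databases store the data",
    3: "Full-text search over relational data",
}

index = TinyFullTextIndex()
for doc_id, text in rows.items():
    index.add(doc_id, text)

# The index answers "which rows match"; the row store returns the data itself.
hits = sorted(index.search("full-text search"))
matched_rows = [rows[doc_id] for doc_id in hits]
```

The index only ever returns document ids. The row contents are always read from the separate store, which mirrors the division of work described above.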

 

 

Logical view

 

image

 

Like a relational database, Hubble.net has the concept of databases and tables. Its databases and tables are mappings onto those of the relational database; Hubble.net holds no database or table entities of its own. When a full-text search query is sent to Hubble.net via SQL, Hubble.net automatically links to the corresponding entities in the relational database. From the user's point of view, Hubble.net looks like a database.

Hubble.net is responsible for building the inverted index for text fields and the key-value index for untokenized fields. The relational database is responsible for the B+ tree indexes. If a query does not include a full-text search field, it is forwarded directly to the database, which uses its own indexes to run the search.
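The forwarding rule can be sketched in a few lines of Python. This is only an illustration of the decision, not Hubble.net's implementation: the field names, the FULL_TEXT_FIELDS set, and the route function are hypothetical.

```python
# Hypothetical set of fields that have a full-text (inverted) index.
FULL_TEXT_FIELDS = {"title", "body"}


def route(query_fields):
    """Decide which engine evaluates a query, given the fields it filters on.

    Assumption for this sketch: a query is sent to the full-text engine as soon
    as it touches at least one full-text field; otherwise it is forwarded to the
    relational database unchanged.
    """
    if any(field in FULL_TEXT_FIELDS for field in query_fields):
        return "full-text engine"
    return "relational database"


mixed_query = route(["title", "price"])   # touches a full-text field
plain_query = route(["price", "stock"])   # no full-text field involved
```

Here mixed_query is handled by the full-text engine, while plain_query is forwarded to the relational database and served by its B+ tree indexes.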

 

 

Three-level cache

 

 

image

 

Hubble.net is designed with three levels of cache.

Index cache: the index-level cache holds the inverted index and the key-value index. It is managed automatically and cannot be turned off. It is synchronized automatically when data is deleted, updated, or inserted.

Query cache: the query-level cache holds query results. The Hubble.net system service caches the document IDs returned by each distinct query, and when the same query is executed again, the result comes from the query cache. When the table changes, the query-level cache expires if its timeout is zero, and the result has to be cached again.

Data cache: the data-level cache runs on the sqlclient. The sqlclient caches the data it receives, and the next time the same data is requested it reads it directly from the data cache rather than fetching it from the Hubble.net system service. When the table changes, the data-level cache expires if its timeout is zero, and the data has to be cached again.
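The expiry rule for the query and data caches can be illustrated with a small Python sketch. It models only the zero-timeout case described above, where a table change invalidates cached results. The QueryCache class, the table version counter, and the run_query callback are assumptions for illustration and are not part of Hubble.net.

```python
class QueryCache:
    """Toy result cache that expires entries whenever the table changes.

    Models only the zero-timeout behaviour; other timeout values are not covered.
    """

    def __init__(self):
        self.table_version = 0
        self.entries = {}  # query text -> (table version when cached, doc ids)

    def table_changed(self):
        # Called after an insert, update, or delete on the table.
        self.table_version += 1

    def get(self, query, run_query):
        """Return (doc_ids, served_from_cache)."""
        cached = self.entries.get(query)
        if cached is not None and cached[0] == self.table_version:
            return cached[1], True
        doc_ids = run_query(query)  # cache miss or expired entry: run the query again
        self.entries[query] = (self.table_version, doc_ids)
        return doc_ids, False


executed = []


def run_query(query):
    # Stand-in for the real search; records each execution and returns fixed ids.
    executed.append(query)
    return [1, 3]


cache = QueryCache()
first = cache.get("full-text search", run_query)    # miss: the query is executed
second = cache.get("full-text search", run_query)   # hit: served from the cache
cache.table_changed()                               # the table is modified
third = cache.get("full-text search", run_query)    # expired: executed again
```

Comparing a version counter is just one simple way to detect that the table has changed; the source does not say how Hubble.net detects it.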

 

Memory management

Hubble.net runs as a system service. Unlike Lucene, it does not share memory with the application. Hubble.net has its own memory management mechanism, and the user can set the maximum amount of memory it may use. Once Hubble.net's memory usage exceeds this amount, it automatically starts a memory cleaning process that removes some of the less frequently used cache entries from memory in order to free up space. Users can view and manage memory through the SP_CONFIGURE stored procedure.
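As a rough illustration of the cleaning step, the following Python sketch evicts the least frequently used entries once a byte limit is exceeded. The BoundedCache class, the hit counter, and the eviction order are assumptions. The source only says that less frequently used cache entries are removed, not how they are chosen.

```python
class BoundedCache:
    """Toy cache with a memory limit and least-frequently-used cleanup (hypothetical)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.items = {}  # key -> [payload bytes, hit count]

    def put(self, key, payload):
        if key in self.items:
            self.used_bytes -= len(self.items[key][0])
        self.items[key] = [payload, 0]
        self.used_bytes += len(payload)
        if self.used_bytes > self.max_bytes:
            self.clean(protect=key)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        entry[1] += 1  # count the hit so frequently used entries survive cleanup
        return entry[0]

    def clean(self, protect=None):
        # Evict entries in order of increasing hit count until usage is within the limit.
        for key in sorted(self.items, key=lambda k: self.items[k][1]):
            if self.used_bytes <= self.max_bytes:
                break
            if key == protect:
                continue  # do not evict the entry that was just stored
            self.used_bytes -= len(self.items.pop(key)[0])


cache = BoundedCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")
cache.get("a")             # "a" is now used more often than "b"
cache.put("c", b"123")     # 13 bytes in total: cleanup evicts "b"
```

After the last call, the rarely used entry "b" has been removed, while "a" and "c" remain in the cache.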