Thursday, December 31, 2009

Introduction of


Homepage at CodePlex


Abstract


The component is a free, open-source full-text search database component built on the .NET Framework. It is released under the Apache 2.0 license. It provides a SQL-based full-text search interface, so users who already know SQL can quickly learn how to use full-text search. It supports full-text indexing and querying, multi-field searching and sorting, grouping statistics, distinct, classification, clustering, multi-table queries, and a series of other full-text search and data mining features. It provides a database adapter interface, so it can be integrated with various databases to add data mining and full-text search to them. Its concurrency control is designed so that inserts, deletes, updates and queries can run on multiple threads without conflict. It also provides caching and memory management designed to help users get the best possible query efficiency. In the next few years, I think it will be the most popular full-text search component in the .NET development environment.



Physical view


The component integrates full-text search with relational databases, so a database can be searched by full text through SQL. The component itself is responsible for the full-text inverted index, which is stored in the directory specified when the table is defined. The relational database is responsible for data storage.
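The split of responsibilities described above can be sketched in a few lines. This is only an illustration, not the component's own code: an in-memory SQLite database stands in for the relational database, and the table name `news`, the sample rows and the `search` function are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

# Relational side: stores the rows (in-memory SQLite is a stand-in here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT)")
rows = [
    (1, "open source search"),
    (2, "full text search engine"),
    (3, "relational database"),
]
db.executemany("INSERT INTO news VALUES (?, ?)", rows)

# Search side: holds only the inverted index (term -> set of document ids).
inverted_index = defaultdict(set)
for doc_id, title in rows:
    for term in title.lower().split():
        inverted_index[term].add(doc_id)


def search(term):
    """Look up ids in the inverted index, then fetch the rows from the database."""
    ids = sorted(inverted_index.get(term.lower(), ()))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, title FROM news WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, ids).fetchall()
```

Calling `search("search")` resolves the term to document ids 1 and 2 through the index and only then asks the database for those two rows.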



Logical view


The component has the concept of databases and tables, like a relational database. Its databases and tables are mappings to those of the relational database; it holds no database or table entities of its own. When a full-text search query is sent to it via SQL, it links to the corresponding entities in the relational database automatically. From the user's point of view, it looks like a database. It is responsible for building the inverted index for text fields and the key-value index for untokenized fields, while the relational database is responsible for the B+ tree indexes. If a query does not include a full-text search field, it is forwarded directly to the database, which uses its own indexes to search.
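The forwarding rule in the last sentence can be pictured as a small routing decision. The sketch below is hypothetical: the field names in `FULL_TEXT_FIELDS` and the `route` function are invented for illustration, and a real implementation would inspect the parsed SQL rather than a list of field names.

```python
# Hypothetical set of tokenized (full-text) fields for one table.
FULL_TEXT_FIELDS = {"title", "content"}


def route(where_fields):
    """Decide which side answers a query, given the fields its WHERE clause uses."""
    if FULL_TEXT_FIELDS & set(where_fields):
        # At least one full-text field: the inverted index must be consulted.
        return "inverted index"
    # No full-text field: forward the query to the database unchanged.
    return "relational database"
```

For example, `route(["title", "price"])` selects the inverted index, while `route(["price"])` forwards the query to the database.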



Three-level cache





The component is designed with three levels of cache:

Index cache: the index-level cache holds the inverted index and the key-value index. This cache is managed automatically and cannot be turned off. It is synchronized automatically when data is deleted, updated or inserted.

Query cache: the query-level cache holds query results. The system service caches the document ids returned by each distinct query. When the same query is executed again, the query cache answers it. When the table changes, the query-level cache expires if its timeout is zero, and the results must be cached again.

Data cache: the data-level cache runs in the sqlclient. The sqlclient caches the data it receives, and the next time it reads the data directly from the data-level cache rather than requesting it from the system service. When the table changes, the data-level cache expires if its timeout is zero, and the data must be cached again.
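The expiry rule for the query cache can be sketched as follows. This is an illustration under assumptions, not the component's implementation: the class and method names are invented, and the behavior for a non-zero timeout (entries simply live for that many seconds) is a guess, since the text only describes the zero case.

```python
import time


class QueryCache:
    """Caches document ids per query text (hypothetical sketch)."""

    def __init__(self, timeout=0, clock=time.monotonic):
        self.timeout = timeout    # seconds; 0 means "expire when the table changes"
        self.clock = clock        # injectable so the behavior can be tested
        self.entries = {}         # query text -> (doc ids, time stored, table version)
        self.table_version = 0

    def table_changed(self):
        """Call after every insert, update or delete on the table."""
        self.table_version += 1

    def put(self, query, doc_ids):
        self.entries[query] = (list(doc_ids), self.clock(), self.table_version)

    def get(self, query):
        entry = self.entries.get(query)
        if entry is None:
            return None
        doc_ids, stored_at, version = entry
        if self.timeout == 0:
            # Zero timeout: stale as soon as the table has changed.
            expired = version != self.table_version
        else:
            # Assumed behavior: otherwise the entry lives for `timeout` seconds.
            expired = self.clock() - stored_at > self.timeout
        if expired:
            del self.entries[query]
            return None
        return doc_ids
```

A caller would try `get` first and fall back to running the query and calling `put` when it returns `None`.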


Memory management


The component runs as a system service. Unlike Lucene, it does not share memory with the application. It has its own memory management mechanism, and the user can set the maximum amount of memory it may use. Once memory usage goes beyond this amount, a memory cleaning process starts automatically and removes some of the less frequently used cache entries to free up memory. Users can view and manage memory through the SP_CONFIGURE stored procedure.
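The cleaning process can be pictured with a small size-bounded cache. The sketch below is a guess at the general idea only: the class name, the use of hit counts to find the "less frequently used" entries, and the byte-length accounting are all assumptions, not details taken from the component.

```python
class BoundedCache:
    """Cache with a memory limit that evicts the least frequently read entries."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.items = {}   # key -> [value as bytes, hit count]

    def get(self, key):
        item = self.items.get(key)
        if item is None:
            return None
        item[1] += 1      # record the hit so hot entries survive cleaning
        return item[0]

    def put(self, key, value):
        if key in self.items:
            # Replacing an entry: release the old value's size first.
            self.used_bytes -= len(self.items.pop(key)[0])
        self.items[key] = [value, 0]
        self.used_bytes += len(value)
        self._clean()

    def _clean(self):
        """Evict the entries with the fewest hits until usage fits the limit."""
        while self.used_bytes > self.max_bytes and self.items:
            coldest = min(self.items, key=lambda k: self.items[k][1])
            self.used_bytes -= len(self.items.pop(coldest)[0])
```

With a limit of 10 bytes, storing two 5-byte values, reading the first one, and then storing a 3-byte value pushes usage to 13 bytes, so the unread 5-byte entry is evicted and usage drops to 8.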