dtSearch

Category Intelligent Software>Fuzzy Logic Systems/Tools and Intelligent Software>Data Mining Systems/Tools

Abstract dtSearch is a natural language/fuzzy logic text searching, retrieval and publishing system. The dtSearch product line can instantly search terabytes of text across a desktop, network, and Internet or Intranet site.

dtSearch products also serve as tools for publishing large document collections, with instant text searching, to Web sites or compact disks (CDs)/digital video disks (DVDs).

The product offers over two dozen indexed, unindexed, fielded and full-text search options.

dtSearch highlights hits in Hypertext Markup Language (HTML), Extensible Markup Language (XML) and portable document format (PDF) while displaying embedded links, formatting and images.

dtSearch converts other file types - word processor, database, spreadsheet, email and full-text of email attachments, ZIP, Unicode, etc. - to HTML for display with highlighted hits.

Its built-in Spider adds local or remote web sites (including dynamically-generated content) to your searchable database.

Search Features - Relevancy-Ranking: dtSearch can sort and instantly re-sort searches by relevancy with respect to number of hits, file name, file date, etc.

Natural language algorithms provide automatic term weighting, following a “plain English” or unstructured indexed search request. Automatic term weighting is based on the frequency and density of hits in your files.

For example, in the search request “get me Sam's memo on the 1999 CorpX takeover”, if 1999 appeared in 3,000 files and Sam appeared in only two files, then Sam would get a much higher relevancy rating, taking you straight to the most “relevant” files.
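The idea in the example above, that a rarer term earns a higher weight, can be sketched with an inverse-document-frequency style formula. This is only an illustration: dtSearch's actual weighting algorithm is not described here, and the function name, the smoothing, and the collection size of 10,000 files are assumptions.

```python
import math

def term_weights(query_terms, doc_freq, total_docs):
    """Return an IDF-style weight per query term: the rarer the term, the higher the weight."""
    weights = {}
    for term in query_terms:
        df = doc_freq.get(term, 0)
        # Add 1 to numerator and denominator so a term found in no file cannot divide by zero.
        weights[term] = math.log((total_docs + 1) / (df + 1))
    return weights

# File counts for "1999" and "sam" come from the example; total_docs is a hypothetical value.
doc_freq = {"1999": 3000, "sam": 2}
weights = term_weights(["sam", "1999"], doc_freq, total_docs=10000)

for term, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.2f}")
```

Under these assumptions "sam" receives a much larger weight than "1999", so files containing "sam" would rise to the top of the ranking.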

A positional scoring option works with dtSearch’s natural language relevancy ranking to rank documents more highly when hits are near the top of a file, or otherwise clustered in a file.

dtSearch also includes variable term weighting options for both indexed and unindexed searches. Positive term weighting can place extra emphasis on one or more words: soup:8 or recipe:3.

Negative term weighting can assign negative emphasis to one or more words: red or green or yellow:-7. Variable term weighting can also apply to fields: (description:5 contains (apple and pear)) or (author:2 contains smith).
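As a rough illustration of how the weighting syntax above could influence ranking, the Python sketch below parses a flat "or" list of terms with optional :weight suffixes and scores a text by summing weight times occurrence count. The function names and the scoring formula are assumptions made for illustration, not dtSearch's implementation; field weighting and parentheses are not handled.

```python
import re

def parse_weighted_terms(request):
    """Split a flat 'a:8 or b:3' request into {term: weight}; unweighted terms default to 1."""
    weights = {}
    for token in re.split(r"\s+or\s+", request.strip(), flags=re.IGNORECASE):
        term, _, weight = token.partition(":")
        weights[term.lower()] = int(weight) if weight else 1
    return weights

def score(text, weights):
    """Sum weight * occurrence count for every weighted term found in the text."""
    words = re.findall(r"\w+", text.lower())
    return sum(weight * words.count(term) for term, weight in weights.items())

positive = parse_weighted_terms("soup:8 or recipe:3")
negative = parse_weighted_terms("red or green or yellow:-7")

print(score("A soup recipe for winter", positive))
print(score("Red and yellow peppers", negative))
```

In this sketch a negative weight pulls a document's score down each time the term occurs, which is one plausible reading of "negative emphasis".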

Fuzzy Searching -

1) Fuzzy searching uses a proprietary algorithm to find search terms even if they are misspelled;

2) Search fuzziness adjusts from 0 to 10 so you can fine-tune fuzziness to the level of optical character recognition (OCR) or typographical errors in your files;

3) A search for alphabet with a fuzziness of 1 would find alphaqet; with a fuzziness of 3, it would find both alphaqet and alpkaqet.

4) Fuzziness is not built into the index, so you can vary fuzziness at the time of each search.
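Because the fuzzy matching algorithm is proprietary, the following Python sketch substitutes plain Levenshtein edit distance to show the general idea of a tolerance chosen at search time. The max_edits parameter is an assumption and is not on the same scale as dtSearch's 0 to 10 fuzziness setting.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_matches(term, words, max_edits):
    """Return the words within max_edits of the term; the tolerance is supplied per search."""
    return [w for w in words if edit_distance(term, w) <= max_edits]

indexed_words = ["alphabet", "alphaqet", "alpkaqet", "alphabetical"]
print(fuzzy_matches("alphabet", indexed_words, max_edits=1))
print(fuzzy_matches("alphabet", indexed_words, max_edits=2))
```

The word list itself is never altered; only the tolerance passed to each search changes, which mirrors the point that fuzziness is not built into the index.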

Concept / Synonym / Thesaurus Searching -

1) Concept searching lets you look for fast and find quick, speedy, etc;

2) dtSearch offers variable levels of automatic synonym expansion based on a comprehensive semantic network of the English language;

3) You can also add your own thesaurus terms.
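Synonym expansion with user-added terms can be pictured as a table lookup that merges a built-in table with a user thesaurus. The Python sketch below is a minimal illustration; the table contents, names and merge order are assumptions and do not reflect dtSearch's semantic network or its levels of expansion.

```python
# Hypothetical built-in synonym table; the entries echo the example in the list above.
BUILT_IN_SYNONYMS = {"fast": ["quick", "speedy"]}

def expand_term(term, user_thesaurus=None):
    """Return the term plus its synonyms, built-in entries first, without duplicates."""
    key = term.lower()
    expanded = [key]
    for table in (BUILT_IN_SYNONYMS, user_thesaurus or {}):
        for synonym in table.get(key, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

print(expand_term("fast"))
print(expand_term("fast", {"fast": ["rapid"]}))
```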

Basic Search Types -

1) Phrase searching finds phrases like: due process of law;

2) Boolean operators like and/or/not can join words and phrases: due process of law and not (equal protection or civil rights);

3) Proximity searching finds a word or phrase within “n” words of another word or phrase: apple pie w/38 peach cobbler;

4) Directed Proximity searching finds a word or phrase “n” words before another word or phrase: apple pie pre/38 peach cobbler;

5) Phonic searching finds words that sound alike, like Smythe in a search for Smith;

6) Stemming finds variations on endings, like applies, applied, applying in a search for apply;

7) Numeric range searching finds any number between two numbers, such as between 6 and 36;

8) Macro capabilities make it easy to include frequently used items in a search request.

9) Wildcard support allows ‘?’ to hold a single letter place, and ‘*’ to hold multiple letter places: apple* and not appl?sauce.
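The two proximity operators in the list above (w/n and pre/n) can be sketched in Python as follows. Measuring distance between the starting positions of the two phrases is an assumption made for this illustration; how dtSearch counts the intervening words is not specified here, and the function names are hypothetical.

```python
import re

def phrase_positions(words, phrase):
    """Return the start index of every occurrence of the phrase in the word list."""
    target = phrase.lower().split()
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

def within(text, left, right, n, ordered=False):
    """True if `left` starts within n words of `right`; ordered=True requires `left` first."""
    words = re.findall(r"\w+", text.lower())
    for i in phrase_positions(words, left):
        for j in phrase_positions(words, right):
            gap = j - i
            if ordered and gap <= 0:
                continue  # directed search: `left` must come before `right`
            if abs(gap) <= n:
                return True
    return False

text = "She served apple pie and, a little later, peach cobbler."
print(within(text, "apple pie", "peach cobbler", 38))                # like w/38
print(within(text, "peach cobbler", "apple pie", 38, ordered=True))  # like pre/38, reversed
```

The first call succeeds because the phrases are close in either direction; the second fails because "peach cobbler" does not precede "apple pie" in the text.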

Note: Combining Search Types - Nearly all search types are combinable and you can make your search request as complex as you want.
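The wildcard request apple* and not appl?sauce from the list above is itself a small combination of search types. A minimal Python sketch of that combination, using the standard library's fnmatch module as a stand-in for dtSearch's own wildcard matching, could look like this. Note that fnmatch also gives square brackets a special meaning, which the list above does not mention.

```python
from fnmatch import fnmatchcase

def wildcard_filter(words, include, exclude):
    """Keep words matching `include` but not `exclude` ('?' = one character, '*' = any run)."""
    return [w for w in words
            if fnmatchcase(w, include) and not fnmatchcase(w, exclude)]

words = ["apple", "apples", "applesauce", "applisauce", "pineapple"]
matches = wildcard_filter(words, "apple*", "appl?sauce")
print(matches)
```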

The dtSearch product line consists of:

1) dtSearch Desktop with Spider - instantly searches files on a PC;

2) dtSearch Network with Spider - searches across a network;

3) dtSearch Web with Spider - quickly publishes a large volume of instantly searchable data to an Internet or Intranet site. The Spider expands the scope of the searchable database beyond a site's own data to content on other sites;

4) dtSearch Publish - offers easy publishing of an instantly searchable document collection to CD, DVD, portable hard drive, and the like. The product can also mirror an existing Web site on CD/DVD;

5) dtSearch Text Retrieval Engine - lets software developers add dtSearch search functionality to Web-based and other applications. The dtSearch Engine also supports SQL databases;

6) dtSearch Engine for Win & .NET - supports C++, Java and .NET. The dtSearch Engine for Win & .NET also includes a .NET Spider Application Programming Interface (API), making the Spider functionality accessible to software developers; and

7) dtSearch Engine for Linux - provides C++ and Java APIs to software developers.

What’s New -- dtSearch 7.66 (Build 7936) Released January 25, 2011 --

dtSearch Engine -

1) Added .NET 4.0 versions of the .NET API (dtSearchNetApi4.dll, dtSearch.Spider4.dll) and sample code for C# .NET 4.0 and VB.NET 4.0.

2) Added dtsSearchFastSearchFilterOnly search flag to enable much faster, optimized generation of a SearchFilter from a search when no other output is required from the search.

3) Added WordListBuilder.GetLastError to the C++, Java, and .NET APIs to provide better reporting of errors resulting from WordListBuilder calls.

4) Added new flag to enable caching of field values in WordListBuilder to make ListFieldValues calls faster.

The flag is dtsWordListEnableFieldValuesCache (in the WordListBuilderFlags enumeration) and is passed to WordListBuilder using the new SetFlags method.

5) Added new .NET method Server.SetEnginePath to allow ASP.NET application deployment without administrative access.

6) Added new .NET sample application, AzureDemo, demonstrating use of the dtSearch Engine in an Azure instance.

7) Added a way to disable file parsers using the file type table (filetype.xml) by setting the TypeId to the id of the parser to disable and the Flags value to 2.

8) Added a mechanism for a dtsInputStream to simulate an I/O error by returning a negative value from read() of less than 10,000. When this occurs, dtSearch will interpret it as an I/O error and halt processing of the current input file immediately, reporting an I/O error through the API.

All Products -

1) Faster indexing of binary data using the filtering algorithm.

Fixes and minor enhancements -

1) In dtSearch Desktop, added SizeK, IndexRetrievedFrom, SearchDate, ReportDate variables to SearchReportTemplate.rtf and SearchListTemplate.rtf.

2) Java and .NET API: Fixed IIndexStatusHandler bug causing PercentDone to remain zero during compression of an index.

3) Added docId of document being removed from an index to IndexFileInfo reporting through IIndexStatusHandler.

4) Fixed FileConverter bug that caused invalid XML to be generated from some conversions due to output of character code 128.

5) Added SearchJob.UnindexedSearchFlags in the .NET API and SearchJob.setUnindexedSearchFlags in the Java API to enable case and accent-sensitive unindexed searches in these APIs.

6) Added .NET SearchFilter.GetItems() to provide access to an array of the doc ids selected in a SearchFilter.

7) File parser bug fixes affecting Office XML drawings embedded in Word, PowerPoint, and Excel files; interpretation of OEM character codes (_x00NN_) in Excel 2007 files; dates prior to 1970 in MDB files; performance and memory use parsing MIME files; Word auto-numbering; PDF.

System Requirements

dtSearch Web system requirements and performance

Manufacturer

Manufacturer Web Site dtSearch

Price Contact manufacturer.

G6G Abstract Number 30710R

G6G Manufacturer Number 100840