Information and Search System
Documentation and examples
The "Forget-Me-Not" information and search system is designed to store and work with unstructured facsimile electronic copies of paper documents (e.g., books, newspapers, incoming and outgoing business documents), as well as text documents, in the Windows 95/98/NT environment.
The aim of the system is to compile facsimile archives of unstructured documents and to provide easy and convenient access to them, including the ability to search for documents using natural-language requests and to locate the position of the match within a document. In contrast to the majority of search systems, which use keyword requests (keywords combined with logical relations), this system is oriented toward using rather large document fragments, or even a whole document, as the request. Therefore, the search result is not the set of documents containing the request word (perhaps in all its forms), but the documents closest in content to the request document.
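To make the idea of "closest in content" concrete, here is a minimal Python sketch that ranks documents against a request made of a whole text fragment. It uses plain bag-of-words cosine similarity, which is a common stand-in and an assumption of this example, not the method used by "Forget-Me-Not". The function names and the sample archive are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_content(request, archive):
    """Return document names ordered from most to least similar to the request."""
    query = term_counts(request)
    scores = {name: cosine_similarity(query, term_counts(text))
              for name, text in archive.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-page archive, for illustration only.
archive = {
    "page_12": "Complex systems evolve toward the edge of chaos.",
    "page_47": "A Turing machine reads symbols from a tape.",
}
ranking = rank_by_content("systems at the edge of chaos and order", archive)
print(ranking)
```

The request here is a text fragment rather than a keyword expression, and the result is an ordering of all documents by similarity rather than a yes/no match.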
The search technology is based on ideas from the complex dynamics of nonlinear systems. Each information image is unambiguously related to a periodic motion of a dynamic system, which serves as the information "storehouse". The relationship between the information and the system dynamics is as follows:
- information image: motion over a periodic orbit (attractor);
- set of information images ("storehouse"): set of periodic orbits of the dynamic system;
- retrieval of an image: setting the initial conditions and transition to the motion over the corresponding periodic orbit.
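As a rough illustration of the correspondence above, the following Python sketch replaces the continuous dynamic system with a toy discrete one. Each stored image (a string) is closed into a cycle, and the "dynamics" is a table mapping every window of `q` consecutive symbols to the symbol that follows it. Recall starts from a short fragment (the initial condition) and follows the table until the orbit closes. This construction is an assumption made for illustration only, not the system's actual algorithm; `store_images`, `recall`, and the window length `q` are hypothetical names.

```python
def store_images(images, q=2):
    """Build a successor table in which every image becomes a periodic orbit."""
    successor = {}
    for image in images:
        n = len(image)
        for i in range(n):
            # Window of q symbols, wrapping around so the image forms a cycle.
            window = tuple(image[(i + k) % n] for k in range(q))
            nxt = image[(i + q) % n]
            # The same window must always lead to the same next symbol.
            if successor.setdefault(window, nxt) != nxt:
                raise ValueError(f"window {window} is ambiguous; increase q")
    return successor


def recall(successor, fragment):
    """Set the initial state to `fragment` and follow the orbit once around."""
    start = tuple(fragment)
    if start not in successor:
        return None  # the fragment does not lie on any stored orbit
    orbit, state = [], start
    while True:
        orbit.append(state[0])
        state = state[1:] + (successor[state],)
        if state == start:  # the orbit has closed
            return "".join(orbit)


table = store_images(["chaos", "orbit"])
print(recall(table, "ao"))  # a rotation of "chaos", starting at the fragment
print(recall(table, "bi"))  # a rotation of "orbit"
```

A fragment of length `q` is enough to select one orbit and reproduce the whole stored image, which mirrors the idea of retrieval by setting initial conditions.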
Facsimile copies of paper documents are OCR-processed to extract their text content, which is then used in the search. Incoming text documents are converted by the Box Manager, a part of the "Forget-Me-Not" information system, into a dynamic archive: a store of text information images. While analyzing the incoming documents, the system creates an artificial language related to the contents of the stored documents. The combination of this language and the dynamic system provides content-sensitive (associative) search for information.
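The text does not specify how the artificial language is built. As a hedged sketch, one can imagine its "words" as word sequences that recur across the archive, each remembered together with the documents in which it occurs. The Python example below takes that assumption literally, using word pairs that appear at least twice. The functions `build_vocabulary` and `parse_request`, the thresholds, and the sample pages are all hypothetical.

```python
import re
from collections import defaultdict


def tokens(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(archive, n=2, min_count=2):
    """Collect word n-grams that recur in the archive, with their documents."""
    occurrences = defaultdict(list)
    for name, text in archive.items():
        words = tokens(text)
        for i in range(len(words) - n + 1):
            occurrences[tuple(words[i:i + n])].append(name)
    return {gram: sorted(set(docs))
            for gram, docs in occurrences.items() if len(docs) >= min_count}


def parse_request(request, vocabulary, n=2):
    """Return the request's n-grams that are vocabulary 'words', with their documents."""
    words = tokens(request)
    grams = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(" ".join(g), vocabulary[g]) for g in grams if g in vocabulary]


# Hypothetical three-page archive, for illustration only.
archive = {
    "p1": "The edge of chaos is where complexity arises.",
    "p2": "Life exists at the edge of chaos.",
    "p3": "Computation requires energy.",
}
vocabulary = build_vocabulary(archive)
links = parse_request("order near the edge of chaos", vocabulary)
print(links)
```

Because the vocabulary is derived from the archive itself, the same request parsed against a different archive would yield different "words" and different document links.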
The user can search for a document in three different ways.
- Unique search. The system looks for a document containing the required text fragment. If the fragment is present in the stored archive, it will be found even when the request does not exactly match its counterpart in the archive (e.g., some words replaced by synonyms, missing or extra words, spelling errors). The typical minimum request length in this mode is approximately 1-2 lines of text.
- Links (associative search). The request is converted into the system's artificial language, which corresponds to the information stored in the archive. The request is then parsed into the stable "words" of that internal language, and the parsing results are displayed. Each element ("word") of the parsed request is a link to at least one document in the archive. Visual examination of the parsed request allows the user to choose the links that seem most informative. In addition, parsing the request gives the user useful hints about which words and combinations can serve as keys to the documents in this particular archive.
- Standard search. The search is performed in the usual way, for a single word or a combination of keywords with logical relations.
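The tolerance to mismatches described for the unique search mode can be illustrated with a small Python sketch. It slides a window over each document and scores it against the request with `difflib.SequenceMatcher` from the standard library, comparing word sequences. This is an assumed stand-in technique chosen for brevity, not the dynamic-system search used by "Forget-Me-Not"; `best_match` and the sample archive are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def best_match(request, archive):
    """Return (score, document name, word offset) of the best-matching window."""
    query = tokens(request)
    best = (0.0, None, -1)
    for name, text in archive.items():
        words = tokens(text)
        last_start = max(len(words) - len(query), 0)
        for start in range(last_start + 1):
            window = words[start:start + len(query)]
            # ratio() is 1.0 for identical word sequences, lower otherwise.
            score = SequenceMatcher(None, query, window).ratio()
            if score > best[0]:
                best = (score, name, start)
    return best


# Hypothetical archive, for illustration only.
archive = {
    "p1": "Complex systems evolve naturally toward the edge of chaos",
    "p2": "A Turing machine reads symbols from an infinite tape",
}
# The request has a missing word, a changed word, and a spelling error.
score, name, start = best_match("systems evolve towards the edge of caos", archive)
print(name, start, round(score, 2))
```

In this sketch a misspelled word simply counts as a mismatched word; the surrounding words still place the fragment in the right document and position.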
The "Forget-Me-Not" information and search system is implemented as a search engine running on a Web server, e.g., MS Personal Information Server (Windows 95/98) or MS Internet Information Server (Windows NT). It can be accessed with ordinary Internet browsers (e.g., Netscape Navigator or MS Internet Explorer). The system can run on a standalone computer, as well as on a server on a local intranet or the Internet.
The installation package of the beta version of the system, along with example archives, is available at http://www.cplire.ru/html/InformChaosLab/products/download.html
- Incoming information is arranged in box-archives. The size of each box is limited by the available computer RAM. The recommended archive size is up to 32 MByte.
Within a box, the user arranges the documents in folders, e.g., according to their content or arrival time. The basic unit of stored information in the system is a document. There are no limits on the size of the folders and documents within the box.
Example. Facsimile electronic copy of a book.
- Box-archive: the book itself;
- Folder: a chapter of the book;
- Document: a page of the book.
Alternatively, for a collection of books:
- Box-archive: books (a bookshelf);
- Folder: a book;
- Document: a chapter of the book.
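The box, folder, and document hierarchy from the example can be modeled with a few Python dataclasses. This is only a sketch of the data layout described in the text. The class and method names are hypothetical, and the size check assumes that the recommended 32 MByte applies to the stored text and that 1 MByte is 2^20 bytes, neither of which the text confirms.

```python
from dataclasses import dataclass, field

# Assumption: "32 MByte" is read as 32 * 2**20 bytes of stored text.
RECOMMENDED_BOX_BYTES = 32 * 1024 * 1024


@dataclass
class Document:
    """The basic unit of stored information, e.g. one page of a book."""
    name: str
    text: str

    def size_bytes(self):
        return len(self.text.encode("utf-8"))


@dataclass
class Folder:
    """A user-defined group of documents, e.g. one chapter."""
    name: str
    documents: list = field(default_factory=list)


@dataclass
class Box:
    """A box-archive holding folders, e.g. one book."""
    name: str
    folders: list = field(default_factory=list)

    def size_bytes(self):
        return sum(doc.size_bytes()
                   for folder in self.folders
                   for doc in folder.documents)

    def within_recommended_size(self):
        return self.size_bytes() <= RECOMMENDED_BOX_BYTES


# First arrangement from the example: book -> chapters -> pages.
book = Box("Complexity", folders=[
    Folder("Chapter 1", documents=[
        Document("page 1", "text of page 1"),
        Document("page 2", "text of page 2"),
    ]),
])
print(book.size_bytes(), book.within_recommended_size())
```

The second arrangement (bookshelf, book, chapter) uses the same three classes; only the meaning assigned to each level changes.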
Search time within a box is less than one second.
- Incoming information (documents) is processed at a rate of 10-20 MByte/hour on a 200 MHz Pentium computer.
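The quoted rate gives a quick estimate of how long it takes to build an archive. The short Python sketch below computes the range for an input of a given size. The function name is hypothetical, and the text does not say whether the rate refers to text or to facsimile data, so the result is only a rough guide.

```python
def processing_hours(size_mbytes, rate_low=10.0, rate_high=20.0):
    """Return (fastest, slowest) processing time in hours for the quoted rate range."""
    return size_mbytes / rate_high, size_mbytes / rate_low


# A box at the recommended maximum size of 32 MByte.
fastest, slowest = processing_hours(32)
print(f"{fastest:.1f} to {slowest:.1f} hours")
```

For a 32 MByte input this gives between 1.6 and 3.2 hours on the hardware named in the text.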
As example archives, we offer facsimile copies of the following books:
- R. Lewin. Complexity: Life at the Edge of Chaos (40 MByte; the text version is 1 MByte).
- R.P. Feynman. Feynman Lectures on Computation (32 MByte; the text version is 1 MByte).
Here, each page of a book is treated as a separate document. The search result is a set of links to the pages containing the found information. Clicking a link takes the user to the text document with the position of the request highlighted. Each text document (page) is linked to its facsimile image.
The information and search system can operate with both text and facsimile documents, or with text documents only. Due to Internet throughput and storage limitations, the Web page offers:
- Installation module of the Forget-Me-Not information system for Windows 95/98/NT;
- Text versions of the above books;
- Separate facsimile electronic copies of the book pages (up to approximately 1 MByte per book).
Support: e-mail firstname.lastname@example.org
phone: +7 495 629 7278.