file_datastore not indexing correctly (oracle version 10.2.0.1.0) [message #186033] Fri, 04 August 2006 20:48
ow002633
Problem:

Indexing the same text with direct_datastore and with file_datastore returns different counts.

(1) Created an index on a VARCHAR2 column containing the text of a given file (a single row).
Used direct_datastore to create the index.
Selected count(*) from the index and got 109.

(2) Created an index on a VARCHAR2 column containing the path of a file on the file system; that file holds the same data as in (1).
Used file_datastore to create the index.
Selected count(*) from the index and got 17.

The data that is getting indexed is Arabic.

Queries against the direct_datastore index work as expected, but queries against the file_datastore index do not.

Any ideas why the files are not being parsed correctly, and what I could do to resolve this?

Database properties:

NLS_LANGUAGE = AMERICAN
NLS_CHARACTERSET = AL32UTF8
NLS_NCHAR_CHARACTERSET = AL16UTF16

Commands used to create the indexes:

(1) create index globaldoc_index on globaldoc(text_data)
indextype is ctxsys.CONTEXT
parameters ('datastore CTXSYS.DIRECT_DATASTORE,
filter ctxsys.null_filter, lexer world_lexer, language column language,
charset column characterset');

(2) create index globaldoc_index on globaldoc(filepath)
indextype is ctxsys.CONTEXT
parameters ('datastore CTXSYS.FILE_DATASTORE,
filter ctxsys.null_filter, lexer world_lexer, language column language,
charset column characterset');
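A minimal sketch of how the two counts can be taken and compared, assuming "count(*) from the index" means the rows of the Oracle Text token table, which Oracle Text names DR$<index_name>$I (here DR$GLOBALDOC_INDEX$I, in the index owner's schema). Because both indexes are named globaldoc_index, each query is run after that version of the index has been built.

```sql
-- Assumption: the count is taken from the Oracle Text token table
-- DR$<index_name>$I, i.e. DR$GLOBALDOC_INDEX$I for globaldoc_index.
SELECT COUNT(*) FROM dr$globaldoc_index$i;

-- Listing the tokens themselves (one row per token/type/block) makes it
-- possible to compare the direct_datastore and file_datastore builds.
SELECT token_text, token_count
  FROM dr$globaldoc_index$i
 ORDER BY token_text;
```

Saving the token list from each build and comparing them would show whether the file_datastore version is missing Arabic tokens entirely or producing differently split or garbled tokens.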

Table Structure:

globaldoc_id number
language varchar2(4)
doc_type varchar2(6)
characterset varchar2(20)
text_data varchar2(2000)
filepath varchar2(100)



Data in table globaldoc:

language = ar
characterset = AL32UTF8