Home » Server Options » Text & interMedia » Stem operator ($) problem
icon5.gif  Stem operator ($) problem [message #269471] Sat, 22 September 2007 12:46 Go to previous message
amazing
Messages: 46
Registered: September 2007
Location: Venezuela
Member
Hi. I have a problem with Oracle Text stem operator($). I’m developing my own “Spanish” text search application by the OracleŽ Text Reference 10g Release 2 (10.2) B14218-01 (June 2005) documentation. So, I have the following structure:

A base table to store de text with two fields: the key (NUMBER) and another field (VARCHAR) to store the text.

These are my own preferences:

ctx_ddl.create_preference('FNTC_Lexer', 'BASIC_LEXER');

ctx_ddl.set_attribute('FNTC_Lexer', 'base_letter', 'yes');

ctx_ddl.set_attribute('FNTC_Lexer', 'base_letter_type', 'GENERIC');

ctx_ddl.set_attribute('FNTC_Lexer', 'override_base_letter', 'false');

ctx_ddl.set_attribute('FNTC_Lexer', 'mixed_case', 'NO');

ctx_ddl.set_attribute('FNTC_Lexer', 'index_themes', 'NO');

ctx_ddl.set_attribute('FNTC_Lexer', 'index_stems', 'SPANISH');

ctx_ddl.set_attribute('FNTC_Lexer', 'index_text', 'YES');

-----------------------------------------------------------------

ctx_ddl.create_preference('FNTC_Word_List', 'BASIC_WORDLIST');

ctx_ddl.set_attribute('FNTC_Word_List', 'stemmer', 'SPANISH');

ctx_ddl.set_attribute('FNTC_Word_List', 'fuzzy_match', 'SPANISH');

ctx_ddl.set_attribute('FNTC_Word_List', 'fuzzy_score', '60');

ctx_ddl.set_attribute('FNTC_Word_List', 'fuzzy_numresults', '100');

ctx_ddl.set_attribute('FNTC_Word_List', 'substring_index', 'true');

ctx_ddl.set_attribute('FNTC_Word_List', 'prefix_index', 'true');

ctx_ddl.set_attribute('FNTC_Word_List', 'prefix_min_length', '1');

ctx_ddl.set_attribute('FNTC_Word_List', 'prefix_max_length', '50');

ctx_ddl.set_attribute('FNTC_Word_List', 'wildcard_maxterms', '50');

-----------------------------------------------------------------

ctx_ddl.create_stoplist('FNTC_Stop_List', 'BASIC_STOPLIST');

In this last case I use an empty stop list because hay I do not want to use any stop words.

This is the index structure:

create index IDX_DENOMINACION on dbusqueda(descripcion)
indextype is ctxsys.context
parameters('LEXER FNTC_Lexer WORDLIST FNTC_Word_List STOPLIST FNTC_Stop_List SYNC (ON COMMIT)');

I have this data in the base table:

ID DESCRIPCION
-----------------
1 PEPITO
2 PEPITA
3 PEPITE
4 PEPITIN
5 PEPITU
6 PEPE

So here is de problem. When I use the stem operator ($):

select score(1), t.*
from dbusqueda t
where contains(descripcion, '$pepito', 1) > 0 order by score(1) desc;

it should return all rows. Doesn’t it??

But only returns

ID DESCRIPTION
-----------------
1 PEPITO
6 PEPE

So I did the book’s stem example (chapter 3, Oracle Text CONTAINS Query Operators, page 3-36) to make me sure is problem of my solution. And I change my preferences attributes INDEX_STEMS, STEMMER and FUZZY_MATCH values to ENGLISH, change my base table data to values same as the example:

ID DESCRIPCION
-----------------
1 scream
2 screaming
3 screamed

then I rebuild the index

ALTER INDEX IDX_DENOMINACION REBUILD;

So, when I use the stem operator once again:

select score(1), t.*
from dbusqueda t
where contains(descripcion, '$scream', 1) > 0 order by score(1) desc;

it should return all rows. Doesn’t it??

But only returns

ID DESCRIPCION
-----------------
1 scream

When would have to be all the rows, just like the example.

So…what is wrong here…?










 
Read Message icon5.gif
Read Message
Read Message icon12.gif
Previous Topic: Upgrade Oracle Text 8i to 10g
Next Topic: Disable indexing for specific records
Goto Forum:
  


Current Time: Thu Apr 25 18:15:43 CDT 2024