Home » Server Options » Text & interMedia » Stem operator ($) problem
Stem operator ($) problem [message #269471] |
Sat, 22 September 2007 12:46 |
amazing
Messages: 46 Registered: September 2007 Location: Venezuela
|
Member |
|
|
Hi. I have a problem with Oracle Text stem operator($). I’m developing my own “Spanish” text search application by the OracleŽ Text Reference 10g Release 2 (10.2) B14218-01 (June 2005) documentation. So, I have the following structure:
A base table to store de text with two fields: the key (NUMBER) and another field (VARCHAR) to store the text.
These are my own preferences:
ctx_ddl.create_preference('FNTC_Lexer', 'BASIC_LEXER');
ctx_ddl.set_attribute('FNTC_Lexer', 'base_letter', 'yes');
ctx_ddl.set_attribute('FNTC_Lexer', 'base_letter_type', 'GENERIC');
ctx_ddl.set_attribute('FNTC_Lexer', 'override_base_letter', 'false');
ctx_ddl.set_attribute('FNTC_Lexer', 'mixed_case', 'NO');
ctx_ddl.set_attribute('FNTC_Lexer', 'index_themes', 'NO');
ctx_ddl.set_attribute('FNTC_Lexer', 'index_stems', 'SPANISH');
ctx_ddl.set_attribute('FNTC_Lexer', 'index_text', 'YES');
-----------------------------------------------------------------
ctx_ddl.create_preference('FNTC_Word_List', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('FNTC_Word_List', 'stemmer', 'SPANISH');
ctx_ddl.set_attribute('FNTC_Word_List', 'fuzzy_match', 'SPANISH');
ctx_ddl.set_attribute('FNTC_Word_List', 'fuzzy_score', '60');
ctx_ddl.set_attribute('FNTC_Word_List', 'fuzzy_numresults', '100');
ctx_ddl.set_attribute('FNTC_Word_List', 'substring_index', 'true');
ctx_ddl.set_attribute('FNTC_Word_List', 'prefix_index', 'true');
ctx_ddl.set_attribute('FNTC_Word_List', 'prefix_min_length', '1');
ctx_ddl.set_attribute('FNTC_Word_List', 'prefix_max_length', '50');
ctx_ddl.set_attribute('FNTC_Word_List', 'wildcard_maxterms', '50');
-----------------------------------------------------------------
ctx_ddl.create_stoplist('FNTC_Stop_List', 'BASIC_STOPLIST');
In this last case I use an empty stop list because hay I do not want to use any stop words.
This is the index structure:
create index IDX_DENOMINACION on dbusqueda(descripcion)
indextype is ctxsys.context
parameters('LEXER FNTC_Lexer WORDLIST FNTC_Word_List STOPLIST FNTC_Stop_List SYNC (ON COMMIT)');
I have this data in the base table:
ID DESCRIPCION
-----------------
1 PEPITO
2 PEPITA
3 PEPITE
4 PEPITIN
5 PEPITU
6 PEPE
So here is de problem. When I use the stem operator ($):
select score(1), t.*
from dbusqueda t
where contains(descripcion, '$pepito', 1) > 0 order by score(1) desc;
it should return all rows. Doesn’t it??
But only returns
ID DESCRIPTION
-----------------
1 PEPITO
6 PEPE
So I did the book’s stem example (chapter 3, Oracle Text CONTAINS Query Operators, page 3-36) to make me sure is problem of my solution. And I change my preferences attributes INDEX_STEMS, STEMMER and FUZZY_MATCH values to ENGLISH, change my base table data to values same as the example:
ID DESCRIPCION
-----------------
1 scream
2 screaming
3 screamed
then I rebuild the index
ALTER INDEX IDX_DENOMINACION REBUILD;
So, when I use the stem operator once again:
select score(1), t.*
from dbusqueda t
where contains(descripcion, '$scream', 1) > 0 order by score(1) desc;
it should return all rows. Doesn’t it??
But only returns
ID DESCRIPCION
-----------------
1 scream
When would have to be all the rows, just like the example.
So…what is wrong here…?
|
|
|
Goto Forum:
Current Time: Thu Apr 25 18:15:43 CDT 2024
|