Home » Server Options » Text & interMedia » Problem with Catsearch and special characters in UTF8 (Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod)
Problem with Catsearch and special characters in UTF8 [message #364835] Wed, 10 December 2008 03:54 Go to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi,

We have the following problems with catsearch:

In: searchAllSongs Sqlcode: -29902 SqlErrm: ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-51030: wildcard query expansion resulted in too many terms;

The searchString that is being sent to the DB is: note™*

When removing the copyright sign the query works fine.

I then decided put the trademarksign in the skipjoin and recreated the textindex. But the error is still being produced.

select dump('™') from dual;
Typ=96 Len=3: 226,132,162

Code which is generating the skipjoin:
ctx_ddl.set_attribute(mainViewName || '_' || suffix || '_SpecChLex', 'skipjoins', '`-=[];''\,./~!@#$%^&*()_+{}:"|<>?§&acute;&uml;&frac12;&frac14;&frac34;&curren;£¤©™®');

When i check the user_source it seems that it has been compiled properly (see bold part which is representing the trademark sign):
SELECT DUMP(text, 1010) DMP FROM user_source e where text like '%skipjoin%'
Typ=1 Len=150 CharacterSet=AL32UTF8: 32,32,32,32,99,116,120,95,100,100,108,46,115,101,116,95,97,116,116,114,105,98,117,116,101,40,109,97,105,110,86,105,101,119,78,97,109, 101,32,124,124,32,39,95,39,32,124,124,32,115,117,102,102,105,120,32,124,124,32,39,95,83,112,101,99,67,104,76,101,120,39,44,32,39,115, 107,105,112,106,111,105,110,115,39,44,32,39,96,45,61,91,93,59,39,39,92,44,46,47,126,33,64,35,36,37,94,38,42,40,41,95,43,123,125,58,34 ,124,60,62,63,194,167,194,180,194,168,194,189,194,188,194,190,194,164,194,163,226,130,172,194,169,226,132,162,194,174,39,41,59,10

If a characther is in a skipjoin I thought that it would be removed before the search is executed. Is this wrong?

Hope somebody could help me or know if I should contact support for this matter.

Thanks,

Slavko
Re: Problem with Catsearch and special characters in UTF8 [message #364918 is a reply to message #364835] Wed, 10 December 2008 08:16 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
More on above issue:
My investigations shows following:

if i search on:

note™*

it translates it to following search:
note *
which means we search on note AND "everything else" the everything else causing the "wildcard query
expansion error"

I would have thought if I did not have it in the skipjoin it would have searched on note™*.
After putting it in the skipjoin and recreating the index I expected it to search on note*.
However in both cases it is as if it replaces the trademark sign with a space.

This happens with other signs as well. Such as:
©¤®

Any idea ?
Re: Problem with Catsearch and special characters in UTF8 [message #364967 is a reply to message #364918] Wed, 10 December 2008 12:55 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
What does the following query return? (substituting your index name):

SELECT COUNT (DISTINCT dr$token)
FROM dr$your_index_name$i
WHERE dr$token LIKE 'NOTE%';

and what is the value of wildcard_maxterms that you are using? If you have not assigned a value, then the default is 5000 for your version. In order for the query to succeed the wildcard_maxterms must be greater than the count returned by the above query. I would recommend creating a wordlist and setting the wildcard_maxterms to the maximum value of 15000 for your version. If that does not solve the problem then you will have to find a way to make your queries more restrictive. I am unable to reproduce anything that indicates that your query is searching for "NOTE *" with a space instead of "NOTE*" without a space. That should expand to all tokens that begin with "NOTE". However, I am using 11g, so there could be a 10g bug that I am not encountering. Please see my sample code below, using as much of your code as was provided and adding the basic_wordlist and wildcard_maxterms. I included one word with the trademark in the middle, just to show that the skipjoins is treating it properly by removing it, not replacing it with a space when indexing. However, it sounds like you are saying your indexing is working properly, just that the search string may not be interpreted properly.

Perhaps you could provide the results of:

SELECT CTX_REPORT.CREATE_INDEX_SCRIPT ('your_index_name') FROM DUAL;

so that I could do a more thorough check. There may be some conflicting parameters or some such thing.

SCOTT@orcl_11g> CREATE TABLE test_tab
  2    (test_col VARCHAR2(60))
  3  /

Table created.

SCOTT@orcl_11g> INSERT ALL
  2  INTO test_tab VALUES ('word1 word2 word3')
  3  INTO test_tab VALUES ('word1 note™ word3')
  4  INTO test_tab VALUES ('word1 note™d word3')
  5  INTO test_tab VALUES ('word1 noted word3')
  6  INTO test_tab VALUES ('word1 notes word3')
  7  INTO test_tab VALUES ('word1 noteworthy word3')
  8  SELECT * FROM DUAL
  9  /

6 rows created.

SCOTT@orcl_11g> SET DEFINE OFF
SCOTT@orcl_11g> DECLARE
  2    mainViewName VARCHAR2(4) := 'test';
  3    suffix	    VARCHAR2(3) := 'idx';
  4  BEGIN
  5    CTX_DDL.CREATE_PREFERENCE
  6  	 (mainViewName || '_' || suffix || '_SpecChLex',
  7  	  'BASIC_LEXER');
  8    ctx_ddl.set_attribute
  9  	 (mainViewName || '_' || suffix || '_SpecChLex',
 10  	  'skipjoins',
 11  	  '`-=[];''\,./~!@#$%^&*()_+{}:"|<>?§&acute;&uml;&frac12;&frac14;&frac34;&curren;£¤©™®');
 12    CTX_DDL.CREATE_PREFERENCE
 13  	 (mainViewName || '_' || suffix || '_wordlist',
 14  	  'BASIC_WORDLIST');
 15    CTX_DDL.SET_ATTRIBUTE
 16  	 (mainViewName || '_' || suffix || '_wordlist',
 17  	  'WILDCARD_MAXTERMS', 15000);
 18  END;
 19  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX test_idx ON test_tab (test_col)
  2  INDEXTYPE IS CTXSYS.CTXCAT
  3  PARAMETERS
  4    ('LEXER	  test_idx_SpecChLex
  5  	 WORDLIST test_idx_wordlist')
  6  /

Index created.

SCOTT@orcl_11g> COLUMN wmt HEADING 'WILDCARD_MAX_TERMS|MUST BE GREATER THAN THIS'
SCOTT@orcl_11g> SELECT COUNT (DISTINCT dr$token) AS wmt
  2  FROM  dr$test_idx$i WHERE dr$token LIKE 'NOTE%'
  3  /

       WILDCARD_MAX_TERMS
MUST BE GREATER THAN THIS
-------------------------
                        4

SCOTT@orcl_11g> SELECT DISTINCT dr$token FROM dr$test_idx$i
  2  /

DR$TOKEN
----------------------------------------------------------------
NOTE
NOTED
NOTES
NOTEWORTHY
WORD1
WORD2
WORD3

7 rows selected.

SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CATSEARCH (test_col, 'note™*', NULL) > 0
  2  /

TEST_COL
------------------------------------------------------------
word1 note™ word3
word1 note™d word3
word1 noted word3
word1 notes word3
word1 noteworthy word3

SCOTT@orcl_11g> 

Re: Problem with Catsearch and special characters in UTF8 [message #365204 is a reply to message #364835] Thu, 11 December 2008 01:45 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi Barbara and thanks for your quick reply,

I am pretty sure that it is replacing the sign with a space.

I did following search:

select * from mv_song_se
where catsearch(searchText, 'bruce©springsteen*', 'order by sortorder') > 0;

returning 487 rows;

select * from mv_song_se
where catsearch(searchText, 'bruce springsteen*', 'order by sortorder') > 0;

returning 487 rows;

select * from mv_song_se
where catsearch(searchText, 'brucespringsteen*', 'order by sortorder') > 0;

no rows returned;

It only happens with a few signs as I mentioned earlier. All other signs we are using are being handled properly. There is no difference having the sign in the skipjoin or not.

This behaviour only gives us the problem when the user searches with such a sign at the end of his searchstring (as we always append wildcard searches for strings longer than 2 characters)

user is entering the following searchstring: bruce©
The front end is appending the wildcard search which gives: bruce©*
The bug is making following happen: bruce *

What do you get from following query?

select dump('™') from dual;

As we have more than 2 million songs in our table concataneted with album name and artist we are getting the "wildcard query expansion" error.

SELECT CTX_REPORT.CREATE_INDEX_SCRIPT ('MV_SONG_SE_BATMAN_IND') FROM DUAL;

begin
  ctx_ddl.create_preference('"MV_SONG_SE_BATMAN_IND_LEX"','BASIC_LEXER');
  ctx_ddl.set_attribute('"MV_SONG_SE_BATMAN_IND_LEX"','SKIPJOINS','`-=[];'\,./~!@#$%^&*()_+{}:"|<>?§&acute;&uml;&frac12;&frac14;&frac34;&curren;£¿©¿®');
end;
/

begin
  ctx_ddl.create_preference('"MV_SONG_SE_BATMAN_IND_WDL"','BASIC_WORDLIST');
  ctx_ddl.set_attribute('"MV_SONG_SE_BATMAN_IND_WDL"','STEMMER','ENGLISH');
  ctx_ddl.set_attribute('"MV_SONG_SE_BATMAN_IND_WDL"','FUZZY_MATCH','GENERIC');
end;
/

begin
  ctx_ddl.create_stoplist('"MV_SONG_SE_BATMAN_IND_SPL"','BASIC_STOPLIST');
end;
/

begin
  ctx_ddl.create_preference('"MV_SONG_SE_BATMAN_IND_STO"','BASIC_STORAGE');
  ctx_ddl.set_attribute('"MV_SONG_SE_BATMAN_IND_STO"','R_TABLE_CLAUSE','lob (data) store as (cache)');
  ctx_ddl.set_attribute('"MV_SONG_SE_BATMAN_IND_STO"','I_INDEX_CLAUSE','compress 2');
end;
/

begin
  ctx_ddl.create_index_set('"MV_SONG_SE_BATMAN_IND_IXS"');
  ctx_ddl.add_index('"MV_SONG_SE_BATMAN_IND_IXS"','GENREID');
  ctx_ddl.add_index('"MV_SONG_SE_BATMAN_IND_IXS"','SORTORDER');
end;
/


begin
  ctx_output.start_log('MV_SONG_SE_BATMAN_IND_LOG');
end;
/

create index "MEDIA_CORE"."MV_SONG_SE_BATMAN_IND" 
  on "MEDIA_CORE"."MV_SONG_SE_BATMAN"
      ("SEARCHTEXT")
  indextype is ctxsys.ctxcat
  parameters('
    lexer           "MV_SONG_SE_BATMAN_IND_LEX"
    wordlist        "MV_SONG_SE_BATMAN_IND_WDL"
    stoplist        "MV_SONG_SE_BATMAN_IND_SPL"
    storage         "MV_SONG_SE_BATMAN_IND_STO"
    index set       "MV_SONG_SE_BATMAN_IND_IXS"
  ')
/

begin
  ctx_output.end_log;
end;
/

Re: Problem with Catsearch and special characters in UTF8 [message #365218 is a reply to message #364835] Thu, 11 December 2008 02:35 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Also following your test I am not getting the same results which leads me to believe that it really is a 10g bug:

SELECT DISTINCT dr$token FROM dr$test_idx$i;
[COLOR=red]D[/COLOR]
NOTE
NOTED
NOTES
NOTEWORTHY
WORD1
WORD2
WORD3

SELECT * FROM test_tab WHERE CATSEARCH (test_col, 'note™*', NULL) > 0;
word1 note™ word3
word1 note™d word3


Which suggests it has indexed the words according:
word1 note d word3

instead of
word1 noted word3

Bug or what do you think?

Slavko


SELECT CTX_REPORT.CREATE_INDEX_SCRIPT ('test_idx') FROM DUAL;

begin
  ctx_ddl.create_preference('"TEST_IDX_LEX"','BASIC_LEXER');
  ctx_ddl.set_attribute('"TEST_IDX_LEX"','SKIPJOINS','`-=[];'\,./~!@#$%^*()_+{}:"|<>?§£¤©™®');
end;
/

begin
  ctx_ddl.create_preference('"TEST_IDX_WDL"','BASIC_WORDLIST');
  ctx_ddl.set_attribute('"TEST_IDX_WDL"','WILDCARD_MAXTERMS','15000');
end;
/

begin
  ctx_ddl.create_stoplist('"TEST_IDX_SPL"','BASIC_STOPLIST');
  {removed the stopwords}
end;
/

begin
  ctx_ddl.create_preference('"TEST_IDX_STO"','BASIC_STORAGE');
  ctx_ddl.set_attribute('"TEST_IDX_STO"','R_TABLE_CLAUSE','lob (data) store as (cache)');
  ctx_ddl.set_attribute('"TEST_IDX_STO"','I_INDEX_CLAUSE','compress 2');
end;
/

begin
  ctx_ddl.create_index_set('"TEST_IDX_IXS"');
end;
/


begin
  ctx_output.start_log('TEST_IDX_LOG');
end;
/

create index "MEDIA_CORE"."TEST_IDX" 
  on "MEDIA_CORE"."TEST_TAB"
      ("TEST_COL")
  indextype is ctxsys.ctxcat
  parameters('
    lexer           "TEST_IDX_LEX"
    wordlist        "TEST_IDX_WDL"
    stoplist        "TEST_IDX_SPL"
    storage         "TEST_IDX_STO"
    index set       "TEST_IDX_IXS"
  ')
/

begin
  ctx_output.end_log;
end;
/
Re: Problem with Catsearch and special characters in UTF8 [message #365343 is a reply to message #365218] Thu, 11 December 2008 10:31 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You have convinced me that the trademark character and some others are being converted to whitespace on your system. However, even with the index script, I am unable to reproduce it on my system. So, that brings us to what is different. It could be the version or it could be the character set or it could be some default value. I am thinking that there may be something that is overriding the skipjoins on those characters to cause them to be converted to whitespace. Please try including explicitly specifying your whitespace characters in your lexer and see if that fixes the problem. If you specify just a space as whitespace, it should eliminate any other characters being treated as whitespace.

ctx_ddl.set_attribute ('TEST_IDX_LEX', 'WHITESPACE', ' ');
Re: Problem with Catsearch and special characters in UTF8 [message #366311 is a reply to message #365218] Fri, 12 December 2008 13:46 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
I saw your posts on asktom.oracle.com that included your trace files:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:440419921146#1365474700346010440

I can see that behind the scenes, catsearch is doing a select from the domain index tables using a bind variable something like this:

SCOTT@orcl_11g> VARIABLE lkexpr VARCHAR2(30)
SCOTT@orcl_11g> EXEC :lkexpr := 'NOTE%'

PL/SQL procedure successfully completed.


LKEXPR
--------------------------------
NOTE%

SCOTT@orcl_11g> SELECT/*+DYNAMIC_SAMPLING(0) INDEX(T "DR$TEST_IDX$X")*/ DISTINCT DR$TOKEN FROM
2 "DR$TEST_IDX$I" T WHERE DR$TOKEN LIKE :lkexpr ESCAPE '\' and
3 (DR$TOKEN_TYPE IN (0, 4, 6, 604) OR (DR$TOKEN_TYPE BETWEEN 16 AND 74) OR
4 (DR$TOKEN_TYPE BETWEEN 616 AND 674))
5 /

DR$TOKEN
----------------------------------------------------------------
NOTE
NOTED
NOTES
NOTEWORTHY

SCOTT@orcl_11g>

I suspect that your index is being created properly but something is happening to the :lkexpr bind variable being used, such that it does not contain the value 'NOTE%' as it should. This could be due to some overriding parameter such as whitespace attribute or a conflict with the character set or a bug.

I see that Tom Kyte referred you to the OTN forums. I recommend that you post your problem on the OTN Text forum:

http://forums.oracle.com/forums/forum.jspa?forumID=71

Oracle employee Roger Ford regularly responds there. He has been involved in Oracle Text for many years and I believe he is the current product manager. He should be able to determine whether the problem is due to a bug or not. You might want to include links to this thread and asktom for his reference to save yourself some repetition.

[Updated on: Fri, 12 December 2008 13:50]

Report message to a moderator

Re: Problem with Catsearch and special characters in UTF8 [message #375777 is a reply to message #366311] Sun, 14 December 2008 09:02 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi Barbara,
I did some test last friday and this seems to have solved the problem:

ctx_ddl.set_attribute ('TEST_IDX_LEX', 'WHITESPACE', ' ');

I did the test on the small test table I am getting correct results from

SELECT DISTINCT dr$token FROM dr$test_idx$i;

Why do you think the index was being populated properly before? The SELECT DISTINCT dr$token FROM dr$test_idx$i; gave me wrong results before.

However thanks a lot. It seems to have solved my problem. I will update askTom once I have done the test on my real table. Still thinks it is a bug which probably will be solved when we upgrade to 11g. I will install 11g as well just to test as well.

Once again thanks a lot !!

Best regards,
Slavko
Re: Problem with Catsearch and special characters in UTF8 [message #375791 is a reply to message #375777] Sun, 14 December 2008 13:20 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
I am glad your problem is solved. I thought the index was O.K. because I thought your posted results were the same as mine. I missed the D. So, It looks like the problem was just that, by default, it was treating the trademark symbol as whitespace. It is documented that when there are conflicts between parameters such as whitespace and skipjoins, priority is given to one and the other is ignored. However, unless you have changed the default parameters, only the space character and the tab character should be treated as whitespace.

The default lexer can be changed using ctx_adm. You can confirm whether your system is using the original default lexer or not like so:

SCOTT@orcl_11g> SELECT par_value
  2  FROM   ctx_parameters
  3  WHERE  par_name = 'DEFAULT_LEXER'
  4  /

PAR_VALUE
--------------------------------------------------------------------------------
CTXSYS.DEFAULT_LEXER


You can also check whether any values within the default lexer have been changed, like so:

SCOTT@orcl_11g> SELECT prv_attribute, prv_value
  2  FROM   ctx_preference_values
  3  WHERE  prv_owner || '.' || prv_preference =
  4  	    (SELECT par_value
  5  	     FROM   ctx_parameters
  6  	     WHERE  par_name = 'DEFAULT_LEXER')
  7  /

PRV_ATTRIBUTE
------------------------------
PRV_VALUE
--------------------------------------------------------------------------------
WHITESPACE



SCOTT@orcl_11g>

Re: Problem with Catsearch and special characters in UTF8 [message #375792 is a reply to message #364835] Sun, 14 December 2008 13:32 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Yes I tried to highlight the D with red colour but it wouldnt take it in a code block. And now it is difficult to see.
I'll test these sql's back in the office tomorrow.

Best regards,
Slavko
Re: Problem with Catsearch and special characters in UTF8 [message #381888 is a reply to message #364835] Tue, 20 January 2009 04:10 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi Barabra,

Its been a while but have ben busy with other tasks unfortunately. However the WHITESPACE does not work. I did this test on my laptop which has WE and not UTF8 character set (works well on WE but not UTF8). I will post my problem on the OTN forum you suggested. Another totally different question if its ok. When people search for Beyoncé I want them to find Beyoncé and Beyonce. What is the easiest way to implement this? I cannot change all data to Beyoncé.

Thanks,
Re: Problem with Catsearch and special characters in UTF8 [message #381918 is a reply to message #364835] Tue, 20 January 2009 04:56 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi Barbara,

Found a solution to the é e problem.
BASIC_LEXER attribute BASE_LETTER to YES

Cheers,
Re: Problem with Catsearch and special characters in UTF8 [message #381972 is a reply to message #364835] Tue, 20 January 2009 08:53 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi Barbara,

Would you know how to query the base letter table ? (Oracle Table where all the base letter are stored)

With regards,
Slavko
Re: Problem with Catsearch and special characters in UTF8 [message #382110 is a reply to message #381972] Wed, 21 January 2009 00:24 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
sbrkic wrote on Tue, 20 January 2009 06:53

Would you know how to query the base letter table ? (Oracle Table where all the base letter are stored)



I can't even find a "base letter table". I know the documentation refers to there being such a table, but it is not listed in the data dictionary views and I don't see it, so it must be hidden. But, why do you think you need to query such a table? Any modification to any letter, such as an accent, maps to the base letter.
Re: Problem with Catsearch and special characters in UTF8 [message #382117 is a reply to message #364835] Wed, 21 January 2009 00:40 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
We have offered the customer to introduce BASIC_LEXER attribute BASE_LETTER to YES. So they want to know exactly which letters will be transformed (to see if it will cover their needs). So I have tried to find this table to give them a report of the letters. I have not been able to find it in the documentation nor through googling.

Slavko
Re: Problem with Catsearch and special characters in UTF8 [message #382121 is a reply to message #364835] Wed, 21 January 2009 00:55 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Hi Barbara,
I opened the locale builder application. Under New Linguistic sort I chose a WE collation name. I got some result under the BASER_LETTER tab. Do not know if this is the real thing or not. We are using UTF8 and I was not able to chose that.

Slavko
Re: Problem with Catsearch and special characters in UTF8 [message #382127 is a reply to message #364835] Wed, 21 January 2009 01:15 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
I guess the one to use is the generic_baseletter ? i am not sure however. It looks OK.
Re: Problem with Catsearch and special characters in UTF8 [message #382136 is a reply to message #382121] Wed, 21 January 2009 01:49 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You can derive all of the conversions by loading all possible ascii characters into a test table, indexing them using a lexer with base_letter, then selecting from a join of the test table and the domain index table, as demonstrated below.

SCOTT@orcl_11g> CREATE TABLE test_tab (test_col  VARCHAR2(1))
  2  /

Table created.

SCOTT@orcl_11g> INSERT	INTO test_tab
  2  SELECT  CHR (LEVEL)
  3  FROM    DUAL
  4  CONNECT BY LEVEL <= 256
  5  /

256 rows created.

SCOTT@orcl_11g> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
  3    CTX_DDL.SET_ATTRIBUTE ('test_lex', 'BASE_LETTER', 'YES');
  4    CTX_DDL.SET_ATTRIBUTE ('test_lex', 'MIXED_CASE', 'YES');
  5  END;
  6  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX test_idx ON test_tab (test_col)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS
  4    ('LEXER	   test_lex
  5  	 STOPLIST  CTXSYS.EMPTY_STOPLIST')
  6  /

Index created.

SCOTT@orcl_11g> COLUMN base_letter     FORMAT A11
SCOTT@orcl_11g> COLUMN original_values FORMAT A30
SCOTT@orcl_11g> SELECT i.token_text	      AS base_letter,
  2  	    WM_CONCAT (t.test_col) AS original_values
  3  FROM   test_tab t, dr$test_idx$i i
  4  WHERE  CONTAINS (t.test_col, i.token_text) > 0
  5  AND    t.test_col <> i.token_text
  6  GROUP  BY i.token_text
  7  ORDER  BY i.token_text
  8  /

BASE_LETTER ORIGINAL_VALUES
----------- ------------------------------
A           À,Á,Ã,Â,Ä,Å
C           Ç
E           È,Ê,É,Ë
I           Ì,Ï,Í,Î
N           Ñ
O           Ò,Õ,Ø,Ö,Ô,Ó
S           ¦
U           Ù,Ú,Û,Ü
Y           ¾,Ý
Z           ´
a           à,á,ã,å,ä,â
c           ç
e           è,ê,ë,é
i           ì,í,î,ï
n           ñ
o           ò,õ,ô,ó,ö,ø
s           ¨
u           ù,ú,ü,û
y           ý,ÿ
z           ¸

20 rows selected.

SCOTT@orcl_11g>

Re: Problem with Catsearch and special characters in UTF8 [message #382170 is a reply to message #364835] Wed, 21 January 2009 04:14 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
I see. Very clever indeed.
I have UTF8 10g so it does not work properly. The lengths of the characters are greater then 1 and I get a lot of weird signs such as n and so on.

Cheers,
Re: Problem with Catsearch and special characters in UTF8 [message #382199 is a reply to message #364835] Wed, 21 January 2009 06:01 Go to previous messageGo to next message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
Fixed it. I had to modify according to the following. Thanks a lot.

INSERT INTO t_tab
    SELECT CHR (LEVEL), level
      FROM DUAL
     WHERE length (CHR (LEVEL)) = 1
   CONNECT BY LEVEL <= 100000 /* lot of characters in the 50000 region */
;

SELECT i.token_text           AS base_letter,
       WM_CONCAT (t.test_col) AS original_values
  FROM t_tab t, dr$t_idx$i i
 WHERE CONTAINS (t.test_col, i.token_text) > 0
   AND ascii(t.test_col) <> ascii(i.token_text)
 GROUP BY i.token_text
 ORDER BY i.token_text;

Re: Problem with Catsearch and special characters in UTF8 [message #382613 is a reply to message #364835] Fri, 23 January 2009 04:06 Go to previous message
sbrkic
Messages: 16
Registered: December 2008
Junior Member
The original problem seems so be related with bug 4158093 and 6127196. This is like 4 years old. Probably have to live with it.

Thanks for your help Barbara

BR,
Slavko
Previous Topic: Text index performance guidelines
Next Topic: CTX_DOC.Snippet not recognized special characters like ',""
Goto Forum:
  


Current Time: Thu Mar 28 07:39:47 CDT 2024