Home » Server Options » Text & interMedia » Oracle Text query with multiple contains performance
Oracle Text query with multiple contains performance [message #264520] Mon, 03 September 2007 11:20 Go to next message
Agus211
Messages: 39
Registered: September 2007
Member
Hi, Im having performance issues with an Oracle Text query that makes multiple CONTAINS filters. Can you help me tuning it, because Im a begginer in this type of things and I don't know how to do it.

The query is:

SELECT /*+ FIRST_ROWS(50) */
(score (1) + score (2) + score (3)
) AS RANK, ID, TO_CHAR (dateinsert, 'dd/mm/yyyy')
FROM NOTICIAS
WHERE ( contains (resumen, v_palabra, 1) > 0
OR contains (insert, v_palabra, 2) > 0
OR contains (tema, v_palabra, 3) > 0
)
AND ouid = v_ouid
ORDER BY dateinsert DESC;

Someone told me I should'nt have so many "contains" in the query, is that right?.

Thanks in advance.
Re: Oracle Text query with multiple contains performance [message #264554 is a reply to message #264520] Mon, 03 September 2007 13:50 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Theoretically, you should get better performance if you used a multicolumn_datastore and then you would only need one contains clause.

Re: Oracle Text query with multiple contains performance [message #264555 is a reply to message #264520] Mon, 03 September 2007 13:55 Go to previous messageGo to next message
Agus211
Messages: 39
Registered: September 2007
Member
Hi, thank you for answering. Could you explain me a little how should I do it?. Should it be a big performance difference?.
Re: Oracle Text query with multiple contains performance [message #264576 is a reply to message #264555] Mon, 03 September 2007 17:26 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You should see a significant performance improvement with a multi_column_datastore, because it will be using one index instead of three. I have provided a basic example below. You can find detailed information and additional options in the online documentation. For maximum performance you should also have indexes on any columns used in filter conditions, like ouid, or ordering, like dateinsert. Also, you should avoid using reserved words, like insert, as column names.


-- table and data for demo:
SCOTT@10gXE> CREATE TABLE noticias
  2    (id	     NUMBER,
  3  	ouid	     NUMBER,
  4  	dateinsert   DATE,
  5  	resumen      CLOB,
  6  	inserted     VARCHAR2 (10),
  7  	tema	     VARCHAR2 (10),
  8  	search_cols  VARCHAR2 (1))
  9  /

Table created.

SCOTT@10gXE> INSERT ALL
  2  INTO noticias VALUES (1, 10, SYSDATE, 'testing', 'something', 'whatever', null)
  3  INTO noticias VALUES (2, 10, SYSDATE, 'something', 'testing', 'whatever', null)
  4  INTO noticias VALUES (3, 10, SYSDATE, 'something', 'whatever', 'testing', null)
  5  INTO noticias VALUES (4, 10, SYSDATE, 'teting', 'testing', 'testing', null)
  6  INTO noticias VALUES (5, 10, SYSDATE, 'and', 'another', 'thing', null)
  7  SELECT * FROM DUAL
  8  /

5 rows created.


-- create multi_column_datastore and specify columns:
SCOTT@10gXE> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('noticias_mcds', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('noticias_mcds', 'COLUMNS', 'resumen, inserted, tema');
  4  END;
  5  /

PL/SQL procedure successfully completed.


-- create index using multicolumn_datastore:
SCOTT@10gXE> CREATE INDEX noticias_index
  2  ON noticias (search_cols)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS ('DATASTORE noticias_mcds')
  5  /

Index created.


-- variables for searching:
SCOTT@10gXE> VARIABLE v_palabra VARCHAR2 (30)
SCOTT@10gXE> VARIABLE v_ouid	NUMBER
SCOTT@10gXE> EXEC :v_palabra := 'testing'

PL/SQL procedure successfully completed.

SCOTT@10gXE> EXEC :v_ouid    := 10

PL/SQL procedure successfully completed.


-- sample query:
SCOTT@10gXE> COLUMN resumen FORMAT A10 word_wrapped
SCOTT@10gXE> SELECT /*+ FIRST_ROWS(50) */
  2  	    score (1) AS RANK,
  3  	    ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
  4  	    resumen, inserted, tema
  5  FROM   NOTICIAS
  6  WHERE  contains (search_cols, :v_palabra, 1) > 0
  7  AND   ouid = :v_ouid
  8  ORDER BY dateinsert DESC
  9  /

      RANK         ID DATE       RESUMEN    INSERTED   TEMA
---------- ---------- ---------- ---------- ---------- ----------
         3          1 03/09/2007 testing    something  whatever
         3          2 03/09/2007 something  testing    whatever
         3          3 03/09/2007 something  whatever   testing
         7          4 03/09/2007 teting     testing    testing

SCOTT@10gXE> 

Re: Oracle Text query with multiple contains performance [message #264767 is a reply to message #264520] Tue, 04 September 2007 09:07 Go to previous messageGo to next message
Agus211
Messages: 39
Registered: September 2007
Member
Hi, first of all I want to thank you for that example and the advices, they helped me a lot. Now that the performance improved in that query, they told me if I could make an another query that searches in one of the columns that are contained in the multi_column_datastore.

The thing is that I should create an Oracle Text index for resumen, inserted and tema besides noticias_index with this new requirement and use a simple datastore.

Isn't it a mix up of things that would cause a mess?.

Im sorry if I don't explain myself very well, but I don't speak English as I would like to.


Again, thank you.
Re: Oracle Text query with multiple contains performance [message #264830 is a reply to message #264767] Tue, 04 September 2007 12:59 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You can create a section group and add one section per column, then search individual columns using "within".
Re: Oracle Text query with multiple contains performance [message #264845 is a reply to message #264767] Tue, 04 September 2007 14:42 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Here is an expansion of the previous example that includes a section group, one section per column, and examples of searching individual columns using "within".

-- table and data for demo:
SCOTT@10gXE> CREATE TABLE noticias
  2    (id	     NUMBER,
  3  	ouid	     NUMBER,
  4  	dateinsert   DATE,
  5  	resumen  CLOB,
  6  	inserted     VARCHAR2 (10),
  7  	tema	     VARCHAR2 (10),
  8  	search_cols  VARCHAR2 (1))
  9  /

Table created.

SCOTT@10gXE> INSERT ALL
  2  INTO noticias VALUES (1, 10, SYSDATE, 'testing', 'something', 'whatever', null)
  3  INTO noticias VALUES (2, 10, SYSDATE, 'something', 'testing', 'whatever', null)
  4  INTO noticias VALUES (3, 10, SYSDATE, 'something', 'whatever', 'testing', null)
  5  INTO noticias VALUES (4, 10, SYSDATE, 'testing', 'testing', 'testing', null)
  6  INTO noticias VALUES (5, 10, SYSDATE, 'whatever', 'nothing', 'something', null)
  7  SELECT * FROM DUAL
  8  /

5 rows created.


-- create multi_column_datastore and specify columns:
SCOTT@10gXE> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('noticias_mcds', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('noticias_mcds', 'COLUMNS', 'resumen, inserted, tema');
  4  END;
  5  /

PL/SQL procedure successfully completed.


-- create section group and add one section per column:
SCOTT@10gXE> BEGIN
  2    CTX_DDL.CREATE_SECTION_GROUP ('noticias_sg', 'BASIC_SECTION_GROUP');
  3    CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'resumen', 'resumen', TRUE);
  4    CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'inserted', 'inserted', TRUE);
  5    CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'tema', 'tema', TRUE);
  6  END;
  7  /

PL/SQL procedure successfully completed.


-- create index using multicolumn_datastore and section group:
SCOTT@10gXE> CREATE INDEX noticias_index
  2  ON noticias (search_cols)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS
  5    ('DATASTORE	noticias_mcds
  6  	 SECTION GROUP	noticias_sg')
  7  /

Index created.


-- variables for searching:
SCOTT@10gXE> VARIABLE v_palabra VARCHAR2 (30)
SCOTT@10gXE> VARIABLE v_ouid	NUMBER
SCOTT@10gXE> EXEC :v_palabra := 'testing'

PL/SQL procedure successfully completed.

SCOTT@10gXE> EXEC :v_ouid    := 10

PL/SQL procedure successfully completed.


-- search of all columns:
SCOTT@10gXE> COLUMN resumen FORMAT A10 word_wrapped
SCOTT@10gXE> SELECT /*+ FIRST_ROWS(50) */
  2  	    score (1) AS RANK,
  3  	    ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
  4  	    resumen, inserted, tema
  5  FROM   NOTICIAS
  6  WHERE  contains (search_cols, :v_palabra, 1) > 0
  7  AND   ouid = :v_ouid
  8  ORDER BY dateinsert DESC
  9  /

      RANK         ID DATE       RESUMEN    INSERTED   TEMA
---------- ---------- ---------- ---------- ---------- ----------
         3          1 04/09/2007 testing    something  whatever
         3          2 04/09/2007 something  testing    whatever
         3          3 04/09/2007 something  whatever   testing
        10          4 04/09/2007 testing    testing    testing


-- searches of individual columns using "within":
SCOTT@10gXE> SELECT score (1) AS RANK,
  2  	    ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
  3  	    resumen, inserted, tema
  4  FROM   NOTICIAS
  5  WHERE  contains (search_cols, :v_palabra || ' WITHIN resumen', 1) > 0
  6  /

      RANK         ID DATE       RESUMEN    INSERTED   TEMA
---------- ---------- ---------- ---------- ---------- ----------
         4          1 04/09/2007 testing    something  whatever
         4          4 04/09/2007 testing    testing    testing

SCOTT@10gXE> SELECT score (1) AS RANK,
  2  	    ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
  3  	    resumen, inserted, tema
  4  FROM   NOTICIAS
  5  WHERE  contains (search_cols, :v_palabra || ' WITHIN inserted', 1) > 0
  6  /

      RANK         ID DATE       RESUMEN    INSERTED   TEMA
---------- ---------- ---------- ---------- ---------- ----------
         4          2 04/09/2007 something  testing    whatever
         4          4 04/09/2007 testing    testing    testing

SCOTT@10gXE> SELECT score (1) AS RANK,
  2  	    ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
  3  	    resumen, inserted, tema
  4  FROM   NOTICIAS
  5  WHERE  contains (search_cols, :v_palabra || ' WITHIN tema', 1) > 0
  6  /

      RANK         ID DATE       RESUMEN    INSERTED   TEMA
---------- ---------- ---------- ---------- ---------- ----------
         4          3 04/09/2007 something  whatever   testing
         4          4 04/09/2007 testing    testing    testing

SCOTT@10gXE> 

Re: Oracle Text query with multiple contains performance [message #265603 is a reply to message #264520] Thu, 06 September 2007 12:53 Go to previous messageGo to next message
Agus211
Messages: 39
Registered: September 2007
Member
Thank you for all the examples. I'd like to ask you since Im new on this if having only one Oracle Text index with multicolumn datastore and sections is more performant than having 3 single column Oracle Text indexes and one multicolumn index.

Im asking you this because I don't know if when you want to query one of the sections declared in the index, it "moves around" all the sections and then retrieves the one I desire. Does this happen?. If this is true, should I have 3 Oracle Text indexes besides the multicolumn one to make single columns queries?.

Thank you.
Re: Oracle Text query with multiple contains performance [message #265631 is a reply to message #265603] Thu, 06 September 2007 15:02 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
The multi_column_datatstore was invented for this purpose because it is faster. When in doubt, test and compare for yourself. If you do not get better performance then there may be other reasons. If you are searching on all three columns, then the sections are not involved. If you are searching within only one or two columns, then it is still faster than using separate indexes. With separate indexes you would likely find that the execution plan would use only one index or none of them and too many contains clauses and or conditions might just result in an error. One index with a multi_column_datastore and sections is the right way to do this. Oracle Text indexes are not like regular indexes. The data is stored in tables. For example, if your index is named your_index, when you create a context index, it creates several tables, such as dr$your_index$i, and query results are retrieved based on what is in those tables. If you "select token_text from dr$your_index$i;", you will see all of the words that have been tokenized, indexed, and are searchable. There are other columns in that table that relate it to the rows in your data table. Some of it is stored in a format that requires usage of Oracle functions to decipher it. You really need to start reading the Oracle Text Reference and Oracle Text Application Developer's Guide in the online documentation. It is all searchable. For example, you can search for "multi_column_datastore" and find plenty of information and examples. If this is all too complicated for you at this time, then you may want to consider hiring a consultant. You mentioned a problem with English previously, but your English is clear. However, if you prefer to post in Spanish or some other language, you may do so in the separate forums listed at the bottom of the main forum page.
Re: Oracle Text query with multiple contains performance [message #265712 is a reply to message #265631] Fri, 07 September 2007 02:12 Go to previous messageGo to next message
guinevere
Messages: 1
Registered: September 2007
Junior Member
I have similar situation where I need to create a unified search tool from various table and various columns. To achieve this I am planning to create multi_column_datastore that includes a section group, one section per column.

My Text Table would look like this:
CREATE TABLE myTable
2 (id NUMBER,
3 col1 CLOB,
4 col2 CLOB,
5 col3 CLOB,
6 col4 CLOB)

Each columns are populated with text delimited from various columns from various tables.
For Example:
col1 - Data1<:>Data2<:>Data3<:>Data4...etc
Table1 - Data1 Varchar2(200)
Data2 Clob
Table2 - Data3 Varchar2(400)
Data4 Clob

Use context search to achive unified search from various tables and various columns. Please advice whether the approach is correct.

I know I shall hit the road block on the CLOB datasize while concatinating the text from various columns from various tables. How do you solve this one?




Re: Oracle Text query with multiple contains performance [message #265899 is a reply to message #265712] Fri, 07 September 2007 14:34 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
If your columns come from different tables, then you should use a user_datastore with a procedure to append the clobs, with tags inbetween to mark the sections. Please see the example below that includes explained plans to show index usage. I originally posted the example below on another forum, in response to someone who was using two context indexes on the individual columns, querying with two contains clauses and getting poor performance and the explain plan showed that neither context index was being used. You want to use one index whenever possible and avoid multiple contains clauses. Unless you have a data warehouse situation, if your data is going to change, you should also avoid creating additional tables that select columns from your base tables, as this will create synchronization problems. You should also be aware that Oracle Text will not allow you to create more than one index of the same type on one column.


SCOTT@10gXE> CREATE TABLE foros_mens
  2    (id	   NUMBER,
  3  	visible    VARCHAR2 (1),
  4  	fecha	   DATE,
  5  	titulo	   VARCHAR2 (30),
  6  	CONSTRAINT foros_mens_id_pk PRIMARY KEY (id))
  7  /

Table created.

SCOTT@10gXE> INSERT ALL
  2  INTO foros_mens VALUES (1, 's', SYSDATE, 'xxxxx')
  3  INTO foros_mens VALUES (2, 's', SYSDATE, 'whatever')
  4  INTO foros_mens VALUES (3, 's', SYSDATE, 'xxxxx')
  5  INTO foros_mens VALUES (4, 's', SYSDATE, 'whatever')
  6  SELECT * FROM DUAL
  7  /

4 rows created.

SCOTT@10gXE> CREATE TABLE foros_texto
  2    (id_mens    NUMBER REFERENCES foros_mens (id),
  3  	texto	   CLOB,
  4  	CONSTRAINT foros_texto_id_mens_fk FOREIGN KEY (id_mens)
  5  		   REFERENCES foros_mens (id))
  6  /

Table created.

SCOTT@10gXE> INSERT ALL
  2  INTO foros_texto VALUES (1, 'whatever')
  3  INTO foros_texto VALUES (2, 'xxxxx')
  4  INTO foros_texto VALUES (3, 'xxxxx')
  5  INTO foros_texto VALUES (4, 'whatever')
  6  SELECT * FROM DUAL
  7  /

4 rows created.

SCOTT@10gXE> CREATE OR REPLACE PROCEDURE foros_proc
  2    (p_rowid IN     ROWID,
  3  	p_clob	IN OUT CLOB)
  4  AS
  5  BEGIN
  6    FOR r IN
  7  	 (SELECT m.titulo, t.texto
  8  	  FROM	 foros_mens m, foros_texto t
  9  	  WHERE  m.id=t.id_mens
 10  	  AND	 t.ROWID = p_rowid)
 11    LOOP
 12  	 DBMS_LOB.WRITEAPPEND (p_clob, 8, '<titulo>');
 13  	 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r.titulo), r.titulo);
 14  	 DBMS_LOB.WRITEAPPEND (p_clob, 16, '</titulo><texto>');
 15  	 DBMS_LOB.APPEND (p_clob, r.texto);
 16  	 DBMS_LOB.WRITEAPPEND (p_clob, 8, '</texto>');
 17    END LOOP;
 18  END foros_proc;
 19  /

Procedure created.

SCOTT@10gXE> SHOW ERRORS
No errors.
SCOTT@10gXE> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('foros_datastore', 'USER_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('foros_datastore', 'PROCEDURE', 'foros_proc');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@10gXE> CREATE INDEX foros_index ON foros_texto (texto)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS
  4    ('DATASTORE	foros_datastore
  5  	 SECTION GROUP	CTXSYS.AUTO_SECTION_GROUP')
  6  /

Index created.

SCOTT@10gXE> SET AUTOTRACE ON EXPLAIN
SCOTT@10gXE> COLUMN texto FORMAT A30 WORD_WRAPPED
SCOTT@10gXE> -- search of both columns:
SCOTT@10gXE> SELECT *
  2  FROM   foros_mens m, foros_texto t
  3  WHERE  m.visible = 's'
  4  AND    CONTAINS (t.texto,'xxxxx') > 0
  5  AND    m.fecha > (SYSDATE - 90)
  6  AND    m.id = t.id_mens
  7  /

        ID V FECHA     TITULO                            ID_MENS TEXTO
---------- - --------- ------------------------------ ---------- ------------------------------
         1 s 05-SEP-07 xxxxx                                   1 whatever
         2 s 05-SEP-07 whatever                                2 xxxxx
         3 s 05-SEP-07 xxxxx                                   3 xxxxx


Execution Plan
----------------------------------------------------------
Plan hash value: 396663544

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |     1 |  2068 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |     1 |  2068 |     5   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| FOROS_TEXTO      |     1 |  2027 |     4   (0)| 00:00:01 |
|*  3 |    DOMAIN INDEX              | FOROS_INDEX      |       |       |     4   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| FOROS_MENS       |     1 |    41 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN         | FOROS_MENS_ID_PK |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CTXSYS"."CONTAINS"("T"."TEXTO",'xxxxx')>0)
   4 - filter("M"."VISIBLE"='s' AND "M"."FECHA">SYSDATE@!-90)
   5 - access("M"."ID"="T"."ID_MENS")

Note
-----
   - dynamic sampling used for this statement

SCOTT@10gXE> -- searches of individual columns:
SCOTT@10gXE> SELECT *
  2  FROM   foros_mens m, foros_texto t
  3  WHERE  m.visible = 's'
  4  AND    CONTAINS (t.texto,'xxxxx WITHIN titulo') > 0
  5  AND    m.fecha > (SYSDATE - 90)
  6  AND    m.id = t.id_mens
  7  /

        ID V FECHA     TITULO                            ID_MENS TEXTO
---------- - --------- ------------------------------ ---------- ------------------------------
         1 s 05-SEP-07 xxxxx                                   1 whatever
         3 s 05-SEP-07 xxxxx                                   3 xxxxx


Execution Plan
----------------------------------------------------------
Plan hash value: 396663544

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |     1 |  2068 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |     1 |  2068 |     5   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| FOROS_TEXTO      |     1 |  2027 |     4   (0)| 00:00:01 |
|*  3 |    DOMAIN INDEX              | FOROS_INDEX      |       |       |     4   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| FOROS_MENS       |     1 |    41 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN         | FOROS_MENS_ID_PK |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CTXSYS"."CONTAINS"("T"."TEXTO",'xxxxx WITHIN titulo')>0)
   4 - filter("M"."VISIBLE"='s' AND "M"."FECHA">SYSDATE@!-90)
   5 - access("M"."ID"="T"."ID_MENS")

Note
-----
   - dynamic sampling used for this statement

SCOTT@10gXE> SELECT *
  2  FROM   foros_mens m, foros_texto t
  3  WHERE  m.visible = 's'
  4  AND    CONTAINS (t.texto,'xxxxx WITHIN texto') > 0
  5  AND    m.fecha > (SYSDATE - 90)
  6  AND    m.id = t.id_mens
  7  /

        ID V FECHA     TITULO                            ID_MENS TEXTO
---------- - --------- ------------------------------ ---------- ------------------------------
         2 s 05-SEP-07 whatever                                2 xxxxx
         3 s 05-SEP-07 xxxxx                                   3 xxxxx


Execution Plan
----------------------------------------------------------
Plan hash value: 396663544

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |     1 |  2068 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |     1 |  2068 |     5   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| FOROS_TEXTO      |     1 |  2027 |     4   (0)| 00:00:01 |
|*  3 |    DOMAIN INDEX              | FOROS_INDEX      |       |       |     4   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| FOROS_MENS       |     1 |    41 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN         | FOROS_MENS_ID_PK |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CTXSYS"."CONTAINS"("T"."TEXTO",'xxxxx WITHIN texto')>0)
   4 - filter("M"."VISIBLE"='s' AND "M"."FECHA">SYSDATE@!-90)
   5 - access("M"."ID"="T"."ID_MENS")

Note
-----
   - dynamic sampling used for this statement

SCOTT@10gXE> 

[Updated on: Fri, 07 September 2007 14:38]

Report message to a moderator

Re: Oracle Text query with multiple contains performance [message #270393 is a reply to message #264520] Wed, 26 September 2007 14:24 Go to previous messageGo to next message
elin@atcc.org
Messages: 4
Registered: September 2007
Junior Member
I had a similar domain index created for all_fields search. However, we ran into a problem that the search didn't return the right result when some culumns have the same values like their culumn names. (e.g. "authenticult" value is stored in authenticult column). The result always returns all rows. Can someone help?
Re: Oracle Text query with multiple contains performance [message #270415 is a reply to message #270393] Wed, 26 September 2007 18:15 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Please provide enough code to reproduce the problem.
Re: Oracle Text query with multiple contains performance [message #270428 is a reply to message #270415] Wed, 26 September 2007 21:13 Go to previous messageGo to next message
elin@atcc.org
Messages: 4
Registered: September 2007
Junior Member
Here is a test case. 4 rows are inserted and a context index is created for all fields. The search against all fields returns 4 rows when the search value is same as one of the column names. (It should only return 1 row.) You may get a better idea when you see the DDLs below. Thanks for the help.

/*** CREATE TABLE ***/
create table test (a int, authenticult clob, preceptrol clob, safetsource clob, all_fields char(1));

/*** INSERT DATA ***/
insert into test values (1,'col1','col2','col3',null);
insert into test values (2,'authenticult',null,null,null);
insert into test values (3,null,'preceptrol',null,null);
insert into test values (4,null,null,'safetsource',null);

/*** CREATE INDEX ***/
begin
ctx_ddl.create_preference('test_ds','multi_column_datastore');
ctx_ddl.set_attribute('test_ds','columns','authenticult, preceptrol,safetsource');
end;

create index test_idx on test(all_fields) indextype is ctxsys.context parameters('datastore test_ds sync(on commit)');

/*** SEARCH QUERY ***/
select count(*) from test where contains (all_fields, 'authenticult') >0;
Re: Oracle Text query with multiple contains performance [message #270725 is a reply to message #270428] Thu, 27 September 2007 20:05 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
When you use a multi_column_datastore, it creates a virtual document, similar to what you might create using a user_datastore with a procedure to concatenate the columns. By default, it creates tags of the same names as the column names and uses them as delimiters and tokenizes and indexes them along with the data. You can override this default behavior by setting the delimiter attribute to newline for your multi_column_datastore. The column names then are ignored during indexing and only the data within them is indexed. Please see the reproduction and solution below.

SCOTT@10gXE> -- test environment you provided:
SCOTT@10gXE> 
SCOTT@10gXE> /*** CREATE TABLE ***/
SCOTT@10gXE> create table test
  2    (a	      int,
  3  	authenticult clob,
  4  	preceptrol   clob,
  5  	safetsource  clob,
  6  	all_fields   char(1));

Table created.

SCOTT@10gXE> 
SCOTT@10gXE> /*** INSERT DATA ***/
SCOTT@10gXE> insert into test values (1,'col1','col2','col3',null);

1 row created.

SCOTT@10gXE> insert into test values (2,'authenticult',null,null,null);

1 row created.

SCOTT@10gXE> insert into test values (3,null,'preceptrol',null,null);

1 row created.

SCOTT@10gXE> insert into test values (4,null,null,'safetsource',null);

1 row created.

SCOTT@10gXE> 
SCOTT@10gXE> /*** CREATE INDEX ***/
SCOTT@10gXE> begin
  2    ctx_ddl.create_preference('test_ds','multi_column_datastore');
  3    ctx_ddl.set_attribute('test_ds','columns','authenticult, preceptrol,safetsource');
  4  end;
  5  /

PL/SQL procedure successfully completed.

SCOTT@10gXE> 
SCOTT@10gXE> create index test_idx on test(all_fields)
  2  indextype is ctxsys.context
  3  parameters('datastore test_ds sync(on commit)');

Index created.

SCOTT@10gXE> 
SCOTT@10gXE> -- reproduction:
SCOTT@10gXE> 
SCOTT@10gXE> /*** SEARCH QUERY ***/
SCOTT@10gXE> select count(*) from test where contains (all_fields, 'authenticult') >0;

  COUNT(*)
----------
         4

SCOTT@10gXE> 
SCOTT@10gXE> -- see what is tokenized, indexed, and searchable:
SCOTT@10gXE> column token_text varchar2(30)
SP2-0735: unknown COLUMN option beginning "varchar2(3..."
SCOTT@10gXE> select token_text, token_count from dr$test_idx$i
  2  /

TOKEN_TEXT                                                       TOKEN_COUNT
---------------------------------------------------------------- -----------
AUTHENTICULT                                                               4
COL1                                                                       1
COL2                                                                       1
COL3                                                                       1
PRECEPTROL                                                                 4
SAFETSOURCE                                                                4

6 rows selected.

SCOTT@10gXE> 
SCOTT@10gXE> -- solution:
SCOTT@10gXE> drop index test_idx
  2  /

Index dropped.

SCOTT@10gXE> exec ctx_ddl.set_attribute ('test_ds', 'delimiter', 'newline')

PL/SQL procedure successfully completed.

SCOTT@10gXE> 
SCOTT@10gXE> create index test_idx on test(all_fields)
  2  indextype is ctxsys.context
  3  parameters('datastore test_ds sync(on commit)');

Index created.

SCOTT@10gXE> 
SCOTT@10gXE> 
SCOTT@10gXE> -- see what is tokenized, indexed, and searchable:
SCOTT@10gXE> select token_text, token_count from dr$test_idx$i
  2  /

TOKEN_TEXT                                                       TOKEN_COUNT
---------------------------------------------------------------- -----------
AUTHENTICULT                                                               1
COL1                                                                       1
COL2                                                                       1
COL3                                                                       1
PRECEPTROL                                                                 1
SAFETSOURCE                                                                1

6 rows selected.

SCOTT@10gXE> 
SCOTT@10gXE> 
SCOTT@10gXE> /*** SEARCH QUERY ***/
SCOTT@10gXE> select count(*) from test where contains (all_fields, 'authenticult') >0;

  COUNT(*)
----------
         1

SCOTT@10gXE> 


Re: Oracle Text query with multiple contains performance [message #270947 is a reply to message #264520] Fri, 28 September 2007 09:44 Go to previous messageGo to next message
elin@atcc.org
Messages: 4
Registered: September 2007
Junior Member
This is awesome! You solved my problem, Barbara. Thanks.
Re: Oracle Text query with multiple contains performance [message #271451 is a reply to message #264520] Mon, 01 October 2007 12:36 Go to previous message
elin@atcc.org
Messages: 4
Registered: September 2007
Junior Member
I also got a solution from Oracle support by creating a basic_section_group. It is necessary to use BASIC_SECTION_GROUP so the column name is indexed as section not as a word..

SQL> exec ctx_ddl.create_section_group('test_section_group','BASIC_SECTION_GROUP');

SQL> create index test_idx on test(all_fields) indextype is ctxsys.context parameters('section group test_section_group datastore test_ds sync(on commit)');
Previous Topic: Updating index information without recreating the index AND Problem executing ctx procedures (merged
Next Topic: Some issues about Oracle Full-Text Search Index
Goto Forum:
  


Current Time: Thu Mar 28 05:23:21 CDT 2024