Home » SQL & PL/SQL » SQL & PL/SQL » How to verify if string contains unwanted character (Oracle Db 11g and later)
How to verify if string contains unwanted character [message #665448] Wed, 06 September 2017 05:08 Go to next message
ator
Messages: 43
Registered: March 2009
Member
Hi,

before writing here i've been looking all over but i couldn't find how to resolve my problem.

I have to verify if in a give string there are character outside a given list.

I tried to user regular expression but it doesn't seem to work as follows.

regexp_like('test letter ã','[^a-zA-Z0-9\.,;:''\+\-\(\)\?\*\[\]\{\}\\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ]')

In this case it should return false because the letter ã is not in the given list and i can't do the opposite match because i don't know which character they would insert.

I hope you can help me with this problem, otherwise i had to make a list of unwanted characters and match it with it.

Thank you in advance,

Stefano

Re: How to verify if string contains unwanted character [message #665456 is a reply to message #665448] Wed, 06 September 2017 07:18 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
what does above have to do with Oracle DB?

Please read and follow the forum guidelines, to enable us to help you:
OraFAQ Forum Guide
How to use {code} tags and make your code easier to read

Re: How to verify if string contains unwanted character [message #665457 is a reply to message #665456] Wed, 06 September 2017 07:20 Go to previous messageGo to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
iirc regexp_like is an oracle DB function Smile
Re: How to verify if string contains unwanted character [message #665458 is a reply to message #665448] Wed, 06 September 2017 07:59 Go to previous messageGo to next message
flyboy
Messages: 1903
Registered: November 2006
Senior Member
Hi,

Just a few remarks:
The regular expression pattern does not include a space (" "), so it always matching.
Additionally, escaping inside a bracket expression is not taking any effect. On the other hand, the characters "-" and "]" have to be put into special place.
See details here: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended

So, in the end you may end with mask like this (I have not included the "ã" character , as it was not in the original one):
'[^] a-zA-Z0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]'
(hopefully it recognizes all included characters)

[Edit: link refined]

[Updated on: Wed, 06 September 2017 08:22]

Report message to a moderator

Re: How to verify if string contains unwanted character [message #665473 is a reply to message #665448] Wed, 06 September 2017 15:08 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Use TRANSLATE to eliminate the allowed characters and check if the result is null or not.

Re: How to verify if string contains unwanted character [message #665587 is a reply to message #665456] Mon, 11 September 2017 10:29 Go to previous messageGo to next message
ator
Messages: 43
Registered: March 2009
Member
Hi,

sorry for the late reply, i wrote in this secton because is a pl/sql function of oracle 11g and if i got the wrong section i apoligize.
Quote:

Hi,
Just a few remarks:
The regular expression pattern does not include a space (" "), so it always matching.
Additionally, escaping inside a bracket expression is not taking any effect. On the other hand, the characters "-" and "]" have to be put into special place.
See details here: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended

So, in the end you may end with mask like this (I have not included the "ã" character , as it was not in the original one):

'[^] a-zA-Z0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]'

(hopefully it recognizes all included characters)
I tried with the string but i did not succeded, so i'll do a translate and keep updating the character needed to be replaced and the error will occur.

If I find a better solution i will let you know.

Thanks for your help.

Stefano

Re: How to verify if string contains unwanted character [message #665738 is a reply to message #665587] Tue, 19 September 2017 05:08 Go to previous message
quirks
Messages: 82
Registered: October 2014
Member
Hello @ator:

I thought, that @flyboy s statement should be working, but it did not (as you figured out by yourself).
So I played a little bit around and figured that ...
SELECT CASE WHEN REGEXP_LIKE('ã', '[a-zA-Z]') THEN 'true' ELSE 'false' END STR_MATCHES
  FROM DUAL;
... 'ã' is part of the characters defined by '[a-z]'.

As it shows, as long as you are using a ASCII character set in your database the '[a-z]' works as expected:
WITH
    CHARACTERS
    AS
        (SELECT     TO_NUMBER(TO_CHAR(LEVEL - 1)) AS DEC_NUM
               FROM DUAL
         CONNECT BY LEVEL <= POWER(2, 7))
SELECT DEC_NUM
      ,CASE WHEN DEC_NUM < 128 THEN CHR(DEC_NUM) ELSE NULL END          ASCII_CHAR
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:graph:]]') THEN 'x' END  IS_GRAPH
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:blank:]]') THEN 'x' END  IS_BLANK
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:alnum:]]') THEN 'x' END  IS_ALNUM
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:alpha:]]') THEN 'x' END  IS_ALPHA
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[a-z-Z]') THEN 'x' END      A_Z_TEST
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:digit:]]') THEN 'x' END  IS_DIGIT
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:cntrl:]]') THEN 'x' END  IS_CNTRL
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:lower:]]') THEN 'x' END  IS_LOWER
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:upper:]]') THEN 'x' END  IS_UPPER
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:print:]]') THEN 'x' END  IS_PRINT
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:punct:]]') THEN 'x' END  IS_PUNCT
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:space:]]') THEN 'x' END  IS_SPACE
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:xdigit:]]') THEN 'x' END IS_XDIGIT
  FROM CHARACTERS

But when your character set is bigger (as it is in nearly all oracle installations) 'ã' is part of the [a_z] range.
WITH
    CHARACTERS
    AS
        (SELECT     TO_NUMBER(TO_CHAR(LEVEL - 1))                                               AS DEC_NUM
                   ,REGEXP_REPLACE(REGEXP_REPLACE(TO_CHAR(LEVEL - 1, 'XXXX'), '^ ', '\'), ' ', '0') AS HEX_NUM
               FROM DUAL
         CONNECT BY LEVEL <= POWER(2, 16))
SELECT DEC_NUM
      ,CASE WHEN DEC_NUM < 128 THEN CHR(DEC_NUM) ELSE NULL END             ASCII_CHAR
      ,HEX_NUM
      ,UNISTR(HEX_NUM)                                                     UTF8_CHAR
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:graph:]]') THEN 'x' END  IS_GRAPH
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:blank:]]') THEN 'x' END  IS_BLANK
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:alnum:]]') THEN 'x' END  IS_ALNUM
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:alpha:]]') THEN 'x' END  IS_ALPHA
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[a-z-Z]') THEN 'x' END      A_Z_TEST
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:digit:]]') THEN 'x' END  IS_DIGIT
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:cntrl:]]') THEN 'x' END  IS_CNTRL
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:lower:]]') THEN 'x' END  IS_LOWER
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:upper:]]') THEN 'x' END  IS_UPPER
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:print:]]') THEN 'x' END  IS_PRINT
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:punct:]]') THEN 'x' END  IS_PUNCT
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:space:]]') THEN 'x' END  IS_SPACE
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:xdigit:]]') THEN 'x' END IS_XDIGIT
  FROM CHARACTERS
 WHERE 
       UNISTR(HEX_NUM) = 'ã' AND
       (
          DECODE(UNISTR(HEX_NUM), REGEXP_SUBSTR(UNISTR(HEX_NUM), '[a-z-Z]'), 1) = 1 OR 
          DECODE(UNISTR(HEX_NUM), REGEXP_SUBSTR(UNISTR(HEX_NUM), '[[:alpha:]]'), 1) = 1
       );

You can find out which character set you are using with this statement:
SELECT *
  FROM NLS_DATABASE_PARAMETERS
 WHERE PARAMETER LIKE 'NLS_%CHARACTERSET';
'NLS_CHARACTERSET' is used in CHAR and VARCHAR columns.
'NLS_NCHAR_CHARACTERSET' in NCHAR and NVARCHAR columns (which can store the full Unicode character set).

In conclusion you can not use [a-z] but must enlist each character by itself:
SELECT CASE WHEN REGEXP_LIKE('ã', '[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]') THEN 'true' ELSE 'false' END STR_MATCHES
  FROM DUAL;

To bring @flyboy s statement to life you just need to replace the range definitions by their individual values:
SELECT CASE
           WHEN REGEXP_LIKE(
                    'test letter ã'
                   ,'[^] abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]'
                )
           THEN
               'true'
           ELSE
               'false'
       END
           STR_MATCHES
  FROM DUAL;
Previous Topic: How to insert HINDI word in ORACLE
Next Topic: finding status
Goto Forum:
  


Current Time: Thu Mar 28 07:41:01 CDT 2024