Regular Expression or Regex - NLP


Jun 14, 2025 - 13:50

Let's start with a naive approach to finding a word in a sentence.

print('How' in 'How are you')
True
print('how' in 'How are you')
False

As you can see, the check is case-sensitive: 'How' with an uppercase H returns True, but 'how' in lowercase returns False.
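The case sensitivity shown above can be worked around, still without regular expressions, by lowercasing both strings before comparing them. A minimal sketch; the variable names are only for illustration:

```python
sentence = 'How are you'

# Lowercase both sides so the comparison ignores case.
found = 'how'.lower() in sentence.lower()
print(found)  # True
```

This only answers whether the word is present. It does not tell you where it is, which is what the regular expression functions below add.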

To use regular expressions, you need to import the re module.

import re

1. Searching

Using re.search(), we can search for a word in a text.

import re
match = re.search('you', 'How are you?')
print(match)

This prints a match object, which shows the matched text 'you' and its position in the string: span=(8, 11).

print(match.span())

Use the span() method to get only the start and end indexes of the match.

Keep in mind that re.search() returns only the first match: if you search for 'you', it reports only the index of the first occurrence.

import re
match = re.search('you', 'How are you you you?')
print(match.span())
Output:
(8, 11)
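One detail worth handling: when the pattern is not found, re.search() returns None, so calling span() on the result raises an AttributeError. A small sketch of a guard, where first_span is a hypothetical helper name and not part of the re module:

```python
import re

def first_span(pattern, text):
    """Return the (start, end) span of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.span()

print(first_span('you', 'How are you you you?'))   # (8, 11)
print(first_span('they', 'How are you you you?'))  # None
```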

2. Findall

findall() finds all occurrences of a pattern in a string.

Let me explain what "pattern" means here.

Take the words are, area, are. findall() matches all three. Why? Because 'area' also contains the pattern 'are'.

import re
text = 'Hello how are area are you?'
pattern = 'are'
result = re.findall(pattern, text)
print(result)
Output:
['are', 'are', 'are']
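If you want only the standalone word and not the 'are' inside 'area', one option is to wrap the pattern in \b word boundaries. This is a sketch of that idea on the same sentence; the raw string (r'...') keeps the backslashes intact for the regex engine:

```python
import re

text = 'Hello how are area are you?'

# \b marks a word boundary, so the 'are' inside 'area' is skipped.
whole_words = re.findall(r'\bare\b', text)
print(whole_words)  # ['are', 'are']
```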

3. Finditer

finditer() iterates through the string and yields a match object for each occurrence, so you can use it to find the index of every match.

import re
for match in re.finditer('hello', 'hello world hello earth hello universe'):
    print(match.span())
Output:
(0, 5)
(12, 17)
(24, 29)
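Because each item from finditer() is a match object, you can read more than the span from it. A short sketch that collects the matched text together with its start and end index, using the group(), start() and end() methods:

```python
import re

text = 'hello world hello earth hello universe'

# Build one (matched text, start, end) tuple per occurrence.
matches = [(m.group(), m.start(), m.end()) for m in re.finditer('hello', text)]
print(matches)
# [('hello', 0, 5), ('hello', 12, 17), ('hello', 24, 29)]
```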

4. Creating Regular Expression

Let's say we want to find a 10-digit phone number in a text. We need a pattern that matches 10 digits in a row.

import re
text = 'My Phone number is 8899446677'
# Raw string (r'...') so that \d reaches the regex engine unchanged.
pattern = r'\d\d\d\d\d\d\d\d\d\d'
print(re.search(pattern, text))
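Writing \d ten times works but is easy to miscount. As a sketch, the same match can be expressed with a {n} quantifier, which repeats the preceding token exactly n times:

```python
import re

text = 'My Phone number is 8899446677'

long_form = re.search(r'\d\d\d\d\d\d\d\d\d\d', text)
short_form = re.search(r'\d{10}', text)  # \d repeated exactly 10 times

print(short_form.group())                     # 8899446677
print(long_form.span() == short_form.span())  # True
```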

A more compact way to write it, using quantifiers (here the number is written with hyphens):

import re
text = 'My Phone number is 88-9944-6677'
pattern = r'\d{2}-\d{4}-\d{4}'
print(re.search(pattern, text))

import re
text = 'My Phone number is 88-9944-6677'
pattern = r'\d{2}-\d{4}-\d{4}'
print(re.search(pattern, text).group())

Now you tell me: what will this print?
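As a possible next step, you can put parentheses around parts of the pattern to capture them separately. This is a sketch that goes slightly beyond the examples above: groups() returns all captured parts as a tuple, and group(n) returns the n-th one.

```python
import re

text = 'My Phone number is 88-9944-6677'
match = re.search(r'(\d{2})-(\d{4})-(\d{4})', text)

# Guard against None in case the text contains no phone number.
if match:
    print(match.groups())  # ('88', '9944', '6677')
    print(match.group(2))  # 9944
```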