Regular Expression¶

If we want to represent a group of strings according to a particular format or pattern, we should use a regular expression.
That is, a regular expression is a declarative mechanism for representing a group of strings that follow a particular format or pattern.

EXAMPLES¶

Q1. Write a regular expression to represent all mobile numbers¶

[6-9][0-9]{9}

Q2. Write a regular expression to represent all mail ids¶

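[a-zA-Z0-9][a-zA-Z0-9._]*@gmail\.com (Gmail addresses only; this is the pattern used in the mail id validation program later in this section)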

Application areas of Regular Expression¶

Pattern matching applications
- Ctrl+F => Windows
- grep => Linux
To perform validations and to develop validation frameworks.
To develop translators such as compilers, interpreters, and assemblers.
- Compiler design
  - Lexical analysis
  - Syntax analysis
  - Semantic analysis
  - Intermediate code generation
  - Code optimization
  - Target code generation
To develop digital circuits such as an incrementer.
To develop communication protocols such as TCP/IP, UDP, and HTTP.

In Python, regular expressions are provided by the re module, which contains several functions (the Java counterpart is the java.util.regex package).

Methods¶

1. compile()¶

The re module contains the compile() function, which compiles a pattern string into a pattern object.

pattern = re.compile("ab")

Pattern Matching Application¶

Search pattern: ab
Target string: abaabababa
The pattern ab occurs at indexes 0, 3, 5, and 7 => 4 times in total

import re

# Compile the string into a pattern object
pattern = re.compile('ab')

# finditer() returns an iterator that yields the matches one by one
matcher = pattern.finditer('abaabababa')

count = 0
for match in matcher:
    print(match.start())
    count += 1

print(f'The number of occurrences: {count}')

# Output
# 0
# 3
# 5
# 7
# The number of occurrences: 4

start() => returns the start index of the match
end() => returns the end index of the match + 1 (one past the last matched character)
group() => returns the matched string
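
To make these three methods concrete, here is a small sketch that reuses the search pattern and target string from the example above. The helper name describe_matches is only illustrative and is not part of the re module.

```python
import re


def describe_matches(pattern, target):
    """Return (start, end, matched_text) for every match of pattern in target."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, target)]


spans = describe_matches('ab', 'abaabababa')
for start, end, text in spans:
    # end is one past the last matched index, so target[start:end] equals text
    print(start, end, text)
```

Because end() is one past the last matched character, the pair (start(), end()) can be used directly as a slice of the target string.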

Character classes¶

We can use character classes to search for a group of characters.

[abc] => a or b or c
[^abc] => Any character except a, b, and c
[a-z] => Any lowercase alphabet symbol
[A-Z] => Any uppercase alphabet symbol
[a-zA-Z] => Any alphabet symbol
[0-9] => Any digit
[0-9a-zA-Z] => Any alphanumeric character
[^0-9a-zA-Z] => Any character except an alphanumeric character (special characters)
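
The classes listed above can be tried out with findall(), which is described later in this section. This is a minimal sketch; the target string is made up for illustration.

```python
import re

target = 'a7b@k9z'

in_abc = re.findall('[abc]', target)          # characters that are a, b, or c
not_abc = re.findall('[^abc]', target)        # every character except a, b, and c
lowercase = re.findall('[a-z]', target)       # lowercase letters
digits = re.findall('[0-9]', target)          # digits
special = re.findall('[^0-9a-zA-Z]', target)  # non-alphanumeric characters

for name, found in [('[abc]', in_abc), ('[^abc]', not_abc), ('[a-z]', lowercase),
                    ('[0-9]', digits), ('[^0-9a-zA-Z]', special)]:
    print(name, found)
```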

The class [0-9] (any digit from 0 to 9) is common enough to have a shortcut: \d.

Predefined character classes¶

\d => Any digit from 0 to 9 [0-9]
\D => Any character except a digit [^0-9]
\w => Any word character [a-zA-Z0-9_]
\W => Any character except a word character [^a-zA-Z0-9_] (special characters)
\s => Space (whitespace) character
\S => Any character except a space character
. => Any character, including special characters (except a newline, by default)
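
As a quick check of the shortcuts above, the sketch below runs each one over a short made-up string. Note that in Python \w also matches the underscore, which is why `_` shows up among the word characters.

```python
import re

target = 'a_7 @Z'

digits = re.findall(r'\d', target)       # digits only
word_chars = re.findall(r'\w', target)   # letters, digits, and the underscore
non_word = re.findall(r'\W', target)     # everything \w does not match
spaces = re.findall(r'\s', target)       # whitespace characters
any_char = re.findall(r'.', target)      # every character (no newline present here)

print(digits, word_chars, non_word, spaces, any_char, sep='\n')
```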

import re

matcher = re.finditer(' ', 'abb7@k 9 yYz')

count = 0
for match in matcher:
    print(f'{match.start()}........{match.group()}')
    count = count + 1

print(f"The number of occurrences: {count}")

# 6........
# 8........
# The number of occurrences: 2

Quantifiers¶

We can use quantifiers to specify the number of occurrences to match.

- a => Exactly one 'a'
- a+ => At least one 'a'
- a* => Any number of a's, including zero (a* = a+ ∪ {ε}, where ε is the empty string)
- a? => At most one 'a' (zero or one)
- a{m} => Exactly m a's
- a{m,n} => A minimum of m and a maximum of n a's
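
The quantifiers above can be checked one at a time with fullmatch(), which is described later in this section and succeeds only when the whole string matches. This is a minimal sketch; the helper name fully_matches is hypothetical.

```python
import re


def fully_matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


checks = [
    ('a+', ''),          # at least one a is required
    ('a+', 'aaa'),
    ('a*', ''),          # zero a's are allowed
    ('a?', 'a'),
    ('a?', 'aa'),        # at most one a
    ('a{3}', 'aaa'),
    ('a{2,3}', 'aa'),
    ('a{2,3}', 'aaaa'),  # more than the maximum of 3
]

results = [fully_matches(pattern, text) for pattern, text in checks]
for (pattern, text), ok in zip(checks, results):
    print(f'{pattern!r:10} {text!r:8} {ok}')
```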

Important functions of the re module¶

1. match()
2. fullmatch()
3. search()
4. findall()
5. finditer()
6. sub()
7. subn()
8. split()
9. compile()

1. match()¶

- We can use the match() function to check whether the given pattern is present at the beginning of the target string.
- If a match is available, we get a match object; otherwise we get None.

import re

p = input("Enter pattern to check: ")
m = re.match(p, 'abcdefgh')

# `if m:` works as well, because a match object is truthy and None is falsy
if m is not None:
    print(f"Target string starts with {m.group()}")
else:
    print(f"Target string does not start with {p}")


# Enter pattern to check: abc
# Target string starts with abc

# Enter pattern to check: ashish
# Target string does not start with ashish
2. fullmatch()¶

We can use the fullmatch() function to check whether the entire target string matches the given pattern.
If it matches, we get a match object; otherwise we get None.
m = re.fullmatch(p,"abcdefgh")

import re

num = input("Enter mobile number to validate: ")
# pattern = "[6-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"
pattern = "[6-9][0-9]{9}"

match = re.fullmatch(pattern, num)

if match:
    print("Valid 10-digit mobile number")
else:
    print("Invalid 10-digit mobile number")


# Enter mobile number to validate: 9815039236
# Valid 10-digit mobile number

# Enter mobile number to validate: 1234567891
# Invalid 10-digit mobile number
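
search() is the third function in the list above but has no worked example at this point. Going by its one-line description in the summary further down (it returns the first occurrence of the match), a minimal sketch might look like this; the pattern is chosen only for illustration.

```python
import re

target = 'abaabababa'

first = re.search('aab', target)
if first:
    # search() scans the whole string and stops at the first match
    print(f"First match at index {first.start()}: {first.group()}")
else:
    print("No match found")

# match() only looks at the beginning of the string, so it finds nothing here
at_start = re.match('aab', target)
print(at_start)
```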

4. findall()¶

To find all occurrences of the match
This function returns list object which contains all occurrences.

import re

l = re.findall('[0-9]', 'a7b9k2@5kmn4')
print(l)

# ['7', '9', '2', '5', '4']

5. finditer()¶

Returns an iterator yielding a match object for each match.
On each match object, we can call start(), end(), and group().

import re

matcher = re.finditer('[0-9]',"a7b9k2@skmn4")
for match in matcher:
    print(f"{match.start()}.....{match.group()}")

# 1.....7
# 3.....9
# 5.....2
# 11.....4

6. sub()¶

sub means substitution or replacement.
re.sub(pattern, replacement_string, target_string)

import re

s = re.sub('[0-9]', '#', 'a7b9kz@5kmn4')
print(s)

# a#b#kz@#kmn#

7. subn()¶

It is exactly the same as sub(), except that it also returns the number of replacements.
The return type is a tuple:
(result string, number of replacements)

import re

t = re.subn('[0-9]','#','a7b9kz@5kmn4')

print(t)
print('The Result string',t[0])
print('The number of replacement',t[1])

# ('a#b#kz@#kmn#', 4)
# The Result string a#b#kz@#kmn#
# The number of replacement 4

8. split()¶

We can use the split() function to split the target string according to the pattern.

import re

l = re.split('-','27-11-2020')
print(l)

for s in l:
    print(s)


# ['27', '11', '2020']
# 27
# 11
# 2020
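
The example above splits on a single fixed character, which str.split() can also do. re.split() becomes useful when the separator varies. A small sketch, assuming a date whose parts may be separated by a hyphen, a slash, or a dot:

```python
import re

# The character class matches any one of the three separators
parts = re.split('[-/.]', '27/11-2020')
print(parts)
```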

match() => To check whether the target string starts with the specified pattern
fullmatch() => To check whether the entire string matches the pattern
search() => To return the first occurrence of the match
findall() => To return all matches
finditer() => To return an iterator object which yields match objects
sub() => To replace every occurrence of the pattern with the provided replacement string
subn() => Same as sub(), but also returns the number of replacements
split() => To split the target string on the given pattern
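
To see these functions side by side, the sketch below applies most of them to one made-up target string (finditer() is left out because it already has its own example above).

```python
import re

target = 'ab12ab'

results = {
    'match': re.match('ab', target),          # match object: target starts with 'ab'
    'fullmatch': re.fullmatch('ab', target),  # None: the whole string is not just 'ab'
    'search': re.search('[0-9]+', target),    # first run of digits
    'findall': re.findall('ab', target),      # every occurrence of 'ab'
    'sub': re.sub('ab', '#', target),         # replace every 'ab' with '#'
    'subn': re.subn('ab', '#', target),       # same, plus the replacement count
    'split': re.split('[0-9]+', target),      # split on the run of digits
}

for name, value in results.items():
    print(f'{name:10} {value!r}')
```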

^ symbol and $ symbol¶

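^ symbol = starts with
re.search('[^ab]', target_string) ==> inside square brackets, ^ means "not" (any character except a and b)
re.search('^ab', target_string) ==> outside square brackets, ^ means "starts with"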
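
The two meanings of ^ are easy to confuse, so here is a minimal sketch that runs both forms against the same short string (the string is made up for illustration).

```python
import re

target = 'abc'

# Outside square brackets, ^ anchors the pattern to the start of the string
starts_with_ab = re.search('^ab', target)
starts_with_bc = re.search('^bc', target)

# Inside square brackets, ^ negates the class: the first character that is not a or b
first_non_ab = re.search('[^ab]', target)

print(starts_with_ab.group())                      # the matched prefix
print(starts_with_bc)                              # None, 'abc' does not start with 'bc'
print(first_non_ab.start(), first_non_ab.group())  # index and character of the first non-a/b
```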

import re
match = re.search('^learn','learn python is very easy!!')

if match is not None:
    print('Target string starts with our pattern')
else:
    print("Target string does not start with our pattern")

$ symbol¶

$ means ends with

import re

match = re.search('easy$', "learn python is very easy")

if match:
    print('Target string ends with our pattern')
else:
    print("Target string does not end with our pattern")

import re

# Matching is case-sensitive by default, so 'Easy$' does not match here
match = re.search('Easy$', "learn python is very easy")

if match:
    print('Target string ends with our pattern')
else:
    print("Target string does not end with our pattern")

import re

# re.IGNORECASE makes the match case-insensitive, so 'EASY$' matches
match = re.search('EASY$', "learn python is very easy", re.IGNORECASE)

if match:
    print('Target string ends with our pattern')
else:
    print("Target string does not end with our pattern")

Write a regular expression to represent all YAVA language identifiers¶

Rules:

The allowed characters are a-z, A-Z, 0-9, and #.
The first character should be a lowercase alphabet symbol from a to k.
The second character should be a digit divisible by 3.
The length of the identifier should be at least 2, for example a9cd#3.

[a-k][0369][a-zA-Z0-9#]*
[a-zA-Z0-9#]* => The character can occur any number of times, including zero times
+ => The character must occur at least once
? => The character can occur at most once (either once or zero times)

import re

target = input("Enter any identifier to check ")

pattern = "[a-k][0369][a-zA-Z0-9#]*"
match = re.fullmatch(pattern,target)

if match:
    print("Valid identifier")
else:
    print("Invalid identifier")
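
The same pattern can be exercised without keyboard input by wrapping it in a function. This is a sketch only; the name is_valid_identifier and the sample identifiers (apart from a9cd#3, which comes from the rules above) are made up for illustration.

```python
import re

YAVA_IDENTIFIER = re.compile('[a-k][0369][a-zA-Z0-9#]*')


def is_valid_identifier(name):
    """Return True when name satisfies every YAVA identifier rule."""
    return YAVA_IDENTIFIER.fullmatch(name) is not None


samples = [
    'a9cd#3',  # valid: the example from the rules
    'k0',      # valid: the minimum length of 2
    'a',       # invalid: too short
    'z9abc',   # invalid: first character is not in a-k
    'a4abc',   # invalid: second character is not divisible by 3
    'a9cd$3',  # invalid: $ is not an allowed character
]

verdicts = [is_valid_identifier(s) for s in samples]
for s, ok in zip(samples, verdicts):
    print(s, ok)
```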

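Q3. Write a regular expression to represent all 10-digit mobile numbers¶

The number should contain exactly 10 digits.
The first digit should be from 6 to 9 only.

[6-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
or, more compactly:
[6-9][0-9]{9}

To also accept an optional leading 0 or 91 (no spaces inside the pattern, because a space would be matched literally):
[6-9][0-9]{9}|0[6-9][0-9]{9}|91[6-9][0-9]{9}
(0|91)?[6-9][0-9]{9}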
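
As a quick check of the optional-prefix form, the sketch below validates a few numbers against a pattern that allows a leading 0 or 91. The pattern is written without spaces, since a space inside a regular expression is matched literally. The helper name is_valid_mobile is hypothetical; the first and fourth sample numbers come from the fullmatch() example earlier.

```python
import re

MOBILE = re.compile('(0|91)?[6-9][0-9]{9}')


def is_valid_mobile(number):
    """Return True for a 10-digit number, optionally prefixed with 0 or 91."""
    return MOBILE.fullmatch(number) is not None


samples = ['9815039236', '09815039236', '919815039236', '1234567891', '98150']
verdicts = [is_valid_mobile(n) for n in samples]
for n, ok in zip(samples, verdicts):
    print(n, ok)
```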

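Write a program to extract all mobile numbers present in input.txt, where the numbers are mixed with normal text data¶

import re

# Optional prefix 0 or 91, followed by a 10-digit number starting with 6-9.
# The pattern must not contain spaces, because a space is matched literally.
pattern = re.compile('(0|91)?[6-9][0-9]{9}')

# The with statement closes both files even if an error occurs
with open('input.txt', 'r') as f1, open('mobile_number.txt', 'w') as f2:
    for line in f1:
        for match in pattern.finditer(line):
            f2.write(match.group() + '\n')

print("Extraction complete. Open mobile_number.txt to see the results.")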

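A variant of the mobile number pattern that also allows hyphen and parenthesis separators; it prints the matches instead of writing them to a file: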

import re

mobile_regex = r'\b(?:\+?91|0)?[-]?\(?[6789]\d{2}\)?[-]?\d{3}[-]?\d{4}\b'

with open('input.txt', 'r') as file:
    data = file.read()
    # Find all matches of mobile numbers in the data
    mobile_numbers = re.findall(mobile_regex, data)

# Print the extracted mobile numbers
for number in mobile_numbers:
    print(number)

print("Extraction complete.")

Write a program to check whether the given car registration number is a valid Telangana state registration number¶

Vehicle registration

import re

num = input("Enter vehicle registration number to validate: ").upper()

# TS, a two-digit code from 00 to 29, two letters, then four digits
pattern = "TS[012][0-9][A-Z][A-Z][0-9]{4}"

matcher = re.fullmatch(pattern, num)

if matcher:
    print("Valid vehicle registration number")
else:
    print("Invalid vehicle registration number")

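Write a program to check whether the given mail id is valid or not¶

Mail id validation code (Gmail addresses only)

import re

mail = input("Enter mail id: ")

# The dot is escaped so that it matches a literal '.' rather than any character
pattern = r"[a-zA-Z0-9][a-zA-Z0-9._]*@gmail\.com"

matcher = re.fullmatch(pattern, mail)

if matcher:
    print("Valid mail id")
else:
    print("Invalid mail id")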

Q. Sort the list using re¶

# Raw strings (r'...') keep the backslashes literal; without them, '\01' would be read as an escape sequence
file_paths = [
    r'static\pdf\01 Adsolut initiatie\page_1.pdf',
    r'static\pdf\01 Adsolut initiatie\page_10.pdf',
    r'static\pdf\01 Adsolut initiatie\page_11.pdf',
    r'static\pdf\01 Adsolut initiatie\page_12.pdf',
    r'static\pdf\01 Adsolut initiatie\page_13.pdf',
    r'static\pdf\01 Adsolut initiatie\page_14.pdf',
    r'static\pdf\01 Adsolut initiatie\page_15.pdf',
    r'static\pdf\01 Adsolut initiatie\page_16.pdf',
    r'static\pdf\01 Adsolut initiatie\page_17.pdf',
    r'static\pdf\01 Adsolut initiatie\page_18.pdf',
    r'static\pdf\01 Adsolut initiatie\page_19.pdf',
    r'static\pdf\01 Adsolut initiatie\page_2.pdf',
    r'static\pdf\01 Adsolut initiatie\page_20.pdf',
    r'static\pdf\01 Adsolut initiatie\page_21.pdf',
    r'static\pdf\01 Adsolut initiatie\page_22.pdf',
    r'static\pdf\01 Adsolut initiatie\page_23.pdf',
    r'static\pdf\01 Adsolut initiatie\page_24.pdf',
    r'static\pdf\01 Adsolut initiatie\page_25.pdf',
    r'static\pdf\01 Adsolut initiatie\page_26.pdf',
    r'static\pdf\01 Adsolut initiatie\page_27.pdf',
    r'static\pdf\01 Adsolut initiatie\page_28.pdf',
    r'static\pdf\01 Adsolut initiatie\page_29.pdf',
    r'static\pdf\01 Adsolut initiatie\page_3.pdf',
    r'static\pdf\01 Adsolut initiatie\page_30.pdf',
    r'static\pdf\01 Adsolut initiatie\page_4.pdf',
    r'static\pdf\01 Adsolut initiatie\page_5.pdf',
    r'static\pdf\01 Adsolut initiatie\page_6.pdf',
    r'static\pdf\01 Adsolut initiatie\page_7.pdf',
    r'static\pdf\01 Adsolut initiatie\page_8.pdf',
    r'static\pdf\01 Adsolut initiatie\page_9.pdf',
]

TODO: sorting
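
The sorting step is still open here. One possible approach, offered as a sketch rather than a definitive solution, is to pull the page number out of each path with a regular expression and sort on it as an integer, so that page_2 comes before page_10. The list sample_paths is a shortened, hypothetical stand-in for the full list, and page_number is an illustrative helper name. The parentheses in the pattern form a group, and group(1) returns the text that the group matched.

```python
import re

# A shortened, hypothetical sample in the same shape as the full list of paths.
# Raw strings keep the backslashes literal.
sample_paths = [
    r'static\pdf\01 Adsolut initiatie\page_1.pdf',
    r'static\pdf\01 Adsolut initiatie\page_10.pdf',
    r'static\pdf\01 Adsolut initiatie\page_11.pdf',
    r'static\pdf\01 Adsolut initiatie\page_2.pdf',
    r'static\pdf\01 Adsolut initiatie\page_3.pdf',
]


def page_number(path):
    """Extract the integer after 'page_' so that paths sort numerically."""
    match = re.search(r'page_(\d+)\.pdf$', path)
    # Paths without a page number sort first (an arbitrary choice for this sketch)
    return int(match.group(1)) if match else -1


sorted_paths = sorted(sample_paths, key=page_number)
for path in sorted_paths:
    print(path)
```

A plain sorted() call would compare the paths as text and keep page_10 ahead of page_2, which is the order the original list is already in.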

Remove more than one space (split on runs of whitespace)

import re

def split_by_spaces(text):
    # Use regex to split by one or more spaces
    split_text = re.split(r'\s+', text.strip())  # Strip leading/trailing whitespace
    return split_text

# Example usage
original_text = "This   is  an example    string with  multiple spaces."
split_result = split_by_spaces(original_text)

print(f"Original Text: '{original_text}'")
print(f"Split Result: {split_result}")

# Original Text: 'This   is  an example    string with  multiple spaces.'
# Split Result: ['This', 'is', 'an', 'example', 'string', 'with', 'multiple', 'spaces.']
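
The function above returns a list of words. If the goal is a single string with each run of spaces reduced to one, sub() can do that directly. A minimal sketch, with collapse_spaces as an illustrative name:

```python
import re


def collapse_spaces(text):
    """Replace every run of whitespace with a single space."""
    return re.sub(r'\s+', ' ', text.strip())


cleaned = collapse_spaces("This   is  an example    string with  multiple spaces.")
print(cleaned)
```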

Program to find only the digits¶

Programs that keep only the digits of a string

import re

def remove_str_special_char(strings: str):
    # Collect every digit as a separate list element
    l = re.findall(r'\d', strings)
    print(l)

remove_str_special_char("a7b9k2@5kmn4")
# ['7', '9', '2', '5', '4']


def remove_str_special_char_0(strings: str):
    # Collect every non-digit character instead
    l = re.findall(r'\D', strings)
    print(l)

remove_str_special_char_0("a7b9k2@5kmn4")
# ['a', 'b', 'k', '@', 'k', 'm', 'n']


def remove_str_special_char_1(strings: str):
    # Print each digit on its own line
    matcher = re.finditer('[0-9]', strings)
    for match in matcher:
        print(f"{match.group()}")

remove_str_special_char_1("a7b9k2@skmn4")


def extract_digits(strings: str):
    # Remove all non-digit characters and keep only the digits
    digits = re.sub(r'\D', '', strings)
    print(digits)

extract_digits("a7b9k2@5kmn4")

Date parser¶

from dateutil import parser

def validate_date(date_string: str):
    try:
        # Try parsing the date string
        parsed_date = parser.parse(date_string)
        return True  # If no exception occurs, the date is valid
    except (ValueError, TypeError):
        return False  # If parsing fails, return False

# Test cases
print(validate_date("2024-11-25"))  # Valid date
print(validate_date("11/25/2024"))  # Valid date
print(validate_date("25-11-2024"))  # Valid date
print(validate_date("2024-02-30"))  # Invalid date
print(validate_date("V 2024-02-30"))  # Invalid date
print(validate_date("a2024-02-30"))  # Invalid date
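
The dateutil parser accepts many different date formats. Where only one fixed format should be accepted, a regular expression can check the shape of the string and the standard library's datetime can check that the date exists on the calendar. This is a minimal sketch, assuming the YYYY-MM-DD format; the names ISO_DATE and is_valid_iso_date are illustrative.

```python
import re
from datetime import datetime

# Four digits, a hyphen, two digits, a hyphen, two digits
ISO_DATE = re.compile(r'\d{4}-\d{2}-\d{2}')


def is_valid_iso_date(date_string):
    """Accept only YYYY-MM-DD strings that name a real calendar date."""
    if not ISO_DATE.fullmatch(date_string):
        return False  # wrong shape, e.g. extra characters or other separators
    try:
        datetime.strptime(date_string, '%Y-%m-%d')
        return True
    except ValueError:
        return False  # right shape, but not a real date (e.g. 30 February)


for candidate in ['2024-11-25', '2024-02-29', '2024-02-30', '11/25/2024', 'a2024-02-30']:
    print(candidate, is_valid_iso_date(candidate))
```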