Pattern Matching Using RegEx

Software Development
Published Jul 6, 2022 · less than a minute read
Regular expressions are a powerful tool. Used well, they extract information from text in a clean way. In this article, we will walk through a Java example that extracts information from an article's URL.

Motivation

Imagine I have the URL "https://mohammed.ezzedine.me/article/X402gZvgPGv", from which I want to extract some information, such as the host, domain name, article ID, etc.
This can be done by splitting the string on its special characters, but the resulting code would be messy, hard to read, and hard to maintain. That's why we will use pattern matching with RegEx to do the job.
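
For contrast, here is a minimal sketch of what the splitting approach might look like. The class name `ManualSplitSketch` and the helper `parse` are hypothetical and only serve as illustration. Note how every piece is identified by its position, so a URL with a different shape fails with an index exception instead of a clear "no match".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ManualSplitSketch {

    // Hypothetical helper: pulls the URL apart with indexOf/split instead of one pattern.
    static Map<String, String> parse(String url) {
        Map<String, String> parts = new LinkedHashMap<>();

        // Everything before "://" is the protocol.
        int schemeEnd = url.indexOf("://");
        parts.put("protocol", url.substring(0, schemeEnd));

        // The rest is "host/section/sectionId"; each piece is known only by its index.
        String[] segments = url.substring(schemeEnd + 3).split("/");
        parts.put("host", segments[0]);
        parts.put("section", segments[1]);
        parts.put("sectionId", segments[2]);

        // The host must be split again; the dot is escaped because split() takes a regex.
        String[] labels = segments[0].split("\\.");
        parts.put("subdomain", labels[0]);
        parts.put("domain", labels[1]);
        parts.put("domainExtension", labels[2]);
        return parts;
    }

    public static void main(String[] args) {
        parse("https://mohammed.ezzedine.me/article/X402gZvgPGv")
                .forEach((name, value) -> System.out.printf("%s: `%s`%n", name, value));
    }
}
```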

Java Pattern Matching

For this, we will use named capturing groups:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Scratch {

    // Each (?<name>...) is a named capturing group; "host" wraps the three nested host groups.
    private static final String URL_REGEX = "(?<protocol>.*?)://(?<host>(?<subdomain>.*?)\\.(?<domain>.*?)\\.(?<domainExtension>.*?))/(?<section>.*?)/(?<sectionId>.*?)";

    public static void main(String[] args) {
        String url = "https://mohammed.ezzedine.me/article/X402gZvgPGv";

        Pattern pattern = Pattern.compile(URL_REGEX);
        Matcher matcher = pattern.matcher(url);

        // matches() succeeds only if the whole URL fits the pattern; the groups can then be read by name.
        if (matcher.matches()) {
            System.out.printf("Protocol: `%s`%n", matcher.group("protocol"));
            System.out.printf("Host: `%s`%n", matcher.group("host"));
            System.out.printf("Sub-domain: `%s`%n", matcher.group("subdomain"));
            System.out.printf("Domain Name: `%s`%n", matcher.group("domain"));
            System.out.printf("Domain Name Extension: `%s`%n", matcher.group("domainExtension"));
            System.out.printf("Section: `%s`%n", matcher.group("section"));
            System.out.printf("Section ID: `%s`%n", matcher.group("sectionId"));
        }
    }
}

The output:

Protocol: `https`
Host: `mohammed.ezzedine.me`
Sub-domain: `mohammed`
Domain Name: `ezzedine`
Domain Name Extension: `me`
Section: `article`
Section ID: `X402gZvgPGv`
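
One caveat worth illustrating: every group in URL_REGEX uses `.*?`, which accepts any character, so a URL with an extra path segment still matches and the section ID simply absorbs the remainder. The sketch below is one possible stricter variant, not the pattern used in this article: it assumes negated character classes such as `[^/.]+` are acceptable for each piece. The class name `StrictUrlSketch` and the helper `sectionId` are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictUrlSketch {

    // Assumed variant of URL_REGEX: each group excludes the separator that ends it.
    private static final Pattern STRICT_URL = Pattern.compile(
            "(?<protocol>[^:/]+)://"
            + "(?<host>(?<subdomain>[^/.]+)\\.(?<domain>[^/.]+)\\.(?<domainExtension>[^/.]+))"
            + "/(?<section>[^/]+)/(?<sectionId>[^/]+)");

    // Hypothetical helper: the section ID, or empty when the URL has a different shape.
    static Optional<String> sectionId(String url) {
        Matcher matcher = STRICT_URL.matcher(url);
        return matcher.matches() ? Optional.of(matcher.group("sectionId")) : Optional.empty();
    }

    public static void main(String[] args) {
        // Same shape as the article's URL: the ID is found.
        System.out.println(sectionId("https://mohammed.ezzedine.me/article/X402gZvgPGv"));
        // Extra path segment: rejected here, whereas the lazy pattern would
        // report "X402gZvgPGv/comments" as the section ID.
        System.out.println(sectionId("https://mohammed.ezzedine.me/article/X402gZvgPGv/comments"));
    }
}
```

Whether rejecting such URLs is the right behavior depends on the input you expect; the point is only that the character classes make that decision explicit.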

Note here that:

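  • the syntax for a named group is simple: (?<groupName>{regexMatchingGroupContent}), where the braces mark a placeholder and are not part of the syntax
  • the group content can itself contain nested groups, as the host group does with subdomain, domain, and domainExtension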
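
Both points can be seen in isolation in the small sketch below, which matches only a host. It also shows that named groups keep their ordinary numbers, counted by opening parenthesis from left to right. The class name `NestedGroupSketch`, the helper `split`, and the group name `domainWithExtension` are hypothetical, chosen for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupSketch {

    // "host" is the outer group; "subdomain" and "domainWithExtension" are nested inside it.
    private static final Pattern HOST = Pattern.compile(
            "(?<host>(?<subdomain>[^.]+)\\.(?<domainWithExtension>[^.]+\\.[^.]+))");

    // Hypothetical helper: returns {host, subdomain, domainWithExtension}, read by name.
    static String[] split(String host) {
        Matcher m = HOST.matcher(host);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected host: " + host);
        }
        return new String[] { m.group("host"), m.group("subdomain"), m.group("domainWithExtension") };
    }

    public static void main(String[] args) {
        Matcher m = HOST.matcher("mohammed.ezzedine.me");
        if (m.matches()) {
            // Group 1 is the outer group, group 2 the first nested one.
            System.out.println(m.group(1).equals(m.group("host")));
            System.out.println(m.group(2).equals(m.group("subdomain")));
            System.out.println(String.join(" | ", split("mohammed.ezzedine.me")));
        }
    }
}
```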