Motivation
Imagine I have the URL "https://mohammed.ezzedine.me/article/X402gZvgPGv", from which I want to extract some information, such as the host, the domain name, and the article ID.
This could be done in code by splitting the string on its special characters (":", "/" and "."), but the result would be messy, hard to read, and hard to maintain. That's why we will use pattern matching with regular expressions (RegEx) instead.
Java Pattern Matching
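For this, we will use named capturing groups, which Java regular expressions have supported since Java 7. They let us give each group a name and read its match by that name instead of by index: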
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Scratch {
    // Each (?<name>...) is a named capturing group; "host" wraps three nested groups.
    private static final String URL_REGEX =
            "(?<protocol>.*?)://(?<host>(?<subdomain>.*?)\\.(?<domain>.*?)\\.(?<domainExtension>.*?))/(?<section>.*?)/(?<sectionId>.*?)";

    public static void main(String[] args) {
        String url = "https://mohammed.ezzedine.me/article/X402gZvgPGv";
        Pattern pattern = Pattern.compile(URL_REGEX);
        Matcher matcher = pattern.matcher(url);

        // matches() requires the whole string to match; groups can only be read after a successful match.
        if (matcher.matches()) {
            // group(name) returns the text captured by the group with that name.
            System.out.printf("Protocol: `%s`%n", matcher.group("protocol"));
            System.out.printf("Host: `%s`%n", matcher.group("host"));
            System.out.printf("Sub-domain: `%s`%n", matcher.group("subdomain"));
            System.out.printf("Domain Name: `%s`%n", matcher.group("domain"));
            System.out.printf("Domain Name Extension: `%s`%n", matcher.group("domainExtension"));
            System.out.printf("Section: `%s`%n", matcher.group("section"));
            System.out.printf("Section ID: `%s`%n", matcher.group("sectionId"));
        }
    }
}
```
The output:
Protocol: `https`
Host: `mohammed.ezzedine.me`
Sub-domain: `mohammed`
Domain Name: `ezzedine`
Domain Name Extension: `me`
Section: `article`
Section ID: `X402gZvgPGv`
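The pattern above assumes a host made of exactly three dot-separated labels. The sketch below reuses the same regex in a hypothetical helper, domainExtension, to show how it behaves on other host shapes; the two extra URLs are made-up inputs chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlRegexLimits {
    // Same pattern as in the example above.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "(?<protocol>.*?)://(?<host>(?<subdomain>.*?)\\.(?<domain>.*?)\\.(?<domainExtension>.*?))/(?<section>.*?)/(?<sectionId>.*?)");

    // Hypothetical helper: returns the captured domain extension,
    // or null if the whole URL does not match the pattern.
    static String domainExtension(String url) {
        Matcher matcher = URL_PATTERN.matcher(url);
        return matcher.matches() ? matcher.group("domainExtension") : null;
    }

    public static void main(String[] args) {
        // Three labels in the host: behaves as in the example above.
        System.out.println(domainExtension("https://mohammed.ezzedine.me/article/X402gZvgPGv")); // me
        // Two labels: there is no second dot in the host, so matches() fails.
        System.out.println(domainExtension("https://example.com/article/1")); // null
        // Four labels: the lazy groups stop at the first dots, leaving the extra label in domainExtension.
        System.out.println(domainExtension("https://blog.mohammed.ezzedine.me/article/X")); // ezzedine.me
    }
}
```

So the regex is a good fit for URLs shaped like the one in this article, but it would need adjusting before being used on arbitrary hosts.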
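Note that:
- the syntax for a named group is simple: (?<groupName>{regexMatchingGroupContent}), and its match is read back with matcher.group("groupName"); a group name must start with a letter and may contain only letters and digits
- the group content can itself contain nested groups, as host does here with subdomain, domain, and domainExtension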
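Named groups are still ordinary capturing groups, so Java also numbers them by the position of their opening parenthesis, nested groups included. In the URL pattern, host is therefore group 2, and subdomain, domain, and domainExtension are groups 3 to 5. A small sketch, using a hypothetical helper groupsByIndex, that lists every group by number:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Same pattern as above: "host" wraps three nested named groups.
    static final Pattern URL_PATTERN = Pattern.compile(
            "(?<protocol>.*?)://(?<host>(?<subdomain>.*?)\\.(?<domain>.*?)\\.(?<domainExtension>.*?))/(?<section>.*?)/(?<sectionId>.*?)");

    // Hypothetical helper: returns every group's captured text in index order
    // (list position 0 holds group 1), or an empty list if the URL does not match.
    static List<String> groupsByIndex(String url) {
        List<String> values = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(url);
        if (matcher.matches()) {
            // groupCount() excludes group 0, which is the whole match.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = groupsByIndex("https://mohammed.ezzedine.me/article/X402gZvgPGv");
        for (int i = 0; i < values.size(); i++) {
            System.out.printf("group(%d): `%s`%n", i + 1, values.get(i));
        }
    }
}
```

Reading groups by name, as in the main example, avoids having to keep these numbers in sync whenever a group is added to or removed from the pattern.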