SOFTWARE - [Reg-Ex Group Extractor]
Group Extractor is a small program that lets you extract data from text strings using a technique called 'Regular Expression Groups'. It can be a valuable tool for text manipulation. The purpose of the tool is best explained with an example: let's assume you have a list of photo filenames and you want to separate the filenames from the extensions (JPG/RAW):
_DSC3091.jpg
_DSC3094.raw
_DSC3095.jpg
_DSC3097.raw
extensionLessSample
This can be done by writing a small regular expression that defines two groups (variables) called 'filename' and 'extension': ^(?<filename>.+)\.(?<extension>.+)$. The program checks whether each line matches the complete Regular Expression and, if it does, splits the data into groups. A line that doesn't match the complete expression is skipped (indicated in red); in the list above that is the case for extensionLessSample, which contains no period. The results are put in a table whose columns are named after the groups (variables) you defined in your Regular Expression.
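The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function name extract_groups is made up for this example, and the expression has been rewritten from the .NET notation (?<name>...) to Python's (?P<name>...).

```python
import re

# Python spells named groups (?P<name>...) where .NET uses (?<name>...).
PATTERN = re.compile(r"^(?P<filename>.+)\.(?P<extension>.+)$")

SOURCE = """_DSC3091.jpg
_DSC3094.raw
_DSC3095.jpg
_DSC3097.raw
extensionLessSample"""


def extract_groups(pattern, text):
    """Return (rows, skipped): one dict per matching line, plus the lines that did not match."""
    rows, skipped = [], []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            rows.append(match.groupdict())  # keys are the group names
        else:
            skipped.append(line)            # the tool marks these lines in red
    return rows, skipped


rows, skipped = extract_groups(PATTERN, SOURCE)
for row in rows:
    print(row["filename"], row["extension"])
print("skipped:", skipped)  # skipped: ['extensionLessSample']
```

Each dictionary in rows corresponds to one line of the result table, with the group names as column headers.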
The program offers some post-processing options to trim the data fields (remove leading/trailing spaces) or to exclude the empty lines. You can also save the generated list as CSV data, or rebuild a certain text string (see below). For my own convenience I have added a button that can generate VB.NET source code to implement the written Regular Expression very quickly.
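As a rough idea of what these post-processing options amount to, here is a small Python sketch. It is written under assumptions: the sample rows are invented, the names post_process and to_csv are hypothetical, and "empty" is taken to mean a row in which every field is empty, which may differ from how the program itself decides.

```python
import csv
import io

# Invented sample rows, one dict per extracted line, keyed by group name.
rows = [
    {"filename": " _DSC3091 ", "extension": "jpg"},
    {"filename": "", "extension": ""},
    {"filename": "_DSC3094", "extension": " raw"},
]


def post_process(rows, trim=True, drop_empty=True):
    """Optionally strip leading/trailing spaces and drop rows whose fields are all empty."""
    result = []
    for row in rows:
        if trim:
            row = {name: value.strip() for name, value in row.items()}
        if drop_empty and not any(row.values()):
            continue
        result.append(row)
    return result


def to_csv(rows, columns):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


cleaned = post_process(rows)
csv_text = to_csv(cleaned, ["filename", "extension"])
print(csv_text)
```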
The String builder allows you to combine the obtained variables into a completely new customizable string. A very basic example:
_DSC3091 is a jpg
_DSC3094 is a raw
_DSC3095 is a jpg
_DSC3097 is a raw
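The same kind of output can be reproduced with a simple template in Python. The placeholder notation {filename} used here is Python's own format syntax and only an assumption for the sake of the example; the String builder in the program may use a different notation. The function name build_strings is also hypothetical.

```python
# Rows as they would come out of the extraction step (group name -> value).
rows = [
    {"filename": "_DSC3091", "extension": "jpg"},
    {"filename": "_DSC3094", "extension": "raw"},
    {"filename": "_DSC3095", "extension": "jpg"},
    {"filename": "_DSC3097", "extension": "raw"},
]

# Hypothetical template; each placeholder is replaced by the group with that name.
TEMPLATE = "{filename} is a {extension}"


def build_strings(template, rows):
    """Fill the template once per row and return the resulting lines."""
    return [template.format(**row) for row in rows]


lines = build_strings(TEMPLATE, rows)
print("\n".join(lines))
```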
Before you can use this program you will need some basic knowledge of Regular Expressions. Whole books have been written about Regular Expressions, so I am not going to explain them in detail.
Basically, the program splits the input data into separate lines. Then it examines each line and checks whether it matches the specified Regular Expression. If it does, the groups/variables are extracted and added to a table.
You can create a named group by placing it between round brackets (parentheses). The opening bracket is followed by a question mark, then the name of the group between < and >, and finally the Regular Expression that describes the group: (?<name>some_reg_ex).
Regular Expressions have a lot of reserved characters, like the period (decimal point), which means 'any character'. If you really want to match a literal period, you need to escape it. This is done by preceding it with the backslash character: \.
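The effect of escaping is easy to see in a short Python experiment. The strings photo.jpg and photoXjpg are made-up test inputs; re.escape is a standard Python helper that adds the backslashes for you.

```python
import re

unescaped = re.compile(r"^photo.jpg$")   # the period matches any single character
escaped = re.compile(r"^photo\.jpg$")    # the period matches only a literal period

print(bool(unescaped.match("photo.jpg")))   # True
print(bool(unescaped.match("photoXjpg")))   # True, probably not what was intended
print(bool(escaped.match("photo.jpg")))     # True
print(bool(escaped.match("photoXjpg")))     # False

# re.escape puts a backslash in front of every reserved character in a literal text.
print(re.escape("photo.jpg"))               # photo\.jpg
```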
Here you can find some sample applications where the data can be split using Regular Expressions. I kept the expressions as simple as possible, but in a real-world application precautions must be taken to make sure the expression is not too "greedy".
2015-01-15T07:54:40.458+00:00 program started
2015-01-15T07:54:40.724+00:00 variables initialized
2015-01-15T07:54:41.047+00:00 process terminated
^(?<date>[0-9]{4}\-[0-9]{2}\-[0-9]{2})T(?<time>[0-9]{2}:[0-9]{2}:[0-9]{2}).+\+00:00 (?<message>.+)$

<trkpt lat="50.736404126510024" lon="4.254649942740798">
<trkpt lat="50.736085949465632" lon="4.25487725995481">
<trkpt lat="50.735525619238615" lon="4.255550997331739">
lat="(?<latitude>[\-]?[0-9]+\.[0-9]+)".+lon="(?<longitude>[\-]?[0-9]+\.[0-9]+)"

A13
AF345
BD72
^(?<column>[A-Z]+)(?<row>[0-9]+)$

bart@sample1.com
eric@sample2.com
bill@sample3.com
^(?<username>.+)@(?<domain>.+)$
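To illustrate the remark about greedy expressions, the Python sketch below feeds the e-mail expression a line it was not designed for. The input line with two addresses and the stricter alternative using [^@\s]+ are assumptions made for this example, not part of the program; the expressions are written with Python's (?P<name>...) notation instead of the .NET (?<name>...).

```python
import re

# The simple expression from the sample: .+ grabs as much text as it can.
greedy = re.compile(r"^(?P<username>.+)@(?P<domain>.+)$")

# A stricter variant (an assumption for this example): no '@' or whitespace inside a group.
strict = re.compile(r"^(?P<username>[^@\s]+)@(?P<domain>[^@\s]+)$")

line = "bart@sample1.com, eric@sample2.com"  # two addresses on one line

# The greedy version still matches, but splits at the LAST '@' on the line.
print(greedy.match(line).groupdict())
# {'username': 'bart@sample1.com, eric', 'domain': 'sample2.com'}

# The strict version rejects the malformed line and still accepts a clean one.
print(strict.match(line))                           # None
print(strict.match("bart@sample1.com").groupdict())
# {'username': 'bart', 'domain': 'sample1.com'}
```

Restricting the characters a group may contain is often easier to reason about than relying on .+ together with the surrounding literals.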
Reg-Ex Group Extractor is written in Visual Basic .NET 4.0. You can download the complete setup wizard by clicking the button below.
Copyright ©1998-2022 Vanderhaegen Bart - last modified: August 11, 2017