My First Regular Expressions

Wait. Are there flavors of regex? Every time I have to use regex it hurts my brain, and I never need it often enough to actually sit down and learn it properly like OP is doing. Just knowing there are different ways of doing the same things in an already mind-baffling language blows me away even more.

remotelove@lemmy.ca

21 points

1 year ago

Yeah. The only one you really need to care about (especially under Linux) is PCRE, the good ol' Perl Compatible Regular Expressions. For the most part, every other flavor is a derivative of that. Microsoft had a weird version for a while, but that may be completely dead now, thankfully.

Learning the syntax of regex is fairly easy. Hell, I have to use this cheat sheet more often now that my Perl skills are no longer needed or even relevant.

Regex isn’t that hard. The challenge is identifying and understanding patterns in the data that you are filtering. Here is a brain hack: if you have pages and pages of logs that you need to filter, open up one of the log files, stare at the screen and hold the Page Down key for several dozen pages. Patterns are easy to see in the blur of text scrolling quickly across the screen. (Our brains love to find patterns in noise, btw.) The patterns that you see will give you focus points for developing regular expressions to match. That is, you start breaking strings into chunks, and seeing the ebb and flow of data streaming across the screen helps. Anomalies in the data “stream” are easy to spot as well.
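
A rough sketch of the "break the line into chunks" idea, in case it helps. The log format, the sample data and the function names (sample_log, matching_lines, anomalies, extract_ids) are all made up for illustration; only grep -E, grep -v and sed -En are real tools here.

```shell
# Hypothetical log lines; the format is invented for this example.
sample_log() {
  printf '%s\n' \
    '2024-01-05 10:15:02 INFO  user=alice id=4821 action=login' \
    '2024-01-05 10:15:07 ERROR user=bob id=17 action=upload' \
    'garbage line that does not fit the pattern' \
    '2024-01-05 10:16:40 INFO  user=carol id=903 action=logout'
}

# The chunks seen while scrolling: date, time, level, then key=value pairs.
line_re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]+ +user=[a-z]+ id=[0-9]+ action=[a-z]+$'

# Lines that fit the expected shape.
matching_lines() { sample_log | grep -E "$line_re"; }

# Anomalies: everything that does NOT fit the expected shape.
anomalies() { sample_log | grep -Ev "$line_re"; }

# Pull just the numeric id out of each line that has one.
extract_ids() { sample_log | sed -En 's/^.* id=([0-9]+) .*$/\1/p'; }
```

Once the pattern describes the normal lines, inverting the match with -v is a cheap way to surface the oddballs.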

From a security and efficiency standpoint, you should also understand where the most processing takes place so you don’t kill whatever platform you are working on.

Sorry for the rambling, but I am getting older and feel the need to pass on a ton of tips and tricks whenever I can for these “archaic” languages.

harsh3466@lemmy.worldOP

6 points

1 year ago

That screen scrolling tip is gold. I’ve often used that trick to spot anomalies in data. Hadn’t considered using it to spot the patterns for regex.

bizdelnick@lemmy.ml

2 points

1 year ago

The only one you really need to care about (especially under Linux) is PCRE,

Well, no. sed, grep, awk, vi etc. use POSIX regexes. GNU implementations also provide a Perl-compatible mode via an unportable option. In modern programming languages like Go and Rust, the standard regex engines are compatible with RE2, a relatively new dialect developed at Google that is not described in Friedl’s book (you may think of it as an extension of the extended POSIX dialect). Even Raku has its own dialect, incompatible with Perl’s as well as with the other ones.
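
To make the dialect difference concrete, here is a small sketch comparing plain grep (POSIX basic regex, BRE) with grep -E (POSIX extended regex, ERE). The function names and sample strings are invented for illustration. The behavior shown is what POSIX describes and what GNU grep does; other implementations may differ in the details.

```shell
# BRE (plain grep): + is an ordinary character, so only the literal text "a+b" matches.
bre_plus() { printf '%s\n' 'aab' 'a+b' | grep 'a+b'; }

# ERE (grep -E): + means "one or more", so "aab" matches and "a+b" does not.
ere_plus() { printf '%s\n' 'aab' 'a+b' | grep -E 'a+b'; }

# BRE: | is an ordinary character, so only the line containing a literal "mp4|mkv" matches.
bre_alt() { printf '%s\n' 'a.mp4' 'b.mkv' 'mp4|mkv' | grep 'mp4|mkv'; }

# ERE: | is alternation, so all three lines match.
ere_alt() { printf '%s\n' 'a.mp4' 'b.mkv' 'mp4|mkv' | grep -E 'mp4|mkv'; }

# Perl-style shorthands such as \d are not part of POSIX;
# [[:digit:]] (or [0-9]) is the portable spelling.
posix_digits() { printf '%s\n' 'id=42' 'id=x' | grep -E '^id=[[:digit:]]+$'; }
```

The same pattern text means different things depending on the dialect, which is why it pays to know which one a given tool uses.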

Nowadays it is common to move away from Perl-like engines; however, they are still widely used in PCRE-based software and in software written in Python, JS, etc.

1 point

1 year ago

Thanks for the comprehensive reply! I have only used it for quite simple things, like getting the IDs out of log lines where this and that keyword exist. Great tip about pattern searching!

Merry Christmas

ricecake@sh.itjust.works

5 points

1 year ago

Yes. Most things use PCRE, or Perl Compatible Regular Expressions, but there are other flavors. Usually they lack features or have slightly different syntax.

fuckwit_mcbumcrumble@lemmy.world

6 points

1 year ago

Regex101 is amazing. It tends to balk at backtracking, which we rely on a lot for work, but it’s such a good visual.

ChatGPT can also save a lot of time writing regex, but it tends to write very unreadable regex because it thinks it’s being clever when it really isn’t.

Regex is an art form, and writing readable regex is another step above that.
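
One way to keep a regex readable in a shell script is to build it from small named pieces instead of writing one long line. This is only a sketch: the variable names (octet, ipv4), the function is_ipv4 and the IPv4 example itself are illustrative assumptions, not something from this thread.

```shell
# Each variable holds one ERE fragment with a descriptive name.
# octet: a number from 0 to 255 without leading zeros.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# ipv4: four octets separated by literal dots, anchored at both ends.
ipv4="^${octet}(\.${octet}){3}$"

# Returns success (exit status 0) when the argument looks like a dotted-quad IPv4 address.
is_ipv4() { printf '%s\n' "$1" | grep -Eq "$ipv4"; }
```

For example, is_ipv4 192.168.1.10 succeeds, while is_ipv4 256.1.1.1 and is_ipv4 1.2.3 fail. The expanded pattern is just as long as a hand-written one, but each piece can be read and tested on its own.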

malijaffri@feddit.ch

2 points

1 year ago

Piggybacking onto this to mention my go-to online RegEx editor: RegExr. It lets you test the regex as you type, explains the particular symbols used, and has a sidebar where you can browse different pattern types by category. I’ve been using it for almost 2 years now and haven’t had any reason to use much else since I discovered it.

harsh3466@lemmy.worldOP

2 points

1 year ago

Thank you very much. I will definitely check out the regex builders. That’ll be super useful.

Edit: fix stupid autocorrect turning regex into Reyes.

harsh3466@lemmy.worldOP

2 points

1 year ago

Computerphile! I’ll check those out.

bizdelnick@lemmy.ml

14 points

1 year ago

It is a great book, although a bit outdated. In particular, nowadays egrep is not recommended; grep -E is a more portable synonym.

Some notes on your script:

You don’t need to escape slashes in a grep regex. In the sed s/// command, it is better to use another delimiter, like s###, so you can leave slashes unescaped there too.
You usually don’t need to pipe grep into sed: sed -n with a regex address and an explicit print command gives the same result as grep.
You could omit the leading slash in your egrep regex, so you won’t need to remove it later.

So I would do the same with

tar -tzvf file.tar.gz | sed -En '/\.(mp4|mkv)$/{s#^.*/##; s#\.\[.*##; s#[^a-zA-Z0-9()&-]# #g; s/ +/ /g; p}'
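
A quick way to sanity-check a pipeline like this without a real archive is to feed it a few fake listing lines. The sample lines and the function names (sample_listing, clean_titles, via_grep, via_sed) are invented for illustration, and the ; before the closing brace is added here because some non-GNU sed implementations expect it.

```shell
# Invented lines in the general shape of "tar -tzvf" output.
sample_listing() {
  printf '%s\n' \
    '-rw-r--r-- user/group 1234 2024-01-05 10:15 Movies/Some.Movie.Title.(2019).[1080p].mkv' \
    '-rw-r--r-- user/group   99 2024-01-05 10:15 Movies/notes.txt' \
    '-rw-r--r-- user/group 5678 2024-01-05 10:16 Movies/Tom_&_Jerry__-_Best.Of.[720p].mp4'
}

# For lines ending in .mp4 or .mkv only:
#   1. drop everything up to the last slash,
#   2. drop everything from the first ".[" onward,
#   3. turn every character outside the allowed set into a space,
#   4. squeeze runs of spaces, then print.
clean_titles() {
  sed -En '/\.(mp4|mkv)$/{s#^.*/##; s#\.\[.*##; s#[^a-zA-Z0-9()&-]# #g; s/ +/ /g; p;}'
}

# grep -E with a regex and sed -En with the same regex as an address select the same lines.
via_grep() { sample_listing | grep -E '\.(mp4|mkv)$'; }
via_sed()  { sample_listing | sed -En '/\.(mp4|mkv)$/p'; }

sample_listing | clean_titles
```

With these sample lines, the last command prints "Some Movie Title (2019)" and "Tom & Jerry - Best Of", and skips notes.txt.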

DefederateLemmyMl@feddit.nl

4 points

1 year ago

nowadays egrep is not recommended; grep -E is a more portable synonym

Not directed at you personally, but this is the kind of pointless pedantry from upstream developers that grinds my gears.

Like, I’ve used egrep for 25 years. I don’t know of a still relevant Unix variant in existence that doesn’t have the egrep command. But suddenly now, when any other Unix variant but Linux is all but extinct, and all your shell scripts are probably full of bashisms and Linuxisms anyway, now there is somehow a portability problem, and they deem it necessary to print out a warning whenever I dare to run egrep instead of grep -E? C’mon now … If anything, they have just made it less portable by spitting out spurious warnings where there weren’t any before.

bizdelnick@lemmy.ml

1 point

1 year ago

GNU grep, the most widespread implementation, has not included egrep, fgrep and rgrep for years. Distributions (not all, but many) provide shell scripts that simply run grep with the corresponding option for backward compatibility. You can learn this from the official documentation.

Also, my scripts are not full of bashisms, gnuisms, linuxisms and other -isms; I try to keep them portable unless it is really necessary to use some unportable command or syntax.

DefederateLemmyMl@feddit.nl

0 points

1 year ago

GNU grep, the most widespread implementation, has not included egrep, fgrep and rgrep for years. Distributions (not all, but many) provide shell scripts that simply run grep with the corresponding option for backward compatibility. You can learn this from the official documentation.

It seems you need to read the official documentation yourself. While it’s new information to me that egrep is no longer a symlink, as it used to be a couple of years ago, but a shell script wrapper to grep -E instead, the egrep command is to this day still provided by upstream GNU grep and is installed by default if you run ./configure; make; make install from source. So it is not a backward compatibility hack provided by the distribution.

You can check for yourself. Download the source from https://ftp.gnu.org/gnu/grep/grep-3.11.tar.gz, unpack and look for src/egrep.sh or line 1756 of src/Makefile. Apparently the change from symlink to shell script was done in 2014, and the deprecation warning was added only last year.

In any case, my larger point is that the deprecation of egrep was a pointless and arbitrary decision that does not benefit users, especially not veterans like myself who have become accustomed to its presence. I don’t mind change, but let’s be honest, most people are not in the habit of checking the minutiae of every little command line utility they use, so a change like this violates the principle of least surprise. It’s one thing if things are changed for a good reason, so that users not only suffer the inconvenience of the change but also get to reap its benefits. So far, though, I haven’t found any justification for it, nor can I think of any.

So if there is a portability problem with using egrep now, it’s a self-inflicted portability problem that they caused by deprecating egrep in the first place.

Also, my scripts are not full of bashisms, gnuisms, linuxisms and other -isms; I try to keep them portable unless it is really necessary to use some unportable command or syntax.

Good for you. Do you want a cookie or something?

13 points

1 year ago

Just to chip in because I haven’t seen it mentioned yet: I find LLMs like ChatGPT or Microsoft Copilot are really good at making regexes and also at explaining regexes. So if you’re learning them, or just want to get the darned thing to work so you can go to bed, those are a good resource.

harsh3466@lemmy.worldOP

4 points

1 year ago

You know, I haven’t yet used ChatGPT for anything. I might check it out for this reason.
