Linux Extract Email Addresses and Web URLs From A Long Document

Alright! This is a tiny post about two specific tasks. You can run these snippets directly from the command line or embed them in another script. I have used them both ways, so I thought I would share them with you. 🙂

The file I am extracting from is quite big and full of text. It appears as README.md in the screenshots. I believe I used a similar file in the video too.

Extracting Email Addresses From The Document

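#!/usr/bin/env bash

# File to scan, passed as the first argument.
filename=$1

# -o prints only the matched text; -E replaces the deprecated egrep.
# The dot before the top-level domain is escaped so it matches a literal ".".
grep -Eo "\b[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9-]+\b" "$filename"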

Hey, it is darn simple. In essence, the match sits between two word boundaries (\b), with a set of allowed characters on either side of the @ sign.
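
If you want to embed this in a larger script, one way is to wrap it in a function. The sketch below is only an illustration: the name extract_emails is made up for this example, the pattern is a variant of the one above with the dot before the top-level domain escaped, and piping through sort -u to drop duplicates is an assumption about what is usually wanted. It also relies on GNU grep, since \b is a GNU extension.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each distinct email address found in a file.
extract_emails() {
    local file=$1

    # Refuse to run without a readable file instead of waiting on stdin.
    if [ -z "$file" ] || [ ! -r "$file" ]; then
        echo "usage: extract_emails <readable-file>" >&2
        return 1
    fi

    # -o: print only the matching part; -E: extended regex.
    # sort -u removes repeated addresses (an assumption, drop it to keep order).
    grep -Eo '\b[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9-]+\b' "$file" | sort -u
}
```

After sourcing the script, calling extract_emails README.md prints one address per line. Keep in mind this is a pragmatic pattern, not a full validator of every legal email address.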

Example:

[Screenshot: 2024-03-30-172021_1916x114_scrot.png]

Extracting The Web URLs From The Document

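#!/usr/bin/env bash

filename=$1

# Bail out early if no filename was given.
if [[ -z $filename ]]; then
    echo "You need to provide the filename." >&2
    exit 1
fi

# Capture everything from "http" up to the next double quote (or end of line)
# and print it. The leading .* is greedy, so only the last URL on a line is printed.
sed -ne 's/.*\(http[^"]*\).*/\1/p' "$filename"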

Ah, this one is even easier: capture the URL with a regex group, then replay it with \1 and print it.
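
One thing to watch: the sed substitution prints at most one URL per line. If your document puts several links on the same line, a grep -o variant may serve better. This is a sketch under my own assumptions, not the method from the script above: the name extract_urls is hypothetical, and the idea that a URL ends at whitespace, a quote, an angle bracket, or a parenthesis is a simplification.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every http:// or https:// URL in a file,
# one per line, even when several URLs share a single line.
extract_urls() {
    local file=$1

    # Refuse to run without a readable file instead of waiting on stdin.
    if [ -z "$file" ] || [ ! -r "$file" ]; then
        echo "usage: extract_urls <readable-file>" >&2
        return 1
    fi

    # Assumed URL end: whitespace, a quote, < > or ( ).
    # Trailing punctuation such as a full stop is NOT stripped.
    grep -Eo "https?://[^[:space:]\"'<>()]+" "$file"
}
```

Calling extract_urls README.md after sourcing the script lists the links in the order they appear. Excluding the closing parenthesis keeps Markdown links of the form [text](url) clean, at the cost of cutting short any URL that legitimately contains one.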

Example:

[Screenshot: 2024-03-30-172159_1906x222_scrot.png]

Alternatively, you can take a peek at my YouTube video on this.

About unixbhaskar
GNU/Linux Consultant
