I have large text files in Arabic with Tashkil (diacritical marks), and I'm trying to find lines that contain a specific vocalized (mashkula) pattern with marks such as َ ً ُ ٌ ّ ْ ٍ. I've tried the following grep command:
cat file.txt | grep "اهلا"
This returns nothing until I insert Tashkil marks:
cat file.txt | grep "أهْلاً"
Then I get the correct output:
أهْلاً
I also tried:
grep -P "[ُ\ ّ\ َ\ ً\ ِ\ ٍ\ ٌ\ ْ\ \~]|[اهلا]" file.txt
But this matches any of those characters individually, in all sorts of combinations:
أهْلاً أ ... هْ.. لًا أنْتَ لَيْلاً ..
How can I match Arabic diacritical marks with grep? Is it possible to remove the Tashkil marks from the text before using grep? My OS is Ubuntu 18.04.
UPDATE: For now, I remove the Tashkil marks from the text with:
sed "s/[ُ ّ َ ً ِ ٍ ٌ ْ]//g"
and then I can grep for what I want. But with this approach, the sed command also removes every space from the text!