Tuesday, May 10, 2005

How to remove empty lines and weird characters from MANY files

Say you have a file (a printout) and need to clean it up by removing weird control characters: how do you do it?
First you look at a hex dump of the file with hexdump -C, then you take note of the control characters and their hex codes.
After that you simply replace them with blanks or strip them off, depending on taste or requirements.
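Reading the dump by eye works, but with MANY files it can help to let the machine list the codes for you. Here is a minimal sketch, assuming the standard od, tr, grep and sort tools; the function name list_control_bytes is hypothetical. It prints each distinct control byte (0x00-0x1f and 0x7f) in hex, leaving out tab, LF and CR since those are normal in text:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct control byte (00-1f, 7f) found
# in a file as two hex digits, skipping tab (09), LF (0a) and CR (0d).
list_control_bytes() {
    # -An: no offsets, -v: don't collapse repeated lines, -tx1: hex bytes
    od -An -v -tx1 "$1" \
        | tr ' ' '\n' \
        | grep -E '^(0[0-9a-f]|1[0-9a-f]|7f)$' \
        | grep -Ev '^(09|0a|0d)$' \
        | sort -u
}

# usage: list_control_bytes print.txt
```

On a file containing the bytes 0x01, 0x0c and 0x0b it prints 01, 0b and 0c on separate lines, which are exactly the codes to put into the perl character class.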
In the end you put together a for loop like the following, which cleans every file in a directory:

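for f in directory/* ; do
    # strip the control characters noted from the hex dump
    # (here 0x01, vertical tab 0x0b and form feed 0x0c)
    perl -pi -e 's/[\x01\x0b\x0c]//g' "$f"
    # convert DOS (CRLF) line endings to Unix (LF)
    dos2unix "$f"
    # remove empty lines or lines with only blanks; [[:space:]] matches
    # spaces and tabs (a literal \t inside [] is not a tab for grep)
    tmp=$(mktemp)
    grep -v '^[[:space:]]*$' "$f" > "$tmp"
    cat "$tmp" > "$f"
    rm -f "$tmp"
    # convert back to DOS line endings
    unix2dos "$f"
done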
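If dos2unix and unix2dos are not installed, the same cleanup can be sketched with a single perl pass per file. This is an alternative to the loop above, not the original recipe; it assumes the same three control bytes, and the function name clean_print is hypothetical. Blank-line detection tolerates a trailing CR, so CRLF files keep their CRLF endings without converting back and forth:

```shell
#!/bin/sh
# Hypothetical helper: clean one file in place with a single perl pass.
# Strips the bytes \x01, \x0b, \x0c, then drops lines that are empty or
# contain only spaces/tabs, optionally followed by a CR (DOS line ending).
clean_print() {
    perl -ni -e 's/[\x01\x0b\x0c]//g; print unless /^[ \t]*\r?$/' "$1"
}

# Apply it to every regular file in the directory; quoting "$f" keeps
# file names with spaces intact.
for f in directory/*; do
    if [ -f "$f" ]; then
        clean_print "$f"
    fi
done
```

Because the control bytes are removed before the blank-line test, a line holding nothing but a form feed is dropped too.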

Cheers
