Fixing DOS Line Endings

Sometimes, when you run Coder on a module, you'll get a lot of errors complaining about Windows line endings. Coder flags these because the Drupal Coding Standards call for Unix line endings, so set your editor to use them and stay consistent with other developers. See the Drupal Coding Standards for more details.

Below is a handy bash one-liner that batch converts many files from DOS to Unix line endings.

grep -lIUr "^M" . | xargs sed -i 's/^M//'

First up, we use grep to find ^M (which you can type using Ctrl+V and then Ctrl+M). Although it is displayed as two characters, ^M is a single character: the Carriage Return (Windows uses CRLF, Unix uses just LF).

-l
This tells grep to print only the name of each matching file and to stop searching that file once it's found the first instance - we only need 1 result per file.
-I
This tells grep to treat binary files as if they contained no matching data.
-U
This tells grep to treat the file as a binary file. On MS-DOS and Windows, grep usually tries to guess the file type; if it guesses text, it removes the CR characters from line endings to help keep regexes consistent across operating systems. On other platforms this option has no effect.
-r
This tells grep to search directories recursively.
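
If you want to preview which files would be touched before changing anything, the grep half can be run on its own. The sketch below is only a demo under some assumptions: it uses GNU grep, builds a throwaway directory with two made-up files (dos.txt and unix.txt are hypothetical names), and stores the carriage return in a variable with printf instead of typing Ctrl+V, Ctrl+M.

```shell
#!/usr/bin/env bash
# Demo only: scratch directory with one DOS-style file and one Unix-style file.
# The file names are hypothetical and exist purely for illustration.
demo_dir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$demo_dir/dos.txt"
printf 'line one\nline two\n' > "$demo_dir/unix.txt"

# A literal carriage return, produced with printf rather than typed by hand.
cr=$(printf '\r')

# Same flags as the one-liner above, but only listing the matching files.
crlf_files=$(cd "$demo_dir" && grep -lIUr "$cr" .)
echo "$crlf_files"

# Remove the scratch directory again.
rm -rf "$demo_dir"
```

Only the file with CRLF endings should be listed, which makes this a cheap dry run before the sed step gets involved.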

Next up, we pipe that result set into the sed command.

-i
This tells sed to edit the files in place. If you like, you can change this to -ibak, which keeps a backup of each original file using the supplied suffix (file.txt is backed up as file.txtbak; use -i.bak to get file.txt.bak instead).
's/^M//'
This is a regular expression find-and-replace. It tells sed to find the first ^M character (a CR, or Carriage Return) on each line and replace it with nothing (i.e. remove it). A CRLF line normally contains only one CR, so no "g" modifier is needed.
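
Putting both halves together, here is a slightly more defensive variant of the one-liner. Treat it as a sketch rather than the definitive way to do it: it assumes GNU grep, sed and xargs, the demo file name is made up, and it deliberately differs from the command above in three ways. It anchors the match with $ so only a trailing CR is removed, it keeps a .bak backup of every file it edits, and it passes file names NUL-separated so names containing spaces survive the pipe.

```shell
#!/usr/bin/env bash
# Demo only: scratch directory with a DOS-style file whose (hypothetical)
# name contains a space.
demo_dir=$(mktemp -d)
printf 'first\r\nsecond\r\n' > "$demo_dir/my notes.txt"

# A literal carriage return, produced with printf rather than typed by hand.
cr=$(printf '\r')

# grep -Z ends each file name with a NUL byte and xargs -0 reads that format.
# xargs -r skips running sed when grep finds nothing.
# sed -i.bak edits in place and keeps the original as <name>.bak.
# The sed expression removes a CR only when it is the last character on a line.
(cd "$demo_dir" && grep -lIUrZ "$cr" . | xargs -0 -r sed -i.bak "s/$cr\$//")

# Count the files that still contain a CR, ignoring the backups.
remaining=$(grep -lIUr --exclude='*.bak' "$cr" "$demo_dir" | wc -l)
echo "files still containing CR: $remaining"
```

Once you are happy with the result, the .bak files can be deleted. If you would rather not have backups at all, plain -i behaves as in the original command.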

This seemed to work really well for me - please post below if you have any alternative/better ways of doing this!

Note: I did try to use dos2unix; however, it did not remove a trailing ^M for some reason.

8 Comments

The most recent comment was on Thu, 7th Mar 2013 - 11:46

In Ubuntu, there is a utility called "flip" that will do this. I'm not sure if it has the same bug you got with dos2unix.

Ahh handy tip! Thanks. Just out of interest, why are you adding the "g" modifier to the search? Surely there will only ever be one CR per line?

What's wrong with simply:

sed -i s/\\r//g $inputfile

s/^M// will not delete line endings but will delete all "M" at the beginning of a line, because "^" is interpreted as line beginning in a regex.

If you read the post, the ^M is a single character representing the carriage return. It is not two separate characters matching lines that begin with M in a regexp. :)

As for the comment regarding:

sed -i s/\\r//g $inputfile

That is effectively what the initial script does - however it passes in ALL files found with the character, not a specific one.

Nice work! I modified your script slightly, in case there are ^M characters that are not at the end of lines ... such as in the script. :-)

    grep -lIUr "^M$" . | xargs sed -i 's/^M$//'

Thanks, again.
