Word Processor/E-mail Text Conversion Help
©Thomas N. Robb, 1997
Click on your problem
The e-mail document has a carriage return at the end of every line. You want to convert it into a word-processing document with <cr> at the end of each paragraph only.



The document is broken into alternating long and short lines

Each paragraph of the document goes off into space to the right.

A document with aligned columns (left) comes out looking like this (right).

A document in RTF (Rich Text Format)

Solutions
Removing paragraph-internal <cr>s
While some word processors have an 'unbreak lines' function, let's
assume here that you need to perform the entire operation manually.
Essentially, what we need to do is replace all <cr>s or <cr><lf>s
with a space. The catch is that you do NOT want to replace the ones
at the end of a real paragraph, just those in the middle.
If the paragraphs happen to have an extra blank line between them, the
process becomes simpler. If they do not, I suggest that you
'pre-process' the text to place an extra blank line between paragraphs.
(If the writer indented paragraphs using a tab, you can follow the
instructions below, searching for a paragraph mark followed by a tab
instead.)
Next:
- Search for all occurrences of two successive <cr>s and
replace them with something that does not occur in the text. I
usually use '##'. This leaves only those <cr>s at the ends of lines
that do NOT mark the end of a paragraph.
- Replace all remaining occurrences of <cr> with a single space.
- Replace your original symbol '##' with a <cr>.
Note: Some word processors do not allow you to enter <cr> directly
into the search & replace dialog, since pressing the Return key
normally signals the start of the search & replace operation. Some
common substitutes are \r or ^p. Substitutes for the tab character
are often \t or ^t. Consult your word processor's help function or
manual to discover how your particular software functions.
- If the text is long, it is also a good idea to search for '- ' (a
hyphen followed by a space), since a hyphenated word split at the end
of a line will have had a space inserted after the hyphen in the second
step above. Note the problem with 'weight-loss' and 'fat-free' in the
illustration.
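If you would rather script the job than use a search & replace dialog,
the steps above can be sketched in Python. This is only a minimal
sketch, assuming that paragraphs are separated by exactly one blank
line. The function names unbreak_paragraphs and find_hyphen_breaks are
illustrative choices of mine, not features of any word processor.

```python
import re


def unbreak_paragraphs(text, placeholder="##"):
    """Join hard-wrapped lines, keeping one break per paragraph.

    Assumes paragraphs are separated by exactly one blank line.
    """
    # Treat <cr><lf> and a bare <cr> the same as a plain newline.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    # Step 1: protect the real paragraph ends (two successive breaks).
    text = text.replace("\n\n", placeholder)
    # Step 2: every remaining break is paragraph-internal.
    text = text.replace("\n", " ")
    # Step 3: turn the placeholder back into a single break.
    return text.replace(placeholder, "\n")


def find_hyphen_breaks(text):
    """List 'word- word' spots left behind by step 2 for manual review."""
    return re.findall(r"\w+- \w+", text)


sample = ("This is a weight-\nloss plan that\nspans lines.\n\n"
          "Second paragraph\nhere.")
joined = unbreak_paragraphs(sample)
print(joined)
print(find_hyphen_breaks(joined))
```

The hyphen search only reports candidates. Whether 'weight- loss'
should become 'weight-loss' or 'weightloss' is still a decision for a
human reader.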
Mis-wrapped lines
In the above illustration, the sender had
placed a line delimiter after the 80th column of text in his word
processor, but the mailer that he used had the 'word wrap' set at 60.
The result was that the last 20 characters of each old line formed a
new line in the e-mail message, resulting in what you see above.
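The effect is easy to reproduce. The following Python sketch uses the
standard textwrap module to stand in for both the word processor and
the mailer. The function name simulate_double_wrap and the sample text
are hypothetical, and a real mailer may break lines somewhat
differently.

```python
import textwrap


def simulate_double_wrap(paragraph, sender_width=80, mailer_width=60):
    """Wrap once at sender_width, then re-wrap each line at mailer_width."""
    # The sender's word processor puts a hard break near column 80.
    hard_lines = textwrap.wrap(paragraph, width=sender_width)
    # The mailer then wraps every one of those lines again at 60.
    result = []
    for line in hard_lines:
        result.extend(textwrap.wrap(line, width=mailer_width))
    return result


paragraph = " ".join(["word"] * 60)
lines = simulate_double_wrap(paragraph)
print([len(line) for line in lines])
# -> [59, 19, 59, 19, 59, 19, 59]
```

The alternating long and short lengths are the tell-tale pattern of
this problem.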
A word processor normally places a special delimiter only at the end
of each paragraph, which usually consists of multiple lines. This
way, the word processor can freely 'word wrap' when additional text is
inserted mid-paragraph.
There are two ways that this message could be sent to avoid this
problem:
- Alter the left margin so that the lines are shorter (less than
78 characters) and then "break" the lines by adding a <cr> to the end
of each line. (MS Word, for example, has a 'Save as Text with Line
Breaks' feature which will accomplish this. Nisus Writer (Macintosh)
has a menu item in the Edit menu which will break the lines in the
selected (highlighted) text.)
- If you are using mailing software which automatically places
returns at the end of each line, you can merely copy & paste the
original paragraph-formatted text into the message window. The mailer
will add the returns when the message is sent. Eudora, Microsoft
Explorer and Netscape work this way. The length of the lines is
determined by the width of the mailer window on the screen.
NOTE: For both of these methods it is important to view
the text using a 'fixed-width font' such as 'Courier'. Otherwise the
number of characters that will fit in a line can vary considerably
since a line with many instances of the thin letter 'l' will take up
less space than one with a preponderance of the letters 'w' and 'm'.
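If your word processor has no 'line break' feature, the first method
can be approximated with a short script. The sketch below is one
possible approach, assuming the text has been saved with one <cr> per
paragraph. The name break_lines and the default width of 72 are my own
choices. A blank line is placed between paragraphs so that the
recipient can still tell where each one ends.

```python
import textwrap


def break_lines(text, width=72):
    """Hard-wrap each paragraph at `width` characters.

    Assumes each paragraph is one long line ending in a newline.
    """
    paragraphs = [p for p in text.split("\n") if p.strip()]
    wrapped = [textwrap.fill(p, width=width) for p in paragraphs]
    # Separate paragraphs with a blank line so they remain recoverable.
    return "\n\n".join(wrapped)


text = " ".join(["word"] * 40) + "\n" + "short paragraph"
print(break_lines(text))
```

Because the script counts characters rather than measuring widths on
screen, the fixed-width font caveat does not affect its output.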
Lines going into space
With certain older mailers, lines will not be broken automatically
into shorter segments of 80 characters or fewer. The text appears as
multiple lines in the sender's window, but, in fact, each paragraph is
still represented as one long line of characters. When viewing such
text in a mailer which is not equipped to deal with it, each paragraph
goes out into space towards the right since the software is not smart
enough to wrap the text to successive lines after a certain length.
Saving such messages and then reading them into your word processor
will normally make the entire text readable. Copying & pasting works
in some situations, but with some software, you will only be able to
copy the part of the text which is actually visible on the screen.
Column Misalignment
To tab or not to tab
Word processors know how to handle tabs, but e-mail does not. Many
mail readers DO have default tab settings, often set to either every
5th, 8th or 10th character position, so tabs present in the message
will still push the text somewhere. The problem is that you,
the sender, have no control over how the text will appear, and, in
fact, the text will appear very differently depending on the software
that the recipient happens to be using.
In order to ensure that your message is formatted as you would like it,
you need to substitute spaces for each tab, a sometimes cumbersome and
time-consuming process -- and one which seems a bit primitive to those
used to word-processor functions.
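The substitution need not be done by hand. Here is a minimal Python
sketch, assuming the chart uses exactly one tab between columns. The
name tabs_to_spaces and the two-space gap are illustrative. Apply it
only to the lines of the chart, not to the whole message.

```python
def tabs_to_spaces(text, gap=2):
    """Replace tab-separated columns with space-padded columns."""
    rows = [line.split("\t") for line in text.split("\n")]
    ncols = max(len(row) for row in rows)
    # Find the widest cell in each column.
    widths = [0] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
    out = []
    for row in rows:
        padded = [cell.ljust(widths[i] + gap) for i, cell in enumerate(row)]
        # Drop the padding after the last column.
        out.append("".join(padded).rstrip())
    return "\n".join(out)


chart = "Name\tQty\tPrice\nApples\t10\t1.50"
print(tabs_to_spaces(chart))
```

The result lines up only in a fixed-width font, which is what most
plain-text mail readers use.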
Sometimes, however, you can recreate the original formatting by
copying the text in disarray from the mailer window and pasting into
your favorite word-processor. The text most likely will still contain
the embedded tabs, so if you place tab markers in the ruler line in
appropriate locations, you can reproduce the original alignment.
If your mailer allows you to save your message as a file, you will
surely be able to capture those tabs and reset your ruler line
accordingly.
Here is an easy work-around if you want to send a message containing
tabs and are reasonably sure that your recipient has the wherewithal
to be able to move the text to a word-processor for reformatting:
Include a manually-typed 'ruler line' with the text which shows the
tabs in the correct location. For example, you could include these
lines just before your chart:
**Set your word-processor tabs like this to align the columns properly:
----------------T----------T----------T
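A ruler line in this style can also be generated rather than typed.
The sketch below is one way to do it in Python. The function name
make_ruler is hypothetical, and the tab stops are given as 1-based
column numbers that you would read off your own word processor's ruler.

```python
def make_ruler(tab_stops):
    """Build a ruler of dashes with a 'T' at each 1-based tab column."""
    ruler = ["-"] * max(tab_stops)
    for col in tab_stops:
        ruler[col - 1] = "T"
    return "".join(ruler)


# Tab stops at columns 17, 28 and 39, chosen only as an example.
print(make_ruler([17, 28, 39]))
```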
RTF Format
If you want to preserve your settings as well as your formatting, RTF
("Rich Text Format") is one way to allow most people to recreate your
original document with ruler lines, fonts and styles intact.
Most word-processors allow files to be saved in, and decoded from, RTF
format. Take a look at your own WP's 'Save as' menu and you will
probably find this option. The resulting RTF file is considerably
longer than the original document, but all formatting information is
included and the file can travel over the internet as a normal,
plain-text ASCII file.
The user need only move the RTF data to a new document, save it, and
then open it with the word processor's RTF option active. Note that
there are some features such as footnoting and columnization which RTF
may not handle well, but it is the easiest way to convert and send
files between differing word-processing platforms.
Unix will not allow more than 255 characters to be input without
beginning a new line. If you attempt to type text directly into the
Unix 'mailx' program, or paste text from a word-processor, when the
text has not been pre-broken into shorter segments, the Unix system
will emit one beep for each excess character. If you have pasted in a
long section of text and it starts to beep, turn down the volume and
go get a cup of coffee!