Word Processor/E-mail Text Conversion Help

©Thomas N. Robb, 1997

Click on your problem

The e-mail document has a carriage return at the end of every line. You want to convert it into a word-processing document with a <cr> at the end of each paragraph only.




The document is broken into alternating long and short lines


Each paragraph of the document goes off into space to the right.


A document with aligned columns (left) comes out looking like this (right).


A document in RTF (Rich Text Format)


In Unix: The machine beeps and doesn't allow any more characters to be input.



Solutions

Removing paragraph-internal <cr>s

While some word processors have an 'unbreak' lines function, let's assume here that you need to perform the entire operation manually. Essentially, what we need to do is replace all <cr>s or <cr><lf>s with a space. The catch is that you do NOT want to replace the ones at the end of a real paragraph, just those in the middle.

If the paragraphs happen to have an extra blank line between them, the process becomes simpler. If they do not, I suggest that you 'pre-process' the text to place an extra blank line between paragraphs. (If the writer indented paragraphs using a tab, you can follow the instructions below, searching for a paragraph mark followed by a tab instead of two successive <cr>s.)

Next:

  1. Search for all occurrences of two successive <cr>s and replace them with something that does not occur in the text. I usually use '##'. Doing this leaves us with <cr>s at the end of lines NOT marking the end of paragraphs.
  2. Replace all remaining occurrences of <cr> with a single space.
  3. Replace your original symbol '##' with a <cr>.
    Note: Some word processors do not allow you to directly enter <cr> into the search & replace dialog, since pressing the key normally signals the start of the search & replace operation. Some common substitutes are \r or ^p. Substitutes for the tab character are often \t or ^t. Consult your word processor's help function or manual to discover how your particular software functions.
  4. If the text is long, it is also a good idea to search for '- ' (a hyphen followed by a space), since a hyphenated word broken at the end of a line will have had a space inserted after the hyphen in step 2 above. Note the problem with 'weight-loss' and 'fat-free' in the illustration.
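
For those who would rather script the job than click through a search & replace dialog, the four steps above can be sketched in Python roughly as follows. This is only a minimal sketch, assuming Python 3.9 or later and assuming paragraphs are already separated by a blank line; the function names `unbreak_paragraphs` and `suspect_hyphens` are my own inventions for illustration.

```python
import re

PLACEHOLDER = "##"  # assumed not to occur in the text, as in step 1


def unbreak_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping one line end per paragraph."""
    # Normalise <cr><lf> and bare <cr> line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text")
    # Step 1: protect paragraph breaks (two or more successive line ends).
    text = re.sub(r"\n{2,}", PLACEHOLDER, text)
    # Step 2: every remaining line end is paragraph-internal.
    text = text.replace("\n", " ")
    # Step 3: restore a single paragraph mark for each placeholder.
    return text.replace(PLACEHOLDER, "\n")


def suspect_hyphens(text: str) -> list[str]:
    """Step 4: list words containing '- ' so they can be checked by hand."""
    return re.findall(r"\w+- \w+", text)
```

Running `suspect_hyphens` on the result of `unbreak_paragraphs` only lists the candidates. Whether the hyphen belongs there (as in 'fat-free') or should be closed up is still a decision for a human reader.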

Mis-wrapped lines

In the above illustration, the sender had placed a line delimiter after the 80th column of text in his word processor, but the mailer that he used had the 'word wrap' set at 60. The result was that the last 20 characters of each old line formed a new line in the e-mail message resulting in what you see above.

A word processor normally places a special delimiter only at the end of each paragraph, which usually consists of multiple lines. This way, the word processor can freely 'word wrap' when additional text is inserted mid-paragraph.

There are two ways that this message could be sent to avoid this problem:

  1. Alter the left margin so that the lines are shorter (less than 78 characters) and then "break" the lines by adding a <cr> to the end of each line. (MS Word, for example, has a 'Save as Text with Line Breaks' feature which will accomplish this. Nisus Writer (Macintosh) has a menu item in the Edit menu which will break the lines in the selected (highlighted) text.)
  2. If you are using mailing software which automatically places returns at the end of each line, you can merely copy & paste the original paragraph-formatted text into the message window. The mailer will add the returns when the message is sent. Eudora, Microsoft Explorer and Netscape work this way. The length of the lines is determined by the width of the mailer window on the screen.

NOTE: For both of these methods it is important to view the text using a 'fixed-width font' such as 'Courier'. Otherwise the number of characters that will fit in a line can vary considerably, since a line with many instances of the thin letter 'l' will take up less space than one with a preponderance of the letters 'w' and 'm'.
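
If your word processor has no 'break lines' feature, the first method can be approximated with a short script. The sketch below uses Python's standard `textwrap` module and assumes the input has one paragraph per line, as a word processor would save it. The function name `break_lines` and the default width of 77 (to stay under the 78 characters mentioned above) are my own choices, not anything prescribed by a mailer.

```python
import textwrap


def break_lines(text: str, width: int = 77) -> str:
    """Hard-wrap each paragraph so that no line exceeds `width` characters."""
    # One paragraph per input line; skip empty lines.
    paragraphs = [p for p in text.split("\n") if p.strip()]
    wrapped = [textwrap.fill(p, width=width) for p in paragraphs]
    # Leave a blank line between paragraphs so the recipient can
    # tell paragraph breaks from ordinary line ends.
    return "\n\n".join(wrapped)
```

Because the script counts characters rather than measuring them on screen, the fixed-width font caveat does not apply to it, though it still applies when you check the result by eye.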

Lines going into space

With certain older mailers, lines will not be broken automatically into segments of 80 characters or fewer. The text appears as multiple lines in the sender's window, but, in fact, each paragraph is still represented as one long line of characters. When such text is viewed in a mailer which is not equipped to deal with it, each paragraph goes off into space towards the right, since the software is not smart enough to wrap the text onto successive lines after a certain length.

Saving such messages and then reading them into your word processor will normally make the entire text readable. Copying & pasting works in some situations, but with some software you will only be able to copy the part of the text which is actually visible on the screen.

Column Misalignment

To tab or not to tab

Word processors know how to handle tabs, but e-mail does not. Many mail readers DO have default tab settings, often set to every 5th, 8th or 10th character position, so tabs present in the message will still push the text somewhere. The problem is that you, the sender, have no control over how the text will appear, and, in fact, the text will appear very differently depending on the software that the recipient happens to be using.

In order to ensure that your message is formatted as you would like it, you need to substitute spaces for each tab, a sometimes cumbersome and time-consuming process -- and one which seems a bit primitive to those used to word-processor functions.
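
The substitution need not be done by hand. As a rough sketch, Python's built-in `str.expandtabs` replaces each tab with enough spaces to reach the next tab stop. The sample table and the function name `tabs_to_spaces` below are made up for illustration, and the three widths are the common mailer defaults mentioned above.

```python
# A made-up two-column table with one tab between the columns.
TABLE = "Name\tQty\nWidget\t12\nSprocket\t3"


def tabs_to_spaces(text: str, tab_width: int) -> str:
    """Replace every tab with spaces, with tab stops every `tab_width` columns."""
    return text.expandtabs(tab_width)


if __name__ == "__main__":
    # Show how the same text lines up under different tab settings.
    for width in (5, 8, 10):
        print(f"Tab stops every {width} columns:")
        print(tabs_to_spaces(TABLE, width))
        print()
```

With this sample, only the setting of 10 lines the second column up, which is exactly the sender's problem: the same tabs land in different places for different recipients. Once the tabs have been turned into spaces at a setting that looks right, the alignment no longer depends on the recipient's tab stops, provided the text is viewed in a fixed-width font.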

Sometimes, however, you can recreate the original formatting by copying the text in disarray from the mailer window and pasting into your favorite word-processor. The text most likely will still contain the embedded tabs, so if you place tab markers in the ruler line in appropriate locations, you can reproduce the original alignment.

If your mailer allows you to save your message as a file, you will surely be able to capture those tabs and reset your ruler line accordingly.

Here is an easy work-around if you want to send a message containing tabs and are reasonably sure that your recipient has the wherewithal to move the text to a word-processor for reformatting: Include a manually typed 'ruler line' with the text which shows the tabs in the correct location. For example, you could include these lines just before your chart:

**Set your word-processor tabs like this to align the columns properly:
----------------T----------T----------T
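
If you send charts often, a ruler line of this kind can be generated rather than typed. The following is a small sketch only: the function name `ruler_line` is hypothetical, and it assumes the tab stops are given as 1-based column numbers.

```python
def ruler_line(tab_stops: list[int]) -> str:
    """Build a ruler with 'T' at each 1-based column in `tab_stops`, dashes elsewhere."""
    if not tab_stops:
        return ""
    if min(tab_stops) < 1:
        raise ValueError("tab stops are 1-based column numbers")
    chars = ["-"] * max(tab_stops)
    for column in tab_stops:
        chars[column - 1] = "T"
    return "".join(chars)
```

For instance, `ruler_line([17, 28, 39])` gives sixteen dashes followed by a 'T', and then two further groups of ten dashes each followed by a 'T'.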

RTF Format

If you want to preserve your settings as well as your formatting, RTF ("Rich Text Format") is one way to allow most people to recreate your original document with ruler lines, fonts and styles intact.

Most word-processors allow files to be saved in, and decoded from, RTF format. Take a look at your own WP's 'Save as' menu and you will probably find this option. The resulting RTF file is considerably longer than the original document, but all formatting information is included and the file can travel over the internet as a normal, plain-text ASCII file.

The user need only move the RTF data to a new document, save it, and then open it with the word processor's RTF option active. Note that there are some features such as footnoting and columnization which RTF may not handle well, but it is the easiest way to convert and send files between differing word-processing platforms.

Unix will not allow more than 255 characters to be input without beginning a new line. If you attempt to type text directly into the Unix 'mailx' program, or paste text from a word-processor, when the text has not been pre-broken into shorter segments, the Unix system will emit one beep for each excess character. If you have pasted in a long section of text and it starts to beep, turn down the volume and go get a cup of coffee!
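
To save yourself the beeping, you can check the text before pasting it. The sketch below simply reports any line longer than the 255-character limit described above. The function name `overlong_lines` is my own, and the limit is left as a parameter in case your system differs.

```python
def overlong_lines(text: str, limit: int = 255) -> list[tuple[int, int]]:
    """Return (line_number, length) for each line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

An empty result means every line is within the limit. Anything else tells you which lines need to be broken into shorter segments first.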
