Personal record of bad characters that arise when converting text from a different format.
This is a specialized area, which may be of use to some of us in the field of editing and HTML webmastering. The escape characters usually show up, in place of the characters they stand for, when converting older or simpler file types into another format.
As I said, this is a bit 'specialized', but a good topic for reference. I have, on countless occasions, had to search through files and the internet for an obscure 'bad character' and for what the author really meant in the original version.
I will start this article with some of the characters I come across every day in my job as an online-newspaper editor. I will add more characters over time, as suggestions come in and as the need arises.
The 'unprocessed' code usually contains the hex number of the original symbol. This is of some help when looking up unfamiliar codes. In some internet environments, the hex code can appear as a small square with the numbers printed inside it. The same thing happens with unprocessed Japanese, Chinese and Korean characters.
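Looking up an unfamiliar hex number does not have to mean searching the internet. The following is a minimal sketch, assuming Python 3 is available; the function name describe_hex_code is my own invention, while unicodedata is part of the standard library.

```python
import unicodedata

def describe_hex_code(hex_code: str) -> str:
    """Return the official Unicode name for a hex code such as '2019'."""
    code_point = int(hex_code, 16)          # '2019' -> 8217
    # Fall back to 'UNKNOWN' for code points that have no name.
    return unicodedata.name(chr(code_point), "UNKNOWN")

# Hex numbers taken from the unprocessed codes listed in this article.
for code in ("2019", "201c", "2026"):
    print(code, describe_hex_code(code))
```

The printed name is usually enough to tell which punctuation mark the author originally typed.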
When an HTML conversion code is listed, please understand that the actual code has a & before the # and a ; after the number. These symbols have been removed to allow the code to be seen.
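The decimal number in a conversion code is simply the hex number written in base 10, so the full code can be built from the unprocessed hex. A small sketch, assuming Python 3; hex_to_reference is a hypothetical helper name, and html.unescape is the standard-library function that turns a code back into its character.

```python
import html

def hex_to_reference(hex_code: str) -> str:
    """Turn a hex code such as '201c' into a full decimal code such as '&#8220;'."""
    return f"&#{int(hex_code, 16)};"

reference = hex_to_reference("201c")
print(reference)                 # &#8220;
print(html.unescape(reference))  # the character a browser would display
```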
Unprocessed Quotation and Apostrophe marks :
These are the marks of punctuation I come across most often in my day-to-day work. They often show up, in my editor, as the following symbols.
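/u2019 - This is a normal (right) apostrophe (’), the same character as a closing single quote.
Normally this can be replaced with a keyboard apostrophe, or conversion code #8217. (Code #8216 is the opening, or left, single quote, /u2018.)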
/u201c - This is an opening (left) double quote (“)
It can be replaced with the HTML conversion code #8220
/u201d - This is a closing (right) double quote (”)
It can be replaced with the HTML conversion code #8221
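When a file contains many of these markers, replacing them by hand gets tedious. Here is a rough sketch of an automatic replacement, assuming the markers appear exactly as written in this article ("/u" followed by four hex digits); other editors may show them differently, and the names MARKER and markers_to_references are made up for illustration.

```python
import re

# Matches the markers as written in this article: "/u" plus four hex digits.
MARKER = re.compile(r"/u([0-9a-fA-F]{4})")

def markers_to_references(text: str) -> str:
    """Replace each /uXXXX marker with its decimal HTML conversion code."""
    return MARKER.sub(lambda m: f"&#{int(m.group(1), 16)};", text)

sample = "/u201cIt/u2019s late,/u201d she said."
print(markers_to_references(sample))
# &#8220;It&#8217;s late,&#8221; she said.
```

This handles any four-digit marker, not only quotation marks, so it is worth checking the output before publishing.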
Hyphens, En and Em dashes :
After the above, these are the next most troublesome punctuation marks to convert. Often, this editor may even have to consult a punctuation guide. Note: Even if you know which mark the author meant, these punctuation marks are often misused!
The hyphens in my program, specifically, show up in the editor correctly, but on the internet-side show up as an unprocessed hex code. We use code #045 to replace hyphens.
/u2013 - This is an unprocessed en dash. The en dash is slightly longer than a hyphen, but not as long as an em dash. It is often used between ranges of time, such as dates or hours. We use #150 to replace it; this is a legacy Windows-1252 number that browsers display as an en dash, and the standard code is #8211.
/u2014 - This is an unprocessed em dash. The em dash is longer than a hyphen, and also longer than the en dash. It is used to replace commas in certain areas of prolonged pause or separation. It is often overused. It can be replaced with the HTML conversion code #8212 (the legacy equivalent is #151).
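Codes such as #150 come from the Windows-1252 code page rather than from Unicode, which is one reason the two dashes are so easy to mix up. The sketch below, assuming Python 3 and its built-in cp1252 codec, shows which character each legacy number stands for and what the standard code would be; legacy_to_standard is a hypothetical helper name.

```python
def legacy_to_standard(legacy_code: int) -> str:
    """Map a Windows-1252 number (e.g. 150) to the standard decimal code."""
    char = bytes([legacy_code]).decode("cp1252")
    return f"&#{ord(char)};"

for legacy_code in (150, 151):
    char = bytes([legacy_code]).decode("cp1252")
    print(legacy_code, f"U+{ord(char):04X}", legacy_to_standard(legacy_code))
# 150 U+2013 &#8211;
# 151 U+2014 &#8212;
```

So 150 is the en dash and 151 is the em dash; the four-digit codes are the safer choice for new pages.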
The horizontal ellipsis : This is the trailing ... often seen when something is continued later, or has been left out of an article.
/u2026 - This is the horizontal ellipsis. It can be replaced with HTML conversion code #8230.
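If the text still contains the real characters, rather than the unprocessed markers, the conversion codes can be produced in one step. This is only a sketch, assuming Python 3; escape_non_ascii is my own name for it, while the "xmlcharrefreplace" error handler is a standard part of Python's encoding machinery.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with its decimal conversion code."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(escape_non_ascii("To be continued\u2026"))  # To be continued&#8230;
```

Plain keyboard characters pass through unchanged, so the function can be run over a whole article.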
There is more help, of a tabular sort, at the link <a href="http://www.ascii.cl/htmlcodes.htm">Ascii Codes</a>.