Special characters on this Website

Any attempt to present information from a wide array of non-English languages inevitably requires the use of so-called "special characters" in order to be accurate.  This page describes how we do that on this Website.  We start by explaining the basics, then touch on problems of computer representation before delving into what we can or cannot do with non-English languages in various parts of this Website.

NOTE:  This essay over-simplifies the uses of characters in human languages.  For a painfully exhaustive treatment of that subject, see the English Wikipedia article on the English alphabet and the hundreds of linked articles in that online encyclopedia.

The basics of English language representation

Spelling any native word in the English language requires only "ordinary" Latin letters.  There are 26 of these, each of which has an UPPER CASE VERSION and a lower case version.  In "alphabetic order" they are as follows:
  Upper case:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  Lower case:  a b c d e f g h i j k l m n o p q r s t u v w x y z

Expressing numbers concisely requires the use of ordinary Arabic digits.  In "numeric order" they are as follows:
  Numerals:  0 1 2 3 4 5 6 7 8 9

Combining English words into sentences and including some minimal indications of expression requires the use of "ordinary" punctuation characters.  In no particular order, they are as follows:
  Punctuation:  . , ; : ' " ! ? ( ) /

Finally, there are several other common non-linguistic characters that appear on most American-made typewriters and computer keyboards.  In no particular order, they are as follows:
  Special:  ` ~ @ # $ % ^ & * - _ = + \ | [ ] { } < >

The character shapes that you see above, collectively called "glyphs," are how the English-trained human eye recognizes and distinguishes characters.  These glyphs can have many stylistic variations, as well as many variations in size, and still be recognizably the same characters.  A set of glyphs with a particular style and size is called a font; fonts of similar style make up a font family; and font families can be grouped into categories (such as serif or sans-serif).  But for the purposes of the present discussion, we are not concerned with styles or fonts--merely with the idea of distinct characters used for linguistic and related purposes.

Representing characters on computers

Computers don't understand glyphs--they only understand bits and bytes.  However, their output devices can translate groups of bits (i.e., bytes) into glyphs for us to see.  Thus every computer manufacturer has had a system for encoding common glyphs as binary numbers, and for displaying those binary numbers as the corresponding glyphs on various output devices (printers, monitors, etc.).

The earliest machine-independent American standard for conversion of bytes to glyphs is ASCII--the American Standard Code for Information Interchange.  It uses 7-bit bytes, which correspond to decimal numbers 0 through 127.  ASCII defines the human meaning of each of those 128 numbers--the 94 visible glyphs shown above plus "space" plus 33 control characters, each of which has a particular function in controlling various aspects of computer-related hardware.  Looking at this in the opposite direction, ASCII defines how each of these common glyphs and control functions is to be encoded as a binary number for storage within and transmission between computers.
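
The arithmetic in the preceding paragraph can be checked with a short Python sketch.  It is only an illustration: the names `visible`, `control` and `ascii_summary` are invented for this example, and it relies on the constants in Python's standard `string` module, which happen to match the letter, digit, punctuation and special groups listed above.

```python
import string

# The 94 visible ASCII glyphs: 52 letters, 10 digits, 32 punctuation/special.
visible = string.ascii_letters + string.digits + string.punctuation

# The 33 control characters: codes 0 through 31, plus 127 (DEL).
control = [code for code in range(128) if code < 32 or code == 127]


def ascii_summary():
    """Return the size of each group of ASCII codes."""
    return {
        "visible": len(visible),
        "space": 1,
        "control": len(control),
        "total": len(visible) + 1 + len(control),
    }


if __name__ == "__main__":
    print(ascii_summary())
    # Each character corresponds to one number, and vice versa.
    print(ord("A"), chr(97))
```

Here `ord` and `chr` show the two directions described above: from glyph to number for storage, and from number back to glyph for display.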

Almost all modern computers use ASCII as the basis for encoding the most commonly used characters, and this is quite sufficient for handling the English language.  However, because such computers typically are constructed to use 8-bit bytes, there is an opportunity to define the association of 128 more numbers (decimal 128 through 255) with glyphs, thus enabling computers to handle information in languages that require various diacritical marks attached to or associated with the ordinary Latin letters, as well as digraphs and additional punctuation marks and special characters.  Unfortunately, these "extended ASCII" characters have not been defined consistently between the various computer manufacturers, resulting in translation difficulties when information is transferred from one computer to another.  These difficulties were eventually addressed in two different ways.
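
The inconsistency is easy to demonstrate.  The following Python sketch decodes one byte, and encodes one character, under several 8-bit conventions.  The codec names are real Python codecs, but they were chosen here only as examples; the helper names `decode_with_each` and `encode_with_each` are hypothetical.

```python
# Real Python codec names, used here only as examples of 8-bit conventions;
# they are not necessarily the ones used on any particular computer.
EXAMPLE_CODECS = ["latin-1", "mac_roman", "cp437", "hp_roman8"]


def decode_with_each(byte_value, codecs=EXAMPLE_CODECS):
    """Decode a single byte under each convention and collect the results."""
    raw = bytes([byte_value])
    return {name: raw.decode(name, errors="replace") for name in codecs}


def encode_with_each(character, codecs=EXAMPLE_CODECS):
    """Show which byte each convention uses for the same character."""
    return {name: character.encode(name) for name in codecs}


if __name__ == "__main__":
    print(decode_with_each(0x41))   # an ASCII byte: every convention agrees
    print(decode_with_each(0xE7))   # a byte above 127: the conventions differ
    print(encode_with_each("\u00e7"))  # the letter c with cedilla
```

Bytes below 128 decode identically everywhere, which is why plain English text survives a transfer between computers while accented letters may not.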

One way arose out of a desire for a machine-independent standard that would encompass all known characters in all human-readable languages, plus the entire spectrum of character-like glyphs that have been used for special purposes in printing documents of many kinds.  This desire led to the development of Unicode, which contains encoding definitions for well over 100,000 different characters.
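
As a small illustration of the Unicode approach, the Python sketch below looks up the number (the "code point") and official name that Unicode assigns to a character.  It also shows the bytes produced by UTF-8, one common way of storing Unicode that this essay does not otherwise discuss.  The function name `describe` is hypothetical.

```python
import unicodedata


def describe(character):
    """Return the Unicode code point, official name and UTF-8 bytes."""
    return {
        "code_point": f"U+{ord(character):04X}",
        "name": unicodedata.name(character),
        "utf8": character.encode("utf-8"),
    }


if __name__ == "__main__":
    print(describe("A"))
    print(describe("\u00e7"))  # the letter c with cedilla
```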

The other way arose out of the development of the World Wide Web and the concept of a Web browser--a software program that can display information in a consistent human-readable form regardless of how information is otherwise encoded in the host computer or the visitor's computer.  The HyperText Markup Language (HTML), which is used to present readable pages to Web visitors, is based strictly on the graphical characters defined by ASCII.  Glyphs for characters not included in ASCII are defined in HTML by "entity names"--character strings that begin with an ampersand, continue with a few plain letters, and end with a semicolon.  In addition, three ASCII characters (<, > and &) have equivalent entity names, so that they can be used either as HTML markup codes or as normal glyphs.  Those entity names are, respectively, "&lt;", "&gt;" and "&amp;".  (You don't need to remember them, but they make good illustrations of the concept of entity names.)  As an example of the use of entity names to represent non-English characters, consider Ç and ç; they were displayed in the preceding phrase by using the entity names "&Ccedil;" and "&ccedil;".  (And those entity names are displayed here by nesting the entity name for ampersand. ;-)
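
For readers who like to experiment, Python's standard `html` module can perform the same translation that a Web browser does.  This is only a sketch; the function names `to_glyphs` and `to_markup` are invented for the example.

```python
import html


def to_glyphs(markup):
    """Replace entity names with the characters they stand for."""
    return html.unescape(markup)


def to_markup(text):
    """Replace <, > and & with their entity names."""
    return html.escape(text, quote=False)


if __name__ == "__main__":
    print(to_glyphs("&Ccedil; and &ccedil;"))
    print(to_markup("<, > and &"))
    # Nesting the entity name for ampersand yields a visible entity name.
    print(to_glyphs("&amp;Ccedil;"))
```

The last call shows the nesting trick mentioned above: the ampersand entity is translated once, leaving the rest of the name visible as ordinary text.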

The complexities of non-English language representation

Because the historical events involved in the development of carillons and other tower bell instruments took place primarily in Western Europe (and eventually in North America, where the author of this essay has always lived), information about that development and those instruments is readily expressed in English plus a few other languages that use the Latin alphabet.  As indicated above, the English language uses only characters that can be encoded and displayed with ASCII, which is so widely used that no translation (or to be more accurate, transcoding) is required.  That is the case for the several computers used to develop and maintain this Website, so all English-language information presented here should always display correctly.

Those other languages (primarily French and German) use a relatively small number of diacritical marks combined with various characters of the Latin alphabet.  Since all of those combined characters are contained in most of the various "extended ASCII" encoding conventions, it should be possible to manage them properly on our computers in order to express information in those languages correctly.  However, some special translation efforts are required when transferring information between computers, because the different computers follow different conventions for associating bytes with glyphs.  This has some impact on whether or not you will see in our Webpages just what we intended for you to see.

Extended ASCII characters are encoded in our database computer so as to print correctly on a Hewlett-Packard LaserJet III-P printer that gave good service for many years but no longer exists.  The database extraction program on that computer produces output files on disk in three forms--print files destined to become PDFs (in which extended ASCII has been transcoded from the HP convention to the Macintosh convention), HTML files destined to become Webpages (in which extended ASCII has been converted to HTML entities) and XML files destined to drive the display of regional locator maps (which must be fixed by hand before uploading). 
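
The two automatic conversions described above might look something like the following Python sketch.  It is not the actual extraction program.  It assumes that the "HP convention" corresponds to the HP Roman-8 character set (Python codec `hp_roman8`) and that the "Macintosh convention" corresponds to Mac OS Roman (`mac_roman`); the function names `hp_to_mac` and `hp_to_html` are hypothetical.

```python
import html
from html.entities import codepoint2name


def hp_to_mac(raw):
    """Transcode bytes from HP Roman-8 (assumed) to Mac OS Roman (assumed)."""
    return raw.decode("hp_roman8").encode("mac_roman")


def hp_to_html(raw):
    """Convert HP Roman-8 bytes (assumed) to pure-ASCII HTML text."""
    # First protect the three ASCII characters that HTML treats as markup.
    text = html.escape(raw.decode("hp_roman8"), quote=False)
    pieces = []
    for character in text:
        code = ord(character)
        if code < 128:
            pieces.append(character)                    # plain ASCII
        elif code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")  # named entity
        else:
            pieces.append(f"&#{code};")                 # numeric fallback
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Fran\u00e7ois".encode("hp_roman8")
    print(hp_to_mac(sample))
    print(hp_to_html(sample))
```

Going through Python's internal text representation, rather than mapping bytes directly to bytes, keeps each convention's table in one place.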

For languages that require Latin-based characters that are not included in extended ASCII (e.g., Polish, Maltese and Czech), those characters cannot be incorporated into the database.  Instead, Anglicized versions of those characters (or the names that would contain them) are used in the database, and may be manually replaced by the proper versions before files are uploaded to the Website.  For names which are natively expressed in languages that do not use Latin-based characters (e.g., Russian, Japanese, Chinese), we make no attempt to record or display such names.  Instead, we use either an English equivalent name or an English transliteration of the native name (e.g., Moscow or Moskva for the capital city of Russia).
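
On this Website the Anglicized versions are chosen by hand, but the general idea can be sketched in Python.  The approach below is an assumption made for illustration only: it splits each accented letter into a plain letter plus a separate mark, then discards the marks.  The table `NO_DECOMPOSITION` and the function `anglicize` are hypothetical, and the table covers just a few letters that cannot be split this way.

```python
import unicodedata

# Hypothetical hand-made table for letters that Unicode does not decompose
# into a plain letter plus a mark (the list is deliberately incomplete).
NO_DECOMPOSITION = str.maketrans({
    "\u0142": "l", "\u0141": "L",   # Polish l with stroke
    "\u0127": "h", "\u0126": "H",   # Maltese h with stroke
})


def anglicize(name):
    """Return a plain-ASCII approximation of a Latin-based name."""
    replaced = name.translate(NO_DECOMPOSITION)
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Keep the base letters; drop the combining marks.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    print(anglicize("Dvo\u0159\u00e1k"))
    print(anglicize("\u0141\u00f3d\u017a"))
```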

The Links section of each site data page is not derived from the database, but is entirely constructed by hand (with copy-and-paste of various source material, of course).  The same holds true for the pages that list great bells.  As such, it is fairly easy to incorporate HTML entities in order to display non-ASCII characters properly.

Summary

What you will see on this Website falls into three broad categories, as follows:

In spite of our best efforts to cope with the complexities of encoding foreign languages that are outlined above, it is possible that this Website includes some characters that we have failed to translate or transcode properly, or for which we have not used the correct HTML entity.  Therefore:

If you see an incorrect character on any page,
whether it is a typographic error,
a misplaced diacritical mark,
a malformed entity name,
or something else,
please use the email link at the bottom of that page to tell us about it!

Otherwise, we might not notice it for a long time, and it could cause confusion for other visitors who are not as perceptive as you are.
Thanks in advance for your help in making this Website more useful for all its visitors.



This page was created 2020/03/03 and last revised 2020/08/07.

Please send comments or questions about this page to csz_stl@swbell.net