How we maintained the GCNA Website

    This page was written in 1999, and last updated in 2009, before the second redesign of the GCNA Website.  It has not been revised to reflect that major change (and the accompanying personnel changes) nor the migration of these pages from the GCNA Website to the TowerBells Website in March-April 2012.  Nevertheless, those sections which do not explicitly reference the GCNA remain relevant, and many others remain accurate if the wording is changed from "GCNA Website" to "TowerBells Website".    

Maintenance of the GCNA Website is shared by two members of the committee appointed by the Guild for that purpose.  This page presents a moderately detailed overview of how some of that work is done, for the enlightenment of those visitors who are interested in such things.  It also serves to provide documentation of the technology in use for this work.  Finally, it explains why some requested updates don't get done as promptly as some might like to see.

Host system

The computer system which hosts this Website belongs to a commercial enterprise, whose services were contracted by Wylie Crawford as described elsewhere in these pages.  It provides the basic hardware/software infrastructure and the Internet access point.  The infrastructure includes the host computer and its operating system (Unix), utilities and disk storage management, plus Web server software (Apache).  It supports long, mixed-case filenames; these are case-sensitive (unlike the previous host, a Windows NT system, which supported long filenames but was not case-sensitive).  Public access is unrestricted for display of Webpages and downloading of selected files; maintenance access for adding or updating Webpages and other files is restricted to authorized personnel.

Main pages

The "main pages" on this Website are those Webpages which are in the home directory of the Website.  For the most part, they have to do with the Guild itself, its operations and activities.  They include the GCNA Home Page and all others which have Uniform Resource Locators (URLs) in which the domain name "www.gcna.org" is immediately followed by a solidus (/) and then a filename.  Most of these pages were originally designed and maintained by Norman Bliss, but all are now maintained by Carl Scott Zimmerman.  Each page has his email address (csz_stl@swbell.net) at the bottom of the page, since he should be contacted for any questions, suggestions or comments which you may have about that page.

No matter how they were originally constructed, maintenance of these pages is now done using a sophisticated HTML-aware text editor, operating on an offline mirror of the Website.  After verification of a set of changes, all revised pages are immediately uploaded directly to the GCNA Website.  For small changes, this update process can be completed in less than an hour from the time the Webmaster receives a request from a Guild officer or committee; more complex changes will naturally take longer.

(For the technorati:  The offline mirror for this part of the GCNA Website is resident on an Apple Macintosh PowerPC G4 computer running OS X 10.3.9.  The HTML editor is BBEdit 8.2.6.  The FTP utility used for upload is Fetch 5.3.)

Displayed as parts of the main pages are various images.  They may serve as stylistic elements (background or page heading) or as illustrations for portions of the text.  All are graphic files which reside in a subdirectory of the Website.

Data pages

The "data pages" on this Website are those Web pages which are in the "data" subdirectory or below it, i.e., those which have URLs of the same form as that of the page you are now reading.  Specifically, they all have the "/data/" directory name in the middle of their URLs.  These pages deal not with the Guild itself but with the instruments which are the reason for the Guild's existence - carillons and their "relatives" in the world of tower bells.  "World" is not only symbolic but geographically literal, because these pages are derived directly or indirectly from the author's database of carillons of the world.  These pages are maintained offline by the author (Carl Scott Zimmerman), using a pair of computers with the technologies described below, and uploaded to the GCNA Website.  Like the main pages, all have his email address (csz_stl@swbell.net) at the bottom of the page, since he should be contacted for any questions, suggestions or comments which you may have about such pages.

The data pages may be further subdivided into text data pages, site data pages and index pages.  There are also a very few hybrid data pages.

Text data pages

Text data pages provide the framework within which all data fits - introductory and general descriptive material, as well as access to the various kinds of indexes.  Text data pages all have mixed-case filenames ending in ".html", e.g., "Data_Top.html".  They were composed and are maintained using a methodology just like that used for maintaining the main pages (see above), though on a different computer.

(For the technorati:  The offline mirror for this part of the GCNA Website is resident on an Apple Power Macintosh 9600/200MP computer (Mac) running MacOS 8.1.  The HTML editor is BBEdit 6.1.  The FTP utility used for upload is Fetch 3.0.1.)

Text data pages may be uploaded directly to the GCNA Website whenever they are revised, usually accompanied by an appropriately descriptive addition to the What's New page. 

Site data pages

Site data pages are the reason for the existence of the /data/ section of this Website, since each such page describes one carillon or other tower bell instrument somewhere in the world.  All site data pages have uppercase filenames ending in ".HTM", e.g., "CODENVUD.HTM", and are derived from a database which is resident on an IBM-compatible personal computer (PC), using a specially-written Turbo Pascal program which runs under DOS.  The process of producing these pages works as follows:
  1. Run the extraction program to select information from the site database (further described below) and organize it into HTML files (Web pages) that reflect all substantive changes to the database since the last such run; one of those pages will be a revision index (see below).
  2. Copy the generated files from the PC to the Mac via a diskette, a transfer method sometimes referred to as "sneakernet".
  3. For each transferred file, apply the HTML editor as follows:
    1. If it is a revised site data page, use the file-compare feature to find and copy forward any information which was manually added in previous versions and remains relevant.  This includes all site-specific internal and external Weblinks. 
    2. If revisions to the site data page necessitate changes to any indexes which reference it, update those indexes appropriately.
    3. If it is a new site data page, add the appropriate site-specific internal Weblinks, using standard templates; also copy the revision index entry for this site into all relevant existing index files, with appropriate adaptations for each.
    4. Convert from DOS format (CR-LF line ends) to Macintosh format (CR line ends).  (Both are different from Unix format, which has LF line ends.)  This is a single-click process within BBEdit.
    5. Convert any remaining characters with diacritical marks (such as á,é,É,ç) from what works on the PC to the equivalent Macintosh character.
    6. If it is a site data page, and new external Weblinks have been found for that site, incorporate those into the page.
  4. Add an appropriate descriptive entry to the "What's New" page, including a link to the revision index.
  5. Verify the consistency of all internal links, and verify that all external links are still current.
Once the batch of new and revised pages is ready, the pages are uploaded to the GCNA Website together.  Then BBEdit is used to find all of the email addresses in the batch, in order to send a standard notification message to as many of those sites as possible; a copy of that message also goes to other interested persons.
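The line-end conversions mentioned in the steps above can be illustrated with a minimal Python sketch.  It is not the mechanism used by BBEdit or Fetch; the function names and the sample page are hypothetical, chosen only to show the three conventions (CR-LF for DOS, CR for the classic Macintosh, LF for Unix).

```python
def dos_to_mac(data: bytes) -> bytes:
    """Convert DOS (CR-LF) line ends to classic Macintosh (CR) line ends."""
    return data.replace(b"\r\n", b"\r")


def mac_to_unix(data: bytes) -> bytes:
    """Convert classic Macintosh (CR) line ends to Unix (LF) line ends."""
    return data.replace(b"\r", b"\n")


# Hypothetical four-line page as it might arrive from the PC.
dos_page = b"<html>\r\n<body>\r\n</body>\r\n</html>\r\n"
mac_page = dos_to_mac(dos_page)    # the form edited on the Mac
unix_page = mac_to_unix(mac_page)  # the form stored on the Unix host
```

Working on bytes rather than text avoids any accidental change to the special characters in the page while the line ends are being rewritten.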

Index pages

Website revision indexes are pages with names such as "IX990924.HTM".  They are relevant only to the batch of site data pages with which they are uploaded, and they are referenced only from the What's New page entry for the date of upload.  They are generated by the same Turbo Pascal program as the site data pages, as part of the same data extraction process, and are handled in the same manner (see above).  Each revision index includes links to the site data pages which were generated in the same batch run.
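As an illustration only, a revision index name could be derived from the batch date as follows.  This sketch assumes that the six digits in a name such as "IX990924.HTM" are a two-digit year, month and day; that reading is an inference from the example, and the function name is hypothetical.

```python
from datetime import date


def revision_index_name(batch_date: date) -> str:
    """Build a revision index filename, assuming the pattern IX + YYMMDD + .HTM."""
    return "IX" + batch_date.strftime("%y%m%d") + ".HTM"


name = revision_index_name(date(1999, 9, 24))
print(name)  # prints IX990924.HTM
```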

Permanent index pages, which enable the finding of site data pages using any of several different criteria, have mixed-case filenames like text data pages, but are a composite of text plus database material.  Most began as special extracts from the database, in a format essentially identical to revision indexes, to which appropriate text material was added.  Over time, these permanent indexes have been expanded by copy-and-paste methods using revision indexes as the raw material (see above).  Normally, they are revised and uploaded only in conjunction with the new or changed site data pages which are associated with changes in the index lines which link to such pages.  This stricture is followed in an attempt to make sure that index pages are never out of sync with the site data pages which they index.  (It is still possible for human error in editing an index page to result in a discrepancy between the index and what is being indexed.  Please report such discrepancies as soon as you find them, using the "mailto" link at the very bottom of the affected page.  Doing so will automatically generate an appropriate subject line which identifies the page.  Please send a separate message for each discrepant page.)

Hybrid site data pages

A very small number of site data pages have filenames ending in ".htm", e.g., "DENEWCAS.htm".  These are for 6-bell rings in North America, which are too small to be included in the database but too important to be omitted as we support our sister organization, the North American Guild of Change Ringers (NAGCR).  These pages were constructed by hand to resemble normal site data pages, and are also maintained by hand.

Database

Several aspects of the underlying database are particularly relevant to the process of Website maintenance.

Special characters

Different types of computers represent character data in different ways, so one of the most troublesome problems for computer users is managing the translation of character data from one system to another.  American-made computers all share a common base of ASCII (American Standard Code for Information Interchange), which includes upper and lower case alphabetic characters, numeric digits, and common punctuation marks.  But they have differing ways of extending ASCII to represent various "special" characters.

In the database for carillons of the world, the encoding of special characters was designed for optimal display of western European languages on an HP LJ IIIP laser printer.  As the data portion of the GCNA Website expanded to cover areas beyond North America, procedures were developed to manage the translation of these special characters from one system to another.

The extraction program translates most extended ASCII characters (e.g., those with diacritical marks, such as á,é,É,ç) within site data pages into HTML entities so as to be independent of font and character set.  Those few special characters which are not handled this way are corrected by hand in the editing step in which they are first seen on the Mac.  Thereafter, these alterations are carried forward semi-automatically as described above.  During the upload process, the Fetch utility automatically translates any remaining extended ASCII characters from the Macintosh character set to the ISO-8859-1 character set, and also translates Mac line ends to Unix line ends (see above).  Please report incorrect special characters (which may appear as asterisks or other "garbage" characters) as soon as you find them, using the "mailto" link at the bottom of the page where they appear.
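The two translations described above can be sketched in Python.  This is not the Turbo Pascal extraction program nor the Fetch utility; it assumes that the text is already available as Unicode strings (the database itself uses a printer-oriented encoding), and the function names are hypothetical.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    ASCII characters, including existing markup, are left untouched.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")  # named entity
        else:
            out.append("&#%d;" % code)  # numeric reference as a fallback
    return "".join(out)


def mac_to_latin1(data: bytes) -> bytes:
    """Re-encode bytes from the Macintosh character set to ISO-8859-1."""
    return data.decode("mac_roman").encode("iso-8859-1", errors="replace")


print(to_entities("á é É ç"))  # prints &aacute; &eacute; &Eacute; &ccedil;
```

Characters which have no ISO-8859-1 equivalent come out of `mac_to_latin1` as question marks, which is one way that "garbage" characters of the kind mentioned above can arise.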

Database maintenance

Tracking of changes to the database is done using the two dates which are presented near the bottom of every site data page.  These dates show when the textual and technical parts of each site's data were last revised in the database.  They also control which sites will be extracted in each batch run, when report-generator criteria such as "changed on date" or "changed since date" are used.
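The date-driven selection can be pictured with a short Python sketch.  The record layout, the field names, the sample site identifiers and dates, and the choice to treat "changed since" as inclusive of the given date are all assumptions made for illustration; the actual program's criteria may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Site:
    site_id: str         # hypothetical identifier
    text_revised: date   # last revision of the textual data
    tech_revised: date   # last revision of the technical data


def changed_since(sites: List[Site], cutoff: date) -> List[Site]:
    """Select sites whose textual or technical data changed on or after cutoff."""
    return [s for s in sites if max(s.text_revised, s.tech_revised) >= cutoff]


def changed_on(sites: List[Site], day: date) -> List[Site]:
    """Select sites whose textual or technical data changed exactly on day."""
    return [s for s in sites if day in (s.text_revised, s.tech_revised)]


# Made-up sample data.
sites = [
    Site("SITE0001", date(1999, 9, 1), date(1998, 5, 2)),
    Site("SITE0002", date(1997, 3, 15), date(1997, 3, 15)),
    Site("SITE0003", date(1998, 1, 1), date(1999, 9, 20)),
]
batch = changed_since(sites, date(1999, 1, 1))
```

Here `batch` holds the first and third sample sites, since each has at least one of its two dates in 1999.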

All new and revised pages, together with a concurrent revision of the What's New page, are uploaded to the GCNA Website in one batch.  This minimizes the possibility that a visitor to the Website might follow an index link to a site data page which did not agree with the content of the index.

PDF files

Linked from the Hardcopy data page are various files in Adobe Portable Document Format (PDF), intended to be downloaded for viewing and printing.  They are all contained in the "pdf" directory below the "data" directory, i.e., they all have the "/data/pdf/" path name in the middle of their URLs.  All of these files originated as printable output from the database extraction program, though some were produced by processing plain text files which are not, strictly speaking, part of the database itself.  Those print files are copied from PC to Macintosh via sneakernet, using a custom conversion utility which translates all special characters to the Mac character set.  On the Mac, they are imported into a WordPerfect template document, lightly edited to polish the pagination, and "printed" to PDF files through a shareware PDF "printer" driver.

Precautions

No information extracted from the database is ever altered after extraction, to minimize the risk of inconsistencies or loss of information.  (Occasionally, information from the database may be slightly re-formatted for the sake of appearance.  Also, since the Technical data section of each site data page is a limited interpretation of only certain aspects of the database contents, some editorial changes are occasionally made to overcome those limitations and present such information more accurately.)  Whenever information kept in the database is changed, the affected pages are extracted anew.  (Note the distinction between information and its format.  Some changes in format are a necessary part of the cross-platform transport process.)

Archiving

Each set of uploaded files is copied to archives on three different storage devices for backup purposes.  One of those devices is part of a set which is periodically rotated off-site to provide backup against natural disaster.  The others provide backup against device failure and/or human error.

Mirroring

A local mirror of the GCNA Website is maintained on the author's primary Web-linked computer for test and reference purposes, so that statistics of public access to the "real" Website are not biased by access for development work.

Process timeline

The batch process which is used for extraction, finishing and upload of site data pages explains in part why additions and corrections sent to the maintainer may not appear promptly on the GCNA Website.  Such changes are first collected and organized, then used to update the database itself.  That process involves a different plain-text editor operating on one or more of a set of card-image text files.  (This editor is Kedit 5.0, a PC-based equivalent of Xedit, a powerful mainframe-based program.)  Then the extraction program is run to produce a printed report of the changes, for purposes of proofreading.  After all changes have been confirmed as correct, the extract and upload process described above can take place.

Although this may appear to be a tedious process, it is vital to ensure that the information which we present is as accurate and consistent as we can make it.  If we violated process rules for the sake of speed, we would not only risk introducing discrepancies into what we publish, but also lose track of what is current versus what is not.  Or else we would lose the ability to proofread changes to the database effectively.

The same program used for data extraction is also used to produce hardcopy reports.  This program and the database are the direct descendants of those used to produce the very first published edition of Carillons of the World in 1979, as well as a series of six articles for the Bulletin of the GCNA in the same time frame.

Database history

As indicated under "Process timeline", above, the database which underlies the "data" pages of the GCNA Website is nothing more than a set of card-image text files, maintained with a plain-text editor.  What is meant by "card-image" is that each record of a file contains nothing but human-readable characters as if it were an 80-column punched card of the sort that once was used on mainframe computers.  In its original form, the database was in fact a box of such cards, and revision with a card punch machine was tedious.  The basic format of those original cards is still in use today, just as it was designed more than 40 years ago.  Although it has been extended and expanded to add new categories of information, it has never been necessary to change it.  Conversion from a box of cards to a set of files on a floppy disk, and now a set of files on a hard drive, has greatly eased the maintenance process.  Not only can changes be made more easily and quickly, but the risk of error has been considerably reduced, because the result of each change is immediately visible on the computer screen.
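For readers unfamiliar with card-image files, the following Python sketch shows how a fixed-column record of this general kind might be read.  The column layout, the field names and the sample record are entirely hypothetical; the actual format of the database is not documented on this page.

```python
CARD_WIDTH = 80

# Hypothetical layout: (start, end) column offsets, zero-based, end exclusive.
FIELDS = {
    "site_id": (0, 8),
    "record_type": (8, 10),
    "body": (10, 80),
}


def parse_card(line: str) -> dict:
    """Split one card-image record into named fields by column position."""
    card = line.rstrip("\r\n")
    if len(card) > CARD_WIDTH:
        raise ValueError("record is longer than %d columns" % CARD_WIDTH)
    card = card.ljust(CARD_WIDTH)  # short lines are padded with blanks
    return {name: card[a:b].rstrip() for name, (a, b) in FIELDS.items()}


# Made-up record using a site code of the form seen in site data filenames.
card = parse_card("CODENVUDT1EXAMPLE TEXT\n")
```

Because every field sits at a fixed column position, such a file remains readable and editable with any plain-text editor, which is the property described above.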

The program which processes the database has undergone considerably more change.  Originally it was written in the high-level programming language FORTRAN, and itself resided in a box of punched cards.  Several variants of FORTRAN were used, as the program migrated from one mainframe to another.  Eventually, the data (and the program) migrated from actual punched cards to card-image files on reels of half-inch reel-to-reel magnetic tape, of the type that used to be common on mainframe systems.

When personal computers became not only affordable but also sufficiently powerful to handle the processing requirements for the database, the program was completely rewritten in Borland Turbo Pascal, another high-level programming language.  All of the fundamental logic of the original program was retained, including the subprogram structures, though a number of minor implementation details had to be changed.  It was at this point that both program and data became resident on direct-access storage ("hard disk"), eliminating the need for either punched cards or magnetic tape.  The increased ease of editing in this environment affected not only maintenance of the data but also maintenance of the program.  As new categories of information were added to the database, changes were made to the program to display that information appropriately.  The largest single change was the addition of an option to display information as HTML files, i.e., Web pages.  Previously, all program output was in the form of print files, i.e., files formatted for delivery to a printer for producing hardcopy.

Database extraction

The data processing program mentioned above, which is used for extraction of information from the database, can be viewed conceptually as having three principal components: a control statement interpreter, a data loader, and a report generator. 

Control statement interpreter

The control statement interpreter accepts input from either the console or a control file.  Control statements can cause loading of a data file, setting of various processing options, selection of data (based on a wide variety of simple or complex criteria), sorting of selected data (based on single or multiple parameters), and generation of several different kinds of reports.  There is online help in the use and format of all control statements, though some can only be used in control files.

Data loader

The data loader reads one flat file and builds three temporary data structures, one in memory and two on disk.  It can be invoked repeatedly to load any number of data files together, subject only to the constraints of available space for the in-memory table.  (Loading additional flat files simply expands the three temporary data structures, as if a single large flat file had been read.)  Unfortunately, the total number of carillons and chimes in the world is so large that it is now impossible to load all data simultaneously.  This makes the production of certain types of reports impractical (e.g., world-wide summaries).

The in-memory table contains all of the condensed technical information found in the flat file, some of it converted from character strings to integers.  This table also has forward and backward pointers which connect all of the rows which describe a particular tower bell instrument, with each row describing a particular stage in the instrument's history.  The newest (or only) row for a particular instrument always describes its present (or last known) configuration, and is considered the primary record; other rows are secondary to it, in reverse chronological order.
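The linked structure of the in-memory table can be sketched in Python as follows.  This is a simplified illustration under stated assumptions, not a transcription of the Turbo Pascal program: the `Row` fields, the `link_rows` and `history` functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    site_id: str                 # hypothetical instrument identifier
    year: int                    # year of this stage in the instrument's history
    bells: int                   # number of bells at this stage
    newer: Optional[int] = None  # index of the next-newer row, if any
    older: Optional[int] = None  # index of the next-older row, if any


def link_rows(table: List[Row]) -> Dict[str, int]:
    """Chain each instrument's rows together; return site_id -> primary row index."""
    by_site: Dict[str, List[int]] = {}
    for idx, row in enumerate(table):
        by_site.setdefault(row.site_id, []).append(idx)
    primary = {}
    for site_id, idxs in by_site.items():
        idxs.sort(key=lambda i: table[i].year, reverse=True)
        primary[site_id] = idxs[0]  # the newest row is the primary record
        for newer_idx, older_idx in zip(idxs, idxs[1:]):
            table[newer_idx].older = older_idx
            table[older_idx].newer = newer_idx
    return primary


def history(table: List[Row], primary_idx: int) -> List[Row]:
    """Walk from the primary record back through the secondary records."""
    out = []
    idx: Optional[int] = primary_idx
    while idx is not None:
        out.append(table[idx])
        idx = table[idx].older
    return out


# Made-up sample data: one instrument with two stages, one with a single stage.
table = [
    Row("XXSAMPLE", 1930, 23),
    Row("XXSAMPLE", 1985, 47),
    Row("YYSAMPLE", 1960, 35),
]
primary = link_rows(table)
stages = history(table, primary["XXSAMPLE"])
```

Walking the `older` pointers from a primary record yields that instrument's stages in reverse chronological order, as described above.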

One of the on-disk temporary data structures is a sequential list of the condensed technical information records found in the flat file, in un-converted form.  The in-memory table has pointers to the records in this list, which are used to produce Condensed Information Listing (CIL) reports.

The other on-disk temporary data structure is a sequential list of the textual information records found in the flat file, in their original form.  The in-memory table also has pointers to the records in this list, which are used to produce Master Information Listing (MIL) reports, etc.
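One common way to realize an on-disk sequential list with pointers from an in-memory table is to store fixed-length records and use the record number as the pointer.  The Python sketch below illustrates that general technique; it assumes 80-byte records in keeping with the card-image format, and the function names and sample records are hypothetical rather than taken from the actual program.

```python
import tempfile

RECORD_LEN = 80  # assumed fixed record length, one card image per record


def append_record(f, text: str) -> int:
    """Append one fixed-length record; return its record number (the pointer)."""
    f.seek(0, 2)  # move to the end of the file
    recno = f.tell() // RECORD_LEN
    f.write(text.ljust(RECORD_LEN)[:RECORD_LEN].encode("ascii"))
    return recno


def read_record(f, recno: int) -> str:
    """Fetch a record directly by its record number."""
    f.seek(recno * RECORD_LEN)
    return f.read(RECORD_LEN).decode("ascii").rstrip()


with tempfile.TemporaryFile() as scratch:
    first = append_record(scratch, "FIRST TEXT RECORD")
    second = append_record(scratch, "SECOND TEXT RECORD")
    fetched = read_record(scratch, second)
```

Only the small integer pointers need to be kept in memory, while the bulky text stays on disk until a report needs it.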

Report generator

What is conceptually a single report generator is actually a collection of special purpose report generators.  It is trivially simple to combine several different reports into a single printout, or a print file which can be used to make a PDF file.  Several of the possible reports are described on the Hardcopy page, and several combinations of them are available there as PDF downloads. 

The production of Web pages (both site data pages and index pages, as described above) is accomplished with one of these report generators.

The future of the database

From time to time, various people have urged the author to convert to use of a conventional relational database program (e.g., Microsoft Access).  The motivation for such a conversion is that it would enable the database to be maintained by anyone who has a reasonable degree of expertise in the use of such a program.  Of course that is in principle an excellent idea; very few (if any) people have the author's peculiar combination of expertise in relatively obscure programming languages and data design, and so such a conversion would make continuation of the author's work after his inevitable death a relatively straightforward matter.  Unfortunately, from the author's viewpoint it is not such a good idea (at least not just now), for two reasons. 

Firstly, the present organization of the data files does not fit the requirements of relational databases.  Relational database programs (RDBs) are excellent for handling data which fit the constraints of the relationships for which they are designed, and many kinds of commonly used data do that.  Nevertheless, RDBs are not a panacea.  On the one hand, they are unnecessarily complex for managing small quantities of simple data.  On the other hand, there are data relationships which can be forced into the "relational" mold only with great difficulty, requiring extraordinary contortions in design and programming and maintenance.  In the author's professional opinion, his existing set of data regarding carillons and chimes falls into this category.  It seems highly probable that significant information might actually be lost in the course of conversion to such a database.

Secondly, such a conversion would have almost no direct benefit for the author, and indeed would be to his detriment.  The total redesign of the database, the conversion of the existing data into an entirely new format, and the construction and debugging of an entirely new report-generator system would be a very large task that would contribute nothing to the current project for extending the scope of the GCNA Website to cover all of the carillons and chimes of the world.  Indeed, the time and effort required would considerably delay that project, the basic data for which already exists in the present format.

The one possible direct benefit to the database owner from such a conversion would be the ability to view the entire database at once for production of worldwide summaries and reports.  At present, that is an insufficient incentive for the author.

/signed/   Carl Scott Zimmerman, database owner.



This page was created 1999/10/04 and last revised 2009/01/11.

Please send comments or questions to csz_stl@swbell.net