Multilingual Website FAQ

What type of site do we need?

The following types of site can be distinguished:

  • single script or language.
    If your site is in a single language, the chances are that the people entering the text will be native speakers of that language. The issue here is likely to be whether the back-end of your CMS will be easily intelligible to them, or whether they will also need to be skilled readers of English. In TYPO3, you can choose from a wide range of translations of the back-end, and these translations remain under your control: you can edit them, or even translate the whole interface into a new language.
  • more than one script or language on the same page(s)
    The same remarks as above apply, except that each user of a TYPO3 web site can determine their own back-end language, and so collaboration is made considerably simpler.
  • independent site sections (effectively different sites) in different scripts/languages, which may or may not be translations of each other. This is termed a "tree" approach. In TYPO3 these sites could be treated either as alternative parts of the same interface, accessed from a single menu, or as separate URLs (for example www.mysite.com, hindi.mysite.com, etc.).
  • parallel translations, generally based on a "dominant" language, usually the one the site is initially written in. 
    In the case of TYPO3 there is
    • a mechanism (usually a flag-based menu) for switching between languages on pages
    • a means of transferring content elements from the dominant language to other pages for translation
    • the capacity to manage parallel translations of all content elements, and to vary the overall content of pages which are in principle identical.
    • the capacity for visitors to express a preference for a particular language, and/or for the site to infer their probable preference from their IP address
    • the system will generally default to the dominant language if a page has not yet been translated.
    • If you base your site design on a framework such as YAML, the layout can be reversed for right-to-left scripts with simple CSS changes.

Various combinations of these approaches could be found in a single website.
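
The language-preference mechanism mentioned above can be approached with standard HTTP content negotiation: browsers send an Accept-Language header listing the visitor's preferred languages, with optional "q" weights. The following is a minimal Python sketch, not TYPO3's own implementation; the language list SITE_LANGUAGES and the function names are hypothetical, and IP-based guessing is not shown. It falls back to the first (dominant) language when none of the visitor's preferences is available.

```python
from typing import List, Sequence

# Hypothetical languages offered by the site; the first is the "dominant"
# language used when no preference matches.
SITE_LANGUAGES = ("en", "hi", "bn", "ar")


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags in an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    # Highest quality first; ties keep the order used in the header.
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header: str, available: Sequence[str] = SITE_LANGUAGES) -> str:
    """Pick the visitor's best available language, else the dominant one."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "hi-IN" -> "hi"
        if primary in available:
            return primary
    return available[0]


print(choose_language("hi-IN,hi;q=0.9,en;q=0.8"))  # hi
print(choose_language("fr-FR,fr;q=0.9"))           # en (fallback)
```

Matching only the primary subtag ("hi" from "hi-IN") is a deliberate simplification; a site that offers regional variants would need to compare full tags first.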

Should I use TYPO3?

The first three types could be constructed as static websites, with visitors switching between translated pages by means of links inserted manually in the site. Static sites, however, are difficult for non-technical site owners to maintain, because maintenance requires a knowledge of HTML. They are therefore mainly useful if you do not expect to change the site very often.

In particular, unless your translators are also HTML programmers, they will have to provide a technician with their translations for integration with the site. In most cases, the technicians are not going to understand the text, so the potential for errors increases greatly.

These types of site could equally well be implemented using a content management system (CMS), which makes maintenance much easier: it separates the content from the layout, and provides editors and translators with familiar interfaces for text entry.

The final type of site must be implemented using a content management system which supports these features. TYPO3, which Gate Seven use, is one such system.

Should we rely on written text, or also (or instead) use multimedia presentations?

The net makes strong assumptions of literacy, which need to be explored carefully when you plan the delivery of your content. The major deciding factor is likely to be whether your target audience is literate in its own language. This is not usually an issue for commercial web sites, but for community-related projects it can be a problem, and the only practical way of getting your message across may be by means of multimedia presentations.

Additional factors to be considered are:

  • Technology: Multimedia files are much larger than text, leading to issues with bandwidth at both ends of the system. The users' experience may be seriously disrupted by the need to download large files in a simple system, or wait for streaming media to deliver content in a more sophisticated one.
    The issue is less pressing than it used to be, since broadband is now widespread, formats such as MP4 are common, and so is the equipment needed to produce video, though perhaps not among the older people who tend to need the service.
  • Accessibility: It is unlikely that you can provide a seamless multimedia site which supports navigation, unless the languages you are working with are amenable to presentation via a screen reader (in which case you don't really need the multimedia files).
    Screen readers (which turn text to speech at the client end, and are designed for visually impaired people) are generally good only for major languages, and are in any case only available to people who have them installed. 
    Otherwise a third party will need to operate the site on the visitor's behalf to follow the links to the material.

What problems are we likely to meet with the scripts we need?

The scripts of the world divide into two sorts:

  • simple scripts. 
    These are scripts in which the characters sit next to one another in linear order. The most common of these is Latin script, which is used in its simplest form for English. Chinese ideograms are also simple. 
  • complex scripts. 
    A script is complex if the appearance and ordering of its characters depend on the characters around them. Examples include Arabic script, Hebrew, and the many Indic scripts such as Bengali and Devanagari (used for Hindi).

If you are considering using complex scripts, or scripts or languages with limited operating system support, and your target group has relatively little money, be aware that you cannot rely on visitors being able to view your site unless you take steps to make sure that they can.
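
To see why Indic scripts count as complex, it helps to look at what is actually stored. The short Python sketch below uses only the standard unicodedata module to list the code points in two Devanagari syllables: कि (ki), whose vowel sign is drawn to the left of the consonant although it is stored after it, and the conjunct क्ष (kṣa), where three stored characters are drawn as a single ligature. The function name describe is illustrative.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """List each stored code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# कि: the vowel sign I appears LEFT of KA on screen but is stored AFTER it.
for line in describe("\u0915\u093F"):
    print(line)

# क्ष: KA + VIRAMA + SSA, three code points rendered as one conjunct glyph.
for line in describe("\u0915\u094D\u0937"):
    print(line)
```

Moving the vowel sign to its visual position, or choosing the conjunct glyph, is left to the rendering software and the font, which is why support varies from one system to another.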

What standards, if any, should we comply with?

The main objective of a web site, or any other publication, is to make content available to visitors in a form that they can understand: if people can read your web site or publication, you have (usually) achieved your major objective, regardless of whether the text adheres to standards.

Ideally, however, you should adhere to text standards, particularly Unicode, since they allow your text to be stored in a standard way which is recognised by applications, so that it is able

  • to be searched
  • to be unambiguously identified as text in its script, and therefore displayable using fonts supplied with operating systems
  • to be recognised and indexed by search engines
  • to be future-proof, since it is not dependent on the continued availability of particular fonts
  • in principle, at least, to be enterable directly into dialog boxes and text areas using the keyboard layouts provided with modern operating systems.
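
Searching Unicode text reliably usually needs one extra step, normalisation, because the same visible text can sometimes be stored as different code-point sequences: for example a precomposed é versus e plus a combining accent, or the Devanagari letter क़ versus क plus a nukta sign. The following is a minimal Python sketch, assuming that Normalization Form C (NFC) is appropriate for your content; the helper names normalise and contains are hypothetical.

```python
import unicodedata


def normalise(text: str) -> str:
    """Convert text to Unicode Normalization Form C before comparing it."""
    return unicodedata.normalize("NFC", text)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats canonically equivalent text as equal."""
    return normalise(needle) in normalise(haystack)


precomposed = "caf\u00e9"  # é as a single code point
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                  # False: different code points
print(contains("Le " + decomposed, precomposed))  # True once both are normalised

# Devanagari QA: one code point versus KA + NUKTA normalise to the same form.
print(normalise("\u0958") == normalise("\u0915\u093C"))  # True
```

Normalising once when text is saved, rather than on every search, is usually cheaper; either way, both sides of the comparison must use the same form.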

Unicode

Unless there are clear reasons not to do so, the standard to use is Unicode. Unicode encodes the information needed to generate readable text, and Unicode text can be displayed correctly by any modern browser, at least on PCs.

How can I ensure that people can read my site?

Displayable is one thing, and displaying is another. Being readable gets even more complicated if your script is complex. The following solutions achieve readable text for complex scripts:

  • Forget about text and deliver graphics
    Graphics always work (provided visitors have not turned them off), but they are large compared to text, so your site will be slowed down considerably. Nor can graphics be searched, so your text will be ignored by search engines.
  • (Perhaps) forget about Unicode and save your text as cross-platform documents, usually PDF files.
    Adobe Acrobat files (PDFs) are really intended for distributing reference material, including brochures, which people download and may keep, or for preparing documents for print. Font data can be embedded in them, which ensures that the text can be read on a cross-platform basis. The key problem, however, is that they are separate files which have to be downloaded and then loaded into Adobe Acrobat before they can be viewed. 
    On a website this seriously interrupts the browsing experience, due to the need both to download the files themselves and to activate the Acrobat Reader plug-in. 
    The production of PDF files is expensive if you need any control over the process, which you do as soon as you want more than basic results. 
    To make navigation accessible, you will have to revert to graphics or another solution, which generally causes trouble with content management systems (though not TYPO3), and is very time-consuming.
  • Forget about Unicode, and pre-compose your text in a private encoding, avoiding the need for the OS to process it.
    This may work reasonably well from the readability point of view, but there are insoluble problems if you intend to use a right-to-left formatting script. 
    If you need your site to be interactive, you will need to give users a way to enter text in your private encoding, which they will probably not have. 
    It will be extremely difficult to search the text meaningfully, and impossible if the text runs from right to left.
    You will also need a way of delivering font data to your readers which does not breach the conditions of your font license. An example is Microsoft WEFT, but that only works with Microsoft browsers.
  • Specify Unicode, and specify the browsers you support, and forget about everyone else.
    Microsoft will love you, since you will pretty much have to insist that readers use their technology, and, in practice, Windows XP or Vista. You are excluding anyone who cannot afford this technology, or who has a Mac.
  • Use server-side software which delivers text in a form the readers' browsers can understand.
    Glyphgate combines all the above mechanisms (apart from the PDF route), and delivers the one that works with your browser.

Who will write the non-English text, and with what software?

Good reasons for using TYPO3 for a multilingual web site are:

  • Everything needed to enter and edit text is part of TYPO3, and therefore available from any computer that is connected to the internet
  • The TYPO3 interface languages are fully customisable, and are set individually on a user-by-user basis, so that each user's preferred interface language appears when they log in. Translators and contributors get an interface they understand, while other people can work in another language altogether.
  • The interface can be reduced to a level which is highly graphical.

Text on a TYPO3 website may be:

  1. Entered directly into the site by a backend user
  2. Entered into the site directly by a front-end user
  3. Copied into the site from material produced using other applications.

Entering Text from the Back-end

By back-end we mean issues relating to text which readers of your site cannot edit.

  • If you use non-standard encodings for your text, you will need to ensure that editors have the necessary text entry software installed on their computers. Many such packages work only with a limited range of operating systems, so you will need to take this into account. Such packages must interact with the editors' operating system in a fairly complex way, so they may cause unexpected compatibility problems.
  • If you stick with Unicode and PCs with multilingual support enabled, and the scripts/languages which the operating system supports, text entry is unlikely to be a problem, since users will have installed both the required fonts and keyboard layouts.
  • Most typists of Indic scripts are used to keyboard layouts which relate to the multiplicity of proprietary systems, not to the standard defined by the Indian Government, which is what is supplied by Microsoft.
  • Additional keyboard layouts can be developed if required, using Microsoft's Keyboard Layout Creator. This has its limitations, chiefly that text entry must follow Unicode order, not visual element order (which is more familiar to users of most existing input systems).
  • Implementing more complex keyboard layouts requires keyboard input processing to convert the keystrokes into Unicode. We can develop keyboard drivers for this purpose, though keyboard drivers suffer from the disadvantage that they are normally able to "understand" text only as it is being entered, and not with the text already in the document. Text in documents can be edited, but it is usually necessary to rub out a syllable, then re-enter it. This is, however, what users of such keyboard layouts will be used to.
  • Glyphgate provides the means for Unicode text to be entered in the browser, at the expense of downloading a plug-in. This is built into the software, making the process no more painful than installing support for Flash, for example.
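
The reordering that a keyboard driver performs, and that a layout built with the Keyboard Layout Creator cannot, can be illustrated with the Devanagari vowel sign I, which typists used to visual order type before the consonant it is drawn in front of. The following Python function is a hypothetical, much-simplified sketch of that conversion, not a description of any production driver: it handles only this one vowel sign and basic consonant clusters joined by a virama.

```python
# Hypothetical, simplified sketch: rearrange keystrokes typed in VISUAL order
# into Unicode LOGICAL order for the Devanagari vowel sign I (U+093F), which is
# drawn to the left of its consonant cluster but stored after it.
VOWEL_SIGN_I = "\u093F"
VIRAMA = "\u094D"


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants KA..HA (U+0915..U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def visual_to_logical(keys: str) -> str:
    out = []
    i = 0
    while i < len(keys):
        ch = keys[i]
        if ch == VOWEL_SIGN_I and i + 1 < len(keys) and is_consonant(keys[i + 1]):
            # Take the whole cluster that follows: consonant (VIRAMA consonant)*
            end = i + 2
            while end + 1 < len(keys) and keys[end] == VIRAMA and is_consonant(keys[end + 1]):
                end += 2
            out.append(keys[i + 1:end])  # the cluster first...
            out.append(VOWEL_SIGN_I)     # ...then the vowel sign
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(visual_to_logical("\u093F\u0915") == "\u0915\u093F")  # True: कि
```

Like the drivers described above, this only works on keystrokes as they arrive; it cannot reinterpret text already in the document, which is why editing typically means deleting and retyping a syllable.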

Entering Text from the Front-end

Front-end users interact with a site when they fill in forms, or take part in the many interactive features of a CMS, such as forums. At times they may also add or edit the main content after they log in to the site.

  • Much the same comments as above apply, except that you cannot rely on your users having proprietary software on their systems, unless you are in a position to enable them to download it.
  • Front-end users may have many different ways of entering text, employing arbitrary code pages, but if you want them to be able to interact with your website, you need to provide a solution which enables entered text to be stored in a standardised way. 
    In practice, if the script is not supported by all operating systems they are likely to be using (Mac and PC being the most common), then you need to provide a text entry system.
  • Real Unicode text entry support is absent from Windows 9x, and is more or less limited in other Windows versions.
  • Glyphgate provides the means for Unicode text to be entered by front-end users, cross-platform, at the expense of downloading a plug-in for their browser. This is built into Glyphgate, making the process no more painful than installing support for Flash, for example.

Text from elsewhere

  • While the ideal of a CMS is that authors enter text directly into pages, in practice pre-existing text may need to be included.  
  • You may need to be able to convert text from one encoding to another, particularly for complex scripts.
  • If you are using Unicode, be aware that the behaviour and extent of support of Unicode text in common applications like Microsoft Office may not be the same as that provided by the operating system.  
  • Contact us for information about the issues involved in integrating multilingual text across media.
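
For text in a standard legacy code page, conversion to Unicode is a decode followed by an encode. The following is a minimal Python sketch, assuming the source encoding is known and supported by Python's codecs (here Windows code page 1256 for Arabic); the function name to_utf8 is hypothetical. Text typed with font-specific encodings, where the stored characters are really glyph numbers (common in older Indic material), is not a standard code page and needs a custom mapping table instead.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a legacy code page as UTF-8.

    errors="strict" makes undecodable bytes raise UnicodeDecodeError
    instead of silently replacing them, so bad input gets noticed.
    """
    text = raw.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


# Simulate Arabic text saved by an older Windows application (code page 1256).
legacy_bytes = "سلام".encode("cp1256")
utf8_bytes = to_utf8(legacy_bytes, "cp1256")

print(len(legacy_bytes), len(utf8_bytes))  # 4 8: one byte per letter vs two
print(utf8_bytes.decode("utf-8"))          # سلام
```

Getting the source encoding wrong usually does not raise an error (most single-byte code pages accept any byte), so it is worth checking a sample of converted text by eye.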

How will our site cope with the different capabilities of the multiple browsers on different operating systems?

Display of text in a browser (other than as graphics) is completely dependent on font data being available to the browser. The problem is that you cannot rely on this data being available through fonts that the user has installed.

In the case of simple scripts, provided you use a standard encoding (and the encoding is specified in the page header) your text will be readable if the visitor's system has access to at least one font containing the characters of your text.
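
Specifying the encoding can be done both in the HTTP Content-Type header and in the page itself. The Python sketch below builds both for a UTF-8 page; the function build_page is hypothetical, and in practice a CMS or web server is normally configured to do this for you.

```python
from html import escape
from typing import Dict, Tuple


def build_page(title: str, text: str, lang: str) -> Tuple[Dict[str, str], bytes]:
    """Return HTTP headers and a UTF-8 encoded HTML page declaring its encoding."""
    page = (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'  # declared at the very start of <head>
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{escape(text)}</p></body>\n"
        "</html>\n"
    )
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, page.encode("utf-8")


headers, body = build_page("नमस्ते", "नमस्ते दुनिया", "hi")
print(headers["Content-Type"])  # text/html; charset=utf-8
```

The HTML specification expects the meta charset declaration within the first 1024 bytes of the page, which is why it comes first in the head; the header and the meta element should always agree.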

The results may be far from what you intended, though. To get the appearance you want, and in any case if you use proprietary encodings, you will almost certainly need to deliver fonts or font data in addition to your text.

Font files are complete fonts as supplied by their designers, installed onto a computer system, and available to all applications for editing, reading and printing documents. If the font designer gives permission, the same data, or just the data for the characters actually used, can also be embedded within a document itself. The most common permission granted in this way is to display and print a document (but not to edit it). Similar technology can be used to "embed" font data into a web site. In that case the web page is displayed with the correct font, but the font is not available to other documents on that computer.

Display of complex scripts poses greater problems. Unicode stores the information required to construct the text, not the text as it appears on screen. A straightforward representation of the stored characters is therefore insufficient to generate legible, or even comprehensible, text. The stored text MUST be post-processed (shaped), either on the writer's computer, on the server, or on the visitor's computer.
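
Arabic shows the gap between what is stored and what is displayed particularly clearly. Unicode stores a single character for the letter beh, but it is drawn in one of four shapes depending on its position in the word. Unicode also has legacy "presentation form" code points for those shapes, kept only for compatibility; the Python sketch below uses the standard unicodedata module to show that NFKC normalisation folds all four back to the one stored letter. The name POSITIONAL_FORMS is illustrative.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: what Unicode text actually stores

# Compatibility code points naming the four positional shapes of BEH.
POSITIONAL_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

for position, form in POSITIONAL_FORMS.items():
    folded = unicodedata.normalize("NFKC", form)
    print(f"{position:<8} U+{ord(form):04X} -> U+{ord(folded):04X} {unicodedata.name(folded)}")
```

Choosing the right shape at display time is the shaping step described above; text stored as pre-shaped glyph codes skips it, and is correspondingly harder to search.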

Even if the visitor's operating system does provide support for complex script display, the browser may not - and, just as you cannot predict which fonts a visitor has installed, you cannot predict in advance which operating system the reader will be using.

Unless you are going to rely on fonts already installed on readers' computers, you need to ensure that you deliver font data to them so that they can view your pages. This is essential for most multilingual work. There are several ways of achieving this: 

  • Rely on the fonts already on readers' systems
    If the languages of your site can be written using the base fonts provided with US versions of operating systems, there is no problem. The system will almost certainly default to a font of the correct style, even if none of the specified fonts exists on the visitor's computer. However, even system fonts such as Times New Roman or Helvetica are present on only around 95% of computers running the operating system they were supplied with. Therefore you should always specify a range of fonts in your styles.
    The fonts for many complex scripts are not installed by default, so at the very least your users' experience may be interrupted by the process of installing the appropriate fonts, assuming that these are available. 
    Availability may not be straightforward if the user does not currently have access to their OS set-up files. Your site will also only work on operating systems which support the scripts concerned. 
    Be aware also that Microsoft's fonts for Indic scripts are not masterpieces of font design, however suitable for UI work they may be, and only have a single weight. So headlines may well look misshapen.
  • Have users download fonts
    This is fine provided the fonts are freely redistributable (the first two categories under Font Licensing below), or you have a special (and probably specially expensive) license from the vendor. 
    However, your readers will not be able to view your pages immediately. They will have to download fonts, and, one way or another install them on their systems. 
    Some people may not be allowed to do this, since a bad font can bring down a system, so their administrators may prevent the installation of unapproved fonts. 
    Fonts consume resources, so the more you have installed the slower the system becomes, and some operating systems will only display a limited number of them, so even if they are installed, they may not display.
  • Dynamic Fonts
    This was a technology developed by Bitstream to enable the display of fonts on systems that did not have them installed. Although the software to produce and support them is still around, it is no longer sold by Bitstream, and the technology itself now seems to be regarded as aimed at printers rather than screens. 
    This is probably because of the dubious legal thinking behind the underlying technology, except as applied by Bitstream to its own typefaces.
    The key points against the technology, assuming you can get dynamic fonts from a vendor at all, are that:
    • the system uses its own auto-hinting, leading to inferior results on screen
    • it does not support OpenType, so Unicode cannot be used for complex scripts
    • it only supports a limited range of browsers, most of them now obsolete
    • it only supports a limited range of browsers, most of them now obsolete
  • WEFT
    WEFT is a technology supplied free by Microsoft which delivers encrypted font data together with your web pages.
    • The font data is locked to the URL(s) which deliver it, so the font data cannot be used by web sites other than your own.
    • The font data is decrypted and installed temporarily on the target computer, so only has resource implications at the time of display of the page.
    • Hinting is included in the downloaded files.
    • OpenType is supported, but only on operating systems that support OpenType and the particular scripts concerned: Windows 2000 to some extent, and Windows XP and Vista more fully.
    The key problem with the technology is that it is only supported by Microsoft browsers. Although these are the most common, they are by no means the only browsers in use.
  • GlyphGate
    Glyphgate is a server-side solution which processes your web pages and delivers text that corresponds to the capabilities of the target browser. 
    • It uses the least convoluted of the technologies above that will work, so in most cases you can save standard Unicode text, and it will be displayed correctly.
    • The technology is being extended, so contact us regarding how it can support the languages you are interested in.
    The chief issues with Glyphgate are that it is proprietary server-side software, and complex to install. You need either to have your own web server to install it on, or you need to select an ISP that supports it.
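
Specifying a range of fonts, as recommended above for fonts already on readers' systems, means writing a CSS font-family list that ends in a generic family. The small Python helper below generates such a rule; the function name font_family is hypothetical, and the font names in the example (Mangal and Nirmala UI on Windows, Lohit Devanagari on many Linux systems) are illustrative only, so check which fonts your own audience is likely to have.

```python
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}


def font_family(*families: str, generic: str = "serif") -> str:
    """Build a CSS font-family declaration that ends in a generic fallback."""
    if generic not in GENERIC_FAMILIES:
        raise ValueError(f"not a CSS generic family: {generic}")
    # Quoting every name is always valid CSS and avoids problems with
    # spaces or digits in family names.
    quoted = ['"' + name.replace('"', '\\"') + '"' for name in families]
    return "font-family: " + ", ".join(quoted + [generic]) + ";"


print(font_family("Mangal", "Nirmala UI", "Lohit Devanagari", generic="sans-serif"))
# font-family: "Mangal", "Nirmala UI", "Lohit Devanagari", sans-serif;
```

The browser uses the first listed font that is installed and contains the characters needed, so the generic family at the end is the last line of defence rather than a guarantee of correct rendering.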

Font Licensing

PLEASE DO FEED THE FONT DESIGNERS. Fonts are intellectual property, protected by copyright. Since the availability of fonts for multilingual display is central to what you want to do, it is in your interests to recognise that font designers need to eat, too. The skills involved in high-quality font design and engineering are rare, and directly endangered by piracy. While there is a long history of typeface design for Latin script, with thousands of high-quality designs to choose from, the same cannot be said for non-Latin scripts. If you do not support the companies and individuals that do this work, you are directly contributing to a situation in which little font development gets done. Fonts, like other software, are supplied to you under a license.

The usual licensing conditions for fonts are as follows (but you should check the details of your own fonts):

  • Fonts supplied under open source licenses
    These fonts can be redistributed in accordance with the license which is included with them. Note that fonts supplied with an open source operating system may not themselves be open source, although they may be freely distributable. Be aware, however, that, especially for non-Latin scripts, these fonts are often of poor quality, and if they are not of poor quality, they may be of dubious legality.
  • Fonts otherwise freely redistributable
    These used to include the set of web fonts which Microsoft made available from 1996 onwards, and many other fonts of very variable quality. The Microsoft fonts are no longer freely redistributable, and require a fairly expensive license.
  • Fonts supplied with a commercial operating system
    The license for the fonts is the same as the license for the operating system. Generally this means that you are not entitled to use the fonts except on the computer which runs that particular licensed version of the OS. You should not copy the fonts to other environments, such as web servers. 
    The core fonts may have embedding permissions set to "everything is allowed", which Microsoft applications may take to mean that fonts embedded in documents should be permanently installed on the reader's computer. If this is so, the vendor has agreed to this means of distribution, but not necessarily for you to distribute the fonts when they are not embedded in a document, nor that you distribute them (unembedded) from a web server.
    Microsoft core fonts are now made commercially available in case you want to use them from other platforms.
  • Fonts supplied with a package such as Microsoft Office
    Same as above, except that the fonts usually have permissions set which allow fonts to be embedded in documents for viewing and printing, but not for editing. These fonts should not be copied to a web server. 
  • Fonts supplied commercially by foundries.
    If you wish to load these fonts onto a server in order to embed them in your web pages, you need to check the conditions of the license. Some vendors may specifically exclude this use and expect additional payments for a license. 
    Some vendors set permissions to prevent all embedding, although this is uncommon.

Where can we get typefaces from?

We can supply high quality fonts for virtually any script, together with the appropriate licenses for internet-based installation. Please contact us for information.

Apart from this, a large number of fonts are available free of charge from various sites on the internet, though their quality is variable (to say the least). If you are our client, we will probably have evaluated them and be in a position to recommend them (or not). Otherwise, a web search will find them.

Commercial fonts using OpenType for complex scripts are few and far between. This is largely a result of piracy.

How do we manage translations?

If you are using TYPO3, the management of translations is a logical extension of the general process of managing the writing, editing and publication of all content.

If your site is deployed as a parallel structure (i.e. in effect several sites with the same structure, each in a different language), a page edited in one site language can be prepared for translation instantly, at the click of a button: the result is a new page containing all the original text for the translator to refer to; the translator merely has to enter the translations into the content elements, then delete the original text. Once completed, the page is passed to whatever system of publication control is being used.
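
The copy-then-translate workflow described above can be modelled with a very small data structure. The Python sketch below is hypothetical and does not reflect TYPO3's actual database schema or API: each content element holds its text per language, preparing a translation copies the source text as a reference for the translator, and untranslated elements fall back to the dominant language when a page is displayed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentElement:
    uid: int
    text: Dict[str, str] = field(default_factory=dict)  # language code -> text


@dataclass
class Page:
    title: str
    elements: List[ContentElement] = field(default_factory=list)

    def prepare_translation(self, source: str, target: str) -> None:
        """Copy the source text into the target language for reference."""
        for element in self.elements:
            element.text.setdefault(target, element.text[source])

    def untranslated(self, source: str, target: str) -> List[int]:
        """Elements whose target text is missing or still equals the source."""
        return [
            e.uid
            for e in self.elements
            if e.text.get(target) in (None, e.text[source])
        ]

    def render(self, language: str, dominant: str) -> List[str]:
        """Text for each element in the requested language, else the dominant one."""
        return [e.text.get(language, e.text[dominant]) for e in self.elements]


page = Page("Home", [ContentElement(1, {"en": "Welcome"}), ContentElement(2, {"en": "Contact us"})])
page.prepare_translation("en", "hi")
page.elements[0].text["hi"] = "स्वागत है"
print(page.untranslated("en", "hi"))  # [2]
print(page.render("bn", "en"))        # ['Welcome', 'Contact us']
```

Comparing against the source text is a crude test: an element whose correct translation happens to be identical (a brand name, say) would be flagged, so a real system would record an explicit translation status instead.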

The other aspect of translation management is of course getting the translations done. Through our work in multilingual computing we have extensive contacts with translators, particularly for Indic languages.

Clearly an advantage of a content management system is that a translator can work on your site from anywhere in the world that is connected to the internet. Moreover, their access to the site can be limited strictly to those areas that are their business: essentially, they log in to a back-end that can be set up to display only the pages that need their attention.
