Making the Web World Wide: The W3C Launches a New Internationalization Initiative
Last month the World Wide Web Consortium (W3C) announced its new Internationalization Initiative as a way to boost its long-running activity in this area. CSA Research spoke with Richard Ishida, who leads these efforts, to learn more about its plans and what they mean for the language industry. He described an ambitious effort to identify – and resolve – technological barriers that keep the web from living up to the world wide part of its name. However, the success of this effort will rely on contributions from language experts around the world.
For speakers of English and various other major languages, it is easy to believe that the web just works, but for smaller linguistic communities, the story can be quite different. The W3C has begun an evaluation of how well 79 languages work on the web and already found that at least 60 of them face obstacles.
The W3C’s Language Matrix identifies online multilingual issues that need improvement.
Source: World Wide Web Consortium
The Internationalization Initiative’s first step is to engage in a gap analysis for the languages, which focuses on the ability of browsers and e-book readers to display and interact with these languages. Some languages have excellent support, while others are barely functional. For example, the gap analysis has found that line breaking does not work for the Javanese script, and the solution would require that some basic natural language processing (NLP) capabilities be baked into browsers. Similarly, traditional Mongolian has persistent problems with encoding.
But as even a casual glance at the chart shows, the number of question marks that indicate unknown status is very high. Ishida tells us that the Internationalization Initiative in many cases doesn’t know whether or not there are issues for certain features of specific languages, and so needs outside experts to contribute their knowledge. For example, Urdu, Pashto, and several other Arabic-script languages currently show perfect scores in support, but only four out of 24 cells for them are filled out. Extending the analysis would almost certainly show them – particularly Urdu, which is arguably the single most difficult language to render on computers – to face major obstacles, similar to Arabic (which has a score of 0.19 out of 1.0).
Addressing these problems will require the W3C to engage with outside experts who might not otherwise get involved with the organization’s efforts. The Internationalization Initiative is now using GitHub to allow the public to add issues to individual gap analysis reports, which makes it easier to contribute and manage issues, and allows editors to build documents in manageable chunks. There’s also a handy page that tracks current issues and questions about typographic support. Through these documents, developers will be able to identify barriers to access and resolve them. Of course, success depends on outreach and dissemination, as the people with the most direct experience with difficulties are unlikely to discover this effort in the course of their normal activities.
A second area of focus is on helping developers create interoperable specifications, content, and code to improve the overall state of language infrastructure on the web.
- Developers of W3C specifications. Currently, creators of W3C specifications may produce standards that do not account for international needs. A new checklist helps them ensure that their efforts will work across languages. Although the recommendations emphasize development of Web standards, the issues they raise are important to any technology implementer that needs to create internationalized data formats that can work and interoperate with other ones. Here automatic audit tools that look for potential problems based on keywords might also help. The Initiative also provides an index to help application developers find information about script-specific requirements.
- Individuals who work with W3C technologies. This initiative publishes information on authoring support that provides best practices and recommendations for how to handle multilingual content in web formats, style sheets, and other content. These materials are fairly technical, but content creators who want to do things right should keep them handy to ensure that they create optimized content that will work around the world. Any organization that produces multilingual content will benefit from familiarity with the issues that they raise.
- Web developers. The W3C’s Internationalization Checker helps web developers identify problems in their code that might create barriers for their content. This automated tool provides detailed reports on specific problems, along with suggestions for how to resolve them. It is particularly important for web architects creating new templates, as solving issues once in the design phase delivers maximum benefit.
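To make the idea of an automated check concrete, here is a minimal sketch of the kind of test such a tool might run. It is not the W3C Internationalization Checker's implementation. The class name `I18nAudit`, the function `audit`, and the choice of two checks (a language declaration on the `html` element and a `meta charset` declaration) are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class I18nAudit(HTMLParser):
    """Hypothetical helper: records two basic internationalization signals."""

    def __init__(self):
        super().__init__()
        self.has_lang = False      # <html lang="..."> present and non-empty
        self.has_charset = False   # <meta charset="..."> present and non-empty

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attr_map = dict(attrs)
        if tag == "html" and attr_map.get("lang"):
            self.has_lang = True
        if tag == "meta" and attr_map.get("charset"):
            self.has_charset = True


def audit(html_text):
    """Return a list of warnings for the given markup (illustrative only)."""
    parser = I18nAudit()
    parser.feed(html_text)
    warnings = []
    if not parser.has_lang:
        warnings.append("missing lang attribute on <html>")
    if not parser.has_charset:
        warnings.append("missing <meta charset> declaration")
    return warnings


if __name__ == "__main__":
    page = "<html><head><title>Test</title></head><body></body></html>"
    for warning in audit(page):
        print(warning)
```

Running a check like this against a page template rather than individual pages reflects the point made above: an issue fixed once in the design phase is fixed everywhere the template is used.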
These tools can help prepare the web for multilingualism, but the larger effort will be to improve the infrastructure, including browser support, so that web agents can render and interact with text appropriately, regardless of locale. Helping the web to live up to its international potential is a massive undertaking, but many hands can make light work, and CSA Research encourages individuals with knowledge about various languages and their limitations online to share their knowledge for the benefit of all.
Although the W3C may be able to get technical aficionados to contribute their knowledge to the documents, it will need broader outreach to the community of experts and everyday users who can react with observations, nuance, and examples of where things don’t work. The most valuable feedback in this endeavor will be from those who know and understand the typographic best practices for their languages – such as printers, publishers, typesetters, typographers, university professors, government standards folk, and web or e-book content developers – and who can provide reliable, authoritative information or release any such information which is not currently available in English.
Also key will be finding ways to engage with actual users – through methods such as gamified approaches to bug reporting with leaderboards, local-language surveys of web users about problems they encounter, and in-context observational studies. However, doing so is a balancing act if the Initiative is not to be overwhelmed with off-topic comments. Succeeding here could help the W3C break out of the technical communities that have traditionally provided resources. Such a move would represent a major change in strategy, but would provide real benefits around the world.