Our Analysts' Insights

17 Jul

TBX:2019: A New Version of the ISO Standard Raises the Bar

Localization industry veterans may recall when the OSCAR standards group in the now-defunct Localization Industry Standards Association introduced TermBase eXchange (TBX) way back in 2002, based on earlier work from 1999. Released in the early days of XML, it promised to be a major step forward for making terminological data useful. After it was adopted as an international standard (ISO 30042) in 2008, it seemed that it had reached maturity and a firm place as a star among language industry standards. However, TBX never quite lived up to its potential. A new version, released this year, could rehabilitate its position and prepare it for the next generation of content applications.

Translation tool vendors claimed to support TBX, but their products never quite managed to interoperate properly with competing and complementary terminology tools. As a result, many – if not most – LSPs and translators continued to exchange terminology via spreadsheets or CSV files, even though these mechanisms have serious problems, such as inconsistent formats and encodings and a lack of vital metadata. Even though TBX offered a good solution to such difficulties, users preferred the apparent simplicity of a spreadsheet.
 

TBX Grows Up

The situation has recently changed. Over the past few years, TBX underwent a major overhaul to address its limitations and prepare it to meet new goals. A steering committee in ISO Technical Committee 37 – composed of representatives from ASTM, CSA Research, FH Köln, LTAC Global, Universidad de Las Palmas de Gran Canaria, Kent State University, and the XLIFF committee – recently completed the 2019 version of ISO 30042, which ISO has now published. This new edition streamlines the format and addresses many of the complaints about, and limitations of, the 2008 version. Some of the major changes are:

  • Updated XML syntax. The earlier version adopted a syntax where data categories appeared as attributes in the XML code. Since then, XML best practice has shifted to using tag names for this purpose. As a result, TBX now supports two “styles” of XML: the original DCA (Data Categories as Attributes) and a newer DCT (Data Categories as Tag names). In the long term, practice may evolve to DCT exclusively, and the current format provides a migration path for existing TBX implementations. Other changes that apply to both styles are designed to make it easier to parse and work with TBX files and to use terminological data with XLIFF.

[Figure: DCA versus DCT styles]

  • Dialects simplify adoption. Perhaps the biggest impediment in the past has been that TBX does not define a single format for terminological data, but instead a way to represent the different formats various termbases use. As a result, many different data sets with different models have proven to be incompatible for interchange purposes. The newer version defines several “dialects” of TBX intended for common use and data interchange. The availability of standard dialects will remove a lot of guesswork and provide specific implementation targets for tool developers. In addition, the official dialects – TBX-Core, TBX-Min, and TBX-Basic – “telescope” into each other: Each one is a progressive superset of the preceding one, which facilitates interoperability between them. The standard also provides approaches for handling customized data categories and for developing custom dialect extensions.
     
  • Required dialect names. The 2008 version was problematic because implementers often ignored the requirement to declare what variant of TBX they were using in a separate file attached to every document. As a result, when someone received a file, there often was no way of knowing what data categories it would contain. The new version makes this declaration mandatory by using a dialect name rather than a separate file, so that implementers know what to expect from a given TBX document. No longer will someone receive a “TBX file” with no guidance concerning which data categories it implements. Creators of customized extensions to dialects are required to post formal dialect definitions as links (using an XML namespace for the DCT style) where users can find the information they need to ensure reliable interchange scenarios.
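To make the difference between the two styles concrete, here is a minimal Python sketch that rewrites a DCA-style fragment into a DCT-style one by moving each data category from a `type` attribute into the tag name. The fragment, the element names, the data-category values, and the `GENERIC_ELEMENTS` set are illustrative assumptions rather than text quoted from ISO 30042, and the sketch leaves out the namespaces and root-level declarations that a real TBX file would carry.

```python
import xml.etree.ElementTree as ET

# Illustrative DCA-style fragment (an assumption, not quoted from the standard):
# each data category is named in a "type" attribute on a generic element.
DCA_FRAGMENT = """
<termSec>
  <term>hard disk</term>
  <termNote type="partOfSpeech">noun</termNote>
  <termNote type="administrativeStatus">preferred</termNote>
</termSec>
"""

# Hypothetical set of generic elements whose "type" attribute names a data category.
GENERIC_ELEMENTS = {"termNote", "descrip", "admin"}


def dca_to_dct(element):
    """Return a copy of `element` with data categories moved into tag names."""
    if element.tag in GENERIC_ELEMENTS and "type" in element.attrib:
        # The data category becomes the tag name; any other attributes are kept.
        remaining = {k: v for k, v in element.attrib.items() if k != "type"}
        converted = ET.Element(element.attrib["type"], remaining)
    else:
        # Structural elements are copied unchanged.
        converted = ET.Element(element.tag, dict(element.attrib))
    converted.text = element.text
    converted.tail = element.tail
    for child in element:
        converted.append(dca_to_dct(child))
    return converted


dct_root = dca_to_dct(ET.fromstring(DCA_FRAGMENT))
dct_xml = ET.tostring(dct_root, encoding="unicode")
print(dct_xml)
```

In this simplified form the conversion is purely mechanical, which is consistent with the idea of a migration path between the two styles. A production converter would also need to handle the dialect declaration and the namespaces used for DCT data categories.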

To simplify implementation, the TBX Steering Committee set up TBXinfo.net with guidance, tools, and resources for implementers. This site helps ensure that materials needed to work with TBX are open to the public and freely available. By contrast, the standard itself – which carries a price of CHF158 (~US$160) – has been streamlined and shortened to reduce cost. In most cases, only developers will need to purchase ISO 30042, because other interested parties will find answers to their questions at the TBXinfo.net site.
 

TBX Plays a Vital Role in the Intelligent Content World

Why does this matter to language service providers and enterprise content creators? The most common type of translation error is failure to comply with terminology. Although TBX cannot resolve every problem, it does provide a standards-based approach to exchanging data about terms and implementing best practices for terminology management. Managing and controlling terminology is also a key requirement for creating intelligent content and translating it. Terminology management is thus set to become more important in the language industry, especially as TBX guides processes past spreadsheets to automated workflows and deployment of terminological resources.

The changes to TBX have modernized it and prepared it for the next generation of content applications. The new version resolves many of the challenges that implementers of the previous version faced and sets TBX up to fill a vital role in the language industry and intelligent content applications.

About the Author

Arle Lommel

Senior Analyst

Focuses on language technology, artificial intelligence, translation quality, and overall economic factors impacting globalization
