The Sound of Silence - Extinct Languages Disappear Without a Trace in Brazilian Museum Fire - Our Analysts' Insights
31 Oct

The Museu Nacional inferno in September shows how easily physical artifacts disappear. The famous museum’s 20 million artifacts included not only insects and fossils, but also garments, tools, and documents of indigenous peoples collected over hundreds of years. Tragically for anthropologists, linguists, and musicologists, it also contained recordings of conversations, ceremonies, and songs that were not duplicated in any other collection: interviews etched on century-old wax cylinders; indigenous music captured on a phonograph in 1912; film, reel-to-reel, and cassette tapes recorded by field anthropologists since the 1960s. Though some efforts had been made, most of the linguistic collection was not digitized and is now lost forever.

Not surprisingly, Wired issued a call to arms for digitization of world archives in reaction to the fire. But how sustainable are digital archives? Museums typically don't have the funding they need to do everything they should or want to do, including digitizing all of their collections. The Museu Nacional’s annual maintenance budget was $128,000, and it hadn’t received even that much in actual funds since 2014. As of September, this year's allocation had been only $13,000. Under such circumstances, how safe will a digital archive be?

Half of the world’s linguistic diversity is already gone: we are reduced to 7,000 or so languages today from an estimated peak of 15,000 to 20,000, and experts predict that half of those remaining will die out by the end of the century. So the question of preservation is as much ahead of us as behind us. How will we capture and retain the cultural and linguistic records of the information era?

Linguist Laura McPherson relates: “Though spoken diversity is sure to continue its decline, through documentation and archiving, records of this intangible cultural heritage will always be preserved. The more thorough the documentation — the more it extends beyond language use to include cultural activities, folklore, indigenous taxonomies, music, and more — the richer the record and the better picture we can paint of these precious slices of human ingenuity.”

Today, digital records proliferate, especially with the spread of video recorders on smartphones and the sharing of video recordings on social media. Will YouTube be the stomping ground for future anthropologists, linguists, and musicologists? YouTube is Google’s most widely adopted app after search, more popular than Gmail. The site already houses 1.3 billion assets, with 300 hours of video being uploaded per minute by 1.5 billion active users. Unfortunately, the site provides no way for users to filter or search based on structured content, instead relying on hashtags. Users who upload content have the option of identifying the language of the video, but only if the user takes the time and the language happens to be one of the 190 on the list. Moreover, the language metadata is only available to the algorithm, not to researchers.



Currently, searching for the name of an obscure language works only if the video title or description happens to include that word. For example, searching for Nyanja turned up several tutorials, and one example of colloquial use that a future linguist or cultural historian will greatly appreciate. However, users searching for content with hashtags may not find this video, because they might instead type in Chewa, Chichewa, Chinyanja, or Cinyanja, which are other common names for the same tongue.
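To make the alias problem concrete, here is a minimal sketch of how structured language labels could be matched across alternate names. It is an illustration only, not a description of how YouTube or any archive works: the alias table, the `canonical_language` and `find_videos` helpers, the `language_label` field, and the sample videos are all hypothetical. The one outside fact it relies on is that `nya` is the ISO 639-3 code for this language.

```python
# Hypothetical alias table. The names come from the paragraph above;
# "nya" is the ISO 639-3 identifier for the language they all refer to.
LANGUAGE_ALIASES = {
    "nya": {"nyanja", "chewa", "chichewa", "chinyanja", "cinyanja"},
}


def canonical_language(term):
    """Return the ISO 639-3 code for a language name, or None if unknown."""
    needle = term.strip().casefold()
    for code, names in LANGUAGE_ALIASES.items():
        if needle in names:
            return code
    return None


def find_videos(videos, query):
    """Return videos whose language label matches the query under any alias."""
    wanted = canonical_language(query)
    if wanted is None:
        return []
    return [v for v in videos if canonical_language(v["language_label"]) == wanted]


# Made-up sample records, labeled the way different uploaders might label them.
videos = [
    {"title": "Market day conversation", "language_label": "Nyanja"},
    {"title": "Wedding songs", "language_label": "Chichewa"},
    {"title": "Cooking lesson", "language_label": "Swahili"},
]

matches = find_videos(videos, "Cinyanja")
print([v["title"] for v in matches])  # ['Market day conversation', 'Wedding songs']
```

A search for any one of the five names returns both recordings, which plain keyword or hashtag matching would miss unless the uploader happened to use the same spelling as the searcher.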

Allowing users to properly label content, as uploaders or as curators, could help in identifying content of interest to linguists, ethno-musicologists, and researchers in many other disciplines. Ultimately, however, a commercial service like YouTube may not be the appropriate venue for permanent storage. Google and its parent Alphabet won’t necessarily archive content forever, despite their current intentions. Videos in languages few (and eventually no) people speak may be destined for the digital shredder in the long run, once the uploaders' accounts are no longer active. Though also susceptible to losses, services designed for academic record-keeping are more secure.

The Endangered Languages Archive (ELAR) and the Archive of the Indigenous Languages of Latin America (AILLA) are examples of digital audio and video collections that can be accessed online. Recent efforts to record disappearing languages by the National Science Foundation in the United States and at University of London’s Endangered Language program, which has documented 380 languages to date, require formal archiving of digital assets. But even these well-funded bodies may not be prepared for the scale of documentation available and in need of archiving.

Going back to the Brazilian example, there were once 2,000 tribes in the Amazon. Now there are roughly 500, speaking about 330 languages. As smartphones and other devices penetrate the vast canopy of the world’s largest forest and watershed, a great diversity of speech, songs, and living practices will be captured and uploaded to social media. If even large governments can’t afford the upkeep of their cultural institutions, does the technology industry itself have a role to play? We are losing languages and cultures every year and the pace is increasing. YouTube, Facebook, Netflix, and other companies gain revenue when people share cultural information in social media and through tourism. The loss of important cultural artifacts in collections like the Museu Nacional’s therefore represents, in a sense, a permanent loss of revenue, because images, descriptions, and discussions of those artifacts will never be shared. It’s in the best interest of the industry to help organize resources, in cooperation with philanthropists, NGOs, and other private and public initiatives, to protect and share the world knowledge that Google so famously set out to organize.

About the Author

Benjamin Sargent

Member of the Technology Advisory Board

Focuses on translation management systems and content management technologies
