When DNA Includes AI, ML, and MT
An extremely popular gift for the holiday season is the family history DNA testing kit. Vendors such as MyHeritage and AncestryDNA advertise millions of users, increasing sales, and ever-improving analyses. No doubt many people are eagerly awaiting the results from a test tube full of saliva this week, hoping to confirm their expected heritage or to discover ancient roots – and expecting an absolute, definitive, and correct analysis of their ancestry. But that is a bit like expecting a machine translation to deliver perfect results.
It is not as simple as that. Like neural machine translation, or artificial intelligence (AI) and machine learning tools that can assess language quality or route translation jobs to the most appropriate linguist, DNA analysis of racial heritage is not (yet) an exact science. It is based on advanced computer analysis of a rapidly growing – and ever-changing – mass of data. If you subscribe to one of the family history services, you can watch your DNA analysis update and change over time; new countries or regions appear, finer details emerge, and percentages readjust.
AI and Machine Learning – and MT – Still Need Human Experts
Compare that to an MT engine that learns by acquiring new data, with results that change and improve with extended use. It is also easy to skew the quality of the output by introducing inconsistent or erroneous data. This reliance on additional data, and on human expertise to vet it, is another way in which family history and the translation process currently resemble one another.
By itself, a genealogical DNA test provides valuable information and can give a meaningful, general indication of family history – just as raw machine translation can deliver usable information. That general indication might be all you need – an idea of your family’s roots. Both processes depend on the amount and quality of relevant and accurate data available to the analytical software. But if you’re aiming for an accurate, comprehensive, generation-by-generation story that flows, the results must be combined with other data and insights – just as machine translation is improved by post-editing, and AI and machine learning can augment and automate many aspects of the translation process. Combined with other processes and data, these techniques become essential, interlocking components of an enterprise’s overarching language strategy.
As with machine translation, DNA results differ depending on the analysis engine. Different genealogical testing companies use different ethnic reference groups and matching algorithms, so results vary – sometimes dramatically. Consider the difference between free, generic MT output and results from an MT engine trained on content from a specific vertical market and company – the output will differ, and it takes a knowledgeable human to assess its quality within a given context.
DNA analysis is a useful and important addition to the genealogist’s toolkit. When you combine it with historical record research, family memories, and an expert eye for incorrect assumptions and other easy errors, you can build a clear and accurate picture of your personal heritage. Data integrity is vital – it is too easy for novice researchers to propagate errors in online ancestry repositories – so human expertise is not going away any time soon.
Reliable Data Aids Knowledge, Direction, and Strategy
Other data sets that are useful in ancestry research are also important in the localization industry. Census returns, for example, not only identify your great-great-grandfather’s siblings and country of origin, they also deliver up-to-date information about the languages spoken at home by people today. For example, federal, regional, and local entities in the United States use this data to define the language needs of their populations. Companies can combine the same resources with data analysis tools, such as the Global Revenue Forecaster™, to develop their global content strategies.
In both family history research and the localization industry, technological advances are augmenting and improving, but not replacing, traditional methods for delivering top-quality results. However, one area where the two differ is in “unexpected results.” There is no translation equivalent for a DNA test revealing that your great-grandad wasn’t Bill from Iowa but was most likely Ivan from downtown Moscow, Russia. So, before you spit in that DNA test tube, be sure you’re ready for a surprise or two – and don’t forget to combine the results with other data and research if you want the most accurate picture!