Jonathan Hall

Jon Hall

Abstract

The Low Resource Language Database (LORELAD) is a project that I designed to aid language preservation efforts. The project was selected to be part of the BU Spark! Innovation Fellowship for the Fall 2020 semester. Along with guidance and funding, I got to interview and select four teammates to lead in developing Lorelad.

Background

In my sophomore year, I took an intro to Linguistics class that I became really involved in. The complexity of language translation is fascinating. Viewed as a function, translation is rarely surjective or injective: a word in one language may map to several words in another, or to none at all. This means even ideal translations aren't necessarily correct (which is why famous works such as The Divine Comedy or The Iliad have so many different versions).

My professor encouraged me to attend a talk (Professor Gina-Anne Levow, Translation for Low Resource Languages) where I learned about our rapidly decreasing language diversity. Languages are dying at an alarming rate: by some estimates, only about 10% of the world's 7,000 or so actively spoken languages will still be spoken by the year 2100.

Among other things, this is largely due to our globalizing world economy. In order to trade and do business, we need a common language to understand each other. So, in school we're taught English, Chinese, French, Spanish, and German. The majority of the world speaks at least one of these five languages proficiently.

The availability of and demand for worldly languages have made local languages redundant as a tool to communicate. But language is so much more than a tool. The way we speak helps define who we are. Language holds our culture, our heritage, and our history. And many languages are dying. The decline is often hard to see because the state of a language is determined by each generation.

In school we're taught worldly languages, and we're encouraged to speak them with our friends and teachers. But at home, we likely still speak our parents' native language and engage in our parents' traditions. This generation is properly bilingual. However, when this generation has kids, they likely speak their worldly language at home. The kids may still practice the traditions native to their grandparents, and they may still be taught their local language, but they usually speak it brokenly, and illiteracy is common. By the next generation, the local language is usually no longer practiced, and the cultural connection is severed.

Lorelad

Grim as it is, this was largely the motivation behind LORELAD. I set out to help language preservation with the skills and resources I had. Being interested in Natural Language Processing and software, I settled on creating a platform for low resource language speakers to share stories in their native language. These stories could be recipes, folk tales, beliefs, music, or really anything. I want Lorelad to be a place to connect culturally with others who share your language, where people can come to explore and learn about languages other than their own.

I also want Lorelad to serve as a public repository for low resource language data. The magic wand solution is that if we create a machine translator that's good and accessible enough, we won't ever have to give up our local languages for worldly ones.

Tools Used: