Qatar Computing Research Institute (QCRI), part of Hamad Bin Khalifa University (HBKU), recently celebrated a milestone when its machine translation system, Shaheen, passed 1bn words translated.
Using statistical and deep learning methods for language processing, Shaheen produces accurate Arabic translations of English content, and vice versa. Arabic is an official language of 25 countries, is spoken by more than 400mn people, and is widely used around the world in scientific and artistic literature.
By making Arabic content accessible to the wider world, the system broadens access to information and supports knowledge sharing and learning.
Dr Hassan Sajjad, a scientist at QCRI, said: “One of the priority areas of QCRI is Arabic language technologies with the intention of promoting the language in the information age. Shaheen, which has been widely used worldwide in different fields and applications, is one of QCRI's successes in line with this objective. Shaheen uses state-of-the-art technology that preserves context in translating between languages, providing users with high-quality content.”
Shaheen’s success was made possible by the QCRI Arabic Language Technologies team at HBKU, including Dr Hassan Sajjad, Dr Nadir Durrani, Dr Ahmed Abdelali and Fahim Dalvi. The team also had the support of Dr Stephan Vogel and Dr Francisco Guzmán during the initial stages of the system’s development.
The QCRI development team used a comprehensive collection of Arabic and English documents of various types, styles and topics, such as United Nations proceedings, news, TED Talks, movie subtitles and educational lectures, and performed billions of computations to train and refine the system. They developed artificial intelligence-based domain adaptation and generalisation techniques that allow the model to learn to translate between the two languages from heterogeneous data while maintaining high translation quality.
Since its launch in 2018, Shaheen has been used in 46 countries across five continents by organisations such as Al Jazeera Media Network, the BBC and Deutsche Welle. To date, Shaheen has logged more than 171,363 hours of computing time, or roughly 7,140 days, producing translations that help users understand the original documents.