Working to surmount language barriers

22 Apr 2024

UCT’s Assoc Prof Maria Keet presented a paper at the 39th Symposium on Applied Computing on a novel modelling language, CoSMo, that aims to contribute towards a better and larger multilingual Wikipedia, allowing people to share knowledge across linguistic barriers.

Keet’s work springs from volunteer work she undertook in 2022 on Abstract Wikipedia, a special project of the Wikimedia Foundation. Abstract Wikipedia aims to allow more people to share knowledge in a greater variety of languages based on the same facts. It envisions creating and maintaining Wikipedia articles from a shared, language-independent source and automatically generating up-to-date articles in all supported languages. In other words, a combination of code, data, and linguistic information allows for the creation of natural language sentences in any of the supported languages. IsiZulu is one of the South African languages currently supported in Wikipedia, as are isiXhosa, Setswana, and Afrikaans, among others.

CoSMo assists users in selecting the ‘interesting’ content from the database, called Wikidata, and from the data store with functions, called Wikifunctions, that the processing pipeline will convert into natural language sentences.

Keet is also co-author, with Ariel Gutman, a software engineer and linguist at Google, of the template language that forms part of that data-to-text natural language generation pipeline. Read more about this on Abstract Wikipedia’s project page.

Both honours and master’s students in UCT’s Department of Computer Science have been, and continue to be, involved in related projects to expand this endeavour into multilingualism. The work on CoSMo was carried out in collaboration with Pablo Fillottrani from the Universidad Nacional del Sur, in Argentina, and Kutz Arieta, and benefited from discussions with members of the Abstract Wikipedia team.

The conference, sponsored by the ACM Special Interest Group on Applied Computing (SIGAPP), was held in Ávila, Spain, from 8 to 12 April 2024. The paper will be available after the conference at https://doi.org/10.1145/3605098.3635889; a non-technical overview is available in Keet’s blog post about CoSMo.