Scope
In all machine translation paradigms, incorporating external knowledge eases the adaptation of machine translation systems to specific use cases and domains. In resource-rich scenarios, translation memories (TMs) and terminology databases are commonly used for this purpose. In resource-poor scenarios, where the quality of systems trained solely on parallel sentences is poor, almost any kind of additional source of information (dictionaries, linguistic information, translation quality indicators, etc.) is valuable. At the First International Workshop on Knowledge-Enhanced Machine Translation (KEMT2024), our goal is to bring together practitioners of these techniques to exchange insights and foster collaboration between academia and industry.
Topics
We welcome technical reports of research results on any aspect of integrating additional knowledge into machine translation, as well as case studies describing experiences in organisations of all types.
The topics of interest include, but are not limited to:
- Integration of external terminology and constrained decoding
- Integration of TMs and similar translations from external sources
- Leveraging any kind of linguistic information
- Data augmentation techniques
- Using large language models to integrate external resources
- Integration of translation quality indicators for improving final MT output
- Quality assessment of knowledge-enhanced MT systems
- Utilising quality estimation systems for improving MT performance
- Integration of knowledge graphs
Important Dates
- First call for papers: 13th of March 2024
- Second call for papers: 4th of April 2024
- Submission deadline: 26th of April 2024 (23:59, Anywhere on Earth)
- Acceptance notification: 15th of May 2024
- Camera-ready due: 27th of May 2024
- KEMT Workshop: 27th of June 2024
KEMT 2024 Proceedings
Registration
The workshop will take place on the 27th of June, the last day of the EAMT 2024 conference, which runs from the 24th to the 27th of June. Participation in this workshop requires registration for the main conference.
For registration details, please refer to the EAMT 2024 website.
Submission Types
We welcome submissions of either research papers or extended abstracts/industry reports. Full research papers should describe original, unpublished work, while extended abstracts may report preliminary results of ongoing research. Industry reports should demonstrate the impact of knowledge-enhanced machine translation in a real-world setting, arguing for the generalisability of methods and lessons learned.
- Full research papers: Submissions of 4 to 10 pages (plus unlimited pages for references and appendices).
- Extended abstracts/industry reports: Submissions of up to 2 pages; references are not included in the 2-page limit.
Accepted submissions will be presented either as posters or oral communications, as decided by the program committee, and will be published online as proceedings in the ACL Anthology, unless the authors specify otherwise.
Submission Guidelines
- Submissions are closed!
- Anonymisation is not required.
- Reviews will be carried out by the PC members.
- Please find the EAMT guidelines and style templates here.
- Submissions can be withdrawn before 27th of May 2024.
- IMPORTANT: Make sure to create an account at OpenReview in advance! Your account will need to be approved before you can make a submission to KEMT 2024.
Organisation Committee
Arda Tezcan (Ghent University, Belgium)
Víctor M. Sánchez-Cartagena (Universitat d'Alacant, Spain)
Miquel Esplà-Gomis (Universitat d'Alacant, Spain)
Program Committee
Frédéric Blain (Tilburg University)
Josep Crego (Systran)
Miquel Esplà-Gomis (University of Alicante)
Yasmin Moslem (Dublin City University)
Juan Antonio Pérez-Ortiz (University of Alicante)
Víctor M. Sánchez-Cartagena (University of Alicante)
Felipe Sánchez-Martínez (University of Alicante)
Arda Tezcan (Ghent University)
Daniel Torregrosa (World Intellectual Property Organization)
Antonio Toral (University of Groningen)
Tom Vanallemeersch (CrossLang)
Vincent Vandeghinste (Instituut voor de Nederlandse Taal)
Bram Vanroy (Katholieke Universiteit Leuven)
François Yvon (CNRS and Sorbonne-Université)
Sponsors
Invited Speakers
Ricardo Rei
TowerLLM: Improving Translation Quality through Prompting with Terminology and Translation Guidelines
TowerLLM revolutionizes machine translation by tailoring large language models (LLMs) to diverse translation tasks. By continued pretraining on mixed data and fine-tuning with task-specific instructions, TowerLLM surpasses open alternatives and rivals closed LLMs. This approach ensures proficiency across translation workflows, enhancing quality and efficiency. TowerLLM's impact extends beyond technical advancements, envisioning a future where specialized LLMs seamlessly integrate into translation pipelines, augmenting human capabilities. With the release of Tower models, specialized datasets, and evaluation frameworks, TowerLLM democratizes access to specialized resources, fostering collaboration and driving transformative advancements in machine translation.
Ricardo Rei is a senior research scientist at Unbabel, specializing in machine translation and natural language processing. He is set to complete his Ph.D. in April, a collaborative effort between Unbabel, INESC-ID/Técnico, and Carnegie Mellon University. His doctoral research has centred on machine translation evaluation, and he is the main developer behind the COMET evaluation framework, which has become the industry-standard metric for assessing machine translation quality. With a keen interest in advancing the capabilities of multilingual large language models (LLMs), he has been at the forefront of research and development in this domain. When not immersed in research, Ricardo enjoys maintaining an active lifestyle, often found at the gym or riding the waves while surfing, a passion he has pursued since the age of nine.
Tom Vanallemeersch
To Customize Is to Know: Leveraging In-house Knowledge for Multilingual Document Flows
While the number of commercial and open-source multilingual NLP models keeps growing steadily, such generic models do not necessarily meet users' unique demands in full. This is especially true for companies and public administrations with highly specialized document flows. To optimize the use of multilingual tools, these organizations should be aware of the value of their in-house knowledge. This knowledge is embedded not only in multilingual assets like translation memories, documents in various languages and formats, or glossaries, but also in in-house expertise on document functionality and critical textual elements like terms and named entities.
Tom Vanallemeersch is Language AI Adviser at CrossLang, where he contributes to the customisation and deployment of multilingual NLP systems and coordinates the company’s participation in publicly funded projects. Besides various positions in industry, including work at Systran, his career spans academia (PhD in computational linguistics at the University of Leuven) and consultancy for the European Commission (DG Translation’s MT team). In his spare time, his membership of a chamber choir allows him to conduct multilingual experiments of a wholly different kind.
Programme
JUNE 27, 2024
08:30-09:00  Registration
09:00-09:05  Opening notes
09:05-10:00  Keynote 1: To Customize Is to Know: Leveraging In-house Knowledge for Multilingual Document Flows (Tom Vanallemeersch)
10:00-10:30  Adding soft terminology constraints to pre-trained generic MT models by means of continued training (Tommi Nieminen)
10:30-11:00  Coffee break
11:00-12:30  Keynote 2: TowerLLM: Improving Translation Quality through Prompting with Terminology and Translation Guidelines (Ricardo Rei)
12:30-13:30  Lunch
13:30-14:00  Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation (Jesujoba O. Alabi and Rachel Bawden)
14:00-14:30  Incorporating Hypernym Features for Improving Low-resource Neural Machine Translation (Abhisek Chakrabarty, Haiyue Song, Raj Dabre, Hideki Tanaka and Masao Utiyama)
14:30-15:30  Poster session (co-located with other workshops)
15:00-15:30  Coffee break
15:30-16:30  Panel session (Rachel Bawden, Catarina Farinha, Tommi Nieminen, Ricardo Rei, Haiyue Song and Tom Vanallemeersch)
16:30        End of workshop
The Venue
The Diamond
University of Sheffield
32 Leavygreave Rd
Sheffield S3 7RD
The Diamond is located in the heart of the University of Sheffield main campus, easily accessible from the city centre.
The nearest tram stops are University of Sheffield and West Street, both less than a five-minute walk from the conference venue.
There are also bus stops at West Street; the main lines are 51, 52, 95, 95a and 120, connecting Sheffield Train Station, the city centre and other neighbourhoods of Sheffield (including the student accommodation at Endcliffe Village) with the Diamond building.
More information about key bus/tram stops and connections with the main bus station and the train station can be found here.