KEMT 2024

First International Workshop on
Knowledge-Enhanced Machine Translation

The 25th Annual Conference of The European Association for Machine Translation

@kemt2024

27 JUNE 2024

SHEFFIELD, ENGLAND

Scope

In all machine translation paradigms, incorporating external knowledge eases the adaptation of machine translation systems to specific use cases and domains. In resource-rich scenarios, translation memories (TMs) and terminology databases are commonly used for this purpose. In resource-poor scenarios, where the quality of systems trained solely on parallel sentences is poor, almost any kind of additional source of information (dictionaries, linguistic information, translation quality indicators, etc.) is valuable. At the First International Workshop on Knowledge-Enhanced Machine Translation (KEMT2024), our goal is to bring together practitioners of these techniques to exchange insights and foster collaboration between academia and industry.

Topics

We welcome technical reports of research results in any aspect of integration of additional knowledge into machine translation, as well as case studies describing experiences in organisations of all types.
The topics of interest include, but are not limited to:

Integration of external terminology and constrained decoding
Integration of TMs and similar translations from external sources
Leveraging any kind of linguistic information
Data augmentation techniques
Using large language models to integrate external resources
Integration of translation quality indicators for improving final MT output
Quality assessment of knowledge-enhanced MT systems
Utilizing quality estimation systems for improving MT performance
Integration of knowledge graphs

Important Dates

First call for papers: 13th of March 2024
Second call for papers: 4th of April 2024
Submission deadline: 26th of April 2024 (23:59 AoE, Anywhere on Earth)
Acceptance notification: 15th of May 2024
Camera-ready due: 27th of May 2024
KEMT Workshop: 27th of June 2024

KEMT 2024 Proceedings

proceedings

Registration

The workshop will take place on the 27th of June, the last day of the EAMT 2024 conference, which will be held from the 24th to the 27th of June. Participation in this workshop requires registration for the main conference.

For registration details, please refer to the EAMT 2024 website.

Submission Types

We welcome submissions of either research papers or extended abstracts/industry reports. Full research papers should describe original, unpublished content, while extended abstracts may report preliminary results of ongoing research. Industry reports should demonstrate the impact of conceptual modelling in a real-world setting, arguing for generalisability of methods and lessons learned.

Full research papers: Submissions will be accepted as papers of 4 to 10 pages (plus unlimited pages for references and appendices).
Extended abstracts/industry reports: Submissions will be accepted as papers of up to 2 pages. References are not included in the 2-page limit.

Accepted submissions will be presented either as posters or oral communications, as decided by the program committee, and will be published online as proceedings included in the ACL Anthology, unless the authors specify otherwise.

Submission Guidelines

Submissions are closed!
Anonymisation is not required.
Reviews will be carried out by the PC members.
Please find the EAMT guidelines and style templates here.
Submissions can be withdrawn before 27th of May 2024.
IMPORTANT: Make sure to create an account at OpenReview in advance! Your account will need to be approved before you can make a submission to KEMT 2024.

Organisation Committee

Arda Tezcan (Ghent University, Belgium)
Víctor M. Sánchez-Cartagena (Universitat d'Alacant, Spain)
Miquel Esplà-Gomis (Universitat d'Alacant, Spain)

Program Committee

Frédéric Blain (Tilburg University)
Josep Crego (Systran)
Miquel Esplà-Gomis (University of Alicante)
Yasmin Moslem (Dublin City University)
Juan Antonio Pérez-Ortiz (University of Alicante)
Víctor M. Sánchez-Cartagena (University of Alicante)
Felipe Sánchez-Martínez (University of Alicante)
Arda Tezcan (Ghent University)
Daniel Torregrosa (World Intellectual Property Organization)
Antonio Toral (University of Groningen)
Tom Vanallemeersch (CrossLang)
Vincent Vandeghinste (Instituut voor de Nederlandse Taal)
Bram Vanroy (Katholieke Universiteit Leuven)
François Yvon (CNRS and Sorbonne-Université)

Invited Speakers

Ricardo Rei

TowerLLM: Improving Translation Quality through Prompting with Terminology and Translation Guidelines

TowerLLM revolutionizes machine translation by tailoring large language models (LLMs) to diverse translation tasks. By continued pretraining on mixed data and fine-tuning with task-specific instructions, TowerLLM surpasses open alternatives and rivals closed LLMs. This approach ensures proficiency across translation workflows, enhancing quality and efficiency. TowerLLM's impact extends beyond technical advancements, envisioning a future where specialized LLMs seamlessly integrate into translation pipelines, augmenting human capabilities. With the release of Tower models, specialized datasets, and evaluation frameworks, TowerLLM democratizes access to specialized resources, fostering collaboration and driving transformative advancements in machine translation.

Ricardo Rei is a senior research scientist at Unbabel, specializing in machine translation and natural language processing. He is set to complete his Ph.D. in April, which has been a collaborative effort between Unbabel, INESC-ID/Tecnico, and CMU. His doctoral research has been centered on machine translation evaluation, and he is the main developer behind the COMET evaluation framework, which has become the industry standard metric for assessing machine translation quality. With a keen interest in advancing the capabilities of multilingual large language models (LLMs), he has been at the forefront of research and development in this domain. When not immersed in research, Ricardo enjoys maintaining an active lifestyle, often found at the gym or riding the waves while surfing, a passion he has pursued since the age of nine.

Tom Vanallemeersch

To Customize Is to Know: Leveraging In-house Knowledge for Multilingual Document Flows

While the number of commercial and open-source multilingual NLP models keeps growing steadily, such generic models do not necessarily meet users' unique demands in full. This is especially true for companies and public administrations with highly specialized document flows. To optimize the use of multilingual tools, these organizations should be aware of the value of their in-house knowledge. This knowledge is embedded not only in multilingual assets like translation memories, documents in various languages and formats, or glossaries, but also in the in-house expertise on document functionality and critical textual elements like terms and named entities.

Tom Vanallemeersch is Language AI Adviser at CrossLang, where he contributes to the customisation and deployment of multilingual NLP systems and coordinates the company’s participation in publicly funded projects. Besides various positions in industry, including work at Systran, his career spans academia (PhD in computational linguistics at the University of Leuven) and consultancy for the European Commission (DG Translation’s MT team). In his spare time, his membership of a chamber choir allows him to conduct multilingual experiments of a wholly different kind.

Programme

JUNE 27, 2024

08:30-09:00

Registration

09:00-09:05

Opening notes

09:05-10:00

Keynote 1 — To Customize Is to Know: Leveraging In-house Knowledge for Multilingual Document Flows
Tom Vanallemeersch

10:00-10:30

Adding soft terminology constraints to pre-trained generic MT models by means of continued training
Tommi Nieminen

10:30-11:00

Coffee break

11:00-12:30

Keynote 2 — TowerLLM: Improving Translation Quality through Prompting with Terminology and Translation Guidelines
Ricardo Rei

12:30-13:30

Lunch

13:30-14:00

Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation
Jesujoba O. Alabi and Rachel Bawden

14:00-14:30

Incorporating Hypernym Features for Improving Low-resource Neural Machine Translation
Abhisek Chakrabarty, Haiyue Song, Raj Dabre, Hideki Tanaka and Masao Utiyama

14:30-15:30

Poster session (co-located with other workshops)

15:00-15:30

Coffee break

15:30-16:30

Panel Session
Rachel Bawden, Catarina Farinha, Tommi Nieminen, Ricardo Rei, Haiyue Song and Tom Vanallemeersch

16:30

End of workshop

The Venue

The Diamond

University of Sheffield,

32 Leavygreave Rd

S3 7RD

The Diamond is located in the heart of the University of Sheffield main campus, easily accessible from the city centre.

The nearest tram stops are University of Sheffield and West Street, both less than a 5-minute walk from the conference venue.

There are also bus stops on West Street. The main lines are 51, 52, 95, 95a and 120, which connect Sheffield Train Station, the city centre and other neighbourhoods of Sheffield (including the student accommodation at Endcliffe Village) with the Diamond building.

More information about key bus/tram stops and connections with the main bus station and the train station can be found here.
