Evaluating the Human-Like Quality of Neural Machine Translation Outputs

Authors

  • Katya Ivanova, Ural Mountains University, Russia
  • Yuki Tanaka, Ural Mountains University, Russia

Abstract

Evaluating the human-like quality of neural machine translation (NMT) outputs is a crucial yet challenging task in natural language processing (NLP). This paper examines methodologies and metrics for assessing how closely NMT systems approximate human translation, focusing on linguistic fluency, semantic fidelity, and cultural appropriateness. The study applies established metrics such as BLEU (Bilingual Evaluation Understudy) and METEOR (Metric for Evaluation of Translation with Explicit ORdering), together with newer metrics designed to capture more nuanced aspects of human-like translation. It also considers the role of human evaluation in complementing automated metrics, providing insight into subjective perceptions of translation quality. Through empirical analysis of diverse datasets and language pairs, the study demonstrates the strengths and limitations of current evaluation frameworks in gauging human-like quality. The findings underscore the need for evaluation strategies that account for cultural context, idiomatic expressions, and stylistic nuance, advancing the development of NMT systems that more closely approach human translation proficiency.
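As a point of reference for the corpus-level metrics the abstract mentions, the following is a minimal sketch of how BLEU (and the related chrF score) are commonly computed, assuming the sacrebleu library; the example sentences are invented for illustration and are not drawn from the paper's datasets.

```python
# Minimal sketch: corpus-level BLEU and chrF with sacrebleu
# (assumed installed via `pip install sacrebleu`).
# The sentences below are illustrative only, not data from this study.
import sacrebleu

# System outputs (hypotheses) and one reference translation per segment.
hypotheses = [
    "The cat sat on the mat.",
    "He kicked the bucket yesterday.",
]
references = [
    "The cat was sitting on the mat.",
    "He passed away yesterday.",  # idiomatic reference an n-gram metric may penalize
]

# sacrebleu expects a list of reference streams (one list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```

Surface-overlap metrics of this kind can undervalue idiomatic or stylistically faithful renderings, which is one motivation the abstract gives for complementing them with human evaluation.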

Published

2024-03-21

Section

Articles