@inproceedings{lopez-etal-2023-make, title = "The {MAKE}-{NMTVIZ} System Description for the {WMT}23 Literary Task", author = "Lopez, Fabien and Gonz{\'a}lez, Gabriela and Hansen, Damien and Nakhle, Mariam and Namdarzadeh, Behnoosh and Ballier, Nicolas and Dinarelli, Marco and Esperan{\c{c}}a-Rodier, Emmanuelle and He, Sui and Mohseni, Sadaf and Rossi, Caroline and Schwab, Didier and Yang, Jun and Yun{\`e}s, Jean-Baptiste and Zhu, Lichao", editor = "Koehn, Philipp and Haddow, Barry and Kocmi, Tom and Monz, Christof", booktitle = "Proceedings of the Eighth Conference on Machine Translation", month = dec, year = "2023", address = "Singapore", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.wmt-1.30", doi = "10.18653/v1/2023.wmt-1.30", pages = "287--295", abstract = "This paper describes the MAKE-NMTVIZ Systems trained for the WMT 2023 Literary task. As a primary submission, we used Train, Valid1, test1 as part of the GuoFeng corpus (Wang et al., 2023) to fine-tune the mBART50 model with Chinese-English data. We followed very similar training parameters to (Lee et al. 2022) when fine-tuning mBART50. We trained for 3 epochs, using gelu as an activation function, with a learning rate of 0.05, dropout of 0.1 and a batch size of 16. We decoded using a beam search of size 5. For our contrastive1 submission, we implemented a fine-tuned concatenation transformer (Lupo et al., 2023). The training was developed in two steps: (i) a sentence-level transformer was implemented for 10 epochs trained using general, test1, and valid1 data (more details in contrastive2 system); (ii) second, we fine-tuned at document-level using 3-sentence concatenation for 4 epochs using train, test2, and valid2 data. During the fine-tuning, we used ReLU as an activation function, with an inverse square root learning rate, dropout of 0.1, and a batch size of 64. We decoded using a beam search of size. Four our contrastive2 and last submission, we implemented a sentence-level transformer model (Vaswani et al., 2017). The model was trained with general data for 10 epochs using general-purpose, test1, and valid 1 data. The training parameters were an inverse square root scheduled learning rate, a dropout of 0.1, and a batch size of 64. We decoded using a beam search of size 4. We then compared the three translation outputs from an interdisciplinary perspective, investigating some of the effects of sentence- vs document-based training. Computer scientists, translators and corpus linguists discussed the linguistic remaining issues for this discourse-level literary translation.", }