@inproceedings{lopez-etal-2023-make,
    title = "The {MAKE}-{NMTVIZ} System Description for the {WMT}23 Literary Task",
    author = "Lopez, Fabien  and
      Gonz{\'a}lez, Gabriela  and
      Hansen, Damien  and
      Nakhle, Mariam  and
      Namdarzadeh, Behnoosh  and
      Ballier, Nicolas  and
      Dinarelli, Marco  and
      Esperan{\c{c}}a-Rodier, Emmanuelle  and
      He, Sui  and
      Mohseni, Sadaf  and
      Rossi, Caroline  and
      Schwab, Didier  and
      Yang, Jun  and
      Yun{\`e}s, Jean-Baptiste  and
      Zhu, Lichao",
    editor = "Koehn, Philipp  and
      Haddow, Barry  and
      Kocmi, Tom  and
      Monz, Christof",
    booktitle = "Proceedings of the Eighth Conference on Machine Translation",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.wmt-1.30",
    doi = "10.18653/v1/2023.wmt-1.30",
    pages = "287--295",
    abstract = "This paper describes the MAKE-NMTVIZ systems trained for the WMT 2023 Literary task. As a primary submission, we used Train, Valid1, and test1 from the GuoFeng corpus (Wang et al., 2023) to fine-tune the mBART50 model with Chinese-English data. We followed training parameters very similar to those of Lee et al. (2022) when fine-tuning mBART50. We trained for 3 epochs, using GELU as an activation function, with a learning rate of 0.05, dropout of 0.1 and a batch size of 16. We decoded using a beam search of size 5. For our contrastive1 submission, we implemented a fine-tuned concatenation transformer (Lupo et al., 2023). The training was carried out in two steps: (i) a sentence-level transformer was trained for 10 epochs using general, test1, and valid1 data (more details under the contrastive2 system); (ii) we then fine-tuned at document level using 3-sentence concatenation for 4 epochs using train, test2, and valid2 data. During the fine-tuning, we used ReLU as an activation function, with an inverse square root learning rate, dropout of 0.1, and a batch size of 64. We decoded using a beam search of size. For our contrastive2 and last submission, we implemented a sentence-level transformer model (Vaswani et al., 2017). The model was trained for 10 epochs using general-purpose, test1, and valid1 data. The training parameters were an inverse square root scheduled learning rate, a dropout of 0.1, and a batch size of 64. We decoded using a beam search of size 4. We then compared the three translation outputs from an interdisciplinary perspective, investigating some of the effects of sentence- vs document-based training. Computer scientists, translators and corpus linguists discussed the remaining linguistic issues for this discourse-level literary translation.",
}