Talk:Bitext word alignment

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Bitext word alignment. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 09:25, 3 November 2016 (UTC)

Terribly dated

The article describes the state of affairs from around 10 years ago and is thus quite misleading. Relevant developments since then include:

  • FastAlign (fast_align, yet another IBM Model 2 implementation, but easier to use and hence more popular than GIZA++ nowadays; a minimal invocation sketch follows this list)
    • Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proc. of NAACL-HLT, pages 644–648.
  • neural alignment
    • Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
    • Ho, Anh Khoa Ngo and François Yvon. 2019. Neural baselines for word alignments. In International Workshop on Spoken Language Translation.
    • Ferrando, Javier and Marta R. Costa-jussà. 2021. Attention weights in transformer NMT fail aligning words between sequences but largely explain model predictions. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 434–443, Association for Computational Linguistics, Punta Cana, Dominican Republic.
    • Jalili Sabet, Masoud, Philipp Dufter, François Yvon, and Hinrich Schütze. 2020. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1627–1643, Association for Computational Linguistics, Online. (A usage sketch follows this list.)
    • Dou, Zi-Yi and Graham Neubig. 2021. Word alignment by fine-tuning embeddings on parallel corpora. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2112–2128, Association for Computational Linguistics, Online.
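
On the "easier to use" point: fast_align is a single command-line tool that reads a file with one sentence pair per line, source and target separated by " ||| ", and writes Pharaoh-style index pairs to standard output. Below is a minimal sketch of such an invocation, wrapped in Python only to keep one language for the examples here; the file names are made up, and the -i, -d, -o and -v flags are the ones documented in the fast_align README.

    import subprocess

    # Hypothetical input file, one sentence pair per line, e.g.
    # "das Haus ist klein ||| the house is small"
    bitext = "corpus.de-en"

    with open("forward.align", "w") as out:
        # -d: favour alignment points near the diagonal,
        # -o: optimise the diagonal tension parameter,
        # -v: use a Dirichlet prior (variational Bayes)
        subprocess.run(["fast_align", "-i", bitext, "-d", "-o", "-v"],
                       stdout=out, check=True)

    # forward.align now holds one line of alignments per sentence pair,
    # e.g. "0-0 1-1 2-2 3-3"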

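For the neural/embedding-based line of work, SimAlign exposes a small Python API; the sketch below roughly follows the usage example in its README (package simalign; the parameter values shown are the README defaults and may differ between versions).

    from simalign import SentenceAligner

    # Multilingual BERT embeddings; "mai" selects the matching methods
    # mwmf, inter (argmax) and itermax described in the SimAlign paper.
    aligner = SentenceAligner(model="bert", token_type="bpe", matching_methods="mai")

    # Sentences are passed as lists of tokens; no parallel training data is needed.
    src = ["This", "is", "a", "test", "."]
    trg = ["Das", "ist", "ein", "Test", "."]

    # Returns a dict mapping each matching method to a list of
    # zero-indexed (source_position, target_position) pairs.
    alignments = aligner.get_word_aligns(src, trg)
    for method, pairs in alignments.items():
        print(method, pairs)
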
If I have the time, I could imagine working that into the article, but this might take a while. At least I wanted to leave some pointers here for others to start from ;)

The other issue with the current article is that the implementations listed under "Software" actually perform very different tasks and need to be classified accordingly: HunAlign is for sentence alignment, the IBM models are for word alignment, and Anymalign is for dictionary induction. Chiarcos (talk) 09:41, 1 November 2023 (UTC)