Talk:Bitext word alignment

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Bitext word alignment. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 09:25, 3 November 2016 (UTC)

Terribly dated

The article describes the state of affairs from around 10 years ago and is thus quite misleading. Relevant developments since then include:

  • FastAlign (fast_align, yet another IBM Model 2 implementation, but easier to use and hence more popular than GIZA++ nowadays; a minimal invocation sketch follows this list)
    • Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proc. of NAACL-HLT, pages 644–648.
  • neural alignment
    • Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
    • Ho, Anh Khoa Ngo and François Yvon. 2019. Neural baselines for word alignments. In International Workshop on Spoken Language Translation.
    • Ferrando, Javier and Marta R. Costa-jussà. 2021. Attention weights in transformer NMT fail aligning words between sequences but largely explain model predictions. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 434–443, Association for Computational Linguistics, Punta Cana, Dominican Republic.
    • Jalili Sabet, Masoud, Philipp Dufter, François Yvon, and Hinrich Schütze. 2020. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1627–1643, Association for Computational Linguistics, Online. (A usage sketch follows this list.)
    • Dou, Zi-Yi and Graham Neubig. 2021. Word alignment by fine-tuning embeddings on parallel corpora. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2112–2128, Association for Computational Linguistics, Online.
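
On the "easier to use" point: fast_align is a single command-line tool that reads a file with one sentence pair per line, source and target separated by " ||| ", and writes Pharaoh-style index pairs to standard output. Below is a minimal sketch of such an invocation, wrapped in Python only to keep one language for the examples here; the file names are made up, and the -i, -d, -o and -v flags are the ones documented in the fast_align README.

    import subprocess

    # Hypothetical input file, one sentence pair per line, e.g.
    # "das Haus ist klein ||| the house is small"
    bitext = "corpus.de-en"

    with open("forward.align", "w") as out:
        # -d: favour alignment points near the diagonal,
        # -o: optimise the diagonal tension parameter,
        # -v: use a Dirichlet prior (variational Bayes)
        subprocess.run(["fast_align", "-i", bitext, "-d", "-o", "-v"],
                       stdout=out, check=True)

    # forward.align now holds one line of alignments per sentence pair,
    # e.g. "0-0 1-1 2-2 3-3"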

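For the neural/embedding-based line of work, SimAlign exposes a small Python API; the sketch below roughly follows the usage example in its README (package simalign; the parameter values shown are the README defaults and may differ between versions).

    from simalign import SentenceAligner

    # Multilingual BERT embeddings; "mai" selects the matching methods
    # mwmf, inter (argmax) and itermax described in the SimAlign paper.
    aligner = SentenceAligner(model="bert", token_type="bpe", matching_methods="mai")

    # Sentences are passed as lists of tokens; no parallel training data is needed.
    src = ["This", "is", "a", "test", "."]
    trg = ["Das", "ist", "ein", "Test", "."]

    # Returns a dict mapping each matching method to a list of
    # zero-indexed (source_position, target_position) pairs.
    alignments = aligner.get_word_aligns(src, trg)
    for method, pairs in alignments.items():
        print(method, pairs)
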
If I have the time, I could imagine working that into the article, but this might take a while. At least I wanted to leave some pointers here for others to start from ;)

The other issue with the current article is that the implementations listed under "Software" actually perform very different tasks and need to be classified accordingly: HunAlign is for sentence alignment, the IBM models are for word alignment, and Anymalign is for dictionary induction. Chiarcos (talk) 09:41, 1 November 2023 (UTC)