

id: aligned-vectors

title: Aligned word vectors

We are publishing aligned word vectors for 44 languages, based on the pre-trained fastText vectors computed on Wikipedia. The alignment uses the RCSLS method described in Joulin et al. (2018).

Vectors

The aligned vectors can be downloaded from:

| | | | |
|-|-|-|-|
| Afrikaans: text | Arabic: text | Bulgarian: text | Bengali: text |
| Bosnian: text | Catalan: text | Czech: text | Danish: text |
| German: text | Greek: text | English: text | Spanish: text |
| Estonian: text | Persian: text | Finnish: text | French: text |
| Hebrew: text | Hindi: text | Croatian: text | Hungarian: text |
| Indonesian: text | Italian: text | Korean: text | Lithuanian: text |
| Latvian: text | Macedonian: text | Malay: text | Dutch: text |
| Norwegian: text | Polish: text | Portuguese: text | Romanian: text |
| Russian: text | Slovak: text | Slovenian: text | Albanian: text |
| Swedish: text | Tamil: text | Thai: text | Tagalog: text |
| Turkish: text | Ukrainian: text | Vietnamese: text | Chinese: text |

Format

The word vectors come in the default text format of fastText. The first line gives the number of vectors and their dimension. Each following line contains a word followed by its vector, with values separated by spaces.
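As an illustration of the format described above, here is a minimal sketch of a reader in Python. The function name `load_vectors` and its `max_words` parameter are hypothetical choices for this example and are not part of fastText. The sketch assumes that words contain no spaces and keeps every vector in memory as a plain list of floats.

```python
import io


def load_vectors(stream, max_words=None):
    """Parse fastText text-format vectors from an open text stream.

    Returns (n_words, dim, vectors), where vectors maps each word
    to a list of floats. `max_words` is a hypothetical convenience
    parameter that stops reading after that many entries.
    """
    # Header line: "<number of vectors> <dimension>"
    header = stream.readline().split()
    n_words, dim = int(header[0]), int(header[1])

    vectors = {}
    for line in stream:
        if max_words is not None and len(vectors) >= max_words:
            break
        if not line.strip():
            continue  # skip blank lines
        # Lines may end with a trailing space, so strip before splitting.
        tokens = line.rstrip().split(" ")
        if len(tokens) != dim + 1:
            raise ValueError("expected a word and %d values, got %d tokens"
                             % (dim, len(tokens)))
        vectors[tokens[0]] = [float(v) for v in tokens[1:]]
    return n_words, dim, vectors


# Tiny in-memory example standing in for a downloaded file.
sample = "2 3\nhello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n"
n_words, dim, vectors = load_vectors(io.StringIO(sample))
```

For a downloaded file, the same function can be called on a stream opened with `open(path, encoding="utf-8", newline="\n")`, where `path` is wherever the file was saved. Passing `max_words` can help when only the first entries are needed, since the full files are large.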

License

The word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.

References

If you use these word vectors, please cite the following papers:

[1] A. Joulin, P. Bojanowski, T. Mikolov, H. Jegou, E. Grave, Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

@InProceedings{joulin2018loss,
  title={Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion},
  author={Joulin, Armand and Bojanowski, Piotr and Mikolov, Tomas and J\'egou, Herv\'e and Grave, Edouard},
  year={2018},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
}

[2] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2017enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={Transactions of the Association for Computational Linguistics},
  volume={5},
  year={2017},
  issn={2307-387X},
  pages={135--146}
}