id: pretrained-vectors
We are publishing pre-trained word vectors for 294 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.
Please note that a newer version of the multilingual word vectors is available at: Word vectors for 157 languages.
The models can be downloaded from:
| | | |
|-|-|-|
| Abkhazian: bin+text, text | Acehnese: bin+text, text | Adyghe: bin+text, text |
| Afar: bin+text, text | Afrikaans: bin+text, text | Akan: bin+text, text |
| Albanian: bin+text, text | Alemannic: bin+text, text | Amharic: bin+text, text |
| Anglo_Saxon: bin+text, text | Arabic: bin+text, text | Aragonese: bin+text, text |
| Aramaic: bin+text, text | Armenian: bin+text, text | Aromanian: bin+text, text |
| Assamese: bin+text, text | Asturian: bin+text, text | Avar: bin+text, text |
| Aymara: bin+text, text | Azerbaijani: bin+text, text | Bambara: bin+text, text |
| Banjar: bin+text, text | Banyumasan: bin+text, text | Bashkir: bin+text, text |
| Basque: bin+text, text | Bavarian: bin+text, text | Belarusian: bin+text, text |
| Bengali: bin+text, text | Bihari: bin+text, text | Bishnupriya Manipuri: bin+text, text |
| Bislama: bin+text, text | Bosnian: bin+text, text | Breton: bin+text, text |
| Buginese: bin+text, text | Bulgarian: bin+text, text | Burmese: bin+text, text |
| Buryat: bin+text, text | Cantonese: bin+text, text | Catalan: bin+text, text |
| Cebuano: bin+text, text | Central Bicolano: bin+text, text | Chamorro: bin+text, text |
| Chavacano: bin+text, text | Chechen: bin+text, text | Cherokee: bin+text, text |
| Cheyenne: bin+text, text | Chichewa: bin+text, text | Chinese: bin+text, text |
| Choctaw: bin+text, text | Chuvash: bin+text, text | Classical Chinese: bin+text, text |
| Cornish: bin+text, text | Corsican: bin+text, text | Cree: bin+text, text |
| Crimean Tatar: bin+text, text | Croatian: bin+text, text | Czech: bin+text, text |
| Danish: bin+text, text | Divehi: bin+text, text | Dutch: bin+text, text |
| Dutch Low Saxon: bin+text, text | Dzongkha: bin+text, text | Eastern Punjabi: bin+text, text |
| Egyptian Arabic: bin+text, text | Emilian_Romagnol: bin+text, text | English: bin+text, text |
| Erzya: bin+text, text | Esperanto: bin+text, text | Estonian: bin+text, text |
| Ewe: bin+text, text | Extremaduran: bin+text, text | Faroese: bin+text, text |
| Fiji Hindi: bin+text, text | Fijian: bin+text, text | Finnish: bin+text, text |
| Franco_Provençal: bin+text, text | French: bin+text, text | Friulian: bin+text, text |
| Fula: bin+text, text | Gagauz: bin+text, text | Galician: bin+text, text |
| Gan: bin+text, text | Georgian: bin+text, text | German: bin+text, text |
| Gilaki: bin+text, text | Goan Konkani: bin+text, text | Gothic: bin+text, text |
| Greek: bin+text, text | Greenlandic: bin+text, text | Guarani: bin+text, text |
| Gujarati: bin+text, text | Haitian: bin+text, text | Hakka: bin+text, text |
| Hausa: bin+text, text | Hawaiian: bin+text, text | Hebrew: bin+text, text |
| Herero: bin+text, text | Hill Mari: bin+text, text | Hindi: bin+text, text |
| Hiri Motu: bin+text, text | Hungarian: bin+text, text | Icelandic: bin+text, text |
| Ido: bin+text, text | Igbo: bin+text, text | Ilokano: bin+text, text |
| Indonesian: bin+text, text | Interlingua: bin+text, text | Interlingue: bin+text, text |
| Inuktitut: bin+text, text | Inupiak: bin+text, text | Irish: bin+text, text |
| Italian: bin+text, text | Jamaican Patois: bin+text, text | Japanese: bin+text, text |
| Javanese: bin+text, text | Kabardian: bin+text, text | Kabyle: bin+text, text |
| Kalmyk: bin+text, text | Kannada: bin+text, text | Kanuri: bin+text, text |
| Kapampangan: bin+text, text | Karachay_Balkar: bin+text, text | Karakalpak: bin+text, text |
| Kashmiri: bin+text, text | Kashubian: bin+text, text | Kazakh: bin+text, text |
| Khmer: bin+text, text | Kikuyu: bin+text, text | Kinyarwanda: bin+text, text |
| Kirghiz: bin+text, text | Kirundi: bin+text, text | Komi: bin+text, text |
| Komi_Permyak: bin+text, text | Kongo: bin+text, text | Korean: bin+text, text |
| Kuanyama: bin+text, text | Kurdish (Kurmanji): bin+text, text | Kurdish (Sorani): bin+text, text |
| Ladino: bin+text, text | Lak: bin+text, text | Lao: bin+text, text |
| Latgalian: bin+text, text | Latin: bin+text, text | Latvian: bin+text, text |
| Lezgian: bin+text, text | Ligurian: bin+text, text | Limburgish: bin+text, text |
| Lingala: bin+text, text | Lithuanian: bin+text, text | Livvi_Karelian: bin+text, text |
| Lojban: bin+text, text | Lombard: bin+text, text | Low Saxon: bin+text, text |
| Lower Sorbian: bin+text, text | Luganda: bin+text, text | Luxembourgish: bin+text, text |
| Macedonian: bin+text, text | Maithili: bin+text, text | Malagasy: bin+text, text |
| Malay: bin+text, text | Malayalam: bin+text, text | Maltese: bin+text, text |
| Manx: bin+text, text | Maori: bin+text, text | Marathi: bin+text, text |
| Marshallese: bin+text, text | Mazandarani: bin+text, text | Meadow Mari: bin+text, text |
| Min Dong: bin+text, text | Min Nan: bin+text, text | Minangkabau: bin+text, text |
| Mingrelian: bin+text, text | Mirandese: bin+text, text | Moksha: bin+text, text |
| Moldovan: bin+text, text | Mongolian: bin+text, text | Muscogee: bin+text, text |
| Nahuatl: bin+text, text | Nauruan: bin+text, text | Navajo: bin+text, text |
| Ndonga: bin+text, text | Neapolitan: bin+text, text | Nepali: bin+text, text |
| Newar: bin+text, text | Norfolk: bin+text, text | Norman: bin+text, text |
| North Frisian: bin+text, text | Northern Luri: bin+text, text | Northern Sami: bin+text, text |
| Northern Sotho: bin+text, text | Norwegian (Bokmål): bin+text, text | Norwegian (Nynorsk): bin+text, text |
| Novial: bin+text, text | Nuosu: bin+text, text | Occitan: bin+text, text |
| Old Church Slavonic: bin+text, text | Oriya: bin+text, text | Oromo: bin+text, text |
| Ossetian: bin+text, text | Palatinate German: bin+text, text | Pali: bin+text, text |
| Pangasinan: bin+text, text | Papiamentu: bin+text, text | Pashto: bin+text, text |
| Pennsylvania German: bin+text, text | Persian: bin+text, text | Picard: bin+text, text |
| Piedmontese: bin+text, text | Polish: bin+text, text | Pontic: bin+text, text |
| Portuguese: bin+text, text | Quechua: bin+text, text | Ripuarian: bin+text, text |
| Romani: bin+text, text | Romanian: bin+text, text | Romansh: bin+text, text |
| Russian: bin+text, text | Rusyn: bin+text, text | Sakha: bin+text, text |
| Samoan: bin+text, text | Samogitian: bin+text, text | Sango: bin+text, text |
| Sanskrit: bin+text, text | Sardinian: bin+text, text | Saterland Frisian: bin+text, text |
| Scots: bin+text, text | Scottish Gaelic: bin+text, text | Serbian: bin+text, text |
| Serbo_Croatian: bin+text, text | Sesotho: bin+text, text | Shona: bin+text, text |
| Sicilian: bin+text, text | Silesian: bin+text, text | Simple English: bin+text, text |
| Sindhi: bin+text, text | Sinhalese: bin+text, text | Slovak: bin+text, text |
| Slovenian: bin+text, text | Somali: bin+text, text | Southern Azerbaijani: bin+text, text |
| Spanish: bin+text, text | Sranan: bin+text, text | Sundanese: bin+text, text |
| Swahili: bin+text, text | Swati: bin+text, text | Swedish: bin+text, text |
| Tagalog: bin+text, text | Tahitian: bin+text, text | Tajik: bin+text, text |
| Tamil: bin+text, text | Tarantino: bin+text, text | Tatar: bin+text, text |
| Telugu: bin+text, text | Tetum: bin+text, text | Thai: bin+text, text |
| Tibetan: bin+text, text | Tigrinya: bin+text, text | Tok Pisin: bin+text, text |
| Tongan: bin+text, text | Tsonga: bin+text, text | Tswana: bin+text, text |
| Tulu: bin+text, text | Tumbuka: bin+text, text | Turkish: bin+text, text |
| Turkmen: bin+text, text | Tuvan: bin+text, text | Twi: bin+text, text |
| Udmurt: bin+text, text | Ukrainian: bin+text, text | Upper Sorbian: bin+text, text |
| Urdu: bin+text, text | Uyghur: bin+text, text | Uzbek: bin+text, text |
| Venda: bin+text, text | Venetian: bin+text, text | Vepsian: bin+text, text |
| Vietnamese: bin+text, text | Volapük: bin+text, text | Võro: bin+text, text |
| Walloon: bin+text, text | Waray: bin+text, text | Welsh: bin+text, text |
| West Flemish: bin+text, text | West Frisian: bin+text, text | Western Punjabi: bin+text, text |
| Wolof: bin+text, text | Wu: bin+text, text | Xhosa: bin+text, text |
| Yiddish: bin+text, text | Yoruba: bin+text, text | Zazaki: bin+text, text |
| Zeelandic: bin+text, text | Zhuang: bin+text, text | Zulu: bin+text, text |
The word vectors come in both of fastText's default formats, binary and text. In the text format, each line contains a word followed by its vector, with the values separated by spaces. Words are sorted by frequency in descending order.
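The text format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `parse_vec_lines` and the in-memory sample data are made up for illustration. It also assumes that the file may begin with a header line holding two integers (vocabulary size and dimension), which this sketch skips if present.

```python
import io
from typing import Dict, Iterable, List


def parse_vec_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse text-format vectors: one word per line, followed by its values.

    Hypothetical helper for illustration; not part of fastText itself.
    """
    vectors: Dict[str, List[float]] = {}
    for index, line in enumerate(lines):
        # Split on plain spaces only, so that other Unicode whitespace
        # inside a word is left intact; drop empty tokens caused by a
        # trailing space before the newline.
        tokens = [t for t in line.rstrip("\r\n").split(" ") if t]
        if not tokens:
            continue
        # Assumption: an optional first line "<word count> <dimension>".
        if index == 0 and len(tokens) == 2 and all(t.isdigit() for t in tokens):
            continue
        vectors[tokens[0]] = [float(value) for value in tokens[1:]]
    return vectors


# Tiny made-up sample with 3-dimensional vectors (the real files use 300).
sample = io.StringIO("2 3\nthe 0.1 -0.2 0.3 \ncat 0.4 0.5 -0.6 \n")
vectors = parse_vec_lines(sample)

# Dicts preserve insertion order, so the file's frequency order is kept.
words_by_frequency = list(vectors)
print(words_by_frequency, len(vectors["the"]))
```

To read a downloaded file instead of the sample, pass an open file object, for example `parse_vec_lines(open(path, encoding="utf-8"))`, where `path` is a placeholder for wherever the text file was saved. Loading a full vocabulary into Python lists can use a lot of memory, so stopping after the first N lines may be preferable for quick experiments.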
The word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.
If you use these word vectors, please cite the following paper:
P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2017enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={Transactions of the Association for Computational Linguistics},
volume={5},
year={2017},
issn={2307-387X},
pages={135--146}
}