@@ -0,0 +1,120 @@
+---
+id: crawl-vectors
+title: Word vectors for 157 languages
+---
+
+We distribute pre-trained word vectors for 157 languages, trained on [*Common Crawl*](http://commoncrawl.org/) and [*Wikipedia*](https://www.wikipedia.org) using fastText.
+These models were trained using CBOW with position weights, in dimension 300, with character n-grams of length 5, a window of size 5, and 10 negative samples.
+
+### Format
+
+The word vectors are available in both binary and text formats.
+
+Using the binary models, vectors for out-of-vocabulary words can be obtained with
+```
+$ ./fasttext print-word-vectors cc.it.300.bin < oov_words.txt
+```
+where the file oov_words.txt contains out-of-vocabulary words.
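The binary models can return vectors for unseen words because fastText represents a word through its character n-grams. The sketch below only illustrates how such n-grams can be enumerated for the length-5 setting mentioned above. The function name `char_ngrams` is hypothetical, the `<` and `>` boundary markers follow the convention described in the fastText papers, and the actual implementation also hashes n-grams into buckets, which is not shown here.

```python
def char_ngrams(word, n=5):
    """Return the character n-grams of `word`, padded with boundary markers."""
    padded = "<" + word + ">"
    # A padded word shorter than n yields no n-gram of that length.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    # Illustrative word only; any string works.
    print(char_ngrams("gatto"))
```

An out-of-vocabulary word that shares n-grams with known words can therefore still be assigned a vector, which the text-format files alone cannot provide.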
+
+In the text format, the first line gives the number of words and the vector dimension, and each following line contains a word followed by its vector.
+Values are separated by spaces, and words are sorted by frequency in descending order.
+These text models can be loaded in Python using the following code:
+```python
+import io
+
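+def load_vectors(fname):
+    # Open as UTF-8 text; undecodable bytes are skipped rather than raising.
+    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
+        # The header line holds the vocabulary size and the vector dimension.
+        n, d = map(int, fin.readline().split())
+        data = {}
+        for line in fin:
+            tokens = line.rstrip().split(' ')
+            # list(...) materializes the values; a bare map() is a lazy iterator in Python 3.
+            data[tokens[0]] = list(map(float, tokens[1:]))
+    return data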
+```
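As a usage sketch for the text format, the example below writes a tiny file in the same layout (a header line, then one word and its vector per line), loads it, and compares two words by cosine similarity. The words, the 2-dimensional values, and the helper names `cosine` and `write_toy_file` are made up for illustration; the real files have dimension 300.

```python
import io
import math
import os
import tempfile


def load_vectors(fname):
    """Load a text-format (.vec) file into a dict mapping word -> list of floats."""
    with io.open(fname, "r", encoding="utf-8", newline="\n", errors="ignore") as fin:
        n, d = map(int, fin.readline().split())  # header: vocabulary size, dimension
        data = {}
        for line in fin:
            tokens = line.rstrip().split(" ")
            data[tokens[0]] = list(map(float, tokens[1:]))
    return data


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def write_toy_file():
    # Toy data for illustration only: 3 made-up entries in dimension 2.
    path = os.path.join(tempfile.mkdtemp(), "toy.vec")
    with io.open(path, "w", encoding="utf-8", newline="\n") as fout:
        fout.write(u"3 2\n")
        fout.write(u"gatto 1.0 0.0\n")
        fout.write(u"cane 0.8 0.6\n")
        fout.write(u"casa 0.0 1.0\n")
    return path


vectors = load_vectors(write_toy_file())
similarity = cosine(vectors["gatto"], vectors["cane"])
```

Loading a full model this way keeps every vector in a Python list, which can use a lot of memory; reading only the first lines of the file is a reasonable shortcut, since words are sorted by decreasing frequency.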
+
+### Tokenization
+
+We used the [*Stanford word segmenter*](https://nlp.stanford.edu/software/segmenter.html) for Chinese, [*MeCab*](http://taku910.github.io/mecab/) for Japanese and [*UETsegmenter*](https://github.com/phongnt570/UETsegmenter) for Vietnamese.
+For languages using the Latin, Cyrillic, Hebrew or Greek scripts, we used the tokenizer from the [*Europarl*](http://www.statmt.org/europarl/) preprocessing tools.
+For the remaining languages, we used the ICU tokenizer.
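The selection rule above can be sketched as a small lookup. This is not the preprocessing code used for training: the function name `pick_tokenizer` is hypothetical, it returns tool names as plain strings rather than calling any tokenizer, and guessing the script from the first letter of a text sample is a simplifying assumption.

```python
import unicodedata

# Language codes with a dedicated segmenter, as listed in the text.
DEDICATED = {"zh": "Stanford word segmenter", "ja": "MeCab", "vi": "UETsegmenter"}
# Scripts handled by the Europarl preprocessing tools.
EUROPARL_SCRIPTS = ("LATIN", "CYRILLIC", "HEBREW", "GREEK")


def pick_tokenizer(lang, sample_text):
    """Return the name of the tokenizer the rule would select."""
    if lang in DEDICATED:
        return DEDICATED[lang]
    for ch in sample_text:
        if ch.isalpha():
            # For these scripts, the Unicode character name starts with the
            # script name, e.g. "CYRILLIC SMALL LETTER KA".
            name = unicodedata.name(ch, "")
            if name.startswith(EUROPARL_SCRIPTS):
                return "Europarl tokenizer"
            break
    return "ICU tokenizer"
```

Checking the dedicated languages first matters for Vietnamese, which is written in the Latin script but is assigned its own segmenter.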
+
+More information about the training of these models can be found in the article [*Learning Word Vectors for 157 Languages*](https://arxiv.org/abs/1802.06893).
+
+### License
+
+The word vectors are distributed under the [*Creative Commons Attribution-Share-Alike License 3.0*](https://creativecommons.org/licenses/by-sa/3.0/).
+
+### References
+
+If you use these word vectors, please cite the following paper:
+
+E. Grave\*, P. Bojanowski\*, P. Gupta, A. Joulin, T. Mikolov, [*Learning Word Vectors for 157 Languages*](https://arxiv.org/abs/1802.06893)
+
+```markup
+@inproceedings{grave2018learning,
+ title={Learning Word Vectors for 157 Languages},
+ author={Grave, Edouard and Bojanowski, Piotr and Gupta, Prakhar and Joulin, Armand and Mikolov, Tomas},
+ booktitle={Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)},
+ year={2018}
+}
+```
+
+### Models
+
+The models can be downloaded from:
+
+||||
+|-|-|-|
+| Afrikaans: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.af.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.af.300.vec.gz) | Albanian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sq.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sq.300.vec.gz) | Alemannic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.als.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.als.300.vec.gz) |
+| Amharic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.am.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.am.300.vec.gz) | Arabic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ar.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ar.300.vec.gz) | Aragonese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.an.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.an.300.vec.gz) |
+| Armenian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hy.300.vec.gz) | Assamese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.as.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.as.300.vec.gz) | Asturian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ast.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ast.300.vec.gz) |
+| Azerbaijani: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.az.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.az.300.vec.gz) | Bashkir: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ba.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ba.300.vec.gz) | Basque: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eu.300.vec.gz) |
+| Bavarian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bar.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bar.300.vec.gz) | Belarusian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.be.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.be.300.vec.gz) | Bengali: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bn.300.vec.gz) |
+| Bihari: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bh.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bh.300.vec.gz) | Bishnupriya Manipuri: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bpy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bpy.300.vec.gz) | Bosnian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bs.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bs.300.vec.gz) |
+| Breton: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.br.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.br.300.vec.gz) | Bulgarian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bg.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bg.300.vec.gz) | Burmese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.my.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.my.300.vec.gz) |
+| Catalan: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ca.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ca.300.vec.gz) | Cebuano: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ceb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ceb.300.vec.gz) | Central Bicolano: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bcl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bcl.300.vec.gz) |
+| Chechen: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ce.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ce.300.vec.gz) | Chinese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zh.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zh.300.vec.gz) | Chuvash: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cv.300.vec.gz) |
+| Corsican: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.co.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.co.300.vec.gz) | Croatian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hr.300.vec.gz) | Czech: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cs.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cs.300.vec.gz) |
+| Danish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.da.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.da.300.vec.gz) | Divehi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.dv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.dv.300.vec.gz) | Dutch: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nl.300.vec.gz) |
+| Eastern Punjabi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pa.300.vec.gz) | Egyptian Arabic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.arz.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.arz.300.vec.gz) | Emilian-Romagnol: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eml.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eml.300.vec.gz) |
+| Erzya: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.myv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.myv.300.vec.gz) | Esperanto: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eo.300.vec.gz) | Estonian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.et.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.et.300.vec.gz) |
+| Fiji Hindi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hif.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hif.300.vec.gz) | Finnish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fi.300.vec.gz) | French: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fr.300.vec.gz) |
+| Galician: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gl.300.vec.gz) | Georgian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ka.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ka.300.vec.gz) | German: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.de.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.de.300.vec.gz) |
+| Goan Konkani: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gom.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gom.300.vec.gz) | Greek: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.el.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.el.300.vec.gz) | Gujarati: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gu.300.vec.gz) |
+| Haitian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ht.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ht.300.vec.gz) | Hebrew: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.he.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.he.300.vec.gz) | Hill Mari: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mrj.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mrj.300.vec.gz) |
+| Hindi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hi.300.vec.gz) | Hungarian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hu.300.vec.gz) | Icelandic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.is.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.is.300.vec.gz) |
+| Ido: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.io.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.io.300.vec.gz) | Ilokano: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ilo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ilo.300.vec.gz) | Indonesian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.id.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.id.300.vec.gz) |
+| Interlingua: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ia.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ia.300.vec.gz) | Irish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ga.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ga.300.vec.gz) | Italian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.it.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.it.300.vec.gz) |
+| Japanese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ja.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ja.300.vec.gz) | Javanese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.jv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.jv.300.vec.gz) | Kannada: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kn.300.vec.gz) |
+| Kapampangan: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pam.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pam.300.vec.gz) | Kazakh: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kk.300.vec.gz) | Khmer: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.km.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.km.300.vec.gz) |
+| Kirghiz: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ky.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ky.300.vec.gz) | Korean: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ko.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ko.300.vec.gz) | Kurdish (Kurmanji): [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ku.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ku.300.vec.gz) |
+| Kurdish (Sorani): [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ckb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ckb.300.vec.gz) | Latin: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz) | Latvian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lv.300.vec.gz) |
+| Limburgish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.li.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.li.300.vec.gz) | Lithuanian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lt.300.vec.gz) | Lombard: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lmo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lmo.300.vec.gz) |
+| Low Saxon: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nds.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nds.300.vec.gz) | Luxembourgish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lb.300.vec.gz) | Macedonian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mk.300.vec.gz) |
+| Maithili: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mai.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mai.300.vec.gz) | Malagasy: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mg.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mg.300.vec.gz) | Malay: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ms.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ms.300.vec.gz) |
+| Malayalam: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ml.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ml.300.vec.gz) | Maltese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mt.300.vec.gz) | Manx: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gv.300.vec.gz) |
+| Marathi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mr.300.vec.gz) | Mazandarani: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mzn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mzn.300.vec.gz) | Meadow Mari: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mhr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mhr.300.vec.gz) |
+| Minangkabau: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.min.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.min.300.vec.gz) | Mingrelian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.xmf.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.xmf.300.vec.gz) | Mirandese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mwl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mwl.300.vec.gz) |
+| Mongolian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mn.300.vec.gz) | Nahuatl: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nah.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nah.300.vec.gz) | Neapolitan: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nap.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nap.300.vec.gz) |
+| Nepali: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ne.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ne.300.vec.gz) | Newar: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.new.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.new.300.vec.gz) | North Frisian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.frr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.frr.300.vec.gz) |
+| Northern Sotho: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nso.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nso.300.vec.gz) | Norwegian (Bokmål): [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.no.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.no.300.vec.gz) | Norwegian (Nynorsk): [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nn.300.vec.gz) |
+| Occitan: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.oc.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.oc.300.vec.gz) | Oriya: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.or.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.or.300.vec.gz) | Ossetian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.os.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.os.300.vec.gz) |
+| Palatinate German: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pfl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pfl.300.vec.gz) | Pashto: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ps.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ps.300.vec.gz) | Persian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fa.300.vec.gz) |
+| Piedmontese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pms.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pms.300.vec.gz) | Polish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pl.300.vec.gz) | Portuguese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pt.300.vec.gz) |
+| Quechua: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.qu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.qu.300.vec.gz) | Romanian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ro.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ro.300.vec.gz) | Romansh: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.rm.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.rm.300.vec.gz) |
+| Russian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ru.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ru.300.vec.gz) | Sakha: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sah.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sah.300.vec.gz) | Sanskrit: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sa.300.vec.gz) |
+| Sardinian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sc.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sc.300.vec.gz) | Scots: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sco.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sco.300.vec.gz) | Scottish Gaelic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gd.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gd.300.vec.gz) |
+| Serbian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sr.300.vec.gz) | Serbo-Croatian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sh.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sh.300.vec.gz) | Sicilian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.scn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.scn.300.vec.gz) |
+| Sindhi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sd.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sd.300.vec.gz) | Sinhalese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.si.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.si.300.vec.gz) | Slovak: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sk.300.vec.gz) |
+| Slovenian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sl.300.vec.gz) | Somali: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.so.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.so.300.vec.gz) | Southern Azerbaijani: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.azb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.azb.300.vec.gz) |
+| Spanish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.es.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.es.300.vec.gz) | Sundanese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.su.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.su.300.vec.gz) | Swahili: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sw.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sw.300.vec.gz) |
+| Swedish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sv.300.vec.gz) | Tagalog: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tl.300.vec.gz) | Tajik: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tg.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tg.300.vec.gz) |
+| Tamil: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ta.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ta.300.vec.gz) | Tatar: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tt.300.vec.gz) | Telugu: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.te.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.te.300.vec.gz) |
+| Thai: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.th.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.th.300.vec.gz) | Tibetan: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bo.300.vec.gz) | Turkish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tr.300.vec.gz) |
+| Turkmen: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tk.300.vec.gz) | Ukrainian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uk.300.vec.gz) | Upper Sorbian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hsb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hsb.300.vec.gz) |
+| Urdu: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ur.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ur.300.vec.gz) | Uyghur: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ug.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ug.300.vec.gz) | Uzbek: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uz.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uz.300.vec.gz) |
+| Venetian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vec.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vec.300.vec.gz) | Vietnamese: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vi.300.vec.gz) | Volapük: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vo.300.vec.gz) |
+| Walloon: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.wa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.wa.300.vec.gz) | Waray: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.war.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.war.300.vec.gz) | Welsh: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cy.300.vec.gz) |
+| West Flemish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vls.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vls.300.vec.gz) | West Frisian: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fy.300.vec.gz) | Western Punjabi: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pnb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pnb.300.vec.gz) |
+| Yiddish: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yi.300.vec.gz) | Yoruba: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yo.300.vec.gz) | Zazaki: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.diq.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.diq.300.vec.gz) |
+| Zeelandic: [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zea.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zea.300.vec.gz) |
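All links in the table follow the same naming pattern, so a download URL can be built from the language code that appears in the file name. The helper below is a sketch: the name `model_url` and the `'bin'` / `'text'` format labels are assumptions made here, and the function only builds the string without checking that the file exists.

```python
BASE_URL = "https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2"
# Map the link labels used in the table to the file extensions.
EXTENSIONS = {"bin": "bin", "text": "vec"}


def model_url(lang, fmt="bin"):
    """Build the download URL for language code `lang` in 'bin' or 'text' format."""
    if fmt not in EXTENSIONS:
        raise ValueError("fmt must be 'bin' or 'text'")
    return "{}/cc.{}.300.{}.gz".format(BASE_URL, lang, EXTENSIONS[fmt])
```

For example, `model_url("it", "text")` yields the same address as the Italian `text` link in the table.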