Add README for pretrained vectors v2

Summary: Add README for pretrained vectors version 2

Reviewed By: piotr-bojanowski

Differential Revision: D7027075

fbshipit-source-id: af448b608ae75e18fe389763b8914d62327a231e
Edouard Grave 8 years ago
parent
commit
012615e034
6 changed files with 147 additions and 13 deletions
  1. README.md (+2 -2)
  2. docs/crawl-vectors.md (+120 -0)
  3. docs/english-vectors.md (+15 -4)
  4. docs/pretrained-vectors.md (+2 -0)
  5. website/pages/en/index.js (+7 -7)
  6. website/sidebars.json (+1 -0)

+ 2 - 2
README.md

@@ -32,7 +32,7 @@
 
 ### Models
 - Recent state-of-the-art [English word vectors](https://fasttext.cc/docs/en/english-vectors.html).
-- Word vectors for [294 languages trained on Wikipedia](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md).
+- Word vectors for [157 languages trained on Wikipedia and Crawl](https://github.com/facebookresearch/fastText/blob/master/crawl-vectors.md).
 - Models for [language identification](https://fasttext.cc/docs/en/language-identification.html#content) and [various supervised tasks](https://fasttext.cc/docs/en/supervised-models.html#content).
 
 ### Supplementary data
@@ -94,7 +94,7 @@ $ unzip v0.1.0.zip
 $ cd fastText-0.1.0
 $ make
 ```
- 
+
 This will produce object files for all the classes as well as the main binary `fasttext`.
 If you do not plan on using the default system-wide compiler, update the two macros defined at the beginning of the Makefile (CC and INCLUDES).
 

+ 120 - 0
docs/crawl-vectors.md

@@ -0,0 +1,120 @@
+---
+id: crawl-vectors
+title: Word vectors for 157 languages
+---
+
+We distribute pre-trained word vectors for 157 languages, trained on [*Common Crawl*](http://commoncrawl.org/) and [*Wikipedia*](https://www.wikipedia.org) using fastText.
+These models were trained using CBOW with position-weights, in dimension 300, with character n-grams of length 5, a window of size 5 and 10 negatives.
+
+### Format
+
+The word vectors are available in both binary and text formats.
+
+Using the binary models, vectors for out-of-vocabulary words can be obtained with
+```
+$ ./fasttext print-word-vectors cc.it.300.bin < oov_words.txt
+```
+where the file oov_words.txt contains out-of-vocabulary words.
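
The binary models can return vectors for unseen words because fastText represents a word through its character n-grams. The following is a minimal illustrative sketch, not fastText's actual implementation: it lists the n-grams of length 5 (the setting stated above) after wrapping the word in the `<` and `>` boundary markers. The helper name `char_ngrams` is hypothetical, and the real library additionally hashes n-grams into buckets.

```python
def char_ngrams(word, n=5):
    """Return the character n-grams of `word` after adding boundary markers.

    Illustrative only: fastText itself hashes each n-gram into a bucket and
    combines the corresponding n-gram vectors to build the word vector.
    """
    marked = "<" + word + ">"
    # Slide a window of width n over the marked word.
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


print(char_ngrams("where"))  # ['<wher', 'where', 'here>']
```

A word shorter than three characters yields no n-grams of length 5 under this sketch.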
+
+In the text format, each line contains a word followed by its vector.
+Values are space-separated, and words are sorted by frequency in descending order.
+These text models can easily be loaded in Python using the following code:
+```python
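+import io
+
+def load_vectors(fname):
+    """Load a text-format (.vec) model into a dict mapping word -> list of floats."""
+    data = {}
+    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
+        # The first line holds the vocabulary size and the vector dimension.
+        n, d = map(int, fin.readline().split())
+        for line in fin:
+            tokens = line.rstrip().split(' ')
+            # list(...) materializes the values; in Python 3, map() alone is a lazy iterator.
+            data[tokens[0]] = list(map(float, tokens[1:]))
+    return data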
+```
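
Because the words are sorted by descending frequency, reading only the first lines of a text model keeps the most frequent words. The sketch below is one possible variant of the loader, under stated assumptions: `load_top_vectors` and `cosine` are hypothetical helper names, the `limit` parameter is an addition not present in the original snippet, and the file is read directly from its `.gz` form using Python's standard `gzip` module.

```python
import gzip
import math


def load_top_vectors(fname, limit=None):
    """Read a .vec or .vec.gz file, keeping at most `limit` words."""
    opener = gzip.open if fname.endswith(".gz") else open
    data = {}
    with opener(fname, "rt", encoding="utf-8", newline="\n", errors="ignore") as fin:
        n, d = map(int, fin.readline().split())  # header: word count, dimension
        for line in fin:
            if limit is not None and len(data) >= limit:
                break  # remaining lines hold less frequent words
            tokens = line.rstrip().split(" ")
            data[tokens[0]] = [float(x) for x in tokens[1:]]
    return data


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, `load_top_vectors("cc.it.300.vec.gz", limit=100000)` would keep the 100,000 most frequent Italian words, and `cosine` can then compare any two of the returned vectors.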
+
+### Tokenization
+
+We used the [*Stanford word segmenter*](https://nlp.stanford.edu/software/segmenter.html) for Chinese, [*Mecab*](http://taku910.github.io/mecab/) for Japanese and [*UETsegmenter*](https://github.com/phongnt570/UETsegmenter) for Vietnamese.
+For languages using the Latin, Cyrillic, Hebrew or Greek scripts, we used the tokenizer from the [*Europarl*](http://www.statmt.org/europarl/) preprocessing tools.
+For the remaining languages, we used the ICU tokenizer.
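
The tokenizer selection described above can be sketched as a small dispatch function. This is only an illustration of the stated rule, not the authors' preprocessing code: the names `choose_tokenizer` and `first_script` are hypothetical, and guessing the script from the Unicode name of the first alphabetic character is an assumption made for this example.

```python
import unicodedata

# Languages the text names explicitly, mapped to their dedicated segmenter.
DEDICATED = {"zh": "Stanford word segmenter", "ja": "MeCab", "vi": "UETsegmenter"}
# Scripts the text assigns to the Europarl preprocessing tools.
EUROPARL_SCRIPTS = {"LATIN", "CYRILLIC", "HEBREW", "GREEK"}


def first_script(text):
    """Guess the script from the Unicode name of the first alphabetic character."""
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            return name.split(" ")[0] if name else None
    return None


def choose_tokenizer(lang, sample):
    """Return the tokenizer name for a language code and a sample of its text."""
    if lang in DEDICATED:  # checked first: Vietnamese uses the Latin script
        return DEDICATED[lang]
    if first_script(sample) in EUROPARL_SCRIPTS:
        return "Europarl tokenizer"
    return "ICU tokenizer"
```

Checking the dedicated segmenters before the script matters because Vietnamese is written in the Latin script but is handled by UETsegmenter.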
+
+More information about the training of these models can be found in the article [*Learning Word Vectors for 157 Languages*](https://arxiv.org/abs/1802.06893).
+
+### License
+
+The word vectors are distributed under the [*Creative Commons Attribution-Share-Alike License 3.0*](https://creativecommons.org/licenses/by-sa/3.0/).
+
+### References
+
+If you use these word vectors, please cite the following paper:
+
+E. Grave\*, P. Bojanowski\*, P. Gupta, A. Joulin, T. Mikolov, [*Learning Word Vectors for 157 Languages*](https://arxiv.org/abs/1802.06893)
+
+```markup
+@inproceedings{grave2018learning,
+  title={Learning Word Vectors for 157 Languages},
+  author={Grave, Edouard and Bojanowski, Piotr and Gupta, Prakhar and Joulin, Armand and Mikolov, Tomas},
+  booktitle={Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)},
+  year={2018}
+}
+```
+
+### Models
+
+The models can be downloaded from:
+
+||||
+|-|-|-|
+|  Afrikaans:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.af.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.af.300.vec.gz) |  Albanian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sq.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sq.300.vec.gz) |  Alemannic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.als.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.als.300.vec.gz) |
+|  Amharic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.am.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.am.300.vec.gz) |  Arabic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ar.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ar.300.vec.gz) |  Aragonese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.an.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.an.300.vec.gz) |
+|  Armenian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hy.300.vec.gz) |  Assamese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.as.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.as.300.vec.gz) |  Asturian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ast.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ast.300.vec.gz) |
+|  Azerbaijani:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.az.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.az.300.vec.gz) |  Bashkir:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ba.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ba.300.vec.gz) |  Basque:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eu.300.vec.gz) |
+|  Bavarian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bar.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bar.300.vec.gz) |  Belarusian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.be.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.be.300.vec.gz) |  Bengali:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bn.300.vec.gz) |
+|  Bihari:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bh.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bh.300.vec.gz) |  Bishnupriya Manipuri:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bpy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bpy.300.vec.gz) |  Bosnian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bs.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bs.300.vec.gz) |
+|  Breton:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.br.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.br.300.vec.gz) |  Bulgarian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bg.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bg.300.vec.gz) |  Burmese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.my.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.my.300.vec.gz) |
+|  Catalan:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ca.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ca.300.vec.gz) |  Cebuano:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ceb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ceb.300.vec.gz) |  Central Bicolano:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bcl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bcl.300.vec.gz) |
+|  Chechen:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ce.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ce.300.vec.gz) |  Chinese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zh.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zh.300.vec.gz) |  Chuvash:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cv.300.vec.gz) |
+|  Corsican:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.co.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.co.300.vec.gz) |  Croatian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hr.300.vec.gz) |  Czech:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cs.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cs.300.vec.gz) |
+|  Danish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.da.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.da.300.vec.gz) |  Divehi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.dv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.dv.300.vec.gz) |  Dutch:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nl.300.vec.gz) |
+|  Eastern Punjabi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pa.300.vec.gz) |  Egyptian Arabic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.arz.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.arz.300.vec.gz) |  Emilian-Romagnol:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eml.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eml.300.vec.gz) |
+|  Erzya:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.myv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.myv.300.vec.gz) |  Esperanto:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.eo.300.vec.gz) |  Estonian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.et.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.et.300.vec.gz) |
+|  Fiji Hindi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hif.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hif.300.vec.gz) |  Finnish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fi.300.vec.gz) |  French:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fr.300.vec.gz) |
+|  Galician:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gl.300.vec.gz) |  Georgian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ka.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ka.300.vec.gz) |  German:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.de.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.de.300.vec.gz) |
+|  Goan Konkani:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gom.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gom.300.vec.gz) |  Greek:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.el.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.el.300.vec.gz) |  Gujarati:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gu.300.vec.gz) |
+|  Haitian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ht.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ht.300.vec.gz) |  Hebrew:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.he.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.he.300.vec.gz) |  Hill Mari:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mrj.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mrj.300.vec.gz) |
+|  Hindi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hi.300.vec.gz) |  Hungarian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hu.300.vec.gz) |  Icelandic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.is.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.is.300.vec.gz) |
+|  Ido:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.io.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.io.300.vec.gz) |  Ilokano:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ilo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ilo.300.vec.gz) |  Indonesian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.id.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.id.300.vec.gz) |
+|  Interlingua:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ia.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ia.300.vec.gz) |  Irish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ga.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ga.300.vec.gz) |  Italian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.it.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.it.300.vec.gz) |
+|  Japanese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ja.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ja.300.vec.gz) |  Javanese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.jv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.jv.300.vec.gz) |  Kannada:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kn.300.vec.gz) |
+|  Kapampangan:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pam.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pam.300.vec.gz) |  Kazakh:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.kk.300.vec.gz) |  Khmer:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.km.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.km.300.vec.gz) |
+|  Kirghiz:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ky.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ky.300.vec.gz) |  Korean:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ko.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ko.300.vec.gz) |  Kurdish (Kurmanji):  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ku.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ku.300.vec.gz) |
+|  Kurdish (Sorani):  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ckb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ckb.300.vec.gz) |  Latin:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz) |  Latvian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lv.300.vec.gz) |
+|  Limburgish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.li.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.li.300.vec.gz) |  Lithuanian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lt.300.vec.gz) |  Lombard:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lmo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lmo.300.vec.gz) |
+|  Low Saxon:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nds.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nds.300.vec.gz) |  Luxembourgish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.lb.300.vec.gz) |  Macedonian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mk.300.vec.gz) |
+|  Maithili:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mai.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mai.300.vec.gz) |  Malagasy:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mg.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mg.300.vec.gz) |  Malay:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ms.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ms.300.vec.gz) |
+|  Malayalam:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ml.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ml.300.vec.gz) |  Maltese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mt.300.vec.gz) |  Manx:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gv.300.vec.gz) |
+|  Marathi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mr.300.vec.gz) |  Mazandarani:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mzn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mzn.300.vec.gz) |  Meadow Mari:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mhr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mhr.300.vec.gz) |
+|  Minangkabau:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.min.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.min.300.vec.gz) |  Mingrelian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.xmf.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.xmf.300.vec.gz) |  Mirandese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mwl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mwl.300.vec.gz) |
+|  Mongolian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.mn.300.vec.gz) |  Nahuatl:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nah.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nah.300.vec.gz) |  Neapolitan:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nap.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nap.300.vec.gz) |
+|  Nepali:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ne.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ne.300.vec.gz) |  Newar:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.new.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.new.300.vec.gz) |  North Frisian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.frr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.frr.300.vec.gz) |
+|  Northern Sotho:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nso.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nso.300.vec.gz) |  Norwegian (Bokmål):  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.no.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.no.300.vec.gz) |  Norwegian (Nynorsk):  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.nn.300.vec.gz) |
+|  Occitan:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.oc.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.oc.300.vec.gz) |  Oriya:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.or.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.or.300.vec.gz) |  Ossetian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.os.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.os.300.vec.gz) |
+|  Palatinate German:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pfl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pfl.300.vec.gz) |  Pashto:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ps.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ps.300.vec.gz) |  Persian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fa.300.vec.gz) |
+|  Piedmontese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pms.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pms.300.vec.gz) |  Polish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pl.300.vec.gz) |  Portuguese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pt.300.vec.gz) |
+|  Quechua:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.qu.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.qu.300.vec.gz) |  Romanian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ro.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ro.300.vec.gz) |  Romansh:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.rm.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.rm.300.vec.gz) |
+|  Russian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ru.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ru.300.vec.gz) |  Sakha:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sah.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sah.300.vec.gz) |  Sanskrit:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sa.300.vec.gz) |
+|  Sardinian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sc.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sc.300.vec.gz) |  Scots:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sco.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sco.300.vec.gz) |  Scottish Gaelic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gd.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.gd.300.vec.gz) |
+|  Serbian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sr.300.vec.gz) |  Serbo-Croatian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sh.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sh.300.vec.gz) |  Sicilian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.scn.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.scn.300.vec.gz) |
+|  Sindhi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sd.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sd.300.vec.gz) |  Sinhalese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.si.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.si.300.vec.gz) |  Slovak:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sk.300.vec.gz) |
+|  Slovenian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sl.300.vec.gz) |  Somali:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.so.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.so.300.vec.gz) |  Southern Azerbaijani:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.azb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.azb.300.vec.gz) |
+|  Spanish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.es.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.es.300.vec.gz) |  Sundanese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.su.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.su.300.vec.gz) |  Swahili:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sw.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sw.300.vec.gz) |
+|  Swedish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sv.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.sv.300.vec.gz) |  Tagalog:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tl.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tl.300.vec.gz) |  Tajik:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tg.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tg.300.vec.gz) |
+|  Tamil:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ta.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ta.300.vec.gz) |  Tatar:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tt.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tt.300.vec.gz) |  Telugu:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.te.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.te.300.vec.gz) |
+|  Thai:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.th.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.th.300.vec.gz) |  Tibetan:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.bo.300.vec.gz) |  Turkish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tr.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tr.300.vec.gz) |
+|  Turkmen:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.tk.300.vec.gz) |  Ukrainian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uk.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uk.300.vec.gz) |  Upper Sorbian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hsb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.hsb.300.vec.gz) |
+|  Urdu:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ur.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ur.300.vec.gz) |  Uyghur:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ug.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.ug.300.vec.gz) |  Uzbek:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uz.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.uz.300.vec.gz) |
+|  Venetian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vec.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vec.300.vec.gz) |  Vietnamese:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vi.300.vec.gz) |  Volapük:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vo.300.vec.gz) |
+|  Walloon:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.wa.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.wa.300.vec.gz) |  Waray:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.war.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.war.300.vec.gz) |  Welsh:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.cy.300.vec.gz) |
+|  West Flemish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vls.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.vls.300.vec.gz) |  West Frisian:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fy.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fy.300.vec.gz) |  Western Punjabi:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pnb.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.pnb.300.vec.gz) |
+|  Yiddish:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yi.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yi.300.vec.gz) |  Yoruba:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yo.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.yo.300.vec.gz) |  Zazaki:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.diq.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.diq.300.vec.gz) |
+|  Zeelandic:  [bin](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zea.300.bin.gz), [text](https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.zea.300.vec.gz) |

+ 15 - 4
docs/english-vectors.md

@@ -3,14 +3,14 @@ id: english-vectors
 title: English word vectors
 ---
 
-This page gathers several pre-trained word vectors trained using fastText. More details will be added later.
+This page gathers several pre-trained word vectors trained using fastText.
 
 ### Download pre-trained word vectors
 
 Pre-trained word vectors learned on different sources can be downloaded below:
 
-1. [wiki-news-300d-1M.vec.zip](https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki-news-300d-1M.vec.zip): 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens). 
-2. [wiki-news-300d-1M-subword.vec.zip](https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki-news-300d-1M-subword.vec.zip): 1 million word vectors trained with subword infomation on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens). 
+1. [wiki-news-300d-1M.vec.zip](https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki-news-300d-1M.vec.zip): 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
+2. [wiki-news-300d-1M-subword.vec.zip](https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki-news-300d-1M-subword.vec.zip): 1 million word vectors trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
 3. [crawl-300d-2M.vec.zip](https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip): 2 million word vectors trained on Common Crawl (600B tokens).
 
 ### Format
@@ -25,4 +25,15 @@ These word vectors are distributed under the [*Creative Commons Attribution-Shar
 
 ### References
 
-We are preparing a publication describing how these models were trained.
+If you use these word vectors, please cite the following paper:
+
+T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin. [*Advances in Pre-Training Distributed Word Representations*](https://arxiv.org/abs/1712.09405)
+
+```markup
+@inproceedings{mikolov2018advances,
+  title={Advances in Pre-Training Distributed Word Representations},
+  author={Mikolov, Tomas and Grave, Edouard and Bojanowski, Piotr and Puhrsch, Christian and Joulin, Armand},
+  booktitle={Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)},
+  year={2018}
+}
+```

+ 2 - 0
docs/pretrained-vectors.md

@@ -6,6 +6,8 @@ title: Wiki word vectors
 We are publishing pre-trained word vectors for 294 languages, trained on [*Wikipedia*](https://www.wikipedia.org) using fastText.
 These vectors in dimension 300 were obtained using the skip-gram model described in [*Bojanowski et al. (2016)*](https://arxiv.org/abs/1607.04606) with default parameters.
 
+Please note that a newer version of the multi-lingual word vectors is available at [https://fasttext.cc/docs/en/crawl-vectors.html](https://fasttext.cc/docs/en/crawl-vectors.html).
+
 ### Models
 
 The models can be downloaded from:

+ 7 - 7
website/pages/en/index.js

@@ -126,11 +126,11 @@ class Index extends React.Component {
                 pinned : "true",
               },
               {
-                content: "Pre-trained on 294 different languages of Wikipedia",
+                content: "Pre-trained models for 157 different languages",
                 image: siteConfig.baseUrl + "img/model-red.png",
                 imageAlign: "top",
-                title: "[Wiki word vectors](" + siteConfig.baseUrl + "docs/en/pretrained-vectors.html)",
-                imageLink: siteConfig.baseUrl + "docs/en/pretrained-vectors.html",
+                title: "[Multi-lingual word vectors](" + siteConfig.baseUrl + "docs/en/crawl-vectors.html)",
+                imageLink: siteConfig.baseUrl + "docs/en/crawl-vectors.html",
               },
             ]}
           layout="twoColumn"
@@ -192,7 +192,7 @@ class Index extends React.Component {
                     title: "[Bag of Tricks for Efficient Text Classification](https://arxiv.org/abs/1607.01759)",
                   },
                   {
-                    content: "A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov",
+                    content: "A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jegou, T. Mikolov",
                     title: "[FastText.zip: Compressing text classification models](https://arxiv.org/abs/1612.03651)",
                   }
 
@@ -235,9 +235,9 @@ class Index extends React.Component {
               layout="threeColumn"
             />
           </Container>
-	  <br/>
-	  <br/>
-	  </div>
+          <br/>
+          <br/>
+          </div>
           <div className="productShowcaseSection paddingTop">
             <h2>
               {"Users"}

+ 1 - 0
website/sidebars.json

@@ -7,6 +7,7 @@
   "download": {
     "Download": [
       "english-vectors",
+      "crawl-vectors",
       "pretrained-vectors",
       "supervised-models",
       "language-identification",