
update various readme

Summary: update various readme

Reviewed By: ajoulin

Differential Revision: D7835091

fbshipit-source-id: 8bf86ec8578b9dd30ae90f615a733b4b8821e2d9
Edouard Grave 7 years ago
parent
commit
3ba1758d1c
3 changed files with 31 additions and 9 deletions
  1. README.md (+12 -6)
  2. docs/english-vectors.md (+13 -0)
  3. docs/pretrained-vectors.md (+6 -3)

+ 12 - 6
README.md

@@ -283,11 +283,14 @@ Please cite [1](#enriching-word-vectors-with-subword-information) if using this
 [1] P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)
 
 ```
-@article{bojanowski2016enriching,
+@article{bojanowski2017enriching,
   title={Enriching Word Vectors with Subword Information},
   author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
-  journal={arXiv preprint arXiv:1607.04606},
-  year={2016}
+  journal={Transactions of the Association for Computational Linguistics},
+  volume={5},
+  year={2017},
+  issn={2307-387X},
+  pages={135--146}
 }
 ```
 
@@ -296,11 +299,14 @@ Please cite [1](#enriching-word-vectors-with-subword-information) if using this
 [2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)
 
 ```
-@article{joulin2016bag,
+@InProceedings{joulin2017bag,
   title={Bag of Tricks for Efficient Text Classification},
   author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
-  journal={arXiv preprint arXiv:1607.01759},
-  year={2016}
+  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
+  month={April},
+  year={2017},
+  publisher={Association for Computational Linguistics},
+  pages={427--431},
 }
 ```
 

+ 13 - 0
docs/english-vectors.md

@@ -18,6 +18,19 @@ Pre-trained word vectors learned on different sources can be downloaded below:
 The first line of the file contains the number of words in the vocabulary and the size of the vectors.
 Each line contains a word followed by its vectors, like in the default fastText text format.
 Each value is space separated. Words are ordered by descending frequency.
+These text models can easily be loaded in Python using the following code:
+```python
+import io
+
+def load_vectors(fname):
+    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
+    n, d = map(int, fin.readline().split())
+    data = {}
+    for line in fin:
+        tokens = line.rstrip().split(' ')
+        data[tokens[0]] = map(float, tokens[1:])
+    return data
+```
 
 ### License
 
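Note on the `load_vectors` snippet added in this hunk: under Python 3, `map` returns a lazy iterator rather than a list, so each value stored in `data` can only be consumed once, and the file handle is never closed. A minimal sketch of a Python 3 variant follows, assuming the vectors should be materialized as lists of floats. The name `load_vectors_py3` is hypothetical and not part of the commit.

```python
import io

def load_vectors_py3(fname):
    # Hypothetical Python 3 variant of load_vectors from the diff above.
    data = {}
    # The context manager closes the file once loading finishes.
    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
        # Header line: vocabulary size and vector dimension (read, then unused).
        n, d = map(int, fin.readline().split())
        for line in fin:
            tokens = line.rstrip().split(' ')
            # Materialize the lazy map so the vector can be indexed and reused.
            data[tokens[0]] = list(map(float, tokens[1:]))
    return data
```

With this variant, `load_vectors_py3(path)["hello"]` would be an ordinary list that can be indexed or iterated more than once, assuming `"hello"` appears in the file at `path`.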

+ 6 - 3
docs/pretrained-vectors.md

@@ -130,10 +130,13 @@ If you use these word vectors, please cite the following paper:
 P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)
 
 ```markup
-@article{bojanowski2016enriching,
+@article{bojanowski2017enriching,
   title={Enriching Word Vectors with Subword Information},
   author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
-  journal={arXiv preprint arXiv:1607.04606},
-  year={2016}
+  journal={Transactions of the Association for Computational Linguistics},
+  volume={5},
+  year={2017},
+  issn={2307-387X},
+  pages={135--146}
 }
 ```