
Fix getNN in python bindings to avoid 'utf-8' codec can't decode error. (#967)

Summary:
This [earlier commit](https://github.com/facebookresearch/fastText/commit/e13484bcb261cda51d33c4940ab5e207aba3ee79) fixed issue https://github.com/facebookresearch/fastText/issues/715 by casting all strings to Python strings. However, the same conversion was not applied to getNN, and I was seeing the same error when querying nearest neighbors for Japanese text. This commit simply applies castToPythonString to the strings returned by the getNN binding.
Pull Request resolved: https://github.com/facebookresearch/fastText/pull/967

Reviewed By: EdouardGrave

Differential Revision: D19287807

Pulled By: Celebio

fbshipit-source-id: 31fb8b4d643848f3f22381ac06f2443eb70c0009
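
The failure mode and the fix described in the summary can be reproduced in plain Python. The sketch below is not fastText code: `decode_neighbors` is a hypothetical stand-in for the conversion loop in the binding, and it assumes that `onUnicodeError` is forwarded as the error-handler name to a UTF-8 decoder, which is how Python's own `bytes.decode` treats its `errors` argument.

```python
from typing import List, Tuple


def decode_neighbors(
    raw: List[Tuple[float, bytes]], on_unicode_error: str = "strict"
) -> List[Tuple[float, str]]:
    """Hypothetical helper: decode each (score, word) pair separately,
    passing the error-handler name through to the UTF-8 decoder."""
    return [
        (score, word.decode("utf-8", errors=on_unicode_error))
        for score, word in raw
    ]


# A token cut in the middle of a multi-byte character, as can happen with
# Japanese text: "日本" is six bytes in UTF-8, and only five are kept here.
truncated = "日本".encode("utf-8")[:-1]
raw = [(0.91, "東京".encode("utf-8")), (0.87, truncated)]

# With the "strict" handler the invalid bytes raise the familiar
# "'utf-8' codec can't decode" error.
try:
    decode_neighbors(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "replace" or "ignore" the call succeeds and the scores are preserved.
replaced = decode_neighbors(raw, on_unicode_error="replace")
ignored = decode_neighbors(raw, on_unicode_error="ignore")
```

Decoding each word individually with a caller-chosen handler lets the caller decide whether malformed tokens should raise, be replaced with U+FFFD, or be dropped, instead of failing the whole nearest-neighbor query.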
DeepLearning VM, 6 years ago
Parent
Current commit
3b25b87442
1 file changed, 14 insertions and 2 deletions

+ 14 - 2
python/fasttext_module/fasttext/pybind/fasttext_pybind.cc

@@ -427,8 +427,20 @@ PYBIND11_MODULE(fasttext_pybind, m) {
              const std::string word) { m.getWordVector(vec, word); })
       .def(
           "getNN",
-          [](fasttext::FastText& m, const std::string& word, int32_t k) {
-            return m.getNN(word, k);
+          [](fasttext::FastText& m, const std::string& word, int32_t k,
+             const char* onUnicodeError) {
+            std::vector<std::pair<float, std::string>> score_words = m.getNN(
+                word, k);
+            std::vector<std::pair<float, py::str>> output_list;
+            for (uint32_t i = 0; i < score_words.size(); i++) {
+               float score = score_words[i].first;
+               py::str word = castToPythonString(
+                   score_words[i].second, onUnicodeError);
+               std::pair<float, py::str> sw_pair = std::make_pair(score, word);
+               output_list.push_back(sw_pair);
+            }
+
+            return output_list;
           })
       .def(
           "getAnalogies",