
added FAQ on how to get reproducible results (#633)

Summary:
Hi everyone, and thanks for this wonderful library. I'm relatively new to it, and I found myself struggling a bit when trying to obtain reproducible results, e.g. in order to find the best parameters.
I found the perfect answer in a 2016 issue here on your repo (https://github.com/facebookresearch/fastText/issues/116) and I thought it could be useful to add it to the FAQs.

I'm sending you two PRs:
- this one, in which I added the FAQ
- a second one, in which I modified the description in src/args.cc for the "thread" param

Of course, feel free to choose which one to keep (or to discard both of them).

Thanks!
Leonardo
Pull Request resolved: https://github.com/facebookresearch/fastText/pull/633

Differential Revision: D9814563

Pulled By: EdouardGrave

fbshipit-source-id: 83e4b7a7163b9013aef144dedd9b4bd5945bafdf
Leonardo Foderaro 7 years ago
parent
commit
711f513bc6
1 file changed, 3 additions and 0 deletions

+ 3 - 0
docs/faqs.md

@@ -53,3 +53,6 @@ You'll likely see this behavior because your learning rate is too high. Try redu
 
 ## My compiler / architecture can't build fastText. What should I do?
 Try a newer version of your compiler. We try to maintain compatibility with older versions of gcc and many platforms, however sometimes maintaining backwards compatibility becomes very hard. In general, compilers and tool chains that ship with LTS versions of major linux distributions should be fair game. In any case, create an issue with your compiler version and architecture and we'll try to implement compatibility.
+
+## How do I run fastText in a fully reproducible way? Each time I run it I get different results.
+If you run fastText multiple times, you'll obtain slightly different results each time because of the optimization algorithm (asynchronous stochastic gradient descent, or Hogwild). If you need identical results (e.g. to compare different sets of input parameters), set the 'thread' parameter to 1. With a single thread, each run with the same input parameters gives exactly the same performance.
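
As an illustration of the FAQ entry above, here is a minimal sketch of a helper that builds a fastText command line pinned to one training thread. The helper name `reproducible_fasttext_command`, the binary path `./fasttext`, and the file names `train.txt` and `model` are hypothetical; only the `supervised` command and the `-input`, `-output`, `-lr`, `-epoch` and `-thread` options are taken from the fastText CLI. The sketch only assembles the command and does not run fastText itself.

```python
import shlex


def reproducible_fasttext_command(train_file, model_prefix,
                                  binary="./fasttext", mode="supervised",
                                  extra_args=None):
    """Build a fastText CLI command that always trains with a single thread.

    `binary` and the file names are assumptions for illustration; adjust
    them to wherever fastText was built and where the data lives.
    """
    extra_args = dict(extra_args or {})
    # Ignore any caller-supplied thread count: more than one thread brings
    # back the run-to-run variation described in the FAQ entry.
    extra_args.pop("thread", None)

    cmd = [binary, mode, "-input", train_file, "-output", model_prefix]
    for name, value in extra_args.items():
        cmd += [f"-{name}", str(value)]
    # Pin training to one thread so repeated runs are comparable.
    cmd += ["-thread", "1"]
    return cmd


if __name__ == "__main__":
    cmd = reproducible_fasttext_command(
        "train.txt", "model", extra_args={"lr": 0.5, "epoch": 25}
    )
    # Print a shell-ready command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping `-thread 1` fixed while varying one other option at a time is one way to compare parameter sets, at the cost of slower training than a multi-threaded run.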