| Name | Last updated |
|---|---|
| images | 5 years ago |
| BooksDownloader.py | 6 years ago |
| BookscorpusTextFormatting.py | 6 years ago |
| ChemProtTextFormatting.py | 5 years ago |
| Downloader.py | 5 years ago |
| GLUEDownloader.py | 5 years ago |
| GooglePretrainedWeightDownloader.py | 6 years ago |
| NVIDIAPretrainedWeightDownloader.py | 6 years ago |
| PubMedDownloader.py | 6 years ago |
| PubMedTextFormatting.py | 6 years ago |
| README.md | 6 years ago |
| SquadDownloader.py | 6 years ago |
| TextSharding.py | 6 years ago |
| WikiDownloader.py | 5 years ago |
| WikicorpusTextFormatting.py | 6 years ago |
| \_\_init\_\_.py | 6 years ago |
| bertPrep.py | 5 years ago |
| check.py | 5 years ago |
| create_biobert_datasets_from_start.sh | 6 years ago |
| create_datasets_from_start.sh | 5 years ago |
Steps to reproduce the datasets from the web:
1) Build the container
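A minimal sketch of the build step driven from Python, assuming Docker is installed and the Dockerfile sits in the current directory. The image tag `bert_prep` and the helper names `docker_build_command` and `build_container` are hypothetical placeholders, not names taken from this repository. By default it only prints the command (a dry run), so it can be inspected before anything is built.

```python
import shlex
import subprocess
from typing import List

# Hypothetical image tag (an assumption); substitute whatever name your setup uses.
IMAGE_TAG = "bert_prep"


def docker_build_command(tag: str = IMAGE_TAG, context: str = ".") -> List[str]:
    """Return the `docker build` argument list for the given tag and build context."""
    return ["docker", "build", "-t", tag, context]


def build_container(tag: str = IMAGE_TAG, context: str = ".", dry_run: bool = True) -> List[str]:
    """Build the image, or only print the command when dry_run is True."""
    cmd = docker_build_command(tag, context)
    if dry_run:
        # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell line.
        print(shlex.join(cmd))
    else:
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    build_container()  # prints: docker build -t bert_prep .
```

Pass `dry_run=False` to actually invoke Docker.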
**Inside the container, starting from here:**
3) Download the pretrained weights (they contain the vocab files used for preprocessing)
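The vocab files matter because preprocessing maps each token to an integer ID. Google's released BERT checkpoints ship a `vocab.txt` with one token per line, where the zero-based line index is the token's ID. Below is a minimal sketch of loading such a file, assuming that format; the function names `load_vocab` and `convert_tokens_to_ids` and the `[UNK]` fallback are illustrative, not taken from this repository's scripts.

```python
import collections
import os
import tempfile
from typing import Dict, List


def load_vocab(vocab_path: str) -> Dict[str, int]:
    """Map each token in a one-token-per-line vocab file to its zero-based line index."""
    vocab: Dict[str, int] = collections.OrderedDict()
    with open(vocab_path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            vocab[line.strip()] = index
    return vocab


def convert_tokens_to_ids(vocab: Dict[str, int], tokens: List[str], unk: str = "[UNK]") -> List[int]:
    """Look up each token's ID, falling back to the unknown-token ID (assumed to be [UNK])."""
    return [vocab.get(token, vocab[unk]) for token in tokens]


if __name__ == "__main__":
    # Tiny stand-in vocabulary; a real BERT vocab.txt has about 30,000 entries.
    sample = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
        tmp.write("\n".join(sample) + "\n")
        path = tmp.name
    try:
        vocab = load_vocab(path)
        print(convert_tokens_to_ids(vocab, ["[CLS]", "hello", "unseen", "[SEP]"]))  # → [2, 4, 1, 3]
    finally:
        os.remove(path)
```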