author | Vasilev Ruslan <53991623+artnitolog@users.noreply.github.com> | 2022-06-23 10:35:06 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-06-23 10:35:06 +0300 |
commit | 06e5164d5de93e97d01912e9a388b849ac7b64c6 | |
tree | 9748c4185a1f42eeefd29e110d5867650ea7898c | |
parent | 09187f20241f195fe4089c3f887b81d8c8038dc9 |
Update README.md
-rw-r--r-- | README.md | 10 |
1 files changed, 5 insertions, 5 deletions
@@ -5,22 +5,22 @@ The model leverages 100 billion parameters. It took 65 days to train the model o

 Training details and best practices on acceleration and stabilizations can be found on **[Medium](https://medium.com/p/d1df53d0e9a6)** (English) and **[Habr](https://habr.com/ru/company/yandex/blog/672396/)** (Russian) articles.

-# Setup
+## Setup

 Make sure to have 200GB of free disk space before downloading weights. The model *(code is based on [microsoft/DeepSpeedExamples/Megatron-LM-v1.1.5-ZeRO3](https://github.com/microsoft/DeepSpeedExamples/tree/068e6561188e9192104e014f70fbe25224b5eb62/Megatron-LM-v1.1.5-ZeRO3))* is supposed to run on multiple GPUs with tensor parallelism. It was tested on 4 (A100 80g) and 8 (V100 32g) GPUs, but is able to work with different configurations with ≈200GB of GPU memory in total which divide weight dimensions correctly (e.g. 16, 64, 128).

-## Downloading checkpoint
+### Downloading checkpoint

 * Run `bash download/download.sh` to download model weights and vocabulary.
 * By default, weights will be downloaded to `./yalm100b_checkpoint/weights/`, and vocabulary will be downloaded to `./yalm100b_checkpoint/vocab/`.

-## Docker
+### Docker

 * We [published](https://hub.docker.com/r/yandex/yalm-cuda11-ds) image on Docker Hub, it can be pulled with `docker/pull.sh`. It is compatible with A100 and V100.
 * Alternatively, you can build docker image from source using `docker/build.sh` (which will just build docker image from `docker/Dockerfile`).
 * To run container, use `docker/run.sh` *(volumes, name and other parameters can be changed)*.

-# Usage
+## Usage

 You can start with the following scripts:
 * `examples/generate_interactive.sh`: interactive generation from command line, the simplest way to try the model.
@@ -28,6 +28,6 @@ You can start with the following scripts:
 * `examples/generate_conditional_greedy.sh`: same as previous, but generation is greedy. Suitable for solving problems with few-shot.
 * `examples/generate_unconditional.sh`: unconditional generation. No input is used, output will be jsonlines.

-# License
+## License

 The model is published under the Apache 2.0 license that permits both research and commercial use, Megatron-LM is licensed under the [Megatron-LM license](megatron_lm/LICENSE).
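
The change in this commit is mechanical: each of the five touched headings moves one level down (`# Setup` becomes `## Setup`, `## Docker` becomes `### Docker`, and so on). A minimal sketch of that transformation is given here, assuming ATX-style headings. The function name `demote_headings` is hypothetical, and nothing in the commit indicates that the author used a script rather than editing by hand.

```python
import re

# Matches an ATX heading marker: one to five '#' characters followed by a space.
# Level-6 headings are left alone because Markdown has no level 7.
_HEADING = re.compile(r"^(#{1,5}) ")


def demote_headings(markdown: str) -> str:
    """Return the text with every ATX heading moved one level down.

    Lines inside fenced code blocks are skipped so that shell comments
    such as '# install' are not mistaken for headings.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
        elif not in_fence:
            # Prepend one extra '#' to the existing marker.
            line = _HEADING.sub(r"#\1 ", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "# Setup\n## Downloading checkpoint\n## Docker\n# Usage\n# License"
    print(demote_headings(sample))
```

Run against a whole file, this would also demote the document title, so a real use would need to be limited to the sections from `Setup` onward.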