author    Nikolay Zinov <nzinov@yandex-team.ru>    2022-06-26 13:52:21 +0300
committer GitHub <noreply@github.com>              2022-06-26 13:52:21 +0300
commit    db9ecf1745616569338a20dca77d313710827a61 (patch)
tree      09e3c697c77043bdf5592005015989e7d14f97a0
parent    d921248abe3bbbd6cf2f7968dfa48a1e88612460 (diff)
Add clarifications about the code
-rw-r--r--  README.md | 2 ++
1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/README.md b/README.md
index 5998c74..bbd5061 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,8 @@ The model leverages 100 billion parameters. It took 65 days to train the model o
Training details and best practices on acceleration and stabilization can be found in the **[Medium](https://medium.com/p/d1df53d0e9a6)** (English) and **[Habr](https://habr.com/ru/company/yandex/blog/672396/)** (Russian) articles.
+We used DeepSpeed to train the model and drew inspiration from the Megatron-LM example. However, the code in this repo is not the code that was used to train the model; rather, it is the stock example from the DeepSpeed repo, with the minimal changes needed to run inference with our model.
+
## Setup
Make sure you have 200GB of free disk space before downloading the weights. The model *(code is based on [microsoft/DeepSpeedExamples/Megatron-LM-v1.1.5-ZeRO3](https://github.com/microsoft/DeepSpeedExamples/tree/068e6561188e9192104e014f70fbe25224b5eb62/Megatron-LM-v1.1.5-ZeRO3))* is meant to run on multiple GPUs with tensor parallelism. It was tested on 4 (A100 80g) and 8 (V100 32g) GPUs, but it can also work with other configurations that have ≈200GB of GPU memory in total and divide the weight dimensions evenly (e.g. 16, 64, or 128 GPUs).
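As a rough illustration of the two constraints above, here is a minimal sketch that checks whether a candidate GPU configuration both divides the weight dimensions evenly and provides enough total memory. The hidden size and head count below are assumed values for illustration only, not read from this repo's config.

```python
# Hedged sketch: sanity-check a tensor-parallel GPU configuration.
# HIDDEN_SIZE and NUM_HEADS are assumptions for illustration,
# not values taken from this repo's model config.
HIDDEN_SIZE = 10240        # assumed hidden dimension
NUM_HEADS = 128            # assumed attention head count
TOTAL_WEIGHT_GB = 200      # README: ~200GB of GPU memory needed in total


def fits(num_gpus: int, mem_per_gpu_gb: float) -> bool:
    """True if the weights shard evenly across GPUs and total memory suffices."""
    divides_evenly = HIDDEN_SIZE % num_gpus == 0 and NUM_HEADS % num_gpus == 0
    enough_memory = num_gpus * mem_per_gpu_gb >= TOTAL_WEIGHT_GB
    return divides_evenly and enough_memory


# The two tested setups from the README, plus a hypothetical larger one.
for n_gpus, mem_gb in [(4, 80), (8, 32), (16, 16)]:
    print(f"{n_gpus} x {mem_gb}GB GPUs -> fits: {fits(n_gpus, mem_gb)}")
```

This only checks divisibility and aggregate memory; it says nothing about interconnect or runtime overhead, so treat it as a first-pass filter on configurations, not a guarantee.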