Model Details
This is a decoder-only model with approximately 0.7B parameters. The architecture largely follows the Qwen3 design, with the following key hyperparameters:
- Hidden Size: 1024
- Attention Heads: 16
- Layers: 28
- Sequence Length: 4096
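As a rough sanity check on the 0.7B figure, the parameter count can be estimated from the hyperparameters above. This is only a sketch: the card states the hidden size, head count, and layer count, while the head dimension (128), number of key/value heads (8), feed-forward size (3072), and tied input/output embeddings are assumptions borrowed from the published Qwen3-0.6B configuration, and the 262K vocabulary is assumed to be exactly 262,144 entries.

```python
def estimate_params(
    hidden=1024,      # from the card
    layers=28,        # from the card
    heads=16,         # from the card
    kv_heads=8,       # ASSUMPTION: grouped-query attention as in Qwen3-0.6B
    head_dim=128,     # ASSUMPTION: as in Qwen3-0.6B
    ffn=3072,         # ASSUMPTION: as in Qwen3-0.6B
    vocab=262_144,    # ASSUMPTION: "262K" taken as 2**18
    tied=True,        # ASSUMPTION: input and output embeddings are shared
):
    """Estimate the parameter count of a Qwen3-style decoder-only model."""
    q_proj = hidden * heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim
    o_proj = heads * head_dim * hidden
    mlp = 3 * hidden * ffn                 # gate, up, and down projections
    norms = 2 * hidden + 2 * head_dim      # two RMSNorms plus query/key norms
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    embeddings = vocab * hidden * (1 if tied else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


total = estimate_params()
print(f"Estimated parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands close to the stated 0.7B, with the large vocabulary accounting for more than a third of the total.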
Training Data
The total training budget is 100 billion tokens. The training mixture consists of Nemotron-CC high-actual (85%) and Nemotron-Pretraining-Specialized-v1.1 (15%).
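The budget can be broken down with a little arithmetic. The sketch below splits the 100 billion tokens by the stated mixture and, using the sequence length of 4096 and the final iteration number 47684 given elsewhere in this card, derives the implied number of sequences per training step. The resulting batch size is an inference from those numbers, not a value stated by the authors.

```python
TOTAL_TOKENS = 100_000_000_000   # from the card
SEQUENCE_LENGTH = 4096           # from the card
FINAL_ITERATION = 47684          # from the final checkpoint name in the card

# Mixture shares in percent, as stated in the card.
MIXTURE_PERCENT = {
    "Nemotron-CC high-actual": 85,
    "Nemotron-Pretraining-Specialized-v1.1": 15,
}


def tokens_per_source(total_tokens, mixture_percent):
    """Split the token budget across datasets using integer percentages."""
    return {name: total_tokens * pct // 100 for name, pct in mixture_percent.items()}


def implied_sequences_per_step(total_tokens, iterations, sequence_length):
    """Sequences per step implied by the budget (an inference, not a stated value)."""
    return total_tokens / iterations / sequence_length


split = tokens_per_source(TOTAL_TOKENS, MIXTURE_PERCENT)
batch = implied_sequences_per_step(TOTAL_TOKENS, FINAL_ITERATION, SEQUENCE_LENGTH)

for name, tokens in split.items():
    print(f"{name}: {tokens / 1e9:.0f}B tokens")
print(f"Implied sequences per step: {batch:.1f}")
```

The implied value is very close to 512 sequences per step, which corresponds to roughly 2 million tokens per iteration.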
Tokenizer
The model uses a custom openeurollm tokenizer with a vocabulary of 262K tokens.
Training Information
The model was trained with the NVIDIA Megatron-LM framework on the LUMI supercomputer. Training used 16 nodes equipped with AMD MI250X GPUs, for a total of approximately 1500 GPU hours.
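For a sense of scale, the GPU-hour figure can be converted into an approximate wall-clock time. The sketch below assumes standard LUMI-G nodes with 4 MI250X modules each, where every module exposes 2 graphics compute dies (GCDs). The card does not say whether a "GPU hour" counts a module or a GCD, so both readings are shown, and neither is a reported training time.

```python
GPU_HOURS = 1500            # from the card (approximate)
NODES = 16                  # from the card
MODULES_PER_NODE = 4        # ASSUMPTION: standard LUMI-G node layout
GCDS_PER_MODULE = 2         # each MI250X module contains two GCDs


def wall_clock_hours(gpu_hours, devices):
    """Wall-clock hours if all devices run concurrently for the whole job."""
    return gpu_hours / devices


modules = NODES * MODULES_PER_NODE
gcds = modules * GCDS_PER_MODULE

per_module = wall_clock_hours(GPU_HOURS, modules)
per_gcd = wall_clock_hours(GPU_HOURS, gcds)

print(f"Counting {modules} MI250X modules: about {per_module:.1f} hours")
print(f"Counting {gcds} GCDs: about {per_gcd:.1f} hours")
```

Either way, the run fits within roughly half a day to a day of wall-clock time on 16 nodes.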
Intermediate Checkpoints
We have released intermediate checkpoints to provide access to the model's training progression. Each checkpoint is available in its own branch of the repository, with one checkpoint for every 4000 training steps.
Branches are named iter_ followed by the iteration number, zero-padded to seven digits. For example, the checkpoint at 16000 iterations is named iter_0016000. The available checkpoints range from iter_0004000 to iter_0047684. The final checkpoint, iter_0047684, is located in the main branch.
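The checkpoint names can be generated programmatically. This is a minimal sketch, assuming a checkpoint exists at every multiple of 4000 steps below the final iteration; the helper names `checkpoint_name` and `checkpoint_names` are illustrative and not part of any library.

```python
def checkpoint_name(step):
    """Format an iteration number as a checkpoint name, e.g. 16000 -> iter_0016000."""
    return f"iter_{step:07d}"


def checkpoint_names(interval=4000, final_step=47684):
    """List the expected checkpoint names, ending with the final checkpoint.

    ASSUMPTION: every multiple of `interval` below `final_step` was released.
    """
    steps = list(range(interval, final_step, interval))
    steps.append(final_step)
    return [checkpoint_name(step) for step in steps]


names = checkpoint_names()
print(names)
```

With the Hugging Face `transformers` library, a name from this list would typically be passed as the `revision` argument of `from_pretrained` to load that branch. The final checkpoint needs no `revision`, since it is in the main branch.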