
Model Details

This is a decoder-only language model with approximately 0.7B parameters. The architecture largely follows the Qwen3 design, with the following key hyperparameters:

  • Hidden Size: 1024
  • Attention Heads: 16
  • Layers: 28
  • Sequence Length: 4096
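
As a rough sanity check on the 0.7B figure, the parameter count can be estimated from the hyperparameters above. The sketch below is not taken from the model's configuration: the head dimension, number of key/value heads, MLP width, and tied embeddings are assumptions borrowed from Qwen3-0.6B, and the vocabulary size reads the tokenizer's "262K" as 2**18.

```python
# Rough parameter count for a Qwen3-style decoder-only model.

# Stated in the model card.
HIDDEN_SIZE = 1024
NUM_HEADS = 16
NUM_LAYERS = 28

# Assumptions (not stated in the card), borrowed from Qwen3-0.6B.
HEAD_DIM = 128            # per-head dimension
NUM_KV_HEADS = 8          # grouped-query attention
INTERMEDIATE_SIZE = 3072  # width of the gated MLP
VOCAB_SIZE = 262_144      # "262K" read as 2**18
TIED_EMBEDDINGS = True    # input and output embeddings share weights


def count_parameters() -> int:
    """Return an approximate parameter count under the assumptions above."""
    q_proj = HIDDEN_SIZE * NUM_HEADS * HEAD_DIM
    k_proj = HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    v_proj = HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    o_proj = NUM_HEADS * HEAD_DIM * HIDDEN_SIZE
    attention = q_proj + k_proj + v_proj + o_proj

    # Gate, up, and down projections of the gated MLP.
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE

    # Two RMSNorm weights per layer plus per-head query/key norms.
    norms = 2 * HIDDEN_SIZE + 2 * HEAD_DIM

    embeddings = VOCAB_SIZE * HIDDEN_SIZE
    if not TIED_EMBEDDINGS:
        embeddings *= 2  # separate output projection

    final_norm = HIDDEN_SIZE
    return NUM_LAYERS * (attention + mlp + norms) + embeddings + final_norm


if __name__ == "__main__":
    print(f"{count_parameters() / 1e9:.2f}B parameters")
```

Under these assumptions the estimate is about 0.71B, of which roughly 0.27B sits in the embedding table because of the large vocabulary.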

Training Data

The total training budget is 100 billion tokens. The training mixture consists of Nemotron-CC high-actual (85%) and Nemotron-Pretraining-Specialized-v1.1 (15%).
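
The mixture percentages translate into per-source token counts as sketched below. The second function divides the budget by the final iteration number listed under Intermediate Checkpoints (47684) to get the implied tokens per training step. The comparison with 512 sequences of 4096 tokens is an assumption about the global batch size, which the card does not state.

```python
# Token budget arithmetic based on figures from the model card.
TOTAL_TOKENS = 100_000_000_000
FINAL_ITERATION = 47_684
SEQUENCE_LENGTH = 4096

# Mixture shares in percent, as stated in the card.
MIXTURE_PERCENT = {
    "Nemotron-CC high-actual": 85,
    "Nemotron-Pretraining-Specialized-v1.1": 15,
}


def tokens_per_source(total: int, mixture_percent: dict[str, int]) -> dict[str, int]:
    """Split the token budget according to integer percentage shares."""
    return {name: total * pct // 100 for name, pct in mixture_percent.items()}


def implied_tokens_per_step(total: int, iterations: int) -> float:
    """Average number of tokens consumed per training iteration."""
    return total / iterations


if __name__ == "__main__":
    for name, tokens in tokens_per_source(TOTAL_TOKENS, MIXTURE_PERCENT).items():
        print(f"{name}: {tokens / 1e9:.0f}B tokens")

    per_step = implied_tokens_per_step(TOTAL_TOKENS, FINAL_ITERATION)
    # Assumption: a global batch of 512 sequences (not stated in the card).
    assumed_batch_tokens = 512 * SEQUENCE_LENGTH
    print(f"implied tokens per step: {per_step:,.0f}")
    print(f"512 x {SEQUENCE_LENGTH} = {assumed_batch_tokens:,}")
```

The implied value of roughly 2.1 million tokens per step is close to 512 × 4096, which suggests, but does not confirm, that batch size.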

Tokenizer

The model uses a custom openeurollm tokenizer with a vocabulary size of 262K.

Training Information

The model was trained with the NVIDIA Megatron-LM framework on the LUMI supercomputer. Training used 16 AMD MI250X nodes, for a total of approximately 1500 GPU hours.

Intermediate Checkpoints

We have released intermediate checkpoints to provide access to the model's training progression. These checkpoints are available in separate branches, with a new checkpoint released every 4000 training steps.

Checkpoints are named iter_ followed by the iteration count, zero-padded to seven digits. For example, the checkpoint for 16000 iterations is named iter_0016000. The available checkpoints range from iter_0004000 up to iter_0047684. The final checkpoint, iter_0047684, is located in the main branch.
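
A minimal sketch of how the branch names can be generated and a specific checkpoint loaded with the transformers library. The `repo_id` argument is a placeholder for this model's Hugging Face repository id, the helper names are hypothetical, and the sketch assumes every 4000-step checkpoint has its own branch as described above.

```python
# Build checkpoint branch names and load one revision from the Hub.
CHECKPOINT_INTERVAL = 4_000
FINAL_STEP = 47_684


def checkpoint_name(step: int) -> str:
    """Format a step as a branch name, e.g. 16000 -> 'iter_0016000'."""
    return f"iter_{step:07d}"


def available_checkpoints() -> list[str]:
    """List the checkpoint names implied by the card: every 4000 steps plus the final one."""
    steps = list(range(CHECKPOINT_INTERVAL, FINAL_STEP, CHECKPOINT_INTERVAL))
    steps.append(FINAL_STEP)
    return [checkpoint_name(step) for step in steps]


def load_checkpoint(repo_id: str, step: int):
    """Load tokenizer and model for one checkpoint.

    The final checkpoint lives on the main branch; the others are
    assumed to live on branches named after the checkpoint.
    """
    # Imported lazily so the naming helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = "main" if step == FINAL_STEP else checkpoint_name(step)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
    return tokenizer, model


if __name__ == "__main__":
    print(available_checkpoints())
```

Calling `load_checkpoint(repo_id, 16000)` with the actual repository id would then fetch the weights from the iter_0016000 branch.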

Safetensors
  • Model size: 0.7B params
  • Tensor type: BF16

Dataset used to train openeurollm/datamix-0.7b-nemotron_pre_spec_v1.1-100bt