Model Details

This is a decoder-only model with approximately 0.7B parameters. The architecture largely follows the Qwen3 design, with the following key hyperparameters:

Hidden Size: 1024
Attention Heads: 16
Layers: 28
Sequence Length: 4096
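
As a rough sanity check on the 0.7B figure, the sketch below counts weight-matrix parameters from the hyperparameters above. Hidden size, head count, and layer count come from this card. Everything else is an assumption borrowed from the similarly sized Qwen3-0.6B configuration (head dimension 128, 8 key/value heads, MLP width 3072, tied input and output embeddings), and the 262K vocabulary is assumed to mean 262,144 entries. Normalization weights are ignored.

```python
# Values stated in this card.
HIDDEN_SIZE = 1024
NUM_HEADS = 16
NUM_LAYERS = 28

# Assumptions (not stated in this card), taken from Qwen3-0.6B.
HEAD_DIM = 128
NUM_KV_HEADS = 8
MLP_SIZE = 3072
TIED_EMBEDDINGS = True
VOCAB_SIZE = 262_144  # assumed reading of "262K"


def estimate_parameters() -> int:
    """Count weight-matrix parameters, ignoring the small norm vectors."""
    q_and_o = 2 * HIDDEN_SIZE * NUM_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    mlp = 3 * HIDDEN_SIZE * MLP_SIZE  # gate, up, and down projections
    per_layer = q_and_o + k_and_v + mlp
    embeddings = VOCAB_SIZE * HIDDEN_SIZE * (1 if TIED_EMBEDDINGS else 2)
    return NUM_LAYERS * per_layer + embeddings


total = estimate_parameters()
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate comes to about 0.71B, of which roughly 0.27B sits in the embedding table. That is consistent with the approximate 0.7B figure, but it does not confirm the assumed values.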

Training Data

The total training budget is 100 billion tokens. The training mixture consists of Nemotron-CC high-actual (85%) and Nemotron-Pretraining-Specialized-v1.1 (15%).
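
The mixture percentages translate into per-source token counts as shown in the sketch below. This is plain arithmetic on the numbers above. It assumes the percentages are measured in tokens rather than in documents, which the card does not state.

```python
TOTAL_TOKENS = 100_000_000_000  # 100 billion, from this card

# Mixture weights in percent, from this card.
MIXTURE_PERCENT = {
    "Nemotron-CC high-actual": 85,
    "Nemotron-Pretraining-Specialized-v1.1": 15,
}

# Integer arithmetic keeps the split exact.
tokens_per_source = {
    name: TOTAL_TOKENS * pct // 100 for name, pct in MIXTURE_PERCENT.items()
}

for name, tokens in tokens_per_source.items():
    print(f"{name}: {tokens / 1e9:.0f}B tokens")
```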

Tokenizer

The model uses a custom openeurollm tokenizer with a vocabulary size of 262K.

Training Information

The model was trained with the NVIDIA Megatron-LM framework on the LUMI supercomputer. Training used 16 AMD MI250X nodes and took approximately 1500 GPU hours in total.
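
Combining the compute figure with the 100 billion token budget from the Training Data section gives a rough average throughput. Both inputs are approximate, and the card does not say whether a GPU hour counts a full MI250X module or a single compute die, so the result is only an order-of-magnitude estimate. The function name is illustrative.

```python
TOTAL_TOKENS = 100e9   # token budget stated in this card
GPU_HOURS = 1500       # approximate figure stated in this card


def tokens_per_gpu_second(total_tokens: float, gpu_hours: float) -> float:
    """Average training throughput per GPU in tokens per second."""
    return total_tokens / (gpu_hours * 3600)


throughput = tokens_per_gpu_second(TOTAL_TOKENS, GPU_HOURS)
print(f"{throughput:,.0f} tokens per GPU second")
```

With the stated figures this works out to roughly 18,500 tokens per GPU second, averaged over the whole run.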

Intermediate Checkpoints

We have released intermediate checkpoints to provide access to the model's training progression. These checkpoints are available in separate branches, with a new checkpoint released every 4000 training steps.

Each branch is named iter_ followed by the training step, zero-padded to seven digits. For example, the checkpoint for 16000 iterations is named iter_0016000. The available checkpoints range from iter_0004000 up to iter_0047684. The final checkpoint, iter_0047684, is located in the main branch.
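
The branch names can be generated rather than typed by hand. The sketch below assumes one branch per multiple of 4000 steps below the final step, with the final step 47684 appended at the end. The helper names are illustrative and not part of any library. It also estimates the tokens seen at each checkpoint, assuming the 100 billion token budget was consumed at a constant rate over the 47684 steps, which the card does not state.

```python
CHECKPOINT_INTERVAL = 4000        # from this card
FINAL_STEP = 47684                # from this card
TOTAL_TOKENS = 100_000_000_000    # from this card


def checkpoint_name(step: int) -> str:
    """Format a step as in the card's examples, e.g. 16000 -> iter_0016000."""
    return f"iter_{step:07d}"


def checkpoint_steps() -> list[int]:
    """Every multiple of the interval below the final step, then the final step."""
    steps = list(range(CHECKPOINT_INTERVAL, FINAL_STEP, CHECKPOINT_INTERVAL))
    return steps + [FINAL_STEP]


def approx_tokens_seen(step: int) -> float:
    """Assumes a constant number of tokens per step (not stated in the card)."""
    return TOTAL_TOKENS * step / FINAL_STEP


for step in checkpoint_steps():
    print(checkpoint_name(step), f"~{approx_tokens_seen(step) / 1e9:.1f}B tokens")
```

With the Hugging Face transformers library, a branch name such as iter_0016000 can typically be passed as the revision argument of from_pretrained to load that checkpoint. The final checkpoint needs no revision because it is on the main branch.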

Safetensors

Model size: 0.7B params
Tensor type: BF16
