Open source and commercially available: the 30-billion-parameter MPT-30B costs only a fraction of GPT-3 to train

AI model developer MosaicML recently released MPT-30B, a new commercially usable, open-source large language model. With 30 billion parameters, it is significantly more powerful than the previous-generation MPT-7B (7 billion parameters), and MosaicML reports that it outperforms the original GPT-3.
In addition, MosaicML released two fine-tuned variants, MPT-30B-Instruct and MPT-30B-Chat, which build on MPT-30B and specialize in single-turn instruction following and multi-turn dialogue, respectively.
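For readers who want to try the instruction-tuned variant, the sketch below shows one way to load it through Hugging Face Transformers. This is a minimal illustration only: it assumes the public "mosaicml/mpt-30b-instruct" checkpoint, omits the prompt template recommended in MosaicML's model card, and needs roughly 60+ GB of GPU memory for a 30B model in bf16 (or quantization/offloading).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b-instruct"  # public checkpoint name on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # MPT ships MosaicML's custom modeling code
    device_map="auto",        # requires `accelerate`; spreads weights across GPUs
)

prompt = "Summarize, in two sentences, why a longer context window is useful."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```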
Features of the MPT-30B model:
An 8k-token context window during training
Support for even longer contexts at inference time via ALiBi
Efficient training and inference via FlashAttention
The MPT-30B family also has strong coding capabilities thanks to its pre-training data mixture.
During training, the model's context window was extended to 8k tokens on NVIDIA H100 GPUs, which MosaicML says makes MPT-30B the first LLM trained on H100s.
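To make the ALiBi and FlashAttention features above concrete, here is a minimal sketch of how the context length and attention implementation are typically configured when loading an MPT model through Hugging Face Transformers. It follows the pattern from MosaicML's published MPT model cards; the exact config keys (`max_seq_len`, `attn_config`) are assumptions based on those cards and may change between releases.

```python
import torch
import transformers

# Load the MPT-30B config together with MosaicML's custom model code.
name = "mosaicml/mpt-30b"
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)

# ALiBi uses no learned position embeddings, so the maximum sequence length
# can be raised beyond the 8k used in training (quality may degrade as you
# extrapolate further).
config.max_seq_len = 16384

# Use the Triton-based FlashAttention kernel instead of the default torch
# attention (requires the triton / flash-attn dependencies).
config.attn_config["attn_impl"] = "triton"

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```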
MPT-30B stronger than GPT-3?
MPT-30B is a commercially usable, Apache 2.0-licensed, open-source base model that is stronger than the original GPT-3 and competitive with other open-source models such as LLaMA-30B and Falcon-40B.
(Top) Zero-shot accuracy of MPT-30B versus GPT-3 on nine in-context learning (ICL) tasks. MPT-30B outperforms GPT-3 on six of the nine metrics.
MosaicML trained MPT-30B over roughly two months on a cluster of NVIDIA H100 GPUs.
The figure below shows the training data mixture of MPT-30B:
MPT-30B was pre-trained on a mixture of 1T tokens collected from 10 different open-source text corpora. The text was tokenized with the EleutherAI GPT-NeoX-20B tokenizer, and the corpora were sampled according to the ratios shown above.
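As a small illustration of the tokenization step, the snippet below runs the same EleutherAI GPT-NeoX-20B tokenizer on a sample sentence via Hugging Face Transformers. The example text is made up and only shows how raw text turns into token counts.

```python
from transformers import AutoTokenizer

# MPT-30B's pre-training data was tokenized with the GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

text = "MosaicML pre-trained MPT-30B on 1T tokens drawn from open-source corpora."
ids = tokenizer(text)["input_ids"]

print(f"{len(ids)} tokens:", ids)
print(tokenizer.decode(ids))  # round-trips back to the original text
```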
Comparison of MPT-7B and MPT-30B
MPT-30B Training Cost
Naveen Rao, CEO and co-founder of MosaicML, said that training MPT-30B cost about US$700,000 (roughly 5.02 million yuan), far below the tens of millions of dollars required for comparable models such as GPT-3.
How much time and money does it take to train a custom MPT-30B model? Let's start with the base model.
The figure above shows the time and cost of pre-training MPT-30B from scratch on A100 or H100 GPUs. With MosaicML's infrastructure, you can train a custom MPT-30B from scratch on 1T tokens in about two weeks.
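As a rough sanity check on these figures, the back-of-envelope calculation below estimates pre-training compute with the standard ~6·N·D FLOPs rule for dense transformers. The utilization, GPU throughput, and hourly price are illustrative assumptions, not MosaicML's reported numbers.

```python
# Back-of-envelope pre-training estimate. All constants below are assumptions
# chosen for illustration, not MosaicML's actual accounting.

params = 30e9                      # MPT-30B parameters (N)
tokens = 1e12                      # 1T pre-training tokens (D)
total_flops = 6 * params * tokens  # ~6*N*D FLOPs for a dense transformer

peak_flops = 312e12                # assumed A100 bf16 dense peak (~312 TFLOPS)
mfu = 0.4                          # assumed model FLOPs utilization
usd_per_gpu_hour = 2.0             # assumed cloud price

gpu_hours = total_flops / (peak_flops * mfu) / 3600
cost = gpu_hours * usd_per_gpu_hour

print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
# Roughly 400,000 A100-hours and ~$800,000 -- the same order of magnitude as
# the ~$700k training cost quoted above; H100s cut the GPU-hours roughly 3x.
```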
What if you don’t want to train from scratch, but just fine-tune an existing model?
The figure below details the time and cost of fine-tuning MPT-30B per 1B tokens. With MosaicML's infrastructure, you can fully fine-tune MPT-30B without worrying about system memory constraints, for only a few hundred dollars.
MosaicML said that scaling the model to 30 billion parameters is only the first step; it plans to release larger, higher-quality models while continuing to push costs down.