Open source and commercially usable: the 30-billion-parameter MPT-30B costs only a fraction of GPT-3 to train

AI model developer MosaicML recently released MPT-30B, a new commercially usable, open-source large language model with 30 billion parameters. It is significantly more capable than the previous-generation MPT-7B (7 billion parameters), and MosaicML says it outperforms the original GPT-3.

In addition, MosaicML released two fine-tuned variants, MPT-30B-Instruct and MPT-30B-Chat, which build on MPT-30B and specialize in single-turn instruction following and multi-turn dialogue, respectively.

Features of the MPT-30B model:

  • 8k token context window during training
  • Support for longer contexts via ALiBi (see the loading sketch after this list)
  • Efficient inference and training performance via FlashAttention
  • Strong coding capabilities thanks to its pre-training data mix
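As a rough illustration of the last three points, the sketch below loads MPT-30B through Hugging Face transformers with an extended context window and a FlashAttention-style kernel. It follows the pattern from MosaicML's MPT model cards, but the exact config keys (`max_seq_len`, `attn_config["attn_impl"]`) and the checkpoint name `mosaicml/mpt-30b` should be treated as assumptions to verify against the current model card.

```python
# Minimal sketch: load MPT-30B with an ALiBi-extended context window and a
# FlashAttention-style attention kernel. Config keys follow the MPT model
# cards and may differ between releases; verify against the model card.
import torch
import transformers

name = "mosaicml/mpt-30b"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384                  # ALiBi allows going beyond the 8k training window
config.attn_config["attn_impl"] = "triton"  # FlashAttention-style fused kernel

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```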

The model was trained with an 8k token context window on NVIDIA H100 GPUs, which MosaicML says makes it the first publicly known LLM trained on H100s.

MPT-30B stronger than GPT-3?

MPT-30B is an open-source base model released under the commercially usable Apache 2.0 license. MosaicML says it is stronger than the original GPT-3 and competitive with other open-source models such as LLaMA-30B and Falcon-40B.

(Figure, top) Zero-shot accuracy of MPT-30B versus GPT-3 on nine in-context learning (ICL) tasks. MPT-30B outperforms GPT-3 on six of the nine metrics.
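For readers who want to reproduce this kind of comparison, the sketch below runs a zero-shot evaluation with EleutherAI's lm-evaluation-harness. The task list here is illustrative (the article does not spell out MosaicML's nine ICL tasks), and argument names such as `model="hf"` vary between harness versions.

```python
# Illustrative zero-shot evaluation of MPT-30B with EleutherAI's
# lm-evaluation-harness. The task list is an assumption, not MosaicML's
# exact nine-task suite; argument names vary by harness version.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",  # Hugging Face causal-LM backend
    model_args="pretrained=mosaicml/mpt-30b,trust_remote_code=True",
    tasks=["lambada_openai", "hellaswag", "piqa", "winogrande", "arc_easy"],
    num_fewshot=0,  # zero-shot, matching the figure
)
print(results["results"])
```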

MosaicML trained MPT-30B over about two months on a cluster of NVIDIA H100 GPUs.

The figure below shows the training data mix for MPT-30B:

MPT-30B was pre-trained on a mix of 1T tokens drawn from ten different open-source text corpora. The text was tokenized with the EleutherAI GPT-NeoX-20B tokenizer, and the corpora were sampled according to the ratios shown above.
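That tokenizer is publicly available, so it is easy to check how text will be segmented before preparing data; the snippet below is a small sketch using the Hugging Face hub name for it.

```python
# Inspect how the EleutherAI GPT-NeoX-20B tokenizer (used by the MPT family)
# segments a piece of text.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
ids = tok("MosaicML pre-trained MPT-30B on 1T tokens.")["input_ids"]
print(len(ids), "tokens:", tok.convert_ids_to_tokens(ids))
```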

Comparison of MPT-7B and MPT-30B

MPT-30B Training Cost

Naveen Rao, CEO and co-founder of MosaicML, said that training MPT-30B cost about US$700,000 (roughly 5.02 million yuan), far below the tens of millions of dollars required for comparable models such as GPT-3.

How much time and money would it take to train a custom MPT-30B model? Let's start with the base model.

The figure above shows the time and cost of pre-training MPT-30B from scratch on A100 or H100 GPUs. With MosaicML infrastructure, you can train a custom MPT-30B from scratch on 1T tokens in about two weeks.
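As a sanity check on these figures, a standard back-of-envelope estimate (about 6 FLOPs per parameter per training token) lands in the same ballpark; the utilization and price numbers below are illustrative assumptions, not MosaicML's published figures.

```python
# Back-of-envelope pre-training estimate: 30B parameters, 1T tokens.
# The 6 * params * tokens FLOP count is a standard rule of thumb;
# utilization and $/GPU-hour are assumptions for illustration only.
params = 30e9                      # model parameters
tokens = 1e12                      # pre-training tokens
total_flops = 6 * params * tokens  # ~1.8e23 FLOPs

a100_bf16_peak = 312e12            # A100 peak BF16 FLOP/s
utilization = 0.4                  # assumed model FLOPs utilization
gpu_hours = total_flops / (a100_bf16_peak * utilization) / 3600  # ~4.0e5

price = 2.0                        # assumed $ per A100-hour
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price:,.0f}")
# On 1024 GPUs this is roughly 16 days of wall-clock time, in line with the
# 'about two weeks' figure above; H100s shorten it further.
```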

What if you don’t want to train from scratch, but just fine-tune an existing model?

The figure below details the time and cost of fine-tuning MPT-30B per 1B tokens. With MosaicML infrastructure, you can fully fine-tune MPT-30B without worrying about system memory constraints, and it costs only a few hundred dollars.
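The same rule of thumb makes the few-hundred-dollar claim plausible for fine-tuning on 1B tokens (again, utilization and pricing are illustrative assumptions):

```python
# Back-of-envelope fine-tuning estimate: 30B parameters, 1B tokens.
params, tokens = 30e9, 1e9
total_flops = 6 * params * tokens    # ~1.8e20 FLOPs
effective_flops = 312e12 * 0.4       # assumed effective A100 throughput
gpu_hours = total_flops / effective_flops / 3600
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * 2.0:,.0f} at $2/GPU-hour")
# ~400 GPU-hours, i.e. on the order of a few hundred dollars per 1B tokens.
```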

MosaicML said that scaling to 30 billion parameters is only a first step, and that it plans to release larger, higher-quality models while continuing to drive down costs.
