Yesterday Mistral AI released a new language model called Mistral 7B. @justnasty@lemmy.kya.moe already posted the Sliding Window Attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model is worth its own post.
Mistral 7B is not based on Llama. And they claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license. So truly an ‘open’ model, usable without restrictions. [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’. So my conclusion regarding the openness might be a bit premature. We’ll see.]
(It uses Grouped-query attention and Sliding Window Attention.)
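For anyone wondering what those two terms mean in practice, here is a minimal sketch in plain Python. It is not Mistral's code. The function names, the toy sizes, and the convention that the window includes the current token are all my own assumptions for illustration. The first function builds a causal sliding-window mask (each token only sees the last few tokens), the second shows how grouped-query attention lets several query heads share one key/value head.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Assumed convention: causal, and the window counts the current token,
    so position i sees positions i-window+1 .. i.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Toy sizes, chosen only so the pattern is easy to see.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

The point of the mask is that attention cost per token stops growing with the full sequence length, and the point of sharing key/value heads is a smaller KV cache. Both matter for speed and memory at inference time.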
Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.
- Details are on Mistral AI’s Announcement
- TechCrunch news article including information about the company
- They released a base/foundation model and an instruction-tuned one on HuggingFace
- And llama.cpp is already compatible and GGUF versions are out there.
I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and detail regarding the training could be a downside, though. These were not included in this initial release of the model.)
EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say no new information in it, they mostly copied their announcement)
As of now, it is clear they don’t want to publish any details about the training.
Thanks for paying close attention. I just threw kobold.cpp at it and was amazed by the speed of a 7B model on my old PC ;-) Let it complete a few stories and asked the instruct-tuned variant about llamas and other facts… Somehow missed that there are still things missing. My tests for simple and short texts seemed fine.
Another thing I somehow completely missed is the release of Qwen. This is funded by Alibaba? I need to read up on it.
Regarding the fine-tuning attempts… Idk. My personal opinion is: I’m going to be patient and see. Things are always moving fast and the community (not the researchers) sometimes does silly stuff. And most of the tools are probably focused on Llama as of now. So it’ll probably take more than a few hours to see decent results. But I’m sure the community will have a try. Especially if it turns out the performance is really as good as or better than Llama 2.
Thanks. Those were my words. Maybe I got a bit too excited. I thought I’d read the entire paper later and find out what kind of dataset and how many tokens they used for training.
Turns out there is no paper or model card. At least I couldn’t find one. I’m going to edit my post.
A bit strange for a company with a claimed business model to ‘distribute open-source models’.
People already filed issues: https://github.com/mistralai/mistral-src/issues/9 or https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8
[Edit: The link from the GitHub issue is also interesting regarding the ‘open source’ AI: https://opening-up-chatgpt.github.io/ ]
Guess we’re going to see what happens. Judging by their careful wording, “driving the AI revolution by developing OPEN-WEIGHT models that are on par with proprietary solutions”, I’m afraid they chose that wording on purpose, to mislead people, and really mean open-weight and not open-source. Seems the ‘open source’ reading is just the careless interpretation of the journalists/reporters and people like me, who should learn not to mix facts with their own conclusions. I’m going to follow the progress. Hope they will answer the questions.
Edit: And judging by what I read on their Discord, opening their tuning process is not gonna happen. :-(