3.5 model, the performance was just okay-ish. While the inference time wasn't too shabby and the responses initially seemed good, the model started to hallucinate and produce inaccurate outputs. I had to force-quit it after about 11 minutes, as it showed no signs of stopping and would likely have continued indefinitely. The model used around 5 GB of RAM, which left some capacity for other tasks, but the hallucinations ultimately detracted from the overall experience.

Mistral (7b)

Mistral is a 7-billion-parameter model released under the Apache license, offered in both instruction-following and text-completion variants. According to the Mistral AI team, Mistral 7B surpasses Llama 2 13B across all benchmarks and even outperforms Llama 1 34B in several areas. It also delivers performance close to CodeLlama 7B on coding tasks, while still excelling at general English-language tasks.

I was skeptical of this model since it is a 7b-parameter model, but during my testing on the Pi 5 it did manage to complete the given tasks, although the inference time wasn't especially speedy at around 6 minutes. It used only 5 GB of RAM, which is impressive given its size, and the responses were correct and aligned with my expectations. While I wouldn't rely on this model for daily use on the Pi, it's definitely nice to have as an option for more complex tasks when needed.

Llama 2 (7b)

Llama 2, developed by Meta Platforms, Inc., is trained on a dataset of 2 trillion tokens and natively supports a context length of 4,096 tokens. The Llama 2 Chat models are specifically optimized for conversational use, fine-tuned with more than 1 million human annotations to enhance their chat capabilities.

Well well well, as you can see above in my attempt to run the Llama 2 model, I found that it simply didn't work due to its higher RAM requirements.

Codellama (7b)

Code Llama, based on Llama 2, is a model created to assist with code generation and discussion.
It aims to streamline development workflows and simplify the coding learning process. Capable of producing both code and explanatory natural language, Code Llama supports a wide range of popular programming languages, such as Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and others.

Similar to the Llama 2 model, it didn't run at all on my Raspberry Pi 5 due to its higher RAM requirements.

Nemotron-mini (4b)

Nemotron-Mini-4B-Instruct is designed to generate responses for roleplaying, retrieval-augmented generation (RAG), and function calling. It's a small language model (SLM) that has been refined for speed and on-device deployment using distillation, pruning, and quantization techniques. Optimized specifically for roleplay, RAG-based QA, and function calling in English, this instruct model supports a context length of 4,096 tokens and is ready for commercial applications.

During my testing of Nemotron-Mini-4B-Instruct, I found the model to be quite efficient. It managed to deliver responses quickly, with an inference time of under 2 minutes, while using just 4 GB of RAM. This level of performance makes it a viable option for a personal co-pilot on the Pi.

Orca-Mini (3b)

Orca Mini is a series of models based on Llama
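The RAM numbers above follow a rough pattern: a quantized model's footprint is driven mostly by its parameter count times the bits per weight, plus some overhead for the KV cache, runtime buffers, and the OS. Here's a back-of-the-envelope sketch of that estimate; the function name and the flat 1 GB overhead constant are my own assumptions, and real usage varies with the quantization level and context length a given model ships with.

```python
def approx_model_ram_gb(n_params_billion: float,
                        bits_per_weight: int = 4,
                        overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate for a quantized model: weight storage at the
    given precision, plus a flat overhead for KV cache, buffers, and OS."""
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return round(weights_gb + overhead_gb, 1)

# A 7b model at 4-bit quantization: ~4.5 GB, in the ballpark of the
# ~5 GB I saw for Mistral 7B on the Pi.
print(approx_model_ram_gb(7))
# A 13b model at 4-bit: ~7.5 GB, which is already tight on an 8 GB Pi 5.
print(approx_model_ram_gb(13))
```

This is why the 3b–7b models were the only ones with a fighting chance here, and why anything larger (or quantized less aggressively) refused to load.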
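If you want to reproduce the inference times quoted above rather than eyeball a stopwatch, Ollama exposes a local REST API (by default on port 11434) whose non-streaming responses include nanosecond timing counters. Below is a minimal sketch assuming a stock Ollama install; the sample numbers at the bottom are purely illustrative, not measurements from my runs.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a stock install on port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(model: str, prompt: str) -> dict:
    """POST one non-streaming generation request to the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def timing_summary(reply: dict) -> dict:
    """Turn Ollama's nanosecond counters into human-readable numbers."""
    eval_s = reply["eval_duration"] / 1e9
    return {
        "total_minutes": round(reply["total_duration"] / 1e9 / 60, 1),
        "tokens_per_second": round(reply["eval_count"] / eval_s, 2),
    }

# Illustrative response fields only (not a real measurement):
sample = {"total_duration": 360_000_000_000,  # 6 minutes, in nanoseconds
          "eval_duration": 300_000_000_000,
          "eval_count": 600}
print(timing_summary(sample))  # {'total_minutes': 6.0, 'tokens_per_second': 2.0}
```

In practice you would call `timing_summary(generate("mistral", "your prompt"))` on the Pi itself; on hardware this constrained, tokens per second is a more honest comparison metric than wall-clock minutes, since it factors out answer length.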