"Am I the only one who finds it a bit sus that @ollama claims new support of models (dbrx), but all they do is update their llama.cpp commit
What makes it worse for me is that they don't thank or kudos @ggerganov and the team at all in their README nor in any acknowledgment. Ollama wouldn't exist without llama.cpp."
What's frustrating is that they're genuinely good at the parts they do add: documentation, UI/UX, their model zoo, etc. That alone is something to be proud of and adds a lot of value. As of this writing they have 2,377 commits - there is quite a bit of effort and resulting value in what they're doing.
However, IMO it is pretty sleazy that they frequently make claims like "Ollama now supports X" with zero mention of llama.cpp[0] - an incredible project that makes what they're doing possible in the first place and largely enables these announcements. They don't even mention llama.cpp in their GitHub README or release notes, which cranks the sleaze up a few notches.
I don't know who they are or what their "angle" is, but this reeks of "some business opportunity/VC/something is going to come along and we'll cash in on AI hype while potentially misrepresenting what we're actually doing". To a more naive audience that doesn't quite understand the shoulders of giants they're standing on, it makes it seem that they are doing far more than they actually are.
Of course I don't know that this is the case, but it sure looks like it. It would be trivial for them to address, but they're also very good at marketing and I assume that takes priority.
First of all, they are not violating any license or terms in any form. They add value and enable thousands of people to use local LLMs who would not be able to do so as easily otherwise. Maybe llama.cpp should mention that Ollama takes care of easy, workable access to their functionality…
> First of all, they are not violating any license or terms in any form.
IANAL, but from what I understand that's likely debatable at the very least. You'll notice I said "sleazy" and didn't touch on the license, potential legal issues, etc.
I'm pointing out that other projects that are substantially based on/dependent on other pieces of software to do the "heavy lifting" nearly always acknowledge it. An example is faster-whisper, which is a good parallel and actually has "with CTranslate2" right in the heading[0], with direct links to whisper.cpp and CTranslate2 immediately following.
Ollama is the diametric opposite of this - unless you go spelunking through commits, etc., you'd have no idea that Ollama doesn't do much in terms of the underlying LLM inference. Take a look at llama.cpp to see just how much of the "Ollama functionality" it actually provides.
Then look at /r/LocalLLaMA, HN, etc to see just how many Ollama users (most) have no idea that llama.cpp even exists.
I don't know how this could be anything other than an attempt to mislead people into thinking Ollama is uniquely and directly implementing all of the magic. It's pretty glaring and has been pointed out repeatedly. It's not some casual oversight.
> They add value and enable thousands of people to use local LLMs who would not be able to do so as easily otherwise.
That's the very first thing I said, going so far as to mention commits, the model zoo, etc. while specifically acknowledging the level of effort and the value added.
> Maybe llama.cpp should mention that Ollama takes care of easy, workable access to their functionality…
Are you actually suggesting that enabling software should mention, track, or even be aware of the likely countless projects that are built on it?
The PR they do is very creepy; it literally reads as if all the work is being done by Ollama themselves. And when I saw that they started doing meet-ups and integrations with other companies (I presume with paid support), then IMHO, coupled with the previous points, this crosses a red line. Do the freaking attribution.
It is the same behaviour Amazon showed with OSS, which in turn forced companies to adopt more restrictive licenses.
Ollama is an ergonomic "frontend" to a lower level library (llama.cpp).
The way they operate is extremely common. If I built a web service on top of, say, Warp in Rust, people generally wouldn't expect much acknowledgement for using Warp. Or should I give acknowledgement to Hyper, which Warp is built on?
Actually, on the flip side, Warp is a good example of giving acknowledgement, since they mention Hyper in their readme (of course both are made by the same author, so he is just linking his own work):
I had a consulting call with a young founder trying to start an AI company built on Ollama.
I don’t really think ollama scales to production workloads anyway, but they had no idea what llama.cpp was.
Hopefully what made you sad was that somebody could aim to start an AI company without knowing what llama.cpp is, not that they did not know about llama.cpp :-)
Obviously I don't know the story but yeah... That founder and potential company are in for a rude awakening.
> I don’t really think ollama scales to production workloads anyway
Not even close. At the risk of gatekeeping: in terms of production/commercial serving of LLMs, Ollama (and llama.cpp) are basically toys. They serve a purpose and are fantastic projects for their intended use cases (serving a user or two), but compared to production workloads they're basically "my first LLM".
If that founder isn't at least aware of vLLM or HF TGI (let alone llama.cpp!!) they'll have a really tough time being even remotely competitive in the space, to the point of "it doesn't work and it's not going to".
Obviously there is much, much more that goes into startup success but this is pretty fundamental.
I pointed them towards vLLM, but it sounded like they were set on ollama
I’m curious though, why do you think llama.cpp is a toy compared to vllm?
I understand that vllm is also a server, but could someone not build a similar high throughput server on llama.cpp?
I’ve been looking for a way to serve small-scale-but-still-production workloads (using quantized phi models) on CPU and llama.cpp seems to be the only player in town.
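Concretely, what I have in mind is something like this - llama.cpp's bundled llama-server running a quantized Phi GGUF on CPU, queried through its OpenAI-compatible endpoint. A minimal sketch; the flags, paths, and model names below are made up for illustration:

    # Assumes llama-server is already running locally on CPU, e.g.:
    #   llama-server -m ./models/phi-3-mini-4k-instruct-q4.gguf --port 8080
    # (file name and port are illustrative)
    from openai import OpenAI

    # llama-server exposes an OpenAI-compatible /v1 endpoint; no real API key needed.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="phi-3-mini",  # the server generally serves whatever model it was started with
        messages=[{"role": "user", "content": "Draft today's report summary."}],
        max_tokens=256,
    )
    print(resp.choices[0].message.content)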
> I pointed them towards vLLM, but it sounded like they were set on ollama
I'm baffled how someone could be so set on Ollama. Being married to a tool is always weird to me and being set on the (very) wrong tool for the job even when faced with good advice is even weirder.
Maybe they'll change their mind the first time a VC, customer, or hire sees Ollama and laughs ;). Kind of kidding but not.
> I’m curious though, why do you think llama.cpp is a toy compared to vllm?
llama.cpp is downright incredible for supporting things you would never do in a multi-user production environment:
- Support Nvidia GPUs going back to Maxwell(!)
- CPU (waaaay too slow)
- Split layers between GPU and CPU (still way too slow)
- Wild quantization methods
- Support all kinds of random platforms you'd never deploy to in production (Apple Silicon, etc)
- Much, much more
Whereas the emphasis for vLLM is:
- High scale serving of LLMs in production environments
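For contrast, a minimal sketch of what the vLLM side looks like - its offline batch API pushes many prompts through one engine, which is where the throughput focus shows (the model name and parameters here are purely illustrative):

    from vllm import LLM, SamplingParams

    # Illustrative model and sampling settings only.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    sampling = SamplingParams(temperature=0.7, max_tokens=128)

    prompts = [
        "Summarize the quarterly report in two sentences.",
        "Explain continuous batching in one paragraph.",
    ]

    # vLLM batches these prompts through the engine together,
    # which is where the throughput advantage comes from.
    for out in llm.generate(prompts, sampling):
        print(out.outputs[0].text)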
llama.cpp does really well when used in Ollama type use cases - "I want to run this on my Macbook and send a request every once in a while" or "load a huge model across VRAM and RAM on my desktop". WITH the understanding that being hosted locally is more important than being at least as fast as ChatGPT (which is more-or-less considered the bare-minimum standard in the industry).
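And for the "split across VRAM and RAM" case, a minimal sketch via the llama-cpp-python bindings - the file name, quantization level, and layer count are made up for illustration:

    from llama_cpp import Llama

    # Hypothetical local GGUF path; Q4_K_M and the layer count are illustrative.
    llm = Llama(
        model_path="./models/llama-70b.Q4_K_M.gguf",
        n_gpu_layers=40,  # offload some layers to VRAM, keep the rest in system RAM
        n_ctx=4096,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])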
I said "at least isn't aware of vLLM" because you can take it even further than this (like Cloudflare, Amazon, Mistral, Phind, Databricks, etc) and use something like TensorRT-LLM with Triton Inference Server which kicks performance and production suitability up yet another couple of notches.
It's a right tool for the job kind of thing.
At the risk of sounding elitist, I have no idea how a dozen total tokens/s on CPU (or whatever) is going to be acceptable to users.
Especially in the case of the original scenario (an AI startup) - if you go into a highly competitive and crowded space with Ollama (CPU or not), you're going to get beaten up by people deploying with solutions that are fundamentally, drastically better.
All of this said I have no idea what you mean by "small-scale-but-still-production" and no idea of your users or use case(s). I suppose there's always a chance llama.cpp on CPU could be fine in some cases. I just can't possibly imagine what they would be but that could just be my own experience and bias talking.
I’m working on an internal tool. Maybe 30-40 “customers” total. I say it’s production because it has to be reliable.
We just don’t want to rent a GPU for this little thing. It draws up reports once a day, so it’s okay if it takes a couple mins. It’s work that took a single person maybe 2 hours to do before.
I’ll need to look into Triton; I haven’t heard of that yet!
If you have any resources for running models in production that you’d be willing to share, I’d appreciate them.
What did they do to support WizardLM 2? It seems to work with an earlier llama.cpp version. (I have an app in production that uses a llama.cpp version from before the WizardLM 2 release.)
I just checked: there's exactly one user who has contributed (only typo fixes) to both ollama and llama.cpp according to GitHub's contributors graphs.
Ah, thanks for this! I can't edit my parent comment that you replied to any longer unfortunately.
As I said, I only compared the contributors graphs [0] and checked for overlaps. But those apparently only go back about a year and only list at most 100 contributors, ranked by number of commits.
This is just like Tanenbaum getting mad that nobody credited him for the Intel Management Engine (which he feels makes him the posthumous victor in the Linux/Minix debates).
Bro, you shouldn’t have chosen a non-attribution license if you wanted to be attributed.
Just like Tanenbaum - if you wanted your ego stroked, that’s attribution in this context.
Er. Minix is under a BSD license, which does require attribution. Also, my distant memory is that Tanenbaum wasn't even mad about Intel not fulfilling the terms of the license, but I may be misremembering.
Can you demonstrate that there is not an attribution in the Intel Management Engine documentation? ;)
Sneaky or not, that's the license Tanenbaum chose, and he has to live with it. Same deal here.
Anyway no, Tanenbaum isn't mad, per se, or at least not at Intel. He's sniping back at Linus Torvalds (remember the Torvalds-Tanenbaum debates? it was a thing) about how he was right after all about Minix being the most widely used OS in the world. It's not anger, it's gloating - it's not even really a letter that's meant for Intel at all.
And again, that is the point of the entire BSD/MIT vs GPL debate - which went completely over Tanenbaum's head. BSD/MIT provides maximum freedom to the developer... sometimes including the freedom to deny freedoms to the user. He is critiquing Intel (obliquely) for doing the specific thing that makes this license desirable to these customers, and the specific thing Torvalds argued against.
Like, it's a gloat about how his OS is more popular, but it also backhandedly shows why Torvalds was right. And the same is true here. Want attribution? Choose a license that requires it.
Yeah, I know they have attribution now, but I was fairly confident that they added that because someone called them out on shipping it without attribution before.
Via twitter
"If you're after some weekend reading, I have now added 8 deep dives about David Braben's epic Lander, the world's first game for the ARM platform. Landscape generation, 3D objects, particle physics and memory maps... it's all here.
More articles soon... "
via twitter : "Just published a big update on the PS2 article. In there, you will now find the history behind its MIPS CPU and updated information about the PS2's OS and the subsequent Homebrew scene." https://twitter.com/Flipacholas/status/1758940698466832574
"Parts 2 and 3 will be released over time.... prob one a week or something - if I remember :)
Patreons get access to all locked articles."