"Am I the only one who finds it a bit sus that @ollama claims new support of models (dbrx), but all they do is update their llama.cpp commit
What makes it worse for me is that they don't thank or kudos @ggerganov and the team at all in their README nor in any acknowledgment. Ollama wouldn't exist without llama.cpp."
What's frustrating is that they're genuinely good at the parts they do add: documentation, UI/UX, their model zoo, etc. That alone is something to be proud of and adds a lot of value. As of this writing they have 2,377 commits - there is quite a bit of effort and resulting value in what they're doing.
However, IMO it is pretty sleazy that they frequently make claims like "Ollama now supports X" with zero mention of llama.cpp[0] - an incredible project that makes what they're doing possible in the first place and largely enables these announcements. They don't even mention llama.cpp in their GitHub README or release notes, which cranks the sleaze up a few notches.
I don't know who they are or what their "angle" is, but this reeks of "some business opportunity/VC/something is going to come along and we'll cash in on AI hype while potentially misrepresenting what we're actually doing". To a more naive audience that doesn't quite understand the shoulders of giants they're standing on, it makes it seem that they are doing far more than they actually are.
Of course I don't know that this is the case, but it sure looks like it. It would be trivial for them to address, but they're also very good at marketing and I assume that takes priority.
First of all, they are not violating any license or terms in any form. They add value and enable thousands of people to use local LLMs who would not be able to do so as easily otherwise. Maybe llama.cpp should mention that Ollama takes care of easy, workable access to their functionality…
> First of all, they are not violating any license or terms in any form.
IANAL, but from what I understand that's likely debatable at the very least. You'll notice I said "sleazy" and didn't touch on the license, potential legal issues, etc.
I'm pointing out that other projects that are substantially based on/dependent on other pieces of software to do the "heavy lifting" nearly always acknowledge it. An example is faster-whisper, which is a good parallel and actually has "with CTranslate2" right in the heading[0], with direct links to whisper.cpp and CTranslate2 immediately following.
Ollama is the diametric opposite of this - unless you go spelunking through commits, etc., you'd have no idea that Ollama doesn't do much in terms of the underlying LLM inference. Take a look at llama.cpp to see just how much of the "Ollama functionality" it actually provides.
Then look at /r/LocalLLaMA, HN, etc to see just how many Ollama users (most) have no idea that llama.cpp even exists.
I don't know how this could be anything other than an attempt to mislead people into thinking Ollama is uniquely and directly implementing all of the magic. It's pretty glaring and has been pointed out repeatedly. It's not some casual oversight.
> They add value and enable thousands of people to use local LLMs who would not be able to do so as easily otherwise.
That's the very first thing I said, going so far as to mention commits, the model zoo, etc. while specifically acknowledging the level of effort and the value added.
> Maybe llama.cpp should mention that Ollama takes care of easy, workable access to their functionality…
Are you actually suggesting that enabling software should mention, track, or even be aware of the likely countless projects that are built on it?
The PR they do is very creepy; it literally reads as if all the work is being done by Ollama themselves. And when I saw that they started doing meet-ups and integrations with other companies (I presume with paid support), then IMHO, coupled with the previous points, this crosses a red line. Do the freaking attribution.
It is the same behaviour Amazon showed with OSS, which in turn forced companies to adopt more restrictive licenses.
Ollama is an ergonomic "frontend" to a lower level library (llama.cpp).
The way they operate is extremely common. If I built a web service on top of, say, Warp in Rust, people generally wouldn't expect much acknowledgement for using Warp. Or should I give acknowledgement to Hyper, which Warp is built on?
Actually, on the flip side, Warp is a good example of giving acknowledgement, since they mention Hyper in their readme (of course both are made by the same author, so he is just linking his own work):
I had a consulting call with a young founder trying to start an AI company built on Ollama.
I don’t really think ollama scales to production workloads anyway, but they had no idea what llama.cpp was.
Hopefully what made you sad was that somebody could aim to start an AI company without knowing what llama.cpp is, not that they did not know about llama.cpp :-)
Obviously I don't know the story but yeah... That founder and potential company are in for a rude awakening.
> I don’t really think ollama scales to production workloads anyway
Not even close. At the risk of gatekeeping: in terms of production/commercial serving of LLMs, Ollama (and llama.cpp) are basically toys. They serve a purpose and are fantastic projects for their intended use cases (serving a user or two), but compared to production workloads they're basically "my first LLM".
If that founder isn't at least aware of vLLM or HF TGI (let alone llama.cpp!!) they'll have a really tough time being even remotely competitive in the space, to the point of "it doesn't work and it's not going to".
Obviously there is much, much more that goes into startup success but this is pretty fundamental.
I pointed them towards vLLM, but it sounded like they were set on ollama
I’m curious though, why do you think llama.cpp is a toy compared to vllm?
I understand that vllm is also a server, but could someone not build a similar high throughput server on llama.cpp?
I’ve been looking for a way to serve small-scale-but-still-production workloads (using quantized phi models) on CPU and llama.cpp seems to be the only player in town.
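Concretely, what I have in mind is something like this - llama.cpp's bundled llama-server running a quantized Phi GGUF on CPU, queried through its OpenAI-compatible endpoint. A minimal sketch; the flags, paths, and model names below are made up for illustration:

    # Assumes llama-server is already running locally on CPU, e.g.:
    #   llama-server -m ./models/phi-3-mini-4k-instruct-q4.gguf --port 8080
    # (file name and port are illustrative)
    from openai import OpenAI

    # llama-server exposes an OpenAI-compatible /v1 endpoint; no real API key needed.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="phi-3-mini",  # the server generally serves whatever model it was started with
        messages=[{"role": "user", "content": "Draft today's report summary."}],
        max_tokens=256,
    )
    print(resp.choices[0].message.content)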
> I pointed them towards vLLM, but it sounded like they were set on ollama
I'm baffled how someone could be so set on Ollama. Being married to a tool is always weird to me and being set on the (very) wrong tool for the job even when faced with good advice is even weirder.
Maybe they'll change their mind the first time a VC, customer, or hire sees Ollama and laughs ;). Kind of kidding but not.
> I’m curious though, why do you think llama.cpp is a toy compared to vllm?
llama.cpp is downright incredible for supporting things you would never do in a multi-user production environment:
- Support Nvidia GPUs going back to Maxwell(!)
- CPU (waaaay too slow)
- Split layers between GPU and CPU (still way too slow)
- Wild quantization methods
- Support all kinds of random platforms you'd never deploy to in production (Apple Silicon, etc)
- Much, much more
Whereas the emphasis for vLLM is:
- High scale serving of LLMs in production environments
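For contrast, a minimal sketch of what the vLLM side looks like - its offline batch API pushes many prompts through one engine, which is where the throughput focus shows (the model name and parameters here are purely illustrative):

    from vllm import LLM, SamplingParams

    # Illustrative model and sampling settings only.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    sampling = SamplingParams(temperature=0.7, max_tokens=128)

    prompts = [
        "Summarize the quarterly report in two sentences.",
        "Explain continuous batching in one paragraph.",
    ]

    # vLLM batches these prompts through the engine together,
    # which is where the throughput advantage comes from.
    for out in llm.generate(prompts, sampling):
        print(out.outputs[0].text)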
llama.cpp does really well when used in Ollama type use cases - "I want to run this on my Macbook and send a request every once in a while" or "load a huge model across VRAM and RAM on my desktop". WITH the understanding that being hosted locally is more important than being at least as fast as ChatGPT (which is more-or-less considered the bare-minimum standard in the industry).
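And for the "split across VRAM and RAM" case, a minimal sketch via the llama-cpp-python bindings - the file name, quantization level, and layer count are made up for illustration:

    from llama_cpp import Llama

    # Hypothetical local GGUF path; Q4_K_M and the layer count are illustrative.
    llm = Llama(
        model_path="./models/llama-70b.Q4_K_M.gguf",
        n_gpu_layers=40,  # offload some layers to VRAM, keep the rest in system RAM
        n_ctx=4096,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])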
I said "at least isn't aware of vLLM" because you can take it even further than this (like Cloudflare, Amazon, Mistral, Phind, Databricks, etc) and use something like TensorRT-LLM with Triton Inference Server which kicks performance and production suitability up yet another couple of notches.
It's a right tool for the job kind of thing.
At the risk of sounding elitist, I have no idea how a dozen total tokens/s on CPU (or whatever) is going to be acceptable to users.
Especially in the case of the original scenario (an AI startup) - if you go into a highly competitive and crowded space with Ollama (CPU or not), you're going to get beaten up by people deploying with solutions that are fundamentally, drastically better.
All of this said I have no idea what you mean by "small-scale-but-still-production" and no idea of your users or use case(s). I suppose there's always a chance llama.cpp on CPU could be fine in some cases. I just can't possibly imagine what they would be but that could just be my own experience and bias talking.
I’m working on an internal tool. Maybe 30-40 “customers” total. I say it’s production because it has to be reliable.
We just don’t want to rent a GPU for this little thing. It draws up reports once a day, so it’s okay if it takes a couple mins. It’s work that took a single person maybe 2 hours to do before.
I’ll need to look into Triton; I haven’t heard of that yet!
If you have any resources for running models in production that you’d be willing to share, I’d appreciate them.
What did they do to support WizardLM 2? It seems to work with an earlier llama.cpp version. (I have an app in production that uses a llama.cpp version from before the WizardLM 2 release.)
I just checked: there's exactly one user who has contributed (only typo fixes) to both ollama and llama.cpp according to GitHub's contributors graphs.
Ah, thanks for this! I can't edit my parent comment that you replied to any longer unfortunately.
As I said, I only compared the contributors graphs [0] and checked for overlaps. But those apparently only go back about a year and only list at most 100 contributors, ranked by number of commits.
This is just like Tanenbaum getting mad that nobody credited him for the Intel Management Engine (which he feels makes him the posthumous victor in the Linux/Minix debates).
Bro, you shouldn’t have chosen a non-attribution license if you wanted to be attributed.
Just like Tanenbaum - if you wanted your ego stroked, that’s attribution in this context.
Er. Minix is under a BSD license, which does require attribution. Also, my distant memory is that Tanenbaum wasn't even mad about Intel not fulfilling the terms of the license, but I may be misremembering.
Can you demonstrate that there is not an attribution in the Intel Management Engine documentation? ;)
Sneaky or not, that's the license Tanenbaum chose, and he has to live with it. Same deal here.
Anyway no, Tanenbaum isn't mad, per se, or at least not at Intel. He's sniping back at Linus Torvalds (remember the Torvalds-Tanenbaum debates? it was a thing) about how he was right after all about Minix being the most widely used OS in the world. It's not anger, it's gloating - it's not even really a letter that's meant for Intel at all.
And again, that is the point of the entire BSD/MIT vs GPL debate - which went completely over Tanenbaum's head. BSD/MIT provides maximum freedom to the developer... sometimes including the freedom to deny freedoms to the user. He is critiquing Intel (obliquely) for doing the specific thing that makes this license desirable to these customers, and the specific thing Torvalds argued against.
Like, it's a gloat about how his OS is more popular, but it also backhandedly shows why Torvalds was right. And the same is true here. Want attribution? Choose a license that requires it.
Yeah, I know they have attribution now, but I was fairly confident that they added that because someone called them out on shipping it without attribution before.
Via twitter
"If you're after some weekend reading, I have now added 8 deep dives about David Braben's epic Lander, the world's first game for the ARM platform. Landscape generation, 3D objects, particle physics and memory maps... it's all here.
More articles soon... "
via twitter : "Just published a big update on the PS2 article. In there, you will now find the history behind its MIPS CPU and updated information about the PS2's OS and the subsequent Homebrew scene." https://twitter.com/Flipacholas/status/1758940698466832574
"Parts 2 and 3 will be released over time.... prob one a week or something - if I remember :)
Patreons get access to all locked articles."