Hacker News | rileyphone's comments

Been working on an autoencoder that converts the hidden states of transformer models into a spatial representation that can be visualized. Started more on the toy scale but now I'm trying to scale it beyond my humble 3060. Using LLMs to help with torch and such but they are limited in the details of tensor twiddling.

https://github.com/ristew/weightscan


Gemini is probably using ring attention. But scaling to that size requires more engineering effort on the interconnect side, which goes beyond the purpose of this release from Mistral.


According to the paper, the dataset goes up to 1978 because that's when copyright law was updated to automatically apply to newswires. It's unfortunate that we got into the situation where academia has to play by the rules wrt copyright while big private labs flaunt it.


They don't have to play by the rules either. They can also be sued by the NYTimes.


Justice at the service of whoever has the bigger wallet. :/


This is, unfortunately, how our (US) justice system works.

The entire concept of an "NDA" has been bastardized in this manner, if you think about it. Conceptually you may think of an NDA as protecting sensitive data from disclosure, a sort of intellectual property right. However, it's been co-opted by folks to do nothing more than cover up inconvenient truths because they realize most people cannot afford to either (a) give up money they are promised in the future or (b) bankrupt themselves in their own defense.

So it's basically a game where whoever has the most money can ensure their narrative wins out in the end, because competing narratives can simply be "bought out".


> They don't have to play by the rules either. They can also be sued by the NYTimes.

I understand a lot of other outlets have been folding and making deals with OpenAI, because they're too weak to sue and desperate for the revenue.

Which if true is really sad. It's like taking out a payday loan: solve your short-term problem by giving yourself a bigger one in the future.


*flaut


*flout


*flute

:-)


In that case there are two attractors: one towards the Golden Gate Bridge and one towards the harmless, helpful, honest assistant persona. Techniques like this probably get weirder results with model scale, but there's no reason to think they get wiped out.


What if the Golden Gate Bridge is Mein Kampf or something like that?


In this case Azure is responsible for the datacenters, billing, and support.


There's also the oceanic currents carrying warm water from the gulf. Europe is in a very lucky position.


Not really because the ordering is unambiguous given the parens. No need for operator precedence rules like PEMDAS.


Oh, you still have to worry about precedence. Consider:

    var x = 123
    print(-x.abs())

Or even:

    print(-123.abs())

What do those print? Do they print the same thing?


Unary operators are still operators. The integer parsing rules are probably different. In Lisp, -x would be a symbol, and the proper analog to -123 would be, e.g., (- x).


I think we're in agreement. `-` and `.` are both operators and the language and user have to understand the relative precedence of them.
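
To make the precedence point concrete, here is a minimal Python sketch. Python ints have no `.abs()` method, so `__abs__()` stands in for it here (an assumption for illustration; the language in the snippet above is not Python). In Python, attribute access and calls bind tighter than unary minus, and `ast` can show the resulting parse tree without running the expression.

```python
import ast

x = 123

# Attribute access and calls bind tighter than unary minus in Python,
# so this is parsed as -(x.__abs__()).
unparenthesized = -x.__abs__()

# Parentheses force the negation to happen first.
parenthesized = (-x).__abs__()

print(unparenthesized, parenthesized)  # -123 123

# Only parse the original-style expression; it is never evaluated, so it
# does not matter that Python ints lack an .abs() method.
tree = ast.parse("-x.abs()", mode="eval").body

# The outermost node is the unary minus and its operand is the method call.
print(type(tree).__name__, type(tree.operand).__name__)  # UnaryOp Call
```

Other languages may choose differently for the literal case, which is why `-123.abs()` is worth checking separately in whatever language you use.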


The bigger size is probably from the bigger vocabulary in the tokenizer. But most people are running this model quantized at least to 8 bits, and still reasonably down to 3-4 bpw.


> The bigger size is probably from the bigger vocabulary in the tokenizer.

How does that affect anything? It still uses 16 bit floats in the model doesn't it?
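
A rough sketch of the arithmetic, under stated assumptions: a larger vocabulary adds parameters (rows in the embedding and output matrices) and does not change the precision of each weight. The figures below (vocabularies of 32,000 and 128,256 tokens, hidden size 4096, untied input/output embeddings, roughly 8 billion total parameters) are illustrative assumptions matching the published Llama 2 7B and Llama 3 8B configurations; the thread itself does not state them.

```python
def embedding_params(vocab_size: int, hidden_dim: int, tied: bool = False) -> int:
    """Parameters in the token embedding plus the output projection.

    With untied weights there are two vocab_size x hidden_dim matrices.
    """
    tables = 1 if tied else 2
    return vocab_size * hidden_dim * tables


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, ignoring quantization overhead."""
    return n_params * bits_per_weight / 8


# Assumed vocabulary sizes and hidden dimension (see lead-in).
small_vocab = embedding_params(32_000, 4096)
large_vocab = embedding_params(128_256, 4096)
extra = large_vocab - small_vocab
print(f"extra parameters from the larger vocabulary: {extra:,}")

# Assumed total parameter count of about 8 billion.
total_params = 8_000_000_000
for bpw in (16, 8, 4, 3):
    gib = weight_bytes(total_params, bpw) / 2**30
    print(f"{bpw:>2} bpw: about {gib:.1f} GiB")
```

Under these assumptions the vocabulary accounts for several hundred million extra parameters at the same 16-bit precision, and quantization then shrinks the whole model roughly in proportion to bits per weight.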


To be fair, the Llama 2 instruction tuning was notably bad.


I see it more as an indirect signal for how good Llama 3 8B can get after proper fine-tuning by the community.


$10 a month?



So 10 a day? I do way more than 10 searches a day.


Or unlimited for $10/mo or $108/yr.


If you're in Silicon Valley and spend most of your day on the Internet, $10 for superb search is a no-brainer.

If you're a student / postgrad somewhere in the US, it still likely feels worth the price, even though $10 is no longer too small to notice.

If you are, say, somewhere in Thailand, or Uzbekistan, or Botswana, the $10/mo becomes unironically noticeable: not prohibitive, but you really want to get a lot of value in exchange.

And basically anywhere in the world, if you're a kid, and mom and dad would not buy Kagi for you, you're out of luck.

