Been working on an autoencoder that converts the hidden states of transformer models into a spatial representation that can be visualized. I started at toy scale, but now I'm trying to scale it beyond my humble 3060. I've been using LLMs to help with torch and such, but they're limited when it comes to the details of tensor twiddling.
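The rough shape of the idea, as a minimal sketch (the layer sizes, bottleneck dimension, and MSE objective here are illustrative stand-ins, not the actual weightscan architecture):

    import torch
    import torch.nn as nn

    class HiddenStateAutoencoder(nn.Module):
        # Compress transformer hidden states into a low-dim "spatial" code
        def __init__(self, d_model=768, d_space=3):
            super().__init__()
            # Bottleneck = plottable coordinates (2D/3D for visualization)
            self.encoder = nn.Sequential(
                nn.Linear(d_model, 256), nn.GELU(),
                nn.Linear(256, d_space),
            )
            self.decoder = nn.Sequential(
                nn.Linear(d_space, 256), nn.GELU(),
                nn.Linear(256, d_model),
            )

        def forward(self, h):
            z = self.encoder(h)             # (batch, seq, d_space)
            return z, self.decoder(z)

    # Train on hidden states captured from a forward pass of the model
    ae = HiddenStateAutoencoder()
    h = torch.randn(4, 128, 768)            # stand-in for real hidden states
    z, h_hat = ae(h)
    loss = nn.functional.mse_loss(h_hat, h)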
Gemini is probably using ring attention. But scaling to that size requires engineering effort around the interconnect that goes beyond the purpose of this release from Mistral.
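For anyone unfamiliar: in ring attention each device keeps its query block while key/value blocks rotate around a ring, and each hop folds into a flash-attention-style online softmax, so no device ever materializes the full attention matrix. A single-process toy simulation of the arithmetic (not anyone's production code):

    import torch

    def ring_attention_sim(q_blocks, k_blocks, v_blocks, scale):
        # Each "device" i holds one query block; KV blocks arrive one hop
        # at a time, updating a running online-softmax accumulator.
        n = len(q_blocks)
        outputs = []
        for i, q in enumerate(q_blocks):                  # q: (Lq, d)
            m = torch.full((q.shape[0],), float("-inf"))  # running row max
            l = torch.zeros(q.shape[0])                   # running denominator
            o = torch.zeros_like(q)                       # running numerator
            for step in range(n):
                j = (i + step) % n                        # KV block this hop
                s = q @ k_blocks[j].T * scale             # local scores
                m_new = torch.maximum(m, s.max(dim=-1).values)
                p = torch.exp(s - m_new[:, None])
                o = o * torch.exp(m - m_new)[:, None] + p @ v_blocks[j]
                l = l * torch.exp(m - m_new) + p.sum(dim=-1)
                m = m_new
            outputs.append(o / l[:, None])
        return torch.cat(outputs)

    # Sanity check against full (non-causal) attention
    d = 16
    qs, ks, vs = ([torch.randn(8, d) for _ in range(4)] for _ in range(3))
    full = torch.softmax(torch.cat(qs) @ torch.cat(ks).T * d**-0.5, -1) @ torch.cat(vs)
    assert torch.allclose(ring_attention_sim(qs, ks, vs, d**-0.5), full, atol=1e-4)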
According to the paper, the dataset goes up to 1978 because that's when copyright law was updated to automatically apply to newswires. It's unfortunate that we've ended up in a situation where academia has to play by the rules wrt copyright while big private labs flout them.
This is, unfortunately, how our (US) justice system works.
The entire concept of an "NDA" has been bastardized in this manner, if you think about it. Conceptually you may think of an NDA as protecting sensitive data from disclosure, a sort of intellectual property right. However, it's been co-opted by folks to do nothing more than cover up inconvenient truths because they realize most people cannot afford to either (a) give up money they are promised in the future or (b) bankrupt themselves in their own defense.
So it's basically a game where whoever has the most money can ensure their narrative wins out in the end, because competing narratives can simply be "bought out".
In that case there are two attractors - one towards the Golden Gate Bridge and one towards the harmless, helpful, honest assistant persona. Techniques like this probably produce weirder results with model scale, but there's no reason to think the effect gets wiped out.
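For reference, this style of steering boils down to adding a feature direction into the residual stream mid-forward-pass. A minimal sketch with a PyTorch forward hook; the layer index, coefficient, and the direction vector itself are placeholder assumptions (in practice the direction comes from something like an SAE feature or a difference of means over contrastive prompts):

    import torch

    def make_steering_hook(direction, coeff):
        # Add a fixed unit-norm feature direction to the layer's output
        direction = direction / direction.norm()
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden + coeff * direction.to(hidden.dtype)
            return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
        return hook

    # Usage, assuming a HuggingFace-style decoder stack (hypothetical names):
    # handle = model.model.layers[20].register_forward_hook(
    #     make_steering_hook(bridge_direction, coeff=8.0))
    # ... generate, and outputs drift toward the steered concept ...
    # handle.remove()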
Unary operators are still operators; the integer parsing rules are just different. In Lisp, for example, -x is read as a symbol while -123 is read as a negative integer literal, so negating x has to be written (- x).
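For comparison, Python keeps unary minus as an operator even in front of literals - you can see it in the AST (folding -123 into a constant only happens later, in the bytecode compiler):

    import ast

    # Both parse as a unary minus applied to an operand
    print(ast.dump(ast.parse("-x", mode="eval").body))
    # UnaryOp(op=USub(), operand=Name(id='x', ctx=Load()))
    print(ast.dump(ast.parse("-123", mode="eval").body))
    # UnaryOp(op=USub(), operand=Constant(value=123))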
The bigger size probably comes from the bigger vocabulary in the tokenizer. But most people are running this model quantized to at least 8 bits, and it still holds up reasonably down to 3-4 bpw.
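Back-of-the-envelope, with made-up numbers, to show how the two effects combine - the embedding and unembedding tables grow linearly with vocab size, and quantization scales the whole footprint by bytes per weight:

    # All numbers hypothetical; assumes untied input embeddings + lm_head
    hidden_dim = 4096
    other_params = 7e9                      # non-embedding parameters

    for vocab in (32_000, 128_000):
        embed = 2 * vocab * hidden_dim      # embeddings + unembedding
        total = other_params + embed
        for bpw in (16, 8, 4):
            gb = total * bpw / 8 / 1e9
            print(f"vocab={vocab:>7} bpw={bpw:>2} -> {gb:5.1f} GB")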
If you're in Silicon Valley and spend most of your day on the Internet, $10 for superb search is a no-brainer.
If you're a student or postgrad somewhere in the US, it still likely feels worth the price, even though $10 is already past the threshold of observability.
If you are, say, somewhere in Thailand, or Uzbekistan, or Botswana, the $10/mo becomes unironically noticeable: not prohibitive, but you really want to get a lot of value in exchange.
And basically anywhere in the world, if you're a kid and mom and dad won't buy Kagi for you, you're out of luck.
https://github.com/ristew/weightscan