
IME, this is just about the opposite of true.

I recently did a deep dive of an (allegedly) human-curated selection of 40K blogs containing 600K posts. I got the list from Kagi’s Small Web Index [1]. I haven’t published anything about it yet, but the takeaway is that nostalgia for the IndieWeb is largely misplaced.

The overwhelming majority of it was 2010s-era “content marketing” SEO slop.

The next largest slice was esoteric nostalgia content. Like, “Look at these antique toys/books/movies/etc!”. You’d be shocked at the volume of this still being written by retirees on Blogger (no shade, it’s good to have a hobby, but goddamn there are a lot of you).

The slice of “things an average person might plausibly care to look at” was vanishingly small.

There are no spam filters, mods, or ways to report abuse when you run your content mill on your own domain.

Like you, I was somewhat surprised by this result. I have to assume this is little more than a marketing ploy by Kagi to turn content producers who want clicks into Kagi customers. That list is not suited for any other purpose I can discern.

[1] https://github.com/kagisearch/smallweb





I once spent half a day or so gathering RSS feeds from Fortune 500 companies' press releases. I expected it to be mostly bullshit but was pleasantly surprised. Apparently if one spends enough millions on doing something, there is no room for bullshit in the publication.

Pleasantly surprised? I would think that these feeds would consist of product and company announcements and this would be the expected/appropriate content. Did you find something less sterile?

To convince you I would really need to rebuild this thing and show it side by side with blogs and news outlets.

To give one example, at the time I gathered the F500 feeds, my other feed sets (tens of thousands of feeds) suffered horribly from the echo chamber effect. I had endless headlines announcing that David Bowie died. Something like half of them. I don't find that a particularly interesting topic. I have my own memories of his work and have little need for more. Perhaps you would like to read 20 of those? Surely not thousands? It isn't that it isn't a noteworthy event, but it drowns out everything else.

Meanwhile Walmart is talking about donating returned Lego to charities.

Exxon is talking about a giant ammonia deal to make carbon-neutral hydrogen

and GM talks about their next generation software platform to help bring long term continuous innovation to customers through over-the-air updates.

They apparently have tons of diagnostic data and are looking to make it more practical complete with remote tuning for old clunkers.

This to me is quite a lot more interesting than, say, CNBC talking about war with Iran, whether bitcoin will survive, and that Tesla stock is down???? I really needed more articles about those topics? I thought I already got a million of those. No way in hell I will open those links.


Do you intend to write it up? It would be interesting to get your take on how the classification works. And personally, as I know my feed is on the index as well, I'd like to know which category my writing would be sorted into.

Probably not. I lost interest when I figured out how poor the dataset is.

Fair enough, I've added your site to my reader just in case you change your mind ;). The mechanics of your process actually sound more interesting than the results.

Wasn't their workflow just that anyone can modify their feed list on github? Or was there a postprocessed list?

Probably a part of it, but I did not submit my site there directly at least. I think they used different sources to seed the list. There was a very active thread on HN about two or three years ago where a lot of folks shared their personal websites, and from that several OPML lists were compiled. I noticed quite an uptick in feed subscribers around that time.

By the way, Henry Desroches, the author of the article we are discussing in this thread, also maintains a web directory, personalsit.es, where I had submitted my site as well, so that might have been another source for the Kagi Small Web Index.



