Private and Public Mastodon (tbray.org)
240 points by AndrewDucker on Jan 2, 2023 | 181 comments


It's also unethical to make promises to users, like privacy for posts published to a semipublic social network, that software can't possibly keep.

It's going to be interesting watching the norms of the fediverse collide with the expectations of ordinary software users. Mastodon and ActivityPub have a lot of promise as an alternative to centralized social networks, and there are good reasons to use them even if you don't buy into the ethos of the fediverse. If Mastodon is going anywhere, those users will soon drastically outnumber the keepers of the fediverse norms.


On the one hand it's bizarre from a technical standpoint to insist that Mastodon can't search or quote-post, when the software is federated and open source (so I can change my own instance to enable search and quote-posts), and when an API is available (so I can roll my own search index without even running my own instance).

If you don't want your stuff archived and indexed by strangers you don't trust, then don't post your stuff in public to a federated network operated by strangers you don't trust.

On the other hand it's not really a technical problem, it's a social problem, contrary to everyone who will try to solve E2E federated group encryption in this thread.

It's a social problem because if I fork the project to enable search and quote-posts, any instances running that fork will get defederated by the community that doesn't want search and quote-posts.

Maybe the Eternal September of new people will want these features, and a fork will become more popular than the original branch. The people will decide what they want and vote with their instances.

As for me, I want self-search, so I built my own full-text search service that indexes everything I post and everything I tag as interesting. I've opened it up to a few friends, but we won't publicize it since it would be a pretty unpopular thing at the moment.
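For anyone curious what such a self-search service might look like, here's a minimal sketch using SQLite's FTS5 extension. The URLs and post bodies below are made up for illustration; a real indexer would page through the public Mastodon API (e.g. `GET /api/v1/accounts/:id/statuses`) and feed each status in as it arrives.

```python
# Minimal personal full-text index for Mastodon posts using SQLite FTS5.
# Sample data stands in for statuses fetched from the API.
import re
import sqlite3

def strip_html(html):
    """Mastodon statuses arrive as HTML; keep only the text content."""
    return re.sub(r"<[^>]+>", " ", html)

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE posts USING fts5(url, content)")

# In a real indexer these would come from paging through the API.
sample = [
    {"url": "https://example.social/@me/1",
     "content": "<p>Notes on ActivityPub federation</p>"},
    {"url": "https://example.social/@me/2",
     "content": "<p>Recipe for sourdough bread</p>"},
]
for post in sample:
    db.execute("INSERT INTO posts VALUES (?, ?)",
               (post["url"], strip_html(post["content"])))

# Full-text query: which of my posts mention "federation"?
hits = [row[0] for row in
        db.execute("SELECT url FROM posts WHERE posts MATCH 'federation'")]
print(hits)  # → ['https://example.social/@me/1']
```

The same table can hold followed accounts' public posts and arbitrary tags; "following" a search term is then just re-running a saved MATCH query against new rows.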


[Edit: I misunderstood what the parent comment meant. My apologies. My original comment's text is left intact below, but it is inherently misdirected because of the misunderstanding. It still can be, of course, argued that a proposed solution is not realistically attainable - it's quite a tectonic shift in the collective psyche. But the parent is correct.]

I'd disagree. It's fundamentally a [techno]logical problem - a flawed design that only "works" because it's not happening at scale. The pen is mightier than a sword, and ink lasts longer than memory. Those millennia-old maxims still stand very true.

It's guaranteed to get only "worse" as the system scales up and gains more "bad" actors who don't conform to the expected behavior. And it will never be solved socially (only mended), just as email spam was never socially solved, despite giant collective efforts like RBLs, community spam filters, and IP reputation systems. It only gets shoved under the rug or plastered over, to be revealed again whenever the next drama happens.

I'd be happy to be wrong (if you know that I am, please give me an example), but I think social and legal methods are never the solution; they don't ever fix anything (unless we're talking about humanity radically changing; then, maybe). They provide safeguards that keep honest people from accidentally screwing up, they have chilling effects, and they provide some recourse after trust is broken, but they never fix the core issue.

Sometimes there is no way to change the design. As a possibly silly example: we can't fix theft or murder, so those are social problems, because we don't have the technologies (such as post-scarcity or immortality) to fix them. But the information security here is essentially a math problem, and it can be fixed.

Of course, when that happens is a matter of cost. Chances are no one will ever do it, because it'll be too complicated (= costly) and the social approach will be good enough. Like how the world can't even manage to switch to IPv6. But, again, something like that doesn't really solve anything for real; it merely lowers the frequency of occurrences to a tolerable level.


> but I think that social and legal methods are never the solution

Just to be clear we're talking about indexing + searching what people post on public internet feeds?

IMO there are only social solutions to this problem. Any legal one would probably kill any federated service that can barely compete with centralized ones as it is... or (more likely) not be enforceable in the first place.

There's two social solutions:

1) don't post things you don't want recorded forever on public networks

OR

2) stop treating every mistake humans make in their personal history as a deal breaker for life

The first is a matter of personal responsibility, which is not a popular concept in certain online communities. The second is more rational and practical, but remains a very serious problem in Western culture. Countless people have been 'cancelled' (or attempts have been made) over 10-year-old internet posts that the posters themselves now denounce.

If you want technical solutions use Signal, not a federated Twitter competitor.


> Just to be clear we're talking about indexing + searching what people post on public internet feeds?

Yes. I only brought up the legal angle because there is some movement toward using legal frameworks for privacy ("right to be forgotten" etc.), and because I totally misunderstood what kind of social solutions you were proposing.

> There's two social solutions: [...]

D'oh! Now I see. And I'm honestly sorry, as I completely misunderstood what you meant saying it's a "social problem".

I wholeheartedly agree now - either of those would be an ultimate solution. Not just a solution - I also do believe that this is how things should be.

It can be argued how realistic that is; it's quite a giant change to how society sees things. But I hope, on a grand timescale, it's achievable. Many societies changed drastically during the 20th century, so who knows where we'll end up...


Yeah, agreed, the "right to be forgotten" could be very expensive to enforce legally on a federated internet. It's only feasible when Google is all you have to deal with. Even then it's a big hassle.


Search and quote-post are very different in this regard.

Quote-posting can be a problem because it encourages bad behavior among people who aren't actively malicious. If the most popular Fediverse software doesn't support it, there won't be a widespread quote-post -> pile-on phenomenon, even if a few servers support it without getting widely blocked.

Search becomes a problem when a group of bad actors use it to harass and dox people. It doesn't require widespread adoption, just a small number of bad actors with access to an index. From what I've read about previous organized far-right harassment campaigns, I find it likely that bad actors are already indexing Fediverse posts, but not publicly talking about it. Excluding search features from the most popular software will not help with this. Blocking servers that admit they're indexing posts for search will not help with this.


Hmm? Masto comes with decent self-search now, you don't need to build it.


Yes, Elasticsearch is an option, but many (most?) instances haven't installed it, including mine.

And even when it is installed, it doesn't do what I want.

In addition to my own posts, I want to index the public posts of those I follow, and I want to organize by topics that aren't found in the text of the post, and I want to "follow" search terms similar to how the latest Masto allows following hashtags.

None of this is available with Elasticsearch even if it was enabled on my instance.


> there are good reasons to use them

What is one of them?


The medium is better for expressing thoughts (you can write a decent-sized blog post in a single Mastodon post). It has most of the effective bits of Twitter, and many of the effective bits of the peak, RSS-soaked blogosphere. If the tooling gets better, I can see it being better than the golden age of Google Reader.

It also makes sense that you can stand up your own Mastodon, in the same sense that you could stand up WordPress back in the day. Most people won't, but the fact that you can, and that the ability to do so is baked into the protocol, keeps development interesting. It also makes it more interesting for third-party clients, which Twitter has spent the last decade trying to eradicate. So that's a second (and probably third) reason.

Perhaps ironically, these are also reasons that I think the norms of the "fediverse" aren't going to matter in the long term. I think the software approach is important, and the notion of a single "decentralized social network" is not.


I thought you were giving reasons for "ordinary software users" to go out and use Mastodon in its current form. I was hoping that perhaps their shitshow onboarding UX got updated to accommodate the influx of users over the past few weeks. Nope-- they just changed the "Create Account" link to "Apply for Account" in their hall of instances. :(

But regarding Mastodon as a potential alternative to the current centralized social networks: fair enough. I even saw recently that Ted Unangst made a stripped-down Mastodon server in Go, which I'd like to check out when I get a chance.


Yes: there are reasons ordinary users might prefer Mastodon to Twitter; as I said:

* You can write full posts in Mastodon

* You can own your own data, run your own service, or select from a variety of hosted options

* The ecosystem encourages good third-party clients and tooling

The tooling isn't there yet, but then, I don't think anybody expected the events of the last few months, so I'm inclined to give it a quarter or two. As a casual user, there are ways in which Mastodon is already superior to Twitter, and most of my problems are with the clients; Tapbots can't keep up with the beta demand for Ivory, and there are two or three other promising clients, so I'm optimistic.

I wrote Mastodon off for years, but: it's pretty good? I miss the blogosphere, and Mastodon is a step back towards it.


What about the inherent bubble-y nature of Mastodon?

Do you see it as beneficial to have a series of large, balkanized server groups that are optionally interconnected, rather than one general public network everyone contributes to? I hate the label of "public square", because Twitter should already be opt-in in terms of which content you see, based on who you follow, even though it's one big network (ignoring the @jack-era AI feeds, which fought this idea).

A key difference between Twitter and Mastodon UX is that users have to make different accounts for each "Balkan country" server they join. If one server blacklists another, or is itself blacklisted, its users can't participate in the others. And this has implications well beyond US polarized politics once it reaches global scale.


I perceive this to be the biggest critique people have of Mastodon and I simply don't care about it at all; it just passes directly over my head. My guess is that it's a consequence of people mostly "belonging" to community servers, with defined (and evolving) community norms. Blog hosts didn't have those, and I don't think Mastodon/ActivityPub servers will for long either. It's going to get easier to run your own server, and there are going to be hosting providers that do that work for you; ultimately, "Mastodon instances" (or some pin-compatible equivalent) are going to be commodities, like Blogger blogs.

I'm on a server (infosec.exchange) that I like just fine for now, but I'd expect to be publishing on my own server within a month or so. At that point, what's the "bubble"? It seems like exactly the same situation as RSS, except I'm not depending on Google to keep Reader going.

I'm not bullish on "the fediverse" as a federation of communities with some global norms (like religiously posting alt tags on images) and lots of local norms (like never posting AI art, or things like that). But having actually used it now, I think the underlying technology is a lot smarter than I gave it credit for, and I think it stands a decent chance of being important.


> and there are going to be hosting providers that do that work for you

There are hosting providers doing that work for you, though several of them are overwhelmed too at the moment. There'll be more.

I run my own, and it's not hard but it's certainly still too much hassle for casual users, so I think there's definitely space for more hosting services.

> I'm not bullish on "the fediverse" as a federation of communities with some global norms

Yeah, a lot of these norms will collapse in the face of a broader user base that wants Twitter 2.0 and doesn't care about a lot of the ideological stuff going on with Mastodon and other current Fediverse software.

> But having actually used it now, I think the underlying technology is a lot smarter than I gave it credit for, and I think it stands a decent chance of being important.

Treating ActivityPub with WebFinger as a globally addressable, signed pubsub queuing system is in itself very interesting. You can build so much more on top of that than "just" "Twitter 2.0" or Mastodon. Of course there are some examples of that, like Pixelfed, but I think we're just scratching the surface of reimagining services on top of a layer like that.
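The "globally addressable" part is WebFinger's contribution: a `user@host` handle resolves, via a well-known HTTPS endpoint (RFC 7033), to a JSON descriptor pointing at the actor's ActivityPub document. A small sketch; the handle and the canned response below are just illustrative, and a live lookup would of course fetch the URL over HTTPS:

```python
# Resolve a fediverse handle to its ActivityPub actor URL via WebFinger.
import json
from urllib.parse import urlencode

def webfinger_url(handle):
    """Build the RFC 7033 lookup URL for a user@host handle."""
    user, host = handle.lstrip("@").split("@")
    query = urlencode({"resource": f"acct:{user}@{host}"})
    return f"https://{host}/.well-known/webfinger?{query}"

print(webfinger_url("@gargron@mastodon.social"))
# → https://mastodon.social/.well-known/webfinger?resource=acct%3Agargron%40mastodon.social

# A typical response links the handle to its ActivityPub actor document
# (canned here; a real client would fetch the URL above):
response = json.loads("""{
  "subject": "acct:gargron@mastodon.social",
  "links": [{"rel": "self",
             "type": "application/activity+json",
             "href": "https://mastodon.social/users/Gargron"}]
}""")
actor = next(link["href"] for link in response["links"]
             if link["type"] == "application/activity+json")
print(actor)  # → https://mastodon.social/users/Gargron
```

Everything else (delivery, inboxes, signatures) happens against that actor document, which is what makes the layer generic enough for Pixelfed-style reuse.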


Right, "bubble" is probably a loaded word to use here.

I personally would prefer going back to the "old" internet where we had niche forums. Mastodon provides that while letting forums connect with a wider set of communities.

I guess my only critique is that it's not often pitched this way. But it will become apparent to most people later on.


The "connected communities" thing doesn't resonate with me. To me, the promise of Mastodon is just integrated publishing and RSS-style aggregation, plus a set of conventions that captures most of what people did with Twitter. I think it's a weird quirk of the evolution of Mastodon/ActivityPub that I'm on someone else's server right now; it's a little bit like being on someone's huge group blog. Group blogs make sense, but not at the scale Mastodon communities operate at.

I think that if Mastodon sustains the growth they've seen over the last few months, then a year from now, it's going to be less common for people to primarily publish on communities like journa.host and FediScience, and more common for them to publish from huge Blogger-style community services --- with very few weird norms, much more comparable to 2021-era Twitter than to Mastodon.social --- or their "own" instances, just like everyone does with blogs.


> Group blogs make sense, but not at the scale Mastodon communities operate at.

That's interesting. Like I said, this bit will become more apparent to users as it grows, since in the current form that's all it really is (a few sets of big networks). I think you just sold me on the general idea of Mastodon.

Back to your earlier point, it might really come down to the UX issues around communicating on Mastodon itself, if it's going to pull this off. But that doesn't mean there won't be another service that can in the future. Otherwise, that's the big thing that scared me away, since I'm hyper-critical of UIs.


Human interaction is inherently bubble-y.

Nobody wants to get shouted at all day long by people preaching stuff they don't like, or to feel harassed, so we gather in social groups that are by and large united along at least some axes of interest, and we avoid too much talk of the topics we know are controversial within those groups.

Some of us then sometimes seek broader forums to try to spread our group's ideas. But a lot of people actively avoid those kinds of situations. Many others use different identities in different groups, because the groups don't mesh well.

"Balkanized" servers are only balkanized to the extent people want them to be (yes, admins can ride roughshod over users, but users can vote with their feed and migrate their accounts and follows elsewhere). Most people want to be widely federated. Some want forums where they can say things nobody else wants to listen to, or want to self-isolate (e.g. Truth Social, CounterSocial, and Gab all run, or at least ran, Mastodon, but either don't federate at all and/or are widely blocked).

> A key difference in Twitter vs Mastodon UX is that users would have to make different accounts for each "Balkan country" server they join. If one server blacklists or is blacklisted they won't be able to participate in others. And this has implications well beyond US polarized politics once it reaches global scale.

If you want to participate in things so offensive to so many people that they're mutually blocking each other, then yes, you either have to do that or find an instance that hasn't defederated either side. Or host/rent your own and block nothing beyond what you, yourself, choose.

With Mastodon you have that choice. With Twitter you risk getting outright banned site-wide.


> (yes, admins can ride roughshod over users, but users can vote with their feed and migrate their accounts and follows elsewhere)

This is true when you think of HN users as the prototypical users. But I'm imagining having to explain to my mom why she can't participate in some [X, Y, Z] set of Mastodon communities. Doesn't she already have a Mastodon account? Why does she need accounts for each? Why are her old "tweets", followers, DMs, etc. not available on the new server?

This is a much larger barrier than advanced users can appreciate... maybe not right now but it will increasingly be in the future.

Ultimately, it's very possible that groups of interconnected communities are a superior concept to centralized social networks. But this should be a more obvious part of the Mastodon UX. It's fine if it's properly communicated AND if porting your old tweets/accounts/followers/etc. over to new servers is easy... but even then it's an adoption hazard (people avoid complexity they don't understand).

Currently it's communicated as a Twitter or FB competitor, which may very likely come back to bite them (not to mention the UI sucking).


> But I'm just imagining having to explain to my mom why she can't participate in [X, Y, Z] set of Mastodon communities

"X and Y don't like each other, so they refuse to talk to each other. Think of it like how you can't talk to people on Facebook from Twitter". People siloing themselves is not a new concept for people.

Ironically, the new thing with Mastodon is that you don't need accounts for each of X, Y, Z to talk to people there even if X, Y and Z defederate from each other, as long as you can find "neutral territory". Only for a very small set of instances is this a challenge.

Being used to silos seems to also be behind a lot of the confusion over having to pick a server - people end up fretting over the choice when it matters a lot less than they think. I wish joinmastodon defaulted to just suggesting a random "generic" instance, and hid the rest behind an "advanced users" button.

> Why are her old "tweets", followers, DMs, etc not available on the new server?

Her follows and followers are available on the new server if she moves the account. When I moved to my new instance, most followers had auto-migrated within minutes. It can definitely be made simpler, but given the number of non-technical users I've seen migrate, it's not that big a barrier.

That the old posts are not is, I agree, an issue. It is likely to get solved, as the objections to migrating them along with the rest are minor and stupid. (You can migrate them too, but it requires a friendly admin willing to do a data import, or running your own instance, so for all practical purposes most users can't.)

> (not to mention the UI sucking).

There are multiple UIs competing to improve this, even before considering the Mastodon alternatives that can all talk to Mastodon (e.g. Misskey, Pleroma), so this is improving rapidly. Personally I use the advanced interface, and I think all it takes is some custom CSS for it to be superior to Twitter in most ways (the advanced interface is somewhat like TweetDeck; I run it with custom CSS that makes column widths configurable). Because the API is totally open, there's far more room to build something good here than there is for Twitter, whose UI is fairly awful.


> What about the inherent bubble-y nature of Mastodon?

So, I'd kind of question whether this is a real thing in any important way. In the early days of the Twitter exodus it _did_ look like it was going that way a bit, but in practice most instances quickly sorted out moderation to a reasonable degree. There certainly are largely defederated instances, but they're probably not of much interest to most users.

I would have some concerns about what happens if Twitter goes down for a protracted period and there's a much larger less self-selecting exodus, but for the moment at least fediverse balkanisation feels more like a theoretical problem than a real one which is likely to impact significant numbers of users.

One other potential looming challenge, I suppose, is Tumblr. If that does add ActivityPub support, it'll be, in effect, a very big, extremely poorly moderated, yet culturally very important instance, and how instances treat it could provoke a split.


> A key difference in Twitter vs Mastodon UX is that users would have to make different accounts for each "Balkan country" server they join.

Sure, but this is functionally no different from the days of forums when you'd have an account on many different forums to discuss the topics of interest to those forums. Just like forums, they may all be running the same two or three codebases, but the communities are different. The cool new thing federation adds on top of this is that communities can organically merge and split while retaining a basic level of cohesion.

And since they're all running the same protocol, it's straightforward for clients like Fedilab to make multi-account use easy.


To add on to this, remember that there is a Fediverse plugin for Wordpress as well.


And lots of other popular sites, like Tumblr, are adding Fediverse support as well.


And Drupal


> and the notion of a single "decentralized social network" is not.

I've been running my own Pleroma instance for a month or so, and I'm firmly of the opinion that a single network is actually a Bad Idea. The fact that ActivityPub is federated means it's also prone to abuse from other nodes, similar to email spam, and abuse is just about as simple to execute in the fediverse right now as rolling a new domain name. Small-time casual admins like myself won't be able to keep up with a global abuse firehose; to my knowledge that hasn't really happened yet, but it will sooner or later.

The solution in my mind (and what I've already done) is to use a node whitelist instead of relying on blacklisting. That means smaller networks based foremost on interpersonal trust: a constellation of much smaller but also stronger social networks that reflect actual communities, as opposed to just the internet at large. This is super easy to do with ActivityPub servers, and my prediction is that this is the way forward, although I expect people will also try to write clever abuse filters and fall into the perpetual cat-and-mouse game, since that's what happened with email.
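The whitelist idea boils down to one check on the inbound path. The sketch below is purely illustrative (it is not Pleroma's actual code, and the instance names are made up): an incoming activity is accepted only if its actor lives on an explicitly trusted instance, instead of being rejected only when its instance is on a blocklist.

```python
# Illustrative allowlist federation check (not Pleroma's real implementation):
# drop inbound activities unless the sending instance is explicitly trusted.
from urllib.parse import urlparse

# Hypothetical trusted peers; a real server would load these from config.
ALLOWED_INSTANCES = {"friends.example", "trusted.example"}

def accept_activity(activity):
    """Accept only activities whose actor lives on an allowed instance."""
    host = urlparse(activity["actor"]).hostname
    return host in ALLOWED_INSTANCES

print(accept_activity({"actor": "https://friends.example/users/alice"}))   # → True
print(accept_activity({"actor": "https://spam-domain.example/users/bot"})) # → False
```

The operational tradeoff is the inverse of blocklisting: spam from new domains is rejected by default, at the cost of having to manually approve every new peer.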


I don't trust any "private" settings in any public social networks, for anything remotely serious.

Anything I publish on any social network, I'd be comfortable seeing posted on every wall in foot-tall letters with my real name attached. Were I not comfortable with that, I wouldn't post it.

I consider all "private" or "limited" posting provisions on all social networks to be paper walls at best, one small devops or SWE mistake away from being torn open and becoming visible to the public. Maybe I'm overshooting a bit, but my past experience shows it's not much of an exaggeration. So I never use any "private" features, and I post everything as openly as possible.

Social media is a colander. Expect anything and everything to leak. Embrace being public.

If you want something with more real privacy, as in not likely to leak if all parties involved take reasonable care, use something that was built with privacy in mind, like, IDK, Signal, or maybe a private Telegram chat, or a private Mastodon instance only accessible through a VPN which you personally and competently operate. Of course it's unlikely to hide you from state-level actors, but at least it's more likely to protect you from simple operator blunders.


> Anything I publish on any social network I'd be also comfortable seeing posted on every wall in foot-tall letters, with my real name attached to it.

You misunderstand. It's layered defense / minimizing exposure. Nobody is suggesting it can't be done, but it's not going to be normalized in the current Fediverse, and most potential attackers don't care enough to build their own search engine.

The main reason is quite clearly stated in the article, and I'm not sure if you didn't see it or just don't care because you're not affected.

> And, if you’re vulnerable, attack you, shame you, doxx you, SWAT you, try to kill you.

A lot of the people being targeted by the likes of KF or Libs of TikTok find nice, comfortable corners of the Fediverse because they're not immediately discoverable by someone typing their name into Google.


Sorry, I do understand. I think that the particular fortress of highly private communication on a network designed for publishing and dissemination cannot be defended efficiently. I'd be glad to be proven wrong, but my last 25 years of experience with social networks of all sorts, from early IRC and web chats to what we have today tells me to keep my current security stance. Anything on a social network has a very high chance of a public exposure, one way or another.

For the defense in depth / layered defense to work, the initial assumptions should be different. They should be like Signal's, not like Mastodon's. The whole approach should be paranoid about not letting things seep through by any chance at all. I don't think that such an approach would be seen as productive by most users and developers. So, I view various access control settings in networks like Mastodon as a convenience filter, not as a security mechanism.

> not immediately discoverable

This is actually both important and achievable! If you never post your public identity, and things which easily link you to your public identity, it's pretty hard to doxx you and harass you IRL, even if everything you publish were publicly accessible. If you don't advertise your presence where everybody is looking, you have a good chance of not being harassed online, because the harassers won't try hard enough to infiltrate remote corners of the internet.

A reliable pseudonymous identity is a great mechanism, if you maintain reasonable opsec. I just wouldn't expect a great level of insulation of your content from fellow pseudonymous users, and occasionally even from the general public.


Usually the kinds of people who can direct harassment at you go for the lowest hanging fruit, so any amount of obfuscation or barrier will deter most of them, unless you're a very high profile target (i.e. you're Keffals).

The entire model doesn't even rely on being entirely undiscoverable - the point is that you don't need to step back from public life because a bunch of weirdos like stalking you, it can be as easy as having a publicly visible profile with a lock on it. This prevents following, but unlike Twitter you're not entirely excluded from making public posts that anyone can interact with. You can just do a baseline of filtering when accepting followers and deciding which posts to make public or private. You'd be surprised how many of these stalkers and harassers can't make a profile that passes a basic "vibe check" - they either try too hard at putting together a profile or they don't bother at all.

Fedisearch and related projects are frowned upon because they would make it much more trivial to find people who are not yet targets based on a soundbite and to go through someone's "permanent record" (most implementations do not backfill posts made before the user was initially followed from an account on the same instance).


Yes. I would even argue that privacy expectations should be much weaker than for the big centralized platforms.

Mastodon instance admins have blanket read access to every post and message, even the "private" ones, yet they are not under the scrutiny of GDPR regulators the way, for example, Twitter is. Are you sure those admins take good security measures, like storing backups encrypted, keeping their tech stack minimal and up to date, and using safe configs? Do you trust them not to read and share your conversations?

Regarding Fediverse search, it's even worse: you're supposed to trust the rest of the world not to abuse it by indexing the entire network. Well, that's not how things work. With large-scale systems like this, you should assume that if an event is possible, it will happen very soon, if it hasn't already. Either you impose restrictions or you accept the tradeoffs.


> I can imagine finer-grained exclusions, such as allowing full-text indexing but only for accounts on the same instance, or allowing use for search but no other applications. (No ML model building!)

I think it's unlikely that you can prevent ML model building with a carefully designed license. The most common legal position (though not something that has been tested in court yet) is that training models is sufficiently transformative to count as fair use, and does not require any sort of license to the data.

You can see this in all the state of the art tools that are trained on all the publicly available data that they can scrape, without regard for license: translation (text), GPT-3 (text), Stable Diffusion etc (images), Co-Pilot (code).

For preventing trolling and harassment a licensing approach is an even worse fit, since those are not people who care about respecting licenses.


None of those tools have actually been legally tested, and there is a reason much of this has been done using data sets laundered through academics. The companies behind them know this is not at all a given, and academics make for more sympathetic defendants than billionaires. Transformative use, as one part of a fair-use consideration, is a defense against copyright infringement. The proposal is to first require agreement to a separate license before even being able to access the content. This is an additional layer, which may or may not be enforceable, but it would definitely establish either negligence or intent, and it also brings things like unauthorized access into play. The fact that all of this also includes huge amounts of PII means that in a growing number of jurisdictions, misuse of it will not be protected by any copyright exceptions. It would be an endless battle to stop smaller abusers, but you could definitely stop GPT-3, Stable Diffusion, and Co-Pilot, since they all come out of well-defined legal entities with assets and identifiable humans to go after.


There are jurisdictions (the EU, the UK, Singapore, Japan) with copyright exceptions specifically for text and data mining for AI purposes.

https://www.twobirds.com/en/insights/2021/singapore/coming-u...


That's interesting, thanks. I wasn't aware of the Singapore one. It seems to be the broadest, but based on the linked page, it's not clear to me how it would come down here. It requires legal access to the material first, but also says you can't contractually override the copyright exception. I don't know how they would weigh it if access is only granted on the basis of that contract (rather than the contract being a small part of a broader agreement).

For the case of the EU, based on the way the GDPR and related digital laws are drafted very much in a "spirit of the law" and with individual rights and agency trumping corporate interests, it seems fairly likely it would not just allow coopting personal social media content over the explicit wishes of the creators, regardless of whether any access was deemed legal. For the moment, I think the UK digital laws are still basically just copies of the EU ones as well (with some search and replace) but I guess they'll drift apart over time if there's no de-brexit.

It's also worth noting, especially given how it's written about in the linked post, that these exceptions generally assume the use does not destroy the market for the original works, and they predate all the art generation. The specific implementations everyone is talking about, especially in the art space, have now pretty definitively proven that we're in a new reality.


Very good points. Thanks for your thoughts.


What would be the ramifications of one of those entities releasing their model as a torrent at the first sign of legal trouble?


Wait until you hear about how followers only posts actually work. An analogy would be Microsoft can't figure out how to get email addressing to work in Outlook, so they send every email to every server, and then Exchange does some magic filtering and tells Outlook which of the emails in your inbox should be visible. Then somebody writes an alternative SMTP server that allows viewing of these hidden messages, and Microsoft sues them.


Do not mistake a convenience feature for a security feature.

Follower-only posts are not about hiding something from prying eyes, but about removing noise and clutter from those who don't care about certain topics.


If you enable follow requests, and trust the admins of your followers, does that work as a privacy feature? From what I know, such posts do not get federated to unrelated servers, right?


They can. Someone could get your post, then boost it, which distributes it to whoever follows them and so on.

Anyone thinking their post is only seen by their followers when they have federation turned on is grossly misinformed.


I was under the impression that boosting follower-only posts does not work (via the API too, actually; it returns an HTTP error code if you try to do so), but then again the server code may be modified, or people can just screenshot.

I think this is simply a social problem — when sending a post to your followers, you have to trust that they do not share your post. The same applies in private messaging. You have to trust the recipients.


I agree with you when it comes to private messages, and anyone sharing those is a jerk.

I don't agree in this case where the system is designed to spread those posts. Fair enough if you don't want that to happen, in which case don't use a federated system where that is a design goal.


Hmm, to my knowledge follower-only posts are an ActivityPub feature where you shovel the message into the inboxes of just your followers. It isn't sent to all servers, is it?

It's also possible to send messages to a subset of your followers; some instances, like qoto.org, support circles. You make circles from your followers and post to just them.

I wouldn't call these privacy features so much as the sender's ability to choose what to say to a certain group of followers.
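That addressing model is visible in the raw Activity JSON. A minimal sketch of the two cases, with a made-up actor and URLs (field names follow the ActivityStreams vocabulary):

```python
# Special ActivityStreams collection meaning "anyone".
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

# Follower-only post: addressed to the actor's followers collection,
# so servers should deliver it only to followers' inboxes.
followers_only_note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.social/users/alice",  # hypothetical actor
    "to": ["https://example.social/users/alice/followers"],
    "object": {"type": "Note", "content": "visible to followers only"},
}

# Public post: the Public collection appears among the recipients.
public_note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.social/users/alice",
    "to": [PUBLIC],
    "cc": ["https://example.social/users/alice/followers"],
    "object": {"type": "Note", "content": "visible to the whole fediverse"},
}
```

Nothing in either document is encrypted; the difference is purely in which recipients are listed, which is why delivery scope depends entirely on receiving servers behaving well.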


It starts getting weird when you reply to those posts.


I believe followers-only posts are sent to all servers with at least one follower.


That is a given; how else could it work?

This is pub/sub, not pull-based, so every time you release something it is pushed to the subscribers: your followers' servers.

I would like a bit of pull-based behavior as well, but ActivityPub is not built for it.
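The fan-out described here can be sketched concretely: given the actor documents of your followers, collect one inbox per server (preferring the server's shared inbox) so a post is pushed once to each server with at least one follower. A sketch, with made-up actor data:

```python
def delivery_inboxes(followers):
    """Collect the set of inboxes to push a follower-only post to:
    prefer a server-wide sharedInbox, fall back to the personal inbox,
    and deduplicate so each server is contacted only once."""
    inboxes = set()
    for actor in followers:
        shared = actor.get("endpoints", {}).get("sharedInbox")
        inboxes.add(shared or actor["inbox"])
    return inboxes

# Hypothetical follower actor documents (two on a.example, one on b.example).
followers = [
    {"inbox": "https://a.example/users/bob/inbox",
     "endpoints": {"sharedInbox": "https://a.example/inbox"}},
    {"inbox": "https://a.example/users/carol/inbox",
     "endpoints": {"sharedInbox": "https://a.example/inbox"}},
    {"inbox": "https://b.example/users/dan/inbox"},  # no shared inbox
]
```

With this data the post would be delivered exactly twice: once to a.example's shared inbox and once to dan's personal inbox on b.example.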

Even though it's pushed to a server, doesn't mean it goes to everyone in that server.


Even though it's pushed to a server, doesn't mean it goes to everyone in that server.

Right, the issue being raised is that there's nothing preventing it from going to everyone on the server except the server being nice about it.


How is this different from let's say email?

It's in the specs what servers should do when they receive a message. If someone sends a message addressed to just these people, the server delivers it to just those people.

Of course if you send something from foo@gmail.com to bar@yahoo.com the server at yahoo.com decides who gets it, hopefully just bar.

In fact the email analogy is, to my mind, the aptest and easiest to explain, including the fact that it's not E2E encrypted. It's totally up to the servers who gets your messages once you send them.


A better ideal might be Matrix -- consider a private E2E room, with all your recipients in it, locked down so only you can post messages to it.

Stuff you post will only be visible to the set of users you want it to be visible to. Writing software to present those posts in a suitable UI is left as an exercise to the reader. And it's a bit late to start now, but I really wish we'd been able to start our fediverse journey with something that's at least supposed to be secure, and then opened up the stuff that's intended to be open.


Absolutely, I support E2E and signatures. It's funny that there has been nothing new since GPG: decentralized identity with E2E encryption and verifiable signatures. But the original poster was questioning the delivery mechanism, and given that there is no E2E, it already works as efficiently as pub/sub can.

I don't want to start discussing what the Fediverse should be, but I have had this discussion in the past; a few points:

- ActivityPub has two kinds of signatures: HTTP signatures, which are ephemeral, and more persistent JSON-LD object signatures. They are contentious. Some servers disable JSON-LD object signatures, because they make it possible to prove cryptographically that someone posted something, so deleted posts become a "liability".

- E2E encryption will have similar ramifications, because not all servers agree that crypto should be used.

Currently the only way to do "privacy" is to deliver messages to just the people you want and hope the servers won't pass them on to third parties, as with email. That's what Mastodon does.


> How is this different from let's say email?

E-mail has decent brand separation between the protocol and providers. People don’t talk about joining .social, they talk about Mastodon.


People also talk about "sending email", not "sending gmail". Brand separation naturally emerges over time.


> People also talk about "sending email", not "sending gmail". Brand separation naturally emerges over time

Sure. But users are cognizant they're using Gmail. And when something goes wrong, they're halfway decent at attributing it to their or the recipient's provider. Mastodon servers are eclipsed by Mastodon per se. It's closer to AOL than e-mail.


AFAIK, it doesn't send it to every server -- it sends it to every server with at least one recipient. Exactly how email works. Anyone can in fact write an alternative SMTP server that delivers every mail to every account.


I suspect that the poster you're replying to is confusing this with public posts, which ActivityPub, unlike email, has. When the Public namespace is encountered as a recipient of an Activity, Mastodon indeed iterates over all the instances it is connected to and pushes the Activity to all of them.

But for follower-only posting you are right: Activities go only to the recipients, theoretically to each individual inbox, practically to the instance's shared inbox.
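That distinction (iterate over every known instance when the Public collection is addressed, deliver only to listed recipients otherwise) comes down to a membership check on the Activity's recipient fields. A rough sketch:

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

def is_public(activity):
    """An activity is public if the ActivityStreams Public collection
    appears anywhere among its recipient fields. Recipient fields may
    hold either a single string or a list of strings."""
    recipients = []
    for field in ("to", "cc", "bto", "bcc", "audience"):
        value = activity.get(field, [])
        recipients.extend([value] if isinstance(value, str) else value)
    return PUBLIC in recipients

# Public: fan out to every connected instance.
assert is_public({"to": [PUBLIC], "cc": ["https://example.social/users/alice/followers"]})
# Follower-only: deliver only to the listed recipients.
assert not is_public({"to": ["https://example.social/users/alice/followers"]})
```

The field names are from the ActivityStreams vocabulary; the branching on the result (fan out vs. targeted delivery) is the server implementation's choice, which is exactly why recipients have to trust the receiving server.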


Author of Mastinator here, which was linked in the article. It's also back online, so if you want to start testing your implementation it should work for you. I will be adding more functionality as I go.

While search is a problem, I think the bigger problem with the fediverse is expectations. People expect that their "follower-only" posts are shown only to their followers. But you can get them anyway through so many other channels that it's literally ludicrous to expect anything you post to be delivered only to your followers. Boosting posts within Mastodon transmits them to federated servers, and this is by design.

My point has always been that if you play in a federated system, where you literally distribute your posts to anyone who requests them, you have to accept that once a post leaves your server you have lost all control. To then complain that someone is getting that post is naive. That's the federated system working as intended.

If you don't want it public, don't distribute it via a public system.

I don't know if people were told otherwise, but they seem to be under the impression that their content is fully under their control in the fediverse, when that is far from the case. Deletes and edits of content are a "should" in the spec: yes, other servers should honor them, but they are under no obligation to do so.

The only reaction the fediverse has to anyone it dislikes is to block, and in some cases to harass people until they leave. That won't work long term, and it isn't working now. Despite being blocked by hundreds of instances, Mastinator is still getting thousands of posts an hour delivered to it, including posts from instances that blocked it, because someone federated with those instances has not blocked Mastinator and boosted the post.

The current "elders" of the fediverse want to keep their behaviors and cultures, which is fine. They should do so by running their own allow-listed federation, not accepting follow requests, or de-federating entirely. They should not be bullying others into accepting their behavior. Not only does that not work now, it's really not going to work once Tumblr, Flickr and others start integrating: they will have enough people that they will be the majority, and they will make the rules.


This quote from Eugen is about CWs, but I think it applies here too:

“… It's a decentralized network that doesn't belong to any one party, so by definition there is no single culture on it. Different corners have different expectations and customs.” - https://mastodon.social/@Gargron/109323056922301691


> People should be able to converse without their every word landing on a permanent global un-erasable indexed public record. Call me crazy.

Sure, and they should use Signal instead of publishing their conversations and then getting mad when they turn out to be publicly available.


This is my position too. What I'm looking for in a twitter alternative is something public that I can post to the entire world, and search. If I want to post to my friends I already use Signal and vastly prefer that to something like Mastodon.

Then add the extra complexity of follower-only posts and instances, and I just don't understand who this service is aimed at other than the people who created it. Hence, to me, it is and will always be some niche thing (that isn't terribly interesting to me).

All that said I still read these threads because there needs to be a good twitter alternative.


I do think I should be able to post something publicly and not have it indexed by search engines. If someone screencaps it or copy/pastes it fine but the original should live on the dark web.

This is one of those things that tech nerds think is pointless when in reality opting out of search indexing drops your visibility to nothing, which is good for making sure your posts don't “break containment.”

Unless you’re a public figure or social media influencer having a post tied to your IRL identity blow up is usually a bad thing.


If it's readable, it's indexable. There's no way around that. I'd say there's no technical way around that, but there's also no realistic legal protection against that - at most you can limit public search engines.


> having a post tied to your IRL identity blow up is usually a bad thing

Well... Then be careful with what you post tied to your real identity. I have separate accounts for separate use cases. My main on Reddit, for example, can (very) easily be doxxed so I hardly post anything controversial on that account.

Through all the recent data leaks it's getting even easier to dox people based on screen names too. I'm even considering getting a PO box next time I move house, just to decrease the chances of my home address ending up in a data dump.


> I do think I should be able to post something publicly and not have it indexed by search engines.

There's a preference in settings, but it just adds your profile to robots.txt. Meaning that if someone boosts your post and doesn't have that configured...


Do you mean deep web?


I think the future of social actually looks more like Signal and Whatsapp groups (maybe Discord too?) than it does Twitter and Facebook. People seem to appreciate and value private discussion more than they used to.

I certainly spend more time interacting with people in small group chats than I do on Facebook now.


I don't think it is reasonable to expect Signal groupchats with more than n participants to stay private, where n is more than 5 and less than 20.

Android malware is a thing and Signal Desktop does not encrypt anything at rest on disk.


Use a private group chat, and it can be decentralized too using matrix!

The fediverse is a public protocol, and indexing public content is fair use (ML training is not indexing; it's very different), so any kind of licensing won't matter.

Discovery is already crap in mastodon and the fediverse in general, please don't stop people from improving it.

It may look like an Eternal September to fediverse natives, but that's what going mainstream is like. Don't like it? Create something with an intentional barrier, like HN's, to stop the masses from joining.

I find the fear about getting doxed and having your words watched and used against you from fediverse natives very funny, because I guarantee you that most of them support cancel culture.

In general I find old fediverse communities too fragile; one example is the over-the-top content warnings (just mute the words and topics you don't like, simple!), and now this.


This idea that you create privacy by leaving the search feature out of the software is silly. Yes, to some extent security through obscurity does work, and trying to maintain an anti-sharing culture might reduce the spread of your information. But is that really what you want to rely on?

The other bad pseudo-privacy idea is time-limited posts ("stories" or snapchat or whatever).

In both cases, you're crippling the software to add an illusory safeguard, which doesn't actually stop bad actors from having access to your posts and hence the ability to record and rebroadcast them.

You know who got the system right? Facebook. Private by default, but the user can decide exactly who can see each of their digital objects. It's easy to define groups of friends, allow sharing to friends-of-friends, one person, the whole world, whatever. The concept of friends (bilateral agreement to share information) makes way more sense than this "follow" thing.

I want my open-source, federated, Facebook already.

Edit: Although, I did just have a flashback to when Facebook announced Graph Search... which lasted about a week until searches that actually worked were deemed creepy and they backpedaled into the stone age. It's so frustrating how these technologies succeed or fail based on fashion rather than technical merit.


Interestingly, I have found Facebook friend groups a complete dark pattern ever since the days of Google+. G+ circles were super easy to create, maintain, split, merge, manipulate and, this was brilliant, share. FB groups by comparison seem hidden, obfuscated and unmaintainable. Creating a new list or updating an old one is a complete pain: poor screen usage, poor or nonexistent gestures, controls, actions and searches, let alone regexes.

They exist... But it feels FB has gone out of its way to hide and obfuscate them.

Is my experience weird?


FB is very clearly deeply invested in convincing people to over-share by accident or habit, yeah. It makes their network more addictive, and they know it, so they press that button as hard as possible while building things that technically satisfy niches.

---

I quite liked G+'s focus on choosing your audience. Because you have sub-groups even within small, tightly-knit friend groups; when you raise that number into the hundreds it's only more true, not less. It was a mostly-effective UX for embracing that, and it led to my feed being dramatically more relevant.

Mastodon is filling a similar purpose for me, lately. The server you join has a pretty powerful impact on your local timeline - join a couple, use them as targeted sharing / browsing groups, and it's working much better for me than any algorithmic sorting ever did.


I don't think that's true about FB convincing people to over-share. That may have been true years ago (Bob's relationship status has changed to single!) but these days I get warnings and stuff whenever I set anything to public, and all the defaults are friends only.


Facebook got sued a whole bunch. Between the FTC's various privacy-related consent decrees, Cambridge Analytica, and the GDPR, they functionally can't do the whole "push people to overshare" thing anymore.

I will say though, it's specifically the old Facebook mentality that made me hyped back when Google+ was announced. And then it crashed and burned - which was exactly the point in time where I stopped wanting to be on literally any new social network.


G+'s implementation kind of broke communities though?

Each individual having their own personal view of their circles meant that you couldn't reliably know which of your friends had seen the stuff you're reading.

That makes it really hard to talk about (Hey did you see X?, no what's X? Oh... oops?).

I like the theory of being able to organise my relationships into nice little buckets, but that's absolutely not how social things work.


Circles were trivially shareable though. It made it super easy to create... Well, circles of friends :). These are the 15 of us into computers, 12 of us into photography, 6 of us into dnd, whatever.

And then the best feature of all: sharing of curated circles. A kind of competitive marketplace of topic-related circles emerged, so you could find these amazing circles of photographers or musicians. Best of all, you ingested and then owned that (instance of a) circle.


Somewhat, yeah. Personally I'd like to let people define their own publishing "topics" and let people select which ones they want to follow.[1]

"Did you see X" is largely killed by algorithmic feeds though IMO, which makes it somewhat irrelevant for any full-scale heavily-used network. Facebook is a prime example - important updates frequently are not seen by many close friends, because Facebook chose to not show them. Assuming nobody knows anything specific has kinda become the norm, sadly.

[1]: Obviously many will not, but that's fine. By following them you just get an unfiltered stream. But many of my friends couldn't care less about what programming language of the week I'm looking at (because they're not techy), or what nearby events I'm going to (because they're 1000 miles away) and I'm very much the sort of person who will categorize that for them so they aren't flooded with things they won't be able to join in on.

Hashtags are kinda like a crappy in-band version of this, and I have yet to see a system embrace them for this purpose. They're basically always for public purposes, which is part of why you need to use a million near-identical ones to actually get good coverage.


+1 for topics. Conceptually, pub/sub. Maybe also give the publisher the ability to choose an access policy for each topic (anybody can join, exclude some people, only requests I accept).


What Fediverse projects have something analogous to G+ Circles? Based on some quick searching, the only one I could find is Bonfire's "Boundaries": https://bonfirenetworks.org/posts/introducing_boundaries/

> Within bonfire, you now have the possibility to define circles and boundaries: a way to privately group some of your contacts and then grant them permissions to interact with you and each piece of content you share at the most granular level.

>Boundaries go beyond the typical permissions on social media (i.e. who can see your content) and include a long list of verbs in order to represent all kinds of meaningful interactions and collaboration that should be possible on a real social network.


I agree, G+ system was more explicit, and I preferred it. FB has done the usual modern thing of hiding features so that the less technical users don't worry about them. Perhaps cowardly; I think even someone struggling with tech (parents, grandparents..) would have learned G+ if it really took off.


You can't really make that, either. Facebook itself can see everything, and as soon as you federate, all the admins can see everything and nothing is private.

You could try and encrypt it down to the user level but a person added to a group would only see the content added from that point forward (the ability to decrypt would be defined at the time of the post, and new people could never read it).

The only way around that would be centralised key management which defeats the whole point.

This is one of those problems where it's probably better to just use Facebook.


Actually, I don't think this is right. The creator of a group, or of a "shared to group Y" chat or post, can generate a shared key for use within that limited friend group. Individual posts would be signed with each member's own private key, but all posts among the group would be encrypted under the shared group key.

The struggle here would be that anyone who gets the shared key would have access to view, and potentially post to, the group.
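The shared-key scheme described above can be illustrated with a toy symmetric cipher: a SHAKE-256 keystream XORed with the plaintext. This is for illustration only (no authentication, no key rotation); a real design would use an AEAD and rotate the group key when membership changes:

```python
import hashlib
import secrets

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHAKE-256 keystream derived
    from (key, nonce). Encryption and decryption are the same operation.
    Illustration only -- not authenticated, do not use for real secrets."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

# The group creator generates one shared key and hands it to each member
# over an existing secure channel (e.g. their individual E2E sessions).
group_key = secrets.token_bytes(32)

nonce = secrets.token_bytes(16)
post = b"only the group can read this"
ciphertext = keystream_xor(group_key, nonce, post)

# Any member holding group_key can decrypt...
assert keystream_xor(group_key, nonce, ciphertext) == post
# ...and so can anyone the key leaks to, which is exactly the struggle
# mentioned above: possession of the shared key IS membership.
```

Note also the thread's earlier point: a member added after the fact can read every old post, unless the key is rotated, in which case they can read none of them.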


Signal's encrypted group chat management was recently made more robust and may be instructive:

https://signal.org/blog/signal-private-group-system/


Oh. Good point about the admins. I somehow missed that.


Just look at the 'Twitter Files'. It's not great that Elon can access anyone's private messages. It's a great tool to blackmail people.


I believe that option 2, where it relies on individual encryption at the cost of reading history, is how matrix does it (or can do it if chosen).


> The other bad pseudo-privacy idea is time-limited posts ("stories" or snapchat or whatever).

That one addresses a real threat-model - someone who was your friend and whom you trusted at the time you sent the post, but has since turned against you. That is quite common among teens, which is the main demographic these social networks are targeting. Rather than them having a permanent record of all your online interactions since you met including embarrassing photographs, it just becomes he-said-she-said like it has been for ages, unless they were actively subverting the program to archive content while you were still friends, which is rare.


It's not that rare, but releasing a ton of this kind of screenshot makes you look very untrustworthy.


Fair point.


Eugen Rochko, the developer of Mastodon, has written that if search comes it should cover the home timeline and your own posts. That would help a little; at least you'd be able to find old posts from people you follow.

It would suit some, but since this is federated, there are already instances with search: qoto.org, for example, offers full-text search.

It's really odd that they added a "no index" checkbox while insisting that it's not cool to index. If they thought indexing would be unpopular, they should have made it opt-in, not opt-out.

Going forward this will be an instance-specific thing; a lot of people want to be able to search.
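Self-search of this kind doesn't strictly need server support: you can pull your own statuses over the client API (Mastodon exposes them at /api/v1/accounts/:id/statuses) and index them locally. A sketch using SQLite's FTS5 full-text index, with made-up sample posts standing in for the API response:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE posts USING fts5(url, content)")

# In practice these rows would come from the Mastodon client API;
# hardcoded here for illustration.
conn.executemany(
    "INSERT INTO posts (url, content) VALUES (?, ?)",
    [
        ("https://example.social/@me/1", "thoughts on federated search"),
        ("https://example.social/@me/2", "photo of my cat"),
        ("https://example.social/@me/3", "more federated musings"),
    ],
)

# Full-text query, best matches first.
hits = conn.execute(
    "SELECT url FROM posts WHERE posts MATCH ? ORDER BY rank",
    ("federated",),
).fetchall()
```

This is the "search only what I posted or favorited" compromise: the index covers content the indexer was already entitled to receive, without crawling strangers' instances.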


Does Mastodon have something to the effect of robots.txt (or is robots.txt already robust enough to express an answer to the question "This site administrator is willing to have these toots indexed for search?")

At the end of the day, it's a federation protocol. The protocol's creator can have an opinion on whether things should be indexed for search, but it's up to users of the protocol, not the protocol's creator, how it's used.

(... and I have to comment on this bit from the article. "The problem would be a public search engine that Gamergaters and Kiwifarmers use to hunt down vulnerable targets." This heavily implies that Gamergaters and Kiwifarmers can't just build that for their own purposes, which would, of course, ignore robots.txt settings).


We're already seeing massive searchability problems from the mass migration of communities to discord servers, it's not a good thing and doesn't make anyone safer.

The choice of bogeymen makes me suspect the motivation is insincere, though.


It's not insincere. Many of the early adopters were social (queer, trans, furry) or political minorities (socialists, anarchists). They adopted Mastodon because they were sick of being targeted on Twitter and other centralized platforms. I can understand very much why they are protective of their space.


Which is why they should move to allow-list federation, and only admit other instances they have vetted first to ensure it won't happen again.


Yeah there are vanishingly few online queer spaces not on Discord for pretty much this reason. The ability for server admins to bring down the banhammer and actually moderate bad behavior rather than leaving it to central platform moderators that do nothing about all but the most blatant harassment.

Semi-public semi-searchable discourse is really quite nice when your aim is to not have the spotlight on you. Because there are unfortunately people that actively search for people to target and being on Discord makes that in practice impossible.


I’m a bit confused. Isn’t there a lot of queer-ness on Twitter, Reddit etc?


> Does Mastodon have something to the effect of robots.txt

Mastodon has a "Opt-out of search engine indexing: Affects your public profile and post pages" setting, which probably inserts a "robots" meta tag. [1]

But this is just for ordinary web crawling, and the bit isn't sent along with your posts that get federated to other instances.

[1] https://developers.google.com/search/docs/crawling-indexing/...
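If you want to verify what a given instance actually emits, you can fetch a profile page and look for a robots meta tag containing noindex. A small sketch with Python's stdlib HTML parser (the sample page is made up; a real check would fetch the live profile URL):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def opts_out_of_indexing(html: str) -> bool:
    """True if any robots meta directive on the page includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)

# Hypothetical profile page that has the opt-out set.
profile_page = '<html><head><meta name="robots" content="noindex"></head></html>'
```

As the sibling comments note, this tag only governs polite crawlers hitting your own instance's pages; it does not travel with the post when it federates.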


Furthermore, crawlers may not respect the opt-out tag. https://github.com/mastodon/mastodon/issues/13207

Another thing -- noindex is not federated. I used to enable noindex, and still found my content on Google (mirrored on other instances that I was federated with). I just allowed indexing.


Hence the crux of the issue: you can opt out of indexing, but the moment you federate and your post leaves your server, you no longer have that guarantee. All it takes is one instance boosting your post to someone who enables indexing.


robots.txt is like a strongly worded letter from the UN.

Bad actors have ignored it from the start, and they are good enough at faking headers and rotating IPs that they are very difficult to block.


True. I was still speaking of the polite space and what the protocol can offer.

In the impolite space, these problems are addressed by encryption and authentication/authorization, not by polite protocol, which defeats much of Mastodon's design goal, IIUC.


I don't understand how a license is supposed to prevent people from doing bad things with your content. Does the not-so-nice guy in Russia care that he's infringing on your license?

I think Mastodon makes the correct call (everything here is public, because it's impossible for it not to be and still have the service be what it is), and the community wants a square circle. Yay activists.


I don't think you can split the thinking into those two groups, when the maintainers of the large instances that are pushing back against search are also lead developers.


This article echoes a sentiment I also see on Mastodon itself, one that strikes me as really silly because it's antithetical to Mastodon.

The point of the service is to "spread the things you say most everywhere." That is the design. It's literally designed to do what the opposite of "privacy" is.

And yet, here we are.

I'm sorry, but these kinds of discussions sound to me like "What if we could have email, except your posts don't go to anyone else; you just read them yourself?"

I mean, you could use gmail to do this. It would technically work. But it's not what it's designed for, and much better ways to do this already exist.


yeah I did not know about this community policy and this was the first time I was significantly discouraged from using Mastodon.

I use search to do research on links all the time (https://www.swyx.io/twitter-metacommentary). If I read something good, usually plonking it into HN search or Twitter search yields a dozen more related points and rabbit holes I can go down. I can even engage with the author or find their thread of thinking or responses to a question I had that may already have been asked (or better, questions I didnt think to ask)

Without Mastodon search all these metaconversations about topics are lost.


Right, but I think the thing is the "protocol" is (or can be) stronger than the policy.

Which is why over there I'm presently unpopular with lots of people on many sides of issues, because (this is rough, but hey) e.g. too many white people are going "ooh, no racism discussion here please, I'm too delicate." and also too many black people are like "if i see the slightest thing that looks a little bit like racism defederate and block that whole server!"


I think it's more like "What if you could have email, but your posts only go to the people you want them to?" I think that's pretty easy with Gmail, for the most part, but with Mastodon it's harder because it inverts that control—anyone can follow you (or send a follow request), and it's harder to police every follower individually. This is combined with the fact that most accounts are available anonymously on the web, but it's not really necessary (many are not, for instance, and even the ones that are aren't really convenient to access that way—for bad-faith actors, it's more convenient to sign up for an account on mastodon.social or some other "well known" server and then find posts that way by browsing timelines).


Yes.

Because that's the design.

If you don't want your posts to go everywhere, use something that's not designed to send your posts everywhere. There are other things.


But on the other hand you have a popular platform with a lot of people who can help, and one some people are already used to, that you could deploy internally as a sort of internal message board that can easily be used from anywhere, with official and third-party mobile apps already available.


That's fine, but I'm pretty sure that's not the design/use the author is talking about.


Everybody's saying "well of course you can't stop people crawling so just give up." I don't buy it - you also can't stop people from driving too fast or smoking in restaurants or torrenting popular movies. That's why we have lawyers and courts and legislation.

If Mastodon gets content licensing right, you'll still be able to ignore it and go ahead and crawl data when the license forbids it, if it scratches your itch and you're ethically challenged. But then if you do anything with that data in public you're going to get legal nastygrams. That may not even stop you, but it will drive up the cost of your lack of ethics.

Ask any security pro. You can't ever stop all the attackers. All you can do is make it more and more expensive to do bad stuff, and eventually most of them won't have a strong enough incentive to pay the price.

There are plenty of people on Mastodon - the vast majority is my bet - who, when there's a choice of content licenses, will cheerfully say "make it public", and then there will be excellent full-text search.


I don't really get the proposal.

Are you saying there is currently confusion as to whether a Mastodon user inadvertently issues a license for their copyrighted content to be included in full text search, simply by using Mastodon?

If not, what is preventing someone from sending a legal nastygram now, given that no such licenses are currently being granted?

Or are you saying that Mastodon users are not able to legally prevent indexing based on copyright alone (i.e. fair use, or not substantial enough to qualify for copyright protection), and thus we need to force followers into some kind of private contract that they would break?


This is covered in the article and what is missing is a login wall, due to the legal precedents being set in (as one example) the LinkedIn case.


You can in fact stop people from smoking in restaurants.


I believe they meant physically. Just because you can't physically stop people from smoking doesn't mean you can't stop them by social or legal (which reflects social) means.

Similarly, just because you can't stop people from indexing Mastodon physically, doesn't mean that you can't stop them by social or legal means. However, I would add that the internet is really hard to control because of how open it is, which is why we patch security vulnerabilities instead of only relying on publicly shaming or arresting malicious actors on the internet.


Realistically, what's going to happen with "courts and legislation" is that anyone who wants to do scraping will simply operate from the jurisdiction where it's legal. That's the crucial difference between regulating roads or restaurants, and regulating stuff on the Internet, at least in the absence of Great National Firewalls.


I never got so many death threats from strangers as I did when I wrote a spider for Mastodon.

Then everyone defederated my single user instance, as if that would stop an unauthenticated web crawler.

The idea that you could publish something on the web and expect it to stay private is insane.


Harassment and toxicity on Mastodon is a very real thing indeed.


Can attest to that. The worst online interactions I have had are on it.


Clearly the solution involves web 3.0 Blockchain to provide irrefutable evidence of ownership of each post, with an off-chain oracle providing per-post licenses in machine-readable formats. </sarcasm>

...or you know don't say stuff in public if you don't want it to be seen by others. This is - and always has been - Internet 101 stuff: assume that the internet never forgets, and don't say anything publicly if you'd rather not see it on the front page of a newspaper.

I guess each new generation needs to learn that there are bad people out there, and computers make finding a needle in a haystack trivial.


I think it's kind of worse than that. It's not an education problem. People are intentionally using this stuff to talk publicly about an issue. They just don't like the consequences of that.


I think the problem is it's public for way longer than people are used to. It's like Snowden said: a permanent record.

You could've tweeted something 10 years ago with an anonymous account, and called it a day.

Yet, after more than a decade it can crash your life because you got caught by stylometry analysis.

This certainly applies to HN even more.


> I’m a bit puzzled by that “But people are already doing it” argument. Yes, Mastodon traffic either is already or soon will be captured and filed permanently as in forever

Correct. This is something I and several others have been doing for some time now. We have a private search engine that covers most of the Mastodon fediverse (including widely defederated instances), and there's nothing anyone can do about it.

Eventually, we'll give this data to the Internet Archive or put it in a torrent or something. It includes a decent amount of now-deleted content too.

Really, it's no different to what others, e.g. Pushshift, are doing with other sites. Except there's no opt-out. Anything you've already said is almost certainly in our data set, and it's there permanently.
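For context on how low the bar is: every Mastodon instance exposes its public timeline over plain, unauthenticated HTTP. A minimal sketch of what a crawler starts from (the instance name is illustrative; a real crawler would add paging, rate-limiting, and many instances):

```python
import json
from urllib.request import urlopen

def extract_posts(statuses):
    """Keep just the fields a search index would want from Mastodon status dicts."""
    return [
        {
            "id": s["id"],
            "author": s["account"]["acct"],
            "content": s["content"],  # the HTML body of the post
        }
        for s in statuses
    ]

def fetch_public_timeline(instance, limit=40):
    # No credentials involved: this endpoint is the instance's public firehose.
    url = f"https://{instance}/api/v1/timelines/public?limit={limit}"
    with urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    for p in extract_posts(fetch_public_timeline("mastodon.social")):
        print(p["author"], p["content"][:60])
```

Defederation changes none of this, since the crawler is just an HTTP client, not a federating server.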

> That’s extremely hard to prevent but isn’t really the problem: The problem would be a public search engine that Gamergaters and Kiwifarmers use to hunt down vulnerable targets.

Also correct. A couple of KFers we know already have access to search the data we've collected. It points them towards interesting posts or accounts to archive. People say some pretty wild things when they think they're not being recorded for posterity!


I'm amazed sometimes at the datasets KFers use for doxxing. If you care enough, you just need to pay attention to leaks and store them all, and you can make a mini NSA X-Keyscore.

During the whole Cloudflare banning thing I remember coming across a dox that deanonymized someone via the Patreon hack dump [1], where a simple Twitter username match turned into a name + address (via credit card details stored by Patreon).

Anonymity and OPSEC require some serious effort and knowledge.

[1] https://www.christianpost.com/news/patreon-hack-almost-14-gb...


Serious effort - as in - not reusing usernames and not posting embarrassing stuff under your real name?


Yes, telling people to not reuse passwords is enough of an ask.


So, among the first people you give access to the search are some known harassers? Sharp thinking, there.


I mean, look at the facts:

- permanently archiving posts they know people don’t want them to

- gave early access to known trolls

- is now publicly crowing about how there’s nothing you can do about it

- on a throwaway, because like most bullies they’re cowards

Personally, I’m left wondering if there needs to be an organized crime investigation into KF, for organized harassment campaigns which violate local stalking or other laws, and the role people like this account play in orchestrating it.

I think there’d be a certain irony to HN’s record of this comment being used to prosecute an accomplice in organized crime.


If you don't want your stuff permanently archived then don't post it on public websites. It's sad that people don't give a shit about their privacy and overshare everything these days.


No — I’m tired of saying all of society needs to walk on eggshells due to anti-social people like OP.

If you go around a bar with a tape recorder and directional microphone capturing conversations from unsuspecting people and then use those private moments to launch protracted harassment campaigns, you’re an asshole.

No matter how much you say “they didn’t have an expectation of privacy!”


That's not comparable to posting messages on public websites that can be read by anyone without an account. You don't know who will read them.

If you want to talk privately - talk to people that you trust and use apps like Signal that delete messages after some time.


When you speak in public, anyone can hear you.

What specifically do you feel is different?


If you say embarrassing shit or shout obscenities in public, then don't be surprised if someone pulls out their phone and starts recording it.

It's completely different if someone sneaks into your home and leaves microphones there.


Yes — and we’re asking about the case of a quiet conversation in a bar, where someone surreptitiously records using specialized equipment.

I think it’s very telling you made a strawman rather than respond to that point.


I think both of you are visualising "posting on the Fediverse" differently. One visualises it as a conversation between two people that may be overheard, while the other visualises it as broadcasting.


Posting on Twitter or Mastodon is the equivalent of shouting on a public street. If you don't want to face the consequences of what you say, don't say it publicly (or at least don't do it with your real name).


No — the expectation was never that people would creepily record every corner of the public sphere.

That’s anti-social behavior and it’s okay to call it such.


If it wasn't an expectation, then it SHOULD have been. People are allowed to remember and record what you yell on the street, just like they are allowed to remember and record what you post on a public platform for anyone to see.


When you say things on the internet, you say them publicly, and people are allowed to remember your words.


It's the same as dropping wireless mics and facial-recognition tech on public sidewalks. It might be legal, but social norms make it weird. The people who do it will find their lives get quite a bit harder once they're found out.


Lots of security cameras record what's happening on public sidewalks. Lots of cars have dashcams in them.


Thanks, but that's not really relevant.


I mean, "kfsnd." KF. They're not exactly hiding who they are.


Shrug.

They’re on a throwaway because they’re too cowardly to admit who they are and face people like myself who want to hold them accountable for their bullying.

I think they’re exactly hiding who they are.


Seems prudent when you have people calling for you to be prosecuted as a criminal accomplice. ;)


Now now. Be fair. Some of us would be quite content with your ongoing persecution.


Of course — that’s the difference between us:

I act in a way I can publicly own.

You act in a way you know you have to hide — because you’re aware that it’s anti-social.


I have no dog in this fight (probably will never use mastodon or this search engine) but I do find it ironic that this guy is being insulted for using a throwaway by people advocating for privacy on mastodon.


Genuinely, thank you for your service. A real archive doesn't delete a collection because of public pressure like "The Internet Archive" does.


> I’m a bit puzzled by that “But people are already doing it” argument. [Snip] The problem would be a public search engine that Gamergaters and Kiwifarmers use to hunt down vulnerable targets.

The author implies GG/KF/4chan/whatever are incapable of writing their own scraping tools specifically for the purposes of harassment.


Looking at the negative opinions on Mastodon, my take is that it’s only bad for writers (some may say spammers) who want to subject the world to their writing without leaving the user a choice. I prefer to keep things somewhat as they are: if I want to find some spam, I can go looking for it; if I want to find a blogger, same — it’s just a matter of looking at the local and global feeds, and maybe public figures should simply publish their IDs on their web pages for people to find. Not having a stream filled with spam, or with stuff I’m simply not interested in (I can look at the local and global streams elsewhere), is what I would call efficiency in this type of app.


It’s negative for me because I don’t find much very interesting going on there. Very few of the Twitter posters I follow (mostly really anoraky military history stuff) have moved over, and the few that have aren’t posting nearly as much.


My ToS on my mastodon instance has said the same thing for 4 years. Everything you post on Mastodon should be considered public information.

Case closed.


From a technical point of view, there is no meaningful way to put any access controls in place other than server admins choosing to un-federate any misbehaving servers and create their own little bubble. And cutting off most of the fediverse would defeat the point of having one.

Making the information public kind of is the point of putting it on the web on a public url without authentication. Don't do that if you don't want it. And think twice before you publish something you are not comfortable with sharing in public. Not that hard.

But yes, it's a matter of time before these things get crawled, scraped, etc. by all sorts of media organizations, marketing companies, etc. Once you have a critical mass of people who are relatively well educated, with disposable income, etc., any self-respecting spammer, advertiser, or marketing person is going to want to be all over this. The fact they haven't shown up yet (at least not in large numbers) has nothing to do with access controls, just with a lack of critical mass. The fediverse just wasn't that interesting until people started showing up a few months ago. Now we suddenly have people like Tim Bray and other tech influencers showing up. The more clued-in LinkedIn junkies are already getting Mastodon accounts. That kind of is the point of a good public forum. These people will want to be a part of it.

There are technical solutions of course but they are going to have to involve making mastodon more like a federated signal/telegram, which does not seem to exist just yet. Mastodon just isn't it. Encryption is what you need if you want things to stay private. Of course it doesn't prevent anyone with their own modified client making some modifications that archives the plain text after decryption and information leaking that way. But it would be a lot harder to scrape everything. You basically have to infiltrate every group you want to scrape.

Mastodon with some crypto added would be nice though. The mistake with email was that pgp usage never caught on (too complicated to manage for most people to bother). Signing message content would be a nice start for mastodon. It would allow people to build reputations and verify that messages are coming from who they claim to be. Impersonating people is a thing on Twitter. Without message signatures, it's going to be a thing on mastodon too. And the nice thing with reputations is that they are a great basis for filtering too. If enough servers flag a particular public signature, they might just decline to accept content by that particular signature. Or accept it but filter it out of the public feed by default.
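A sketch of the sign/verify flow described above. A real deployment would use public-key signatures (author keeps a private key, anyone verifies with the published public key; Mastodon already signs server-to-server traffic this way via HTTP Signatures, but per-post author signatures are the hypothetical part here). Python's standard library has no asymmetric crypto, so HMAC with a shared secret stands in purely to show the flow:

```python
import hashlib
import hmac

# Stand-in for a keypair: with real public-key signatures only the author
# could sign, while anyone could verify. Here one shared secret does both.
AUTHOR_KEY = b"alice-signing-key"

def sign_post(content: str, key: bytes) -> str:
    """Produce a signature to attach alongside the post."""
    return hmac.new(key, content.encode(), hashlib.sha256).hexdigest()

def verify_post(content: str, signature: str, key: bytes) -> bool:
    """True only if the content matches the attached signature."""
    return hmac.compare_digest(sign_post(content, key), signature)

post = "Hello fediverse"
sig = sign_post(post, AUTHOR_KEY)
assert verify_post(post, sig, AUTHOR_KEY)            # authentic post verifies
assert not verify_post(post + "!", sig, AUTHOR_KEY)  # tampering breaks it
```

A server could then refuse, flag, or down-rank posts whose signatures fail to verify, or whose signing key enough other servers have flagged — the reputation-based filtering suggested above.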


I have been living in the fediverse for years and have adopted this strategy, which I think can work for most people:

- I have a real-ID account with my picture and a link back to my domain's web page. I approve followers to keep it mainly to people I know and trust. I post mostly to followers only, or post publicly only rather mundane information that I would not mind coming up in a deep background check.

- I have a second, anonymous account where I am most active and can express political and other views publicly without worry. I tell some friends to follow this account after they have found my real-ID one.


Child porn is a huge image problem that can only be solved by governments in the relevant jurisdictions. Search makes this image problem drastically worse.

Lots of discussion here about what privacy means, and no discussion of the elephant in the room. Networks that make acceptable use a choose-your-own-adventure (for instance, by allowing people to self-host and set policy on their own server) end up with some people using them for things the rest of the planet doesn't find acceptable. I'm certain that common web frameworks and servers are used to promote things both odious and illegal, but because that machinery is invisible to most users, the blame accrues entirely to the criminal.

With Mastodon, the branding makes it possible for blame to accrue to Mastodon rather than merely to the criminal, because people are more apt to understand Mastodon as an open-source Twitter than as a tool like Apache. But while this problem accrues to Mastodon, the tool is in no position to dictate how users use it; the relevant governments are, and if prosecution becomes common, hiding illegal porn from prying eyes will be done by the users themselves. If Mastodon suggests not federating with servers in countries that don't handle this issue (e.g. Japan), then search will tend to show less negative content, and countries can indeed be shamed into handling such issues better.


I think people who don't want search are underestimating how prevalent it already is.

Mastodon itself officially supports Elasticsearch, but it isn't turned on by default and the UI limits what can be searched. I'm not sure if that's for social reasons or just to make installing a server easier, but there's no "never federate to servers with ES enabled" setting. Friendica has full-text search as well, and I imagine there are others.
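For reference, turning it on is mostly a matter of a few settings in the server's `.env.production` plus a one-time reindex; the values below are illustrative defaults for a local Elasticsearch, not a recommendation:

```
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200
# then build the index once:
# RAILS_ENV=production bin/tootctl search deploy
```

So the capability is shipped and admin-toggleable; nothing on the protocol level signals, let alone enforces, whether a remote instance has it on.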

Machine-readable copyright licenses as the author suggests could be very useful for situations where legitimate for-profit companies start federating their services. Perhaps I don't want Flickr[0] to show advertisements next to my photos that I share from a self-hosted Mastodon server, for example. I could apply a license that forbids that and have a good chance of enforcing it in court. They're not a good fit for dealing with bad actors though. If somebody sets up fedisearch.ru fed by legitimate-looking public servers, they're probably beyond the reach of the courts I have access to.

[0] The CEO of Flickr has publicly pondered adding ActivityPub support.


> A server should deliver posts only to people logged into the instance, or to other instances it is federated with.

This feels like maybe not an unreasonable _default_, but I certainly don't think it should be mandatory. If a user wants to publish something on the internet, it's reasonable for them to be able to do that.


But wait, isn't Google already indexing the Fediverse? Public posts are available through http without auth... so I find it hard to understand the opposition from the admins described in this article. What am I missing?


Google is indeed already indexing the Fediverse. A Fediverse long-timer I've talked to justified this by saying that Mastodon is "more like speaking" and websites are "more like publishing". However, I pointed out that IndieWeb and influencers exist too.

Personally, I am unconvinced by the long-timer's argument. I believe that if you are against Fediverse crawlers ethically, you must be against web crawlers as well, and indeed I spent months without Google, DDG, or any other crawler-based search -- StackOverflow tags, Wikipedia and bookmarks were my friends. It was an enlightening experience, but I now hold the reverse position.


Only architecture astronauts care about the fediverse, and by extension how search works on Mastodon. Normal people just wanted a Twitter clone when Twitter started to feel icky or doomed. It doesn’t speak to the platform’s long term growth if basic functionality like this is controversial, especially if it leads to fragmentation. There’s obviously some value to conservatism here - I suspect for most people, Twitter was feature complete years ago and didn’t need more doodads. But search was certainly a core part of that.


Mastodon isn't a public company or a VC-funded startup. It doesn't need perpetual user growth, nor, I suspect, do many of its current users or developers particularly want that. At most it just needs a critical mass of people who are interested in what it has to offer so that it doesn't die as a project/community.

I don't use Mastodon (or Twitter) but it seems to me like Mastodon was never really intended to be a pure Twitter clone, and what "normal people" want in this regard isn't really relevant. What's the point in trying to emulate a platform that feels icky and doomed, anyway?


I don't use Twitter or Mastodon either, I'm happy spewing my terrible opinions here. But for those that remain, you're absolutely right it needs that critical mass of high quality content (for myriad definitions of quality), positive interactions, and reach. I'm just saying that from a purely utilitarian point of view, the sum total of disappointment will be higher for users if Mastodon's momentum disappears, compared to that of people who have ideological objections to Mastodon having a functional search engine. If nobody cares about that, well, then welcome to the fediverse I guess.


> [talking about licenses] I’m pretty sure I’m missing important dimensions.

I think they are missing fair use. IANAL but it seems unlikely that you need permission to do full text search if you only display snippets.

Honestly the whole copyright part seems misplaced. Ask the movie industry how well copyright has worked to prevent people from sharing movies. If the concern is malicious people violating privacy, copyright is the wrong tool. Although what the author is describing sounds more like a contract than copyright.


Worth checking out farcaster (farcaster.xyz). They are working towards a solution where things are "sufficiently decentralised". The server admins can't prevent users from following one another etc.

https://www.varunsrinivasan.com/2022/01/11/sufficient-decent...


If mastodon is fully private then how will they get money from their application? Social Media networks can only get money by ads or by selling the user's data, and ads could potentially track the user's data.

Which companies just say NO to getting more profits, every company wants to expand so I don't see why mastodon should promise private access to content when it's really not.


Not sure what you mean. People on mastodon tend to donate to their server admins. Also, this seems pretty okay for now: https://graphtreon.com/creator/mastodon


"please resist finding ways to scrape the fediverse" that is 100% burying your head in the sand, 100%.


I remember a few months ago some people were already sharpening their pitchforks because Eugen implemented a change that would make local search results a bit more relevant if the instance wasn't using Elasticsearch.


I am saddened but not surprised at the lack of nuance in these comments. I also think the author is missing some things.

We take it for granted now that the public-ness of online content is very nearly binary: it's either public, and therefore hyper-public, globally accessible, indexed, searchable, publicly archived forever, or it is private. But this binary is artificial. Why can't we have things in-between?

Perhaps it's easier to understand if we make a real-world analogy. In the real world, there are public spaces where people interact and their interactions are, in general, not recorded, not remembered, not noticed. I can walk to my local public square and scream as much as I like about politics, and only people physically present there can hear me. If I say something strange enough, someone might film me and upload it to the internet, but that's an active choice. Almost everything that happens in these traditional public spaces does not have the hyper-publicity of web content.

Likewise, I can have a casual conversation with a friend while walking in a public space, and people standing close to us might hear, they could even join in, but it's very unlikely that conversation will end up in the historical record.

By contrast, while something like Twitter bills itself as a public square, it is nothing like it: no public square has cameras and microphones permanently fixed on it that globally broadcast everything happening there and pick up even the faintest voices. It really stretches the traditional conception of publicity. Previously, only celebrities and politicians had the misfortune of history's eyes being permanently fixed on them; what does “public figure” mean now?

Meanwhile, the web's conception of private is also an extreme. Conversations that nobody else can overhear and join are the only other option.

So, isn't publicly fetchable, but not indexable, content a reasonable compromise?

Now, certainly, the fact that there is no technical barrier to indexing means this is, you could say, privacy through obscurity. But that is true of almost all real-world privacy! It doesn't make it meaningless.

I also must reject the idea that because things can be indexed and archived, that they should be, especially when it goes against an explicitly expressed desire by affected users. Is mass surveillance only a problem when it is done in secret by a state?

The fact bad actors can't be stopped is a strange excuse as well. The world we live in is held together by the fact that most people, most of the time, act in good faith, even though they know a minority do not. If people gave up on caring because other people don't, there would be no society.

Finally: the legal aspect of the blog is such a strange attempt at proposing a solution. The fediverse is a system run by volunteers and its enemies (to be crude) are also generally volunteers, and the harms involved are generally at the micro, individual level (even if in aggregate they can be big), rather than having big corporate price tags attached. Licences are almost meaningless because almost nobody is taking anyone to court. The law cannot solve social problems like these.


(I'm not saying it's the best compromise, I just think it's a legitimate one. On the technical side of things I really think the protocol needs to make it clearer what the author of a post wants the visibility to be.)


> The fact bad actors can't be stopped is a strange excuse as well. The world we live in is held together by the fact that most people, most of the time, act in good faith...

The problem is, the effort required to index the whole of Mastodon is basically trivial, compared to wiretapping even a few locations, in the man-power, cost and ease of avoiding detection. With that in mind, either you never share posts publicly, or you have to assume it'll be recorded somewhere even if you don't want it.


I come from the side of "search should exist", but I enjoyed this nuanced read. Thank you.


Farcaster(farcaster.xyz) is working towards sufficiently decentralised social network (https://www.varunsrinivasan.com/2022/01/11/sufficient-decent...). It aims to have a model where server admins don't get the power to prevent users from following one another.


How I see it: any and all attempts to kneecap user functionality are shameful, and anything that relies on goodwill will fail miserably and be exploited. If your plan for handling full-text search of the network is to browbeat the developer into not doing it, your days are numbered. If your plan for keeping your words private is to put them publicly on the internet and then call people Nazis or whatever for looking at them without your permission, you're not very bright.


There is a story [0] from November doing the rounds on Twitter today (but which was flagged on HN) that does a good job of expanding on why there is no search function on Mastodon. Namely, it's* full of child porn, or "lolicon" — drawn child porn, illegal in many Western nations but not in Japan. In the words of Mastodon founder Eugen Rochko:

> Lack of full-text search on general content is intentional, due to negative social dynamics

I guess you could frame this as a privacy argument, but I don't buy that as the primary reason people were upset with "Fedisearch." They were upset because it exposed their communities of lolicon enthusiasts. Do they deserve privacy or is that more of an excuse to keep their content off the radar of law enforcement and out of reach of polite society?

I did find this 2017 GitHub issue [1] to be slightly less inflammatory and full of fruitful discussion about legal implications of caching strategies for media content on Mastodon instances.

[0] https://www.secjuice.com/mastodon-child-porn-pedophiles/

[1] https://github.com/mastodon/mastodon/issues/1847

* "It" being the wider fediverse, where the 2nd and 3rd most popular instances allow "lolicon" content (which is illegal in many jurisdictions). Of course not all of Mastodon is filled with this, but all it takes is an obsessive minority to poison the grapevine with a large amount of content. Also, you can expect generative AI to lead to even more of this stuff.


It's well-known where the Lolicon is (pawoo.net and mastodon.jp and some others) and all the mainstream instances have de-federated from those places.

Email is also full of disgusting stuff but nobody is going to abandon email because it's "full of child porn", although we all know perfectly well that there's lots of that stuff being emailed around.

Well-run Masto instances are actually pretty safe and pleasant spaces these days.


> I guess you could frame this as a privacy argument, but I don't buy that as the primary reason people were upset with "Fedisearch." They were upset because it exposed their communities of lolicon enthusiasts. Do they deserve privacy or is that more of an excuse to keep their content off the radar of law enforcement and out of reach of polite society?

From my time speaking to Fediverse long-timers, no, this is not the case.


I wouldn't expect them to tell you otherwise... that's my point. What justification have you heard?


The communities that publicly shame search providers also publicly shame and block lolicon servers. I don’t see how that means that they are protecting those servers by disliking search.



