Tell HN: OpenBB scraping GitHub emails for marketing spam
190 points by tmp_1649016698 on April 3, 2022 | 71 comments
A recent post[0] went around about OpenBB. I starred their GitHub[1] repository because I was interested and now I've just received an email[2] thanking me for signing up for their newsletter.

Look, I hate to call the project out but I'm tired of people scraping GitHub for emails and blasting out marketing spam. And I'm confident they scraped my email from GitHub because that's the only place 'github@mydomain.com' is used.

[0]: https://news.ycombinator.com/item?id=30854451

[1]: https://github.com/OpenBB-finance/OpenBBTerminal

[2]:

  Hi,

  We, as OpenBB, would like to personally thank you for signing up to our newsletter.

  We are on a mission to make investment research effective, powerful and accessible to everyone.

  The team would love to hear what you think of our OpenBB Terminal and if there is anything we can do to improve it.

  Feel free to e-mail us at hello@openbb.co or reach us through our Discord to get involved with the community.


Received the same email.

Here is the GitHub link[0] to report abuse. I sent this short text[1]. Generally I'm OK with being contacted by email, and I think I'm quite tolerant about it, as long as I don't receive spam claiming that I subscribed to a newsletter when I have not.

[0]: https://support.github.com/contact/report-abuse?category=rep...

[1]: As described on [HN](https://news.ycombinator.com/item?id=30900237), the owner of this [repository](https://github.com/OpenBB-finance/OpenBBTerminal) is spamming GitHub users that have starred their repository. I never subscribed to their newsletter.


I would 1) report it to Github, 2) flag it as spam in your web mail client, 3) report it as spam to whatever mailing service the company may be using (Sendgrid, Mailchimp, whatever).


I starred their repo. If they send me anything I will report them to whoever they’re using.


I just don't understand how companies can think this kind of privacy invading marketing can be effective. In which situation do they believe they are adding affinity for their product by scraping your email?


It's not "companies" as a whole. It's a small department who comes up with these ideas and uses misleading, short-term metrics such as open rate or "cooking the books" by misattributing future revenue to their efforts ("this user who we spammed 3 years ago suddenly signed up - see, my marketing strategy is great, now give me a raise!") to justify their salaries. Their incentives are rarely correlated to those of the company, so every short-term trick that they can use to justify their salary is good even if it will be damaging to the company in the long term (by that time they'll already be on their next project with a successful achievement on their resume).


> It's not "companies" as a whole. It's a small department who comes up with these ideas

Ah yes, the "one bad nerd-apple" hypothesis.


The cost of the negative responses is close to zero while the benefit of positive replies is high. Usual spam/growth hack accounting.
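
To make that accounting concrete, here is a rough sketch. Every number in it is invented for illustration, and `campaign_value` is a made-up helper, not anything a real mailing service exposes:

```python
def campaign_value(positives: int, value_per_positive: int,
                   negatives: int, cost_per_negative: int) -> int:
    """Net value of one mail blast: gains from positive replies
    minus whatever each negative response costs the sender."""
    return positives * value_per_positive - negatives * cost_per_negative

# Hypothetical blast: 50 people sign up, 500 people are annoyed.
# With no penalty for annoying people, the blast is pure upside.
print(campaign_value(50, 20, 500, 0))  # 1000

# The same blast once each complaint carries a cost of 5 units.
print(campaign_value(50, 20, 500, 5))  # -1500
```

As long as the last argument stays near zero, the sign of the result never flips, no matter how many people complain.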


Which means we need to substantially raise the cost of negative responses, to the point that a few negative responses completely outweigh any potential benefit from positive ones.

https://docs.github.com/en/site-policy/github-terms/github-t...

> You may not use the API to download data or Content from GitHub for spamming purposes

Report each instance of spam to GitHub, and hopefully they'll terminate the account.


It's your public email address on GitHub (which you may choose to hide, if you prefer). It's also public what you starred on GitHub. This email is obnoxious and annoying, and it's possibly even against the law to sign people up without them explicitly consenting, but it's not "privacy invading".


Surely it's some kind of trademark infringement or something to call your Bloomberg competitor "OpenBB". That seems obviously designed to mislead people into thinking it's something to do with Bloomberg.

If it were just some meme project fine, but they are apparently taking investment [1], and investors generally expect to make money. Making money from misleading people into thinking you're backed by the market leader...uhh...seems risky.

1: https://openbb.co/company#investors


I spent half a decade in finance and know my way around a Bloomberg terminal, and I don't think I've ever heard “BB” used as shorthand for Bloomberg. When I see it, I think BlackBerry.


Sure, but you see "financial terminal" you're already thinking Bloomberg. Then you see "OpenBB" and by that point what else could "BB" stand for?

I mean that as a legitimate question, what are they even claiming "BB" stands for? I actually can't find it.


They are also using stock data from Yahoo Finance and IEX Cloud as their data sources. It is misleading to use IEX because you are only getting 2-3% of volume being that IEX is a very small stock exchange. Using Yahoo Finance may be more accurate but the exchanges will not take kindly to using scraped data like that. There are very strict compliance and audit requirements for dealing with any of the exchanges, and the data costs a ton. I have seen a lot of projects pull together a bunch of random "free" data sources without regard for their licensing or terms. I can't understand how this is even comparable to Bloomberg when there is no live data feed and the data source is questionable at best. I hope their investors aren't blindly believing their marketing.

Being a free and open project I'm not sure how they would ever use a paid data feed though, the market data industry is just not compatible with that model unfortunately.


Under which authority could Yahoo finance legally restrict access to the stock data if on the technical side the website and information is public and doesn't require authentication?


Just because information is public doesn't mean it can be used without restriction, similar to how certain open source licenses don't allow you to do anything you want with the code just because you can view it on GitHub. For example, website owners will realize this quickly when they copy an image from Google Images and get a lawyer letter demanding payment for unlicensed usage.

In this case, Yahoo licenses the data from various exchanges. I am not sure which exchange they use (some sites often use a cheaper exchange like one of the CBOE exchanges, rather than licensing the entire consolidated tape, which is the combination of all exchanges and gives the most official and up to date pricing). The exchanges are very particular about distribution rights and sites pay a pretty penny (we are talking probably tens of thousands per month for a single data source) to access it. I am not sure exactly what legal method they use to "shut down" unauthorized access, but they can and do depending on how serious they are (for example, NYSE may be the strictest and does regular audits).

Of course, exchanges understand that their intellectual property can and will be reproduced when it is offered for "free" through public websites, so that is factored into the cost that websites pay and for the most part everyone turns a blind eye to small personal use. However, when you use it commercially or build a project (even free) around their IP, you can expect them to take a second look at some point. Exchanges make a killing on this I am sure, as they charge admin fees, monthly data fees, fees per user/visitor, and all kinds of other clever things. To learn more look up "<market> Fee schedule" (where market = CTA or UTP for consolidated US equities, OPRA for options, CME for futures, etc) and you can find the licensing rates that most exchanges publish (which don't include the rates you will pay to a data vendor to actually get the data).


It's probably worth noting the ongoing LinkedIn vs HiQ case [1], where at least initially the legality of web scraping, and generation of databases via that scraping, was upheld.

[1]https://en.m.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn


There are subtle differences. In the hiQ case, they are scraping user-generated content. In the OpenBB case, it's pricing data owned by a particular firm.


My point was that exchange prices/raw data can't even legally be "IP", because facts aren't copyrightable. (Not even compilations of facts that have no other creative input)


It's definitely not the first time I've seen emails scraped from stars or committing to a public repo on GitHub.

I agree with others in this thread: if you didn't explicitly sign up for the mailing list, mark it as spam. They should realize the damage they're inflicting on their mail sending reputation by sending unsolicited communications pretty quickly.


> They should realize the damage they're inflicting ...

Of course they should, but this is by no means a new marketing strategy ...


That sucks and I've always expected unscrupulous, shameless groups to do this at some point.

I use the users.noreply.github.com alias when developing on Github

https://docs.github.com/en/account-and-profile/setting-up-an...

And on Gitlab, the users.noreply.gitlab.com email

https://docs.gitlab.com/ee/user/profile/index.html#use-an-au...


I treat GitHub as a social coding network and I want individuals to be able to email me. I’ve received genuine questions, compliments and friendship by providing an email address on my GitHub profile. And if that’s not the product manager’s dream over at GitHub, I’m not sure what is!

So I think the solution to the problem is not to blackhole my GitHub profile, but to put systems in place that prevent the spamming techniques reported by the OP.


> We, as OpenBB, would like to personally thank you...

What dishonest fake-friendly BS. An automated email isn't personal.


Hi all,

I'm the Didier Lopes from the e-mail.

Sorry for this. This was not intentional. The goal was to let you all know about the rebrand of the project, ONLY.

We would have removed every single one of you from the newsletter.

See this thread: https://github.com/OpenBB-finance/OpenBBTerminal/issues/1625


> This was not intentional. The goal was to let you all know about the rebrand of the project, ONLY.

That is not the defense you think it is, given that is also a violation of anti-spam and privacy laws in many jurisdictions, and of GitHub's ToS. One unsolicited e-mail is still unsolicited e-mail. What you intended to do was also wrong.


I understand this now.

And believe me, if we had known how people would have reacted - even with the one single e-mail about us changing completely the name of org and repo - we would have not done it.

It's midnight here and am still apologising to everyone for this. It's frustrating, but it's the only thing we can do now.


Just something that stood out to me from the wording. Get some sleep, not much more immediate damage control you can do I'd say.


I think people are probably curious how you got a list of emails of people who have starred your repo(s) on GitHub? Personally I don't think "please add my email to a database" as being included in starring a repo. Maybe there's something in the TOS or a community understanding I'm not aware of.

I'm not trying to pile on or push negativity to you and the project; if this is what you intended to do (and I don't have too much reason to doubt) then it's a different flavor of spam than recruiters/blogspam contacting us or signing us up for stuff without our consent.


> curious how you got a list of emails of people who have starred your repo

No mystery there; “stargazing” is one of the public [1] social networking features that GitHub provides, and the email address that you write to your Git commits [2] becomes public when you share those commits on the internet.

[1]: https://github.com/OpenBB-finance/OpenBBTerminal/stargazers

[2]: https://git-scm.com/book/en/v2/Getting-Started-First-Time-Gi...


"Letting you all know about the rebrand of the project" is still unwanted marketing that no project should engage in. Whether it's going to be continuous or not doesn't change that.


The irony is you're using a public platform to apologise for this, whereby people chose to come - it's not been foisted upon them. I'd imagine it gained you quite some interest in the project at first. You could have used the same means with some extra content to make the click worth it. This just looks sketchy and you may have alienated a number of potential users. Yes it's only an email but the demographic here is a little different to the norm.


Well at least you starred the project. I get recruitment spam and unwanted surveys from “researchers” on my GitHub emails all the time, having starred nothing related. The worst offender is Turing.com; they spammed emails of all my GitHub accounts repeatedly before I blackholed them, and their unsubscribe button seemed to be a “here’s a real human, please spam more” button. Shame on them and all their investors. https://www.crunchbase.com/organization/turing/company_finan...


Thanks for posting this. I just received their spam email an hour ago. If enough people report it as spam, it should have deleterious effects on their mail sender.


I'm gonna go star them right now in hopes they spam me so I can report it.


Yes, these kinds of marketing tactics are disgusting and despicable, but the broader issue is why does Github facilitate them by making user emails discoverable? There are tools like github-email [1], which allow you to:

> Retrieve a GitHub user's email even if it's not public. Pulls info from Github user, NPM, activity commits, owned repo commit activity.

Why does Github have so many ways to exfiltrate a user's email address?

[1]https://github.com/paulirish/github-email


Given that having an email address attached to commits is a fairly standard component of git, is there a way to prevent this?

Short of using burner/fake email addresses of course


The best way, at least when committing to GH, is to use your GH email itself like so

  8601934+judge2020@users.noreply.github.com

You can find this in email settings https://github.com/settings/emails under “Keep my email addresses private”

You can’t receive email at this address, so, hopefully, anyone that needs to contact you can find your real email elsewhere.
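
For illustration, a small sketch of that address format. The helper names are made up for this example, and the bare `username@` variant the pattern also accepts is an assumption about older accounts rather than something stated above:

```python
import re

# Hypothetical helper: builds an address in the "ID+username" form shown above.
# The numeric ID is the one GitHub displays on the email settings page.
def noreply_address(user_id: int, username: str) -> str:
    return f"{user_id}+{username}@users.noreply.github.com"

# Accepts "ID+username@..." and, as an assumption, plain "username@...".
_NOREPLY = re.compile(r"^(?:\d+\+)?[A-Za-z0-9-]+@users\.noreply\.github\.com$")

def is_noreply(email: str) -> bool:
    """True if the address looks like a GitHub noreply alias."""
    return _NOREPLY.match(email) is not None

print(noreply_address(8601934, "judge2020"))
print(is_noreply("github@mydomain.com"))  # False
```

The resulting string is what you would pass to `git config --global user.email` so that new commits carry the alias instead of a real mailbox.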


I don't actually use Github for my personal stuff, but that's good to know either way!

Thanks for the heads up.


The only (remotely) effective strategy I've seen is to use burners that only show up in commits and then report all companies using them to GitHub itself in the hopes that eventually those companies will somehow be reprimanded.

Of course, the problems with this strategy are that the manual work scales poorly, there are two or more points of failure, and the best outcome is only marginally positive, doing nothing to deter future abuse.


> the best outcome is only marginally positive, doing nothing to deter future abuse.

This is the problem with virtually all spam remedies so far. If the penalty is that their success rate merely goes down, well it's already dismal, what's another few percent even matter? The economics are massively in their favor.

Until they start getting kicked off platforms for a certain number of spam complaints, the way creators get kicked off platforms for a certain number of DMCA strikes (which can be bogus, that's a problem I'd not like to copy), there's nothing incentivizing them to really, really hesitate and check with legal before pulling the trigger on an email.

Spammers should live in fear. Until we make that happen, nothing will change.



> Until they start getting kicked off platforms for a certain number of spam complaints

This is even far from a solution. The spam I get is always from a newsletter or "startup" I had never heard of before. This leads me to believe MyMachineLearningNewsletterForum will just rebrand under a new name if they were ever given some death penalty from the Internet (putting aside how infeasible that is).


Because the email is part of a Git commit (author and committer information) and your Github repo has public Git commits. "man git-commit" and search for "email" for details.
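
A minimal sketch of what that looks like in practice. The commit text below is invented sample data in the layout that `git cat-file -p <commit>` prints, and `identities` is a hypothetical helper name:

```python
import re

# Made-up commit object, in the layout `git cat-file -p <commit>` prints.
RAW_COMMIT = """\
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jane Doe <jane@example.com> 1649016698 +0000
committer Jane Doe <jane@example.com> 1649016698 +0000

Initial commit
"""

# Header lines look like: "<role> <name> <email> <unix-time> <tz-offset>"
_IDENT = re.compile(r"^(author|committer) (.*) <([^>]*)> (\d+) ([+-]\d{4})$")

def identities(raw: str) -> dict[str, str]:
    """Return {role: email} for the author and committer header lines."""
    found = {}
    headers = raw.split("\n\n", 1)[0]  # headers end at the first blank line
    for line in headers.splitlines():
        match = _IDENT.match(line)
        if match:
            found[match.group(1)] = match.group(3)
    return found

print(identities(RAW_COMMIT))
```

Anyone who can clone the repository can read these two lines from every commit, which is why hiding the address in a web UI alone does not help.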


Yes, I'm aware of how commits work, my point is that this kind of practice goes hand in hand with making it easy for spammers to harvest user emails.


You're barking up the wrong tree. It's Git, and nothing else, that is to blame. Should GitHub stop being a Git host?


The problem is spammers harvesting emails. Git* makes harvesting emails ridiculously easy. Therefore some solutions are needed (e.g. using Github-assigned email address aliases instead of user-supplied email addresses).


Git hosting services cannot change the address on the actual commits, so trying to hide it on the web frontend is pretty pointless. You could always just clone the repo to get the addresses.

It will always be up to the user to set it up in a way they want.

Well I guess GitHub could refuse non-GitHub e-mail addresses when pushing, but let's seriously not go there.


> GitHub could refuse non-GitHub e-mail addresses when pushing

This already exists as an option.

GitHub does as much as they can to protect e-mail privacy without making it impossible to use an email.


The emails are part of the commit hash. https://blog.thoughtram.io/git/2014/11/18/the-anatomy-of-a-g... - good luck comparing them when the emails have been rewritten.
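
A rough sketch of why, assuming a SHA-1 repository: the object ID is a hash over a short header plus the full commit text, and the commit text includes the author and committer lines. The commit contents and the `commit_body` helper are made up for the example:

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over "<kind> <size>" + NUL + body, the way Git names objects."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(email: str) -> bytes:
    # Made-up commit; only the email differs between calls.
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        f"author Jane Doe <{email}> 1649016698 +0000\n"
        f"committer Jane Doe <{email}> 1649016698 +0000\n"
        "\n"
        "Initial commit\n"
    ).encode()

original = git_object_id("commit", commit_body("jane@example.com"))
rewritten = git_object_id("commit", commit_body("1+jane@users.noreply.github.com"))
print(original == rewritten)  # False: the email is part of the hashed bytes
```

So a host that rewrote addresses on push would hand back commits with different IDs than the ones you pushed, and every descendant commit would change too.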


That's just a convention in git. I always use fake emails in commits and nothing has happened to me.


Yes and no. No, it is not a convention. A commit must contain an e-mail address. That’s how the data structure is specified. Of course, whether you use an existing e-mail address or whatever is entirely up to you. I’m not sure if it needs to be RFC-compliant.

Still, even a fake address will be visible to everyone.


Using something with the syntax of an email address is a requirement. Having that be a real, deliverable email address is a convention. If it's a fake address, then sure, spammers can see it, but spamming it is useless to them.


GitHub behaves similarly to git, which uses emails as a weak form of identity. They’re not meant to be private in git’s model, and having them be public is not a serious problem absent bad actors (who should be banned for violating GitHub’s TOS).


Update: I was honestly hoping to forget about this and move on. And now I receive another email! And while they closed with an apology for the unsolicited spam and promise to unsubscribe me from whatever I didn't subscribe to, they couldn't help but shill a bit more:

  I wanted to give you an update about our name change and share with you our exciting journey, which you can find here: https://openbb.co/blog/gme-didnt-take-me-to-the-moon-but-gamestonk-terminal-did

  TL;DR. We are on a mission to make investment research effective, powerful and accessible for everyone.

  If you are interested in learning more about us, you can subscribe https://openbb.co/newsletter.

This is adding insult to injury. /rant


I was just contemplating this age-old battle of light vs dark when it comes to shady marketing tactics vs everyday consumers. My phone number was somehow picked up by some spammer who sends me 2-3 spam SMSes per day, all using the same shady tactic ("You have an unsent package", "claim your free gift", etc.) to try to send me to the domains below [1] with a tracking code attached.

How do we, as the people building the platforms these perpetrators ride on, stop them? I reported this one to Cloudflare's abuse form because they're all on Cloudflare's nameservers, and almost guaranteed to be the same owner, but, it took me 5 minutes to fill out the form and they only accept one URL at a time. It's just too time consuming to fight back as an individual consumer.

Every one of us has thought at some point, "there has got to be a better way".... right? So?

[1]

http://needthecbd.com

http://wantafreetv.com

http://careforgreatskin.com

http://valuedcust.com


> How do we, as the people building the platforms these perpetrators ride on, stop them?

We don't, because there is more money to be made with the way things are set up as-is. And since builders are generally not the money-decision-makers, the platforms keep being built (technically) bad.

Take phone-number-based-scams, since there is no trust in identity, but there is a money-made-per-usage, everyone except the end-user wins to keep everything as-is.

Scammers get money if a scam succeeds, network providers get money regardless, transit agreements stay up since traffic being passed at volume keeps the lines open, maintained and at size. End-users don't disconnect since they'll need it for legitimate use, so they keep being profitable as well.

If the money were to stop being made at any point in the chain it would suddenly all be over. But since legitimate and scam usage is mixed that will never happen.

Replace phones and SS7 and telcos with any other transportation and information system for comparison. Email spam keeps 'working' because there is no real way to identify the sender in such a way that the identity can be barred. Postal spam has the same anonymous sender problem.

Heck, the best way that does work is having to physically hand flyers to people since you can simply not take the flyer, and since you (as a person) can't be handing out flyers without physically being there you can also be identified and barred.


Follow the chain.

1. Confirm domains are known for phishing and spam.

2. Figure out who registered said domains.

3. Add those people to known blacklists so they can't register anymore domains ever again. Likewise block all domains owned already by them.

4. Get domain registrars and email servers to block said domains too.

5. Rinse and repeat every time it happens.

6. Find similar accountability chains as above and make sure to close the loop on them each time. "Sorry we can't give out emails and personal details. Fuck you, stop enabling illegal activity." And fight for legislation and tech solutions to enable the above.

If you can't move to a better spot after identifying bad patterns, then it's just a giant game of useless whack-a-mole.


> 3. Add those people to known blacklists so they can't register anymore domains ever again. Likewise block all domains owned already by them.

You generally can't know who operates a given domain automatically. whois is almost always redacted now.

> 4. Get domain registrars and email servers to block said domains too.

Good luck with that. They make money from spammers, and don't have any incentive to stop.

I tried Namecheap twice and provided them with spam carrying valid DKIM signatures for domains registered with them (generally on TLDs on sale for $1, which must be sold at a loss, right?). They refused to do anything about it.


> 2. Figure out who registered said domains.

How? Have you ever seen a spam domain that provides accurate and actionable WHOIS?


Well that's the first problem to address. I wasn't trying to dismiss it as trivial.


> How do we, as the people building the platforms these perpetrators ride on, stop them?

Every now and then, I get mad while I'm bored and have a bit of free time, and I'll write a script to make requests with randomised tracking codes. I've got about 30 available VPN end points easily available, and I'll cycle through them all sending requests with random ID in whatever the format looks like. It _probably_ makes no difference, but _maybe_ it'll make their data less useful (and if nothing else, I get a bit of satisfaction from doing it.)


I stew on this idea all the time and conclude that only scale (lots of users?) would make this effective. But then I ponder the remedy itself being wielded against legitimate parties and I get slightly sad and move on to something else to worry about.


Yeah - I do worry about making a request that might "verify" some other legitimate user's URL, so I won't do this if the identifiers look like consecutive numbers, but if they look like GUIDs I'm perfectly happy to blast 10,000 or 100,000 random ones back at them.


A bit more context: https://github.com/OpenBB-finance/OpenBBTerminal/issues/1625

Not nice, but semi-reasonable. Stash the pitchforks.


Sorry, we didn't mean to spam you with this email, we intended to send this other spam mail instead?


The exact same thing happened to me -- and this is very, very frustrating and makes me want to have nothing to do with OpenBB. Easy enough to block the email, but what a gross thing to do.


When I read a spam e-mail, I add that company to my list of companies not to do business with, ever.


Same with turing.com and a bunch of other sites.


If you asked me what OpenBB is, I would've said "forum software written in php"


You're thinking of phpBB?


Or FluxBB, or bbPress, or MyBB, or PunBB...



