It's surprising how poorly labeled these images are; who is curating this collection?
Can't they crowd-source a proper labeling project? I wonder how much better things like Stable Diffusion would be if their training included correct, complete labels for the images. I'm sure lots of folks would willingly spend a few minutes here and there to help with the labeling if it means they get to enjoy the model for free.
Those images are not really "labeled". They just scraped the alt text. A lot of the recent advances in AI have been by using lower quality large scale web data, instead of hand labeling. The noise will average out. Hand labeled data can be used for finetuning.
Not really. There's a sound theoretical basis for that approach in astrophotography. The sort of bias and/or noise introduced by systematically poor labeling is much less reliably averaged out.
They have their own datasets and included LAION-400M, a subset of the 5B set that was released prior to it. You can see a short explanation in Imagen's "Limitations and Societal Impact" section at: https://imagen.research.google/.
> While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.
Could you imagine if the mediocre results we currently get from "AI" were mostly due to poorly labeled data in every huge dataset, and not a lack of scientific or technological breakthrough? It wouldn't be the first time that too many academics were blinded by the wishful thinking that what they need is a eureka moment, when what is actually needed is tons of dull, repetitive work.
So, for instance, for photo generation what may be needed is a huge amount of clean photos with obsessively detailed labels. Maybe the exact same single-point lighting for every shot (with the exact coordinates of the light as a data point, plus strength in lumens), and 8 pictures of each subject on a black background: front, back, left, right, top, bottom, 3/4 mostly-front (AKA the corner), and 3/4 mostly-back, then the same 8 again on a white background. Then also include all the info possible: weight, height, width and depth, plus versions of the most common states of each object (ball: inflated or deflated; bird: flying, idle or walking), plus photos combining two subjects (one set of a woman with a hat, another wearing the same clothes but without the hat, and one of just the hat without the woman), with properly labeled relationships so it's clear they all refer to the exact same hat (or lack of one). You get the idea...
Then the most important thing for "image generation AI" may not be computing power but the most boredom-resilient staff you can hire. Of course, that's just an example for photography; for things like text you may need an equivalent rigorous effort by a multitude of linguists.
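To make that concrete, here's a hypothetical record format for that kind of obsessively labeled photo. This isn't any existing dataset's schema, just a sketch of the fields described above:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema for the exhaustively-labeled photos described above.
# Every field name here is made up for illustration; no real dataset uses this.
@dataclass
class LabeledPhoto:
    subject: str                                   # e.g. "hat", "woman wearing hat"
    view: str                                      # "front", "back", ..., "3/4 mostly-front"
    background: str                                # "black" or "white"
    light_xyz_m: tuple[float, float, float]        # exact coordinates of the single light source
    light_lumens: float                            # strength of that light
    dims_m: tuple[float, float, float]             # width, height, depth of the subject
    weight_kg: Optional[float] = None
    state: Optional[str] = None                    # e.g. "inflated", "deflated", "flying"
    related_ids: list[str] = field(default_factory=list)  # ties "the same hat" across photos
    image_path: str = ""
```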
I think for human generation you need to add a bone/skin-mesh generator in 3D, so it learns the position of the body in 3D space and compares it with the source images.
If you aren't Google, manually doing that with 5+ billion images might prove difficult, to put it mildly. Large-scale labeling is typically bootstrapped with smaller models and whatever manual data you have. What's being curated is the bootstrapping process.
So Stable Diffusion has an img2prompt mode. I wonder if that can be used somehow. The prompts it has yielded for my personal images have been very descriptive and good. It would be interesting to see how different the img2prompt output is from the existing captions for the training images. I would love to measure it, but I don't even know how to calculate this distance.
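One way to put a number on that distance, as a rough sketch: embed both captions and take cosine similarity. Using sentence-transformers here is purely an assumption; any text-embedding model, including CLIP's text encoder, would work similarly:

```python
# Compare a scraped alt-text caption with an img2prompt-style caption by
# embedding both and measuring cosine similarity (1.0 = near-identical meaning).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

scraped_caption = "IMG_4032.JPG"                                         # typical low-quality alt text
img2prompt_caption = "a golden retriever running on a beach at sunset"   # hypothetical img2prompt output

emb = model.encode([scraped_caption, img2prompt_caption], convert_to_tensor=True)
print(f"caption similarity: {util.cos_sim(emb[0], emb[1]).item():.3f}")
```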
if there are any researchers reading this, I'd like to introduce you to booru porn galleries like rule34. millions of images meticulously labelled by dedicated enthusiasts
The training used in the current round of image generators is contrastive learning - the idea is that you don't use "properly curated" datasets - you just throw volume at the problem and let it sort itself out. Which makes sense - humans learn by the same mechanism.
The LAION-5B set in particular is filtered by its CLIP embeddings - basically, run another AI to embed the image and its caption, and check that the two rate highly enough in similarity, which in turn lets the system have some confidence an image's caption isn't completely wrong (this is also generally the step where you filter out NSFW and other things if you want).
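For anyone curious what that filtering step looks like mechanically, here's a rough sketch with the Hugging Face CLIP implementation. LAION's actual pipeline uses its own tooling; the 0.28 cutoff below is roughly the published ballpark for the English subset, but treat both the model choice and the threshold as illustrative:

```python
# Keep an image/alt-text pair only if CLIP thinks they describe the same thing.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image_path: str, alt_text: str) -> float:
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[alt_text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # image_embeds and text_embeds are already L2-normalized, so the dot
    # product is the cosine similarity between image and caption.
    return (out.image_embeds @ out.text_embeds.T).item()

THRESHOLD = 0.28  # illustrative; roughly the kind of cutoff LAION describes
keep = clip_similarity("scraped.jpg", "a red bicycle leaning against a brick wall") >= THRESHOLD
```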
"Can't they crowd-source a proper labeling project"
I'm very surprised how primitive it is. I put in a few names of uncommon technical objects and some of its images were close and others so far off as to be unrecognizable or totally unrelated/useless.
What it needs is a quick/instant feedback system that allows humans to rate an image against the query word: a rating scale of, say, 1 to 5, where 1 is an identical match, 2 a close match, 3 marginal, 4 ambiguous, and 5 wrong/totally unrelated. If that were in place, likely thousands would make an effort to optimize the matching.
I don't think that's feasible for anyone but someone like Google or another tech giant.
They could use Wikimedia Commons to get a smaller collection of better-labeled images. Currently it's images from Common Crawl extracted by LAION; not sure if it already includes Wikimedia Commons.
You might be surprised. We have almost as many opt-in requests as opt-outs since we announced this today.
We don't see this as binary in the long term. Maybe artists want to release art from a prior period, for example, but withhold their current series until they move to the next.
I think we'll see people who want to post images of themselves in there as well. Like, being a 'playable character' you could prompt for to be in the artwork could be pretty fun.
Reading through your comments, it appears you're under the impression that we (Spawning) trained models using this data. That is not the case. We're building tools to help people to remove themselves from, or add themselves to, datasets used (by others) to train these models. We're hoping to make that as close to 0 effort as possible for all parties. We agree that artists should control how their works are used, and we're working hard to make that the norm.
I'm skeptical that AI will replace artists. The existence of computer chess and Go engines, though superior in every way, has not killed interest in human competition. I see no reason we couldn't have the same thing happen with art: people see AI art as a novelty and something worth studying, but otherwise prefer human art for its relevance to the human condition.
The problem is that competition in chess, in games, has always been for the sake of the game itself and the fun of playing it. Game-playing is its own reward; there is no other entity interested in paying for the results of chess games, even 'interesting' chess games. If there were, you could set up instances of Stockfish to play against each other and dominate that market.
This is not true of the artist market. While yes, many artists get into it for the fun/satisfaction of producing their own artworks, the reason they can remain in it long term is that there are other entities willing to pay for the results of their work. See: game studios, film studios, etc. Before AI, the only way to get the assets for these various projects produced was by paying artists/experts for their work. Now, the production of those building blocks can be automated.
I agree. It was telling that in what I perceived to be the non-story of the Midjourney-created image winning the county art fair, many of the other prizes were for activities that machines have been better at than humans for decades. Fruit canning. Furniture construction. The replacement narrative is overblown, but these tools are going to augment human artists considerably.
Could be done by reCAPTCHA, although might be more difficult to classify the results (whether a user passed the test) compared to their usual challenges.
Not quite a CAPTCHA, but something similar has actually been done before.
The ESP game[1] paired two random people looking at the same image while a timer ticked down. The players had to enter labels that described the image, and if both players applied the same label, their score increased, and the label was associated more strongly with that image.
Well-known labels for an image were excluded after a while, so you had to guess less and less obvious labels in order to score as time went by. Once in a while the system would also assign test images with known labels to prevent cheating.
Apparently a lot of people played it because it was fun, so there wasn't even the need to pay them for labeling the dataset...
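The matching rule itself is simple enough to sketch; this is just an illustration of the mechanic described above, with a made-up threshold for when a label becomes off limits:

```python
# Toy version of the ESP-game rule: a label only counts when both players
# enter it, and labels that are already well established stop scoring.
from collections import Counter

def score_round(image_labels: Counter, taboo: set, player_a: set, player_b: set):
    matched = [lbl for lbl in (player_a & player_b) if lbl not in taboo]
    for lbl in matched:
        image_labels[lbl] += 1          # strengthen the label/image association
        if image_labels[lbl] >= 5:      # threshold is invented for illustration
            taboo.add(lbl)              # push players toward less obvious labels
    return matched

labels, taboo = Counter(), set()
print(score_round(labels, taboo, {"dog", "beach", "sunset"}, {"dog", "ball"}))  # ['dog']
```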
" Releasing our first Spawning tool to help artists see if they are present in popular AI Art training data, and register to use our tools to opt in and opt out of AI training
I think we have created a way to make this work out well for everyone"
I found this very similar to what you'd get with a Google / Bing / etc. image search. Is that where this database comes from? I noticed there is a lot of "Shutterstock"-watermarked stuff. And I also checked a few "adult" terms (large breasts etc.) and found there is a lot of nude content. I'm only curious because I've seen lots of the generative models have some post-filtering for nudity; why don't they just clean it out of the training data if they're worried?
It's built off of common crawl, so it probably does have a pretty representative sample from whatever the big image searches use.
Funnily enough, the NSFW filter that LAION built is turned on here. Without it, it's... a lot. The NSFW detection is done with a model, so you get a probability of NSFW out of it, and you can select a threshold.
If you set the threshold high, like requiring 95% certainty that an image is NSFW before filtering it, you get a bunch of false negatives, letting a ton of NSFW through. Set it too low, and you throw out stuff that isn't NSFW.
We (haveibeentrained) erred on the side of too high, so we wouldn't tell artists their work wasn't in there when it was. Tough trade-off there. It's similar to using the dataset to train an AI model, where you might cut off useful images from training if you try to filter all the NSFW.
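A toy sketch of that trade-off; `nsfw_probability` here is a stand-in for whatever safety classifier is actually used, not a real API:

```python
# Split records on a classifier's NSFW probability. A high cutoff (0.95) only
# drops near-certain NSFW, so plenty slips through (false negatives); a low
# cutoff drops borderline-but-safe images instead (false positives).
def split_by_nsfw(records, nsfw_probability, threshold=0.95):
    kept, dropped = [], []
    for rec in records:
        (dropped if nsfw_probability(rec) >= threshold else kept).append(rec)
    return kept, dropped
```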
Half-page popup cookie banner with no opt-out option. I am completely fine with cookies, but clearly there is something wrong with the state of things when I have to click through a completely non-actionable popup.
Obviously it might be a slightly futile task given the size of the dataset, but would there be value in adding community captions to some of the images? I'd be happy to spend 15 minutes captioning the worst-labelled images in the set (ranked maybe by cosine similarity between stabdiff im2txt output and the existing caption?) or reviewing other people's captions, and if you can get enough people on board you could probably get through a not-insignificant number of new captions. Equally, there's a risk of anti-AI groups sabotaging this process, though.
Cool tool, but don't you think the onus should be on Emad & team (to use a dataset with only public domain and licensed images) rather than forcing every artist to opt out?
"Rutkowski" returns a bunch of book covers, repeated a lot. Can you ensure images returned have diverse embeddings? I expected digital art, not detective stories.
Do you use CLIP or just metadata?
What is the intended process that starts after you get an artist's email (for either purpose)?
In the next few weeks we'll be adding the ability to log in and flag or upload your works (if they aren't there). Those lists will have permissions assigned to them, starting with simple opt-in or opt-out.
This sounds like a great initiative. I can't find anything on the site about a privacy policy or about how the email addresses you're collecting will be used. If you're creating this for the AI community, then how will this data be made available to others?
These are important questions. We aren't storing any of the images used to search, and the email addresses are going to Mailchimp lists for opt-in and opt-out. As we roll out the next set of tools, which let users flag images and create lists from them, we'll email the Mailchimp lists with more info.
When we enable sign-in, we'll also add a privacy policy, because at that point, we will store some images, on request, to use them to make finding other works by the same artists easier.
Opt-out image URL lists will be made available to the dataset owners for removal. Opt-in image lists will be public.
Thanks! That is definitely on the list, but might be a few months away. We're focusing on using images to find other images so it will be easy for artists to flag all of their stuff quickly. But, once we have that in a good place, we will definitely be adding more to the text search side.
How do you have the rights to use any of the images? They are clearly the same as what a Google image search would return. However, Google links to the source.
If that's a rights issue, we'll definitely add a link to the source. For now, you can right click -> open in new tab to see where it came from, but we'll look into this asap.
The goal here is to give people the opportunity to remove images they don't want in this dataset or add images they do want in there.
This person doesn't know what they're talking about, or is withholding information. Yes, your photos have been used to train early iterations of the Stable Diffusion models. Those iterations are hardly usable in most applications, and will be phased out in a few weeks.
If Spawning is able to have your images removed from the training set by version 1.7 or whatever, you will be removed from any models actually in use for real commercial applications.
We (Spawning) did not create the dataset or train the models in question. We're working to make it easy for people to remove themselves from, or add themselves to, this dataset and future models.
You mean the portfolio which, by definition to be in that dataset, was marked for indexing by robots.txt, and made available publicly to unauthenticated GET requests on the internet?
Amazingly enough nobody made choices about how public they wanted their work to be that anticipated "someone scraping the entire internet and putting it into a giant neural network". This whole AI training thing may technically fall under fair use but it is right on the edge and begging for this very hazy edge to be made a lot sharper.
This seems no different than the controversy over GPL code being used to train Github Copilot. Just because it's publicly available and allows indexing doesn't say anything about the license it's released under.
When using GitHub, you grant GitHub a license to use your code for Copilot and other such things. This is not necessarily the same license that you might give others separately, such as MIT or GPL.
But that argument doesn't really work, since you can upload someone else's GPL'd code to GitHub. So it's not really possible for a random person uploading old code to GitHub to grant GitHub new rights that the original authors hadn't granted.
Computers aren't people, so thinking they "learn" the same way or for the same reason is childish. In addition, only one of those things is capable of and held accountable for following the law.
I'm impressed, never looked at another photo, just emerged from a blank room taking pictures having never seen another photo before. Just knew photography from the womb and came out with camera in hand knowing exactly how to work it. You are the one man who never had to learn a thing or rely on someone else or their prior works for training and/or inspiration.
I've been playing with DSLRs for twenty years. I just fuck around until I have some rudimentary understanding of it. Just like how I approach anything else, including playing instruments.
Yea, the only "mental gymnastics" here is the BS you're trying to pass us: that your personal neural network has absorbed exactly zero input or observation from other creators/creations that influences (consciously or otherwise) its ability to create and profit from works.
More bullshit equating the human mind to an AI. Newsflash, boyo - humans are not NNs, there is no reason legal precedent should treat humans the same as NNs (and it won't).
I capture what I see, more as a frame of reference for myself with location scouting and game dev. I take quite literally hundreds of thousands of photographs. It's 1 in every thousand that's worth sharing.
I have zero interest in other photographers, I can't even name one. Whatever other people do, neat. I don't even consider myself to be a "photographer", but I guess with hotels and airports buying prints, that somehow makes me a professional.
I capture what I like. There's literally zero other motivation behind it. Most of my income comes from elsewhere. I use a print-on-demand service.
If I take a photograph of a sculpture I've made and that is then incorporated into another work without permission, that's intellectual property theft. A lot of the images are of things I've made, not just street photography or landscapes that anybody could go and take.
If I write a riff in a microtonal scale and it turns up as a sample in a record, or is covered and released without permission, that's intellectual property theft too.
More than one person can come up with the same riff, right?
If someone is worried about an original image, why not watermark all uploaded versions?
If I walk about my neighborhood scattering paper copies of an image, and someone else picks one up to hang on their wall, did anyone break any laws (besides me littering)?
So you're trying to assert copyright on something which isn't a copy of something? Again, where in the neural weights does your original work live?
EDIT: Let's consider a simpler scenario - I take an MD5 sum of one of your photos, and then hash-collide an image from my phone camera until I find a match. Which part of this process is "stealing"?
> Again, where in the neural weights does your original work live?
That's not the standard for copyright infringement. The standard is (1) access to the original work and (2) producing a work that is "substantially similar" to the original. If the user of an AI produces an output that is substantially similar to one of the AI's training images, then it could potentially infringe.
That's the law. In an actual case, a judge or jury would look at the original and decide if the alleged infringer is "substantially similar" enough. That's a crap shoot, so cases usually settle before that.
> producing a work that is "substantially similar" to the original.
So "where in the model is your original?" is a reasonable and relevant question to ask.
If this person can induce the model to produce a work that is reasonably considered a copy of their original, then fair enough. All they have to do is give a prompt and a seed and they can prove copyright infringement very easily because anybody else with similar hardware and software can demonstrate the infringement on demand. If I understand correctly, this has happened with GitHub Copilot, with Copilot reproducing copyrighted works verbatim.
But if they can’t do that… why should anybody take their claims of copyright infringement seriously? If nobody can point to copyright infringement having taken place, what basis is there for believing it has? As you say, the standard is access to the work and producing a work that is substantially similar to the other. The former has been demonstrated. People are asking about the latter.
So “where can we find the original in the model?” is possibly the single most relevant question there is. It’s a clear line that divides infringement from inspiration, and it can be proved definitively if copyright infringement has been observed to occur.
If there was an actual case, the question would be whether the defendant had access to the original work. If the original work was used to train the AI model, then the answer is yes. It's not necessary for the model to contain a copy of the original work.
That wouldn’t be the question because nobody is disputing the access to the original work. You mentioned two factors. That’s the first, which nobody is questioning. The thing people are questioning is the second factor, the reproduction of the original work.
A copy must be made for there to be copyright infringement. Who has shown a copy has been made? If a copy has been observed to occur, it’s trivial to demonstrate copyright infringement. Who has done this?
I may have misunderstood. I thought we were talking about an AI output that might be substantially similar to an original used to train the AI. If we're considering only the use of the original as a training image, then I don't know if that would infringe.
The part they’re concerned about is where somebody uses their work as a reference for derivative works.
We can acknowledge that this is an unsettled legal question without pretending like there’s no such question for a creator to raise. This community works better when we give each other fair understanding.
This is very much not what your link says - quoting:
---
Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following:
(1) to reproduce the copyrighted work in copies or phonorecords;
(2) to prepare derivative works based upon the copyrighted work;
(3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;
(4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;
(5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and
(6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.
----
This is far from an unlimited grant of power. Of this list, the only plausible ground is (2), derivative works.
Which means we're then well into arguing about "Fair Use"[1], which I would encourage people to read the full description of carefully - because the answer isn't whether you can come up with a snippy "gotcha!"; it's whether, under careful consideration in a court of law, anyone would be likely to agree with you.
None of this stuff is being used as fair use (criticism, comment, news reporting, teaching, scholarship, research, etc.) nor is it used verbatim -- a fair use requirement.
I really don't know why people bother bringing it up other than to create an uninformed distraction.
Assuming you're referring to Stable Diffusion, the training set photos are not used commercially. Using them in training is non-commercial. If they were to share the training set, that would be commercial. However, the final product (the neural weights) does not include any of the training data, and so there is no commercial use of your photos occurring.
It's the same as if I were to look at your photos and then take my own similar photos for the background of a web store. I would have "used" your photos in some sense. But I would not have used them commercially in any way whatsoever.
You have just as much grounds for complaint in Stability's case as you would in the case of me looking at your photos as reference for my own.
Regardless, there is no copyright infringement occurring when I use my memory of having seen someone else's copyrighted materials to produce my own wholly new but similar looking works.
Likewise there is no copyright infringement occurring when an ML model is trained on copyrighted works.
Stable Diffusion is a 5GB matrix of floating-point numbers trained on 240TB of data. It does not, and cannot, contain infringing data from the training set. It is physically impossible for it to contain such data. There is not, and cannot be, any infringement occurring.
If I compress an image with JPEG, I won't get the exact same pixel values as the original image after decompressing, because the compression is lossy, but I'm sure you'll agree that the original image is "there". How are NN weights different from the DCT coefficients in a JPEG file?
The NN weights can't recreate anything without input values - specifically a 512x512 grid of random noise, and a transformed textual prompt.
So there's almost a kilobyte of missing data, without which nothing is produced.
Would a JPEG file with random noise in it be potentially any given original image? Of course not - even though the decompressor is perfectly capable of recreating one given suitable input data.
The "Grokking Stable Diffusion" colab posted here a week or two back makes this particularly explicit - any given 512x512 image can be reversed into a latent-space encoding of that image, and reconstructed from it. The NN weights are necessary but not sufficient to do so - ultimately you end up with a 512 byte number mapping to the expression which reconstructs an image. But that includes images which aren't part of the original training set.
You can pontificate all you want about what is contained in the model, the fact is a case related to infringement is going to hit an EU court soon enough and any sort of commercialization of those models will be banned. And good fucking riddance to the AI bros, the monkeys of engineering.
You seem to think it's all figured out, go ahead and try it. This isn't theoretical, everyone seems to think they definitely know how this will go and we sure could use some court clarity.
I can't "try it" because I am not an injured party, me not being an artist that has published any image ever. That being said, the existing "relationship" between google and news publishers in the context of EU court decisions is already convincing enough about the direction this is going to go down (and google's usage of news didn't even have to cost a single person their job, unlike how this will).
> "I stole your money and several other people's money and then put it into a bank account. Tough luck! It's too late, you are unlikely to prove which of the dollar bills in account are yours!"
This is NSFW! Do not open this while at work. My first search was "yeah, let me check this totally family-friendly search term" and badabing badabong, CTRL + W!
Excited to see semantic search getting legs with this and Lexica[1].
Related: we recently released[2] semantic search for custom datasets. You can use it to find all sorts of weird stuff in benchmark datasets like MS COCO[3] used to train many computer vision models.
If folks are interested I can write up a "how we made it" post describing the behind the scenes.
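Not necessarily how haveibeentrained or Lexica are built, but the common recipe is CLIP-style embeddings plus a vector index. A minimal sketch with FAISS, assuming you've already embedded the images:

```python
# Index precomputed image embeddings and search them with a query embedding
# from the same model (e.g. a CLIP text embedding for text-to-image search).
import faiss
import numpy as np

def build_index(image_embeddings: np.ndarray) -> faiss.IndexFlatIP:
    emb = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    index = faiss.IndexFlatIP(emb.shape[1])   # inner product == cosine after normalizing
    index.add(emb.astype(np.float32))
    return index

def search(index: faiss.IndexFlatIP, query_embedding: np.ndarray, k: int = 10):
    q = (query_embedding / np.linalg.norm(query_embedding)).astype(np.float32)
    scores, ids = index.search(q.reshape(1, -1), k)
    return list(zip(ids[0].tolist(), scores[0].tolist()))   # (dataset row id, similarity)
```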
Yes, that's the same dataset. This website has some additional tools coming so artists can flag and opt out, or upload to opt in, and we'll get those to the laion team to add or remove from the 5B (and future) datasets.
What's your approach to establishing authorship? I'm sure some trolls will try to claim things they didn't actually make and get them removed (and vice versa).
Our initial approach will be to validate the artists manually, and trust them to only flag or upload their own works. If we hit a scale where that starts to become an issue, we have plenty of ideas for mitigating this, but we're not optimizing for that problem yet.
You have no way to stop people from uploading content that isn't theirs. Worse, you have no way to stop your program from stealing uploaded intellectual property.
One thing that sticks out to me is how many of the images in the collection have really terrible labels. I uncovered a large collection of pieces by an illustrator who was unsearchable by name, only via image upload. The reason they were unsearchable: the majority of this particular artist's images had labels that were all in the format of:
I'm not 100% on this, but I think a big portion of the captions for the images come from the alt-text, and are probably auto-generated by the sites they were scraped from.
It would be great to have some simple text from the artist connected to the artist's images. Things like the artist's name, title, subject matter... Many of my images (drawings and paintings) aren't associated with my name and are instead associated with different drugs and psychology and other things. It would be better to have the opportunity to somehow enhance that, to influence the future prompt results. I feel like it would make AI more intelligent.
Thank you for the feedback! That's what we're hoping our opt-in tools will help artists do. For the problems you've posed here, you might actually want to both opt-in and opt-out. You'd be able to flag the imageurl-caption pairs that don't accurately credit you or describe your work and we'll forward them to the dataset creators for removal. You could then add any works that you're comfortable being used for AI training, and caption them however you'd like. Those would go into the dataset and be used when training future models.
We'd love to have you sign up if you're interested! We expect to start the beta for those tools in 2 to 3 weeks.
This is great just from a prompt engineering perspective. Now I can see what labels are typically like for the types of images I'm looking for instead of guessing.
Stable Diffusion used an aesthetic filter to train on a subset of the English language images from this full 5.8 billion multi-language set. That probably got a lot of what you're finding.
I'll get downvoted massively, but porn is by definition art (meant to be viewed and to evoke reactions, just as advertising art is), and has been some of the most important creative output for almost as long as humans have been creating art.
Sometimes I think a single change that would have an outsized improvement on American culture would be to make the mandatory high school art class include a week of sketch-drawing naked models, as actual artists do, both male and female.
It isn't art. But neither is decoration, nor craft.
Things can be "artistic" or contain art within them, but it doesn't make them art.
I don't buy that something done/created for the purpose of being viewed and evoking (even strong or many) reactions is enough of a criterion.
Otherwise things from trolling to 9/11 are "performance art" (an already barely hanging-on category), the latter of which was very "successful" at meeting this minimal set of rules.
So you know art when you see it, and there is no intersection between porn and art? Your definition is the generally and legally accepted definition? Is David art? The Rape of the Sabines?
No, that's why it's called pornography: very much to differentiate it from something "noble" like art. It's vulgar and trivial; by the very definition of the word, it isn't art.
Etymology is "writing about prostitutes" [https://www.merriam-webster.com/dictionary/pornography]. So you got me there. In the realm of visual depictions, it's just as much art as cave art. Art is a broad classifier. Still lifes are art. Landscapes are art. Audubon's drawings of birds are art. Porn is film, photography and drawing, with a specific subject matter. If Iron Man comics are art, it is too.
"Noble" is nowhere in any working definition of art. Some academic snob might use it with their in-crowd. Damien Hirsch's shark parts is considered art.
"everything I say is art is art", you basically. Well anything I say isn't art isn't art. Especially pornography in general, which sole purpose is sexual gratification of the audience through sexual exploitation, rape via human trafficking.
There is nothing snobbish in saying that. There is something egregious and dehumanizing, on the other hand, in saying what you say and trying to normalize the very definition of vulgarity.
Not necessarily; plenty of pornography contains serious artistic content and merit beyond the simple aim of sexual gratification, and it raises interesting questions of its own. This is why some have attempted to create another classifier of 'erotica', which IMO isn't needed.
>through sexual exploitation, rape via human trafficking.
Even if this were true for all porn (and it isn't), that doesn't make it any less art; it may make it 'lesser art' from a moralist perspective on the value of art, but that is by no means of widely accepted relevance to aesthetic value, nor to the property of being 'art'.
>the very definition of vulgarity.
A rude gesture or crass words seems vulgar. A video of a penis being sucked doesn't. "Vulgar" is the sort of word you might hear a shrill, paternal sitcom character use to describe a minor inconvenience or faux pas, and it's just as hilarious when people use it as an argument in real life.
> A rude gesture or crass words seems vulgar. A video of a penis being sucked doesn't. "Vulgar" is the sort of word you might hear a shrill, paternal sitcom character use to describe a minor inconvenience or faux pas, and it's just as hilarious when people use it as an argument in real life.
In the meantime, you can spare people gratuitous, graphic descriptions of explicit sex acts, AKA pornography, and keep your depraved fantasies to yourself.
"gratuitous, graphic descriptions of explicit sex acts,"
If you think a blowjob being described in seven words to make a point about vulgarity is 'gratuitous', 'graphic', or 'explicit', your moral compass is severely out of whack.
> My fantasies aren't depraved in any way, but they are well-represented in pornography.
You certainly qualify as a vulgar individual given the fact that you feel the need to depict pornography here while trying to make a point about god knows what.
Depict? I've held off from the ASCII art here. Though I recall a 1975 Ascii art of a Playboy centerfold (in 1975, no vulva in Playboy at that time). Yes, I was in computing at that time, though as a prepubescent at that time not super into porn.
Point about god knows what? Pretty sure it's been clear all along, art encompasses things we like and things we don't.
> Depict? I've held off from the ASCII art here. Though I recall a 1975 Ascii art of a Playboy centerfold (in 1975, no vulva in Playboy at that time). Yes, I was in computing at that time, though as a prepubescent at that time not super into porn.
to depict: to represent or characterize in words; describe.
Only some people define porn as degrading and rape-ful. Galleries are full of paintings of slaughter, is art about snuffing out life? Is all erotica porn? Where do you draw that line?
You sound anti-sex in dismissing sexual stimulation as non-artistic. I could stimulate rage and sadness in you with a painting of two girls stabbing a kitten in the eyes, why is that painting less art than a painting of two women making love to each other? Sex is a good thing, even outside of marriage imho.
> You sound anti-sex in dismissing sexual stimulation as non-artistic.
Pornography isn't sex; it's a gross caricature of sex, usually shot from the perspective of male subjects and involving the degradation of women, other men, or children, purely for commercial purposes, in order to give a false sense of sexual gratification. This isn't art; this is akin to butchery, with human bodies as meat ready to be consumed by incels and all kinds of other frustrated males and depraved individuals.
The former certainly like claiming pornography has any artistic merit as a way to justify their consumption of depraved content and their lack of any actual sex life.
Sex with another person is difficult to come by for many, and only comes sporadically for many more. You don't sound like you've been married for 20 years and bound by oath to not have sex with anyone other than someone who's physically lost all libido. I recommend some empathy.
> Sex with another person is difficult to come by for many, and only comes sporadically for many more. You don't sound like you've been married for 20 years and bound by oath to not have sex with anyone other than someone who's physically lost all libido. I recommend some empathy.
You keep making assumptions about your interlocutor; stick to the matter being discussed.
I'd recommend you stop trying to make things personal and have some empathy yourself for the victims of the porn trade and sex trafficking, instead of trying to defend pornography by labeling its critics "anti-sex". Porn is not sex, and whoever gets off watching porn isn't having sex in the first place. The fact that some people have a non-existent libido is the least of my concerns.
You conflate porn with sex trafficking. May as well conflate girl's gymnastics with sex abuse. Because there is certainly sex abuse in girl's gymnastics. Heck, in society as a whole. You have a very warped definition of pornography.
You talk about sexual stimulation for people that can't have sex with other humans as though it's a bad thing. That's just being a prude.
> You conflate porn with sex trafficking. May as well conflate girl's gymnastics with sex abuse. Because there is certainly sex abuse in girl's gymnastics. Heck, in society as a whole. You have a very warped definition of pornography.
> You talk about sexual stimulation for people that can't have sex with other humans as though it's a bad thing. That's just being a prude.
You conflate sex with porn, then try to paint porn-sick individuals as "victims" because they aren't entitled to sex, while being completely oblivious to the real victims of porn production and trade, and you try to paint pornography as anything other than violence, grotesque and vulgar. And you have the audacity to call me a prude because I'm against sexual violence and its effects on society? If that's being a "prude" in the minds of porn-sick individuals trying to paint their fetish as art, then I'll be a "prude" all day.
There really is. I searched for my first and last name, and after a few images of an actor with a vaguely similar name it devolved into shirtless buff dudes in suggestive poses.
On the plus side, I appear to have not (yet) been trained. I can consider myself safe from the AI. For now.
I'd love to know how this works. I entered my own name for the lols, and it returned mostly paintings of the Cape Winelands in South Africa where I grew up, which is pretty creepy.
It's using OpenAI's CLIP (https://openai.com/blog/clip/) to find the images most similar to your query text or image. CLIP learned to match images to the captions they were paired with on the web. My best guess would be that there were enough pictures from someone with your name (maybe just last name) that it made that association.
I entered mine and it returns a ton of 1800's images of men (apt) with a similar first name and rather different last names. It has a long way to go on this front.
It would be a stunning twist of irony if this website uploaded images to a proprietary image dataset used for training AI models, pitching "uncorrelated data"
That highlights one of my "concerns" about AI training sets. There seems to be a very real risk of accidentally adding an image that you have no rights to, so what happens if someone finds out and demands their image be removed from all models trained on that image?
You can't really back an image out of a model, you can only retrain without that image.
Creative Commons should add a new license that prohibits the use of ones work to train AI.
Because Creative Commons really needs another non-open license variant, like non-commercial, where no one really understands what they allow and don't allow, so you're better off just not using them if you're being conservative. </s>
We are building an opt-in list, because a lot of people do want to be able to prompt AI with something like, "a cat in the style of me" or "me riding a dinosaur". That will be shared publicly, of course.
I put in Donald Trump to see what kind of celebrity images might be in there, and there are a TON of memes / photoshopped versions of him looking like a caricature or otherwise warped. I wonder if the AI will average these into a fair resemblance, or whether prompts using his name will end up more cartoonish than other names due to the source data...
We think this is because the images are all links and the browser itself is pulling them in from across the web. With Chrome it happened once during our testing, and we've had one user also experience it, but it's intermittent. Thank you for pointing to Brave, which we haven't tested yet! That might help us reproduce the errors.
Certainly this could be used for evil (tm), but it seems like something could also be built out that enables this for good in ways that don't cause issues with Mr. Fibbonaci.