Hacker News

It's certainly not my intent to undermine the efforts of Robin Rombach, Andreas Blattmann, Katherine Crowson (and many others).

Katherine's work on CLIP-guided diffusion over the `guided-diffusion` ImageNet checkpoints was effectively the first time the public got to see what text-to-image generation via diffusion, rather than purely transformer-based approaches (like DALL-E 1 or dalle-mini), would look like. And it happened well before GLIDE was published (where her work gets a mention/citation).

The CompVis team (Blattmann, Rombach, etc.) has been able to not just compete with, but in some ways surpass (it's nuanced), the work of the big American research labs (OpenAI in particular) through solid, novel research. Their `VQGAN` outperformed the autoencoder from the DALL-E 1 paper, and they've been competing directly in the vision space ever since.

Incredibly talented people.


