ycombinatorrio 3 years ago

For those who want to know before installing: on my 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, it consumes 13 GB of RAM and takes 4 minutes to generate an image with 32 steps (~2 min for 16 steps).

  • yyyk 3 years ago

    On Linux/i5-1135G7 - takes 3min very consistently for 32 steps. Memory use: ~13.5gb VIRT, 9.3gb RES.

  • digitallyfree 3 years ago

    On my Intel i5-4590T (8G RAM) it takes around 5-6 minutes to generate with 32 steps, swapping to disk as it does consume around 13G memory total. You don't get real-time feedback but it's very usable and fun to play with. I wish there was an option to force a manual seed though.

    • digitallyfree 3 years ago

      FYI they just added the seed option to the repo today. Now waiting for img2img...

moosedev 3 years ago

Working in WSL (Windows 10) Ubuntu on Ryzen 5600X; uses ~11GB of RAM and takes 2m04s with the default settings.

This is the first time I've played with a text-to-image model. I was aware that so-called "prompt engineering" can be tricky, but it's wild to see it for myself. A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

  • teruakohatu 3 years ago

    > A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

    It shouldn't, really. Have you tried generating a few images with each prompt, with and without the space?

    Even with the same prompt, you can get a wide variety of quality.

  • nl 3 years ago

    That's very surprising and shouldn't be the case in general (the exception being things like compound words or spelling errors maybe).

    Do you have some examples?

    Are you fixing the random seed? If not, the variation is more likely due to the seed than to a single space.
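
    For example, with the diffusers pipeline (a quick sketch; not necessarily what this repo's script does) you can hold the seed constant and vary only the prompt:

      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
      for prompt in ["a cat in a hat", "a cat in a  hat"]:  # note the extra space
          g = torch.Generator("cpu").manual_seed(1234)      # same seed both times
          image = pipe(prompt, generator=g, num_inference_steps=32).images[0]
          image.save(prompt.replace(" ", "_") + ".png")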

  • LanternLight83 3 years ago

    > Ryzen 5600X

    Ooh, I've got one of those! I've been getting by trying to run it on my PC, which for various reasons currently has a 5800 and a GTX 1050 4GB, which can just barely handle optimizedSD at 90s/image but runs out of memory if I try to use the popular webui repo. Swapping to the 5600X might be worth it!

rhn_mk1 3 years ago

What OS does this need? Using Ubuntu 20.04, I'm getting stuck on openvino:

> $ pip install -r requirements.txt
> Could not find a version that satisfies the requirement openvino==2022.1.0 (from -r requirements.txt (line 6)) (from versions: 2021.4.0, 2021.4.1, 2021.4.2)

I even upgraded to python3.9, which, inexplicably, is required but not available in the "supported" OS.

EDIT: apparently it requires a newer pip than the one bundled with Ubuntu (python3 -m pip install --upgrade pip first).

ManuelKiessling 3 years ago

I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.

You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...

Talk to it using the /draw Slash Command.

It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.

PS: Anyone know where to host reliable NVIDIA-equipped VMs at a reasonable price?

yayr 3 years ago

Then - how far away are we from having it on M1/M2 Macs, at least with regular CPU processing? openvino may be one path, I suppose: https://github.com/openvinotoolkit/openvino/issues/11554

  • yayr 3 years ago
    • garblegarble 3 years ago

      I've been using this on my M1 Max and it works pretty well, 1.65 iterations per second (full precision, whereas my PC's 3080 can only do half-precision due to limited memory)... a 50-iteration image in about 40 seconds or so.

      • smoldesu 3 years ago

        > full precision, whereas my PC's 3080 can only do half-precision due to limited memory

        What model are you using? I've been running full-precision SD1.4 on my 3070, albeit with less than 10% VRAM headroom.

      • MattRix 3 years ago

        Your 3080 should be able to do full precision. Are you sure you don’t have the batch size set greater than 1, or another issue along those lines?
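
        With the diffusers pipeline, for example, full precision at batch size 1 should fit in the 3080's 10GB (a sketch; maybe not the script you're running):

          import torch
          from diffusers import StableDiffusionPipeline

          pipe = StableDiffusionPipeline.from_pretrained(
              "CompVis/stable-diffusion-v1-4",  # fp32 weights by default
              # torch_dtype=torch.float16,      # uncomment for half precision
          ).to("cuda")

          image = pipe("a photo of an astronaut", num_inference_steps=50).images[0]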

        • garblegarble 3 years ago

          Thank you and smoldesu for letting me know it should work, I'll have a better look into what's going on - it didn't immediately work on Windows in full precision (probably a batch size issue as you suggested) and I gave up...

          I shouldn't have given up so easily, but my tolerance for annoyances on Windows is pretty low (that Windows machine is kept for gaming; the last time I used a Windows machine for anything but launching Steam was when Windows 2000 was the hot new thing...)

    • zmmmmm 3 years ago

      This worked fine for me, and running it side by side with an Intel CPU + nVidia 2070, it actually does not take much longer (and as a sibling said, it seems to be working at full precision). It is one of the first things I've done that has properly made my M1 Max's fan spin up hard though!

  • pmalynin 3 years ago

    I got it working in about an hour on M1 ultra, mostly compiling things and having to tweak some model code to be compatible with metal. It works pretty well, about 1/10 to 1/20 of performance I can get on a 3080.

  • homarp 3 years ago

    PyTorch for m1 (https://pytorch.org/blog/introducing-accelerated-pytorch-tra... ) will not work: https://github.com/CompVis/stable-diffusion/issues/25 says "StableDiffusion is CPU-only on M1 Macs because not all the pytorch ops are implemented for Metal. Generating one image with 50 steps takes 4-5 minutes."

    • andybak 3 years ago

      By comparison I can generate 512x512 images every 15 seconds on an RTX 3080 (although there's an initial 30 second setup penalty for each run)

    • fragmede 3 years ago

      Yeah you can. Using the mps backend, just set PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU for unimplemented ops. Takes a minute but it's mostly GPU accelerated.
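
      In Python terms it's something like this (a sketch; the env var has to be set before torch is imported):

        import os
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must precede the torch import

        import torch

        device = "mps" if torch.backends.mps.is_available() else "cpu"
        x = torch.randn(1, 4, 64, 64, device=device)  # e.g. an SD latent tensor
        print(x.device)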

  • jwitthuhn 3 years ago

    I found this repo early on and have been using it to run inference on my M1 Pro MBP. https://github.com/ModeratePrawn/stable-diffusion-cpu

    For me it runs at about 3.5 seconds per iteration per picture at 512x512.

    There is also a fork that uses metal here and is much faster: https://github.com/magnusviri/stable-diffusion/tree/apple-si... but it doesn't support seeding the rng and will occasionally produce completely black output. Useful if you want to spit out a whole bunch of images for one prompt but you lose the ability to re-run a specific seed with a tweaked prompt or increased iterations.

    • woojoo666 3 years ago

      > For me it runs at about 3.5 seconds per iteration per picture at 512x512.

      Wow that's impressively fast, I have a relatively recent Nvidia GPU that still takes 10 seconds. And the GPU is already almost as big as the entire macbook

      • torginus 3 years ago

        I think that's per iteration, so the total time for the image is 32 times that

        • jwitthuhn 3 years ago

          Oh yeah, I may have used confusing terms there. What I meant was 3.5s per 'step'. A full image takes quite a bit longer.

          • woojoo666 3 years ago

            Ah my fault for not reading carefully. But I don't feel as bad about my big GPU anymore now

  • chipx86 3 years ago

    I'm using the fork here: https://github.com/magnusviri/stable-diffusion.git (apple-silicon-mps-support branch).

    Pretty easy to set up, though I had to take all the Homebrew stuff out of my environment before setting up the Conda environment (can also just export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1, at least in my case).

    Otherwise, I followed the normal steps to set things up, and I'm now generating 1 image every 30 seconds at default settings. This is on an M1 Max MacBook Pro with 64GB RAM.

krasi0 3 years ago

Great work! Is there a similar project for (local) text generation (NLP) on a CPU + lots of RAM? I mean something transformer-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.

  • versteegen 3 years ago

    Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on CPU (x86 with AVX-2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3] and I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for that. Doesn't surprise me as he didn't release the code for libnc either, just the parts of gpt2tc excluding libnc (libnc is released as a free binary) so someone could reimplement GPT-J and the other models themselves.

    Incidentally, he's currently leading the Large Text Compression Benchmark with a neural-network-based compressor called nncp [4], which is based on this work. It learns the transformer-based model as it goes, and the earlier versions didn't use a GPU.

    [1] https://bellard.org/libnc/

    [2] https://bellard.org/libnc/gpt2tc.html

    [3] https://textsynth.com/

    [4] http://www.mattmahoney.net/dc/text.html#1085

    • krasi0 3 years ago

      Yet another gem by the genius Fabrice B!

      I kinda understand why he would not release the source code. Perhaps he's finally decided to monetize some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that other big corps like FB have already started doing so (GPT-NeoX and OPT, as mentioned in the sibling comment by infinityio).

      • versteegen 3 years ago

        Yes, TextSynth.com is a commercial service; see pricing [1]. If his code is faster than others' (I'd certainly believe it) then it's quite valuable, and he deserves to be able to monetise it. Edit: Also, OpenAI is slashing the price of GPT-3 by 2-3x tomorrow because of "progress in making our models more efficient to run" [2].

        Also, he was/is competing for the Hutter Prize with nncp; however, he falls outside the requirements for the prize: CPU time, RAM, and most especially the rule that submissions shouldn't require a modern CPU (with AVX2) or a GPU. Otherwise he could have won it. I suspect that's actually the biggest reason he implemented libnc without GPU support initially. He has asked for the rules to be changed to allow AVX2, and I believe they eventually will be. So he won't give away the source for nncp yet, but he will have to open-source it to receive the prize.

        [1] https://textsynth.com/pricing.html

        [2] https://help.openai.com/en/articles/6485334-openai-api-prici...

  • infinityio 3 years ago

    I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite at the quality level you want.

    On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but that might be worth looking into, because they claim that their model is comparable to Davinci

    Note that for CPU inference you can't use float16 datatypes; it might error out otherwise.
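
    For example, loading GPT-J in float32 on CPU looks roughly like this (a sketch; expect ~24GB+ of RAM for the 6B model in fp32):

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
      model = AutoModelForCausalLM.from_pretrained(
          "EleutherAI/gpt-j-6B",
          torch_dtype=torch.float32,  # not float16, which errors out on CPU
      )

      inputs = tok("The trick to CPU inference is", return_tensors="pt")
      out = model.generate(**inputs, max_new_tokens=40)
      print(tok.decode(out[0]))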

    [0] https://huggingface.co/EleutherAI/gpt-j-6B

    [1] https://huggingface.co/EleutherAI/gpt-neox-20b

    [2] https://huggingface.co/facebook/opt-66b

ByThyGrace 3 years ago

What's the status of running SD on AMD GPUs?

yieldcrv 3 years ago

Where can I get up to speed on what's coming down the pipeline in this AI/ML image-making scene?

(And learn the agreed-upon terms.)

torotonnato 3 years ago

7m 12s on an ancient Intel Core i5-3350P CPU @ 3.10GHz (!), using the BERT BasicTokenizer and default arguments.

amrrs 3 years ago

On Reddit I found some older GPUs take about 5 mins, and this video [1] says 5 mins for CPU using this OpenVINO library. Not sure if OpenVINO makes CPU chips compete with GPUs. Has anyone heard of OpenVINO before?

[1] https://youtu.be/5iXhhf7ILME

Aiedail 3 years ago

I'm curious about what makes this project special, since there are a lot of similar implementations of diffusion models based on pytorch/tf. Is it because it uses the CPU itself to run the diffusion process?

  • nperez 3 years ago

    Yeah. For something like this, you ideally would want a powerful GPU with 12-24gb VRAM. If you have something like an RTX 2070 at the bare minimum, you probably don't need this and could do a lot more steps a lot faster on a GPU, but it's great for those who don't have that option.
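
    Under the hood it's the OpenVINO runtime doing the inference on CPU - roughly like this (a sketch of the openvino 2022.1 Python API; the model filename here is illustrative):

      import numpy as np
      from openvino.runtime import Core

      core = Core()
      model = core.read_model("vae_decoder.xml")  # weights live in vae_decoder.bin
      compiled = core.compile_model(model, "CPU")
      request = compiled.create_infer_request()

      latent = np.random.randn(1, 4, 64, 64).astype(np.float32)  # dummy SD latent
      result = request.infer({0: latent})  # dict keyed by output port
      print(next(iter(result.values())).shape)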

    • Scaevolus 3 years ago

      A $500 RTX 3070 with 8GB of VRAM can generate 512x512 images with 50 steps in 7 seconds.

      • ShamelessC 3 years ago

        The RTX 2070 also shipped with 8GB of VRAM, just fyi.

        • nperez 3 years ago

          Yep, 8GB works fine. The 2070 is where I started. I wouldn't consider it ideal, though. There will be cases where you'll wish you could increase the resolution a little more, or do just a few more images per batch, but you're getting CUDA out-of-memory errors.

mysterydip 3 years ago

I didn't see any requirements on the page beyond a CPU on that list. Do you need a certain amount of RAM? Will more RAM speed things up, to a degree?

aaaaaaaaaaab 3 years ago

Love this. OpenAI are livid. :^)

  • enchiridion 3 years ago

    Why?

    • aaaaaaaaaaab 3 years ago

      Because they no longer control the narrative.

      • Gigachad 3 years ago

        It does ruin the illusion that you need crazy million-dollar servers to run the model and that the world would fall into chaos if the public got their hands on these models.

        • optimalsolver 3 years ago

          To be fair, the world only just got its hands on this (and by world I mean people with decent hardware), so it's too soon to say what the ramifications will be.

          • bogwog 3 years ago

            Also to be fair, the job "AI ethicist" probably didn't exist as a real thing until a few years ago. So the people in those roles over at OpenAI likely have no idea what they're doing.

        • adultSwim 3 years ago

          We wouldn't be able to run it ourselves if they hadn't trained it on 4000 GPUs for a month.

          • xtreme 3 years ago

            The cost of training is actually quite a bit less. Emad, the creator of SD, stated this on Twitter:

            "We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k"

            • boppo1 3 years ago

              Christ, so what happens when google throws a cheeky 10 million at a model?

            • astrange 3 years ago

              Even if it were hard to train, you could make your own by fine-tuning a larger model for much cheaper.

              That's what's called a "base model" (or "foundation model" if you're Stanford trying to co-opt the term).

              • DoctorOetker 3 years ago

                Suppose one has an idea for a different architecture / functional form, etc. Assuming the receiving model is substantially smaller, so that the dominant computational cost is in the SD model, how long would effective knowledge distillation take on, say, a CPU?

                • astrange 3 years ago

                  That’s called teacher-student learning. It could still take weeks on a single machine easily, but renting more GPU time or getting free credits from somewhere is perfectly plausible.

boppo1 3 years ago

The most powerful device I have is an M1 iPad Pro with 16GB RAM. Can I run this on that thing at all?

hustwindmaple1 3 years ago

It is noticeably faster than the original model (~30-40%) on my machine.

motoboi 3 years ago

openvino is an unsung hero.

polskibus 3 years ago

Can't get it to install requirements on Windows with Python 3.10 and MS Build Tools 2022. Any tips?

  • desindol 3 years ago

    It needs python 3.9.

  • manyone1 3 years ago

    Python 3.10 will fail on openvino. I used these steps:

    anaconda prompt

    cd to the destination folder

    conda create --name py38 python=3.8

    conda activate py38

    conda update --all

    conda install openvino-ie4py -c intel

    pip install -r requirements.txt

    I also had to edit stable_diffusion.py: in the #decoder area, I changed vae.xml and vae.bin to vae_decoder.xml and vae_decoder.bin respectively.

    From there I could run:

    python stable_diffusion.py --prompt "Street-art painting of Emma Stone dancing, in picasso style"

    For img2img, use this (note: a DIFFERENT program):

    python demo.py --prompt "astronaut with jetpack floating in space with earth below" --init-image ./data/jomar.jpg --strength 0.5

avocado2 3 years ago
  • skybrian 3 years ago

    Or if you don’t want to tweak anything, just use the hosted version? https://beta.dreamstudio.ai/

    • esperent 3 years ago

      It's pretty expensive. They give you 100 free credits, but I burned through that in about 10 minutes just trying to figure out how things worked. Didn't get any nice images.

      After that, it's $1 per 100 credits, so about $6 an hour maybe.

      • skybrian 3 years ago

        I put $10 in and it’s lasted a long time, but I don’t generate images every day. Cheaper than getting a good graphics card, anyway.

      • ConceptJunkie 3 years ago

        One credit is one regular-sized picture generation. If you want to make it bigger or run more steps (in my experience, this is less useful than it would seem), then it can run up to 3 or more credits per generation. That's still 3 cents a picture. Compared to Dall-E and even Midjourney, that's crazy cheap.

    • TylerE 3 years ago

      Because I’ve got a 3080 already and don’t want to spend money?

      • skybrian 3 years ago

        Yep. Not everyone does, though, and the paid service is quite convenient.

  • MitPitt 3 years ago

    Why do you keep posting the same comment under every SD post? It doesn't contribute to the discussion, and it's not very relevant to the OP. Some of the links don't even work anymore.

stateoff 3 years ago

Based on the FAQ (https://laion.ai/faq/) of the dataset that was used for training https://huggingface.co/spaces/stabilityai/stable-diffusion:

   LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.

I love the "*simply*", but doesn't it mean that (depending on country, laws etc., but generally):

1. The LAION group committed possible copyright infringement and even left undeniable evidence that they did - on top of their written testimony (dumping the "stolen goods" into the river does not undo the infringement, does it?)

2. Any model trained on the "linked" data may commit copyright infringement.

3. As a consequence, you may be liable if you use the generated images.

I always wonder how this can possibly be legal at all - considering that as a human artist, if I were to copy material and remix it without proper permission, I would be liable (again, depending on the situation). But suddenly ML is around the corner and it's all great, and now you can keep remixing the potentially problematic output further - no questions asked!?

I guess there are no precedents yet, but why should an automaton/software (and its creators) be judged differently from persons? I don't want to spoil the fun, but what am I missing?

Also disappointed that this dataset did not make sure to only collect unproblematic content, e.g. under Creative Commons licenses that allow remixing. It would be a hell of an attribution list, but definitely better than what is presented here.

EDIT: Formatting

EDIT2: I actually followed one of the projects mentioned, not the linked repository. Clarified above.

  • pdntspa 3 years ago

    Is feeding copyrighted material into an AI really copyright infringement?

    • moosedev 3 years ago

      I don't know, but it's quite entertaining when the output occasionally has a corrupted, but recognizable, Getty Images watermark: https://imgur.com/SmibVME

      (Prompt: "A horse delivering mail in New York City, 1870")

      • actionfromafar 3 years ago

        Those are the scariest horses I have ever seen.

    • cbozeman 3 years ago

      If it is, then every human brain is guilty of copyright infringement.

  • postalrat 3 years ago

    Is training the model infringement or is distributing the model infringement?

    What if you trained the model and only distributed generated images?

    Is a human making art "in the style of" also infringement?

    • incrudible 3 years ago

      Legally, it is uncharted territory on many levels. I think there are good arguments to be made that these systems violate the intent behind copyright and trademarks, but not necessarily the laws.

  • nl 3 years ago

    It's not simply a given that using copyrighted material to train a model is copyright violation.

    In my view it isn't. No one image contributes a significant amount, and the process the machine performs is analogous to what a human does when learning.

    • incrudible 3 years ago

      It is likely legal, but is it ethical? If it is not ethical, should it be legal?

      We do tend to treat humans differently based on them being sentient beings with a limited lifespan, not machines.

      • nl 3 years ago

        I strongly (actually very strongly) feel that it is ethical.

        I also feel that the act of producing plagiarized content is unethical and immoral, and I'd be supportive of new concepts in intellectual property law that would make it illegal too.

  • bogwog 3 years ago

    I'm all for having these models scrutinized for copyright violations (and possibly amending copyright laws), but this comment is nothing but low-effort FUD.

  • astrange 3 years ago

    Bad cases make bad law - if you argue too hard in the direction of "any copyrighted material in the AI's training set makes it copyrighted" this could lead to, say, "Disney owns any animated movie made by someone who watched a Disney movie".

    You can make an AI that doesn't memorize a specific training input; similarly you could probably make one that intentionally memorizes them. Both of these seem useful.

  • MattRix 3 years ago

    If these AIs were actually just "remixing" and creating collages, then perhaps I would agree with you... but there is no exact pixel data stored here. This is fairly obvious when you consider that Stable Diffusion was trained on 100 terabytes of images, yet the actual model file is 4GB.

    Now I'm not saying that nothing created by these AIs should be considered copyright infringement. As a human artist, you are not judged on your process; you are judged on the end results. The same should be done for the works created by these AIs.