  • I do SDXL generation in 4GB of VRAM, at an extreme cost in speed, by stacking a number of memory optimizations.
    I’ve done this kind of thing since SD 1.4, for the fun of it. I like to see how low I can push VRAM use.

    SDXL takes around 3 to 4 minutes per generation, including the refiner, but it works within those constraints.
    The graphics cards used are hilariously bad for the task: a 1050 Ti with 4GB and a 1060 with 3GB of VRAM.

    I have an implementation running on the 3GB card, inside a Podman container, with no RAM offloading, 1 vCPU, and 4GB of RAM.
    The graphical UI (Streamlit) runs on a laptop outside the server to save resources.

    I’m working on an example implementation of SDXL as we speak, and also on SDXL generation on mobile.
    That is the reason I’ve looked into this news: SSD-1B might be a good candidate for my dumb experiments.
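
    For reference, here is a minimal sketch of the usual low-VRAM knobs in Hugging Face diffusers. My actual stack and flag combination may differ per card, and the model ID and prompt below are just placeholders:

    ```python
    # Hypothetical sketch of common low-VRAM options in diffusers;
    # the exact combination used on each card is not specified above.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")  # or skip this and use enable_sequential_cpu_offload() instead
    pipe.enable_attention_slicing()  # compute attention in chunks to cap peak VRAM
    pipe.enable_vae_slicing()        # decode latents one slice at a time
    pipe.enable_vae_tiling()         # tile the VAE for larger images
    # Most aggressive option, trades a lot of speed for VRAM by moving
    # submodules to system RAM between uses:
    # pipe.enable_sequential_cpu_offload()

    image = pipe("a red bicycle on a beach", num_inference_steps=30).images[0]
    image.save("out.png")
    ```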

  • Absolutely stellar write-up. Thank you!

    I have a couple of questions.
    Imagine I have a powerful consumer GPU to throw at this solution, a 4090 Ti for the sake of example.
    - How many containers can share one physical card, taking into account that the total VRAM will not be exceeded?
    - What does one virtual GPU look like from inside the container? Can I run standard stuff like PyTorch, TensorFlow, and CUDA code in general?
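
    As a concrete version of the second question, this is the kind of probe I would run from inside a container (assuming the GPU is passed through with something like the NVIDIA Container Toolkit) to see what the frameworks actually observe:

    ```python
    # Hypothetical probe: a passed-through GPU should look like an
    # ordinary CUDA device to standard frameworks inside the container.
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(props.name)                         # the physical card's name
        print(props.total_memory / 2**30, "GiB")  # VRAM visible in this container
    else:
        print("No CUDA device visible in this container.")
    ```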

  • While designing a similar classifier, I considered giving it the whole thread as “context” of sorts.
    Not just the parent comment, but the whole thread up to the original post.

    I abandoned the idea.
    A comment must stand on its own, and context would have limited the results, at least the way I was planning to do it.
    I might be very wrong, so your insight into this would be very helpful.

    My original idea was to go recursively through the thread and test each comment individually.
    Then I would adjust the actual comment’s results with the combined results of its parents.
    No context during inference, just one comment at a time.

    For example, consider the thread OP -> C1 -> C2 -> C3.
    I want to determine whether C3 is toxic in the context of C2, C1, and the OP.
    So: test C3, test C2, test C1, test OP, and save the results.
    My current model returns scores in several fields (“toxic”, “severe toxic”, “obscene”, “threat”, “insult”, and “identity hate”).
    The idea was then to combine the per-comment results into a final result for C3.
    The model takes milliseconds per test with few resources used, so this would hold up even for very large threads, though I’d cap the depth to bound answer time.

    How to combine them? I haven’t figured that out yet, but it would be post-hoc manipulation of the results rather than adding context at inference time; a rough sketch of one possible rule follows.
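
    A minimal sketch of one candidate rule. The decaying weight and the max-blend are my assumptions, not something I’ve settled on, and classify() is a stand-in for the real model:

    ```python
    # Hypothetical sketch: score every comment in the thread independently,
    # then blend ancestor scores into the target comment with a decaying weight.
    from typing import Dict, List

    LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

    def classify(text: str) -> Dict[str, float]:
        """Stand-in for the real per-comment model (milliseconds per call)."""
        return {label: 0.0 for label in LABELS}  # replace with actual inference

    def contextual_scores(thread: List[str], decay: float = 0.5) -> Dict[str, float]:
        """thread is ordered OP -> C1 -> ... -> Cn; returns final scores for Cn.

        Each ancestor can only raise a label, never lower it, and its influence
        shrinks geometrically with distance (decay is an assumed knob).
        """
        per_comment = [classify(text) for text in thread]
        final = dict(per_comment[-1])              # the comment's own scores
        weight = decay
        for scores in reversed(per_comment[:-1]):  # nearest parent first
            for label in LABELS:
                final[label] = max(final[label], weight * scores[label])
            weight *= decay
        return final

    # Example: thread OP -> C1 -> C2 -> C3, asking about C3.
    print(contextual_scores(["OP text", "C1 text", "C2 text", "C3 text"]))
    ```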

    Edit: Is there any way you could point me to examples that are difficult to classify? That would be a nice real-world test for my stuff.
    The current iteration of the model is very new and has not been tested in the wild.