Wow, yeah, I found a demo here: https://huggingface.co/spaces/Qwen/Qwen2.5
A whole host of LLMs seems to have been released. Thanks for the tip!
I’ll see if I can turn them into something useful 👍
That’s good to know. I’ll try them out. Thanks.
Hmm. I mean, the FLUX model looks good, so maybe there's some magic in the T5?
I have no clue, so any insights are welcome.
T5 Huggingface: https://huggingface.co/docs/transformers/model_doc/t5
T5 paper : https://arxiv.org/pdf/1910.10683
Any suggestions on what LLM I ought to use instead of T5?
Good find! Fixed. Much appreciated.
Fair enough
I get it. I hope you don’t interpret this as arguing against results etc.
What I want to say is:
If implemented correctly, the same seed does give the same output for a given prompt.
If there is variation, then something in the pipeline must be approximating things.
This may be good (for performance), or it may be bad.
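Not anyone's actual pipeline code, just a stdlib sketch of the principle: a seeded RNG is deterministic, so if every later step is exact, the same seed must yield the same image. (Real pipelines seed a tensor RNG; `latent_noise` here is an invented stand-in.)

```python
import random

def latent_noise(seed, n=16):
    # Stand-in for the "random" starting noise a diffusion pipeline
    # derives from the seed.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed -> identical starting noise -> identical image,
# provided every later step is deterministic.
print(latent_noise(42) == latent_noise(42))  # True
# A different seed gives different noise; any variation with the SAME
# seed must come from an approximation elsewhere in the pipeline.
print(latent_noise(42) == latent_noise(43))  # False
```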
You are 100% correct in highlighting this issue to the dev.
Though it's not a legal document or a science paper.
Just a guide to explain seeds to newbies.
Omitting non-essential information for the sake of making the concept clearer can be good too.
The Perchance dev is correct here, Allo:
the same seed will generate the exact same picture.
If you see variety, it will be due to factors outside the SD model. That stuff happens.
But it’s good that you fact check stuff.
Do you know where I can find documentation on the Perchance API?
Specifically createPerchanceTree?
I need to know which functions there are, and what inputs/outputs they take.
Thanks! I appreciate the support. Helps a lot to know where to start looking ( ; v ;)b!
New stuff
Paper: https://arxiv.org/abs/2303.03032
Takes only a few seconds to calculate.
Most similar suffix tokens: "vfx "
Most similar prefix tokens: "imperi-"
I compute casualty_rate = number_shot / (number_shot + number_subdued)
Which in this case is 22/64 = 34% casualty rate for civilians
and 98/131 = 75% casualty rate for police
So it's 64 vs. 131 between work done by bystanders and work done by police?
And casualty rate is actually lower for bystanders doing the work (with their guns) than the police?
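The arithmetic above as a sketch (numbers copied from the thread; the subdued counts are back-computed from the stated totals, 64 - 22 = 42 and 131 - 98 = 33):

```python
def casualty_rate(number_shot, number_subdued):
    # Fraction shot among everyone engaged (shot + subdued).
    return number_shot / (number_shot + number_subdued)

# Bystanders: 22 shot out of 64 total -> 42 subdued.
bystanders = casualty_rate(22, 42)
# Police: 98 shot out of 131 total -> 33 subdued.
police = casualty_rate(98, 33)

print(round(bystanders * 100))  # 34
print(round(police * 100))      # 75
```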
This is how the notebook works:
Similar vectors = similar output in the SD 1.5 / SDXL / FLUX models
CLIP converts the prompt text to vectors ("tensors"), with float32 values usually ranging from -1 to 1.
Dimensions are [1x768] tensors for SD 1.5, and [1x768, 1x1024] tensors for SDXL and FLUX.
The SD models and FLUX convert these vectors to an image.
This notebook takes an input string, tokenizes it, and matches the first token against the 49407 token vectors in the vocab.json: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer
It finds the "most similar tokens" in the list. Similarity is the angle theta between the token vectors.
The angle is calculated using cosine similarity, where 1 = 100% similarity (parallel vectors) and 0 = 0% similarity (perpendicular vectors).
Negative similarity is also possible.
So if you are bored of prompting "girl" and want something similar, you can run this notebook and use the "chick</w>" token at 21.88% similarity, for example.
You can also run a mixed search, like "cute+girl"/2, where for example "kpop</w>" has 16.71% similarity.
Sidenote: prompt weights like (banana:1.2) will scale the magnitude of the corresponding 1x768 tensor(s) by 1.2.
Source: https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts
TL;DR: vector direction = "what to generate", vector magnitude = "prompt weights".
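A toy sketch of what the notebook does, with made-up 4-dim vectors standing in for the real 768-dim CLIP embeddings (the token names match the thread; all vector values are invented for illustration):

```python
import math

def cos_sim(a, b):
    # Cosine similarity: 1 = parallel, 0 = perpendicular, can be negative.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend vocab: token -> embedding (real notebook loads ~49k of these
# from the tokenizer's vocab.json).
vocab = {
    "girl</w>": [0.9, 0.1, 0.0, 0.2],
    "chick</w>": [0.7, 0.3, 0.1, 0.2],
    "banana</w>": [-0.2, 0.8, 0.5, 0.0],
}

query = vocab["girl</w>"]
ranked = sorted(vocab.items(), key=lambda kv: cos_sim(query, kv[1]), reverse=True)
print(ranked[0][0])  # girl</w> -- the query token is its own best match

# Mixed search "cute+girl"/2 = element-wise average of two embeddings.
cute = [0.5, 0.4, 0.3, 0.1]
mixed = [(c + g) / 2 for c, g in zip(cute, query)]

# Prompt weight (banana:1.2) = scale the magnitude by 1.2; the direction,
# and therefore the cosine similarity, is unchanged.
weighted = [1.2 * v for v in vocab["banana</w>"]]
assert abs(cos_sim(weighted, vocab["banana</w>"]) - 1.0) < 1e-9
```

This also illustrates the TL;DR: scaling a vector changes only its magnitude (the "weight"), never its direction (the "what").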
Nice! Thanks. Yeah, I realize Lemmy is a really good place to keep things organized.
I can’t speculate.
If you feel up for the task, I'd suggest running prompts that use Euler a at 20 steps for a given seed with that model and seeing if the results match images on the Perchance site.
If they do, then we know the furry model = Pony Diffusion.
(Though IIRC the furry model on Perchance existed before Pony Diffusion.)
Aha. So what you wanted to say was that "Starlight" and/or "Glimmer" are trigger words for the furry model. Gotcha!
Those are both the furry model tho?
Simple and cool.
Florence 2 image captioning sounds interesting to use.
Do people know of any other image-to-text models (apart from CLIP)?