Wow, yeah, I found a demo here: https://huggingface.co/spaces/Qwen/Qwen2.5
A whole host of LLMs seems to have been released. Thanks for the tip!
I’ll see if I can turn them into something useful 👍
That’s good to know. I’ll try them out. Thanks.
Hmm. I mean, the FLUX model looks good, so maybe there's some magic in the T5?
I have no clue, so any insights are welcome.
T5 Huggingface: https://huggingface.co/docs/transformers/model_doc/t5
T5 paper : https://arxiv.org/pdf/1910.10683
Any suggestions on what LLM I ought to use instead of T5?
Good find! Fixed. Much appreciated.
Fair enough
I get it. I hope you don’t interpret this as arguing against results etc.
What I want to say is:
If implemented correctly, the same seed does give the same output for a given prompt.
If there is variation, then something in the pipeline must be approximating things.
This may be good (for performance), or it may be bad.
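Not anyone's actual pipeline code, just a stdlib sketch of the principle: a seeded RNG is deterministic, so if every later step is exact, the same seed must yield the same image. (Real pipelines seed a tensor RNG; `latent_noise` here is an invented stand-in.)

```python
import random

def latent_noise(seed, n=16):
    # Stand-in for the "random" starting noise a diffusion pipeline
    # derives from the seed.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed -> identical starting noise -> identical image,
# provided every later step is deterministic.
print(latent_noise(42) == latent_noise(42))  # True
# A different seed gives different noise; any variation with the SAME
# seed must come from an approximation elsewhere in the pipeline.
print(latent_noise(42) == latent_noise(43))  # False
```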
You are 100% correct in highlighting this issue to the dev.
Though it's not a legal document or a science paper.
Just a guide to explain seeds to newbies.
Omitting non-essential information for the sake of making the concept clearer can be good too.
The Perchance dev is correct here, Allo:
the same seed will generate the exact same picture.
If you see variety, it will be due to factors outside the SD model. That stuff happens.
But it’s good that you fact check stuff.
Do you know where I can find documentation on the Perchance API?
Specifically createPerchanceTree?
I need to know which functions there are, and what inputs/outputs they take.
Thanks! I appreciate the support. Helps a lot to know where to start looking ( ; v ;)b!
New stuff
Paper: https://arxiv.org/abs/2303.03032
Takes only a few seconds to calculate.
Most similar suffix tokens: "vfx "
Most similar prefix tokens: "imperi-"
I compute casualty_rate = number_shot / (number_shot + number_subdued)
Which in this case is 22/64 = 34% casualty rate for civilians
and 98/131 = 75% casualty rate for police
So it's 64 vs. 131 between work done by bystanders and work done by police?
And casualty rate is actually lower for bystanders doing the work (with their guns) than the police?
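The arithmetic above as a sketch (numbers copied from the thread; the subdued counts are back-computed from the stated totals, 64 - 22 = 42 and 131 - 98 = 33):

```python
def casualty_rate(number_shot, number_subdued):
    # Fraction shot among everyone engaged (shot + subdued).
    return number_shot / (number_shot + number_subdued)

# Bystanders: 22 shot out of 64 total -> 42 subdued.
bystanders = casualty_rate(22, 42)
# Police: 98 shot out of 131 total -> 33 subdued.
police = casualty_rate(98, 33)

print(round(bystanders * 100))  # 34
print(round(police * 100))      # 75
```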
This is how the notebook works:
Similar vectors = similar output in the SD 1.5 / SDXL / FLUX models
CLIP converts the prompt text to vectors ("tensors"), with float32 values usually ranging from -1 to 1.
Dimensions are [1x768] tensors for SD 1.5, and [1x768, 1x1024] tensors for SDXL and FLUX.
The SD models and FLUX convert these vectors to an image.
This notebook takes an input string, tokenizes it, and matches the first token against the 49407 token vectors in the vocab.json: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer
It finds the "most similar tokens" in the list. Similarity is the angle theta between the token vectors.
The angle is calculated using cosine similarity, where 1 = 100% similarity (parallel vectors) and 0 = 0% similarity (perpendicular vectors).
Negative similarity is also possible.
So if you are bored of prompting "girl" and want something similar, you can run this notebook and use the "chick</w>" token at 21.88% similarity, for example.
You can also run a mixed search, like "cute+girl"/2, where for example "kpop</w>" has 16.71% similarity.
Sidenote: prompt weights like (banana:1.2) will scale the magnitude of the corresponding 1x768 tensor(s) by 1.2.
Source: https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts
TL;DR: vector direction = "what to generate", vector magnitude = "prompt weights".
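A toy sketch of what the notebook does, with made-up 4-dim vectors standing in for the real 768-dim CLIP embeddings (the token names match the thread; all vector values are invented for illustration):

```python
import math

def cos_sim(a, b):
    # Cosine similarity: 1 = parallel, 0 = perpendicular, can be negative.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend vocab: token -> embedding (real notebook loads ~49k of these
# from the tokenizer's vocab.json).
vocab = {
    "girl</w>": [0.9, 0.1, 0.0, 0.2],
    "chick</w>": [0.7, 0.3, 0.1, 0.2],
    "banana</w>": [-0.2, 0.8, 0.5, 0.0],
}

query = vocab["girl</w>"]
ranked = sorted(vocab.items(), key=lambda kv: cos_sim(query, kv[1]), reverse=True)
print(ranked[0][0])  # girl</w> -- the query token is its own best match

# Mixed search "cute+girl"/2 = element-wise average of two embeddings.
cute = [0.5, 0.4, 0.3, 0.1]
mixed = [(c + g) / 2 for c, g in zip(cute, query)]

# Prompt weight (banana:1.2) = scale the magnitude by 1.2; the direction,
# and therefore the cosine similarity, is unchanged.
weighted = [1.2 * v for v in vocab["banana</w>"]]
assert abs(cos_sim(weighted, vocab["banana</w>"]) - 1.0) < 1e-9
```

This also illustrates the TL;DR: scaling a vector changes only its magnitude (the "weight"), never its direction (the "what").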
Nice! Thanks. Yeah, I realize Lemmy is a really good place to keep things organized.
I can’t speculate.
If you feel up for the task, I'd suggest running prompts that use Euler a at 20 steps for a given seed with that model and seeing if the results match images on the Perchance site.
If they do, then we know the furry model = Pony Diffusion.
(Though IIRC the furry model on Perchance existed before Pony Diffusion.)
Aha. So what you wanted to say was that "Starlight" and/or "Glimmer" are trigger words for the furry model. Gotcha!
Those are both the furry model tho?
Simple and cool.
Florence 2 image captioning sounds interesting to use.
Do people know of any other image-to-text models (apart from CLIP)?