Commit Graph

161 Commits

Author SHA1 Message Date
Wagner Bruna
aa44e06890 fix: avoid crash with LoRAs and type override (#974) 2025-11-16 14:47:36 +08:00
leejet
347710f68f feat: support applying LoRA at runtime (#969) 2025-11-13 21:48:44 +08:00
leejet
694f0d9235 refactor: optimize the logic for name conversion and the processing of the LoRA model (#955) 2025-11-10 00:12:20 +08:00
stduhpf
8ecdf053ac feat: add image preview support (#522) 2025-11-10 00:12:02 +08:00
leejet
ee89afc878 fix: resolve issue with pmid (#957) 2025-11-09 22:47:53 +08:00
akleine
d2d3944f50 feat: add support for SD2.x with TINY U-Nets (#939) 2025-11-09 22:47:37 +08:00
akleine
0fa3e1a383 fix: prevent core dump in PM V2 in case of incomplete cmd line (#950) 2025-11-09 22:36:43 +08:00
stduhpf
fb748bb8a4 fix: TAE encoding (#935) 2025-11-07 22:58:59 +08:00
leejet
8f6c5c217b refactor: simplify the model loading logic (#933)
* remove String2GGMLType

* remove preprocess_tensor

* fix clip init

* simplify the logic for reading weights
2025-11-03 21:21:34 +08:00
leejet
6103d86e2c refactor: introduce GGMLRunnerContext (#928)
* introduce GGMLRunnerContext

* add Flash Attention enable control through GGMLRunnerContext

* add conv2d_direct enable control through GGMLRunnerContext
2025-11-02 02:11:04 +08:00
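The GGMLRunnerContext refactor above can be pictured as a small per-run context object that carries feature toggles instead of globals. This is a hypothetical sketch based only on the commit message of #928; the field and function names are assumptions, not the actual sd.cpp API.

```cpp
#include <cassert>

// Hypothetical sketch: a context passed to each GGML runner so backend
// features can be toggled per run rather than through global state.
// Field names are guesses from the commit message of #928.
struct GGMLRunnerContext {
    bool flash_attn_enabled = false;  // Flash Attention on/off per runner
    bool conv2d_direct      = false;  // direct conv2d kernel on/off
};

// A runner consults the context instead of a global flag.
inline bool use_flash_attention(const GGMLRunnerContext& ctx) {
    return ctx.flash_attn_enabled;
}
```

With this shape, enabling Flash Attention for one runner is a per-context change and cannot leak into other runners.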
stduhpf
c42826b77c fix: resolve multiple inpainting issues (#926)
* Fix inpainting masked image being broken by side effect

* Fix unet inpainting concat not being set

* Fix Flex.2 inpaint mode crash (+ use scale factor)
2025-11-02 02:10:32 +08:00
leejet
dd75fc081c refactor: unify the naming style of ggml extension functions (#921) 2025-10-28 23:26:48 +08:00
Wagner Bruna
8a45d0ff7f chore: clean up stb includes (#919) 2025-10-28 23:25:45 +08:00
leejet
9e28be6479 feat: add chroma radiance support (#910)
* add chroma radiance support

* fix ci

* simplify generate_init_latent

* workaround: avoid ggml cuda error

* format code

* add chroma radiance doc
2025-10-25 23:56:14 +08:00
akleine
062490aa7c feat: add SSD1B and tiny-sd support (#897)
* feat: add code and doc for running SSD1B models

* Added some more lines to support SD1.x with TINY U-Nets too.

* support SSD-1B.safetensors

* fix sdv1.5 diffusers format loader

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-10-25 23:35:54 +08:00
stduhpf
917f7bfe99 fix: support --flow-shift for flux models with default pred (#913) 2025-10-23 21:35:18 +08:00
leejet
48e0a28ddf feat: add shift factor support (#903) 2025-10-23 01:20:29 +08:00
leejet
d05e46ca5e chore: add .clang-tidy configuration and apply modernize checks (#902) 2025-10-18 23:23:40 +08:00
leejet
90ef5f8246 feat: add auto-resize support for reference images (was Qwen-Image-Edit only) (#898) 2025-10-18 16:37:09 +08:00
leejet
db6f4791b4 feat: add wtype stat (#899) 2025-10-17 23:40:32 +08:00
leejet
40a6a8710e fix: resolve precision issues in SDXL VAE under fp16 (#888)
* fix: resolve precision issues in SDXL VAE under fp16

* add --force-sdxl-vae-conv-scale option

* update docs
2025-10-15 23:01:00 +08:00
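The conv-scale fix above relies on convolution being linear: scaling the weights down by a factor keeps fp16 intermediates away from overflow, and rescaling the output restores the original result. A minimal sketch of the idea, using a plain dot product as a stand-in for the convolution; the factor 0.125 is illustrative, not the value used by `--force-sdxl-vae-conv-scale`.

```cpp
#include <cmath>
#include <vector>

// Sketch: scale weights by `scale` during accumulation (keeping partial
// sums small enough for fp16), then divide the result by `scale` to
// recover the unscaled answer. dot() stands in for a conv2d kernel.
float scaled_dot(const std::vector<float>& w,
                 const std::vector<float>& x,
                 float scale) {
    float acc = 0.0f;
    for (size_t i = 0; i < w.size(); ++i)
        acc += (w[i] * scale) * x[i];  // scaled accumulation
    return acc / scale;                // undo the scale on the way out
}
```

Because the operation is linear, the rescaled output matches the unscaled computation up to rounding.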
Daniele
e3702585cb feat: added prediction argument (#334) 2025-10-15 23:00:10 +08:00
leejet
2e9242e37f feat: add Qwen Image Edit support (#877)
* add ref latent support for qwen image

* optimize clip_preprocess and fix get_first_stage_encoding

* add qwen2vl vit support

* add qwen image edit support

* fix qwen image edit pipeline

* add mmproj file support

* support dynamic number of Qwen image transformer blocks

* set prompt_template_encode_start_idx every time

* to_add_out precision fix

* to_out.0 precision fix

* update docs
2025-10-13 23:17:18 +08:00
Wagner Bruna
c64994dc1d fix: better progress display for second-order samplers (#834) 2025-10-13 22:12:48 +08:00
Wagner Bruna
5436f6b814 fix: correct canny preprocessor (#861) 2025-10-13 22:02:35 +08:00
leejet
1c32fa03bc fix: avoid generating black images when running T5 on the GPU (#882) 2025-10-13 00:01:06 +08:00
Wagner Bruna
9727c6bb98 fix: resolve VAE tiling problem in Qwen Image (#873) 2025-10-12 23:45:53 +08:00
leejet
beb99a2de2 feat: add Qwen Image support (#851)
* add qwen tokenizer

* add qwen2.5 vl support

* mv qwen.hpp -> qwenvl.hpp

* add qwen image model

* add qwen image t2i pipeline

* fix qwen image flash attn

* add qwen image i2i pipeline

* change encoding of vocab_qwen.hpp to utf8

* fix get_first_stage_encoding

* apply jeffbolz f32 patch

https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302

* fix the issue that occurs when using CUDA with k-quants weights

* optimize the handling of the FeedForward precision fix

* to_add_out precision fix

* update docs
2025-10-12 23:23:19 +08:00
Wagner Bruna
aa68b875b9 refactor: deal with default img-cfg-scale at the library level (#869) 2025-10-12 23:17:52 +08:00
stduhpf
11f436c483 feat: add support for Flux Controls and Flex.2 (#692) 2025-10-11 00:06:57 +08:00
Wagner Bruna
f3140eadbb fix: tensor loading thread count (#854) 2025-09-25 00:26:38 +08:00
leejet
fd693ac6a2 refactor: remove unused --normalize-input parameter (#835) 2025-09-18 00:12:53 +08:00
Wagner Bruna
171b2222a5 fix: avoid segfault for pix2pix models without reference images (#766)
* fix: avoid segfault for pix2pix models with no reference images

* fix: default to empty reference on pix2pix models to avoid segfault

* use resize instead of reserve

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-18 00:11:38 +08:00
Erik Scholz
8909523e92 refactor: move tiling calc and debug print into the tiling code branch (#833) 2025-09-16 22:46:56 +08:00
rmatif
8376dfba2a feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675)
* feat: Add timestep shift and two new schedulers

* update readme

* fix spaces

* format code

* simplify SGMUniformSchedule

* simplify shifted_timestep logic

* avoid conflict

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-16 22:42:09 +08:00
leejet
0ebe6fe118 refactor: simplify the logic of pm id image loading (#827) 2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98 feat: optimize tensor loading time (#790)
* opt tensor loading

* fix build failure

* revert the changes

* allow the use of n_threads

* fix lora loading

* optimize lora loading

* add mutex

* use atomic

* fix build

* fix potential duplicate issue

* avoid duplicate lookup of lora tensor

* fix progress bar

* remove unused remove_duplicates

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 22:48:35 +08:00
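The bullet list above (mutex, then atomic) suggests the common pattern for parallel tensor loading: worker threads claim work items from a shared atomic counter, so no two threads load the same tensor and no lock is held on the hot path. A self-contained sketch under that assumption; the real loader in #790 does file I/O where this marks an element.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Sketch of lock-free work claiming: each worker grabs the next tensor
// index with fetch_add, so indices are handed out exactly once.
void load_all(std::vector<int>& tensors, int n_threads) {
    std::atomic<size_t> next{0};
    auto worker = [&]() {
        for (;;) {
            size_t i = next.fetch_add(1);
            if (i >= tensors.size()) break;
            tensors[i] = 1;  // placeholder for reading one tensor
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

A per-thread progress counter can be derived from the same atomic, which is presumably how the progress-bar fix in the list above fits in.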
leejet
52a97b3ac1 feat: add vace support (#819)
* add wan vace t2v support

* add --vace-strength option

* add vace i2v support

* fix the processing of vace_context

* add vace v2v support

* update docs
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594 feat: add VAE encoding tiling support and adaptive overlap (#484)
* implement tiling vae encode support

* Tiling (vae/upscale): adaptive overlap

* Tiling: fix edge case

* Tiling: fix crash when less than 2 tiles per dim

* remove extra dot

* Tiling: fix edge cases for adaptive overlap

* tiling: fix edge case

* set vae tile size via env var

* vae tiling: refactor again, base on smaller buffer for alignment

* Use bigger tiles for encode (to match compute buffer size)

* Fix edge case when tile is bigger than latent

* non-square VAE tiling (#3)

* refactor tile number calculation

* support non-square tiles

* add env var to change tile overlap

* add safeguards and better error messages for SD_TILE_OVERLAP

* add safeguards and include overlapping factor for SD_TILE_SIZE

* avoid rounding issues when specifying SD_TILE_SIZE as a factor

* lower SD_TILE_OVERLAP limit

* zero-init empty output buffer

* Fix decode latent size

* fix encode

* tile size params instead of env

* Tiled vae parameter validation (#6)

* avoid crash with invalid tile sizes, use 0 for default

* refactor default tile size, limit overlap factor

* remove explicit parameter for relative tile size

* limit encoding tile to latent size

* unify code style and format code

* update docs

* fix get_tile_sizes in decode_first_stage

---------

Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 16:00:29 +08:00
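The tiling work above repeatedly fixes edge cases (last tile past the latent, fewer than two tiles per dimension). The core placement problem can be sketched in one dimension: space tiles so consecutive ones overlap by at least a minimum, and clamp the final tile to the edge. This is an illustrative sketch, not the actual sd.cpp tiling code or its `SD_TILE_SIZE`/`SD_TILE_OVERLAP` handling.

```cpp
#include <vector>

// Return the start offsets of overlapping tiles covering [0, extent).
// Consecutive tiles overlap by at least `min_overlap`; the last tile is
// clamped so it never runs past the edge (the "tile bigger than latent"
// case degenerates to a single tile at 0).
std::vector<int> tile_starts(int extent, int tile, int min_overlap) {
    if (tile >= extent) return {0};   // one tile covers everything
    std::vector<int> starts;
    int stride = tile - min_overlap;  // assume 0 < min_overlap < tile
    for (int x = 0;; x += stride) {
        if (x + tile >= extent) {     // clamp final tile to the edge
            starts.push_back(extent - tile);
            break;
        }
        starts.push_back(x);
    }
    return starts;
}
```

For example, covering an extent of 10 with tiles of 4 and overlap 1 yields starts 0, 3, 6, so the last tile ends exactly at 10.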
leejet
dc46993b55 feat: increase work_ctx memory buffer size (#814) 2025-09-14 13:19:20 +08:00
Wagner Bruna
c607fc3ed4 feat: use Euler sampling by default for SD3 and Flux (#753)
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18 fix: do not force VAE type to f32 on SDXL (#716)
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not needed for the
fixed sdxl-vae-fp16-fix.
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4 fix: make weight override more robust against ggml changes (#760) 2025-09-14 12:15:53 +08:00
Wagner Bruna
48956ffb87 feat: reduce CLIP memory usage with no embeddings (#768) 2025-09-14 12:08:00 +08:00
leejet
fce6afcc6a feat: add sd3 flash attn support (#815) 2025-09-11 23:24:29 +08:00
Erik Scholz
49d6570c43 feat: add SmoothStep Scheduler (#813) 2025-09-11 23:17:46 +08:00
clibdev
87cdbd5978 feat: use log_printf to print ggml logs (#545) 2025-09-11 22:16:05 +08:00
leejet
b017918106 chore: remove sd3 flash attention warn (#812) 2025-09-10 22:21:02 +08:00
Wagner Bruna
ac5a215998 fix: use {} for params init instead of memset (#781) 2025-09-10 21:49:29 +08:00
Wagner Bruna
abb36d66b5 chore: update flash attention warnings (#805) 2025-09-10 21:38:21 +08:00