leejet
5498cc0d67
feat: add Wan2.1-I2V-1.3B(SkyReels) support ( #988 )
2025-11-19 23:56:46 +08:00
rmatif
a14e2b321d
feat: add easycache support ( #940 )
2025-11-19 23:19:32 +08:00
leejet
b88cc32346
fix: avoid using same type but diff instances for rng and sampler_rng ( #982 )
2025-11-16 23:37:14 +08:00
leejet
d5b05f70c6
feat: support independent sampler rng ( #978 )
2025-11-16 17:11:02 +08:00
akleine
6d6dc1b8ed
fix: make PhotoMakerV2 more robust by image count check ( #970 )
2025-11-16 17:10:48 +08:00
Wagner Bruna
199e675cc7
feat: support for --tensor-type-rules on generation modes ( #932 )
2025-11-16 17:07:32 +08:00
leejet
742a7333c3
feat: add cpu rng ( #977 )
2025-11-16 14:48:15 +08:00
Wagner Bruna
aa44e06890
fix: avoid crash with LoRAs and type override ( #974 )
2025-11-16 14:47:36 +08:00
leejet
347710f68f
feat: support applying LoRA at runtime ( #969 )
2025-11-13 21:48:44 +08:00
leejet
694f0d9235
refactor: optimize the logic for name conversion and the processing of the LoRA model ( #955 )
2025-11-10 00:12:20 +08:00
stduhpf
8ecdf053ac
feat: add image preview support ( #522 )
2025-11-10 00:12:02 +08:00
leejet
ee89afc878
fix: resolve issue with pmid ( #957 )
2025-11-09 22:47:53 +08:00
akleine
d2d3944f50
feat: add support for SD2.x with TINY U-Nets ( #939 )
2025-11-09 22:47:37 +08:00
akleine
0fa3e1a383
fix: prevent core dump in PM V2 in case of incomplete cmd line ( #950 )
2025-11-09 22:36:43 +08:00
stduhpf
fb748bb8a4
fix: TAE encoding ( #935 )
2025-11-07 22:58:59 +08:00
leejet
8f6c5c217b
refactor: simplify the model loading logic ( #933 )
* remove String2GGMLType
* remove preprocess_tensor
* fix clip init
* simplify the logic for reading weights
2025-11-03 21:21:34 +08:00
leejet
6103d86e2c
refactor: introduce GGMLRunnerContext ( #928 )
* introduce GGMLRunnerContext
* add Flash Attention enable control through GGMLRunnerContext
* add conv2d_direct enable control through GGMLRunnerContext
2025-11-02 02:11:04 +08:00
stduhpf
c42826b77c
fix: resolve multiple inpainting issues ( #926 )
* Fix inpainting masked image being broken by side effect
* Fix unet inpainting concat not being set
* Fix Flex.2 inpaint mode crash (+ use scale factor)
2025-11-02 02:10:32 +08:00
leejet
dd75fc081c
refactor: unify the naming style of ggml extension functions ( #921 )
2025-10-28 23:26:48 +08:00
Wagner Bruna
8a45d0ff7f
chore: clean up stb includes ( #919 )
2025-10-28 23:25:45 +08:00
leejet
9e28be6479
feat: add chroma radiance support ( #910 )
* add chroma radiance support
* fix ci
* simplify generate_init_latent
* workaround: avoid ggml cuda error
* format code
* add chroma radiance doc
2025-10-25 23:56:14 +08:00
akleine
062490aa7c
feat: add SSD1B and tiny-sd support ( #897 )
* feat: add code and doc for running SSD1B models
* Added some more lines to support SD1.x with TINY U-Nets too.
* support SSD-1B.safetensors
* fix sdv1.5 diffusers format loader
---------
Co-authored-by: leejet <leejet714@gmail.com>
2025-10-25 23:35:54 +08:00
stduhpf
917f7bfe99
fix: support --flow-shift for flux models with default pred ( #913 )
2025-10-23 21:35:18 +08:00
leejet
48e0a28ddf
feat: add shift factor support ( #903 )
2025-10-23 01:20:29 +08:00
leejet
d05e46ca5e
chore: add .clang-tidy configuration and apply modernize checks ( #902 )
2025-10-18 23:23:40 +08:00
leejet
90ef5f8246
feat: add auto-resize support for reference images (was Qwen-Image-Edit only) ( #898 )
2025-10-18 16:37:09 +08:00
leejet
db6f4791b4
feat: add wtype stat ( #899 )
2025-10-17 23:40:32 +08:00
leejet
40a6a8710e
fix: resolve precision issues in SDXL VAE under fp16 ( #888 )
* fix: resolve precision issues in SDXL VAE under fp16
* add --force-sdxl-vae-conv-scale option
* update docs
2025-10-15 23:01:00 +08:00
Daniele
e3702585cb
feat: added prediction argument ( #334 )
2025-10-15 23:00:10 +08:00
leejet
2e9242e37f
feat: add Qwen Image Edit support ( #877 )
* add ref latent support for qwen image
* optimize clip_preprocess and fix get_first_stage_encoding
* add qwen2vl vit support
* add qwen image edit support
* fix qwen image edit pipeline
* add mmproj file support
* support dynamic number of Qwen image transformer blocks
* set prompt_template_encode_start_idx every time
* to_add_out precision fix
* to_out.0 precision fix
* update docs
2025-10-13 23:17:18 +08:00
Wagner Bruna
c64994dc1d
fix: better progress display for second-order samplers ( #834 )
2025-10-13 22:12:48 +08:00
Wagner Bruna
5436f6b814
fix: correct canny preprocessor ( #861 )
2025-10-13 22:02:35 +08:00
leejet
1c32fa03bc
fix: avoid generating black images when running T5 on the GPU ( #882 )
2025-10-13 00:01:06 +08:00
Wagner Bruna
9727c6bb98
fix: resolve VAE tiling problem in Qwen Image ( #873 )
2025-10-12 23:45:53 +08:00
leejet
beb99a2de2
feat: add Qwen Image support ( #851 )
* add qwen tokenizer
* add qwen2.5 vl support
* mv qwen.hpp -> qwenvl.hpp
* add qwen image model
* add qwen image t2i pipeline
* fix qwen image flash attn
* add qwen image i2i pipeline
* change encoding of vocab_qwen.hpp to utf8
* fix get_first_stage_encoding
* apply jeffbolz f32 patch
https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302
* fix the issue that occurs when using CUDA with k-quants weights
* optimize the handling of the FeedForward precision fix
* to_add_out precision fix
* update docs
2025-10-12 23:23:19 +08:00
Wagner Bruna
aa68b875b9
refactor: deal with default img-cfg-scale at the library level ( #869 )
2025-10-12 23:17:52 +08:00
stduhpf
11f436c483
feat: add support for Flux Controls and Flex.2 ( #692 )
2025-10-11 00:06:57 +08:00
Wagner Bruna
f3140eadbb
fix: tensor loading thread count ( #854 )
2025-09-25 00:26:38 +08:00
leejet
fd693ac6a2
refactor: remove unused --normalize-input parameter ( #835 )
2025-09-18 00:12:53 +08:00
Wagner Bruna
171b2222a5
fix: avoid segfault for pix2pix models without reference images ( #766 )
* fix: avoid segfault for pix2pix models with no reference images
* fix: default to empty reference on pix2pix models to avoid segfault
* use resize instead of reserve
* format code
---------
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-18 00:11:38 +08:00
Erik Scholz
8909523e92
refactor: move tiling calc and debug print into the tiling code branch ( #833 )
2025-09-16 22:46:56 +08:00
rmatif
8376dfba2a
feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion ( #675 )
* feat: Add timestep shift and two new schedulers
* update readme
* fix spaces
* format code
* simplify SGMUniformSchedule
* simplify shifted_timestep logic
* avoid conflict
---------
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-16 22:42:09 +08:00
leejet
0ebe6fe118
refactor: simplify the logic of pm id image loading ( #827 )
2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98
feat: optimize tensor loading time ( #790 )
* opt tensor loading
* fix build failure
* revert the changes
* allow the use of n_threads
* fix lora loading
* optimize lora loading
* add mutex
* use atomic
* fix build
* fix potential duplicate issue
* avoid duplicate lookup of lora tensor
* fix progress bar
* remove unused remove_duplicates
---------
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 22:48:35 +08:00
leejet
52a97b3ac1
feat: add vace support ( #819 )
* add wan vace t2v support
* add --vace-strength option
* add vace i2v support
* fix the processing of vace_context
* add vace v2v support
* update docs
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594
feat: add VAE encoding tiling support and adaptive overlap ( #484 )
* implement tiling vae encode support
* Tiling (vae/upscale): adaptive overlap
* Tiling: fix edge case
* Tiling: fix crash when less than 2 tiles per dim
* remove extra dot
* Tiling: fix edge cases for adaptive overlap
* tiling: fix edge case
* set vae tile size via env var
* vae tiling: refactor again, base on smaller buffer for alignment
* Use bigger tiles for encode (to match compute buffer size)
* Fix edge case when tile is bigger than latent
* non-square VAE tiling (#3)
* refactor tile number calculation
* support non-square tiles
* add env var to change tile overlap
* add safeguards and better error messages for SD_TILE_OVERLAP
* add safeguards and include overlapping factor for SD_TILE_SIZE
* avoid rounding issues when specifying SD_TILE_SIZE as a factor
* lower SD_TILE_OVERLAP limit
* zero-init empty output buffer
* Fix decode latent size
* fix encode
* tile size params instead of env
* Tiled vae parameter validation (#6)
* avoid crash with invalid tile sizes, use 0 for default
* refactor default tile size, limit overlap factor
* remove explicit parameter for relative tile size
* limit encoding tile to latent size
* unify code style and format code
* update docs
* fix get_tile_sizes in decode_first_stage
---------
Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 16:00:29 +08:00
leejet
dc46993b55
feat: increase work_ctx memory buffer size ( #814 )
2025-09-14 13:19:20 +08:00
Wagner Bruna
c607fc3ed4
feat: use Euler sampling by default for SD3 and Flux ( #753 )
Thank you for your contribution.
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18
fix: do not force VAE type to f32 on SDXL ( #716 )
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not needed for the
fixed sdxl-vae-fp16-fix.
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4
fix: make weight override more robust against ggml changes ( #760 )
2025-09-14 12:15:53 +08:00