Commit Graph

127 Commits

rmatif
8376dfba2a feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675)
* feat: Add timestep shift and two new schedulers

* update readme

* fix spaces

* format code

* simplify SGMUniformSchedule

* simplify shifted_timestep logic

* avoid conflict

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-16 22:42:09 +08:00
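The sgm_uniform scheduler added above typically spaces timesteps uniformly between the start and end of the noise schedule. A minimal sketch of that idea (function name and endpoints are hypothetical, not the project's actual API):

```python
def sgm_uniform_timesteps(n_steps, t_max=999.0, t_min=0.0):
    """Uniformly spaced timesteps from t_max down to t_min, inclusive:
    n_steps + 1 points, as SGM-style samplers expect."""
    step = (t_max - t_min) / n_steps
    return [t_max - i * step for i in range(n_steps + 1)]

print(sgm_uniform_timesteps(4))  # [999.0, 749.25, 499.5, 249.75, 0.0]
```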
leejet
0ebe6fe118 refactor: simplify the logic of pm id image loading (#827) 2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98 feat: optimize tensor loading time (#790)
* opt tensor loading

* fix build failure

* revert the changes

* allow the use of n_threads

* fix lora loading

* optimize lora loading

* add mutex

* use atomic

* fix build

* fix potential duplicate issue

* avoid duplicate lookup of lora tensor

* fix progress bar

* remove unused remove_duplicates

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 22:48:35 +08:00
leejet
52a97b3ac1 feat: add vace support (#819)
* add wan vace t2v support

* add --vace-strength option

* add vace i2v support

* fix the processing of vace_context

* add vace v2v support

* update docs
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594 feat: add VAE encoding tiling support and adaptive overlap (#484)
* implement tiling vae encode support

* Tiling (vae/upscale): adaptive overlap

* Tiling: fix edge case

* Tiling: fix crash when less than 2 tiles per dim

* remove extra dot

* Tiling: fix edge cases for adaptive overlap

* tiling: fix edge case

* set vae tile size via env var

* vae tiling: refactor again, base on smaller buffer for alignment

* Use bigger tiles for encode (to match compute buffer size)

* Fix edge case when tile is bigger than latent

* non-square VAE tiling (#3)

* refactor tile number calculation

* support non-square tiles

* add env var to change tile overlap

* add safeguards and better error messages for SD_TILE_OVERLAP

* add safeguards and include overlapping factor for SD_TILE_SIZE

* avoid rounding issues when specifying SD_TILE_SIZE as a factor

* lower SD_TILE_OVERLAP limit

* zero-init empty output buffer

* Fix decode latent size

* fix encode

* tile size params instead of env

* Tiled vae parameter validation (#6)

* avoid crash with invalid tile sizes, use 0 for default

* refactor default tile size, limit overlap factor

* remove explicit parameter for relative tile size

* limit encoding tile to latent size

* unify code style and format code

* update docs

* fix get_tile_sizes in decode_first_stage

---------

Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 16:00:29 +08:00
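The adaptive overlap described in the tiling work above amounts to: choose enough fixed-size tiles to cover the image with at least a minimum overlap, then stretch the overlap so the last tile lands flush on the border. A hypothetical one-dimensional illustration, not the actual implementation:

```python
import math

def tile_layout(image_size, tile_size, min_overlap):
    """Return (tile count, adapted overlap) along one dimension so that
    tiles of tile_size exactly cover image_size with overlap >= min_overlap."""
    if tile_size >= image_size:
        return 1, 0  # a single tile covers everything
    stride = tile_size - min_overlap
    n = math.ceil((image_size - tile_size) / stride) + 1
    # spread the slack evenly so the final tile ends exactly at the border
    overlap = tile_size - (image_size - tile_size) / (n - 1)
    return n, overlap

print(tile_layout(1024, 512, 64))  # (3, 256.0)
```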
leejet
dc46993b55 feat: increase work_ctx memory buffer size (#814) 2025-09-14 13:19:20 +08:00
Wagner Bruna
c607fc3ed4 feat: use Euler sampling by default for SD3 and Flux (#753)
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18 fix: do not force VAE type to f32 on SDXL (#716)
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not needed for the
fixed sdxl-vae-fp16-fix.
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4 fix: make weight override more robust against ggml changes (#760) 2025-09-14 12:15:53 +08:00
Wagner Bruna
48956ffb87 feat: reduce CLIP memory usage with no embeddings (#768) 2025-09-14 12:08:00 +08:00
leejet
fce6afcc6a feat: add sd3 flash attn support (#815) 2025-09-11 23:24:29 +08:00
Erik Scholz
49d6570c43 feat: add SmoothStep Scheduler (#813) 2025-09-11 23:17:46 +08:00
clibdev
87cdbd5978 feat: use log_printf to print ggml logs (#545) 2025-09-11 22:16:05 +08:00
leejet
b017918106 chore: remove sd3 flash attention warn (#812) 2025-09-10 22:21:02 +08:00
Wagner Bruna
ac5a215998 fix: use {} for params init instead of memset (#781) 2025-09-10 21:49:29 +08:00
Wagner Bruna
abb36d66b5 chore: update flash attention warnings (#805) 2025-09-10 21:38:21 +08:00
Wagner Bruna
ff4fdbb88d fix: accept NULL in sd_img_gen_params_t::input_id_images_path (#809) 2025-09-10 21:22:55 +08:00
Markus Hartung
abb115cd02 fix: clarify lora quant support and small fixes (#792) 2025-09-08 22:39:25 +08:00
leejet
c648001030 feat: add detailed tensor loading time stat (#793) 2025-09-07 22:51:44 +08:00
stduhpf
c587a43c99 feat: support incrementing ref image index (omni-kontext) (#755)
* kontext: support ref image indices

* lora: support x_embedder

* update help message

* Support for negative indices

* support for OmniControl (offsets at index 0)

* c++11 compat

* add --increase-ref-index option

* simplify the logic and fix some issues

* update README.md

* remove unused variable

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 22:35:16 +08:00
stduhpf
141a4b4113 feat: add flow shift parameter (for SD3 and Wan) (#780)
* Add flow shift parameter (for SD3 and Wan)

* unify code style and fix some issues

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 02:16:59 +08:00
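The flow shift parameter above remaps each sigma of a flow-matching model (SD3, Wan) toward the high-noise region. A common formulation is sketched below; this is an illustration of the usual formula, not necessarily this project's exact code:

```python
def shift_sigma(sigma, shift=3.0):
    """Flow-matching sigma shift: identity at shift=1, pushes sigmas
    toward 1 (more noise) for shift > 1."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

print(shift_sigma(0.5, shift=3.0))  # 0.75
print(shift_sigma(0.5, shift=1.0))  # 0.5
```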
stduhpf
21ce9fe2cf feat: add support for timestep boundary based automatic expert routing in Wan MoE (#779)
* Wan MoE: Automatic expert routing based on timestep boundary

* unify code style and fix some issues

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 01:44:10 +08:00
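The automatic expert routing above boils down to a per-step choice between Wan MoE's two experts: the high-noise expert handles early (noisy) timesteps, the low-noise expert the rest. A hypothetical sketch (the 0.875 boundary is an illustrative assumption, not taken from this code):

```python
def select_expert(t, boundary=0.875, t_max=1000):
    """Route a denoising timestep to a Wan-MoE expert: pick the
    high-noise expert while t / t_max is still above the boundary."""
    return "high_noise" if t / t_max >= boundary else "low_noise"

print(select_expert(950))  # high_noise
print(select_expert(500))  # low_noise
```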
leejet
cb1d975e96 feat: add wan2.1/2.2 support (#778)
* add wan vae support

* add wan model support

* add umt5 support

* add wan2.1 t2i support

* make flash attn work with wan

* make wan a little faster

* add wan2.1 t2v support

* add wan gguf support

* add offload params to cpu support

* add wan2.1 i2v support

* crop image before resize

* set default fps to 16

* add diff lora support

* fix wan2.1 i2v

* introduce sd_sample_params_t

* add wan2.2 t2v support

* add wan2.2 14B i2v support

* add wan2.2 ti2v support

* add high noise lora support

* sync: update ggml submodule url

* avoid build failure on linux

* avoid build failure

* update ggml

* update ggml

* fix sd_version_is_wan

* update ggml, fix cpu im2col_3d

* fix ggml_nn_attention_ext mask

* add cache support to ggml runner

* fix the issue of illegal memory access

* unify image loading processing

* add wan2.1/2.2 FLF2V support

* fix end_image mask

* update to latest ggml

* add GGUFReader

* update docs
2025-09-06 18:08:03 +08:00
Daniele
5b8996f74a Conv2D direct support (#744)
* Conv2DDirect for VAE stage

* Enable only for Vulkan, reduced duplicated code

* Cmake option to use conv2d direct

* conv2d direct always on for opencl

* conv direct as a flag

* fix merge typo

* Align conv2d behavior to flash attention's

* fix readme

* add conv2d direct for controlnet

* add conv2d direct for esrgan

* clean code, use enable_conv2d_direct/get_all_blocks

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-08-03 01:25:17 +08:00
stduhpf
59080d3ce1 feat: change image dimensions requirement for DiT models (#742) 2025-07-28 21:58:17 +08:00
Oleg Skutte
1896b28ef2 fix: make --taesd work (#731) 2025-07-15 00:45:22 +08:00
leejet
ca0bd9396e refactor: update c api (#728) 2025-07-13 18:48:42 +08:00
stduhpf
a772dca27a feat: add Instruct-Pix2pix/CosXL-Edit support (#679)
* Instruct-p2p support

* support 2 conditionings cfg

* Do not re-encode the exact same image twice

* fixes for 2-cfg

* Fix pix2pix latent inputs + improve inpainting a bit + fix naming

* prepare for other pix2pix-like models

* Support sdxl ip2p

* fix reference image embeddings

* Support 2-cond cfg properly in cli

* fix typo in help

* Support masks for ip2p models

* unify code style

* delete unused code

* use edit mode

* add img_cond

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-07-12 15:36:45 +08:00
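The "2 conditionings cfg" in the InstructPix2Pix work above refers to classifier-free guidance with separate text and image scales. The standard InstructPix2Pix combination is sketched here on scalars; real code applies it to latent tensors:

```python
def ip2p_cfg(e_uncond, e_img, e_full, s_txt=7.5, s_img=1.5):
    """InstructPix2Pix dual-conditioning CFG:
    e_uncond = eps(x), e_img = eps(x, image), e_full = eps(x, image, text).
    Image and text guidance are scaled independently."""
    return e_uncond + s_img * (e_img - e_uncond) + s_txt * (e_full - e_img)

print(ip2p_cfg(0.0, 1.0, 2.0, s_txt=2.0, s_img=1.0))  # 3.0
```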
stduhpf
19fbfd8639 feat: override text encoders for unet models (#682) 2025-07-04 22:19:47 +08:00
leejet
7dac89ad75 refactor: reuse some code 2025-07-01 23:33:50 +08:00
stduhpf
9251756086 feat: add CosXL support (#683) 2025-07-01 23:13:04 +08:00
leejet
23de7fc44a chore: avoid warnings when building on linux 2025-06-30 23:49:52 +08:00
rmatif
d42fd59464 feat: add OpenCL backend support (#680) 2025-06-30 23:32:23 +08:00
leejet
45d0ebb30c style: format code 2025-06-29 23:40:55 +08:00
stduhpf
b1cc40c35c feat: add Chroma support (#696)
---------

Co-authored-by: Green Sky <Green-Sky@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-06-29 23:36:42 +08:00
stduhpf
c9b5735116 feat: add FLUX.1 Kontext dev support (#707)
* Kontext support
* add edit mode

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-06-29 10:08:53 +08:00
vmobilis
10feacf031 fix: correct img2img time (#616) 2025-03-09 12:29:08 +08:00
yslai
19d876ee30 feat: implement DDIM with the "trailing" timestep spacing and TCD (#568) 2025-02-22 21:34:22 +08:00
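The "trailing" spacing referenced above anchors sampling at the final training timestep rather than the first, so the first denoising step sees the fully noised schedule endpoint. A sketch of the usual construction (an illustration, not this commit's exact code):

```python
def trailing_timesteps(n_steps, n_train=1000):
    """'Trailing' timestep spacing: count down from the end of the training
    schedule so the first sampling step sits at t = n_train - 1."""
    stride = n_train / n_steps
    return [round(n_train - i * stride) - 1 for i in range(n_steps)]

print(trailing_timesteps(4))  # [999, 749, 499, 249]
```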
stduhpf
f23b803a6b fix: unapply current loras properly (#590) 2025-02-22 21:22:22 +08:00
stduhpf
d9b5942d98 feat: add sdxl v-pred support (#536) 2025-01-18 13:15:54 +08:00
stduhpf
587a37b2e2 fix: avoid sd2 (non-inpaint) crash on v-pred check (#537) 2025-01-18 13:13:34 +08:00
leejet
dcf91f9e0f chore: change SD_CUBLAS/SD_USE_CUBLAS to SD_CUDA/SD_USE_CUDA 2024-12-28 13:27:51 +08:00
stduhpf
d50473dc49 feat: support 16 channel tae (taesd/taef1) (#527) 2024-12-28 13:13:48 +08:00
stduhpf
0d9d6659a7 fix: fix metal build (#513) 2024-12-28 13:06:17 +08:00
stduhpf
8f4ab9add3 feat: support Inpaint models (#511) 2024-12-28 13:04:49 +08:00
stduhpf
cc92a6a1b3 feat: support more LoRA models (#520) 2024-12-28 12:56:44 +08:00
stduhpf
7ce63e740c feat: flexible model architecture for dit models (Flux & SD3) (#490)
* Refactor: wtype per tensor

* Fix default args

* refactor: fix flux

* Refactor photomaker v2 support

* unet: refactor the refactoring

* Refactor: fix controlnet and tae

* refactor: upscaler

* Refactor: fix runtime type override

* upscaler: use fp16 again

* Refactor: Flexible sd3 arch

* Refactor: Flexible Flux arch

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-30 14:18:53 +08:00
stduhpf
53b415f787 fix: remove default variables in c headers (#478) 2024-11-24 18:10:25 +08:00
leejet
b5f4932696 refactor: add some sd version helper functions 2024-11-23 13:02:44 +08:00
Erik Scholz
1c168d98a5 fix: repair flash attention support (#386)
* repair flash attention in _ext
this does not fix the currently broken fa behind the define, which is only used by VAE

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>

* make flash attention in the diffusion model a runtime flag
no support for sd3 or video

* remove old flash attention option and switch vae over to attn_ext

* update docs

* format code

---------

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 12:39:08 +08:00