Commit Graph

161 Commits

Author SHA1 Message Date
Wagner Bruna
aa44e06890 fix: avoid crash with LoRAs and type override (#974) 2025-11-16 14:47:36 +08:00
leejet
347710f68f feat: support applying LoRA at runtime (#969) 2025-11-13 21:48:44 +08:00
leejet
694f0d9235 refactor: optimize the logic for name conversion and the processing of the LoRA model (#955) 2025-11-10 00:12:20 +08:00
stduhpf
8ecdf053ac feat: add image preview support (#522) 2025-11-10 00:12:02 +08:00
leejet
ee89afc878 fix: resolve issue with pmid (#957) 2025-11-09 22:47:53 +08:00
akleine
d2d3944f50 feat: add support for SD2.x with TINY U-Nets (#939) 2025-11-09 22:47:37 +08:00
akleine
0fa3e1a383 fix: prevent core dump in PM V2 in case of incomplete cmd line (#950) 2025-11-09 22:36:43 +08:00
stduhpf
fb748bb8a4 fix: TAE encoding (#935) 2025-11-07 22:58:59 +08:00
leejet
8f6c5c217b refactor: simplify the model loading logic (#933)
* remove String2GGMLType

* remove preprocess_tensor

* fix clip init

* simplify the logic for reading weights
2025-11-03 21:21:34 +08:00
leejet
6103d86e2c refactor: introduce GGMLRunnerContext (#928)
* introduce GGMLRunnerContext

* add Flash Attention enable control through GGMLRunnerContext

* add conv2d_direct enable control through GGMLRunnerContext
2025-11-02 02:11:04 +08:00
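The GGMLRunnerContext refactor above can be pictured as a small per-run context object that carries feature toggles instead of globals. This is a hypothetical sketch based only on the commit message of #928; the field and function names are assumptions, not the actual sd.cpp API.

```cpp
#include <cassert>

// Hypothetical sketch: a context passed to each GGML runner so backend
// features can be toggled per run rather than through global state.
// Field names are guesses from the commit message of #928.
struct GGMLRunnerContext {
    bool flash_attn_enabled = false;  // Flash Attention on/off per runner
    bool conv2d_direct      = false;  // direct conv2d kernel on/off
};

// A runner consults the context instead of a global flag.
inline bool use_flash_attention(const GGMLRunnerContext& ctx) {
    return ctx.flash_attn_enabled;
}
```

With this shape, enabling Flash Attention for one runner is a per-context change and cannot leak into other runners.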
stduhpf
c42826b77c fix: resolve multiple inpainting issues (#926)
* Fix inpainting masked image being broken by side effect

* Fix unet inpainting concat not being set

* Fix Flex.2 inpaint mode crash (+ use scale factor)
2025-11-02 02:10:32 +08:00
leejet
dd75fc081c refactor: unify the naming style of ggml extension functions (#921) 2025-10-28 23:26:48 +08:00
Wagner Bruna
8a45d0ff7f chore: clean up stb includes (#919) 2025-10-28 23:25:45 +08:00
leejet
9e28be6479 feat: add chroma radiance support (#910)
* add chroma radiance support

* fix ci

* simplify generate_init_latent

* workaround: avoid ggml cuda error

* format code

* add chroma radiance doc
2025-10-25 23:56:14 +08:00
akleine
062490aa7c feat: add SSD1B and tiny-sd support (#897)
* feat: add code and doc for running SSD1B models

* Added some more lines to support SD1.x with TINY U-Nets too.

* support SSD-1B.safetensors

* fix sdv1.5 diffusers format loader

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-10-25 23:35:54 +08:00
stduhpf
917f7bfe99 fix: support --flow-shift for flux models with default pred (#913) 2025-10-23 21:35:18 +08:00
leejet
48e0a28ddf feat: add shift factor support (#903) 2025-10-23 01:20:29 +08:00
leejet
d05e46ca5e chore: add .clang-tidy configuration and apply modernize checks (#902) 2025-10-18 23:23:40 +08:00
leejet
90ef5f8246 feat: add auto-resize support for reference images (was Qwen-Image-Edit only) (#898) 2025-10-18 16:37:09 +08:00
leejet
db6f4791b4 feat: add wtype stat (#899) 2025-10-17 23:40:32 +08:00
leejet
40a6a8710e fix: resolve precision issues in SDXL VAE under fp16 (#888)
* fix: resolve precision issues in SDXL VAE under fp16

* add --force-sdxl-vae-conv-scale option

* update docs
2025-10-15 23:01:00 +08:00
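The conv-scale fix above relies on convolution being linear: scaling the weights down by a factor keeps fp16 intermediates away from overflow, and rescaling the output restores the original result. A minimal sketch of the idea, using a plain dot product as a stand-in for the convolution; the factor 0.125 is illustrative, not the value used by `--force-sdxl-vae-conv-scale`.

```cpp
#include <cmath>
#include <vector>

// Sketch: scale weights by `scale` during accumulation (keeping partial
// sums small enough for fp16), then divide the result by `scale` to
// recover the unscaled answer. dot() stands in for a conv2d kernel.
float scaled_dot(const std::vector<float>& w,
                 const std::vector<float>& x,
                 float scale) {
    float acc = 0.0f;
    for (size_t i = 0; i < w.size(); ++i)
        acc += (w[i] * scale) * x[i];  // scaled accumulation
    return acc / scale;                // undo the scale on the way out
}
```

Because the operation is linear, the rescaled output matches the unscaled computation up to rounding.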
Daniele
e3702585cb feat: added prediction argument (#334) 2025-10-15 23:00:10 +08:00
leejet
2e9242e37f feat: add Qwen Image Edit support (#877)
* add ref latent support for qwen image

* optimize clip_preprocess and fix get_first_stage_encoding

* add qwen2vl vit support

* add qwen image edit support

* fix qwen image edit pipeline

* add mmproj file support

* support dynamic number of Qwen image transformer blocks

* set prompt_template_encode_start_idx every time

* to_add_out precision fix

* to_out.0 precision fix

* update docs
2025-10-13 23:17:18 +08:00
Wagner Bruna
c64994dc1d fix: better progress display for second-order samplers (#834) 2025-10-13 22:12:48 +08:00
Wagner Bruna
5436f6b814 fix: correct canny preprocessor (#861) 2025-10-13 22:02:35 +08:00
leejet
1c32fa03bc fix: avoid generating black images when running T5 on the GPU (#882) 2025-10-13 00:01:06 +08:00
Wagner Bruna
9727c6bb98 fix: resolve VAE tiling problem in Qwen Image (#873) 2025-10-12 23:45:53 +08:00
leejet
beb99a2de2 feat: add Qwen Image support (#851)
* add qwen tokenizer

* add qwen2.5 vl support

* mv qwen.hpp -> qwenvl.hpp

* add qwen image model

* add qwen image t2i pipeline

* fix qwen image flash attn

* add qwen image i2i pipeline

* change encoding of vocab_qwen.hpp to utf8

* fix get_first_stage_encoding

* apply jeffbolz f32 patch

https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302

* fix the issue that occurs when using CUDA with k-quants weights

* optimize the handling of the FeedForward precision fix

* to_add_out precision fix

* update docs
2025-10-12 23:23:19 +08:00
Wagner Bruna
aa68b875b9 refactor: deal with default img-cfg-scale at the library level (#869) 2025-10-12 23:17:52 +08:00
stduhpf
11f436c483 feat: add support for Flux Controls and Flex.2 (#692) 2025-10-11 00:06:57 +08:00
Wagner Bruna
f3140eadbb fix: tensor loading thread count (#854) 2025-09-25 00:26:38 +08:00
leejet
fd693ac6a2 refactor: remove unused --normalize-input parameter (#835) 2025-09-18 00:12:53 +08:00
Wagner Bruna
171b2222a5 fix: avoid segfault for pix2pix models without reference images (#766)
* fix: avoid segfault for pix2pix models with no reference images

* fix: default to empty reference on pix2pix models to avoid segfault

* use resize instead of reserve

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-18 00:11:38 +08:00
Erik Scholz
8909523e92 refactor: move tiling calc and debug print into the tiling code branch (#833) 2025-09-16 22:46:56 +08:00
rmatif
8376dfba2a feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675)
* feat: Add timestep shift and two new schedulers

* update readme

* fix spaces

* format code

* simplify SGMUniformSchedule

* simplify shifted_timestep logic

* avoid conflict

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-16 22:42:09 +08:00
leejet
0ebe6fe118 refactor: simplify the logic of pm id image loading (#827) 2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98 feat: optimize tensor loading time (#790)
* opt tensor loading

* fix build failure

* revert the changes

* allow the use of n_threads

* fix lora loading

* optimize lora loading

* add mutex

* use atomic

* fix build

* fix potential duplicate issue

* avoid duplicate lookup of lora tensor

* fix progress bar

* remove unused remove_duplicates

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 22:48:35 +08:00
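The bullet list above (mutex, then atomic) suggests the common pattern for parallel tensor loading: worker threads claim work items from a shared atomic counter, so no two threads load the same tensor and no lock is held on the hot path. A self-contained sketch under that assumption; the real loader in #790 does file I/O where this marks an element.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Sketch of lock-free work claiming: each worker grabs the next tensor
// index with fetch_add, so indices are handed out exactly once.
void load_all(std::vector<int>& tensors, int n_threads) {
    std::atomic<size_t> next{0};
    auto worker = [&]() {
        for (;;) {
            size_t i = next.fetch_add(1);
            if (i >= tensors.size()) break;
            tensors[i] = 1;  // placeholder for reading one tensor
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

A per-thread progress counter can be derived from the same atomic, which is presumably how the progress-bar fix in the list above fits in.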
leejet
52a97b3ac1 feat: add vace support (#819)
* add wan vace t2v support

* add --vace-strength option

* add vace i2v support

* fix the processing of vace_context

* add vace v2v support

* update docs
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594 feat: add VAE encoding tiling support and adaptive overlap (#484)
* implement tiling vae encode support

* Tiling (vae/upscale): adaptive overlap

* Tiling: fix edge case

* Tiling: fix crash when less than 2 tiles per dim

* remove extra dot

* Tiling: fix edge cases for adaptive overlap

* tiling: fix edge case

* set vae tile size via env var

* vae tiling: refactor again, base on smaller buffer for alignment

* Use bigger tiles for encode (to match compute buffer size)

* Fix edge case when tile is bigger than latent

* non-square VAE tiling (#3)

* refactor tile number calculation

* support non-square tiles

* add env var to change tile overlap

* add safeguards and better error messages for SD_TILE_OVERLAP

* add safeguards and include overlapping factor for SD_TILE_SIZE

* avoid rounding issues when specifying SD_TILE_SIZE as a factor

* lower SD_TILE_OVERLAP limit

* zero-init empty output buffer

* Fix decode latent size

* fix encode

* tile size params instead of env

* Tiled vae parameter validation (#6)

* avoid crash with invalid tile sizes, use 0 for default

* refactor default tile size, limit overlap factor

* remove explicit parameter for relative tile size

* limit encoding tile to latent size

* unify code style and format code

* update docs

* fix get_tile_sizes in decode_first_stage

---------

Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 16:00:29 +08:00
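The tiling work above repeatedly fixes edge cases (last tile past the latent, fewer than two tiles per dimension). The core placement problem can be sketched in one dimension: space tiles so consecutive ones overlap by at least a minimum, and clamp the final tile to the edge. This is an illustrative sketch, not the actual sd.cpp tiling code or its `SD_TILE_SIZE`/`SD_TILE_OVERLAP` handling.

```cpp
#include <vector>

// Return the start offsets of overlapping tiles covering [0, extent).
// Consecutive tiles overlap by at least `min_overlap`; the last tile is
// clamped so it never runs past the edge (the "tile bigger than latent"
// case degenerates to a single tile at 0).
std::vector<int> tile_starts(int extent, int tile, int min_overlap) {
    if (tile >= extent) return {0};   // one tile covers everything
    std::vector<int> starts;
    int stride = tile - min_overlap;  // assume 0 < min_overlap < tile
    for (int x = 0;; x += stride) {
        if (x + tile >= extent) {     // clamp final tile to the edge
            starts.push_back(extent - tile);
            break;
        }
        starts.push_back(x);
    }
    return starts;
}
```

For example, covering an extent of 10 with tiles of 4 and overlap 1 yields starts 0, 3, 6, so the last tile ends exactly at 10.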
leejet
dc46993b55 feat: increase work_ctx memory buffer size (#814) 2025-09-14 13:19:20 +08:00
Wagner Bruna
c607fc3ed4 feat: use Euler sampling by default for SD3 and Flux (#753)
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18 fix: do not force VAE type to f32 on SDXL (#716)
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not needed for the
fixed sdxl-vae-fp16-fix.
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4 fix: make weight override more robust against ggml changes (#760) 2025-09-14 12:15:53 +08:00
Wagner Bruna
48956ffb87 feat: reduce CLIP memory usage with no embeddings (#768) 2025-09-14 12:08:00 +08:00
leejet
fce6afcc6a feat: add sd3 flash attn support (#815) 2025-09-11 23:24:29 +08:00
Erik Scholz
49d6570c43 feat: add SmoothStep Scheduler (#813) 2025-09-11 23:17:46 +08:00
clibdev
87cdbd5978 feat: use log_printf to print ggml logs (#545) 2025-09-11 22:16:05 +08:00
leejet
b017918106 chore: remove sd3 flash attention warn (#812) 2025-09-10 22:21:02 +08:00
Wagner Bruna
ac5a215998 fix: use {} for params init instead of memset (#781) 2025-09-10 21:49:29 +08:00
Wagner Bruna
abb36d66b5 chore: update flash attention warnings (#805) 2025-09-10 21:38:21 +08:00