Commit Graph

127 Commits

rmatif
8376dfba2a feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675)
* feat: Add timestep shift and two new schedulers

* update readme

* fix spaces

* format code

* simplify SGMUniformSchedule

* simplify shifted_timestep logic

* avoid conflict

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-16 22:42:09 +08:00
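The sgm_uniform scheduler added above typically spaces timesteps uniformly between the start and end of the noise schedule. A minimal sketch of that idea (function name and endpoints are hypothetical, not the project's actual API):

```python
def sgm_uniform_timesteps(n_steps, t_max=999.0, t_min=0.0):
    """Uniformly spaced timesteps from t_max down to t_min, inclusive:
    n_steps + 1 points, as SGM-style samplers expect."""
    step = (t_max - t_min) / n_steps
    return [t_max - i * step for i in range(n_steps + 1)]

print(sgm_uniform_timesteps(4))  # [999.0, 749.25, 499.5, 249.75, 0.0]
```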
leejet
0ebe6fe118 refactor: simplify the logic of pm id image loading (#827) 2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98 feat: optimize tensor loading time (#790)
* opt tensor loading

* fix build failure

* revert the changes

* allow the use of n_threads

* fix lora loading

* optimize lora loading

* add mutex

* use atomic

* fix build

* fix potential duplicate issue

* avoid duplicate lookup of lora tensor

* fix progress bar

* remove unused remove_duplicates

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 22:48:35 +08:00
leejet
52a97b3ac1 feat: add vace support (#819)
* add wan vace t2v support

* add --vace-strength option

* add vace i2v support

* fix the processing of vace_context

* add vace v2v support

* update docs
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594 feat: add VAE encoding tiling support and adaptive overlap (#484)
* implement tiling vae encode support

* Tiling (vae/upscale): adaptive overlap

* Tiling: fix edge case

* Tiling: fix crash when less than 2 tiles per dim

* remove extra dot

* Tiling: fix edge cases for adaptive overlap

* tiling: fix edge case

* set vae tile size via env var

* vae tiling: refactor again, base on smaller buffer for alignment

* Use bigger tiles for encode (to match compute buffer size)

* Fix edge case when tile is bigger than latent

* non-square VAE tiling (#3)

* refactor tile number calculation

* support non-square tiles

* add env var to change tile overlap

* add safeguards and better error messages for SD_TILE_OVERLAP

* add safeguards and include overlapping factor for SD_TILE_SIZE

* avoid rounding issues when specifying SD_TILE_SIZE as a factor

* lower SD_TILE_OVERLAP limit

* zero-init empty output buffer

* Fix decode latent size

* fix encode

* tile size params instead of env

* Tiled vae parameter validation (#6)

* avoid crash with invalid tile sizes, use 0 for default

* refactor default tile size, limit overlap factor

* remove explicit parameter for relative tile size

* limit encoding tile to latent size

* unify code style and format code

* update docs

* fix get_tile_sizes in decode_first_stage

---------

Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 16:00:29 +08:00
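The adaptive overlap described in the tiling work above amounts to: choose enough fixed-size tiles to cover the image with at least a minimum overlap, then stretch the overlap so the last tile lands flush on the border. A hypothetical one-dimensional illustration, not the actual implementation:

```python
import math

def tile_layout(image_size, tile_size, min_overlap):
    """Return (tile count, adapted overlap) along one dimension so that
    tiles of tile_size exactly cover image_size with overlap >= min_overlap."""
    if tile_size >= image_size:
        return 1, 0  # a single tile covers everything
    stride = tile_size - min_overlap
    n = math.ceil((image_size - tile_size) / stride) + 1
    # spread the slack evenly so the final tile ends exactly at the border
    overlap = tile_size - (image_size - tile_size) / (n - 1)
    return n, overlap

print(tile_layout(1024, 512, 64))  # (3, 256.0)
```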
leejet
dc46993b55 feat: increase work_ctx memory buffer size (#814) 2025-09-14 13:19:20 +08:00
Wagner Bruna
c607fc3ed4 feat: use Euler sampling by default for SD3 and Flux (#753)
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18 fix: do not force VAE type to f32 on SDXL (#716)
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not needed for the
fixed sdxl-vae-fp16-fix.
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4 fix: make weight override more robust against ggml changes (#760) 2025-09-14 12:15:53 +08:00
Wagner Bruna
48956ffb87 feat: reduce CLIP memory usage with no embeddings (#768) 2025-09-14 12:08:00 +08:00
leejet
fce6afcc6a feat: add sd3 flash attn support (#815) 2025-09-11 23:24:29 +08:00
Erik Scholz
49d6570c43 feat: add SmoothStep Scheduler (#813) 2025-09-11 23:17:46 +08:00
clibdev
87cdbd5978 feat: use log_printf to print ggml logs (#545) 2025-09-11 22:16:05 +08:00
leejet
b017918106 chore: remove sd3 flash attention warn (#812) 2025-09-10 22:21:02 +08:00
Wagner Bruna
ac5a215998 fix: use {} for params init instead of memset (#781) 2025-09-10 21:49:29 +08:00
Wagner Bruna
abb36d66b5 chore: update flash attention warnings (#805) 2025-09-10 21:38:21 +08:00
Wagner Bruna
ff4fdbb88d fix: accept NULL in sd_img_gen_params_t::input_id_images_path (#809) 2025-09-10 21:22:55 +08:00
Markus Hartung
abb115cd02 fix: clarify lora quant support and small fixes (#792) 2025-09-08 22:39:25 +08:00
leejet
c648001030 feat: add detailed tensor loading time stat (#793) 2025-09-07 22:51:44 +08:00
stduhpf
c587a43c99 feat: support incrementing ref image index (omni-kontext) (#755)
* kontext: support ref image indices

* lora: support x_embedder

* update help message

* Support for negative indices

* support for OmniControl (offsets at index 0)

* c++11 compat

* add --increase-ref-index option

* simplify the logic and fix some issues

* update README.md

* remove unused variable

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 22:35:16 +08:00
stduhpf
141a4b4113 feat: add flow shift parameter (for SD3 and Wan) (#780)
* Add flow shift parameter (for SD3 and Wan)

* unify code style and fix some issues

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 02:16:59 +08:00
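The flow shift parameter above remaps each sigma of a flow-matching model (SD3, Wan) toward the high-noise region. A common formulation is sketched below; this is an illustration of the usual formula, not necessarily this project's exact code:

```python
def shift_sigma(sigma, shift=3.0):
    """Flow-matching sigma shift: identity at shift=1, pushes sigmas
    toward 1 (more noise) for shift > 1."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

print(shift_sigma(0.5, shift=3.0))  # 0.75
print(shift_sigma(0.5, shift=1.0))  # 0.5
```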
stduhpf
21ce9fe2cf feat: add support for timestep boundary based automatic expert routing in Wan MoE (#779)
* Wan MoE: Automatic expert routing based on timestep boundary

* unify code style and fix some issues

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 01:44:10 +08:00
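The automatic expert routing above boils down to a per-step choice between Wan MoE's two experts: the high-noise expert handles early (noisy) timesteps, the low-noise expert the rest. A hypothetical sketch (the 0.875 boundary is an illustrative assumption, not taken from this code):

```python
def select_expert(t, boundary=0.875, t_max=1000):
    """Route a denoising timestep to a Wan-MoE expert: pick the
    high-noise expert while t / t_max is still above the boundary."""
    return "high_noise" if t / t_max >= boundary else "low_noise"

print(select_expert(950))  # high_noise
print(select_expert(500))  # low_noise
```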
leejet
cb1d975e96 feat: add wan2.1/2.2 support (#778)
* add wan vae support

* add wan model support

* add umt5 support

* add wan2.1 t2i support

* make flash attn work with wan

* make wan a little faster

* add wan2.1 t2v support

* add wan gguf support

* add offload params to cpu support

* add wan2.1 i2v support

* crop image before resize

* set default fps to 16

* add diff lora support

* fix wan2.1 i2v

* introduce sd_sample_params_t

* add wan2.2 t2v support

* add wan2.2 14B i2v support

* add wan2.2 ti2v support

* add high noise lora support

* sync: update ggml submodule url

* avoid build failure on linux

* avoid build failure

* update ggml

* update ggml

* fix sd_version_is_wan

* update ggml, fix cpu im2col_3d

* fix ggml_nn_attention_ext mask

* add cache support to ggml runner

* fix the issue of illegal memory access

* unify image loading processing

* add wan2.1/2.2 FLF2V support

* fix end_image mask

* update to latest ggml

* add GGUFReader

* update docs
2025-09-06 18:08:03 +08:00
Daniele
5b8996f74a Conv2D direct support (#744)
* Conv2DDirect for VAE stage

* Enable only for Vulkan, reduced duplicated code

* Cmake option to use conv2d direct

* conv2d direct always on for opencl

* conv direct as a flag

* fix merge typo

* Align conv2d behavior to flash attention's

* fix readme

* add conv2d direct for controlnet

* add conv2d direct for esrgan

* clean code, use enable_conv2d_direct/get_all_blocks

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-08-03 01:25:17 +08:00
stduhpf
59080d3ce1 feat: change image dimensions requirement for DiT models (#742) 2025-07-28 21:58:17 +08:00
Oleg Skutte
1896b28ef2 fix: make --taesd work (#731) 2025-07-15 00:45:22 +08:00
leejet
ca0bd9396e refactor: update c api (#728) 2025-07-13 18:48:42 +08:00
stduhpf
a772dca27a feat: add Instruct-Pix2pix/CosXL-Edit support (#679)
* Instruct-p2p support

* support 2 conditionings cfg

* Do not re-encode the exact same image twice

* fixes for 2-cfg

* Fix pix2pix latent inputs + improve inpainting a bit + fix naming

* prepare for other pix2pix-like models

* Support sdxl ip2p

* fix reference image embeddings

* Support 2-cond cfg properly in cli

* fix typo in help

* Support masks for ip2p models

* unify code style

* delete unused code

* use edit mode

* add img_cond

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-07-12 15:36:45 +08:00
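The "2 conditionings cfg" in the InstructPix2Pix work above refers to classifier-free guidance with separate text and image scales. The standard InstructPix2Pix combination is sketched here on scalars; real code applies it to latent tensors:

```python
def ip2p_cfg(e_uncond, e_img, e_full, s_txt=7.5, s_img=1.5):
    """InstructPix2Pix dual-conditioning CFG:
    e_uncond = eps(x), e_img = eps(x, image), e_full = eps(x, image, text).
    Image and text guidance are scaled independently."""
    return e_uncond + s_img * (e_img - e_uncond) + s_txt * (e_full - e_img)

print(ip2p_cfg(0.0, 1.0, 2.0, s_txt=2.0, s_img=1.0))  # 3.0
```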
stduhpf
19fbfd8639 feat: override text encoders for unet models (#682) 2025-07-04 22:19:47 +08:00
leejet
7dac89ad75 refactor: reuse some code 2025-07-01 23:33:50 +08:00
stduhpf
9251756086 feat: add CosXL support (#683) 2025-07-01 23:13:04 +08:00
leejet
23de7fc44a chore: avoid warnings when building on linux 2025-06-30 23:49:52 +08:00
rmatif
d42fd59464 feat: add OpenCL backend support (#680) 2025-06-30 23:32:23 +08:00
leejet
45d0ebb30c style: format code 2025-06-29 23:40:55 +08:00
stduhpf
b1cc40c35c feat: add Chroma support (#696)
---------

Co-authored-by: Green Sky <Green-Sky@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-06-29 23:36:42 +08:00
stduhpf
c9b5735116 feat: add FLUX.1 Kontext dev support (#707)
* Kontext support
* add edit mode

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-06-29 10:08:53 +08:00
vmobilis
10feacf031 fix: correct img2img time (#616) 2025-03-09 12:29:08 +08:00
yslai
19d876ee30 feat: implement DDIM with the "trailing" timestep spacing and TCD (#568) 2025-02-22 21:34:22 +08:00
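The "trailing" spacing referenced above anchors sampling at the final training timestep rather than the first, so the first denoising step sees the fully noised schedule endpoint. A sketch of the usual construction (an illustration, not this commit's exact code):

```python
def trailing_timesteps(n_steps, n_train=1000):
    """'Trailing' timestep spacing: count down from the end of the training
    schedule so the first sampling step sits at t = n_train - 1."""
    stride = n_train / n_steps
    return [round(n_train - i * stride) - 1 for i in range(n_steps)]

print(trailing_timesteps(4))  # [999, 749, 499, 249]
```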
stduhpf
f23b803a6b fix: unapply current loras properly (#590) 2025-02-22 21:22:22 +08:00
stduhpf
d9b5942d98 feat: add sdxl v-pred support (#536) 2025-01-18 13:15:54 +08:00
stduhpf
587a37b2e2 fix: avoid sd2 (non-inpaint) crash on v-pred check (#537) 2025-01-18 13:13:34 +08:00
leejet
dcf91f9e0f chore: change SD_CUBLAS/SD_USE_CUBLAS to SD_CUDA/SD_USE_CUDA 2024-12-28 13:27:51 +08:00
stduhpf
d50473dc49 feat: support 16 channel tae (taesd/taef1) (#527) 2024-12-28 13:13:48 +08:00
stduhpf
0d9d6659a7 fix: fix metal build (#513) 2024-12-28 13:06:17 +08:00
stduhpf
8f4ab9add3 feat: support Inpaint models (#511) 2024-12-28 13:04:49 +08:00
stduhpf
cc92a6a1b3 feat: support more LoRA models (#520) 2024-12-28 12:56:44 +08:00
stduhpf
7ce63e740c feat: flexible model architecture for dit models (Flux & SD3) (#490)
* Refactor: wtype per tensor

* Fix default args

* refactor: fix flux

* Refactor photomaker v2 support

* unet: refactor the refactoring

* Refactor: fix controlnet and tae

* refactor: upscaler

* Refactor: fix runtime type override

* upscaler: use fp16 again

* Refactor: Flexible sd3 arch

* Refactor: Flexible Flux arch

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-30 14:18:53 +08:00
stduhpf
53b415f787 fix: remove default variables in c headers (#478) 2024-11-24 18:10:25 +08:00
leejet
b5f4932696 refactor: add some sd version helper functions 2024-11-23 13:02:44 +08:00
Erik Scholz
1c168d98a5 fix: repair flash attention support (#386)
* repair flash attention in _ext
this does not fix the currently broken fa behind the define, which is only used by VAE

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>

* make flash attention in the diffusion model a runtime flag
no support for sd3 or video

* remove old flash attention option and switch vae over to attn_ext

* update docs

* format code

---------

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 12:39:08 +08:00