🎨 FoxeTales AI Pipeline: Iter 7 (Current State)
Updated: 2026-05-28 | Reflects the production workflow at github.com/Foxetales/ai-infra/workflows/faceswap_fullbody_v1.json
📥 Inputs (per render job)
⚙️ Processing pipeline
1. PuLID identity extraction
Customer photo → InsightFace AntelopeV2 face detection → EVA-CLIP embedding → PuLID-Flux v0.9.1 identity vector
Output: ~512-dim identity embedding (the PuLID model itself takes ~1085MB once loaded)
↓
2. Mask preparation
Person mask → GrowMask(+2) → FeatherMask(12px) → smooth person-region mask for inpainting
Iter7 change: switched from face-only mask to person-region mask (Chi's feedback fix)
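This is a minimal pure-Python sketch of the grow-then-feather idea, included for intuition only. `grow_mask` performs a grayscale dilation (max filter) over a square window, and `feather_mask` softens the result with a box blur. Both helper names are hypothetical. Neither function reimplements the ComfyUI GrowMask/FeatherMask nodes the workflow actually uses, so treat the edge profile as illustrative.

```python
from typing import List

Mask = List[List[float]]  # values in [0, 1], indexed mask[y][x]


def grow_mask(mask: Mask, pixels: int) -> Mask:
    """Grayscale dilation: each pixel takes the max over a (2*pixels+1)^2 window."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - pixels), min(h, y + pixels + 1))
            xs = range(max(0, x - pixels), min(w, x + pixels + 1))
            out[y][x] = max(mask[yy][xx] for yy in ys for xx in xs)
    return out


def feather_mask(mask: Mask, radius: int) -> Mask:
    """Soften edges with a box blur; the window is clipped at the image border."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [mask[yy][xx] for yy in ys for xx in xs]
            out[y][x] = sum(vals) / len(vals)
    return out
```

These nested loops are far too slow for a 1024×1024 mask at radius 12. Production masks should stay in the ComfyUI nodes or use a vectorized library.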
↓
3. Conditioning
Text encoders: CLIP-L + T5-xxl-fp16 (9.2GB)
Positive prompt: "fxtl style watercolor, {ethnicity} child, full body, skin tone consistent across face arms and legs, FLAT 2D, Foxetales storybook"
Negative: "photograph, realistic, 3d, semi-realistic, beauty filter"
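Each job fills the `{ethnicity}` placeholder. The sketch below assumes a plain string substitution. `build_prompts` is a hypothetical helper, and these notes do not specify the exact ethnicity strings the workflow passes in.

```python
from typing import Tuple

# Prompt text copied from the conditioning step above.
POSITIVE_TEMPLATE = (
    "fxtl style watercolor, {ethnicity} child, full body, "
    "skin tone consistent across face arms and legs, FLAT 2D, Foxetales storybook"
)
NEGATIVE_PROMPT = "photograph, realistic, 3d, semi-realistic, beauty filter"


def build_prompts(ethnicity: str) -> Tuple[str, str]:
    """Return (positive, negative) prompts for one render job (hypothetical helper)."""
    ethnicity = ethnicity.strip()
    if not ethnicity:
        raise ValueError("ethnicity must be a non-empty string")
    return POSITIVE_TEMPLATE.format(ethnicity=ethnicity), NEGATIVE_PROMPT
```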
↓
4. FLUX KSampler
FLUX.1-dev (23GB) + FoxeTales LoRA v4 (strength 0.7) + PuLID identity injection
Iter7 locked config: sampler euler, scheduler beta, steps 25, denoise 0.55 (down from 0.85 in the Iter6 face-only setup; the lower value preserves the existing pose)
SetLatentNoiseMask = person-region mask
Output: latent for the 1024×1024 page (128×128, 16 channels), regenerated in the person region only
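The following sketch shows what denoise 0.55 means with 25 steps. As far as we know, ComfyUI's stock KSampler handles denoise < 1 by building a schedule of `int(steps / denoise)` steps and running only the last `steps` of them. The code reproduces that arithmetic. `denoise_schedule` is a hypothetical helper, and the behavior should be checked against the pinned ComfyUI version.

```python
from typing import Tuple


def denoise_schedule(steps: int, denoise: float) -> Tuple[int, int]:
    """Return (schedule_length, skipped_steps) for a partial-denoise run.

    Assumes the sampler builds int(steps / denoise) sigmas and keeps only
    the final `steps` of them; denoise ~1.0 means the full schedule.
    """
    if denoise <= 0.0:
        raise ValueError("denoise must be > 0")
    if denoise > 0.9999:
        return steps, 0
    total = int(steps / denoise)
    return total, total - steps
```

With these numbers, Iter7 runs 25 steps starting 20 steps into a 45-step schedule. Less noise is injected, so more of the existing pose survives. Iter6's 0.85 skipped only 4 of 29 steps.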
↓
5. VAE decode
FLUX VAE (ae.safetensors, 320MB) → decode latent to RGB image (1024×1024)
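This is a quick sanity check on tensor sizes. The FLUX autoencoder compresses 8× spatially into 16 latent channels, and the FLUX transformer packs each 2×2 latent patch into one token. The hypothetical helpers below compute both values for a given page size. The divisible-by-16 guard is this sketch's own choice.

```python
from typing import Tuple

FLUX_LATENT_CHANNELS = 16  # channels in the FLUX autoencoder latent
VAE_DOWNSCALE = 8          # spatial compression factor of the VAE
PATCH = 2                  # 2x2 latent patches become one transformer token


def flux_latent_shape(width: int, height: int) -> Tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for an RGB page size."""
    step = VAE_DOWNSCALE * PATCH
    if width % step or height % step:
        raise ValueError(f"width and height must be multiples of {step}")
    return FLUX_LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE


def flux_image_tokens(width: int, height: int) -> int:
    """Number of image tokens the transformer sees after 2x2 packing."""
    _, lat_h, lat_w = flux_latent_shape(width, height)
    return (lat_h // PATCH) * (lat_w // PATCH)
```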
↓
6. Composite
ImageCompositeMasked with feathered person mask → blend regenerated character into original background plate
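The composite is a per-pixel alpha blend. Where the feathered mask is 1, the regenerated character wins. Where it is 0, the background plate wins. Along the feathered edge, the two are mixed. The sketch below shows that blend on nested lists. `composite_masked` is a hypothetical name; the workflow itself uses the ImageCompositeMasked node.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def composite_masked(background: Image, foreground: Image,
                     mask: List[List[float]]) -> Image:
    """out = foreground * m + background * (1 - m), per pixel and channel."""
    out: Image = []
    for bg_row, fg_row, m_row in zip(background, foreground, mask):
        row = []
        for bg, fg, m in zip(bg_row, fg_row, m_row):
            row.append(tuple(round(f * m + b * (1 - m)) for f, b in zip(fg, bg)))
        out.append(row)
    return out
```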
↓
7. Skin shift postproc (Black/dark ethnicity only)
pipeline/skin_shift.py: YCbCr skin detection × person_mask → multiply skin pixels by RGB scale
Presets (R/G/B multipliers): asian/blonde (no-op), indian (.85/.75/.65), black (.65/.50/.42), black_deep (.55/.42/.35)
WHY: the style LoRA was trained on light-skinned children, so it works against the dark-skin signal from the prompt. The postproc step bypasses this LoRA bias.
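This is a minimal sketch of the skin-shift step. It assumes the widely used Cb 77-127 / Cr 133-173 skin ranges on full-range BT.601 YCbCr. These notes do not show the actual thresholds or function names in pipeline/skin_shift.py, so `is_skin`, `shift_pixel`, and the ranges are assumptions. The preset multipliers are the ones listed above. The sketch also assumes how "skin detection × person_mask" is applied: the scale is weighted by the person-mask value, so the shift fades out across the feathered edge.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Per-channel (R, G, B) multipliers from the preset list above.
PRESETS: Dict[str, Tuple[float, float, float]] = {
    "asian": (1.0, 1.0, 1.0),   # no-op
    "blonde": (1.0, 1.0, 1.0),  # no-op
    "indian": (0.85, 0.75, 0.65),
    "black": (0.65, 0.50, 0.42),
    "black_deep": (0.55, 0.42, 0.35),
}

# ASSUMPTION: generic Cb/Cr skin ranges; skin_shift.py may use different ones.
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def rgb_to_cbcr(r: int, g: int, b: int) -> Tuple[float, float]:
    """Full-range BT.601 (JPEG-style) chroma components."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(rgb: RGB) -> bool:
    cb, cr = rgb_to_cbcr(*rgb)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def shift_pixel(rgb: RGB, person_weight: float, preset: str) -> RGB:
    """Darken a skin pixel, weighted by the feathered person mask in [0, 1]."""
    if not is_skin(rgb):
        return rgb
    r, g, b = (
        min(255, max(0, round(c * (1 + person_weight * (s - 1)))))
        for c, s in zip(rgb, PRESETS[preset])
    )
    return (r, g, b)
```

Scaling in RGB rather than adjusting only luma keeps the per-channel presets exactly as listed, so each ethnicity gets its own hue shift as well as darkening.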
↓
📤 Output: final personalized page (1024×1024 PNG, ~1MB), ready for book layout / preview
Performance characteristics
- GPU time: ~25-32s per page (warm), ~60s first job (cold model load)
- VRAM: ~45GB peak (FLUX 22.7GB + T5 9.3GB + PuLID 1.1GB + LoRA + working memory)
- Throughput: ~110-130 pages/hour on RTX A6000 48GB
- Cost: $0.43/h × 1 h = $0.43 per ~120 pages ≈ $0.0036/page (raw GPU only)
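The cost line above is simple division. This small sketch, using hypothetical helper names, makes the throughput/cost relationship explicit so other rates or timings can be plugged in.

```python
def pages_per_hour(seconds_per_page: float) -> float:
    """Upper bound on throughput if the GPU does nothing but render pages."""
    return 3600.0 / seconds_per_page


def gpu_cost_per_page(hourly_rate_usd: float, throughput_pages_per_hour: float) -> float:
    """Raw GPU cost per page (excludes storage, egress, idle time)."""
    return hourly_rate_usd / throughput_pages_per_hour
```

Pure GPU time of 25-32 s/page would allow roughly 112-144 pages/hour, or about $0.0030-0.0038/page at $0.43/h. The quoted ~110-130 pages/hour is the more conservative figure.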
Iter7 changes vs. Iter6
| Component | Iter6 (face-only) | Iter7 (current) |
|---|---|---|
| Mask type | Face bounding box | Person region (alpha-derived) |
| KSampler denoise | 0.85 | 0.55 (preserves pose) |
| Grow / Feather | None | +2px / 12px (smooth body edges) |
| Prompt skin anchor | "{ethnicity}" only | "{ethnicity}" + "skin tone consistent across face arms and legs" |
| Postproc | None | YCbCr skin shift for darker skin tones |
| ControlNet | Planned | Not yet; Phase 2 (needed for pose-sensitive pages) |
🚧 Known limitations (need Phase 2 work)
- Pose preservation on sleeping/side-view pages (p42 at risk): needs ControlNet OpenPose
- Hair style flexibility: hair is currently forced to the template style (Chi accepted the face-only limit on hair)
- Black skin tone via LoRA: the postproc workaround works, but the ideal fix is retraining LoRA v5 on a more diverse dataset
- Speed: 25-32s/page is acceptable for batch jobs but too slow for live preview (target <10s with fp8 quantization)