🦊 FoxeTales AI Pipeline

Iter 7 — Polish Demo & Roadmap
2026-05-28 · Phase 1 (M1) Wrap-up Review

🎯 TL;DR

Đã làm xong: Pipeline face-swap full character với FLUX + PuLID + style LoRA, validated trên 5 ethnicities × 34 trang sách T16.

Quality: Asian + Blonde production-ready. Indian medium. Black ethnicity cần fix structural.

Blocker chính: Style LoRA v4 trained chủ yếu trên light-skin children → fight dark skin tones. Cần dataset diverse hơn.

Cần từ artist: (1) per-page person-region masks chuẩn, (2) bổ sung dataset diverse skin tones, (3) confirm template variants strategy.

🏗️ System Architecture & Diagrams

👉 Xem tất cả 6 diagrams tại đây → (Mermaid live render, zoomable, version-controlled)

📜 Legacy diagrams (May 15, kept for reference)

System Architecture (legacy)

System architecture
Cloudflare Workers + R2 + RunPod + Shopify · still accurate at high level

End-to-end Sequence (legacy)

Sequence diagram
Customer journey from upload to print delivery

AI Pipeline (Iter 7 — current production state)

Full detailed pipeline doc: 📄 AI Pipeline Iter7 (HTML)

Lưu ý: diễn biến iter7 rất khác diễn biến iter6/i5. Diagram cũ đã outdated, click link trên để xem state hiện tại.

Quick summary: 7 stages

  1. PuLID identity extraction — InsightFace + EVA-CLIP → PuLID embedding
  2. Mask preparation — GrowMask(+2) + FeatherMask(12px) on person-region mask [ITER7 NEW]
  3. Conditioning — CLIP-L + T5-xxl + prompt with ethnicity + skin anchor
  4. FLUX KSampler — FLUX.1-dev + LoRA v4 + PuLID inject, denoise 0.55 [ITER7 NEW]
  5. VAE decode — latent → RGB 1024×1024
  6. Composite — blend regen char into background plate
  7. Skin shift postproc — YCbCr mask + RGB multiply (Black/dark ethnicity only) [ITER7 NEW]

Performance

📜 Show legacy diagram (iter5, May 15) for comparison
Pipeline legacy
Lưu: diagram này còn một số component chưa implement (ControlNet) — đã bị thay thế bởi pipeline iter7

⚙️ Pipeline đơn giản hóa

Flow xử lý 1 page sách cho 1 customer:

📷 Customer Photo
ảnh kid upload
🎭 PuLID
extract identity
🎨 FLUX + LoRA
render watercolor
🖼️ Composite
char + background
🌓 Skin Shift
postproc (dark skin)
📚 Final Page
personalized

Tech stack

ComponentToolStatus
Base modelFLUX.1-dev (23GB)✓ Production
Identity transferPuLID-Flux v0.9.1✓ Production
Style adapterFoxeTales LoRA v4 (1000 steps)⚠ Cần retrain với diverse data
Workflow engineComfyUI✓ Production
GPURTX A6000 48GB (Vast.ai)✓ $0.43/h, ~30s/page
Repogithub.com/Foxetales/ai-infra✓ Private, all artifacts

📊 Kết quả

Sheet 1: Polish 5 ethnicities × 2 templates (iter6 baseline)

5 ethnicities × 2 templates
VN/Asian + Blonde production-ready. Indian medium. Black weak — skin tone fight LoRA prior.

Sheet 2: Face-only vs Full-character mask A/B

Face vs Person mask
Face-only: VN/Indian/Blonde nhìn giống nhau ở full-page. Person-mask: identity rõ hơn nhưng phá pose ở các trang risk.

Sheet 3: Black ethnicity tactics exhausted

Black tactics
4 strategies (strong anchors, anti-Caucasian, max PuLID, combo) đều không đẩy skin tone đủ dark. Prompt-only hit wall.

Sheet 4: Full-body workflow comparison (iter7 winner)

Fullbody vs Faceonly
Locked config: denoise 0.55, grow_mask +2, feather 12px. Full-character swap giải quyết "khoảng đen mask" và skin tone body inconsistency.

Sheet 5: Skin tone postprocess (final fix cho Black)

Skin shift
Post-process YCbCr skin detection × person-mask → shift dark brown chỉ trong character area, không đụng background (trái đất, etc).

Cross-page T16 validation

Đang chạy batch: 34 pages × 4 ethnicities = 136 renders. Mỗi page ~30s, total ~70 min.

Kết quả sẽ được link vào contact sheet riêng và Drive folder.

🔍 Key Findings

✅ Đã giải quyết

⚠️ Limitations hiện tại

🎯 Production trade-offs

ApproachProConRecommend
Face-only mask Safe, không phá pose Personalization shallow ở full page ❌ Reject (per Chi)
Full-character mask Identity + body skin match Phá pose ở risky pages ✓ Use cho 80% pages
Template variants per gender/hair UX clear, no AI guess 4× art cost per page ❌ Reject (per Chi: hiện đã làm rồi, không reduce workload)
LoRA fine-tune diverse skin tones Structural fix Black/Indian quality 1-2h training + cần diverse dataset ⚠ Phase 2

💰 Cost Analysis

Detailed cost model: 📄 Full cost analysis (VI)

Per-order economics

ItemCostNote
GPU compute / 1 order (35 pages + 50 previews + regen)~$1.61RunPod A100 80GB serverless, ~51 min GPU time
+ 25% buffer (cold start, idle, retry)$0.40Production safety margin
+ Storage + DB + CDN$0.06R2 + Workers KV + Sentry
Total infra cost / order~$2.07 (~53,000 VND)Stable across scale

Monthly infra by scale

Orders / month1,0002,5005,0007,50010,000
GPU hours (estimate)8542,1344,2686,4028,536
GPU cost (raw)$1,614$4,034$8,067$12,101$16,134
+ buffer + storage + Cloudflare + monitoring$503$1,185$2,291$3,426$4,552
Total infra / month~$2,117~$5,219~$10,358~$15,527~$20,686
Cost / order$2.12$2.09$2.07$2.07$2.07

Optimization roadmap

OptimizationSavingImplementation effort
FLUX fp8 quantization (50% memory, 2x speed)~40% GPU cost ↓ ($2.07 → $1.30)1 day
Step distillation (25 → 8 steps)~30% additional ↓2-3 days
Batch processing (multi-page parallel)~15% ↓ (cold start amortization)1 day
Cache warm models (avoid 30s reload)~10%0.5 day

Realistic target Phase 2: Cost / order ~$1.20-1.40 (40% reduction) within 1 week of optimization work.

Revenue context

Industry benchmark: Wonderbly sells personalized books at $40-60. Cost of goods (printing + shipping + AI) typically 30-40% = $12-24.

FoxeTales infra cost: ~$2/order = ~5% of revenue (very healthy margin for AI component).

Breakeven scale: Even at 100 orders/month (~$210 infra), business model viable. Optimization room is significant at scale.

🚀 Next Steps

Tuần này (immediate)

1. ControlNet pose preservation

Tích hợp ControlNet (OpenPose hoặc Canny) để hold pose ở risky pages (p42 sleeping, side view).

ETA: 1 day. Cost: $5 GPU.

2. Diverse skin LoRA fine-tune

Mở dataset ra với 50-100 watercolor children diverse ethnicity. Retrain LoRA v5.

ETA: 2 days (cần dataset từ artist). Cost: $10.

Tuần sau (production prep)

3. Worker API (FastAPI)

REST endpoint: POST /generate-page nhận customer photo + page id, return rendered PNG.

Queue + status polling. ETA: 1 day.

4. Demo storefront page

Next.js page: upload photo → preview 3 pages personalized. Show book mockup.

ETA: 2 days. Foundation cho production UX.

5. RunPod serverless deploy

Worker container deploy lên RunPod serverless. Auto-scale, pay-per-request (~$0.005/page).

ETA: 1 day.

6. Page classification heuristic

Auto-detect page type (standing/sitting/sleeping/side) → load đúng denoise + mask strategy.

ETA: 1 day.

🙏 Cần từ team Artist

  1. Per-page person-region masks chuẩn (cho 34 trang T16 + các book khác).
    Em đang dùng heuristic alpha-channel mask → noisy ở edge. Artist-prepared mask = clean edge, không bị bleed.
  2. Dataset bổ sung cho diverse skin tone LoRA.
    Cần ~50-100 watercolor illustrations với kid Black, Indian, Latino, mixed (cùng style FoxeTales). Có thể là sketch + paint reference.
  3. Confirm pages risk-classification.
    Em đã identify p42 sleeping, p15 large face là risky. Artist giúp list ra tất cả pages cần "preserve-pose" strategy đặc biệt (không full-body regen).
  4. Style guide tóc / outfit constraints.
    Khi customer trai upload, AI có swap hair short không? Outfit có giữ nguyên không? Cần boundary rõ.
  5. QA volunteer: 5-10 ảnh customer thật (với consent) để test pipeline trên real data, không chỉ stock photos.