ServiceNow
/

Llama-3.2-11B-Vision-Instruct-StarFlow

+---
+language:
+- en
+library_name: transformers
+pipeline_tag: image-text-to-text
+license: llama3.2
+datasets:
+- ServiceNow/BigDocs-Sketch2Flow
+base_model:
+- meta-llama/Llama-3.2-11B-Vision-Instruct
+---
+# Model Card for ServiceNow/Llama-3.2-11B-Vision-Instruct-StarFlow
+Llama-3.2-11B-Vision-Instruct-StarFlow is a vision-language model finetuned for **structured workflow generation from sketch images**. It translates hand-drawn or computer-generated workflow diagrams into structured JSON workflows, including triggers, flow logic, and actions.
+## Model Details
+### Model Description
+Llama-3.2-11B-Vision-Instruct-StarFlow is part of the **StarFlow** framework for automating workflow creation. It extends Meta's Llama-3.2-11B-Vision-Instruct with domain-specific finetuning on workflow diagrams, enabling accurate sketch-to-workflow generation.
+* **Developed by:** ServiceNow Research
+* **Model type:** Transformer-based Vision-Language Model (VLM)
+* **Language(s) (NLP):** English
+* **License:** [llama3.2](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)
+* **Finetuned from model :** [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
+### Model Sources
+* **Repository:** [ServiceNow/Llama-3.2-11B-Vision-Instruct-StarFlow](https://huggingface.co/ServiceNow/Llama-3.2-11B-Vision-Instruct-StarFlow)
+* **Paper:** [StarFlow: Generating Structured Workflow Outputs From Sketch Images](https://arxiv.org/abs/2503.21889);
+---
+## Uses
+### Direct Use
+* Translating **sketches of workflows** (hand-drawn, whiteboard, or digital diagrams) into **JSON structured workflows**.
+* Supporting **workflow automation** in enterprise platforms by removing the need for manual low-code configuration.
+### Downstream Use
+* Integration into **low-code platforms** (e.g., ServiceNow Flow Designer) for rapid prototyping of workflows.
+* Used in **automation migration pipelines**, e.g., converting legacy workflow screenshots into JSON representations.
+### Out-of-Scope Use
+* General-purpose vision-language tasks (e.g., image captioning, OCR).
+* Use on domains outside workflow automation (e.g., arbitrary diagram-to-code).
+* Real-time handwriting recognition (StarFlow focuses on structured workflow translation, not raw OCR).
+---
+## Bias, Risks, and Limitations
+* **Limited generalization**: Finetuned models perform poorly on out-of-distribution diagrams from unfamiliar platforms.
+* **Sensitivity to input style**: Whiteboard/handwritten sketches degrade performance compared to digital or UI-rendered workflows.
+* **Component naming mismatches**: Model may mispredict action definitions (e.g., “create\_user” vs. “create\_a\_user”), leading to execution errors.
+* **Evaluation gap**: Current metrics don’t always reflect execution correctness of generated workflows.
+### Recommendations
+Users should:
+* Validate outputs before deployment.
+* Be cautious with **handwritten/ambiguous sketches**.
+* Consider supplementing with **retrieval-augmented generation (RAG)** or **tool grounding** for robustness.
+---
+## How to Get Started with the Model
+```python
+from transformers import AutoProcessor, AutoModelForVision2Seq
+from PIL import Image
+processor = AutoProcessor.from_pretrained("ServiceNow/Llama-3.2-11B-Vision-Instruct-StarFlow")
+model = AutoModelForVision2Seq.from_pretrained("ServiceNow/Llama-3.2-11B-Vision-Instruct-StarFlow")
+image = Image.open("workflow_sketch.png")
+inputs = processor(images=image, text="Generate workflow JSON", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=4096)
+workflow_json = processor.decode(outputs[0], skip_special_tokens=True)
+print(workflow_json)
+```
+---
+## Training Details
+### Training Data
+The model was trained using the [ServiceNow/BigDocs-Sketch2Flow](https://huggingface.co/datasets/ServiceNow/BigDocs-Sketch2Flow) dataset, which includes the following data distribution:
+* **Synthetic** (12,376 Graphviz-generated diagrams)
+* **Manual** (3,035 sketches hand-drawn by annotators)
+* **Digital** (2,613 diagrams drawn using software)
+* **Whiteboard** (484 sketches drawn on whiteboard / blackboard)
+* **User Interface** (373 screenshots from ServiceNow Flow Designer)
+### Training Procedure
+#### Preprocessing
+* Synthetic workflows generated via **heuristics** (Scheduled Loop, IF/ELSE, FOREACH, etc.).
+* Annotators recreated flows in digital, manual, and whiteboard formats.
+#### Training Hyperparameters
+* Optimizer: **AdamW** with β=(0.95,0.999), lr=2e-5, weight decay=1e-6.
+* Scheduler: **cosine learning rate** with 30 warmup steps.
+* Early stopping based on validation loss.
+* Precision: **bf16 mixed-precision**.
+* Sequence length: up to **32k tokens**.
+#### Speeds, Sizes, Times
+* Trained with **16× NVIDIA H100 80GB GPUs** across two nodes.
+* Full Sharded Data Parallel (FSDP) training, no CPU offloading.
+---
+## Evaluation
+### Testing Data
+Same dataset distribution as training: synthetic, manual, digital, whiteboard, UI-rendered workflows.
+### Factors
+* **Source of sample** (synthetic, manual, UI, etc.)
+* **Orientation** (portrait vs. landscape diagrams)
+* **Resolution** (small <400k pixels, medium, large >1M pixels)
+### Metrics
+All Evaluation metrics can be found in the official [StarFlow repo](https://github.com/ServiceNow/StarFlow).
+* **Flow Similarity (FlowSim)** – tree edit distance similarity.
+* **TreeBLEU** – structural recall of subtrees.
+* **Trigger Match (TM)** – accuracy of workflow triggers.
+* **Component Match (CM)** – overlap of predicted vs. gold components.
+### Results
+* Proprietary models (GPT-4o, Claude-3.7, Gemini 2.0) outperform open-weights **without finetuning**.
+* **Finetuned Pixtral-12B achieves SOTA**:
+  * FlowSim w/ inputs: **0.919**
+  * TreeBLEU w/ inputs: **0.950**
+  * Trigger Match: **0.753**
+  * Component Match: **0.930**
+#### Summary
+Finetuning yields **large gains over base Pixtral-12B and GPT-4o**, particularly in matching workflow components and triggers.
+## Model Examination
+* Finetuned models capture **naming conventions** and structured execution logic better.
+* Failure modes include **missing ELSE branches** or **generic table names**.
+---
+## Technical Specifications
+### Model Architecture and Objective
+* Base: **Llama-3.2-11B Vision Instruct**, a multimodal LLM with 11 B parameters, optimized for image reasoning and instruction-following tasks.
+* Objective: **Image-to-JSON structured workflow generation**.
+### Compute Infrastructure
+* **Hardware:** 16× NVIDIA H100 80GB (2 nodes)
+* **Software:** FSDP, bf16 mixed precision, PyTorch/Transformers
+---
+## Citation
+**BibTeX:**
+```bibtex
+@article{bechard2025starflow,
+  title={StarFlow: Generating Structured Workflow Outputs from Sketch Images},
+  author={B{\'e}chard, Patrice and Wang, Chao and Abaskohi, Amirhossein and Rodriguez, Juan and Pal, Christopher and Vazquez, David and Gella, Spandana and Rajeswar, Sai and Taslakian, Perouz},
+  journal={arXiv preprint arXiv:2503.21889},
+  year={2025}
+}
+```
+**APA:**
+Béchard, P., Wang, C., Abaskohi, A., Rodriguez, J., Pal, C., Vazquez, D., Gella, S., Rajeswar, S., & Taslakian, P. (2025). **StarFlow: Generating Structured Workflow Outputs from Sketch Images**. *arXiv preprint arXiv:2503.21889*.
+---
+## Glossary
+* **FlowSim**: Metric based on tree edit distance for workflows.
+* **TreeBLEU**: BLEU-like score using tree structures.
+* **Trigger Match**: Correctness of predicted workflow trigger.
+* **Component Match**: Correctness of predicted components (order-agnostic).
+---
+## More Information
+* [ServiceNow Flow Designer](https://www.servicenow.com/products/platform-flow-designer.html)
+* [StarFlow Blog](https://www.servicenow.com/blogs/2025/starflow-ai-turns-sketches-into-workflows)
+---
+## The StarFlow Team
+* Patrice Béchard, Chao Wang, Amirhossein Abaskohi, Juan Rodriguez, Christopher Pal, David Vazquez, Spandana Gella, Sai Rajeswar, Perouz Taslakian
+---
+## Model Card Contact
+* Patrice Bechard - [[email protected]](mailto:[email protected])
+* ServiceNow Research – [research.servicenow.com](https://research.servicenow.com)