Point a vision-language model at a folder of scans. Glimp renames and sorts them by category, caption, date, and orientation — entirely on-device.
Drop a folder of scans. Glimp inspects each image and proposes a new name, category, caption, and rotation. Approve in a grid view.
Runs MLX-quantized vision-language models (Qwen3-VL-4B by default) via mlx-swift-examples. Nothing leaves the machine.
Thirteen categories: photo · portrait · group · travel · event · card · letter · document · id · art · screenshot · receipt · other. The classification prompt was iterated against a labeled eval set.
Detects the upright orientation and writes EXIF rotation. 85% accuracy on a 56-image FastFoto-heavy test set.
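As a rough illustration of what "writes EXIF rotation" involves, here is a minimal sketch that maps a detected clockwise correction to the standard EXIF Orientation tag value. The type and function names are hypothetical and not taken from Glimp's source. It also assumes the detector reports the clockwise rotation needed to make the image upright, which may not match how Glimp represents it.

```swift
/// EXIF Orientation tag values for the four non-mirrored orientations.
/// (Hypothetical helper type; not Glimp's actual API.)
enum ExifOrientation: Int {
    case up = 1     // display as stored
    case down = 3   // rotate 180° to display upright
    case right = 6  // rotate 90° clockwise to display upright
    case left = 8   // rotate 270° clockwise (90° counter-clockwise) to display upright
}

/// Maps "degrees clockwise needed to make the scan upright" to an EXIF tag value.
/// Assumption: the model's answer has already been parsed into an integer angle.
/// Returns nil for angles that are not a multiple of 90.
func exifOrientation(forClockwiseDegrees degrees: Int) -> ExifOrientation? {
    // Normalize into 0..<360 so that -90 and 270 are treated the same.
    let normalized = ((degrees % 360) + 360) % 360
    switch normalized {
    case 0: return .up
    case 90: return .right
    case 180: return .down
    case 270: return .left
    default: return nil
    }
}
```

Writing only the tag leaves the pixel data untouched, so the correction stays lossless and reversible.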
Ships with eval-batch and a seeded eval-search so prompt and model swaps are testable, not vibes.
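To show why seeding matters for a search like this, here is a minimal sketch of a seeded candidate sampler. It assumes the search draws (prompt, temperature) pairs at random. The `SplitMix64` generator, the `Candidate` type, and `sampleCandidates` are illustrative names, not the actual eval-search implementation.

```swift
/// Small deterministic generator (SplitMix64) so that a given seed
/// always yields the same sequence. Swift's SystemRandomNumberGenerator
/// cannot be seeded, hence a custom one.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

/// Hypothetical search candidate: one prompt paired with one decode setting.
struct Candidate: Equatable {
    let prompt: String
    let temperature: Double
}

/// Draws `count` candidates reproducibly from the given option lists.
func sampleCandidates(prompts: [String], temperatures: [Double],
                      count: Int, seed: UInt64) -> [Candidate] {
    precondition(!prompts.isEmpty && !temperatures.isEmpty, "option lists must be non-empty")
    var rng = SplitMix64(seed: seed)
    var result: [Candidate] = []
    for _ in 0..<count {
        let prompt = prompts.randomElement(using: &rng)!
        let temperature = temperatures.randomElement(using: &rng)!
        result.append(Candidate(prompt: prompt, temperature: temperature))
    }
    return result
}
```

With the same seed, two runs propose the same candidates, so a change in score can be attributed to the prompt or model swap rather than to sampling noise.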
No cloud calls, no analytics on image content. Your scans stay in their folder; the model lives in ~/Library.
Download Glimp.zip and unzip.
Drag Glimp.app into /Applications.
The first run downloads the model to ~/Library/Caches/huggingface. Subsequent runs are offline.
Four MLX-quantized VLMs, four prompts, 56 hand-labeled scans. What moved the number, what didn't, and one model that was completely dead.
A ~600-line Swift tool that proposes (prompt, decode-params) cells, scores each, and emits a Pareto front. So I stop hand-rolling v6.
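For readers unfamiliar with the term, a Pareto front keeps only the cells that no other cell beats on every objective. The sketch below assumes two objectives, accuracy (higher is better) and seconds per image (lower is better). The teaser does not say which objectives the tool actually scores, and the `Cell` type and its fields are hypothetical.

```swift
/// One scored (prompt, decode-params) cell. Field names are illustrative only.
struct Cell {
    let prompt: String
    let temperature: Double
    let accuracy: Double         // higher is better
    let secondsPerImage: Double  // lower is better
}

/// `a` dominates `b` if it is at least as good on both objectives
/// and strictly better on at least one.
func dominates(_ a: Cell, _ b: Cell) -> Bool {
    let noWorse = a.accuracy >= b.accuracy && a.secondsPerImage <= b.secondsPerImage
    let strictlyBetter = a.accuracy > b.accuracy || a.secondsPerImage < b.secondsPerImage
    return noWorse && strictlyBetter
}

/// Returns the cells that no other cell dominates, in their original order.
/// O(n²) comparisons, which is fine for a few hundred cells.
func paretoFront(_ cells: [Cell]) -> [Cell] {
    cells.filter { candidate in
        !cells.contains { other in dominates(other, candidate) }
    }
}
```

A cell that is slower and less accurate than some other cell drops out. What remains is the set of real trade-offs to choose between.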