Hi everyone,
I’m a professional in the stone restoration trade (marble/terrazzo) and I’m building a diagnostic dataset for Computer Vision. I’ve uploaded a sample project to Hugging Face and would appreciate some technical feedback on my approach.
Link to Dataset: https://huggingface.co/datasets/RomMilk/marble-surface-damage-coco/tree/main
What’s in the Zips: I’ve provided two versions of the same data so researchers can compare:
- output_tiles.zip: 181 tiled patches (512x512) with COCO JSON annotations.
- Full Res.zip: The original high-resolution captures of the same surfaces.
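If it helps reviewers get started, here is a minimal sketch of loading the tiled set with pycocotools and counting instances per class. The annotation filename and folder layout ("output_tiles/annotations.json") are assumptions on my part; adjust the paths to whatever the archive actually unpacks to.

```python
# Minimal sketch: load the COCO annotations and count instances per damage class.
# "output_tiles/annotations.json" is an assumed path, not the guaranteed zip layout.
from pycocotools.coco import COCO

coco = COCO("output_tiles/annotations.json")

# List the damage categories present in the set
cats = coco.loadCats(coco.getCatIds())
print("Categories:", [c["name"] for c in cats])

# Count annotated instances per category
for cat in cats:
    ann_ids = coco.getAnnIds(catIds=[cat["id"]])
    print(f"{cat['name']}: {len(ann_ids)} instances")
```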
The Project Focus: This specific set features unpolished and dull surfaces. I have intentionally NOT labeled for “dullness” or “dirt.” Instead, I am focusing strictly on physical substrate damage:
- Surface Cracks & Chips
- Grout Failure / Eroded Grout
- Deep Scratches
The goal is to train a model that can “see through” the dirt and lack of shine to find permanent structural issues.
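If you want to sanity-check that policy visually, a quick way is to overlay the polygons on a tile and confirm that only cracks, chips, grout failure, and scratches are outlined, not stains or dull patches. The paths below are again assumptions; point them at the extracted output_tiles.zip.

```python
# Sketch: overlay the damage polygons on one tile for a quick visual check.
# Paths are assumed; adjust to the actual contents of output_tiles.zip.
import matplotlib.pyplot as plt
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("output_tiles/annotations.json")
img_info = coco.loadImgs(coco.getImgIds()[0])[0]
tile = Image.open(f"output_tiles/{img_info['file_name']}")

plt.imshow(tile)
anns = coco.loadAnns(coco.getAnnIds(imgIds=[img_info["id"]]))
coco.showAnns(anns)  # draws the polygon masks on the current axes
plt.axis("off")
plt.show()
```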
Future Scope: This is just the start. I have a library of thousands of images of clean, polished marble, as well as different stone types (Terrazzo, Granite) and specific architectural features like stone showers and countertops.
I’m looking for your expertise on:
- Annotation Quality: Do these polygons/labels look precise enough for your training pipelines?
- The "Tile" vs. "Hi-Res" Debate: For detecting hairline cracks in stone, do you prefer working with these pre-cut 512x512 tiles, or is it better to have the full-scale image? (A re-tiling sketch follows this list.)
- Labeling: Since I have "Clean/Polished" versions of these same stone types, would adding those as a "Baseline" class significantly increase the value of this set?
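On the tile question: if you would rather work from the full-resolution captures, re-tiling them yourself is straightforward. The sketch below cuts overlapping 512x512 windows so hairline cracks are not lost at tile borders; the folder names, tile size, and overlap are illustrative choices, not part of the uploaded dataset, and the COCO polygons would still need to be re-clipped to the new grid.

```python
# Illustrative sketch: cut overlapping 512x512 tiles from the full-resolution captures.
# "full_res" / "my_tiles" and the 128 px overlap are assumptions, not the dataset layout.
from pathlib import Path
from PIL import Image

TILE, STRIDE = 512, 384  # stride < tile size gives a 128 px overlap

src_dir = Path("full_res")
out_dir = Path("my_tiles")
out_dir.mkdir(exist_ok=True)

for img_path in src_dir.glob("*.jpg"):
    img = Image.open(img_path)
    w, h = img.size
    for top in range(0, max(h - TILE, 0) + 1, STRIDE):
        for left in range(0, max(w - TILE, 0) + 1, STRIDE):
            tile = img.crop((left, top, left + TILE, top + TILE))
            tile.save(out_dir / f"{img_path.stem}_y{top}_x{left}.jpg")
```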
Thanks for any insights you can share!