Image Search API: inconsistent results?

{"index": 25075, "score": 0.775077223777771, "url": "https://upload.wikimedia.org/wikipedia/commons/1/10/Air_MET_DP104981.jpg", "entities": [{"entity_name": "Air", "entity_attributes": {"composition_nitrogen": "78.08%", "composition_oxygen": "20.95%", "composition_argon": "0.93%", "composition_carbon_dioxide": "0.04%", "average_molecular_weight": "28.946 g/mol", "mass": "5.15\u00d710^18 kg", "average_temperature_at_surface": "14 \u00b0C", "average_pressure_at_sea_level": "101325 pascals", "troposphere_height": "12 km", "stratosphere_height": "50 km", "mesosphere_height": "80 km", "thermosphere_height": "500-1000 km", "exosphere_height": "700-10,000 km", "ionosphere_height": "50-1000 km", "ozone_layer_concentration": "2-8 parts per million", "greenhouse_gases": ["carbon dioxide", "methane", "nitrous oxide"], "density_at_sea_level": "1.2 kg/m^3", "name": "Atmosphere of Earth"}}]}

The above is a result I obtained through the image search API. As you can see, the image at the URL (a sculpture) does not match the entity_name, whose attributes describe the Earth's atmosphere.

Below are the method and parameters I used. I tried both the main version and v0.5.0 of the API.
search_pipeline = UnifiedSearchPipeline(
    image_model_name="openai/clip-vit-large-patch14-336",
    image_hf_dataset_id="crag-mm-2025/image-search-index-validation",
)

Should participants treat this kind of inconsistency between images and text as noise? Or is this a bug that will be fixed officially in the future?

@buptxy A mismatch between the image and entity_name can happen and should be treated as noise. In practice, we have automatic mechanisms that remove some of the noise, but it is impossible to eliminate all data errors at scale. Nevertheless, we appreciate the feedback and would love to hear more about your observations!
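
For participants who want to handle this noise on their side, one possible approach is to drop low-confidence hits before using their entity metadata. The sketch below is only an illustration, not part of the official API: it assumes each result is a dict with the "score" and "entities" keys shown in the sample above, and the helper name filter_noisy_results, the 0.8 threshold, and the second sample record are all hypothetical. A score threshold will not catch every mismatch, so it is best tuned on the validation data.

```python
from typing import Any


def filter_noisy_results(
    results: list[dict[str, Any]], min_score: float = 0.8
) -> list[dict[str, Any]]:
    """Keep results with a similarity score >= min_score and at least one entity.

    Hypothetical helper: the threshold is an assumption, not an official value.
    """
    kept = []
    for result in results:
        score = result.get("score", 0.0)
        entities = result.get("entities") or []
        if score >= min_score and entities:
            kept.append(result)
    return kept


# First record mirrors the result in the question (attributes trimmed);
# the second record is made up for illustration.
sample = [
    {"index": 25075, "score": 0.775077223777771,
     "entities": [{"entity_name": "Air"}]},
    {"index": 1, "score": 0.91,
     "entities": [{"entity_name": "Hypothetical entity"}]},
]

print([r["index"] for r in filter_noisy_results(sample)])
```

With the default threshold, the result from the question (score about 0.775) is filtered out. Lowering min_score keeps it, so the threshold trades recall against the risk of using mismatched metadata.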