Mitigating Objectness Bias and Region-to-Text Misalignment for Open-Vocabulary Panoptic Segmentation

CVPR 2026

Nikolay Kormushev¹,², Josip Saric¹,³, Matej Kristan¹

¹University of Ljubljana, ²ETH Zurich, ³University of Zagreb

OVRCOAT segmentation examples.

Abstract

Open-vocabulary panoptic segmentation remains hindered by two coupled issues: (i) mask selection bias, where objectness heads trained on closed vocabularies suppress masks of categories not observed in training, and (ii) limited regional understanding in vision–language models such as CLIP, which were optimized for global image classification rather than localized segmentation. We introduce OVRCOAT, a simple, modular framework that tackles both. First, a CLIP-conditioned objectness adjustment (COAT) updates background/foreground probabilities, preserving high-quality masks for out-of-vocabulary objects. Second, an open-vocabulary mask-to-text refinement (OVR) strengthens CLIP’s region-level alignment, improving classification of both seen and unseen classes at markedly lower memory cost than prior fine-tuning schemes. Together, the two components improve both objectness estimation and mask recognition, yielding consistent panoptic gains. Despite its simplicity, OVRCOAT sets a new state of the art on ADE20K (+5.5% relative PQ) and delivers clear gains on Mapillary Vistas and Cityscapes (+7.1% and +3% relative PQ, respectively).
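The abstract does not give the COAT formula, so the following is only an illustrative sketch of the general idea of letting CLIP confidence raise the foreground probability of a mask that a closed-vocabulary objectness head would suppress. The function name `adjust_objectness`, the blending weight `alpha`, and the geometric interpolation are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def adjust_objectness(p_fg, clip_probs, alpha=0.5):
    """Blend closed-vocabulary objectness with CLIP confidence (hypothetical rule).

    p_fg:       (N,) foreground probability per mask from the objectness head.
    clip_probs: (N, C) per-mask CLIP class probabilities over the open vocabulary.
    alpha:      assumed blending weight; 0 keeps p_fg, 1 uses CLIP confidence only.
    """
    p_fg = np.asarray(p_fg, dtype=float)
    # Confidence of CLIP's best-matching class for each mask.
    clip_conf = np.asarray(clip_probs, dtype=float).max(axis=1)
    # Geometric interpolation between the two scores (an illustrative choice).
    return p_fg ** (1.0 - alpha) * clip_conf ** alpha

# Mask 0: confident objectness head. Mask 1: suppressed by the head, but CLIP
# matches it strongly to some class in the vocabulary.
p_fg = np.array([0.9, 0.1])
clip_probs = np.array([[0.8, 0.2], [0.95, 0.05]])
adjusted = adjust_objectness(p_fg, clip_probs)
```

Under these assumed inputs, the second mask's foreground score rises above its original 0.1, so it is less likely to be discarded before classification. A real system would need to calibrate how much weight CLIP receives; the value of `alpha` here is arbitrary.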

Quantitative Results

Dataset	ADE20K			Mapillary			Cityscapes			COCO
Method	PQ	SQ	RQ	PQ	SQ	RQ	PQ	SQ	RQ	PQ	SQ	RQ
MaskCLIP	15.1	70.5	19.2	-	-	-	-	-	-	-	-	-
FreeSeg	16.3	71.8	21.6	-	-	-	-	-	-	-	-	-
OPSNet	19.0	52.4	23.0	-	-	-	41.5	67.5	50.0	52.4	83.5	62.1
ODISE	23.4	78.1	28.3	14.2	61.0	17.2	23.9	75.3	29.0	55.4	-	-
FC-CLIP	26.8	71.2	32.3	18.3	56.0	23.1	44.0	75.4	53.6	54.4	-	-
MAFT+	27.1	73.5	32.9	15.7	55.5	19.8	38.3	70.2	46.9	50.3	82.2	60.3
OVRCOAT	28.6(+1.5)	77.3(-0.8)	34.7(+1.8)	19.6(+1.3)	65.7(+4.7)	24.8(+1.7)	45.3(+1.3)	78.7(+3.3)	55.6(+2.0)	54.6	82.9	65.1
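
The parenthesized deltas in the OVRCOAT row appear to be absolute differences from the best prior method in each column, while the percentages quoted in the abstract match the corresponding relative PQ gains. The short check below recomputes both from the PQ values in the table; the helper name `pq_gains` is introduced here for illustration only.

```python
# PQ values copied from the table (prior methods with a reported PQ only).
prior_pq = {
    "ADE20K": {"MaskCLIP": 15.1, "FreeSeg": 16.3, "OPSNet": 19.0,
               "ODISE": 23.4, "FC-CLIP": 26.8, "MAFT+": 27.1},
    "Mapillary": {"ODISE": 14.2, "FC-CLIP": 18.3, "MAFT+": 15.7},
    "Cityscapes": {"OPSNet": 41.5, "ODISE": 23.9, "FC-CLIP": 44.0, "MAFT+": 38.3},
}
ovrcoat_pq = {"ADE20K": 28.6, "Mapillary": 19.6, "Cityscapes": 45.3}

def pq_gains(dataset):
    """Return (best prior method, absolute PQ gain, relative PQ gain in %)."""
    best_method, best = max(prior_pq[dataset].items(), key=lambda kv: kv[1])
    absolute = ovrcoat_pq[dataset] - best
    relative = 100.0 * absolute / best
    return best_method, round(absolute, 1), round(relative, 1)

for name in prior_pq:
    print(name, pq_gains(name))
# ADE20K ('MAFT+', 1.5, 5.5)
# Mapillary ('FC-CLIP', 1.3, 7.1)
# Cityscapes ('FC-CLIP', 1.3, 3.0)
```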

Qualitative Results on ADE20K

Qualitative Results on Internet Images
