Description
In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual
phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they
participate in relations. We introduce a dataset suitable for
phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases.
We show that a visual phrase detector significantly outperforms a baseline which detects component objects and
reasons about relations, even though visual phrase training
sets tend to be smaller than those for objects. We argue that
any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without
solving difficult inference problems. We show this decoding
procedure outperforms the state of the art. Finally, we show
that decoding a combination of phrasal and object detectors
produces real improvements in detector results.
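
For readers unfamiliar with the conventional decoding step mentioned above, the following is a minimal sketch of greedy non-maximum suppression. It illustrates only the standard baseline, not the decoding procedure proposed in the paper. The names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative value, not taken from the paper
) -> List[int]:
    """Return indices of the boxes kept, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

Greedy suppression of this kind considers only pairwise overlap and detector score; it does not use the surrounding context that the abstract's decoding procedure is described as accounting for.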