In this paper, we study the problem of Cooperative Localization (CL) for two robots, each equipped with an Inertial Measurement Unit (IMU) and a camera. We present an algorithm that enables the robots to exploit common features, observed over a sliding-window time horizon, to improve the localization accuracy of both. In contrast to existing CL methods, which require robot-to-robot distance and/or bearing observations, our algorithm infers the relative position and orientation (pose) of the robots using only visual observations of common features in the scene. Moreover, we analyze the observability properties of the system to determine how many degrees of freedom (d.o.f.) of the relative transformation can be computed under different measurement scenarios. Lastly, we present simulation results that evaluate the performance of the proposed method.

