CLIP (Contrastive Language-Image Pre-training) is a model by OpenAI that connects images and text by learning from a large dataset of image-text pairs. It uses separate encoders for images and text and trains them to align matching pairs in a shared embedding space, which enables zero-shot tasks such as image classification and image-text retrieval without task-specific fine-tuning. CLIP is widely used in applications like image search and as a component that guides text-to-image generation models.
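As a minimal sketch of how zero-shot classification works in practice, the example below uses the Hugging Face `transformers` implementation of CLIP: the image and a set of candidate text labels are embedded into the shared space, and the label whose embedding is most similar to the image gets the highest score. The image path and labels here are placeholders.

```python
# Zero-shot image classification with CLIP (Hugging Face transformers).
# Assumes `transformers`, `torch`, and `Pillow` are installed.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Encode the image and each text prompt into the shared embedding space,
# then score the image against every prompt.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # similarities -> probabilities

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

No fine-tuning is needed; swapping in a different label set changes the classifier, which is what makes the setup "zero-shot".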