We trained three large CLIP models with OpenCLIP: ViT-L/14, ViT-H/14 and ViT-g/14 (ViT-g/14 was trained for only about a third as many epochs as the others). The H/14 model achieves 78.0% zero-shot top-1 accuracy on ImageNet and 73.4% zero-shot image retrieval on MS COCO. As of September 2022, this is the best open source CLIP model. CLIP makes it possible to compute representations of images and texts and measure how similar they are. One application is zero-shot classification: compare an image with the text of each class to determine which class is most similar (e.g., ImageNet classification).
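
Below is a minimal sketch of zero-shot classification with the open_clip library. It assumes the open_clip_torch package is installed; the pretrained tag, class names, and image path are illustrative placeholders, not values from this post.

```python
import torch
from PIL import Image
import open_clip

# Load a pretrained CLIP model and its image preprocessing transforms.
# "laion2b_s32b_b79k" is assumed here as the LAION-2B H/14 checkpoint tag.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-H-14")
model.eval()

# Candidate class prompts and the image to classify ("cat.jpg" is a placeholder).
class_names = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(class_names)

with torch.no_grad():
    # Encode both modalities into the shared embedding space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Softmax over similarities gives a probability per class prompt.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(class_names, probs.squeeze(0).tolist())))
```

The class whose text embedding is most similar to the image embedding is the predicted label; no task-specific training is needed.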