Embed the world: Multimodal AI for searchable aerial imagery at scale
The AWS Generative AI Innovation Center (GenAIIC) partnered with Vexcel to build a multimodal AI system that makes aerial imagery searchable at scale, using Amazon Bedrock and Amazon OpenSearch Serverless. The system combines multimodal embeddings, large language model (LLM) captioning, and vector search to turn imagery into knowledge bases that can be queried in natural language. The evaluation methodology, built on OpenStreetMap ground truth, compared embedding models, fusion strategies, captioning, and search methods; Amazon Nova Multimodal Embeddings delivered the highest F1 scores. Because the approach removes the per-feature training step, a new kind of feature can be searched for without first training a dedicated model. For engineers building AI systems, the same architecture could be applied to semantic search in other domains.
⚡ Key Takeaways
- Amazon Nova Multimodal Embeddings delivered the highest F1 scores across both benchmark queries in the evaluation.
- The system uses a combination of multimodal embeddings, LLM captioning, and vector search on AWS to enable natural-language-searchable knowledge bases.
- The evaluation methodology was built on OpenStreetMap ground truth, which gives a common reference for comparing embedding models, fusion strategies, captioning, and search methods.
- The system can be used to search millions of aerial images without per-feature training, reducing the need for manual inspection or bespoke computer vision models.
- The use of Amazon Bedrock and Amazon OpenSearch Serverless enables scalable and efficient deployment of the system.
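To make the embedding-plus-vector-search idea in the takeaways concrete, here is a minimal Python sketch. It is not the system described in the article: it assumes each aerial tile has several views, that every view already has an embedding vector, and that the views are fused by averaging, which is only one possible fusion strategy. The toy 3-dimensional vectors, the tile IDs, and the function names (`l2_normalize`, `fuse_mean`, `cosine_top_k`) are all hypothetical. In a real deployment the embeddings would come from an embedding model and the search would run in a vector store rather than in a brute-force loop.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse_mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Fuse several per-view embeddings into one vector by averaging.

    Assumption: mean fusion is used here for illustration only.
    """
    if not vectors:
        raise ValueError("need at least one vector to fuse")
    unit = [l2_normalize(v) for v in vectors]
    dim = len(unit[0])
    if any(len(u) != dim for u in unit):
        raise ValueError("all vectors must have the same dimension")
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def cosine_top_k(
    query: Sequence[float], index: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Brute-force cosine search: return the k most similar (id, score) pairs."""
    q = l2_normalize(query)
    scored = [
        (tile_id, sum(a * b for a, b in zip(q, l2_normalize(vec))))
        for tile_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: two tiles, each with two view embeddings.
index = {
    "tile_a": fuse_mean([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "tile_b": fuse_mean([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),
}

# Stand-in for the embedding of a natural-language query.
query_embedding = [1.0, 0.05, 0.0]
top_hits = cosine_top_k(query_embedding, index, k=1)
```

Normalizing before averaging keeps one view with a large norm from dominating the fused vector, and normalizing again afterwards lets a plain dot product serve as the cosine score.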
The system is relevant to industries that rely on geospatial data, such as insurance, real estate, government, infrastructure, and agriculture. Faster search over large image collections can help these organizations make better-informed decisions and improve their operations.
✅ Practical Steps
- Evaluate the use of Amazon Nova Multimodal Embeddings for semantic search over multi-view aerial imagery.
- Consider leveraging Amazon Bedrock and Amazon OpenSearch Serverless for scalable and efficient deployment of multimodal AI systems.
- Apply the evaluation methodology built on OpenStreetMap ground truth to compare different embedding models, fusion strategies, captioning, and search methods.
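The comparison step above can be sketched as a small scoring function. This is a minimal illustration under stated assumptions, not the article's evaluation code: it assumes that for each benchmark query the ground truth is a set of tile IDs known (for example, from OpenStreetMap) to contain the feature, and that a search method returns a set of tile IDs. The function name `precision_recall_f1` and the tile IDs are hypothetical.

```python
from typing import Dict, Iterable


def precision_recall_f1(
    retrieved: Iterable[str], relevant: Iterable[str]
) -> Dict[str, float]:
    """Score one query: compare retrieved tile IDs against ground-truth tile IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    true_positives = len(retrieved_set & relevant_set)

    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: the search returned four tiles, and the ground truth
# lists three tiles; two tiles appear in both sets.
scores = precision_recall_f1(
    retrieved=["t1", "t2", "t3", "t4"],
    relevant=["t2", "t3", "t5"],
)
# precision = 2/4, recall = 2/3, f1 = 4/7
```

Running the same function over the results of each embedding model or search method, with the ground truth held fixed, gives F1 scores that can be compared directly.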
Want the full story? Read the original article on the AWS ML Blog.