“Satellite Image Analysis: A Hands-On Workshop” is an advanced three-week international course that teaches participants how to apply Vision Transformer (ViT) architectures to satellite and aerial imagery. With growing applications in climate research, defense, agriculture, and urban planning, transformers represent a major step forward in geospatial image analysis.
This hands-on program explores the theory behind transformers and their adaptation to vision tasks such as land cover classification, object detection, climate pattern recognition, and disaster mapping. Participants will learn how ViTs can outperform traditional Convolutional Neural Networks (CNNs) by capturing long-range dependencies and global spatial relationships in high-resolution imagery. The course includes practical work with real geospatial datasets (Sentinel, Landsat, DOTA) using frameworks and libraries such as PyTorch, Hugging Face, and timm.
Aim
The course aims to equip participants with the skills to apply Vision Transformers (ViTs) to remote-sensing image analysis, enabling them to tackle tasks such as land cover classification, object detection, climate pattern recognition, and disaster mapping using advanced deep learning methods.
Course Objectives
- Introduce Vision Transformers and their application in satellite image analysis
- Enable hands-on experimentation with publicly available geospatial datasets
- Teach model customization and fine-tuning techniques
- Promote responsible AI usage in environmental and humanitarian applications
- Foster interdisciplinary innovation at the intersection of AI and Earth sciences
Course Structure
📅 Module 1: Transformers vs CNNs in Remote Sensing
Theme: Understanding Vision Transformers for Scene Classification
- Beyond CNNs: Vision Transformers for Scene Classification
- Review of CNN architectures in remote sensing (ResNet, UNet, etc.)
- Introduction to Vision Transformers (ViT) and how they work
- Why ViTs are suited for remote-sensing imagery (large context, less inductive bias)
- Comparison of ViT vs CNN in scene classification
- Hands-On Lab:
  - Colab demo using pretrained ViT and CNN for a sample land scene classification task with datasets like EuroSAT or BigEarthNet
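To make the "large context" point in the list above concrete, the short sketch below counts how many patch tokens a ViT would process for a given image size and how many token-to-token interactions a single self-attention map then covers. The function names are hypothetical, and the image and patch sizes (a 64×64 tile such as those in EuroSAT, and a tile resized to 224×224 with 16-pixel patches) are illustrative assumptions, not settings prescribed by the course.

```python
def vit_token_count(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping patch tokens (any class token is not counted)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


def attention_entries(num_tokens: int) -> int:
    """Entries in one self-attention map: every token attends to every token."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # (description, image side in pixels, patch side in pixels); assumed values
    settings = [
        ("64x64 tile, 8-pixel patches", 64, 8),
        ("224x224 resized tile, 16-pixel patches", 224, 16),
    ]
    for name, side, patch in settings:
        n = vit_token_count(side, side, patch)
        print(f"{name}: {n} tokens, {attention_entries(n)} attention entries per head")
```

Every patch can attend to every other patch from the first layer onward, which is where the global context comes from; the same quadratic growth is also why very large scenes are usually tiled before being fed to a ViT.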
📅 Module 2: Land-Cover Change Detection Using Transformers
Theme: Using Transformers for Temporal Change Detection
- Tracking the Earth: Transformers for Change Detection
- The problem of land-cover change detection (LCCD) and its applications (urbanization, deforestation)
- Architectures adapted for temporal change detection (Siamese ViTs, TimeSformer)
- Pipeline: preprocessing → patch embedding → transformer blocks → classification head
- Hands-On Case Study:
  - Visual comparison of results (before/after images and heatmaps) using transformers for land-cover change detection
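The pipeline named in the list above (preprocessing, patch embedding, transformer blocks, classification head) can be sketched as a toy Siamese model in PyTorch. This is a minimal illustration under stated assumptions, not the architecture used in the course: the class name `TinySiameseViT`, the helper `normalize`, and all sizes (3 bands, 64×64 inputs, 8-pixel patches, 2 blocks) are hypothetical, and the weights are randomly initialized rather than pretrained.

```python
import torch
from torch import nn


def normalize(x: torch.Tensor) -> torch.Tensor:
    """Preprocessing: standardize each band of each image to zero mean, unit variance."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True)
    return (x - mean) / (std + 1e-6)


class TinySiameseViT(nn.Module):
    """Toy bi-temporal change detector: one shared ViT-style encoder, two inputs."""

    def __init__(self, in_ch=3, img=64, patch=8, dim=64, depth=2, heads=4):
        super().__init__()
        self.grid = img // patch
        # Patch embedding: a strided convolution maps each patch to one token.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(
            dim, heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        # Classification head: one change/no-change logit per patch.
        self.head = nn.Linear(dim, 1)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (batch, patches, dim)
        return self.blocks(tokens + self.pos)

    def forward(self, before: torch.Tensor, after: torch.Tensor) -> torch.Tensor:
        # Shared weights: both acquisition dates pass through the same encoder.
        diff = torch.abs(self.encode(before) - self.encode(after))
        logits = self.head(diff).squeeze(-1)  # (batch, patches)
        return logits.view(-1, self.grid, self.grid)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinySiameseViT().eval()
    before = normalize(torch.rand(2, 3, 64, 64))  # random stand-ins for real tiles
    after = normalize(torch.rand(2, 3, 64, 64))
    with torch.no_grad():
        heatmap = torch.sigmoid(model(before, after))  # per-patch change probability
    print(heatmap.shape)  # torch.Size([2, 8, 8])
```

The per-patch output can be upsampled to the input resolution and overlaid on the before/after images as a heatmap. Taking the absolute feature difference is only one simple way to fuse the two dates; concatenation or cross-attention are common alternatives.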
📅 Module 3: Fine-Tuning Vision Transformers on Small Labeled Sets
Theme: Training ViTs with Limited Data
- Efficient Learning: Adapting ViTs with Limited Data
- Challenges of training ViTs with small labeled datasets
- Strategies: Transfer learning, self-supervised learning (DINO, MAE), adapter layers
- Case studies in remote sensing: Agriculture crop mapping, disaster response
- Hands-On Lab:
  - Colab demo: Fine-tuning a ViT model on a small custom dataset for a remote sensing task
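As an illustration of the transfer-learning and adapter strategies listed above, the sketch below freezes a backbone and trains only a small bottleneck adapter plus a new classification head. It is a hedged example, not the lab's actual code: `Adapter` and `build_model` are hypothetical names, the backbone is a tiny randomly initialized stand-in (in practice it would be a pretrained ViT, for instance one loaded through timm or Hugging Face), and the batch is random data with an assumed five classes.

```python
import torch
from torch import nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Zero-initialize the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))


def build_model(backbone: nn.Module, feat_dim: int, num_classes: int) -> nn.Module:
    """Freeze the backbone; only the adapter and the new head remain trainable."""
    for p in backbone.parameters():
        p.requires_grad = False
    return nn.Sequential(backbone, Adapter(feat_dim), nn.Linear(feat_dim, num_classes))


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in feature extractor (assumption): replace with a pretrained ViT.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.GELU())
    model = build_model(backbone, feat_dim=128, num_classes=5)

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)

    # One training step on a random batch standing in for a small labeled set.
    images, labels = torch.rand(8, 3, 32, 32), torch.randint(0, 5, (8,))
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()

    total = sum(p.numel() for p in model.parameters())
    print(f"trainable {sum(p.numel() for p in trainable)} of {total} parameters")
```

Because only a small fraction of the parameters receives gradients, this setup is less prone to overfitting on a few hundred labeled tiles than full fine-tuning. A real backbone containing dropout or normalization layers would typically also be kept in evaluation mode while frozen.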
Who Should Enrol?
- Geospatial and remote-sensing professionals
- AI/ML engineers and computer vision researchers
- Earth scientists, environmental engineers, and urban planners
- Students and researchers in space science, climate science, or deep learning
- Government/NGO professionals working with Earth observation data