Task description
Unmanned Aerial Vehicle (UAV) vision has become indispensable in applications such as security inspection, traffic monitoring, and emergency response, where image quality directly influences the reliability of downstream tasks including detection, tracking, and recognition. In real deployments, UAV platforms often face stringent constraints on payload, power, bandwidth, and onboard computation, which makes robust perception under imperfect imaging conditions particularly critical. Existing public datasets and image quality assessment (IQA) studies largely rely on traditional natural-image distributions: they typically comprise ground-level, mid-range imagery that differs fundamentally from UAV perspectives. Consequently, models validated on conventional IQA benchmarks may fail to predict the quality of UAV images reliably, limiting their usefulness for UAV-centric perception pipelines.
UAV images are captured from diverse flight altitudes and viewpoints, often top-down or oblique, resulting in smaller object scales, complex scenes, and spatially nonuniform quality degradation. Such characteristics make UAV imagery a unique and challenging domain for IQA research. Beyond viewpoint and scale, UAV images are frequently affected by motion blur, defocus, low-light noise, haze, and compression artifacts introduced by onboard recording or wireless transmission. Importantly, these degradations can be highly region-dependent within a single image: a small target may be blurred or underexposed while the background appears relatively clear, or conversely a cluttered background may dominate errors even when global quality seems acceptable.

Moreover, traditional IQA methods generally focus on global perceptual quality and tend to overlook target-region usability and background quality, both of which are critical in UAV-based analysis pipelines. In many UAV applications, the key question is not only whether an image “looks good” overall, but whether targets are sufficiently identifiable and separable from complex backgrounds, motivating target-aware IQA that explicitly accounts for task-relevant regions and background interference.

To address these issues, we propose UAV-IQA, a new benchmark designed specifically for UAV vision tasks. The dataset consists of approximately 6,000 images collected from two authoritative UAV object detection datasets, VisDrone and UAVDT, covering a wide variety of real-world conditions, including:
- Diverse scenarios and sources: Images span daytime, nighttime, and foggy weather conditions, different flight altitudes, and multiple viewpoints (top-down, oblique, and frontal).
- Balanced difficulty coverage: Sampling is deliberately designed to ensure uniform coverage across different difficulty levels rather than random selection, enabling comprehensive evaluation under varied conditions.
- Three-dimensional subjective annotations: Each image is labeled along three dimensions (background quality, target quality, and global quality) that jointly characterize overall perceptual fidelity and task-specific usability.
This benchmark serves as the first large-scale, target-aware UAV image quality dataset with structured and interpretable annotations. It aims to promote new research directions in fine-grained quality modeling and assist the development of IQA algorithms that better reflect the practical challenges in UAV vision applications.
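To make the three annotation dimensions concrete, the sketch below shows how a single labeled image might be represented in code. The field names and values are illustrative only; the released label files define the actual format.

```python
# Illustrative record for one annotated image; all field names and values
# here are hypothetical and do not reflect the released file layout.
sample_annotation = {
    "image": "VisDrone_0000123.jpg",   # frame drawn from VisDrone or UAVDT
    "background_quality": 3.6,         # subjective background quality score
    "target_quality": 2.8,             # subjective target-region quality score
    "global_quality": 3.2,             # overall perceptual quality score
}
```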
Metrics
Participant submissions will be evaluated quantitatively on a held-out test set using correlation-based metrics between model predictions and subjective Global Quality scores. The training set provides subjective annotations for Global Quality and may also include Target Quality and Background Quality as optional auxiliary supervision signals. Participants may choose to use the auxiliary annotations during training (e.g., multi-task learning), but the official ranking will be determined only by prediction performance on Global Quality.
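As a hedged illustration of how the optional auxiliary labels could be exploited, the sketch below attaches three regression heads to a shared backbone and down-weights the auxiliary terms. The backbone, head design, and loss weights are placeholder assumptions, not an official baseline.

```python
# Minimal multi-task sketch (PyTorch); architecture and loss weights are
# illustrative assumptions, not part of the challenge specification.
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskIQA(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone                       # any feature extractor
        self.head_global = nn.Linear(feat_dim, 1)      # ranked: Global Quality
        self.head_target = nn.Linear(feat_dim, 1)      # auxiliary: Target Quality
        self.head_background = nn.Linear(feat_dim, 1)  # auxiliary: Background Quality

    def forward(self, x):
        feat = self.backbone(x)
        return (self.head_global(feat).squeeze(-1),
                self.head_target(feat).squeeze(-1),
                self.head_background(feat).squeeze(-1))

def multitask_loss(preds, labels, w_aux=0.5):
    """L1 regression loss on Global Quality plus down-weighted auxiliary terms."""
    g_hat, t_hat, b_hat = preds
    g, t, b = labels
    return F.l1_loss(g_hat, g) + w_aux * (F.l1_loss(t_hat, t) + F.l1_loss(b_hat, b))
```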
Let the subjective score of image \(i\) be \(y_i\) and the predicted score be \(\hat{y}_i\), where \(i = 1,\ldots,N\).
PLCC (Pearson Linear Correlation Coefficient):
\[
\mathrm{PLCC} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}}
\]
where \(\bar{y}\) and \(\bar{\hat{y}}\) denote the mean subjective and predicted scores.
SRCC (Spearman Rank Correlation Coefficient):
\[
\mathrm{SRCC} = \frac{\sum_{i=1}^{N} \bigl(r(y_i) - \overline{r(y)}\bigr)\bigl(r(\hat{y}_i) - \overline{r(\hat{y})}\bigr)}{\sqrt{\sum_{i=1}^{N} \bigl(r(y_i) - \overline{r(y)}\bigr)^2}\,\sqrt{\sum_{i=1}^{N} \bigl(r(\hat{y}_i) - \overline{r(\hat{y})}\bigr)^2}}
\]
where \(r(\cdot)\) denotes the rank of each value in the dataset, and \(\overline{r(y)}\) and \(\overline{r(\hat{y})}\) are the mean ranks. The final leaderboard score is defined as
\[
\mathrm{Score} = \frac{\mathrm{PLCC} + \mathrm{SRCC}}{2}.
\]
Final rankings will be based on Score (with PLCC and SRCC also reported separately) on the held-out test set.
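For reference, the snippet below is a minimal Python sketch of these metrics using SciPy; it mirrors the definitions above, and the official evaluation script remains authoritative.

```python
# Minimal sketch of PLCC, SRCC, and the combined Score defined above;
# the official evaluation script remains the reference implementation.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def compute_metrics(y_true, y_pred):
    """PLCC, SRCC, and their average over one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    plcc, _ = pearsonr(y_true, y_pred)    # linear correlation on raw scores
    srcc, _ = spearmanr(y_true, y_pred)   # rank correlation (Pearson on ranks)
    return plcc, srcc, 0.5 * (plcc + srcc)

if __name__ == "__main__":
    mos = [3.2, 4.1, 2.5, 4.8, 3.9]       # toy subjective Global Quality scores
    pred = [3.0, 4.3, 2.8, 4.5, 3.7]      # toy model predictions
    plcc, srcc, score = compute_metrics(mos, pred)
    print(f"PLCC={plcc:.4f}  SRCC={srcc:.4f}  Score={score:.4f}")
```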
Dataset format
The training data is released as a multi-part archive. Reconstruct and extract it as follows.
On Linux/macOS:
cat train.tar.gz.part-* > train.tar.gz
tar -xzf train.tar.gz
On Windows (7-Zip): right-click train.zip.001 → 7-Zip → Extract Here.
We will provide an evaluation script that takes a CSV file of predicted scores as input.
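Since the exact header expected by the evaluation script is not fixed in this description, the snippet below is only a sketch of how a prediction CSV could be written, assuming one row per image with an image name and a predicted Global Quality score.

```python
# Hypothetical prediction CSV writer; the column names ("image_name",
# "global_quality") and output file name are assumptions, so check them
# against the official evaluation script before submitting.
import csv

predictions = {                # image name -> predicted Global Quality score
    "0000001.jpg": 3.42,
    "0000002.jpg": 4.07,
}

with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_name", "global_quality"])  # assumed header
    for name, score in sorted(predictions.items()):
        writer.writerow([name, f"{score:.4f}"])
```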
How to submit
We provide 3,600 images as the training set and 1,200 images as the validation set. Participants should train their models on the training set, submit their predicted scores on the validation set, and report the corresponding validation performance.
After the validation leaderboard is formed, we will contact the top five teams and request their trained model weights and training/inference code so that the reported results can be reproduced. We will then evaluate the reproduced models on a held-out test set.
The final ranking will be determined only by the test set score.
Please send your validation results, model, and training code to cjiang92-c@my.cityu.edu.hk. If you have any questions, feel free to contact us at the same email address.
Important dates
- February 15, 2026 — Registration Opening Date
- March 10, 2026 — Training Data Release
- April 4, 2026 — Challenge Result Submission Deadline
- May 10, 2026 — Challenge Technical Paper Submission Deadline
- May 15, 2026 (tentative) — Final Decisions
Rules
- All evaluations will be performed using the official evaluation script.
- Participants must submit prediction results for the official test set following the provided JSON submission format.
- Each team may submit up to three results per day.
- Any form of external data usage for pretraining, finetuning, distillation, or model selection is strictly prohibited.