Closed Loop Metrics for Earth Observation

Christopher Ren

Jun 16

Waymo Scaling, Waymo Insights

5 Comments

Akram Zaytar

Jun 19

Thanks for the insight! It is certainly interesting to simulate impact-based downstream tasks for GeoAI.

But even for IoU, I'm curious about scaling laws in large-scale satellite imagery segmentation. Is there established work on scaling architecture size and the number of training patches to systematically improve segmentation performance?

Christopher Ren

Jun 19

Thanks Akram! I'm not aware of any work on this, but it sounds like someone should do it ;). There are, however, already some hints in PANGAEA: the tasks seem diverse enough that perhaps it doesn't make sense to have one model to rule them all. I'm not certain, though; compare, for example, U-Net vs TerraMind performance on MADOS vs AI4SmallFarms.

Anecdotally, it seems like there are more gains in sub-selecting your pre-training dataset for diversity/the signal of interest than raw scaling + architectural changes, but to be honest I'd say you're more of an authority in this space than I am!

Finally: I think it'd be cool if, instead of reporting only mIoU, researchers were required to deploy their model somewhere in the wild and publish the resulting map. mIoU plus a map would give benchmark performance along with the 'vibes' of the model, which potential users could inspect and compare against each other.

Jackline

Jun 16

Love the model utility concept. I think it is important for use cases that target end users, such as yield prediction. But this will mean every team/company will have its own benchmarking methodology as well as results, so how can one tell which model is actually good in this case?

Jackline

Jun 16 (edited)

And do you know of any existing benchmarks for pixel-level time-series models?

Christopher Ren

Jun 18

Hi Jackline, thanks for engaging with the post! Yes, that's absolutely right: just like with LLMs, users will decide which models they prefer based on their own usage and the utility the models provide, rather than on benchmark performance.

It would, however, be great if model providers could at least publish maps of their models deployed over an area, so that users can get a better sense of how each model might map onto their particular use cases.

With respect to pixel-level time-series models, you may want to look at Presto/Galileo by Tseng et al. Depending on your use case, unsupervised methods combined with post-processing such as CCDC/BFAST/LandTrendr might perform quite well.

Applied Geospatial