Geospatial Python Curriculum
Thank you for your interest in developing and sharing lesson materials! To submit lesson materials or suggest a topic for future curricular development, please answer the questions below. Our Curriculum Development Team will follow up to suggest next steps in your lesson's trajectory. Questions? Please email [email protected].
- What is the topic of your lesson or lesson proposal?
Geospatial Python Curriculum, an analog to the Geospatial R lesson
I'm personally interested in contributing to the raster component of the lesson. My background is in vegetation remote sensing and land cover classification. I mostly work with Landsat and Planet Labs imagery.
- Do you already have a draft of your lesson? You're welcome to share materials at any stage of development. If you already have drafted materials, please include a link.
No draft, though I think that we would try to follow the concepts and challenges of the R lesson closely.
(If you answered "No" to question 2, you can skip the remaining questions. Thank you for your lesson idea!)
- Do your materials conform to our Code of Conduct? Yes (or at least, they will).
- Are your materials already on GitHub and do they use The Carpentries lesson template? They are not yet, but they will be: https://github.com/carpentries/lesson-example
- If you answered "No" to either part of question 4, would you like our Curriculum Team to create a repository for you in The Carpentries Incubator?
Yes, that'd be great. It would hopefully make the lesson more visible to other potential collaborators.
- If you answered "Yes" to both parts of question 4, would you like to transfer your repository to The Carpentries Incubator? You will have Write access to the repository.
- If you answered "Yes" to either question 5 or 6, list the GitHub handles for people who should have Write access to your lesson. If you don't know how to answer this question, don't worry! We can always add collaborators later.
I don't know other folks' GitHub handles yet, but I will share this via Slack with folks who have expressed interest, and they can comment if they'd like to be added!
- Any other information you would like us to have or questions you have for us?
My initial thoughts/questions on what the lesson should cover and how it should be structured:
The R lesson could serve as a template for how to structure the lesson. We could use the Austin house pricing dataset from this recent geospatial tutorial given at SciPy 2019: https://github.com/pysal/scipy2019-intermediate-gds/tree/master/notebooks to teach the vector component of the lesson.
Some of the core packages that we should cover, in my opinion, include rasterio, geopandas, fiona, and shapely (not in that chronological order). I think it's also well worth considering xarray as part of the curriculum:
https://pangeo.io/packages.html
Xarray allows me to easily slice a satellite time series by labels like band and time (similar to pandas), which has been a game changer for making my programs more legible. This is not easily done using a combo of rasterio and numpy arrays. However, there are gaps in how xarray interfaces with geopandas, rasterio, etc. A notable one is that xarray objects cannot be written to GeoTIFF with a single function call (this requires a separate project called rioxarray).
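To make the label-slicing point concrete, here is a minimal sketch using a synthetic in-memory DataArray. The band names, dates, and array sizes are made-up assumptions, not real Landsat or Planet metadata. It compares selecting by label with xarray against the positional indexing you'd need with a bare numpy array. Writing a result like this back to GeoTIFF is where rioxarray's `.rio.to_raster()` accessor would come in, which isn't shown here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a small multispectral time series:
# 3 acquisition dates x 4 bands x 5 rows x 5 columns (values are random).
times = pd.date_range("2019-06-01", periods=3, freq="16D")  # Jun 1, Jun 17, Jul 3
bands = ["blue", "green", "red", "nir"]  # hypothetical band labels
data = np.random.default_rng(0).random((3, 4, 5, 5))

stack = xr.DataArray(
    data,
    dims=("time", "band", "y", "x"),
    coords={"time": times, "band": bands},
    name="reflectance",
)

# Label-based selection: the near-infrared band for acquisitions in June 2019.
nir_june = stack.sel(band="nir", time=slice("2019-06-01", "2019-06-30"))

# The equivalent plain-numpy indexing depends on remembering axis order
# and positions (dates 0 and 1, band index 3).
nir_june_np = data[0:2, 3]
```

The xarray version keeps working if the band order or the date range changes, whereas the numpy indices would silently point at the wrong data.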
For reference, here is a discussion of rasterio/xarray interoperability: https://github.com/pydata/xarray/issues/2042
On that note, something to decide is whether we orient the lesson toward the Pangeo model, which caters to the climate and meteorology disciplines and formats (NetCDF), or toward folks who come from using ESRI/ENVI and are more used to working with GeoTIFFs and shapefiles (or mix the two). I'm not an expert with either set of tools and don't know all of their capabilities, so I'm interested to hear what other folks think would be valuable to include in this set of lessons.
I've also found scikit-image very useful when I don't need location-based operations or projection information. However, I'm not sure how common it is to use it with geospatial imagery, or to what extent we should highlight how non-spatially focused packages like scikit-learn, scikit-image, and scipy can be used in classification, thresholding, filtering, etc.
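As an illustration of the kind of non-spatial array work mentioned above, a minimal numpy sketch of computing NDVI and applying a fixed threshold could look like the following. The tiny red/NIR arrays and the 0.3 cutoff are illustrative assumptions; a data-driven threshold such as scikit-image's `skimage.filters.threshold_otsu` could be swapped in for the fixed value.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = red.astype("float64")
    nir = nir.astype("float64")
    denom = nir + red
    # Start from NaN so pixels where both bands are 0 (e.g. nodata) stay undefined
    # instead of triggering a division-by-zero warning.
    out = np.full(red.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Made-up surface reflectance values for a 2x2 patch.
red = np.array([[0.05, 0.10], [0.30, 0.0]])
nir = np.array([[0.40, 0.30], [0.35, 0.0]])

index = ndvi(red, nir)

# Simple global threshold; 0.3 is an illustrative value, not a standard cutoff.
# NaN pixels compare as False, so they are never labelled as vegetation.
vegetation = index > 0.3
```

Nothing here needs coordinates or a CRS, which is the situation where scikit-image or scipy can be applied directly to the pixel array.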
Finally, I thought I'd just link below some existing materials that can serve as a jumping off point: https://geohackweek.github.io/vector/ https://github.com/pysal/scipy2019-intermediate-gds/tree/master/notebooks https://geohackweek.github.io/raster/ https://www.earthdatascience.org/courses/earth-analytics-python/
Thanks for getting this conversation restarted @rbavery! I've gone ahead and created a repo for this lesson here: https://github.com/carpentries-incubator/geospatial-python and given you write access. Y'all can ping me on this thread to request write access for other collaborators as they come on board.
As you get started with writing and formatting content, you may find parts of our new Curriculum Development Handbook to be useful. This is still very much a work in progress, so any and all feedback and suggestions would be very appreciated. The chapter on Technological Introductions will (hopefully) be particularly useful for you at this point as you start to play around with the lesson template files. Please reach out with any questions that come up, either by pinging me on an issue or by emailing [email protected].
Happy lessoning!
Thanks for getting this going again @rbavery, @ErinBecker! I might have a little bit of work time I could allocate to developing new materials so I would definitely be up for contributing something here.
I would give another +1 for including geopandas and xarray in this curriculum. I think they are really useful tools for researchers/data scientists in geospatial fields to be learning in the Python ecosystem. I also think they would follow on nicely from some of the core SWC/DC lessons in Python that cover pandas/numpy.
netCDF and related tools is an interesting one - you're right it does have users more in the meteorology/climate side of things, and in my experience these folk don't often overlap as much with those coming from the GIS side of things. Having said that I think netCDF is a useful file format and I wish we made more use of it in the solid Earth sciences side of things...anyway I digress...
Don't know much about scikit-image but I think there are colleagues using it here at @BritishGeologicalSurvey.
I agree that using the R workshop as a sort of template to get us started sounds like a good idea.
pyproj/proj4 is also essential to cover.
I would strongly suggest folium in place of matplotlib for visualization.
PySal is another frequently used spatial analysis tool.
'pyshp' is fairly important as a lightweight (no GDAL) way to read shapefiles.
dask (especially dask-rasterio) is something to consider with geopandas and xarray.
One important skillset to consider is how to call OGC services for data (WFS/WCS) as opposed to just loading data locally.
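For the OGC services point, a minimal sketch of building a WFS 2.0.0 GetFeature request with only the Python standard library might look like this. The endpoint URL and the `topp:states` layer name are hypothetical placeholders, and `wfs_getfeature_url` is a hypothetical helper. Servers differ in which `outputFormat` values they support and in how they interpret axis order for EPSG:4326, so treat this as a starting point rather than a recipe. If a server returns GeoJSON, the resulting URL could then be handed to something like `geopandas.read_file`.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wfs_getfeature_url(endpoint, type_names, bbox=None, crs="EPSG:4326"):
    """Build an OGC WFS 2.0.0 GetFeature URL using key-value-pair encoding."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_names,
        # GeoJSON output is common but not guaranteed; check the server's capabilities.
        "outputFormat": "application/json",
    }
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        # The optional trailing CRS tells the server how to read the bbox numbers.
        # Axis-order conventions for EPSG:4326 vary between servers.
        params["bbox"] = f"{minx},{miny},{maxx},{maxy},{crs}"
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = wfs_getfeature_url(
    "https://example.com/geoserver/wfs", "topp:states", bbox=(-100, 30, -90, 40)
)

# Round-trip the query string to check what will actually be sent.
query = parse_qs(urlparse(url).query)
print(url)
```

Building the request explicitly like this makes it easier for learners to see that a "remote dataset" is just an HTTP request with a handful of standardized parameters.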
@blordcastillo folium looks really cool and I think that'd be great to showcase, haven't used it myself but I've seen demos. I also agree that it'd be good to cover how to access remote data and basemap services.
I've never used pyproj/proj4 directly; usually I rely on rasterio and geopandas' built-in CRS attributes, projection methods, and plotting functions to make sure my datasets are aligned. From a glance at the R lesson, it looks like when teaching about reprojecting, the main star is the projectRaster() function that takes a raster as an input, with little time spent on discussing datums, ellipsoids, etc. The key points of this lesson are:
In order to plot two raster data sets together, they must be in the same CRS.
Use the projectRaster() function to convert between CRSs.
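The CRS key point can be illustrated without any GIS library. The sketch below hand-codes the forward spherical Web Mercator (EPSG:3857) formula to show that the same place gets completely different coordinates in two CRSs. It is for illustration only; in a Python lesson, reprojection would presumably be done with tools like `rasterio.warp.reproject` or geopandas' `GeoDataFrame.to_crs`, which delegate to PROJ, rather than with hand-written formulas.

```python
import math

# WGS84 semi-major axis in metres, used as the sphere radius by Web Mercator.
R = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Web Mercator projection: degrees -> metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The same location expressed in two CRSs gives very different numbers,
# which is why layers must share a CRS before they are overlaid.
lon, lat = -97.74, 30.27  # approximately Austin, TX (degrees, EPSG:4326)
x, y = lonlat_to_web_mercator(lon, lat)  # roughly (-1.09e7, 3.54e6) metres
```

A raster in degrees and a raster in metres plotted together would land in entirely different places, which is exactly the mismatch the R episode is warning about.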
In the projecting vectors episode, proj4 is mentioned briefly in a sort of "Check this out later on your own time" way:
Official PROJ library documentation
More information on the proj4 format.
A fairly comprehensive list of CRSs by format.
To view a list of datum conversion factors type: projInfo(type = "datum") into the R console.
https://datacarpentry.org/r-raster-vector-geospatial/09-vector-when-data-dont-line-up-crs/index.html
Just wanted to highlight that it looks like the R lesson didn't cover an R proj4 module so if we did so in Python, we'd be deviating. What do you consider essential to cover about proj4 @blordcastillo ?
Hi all - so excited to see this conversation kicking off! I'm going to transfer the issue to the new repo so the conversation can continue there. Please let me know if you would like to contribute to the lesson first draft - I'm happy to set you up with Write access.
Thanks, @ErinBecker - would you be able to set me up with write access?
Edit: Done, thanks!
@rbavery - With respect to your question about whether to orient the lesson toward the Pangeo model, which caters to the climate and meteorology disciplines and formats (NetCDF), or toward folks who come from using ESRI/ENVI and are more used to working with GeoTIFFs and shapefiles: I agree with @dvalters that there isn't much overlap between people working with raster or “gridded” data stored as a uniform grid of values in the netCDF file format, and those working with geospatial vector data composed of discrete geometric locations (i.e. x, y values) that define the shape of a spatial point, line or polygon.
With respect to those working with gridded data in netCDF format, I've developed and published Data Carpentry lessons (Python for Atmosphere and Ocean Scientists) that have now been taught at half a dozen workshops in Australia and the US.
With respect to those working with geospatial vector data (an area I'm far less familiar with), I agree that while R users are well catered for with the new Data Carpentry geospatial curriculum, there's not much out there for Python users.
I'd therefore suggest that these lessons that you are proposing focus on using Python for vector data and if people are passionate about gridded/netCDF data they can contribute to my existing lessons :smile: https://github.com/carpentrieslab/python-aos-lesson
Also, a relevant library that hasn't been mentioned in the discussion yet is geoviews.
@DamienIrving Thanks for the suggestions! There's some overlap between remote sensing folks who work with both vectors and multispectral imagery (RGB+ sensors) and climate/meteorology folks who work with gridded NetCDF formats. I'm still getting familiar with where the NetCDF and GeoTIFF+vector user communities overlap, but I see many common workflows, such as reducing time series, moving window operations, and visualizing large array datasets. The differences in workflows seem to center on having to pay attention to coordinate reference systems, but xarray now supports an option to maintain this metadata on read.
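The two shared workflows named above, reducing a time series and moving window operations, can be sketched with plain numpy on a synthetic (time, y, x) array. The array values and the 3x3 window size are assumptions for illustration; with labelled data, xarray's `.mean("time")` and `.rolling(...)` would express the same ideas by dimension name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic (time, y, x) stack standing in for a gridded time series:
# 2 time steps of a 4x4 grid filled with a simple ramp of values.
stack = np.arange(2 * 4 * 4, dtype="float64").reshape(2, 4, 4)

# Reducing the time series: per-pixel mean across the time axis.
temporal_mean = stack.mean(axis=0)

# Moving-window operation: 3x3 focal mean over the spatial axes.
# sliding_window_view only yields "valid" windows (no edge padding),
# so the 4x4 input becomes a 2x2 output.
windows = sliding_window_view(temporal_mean, (3, 3))
focal_mean = windows.mean(axis=(-2, -1))
```

Neither operation cares whether the grid came from a NetCDF model run or a stack of GeoTIFFs, which is the overlap being described.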
Another difference seems to be that remote sensing folks often are also working with vectors to calculate zonal statistics across gridded data, clip/subset gridded data with vectors, etc. The R curriculum covers both raster and vector data processing for this reason and I think a geospatial Python lesson would need to do so as well.
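A zonal statistic boils down to grouping raster cells by the polygon they fall in. Assuming the polygons have already been burned into a zone raster on the same grid (for example with `rasterio.features.rasterize`), a minimal numpy sketch of a per-zone mean might look like this. The arrays, the hypothetical `zonal_mean` helper, and the choice of 0 as the "outside any polygon" value are all assumptions for illustration.

```python
import numpy as np

# Value raster (e.g. NDVI) and a zone raster on the same grid.
# The zone raster is hard-coded here instead of rasterized from real polygons.
values = np.array([[0.2, 0.4, 0.6],
                   [0.1, 0.5, 0.9],
                   [0.3, 0.7, 0.8]])
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [0, 0, 2]])  # 0 = cell not covered by any polygon


def zonal_mean(values, zones, nodata_zone=0):
    """Mean of `values` for each non-negative integer zone id in `zones`."""
    mask = zones != nodata_zone
    z = zones[mask]
    v = values[mask]
    # bincount sums the weights per integer label; a second call counts cells.
    sums = np.bincount(z, weights=v)
    counts = np.bincount(z)
    ids = np.nonzero(counts)[0]  # zones that actually contain cells
    return {int(i): sums[i] / counts[i] for i in ids}


stats = zonal_mean(values, zones)  # about {1: 0.233, 2: 0.7}
```

Libraries wrap this pattern with CRS checks and polygon handling, but the core is just a group-by over a rasterized mask.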
geoviews looks interesting; it seems to be geared toward the "global" rather than the "regional" scale, judging by the initial examples they present.
I think that it makes sense to have domain-focused lessons, which seems to be the model that Data Carpentry is following. Some of the concepts and examples we would need to cover in this lesson would fall outside the scope of your atmosphere and ocean science lesson. For example, examples of image stretching for multispectral imagery or computing zonal stats with raster and vector data might not be of interest to folks attending your lesson's workshops.
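For readers unfamiliar with image stretching, a common approach is a percentile-based linear stretch applied per band. The sketch below assumes 2nd/98th percentile cutoffs, which is a popular default rather than a standard, and uses a synthetic band; `percentile_stretch` is a hypothetical helper name.

```python
import numpy as np


def percentile_stretch(band, lower=2, upper=98):
    """Linearly rescale a band so the lower/upper percentiles map to 0 and 1."""
    # nanpercentile ignores NaN nodata pixels when choosing the cutoffs.
    lo, hi = np.nanpercentile(band, [lower, upper])
    if hi == lo:
        # Constant band: nothing to stretch.
        return np.zeros_like(band, dtype="float64")
    stretched = (band.astype("float64") - lo) / (hi - lo)
    # Values beyond the cutoffs saturate instead of exceeding the display range.
    return np.clip(stretched, 0.0, 1.0)


band = np.arange(101, dtype="float64")  # synthetic band with values 0..100
out = percentile_stretch(band)
```

Stretching each band of an RGB composite this way is usually what turns a dark, low-contrast satellite image into something interpretable on screen.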
So in summary, I think there's a lot in the remote sensing/geospatial domain that should be covered in its own Python lesson. See the list of packages besides xarray that geospatial folks need to be able to use: rasterio, fiona, folium, and shapely, to name a few. But I'd love to see development on this lesson dovetail back into your existing lesson where it makes sense.
Just putting my hand up here - I'd love to contribute to the materials as well. I'll admit that I tend to use mostly R for geospatial data processing, and have taught the DC Geospatial curriculum several times. However, I'm comfortable using python as well, and happy to run through materials, add clarifying/needed comments/questions/debug/etc - so please ping me when there's a rough draft of something so I can help.
Thanks @dvanic will do. Now that I've got a bit of a handle on making PRs to lessons I'll try to scaffold out a draft/outline this week modeled after the DC R lesson.
Hi @ErinBecker when you get a chance I've opened a pull request but it looks like I'm unable to merge it and don't have write access.
Hi @rbavery you should now have write access. Thanks!
:+1: to this effort. One comment I'd make is that a couple of (what I consider low-level) libraries have been mentioned (fiona and shapely). In my personal experience, I haven't found these to be very useful beyond what the low-level GDAL library can do for fine-grained control. If this effort is sticking to libraries with a large user base, maybe it makes sense to stick with GDAL for these purposes?
Also might want to take a look at the existing lessons at https://github.com/annefou/metos_python @annefou
@jsta Thanks a bunch for linking this, and @annefou for working on this! At present, I think there are even higher-level libraries than GDAL that simplify some of the code for plotting and munging geospatial data. In particular, rasterio for reading, projecting, and featurizing rasters, and earthpy for visualizing. With these lessons you linked, I'm particularly interested to see how earthpy can simplify the matplotlib code.
After teaching the R lesson recently and reflecting on my own day-to-day use of geospatial libraries I agree that we probably don't need to focus on fiona, shapely, or proj4. rasterio and geopandas handle the reading, writing, and reprojecting well enough that it is rare that one needs to use these lower level libraries directly. I think they are worth mentioning in a "Tip" or maybe on an extended lesson notes page.
Hi everyone! I'm glad this is getting discussed. I agree with most of what's been said that geopandas and rasterio are probably good things to start with; especially since rasterio has hit v1.0+. I did want to mention that I'm a core developer on the Satpy library which provides a very high level interface to working with meteorological satellite data (mostly raster imagery) which uses libraries like xarray, dask, netcdf4, and rasterio. I taught a 4 hour tutorial at this year's SciPy 2019 conference on Satpy, lessons here, where I did my best to teach a lot of these concepts: satellite bands/wavelengths, resampling and projections, creating RGB composites, using geotiffs and netcdf files, and visualize data with geoviews and cartopy. From that experience, I would agree that teaching projections/pyproj/PROJ is probably too much for beginners; especially those just learning about shape files and gridded raster data.
I'd love to help contribute to these lessons if you need the help (and I have time), but mostly I'm just glad that the Carpentries are picking these concepts up from python-land. It will be a great resource for the people I work with.
One more thing about folium. I'm 85% sure folium is slowly being deprecated in favor of projects like ipyleaflet. I don't want to tag the core developer I talked to at SciPy about this to avoid too many notifications. If folium starts being seen as a real option for these lessons then I can reach out.
Hi @djhoese thanks for the info and working on Satpy, looks really useful! I'm interested to track the development towards 1.0. Is there a general method in Satpy for reading and writing geotiffs and preserving CRS and other metadata? It'd be great to be able to use Satpy in this way so that we could work with xarray DataArrays and Datasets throughout the workshop.
I also appreciate the tip on folium, ipyleaflet is looking really strong so maybe we can work this in: https://github.com/jupyter-widgets/ipyleaflet
We have interfaces/Satpy-standard ways of preserving CRS information, but applying them to other libraries and tools has not really been standardized yet. In Satpy we are also (slowly) changing how we do it to use the features provided by xarray more. The geoxarray project I started (repos here) is meant to standardize this CRS/grid logic for xarray, but development has been slow as we (the community?) learn new things about other libraries, since PROJ (the C library, recent 6.0+ releases), pyproj (2.x releases), and GDAL (3.0+ releases) all made huge changes. It has also been slow because of how xarray changes, or plans to change, things like coordinate arrays allowing for objects other than arrays (like a CRS object :smile: ), whether or not those can be preserved, and other similar things (xarray accessor functionality, etc). That's probably not the answer you were hoping for, but that's how it is, and it's an ongoing problem being solved.
Hi @rbavery you should now have write access. Thanks!
Thanks @fmichonneau for fixing this!
Got it, thanks @djhoese. I've looked over some of the issues across geoxarray and xarray, particularly this one: https://github.com/geoxarray/geoxarray/issues/6. I'm looking forward to the dust settling! In the meantime, we will probably need to decide whether to make space in the episodes for covering how to convert numpy arrays (read in with rasterio) to xarray DataArrays and how to manage the corresponding metadata in an ad hoc way. For example, we would need to let folks know about all the weird behaviors, like the fact that addition between non-overlapping DataArrays is allowed and drops the crs attribute: https://github.com/geoxarray/geoxarray/issues/8
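The alignment part of that behavior can be reproduced in a few lines. In this sketch (synthetic arrays; the `crs` attribute is just a string label, not a real CRS object), adding two DataArrays with disjoint x coordinates silently yields an empty result, because arithmetic aligns on coordinate labels with an inner join by default, while `xr.align(..., join="exact")` refuses to align them. Whether attrs such as `crs` survive arithmetic depends on xarray's `keep_attrs` option and version, so the sketch deliberately doesn't rely on it.

```python
import numpy as np
import xarray as xr

# Two small rasters on the same y rows but with x coordinates that never overlap.
left = xr.DataArray(
    np.ones((2, 2)), dims=("y", "x"),
    coords={"y": [0, 1], "x": [0, 1]}, attrs={"crs": "EPSG:32614"},
)
right = xr.DataArray(
    np.ones((2, 2)), dims=("y", "x"),
    coords={"y": [0, 1], "x": [10, 11]}, attrs={"crs": "EPSG:32614"},
)

# Default arithmetic uses an inner join on coordinates: no error is raised,
# and the result simply has zero columns along x.
total = left + right

# An exact join surfaces the mismatch as an error instead.
try:
    xr.align(left, right, join="exact")
    exact_ok = True
except ValueError:
    exact_ok = False
```

For teaching, `xr.set_options(arithmetic_join="exact")` is one possible way to make this kind of silent mismatch fail loudly.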
Or we leave xarray as a footnote until the ecosystem is a little more stable for geospatial users, and then this lesson could be revised. Not sure which is the best option.
I think if xarray and rasterio are used at all, xarray's open_rasterio should be used.
I am really excited about this lesson development and I am really looking forward to contributing!
Hi all! After some time, I've gotten back around to fleshing out this lesson and intend to be more active in developing it in the coming weeks. I have a PR up for the second episode and if any of you have the time, I'd super appreciate a review!
Link to PR: https://github.com/carpentries-incubator/geospatial-python/pull/9
Just learned about your work on a geospatial python curriculum and the carpentries materials being developed. They look terrific! This thread is also pretty informative.
I wanted to mention our GeoHackweek tutorials, focused on Python. You may find something of value there. See the links from the Schedule page. We've borrowed from past Carpentries materials, but nothing focused on Python, as far as I'm aware. We ran the GeoHackweek event annually from 2016 to 2019, and put it on pause after the 2019 event.
For what it's worth, for 2019 the raster tutorials changed significantly from previous years and it now includes an xarray+dask notebook. The 2018 tutorials are here (raster) and here (xarray+dask).
Looking forward to seeing the Python geospatial curriculum mature. I may be able to contribute.
@emiliom Thanks for the input, I've referenced the Geohackweek tutorials before, they've been a great help! I think it'd be useful to include more background and examples on when to use xarray+dask for spatial data that doesn't have/need a CRS and when to use rioxarray. All of the lessons so far developed focus on geospatial imagery or vectors with CRSs (largely following the corresponding R Geospatial lesson). It'd be great to borrow from these Geohackweek tutorials and update them where it makes sense now that xarray has grown.
For those interested, here is a blog post on a first test run of this lesson in February, where @marwahaha and I taught this to interns at the NASA DEVELOP program: https://carpentries.org/blog/2020/03/teaching-a-new-geospatial-python-lesson/
The post has some suggestions for improvements (which I or someone else can make github issues for).
Our team for a related Hackweek, OceanHackweek, just had a meeting where we talked about synergies with the Carpentries. We can definitely contribute xarray+dask materials. I've never used rioxarray myself, but looking forward to it.
I'll look over your blog post. Thanks!