Discovering and Downloading Data via the Command Line
This is an in-progress draft example. Please feel free to test it, but use it with caution!
This tutorial focuses on searching for and downloading Analysis Ready Data (ARD) from a dynamic Spatio-Temporal Asset Catalog (STAC) using the command line. At the end of this tutorial, you will have installed the stac-client command line tool, searched for Lunar data, and downloaded data locally for use in whatever analysis environment you prefer to use. Let’s get right to it!
In this tutorial, you will learn how to:
- Create GeoJSON Regions of Interest (ROIs)
- Use the STAC API to list available collections
- Use the STAC API to search collections for data in the ROI
- Download data from the cloud inside of a predefined ROI
This tutorial requires that you have the following tools installed on your computer:
stac-client is a Python library and command line tool for discovering and downloading satellite data. In this tutorial, only the command line tool will be used. First, we need to install the tool; see the installation instructions for details.
Note the py at the start of the name above. The client is written in Python, so the module is called pystac-client. When we use the command line tool, the name is stac-client.
To confirm that stac-client installed properly, execute:
You should see the following output:
Congratulations! You have successfully installed stac-client. It is now time to search for analysis ready planetary data.
Before we start searching, let’s take a moment to talk about GeoJSON. GeoJSON is a standard for encoding spatial geometries. All of the STAC items that are available for download include an image footprint, or geometry, that describes the spatial extent of the data. A common way to discover data is to ask a question like ‘what image(s) intersect with my area of interest (AOI)?’. To ask that question, we need a polygon encoded as GeoJSON. Since we are working on the command line, we need to do a bit of legwork and encode a GeoJSON polygon by hand.
First, let’s make a simple square. To do this, open a text editor (vim, emacs, nano, notepad++, text editor, etc.) and paste the following:
This area of interest spans from the prime meridian to 2.5˚ east of the prime meridian (0˚ to 0.5˚) and the equator to 0.5˚ north of the equator (0˚ to 0.5˚). The geometry includes five points because we need to ‘close’ the ring. In other words, the first and last point are identical.
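If you would rather generate the polygon than type it by hand, the ring-closing rule described above can be sketched in Python. This is a minimal sketch, not part of stac-client: the function names bbox_polygon and ring_is_closed are hypothetical, and the 0˚ to 0.5˚ extents in the example call are an assumption, so substitute the bounds of your own area of interest.

```python
import json


def bbox_polygon(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Polygon dict covering a longitude/latitude box."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first point to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def ring_is_closed(polygon):
    """Check that the exterior ring has at least four positions and is closed."""
    ring = polygon["coordinates"][0]
    return len(ring) >= 4 and ring[0] == ring[-1]


# Example extents are an assumption; substitute your own bounds.
aoi = bbox_polygon(0.0, 0.0, 0.5, 0.5)
aoi_text = json.dumps(aoi, indent=2)
print(aoi_text)
```

The printed text can be pasted into a file as-is. The ring_is_closed check is a quick way to catch the most common mistake, which is forgetting the repeated final point.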
Let’s save that GeoJSON into a text file named aoi.geojson (aoi is short for area of interest). If you are having any issues with the above, run the string through a GeoJSON linter (or checker) like
We have officially made it! We have the tools all set up to search for data. Let’s get that first search out of the way immediately. Execute the following:
stac-client search https://stac.astrogeology.usgs.gov/api --matched --method GET
You should see output that looks like the output below:
The number of items found will differ as we add more data, but the general form of the response should be the same. This means that the dynamic planetary analysis ready data catalog contained 57114 STAC items when this tutorial was written.
Let’s break down the query to understand exactly what is happening here. First, here is the query that we executed:
stac-client search https://stac.astrogeology.usgs.gov/api --matched --method GET
The first thing we do is tell stac-client that we want to search for data. The other option would be stac-client collections (we will use that shortly). Next, stac-client needs to know the URL of the STAC search service. The USGS-hosted STAC server URL is https://stac.astrogeology.usgs.gov/api/. The last arguments tell stac-client to print only the number of matched items (--matched) and to send the request as an HTTP GET (--method GET).
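To make the pieces of that query more concrete, here is a rough Python sketch of the kind of request and response involved. It is not how stac-client is implemented. It assumes the search endpoint lives at /search under the API root (the common convention) and that the server reports the match count in either a numberMatched field or a context.matched field. The function names and the sample response are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(api_root, collections=None, limit=None):
    """Build a GET URL for the /search endpoint of a STAC API."""
    params = {}
    if collections:
        params["collections"] = ",".join(collections)
    if limit is not None:
        params["limit"] = str(limit)
    url = api_root.rstrip("/") + "/search"
    return url + ("?" + urlencode(params) if params else "")


def number_matched(response):
    """Read the match count from a search response, if the server reports one."""
    if "numberMatched" in response:
        return response["numberMatched"]
    return response.get("context", {}).get("matched")


# Made-up response body, reusing the item count quoted in this tutorial.
sample_response = {"type": "FeatureCollection", "features": [], "numberMatched": 57114}

print(build_search_url("https://stac.astrogeology.usgs.gov/api/", limit=1))
print(number_matched(sample_response))
```

No request is sent here; the sketch only shows where the URL and the count come from.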
We use the jq tool in this section to pretty-print the JSON responses from the API. jq can be installed the same way stac-client was installed, using conda install jq.
Now we would like to see what collections are available to search and download data from. To do this, we can use the following command:
stac-client collections https://stac.astrogeology.usgs.gov/api
or, if you have installed jq for pretty printing:
stac-client collections https://stac.astrogeology.usgs.gov/api | jq
The output should look similar to the following:
At the time of writing, the above command will return six different collections with data targeting the Moon, Mars, and Jupiter’s moon Europa. Each of these collections can be queried independently, which we will do below to count the data products in a single collection.
The full dump of collection metadata is a lot to parse, and you are unlikely to need all of it at once. It is easier to print just the human-readable title and the machine-parseable collection id. To do this:
`stac-client collections https://stac.astrogeology.usgs.gov/api | jq '.[] | "\(.title) \(.id)"'`
The output should look similar to the following:
This tutorial uses the jq command line JSON tool pretty heavily. While powerful, the jq syntax can be very intimidating! Feel free to just copy and paste for now; we have already spent the time getting the syntax right. Once you are more comfortable with the basics of querying the API, you can dig deeper into jq. Alternatively, just print the JSON to the screen or redirect it to a text file and manually scan for the fields of interest.
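If jq feels like too much, the same title-and-id listing can be sketched with Python’s standard json module. This sketch assumes the collections command prints a JSON array of collection objects. The function name summarize_collections is hypothetical, and the sample titles are made up for illustration.

```python
import json


def summarize_collections(collections_text):
    """Return 'title id' strings from a JSON array of STAC collections."""
    collections = json.loads(collections_text)
    return [f"{c.get('title', '(no title)')} {c['id']}" for c in collections]


# Illustrative stand-in for the tool's output; the titles are made up.
sample = json.dumps([
    {"id": "mro_hirise_uncontrolled_observations", "title": "Example HiRISE title"},
    {"id": "example_collection", "title": "Example title"},
])
for line in summarize_collections(sample):
    print(line)
```

In practice, you would redirect the command output to a text file and pass the contents of that file to the function in place of the sample.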
To see how many items (observations) are available within a given collection, we need to tell stac-client which collection to search. Collections are identified by the id key in the STAC collection metadata. In the example immediately above, the relevant line is "id": "mro_hirise_uncontrolled_observations". Since we are interested in MRO HiRISE data, we will use the following command:
stac-client search https://stac.astrogeology.usgs.gov/api/ -c mro_hirise_uncontrolled_observations --matched
The response should be:
Above, we created a file named aoi.geojson that defines an area of interest. Now we will combine that with a query for the collection we are interested in. Here is the full command:
stac-client search https://stac.astrogeology.usgs.gov/api/ --intersects aoi.geojson -c mro_hirise_uncontrolled_observations --save hirise_to_download.json
Let’s break this command down like we did above:
- https://stac.astrogeology.usgs.gov/api/ defines the URL to search.
- --intersects aoi.geojson tells stac-client to only search for data that intersects our area of interest (as defined in aoi.geojson).
- -c mro_hirise_uncontrolled_observations restricts the search to the MRO HiRISE collection, as in the previous command.
- --save hirise_to_download.json tells stac-client to save the results to a file named hirise_to_download.json. We will use this file in the next step to download the files found.
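For intuition, the spatial and collection filters roughly correspond to two fields in the body of a STAC API search request: intersects and collections. The Python sketch below assembles such a body. The exact request that stac-client sends is not shown in this tutorial, so treat this as an illustration under that assumption. The function names and the AOI extents are hypothetical.

```python
import json


def extract_geometry(geojson):
    """Return the bare geometry from a GeoJSON Feature or geometry object."""
    if geojson.get("type") == "Feature":
        return geojson["geometry"]
    return geojson


def build_search_body(geojson, collections):
    """Assemble a search body with a spatial filter and a collection filter."""
    return {
        "collections": list(collections),
        "intersects": extract_geometry(geojson),
    }


# Hypothetical AOI; in practice, load the contents of aoi.geojson instead.
aoi = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [0.5, 0], [0.5, 0.5], [0, 0.5], [0, 0]]],
}
body = build_search_body(aoi, ["mro_hirise_uncontrolled_observations"])
print(json.dumps(body, indent=2))
```

The extract_geometry helper is there because an area of interest is sometimes saved as a GeoJSON Feature rather than a bare geometry, while the intersects field expects a geometry.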
This command creates a new file on disk, hirise_to_download.json, that contains a GeoJSON FeatureCollection with some number of observations in it. We can see what the number is by parsing the file or by running the above command with --save hirise_to_download.json replaced by --matched. (At the time of writing, this command returned 4 items.)
Since the hirise_to_download.json file is a GeoJSON FeatureCollection, it is possible to load that file into your favorite GIS, to visualize the image footprints, and to see the attributes of the different items. You will not see the data behind the metadata, but we will download the data in the next step.
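If you do not have a GIS handy, a few lines of Python can also count the items and list their ids and acquisition times. This is a minimal sketch that assumes each feature carries an id and a properties.datetime field, as STAC items normally do. The function name summarize_items and the sample item are hypothetical.

```python
import json


def summarize_items(feature_collection):
    """Return (id, datetime) pairs for each item in a FeatureCollection."""
    return [
        (feature["id"], feature.get("properties", {}).get("datetime"))
        for feature in feature_collection.get("features", [])
    ]


# Made-up FeatureCollection standing in for hirise_to_download.json.
sample_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "example_item_1",
         "properties": {"datetime": "2010-01-01T00:00:00Z"}},
        {"type": "Feature", "id": "example_item_2", "properties": {}},
    ],
})
items = summarize_items(json.loads(sample_text))
print(len(items), "items")
for item_id, when in items:
    print(item_id, when)
```

To run this against the real search results, read hirise_to_download.json from disk and pass the parsed contents to summarize_items in place of the sample.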
Let’s imagine that the four items found above are the ones that we are looking for. In the previous step you executed a query and created a new file named hirise_to_download.json that contains four STAC items. To download the data locally, here is a small helper script that makes use of jq and wget. Save this script in your current directory as a file named download_stac.sh.
Then you can download the files that were found by the search using the following:
This command will run for a few minutes (on a relatively fast internet connection). At the conclusion of the run, you should have a new directory called hirise_uncontrolled_monoscopic. Inside of that directory, you should see four sub-directories, each containing all of the data for the STAC items we discovered previously!
The data are organized temporally. The STAC specification is spatio-temporal after all.
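For readers who would rather not depend on jq and wget, the same idea can be sketched in Python: walk the saved FeatureCollection, collect each asset’s href, and fetch it. This is an alternative sketch, not a port of the helper script. It writes files to a simple per-item directory layout rather than the temporal layout described above, and all function names and sample URLs are hypothetical.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def asset_hrefs(feature_collection):
    """Return (item_id, asset_key, href) for every asset of every item."""
    found = []
    for item in feature_collection.get("features", []):
        for key, asset in item.get("assets", {}).items():
            found.append((item["id"], key, asset["href"]))
    return found


def local_path(out_dir, item_id, href):
    """Map an asset URL to out_dir/<item_id>/<file name>."""
    file_name = posixpath.basename(urlparse(href).path)
    return os.path.join(out_dir, item_id, file_name)


def download_all(feature_collection, out_dir):
    """Download every asset; requires network access and is not run here."""
    for item_id, _key, href in asset_hrefs(feature_collection):
        target = local_path(out_dir, item_id, href)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(href, target)


# Made-up item with placeholder URLs, used only to exercise the helpers.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "id": "example_item",
            "assets": {
                "image": {"href": "https://example.com/data/example_item.tif"},
                "thumbnail": {"href": "https://example.com/data/example_item.jpg?size=s"},
            },
        }
    ],
}
for item_id, key, href in asset_hrefs(sample):
    print(item_id, key, local_path("downloads", item_id, href))
```

To use it for real, load hirise_to_download.json with the json module and pass the result to download_all along with an output directory of your choice.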
That’s it! In this tutorial, we installed the stac-client tool into a conda environment and executed a simple spatial query in order to discover and download STAC data from the USGS-hosted analysis ready data (ARD) STAC catalog.