Hosting and Accessing Cloud Optimized GeoTIFFs on AWS S3

This post is about hosting and accessing Cloud Optimized GeoTIFFs (COGs) on AWS S3. I first learned about COGs while looking for a way to store and host large raster images (e.g. aerial imagery at 2-4 cm resolution) that my team and clients can easily access and use over the web.

But first, what are COGs? A COG isn’t a new format; COGs are simply GeoTIFFs with an internal organization that supports efficient access over HTTP. This internal organization, combined with an HTTP feature called GET range requests (also known as byte serving), allows a client to retrieve only the portion of the file that it needs. Think about the way a video or music file is streamed online - you can skip forward or backward and start at a specific point without downloading the full file. A COG works the same way but for raster files - you can access parts of the GeoTIFF as you need them instead of having to download the whole file. COGs are an open data format that supports an efficient, cloud-ready workflow. Many data agencies and companies have started using them, and the format is maturing at a rapid pace. For example, the USGS has switched over to providing DEMs and Landsat data as COGs on its websites. You can find out more about COGs at cogeo.org.
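To make the range request idea concrete, here is a minimal Python sketch that builds an HTTP GET request for only the first 16 KB of a file. The URL is hypothetical and used only for illustration, and the byte range is an arbitrary choice; a real COG reader decides which ranges to fetch based on the file's internal layout.

```python
import urllib.request


def build_range_request(url, start, end):
    """Build a GET request for bytes start..end (inclusive) of the file at url."""
    request = urllib.request.Request(url)
    # The Range header is what turns a normal GET into a partial download.
    request.add_header("Range", f"bytes={start}-{end}")
    return request


# Hypothetical URL, for illustration only.
req = build_range_request("https://example.com/imagery/sample_cog.tif", 0, 16383)
print(req.get_header("Range"))

# Sending the request needs network access and a server that supports ranges:
# with urllib.request.urlopen(req) as response:
#     first_bytes = response.read()  # a range-aware server replies 206 Partial Content
```

A server that supports range requests returns just those bytes, which is what lets a viewer read a COG's header and then only the tiles it needs.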

This is the workflow that I’ve come up with for creating COGs using GDAL and hosting and accessing them on AWS S3.

Step 1: Check for GDAL installation and version

First thing to do is to make sure you have GDAL installed and check which version you have. If you’ve installed QGIS using the OSGeo4W installer then you most likely will have GDAL installed. Open the OSGeo4W Shell and see which version you have. To do this, just type: gdalinfo --version

The version of GDAL is important to know because GDAL 3.1 and later include a built-in COG driver that supports COG creation.
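If you would rather script this check, the sketch below parses the text that gdalinfo --version prints and tests it against 3.1. The sample string is illustrative (not output from my machine), the function names are my own, and the last part only runs if gdalinfo is on your PATH.

```python
import re
import shutil
import subprocess


def parse_gdal_version(version_text):
    """Extract (major, minor, patch) from text like 'GDAL 3.1.4, released 2020/10/20'."""
    match = re.search(r"GDAL (\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"Could not find a GDAL version in: {version_text!r}")
    return tuple(int(part) for part in match.groups())


def has_cog_driver(version_text):
    """Return True when the reported version is 3.1 or newer."""
    return parse_gdal_version(version_text)[:2] >= (3, 1)


sample = "GDAL 3.1.4, released 2020/10/20"  # illustrative sample text
print(parse_gdal_version(sample), has_cog_driver(sample))

# Only query the real installation when gdalinfo is available.
if shutil.which("gdalinfo"):
    result = subprocess.run(
        ["gdalinfo", "--version"], capture_output=True, text=True, check=True
    )
    print(result.stdout.strip(), "-> COG driver available:", has_cog_driver(result.stdout))
```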

Step 2: Take a look at metadata

It’s always a good idea to take a look at the metadata of your GeoTIFF. Go into the directory where your GeoTIFF is stored and run the gdalinfo command on your file (e.g. gdalinfo testimage.tif).

Looking at the metadata, I can tell that my GeoTIFF file doesn’t have overviews and is not tiled. Internal tiling allows rendering applications to quickly select, decompress and display only the tile(s) of the image that they need. Overviews allow fast access to zoomed-out views of the image. The COG driver built into GDAL 3.1 applies tiling and overview creation by default.
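For anyone who wants to automate this inspection, here is a rough sketch that reads gdalinfo's plain-text output and reports the block size and whether an overview list is present. The two sample strings are trimmed, illustrative excerpts rather than my real output, and the "tiled" test is only a heuristic: a striped GeoTIFF reports blocks as wide as the whole raster, so a block width different from the raster width suggests tiling.

```python
import re


def summarize_structure(gdalinfo_text):
    """Report block size, a tiling heuristic and overview presence from gdalinfo text."""
    size = re.search(r"Size is (\d+), (\d+)", gdalinfo_text)
    block = re.search(r"Block=(\d+)x(\d+)", gdalinfo_text)
    if size is None or block is None:
        raise ValueError("Text does not look like gdalinfo output")
    raster_width = int(size.group(1))
    block_width, block_height = int(block.group(1)), int(block.group(2))
    return {
        "block": (block_width, block_height),
        # Heuristic: strips span the full raster width, tiles usually do not.
        "tiled": block_width != raster_width,
        "has_overviews": "Overviews:" in gdalinfo_text,
    }


# Illustrative excerpts of gdalinfo output (not from a real file).
STRIPED_SAMPLE = """Size is 20000, 20000
Band 1 Block=20000x1 Type=Byte, ColorInterp=Red
"""
COG_SAMPLE = """Size is 20000, 20000
Band 1 Block=512x512 Type=Byte, ColorInterp=Red
  Overviews: 10000x10000, 5000x5000, 2500x2500
"""

print(summarize_structure(STRIPED_SAMPLE))
print(summarize_structure(COG_SAMPLE))
```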

Step 3: Create COG

In this step, I’ll use the gdal_translate command with COG as the output format. This makes a copy of my original GeoTIFF file that is COG compliant. Below is an example for a single GeoTIFF file. If I had more files that I wanted to make COG compliant, I could create a batch file and run through all the files in my directory.

Fig03.png

See the GDAL COG creation page for all the different options available. In my case, I am using JPEG compression, which works fine for aerial imagery. For older versions of GDAL, see the GDAL wiki page for more information on creating COGs.
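The conversion described in this step can be sketched in Python as follows. This is not necessarily the exact command I ran; it assumes GDAL 3.1 or later, uses the COG driver's COMPRESS creation option, and the file names and the _cog suffix are placeholders. The batch function is one possible way to loop over a directory instead of writing a batch file.

```python
from pathlib import Path
import subprocess


def cog_command(src, dst, compress="JPEG"):
    """Build a gdal_translate command that writes dst as a COG."""
    return [
        "gdal_translate",
        "-of", "COG",                    # use the built-in COG driver (GDAL >= 3.1)
        "-co", f"COMPRESS={compress}",   # compression creation option
        str(src),
        str(dst),
    ]


def convert_folder(folder, suffix="_cog"):
    """Convert every .tif in folder, skipping files that already carry the suffix."""
    for src in sorted(Path(folder).glob("*.tif")):
        if src.stem.endswith(suffix):
            continue
        dst = src.with_name(f"{src.stem}{suffix}.tif")
        subprocess.run(cog_command(src, dst), check=True)


# Print the command for a single file instead of running it.
print(" ".join(cog_command("testimage.tif", "testimage_cog.tif")))
```

Keep in mind that JPEG compression is lossy, so a lossless option such as DEFLATE or LZW is the safer choice for data like DEMs where pixel values must be preserved.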

Step 4: Validate COG

After creating the COG, I need to check that it is valid. Several sources (e.g. cogeo.org and the GDAL wiki) say to use validate_cloud_optimized_geotiff.py to check the COG, but I found that this Python script doesn’t work on COGs created with GDAL 3.1. It didn’t matter what options I tried - the script just reported an invalid COG. Other people seem to have the same issue, so it would seem this validation script does not work with GDAL 3.1 - see issue 151.

Instead of using the validation script, I just check the metadata of my COG to make sure that it has overviews and tiles. In the metadata, look for the Image Structure Metadata section - this will tell you if your COG is compliant. At least I am assuming it is valid since it indicates that the layout structure of my image is a COG.

Fig04.png
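The same metadata check can be scripted. The sketch below pulls the dataset-level Image Structure Metadata section out of gdalinfo's text output and looks for LAYOUT=COG. The sample text is an illustrative excerpt, not my real output, and like the manual check it only confirms what GDAL reports about the layout; it is not a full validation.

```python
def image_structure_metadata(gdalinfo_text):
    """Return the first 'Image Structure Metadata' section as a dict."""
    items = {}
    in_section = False
    for line in gdalinfo_text.splitlines():
        if not in_section:
            if line.strip() == "Image Structure Metadata:":
                in_section = True
            continue
        # The section ends at the first line that is not an indented KEY=VALUE pair.
        if not line.startswith("  ") or "=" not in line:
            break
        key, value = line.strip().split("=", 1)
        items[key] = value
    return items


def reports_cog_layout(gdalinfo_text):
    """True when gdalinfo lists LAYOUT=COG in the image structure metadata."""
    return image_structure_metadata(gdalinfo_text).get("LAYOUT") == "COG"


# Illustrative excerpt of gdalinfo output (not from a real file).
SAMPLE = """Driver: GTiff/GeoTIFF
Size is 20000, 20000
Image Structure Metadata:
  COMPRESSION=JPEG
  INTERLEAVE=PIXEL
  LAYOUT=COG
Corner Coordinates:
"""

print(image_structure_metadata(SAMPLE))
print(reports_cog_layout(SAMPLE))
```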

For comparison, my original aerial imagery is about 3.5 GB but my COG is only about 260 MB - that’s a huge difference in file size. The COG is only roughly 7-8% of the size of the original, largely thanks to the JPEG compression.

Step 5: Upload COG to AWS S3 Bucket

After I validate that my COG is good, I can upload it to a web server. I’m using an AWS S3 bucket to store my image. There is already plenty of information and tutorials on setting up and using AWS; see signing up for an AWS account and the free usage tier. The sign-up process requires a credit card, but I think the free tier, which is what I’m using, is good for testing the setup. I just followed the tutorial on Amazon’s site for creating an S3 bucket.

Here are the steps I went through:

  1. Sign up for an AWS account and see getting started with AWS

  2. Create an IAM user as recommended by AWS and save the access keys (access key ID and secret access key). These access keys are important for private data access.

  3. Create my bucket (e.g. cogaerials)

  4. Create a folder (e.g. uhm-project) to help organize the files in my bucket

  5. Upload my COG aerial into the folder I just created.

  6. Set the object (i.e. my COG aerial) in my bucket to public read to see if I can easily access it via HTTPS in QGIS.

  7. Copy the object URL of my COG aerial so I can access it in QGIS in the next step.
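I did these steps in the AWS console, but as a rough sketch, here is how the upload command and the resulting object URL fit together. It assumes the AWS CLI is installed and configured, uses the virtual-hosted-style URL form, and the region (us-west-2) is a placeholder I picked for illustration; the bucket and key reuse the example names from this post. Public read access still has to be granted separately, as in step 6.

```python
from urllib.parse import quote


def upload_command(local_file, bucket, key):
    """Build an AWS CLI command that copies a local file to s3://bucket/key."""
    return ["aws", "s3", "cp", local_file, f"s3://{bucket}/{key}"]


def s3_object_url(bucket, key, region):
    """Build the virtual-hosted-style HTTPS URL for an S3 object."""
    # quote() escapes unsafe characters in the key but keeps the '/' separators.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


bucket = "cogaerials"
key = "uhm-project/UHM_orthoCOG.tif"
region = "us-west-2"  # placeholder region, replace with your bucket's region

print(" ".join(upload_command("UHM_orthoCOG.tif", bucket, key)))
print(s3_object_url(bucket, key, region))
```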

Step 6: Visualize COG in QGIS

Using QGIS 3.18, I’m going to load my COG and view it. Here’s a good tutorial on using COG in QGIS. The method I’m using below works for QGIS version 3.2 or higher.

Public Data Option:

  1. In QGIS, open the Data Source Manager window

  2. Go to Raster tab

  3. Source Type: Select Protocol: HTTP(S), Cloud, etc

  4. Type: HTTP/HTTPS/FTP (NOTE: this may be the easiest way to access the data if it’s public)

  5. URI: Paste in the object URL (i.e. COG image URL) from your AWS S3 Object

  6. Authentication: leave as default option

  7. Options: NOTE: I left this section as the default options

  8. Click Add

If I look at the source properties, I can tell that the raster layer is coming from my AWS S3 bucket.
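Outside of QGIS, the same public object can be read with the GDAL command-line tools by prefixing the URL with /vsicurl/, GDAL's virtual file system for HTTP(S) access. The sketch below only builds the path and the gdalinfo command; the URL is a placeholder built from the example bucket name and an assumed region, so running it for real requires your own object URL and network access.

```python
def vsicurl_path(url):
    """Prefix an HTTP(S) URL so GDAL reads it remotely using range requests."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("vsicurl_path expects an http:// or https:// URL")
    return "/vsicurl/" + url


# Placeholder object URL (the region is an assumption).
object_url = "https://cogaerials.s3.us-west-2.amazonaws.com/uhm-project/UHM_orthoCOG.tif"

remote_path = vsicurl_path(object_url)
command = ["gdalinfo", remote_path]
print(" ".join(command))
```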

This is how my COG aerial looks in QGIS. I am really happy with it. I can zoom in and out and pan around, it displays pretty fast, and the resolution is still good - at least I can’t tell the difference between the COG aerial and the original aerial.

Private Data Option:

If your data is private but you still want to visualize it in QGIS, you will need to set the environment variables AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID to the access keys that AWS provides when you create an IAM user.

  1. In QGIS, go to Settings menu >> Options

  2. In the Options window: go to System tab

  3. Expand the Environment section: check the box use custom variables

  4. Click add to add two variables: AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID. For Apply: select Overwrite

  5. Click OK and restart QGIS

After you restart QGIS, then use the Data Source Manager window to add in the COG image.

  1. In the Data Source Manager: Select Raster tab

  2. Select Protocol: HTTP(S), cloud, etc.

  3. Type: Select AWS S3

  4. Bucket or container: enter your bucket name (e.g. cogaerials)

  5. Object key: enter your object name, including the subfolder if any (e.g. uhm-project/UHM_orthoCOG.tif)

  6. Options: I left as default options

  7. Click Add

If all is successful, then you should be able to see your COG added in QGIS like the example below.
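The private-data setup above has a command-line equivalent: GDAL reads S3 objects through its /vsis3/ virtual file system and picks up the credentials from the same two environment variables. Here is a minimal sketch; the key values are obvious placeholders, the function names are my own, and the bucket and object key reuse the example names from this post.

```python
import os


def vsis3_path(bucket, key):
    """Build the GDAL virtual path for an object in an S3 bucket."""
    return f"/vsis3/{bucket}/{key}"


def env_with_credentials(access_key_id, secret_access_key, base_env=None):
    """Return a copy of the environment with the AWS access keys that GDAL reads."""
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_ACCESS_KEY_ID"] = access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    return env


path = vsis3_path("cogaerials", "uhm-project/UHM_orthoCOG.tif")
# Placeholder credentials: never hard-code real keys in a script.
env = env_with_credentials("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY", base_env={})
print(path)

# With real keys and GDAL installed, the file could then be inspected with:
# import subprocess
# subprocess.run(["gdalinfo", path], env={**os.environ, **env}, check=True)
```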

There are other options for accessing AWS S3, such as Requester Pays, but they are not covered here. I haven’t tried this option yet, but my guess is that you will need to enable Requester Pays on your bucket and then also set an environment variable for it in QGIS.

Anyway, I hope you find this useful. I’ve learned a lot working through this workflow myself and I am happy to share with others.

Thanks for reading and until next time.