This package provides methods to get Sentinel-2 L2A data from Google Earth Engine (https://developers.google.com/earth-engine/) or from cloud-optimized geotiffs in AWS Open data registry maintained by Element84 (https://registry.opendata.aws/sentinel-2-l2a-cogs/, https://github.com/Element84/earth-search).
This package has been mainly used in agricultural projects at the Climate System Research unit at the Finnish Meteorological Institute. The package is at the moment under constant development. Apologies for the lack of documentation at the moment. Please feel free to contact me in case you want more information about this package.
Package also includes python implementation of ESA's SNAP Biophysical processor v1 which can be used to compute biophysical parameters, such as leaf area index (LAI) (Original Java code in SNAP here. Algorithm documentation: http://step.esa.int/docs/extra/ATBD_S2ToolBox_L2B_V1.1.pdf.
Documentation: https://satellitetools.readthedocs.io/
Source code at: https://github.com/ollinevalainen/satellitetools
Citing: If you use this package in your research, I would appeaciate if you cite the relevant version from Zenodo: https://doi.org/10.5281/zenodo.5993291
Author: Olli Nevalainen, Finnish Meteorological Institute.
From PyPi:
pip install satellitetoolsFrom GitHub:
pip install git+https://github.com/ollinevalainen/satellitetools.gitDevelopment branch:
pip install git+https://github.com/ollinevalainen/satellitetools.git@developEarth Engine must bee authenticated and initialized by the user. Guides to authentication: https://developers.google.com/earth-engine/guides/auth
Remember GEE's licence terms.
Currently (2024-10-23) 2022 data is missing still from the sentinel-2-c1-l2a collection because ESA has not yet processed them. 2022 is instead fetched from the sentinel-2-l2a collection. Follow discussion and issues related to this at: https://github.com/Element84/earth-search/issues.
The biophysical processor implementation in this package does not currently use the convex hull check (see the Java code and ATBD) and does not have as extensive input/output validity flagging as the original version in SNAP.
Define area of interest and request parameters:
from shapely.geometry import Polygon
import satellitetools as sattools
from satellitetools.common.sentinel2 import S2_BANDS_10_20_GEE, S2_FILTER1
qvidja_ec_field_geom = [
[22.3913931, 60.295311],
[22.3917056, 60.2951721],
[22.3922131, 60.2949717],
[22.3927016, 60.2948124],
[22.3932251, 60.2946874],
[22.3931117, 60.2946416],
[22.3926039, 60.2944037],
[22.3920127, 60.2941585],
[22.3918447, 60.2940601],
[22.391413, 60.2937852],
[22.3908102, 60.2935286],
[22.390173, 60.2933897],
[22.389483, 60.2933106],
[22.3890777, 60.293541],
[22.3891442, 60.2936358],
[22.3889863, 60.2940313],
[22.3892131, 60.2941537],
[22.3895462, 60.2942468],
[22.3899066, 60.2944289],
[22.3903881, 60.2946329],
[22.3904738, 60.2948121],
[22.3913931, 60.295311],
]
qvidja_polygon = Polygon(qvidja_ec_field_geom)
aoi = sattools.AOI("qvidja_ec", qvidja_polygon, "EPSG:4326")
datestart = "2023-06-01"
dateend = "2023-06-30"
bands = sattools.S2Band.get_10m_to_20m_bands()GEE:
You need to initialize (and possibly authenticate) Google Earth Engine and define GEE data source.
import ee
ee.Initialize(project="your-ee-project-name")
datasource = sattools.DataSource.GEEAWS:
Define AWS data source.
datasource = sattools.DataSource.AWSEvaluate quality and get the data using a wrapper function:
req_params = sattools.Sentinel2RequestParams(
datestart, dateend, datasource, bands
)
df_quality_information, ds_s2_data = sattools.wrappers.get_s2_qi_and_data(
aoi=aoi,
req_params=req_params,
qi_threshold=0.02,
qi_filter=S2_FILTER1,
)
# df_quality_information will include quality information for all available Sentinel-2 images.
# ds_s2_data will include Sentinel-2 data for the requested bands and metadata for dates passing quality filter (see below more on the quality evaluation).Alternatively without the wrapper:
This way you can access the data collection class with some extra metadata and with AWS the source data item. With AWS data source you can speed-up process by concurrently fetching multiple images with multiprocessing parameter.
if req_params.datasource == DataSource.GEE:
s2_data_collection = GEESentinel2DataCollection(aoi, req_params)
if req_params.datasource == DataSource.AWS:
import os
number_of_processes = os.cpu_count() - 1 # or something else
s2_data_collection = AWSSentinel2DataCollection(aoi, req_params, multiprocessing=number_of_processes)
s2_data_collection.search_s2_items()
s2_data_collection.get_quality_info()
qi_threshold=0.02 # without specifying default 0 will be used
qi_filter=S2_FILTER1
s2_data_collection.filter_s2_items(qi_threshold, qi_filter)
s2_data_collection.get_s2_data()
s2_data_collection.data_to_xarray()
df_quality_information = s2_data_collection.quality_information
ds_s2_data = s2_data_collection.xr_datasetCalculate vegetation variables:
# Example to calculate LAI and NDVI.
variable = sattools.biophys.BiophysVariable.LAI
ds_s2_data = sattools.biophys.run_snap_biophys(ds_s2_data, variable)
vi = sattools.biophys.VegetationIndex.NDVI
ds_s2_data = sattools.biophys.compute_vegetation_index(ds_s2_data, vi)
# Both LAI and NDVI are now included in the dataset with variables names lai and ndvi.
# Other biophysical variables and vegetation indices also available:
class VegetationIndex(str, Enum):
NDVI = "ndvi"
CI_RED_EDGE = "ci_red_edge"
GCC = "gcc"
class BiophysVariable(str, Enum):
FAPAR = "fapar"
FCOVER = "fcover"
LAI = "lai"
LAI_Cab = "lai_cab"
LAI_Cw = "lai_cw"Tests have been written for pytest.
If you need to specify Google Earth Engine project for ee.Initialize(), for test you need to do it by having "EE_PROJECT_PYTEST" environment variable set. For example:
export EE_PROJECT_PYTEST=your-ee-projectThe satellite data is filtered based on the information available in the scene classification band (SCL) of the Sentinel-2 Level 2A products. In the SCL band every pixel is classified into one of the following classes:
class SCLClass(Enum):
"""Sentinel-2 Scene Classification Layer (SCL) classes."""
NODATA = 0
SATURATED_DEFECTIVE = 1
DARK_FEATURE_SHADOW = 2
CLOUD_SHADOW = 3
VEGETATION = 4
NOT_VEGETATED = 5
WATER = 6
UNCLASSIFIED = 7
CLOUD_MEDIUM_PROBA = 8
CLOUD_HIGH_PROBA = 9
THIN_CIRRUS = 10
SNOW_ICE = 11The percentage of each of those classes within the polygon/area-of-interest(AOI) is calculated from SCL data. This information is/can be used to to filter the dates for which the actual data is requested from GEE or AWS. There's two parameters, qi_threhold and qi_filter, which are used to filter the data. The qi_filter is a list of the (unwanted) classes whose percentages within the AOI is summed. Currently, qi_filter is by default this:
S2_FILTER1 = [
"NODATA",
"SATURATED_DEFECTIVE",
"CLOUD_SHADOW",
"UNCLASSIFIED",
"CLOUD_MEDIUM_PROBA",
"CLOUD_HIGH_PROBA",
"THIN_CIRRUS",
"SNOW_ICE",
]These classes are considered unwanted and user can change the classes included in that
list by the qi_filter parameter. If the sum of the percentages of the classes defined
in the qi_filter is less than the qi_threshold, the data is considered good/acceptable.
I have been using the 0.02 which I have empirically tested and found to be ok for our
purposes.
The SCL band provided in the S2 level-2A products is automatically generated and it has errors/misclassifications in it. This affects also the accuracy of the filtering processs and some bad data acquisition dates might pass the filter.
Realistic:
- Include example notebook in the documentation
- Examples to docstrings
- Improve tests
- Add coverage and docs bulding to nox sessions
- Dcoument branching strategy and add contributing guidelines
- Refactor ee_* functions in gee.py. They have a lot of repetition.
- Calculate vegetation indices directly in GEE cloud and return only timeseries
Wishful thinking:
- Point-based AOI (currently AOI must be a polygon)
- Add Landsat data support
- Add Sentinel-1 data support
- Large spatial scale data retrieval from AWS: yield daily xr_datasets
- currently all data is fetched first to memory
The package ended-up being quite useful in multiple research projects, so development of the package has been continuous. This release is a milestone for years of small changes and recent major refactoring of code during past two months. But now develop branch was finally merged to master.
In the future, the plan is to publish releases more frequently when possible developments in the develop branch gets merged to master branch.
Too much has changed compared to the first release, but here's summary of changes respect to the previous develop branch before major refactoring done during Sep-Dec 2024:
Breaking changes:
- Renamed xrtools.py to timeseries.py
- RequestParams class is now Sentinel2RequestParams in sentinel2 submodule
- Restructured files and removed nesting. For example gee imported as satellitetools.gee instead of satellitetools.gee.gee
- The quality information dataframe doesn't have anymore
"Date"column, but instead index named"acquisition_time"as UTC aware timestamp (frompd.to_datetime).
New features: 🔧
- Refactored and made codes more object-oriented and modular:
- There's now parent classes
Sentinel2DataCollection,Sentinel2ItemandSentinel2Metadatawhich have datasource specific child classes:GEESentinel2DataCollectionAWSSentinel2DataCollection,AWSSentinel2Item,AWSSentinel2Metadata
- The parent classes have methods that are common for both data sources and the child classes have methods that are specific to the data source.
- Improved handling of Sentinel-2 bands and scene classification classes with
S2BandandSCLClassclasses - Biophysical processor is now a class
SNAPBiophysProcessor, also the biophysical variables and vegetation indices are now Enum classes. - Enabled easier importing and access of classes and submodules. For example, you can define the data source with
satellitetools.DataSource.GEEand the bands withsatellitetools.S2Band.B4. - Added tests for pytest.
- Improved docstrings, added examples and updated README.
- Documentation using sphinx
- Testing for python versions 3.10, 3.11 and 3.12 using nox
- Changed from print statements to logging.
This release is currently (7 February 2022) in use in Field Observatory (https://www.fieldobservatory.org/) and cited in a scientific article "Towards agricultural soil carbon monitoring, reporting and verification through Field Observatory Network (FiON)" that describes Field Observatory and has been accepted for publishing in EGU's Geoscientific Instrumentation, Methods and Data Systems journal (https://doi.org/10.5194/gi-2021-21).