Lesson 4.2. Improving trust in citizen science data: distant cloud-based calibration
Low-cost sensors (LCS) of the kind typically used in citizen science can offer measurements with high spatiotemporal resolution, which can supplement existing datasets from official monitoring stations. However, LCS require frequent calibration to provide accurate and reliable data, because the sensor signal is often affected by environmental conditions in the field, such as temperature and humidity.
One way to improve data accuracy and account for dynamically changing conditions is distant calibration. In this approach, a few low-cost sensors co-located with reference-grade measurement stations and several low-cost sensors in the region of interest are deployed simultaneously, so that all of them age in the field in a similar way. The method relies on night-time data: pollutants such as particulate matter (PM) and nitrogen dioxide (NO2), which are caused mostly by anthropogenic factors, have lower and relatively stable concentrations at night due to limited human activity, and are more homogeneously distributed across an area than during the day. This is what makes it reasonable to compare an LCS with a reference station some distance away.
The distant calibration process
In the distant calibration approach, the LCS data are first pre-processed, which includes removing outliers. Measurements are then collected from the LCS network and from nearby, highly accurate reference stations over a period of time, typically a month. The reference station whose relative night-time fluctuations are most similar to those of the raw sensor data is selected.
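The station selection step can be illustrated with a minimal sketch. The lesson does not specify the similarity measure, so the one used here (mean absolute difference between relative fluctuations) is an assumption, as are the function names, the station names, and the made-up night-time values.

```python
import numpy as np

def relative_fluctuation(series):
    """Express a series as unitless fluctuations around its own mean."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.mean()

def select_reference_station(lcs_night, stations_night):
    """Pick the station whose night-time relative fluctuations are closest
    to those of the raw LCS signal.

    The score is the mean absolute difference between the two relative
    fluctuation series (lower means more similar). This metric is an
    assumption made for illustration only.
    """
    lcs_rel = relative_fluctuation(lcs_night)
    scores = {
        name: float(np.mean(np.abs(lcs_rel - relative_fluctuation(values))))
        for name, values in stations_night.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical hourly night-time PM values (µg/m³), not real measurements.
lcs_night = [14.0, 12.5, 11.0, 10.5, 11.5, 13.0]
stations_night = {
    "station_A": [9.0, 8.2, 7.1, 6.9, 7.4, 8.5],  # same shape, lower level
    "station_B": [7.0, 7.5, 8.1, 8.0, 7.2, 6.8],  # opposite shape
}

best, scores = select_reference_station(lcs_night, stations_night)
print(best, scores)
```

Working with relative fluctuations means that a sensor reading consistently higher or lower than the station is not penalised; only the shape of the night-time signal counts.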
Next, a multilinear regression model is built using both data types (LCS and reference station data), plus temperature, humidity, and possibly other interfering pollutants. The model coefficients and other parameters are estimated and then applied to calibrate citizen science data in real time in the cloud environment. This ensures that dynamic changes in the microenvironment around the LCS have a minimal effect on calibration.
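As a rough illustration of the regression step, the sketch below fits a multilinear model by ordinary least squares with NumPy. It assumes the reference concentration is modelled as an intercept plus linear terms in the raw sensor signal, temperature and humidity; the exact model form used in the framework is not given in this lesson, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(raw, temperature, humidity):
    """Stack an intercept column with the three predictors."""
    raw = np.asarray(raw, dtype=float)
    return np.column_stack([
        np.ones(len(raw)),
        raw,
        np.asarray(temperature, dtype=float),
        np.asarray(humidity, dtype=float),
    ])

def fit_calibration(raw, temperature, humidity, reference):
    """Estimate [intercept, b_raw, b_temp, b_rh] by ordinary least squares."""
    X = design_matrix(raw, temperature, humidity)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(reference, dtype=float), rcond=None)
    return coeffs

def apply_calibration(coeffs, raw, temperature, humidity):
    """Turn raw LCS readings into calibrated concentrations."""
    return design_matrix(raw, temperature, humidity) @ coeffs

# Synthetic, noise-free training data (illustrative only).
rng = np.random.default_rng(0)
raw = rng.uniform(5, 40, 200)           # raw LCS signal
temperature = rng.uniform(0, 30, 200)   # degrees Celsius
humidity = rng.uniform(30, 95, 200)     # percent relative humidity
true_coeffs = np.array([2.0, 0.6, 0.1, -0.05])
reference = design_matrix(raw, temperature, humidity) @ true_coeffs

coeffs = fit_calibration(raw, temperature, humidity, reference)
calibrated = apply_calibration(coeffs, raw, temperature, humidity)
print(coeffs)
```

In a real-time setting, the coefficients would be estimated on the training period and then applied to each new reading as it arrives; an additional interfering pollutant would simply become another column in the design matrix.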
While the LCS are deployed, the performance of the calibration is evaluated using the sensors co-located with reference stations, and corrections for drift and ageing (loss of sensitivity) are taken into account.
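One simple way to picture this evaluation, assuming a co-located sensor and reference station report at the same timestamps, is to compare the error before and after calibration and to look for a trend in the residuals over time. The metric (root-mean-square error), the linear drift estimate, and all values below are assumptions for illustration, not the framework's documented procedure.

```python
import numpy as np

def rmse(estimate, reference):
    """Root-mean-square error between sensor values and reference values."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def residual_drift(calibrated, reference, days):
    """Slope of a straight line fitted to (calibrated - reference) over time.

    A slope far from zero suggests the sensor is drifting and the
    calibration coefficients need to be re-estimated.
    """
    residuals = np.asarray(calibrated, dtype=float) - np.asarray(reference, dtype=float)
    slope, _intercept = np.polyfit(np.asarray(days, dtype=float), residuals, 1)
    return float(slope)

# Hypothetical daily values for one co-located sensor (µg/m³).
days = np.arange(10)
reference = np.full(10, 20.0)
raw = reference + 5.0                  # uncalibrated sensor reads 5 too high
calibrated = reference - 0.1 * days    # calibrated sensor loses 0.1 per day

error_raw = rmse(raw, reference)
error_calibrated = rmse(calibrated, reference)
drift_per_day = residual_drift(calibrated, reference, days)
print(error_raw, error_calibrated, drift_per_day)
```

Here the calibrated error is much smaller than the raw error, but the negative slope shows a loss of sensitivity that would eventually call for recalibration.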
The cloud-based calibration framework for LCS
The application of distant calibration in COMPAIR
Prior to COMPAIR, the calibration framework had only been tested in Belgium and the Netherlands. The presence of Bulgarian, Greek and German pilots in the project made it necessary to extend the method to new geographic areas. However, reference stations were sparse in the new regions, which increased the distances between the stations and the LCS and posed a risk to calibration performance.
Furthermore, the use of mobile sensors in some pilots (e.g. Berlin cycling campaign) meant that these sensors too had to be calibrated, even though the framework was initially developed for and tested on static LCS devices. And since the mobile sensors were stored indoors at night, night-time calibration wasn’t feasible.
To address these challenges, the training dataset was selected based on the variance among the reference stations, and day-time data had to be used instead of night-time data. With these modifications, we learned that applying distant calibration to particulate matter measurements led, on average, to moderate reductions in measurement error. However, the same approach did not carry over directly to other pollutants: nitrogen dioxide (NO2) measurements did not improve until additional modifications were applied to the model, such as decreasing the search radius for reference stations in regions where they are sparsely spread.
Video presentation on data calibration