Data Status Report for MICA - Management of Invasive Coypu and muskrAt in Europe

Research Institute for Nature and Forest (INBO), Wageningen University, and Unie van Waterschappen

Tim Adriaens, Kristof Baert, Warre Baert, Abel De Boer, Gust Boiten, Dimitri Brosens, Emma Cartuyvels, Jim Casaer, Bram D’hondt, Manon Debrabandere, Peter Desmet, Sander Devisscher, Dennis Donckers, Sanne Van Donink, Silke Dupont, Wouter Franceus, Heiko Fritz, Lilja Fromme, Friederike Gethöffer, Jan Gouwy, Sanne Govaert, Casper Herbots, Frank Huysentruyt, Leo Kehl, Liam Letheren, Lydia Liebgott, Yorick Liefting, Jan Lodewijkx, Claudia Maistrelli, Björn Matthies, Kelly Meijvisch, Dolf Moerkens, Axel Neukermans, Brecht Neukermans, Jelle Ronsijn, Kurt Schamp, Dan Slootmaekers, Linda Tiggelman, and Danny Van der beeck

June 18, 2026

Abstract

This report provides an overview of the camera trap dataset and the preprocessing steps used for automated report generation. It lets users quickly assess the quality of their data through a summary generated by a series of automated checks. The report is structured into six main sections (Setup, Data Availability, Species Records, Validation, Annotation, and Observation Types by Capture Method), each assessing one aspect of dataset integrity and readiness. The report concludes with an overall data quality classification, which is one of the following:

Perfect: All key checks passed; no issues detected.
Acceptable: Minor issues found; the dataset remains usable, although corrections are recommended.
Needs Improvement: Major issues identified; corrections are mandatory before continuing.

If the status is Perfect or Acceptable, you may proceed with generating the final report. If the status is Needs Improvement, report generation is not possible until the major issues identified in the dataset have been corrected.

Note: To ensure the highest quality automated report generation from your camera trap data, we strongly recommend improving your dataset whenever possible. If the status is Acceptable, consider refining it to reach the ideal Perfect level. This may require only minor adjustments, but it can make a significant difference in the final report. If the status is Needs Improvement, we advise resolving the identified issues to bring the dataset to at least the Acceptable level, and ideally to the Perfect standard.

Chapter 1: Setup

This chapter covers spatial and temporal checks. The dataset spans Belgium, the Netherlands, and Germany (MICA - Management of Invasive Coypu and muskrAt in Europe), with time zone UTC+2 and a coordinate range of 50.699°–53.407°N and 3.518°–8.330°E. All required CSV files (deployments.csv, media.csv, observations.csv) and the JSON file are available in the dataset. The 49 distinct camera locations are distributed within a minimum convex polygon (MCP) of 43608.26 km².
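
As an illustration of how an MCP area can be derived, the sketch below builds a convex hull with Andrew's monotone chain algorithm and measures it with the shoelace formula. It is not taken from camtrapReport; the function names are hypothetical, and the points are assumed to be already projected to planar coordinates (for example, kilometres in an equal-area projection), since longitude/latitude degrees cannot be used directly for area.

```python
def convex_hull(points):
    """Return the convex hull of planar (x, y) points, counter-clockwise.

    Assumption: coordinates are already projected to a planar system.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; area is in squared units of the input coordinates."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2
```

Interior points and points lying on a hull edge are discarded, so only the outermost locations determine the polygon.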

Spatial Data

This section summarizes the spatial quality of the camera-trap data extracted from the dataset files. It focuses specifically on location data, assessing issues such as missing values, duplicate entries, spatial outliers, and overall spatial structure. The complete results are presented in Table 1. If any issues are identified, we strongly recommend correcting the data before proceeding with report generation for your study site.

Duplicate entries (by coordinates, LocationName, or LocationID) are collapsed to the first occurrence, and rows with a missing value in any column are excluded from the report-generation analysis unless you correct them.

Note: If your dataset contains rows where the coordinates (longitude and latitude) are complete but either the LocationID or LocationName is missing, these records are still treated as missing. This is because, for spatial quality checks and analysis, both the geographic coordinates and their associated metadata (LocationID and LocationName) are equally important.
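
The filtering rules described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name, the dictionary keys, and the order of the two steps (missing-value check first, then duplicate check) are assumptions.

```python
REQUIRED_FIELDS = ("locationID", "locationName", "longitude", "latitude")


def clean_locations(rows):
    """Keep the first occurrence of each location and drop incomplete rows.

    Assumptions: each row is a dict, and a value counts as missing when it
    is None or an empty string.
    """
    seen_ids, seen_names, seen_coords = set(), set(), set()
    kept = []
    for row in rows:
        # A row with complete coordinates but no ID or name is still missing.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        coords = (row["longitude"], row["latitude"])
        is_duplicate = (
            row["locationID"] in seen_ids
            or row["locationName"] in seen_names
            or coords in seen_coords
        )
        if is_duplicate:
            continue
        seen_ids.add(row["locationID"])
        seen_names.add(row["locationName"])
        seen_coords.add(coords)
        kept.append(row)
    return kept
```

Under these assumptions, 54 input rows containing 5 rows with repeated coordinates and no missing values would leave 49 rows, which matches the counts reported in Table 1.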

Table 1. Spatial summary of camera-trap locations

Metric	Result	Status
Number of locations	54	Before any filtering
Number of duplicate coordinates	5	🔴 Duplicate coordinates found in 5 groups; 5 duplicate rows.
Number of duplicate LocationIDs	0	🟢 No duplicated locationIDs
Number of duplicate LocationNames	0	🟢 No duplicated locationNames
Missing data	0	🟢 No missing data found
Total distinct coordinates	49	After removing duplicates/missing values
Mean distance between locations (m)	4499.4	Mean inter-location distance
Max distance between locations (m)	88620.03	D_LD_lake Dümmer and B_DM_val 1_weerstation
Min distance between locations (m)	69.92	B_DL_val 3_dikke boom and B_DL_val 2_emissaire
Spatial pattern	targeted (indicated explicitly in metadata)	Clustered (using point-pattern analysis)
Outliers	High-risk outliers = 1; medium-risk outliers = 0; low-risk outliers = 0; non-terrestrial = 0	🔴 High-risk (1): D_LD_lake Dümmer | 🟢 All locations are on land.
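
The minimum and maximum inter-location distances in Table 1 are pairwise quantities. One common way to obtain them from longitude/latitude is the haversine great-circle distance, sketched below. Whether the report uses haversine or another geodesic method is not stated, so treat the formula, the Earth radius constant, and the function names as assumptions.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_extremes(locations):
    """Return the closest and farthest pair as (distance_m, name_a, name_b).

    `locations` maps a location name to a (latitude, longitude) tuple and
    must contain at least two entries.
    """
    pairs = [
        (haversine_m(lat_a, lon_a, lat_b, lon_b), name_a, name_b)
        for (name_a, (lat_a, lon_a)), (name_b, (lat_b, lon_b))
        in combinations(locations.items(), 2)
    ]
    return min(pairs), max(pairs)
```

A very small minimum distance can point to near-duplicate locations, while a very large maximum distance can help explain why a single site is flagged as a spatial outlier.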

Temporal Data

This section summarizes the temporal patterns of the camera-trap dataset, including coverage over time, potential gaps, and any detected anomalies or outliers, which are presented in Table 2.

Table 2. Temporal coverage summary

Metric	Result
Deployment year coverage	2019 – 2023 (🟢 complete)
Observation year coverage	1999 – 2023 (🔴 missing: 2000–2018)
Deployment first/last setup	2019-09-18 07:11:08 – 2023-09-27 08:23:15
Observation first/last record	1999-12-31 – 2023-05-03
Temporal consistency	🔴 Temporal inconsistency (Observations exist, but deployments are missing for: 1999)
First/last date check	🔴 Earliest observation is earlier than the first deployment start date (check timestamps or timezone). 🚨
Month coverage span	2019: Sep–Dec 2020: Jan–Dec 2021: Jan–Dec 2022: Jan–Dec 2023: Jan–May, Sep
Calendar coverage	1325 of 1471 days (90.1%)
Max gap between deployments	146 days (from 2023-05-04 to 2023-09-26)
Min gap between deployments	1 day (from 2020-09-02 to 2020-09-02)
Missing deployment intervals	None 🟢
Zero-length deployments	64 zero-length interval(s) 🟡
Temporal outliers	None 🟢
Invalid timestamp format	602 timestamp(s) have invalid format 🔴 (rows: 7540, 7558, 7577, 7595, 7616, 7640, 7659, 7678, 7700, 7719, …)
Future observation timestamps	None 🟢
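
The calendar coverage and maximum gap figures in Table 2 can be reproduced from deployment intervals with a day-level calculation such as the one below. This is a sketch under stated assumptions: intervals are treated as inclusive calendar dates, a gap is counted as the number of days with no active deployment, and the function names are hypothetical.

```python
from datetime import date, timedelta


def calendar_coverage(intervals):
    """Return (covered_days, total_days, percent) for inclusive date intervals."""
    first = min(start for start, _ in intervals)
    last = max(end for _, end in intervals)
    covered = set()
    for start, end in intervals:
        for offset in range((end - start).days + 1):
            covered.add(start + timedelta(days=offset))
    total_days = (last - first).days + 1
    return len(covered), total_days, round(100 * len(covered) / total_days, 1)


def largest_gap(intervals):
    """Return (gap_days, gap_start, gap_end) for the longest uncovered run.

    Returns None when the intervals leave no uncovered day between them.
    """
    best = None
    covered_until = None
    for start, end in sorted(intervals):
        if covered_until is not None and start > covered_until + timedelta(days=1):
            gap_start = covered_until + timedelta(days=1)
            gap_end = start - timedelta(days=1)
            gap_days = (gap_end - gap_start).days + 1
            if best is None or gap_days > best[0]:
                best = (gap_days, gap_start, gap_end)
        covered_until = end if covered_until is None else max(covered_until, end)
    return best
```

For example, two hypothetical blocks of continuous deployment, 2019-09-18 to 2023-05-03 and a single day on 2023-09-27, give 1325 covered days out of 1471 (90.1%) and a largest gap of 146 days from 2023-05-04 to 2023-09-26. The real dataset contains many more deployments; the two blocks only mimic its overall envelope.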

Chapter 2: Data Availability

Review Essential Data Availability

In this chapter, we review all essential data components required for generating the report. These include both mandatory fields and necessary supporting fields or files that contribute to the completeness and overall quality of the final output. Table 3 provides a structured overview of the availability and completeness of these elements. While all fields listed in the table are important for generating a high-quality report, the bolded fields are mandatory. Their status must be marked as Complete; otherwise, any associated records with missing or partial values will be automatically excluded from the final output.

Table 3. Essential data availability and completeness

Category	Field	Status
Locations	locationID	🟢 Complete
	locationName	🟢 Complete
	longitude	🟢 Complete
	latitude	🟢 Complete
Deployment	deploymentID	🟢 Complete
	locationID	🟢 Complete
	deployment_interval	🟢 Complete
	deploymentStart	🟢 Complete
	deploymentEnd	🟢 Complete
	habitat	🔴 Incomplete (1539 of 1539 missing; 100%)
	setupBy	🟡 Partial (451 of 1539 missing; 29.30%)
	baitUse	🔴 Incomplete (1539 of 1539 missing; 100%)
	cameraHeight	🟢 Complete
Observations	timestamps	🟡 Partial (602 of 183813 missing; 0.33%) | invalid format: 602
	observationType	🟡 Partial (16102 of 183813 missing; 8.76%)
	count	🟢 Complete
	classifiedBy	🟢 Complete
	taxonID	🟢 Complete (taxonID recorded for 116529 of 116529 animals; 100%) | 100 unique
	behavior	🔴 Incomplete (behavior recorded for 0 of 116529 animals; 0%)
	sex	🟡 Partial (sex recorded for 9850 of 116529 animals; 8.45%)
	lifeStage	🟡 Partial (lifeStage recorded for 39374 of 116529 animals; 33.79%)
	angle	🔴 Incomplete (angle recorded for 0 of 116529 animals; 0%)
	radius	🔴 Incomplete (radius recorded for 0 of 116529 animals; 0%)
	speed	🔴 Incomplete (speed recorded for 0 of 116529 animals; 0%)
	individualID	🔴 Incomplete (individualID recorded for 0 of 116529 animals; 0%)
Media	media timestamp	🟡 Partial (657 of 3996924 missing; 0.02%) | invalid format: 657
	file.path	🟢 Complete
	comments	🟢 Complete
	favourite	🟢 Complete
Sequences	nrphotos	🟢 Complete
	captureMethod	🟢 Complete
Taxonomy	taxonID	101 unique taxonIDs identified
	scientificName	🟢 Complete
	vernacularNames.nld	🟢 Complete
	vernacularNames.eng	🟢 Complete
Additional Files	habitat.csv	🟢 Complete
	spatial boundary (e.g., shapefile)	🟢 Complete
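
The status labels in Table 3 appear to follow a simple rule: a field is Complete when nothing is missing, Incomplete when every value is missing, and Partial otherwise. That rule is inferred from the table rather than documented, so the sketch below, including its function name, should be read as an assumption.

```python
def completeness_status(n_missing, n_total):
    """Return (label, percent_missing) for one field.

    Assumed rule, inferred from Table 3: 0 missing is "Complete",
    all missing is "Incomplete", anything in between is "Partial".
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_missing <= n_total:
        raise ValueError("n_missing must be between 0 and n_total")
    percent_missing = round(100 * n_missing / n_total, 2)
    if n_missing == 0:
        label = "Complete"
    elif n_missing == n_total:
        label = "Incomplete"
    else:
        label = "Partial"
    return label, percent_missing
```

Applied to the setupBy row (451 of 1539 missing), this yields Partial with 29.3% missing. Note that the rule does not distinguish a field that is 0.33% missing from one that is more than 90% missing; both are labelled Partial.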

Chapter 4: Validation

In this chapter, we summarize the classification and validation results of the observation data. The ‘CaptureMethod’ column indicates the method used to capture each image, which may include motion detection, time-lapse, or other mechanisms depending on the camera trap setup. The columns ‘Human’, ‘Machine’, and ‘NA_Classification’ represent the number and percentage of observations classified manually by a human, by an automated machine model, or left unclassified (NA), respectively. NA values often result from blank images, errors, or missing data. The ‘Machine_Animal’ column reports how many machine-classified observations were identified as animals, along with their percentage out of all machine classifications for that method. Finally, ‘Validated_Animal’ shows how many of those machine-identified animals were subsequently validated by a human, indicating human-confirmed correctness of the machine prediction. Collectively, this summary helps evaluate both the extent and reliability of the classification process across capture methods. In cases where validation rates are low or discrepancies are noted, users are encouraged to revisit the raw data for quality control, potential retraining of models, or targeted human review of specific subsets.

Table 4. Classification and validation summary

captureMethod	Human	Machine	NA_Classification	Total	Machine_Animal	Validated_Animal
motionDetection	75670 (43.4%)	50198 (28.8%)	48439 (27.8%)	174307	34705 (69.1%)	370 (1.1%)
timeLapse	8 (0.1%)	4384 (46.1%)	5114 (53.8%)	9506	33 (0.8%)	0 (0%)
TOTAL	75678 (41.2%)	54582 (29.7%)	53553 (29.1%)	183813	34738 (63.6%)	370 (1.1%)
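
The percentages in Table 4 use three different denominators, which is easy to misread. The sketch below recomputes one row from raw counts, assuming (as the column descriptions above state) that Human, Machine, and NA_Classification are shares of the row total, Machine_Animal is a share of machine classifications, and Validated_Animal is a share of machine-identified animals. The function names are hypothetical.

```python
def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when the denominator is zero."""
    return round(100 * part / whole, 1) if whole else 0.0


def validation_row(human, machine, unclassified, machine_animal, validated_animal):
    """Recompute one Table 4 row as {column: (count, percent)} plus the total."""
    total = human + machine + unclassified
    return {
        "Human": (human, pct(human, total)),
        "Machine": (machine, pct(machine, total)),
        "NA_Classification": (unclassified, pct(unclassified, total)),
        "Total": total,
        # Share of machine classifications that were labelled as animals.
        "Machine_Animal": (machine_animal, pct(machine_animal, machine)),
        # Share of machine-identified animals later confirmed by a human.
        "Validated_Animal": (validated_animal, pct(validated_animal, machine_animal)),
    }
```

Calling `validation_row(75670, 50198, 48439, 34705, 370)` reproduces the motionDetection row, including the 1.1% validation rate, which indicates that almost all machine-identified animals in this dataset have not yet been checked by a human.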

Chapter 5: Annotation

In this chapter, we provide an overview of annotation confidence in the camera-trap observations. The summary distinguishes between machine-generated and human-generated classifications and highlights the variation in confidence scores across annotation methods. This section helps assess annotation reliability, identify records below selected confidence thresholds, and evaluate whether additional quality control or review may be needed before ecological reporting.

Table 5. Summary of machine and human annotation confidence scores

Machine Classification Confidence

Statistic	Value
Minimum Confidence Score	0.00
Maximum Confidence Score	1.00
Mean Confidence Score	0.75
Below Threshold (<0.8)	0
Total Annotations	35124

Human Classification Confidence

Statistic	Value
Minimum Confidence Score	0.50
Maximum Confidence Score	1.00
Mean Confidence Score	0.99
Below Threshold (<1)	1456
Total Annotations	75678
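
The statistics in Table 5 can be computed from a list of confidence scores as sketched below. The function name is hypothetical, and the strict comparison (`score < threshold`) is an assumption based on the "<0.8" and "<1" labels in the table.

```python
from statistics import mean


def confidence_summary(scores, threshold):
    """Summarise classification confidence scores in the range 0 to 1.

    Assumption: a score equal to the threshold is not counted as below it.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": round(mean(scores), 2),
        "below_threshold": sum(score < threshold for score in scores),
        "total": len(scores),
    }
```

A useful cross-check on any such summary is that a minimum score lower than the threshold implies at least one annotation below the threshold; if a report shows otherwise, the threshold count or the score column may be worth re-examining.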

Chapter 6: Observation Types by Capture Method

The figures in this chapter summarize observation types by capture method (motion detection and time-lapse) and overall. Each pie chart shows the within-method percentage of each observation type (animal, human, vehicle, blank, unknown, and unclassified), providing a quick view of what was observed and its relative weight in the dataset.

Motion

Figure 1. Distribution of observation types for motion-detection records. Slices show the percentage contribution of each observation type within this capture method.

TimeLapse

Figure 2. Distribution of observation types for time-lapse records. Slices show the percentage contribution of each observation type within this capture method.

Total

Figure 3. Overall distribution of observation types across all capture methods. Slices show the percentage contribution of each observation type in the full dataset.

Conclusion

Based on results across five key sections (Spatial, Temporal, Data Availability, Validation, and Annotation), this dataset is classified as Acceptable: minor issues were found; the dataset remains usable, although corrections are recommended.

Acknowledgment

This report was generated using the camtrapReport R package, developed by Elham Ebrahimi at Wageningen University & Research and Utrecht University, the Netherlands. The development of camtrapReport was supported by Biodiversa+ through the Big Picture project. We also gratefully acknowledge the European Observatory of Wildlife network for its contribution to package testing. Users are kindly requested to cite the package when using camtrapReport or publishing results derived from it.