Ecoacoustic Monitoring Terminology/Glossary

Common terms and concepts used in ecoacoustic monitoring, with a focus on how they are used by Rainforest Connection (RFCx).

Abundance: The number of individuals in a population (i.e. population size). Abundance is one of the most important and useful state variables for ecologists, conservationists and managers. The abundance of a species can be related to biotic and abiotic variables, providing information on the status and trends of natural populations varying in space and time. Ecologists can use abundance estimates to address a myriad of management and conservation problems, such as predicting the spread of wildlife diseases or invasive species, assessing suitable habitat for endangered animals, investigating the response of natural populations to different landscape management practices, monitoring population trends of game animals, and forecasting the effects of climate change on population size and distribution.

Amplitude: The magnitude of the periodic movement of the molecules in the medium, which determines the pressure and intensity exerted by the sound. It is the distance from the central (resting) position of a sound wave to its highest point (peak) or lowest point (trough). Amplitude is perceived as the volume of a sound and is usually expressed in decibels (dB).
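
For reference, amplitude is commonly reported as a sound pressure level in decibels relative to a reference pressure of 20 µPa (roughly the threshold of human hearing); a minimal formulation:

```latex
% Sound pressure level (SPL) in decibels, relative to p_0 = 20 micropascals
L_p = 20\,\log_{10}\!\left(\frac{p}{p_0}\right)\ \mathrm{dB}, \qquad p_0 = 20\ \mu\mathrm{Pa}
```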

Wavelength: The distance between two adjacent peaks of a sound wave (the points furthest from the center position of the wave).

Acoustic Indices: Mathematical tools used to measure soundscapes in different ways: acoustically, temporally, and spatially. These indices can be used to interpret ecological parameters of the soundscape, such as species composition and richness.

Above: Example of acoustic complexity index values over time for a soundscape. Higher values indicate time intervals with a greater diversity of sounds. For more info, see: Acoustic metrics predict habitat type and vegetation structure in the Amazon.

ASU (Acoustic Space Use): An acoustic index that measures the amount of the frequency range and time being used in a soundscape. This index is positively related to species richness. Sites with higher ASU values have more species. By analyzing the frequencies most used in a sonogram, it is possible to identify which taxonomic groups are contributing to the ASU (e.g. birds, frogs, insects).

Above: Relationship between ASU and species richness. Note that sites with high ASU values also have more species. Table from Aide et al. 2017: Species Richness (of Insects) Drives the Use of Acoustic Space in the Tropics.
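
As an illustration only, an ASU-like quantity can be approximated as the fraction of time-frequency bins in a spectrogram whose energy exceeds a threshold; the file name, nperseg value, and 3 dB threshold below are assumptions, and this sketch is not the published ASU method:

```python
# Rough ASU-style metric: fraction of "active" time-frequency bins in a spectrogram.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read("recording.wav")          # hypothetical WAV file
if audio.ndim > 1:
    audio = audio[:, 0]                              # use the first channel if stereo

freqs, times, sxx = spectrogram(audio, fs=rate, nperseg=1024)
sxx_db = 10 * np.log10(sxx + 1e-12)                  # power spectrogram in dB

threshold = sxx_db.mean() + 3                        # example threshold: 3 dB above the mean
active = sxx_db > threshold                          # "used" time-frequency bins

asu = active.mean()                                  # proportion of acoustic space used
per_freq = active.mean(axis=1)                       # activity per frequency bin (hints at which taxa are calling)
print(f"Acoustic space used: {asu:.2%}")
```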

Anthropophony: All sounds produced by humans in a soundscape.

Audible sound: The sounds a human being can hear. The human ear can hear sounds in a frequency range between 20 Hz and 20 kHz.

Audio Event Detection: An analysis used to automatically detect sound events that satisfy user-defined parameters such as minimum amplitude, event duration and bandwidth. A region of interest (ROI) is generated around each detected event. These ROIs can then be inspected in the Arbimon Visualizer tool or analyzed with the Arbimon Clustering tool.
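
As an illustration only (not the Arbimon algorithm), a crude event detector can threshold the amplitude envelope of a recording and keep segments longer than a minimum duration; the file name, 0.2 amplitude threshold and 100 ms minimum duration below are made-up values:

```python
# Crude amplitude-threshold event detector (illustration only).
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert

rate, audio = wavfile.read("recording.wav")      # hypothetical file
if audio.ndim > 1:
    audio = audio[:, 0]                          # use the first channel if stereo
audio = audio.astype(float) / np.max(np.abs(audio))

envelope = np.abs(hilbert(audio))                # amplitude envelope
above = envelope > 0.2                           # example amplitude threshold
min_len = int(0.1 * rate)                        # example minimum event duration: 100 ms

# Find contiguous runs of samples above the threshold
padded = np.concatenate(([False], above, [False]))
edges = np.flatnonzero(np.diff(padded.astype(int)))
starts, ends = edges[::2], edges[1::2]
events = [(s / rate, e / rate) for s, e in zip(starts, ends) if e - s >= min_len]
print(events)                                    # list of (start_s, end_s) event boundaries
```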

Bioacoustics: The study of the sounds produced by animals. Bioacoustics focuses on, but is not limited to, acoustic signal evolution, sound-producing mechanisms in animals, animal physiology and anatomy, vocalization phenology, and animal communication and behavior. It is a subject primarily focused on individual species.

Biodiversity acoustic monitoring: The surveying and monitoring of wildlife and environments using autonomous acoustic recorders (AARs). AARs are deployed in the field and record acoustic data on a specified schedule. These recordings are processed to extract valuable ecological data, such as detections of the calls of animal species of interest. Acoustic monitoring can be used to study a wide variety of taxa as long as they emit detectable sounds; to date it has been applied to populations of birds, bats, marine mammals, amphibians, insects, terrestrial mammals (e.g. elephants), and many fishes, and can be a useful tool in ecology and conservation.

Biophony: All non-human sounds produced by animals in a soundscape. These can be vocal sounds, such as the call of a frog, or non-vocal sounds, such as a woodpecker tapping its beak on a tree trunk.

Classification: In an ecoacoustics context, the procedure of assigning sound signals to classes, such as the presence of a particular species. Several different algorithms are currently used to identify species automatically.

Clustering: An analysis to identify groups of similar elements in a dataset. In the context of bioacoustics, it can be used to automatically partition a dataset of audio events into distinct sound types (e.g. sounds with similar frequency and duration). 
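
As a hedged illustration, one simple approach is to describe each audio event (ROI) by a few acoustic features and group the events with a standard clustering algorithm; the feature values below are invented, and Arbimon's own clustering method may differ:

```python
# Group audio events into sound types from simple acoustic features (illustration only).
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per event: [peak frequency (kHz), duration (s), bandwidth (kHz)]
events = np.array([
    [2.1, 0.30, 0.8],
    [2.0, 0.28, 0.7],
    [6.5, 0.05, 2.1],
    [6.3, 0.06, 2.0],
    [1.0, 1.20, 0.3],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(events)
print(kmeans.labels_)   # cluster assignment (sound type) for each event
```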

CNN (Convolutional Neural Network): A type of deep learning model designed for image recognition. They have demonstrated strong performance in the field of sound recognition, where they are typically applied to audio spectrogram images. Provided an adequate labeled dataset of input audio and target output labels, they can be trained to detect many sound classes (for more details see LeBien et al. 2020). 
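
A toy sketch of a CNN that classifies fixed-size spectrogram patches, written with Keras; the input dimensions and number of classes are made up, and this is not the architecture used by RFCx:

```python
# Toy CNN for classifying spectrogram patches (not the RFCx architecture).
from tensorflow.keras import layers, models

NUM_CLASSES = 10                                  # hypothetical number of sound classes

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),            # e.g. 128 frequency bins x 128 time frames
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_spectrograms, train_labels, epochs=10)  # given a labeled training set
```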

Detectability: The probability of encountering a species in a survey. Detectability can be influenced by several factors, such as sampling effort, sampling method, type of environment, and the researcher's experience. Sometimes a species may be present but not detected, which characterizes imperfect detection. Some statistical methods take imperfect detection into account during analyses (see occupancy analysis).

Detections: The number of times a species has been encountered in a project/site/date.

Above: Map showing the number of detections of a bird on different sites (each point corresponds to a site). This is a rare species detected in only two sites in this study area.

Detection range: The relationship between the probability of detection of a species and its distance from the recorder. The detection range may vary between species, environments and recording equipment.

Detection frequency: The number of recordings with a positive detection of a species divided by the total number of recordings.
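
Expressed as a simple ratio (no assumptions beyond the definition above):

```latex
\text{detection frequency} = \frac{\text{number of recordings in which the species was detected}}{\text{total number of recordings}}
```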

Ecoacoustics: A branch of ecology that applies bioacoustic tools to the ecology and conservation of animal species.

AudioMoth Edge Device: An automated recording unit designed to monitor biodiversity. AudioMoth is a small, low-cost device created by researchers at the University of Southampton and the University of Oxford through the Open Acoustic Devices project (https://www.openacousticdevices.info/). See AudioMoth setup

 AudioMoth device

Frequency: The number of cycles per unit of time, measured in hertz (Hz, cycles per second) or kilohertz (kHz, thousands of cycles per second). It is generally perceived by the listener as pitch, with higher and lower frequencies corresponding to higher and lower pitches, respectively.

Gain: An increase in the amplitude of a sound. This increase can improve recording quality, but when it is too high it can lead to clipping, a distortion caused when the amplitude exceeds the capacity of the recording system.

Geophony: All the sounds produced by the geophysical natural components of a soundscape. For example, the sound of a waterfall or foliage on trees swaying in the wind.

RFCx Guardians: Automated recording units developed by RFCx to monitor threats to environments (e.g., deforestation, poaching) as well as biodiversity. A Guardian is an online device that streams audio data to the cloud, providing real-time monitoring directly from the forest canopy. Guardians consist of a custom board, weatherproof box, antenna, microphone, and solar panels to charge the battery. To learn more about the Guardian and its applications, please reach out to contact@rfcx.org

RFCx Guardian Device

Infrasound: Sound waves with frequencies below 20 Hz, lower than the minimum frequency the human ear can hear (produced, for example, by earthquakes, wind turbines, elephants, whales and alligators).

Metadata: Additional information associated with a recording, either generated when the recording was made or added afterwards. Examples of metadata include recorder settings, field notes, and software analysis results.

Naïve occupancy: The proportion of sites at which a target species was detected, without accounting for imperfect detection.
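
In symbols, if a target species is detected at x of s surveyed sites:

```latex
\text{na\"ive occupancy} = \frac{x}{s}
```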

NMDS (Non-Metric Multidimensional Scaling): A type of multivariate analysis that ordinates objects in a low-dimensional space based on their pairwise dissimilarities: more dissimilar objects are placed farther apart, while more similar objects are placed closer together. To verify how well the dissimilarities represented in the NMDS correspond to the real dissimilarities in the data, we check the stress value; the lower the stress value, the better the NMDS solution represents the real dissimilarity. We can use NMDS to explore differences between sites based on the species composition identified in recordings, or to group sites according to the composition of the soundscape's time/frequency bins (for more details see Furumo & Aide 2019).

Above: Example of NMDS ordinations representing dissimilarity in species composition (see also species composition) in four different locations (represented by differently colored squares and circles). Each point in the ordination is a site. In the first ordination, the points are well separated between locations, which means that the species in each location are different. In the second ordination, sites from different locations are closer together, which means some sites have many bird species in common.
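
A minimal sketch of an NMDS on a site-by-species matrix using scikit-learn and a Bray-Curtis dissimilarity; the community matrix below is invented, and ecologists often use dedicated packages (e.g. vegan in R) for this analysis:

```python
# NMDS of sites based on species composition (illustrative sketch).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Hypothetical site-by-species matrix (rows = sites, columns = species detections)
community = np.array([
    [10, 0, 3, 1],
    [8, 1, 2, 0],
    [0, 7, 0, 5],
    [1, 6, 1, 4],
])

dissim = squareform(pdist(community, metric="braycurtis"))   # pairwise dissimilarity matrix

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           n_init=10, random_state=0)
coords = nmds.fit_transform(dissim)      # 2-D coordinates for plotting the ordination
print(coords)
print("stress:", nmds.stress_)           # lower stress = better representation
```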

Noise: Most often describes unwanted sounds or electrical signals that can obscure the target sounds. Frequently, sounds caused by geophony (e.g. wind) and anthropophony (e.g. humans, machines) are considered noise. These are usually low-frequency sounds that can obscure low-frequency calls, such as those of owls and anurans. Noise makes recording animal sounds more difficult.

Above: Example of a low frequency anthropogenic noise.

Nyquist frequency: In digital audio recordings, it is half the sampling rate (see sampling rate). For example, if a recorder is configured to record at a sampling rate of 44.1kHz, the Nyquist frequency will be 22.05kHz.

Occupancy: The proportion of sites, patches, landscape or habitat units occupied by a taxon (MacKenzie et al. 2006). Occupancy uses basic presence-absence data for a species within a spatial unit, and it can be obtained more easily and cost-effectively than abundance estimates (MacKenzie et al. 2006). Although it is considered less informative than abundance, it is a widely used and essential state variable in ecology, wildlife management and conservation biology. Questions that can be addressed with occupancy estimation include species distribution ranges, habitat selection, wildlife disease dynamics, metapopulation dynamics, and resource selection (MacKenzie et al. 2006; Kéry & Royle 2016). A major concern with presence-absence data is that the detection probability of a species is often below 1, biasing occupancy estimates and their relationship with predictor variables. The occupancy modelling approach can be used to account for imperfect detection while estimating the "true" species occupancy (see occupancy analysis below).

Occupancy analysis: A type of statistical analysis that uses species detection and non-detection data to estimate the probability that a site is occupied by a species. Occupancy models estimate the true occupancy probability of a species at a site while accounting for the fact that the species may be present at that site even if it was not detected (i.e. imperfect detection). Occupancy models can incorporate several site characteristics, such as habitat type, temperature, or the data collection period. See an example of how occupancy models can be used to analyze data from acoustic monitoring here

Above: Map representing the probability of occupancy of a frog species on the island of Puerto Rico.
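
For orientation, in the simplest single-season occupancy model (MacKenzie et al. 2006), a site is occupied with probability psi and, if occupied, the species is detected in each of K surveys with probability p; the likelihood contributions of a site are then:

```latex
% Site with detection history y_1, ..., y_K (at least one detection):
P(y_1,\dots,y_K) = \psi \prod_{k=1}^{K} p^{\,y_k}\,(1-p)^{\,1-y_k}

% Site where the species was never detected (occupied but missed, or truly absent):
P(\text{no detections}) = \psi\,(1-p)^{K} + (1-\psi)
```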

Pattern Matching (PM): A semi-automatic sound classification algorithm that uses an example (template) and a threshold established by the user. Results above this threshold are automatically identified by the algorithm as possible detections (matches). Each match found by the algorithm has a score value, which is always above the threshold chosen by the user. The higher the score, the greater the chance that the match corresponds to a true positive detection. To avoid false positives, it is necessary to manually validate the results, ensuring that the final dataset contains only true positives. On the Arbimon platform, it is possible to select different filters to facilitate the validation process. The one most used by the RFCx Science Team is the score filter, which sorts matches by score in descending order (for more details see LeBien et al. 2020). To learn more about how to create Pattern Matching jobs, see this article.

Above: Example of pattern matching on RFCx Arbimon platform. 
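
A minimal sketch of template correlation on spectrograms, using scikit-image's normalized cross-correlation; the file names and the 0.5 score threshold are made up, and Arbimon's Pattern Matching implementation may differ in its details:

```python
# Slide a template spectrogram over a recording's spectrogram and keep times
# where the normalized cross-correlation exceeds a threshold (illustration only).
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from skimage.feature import match_template

def spec(path, nperseg=512):
    rate, audio = wavfile.read(path)            # hypothetical WAV files
    if audio.ndim > 1:
        audio = audio[:, 0]
    f, t, sxx = spectrogram(audio, fs=rate, nperseg=nperseg)
    return f, t, 10 * np.log10(sxx + 1e-12)

_, times, rec_spec = spec("recording.wav")
_, _, tmpl_spec = spec("template.wav")

scores = match_template(rec_spec, tmpl_spec)    # normalized cross-correlation map
peak = scores.max(axis=0)                       # best score at each time offset
matches = times[:len(peak)][peak > 0.5]         # example score threshold of 0.5
print(matches)                                  # start times (s) of candidate matches
```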

Playlist: A set of recordings grouped according to criteria established by the user, for example, all recordings with the presence of a species or all recordings from a location. In the Arbimon platform, playlists are an essential component of all analyses, since each analysis must be run on a selected set of recordings.

Recordings (in the context of Arbimon): The basic unit (sound file) obtained by an automated recording device (Edge devices or Guardians).

RFM (Random Forest Model): A type of machine learning model consisting of multiple decision trees. In Arbimon, the RFM module is designed for single-species detection. Using a template of the target sound type, the model can be trained to detect the target sound based on the correlation between the input audio and the template.

Above: Example of Random Forest Model on RFCx-Arbimon platform. 
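
A hedged sketch of the general idea: compute correlation-based features between candidate audio clips and the template, then train a random forest on validated examples. The features and values below are invented for illustration and are not Arbimon's implementation:

```python
# Random forest trained on template-correlation features (illustrative sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features per candidate clip, e.g. correlation with the template
# in several frequency bands.
X_train = np.array([
    [0.91, 0.85, 0.40],   # validated presence
    [0.88, 0.80, 0.35],   # validated presence
    [0.20, 0.15, 0.10],   # validated absence
    [0.25, 0.22, 0.12],   # validated absence
])
y_train = np.array([1, 1, 0, 0])   # 1 = target sound present, 0 = absent

rfm = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

X_new = np.array([[0.75, 0.70, 0.30]])       # features from a new, unlabeled clip
print(rfm.predict_proba(X_new))              # probability the target sound is present
```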

ROI (Region of Interest): A box delimiting a region of the spectrogram, with information on the start time, end time, lowest frequency and highest frequency. In Arbimon's Pattern Matching analysis, the template chosen by the user is shown as an ROI, as is each resulting match of the model (see Pattern Matching).

Sample rate: The number of audio samples collected per second by the recording device. For example, if a recorder is configured with a sampling rate of 44.1 kHz, it collects 44,100 samples per second. Choosing the best sampling rate depends on the type of sound recorded. For birds and other animals that produce sounds between 20 Hz and 20 kHz, a sampling rate of 48 kHz is indicated. Higher sampling rates can better record higher-pitched sounds, such as those of some bat species.
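
A small illustration of how the sampling rate limits the highest frequency that can be captured (the Nyquist frequency, half the sampling rate); the example call frequencies below are assumptions chosen for illustration:

```python
# A recorder can only capture sounds below its Nyquist frequency (sample rate / 2).
sample_rates_hz = [22_050, 44_100, 48_000, 192_000]
example_calls = {"low owl call": 400, "songbird": 6_000, "bat echolocation": 45_000}

for sr in sample_rates_hz:
    nyquist = sr / 2
    captured = [name for name, f in example_calls.items() if f < nyquist]
    print(f"{sr} Hz sampling -> Nyquist {nyquist:.0f} Hz, can capture: {captured}")
```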

Sampling sites: The locations where data is collected. For example, a location where a recorder is installed is a sampling site.

Species composition: The component of biodiversity that describes which taxa are present in a species assemblage. For example, two areas can have the same species richness (see below), but the species present in each area can be quite different. By considering species identity (and in some cases species abundance), species composition metrics complement species richness and play a central role in ecology and conservation biology. Conservation goals are often focused on rare or endangered species, or on identifying assemblages that include exclusive taxa not occurring in other assemblages. Species composition is also very useful for assessing changes in communities through space and time, for example, to evaluate how environmental and anthropogenic stressors affect the species occurring in assemblages.

Above: Two sites with the same richness of bird species, but different species composition.

Species richness: The number of species of a given taxon in a sample, location or time period. Species richness is the simplest, most common and most intuitive metric for summarizing biodiversity (Magurran 2004). Although a simple count of species has several limitations, because it does not consider species composition (see species composition above) or other important characteristics of biological communities such as evolutionary history and functional diversity, it is widely used for ecological and conservation questions. For example, species richness has long been used to identify global hotspots of biodiversity, establish priority areas for conservation and assess how biodiversity varies along environmental gradients. Species richness estimation faces a number of issues, of which sampling effort is the most problematic; therefore, many species richness indices have been developed to represent the number of species while dealing with issues introduced by sample size, different sampling efforts, and species abundance distributions (e.g. species accumulation curves, nonparametric estimators; Magurran 2004).

Survey design (or Sampling design): A detailed description of the entire methodology used for sampling. For example, it can include information about how sampling sites were chosen, sampling effort, number of recorders, type of habitats, or different treatments in which each recorder was installed. The questions and goals of a project should drive the survey design.

Soundscape: All the sounds emanating from a specific location and time period. A soundscape is composed of three fundamental sources: biophony, geophony, and anthropophony, which can vary across space and time. It can be represented by a graph showing the amount of acoustic activity at each frequency within a time span. To learn more about soundscapes in Arbimon, see this article.

Above: Example of a soundscape. Lighter colors represent areas with the greatest vocal activity within a range of frequency and time.

Soundscape composition classes: Soundscape classes can be divided primarily into natural (biophony and geophony) and human-made (anthropophony) classes.

Species List: List of all species identified in a sample, location, project, time or other context specified by the researcher.

Species List (context within Arbimon): The total set of species identified within a project in Arbimon.

Spectrogram: A graphic visualization of a sound, with frequency represented on the y-axis and time on the x-axis. The colors in a spectrogram represent the intensity (i.e. amplitude) of the sound.

Above: Example of a spectrogram of the Aramus guarauna call.
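
A minimal sketch of plotting a spectrogram in Python with matplotlib; the file name is hypothetical:

```python
# Plot a spectrogram: frequency on the y-axis, time on the x-axis,
# and color indicating intensity.
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, audio = wavfile.read("recording.wav")   # hypothetical file
if audio.ndim > 1:
    audio = audio[:, 0]                       # use the first channel if stereo

plt.specgram(audio, NFFT=1024, Fs=rate, noverlap=512, cmap="viridis")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.colorbar(label="Intensity (dB)")
plt.show()
```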

Supervised machine learning: A category of machine learning in which the learning algorithm is provided with data previously labeled with the correct identification. During training, the algorithm finds patterns that correlate the data with the correct labels. After the training phase, the algorithm can be used to label new datasets based on the information provided during training.

Tags: In the Arbimon context, terms added to recordings to facilitate search and organization. For example, in a recording where there is doubt about the identification of a species, the user can include the tag "doubt" and later find all recordings with that same tag. It is also possible to add several tags to the same recording.

Above: In Arbimon, it is possible to select any area of the spectrogram and include tags.

Templates: An example sound used to create a Pattern Matching or Random Forest Model (RFM) analysis in Arbimon. There are some important considerations for choosing a good template. First, whenever possible, the template should have a good signal-to-noise ratio; that is, the sound signal of interest should have a greater amplitude than the background noise. Another important point is to choose a sound signal of the target species that does not overlap with the sound signals of other species. Finally, we recommend leaving some margin around the sound of interest.

Above: Examples of templates. (A) is a good template, with a good signal-to-noise ratio and no overlap with other sounds. (B) is a poor template, with a low signal-to-noise ratio and overlap with other sounds.

Training sets: In the context of Arbimon, training sets are used for creating Random Forest Models (RFM) and Convolutional Neural Networks (CNN). For an RFM, each training set consists of 1) one or more ROIs containing the target sound (which are used for building the RFM template) and 2) a set of recordings in which the presence of the target sound type has been validated. For a CNN, training sets consist of sets of ROIs from Pattern Matching that were validated as present or absent.

Vocal activity: The number of sounds produced by a species or a group of animals in a period of time.

Above: Vocal activity of bird species in Puerto Rico.

Ultrasound: Sound waves with frequencies above 20 kHz, higher than the maximum frequency that the human ear can hear (examples of animals that hear ultrasound are dogs, dolphins, bats, and many groups of insects).

Rainforest Connection (RFCx) Platforms and Apps:

RFCx Arbimon: A cloud-based platform to store, manage, and analyze ecoacoustic recordings.

Arbimon main page

Companion App: A smartphone app used for easy deployment and management of AudioMoth or Guardian devices in the field.

RFCx Companion App

Guardian App: A smartphone app which allows users to receive and review alerts from RFCx Guardians when undesired activity such as illegal logging, vehicle movement, or poaching activity is detected. Users can also submit reports concerning the detected activity.

RFCx Guardian App

Uploader App: A desktop tool for easily uploading recordings to Arbimon.

Uploader App