dleucas / wmmsdb (public) (License: GPLv3) (since 2018-07-08) (hash sha1)
A collection of scripts to download, transform and normalize the Watkins Marine Mammal Sound Database.

Credit:

“Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution.”

http://cis.whoi.edu/science/B/whalesounds/index.cfm
List of commits:
Subject Hash Author Date (UTC)
WIP document acoustat ccc4a6de663a7272ee3d5777fe1479af549e9938 dleucas 2019-06-06 01:39:08
WIP document acoustat 1c6b03267e3016d9b637775df2b4b153866ac040 dleucas 2019-06-05 22:39:39
add dependency on nav.html and pandoc.css b4f054eb6675117d576fa9462220bc5bc8d15be4 dleucas 2019-06-01 01:12:44
nav title 7c5fadd2e143028d614fab4c31ed7389ed17e6f6 dleucas 2019-06-01 01:12:04
document world map 07b70f4b85731456b559edeecea12b339e724aaf dleucas 2019-06-01 01:11:43
use live ElasticSearch URL in example query b6df3b91ba395f62003fa89f1d6ae3f6a705ea9e dleucas 2019-05-31 23:38:40
Document geo coordinates based on the .GC field d3c9fc90773c6252209074583a00b290633f340c dleucas 2019-05-31 23:24:02
transform geo coordinates b32a76ac0917bc3c8bb85e225b19004bc56ac929 dleucas 2019-05-31 22:26:16
update index mapping with geo_point for location coordinates 510a37c03a86603425c6af347c038d69d7fa0cde dleucas 2019-05-31 22:24:42
WIP search by geo distance 61e13ff42bf7e9d283c7d18087c14bb30c2e69c8 dleucas 2019-05-31 22:16:46
about page: show task progress 9372732e21c753f99d706a474cb3a74394ced1dc dleucas 2019-05-24 01:04:07
data page: describe mapping, clean-up table a78264311f89a3b67d40fa8ba3d768c3ec9b2592 dleucas 2019-05-24 00:33:21
add site navigation, remove clutter 9e0361707284cc01d6795195df6848c1f20e88cc dleucas 2019-05-23 01:21:22
WIP initial changelog 12150e01cbb190f83d9e5f02f1431a4856c780d2 dleucas 2019-05-23 01:20:30
add source code links to about page 2fbd58dd9ee99c0568c0c8005bb7becafa8c2163 dleucas 2019-05-23 01:19:28
add download links 7cc06b24732d4167bf8d357c68098f19a66fd01d dleucas 2019-05-23 00:46:36
WIP notes on database fields 2cafc0525643f5dc0ed0a39bb1f914df4fae4f94 dleucas 2019-05-23 00:24:35
WIP notes on database fields 30cc261f4b22da6f6304b33bb57e65b2d145e931 dleucas 2019-05-22 23:43:27
more space for tables 332cdefc80e7d01046dbfe4198c16af92e85f1c4 dleucas 2019-05-22 23:42:50
spelling 48bfa46f40db11b1d120a113502d85916d166db9 dleucas 2019-05-22 23:42:22
Commit ccc4a6de663a7272ee3d5777fe1479af549e9938 - WIP document acoustat
Author: dleucas
Author date (UTC): 2019-06-06 01:39
Committer name: dleucas
Committer date (UTC): 2019-06-06 01:39
Parent(s): 1c6b03267e3016d9b637775df2b4b153866ac040
Signing key:
Tree: 6183ab99373217591913f9ac36e6fca72ed29639
File Lines added Lines deleted
webroot/changelog.md 85 8
File webroot/changelog.md changed (mode: 100644) (index b39b97a..e39f209)
  1 1 % Changelog
+ 2 %
+ 3 % Last Update: June 6, 2019
  2 4
  3 5 ## Protocol of the projects development history
  4 6
 
... ... To create the GeoJSON file run the following command in the [source code][src] t
  93 95
  94 96 Example Queries (TODO)
  95 97
- 96 ### 24. July 2018: Analyzing sound clips with Acoustat
+ 98 ### 24. July 2018: Analyzing Watkins sound clips for acoustic features
  97 99
  98 100 Enriching Watkins sound database, is one opportunity to be explored with this project.
- 99 Each sound clip is well described by the database, but there is nothing providing insight into the actual signal characteristics.
+ 101 The contents of each sound clip are well described by the database, but there is nothing providing insight into the actual signal characteristics.
  100 102 Even simple properties like clip duration are not available.
  101 103
  102 104 Automatically analyzing ~15.000 sound clips, might not have been an option with
- 103 affordable PC hardware resources in the 1990s, but any current day machine can handle this task in reasonable time.
+ 105 affordable PC hardware in the 1990s, but any current day machine can handle this task in reasonable time.
  104 106
  105 107 #### Method
  106 108
 
... ... on a software tool "Characterizing acoustic features of marine animal sounds".
  113 115
  114 116 This seems like a perfect fit, to gain various statistical properties from the signals time and frequency domain.
  115 117
+ 118 An implementation, of some (but not all) of the acoustic features functions, exists in the popular [seewave][seewave.acoustat] library,
+ 119 written in the [statistical computing language R][R] by Jerome Sueur.
+ 120
+ 121 The [manual][seewave.acoustat] states:
+ 122
+ 123 > acoustat was originally developed in Matlab language by Fristrup and Watkins (1992). The R function was kindly checked by Kurt Fristrup.
+ 124
+ 125 Other methods are to be explored in the future.
+ 126
  116 127 #### Implementation
  117 128
  118 129 Downloading all sound clips is left as exercise for the reader. Please be reasonable and don't overload the WHOI server.
 
... ... Downloading all sound clips is left as exercise for the reader. Please be reason
  120 131 The remaining job is fairly simple: load the signal, run the statistics and store the result as JSON files, for further indexing in ElasticSearch.
  121 132
  122 133 Running the analysis *effectively* requires a task management tool. It keeps track of the progress, can resume a aborted run and allows parallel execution of tasks.
- 123 A bash script can run the task in parallel but GNU Make provides a clear _state_; how far the processing of all sound clips has progressed.
+ 134 A bash script can run the task in parallel but [GNU Make][make] provides a clear _state_; how far the processing of all sound clips has progressed.
  124 135 It does that by keeping track of input and output files. If an output JSON file does not exists, the job is not done.
  125 136
  126 137 This simplified `Makefile` defines `*.wav` files as INPUTS and `*.acoustat.json` as `ACOUSTAT` outputs using `acoustat.json.r` as job processor.
 
... ... library("methods")
  158 169 argv = commandArgs(trailingOnly = TRUE)
  159 170 wav = tuneR::readWave(argv[1])
  160 171 stat = seewave::acoustat(wave=wav, plot = FALSE)
- 161 # remove unwated contour data
+ 172 # remove unwanted contour data
  162 173 stat$freq.contour <- NULL
  163 174 stat$time.contour <- NULL
  164 175 # assign record number as id
  165 176 stat$id <- argv[3]
  166 177 write_json(stat, argv[2])
  167 178 ```
+ 179 The first line allows execution as a shell script and ensures a clean R environment. To avoid cluttered output during execution, various library messages are silenced.
  168 180
- 169 The output JSON file looks like this:
+ 181 #### Execution
+ 182
+ 183 All sound files, `Makefile` and `acoustat.json.r` script are placed in the same directory and the following command runs the analysis with 8 parallel processes.
+ 184 The number should equal the number of available CPU cores.
+ 185
+ 186 ```bash
+ 187 make -j 8
+ 188 ```
+ 189 A single output JSON file looks like this:
  170 190
- 171 ```JSON
+ 191 ```json
  172 192 {
  173 193 "time.P1": [
  174 194 0.1157
 
... ... The output JSON file looks like this:
  200 220 }
  201 221 ```
  202 222
- 203 #### Results and Usage
+ 223 The meaning of each value is documented in the [acoustat manual][seewave.acoustat].
+ 224
+ 225 #### Indexing
+ 226
+ 227 As a last step the raw values are mapped under `.sound.freq` and `.sound.time` of the existing JSON document tree.
+ 228
+ 229 A `acoustat.jq` script transforms the JSON data for the ElasticSearch [bulk import API][elastic.bulk].
+ 230
+ 231 ```jq
+ 232 {
+ 233 update: {
+ 234 _index: "wmmsdb",
+ 235 _type: "record",
+ 236 _id: .id[0]
+ 237 }
+ 238 },
+ 239 {
+ 240 doc: {
+ 241 sound: {
+ 242 freq: {
+ 243 IPR: .["freq.IPR"][0],
+ 244 M: .["freq.M"][0],
+ 245 P1: .["freq.P1"][0],
+ 246 P2: .["freq.P2"][0]
+ 247 },
+ 248 time: {
+ 249 IPR: .["time.IPR"][0],
+ 250 M: .["time.M"][0],
+ 251 P1: .["time.P1"][0],
+ 252 P2: .["time.P2"][0]
+ 253 }
+ 254 }
+ 255 }
+ 256 }
+ 257 ```
+ 258
+ 259 Finally the data is added to ElasticSearch using the following command.
+ 260
+ 261 ```bash
+ 262 jq --raw-output --compact-output -f acoustat.jq *.acoustat.json | curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@-" | jq .took
+ 263 ```
+ 264
+ 265 #### Results and Discussion
+ 266
+ 267 With small changes on the existing Web UI the acoustic features are now available as search filters, but how can they be used during research?
+ 268
+ 269 The [1992 Fristrup and Watkins report][1912/3055] outlines the design of the features.
+ 270
+ 271 > Each statistic was designed to emphasize particular parameters of animal sounds that we recognized as important for distinguishing species.
+ 272
+ 273 The paper further explains a correlation test with a subset of 200 sounds clips, to see if species could be distinguished using the statistical features.
  204 274
+ 275 It further notes:
  205 276
+ 277 > The short-term bandwidth statistics in Table 5, the aggregate bandwidth statistics in
+ 278 > Table 6, and the center frequency statistics of Table 7 were the most diagnostic for this set
+ 279 > of sound sequences. They apparently separated the sounds of different species.
  206 280
  207 281
  208 282 #### References
 
... ... The output JSON file looks like this:
  211 285
  212 286 [1912/3055]: https://hdl.handle.net/1912/3055
  213 287 [seewave.acoustat]: http://rug.mnhn.fr/seewave/HTML/MAN/acoustat.html
+ 288 [R]: https://www.r-project.org/
+ 289 [make]: https://www.gnu.org/software/make/
+ 290 [elastic.bulk]: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/docs-bulk.html
  214 291
  215 292 ### 20. July 2018: First release
  216 293
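Note on the state tracking described in this commit: the changelog relies on GNU Make treating a missing output JSON file as an unfinished job. The same check can be sketched in plain shell, for example to see how much work is left before starting `make`. This is only an illustration and not part of the repository; the function name `pending_clips` is hypothetical, and the pairing of `NAME.wav` with `NAME.acoustat.json` in the same directory is an assumption, since the full `Makefile` is not shown here.

```bash
#!/bin/sh
# List sound clips that still lack an acoustat result.
# Assumption: clip "NAME.wav" produces "NAME.acoustat.json" next to it.
# Note: plain POSIX sh has no local variables, so dir, wav and out are global.
pending_clips() {
    dir=${1:-.}
    for wav in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory holds no .wav files.
        [ -e "$wav" ] || continue
        out="${wav%.wav}.acoustat.json"
        # A missing output file means the job for this clip is not done.
        [ -e "$out" ] || printf '%s\n' "$wav"
    done
}
```

Calling `pending_clips .` in the working directory prints one path per clip that has no result yet; piping the output to `wc -l` gives the number of remaining jobs.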
Hints:
Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/dleucas/wmmsdb

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/dleucas/wmmsdb

Clone this repository using git:
git clone git://git.rocketgit.com/user/dleucas/wmmsdb

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main