RocketGit

dleucas / wmmsdb (public) (License: GPLv3) (since 2018-07-08) (hash sha1)

A collection of scripts to download, transform and normalize the Watkins Marine Mammal Sound Database.

Credit:

“Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution.”

http://cis.whoi.edu/science/B/whalesounds/index.cfm

Clone URLs: https://rocketgit.com/user/dleucas/wmmsdb ssh://rocketgit@ssh.rocketgit.com/user/dleucas/wmmsdb git://git.rocketgit.com/user/dleucas/wmmsdb

master species_names

List of commits:

Subject	Hash	Author	Date (UTC)
WIP document acoustat	ccc4a6de663a7272ee3d5777fe1479af549e9938	dleucas	2019-06-06 01:39:08
WIP document acoustat	1c6b03267e3016d9b637775df2b4b153866ac040	dleucas	2019-06-05 22:39:39
add dependency on nav.html and pandoc.css	b4f054eb6675117d576fa9462220bc5bc8d15be4	dleucas	2019-06-01 01:12:44
nav title	7c5fadd2e143028d614fab4c31ed7389ed17e6f6	dleucas	2019-06-01 01:12:04
document world map	07b70f4b85731456b559edeecea12b339e724aaf	dleucas	2019-06-01 01:11:43
use live ElasticSearch URL in example query	b6df3b91ba395f62003fa89f1d6ae3f6a705ea9e	dleucas	2019-05-31 23:38:40
Document geo coordinates based on the .GC field	d3c9fc90773c6252209074583a00b290633f340c	dleucas	2019-05-31 23:24:02
transform geo coordinates	b32a76ac0917bc3c8bb85e225b19004bc56ac929	dleucas	2019-05-31 22:26:16
update index mapping with geo_point for location coordinates	510a37c03a86603425c6af347c038d69d7fa0cde	dleucas	2019-05-31 22:24:42
WIP search by geo distance	61e13ff42bf7e9d283c7d18087c14bb30c2e69c8	dleucas	2019-05-31 22:16:46
about page: show task progress	9372732e21c753f99d706a474cb3a74394ced1dc	dleucas	2019-05-24 01:04:07
data page: describe mapping, clean-up table	a78264311f89a3b67d40fa8ba3d768c3ec9b2592	dleucas	2019-05-24 00:33:21
add site navigation, remove clutter	9e0361707284cc01d6795195df6848c1f20e88cc	dleucas	2019-05-23 01:21:22
WIP initial changelog	12150e01cbb190f83d9e5f02f1431a4856c780d2	dleucas	2019-05-23 01:20:30
add source code links to about page	2fbd58dd9ee99c0568c0c8005bb7becafa8c2163	dleucas	2019-05-23 01:19:28
add download links	7cc06b24732d4167bf8d357c68098f19a66fd01d	dleucas	2019-05-23 00:46:36
WIP notes on database fields	2cafc0525643f5dc0ed0a39bb1f914df4fae4f94	dleucas	2019-05-23 00:24:35
WIP notes on database fields	30cc261f4b22da6f6304b33bb57e65b2d145e931	dleucas	2019-05-22 23:43:27
more space for tables	332cdefc80e7d01046dbfe4198c16af92e85f1c4	dleucas	2019-05-22 23:42:50
spelling	48bfa46f40db11b1d120a113502d85916d166db9	dleucas	2019-05-22 23:42:22

Commit ccc4a6de663a7272ee3d5777fe1479af549e9938 - WIP document acoustat
Author: dleucas
Author date (UTC): 2019-06-06 01:39
Committer name: dleucas
Committer date (UTC): 2019-06-06 01:39
Parent(s): 1c6b03267e3016d9b637775df2b4b153866ac040
Signing key:
Tree: 6183ab99373217591913f9ac36e6fca72ed29639

File	Lines added	Lines deleted
webroot/changelog.md	85	8

File webroot/changelog.md changed (mode: 100644) (index b39b97a..e39f209)
1	1	% Changelog	% Changelog
	2		%
	3		% Last Update: June 6, 2019
2	4
3	5	## Protocol of the project's development history	## Protocol of the project's development history
4	6

...	...	To create the GeoJSON file run the following command in the [source code][src] t
93	95
94	96	Example Queries (TODO)	Example Queries (TODO)
95	97
96		### 24. July 2018: Analyzing sound clips with Acoustat
	98		### 24. July 2018: Analyzing Watkins sound clips for acoustic features
97	99
98	100	Enriching the Watkins sound database is one opportunity to be explored with this project.	Enriching the Watkins sound database is one opportunity to be explored with this project.
99		Each sound clip is well described by the database, but there is nothing providing insight into the actual signal characteristics.
	101		The contents of each sound clip are well described by the database, but there is nothing providing insight into the actual signal characteristics.
100	102	Even simple properties like clip duration are not available.	Even simple properties like clip duration are not available.
101	103
102	104	Automatically analyzing ~15,000 sound clips might not have been an option with	Automatically analyzing ~15,000 sound clips might not have been an option with
103		affordable PC hardware resources in the 1990s, but any current day machine can handle this task in reasonable time.
	105		affordable PC hardware in the 1990s, but any present-day machine can handle this task in reasonable time.
104	106
105	107	#### Method	#### Method
106	108

...	...	on a software tool "Characterizing acoustic features of marine animal sounds".
113	115
114	116	This seems like a perfect fit for deriving various statistical properties from the signal's time and frequency domains.	This seems like a perfect fit for deriving various statistical properties from the signal's time and frequency domains.
115	117
	118		An implementation of some (but not all) of the acoustic feature functions exists in the popular [seewave][seewave.acoustat] library,
	119		written in the [statistical computing language R][R] by Jerome Sueur.
	120
	121		The [manual][seewave.acoustat] states:
	122
	123		> acoustat was originally developed in Matlab language by Fristrup and Watkins (1992). The R function was kindly checked by Kurt Fristrup.
	124
	125		Other methods are to be explored in the future.
	126
116	127	#### Implementation	#### Implementation
117	128
118	129	Downloading all sound clips is left as an exercise for the reader. Please be reasonable and don't overload the WHOI server.	Downloading all sound clips is left as an exercise for the reader. Please be reasonable and don't overload the WHOI server.

...	...	Downloading all sound clips is left as exercise for the reader. Please be reason
120	131	The remaining job is fairly simple: load the signal, run the statistics and store the results as JSON files for later indexing in ElasticSearch.	The remaining job is fairly simple: load the signal, run the statistics and store the results as JSON files for later indexing in ElasticSearch.
121	132
122	133	Running the analysis effectively requires a task management tool. It keeps track of progress, can resume an aborted run and allows parallel execution of tasks.	Running the analysis effectively requires a task management tool. It keeps track of progress, can resume an aborted run and allows parallel execution of tasks.
123		A bash script can run the task in parallel but GNU Make provides a clear _state_; how far the processing of all sound clips has progressed.
	134		A bash script can run the tasks in parallel, but [GNU Make][make] provides a clear _state_: how far the processing of all sound clips has progressed.
124	135	It does that by keeping track of input and output files. If an output JSON file does not exist, the job is not done.	It does that by keeping track of input and output files. If an output JSON file does not exist, the job is not done.
125	136
126	137	This simplified `Makefile` defines `.wav` files as `INPUTS` and `.acoustat.json` files as `ACOUSTAT` outputs, using `acoustat.json.r` as the job processor.	This simplified `Makefile` defines `.wav` files as `INPUTS` and `.acoustat.json` files as `ACOUSTAT` outputs, using `acoustat.json.r` as the job processor.

...	...	library("methods")
158	169	argv = commandArgs(trailingOnly = TRUE)	argv = commandArgs(trailingOnly = TRUE)
159	170	wav = tuneR::readWave(argv[1])	wav = tuneR::readWave(argv[1])
160	171	stat = seewave::acoustat(wave=wav, plot = FALSE)	stat = seewave::acoustat(wave=wav, plot = FALSE)
161		# remove unwated contour data
	172		# remove unwanted contour data
162	173	stat$freq.contour <- NULL	stat$freq.contour <- NULL
163	174	stat$time.contour <- NULL	stat$time.contour <- NULL
164	175	# assign record number as id	# assign record number as id
165	176	stat$id <- argv[3]	stat$id <- argv[3]
166	177	write_json(stat, argv[2])	write_json(stat, argv[2])
167	178	```	```
	179		The first line allows execution as a shell script and ensures a clean R environment. To avoid cluttered output during execution, various library messages are silenced.
168	180
169		The output JSON file looks like this:
	181		#### Execution
	182
	183		All sound files, the `Makefile` and the `acoustat.json.r` script are placed in the same directory, and the following command runs the analysis with 8 parallel processes.
	184		The number of parallel jobs should equal the number of available CPU cores.
	185
	186		```bash
	187		make -j 8
	188		```
	189		A single output JSON file looks like this:
170	190
171		```JSON
	191		```json
172	192	{	{
173	193	"time.P1": [	"time.P1": [
174	194	0.1157	0.1157

...	...	The output JSON file looks like this:
200	220	}	}
201	221	```	```
202	222
203		#### Results and Usage
	223		The meaning of each value is documented in the [acoustat manual][seewave.acoustat].
	224
	225		#### Indexing
	226
	227		As a last step, the raw values are mapped under `.sound.freq` and `.sound.time` of the existing JSON document tree.
	228
	229		An `acoustat.jq` script transforms the JSON data into the newline-delimited format expected by the ElasticSearch [bulk import API][elastic.bulk].
	230
	231		```jq
	232		{
	233		update: {
	234		_index: "wmmsdb",
	235		_type: "record",
	236		_id: .id[0]
	237		}
	238		},
	239		{
	240		doc: {
	241		sound: {
	242		freq: {
	243		IPR: .["freq.IPR"][0],
	244		M: .["freq.M"][0],
	245		P1: .["freq.P1"][0],
	246		P2: .["freq.P2"][0]
	247		},
	248		time: {
	249		IPR: .["time.IPR"][0],
	250		M: .["time.M"][0],
	251		P1: .["time.P1"][0],
	252		P2: .["time.P2"][0]
	253		}
	254		}
	255		}
	256		}
	257		```
	258
	259		Finally, the data is added to ElasticSearch using the following command:
	260
	261		```bash
	262		jq --raw-output --compact-output -f acoustat.jq *.acoustat.json | curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@-" | jq .took
	263		```
	264
	265		#### Results and Discussion
	266
	267		With small changes to the existing web UI, the acoustic features are now available as search filters, but how can they be used in research?
	268
	269		The [1992 Fristrup and Watkins report][1912/3055] outlines the design of the features.
	270
	271		> Each statistic was designed to emphasize particular parameters of animal sounds that we recognized as important for distinguishing species.
	272
	273		The paper also describes a correlation test on a subset of 200 sound clips, checking whether species could be distinguished using the statistical features.
204	274
	275		It further notes:
205	276
	277		> The short-term bandwidth statistics in Table 5, the aggregate bandwidth statistics in
	278		> Table 6, and the center frequency statistics of Table 7 were the most diagnostic for this set
	279		> of sound sequences. They apparently separated the sounds of different species.
206	280
207	281
208	282	#### References	#### References

...	...	The output JSON file looks like this:
211	285
212	286	[1912/3055]: https://hdl.handle.net/1912/3055	[1912/3055]: https://hdl.handle.net/1912/3055
213	287	[seewave.acoustat]: http://rug.mnhn.fr/seewave/HTML/MAN/acoustat.html	[seewave.acoustat]: http://rug.mnhn.fr/seewave/HTML/MAN/acoustat.html
	288		[R]: https://www.r-project.org/
	289		[make]: https://www.gnu.org/software/make/
	290		[elastic.bulk]: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/docs-bulk.html
214	291
215	292	### 20. July 2018: First release	### 20. July 2018: First release
216	293
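The changelog notes that GNU Make derives its state from which output files already exist. The same rule can be used to report progress while `make -j` is running. The following is a minimal bash sketch and is not part of the wmmsdb sources. The function name `acoustat_progress` is hypothetical, and it assumes that each input `NAME.wav` produces `NAME.acoustat.json` in the same directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the wmmsdb sources.
# Assumption: each input NAME.wav yields the output NAME.acoustat.json in the
# same directory, i.e. the same file-existence rule the Makefile relies on.

# acoustat_progress [DIR]: print "FINISHED/TOTAL" for the .wav files in DIR
acoustat_progress() {
    local dir="${1:-.}"
    local total=0 finished=0 wav restore
    # remember the caller's nullglob setting, then let "*.wav" expand to
    # nothing (instead of the literal pattern) when there are no matches
    restore=$(shopt -p nullglob) || true
    shopt -s nullglob
    for wav in "$dir"/*.wav; do
        total=$((total + 1))
        # an existing output file means the job for this clip is done
        if [[ -e "${wav%.wav}.acoustat.json" ]]; then
            finished=$((finished + 1))
        fi
    done
    eval "$restore"
    printf '%d/%d\n' "$finished" "$total"
}
```

Running `acoustat_progress .` in the clip directory prints the finished and total counts. GNU Make's dry-run mode, `make -n`, gives a complementary view by listing the commands that would still be executed.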
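The changelog advises matching the `make -j` job count to the number of available CPU cores, so the count can be detected instead of hard-coded to 8. This is a small bash sketch with a hypothetical helper named `cpu_cores`. It prefers GNU coreutils `nproc`, which reports the processing units available to the current process. If `nproc` is missing, it falls back to `getconf _NPROCESSORS_ONLN`, which is widely available on Linux and macOS but is not mandated by POSIX.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the wmmsdb sources.
# cpu_cores: print the number of usable CPU cores, or 1 if detection fails
cpu_cores() {
    local n=""
    if command -v nproc >/dev/null 2>&1; then
        # GNU coreutils: processing units available to this process
        n=$(nproc) || n=""
    else
        # common on Linux and macOS, though not strictly POSIX
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
    fi
    # fall back to a single job if detection failed or returned garbage
    if [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
        printf '%s\n' "$n"
    else
        printf '1\n'
    fi
}

# usage: run the analysis with one make job per core
# make -j "$(cpu_cores)"
```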
