dleucas / wmmsdb (public) (License: GPLv3) (since 2018-07-08) (hash sha1)
A collection of scripts to download, transform and normalize the Watkins Marine Mammal Sound Database.

Credit:

“Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution.”

http://cis.whoi.edu/science/B/whalesounds/index.cfm
List of commits:
Subject | Hash | Author | Date (UTC)
WIP convert filters to functions | 32badc3512dd9094d51ba2cc2ef8112eba2698bf | dleucas | 2021-12-16 18:33:21
convert html only once. extract species names as json. formating and lint. | e755dc7f4fe2d7c9b97826a0f3f2cf5385e90ef9 | dleucas | 2021-12-16 13:35:31
download once. use wget only. get species names. test for commands. formating | 572dbf1eaffe17c43a4a01dc9675737628c5a234 | dleucas | 2021-12-16 12:14:26
add filter by behavior type, sort by modified date | c3f9f9f9d9501e714117af7fff573e7f3fa4052b | dleucas | 2019-06-14 03:51:19
rename type to type_of | 4269dc257530a9a7fa21ff8708f4594a2f1a453d | dleucas | 2019-06-14 03:39:49
ElasticSearch setting for larger HTTP request | e83e501f949473538096f984220934c0a51de0b4 | dleucas | 2019-06-14 03:25:22
rename type to type_of | e1fcd27b05eabc8bce06751a9925200e4707168b | dleucas | 2019-06-14 02:34:15
add animal behavior transformation and documentation | 7550db3bbd1c69c9369cf8dfe3a5d1195e761ae2 | dleucas | 2019-06-14 00:57:16
add lost modified date | c4922a44cebebd63da6c23a2a71f97cdb47b4a68 | dleucas | 2019-06-12 22:46:35
describe remaining db fields | e3c7f44ad24a3d7c8e4eb74c777a4eecc3675d75 | dleucas | 2019-06-11 21:37:06
WIP document acoustat | 3a47cfcfa204503f682879d7485a6ef941e248e4 | dleucas | 2019-06-07 00:17:30
WIP document acoustat | 4591875fd32c1c91d20133ff90dcf5676b3c216c | dleucas | 2019-06-07 00:08:18
WIP document acoustat | ccc4a6de663a7272ee3d5777fe1479af549e9938 | dleucas | 2019-06-06 01:39:08
WIP document acoustat | 1c6b03267e3016d9b637775df2b4b153866ac040 | dleucas | 2019-06-05 22:39:39
add dependency on nav.html and pandoc.css | b4f054eb6675117d576fa9462220bc5bc8d15be4 | dleucas | 2019-06-01 01:12:44
nav title | 7c5fadd2e143028d614fab4c31ed7389ed17e6f6 | dleucas | 2019-06-01 01:12:04
document world map | 07b70f4b85731456b559edeecea12b339e724aaf | dleucas | 2019-06-01 01:11:43
use live ElasticSearch URL in example query | b6df3b91ba395f62003fa89f1d6ae3f6a705ea9e | dleucas | 2019-05-31 23:38:40
Document geo coordinates based on the .GC field | d3c9fc90773c6252209074583a00b290633f340c | dleucas | 2019-05-31 23:24:02
transform geo coordinates | b32a76ac0917bc3c8bb85e225b19004bc56ac929 | dleucas | 2019-05-31 22:26:16

Commit 32badc3512dd9094d51ba2cc2ef8112eba2698bf - WIP convert filters to functions
Author: dleucas
Author date (UTC): 2021-12-16 18:33
Committer name: dleucas
Committer date (UTC): 2021-12-16 18:33
Parent(s): e755dc7f4fe2d7c9b97826a0f3f2cf5385e90ef9
Signing key:
Tree: 48b4fd03c27e16ff57d5ab82c8e15f4616703415
File | Lines added | Lines deleted
transform.jq | 33 | 24
File transform.jq changed (mode: 100755) (index 0016ceb..16bc0b2)
@@ -4,6 +4,9 @@
 # Source data combines multiple values into one field, so split that up
 # also use native data types if possible.
 
+import "./data/species.sci.names" as $species_sci_names;
+import "./data/species.common.names" as $species_common_names;
+
 # Convert Degree.Minute coordinates into decimal notation
 def as_coord:
   # Example W073 or W70, degree only, negate
@@ -35,20 +38,34 @@ def as_coord:
     null
   end;
 
+def as_date:
+  capture("^(?<date>\\d{1,2}-\\w{3}-\\d{4})") | .date | strptime("%d-%B-%Y") | todateiso8601;
+
+def as_signal_overlap:
+  {
+    "OF": "Frequency",
+    "OT": "Time",
+    "OTF": "Time and Frequency",
+    "N": "No"
+  } as $overlap_type | capture("(?<o>O[TF]{1,2}|N)") | $overlap_type[.o]?;
+
+def as_species_code:
+  capture("(?<code>[A-C][A-Z]\\d+[A-Z])") | .code;
+
+def as_species_common_name:
+  as_species_code | $species_common_names[0][.?];
+
+def as_species_sci_name:
+  as_species_code | $species_sci_names[0][.?];
+
 # root
 {
   # record number is unique, can be used as _id
   record_number: .RN,
   note: .NT,
   # a lot of noise in the original field, only parsing date
-  observation_date: [
-    .OD | capture("^(?<date>\\d{1,2}-\\w{3}-\\d{4})") | .date |
-    strptime("%d-%B-%Y") | todateiso8601
-  ] | .[0],
-  last_modified_date: [
-    .DA | capture("^(?<date>\\d{1,2}-\\w{3}-\\d{4})") | .date |
-    strptime("%d-%B-%Y") | todateiso8601
-  ] | .[0],
+  observation_date: .OD | as_date,
+  last_modified_date: .DA | as_date,
   location: {
     name: .GB | split("|") | map(gsub("(\\s+)?[A-D][A-Z]\\d+[A-Z](\\s+)?|(X$)"; ""; "gm")),
     coordinates: .GC | split("|")
@@ -130,16 +147,7 @@ def as_coord:
       } as $class_names |
       [ .SC | capture("(?<c>[SMVDUC]{1})") ] | $class_names[.[0].c]?
     ] | .[0],
-    overlap: [
-      # overlap lookup table
-      {
-        "OF": "Frequency",
-        "OT": "Time",
-        "OTF": "Time and Frequency",
-        "N": "No"
-      } as $overlap_type |
-      [ .SC | capture("(?<o>O[TF]{1,2}|N)") ] | $overlap_type[.[0].o]?
-    ] | .[0],
+    overlap: .SC | as_signal_overlap,
     # other general sound producing sources listed in genus field
     source: ( .GS | split("|") |
       map(. as $s | match("\\s+[E-Z]{1}(\\s+)?$"; "m") |
@@ -291,13 +299,14 @@ def as_coord:
         end
       )
     ),
-    # Genus name and species code
-    genus: ( .GS | split("|") |
-      map(. as $s | match("[A-C][A-Z]\\d+[A-Z](\\s+)?$"; "m") |
+    # Genus
+    species: .GS | split("|") |
+      map(. as $s |
         {
-          name: $s[0:.offset] | gsub("^\\s+|\\s+$";""),
-          species_code: .string | gsub("^\\s+|\\s+$";"")
+          _as_noted: $s | gsub("^\\s+|\\s+$";""),
+          species_code: $s | as_species_code,
+          scientific_name: $s | as_species_sci_name,
+          common_name: $s | as_species_common_name,
         })
-    ),
   }
 }
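
Illustration (not part of the commit): the helper filters introduced above (as_date, as_signal_overlap, as_species_code) are small regex-based extractors. The following is a rough Python sketch of the same extraction logic, offered only to show what each filter pulls out of a raw field. The sample field values are hypothetical and were not taken from the database. One assumed simplification: the Python functions return None when nothing matches, whereas jq's capture emits no output at all in that case.

```python
import re
from datetime import datetime

# Lookup table copied from the as_signal_overlap filter in the diff.
OVERLAP_TYPES = {
    "OF": "Frequency",
    "OT": "Time",
    "OTF": "Time and Frequency",
    "N": "No",
}


def as_date(field):
    """Parse a leading D-Mon-YYYY date into an ISO 8601 UTC timestamp."""
    match = re.match(r"(\d{1,2}-\w{3}-\d{4})", field)
    if match is None:
        return None  # assumption: jq's capture would emit nothing here
    parsed = datetime.strptime(match.group(1), "%d-%b-%Y")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def as_signal_overlap(field):
    """Map the first overlap code found in the field to its description."""
    match = re.search(r"O[TF]{1,2}|N", field)
    return OVERLAP_TYPES.get(match.group(0)) if match else None


def as_species_code(field):
    """Extract the first species code (e.g. two letters, digits, one letter)."""
    match = re.search(r"[A-C][A-Z]\d+[A-Z]", field)
    return match.group(0) if match else None


if __name__ == "__main__":
    # All inputs below are made-up examples.
    print(as_date("14-Jun-1975 hypothetical trailing note"))  # 1975-06-14T00:00:00Z
    print(as_signal_overlap("S OTF"))                         # Time and Frequency
    print(as_species_code("Genus species AB12C"))             # AB12C
```

Keeping each extractor as its own function mirrors the intent of the commit: the date parsing that was previously duplicated for .OD and .DA is written once and reused.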
Hints:
Before your first commit, do not forget to set up your git environment:
git config --global user.name "your_name_here"
git config --global user.email "your@email_here"

Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/dleucas/wmmsdb

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/dleucas/wmmsdb

Clone this repository using git:
git clone git://git.rocketgit.com/user/dleucas/wmmsdb

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main