dleucas / wmmsdb (public) (License: GPLv3) (since 2018-07-08) (hash sha1)
A collection of scripts to download, transform and normalize the Watkins Marine Mammal Sound Database.

Credit:

“Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution.”

http://cis.whoi.edu/science/B/whalesounds/index.cfm
List of commits:
Subject Hash Author Date (UTC)
convert html only once. extract species names as json. formating and lint. e755dc7f4fe2d7c9b97826a0f3f2cf5385e90ef9 dleucas 2021-12-16 13:35:31
download once. use wget only. get species names. test for commands. formating 572dbf1eaffe17c43a4a01dc9675737628c5a234 dleucas 2021-12-16 12:14:26
add filter by behavior type, sort by modified date c3f9f9f9d9501e714117af7fff573e7f3fa4052b dleucas 2019-06-14 03:51:19
rename type to type_of 4269dc257530a9a7fa21ff8708f4594a2f1a453d dleucas 2019-06-14 03:39:49
ElasticSearch setting for larger HTTP request e83e501f949473538096f984220934c0a51de0b4 dleucas 2019-06-14 03:25:22
rename type to type_of e1fcd27b05eabc8bce06751a9925200e4707168b dleucas 2019-06-14 02:34:15
add animal behavior transformation and documentation 7550db3bbd1c69c9369cf8dfe3a5d1195e761ae2 dleucas 2019-06-14 00:57:16
add lost modified date c4922a44cebebd63da6c23a2a71f97cdb47b4a68 dleucas 2019-06-12 22:46:35
describe remaining db fields e3c7f44ad24a3d7c8e4eb74c777a4eecc3675d75 dleucas 2019-06-11 21:37:06
WIP document acoustat 3a47cfcfa204503f682879d7485a6ef941e248e4 dleucas 2019-06-07 00:17:30
WIP document acoustat 4591875fd32c1c91d20133ff90dcf5676b3c216c dleucas 2019-06-07 00:08:18
WIP document acoustat ccc4a6de663a7272ee3d5777fe1479af549e9938 dleucas 2019-06-06 01:39:08
WIP document acoustat 1c6b03267e3016d9b637775df2b4b153866ac040 dleucas 2019-06-05 22:39:39
add dependency on nav.html and pandoc.css b4f054eb6675117d576fa9462220bc5bc8d15be4 dleucas 2019-06-01 01:12:44
nav title 7c5fadd2e143028d614fab4c31ed7389ed17e6f6 dleucas 2019-06-01 01:12:04
document world map 07b70f4b85731456b559edeecea12b339e724aaf dleucas 2019-06-01 01:11:43
use live ElasticSearch URL in example query b6df3b91ba395f62003fa89f1d6ae3f6a705ea9e dleucas 2019-05-31 23:38:40
Document geo coordinates based on the .GC field d3c9fc90773c6252209074583a00b290633f340c dleucas 2019-05-31 23:24:02
transform geo coordinates b32a76ac0917bc3c8bb85e225b19004bc56ac929 dleucas 2019-05-31 22:26:16
update index mapping with geo_point for location coordinates 510a37c03a86603425c6af347c038d69d7fa0cde dleucas 2019-05-31 22:24:42
Commit e755dc7f4fe2d7c9b97826a0f3f2cf5385e90ef9 - convert html only once. extract species names as json. formating and lint.
Author: dleucas
Author date (UTC): 2021-12-16 13:35
Committer name: dleucas
Committer date (UTC): 2021-12-16 13:35
Parent(s): 572dbf1eaffe17c43a4a01dc9675737628c5a234
Signing key:
Tree: d38fbed1887eb4311ba66514c5cdd48665677183
File          Lines added  Lines deleted
transform.sh  29           8
File transform.sh changed (mode: 100755) (index 64f7fc5..1ea333d)
--- a/transform.sh
+++ b/transform.sh
@@ -1,9 +1,24 @@
 #!/bin/bash
-set -e # abort on any errors
+set -eo pipefail
+# set -x
+
+test -e "$(command -v xidel)" || (
+echo "ERR: Need xidel from https://www.videlibri.de/xidel.html"
+exit 1
+)
+test -e "$(command -v jq)" || (
+echo "ERR: Need jq from https://stedolan.github.io/jq/"
+exit 1
+)
+
+# Mapping of species id to common and scientific name
+
+tail -n+56 data/species.map | jq -cR 'split("\t") as $row | {($row[0]): ($row[1])}' | jq -cs add >data/species.sci.names.json
+head -n 55 data/species.map | jq -cR 'split("\t") as $row | {($row[0]): ($row[1])}' | jq -cs add >data/species.common.names.json
+
 # Transform HTML metadata from source site into JSON
 
-# for xpath
+# for xpath
 XIDEL='xidel -s --input-format=html --output-format=json-wrapped'
 
 # select all rows from the 2nd table element
@@ -20,20 +35,26 @@ XPATH_ENTRY='/html/body/table[2]/tbody/tr/td'
 # "SR:": "3400",
 # "CS:": "3.388",
 # ...
-#}
+#}
 # The jq filter explained
 # 1. assign the whole array to $row
 # 2. create a range with a step of 2 over the lenght of the array, 0,2,4,...
 # 3. create a object and use the range as index for the $row elements
 # 3.5 remove right most colon from key
 # 4. combine the list of objects into a single object with "add"
+
+# shellcheck disable=SC2016
 JQ_ARR2OBJ='[ .[] as $row | range(0; $row|length; 2) | {( $row[.] | rtrimstr(":")): ($row[.+1]) } ] | add'
 
+test -d data/rn || mkdir -p data/rn
 
-while read RN
-do
-$XIDEL --xpath "$XPATH_ENTRY" "raw/rn/metaData.cfm?RN=$RN" | jq "$JQ_ARR2OBJ" > "data/rn/$RN.json"
-done < data/retrieval.numbers
+while read -r RN; do
+# input should exist
+test -f "raw/rn/metaData.cfm?RN=$RN" || continue
+# output should not exist
+test -f "data/rn/$RN.json" && continue
+$XIDEL --xpath "$XPATH_ENTRY" "raw/rn/metaData.cfm?RN=$RN" | jq -c "$JQ_ARR2OBJ" >"data/rn/$RN.json"
+done <data/retrieval.numbers
 
 # transform all records with jq, this is where the magic happens
-./transform.jq data/rn/*json > data/transformed.json
+./transform.jq data/rn/*json >data/transformed.json
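This commit replaces `set -e` with `set -eo pipefail`. That matters for the conversion loop in transform.sh, where `$XIDEL ... | jq ...` is a pipeline: without `pipefail`, its exit status is that of `jq` alone, so a failing `xidel` call would go unnoticed. The sketch below compares the two option sets. `false` stands in for a failing `xidel` and `cat` for `jq`, and the `demo` function is a hypothetical helper, not part of the repository.

```shell
#!/bin/bash
# demo OPTIONS (hypothetical helper): run a two-stage pipeline whose first
# stage fails, in a child bash with the given `set` options, and report
# whether the child kept going past the pipeline.
demo() {
	if bash -c "set $1; false | cat; exit 0"; then
		echo "set $1: failure ignored, script continued"
	else
		echo "set $1: script aborted"
	fi
}

demo -e              # prints: set -e: failure ignored, script continued
demo "-eo pipefail"  # prints: set -eo pipefail: script aborted
```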
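The new dependency checks in transform.sh run `exit 1` inside a `( ... )` subshell. That only ends the subshell. The script still stops because the subshell is the last command of the `||` list, so its failure trips `set -e`. One alternative is a function that returns a status and writes its message to stderr. The sketch below shows that approach; `require` is a hypothetical name, not part of the repository.

```shell
#!/bin/bash
# require CMD URL (hypothetical helper): succeed if CMD is on PATH,
# otherwise print an error to stderr and return 1.
require() {
	local cmd=$1 url=$2
	if ! command -v "$cmd" >/dev/null 2>&1; then
		echo "ERR: Need $cmd from $url" >&2
		return 1
	fi
}

# Usage mirroring transform.sh; with `set -e` active, a failing call
# aborts the script just as the subshell form does:
#   require xidel "https://www.videlibri.de/xidel.html"
#   require jq "https://stedolan.github.io/jq/"
```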
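The species mapping step splits `data/species.map` into two parts. `head -n 55` takes lines 1 to 55 and writes them to `species.common.names.json`. `tail -n+56` takes line 56 to the end and writes it to `species.sci.names.json`. Judging by the output names, the first 55 rows appear to hold common names and the remaining rows scientific names; the map file itself is not shown here. The two commands partition the file with no overlap and no gap. The hypothetical `split_at` helper below makes that explicit by deriving the tail offset from the head count, so the two numbers cannot drift apart.

```shell
#!/bin/bash
# split_at N FILE PREFIX (hypothetical helper): write lines 1..N of FILE to
# PREFIX.head and lines N+1..end to PREFIX.tail, the same head/tail pair
# transform.sh applies to data/species.map with N=55.
split_at() {
	local n=$1 file=$2 prefix=$3
	head -n "$n" "$file" >"$prefix.head"
	tail -n +"$((n + 1))" "$file" >"$prefix.tail"
}
```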
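The comments above `JQ_ARR2OBJ` describe turning a flat list of table cells into key/value pairs: step through the list two at a time, use the even cell (minus its trailing colon) as the key and the next cell as the value, then merge everything into one object. For readers without jq at hand, the sketch below is a pure-bash analogue of that idea, not the script's own code. It collects the pairs in an associative array, where a repeated key overwrites the earlier value, as happens when jq's `add` merges objects. `arr2obj` and `record` are hypothetical names, and `declare -A` needs bash 4 or later.

```shell
#!/bin/bash
# Global associative array filled by arr2obj (hypothetical name).
declare -A record

# arr2obj CELL...: pair cells as key, value, key, value, ...
# `${key%:}` drops one trailing colon, like jq's rtrimstr(":").
arr2obj() {
	local -a cells=("$@")
	local i
	record=()
	for ((i = 0; i < ${#cells[@]}; i += 2)); do
		record["${cells[i]%:}"]=${cells[i + 1]}
	done
}

# Sample cells taken from the "SR:"/"CS:" example in transform.sh.
arr2obj "SR:" "3400" "CS:" "3.388"
# Bash does not define the iteration order of associative array keys.
for key in "${!record[@]}"; do
	printf '%s=%s\n' "$key" "${record[$key]}"
done
```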
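The rewritten loop makes the conversion safe to rerun. It skips a retrieval number whose downloaded HTML is missing, and it skips one whose JSON output already exists, so each page is converted only once, as the commit subject says. The sketch below reproduces that control flow with a stand-in transform (`tr` uppercasing the input) instead of the `$XIDEL ... | jq -c ...` pipeline. `convert_all` and its arguments are hypothetical. Plain `mkdir -p` is used because it already succeeds when the directory exists.

```shell
#!/bin/bash
set -eo pipefail

# convert_all LIST IN_DIR OUT_DIR (hypothetical helper): for each id in
# LIST, convert IN_DIR/id to OUT_DIR/id.json at most once.
convert_all() {
	local list=$1 in_dir=$2 out_dir=$3 id
	mkdir -p "$out_dir"
	while read -r id; do
		# input should exist
		test -f "$in_dir/$id" || continue
		# output should not exist
		test -f "$out_dir/$id.json" && continue
		# stand-in for the real xidel | jq pipeline
		tr '[:lower:]' '[:upper:]' <"$in_dir/$id" >"$out_dir/$id.json"
		echo "converted $id"
	done <"$list"
}
```

Skipping existing outputs has one side effect worth knowing about. The shell creates the output file as soon as it processes the redirection, before the pipeline runs. If a run aborts mid-pipeline, the empty or partial file is then skipped on every later run. Writing to a temporary name and renaming it with `mv` after success would avoid that; this is a suggestion, not what the commit does.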
Hints:
Before your first commit, do not forget to set up your git environment:
git config --global user.name "your_name_here"
git config --global user.email "your@email_here"

Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/dleucas/wmmsdb

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/dleucas/wmmsdb

Clone this repository using git:
git clone git://git.rocketgit.com/user/dleucas/wmmsdb

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main