dleucas / wmmsdb (public) (License: GPLv3) (since 2018-07-08) (hash sha1)
A collection of scripts to download, transform and normalize the Watkins Marine Mammal Sound Database.

Credit:

“Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution.”

http://cis.whoi.edu/science/B/whalesounds/index.cfm
List of commits:
Subject | Hash | Author | Date (UTC)
rename to markdown | 8f99257ba05ee39b629b7d2281c149bbbe941b29 | dleucas | 2018-07-07 22:06:15
markdown | 0a1aa88a79caad45e62f086716592e00b51ff36e | dleucas | 2018-07-07 22:06:03
progress | 65700e843b38245a91b97658f63d640c70cefe6f | dleucas | 2018-07-07 02:30:36
transform HTML table to JSON object | 058f59f35e7e40c437609d91889ceb5a786e005b | dleucas | 2018-07-07 02:29:29
initial data survey | 7a0ba30602c78bf1f06a19d4322e503e3d11e050 | dleucas | 2018-07-07 02:28:12
progress information | 90dca6f739de3887394810e419457412cb2fd9fa | dleucas | 2018-07-06 01:42:38
working download of metadata pages | 071d8ad292bcf1486ce55f85eb0bc2102ab9c09a | dleucas | 2018-07-06 01:42:06
download metadata pages, initial script | ed63b4125deec37961e76b0f45592ad8549483d4 | dleucas | 2018-07-06 00:12:09
Commit 8f99257ba05ee39b629b7d2281c149bbbe941b29 - rename to markdown
Author: dleucas
Author date (UTC): 2018-07-07 22:06
Committer name: dleucas
Committer date (UTC): 2018-07-07 22:06
Parent(s): 0a1aa88a79caad45e62f086716592e00b51ff36e
Signer:
Signing key:
Signing status: N
Tree: 598f7bb825b3a74cc82eeb8a37c102463bb17d7b
File Lines added Lines deleted
TODO 0 30
File TODO deleted (index fca6b23..0000000)
No Cam / Mic

Source Site is:
http://cis.whoi.edu/science/B/whalesounds/fullCuts.cfm

# Overall Goal
- Download all metadata for all mammals
- Transform metadata to something more descriptive
- Index metadata into ElasticSearch
- Explain every step of the process, somewhat of a tutorial

Tools: bash, curl, wget, jq, xpath, regex, ElasticSearch, maybe sqlite

Current Progress:

- Download all pages and metadata [DONE]
- Extract the list of mammal pages with xpath / xmllint [DONE]
- Download each mammal page and extract the list of years [DONE]
- Extract all retrieval numbers and download each metadata page [DONE]
- Extract the metadata from HTML [DONE]

- [TODO] use xidel instead of xmllint in dl.sh
- [TODO] explain dl.sh some more

- Convert the data to JSON
- Translate abbreviations
- Design new data structure for more insight and useful queries
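
The "Extract the metadata from HTML" and "Convert the data to JSON" steps from the TODO file can be sketched in shell roughly as follows. This is not the repository's script: the function name table_to_json and the input layout (each table row on a single line, with one key cell and one value cell) are assumptions made for illustration. Only sed and awk are used so the sketch runs without extra tools; HTML entities are left undecoded.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the repository): reads HTML on stdin and
# prints one flat JSON object built from two-cell table rows.
# Assumes each <tr>...</tr> sits on a single line with a key cell and a value cell.
table_to_json() {
  # Escape backslashes and double quotes first so cell text stays valid JSON.
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
  awk '
    function trim(s) { gsub(/^[ \t\r]+/, "", s); gsub(/[ \t\r]+$/, "", s); return s }
    BEGIN { printf "{" }
    /<tr[ >]/ {
      row = $0
      gsub(/\t/, " ", row)                           # free up tab as a cell separator
      gsub(/<\/t[dh]>[ ]*<t[dh][^>]*>/, "\t", row)   # cell boundary becomes a tab
      gsub(/<[^>]*>/, "", row)                       # drop all remaining tags
      split(row, cell, "\t")
      key = trim(cell[1]); val = trim(cell[2])
      if (key == "") next                            # skip rows without a key
      printf "%s\"%s\":\"%s\"", sep, key, val
      sep = ","
    }
    END { print "}" }
  '
}
```

A saved metadata page could then be converted with `table_to_json < page.html`, where page.html is a placeholder file name. For pages whose rows span several lines, a real HTML parser such as xmllint or xidel (both mentioned in the TODO file) would be the safer choice.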
Hints:
Before your first commit, do not forget to set up your Git environment:
git config --global user.name "your_name_here"
git config --global user.email "your@email_here"

Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/dleucas/wmmsdb

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/dleucas/wmmsdb

Clone this repository using git:
git clone git://git.rocketgit.com/user/dleucas/wmmsdb

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main