Subject | Hash | Author | Date (UTC) |
---|---|---|---|
rename to markdown | 8f99257ba05ee39b629b7d2281c149bbbe941b29 | dleucas | 2018-07-07 22:06:15 |
markdown | 0a1aa88a79caad45e62f086716592e00b51ff36e | dleucas | 2018-07-07 22:06:03 |
progress | 65700e843b38245a91b97658f63d640c70cefe6f | dleucas | 2018-07-07 02:30:36 |
transform HTML table to JSON object | 058f59f35e7e40c437609d91889ceb5a786e005b | dleucas | 2018-07-07 02:29:29 |
initial data survey | 7a0ba30602c78bf1f06a19d4322e503e3d11e050 | dleucas | 2018-07-07 02:28:12 |
progress information | 90dca6f739de3887394810e419457412cb2fd9fa | dleucas | 2018-07-06 01:42:38 |
working download of metadata pages | 071d8ad292bcf1486ce55f85eb0bc2102ab9c09a | dleucas | 2018-07-06 01:42:06 |
download metadata pages, initial script | ed63b4125deec37961e76b0f45592ad8549483d4 | dleucas | 2018-07-06 00:12:09 |

File | Lines added | Lines deleted |
---|---|---|
TODO | 0 | 30 |

File TODO deleted (index fca6b23..0000000) | |||
1 | No Cam / Mic | ||
2 | |||
3 | Source Site is: | ||
4 | http://cis.whoi.edu/science/B/whalesounds/fullCuts.cfm | ||
5 | |||
6 | # Overall Goal | ||
7 | - Download all metadata for all mammals | ||
8 | - Transform metadata to something more descriptive | ||
9 | - Index metadata into ElasticSearch | ||
10 | - Explain every step of the process, somewhat of a tutorial | ||
11 | |||
12 | Tools: bash, curl, wget, jq, xpath, regex, ElasticSearch, maybe sqlite | ||
13 | |||
14 | Current Progress: | ||
15 | |||
16 | - Download all pages and metadata [DONE] | ||
17 | - Extract the list of mammal pages with xpath / xmllint [DONE] | ||
18 | - Download each mammal page and extract the list of years [DONE] | ||
19 | - Extract all retrieval numbers and download each metadata page [DONE] | ||
20 | - Extract the metadata from HTML [DONE] | ||
21 | |||
22 | - [TODO] use xidel instead of xmllint in dl.sh | ||
23 | - [TODO] explain dl.sh some more | ||
24 | |||
25 | |||
26 | - Convert the data to JSON | ||
27 | - translate abbreviations | ||
28 | - design new data structure for more insight and useful queries | ||
29 | |||
30 |
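
The deleted TODO lists "Extract the metadata from HTML", "Convert the data to JSON" and "translate abbreviations" as steps. The following is a minimal sketch of how those three steps could fit together, not the repository's actual method: the TODO names bash, xmllint and jq as the tools, and Python's standard library is swapped in here for illustration. The sample HTML, the two-column table layout, and the abbreviation map `FIELD_NAMES` are all hypothetical and are not taken from the source site.

```python
import json
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical abbreviation map; the real abbreviations are not shown in the TODO.
FIELD_NAMES = {
    "RN": "retrieval_number",
    "GS": "species",
    "OD": "observation_date",
}


def table_to_record(html):
    """Turn a two-column key/value HTML table into a dict with readable keys."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    record = {}
    for row in parser.rows:
        if len(row) < 2:
            continue  # skip rows that do not hold a key/value pair
        key = row[0].rstrip(":").strip()
        # Unknown abbreviations are kept unchanged rather than dropped.
        record[FIELD_NAMES.get(key, key)] = row[1]
    return record


# Hypothetical metadata page fragment, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><td>RN:</td><td>12345</td></tr>
  <tr><td>GS:</td><td>Example species</td></tr>
  <tr><td>XX:</td><td>kept as-is</td></tr>
</table>
"""

if __name__ == "__main__":
    print(json.dumps(table_to_record(SAMPLE_HTML), indent=2))
```

Keeping unknown abbreviations under their original key means an incomplete map does not silently lose data, which matters if the records are later indexed into ElasticSearch as the TODO plans.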