Subject | Hash | Author | Date (UTC) |
---|---|---|---|
Take care of mangled file name between two dirs. | 65eb5f1841aae521c401ab52298041776159a35d | Catalin(ux) M. BOIE | 2014-06-21 12:14:36 |
duilder updates. Exclude Makefile when making tar.gz | accfc9df80b5889b3e1f1183e5f60d70850b5671 | Catalin(ux) M. BOIE | 2014-06-21 06:09:12 |
Bump version to 0.2 | 078df2ea0759e4240ef7deef35744f7077957129 | Catalin(ux) M. BOIE | 2014-06-19 17:09:39 |
Fixed a case when some files were not dumped at all | 0c69ce3cb20aae442da94cff9606936461177e1d | Catalin(ux) M. BOIE | 2014-06-19 17:05:16 |
First version that passes all tests. | 9627508618bc2c783da838c2332e6896e3004c99 | Catalin(ux) M. BOIE | 2014-06-18 18:02:32 |
Fix a little problem with the man page. | 1305ac52823f0206b98181c6435049f339be9c53 | Catalin(ux) M. BOIE | 2013-02-18 19:55:43 |
Lots of stuff | 9c43842ac36feff6b29cb20a95ad4510a23bf472 | Catalin(ux) M. BOIE | 2012-07-20 15:43:51 |
Cosmetic + man | a53df11bfc30c152c9f61fdb1bcc69dc6ec20765 | Catalin(ux) M. BOIE | 2012-06-24 12:11:17 |
First working version! | 27bd1bf47c9fb707760d84ea3cf4241083fa283d | Catalin(ux) M. BOIE | 2012-06-22 21:10:30 |
Several fixes | 2909b1ba2e99929e775ddfea5f4894c50694a638 | Catalin(ux) M. BOIE | 2012-06-19 13:08:50 |
First version | 3d7935d9b8a91694fe8213998ce4d3910348d6ef | Catalin(ux) M. BOIE | 2012-05-06 19:40:40 |
File Makefile.in changed (mode: 100644) (index 3f2da1f..d27d99f) | |||
... | ... | dupdump: dupdump.c $(OBJS) | |
16 | 16 | clean: | clean: |
17 | 17 | @rm -fv $(OBJS) dupdump vgcore.* | @rm -fv $(OBJS) dupdump vgcore.* |
18 | 18 | @-rm -f $(PRJ)-*.rpm $(PRJ)-*-*-*.tgz $(PRJ)-*.tar.gz | @-rm -f $(PRJ)-*.rpm $(PRJ)-*-*-*.tgz $(PRJ)-*.tar.gz |
19 | make -C tests clean | ||
19 | 20 | ||
20 | 21 | ||
21 | 22 | install: all | install: all |
File TODO changed (mode: 100644) (index af771af..dfc0504) | |||
1 | [ ] 1-bit fileds are not printed as %hhu! | ||
2 | [ ] Why we use 'left' flag?! Because we mark do_not_dump in the same place | ||
3 | where we set 'left'! Or, we should mark as left and not mark as | ||
4 | do_not_dump. | ||
1 | [ ] We must construct the test because we are playing with mtime now! | ||
2 | [ ] Adapt man file to recent changes: flags etc. | ||
3 | [ ] Ignore empty files. | ||
4 | [ ] 1-bit fields are not printed as %hhu! | ||
5 | 5 | [ ] Because we ignore !dir and !files, we may not have really identical | [ ] Because we ignore !dir and !files, we may not have really identical |
6 | 6 | directories. In one of them we may have a socket, for example. | directories. In one of them we may have a socket, for example. |
7 | Add a reporting flag for this situation. Or, do not ignore other | ||
8 | type of files. | ||
7 | 9 | [ ] I must document a high level view over "algorithm". Even myself | [ ] I must document a high level view over "algorithm". Even myself |
8 | 10 | I do not remember what I am doing... | I do not remember what I am doing... |
9 | 11 | [ ] id dir1/subdir1 = dir2/subdir1 + dir1/subdir2 = dir2/subdir2 => dir1 = dir2. | [ ] id dir1/subdir1 = dir2/subdir1 + dir1/subdir2 = dir2/subdir2 => dir1 = dir2. |
10 | 12 | It seems I report both the small directories and the big one. | It seems I report both the small directories and the big one. |
11 | |||
12 | Bugs: | ||
13 | [ ] Seems an empty files matches dumplog.txt! check bug1 dir! | ||
14 | |||
15 | 13 | [ ] Use fadvise to not cache data in RAM. | [ ] Use fadvise to not cache data in RAM. |
16 | 14 | [ ] Use more threads | [ ] Use more threads |
17 | 15 | [ ] We should order by mtime, older one being the first shown. | [ ] We should order by mtime, older one being the first shown. |
... | ... | Bugs: | |
26 | 24 | dir4=dir3 | dir4=dir3 |
27 | 25 | ||
28 | 26 | [ ] We could throw away unique files. | [ ] We could throw away unique files. |
29 | [ ] Comparing in O(N*N) sucks! | ||
27 | [ ] Comparing in O(N*N) sucks! Where? | ||
30 | 28 | [ ] Install man. | [ ] Install man. |
31 | 29 | [ ] Dump in stats also the max memory used. | [ ] Dump in stats also the max memory used. |
32 | 30 | [ ] Dump two types of dirs: DIR AND DIRFNC (File Names Changed). | [ ] Dump two types of dirs: DIR AND DIRFNC (File Names Changed). |
33 | Maybe also for files | ||
34 | [ ] Another type of DIR is when a dir is included in another one. | ||
35 | How should I report it? | ||
36 | |||
31 | Maybe list files that were renamed (cmd line flag). | ||
37 | 32 | [ ] Use a cache, specified by command line. Use inode and mtime for key? | [ ] Use a cache, specified by command line. Use inode and mtime for key? |
38 | [ ] Dump memory peak usage for statistics. | ||
39 | [ ] | ||
33 | [ ] dir1/a+b, dir2/c+d, c is a soft/hard link to a. Content of b is the same | ||
34 | with the content of d. What should I do? |
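The TODO item "Use fadvise to not cache data in RAM" could look roughly like the sketch below. This is only an illustration under assumptions: `read_and_drop_cache` is a hypothetical helper that does not exist in dupdump, and the hint is treated as best effort, because `posix_fadvise` is advisory and may fail on descriptors that are not regular files.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of dupdump): reads fd to the end, then
 * hints the kernel that the cached pages are no longer needed.
 * Returns the number of bytes read, or -1 on a read error.
 */
static long long read_and_drop_cache(int fd)
{
	char buf[4096];
	long long total = 0;
	ssize_t n;

	assert(fd >= 0);

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;	/* a real caller would feed buf to the hash here */
	if (n < 0)
		return -1;

	/* Advisory only: offset 0 and len 0 cover the whole file; errors are ignored. */
	(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	return total;
}
```

Whether dropping the cache after each file actually helps depends on the workload, so it would probably be worth measuring before wiring something like this into the hashing path.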
File dupdump.1 changed (mode: 100644) (index 6b31415..b032b52) | |||
1 | 1 | .TH DUPDUMP 1 | .TH DUPDUMP 1 |
2 | 2 | .\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection | .\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection |
3 | 3 | .\" other parms are allowed: see man(7), man(1) | .\" other parms are allowed: see man(7), man(1) |
4 | .SH NAME | ||
4 | .SH "NAME" | ||
5 | 5 | dupdump \- finds duplicate files in a given set of directories | dupdump \- finds duplicate files in a given set of directories |
6 | .SH SYNOPSIS | ||
6 | .SH "SYNOPSIS" | ||
7 | 7 | .B dupdump | .B dupdump |
8 | 8 | [ | [ |
9 | 9 | .I options | .I options |
10 | 10 | ] | ] |
11 | 11 | .I <dir1> | .I <dir1> |
12 | 12 | \|.\|.\|. | \|.\|.\|. |
13 | .I <dirN> | ||
13 | 14 | ||
14 | 15 | .SH "DESCRIPTION" | .SH "DESCRIPTION" |
15 | 16 | Searches a list of dirs, recursively to find dir and file matches. | Searches a list of dirs, recursively to find dir and file matches. |
... | ... | It is using SHA-1 to test the match. It outputs a list of three columns: | |
17 | 18 | first is the type of match (DIR or FILE) and the second and the third, the | first is the type of match (DIR or FILE) and the second and the third, the |
18 | 19 | matches. | matches. |
19 | 20 | ||
20 | .SH OPTIONS | ||
21 | .SH "OPTIONS" | ||
21 | 22 | .TP | .TP |
22 | 23 | .B -z --zero | .B -z --zero |
23 | 24 | use \\0 as fields and records separator instead of \\t and \\n | use \\0 as fields and records separator instead of \\t and \\n |
... | ... | when a dir match is possible | |
29 | 30 | .B -o --out | .B -o --out |
30 | 31 | specify where to store the list of duplicates (default stdout) | specify where to store the list of duplicates (default stdout) |
31 | 32 | .TP | .TP |
32 | .B -v --verbose: | ||
33 | .B -v --verbose | ||
33 | 34 | be more verbose | be more verbose |
34 | 35 | .TP | .TP |
35 | 36 | .B -d --debug | .B -d --debug |
... | ... | dump debug information useful for the developers | |
43 | 44 | .UR "http://kernel.embedromix.ro/us/" | .UR "http://kernel.embedromix.ro/us/" |
44 | 45 | Home page | Home page |
45 | 46 | .UE . | .UE . |
46 | .SH NOTES | ||
47 | .SH "NOTES" | ||
47 | 48 | This program does not delete any files. It is your responsibility to | This program does not delete any files. It is your responsibility to |
48 | 49 | take care of what to delete. | take care of what to delete. |
49 | .SH AUTHOR | ||
50 | .SH "AUTHOR" | ||
50 | 51 | .UR catab-dupdump@embedromix.ro | .UR catab-dupdump@embedromix.ro |
51 | 52 | Catalin(ux) M. BOIE | Catalin(ux) M. BOIE |
52 | 53 | .UE . | .UE . |
File store.c changed (mode: 100644) (index cb70b4b..b64e73f) | |||
... | ... | int file_add(const char *file, const struct stat *s, | |
259 | 259 | q->dev = s->st_dev; | q->dev = s->st_dev; |
260 | 260 | q->ino = s->st_ino; | q->ino = s->st_ino; |
261 | 261 | q->level = level; | q->level = level; |
262 | q->mtime = s->st_mtime; | ||
262 | 263 | ||
263 | 264 | /* link with dir */ | /* link with dir */ |
264 | 265 | parent = dir_current[level - 1]; | parent = dir_current[level - 1]; |
... | ... | int file_add(const char *file, const struct stat *s, | |
273 | 274 | if (file_info[hash] == NULL) { | if (file_info[hash] == NULL) { |
274 | 275 | file_info[hash] = q; | file_info[hash] = q; |
275 | 276 | } else { | } else { |
276 | /* search for a bigger item and insert before it */ | ||
277 | /* We order by size, level, mtime, name */ | ||
278 | /* Better to use qsort. TODO */ | ||
277 | 279 | p = file_info[hash]; | p = file_info[hash]; |
278 | 280 | prev = NULL; | prev = NULL; |
279 | 281 | while (p) { | while (p) { |
280 | if (size == p->size) { | ||
281 | if (level < p->level) | ||
282 | if (q->size < p->size) | ||
283 | break; | ||
284 | |||
285 | if (q->size == p->size) { | ||
286 | if (q->level < p->level) | ||
282 | 287 | break; | break; |
283 | 288 | ||
284 | if (strcmp(file, p->name) < 0) | ||
289 | if (q->mtime < p->mtime) | ||
285 | 290 | break; | break; |
286 | } | ||
287 | 291 | ||
288 | if (size < p->size) | ||
289 | break; | ||
292 | if (strcmp(q->name, p->name) < 0) | ||
293 | break; | ||
294 | } | ||
290 | 295 | ||
291 | 296 | prev = p; | prev = p; |
292 | 297 | p = p->hash_next; | p = p->hash_next; |
... | ... | void dir_dump_node(const struct dir_node *d, const unsigned int level) | |
424 | 429 | struct dir_node *subdir; | struct dir_node *subdir; |
425 | 430 | struct file_node *file; | struct file_node *file; |
426 | 431 | char dump[SHA_DIGEST_LENGTH * 2 + 1]; | char dump[SHA_DIGEST_LENGTH * 2 + 1]; |
432 | char fnh[SHA_DIGEST_LENGTH * 2 + 1]; | ||
427 | 433 | ||
428 | 434 | memset(prefix, ' ', (level + 1) * 2); | memset(prefix, ' ', (level + 1) * 2); |
429 | 435 | prefix[(level + 1) * 2] = '\0'; | prefix[(level + 1) * 2] = '\0'; |
430 | 436 | ||
431 | 437 | sha1_dump(dump, d->sha1, 8); | sha1_dump(dump, d->sha1, 8); |
438 | sha1_dump(fnh, d->file_names_sha1, 8); | ||
432 | 439 | fprintf(stderr, "%sD '%s' d=%p subdirs=%p next_sibling=%p" | fprintf(stderr, "%sD '%s' d=%p subdirs=%p next_sibling=%p" |
433 | 440 | " files=%p parent=%p no_dup_possible=%u do_not_dump=%u" | " files=%p parent=%p no_dup_possible=%u do_not_dump=%u" |
434 | " level=%hu hash_next=%p left=%hhu sha1=%s\n", | ||
441 | " level=%hu hash_next=%p left=%hhu sha1=%s file_names_sha1=%s\n", | ||
435 | 442 | prefix, d->name, d, d->subdirs, d->next_sibling, | prefix, d->name, d, d->subdirs, d->next_sibling, |
436 | 443 | d->files, d->parent, d->no_dup_possible, d->do_not_dump, | d->files, d->parent, d->no_dup_possible, d->do_not_dump, |
437 | d->level, d->hash_next, d->left, dump); | ||
444 | d->level, d->hash_next, d->left, dump, fnh); | ||
438 | 445 | ||
439 | 446 | subdir = d->subdirs; | subdir = d->subdirs; |
440 | 447 | while (subdir) { | while (subdir) { |
... | ... | static void dir_mark_up_no_dup_possible(struct dir_node *d) | |
511 | 518 | /* | /* |
512 | 519 | * When we list a folder on the left side, we must mark whole hierarchy under | * When we list a folder on the left side, we must mark whole hierarchy under |
513 | 520 | * it as 'do_not_dump'. Else, we will dump its files and we do not want that. | * it as 'do_not_dump'. Else, we will dump its files and we do not want that. |
514 | * TODO: But, we may have dir1 == dir2 and dir1/file1 == dir3/file3. In this case we want to dump dir1/file! | ||
515 | 521 | */ | */ |
516 | 522 | static void dir_mark_down_do_not_dump(struct dir_node *d) | static void dir_mark_down_do_not_dump(struct dir_node *d) |
517 | 523 | { | { |
... | ... | int file_find_dups(void) | |
716 | 722 | } | } |
717 | 723 | ||
718 | 724 | if (debug) { | if (debug) { |
719 | fprintf(stderr, "[*] Dump chain %u start:\n", hash); | ||
725 | if (debug) | ||
726 | fprintf(stderr, "[*] Dump chain %u start:\n", hash); | ||
720 | 727 | q = file_info[hash]; | q = file_info[hash]; |
721 | 728 | while (q) { | while (q) { |
722 | 729 | fprintf(stderr, "%s:\n", q->name); | fprintf(stderr, "%s:\n", q->name); |
723 | 730 | dups = q->duplicates; | dups = q->duplicates; |
724 | 731 | while(dups) { | while(dups) { |
725 | fprintf(stderr, "\t%s\n", dups->name); | ||
732 | if (debug) | ||
733 | fprintf(stderr, "\t%s\n", dups->name); | ||
726 | 734 | dups = dups->duplicates; | dups = dups->duplicates; |
727 | 735 | } | } |
728 | 736 | q = q->hash_next; | q = q->hash_next; |
729 | 737 | } | } |
730 | fprintf(stderr, "[*] Dump chain %u stop\n", hash); | ||
738 | if (debug) | ||
739 | fprintf(stderr, "[*] Dump chain %u stop\n", hash); | ||
731 | 740 | } | } |
732 | 741 | } | } |
733 | 742 | ||
... | ... | static int file_compare_hashes(const void *a0, const void *b0) | |
761 | 770 | * We need to sort because the order of files in dirs may differ because | * We need to sort because the order of files in dirs may differ because |
762 | 771 | * the names may be different but the content the same. | * the names may be different but the content the same. |
763 | 772 | * TODO: Shouldn't we test if a file is unique=1 and skip the checksum of dir??? | * TODO: Shouldn't we test if a file is unique=1 and skip the checksum of dir??? |
773 | * We return the file names hash in @fn. | ||
764 | 774 | */ | */ |
765 | static int dir_files_hash(unsigned char *hash, struct dir_node *d) | ||
775 | static int dir_files_hash(unsigned char *hash, unsigned char *fn, | ||
776 | struct dir_node *d) | ||
766 | 777 | { | { |
767 | 778 | struct file_node *p; | struct file_node *p; |
768 | 779 | struct file_node **u; | struct file_node **u; |
769 | 780 | unsigned int i, mem; | unsigned int i, mem; |
770 | SHA_CTX c; | ||
781 | SHA_CTX c, fnh; | ||
782 | char *base_name; | ||
771 | 783 | ||
772 | 784 | if (d->files == NULL) { | if (d->files == NULL) { |
773 | 785 | memset(hash, 0, SHA_DIGEST_LENGTH); | memset(hash, 0, SHA_DIGEST_LENGTH); |
786 | memset(fn, 0, SHA_DIGEST_LENGTH); | ||
774 | 787 | return 0; | return 0; |
775 | 788 | } | } |
776 | 789 | ||
... | ... | static int dir_files_hash(unsigned char *hash, struct dir_node *d) | |
790 | 803 | qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes); | qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes); |
791 | 804 | ||
792 | 805 | SHA1_Init(&c); | SHA1_Init(&c); |
806 | SHA1_Init(&fnh); | ||
793 | 807 | ||
794 | 808 | i = 0; | i = 0; |
795 | 809 | while (i < d->no_of_files) { | while (i < d->no_of_files) { |
796 | 810 | SHA1_Update(&c, u[i]->sha1_full, SHA_DIGEST_LENGTH); | SHA1_Update(&c, u[i]->sha1_full, SHA_DIGEST_LENGTH); |
811 | |||
812 | base_name = basename(u[i]->name); | ||
813 | if (debug) | ||
814 | fprintf(stderr, "%s: XXX: add file name hash of [%s]\n", __func__, base_name); | ||
815 | |||
816 | SHA1_Update(&fnh, base_name, strlen(base_name)); | ||
797 | 817 | i++; | i++; |
798 | 818 | } | } |
799 | 819 | ||
800 | 820 | SHA1_Final(hash, &c); | SHA1_Final(hash, &c); |
821 | SHA1_Final(fn, &fnh); | ||
801 | 822 | ||
802 | 823 | free(u); | free(u); |
803 | 824 | ||
... | ... | static int dir_files_hash(unsigned char *hash, struct dir_node *d) | |
810 | 831 | static long long dir_build_hash(struct dir_node *d) | static long long dir_build_hash(struct dir_node *d) |
811 | 832 | { | { |
812 | 833 | struct dir_node *subdir; | struct dir_node *subdir; |
813 | SHA_CTX c; | ||
834 | SHA_CTX c, fnh; | ||
814 | 835 | unsigned char files_hash[SHA_DIGEST_LENGTH]; | unsigned char files_hash[SHA_DIGEST_LENGTH]; |
836 | unsigned char file_names_sha1[SHA_DIGEST_LENGTH]; | ||
815 | 837 | int err; | int err; |
816 | 838 | long long no_of_possible_dirs = 0; | long long no_of_possible_dirs = 0; |
817 | 839 | long long ret; | long long ret; |
840 | char *base_name; | ||
818 | 841 | ||
819 | 842 | if (debug) | if (debug) |
820 | 843 | fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n", | fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n", |
... | ... | static long long dir_build_hash(struct dir_node *d) | |
831 | 854 | no_of_possible_dirs++; | no_of_possible_dirs++; |
832 | 855 | ||
833 | 856 | /* Order files by hash to compute correct hashes */ | /* Order files by hash to compute correct hashes */ |
834 | err = dir_files_hash(files_hash, d); | ||
857 | err = dir_files_hash(files_hash, file_names_sha1, d); | ||
835 | 858 | if (err != 0) | if (err != 0) |
836 | 859 | return -1; | return -1; |
837 | 860 | ||
838 | 861 | SHA1_Init(&c); | SHA1_Init(&c); |
862 | SHA1_Init(&fnh); | ||
839 | 863 | SHA1_Update(&c, files_hash, SHA_DIGEST_LENGTH); | SHA1_Update(&c, files_hash, SHA_DIGEST_LENGTH); |
864 | SHA1_Update(&fnh, file_names_sha1, SHA_DIGEST_LENGTH); | ||
840 | 865 | ||
866 | /* At the same time, we build hash of file names */ | ||
841 | 867 | subdir = d->subdirs; | subdir = d->subdirs; |
842 | 868 | while (subdir) { | while (subdir) { |
843 | 869 | ret = dir_build_hash(subdir); | ret = dir_build_hash(subdir); |
844 | 870 | if (ret == -1) | if (ret == -1) |
845 | 871 | return -1; | return -1; |
846 | 872 | ||
873 | base_name = basename(subdir->name); | ||
874 | if (debug) | ||
875 | fprintf(stderr, "%s: XXX: add subdir name to fnh [%s]\n", __func__, base_name); | ||
876 | SHA1_Update(&fnh, base_name, strlen(base_name)); | ||
877 | |||
847 | 878 | no_of_possible_dirs += ret; | no_of_possible_dirs += ret; |
848 | 879 | SHA1_Update(&c, subdir->sha1, SHA_DIGEST_LENGTH); | SHA1_Update(&c, subdir->sha1, SHA_DIGEST_LENGTH); |
880 | if (debug) | ||
881 | fprintf(stderr, "%s: XXX: add subdir->f_n_sha1 to fnh [%s]\n", __func__, subdir->name); | ||
882 | SHA1_Update(&fnh, subdir->file_names_sha1, SHA_DIGEST_LENGTH); | ||
883 | |||
849 | 884 | subdir = subdir->next_sibling; | subdir = subdir->next_sibling; |
850 | 885 | } | } |
851 | 886 | ||
852 | 887 | SHA1_Final(d->sha1, &c); | SHA1_Final(d->sha1, &c); |
888 | SHA1_Final(d->file_names_sha1, &fnh); | ||
853 | 889 | ||
854 | 890 | return no_of_possible_dirs; | return no_of_possible_dirs; |
855 | 891 | } | } |
... | ... | void dir_dump_duplicates(struct dir_node *d, const unsigned int zero) | |
1023 | 1059 | { | { |
1024 | 1060 | struct dir_node *p; | struct dir_node *p; |
1025 | 1061 | char sep, final; | char sep, final; |
1062 | char flags[9]; | ||
1026 | 1063 | ||
1027 | 1064 | if (debug) | if (debug) |
1028 | 1065 | fprintf(stderr, "[*] dir_dump_duplicates(%s)\n", d->name); | fprintf(stderr, "[*] dir_dump_duplicates(%s)\n", d->name); |
... | ... | void dir_dump_duplicates(struct dir_node *d, const unsigned int zero) | |
1075 | 1112 | fprintf(stderr, "dir_dump_duplicates: set do_not_dump on 'left' [%s]\n", d->name); | fprintf(stderr, "dir_dump_duplicates: set do_not_dump on 'left' [%s]\n", d->name); |
1076 | 1113 | dir_mark_left(d); | dir_mark_left(d); |
1077 | 1114 | ||
1078 | /* | ||
1079 | if (debug) | ||
1080 | fprintf(stderr, "dir_dump_duplicates: set do_not_dump=1 on left [%s]\n", d->name); | ||
1081 | dir_mark_down_do_not_dump(d); | ||
1082 | */ | ||
1083 | |||
1084 | 1115 | if (debug) | if (debug) |
1085 | 1116 | fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name); | fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name); |
1086 | 1117 | dir_mark_down_do_not_dump(p); | dir_mark_down_do_not_dump(p); |
1087 | 1118 | ||
1119 | memset(flags, '-', sizeof(flags) - 1); | ||
1120 | flags[sizeof(flags) - 1] = '\0'; | ||
1121 | |||
1122 | if (memcmp(d->file_names_sha1, p->file_names_sha1, SHA_DIGEST_LENGTH) != 0) | ||
1123 | flags[0] = 'M'; | ||
1124 | |||
1088 | 1125 | if (debug) | if (debug) |
1089 | fprintf(stderr, "DIR%c%s%c%s%c", | ||
1090 | sep, d->name, sep, p->name, final); | ||
1091 | fprintf(out, "DIR%c%s%c%s%c", | ||
1092 | sep, d->name, sep, p->name, final); | ||
1126 | fprintf(stderr, "DIR%c%s%c%s%c%s%c", | ||
1127 | sep, flags, sep, d->name, sep, p->name, final); | ||
1128 | fprintf(out, "DIR%c%s%c%s%c%s%c", | ||
1129 | sep, flags, sep, d->name, sep, p->name, final); | ||
1093 | 1130 | p = p->hash_next; | p = p->hash_next; |
1094 | 1131 | } | } |
1095 | 1132 | } | } |
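The store.c hunks above keep two digests per directory: one over file contents and one over file names, fed in the same content-sorted order, and they set `M` in the first flag slot when only the name digests differ. A minimal sketch of that idea for the files of a single directory follows. It rests on several assumptions: `struct entry`, `fnv1a`, `dir_digests` and `dir_flags` are hypothetical names, FNV-1a stands in for SHA-1 so the example has no OpenSSL dependency, and subdirectories (which the real `dir_build_hash` also folds in) are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FNV_INIT 14695981039346656037ULL

/* Hypothetical stand-in for one file: base name plus a content digest. */
struct entry {
	const char *name;
	uint64_t content;	/* stand-in for sha1_full */
};

/* FNV-1a, used here only as a small stand-in for SHA-1. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* qsort comparator: order entries by content digest. */
static int by_content(const void *a0, const void *b0)
{
	const struct entry *a = a0, *b = b0;

	if (a->content < b->content)
		return -1;
	if (a->content > b->content)
		return 1;
	return 0;
}

/* Sort by content digest, then fold contents and names into two digests. */
static void dir_digests(struct entry *e, size_t n,
	uint64_t *content, uint64_t *names)
{
	size_t i;

	assert(e != NULL && n > 0);

	qsort(e, n, sizeof(*e), by_content);

	*content = FNV_INIT;
	*names = FNV_INIT;
	for (i = 0; i < n; i++) {
		*content = fnv1a(*content, &e[i].content, sizeof(e[i].content));
		*names = fnv1a(*names, e[i].name, strlen(e[i].name));
	}
}

/* Build the 8-character flags column: 'M' in slot 0 when names differ. */
static void dir_flags(char flags[9], uint64_t names_a, uint64_t names_b)
{
	memset(flags, '-', 8);
	flags[8] = '\0';
	if (names_a != names_b)
		flags[0] = 'M';
}
```

For two directories whose files have the same contents under different names, the content digests agree while the name digests differ, so the flags come out as `M-------`. One caveat of this sketch: when two files in a directory share the same content digest, `qsort` may place them in either order, so the name digest is only stable if ties are broken by some further key.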
File store.h changed (mode: 100644) (index 4de63fb..37e6eac) | |||
... | ... | struct file_node | |
31 | 31 | struct file_node *hash_next; | struct file_node *hash_next; |
32 | 32 | struct dir_node *parent; | struct dir_node *parent; |
33 | 33 | struct file_node *duplicates; | struct file_node *duplicates; |
34 | time_t mtime; | ||
34 | 35 | }; | }; |
35 | 36 | ||
36 | 37 | struct dir_node | struct dir_node |
37 | 38 | { | { |
38 | 39 | char *name; | char *name; |
39 | 40 | unsigned char sha1[SHA_DIGEST_LENGTH]; | unsigned char sha1[SHA_DIGEST_LENGTH]; |
41 | unsigned char file_names_sha1[SHA_DIGEST_LENGTH]; | ||
40 | 42 | unsigned char no_dup_possible:1; | unsigned char no_dup_possible:1; |
41 | 43 | unsigned char do_not_dump:1; | unsigned char do_not_dump:1; |
42 | 44 | unsigned char left:1; | unsigned char left:1; |
... | ... | struct dir_node | |
49 | 51 | unsigned int no_of_files; | unsigned int no_of_files; |
50 | 52 | struct dir_node *parent; | struct dir_node *parent; |
51 | 53 | struct dir_node *hash_next; /* in the last phase, here we store duplicates */ | struct dir_node *hash_next; /* in the last phase, here we store duplicates */ |
54 | time_t mtime; | ||
52 | 55 | }; | }; |
53 | 56 | ||
54 | 57 |
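With `mtime` now stored in `struct file_node`, the comment in `file_add` ("We order by size, level, mtime, name", "Better to use qsort. TODO") suggests a comparator along the lines below. It is a sketch, not the author's code: `struct file_key` is a hypothetical subset of the real structure, and the field types are assumptions. It implements a strict lexicographic order over the four keys; as far as the hunk shows, the insertion loop tests each key on its own, so the two may not order every input identically.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical subset of struct file_node: only the ordering keys. */
struct file_key {
	const char *name;
	long long size;
	unsigned short level;
	time_t mtime;
};

/* qsort comparator: size, then level, then mtime (older first), then name. */
static int file_key_cmp(const void *a0, const void *b0)
{
	const struct file_key *a = a0, *b = b0;

	assert(a != NULL && b != NULL);

	if (a->size != b->size)
		return a->size < b->size ? -1 : 1;
	if (a->level != b->level)
		return a->level < b->level ? -1 : 1;
	if (a->mtime != b->mtime)
		return a->mtime < b->mtime ? -1 : 1;
	return strcmp(a->name, b->name);
}
```

It would be used as `qsort(v, n, sizeof(v[0]), file_key_cmp)` on an array of keys. Sorting a hash chain that is kept as a linked list would first need the nodes copied into such an array, much as `dir_files_hash` already does for the files of a directory.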
File tests/1/in/a1 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/1/in/a2 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/1/in/a3 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/1/in/b1 deleted (index 6484fb6..0000000) | |||
1 | bbbb |
File tests/1/in/b2 deleted (index 6484fb6..0000000) | |||
1 | bbbb |
File tests/1/in/c1 deleted (index baebf33..0000000) | |||
1 | cccc |
File tests/1/in/dir_a1/a4 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/1/in/dir_a1/a5 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/1/in/dir_b1/b3 deleted (index 6484fb6..0000000) | |||
1 | bbbb |
File tests/1/in/x/dir_a2/a6 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/1/in/x/dir_a2/a7 deleted (index 7284ab4..0000000) | |||
1 | aaaa |
File tests/2/expected deleted (index c24cb0f..0000000) | |||
1 | DIR in/d1 in/deeper/d2 |
File tests/2/in/d1/a deleted (index 5ee608e..0000000) | |||
1 | xxxx |
File tests/2/in/d1/b deleted (index 97aee46..0000000) | |||
1 | yyyy |
File tests/2/in/deeper/d2/c deleted (index 97aee46..0000000) | |||
1 | yyyy |
File tests/2/in/deeper/d2/d deleted (index 5ee608e..0000000) | |||
1 | xxxx |
File tests/3/expected deleted (index 0061f50..0000000) | |||
1 | DIR in/dir_a1 in/dir_a2 |
File tests/3/in/dir_a1/a1 deleted (index 5d308e1..0000000) | |||
1 | aaaa |
File tests/3/in/dir_a1/b1 deleted (index b433656..0000000) | |||
1 | bbbb |
File tests/3/in/dir_a2/a1x deleted (index b433656..0000000) | |||
1 | bbbb |
File tests/3/in/dir_a2/b1x deleted (index 5d308e1..0000000) | |||
1 | aaaa |
File tests/4/expected deleted (index a27abe8..0000000) | |||
1 | DIR in/dir1 in/fake/dir2 |
File tests/4/in/dir1/dirA/fileA deleted (index 81c545e..0000000) | |||
1 | 1234 |
File tests/4/in/dir1/dirB/fileB deleted (index 97b5955..0000000) | |||
1 | 12345678 |
File tests/4/in/fake/dir2/dirA/fileA deleted (index 81c545e..0000000) | |||
1 | 1234 |
File tests/4/in/fake/dir2/dirB/fileB deleted (index 97b5955..0000000) | |||
1 | 12345678 |
File tests/5/in/dir1/a deleted (index 2e65efe..0000000) | |||
1 | a |
File tests/5/in/dir2/a deleted (index 2e65efe..0000000) | |||
1 | a |
File tests/5/in/dir3/sub/a deleted (index 2e65efe..0000000) | |||
1 | a |
File tests/5/in/dir3/sub/fake deleted (index f0f877c..0000000) | |||
1 | fake |
File tests/README added (mode: 100644) (index 0000000..7f6bcc8) | |||
1 | When adding a new test, you must: | |
2 | - sort 'expected' file | ||
3 | - |
File tests/run.sh changed (mode: 100755) (index ab6a21d..30d25d0) | |||
1 | 1 | #!/bin/bash | #!/bin/bash |
2 | 2 | ||
3 | for t in `ls`; do | ||
4 | if [ "${t}" = "run.sh" ]; then | ||
5 | continue | ||
6 | fi | ||
7 | |||
3 | find -type d -name 't_*' -print | sort | cut -b3- | while read t; do | ||
8 | 4 | echo "Running test [${t}]..." | echo "Running test [${t}]..." |
9 | 5 | ( | ( |
10 | 6 | cd "${t}" | cd "${t}" |
11 | 7 | ||
8 | # Prepare stuff | ||
9 | ./pre.sh | ||
10 | if [ "${?}" != "0" ]; then | ||
11 | echo "Preparation for test [${t}] failed!" | ||
12 | exit 1 | ||
13 | fi | ||
14 | |||
12 | 15 | valgrind --tool=memcheck \ | valgrind --tool=memcheck \ |
13 | 16 | --num-callers=16 \ | --num-callers=16 \ |
14 | 17 | --leak-check=full \ | --leak-check=full \ |
... | ... | for t in `ls`; do | |
17 | 20 | --trace-children=yes \ | --trace-children=yes \ |
18 | 21 | --track-origins=yes \ | --track-origins=yes \ |
19 | 22 | ../../dupdump --verbose --debug --out test.out in &>test.log | ../../dupdump --verbose --debug --out test.out in &>test.log |
23 | sort test.out > test.out2 && mv test.out2 test.out | ||
20 | 24 | diff -u expected test.out > test.diff | diff -u expected test.out > test.diff |
21 | 25 | if [ "${?}" != "0" ]; then | if [ "${?}" != "0" ]; then |
22 | 26 | echo "Test [${t}] failed!" | echo "Test [${t}] failed!" |
File tests/t_1/expected renamed from tests/1/expected (similarity 80%) (mode: 100644) (index ebb5f4c..fa6f604) | |||
1 | DIR in/dir_a1 in/x/dir_a2 | ||
1 | DIR M------- in/dir_a1 in/x/dir_a2 | ||
2 | FILE in/b1 in/b2 | ||
3 | FILE in/b1 in/dir_b1/b3 | ||
2 | 4 | FILE in/dir_a1/a4 in/a1 | FILE in/dir_a1/a4 in/a1 |
3 | 5 | FILE in/dir_a1/a4 in/a2 | FILE in/dir_a1/a4 in/a2 |
4 | 6 | FILE in/dir_a1/a4 in/a3 | FILE in/dir_a1/a4 in/a3 |
5 | 7 | FILE in/dir_a1/a4 in/dir_a1/a5 | FILE in/dir_a1/a4 in/dir_a1/a5 |
6 | FILE in/b1 in/b2 | ||
7 | FILE in/b1 in/dir_b1/b3 |
File tests/t_1/pre.sh added (mode: 100755) (index 0000000..72d34d9) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | data_out "in/a1" "aaaa" | ||
6 | data_out "in/a2" "aaaa" | ||
7 | data_out "in/a3" "aaaa" | ||
8 | |||
9 | data_out "in/b1" "bbbb" | ||
10 | data_out "in/b2" "bbbb" | ||
11 | |||
12 | data_out "in/c1" "cccc" | ||
13 | |||
14 | data_out "in/dir_a1/a4" "aaaa" | ||
15 | data_out "in/dir_a1/a5" "aaaa" | ||
16 | |||
17 | data_out "in/dir_b1/b3" "bbbb" | ||
18 | |||
19 | data_out "in/x/dir_a2/a6" "aaaa" | ||
20 | data_out "in/x/dir_a2/a7" "aaaa" |
File tests/t_2/expected added (mode: 100644) (index 0000000..be96d76) | |||
1 | DIR M------- in/d1 in/deeper/d2 | ||
2 | FILE in/a1 in/a2 | ||
3 | FILE in/a1 in/a3 |
File tests/t_2/pre.sh added (mode: 100755) (index 0000000..b949783) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | data_out "in/a1" "aaaa" | ||
6 | data_out "in/a2" "aaaa" | ||
7 | data_out "in/a3" "aaaa" | ||
8 | |||
9 | data_out "in/d1/a" "xxxx" | ||
10 | data_out "in/d1/b" "yyyy" | ||
11 | |||
12 | data_out "in/deeper/d2/c" "yyyy" | ||
13 | data_out "in/deeper/d2/d" "xxxx" |
File tests/t_3/expected added (mode: 100644) (index 0000000..21930f4) | |||
1 | FILE in/a1 in/a2 | ||
2 | FILE in/a1 in/a3 | ||
3 | FILE in/a1 in/dir_a1/a1 | ||
4 | FILE in/a1 in/dir_a1/a4 | ||
5 | FILE in/a1 in/dir_a2/b1x | ||
6 | FILE in/dir_a1/b1 in/dir_a2/a1x |
File tests/t_3/pre.sh added (mode: 100755) (index 0000000..057d8bb) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | data_out "in/dir_a1/a1" "aaaa" | ||
6 | data_out "in/dir_a1/a4" "aaaa" | ||
7 | data_out "in/dir_a1/b1" "bbbb" | ||
8 | |||
9 | data_out "in/dir_a2/a1x" "bbbb" | ||
10 | data_out "in/dir_a2/b1x" "aaaa" | ||
11 | |||
12 | data_out "in/a1" "aaaa" | ||
13 | data_out "in/a2" "aaaa" | ||
14 | data_out "in/a3" "aaaa" |
File tests/t_4/README renamed from tests/4/README (similarity 64%) (mode: 100644) (index 7e28c5d..e97564a) | |||
... | ... | fake | |
14 | 14 | fileB | fileB |
15 | 15 | ||
16 | 16 | Should report only dir1 and dir2. | Should report only dir1 and dir2. |
17 | |||
18 | It seems I do not order the hash on file_names, and because of that the hash on the dir is not correct. | |
File tests/t_4/expected added (mode: 100644) (index 0000000..bff8ab8) | |||
1 | DIR -------- in/dir1 in/fake/dir2 |
File tests/t_4/pre.sh added (mode: 100755) (index 0000000..b3218bb) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | data_out "in/dir1/dirA/fileA" "1234" | ||
6 | data_out "in/dir1/dirB/fileB" "12345678" | ||
7 | |||
8 | data_out "in/fake/dir2/dirA/fileA" "1234" | ||
9 | data_out "in/fake/dir2/dirB/fileB" "12345678" |
File tests/t_5/expected renamed from tests/5/expected (similarity 50%) (mode: 100644) (index b54ccfd..8b30435) | |||
1 | DIR in/dir1 in/dir2 | ||
1 | DIR -------- in/dir1 in/dir2 | ||
2 | 2 | FILE in/dir1/a in/dir3/sub/a | FILE in/dir1/a in/dir3/sub/a |
File tests/t_5/pre.sh added (mode: 100755) (index 0000000..8fff918) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | data_out "in/dir1/a" "a" | ||
6 | |||
7 | data_out "in/dir2/a" "a" | ||
8 | |||
9 | data_out "in/dir3/sub/a" "a" | ||
10 | data_out "in/dir3/sub/fake" "fake" |
File tests/t_6/expected copied from file tests/6/expected (similarity 100%) |
File tests/t_6/pre.sh added (mode: 100755) (index 0000000..9a69aaa) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | mkdir -p in |
File tests/t_7/expected added (mode: 100644) (index 0000000..51b209b) | |||
1 | DIR M------- in/a in/b |
File tests/t_7/pre.sh added (mode: 100755) (index 0000000..8e6e040) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | data_out "in/a/name1" "aaaa" | ||
6 | data_out "in/b/name2" "aaaa" |
File tests/t_8/expected renamed from tests/6/expected (similarity 100%) |
File tests/t_8/pre.sh added (mode: 100755) (index 0000000..69c1107) | |||
1 | #!/bin/bash | ||
2 | |||
3 | . ../util.inc | ||
4 | |||
5 | rm -rf in | ||
6 | |||
7 | data_out "in/a/file1" "aaaa" | ||
8 | |||
9 | mkdir -p in/b | ||
10 | ln in/a/file1 in/b/file2 |
File tests/util.inc added (mode: 100644) (index 0000000..05c3f5d) | |||
1 | #!/bin/bash | ||
2 | |||
3 | function data_out() | ||
4 | { | ||
5 | file="${1}" | ||
6 | content="${2}" | ||
7 | |||
8 | dir=`dirname "${file}"` | ||
9 | mkdir -p "${dir}" | ||
10 | |||
11 | echo "${content}" > "${file}" | ||
12 | } |