List of commits:
Subject Hash Author Date (UTC)
Take care of mangled file name between two dirs. 65eb5f1841aae521c401ab52298041776159a35d Catalin(ux) M. BOIE 2014-06-21 12:14:36
duilder updates. Exclude Makefile when making tar.gz accfc9df80b5889b3e1f1183e5f60d70850b5671 Catalin(ux) M. BOIE 2014-06-21 06:09:12
Bump version to 0.2 078df2ea0759e4240ef7deef35744f7077957129 Catalin(ux) M. BOIE 2014-06-19 17:09:39
Fixed a case when some files were not dumped at all 0c69ce3cb20aae442da94cff9606936461177e1d Catalin(ux) M. BOIE 2014-06-19 17:05:16
First version that passes all tests. 9627508618bc2c783da838c2332e6896e3004c99 Catalin(ux) M. BOIE 2014-06-18 18:02:32
Fix a little problem with the man page. 1305ac52823f0206b98181c6435049f339be9c53 Catalin(ux) M. BOIE 2013-02-18 19:55:43
Lots of stuff 9c43842ac36feff6b29cb20a95ad4510a23bf472 Catalin(ux) M. BOIE 2012-07-20 15:43:51
Cosmetic + man a53df11bfc30c152c9f61fdb1bcc69dc6ec20765 Catalin(ux) M. BOIE 2012-06-24 12:11:17
First working version! 27bd1bf47c9fb707760d84ea3cf4241083fa283d Catalin(ux) M. BOIE 2012-06-22 21:10:30
Several fixes 2909b1ba2e99929e775ddfea5f4894c50694a638 Catalin(ux) M. BOIE 2012-06-19 13:08:50
First version 3d7935d9b8a91694fe8213998ce4d3910348d6ef Catalin(ux) M. BOIE 2012-05-06 19:40:40
Commit 65eb5f1841aae521c401ab52298041776159a35d - Take care of mangled file name between two dirs.
If file names differ between two directories but the content is the same,
we now add the flag 'M'.
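The effect of the 'M' flag can be sketched in isolation. This is a minimal sketch, not the code from store.c: build_dir_flags, DIGEST_LEN and FLAGS_LEN are hypothetical names, and the two digests are assumed to be SHA-1 hashes built from the file names of two directories whose content hashes already match.

```c
#include <assert.h>
#include <string.h>

#define DIGEST_LEN 20 /* assumption: size of a SHA-1 digest in bytes */
#define FLAGS_LEN  8  /* assumption: eight flag positions, as in "M-------" */

/*
 * Fill 'flags' (FLAGS_LEN + 1 bytes) with '-' and set 'M' in the first
 * position when the two file-name digests differ.
 */
void build_dir_flags(char *flags, const unsigned char *names_a,
	const unsigned char *names_b)
{
	assert(flags != NULL && names_a != NULL && names_b != NULL);

	memset(flags, '-', FLAGS_LEN);
	flags[FLAGS_LEN] = '\0';

	if (memcmp(names_a, names_b, DIGEST_LEN) != 0)
		flags[0] = 'M';
}
```

With equal name digests the result is "--------"; with different ones it is "M-------", which matches the second column of the new 'expected' files.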
Author: Catalin(ux) M. BOIE
Author date (UTC): 2014-06-21 12:14
Committer name: Catalin(ux) M. BOIE
Committer date (UTC): 2014-06-22 19:57
Parent(s): accfc9df80b5889b3e1f1183e5f60d70850b5671
Signing key:
Tree: 4a187aa1be2aa7d4cfbf6058ce96ab11035255bd
File Lines added Lines deleted
Makefile.in 1 0
TODO 10 15
dupdump.1 7 6
store.c 64 27
store.h 3 0
tests/1/in/a1 0 1
tests/1/in/a2 0 1
tests/1/in/a3 0 1
tests/1/in/b1 0 1
tests/1/in/b2 0 1
tests/1/in/c1 0 1
tests/1/in/dir_a1/a4 0 1
tests/1/in/dir_a1/a5 0 1
tests/1/in/dir_b1/b3 0 1
tests/1/in/x/dir_a2/a6 0 1
tests/1/in/x/dir_a2/a7 0 1
tests/2/expected 0 1
tests/2/in/d1/a 0 1
tests/2/in/d1/b 0 1
tests/2/in/deeper/d2/c 0 1
tests/2/in/deeper/d2/d 0 1
tests/3/expected 0 1
tests/3/in/dir_a1/a1 0 1
tests/3/in/dir_a1/b1 0 1
tests/3/in/dir_a2/a1x 0 1
tests/3/in/dir_a2/b1x 0 1
tests/4/expected 0 1
tests/4/in/dir1/dirA/fileA 0 1
tests/4/in/dir1/dirB/fileB 0 1
tests/4/in/fake/dir2/dirA/fileA 0 1
tests/4/in/fake/dir2/dirB/fileB 0 1
tests/5/in/dir1/a 0 1
tests/5/in/dir2/a 0 1
tests/5/in/dir3/sub/a 0 1
tests/5/in/dir3/sub/fake 0 1
tests/README 3 0
tests/run.sh 9 5
tests/t_1/expected 3 3
tests/t_1/pre.sh 20 0
tests/t_2/expected 3 0
tests/t_2/pre.sh 13 0
tests/t_3/expected 6 0
tests/t_3/pre.sh 14 0
tests/t_4/README 2 0
tests/t_4/expected 1 0
tests/t_4/pre.sh 9 0
tests/t_5/expected 1 1
tests/t_5/pre.sh 10 0
tests/t_6/expected 0 0
tests/t_6/pre.sh 5 0
tests/t_7/expected 1 0
tests/t_7/pre.sh 6 0
tests/t_8/expected 0 0
tests/t_8/pre.sh 10 0
tests/util.inc 12 0
File Makefile.in changed (mode: 100644) (index 3f2da1f..d27d99f)
... ... dupdump: dupdump.c $(OBJS)
16 16 clean: clean:
17 17 @rm -fv $(OBJS) dupdump vgcore.* @rm -fv $(OBJS) dupdump vgcore.*
18 18 @-rm -f $(PRJ)-*.rpm $(PRJ)-*-*-*.tgz $(PRJ)-*.tar.gz @-rm -f $(PRJ)-*.rpm $(PRJ)-*-*-*.tgz $(PRJ)-*.tar.gz
19 make -C tests clean
19 20
20 21
21 22 install: all install: all
File TODO changed (mode: 100644) (index af771af..dfc0504)
1 [ ] 1-bit fileds are not printed as %hhu!
2 [ ] Why we use 'left' flag?! Because we mark do_not_dump in the same place
3 where we set 'left'! Or, we should mark as left and not mark as
4 do_not_dump.
1 [ ] We must construct the test because we are playing with mtime now!
2 [ ] Adapt man file to recent changes: flags etc.
3 [ ] Ignore empty files.
4 [ ] 1-bit fields are not printed as %hhu!
5 5 [ ] Because we ignore !dir and !files, we may not have really identical [ ] Because we ignore !dir and !files, we may not have really identical
6 6 directories. In one of them we may have a socket, for example. directories. In one of them we may have a socket, for example.
7 Add a reporting flag for this situation. Or, do not ignore other
8 type of files.
7 9 [ ] I must document a high level view over "algorithm". Even myself [ ] I must document a high level view over "algorithm". Even myself
8 10 I do not remember what I am doing... I do not remember what I am doing...
9 11 [ ] id dir1/subdir1 = dir2/subdir1 + dir1/subdir2 = dir2/subdir2 => dir1 = dir2. [ ] id dir1/subdir1 = dir2/subdir1 + dir1/subdir2 = dir2/subdir2 => dir1 = dir2.
10 12 It seems I report both the small directories and the big one. It seems I report both the small directories and the big one.
11
12 Bugs:
13 [ ] Seems an empty files matches dumplog.txt! check bug1 dir!
14
15 13 [ ] Use fadvise to not cache data in RAM. [ ] Use fadvise to not cache data in RAM.
16 14 [ ] Use more threads [ ] Use more threads
17 15 [ ] We should order by mtime, older one being the first shown. [ ] We should order by mtime, older one being the first shown.
 
... ... Bugs:
26 24 dir4=dir3 dir4=dir3
27 25
28 26 [ ] We could throw away unique files. [ ] We could throw away unique files.
29 [ ] Comparing in O(N*N) sucks!
27 [ ] Comparing in O(N*N) sucks! Where?
30 28 [ ] Install man. [ ] Install man.
31 29 [ ] Dump in stats also the max memory used. [ ] Dump in stats also the max memory used.
32 30 [ ] Dump two types of dirs: DIR AND DIRFNC (File Names Changed). [ ] Dump two types of dirs: DIR AND DIRFNC (File Names Changed).
33 Maybe also for files
34 [ ] Another type of DIR is when a dir is included in another one.
35 How should I report it?
36
31 Maybe list files that were renamed (cmd line flag).
37 32 [ ] Use a cache, specified by command line. Use inode and mtime for key? [ ] Use a cache, specified by command line. Use inode and mtime for key?
38 [ ] Dump memory peak usage for statistics.
39 [ ]
33 [ ] dir1/a+b, dir2/c+d, c is a soft/hard link to a. Content of b is the same
34 with the content of d. What should I do?
File dupdump.1 changed (mode: 100644) (index 6b31415..b032b52)
1 1 .TH DUPDUMP 1 .TH DUPDUMP 1
2 2 .\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection .\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
3 3 .\" other parms are allowed: see man(7), man(1) .\" other parms are allowed: see man(7), man(1)
4 .SH NAME
4 .SH "NAME"
5 5 dupdump \- finds duplicate files in a given set of directories dupdump \- finds duplicate files in a given set of directories
6 .SH SYNOPSIS
6 .SH "SYNOPSIS"
7 7 .B dupdump .B dupdump
8 8 [ [
9 9 .I options .I options
10 10 ] ]
11 11 .I <dir1> .I <dir1>
12 12 \|.\|.\|. \|.\|.\|.
13 .I <dirN>
13 14
14 15 .SH "DESCRIPTION" .SH "DESCRIPTION"
15 16 Searches a list of dirs, recursively to find dir and file matches. Searches a list of dirs, recursively to find dir and file matches.
 
... ... It is using SHA-1 to test the match. It outputs a list of three columns:
17 18 first is the type of match (DIR or FILE) and the second and the third, the first is the type of match (DIR or FILE) and the second and the third, the
18 19 matches. matches.
19 20
20 .SH OPTIONS
21 .SH "OPTIONS"
21 22 .TP .TP
22 23 .B -z --zero .B -z --zero
23 24 use \\0 as fields and records separator instead of \\t and \\n use \\0 as fields and records separator instead of \\t and \\n
 
... ... when a dir match is possible
29 30 .B -o --out .B -o --out
30 31 specify where to store the list of duplicates (default stdout) specify where to store the list of duplicates (default stdout)
31 32 .TP .TP
32 .B -v --verbose:
33 .B -v --verbose
33 34 be more verbose be more verbose
34 35 .TP .TP
35 36 .B -d --debug .B -d --debug
 
... ... dump debug information useful for the developers
43 44 .UR "http://kernel.embedromix.ro/us/" .UR "http://kernel.embedromix.ro/us/"
44 45 Home page Home page
45 46 .UE . .UE .
46 .SH NOTES
47 .SH "NOTES"
47 48 This program does not delete any files. It is your responsibility to This program does not delete any files. It is your responsibility to
48 49 take care of what to delete. take care of what to delete.
49 .SH AUTHOR
50 .SH "AUTHOR"
50 51 .UR catab-dupdump@embedromix.ro .UR catab-dupdump@embedromix.ro
51 52 Catalin(ux) M. BOIE Catalin(ux) M. BOIE
52 53 .UE . .UE .
File store.c changed (mode: 100644) (index cb70b4b..b64e73f)
... ... int file_add(const char *file, const struct stat *s,
259 259 q->dev = s->st_dev; q->dev = s->st_dev;
260 260 q->ino = s->st_ino; q->ino = s->st_ino;
261 261 q->level = level; q->level = level;
262 q->mtime = s->st_mtime;
262 263
263 264 /* link with dir */ /* link with dir */
264 265 parent = dir_current[level - 1]; parent = dir_current[level - 1];
 
... ... int file_add(const char *file, const struct stat *s,
273 274 if (file_info[hash] == NULL) { if (file_info[hash] == NULL) {
274 275 file_info[hash] = q; file_info[hash] = q;
275 276 } else { } else {
276 /* search for a bigger item and insert before it */
277 /* We order by size, level, mtime, name */
278 /* Better to use qsort. TODO */
277 279 p = file_info[hash]; p = file_info[hash];
278 280 prev = NULL; prev = NULL;
279 281 while (p) { while (p) {
280 if (size == p->size) {
281 if (level < p->level)
282 if (q->size < p->size)
283 break;
284
285 if (q->size == p->size) {
286 if (q->level < p->level)
282 287 break; break;
283 288
284 if (strcmp(file, p->name) < 0)
289 if (q->mtime < p->mtime)
285 290 break; break;
286 }
287 291
288 if (size < p->size)
289 break;
292 if (strcmp(q->name, p->name) < 0)
293 break;
294 }
290 295
291 296 prev = p; prev = p;
292 297 p = p->hash_next; p = p->hash_next;
 
... ... void dir_dump_node(const struct dir_node *d, const unsigned int level)
424 429 struct dir_node *subdir; struct dir_node *subdir;
425 430 struct file_node *file; struct file_node *file;
426 431 char dump[SHA_DIGEST_LENGTH * 2 + 1]; char dump[SHA_DIGEST_LENGTH * 2 + 1];
432 char fnh[SHA_DIGEST_LENGTH * 2 + 1];
427 433
428 434 memset(prefix, ' ', (level + 1) * 2); memset(prefix, ' ', (level + 1) * 2);
429 435 prefix[(level + 1) * 2] = '\0'; prefix[(level + 1) * 2] = '\0';
430 436
431 437 sha1_dump(dump, d->sha1, 8); sha1_dump(dump, d->sha1, 8);
438 sha1_dump(fnh, d->file_names_sha1, 8);
432 439 fprintf(stderr, "%sD '%s' d=%p subdirs=%p next_sibling=%p" fprintf(stderr, "%sD '%s' d=%p subdirs=%p next_sibling=%p"
433 440 " files=%p parent=%p no_dup_possible=%u do_not_dump=%u" " files=%p parent=%p no_dup_possible=%u do_not_dump=%u"
434 " level=%hu hash_next=%p left=%hhu sha1=%s\n",
441 " level=%hu hash_next=%p left=%hhu sha1=%s file_names_sha1=%s\n",
435 442 prefix, d->name, d, d->subdirs, d->next_sibling, prefix, d->name, d, d->subdirs, d->next_sibling,
436 443 d->files, d->parent, d->no_dup_possible, d->do_not_dump, d->files, d->parent, d->no_dup_possible, d->do_not_dump,
437 d->level, d->hash_next, d->left, dump);
444 d->level, d->hash_next, d->left, dump, fnh);
438 445
439 446 subdir = d->subdirs; subdir = d->subdirs;
440 447 while (subdir) { while (subdir) {
 
... ... static void dir_mark_up_no_dup_possible(struct dir_node *d)
511 518 /* /*
512 519 * When we list a folder on the left side, we must mark whole hierarchy under * When we list a folder on the left side, we must mark whole hierarchy under
513 520 * it as 'do_not_dump'. Else, we will dump its files and we do not want that. * it as 'do_not_dump'. Else, we will dump its files and we do not want that.
514 * TODO: But, we may have dir1 == dir2 and dir1/file1 == dir3/file3. In this case we want to dump dir1/file!
515 521 */ */
516 522 static void dir_mark_down_do_not_dump(struct dir_node *d) static void dir_mark_down_do_not_dump(struct dir_node *d)
517 523 { {
 
... ... int file_find_dups(void)
716 722 } }
717 723
718 724 if (debug) { if (debug) {
719 fprintf(stderr, "[*] Dump chain %u start:\n", hash);
725 if (debug)
726 fprintf(stderr, "[*] Dump chain %u start:\n", hash);
720 727 q = file_info[hash]; q = file_info[hash];
721 728 while (q) { while (q) {
722 729 fprintf(stderr, "%s:\n", q->name); fprintf(stderr, "%s:\n", q->name);
723 730 dups = q->duplicates; dups = q->duplicates;
724 731 while(dups) { while(dups) {
725 fprintf(stderr, "\t%s\n", dups->name);
732 if (debug)
733 fprintf(stderr, "\t%s\n", dups->name);
726 734 dups = dups->duplicates; dups = dups->duplicates;
727 735 } }
728 736 q = q->hash_next; q = q->hash_next;
729 737 } }
730 fprintf(stderr, "[*] Dump chain %u stop\n", hash);
738 if (debug)
739 fprintf(stderr, "[*] Dump chain %u stop\n", hash);
731 740 } }
732 741 } }
733 742
 
... ... static int file_compare_hashes(const void *a0, const void *b0)
761 770 * We need to sort because the order of files in dirs may differ because * We need to sort because the order of files in dirs may differ because
762 771 * the names may be different but the content the same. * the names may be different but the content the same.
763 772 * TODO: Shouldn't we test if a file is unique=1 and skip the checksum of dir??? * TODO: Shouldn't we test if a file is unique=1 and skip the checksum of dir???
773 * We return the file names hash in @fn.
764 774 */ */
765 static int dir_files_hash(unsigned char *hash, struct dir_node *d)
775 static int dir_files_hash(unsigned char *hash, unsigned char *fn,
776 struct dir_node *d)
766 777 { {
767 778 struct file_node *p; struct file_node *p;
768 779 struct file_node **u; struct file_node **u;
769 780 unsigned int i, mem; unsigned int i, mem;
770 SHA_CTX c;
781 SHA_CTX c, fnh;
782 char *base_name;
771 783
772 784 if (d->files == NULL) { if (d->files == NULL) {
773 785 memset(hash, 0, SHA_DIGEST_LENGTH); memset(hash, 0, SHA_DIGEST_LENGTH);
786 memset(fn, 0, SHA_DIGEST_LENGTH);
774 787 return 0; return 0;
775 788 } }
776 789
 
... ... static int dir_files_hash(unsigned char *hash, struct dir_node *d)
790 803 qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes); qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes);
791 804
792 805 SHA1_Init(&c); SHA1_Init(&c);
806 SHA1_Init(&fnh);
793 807
794 808 i = 0; i = 0;
795 809 while (i < d->no_of_files) { while (i < d->no_of_files) {
796 810 SHA1_Update(&c, u[i]->sha1_full, SHA_DIGEST_LENGTH); SHA1_Update(&c, u[i]->sha1_full, SHA_DIGEST_LENGTH);
811
812 base_name = basename(u[i]->name);
813 if (debug)
814 fprintf(stderr, "%s: XXX: add file name hash of [%s]\n", __func__, base_name);
815
816 SHA1_Update(&fnh, base_name, strlen(base_name));
797 817 i++; i++;
798 818 } }
799 819
800 820 SHA1_Final(hash, &c); SHA1_Final(hash, &c);
821 SHA1_Final(fn, &fnh);
801 822
802 823 free(u); free(u);
803 824
 
... ... static int dir_files_hash(unsigned char *hash, struct dir_node *d)
810 831 static long long dir_build_hash(struct dir_node *d) static long long dir_build_hash(struct dir_node *d)
811 832 { {
812 833 struct dir_node *subdir; struct dir_node *subdir;
813 SHA_CTX c;
834 SHA_CTX c, fnh;
814 835 unsigned char files_hash[SHA_DIGEST_LENGTH]; unsigned char files_hash[SHA_DIGEST_LENGTH];
836 unsigned char file_names_sha1[SHA_DIGEST_LENGTH];
815 837 int err; int err;
816 838 long long no_of_possible_dirs = 0; long long no_of_possible_dirs = 0;
817 839 long long ret; long long ret;
840 char *base_name;
818 841
819 842 if (debug) if (debug)
820 843 fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n", fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n",
 
... ... static long long dir_build_hash(struct dir_node *d)
831 854 no_of_possible_dirs++; no_of_possible_dirs++;
832 855
833 856 /* Order files by hash to compute correct hashes */ /* Order files by hash to compute correct hashes */
834 err = dir_files_hash(files_hash, d);
857 err = dir_files_hash(files_hash, file_names_sha1, d);
835 858 if (err != 0) if (err != 0)
836 859 return -1; return -1;
837 860
838 861 SHA1_Init(&c); SHA1_Init(&c);
862 SHA1_Init(&fnh);
839 863 SHA1_Update(&c, files_hash, SHA_DIGEST_LENGTH); SHA1_Update(&c, files_hash, SHA_DIGEST_LENGTH);
864 SHA1_Update(&fnh, file_names_sha1, SHA_DIGEST_LENGTH);
840 865
866 /* At the same time, we build hash of file names */
841 867 subdir = d->subdirs; subdir = d->subdirs;
842 868 while (subdir) { while (subdir) {
843 869 ret = dir_build_hash(subdir); ret = dir_build_hash(subdir);
844 870 if (ret == -1) if (ret == -1)
845 871 return -1; return -1;
846 872
873 base_name = basename(subdir->name);
874 if (debug)
875 fprintf(stderr, "%s: XXX: add subdir name to fnh [%s]\n", __func__, base_name);
876 SHA1_Update(&fnh, base_name, strlen(base_name));
877
847 878 no_of_possible_dirs += ret; no_of_possible_dirs += ret;
848 879 SHA1_Update(&c, subdir->sha1, SHA_DIGEST_LENGTH); SHA1_Update(&c, subdir->sha1, SHA_DIGEST_LENGTH);
880 if (debug)
881 fprintf(stderr, "%s: XXX: add subdir->f_n_sha1 to fnh [%s]\n", __func__, subdir->name);
882 SHA1_Update(&fnh, subdir->file_names_sha1, SHA_DIGEST_LENGTH);
883
849 884 subdir = subdir->next_sibling; subdir = subdir->next_sibling;
850 885 } }
851 886
852 887 SHA1_Final(d->sha1, &c); SHA1_Final(d->sha1, &c);
888 SHA1_Final(d->file_names_sha1, &fnh);
853 889
854 890 return no_of_possible_dirs; return no_of_possible_dirs;
855 891 } }
 
... ... void dir_dump_duplicates(struct dir_node *d, const unsigned int zero)
1023 1059 { {
1024 1060 struct dir_node *p; struct dir_node *p;
1025 1061 char sep, final; char sep, final;
1062 char flags[9];
1026 1063
1027 1064 if (debug) if (debug)
1028 1065 fprintf(stderr, "[*] dir_dump_duplicates(%s)\n", d->name); fprintf(stderr, "[*] dir_dump_duplicates(%s)\n", d->name);
 
... ... void dir_dump_duplicates(struct dir_node *d, const unsigned int zero)
1075 1112 fprintf(stderr, "dir_dump_duplicates: set do_not_dump on 'left' [%s]\n", d->name); fprintf(stderr, "dir_dump_duplicates: set do_not_dump on 'left' [%s]\n", d->name);
1076 1113 dir_mark_left(d); dir_mark_left(d);
1077 1114
1078 /*
1079 if (debug)
1080 fprintf(stderr, "dir_dump_duplicates: set do_not_dump=1 on left [%s]\n", d->name);
1081 dir_mark_down_do_not_dump(d);
1082 */
1083
1084 1115 if (debug) if (debug)
1085 1116 fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name); fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name);
1086 1117 dir_mark_down_do_not_dump(p); dir_mark_down_do_not_dump(p);
1087 1118
1119 memset(flags, '-', sizeof(flags) - 1);
1120 flags[sizeof(flags) - 1] = '\0';
1121
1122 if (memcmp(d->file_names_sha1, p->file_names_sha1, SHA_DIGEST_LENGTH) != 0)
1123 flags[0] = 'M';
1124
1088 1125 if (debug) if (debug)
1089 fprintf(stderr, "DIR%c%s%c%s%c",
1090 sep, d->name, sep, p->name, final);
1091 fprintf(out, "DIR%c%s%c%s%c",
1092 sep, d->name, sep, p->name, final);
1126 fprintf(stderr, "DIR%c%s%c%s%c%s%c",
1127 sep, flags, sep, d->name, sep, p->name, final);
1128 fprintf(out, "DIR%c%s%c%s%c%s%c",
1129 sep, flags, sep, d->name, sep, p->name, final);
1093 1130 p = p->hash_next; p = p->hash_next;
1094 1131 } }
1095 1132 } }
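The comment added to file_add() says the chain is ordered by size, level, mtime, name, and that qsort would be better. A minimal sketch of such a comparator follows. It assumes a strict lexicographic reading of that order, which is not exactly what the insertion loop in the diff does. struct file_key is a hypothetical, reduced stand-in for struct file_node, and file_key_cmp is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical, reduced stand-in for struct file_node */
struct file_key {
	long long size;
	unsigned short level;
	time_t mtime;
	const char *name;
};

/* qsort-style comparator: order by size, then level, then mtime, then name */
int file_key_cmp(const void *a0, const void *b0)
{
	const struct file_key *a = a0;
	const struct file_key *b = b0;

	assert(a != NULL && b != NULL);

	if (a->size != b->size)
		return (a->size < b->size) ? -1 : 1;
	if (a->level != b->level)
		return (a->level < b->level) ? -1 : 1;
	if (a->mtime != b->mtime)
		return (a->mtime < b->mtime) ? -1 : 1;

	return strcmp(a->name, b->name);
}
```

An array of such keys could then be sorted with qsort(keys, n, sizeof(keys[0]), file_key_cmp) instead of walking the chain on every insert.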
File store.h changed (mode: 100644) (index 4de63fb..37e6eac)
... ... struct file_node
31 31 struct file_node *hash_next; struct file_node *hash_next;
32 32 struct dir_node *parent; struct dir_node *parent;
33 33 struct file_node *duplicates; struct file_node *duplicates;
34 time_t mtime;
34 35 }; };
35 36
36 37 struct dir_node struct dir_node
37 38 { {
38 39 char *name; char *name;
39 40 unsigned char sha1[SHA_DIGEST_LENGTH]; unsigned char sha1[SHA_DIGEST_LENGTH];
41 unsigned char file_names_sha1[SHA_DIGEST_LENGTH];
40 42 unsigned char no_dup_possible:1; unsigned char no_dup_possible:1;
41 43 unsigned char do_not_dump:1; unsigned char do_not_dump:1;
42 44 unsigned char left:1; unsigned char left:1;
 
... ... struct dir_node
49 51 unsigned int no_of_files; unsigned int no_of_files;
50 52 struct dir_node *parent; struct dir_node *parent;
51 53 struct dir_node *hash_next; /* in the last phase, here we store duplicates */ struct dir_node *hash_next; /* in the last phase, here we store duplicates */
54 time_t mtime;
52 55 }; };
53 56
54 57
File tests/1/in/a1 deleted (index 7284ab4..0000000)
1 aaaa
File tests/1/in/a2 deleted (index 7284ab4..0000000)
1 aaaa
File tests/1/in/a3 deleted (index 7284ab4..0000000)
1 aaaa
File tests/1/in/b1 deleted (index 6484fb6..0000000)
1 bbbb
File tests/1/in/b2 deleted (index 6484fb6..0000000)
1 bbbb
File tests/1/in/c1 deleted (index baebf33..0000000)
1 cccc
File tests/1/in/dir_a1/a4 deleted (index 7284ab4..0000000)
1 aaaa
File tests/1/in/dir_a1/a5 deleted (index 7284ab4..0000000)
1 aaaa
File tests/1/in/dir_b1/b3 deleted (index 6484fb6..0000000)
1 bbbb
File tests/1/in/x/dir_a2/a6 deleted (index 7284ab4..0000000)
1 aaaa
File tests/1/in/x/dir_a2/a7 deleted (index 7284ab4..0000000)
1 aaaa
File tests/2/expected deleted (index c24cb0f..0000000)
1 DIR in/d1 in/deeper/d2
File tests/2/in/d1/a deleted (index 5ee608e..0000000)
1 xxxx
File tests/2/in/d1/b deleted (index 97aee46..0000000)
1 yyyy
File tests/2/in/deeper/d2/c deleted (index 97aee46..0000000)
1 yyyy
File tests/2/in/deeper/d2/d deleted (index 5ee608e..0000000)
1 xxxx
File tests/3/expected deleted (index 0061f50..0000000)
1 DIR in/dir_a1 in/dir_a2
File tests/3/in/dir_a1/a1 deleted (index 5d308e1..0000000)
1 aaaa
File tests/3/in/dir_a1/b1 deleted (index b433656..0000000)
1 bbbb
File tests/3/in/dir_a2/a1x deleted (index b433656..0000000)
1 bbbb
File tests/3/in/dir_a2/b1x deleted (index 5d308e1..0000000)
1 aaaa
File tests/4/expected deleted (index a27abe8..0000000)
1 DIR in/dir1 in/fake/dir2
File tests/4/in/dir1/dirA/fileA deleted (index 81c545e..0000000)
1 1234
File tests/4/in/dir1/dirB/fileB deleted (index 97b5955..0000000)
1 12345678
File tests/4/in/fake/dir2/dirA/fileA deleted (index 81c545e..0000000)
1 1234
File tests/4/in/fake/dir2/dirB/fileB deleted (index 97b5955..0000000)
1 12345678
File tests/5/in/dir1/a deleted (index 2e65efe..0000000)
1 a
File tests/5/in/dir2/a deleted (index 2e65efe..0000000)
1 a
File tests/5/in/dir3/sub/a deleted (index 2e65efe..0000000)
1 a
File tests/5/in/dir3/sub/fake deleted (index f0f877c..0000000)
1 fake
File tests/README added (mode: 100644) (index 0000000..7f6bcc8)
1 When adding a new test, you must:
2 - sort 'expected' file
3 -
File tests/run.sh changed (mode: 100755) (index ab6a21d..30d25d0)
1 1 #!/bin/bash #!/bin/bash
2 2
3 for t in `ls`; do
4 if [ "${t}" = "run.sh" ]; then
5 continue
6 fi
7
3 find -type d -name 't_*' -print | sort | cut -b3- | while read t; do
8 4 echo "Running test [${t}]..." echo "Running test [${t}]..."
9 5 ( (
10 6 cd "${t}" cd "${t}"
11 7
8 # Prepare stuff
9 ./pre.sh
10 if [ "${?}" != "0" ]; then
11 echo "Preparation for test [${t}] failed!"
12 exit 1
13 fi
14
12 15 valgrind --tool=memcheck \ valgrind --tool=memcheck \
13 16 --num-callers=16 \ --num-callers=16 \
14 17 --leak-check=full \ --leak-check=full \
 
... ... for t in `ls`; do
17 20 --trace-children=yes \ --trace-children=yes \
18 21 --track-origins=yes \ --track-origins=yes \
19 22 ../../dupdump --verbose --debug --out test.out in &>test.log ../../dupdump --verbose --debug --out test.out in &>test.log
23 sort test.out > test.out2 && mv test.out2 test.out
20 24 diff -u expected test.out > test.diff diff -u expected test.out > test.diff
21 25 if [ "${?}" != "0" ]; then if [ "${?}" != "0" ]; then
22 26 echo "Test [${t}] failed!" echo "Test [${t}] failed!"
File tests/t_1/expected renamed from tests/1/expected (similarity 80%) (mode: 100644) (index ebb5f4c..fa6f604)
1 DIR in/dir_a1 in/x/dir_a2
1 DIR M------- in/dir_a1 in/x/dir_a2
2 FILE in/b1 in/b2
3 FILE in/b1 in/dir_b1/b3
2 4 FILE in/dir_a1/a4 in/a1 FILE in/dir_a1/a4 in/a1
3 5 FILE in/dir_a1/a4 in/a2 FILE in/dir_a1/a4 in/a2
4 6 FILE in/dir_a1/a4 in/a3 FILE in/dir_a1/a4 in/a3
5 7 FILE in/dir_a1/a4 in/dir_a1/a5 FILE in/dir_a1/a4 in/dir_a1/a5
6 FILE in/b1 in/b2
7 FILE in/b1 in/dir_b1/b3
File tests/t_1/pre.sh added (mode: 100755) (index 0000000..72d34d9)
1 #!/bin/bash
2
3 . ../util.inc
4
5 data_out "in/a1" "aaaa"
6 data_out "in/a2" "aaaa"
7 data_out "in/a3" "aaaa"
8
9 data_out "in/b1" "bbbb"
10 data_out "in/b2" "bbbb"
11
12 data_out "in/c1" "cccc"
13
14 data_out "in/dir_a1/a4" "aaaa"
15 data_out "in/dir_a1/a5" "aaaa"
16
17 data_out "in/dir_b1/b3" "bbbb"
18
19 data_out "in/x/dir_a2/a6" "aaaa"
20 data_out "in/x/dir_a2/a7" "aaaa"
File tests/t_2/expected added (mode: 100644) (index 0000000..be96d76)
1 DIR M------- in/d1 in/deeper/d2
2 FILE in/a1 in/a2
3 FILE in/a1 in/a3
File tests/t_2/pre.sh added (mode: 100755) (index 0000000..b949783)
1 #!/bin/bash
2
3 . ../util.inc
4
5 data_out "in/a1" "aaaa"
6 data_out "in/a2" "aaaa"
7 data_out "in/a3" "aaaa"
8
9 data_out "in/d1/a" "xxxx"
10 data_out "in/d1/b" "yyyy"
11
12 data_out "in/deeper/d2/c" "yyyy"
13 data_out "in/deeper/d2/d" "xxxx"
File tests/t_3/expected added (mode: 100644) (index 0000000..21930f4)
1 FILE in/a1 in/a2
2 FILE in/a1 in/a3
3 FILE in/a1 in/dir_a1/a1
4 FILE in/a1 in/dir_a1/a4
5 FILE in/a1 in/dir_a2/b1x
6 FILE in/dir_a1/b1 in/dir_a2/a1x
File tests/t_3/pre.sh added (mode: 100755) (index 0000000..057d8bb)
1 #!/bin/bash
2
3 . ../util.inc
4
5 data_out "in/dir_a1/a1" "aaaa"
6 data_out "in/dir_a1/a4" "aaaa"
7 data_out "in/dir_a1/b1" "bbbb"
8
9 data_out "in/dir_a2/a1x" "bbbb"
10 data_out "in/dir_a2/b1x" "aaaa"
11
12 data_out "in/a1" "aaaa"
13 data_out "in/a2" "aaaa"
14 data_out "in/a3" "aaaa"
File tests/t_4/README renamed from tests/4/README (similarity 64%) (mode: 100644) (index 7e28c5d..e97564a)
... ... fake
14 14 fileB fileB
15 15
16 16 Should report only dir1 and dir2. Should report only dir1 and dir2.
17
18 It seems I do not sort the hash over file_names, and because of that the dir hash is not correct.
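The note above suspects that unsorted file names make the directory hash depend on traversal order. One possible way to remove that dependency, sketched here under assumptions, is to sort the base names before hashing them. This is not what store.c does: names_hash, name_cmp and fnv1a_update are hypothetical names, and FNV-1a is swapped in for SHA-1 only so that the sketch needs no OpenSSL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings */
static int name_cmp(const void *a, const void *b)
{
	return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* 32-bit FNV-1a update step; a stand-in for SHA1_Update in this sketch */
static uint32_t fnv1a_update(uint32_t h, const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char) s[i];
		h *= 16777619u;
	}

	return h;
}

/* Hash the names in sorted order; note that the array is sorted in place */
uint32_t names_hash(const char **names, size_t n)
{
	uint32_t h = 2166136261u; /* FNV offset basis */
	size_t i;

	if (n == 0)
		return h;

	assert(names != NULL);
	qsort(names, n, sizeof(names[0]), name_cmp);

	/* Hash the trailing '\0' too, so "ab" + "c" differs from "a" + "bc" */
	for (i = 0; i < n; i++)
		h = fnv1a_update(h, names[i], strlen(names[i]) + 1);

	return h;
}
```

With this approach two directories holding the same set of names produce the same name hash regardless of the order in which the entries were read.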
File tests/t_4/expected added (mode: 100644) (index 0000000..bff8ab8)
1 DIR -------- in/dir1 in/fake/dir2
File tests/t_4/pre.sh added (mode: 100755) (index 0000000..b3218bb)
1 #!/bin/bash
2
3 . ../util.inc
4
5 data_out "in/dir1/dirA/fileA" "1234"
6 data_out "in/dir1/dirB/fileB" "12345678"
7
8 data_out "in/fake/dir2/dirA/fileA" "1234"
9 data_out "in/fake/dir2/dirB/fileB" "12345678"
File tests/t_5/expected renamed from tests/5/expected (similarity 50%) (mode: 100644) (index b54ccfd..8b30435)
1 DIR in/dir1 in/dir2
1 DIR -------- in/dir1 in/dir2
2 2 FILE in/dir1/a in/dir3/sub/a FILE in/dir1/a in/dir3/sub/a
File tests/t_5/pre.sh added (mode: 100755) (index 0000000..8fff918)
1 #!/bin/bash
2
3 . ../util.inc
4
5 data_out "in/dir1/a" "a"
6
7 data_out "in/dir2/a" "a"
8
9 data_out "in/dir3/sub/a" "a"
10 data_out "in/dir3/sub/fake" "fake"
File tests/t_6/expected copied from file tests/6/expected (similarity 100%)
File tests/t_6/pre.sh added (mode: 100755) (index 0000000..9a69aaa)
1 #!/bin/bash
2
3 . ../util.inc
4
5 mkdir -p in
File tests/t_7/expected added (mode: 100644) (index 0000000..51b209b)
1 DIR M------- in/a in/b
File tests/t_7/pre.sh added (mode: 100755) (index 0000000..8e6e040)
1 #!/bin/bash
2
3 . ../util.inc
4
5 data_out "in/a/name1" "aaaa"
6 data_out "in/b/name2" "aaaa"
File tests/t_8/expected renamed from tests/6/expected (similarity 100%)
File tests/t_8/pre.sh added (mode: 100755) (index 0000000..69c1107)
1 #!/bin/bash
2
3 . ../util.inc
4
5 rm -rf in
6
7 data_out "in/a/file1" "aaaa"
8
9 mkdir -p in/b
10 ln in/a/file1 in/b/file2
File tests/util.inc added (mode: 100644) (index 0000000..05c3f5d)
1 #!/bin/bash
2
3 function data_out()
4 {
5 file="${1}"
6 content="${2}"
7
8 dir=`dirname "${file}"`
9 mkdir -p "${dir}"
10
11 echo "${content}" > "${file}"
12 }
Hints:
Before your first commit, do not forget to set up your git environment:
git config --global user.name "your_name_here"
git config --global user.email "your@email_here"

Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/catalinux/dupdump

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/catalinux/dupdump

Clone this repository using git:
git clone git://git.rocketgit.com/user/catalinux/dupdump

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main