RocketGit

catalinux / dupdump (public) (License: GPLv3) (since 2016-03-01) (hash sha1)

Find duplicated files and directories.

Clone URLs:
https://rocketgit.com/user/catalinux/dupdump
ssh://rocketgit@ssh.rocketgit.com/user/catalinux/dupdump
git://git.rocketgit.com/user/catalinux/dupdump

List of commits:

Subject	Hash	Author	Date (UTC)
Take care of mangled file name between two dirs.	65eb5f1841aae521c401ab52298041776159a35d	Catalin(ux) M. BOIE	2014-06-21 12:14:36
duilder updates. Exclude Makefile when making tar.gz	accfc9df80b5889b3e1f1183e5f60d70850b5671	Catalin(ux) M. BOIE	2014-06-21 06:09:12
Bump version to 0.2	078df2ea0759e4240ef7deef35744f7077957129	Catalin(ux) M. BOIE	2014-06-19 17:09:39
Fixed a case when some files were not dumped at all	0c69ce3cb20aae442da94cff9606936461177e1d	Catalin(ux) M. BOIE	2014-06-19 17:05:16
First version that passes all tests.	9627508618bc2c783da838c2332e6896e3004c99	Catalin(ux) M. BOIE	2014-06-18 18:02:32
Fix a little problem with the man page.	1305ac52823f0206b98181c6435049f339be9c53	Catalin(ux) M. BOIE	2013-02-18 19:55:43
Lots of stuff	9c43842ac36feff6b29cb20a95ad4510a23bf472	Catalin(ux) M. BOIE	2012-07-20 15:43:51
Cosmetic + man	a53df11bfc30c152c9f61fdb1bcc69dc6ec20765	Catalin(ux) M. BOIE	2012-06-24 12:11:17
First working version!	27bd1bf47c9fb707760d84ea3cf4241083fa283d	Catalin(ux) M. BOIE	2012-06-22 21:10:30
Several fixes	2909b1ba2e99929e775ddfea5f4894c50694a638	Catalin(ux) M. BOIE	2012-06-19 13:08:50
First version	3d7935d9b8a91694fe8213998ce4d3910348d6ef	Catalin(ux) M. BOIE	2012-05-06 19:40:40

Commit 65eb5f1841aae521c401ab52298041776159a35d - Take care of mangled file name between two dirs.

If file names differ between two directories but the content is the same,
we now add the flag 'M'.
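
The rule above can be sketched in a few lines of C. This is a minimal illustration, not the code from store.c: the helper name build_flags and the constants DIGEST_LEN and FLAGS_LEN are assumptions made for the example. The 8-character width follows the 'M-------' strings in the expected test output, and 20 is the size of a SHA-1 digest.

```c
#include <assert.h>
#include <string.h>

#define DIGEST_LEN 20   /* size of a SHA-1 digest (assumed stand-in for SHA_DIGEST_LENGTH) */
#define FLAGS_LEN  8    /* visible flag characters, as in "M-------" */

/*
 * Hypothetical helper: fill 'flags' (FLAGS_LEN + 1 bytes) for a pair of
 * directories whose content already matched. 'names_a' and 'names_b' are
 * the digests computed over the file names of each directory.
 */
void build_flags(char *flags,
                 const unsigned char *names_a,
                 const unsigned char *names_b)
{
    assert(flags != NULL && names_a != NULL && names_b != NULL);

    /* Start with every flag unset. */
    memset(flags, '-', FLAGS_LEN);
    flags[FLAGS_LEN] = '\0';

    /* Same content, different names: mark the pair as mangled. */
    if (memcmp(names_a, names_b, DIGEST_LEN) != 0)
        flags[0] = 'M';
}
```

Only the first position is used here; the remaining seven characters stay '-' and are presumably reserved for future flags.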

Author: Catalin(ux) M. BOIE
Author date (UTC): 2014-06-21 12:14
Committer name: Catalin(ux) M. BOIE
Committer date (UTC): 2014-06-22 19:57
Parent(s): accfc9df80b5889b3e1f1183e5f60d70850b5671
Signer:
Signing key:
Signing status: N
Tree: 4a187aa1be2aa7d4cfbf6058ce96ab11035255bd

File	Lines added	Lines deleted
Makefile.in	1	0
TODO	10	15
dupdump.1	7	6
store.c	64	27
store.h	3	0
tests/1/in/a1	0	1
tests/1/in/a2	0	1
tests/1/in/a3	0	1
tests/1/in/b1	0	1
tests/1/in/b2	0	1
tests/1/in/c1	0	1
tests/1/in/dir_a1/a4	0	1
tests/1/in/dir_a1/a5	0	1
tests/1/in/dir_b1/b3	0	1
tests/1/in/x/dir_a2/a6	0	1
tests/1/in/x/dir_a2/a7	0	1
tests/2/expected	0	1
tests/2/in/d1/a	0	1
tests/2/in/d1/b	0	1
tests/2/in/deeper/d2/c	0	1
tests/2/in/deeper/d2/d	0	1
tests/3/expected	0	1
tests/3/in/dir_a1/a1	0	1
tests/3/in/dir_a1/b1	0	1
tests/3/in/dir_a2/a1x	0	1
tests/3/in/dir_a2/b1x	0	1
tests/4/expected	0	1
tests/4/in/dir1/dirA/fileA	0	1
tests/4/in/dir1/dirB/fileB	0	1
tests/4/in/fake/dir2/dirA/fileA	0	1
tests/4/in/fake/dir2/dirB/fileB	0	1
tests/5/in/dir1/a	0	1
tests/5/in/dir2/a	0	1
tests/5/in/dir3/sub/a	0	1
tests/5/in/dir3/sub/fake	0	1
tests/README	3	0
tests/run.sh	9	5
tests/t_1/expected	3	3
tests/t_1/pre.sh	20	0
tests/t_2/expected	3	0
tests/t_2/pre.sh	13	0
tests/t_3/expected	6	0
tests/t_3/pre.sh	14	0
tests/t_4/README	2	0
tests/t_4/expected	1	0
tests/t_4/pre.sh	9	0
tests/t_5/expected	1	1
tests/t_5/pre.sh	10	0
tests/t_6/expected	0	0
tests/t_6/pre.sh	5	0
tests/t_7/expected	1	0
tests/t_7/pre.sh	6	0
tests/t_8/expected	0	0
tests/t_8/pre.sh	10	0
tests/util.inc	12	0

File Makefile.in changed (mode: 100644) (index 3f2da1f..d27d99f)
...	...	dupdump: dupdump.c $(OBJS)
16	16	clean:	clean:
17	17	@rm -fv $(OBJS) dupdump vgcore.*	@rm -fv $(OBJS) dupdump vgcore.*
18	18	@-rm -f $(PRJ)-.rpm $(PRJ)---.tgz $(PRJ)-*.tar.gz	@-rm -f $(PRJ)-.rpm $(PRJ)---.tgz $(PRJ)-*.tar.gz
	19		make -C tests clean
19	20
20	21
21	22	install: all	install: all

File TODO changed (mode: 100644) (index af771af..dfc0504)
1		[ ] 1-bit fileds are not printed as %hhu!
2		[ ] Why we use 'left' flag?! Because we mark do_not_dump in the same place
3		where we set 'left'! Or, we should mark as left and not mark as
4		do_not_dump.
	1		[ ] We must construct the test because we are playing with mtime now!
	2		[ ] Adapt man file to recent changes: flags etc.
	3		[ ] Ignore empty files.
	4		[ ] 1-bit fields are not printed as %hhu!
5	5	[ ] Because we ignore !dir and !files, we may not have really identical	[ ] Because we ignore !dir and !files, we may not have really identical
6	6	directories. In one of them we may have a socket, for example.	directories. In one of them we may have a socket, for example.
	7		Add a reporting flag for this situation. Or, do not ignore other
	8		type of files.
7	9	[ ] I must document a high level view over "algorithm". Even myself	[ ] I must document a high level view over "algorithm". Even myself
8	10	I do not remember what I am doing...	I do not remember what I am doing...
9	11	[ ] id dir1/subdir1 = dir2/subdir1 + dir1/subdir2 = dir2/subdir2 => dir1 = dir2.	[ ] id dir1/subdir1 = dir2/subdir1 + dir1/subdir2 = dir2/subdir2 => dir1 = dir2.
10	12	It seems I report both the small directories and the big one.	It seems I report both the small directories and the big one.
11
12		Bugs:
13		[ ] Seems an empty files matches dumplog.txt! check bug1 dir!
14
15	13	[ ] Use fadvise to not cache data in RAM.	[ ] Use fadvise to not cache data in RAM.
16	14	[ ] Use more threads	[ ] Use more threads
17	15	[ ] We should order by mtime, older one being the first shown.	[ ] We should order by mtime, older one being the first shown.

...	...	Bugs:
26	24	dir4=dir3	dir4=dir3
27	25
28	26	[ ] We could throw away unique files.	[ ] We could throw away unique files.
29		[ ] Comparing in O(N*N) sucks!
	27		[ ] Comparing in O(N*N) sucks! Where?
30	28	[ ] Install man.	[ ] Install man.
31	29	[ ] Dump in stats also the max memory used.	[ ] Dump in stats also the max memory used.
32	30	[ ] Dump two types of dirs: DIR AND DIRFNC (File Names Changed).	[ ] Dump two types of dirs: DIR AND DIRFNC (File Names Changed).
33		Maybe also for files
34		[ ] Another type of DIR is when a dir is included in another one.
35		How should I report it?
36
	31		Maybe list files that were renamed (cmd line flag).
37	32	[ ] Use a cache, specified by command line. Use inode and mtime for key?	[ ] Use a cache, specified by command line. Use inode and mtime for key?
38		[ ] Dump memory peak usage for statistics.
39		[ ]
	33		[ ] dir1/a+b, dir2/c+d, c is a soft/hard link to a. Content of b is the same
	34		with the content of d. What should I do?

File dupdump.1 changed (mode: 100644) (index 6b31415..b032b52)
1	1	.TH DUPDUMP 1	.TH DUPDUMP 1
2	2	.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection	.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
3	3	.\" other parms are allowed: see man(7), man(1)	.\" other parms are allowed: see man(7), man(1)
4		.SH NAME
	4		.SH "NAME"
5	5	dupdump \- finds duplicate files in a given set of directories	dupdump \- finds duplicate files in a given set of directories
6		.SH SYNOPSIS
	6		.SH "SYNOPSIS"
7	7	.B dupdump	.B dupdump
8	8	[	[
9	9	.I options	.I options
10	10	]	]
11	11	.I <dir1>	.I <dir1>
12	12	\\|.\\|.\\|.	\\|.\\|.\\|.
	13		.I <dirN>
13	14
14	15	.SH "DESCRIPTION"	.SH "DESCRIPTION"
15	16	Searches a list of dirs, recursively to find dir and file matches.	Searches a list of dirs, recursively to find dir and file matches.

...	...	It is using SHA-1 to test the match. It outputs a list of three columns:
17	18	first is the type of match (DIR or FILE) and the second and the third, the	first is the type of match (DIR or FILE) and the second and the third, the
18	19	matches.	matches.
19	20
20		.SH OPTIONS
	21		.SH "OPTIONS"
21	22	.TP	.TP
22	23	.B -z --zero	.B -z --zero
23	24	use \\0 as fields and records separator instead of \\t and \\n	use \\0 as fields and records separator instead of \\t and \\n

...	...	when a dir match is possible
29	30	.B -o --out	.B -o --out
30	31	specify where to store the list of duplicates (default stdout)	specify where to store the list of duplicates (default stdout)
31	32	.TP	.TP
32		.B -v --verbose:
	33		.B -v --verbose
33	34	be more verbose	be more verbose
34	35	.TP	.TP
35	36	.B -d --debug	.B -d --debug

...	...	dump debug information useful for the developers
43	44	.UR "http://kernel.embedromix.ro/us/"	.UR "http://kernel.embedromix.ro/us/"
44	45	Home page	Home page
45	46	.UE .	.UE .
46		.SH NOTES
	47		.SH "NOTES"
47	48	This program does not delete any files. It is your responsibility to	This program does not delete any files. It is your responsibility to
48	49	take care of what to delete.	take care of what to delete.
49		.SH AUTHOR
	50		.SH "AUTHOR"
50	51	.UR catab-dupdump@embedromix.ro	.UR catab-dupdump@embedromix.ro
51	52	Catalin(ux) M. BOIE	Catalin(ux) M. BOIE
52	53	.UE .	.UE .

File store.c changed (mode: 100644) (index cb70b4b..b64e73f)
...	...	int file_add(const char *file, const struct stat *s,
259	259	q->dev = s->st_dev;	q->dev = s->st_dev;
260	260	q->ino = s->st_ino;	q->ino = s->st_ino;
261	261	q->level = level;	q->level = level;
	262		q->mtime = s->st_mtime;
262	263
263	264	/* link with dir */	/* link with dir */
264	265	parent = dir_current[level - 1];	parent = dir_current[level - 1];

...	...	int file_add(const char *file, const struct stat *s,
273	274	if (file_info[hash] == NULL) {	if (file_info[hash] == NULL) {
274	275	file_info[hash] = q;	file_info[hash] = q;
275	276	} else {	} else {
276		/* search for a bigger item and insert before it */
	277		/* We order by size, level, mtime, name */
	278		/* Better to use qsort. TODO */
277	279	p = file_info[hash];	p = file_info[hash];
278	280	prev = NULL;	prev = NULL;
279	281	while (p) {	while (p) {
280		if (size == p->size) {
281		if (level < p->level)
	282		if (q->size < p->size)
	283		break;
	284
	285		if (q->size == p->size) {
	286		if (q->level < p->level)
282	287	break;	break;
283	288
284		if (strcmp(file, p->name) < 0)
	289		if (q->mtime < p->mtime)
285	290	break;	break;
286		}
287	291
288		if (size < p->size)
289		break;
	292		if (strcmp(q->name, p->name) < 0)
	293		break;
	294		}
290	295
291	296	prev = p;	prev = p;
292	297	p = p->hash_next;	p = p->hash_next;
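
The new comment in this hunk names the intended order (size, level, mtime, name) and leaves a TODO about qsort. A possible comparator is sketched below. It is an assumption about intent, not code from the repository: struct file_key is a hypothetical, reduced stand-in for struct file_node, and the comparison is strictly lexicographic over the four keys, which may not match the insertion loop above in every case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical, reduced version of struct file_node: only the sort keys. */
struct file_key {
    long long size;
    unsigned short level;
    time_t mtime;
    const char *name;
};

/* qsort() comparator over an array of pointers: size, level, mtime, name. */
int file_key_cmp(const void *a0, const void *b0)
{
    const struct file_key *a = *(const struct file_key * const *) a0;
    const struct file_key *b = *(const struct file_key * const *) b0;

    if (a->size != b->size)
        return (a->size < b->size) ? -1 : 1;
    if (a->level != b->level)
        return (a->level < b->level) ? -1 : 1;
    if (a->mtime != b->mtime)
        return (a->mtime < b->mtime) ? -1 : 1;
    return strcmp(a->name, b->name);
}

/* Sort 'n' pointers in place; nothing to do for fewer than two items. */
void file_keys_sort(struct file_key **v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n > 1)
        qsort(v, n, sizeof(*v), file_key_cmp);
}
```

Sorting an array of pointers keeps the nodes themselves in place, so a hash chain could be relinked from the sorted array afterwards.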

...	...	void dir_dump_node(const struct dir_node *d, const unsigned int level)
424	429	struct dir_node *subdir;	struct dir_node *subdir;
425	430	struct file_node *file;	struct file_node *file;
426	431	char dump[SHA_DIGEST_LENGTH * 2 + 1];	char dump[SHA_DIGEST_LENGTH * 2 + 1];
	432		char fnh[SHA_DIGEST_LENGTH * 2 + 1];
427	433
428	434	memset(prefix, ' ', (level + 1) * 2);	memset(prefix, ' ', (level + 1) * 2);
429	435	prefix[(level + 1) * 2] = '\0';	prefix[(level + 1) * 2] = '\0';
430	436
431	437	sha1_dump(dump, d->sha1, 8);	sha1_dump(dump, d->sha1, 8);
	438		sha1_dump(fnh, d->file_names_sha1, 8);
432	439	fprintf(stderr, "%sD '%s' d=%p subdirs=%p next_sibling=%p"	fprintf(stderr, "%sD '%s' d=%p subdirs=%p next_sibling=%p"
433	440	" files=%p parent=%p no_dup_possible=%u do_not_dump=%u"	" files=%p parent=%p no_dup_possible=%u do_not_dump=%u"
434		" level=%hu hash_next=%p left=%hhu sha1=%s\n",
	441		" level=%hu hash_next=%p left=%hhu sha1=%s file_names_sha1=%s\n",
435	442	prefix, d->name, d, d->subdirs, d->next_sibling,	prefix, d->name, d, d->subdirs, d->next_sibling,
436	443	d->files, d->parent, d->no_dup_possible, d->do_not_dump,	d->files, d->parent, d->no_dup_possible, d->do_not_dump,
437		d->level, d->hash_next, d->left, dump);
	444		d->level, d->hash_next, d->left, dump, fnh);
438	445
439	446	subdir = d->subdirs;	subdir = d->subdirs;
440	447	while (subdir) {	while (subdir) {

...	...	static void dir_mark_up_no_dup_possible(struct dir_node *d)
511	518	/*	/*
512	519	* When we list a folder on the left side, we must mark whole hierarchy under	* When we list a folder on the left side, we must mark whole hierarchy under
513	520	* it as 'do_not_dump'. Else, we will dump its files and we do not want that.	* it as 'do_not_dump'. Else, we will dump its files and we do not want that.
514		* TODO: But, we may have dir1 == dir2 and dir1/file1 == dir3/file3. In this case we want to dump dir1/file!
515	521	*/	*/
516	522	static void dir_mark_down_do_not_dump(struct dir_node *d)	static void dir_mark_down_do_not_dump(struct dir_node *d)
517	523	{	{

...	...	int file_find_dups(void)
716	722	}	}
717	723
718	724	if (debug) {	if (debug) {
719		fprintf(stderr, "[*] Dump chain %u start:\n", hash);
	725		if (debug)
	726		fprintf(stderr, "[*] Dump chain %u start:\n", hash);
720	727	q = file_info[hash];	q = file_info[hash];
721	728	while (q) {	while (q) {
722	729	fprintf(stderr, "%s:\n", q->name);	fprintf(stderr, "%s:\n", q->name);
723	730	dups = q->duplicates;	dups = q->duplicates;
724	731	while(dups) {	while(dups) {
725		fprintf(stderr, "\t%s\n", dups->name);
	732		if (debug)
	733		fprintf(stderr, "\t%s\n", dups->name);
726	734	dups = dups->duplicates;	dups = dups->duplicates;
727	735	}	}
728	736	q = q->hash_next;	q = q->hash_next;
729	737	}	}
730		fprintf(stderr, "[*] Dump chain %u stop\n", hash);
	738		if (debug)
	739		fprintf(stderr, "[*] Dump chain %u stop\n", hash);
731	740	}	}
732	741	}	}
733	742

...	...	static int file_compare_hashes(const void *a0, const void *b0)
761	770	* We need to sort because the order of files in dirs may differ because	* We need to sort because the order of files in dirs may differ because
762	771	* the names may be different but the content the same.	* the names may be different but the content the same.
763	772	* TODO: Shouldn't we test if a file is unique=1 and skip the checksum of dir???	* TODO: Shouldn't we test if a file is unique=1 and skip the checksum of dir???
	773		* We return the file names hash in @fn.
764	774	*/	*/
765		static int dir_files_hash(unsigned char *hash, struct dir_node *d)
	775		static int dir_files_hash(unsigned char *hash, unsigned char *fn,
	776		struct dir_node *d)
766	777	{	{
767	778	struct file_node *p;	struct file_node *p;
768	779	struct file_node **u;	struct file_node **u;
769	780	unsigned int i, mem;	unsigned int i, mem;
770		SHA_CTX c;
	781		SHA_CTX c, fnh;
	782		char *base_name;
771	783
772	784	if (d->files == NULL) {	if (d->files == NULL) {
773	785	memset(hash, 0, SHA_DIGEST_LENGTH);	memset(hash, 0, SHA_DIGEST_LENGTH);
	786		memset(fn, 0, SHA_DIGEST_LENGTH);
774	787	return 0;	return 0;
775	788	}	}
776	789

...	...	static int dir_files_hash(unsigned char *hash, struct dir_node *d)
790	803	qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes);	qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes);
791	804
792	805	SHA1_Init(&c);	SHA1_Init(&c);
	806		SHA1_Init(&fnh);
793	807
794	808	i = 0;	i = 0;
795	809	while (i < d->no_of_files) {	while (i < d->no_of_files) {
796	810	SHA1_Update(&c, u[i]->sha1_full, SHA_DIGEST_LENGTH);	SHA1_Update(&c, u[i]->sha1_full, SHA_DIGEST_LENGTH);
	811
	812		base_name = basename(u[i]->name);
	813		if (debug)
	814		fprintf(stderr, "%s: XXX: add file name hash of [%s]\n", __func__, base_name);
	815
	816		SHA1_Update(&fnh, base_name, strlen(base_name));
797	817	i++;	i++;
798	818	}	}
799	819
800	820	SHA1_Final(hash, &c);	SHA1_Final(hash, &c);
	821		SHA1_Final(fn, &fnh);
801	822
802	823	free(u);	free(u);
803	824

...	...	static int dir_files_hash(unsigned char *hash, struct dir_node *d)
810	831	static long long dir_build_hash(struct dir_node *d)	static long long dir_build_hash(struct dir_node *d)
811	832	{	{
812	833	struct dir_node *subdir;	struct dir_node *subdir;
813		SHA_CTX c;
	834		SHA_CTX c, fnh;
814	835	unsigned char files_hash[SHA_DIGEST_LENGTH];	unsigned char files_hash[SHA_DIGEST_LENGTH];
	836		unsigned char file_names_sha1[SHA_DIGEST_LENGTH];
815	837	int err;	int err;
816	838	long long no_of_possible_dirs = 0;	long long no_of_possible_dirs = 0;
817	839	long long ret;	long long ret;
	840		char *base_name;
818	841
819	842	if (debug)	if (debug)
820	843	fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n",	fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n",

...	...	static long long dir_build_hash(struct dir_node *d)
831	854	no_of_possible_dirs++;	no_of_possible_dirs++;
832	855
833	856	/* Order files by hash to compute correct hashes */	/* Order files by hash to compute correct hashes */
834		err = dir_files_hash(files_hash, d);
	857		err = dir_files_hash(files_hash, file_names_sha1, d);
835	858	if (err != 0)	if (err != 0)
836	859	return -1;	return -1;
837	860
838	861	SHA1_Init(&c);	SHA1_Init(&c);
	862		SHA1_Init(&fnh);
839	863	SHA1_Update(&c, files_hash, SHA_DIGEST_LENGTH);	SHA1_Update(&c, files_hash, SHA_DIGEST_LENGTH);
	864		SHA1_Update(&fnh, file_names_sha1, SHA_DIGEST_LENGTH);
840	865
	866		/* At the same time, we build hash of file names */
841	867	subdir = d->subdirs;	subdir = d->subdirs;
842	868	while (subdir) {	while (subdir) {
843	869	ret = dir_build_hash(subdir);	ret = dir_build_hash(subdir);
844	870	if (ret == -1)	if (ret == -1)
845	871	return -1;	return -1;
846	872
	873		base_name = basename(subdir->name);
	874		if (debug)
	875		fprintf(stderr, "%s: XXX: add subdir name to fnh [%s]\n", __func__, base_name);
	876		SHA1_Update(&fnh, base_name, strlen(base_name));
	877
847	878	no_of_possible_dirs += ret;	no_of_possible_dirs += ret;
848	879	SHA1_Update(&c, subdir->sha1, SHA_DIGEST_LENGTH);	SHA1_Update(&c, subdir->sha1, SHA_DIGEST_LENGTH);
	880		if (debug)
	881		fprintf(stderr, "%s: XXX: add subdir->f_n_sha1 to fnh [%s]\n", __func__, subdir->name);
	882		SHA1_Update(&fnh, subdir->file_names_sha1, SHA_DIGEST_LENGTH);
	883
849	884	subdir = subdir->next_sibling;	subdir = subdir->next_sibling;
850	885	}	}
851	886
852	887	SHA1_Final(d->sha1, &c);	SHA1_Final(d->sha1, &c);
	888		SHA1_Final(d->file_names_sha1, &fnh);
853	889
854	890	return no_of_possible_dirs;	return no_of_possible_dirs;
855	891	}	}
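
The idea behind the two digests (one over file contents, one over file names, both fed in content-hash order) can be tried out without OpenSSL. The sketch below is an illustration under stated assumptions: 64-bit FNV-1a stands in for SHA-1, struct entry and dir_hashes are hypothetical names, ties between identical contents are broken by name (the diff does not show whether file_compare_hashes does this), and subdirectories are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FNV_INIT  14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* Stand-in for SHA-1 (64-bit FNV-1a); chosen only to avoid external libraries. */
uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical, reduced file entry: base name, content, cached content hash. */
struct entry {
    const char *name;
    const char *content;
    uint64_t content_hash;
};

/* Order by content hash; ties are broken by name to keep the result stable. */
int entry_cmp(const void *a0, const void *b0)
{
    const struct entry *a = a0;
    const struct entry *b = b0;

    if (a->content_hash != b->content_hash)
        return (a->content_hash < b->content_hash) ? -1 : 1;
    return strcmp(a->name, b->name);
}

/*
 * Compute two digests for one directory: '*content' ignores the file names,
 * '*names' covers only the names, visited in content-hash order.
 * Note: 'files' is sorted in place.
 */
void dir_hashes(struct entry *files, size_t n, uint64_t *content, uint64_t *names)
{
    size_t i;

    assert(content != NULL && names != NULL);

    for (i = 0; i < n; i++)
        files[i].content_hash = fnv1a(FNV_INIT, files[i].content,
                                      strlen(files[i].content));

    if (n > 1)
        qsort(files, n, sizeof(files[0]), entry_cmp);

    *content = FNV_INIT;
    *names = FNV_INIT;
    for (i = 0; i < n; i++) {
        *content = fnv1a(*content, &files[i].content_hash,
                         sizeof(files[i].content_hash));
        *names = fnv1a(*names, files[i].name, strlen(files[i].name));
    }
}
```

With the fixture from tests/t_2 (d1 holding a=xxxx and b=yyyy, d2 holding c=yyyy and d=xxxx), the content digests come out equal and the name digests differ, which is the situation the 'M' flag reports.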

...	...	void dir_dump_duplicates(struct dir_node *d, const unsigned int zero)
1023	1059	{	{
1024	1060	struct dir_node *p;	struct dir_node *p;
1025	1061	char sep, final;	char sep, final;
	1062		char flags[9];
1026	1063
1027	1064	if (debug)	if (debug)
1028	1065	fprintf(stderr, "[*] dir_dump_duplicates(%s)\n", d->name);	fprintf(stderr, "[*] dir_dump_duplicates(%s)\n", d->name);

...	...	void dir_dump_duplicates(struct dir_node *d, const unsigned int zero)
1075	1112	fprintf(stderr, "dir_dump_duplicates: set do_not_dump on 'left' [%s]\n", d->name);	fprintf(stderr, "dir_dump_duplicates: set do_not_dump on 'left' [%s]\n", d->name);
1076	1113	dir_mark_left(d);	dir_mark_left(d);
1077	1114
1078		/*
1079		if (debug)
1080		fprintf(stderr, "dir_dump_duplicates: set do_not_dump=1 on left [%s]\n", d->name);
1081		dir_mark_down_do_not_dump(d);
1082		*/
1083
1084	1115	if (debug)	if (debug)
1085	1116	fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name);	fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name);
1086	1117	dir_mark_down_do_not_dump(p);	dir_mark_down_do_not_dump(p);
1087	1118
	1119		memset(flags, '-', sizeof(flags) - 1);
	1120		flags[sizeof(flags) - 1] = '\0';
	1121
	1122		if (memcmp(d->file_names_sha1, p->file_names_sha1, SHA_DIGEST_LENGTH) != 0)
	1123		flags[0] = 'M';
	1124
1088	1125	if (debug)	if (debug)
1089		fprintf(stderr, "DIR%c%s%c%s%c",
1090		sep, d->name, sep, p->name, final);
1091		fprintf(out, "DIR%c%s%c%s%c",
1092		sep, d->name, sep, p->name, final);
	1126		fprintf(stderr, "DIR%c%s%c%s%c%s%c",
	1127		sep, flags, sep, d->name, sep, p->name, final);
	1128		fprintf(out, "DIR%c%s%c%s%c%s%c",
	1129		sep, flags, sep, d->name, sep, p->name, final);
1093	1130	p = p->hash_next;	p = p->hash_next;
1094	1131	}	}
1095	1132	}	}
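
The new record layout (type, flags, left path, right path) can be exercised in isolation with a small formatter. This is a sketch, not the function from store.c: format_dir_record is a hypothetical helper, and the separators are taken from the man page text for --zero (tab and newline by default, NUL bytes otherwise).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FLAGS_LEN 8   /* visible flag characters, as in "M-------" */

/*
 * Hypothetical helper: write one DIR record into 'buf'.
 * Returns the number of bytes written (separators included),
 * or -1 if the record does not fit.
 */
int format_dir_record(char *buf, size_t size, int zero,
                      const char *flags, const char *left, const char *right)
{
    const char sep = zero ? '\0' : '\t';
    const char final = zero ? '\0' : '\n';
    int n;

    assert(buf != NULL && flags != NULL && left != NULL && right != NULL);
    assert(strlen(flags) == FLAGS_LEN);

    n = snprintf(buf, size, "DIR%c%s%c%s%c%s%c",
                 sep, flags, sep, left, sep, right, final);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

Because the zero-separated form embeds NUL bytes, callers have to rely on the returned length rather than strlen() when writing the buffer out.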

File store.h changed (mode: 100644) (index 4de63fb..37e6eac)
...	...	struct file_node
31	31	struct file_node *hash_next;	struct file_node *hash_next;
32	32	struct dir_node *parent;	struct dir_node *parent;
33	33	struct file_node *duplicates;	struct file_node *duplicates;
	34		time_t mtime;
34	35	};	};
35	36
36	37	struct dir_node	struct dir_node
37	38	{	{
38	39	char *name;	char *name;
39	40	unsigned char sha1[SHA_DIGEST_LENGTH];	unsigned char sha1[SHA_DIGEST_LENGTH];
	41		unsigned char file_names_sha1[SHA_DIGEST_LENGTH];
40	42	unsigned char no_dup_possible:1;	unsigned char no_dup_possible:1;
41	43	unsigned char do_not_dump:1;	unsigned char do_not_dump:1;
42	44	unsigned char left:1;	unsigned char left:1;

...	...	struct dir_node
49	51	unsigned int no_of_files;	unsigned int no_of_files;
50	52	struct dir_node *parent;	struct dir_node *parent;
51	53	struct dir_node *hash_next; /* in the last phase, here we store duplicates */	struct dir_node *hash_next; /* in the last phase, here we store duplicates */
	54		time_t mtime;
52	55	};	};
53	56
54	57

File tests/1/in/a1 deleted (index 7284ab4..0000000)
1		aaaa

File tests/1/in/a2 deleted (index 7284ab4..0000000)
1		aaaa

File tests/1/in/a3 deleted (index 7284ab4..0000000)
1		aaaa

File tests/1/in/b1 deleted (index 6484fb6..0000000)
1		bbbb

File tests/1/in/b2 deleted (index 6484fb6..0000000)
1		bbbb

File tests/1/in/c1 deleted (index baebf33..0000000)
1		cccc

File tests/1/in/dir_a1/a4 deleted (index 7284ab4..0000000)
1		aaaa

File tests/1/in/dir_a1/a5 deleted (index 7284ab4..0000000)
1		aaaa

File tests/1/in/dir_b1/b3 deleted (index 6484fb6..0000000)
1		bbbb

File tests/1/in/x/dir_a2/a6 deleted (index 7284ab4..0000000)
1		aaaa

File tests/1/in/x/dir_a2/a7 deleted (index 7284ab4..0000000)
1		aaaa

File tests/2/expected deleted (index c24cb0f..0000000)
1		DIR in/d1 in/deeper/d2

File tests/2/in/d1/a deleted (index 5ee608e..0000000)
1		xxxx

File tests/2/in/d1/b deleted (index 97aee46..0000000)
1		yyyy

File tests/2/in/deeper/d2/c deleted (index 97aee46..0000000)
1		yyyy

File tests/2/in/deeper/d2/d deleted (index 5ee608e..0000000)
1		xxxx

File tests/3/expected deleted (index 0061f50..0000000)
1		DIR in/dir_a1 in/dir_a2

File tests/3/in/dir_a1/a1 deleted (index 5d308e1..0000000)
1		aaaa

File tests/3/in/dir_a1/b1 deleted (index b433656..0000000)
1		bbbb

File tests/3/in/dir_a2/a1x deleted (index b433656..0000000)
1		bbbb

File tests/3/in/dir_a2/b1x deleted (index 5d308e1..0000000)
1		aaaa

File tests/4/expected deleted (index a27abe8..0000000)
1		DIR in/dir1 in/fake/dir2

File tests/4/in/dir1/dirA/fileA deleted (index 81c545e..0000000)
1		1234

File tests/4/in/dir1/dirB/fileB deleted (index 97b5955..0000000)
1		12345678

File tests/4/in/fake/dir2/dirA/fileA deleted (index 81c545e..0000000)
1		1234

File tests/4/in/fake/dir2/dirB/fileB deleted (index 97b5955..0000000)
1		12345678

File tests/5/in/dir1/a deleted (index 2e65efe..0000000)
1		a

File tests/5/in/dir2/a deleted (index 2e65efe..0000000)
1		a

File tests/5/in/dir3/sub/a deleted (index 2e65efe..0000000)
1		a

File tests/5/in/dir3/sub/fake deleted (index f0f877c..0000000)
1		fake

File tests/README added (mode: 100644) (index 0000000..7f6bcc8)
	1	When adding a new test, you must:
	2	- sort 'expected' file
	3	-

File tests/run.sh changed (mode: 100755) (index ab6a21d..30d25d0)
1	1	#!/bin/bash	#!/bin/bash
2	2
3		for t in `ls`; do
4		if [ "${t}" = "run.sh" ]; then
5		continue
6		fi
7
	3		find -type d -name 't_*' -print | sort | cut -b3- | while read t; do
8	4	echo "Running test [${t}]..."	echo "Running test [${t}]..."
9	5	(	(
10	6	cd "${t}"	cd "${t}"
11	7
	8		# Prepare stuff
	9		./pre.sh
	10		if [ "${?}" != "0" ]; then
	11		echo "Preparation for test [${t}] failed!"
	12		exit 1
	13		fi
	14
12	15	valgrind --tool=memcheck \	valgrind --tool=memcheck \
13	16	--num-callers=16 \	--num-callers=16 \
14	17	--leak-check=full \	--leak-check=full \

...	...	for t in `ls`; do
17	20	--trace-children=yes \	--trace-children=yes \
18	21	--track-origins=yes \	--track-origins=yes \
19	22	../../dupdump --verbose --debug --out test.out in &>test.log	../../dupdump --verbose --debug --out test.out in &>test.log
	23		sort test.out > test.out2 && mv test.out2 test.out
20	24	diff -u expected test.out > test.diff	diff -u expected test.out > test.diff
21	25	if [ "${?}" != "0" ]; then	if [ "${?}" != "0" ]; then
22	26	echo "Test [${t}] failed!"	echo "Test [${t}] failed!"

File tests/t_1/expected renamed from tests/1/expected (similarity 80%) (mode: 100644) (index ebb5f4c..fa6f604)
1		DIR in/dir_a1 in/x/dir_a2
	1		DIR M------- in/dir_a1 in/x/dir_a2
	2		FILE in/b1 in/b2
	3		FILE in/b1 in/dir_b1/b3
2	4	FILE in/dir_a1/a4 in/a1	FILE in/dir_a1/a4 in/a1
3	5	FILE in/dir_a1/a4 in/a2	FILE in/dir_a1/a4 in/a2
4	6	FILE in/dir_a1/a4 in/a3	FILE in/dir_a1/a4 in/a3
5	7	FILE in/dir_a1/a4 in/dir_a1/a5	FILE in/dir_a1/a4 in/dir_a1/a5
6		FILE in/b1 in/b2
7		FILE in/b1 in/dir_b1/b3

File tests/t_1/pre.sh added (mode: 100755) (index 0000000..72d34d9)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	data_out "in/a1" "aaaa"
	6	data_out "in/a2" "aaaa"
	7	data_out "in/a3" "aaaa"
	8
	9	data_out "in/b1" "bbbb"
	10	data_out "in/b2" "bbbb"
	11
	12	data_out "in/c1" "cccc"
	13
	14	data_out "in/dir_a1/a4" "aaaa"
	15	data_out "in/dir_a1/a5" "aaaa"
	16
	17	data_out "in/dir_b1/b3" "bbbb"
	18
	19	data_out "in/x/dir_a2/a6" "aaaa"
	20	data_out "in/x/dir_a2/a7" "aaaa"

File tests/t_2/expected added (mode: 100644) (index 0000000..be96d76)
	1	DIR M------- in/d1 in/deeper/d2
	2	FILE in/a1 in/a2
	3	FILE in/a1 in/a3

File tests/t_2/pre.sh added (mode: 100755) (index 0000000..b949783)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	data_out "in/a1" "aaaa"
	6	data_out "in/a2" "aaaa"
	7	data_out "in/a3" "aaaa"
	8
	9	data_out "in/d1/a" "xxxx"
	10	data_out "in/d1/b" "yyyy"
	11
	12	data_out "in/deeper/d2/c" "yyyy"
	13	data_out "in/deeper/d2/d" "xxxx"

File tests/t_3/expected added (mode: 100644) (index 0000000..21930f4)
	1	FILE in/a1 in/a2
	2	FILE in/a1 in/a3
	3	FILE in/a1 in/dir_a1/a1
	4	FILE in/a1 in/dir_a1/a4
	5	FILE in/a1 in/dir_a2/b1x
	6	FILE in/dir_a1/b1 in/dir_a2/a1x

File tests/t_3/pre.sh added (mode: 100755) (index 0000000..057d8bb)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	data_out "in/dir_a1/a1" "aaaa"
	6	data_out "in/dir_a1/a4" "aaaa"
	7	data_out "in/dir_a1/b1" "bbbb"
	8
	9	data_out "in/dir_a2/a1x" "bbbb"
	10	data_out "in/dir_a2/b1x" "aaaa"
	11
	12	data_out "in/a1" "aaaa"
	13	data_out "in/a2" "aaaa"
	14	data_out "in/a3" "aaaa"

File tests/t_4/README renamed from tests/4/README (similarity 64%) (mode: 100644) (index 7e28c5d..e97564a)
...	...	fake
14	14	fileB	fileB
15	15
16	16	Should report only dir1 and dir2.	Should report only dir1 and dir2.
	17
	18		It seems I do not order the file_names hash, and because of that the dir hash is not correct.

File tests/t_4/expected added (mode: 100644) (index 0000000..bff8ab8)
	1		DIR -------- in/dir1 in/fake/dir2

File tests/t_4/pre.sh added (mode: 100755) (index 0000000..b3218bb)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	data_out "in/dir1/dirA/fileA" "1234"
	6	data_out "in/dir1/dirB/fileB" "12345678"
	7
	8	data_out "in/fake/dir2/dirA/fileA" "1234"
	9	data_out "in/fake/dir2/dirB/fileB" "12345678"

File tests/t_5/expected renamed from tests/5/expected (similarity 50%) (mode: 100644) (index b54ccfd..8b30435)
1		DIR in/dir1 in/dir2
	1		DIR -------- in/dir1 in/dir2
2	2	FILE in/dir1/a in/dir3/sub/a	FILE in/dir1/a in/dir3/sub/a

File tests/t_5/pre.sh added (mode: 100755) (index 0000000..8fff918)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	data_out "in/dir1/a" "a"
	6
	7	data_out "in/dir2/a" "a"
	8
	9	data_out "in/dir3/sub/a" "a"
	10	data_out "in/dir3/sub/fake" "fake"

File tests/t_6/expected copied from file tests/6/expected (similarity 100%)

File tests/t_6/pre.sh added (mode: 100755) (index 0000000..9a69aaa)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	mkdir -p in

File tests/t_7/expected added (mode: 100644) (index 0000000..51b209b)
	1		DIR M------- in/a in/b

File tests/t_7/pre.sh added (mode: 100755) (index 0000000..8e6e040)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	data_out "in/a/name1" "aaaa"
	6	data_out "in/b/name2" "aaaa"

File tests/t_8/expected renamed from tests/6/expected (similarity 100%)

File tests/t_8/pre.sh added (mode: 100755) (index 0000000..69c1107)
	1	#!/bin/bash
	2
	3	. ../util.inc
	4
	5	rm -rf in
	6
	7	data_out "in/a/file1" "aaaa"
	8
	9	mkdir -p in/b
	10	ln in/a/file1 in/b/file2

File tests/util.inc added (mode: 100644) (index 0000000..05c3f5d)
	1	#!/bin/bash
	2
	3	function data_out()
	4	{
	5	file="${1}"
	6	content="${2}"
	7
	8	dir=`dirname "${file}"`
	9	mkdir -p "${dir}"
	10
	11	echo "${content}" > "${file}"
	12	}

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:

... clone the repository ...
... make some changes and some commits ...
git push origin main