List of commits:
Subject Hash Author Date (UTC)
Cosmetic + man a53df11bfc30c152c9f61fdb1bcc69dc6ec20765 Catalin(ux) M. BOIE 2012-06-24 12:11:17
First working version! 27bd1bf47c9fb707760d84ea3cf4241083fa283d Catalin(ux) M. BOIE 2012-06-22 21:10:30
Several fixes 2909b1ba2e99929e775ddfea5f4894c50694a638 Catalin(ux) M. BOIE 2012-06-19 13:08:50
First version 3d7935d9b8a91694fe8213998ce4d3910348d6ef Catalin(ux) M. BOIE 2012-05-06 19:40:40
Commit a53df11bfc30c152c9f61fdb1bcc69dc6ec20765 - Cosmetic + man
Author: Catalin(ux) M. BOIE
Author date (UTC): 2012-06-24 12:11
Committer name: Catalin(ux) M. BOIE
Committer date (UTC): 2012-06-24 12:11
Parent(s): 27bd1bf47c9fb707760d84ea3cf4241083fa283d
Signing key:
Tree: 7287e4c1d0aa4d180510ce831f704b3321258da4
File Lines added Lines deleted
TODO 3 28
dupdump.1 47 0
dupdump.c 62 19
store.c 89 95
store.h 1 0
File TODO changed (mode: 100644) (index 9e7988c..6cdf3ea)
1 [X] Mark files as NOT_POSSIBLE_DUPLICATES
2 [X] Do the compare of size and hashes
3 [X] If match, mark as POSSIBLE_DUPLICATES
4 [X] For the rest, propagate the flag to the parent dirs.
5
6 [ ] compute dir hashes by sorting hashes for dirs and files, maybe, compute separately files and dir
7 [ ] Sort files and subdirs by hash - do a function
8
9 [ ] --min-size parameter and --max-size
10 1 [ ] Use more threads
11 [ ] Mark the directories that could be identical and then scan only those.
12 [ ] First, we build the directory tree, then, we compute sha1 where needed
13 (only the possible directories), then we build a hash by sha1 and then
14 we test for duplicated dir. We mark the dirs that are duplicated
15 and we do not dump files that are part of a dup dir.
16 [ ] Order input directories by len to avoid building a strange tree. Hm.
17 Probably does not work.
18 2 [ ] We should order by mtime, older one being the first shown.
19 [ ]
20
21
22 After find_files_dups, unique dirs are marked as such.
23 The problems now are:
24 - how we detect equal dirs
25 - what about the case when we ran "dupdump ./2/3 /1/2" we should sort
3 [ ] What about the case when we ran "dupdump ./2/3 /1/2" we should sort
26 4 somehow the path so /1/2 to be first because starts deeper.
27 5
28 6 [ ] Strange case:

... ... The problems now are:
32 10 dir4=dir3
33 11
34 12 [ ] We could throw away unique files.
35 [ ] Sort files with same hash by hash AND level?
36 [ ] Seems that the sorting is not correct in 'expected'.
37 [ ] Test if qsort for dirs is sorting by name correctly!
38 [ ] Sorting sucks!
39 [ ]
13 [ ] Comparing in O(N*N) sucks!
14 [ ] Install man.
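Note: item 6 above (compute dir hashes by sorting child hashes) is what dir_files_hash() in store.c implements further down: a directory's SHA-1 is taken over its children's SHA-1s in sorted order, so the digest does not depend on scan order. A minimal sketch of the idea using OpenSSL's SHA1_* API (helper names invented; the real code works on struct file_node):

    #include <stdlib.h>
    #include <string.h>
    #include <openssl/sha.h>

    /* qsort comparator: order 20-byte digests bytewise */
    static int cmp_digest(const void *a, const void *b)
    {
            return memcmp(a, b, SHA_DIGEST_LENGTH);
    }

    /* Digest a directory as the SHA-1 of its children's digests, sorted */
    static void dir_digest(unsigned char out[SHA_DIGEST_LENGTH],
                           unsigned char (*child)[SHA_DIGEST_LENGTH], size_t n)
    {
            SHA_CTX c;
            size_t i;

            /* Sort first so the result is independent of readdir order */
            qsort(child, n, SHA_DIGEST_LENGTH, cmp_digest);

            SHA1_Init(&c);
            for (i = 0; i < n; i++)
                    SHA1_Update(&c, child[i], SHA_DIGEST_LENGTH);
            SHA1_Final(out, &c);
    }

Compiles with -lcrypto. Two dirs with the same set of file contents then get the same digest, however their entries are ordered on disk.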
File dupdump.1 added (mode: 100644) (index 0000000..4054a0a)
1 .TH DUPDUMP 1
2 .\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
3 .\" other parms are allowed: see man(7), man(1)
4 .SH NAME
5 dupdump \- finds duplicate files in a given set of directories
6 .SH SYNOPSIS
7 .B dupdump
8 [
9 .I options
10 ]
11 .I <dir1>
12 \|.\|.\|.
13
14 .SH "DESCRIPTION"
15 Searches a list of dirs recursively to find matching dirs and files.
16 It uses SHA-1 to test for a match. It outputs a list of three columns:
17 the first is the type of match (DIR or FILE); the second and the third are
18 the matching paths.
19
20 .SH OPTIONS
21 .TP
22 .B -i --min-size
23 Do not dump files under the specified size. They are still taken into
24 account when a dir match is possible.
25 .TP
26 .B -v --verbose
27 Be more verbose.
28 .TP
29 .B -d --debug
30 Dump debug information useful for developers.
31 .SH "SEE ALSO"
32 .\" Always quote multiple words for .SH
33 .PP
34 .BR fdupes (1),
35 .BR sha1sum (1)
36 .PP
37 .UR "http://kernel.embedromix.ro/us/"
38 Home page
39 .UE .
40 .SH NOTES
41 This program does not delete any files. It is your responsibility to
42 decide what to delete.
43 .SH AUTHOR
44 .UR catab-dupdump@embedromix.ro
45 Catalin(ux) M. BOIE
46 .UE .
47
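Note: a hypothetical invocation for orientation (paths invented for illustration); per the DESCRIPTION above, matches go to stdout as tab-separated DIR/FILE lines, matching the printf() calls in store.c:

    $ dupdump --min-size 4096 /backup/2011 /backup/2012
    DIR     /backup/2011/photos     /backup/2012/photos.old
    FILE    /backup/2011/notes.txt  /backup/2012/misc/notes.txt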
File dupdump.c changed (mode: 100644) (index f10b7b7..b293669)
9 9 #include <stdlib.h>
10 10 #include <string.h>
11 11 #include <errno.h>
12 #include <getopt.h>
12 13
13 14 #include "store.h"
14 15
15 16 static off_t min_size = 0;
16 static int verbose = 10;
17 static int verbose = 0;
18 static int debug = 0;
19
20 static struct option options[] =
21 {
22 {"min-size", required_argument, NULL, 'i'},
23 {"verbose", no_argument, NULL, 'v'},
24 {"debug", no_argument, NULL, 'd'},
25 {NULL, 0, NULL, 0}
26 };
27
28 static void usage(void)
29 {
30 fprintf(stderr, "Usage [options] <dir1> [<dir2>] ...\n"
31 " --min-size -i Ignore files under this size (default 1)\n"
32 " --verbose -v Be more verbose\n"
33 " --debug -d Print debug information\n"
34 );
35 }
17 36
18 37 static int callback(const char *fpath, const struct stat *s, int tflag,
19 38 struct FTW *ftwbuf)
 
... ... static int callback(const char *fpath, const struct stat *s, int tflag,
32 51 if ((!S_ISREG(s->st_mode)) && (!S_ISDIR(s->st_mode)))
33 52 return 0;
34 53
35 /* Ignore wat was already seen */
54 /* Ignore what was already seen */
36 55 if (dev_ino_seen(tflag, s->st_dev, s->st_ino) == 1) {
37 56 if (verbose >= 3)
38 57 fprintf(stderr, "\tINFO: Object skiped because"
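Note: the dev_ino_seen() check above is what keeps hard links (and objects reached twice) from being counted as duplicates of themselves. A minimal sketch of such a seen-set, assuming a fixed-size chained hash like the dev_ino_hash table in store.c (table size, hash function, and the _sketch name are invented here):

    #include <stdlib.h>
    #include <sys/types.h>

    #define DEV_INO_HASH_SIZE 4096   /* assumption: real size lives in store.c */

    struct dev_ino {
            dev_t dev;
            ino_t ino;
            struct dev_ino *next;
    };

    static struct dev_ino *dev_ino_hash[DEV_INO_HASH_SIZE];

    /* Return 1 if (dev, ino) was already seen, else remember it and return 0 */
    static int dev_ino_seen_sketch(dev_t dev, ino_t ino)
    {
            unsigned int slot = (unsigned int) (dev ^ ino) % DEV_INO_HASH_SIZE;
            struct dev_ino *p;

            for (p = dev_ino_hash[slot]; p; p = p->next)
                    if (p->dev == dev && p->ino == ino)
                            return 1;

            p = malloc(sizeof(*p));
            if (!p)
                    return 0;   /* on OOM, treat as unseen */
            p->dev = dev;
            p->ino = ino;
            p->next = dev_ino_hash[slot];
            dev_ino_hash[slot] = p;
            return 0;
    }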
 
... ... static int callback(const char *fpath, const struct stat *s, int tflag,
62 81
63 82 int main(int argc, char *argv[])
64 83 {
65 int flags = 0, i;
84 int flags = 0;
66 85 int err;
67
68 if (argc < 2) {
69 fprintf(stderr, "Usage: dumpdump dir1 [dir2] [dir3]\n");
70 return 1;
86 int options_index = 0;
87 char c;
88
89 while ((c = getopt_long(argc, argv, "i:vdh", options, &options_index)) != -1) {
90 switch (c) {
91 case 'i': min_size = strtoul(optarg, NULL, 10); break;
92 case 'v': verbose = 1; break;
93 case 'd': debug = 1; break;
94 default:
95 usage();
96 return 1;
97 }
71 98 }
72 99
73 100 flags |= FTW_PHYS; /* Do not follow symlinks */
74 101 flags |= FTW_ACTIONRETVAL; /* To skip hierarchies */
75 102
76 i = 1;
77 while (argv[i]) {
78 fprintf(stderr, "Processing dir %s...\n", argv[i]);
79 err = nftw(argv[i], callback, 100, flags);
103 if (optind >= argc) {
104 usage();
105 fprintf(stderr, "No dirs to scan specified!\n");
106 return 1;
107 }
108
109 set_debug(debug);
110
111 if (verbose)
112 fprintf(stderr, "Scanning for duplicates, min-size %lld\n",
113 min_size);
114
115 while (optind < argc) {
116 if (verbose)
117 fprintf(stderr, "Processing dir %s...\n", argv[optind]);
118
119 err = nftw(argv[optind], callback, 100, flags);
80 120 if (err == -1) {
81 121 fprintf(stderr, "Cannot search dir [%s] [%d] (%s)\n",
82 argv[i], err, strerror(errno));
122 argv[optind], err, strerror(errno));
83 123 return 1;
84 124 }
85 125
86 i++;
126 optind++;
87 127 }
88 128
89 if (verbose >= 2)
129 if (debug)
90 130 dump_files();
91 131
92 /* Check for file duplicates */
132 if (verbose)
133 fprintf(stderr, "Find duplicate files...\n");
93 134 err = file_find_dups();
94 135 if (err != 0) {
95 136 fprintf(stderr, "Error comparing files!\n");
96 137 return 1;
97 138 }
98 139
99 /* Check for dir duplicates */
140 if (verbose)
141 fprintf(stderr, "Find duplicate dirs...\n");
100 142 err = dir_find_dups();
101 143 if (err != 0) {
102 144 fprintf(stderr, "Error comparing dirs!\n");
103 145 return 1;
104 146 }
105 147
106 dump_dirs();
148 if (debug)
149 dump_dirs();
107 150
108 fprintf(stderr, "\nDUMP DUPLICATES...\n\n");
109 151 dump_duplicates(min_size);
110 152
111 dump_stats();
153 if (verbose)
154 dump_stats();
112 155
113 156 return 0;
114 157 }
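Note: the new main() above is the standard getopt_long-then-nftw pattern. A compile-ready reduction of it (program trimmed to one option; _GNU_SOURCE is needed for glibc's FTW_ACTIONRETVAL extension, which the real code also sets):

    #define _GNU_SOURCE
    #include <ftw.h>
    #include <getopt.h>
    #include <stdio.h>

    static int verbose;

    static int cb(const char *fpath, const struct stat *s, int tflag,
                  struct FTW *ftwbuf)
    {
            (void) s; (void) tflag; (void) ftwbuf;
            if (verbose)
                    fprintf(stderr, "visiting %s\n", fpath);
            return 0;
    }

    int main(int argc, char *argv[])
    {
            static const struct option opts[] = {
                    {"verbose", no_argument, NULL, 'v'},
                    {NULL, 0, NULL, 0}
            };
            int c;   /* int, not char: getopt_long() returns int */

            while ((c = getopt_long(argc, argv, "v", opts, NULL)) != -1) {
                    if (c == 'v')
                            verbose = 1;
                    else
                            return 1;
            }

            /* Walk each remaining argument; FTW_PHYS skips symlinks */
            for (; optind < argc; optind++)
                    if (nftw(argv[optind], cb, 100, FTW_PHYS | FTW_ACTIONRETVAL) == -1) {
                            perror("nftw");
                            return 1;
                    }
            return 0;
    }

One subtlety in the commit itself: it stores getopt_long()'s result in a char, which can never compare equal to -1 on platforms where plain char is unsigned; int is the portable type.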
File store.c changed (mode: 100644) (index 0ca71a0..5d5b290)
... ... static unsigned int dir_info_count;
37 37 static struct dir_node *dir_current[MAX_DEPTH];
38 38 static unsigned char sha1_zero[SHA_DIGEST_LENGTH];
39 39 static struct dev_ino *dev_ino_hash[DEV_INO_HASH_SIZE];
40 static int debug = 0;
41
42 /* ############### Misc functions ############### */
43 void set_debug(const unsigned int level)
44 {
45 debug = level;
46 }
40 47
41 48 /* ############### Memory functions ############### */
42 49 static void *xmalloc(size_t size)
 
... ... int compare_sha1(const unsigned char *a, const unsigned char *b)
137 144
138 145 sha1_dump(sha1_a, a, 0);
139 146 sha1_dump(sha1_b, a, 0);
140 fprintf(stderr, "\t\tComparing [%s] with [%s]\n", sha1_a, sha1_b);
141 147 return memcmp(a, b, SHA_DIGEST_LENGTH);
142 148 }
143 149
 
... ... static void dir_mark_do_not_dump(struct dir_node *d)
484 490 struct file_node *file;
485 491 struct dir_node *subdir;
486 492
487 fprintf(stderr, "DEBUG: dir_mark_do_not_dump(%s)\n", d->name);
493 if (debug)
494 fprintf(stderr, "DEBUG: dir_mark_do_not_dump(%s)\n", d->name);
488 495 if ((d == NULL) || (d->do_not_dump == 1))
489 496 return;
490 497
 
... ... static void dir_mark_do_not_dump(struct dir_node *d)
498 505
499 506 file = d->files;
500 507 while (file) {
501 fprintf(stderr, "\tSet do_not_dump=1 on [%s]\n", file->name);
508 if (debug)
509 fprintf(stderr, "\tSet do_not_dump=1 on [%s]\n", file->name);
502 510 file->do_not_dump = 1;
503 511 file = file->next;
504 512 }
 
... ... static void file_mark_no_dup_possible(struct file_node *f)
539 547 dir_mark_no_dup_possible(f->parent);
540 548 }
541 549
542 /*
543 * Mark a file to not be dumped
544 */
545 static void file_mark_do_not_dump(struct file_node *f)
546 {
547 if ((f == NULL) || (f->do_not_dump == 1))
548 return;
549
550 f->do_not_dump = 1;
551 }
552
553 550 /*
554 551 * Compare the same size files using hashes
555 552 * @a - start of the chain of files with same size
 
... ... static int compare_file_range(struct file_node *a, struct file_node *b)
562 559 int err;
563 560 struct file_node *q, *p, *dups, *p_last;
564 561
565 fprintf(stderr, "compare_file_range:");
566 q = a;
567 while (q != b->hash_next) {
568 fprintf(stderr, " %s", q->name);
569 q = q->hash_next;
562 if (debug) {
563 fprintf(stderr, "compare_file_range:");
564 q = a;
565 while (q != b->hash_next) {
566 fprintf(stderr, " %s", q->name);
567 q = q->hash_next;
568 }
569 fprintf(stderr, ".\n");
570 570 }
571 fprintf(stderr, ".\n");
572 571
573 572 /* Mark all as unique */
574 573 q = a;
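Note: compare_file_range() walks a chain of same-size files and pairwise-compares their hashes; this is the O(N*N) comparison the TODO complains about. A reduced sketch of the marking pass, under assumed field names (the real struct file_node has more members, and tracks the same state through its unique flag):

    #include <string.h>

    /* Assumed, simplified node: same-size files are chained via hash_next;
       duplicates pointers start out NULL */
    struct fnode {
            unsigned char sha1[20];     /* SHA_DIGEST_LENGTH bytes */
            struct fnode *hash_next;    /* next file with the same size */
            struct fnode *duplicates;   /* run of files equal to this one */
            int claimed;                /* already linked into an earlier run */
    };

    /* Pairwise-compare files in [a..b]; link equal ones under the run head */
    static void mark_dups(struct fnode *a, struct fnode *b)
    {
            struct fnode *end = b->hash_next;
            struct fnode *p, *q, *last;

            for (p = a; p != end; p = p->hash_next) {
                    if (p->claimed)
                            continue;   /* already belongs to an earlier head */
                    last = p;
                    for (q = p->hash_next; q != end; q = q->hash_next) {
                            if (q->claimed || memcmp(p->sha1, q->sha1, 20) != 0)
                                    continue;
                            q->claimed = 1;
                            last->duplicates = q;   /* append q to p's run */
                            last = q;
                    }
            }
    }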
 
... ... static int compare_file_range(struct file_node *a, struct file_node *b)
592 591 }
593 592
594 593 err = compare_files(p, q);
595 fprintf(stderr, "COMPARING [%s] with [%s] = %d\n", p->name, q->name, err);
594 if (debug)
595 fprintf(stderr, "COMPARING [%s] with [%s] = %d\n", p->name, q->name, err);
596 596 if (err == -1)
597 597 return -1;
598 598
 
... ... static int compare_file_range(struct file_node *a, struct file_node *b)
607 607 p_last->duplicates = q;
608 608 p_last = q;
609 609
610 fprintf(stderr, "\tp[%s]->duplicates: ", p->name);
611 dups = p->duplicates;
612 while (dups) {
613 fprintf(stderr, " %s", dups->name);
614 dups = dups->duplicates;
610 if (debug) {
611 fprintf(stderr, "\tp[%s]->duplicates: ", p->name);
612 dups = p->duplicates;
613 while (dups) {
614 fprintf(stderr, " %s", dups->name);
615 dups = dups->duplicates;
616 }
617 fprintf(stderr, "\n");
615 618 }
616 fprintf(stderr, "\n");
617 619
618 620 p->unique = 0;
619 621 q->unique = 0;
 
... ... int file_find_dups(void)
648 650 unsigned int hash;
649 651 unsigned long long size;
650 652
651 fprintf(stderr, "file_find_dups START...\n");
652 653 for (hash = 0; hash < HASH_SIZE; hash++) {
653 654 if (file_info[hash] == NULL)
654 655 continue;
655 656
656 fprintf(stderr, "file_find_dups[%u]...\n", hash);
657 if (debug)
658 fprintf(stderr, "file_find_dups[%u]...\n", hash);
657 659
658 660 /* We need at least 2 nodes */
659 661 if (file_info[hash]->hash_next == NULL) {
 
... ... int file_find_dups(void)
685 687 first = last->hash_next;
686 688 }
687 689
688 fprintf(stderr, "Dump chain %u: ", hash);
689 q = file_info[hash];
690 while (q) {
691 fprintf(stderr, "%s(", q->name);
692 dups = q->duplicates;
693 while(dups) {
694 fprintf(stderr, " %s", dups->name);
695 dups = dups->duplicates;
690 if (debug) {
691 fprintf(stderr, "Dump chain %u: ", hash);
692 q = file_info[hash];
693 while (q) {
694 fprintf(stderr, "%s(", q->name);
695 dups = q->duplicates;
696 while(dups) {
697 fprintf(stderr, " %s", dups->name);
698 dups = dups->duplicates;
699 }
700 fprintf(stderr, ") -> ");
701 q = q->hash_next;
696 702 }
697 fprintf(stderr, ") -> ");
698 q = q->hash_next;
703 fprintf(stderr, "\n");
699 704 }
700 fprintf(stderr, "\n");
701 705 }
702 706
703 fprintf(stderr, "file_find_dups ENDS...\n");
704 707 return 0;
705 708 }
706 709
 
... ... static int dir_files_hash(unsigned char *hash, struct dir_node *d)
738 741 struct file_node **u;
739 742 unsigned int i, mem;
740 743 SHA_CTX c;
741 char dump[SHA_DIGEST_LENGTH * 2 + 1];
742 744
743 745 if (d->files == NULL) {
744 746 memset(hash, 0, SHA_DIGEST_LENGTH);
 
... ... static int dir_files_hash(unsigned char *hash, struct dir_node *d)
760 762
761 763 qsort(u, d->no_of_files, sizeof(struct file_node *), file_compare_hashes);
762 764
763 /*
764 fprintf(stderr, "DEBUG: dump after qsort [%s]\n", d->name);
765 for (i = 0; i < d->no_of_files; i++) {
766 sha1_dump(dump, u[i]->sha1_full, 0);
767 fprintf(stderr, "DEBUG: %s\t%u\t%s\n", dump, u[i]->parent->level, u[i]->name);
768 }
769 */
770
771 765 SHA1_Init(&c);
772 766
773 767 i = 0;
 
... ... static long long dir_build_hash(struct dir_node *d)
795 789 long long no_of_possible_dirs = 0;
796 790 long long ret;
797 791
798 fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n",
799 __FUNCTION__, d->name, d->no_dup_possible);
792 if (debug)
793 fprintf(stderr, "DEBUG: %s [%s] no_dup_possible=%u\n",
794 __FUNCTION__, d->name, d->no_dup_possible);
800 795
801 796 /* We check current dir first. */
802 797 if (d->no_dup_possible == 0)
 
... ... static long long dir_find_dups_populate_list(struct dir_node **u,
877 872 struct dir_node *subdir;
878 873 long long new_pos;
879 874
880 /*
881 fprintf(stderr, "\tDEBUG: ENTER %s [%s] pos=%lld\n",
882 __FUNCTION__, d->name, pos);
883 */
884
885 875 new_pos = pos;
886 876
887 877 /* We check current dir first. */
 
... ... static long long dir_find_dups_populate_list(struct dir_node **u,
896 886 subdir = subdir->next_sibling;
897 887 }
898 888
899 /*
900 fprintf(stderr, "\tDEBUG: EXIT %s [%s] new_pos=%lld\n",
901 __FUNCTION__, d->name, new_pos);
902 */
903
904 889 return new_pos;
905 890 }
906 891
 
... ... int dir_find_dups(void)
920 905 struct dir_node **u;
921 906 char dump[SHA_DIGEST_LENGTH * 2 + 1];
922 907
923 fprintf(stderr, "DEBUG: %s...\n", __FUNCTION__);
924 908 for (i = 0; i < dir_info_count; i++) {
925 fprintf(stderr, "\tDEBUG: [%llu] build hash for [%s]...\n", i, dir_info[i]->name);
926 909 err = dir_build_hash(dir_info[i]);
927 910 if (err == -1)
928 911 return -1;
929 912
930 913 no_of_possible_dirs += err;
931 914 }
932 fprintf(stderr, "\tDEBUG: no_of_possible_dirs = %lld\n", no_of_possible_dirs);
933 915
934 916 /* Allocate an array that we will pass to qsort */
935 917 mem = no_of_possible_dirs * sizeof(struct dir_node *);
 
... ... int dir_find_dups(void)
943 925 j = 0;
944 926 for (i = 0; i < dir_info_count; i++) {
945 927 d = dir_info[i];
946 fprintf(stderr, "dir_find_dups[i=%llu, j=%lld] [%s]\n", i, j, d->name);
928 if (debug)
929 fprintf(stderr, "dir_find_dups[i=%llu, j=%lld] [%s]\n", i, j, d->name);
947 930
948 931 j += dir_find_dups_populate_list(u, j, d);
949 932
 
... ... int dir_find_dups(void)
952 935 break;
953 936 }
954 937
955 fprintf(stderr, "dir u (j=%lld): ", j);
956 for (i = 0; i < no_of_possible_dirs; i++)
957 fprintf(stderr, "[%lld]=%s ", i, u[i]->name);
958 fprintf(stderr, "\n");
938 if (debug) {
939 fprintf(stderr, "dir u (j=%lld): ", j);
940 for (i = 0; i < no_of_possible_dirs; i++)
941 fprintf(stderr, "[%lld]=%s ", i, u[i]->name);
942 fprintf(stderr, "\n");
943 }
959 944
960 945 /* Order by hash */
961 946 qsort(u, no_of_possible_dirs, sizeof(struct dir_node *), dir_compare_hashes);
962 947
963 fprintf(stderr, "DEBUG: dump after dir qsort [%s]\n", d->name);
964 for (i = 0; i < no_of_possible_dirs; i++) {
965 sha1_dump(dump, u[i]->sha1, 0);
966 fprintf(stderr, "DEBUG: %s\t%u\t%s\n", dump, u[i]->level, u[i]->name);
948 if (debug) {
949 fprintf(stderr, "DEBUG: dump after dir qsort [%s]\n", d->name);
950 for (i = 0; i < no_of_possible_dirs; i++) {
951 sha1_dump(dump, u[i]->sha1, 0);
952 fprintf(stderr, "DEBUG: %s\t%u\t%s\n", dump, u[i]->level, u[i]->name);
953 }
967 954 }
968 955
969 956 first = 0;
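Note: once qsort() has ordered the candidate dirs by digest, equal dirs sit next to each other, so the scan that follows (tracked by the `first` index above) only compares neighbours instead of all pairs. A standalone sketch of that run detection (arrays and helper name invented; the output format matches the DIR lines store.c prints):

    #include <stdio.h>
    #include <string.h>

    /* Walk a digest-sorted list and pair every run member with the run head */
    static void report_dir_runs(unsigned char (*sha1)[20], const char **name, int n)
    {
            int i, first = 0;

            for (i = 1; i < n; i++) {
                    if (memcmp(sha1[i], sha1[first], 20) == 0)
                            printf("DIR\t%s\t%s\n", name[first], name[i]);
                    else
                            first = i;   /* digest changed: a new run starts */
            }
    }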
 
... ... void dir_dump_duplicates(struct dir_node *d)
1028 1015 }
1029 1016
1030 1017 dir_mark_left(d);
1031 fprintf(stderr, "dir_dump_duplicates: set do_not_dump on left [%s]\n", d->name);
1018 if (debug)
1019 fprintf(stderr, "dir_dump_duplicates: set do_not_dump on left [%s]\n", d->name);
1032 1020 dir_mark_do_not_dump(d);
1033 1021
1034 fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name);
1022 if (debug)
1023 fprintf(stderr, "dir_dump_duplicates: set do_not_dump on right [%s]\n", p->name);
1035 1024 dir_mark_do_not_dump(p);
1036 1025
1037 fprintf(stderr, "DIR\t%s\t%s\n",
1038 d->name, p->name);
1039 1026 printf("DIR\t%s\t%s\n",
1040 1027 d->name, p->name);
1041 1028 p = p->hash_next;
 
... ... void file_dump_duplicates(struct file_node *f,
1050 1037 {
1051 1038 struct file_node *p, *first_left;
1052 1039
1053 fprintf(stderr, "\tfile_dump_duplicates [%s]\n", f->name);
1054 file_dump_node(f, 1);
1040 if (debug)
1041 file_dump_node(f, 1);
1055 1042
1056 1043 if (f->duplicates == NULL) {
1057 fprintf(stderr, "\tignore duplicate file because ->duplicates is NULL\n");
1044 if (debug)
1045 fprintf(stderr, "\tignore duplicate file because ->duplicates is NULL\n");
1058 1046 return;
1059 1047 }
1060 1048
1061 1049 if (f->no_dup_possible == 1) {
1062 fprintf(stderr, "\tignore duplicate file because no_dup_possible=1\n");
1050 if (debug)
1051 fprintf(stderr, "\tignore duplicate file because no_dup_possible=1\n");
1063 1052 return;
1064 1053 }
1065 1054
1066 1055 if (f->do_not_dump == 1) {
1067 fprintf(stderr, "\tignore duplicate file because do_not_dump=1\n");
1056 if (debug)
1057 fprintf(stderr, "\tignore duplicate file because do_not_dump=1\n");
1068 1058 return;
1069 1059 }
1070 1060
1071 1061 if (f->size < min_size) {
1072 fprintf(stderr, "\tignore duplicate file because size < min\n");
1062 if (debug)
1063 fprintf(stderr, "\tignore duplicate file because size < min\n");
1073 1064 return;
1074 1065 }
1075 1066
 
... ... void file_dump_duplicates(struct file_node *f,
1085 1076 p = p->duplicates;
1086 1077 }
1087 1078 }
1088 fprintf(stderr, "first_left = [%s]\n", first_left->name);
1079 if (debug)
1080 fprintf(stderr, "first_left = [%s]\n", first_left->name);
1089 1081
1090 1082 /* now, dump */
1091 1083 p = f;
 
... ... void file_dump_duplicates(struct file_node *f,
1095 1087 * it for dirs.
1096 1088 */
1097 1089 if (p->do_not_dump == 1) {
1098 fprintf(stderr, "\t\tignore duplicate file in chain because do_not_dump=1 [%s]\n", p->name);
1090 if (debug)
1091 fprintf(stderr, "\t\tignore duplicate file in chain because do_not_dump=1 [%s]\n", p->name);
1099 1092 p = p->duplicates;
1100 1093 continue;
1101 1094 }
 
... ... void file_dump_duplicates(struct file_node *f,
1105 1098 continue;
1106 1099 }
1107 1100
1108 fprintf(stderr, "Because we will dump [%s] as a left, set do_not_dump=1\n", first_left->name);
1101 if (debug)
1102 fprintf(stderr, "Because we will dump [%s] as a left, set do_not_dump=1\n", first_left->name);
1109 1103 first_left->left = 1;
1110 1104 first_left->do_not_dump = 1;
1111 1105
1112 1106 /* Prevent this file to appear again in the dump */
1113 fprintf(stderr, "Because [%s] is a right, set do_not_dump=1\n", p->name);
1107 if (debug)
1108 fprintf(stderr, "Because [%s] is a right, set do_not_dump=1\n", p->name);
1114 1109 p->do_not_dump = 1;
1115 1110
1116 fprintf(stderr, "\t\t\t%s = %s\n",
1117 first_left->name, p->name);
1118 1111 printf("FILE\t%s\t%s\n",
1119 1112 first_left->name, p->name);
1120 1113 p = p->duplicates;
 
... ... void dump_duplicates(const unsigned long long min_size)
1132 1125 struct file_node *f;
1133 1126 unsigned int hash;
1134 1127
1135 fprintf(stderr, "Dump duplicates (bigger than %llu)...\n", min_size);
1136
1137 1128 for (i = 0; i < dir_info_count; i++) {
1138 fprintf(stderr, "\tdump_duplicates[%u]...\n", i);
1129 if (debug)
1130 fprintf(stderr, "\tdump_duplicates[%u]...\n", i);
1139 1131 d = dir_info[i];
1140 1132 dir_dump_duplicates(d);
1141 1133
 
... ... void dump_duplicates(const unsigned long long min_size)
1147 1139 }
1148 1140
1149 1141 /* Now, we dump remaining files */
1150 fprintf(stderr, "DEBUG: Dump duplicated files...\n");
1142 if (debug)
1143 fprintf(stderr, "DEBUG: Dump duplicated files...\n");
1151 1144 for (hash = 0; hash < HASH_SIZE; hash++) {
1152 1145 if (file_info[hash] == NULL)
1153 1146 continue;
1154 1147
1155 fprintf(stderr, "Dump duplicates in hash %u\n", hash);
1148 if (debug)
1149 fprintf(stderr, "Dump duplicates in hash %u\n", hash);
1156 1150
1157 1151 f = file_info[hash];
1158 1152 while (f) {
File store.h changed (mode: 100644) (index ebdd562..27e1a3d)
... ... struct dir_node
52 52 };
53 53
54 54
55 extern void set_debug(const unsigned int level);
55 56 extern void dump_stats(void);
56 57 extern int file_add(const char *file, const struct stat *s,
57 58 const unsigned int level);