Find duplicated files (and also files that differ) across several locations. It generates static JSON and HTML output files.


find-duplicated-files

Features

  • It consists of two programs: walk.go and see.py
  • walk.go calculates the hash (sha256sum) of every file in a specific directory that matches a file pattern, and writes the results to a JSON file
  • see.py reads the contents of this JSON file and generates a custom HTML file
  • see.py can read several JSON files
  • walk.go records the hash, modification date, size and permissions of each file
  • walk.go finds the files which have the same content, even if they have different file names or are in different places.
  • It ignores symlinks because they could recursively pull in content from other places (you can calculate the hashes in the original directory instead)
  • Dependencies: only Python 3 (for running see.py) and Go programming language (1.0 or above) (for compiling walk.go)
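
The duplicate detection described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the code of walk.go: the function names `sha256_of` and `find_duplicates` are hypothetical, and the pattern is assumed to be matched against the file name only.

```python
import fnmatch
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, pattern="*"):
    """Map each digest to the paths sharing it, keeping only repeated digests."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and names that do not match the pattern
            # (assumption: the pattern applies to the base name only).
            if os.path.islink(path) or not fnmatch.fnmatch(name, pattern):
                continue
            by_hash[sha256_of(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Grouping by digest means two files count as duplicates whenever their contents match, regardless of name or location.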

Installation

  • Be sure you have Python 3 and Go installed.
  • Copy walk.go (use the releases, because these are stable versions) and compile it.
  • Copy the file see.py.

Usage

  • See ./walk --help:

  $ ./walk --help
  Usage of ./walk:
    -output="./output.json": The JSON output file in which we save the results
    -path="./": Source of files
    -pattern="*": Pattern search expression of searching files

  • See ./see.py --help:

  $ ./see.py --help
  usage: see.py [-h] [--sources [SOURCES [SOURCES ...]]] [--output OUTPUT]

  optional arguments:
    -h, --help            show this help message and exit
    --sources [SOURCES [SOURCES ...]]
                          The JSON files in which we have hash values
    --output OUTPUT       The output html file in which we put the results

Example

For example, to search for jpg files in the foo and bla directories, you should run:


./walk -path=foo -pattern="*.jpg" -output=output1.json
./walk -path=bla -pattern="*.jpg" -output=output2.json
./see.py --sources ./output1.json ./output2.json --output files.html
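
To show how several JSON outputs could be combined before rendering, here is a rough Python sketch. The JSON layout is an assumption: this README does not document the schema written by walk.go, so the sketch supposes each file holds a list of records with hypothetical `hash` and `path` keys. The function names `merge_reports` and `duplicated` are also illustrative and are not taken from see.py.

```python
import json
from collections import defaultdict


def merge_reports(json_paths):
    """Group file paths by hash across several reports (assumed record layout)."""
    by_hash = defaultdict(list)
    for json_path in json_paths:
        with open(json_path, encoding="utf-8") as handle:
            records = json.load(handle)
        for record in records:
            # "hash" and "path" are assumed key names, not a documented schema.
            by_hash[record["hash"]].append(record["path"])
    return dict(by_hash)


def duplicated(by_hash):
    """Keep only the hashes that appear for more than one path."""
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Under these assumptions, `duplicated(merge_reports(["output1.json", "output2.json"]))` would list content found more than once across foo and bla.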