find-duplicated-files
Features
- It consists of two programs: `walk.go` and `see.py`.
- `walk.go` calculates the hash (`sha256sum`) of every file in a specific directory that matches a file pattern, and writes the results to a JSON file.
- `see.py` reads the contents of this JSON file and generates a custom HTML file.
- `see.py` can read several JSON files.
- `walk.go` records the hash, modification date, size and permissions of each file.
- `walk.go` finds files that have the same content, even if they have different file names or are in different places.
- It ignores symlinks, because a symlink could recursively pull in content from other places (you can calculate the hashes in the original directory instead).
- Dependencies: only Python 3 (for running `see.py`) and the Go programming language (1.0 or above, for compiling `walk.go`).
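
The hashing and duplicate-detection idea listed above can be sketched in a few lines. This is a minimal Python illustration, not the actual `walk.go` implementation: the function names `sha256_of` and `find_duplicates` are hypothetical, and it assumes the pattern is a shell-style glob matched against file names.

```python
import fnmatch
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, pattern="*"):
    """Group files under root by content hash; keep only groups of 2 or more."""
    by_hash = defaultdict(list)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinked files and names that do not match the pattern.
            if os.path.islink(path) or not fnmatch.fnmatch(name, pattern):
                continue
            by_hash[sha256_of(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates("foo", "*.jpg")` would return a dictionary that maps each repeated hash to the paths sharing that content. Files with unique content are left out.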
Installation
- Be sure you have Python 3 and Go installed.
- Copy `walk.go` (use the releases, because these are the stable versions) and compile it.
- Copy the file `see.py`.
Usage
- See `./walk --help`:

      $ ./walk --help
      Usage of ./walk:
        -output="./output.json": The JSON output file in which we save the results
        -path="./": Source of files
        -pattern="*": Pattern search expression of searching files
- See `./see.py --help`:

      $ ./see.py --help
      usage: see.py [-h] [--sources [SOURCES [SOURCES ...]]] [--output OUTPUT]

      optional arguments:
        -h, --help            show this help message and exit
        --sources [SOURCES [SOURCES ...]]
                              The JSON files in which we have hash values
        --output OUTPUT       The output html file in which we put the results
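
For readers who want to see how an interface like the one above is typically declared, here is a minimal `argparse` sketch. It is an assumption about how such a parser could look, not the actual code of `see.py`: the `build_parser` name and the empty-list default for `--sources` are hypothetical. The exact wording of the generated help text also varies between Python versions.

```python
import argparse


def build_parser():
    """Build a parser that accepts zero or more --sources and one --output."""
    parser = argparse.ArgumentParser(prog="see.py")
    # nargs="*" lets --sources take any number of file names.
    parser.add_argument("--sources", nargs="*", default=[],
                        help="The JSON files in which we have hash values")
    parser.add_argument("--output",
                        help="The output html file in which we put the results")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["--sources", "output1.json", "output2.json", "--output", "files.html"])
print(args.sources, args.output)
```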
Example
For example, to search for `jpg` files in the `foo` and `bla` directories, run:

    ./walk -path=foo -pattern="*.jpg" -output=output1.json
    ./walk -path=bla -pattern="*.jpg" -output=output2.json
    ./see.py --sources ./output1.json ./output2.json --output files.html
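
When several JSON files are passed through `--sources`, records from all of them have to be combined before duplicates across directories can be reported. The sketch below shows one plausible way to do that. The JSON layout is an assumption made purely for illustration (a list of objects with `path` and `hash` keys), since the real format written by `walk` is not documented here, and `merge_by_hash` is a hypothetical name.

```python
import json
from collections import defaultdict


def merge_by_hash(json_paths):
    """Combine records from several JSON files and group paths by hash."""
    groups = defaultdict(list)
    for json_path in json_paths:
        with open(json_path, encoding="utf-8") as handle:
            # Assumed layout: a list of {"path": ..., "hash": ...} objects.
            for record in json.load(handle):
                groups[record["hash"]].append(record["path"])
    # Only hashes seen more than once indicate duplicated content.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Under that assumed layout, `merge_by_hash(["output1.json", "output2.json"])` would map each shared hash to every path that has it, whether the paths come from `foo`, from `bla`, or from both.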