19 May
Fuzzy hash of a file
Could someone please explain this to me? When you use a fuzzy hashing algorithm (ssdeep, TLSH, sdhash, or any other) to calculate the hash of a file, is the hash computed over the whole file, including its structural elements? For example, a PDF file contains a trailer dictionary, a cross-reference table or stream, etc.; are these taken into account? Or is the hash computed only over the file's content? In a PDF, the content is usually compressed and stored in binary streams alongside all the other information, so how could the algorithm "extract" the content in that case?
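A minimal sketch can illustrate the behaviour the question asks about. It assumes a simplified form of context-triggered piecewise hashing (CTPH), the technique ssdeep is built on. The window size, block size, rolling hash (a plain sum) and chunk hash (FNV-1a) are illustrative choices, not ssdeep's actual algorithm, and `ctph` / `ctph_file` are hypothetical names. The point it shows is that a generic fuzzy hash only ever sees the bytes as stored on disk: ssdeep, for instance, reads its input as a plain byte stream and does not parse or decompress PDF objects. As a result, the header, trailer, cross-reference data and compressed streams all feed into the hash in the same way.

```python
# Simplified context-triggered piecewise hashing (CTPH), the idea behind ssdeep.
# Illustrative only: the window size, block size, rolling hash (a plain sum)
# and chunk hash (FNV-1a) are assumptions, not ssdeep's exact algorithm.

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7                 # bytes in the rolling window (assumed)
FNV_OFFSET = 0x811C9DC5    # 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193     # 32-bit FNV-1a prime


def ctph(data: bytes, block_size: int = 64) -> str:
    """Piecewise signature of raw bytes; the input format is never parsed."""
    sig = []
    window = [0] * WINDOW
    roll = 0               # sum of the last WINDOW bytes
    chunk = FNV_OFFSET     # hash of the bytes since the last trigger point
    pending = False        # True if bytes were hashed since the last emitted char
    for i, byte in enumerate(data):
        # Slide the window: drop the oldest byte, add the newest one.
        roll += byte - window[i % WINDOW]
        window[i % WINDOW] = byte
        chunk = ((chunk ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        pending = True
        # Trigger points depend only on the last WINDOW bytes, so an edit in
        # one place does not shift chunk boundaries elsewhere in the input.
        if roll % block_size == block_size - 1:
            sig.append(B64[chunk & 63])   # keep 6 bits of each chunk's hash
            chunk, pending = FNV_OFFSET, False
    if pending:            # flush the trailing partial chunk
        sig.append(B64[chunk & 63])
    return "".join(sig)


def ctph_file(path: str, block_size: int = 64) -> str:
    """Hash a file exactly as stored on disk. For a PDF this includes the
    header, the (possibly compressed) object streams, the cross-reference
    data and the trailer: nothing is decompressed or extracted."""
    with open(path, "rb") as f:
        return ctph(f.read(), block_size)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(4000))
    edited = original[:2000] + b"X" + original[2001:]
    # The two signatures should differ only around the edited chunk.
    print(ctph(original))
    print(ctph(edited))
```

Because trigger points depend only on a small window of nearby bytes, changing one byte in the middle of the input should alter only the signature characters around that spot, while the rest of the two signatures stays identical; this is what makes CTPH signatures comparable. The flip side, relevant to the PDF case, is that two PDFs with the same visible text but different compression settings can produce very different byte streams and therefore dissimilar fuzzy hashes.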