Implement telfhash for ELF import table (#936)

* Implement telfhash for import table and add TLSH to the project

* comment the import symbol filter regexes

* Use std::set for faster lookup

* Address code review comments

* better formatting

* Move TLSH to deps/ using cmake

* Forgot to commit tlsh headers

* Restructure elf_format to get symbols in the same manner as telfhash

* Ignore symbols from dynamic segments

* First exclude then convert to lower_case

* mask out symbol visibility from others

* Move telfhash outside import table to elf_format, use TLSH for all imphashes, create default imphash for ELF

* Fix uninitialized value

* Fixed TLSH build on Windows

* fileformat/CMakeLists.txt: do not add tlsh-related stuff

* deps/tlsh: refactor CMake

* cmake/options.cmake: move TLSH to deps section

* deps/tlsh/cmake: add new line at the end

* fileformat/elf_format: C comment -> C++ comment

* fileformat/elf_import_table.h: add missing new line

* fileformat: remove trailing spaces

Co-authored-by: Peter Matula <peter.matula@avast.com>
Co-authored-by: Peter Matula <p3t3r.matula@gmail.com>
HoundThe 2021-04-14 13:03:15 +02:00 committed by GitHub
parent dca4d73f5c
commit 0cdc9a1de6
GPG Key ID: 4AEE18F83AFDEB23
33 changed files with 7290 additions and 9 deletions
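
The commit message mentions several steps of the symbol handling: std::set lookup, exclusion before lower-casing, and masking visibility out of st_other. A minimal sketch of how such a pipeline could look is given below. It is not the code from this commit: `Symbol`, `kExcluded` and `telfhashInput` are hypothetical names, the exclusion list is a placeholder, and keeping only global, default-visibility function symbols and joining the sorted names with commas is an assumption about how telfhash prepares its input. The resulting string would then be handed to TLSH.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one ELF symbol table entry.
struct Symbol {
    std::string name;
    std::uint8_t info;   // st_info: binding in the high nibble, type in the low nibble
    std::uint8_t other;  // st_other: visibility in the two low bits
};

// Placeholder exclusion list (assumption); std::set gives O(log n) lookup.
static const std::set<std::string> kExcluded = {"__libc_start_main", "main", "abort"};

// Builds the comma-separated symbol string that would be fed to the fuzzy hash.
std::string telfhashInput(const std::vector<Symbol>& symbols)
{
    std::vector<std::string> kept;
    for (const auto& s : symbols) {
        const unsigned type = s.info & 0x0f;         // same as ELF64_ST_TYPE
        const unsigned binding = s.info >> 4;        // same as ELF64_ST_BIND
        const unsigned visibility = s.other & 0x03;  // same as ELF64_ST_VISIBILITY
        // Keep STT_FUNC (2), STB_GLOBAL (1), STV_DEFAULT (0) only.
        if (type != 2 || binding != 1 || visibility != 0)
            continue;
        // First exclude, then convert to lower case.
        if (s.name.empty() || kExcluded.count(s.name) != 0)
            continue;
        std::string lower = s.name;
        std::transform(lower.begin(), lower.end(), lower.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        kept.push_back(lower);
    }
    std::sort(kept.begin(), kept.end());

    std::string joined;
    for (std::size_t i = 0; i < kept.size(); ++i) {
        if (i != 0)
            joined += ',';
        joined += kept[i];
    }
    return joined;
}
```

Masking with `0x03` matters because the upper bits of st_other may carry processor-specific flags, so comparing the raw byte against STV_DEFAULT could wrongly drop symbols.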

@@ -18,6 +18,7 @@ RetDec uses the following third-party libraries or other resources:
10) yaramod: https://github.com/avast/yaramod
11) Eigen: http://eigen.tuxfamily.org/index.php?title=Main_Page
12) cmake-modules: https://github.com/rpavlik/cmake-modules
13) tlsh: https://github.com/trendmicro/tlsh
These third-party libraries or other resources are licensed under the
following licenses:
@@ -699,3 +700,276 @@ SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
===============================================================================
13) tlsh
===============================================================================
=====================
LICENSE OPTION NOTICE
=====================
TLSH is provided for use under two licenses: Apache OR BSD.
Users may opt to use either license depending on the license
restrictions of the systems with which they plan to integrate
the TLSH code.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
BSD License Version 3
https://opensource.org/licenses/BSD-3-Clause
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
=========================================================================
== NOTICE file for use with the Apache License, Version 2.0, ==
== in this case for the Trend Locality Sensitive Hash distribution. ==
=========================================================================
Trend Locality Sensitive Hash (TLSH)
Copyright 2010-2014 Trend Micro
This product includes software developed at
Trend Micro (http://www.trendmicro.com/)
Refer to the following publications for more information:
Jonathan Oliver, Chun Cheng and Yanggui Chen,
"TLSH - A Locality Sensitive Hash"
4th Cybercrime and Trustworthy Computing Workshop, Sydney, November 2013
https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf
Jonathan Oliver, Scott Forman and Chun Cheng,
"Using Randomization to Attack Similarity Digests"
Applications and Techniques in Information Security. Springer Berlin Heidelberg, 2014. 199-210.
https://github.com/trendmicro/tlsh/blob/master/Attacking_LSH_and_Sim_Dig.pdf
Jonathan Oliver and Jayson Pryde
http://blog.trendmicro.com/trendlabs-security-intelligence/smart-whitelisting-using-locality-sensitive-hashing/

@@ -497,6 +497,9 @@ set_if_at_least_one_set(RETDEC_ENABLE_YARAMOD
RETDEC_ENABLE_PAT2YARA
RETDEC_ENABLE_PATTERNGEN)
set_if_at_least_one_set(RETDEC_ENABLE_TLSH
RETDEC_ENABLE_FILEFORMAT)
# Support
set_if_at_least_one_set(RETDEC_ENABLE_SUPPORT_ORDINALS

1
deps/CMakeLists.txt vendored
@@ -23,3 +23,4 @@ cond_add_subdirectory(rapidjson RETDEC_ENABLE_RAPIDJSON)
cond_add_subdirectory(tinyxml2 RETDEC_ENABLE_TINYXML2)
cond_add_subdirectory(yara RETDEC_ENABLE_YARA)
cond_add_subdirectory(yaramod RETDEC_ENABLE_YARAMOD)
cond_add_subdirectory(tlsh RETDEC_ENABLE_TLSH)

52
deps/tlsh/CMakeLists.txt vendored Normal file
@@ -0,0 +1,52 @@
add_library(tlsh STATIC
tlsh.cpp
tlsh_impl.cpp
tlsh_util.cpp
)
add_library(retdec::deps::tlsh ALIAS tlsh)
target_include_directories(tlsh
SYSTEM INTERFACE
$<BUILD_INTERFACE:${RETDEC_DEPS_DIR}/tlsh/include>
$<INSTALL_INTERFACE:${RETDEC_INSTALL_DEPS_INCLUDE_DIR}>
PRIVATE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include/tlsh>
)
set_target_properties(tlsh
PROPERTIES
OUTPUT_NAME "retdec-tlsh"
)
if(MSVC)
target_compile_definitions(tlsh PUBLIC WINDOWS TLSH_EXPORTS TLSH_LIB)
endif()
# Install includes.
install(
DIRECTORY ${RETDEC_DEPS_DIR}/tlsh/include/
DESTINATION ${RETDEC_INSTALL_DEPS_INCLUDE_DIR}
)
# Install libs.
install(TARGETS tlsh
EXPORT tlsh-targets
ARCHIVE DESTINATION ${RETDEC_INSTALL_LIB_DIR}
LIBRARY DESTINATION ${RETDEC_INSTALL_LIB_DIR}
)
# Export targets.
install(EXPORT tlsh-targets
FILE "retdec-tlsh-targets.cmake"
NAMESPACE retdec::deps::
DESTINATION ${RETDEC_INSTALL_CMAKE_DIR}
)
# Install CMake files.
install(
FILES
"${CMAKE_CURRENT_LIST_DIR}/retdec-tlsh-config.cmake"
DESTINATION
"${RETDEC_INSTALL_CMAKE_DIR}"
)

91
deps/tlsh/WinFunctions.cpp vendored Normal file
@@ -0,0 +1,91 @@
#ifdef WINDOWS
// implementation of Linux functions for Windows
#include <WinFunctions.h>
DIR *opendir(const char *dirname)
{
if (strlen(dirname) >= NAME_LENGTH) {
printf("ERROR: directory name candidate, %s, is too long (%u >= %d)\n",
dirname, (unsigned int) strlen(dirname), NAME_LENGTH);
return NULL;
}
DWORD dw = GetFileAttributes(dirname);
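// INVALID_FILE_ATTRIBUTES has all bits set, so it must be rejected explicitly
if (dw == INVALID_FILE_ATTRIBUTES || (dw & FILE_ATTRIBUTE_DIRECTORY) == 0)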
return NULL;
DIR *dir = new DIR;
dir->hFind = INVALID_HANDLE_VALUE;
_snprintf(dir->dirname, sizeof(dir->dirname), "%s\\*", dirname);
return dir;
}
struct dirent *readdir(DIR *dir)
{
if (dir == NULL) {
return NULL;
}
if (dir->hFind == INVALID_HANDLE_VALUE) {
dir->hFind = FindFirstFile(dir->dirname, &dir->findFileData);
if (dir->hFind == INVALID_HANDLE_VALUE) {
return NULL;
}
}
else {
if (!FindNextFile(dir->hFind, &dir->findFileData)) {
return NULL;
}
}
_snprintf(dir->findDirEnt.d_name, sizeof(dir->findDirEnt.d_name), "%s", dir->findFileData.cFileName);
return &dir->findDirEnt;
}
int closedir(DIR *dir)
{
if (dir != NULL) {
if (dir->hFind != INVALID_HANDLE_VALUE) {
FindClose(dir->hFind);
}
delete dir;
}
return 0;
}
struct tm *localtime_r(const time_t *timep, struct tm *results)
{
// localtime_s writes into the caller's buffer and reports failure instead of returning NULL
if (localtime_s(results, timep) != 0)
return NULL;
return results;
}
bool read_file_win(const char *fname, int sizefile, unsigned char* data)
{
HANDLE hdl = CreateFile(fname, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL, NULL);
if (hdl == INVALID_HANDLE_VALUE) {
printf("ERROR: CreateFile failed on %s; error:%lu\n", fname, GetLastError());
return false;
}
bool retVal = true;
DWORD bytesRead;
if (!ReadFile(hdl, data, sizefile, &bytesRead, NULL)) {
printf("ERROR: ReadFile failed on %s; error:%lu\n", fname, GetLastError());
retVal = false;
}
else if (bytesRead != (DWORD) sizefile) {
printf("ERROR: ReadFile read %lu bytes, but %d were requested\n", bytesRead, sizefile);
retVal = false;
}
CloseHandle(hdl);
return retVal;
}
#endif

46
deps/tlsh/include/tlsh/WinFunctions.h vendored Normal file
@@ -0,0 +1,46 @@
#ifdef WINDOWS
#ifndef WINFUNCTIONS_H
#define WINFUNCTIONS_H
#include <windows.h>
#include <stdio.h>
#include <time.h>
#ifndef TLSH_LIB
# ifdef TLSH_EXPORTS
# define TLSH_API __declspec(dllexport)
# else
# define TLSH_API __declspec(dllimport)
# endif
#else
# define TLSH_API
#endif
#define strdup _strdup
#define NAME_LENGTH MAX_PATH
#define snprintf _snprintf
#define strcasecmp _stricmp
#define random rand
#define srandom srand
struct dirent {
char d_name[NAME_LENGTH];
};
typedef struct _DIR
{
char dirname[NAME_LENGTH];
HANDLE hFind;
WIN32_FIND_DATA findFileData;
struct dirent findDirEnt;
} DIR;
extern DIR *opendir(const char *dirname);
extern struct dirent *readdir(DIR *dir);
extern int closedir(DIR *dir);
extern struct tm *localtime_r(const time_t *timep, struct tm *results);
extern bool read_file_win(const char *fname, int sizefile, unsigned char* data);
#endif // #ifndef WINFUNCTIONS_H
#endif // #ifdef WINDOWS

183
deps/tlsh/include/tlsh/tlsh.h vendored Normal file
@@ -0,0 +1,183 @@
// tlsh.h - TrendLSH Hash Algorithm
/*
* TLSH is provided for use under two licenses: Apache OR BSD.
* Users may opt to use either license depending on the license
* restrictions of the systems with which they plan to integrate
* the TLSH code.
*/
/* ==============
* Apache License
* ==============
* Copyright 2013 Trend Micro Incorporated
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* ===========
* BSD License
* ===========
* Copyright (c) 2013, Trend Micro Incorporated
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* 3. Neither the name of the copyright holder nor the names of its contributors
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef HEADER_TLSH_H
#define HEADER_TLSH_H
#if defined WINDOWS || defined MINGW
#include "win_version.h"
#else
#include "version.h"
#endif
#ifndef NULL
#define NULL 0
#endif
#ifdef __cplusplus
class TlshImpl;
// Define TLSH_STRING_LEN_REQ, which is the string length of "T1" + the hex value of the Tlsh hash.
// BUCKETS_256 & CHECKSUM_3B are compiler switches defined in CMakeLists.txt
#if defined BUCKETS_256
#define TLSH_STRING_LEN_REQ 136
// changed the minimum data length to 256 for version 3.3
#define MIN_DATA_LENGTH 50
// added the -force option for version 3.5
// added the -conservative option for version 3.17
#define MIN_CONSERVATIVE_DATA_LENGTH 256
#endif
#if defined BUCKETS_128
#define TLSH_STRING_LEN_REQ 72
// changed the minimum data length to 256 for version 3.3
#define MIN_DATA_LENGTH 50
// added the -force option for version 3.5
// added the -conservative option for version 3.17
#define MIN_CONSERVATIVE_DATA_LENGTH 256
#endif
#if defined BUCKETS_48
// No 3 Byte checksum option for 48 Bucket min hash
#define TLSH_STRING_LEN 30
// changed the minimum data length to 256 for version 3.3
#define MIN_DATA_LENGTH 10
// added the -force option for version 3.5
#define MIN_CONSERVATIVE_DATA_LENGTH 10
#endif
#define TLSH_STRING_BUFFER_LEN (TLSH_STRING_LEN_REQ+1)
#ifdef WINDOWS
#include "WinFunctions.h"
#else
#if defined(__SPARC) || defined(_AS_MK_OS_RH73)
#define TLSH_API
#else
#define TLSH_API __attribute__ ((visibility("default")))
#endif
#endif
class TLSH_API Tlsh{
public:
Tlsh();
Tlsh(const Tlsh& other);
/* allow the user to add data in multiple iterations */
void update(const unsigned char* data, unsigned int len);
/* to signal the class there is no more data to be added */
void final(const unsigned char* data = NULL, unsigned int len = 0, int fc_cons_option = 0);
/* to get the hex-encoded hash code */
const char* getHash(int showvers=0) const ;
/* to get the hex-encoded hash code without allocating buffer in TlshImpl - bufSize should be TLSH_STRING_BUFFER_LEN */
const char* getHash(char *buffer, unsigned int bufSize, int showvers=0) const;
/* to bring the object back to the initial state */
void reset();
// access functions
int Lvalue();
int Q1ratio();
int Q2ratio();
int Checksum(int k);
int BucketValue(int bucket);
/* calculate difference */
/* The len_diff parameter specifies if the file length is to be included in the difference calculation (len_diff=true) or if it */
/* is to be excluded (len_diff=false). In general, the length should be considered in the difference calculation, but there */
/* could be applications where a part of the adversarial activity might be to add a lot of content. For example to add 1 million */
/* zero bytes at the end of a file. In that case, the caller would want to exclude the length from the calculation. */
int totalDiff(const Tlsh *, bool len_diff=true) const;
/* validate TrendLSH string and reset the hash according to it */
int fromTlshStr(const char* str);
/* check if Tlsh object is valid to operate */
bool isValid() const;
/* display the contents of NOTICE.txt */
static void display_notice();
/* Return the version information used to build this library */
static const char *version();
// operators
Tlsh& operator=(const Tlsh& other);
bool operator==(const Tlsh& other) const;
bool operator!=(const Tlsh& other) const;
~Tlsh();
private:
TlshImpl* impl;
};
#ifdef TLSH_DISTANCE_PARAMETERS
void set_tlsh_distance_parameters(int length_mult_value, int qratio_mult_value, int hist_diff1_add_value, int hist_diff2_add_value, int hist_diff3_add_value);
#endif
#endif
#endif
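
The length constants in this header follow from the digest layout. As a rough sketch, assuming the layout given in the tlsh_impl.h comments of this commit (checksum bytes, one Lvalue byte, one Q-ratio byte, then 2 bits per bucket, everything hex-encoded, optionally prefixed with "T1"), the numbers can be reproduced as follows. The helper names are hypothetical and are not part of the TLSH API.

```cpp
#include <cstddef>

// Hypothetical helpers (not part of TLSH) that reproduce the header constants.

// Bucket codes take 2 bits each.
constexpr std::size_t codeBytes(std::size_t buckets)
{
    return buckets * 2 / 8;
}

// Hex characters of the digest body: two per byte of
// checksum + Lvalue (1 byte) + Q ratios (1 byte) + bucket codes.
constexpr std::size_t hexDigestLen(std::size_t buckets, std::size_t checksumBytes)
{
    return 2 * (checksumBytes + 1 + 1 + codeBytes(buckets));
}

// Length including the two-character "T1" version prefix.
constexpr std::size_t prefixedLen(std::size_t buckets, std::size_t checksumBytes)
{
    return 2 + hexDigestLen(buckets, checksumBytes);
}

// Buffer size including the terminating NUL, as in TLSH_STRING_BUFFER_LEN.
constexpr std::size_t bufferLen(std::size_t buckets, std::size_t checksumBytes)
{
    return prefixedLen(buckets, checksumBytes) + 1;
}

static_assert(hexDigestLen(128, 1) == 70, "128 buckets, 1-byte checksum");
static_assert(hexDigestLen(256, 3) == 138, "256 buckets, 3-byte checksum");
static_assert(prefixedLen(128, 1) == 72, "matches TLSH_STRING_LEN_REQ for BUCKETS_128");
static_assert(prefixedLen(256, 1) == 136, "matches TLSH_STRING_LEN_REQ for BUCKETS_256");
```

A caller using the buffer-based `getHash` overload would size its buffer with the equivalent of `bufferLen`, which is what `TLSH_STRING_BUFFER_LEN` provides.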

164
deps/tlsh/include/tlsh/tlsh_impl.h vendored Normal file
@@ -0,0 +1,164 @@
/*
* TLSH is provided for use under two licenses: Apache OR BSD.
* Users may opt to use either license depending on the license
* restrictions of the systems with which they plan to integrate
* the TLSH code.
*/
/* ==============
* Apache License
* ==============
* Copyright 2013 Trend Micro Incorporated
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* ===========
* BSD License
* ===========
* Copyright (c) 2013, Trend Micro Incorporated
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* 3. Neither the name of the copyright holder nor the names of its contributors
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#if defined WINDOWS || defined MINGW
#include "win_version.h"
#else
#include "version.h"
#endif
#ifndef HEADER_TLSH_IMPL_H
#define HEADER_TLSH_IMPL_H
#define SLIDING_WND_SIZE 5
#define BUCKETS 256
#define Q_BITS 2 // 2 bits; quartile value 0, 1, 2, 3
// BUCKETS_256 & CHECKSUM_3B are compiler switches defined in CMakeLists.txt
#if defined BUCKETS_256
#define EFF_BUCKETS 256
#define CODE_SIZE 64 // 256 * 2 bits = 64 bytes
#if defined CHECKSUM_3B
#define INTERNAL_TLSH_STRING_LEN 138
#define TLSH_CHECKSUM_LEN 3
// defined in tlsh.h #define TLSH_STRING_LEN 138 // 2 + 3 + 64 bytes = 138 hexadecimal chars
#else
#define INTERNAL_TLSH_STRING_LEN 134
#define TLSH_CHECKSUM_LEN 1
// defined in tlsh.h #define TLSH_STRING_LEN 134 // 2 + 1 + 64 bytes = 134 hexadecimal chars
#endif
#endif
#if defined BUCKETS_128
#define EFF_BUCKETS 128
#define CODE_SIZE 32 // 128 * 2 bits = 32 bytes
#if defined CHECKSUM_3B
#define INTERNAL_TLSH_STRING_LEN 74
#define TLSH_CHECKSUM_LEN 3
// defined in tlsh.h #define TLSH_STRING_LEN 74 // 2 + 3 + 32 bytes = 74 hexadecimal chars
#else
#define INTERNAL_TLSH_STRING_LEN 70
#define TLSH_CHECKSUM_LEN 1
// defined in tlsh.h #define TLSH_STRING_LEN 70 // 2 + 1 + 32 bytes = 70 hexadecimal chars
#endif
#endif
#if defined BUCKETS_48
#define INTERNAL_TLSH_STRING_LEN 33
#define EFF_BUCKETS 48
#define CODE_SIZE 12 // 48 * 2 bits = 12 bytes
#define TLSH_CHECKSUM_LEN 1
// defined in tlsh.h #define TLSH_STRING_LEN 30 // 2 + 1 + 12 bytes = 30 hexadecimal chars
#endif
class TlshImpl
{
public:
TlshImpl();
~TlshImpl();
public:
void update(const unsigned char* data, unsigned int len);
void fast_update(const unsigned char* data, unsigned int len);
void final(int fc_cons_option = 0);
void reset();
const char* hash(int showvers) const;
const char* hash(char *buffer, unsigned int bufSize, int showvers) const; // saves allocating hash string in TLSH instance - bufSize should be TLSH_STRING_LEN + 1
int compare(const TlshImpl& other) const;
int totalDiff(const TlshImpl& other, bool len_diff=true) const;
int Lvalue();
int Q1ratio();
int Q2ratio();
int Checksum(int k);
int BucketValue(int bucket);
int fromTlshStr(const char* str);
bool isValid() const { return lsh_code_valid; }
private:
unsigned int *a_bucket;
unsigned char slide_window[SLIDING_WND_SIZE];
unsigned int data_len;
struct lsh_bin_struct {
unsigned char checksum[TLSH_CHECKSUM_LEN]; // 1 to 3 bytes
unsigned char Lvalue; // 1 byte
union {
#if defined(__SPARC) || defined(_AIX)
#pragma pack(1)
#endif
unsigned char QB;
struct{
#if defined(__SPARC) || defined(_AIX)
unsigned char Q2ratio : 4;
unsigned char Q1ratio : 4;
#else
unsigned char Q1ratio : 4;
unsigned char Q2ratio : 4;
#endif
} QR;
} Q; // 1 byte
unsigned char tmp_code[CODE_SIZE]; // 32/64 bytes
} lsh_bin;
mutable char *lsh_code; // allocated when hash() function without buffer is called - 70/134 bytes or 74/138 bytes
bool lsh_code_valid; // true iff final() or fromTlshStr complete successfully
};
#endif
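
The `Q` union above depends on bit-field layout, which is implementation-defined; this is presumably why the field order is swapped for SPARC and AIX. A portable alternative is to pack the two 4-bit ratios with shifts and masks. The sketch below is only an illustration: the function names are hypothetical, and placing Q1 in the low nibble is an assumption, not something the header guarantees.

```cpp
#include <cstdint>

// Hypothetical helpers: pack two 4-bit ratios into one byte without bit-fields.
// Assumption: Q1 occupies the low nibble and Q2 the high nibble.
constexpr std::uint8_t packQ(unsigned q1, unsigned q2)
{
    return static_cast<std::uint8_t>((q1 & 0x0fu) | ((q2 & 0x0fu) << 4));
}

// Extracts the low nibble (Q1 under the assumption above).
constexpr unsigned q1Of(std::uint8_t qb)
{
    return qb & 0x0fu;
}

// Extracts the high nibble (Q2 under the assumption above).
constexpr unsigned q2Of(std::uint8_t qb)
{
    return (qb >> 4) & 0x0fu;
}
```

With explicit shifts the byte value is the same on every compiler and target, so no per-platform `#if` is needed.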

70
deps/tlsh/include/tlsh/tlsh_util.h vendored Normal file
@@ -0,0 +1,70 @@
/*
* TLSH is provided for use under two licenses: Apache OR BSD.
* Users may opt to use either license depending on the license
* restrictions of the systems with which they plan to integrate
* the TLSH code.
*/
/* ==============
* Apache License
* ==============
* Copyright 2013 Trend Micro Incorporated
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* ===========
* BSD License
* ===========
* Copyright (c) 2013, Trend Micro Incorporated
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* 3. Neither the name of the copyright holder nor the names of its contributors
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef HEADER_TLSH_UTIL_H
#define HEADER_TLSH_UTIL_H
unsigned char b_mapping(unsigned char salt, unsigned char i, unsigned char j, unsigned char k);
unsigned char l_capturing(unsigned int len);
int mod_diff(unsigned int x, unsigned int y, unsigned int R);
int h_distance( int len, const unsigned char x[], const unsigned char y[]);
void to_hex( unsigned char * psrc, int len, char* pdest);
void from_hex( const char* psrc, int len, unsigned char* pdest);
unsigned char swap_byte( const unsigned char in );
#endif

23
deps/tlsh/include/tlsh/version.h vendored Normal file

@ -0,0 +1,23 @@
#define VERSION_MAJOR 4
#define VERSION_MINOR 2
#define VERSION_PATCH 1
// Default.
#define TLSH_HASH "compact hash"
#define BUCKETS_128
// Possible option.
//#define TLSH_HASH "min hash"
//#define BUCKETS_48
// Possible option.
//#define TLSH_HASH "full hash"
//#define BUCKETS_256
// Default.
#define TLSH_CHECKSUM "1 byte checksum"
// Possible option.
//#define TLSH_CHECKSUM "no checksum"
//#define CHECKSUM_0B
// Possible option.
//#define TLSH_CHECKSUM "3 bytes checksum"
//#define CHECKSUM_3B

21
deps/tlsh/include/tlsh/win_version.h vendored Normal file

@ -0,0 +1,21 @@
/****************************************************
* The file "version.h" is generated by cmake.
* There was trouble creating this file on Windows,
* so these are the default values required there.
* Beware: do not change these values unless you really understand the software.
****************************************************/
#define VERSION_MAJOR 4
#define VERSION_MINOR 2
#define VERSION_PATCH 1
#define TLSH_HASH "compact hash"
#define TLSH_CHECKSUM "1 byte checksum"
#define buckets 128
#define TLSH_CHECKSUM_LEN 1
#define EFF_BUCKETS 128
#define CODE_SIZE 32
#define MIN_DATA_LENGTH 50
#define MIN_CONSERVATIVE_DATA_LENGTH 256
#define TLSH_STRING_LEN_REQ 72
#define INTERNAL_TLSH_STRING_LEN 70
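The string-length constants above follow from the binary layout of the digest: checksum bytes, one Lvalue byte, one Q-ratio byte, and the bucket code, each rendered as two hex digits. The following is a minimal sketch of that arithmetic; the function names `hex_digest_len` and `prefixed_len` are hypothetical and not part of TLSH, and the two-character prefix is assumed to be the "T1" version tag.

```cpp
#include <cstddef>

// Hex characters needed for the binary digest:
// checksum bytes + 1 Lvalue byte + 1 Q-ratio byte + code bytes, two hex digits each.
constexpr std::size_t hex_digest_len(std::size_t checksum_len, std::size_t code_size)
{
    return 2 * (checksum_len + 1 + 1 + code_size);
}

// Length including a two-character version prefix such as "T1" (assumed).
constexpr std::size_t prefixed_len(std::size_t checksum_len, std::size_t code_size)
{
    return hex_digest_len(checksum_len, code_size) + 2;
}

// 1-byte checksum, 128 buckets (32 code bytes): the values defined above.
static_assert(hex_digest_len(1, 32) == 70, "matches INTERNAL_TLSH_STRING_LEN");
static_assert(prefixed_len(1, 32) == 72, "matches TLSH_STRING_LEN_REQ");
```

The same formula gives 134 for a 64-byte code and 74 or 138 with a 3-byte checksum, which agrees with the sizes quoted in the `lsh_code` comment in tlsh_impl.h.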

4
deps/tlsh/retdec-tlsh-config.cmake vendored Normal file

@ -0,0 +1,4 @@
if(NOT TARGET retdec::deps::tlsh)
include(${CMAKE_CURRENT_LIST_DIR}/retdec-tlsh-targets.cmake)
endif()

241
deps/tlsh/tlsh.cpp vendored Normal file

@ -0,0 +1,241 @@
/*
* TLSH is provided for use under two licenses: Apache OR BSD.
* Users may opt to use either license depending on the license
* restrictions of the systems with which they plan to integrate
* the TLSH code.
*/
/* ==============
* Apache License
* ==============
* Copyright 2013 Trend Micro Incorporated
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* ===========
* BSD License
* ===========
* Copyright (c) 2013, Trend Micro Incorporated
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* 3. Neither the name of the copyright holder nor the names of its contributors
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "include/tlsh/tlsh.h"
#include "include/tlsh/tlsh_impl.h"
#include <stdio.h>
#include <errno.h>
#include <string.h>
/////////////////////////////////////////////////////
// C++ Implementation
Tlsh::Tlsh():impl(NULL)
{
impl = new TlshImpl();
}
Tlsh::Tlsh(const Tlsh& other):impl(NULL)
{
impl = new TlshImpl();
*impl = *other.impl;
}
Tlsh::~Tlsh()
{
delete impl;
}
void Tlsh::display_notice()
{
printf(" =========================================================================\n");
printf(" == NOTICE file for use with the Apache License, Version 2.0, ==\n");
printf(" == in this case for the Trend Locality Sensitive Hash distribution. ==\n");
printf(" =========================================================================\n");
printf("\n");
printf(" Trend Locality Sensitive Hash (TLSH)\n");
printf(" Copyright 2010-2014 Trend Micro\n");
printf("\n");
printf(" This product includes software developed at\n");
printf(" Trend Micro (http://www.trendmicro.com/)\n");
printf("\n");
printf(" Refer to the following publications for more information:\n");
printf(" \n");
printf(" Jonathan Oliver, Chun Cheng and Yanggui Chen,\n");
printf(" \"TLSH - A Locality Sensitive Hash\"\n");
printf(" 4th Cybercrime and Trustworthy Computing Workshop, Sydney, November 2013\n");
printf(" https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf\n");
printf(" \n");
printf(" Jonathan Oliver, Scott Forman and Chun Cheng,\n");
printf(" \"Using Randomization to Attack Similarity Digests\"\n");
printf(" Applications and Techniques in Information Security. Springer Berlin Heidelberg, 2014. 199-210.\n");
printf(" https://github.com/trendmicro/tlsh/blob/master/Attacking_LSH_and_Sim_Dig.pdf\n");
printf("\n");
printf(" Jonathan Oliver and Jayson Pryde\n");
printf(" http://blog.trendmicro.com/trendlabs-security-intelligence/smart-whitelisting-using-locality-sensitive-hashing/\n");
printf("\n");
printf("\n");
printf("\n");
printf("\n");
printf("\n");
printf("\n");
printf("\n");
printf("\n");
printf(" \n");
printf(" SHA1 of first 242 lines of LICENSE - so that we can append NOTICE.txt to LICENSE\n");
printf(" $ head -n 242 LICENSE | openssl dgst -sha1\n");
printf(" (stdin)= 11e8757af16132dd60979eacd73a525a40ff31f0\n");
printf("\n");
}
const char *Tlsh::version()
{
static char versionBuf[256];
if (versionBuf[0] == '\0')
snprintf(versionBuf, sizeof(versionBuf), "%d.%d.%d %s %s sliding_window=%d",
VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH, TLSH_HASH, TLSH_CHECKSUM, SLIDING_WND_SIZE);
return versionBuf;
}
void Tlsh::update(const unsigned char* data, unsigned int len)
{
if ( NULL != impl )
impl->update(data, len);
}
void Tlsh::final(const unsigned char* data, unsigned int len, int fc_cons_option)
{
if ( NULL != impl ){
if ( NULL != data && len > 0 )
impl->update(data, len);
impl->final(fc_cons_option);
}
}
const char* Tlsh::getHash(int showvers) const
{
if ( NULL != impl )
return impl->hash(showvers);
else
return "";
}
const char* Tlsh::getHash (char *buffer, unsigned int bufSize, int showvers) const
{
if ( NULL != impl )
return impl->hash(buffer, bufSize, showvers);
else {
buffer[0] = '\0';
return buffer;
}
}
void Tlsh::reset()
{
if ( NULL != impl )
impl->reset();
}
Tlsh& Tlsh::operator=(const Tlsh& other)
{
if (this == &other)
return *this;
*impl = *other.impl;
return *this;
}
bool Tlsh::operator==(const Tlsh& other) const
{
if( this == &other )
return true;
else if( NULL == impl || NULL == other.impl )
return false;
else
return ( 0 == impl->compare(*other.impl) );
}
bool Tlsh::operator!=(const Tlsh& other) const
{
return !(*this==other);
}
int Tlsh::Lvalue()
{
return( impl->Lvalue() );
}
int Tlsh::Q1ratio()
{
return( impl->Q1ratio() );
}
int Tlsh::Q2ratio()
{
return( impl->Q2ratio() );
}
int Tlsh::Checksum(int k)
{
return( impl->Checksum(k) );
}
int Tlsh::BucketValue(int bucket)
{
return( impl->BucketValue(bucket) );
}
int Tlsh::totalDiff(const Tlsh *other, bool len_diff) const
{
if( NULL==impl || NULL == other || NULL == other->impl )
return -(EINVAL);
else if ( this == other )
return 0;
else
return (impl->totalDiff(*other->impl, len_diff));
}
int Tlsh::fromTlshStr(const char* str)
{
if ( NULL == impl )
return -(ENOMEM);
else if ( NULL == str )
return -(EINVAL);
else
return impl->fromTlshStr(str);
}
bool Tlsh::isValid() const
{
return (impl ? impl->isValid() : false);
}

805
deps/tlsh/tlsh_impl.cpp vendored Normal file

@ -0,0 +1,805 @@
/*
* TLSH is provided for use under two licenses: Apache OR BSD.
* Users may opt to use either license depending on the license
* restrictions of the systems with which they plan to integrate
* the TLSH code.
*/
/* ==============
* Apache License
* ==============
* Copyright 2013 Trend Micro Incorporated
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* ===========
* BSD License
* ===========
* Copyright (c) 2013, Trend Micro Incorporated
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* 3. Neither the name of the copyright holder nor the names of its contributors
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "include/tlsh/tlsh.h"
#include "include/tlsh/tlsh_impl.h"
#include "include/tlsh/tlsh_util.h"
#include <string>
#include <cassert>
#include <cstdio>
#include <cmath>
#include <algorithm>
#include <string.h>
#include <errno.h>
#define RANGE_LVALUE 256
#define RANGE_QRATIO 16
static void find_quartile(unsigned int *q1, unsigned int *q2, unsigned int *q3, const unsigned int * a_bucket);
static unsigned int partition(unsigned int * buf, unsigned int left, unsigned int right);
////////////////////////////////////////////////////////////////////////////////////////////////
TlshImpl::TlshImpl() : a_bucket(NULL), data_len(0), lsh_code(NULL), lsh_code_valid(false)
{
memset(this->slide_window, 0, sizeof this->slide_window);
memset(&this->lsh_bin, 0, sizeof this->lsh_bin);
assert (sizeof (this->lsh_bin.Q.QR) == sizeof (this->lsh_bin.Q.QB));
}
TlshImpl::~TlshImpl()
{
delete [] this->a_bucket;
delete [] this->lsh_code;
}
void TlshImpl::reset()
{
delete [] this->a_bucket; this->a_bucket = NULL;
memset(this->slide_window, 0, sizeof this->slide_window);
delete [] this->lsh_code; this->lsh_code = NULL;
memset(&this->lsh_bin, 0, sizeof this->lsh_bin);
this->data_len = 0;
this->lsh_code_valid = false;
}
////////////////////////////////////////////////////////////////////////////////////////////
// Pearson's sample random table
static unsigned char v_table[256] = {
1, 87, 49, 12, 176, 178, 102, 166, 121, 193, 6, 84, 249, 230, 44, 163,
14, 197, 213, 181, 161, 85, 218, 80, 64, 239, 24, 226, 236, 142, 38, 200,
110, 177, 104, 103, 141, 253, 255, 50, 77, 101, 81, 18, 45, 96, 31, 222,
25, 107, 190, 70, 86, 237, 240, 34, 72, 242, 20, 214, 244, 227, 149, 235,
97, 234, 57, 22, 60, 250, 82, 175, 208, 5, 127, 199, 111, 62, 135, 248,
174, 169, 211, 58, 66, 154, 106, 195, 245, 171, 17, 187, 182, 179, 0, 243,
132, 56, 148, 75, 128, 133, 158, 100, 130, 126, 91, 13, 153, 246, 216, 219,
119, 68, 223, 78, 83, 88, 201, 99, 122, 11, 92, 32, 136, 114, 52, 10,
138, 30, 48, 183, 156, 35, 61, 26, 143, 74, 251, 94, 129, 162, 63, 152,
170, 7, 115, 167, 241, 206, 3, 150, 55, 59, 151, 220, 90, 53, 23, 131,
125, 173, 15, 238, 79, 95, 89, 16, 105, 137, 225, 224, 217, 160, 37, 123,
118, 73, 2, 157, 46, 116, 9, 145, 134, 228, 207, 212, 202, 215, 69, 229,
27, 188, 67, 124, 168, 252, 42, 4, 29, 108, 21, 247, 19, 205, 39, 203,
233, 40, 186, 147, 198, 192, 155, 33, 164, 191, 98, 204, 165, 180, 117, 76,
140, 36, 210, 172, 41, 54, 159, 8, 185, 232, 113, 196, 231, 47, 146, 120,
51, 65, 28, 144, 254, 221, 93, 189, 194, 139, 112, 43, 71, 109, 184, 209
};
// Pearson's algorithm
unsigned char b_mapping(unsigned char salt, unsigned char i, unsigned char j, unsigned char k) {
unsigned char h = 0;
h = v_table[h ^ salt];
h = v_table[h ^ i];
h = v_table[h ^ j];
h = v_table[h ^ k];
return h;
}
/*
NEVER USED - showing a step in the optimization sequence
unsigned char faster_b_mapping(unsigned char mod_salt, unsigned char i, unsigned char j, unsigned char k) {
unsigned char h;
h = v_table[mod_salt ^ i];
h = v_table[h ^ j];
h = v_table[h ^ k];
return h;
}
*/
#define fast_b_mapping(ms,i,j,k) (v_table[ v_table[ v_table[ms^i] ^ j] ^ k ])
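The macro drops the first table lookup of b_mapping() because `h` starts at 0, so `v_table[0 ^ salt]` is a constant per salt. That is why the calls further down pass 1, 49, 12, 178, 166, 84 and 230: these are v_table[0], v_table[2], v_table[3], v_table[5], v_table[7], v_table[11] and v_table[13]. The sketch below checks the equivalence on a stand-in permutation table; `make_table`, `slow_mapping` and `fast_mapping` are hypothetical names, and the table is deliberately not the TLSH one.

```cpp
#include <array>
#include <cstdint>

// Stand-in permutation of 0..255 (NOT the TLSH v_table): multiplying by an odd
// constant modulo 256 is a bijection, which is all this demonstration needs.
inline std::array<std::uint8_t, 256> make_table()
{
    std::array<std::uint8_t, 256> t{};
    for (int i = 0; i < 256; ++i) {
        t[i] = static_cast<std::uint8_t>((i * 167 + 13) & 0xFF);
    }
    return t;
}

// Four-step Pearson hash, mirroring b_mapping(): h starts at 0.
inline std::uint8_t slow_mapping(const std::array<std::uint8_t, 256>& t,
                                 std::uint8_t salt, std::uint8_t i,
                                 std::uint8_t j, std::uint8_t k)
{
    std::uint8_t h = 0;
    h = t[h ^ salt];
    h = t[h ^ i];
    h = t[h ^ j];
    h = t[h ^ k];
    return h;
}

// Three-step variant: the caller passes ms = t[salt], so the first lookup is folded away.
inline std::uint8_t fast_mapping(const std::array<std::uint8_t, 256>& t,
                                 std::uint8_t ms, std::uint8_t i,
                                 std::uint8_t j, std::uint8_t k)
{
    return t[t[t[ms ^ i] ^ j] ^ k];
}
```

For any salt `s`, `slow_mapping(t, s, i, j, k)` and `fast_mapping(t, t[s], i, j, k)` return the same byte.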
////////////////////////////////////////////////////////////////////////////////////////////
#if SLIDING_WND_SIZE==5
#define SLIDING_WND_SIZE_M1 4
#elif SLIDING_WND_SIZE==4
#define SLIDING_WND_SIZE_M1 3
#elif SLIDING_WND_SIZE==6
#define SLIDING_WND_SIZE_M1 5
#elif SLIDING_WND_SIZE==7
#define SLIDING_WND_SIZE_M1 6
#elif SLIDING_WND_SIZE==8
#define SLIDING_WND_SIZE_M1 7
#endif
void TlshImpl::update(const unsigned char* data, unsigned int len)
{
if (this->lsh_code_valid) {
fprintf(stderr, "call to update() on a tlsh that is already valid\n");
return;
}
#define RNG_SIZE SLIDING_WND_SIZE
#define RNG_IDX(i) ((i+RNG_SIZE)%RNG_SIZE)
unsigned int fed_len = this->data_len;
if (this->a_bucket == NULL) {
this->a_bucket = new unsigned int [BUCKETS];
memset(this->a_bucket, 0, sizeof(int)*BUCKETS);
}
#if SLIDING_WND_SIZE==5
if (TLSH_CHECKSUM_LEN == 1) {
fast_update(data, len);
return;
}
#endif
int j = static_cast<int>(this->data_len % RNG_SIZE);
for( unsigned int i=0; i<len; i++, fed_len++, j=RNG_IDX(j+1) ) {
this->slide_window[j] = data[i];
if ( fed_len >= SLIDING_WND_SIZE_M1 ) {
//only calculate when input >= 5 bytes
int j_1 = RNG_IDX(j-1);
int j_2 = RNG_IDX(j-2);
int j_3 = RNG_IDX(j-3);
#if SLIDING_WND_SIZE>=5
int j_4 = RNG_IDX(j-4);
#endif
#if SLIDING_WND_SIZE>=6
int j_5 = RNG_IDX(j-5);
#endif
#if SLIDING_WND_SIZE>=7
int j_6 = RNG_IDX(j-6);
#endif
#if SLIDING_WND_SIZE>=8
int j_7 = RNG_IDX(j-7);
#endif
#ifndef CHECKSUM_0B
for (int k = 0; k < TLSH_CHECKSUM_LEN; k++) {
if (k == 0) {
// b_mapping(0, ... )
this->lsh_bin.checksum[k] = fast_b_mapping(1, this->slide_window[j], this->slide_window[j_1], this->lsh_bin.checksum[k]);
} else {
// use calculated 1 byte checksums to expand the total checksum to 3 bytes
this->lsh_bin.checksum[k] = b_mapping(this->lsh_bin.checksum[k-1], this->slide_window[j], this->slide_window[j_1], this->lsh_bin.checksum[k]);
}
}
#endif
unsigned char r;
// b_mapping(2, ... )
r = fast_b_mapping(49, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_2]);
this->a_bucket[r]++;
// b_mapping(3, ... )
r = fast_b_mapping(12, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_3]);
this->a_bucket[r]++;
// b_mapping(5, ... )
r = fast_b_mapping(178, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_3]);
this->a_bucket[r]++;
#if SLIDING_WND_SIZE>=5
// b_mapping(7, ... )
r = fast_b_mapping(166, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_4]);
this->a_bucket[r]++;
// b_mapping(11, ... )
r = fast_b_mapping(84, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_4]);
this->a_bucket[r]++;
// b_mapping(13, ... )
r = fast_b_mapping(230, this->slide_window[j], this->slide_window[j_3], this->slide_window[j_4]);
this->a_bucket[r]++;
#endif
#if SLIDING_WND_SIZE>=6
// b_mapping(17, ... )
r = fast_b_mapping(197, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_5]);
this->a_bucket[r]++;
// b_mapping(19, ... )
r = fast_b_mapping(181, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_5]);
this->a_bucket[r]++;
// b_mapping(23, ... )
r = fast_b_mapping(80, this->slide_window[j], this->slide_window[j_3], this->slide_window[j_5]);
this->a_bucket[r]++;
// b_mapping(29, ... )
r = fast_b_mapping(142, this->slide_window[j], this->slide_window[j_4], this->slide_window[j_5]);
this->a_bucket[r]++;
#endif
#if SLIDING_WND_SIZE>=7
// b_mapping(31, ... )
r = fast_b_mapping(200, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_6]);
this->a_bucket[r]++;
// b_mapping(37, ... )
r = fast_b_mapping(253, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_6]);
this->a_bucket[r]++;
// b_mapping(41, ... )
r = fast_b_mapping(101, this->slide_window[j], this->slide_window[j_3], this->slide_window[j_6]);
this->a_bucket[r]++;
// b_mapping(43, ... )
r = fast_b_mapping(18, this->slide_window[j], this->slide_window[j_4], this->slide_window[j_6]);
this->a_bucket[r]++;
// b_mapping(47, ... )
r = fast_b_mapping(222, this->slide_window[j], this->slide_window[j_5], this->slide_window[j_6]);
this->a_bucket[r]++;
#endif
#if SLIDING_WND_SIZE>=8
// b_mapping(53, ... )
r = fast_b_mapping(237, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_7]);
this->a_bucket[r]++;
// b_mapping(59, ... )
r = fast_b_mapping(214, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_7]);
this->a_bucket[r]++;
// b_mapping(61, ... )
r = fast_b_mapping(227, this->slide_window[j], this->slide_window[j_3], this->slide_window[j_7]);
this->a_bucket[r]++;
// b_mapping(67, ... )
r = fast_b_mapping(22, this->slide_window[j], this->slide_window[j_4], this->slide_window[j_7]);
this->a_bucket[r]++;
// b_mapping(71, ... )
r = fast_b_mapping(175, this->slide_window[j], this->slide_window[j_5], this->slide_window[j_7]);
this->a_bucket[r]++;
// b_mapping(73, ... )
r = fast_b_mapping(5, this->slide_window[j], this->slide_window[j_6], this->slide_window[j_7]);
this->a_bucket[r]++;
#endif
}
}
this->data_len += len;
}
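The sliding window in update() is a small ring buffer: `RNG_IDX` adds the window size before taking the remainder so that offsets such as `j-4` never produce a negative index. A minimal sketch of that indexing follows, assuming a window of 5 bytes (the value of SLIDING_WND_SIZE is not shown in this diff); `kWindow`, `ring_idx` and `slot_behind` are hypothetical names.

```cpp
// Window size; SLIDING_WND_SIZE == 5 is an assumption for this sketch.
constexpr int kWindow = 5;

// Same arithmetic as RNG_IDX: gives a valid slot for any i >= -kWindow.
constexpr int ring_idx(int i)
{
    return (i + kWindow) % kWindow;
}

// Slot holding the byte written `back` steps before slot j (0 <= back < kWindow).
constexpr int slot_behind(int j, int back)
{
    return ring_idx(j - back);
}
```

With the newest byte in slot 0, the byte written four steps earlier sits in `slot_behind(0, 4)`, which is slot 1.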
/////////////////////////////////////////////////////////////////////////////
// update for the case when SLIDING_WND_SIZE==5 && (TLSH_CHECKSUM_LEN == 1)
/////////////////////////////////////////////////////////////////////////////
void TlshImpl::fast_update(const unsigned char* data, unsigned int len)
{
unsigned int fed_len = this->data_len;
int j = static_cast<int>(this->data_len % RNG_SIZE);
unsigned char checksum = this->lsh_bin.checksum[0];
for( unsigned int i=0; i<len; ) {
if ( fed_len >= SLIDING_WND_SIZE_M1 ) {
//only calculate when input >= 5 bytes
if ((i >= 4) && (i+5 < len)) {
unsigned a0 = data[i-4];
unsigned a1 = data[i-3];
unsigned a2 = data[i-2];
unsigned a3 = data[i-1];
unsigned a4 = data[i];
unsigned a5 = data[i+1];
unsigned a6 = data[i+2];
unsigned a7 = data[i+3];
unsigned a8 = data[i+4];
checksum = fast_b_mapping(1, a4, a3, checksum );
this->a_bucket[ fast_b_mapping(49, a4, a3, a2 ) ]++;
this->a_bucket[ fast_b_mapping(12, a4, a3, a1 ) ]++;
this->a_bucket[ fast_b_mapping(178, a4, a2, a1 ) ]++;
this->a_bucket[ fast_b_mapping(166, a4, a2, a0 ) ]++;
this->a_bucket[ fast_b_mapping(84, a4, a3, a0 ) ]++;
this->a_bucket[ fast_b_mapping(230, a4, a1, a0 ) ]++;
checksum = fast_b_mapping(1, a5, a4, checksum );
this->a_bucket[ fast_b_mapping(49, a5, a4, a3 ) ]++;
this->a_bucket[ fast_b_mapping(12, a5, a4, a2 ) ]++;
this->a_bucket[ fast_b_mapping(178, a5, a3, a2 ) ]++;
this->a_bucket[ fast_b_mapping(166, a5, a3, a1 ) ]++;
this->a_bucket[ fast_b_mapping(84, a5, a4, a1 ) ]++;
this->a_bucket[ fast_b_mapping(230, a5, a2, a1 ) ]++;
checksum = fast_b_mapping(1, a6, a5, checksum );
this->a_bucket[ fast_b_mapping(49, a6, a5, a4 ) ]++;
this->a_bucket[ fast_b_mapping(12, a6, a5, a3 ) ]++;
this->a_bucket[ fast_b_mapping(178, a6, a4, a3 ) ]++;
this->a_bucket[ fast_b_mapping(166, a6, a4, a2 ) ]++;
this->a_bucket[ fast_b_mapping(84, a6, a5, a2 ) ]++;
this->a_bucket[ fast_b_mapping(230, a6, a3, a2 ) ]++;
checksum = fast_b_mapping(1, a7, a6, checksum );
this->a_bucket[ fast_b_mapping(49, a7, a6, a5 ) ]++;
this->a_bucket[ fast_b_mapping(12, a7, a6, a4 ) ]++;
this->a_bucket[ fast_b_mapping(178, a7, a5, a4 ) ]++;
this->a_bucket[ fast_b_mapping(166, a7, a5, a3 ) ]++;
this->a_bucket[ fast_b_mapping(84, a7, a6, a3 ) ]++;
this->a_bucket[ fast_b_mapping(230, a7, a4, a3 ) ]++;
checksum = fast_b_mapping(1, a8, a7, checksum );
this->a_bucket[ fast_b_mapping(49, a8, a7, a6 ) ]++;
this->a_bucket[ fast_b_mapping(12, a8, a7, a5 ) ]++;
this->a_bucket[ fast_b_mapping(178, a8, a6, a5 ) ]++;
this->a_bucket[ fast_b_mapping(166, a8, a6, a4 ) ]++;
this->a_bucket[ fast_b_mapping(84, a8, a7, a4 ) ]++;
this->a_bucket[ fast_b_mapping(230, a8, a5, a4 ) ]++;
i=i+5;
fed_len=fed_len+5;
j=RNG_IDX(j+5);
} else {
this->slide_window[j] = data[i];
int j_1 = RNG_IDX(j-1); if (i >= 1) { this->slide_window[j_1] = data[i-1]; }
int j_2 = RNG_IDX(j-2); if (i >= 2) { this->slide_window[j_2] = data[i-2]; }
int j_3 = RNG_IDX(j-3); if (i >= 3) { this->slide_window[j_3] = data[i-3]; }
int j_4 = RNG_IDX(j-4); if (i >= 4) { this->slide_window[j_4] = data[i-4]; }
checksum = fast_b_mapping(1, this->slide_window[j], this->slide_window[j_1], checksum );
this->a_bucket[ fast_b_mapping(49, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_2] ) ]++;
this->a_bucket[ fast_b_mapping(12, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_3] ) ]++;
this->a_bucket[ fast_b_mapping(178, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_3] ) ]++;
this->a_bucket[ fast_b_mapping(166, this->slide_window[j], this->slide_window[j_2], this->slide_window[j_4] ) ]++;
this->a_bucket[ fast_b_mapping(84, this->slide_window[j], this->slide_window[j_1], this->slide_window[j_4] ) ]++;
this->a_bucket[ fast_b_mapping(230, this->slide_window[j], this->slide_window[j_3], this->slide_window[j_4] ) ]++;
i++;
fed_len++;
j=RNG_IDX(j+1);
}
} else {
i++;
fed_len++;
j=RNG_IDX(j+1);
}
}
this->lsh_bin.checksum[0] = checksum;
this->data_len += len;
}
/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////
/* to signal the class there is no more data to be added */
void TlshImpl::final(int fc_cons_option)
{
if (this->lsh_code_valid) {
fprintf(stderr, "call to final() on a tlsh that is already valid\n");
return;
}
// incoming data must be at least MIN_DATA_LENGTH bytes
if ((fc_cons_option <= 1) && (this->data_len < MIN_DATA_LENGTH)) {
// this->lsh_code will be empty
delete [] this->a_bucket; this->a_bucket = NULL;
return;
}
if ((fc_cons_option == 2) && (this->data_len < MIN_CONSERVATIVE_DATA_LENGTH)) {
// this->lsh_code will be empty
delete [] this->a_bucket; this->a_bucket = NULL;
return;
}
unsigned int q1, q2, q3;
find_quartile(&q1, &q2, &q3, this->a_bucket);
// buckets must be more than 50% non-zero
int nonzero = 0;
for(unsigned int i=0; i<CODE_SIZE; i++) {
for(unsigned int j=0; j<4; j++) {
if (this->a_bucket[4*i + j] > 0) {
nonzero++;
}
}
}
#if defined BUCKETS_48
if (nonzero < 18) {
// printf("nonzero=%d\n", nonzero);
delete [] this->a_bucket; this->a_bucket = NULL;
return;
}
#else
if (nonzero <= 4*CODE_SIZE/2) {
delete [] this->a_bucket; this->a_bucket = NULL;
return;
}
#endif
for(unsigned int i=0; i<CODE_SIZE; i++) {
unsigned char h=0;
for(unsigned int j=0; j<4; j++) {
unsigned int k = this->a_bucket[4*i + j];
if( q3 < k ) {
h += 3 << (j*2); // leave the optimization j*2 = j<<1 or j*2 = j+j for compiler
} else if( q2 < k ) {
h += 2 << (j*2);
} else if( q1 < k ) {
h += 1 << (j*2);
}
}
this->lsh_bin.tmp_code[i] = h;
}
//Done with a_bucket so deallocate
delete [] this->a_bucket; this->a_bucket = NULL;
this->lsh_bin.Lvalue = l_capturing(this->data_len);
this->lsh_bin.Q.QR.Q1ratio = static_cast<unsigned int> (static_cast<float>(q1*100)/static_cast<float> (q3)) % 16;
this->lsh_bin.Q.QR.Q2ratio = static_cast<unsigned int> (static_cast<float>(q2*100)/static_cast<float> (q3)) % 16;
this->lsh_code_valid = true;
}
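The core of final() is a quantization step: each bucket count is compared against the three quartiles and reduced to a 2-bit code, four codes are packed into one byte, and the quartile ratios are reduced modulo 16. The sketch below reproduces that arithmetic in isolation; `pack_four` and `q_ratio` are hypothetical helper names and not part of TLSH.

```cpp
#include <cassert>
#include <cstdint>

// Pack four bucket counts into one byte, two bits per bucket,
// with the first bucket in the lowest two bits.
inline std::uint8_t pack_four(const unsigned int counts[4],
                              unsigned int q1, unsigned int q2, unsigned int q3)
{
    std::uint8_t h = 0;
    for (unsigned int j = 0; j < 4; ++j) {
        const unsigned int k = counts[j];
        unsigned int code = 0;          // k <= q1
        if (q3 < k) {
            code = 3;
        } else if (q2 < k) {
            code = 2;
        } else if (q1 < k) {
            code = 1;
        }
        h = static_cast<std::uint8_t>(h | (code << (j * 2)));
    }
    return h;
}

// Quartile ratio reduced to 4 bits, following the expression used in final().
// q3 must be non-zero.
inline unsigned int q_ratio(unsigned int q, unsigned int q3)
{
    assert(q3 != 0);
    return static_cast<unsigned int>(static_cast<float>(q * 100) /
                                     static_cast<float>(q3)) % 16;
}
```

For example, with quartiles 2, 5 and 9, the counts {1, 3, 6, 10} map to the codes 0, 1, 2, 3 and pack into the byte 0xE4, while the ratios come out as 6 and 7.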
int TlshImpl::fromTlshStr(const char* str)
{
// Assume that we have 128 Buckets
int start = 0;
if (strncmp(str, "T1", 2) == 0) {
start = 2;
} else {
start = 0;
}
// Validate input string
for( int ii=0; ii < INTERNAL_TLSH_STRING_LEN; ii++ ) {
int i = ii + start;
if (!( (str[i] >= '0' && str[i] <= '9') ||
(str[i] >= 'A' && str[i] <= 'F') ||
(str[i] >= 'a' && str[i] <= 'f') ))
{
// printf("warning ii=%d str[%d]='%c'\n", ii, i, str[i]);
return 1;
}
}
int xi = INTERNAL_TLSH_STRING_LEN + start;
if (( (str[xi] >= '0' && str[xi] <= '9') ||
(str[xi] >= 'A' && str[xi] <= 'F') ||
(str[xi] >= 'a' && str[xi] <= 'f') ))
{
// printf("warning xi=%d\n", xi);
return 1;
}
this->reset();
lsh_bin_struct tmp;
from_hex( &str[start], INTERNAL_TLSH_STRING_LEN, reinterpret_cast<unsigned char*>(&tmp) );
// Reconstruct checksum, Q ratios & Lvalue
for (int k = 0; k < TLSH_CHECKSUM_LEN; k++) {
this->lsh_bin.checksum[k] = swap_byte(tmp.checksum[k]);
}
this->lsh_bin.Lvalue = swap_byte( tmp.Lvalue );
this->lsh_bin.Q.QB = swap_byte(tmp.Q.QB);
for( int i=0; i < CODE_SIZE; i++ ){
this->lsh_bin.tmp_code[i] = (tmp.tmp_code[CODE_SIZE-1-i]);
}
this->lsh_code_valid = true;
return 0;
}
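The validation at the start of fromTlshStr() amounts to: skip an optional "T1" prefix, require the expected number of hex digits, and reject input that continues with another hex digit. A minimal sketch of a similar check on a `std::string` is given below. It is an assumption-laden variant rather than the function above: `looks_like_tlsh` is a hypothetical name, and it is stricter in that it rejects any trailing character, not only a hex digit.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true when `s` (optionally prefixed with "T1") consists of exactly
// `digest_len` hexadecimal characters. digest_len plays the role of
// INTERNAL_TLSH_STRING_LEN.
inline bool looks_like_tlsh(const std::string& s, std::size_t digest_len)
{
    const std::size_t start = (s.compare(0, 2, "T1") == 0) ? 2 : 0;
    if (s.size() != start + digest_len) {
        return false;  // too short, or has trailing characters
    }
    for (std::size_t i = start; i < s.size(); ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
            return false;
        }
    }
    return true;
}
```

Checking the length up front also avoids indexing past the end of a short input, which the C-string version handles implicitly by hitting the terminating NUL.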
const char* TlshImpl::hash(char *buffer, unsigned int bufSize, int showvers) const
{
if (bufSize < TLSH_STRING_LEN_REQ + 1) {
strncpy(buffer, "", bufSize);
return buffer;
}
if (this->lsh_code_valid == false) {
strncpy(buffer, "", bufSize);
return buffer;
}
lsh_bin_struct tmp;
for (int k = 0; k < TLSH_CHECKSUM_LEN; k++) {
tmp.checksum[k] = swap_byte( this->lsh_bin.checksum[k] );
}
tmp.Lvalue = swap_byte( this->lsh_bin.Lvalue );
tmp.Q.QB = swap_byte( this->lsh_bin.Q.QB );
for( int i=0; i < CODE_SIZE; i++ ){
tmp.tmp_code[i] = (this->lsh_bin.tmp_code[CODE_SIZE-1-i]);
}
if (showvers) {
buffer[0] = 'T';
buffer[1] = '0' + showvers;
to_hex( reinterpret_cast<unsigned char*>(&tmp), sizeof(tmp), &buffer[2]);
} else {
to_hex( reinterpret_cast<unsigned char*>(&tmp), sizeof(tmp), buffer);
}
return buffer;
}
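Serialization in hash() relies on two helpers declared in tlsh_util.h whose bodies are not part of this excerpt. The sketch below shows what they are assumed to do: swap_byte() is taken to exchange the two nibbles of a byte, and to_hex() to write two hex characters per byte. `swap_nibbles` and `hex_encode` are hypothetical stand-ins, and the upper-case output is likewise an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed behaviour of swap_byte(): exchange the high and low nibble.
inline std::uint8_t swap_nibbles(std::uint8_t b)
{
    return static_cast<std::uint8_t>(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
}

// Hypothetical stand-in for to_hex(): upper-case hex, two characters per byte.
inline std::string hex_encode(const std::uint8_t* data, std::size_t len)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(digits[data[i] >> 4]);
        out.push_back(digits[data[i] & 0x0F]);
    }
    return out;
}
```

Because the nibble swap is its own inverse, the same helper can be used when writing the string and when parsing it back.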
/* to get the hex-encoded hash code */
const char* TlshImpl::hash(int showvers) const
{
if (this->lsh_code != NULL) {
// lsh_code has been previously calculated, so just return it
return this->lsh_code;
}
this->lsh_code = new char [TLSH_STRING_LEN_REQ+1];
memset(this->lsh_code, 0, TLSH_STRING_LEN_REQ+1);
return hash(this->lsh_code, TLSH_STRING_LEN_REQ+1, showvers);
}
// compare
int TlshImpl::compare(const TlshImpl& other) const
{
return (memcmp( &(this->lsh_bin), &(other.lsh_bin), sizeof(this->lsh_bin)));
}
////////////////////////////////////////////
// the default for these parameters is 12
////////////////////////////////////////////
static int length_mult = 12;
static int qratio_mult = 12;
#ifdef TLSH_DISTANCE_PARAMETERS
int hist_diff1_add = 1;
int hist_diff2_add = 2;
int hist_diff3_add = 6;
void set_tlsh_distance_parameters(int length_mult_value, int qratio_mult_value, int hist_diff1_add_value, int hist_diff2_add_value, int hist_diff3_add_value)
{
if (length_mult_value != -1) {
length_mult = length_mult_value;
}
if (qratio_mult_value != -1) {
qratio_mult = qratio_mult_value;
}
if (hist_diff1_add_value != -1) {
hist_diff1_add = hist_diff1_add_value;
}
if (hist_diff2_add_value != -1) {
hist_diff2_add = hist_diff2_add_value;
}
if (hist_diff3_add_value != -1) {
hist_diff3_add = hist_diff3_add_value;
}
}
#endif
int TlshImpl::Lvalue()
{
return(this->lsh_bin.Lvalue);
}
int TlshImpl::Q1ratio()
{
return(this->lsh_bin.Q.QR.Q1ratio);
}
int TlshImpl::Q2ratio()
{
return(this->lsh_bin.Q.QR.Q2ratio);
}
int TlshImpl::Checksum(int k)
{
if ((k >= TLSH_CHECKSUM_LEN) || (k < 0)) {
return(0);
}
return(this->lsh_bin.checksum[k]);
}
int TlshImpl::BucketValue(int bucket)
{
int idx;
int elem;
unsigned char bv;
// default TLSH
// #define EFF_BUCKETS 128
// #define CODE_SIZE 32 // 128 * 2 bits = 32 bytes
idx = (CODE_SIZE - (bucket / 4)) - 1;
// if ((idx < 0) || (idx >= CODE_SIZE)) {
// printf("error in BucketValue: idx=%d\n", idx);
// exit(1);
// }
elem = bucket % 4;
bv = this->lsh_bin.tmp_code[idx];
int h1 = bv / 16;
int h2 = bv % 16;
int p1 = h1 / 4;
int p2 = h1 % 4;
int p3 = h2 / 4;
int p4 = h2 % 4;
if (elem == 0) {
return(p1);
}
if (elem == 1) {
return(p2);
}
if (elem == 2) {
return(p3);
}
return(p4);
}
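The divisions and remainders in BucketValue() split one code byte into four 2-bit fields, with `elem == 0` selecting the most significant pair. An equivalent formulation with shifts and a mask is sketched below; `two_bit_field` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Return the 2-bit field selected by elem (0..3). elem 0 is the most
// significant pair, matching the p1..p4 arithmetic in BucketValue().
inline int two_bit_field(std::uint8_t bv, int elem)
{
    assert(elem >= 0 && elem < 4);
    return (bv >> (2 * (3 - elem))) & 0x3;
}
```

For the byte 0xE4 (binary 11 10 01 00) the four fields are 3, 2, 1 and 0 for `elem` 0 through 3.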
int TlshImpl::totalDiff(const TlshImpl& other, bool len_diff) const
{
int diff = 0;
if (len_diff) {
int ldiff = mod_diff( this->lsh_bin.Lvalue, other.lsh_bin.Lvalue, RANGE_LVALUE);
if ( ldiff == 0 )
diff = 0;
else if ( ldiff == 1 )
diff = 1;
else
diff += ldiff*length_mult;
}
int q1diff = mod_diff( this->lsh_bin.Q.QR.Q1ratio, other.lsh_bin.Q.QR.Q1ratio, RANGE_QRATIO);
if ( q1diff <= 1 )
diff += q1diff;
else
diff += (q1diff-1)*qratio_mult;
int q2diff = mod_diff( this->lsh_bin.Q.QR.Q2ratio, other.lsh_bin.Q.QR.Q2ratio, RANGE_QRATIO);
if ( q2diff <= 1)
diff += q2diff;
else
diff += (q2diff-1)*qratio_mult;
for (int k = 0; k < TLSH_CHECKSUM_LEN; k++) {
if (this->lsh_bin.checksum[k] != other.lsh_bin.checksum[k] ) {
diff ++;
break;
}
}
diff += h_distance( CODE_SIZE, this->lsh_bin.tmp_code, other.lsh_bin.tmp_code );
return (diff);
}
#define SWAP_UINT(x,y) do {\
unsigned int int_tmp = (x); \
(x) = (y); \
(y) = int_tmp; } while(0)
void find_quartile(unsigned int *q1, unsigned int *q2, unsigned int *q3, const unsigned int * a_bucket)
{
unsigned int bucket_copy[EFF_BUCKETS], short_cut_left[EFF_BUCKETS], short_cut_right[EFF_BUCKETS], spl=0, spr=0;
unsigned int p1 = EFF_BUCKETS/4-1;
unsigned int p2 = EFF_BUCKETS/2-1;
unsigned int p3 = EFF_BUCKETS-EFF_BUCKETS/4-1;
unsigned int end = EFF_BUCKETS-1;
for(unsigned int i=0; i<=end; i++) {
bucket_copy[i] = a_bucket[i];
}
for( unsigned int l=0, r=end; ; ) {
unsigned int ret = partition( bucket_copy, l, r );
if( ret > p2 ) {
r = ret - 1;
short_cut_right[spr] = ret;
spr++;
} else if( ret < p2 ){
l = ret + 1;
short_cut_left[spl] = ret;
spl++;
} else {
*q2 = bucket_copy[p2];
break;
}
}
short_cut_left[spl] = p2-1;
short_cut_right[spr] = p2+1;
for( unsigned int i=0, l=0; i<=spl; i++ ) {
unsigned int r = short_cut_left[i];
if( r > p1 ) {
for( ; ; ) {
unsigned int ret = partition( bucket_copy, l, r );
if( ret > p1 ) {
r = ret-1;
} else if( ret < p1 ) {
l = ret+1;
} else {
*q1 = bucket_copy[p1];
break;
}
}
break;
} else if( r < p1 ) {
l = r;
} else {
*q1 = bucket_copy[p1];
break;
}
}
for( unsigned int i=0, r=end; i<=spr; i++ ) {
unsigned int l = short_cut_right[i];
if( l < p3 ) {
for( ; ; ) {
unsigned int ret = partition( bucket_copy, l, r );
if( ret > p3 ) {
r = ret-1;
} else if( ret < p3 ) {
l = ret+1;
} else {
*q3 = bucket_copy[p3];
break;
}
}
break;
} else if( l > p3 ) {
r = l;
} else {
*q3 = bucket_copy[p3];
break;
}
}
}
unsigned int partition(unsigned int * buf, unsigned int left, unsigned int right)
{
if( left == right ) {
return left;
}
if( left+1 == right ) {
if( buf[left] > buf[right] ) {
SWAP_UINT( buf[left], buf[right] );
}
return left;
}
unsigned int ret = left, pivot = (left + right)>>1;
unsigned int val = buf[pivot];
buf[pivot] = buf[right];
buf[right] = val;
for( unsigned int i = left; i < right; i++ ) {
if( buf[i] < val ) {
SWAP_UINT( buf[ret], buf[i] );
ret++;
}
}
buf[right] = buf[ret];
buf[ret] = val;
return ret;
}
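
Note on find_quartile: it selects the values at sorted positions EFF_BUCKETS/4-1, EFF_BUCKETS/2-1 and EFF_BUCKETS-EFF_BUCKETS/4-1 by repeated partitioning. A minimal cross-check sketch, assuming EFF_BUCKETS is 128 as the comment in BucketValue states for default TLSH: sorting a copy of the buckets and reading the same three positions should yield the same q1, q2 and q3. The names kBuckets, Quartiles and referenceQuartiles are hypothetical and not part of TLSH.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Assumption: default TLSH uses 128 effective buckets (EFF_BUCKETS).
constexpr std::size_t kBuckets = 128;

struct Quartiles {
    unsigned q1;
    unsigned q2;
    unsigned q3;
};

// Reference implementation: sort a copy, then read the three positions
// that find_quartile targets with its partition-based selection.
Quartiles referenceQuartiles(const std::array<unsigned, kBuckets>& buckets)
{
    std::array<unsigned, kBuckets> sorted = buckets;
    std::sort(sorted.begin(), sorted.end());
    return {
        sorted[kBuckets / 4 - 1],            // p1
        sorted[kBuckets / 2 - 1],            // p2
        sorted[kBuckets - kBuckets / 4 - 1]  // p3
    };
}
```

Sorting costs O(n log n) where selection is O(n) on average, so this is only useful as a test oracle for the vendored code, not as a replacement.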

5051
deps/tlsh/tlsh_util.cpp vendored Normal file

File diff suppressed because it is too large

View File

@ -13,6 +13,7 @@
#include "retdec/fileformat/file_format/file_format.h"
#include "retdec/fileformat/types/note_section/elf_notes.h"
#include "retdec/fileformat/types/import_table/elf_import_table.h"
namespace retdec {
namespace fileformat {
@ -23,6 +24,11 @@ namespace fileformat {
class ElfFormat : public FileFormat
{
private:
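std::vector<std::string> telfhashSymbols; ///< names of symbols that feed the telfhash
/// flag if we already loaded symbols from SHT_DYNSYM
bool telfhashDynsym = false;
std::string telfhash; ///< telfhash of the file, empty if it was not computed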
/**
* Description of ELF relocation table
*/
@ -80,6 +86,7 @@ class ElfFormat : public FileFormat
void loadCorePrPsInfo(std::size_t offset, std::size_t size);
void loadCoreAuxvInfo(std::size_t offset, std::size_t size);
void loadCoreInfo();
void loadTelfhash();
/// @}
protected:
int elfClass; ///< class of input ELF file
@ -134,6 +141,7 @@ class ElfFormat : public FileFormat
std::size_t getOsOrAbiVersion() const;
std::size_t getSectionTableSize() const;
std::size_t getSegmentTableSize() const;
const std::string& getTelfhash() const;
int getElfClass() const;
bool isWiiPowerPc() const;
/// @}
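
Note on telfhashSymbols and telfhashDynsym: the collection policy they implement in loadSymbols can be sketched in isolation. This is a simplified model, not the RetDec code: TelfhashCollector, Symbol and SymbolSection are hypothetical names, and the numeric constants are assumed to mirror the ELF specification values (STT_FUNC, STB_GLOBAL, STV_DEFAULT, SHT_SYMTAB, SHT_DYNSYM). Each eligible section replaces the previously collected names, sections whose name contains "dynamic_" are skipped, and once a SHT_DYNSYM section has been seen nothing else is collected.

```cpp
#include <string>
#include <vector>

// Assumed to match the ELF specification.
constexpr unsigned kSttFunc = 2;     // STT_FUNC
constexpr unsigned kStbGlobal = 1;   // STB_GLOBAL
constexpr unsigned kStvDefault = 0;  // STV_DEFAULT
constexpr unsigned kShtSymtab = 2;   // SHT_SYMTAB
constexpr unsigned kShtDynsym = 11;  // SHT_DYNSYM

struct Symbol {
    std::string name;
    unsigned type;
    unsigned bind;
    unsigned char other;  // low two bits hold the visibility
};

struct SymbolSection {
    std::string name;
    unsigned type;
    std::vector<Symbol> symbols;
};

class TelfhashCollector {
public:
    void addSection(const SymbolSection& section)
    {
        const bool fromSegment =
            section.name.find("dynamic_") != std::string::npos;
        if (!dynsymSeen && !fromSegment) {
            names.clear();  // a later eligible section replaces earlier ones
            for (const auto& s : section.symbols) {
                const unsigned visibility = s.other & 0x3;
                if (s.type == kSttFunc && s.bind == kStbGlobal
                        && visibility == kStvDefault) {
                    names.push_back(s.name);
                }
            }
        }
        if (section.type == kShtDynsym) {
            dynsymSeen = true;  // freeze the collected names
        }
    }

    const std::vector<std::string>& symbols() const { return names; }

private:
    std::vector<std::string> names;
    bool dynsymSeen = false;
};
```

As in the diff, the flag is set for any SHT_DYNSYM section, including one that was skipped because of its name.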

View File

@ -0,0 +1,20 @@
/**
* @file include/retdec/fileformat/types/import_table/elf_import_table.h
* @brief Class for ELF import table.
* @copyright (c) 2021 Avast Software, licensed under the MIT license
*/
#include "import_table.h"
#include "retdec/fileformat/types/dotnet_headers/metadata_tables.h"
namespace retdec {
namespace fileformat {
class ElfImportTable : public ImportTable
{
public:
void computeHashes() override;
};
} // namespace fileformat
} // namespace retdec

View File

@ -20,7 +20,7 @@ namespace fileformat {
*/
class ImportTable
{
private:
protected:
using importsIterator = std::vector<std::unique_ptr<Import>>::const_iterator;
std::vector<std::string> libraries; ///< name of libraries
std::vector<std::string> missingDeps; ///< missing dependencies
@ -28,6 +28,7 @@ class ImportTable
std::string impHashCrc32; ///< imphash CRC32
std::string impHashMd5; ///< imphash MD5
std::string impHashSha256; ///< imphash SHA256
std::string impHashTlsh; ///< imphash TLSH
public:
/// @name Getters
/// @{
@ -39,6 +40,7 @@ class ImportTable
const std::string& getImphashCrc32() const;
const std::string& getImphashMd5() const;
const std::string& getImphashSha256() const;
const std::string& getImpHashTlsh() const;
const std::vector<std::string> & getMissingDependencies() const;
std::string getLibrary(std::size_t libraryIndex) const;
@ -55,7 +57,7 @@ class ImportTable
/// @name Other methods
/// @{
void computeHashes();
virtual void computeHashes();
void clear();
void addLibrary(std::string name, bool missingDependency = false);
void addImport(std::unique_ptr<Import>&& import);
@ -70,6 +72,8 @@ class ImportTable
void dump(std::string &dumpTable) const;
void dumpLibrary(std::size_t libraryIndex, std::string &libraryDump) const;
/// @}
virtual ~ImportTable() = default;
};
} // namespace fileformat
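
Note on the ImportTable changes: moving the members from private to protected, making computeHashes virtual and adding a virtual destructor together allow a format-specific table to be created and destroyed through a base pointer. A minimal sketch of that pattern follows. ImportTableSketch and ElfImportTableSketch are hypothetical stand-ins, the base-class behavior is a placeholder, and the real classes hash the joined string rather than store it.

```cpp
#include <algorithm>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ImportTableSketch {
public:
    // Virtual destructor: deleting a derived table through a base pointer is safe.
    virtual ~ImportTableSketch() = default;

    void addImport(std::string name) { imports.push_back(std::move(name)); }

    // Placeholder default: join names in insertion order.
    virtual void computeHashes() { hashInput = join(imports); }

    const std::string& getHashInput() const { return hashInput; }

protected:  // derived classes need access to the stored imports
    static std::string join(const std::vector<std::string>& names)
    {
        std::string out;
        for (const auto& n : names) {
            if (!out.empty()) {
                out += ',';
            }
            out += n;
        }
        return out;
    }

    std::vector<std::string> imports;
    std::string hashInput;
};

class ElfImportTableSketch : public ImportTableSketch {
public:
    // ELF variant: lowercase, sort lexicographically, then join.
    void computeHashes() override
    {
        std::vector<std::string> names = imports;
        for (auto& n : names) {
            std::transform(n.begin(), n.end(), n.begin(), [](unsigned char c) {
                return static_cast<char>(std::tolower(c));
            });
        }
        std::sort(names.begin(), names.end());
        hashInput = join(names);
    }
};
```

A caller holding std::unique_ptr<ImportTableSketch> gets the ELF behavior without knowing the concrete type, which is how a base-class pointer such as importTable can be assigned an ElfImportTable.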

View File

@ -34,6 +34,7 @@ add_library(fileformat STATIC
types/visual_basic/visual_basic_extern.cpp
types/import_table/import.cpp
types/import_table/import_table.cpp
types/import_table/elf_import_table.cpp
types/import_table/pe_import.cpp
types/export_table/export.cpp
types/export_table/export_table.cpp
@ -97,6 +98,7 @@ target_link_libraries(fileformat
retdec::deps::elfio
retdec::deps::llvm
PRIVATE
retdec::deps::tlsh
OpenSSL::Crypto
)

View File

@ -4,13 +4,16 @@
* @copyright (c) 2017 Avast Software, licensed under the MIT license
*/
#include <elfio/elf_types.hpp>
#include <map>
#include <regex>
#include "retdec/utils/conversion.h"
#include "retdec/utils/string.h"
#include "retdec/fileformat/file_format/elf/elf_format.h"
#include "retdec/fileformat/types/symbol_table/elf_symbol.h"
#include "retdec/fileformat/utils/conversions.h"
#include <tlsh/tlsh.h>
using namespace retdec::utils;
using namespace ELFIO;
@ -1733,6 +1736,14 @@ void ElfFormat::loadSymbols(const ELFIO::elfio *file, const ELFIO::symbol_sectio
std::unordered_multimap<std::string, unsigned long long> importNameAddressMap;
loadRelocations(file, section, importNameAddressMap);
/* Ignore symbols that come from segments when computing telfhash. Matching on the
section name is fragile; find a better way to identify the symbol source. */
bool isSegmentSymbols = section->get_name().find("dynamic_") != std::string::npos;
if (!telfhashDynsym && !isSegmentSymbols) {
telfhashSymbols = {};
}
for(std::size_t i = 0, e = elfSymbolTable->get_loaded_symbols_num(); i < e; ++i)
{
auto symbol = std::make_shared<ElfSymbol>();
@ -1747,6 +1758,15 @@ void ElfFormat::loadSymbols(const ELFIO::elfio *file, const ELFIO::symbol_sectio
symbol->setElfBind(bind);
symbol->setElfOther(other);
link = fixSymbolLink(link, value);
auto visibility = other & 0x3;
if (type == STT_FUNC && bind == STB_GLOBAL && visibility == STV_DEFAULT) {
/* Skip the symbol if the preferred dynsym symbols are already loaded or if it comes from
a segment. This is fragile; find a better way to identify the symbol source. */
if (!telfhashDynsym && !isSegmentSymbols) {
telfhashSymbols.push_back(name);
}
}
if(link >= file->sections.size() || !file->sections[link] || link == SHN_ABS ||
link == SHN_COMMON || link == SHN_UNDEF || link == SHN_XINDEX)
{
@ -1758,7 +1778,7 @@ void ElfFormat::loadSymbols(const ELFIO::elfio *file, const ELFIO::symbol_sectio
{
if(!importTable)
{
importTable = new ImportTable();
importTable = new ElfImportTable();
}
auto keyIter = importNameAddressMap.equal_range(name);
// we create std::set from std::multimap values in order to ensure determinism
@ -1798,6 +1818,7 @@ void ElfFormat::loadSymbols(const ELFIO::elfio *file, const ELFIO::symbol_sectio
}
symtab->setName(section->get_name());
if(symtab->hasSymbols())
{
symbolTables.push_back(symtab);
@ -1806,11 +1827,85 @@ void ElfFormat::loadSymbols(const ELFIO::elfio *file, const ELFIO::symbol_sectio
{
delete symtab;
}
if (section->get_type() == SHT_DYNSYM) {
telfhashDynsym = true;
}
loadTelfhash();
loadImpHash();
loadExpHash();
}
/* exclusions are based on the original implementation
https://github.com/trendmicro/telfhash/blob/master/telfhash/telfhash.py */
static const std::unordered_set<std::string> exclusion_set = {
"__libc_start_main", // main function
"main", // main function
"abort", // ARM default
"cachectl", // MIPS default
"cacheflush", // MIPS default
"puts", // Compiler optimization (function replacement)
"atol", // Compiler optimization (function replacement)
"malloc_trim" // GNU extensions
};
/*
ignore
symbols starting with . or _
x86-64 specific functions (names ending in 64)
string functions (str.* and mem.*), gcc changes them depending on architecture
*/
static const std::regex exclusion_regex("(^[_.].*$)|(^.*64$)|(^str.*$)|(^mem.*$)");
static bool isSymbolExcluded(const std::string& symbol)
{
return symbol.empty()
|| std::regex_match(symbol, exclusion_regex)
|| exclusion_set.count(symbol);
}
void ElfFormat::loadTelfhash()
{
std::vector<std::string> imported_symbols;
imported_symbols.reserve(telfhashSymbols.size());
for (const auto& symbol : telfhashSymbols) {
/* It is important to exclude first and lowercase afterwards:
"Str_Aprintf" is valid, but its lowercase form would
match the str.* exclusion */
if (isSymbolExcluded(symbol)) {
continue;
}
auto name = toLower(symbol);
imported_symbols.emplace_back(name);
}
/* sort them lexicographically */
std::sort(imported_symbols.begin(), imported_symbols.end());
std::string impHashString;
for (const auto& symbol : imported_symbols) {
if (!impHashString.empty())
impHashString.append(1, ',');
impHashString.append(symbol);
}
if (impHashString.size()) {
auto data = reinterpret_cast<const uint8_t*>(impHashString.data());
Tlsh tlsh;
tlsh.update(data, impHashString.size());
tlsh.final();
const int show_version = 1; /* this prepends the hash with 'T' + number of the version */
telfhash = toLower(tlsh.getHash(show_version));
}
}
/**
* Add new symbol table based on existing symbol table and based on global offset table
* @param oldTab Existing symbol table
@ -2957,5 +3052,10 @@ unsigned long long ElfFormat::getBaseOffset() const
return minOffset == std::numeric_limits<unsigned long long>::max() ? 0 : minOffset;
}
const std::string& ElfFormat::getTelfhash() const {
return telfhash;
}
} // namespace fileformat
} // namespace retdec
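
Note on loadTelfhash: the string that is fed to TLSH can be reproduced without the TLSH dependency. The sketch below follows the steps in the diff: exclude, lowercase, sort, join with commas. buildTelfhashInput, isExcluded and lowerAscii are hypothetical helper names, and lowerAscii is only assumed to behave like RetDec's toLower for ASCII symbol names.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <unordered_set>
#include <vector>

// Exclusion rules copied from the diff: a fixed name list plus a pattern.
bool isExcluded(const std::string& symbol)
{
    static const std::unordered_set<std::string> names = {
        "__libc_start_main", "main", "abort", "cachectl",
        "cacheflush", "puts", "atol", "malloc_trim"
    };
    static const std::regex pattern(
        "(^[_.].*$)|(^.*64$)|(^str.*$)|(^mem.*$)");
    return symbol.empty()
        || names.count(symbol) > 0
        || std::regex_match(symbol, pattern);
}

// Assumption: plain ASCII lowercasing is sufficient for symbol names.
std::string lowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Exclude first, then lowercase, sort and join with ','.
std::string buildTelfhashInput(const std::vector<std::string>& symbols)
{
    std::vector<std::string> kept;
    kept.reserve(symbols.size());
    for (const auto& s : symbols) {
        if (!isExcluded(s)) {
            kept.push_back(lowerAscii(s));
        }
    }
    std::sort(kept.begin(), kept.end());

    std::string joined;
    for (const auto& s : kept) {
        if (!joined.empty()) {
            joined += ',';
        }
        joined += s;
    }
    return joined;
}
```

The order of operations matters: "Str_Aprintf" survives the case-sensitive filter and only then becomes "str_aprintf", whereas lowercasing first would drop it.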

View File

@ -14,6 +14,7 @@ if(NOT TARGET retdec::fileformat)
pelib
elfio
llvm
tlsh
)
include(${CMAKE_CURRENT_LIST_DIR}/retdec-fileformat-targets.cmake)

View File

@ -0,0 +1,61 @@
/**
* @file src/fileformat/types/import_table/elf_import_table.cpp
* @brief Class for ELF import table.
* @copyright (c) 2021 Avast Software, licensed under the MIT license
*/
#include "retdec/fileformat/utils/crypto.h"
#include "retdec/utils/string.h"
#include "retdec/fileformat/types/import_table/elf_import_table.h"
#include <tlsh/tlsh.h>
#include <algorithm>
#include <unordered_set>
#include <regex>
using namespace retdec::utils;
namespace retdec {
namespace fileformat {
void ElfImportTable::computeHashes()
{
std::vector<std::string> imported_symbols;
imported_symbols.reserve(imports.size());
for (const auto& import : imports) {
auto name = import->getName();
imported_symbols.emplace_back(toLower(name));
}
/* sort them lexicographically */
std::sort(imported_symbols.begin(), imported_symbols.end());
std::string impHashString;
for (const auto& symbol : imported_symbols) {
if (!impHashString.empty())
impHashString.append(1, ',');
impHashString.append(symbol);
}
if (impHashString.size()) {
auto data = reinterpret_cast<const uint8_t*>(impHashString.data());
Tlsh tlsh;
tlsh.update(data, impHashString.size());
tlsh.final();
/* this prepends the hash with 'T' + number of the version */
const int show_version = 1;
impHashTlsh = toLower(tlsh.getHash(show_version));
impHashCrc32 = getCrc32(data, impHashString.size());
impHashMd5 = getMd5(data, impHashString.size());
impHashSha256 = getSha256(data, impHashString.size());
}
}
} // namespace fileformat
} // namespace retdec

View File

@ -10,6 +10,7 @@
#include "retdec/fileformat/utils/crypto.h"
#include "retdec/pelib/PeLibAux.h"
#include "retdec/fileformat/types/import_table/import_table.h"
#include <tlsh/tlsh.h>
using namespace retdec::utils;
@ -675,6 +676,10 @@ const std::string& ImportTable::getImphashSha256() const
return impHashSha256;
}
const std::string& ImportTable::getImpHashTlsh() const {
return impHashTlsh;
}
/**
* Get list of missing dependencies
* @return Vector of missing dependencies
@ -759,7 +764,7 @@ ImportTable::importsIterator ImportTable::end() const
}
/**
* Compute import hashes - CRC32, MD5, SHA256.
* Compute import hashes - CRC32, MD5, SHA256, TLSH.
*/
void ImportTable::computeHashes()
{
@ -820,11 +825,19 @@ void ImportTable::computeHashes()
//}
}
if (impHashBytes.size())
{
impHashCrc32 = getCrc32((const uint8_t *)impHashBytes.data(), impHashBytes.size());
impHashMd5 = getMd5((const uint8_t *)impHashBytes.data(), impHashBytes.size());
impHashSha256 = getSha256((const uint8_t *)impHashBytes.data(), impHashBytes.size());
if (impHashBytes.size()) {
auto data = reinterpret_cast<const uint8_t*>(impHashBytes.data());
Tlsh tlsh;
tlsh.update(data, impHashBytes.size());
tlsh.final();
/* this prepends the hash with 'T' + number of the version */
const int show_version = 1;
impHashTlsh = toLower(tlsh.getHash(show_version));
impHashCrc32 = getCrc32(data, impHashBytes.size());
impHashMd5 = getMd5(data, impHashBytes.size());
impHashSha256 = getSha256(data, impHashBytes.size());
}
}

View File

@ -1949,6 +1949,10 @@ void ElfDetector::detectFileType()
fileInfo.setFileType(fileType);
}
void ElfDetector::getTelfhash() {
fileInfo.setTelfhash(elfParser->getTelfhash());
}
void ElfDetector::getAdditionalInfo()
{
getFileVersion();
@ -1962,6 +1966,7 @@ void ElfDetector::getAdditionalInfo()
getSymbolTable();
getNotes();
getCoreInfo();
getTelfhash();
}
/**

View File

@ -35,6 +35,7 @@ class ElfDetector : public FileDetector
void getDynamicSectionsSegments();
void getNotes();
void getCoreInfo();
void getTelfhash();
/// @}
protected:
/// @name Detection methods

View File

@ -105,6 +105,11 @@ std::string FileInformation::getPathToFile() const
return filePath;
}
std::string FileInformation::getTelfhash() const
{
return telfhash;
}
/**
* Get CRC32 of input file
* @return CRC32 of input file
@ -1204,6 +1209,14 @@ std::string FileInformation::getImphashSha256() const
{
return importTable.getImphashSha256();
}
/**
* Get imphash as TLSH
* @return Imphash as TLSH
*/
std::string FileInformation::getImphashTlsh() const
{
return importTable.getImphashTlsh();
}
/**
* Get import
@ -3729,6 +3742,11 @@ void FileInformation::setPathToFile(const std::string &filepath)
filePath = filepath;
}
void FileInformation::setTelfhash(const std::string &hash)
{
telfhash = hash;
}
/**
* Get CRC32 of input file
* @param fileCrc32 CRC32 of input file

View File

@ -26,6 +26,7 @@ class FileInformation
private:
retdec::cpdetect::ReturnCode status = retdec::cpdetect::ReturnCode::OK;
std::string filePath; ///< path to input file
std::string telfhash; ///< telfhash of ELF input file
std::string crc32; ///< CRC32 of input file
std::string md5; ///< MD5 of input file
std::string sha256; ///< SHA256 of input file
@ -76,6 +77,7 @@ class FileInformation
/// @{
retdec::cpdetect::ReturnCode getStatus() const;
std::string getPathToFile() const;
std::string getTelfhash() const;
std::string getCrc32() const;
std::string getMd5() const;
std::string getSha256() const;
@ -216,6 +218,7 @@ class FileInformation
std::string getImphashCrc32() const;
std::string getImphashMd5() const;
std::string getImphashSha256() const;
std::string getImphashTlsh() const;
const retdec::fileformat::Import* getImport(std::size_t position) const;
std::string getImportName(std::size_t position) const;
std::string getImportLibraryName(std::size_t position) const;
@ -546,6 +549,7 @@ class FileInformation
/// @{
void setStatus(retdec::cpdetect::ReturnCode state);
void setPathToFile(const std::string &filepath);
void setTelfhash(const std::string &telfhash);
void setCrc32(const std::string &fileCrc32);
void setMd5(const std::string &fileMd5);
void setSha256(const std::string &fileSha256);

View File

@ -55,6 +55,11 @@ std::string ImportTable::getImphashSha256() const
return table ? table->getImphashSha256() : "";
}
std::string ImportTable::getImphashTlsh() const
{
return table ? table->getImpHashTlsh() : "";
}
/**
* Get import
* @param position Index of selected import from table (indexed from 0)

View File

@ -27,6 +27,7 @@ class ImportTable
std::string getImphashCrc32() const;
std::string getImphashMd5() const;
std::string getImphashSha256() const;
std::string getImphashTlsh() const;
const retdec::fileformat::Import* getImport(std::size_t position) const;
std::string getImportName(std::size_t position) const;
std::string getImportUsageType(std::size_t position) const;

View File

@ -54,10 +54,12 @@ std::size_t ImportTablePlainGetter::getBasicInfo(std::size_t structIndex, std::v
desc.push_back("CRC32 : ");
desc.push_back("MD5 : ");
desc.push_back("SHA256 : ");
desc.push_back("TLSH : ");
info.push_back(std::to_string(fileinfo.getNumberOfStoredImports()));
info.push_back(fileinfo.getImphashCrc32());
info.push_back(fileinfo.getImphashMd5());
info.push_back(fileinfo.getImphashSha256());
info.push_back(fileinfo.getImphashTlsh());
return info.size();
}

View File

@ -50,10 +50,12 @@ std::size_t ImportTableJsonGetter::getBasicInfo(std::size_t structIndex, std::ve
desc.push_back("crc32");
desc.push_back("md5");
desc.push_back("sha256");
desc.push_back("tlsh");
info.push_back(std::to_string(fileinfo.getNumberOfStoredImports()));
info.push_back(fileinfo.getImphashCrc32());
info.push_back(fileinfo.getImphashMd5());
info.push_back(fileinfo.getImphashSha256());
info.push_back(fileinfo.getImphashTlsh());
return info.size();
}

View File

@ -4,6 +4,7 @@
* @copyright (c) 2017 Avast Software, licensed under the MIT license
*/
#include "retdec/fileformat/file_format/file_format.h"
#include "retdec/fileformat/utils/conversions.h"
#include "fileinfo/file_presentation/getters/simple_getter/basic_json_getter.h"
@ -26,6 +27,7 @@ std::size_t BasicJsonGetter::loadInformation(std::vector<std::string> &desc, std
desc.clear();
info.clear();
desc.push_back("telfhash");
desc.push_back("crc32");
desc.push_back("md5");
desc.push_back("sha256");
@ -36,6 +38,7 @@ std::size_t BasicJsonGetter::loadInformation(std::vector<std::string> &desc, std
desc.push_back("endianness");
desc.push_back("imageBaseAddress");
info.push_back(fileinfo.getTelfhash());
info.push_back(fileinfo.getCrc32());
info.push_back(fileinfo.getMd5());
info.push_back(fileinfo.getSha256());

View File

@ -30,6 +30,7 @@ std::size_t BasicPlainGetter::loadInformation(std::vector<std::string> &desc, st
desc.clear();
info.clear();
desc.push_back("Telfhash : ");
desc.push_back("CRC32 : ");
desc.push_back("MD5 : ");
desc.push_back("SHA256 : ");
@ -53,6 +54,7 @@ std::size_t BasicPlainGetter::loadInformation(std::vector<std::string> &desc, st
desc.push_back("Entry point section index: ");
desc.push_back("Bytes on entry point : ");
info.push_back(fileinfo.getTelfhash());
info.push_back(fileinfo.getCrc32());
info.push_back(fileinfo.getMd5());
info.push_back(fileinfo.getSha256());