Reducing Third-Party Library Usage

The following third-party vendor libraries are optional dependencies of File Content Extraction. For each library, the table lists the .dll or .so files that you must delete from the bin directory to remove it, the format support or other functionality that you lose as a result, and the lines to remove from formats.ini after you delete those files.

| Vendor library | File(s) in bin dir | Format support or other functionality lost | Line(s) to remove in formats.ini |
|---|---|---|---|
| 7zip 23.01/p7zip 17.04 | 7z, multiarcsr | PC_Library_Fmt, cpio_Archive_CRChdr_Fmt, cpio_Archive_CHRhdr_Fmt, PEX_Binary_Archive_Fmt, ARJ_Fmt, XZ_Fmt, Z7Z_Fmt, RAR5_Fmt, LZMA_Fmt, RPM_Fmt, Windows_Imaging_Fmt, Debian_Binary_Fmt, Windows_Installer_Fmt, Unix_Archive_Fmt, Mac_Executable_Fmt, Executable_JAR_Fmt, XAR_Fmt, XPInstall_Fmt, IHEX_Fmt | multiarc |
| Abseil 20230802.1 | ocr and ocr folder | ocr is used for image to text conversion. | |
| Apache Arrow 11.0.0 | parquetsr | Apache_Parquet_Fmt | parquet |
| Apache Avro 1.10.1 | avrosr | Avro_Fmt | avro |
| Boost 1.75.0 | See 'Apache Arrow' | | |
| brotli 1.0.9 | See 'Apache Arrow' | | |
| Caffe rc2 | ocr and ocr folder | ocr is used for image to text conversion. | |
| DWFToolkit 7.7 | See 'ODA' | | |
| chm_lib 0.40 | chmdll, chmsr | CHM_Fmt | chm |
| Expat 2.5.0 | ocr and ocr folder | ocr is used for image to text conversion. | |
| FreeType 2.13.0 | See 'ODA' | | |
| FreeType 2.13.3 | xpssr, pdfsr | MS_XPS_Fmt, PDF_Fmt, Portfolio_PDF_Fmt | xps, pdf |
| ICU 76.1 | icudt, icuuc, pdfsr, pdf2sr | PDF_Fmt, Portfolio_PDF_Fmt | pdf |
| Jansson 2.13.1 | avrosr | Avro_Fmt | avro |
| JasPer 4.2.1 | jp2000sr, kpjp2000sr | JPEG_2000_JP2_File_Fmt, ISO_JPEG2000_JP2_Fmt, ISO_JPEG2000_JPM_Fmt, ISO_JPEG2000_JPX_Fmt, Motion_JPEG_2000_Fmt, JPEG_2000_PGX_Fmt | jp2000 |
| jemalloc 5.3.0 | See 'Apache Arrow' | | |
| leptonica 1.80.0 | ocr and ocr folder | ocr is used for image to text conversion. | |
| libde265 1.0.15, libheif 1.17.6 | de265, heif, kpheifrdr | HEIF_Image_Fmt | heif |
| libical 3.0.16 | icssr | ICS_Fmt | ics |
| libjpeg 9e | kpjpeg, kpjpgrdr, kptifrdr | JPEG_File_Interchange_Fmt, TIFF_Fmt | jpg, tif |
| libpff 20180714 | pff, pffsr | MS_OutlookOST_Fmt | pff |
| libPNG 1.6.44 | kppng, kppngwrt, kppngrdr | APNG_Fmt, PNG_Fmt | png |
| libTIFF 4.6.0 | kptifrdr | TIFF_Fmt | tif |
| libwebp 1.3.2 | kpwebprdr, kptifrdr | WebP_Fmt, TIFF_Fmt | webp, tif |
| libxml2 2.13.5 | htmlsr, cryptographyservices | Ability to obtain XMP data from HTML files and ability to decrypt data in RMS-protected files. | |
| Linguist 5.2 | codeidentifierplugin | Detection of source code formats using KVFLT_SOURCECODEDETECTION. | |
| lz4 1.9.4 | See 'Apache Arrow' and 'Apache ORC' | | |
| ODA 2025.12 | kpodardr | AutoCAD_DXF_Binary_Fmt, AutoCAD_DXF_Text_Fmt, AutoDesk_DWG_Fmt, Intergraph_V7_DGN_Fmt, MicroStation_V8_DGN_Fmt, Design_Web_Format_Fmt | oda |
| OpenBLAS 0.2.15 | libopenblas | ocr is used for image to text conversion. | |
| OpenSSL 3.0.8 | ocr and ocr folder, cryptographyservices | ocr is used for image to text conversion. This library also adds the ability to decrypt data in RMS-protected files. | |
| OpenSSL 3.3.1 | See 'ODA' | | |
| oless 3.12 | See 'ODA' | | |
| Apache ORC 1.6.8 | orcsr | Apache_ORC_Fmt | orc |
| iana 2020e-1 | See 'Apache ORC' | Apache_ORC_Fmt | orc |
| Google pdfium 4500 | kppdf2rdr, pdf2sr | | Change 230=pdf2 to 230=pdf |
| Google protobuf 24.3 | ocr and ocr folder | ocr is used for image to text conversion. | |
| Google protobuf 21.12 | iwwp13sr, kpiwpg13rdr, iwss13sr, kviwork13, see 'Apache ORC' | IWWP13_Fmt, IWSS13_Fmt, IWPG13_Fmt, and the ability to detect these iWork formats | iwwp13, iwss13, iwpg13 |
| pstsdk 0.3 | cpstsdk, pstxsr | On Windows platforms, you might be able to use pstsr instead. | |
| rapidjson 1.1.0 | pbixsr, codeidentifierplugin | MS_Power_BI_Fmt and source code detection | pbix |
| rapidxml 1.13 | pbixsr | MS_Power_BI_Fmt | pbix |
| Re2 2022-06-01 | See 'Apache Arrow' | | |
| ReadStat 1.1.4 | sassr | SAS7BDAT_Fmt | sas |
| Google snappy 1.1.7 | See 'Apache ORC' | | |
| Google snappy 1.1.9 | iwwp13sr, kpiwpg13rdr, iwss13sr, kviwork13, avrosr, see 'Apache Arrow' | Avro_Fmt, IWWP13_Fmt, IWSS13_Fmt, IWPG13_Fmt, and the ability to detect these iWork formats | avro, iwwp13, iwss13, iwpg13 |
| sqlite 3.45.1 | pbixsr | MS_Power_BI_Fmt | pbix |
| tesseract 5.5.0 | ocr and ocr folder | ocr is used for image to text conversion. | |
| Thrift 0.16.0 | See 'Apache Arrow' | | |
| Tinyxml 2.6.2 | See 'ODA' | | |
| UTF | See 'ODA' | | |
| utf8proc 2.7.0 | See 'Apache Arrow' | | |
| WinZipJPEG | unzipjpg | Extraction of PKZIP_Fmt subfiles that use the jpeg compression method. See ZIP Compression Methods. | |
| WavPack 5.6.0 | wavpack | Extraction of PKZIP_Fmt subfiles that use the wavpack compression method. See ZIP Compression Methods. | |
| Adobe XMP 2023.12 | xmp | Obtain XMP metadata | |
| xsimd 9.0.1 | See 'Apache Arrow' | | |
| Zlib 1.2.13 | ocr and ocr folder | ocr is used for image to text conversion. | |
| Zlib 1.3.1 | See 'Apache Arrow', 'Apache ORC', 'ODA' | | |
| Facebook zstd 1.5.5 | pbixsr, zstdsr | MS_Power_BI_Fmt, Zstandard_Fmt, and extraction of PKZIP_Fmt subfiles that use the zStandard compression method. See ZIP Compression Methods. | pbix, zstd |
| Facebook zstd 1.5.5 | See 'Apache ORC' | | |
| Facebook zstd 1.5.6 | See 'Apache Arrow' | | |
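
The formats.ini changes in the last column can be scripted. The following Python sketch is illustrative only: it assumes that the relevant formats.ini entries are plain key=value lines whose value is the reader name, as the 230=pdf2 entry in the table suggests. The function names strip_reader_lines and rename_reader are hypothetical and are not part of File Content Extraction. Back up formats.ini before you change it.

```python
def strip_reader_lines(ini_text, readers):
    """Return ini_text without key=value lines whose value names a removed reader.

    Assumption: entries look like '230=pdf2' (key, '=', reader name).
    """
    kept = []
    for line in ini_text.splitlines():
        _key, sep, value = line.partition("=")
        if sep and value.strip() in readers:
            continue  # drop the entry that points at a deleted reader
        kept.append(line)
    return "\n".join(kept) + "\n"


def rename_reader(ini_text, old, new):
    """Point entries at a different reader, for example pdf2 -> pdf."""
    out = []
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip() == old:
            line = f"{key}={new}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, after deleting the Apache Avro files you might call strip_reader_lines(text, {"avro"}), and after deleting the Google pdfium files you might call rename_reader(text, "pdf2", "pdf"), where text is the content of formats.ini.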

ZIP Compression Methods

The ZIP specification allows compression of subfiles in many different ways, including the popular deflate method. File Content Extraction uses third-party libraries to handle some of these compression types.

You can delete some of these ZIP compression libraries, at the cost of losing the ability to extract subfiles that use that compression type.
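
Before you delete one of these libraries, it can help to check whether your archives use the corresponding compression method. The following Python sketch uses the standard zipfile module to count the compression method recorded for each subfile. The method IDs (93 for Zstandard, 96 for the JPEG variant, 97 for WavPack) are taken from the PKWARE ZIP specification, and the pairing with the vendor libraries in the table above is an assumption made for illustration. The names OPTIONAL_METHODS, compression_methods, and libraries_needed are hypothetical.

```python
import zipfile
from collections import Counter

# ZIP method IDs (PKWARE APPNOTE) mapped to the optional vendor library
# assumed to handle them; adjust this mapping to match your installation.
OPTIONAL_METHODS = {93: "Facebook zstd", 96: "WinZipJPEG", 97: "WavPack"}


def compression_methods(zip_source):
    """Count how many subfiles use each compression method ID.

    Only the archive's directory is read, so this works even when
    Python itself cannot decompress the method.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return Counter(info.compress_type for info in archive.infolist())


def libraries_needed(zip_source):
    """Return the optional libraries that this archive's subfiles rely on."""
    used = compression_methods(zip_source)
    return {name for method, name in OPTIONAL_METHODS.items() if method in used}
```

An archive for which libraries_needed returns an empty set uses none of the three optional methods listed here, so deleting those libraries should not affect extraction of its subfiles.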