Recoll / searching in multiple PDF files at once

debianix
Posts: 185
Joined: Fri May 31, 2024 4:03 pm

Recoll / searching in multiple PDF files at once

#1 Post by debianix »

Hello!
I would like to run a text search for keywords across multiple PDF files at once, because I think this is a very helpful feature.
I have now discovered the recoll package and installed it via apt. I also added a few packages that should make it possible to search Word documents, in case I ever need to:

Code: Select all

sudo apt install recoll wv antiword
The package poppler-utils is apparently required for the PDF search and was already pre-installed.

I have now read here that it can also be integrated directly into KDE or Dolphin:
Unfortunately, I cannot find the "kio-recoll" package in the repos. Is it really not packaged for MX Linux, or have I simply not looked hard enough?
If the package doesn't exist, is there any chance that it will come?
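
For reference, a minimal sketch of how the enabled repos could be checked from a terminal. `apt-cache search` and `apt-cache policy` are standard apt commands; the helper name `check_pkg` is made up for illustration.

```shell
# Hypothetical helper: report what the enabled repos know about a package.
check_pkg() {
    # List packages whose name contains the given term.
    apt-cache search --names-only "$1"
    # Show the installed and candidate versions for the exact package name.
    apt-cache policy "$1"
}
```

Calling `check_pkg kio-recoll` and getting no output, or a candidate of "(none)", would suggest that the package is not available from the repos currently enabled.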

Code: Select all

Snapshot created on: 20240625_0332
System:
  Kernel: 6.1.0-22-amd64 [6.1.94-1] arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
    parameters: BOOT_IMAGE=/vmlinuz-6.1.0-22-amd64 root=UUID=<filter> ro quiet splash
    resume=UUID=<filter> resume_offset=626688
  Desktop: KDE Plasma v: 5.27.5 wm: kwin_x11 vt: 7 dm: SDDM Distro: MX-23.3_KDE_x64 Libretto May
    19 2024 base: Debian GNU/Linux 12 (bookworm)
Machine:
  Type: Desktop Mobo: Gigabyte model: B560M DS3H V2 serial: <superuser required> UEFI: American
    Megatrends LLC. v: F9 date: 06/07/2023
CPU:
  Info: model: 11th Gen Intel Core i7-11700 bits: 64 type: MT MCP arch: Rocket Lake gen: core 11
    level: v4 note: check built: 2021+ process: Intel 14nm family: 6 model-id: 0xA7 (167) stepping: 1
    microcode: 0x5E
  Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache: L1: 640 KiB
    desc: d-8x48 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB L3: 16 MiB desc: 1x16 MiB
  Speed (MHz): avg: 800 min/max: 800/4800:4900 scaling: driver: intel_pstate governor: powersave
    cores: 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 800
    13: 800 14: 800 15: 800 16: 800 bogomips: 79872
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: gather_data_sampling mitigation: Microcode
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data mitigation: Clear CPU buffers; SMT vulnerable
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed mitigation: Enhanced IBRS
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Enhanced / Automatic IBRS; IBPB: conditional; RSB filling;
    PBRSB-eIBRS: SW sequence; BHI: SW loop, KVM: SW loop
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel RocketLake-S GT1 [UHD Graphics 750] vendor: Gigabyte driver: i915 v: kernel
    arch: Gen-12.1 process: Intel 10nm built: 2020-21 ports: active: DP-1,HDMI-A-3
    empty: HDMI-A-1,HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:4c8a class-ID: 0300
  Device-2: AIRHUG 02 type: USB driver: uvcvideo bus-ID: 3-2:2 chip-ID: 2f9d:1101 class-ID: 0e02
    serial: <filter>
  Display: x11 server: X.Org v: 1.21.1.7 with: Xwayland v: 22.1.9 compositor: kwin_x11 driver: X:
    loaded: modesetting unloaded: fbdev,vesa dri: iris gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x285mm (40.00x11.22")
    s-diag: 1055mm (41.54")
  Monitor-1: DP-1 pos: primary,left model: BenQ BL2410 serial: <filter> built: 2015
    res: 1920x1080 hz: 60 dpi: 102 gamma: 1.2 size: 477x268mm (18.78x10.55") diag: 609mm (24")
    ratio: 16:9 modes: max: 1920x1080 min: 720x400
  Monitor-2: HDMI-A-3 mapped: HDMI-3 pos: right model: BenQ BL2410 serial: <filter> built: 2015
    res: 1920x1080 hz: 60 dpi: 102 gamma: 1.2 size: 477x268mm (18.78x10.55") diag: 609mm (24")
    ratio: 16:9, 15:9 modes: max: 1920x1080 min: 720x400
  API: OpenGL v: 4.6 Mesa 23.1.2-1~mx23ahs renderer: Mesa Intel Graphics (RKL GT1)
    direct-render: Yes
Audio:
  Device-1: Texas Instruments PCM2902 Audio Codec type: USB
    driver: hid-generic,snd-usb-audio,usbhid bus-ID: 1-3:2 chip-ID: 08bb:2902 class-ID: 0300
  Device-2: C-Media Audio Adapter (Unitek Y-247A) type: USB
    driver: cmedia_hs100b,snd-usb-audio,usbhid bus-ID: 3-3:3 chip-ID: 0d8c:0014 class-ID: 0300
  API: ALSA v: k6.1.0-22-amd64 status: kernel-api tools: alsamixer,amixer
  Server-1: PipeWire v: 1.0.0 status: active with: 1: pipewire-pulse status: active
    2: wireplumber status: active 3: pipewire-alsa type: plugin 4: pw-jack type: plugin
    tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Gigabyte driver: r8169
    v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: 3000 bus-ID: 03:00.0 chip-ID: 10ec:8168
    class-ID: 0200
  IF: eth0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 3.17 TiB used: 1.01 TiB (31.8%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 970 EVO Plus 1TB size: 931.51 GiB
    block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
    rev: 2B2QEXM7 temp: 60.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:1 vendor: Samsung model: SSD 970 EVO Plus 2TB size: 1.82 TiB
    block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
    rev: 2B2QEXM7 temp: 42.9 C scheme: GPT
  ID-3: /dev/sda maj-min: 8:0 vendor: Kingston model: SA400S37480G size: 447.13 GiB block-size:
    physical: 512 B logical: 512 B speed: 6.0 Gb/s type: SSD serial: <filter> rev: 60.1 scheme: GPT
Partition:
  ID-1: / raw-size: 68.34 GiB size: 66.72 GiB (97.62%) used: 59.16 GiB (88.7%) fs: ext4
    dev: /dev/dm-0 maj-min: 253:0 mapped: luks-<filter>
  ID-2: /boot raw-size: 1024 MiB size: 973.4 MiB (95.06%) used: 192.5 MiB (19.8%) fs: ext4
    dev: /dev/nvme0n1p2 maj-min: 259:3
  ID-3: /boot/efi raw-size: 1024 MiB size: 1022 MiB (99.80%) used: 288 KiB (0.0%) fs: vfat
    dev: /dev/nvme0n1p1 maj-min: 259:2
  ID-4: /home raw-size: 861.14 GiB size: 846.54 GiB (98.31%) used: 8.19 GiB (1.0%) fs: ext4
    dev: /dev/dm-2 maj-min: 253:2 mapped: luks-<filter>
Swap:
  Kernel: swappiness: 15 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: file size: 37.14 GiB used: 5.5 MiB (0.0%) priority: -2 file: /swap/swap
Sensors:
  System Temperatures: cpu: 33.0 C mobo: N/A
  Fan Speeds (RPM): N/A
Repos:
  Packages: 3195 pm: dpkg pkgs: 3184 libs: 1731 tools: apt,apt-get,aptitude,nala,synaptic pm: rpm
    pkgs: 0 pm: flatpak pkgs: 11
  No active apt repos in: /etc/apt/sources.list
  Active apt repos in: /etc/apt/sources.list.d/debian-stable-updates.list
    1: deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware
  Active apt repos in: /etc/apt/sources.list.d/debian.list
    1: deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
    2: deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
  Active apt repos in: /etc/apt/sources.list.d/mx.list
    1: deb http://ftp.halifax.rwth-aachen.de/mxlinux/packages/mx/repo/ bookworm main non-free
    2: deb http://ftp.halifax.rwth-aachen.de/mxlinux/packages/mx/repo/ bookworm ahs
  Active apt repos in: /etc/apt/sources.list.d/syncthing.list
    1: deb https://apt.syncthing.net/ syncthing stable
Info:
  Processes: 362 Uptime: 13h 57m wakeups: 1 Memory: 31.17 GiB used: 7.41 GiB (23.8%) Init: SysVinit
  v: 3.06 runlevel: 5 default: graphical tool: systemctl Compilers: gcc: 12.2.0 alt: 12
  Client: shell wrapper v: 5.2.15-release inxi: 3.3.26
Boot Mode: UEFI

MXRobo
Posts: 1840
Joined: Thu Nov 14, 2019 12:09 pm

Re: Recoll / searching in multiple PDF files at once

#2 Post by MXRobo »

Relevant link: viewtopic.php?t=80628

CharlesV
Administrator
Posts: 8012
Joined: Sun Jul 07, 2019 5:11 pm

Re: Recoll / searching in multiple PDF files at once

#3 Post by CharlesV »

+1 on pdfgrep. I move through files so fast that I just don't have time to sit and wait for recoll (or any app) to index them.

Also, you do know that not all PDFs are searchable? They must have a text layer, which means some PDFs will need to be OCR'd before they can be searched.
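
A rough way to check whether a given PDF has a text layer, sketched under the assumption that `pdftotext` from poppler-utils is installed (the helper name `pdf_word_count` is made up for illustration):

```shell
# Hypothetical helper: print how many words pdftotext can extract from a PDF.
# A count of 0 (or close to it) suggests a scanned, image-only file.
pdf_word_count() {
    # "-" sends the extracted text to stdout instead of writing a .txt file.
    pdftotext "$1" - | wc -w
}
```

For example, `pdf_word_count scan.pdf` printing 0 would mean that neither pdfgrep nor recoll is likely to find anything in that file until it has been OCR'd.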
*QSI = Quick System Info from menu (Copy for Forum)
*MXPI = MX Package Installer
*Please check the solved checkbox on the post that solved it.
*Linux -This is the way!

debianix
Posts: 185
Joined: Fri May 31, 2024 4:03 pm

Re: Recoll / searching in multiple PDF files at once

#4 Post by debianix »

Thanks for the tip, pdfgrep seems to be a tool that works well.
I have read the man pages and looked at some syntax examples. I installed and tested it, and it works so far, but I haven't figured out how to search ALL files ONLY within the current folder for a specific keyword / pattern / string.
Maybe someone can give me an example syntax for this? It should not require specifying the folder to be searched, but should automatically apply to the folder I am currently in within the terminal.
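
A minimal sketch of one such syntax, assuming a Bash-like shell. The helper name `pdfhere` is made up for illustration; `-i`, `-n` and `-H` are regular pdfgrep options.

```shell
# Hypothetical helper: search all PDFs in the current folder only (no subfolders).
pdfhere() {
    # -i: ignore case, -n: show the page number, -H: show the file name.
    # The glob ./*.pdf is expanded by the shell and does not descend into subfolders.
    pdfgrep -i -n -H "$1" ./*.pdf
}
```

After changing into the folder with `cd`, running `pdfhere keyword` searches only the PDFs located there. The glob is case-sensitive, so files ending in `.PDF` would need their own pattern. For comparison, `pdfgrep -r keyword .` would also search all subfolders.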

Overall, I am very reluctant to work with terminal and file manager in combination, because I find it a bit cumbersome. There seems to be a GUI for pdfgrep, but I'm not quite sure how to assess this project in terms of security.

https://sourceforge.net/projects/pdfgrepgui/
I'm used to such projects being on GitHub etc. SourceForge.net always reminds me more of Windows .exe files ;)
What is your opinion?
Last edited by debianix on Thu Jul 11, 2024 7:54 pm, edited 2 times in total.

CharlesV
Administrator
Posts: 8012
Joined: Sun Jul 07, 2019 5:11 pm

Re: Recoll / searching in multiple PDF files at once

#5 Post by CharlesV »

The good news is that the source code is there on SourceForge...

It will take a bit to walk through all the code; however, at first glance it looks pretty good, and it compiles and runs under my Lazarus version as well. (And it runs / looks very nice too!)

If you request this and link it in the requests area, one of the devs might be able to pick it up, verify all is good, and put it in the repos.

debianix
Posts: 185
Joined: Fri May 31, 2024 4:03 pm

Re: Recoll / searching in multiple PDF files at once

#6 Post by debianix »

Thanks a lot for contributing your opinion! That sounds good. I just opened a request, let's see what happens :-) Anyway, pdfgrep is an amazing tool!

CharlesV
Administrator
Posts: 8012
Joined: Sun Jul 07, 2019 5:11 pm

Re: Recoll / searching in multiple PDF files at once

#7 Post by CharlesV »

You're very welcome. And I agree with you on the GUI and pdfgrep.

The other search tool I use is fsearch, and it is also amazing!
