---
title: "Tips and Tricks for Taking Screenshots and Selecting Text From Images on Sway"
date: 2021-07-28T18:02:31-04:00
draft: true
---

Several days ago I saw someone rant about how there were no good programs for copying text from an image.
They were using the command line tool [`tesseract`](https://github.com/tesseract-ocr/tesseract), but they felt it was too clunky for them.
They were using `tesseract` by taking a screenshot of the text they were trying to copy, saving the screenshot to a file, and running `tesseract` on that file to generate a *new* file with the text that was found.
After all that, they would open the final text file, copy the text, and delete the file.

This is something that I actually used to do a lot with my Android phone, before the feature was removed for some reason (or at least moved somewhere that I haven't been able to find for years).
So I decided to see how easy it would be to hook up [`grim`](https://wayland.emersion.fr/grim/) and [`slurp`](https://wayland.emersion.fr/slurp/) with `tesseract` to copy text from images the same way.

It turns out, it was not that hard to do:

```
grim -g "$(slurp)" - | tesseract - - | wl-copy
```

This works by selecting the region using `"$(slurp)"`, and using `grim -g` to take a screenshot of that region.
The file is then "saved" to standard output (that's what the `-` represents here).
`tesseract` is then able to read the image from `-` (standard input, which it gets from the standard output of `grim`), and write the text it finds to `-` (standard output).
Finally, we stuff the text that was found using `tesseract` into the clipboard using `wl-copy`, and voilà!
We now have the text from some image in our clipboard by just drawing a box around it, and we did it with no temporary files on the disk.
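
Depending on the version and configuration, `tesseract` may end its text output with a form feed (page separator) character and some blank lines, which then end up in the clipboard as well.
Here is a minimal sketch of a filter that could sit between `tesseract` and `wl-copy` to tidy that up; the function name `clean_ocr_text` is made up for this example and is not part of any of these tools.

```shell
# Hypothetical helper: tidy OCR text read from standard input.
clean_ocr_text() {
    # tr deletes form feed characters; the command substitution
    # drops any trailing newlines from the result.
    text="$(tr -d '\f')"
    # Print the text back out with exactly one trailing newline.
    printf '%s\n' "$text"
}

# Intended use (requires grim, slurp, tesseract, and wl-copy):
#   grim -g "$(slurp)" - | tesseract - - | clean_ocr_text | wl-copy
```

If the output already looks clean on your setup, the plain pipeline above is all you need.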

However, calling this from the command line each time you want to select text from an image could become cumbersome if it's something you want to do frequently.
I decided to set this up using a wofi menu with several options for taking screenshots.

Here is the script that I created:

```
#!/bin/sh

screenshot_copy_all_displays="Screenshot all displays to clipboard"
screenshot_all_displays_to_file="Screenshot all displays to file"
screenshot_copy_area="Screenshot area to clipboard"
screenshot_copy_area_ocr="Screenshot area to copy text"
screenshot_area_to_file="Screenshot area to file"
screenshot_copy_window="Screenshot focused window to clipboard"
screenshot_window_to_file="Screenshot focused window to file"
screenshot_copy_monitor="Screenshot focused monitor to clipboard"
screenshot_monitor_to_file="Screenshot focused monitor to file"

# Prompt the user with wofi, passing one option per line.
# printf is used because `+=` and `echo -e` are not supported by every /bin/sh.
choice="$(printf '%s\n' \
    "$screenshot_copy_all_displays" \
    "$screenshot_all_displays_to_file" \
    "$screenshot_copy_area" \
    "$screenshot_copy_area_ocr" \
    "$screenshot_area_to_file" \
    "$screenshot_copy_window" \
    "$screenshot_window_to_file" \
    "$screenshot_copy_monitor" \
    "$screenshot_monitor_to_file" \
    | wofi -d)"

# Make sure that all pictures are saved in the screenshots folder.
# Stop here if the folder is missing, so files don't land somewhere unexpected.
cd ~/Pictures/Screenshots || exit 1

case $choice in
    $screenshot_copy_all_displays)
        grim - | wl-copy
        ;;
    $screenshot_all_displays_to_file)
        grim
        ;;
    $screenshot_copy_area)
        grim -g "$(slurp)" - | wl-copy
        ;;
    $screenshot_copy_area_ocr)
        grim -g "$(slurp)" - | tesseract - - | wl-copy
        ;;
    $screenshot_area_to_file)
        grim -g "$(slurp)"
        ;;
    $screenshot_copy_window)
        grim -g "$(swaymsg -t get_tree | jq -j '.. | select(.type?) | select(.focused).rect | "\(.x),\(.y) \(.width)x\(.height)"')" - | wl-copy
        ;;
    $screenshot_window_to_file)
        grim -g "$(swaymsg -t get_tree | jq -j '.. | select(.type?) | select(.focused).rect | "\(.x),\(.y) \(.width)x\(.height)"')"
        ;;
    $screenshot_copy_monitor)
        grim -o "$(swaymsg -t get_outputs | jq -r '.[] | select(.focused) | .name')" - | wl-copy
        ;;
    $screenshot_monitor_to_file)
        grim -o "$(swaymsg -t get_outputs | jq -r '.[] | select(.focused) | .name')"
        ;;
esac
```
*This script assumes that the folder `~/Pictures/Screenshots` exists.*
*Make sure that the folder exists, or modify the script to save files in a different folder.*
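
One way to sidestep the missing folder problem is to create the folder on demand before changing into it.
This is only a sketch: `enter_screenshot_dir` and the `SCREENSHOT_DIR` variable are names invented for this example, not something `grim` or Sway knows about.

```shell
# Hypothetical helper: create the screenshot folder if needed, then enter it.
# SCREENSHOT_DIR is an assumed variable name used only by this sketch.
enter_screenshot_dir() {
    dir="${SCREENSHOT_DIR:-$HOME/Pictures/Screenshots}"
    # mkdir -p succeeds if the folder already exists and creates
    # any missing parent folders along the way.
    mkdir -p "$dir" && cd "$dir"
}
```

In the script, a call like `enter_screenshot_dir || exit 1` could take the place of the plain `cd` line.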

This script has a bunch of options that I find useful.
You can either choose to save the screenshot to a file, or to the clipboard.
You can choose whether to screenshot a region, the focused window, the focused monitor, or all monitors.
These could of course all be bound to different shortcuts, but I personally like having a single screenshot button for simplicity.

Speaking of binding shortcuts, let's go over making this script into a shortcut on Sway.
I recommend you save this script to a file in a folder that is on your `PATH`.
I personally have this saved to `~/.local/bin/screenshot-menu.sh`.
Remember to mark the script as executable using `chmod +x`.
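
If you are not sure whether a folder such as `~/.local/bin` is actually searched for commands, a small check like the following can tell you.
It is a rough sketch; `dir_in_path` is a name made up for this example.

```shell
# Hypothetical helper: return 0 if the given folder is an entry in $PATH.
dir_in_path() {
    # Wrapping both sides in colons makes the match exact, so
    # /opt/demo does not match an entry like /opt/demo/bin.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn on standard error if ~/.local/bin is not searched for commands.
if ! dir_in_path "$HOME/.local/bin"; then
    echo "note: $HOME/.local/bin is not in PATH" >&2
fi
```

If the note is printed, either add the folder to your `PATH` in your shell's startup file or use the full path to the script in your Sway binding.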

Make sure to check that the script works properly for you by running the script from the command line.
In my case, I can open a new terminal and type `screenshot-menu.sh` and the wofi menu will appear.

If that's working fine, you can now add a binding to your Sway config to run the script.
Add a new line for the binding to your config (mine is located at `~/.config/sway/config`).
It should look something like:

```
    bindsym $meh+a exec screenshot-menu.sh
```

The `$meh` key is a special modifier on my keyboard; you should use something that makes sense for you.

Happy screenshotting!