
I have scanned books against a black imitation-leather background. Unfortunately, the text recognition (OCR) picks up false text in this background. I would like to color the border black so that the program does not find any text at the edges. Is this possible with tools like ImageMagick or GraphicsMagick?

Here is an example (the original is a TIFF): https://image.ibb.co/icm7Fz/1024656063_0004_raw.jpg

2 Answers


Perhaps a combination of floodfill and fuzz?

convert input.png -fill white -fuzz 20% -draw 'color 1,1 floodfill' output.png
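To show what the fuzz-tolerant floodfill does, here is a minimal pure-Python sketch on a grayscale image stored as a list of rows. It is a simplified model, not ImageMagick's implementation: it assumes 4-connected neighbours and treats -fuzz 20% as an absolute tolerance of 20% of the maximum value around the seed pixel's value. ImageMagick's real colour-distance metric and connectivity may differ. fuzzy_floodfill is a hypothetical name.

```python
from collections import deque


def fuzzy_floodfill(img, seed, fill, fuzz, max_value=255):
    """Fill, in place, the region connected to `seed` whose pixels are within
    fuzz * max_value of the seed pixel's original value.

    img  -- list of rows of grayscale ints (assumed layout, indexed img[y][x])
    seed -- (x, y) start pixel, playing the role of 1,1 in 'color 1,1 floodfill'
    """
    x0, y0 = seed
    h, w = len(img), len(img[0])
    target = img[y0][x0]            # colour being replaced, taken at the seed
    tolerance = fuzz * max_value    # assumption: fuzz as a fraction of max value
    seen = {(x0, y0)}
    queue = deque([(x0, y0)])
    while queue:
        x, y = queue.popleft()
        img[y][x] = fill
        # Spread only to unvisited 4-connected neighbours close to the target.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen:
                if abs(img[ny][nx] - target) <= tolerance:
                    seen.add((nx, ny))
                    queue.append((nx, ny))
    return img


# Dark "leather" frame around a white page with one dark "text" pixel.
page = [
    [30, 20, 40, 25, 30],
    [35, 250, 255, 250, 20],
    [20, 255, 0, 255, 40],
    [25, 250, 255, 250, 30],
    [30, 40, 20, 35, 25],
]
fuzzy_floodfill(page, (0, 0), 255, 0.20)   # like -fill white -fuzz 20%
```

The dark centre pixel stays untouched even though it is within tolerance, because the fill only spreads through connected pixels. As long as the white page separates the print from the leather, seeding in a corner clears the background without wiping the text.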


Also check out Fred's excellent textcleaner script.

  • Yes, provided that ImageMagick was installed with libtiff support (this is usually included by the default installers). Commented Aug 21, 2018 at 13:24

emcconville has an excellent solution. I would add a little to it: a deskew plus a trim and shave. Your margins are large enough to allow shaving off the excess black that remains after the trim, and the deskew may help the OCR.

convert image.png -bordercolor black -border 1 -background black -deskew 40% -fuzz 50% -trim +repage -shave 10x10 result.png
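For intuition about the -fuzz 50% -trim +repage -shave 10x10 part (the deskew step is not modeled), here is a simplified pure-Python sketch. Trim keeps the smallest box containing every pixel that differs from the background by more than the fuzz tolerance, and shave then removes a fixed number of pixels from each side. It assumes a grayscale list-of-rows image and a known background value. ImageMagick's -trim instead compares against the corner pixels, which the -bordercolor black -border 1 step makes black. trim_box, crop and shave are hypothetical helpers.

```python
def trim_box(img, bg, fuzz, max_value=255):
    """Smallest (left, top, right, bottom) box, right/bottom exclusive, that
    holds every pixel differing from bg by more than fuzz * max_value.
    Returns None when the whole image counts as background."""
    tol = fuzz * max_value
    rows = [y for y, row in enumerate(img) if any(abs(v - bg) > tol for v in row)]
    if not rows:
        return None
    cols = [x for x in range(len(img[0]))
            if any(abs(row[x] - bg) > tol for row in img)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop(img, box):
    """Cut out the box; the result starts at (0, 0), as after +repage."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def shave(img, dx, dy):
    """Remove dx columns from the left and right and dy rows from top and bottom."""
    h, w = len(img), len(img[0])
    return [row[dx:w - dx] for row in img[dy:h - dy]]


# Dark textured border (values up to 20) around a bright page.
scan = [
    [0, 10, 0, 20, 0, 0],
    [10, 240, 250, 245, 0, 10],
    [0, 250, 30, 250, 15, 0],
    [20, 245, 250, 240, 0, 0],
    [0, 0, 10, 0, 20, 0],
]
box = trim_box(scan, bg=0, fuzz=0.5)   # like -fuzz 50% -trim on a black border
page = crop(scan, box)
inner = shave(page, 1, 1)              # like -shave 1x1 (10x10 on a real scan)
```

The high 50% fuzz lets the textured leather count as background, while the bright page pixels still fall well outside the tolerance.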



  • Sorry, but the background (border) must be preserved.
    – Heintje
    Commented Aug 22, 2018 at 8:33
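If "preserved" means the scan must keep its original dimensions, one option (not taken from either answer) is to find the page's bounding box without cropping and paint everything outside it black. ImageMagick's %@ format escape should report the trim bounding box without trimming, e.g. convert image.png -fuzz 50% -format "%@" info: . The minimal sketch below assumes that output has the form WxH+X+Y and models the image as a grayscale list of rows. parse_geometry and blacken_outside are hypothetical helpers.

```python
import re


def parse_geometry(geom):
    """Turn a 'WxH+X+Y' string (assumed %@ output) into (left, top, right, bottom)."""
    m = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", geom.strip())
    if m is None:
        raise ValueError(f"unexpected geometry: {geom!r}")
    w, h, x, y = map(int, m.groups())
    return x, y, x + w, y + h


def blacken_outside(img, box, fill=0):
    """Return a same-size copy of img with every pixel outside box set to fill."""
    left, top, right, bottom = box
    return [
        [v if left <= x < right and top <= y < bottom else fill
         for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]


scan = [
    [0, 10, 0, 20, 0, 0],
    [10, 240, 250, 245, 0, 10],
    [0, 250, 30, 250, 15, 0],
    [20, 245, 250, 240, 0, 0],
    [0, 0, 10, 0, 20, 0],
]
box = parse_geometry("3x3+1+1")     # hypothetical %@ result for this scan
cleaned = blacken_outside(scan, box)
```

The leather texture becomes uniformly black, so the OCR has nothing to latch onto at the edges, while the image keeps its original size.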
