I installed PDFMiner with pip to extract text from a Japanese PDF, and the following problem occurred: the Japanese characters came out as (cid:NNN) placeholders instead of text.

I(cid:888), Intellectual Production Techniques(cid:887)Good(cid:845)Reference(cid:853)Greed(cid:864)(cid:845)(cid:880)(cid:866). People(cid:884)intellectual production techniques(cid:923)teaching(cid:849)(cid:916).

Solution: see Text extraction from PDF.
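To see how much of an extraction is affected before fixing it, you can count the placeholders. Below is a minimal diagnostic sketch in Python. It assumes only that each glyph pdfminer could not map to Unicode shows up in the output as `(cid:N)`, as in the sample above. The helper names `count_unmapped_cids` and `unmapped_ratio` are hypothetical and are not part of pdfminer. This sketch only detects the problem; it does not solve it.

```python
import re
from collections import Counter

# Assumption: pdfminer writes "(cid:N)" for each glyph whose CID N it could not map to Unicode.
CID_PLACEHOLDER = re.compile(r"\(cid:(\d+)\)")


def count_unmapped_cids(text: str) -> Counter:
    """Count how often each unmapped CID appears in the extracted text (hypothetical helper)."""
    return Counter(int(n) for n in CID_PLACEHOLDER.findall(text))


def unmapped_ratio(text: str) -> float:
    """Rough share of output characters that came out as (cid:N) placeholders.

    Each placeholder counts as one character. Everything else in the text,
    including spaces and punctuation, counts as a successfully mapped character.
    """
    placeholders = sum(count_unmapped_cids(text).values())
    mapped_chars = len(CID_PLACEHOLDER.sub("", text))
    total = placeholders + mapped_chars
    return placeholders / total if total else 0.0


if __name__ == "__main__":
    sample = "I(cid:888), Intellectual Production Techniques(cid:887)Good(cid:845)"
    print(count_unmapped_cids(sample))
    print(f"{unmapped_ratio(sample):.1%}")
```

A ratio close to zero suggests only a few stray glyphs are affected. A large ratio, as with the Japanese PDF here, suggests the font's characters are not being mapped to Unicode at all.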


Notes on the research process


This page is auto-translated from /nishio/CID問題 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.