2022-04-07
- Use the approach from [mecab does not split part of the input]
- However, in the case of Scrapbox, there are multiple notations to be protected, so it is difficult to do the pre-processing with regular expression replacements
- This time, I decided to use scrapbox-parser in Deno only for the preprocessing part.
old version ts
- error
tokenizer.cpp(368) [new_node->feature]
- Because of how MeCab (downstream of the preprocessing) handles its input, the text immediately preceding a link or icon notation must not end in whitespace.
- mecab constrained parsing (partial parsing) errors - aayletric
    - This error occurs only when a single-byte space immediately precedes a constrained text string.
- This time, the whitespace is moved onto the notation side instead, because simply deleting it would make it impossible to restore the original text.
- mecab constrained parsing (partial parsing) errors - aayletric
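
The whitespace workaround described above can be sketched as follows. This is a minimal illustration, not the code used on this page: the `Segment` type and both function names are hypothetical, and the segments are assumed to come from an earlier preprocessing step (for example scrapbox-parser) that has already marked which spans are notation to protect. The output follows MeCab's partial-parsing input format as I understand it: plain fragments on their own lines, constrained tokens as `surface<TAB>*`, and a final `EOS` line.

```typescript
// Hypothetical segment type: `protect` marks notation (link, icon, ...)
// that MeCab must keep as a single token.
type Segment = { text: string; protect: boolean };

// Move trailing half-width spaces of a plain segment onto the front of the
// protected segment that follows it, so that no plain fragment ends in a
// space right before a constrained token.
function shiftTrailingSpaces(segments: Segment[]): Segment[] {
  const out = segments.map((s) => ({ ...s }));
  for (let i = 0; i + 1 < out.length; i++) {
    const cur = out[i];
    const next = out[i + 1];
    if (cur.protect || !next.protect) continue;
    const m = cur.text.match(/ +$/);
    if (m === null) continue;
    cur.text = cur.text.slice(0, cur.text.length - m[0].length);
    next.text = m[0] + next.text;
  }
  // A plain segment that consisted only of spaces is now empty; drop it.
  return out.filter((s) => s.text.length > 0);
}

// Build the text to feed to MeCab in partial-parsing mode (assumed format).
function toPartialInput(segments: Segment[]): string {
  const lines = shiftTrailingSpaces(segments).map((s) =>
    s.protect ? `${s.text}\t*` : s.text
  );
  return [...lines, "EOS"].join("\n");
}
```

Because the spaces are moved rather than deleted, concatenating the surfaces of the resulting tokens still reproduces the original line.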
This page is auto-translated from /nishio/Scrapboxの記法を維持して形態素解析 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.