from Machine writes in Scrapbox

Objective.

  • When I search for Scrapbox, I don’t get any hits on past blog posts, I think I should get a hit.
  • It’s automatically converted to Hatena blog, but I’m not paying for it, so I get ads.
    • I feel bad when I link them.
  • I want to bracket past articles as well.

2021-05-04 Hatena Diary imported into Scrapbox with past date script 2021-12-20

In the past (2018) I had even created a “script to convert exported XML to JSON in Scrapbox format”. - Importing Hatena Diary into Scrapbox

  • Why haven’t you imported it into this project?
  • Formatting is not fully supported.
  • I didn’t want the top of the page to be filled with mechanical articles.
    • → I noticed that if you put the update date in the past, the top page won’t be filled.

(computer) format

  • It looks like the entity reference > is being erased when I read it in bs4.
    • I don’t think that’s possible, but something lost came up and I’m not sure how to solve it.
    • It’s not a big deal, so I parsed it on my own.
  • No conversion from Hatena notation to Scrapbox notation
    • I put it all together and put it in Scrapbox code notation.
    • If it hits the search and you can read it, that’s all that matters.
    • Do `html.unescape
    • I was going to use the title of the blog as the page title, but decided against it because I felt it wasn’t necessary.
      • All titles will be machine generated.
        • Easy mechanical removal if you decide it’s not good enough after import
          • Q: Wouldn’t it be better to make a backup before importing?
            • A: If I want to undo the import after running the import for a while and editing various pages, I can’t restore from a backup.
        • Safe and secure override operation
          • It’s sad when a script is updated and reconverted and then overwritten and the human-written text is lost.
          • This anxiety prevents us from “keeping it updated”.
          • Don’t change the machine-generated page, just create a page with a good title that can be duplicated when you want to change it.
  • image
    • The creation date and time are correctly set to three years ago.
    • But, well, it might have been enough to just say “not on the top page” without having to match the exact date and time of creation. python
def timestamp(*args):
    return datetime.datetime(*map(int, args)).timestamp()
- 2021-06-29 PS: I still prefer the current "date and time the article was actually written" because I'm comfortable with the search results being in chronological order.

search (e.g. for someone using a search engine)

  • image

I made a heading page but deleted it because I knew I would never use it.

scale

  • 120,000 lines of XML
  • 220,000 lines of JSON
  • 1500 pages

superscription

  • Schedule: Can be overwritten if the change date/time has not changed.
  • Schedule: Check for conflicts on import
  • Actual: Always overwritable

Try it on an empty project

  • It has to be admin.

Will bots be able to script their participation in the project as well? :

Request URL: "https://scrapbox.io/api/projects/{project}/invitations/{key}"
POST

This page is auto-translated from /nishio/はおăȘăƒ€ă‚€ă‚ąăƒȘăƒŒă‚’éŽćŽ»ăźæ—„ä»˜ă§Scrapboxă«ă‚€ăƒłăƒăƒŒăƒˆ using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.