from Machine writes in Scrapbox
Objective.
- When I search Scrapbox, my past blog posts don't come up, and I think they should.
- They were automatically converted to Hatena Blog, but I'm not paying for it, so ads are shown.
- I feel bad when I link to them.
- I want to be able to bracket-link past articles as well.
2021-05-04 Imported Hatena Diary into Scrapbox with a script that sets past dates
2021-12-20
- Q: It's been six months.
- A: Very good. Six months after importing my past blog posts into Scrapbox, it is working out very well.
In the past (2018) I had even written a "script to convert the exported XML to JSON in Scrapbox format" (see Importing Hatena Diary into Scrapbox).
- Why haven't you imported it into this project?
- The formatting is not fully supported.
- I didn't want the top page to be filled with machine-generated articles.
- → I noticed that if you set the update date in the past, the top page won't be filled.
Format
- It looks like the entity reference `&gt;` is erased when I read the XML with bs4.
- I didn't think that was possible, but something was getting lost and I'm not sure how to solve it.
- It's not a big deal, so I parsed it on my own.
- No conversion from Hatena notation to Scrapbox notation
- I put it all together and put it in Scrapbox code notation.
- If it shows up in search and you can read it, that's all that matters.
- Do `html.unescape`
- I was going to use the blog post title as the page title, but decided it wasn't necessary.
- All titles are machine-generated.
- Easy to remove mechanically if I decide after the import that it's not good enough.
- Q: Wouldn't it be better to make a backup before importing?
- A: If I want to undo the import after it has been in place for a while and I have edited various pages, I can't restore from a backup.
- Safe and secure overwriting
- It's sad when the script is updated, the pages are reconverted and overwritten, and human-written text is lost.
- This anxiety prevents me from "keeping it updated".
- Don't edit the machine-generated page; when you want to change something, create a separate page with a good title, even if it duplicates content.
- The creation date and time are correctly set to three years ago.
- But, well, it might have been enough to just achieve "not on the top page" without matching the exact date and time of creation.
```python
import datetime
def timestamp(*args):
    # Arguments are year, month, day, ... as strings or ints; returns Unix seconds (local time).
    return datetime.datetime(*map(int, args)).timestamp()
```
- 2021-06-29 PS: I still prefer the current "date and time the article was actually written" because I'm comfortable with the search results being in chronological order.
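The format decisions above (unescape the body, wrap it in Scrapbox code notation, use a machine-generated title, set the timestamps to the article's date) can be sketched roughly as follows. This is only an illustration, not the actual script: the `make_page` helper, the `Hatena<date>-<index>` title scheme, and the `code:hatena` label are hypothetical, and the `title` / `created` / `updated` / `lines` field names are my assumption about the import JSON.

```python
import datetime
import html
import json


def timestamp(*args):
    """Unix seconds (local time) for year, month, day, ... given as strings or ints."""
    return datetime.datetime(*map(int, args)).timestamp()


def make_page(date, index, body):
    """Build one page dict for the import JSON (field names are assumptions)."""
    year, month, day = date.split("-")
    ts = int(timestamp(year, month, day))
    # Hypothetical machine-generated title: easy to find and remove mechanically.
    title = f"Hatena{date}-{index}"
    # Restore entity references such as &gt; that are still escaped in the export.
    text = html.unescape(body)
    # No Hatena-to-Scrapbox conversion: put the whole body into one code block,
    # i.e. a "code:" line followed by lines indented by one space.
    lines = [title, "code:hatena"] + [" " + line for line in text.splitlines()]
    # A past "updated" value is what keeps the page off the top of the project.
    return {"title": title, "created": ts, "updated": ts, "lines": lines}


page = make_page("2018-05-04", 1, "a &gt; b\n* Hatena heading")
print(json.dumps({"pages": [page]}, ensure_ascii=False, indent=1))
```

Because the title is derived only from the date and an index, running the script again produces the same titles, which is what makes both mechanical removal and overwriting possible.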
Search
I made a heading page but deleted it because I knew I would never use it.
- A plain list with no character is no good.
scale
- 120,000 lines of XML
- 220,000 lines of JSON
- 1500 pages
Overwriting
- Plan: a page can be overwritten if its update date/time has not changed.
- Plan: check for conflicts on import.
- Actual: pages can always be overwritten.
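The planned (not actual) conflict check could look something like the sketch below. It assumes that editing a page by hand changes its `updated` value, that the importer always writes the same `updated` value for the same article, and that a list of the project's current pages is available (for example from an export). The function names and the dict fields are hypothetical.

```python
def can_overwrite(existing, generated_updated):
    """True if the page is missing or still carries the timestamp the importer wrote."""
    if existing is None:
        return True
    return existing["updated"] == generated_updated


def split_import(new_pages, current_pages):
    """Separate pages that are safe to overwrite from pages a human has touched."""
    current = {p["title"]: p for p in current_pages}
    safe, conflicts = [], []
    for page in new_pages:
        old = current.get(page["title"])
        if can_overwrite(old, page["updated"]):
            safe.append(page)
        else:
            conflicts.append(page)
    return safe, conflicts


new_pages = [
    {"title": "a", "updated": 100},
    {"title": "b", "updated": 200},
]
current_pages = [
    {"title": "a", "updated": 100},  # untouched since import
    {"title": "b", "updated": 999},  # edited by a human after import
]
safe, conflicts = split_import(new_pages, current_pages)
print([p["title"] for p in safe], [p["title"] for p in conflicts])
```

With the rule of never editing machine-generated pages by hand, the conflict list should stay empty, which is presumably why always overwriting is acceptable in practice.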
Try it on an empty project
- You have to be an admin.
Could a bot's joining of the project be scripted as well? The request looks like this:
Request URL: "https://scrapbox.io/api/projects/{project}/invitations/{key}"
POST
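As a starting point for experimenting, the request above can be constructed like this. It is only a sketch: the request is built but not sent, the `build_invitation_request` helper is hypothetical, and whatever authentication the endpoint needs (session cookie, CSRF token, and so on) is not shown because I have not confirmed it.

```python
import urllib.parse
import urllib.request

URL_TEMPLATE = "https://scrapbox.io/api/projects/{project}/invitations/{key}"


def build_invitation_request(project, key):
    """Fill in the URL template and return an unsent POST request object."""
    url = URL_TEMPLATE.format(
        project=urllib.parse.quote(project, safe=""),
        key=urllib.parse.quote(key, safe=""),
    )
    return urllib.request.Request(url, method="POST")


# "my-project" and "abc123" are placeholder values, not real identifiers.
req = build_invitation_request("my-project", "abc123")
print(req.get_method(), req.full_url)
```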
This page is auto-translated from /nishio/ăŻăŠăȘăă€ăąăȘăŒăéć»ăźæ„ä»ă§Scrapboxă«ă€ăłăăŒă using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. Iâm very happy to spread my thought to non-Japanese readers.