2019-03-21
One objection: auto-bracketing is a bad idea, because if you generate lots of low-quality links, the suggestions will be nothing but garbage and you end up "turning your house into a garbage dump." It is true that extracting keywords from every page and turning them all into links would be a disaster. But that is a problem of "low quality," not of automation. So what is the quality of a link?
Consider a bad example
- For example, let's say each page of a book is a Scrapbox page.
- If a certain keyword X appears 100 times, it's not appropriate to turn every occurrence into a link.
- Seeing a hundred links, you just think, "Wow, that's a lot of links."
- In that case it would be better to present only the information "You will find that keyword in book A."
- This falls under Examples of unexpected discoveries in searches.
- Being unexpectedly told that a keyword appears in a certain book feels valuable.
- You can then open the book and search again to find the specific page it is on.
- There is no straightforward way to achieve this in Scrapbox.
- For example, if each page carried a "book ID (name or ISBN)," you could combine the ID with a set of keywords to narrow the search to a single book.
- To achieve this with a link, we would need a link that expresses "the occurrence of keyword X in book A."
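The "keyword X is in book A" idea can be sketched as an index keyed by book rather than by page, plus a second step that narrows the search to one book. This is a minimal sketch, assuming a hypothetical data model (each page is a dict with `title`, `book_id`, and a `keywords` set); `build_book_index` and `search_in_book` are illustrative names, not Scrapbox APIs.

```python
from collections import defaultdict

# Hypothetical data model (not a Scrapbox API): one Scrapbox page per book page,
# each tagged with a book ID (name or ISBN) and the keywords found on it.
pages = [
    {"title": "A-p1", "book_id": "BookA", "keywords": {"X", "Y"}},
    {"title": "A-p2", "book_id": "BookA", "keywords": {"X"}},
    {"title": "B-p1", "book_id": "BookB", "keywords": {"Y"}},
]

def build_book_index(pages):
    """Map each keyword to the set of books it appears in.

    A keyword that occurs on 100 pages of one book yields a single entry
    ("keyword X is in book A") instead of 100 links.
    """
    index = defaultdict(set)
    for page in pages:
        for keyword in page["keywords"]:
            index[keyword].add(page["book_id"])
    return index

def search_in_book(pages, book_id, keywords):
    """Second step: narrow the search to one book and return matching page titles."""
    wanted = set(keywords)
    return [p["title"] for p in pages
            if p["book_id"] == book_id and wanted <= p["keywords"]]

index = build_book_index(pages)
print(sorted(index["X"]))                     # ['BookA']
print(search_in_book(pages, "BookA", ["X"]))  # ['A-p1', 'A-p2']
```

The reader first sees only "X is in BookA" and pays for the page-level detail only when they choose to drill down.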
Narrow links are useful
- In other words, "links that appear infrequently in other text" are useful.
- On the other hand, the pages I link to most in my Scrapbox are KJ method and Engineer's Intellectual Production Method.
- Words related to the topics I am interested in naturally appear frequently.
- Hence the dilemma: simply choosing words that appear infrequently means choosing words that are "uninteresting."
- Similar to the concept of IDF
- If all links were equivalent, their usefulness would simply be a function of IDF.
- In reality, links are not equivalent.
- Two keywords that each appear "five times" can still differ in usefulness.
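For reference, the standard IDF of a keyword is log(N / df), where N is the number of pages and df is the number of pages containing the keyword. The sketch below assumes a toy corpus (a dict of page title to keywords, invented for illustration). It also shows the limitation noted above: keywords with the same df get the same score no matter how their occurrences are distributed.

```python
import math
from collections import Counter

def document_frequency(pages):
    """Number of pages each keyword appears on (repeats within a page count once)."""
    df = Counter()
    for keywords in pages.values():
        df.update(set(keywords))
    return df

def idf(pages):
    """Standard IDF, log(N / df): rare keywords ("narrow links") score high."""
    n = len(pages)
    return {kw: math.log(n / count) for kw, count in document_frequency(pages).items()}

# Hypothetical toy corpus: page title -> keywords on that page.
pages = {
    "p1": ["KJ method", "card", "label"],
    "p2": ["KJ method", "card"],
    "p3": ["KJ method", "pooling"],
    "p4": ["KJ method", "label", "pooling"],
}
scores = idf(pages)
print(scores["KJ method"])  # 0.0: it is on every page, so it carries no information
```

Here "card", "label", and "pooling" each appear on two pages and all receive log(2), which is exactly the "all links are equivalent" assumption.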
- Links between distant objects are beneficial.
- So pages can be "near" to or "far" from each other.
- How is this defined?
- What is Page proximity?
- Pages belonging to the same chapter are close.
- This requires giving the chapter structure as metadata.
- Pages belonging to books by the same author are close.
- Book > Chapter > Section … a hierarchical structure
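If every page carries its position in the hierarchy as a path, one simple way to define "near" and "far" is how many levels you must climb to reach a shared ancestor. This is a minimal sketch, assuming hypothetical paths of the form (author, book, chapter, section, page); `hierarchy_distance` is an illustrative name, and other distance definitions are equally possible.

```python
def shared_depth(path_a, path_b):
    """How many levels, counted from the top, two pages have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth

def hierarchy_distance(path_a, path_b):
    """Levels to climb from the deeper page up to the lowest shared ancestor.
    0 means the same page; larger values mean farther apart."""
    return max(len(path_a), len(path_b)) - shared_depth(path_a, path_b)

# Hypothetical paths: (author, book, chapter, section, page)
p = ("Author1", "BookA", "ch1", "sec1", "p1")
others = [
    ("Author1", "BookA", "ch1", "sec1", "p2"),  # same section
    ("Author1", "BookA", "ch1", "sec2", "p1"),  # same chapter
    ("Author1", "BookA", "ch2", "sec1", "p1"),  # same book
    ("Author1", "BookB", "ch1", "sec1", "p1"),  # same author, other book
    ("Author2", "BookC", "ch1", "sec1", "p1"),  # unrelated
]
print([hierarchy_distance(p, q) for q in others])  # [1, 2, 3, 4, 5]
```

Under this definition, a link connecting pages with a large distance would be the kind of "link between distant objects" the note calls beneficial.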
- When a lower-level unit contains a keyword, every unit above it also contains that keyword.
- This feels like pooling.
- This seems like a good fit when the hierarchical structure is given.
- A keyword that appears five times in the same book differs depending on whether the five occurrences fall within one chapter or are spread across chapters.
- With pooling at the chapter level, the former counts as one occurrence and the latter as five.
- A high IDF at a higher tier is more valuable.
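Pooling can be sketched by truncating each occurrence's path to the chosen tier and counting distinct units, then computing IDF with those units playing the role of documents. This is a sketch under assumptions: paths are (book, chapter, section, page), the data is invented, and `pooled_counts` and `tier_idf` are hypothetical names. How to weigh IDFs from different tiers against each other is left open, as in the note.

```python
import math
from collections import defaultdict

# Tiers from top to bottom; a path is (book, chapter, section, page).
TIERS = ("book", "chapter", "section", "page")

def pooled_counts(occurrences, tier):
    """For each keyword, count the distinct units at `tier` that contain it.

    A unit contains a keyword if any page below it does, so five hits
    inside one chapter pool to a single count at the chapter tier.
    """
    depth = TIERS.index(tier) + 1
    units = defaultdict(set)
    for keyword, path in occurrences:
        units[keyword].add(path[:depth])
    return {kw: len(u) for kw, u in units.items()}

def tier_idf(occurrences, tier, all_paths):
    """IDF at one tier; all_paths should list every page, not just keyword hits."""
    depth = TIERS.index(tier) + 1
    n_units = len({path[:depth] for path in all_paths})
    return {kw: math.log(n_units / c)
            for kw, c in pooled_counts(occurrences, tier).items()}

# Hypothetical data: X appears on five pages of one chapter,
# Y appears once in each of five chapters of the same book.
occurrences = ([("X", ("BookA", "ch1", "s1", f"p{i}")) for i in range(5)]
               + [("Y", ("BookA", f"ch{i}", "s1", "p0")) for i in range(1, 6)])
all_paths = [path for _, path in occurrences]

print(pooled_counts(occurrences, "page"))     # {'X': 5, 'Y': 5}
print(pooled_counts(occurrences, "chapter"))  # {'X': 1, 'Y': 5}
```

At the page tier X and Y look identical; at the chapter tier the "former one, latter five" distinction appears.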
- What if the hierarchical structure is not given?
- If we assume a book, adjacent pages are close.
- Pages with similar content (= many keywords in common) are close.
- A hierarchical structure can be built with Agglomerative Hierarchical Clustering.
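One way to build the hierarchy when none is given: measure page similarity as the Jaccard overlap of keyword sets ("many keywords in common"), then repeatedly merge the two most similar clusters. This is a minimal sketch using average linkage; the choice of Jaccard similarity and average linkage is an assumption, and the pages are invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of keywords two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def average_linkage(c1, c2, pages):
    """Mean pairwise Jaccard similarity between the pages of two clusters."""
    sims = [jaccard(pages[x], pages[y]) for x in c1 for y in c2]
    return sum(sims) / len(sims)

def agglomerative(pages):
    """Bottom-up clustering: repeatedly merge the two most similar clusters.

    pages: dict of page title -> set of keywords.
    Returns the merge history [(cluster_a, cluster_b, similarity), ...],
    which encodes a hierarchy (a dendrogram) over the pages.
    """
    clusters = [frozenset([title]) for title in pages]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_linkage(pair[0], pair[1], pages))
        merges.append((c1, c2, average_linkage(c1, c2, pages)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return merges

# Hypothetical pages: title -> keywords
pages = {
    "p1": {"KJ method", "card", "grouping"},
    "p2": {"KJ method", "card", "label"},
    "p3": {"IDF", "TF", "search"},
    "p4": {"IDF", "search", "ranking"},
}
for a, b, sim in agglomerative(pages):
    print(sorted(a | b), round(sim, 2))
```

This naive loop recomputes every pairwise linkage on each merge, which is fine for a sketch but slow for a large project; `scipy.cluster.hierarchy.linkage` with `method="average"` is the usual library route.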
- Perhaps [[Proximity is determined by search]]?
- Related to [[nodal point of thought 2019-02-18#5c6a9ccfaff09e00004ee473|5c6a9ccfaff09e00004ee473]]
- Bonus if the link already exists in Scrapbox.
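The bonus could be folded into a candidate score, for example IDF plus a fixed bonus when a page with that title already exists. This is only a sketch: the additive form and the weight `EXISTING_PAGE_BONUS = 1.0` are hypothetical choices, and `link_score` / `rank_candidates` are illustrative names.

```python
import math

# Hypothetical weight: how much an already-existing Scrapbox page boosts a candidate.
EXISTING_PAGE_BONUS = 1.0

def link_score(keyword, df, n_pages, existing_titles, bonus=EXISTING_PAGE_BONUS):
    """IDF of the keyword, plus a bonus if a page with that title already exists."""
    score = math.log(n_pages / df[keyword])
    if keyword in existing_titles:
        score += bonus
    return score

def rank_candidates(df, n_pages, existing_titles):
    """Candidate keywords sorted from most to least worth bracketing."""
    return sorted(df, key=lambda kw: link_score(kw, df, n_pages, existing_titles),
                  reverse=True)

df = {"KJ method": 10, "pooling": 10, "the": 100}  # pages containing each keyword
existing = {"KJ method"}                           # titles that already exist
print(rank_candidates(df, 100, existing))  # ['KJ method', 'pooling', 'the']
```

"KJ method" and "pooling" have the same IDF here; the existing page breaks the tie.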
- I want it to be able to run incrementally.
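Running incrementally mostly means keeping the statistics current as pages change instead of recomputing them from scratch. A minimal sketch, assuming IDF-style scoring over page-level keyword sets: when a page is added or edited, only the keywords that changed touch the document-frequency table. `IncrementalIndex` is a hypothetical helper, not a Scrapbox API.

```python
import math
from collections import Counter

class IncrementalIndex:
    """Hypothetical helper: keeps document frequencies current as pages change,
    so IDF-based link scores can be refreshed without re-scanning every page."""

    def __init__(self):
        self.pages = {}      # page title -> set of keywords
        self.df = Counter()  # keyword -> number of pages containing it

    def upsert(self, title, keywords):
        """Add or edit a page, touching only the keywords that changed."""
        new = set(keywords)
        old = self.pages.get(title, set())
        for kw in old - new:
            self.df[kw] -= 1
            if self.df[kw] == 0:
                del self.df[kw]
        for kw in new - old:
            self.df[kw] += 1
        self.pages[title] = new

    def remove(self, title):
        """Delete a page and withdraw its keywords from the counts."""
        if title in self.pages:
            self.upsert(title, ())
            del self.pages[title]

    def idf(self, keyword):
        """log(N / df); raises KeyError for keywords on no page."""
        count = self.df[keyword]
        if count == 0:
            raise KeyError(keyword)
        return math.log(len(self.pages) / count)

idx = IncrementalIndex()
idx.upsert("p1", {"X", "Y"})
idx.upsert("p2", {"X"})
idx.upsert("p1", {"X"})  # editing p1 drops Y
print(dict(idx.df))      # {'X': 2}
```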
- I hope it can be improved incrementally.
This page is auto-translated from /nishio/自動ブラケティング using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.