Sentences written in a lighthearted manner

Ah, well, people who are not used to the process of making lots of stickies and doing the KJ method don’t have a good idea of how granular the information should be when they make the stickies in the first place. That’s where the software needs to help. This is not the proper granularity.

  • image: Splitting done by an experienced person
  • image
  • image
    • This pattern uses only splitting and deletion.
  • With rewriting, it looks like this:
    • image

  • v1: dependency range extraction

    • It worked to some extent, but not well.
    • Rather than using the output of dependency parsing, I thought it was necessary to adapt the parsing process itself.
    • While trying to implement dependency parsing myself, I realized that dependency parsing is not needed in the first place.
  • v2: 4-level decomposition algorithm

    • consideration
      • For “words that appear far apart but are linked by a dependency”:
        • Whether or not they are linked by a dependency, they can go on separate stickies precisely because they appear far apart!
      • Split on punctuation first, without any length-dependent splitting.
      • This works well: split on punctuation, and if a piece is still too long, split it further at conjunctive particles.
    • structure
      • Split by punctuation, parentheses, etc.
      • If a piece is longer than the threshold, split it at conjunctive particles.
      • If it is still longer than the threshold, split it at binding particles.
      • If it is still longer than the threshold, split it at case particles.
    • I was thinking of using machine learning on the surrounding features to decide where to split, but when I looked at the features by eye, this rule-based approach seemed to be the way to go.
  • v3: Algorithm that splits recursively, starting from the boundary with the highest split-priority score

    • Observe and discuss v2 results
      • Problem: phrases such as “come out” and “use it” are split at the particle “te”, and the remainder is reduced to its base form, leaving “kuru” and “iru”.
      • The same word needs a different split priority depending on context.
      • Change the score depending on the situation and split from the highest score to the lowest.
      • Scores are currently adjusted by hand-written rules.
        • This will eventually break down.
        • Humans cannot keep new decisions consistent with past ones.
        • Move to machine learning?
        • I’m collecting examples of cases that were not split well.
  • v4: If we were to move to machine learning

    • What is the best way to do it?
    • 1: Run a binary “split or don’t split” classification for each word, then split from the word with the highest “split” score until the length constraint is satisfied.
      • Only the score calculation of the current score-based method is replaced by machine learning.
    • 2: CRF? LSTM? Transformer?
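
The v2 to v4 ideas above can be sketched in one small function. This is a minimal sketch, not the actual implementation: it assumes the text is already tokenized into (surface, tag) pairs, and the tag names, the `PRIORITY` table, and the function names are all hypothetical. The fixed table mirrors the four v2 levels; passing a different `score` function is where a context-dependent rule (v3) or a classifier's "split" score (v4, option 1) could be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]  # (surface, tag); the tag names are hypothetical

# Assumed fixed priorities mirroring the v2 levels (higher = split earlier).
PRIORITY = {
    "punct": 4,
    "conjunctive_particle": 3,
    "binding_particle": 2,
    "case_particle": 1,
}


def table_score(tokens: Sequence[Token], i: int) -> float:
    """Score for splitting right after tokens[i]; 0 means never split here."""
    return PRIORITY.get(tokens[i][1], 0)


def surface(tokens: Sequence[Token]) -> str:
    """Join tokens into sticky text, leaving punctuation out."""
    return "".join(s for s, tag in tokens if tag != "punct")


def split_recursively(
    tokens: Sequence[Token],
    max_len: int,
    score: Callable[[Sequence[Token], int], float] = table_score,
) -> List[str]:
    """Split at the highest-scoring boundary until every piece fits max_len."""
    text = surface(tokens)
    if len(text) <= max_len:
        return [text] if text else []
    best_i, best = None, 0.0
    for i in range(len(tokens) - 1):  # never split after the last token
        s = score(tokens, i)
        if s > best:  # ties go to the leftmost boundary
            best_i, best = i, s
    if best_i is None:  # no usable boundary: keep the long piece as is
        return [text]
    return (split_recursively(tokens[: best_i + 1], max_len, score)
            + split_recursively(tokens[best_i + 1:], max_len, score))


# Hand-tagged sample sentence (the tags are assigned by hand for illustration).
SAMPLE = [
    ("雨", "noun"), ("が", "case_particle"), ("降った", "verb"),
    ("ので", "conjunctive_particle"), ("、", "punct"),
    ("家", "noun"), ("で", "case_particle"), ("本", "noun"),
    ("を", "case_particle"), ("読んで", "verb"), ("、", "punct"),
    ("早く", "adverb"), ("寝た", "verb"), ("。", "punct"),
]

for sticky in split_recursively(SAMPLE, max_len=10):
    print(sticky)
```

With a generous `max_len` only the punctuation boundaries are used; lowering it makes the function fall through to the particle levels, which is the behaviour the v2 structure describes.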

Garbage Cleaning Algorithm

  • It not only splits the text but also removes unnecessary words.
  • E.g., “Ah, well…”
  • Currently, it removes them by dictionary-based longest match.
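
A minimal sketch of dictionary-based longest-match removal follows. The filler dictionary `FILLERS` and the function name are assumptions made for illustration; the real dictionary is not given in these notes. At each position the longest dictionary entry is tried first, so “えーと” is removed as a whole rather than as “えー” plus a leftover “と”.

```python
from typing import Iterable

# Hypothetical filler dictionary; the actual entries are not in the notes.
FILLERS = {"あー", "まあ", "えー", "えーと", "なんか"}


def remove_fillers(text: str, dictionary: Iterable[str] = FILLERS) -> str:
    """Delete dictionary entries from text, preferring the longest match."""
    entries = set(dictionary)
    longest = max((len(e) for e in entries), default=0)
    kept = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + n] in entries:
                i += n  # drop the matched filler
                break
        else:
            kept.append(text[i])  # no filler starts here: keep the character
            i += 1
    # Trim separators left dangling at the edges by the removal.
    return "".join(kept).strip("、 ")


print(remove_fillers("えーと、まあ分割する"))
```

Because this matches raw substrings, it would also delete a filler string that happens to occur inside a longer word; matching on token boundaries would avoid that at the cost of needing a tokenizer.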

Sticky note detail image

Images were previously not auto-upgraded by default, but this has changed since M86. The auto-upgrade does not fall back to HTTP, so images served only via HTTP will not be viewable.

API - Natural Language Processing on Heroku


2021-01-05 - Can’t we do that with a Shift-Reduce algorithm?

  • Last time, I took the dependency-range approach: extract the dependency range, then trim the words I didn’t want.
  • Chopping the text into phrases is fine as far as it goes; the pieces are a little short, but you can remove the sentence-final particle, etc., from a phrase and make it a sticky note.
  • Another way to describe what we want to do:
    • Phrases are a little too short, so I’d like to combine as many as is acceptable.
    • I want to make the decision by looking at the length of the concatenated string.
    • There are many words and phrases that don’t need to appear in the output, and I want to ignore them.
  • The Simple Way
    • If the text enclosed in parentheses is short enough, adopt it as is.
    • Sequences of words that appear in common across multiple lines (apply RAKE)
      • This is useful, but not enough

yure Ability to automatically inscribe long-form content on stickies

pRegroup


This page is auto-translated from /nishio/長文の付箋への分割支援 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.