Sentences written in a lighthearted manner

Ah, well, people who are not used to making lots of stickies and doing the KJ method don’t have a good sense of how granular the information should be when they write the stickies in the first place. That’s where the software needs to help. This is not the proper granularity.

  • [image] Divide and conquer by experienced personnel
  • [image]
  • [image]
    • This is a split-and-delete-only pattern
  • With rewriting, it looks like this:
    • [image]

  • v1: extraction of dependency-connected ranges

    • It worked for a while, but not well.
    • Rather than using the results of dependency analysis after the fact, we thought the dependency-analysis process itself needed some devising.
    • While trying to implement the coreference analysis itself, I realized that coreference analysis was not needed in the first place.
  • v2: 4-level decomposition algorithm

    • consideration
      • Regarding “words that appear far apart but are connected by dependency”:
        • Whether they are connected by dependency or not, they can be separated onto different stickies precisely because they appear far apart!
      • Split at punctuation; no length-dependent splitting, just split at punctuation.
      • Better yet: split at punctuation, and if a piece is still too long, chop it further at conjunctive particles.
    • structure
      • Split at punctuation, parentheses, etc.
      • If a piece exceeds the length threshold, split it at conjunctive particles (接続助詞).
      • If it still exceeds the threshold, split it at binding particles (係助詞).
      • If it still exceeds the threshold, split it at case particles (格助詞).
    • I was considering machine learning over the surrounding features to decide where to split, but after inspecting the features by eye, I felt this rule-based approach was the way to go.
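The four-level cascade above can be sketched roughly as follows. This is a minimal string-based illustration, not the author's implementation: the particle lists and the length threshold are assumed for the example, and a real version would split on morphological-analyzer output (e.g. MeCab tokens) rather than raw substrings.

```python
import re

# Illustrative particle lists; assumed for this sketch, not taken from the note.
CONJUNCTIVE = ["ので", "けど", "ながら"]  # 接続助詞 (level 2)
BINDING = ["は", "も"]                    # 係助詞 (level 3)
CASE = ["を", "に", "で"]                 # 格助詞 (level 4)

MAX_LEN = 20  # length threshold in characters; an assumed value


def split_on(text, seps):
    """Split after each separator occurrence, keeping the separator attached."""
    pattern = "(" + "|".join(map(re.escape, seps)) + ")"
    parts, buf = [], ""
    for piece in re.split(pattern, text):
        buf += piece
        if piece in seps:  # captured separators close the current part
            parts.append(buf)
            buf = ""
    if buf:
        parts.append(buf)
    return parts


def decompose(text):
    """4-level cascade: punctuation -> conjunctive -> binding -> case particles."""
    results = []
    for chunk in split_on(text, ["。", "、", "（", "）"]):  # level 1
        pieces = [chunk]
        for level in (CONJUNCTIVE, BINDING, CASE):  # levels 2-4
            next_pieces = []
            for p in pieces:
                if len(p) > MAX_LEN:  # only split further if still too long
                    next_pieces.extend(split_on(p, level))
                else:
                    next_pieces.append(p)
            pieces = next_pieces
        results.extend(p for p in pieces if p.strip("。、（）"))
    return results
```

Each level only fires on pieces that still exceed the threshold, which matches the "split by punctuation, then chop further only if too long" consideration above.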
  • v3: Algorithm to split recursively from the one with the highest split priority score

    • Observations and discussion of the v2 results
      • The problem that phrases like “出てくる” (“come out”) and “使っている” (“is using”) get split at the particle “te”, leaving the lemmatized remainders “kuru” and “iru” on their own.
      • The same word should get a different split priority depending on the context.
      • So: adjust the score according to the situation, and split from the highest score down.
      • The scores are currently tuned by hand-written rules.
        • This will eventually break down.
        • A human cannot keep every new decision consistent with all the past ones.
        • Move to machine learning?
        • For now, I am collecting examples of cases that are not split well.
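The v3 idea, recursively splitting at the boundary with the highest priority score until every chunk fits, can be sketched like this. The tokenization, the scores, and the threshold are illustrative assumptions, not the author's actual values:

```python
MAX_LEN = 10  # length threshold in characters; an assumed value


def recursive_split(tokens, scores):
    """tokens: list of strings; scores[i] = priority of splitting after tokens[i].
    Recursively split at the highest-priority boundary until each chunk
    fits within MAX_LEN characters."""
    text_len = sum(len(t) for t in tokens)
    if text_len <= MAX_LEN or len(tokens) < 2:
        return ["".join(tokens)]
    # choose the boundary with the highest split-priority score
    # (the position after the last token is not a boundary)
    best = max(range(len(tokens) - 1), key=lambda i: scores[i])
    if scores[best] <= 0:  # no acceptable split point left
        return ["".join(tokens)]
    left = recursive_split(tokens[:best + 1], scores[:best + 1])
    right = recursive_split(tokens[best + 1:], scores[best + 1:])
    return left + right
```

In v3 the scores come from hand-tuned rules that look at the surrounding context; the sketch just takes them as given.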
  • v4: If we were to move to machine learning

    • What would be the best formulation?
    • 1: Perform binary classification of “split here or not” for each word, then split at the words with the highest “split score” until the length constraint is satisfied.
      • The score calculation of the current rule-based method is thereby replaced by a learned model.
    • 2: CRF? LSTM? Transformer?
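Formulation 1 can be sketched as per-boundary binary classification, where the learned score takes the place of the hand-tuned v3 score. The feature set and the tiny hand-rolled perceptron below are assumptions purely to show the framing; the note does not specify a model, and real features would come from morphological analysis:

```python
def features(tokens, i):
    """Features for the boundary after tokens[i] (illustrative only)."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    return {f"cur={tokens[i]}": 1.0, f"next={nxt}": 1.0}


def train_perceptron(examples, epochs=10):
    """examples: list of (feature_dict, label) with label in {0, 1}.
    Standard perceptron updates: adjust weights on misclassification."""
    w = {}
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(w.get(f, 0.0) * v for f, v in feats.items())
            pred = 1 if score > 0 else 0
            if pred != label:
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + (label - pred) * v
    return w


def split_score(w, tokens, i):
    """Learned replacement for the hand-tuned v3 split-priority score."""
    return sum(w.get(f, 0.0) * v for f, v in features(tokens, i).items())
```

Boundaries would then be taken from the highest learned score down, exactly as in v3, until the length constraint is met.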

Garbage Cleaning Algorithm

  • Not only splitting: unnecessary words also need to be removed.
  • E.g., “Oh, well…”
  • Currently, fillers are removed by longest match against a dictionary.
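Dictionary-based longest-match removal can be sketched as follows. The filler dictionary is an assumed example, since the note does not list the actual one:

```python
# Assumed filler dictionary; includes "あの" and "あのー" to show why
# longest-match order matters (otherwise "あのー" would leave a stray "ー").
FILLERS = ["あの", "あのー", "えーと", "まあ"]


def clean(text):
    """Remove filler words by longest match against the dictionary."""
    fillers = sorted(FILLERS, key=len, reverse=True)  # try longest first
    out, i = [], 0
    while i < len(text):
        for f in fillers:
            if text.startswith(f, i):  # filler found: skip it entirely
                i += len(f)
                break
        else:
            out.append(text[i])  # no filler here: keep the character
            i += 1
    return "".join(out)
```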

Sticky note detail image

Previously, images were not auto-upgraded to HTTPS by default, but this changed in Chrome M86. The auto-upgrade does not fall back to HTTP, so images served only over HTTP are no longer viewable.

API - Natural Language Processing on Heroku


2021-01-05 - Can’t we do that with Shift-Reduce Algorithm?

  • Last time, I took the approach of extracting a dependency-connected range and then trimming away the words I didn’t want.
  • Chopping the text into phrases works well; the phrases come out a little short, but you can strip the sentence-final particles and the like from each phrase and turn it into a sticky.
  • Another way to describe what we want to do:
    • Phrases chopped out of sentences are a bit too short, so I’d like to combine them as far as is acceptable.
    • I want to make that decision by looking at the length of the concatenated string.
    • There are many words and phrases that don’t need to appear in the deliverable, so I want to ignore them.
  • The Simple Way
    • If a span enclosed in parentheses is short enough, adopt it as-is.
    • Word sequences that commonly appear across multiple lines (apply RAKE)
      • This is useful, but not sufficient.
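The “combine phrases while the length is acceptable, and ignore throwaway words” idea reads like a shift-reduce pass over the phrase sequence. A minimal sketch, with the length limit and ignore list assumed for illustration:

```python
MAX_LEN = 12  # acceptable sticky length in characters; an assumed value
IGNORE = {"えーと", "まあ"}  # phrases that need not appear in the deliverable


def regroup(phrases):
    """Shift-reduce-style pass: shift each phrase in, reducing (merging)
    it into the previous chunk while the concatenation stays within MAX_LEN.
    Phrases in IGNORE are dropped outright."""
    chunks = []
    for p in phrases:
        if p in IGNORE:
            continue  # ignore throwaway words
        if chunks and len(chunks[-1]) + len(p) <= MAX_LEN:
            chunks[-1] += p   # reduce: merge with the previous chunk
        else:
            chunks.append(p)  # shift: start a new chunk
    return chunks
```

The decision of whether to merge is made purely from the length of the concatenated string, as described above.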

yure Ability to automatically chop long-form content into stickies

pRegroup


This page is auto-translated from /nishio/長文の付箋への分割支援 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.