- I want to support splitting long sentences into reasonably sized sticky notes.
Sentences written in a lighthearted manner
Ah, well, people who are not used to making lots of stickies and doing the KJ method don't have a good sense of how granular the information should be when they write the stickies in the first place. That's where the software needs to help. This is not the proper granularity.
- Divide and conquer by experienced personnel
-
- This is a pattern that only splits and deletes
- With rewriting, it looks like this:
-
v1: interconnected range extraction
- It worked for a while, but not great.
- Rather than just using the output of dependency analysis as-is, we thought we needed to rework the dependency-analysis step itself.
- While trying to implement that dependency analysis myself, I realized there is no need for dependency analysis in the first place.
-
v2: 4-level decomposition algorithm
- consideration
- For "words that appear at a distance but are connected by dependency":
- Whether or not they are connected by dependency, they can go on separate stickies precisely because they appear at a distance!
- Rather than splitting based on length, just split at punctuation.
- Splitting at punctuation works well, and if a piece is still too long, chop it further at conjunctive particles.
- structure
- Split at punctuation, parentheses, etc.
- If a piece is still longer than the threshold, split it at a conjunctive particle.
- If it is still longer than the threshold, split it at a binding particle.
- If it is still longer than the threshold, split it at a case particle (a sketch of this cascade follows below).
- I was thinking of using machine learning on the surrounding features to decide where to split, but when I looked at the data by eye, I felt this rule-based approach was the way to go.
-
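Below is a minimal sketch of this four-level cascade in Python. The length threshold and the particle lists are placeholder assumptions (the page does not give the actual values), and it works on raw strings for brevity, whereas the real implementation presumably splits at particles identified by morphological analysis.

```python
import re

MAX_LEN = 30  # hypothetical length threshold in characters (not specified on this page)

# Hypothetical particle inventories; the real lists are not shown on this page.
CONJUNCTIVE_PARTICLES = ["ので", "から", "けど"]  # conjunctive particles
BINDING_PARTICLES = ["は", "も"]                   # binding particles
CASE_PARTICLES = ["を", "に", "で"]                 # case particles

def split_after(text, seps):
    """Split text right after any of the separator strings, keeping the separator."""
    pattern = "(" + "|".join(map(re.escape, seps)) + ")"
    pieces, buf = [], ""
    for part in re.split(pattern, text):
        buf += part
        if part in seps:
            pieces.append(buf)
            buf = ""
    if buf:
        pieces.append(buf)
    return pieces

def cascade_split(text):
    """Level 1: punctuation and closing brackets; levels 2-4 only fire on pieces still over MAX_LEN."""
    pieces = [p for p in re.split(r"(?<=[。、．，！？）」])", text) if p]
    for seps in (CONJUNCTIVE_PARTICLES, BINDING_PARTICLES, CASE_PARTICLES):
        next_pieces = []
        for p in pieces:
            next_pieces.extend(split_after(p, seps) if len(p) > MAX_LEN else [p])
        pieces = next_pieces
    return pieces
```

Each later level only fires on pieces that are still over the threshold, which matches the "if the length is above the threshold" cascade above.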
v3: Algorithm that splits recursively, starting from the boundary with the highest split-priority score
- Observe and discuss v2 results
- The problem that phrases like "come out" and "am using" get split at the "te" particle, and the remaining part is converted to its base form, yielding "kuru" and "iru".
- The same word can deserve a different split priority depending on context.
- Adjust the score according to the situation, and split starting from the highest score (see the sketch after this list).
- Scores are currently adjusted by hand-written rules.
- This will eventually break down.
- Humans cannot keep their decisions consistent with the ones they made in the past.
- Moving to machine learning?
- I'm collecting examples of cases that have not been divided well.
-
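A sketch of the v3 idea, assuming the context-adjusted split-priority scores for each boundary are computed elsewhere (the scoring rules themselves are not shown on this page): recursively split at the highest-scoring boundary until every piece satisfies the length constraint.

```python
MAX_LEN = 30  # hypothetical length constraint in characters

def split_recursively(tokens, boundary_scores):
    """
    tokens: surface strings making up one candidate sticky note.
    boundary_scores[i]: split-priority score of the boundary after tokens[i].
    Split at the highest-scoring boundary first, then recurse into both halves
    until each piece is within MAX_LEN characters.
    """
    if sum(len(t) for t in tokens) <= MAX_LEN or len(tokens) < 2:
        return ["".join(tokens)]
    # pick the boundary with the highest split-priority score
    i = max(range(len(tokens) - 1), key=lambda k: boundary_scores[k])
    left = split_recursively(tokens[:i + 1], boundary_scores[:i])
    right = split_recursively(tokens[i + 1:], boundary_scores[i + 1:])
    return left + right
```

The "adjust the score according to the situation" step corresponds to how boundary_scores is filled in before calling this; v4 below asks whether that step should be learned instead of hand-tuned.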
v4: If we were to move to machine learning
- What is the best way to do it?
- 1: Do binary classification of "split here or not" for each word, then split at the boundaries with the highest "split scores" until the length constraint is satisfied.
- The score calculation of the current score-based method then simply becomes something learned by machine learning (see the sketch after this list).
- 2: CRF? LSTM? Transformer?
- Even if it is feasible, a system that is too heavy will be troublesome to operate.
- Machine Learning for Long Sticky Note Segmentation
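A hedged sketch of option 1, assuming the collected examples of good and bad split points are available as labelled data; the features and the choice of scikit-learn's LogisticRegression are illustrative assumptions, not what is actually used here.

```python
from sklearn.linear_model import LogisticRegression

def boundary_features(tokens, i):
    """Placeholder features for the boundary after tokens[i]; the real features are undecided."""
    return [
        len(tokens[i]),                            # length of the word before the boundary
        len(tokens[i + 1]),                        # length of the word after it
        1.0 if tokens[i].endswith("て") else 0.0,  # e.g. the problematic "te" case from v2
    ]

def train(examples):
    """examples: list of (tokens, i, label), where label is 1 if a human would split after tokens[i]."""
    X = [boundary_features(tokens, i) for tokens, i, _ in examples]
    y = [label for _, _, label in examples]
    model = LogisticRegression()
    model.fit(X, y)
    return model

def learned_boundary_scores(model, tokens):
    """Probabilities of "split here", usable as the scores in split_recursively above."""
    if len(tokens) < 2:
        return []
    X = [boundary_features(tokens, i) for i in range(len(tokens) - 1)]
    return model.predict_proba(X)[:, 1]
```

This keeps the overall splitting loop the same as v3; only the score comes from a classifier instead of hand-written rules, which is lighter to operate than the CRF/LSTM/Transformer route of option 2.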
Garbage Cleaning Algorithm
- Not only do they split, but they also remove unnecessary words.
- E.g., "Oh, well…"
- Currently this is done by longest match against the dictionary base forms (sketched below).
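A minimal sketch of that removal step, assuming a hand-maintained filler dictionary; the real implementation matches on dictionary (base) forms after morphological analysis, while this sketch matches surface strings for brevity.

```python
# Hypothetical filler dictionary; entries are tried longest-first.
FILLERS = ["ああ、そうですね", "えーと", "まあ", "あのー"]

def remove_fillers(text):
    """Greedy longest-match removal of filler expressions."""
    fillers = sorted(FILLERS, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for f in fillers:
            if text.startswith(f, i):
                i += len(f)  # skip (delete) the filler
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```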
Sticky note detail
API - Natural Language Processing on Heroku
2021-01-05 - Can't we do that with the Shift-Reduce Algorithm?
- Last time, I took the approach of extracting the interconnected range and then trimming away the words I didn't want.
- I threw the text into CaboCha for dependency parsing…
- It works well up to the point of chopping the text into phrase chunks (bunsetsu); each chunk is a bit short on its own, but you can strip the trailing particle etc. from a chunk and turn it into a sticky note.
- To describe what we want to do another way:
- A single phrase chunk is a little too short for a sticky, so I'd like to merge chunks as far as is acceptable.
- I want to decide by looking at the length of the concatenated string (see the sketch at the end of this section).
- There are many words and phrases that don't need to appear in the deliverable, and I want to ignore them.
- The Simple Way
- If a span enclosed in parentheses is short enough, adopt it as is.
- Sequence of words that commonly appear in multiple lines (apply RAKE)
- This is useful, but not enough
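A sketch of the chunk-merging idea in a shift-reduce flavor, assuming the phrase chunks (bunsetsu) have already been obtained from a dependency parser such as CaboCha and that the acceptable length is a placeholder value.

```python
MAX_LEN = 10  # hypothetical "acceptable" sticky-note length in characters

def merge_chunks(chunks):
    """
    chunks: bunsetsu strings in sentence order (e.g. taken from CaboCha output).
    Shift each chunk onto the current sticky note; close the note (reduce)
    when adding the next chunk would exceed the acceptable length.
    """
    notes, current = [], ""
    for chunk in chunks:
        if current and len(current) + len(chunk) > MAX_LEN:
            notes.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        notes.append(current)
    return notes

# Hypothetical example:
# merge_chunks(["長い文章を", "自動的に", "付箋に", "分割したい"])
# -> ["長い文章を自動的に", "付箋に分割したい"]
```

Stripping trailing particles from the last chunk of a note, as mentioned above, would be an extra pass on top of this.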
yure: Ability to automatically chop long-form content into stickies
This page is auto-translated from /nishio/é·ęć®ä»ē®ćøć®åå²ęÆę“ using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.