work memo
- clone [ppoi
- Renamed to
meanless_newline_ppoi
. -
Unjudged input must be placed in unknowns.txt before execution
- Suddenly the staircase is a lot bigger.
- This time, we have a situation that âWhen I convert PDF to text, it is divided into lines, and I want to merge them, but there are some that should be merged and some that should notâ.
- I want to âremove what looks like meaningless line breaks.â lines
['Part I. u3000 Introduction', '', 'Originally there was not a single word filosofiva [philosophia] even in Greece, but the word filosofoV = Weisheitsfreund appeared in Herakleitos [Herakleitos, ca. 540-ca. 480B.C.]. ', 'The word filosofoV = Weisheitsfreund first appeared in Herakleitos, ca. 540-ca. 480B.C.', 'It is said to have come from Pythagoras, but that is a mistake. By the way, the word that forms the source of today's Philosophie comes from Herodotus (484-ca. 425B.C.), the father of history, who used the word filosofei:n,'' which, as mentioned above, is a form of the verb "to seek wisdom," and is also used by the philosophers of today, ', '', '', '6', '', 'Then it was transformed into the noun form filosofiva ăæç„ă. The word "philosophia" was first used in the broad sense of "love of wisdom" or "pursuit of intellectual education," but through the Sophists and Socrates (Sokrates, 470/69-399B.C.) it gradually became limited to "study,''' '', "to study,''' and "to study. to study', '','' through the Sophists and Socrates [Sokrates, 470/69-399B.C.]. In the Sophists and Socrates, however, the word sophie', '', 'a' still included the ancient meaning of technical intelligence. It is not until Plato that it becomes wisdom in a purely spiritual sense. , '', 'Plato distinguished two kinds of knowledge. One is ejpisthvmh = Erkenntnis, knowledge, and the other is dovxa = Meinung, opinion. The former, or episteme, is true knowledge, true', 'knowledge of the real, Plato's so-called Idea. Idea is the eternal, immutable, true existence.', '', '']
py
import json
data = open("Introduction to Philosophy.txt").read()
data = data.split("\x0c")
# print(len(data))
unknowns = []
for page in data:
lines = ["SOP"] + page.split("\n") + ["EOP"]
for i in range(len(lines) - 3):
unknowns.append(json.dumps({
"i": i,
"neg_i": len(lines) - i - 4,
"lines": lines[i:i+4],
}, ensure_ascii=False))
with open("unknowns.txt", "w") as f:
f.write("\n".join(unknowns))
%run meanless_newline_ppoi/cli.py
ImportError: cannot import name 'clock' from 'time' (unknown location)
%run meanless_newline_ppoi/cli.py --initialize
Enter at least one positive examples (empty to exit)
{"i": 10, "neg_i": 15, "lines": ["", "What is philosophy? It is difficult to give a simple answer to this question. Philosophy has been around for a long time and predates the individual "," "special sciences. Philosophy has undergone many changes throughout history and will continue to develop in the future."]}
- Iâm having them enter it in text format, which is hard to do.
- Didnât we fix thisâŠ
FileNotFoundError: [Errno 2] No such file or directory: '/Users/nishio/scaffold_network/kitaro/meanless_newline_ppoi/ppoi/unknown.txt'
- You should check this first and warn them.
- Placement, with or without UNKNOWNS s :
%run meanless_newline_ppoi/cli.py --interactive
{"i": 0, "neg_i": 28, "lines": ["SOP", "One feels less pleasure when it becomes a habit, etc. Marshall's ("Pain," "Pleasure and Aesthetics," 1894 [Pain, Pleasure, and Aesthetics]) and other works are probably from this standpoint. From a psychological point of view, there is no other way to explain it. But the pleasure given by beauty and goodness is not merely a physical pleasure and a single "]}: 0.5001
negative(z), neutral(x), positive(c), quit(q)>
- Itâs working.
- Improved display
- Hey, I think I made it so that the user can customize how it is displayedâŠ
- ppoi improvement plan
-
âWhereâs the latest code?â so I pushed the current code anyway.
- Where.
- The last update before the repository weâre looking at now is 2018.
- Uh, itâs in the keichobot repository.
- 6018e017aff09e0000f60b4e
-
Letâs edit it rather than use it as it is.
- Oh, I see what you mean.
-
liquidation
- Active learning is convenient for cases in which a human being cannot clearly verbalize the conditional expression of an if statement at this time, but can determine true or false by looking at the case study.
- I wrote the code for active learning two or three times from scratch, and then I thought, âItâs inefficient to make it from scratch every time, so I want to put it all together,â so I created ppoi.
- But as a tool structure, the current ppoi is not appropriate.
- So when using it in keichobot, it was not flexible enough to âcopy and rewriteâ, and it was not reflected in the main repository.
2022-04-05
Not sure what the correct structure is, so copy and rewrite again this time Interactive learning has come around.
- Downsampling changed from 5 seconds to 1 second.
$ python3 -m ppoi.main --cross-val
Accuracy: 0.80 (+/- 0.18)
2022-04-06
In this case, we need the whole page, not each line, to create the features
- These things change from case to case, which is why itâs so difficult to design in generalâŠ
This page is auto-translated from /nishio/meanless_newline_ppoi using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. Iâm very happy to spread my thought to non-Japanese readers.