voluntas I'm seriously wondering where full-text search is headed. I have a feeling that the current trend will shrink demand for it to almost nothing.
-
voluntas For our own documentation (Sphinx), of course, interactive search like ChatGPT is much better than full-text search. I can also see a future where we can generate a lot of sample code to learn from. It would also be nice to be able to provide reference links. In that case, I wonder if full-text search is necessary. I'm not so sure.
-
voluntas Search is not interactive, and calling it interactive would be a bit of a cheat.
-
voluntas Documentation is seriously important for closed source products like ours, so we have to think a lot about it.
tokoroten I think the only things for which holding the full text still has any meaning are legal documents, contracts, and receipts.
-
Other content can be generated as needed.
-
voluntas Ah, in my case, it is an "official document", so I think it is of great value in itself. Only the base information cannot be generated automatically.
-
voluntas Is this correct?
-
tokoroten I don't think it even needs to be an official document unless the official document has a legal basis.
-
Ultimately, I see a future where documentation is generated on demand from the language specification and numerous examples.
-
voluntas Ah, I see what you mean. However, in the case of my own products, there is a "range of what can be made public" (I don't want dynamic generation from the source), and I feel that humans should do their best in that area. It's like a borderline.
-
moriwaka It is important to write documentation that makes clear the distinction between behavior you can promise users you intend to maintain in the future, and implementation-dependent behavior that holds at present but may change in the future. …
-
voluntas I agree. I think it will be very important to have the authority to decide "what information to release, to whom, and to what extent".
-
nishio tokoroten's point of view is interesting, but it seems to me that it drifts from what V was concerned about in his first tweet. V's point of view is interesting too, so I'm inclined to ask him to elaborate.
moriwaka Considering old and new versions of the documentation: in the future, I think full-text search will be used as a backend from which a front-end AI obtains the necessary documents when handling queries such as "Since which version has the XX feature existed, and how has it changed?"
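(A minimal sketch of the idea above, under stated assumptions: a keyword index acts as the backend, and its hits are packed into a prompt for an AI front end. The release notes, version numbers, and the `build_prompt` helper are hypothetical, and no real LLM API is called.)

```python
import re
from collections import defaultdict

# Hypothetical release notes; versions and wording are made up for illustration.
DOCS = [
    {"version": "1.0", "text": "Initial release. Basic recording is supported."},
    {"version": "1.2", "text": "Added the XX feature as experimental."},
    {"version": "2.0", "text": "The XX feature is now stable and enabled by default."},
]

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of positions of the documents containing it."""
    index = defaultdict(set)
    for pos, doc in enumerate(docs):
        for token in tokenize(doc["text"]):
            index[token].add(pos)
    return index

def search(index, docs, keywords):
    """Return the documents that contain every keyword, in list (version) order."""
    postings = [index.get(t, set()) for k in keywords for t in tokenize(k)]
    if not postings:
        return []
    hits = set.intersection(*postings)
    return [docs[pos] for pos in sorted(hits)]

def build_prompt(question, hits):
    """Assemble retrieved passages into a prompt for a (hypothetical) LLM front end."""
    context = "\n".join(f"[v{d['version']}] {d['text']}" for d in hits)
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"

index = build_index(DOCS)
hits = search(index, DOCS, ["XX", "feature"])
prompt = build_prompt("Since which version has the XX feature existed?", hits)
print(prompt)
```

Because each passage carries its version label, the front end can answer "when did it appear, and how did it change" while still pointing back to the original text.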
johtani I guess by full-text search here you mean keyword search; as a UI, how that looks may well change in the future. It may or may not turn out to be easier to ask in natural language, so things may change. As for how the data is held, systems that search by vector rather than by inverted index are emerging, and I think that's what will be used.
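(To make the contrast between the two ways of holding data concrete, here is a toy sketch. The documents are invented, and the 2-dimensional vectors are hand-assigned stand-ins for what an embedding model would produce; they are assumptions, not output from any real model.)

```python
import math
from collections import defaultdict

# Invented example documents.
DOCS = {
    "d1": "how to install the server",
    "d2": "setup guide for the client",
    "d3": "billing and invoices",
}

# Inverted index: token -> set of ids of the documents containing that token.
inverted = defaultdict(set)
for doc_id, text in DOCS.items():
    for token in text.split():
        inverted[token].add(doc_id)

def keyword_search(token):
    """Look up the documents that literally contain the token."""
    return sorted(inverted.get(token, set()))

# Toy "embeddings", hand-assigned for illustration only.
# Axis 0 roughly means "installation/setup", axis 1 roughly means "payment".
VECTORS = {"d1": (0.9, 0.1), "d2": (0.8, 0.2), "d3": (0.1, 0.9)}

def cosine(a, b):
    """Cosine similarity between two 2-dimensional vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(query_vec, k=2):
    """Return the ids of the k documents whose vectors are closest to the query."""
    ranked = sorted(VECTORS, key=lambda d: cosine(query_vec, VECTORS[d]), reverse=True)
    return ranked[:k]

print(keyword_search("install"))   # only the document containing the word itself
print(vector_search((1.0, 0.0)))   # both "installation-like" documents
```

The keyword lookup misses the setup guide because the word "install" never appears in it, while the vector lookup ranks it second because its vector points in a similar direction.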
-
johtani And then there are searches for things that are not full text (is this off topic?). And I'm talking about "full-text search."
-
voluntas Will the full-text search mechanism itself be superseded? I think so. I think search itself is a way of looking up a "pointer to the original text".
nishio I put it together: "Where is full-text search headed?" My opinion: the LLM will do the full-text search (Toolformer).
-
nishio Humans create documents for LLMs to refer to. At first they will write them the same way as documents meant for humans to read, but eventually the knowledge that "LLMs read this format better" will accumulate, and they will start writing in that format.
-
nishio Readers can explicitly ask questions and get answers, or they can ask "show me a tutorial" and get a generated tutorial. If the information is designed well, a reader can say "I have experience with A, but not with B" and have a tutorial generated for them individually, or read what comes up and ask to "learn more" about the parts they don't understand.
dmikurube A feature being "documented" implies "you may use this". Conversely, a feature that is implemented but not meant to be used freely is "undocumented". If an undocumented feature gets documented without authorization, people will come to depend on it, and that becomes a problem later.
-
dmikurube Just because the code is implemented and works, there is still a bit of a gap between that and an official "feature you can use". If people look at the code and decide on their own that it's OK to use, it's going to be extremely difficult to maintain later on.
-
dmikurube I don't think the existence of the "original text" will ever go away, for that reason among others.
-
dmikurube It will quite naturally happen that full-text search over documents is replaced by queries against a language model, but I don't think the original document will disappear. If anything, it will go in the opposite direction: the queries themselves will be analyzed, and the documentation expanded based on the insight "Oh, so this is what people want to know."
-
nishio This pathway is important: a future where the LLM summarizes the queries readers are throwing at it and suggests "you should write more about ~"!
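(A rough sketch of the feedback loop described above. It swaps the LLM summary for plain term counting: it tallies query terms the documentation never mentions and proposes the most frequent ones as topics to write about. The query log, the documentation text, and the stopword list are all invented for illustration.)

```python
import re
from collections import Counter

# Invented documentation text and reader query log.
DOC_TEXT = "Install the server. Configure the server port and the timeout."
QUERY_LOG = [
    "how to configure webhook retries",
    "webhook timeout",
    "webhook retries limit",
    "install server",
]

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"how", "to"}

def tokens(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def suggest_topics(queries, doc_text, top_n=2):
    """Return the query terms that the documentation lacks, most frequent first."""
    covered = set(tokens(doc_text))
    missing = Counter()
    for query in queries:
        # dict.fromkeys removes repeats within one query while keeping word order.
        for term in dict.fromkeys(tokens(query)):
            if term not in covered and term not in STOPWORDS:
                missing[term] += 1
    return [term for term, _ in missing.most_common(top_n)]

for topic in suggest_topics(QUERY_LOG, DOC_TEXT):
    print(f"You should write more about: {topic}")
```

Counting is only a stand-in here; the point is that the query log, not the reader, tells the author where the documentation is thin.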
-
-
dmikurube If something works, there will certainly be people who start using it without permission, and who then get angry when its behavior changes, even though they were only using it without permission. Documentation is a line of defense for pushing back when people get angry like that!
hrjn Sometimes the original text is important, sometimes it's not, and if the original text isn't important, I feel it's fine for ChatGPT to simply answer the question nicely. On the other hand, in most cases the origin of information matters for judging its reliability, so I don't think the act of searching for documents will ever disappear.
-
hrjn For example, when I look at StackOverflow, I judge reliability to some extent by looking at the upvotes and comments. ChatGPT doesn't pick up on that kind of context, so I have to look at the original text.
-
hrjn At least at present, ChatGPT does not guarantee the quality of information, and if anything, the quality is on the low side.
-
To begin with, what counts as quality of information is a deep issue, and it is not easily solved, as evidenced by Google's preference for "How was it?" articles…
-
nishio If the LLM points to a link and a human then reads the original text, isn't it doubtful whether the "human" is doing an "exact-keyword-match full-text search" at all?
-
hrjn: Well, I kind of understand what you mean, but I think that "full-text search with exact keyword matches" hardly exists nowadays. Any full-text search engine does at least token-level normalization, and synonym definitions and query expansion are available by default.
-
People with a bit more budget have been doing things like vector search over distributed representations of documents since before LLMs, so nothing has changed.
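(A small sketch of why "exact keyword match" undersells what full-text engines do: the token-level normalization and synonym-based query expansion mentioned above, in miniature. The synonym table and the crude plural stripping are assumptions for illustration, not how any particular engine implements them.)

```python
import re

# Hypothetical synonym table, keyed by normalized token.
SYNONYMS = {
    "doc": {"document", "documentation"},
    "search": {"lookup", "find"},
}

def normalize(text):
    """Lowercase, split on non-alphanumerics, and strip a trailing plural 's'.

    The plural stripping is a crude stand-in for real stemming.
    """
    found = re.findall(r"[a-z0-9]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in found]

def expand(query_tokens):
    """Turn each query token into the set of terms accepted in its place."""
    return [{t} | SYNONYMS.get(t, set()) for t in query_tokens]

def matches(query, document):
    """True if every query token, or one of its synonyms, occurs in the document."""
    doc_tokens = set(normalize(document))
    return all(group & doc_tokens for group in expand(normalize(query)))

document = "Full-text LOOKUP over our documentation"
print("search docs" in document.lower())   # naive substring match
print(matches("Search docs", document))     # normalized and expanded match
```

The naive substring test fails because neither "search" nor "docs" appears literally, while the normalized and expanded query matches through "lookup" and "documentation".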
voluntas I may not have stated my premise well. It's not about ChatGPT specifically, but rather "isn't full-text search over a particular set of documents switching to interactive search?" I think making it interactive is a good idea. Whether that is ChatGPT or something else, I'm not sure.
hrjn: I feel that the original text may or may not be important, and if the original text is not important, then it's fine for ChatGPT to simply answer the question nicely.
-
hrjn I think it would certainly change in terms of searching within a document.
-
I've been using Edge lately to find information within pages by asking things like "what's the conclusion?" or "any specific examples?", and I think that is exactly the kind of thing I want to know.
-
hrjn This also occasionally lies, but I think it's helpful: in the worst case you can read the page and find the answer yourself, and it's much easier to confirm once you know what to look for.
This page is auto-translated from /nishio/全文検索がどこに向かうのか using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.