from Diary 2023-12-04 Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine https://arxiv.org/abs/2311.16452
y_matsuwitter I see a lot of discourse about how it's good to build small, specialized LLMs, but research shows that GPT-4 outperforms the specialized models, perhaps because of its reasoning ability. It is not necessarily correct to say that a model is strong because it is specialized for Japanese or for medicine. Of course, a specialized model lowers computational cost and is not subject to the operator's rules and biases, so it may be better in those respects.
Article Summary General-purpose foundation models such as GPT-4 have shown remarkable ability on a wide variety of tasks without domain-specific training. However, the general consensus has been that domain-specific competence requires fine-tuning a model on specialized knowledge. In this work, we explore prompt engineering with GPT-4 on medically relevant benchmarks and show that it can demonstrate superior specialist competence without domain-specific fine-tuning.
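To make the idea concrete: the paper's claim rests on prompt-engineering techniques rather than fine-tuning. Below is a minimal sketch of one such technique, a choice-shuffling ensemble with majority voting for a multiple-choice medical question. This is an illustration, not the authors' exact pipeline, and `ask_llm` is a hypothetical placeholder for an actual GPT-4 call.

```python
import random
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 call; should return the letter of the chosen option."""
    raise NotImplementedError("replace with an actual model call")

def choice_shuffle_ensemble(question: str, options: list[str], n_votes: int = 5) -> str:
    """Ask the same multiple-choice question several times with the options shuffled,
    then majority-vote on the underlying option text (not on the letter)."""
    votes = Counter()
    for _ in range(n_votes):
        shuffled = random.sample(options, k=len(options))
        letters = "ABCDE"[: len(shuffled)]
        prompt = (
            f"Question: {question}\n"
            + "\n".join(f"{l}. {o}" for l, o in zip(letters, shuffled))
            + "\nAnswer with a single letter."
        )
        answer_letter = ask_llm(prompt).strip()[:1].upper()
        if answer_letter in letters:
            votes[shuffled[letters.index(answer_letter)]] += 1
    return votes.most_common(1)[0][0]
```

Shuffling the answer order reduces position bias in the model's choices, and voting over the option text (rather than the letter) keeps the answers comparable across shuffles.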
icoxfog417 Since there is no guarantee that GPT-4 has not already seen the medical literature used to train the specialized models (the Technical Report does not say), I don't think we can tell whether this conclusion applies to generalist models in general. In general, the larger the model, the more training data it needs, so I think there is a non-zero possibility that GPT-4 has already been trained on that data.
y_matsuwitter There are multiple points of contention here:
- Is GPT-4 more powerful than specialized models?
- Can a generalist model achieve high accuracy through its reasoning capability alone?
- Can a generalist model surpass a specialized model by combining its reasoning ability with domain knowledge acquired during training?
This paper addresses the first; the second is as you say; and if the third holds, it seems possible that large foundation models will come to gain high capability in every domain in the future.
odashi_t It's not a fair comparison to begin with, since the prompting methods engineered to make GPT-4 win were not also applied to Med-PaLM. I agree about contamination, but this is a problem that comes even before training-data contamination, and it makes it a bad paper.
This page is auto-translated from [/nishio/Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine](https://scrapbox.io/nishio/Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.