Likewise, a familiar trope in science fiction is the rogue AI system that attacks humans to protect itself. Hence, a suitably prompted dialogue agent will begin to role-play such an AI system. If the model has generalized well from the training data, the most plausible continuation will be a response to the user that conforms to the expectations we would have of someone who fits the description in the preamble. In other words, the dialogue agent will do its best to role-play the character of a dialogue agent as portrayed in the dialogue prompt. Even though neural networks solve the sparsity problem that plagued earlier n-gram language models, the context problem, making use of enough of the preceding text to predict the next word well, remains.

Language understanding models

However, in addition to discussing BERT-style masked language models (encoders) and GPT-style autoregressive language models (decoders), it also provides useful discussion and guidance on pretraining and finetuning data. The BERT paper above introduced the original concepts of masked-language modeling and next-sentence prediction. If you are interested in this research branch, I recommend following up with RoBERTa, which simplified the pretraining objectives by removing the next-sentence prediction task. Such massive amounts of text are fed into the model using unsupervised learning, in which a model is given a dataset without explicit instructions on what to do with it. Through this method, a large language model learns words, as well as the relationships between them and the concepts behind them.
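To make the two pretraining objectives concrete, here is a minimal sketch using the Hugging Face transformers library with the publicly available bert-base-uncased and gpt2 checkpoints (illustrative choices, not models specific to this article): the encoder fills in a masked token, while the decoder continues the sequence.

from transformers import pipeline

# BERT-style masked-language modeling (encoder): fill in the blanked-out token.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for guess in fill_mask("The capital of France is [MASK]."):
    print(guess["token_str"], round(guess["score"], 3))

# GPT-style autoregressive modeling (decoder): predict the next tokens one by one.
generate = pipeline("text-generation", model="gpt2")
print(generate("The capital of France is", max_new_tokens=5)[0]["generated_text"])

The masked-language objective sees context on both sides of the blank, which is why encoders like BERT are strong at understanding tasks, whereas the autoregressive objective only looks leftward, which is what makes decoders natural text generators.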


But the first version of GPT-3, released in 2020, got false-belief (theory-of-mind) tasks right almost 40 percent of the time, a level of performance Kosinski compares to a three-year-old. The latest version of GPT-3, released last November, improved this to around 90 percent, on par with a seven-year-old. After each layer, the Brown University researchers probed the model to observe its best guess at the next token. Between the 16th and 19th layers, the model started predicting that the next word would be Poland (not correct, but getting warmer). Then at the 20th layer, the top guess changed to Warsaw, the correct answer, and it stayed that way in the last four layers.
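This kind of layer-by-layer probing can be approximated with a "logit lens"-style pass: project each layer's hidden state through the model's final layer norm and output head, and read off the top next-token guess. The sketch below is a rough illustration using GPT-2 and an invented prompt, not the exact model or procedure from the study described above.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Probe every layer's hidden state by projecting it through the final
# layer norm and output head, then reading off the top next-token guess.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of Poland is", return_tensors="pt")  # illustrative prompt
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; the rest follow each transformer block.
for layer, hidden in enumerate(outputs.hidden_states):
    final_position = hidden[0, -1]  # hidden state at the last input token
    logits = model.lm_head(model.transformer.ln_f(final_position))
    top_token = tokenizer.decode(logits.argmax().item())
    print("layer", layer, "best guess:", repr(top_token))

Printing the top guess at every layer makes the pattern described above visible: early layers produce generic or loosely related tokens, and the correct continuation tends to emerge only in the later layers.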


This unbridgeable gap between mental model and reality obtains for many natural nonliving systems too, such as the chaotic weather in a mountain pass, which is probably why many traditional people ascribe agency to such phenomena. In fairness though, if bullshit about a “favorite island” (or anything else relating to inner life) is kept consistent, it may not be distinguishable from reality. Having stable preferences, keeping promises, taking expected actions, and following through can all be understood as forms of consistency. Consistent words and actions construct a shared reality, form the basis of trust, and are required of any agent whose actions can have real-life consequences.

Large language model use cases

The agent is good at playing this part because there are plenty of examples of such behaviour in the training set. Of the categories of misinformation (deliberately deceiving, asserting a falsehood in good faith, and confabulating), only confabulation, the last, is directly applicable in the case of an LLM-based dialogue agent. Given that dialogue agents are best understood in terms of role play ‘all the way down’, and that there is no such thing as the true voice of the underlying model, it makes little sense to speak of an agent’s beliefs or intentions in a literal sense. So it cannot assert a falsehood in good faith, nor can it deliberately deceive the user. Likewise, a simulacrum can play the role of a character with full agency, one that does not merely act but acts for itself. As for the underlying simulator, it has no agency of its own, not even in a mimetic sense.

The secret object in the game of 20 questions is analogous to the role played by a dialogue agent. Crudely put, the function of an LLM is to answer questions of the following sort. Given a sequence of tokens (that is, words, parts of words, punctuation marks, emojis and so on), what tokens are most likely to come next, assuming that the sequence is drawn from the same distribution as the vast corpus of public text on the Internet? The range of tasks that can be solved by an effective model with this simple objective is extraordinary [5]. Another possible reason that training with next-token prediction works so well is that language itself is predictable.
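That question can be posed to a small open model directly. The following sketch, again assuming the Hugging Face transformers library and the gpt2 checkpoint (an illustrative stand-in for the much larger models discussed here), scores every vocabulary token as a possible continuation of a sequence and prints the most likely ones.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Given a token sequence, ask the model which tokens are most likely to come next.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("I am thinking of an object that is small, round and", return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # scores over the whole vocabulary

probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(repr(tokenizer.decode(token_id.item())), round(prob.item(), 3))

Everything a dialogue agent does, including role play, is built on repeated applications of this one step: sample a token from the predicted distribution, append it to the sequence, and ask the question again.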

