### More Examples for AI-based Completion

* https://docs.warp.dev/features/ai-command-search
* https://youtu.be/kSXpwOElFY0?t=113
* OpenAI Codex (the basis for Copilot)
  * https://beta.openai.com/codex-javascript-sandbox
  * OpenAI Codex and GitHub Copilot are both models trained on the GPT-3 language prediction model created by OpenAI. However, while Copilot writes code alongside you in your text editor (as the name suggests), Codex requires that you access it via their API or Playground.
  * https://aidan-tilgner.medium.com/github-copilot-vs-openai-codex-which-should-you-use-ed67e53e00c0
### How is this possible?

* Tackle this with machine learning
* The model has basically been trained on all of the Internet, including GitHub
* GitHub contains code alongside tests and documentation
* A large language model (based on GPT-3) is the basis
### Issues in Supervised Learning

* linear effort in labelling data
* a significant error rate is to be expected
  * all standard data sets contain up to 10% errors
  * https://labelerrors.com/
* differences between different labelers
* a change in the label definition might require starting all over

_impractical with large data sets_
### Foundation Models: Transformer

Core ideas:

1. have a generalized language model
1. predict probabilities of sequences of words
1. train on a very large corpus
1. zero- or one-shot learning
1. self-attention for encoding long-range dependencies
1. self-supervision for leveraging large unlabeled datasets (aka unsupervised pre-training)
1. additional supervised training for downstream tasks, e.g.
   - translation (lang1 & lang2 pairs)
   - question answering (Q&A pairs)
   - sentiment analysis (text & mood pairs)
   - etc.
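The self-attention idea above can be sketched in a few lines of NumPy. This is a deliberately simplified version: the learned query/key/value projection matrices of a real transformer are omitted (Q = K = V = X), so it only illustrates how each token's output becomes a similarity-weighted mix of all tokens — which is what lets the model encode long-range dependencies.

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Minimal scaled dot-product self-attention (no learned projections)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                       # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over each row
    return weights @ X                                  # weighted mix of all tokens

X = np.random.default_rng(0).normal(size=(5, 8))        # 5 tokens, 8 embedding dims
out = self_attention(X)
print(out.shape)  # (5, 8)
```

Note that every output position attends to every input position in one step — no recurrence — which is why this scales to long contexts and large unlabeled corpora.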
### Transformer Zoo

* the original transformer was meant for translation tasks
* usage has broadened ever since, spawning a whole zoo of transformers
  * some use the encoder only
  * some use the decoder only
  * some use a combination of encoder/decoder, just like the original transformer
### Decoder only (GPT-like)

_also called auto-regressive Transformer models_

* the decoder part can transform given inputs into complete sentences
  * useful in itself, e.g. to complete started sentences
* generates a response iteratively ("auto-regressive")
* GPT is an example of this kind of model
  * unidirectional: trained to predict the next word
  * by OpenAI
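The "auto-regressive" loop can be sketched as follows. `toy_next_token` is a hypothetical stand-in for a trained decoder (here just a fixed lookup table for illustration); the point is the control flow: each predicted token is appended to the input and fed back in for the next prediction.

```python
def toy_next_token(tokens):
    # Hypothetical model: a fixed bigram table standing in for a trained decoder.
    table = {"the": "cat", "cat": "sat", "sat": "down"}
    return table.get(tokens[-1], "<eos>")

def generate(prompt, max_new_tokens=10):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)
        if nxt == "<eos>":        # stop when the model signals end of text
            break
        tokens.append(nxt)        # feed the prediction back in ("auto-regressive")
    return " ".join(tokens)

print(generate("the"))  # the cat sat down
```

A real GPT-style model replaces the lookup table with a neural network that outputs a probability distribution over the whole vocabulary, from which the next token is sampled or picked greedily.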
### Training GPT

* self-supervised training
  * predict the next word, given all of the previous words within some text
* has a limited context

https://huggingface.co/transformers/model_summary.html#original-gpt
https://huggingface.co/transformers/model_doc/gpt2.html
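Why this is "self-supervised" becomes clear when you see how training examples are built: the text itself supplies the labels, so no human annotation is needed. A minimal sketch, with the limited context modeled as a sliding window:

```python
def training_pairs(tokens, context_size=4):
    """For each position, the input is (up to) the last `context_size`
    tokens and the target is the next token -- labels come for free."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]  # limited context window
        pairs.append((context, tokens[i]))            # predict the next word
    return pairs

text = "transformers predict the next word".split()
for ctx, target in training_pairs(text, context_size=2):
    print(ctx, "->", target)
```

Compare this with the labeling issues of supervised learning above: here the "labels" are simply the next words of existing text, which is what makes training on Internet-scale corpora practical.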
### Evolution of GPT

GPT: Generative Pre-Trained Transformer

* GPT-1: 2018, 110 million parameters (https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), https://www.youtube.com/watch?v=LOCzBgSV4tQ
* GPT-2: 2019, 1.5 billion parameters (https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), https://www.youtube.com/watch?v=BXv1m9Asl7I
* GPT-3: 2020, 175 billion parameters (https://arxiv.org/abs/2005.14165), https://www.youtube.com/watch?v=wYdKn-X4MhY
* GPT-4: 2022, probably not much larger, but trained on more data and with more context (4096 tokens instead of 2048) (https://analyticsindiamag.com/gpt-4-sam-altman-confirms-the-rumours/)
### Don't forget: Transformers are language models

* No abstract reasoning like in our brains takes place
* The basis is the expression of thoughts in text, code, etc.
* That is the way the system is trained
* Whether this counts as intelligence is a loaded question
  * Turing would probably say it doesn't matter
  * One can argue that such a system passes his test
  * https://twitter.com/glouppe/status/1438496208343949318
### On the Opportunities and Risks of Foundation Models

* foundation models: trained on broad data at scale and adaptable to a wide range of downstream tasks
* ML is undergoing a paradigm shift with the rise of these models
* their scale results in new emergent capabilities
* defects of the foundation model are inherited by all the adapted models downstream
* there is no clear understanding of how they work, when they fail, or what they are even capable of

https://arxiv.org/abs/2108.07258
### Are Foundation Models Conscious?

* Phenomenal consciousness = Does it have an inner cinema?
* Self-consciousness = Is it aware of itself?
* Sentience = Can it have positive or negative experiences?
* Moral patienthood = Should we care about what we do to it?
* Moral agency = Should we hold it accountable for what it does?

* https://twitter.com/AmandaAskell/status/1493086389549862915
* https://www.heise.de/hintergrund/Hat-KI-bereits-eine-Art-Bewusstsein-entwickelt-Forscher-streiten-darueber-6522868.html
* https://askellio.substack.com/p/ai-consciousness