OpenAI is “working to add more capacity” to its new ChatGPT service – which promptly crashed after launch.
ChatGPT is a chatbot that uses AI to return conversational, natural-language answers to text prompts. Its beta release on November 30 saw many users speculating that it could be the thing that knocks Google off its search perch.
(Unlikely for now: These are very different things indeed, not least because ChatGPT does not have access to near-term or live training data from the internet, but it may yet cannibalise some search workloads.)
OpenAI CEO Sam Altman said on December 1: “There is a lot more demand for ChatGPT than we expected” – as attempts to test the service saw it returning “internal server error” results to users including The Stack.
ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022.
ChatGPT and GPT-3.5 were trained on an Azure AI supercomputing infrastructure. OpenAI described the new AI chatbot as the “latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems.”
Releasing the public beta of ChatGPT, Altman noted that “language interfaces are going to be a big deal, I think. Talk to the computer (voice or text) and get what you want, for increasingly complex definitions of ‘want’!
“This is an early demo of what’s possible (still a lot of limitations – it’s very much a research release).”
He added: “Soon you will be able to have helpful assistants that talk to you, answer questions, and give advice. later you can have something that goes off and does tasks for you. eventually you can have something that goes off and discovers new knowledge for you. But this same interface works for all of that. This is something that sci-fi really got right; until we get neural interfaces, language interfaces are probably the next best thing.”
How was ChatGPT trained?
OpenAI said it trained the ChatGPT model in part using Reinforcement Learning from Human Feedback (RLHF): “We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this…”
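OpenAI has not published ChatGPT’s training code, but the comparison step it describes – trainers ranking alternative completions, with those rankings used to fit a reward model – can be illustrated with a heavily simplified sketch. The Python/PyTorch example below is an assumption-laden toy: the `RewardModel` class, feature dimensions and random data are all invented for illustration, and the real system works on full language-model representations rather than random vectors.

```python
# Illustrative sketch only: a toy reward model trained on pairwise comparisons,
# mirroring the "rank several completions, fit a reward model" step described above.
# All names, sizes and data here are hypothetical.
import torch
import torch.nn as nn

EMBED_DIM = 64  # stand-in for a real language-model representation of (prompt, response)

class RewardModel(nn.Module):
    """Maps a (prompt, response) feature vector to a single scalar reward."""
    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.scorer(features).squeeze(-1)

def pairwise_ranking_loss(preferred: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Push the score of the trainer-preferred completion above the rejected one.
    return -torch.nn.functional.logsigmoid(preferred - rejected).mean()

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    preferred_feats = torch.randn(32, EMBED_DIM)  # completions trainers ranked higher
    rejected_feats = torch.randn(32, EMBED_DIM)   # completions trainers ranked lower
    loss = pairwise_ranking_loss(model(preferred_feats), model(rejected_feats))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# In the full pipeline, a reward model like this would then supply the reward
# signal for Proximal Policy Optimization (PPO) fine-tuning of the chat model.
```

The PPO fine-tuning stage itself is omitted here; the point is simply that the “comparison data” OpenAI describes reduces to a ranking loss over pairs of completions.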
The model can currently reference, or “remember”, up to roughly 3,000 words of the current conversation.
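OpenAI has not documented exactly how that limit is enforced, but applications building on such a model typically trim the oldest turns of a conversation to stay within the budget. The short sketch below is an assumption for illustration only: the function name and the 3,000-word figure are taken from the estimate above, not from any published OpenAI interface.

```python
# Illustrative only: keep a conversation history within a rough word budget,
# dropping the oldest turns first, to mirror the ~3,000-word "memory" described above.
def trim_history(turns: list[str], max_words: int = 3000) -> list[str]:
    kept: list[str] = []
    total = 0
    for turn in reversed(turns):      # walk from the most recent turn backwards
        words = len(turn.split())
        if total + words > max_words:
            break                     # older turns no longer fit the budget
        kept.append(turn)
        total += words
    return list(reversed(kept))       # restore chronological order

conversation = [
    "User: What is RLHF?",
    "Assistant: Reinforcement Learning from Human Feedback is a training method...",
]
print(trim_history(conversation))
```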
OpenAI notes that the beta release “may occasionally generate incorrect information; may occasionally produce harmful instructions or biased content [and] has limited knowledge of world and events after 2021.”
Take it for a spin here when new capacity is added...