Philip Wang, the developer known for replicating several closed-source AI systems such as DALL-E 2, AlphaFold, and Imagen, has released an implementation of PaLM + RLHF, a text-generating system designed to behave much like ChatGPT.
The system combines Google's PaLM, a large language model with 540 billion parameters (roughly three times as many as the GPT-3 model underlying ChatGPT), with Reinforcement Learning from Human Feedback (RLHF) to enable a chatbot that can do almost everything ChatGPT can, including answering general questions, drafting emails, and suggesting code.
Although this implementation differs somewhat from the PaLM model Google announced in 2022, its architecture and training methodology are closely comparable. Wang is well known for "porting" other prominent model designs into open code, experience that lends credibility to this replication.
"The Effectiveness of PaLm and RLHF"
Since its introduction, ChatGPT, a fine-tuned version of GPT-3.5, has taken the tech world by storm thanks to its ability to produce text that reads like human writing and to hold natural conversations. Although ChatGPT represents a substantial improvement over past chatbots, many AI advocates have expressed concern about its closed architecture.
As it stands, the ChatGPT model remains proprietary, so no one outside the company can access its source code. Only OpenAI knows exactly how the model works and what data it processes. This lack of transparency could significantly erode user trust in the long term.
An open-source alternative has long been a goal of many developers, and one has now materialised. PaLM + RLHF is a PyTorch implementation, so it slots directly into the Python ecosystem. Developers first train PaLM as an autoregressive transformer on a text corpus, then gather human feedback on its outputs to train a separate reward model, as sketched below.
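As a rough illustration, the repository's README (at the time of writing) shows the first stage, training PaLM with a standard next-token prediction objective. The hyperparameters and data below are placeholders rather than recommendations, and the exact API may change as the project evolves:

```python
import torch
from palm_rlhf_pytorch import PaLM

# A deliberately small PaLM for demonstration; the real PaLM has 540B parameters
palm = PaLM(
    num_tokens = 20000,  # vocabulary size
    dim = 512,           # model width
    depth = 12           # number of transformer blocks
).cuda()                 # assumes a GPU is available

# Random token ids standing in for real training text
seq = torch.randint(0, 20000, (1, 2048)).cuda()

# One autoregressive (next-token prediction) training step
loss = palm(seq, return_loss = True)
loss.backward()
```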
What Are the Downsides?
What is required for the public to be able to use PaLM + RLHF?
To make PaLM + RLHF usable by the public, large amounts of text from sources such as blogs, social media posts, news articles, and e-books must first be compiled and used to train the PaLM model. The trained model then generates multiple responses to prompts, which human volunteers rank by quality. Those rankings are used to train a reward model that scores the original model's responses by preference and filters for the best answer to a given prompt, as the sketch below shows.
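In code, the reward-modelling stage looks roughly like the following, adapted from the repository's README at the time of writing. The human rankings are reduced to integer quality labels here for brevity, and names such as `RewardModel` and `num_binned_output` may change as the project evolves:

```python
import torch
from palm_rlhf_pytorch import PaLM, RewardModel

# A copy of the (ideally pretrained) PaLM, run non-causally for scoring
palm = PaLM(
    num_tokens = 20000,
    dim = 512,
    depth = 12,
    causal = False
)

# The reward model maps a sequence to binned human quality ratings
reward_model = RewardModel(
    palm,
    num_binned_output = 5  # ratings from 0 to 5
).cuda()

# Mock data: a prompt-plus-response sequence, a mask marking which tokens
# belong to the prompt, and a human-assigned quality label
seq = torch.randint(0, 20000, (1, 1024)).cuda()
prompt_mask = torch.zeros(1, 1024).bool().cuda()
labels = torch.randint(0, 5, (1,)).cuda()

# Training step: fit the reward model to the human labels
loss = reward_model(seq, prompt_mask = prompt_mask, labels = labels)
loss.backward()

# After training, the model emits a reward used for the RL fine-tuning stage
reward = reward_model(seq, prompt_mask = prompt_mask)
```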
However, aligning this model with the desired functionality of ChatGPT is a costly and time-consuming process because of PaLM's 540 billion parameters. For perspective, training a text-generating model with only 1.5 billion parameters has been estimated to cost up to $1.6 million, and PaLM is 360 times that size.