Language Is Not the Entirety of Human Thought
Language is not the entirety of human thought; at the very least, there are thoughts active in the latent space of our brains before we speak or write down an answer.
GPT has always lacked a thinking process; it has never learned chains of thought.
When a friend asks you a question, do you respond immediately, or do you need to think about it?
When facing complex problems, GPT is forced to "blurt out" an answer, which has taught it to hallucinate rather than to reflect on every thought and every word before speaking.
Taking math problem solving as an example, I've summarized several characteristics:
- A written solution may contain only a few words, but the time we spend thinking in our heads can be very long
- Some problems require little thought before we start calculating and some answers are apparent at first sight, while others may take a long time to work out
- For problems that ask for an answer, we usually reason forward from the question, while for proof problems we may need to work from both ends, forward from the givens and backward from the conclusion
- When thinking about a problem, we typically experience: trying an approach to see if it works, making some but not much progress, feeling the direction might be right, thinking further, suspecting the direction is wrong, getting stuck, needing to rethink, trying another direction...
So how can we implement this kind of thinking in GPT? (Here comes the imagination part.)
Again, taking math problems as an example:
- The dataset consists of solution processes for difficult problems
- Use two agents: A does not know the answer or the solution process, while B knows both
- B guides A's thinking, or uses prompts to get A started, beginning with a first attempt in some step-by-step direction
- Drawing on the A* algorithm, B evaluates A's thinking volume and its deviation from the known solution. If A's thinking volume is still low, B lets A continue even if there is some deviation; when both the thinking volume and the deviation are high, B sends A back to one of the earlier checkpoints to rethink, until A's thinking volume meets expectations and the deviation is minimal (a code sketch follows after this list)
- Record the outputs of A and B throughout this process; the transcript is a language encoding that resembles the brain's own thinking
- The result is the training material, which is divided into three parts
- prompt
- thought process in the answer
- narrative part in the answer
- Example
- prompt: ...solve this math problem...
- answer: Let me think about this problem (preset); I think the first step could be like this... (Agent A); Hmm, keep thinking (Agent B); Then could it be like this... (Agent A); The approach seems OK, need to think further (Agent B); Let me see, um... am I stuck? (Agent A); The direction might be off, let's go back to an earlier thought (Agent B); Let's try thinking from here... ...this seems close to the result (Agent B); I got it! <thinking end mark> (Agent A)
- answer: ...This math problem should be solved like this... (obviously just a summary of the preceding thought process plus the correct answer, an easy task)
- Training: use the prompt as the fine-tuning input, and the two parts of the answer, concatenated without any separator, as the output (see the second sketch below)
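
To make the B-guided search concrete, here is a minimal sketch in Python. It is a toy under stated assumptions rather than a real implementation: `agent_a_step` and `agent_b_evaluate` are hypothetical stand-ins for calls to the two models, and "thinking volume" and "deviation" are reduced to a turn count and a placeholder score. The part that mirrors the A*-inspired idea is the loop that lets A keep going while its thinking volume is low and rolls A back to a checkpoint when both volume and deviation are high.

```python
import random


def agent_a_step(transcript):
    """Hypothetical stand-in for agent A: propose the next thought."""
    return f"Attempt {len(transcript)}: maybe it can be approached like this..."


def agent_b_evaluate(transcript, reference_solution):
    """Hypothetical stand-in for agent B, who knows the solution.

    Returns (thinking_volume, deviation). A real B would compare A's partial
    reasoning against the known solution, playing the role of the heuristic
    in an A*-style search; here both scores are crude placeholders.
    """
    thinking_volume = len(transcript)   # proxy: number of turns so far
    deviation = random.random()         # placeholder for a judged score in [0, 1]
    return thinking_volume, deviation


def guided_search(problem, reference_solution,
                  volume_target=8, deviation_limit=0.7, max_steps=50):
    """B guides A, rolling A back to an earlier checkpoint when A has
    thought a lot but drifted far from the known solution."""
    transcript = [("B", f"Let's think about this problem: {problem}")]
    checkpoints = [list(transcript)]    # snapshots A can be rolled back to

    for _ in range(max_steps):
        transcript.append(("A", agent_a_step(transcript)))
        volume, deviation = agent_b_evaluate(transcript, reference_solution)

        if volume >= volume_target and deviation < 0.2:
            # Enough thinking and little deviation: A has (probably) solved it.
            transcript.append(("A", "I got it! <thinking end mark>"))
            break
        if volume < volume_target:
            # Thinking volume still low: tolerate some deviation, keep going.
            transcript.append(("B", "Hmm, keep thinking."))
            checkpoints.append(list(transcript))
        elif deviation > deviation_limit:
            # High volume *and* high deviation: send A back to a checkpoint.
            transcript = list(checkpoints[len(checkpoints) // 2])
            transcript.append(("B", "The direction might be wrong; let's go back to an earlier thought."))
    return transcript
```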
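
A second small sketch shows how a recorded transcript could be turned into one fine-tuning example, following the recipe above: the prompt is the input, and the thought process immediately followed by the narrative answer, with no separator between them, is the output. The function name, the dict fields, and the toy transcript are all assumptions made for illustration.

```python
def build_finetune_example(prompt, transcript, narrative_answer):
    """Assemble one training example: prompt in, thought process followed
    directly by the narrative answer (no separator between the two) out."""
    # Drop the A/B speaker labels so the model learns one continuous
    # stream of "inner speech" rather than a two-party dialogue.
    thought_process = " ".join(text for _, text in transcript)
    return {"input": prompt, "output": thought_process + narrative_answer}


# Hypothetical usage; the transcript could also come from guided_search above.
toy_transcript = [
    ("B", "Let me think about this problem."),
    ("A", "I think the first step could be like this..."),
    ("B", "Hmm, keep thinking."),
    ("A", "I got it! <thinking end mark>"),
]
example = build_finetune_example(
    "...solve this math problem...",
    toy_transcript,
    "...This math problem should be solved like this...",
)
print(example["output"])
```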
Roadmap:
- This learning method can enable AI to first learn to memorize knowledge (GPT-3 level), then think about simple things (GPT-4 CoT level), and then attempt to learn more complex things
- Until one day, AI learns how to design training for its own thinking, thus completing self-bootstrapping and achieving AGI
After several days of thinking, I now strongly believe that AI, already armed with massive knowledge, can become smarter not by "learning things I don't know" but by "learning things I can't do."