Three Key Technologies
The first topic is the three key technologies of the coming AI wave. Schmidt believes that within the next year, three new technologies will be widely applied in the AI industry: super-long text capabilities, intelligent agents, and text-to-action. He argues their impact on the world will be greater than the impact social networks have had on humanity, and beyond what most people currently imagine.
Long text capabilities are akin to AI’s short-term memory, which allows the model to consider longer text as context during a conversation. Intelligent agents are AI experts that can help you handle specific tasks.
So, what is text-to-action? It refers to converting natural language into “action commands” for AI, which are essentially code in a programming language such as Python.
Why will the combination of these three technologies have a significant impact?
Long text capabilities can solve the staleness problem of an AI model’s knowledge base. Currently, training an AI model takes about a year and a half: six months of preparation, six months of training, and six months of fine-tuning. By the time it is deployed, its “knowledge base” is already out of date. With super-long text capabilities, however, users can supply the latest information in each conversation, giving the model up-to-date knowledge much as Google has.
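A minimal sketch of that pattern, assuming an OpenAI-style chat API (the model name, file name, and prompt below are placeholders, not anything from the talk):

```python
# Sketch: work around a stale training cutoff by stuffing fresh
# information into a long context window at query time.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Fresh material the model could not have seen during training,
# e.g. today's news articles or internal reports (placeholder file).
latest_documents = open("todays_news.txt").read()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any long-context chat model works
    messages=[
        {"role": "system",
         "content": "Answer using the documents below, which are more "
                    "current than your training data:\n\n" + latest_documents},
        {"role": "user",
         "content": "What happened in the chip market this week?"},
    ],
)
print(response.choices[0].message.content)
```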
Intelligent agents can give AI models self-learning abilities. For example, by setting up an intelligent agent as a chemistry expert and having it systematically learn the subject, it might discover certain “black box” principles that humans haven’t yet uncovered. Based on these principles, this chemistry expert agent can research new materials and conduct calculations and tests, accelerating innovation.
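To make the agent idea concrete, here is a toy sketch of the basic loop (plan, act, observe, repeat). The scripted fake_llm and the two “tools” are purely illustrative stand-ins for a real model API and real chemistry software:

```python
# Toy agent loop: the model repeatedly picks a tool, observes the
# result, and decides what to do next, until it declares a finding.

def fake_llm(history: str) -> str:
    """Stand-in for a real model call; replays a fixed plan."""
    if "run_simulation" in history:
        return "FINISH: candidate electrolyte found, see simulation result"
    if "lookup_literature" in history:
        return "run_simulation: LiPF6 variant at 300K"
    return "lookup_literature: solid-state electrolytes"

TOOLS = {
    "lookup_literature": lambda q: f"(retrieved papers about {q})",
    "run_simulation": lambda s: f"(simulated properties of {s})",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = f"Goal: {goal}"
    for _ in range(max_steps):
        # Decision format: "tool: argument" or "FINISH: answer".
        decision = fake_llm(history)
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        tool, _, arg = decision.partition(":")
        observation = TOOLS.get(tool.strip(), lambda a: "unknown tool")(arg.strip())
        history += f"\n{decision} -> {observation}"
    return "step budget exhausted"

print(run_agent("find a better battery electrolyte"))
```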
“Text-to-action” allows everyone to have their own affordable programmer.
Schmidt then gave the example of replicating TikTok. He suggested that Stanford students use natural language to give the AI model a set of commands:
- Copy a TikTok program.
- Obtain all user information from TikTok.
- Acquire all music resources.
- Add the custom settings desired by the replicator, i.e., features different from TikTok.
- Write this program within the next 30 seconds.
- Publish it.
The commands are given in plain spoken language; the large language model converts them into executable Python code and builds the program. This is text-to-action.
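A hedged sketch of what such a pipeline could look like in code, again assuming an OpenAI-style chat API; the spec, model name, and file name are illustrative, and the generated code is written to disk for review rather than executed blindly:

```python
# Sketch of "text-to-action": hand the model a natural-language spec
# and get back executable Python. All names here are illustrative.
from openai import OpenAI

client = OpenAI()

spec = """
Build a minimal short-video web app:
- users can upload a clip
- clips are listed newest-first on a feed page
Return only runnable Python (Flask) code, no commentary.
"""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for any capable code model
    messages=[{"role": "user", "content": spec}],
)
generated_code = response.choices[0].message.content

# Review before running: executing model-generated code blindly is unsafe.
with open("generated_app.py", "w") as f:
    f.write(generated_code)
print("Wrote generated_app.py; inspect it, then run `python generated_app.py`.")
```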
This technology can significantly lower the technical threshold for software startups. Replicators can mimic well-known software, make personalized modifications, and launch the result on the market. If it doesn’t catch on after a while, they can change the personalized settings or pick a different target to replicate and try again. This could threaten the survival of large companies like Google.
This is why Schmidt said the next wave will come from the combination of long text, intelligent agents, and text-to-action technologies.
Securing Funding, Power, and Allies
Schmidt’s second judgment is that, to compete internationally in AI, the U.S. government needs to secure three things: funding, power, and allies.
Although Schmidt is very optimistic about the impending wave of technology entrepreneurship driven by long text, intelligent agents, and text-to-action, he also acknowledges that the world is changing so fast that his views change every six months. For example, six months ago he thought the gap between the top few AI models and the rest of the field was narrowing, but now he believes it is widening.
Why is the gap widening? Because the enormous investment in energy and capital makes it impossible for small companies to catch up.
In early 2024, it was reported that Microsoft and OpenAI planned to collaborate on developing a supercomputer called Stargate, with millions of dedicated server chips specifically for AI research and development. This project could cost as much as $100 billion.
Recently, Schmidt mentioned that Sam Altman provided him with new data, suggesting that this project might require an investment of $300 billion or more. What will this $300 billion be used for? Mainly to purchase GPUs.
And that’s not all. A $300 billion computing hub also needs a massive amount of power to operate. You may know that OpenAI’s ChatGPT consumes over 500,000 kilowatt-hours of electricity daily, which is more than 17,000 times the daily electricity consumption of an average American household. And this doesn’t even account for the electricity costs of training new models.
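As a quick sanity check on that multiple, assuming the commonly cited figure of roughly 29 kWh per day (about 10,500 kWh per year) for an average U.S. household:

$$\frac{500{,}000\ \text{kWh/day}}{\approx 29\ \text{kWh/day per household}} \approx 17{,}000\ \text{households}$$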
These costs are clearly beyond the reach of latecomer AI startups, and it’s not something that can be solved just with money—there’s also the issue of the U.S. facing a power shortage that has persisted for five or six years.
Therefore, Schmidt recently suggested two things to the White House: The U.S. government should either strengthen its relationship with Canada to tap into Canadian talent and hydropower resources or establish ties with the Arab world to encourage Middle Eastern royals to invest in the U.S.’s AI infrastructure.
Why would the U.S. government support this AI buildout? To maintain a leading position in competition among nations, of course.
Schmidt drew a rather blunt conclusion: the competition in AI models is a game for wealthy nations, and the only rival he recognizes is China. He said that countries able to participate in the global AI model competition must meet four conditions: vast capital, a large pool of skilled talent, a strong education system, and a strong desire to win. Only a few countries meet these conditions; both the U.S. and China are among them, and he couldn’t think of any others.
Schmidt said that the U.S. is currently about 10 years ahead of China in chip manufacturing technology, and to maintain this lead, a lot of funding is needed. For this reason, the U.S. government must ban the export of NVIDIA chips to China. For the generation born after 2000, the U.S.-China confrontation over knowledge hegemony will be the primary struggle in their lifetime.
At the same time, he believes that in the AI field, Japan and South Korea are clearly U.S. allies. And India is a crucial swing state, the most worthy of U.S. efforts to win over, because India has quite a few top AI talents but lacks the rich training facilities and programs that the U.S. has. And Europe? The EU is basically out of the AI game, with France having a slight chance and Germany not doing so well.
Balancing AI Risks
The third point is about the U.S.’s approach to balancing AI risks.
Physicist Richard Feynman once said, “What I cannot create, I do not understand.” The host of the open lecture put this quote to Schmidt and asked: Is AI something we have created but do not understand? Is the nature of knowledge changing?
The question was aimed at Schmidt’s role in AI risk control: Schmidt chaired the U.S. National Security Commission on Artificial Intelligence, which was responsible for advising the president and Congress on related policies.
Schmidt responded by saying that AI models could be compared to teenagers. Teenagers are born to parents, but parents cannot fully understand their thoughts. Nevertheless, American society has adapted to the presence of teenagers, and they will eventually grow into adults.
Similarly, AI models are created by humans, but their creators and regulators cannot fully understand what new knowledge an AI has acquired, whether it is dangerous, or how certain people might use it. Humans can, however, understand and control the boundaries of its capabilities.
How to understand and control these boundaries? He explained two approaches.
One approach is to use adversarial AI, similar to the “generative adversarial networks” explained in Lecture 75, “Jensen Huang: The Concept of Failure and AI,” in which two neural networks play red and blue teams against each other. The blue team’s goal is to generate content that is hard to distinguish from reality, while the red team’s goal is to tell real from fake. Schmidt suggested that AI red teams could serve as monitoring bodies, identifying and containing AI knowledge that humans don’t understand.
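For readers who want to see the red-team/blue-team dynamic in miniature, here is a toy GAN sketch (PyTorch assumed; the one-dimensional Gaussian target is purely illustrative):

```python
# Minimal GAN sketch: a generator (blue team) learns to fake samples
# from a target distribution, while a discriminator (red team) learns
# to tell real samples from generated ones.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n=128):
    """Target 'real data': samples from a Normal(4, 1.5) distribution."""
    return 4 + 1.5 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # blue team
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # red team
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Red team: learn to separate real from generated samples.
    real = real_batch()
    fake = G(torch.randn(128, 8)).detach()
    loss_d = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Blue team: learn to fool the red team.
    fake = G(torch.randn(128, 8))
    loss_g = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

samples = G(torch.randn(5000, 8))
print("generated mean/std:", samples.mean().item(), samples.std().item())
# Should move toward the target values of roughly 4 and 1.5.
```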
Another approach is to implement a “registration and reporting system” at the hardware level to keep AI’s intelligence in check.
For example, GPT-4’s pre-training compute is said to have reached the level of 10^26 floating-point operations (total operations over the whole training run, not operations per second). The commission led by Schmidt set this number as the compute threshold for large models: any model trained beyond it must be reported to the U.S. government. The EU, similarly, has set its threshold at 10^25.
What does this mean? To simplify, it’s like requiring a parent to report to the government if the nutrition and educational resources provided to a child will raise their IQ above 200, allowing the government to know there’s a potential genius child to keep an eye on.
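As a back-of-the-envelope illustration of how a lab might check such a threshold, here is a sketch using the common C ≈ 6·N·D approximation for training compute (N = parameters, D = training tokens); the model sizes below are invented examples, not disclosed figures for any real system:

```python
# Back-of-envelope check of whether a training run crosses the
# reporting thresholds discussed above (total FLOPs, not FLOP/s).

THRESHOLD_US = 1e26  # U.S. reporting threshold, total FLOPs
THRESHOLD_EU = 1e25  # EU threshold, total FLOPs

def training_flops(params: float, tokens: float) -> float:
    """Estimate total training compute with the 6*N*D rule of thumb."""
    return 6 * params * tokens

for name, n, d in [
    ("hypothetical 70B model, 15T tokens", 70e9, 15e12),
    ("hypothetical 1.8T model, 13T tokens", 1.8e12, 13e12),
]:
    c = training_flops(n, d)
    print(f"{name}: ~{c:.1e} FLOPs, "
          f"US report: {c > THRESHOLD_US}, EU report: {c > THRESHOLD_EU}")
```

The first example lands around 6.3e24 FLOPs, under both thresholds; the second lands around 1.4e26, which would trigger reporting in both jurisdictions.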
See, these two strategies resemble ancient Chinese military stratagems: “killing with a borrowed knife” and “flooding the enemy.”
Summary
In this talk, Schmidt was quite candid. For instance, he also discussed his views on the Russia-Ukraine conflict and the use of AI in modern warfare. I’ve placed the original GitHub link at the end of this article for you to read if you’re interested.
Schmidt discussed three dimensions of the U.S.’s great-power competition in AI: the three key technologies, the buildout of energy and computing infrastructure, and the design of AI safety standards.