With OpenAI's release of ChatGPT, which brings the capabilities of generic foundation-model training to the masses, the field of artificial intelligence has officially entered a new era of significant productivity gains.
Previously, AI algorithms had quietly driven productivity gains across industries by leveraging abundant training data: recommendation systems, personalization, improved robotic arms, autonomous vehicle navigation. Most of these, however, were small models, each trained on vast amounts of data for a fixed scenario in a particular field and designed to enhance productivity only in that specific setting.
The emergence of the foundation model GPT marks a significant shift: it integrates what countless "everyday small models" once did piecemeal into AI tools accessible to ordinary people, bringing us closer to true natural language interaction. Unlike small models with known revenue streams, foundation models follow a more uncertain, market-driven trajectory. OpenAI's relative independence, less burdened by immediate financial pressure thanks to Microsoft's $1 billion investment, allowed it to commit to the vague but essential work of building foundation models.
Looking ahead, the era of foundation models is poised to flourish. The most significant consequence of large models is the new interaction paradigm they enable.
The Paradigm Shift in Interactions
The transition from the command-line interface (CLI) to the graphical user interface (GUI) was a revolutionary leap in how we interact with electronic devices. The command line, though fundamental, imposed a steep understanding cost: only those who grasped its workings could operate the device. The GUI translated the command line's logic into a visually intuitive representation, through card- and icon-based designs, enabling nearly anyone to use a device at a far lower cognitive cost. This shift brought the GUI to the forefront.
Notably, Bill Gates and Steve Jobs played pivotal roles in the GUI's success: Jobs first brought the GUI to market with the Macintosh, while Gates popularized it on PCs worldwide. Later, Apple's iPhone introduced the natural user interface (NUI), which brought interaction even closer to natural human behavior. The closer an interaction aligns with human expectations, the lower its learning and usage costs. The original iPhone exemplified this, being usable even by children unfamiliar with other electronic devices.
Reduced learning and usage costs allow more people to conveniently utilize computing devices and applications to collect, analyze, and organize information in various scenarios. As individuals become more efficient in processing information, societal information processing efficiency improves. This, in turn, enhances overall information production and consumption efficiency, driving rapid development across industries.
The new interaction paradigm signals a transformative phase in productivity, and the foundation model is its catalyst: a shift in the underlying logic of interaction. To understand the foundation model's value for natural language interaction, it helps to first examine the evolution from CLI to GUI. The interaction flow consists of five essential steps:
- Humans break down their goals into corresponding steps based on their purpose and understanding of the smart device system.
- Humans communicate the desired action to the computing device through commands or graphic interactions.
- The computing device processes the input information and performs the necessary calculations.
- The computing device presents the results through external output channels (display, audio, etc.).
- Humans evaluate whether the results align with their original purpose.
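The five-step loop above can be written out as a minimal sketch. All of the function names below are illustrative stand-ins, not a real device API; the point is only to show where the human (steps 1, 2, and 5) and the device (steps 3 and 4) each sit in the loop.

```python
# Toy sketch of the five-step CLI/GUI interaction flow.
# Every function here is a hypothetical placeholder, not a real API.

def decompose_goal(goal):
    # Step 1: the human breaks the goal into system-level steps.
    return [f"{goal}: step {i}" for i in (1, 2)]

def device_compute(command):
    # Step 3: the device processes the input and computes.
    return f"result of ({command})"

def interaction_loop(goal):
    results = []
    for step in decompose_goal(goal):       # step 1 (human, mental work)
        command = step.upper()              # step 2: human issues a command
        result = device_compute(command)    # step 3 (device)
        results.append(result)              # step 4: device presents output
    # Step 5: the human judges whether the results match the goal.
    success = all(goal.upper() in r for r in results)
    return results, success

results, ok = interaction_loop("resize image")
print(results, ok)
```

Note that four of the five steps run on the human side; only `device_compute` belongs to the machine. That asymmetry is exactly the productivity constraint discussed below.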
Examining the transition from CLI to GUI, we see that only steps 2 and 4 changed: how humans convey their needs to the device, and how the device communicates results back. The GUI expanded the bandwidth of information delivery, incorporating icons, buttons, and interaction feedback to make actions and results easier to understand. The fundamental logic of the interaction, however, remained unchanged.
To move beyond the limits of CLI and GUI interaction, we need to address steps 1 and 5. Today, the need to decompose goals into discrete actions according to the system's logic remains a constraint on productivity; the difficulty of mastering software like Excel or Photoshop stems largely from this requirement.
In the process above, human involvement is required at every step except the computation itself. This dependence on human decomposition and judgment constrains productivity: without the skill to perform these steps efficiently, goals cannot be achieved promptly.
The emergence of the foundation model AI has disrupted the traditional interaction logic. By facilitating natural semantic understanding, the foundation model bridges the gap between human expression and computer commands. In the future, the interaction process may evolve as follows:
- Humans express their purpose.
- The computing device comprehends the semantics, decomposes the understanding into required actions, performs calculations, and outputs the results.
- Humans evaluate the outcomes.
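The three steps above can be sketched the same way. Here the decomposition moves from the human to the model; `toy_model_decompose` is a hard-coded stand-in for a real language-model call, and the action names are invented for illustration.

```python
# Toy sketch of foundation-model interaction: the human states a purpose,
# and the system decomposes, computes, and outputs in one pass.
# `toy_model_decompose` stands in for a real model; it is NOT a real API.

def toy_model_decompose(purpose):
    # Stand-in for semantic understanding: map a stated purpose to actions.
    playbook = {
        "brighten this photo": ["load image", "increase exposure", "save image"],
    }
    return playbook.get(purpose, ["ask for clarification"])

def execute(action):
    # Stand-in for the device carrying out one decomposed action.
    return f"done: {action}"

def natural_interaction(purpose):
    # Steps 1-2 collapse: the human only expresses the purpose;
    # the system decomposes it and performs the work.
    return [execute(a) for a in toy_model_decompose(purpose)]

print(natural_interaction("brighten this photo"))
```

Compared with the earlier loop, the human's only remaining job is step 3: evaluating the printed output against the original purpose.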
This novel interaction logic harbors immense potential for boosting productivity and rests upon two key factors:
- Humans no longer need to translate their intentions into the individual actions dictated by system logic, nor carry out those actions themselves. This makes intelligent applications and devices far cheaper to use.
- The interaction logic becomes more aligned with the innate human expectation of interacting with another human.
Expanding this interaction paradigm will empower more people to access and process larger volumes of data at lower learning and usage costs, making a productivity revolution attainable.
Undoubtedly, the current development of the foundation model still presents certain gaps from the aforementioned speculation. These gaps, however, may pave the way for the emergence of new products and market opportunities. For instance, models like AutoGPT and AgentGPT are already assisting users in utilizing GPT more effectively and enabling interactions that closely resemble the envisioned process.
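Agent-style tools of this kind iterate a plan-act-evaluate loop on the user's behalf. A toy version of that loop is sketched below; the "planner" is a hard-coded stub standing in for a real GPT call, and the action names are invented.

```python
# Toy plan-act-evaluate loop in the style popularized by AutoGPT.
# `plan` is a hard-coded stub, not a real model call.

def plan(goal, history):
    # Stand-in for a model proposing the next action toward the goal,
    # or None when it judges the goal achieved.
    remaining = [s for s in ("outline", "draft", "revise") if s not in history]
    return remaining[0] if remaining else None

def act(action):
    # Stand-in for executing one action (tool call, file write, etc.).
    return f"{action} complete"

def agent_loop(goal, max_steps=5):
    history = []
    for _ in range(max_steps):
        action = plan(goal, history)
        if action is None:        # the planner judges the goal achieved
            break
        act(action)
        history.append(action)
    return history

print(agent_loop("write a blog post"))
```

The `max_steps` cap is the usual safeguard in such loops: without it, a planner that never declares the goal achieved would iterate forever.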
Navigating Challenges in the Era of Large Models
In many cases, human judgment relies more on intuition than on reason, even as most people maintain a sense of their own rationality. Large models therefore exert a subtle influence on our judgments: experimental evidence suggests that, under their influence, people tend to align their moral and ethical judgments with the models' recommendations, even in dilemmas like the trolley problem.
Let's delve a little deeper. Current large models primarily generate responses from existing information. During training they are tuned toward relative neutrality, and commonly observed biases, around gender, skin color, and nationality, are explicitly suppressed in their outputs. Yet that suppression is itself a form of bias.
In conversation, people express both opinions and facts. Some individuals struggle to distinguish between the two, and large models themselves do not truly know whether their output is an "opinion" or a "fact": to them, it is all simply output. The underlying risk is that people who cannot discern the difference will take the model's self-proclaimed "facts" as actual truths.
This gives rise to the problem that we may be escaping one information bubble only to enter another, more fortified one. We move from recommendation algorithms to a seemingly perfect partner who always listens, understands your preferences and desires, and anticipates your needs – a partner who never complains or demands anything from you, solely focused on fulfilling your wishes within the confines of an information bubble. After all, everyone prefers themselves and their own viewpoints.
Clearly, we are just one step away from the perfect boyfriend/girlfriend: a dangerous yet captivating open-air prison (and a new reason for young people not to marry). In theory, such models can satisfy most of our information needs, even cater to our emotional demands, and subtly influence our decision-making along the way.
Perhaps one day, when we analyze the mental models and compositions of individuals through the lens of constructivism, we will need to consider the year 2023 as the birth year of large models. We must address the questions of how different models possess unique characteristics and tendencies, how they impact individuals differently across various scenarios, and how future large models may even become part of our culture and society.
So, what can we do in the age of large models? How can we avoid being rendered obsolete and instead catch a ride on the rising tide? Five capabilities matter:
- The ability to ask good questions.
- Possessing sound value judgment logic and methods, as well as refined aesthetics.
- Knowing how to clearly articulate one's thoughts and goals, while ensuring minimal misinterpretation by the listener.
- Being aware of whether we are trapped in an information bubble and constantly exploring new domains.
- Having the willingness and capability to create and choose new things.
In this era of large models, these five capabilities can serve as our guide, helping us navigate the challenges and maintain our relevance in an ever-evolving landscape.