In the last week of 2024, media outlets like iFanr visited Vivo’s headquarters in Dongguan to engage in a conversation with Vivo’s Executive Vice President and Chief Operating Officer, Hu Baishan. They discussed market dynamics, AI progress and applications, and the future direction and planning of Vivo products. This included thoughts on the foldable screen market, plans and views on MR glasses, humanoid robots, AI glasses, and Vivo’s strong suit: imaging.
Below is a summary of the product-level conversation (edited by iFanr for readability):
Telephoto and Video Have Room for Improvement; Mobile AI Has a Long Way to Go
Q: What is your view on the current state of AI? Will AI replace imaging as the primary selling point for smartphones in the future? Have flagship phones reached their peak in imaging capabilities?
Hu Baishan: Let’s talk about imaging first. Our ultimate goal is to replace most DSLR camera scenarios, so there is still significant room for improvement.
As I mentioned before, the main camera of the X200 Pro has been reduced from the previous flagship’s 1-inch sensor to a 1/1.28-inch sensor, yet the user experience hasn’t declined. This is because the chip processing power and imaging algorithms have made significant strides. This indicates that the main camera’s user experience has reached a decent level. If we were to score it, assuming a conventional DSLR is 100 points, our main camera is close to 80 to 85 points.
However, in terms of telephoto and video, there is still a considerable gap compared to DSLRs. If we continue scoring, the main camera is 80 to 85, while telephoto is around 60 points, barely passing.
In concert scenarios, our X200 Pro performs well at 10x zoom, and at 20x you can still recognize the performer when shooting from the outer seats at night. However, users are hesitant to share 20x photos on social media because the quality isn’t good enough, whereas 10x shots are presentable.
In the telephoto area, our smartphone imaging is quite distant from DSLRs. We aim to improve telephoto to an 80-point level within 3 to 5 years, and this opportunity still exists. Although the internal space utilization of smartphones has reached its limit, where else can we improve? The sensitivity of imaging sensors can still be enhanced through technology, and there is significant room for improvement in large models and imaging algorithms. This is why I am confident that Vivo can achieve an 80-point telephoto in the future.
Photography is relatively static, so algorithms have more room to play, but video is dynamic. Adding a bunch of algorithms to video would put enormous pressure on power consumption. Of course, there is room for improvement here as well. Chips are now at 3nm, and the next generation will be 2nm. SoC chips, and even future dedicated imaging processing chips, will advance. Our next step is to apply large model algorithm capabilities to video, but the overall logic of video is dynamic, so the algorithm’s enhancement capability will still be weaker.
Whether it’s telephoto or video, there is still a considerable distance from meeting users’ high demands, and the technology itself has significant room for development. Therefore, imaging remains a key focus for future flagship smartphones.
As for AI, indeed, the development of large models has been rapid over the past two years. Returning to the phone itself, AI still has its limitations. The biggest issue with phones is insufficient computing power. I divide mobile AI into three stages:
The first stage is enhancing existing functions with AI capabilities. For example, AI object removal has recently become popular across the mobile industry. The feature existed over a decade ago but was poorly executed due to primitive algorithms.
In the past, voice recognition based on deep learning had a success rate of only 90% at best. At that rate, conversations couldn’t last many rounds, because the errors compounded with each step. With the emergence of generative large models, voice recognition and semantic understanding have improved significantly. We had a feature called Phone Secretary, first introduced on the NEX 3, where people could immediately tell it was traditional AI and would hang up after a few sentences. Now, with large-model support, people can’t tell within a short conversation that they are talking to AI.
These are still enhancements of a specific function or module, far from artificial general intelligence (AGI).
The second stage, I believe, is integrating large model capabilities into the system. For example, in the past, finding a function setting was nearly impossible because there were too many menu options, all jumbled up. In the future, with AI deeply integrated into the system, phones will clearly understand your intentions and know what to do next, making phone interactions more intelligent. For instance, our initial attempt with “Atomic Island” is to understand your intentions and propose solutions. This stage will last quite a while because the user experience at this stage can barely be met with current computing power.
The third stage is PhoneGPT, which we presented at the VDC 2024 conference. The feature we demonstrated was ordering takeout, and it could complete the task. However, each step had only an 85% success rate, so after about three steps it could no longer proceed, and the process took a long time. This is still only a demo, and the user experience is not good at all.
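To make the compounding effect concrete, here is a minimal sketch of how a per-step success rate erodes over a multi-step task. It assumes each step succeeds independently, which the interview does not state; the function name `chain_success` is hypothetical and used only for illustration.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` steps succeed.

    Assumption (not from the interview): steps are independent and
    share the same success rate `per_step`.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # 0.85 is the per-step rate quoted for the PhoneGPT demo;
    # 0.90 is the best-case rate quoted for older voice recognition.
    for rate in (0.85, 0.90):
        for steps in (1, 3, 5, 10):
            print(f"rate={rate:.2f} steps={steps:2d} "
                  f"overall={chain_success(rate, steps):.3f}")
```

Under this independence assumption, three steps at 85% each leave roughly a 61% chance of finishing the whole task, which is consistent with the observation that the demo tends to stall after a few steps.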
To truly meet the requirements of PhoneGPT, computing power needs to increase substantially, not just slightly. The current integrated architecture, packaging architecture, and bandwidth are insufficient. To have a chance, the phone’s overall capability in high-speed storage, bandwidth, and SoC architecture must approach that of today’s servers.
This is similar to imaging. We can see that user demand has already emerged. Many models run on cloud servers. Our internal computing power center has nearly 10,000 computing cards, and many models can run on the cloud, such as models with 130B parameters, but this scale cannot run on phones. Phones can only run models with 2B or 3B parameters. So, to truly achieve PhoneGPT on phones, I estimate it will take at least five years to meet user experience requirements.
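A rough back-of-the-envelope sketch shows why the parameter counts above matter. It estimates the memory needed just to hold model weights. The precisions used (16-bit for the cloud model, 4-bit for on-device models) are assumptions for illustration and are not given in the interview; activations, KV cache, and runtime overhead are ignored, so real requirements would be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights, in GB (1 GB = 1e9 bytes).

    Counts weights only; activations and caches are not included.
    """
    if params_billion < 0 or bits_per_param <= 0:
        raise ValueError("inputs must be positive")
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts come from the interview; bit widths are assumed.
    cases = [
        ("130B cloud model, 16-bit (assumed)", 130, 16),
        ("3B on-device model, 4-bit (assumed)", 3, 4),
        ("2B on-device model, 4-bit (assumed)", 2, 4),
    ]
    for label, params, bits in cases:
        print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

Under these assumptions, the 130B model needs on the order of hundreds of gigabytes for weights alone, while a 2B or 3B model fits in roughly one to two gigabytes, which is within reach of phone memory.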
The AI track is currently still in the second stage. It is a gradual improvement, not a leap from 0 to 1. Therefore, AI is not a significant driving force for the current phone replacement cycle because users haven’t experienced a leap from 0 to 1. Only when such a leap occurs, and users discover that PhoneGPT can do so many things, will they have a strong desire to upgrade their phones.
Since I am responsible for both products and technology, what I reveal should reflect the current level of our technology or the entire industry’s technology.
Q: In the smartphone industry, what aspects reflect new quality productivity, and which parts are the most important?
Hu Baishan: The smartphone industry is a prime example of new quality productivity. As I understand it, new quality productivity has three characteristics: high technology, high quality, and high dynamism, along with four new features. By these standards, smartphones fall under the category of new quality productivity. Over the years, we’ve seen continuous updates of new technology in smartphones.
We focus heavily on two areas: imaging and AI. In the imaging field, over the past five years, people have noticed the rapid improvement in smartphone photography under various conditions. This has been a fast-paced advancement.
Smartphones have replaced many digital cameras we used in the past, even replacing mirrorless cameras, and in some scenarios, DSLRs. More consumers are willing to pay for better photography effects, spending more money on phones to achieve this.
In 2024, we released the X100 Ultra and X200 Pro, which we call the “concert magic devices.” Concerts have been frequent in recent years, and consumers want to capture these beautiful moments. Why do concerts need smartphones? DSLRs can’t be brought into concert venues, so consumers can only use phones to capture these moments.
The AI field is similar. AI is just starting, but it has empowered many areas of smartphones. I believe the smartphone industry, as a representative of new quality productivity, is undoubtedly significant. I also believe that for a long time, smartphones will remain the core consumer electronic product, contributing to new quality productivity.
Vivo MR Prototype Coming in 2025, Humanoid Robots to Mature in Ten Years
Q: How is Vivo progressing in MR (Mixed Reality) and humanoid robots?
Hu Baishan: Our MR progress is relatively fast. The Vivo MR team has grown to nearly 500 people. Our goal is to have a high-fidelity MR experience prototype available in Vivo stores across about a dozen cities nationwide by September or October 2025. From booking to on-site experience, we aim to create a standardized process for everyone to try it out.
For commercialization, we need to look at the entire MR ecosystem, which still requires entertainment and gaming content. Since Vivo doesn’t produce content, we rely on the ecosystem to match up in time. Many indications show the industry is moving in a favorable direction. Tencent is increasing its investment in content. Previously, they wanted to make hardware, but recently they decided to focus on software, which is good for us.
I require the MR team to find scenarios we consider essential. It doesn’t matter if the target audience is niche, but for them, MR must be indispensable.
For example, games played on phones or consoles deliver a certain level of experience. Once MR arrives, users will realize that experience was subpar, because MR enhances it significantly. Users won’t carry an MR device everywhere, but whenever they have time to play games, they will turn to MR. This is an essential scenario.
We also raised the concept of humanoid robots in 2024. The demand is clear: society is aging rapidly.
From a trend perspective, robots are indeed a direction. We’ve analyzed some key paths for robots, one of which is spatial perception. MR has strong spatial perception capabilities. Once MR is well-developed, robots’ spatial perception won’t be an issue.
Robots also require flexible hands and feet and strong decision-making abilities. To achieve the ideal robot, we believe it will take more than ten years.
Spatial perception and decision-making abilities won’t be perfect in the short term, but hand and foot capabilities will improve relatively quickly, like industrial robots doing specialized tasks.
The ideal robot might take ten to fifteen years to achieve, but we can implement it in stages. For example, we can start with a limited range, like production line robots, which might do “two jobs,” but we hope to do “ten jobs” in the future. We’re building this capability, but product release won’t be fast.
Our current logic is that robots are what we internally call a scenario- and user-demand-driven category: the needs are clear, but the technical path is not. It is like our earlier discussion of imaging, where users want DSLR-level photography. Robots have clear user scenarios, but the technology does not yet match them. Over the next three to five years, we will get a clearer picture of how mature the technology is. Based on that, we can define a product at that midpoint that solves certain specific scenarios.
In short, we need to understand the state of technology in the next three to five years, including AI capabilities. Based on this technological capability, we can make some adjustments in ideal scenarios to meet specific needs. This is our internal product cycle plan.
Q: The AR industry chain is maturing faster. What are your thoughts on this?
Hu Baishan: For AR products, we understand them this way: from a user demand perspective, glasses can’t be too heavy. AR glasses with displays are heavy, around 40-50 grams, which isn’t a good experience. Some AR glasses have limited display capabilities. We haven’t ventured into this category yet, but we are considering non-display glasses. No matter what product category we are working on, we need to identify the basic needs of the users and find a specific user group for whom the product is essential. Recently, I discussed with colleagues from the product team, and I asked them if they had identified the essential users and scenarios. They said they had found some, and it sounded reasonable.
Many users have their hands occupied while working. Do they need someone else to assist them? If there is only one person and their hands are occupied, an auxiliary device is needed to solve this problem. Mobile phones or other devices cannot solve this problem well. Therefore, the positioning logic of our MR device is that it is essential for that group of people, and we have identified these people. If the product progresses quickly, it will appear by the end of 2025, or by 2026 at the latest.
Changes in Foldable Screen Demand, Product Pace Will Adjust
Q: The foldable phone market, after growing for four years, has stagnated or even declined. What is Vivo’s plan for foldable phones?
Hu Baishan: Initially, manufacturers had high expectations for foldable screens because it was a significant change in product form. From the perspective of user needs, who is using foldable screens?
One group is people over 45 years old, like me, whose eyesight is deteriorating. They need larger screens to read news or watch videos, and foldable phones solve many of the problems caused by presbyopia.
The second group includes media professionals like those present here, who use foldable phones to handle large amounts of information. I do the same to manage company emails and messages.
On a bar phone, information is usually handled in portrait mode, the text is relatively small, and you have to switch to landscape mode, which is not a good experience.
Regardless of the group, it addresses the needs of specific people. When making products, we need to understand who the essential users are. When foldable screens first came out, many users tried them out of curiosity, but they found that it was not suitable for them.
I have a friend who said that besides WeChat, calls, and texts, he mainly uses Douyin (the Chinese version of TikTok), which is in portrait mode. The foldable screen is useless for him, and he won’t buy another foldable phone.
After the initial development, the remaining users are the essential ones, as mentioned earlier. The market capacity for the first and second groups is relatively small. In many scenarios, such as gaming, foldable screens are not ideal. They have worse heat dissipation and control experience compared to bar phones, so foldable screens have become products for specific groups. The market size depends on the scale of these specific groups and may stabilize at around five million units.
For us, should we make foldable phones? Yes. From the perspective of user needs, there are those groups, but we need to control it. In the previous generation, we made two models, one focusing on imaging and performance, and the other on cost-effectiveness. We planned for millions of units in sales but ended up with hundreds of thousands, which is still limited. Moving forward, we will iterate annually, improving user experience, as there will always be some users who need foldable screens. For example, some users use one phone for daily WeChat and social interactions and another phone for stock market updates and document approvals.
Additionally, for small foldable products, the global market grew in 2023, but in 2024, leading brands’ small foldable products declined by 30% to 40%. Vivo is unlikely to release small foldable products in the future.
Flagship Phone Prices Will Continue to Rise, Sub-Flagship Experience Already Quite Good
Q: Flagship phone prices will slightly increase in 2025. Will the price increase continue in 2026? How does Vivo balance cost and price?
Hu Baishan: We believe the price increase will continue due to two factors. The first is clear: the flagship SoC platform and semiconductor process will continue to improve, so price increases are inevitable. We are negotiating with SoC manufacturers to moderate the price increase, for example, by sacrificing some profit margins to maintain or slow down the price increase, such as increasing by $41 instead of $68, with the remaining $27 added the following year.
The second factor includes imaging, such as telephoto lenses, which are far from perfect. We need to continue investing annually. Although the space remains the same, the implementation methods, such as lens arrangement and module implementation, will change significantly. These changes will reduce yield rates and increase product costs.
The upward trend in flagship phone prices is inevitable. For most ordinary users, the sub-flagship experience is already quite good. For example, the N-1 platform (sub-flagship phones using the previous generation flagship chip) has significantly improved user experience. We may also include flagship imaging in N-1 platform products to meet users’ purchasing power.
In short, if users pursue the ultimate experience in imaging, AI, and gaming, they will need to spend about $68 more. If they do not pursue the ultimate experience, the N-1 platform offers a good appearance and decent experience. For users who do not play the most intense games and only play games like Genshin Impact, the N-1 platform is sufficient. For photography, if they do not need 20x zoom at concerts and are satisfied with 10x zoom, the standard X series can meet their needs.
Therefore, users with strong purchasing power and a desire for the ultimate experience will move up, but we will still offer products at suitable price points with good experiences to meet users’ needs.
Source from ifanr