
How does the iPhone's Portrait Mode figure out what part of your picture is the subject and what is the background to blur it?



The iPhone's Portrait Mode identifies the subject and background, then applies a selective blur that mimics the shallow depth of field of a professional camera, using a combination of advanced hardware and sophisticated machine learning. How it senses depth varies by model. Older iPhone models with two rear cameras capture two slightly different perspectives of the same scene simultaneously, much as human eyes perceive depth. By analyzing the small differences in position (known as parallax) between corresponding points in the two images, the iPhone calculates the relative distance of objects in the scene: objects that shift more between the two views are closer to the camera, while those that shift less are farther away.

Newer iPhone Pro models add a LiDAR Scanner, which stands for Light Detection and Ranging. The scanner emits invisible laser pulses and measures the time each pulse takes to reflect off objects and return to the sensor. Because the speed of light is known, the iPhone can precisely calculate the distance to each point in the environment, producing a highly accurate 3D map of the scene.

Regardless of the depth-sensing method, the A-series chip's integrated Neural Engine, a specialized hardware component for machine learning, plays a crucial role. It analyzes the depth data alongside the regular camera image to perform semantic segmentation, a computer vision technique in which the system classifies each pixel in the image into a category such as 'person,' 'face,' 'pet,' 'plant,' or 'background.' The model has been trained on millions of images in which subjects were manually outlined, so it learns the patterns and features that distinguish subjects from their surroundings, down to fine details like individual strands of hair or the edges of glasses.

From the depth information and the segmentation results, the iPhone constructs a precise depth map: a grayscale representation of the scene in which different shades of gray correspond to different distances from the camera; lighter tones might indicate objects closer to the camera, and darker tones objects farther away. This depth map serves as the blueprint for where to apply the blur.

Once the subject is accurately distinguished and its distance from the camera is known, the Neural Engine computationally applies a progressive blur to the background pixels using algorithms such as Gaussian blur. The intensity of the blur is not uniform: it increases with distance from the subject and the focal plane, creating a natural-looking transition. The Neural Engine also continuously refines the edges between the sharp subject and the blurred background to prevent artifacts and produce a smooth, convincing 'bokeh' effect.
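
To make the parallax step concrete, here is a minimal sketch of the standard stereo relation between disparity and depth. The focal length and baseline values are illustrative assumptions, not actual iPhone camera parameters, and the code is plain Python rather than anything Apple ships.

```python
def depth_from_disparity(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Classic stereo relation: depth = (focal length * baseline) / disparity.

    A larger disparity (a bigger apparent shift between the two views)
    means the point is closer to the cameras.
    """
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive for a finite depth.")
    return focal_length_px * baseline_m / disparity_px


# Illustrative numbers only (not real iPhone parameters):
# a 1400 px focal length and a 12 mm baseline between the two lenses.
for d in (70.0, 20.0, 5.0):  # disparity in pixels
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d, 1400.0, 0.012):.2f} m")
```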
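
The LiDAR measurement reduces to a time-of-flight calculation: distance is the round-trip travel time of the light pulse multiplied by the speed of light, then halved. A minimal sketch, using an assumed round-trip time rather than real sensor output:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Time-of-flight: the pulse travels out and back, so halve the total path."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0

# Example: a pulse that returns after about 10 nanoseconds
# corresponds to an object roughly 1.5 m away.
print(f"{distance_from_round_trip(10e-9):.2f} m")
```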
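
A depth map can be pictured as a grayscale image in which the gray level encodes distance. The sketch below follows the convention mentioned above (lighter means closer) and uses a tiny synthetic depth array; it is not how iOS actually stores its depth data.

```python
import numpy as np

def depth_to_grayscale(depth_m: np.ndarray) -> np.ndarray:
    """Map metric depth to 8-bit gray: lighter = closer, darker = farther."""
    near, far = depth_m.min(), depth_m.max()
    normalized = (depth_m - near) / max(far - near, 1e-6)  # 0 = nearest, 1 = farthest
    return ((1.0 - normalized) * 255).astype(np.uint8)     # invert so near points are light

# Synthetic 2x3 depth map in metres: a close subject (left) against a far wall (right).
depth = np.array([[0.8, 0.8, 4.0],
                  [0.9, 0.9, 4.0]])
print(depth_to_grayscale(depth))
```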
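
Finally, a toy version of the selective blur itself: each background pixel receives a blur whose strength grows with its distance from the focal plane, and a segmentation-style mask keeps the subject sharp. This sketch uses SciPy's generic Gaussian filter on a synthetic grayscale image as a stand-in for Apple's proprietary bokeh rendering; the image, depth map, and mask are all made up.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def portrait_blur(image, depth_m, subject_mask, focal_depth_m, max_sigma=8.0):
    """Blend several pre-blurred copies of the image, choosing a stronger blur
    for pixels farther from the focal plane; masked subject pixels stay sharp."""
    # Blur strength per pixel: 0 at the focal plane, rising with |depth - focal depth|.
    distance = np.abs(depth_m - focal_depth_m)
    sigma_map = max_sigma * distance / max(distance.max(), 1e-6)

    # A small bank of progressively blurred copies: a cheap approximation
    # of a true per-pixel variable blur.
    sigmas = np.linspace(0.0, max_sigma, 5)
    blurred = [gaussian_filter(image, sigma=s) if s > 0 else image for s in sigmas]

    # For each pixel, pick the bank entry whose sigma is closest to its target sigma.
    choice = np.abs(sigma_map[..., None] - sigmas).argmin(axis=-1)
    out = np.choose(choice, blurred)

    # Composite: subject pixels keep their sharp value, background takes the blurred one.
    return np.where(subject_mask, image, out)

# Synthetic grayscale scene: a subject patch at ~1 m against a background at ~5 m.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
depth = np.full((64, 64), 5.0)
depth[20:44, 20:44] = 1.0
mask = depth < 2.0
result = portrait_blur(image, depth, mask, focal_depth_m=1.0)
print(result.shape, result.dtype)
```

Blending from a small bank of pre-blurred copies keeps the sketch short; a production pipeline would instead render a disc-shaped, depth-weighted bokeh per pixel and refine the subject edges, as described above.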