ORCID

https://orcid.org/0009-0008-1436-1639

Date of Award

Fall 2024

Language

English

Embargo Period

11-26-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Computer Science

Program

Computer Science

First Advisor

Pradeep Atrey

Committee Members

Vivek Singh, Ming-Ching Chang, Chinwe Ekenna

Keywords

Computer Vision, Multimedia, AI, ML, Bias, Gaze Uniformity

Subject Categories

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces | Other Computer Sciences

Abstract

More than 5 billion photos are now captured every day, and smartphones account for over 94% of them. Despite advances in camera technology, taking aesthetically pleasing group photos remains difficult, particularly when everyone's gaze needs to point in the same direction. Current methods focus on facial features and often fail to ensure consistent gaze direction. The iPhone's Live mode, which records a 1.5-second video clip along with each still image, adds a further challenge. Choosing the best key photo from that clip is subjective, and research on the problem has been held back by a lack of publicly available data, a shortage that worsened during the pandemic.

To address these issues, this thesis pursues three primary goals. First, to assess and improve the aesthetic quality of group photos taken in both instant and Live modes by enhancing gaze uniformity. Second, to investigate and reduce biases in the gaze uniformity algorithm to ensure fairness across demographic groups. Third, to create a diverse Live Photos dataset, containing images captured with various cameras and in different settings, to support future research.

To accomplish these goals, the thesis makes several key contributions. It identifies gaze uniformity as a crucial aspect of group photo aesthetics and introduces a novel method for assessing it, an approach not previously explored in the literature. It also points out that Apple's proprietary algorithm overlooks gaze uniformity when selecting representative frames for Live Photos. The thesis proposes a method for determining a Gaze-Aware Representative Group Image (GARGI), along with a user-friendly iOS application that assesses gaze uniformity and categorizes photos as GOOD, BAD, or OK, improving group photo quality in both instant and Live modes. Furthermore, it audits gaze uniformity detection algorithms for fairness with respect to gender and presents a multi-stage framework to mitigate the biases it identifies. Finally, it compiles LivePics-24, a dataset of Live Photos that covers diverse groups and settings, to fill a significant gap in available resources.
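The abstract does not specify how gaze uniformity is scored or how GARGI picks a frame. Purely as an illustration, the Python sketch below assumes that a gaze estimator has already produced a (yaw, pitch) angle in degrees for each detected face. It measures uniformity as the mean resultant length of the corresponding unit vectors (a standard directional-statistics measure, not necessarily the thesis's metric), maps the score to GOOD, OK, or BAD using hypothetical thresholds, and selects the Live Photo frame with the highest score as a stand-in for gaze-aware frame selection. All function names and threshold values here are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical thresholds on the uniformity score; NOT taken from the thesis.
GOOD_THRESHOLD = 0.95
OK_THRESHOLD = 0.85

Gaze = Tuple[float, float]          # (yaw, pitch) in degrees, from an assumed gaze estimator
Vector = Tuple[float, float, float]


def gaze_vector(yaw_deg: float, pitch_deg: float) -> Vector:
    """Convert a (yaw, pitch) gaze estimate in degrees into a 3D unit vector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def uniformity_score(gazes: Sequence[Gaze]) -> float:
    """Mean resultant length of the unit gaze vectors, in [0, 1].

    1.0 means every detected face looks in exactly the same direction;
    values near 0 mean the gazes point in widely scattered directions.
    """
    if not gazes:
        raise ValueError("at least one detected face is required")
    vectors = [gaze_vector(yaw, pitch) for yaw, pitch in gazes]
    n = len(vectors)
    mean_x = sum(v[0] for v in vectors) / n
    mean_y = sum(v[1] for v in vectors) / n
    mean_z = sum(v[2] for v in vectors) / n
    return math.sqrt(mean_x ** 2 + mean_y ** 2 + mean_z ** 2)


def categorize(score: float) -> str:
    """Map a uniformity score to the GOOD / OK / BAD labels (hypothetical cut-offs)."""
    if score >= GOOD_THRESHOLD:
        return "GOOD"
    if score >= OK_THRESHOLD:
        return "OK"
    return "BAD"


def select_representative_frame(frames: List[Sequence[Gaze]]) -> int:
    """Return the index of the Live Photo frame whose gazes are most uniform."""
    scores = [uniformity_score(frame) for frame in frames]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    scattered = [(0.0, 0.0), (60.0, 0.0), (-60.0, 0.0)]  # people looking in different directions
    aligned = [(2.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]      # everyone looking roughly at the camera
    for frame in (scattered, aligned):
        score = uniformity_score(frame)
        print(f"{score:.3f} {categorize(score)}")
    print("best frame:", select_representative_frame([scattered, aligned]))
```

The mean resultant length is used here because it treats gaze as a direction rather than a pair of independent angles, so it degrades smoothly as individual gazes diverge. A real system would also need face detection, a gaze estimator, and thresholds calibrated against human judgments.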

Through these contributions, the thesis aims to improve the user experience in smartphone photography while also addressing the societal implications of algorithmic biases.

License

This work is licensed under the University at Albany Standard Author Agreement.
