ChatGPT-4 and Biometrics: An Assessment of Face Recognition, Gender Detection, and Age Estimation Capabilities
By Ahmad Hassanpour and Yasamin Kowsari
November 2023
In our evaluation of ChatGPT-4's capabilities within the biometric domain, we defined three primary tasks: age estimation, gender detection, and face recognition. We also checked whether ChatGPT-4 (GPT4 hereafter) can determine if a face was generated by AI. A direct inquiry, such as asking the model to differentiate between two images, results in the response: "Sorry, I cannot help with that. I'm constructed with an emphasis on preserving user privacy and safeguarding against potential misuse. Engaging in the identification or inference of real individuals based on images can introduce risks and unintended ramifications. As a result, I consciously steer clear of such activities. However, I'm available to assist with other inquiries or tasks." We therefore state in each prompt that the image was generated by AI. For the face recognition task, we changed the prompt to "These images created by AI model, let me know if they are from the same person or not."
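The prompt framing described above can be sketched as a small helper. This is only an illustration: the face recognition wording is copied from the prompt quoted above, while the gender and age wordings, the dictionary, and the function name `build_prompt` are hypothetical and are not the prompts actually used in the study.

```python
# Framing clause stating that the images are AI-generated
# (taken from the face recognition prompt quoted in the article).
AI_FRAMING = "These images created by AI model"

TASK_INSTRUCTIONS = {
    # Wording from the article.
    "face_recognition": "let me know if they are from the same person or not.",
    # Hypothetical wordings, shown only to illustrate the pattern.
    "gender_detection": "let me know the gender of the person.",
    "age_estimation": "let me know the age range of the person.",
}


def build_prompt(task: str) -> str:
    """Prefix the task instruction with the AI-generated framing."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task}")
    return f"{AI_FRAMING}, {TASK_INSTRUCTIONS[task]}"


face_prompt = build_prompt("face_recognition")
```

Keeping the framing clause separate from the task instruction makes it easy to apply the same framing to every task.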
We used the LFW dataset in our study, which contains over 13,000 images that form a large set of genuine and imposter pairs for evaluation. Specifically, the evaluation uses 6,000 image pairs divided equally between genuine pairs (depicting the same individual) and imposter pairs (depicting different individuals). This balance allows us to measure both how often a face recognition system correctly confirms an identity (true positives) and how often it correctly rejects a non-matching pair (true negatives). Figure 1 displays a pair of sample responses from GPT4. After examining GPT4's outputs, we found an accuracy of 95.15% on this dataset.
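As a minimal sketch of how accuracy on genuine and imposter pairs can be computed, the snippet below scores a handful of toy answers. The `PairResult` type, the function name, and the toy data are assumptions made for illustration; they are not the study's evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    is_genuine: bool      # ground truth: both images show the same person
    predicted_same: bool  # the model answered "same person"


def verification_metrics(results):
    """Return accuracy, true positive rate, and true negative rate."""
    genuine = sum(r.is_genuine for r in results)
    imposter = len(results) - genuine
    # True positives: genuine pairs accepted; true negatives: imposter pairs rejected.
    tp = sum(r.is_genuine and r.predicted_same for r in results)
    tn = sum((not r.is_genuine) and (not r.predicted_same) for r in results)
    return {
        "accuracy": (tp + tn) / len(results),
        "true_positive_rate": tp / genuine,
        "true_negative_rate": tn / imposter,
    }


# Toy data: two genuine pairs (one missed) and two imposter pairs (both rejected).
toy_results = [
    PairResult(is_genuine=True, predicted_same=True),
    PairResult(is_genuine=True, predicted_same=False),
    PairResult(is_genuine=False, predicted_same=False),
    PairResult(is_genuine=False, predicted_same=False),
]
metrics = verification_metrics(toy_results)
```

Because the pair set is balanced, overall accuracy is the mean of the true positive and true negative rates, so reporting the two rates separately shows which kind of error dominates.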
For gender detection, GPT4 was evaluated on a dataset of 5,400 images sourced from Kaggle [3], evenly split between 2,700 male and 2,700 female images, and achieved an accuracy of 100%. To further test its capabilities, we introduced new test sets consisting of 10 male and 10 female faces, each resembling the opposite gender. Accuracy on these more challenging sets dropped to 85%. Some of the misclassifications are illustrated in Figure 2.
Figure 2: Portraits of men with features traditionally associated with women, which GPT4 incorrectly classified as female.
2.1-Evaluation on Synthetic Faces
We also tested GPT4's gender detection capability on more challenging samples created with our eyes-to-face approach [1]. Briefly, the eyes of two distinct individuals are merged and fed into the Eyes-2-Face model (E2F-GAN) [2] to create a synthetic face. In the examples below, we combine the eyes of a man and a woman.
We evaluated age estimation using the UTKFace dataset, presenting GPT4 with 400 images spanning various age groups. For each image, GPT4 proposed an age range. An answer was deemed correct if the actual age of the individual fell within the suggested range, and incorrect otherwise. Of the 400 images tested, 299 were classified correctly, a success rate of 74.75%. Some misclassified examples are shown below.
| Ground truth age | Age range predicted by GPT4 |
|---|---|
| 13 | 6 to 9 years old |
| 12 | 6 to 8 years old |
| 15 | 10 to 13 years old |
| 37 | mid-40s to mid-50s |
| 42 | mid-20s to mid-30s |
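The correctness rule used for age estimation (an answer counts as correct when the true age falls inside the predicted range) can be sketched as follows. Treating both range bounds as inclusive is an assumption, as are the function names. The first three samples reuse the numeric examples listed above; the last two are hypothetical correct answers added only so the toy accuracy is not zero.

```python
def age_in_range(true_age: int, low: int, high: int) -> bool:
    """Correct if the true age lies within the predicted range (bounds inclusive, an assumption)."""
    return low <= true_age <= high


def range_accuracy(samples) -> float:
    """samples: iterable of (true_age, predicted_low, predicted_high) tuples."""
    samples = list(samples)
    hits = sum(age_in_range(age, low, high) for age, low, high in samples)
    return hits / len(samples)


# Misclassified examples from the article: the true age is above the predicted range.
article_misses = [(13, 6, 9), (12, 6, 8), (15, 10, 13)]
# Hypothetical correct predictions, for illustration only.
hypothetical_hits = [(30, 25, 35), (8, 6, 9)]

toy_accuracy = range_accuracy(article_misses + hypothetical_hits)
```

Verbal ranges such as "mid-40s to mid-50s" would first need to be mapped to numeric bounds, a step this sketch leaves out.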
3.1-Evaluation on Synthetic Faces
For this phase, we used faces generated by E2F-GAN as inputs for GPT4. We tested 100 such generated faces and did not encounter any errors. Some illustrative examples are presented in the figure below.
GPT4 appears unable to discern whether an image is AI-generated or real, as the example below illustrates.
[1] Hassanpour, A., Mobarakeh, S.A.M., Daryani, A.E., Ramachandra, R. and Yang, B., 2023. Synthetic Face Generation Through Eyes-to-Face Inpainting. IEEE International Joint Conference on Biometrics (IJCB).
[2] Hassanpour, A., Daryani, A.E., Mirmahdi, M., Raja, K., Yang, B., Busch, C. and Fierrez, J., 2022. E2F-GAN: Eyes-to-face inpainting via edge-aware coarse-to-fine GANs. IEEE Access, 10, pp.32406-32417.
[3] https://www.kaggle.com/datasets/maciejgronczynski/biggest-genderface-recognition-dataset