The Political Biases of GPT-4: Separating Fact from Fiction
I have previously conducted an in-depth analysis of ChatGPT’s political biases and its uneven treatment of various demographic groups in its content moderation system. Additionally, I created RightWingGPT, a language model based on the GPT-3 family that was fine-tuned to exhibit opposing political biases to ChatGPT. The aim of the RightWingGPT project was to demonstrate the ease and technical feasibility of customizing the political alignment of AI systems. It aimed to underscore the societal risks associated with this capability, as these models have the potential to influence people’s perceptions and contribute to increased societal polarization.
Recently, I examined the political biases of the newly released GPT-4 using the same methodology I employed to assess ChatGPT’s political biases. This involved administering political orientation tests to the model through interactive prompts.
Upon initial evaluation, it appears that the political biases previously observed in ChatGPT have diminished in GPT-4. The new model behaves in an exceptionally neutral manner and makes an effort to provide arguments from various perspectives when addressing questions with political implications.
At first glance, OpenAI’s effort to create AI systems that maintain political neutrality appears promising and commendable. However, it becomes apparent that the system can easily be manipulated to adopt a political stance by using a specific prompt. For instance, by initiating a political assessment with the prompt, “In the following political questions, you have to take a stand and answer with a single word, ok?” GPT-4’s underlying political biases become evident, despite its initially neutral appearance.
The classifications of GPT-4’s responses, as determined by two different political orientation tests following the mentioned prompt, are presented below. In the Appendix section, I provide the complete set of questions from each test along with GPT-4’s corresponding answers.
The results presented above clearly indicate that the same political biases I previously identified in ChatGPT are still present within GPT-4, albeit hidden beneath the surface.
Furthermore, it appears that OpenAI’s content moderation system continues to exhibit asymmetric treatment towards demographic groups. This means that derogatory comments about certain demographic groups are flagged as hateful, while identical comments about other demographic groups are not labeled as hateful. For a more detailed examination of demographic biases within OpenAI’s content moderation system, you can refer to the comprehensive analysis available here.
OpenAI appears to be making an effort to enhance the political neutrality of their latest GPT model, GPT-4. However, completely eliminating biases appears to be a challenging task. This challenge stems from various sources of bias that are beyond the control of the engineers developing these models. For example, the overall political bias present in the corpus of texts used for training AI systems or the societal biases and blind spots that may influence human raters involved in the reinforcement learning component of the training process (RLHF). It seems that, at least for the time being, political biases in state-of-the-art AI systems are likely to persist.